Gene identification
-------------------
- Gene calling was performed with Prodigal v2.6.3 (Hyatt et al., 2010), and marker genes were identified and aligned using HMMER v3.1b1 (Eddy, 2011). Marker genes and their corresponding HMMs were taken from the Pfam v27 (Finn et al., 2014) and TIGRFAMs v15.0 (Haft et al., 2003) databases.

Tree inference
--------------
- The bacterial reference tree was inferred with FastTree v2.1.10 under the WAG model from the concatenated alignment of 120 ubiquitous bacterial genes (Parks et al., 2018).
- The archaeal reference tree was inferred with IQ-TREE v1.6.9 under the PMSF model from the concatenated alignment of 122 ubiquitous archaeal genes (Parks et al., 2018), using FastTree v2.1.10 to infer an initial guide tree.

Identifying 16S rRNA sequences
------------------------------
- Sequences were identified using nhmmer v3.1b2 (Wheeler and Eddy, 2013) with the 16S rRNA model (RF00177) from the Rfam database (Kalvari et al., 2018).

Average nucleotide identity
---------------------------
- Average nucleotide identity (ANI) and alignment fraction (AF) values were calculated with FastANI v1.3 (Jain et al., 2018).

Additional information
----------------------
Please consult the following GTDB publications for additional information:

Parks DH, et al. (2018). A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nature Biotechnology, 36: 996-1004.

Parks DH, et al. (2019). Selection of representative genomes for 24,706 bacterial and archaeal species clusters provide a complete genome-based taxonomy. bioRxiv: https://doi.org/10.1101/771964

Chaumeil P-A, et al. (2019). GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics, btz848: https://doi.org/10.1093/bioinformatics/btz848

REFERENCES
----------
Eddy SR. 2011. Accelerated Profile HMM Searches. PLoS Comput Biol 7: e1002195.
Finn RD, et al. 2014. Pfam: The protein families database. Nucleic Acids Res 42: D222-D230.
Haft DH, Selengut JD, White O. 2003. The TIGRFAMs database of protein families. Nucleic Acids Res 31: 371-373.
Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. 2010. Prodigal: Prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11: 119.
Jain C, et al. 2018. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nature Communications 9: 5114.
Kalvari I, et al. 2018. Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families. Nucleic Acids Res 46: D335-D342.
Parks DH, et al. 2017. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nat Microbiol 2: 1533-1542.
Wheeler TJ, Eddy SR. 2013. nhmmer: DNA homology search with profile HMMs. Bioinformatics 29: 2487-2489.
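
Illustrative sketch (ANI and AF)
--------------------------------
The ANI and AF values described under "Average nucleotide identity" can be read from FastANI results roughly as follows. This is a minimal sketch, not the GTDB pipeline itself. It assumes FastANI's default tab-separated output with five columns (query genome, reference genome, ANI, matched fragments, total query fragments) and assumes AF is taken as matched fragments divided by total query fragments. The names `AniResult` and `parse_fastani_line`, and the sample line, are hypothetical.

```python
from typing import NamedTuple


class AniResult(NamedTuple):
    """One query/reference comparison (hypothetical container)."""
    query: str
    reference: str
    ani: float  # average nucleotide identity, in percent
    af: float   # alignment fraction, between 0 and 1


def parse_fastani_line(line: str) -> AniResult:
    """Parse one line of FastANI output (assumed five tab-separated columns)."""
    query, reference, ani, matched, total = line.rstrip("\n").split("\t")
    # Assumption: AF = matched fragments / total query fragments.
    return AniResult(query, reference, float(ani), int(matched) / int(total))


# Made-up example line, not real data.
example = "genome_a.fna\tgenome_b.fna\t97.5\t800\t1000"
result = parse_fastani_line(example)
print(result)
```

Keeping ANI and AF together in one record makes it straightforward to apply thresholds on both values at once, whatever cut-offs a given analysis chooses.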