bac_taxonomy_.tsv
GTDB taxonomy for bacterial genomes.

bac120_.tree
Newick tree spanning the dereplicated set of bacterial genomes, inferred from the concatenation of 120 proteins and used to curate the GTDB taxonomy.

bac120_msa_.faa
FASTA file of the trimmed multiple sequence alignment used to infer the bac120 tree.

bac120_msa_marker_info_.tsv
Information about each of the 120 proteins used to infer the bac120 tree. The order of proteins in this file is the order in which they were concatenated.

bac120_msa_mask_.txt
Mask indicating which columns were trimmed from the bac120 alignment.

bac120_msa_individual_genes_.tar.gz
Multiple sequence alignments of the 120 bacterial proteins.

bac_metadata_.tsv
Metadata for all bacterial genomes, including GTDB, NCBI, SILVA, and Greengenes taxonomies, completeness and contamination estimates, assembly statistics, and genomic properties.

bac_ssu_.fna
FASTA file of 16S rRNA gene sequences identified within the dereplicated set of bacterial genomes. The assigned taxonomy reflects the genome from which the sequence was obtained. In a small number of cases the 16S rRNA sequence is incongruent with this taxonomic assignment, so it may not be representative of the genome.

bac_arb_.arb
ARB database containing the bacterial reference tree and metadata used to curate the GTDB taxonomy.

gtdb_uba_mags.tar.gz
Genomic files for the 3,087 UBA genomes used to infer the GTDB taxonomy.

NCBIvs_Bacteria.xlsx
Correspondence between standardly named NCBI and GTDB taxa, ordered by degree of polyphyly.
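
As an illustration of how the taxonomy file might be read, here is a minimal Python sketch. The listing above does not describe the file layout, so the sketch assumes two tab-separated columns (a genome identifier and a semicolon-delimited lineage) with rank prefixes such as `d__`, `p__`, and `s__`. The function name `parse_taxonomy`, the genome identifier, and the taxon names in the example are all hypothetical. Check them against the actual file before relying on this.

```python
import csv
import io

# Assumed rank prefixes; the listing above does not document them.
RANK_PREFIXES = {
    "d": "domain", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def parse_taxonomy(handle):
    """Yield (genome_id, {rank_name: taxon}) pairs from a taxonomy TSV.

    Assumes each row is '<genome id>\\t<rank-prefixed lineage>'.
    Rows with fewer than two columns are skipped.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue
        genome_id, lineage = row[0], row[1]
        ranks = {}
        for field in lineage.split(";"):
            prefix, sep, name = field.strip().partition("__")
            # Keep only fields that carry a recognised rank prefix.
            if sep and prefix in RANK_PREFIXES:
                ranks[RANK_PREFIXES[prefix]] = name
        yield genome_id, ranks


# Made-up record for demonstration only; not taken from any release.
example = (
    "GENOME_001\td__Bacteria;p__Examplota;c__Examplia;"
    "o__Exampliales;f__Examplaceae;g__Examplus;s__Examplus demo\n"
)
records = dict(parse_taxonomy(io.StringIO(example)))
```

To read a real file, the `io.StringIO` object would be replaced by an open file handle, for example `open(path, newline="")`, where `path` points at the downloaded taxonomy file.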