Databases for RefineM
---------------------

Provided files span the set of representative genomes at GTDB.

Example usage:
--------------

Decompress the protein and 16S rRNA databases:
> tar -xzf gtdb_r80_protein_db.2017-11-09.tar.gz
> tar -xzf gtdb_r80_ssu_db.2018-01-18.tar.gz

DIAMOND and BLAST protein databases can then be created as follows:
> diamond makedb -p 32 -d gtdb_r80_protein_db.2017-11-09.faa --in gtdb_r80_protein_db.2017-11-09.faa
> makeblastdb -dbtype prot -in gtdb_r80_protein_db.2017-11-09.faa

A BLAST database for the 16S rRNA genes can be created with:
> makeblastdb -dbtype nucl -in gtdb_r80_ssu_db.2018-01-18.fna

[GTDB r95]
gtdb_r95_protein_db.2020-07-30.faa.gz:
  Proteins from the genomes used to define GTDB r95 (http://gtdb.ecogenomic.org/).
  The provided FASTA file must be formatted to a DIAMOND or BLAST database before use.

gtdb_r95_taxonomy.2020-07-30.tsv:
  Taxonomy file containing GTDB taxonomy information for the 31,910 genomes.

- the 16S file for GTDB r80 is recommended for use with RefineM

[GTDB r89]
gtdb_r89_protein_db.2019-09-27.faa.gz:
  Proteins from the genomes used to define GTDB r89 (http://gtdb.ecogenomic.org/).
  The provided FASTA file must be formatted to a DIAMOND or BLAST database before use.

gtdb_r89_taxonomy.2019-09-27.tsv:
  Taxonomy file containing GTDB taxonomy information for the 24,706 genomes.

- the 16S file for GTDB r80 is recommended for use with RefineM

[GTDB r86]
gtdb_r86_protein_db.2017-11-09.tar.gz:
  Consists of 92,117,422 proteins from the 27,372 genomes used to define GTDB r86 (http://gtdb.ecogenomic.org/).
  The provided FASTA file must be formatted to a DIAMOND or BLAST database before use.

gtdb_r80_taxonomy.2017-12-15.tsv:
  Taxonomy file containing GTDB taxonomy information for the 27,372 genomes.

- the 16S file for GTDB r80 is recommended for use with RefineM

[GTDB r80]
gtdb_r80_protein_db.2017-11-09.tar.gz:
  Consists of 73,689,358 proteins from the 23,170 genomes used to define GTDB r80 (http://gtdb.ecogenomic.org/).
  The provided FASTA file must be formatted to a DIAMOND or BLAST database before use.

gtdb_r80_ssu_db.2018-01-18.tar.gz:
  16S rRNA genes >1200 bp identified within the 23,170 genomes used to define GTDB r80 (http://gtdb.ecogenomic.org/).
  The provided FASTA file must be formatted to a BLAST database before use.

gtdb_r80_taxonomy.2017-12-15.tsv:
  Taxonomy file containing GTDB taxonomy information for the 23,170 genomes.
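
Note on the r95 and r89 protein files: judging by their .faa.gz extension, these appear to be plain gzip-compressed FASTA files rather than tar archives, so the `tar -xzf` step shown under "Example usage" may not apply to them directly. The following is a minimal sketch of one way to decompress such a file and check its contents before building a database. The function name `decompress_fasta` is hypothetical and is not part of RefineM, DIAMOND, or BLAST.

```shell
# Hypothetical helper (not part of RefineM): decompress a gzip-compressed
# FASTA file next to the original archive and report its sequence count.
decompress_fasta() {
    gz_file="$1"
    fasta_file="${gz_file%.gz}"    # strip the trailing .gz from the name

    # Write the decompressed data to a new file; the .gz file is left in place.
    gzip -dc "$gz_file" > "$fasta_file" || return 1

    # Every FASTA record starts with a '>' header line, so count those.
    n_records=$(grep -c '^>' "$fasta_file")
    echo "$fasta_file: $n_records sequences"
}
```

Assuming the file does decompress to gtdb_r95_protein_db.2020-07-30.faa, the database can then be built with the same commands as in the r80 example:

> decompress_fasta gtdb_r95_protein_db.2020-07-30.faa.gz
> diamond makedb -p 32 -d gtdb_r95_protein_db.2020-07-30.faa --in gtdb_r95_protein_db.2020-07-30.faa
> makeblastdb -dbtype prot -in gtdb_r95_protein_db.2020-07-30.faa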