Gut sequence database

This is a custom database of sequencing data from the human gut microbiome.

Gut genome sequences

The database contains sequenced and fully annotated reference genomes of human gut microbes, collected from public sources. It currently includes 823 genome sequences for microbial (mostly bacterial) organisms from the human gastrointestinal tract.

Data access

The following data sets are provided for download:

  • Metadata of the included genome sequences
  • FASTA: genome sequence database in FASTA format
  • Genbank: genome sequence database in Genbank format
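
As a rough illustration of how the FASTA download could be read, the following minimal sketch parses FASTA-formatted text using only the Python standard library. The function name read_fasta and the in-memory example records are hypothetical and stand in for the actual file, whose name and header layout are not specified here.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical in-memory example standing in for the downloaded file.
example = io.StringIO(">seq1 example organism\nACGT\nACGT\n>seq2\nGGCC\n")
records = list(read_fasta(example))
lengths = {header.split()[0]: len(seq) for header, seq in records}
print(lengths)
```

For the real file, the same function could be applied to a handle opened with open() on the local path of the download.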

Data description

Metadata for 745 of the 823 genome sequences (90.5%) is summarized in project_catalog.csv, including organism name, domain, gene count, current sequencing quality, HMP/GOLD/NCBI/Genbank/IMG/HDMD IDs, sequencing center, and other information. Of these, 737 are bacterial genomes and 2 are archaeal genomes. For a further 6 genomes, the domain was not listed in the original data source, but all of them could be manually classified as bacterial. The data collection includes both quality draft sequences (308; 41%) and completed genomes (437; 59%). For further details of the genome annotation, see the Genbank file.
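
The counts and percentages above can be re-derived from the metadata table. The sketch below is one possible way to do this with the Python standard library. The column names "Domain" and "Project Status" and the sample rows are assumptions for illustration only; the actual header of project_catalog.csv should be checked before use.

```python
import csv
import io
from collections import Counter


def count_by_column(handle, column):
    """Count rows per value of one column; empty or missing values become 'unlisted'."""
    reader = csv.DictReader(handle)
    return Counter((row.get(column) or "").strip() or "unlisted" for row in reader)


def shares(counts):
    """Convert raw counts to percentages of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {key: round(100 * value / total, 1) for key, value in counts.items()}


# Hypothetical sample rows; the real column names may differ.
sample = io.StringIO(
    "Organism Name,Domain,Project Status\n"
    "Organism A,BACTERIAL,Complete\n"
    "Organism B,ARCHAEAL,Draft\n"
    "Organism C,,Complete\n"
)
domain_counts = count_by_column(sample, "Domain")
print(domain_counts)

# Shares of draft and completed genomes, using the counts reported in the text.
status_shares = shares({"draft": 308, "complete": 437})
print(status_shares)
```

Treating an empty domain field as "unlisted" mirrors the 6 genomes whose domain was missing in the original source and had to be classified manually.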

Details of data retrieval and database construction

The reference sequences for annotated bacterial genomes were downloaded from the HMP reference genomes database (accessed March 18, 2018). We filtered the results to include only bacterial genomes from the human gastrointestinal tract, yielding 823 genome sequences. The collection is stored in FASTA (ASM) and Genbank formats.

Gut gene catalog

The gut microbiome gene catalog was retrieved from the integrated reference catalog of the human gut microbiome (Li et al., 2014). This represents the state-of-the-art collection of gut microbiome gene sequences. The sequence data is based on 1267 intestinal samples and includes 9,879,896 open reading frames (ORFs), of which 21.3% have phylum-level taxonomic annotations. See the original publication for further technical details.

The following files are available for download:

Database maintenance and updates

This custom database will be updated as new sequenced and fully annotated reference genomes from the human gut microbiota become available. Contact the project coordinator for further details (see contact).


