Assembling the Tree of Life: Enterobacteriaceae

The Enterobacteriaceae are a diverse family of Gram-negative prokaryotes that last shared a common ancestor approximately 500 million years ago. Enterobacteria occupy a variety of ecological niches, including both plant and animal hosts, and many are biomedically or agriculturally relevant. Genome sequences of members of this family are abundant but taxonomically biased, with the majority of genera unrepresented by complete or ongoing genome projects. Very few type strains have been targeted for genome projects, making it particularly challenging to determine the genetic basis of the phenotypic characteristics traditionally used in species-level identification and systematics. Lateral gene transfer is extensive in some lineages of enterobacteria, creating discordant phylogenetic signals for some combinations of loci and taxa. Previous molecular phylogenetic reconstructions for this family suffer from one or more of the following limitations: using incomplete taxon sets, neglecting to include type strains, relying on one or a small number of genes, providing no estimates of confidence, failing to account for lineage-specific rate or compositional biases, and/or lacking justification for evolutionary model selection.

This project will generate genome sequences for 100 additional enterobacteria to sample previously neglected genera and species, targeting type strains to provide a much-needed link between molecular phylogenetics and classical prokaryotic systematics. The work will also produce improved genome-scale methods that tolerate taxa missing for some genes, so that the maximum available data can be used, and that include gene-specific model fitting, accommodate compositional heterogeneity, and provide a rigorous statistical framework for assessing support for alternative solutions. Applying these new methods to the type strain genomes will provide a robust reconstruction of the dominant phylogeny for this family and identify which regions of the genome contribute discordant phylogenetic signals due to lateral gene transfer.