Organised Genome Dynamics in the Escherichia coli Species Results in Highly Diverse Adaptive Paths

Abstract

The Escherichia coli species represents one of the best-studied model organisms, but also encompasses a variety of commensal and pathogenic strains that diversify by high rates of genetic change. We uniformly (re-)annotated the genomes of 20 commensal and pathogenic E. coli strains and one strain of E. fergusonii (the species most closely related to E. coli), including seven that we sequenced to completion. Within the ∼18,000 families of orthologous genes, we found ∼2,000 common to all strains. Although recombination rates are much higher than mutation rates, we show, both theoretically and using phylogenetic inference, that this does not obscure the phylogenetic signal, which places the B2 phylogenetic group and one group D strain at the basal position. Based on this phylogeny, we inferred past evolutionary events of gain and loss of genes, identifying functional classes under opposite selection pressures. We found an important adaptive role for metabolism diversification within group B2 and Shigella strains, but identified few or no extraintestinal virulence-specific genes, which could make it difficult to develop a vaccine against extraintestinal infections. Genome flux in E. coli is confined to a small number of conserved positions in the chromosome, which most often are not associated with integrases or tRNA genes. Core genes flanking some of these regions show higher rates of recombination, suggesting that a gene, once acquired by a strain, spreads within the species by homologous recombination at the flanking genes. Finally, the genome's long-scale structure of recombination indicates lower recombination rates, but not higher mutation rates, at the terminus of replication. The ensuing effect of background selection and biased gene conversion may thus explain why this region is A+T-rich and shows high sequence divergence but low sequence polymorphism. Overall, despite a very high gene flow, genes co-exist in an organised genome.
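The distinction between the ∼18,000 orthologous families found across the strains and the ∼2,000 common to all of them can be illustrated with a minimal sketch. It assumes each strain is represented as a set of gene-family identifiers; the function name `core_and_pan`, the strain labels, and the family labels are hypothetical, and the toy data are not taken from the study.

```python
from typing import Dict, Set, Tuple


def core_and_pan(families_by_strain: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (core, pan): families shared by every strain, and families seen in any strain."""
    strain_sets = list(families_by_strain.values())
    if not strain_sets:
        return set(), set()
    core = set.intersection(*strain_sets)  # present in all strains
    pan = set.union(*strain_sets)          # present in at least one strain
    return core, pan


# Toy presence data (hypothetical, for illustration only).
toy = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam2", "fam5", "fam6"},
}

core, pan = core_and_pan(toy)
print(len(core), len(pan))  # → 2 6
```

In this representation the core set can only shrink or stay the same as strains are added, while the pan set can only grow or stay the same.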

Author Summary

Although much is known about the E. coli laboratory strain K-12, little is known about the evolutionary trajectories that have driven the high diversity observed among natural isolates of the species, which encompass both commensal and highly virulent intestinal and extraintestinal pathogenic strains. We have annotated or re-annotated the genomes of 20 commensal and pathogenic E. coli strains and one strain of E. fergusonii (the species most closely related to E. coli), including seven that we sequenced to completion. Although recombination rates are much higher than mutation rates, we were able to reconstruct a robust phylogeny based on the ∼2,000 genes common to all strains. Based on this phylogeny, we established the evolutionary scenario of gains and losses of thousands of specific genes, identifying functional classes under opposite selection pressures. This genome flux is confined to very few positions in the chromosome, which are the same for every genome. Notably, we identified few or no extraintestinal virulence-specific genes. We also defined a long-scale structure of recombination in the genome with lower recombination rates at the terminus of replication. These findings demonstrate that, despite a very high gene flow, genes can co-exist in an organised genome.
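To make the idea of mapping gene gains and losses onto a phylogeny concrete, here is a minimal sketch using Fitch parsimony on a rooted binary tree. This is an illustrative technique chosen for brevity, not necessarily the method used in the study. The tree encoding (nested tuples), the function name `fitch_gain_loss`, and the toy strains are assumptions, as is the choice to treat an ambiguous root as "gene absent".

```python
from typing import Dict, Tuple, Union

# A tree is either a leaf name or a pair of subtrees (hypothetical encoding).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_gain_loss(tree: Tree, present: Dict[str, bool]) -> Tuple[int, int]:
    """Count (gains, losses) of one gene family under Fitch parsimony."""

    def up(node):
        # Bottom-up pass: compute the set of most-parsimonious states per node.
        if isinstance(node, str):
            return ({int(present[node])}, ())
        left, right = up(node[0]), up(node[1])
        common = left[0] & right[0]
        return (common if common else left[0] | right[0], (left, right))

    def down(node, parent_state):
        # Top-down pass: keep the parent's state when allowed, else switch.
        states, children = node
        state = parent_state if parent_state in states else next(iter(states))
        gains = int(parent_state == 0 and state == 1)
        losses = int(parent_state == 1 and state == 0)
        for child in children:
            g, l = down(child, state)
            gains += g
            losses += l
        return gains, losses

    root = up(tree)
    root_state = min(root[0])  # assumption: an ambiguous root is scored as absent
    return down(root, root_state)


# Toy example (hypothetical): the gene is present only in strains A and B.
toy_tree = ((("A", "B"), "C"), "D")
toy_presence = {"A": True, "B": True, "C": False, "D": False}
print(fitch_gain_loss(toy_tree, toy_presence))  # → (1, 0)
```

Here the single gain is placed on the branch leading to the ancestor of A and B. Resolving an ambiguous root as "present" instead would trade some inferred gains for losses, so the root assumption matters for families found in most strains.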

Author and article information

Contributors

Role: Editor

Journal

Journal ID (nlm-ta): PLoS Genet

Journal ID (publisher-id): plos

Journal ID (pmc): plosgen

Title: PLoS Genetics

Publisher: Public Library of Science (San Francisco, USA)

ISSN (Print): 1553-7390

ISSN (Electronic): 1553-7404

Publication date (Collection): January 2009

Publication date (Print): January 2009

Publication date (Electronic): 23 January 2009

Volume: 5

Issue: 1

Electronic Location Identifier: e1000344

Affiliations

[1] Atelier de BioInformatique, Université Pierre et Marie Curie - Paris 6 (UPMC), Paris, France

[2] Microbial Evolutionary Genomics, Institut Pasteur, CNRS URA2171, Paris, France

[3] Faculté de Médecine, Université Paris 7 Denis Diderot, INSERM U722, Site Xavier Bichat, Paris, France

[4] Génoscope, Institut de Génomique, CEA, Evry, France

[5] Faculté de Médecine, Université Paris 5 René Descartes, INSERM U571, Paris, France

[6] Université Paris 7 Denis Diderot, Hôpital Robert Debré (APHP), EA 3105, Paris, France

[7] Plate-Forme Génomique, Institut Pasteur, Paris, France

[8] Laboratoire de Génomique Comparative, CNRS UMR8030, Institut de Génomique, CEA, Génoscope, Evry, France

[9] UR1077 Mathématique, Informatique, et Génome, INRA, Jouy en Josas, France

[10] Unité de Génétique des Génomes Bactériens, Institut Pasteur, CNRS URA2171, Paris, France

[11] UR888 Unité des Bactéries Lactiques et Pathogènes Opportunistes, INRA, Jouy en Josas, France

[12] Faculté de Médecine, Université Paris 5 René Descartes, INSERM U570, Paris, France

[13] Unité de Génétique des Biofilms, Institut Pasteur, CNRS URA2172, Paris, France

[14] Veterans Affairs Medical Center, Minneapolis, Minnesota, United States of America

[15] Department of Medicine, University of Minnesota, Minneapolis, Minnesota, United States of America

[16] Pathogénie Bactérienne des Muqueuses, Institut Pasteur, Paris, France

[17] Université Grenoble 1 Joseph Fourier, CNRS UMR 5163, Grenoble, France

Universidad de Sevilla, Spain

Author notes

* E-mail: cmedigue@genoscope.cns.fr (CM); erocha@pasteur.fr (EPCR); erick.denamur@inserm.fr (ED)

Conceived and designed the experiments: OT VB EPCR ED. Performed the experiments: VB CB OC CD LG SM SO BV. Analyzed the data: MT CH OT VB SB PB EB SB OB AC HC SC AD MD MEK EF JMG AMG JJ CLB ML VMJ IM XN MAP CP ZR CSR DS JT DV CM EPCR ED. Contributed reagents/materials/analysis tools: MT CH OT VB CM EPCR. Wrote the paper: MT CH OT JJ CM EPCR ED.

Article

Publisher ID: 08-PLGE-RA-1131R2

DOI: 10.1371/journal.pgen.1000344

PMC ID: 2617782

PubMed ID: 19165319

SO-VID: 159b4c40-246b-480a-80f7-28eb631debc0

Copyright © Touchon et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received: 2 September 2008

Date accepted: 16 December 2008

Page count

Pages: 25
