Positive Selection Inhibits Plasmid Coexistence in Bacterial Genomes

ABSTRACT

Plasmids play an important role in bacterial evolution by transferring niche-adaptive functional genes between lineages, thus driving genomic diversification. Bacterial genomes commonly contain multiple, coexisting plasmid replicons, which could fuel adaptation by increasing the range of gene functions available to selection and allowing their recombination. However, plasmid coexistence is difficult to explain because the acquisition of plasmids typically incurs high fitness costs for the host cell. Here, we show that plasmid coexistence was stably maintained without positive selection for plasmid-borne gene functions and was associated with compensatory evolution to reduce fitness costs. In contrast, with positive selection, plasmid coexistence was unstable despite compensatory evolution. Positive selection discriminated between differential fitness benefits of functionally redundant plasmid replicons, retaining only the more beneficial plasmid. These data suggest that while the efficiency of negative selection against plasmid fitness costs declines over time due to compensatory evolution, positive selection to maximize plasmid-derived fitness benefits remains efficient. Our findings help to explain the forces structuring bacterial genomes: coexistence of multiple plasmids in a genome is likely to require either rare positive selection in nature or nonredundancy of accessory gene functions among the coexisting plasmids.

Author and article information

Contributors

Michael A. Brockhurst:

ORCID: https://orcid.org/0000-0003-0362-820X

David S. Guttman: Role: Editor

Journal

Journal ID (nlm-ta): mBio

Journal ID (iso-abbrev): mBio

Journal ID (pmc): mbio

Journal ID (publisher-id): mbio

Title: mBio

Publisher: American Society for Microbiology (1752 N St., N.W., Washington, DC)

ISSN (Electronic): 2150-7511

Publication date (Electronic, pub): 11 May 2021

Publication date (Electronic, collection): May-Jun 2021

Volume: 12

Issue: 3

Electronic Location Identifier: e00558-21

Affiliations

[a] Division of Evolution and Genomic Sciences, School of Biological Sciences, University of Manchester, Manchester, United Kingdom

[b] Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom

[c] Department of Biology, University of York, York, United Kingdom

[d] Department of Evolution, Ecology and Behaviour, Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, United Kingdom

University of Toronto

Author information

Ellie Harrison https://orcid.org/0000-0002-2050-4631

Michael A. Brockhurst https://orcid.org/0000-0003-0362-820X

Article

Publisher ID: mBio00558-21

DOI: 10.1128/mBio.00558-21

PMC ID: 8262885

PubMed ID: 33975933

SO-VID: 987b33e9-1ed1-4274-a9f8-dea515da18e3

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.

History

Date received: 26 February 2021

Date accepted: 3 April 2021

Page count

Supplementary material: 10, Figures: 5, Tables: 0, Equations: 0, References: 45, Pages: 12, Words: 8279

Funding

Funded by: FP7 Ideas: European Research Council (FP7 Ideas), FundRef https://doi.org/10.13039/100011199

Award ID: 311490

Award Recipient: Michael Brockhurst

Funded by: UKRI | Natural Environment Research Council (NERC), FundRef https://doi.org/10.13039/501100000270

Award ID: NE/R008825/1

Award Recipient: Michael Brockhurst

Funded by: UKRI | Biotechnology and Biological Sciences Research Council (BBSRC), FundRef https://doi.org/10.13039/501100000268

Award ID: BB/R006253/1

Award Recipient: Michael Brockhurst

Funded by: Leverhulme Trust, FundRef https://doi.org/10.13039/501100000275

Award ID: PLP-2014-242

Award Recipient: Michael Brockhurst

Custom metadata

cover-date May/June 2021

ScienceOpen disciplines: Life sciences

Keywords: experimental evolution, horizontal gene transfer, plasmid biology
