      BBMerge – Accurate paired shotgun read merging via overlap

      research-article
      PLoS ONE
      Public Library of Science

          Abstract

          Merging paired-end shotgun reads generated on high-throughput sequencing platforms can substantially improve various subsequent bioinformatics processes, including genome assembly, binning, mapping, annotation, and clustering for taxonomic analysis. With the inexorable growth of sequence data volume and CPU core counts, the speed and scalability of read-processing tools become ever more important. The accuracy of shotgun read merging is crucial as well, as errors introduced by incorrect merging percolate through to reduce the quality of downstream analysis. Thus, we designed a new tool to maximize accuracy and minimize processing time, allowing the use of read merging on larger datasets and in analyses highly sensitive to errors. We present BBMerge, a new merging tool for paired-end shotgun sequence data. We benchmark BBMerge by comparison with eight other widely used merging tools, assessing speed, accuracy, and scalability. Evaluations of both synthetic and real-world datasets demonstrate that BBMerge produces merged shotgun reads with greater accuracy and at higher speed than any existing merging tool examined. BBMerge also provides the ability to merge non-overlapping shotgun read pairs by using k-mer frequency information to assemble the unsequenced gap between reads, achieving a significantly higher merge rate while maintaining or increasing accuracy.
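To make the idea of merging a read pair via overlap concrete, the following is a minimal Python sketch. It is not BBMerge's algorithm: the function names `reverse_complement` and `merge_pair`, the longest-overlap-first search, and the `min_overlap` and `max_mismatch_rate` parameters are all assumptions chosen for illustration. Real merging tools typically weigh further evidence, such as base quality scores, before accepting an overlap.

```python
# Illustrative only: a naive overlap merger, not the BBMerge implementation.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (uppercase ACGTN)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_rate=0.1):
    """Merge a forward read with its reverse-strand mate.

    read2 is reverse-complemented so both reads lie on the same strand,
    then the suffix of read1 is compared with the prefix of read2.
    Returns the merged sequence, or None if no acceptable overlap exists.
    All thresholds here are hypothetical defaults.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    r2 = reverse_complement(read2)
    # Try the longest candidate overlap first, then shrink.
    for overlap in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        tail = read1[-overlap:]
        head = r2[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * overlap:
            # Keep read1 in full and append the non-overlapping part of r2.
            return read1 + r2[overlap:]
    return None
```

For a 12-base fragment sequenced as two 8-base reads, the reads share a 4-base overlap, so calling `merge_pair` with `min_overlap=4` reconstructs the original fragment. A pair with no acceptable overlap returns `None`, which is the case the abstract's k-mer based gap assembly is meant to address.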

          Most cited references (15)

          FLASH: fast length adjustment of short reads to improve genome assemblies.

          Next-generation sequencing technologies generate very large numbers of short reads. Even with very deep genome coverage, short read lengths cause problems in de novo assemblies. The use of paired-end libraries with a fragment size shorter than twice the read length provides an opportunity to generate much longer reads by overlapping and merging read pairs before assembling a genome. We present FLASH, a fast computational tool to extend the length of short reads by overlapping paired-end reads from fragment libraries that are sufficiently short. We tested the correctness of the tool on one million simulated read pairs, and we then applied it as a pre-processor for genome assemblies of Illumina reads from the bacterium Staphylococcus aureus and human chromosome 14. FLASH correctly extended and merged reads >99% of the time on simulated reads with an error rate of <1%. With adequately set parameters, FLASH correctly merged reads over 90% of the time even when the reads contained up to 5% errors. When FLASH was used to extend reads prior to assembly, the resulting assemblies had substantially greater N50 lengths for both contigs and scaffolds. The FLASH system is implemented in C and is freely available as open-source code at http://www.cbcb.umd.edu/software/flash. Contact: t.magoc@gmail.com.
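The N50 length mentioned in the FLASH abstract is a standard assembly contiguity metric: the length of the shortest contig (or scaffold) in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal Python sketch of that definition follows; the function name `n50` is an assumption for illustration and is not taken from FLASH or BBMerge.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; N50 is the length at
    which the running total first reaches half of the summed length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one positive length")
```

For contig lengths 2 through 10 (total 54), the three longest contigs sum to 27, exactly half, so the N50 is 8.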

            Paired-end mapping reveals extensive structural variation in the human genome.

            Structural variation of the genome involves kilobase- to megabase-sized deletions, duplications, insertions, inversions, and complex combinations of rearrangements. We introduce high-throughput and massive paired-end mapping (PEM), a large-scale genome-sequencing method to identify structural variants (SVs) approximately 3 kilobases (kb) or larger that combines the rescue and capture of paired ends of 3-kb fragments, massive 454 sequencing, and a computational approach to map DNA reads onto a reference genome. PEM was used to map SVs in an African and in a putatively European individual and identified shared and divergent SVs relative to the reference genome. Overall, we fine-mapped more than 1000 SVs and documented that the number of SVs among humans is much larger than initially hypothesized; many of the SVs potentially affect gene function. The breakpoint junction sequences of more than 200 SVs were determined with a novel pooling strategy and computational analysis. Our analysis provided insights into the mechanisms of SV formation in humans.

              Accurate multiplex polony sequencing of an evolved bacterial genome.

              We describe a DNA sequencing technology in which a commonly available, inexpensive epifluorescence microscope is converted to rapid nonelectrophoretic DNA sequencing automation. We apply this technology to resequence an evolved strain of Escherichia coli at less than one error per million consensus bases. A cell-free, mate-paired library provided single DNA molecules that were amplified in parallel to 1-micrometer beads by emulsion polymerase chain reaction. Millions of beads were immobilized in a polyacrylamide gel and subjected to automated cycles of sequencing by ligation and four-color imaging. Cost per base was roughly one-ninth as much as that of conventional sequencing. Our protocols were implemented with off-the-shelf instrumentation and reagents.

                Author and article information

                Contributors
                Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Supervision, Validation, Writing – original draft, Writing – review & editing
                Role: Software
                Roles: Supervision, Visualization, Writing – original draft, Writing – review & editing
                Role: Editor
                Journal
                PLoS ONE
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1932-6203
                Published: 26 October 2017
                Volume 12, Issue 10, Article e0185056
                Affiliations
                [1] DOE Joint Genome Institute, Walnut Creek, CA, United States of America
                [2] National Renewable Energy Laboratory, Golden, CO, United States of America
                Massey University, New Zealand
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Author information
                http://orcid.org/0000-0002-3126-2199
                Article
                Manuscript number: PONE-D-17-13379
                DOI: 10.1371/journal.pone.0185056
                PMCID: PMC5657622
                PMID: 29073143

                This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

                History
                Received: 6 April 2017
                Accepted: 6 September 2017
                Page count
                Figures: 6, Tables: 3, Pages: 15
                Funding
                Funded by: U.S. Department of Energy (funder ID: http://dx.doi.org/10.13039/100000015)
                Award ID: DE-AC02-05CH11231
                This work was conducted by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, and is supported under Contract No. DE-AC02-05CH11231. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences > Genetics > Genomics > Metagenomics
                Research and Analysis Methods > Database and Informatics Methods > Bioinformatics > Sequence Analysis > Sequence Alignment
                Biology and Life Sciences > Molecular Biology > Molecular Biology Techniques > Cloning > DNA Cloning > Shotgun Sequencing
                Research and Analysis Methods > Molecular Biology Techniques > Cloning > DNA Cloning > Shotgun Sequencing
                Biology and Life Sciences > Molecular Biology > Molecular Biology Techniques > Sequencing Techniques > Shotgun Sequencing
                Research and Analysis Methods > Molecular Biology Techniques > Sequencing Techniques > Shotgun Sequencing
                Biology and Life Sciences > Computational Biology > Genome Analysis > Sequence Assembly Tools
                Biology and Life Sciences > Genetics > Genomics > Genome Analysis > Sequence Assembly Tools
                Social Sciences > Sociology > Social Systems
                Computer and Information Sciences > Computer Software
                Biology and Life Sciences > Organisms > Bacteria
                Biology and Life Sciences > Computational Biology > Genome Analysis > Genome Annotation
                Biology and Life Sciences > Genetics > Genomics > Genome Analysis > Genome Annotation
                Custom metadata
                Mock community data are available from http://genome.jgi.doe.gov/MeCorS/MeCorS.home.html. Synthetic data generated from the genome of Chlamydomonas reinhardtii (v3.0) is available at https://genome.jgi.doe.gov/Chlre3/Chlre3.home.html and ftp://ftp.jgi-psf.org/pub/JGI_data/Chlamy/v3.0/Chlre3.fasta.gz.
