      Scaling read aligners to hundreds of threads on general-purpose processors

      research-article

          Abstract

          Motivation

          General-purpose processors can now contain many dozens of processor cores and support hundreds of simultaneous threads of execution. To make best use of these threads, genomics software must contend with new and subtle computer architecture issues. We discuss some of these and propose methods for improving thread scaling in tools that analyze each read independently, such as read aligners.

          Results

          We implement these methods in new versions of Bowtie, Bowtie 2 and HISAT. We greatly improve thread scaling in many scenarios, including on the recent Intel Xeon Phi architecture. We also highlight how bottlenecks are exacerbated by variable-record-length file formats like FASTQ and suggest changes that enable superior scaling.

          Availability and implementation

          Experiments for this study: https://github.com/BenLangmead/bowtie-scaling.

          Supplementary information

          Supplementary data are available at Bioinformatics online.

                Author and article information

                Contributors
                Role: Associate Editor
                Journal
                Bioinformatics
                Oxford University Press
                1367-4803
                1367-4811
                01 February 2019
                18 July 2018
                18 July 2018
                Volume: 35
                Issue: 3
                Pages: 421-432
                Affiliations
                [1 ]Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
                [2 ]Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
                Author notes
                To whom correspondence should be addressed. E-mail: langmea@cs.jhu.edu
                Author information
                http://orcid.org/0000-0003-2437-1976
                Article
                bty648
                10.1093/bioinformatics/bty648
                6361242
                30020410
                © The Author(s) 2018. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 10 February 2018
                Revised: 19 June 2018
                Accepted: 17 July 2018
                Page count
                Pages: 12
                Funding
                Funded by: Intel Parallel Computing Center
                Funded by: National Institutes of Health 10.13039/100000002
                Funded by: National Institute of General Medical Sciences 10.13039/100000057
                Award ID: R01GM118568
                Funded by: Texas Advanced Computing Center (TACC)
                Award ID: TG-CIE170020
                Funded by: Extreme Science and Engineering Discovery Environment (XSEDE)
                Funded by: National Science Foundation 10.13039/100000001
                Award ID: ACI-1548562
                Categories
                Original Papers
                Sequence Analysis

                Bioinformatics & Computational biology
