      Is Open Access

      Scaling read aligners to hundreds of threads on general-purpose processors

      Bioinformatics
      Oxford University Press (OUP)


          Abstract

          Motivation: General-purpose processors can now contain many dozens of processor cores and support hundreds of simultaneous threads of execution. To make best use of these threads, genomics software must contend with new and subtle computer architecture issues. We discuss some of these and propose methods for improving thread scaling in tools that analyze each read independently, such as read aligners.

          Results: We implement these methods in new versions of Bowtie, Bowtie 2 and HISAT. We greatly improve thread scaling in many scenarios, including on the recent Intel Xeon Phi architecture. We also highlight how bottlenecks are exacerbated by variable-record-length file formats like FASTQ and suggest changes that enable superior scaling.

          Availability and implementation:
          Experiments for this study: https://github.com/BenLangmead/bowtie-scaling
          Bowtie: http://bowtie-bio.sourceforge.net
          Bowtie 2: http://bowtie-bio.sourceforge.net/bowtie2
          HISAT: http://www.ccb.jhu.edu/software/hisat

          Supplementary information: Supplementary data are available at Bioinformatics online.
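
The abstract's point about variable-record-length formats can be illustrated with a small sketch. This is not the authors' implementation or the format change proposed in the article; the space-padded layout, the function names `pad_record` and `read_record`, and the 32-byte width are all assumptions made for illustration. The idea shown is only that, when every record occupies the same number of bytes, a thread can locate record i by arithmetic on the byte offset, whereas in ordinary FASTQ it would have to scan for newlines through all earlier records.

```python
import io
from typing import BinaryIO, Tuple


def pad_record(name: str, seq: str, qual: str, width: int) -> bytes:
    """Encode one read as four FASTQ-style lines, space-padded to `width` bytes.

    Hypothetical layout for illustration only; not a standard format.
    """
    body = f"@{name}\n{seq}\n+\n{qual}\n".encode("ascii")
    if len(body) > width:
        raise ValueError("record is longer than the fixed width")
    return body + b" " * (width - len(body))


def read_record(buf: BinaryIO, index: int, width: int) -> Tuple[str, str, str]:
    """Fetch record `index` with one seek; earlier records are never scanned."""
    buf.seek(index * width)
    # Padding comes after the final newline, so stripping trailing spaces
    # recovers the original four lines plus one empty trailing field.
    lines = buf.read(width).rstrip(b" ").decode("ascii").split("\n")
    return lines[0][1:], lines[1], lines[3]


# Usage: three reads of different lengths stored at a fixed 32-byte stride.
reads = [("r1", "ACGT", "IIII"), ("r2", "AC", "II"), ("r3", "ACGTAC", "IIIIII")]
store = io.BytesIO(b"".join(pad_record(n, s, q, 32) for n, s, q in reads))
third = read_record(store, 2, 32)
```

With a layout like this, worker threads could each be handed a disjoint range of record indices and read them without coordinating on record boundaries. The cost is the padding overhead, which grows with the spread in read lengths.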


                Author and article information

                Journal
                Bioinformatics
                Oxford University Press (OUP)
                1367-4803
                1460-2059
                February 01 2019
                July 18 2018
                Volume: 35
                Issue: 3
                Pages: 421-432
                Affiliations
                [1 ]Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
                [2 ]Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
                Article
                DOI: 10.1093/bioinformatics/bty648
                © 2018

                http://creativecommons.org/licenses/by/4.0/
