      Is Open Access

      Accelerating next generation sequencing data analysis: an evaluation of optimized best practices for Genome Analysis Toolkit algorithms

      research-article


          Abstract

          Advancements in next generation sequencing (NGS) technologies have significantly increased the translational use of genomics data in the medical field, as well as the demand for computational infrastructure capable of processing that data. To enhance the current understanding of the software and hardware used to compute large-scale human genomic (NGS) datasets, the performance and accuracy of optimized versions of GATK algorithms, including Parabricks and Sentieon, were compared to the results of the original application (GATK v4.1.0, Intel x86 CPUs). Parabricks was able to process a 50× whole-genome sequencing library in under 3 h and Sentieon finished in under 8 h, whereas GATK v4.1.0 needed nearly 24 h. These results were achieved while maintaining greater than 99% accuracy and precision compared to stock GATK. Sentieon’s somatic pipeline achieved similar results, also greater than 99%. Additionally, the IBM POWER9 CPU performed well on bioinformatic workloads when tested with 10 different tools for alignment/mapping.
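          The runtimes quoted in the abstract imply rough speedup factors over stock GATK. The sketch below works through that arithmetic. It assumes the rounded figures from the abstract (3 h, 8 h, 24 h), which are upper bounds or approximations rather than measured values, so the resulting factors are only indicative. The names `RUNTIME_HOURS` and `speedup_over_baseline` are illustrative and do not come from the article.

```python
# Rounded wall-clock times taken from the abstract, in hours.
# "under 3 h" and "under 8 h" are upper bounds; "nearly 24 h" is approximate.
RUNTIME_HOURS = {
    "GATK v4.1.0": 24.0,
    "Sentieon": 8.0,
    "Parabricks": 3.0,
}


def speedup_over_baseline(runtimes, baseline):
    """Return baseline_time / tool_time for every tool in `runtimes`."""
    base = runtimes[baseline]
    return {name: base / hours for name, hours in runtimes.items()}


speedups = speedup_over_baseline(RUNTIME_HOURS, "GATK v4.1.0")
for name, factor in speedups.items():
    print(f"{name}: roughly {factor:.1f}x relative to GATK v4.1.0")
```

          Under these assumptions Parabricks comes out at roughly 8 times and Sentieon at roughly 3 times the speed of stock GATK. Because the Parabricks and Sentieon times are upper bounds and the GATK time is slightly overstated by rounding, the true factors could differ somewhat.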


          Most cited references (7)

          Is Open Access

          Sentieon DNASeq Variant Calling Workflow Demonstrates Strong Computational Performance and Accuracy

          As reliable, efficient genome sequencing becomes ubiquitous, the need for similarly reliable and efficient variant calling becomes increasingly important. The Genome Analysis Toolkit (GATK), maintained by the Broad Institute, is currently the widely accepted standard for variant calling software. However, alternative solutions may provide faster variant calling without sacrificing accuracy. One such alternative is Sentieon DNASeq, a toolkit analogous to GATK but built on a highly optimized backend. We conducted an independent evaluation of the DNASeq single-sample variant calling pipeline in comparison to that of GATK. Our results support the near-identical accuracy of the two software packages, showcase optimal scalability and great speed from Sentieon, and describe computational performance considerations for the deployment of DNASeq.

            The Sentieon Genomics Tools - A fast and accurate solution to variant calling from next-generation sequence data

            In the past six years worldwide capacity for human genome sequencing has grown by more than five orders of magnitude, with costs falling by nearly two orders of magnitude over the same period. The rapid expansion in the production of next-generation sequence data and the use of these data in a wide range of new applications has created a need for improved computational tools for data processing. The Sentieon Genomics tools provide an optimized reimplementation of the most accurate pipelines for calling variants from next-generation sequence data, resulting in more than a 10-fold increase in processing speed while providing identical results to best practices pipelines. Here we demonstrate the consistency and improved performance of Sentieon's tools relative to BWA, GATK, MuTect, and MuTect2 through analysis of publicly available human exome, low-coverage genome, and high-depth genome sequence data.
              Is Open Access

              pblat: a multithread blat algorithm speeding up aligning sequences to genomes

              Background: blat is a widely used sequence alignment tool. It is especially useful for aligning long sequences and for gapped mapping, which cannot be performed properly by other fast sequence mappers designed for short reads. However, blat is single threaded, and when used to map whole-genome or whole-transcriptome sequences to reference genomes it can take days to finish, making it unsuitable for large-scale sequencing projects and iterative analysis. Here, we present pblat (parallel blat), a parallelized blat algorithm with multithread and cluster computing support, which rapidly fine-maps large-scale DNA/RNA sequences against genomes. Results: The pblat algorithm takes advantage of modern multicore processors and significantly reduces the run time as the number of threads increases. pblat uses almost the same amount of memory as blat. The results generated by pblat are identical to those generated by blat. The pblat tool is easy to install and can run on Linux and Mac OS systems. In addition, we provide a cluster version of pblat (pblat-cluster) that runs on computing clusters with MPI support. Conclusion: pblat is open source and freely available for non-commercial users. It is easy to install and easy to use. pblat and pblat-cluster would facilitate the high-throughput mapping of large-scale genomic and transcript sequences to reference genomes with both high speed and high precision.

                Author and article information

                Journal
                Genomics Inform
                GNI
                Genomics & Informatics
                Korea Genome Organization
                1598-866X
                2234-0742
                March 2020
                31 March 2020
                Volume: 18
                Issue: 1
                Article: e10
                Affiliations
                Department of Pediatrics, Nemours Alfred I duPont Hospital for Children, Wilmington, DE 19803, USA
                Author notes
                [* ]Corresponding author: E-mail: karl.franke@ 123456nemours.org
                Author information
                http://orcid.org/0000-0002-5904-2921
                http://orcid.org/0000-0002-2037-0389
                Article
                gi-2020-18-1-e10
                10.5808/GI.2020.18.1.e10
                7120354
                32224843
                0523cef1-77e8-422b-b767-cc7e377eb86e
                (c) 2020, Korea Genome Organization

                (CC) This is an open-access article distributed under the terms of the Creative Commons Attribution license( https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 26 February 2020
                : 5 March 2020
                : 6 March 2020
                Categories
                Clinical Genomics

                Genetics
                clinical genomics,genome analysis toolkit,gpus,next generation sequencing,variant detection
