      Fast and accurate mutation detection in whole genome sequences of multiple isogenic samples with IsoMut

      research-article


          Abstract

          Background

Detection of somatic mutations is one of the main goals of next-generation DNA sequencing. A wide range of experimental systems are available for the study of spontaneous or environmentally induced mutagenic processes. However, most of the routinely used mutation calling algorithms are not optimised for the simultaneous analysis of multiple samples, or for non-human experimental model systems that lack reliable databases of common genetic variations. Most standard tools either require numerous, poorly documented in-house post-filtering steps or take an impractically long time to run. To overcome these problems, we designed the streamlined IsoMut tool, which can be readily adapted to experimental scenarios where the goal is the identification of experimentally induced mutations in multiple isogenic samples.

          Methods

          Using 30 isogenic samples, reliable cohorts of validated mutations were created for testing purposes. Optimal values of the filtering parameters of IsoMut were determined in a thorough and strict optimization procedure based on these test sets.

          Results

We show that IsoMut, when tuned correctly, decreases the false positive rate compared to conventional tools in a 30-sample experimental setup, and that it detects not only single nucleotide variations but also short insertions and deletions. IsoMut can also run more than a hundred times faster than the most precise state-of-the-art tool, owing to its straightforward and easily understandable filtering algorithm.
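The abstract does not spell out the filtering rules, so the following is only a minimal sketch of the general idea of finding a mutation that is unique to one sample in an isogenic set: one sample must carry the variant convincingly while every other sample stays clean. The `SiteCounts` structure, the function name `unique_mutation_sample`, and all threshold values are hypothetical illustrations, not IsoMut's actual interface or its optimised parameters.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Read counts for one sample at one genomic position (hypothetical structure)."""
    ref: int  # reads supporting the reference allele
    alt: int  # reads supporting the candidate alternate allele

    @property
    def coverage(self) -> int:
        return self.ref + self.alt

    @property
    def alt_freq(self) -> float:
        # Fraction of reads supporting the alternate allele; 0.0 if uncovered.
        return self.alt / self.coverage if self.coverage else 0.0


def unique_mutation_sample(counts, min_alt_freq=0.2, min_coverage=5,
                           max_other_alt_freq=0.05):
    """Return the name of the only sample carrying the variant, or None.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    # Samples that carry the alternate allele with enough support.
    carriers = [name for name, c in counts.items()
                if c.coverage >= min_coverage and c.alt_freq >= min_alt_freq]
    if len(carriers) != 1:
        # Zero carriers: nothing to report. Several carriers: likely a
        # shared (pre-existing) variant or an artefact, not a unique mutation.
        return None
    carrier = carriers[0]
    # Every other sample must be "clean" at this position.
    for name, c in counts.items():
        if name != carrier and c.alt_freq > max_other_alt_freq:
            return None
    return carrier


# One sample carries the variant, the others are clean.
unique_site = {"s1": SiteCounts(10, 10), "s2": SiteCounts(20, 0),
               "s3": SiteCounts(30, 1)}
# All samples carry the variant, as a shared variant would.
shared_site = {"s1": SiteCounts(10, 10), "s2": SiteCounts(9, 11),
               "s3": SiteCounts(12, 8)}
# One carrier, but another sample shows too many alternate reads.
noisy_site = {"s1": SiteCounts(10, 10), "s2": SiteCounts(18, 2),
              "s3": SiteCounts(20, 0)}

print(unique_mutation_sample(unique_site))
print(unique_mutation_sample(shared_site))
print(unique_mutation_sample(noisy_site))
```

Requiring the other samples to be clean is what lets a multi-sample design suppress shared variants and recurrent sequencing artefacts without an external database of common variation, under the assumptions stated above.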

          Conclusions

IsoMut has already been successfully applied in multiple recent studies to find unique, treatment-induced mutations in sets of isogenic samples with very low false positive rates. Studies of this type contribute substantially to determining the mutagenic effect of environmental agents or genetic defects, and IsoMut has proved to be an invaluable tool in the analysis of such data.

          Electronic supplementary material

          The online version of this article (doi:10.1186/s12859-017-1492-4) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                pipeko@caesar.elte.hu
                dkrib@caesar.elte.hu
                molnar.janos@ttk.mta.hu
                poti.adam@ttk.mta.hu
                marcin@cbs.dtu.dk
                bodri@complex.elte.hu
                tusnady.gabor@ttk.mta.hu
                Zoltan.Szallasi@childrens.harvard.edu
                csabai@complex.elte.hu
                szuts.david@ttk.mta.hu
Journal
BMC Bioinformatics
BioMed Central (London)
ISSN: 1471-2105
Published: 31 January 2017
Volume 18, article 73
                Affiliations
[1] ISNI 0000 0001 2294 6276, GRID grid.5591.8, Department of Physics of Complex Systems, Eötvös Loránd University, H-1117 Budapest, Hungary
[2] ISNI 0000 0004 0512 3755, GRID grid.425578.9, Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, H-1117 Budapest, Hungary
[3] ISNI 0000 0001 2181 8870, GRID grid.5170.3, Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Lyngby, Denmark
[4] ISNI 0000 0004 0378 8438, GRID grid.2515.3, Computational Health Informatics Program (CHIP), Boston Children’s Hospital, Boston, USA
[5] ISNI 000000041936754X, GRID grid.38142.3c, Harvard Medical School, Boston, MA 02215, USA
[6] ISNI 0000 0001 0942 9821, GRID grid.11804.3c, MTA-SE-NAP, Brain Metastasis Research Group, 2nd Department of Pathology, Semmelweis University, H-1091 Budapest, Hungary
                Author information
                http://orcid.org/0000-0001-7985-0136
                Article
Publisher ID: 1492
DOI: 10.1186/s12859-017-1492-4
PMCID: PMC5282906
PMID: 28143617
                400d4853-3536-4f35-af78-ce4db1d8f463
                © The Author(s). 2017

Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
Received: 22 September 2016
Accepted: 20 January 2017
                Funding
                Funded by: Momentum Grant of the Hungarian Academy of Sciences
                Award ID: LP2011-015
                Funded by: Novo Nordisk Foundation Interdisciplinary Synergy Programme
                Award ID: NNF15OC0016584
                Funded by: FundRef http://dx.doi.org/10.13039/100001006, Breast Cancer Research Foundation;
                Funded by: Basser Foundation
Funded by: Széchenyi Program, Hungary
Award ID: KTIA_NAP_13-2014-0021
                Funded by: Momentum Grant of the Hungarian Academy of Sciences
                Award ID: LP2012-035
Funded by: FundRef http://dx.doi.org/10.13039/501100003549, Országos Tudományos Kutatási Alapprogramok (HU);
Award ID: OTKA K104586
                Funded by: FundRef http://dx.doi.org/10.13039/501100007601, Horizon 2020;
                Award ID: 643476
                Categories
                Methodology Article

                Bioinformatics & Computational biology
Keywords: next generation sequencing, mutagenesis, somatic mutation detection, multiple isogenic samples, low false positive rate, demonstrative algorithm
