      Accuracy and quality assessment of 454 GS-FLX Titanium pyrosequencing

          Abstract

          Background

          The rapid evolution of 454 GS-FLX sequencing technology has not been accompanied by a reassessment of the quality and accuracy of the sequences it produces. Current strategies for decision-making and error correction are based on an initial analysis of experimental sequences from the older GS20 system by Huse et al. (2007). Here we analyze the quality of 454 sequencing data and identify factors contributing to sequencing error, using an extensive dataset of Roche control DNA fragments.

          Results

          We obtained a mean error rate of 1.07% for 454 sequences. More importantly, errors are not randomly distributed: the error rate occasionally rises above 50% at certain positions, and its distribution is linked to several experimental variables. The main factors associated with error are the presence of homopolymers, position in the sequence, sequence length and, for insertion and deletion errors, spatial localization on the PicoTiterPlate (PT plate). These factors can be described by seven variables. No single variable accounts for the distribution of the error rate, but most of the variation is explained by the combination of all seven.
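          As a rough illustration of how a mean error rate and a position-dependent error profile can be computed from reads aligned to known control fragments, the sketch below classifies each alignment column as a match, substitution, insertion or deletion. It assumes pairwise alignments are already available as equal-length gapped strings (with `-` marking a gap) and uses the alignment column index as a stand-in for position in the read. The function names are hypothetical and this is not the authors' pipeline.

```python
from collections import Counter


def classify_column(ref_base: str, read_base: str) -> str:
    """Classify one alignment column; '-' marks a gap."""
    if ref_base == "-" and read_base == "-":
        raise ValueError("column with gaps in both sequences")
    if ref_base == "-":
        return "insertion"      # extra base present in the read
    if read_base == "-":
        return "deletion"       # reference base missing from the read
    if ref_base != read_base:
        return "substitution"
    return "match"


def error_profile(alignments):
    """Return (overall error rate, per-column error rates, error-type counts).

    alignments: iterable of (ref_aligned, read_aligned) strings of equal length.
    Positions are 0-based alignment column indices (a simplification: they are
    not read coordinates once gaps appear).
    """
    errors_at = Counter()   # erroneous columns seen at each position
    columns_at = Counter()  # all columns seen at each position
    types = Counter()       # totals per error type
    for ref_aln, read_aln in alignments:
        if len(ref_aln) != len(read_aln):
            raise ValueError("aligned strings must have equal length")
        for pos, (r, q) in enumerate(zip(ref_aln, read_aln)):
            kind = classify_column(r, q)
            columns_at[pos] += 1
            if kind != "match":
                errors_at[pos] += 1
                types[kind] += 1
    total_cols = sum(columns_at.values())
    overall = sum(errors_at.values()) / total_cols if total_cols else 0.0
    per_pos = {p: errors_at[p] / columns_at[p] for p in sorted(columns_at)}
    return overall, per_pos, types


alignments = [
    ("ACGTTTA", "ACGTT-A"),  # one base missing from the read inside a T homopolymer
    ("ACG-TTA", "ACGATTA"),  # one extra base in the read
    ("ACGTTTA", "ACCTTTA"),  # one substitution
]
overall, per_pos, types = error_profile(alignments)
print(f"overall error rate: {overall:.4f}")
print(dict(types))
```

          In this toy input each read carries exactly one error, so the overall rate is 3/21 and each affected column shows a rate of 1/3. Stratifying the same counts by homopolymer length, read length or plate region would follow the same counting pattern.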

          Conclusions

          The pattern identified here calls for internal controls and error-correcting base callers to be used wherever available (e.g. when sequencing amplicons). For shotgun libraries, using both sequencing primers and deep coverage, combined with random sequencing primer sites, should partly compensate for even high error rates, although it may prove more difficult than previously thought to distinguish low-frequency alleles from errors.
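          One way to see why deep coverage helps, and why it may not fully separate rare alleles from errors, is a simple binomial test: under the null hypothesis that every non-reference read at a site is a sequencing error occurring independently at rate e, ask how many non-reference reads out of n are needed before that explanation becomes implausible. The sketch below assumes independent errors at a single rate, which the results above show to be an oversimplification (per-position rates can exceed 50%), so a site-specific rate estimated from internal controls would be a better input than the 1.07% mean. The function names and the default significance level are hypothetical.

```python
from math import exp, lgamma, log, log1p
from typing import Optional


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summing log-space terms to avoid overflow."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if not 0.0 < p < 1.0:
        raise ValueError("need 0 < p < 1")
    log_n_fact = lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (log_n_fact - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_pmf)
    return min(total, 1.0)  # guard against rounding just above 1


def min_variant_reads(n: int, error_rate: float, alpha: float = 0.001) -> Optional[int]:
    """Smallest count k of non-reference reads (out of n) for which P(X >= k)
    under an errors-only null falls below alpha; None if no such k exists."""
    for k in range(n + 1):
        if binom_upper_tail(k, n, error_rate) < alpha:
            return k
    return None


# Non-reference reads needed at increasing depth, using the 1.07% mean error rate.
for depth in (20, 100, 1000):
    print(depth, min_variant_reads(depth, 0.0107))
```

          The required count grows with depth while the corresponding fraction of reads shrinks, which is why deep coverage helps. If the true error rate at a given position is far above the mean, however, the same test would flag errors as alleles, which is consistent with the caution above about low-frequency alleles.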

                Author and article information

                Journal
                BMC Genomics
                BioMed Central
                ISSN: 1471-2164
                Published: 19 May 2011
                Volume 12, article 245
                Affiliations
                [1] Aix-Marseille Université, CNRS, IRD, UMR 6116 - IMEP, Equipe Evolution Génome Environnement, Centre Saint-Charles, Case 36, 3 place Victor Hugo, 13331 Marseille Cedex 3, France
                [2] Genoscreen, Genomic Platform and R&D, Campus de l'Institut Pasteur, 1 rue du Professeur Calmette, Bâtiment Guérin, 4ème étage, 59000 Lille, France
                [3] Institut National de la Recherche Agronomique, UMR 1301, Equipe BPI, 400 route des Chappes, BP 167, 06903 Sophia-Antipolis Cedex, France
                [4] UMR CBGP (INRA/IRD/Cirad/Montpellier SupAgro), Campus international de Baillarguet, CS 30016, F-34988 Montferrier-sur-Lez cedex, France
                Article identifiers
                Publisher ID: 1471-2164-12-245
                DOI: 10.1186/1471-2164-12-245
                PMCID: PMC3116506
                PMID: 21592414
                ScienceOpen record: d9212b2a-f0e0-4595-90cb-0a652cea5044
                Copyright ©2011 Gilles et al; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 20 July 2010
                Accepted: 19 May 2011
                Categories
                Research Article
                Genetics
