Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome

Wenger, Aaron M.; Peluso, Paul; Rowell, William J; Chang, Pi-Chuan; Hall, Richard J; Concepcion, Gregory T.; Ebler, Jana; Fungtammasan, Arkarachai; Kolesnikov, Alexey M.; Olson, Nathan D.; Töpfer, Armin; Alonge, Michael; Mahmoud, Medhat; Qian, Yufeng; Chin, Chen-Shan; Phillippy, Adam M; Schatz, Michael C.; Myers, Gene; DePristo, Mark A; Ruan, Jue; Marschall, Tobias; Sedlazeck, Fritz J; Zook, Justin M.; Li, Heng; Koren, Sergey; Carroll, Andrew; Rank, David R; Hunkapiller, Michael W

doi:10.1038/s41587-019-0217-9

Author(s): Aaron M. Wenger¹, Paul Peluso¹, William J. Rowell¹, Pi-Chuan Chang², Richard J. Hall¹, Gregory T. Concepcion¹, Jana Ebler³,⁴,⁵, Arkarachai Fungtammasan⁶, Alexey Kolesnikov², Nathan D. Olson⁷, Armin Töpfer¹, Michael Alonge⁸, Medhat Mahmoud⁹, Yufeng Qian¹, Chen-Shan Chin⁶, Adam M. Phillippy¹⁰, Michael C. Schatz⁸, Gene Myers¹¹, Mark A. DePristo², Jue Ruan¹², Tobias Marschall³,⁴, Fritz J. Sedlazeck⁹, Justin M. Zook⁷, Heng Li¹³, Sergey Koren¹⁰, Andrew Carroll², David R. Rank¹,*, Michael W. Hunkapiller¹,*

Publication date (Electronic): 12 August 2019

Journal: Nature biotechnology

Abstract

The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the ‘genome in a bottle’ (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
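
The headline figures above rest on a few standard definitions. The sketch below is for illustration only and is not taken from the authors' pipeline: it converts a per-base accuracy to a Phred-scaled quality value, Q = -10 log10(error rate), computes contig N50 (the length L such that contigs of length at least L cover at least half of the assembly), and computes precision and recall from true-positive, false-positive and false-negative counts. The function names, the toy contig lengths and the variant counts are hypothetical; only the 99.8% and 99.997% figures come from the abstract.

```python
import math
from typing import Sequence, Tuple

# Hypothetical helper names; the formulas are the standard metric definitions.


def phred_quality(accuracy: float) -> float:
    """Convert a per-base accuracy (strictly between 0 and 1) to a Phred-scaled quality."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


def n50(lengths: Sequence[int]) -> int:
    """Return the largest length L such that contigs of length >= L cover at least half the total."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half of the total length is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: running reaches total on the last contig")


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


if __name__ == "__main__":
    print(round(phred_quality(0.998), 1))    # 99.8% read accuracy -> 27.0
    print(round(phred_quality(0.99997), 1))  # 99.997% assembly concordance -> 45.2
    print(n50([10, 8, 6, 4, 2]))             # toy contig lengths -> 8
```

Under these assumptions, 99.8% read accuracy corresponds to roughly Q27 and 99.997% assembly concordance to roughly Q45. In the toy N50 example, the two longest contigs (10 + 8 = 18) are the first to cover at least half of the 30-unit total, so N50 is 8.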

Author and article information

Journal

Journal ID (nlm-journal-id): 9604648

Journal ID (pubmed-jr-id): 20305

Journal ID (nlm-ta): Nat Biotechnol

Journal ID (iso-abbrev): Nat. Biotechnol.

Title: Nature biotechnology

ISSN (Print): 1087-0156

ISSN (Electronic): 1546-1696

Publication date (NIHMS submitted): 12 July 2019

Publication date (Print): October 2019

Publication date PMC-release: 12 February 2020

Volume: 37

Issue: 10

Pages: 1155-1162

Affiliations

1. Pacific Biosciences, Menlo Park, CA, USA

2. Google Inc., Mountain View, CA, USA

3. Center for Bioinformatics, Saarland University, Saarbrücken, Germany

4. Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, Saarbrücken, Germany

5. Graduate School of Computer Science, Saarland University, Saarland Informatics Campus E1.3, Saarbrücken, Germany

6. DNAnexus, Mountain View, CA, USA

7. National Institute of Standards and Technology, Gaithersburg, MD, USA

8. Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA

9. Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA

10. Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA

11. Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany

12. Agricultural Genomics Institute, Chinese Academy of Agricultural Sciences, Shenzhen, China

13. Dana-Farber Cancer Institute, Boston, MA, USA

Author notes

† These authors contributed equally to this work.

Author Contributions

A.M.W., D.R.R., M.W.H., and P.P. designed the study. D.R.R. and P.P. developed the sample preparation protocol and performed sample preparation. D.R.R., P.P., and Y.Q. performed sequencing. A.C., A.K., C-S.C., M.A.D., and P.C. adapted the algorithms and implementation of DeepVariant. A.C., A.F., A.K., A.M.P., A.M.W., A.T., C-S.C., D.R.R., F.J.S., G.M., G.T.C., H.L., J.E., J.M.Z., J.R., M.A., M.A.D., M.C.S., M.M., N.D.O., P.C., P.P., R.J.H., S.K., T.M., and W.J.R. performed analysis. A.C., A.M.P., C-S.C., D.R.R., F.J.S., J.M.Z., M.A.D., M.C.S., and M.W.H. supervised analysis. A.C., A.M.W., D.R.R., G.M., J.M.Z., P.P., R.J.H., S.K., and W.J.R. wrote the manuscript. All authors reviewed and approved the final manuscript.

* Address correspondence to M.W.H. (mhunkapiller@pacb.com) or D.R.R. (drank@pacb.com).

Article

Manuscript ID: NIHMS1533949

DOI: 10.1038/s41587-019-0217-9

PMC ID: 6776680

PubMed ID: 31406327

License:

Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms