
      Towards population-scale long-read sequencing

      review-article


          Abstract

          Long-read sequencing technologies have now reached a level of accuracy and yield that allows their application to variant detection at a scale of tens to thousands of samples. Concomitant with the development of new computational tools, the first population-scale studies involving long-read sequencing have emerged over the past 2 years and, given the continuous advancement of the field, many more are likely to follow. In this Review, we survey recent developments in population-scale long-read sequencing, highlight potential challenges of a scaled-up approach and provide guidance regarding experimental design. We provide an overview of current long-read sequencing platforms, variant calling methodologies, and approaches for de novo assembly and reference-based mapping. Furthermore, we summarize strategies for variant validation, genotyping and predicting functional impact, and emphasize the challenges that remain in achieving long-read sequencing at a population scale.

          Abstract

          Long-read sequencing at the population scale presents specific challenges but is becoming increasingly accessible. In this Review, Sedlazeck and colleagues discuss the major platforms and analytical tools, considerations in project design and challenges in scaling long-read sequencing to populations.


                Author and article information

                Contributors
                fritz.sedlazeck@bcm.edu
                Journal
                Nat Rev Genet
                Nature Reviews. Genetics
                Nature Publishing Group UK (London)
                1471-0056
                1471-0064
                28 May 2021
                : 1-16
                Affiliations
                [1] Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium (GRID grid.11486.3a; ISNI 0000 0001 0478 8040)
                [2] Applied and Translational Neurogenomics Group, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium (GRID grid.5284.b; ISNI 0000 0001 0790 3681)
                [3] Department of Biology, Penn State University, Pennsylvania, PA, USA (GRID grid.29857.31; ISNI 0000 0001 2097 4281)
                [4] Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA (GRID grid.39382.33; ISNI 0000 0001 2160 926X)
                Author information
                http://orcid.org/0000-0002-5248-8197
                http://orcid.org/0000-0001-6040-2691
                Article
                Article number: 367
                DOI: 10.1038/s41576-021-00367-3
                PMCID: PMC8161719
                PMID: 34050336
                © Springer Nature Limited 2021

                This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.

                History
                : 20 April 2021
                Categories
                Review Article

                genome informatics, population genetics, sequencing
