
      NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads

      research-article


          Abstract

          Long-read sequencing data, particularly those derived from the Oxford Nanopore sequencing platform, tend to exhibit high error rates. Here, we present NextDenovo, an efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. We apply NextDenovo to assemble 35 diverse human genomes from around the world using Nanopore long-read data. These genomes allow us to identify the landscape of segmental duplication and gene copy number variation in modern human populations. The use of NextDenovo should pave the way for population-scale long-read assembly using Nanopore long-read data.

          Supplementary Information

          The online version contains supplementary material available at 10.1186/s13059-024-03252-4.


                Author and article information

                Contributors
                wangdp@grandomics.com
                wudongdong@mail.kiz.ac.cn
                wangsheng@mail.kiz.ac.cn
                Journal
                Genome Biology (Genome Biol)
                BioMed Central (London)
                ISSN: 1474-7596, 1474-760X
                Published: 26 April 2024
                Volume 25, Article 107
                Affiliations
                [1] GRID grid.512030.5, GrandOmics Biosciences, Beijing, 102206 China
                [2] GRID grid.419010.d, ISNI 0000 0004 1792 7072, Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223 China
                [3] Centro de Investigación de Genética y Biología Molecular (CIGBM), Instituto de Investigación, Facultad de Medicina, Universidad de San Martín de Porres (https://ror.org/03deqdj72), Lima, 15102 Peru
                [4] Institute of Medical Genetics, Cardiff University (https://ror.org/03kk7td41), Heath Park, Cardiff, CF14 4XN UK
                [5] School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University (https://ror.org/017zhmm22), Xi’an, China
                [6] GRID grid.410727.7, ISNI 0000 0001 0526 1937, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120 China
                [7] GRID grid.12981.33, ISNI 0000 0001 2360 039X, State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, #7 Jinsui Road, Tianhe District, Guangzhou, China
                [8] GRID grid.419010.d, ISNI 0000 0004 1792 7072, Kunming Primate Research Center, and National Research Facility for Phenotypic and Genetic Analysis of Model Animals (Primate Facility), National Resource Center for Non-Human Primates, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650107 China
                [9] GRID grid.419010.d, ISNI 0000 0004 1792 7072, Yunnan Key Laboratory of Biodiversity Information, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
                [10] GRID grid.419010.d, ISNI 0000 0004 1792 7072, Kunming Natural History Museum of Zoology, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
                Author information
                http://orcid.org/0000-0003-2611-3559
                Article
                Article ID: 3252
                DOI: 10.1186/s13059-024-03252-4
                PMCID: 11046930
                PMID: 38671502
                © The Author(s) 2024

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                History
                Received: 12 June 2023
                Accepted: 17 April 2024
                Categories
                Software
                Custom metadata
                © BioMed Central Ltd., part of Springer Nature 2024

                Genetics
                long reads, genome assembly, error-correction, human genomes, segmental duplication