
      Integrating Hi-C links with assembly graphs for chromosome-scale assembly


          Abstract

          Long-read sequencing and novel long-range assays have revolutionized de novo genome assembly by automating the reconstruction of reference-quality genomes. In particular, Hi-C sequencing is becoming an economical method for generating chromosome-scale scaffolds. Despite its increasing popularity, few open-source tools are available. Error rates, particularly for inversions and fusions across chromosomes, remain higher than with alternative scaffolding technologies. We present a novel open-source Hi-C scaffolder that does not require an a priori estimate of chromosome number and minimizes errors by scaffolding with the assistance of an assembly graph. We demonstrate higher accuracy than the state-of-the-art methods across a variety of Hi-C library preparations and input assembly sizes. The Python and C++ code for our method is openly available at https://github.com/machinegun/SALSA.

          Author summary

          Hi-C technology was originally proposed to study the 3D organization of a genome. Recently, it has also been applied to assemble large eukaryotic genomes into chromosome-scale scaffolds. Despite this, there are few open-source methods to generate these assemblies. Existing methods are also prone to small inversion errors caused by noise in the Hi-C data. In this work, we address these challenges and develop a method named SALSA2. SALSA2 uses sequence overlap information from an assembly graph to correct inversion errors and provide accurate chromosome-scale assemblies.
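
          The idea that an assembly graph can override a noisy Hi-C signal when orienting two contigs might be sketched as follows. This is a toy illustration only and is not SALSA2's actual algorithm: the function names, the read-pair tuple layout, and the rule of assigning a read to the begin ("B") or end ("E") half of its contig are all assumptions made for this example.

```python
from collections import Counter


def end_of(pos, length):
    """Label a read position as the begin ('B') or end ('E') half of its contig."""
    return "B" if pos < length / 2 else "E"


def count_end_links(links, lengths):
    """Count Hi-C read pairs between contig ends.

    links: iterable of (contig_a, pos_a, contig_b, pos_b) tuples (hypothetical format).
    lengths: dict mapping contig name to its length.
    Returns a Counter keyed by a sorted pair of (contig, end) labels.
    """
    counts = Counter()
    for a, pos_a, b, pos_b in links:
        if a == b:
            continue  # links within one contig carry no scaffolding signal
        key = tuple(sorted([(a, end_of(pos_a, lengths[a])),
                            (b, end_of(pos_b, lengths[b]))]))
        counts[key] += 1
    return counts


def best_join(a, b, counts, graph_edges=frozenset()):
    """Choose which ends of contigs a and b to join.

    If the assembly graph (graph_edges, a set of sorted end pairs) supports
    any pairing, only those pairings are considered; otherwise the pairing
    with the most Hi-C links wins.
    """
    candidates = [tuple(sorted([(a, ea), (b, eb)])) for ea in "BE" for eb in "BE"]
    supported = [c for c in candidates if c in graph_edges]
    pool = supported or candidates
    return max(pool, key=lambda c: counts[c])


lengths = {"c1": 100, "c2": 100}
# Three links support c1:E-c2:B; four noisy links suggest c1:E-c2:E (an inversion of c2).
links = [("c1", 90, "c2", 10)] * 3 + [("c1", 90, "c2", 95)] * 4
counts = count_end_links(links, lengths)

hic_only = best_join("c1", "c2", counts)
with_graph = best_join("c1", "c2", counts,
                       graph_edges={(("c1", "E"), ("c2", "B"))})
```

          In this toy input the Hi-C counts alone favor the inverted orientation of `c2`, while the hypothetical overlap edge between the end of `c1` and the beginning of `c2` restores the uninverted join.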

          Related collections

          Most cited references (21)


          Assembly algorithms for next-generation sequencing data.

          The emergence of next-generation sequencing platforms led to a resurgence of research in whole-genome shotgun assembly algorithms and software. DNA sequencing data from the Roche 454, Illumina/Solexa, and ABI SOLiD platforms typically present shorter read lengths, higher coverage, and different error profiles compared with Sanger sequencing data. Since 2005, several assembly software packages have been created or revised specifically for de novo assembly of next-generation sequencing data. This review summarizes and compares the published descriptions of packages named SSAKE, SHARCGS, VCAKE, Newbler, Celera Assembler, Euler, Velvet, ABySS, AllPaths, and SOAPdenovo. More generally, it compares the two standard methods known as the de Bruijn graph approach and the overlap/layout/consensus approach to assembly.

            Organization of the mitotic chromosome.

            Mitotic chromosomes are among the most recognizable structures in the cell, yet for over a century their internal organization remains largely unsolved. We applied chromosome conformation capture methods, 5C and Hi-C, across the cell cycle and revealed two distinct three-dimensional folding states of the human genome. We show that the highly compartmentalized and cell type-specific organization described previously for nonsynchronous cells is restricted to interphase. In metaphase, we identified a homogenous folding state that is locus-independent, common to all chromosomes, and consistent among cell types, suggesting a general principle of metaphase chromosome organization. Using polymer simulations, we found that metaphase Hi-C data are inconsistent with classic hierarchical models and are instead best described by a linearly organized longitudinally compressed array of consecutive chromatin loops.

              Paths, trees, and flowers


                Author and article information

                Contributors
                Role: Methodology, Software, Visualization, Writing – original draft, Writing – review & editing
                Role: Formal analysis, Methodology, Writing – original draft, Writing – review & editing
                Role: Data curation, Writing – original draft, Writing – review & editing
                Role: Data curation, Writing – review & editing
                Role: Data curation, Writing – review & editing
                Role: Supervision, Writing – review & editing
                Role: Investigation, Supervision, Writing – original draft, Writing – review & editing
                Role: Investigation, Software, Validation, Writing – original draft, Writing – review & editing
                Role: Editor
                Journal
                PLoS Computational Biology (PLoS Comput Biol)
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1553-734X, 1553-7358
                21 August 2019
                Volume 15, Issue 8 (August 2019), e1007273
                Affiliations
                [1] Department of Computer Science, University of Maryland, College Park, Maryland, United States of America
                [2] Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
                [3] Arima Genomics, San Diego, California, United States of America
                Ottawa University, CANADA
                Author notes

                Sergey Koren has received travel and accommodation expenses to speak at Oxford Nanopore Technologies conferences. Anthony Schmitt and Siddarth Selvaraj are employees of Arima Genomics, a company commercializing Hi-C DNA sequencing technologies.

                Author information
                http://orcid.org/0000-0003-1381-4081
                http://orcid.org/0000-0002-9809-8127
                http://orcid.org/0000-0001-8431-1428
                http://orcid.org/0000-0001-9617-5304
                http://orcid.org/0000-0003-2983-8934
                http://orcid.org/0000-0002-1472-8962
                Article
                Manuscript ID: PCOMPBIOL-D-19-00061
                DOI: 10.1371/journal.pcbi.1007273
                PMCID: PMC6719893
                PMID: 31433799
                d4877977-d1ad-4170-9c04-3c4068e72948

                This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

                History
                Received: 11 January 2019
                Accepted: 18 July 2019
                Page count
                Figures: 8, Tables: 2, Pages: 19
                Funding
                The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. AS and SS were funded by generous support from NHGRI (grant #1R44HG009584). JG and MP were supported by NIH grant R01-AI-100947 to MP. SK, AR, BPW, and AMP were supported by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health. AR was also supported by a grant from the Korean Visiting Scientist Training Award (KVSTA) through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI17C2098). This work utilized the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov).
                Categories
                Research Article
                Computer and Information Sciences > Data Visualization > Infographics > Graphs
                Biology and Life Sciences > Computational Biology > Genome Analysis > Genomic Libraries
                Biology and Life Sciences > Genetics > Genomics > Genome Analysis > Genomic Libraries
                Biology and Life Sciences > Computational Biology > Genome Analysis > Sequence Assembly Tools
                Biology and Life Sciences > Genetics > Genomics > Genome Analysis > Sequence Assembly Tools
                Research and Analysis Methods > Database and Informatics Methods > Bioinformatics > Sequence Analysis > Sequence Alignment
                Biology and Life Sciences > Molecular Biology > Molecular Biology Techniques > Gene Mapping > Chromosome Mapping
                Research and Analysis Methods > Molecular Biology Techniques > Gene Mapping > Chromosome Mapping
                Biology and Life Sciences > Computational Biology > Genomics Statistics
                Biology and Life Sciences > Genetics > Genomics > Genomics Statistics
                Biology and Life Sciences > Molecular Biology > Molecular Biology Techniques > Sequencing Techniques > Genome Sequencing
                Research and Analysis Methods > Molecular Biology Techniques > Sequencing Techniques > Genome Sequencing
                Biology and Life Sciences > Genetics > Genetic Loci
                Custom metadata
                vor-update-to-uncorrected-proof: 2019-09-03
                All relevant data are within the manuscript and its Supporting Information files.

                Quantitative & Systems biology
