
      YaHS: yet another Hi-C scaffolding tool

      Brief report
      Bioinformatics
      Oxford University Press


          Abstract

          Summary

          We present YaHS, a user-friendly command-line tool for the construction of chromosome-scale scaffolds from Hi-C data. It can be run with a single-line command, requires minimal input from users (an assembly file and an alignment file, in formats compatible with similar tools) and provides assembly results in multiple formats, thereby enabling rapid, robust and scalable construction of high-quality genome assemblies with high accuracy and contiguity.

          Availability and implementation

          YaHS is implemented in C and licensed under the MIT License. The source code, documentation and tutorial are available at https://github.com/sanger-tol/yahs.

          Supplementary information

          Supplementary data are available at Bioinformatics online.


                Author and article information

                Journal: Bioinformatics
                Publisher: Oxford University Press
                ISSN: 1367-4803 (print), 1367-4811 (online)
                Issue date: January 2023
                Published: 16 December 2022
                Volume 39, Issue 1, btac808
                Affiliations
                Department of Genetics, University of Cambridge , Cambridge CB2 3EH, UK
                Wellcome Sanger Institute, Wellcome Genome Campus , Cambridge CB10 1SA, UK
                Author notes
                To whom correspondence should be addressed. Email: rd109@cam.ac.uk
                Author information
                https://orcid.org/0000-0002-1735-2630
                https://orcid.org/0000-0002-2715-4187
                https://orcid.org/0000-0002-9130-1006
                Article identifiers
                Article ID: btac808
                DOI: 10.1093/bioinformatics/btac808
                PMCID: PMC9848053
                PMID: 36525368
                © The Author(s) 2022. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                30 June 2022
                09 December 2022
                12 December 2022
                15 December 2022
                27 December 2022
                Page count: 3
                Funding
                Funded by: Wellcome (funder DOI 10.13039/100010269)
                Award IDs: 207492, 218328, 220540
                Categories
                Applications Note
                Genome Analysis
                AcademicSubjects/SCI01060

                Bioinformatics & Computational biology
