
      YaHS: yet another Hi-C scaffolding tool

      brief-report
      Bioinformatics
      Oxford University Press


          Abstract

          Summary

          We present YaHS, a user-friendly command-line tool for constructing chromosome-scale scaffolds from Hi-C data. It runs with a single-line command and requires only two inputs from the user, an assembly file and an alignment file, in formats compatible with similar tools. It provides assembly results in multiple formats, enabling rapid, robust and scalable construction of genome assemblies with high accuracy and contiguity.
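As an illustration of the single-line invocation described in the summary, the following is a minimal sketch that assembles such a command in Python. The abstract only states that an assembly file and an alignment file are required; the positional order of the two files, the `-o` output-prefix flag and the file names `contigs.fa` and `hic-to-contigs.bam` are assumptions here and should be checked against the YaHS documentation before use. The helper `build_yahs_command` is hypothetical and not part of YaHS.

```python
import shlex
from typing import List, Optional


def build_yahs_command(assembly: str, alignment: str,
                       out_prefix: Optional[str] = None) -> List[str]:
    """Build the argument list for a single-line YaHS run.

    Assumptions (not stated in the abstract): the executable is named
    `yahs`, the assembly file precedes the alignment file, and `-o`
    sets the output prefix.
    """
    cmd = ["yahs"]
    if out_prefix is not None:
        cmd += ["-o", out_prefix]  # assumed output-prefix flag
    cmd += [assembly, alignment]   # the two required inputs
    return cmd


# Hypothetical file names, used only to show the shape of the command.
example = build_yahs_command("contigs.fa", "hic-to-contigs.bam")
print(shlex.join(example))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` once the flags have been confirmed; building a list rather than a shell string avoids quoting problems with unusual file paths.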

          Availability and implementation

          YaHS is implemented in C and licensed under the MIT License. The source code, documentation and tutorial are available at https://github.com/sanger-tol/yahs.

          Supplementary information

          Supplementary data are available at Bioinformatics online.


                Author and article information

                Contributors
                Role: Associate Editor
                Journal
                Bioinformatics
                Oxford University Press
                1367-4803
                1367-4811
                January 2023
                16 December 2022
                Volume: 39
                Issue: 1
                Article ID: btac808
                Affiliations
                Department of Genetics, University of Cambridge , Cambridge CB2 3EH, UK
                Wellcome Sanger Institute, Wellcome Genome Campus , Cambridge CB10 1SA, UK
                Author notes
                To whom correspondence should be addressed. rd109@cam.ac.uk
                Author information
                https://orcid.org/0000-0002-1735-2630
                https://orcid.org/0000-0002-2715-4187
                https://orcid.org/0000-0002-9130-1006
                Article
                btac808
                DOI: 10.1093/bioinformatics/btac808
                PMCID: PMC9848053
                PMID: 36525368
                a4be23b5-60d7-4728-a212-bd2ddab27907
                © The Author(s) 2022. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 30 June 2022
                : 09 December 2022
                : 12 December 2022
                : 15 December 2022
                : 27 December 2022
                Page count
                Pages: 3
                Funding
                Funded by: Wellcome, DOI 10.13039/100010269;
                Award ID: 207492
                Award ID: 218328
                Award ID: 220540
                Categories
                Applications Note
                Genome Analysis
                AcademicSubjects/SCI01060

                Bioinformatics & Computational biology
