YaHS: yet another Hi-C scaffolding tool

Abstract

Summary

We present YaHS, a user-friendly command-line tool for the construction of chromosome-scale scaffolds from Hi-C data. It can be run with a single-line command, requires minimal input from users (an assembly file and an alignment file) that is compatible with similar tools, and provides assembly results in multiple formats, thereby enabling rapid, robust and scalable construction of high-quality genome assemblies with high accuracy and contiguity.

Availability and implementation

YaHS is implemented in C and licensed under the MIT License. The source code, documentation and tutorial are available at https://github.com/sanger-tol/yahs.

Supplementary information

Supplementary data are available at Bioinformatics online.

Author and article information

Contributors

Chenxi Zhou:

ORCID: https://orcid.org/0000-0002-1735-2630

Shane A McCarthy:

ORCID: https://orcid.org/0000-0002-2715-4187

Richard Durbin:

ORCID: https://orcid.org/0000-0002-9130-1006

Can Alkan (Associate Editor)

Journal

Journal ID (nlm-ta): Bioinformatics

Journal ID (iso-abbrev): Bioinformatics

Journal ID (publisher-id): bioinformatics

Title: Bioinformatics

Publisher: Oxford University Press

ISSN (Print): 1367-4803

ISSN (Electronic): 1367-4811

Publication date (Collection): January 2023

Publication date (Electronic): 16 December 2022

Publication date PMC-release: 16 December 2022

Volume: 39

Issue: 1

Electronic Location Identifier: btac808

Affiliations

Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK

Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK

Author notes

To whom correspondence should be addressed: rd109@cam.ac.uk

Article

Publisher ID: btac808

DOI: 10.1093/bioinformatics/btac808

PMC ID: 9848053

PubMed ID: 36525368

SO-VID: a4be23b5-60d7-4728-a212-bd2ddab27907

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 30 June 2022

Date revision received: 09 December 2022

Date: 12 December 2022

Date accepted: 15 December 2022

Date: 27 December 2022

Page count

Pages: 3

Funding

Funded by: Wellcome, DOI 10.13039/100010269

Award ID: 207492

Award ID: 218328

Award ID: 220540