ODGI: understanding pangenome graphs

Abstract

Motivation

Pangenome graphs provide a complete representation of the mutual alignment of collections of genomes. These models offer the opportunity to study the entire genomic diversity of a population, including structurally complex regions. Nevertheless, analyzing hundreds of gigabase-scale genomes using pangenome graphs is difficult because existing tools do not support it well. Hence, fast and versatile software is required to ask advanced questions of such data in an efficient way.

Results

We wrote Optimized Dynamic Genome/Graph Implementation (ODGI), a novel suite of tools that implements scalable algorithms and has an efficient in-memory representation of DNA pangenome graphs in the form of variation graphs. ODGI supports pre-built graphs in the Graphical Fragment Assembly format. ODGI includes tools for detecting complex regions, extracting pangenomic loci, removing artifacts, exploratory analysis, manipulation, validation and visualization. Its fast parallel execution facilitates routine pangenomic tasks, as well as pipelines that can quickly answer complex biological questions of gigabase-scale pangenome graphs.
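To make the idea of a variation graph stored in Graphical Fragment Assembly (GFA) format more concrete, the following is a minimal Python sketch that reads GFA version 1 records into a toy in-memory graph of nodes, edges and embedded paths. It is not ODGI's implementation or API. The class `VariationGraph`, the function `parse_gfa` and the example graph are hypothetical names and data introduced only for illustration, and the sketch assumes tab-separated `S`, `L` and `P` records with no segment overlaps.

```python
from dataclasses import dataclass, field


@dataclass
class VariationGraph:
    """Toy variation graph (illustrative only, not ODGI's data structure)."""
    nodes: dict = field(default_factory=dict)  # segment name -> DNA sequence
    edges: set = field(default_factory=set)    # (from, from_orient, to, to_orient)
    paths: dict = field(default_factory=dict)  # path name -> [(segment, orient), ...]

    def path_sequence(self, name):
        """Spell out a path, reverse-complementing steps taken in '-' orientation."""
        complement = str.maketrans("ACGTacgt", "TGCAtgca")
        parts = []
        for segment, orient in self.paths[name]:
            seq = self.nodes[segment]
            if orient == "-":
                seq = seq.translate(complement)[::-1]
            parts.append(seq)
        return "".join(parts)


def parse_gfa(text):
    """Parse the S (segment), L (link) and P (path) records of GFA v1 text."""
    graph = VariationGraph()
    for line in text.splitlines():
        fields = line.split("\t")
        kind = fields[0]
        if kind == "S":
            # S <name> <sequence>
            graph.nodes[fields[1]] = fields[2]
        elif kind == "L":
            # L <from> <from_orient> <to> <to_orient> <overlap>
            graph.edges.add((fields[1], fields[2], fields[3], fields[4]))
        elif kind == "P":
            # P <path_name> <seg1+,seg2-,...> <overlaps>
            steps = [(step[:-1], step[-1]) for step in fields[2].split(",")]
            graph.paths[fields[1]] = steps
        # Other record types (for example the H header) are ignored here.
    return graph


# A made-up graph with one single-base variant: two genomes share
# segments 1 and 4 and differ in the middle (T versus C).
EXAMPLE_GFA = "\n".join([
    "H\tVN:Z:1.0",
    "S\t1\tACG",
    "S\t2\tT",
    "S\t3\tC",
    "S\t4\tGGA",
    "L\t1\t+\t2\t+\t0M",
    "L\t1\t+\t3\t+\t0M",
    "L\t2\t+\t4\t+\t0M",
    "L\t3\t+\t4\t+\t0M",
    "P\tsample_a\t1+,2+,4+\t*",
    "P\tsample_b\t1+,3+,4+\t*",
])

graph = parse_gfa(EXAMPLE_GFA)
for path_name in sorted(graph.paths):
    print(path_name, graph.path_sequence(path_name))
```

Each genome is stored as a path through shared nodes, so sequence common to both samples is kept only once. That is the property that lets a single graph represent many aligned genomes.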

Availability and implementation

ODGI is published as free software under the MIT open source license. Source code can be downloaded from https://github.com/pangenome/odgi and documentation is available at https://odgi.readthedocs.io. ODGI can be installed via Bioconda (https://bioconda.github.io/recipes/odgi/README.html) or GNU Guix (https://github.com/pangenome/odgi/blob/master/guix.scm).

Supplementary information

Supplementary data are available at Bioinformatics online.

Author and article information

Contributors

Andrea Guarracino:

ORCID: https://orcid.org/0000-0001-9744-131X

Simon Heumos:

ORCID: https://orcid.org/0000-0003-3326-817X

Sven Nahnsen:

ORCID: https://orcid.org/0000-0002-4375-0691

Pjotr Prins

Erik Garrison:

ORCID: https://orcid.org/0000-0003-3821-631X

Peter Robinson: Role: Associate Editor

Journal

Journal ID (nlm-ta): Bioinformatics

Journal ID (iso-abbrev): Bioinformatics

Journal ID (publisher-id): bioinformatics

Title: Bioinformatics

Publisher: Oxford University Press

ISSN (Print): 1367-4803

ISSN (Electronic): 1367-4811

Publication date Collection: 01 July 2022

Publication date (Electronic): 13 May 2022

Publication date PMC-release: 13 May 2022

Volume: 38

Issue: 13

Pages: 3319-3326

Affiliations

Genomics Research Centre, Human Technopole, Milan 20157, Italy

Quantitative Biology Center (QBiC), University of Tübingen, Tübingen 72076, Germany

Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen 72076, Germany

Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA

Author notes

The authors wish it to be known that, in their opinion, Andrea Guarracino and Simon Heumos should be regarded as Joint First Authors.

To whom correspondence should be addressed. Email: egarris5@uthsc.edu

Article

Publisher ID: btac308

DOI: 10.1093/bioinformatics/btac308

PMC ID: 9237687

PubMed ID: 35552372

SO-VID: 7b9d4491-0a91-48f3-8b84-e58fe49ced55

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 09 November 2021

Date revision received: 18 March 2022

Date: 23 April 2022

Date: 28 May 2022

Page count

Pages: 8

Funding

Funded by: National Institutes of Health, DOI 10.13039/100000002;

Award ID: NIDA U01DA047638

Funded by: NIGMS, DOI 10.13039/100000057;

Award ID: R01GM123489

Funded by: NSF PPoSS;

Award ID: #2118709

Funded by: Federal Ministry for Economic Affairs and Energy of Germany;

Funded by: BMBF, DOI 10.13039/501100002347;

Funded by: German Network for Bioinformatics Infrastructure, DOI 10.13039/501100018929;

Award ID: 031A537B

Award ID: 031A533A

Award ID: 031A538A

Award ID: 031A533B

Award ID: 031A535A

Award ID: 031A537C

Award ID: 031A534A

Award ID: 031A532B
