Navigating bottlenecks and trade-offs in genomic data analysis


Abstract

Genome sequencing and analysis allow researchers to decode the functional information hidden in DNA sequences as well as to study cell-to-cell variation within a cell population. Traditionally, the primary bottleneck in genomic analysis pipelines has been the sequencing itself, which has been much more expensive than the computational analyses that follow. However, an important consequence of the continued drive to expand the throughput of sequencing platforms at lower cost is that the analytical pipelines often struggle to keep up with the sheer amount of raw data produced. Computational cost and efficiency have thus become of ever-increasing importance. Recent methodological advances, such as data sketching, accelerators and domain-specific libraries/languages, promise to address these modern computational challenges. However, despite being more efficient, these innovations come with a new set of trade-offs, both expected, such as accuracy versus memory and expense versus time, and more subtle, including the human expertise needed to use non-standard programming interfaces and set up complex infrastructure. In this Review, we discuss how to navigate these new methodological advances and their trade-offs.
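
As an illustration of the accuracy-versus-memory trade-off that the abstract attributes to data sketching, the following is a minimal sketch of one common sketching technique, bottom-s MinHash, used to estimate the Jaccard similarity of two k-mer sets. It is not taken from the Review: the function names, the choice of BLAKE2b as the hash, and the toy sequences are all assumptions made for this example.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer):
    """Deterministic 64-bit hash of a k-mer (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(kmer_set, size):
    """Keep only the `size` smallest hash values of the set."""
    return sorted(hash64(x) for x in kmer_set)[:size]


def exact_jaccard(a, b):
    """Exact Jaccard similarity; needs both full sets in memory."""
    return len(a & b) / len(a | b)


def sketch_jaccard(sk_a, sk_b, size):
    """Estimate Jaccard similarity from two bottom-s sketches alone."""
    # The smallest hashes of the union are recoverable from the two sketches.
    union_bottom = sorted(set(sk_a) | set(sk_b))[:size]
    shared = set(sk_a) & set(sk_b)
    return sum(1 for h in union_bottom if h in shared) / len(union_bottom)


if __name__ == "__main__":
    # Toy sequences, hypothetical and far shorter than real reads or genomes.
    a = kmers("ACGTACGTTGCAGGCTAACGTTAGC", 4)
    b = kmers("ACGTACGATGCAGGCTAACGATAGC", 4)
    print("exact:", exact_jaccard(a, b))
    for size in (4, 8, 1000):
        est = sketch_jaccard(bottom_sketch(a, size), bottom_sketch(b, size), size)
        print("sketch size", size, "estimate:", est)
```

Each sketch stores at most `size` integers regardless of how many k-mers the sequence contains, so memory is fixed in advance. Smaller sketches give noisier estimates, and once `size` is at least the number of distinct k-mers in the union the estimate coincides with the exact value.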

Most cited references 124

The Sequence Alignment/Map format and SAMtools

Heng Li, Bob Handsaker, Alec Wysoker … (2009)

Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net Contact: rd@sanger.ac.uk

Cited 16600 times

Fast gapped-read alignment with Bowtie 2.

Ben Langmead, Steven L Salzberg (2012)

As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

Cited 15742 times

Basic local alignment search tool.

Stephen F Altschul, Warren Gish, Webb Miller … (1990)

A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the analysis of multiple regions of similarity in long DNA sequences. In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.

Cited 10735 times

Author and article information

Contributors

Bonnie Berger

Yun William Yu

Journal

Title: Nature Reviews Genetics

Abbreviated Title: Nat Rev Genet

Publisher: Springer Science and Business Media LLC

ISSN (Print): 1471-0056

ISSN (Electronic): 1471-0064

Publication date Created: April 2023

Publication date (Electronic): December 07 2022

Publication date (Print): April 2023

Volume: 24

Issue: 4

Pages: 235-250

Article

DOI: 10.1038/s41576-022-00551-z

PMC ID: 10204111

PubMed ID: 36476810

SO-VID: 716ad068-d1fa-4041-b982-d9c951e15fa9

License:

https://www.springernature.com/gp/researchers/text-and-data-mining


