Efficient real-time selective genome sequencing on resource-constrained devices

Abstract

Background

Third-generation nanopore sequencers offer selective sequencing, or “Read Until,” which allows genomic reads to be analyzed in real time and abandoned partway through if they do not belong to a genomic region of “interest.” This selective sequencing opens the door to important applications such as rapid and low-cost genetic tests. For selective sequencing to be effective, the analysis latency should be as low as possible so that unnecessary reads can be rejected as early as possible. However, existing methods that employ a subsequence dynamic time warping (sDTW) algorithm for this problem are so computationally intensive that a massive workstation with dozens of CPU cores still struggles to keep up with the data rate of a mobile phone–sized MinION sequencer.
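To make the sDTW step concrete, the following is a minimal software sketch of subsequence dynamic time warping followed by a threshold-based reject decision. It is an illustration only, not the HARU implementation: the function names `sdtw` and `should_reject`, the absolute-difference cost, and the `threshold` parameter are assumptions introduced here, and a real pipeline would operate on normalized raw signal chunks rather than the toy values shown.

```python
from typing import Sequence, Tuple


def sdtw(query: Sequence[float], reference: Sequence[float]) -> Tuple[float, int]:
    """Return (best_cost, end_index) for the best match of `query`
    anywhere inside `reference` (hypothetical helper, not from HARU)."""
    n = len(reference)
    if not query or n == 0:
        raise ValueError("query and reference must be non-empty")

    # First row: the match may start at any reference position at no extra cost.
    prev = [abs(query[0] - r) for r in reference]

    for q in query[1:]:
        curr = [0.0] * n
        # First column: only a vertical step (same reference sample) is possible.
        curr[0] = prev[0] + abs(q - reference[0])
        for j in range(1, n):
            # Classic DTW recurrence: vertical, diagonal, or horizontal step.
            best_prev = min(prev[j], prev[j - 1], curr[j - 1])
            curr[j] = best_prev + abs(q - reference[j])
        prev = curr

    # Last row: the match may end at any reference position.
    end = min(range(n), key=prev.__getitem__)
    return prev[end], end


def should_reject(query: Sequence[float], reference: Sequence[float],
                  threshold: float) -> bool:
    """Reject a read when its best alignment cost exceeds `threshold`
    (the threshold value is an assumption chosen by the caller)."""
    cost, _ = sdtw(query, reference)
    return cost > threshold


reference = [0, 1, 2, 3, 4, 5, 4, 3]
print(sdtw([2, 3, 4], reference))                # (0, 4)
print(should_reject([10, 10, 10], reference, 0.5))  # True
```

Each query sample updates a full row of the cost matrix, so the work per read grows with the product of query length and reference length, which is why the latency of this step matters for rejecting reads early.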

Results

In this article, we present Hardware Accelerated Read Until (HARU), a resource-efficient hardware–software codesign-based method that exploits a low-cost and portable heterogeneous multiprocessor system-on-chip platform with an on-chip field-programmable gate array (FPGA) to accelerate the sDTW-based Read Until algorithm. Experimental results show that, for a SARS-CoV-2 dataset, HARU on a Xilinx FPGA embedded with a 4-core ARM processor is around 2.5× faster than a highly optimized multithreaded software version (and around 85× faster than the existing unoptimized multithreaded software) running on a sophisticated server with a 36-core Intel Xeon processor. The energy consumption of HARU is 2 orders of magnitude lower than that of the same application executing on the 36-core server.
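The two speedup figures above also imply a ratio between the two software baselines. The short calculation below works this out, assuming both software versions were timed on the same server and dataset; the variable names are illustrative, and because the reported figures are approximate ("around"), the result is approximate too.

```python
# Reported approximate speedups of HARU over each software baseline.
haru_vs_optimized = 2.5     # HARU vs. highly optimized multithreaded software
haru_vs_unoptimized = 85.0  # HARU vs. existing unoptimized multithreaded software

# Implied speedup of the optimized software over the unoptimized software.
optimized_vs_unoptimized = haru_vs_unoptimized / haru_vs_optimized
print(optimized_vs_unoptimized)  # 34.0
```

Under that assumption, software optimization alone accounts for a roughly 34× improvement, with the FPGA offload contributing the remaining factor of about 2.5×.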

Conclusions

HARU demonstrates that nanopore selective sequencing is possible on resource-constrained devices through rigorous hardware–software optimizations. The source code for the HARU sDTW module is available as open source at https://github.com/beebdev/HARU, and an example application that uses HARU is at https://github.com/beebdev/sigfish-haru.

Author and article information

Contributors

Po Jui Shih:

ORCID: https://orcid.org/0000-0001-9088-4409

Hassaan Saadat:

ORCID: https://orcid.org/0000-0003-3691-4130

Sri Parameswaran:

ORCID: https://orcid.org/0000-0003-0435-9080

Hasindu Gamaarachchi:

ORCID: https://orcid.org/0000-0002-9034-9905

Journal

Journal ID (nlm-ta): Gigascience

Journal ID (iso-abbrev): Gigascience

Journal ID (publisher-id): gigascience

Title: GigaScience

Publisher: Oxford University Press

ISSN (Electronic): 2047-217X

Publication date (Electronic): 03 July 2023

Publication date Collection: 2023

Publication date PMC-release: 03 July 2023

Volume: 12

Electronic Location Identifier: giad046

Affiliations

School of Computer Science and Engineering, UNSW Sydney, Sydney, NSW 2052, Australia

School of Electrical Engineering and Telecommunications, UNSW Sydney, Sydney, NSW 2052, Australia

School of Electrical and Information Engineering, University of Sydney, Sydney, NSW 2006, Australia

School of Computer Science and Engineering, UNSW Sydney, Sydney, NSW 2052, Australia

Genomics Pillar, Garvan Institute of Medical Research, Sydney, NSW 2010, Australia

Centre for Population Genomics, Garvan Institute of Medical Research and Murdoch Children’s Research Institute, Sydney 2010, Australia

Author notes

Correspondence address. Po Jui Shih, Computer Science Building (K17), Engineering Rd, UNSW Sydney, Kensington NSW 2052, Australia. E-mail: pojui.shih@unsw.edu.au

Correspondence address. Hasindu Gamaarachchi, Computer Science Building (K17), Engineering Rd, UNSW Sydney, Kensington NSW 2052, Australia. E-mail: h.gamaarachchi@garvan.org.au

Article

Publisher ID: giad046

DOI: 10.1093/gigascience/giad046

PMC ID: 10316692

PubMed ID: 37395631

SO-VID: 1210850a-36e4-4266-9e18-38a5a72d7f05

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 19 November 2022

Date revision received: 11 April 2023

Date accepted: 02 June 2023

Page count

Pages: 16

Funding

Funded by: Australian Research Council, DOI 10.13039/501100000923;

Award ID: DE230100178
