Test development, optimization and validation of a WGS pipeline for genetic disorders

Abstract

Background

With advances in massively parallel sequencing (MPS) technology, whole-genome sequencing (WGS) has gradually evolved into a first-tier diagnostic test for genetic disorders. However, deployment practices and pipeline testing for clinical WGS are still lacking.

Methods

In this study, we introduce a complete WGS pipeline for genetic disorders that covers the entire process from sample receipt to clinical reporting. Libraries for all samples that underwent WGS were constructed using polymerase chain reaction (PCR)-free library preparation protocols and sequenced on the MGISEQ-2000 platform. Bioinformatics pipelines were developed for the simultaneous detection of multiple variant types, including single nucleotide variants (SNVs), insertions and deletions (indels), copy number variants (CNVs), balanced rearrangements, mitochondrial (MT) variants, and other complex variants such as repeat expansions, pseudogenes and absence of heterozygosity (AOH). A semiautomatic pipeline was developed for the interpretation of potential SNVs and CNVs. Forty-five samples with known variants (14 commercially available positive samples, 23 laboratory-held positive cell lines and 8 clinical cases) were used to validate the whole pipeline.

Results

We developed and optimized a complete WGS pipeline for genetic disorders. Its effectiveness was validated using forty-five samples with known variants (6 with SNVs and indels, 3 with MT variants, 5 with aneuploidies, 1 with triploidy, 23 with CNVs, 5 with balanced rearrangements, 2 with repeat expansions, 1 with AOH, and 1 with a deletion of exons 7–8 of the SMN1 gene).

Conclusions

This study serves as a pilot for the test development, optimization, and validation of a WGS pipeline for genetic disorders. We recommend a set of best practices based on our pipeline and provide a dataset of positive samples for benchmarking.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12920-023-01495-x.

Author and article information

Contributors

Jun Sun: sunjun@bgi.com

Wei Chen: kwwgc@163.com

Journal

Journal ID (nlm-ta): BMC Med Genomics

Journal ID (iso-abbrev): BMC Med Genomics

Title: BMC Medical Genomics

Publisher: BioMed Central (London)

ISSN (Electronic): 1755-8794

Publication date (Electronic): 5 April 2023

Publication date PMC-release: 5 April 2023

Publication date Collection: 2023

Volume: 16

Electronic Location Identifier: 74

Affiliations

[1] GRID grid.410726.6, ISNI 0000 0004 1797 8419, College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049 China

[2] GRID grid.21155.32, ISNI 0000 0001 2034 1839, Tianjin Medical Laboratory, BGI-Tianjin, BGI-Shenzhen, Tianjin, 300308 China

[3] GRID grid.21155.32, ISNI 0000 0001 2034 1839, BGI-Tianjin, BGI-Shenzhen, Tianjin, 300308 China

[4] Department of Paediatrics, Pu’er People’s Hospital, Pu’er, 665000 China

[5] GRID grid.21155.32, ISNI 0000 0001 2034 1839, BGI Genomics, BGI-Shenzhen, Shenzhen, 518083 China

[6] GRID grid.5170.3, ISNI 0000 0001 2181 8870, DTU Bioengineering, Technical University of Denmark, 2800 Kongens Lyngby, Denmark

[7] GRID grid.21155.32, ISNI 0000 0001 2034 1839, BGI-Wuhan Clinical Laboratories, BGI-Shenzhen, Wuhan, 430074 China

[8] GRID grid.21155.32, ISNI 0000 0001 2034 1839, Clinical Laboratory of BGI Health, BGI-Shenzhen, Shenzhen, 518083 China

[9] Pu’er People’s Hospital, Pu’er, 665000 China

Article

Publisher ID: 1495

DOI: 10.1186/s12920-023-01495-x

PMC ID: 10077614

PubMed ID: 37020281

SO-VID: 87abda5e-c93b-4667-a498-81dd5ca318f1

License:

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

History

Date received: 8 December 2022

Date accepted: 22 March 2023

Funding

Funded by: Special Foundation for High-level Talents of Guangdong

Award ID: 2016TX03R171

Award Recipient: Zhiyu Peng

Custom metadata

ScienceOpen disciplines: Genetics

Keywords: whole genome sequencing, genetic disorders, clinical diagnosis, bioinformatics pipelines
