1
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Test development, optimization and validation of a WGS pipeline for genetic disorders

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Background

          With advances in massive parallel sequencing (MPS) technology, whole-genome sequencing (WGS) has gradually evolved into the first-tier diagnostic test for genetic disorders. However, deployment practice and pipeline testing for clinical WGS are lacking.

          Methods

          In this study, we introduced a whole WGS pipeline for genetic disorders, which included the entire process from obtaining a sample to clinical reporting. All samples that underwent WGS were constructed using polymerase chain reaction (PCR)-free library preparation protocols and sequenced on the MGISEQ-2000 platform. Bioinformatics pipelines were developed for the simultaneous detection of various types of variants, including single nucleotide variants (SNVs), insertions and deletions (indels), copy number variants (CNVs) and balanced rearrangements, mitochondrial (MT) variants, and other complex variants such as repeat expansion, pseudogenes and absence of heterozygosity (AOH). A semiautomatic pipeline was developed for the interpretation of potential SNVs and CNVs. Forty-five samples (including 14 positive commercially available samples, 23 laboratory-held positive cell lines and 8 clinical cases) with known variants were used to validate the whole pipeline.

          Results

          In this study, a whole WGS pipeline for genetic disorders was developed and optimized. Forty-five samples with known variants (6 with SNVs and Indels, 3 with MT variants, 5 with aneuploidies, 1 with triploidy, 23 with CNVs, 5 with balanced rearrangements, 2 with repeat expansions, 1 with AOHs, and 1 with exon 7–8 deletion of SMN1 gene) validated the effectiveness of our pipeline.

          Conclusions

          This study has been piloted in test development, optimization, and validation of the WGS pipeline for genetic disorders. A set of best practices were recommended using our pipeline, along with a dataset of positive samples for benchmarking.

          Supplementary Information

          The online version contains supplementary material available at 10.1186/s12920-023-01495-x.

          Related collections

          Most cited references52

          • Record: found
          • Abstract: found
          • Article: found
          Is Open Access

          Fast and accurate short read alignment with Burrows–Wheeler transform

          Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ∼10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net Contact: rd@sanger.ac.uk
            Bookmark
            • Record: found
            • Abstract: found
            • Article: found
            Is Open Access

            fastp: an ultra-fast all-in-one FASTQ preprocessor

            Abstract Motivation Quality control and preprocessing of FASTQ files are essential to providing clean data for downstream analysis. Traditionally, a different tool is used for each operation, such as quality control, adapter trimming and quality filtering. These tools are often insufficiently fast as most are developed using high-level programming languages (e.g. Python and Java) and provide limited multi-threading support. Reading and loading data multiple times also renders preprocessing slow and I/O inefficient. Results We developed fastp as an ultra-fast FASTQ preprocessor with useful quality control and data-filtering features. It can perform quality control, adapter trimming, quality filtering, per-read quality pruning and many other operations with a single scan of the FASTQ data. This tool is developed in C++ and has multi-threading support. Based on our evaluation, fastp is 2–5 times faster than other FASTQ preprocessing tools such as Trimmomatic or Cutadapt despite performing far more operations than similar tools. Availability and implementation The open-source code and corresponding instructions are available at https://github.com/OpenGene/fastp.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              PLINK: a tool set for whole-genome association and population-based linkage analyses.

              Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.
                Bookmark

                Author and article information

                Contributors
                sunjun@bgi.com
                kwwgc@163.com
                Journal
                BMC Med Genomics
                BMC Med Genomics
                BMC Medical Genomics
                BioMed Central (London )
                1755-8794
                5 April 2023
                5 April 2023
                2023
                : 16
                : 74
                Affiliations
                [1 ]GRID grid.410726.6, ISNI 0000 0004 1797 8419, College of Life Sciences, , University of Chinese Academy of Sciences, ; Beijing, 100049 China
                [2 ]GRID grid.21155.32, ISNI 0000 0001 2034 1839, Tianjin Medical Laboratory, , BGI-Tianjin, BGI-Shenzhen, ; Tianjin, 300308 China
                [3 ]GRID grid.21155.32, ISNI 0000 0001 2034 1839, BGI-Tianjin, BGI-Shenzhen, ; Tianjin, 300308 China
                [4 ]Department of Paediatrics, Pu’er People’s Hospital, Pu’er, 665000 China
                [5 ]GRID grid.21155.32, ISNI 0000 0001 2034 1839, BGI Genomics, BGI-Shenzhen, ; Shenzhen, 518083 China
                [6 ]GRID grid.5170.3, ISNI 0000 0001 2181 8870, DTU Bioengineering, , Technical University of Denmark, ; 2800 Kongens Lyngby, Denmark
                [7 ]GRID grid.21155.32, ISNI 0000 0001 2034 1839, BGI-Wuhan Clinical Laboratories, BGI-Shenzhen, ; Wuhan, 430074 China
                [8 ]GRID grid.21155.32, ISNI 0000 0001 2034 1839, Clinical Laboratory of BGI Health, , BGI-Shenzhen, ; Shenzhen, 518083 China
                [9 ]Pu’er People’s Hospital, Pu’er, 665000 China
                Article
                1495
                10.1186/s12920-023-01495-x
                10077614
                37020281
                87abda5e-c93b-4667-a498-81dd5ca318f1
                © The Author(s) 2023

                Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                History
                : 8 December 2022
                : 22 March 2023
                Funding
                Funded by: Special Foundation for High-level Talents of Guangdong
                Award ID: 2016TX03R171
                Award Recipient :
                Categories
                Research
                Custom metadata
                © The Author(s) 2023

                Genetics
                whole genome sequencing,genetic disorders,clinical diagnosis,bioinformatics pipelines
                Genetics
                whole genome sequencing, genetic disorders, clinical diagnosis, bioinformatics pipelines

                Comments

                Comment on this article