      Detection of long repeat expansions from PCR-free whole-genome sequence data

      Research article
      Dolzhenko et al.; The US–Venezuela Collaborative Research Group
      Genome Research
      Cold Spring Harbor Laboratory Press

          Abstract

          Identifying large expansions of short tandem repeats (STRs), such as those that cause amyotrophic lateral sclerosis (ALS) and fragile X syndrome, is challenging with short-read whole-genome sequencing (WGS) data. A solution to this problem is an important step toward integrating WGS into precision medicine. We developed a software tool called ExpansionHunter that, using PCR-free WGS short-read data, can genotype repeats at the locus of interest, even when the expanded repeat is longer than the read length. We applied our algorithm to WGS data from 3001 ALS patients who had been tested for the presence of the C9orf72 repeat expansion with repeat-primed PCR (RP-PCR). Compared against these truth data, ExpansionHunter correctly classified all (212/212, 95% CI [0.98, 1.00]) of the expanded samples as either expansions (208) or potential expansions (4). Additionally, 99.9% (2786/2789, 95% CI [0.997, 1.00]) of the wild-type samples were correctly classified as wild type, with the remaining three samples identified as possible expansions. We further applied our algorithm to a set of 152 samples, each carrying one of eight different pathogenic repeat expansions, including those associated with fragile X syndrome, Friedreich's ataxia, and Huntington's disease, and correctly flagged all but one of the known repeat expansions. Thus, ExpansionHunter can accurately detect known pathogenic repeat expansions and gives researchers a tool for identifying new ones.
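
The abstract does not say how a repeat longer than the read length can be sized from short reads. One way this can work, sketched here under simplifying assumptions, is to count in-repeat reads (IRRs: reads made up almost entirely of copies of the repeat unit) and convert that count into a length using the read depth. The function names, the 90% match threshold, and the uniform-coverage model are hypothetical illustrations, not ExpansionHunter's actual implementation.

```python
# Minimal sketch (assumption): classify in-repeat reads (IRRs) and size a long
# repeat from their count under a uniform-coverage model.
# Not ExpansionHunter's actual code; all names here are hypothetical.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_in_repeat_read(read: str, unit: str, min_match: float = 0.9) -> bool:
    """Hypothetical IRR test: True if the read is (almost) all copies of `unit`.

    Every rotation of the unit is tried, because a read can start at any
    offset within the repeat, and the reverse-complemented unit is tried so
    that reads from either strand are recognised.
    """
    read = read.upper()
    if not read:
        return False
    phases = []
    for strand_unit in (unit.upper(), reverse_complement(unit)):
        for shift in range(len(strand_unit)):
            phases.append(strand_unit[shift:] + strand_unit[:shift])
    best = 0.0
    for phase in phases:
        # Fraction of bases that agree with a perfect repeat in this phase.
        matches = sum(base == phase[i % len(phase)] for i, base in enumerate(read))
        best = max(best, matches / len(read))
    return best >= min_match


def estimate_repeat_length_bp(n_irr: int, read_length: int, allele_depth: float) -> float:
    """Hypothetical estimate of repeat length L (bp) from the IRR count.

    Assumes reads start uniformly at rate allele_depth / read_length per base
    on the expanded allele. A read lies wholly inside the repeat only if it
    starts in roughly the first L - read_length positions, so
    E[n_irr] ~= (L - read_length) * allele_depth / read_length.
    """
    if read_length <= 0 or allele_depth <= 0:
        raise ValueError("read_length and allele_depth must be positive")
    return read_length + n_irr * read_length / allele_depth


if __name__ == "__main__":
    unit = "GGGGCC"  # C9orf72 hexanucleotide repeat unit
    reads = [
        "GGGGCC" * 25,      # forward-strand IRR
        "GGCCCC" * 25,      # reverse-strand IRR
        "ACGTACGTAC" * 15,  # non-repeat sequence
    ]
    n_irr = sum(is_in_repeat_read(r, unit) for r in reads)
    print(n_irr, "of", len(reads), "reads look like IRRs")
    # Illustrative numbers only: 40 IRRs, 150 bp reads, 15x depth on the expanded allele.
    length_bp = estimate_repeat_length_bp(40, 150, 15.0)
    print(f"~{length_bp:.0f} bp, ~{length_bp / len(unit):.0f} repeat units")
```

With these illustrative numbers the sketch gives about 550 bp, roughly 92 GGGGCC units. Real data violate the uniform-coverage assumption, so treat this only as an order-of-magnitude illustration of why IRR counts carry length information beyond one read length.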

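The accuracy figures in the abstract are binomial proportions with 95% confidence intervals. The abstract does not state which interval method was used; the sketch below assumes the Wilson score interval with z = 1.96, and the function name `wilson_interval` is hypothetical. It also checks that the two truth-set groups add up to the 3001 patients.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    # Clamp to [0, 1] to absorb floating-point error at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Counts reported in the abstract (C9orf72, RP-PCR as truth).
expanded_called, expanded_total = 212, 212      # 208 expansions + 4 potential expansions
wild_type_called, wild_type_total = 2786, 2789

assert expanded_total + wild_type_total == 3001  # all ALS patients accounted for

sensitivity = expanded_called / expanded_total
specificity = wild_type_called / wild_type_total
sens_lo, sens_hi = wilson_interval(expanded_called, expanded_total)
spec_lo, spec_hi = wilson_interval(wild_type_called, wild_type_total)

print(f"expanded:  {sensitivity:.3f}, 95% CI [{sens_lo:.2f}, {sens_hi:.2f}]")
print(f"wild type: {specificity:.3f}, 95% CI [{spec_lo:.3f}, {spec_hi:.2f}]")
```

Rounded, the Wilson bounds come out at [0.98, 1.00] for the expanded samples and [0.997, 1.00] for the wild-type samples, matching the intervals quoted in the abstract; another method, such as Clopper-Pearson, could give slightly different unrounded bounds.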

                Author and article information

                Journal
                Genome Research (Genome Res)
                Cold Spring Harbor Laboratory Press
                ISSN: 1088-9051, 1549-5469
                November 2017
                Volume 27, Issue 11, Pages 1895-1903
                Affiliations
                [1] Illumina Incorporated, San Diego, California 92122, USA;
                [2] Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht University, 3584 CX Utrecht, The Netherlands;
                [3] Illumina Limited, Chesterford Research Park, Little Chesterford, Nr Saffron Walden, Essex, CB10 1XL, United Kingdom;
                [4] Repositive Limited, Future Business Centre, Cambridge CB4 2HY, United Kingdom;
                [5] Department of Neuroscience, Mayo Clinic, Jacksonville, Florida 32224, USA;
                [6] New York Genome Center, New York, New York 10013, USA;
                [7] SURFsara, 1098 XG Amsterdam, The Netherlands;
                [8] Academic Unit of Neurology, Trinity College Dublin, Trinity Biomedical Sciences Institute, Dublin 2, Republic of Ireland;
                [9] Department of Neurology, Beaumont Hospital, Dublin 9, Republic of Ireland;
                [10] Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London SE5 9RX, United Kingdom;
                [11] Department of Molecular Neuroscience, UCL Institute of Neurology, London WC1N 3BG, United Kingdom;
                [12] University of Southampton, Southampton SO17 1BJ, United Kingdom;
                [13] Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield S10 2HQ, United Kingdom;
                [14] Columbia University, New York, New York 10032, USA;
                [15] Hereditary Disease Foundation, New York, New York 10032, USA;
                [16] The US–Venezuela Collaborative Research Group;
                [17] Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
                Author notes
                [18] These authors contributed equally to this work.

                Article
                NLM journal ID: 9509184
                DOI: 10.1101/gr.225672.117
                PMCID: PMC5668946
                PMID: 28887402
                © 2017 Dolzhenko et al.; Published by Cold Spring Harbor Laboratory Press

                This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 1 June 2017
                Accepted: 28 August 2017
                Page count: 9
                Funding
                Funded by: SURF Cooperative
                Funded by: NIH/NINDS
                Award ID: P01 NS084974
                Award ID: R01 NS080882
                Funded by: Thierry Latran Foundation
                Funded by: Netherlands Organization for Health Research and Development
                Funded by: ALS Foundation Netherlands
                Funded by: MND Association (UK) (Project MinE)
                Funded by: W.M. Keck Foundation
                Funded by: “Finding Genetic Modifiers As Avenues to Developing New Therapeutics”
                Funded by: European Community's Health Seventh Framework Programme
                Award ID: FP7/2007-2013
                Funded by: Horizon 2020 Programme
                Award ID: 633413
                Funded by: ZonMW
                Funded by: ERA Net for Research on Rare Diseases (PYRAMID)
                Funded by: UK, Medical Research Council
                Award ID: MR/L501529/1
                Award ID: ES/L008238/1
                Funded by: Ireland, Health Research Board
                Funded by: Netherlands, ZonMw
                Funded by: National Institute for Health Research (NIHR) Dementia Biomedical Research Unit at South London and Maudsley NHS Foundation Trust and King's College London
                Funded by: UK National DNA Bank for MND Research
                Funded by: MND Association and the Wellcome Trust
                Funded by: Medical Research Council at the Centre for Integrated Genomic Medical Research, University of Manchester
                Category: Method
