
      Genic Intolerance to Functional Variation and the Interpretation of Personal Genomes

      research-article


          Abstract

          A central challenge in interpreting personal genomes is determining which mutations most likely influence disease. Although progress has been made in scoring the functional impact of individual mutations, the characteristics of the genes in which those mutations are found remain largely unexplored. For example, genes known to carry few common functional variants in healthy individuals may be judged more likely to cause certain kinds of disease than genes known to carry many such variants. Until now, however, it has not been possible to develop a quantitative assessment of how well genes tolerate functional genetic variation on a genome-wide scale. Here we describe an effort that uses sequence data from 6503 whole exome sequences made available by the NHLBI Exome Sequencing Project (ESP). Specifically, we develop an intolerance scoring system that assesses whether genes have relatively more or less functional genetic variation than expected based on the apparently neutral variation found in the gene. To illustrate the utility of this intolerance score, we show that genes responsible for Mendelian diseases are significantly more intolerant to functional genetic variation than genes that do not cause any known disease, but with striking variation in intolerance among genes causing different classes of genetic disease. We conclude by showing that use of an intolerance ranking system can aid in interpreting personal genomes and identifying pathogenic mutations.
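The abstract describes the score only in words: does a gene carry more or less functional variation than expected from its apparently neutral variation? As a minimal sketch of one way such a score could be computed, the Python below regresses each gene's functional-variant count on its total variant count and uses the standardized residual as the score. The regression form, the function name `intolerance_scores`, and the toy counts are illustrative assumptions. They are not taken from the abstract and are not ESP data.

```python
from statistics import mean, pstdev


def intolerance_scores(total_variants, functional_variants):
    """Return one standardized regression residual per gene.

    Assumption (not stated in the abstract): the expected number of
    functional variants is modeled as a straight-line function of the
    total number of variants observed in the gene.
    """
    x_bar = mean(total_variants)
    y_bar = mean(functional_variants)

    # Ordinary least-squares fit of y = intercept + slope * x.
    sxx = sum((x - x_bar) ** 2 for x in total_variants)
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(total_variants, functional_variants))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual = observed minus expected functional-variant count.
    residuals = [y - (intercept + slope * x)
                 for x, y in zip(total_variants, functional_variants)]

    # Scale residuals so that scores are comparable across genes.
    scale = pstdev(residuals)
    if scale == 0:
        return [0.0 for _ in residuals]
    return [r / scale for r in residuals]


# Hypothetical toy data: gene -> (total variants, common functional variants).
toy_counts = {
    "GENE_A": (10, 2),
    "GENE_B": (20, 10),
    "GENE_C": (30, 9),
    "GENE_D": (40, 20),
}

totals = [t for t, _ in toy_counts.values()]
functional = [f for _, f in toy_counts.values()]
scores = dict(zip(toy_counts, intolerance_scores(totals, functional)))

# Negative scores mean fewer functional variants than predicted (more
# intolerant); positive scores mean more than predicted (more tolerant).
for gene, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{gene}: {score:+.2f}")
```

Under this sketch, genes are ranked from most negative (least tolerant of functional variation) to most positive. A real analysis would also need to define which variants count as functional and as common, which the abstract leaves unspecified.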

          Author Summary

          This work uses empirical single nucleotide variant data from the NHLBI Exome Sequencing Project to introduce a genome-wide scoring system that ranks human genes in terms of their intolerance to standing functional genetic variation in the human population. It is often inferred that genes carrying relatively fewer or relatively more common functional variants in healthy individuals may be judged respectively more or less likely to cause certain kinds of disease. We show that this intolerance score correlates remarkably well with genes already known to cause Mendelian diseases (P < 10^-26). Equally striking, however, are the differences in the relationship between standing genetic variation and disease-causing genes for different disease types. Considering the disorder classes defined in the human disease network of Goh et al. (2007), we show a nearly opposite pattern for genes linked to developmental disorders and those linked to immunological disorders, with the former being preferentially caused by genes that do not tolerate functional variation and the latter caused by genes with an excess of common functional variation. We conclude by showing that use of an intolerance ranking system can facilitate interpreting personal genomes and identifying high-impact mutations through the gene in which they occur.


                Author and article information

                Contributors
                Role: Editor
                Journal: PLoS Genetics (PLoS Genet)
                Publisher: Public Library of Science (San Francisco, USA)
                ISSN: 1553-7390, 1553-7404
                Published: 22 August 2013
                Volume: 9
                Issue: 8
                Article: e1003709
                Affiliations
                [1] Center for Human Genome Variation, Duke University, School of Medicine, Durham, North Carolina, United States of America
                [2] Departments of Medicine, The University of Melbourne, Austin Health and Royal Melbourne Hospital, Melbourne, Victoria, Australia
                [3] Department of Medicine, Section of Medical Genetics, Duke University, School of Medicine, Durham, North Carolina, United States of America
                [4] Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, United States of America
                Dartmouth College, United States of America
                Author notes

                The authors have declared that no competing interests exist.

                Conceived and designed the experiments: SP ASA DBG. Performed the experiments: SP QW ASA DBG. Analyzed the data: SP ELH ASA DBG. Wrote the paper: SP QW ELH ASA DBG. Collected data for analysis: SP QW.

                Article
                Manuscript: PGENETICS-D-13-00588
                DOI: 10.1371/journal.pgen.1003709
                PMCID: PMC3749936
                PMID: 23990802
                961a291c-1222-4795-ab5a-e31b8e315435
                Copyright © 2013

                This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 4 March 2013
                Accepted: 23 June 2013
                Page count
                Pages: 13
                Funding
                SP is a National Health and Medical Research Council of Australia (NHMRC) (CJ Martin) Early Career Fellow (1035130). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology
                Computational Biology
                Genomics
                Genetics
                Genomics
                Population Biology
                Population Genetics
                Mathematics
                Statistics
                Biostatistics
                Statistical Methods

