18
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: not found
      • Article: not found

      Disease variant prediction with deep generative models of evolutionary data

      Read this article at

      ScienceOpenPublisherPubMed
      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Related collections

          Most cited references41

          • Record: found
          • Abstract: found
          • Article: not found

          Standards and Guidelines for the Interpretation of Sequence Variants: A Joint Consensus Recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology

          The American College of Medical Genetics and Genomics (ACMG) previously developed guidance for the interpretation of sequence variants. 1 In the past decade, sequencing technology has evolved rapidly with the advent of high-throughput next generation sequencing. By adopting and leveraging next generation sequencing, clinical laboratories are now performing an ever increasing catalogue of genetic testing spanning genotyping, single genes, gene panels, exomes, genomes, transcriptomes and epigenetic assays for genetic disorders. By virtue of increased complexity, this paradigm shift in genetic testing has been accompanied by new challenges in sequence interpretation. In this context, the ACMG convened a workgroup in 2013 comprised of representatives from the ACMG, the Association for Molecular Pathology (AMP) and the College of American Pathologists (CAP) to revisit and revise the standards and guidelines for the interpretation of sequence variants. The group consisted of clinical laboratory directors and clinicians. This report represents expert opinion of the workgroup with input from ACMG, AMP and CAP stakeholders. These recommendations primarily apply to the breadth of genetic tests used in clinical laboratories including genotyping, single genes, panels, exomes and genomes. This report recommends the use of specific standard terminology: ‘pathogenic’, ‘likely pathogenic’, ‘uncertain significance’, ‘likely benign’, and ‘benign’ to describe variants identified in Mendelian disorders. Moreover, this recommendation describes a process for classification of variants into these five categories based on criteria using typical types of variant evidence (e.g. population data, computational data, functional data, segregation data, etc.). Because of the increased complexity of analysis and interpretation of clinical genetic testing described in this report, the ACMG strongly recommends that clinical molecular genetic testing should be performed in a CLIA-approved laboratory with results interpreted by a board-certified clinical molecular geneticist or molecular genetic pathologist or equivalent.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: found
            Is Open Access

            The mutational constraint spectrum quantified from variation in 141,456 humans

            Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes 1 . Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              A method and server for predicting damaging missense mutations

              To the Editor: Applications of rapidly advancing sequencing technologies exacerbate the need to interpret individual sequence variants. Sequencing of phenotyped clinical subjects will soon become a method of choice in studies of the genetic causes of Mendelian and complex diseases. New exon capture techniques will direct sequencing efforts towards the most informative and easily interpretable protein-coding fraction of the genome. Thus, the demand for computational predictions of the impact of protein sequence variants will continue to grow. Here we present a new method and the corresponding software tool, PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/), which is different from the early tool PolyPhen1 in the set of predictive features, alignment pipeline, and the method of classification (Fig. 1a). PolyPhen-2 uses eight sequence-based and three structure-based predictive features (Supplementary Table 1) which were selected automatically by an iterative greedy algorithm (Supplementary Methods). Majority of these features involve comparison of a property of the wild-type (ancestral, normal) allele and the corresponding property of the mutant (derived, disease-causing) allele, which together define an amino acid replacement. Most informative features characterize how well the two human alleles fit into the pattern of amino acid replacements within the multiple sequence alignment of homologous proteins, how distant the protein harboring the first deviation from the human wild-type allele is from the human protein, and whether the mutant allele originated at a hypermutable site2. The alignment pipeline selects the set of homologous sequences for the analysis using a clustering algorithm and then constructs and refines their multiple alignment (Supplementary Fig. 1). The functional significance of an allele replacement is predicted from its individual features (Supplementary Figs. 2–4) by Naïve Bayes classifier (Supplementary Methods). We used two pairs of datasets to train and test PolyPhen-2. We compiled the first pair, HumDiv, from all 3,155 damaging alleles with known effects on the molecular function causing human Mendelian diseases, present in the UniProt database, together with 6,321 differences between human proteins and their closely related mammalian homologs, assumed to be non-damaging (Supplementary Methods). The second pair, HumVar3, consists of all the 13,032 human disease-causing mutations from UniProt, together with 8,946 human nsSNPs without annotated involvement in disease, which were treated as non-damaging. We found that PolyPhen-2 performance, as presented by its receiver operating characteristic curves, was consistently superior compared to PolyPhen (Fig. 1b) and it also compared favorably with the three other popular prediction tools4–6 (Fig. 1c). For a false positive rate of 20%, PolyPhen-2 achieves the rate of true positive predictions of 92% and 73% on HumDiv and HumVar, respectively (Supplementary Table 2). One reason for a lower accuracy of predictions on HumVar is that nsSNPs assumed to be non-damaging in HumVar contain a sizable fraction of mildly deleterious alleles. In contrast, most of amino acid replacements assumed non-damaging in HumDiv must be close to selective neutrality. Because alleles that are even mildly but unconditionally deleterious cannot be fixed in the evolving lineage, no method based on comparative sequence analysis is ideal for discriminating between drastically and mildly deleterious mutations, which are assigned to the opposite categories in HumVar. Another reason is that HumDiv uses an extra criterion to avoid possible erroneous annotations of damaging mutations. For a mutation, PolyPhen-2 calculates Naïve Bayes posterior probability that this mutation is damaging and reports estimates of false positive (the chance that the mutation is classified as damaging when it is in fact non-damaging) and true positive (the chance that the mutation is classified as damaging when it is indeed damaging) rates. A mutation is also appraised qualitatively, as benign, possibly damaging, or probably damaging (Supplementary Methods). The user can choose between HumDiv- and HumVar-trained PolyPhen-2. Diagnostics of Mendelian diseases requires distinguishing mutations with drastic effects from all the remaining human variation, including abundant mildly deleterious alleles. Thus, HumVar-trained PolyPhen-2 should be used for this task. In contrast, HumDiv-trained PolyPhen-2 should be used for evaluating rare alleles at loci potentially involved in complex phenotypes, dense mapping of regions identified by genome-wide association studies, and analysis of natural selection from sequence data, where even mildly deleterious alleles must be treated as damaging. Supplementary Material 1
                Bookmark

                Author and article information

                Contributors
                Journal
                Nature
                Nature
                Springer Science and Business Media LLC
                0028-0836
                1476-4687
                October 27 2021
                Article
                10.1038/s41586-021-04043-8
                34707284
                8195db57-967a-4578-a874-d05df4ff37cd
                © 2021

                https://www.springer.com/tdm

                https://www.springer.com/tdm

                History

                Comments

                Comment on this article