Abstract
Since genetic testing of BRCA1 and BRCA2 was first conducted for research and later introduced into clinical practice, a large number of missense variants have been reported in the literature and deposited in public databases. Polymorphism Phenotyping v2 (PolyPhen-2) and Sorting Intolerant from Tolerant (SIFT) are two widely applied bioinformatics tools used to assess the functional impact of missense variants. A total of 2605 BRCA1 and 4763 BRCA2 variants from the ClinVar database were analysed with PolyPhen-2 and SIFT. When SIFT was evaluated alongside PolyPhen-2 HumDiv and HumVar, it showed the best performance in terms of negative predictive value (NPV) (100%) and sensitivity (100%) for ClinVar-classified benign and pathogenic BRCA1 variants. Both SIFT and PolyPhen-2 HumDiv achieved 100% NPV and 100% sensitivity in predicting the pathogenicity of the BRCA2 variants. The three tested approaches agreed in their predictions for 55.04% and 68.97% of the variants of unknown significance (VUS) for BRCA1 and BRCA2, respectively. The performance of PolyPhen-2 and SIFT in predicting functional impact varied across the two genes. Because prediction outcomes from the two tested algorithms lack high concordance, their usefulness in classifying the pathogenicity of VUS identified through molecular testing of BRCA1 and BRCA2 is limited in the clinical setting.
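The metrics reported in this abstract can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the counts in the usage note are invented, and the concordance helper assumes that the calls of each tool have already been mapped to a shared vocabulary (for example "damaging" and "benign"), a step the abstract does not describe.

```python
from typing import Sequence


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly pathogenic variants that the tool calls damaging."""
    return tp / (tp + fn)


def npv(tn: int, fn: int) -> float:
    """Fraction of benign calls that are truly benign (negative predictive value)."""
    return tn / (tn + fn)


def concordance(calls_by_tool: Sequence[Sequence[str]]) -> float:
    """Fraction of variants for which every tool gives the same call.

    calls_by_tool holds one sequence of calls per tool, all in the same
    variant order and already harmonised to shared labels (an assumption).
    """
    per_variant = list(zip(*calls_by_tool))
    agreeing = sum(1 for calls in per_variant if len(set(calls)) == 1)
    return agreeing / len(per_variant)
```

With zero false negatives both `sensitivity(tp, 0)` and `npv(tn, 0)` equal 1.0 for any positive counts, which is why 100% sensitivity and 100% NPV appear together. Neither metric depends on the number of false positives, so they say nothing about how many benign variants are flagged as damaging.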
To the Editor: Applications of rapidly advancing sequencing technologies exacerbate the need to interpret individual sequence variants. Sequencing of phenotyped clinical subjects will soon become a method of choice in studies of the genetic causes of Mendelian and complex diseases. New exon capture techniques will direct sequencing efforts towards the most informative and easily interpretable protein-coding fraction of the genome. Thus, the demand for computational predictions of the impact of protein sequence variants will continue to grow. Here we present a new method and the corresponding software tool, PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/), which differs from the earlier tool PolyPhen [1] in the set of predictive features, the alignment pipeline, and the method of classification (Fig. 1a). PolyPhen-2 uses eight sequence-based and three structure-based predictive features (Supplementary Table 1), which were selected automatically by an iterative greedy algorithm (Supplementary Methods). The majority of these features involve comparison of a property of the wild-type (ancestral, normal) allele with the corresponding property of the mutant (derived, disease-causing) allele, which together define an amino acid replacement. The most informative features characterize how well the two human alleles fit into the pattern of amino acid replacements within the multiple sequence alignment of homologous proteins, how distant the protein harboring the first deviation from the human wild-type allele is from the human protein, and whether the mutant allele originated at a hypermutable site [2]. The alignment pipeline selects the set of homologous sequences for the analysis using a clustering algorithm and then constructs and refines their multiple alignment (Supplementary Fig. 1). The functional significance of an allele replacement is predicted from its individual features (Supplementary Figs. 2–4) by a Naïve Bayes classifier (Supplementary Methods).
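As a rough illustration of the classification step, the Python sketch below combines per-feature likelihoods into a Naïve Bayes posterior probability that a replacement is damaging. It is not the PolyPhen-2 implementation: the function name, the prior and the likelihood values are hypothetical, and the sketch assumes each feature has already been reduced to a pair of conditional probabilities.

```python
import math
from typing import Sequence, Tuple


def naive_bayes_posterior(
    prior_damaging: float,
    likelihoods: Sequence[Tuple[float, float]],
) -> float:
    """Posterior P(damaging | features) under the naive independence assumption.

    likelihoods holds one pair per feature:
    (P(observed value | damaging), P(observed value | non-damaging)).
    All probabilities must lie strictly between 0 and 1.
    """
    # Accumulate in log space so that many small factors do not underflow.
    log_damaging = math.log(prior_damaging)
    log_neutral = math.log(1.0 - prior_damaging)
    for p_given_damaging, p_given_neutral in likelihoods:
        log_damaging += math.log(p_given_damaging)
        log_neutral += math.log(p_given_neutral)
    # Normalise the two class scores into a probability.
    shift = max(log_damaging, log_neutral)
    damaging = math.exp(log_damaging - shift)
    neutral = math.exp(log_neutral - shift)
    return damaging / (damaging + neutral)
```

For example, with an even prior and one feature whose observed value is four times more likely under the damaging class, `naive_bayes_posterior(0.5, [(0.8, 0.2)])` gives a posterior of about 0.8.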
We used two pairs of datasets to train and test PolyPhen-2. We compiled the first pair, HumDiv, from all 3,155 damaging alleles with known effects on the molecular function causing human Mendelian diseases, present in the UniProt database, together with 6,321 differences between human proteins and their closely related mammalian homologs, assumed to be non-damaging (Supplementary Methods). The second pair, HumVar [3], consists of all the 13,032 human disease-causing mutations from UniProt, together with 8,946 human nsSNPs without annotated involvement in disease, which were treated as non-damaging. We found that PolyPhen-2 performance, as represented by its receiver operating characteristic curves, was consistently superior to that of PolyPhen (Fig. 1b) and also compared favorably with three other popular prediction tools [4–6] (Fig. 1c). For a false positive rate of 20%, PolyPhen-2 achieves true positive rates of 92% and 73% on HumDiv and HumVar, respectively (Supplementary Table 2). One reason for the lower accuracy of predictions on HumVar is that nsSNPs assumed to be non-damaging in HumVar contain a sizable fraction of mildly deleterious alleles. In contrast, most of the amino acid replacements assumed non-damaging in HumDiv must be close to selective neutrality. Because alleles that are even mildly but unconditionally deleterious cannot be fixed in the evolving lineage, no method based on comparative sequence analysis is ideal for discriminating between drastically and mildly deleterious mutations, which are assigned to the opposite categories in HumVar. Another reason is that HumDiv uses an extra criterion to avoid possible erroneous annotations of damaging mutations.
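Reading a true positive rate off a receiver operating characteristic curve at a fixed false positive rate can be sketched as follows. This is a minimal Python illustration, not the evaluation code behind the figures quoted here: the function name and the toy scores are hypothetical, and it assumes that a higher score means "more likely damaging".

```python
from typing import Sequence


def tpr_at_fpr(
    scores_damaging: Sequence[float],
    scores_neutral: Sequence[float],
    max_fpr: float = 0.2,
) -> float:
    """Highest true positive rate reachable without exceeding max_fpr."""
    best_tpr = 0.0
    # Lowering the threshold can only raise both rates, so scan from the
    # highest observed score downwards and stop once the FPR budget is spent.
    thresholds = sorted(set(scores_damaging) | set(scores_neutral), reverse=True)
    for threshold in thresholds:
        fpr = sum(s >= threshold for s in scores_neutral) / len(scores_neutral)
        if fpr > max_fpr:
            break
        best_tpr = sum(s >= threshold for s in scores_damaging) / len(scores_damaging)
    return best_tpr
```

On real data the two score lists would come from running the classifier on the damaging and non-damaging halves of a test set such as HumDiv or HumVar.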
For a mutation, PolyPhen-2 calculates the Naïve Bayes posterior probability that this mutation is damaging and reports estimates of the false positive rate (the chance that the mutation is classified as damaging when it is in fact non-damaging) and the true positive rate (the chance that the mutation is classified as damaging when it is indeed damaging). A mutation is also appraised qualitatively, as benign, possibly damaging, or probably damaging (Supplementary Methods). The user can choose between HumDiv- and HumVar-trained PolyPhen-2. Diagnosis of Mendelian diseases requires distinguishing mutations with drastic effects from all the remaining human variation, including abundant mildly deleterious alleles. Thus, HumVar-trained PolyPhen-2 should be used for this task. In contrast, HumDiv-trained PolyPhen-2 should be used for evaluating rare alleles at loci potentially involved in complex phenotypes, dense mapping of regions identified by genome-wide association studies, and analysis of natural selection from sequence data, where even mildly deleterious alleles must be treated as damaging.
ClinVar (http://www.ncbi.nlm.nih.gov/clinvar/) provides a freely available archive of reports of relationships among medically important variants and phenotypes. ClinVar accessions submissions reporting human variation, interpretations of the relationship of that variation to human health and the evidence supporting each interpretation. The database is tightly coupled with dbSNP and dbVar, which maintain information about the location of variation on human assemblies. ClinVar is also based on the phenotypic descriptions maintained in MedGen (http://www.ncbi.nlm.nih.gov/medgen). Each ClinVar record represents the submitter, the variation and the phenotype, i.e. the unit that is assigned an accession of the format SCV000000000.0. The submitter can update the submission at any time, in which case a new version is assigned. To facilitate evaluation of the medical importance of each variant, ClinVar aggregates submissions with the same variation/phenotype combination, adds value from other NCBI databases, assigns a distinct accession of the format RCV000000000.0 and reports if there are conflicting clinical interpretations. Data in ClinVar are available in multiple formats, including HTML and downloads as XML, VCF or tab-delimited subsets. Data from ClinVar are provided as annotation tracks on genomic RefSeqs and are used in tools such as Variation Reporter (http://www.ncbi.nlm.nih.gov/variation/tools/reporter), which reports what is known about variation based on user-supplied locations.
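The two accession formats described above, SCV000000000.0 for individual submissions and RCV000000000.0 for aggregated records, can be split into their parts with a small parser. The Python sketch below is illustrative only: the names are hypothetical, and the pattern assumes exactly nine digits followed by a numeric version, as in the example formats given in the text.

```python
import re
from typing import NamedTuple, Optional

# Assumption: prefix, nine digits, a dot, then a numeric version.
_ACCESSION = re.compile(r"^(SCV|RCV)(\d{9})\.(\d+)$")


class Accession(NamedTuple):
    kind: str      # "SCV" (single submission) or "RCV" (aggregated record)
    number: int    # numeric part of the accession
    version: int   # version number after the dot


def parse_accession(text: str) -> Optional[Accession]:
    """Return the parsed accession, or None if the text does not match."""
    match = _ACCESSION.match(text.strip())
    if match is None:
        return None
    return Accession(match.group(1), int(match.group(2)), int(match.group(3)))
```

Keeping the version as a separate field makes it straightforward to notice when a submitter has updated a record, since only the version changes.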
Genetic testing of cancer susceptibility genes is now widely applied in clinical practice to predict risk of developing cancer. In general, sequence-based testing of germline DNA is used to determine whether an individual carries a change that is clearly likely to disrupt normal gene function. Genetic testing may detect changes that are clearly pathogenic, clearly neutral, or variants of unclear clinical significance. Such variants present a considerable challenge to the diagnostic laboratory and the receiving clinician in terms of interpretation and clear presentation of the implications of the result to the patient. There does not appear to be a consistent approach to interpreting and reporting the clinical significance of variants either among genes or among laboratories. The potential for confusion among clinicians and patients is considerable and misinterpretation may lead to inappropriate clinical consequences. In this article we review the current state of sequence-based genetic testing, describe other standardized reporting systems used in oncology, and propose a standardized classification system for application to sequence-based results for cancer predisposition genes. We suggest a system of five classes of variants based on the degree of likelihood of pathogenicity. Each class is associated with specific recommendations for clinical management of at-risk relatives that will depend on the syndrome. We propose that panels of experts on each cancer predisposition syndrome facilitate the classification scheme and designate appropriate surveillance and cancer management guidelines. The international adoption of a standardized reporting system should improve the clinical utility of sequence-based genetic tests to predict cancer risk. (c) 2008 Wiley-Liss, Inc.
GRID grid.412106.0, ISNI 0000 0004 0621 9599, Department of Laboratory Medicine, National University Hospital; NUH Main Building, 5 Lower Kent Ridge Road, Singapore, 119074 Singapore
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License,
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons licence, and indicate if changes were
made. The images or other third party material in this article are included in the
article's Creative Commons licence, unless indicated otherwise in a credit line to
the material. If material is not included in the article's Creative Commons licence
and your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder. To view
a copy of this licence, visit
http://creativecommons.org/licenses/by/4.0/.