
      Systematic single-variant and gene-based association testing of thousands of phenotypes in 394,841 UK Biobank exomes

      research-article
      Cell Genomics
      Elsevier
      rare variant association studies, rare variants, exome sequencing, GWAS, biobanks, PheWAS


          Summary

          Genome-wide association studies have successfully discovered thousands of common variants associated with human diseases and traits, but the landscape of rare variation in human disease has not been explored at scale. Exome-sequencing studies of population biobanks provide an opportunity to systematically evaluate the impact of rare coding variation across a wide range of phenotypes to discover genes and allelic series relevant to human health and disease. Here, we present results from systematic association analyses of 4,529 phenotypes using single-variant and gene-based tests of 394,841 individuals in the UK Biobank with exome-sequence data. We find that the discovery of genetic associations is tightly linked to allele frequency and is correlated with metrics of deleteriousness and natural selection. We highlight biological findings elucidated by these data and release the dataset as a public resource alongside the Genebass browser for rapidly exploring rare-variant association results.

          Highlights

          • Public release of gene-based association statistics for 4,529 diseases and traits

          • Genebass, a browser framework to display rare-variant associations

          • Tight coupling between frequency, natural selection, and power for genetic discovery

          • Biological signal linking SCRIB to white-matter integrity (from MRI)

          Abstract

          Karczewski et al. generated a massive-scale dataset of associations between rare genetic mutations and thousands of diseases and traits and released these data in the Genebass browser. They quantify the influence of natural selection and gene function on association discovery and highlight an association between SCRIB and a brain-imaging trait.

                Author and article information

                Journal: Cell Genomics (Cell Genom), Elsevier
                ISSN: 2666-979X
                Published: 15 August 2022 (online); 14 September 2022 (issue)
                Volume 2, Issue 9, Article 100168
                Affiliations
                [1] Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
                [2] Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
                [3] Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
                [4] Genomics Research Center, AbbVie, North Chicago, IL 60064, USA
                [5] Biogen, Inc., Cambridge, MA 02142, USA
                [6] Worldwide Research Development and Medical, Pfizer, Inc., Cambridge, MA 02139, USA
                [7] Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
                [8] Institute for Molecular Medicine Finland, Helsinki, Finland
                Author notes
                [∗] Corresponding author: konradk@broadinstitute.org
                [∗∗] Corresponding author: bneale@broadinstitute.org
                [9] Present address: Center for Population Genomics, Garvan Institute of Medical Research and UNSW, Sydney, NSW, Australia
                [10] Present address: Murdoch Children’s Research Institute, Parkville, VIC, Australia
                [11] These authors contributed equally
                [12] These authors contributed equally
                [13] Lead contact

                Article
                PII: S2666-979X(22)00110-0
                Article number: 100168
                DOI: 10.1016/j.xgen.2022.100168
                PMCID: PMC9903662
                PMID: 36778668
                © 2022 The Author(s)

                This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

                History
                Received: 8 November 2021
                Revised: 20 March 2022
                Accepted: 16 July 2022
                Categories: Resource
