      FunSPU: A versatile and adaptive multiple functional annotation-based association test of whole-genome sequencing data

      research-article
      1 , 2 , 1 , *
      PLoS Genetics
      Public Library of Science

          Abstract

          Despite ongoing large-scale population-based whole-genome sequencing (WGS) projects such as the NIH NHLBI TOPMed program and the NHGRI Genome Sequencing Program, WGS-based association analysis of complex traits remains a tremendous challenge due to the large number of rare variants, many of which are non-trait-associated neutral variants. External biological knowledge, such as functional annotations based on the ENCODE, Epigenomics Roadmap and GTEx projects, may be helpful in distinguishing causal rare variants from neutral ones; however, each functional annotation captures only certain aspects of biological function. Our ability to select informative annotations a priori is limited, and incorporating non-informative annotations introduces noise and reduces power. We propose FunSPU, a versatile and adaptive test that incorporates multiple biological annotations and is adaptive at both the annotation and variant levels, and thus maintains high power even in the presence of non-informative annotations. In addition to extensive simulations, we illustrate our proposed test using the TWINSUK cohort (n = 1,752) of UK10K WGS data based on six functional annotations: CADD, RegulomeDB, FunSeq, FunSeq2, GERP++, and GenoSkyline. We identified genome-wide significant genetic loci on chromosome 19 near genes TOMM40 and APOC4-APOC2 associated with low-density lipoprotein (LDL), which are replicated in the UK10K ALSPAC cohort (n = 1,497). These replicated LDL-associated loci were missed by existing rare variant association tests that either ignore external biological information or rely on a single source of biological knowledge. We have implemented the proposed test in an R package “FunSPU”.
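The abstract describes adaptivity at two levels without giving formulas. As a rough illustration only, the sketch below assumes an annotation-weighted sum-of-powered-score (SPU) statistic in the style of the adaptive SPU (aSPU) test, T(γ, a) = Σ_j (w_ja U_j)^γ, where U_j is the marginal score for variant j and w_ja is variant j's score under annotation a. The power γ supplies variant-level adaptivity (γ = 1 behaves like a burden test, larger γ emphasizes a few large scores), and taking the minimum p-value over all (γ, a) pairs, calibrated by permutation, supplies annotation-level adaptivity. The function names, the set of powers, the absence of covariates, and the permutation scheme are assumptions made for this sketch; it is not the code in the FunSPU R package.

```python
import numpy as np


def weighted_spu_stats(U, W, gammas):
    """Hypothetical annotation-weighted SPU statistics (illustrative assumption).

    For each power g and annotation column a: T[g, a] = sum_j (W[j, a] * U[j]) ** g.
    U: (p,) score vector; W: (p, A) nonnegative annotation weights.
    Returns a flat array of length len(gammas) * A (ordered by power, then annotation).
    """
    WU = W * U[:, None]  # (p, A): scores re-weighted by each annotation
    return np.concatenate([(WU ** g).sum(axis=0) for g in gammas])


def adaptive_weighted_spu(G, y, W, gammas=(1, 2, 4, 8), n_perm=1000, seed=0):
    """Permutation-calibrated min-p test over all (power, annotation) pairs.

    G: (n, p) genotype matrix (0/1/2); y: (n,) quantitative trait; W: (p, A) weights.
    Returns (per-statistic p-values, adaptive p-value).
    """
    rng = np.random.default_rng(seed)
    yc = y - y.mean()  # score (up to scale) under a no-covariate linear model: U = G^T (y - mean(y))
    T_obs = np.abs(weighted_spu_stats(G.T @ yc, W, gammas))

    # Null replicates: permuting the trait breaks any genotype-trait association.
    T_perm = np.empty((n_perm, T_obs.size))
    for b in range(n_perm):
        T_perm[b] = np.abs(weighted_spu_stats(G.T @ rng.permutation(yc), W, gammas))

    # Permutation p-value of each observed statistic.
    p_obs = (1 + (T_perm >= T_obs).sum(axis=0)) / (n_perm + 1)

    # p-value of each permuted statistic, by ranking it within its own column
    # (each replicate counts itself, so these p-values are at least 1 / n_perm).
    p_perm = np.empty_like(T_perm)
    for k in range(T_obs.size):
        col = np.sort(T_perm[:, k])
        n_ge = n_perm - np.searchsorted(col, T_perm[:, k], side="left")
        p_perm[:, k] = n_ge / n_perm

    # Adaptive step: calibrate the observed minimum p-value against the
    # permutation distribution of the minimum p-value.
    p_adaptive = (1 + (p_perm.min(axis=1) <= p_obs.min()).sum()) / (n_perm + 1)
    return p_obs, p_adaptive


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 500, 20
    G = rng.binomial(2, 0.05, size=(n, p)).astype(float)
    # Hypothetical annotations: unweighted, informative (up-weights variants 0-2), random noise.
    W = np.column_stack([np.ones(p),
                         np.r_[np.ones(3), np.full(p - 3, 0.1)],
                         rng.uniform(size=p)])
    y = 0.8 * G[:, :3].sum(axis=1) + rng.normal(size=n)
    p_obs, p_adaptive = adaptive_weighted_spu(G, y, W, n_perm=500)
    print("per-statistic p-values:", np.round(p_obs, 4))
    print("adaptive p-value:", p_adaptive)
```

In this sketch, the all-ones column of W lets the min-p step fall back on an unweighted SPU statistic when the supplied annotations carry no signal, which is one simple way to retain power when some annotations are non-informative; the published FunSPU test may handle this differently.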

          Author summary

          In recent years, large-scale whole-genome sequencing (WGS) data have been generated, such as those in the UK10K project and the ongoing NIH Trans-Omics for Precision Medicine (TOPMed) WGS program, providing unprecedented opportunities to investigate low-frequency and rare variants in association with complex diseases and traits. However, WGS-based association analysis of complex traits remains a tremendous challenge due to the large number of rare variants, many of which are non-trait-associated neutral variants. External biological knowledge, such as functional annotations based on the ENCODE, Epigenomics Roadmap and GTEx projects, can be helpful in distinguishing causal rare variants from neutral ones; however, each functional annotation captures only certain aspects of biological function. To this end, we propose a versatile and adaptive association test, FunSPU, that exploits multiple sources of biological knowledge in the analysis of WGS data. We illustrate our proposed test using the TWINSUK cohort of UK10K WGS data based on six functional annotations. We identified genome-wide significant genetic loci associated with low-density lipoprotein, which are replicated in the UK10K ALSPAC cohort. These replicated loci were missed by existing rare variant association tests that either ignore external biological information or rely on a single source of biological knowledge.

                Author and article information

                Contributors
                Roles: Formal analysis, Methodology, Software, Writing – original draft
                Roles: Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Supervision, Writing – review & editing
                Role: Editor
                Journal
                PLoS Genetics (PLoS Genet)
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1553-7390 (print), 1553-7404 (electronic)
                Published: 29 April 2019 (issue date: April 2019)
                Volume 15, Issue 4: e1008081
                Affiliations
                [1 ] Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
                [2 ] Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center, Houston, Texas, United States of America
                Case Western Reserve University, UNITED STATES
                Author notes

                The authors have declared that no competing interests exist.

                Author information
                http://orcid.org/0000-0002-6939-0794
                http://orcid.org/0000-0001-7758-6116
                Article
                Manuscript number: PGENETICS-D-18-01379
                DOI: 10.1371/journal.pgen.1008081
                PMCID: PMC6508749
                PMID: 31034468
                a82d1427-c4bf-4b84-82fe-38109fd05ccd
                © 2019 Ma, Wei

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 6 July 2018
                Accepted: 11 March 2019
                Page count
                Figures: 4, Tables: 2, Pages: 21
                Funding
                Funded by: National Heart, Lung, and Blood Institute (US)
                Award ID: R01HL116720
                Funded by: National Cancer Institute (US)
                Award ID: R01CA169122
                Funded by: National Heart, Lung, and Blood Institute (funder ID: http://dx.doi.org/10.13039/100000050)
                Award ID: R21HL126032
                This research was supported by National Institutes of Health grants R01HL116720 (to PW), R01CA169122 (to PW) and R21HL126032 (to PW). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Funding for UK10K was provided by the Wellcome Trust under award WT091310.
                Categories
                Research Article
                Biology and Life Sciences > Genetics > Heredity
                Biology and Life Sciences > Computational Biology > Genome Analysis > Genome Annotation
                Biology and Life Sciences > Genetics > Genomics > Genome Analysis > Genome Annotation
                Medicine and Health Sciences > Pharmacology > Drug Research and Development > Drug Design > Computer-Aided Drug Design
                Biology and Life Sciences > Genetics > Genetic Loci
                Physical Sciences > Mathematics > Discrete Mathematics > Combinatorics > Permutation
                Biology and Life Sciences > Genetics > Genomics > Functional Genomics
                Research and Analysis Methods > Mathematical and Statistical Techniques > Statistical Methods > Test Statistics
                Physical Sciences > Mathematics > Statistics > Statistical Methods > Test Statistics
                Research and Analysis Methods > Research Assessment > Research Errors
                Custom metadata
                vor-update-to-uncorrected-proof: 2019-05-09
                This study makes use of data generated by the UK10K Consortium, derived from samples from the TwinsUK and ALSPAC cohorts. A full list of the investigators who contributed to the generation of the data is available from www.UK10K.org. Data are available from the UK10K Data Access Committee for researchers who meet the criteria for access to confidential data.

                Genetics