
      Identifying causal variants by fine mapping across multiple studies

      research-article


          Abstract

          Increasingly large Genome-Wide Association Studies (GWAS) have yielded numerous variants associated with many complex traits, motivating the development of “fine mapping” methods to identify which of the associated variants are causal. Additionally, GWAS of the same trait for different populations are increasingly available, raising the possibility of refining fine mapping results further by leveraging different linkage disequilibrium (LD) structures across studies. Here, we introduce multiple study causal variants identification in associated regions (MsCAVIAR), a method that extends the popular CAVIAR fine mapping framework to a multiple study setting using a random effects model. MsCAVIAR only requires summary statistics and LD as input, accounts for uncertainty in association statistics using a multivariate normal model, allows for multiple causal variants at a locus, and explicitly models the possibility of different SNP effect sizes in different populations. We demonstrate the efficacy of MsCAVIAR in both a simulation study and a trans-ethnic, trans-biobank fine mapping analysis of High Density Lipoprotein (HDL).
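The computation the abstract describes can be illustrated with a toy sketch. This is not the MsCAVIAR implementation. It assumes the CAVIAR-style marginal model in which, for a causal configuration c, the z-scores of one study follow N(0, Σ + σ² Σ diag(c) Σ), where Σ is that study's LD matrix. As a deliberate simplification, each study's effect sizes are treated as independent given c, which replaces the random effects coupling used by the actual method. All names (`posterior_inclusion`, `sigma2`, `gamma`) and the toy LD matrices and z-scores are hypothetical.

```python
import itertools
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l


def mvn_logpdf_zero_mean(x, cov):
    """Log density of N(0, cov) evaluated at x."""
    n = len(x)
    l = cholesky(cov)
    # Forward substitution solves l y = x, so x' cov^-1 x equals y' y.
    y = []
    for i in range(n):
        y.append((x[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i])
    log_det = 2.0 * sum(math.log(l[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + sum(v * v for v in y))


def config_cov(ld, config, sigma2):
    """Covariance Sigma + sigma2 * Sigma diag(config) Sigma for one study."""
    n = len(ld)
    return [[ld[i][j] + sigma2 * sum(ld[i][k] * config[k] * ld[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def posterior_inclusion(z_by_study, ld_by_study, sigma2=25.0, gamma=0.1, max_causal=2):
    """Per-SNP posterior inclusion probabilities, enumerating small causal sets.

    Assumption (toy simplification): studies are independent given the causal
    configuration, so their log-likelihoods simply add.
    """
    n = len(z_by_study[0])
    log_post = {}
    for k in range(max_causal + 1):
        for idx in itertools.combinations(range(n), k):
            config = [1 if i in idx else 0 for i in range(n)]
            # Independent Bernoulli(gamma) prior on each SNP being causal.
            lp = k * math.log(gamma) + (n - k) * math.log(1.0 - gamma)
            for z, ld in zip(z_by_study, ld_by_study):
                lp += mvn_logpdf_zero_mean(z, config_cov(ld, config, sigma2))
            log_post[tuple(config)] = lp
    top = max(log_post.values())
    weights = {c: math.exp(lp - top) for c, lp in log_post.items()}
    total = sum(weights.values())
    return [sum(w for c, w in weights.items() if c[i]) / total for i in range(n)]


# Hypothetical locus with three SNPs; SNP 0 is causal in both studies.
# SNPs 0 and 1 are tightly linked in study A but only weakly linked in study B.
ld_a = [[1.0, 0.95, 0.1], [0.95, 1.0, 0.1], [0.1, 0.1, 1.0]]
ld_b = [[1.0, 0.30, 0.1], [0.30, 1.0, 0.1], [0.1, 0.1, 1.0]]
z_a = [6.0, 5.7, 0.6]  # noise-free z-scores: 6 times column 0 of ld_a
z_b = [6.0, 1.8, 0.6]  # noise-free z-scores: 6 times column 0 of ld_b

pip_one = posterior_inclusion([z_a], [ld_a])
pip_both = posterior_inclusion([z_a, z_b], [ld_a, ld_b])
print("study A only:", [round(p, 3) for p in pip_one])
print("both studies:", [round(p, 3) for p in pip_both])
```

With these toy inputs, adding the second study, in which the two SNPs are only weakly linked, shifts posterior mass away from the non-causal SNP 1 and toward SNP 0. This is the intuition behind using different LD structures across studies.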

          Author summary

          Genome-Wide Association Studies (GWAS) have successfully identified numerous genetic variants associated with a variety of complex traits in humans. However, most variants that are associated with traits do not actually cause those traits, but rather are correlated with the truly causal variants through Linkage Disequilibrium (LD). This problem is addressed by so-called “fine mapping” methods, which attempt to prioritize putative causal variants for functional follow-up studies. In this work, we propose a new method, MsCAVIAR, which improves fine mapping performance by leveraging data from multiple studies, such as GWAS of the same trait using individuals with different ethnic backgrounds (“trans-ethnic fine mapping”), while taking into account the possibility that causal variants may affect the trait more or less strongly in different studies. We show in simulations that our method reduces the number of variants needed for functional follow-up testing versus other methods, and we also demonstrate the efficacy of MsCAVIAR in a trans-ethnic, trans-biobank fine mapping analysis of High Density Lipoprotein (HDL).


                Author and article information

                Contributors
                Roles: Conceptualization, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing
                Roles: Conceptualization, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing
                Roles: Formal analysis, Software, Visualization, Writing – original draft
                Roles: Formal analysis, Software
                Roles: Conceptualization, Methodology
                Roles: Conceptualization, Funding acquisition, Investigation, Methodology, Supervision, Visualization, Writing – original draft, Writing – review & editing
                Role: Editor
                Journal
                PLoS Genetics (PLoS Genet)
                Publisher: Public Library of Science (San Francisco, CA, USA)
                ISSN: 1553-7390, 1553-7404
                Published: 20 September 2021
                Volume: 17
                Issue: 9
                Article: e1009733
                Affiliations
                [1] Department of Computer Science, University of California, Los Angeles, California, United States
                [2] Bioinformatics Interdepartmental Program, University of California, Los Angeles, California, United States
                [3] Department of Mathematics, University of California, Los Angeles, California, United States
                [4] Department of Human Genetics, University of California, Los Angeles, California, United States
                [5] Department of Computational Medicine, University of California, Los Angeles, California, United States
                [6] Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States
                Helmholtz Zentrum München Deutsches Forschungszentrum für Gesundheit und Umwelt, Germany
                Author notes

                The authors have declared that no competing interests exist.

                Author information
                https://orcid.org/0000-0003-2394-8868
                https://orcid.org/0000-0002-1921-2113
                https://orcid.org/0000-0002-6308-346X
                https://orcid.org/0000-0001-8307-3958
                https://orcid.org/0000-0003-1149-4758
                Article
                Manuscript: PGENETICS-D-20-01186
                DOI: 10.1371/journal.pgen.1009733
                PMCID: PMC8491908
                PMID: 34543273
                3b6c2c22-3b72-49c3-9311-7277c4a384c2
                © 2021 LaPierre et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 31 August 2020
                Accepted: 21 July 2021
                Page count
                Figures: 4, Tables: 0, Pages: 19
                Funding
                Funded by: National Science Foundation (funder-id http://dx.doi.org/10.13039/100000001)
                Award IDs: DGE-1829071, 0513612, 0731455, 0729049, 0916676, 1065276, 1302448, 1320589, 1331176
                Funded by: National Institutes of Health (funder-id http://dx.doi.org/10.13039/100000002)
                Award IDs: T32 EB016640, K25-HL080079, U01-DA024417, P01-HL30568, P01-HL28481, R01-GM083198, R01-ES021801, R01-MH101782, R01-ES022282
                NL would like to acknowledge the support of National Science Foundation grant DGE-1829071 and National Institutes of Health grant T32 EB016640. EE is supported by National Science Foundation grants 0513612, 0731455, 0729049, 0916676, 1065276, 1302448, 1320589 and 1331176, and National Institutes of Health grants K25-HL080079, U01-DA024417, P01-HL30568, P01-HL28481, R01-GM083198, R01-ES021801, R01-MH101782, and R01-ES022282. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences > Genetics > Single Nucleotide Polymorphisms
                Biology and Life Sciences > Computational Biology > Genome Analysis > Genome-Wide Association Studies
                Biology and Life Sciences > Genetics > Genomics > Genome Analysis > Genome-Wide Association Studies
                Biology and Life Sciences > Genetics > Human Genetics > Genome-Wide Association Studies
                Physical Sciences > Mathematics > Probability Theory > Random Variables > Covariance
                Biology and Life Sciences > Genetics > Genetic Loci
                Biology and Life Sciences > Genetics > Heredity > Linkage Disequilibrium
                Biology and Life Sciences > Biochemistry > Proteins > Lipoproteins
                Research and Analysis Methods > Mathematical and Statistical Techniques > Statistical Methods > Metaanalysis
                Physical Sciences > Mathematics > Statistics > Statistical Methods > Metaanalysis
                Biology and Life Sciences > Genetics > Heredity
                Custom metadata
                vor-update-to-uncorrected-proof
                2021-10-05
                MsCAVIAR is free and open source, and the source code is available on GitHub (https://github.com/nlapier2/MsCAVIAR). Code and instructions to replicate our results are also available on GitHub (https://github.com/nlapier2/mscaviar_replication). The UK Biobank HDL Cholesterol dataset can be downloaded from https://broad-ukb-sumstats-us-east-1.s3.amazonaws.com/round2/additive-tsvs/30760_raw.gwas.imputed_v3.both_sexes.tsv.bgz. The Biobank Japan HDL Cholesterol dataset can be downloaded by accessing http://jenger.riken.jp/en/result and clicking the "Download" button next to "High-density-lipoprotein cholesterol (HDL-C) (autosome)". The 1000 Genomes data were downloaded using the following script: https://github.com/gkichaev/PAINTOR_V3.0/blob/master/PAINTOR_Utilities/CalcLD_1KG_VCF.py. Instructions are available at https://github.com/gkichaev/PAINTOR_V3.0/wiki/2a.-Computing-1000-genomes-LD.

                Genetics