      MAGMA: Generalized Gene-Set Analysis of GWAS Data


          Abstract

          By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn’s Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn’s Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn’s Disease data was found to be considerably faster as well.
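          The two-layer design described in the abstract (gene-level results first, then a regression across genes) can be illustrated with a small sketch. This is not the MAGMA implementation: the function names are hypothetical, the probit transform of gene p-values and the one-sided test on the set indicator are assumptions made for illustration, and the p-value uses a normal approximation to the t distribution. It also ignores covariates and correlations between genes that a full analysis would need to handle.

```python
import math
from statistics import NormalDist, mean


def gene_z_scores(p_values):
    """Map gene p-values to Z-scores with z = Phi^-1(1 - p).

    Assumes every p-value lies strictly between 0 and 1.
    """
    nd = NormalDist()
    return [nd.inv_cdf(1.0 - p) for p in p_values]


def gene_set_regression(z, in_set):
    """Fit z = b0 + b1 * s + e, where s is a 0/1 gene-set indicator.

    Returns (b1, t, p). The p-value is one-sided (b1 > 0) and uses a
    normal approximation, which is only reasonable for many genes.
    """
    n = len(z)
    s = [1.0 if flag else 0.0 for flag in in_set]
    s_bar, z_bar = mean(s), mean(z)
    sxx = sum((si - s_bar) ** 2 for si in s)
    sxz = sum((si - s_bar) * (zi - z_bar) for si, zi in zip(s, z))
    b1 = sxz / sxx                      # slope: mean Z in set minus mean Z outside
    b0 = z_bar - b1 * s_bar             # intercept
    rss = sum((zi - b0 - b1 * si) ** 2 for si, zi in zip(s, z))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    t = b1 / se
    p = 1.0 - NormalDist().cdf(t)
    return b1, t, p


# Toy input (made-up p-values, not data from the study).
gene_p = [0.001, 0.01, 0.02, 0.3, 0.5, 0.7, 0.9, 0.6]
in_set = [True, True, True, False, False, False, False, False]
b1, t, p = gene_set_regression(gene_z_scores(gene_p), in_set)
print(f"b1={b1:.3f} t={t:.3f} p={p:.3g}")
```

          Because the second layer is an ordinary regression, a continuous gene property or several gene sets could be added as further predictors, which is the kind of generalization the abstract refers to.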

          Author Summary

          Gene and gene-set analysis are statistical methods for analysing multiple genetic markers simultaneously to determine their joint effect. These methods can be used when the effects of individual markers are too weak to detect, which is a common problem when studying polygenic traits. Moreover, gene-set analysis can provide additional insight into functional and biological mechanisms underlying the genetic component of a trait. Although a number of methods for gene and gene-set analysis are available, they generally suffer from various statistical issues and can be very time-consuming to run. We have therefore developed a new method called MAGMA to address these issues, and have compared it to a number of existing tools. Our results show that MAGMA detects more associated genes and gene sets than other methods, and is also considerably faster. The way the method is set up also makes it highly flexible. This makes it suitable as a basis for more general statistical analyses aimed at investigating more complex research questions.

          Most cited references

          Permutation-based approaches do not adequately allow for linkage disequilibrium in gene-wide multi-locus association analysis.

          Additional information about risk genes or risk pathways for diseases can be extracted from genome-wide association studies through analyses of groups of markers. The most commonly employed approaches involve combining individual marker data by adding the test statistics, or summing the logarithms of their P-values, and then using permutation testing to derive empirical P-values that allow for the statistical dependence of single-marker tests arising from linkage disequilibrium (LD). In the present study, we use simulated data to show that these approaches fail to reflect the structure of the sampling error, and the effect of this is to give undue weight to correlated markers. We show that the results obtained are internally inconsistent in the presence of strong LD, and are externally inconsistent with the results derived from multi-locus analysis. We also show that the results obtained from regression and multivariate Hotelling T(2) (H-T2) testing, but not those obtained from permutations, are consistent with the theoretically expected distributions, and that the H-T2 test has greater power to detect gene-wide associations in real datasets. Finally, we show that while the results from permutation testing can be made to approximate those from regression and multivariate Hotelling T(2) testing through aggressive LD pruning of markers, this comes at the cost of loss of information. We conclude that when conducting multi-locus analyses of sets of single-nucleotide polymorphisms, regression or multivariate Hotelling T(2) testing, which give equivalent results, are preferable to the other more commonly applied approaches.

            Author and article information

            Contributors
            Role: Editor
            Journal
            PLoS Computational Biology (PLoS Comput Biol)
            Public Library of Science (San Francisco, CA, USA)
            ISSN: 1553-734X, 1553-7358
            Published: 17 April 2015
            Volume: 11
            Issue: 4
            Article: e1004219
            Affiliations
            [1] Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, VU University Amsterdam, Amsterdam, The Netherlands
            [2] Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands
            [3] Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
            [4] Department of Clinical Genetics, VU University Medical Centre Amsterdam, Neuroscience Campus Amsterdam, The Netherlands
            Stanford University, UNITED STATES
            Author notes

            The authors have declared that no competing interests exist.

            Conceived and designed the experiments: CAdL JMM TH DP. Performed the experiments: CAdL. Analyzed the data: CAdL. Wrote the paper: CAdL JMM TH DP.

            Article
            Manuscript: PCOMPBIOL-D-14-01949
            DOI: 10.1371/journal.pcbi.1004219
            PMCID: PMC4401657
            PMID: 25885710
            e9b5ae28-2e85-4032-8d41-270b8d2116aa
            Copyright © 2015

            This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

            History
            Received: 24 October 2014
            Accepted: 2 March 2015
            Page count
            Figures: 4, Tables: 5, Pages: 19
            Funding
            This study was conducted as part of the Complexity project of the Netherlands Scientific Organisation (www.nwo.nl), grant NWO 645-000-003 (DP, TH). Statistical analyses were carried out on the Genetic Cluster Computer (http://www.geneticcluster.org) funded by the Netherlands Scientific Organisation, grant NWO 480-05-003 (DP). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
            Categories
            Research Article
            Custom metadata
            MAGMA software and auxiliary files can be downloaded from http://ctglab.nl/software/magma. Data used in this study can be obtained via the following URLs:
            • WTCCC Crohn’s Disease GWAS data: http://www.wtccc.org.uk
            • MSigDB Canonical pathways: http://www.broadinstitute.org/gsea/msigdb
            • HapMap 3 data: http://hapmap.ncbi.nlm.nih.gov
            • 1,000 Genomes data: http://www.1000genomes.org

            Quantitative & Systems biology