      Is Open Access

      Transcriptome prediction performance across machine learning models and diverse ancestries

      research-article

          Summary

          Transcriptome prediction methods such as PrediXcan and FUSION have become popular in complex trait mapping. Most transcriptome prediction models have been trained in European populations using methods that make parametric linear assumptions like the elastic net (EN). To potentially further optimize imputation performance of gene expression across global populations, we built transcriptome prediction models using both linear and non-linear machine learning (ML) algorithms and evaluated their performance in comparison to EN. We trained models using genotype and blood monocyte transcriptome data from the Multi-Ethnic Study of Atherosclerosis (MESA) comprising individuals of African, Hispanic, and European ancestries and tested them using genotype and whole-blood transcriptome data from the Modeling the Epidemiology Transition Study (METS) comprising individuals of African ancestries. We show that the prediction performance is highest when the training and testing populations share similar ancestries, regardless of the prediction algorithm used. While EN generally outperformed random forest (RF), support vector regression (SVR), and K nearest neighbor (KNN), we found that RF outperformed EN for some genes, particularly between disparate ancestries, suggesting potential robustness and reduced variability of RF imputation performance across global populations. When applied to a high-density lipoprotein (HDL) phenotype, we show that including RF prediction models in PrediXcan revealed potential gene associations missed by EN models. Therefore, by integrating other ML modeling into PrediXcan and diversifying our training populations to include more global ancestries, we may uncover new genes associated with complex traits.
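
          To make the idea of predicting expression from genotype concrete, the following is a minimal sketch of one of the non-linear methods named above, K nearest neighbor regression, applied to simulated data. It is not the authors' pipeline: the genotype dosages, effect weights, sample sizes, and the choice of Pearson correlation on held-out samples as the performance score are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random


def simulate(n_samples, n_snps, causal_weights, noise_sd, rng):
    """Simulate genotype dosages (0, 1, 2) and one gene's expression level.

    All values are synthetic; `causal_weights` maps SNP index -> effect size.
    """
    genotypes, expression = [], []
    for _ in range(n_samples):
        dosages = [rng.choice((0, 1, 2)) for _ in range(n_snps)]
        signal = sum(w * dosages[i] for i, w in causal_weights.items())
        genotypes.append(dosages)
        expression.append(signal + rng.gauss(0.0, noise_sd))
    return genotypes, expression


def knn_predict(train_x, train_y, query, k):
    """Average the expression of the k training samples closest to `query`."""
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_a * var_b)


# Hypothetical setup: 5 SNPs near a gene, two of which affect expression.
rng = random.Random(0)
weights = {0: 1.0, 3: -0.8}
x, y = simulate(300, 5, weights, 0.5, rng)

# Train on the first 200 simulated individuals, test on the remaining 100.
train_x, train_y = x[:200], y[:200]
test_x, test_y = x[200:], y[200:]

predicted = [knn_predict(train_x, train_y, q, k=10) for q in test_x]
r = pearson(predicted, test_y)
print(f"held-out Pearson r = {r:.2f}")
```

          In this sketch the training and test samples are drawn from the same simulated population. Comparing across ancestries, as the summary describes, would require training and test sets with different allele frequencies and linkage structure, which this toy simulation does not model.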

                Author and article information

                Journal
                101772885
                50110
                HGG Adv
                HGG advances
                2666-2477
                21 April 2021
                5 January 2021
                8 April 2021
                30 April 2021
                Volume 2, Issue 2, 100019
                Affiliations
                [1] Program in Bioinformatics, Loyola University Chicago, Chicago, IL, USA;
                [2] Department of Mathematics and Statistics, Loyola University Chicago, Chicago, IL, USA;
                [3] Institute for Translational Genomics and Population Sciences, The Lundquist Institute and Department of Pediatrics at Harbor-UCLA Medical Center, Torrance, CA, USA;
                [4] Department of Biostatistics, University of Washington, Seattle, WA, USA;
                [5] Fralin Life Sciences Institute, Virginia Tech, Blacksburg, VA, USA;
                [6] Department of Statistics, Virginia Tech, Blacksburg, VA, USA;
                [7] Wake Forest School of Medicine, Winston-Salem, NC, USA;
                [8] Department of Medicine, Duke University School of Medicine, Durham, NC, USA;
                [9] Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA;
                [10] Department of Public Health Sciences, Parkinson School of Health Sciences and Public Health, Loyola University Chicago, Maywood, IL, USA;
                [11] Department of Human Biology, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa;
                [12] Department of Biology, Loyola University Chicago, Chicago, IL, USA;
                [13] Department of Computer Science, Loyola University Chicago, Chicago, IL, USA
                Author notes
                [*] Correspondence: hwheeler1@luc.edu
                Article
                NIHMSID: NIHMS1691699
                DOI: 10.1016/j.xhgg.2020.100019
                PMCID: PMC8087249
                PMID: 33937878
                eb422d7a-0dfa-4342-a6af-5788ee8bbcaf

                This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
