      Evaluating the Accuracy of Imputation Methods in a Five-Way Admixed Population

      research-article


          Abstract

          Genotype imputation is a powerful tool for increasing statistical power in an association analysis. Meta-analysis of multiple study datasets also requires a substantial overlap of SNPs for a successful association analysis, which can be achieved by imputation. The quality of imputed datasets depends largely on the software used and on the reference populations chosen. Imputation accuracy with the available reference populations has not been tested for the five-way admixed South African Colored (SAC) population. In this study, imputation results obtained using three freely accessible methods were evaluated for accuracy and quality. We show that the African Genome Resource is the best reference panel for imputation of missing genotypes in samples from the SAC population, implemented via the freely accessible Sanger Imputation Server.
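          The abstract does not state which accuracy metric the study used. As a minimal sketch only, assuming a simple genotype concordance check (known genotypes are hidden, imputed, and then compared with the true calls), the calculation might look like the following. The function name `genotype_concordance` and the 0/1/2 allele-count encoding are illustrative assumptions, not taken from the article.

```python
from typing import Optional, Sequence


def genotype_concordance(
    truth: Sequence[Optional[int]],
    imputed: Sequence[Optional[int]],
) -> float:
    """Fraction of sites where the imputed genotype equals the true genotype.

    Genotypes are assumed to be coded as alternate-allele counts (0, 1 or 2);
    None marks a site with no call, which is skipped. This is a hypothetical
    helper for illustration, not the metric reported in the article.
    """
    if len(truth) != len(imputed):
        raise ValueError("truth and imputed must have the same length")

    # Keep only sites where both a true and an imputed call exist.
    pairs = [(t, i) for t, i in zip(truth, imputed) if t is not None and i is not None]
    if not pairs:
        raise ValueError("no comparable genotype calls")

    matches = sum(1 for t, i in pairs if t == i)
    return matches / len(pairs)


# Example: five masked sites, one of which has no true call.
true_calls = [0, 1, 2, 1, None]
imputed_calls = [0, 1, 1, 1, 2]
concordance = genotype_concordance(true_calls, imputed_calls)
```

          Plain concordance tends to look optimistic for rare variants, so in practice it is often reported alongside a correlation-based or software-reported quality score.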


          Most cited references (20)


          Efficient haplotype matching and storage using the positional Burrows–Wheeler transform (PBWT)

          Motivation: Over the last few years, methods based on suffix arrays using the Burrows–Wheeler Transform have been widely used for DNA sequence read matching and assembly. These provide very fast search algorithms, linear in the search pattern size, on a highly compressible representation of the dataset being searched. Meanwhile, algorithmic development for genotype data has concentrated on statistical methods for phasing and imputation, based on probabilistic matching to hidden Markov model representations of the reference data, which while powerful are much less computationally efficient. Here a theory of haplotype matching using suffix array ideas is developed, which should scale to much larger datasets than those currently handled by genotype algorithms. Results: Given M sequences with N bi-allelic variable sites, an O(NM) algorithm to derive a representation of the data based on positional prefix arrays is given, which is termed the positional Burrows–Wheeler transform (PBWT). On large datasets, run-length encoding of this representation yields output more than a hundred times smaller than gzip achieves on the raw data. Using this representation a method is given to find all maximal haplotype matches within the set in O(NM) time rather than O(NM²) as expected from naive pairwise comparison, and also a fast algorithm, empirically independent of M given sufficient memory for indexes, to find maximal matches between a new sequence and the set. The discussion includes some proposals about how these approaches could be used for imputation and phasing. Availability: http://github.com/richarddurbin/pbwt Contact: richard.durbin@sanger.ac.uk

            Genotype-imputation accuracy across worldwide human populations.

            A current approach to mapping complex-disease-susceptibility loci in genome-wide association (GWA) studies involves leveraging the information in a reference database of dense genotype data. By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and tested for disease association. This imputation strategy has been successful for GWA studies in populations well represented by existing reference panels. We used genotypes at 513,008 autosomal single-nucleotide polymorphism (SNP) loci in 443 unrelated individuals from 29 worldwide populations to evaluate the "portability" of the HapMap reference panels for imputation in studies of diverse populations. When a single HapMap panel was leveraged for imputation of randomly masked genotypes, European populations had the highest imputation accuracy, followed by populations from East Asia, Central and South Asia, the Americas, Oceania, the Middle East, and Africa. For each population, we identified "optimal" mixtures of reference panels that maximized imputation accuracy, and we found that in most populations, mixtures including individuals from at least two HapMap panels produced the highest imputation accuracy. From a separate survey of additional SNPs typed in the same samples, we evaluated imputation accuracy in the scenario in which all genotypes at a given SNP position were unobserved and were imputed on the basis of data from a commercial "SNP chip," again finding that most populations benefited from the use of combinations of two or more HapMap reference panels. Our results can serve as a guide for selecting appropriate reference panels for imputation-based GWA analysis in diverse populations.

              Genome-wide association study of ancestry-specific TB risk in the South African Coloured population.

              The worldwide burden of tuberculosis (TB) remains an enormous problem, and is particularly severe in the admixed South African Coloured (SAC) population residing in the Western Cape. Despite evidence from twin studies suggesting a strong genetic component to TB resistance, only a few loci have been identified to date. In this work, we conduct a genome-wide association study (GWAS), meta-analysis and trans-ethnic fine mapping to attempt the replication of previously identified TB susceptibility loci. Our GWAS results confirm the WT1 chr11 susceptibility locus (rs2057178: odds ratio = 0.62, P = 2.71 × 10⁻⁶) previously identified by Thye et al., but fail to replicate previously identified polymorphisms in the TLR8 gene and locus 18q11.2. Our study demonstrates that the genetic contribution to TB risk varies between continental populations, and illustrates the value of including admixed populations in studies of TB risk and other complex phenotypes. Our evaluation of local ancestry based on the real and simulated data demonstrates that case-only admixture mapping is currently impractical in multi-way admixed populations, such as the SAC, due to spurious deviations in average local ancestry generated by current local ancestry inference methods. This study provides insights into identifying disease genes and ancestry-specific disease risk in multi-way admixed populations.

                Author and article information

                Contributors
                Journal: Frontiers in Genetics (Front. Genet.)
                Publisher: Frontiers Media S.A.
                ISSN: 1664-8021
                Published: 05 February 2019
                Volume: 10
                Article number: 34
                Affiliations
                [1] DST-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
                [2] South African Tuberculosis Bioinformatics Initiative (SATBBI), Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
                Author notes

                Edited by: Nicola Mulder, University of Cape Town, South Africa

                Reviewed by: Peristera Paschou, Purdue University, United States; Pablo Orozco-terWengel, Cardiff University, United Kingdom

                *Correspondence: Haiko Schurz, haiko@sun.ac.za

                Co-first authors

                Co-senior authors

                This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

                Article
                DOI: 10.3389/fgene.2019.00034
                PMCID: PMC6370942
                PMID: 30804980
                ScienceOpen record ID: 6f737843-3b0e-40cc-a76e-1dee3796422c
                Copyright © 2019 Schurz, Müller, van Helden, Tromp, Hoal, Kinnear and Möller.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

                History
                Received: 27 September 2018
                Accepted: 17 January 2019
                Page count
                Figures: 2, Tables: 4, Equations: 0, References: 35, Pages: 9, Words: 0
                Funding
                Funded by: South African Medical Research Council 10.13039/501100001322
                Funded by: National Research Foundation 10.13039/501100001321
                Categories
                Genetics
                Methods

                Genetics
                imputation, accuracy, quality, admixture, 1000 genomes, african, caapa, agr
