      Modeling of African population history using f-statistics can be highly biased and is not addressed by previously suggested SNP ascertainment schemes

      Preprint
      research-article


          Abstract

          f-statistics have emerged as a first line of analysis for making inferences about demographic history from genome-wide data. These statistics can provide strong evidence for either admixture or cladality, which can be robust to substantial rates of errors or missing data. Under “SNP ascertainment” (analyzing non-randomly chosen subsets of single nucleotide polymorphisms), f-statistics are guaranteed to be unbiased only if the ascertainment relies on a population that is an outgroup for all groups analyzed. However, ascertainment on a true outgroup that is not co-analyzed with other populations is often impractical and uncommon in the literature. In this study, focused on practical rather than theoretical aspects of SNP ascertainment, we show that many non-outgroup ascertainment schemes lead to false rejection of true demographic histories, as well as to failure to reject incorrect models. The bias introduced by common ascertainments such as the 1240K panel is, however, mostly limited to situations in which more than one sub-Saharan African and/or archaic human group (Neanderthals and Denisovans) or non-human outgroups are co-modelled, for example, f4-statistics involving one non-African group, two African groups, and one archaic group. Analyzing panels of SNPs polymorphic in archaic humans, which has been suggested as a solution for the ascertainment problem, cannot fix all these problems: for some classes of f-statistics it is not a clean outgroup ascertainment, and in other cases it has relatively low power to reject incorrect demographic models because it provides a relatively small number of variants common in anatomically modern humans. In addition, due to the paucity of high-coverage archaic genomes, archaic individuals used for ascertainment often act as sole representatives of the respective groups in an analysis, and we show that this approach is highly problematic.
By carrying out large numbers of simulations of diverse demographic histories, we find that bias in inferences based on f-statistics introduced by non-outgroup ascertainment can be minimized if the derived allele frequency spectrum in the population used for ascertainment approaches the spectrum that existed at the root of all groups being co-analyzed. Ascertaining on sites with variants common in a diverse group of African individuals provides a good approximation to such a set of SNPs, addressing the great majority of biases and also retaining high statistical power for studying population history. Such a “pan-African” ascertainment, although not completely problem-free, allows unbiased exploration of demographic models for the widest set of archaic and modern human populations, as compared to the other ascertainment schemes we explored.
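
As a rough illustration of the quantities discussed in the abstract, the sketch below estimates an f4-statistic as the average over SNPs of (a - b)(c - d), where a, b, c, d are allele frequencies in four populations, and then recomputes it on the subset of sites that are polymorphic in one chosen population. The function names (`f4`, `ascertain`, `f4_on_panel`), the population labels, and the frequencies are all hypothetical toy values chosen for illustration; they are not taken from the paper, and the preprint's own analyses rely on large-scale simulations rather than anything this simple.

```python
from statistics import mean


def f4(p_a, p_b, p_c, p_d):
    """Estimate f4(A, B; C, D) as the mean over SNPs of (a - b) * (c - d)."""
    return mean((a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d))


def ascertain(freqs, panel_pop):
    """Return indices of SNPs that are polymorphic (0 < p < 1) in panel_pop."""
    return [i for i, p in enumerate(freqs[panel_pop]) if 0.0 < p < 1.0]


def f4_on_panel(freqs, pops, keep):
    """Compute f4 for the four populations in pops using only the SNPs in keep."""
    a, b, c, d = ([freqs[pop][i] for i in keep] for pop in pops)
    return f4(a, b, c, d)


# Hypothetical allele frequencies at four SNPs (invented toy data).
freqs = {
    "A": [0.1, 0.5, 0.9, 0.3],
    "B": [0.2, 0.5, 0.7, 0.3],
    "C": [0.0, 0.4, 1.0, 0.6],
    "D": [0.0, 0.6, 0.5, 0.6],
}
pops = ("A", "B", "C", "D")

all_snps = list(range(len(freqs["A"])))
f4_all = f4_on_panel(freqs, pops, all_snps)

# Ascertain on population C, which is one of the co-analyzed groups
# (that is, a non-outgroup ascertainment in the abstract's terminology).
panel = ascertain(freqs, "C")
f4_panel = f4_on_panel(freqs, pops, panel)

print(f4_all, f4_panel)
```

In this toy data set the estimate is about 0.025 on all four sites and 0 on the two sites that are polymorphic in C, which shows only that the value of an f-statistic can depend on which sites are retained. Whether a given scheme biases inference in practice is the question the abstract addresses with simulations.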

                Author and article information

                Journal
                bioRxiv
                Cold Spring Harbor Laboratory
                22 January 2023
                : 2023.01.22.525077
                Affiliations
                [1] Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
                [2] Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
                [3] Kalmyk Research Center of the Russian Academy of Sciences, Elista, Russia
                [4] Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
                [5] Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
                [6] Broad Institute of Harvard and MIT, Cambridge, MA, USA
                Author notes
                [†] Present addresses: E.Y., Department of Biological Sciences, Middle East Technical U., Ankara, Turkey; U.I., Leibniz Institute on Aging - Fritz Lipmann Institute (FLI), Jena, Germany.

                [*] Authors contributed equally.

                Article
                10.1101/2023.01.22.525077
                9882349
                36711923

                This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
