      S/HIC: Robust Identification of Soft and Hard Sweeps Using Machine Learning

      research-article
      PLoS Genetics
      Public Library of Science


          Abstract

          Detecting the targets of adaptive natural selection from whole genome sequencing data is a central problem for population genetics. However, to date most methods have shown sub-optimal performance under realistic demographic scenarios. Moreover, over the past decade there has been renewed interest in determining the importance of selection from standing variation in the adaptation of natural populations, yet very few methods for inferring this mode of adaptation at the genome scale have been introduced. Here we introduce a new method, S/HIC, which uses supervised machine learning to precisely infer the location of both hard and soft selective sweeps. We show that S/HIC has unrivaled accuracy for detecting sweeps under demographic histories that are relevant to human populations, and for distinguishing sweeps from linked as well as neutrally evolving regions. Moreover, we show that S/HIC is uniquely robust among its competitors to model misspecification. Thus, even if the true demographic model of a population differs catastrophically from that specified by the user, S/HIC still retains impressive discriminatory power. Finally, we apply S/HIC to resequencing data from human chromosome 18 in a European population sample, and demonstrate that we can reliably recover selective sweeps that were identified earlier using methods that are less specific and less sensitive.

          Author Summary

          The genetic basis of recent adaptation can be uncovered from genomic patterns of variation, which are perturbed in predictable ways when a beneficial mutation “sweeps” through a population. However, the detection of such “selective sweeps” is complicated by demographic events, such as population expansion, which can produce similar skews in genetic diversity. Here, we present a method for detecting selective sweeps that is remarkably powerful and robust to potentially confounding demographic histories. This method, called S/HIC, operates using a machine learning paradigm to combine many different features of population genetic variation, and examine their values across a large genomic region in order to infer whether a selective sweep has recently occurred near its center. S/HIC is also able to accurately distinguish between selection acting on de novo beneficial mutations (“hard sweeps”) and selection on previously standing variants (“soft sweeps”). We demonstrate S/HIC’s power on a variety of simulated datasets as well as human population data wherein we recover several previously discovered targets of recent adaptation as well as a novel selective sweep.
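The idea of examining a statistic's values across a large region, rather than at a single point, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the text above: the choice of statistic (mean pairwise differences), the number of subwindows, the 0/1 haplotype encoding, and the function names `pairwise_diversity` and `relative_feature_vector` are all hypothetical. The sketch only shows how one summary statistic might be turned into a vector describing its spatial pattern around a central window, which a supervised classifier could then take as input.

```python
from typing import List, Sequence


def pairwise_diversity(haplotypes: Sequence[str]) -> float:
    """Mean number of pairwise differences among haplotypes.

    Each haplotype is a string of '0' (ancestral) and '1' (derived) alleles,
    one character per segregating site. This encoding is an assumption.
    """
    n = len(haplotypes)
    if n < 2:
        return 0.0
    total_differences = 0
    for site in zip(*haplotypes):
        derived = sum(1 for allele in site if allele == "1")
        # Number of haplotype pairs that differ at this site.
        total_differences += derived * (n - derived)
    return total_differences / (n * (n - 1) / 2)


def relative_feature_vector(haplotypes: Sequence[str],
                            n_subwindows: int = 5) -> List[float]:
    """Split the region into subwindows and express the statistic in each
    subwindow as a fraction of its total across the whole region."""
    n_sites = len(haplotypes[0])
    bounds = [round(i * n_sites / n_subwindows) for i in range(n_subwindows + 1)]
    raw = [
        pairwise_diversity([h[bounds[i]:bounds[i + 1]] for h in haplotypes])
        for i in range(n_subwindows)
    ]
    total = sum(raw)
    if total == 0:
        # No variation anywhere: fall back to a flat profile.
        return [1.0 / n_subwindows] * n_subwindows
    return [value / total for value in raw]


if __name__ == "__main__":
    # Toy sample: variation at the edges, none in the middle subwindow.
    sample = ["110000", "100001", "000011"]
    print(relative_feature_vector(sample, n_subwindows=3))
```

In the toy sample, the central subwindow carries none of the region's diversity while the flanks carry all of it, which is the kind of spatial contrast that a classifier trained on simulated data could learn to associate with a sweep near the center. Expressing each value relative to the regional total is one plausible way to reduce sensitivity to overall diversity levels; it is shown here as a design choice of the sketch, not as a description of the published software.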


                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Genetics (PLoS Genet)
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1553-7390, 1553-7404
                Published: 15 March 2016
                Volume 12, Issue 3, e1005928
                Affiliations
                [1] Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America
                [2] Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey, United States of America
                University of Wisconsin–Madison, UNITED STATES
                Author notes

                The authors have declared that no competing interests exist.

                Conceived and designed the experiments: DRS ADK. Analyzed the data: DRS. Wrote the paper: DRS ADK. Designed software: DRS.

                Article
                Manuscript: PGENETICS-D-15-02221
                DOI: 10.1371/journal.pgen.1005928
                PMCID: PMC4792382
                PMID: 26977894
                © 2016 Schrider, Kern

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 3 September 2015
                Accepted: 21 February 2016
                Page count
                Figures: 8, Tables: 0, Pages: 31
                Funding
                DRS was supported by the National Institutes of Health (NIH; http://www.nih.gov/) under Ruth L. Kirschstein National Research Service Award F32 GM105231. ADK was supported by National Science Foundation (http://www.nsf.gov/) Award MCB-1161367 and by the National Institute of General Medical Sciences of the NIH (http://www.nih.gov/) under award no. R01GM078204. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                People and Places > Demography
                Biology and Life Sciences > Evolutionary Biology > Evolutionary Processes > Natural Selection
                Biology and Life Sciences > Evolutionary Biology > Population Genetics > Natural Selection
                Biology and Life Sciences > Genetics > Population Genetics > Natural Selection
                Biology and Life Sciences > Population Biology > Population Genetics > Natural Selection
                Biology and Life Sciences > Computational Biology > Genomics Statistics
                Biology and Life Sciences > Genetics > Genomics > Genomics Statistics
                Biology and Life Sciences > Evolutionary Biology > Population Genetics
                Biology and Life Sciences > Genetics > Population Genetics
                Biology and Life Sciences > Population Biology > Population Genetics
                Biology and Life Sciences > Population Biology > Population Metrics > Population Size
                Research and Analysis Methods > Simulation and Modeling
                Engineering and Technology > Management Engineering > Decision Analysis > Decision Trees
                Research and Analysis Methods > Decision Analysis > Decision Trees
                Biology and Life Sciences > Neuroscience > Cognitive Science > Artificial Intelligence > Machine Learning
                Computer and Information Sciences > Artificial Intelligence > Machine Learning
                Custom metadata
                The software described in this paper is available at https://github.com/kern-lab/. The data used in this paper are available from the 1000 Genomes Project at http://www.1000genomes.org/.

                Genetics