
      So many genes, so little time: A practical approach to divergence-time estimation in the genomic era

      research-article
      PLoS ONE
      Public Library of Science


          Abstract

          Phylogenomic datasets have been successfully used to address questions involving evolutionary relationships, patterns of genome structure, signatures of selection, and gene and genome duplications. However, despite the recent explosion in genomic and transcriptomic data, the utility of these data sources for efficient divergence-time inference remains unexamined. Phylogenomic datasets pose two distinct problems for divergence-time estimation: (i) the volume of data makes inference on the entire dataset intractable, and (ii) the extent of underlying topological and rate heterogeneity across genes makes model mis-specification a real concern. “Gene shopping”, wherein a phylogenomic dataset is winnowed to a set of genes with desirable properties, represents an alternative approach that holds promise for alleviating these issues. We implemented an approach for phylogenomic datasets (available in SortaDate) that filters genes by three criteria: (i) clock-likeness, (ii) reasonable tree length (i.e., discernible information content), and (iii) least topological conflict with a focal species tree (presumed to have already been inferred). Such a winnowing procedure ensures that errors associated with model mis-specification (of both clock and topology) are minimized, thereby reducing error in divergence-time estimation. We demonstrated the efficacy of this approach through simulation and applied it to published animal (Aves, Diplopoda, and Hymenoptera) and plant (carnivorous Caryophyllales, broad Caryophyllales, and Vitales) phylogenomic datasets. By quantifying rate heterogeneity across both genes and lineages, we found that every empirical dataset examined included genes with clock-like, or nearly clock-like, behavior. Moreover, many datasets had genes that were clock-like, exhibited reasonable evolutionary rates, and were mostly compatible with the species tree. We identified overlap in age estimates when analyzing these filtered genes under strict clock and uncorrelated lognormal (UCLN) models. However, this overlap was often due to imprecise estimates from the UCLN model. We find that “gene shopping” can be an efficient approach to divergence-time inference for phylogenomic datasets that may otherwise be characterized by extensive gene tree heterogeneity.
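          The three filtering criteria can be made concrete with a small sketch. This is not SortaDate's code or interface; it is a minimal illustration under stated assumptions: clock-likeness is measured as the variance of root-to-tip distances on a rooted gene tree, "reasonable tree length" is a user-chosen window on total branch length (the bounds `min_len` and `max_len` are hypothetical), and conflict is scored as the fraction of the species tree's non-trivial bipartitions recovered by the gene tree, assuming gene and species trees share the same taxa. The names `Node`, `leaf`, `clade`, `concordance`, and `rank_genes`, and the final sort order (concordance first, then clock-likeness), are illustrative choices.

```python
from __future__ import annotations

import statistics
from dataclasses import dataclass, field


@dataclass
class Node:
    """Rooted tree node; `length` is the branch leading to this node."""
    name: str = ""
    length: float = 0.0
    children: list[Node] = field(default_factory=list)


def leaf(name: str, length: float) -> Node:
    return Node(name, length)


def clade(length: float, *children: Node) -> Node:
    return Node("", length, list(children))


def tip_names(node: Node) -> set[str]:
    if not node.children:
        return {node.name}
    return set().union(*(tip_names(c) for c in node.children))


def root_to_tip(node: Node, depth: float = 0.0) -> list[float]:
    """Path length from the root to every tip (the root's own branch is ignored)."""
    if not node.children:
        return [depth]
    out: list[float] = []
    for c in node.children:
        out.extend(root_to_tip(c, depth + c.length))
    return out


def tree_length(node: Node) -> float:
    """Sum of all branch lengths below the root."""
    return sum(c.length + tree_length(c) for c in node.children)


def splits(node: Node, taxa: frozenset[str]) -> set[frozenset[str]]:
    """Non-trivial bipartitions, each stored as the side lacking min(taxa)."""
    anchor = min(taxa)
    found: set[frozenset[str]] = set()

    def walk(n: Node) -> frozenset[str]:
        if not n.children:
            return frozenset({n.name})
        below = frozenset().union(*(walk(c) for c in n.children))
        # Both sides must hold at least two taxa for the split to be informative.
        if 1 < len(below) < len(taxa) - 1:
            found.add(below if anchor not in below else taxa - below)
        return below

    walk(node)
    return found


def concordance(gene: Node, species: Node) -> float:
    """Fraction of species-tree bipartitions also present in the gene tree."""
    taxa = frozenset(tip_names(species))
    ref = splits(species, taxa)
    if not ref:
        return 1.0
    return len(ref & splits(gene, taxa)) / len(ref)


def rank_genes(genes: dict[str, Node], species: Node,
               min_len: float, max_len: float) -> list[tuple[str, float, float, float]]:
    """Drop genes whose tree length falls outside [min_len, max_len], then sort
    by concordance (high first) and root-to-tip variance (low first)."""
    rows = []
    for gene_id, tree in genes.items():
        length = tree_length(tree)
        if not (min_len <= length <= max_len):
            continue
        var = statistics.pvariance(root_to_tip(tree))
        rows.append((gene_id, concordance(tree, species), var, length))
    return sorted(rows, key=lambda r: (-r[1], r[2]))


if __name__ == "__main__":
    sp = clade(0.0, clade(0.0, leaf("A", 0.0), leaf("B", 0.0)),
               clade(0.0, leaf("C", 0.0), leaf("D", 0.0)))
    clocklike = clade(0.0, clade(1.0, leaf("A", 1.0), leaf("B", 1.0)),
                      clade(1.0, leaf("C", 1.0), leaf("D", 1.0)))
    uneven = clade(0.0, clade(1.0, leaf("A", 0.5), leaf("B", 2.0)),
                   clade(1.0, leaf("C", 1.0), leaf("D", 1.0)))
    print([r[0] for r in rank_genes({"g1": clocklike, "g3": uneven}, sp, 0.0, 100.0)])
```

          Sorting by concordance before variance is one way to read "least topological conflict" as the primary criterion; swapping the key order would instead favor the most clock-like genes first.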


                Author and article information

                Contributors
                 Roles: Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Software, Visualization, Writing – original draft, Writing – review & editing
                 Roles: Conceptualization, Formal analysis, Investigation, Methodology, Software, Writing – original draft, Writing – review & editing
                 Roles: Conceptualization, Formal analysis, Investigation, Methodology, Resources, Software, Writing – original draft, Writing – review & editing
                Role: Editor
                 Journal
                 PLoS ONE
                 Public Library of Science (San Francisco, CA, USA)
                 ISSN: 1932-6203
                 Published: 17 May 2018
                 Volume 13, Issue 5, e0197433
                Affiliations
                 [1] Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, United States of America
                 [2] Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom
                Laboratoire Arago, FRANCE
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Author information
                http://orcid.org/0000-0003-2035-9531
                 Article
                 Manuscript ID: PONE-D-17-37783
                 DOI: 10.1371/journal.pone.0197433
                 PMCID: PMC5957400
                 PMID: 29772020
                 © 2018 Smith et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                 History
                 Received: 23 October 2017
                 Accepted: 2 May 2018
                Page count
                Figures: 6, Tables: 5, Pages: 18
                 Funding
                 Funded by: Division of Environmental Biology (funder ID: http://dx.doi.org/10.13039/100000155); Award ID: 1354048
                 Funded by: Division of Environmental Biology (funder ID: http://dx.doi.org/10.13039/100000155); Award ID: 1207915
                JFW and SAS were supported by NSF 1354048. JWB and SAS were supported by NSF 1207915.
                 Categories
                 Research Article
                 Biology and Life Sciences: Evolutionary Biology, Evolutionary Systematics, Phylogenetics, Phylogenetic Analysis, Animal Phylogenetics, Taxonomy, Zoology, Computational Biology, Genome Analysis, Genome Evolution, Genetics, Genomics, Animal Genomics, Bird Genomics, Molecular Evolution, Evolutionary Processes, Evolutionary Rate
                 Computer and Information Sciences: Data Management, Taxonomy, Evolutionary Systematics, Phylogenetics, Phylogenetic Analysis, Animal Phylogenetics
                 Medicine and Health Sciences: Infectious Diseases, Bacterial Diseases, Caries
                Custom metadata
                 All data used here were public, but we conducted some orthology analyses, the results of which are available on GitHub at https://github.com/FePhyFoFum/SDMIData. Scripts associated with the method are available on GitHub at https://github.com/FePhyFoFum/sortadate.

