
      ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees

      research-article
      BMC Bioinformatics
      BioMed Central
      RECOMB-CG - 2017 : The Fifteenth RECOMB Comparative Genomics Satellite Conference (RECOMB-CG 2017)
      04-06 October 2017
      Phylogenomics, Incomplete lineage sorting, ASTRAL


          Abstract

          Background

          Evolutionary histories can be discordant across the genome, and such discordances need to be considered in reconstructing the species phylogeny. ASTRAL is one of the leading methods for inferring species trees from gene trees while accounting for gene tree discordance. ASTRAL uses dynamic programming to search for the tree that shares the maximum number of quartet topologies with input gene trees, restricting itself to a predefined set of bipartitions.
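The quartet-sharing objective described above can be illustrated with a brute-force sketch. This is not ASTRAL's dynamic programming and it does not use a constraint set of bipartitions; it only enumerates every four-taxon subset and counts how many gene-tree quartet topologies the candidate species tree also displays. The nested-tuple tree encoding and the function names (`leaves`, `clades`, `quartet_topology`, `quartet_score`) are assumptions made for this illustration.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree):
    """Return the leaf set below every internal node (one side of each bipartition)."""
    found = []

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        found.append(clade)
        return clade

    visit(tree)
    return found


def quartet_topology(tree_clades, quartet):
    """Return the split {pair, other pair} the tree induces on four taxa, or None if unresolved."""
    q = frozenset(quartet)
    for clade in tree_clades:
        side = clade & q
        if len(side) == 2:  # this branch separates two of the taxa from the other two
            return frozenset([side, q - side])
    return None  # no separating branch: the quartet sits on a polytomy


def quartet_score(species_tree, gene_trees):
    """Count gene-tree quartet topologies that the species tree shares (brute force)."""
    species_clades = clades(species_tree)
    species_taxa = leaves(species_tree)
    score = 0
    for gene_tree in gene_trees:
        gene_clades = clades(gene_tree)
        taxa = sorted(leaves(gene_tree) & species_taxa)
        for quartet in combinations(taxa, 4):
            topology = quartet_topology(gene_clades, quartet)
            if topology is not None and topology == quartet_topology(species_clades, quartet):
                score += 1
    return score


# Hypothetical toy data: one candidate species tree and two gene trees on five taxa.
species = (("A", "B"), ("C", ("D", "E")))
discordant_gene = (("A", "C"), ("B", ("D", "E")))
total = quartet_score(species, [species, discordant_gene])
```

With five taxa there are five quartets per gene tree. The concordant gene tree contributes all five, and the discordant one contributes only the three quartets that contain both D and E, so `total` is 8. Enumerating quartets this way costs on the order of n^4 per gene tree, which is exactly the cost a practical method has to avoid.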

          Results

          We introduce ASTRAL-III, which substantially improves the running time of ASTRAL-II and guarantees polynomial running time as a function of both the number of species (n) and the number of genes (k). ASTRAL-III limits the bipartition constraint set (X) to grow at most linearly with n and k. Moreover, it handles polytomies more efficiently than ASTRAL-II, exploits similarities between gene trees better, and uses several techniques to avoid searching parts of the search space that are mathematically guaranteed not to include the optimal tree. The asymptotic running time of ASTRAL-III in the presence of polytomies is $O\left((nk)^{1.726} D\right)$, where D = O(nk) is the sum of degrees of all unique nodes in input trees. The running time improvements enable us to test whether contracting low support branches in gene trees improves the accuracy by reducing noise. In extensive simulations, we show that removing branches with very low support (e.g., below 10%) improves accuracy while overly aggressive filtering is harmful. We observe on a biological avian phylogenomic dataset of 14K genes that contracting low support branches greatly improves results.
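Contracting low support branches, as tested above, amounts to collapsing every internal branch whose support falls below a threshold so that its children attach directly to the parent, producing a polytomy. The following is a minimal sketch of that operation and not the authors' pipeline. The tree encoding (a leaf is a string, an internal node is a `(support, children)` pair, the root carries `None` as support) and the function name `contract_low_support` are assumptions made for illustration.

```python
def contract_low_support(node, threshold):
    """Return a copy of the tree with internal branches below `threshold` support collapsed."""
    if isinstance(node, str):
        return node  # leaves are never contracted
    support, children = node
    new_children = []
    for child in children:
        child = contract_low_support(child, threshold)
        is_internal = not isinstance(child, str)
        if is_internal and child[0] is not None and child[0] < threshold:
            # Collapse the weak branch: its children become children of this node.
            new_children.extend(child[1])
        else:
            new_children.append(child)
    return (support, new_children)


# Hypothetical gene tree with support values in percent.
gene_tree = (None, [(5, ["A", "B"]), (80, ["C", (9, ["D", "E"])])])
contracted = contract_low_support(gene_tree, 10)
```

Here the branches with 5% and 9% support are removed and `contracted` equals `(None, ["A", "B", (80, ["C", "D", "E"])])`, while the branch with 80% support is kept. Raising the threshold removes more branches, which corresponds to the more aggressive filtering that the abstract reports as harmful.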

          Conclusions

          ASTRAL-III is a faster version of the ASTRAL method for phylogenetic reconstruction and can scale up to 10,000 species. With ASTRAL-III, low support branches can be removed, resulting in improved accuracy.

          Electronic supplementary material

          The online version of this article (10.1186/s12859-018-2129-y) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                chz069@ucsd.edu
                mrabieeh@ucsd.edu
                esayyari@ucsd.edu
                smirarab@ucsd.edu
                Conference
                BMC Bioinformatics
                BioMed Central (London )
                1471-2105
                8 May 2018
                2018
                Volume : 19
                Issue : Suppl 6
                Issue sponsor : Publication of this supplement has not been supported by sponsorship. Information about the source of funding for publication charges can be found in the individual articles. The articles have undergone the journal's standard peer review process for supplements. JM is a co-author of one of the papers published in this supplement; review of his paper was organised by LN.
                Article number : 153
                Affiliations
                [1] ISNI 0000 0001 2107 4242, GRID grid.266100.3, Department of Electrical and Computer Engineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0021, USA
                [2] ISNI 0000 0001 2107 4242, GRID grid.266100.3, Department of Computer Science and Engineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0021, USA
                [3] ISNI 0000 0001 2107 4242, GRID grid.266100.3, Bioinformatics and Systems Biology, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0021, USA
                Article
                2129
                10.1186/s12859-018-2129-y
                5998893
                29745866
                db760f77-ab59-492f-bc77-a015303421fb
                © The Author(s) 2018

                Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                RECOMB-CG - 2017 : The Fifteenth RECOMB Comparative Genomics Satellite Conference
                RECOMB-CG 2017
                Barcelona, Spain
                04-06 October 2017
                Categories
                Research

                Bioinformatics & Computational biology
