0
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      The Tree Reconstruction Game: Phylogenetic Reconstruction Using Reinforcement Learning

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          The computational search for the maximum-likelihood phylogenetic tree is an NP-hard problem. As such, current tree search algorithms might result in a tree that is the local optima, not the global one. Here, we introduce a paradigm shift for predicting the maximum-likelihood tree, by approximating long-term gains of likelihood rather than maximizing likelihood gain at each step of the search. Our proposed approach harnesses the power of reinforcement learning to learn an optimal search strategy, aiming at the global optimum of the search space. We show that when analyzing empirical data containing dozens of sequences, the log-likelihood improvement from the starting tree obtained by the reinforcement learning–based agent was 0.969 or higher compared to that achieved by current state-of-the-art techniques. Notably, this performance is attained without the need to perform costly likelihood optimizations apart from the training process, thus potentially allowing for an exponential increase in runtime. We exemplify this for data sets containing 15 sequences of length 18,000 bp and demonstrate that the reinforcement learning–based method is roughly three times faster than the state-of-the-art software. This study illustrates the potential of reinforcement learning in addressing the challenges of phylogenetic tree reconstruction.

          Related collections

          Most cited references40

          • Record: found
          • Abstract: found
          • Article: not found

          New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0.

          PhyML is a phylogeny software based on the maximum-likelihood principle. Early PhyML versions used a fast algorithm performing nearest neighbor interchanges to improve a reasonable starting tree topology. Since the original publication (Guindon S., Gascuel O. 2003. A simple, fast and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696-704), PhyML has been widely used (>2500 citations in ISI Web of Science) because of its simplicity and a fair compromise between accuracy and speed. In the meantime, research around PhyML has continued, and this article describes the new algorithms and methods implemented in the program. First, we introduce a new algorithm to search the tree space with user-defined intensity using subtree pruning and regrafting topological moves. The parsimony criterion is used here to filter out the least promising topology modifications with respect to the likelihood function. The analysis of a large collection of real nucleotide and amino acid data sets of various sizes demonstrates the good performance of this method. Second, we describe a new test to assess the support of the data for internal branches of a phylogeny. This approach extends the recently proposed approximate likelihood-ratio test and relies on a nonparametric, Shimodaira-Hasegawa-like procedure. A detailed analysis of real alignments sheds light on the links between this new approach and the more classical nonparametric bootstrap method. Overall, our tests show that the last version (3.0) of PhyML is fast, accurate, stable, and ready to use. A Web server and binary files are available from http://www.atgc-montpellier.fr/phyml/.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            PAML 4: phylogenetic analysis by maximum likelihood.

            PAML, currently in version 4, is a package of programs for phylogenetic analyses of DNA and protein sequences using maximum likelihood (ML). The programs may be used to compare and test phylogenetic trees, but their main strengths lie in the rich repertoire of evolutionary models implemented, which can be used to estimate parameters in models of sequence evolution and to test interesting biological hypotheses. Uses of the programs include estimation of synonymous and nonsynonymous rates (d(N) and d(S)) between two protein-coding DNA sequences, inference of positive Darwinian selection through phylogenetic comparison of protein-coding genes, reconstruction of ancestral genes and proteins for molecular restoration studies of extinct life forms, combined analysis of heterogeneous data sets from multiple gene loci, and estimation of species divergence times incorporating uncertainties in fossil calibrations. This note discusses some of the major applications of the package, which includes example data sets to demonstrate their use. The package is written in ANSI C, and runs under Windows, Mac OSX, and UNIX systems. It is available at -- (http://abacus.gene.ucl.ac.uk/software/paml.html).
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              The neighbor-joining method: a new method for reconstructing phylogenetic trees.

              N Saitou, M Nei (1987)
              A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.
                Bookmark

                Author and article information

                Contributors
                Role: Associate Editor
                Journal
                Mol Biol Evol
                Mol Biol Evol
                molbev
                Molecular Biology and Evolution
                Oxford University Press (UK )
                0737-4038
                1537-1719
                June 2024
                03 June 2024
                03 June 2024
                : 41
                : 6
                : msae105
                Affiliations
                School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv , Tel Aviv 69978, Israel
                The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Ramat Aviv , Tel Aviv 69978, Israel
                Balvatnik School of Computer Science, Tel Aviv University, Ramat Aviv , Tel Aviv 69978, Israel
                The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Ramat Aviv , Tel Aviv 69978, Israel
                Balvatnik School of Computer Science, Tel Aviv University, Ramat Aviv , Tel Aviv 69978, Israel
                The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Ramat Aviv , Tel Aviv 69978, Israel
                School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv , Tel Aviv 69978, Israel
                Author notes

                Dana Azouri, Oz Granit and Michael Alburquerque contributed equally.

                Author information
                https://orcid.org/0000-0003-0620-2626
                https://orcid.org/0009-0000-4696-3054
                https://orcid.org/0009-0007-7130-3489
                https://orcid.org/0000-0001-6891-2645
                https://orcid.org/0000-0001-9463-2575
                https://orcid.org/0000-0002-8460-1502
                Article
                msae105
                10.1093/molbev/msae105
                11180600
                38829798
                7f01dc89-b63f-49af-85f6-117d470780fb
                © The Author(s) 2024. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License ( https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact reprints@oup.com for reprints and translation rights for reprints. All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site—for further information please contact journals.permissions@oup.com.

                History
                : 02 July 2023
                : 17 May 2024
                : 28 May 2024
                : 17 June 2024
                Page count
                Pages: 14
                Categories
                Methods
                AcademicSubjects/SCI01130
                AcademicSubjects/SCI01180

                Molecular biology
                phylogenetics,reinforcement learning,machine learning,artificial intelligence,evolution,molecular biology

                Comments

                Comment on this article