The Tree Reconstruction Game: Phylogenetic Reconstruction Using Reinforcement Learning

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

The computational search for the maximum-likelihood phylogenetic tree is an NP-hard problem. As such, current tree search algorithms might result in a tree that is the local optima, not the global one. Here, we introduce a paradigm shift for predicting the maximum-likelihood tree, by approximating long-term gains of likelihood rather than maximizing likelihood gain at each step of the search. Our proposed approach harnesses the power of reinforcement learning to learn an optimal search strategy, aiming at the global optimum of the search space. We show that when analyzing empirical data containing dozens of sequences, the log-likelihood improvement from the starting tree obtained by the reinforcement learning–based agent was 0.969 or higher compared to that achieved by current state-of-the-art techniques. Notably, this performance is attained without the need to perform costly likelihood optimizations apart from the training process, thus potentially allowing for an exponential increase in runtime. We exemplify this for data sets containing 15 sequences of length 18,000 bp and demonstrate that the reinforcement learning–based method is roughly three times faster than the state-of-the-art software. This study illustrates the potential of reinforcement learning in addressing the challenges of phylogenetic tree reconstruction.

Related collections

Most cited references 40

Record: found
Abstract: found
Article: not found

New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0.

Stéphane Guindon, Jean-François Dufayard, Vincent Lefort … (2010)

PhyML is a phylogeny software based on the maximum-likelihood principle. Early PhyML versions used a fast algorithm performing nearest neighbor interchanges to improve a reasonable starting tree topology. Since the original publication (Guindon S., Gascuel O. 2003. A simple, fast and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696-704), PhyML has been widely used (>2500 citations in ISI Web of Science) because of its simplicity and a fair compromise between accuracy and speed. In the meantime, research around PhyML has continued, and this article describes the new algorithms and methods implemented in the program. First, we introduce a new algorithm to search the tree space with user-defined intensity using subtree pruning and regrafting topological moves. The parsimony criterion is used here to filter out the least promising topology modifications with respect to the likelihood function. The analysis of a large collection of real nucleotide and amino acid data sets of various sizes demonstrates the good performance of this method. Second, we describe a new test to assess the support of the data for internal branches of a phylogeny. This approach extends the recently proposed approximate likelihood-ratio test and relies on a nonparametric, Shimodaira-Hasegawa-like procedure. A detailed analysis of real alignments sheds light on the links between this new approach and the more classical nonparametric bootstrap method. Overall, our tests show that the last version (3.0) of PhyML is fast, accurate, stable, and ready to use. A Web server and binary files are available from http://www.atgc-montpellier.fr/phyml/.

0 comments Cited 3035 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

PAML 4: phylogenetic analysis by maximum likelihood.

Ziheng Yang (2007)

PAML, currently in version 4, is a package of programs for phylogenetic analyses of DNA and protein sequences using maximum likelihood (ML). The programs may be used to compare and test phylogenetic trees, but their main strengths lie in the rich repertoire of evolutionary models implemented, which can be used to estimate parameters in models of sequence evolution and to test interesting biological hypotheses. Uses of the programs include estimation of synonymous and nonsynonymous rates (d(N) and d(S)) between two protein-coding DNA sequences, inference of positive Darwinian selection through phylogenetic comparison of protein-coding genes, reconstruction of ancestral genes and proteins for molecular restoration studies of extinct life forms, combined analysis of heterogeneous data sets from multiple gene loci, and estimation of species divergence times incorporating uncertainties in fossil calibrations. This note discusses some of the major applications of the package, which includes example data sets to demonstrate their use. The package is written in ANSI C, and runs under Windows, Mac OSX, and UNIX systems. It is available at -- (http://abacus.gene.ucl.ac.uk/software/paml.html).

0 comments Cited 2573 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

The neighbor-joining method: a new method for reconstructing phylogenetic trees.

N Saitou, M Nei (1987)

A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.

0 comments Cited 2439 times – based on 0 reviews      Review now

Bookmark

All references

Author and article information

Contributors

Dana Azouri:

ORCID: https://orcid.org/0000-0003-0620-2626

Oz Granit:

ORCID: https://orcid.org/0009-0000-4696-3054

Michael Alburquerque:

ORCID: https://orcid.org/0009-0007-7130-3489

Yishay Mansour:

ORCID: https://orcid.org/0000-0001-6891-2645

Tal Pupko:

ORCID: https://orcid.org/0000-0001-9463-2575

Itay Mayrose:

ORCID: https://orcid.org/0000-0002-8460-1502

Andrey Rzhetsky: Role: Associate Editor

Journal

Journal ID (nlm-ta): Mol Biol Evol

Journal ID (iso-abbrev): Mol Biol Evol

Journal ID (publisher-id): molbev

Title: Molecular Biology and Evolution

Publisher: Oxford University Press (UK )

ISSN (Print): 0737-4038

ISSN (Electronic): 1537-1719

Publication date Collection: June 2024

Publication date (Electronic): 03 June 2024

Publication date PMC-release: 03 June 2024

Volume: 41

Issue: 6

Electronic Location Identifier: msae105

Affiliations

School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv , Tel Aviv 69978, Israel

The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Ramat Aviv , Tel Aviv 69978, Israel

Balvatnik School of Computer Science, Tel Aviv University, Ramat Aviv , Tel Aviv 69978, Israel

The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Ramat Aviv , Tel Aviv 69978, Israel

Balvatnik School of Computer Science, Tel Aviv University, Ramat Aviv , Tel Aviv 69978, Israel

The Shmunis School of Biomedicine and Cancer Research, Tel Aviv University, Ramat Aviv , Tel Aviv 69978, Israel

School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv , Tel Aviv 69978, Israel

Author notes

Dana Azouri, Oz Granit and Michael Alburquerque contributed equally.

Corresponding authors: E-mails: itaymay@ 123456tauex.tau.ac.il ; talp@ 123456tauex.tau.ac.il ; mansour@ 123456tauex.tau.ac.il .

Author information

Dana Azouri https://orcid.org/0000-0003-0620-2626

Oz Granit https://orcid.org/0009-0000-4696-3054

Michael Alburquerque https://orcid.org/0009-0007-7130-3489

Yishay Mansour https://orcid.org/0000-0001-6891-2645

Tal Pupko https://orcid.org/0000-0001-9463-2575

Itay Mayrose https://orcid.org/0000-0002-8460-1502

Article

Publisher ID: msae105

DOI: 10.1093/molbev/msae105

PMC ID: 11180600

PubMed ID: 38829798

SO-VID: 7f01dc89-b63f-49af-85f6-117d470780fb

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License ( https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact reprints@oup.com for reprints and translation rights for reprints. All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site—for further information please contact journals.permissions@oup.com.

History

Date received : 02 July 2023

Date revision received : 17 May 2024

Date accepted : 28 May 2024

Date: 17 June 2024

Page count

Pages: 14

The Tree Reconstruction Game: Phylogenetic Reconstruction Using Reinforcement Learning

Read this article at

Abstract

Related collections

Computer Vision, Deep Learning, Deep Reinforcement Learning, IoT

Most cited references 40

New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0.

PAML 4: phylogenetic analysis by maximum likelihood.

The neighbor-joining method: a new method for reconstructing phylogenetic trees.

Author and article information

Contributors

Journal

Affiliations

Author notes

Author information

Article

History

Page count

Categories

Comments

Comment on this article

Similar content 181

Most referenced authors 567