14
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: not found

      Accurate Inference of Tree Topologies from Multiple Sequence Alignments Using Deep Learning.

      Read this article at

      ScienceOpenPublisherPubMed
      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Reconstructing the phylogenetic relationships between species is one of the most formidable tasks in evolutionary biology. Multiple methods exist to reconstruct phylogenetic trees, each with their own strengths and weaknesses. Both simulation and empirical studies have identified several "zones" of parameter space where accuracy of some methods can plummet, even for four-taxon trees. Further, some methods can have undesirable statistical properties such as statistical inconsistency and/or the tendency to be positively misleading (i.e. assert strong support for the incorrect tree topology). Recently, deep learning techniques have made inroads on a number of both new and longstanding problems in biological research. In this study, we designed a deep convolutional neural network (CNN) to infer quartet topologies from multiple sequence alignments. This CNN can readily be trained to make inferences using both gapped and ungapped data. We show that our approach is highly accurate on simulated data, often outperforming traditional methods, and is remarkably robust to bias-inducing regions of parameter space such as the Felsenstein zone and the Farris zone. We also demonstrate that the confidence scores produced by our CNN can more accurately assess support for the chosen topology than bootstrap and posterior probability scores from traditional methods. Although numerous practical challenges remain, these findings suggest that the deep learning approaches such as ours have the potential to produce more accurate phylogenetic inferences.

          Related collections

          Author and article information

          Journal
          Syst Biol
          Systematic biology
          Oxford University Press (OUP)
          1076-836X
          1063-5157
          March 01 2020
          : 69
          : 2
          Affiliations
          [1 ] Department of Genetics, University of North Carolina at Chapel Hill, 120 Mason Farm Road, UNC-Chapel Hill, Chapel Hill, NC 27599-7264, USA.
          [2 ] Biological and Biomedical Sciences Program, University of North Carolina at Chapel Hill, 130 Mason Farm Road, UNC-Chapel Hill Chapel Hill, NC 27599-7264, USA.
          Article
          5559282
          10.1093/sysbio/syz060
          31504938
          14e4a946-1dcd-452a-9253-ea8711eb7a26
          History

          Supervised machine learning,convolutional neuronal network,phylogenetics

          Comments

          Comment on this article