Automated analysis of phylogenetic clusters

Abstract

Background

As sequence data sets used for the investigation of pathogen transmission patterns increase in size, automated tools and standardized methods for cluster analysis have become necessary. We have developed an automated Cluster Picker which identifies monophyletic clades meeting user-input criteria for bootstrap support and maximum genetic distance within large phylogenetic trees. A second tool, the Cluster Matcher, automates the process of linking genetic data to epidemiological or clinical data, and matches clusters between runs of the Cluster Picker.
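The selection rule described above (a monophyletic clade is reported as a cluster when its bootstrap support and its maximum within-clade genetic distance both meet user-set thresholds) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Node` class, the function names, and the toy tree are hypothetical, and patristic (branch-length) distance within the tree is assumed here as a stand-in for whatever genetic distance measure the Cluster Picker actually uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node; not the Cluster Picker's data model."""
    name: Optional[str] = None   # sequence ID, set for leaves only
    length: float = 0.0          # branch length to the parent
    support: float = 0.0         # bootstrap support (internal nodes)
    children: List["Node"] = field(default_factory=list)


def clade_stats(node: Node) -> Tuple[List[str], float, float]:
    """Return (leaf names, deepest leaf distance, widest leaf-to-leaf distance)."""
    if not node.children:
        return [node.name], 0.0, 0.0
    leaves, depths, widest = [], [], 0.0
    for child in node.children:
        c_leaves, c_depth, c_widest = clade_stats(child)
        leaves += c_leaves
        depths.append(c_depth + child.length)
        widest = max(widest, c_widest)
    depths.sort(reverse=True)
    if len(depths) > 1:
        # The longest path through this node joins its two deepest subtrees.
        widest = max(widest, depths[0] + depths[1])
    return leaves, depths[0], widest


def pick_clusters(node: Node, min_support: float, max_distance: float) -> List[List[str]]:
    """Walk from the root down and report the first clades meeting both thresholds."""
    if not node.children:
        return []
    leaves, _, widest = clade_stats(node)
    if node.support >= min_support and widest <= max_distance:
        return [sorted(leaves)]
    clusters = []
    for child in node.children:
        clusters += pick_clusters(child, min_support, max_distance)
    return clusters


# Toy tree (made-up values): two candidate clades and one singleton.
root = Node(children=[
    Node(length=0.05, support=0.95,
         children=[Node("A", 0.01), Node("B", 0.02)]),
    Node(length=0.05, support=0.60,
         children=[Node("C", 0.01), Node("D", 0.01)]),
    Node("E", 0.10),
])

print(pick_clusters(root, min_support=0.9, max_distance=0.045))  # [['A', 'B']]
```

Lowering `min_support` to 0.5 in this toy tree would also admit the clade containing C and D, which is the kind of threshold sensitivity the Results explore.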

Results

We explore the effect of different bootstrap and genetic distance thresholds on clusters identified in a data set of publicly available HIV sequences, and compare these results to those of a previously published tool for cluster identification. To demonstrate their utility, we then use the Cluster Picker and Cluster Matcher together to investigate how clusters in the data set changed over time. We find that clusters containing sequences from more than one UK location at the first time point (multiple origin) were significantly more likely to grow than those representing only a single location.
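The comparison of clusters across time points can be sketched in Python as follows. This is only an illustration under stated assumptions: clusters from two runs are matched when they share at least one sequence ID, and a cluster counts as "multiple origin" when its members at the first time point map to more than one location. The function name, the dictionary layout, and the toy data are hypothetical and do not reflect the Cluster Matcher's real interface or the study's data.

```python
from typing import Dict, List, Set


def match_clusters(
    earlier: Dict[str, Set[str]],
    later: Dict[str, Set[str]],
    locations: Dict[str, str],
) -> List[dict]:
    """Link each earlier cluster to later clusters that share sequences with it."""
    rows = []
    for old_id, old_members in earlier.items():
        # Later clusters that contain at least one sequence of the earlier cluster.
        hits = {new_id: members for new_id, members in later.items()
                if old_members & members}
        new_members = set().union(*hits.values())
        origins = {locations[seq] for seq in old_members}
        rows.append({
            "cluster": old_id,
            "matched": sorted(hits),
            "grew": len(new_members) > len(old_members),
            "multiple_origin": len(origins) > 1,
        })
    return rows


# Toy inputs (made up for illustration only).
earlier = {"c1": {"s1", "s2"}, "c2": {"s3", "s4"}}
later = {"k1": {"s1", "s2", "s5"}, "k2": {"s3", "s4"}}
locations = {"s1": "London", "s2": "Edinburgh", "s3": "London",
             "s4": "London", "s5": "London"}

for row in match_clusters(earlier, later, locations):
    print(row)
```

A table of this shape (growth versus multiple origin per cluster) is what a significance test would then be run on; the toy data here carry no statistical meaning.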

Conclusions

The Cluster Picker and Cluster Matcher can rapidly process phylogenetic trees containing tens of thousands of sequences. Together these tools will facilitate comparisons of pathogen transmission dynamics between studies and countries.

Most cited references 39

FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments

Morgan N. Price, Paramvir S Dehal, Adam Arkin (2010)

Background We recently described FastTree, a tool for inferring phylogenies for alignments with up to hundreds of thousands of sequences. Here, we describe improvements to FastTree that improve its accuracy without sacrificing scalability. Methodology/Principal Findings Where FastTree 1 used nearest-neighbor interchanges (NNIs) and the minimum-evolution criterion to improve the tree, FastTree 2 adds minimum-evolution subtree-pruning-regrafting (SPRs) and maximum-likelihood NNIs. FastTree 2 uses heuristics to restrict the search for better trees and estimates a rate of evolution for each site (the “CAT” approximation). Nevertheless, for both simulated and genuine alignments, FastTree 2 is slightly more accurate than a standard implementation of maximum-likelihood NNIs (PhyML 3 with default settings). Although FastTree 2 is not quite as accurate as methods that use maximum-likelihood SPRs, most of the splits that disagree are poorly supported, and for large alignments, FastTree 2 is 100–1,000 times faster. FastTree 2 inferred a topology and likelihood-based local support values for 237,882 distinct 16S ribosomal RNAs on a desktop computer in 22 hours and 5.8 gigabytes of memory. Conclusions/Significance FastTree 2 allows the inference of maximum-likelihood phylogenies for huge alignments. FastTree 2 is freely available at http://www.microbesonline.org/fasttree.

Collective dynamics of 'small-world' networks.

D J Watts, S H Strogatz (1998)

Networks of coupled dynamical systems have been used to model biological oscillators, Josephson junction arrays, excitable media, neural networks, spatial games, genetic control networks and many other self-organizing systems. Ordinarily, the connection topology is assumed to be either completely regular or completely random. But many biological, technological and social networks lie somewhere between these two extremes. Here we explore simple models of networks that can be tuned through this middle ground: regular networks 'rewired' to introduce increasing amounts of disorder. We find that these systems can be highly clustered, like regular lattices, yet have small characteristic path lengths, like random graphs. We call them 'small-world' networks, by analogy with the small-world phenomenon (popularly known as six degrees of separation). The neural network of the worm Caenorhabditis elegans, the power grid of the western United States, and the collaboration graph of film actors are shown to be small-world networks. Models of dynamical systems with small-world coupling display enhanced signal-propagation speed, computational power, and synchronizability. In particular, infectious diseases spread more easily in small-world networks than in regular lattices.

APE: Analyses of Phylogenetics and Evolution in R language.

E Paradis, J. Claude, K Strimmer (2004)

Analysis of Phylogenetics and Evolution (APE) is a package written in the R language for use in molecular evolution and phylogenetics. APE provides both utility functions for reading and writing data and manipulating phylogenetic trees, as well as several advanced methods for phylogenetic and evolutionary analysis (e.g. comparative and population genetic methods). APE takes advantage of the many R functions for statistics and graphics, and also provides a flexible framework for developing and implementing further statistical methods for the analysis of evolutionary processes. The program is free and available from the official R package archive at http://cran.r-project.org/src/contrib/PACKAGES.html#ape. APE is licensed under the GNU General Public License.

Author and article information

Contributors

Manon Ragonnet-Cronin

Emma Hodcroft

Stéphane Hué

Esther Fearnhill

Valerie Delpech

Andrew J Leigh Brown

Samantha Lycett

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Journal ID (iso-abbrev): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central

ISSN (Electronic): 1471-2105

Publication date Collection: 2013

Publication date (Electronic): 6 November 2013

Volume: 14

Page: 317

Affiliations

[1] University of Edinburgh, Edinburgh, UK

[2] University College London, London, UK

[3] MRC Clinical Trials Unit, London, UK

[4] Public Health England, London, UK

Author notes

on behalf of the UK HIV Drug Resistance Database

Article

Publisher ID: 1471-2105-14-317

DOI: 10.1186/1471-2105-14-317

PMC ID: 4228337

PubMed ID: 24191891

SO-VID: 65a9740b-c05f-4734-9c5c-b8d506bcb2fc

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 12 June 2013

Date accepted: 30 October 2013
