MUSCLE: a multiple sequence alignment method with reduced time and space complexity

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

Background

In a previous paper, we introduced MUSCLE, a new program for creating multiple alignments of protein sequences, giving a brief summary of the algorithm and showing MUSCLE to achieve the highest scores reported to date on four alignment accuracy benchmarks. Here we present a more complete discussion of the algorithm, describing several previously unpublished techniques that improve biological accuracy and / or computational complexity. We introduce a new option, MUSCLE-fast, designed for high-throughput applications. We also describe a new protocol for evaluating objective functions that align two profiles.

Results

We compare the speed and accuracy of MUSCLE with CLUSTALW, Progressive POA and the MAFFT script FFTNS1, the fastest previously published program known to the author. Accuracy is measured using four benchmarks: BAliBASE, PREFAB, SABmark and SMART. We test three variants that offer highest accuracy (MUSCLE with default settings), highest speed (MUSCLE-fast), and a carefully chosen compromise between the two (MUSCLE-prog). We find MUSCLE-fast to be the fastest algorithm on all test sets, achieving average alignment accuracy similar to CLUSTALW in times that are typically two to three orders of magnitude less. MUSCLE-fast is able to align 1,000 sequences of average length 282 in 21 seconds on a current desktop computer.

Conclusions

MUSCLE offers a range of options that provide improved speed and / or alignment accuracy compared with currently available programs. MUSCLE is freely available at http://www.drive5.com/muscle.

Related collections

Most cited references 37

Record: found
Abstract: found
Article: not found

SMART, a simple modular architecture research tool: identification of signaling domains.

J. Schultz, F Milpetz, P. Bork … (1998)

Accurate multiple alignments of 86 domains that occur in signaling proteins have been constructed and used to provide a Web-based tool (SMART: simple modular architecture research tool) that allows rapid identification and annotation of signaling domain sequences. The majority of signaling proteins are multidomain in character with a considerable variety of domain combinations known. Comparison with established databases showed that 25% of our domain set could not be deduced from SwissProt and 41% could not be annotated by Pfam. SMART is able to determine the modular architectures of single sequences or genomes; application to the entire yeast genome revealed that at least 6.7% of its genes contain one or more signaling domains, approximately 350 greater than previously annotated. The process of constructing SMART predicted (i) novel domain homologues in unexpected locations such as band 4.1-homologous domains in focal adhesion kinases; (ii) previously unknown domain families, including a citron-homology domain; (iii) putative functions of domain families after identification of additional family members, for example, a ubiquitin-binding role for ubiquitin-associated domains (UBA); (iv) cellular roles for proteins, such predicted DEATH domains in netrin receptors further implicating these molecules in axonal guidance; (v) signaling domains in known disease genes such as SPRY domains in both marenostrin/pyrin and Midline 1; (vi) domains in unexpected phylogenetic contexts such as diacylglycerol kinase homologues in yeast and bacteria; and (vii) likely protein misclassifications exemplified by a predicted pleckstrin homology domain in a Candida albicans protein, previously described as an integrin.

0 comments Cited 626 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment.

M S Boguski, Andrew Neuwald, Stephen F Altschul … (1993)

A wealth of protein and DNA sequence data is being generated by genome projects and other sequencing efforts. A crucial barrier to deciphering these sequences and understanding the relations among them is the difficulty of detecting subtle local residue patterns common to multiple sequences. Such patterns frequently reflect similar molecular structures and biological properties. A mathematical definition of this "local multiple alignment" problem suitable for full computer automation has been used to develop a new and sensitive algorithm, based on the statistical method of iterative sampling. This algorithm finds an optimized local alignment model for N sequences in N-linear time, requiring only seconds on current workstations, and allows the simultaneous detection and optimization of multiple patterns and pattern repeats. The method is illustrated as applied to helix-turn-helix proteins, lipocalins, and prenyltransferases.

0 comments Cited 222 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

Alignment-free sequence comparison-a review.

Susana Vinga, Jonas Almeida, John Osborne (2003)

Genetic recombination and, in particular, genetic shuffling are at odds with sequence comparison by alignment, which assumes conservation of contiguity between homologous segments. A variety of theoretical foundations are being used to derive alignment-free methods that overcome this limitation. The formulation of alternative metrics for dissimilarity between sequences and their algorithmic implementations are reviewed. The overwhelming majority of work on alignment-free sequence has taken place in the past two decades, with most reports published in the past 5 years. Two main categories of methods have been proposed-methods based on word (oligomer) frequency, and methods that do not require resolving the sequence with fixed word length segments. The first category is based on the statistics of word frequency, on the distances defined in a Cartesian space defined by the frequency vectors, and on the information content of frequency distribution. The second category includes the use of Kolmogorov complexity and Chaos Theory. Despite their low visibility, alignment-free metrics are in fact already widely used as pre-selection filters for alignment-based querying of large applications. Recent work is furthering their usage as a scale-independent methodology that is capable of recognizing homology when loss of contiguity is beyond the possibility of alignment. Most of the alignment-free algorithms reviewed were implemented in MATLAB code and are available at http://bioinformatics.musc.edu/resources.html

0 comments Cited 193 times – based on 0 reviews      Review now

Bookmark

All references

Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central (London )

ISSN (Electronic): 1471-2105

Publication date Collection: 2004

Publication date (Electronic): 19 August 2004

Volume: 5

Page: 113

Affiliations

[1 ]Department of Plant and Microbial Biology, 461 Koshland Hall, University of California, Berkeley, CA 94720-3102, USA

Article

Publisher ID: 1471-2105-5-113

DOI: 10.1186/1471-2105-5-113

PMC ID: 517706

PubMed ID: 15318951

SO-VID: 60783ce5-aa64-411c-a326-c27624de889a

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received : 25 March 2004

Date accepted : 19 August 2004

Comments

Comment on this article

scite_

Smart Citations

Citing PublicationsSupportingMentioningContrasting

View Citations

See how this article has been cited at scite.ai

scite shows how a scientific paper has been cited by providing the context of the citation, a classification describing whether it supports, mentions, or contrasts the cited claim, and a label indicating in which section the citation was made.

MUSCLE: a multiple sequence alignment method with reduced time and space complexity

Read this article at

Abstract

Background

Results

Conclusions

Related collections

FeatureCloud

Most cited references 37

SMART, a simple modular architecture research tool: identification of signaling domains.

Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment.

Alignment-free sequence comparison-a review.

Author and article information

Journal

Affiliations

Article

History

Categories

Comments

Comment on this article

Similar content 35

Cited by 2,904

Most referenced authors 2,197