Parallelization of MAFFT for large-scale multiple sequence alignments


Abstract

Summary

We report an update for the MAFFT multiple sequence alignment program to enable parallel calculation of large numbers of sequences. The G-INS-1 option of MAFFT was recently reported to have higher accuracy than other methods for large data, but this method has been impractical for most large-scale analyses because of its large computational requirements. We introduce a scalable variant, G-large-INS-1, which has equivalent accuracy to G-INS-1 and is applicable to 50 000 or more sequences.
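Much of G-INS-1's cost comes from aligning every pair of sequences, so the work grows quadratically: n sequences give n(n-1)/2 pairs, which for 50 000 sequences is 1 249 975 000 pairwise alignments. The Python sketch below shows one simple way such all-versus-all work could be split across parallel workers identified by an MPI-style `rank` and `size`. The function names and the round-robin assignment are illustrative assumptions, not the scheme MAFFT actually uses.

```python
from collections.abc import Iterator
from math import comb


def pair_count(n: int) -> int:
    """Number of unordered sequence pairs, n(n-1)/2."""
    return comb(n, 2)


def pairs_for_rank(n: int, rank: int, size: int) -> Iterator[tuple[int, int]]:
    """Yield the (i, j) pairs with i < j assigned to worker `rank` of `size`.

    Hypothetical scheme: pairs are enumerated in a fixed order and dealt
    out round-robin, so worker loads differ by at most one pair.
    """
    if not 0 <= rank < size:
        raise ValueError("rank must satisfy 0 <= rank < size")
    k = 0  # running index of the pair in the global enumeration
    for i in range(n):
        for j in range(i + 1, n):
            if k % size == rank:
                yield (i, j)
            k += 1


if __name__ == "__main__":
    print(pair_count(50_000))             # 1249975000
    print(list(pairs_for_rank(4, 0, 2)))  # [(0, 1), (0, 3), (1, 3)]
```

In an actual MPI program each process would compute only its own pairs and the results would then be gathered; a contiguous block partition computed arithmetically would avoid the per-worker enumeration overhead of this round-robin version.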

Availability and implementation

This feature is available in MAFFT versions 7.355 or later at https://mafft.cbrc.jp/alignment/software/mpi.html.

Supplementary information

Supplementary data are available at Bioinformatics online.


Selected references


Application of the MAFFT sequence alignment program to large data—reexamination of the usefulness of chained guide trees

Kazunori Yamada, Kentaro Tomii, Kazutaka Katoh (2016)

Motivation: Large multiple sequence alignments (MSAs), consisting of thousands of sequences, are becoming more and more common, due to advances in sequencing technologies. The MAFFT MSA program has several options for building large MSAs, but their performances have not been sufficiently assessed yet, because realistic benchmarking of large MSAs has been difficult. Recently, such assessments have been made possible through the HomFam and ContTest benchmark protein datasets. Along with the development of these datasets, an interesting theory was proposed: chained guide trees increase the accuracy of MSAs of structurally conserved regions. This theory challenges the basis of progressive alignment methods and needs to be examined by being compared with other known methods including computationally intensive ones. Results: We used HomFam, ContTest and OXFam (an extended version of OXBench) to evaluate several methods enabled in MAFFT: (1) a progressive method with approximate guide trees, (2) a progressive method with chained guide trees, (3) a combination of an iterative refinement method and a progressive method and (4) a less approximate progressive method that uses a rigorous guide tree and consistency score. Other programs, Clustal Omega and UPP, available for large MSAs, were also included in the comparison. The effect of method 2 (chained guide trees) was positive in ContTest but negative in HomFam and OXFam. Methods 3 and 4 increased the benchmark scores more consistently than method 2 for the three datasets, suggesting that they are safer to use. Availability and Implementation: http://mafft.cbrc.jp/alignment/software/ Contact: katoh@ifrec.osaka-u.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.


OXBench: A benchmark for evaluation of protein multiple sequence alignment accuracy

GPS Raghava, Stephen MJ Searle, Patrick C Audley … (2003)

Background: The alignment of two or more protein sequences provides a powerful guide in the prediction of protein structure and in identifying key functional residues; however, the utility of any prediction is completely dependent on the accuracy of the alignment. In this paper we describe a suite of reference alignments derived from the comparison of protein three-dimensional structures, together with evaluation measures and software that allow automatically generated alignments to be benchmarked. We test the OXBench benchmark suite on alignments generated by the AMPS multiple alignment method, then apply the suite to compare eight different multiple alignment algorithms. The benchmark shows the current state of the art for alignment accuracy and provides a baseline against which new alignment algorithms may be judged. Results: The simple hierarchical multiple alignment algorithm, AMPS, performed as well as or better than more modern methods such as CLUSTALW once the PAM250 pair-score matrix was replaced by a BLOSUM series matrix. AMPS gave an accuracy in Structurally Conserved Regions (SCRs) of 89.9% over a set of 672 alignments. The T-COFFEE method on a data set of families with <8 sequences gave 91.4% accuracy, significantly better than CLUSTALW (88.9%) and all other methods considered here. The complete suite is available from . Conclusions: The OXBench suite of reference alignments, evaluation software and results database provides a convenient method to assess progress in sequence alignment techniques. Evaluation measures that were dependent on comparison to a reference alignment were found to give good discrimination between methods. The STAMP Sc score, which is independent of a reference alignment, also gave good discrimination. Application of OXBench in this paper shows that, with the exception of T-COFFEE, the majority of the improvement in alignment accuracy seen since 1985 stems from improved pair-score matrices rather than algorithmic refinements. The maximum theoretical alignment accuracy obtained by pooling results over all methods was 94.5%, with 52.5% accuracy for alignments in the 0–10 percentage identity range. This suggests that further improvements in accuracy will be possible in the future.


COFFEE: an objective function for multiple sequence alignments.

Cédric Notredame, Liisa Holm, Desmond G Higgins (1998)

In order to increase the accuracy of multiple sequence alignments, we designed a new strategy for optimizing multiple sequence alignments by genetic algorithm. We named it COFFEE (Consistency based Objective Function For alignmEnt Evaluation). The COFFEE score reflects the level of consistency between a multiple sequence alignment and a library containing pairwise alignments of the same sequences. We show that multiple sequence alignments can be optimized for their COFFEE score with the genetic algorithm package SAGA. The COFFEE function is tested on 11 test cases made of structural alignments extracted from 3D_ali. These alignments are compared to those produced using five alternative methods. Results indicate that COFFEE outperforms the other methods when the level of identity between the sequences is low. Accuracy is evaluated by comparison with the structural alignments used as references. We also show that the COFFEE score can be used as a reliability index on multiple sequence alignments. Finally, we show that given a library of structure-based pairwise sequence alignments extracted from FSSP, SAGA can produce high-quality multiple sequence alignments. The main advantage of COFFEE is its flexibility. With COFFEE, any method suitable for making pairwise alignments can be extended to making multiple alignments. The package is available along with the test cases through the WWW: http://www.ebi.ac.uk/cedric (contact: cedric.notredame@ebi.ac.uk)
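The consistency idea behind the COFFEE score in the abstract above can be sketched in a few lines of Python. This is a simplified, unweighted variant assumed here for illustration: it reports the fraction of residue pairs aligned in a multiple alignment that also appear in a library of pairwise alignments, and it omits the pair weights and exact normalization of the published function. Gapped rows use `-` for gaps, and the library is a hypothetical dict keyed by sequence-index pairs (i, j) with i < j; all function names are illustrative.

```python
from itertools import combinations

Pair = tuple[int, int]


def aligned_pairs(row_a: str, row_b: str) -> set[Pair]:
    """Residue-index pairs (pos_a, pos_b) that share a column in two gapped rows."""
    pairs: set[Pair] = set()
    pos_a = pos_b = 0  # ungapped residue positions in each sequence
    for ca, cb in zip(row_a, row_b):
        if ca != "-" and cb != "-":
            pairs.add((pos_a, pos_b))
        if ca != "-":
            pos_a += 1
        if cb != "-":
            pos_b += 1
    return pairs


def build_library(pairwise: dict[Pair, tuple[str, str]]) -> dict[Pair, set[Pair]]:
    """Turn gapped pairwise alignments into sets of aligned residue pairs."""
    return {key: aligned_pairs(a, b) for key, (a, b) in pairwise.items()}


def consistency_score(msa: list[str], library: dict[Pair, set[Pair]]) -> float:
    """Fraction of MSA residue pairs supported by the library (unweighted)."""
    supported = total = 0
    for i, j in combinations(range(len(msa)), 2):
        induced = aligned_pairs(msa[i], msa[j])  # pairwise alignment implied by the MSA
        total += len(induced)
        supported += len(induced & library.get((i, j), set()))
    return supported / total if total else 0.0


if __name__ == "__main__":
    msa = ["AC-GT", "ACTGT", "A-TGT"]
    library = build_library({
        (0, 1): ("AC-GT", "ACTGT"),
        (0, 2): ("ACGT-", "-ATGT"),  # disagrees with the MSA for this pair
        (1, 2): ("ACTGT", "A-TGT"),
    })
    print(consistency_score(msa, library))  # 8/11, about 0.727
```

Here the pairs (0, 1) and (1, 2) are fully supported (4 of 4 residue pairs each), while none of the 3 residue pairs the MSA implies for (0, 2) appear in the library, giving 8 of 11.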


Author and article information

Contributors

John Hancock (Associate Editor)

Journal

Journal ID (nlm-ta): Bioinformatics

Journal ID (iso-abbrev): Bioinformatics

Journal ID (publisher-id): bioinformatics

Title: Bioinformatics

Publisher: Oxford University Press

ISSN (Print): 1367-4803

ISSN (Electronic): 1367-4811

Publication date (Print): 15 July 2018

Publication date (Electronic): 01 March 2018

Publication date PMC-release: 01 March 2018

Volume: 34

Issue: 14

Pages: 2490-2492

Affiliations

[1 ]Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Chiba, Japan

[2 ]Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan

[3 ]Graduate School of Information Sciences, Tohoku University, Sendai, Japan

[4 ]Biotechnology Research Institute for Drug Discovery (BRD), AIST, Tokyo, Japan

[5 ]AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory (RWBC-OIL), Tokyo, Japan

[6 ]Research Institute for Microbial Diseases, Osaka University, Suita, Japan

Author notes

To whom correspondence should be addressed. katoh@ifrec.osaka-u.ac.jp

Author information

Kazutaka Katoh http://orcid.org/0000-0003-4133-8393

Article

Publisher ID: bty121

DOI: 10.1093/bioinformatics/bty121

PMC ID: 6041967

PubMed ID: 29506019

SO-VID: f926e007-87e5-4705-82fd-ffeab6f62d85

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

History

Date received : 19 October 2017

Date revision received : 07 February 2018

Date accepted : 28 February 2018

Page count

Pages: 3

Funding

Funded by: JSPS KAKENHI (10.13039/501100001691)

Award ID: JP16K07464, JP17J06457

Funded by: AMED (10.13039/100009619), Platform Project for Supporting Drug Discovery and Life Science Research

Award ID: JP17am0101110, JP17am0101108
