A comparison of common programming languages used in bioinformatics

Abstract

Background

The performance of different programming languages has previously been benchmarked using abstract mathematical algorithms, but not using standard bioinformatics algorithms. We compared the memory usage and execution speed of three standard bioinformatics methods, each implemented in six different programming languages. Programs for the Sellers algorithm, the Neighbor-Joining tree construction algorithm and a parser for BLAST output files were implemented in C, C++, C#, Java, Perl and Python.
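To make the first of these methods concrete, the following is a minimal Python sketch of the textbook form of Sellers' approximate string matching: a dynamic-programming edit distance in which a match may start at any position of the text. It is an illustration only, not the benchmarked implementation; the function name `sellers_best_match`, the unit edit costs and the returned `(distance, end)` pair are assumptions made for this example.

```python
from typing import Tuple


def sellers_best_match(pattern: str, text: str) -> Tuple[int, int]:
    """Return (distance, end) for the best approximate occurrence of pattern.

    distance is the smallest edit distance between pattern and any
    substring of text; end is the exclusive index in text where the
    first such best occurrence ends. Unit costs are an assumption.
    """
    m = len(pattern)
    # Column for the empty text prefix: i pattern characters are unmatched.
    prev = list(range(m + 1))
    best_dist, best_end = prev[m], 0

    for j, t_char in enumerate(text, start=1):
        # curr[0] stays 0: a match may begin at any text position for free.
        curr = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == t_char else 1
            curr[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra character in the text
                curr[i - 1] + 1,     # pattern character left unmatched
            )
        if curr[m] < best_dist:
            best_dist, best_end = curr[m], j
        prev = curr

    return best_dist, best_end


if __name__ == "__main__":
    print(sellers_best_match("ACGT", "TTACGTTT"))
```

Only two columns of the table are kept at a time, so memory grows with the pattern length rather than with the product of pattern and text lengths. Running time is still proportional to that product.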

Results

Implementations in C and C++ were the fastest and used the least memory. However, programs in these languages generally contained more lines of code. Java and C# appeared to be a compromise between the flexibility of Perl and Python and the fast performance of C and C++. The relative performance of the tested languages did not change from Windows to Linux, and neither operating system was clearly faster.
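As a rough illustration of how execution time and memory use can be recorded for a single program, the sketch below times a function call and reports its peak allocation using Python's standard `time` and `tracemalloc` modules. It is not the measurement procedure used in the benchmark, which the abstract does not describe. The helper `measure` and the toy workload `build_table` are hypothetical names introduced for this example.

```python
import time
import tracemalloc
from typing import Any, Callable, Tuple


def measure(func: Callable[..., Any], *args: Any) -> Tuple[Any, float, int]:
    """Run func(*args); return (result, elapsed seconds, peak bytes).

    Peak bytes covers only memory allocated through Python's allocator
    while tracing is active, not the whole process footprint.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def build_table(n: int) -> int:
    """Hypothetical workload: fill an n x n table, sum its last column."""
    table = [[i * j for j in range(n)] for i in range(n)]
    return sum(row[-1] for row in table)


if __name__ == "__main__":
    result, seconds, peak_bytes = measure(build_table, 200)
    print(f"result={result} time={seconds:.4f}s peak={peak_bytes} bytes")
```

Tracing allocations slows the program down, so timings taken this way are best compared only with other timings taken the same way. A comparison across languages would instead need a measurement made from outside the process.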

Source code and additional information are available from http://www.bioinformatics.org/benchmark/

Conclusion

This benchmark provides a comparison of six commonly used programming languages under two different operating systems. The overall comparison shows that a developer should choose an appropriate language carefully, taking into account the performance expected and the library availability for each language.

Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central

ISSN (Electronic): 1471-2105

Publication date (Collection): 2008

Publication date (Electronic): 5 February 2008

Volume: 9

Page: 82

Affiliations

[1] Department of Biological Sciences, Macquarie University, Sydney, NSW 2109, Australia

Article

Publisher ID: 1471-2105-9-82

DOI: 10.1186/1471-2105-9-82

PMC ID: 2267699

PubMed ID: 18251993

SO-VID: b2291d9b-e9f1-47d6-924e-35ef486ccf13

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 4 October 2007

Date accepted: 5 February 2008
