A high level interface to SCOP and ASTRAL implemented in Python

Abstract

Background

Benchmarking algorithms in structural bioinformatics often involves constructing datasets of proteins with given sequence and structural properties. The SCOP database is a manually curated structural classification that groups proteins on the basis of structural similarity. The ASTRAL compendium provides non-redundant subsets of SCOP domains, chosen so that no two domains in a given subset share more than a defined degree of sequence similarity. Taken together, these two resources provide a 'ground truth' for assessing structural bioinformatics algorithms. We present a small, easy-to-use API written in Python to enable construction of datasets from these resources.
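To make the non-redundancy property concrete, the following is a minimal sketch of a greedy filter that keeps a domain only if it stays at or below an identity threshold against every domain already kept. It is not ASTRAL's actual selection procedure: the function names, the toy identity measure (position-wise matches, assuming pre-aligned sequences), the domain identifiers, and the sequences are all assumptions made for illustration.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Toy identity measure: matching positions over the shorter length.

    Assumes the two sequences are already aligned position by position;
    a real pipeline would derive identity from a proper alignment.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / n


def nonredundant_subset(domains: Dict[str, str], max_identity: float) -> List[str]:
    """Greedily keep domains so that no kept pair exceeds max_identity percent."""
    kept: List[str] = []
    for name, seq in domains.items():  # insertion order decides which representative wins
        if all(percent_identity(seq, domains[k]) <= max_identity for k in kept):
            kept.append(name)
    return kept


# Hypothetical domain identifiers and made-up sequences, for illustration only.
toy_domains = {
    "dom_a": "MKVLAAGIVG",
    "dom_b": "MKVLAAGIVA",  # differs from dom_a at one position out of ten
    "dom_c": "QWERTYPSDF",  # shares no positions with dom_a or dom_b
}

subset_40 = nonredundant_subset(toy_domains, max_identity=40.0)
subset_95 = nonredundant_subset(toy_domains, max_identity=95.0)
```

With a strict threshold the near-duplicate is dropped, while a permissive threshold keeps all three toy domains. In a greedy scheme like this one, the order in which domains are visited determines which member of a similar group is retained.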

Results

We have designed a set of Python modules to provide an abstraction of the SCOP and ASTRAL databases. The modules are designed to work as part of the Biopython distribution. Python users can now manipulate and use the SCOP hierarchy from within Python programs, and use ASTRAL to return the sequences of SCOP domains as well as ASTRAL's clustered representations of SCOP.
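As a rough illustration of what working with the SCOP hierarchy from Python can look like, the sketch below groups domains by successive levels of their SCOP classification string (class.fold.superfamily.family). It uses only the standard library and is not the API of the modules described here: the `Domain` class, the function names, the assumed tab-separated record layout (domain identifier, PDB identifier, region, classification string), and the sample records are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Domain:
    sid: str     # domain identifier
    pdb_id: str  # PDB entry the domain comes from
    region: str  # chain or residue range within the entry
    sccs: str    # classification string: class.fold.superfamily.family


def parse_record(line: str) -> Domain:
    """Parse one tab-separated record (layout assumed for this sketch)."""
    sid, pdb_id, region, sccs = line.rstrip("\n").split("\t")[:4]
    return Domain(sid, pdb_id, region, sccs)


def group_by_level(domains: Iterable[Domain], depth: int) -> Dict[str, List[str]]:
    """Group domain identifiers by the first `depth` parts of the classification.

    depth=1 groups by class, 2 by fold, 3 by superfamily, 4 by family.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for dom in domains:
        key = ".".join(dom.sccs.split(".")[:depth])
        groups[key].append(dom.sid)
    return dict(groups)


# Made-up records for illustration; these are not real SCOP entries.
SAMPLE = [
    "d9abca_\t9abc\tA:\ta.1.1.1",
    "d9abcb_\t9abc\tB:\ta.1.1.2",
    "d9xyza_\t9xyz\tA:\tb.2.1.1",
]

domains = [parse_record(line) for line in SAMPLE]
by_class = group_by_level(domains, depth=1)
by_family = group_by_level(domains, depth=4)
```

Here `by_class` places the first two toy domains together under class `a`, while `by_family` separates them into `a.1.1.1` and `a.1.1.2`. Selecting one representative per group at a chosen level is one simple way to build a benchmark dataset from such a hierarchy.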

Conclusion

The modules make the analysis and generation of datasets for use in structural genomics easier and more principled.

Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central (London)

ISSN (Electronic): 1471-2105

Publication date (Collection): 2006

Publication date (Electronic): 10 January 2006

Volume: 7

Page: 10

Affiliations

[1] Bioinformatics, Institute of Cell and Molecular Science, School of Medicine and Dentistry, Queen Mary, University of London, Charterhouse Square, London EC1 6BQ, UK

[2] Department of Plant and Microbial Biology, University of California, Berkeley, USA

Article

Publisher ID: 1471-2105-7-10

DOI: 10.1186/1471-2105-7-10

PMC ID: 1373603

PubMed ID: 16403221

SO-VID: daa07c34-ba5d-4f6c-bd6e-60783ab399fb

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 17 June 2005

Date accepted: 10 January 2006
