Abstract
The PANTHER database was designed for high-throughput analysis of protein sequences.
One of the key features is a simplified ontology of protein function, which allows
browsing of the database by biological functions. Biologist curators have associated
the ontology terms with groups of protein sequences rather than individual sequences.
Statistical models (Hidden Markov Models, or HMMs) are built from each of these groups.
The advantage of this approach is that new sequences can be automatically classified
as they become available. To ensure accurate functional classification, HMMs are constructed
not only for families, but also for functionally distinct subfamilies. Multiple sequence
alignments and phylogenetic trees, including curator-assigned information, are available
for each family. The current version of the PANTHER database includes training sequences
from all organisms in the GenBank non-redundant protein database, and the HMMs have
been used to classify gene products across the entire human and Drosophila
melanogaster genomes. The ontology terms and protein families and subfamilies, as well as
Drosophila gene classifications, can be browsed and searched for free. Due to outstanding
contractual obligations, access to human gene classifications and to protein family
trees and multiple sequence alignments will temporarily require a nominal registration
fee. PANTHER is publicly available on the web at http://panther.celera.com.
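
The classification idea described above (build a statistical model from each curated group of sequences, then assign a new sequence to the best-scoring model) can be illustrated with a minimal sketch. It is not PANTHER's implementation: it swaps the profile HMM for a much simpler position-specific log-odds profile with no insert or delete states, and it assumes gap-free alignments of equal length and a uniform background distribution. The group names (FAMILY_X, SF1, SF2), the toy sequences, the pseudocount, and the score threshold are all hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids
BACKGROUND = 1.0 / len(ALPHABET)    # assumption: uniform background frequency


def build_profile(aligned_seqs, pseudocount=0.1):
    """Build per-column log-odds scores from gap-free, equal-length sequences.

    This is a simplified stand-in for training a profile HMM on a group.
    """
    length = len(aligned_seqs[0])
    total = len(aligned_seqs) + pseudocount * len(ALPHABET)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned_seqs)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / BACKGROUND)
            for aa in ALPHABET
        })
    return profile


def score(profile, seq):
    """Sum the log-odds of each residue; sequences of the wrong length cannot match."""
    if len(seq) != len(profile):
        return float("-inf")
    # Residues outside ALPHABET contribute a neutral score of 0.0.
    return sum(col.get(aa, 0.0) for col, aa in zip(profile, seq))


def classify(models, seq, threshold=0.0):
    """Return the name of the best-scoring model, or None if no score exceeds the threshold."""
    best_name, best_score = max(
        ((name, score(profile, seq)) for name, profile in models.items()),
        key=lambda item: item[1],
    )
    return best_name if best_score > threshold else None


# Hypothetical curated groups: one family and its two functionally distinct subfamilies.
SF1 = ["ACDEF", "ACDEY"]
SF2 = ["GHIKF", "GHIKY"]
models = {
    "FAMILY_X": build_profile(SF1 + SF2),
    "FAMILY_X:SF1": build_profile(SF1),
    "FAMILY_X:SF2": build_profile(SF2),
}

for query in ("ACDEF", "GHIKY", "WWWWW"):
    print(query, classify(models, query))
```

In this toy setup a sequence that matches a subfamily scores higher against that subfamily's model than against the pooled family model, which is the intuition behind building models for subfamilies as well as families; a sequence resembling neither group falls below the threshold and is left unclassified.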