MISS-Prot: web server for self/non-self discrimination of protein residue networks in parasites; theory and experiments in Fasciola peptides and Anisakis allergens
Abstract
Infections caused by human parasites (HPs) affect the poorest 500 million people worldwide,
but chemotherapy has become expensive, toxic, and/or less effective due to drug resistance.
In addition, many 3D structures in the Protein Data Bank (PDB) remain without functional
annotation. We need theoretical models to quickly predict biologically relevant Parasite
Self Proteins (PSPs): proteins that are expressed differentially in a given parasite, are
dissimilar to proteins expressed in other parasites, and have a high probability of
becoming new vaccines (unique sequence) or drug targets (unique 3D structure). We present
herein a model for PSPs in eight different HPs (Ascaris, Entamoeba, Fasciola, Giardia,
Leishmania, Plasmodium, Trypanosoma, and Toxoplasma) with 90% accuracy for 15 341
training and validation cases. The model combines protein residue networks, Markov
Chain Models (MCM) and Artificial Neural Networks (ANN). The input parameters are
the spectral moments of the Markov transition matrix for electrostatic interactions
associated with the protein residue complex network calculated with the MARCH-INSIDE
software. We implemented this model in a new web-server called MISS-Prot (MARCH-INSIDE
Scores for Self-Proteins). MISS-Prot was programmed using PHP/HTML/Python and MARCH-INSIDE
routines and is freely available at: . This server is easy to use by non-experts in
Bioinformatics who can carry out automatic online upload and prediction with 3D structures
deposited at PDB (mode 1). The server can also analyze the outcomes of Peptide Mass
Fingerprinting (PMF) and MS/MS for query proteins with unknown 3D structures (mode 2). We
illustrated the use of MISS-Prot in experimental and/or theoretical studies of peptides from
Fasciola hepatica cathepsin proteases or present in 10 Anisakis simplex allergens (Ani s 1
to Ani s 10). In doing so, we combined electrophoresis (1DE), MALDI-TOF Mass Spectroscopy,
and MASCOT to seek sequences, Molecular Mechanics + Molecular Dynamics (MM/MD) to
generate 3D structures and MISS-Prot to predict PSP scores. MISS-Prot also allows
the prediction of PSP proteins in 16 additional species including parasite hosts,
fungi pathogens, disease transmission vectors, and biotechnologically relevant organisms.
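The abstract names the model inputs as spectral moments of a Markov transition matrix defined on the protein residue network. The sketch below illustrates only the general idea of such descriptors and is not the MARCH-INSIDE implementation: the function names `markov_matrix` and `spectral_moments`, the 7.0 Å distance cutoff, and the use of a single positive per-residue weight as a stand-in for the electrostatic term are all assumptions made for illustration. It takes the k-th spectral moment to be the trace of the k-th power of the row-stochastic matrix.

```python
import numpy as np


def markov_matrix(coords, weights, cutoff=7.0):
    """Build a row-stochastic matrix from residue coordinates.

    coords  : (n, 3) array of residue positions (e.g. C-alpha atoms).
    weights : (n,) positive per-residue weights; a hypothetical stand-in
              for the electrostatic term used by the real software.
    cutoff  : contact distance in angstroms (assumed value).
    """
    coords = np.asarray(coords, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Pairwise Euclidean distances between residues.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Residues within the cutoff are connected; self-loops are included
    # because the distance of a residue to itself is zero.
    adjacency = (dist <= cutoff).astype(float)
    # Weight each edge by the target residue, then normalize every row
    # so that it sums to one (a transition probability distribution).
    raw = adjacency * weights[None, :]
    return raw / raw.sum(axis=1, keepdims=True)


def spectral_moments(matrix, max_order=5):
    """Return [Tr(P^0), Tr(P^1), ..., Tr(P^max_order)]."""
    moments = []
    power = np.eye(matrix.shape[0])
    for _ in range(max_order + 1):
        moments.append(float(np.trace(power)))
        power = power @ matrix
    return moments


if __name__ == "__main__":
    # Toy three-residue chain with unit spacing and equal weights.
    toy_coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
    toy_weights = [1.0, 1.0, 1.0]
    P = markov_matrix(toy_coords, toy_weights, cutoff=1.5)
    print(spectral_moments(P, max_order=3))
```

In a workflow like the one the abstract describes, a vector of such moments would be computed per protein and passed to a classifier such as an ANN. The classifier is omitted here because the abstract does not give its architecture.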