Abstract
Background
The accumulation of somatic mutations is a common feature of all cancer genomes. These
mutations include small insertions, chromosomal rearrangements and nucleotide
substitutions, among other patterns of mutagenesis. Consequently, mutated genomes produce
mutant transcriptomes and, in turn, mutant proteins that give cancer cells their
oncogenic properties [1]. Identifying such mutated proteins by mass spectrometry-based
shotgun proteomics is generally difficult, however, because identification depends on
sequence databases that contain either normal proteins only or a hybrid of normal and
mutated proteins. Here, we present 'onco-proteogenomics', a novel proteogenomics
approach to identify cancer-related peptides (both phosphorylated and non-phosphorylated)
and proteins.
Methods
As a test sample, we analyzed HeLa S3 cells by shotgun proteomics and phosphoproteomics
(15 MS/MS runs). The resulting data were analyzed with an extended version of MSSS
(MS Spectra Sequential Subtraction), the proteogenomic approach we previously used to
identify novel genomic features in rice [2]. In our onco-proteogenomic approach, we
searched four databases of normal sequences (human protein, cDNA, mRNA and genome
databases) with Mascot and removed all MS/MS spectra corresponding to identified
peptides. The remaining MS/MS spectra were then searched against a cancer-derived
database, built from deep-sequencing data of HeLa S3 cells, to identify cancer-specific
peptides.
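The sequential subtraction step can be sketched as follows. This is a minimal, hypothetical illustration, not the MSSS implementation: a real Mascot search cannot be reproduced here, so `toy_search` stands in for it by looking up a peptide pre-assigned to each spectrum, and `Spectrum`, `sequential_subtraction` and all example peptides are invented for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Set, Tuple


@dataclass(frozen=True)
class Spectrum:
    # Hypothetical minimal record. Real MS/MS spectra carry precursor m/z,
    # charge and peak lists; here each spectrum simply carries the peptide
    # a search engine would assign to it, so the toy search is deterministic.
    spectrum_id: str
    peptide: str


# A search maps a batch of spectra to {spectrum_id: identified peptide}.
SearchFn = Callable[[Sequence[Spectrum]], Dict[str, str]]


def toy_search(reference_peptides: Set[str]) -> SearchFn:
    """Hypothetical stand-in for a Mascot search against one database."""
    refs = set(reference_peptides)

    def search(spectra: Sequence[Spectrum]) -> Dict[str, str]:
        return {s.spectrum_id: s.peptide for s in spectra if s.peptide in refs}

    return search


def sequential_subtraction(
    spectra: Sequence[Spectrum],
    searches: List[Tuple[str, SearchFn]],
) -> Tuple[List[Tuple[str, Dict[str, str]]], List[Spectrum]]:
    """Search databases in order, removing identified spectra after each one."""
    remaining = list(spectra)
    hits_per_db = []
    for db_name, search in searches:
        hits = search(remaining)
        hits_per_db.append((db_name, hits))
        # Only spectra left unexplained are passed on to the next database.
        remaining = [s for s in remaining if s.spectrum_id not in hits]
    return hits_per_db, remaining


if __name__ == "__main__":
    spectra = [Spectrum("s1", "PEPTIDEK"), Spectrum("s2", "LVNELTEFAK"),
               Spectrum("s3", "VLSAGTEPK"), Spectrum("s4", "NQISEK")]
    # Normal databases first (protein, cDNA, mRNA, genome), as in the text.
    normal = [("protein", toy_search({"PEPTIDEK"})),
              ("cDNA", toy_search({"LVNELTEFAK"})),
              ("mRNA", toy_search(set())),
              ("genome", toy_search(set()))]
    _, leftover = sequential_subtraction(spectra, normal)
    # The remainder is then searched against the cancer-derived database.
    cancer_hits, unexplained = sequential_subtraction(
        leftover, [("HeLa_S3_transcriptome", toy_search({"VLSAGTEPK"}))])
    print(cancer_hits)  # -> [('HeLa_S3_transcriptome', {'s3': 'VLSAGTEPK'})]
    print([s.spectrum_id for s in unexplained])  # -> ['s4']
```

Because each search only sees spectra left unexplained by the previous ones, the order of `searches` matters: the normal databases come first, so only spectra they cannot explain reach the cancer-derived database.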
Results
The four databases of normal sequences were searched sequentially to identify all
potential peptide sequences and phosphorylation sites that can be generated from the
normal genome. These include potential protein sequences, junction peptides and
exon-skipping peptides (protein and cDNA databases), exonic peptides (mRNA database)
and extragenic peptides (genome database). After each Mascot search, we removed all
MS/MS spectra corresponding to the identified peptide sequences and wrote the remaining
MS/MS spectra to new files. Next, we constructed a HeLa S3 transcriptome database from
deep-sequencing data of HeLa S3 cells (obtained from the NCBI UniGene database); it
contains over 60,000 entries. We then searched the remaining unidentified MS/MS spectra
against this transcriptome database with Mascot and identified 25 cancer-specific
peptides, some of which carry phosphorylation sites. As a further check, we aligned the
identified peptides against the normal databases used above with NCBI BLAST. The
alignments showed no significant matches, indicating that these peptides are
specifically expressed in the HeLa S3 cancer cell line. In future studies, we will apply
the same approach to other cancers, with the aim of identifying global cancer biomarkers
and drug targets (Figure 1).
Figure 1
Analysis flowchart and future work.