      Sequence-based prediction of protein protein interaction using a deep-learning algorithm

          Abstract

          Background

          Protein-protein interactions (PPIs) are critical for many biological processes. It is therefore important to develop accurate high-throughput methods for identifying PPIs in order to better understand protein function, disease occurrence, and therapy design. Although various computational methods for predicting PPIs have been developed, their robustness on external datasets is unknown. Deep-learning algorithms have achieved successful results in diverse areas, but their effectiveness for PPI prediction has not been tested.

          Results

          We used a stacked autoencoder, a type of deep-learning algorithm, to study sequence-based PPI prediction. The best model achieved an average accuracy of 97.19% with 10-fold cross-validation. The prediction accuracies on various external datasets ranged from 87.99% to 99.21%, which are superior to those achieved with previous methods.
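
          As a rough illustration of the 10-fold cross-validation protocol mentioned above, the sketch below splits a dataset into ten folds, trains on nine, tests on the held-out fold, and averages the per-fold accuracies. The threshold classifier, the toy one-dimensional data, and all function names are assumptions made for illustration only; they are not the stacked autoencoder, the sequence features, or the datasets used in the article.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validated_accuracy(features, labels, fit, predict, k=10, seed=0):
    """Average the held-out accuracy over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(labels), k, seed):
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct = sum(predict(model, features[i]) == labels[i]
                      for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical stand-in classifier: a threshold halfway between class means.
def fit_threshold(xs, ys):
    neg = [x for x, y in zip(xs, ys) if y == 0]
    pos = [x for x, y in zip(xs, ys) if y == 1]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Toy, perfectly separable data (not real PPI features).
features = [i / 100 for i in range(50)] + [1 + i / 100 for i in range(50)]
labels = [0] * 50 + [1] * 50

mean_accuracy = cross_validated_accuracy(
    features, labels, fit_threshold, predict_threshold, k=10)
print(f"mean 10-fold accuracy: {mean_accuracy:.4f}")
```

          Because the toy classes do not overlap, every held-out fold is classified correctly here. With a real model, the same loop would simply take different `fit` and `predict` callables, and the reported figure would be the mean of the ten per-fold accuracies.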

          Conclusions

          To our knowledge, this research is the first to apply a deep-learning algorithm to sequence-based PPI prediction, and the results demonstrate its potential in this field.

          Electronic supplementary material

          The online version of this article (doi:10.1186/s12859-017-1700-2) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                jfpei@pku.edu.cn
                Journal
                BMC Bioinformatics
                BioMed Central (London)
                ISSN: 1471-2105
                Published: 25 May 2017
                Volume: 18
                Article number: 277
                Affiliations
                [1] ISNI 0000 0001 2256 9319, GRID grid.11135.37, Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871 China
                [2] ISNI 0000 0001 2256 9319, GRID grid.11135.37, Beijing National Laboratory for Molecular Science, State Key Laboratory for Structural Chemistry of Unstable and Stable Species, College of Chemistry and Molecular Engineering, Peking University, Beijing, 100871 China
                [3] ISNI 0000 0001 2256 9319, GRID grid.11135.37, Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871 China
                Author information
                http://orcid.org/0000-0002-8482-1185
                Article
                1700
                DOI: 10.1186/s12859-017-1700-2
                PMC ID: 5445391
                PMID: 28545462
                4dc177f5-dd6d-4e06-ac4e-072a893fdba7
                © The Author(s). 2017

                Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 6 March 2017
                Accepted: 18 May 2017
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/501100002855, Ministry of Science and Technology of the People's Republic of China;
                Award ID: 2016YFA0502303
                Funded by: FundRef http://dx.doi.org/10.13039/501100001809, National Natural Science Foundation of China;
                Award ID: 21673010
                Award ID: 81273436
                Categories
                Research Article
                Bioinformatics & Computational biology
                deep learning, protein-protein interaction
