      DeepBindPoc: a deep learning method to rank ligand binding pockets using molecular vector representation

      research-article

          Abstract

          Accurate identification of ligand-binding pockets in a protein is important for structure-based drug design. In recent years, several deep learning models have been developed to learn important physical–chemical and spatial information for predicting ligand-binding pockets in a protein. However, ranking the native ligand-binding pockets from a pool of predicted pockets is still a hard task for computational molecular biologists using a single web-based tool. Hence, we believe that an enhanced model for identifying accurate pockets can be obtained by training on a data set closer to real applications and by providing ligand information. In this article, we propose a new deep learning method called DeepBindPoc for identifying and ranking ligand-binding pockets in proteins. The model is built using information about the binding pocket and its associated ligand. We take advantage of the mol2vec tool to represent both the given ligand and the pocket as vectors, which serve as input to a densely fully connected layer model. During training, important features for pocket-ligand binding are automatically extracted and high-level information is preserved appropriately. DeepBindPoc demonstrated a strong complementary advantage for the detection of native-like pockets when combined with popular traditional methods, such as fpocket and P2Rank. The proposed method is extensively tested and validated with standard procedures on multiple datasets, including a dataset of G-protein-coupled receptors. The systematic testing and validation of our method suggest that DeepBindPoc is a valuable tool for ranking near-native pockets in theoretically modeled proteins that have an unknown experimental active site but a known ligand. The DeepBindPoc model described in this article is available on GitHub (https://github.com/haiping1010/DeepBindPoc) and the webserver is available at http://cbblab.siat.ac.cn/DeepBindPoc/index.php.
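
The ranking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a ligand vector and a pocket vector are already available (for example from a mol2vec-style embedding), concatenates them, and passes them through a small fully connected network that outputs a score between 0 and 1. The vector dimension, layer sizes, pocket names, and the randomly initialized (untrained) weights are all hypothetical placeholders chosen for illustration.

```python
import math
import random


def relu(v):
    return max(0.0, v)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def dense(x, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [
        activation(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(rng, n_in, n_out):
    """Random, untrained weights; a real model would learn these."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def score(ligand_vec, pocket_vec, layers):
    """Score one ligand-pocket pair from the concatenated vectors."""
    x = list(ligand_vec) + list(pocket_vec)
    for weights, biases in layers[:-1]:
        x = dense(x, weights, biases, relu)
    weights, biases = layers[-1]
    return dense(x, weights, biases, sigmoid)[0]


def rank_pockets(ligand_vec, pockets, layers):
    """Return (pocket name, score) pairs sorted from highest to lowest score."""
    scored = [(name, score(ligand_vec, vec, layers)) for name, vec in pockets.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


rng = random.Random(0)
DIM = 8  # hypothetical embedding size, chosen only to keep the example small

# Two layers: (ligand + pocket) -> 16 hidden units -> 1 output score.
layers = [init_layer(rng, 2 * DIM, 16), init_layer(rng, 16, 1)]

# Stand-ins for embedding vectors; real vectors would come from an embedding tool.
ligand = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
pockets = {
    f"pocket_{i}": [rng.gauss(0.0, 1.0) for _ in range(DIM)] for i in range(1, 4)
}

ranking = rank_pockets(ligand, pockets, layers)
for name, value in ranking:
    print(f"{name}: {value:.4f}")
```

In practice the candidate pockets could come from a geometric predictor such as fpocket or P2Rank, with a trained network re-ranking them for a given ligand. With untrained weights, as here, the resulting order carries no meaning.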

                Author and article information

                Contributors
                Journal
                PeerJ
                PeerJ Inc. (San Diego, USA )
                2167-8359
                6 April 2020
                2020; 8: e8864
                Affiliations
                [1 ]Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences , Shenzhen, Guangdong Province, China
                [2 ]College of Software Technology, Zhejiang University , Zhejiang Province, Zhejiang, China
                [3 ]School of Biological Sciences, Nanyang Technological University , Singapore, Singapore
                [4 ]Shenzhen Children’s Hospital , Shenzhen, Guangdong Province, China
                Author information
                http://orcid.org/0000-0002-5541-234X
                http://orcid.org/0000-0002-6103-0700
                http://orcid.org/0000-0002-6564-5059
                Article
                8864
                10.7717/peerj.8864
                7144620
                bd928ec0-3df9-42e1-89dc-5343f0b1b662
                © 2020 Zhang et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

                History
                Received: 13 December 2019
                Accepted: 8 March 2020
                Funding
                Funded by: National Key Research and Development Program of China
                Award ID: 2018YFB0204403 and 2016YFB0201305
                Funded by: Shenzhen Basic Research Fund
                Award ID: JCYJ20180507182818013, GGFW2017073114031767 and JCYJ20170413093358429
                Funded by: National Science Foundation of China
                Award ID: U1435215 and 61433012
                Funded by: National Natural Youth Science Foundation of China
                Award ID: 31601028
                Funded by: China Postdoctoral Science Foundation
                Award ID: 2019M653132
                Funded by: CAS Key Lab
                Award ID: 2011DP173015
                Funded by: Shenzhen Discipline Construction Project for Urban Computing and Data Intelligence
                Funded by: Youth Innovation Promotion Association
                This work was supported by the National Key Research and Development Program of China under grant nos. 2018YFB0204403 and 2016YFB0201305; the Shenzhen Basic Research Fund under grant nos. JCYJ20180507182818013, GGFW2017073114031767 and JCYJ20170413093358429; the National Science Foundation of China under grant nos. U1435215 and 61433012; the National Natural Youth Science Foundation of China (grant no. 31601028); the China Postdoctoral Science Foundation (grant no. 2019M653132); and the CAS Key Lab under grant no. 2011DP173015. This work was also supported by the Shenzhen Discipline Construction Project for Urban Computing and Data Intelligence and by the Youth Innovation Promotion Association, CAS, to Yanjie Wei. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Bioinformatics
                Computational Biology
                Molecular Biology
                Computational Science
                Data Mining and Machine Learning

                Keywords: ligand pocket identification, deep neural network, mol2vec, densely fully connected neural network, protein–ligand interactions
