      Deep learning-based multi-functional therapeutic peptides prediction with a multi-label focal dice loss function

      research-article


          Abstract

          Motivation

          With the large number of peptide sequences produced in the postgenomic era, it is highly desirable to identify the various functions of therapeutic peptides quickly. However, accurately predicting multi-functional therapeutic peptides (MFTP) with sequence-based computational tools remains a great challenge.

          Results

          Here, we propose a novel multi-label-based method, named ETFC, to predict 21 categories of therapeutic peptides. The method uses a deep learning-based model architecture consisting of four blocks: embedding, text convolutional neural network, feed-forward network, and classification. It also adopts an imbalanced learning strategy with a novel multi-label focal dice loss function, which is applied to address the inherent imbalance problem in the multi-label dataset and achieve competitive performance. The experimental results show that the ETFC method is significantly better than the existing methods for MFTP prediction. With the established framework, we use teacher–student-based knowledge distillation to obtain the attention weights from the self-attention mechanism in MFTP prediction and quantify their contributions toward each of the investigated activities.
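The abstract names a multi-label focal dice loss but does not give its formula. The sketch below is therefore only an illustration of the general idea, not the authors' definition: it computes a soft Dice coefficient per label across a batch and applies a focal-style exponent of the form used in focal Tversky loss. The function name, the `gamma` and `eps` parameters, and the averaging over labels are all assumptions made for this example.

```python
from typing import List


def multilabel_focal_dice_loss(
    probs: List[List[float]],
    targets: List[List[float]],
    gamma: float = 2.0,   # assumed focal exponent, not taken from the paper
    eps: float = 1e-7,    # assumed smoothing term to avoid division by zero
) -> float:
    """Illustrative per-label soft Dice loss with a focal exponent.

    probs[i][c]   : predicted probability that sample i has label c
    targets[i][c] : 1.0 if sample i has label c, else 0.0
    """
    n_labels = len(probs[0])
    total = 0.0
    for c in range(n_labels):
        p = [row[c] for row in probs]
        y = [row[c] for row in targets]
        # Soft Dice coefficient for label c, computed over the batch.
        intersection = sum(pi * yi for pi, yi in zip(p, y))
        denom = sum(pi * pi for pi in p) + sum(yi * yi for yi in y)
        dice = (2.0 * intersection + eps) / (denom + eps)
        # Focal-style exponent applied to the per-label Dice loss.
        total += max(0.0, 1.0 - dice) ** (1.0 / gamma)
    # Average so that every label contributes equally, however rare it is.
    return total / n_labels


if __name__ == "__main__":
    y_true = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    y_good = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.0]]
    y_bad = [[0.1, 0.9, 0.0], [0.8, 0.2, 0.0]]
    print(multilabel_focal_dice_loss(y_good, y_true))
    print(multilabel_focal_dice_loss(y_bad, y_true))
```

Because the Dice term is computed separately for each label and then averaged, a rare category carries the same weight as a common one, which is one common way to handle label imbalance. The repository linked under "Availability and implementation" is the place to check the formulation actually used.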

          Availability and implementation

          The source code and dataset are available via: https://github.com/xialab-ahu/ETFC.


                Author and article information

                Contributors
                Role: Associate Editor
                Journal
                Bioinformatics
                Oxford University Press
                ISSN (print): 1367-4803
                ISSN (electronic): 1367-4811
                June 2023
                22 May 2023
                Volume: 39
                Issue: 6
                Article ID: btad334
                Affiliations
                Information Materials and Intelligent Sensing Laboratory of Anhui Province and Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University , Hefei, Anhui 230601, China
                Author notes
                Corresponding authors. Information Materials and Intelligent Sensing Laboratory of Anhui Province and Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China. E-mails: ynbin@ahu.edu.cn (Y.B.) and jfxia@ahu.edu.cn (J.X.)
                Author information
                https://orcid.org/0000-0001-6122-5930
                https://orcid.org/0000-0003-3024-1705
                Article
                btad334
                DOI: 10.1093/bioinformatics/btad334
                PMCID: PMC10234765
                PMID: 37216900
                © The Author(s) 2023. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                09 December 2022
                12 May 2023
                16 May 2023
                19 May 2023
                01 June 2023
                Page count
                Pages: 10
                Funding
                Funded by: National Natural Science Foundation of China, DOI 10.13039/501100001809;
                Award ID: 62272004
                Award ID: 11835014
                Funded by: National Key Research and Development Program of China, DOI 10.13039/501100012166;
                Award ID: 2020YFA0908700
                Categories
                Original Paper
                Sequence Analysis
                AcademicSubjects/SCI01060

                Bioinformatics & Computational biology
