ToxIBTL: prediction of peptide toxicity based on information bottleneck and transfer learning

Abstract

Motivation

Recently, peptides have emerged as a promising class of pharmaceuticals for treating various diseases, positioned between traditional small-molecule drugs and therapeutic proteins. However, one of the key bottlenecks preventing their therapeutic use is their toxicity toward human cells, and few of the available toxicity prediction algorithms are specifically designed for short peptides.

Results

We present ToxIBTL, a novel deep learning framework that uses the information bottleneck principle and transfer learning to predict the toxicity of peptides as well as proteins. Specifically, we use evolutionary information and physicochemical properties of peptide sequences and integrate the information bottleneck principle into a feature representation learning scheme, so that relevant information is retained and redundant information is minimized in the learned features. Moreover, transfer learning is introduced to transfer the common knowledge contained in proteins to peptides, with the aim of improving the feature representation capability. Extensive experimental results demonstrate that ToxIBTL not only achieves higher prediction performance than state-of-the-art methods on the peptide dataset but also performs competitively on the protein dataset. Furthermore, a user-friendly online web server has been established as an implementation of ToxIBTL.
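
The abstract does not state the exact training objective. As a rough illustration of how an information bottleneck term is commonly turned into a trainable loss, the sketch below uses the variational formulation: a cross-entropy term that keeps label-relevant information, plus a beta-weighted KL penalty that compresses a Gaussian latent code. The function names, the Gaussian encoder assumption, and the default value of `beta` are illustrative assumptions, not details taken from ToxIBTL.

```python
import math


def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


def binary_cross_entropy(prob_toxic, label, eps=1e-12):
    """Cross-entropy for a single toxic (1) / non-toxic (0) label."""
    p = min(max(prob_toxic, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def vib_loss(mu, log_var, prob_toxic, label, beta=1e-3):
    """Variational information bottleneck loss for one sequence (assumed form).

    The cross-entropy term rewards features that predict the label;
    the KL term penalizes information kept in the latent code, and
    beta (a hypothetical default here) trades the two off.
    """
    relevance = binary_cross_entropy(prob_toxic, label)
    compression = gaussian_kl_to_standard_normal(mu, log_var)
    return relevance + beta * compression


# Hypothetical encoder outputs for one peptide with a 2-dimensional latent code.
loss = vib_loss(mu=[1.0, 0.0], log_var=[0.0, 0.0], prob_toxic=0.9, label=1)
```

A larger `beta` pushes the latent code toward the prior and discards more input detail, while a smaller `beta` favors predictive accuracy; the value actually used by the authors is not given in the abstract.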

Availability and implementation

ToxIBTL and the associated data are freely accessible at http://server.wei-group.net/ToxIBTL. The source code is available at https://github.com/WLYLab/ToxIBTL.

Supplementary information

Supplementary data are available at Bioinformatics online.

Author and article information

Contributors

Lesong Wei

Journal

Title: Bioinformatics

Publisher: Oxford University Press (OUP)

ISSN (Print): 1367-4803

ISSN (Electronic): 1460-2059

Publication date Other: March 15 2022

Publication date (Print): March 04 2022

Publication date (Electronic): January 06 2022

Volume: 38

Issue: 6

Pages: 1514-1524

Affiliations

[1] Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan

[2] School of Mathematics and Statistics, Shandong University, Weihai, China

[3] School of Software, Shandong University, Jinan, China

Article

DOI: 10.1093/bioinformatics/btac006

PubMed ID: 34999757

SO-VID: 5202d7f8-7442-40b4-8c89-a40a0f2839aa

License: https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model
