DeepViral: prediction of novel virus–host interactions from protein sequences and infectious disease phenotypes

Abstract

Motivation

Infectious diseases caused by novel viruses have become a major public health concern. Rapid identification of virus–host interactions can reveal mechanistic insights into infectious diseases and shed light on potential treatments. Current computational prediction methods for novel viruses are based mainly on protein sequences. However, it is not clear to what extent other important features, such as the symptoms caused by the viruses, could contribute to a predictor. Disease phenotypes (i.e. signs and symptoms) are readily accessible from clinical diagnosis and we hypothesize that they may act as a potential proxy and an additional source of information for the underlying molecular interactions between the pathogens and hosts.

Results

We developed DeepViral, a deep learning-based method that predicts protein–protein interactions (PPI) between humans and viruses. Motivated by the potential utility of infectious disease phenotypes, we first embedded human proteins and viruses in a shared space using their associated phenotypes and functions, supported by formalized background knowledge from biomedical ontologies. By jointly learning from protein sequences and phenotype features, DeepViral significantly improves over existing sequence-based methods for intra- and inter-species PPI prediction.
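
To make the idea of jointly using sequence and phenotype features concrete, the following is a minimal illustrative sketch and not the DeepViral implementation. It assumes that each protein already has a precomputed phenotype/function embedding, replaces a learned sequence encoder with a simple amino-acid composition vector, and scores a human–virus pair with the sigmoid of a dot product. All function names, the toy sequences and the toy embedding values are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def sequence_features(seq):
    """Amino-acid composition: fraction of each standard residue.

    Assumption: this is only a stand-in for a learned sequence encoder.
    """
    seq = seq.upper()
    n = max(len(seq), 1)  # avoid division by zero for empty input
    return [seq.count(a) / n for a in AMINO_ACIDS]


def joint_vector(seq, phenotype_embedding):
    """Concatenate sequence features with a precomputed phenotype embedding."""
    return sequence_features(seq) + list(phenotype_embedding)


def interaction_score(human_vec, virus_vec):
    """Sigmoid of the dot product of two joint vectors, a score in (0, 1)."""
    if len(human_vec) != len(virus_vec):
        raise ValueError("joint vectors must have the same length")
    dot = sum(h * v for h, v in zip(human_vec, virus_vec))
    return 1.0 / (1.0 + math.exp(-dot))


if __name__ == "__main__":
    # Toy inputs (hypothetical): short sequences and 3-dimensional embeddings.
    human = joint_vector("MKTAYIAKQR", [0.4, -0.1, 0.8])
    virus = joint_vector("MSDNGPQNQR", [0.3, 0.2, 0.7])
    print(round(interaction_score(human, virus), 3))
```

In a trained model the two vectors would come from learned encoders and the score would be fitted against known interactions; here the score only shows how the two feature sources can be combined into a single prediction.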

Availability and implementation

Code and datasets for reproduction and customization are available at https://github.com/bio-ontology-research-group/DeepViral. Prediction results for 14 virus families are available at https://doi.org/10.5281/zenodo.4429824.

Supplementary information

Supplementary data are available at Bioinformatics online.

Author and article information

Contributors

Peter Robinson: Role: Associate Editor

Journal

Journal ID (nlm-ta): Bioinformatics

Journal ID (iso-abbrev): Bioinformatics

Journal ID (publisher-id): bioinformatics

Title: Bioinformatics

Publisher: Oxford University Press

ISSN (Print): 1367-4803

ISSN (Electronic): 1367-4811

Publication date Collection: 01 September 2021

Publication date (Electronic): 03 March 2021

Publication date PMC-release: 03 March 2021

Volume: 37

Issue: 17

Pages: 2722-2729

Affiliations

[1] Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia

[2] Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia

[3] Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Wales SY23 3BQ, UK

[4] Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia

Author notes

To whom correspondence should be addressed. robert.hoehndorf@kaust.edu.sa

Author information

Nicholas J. Dimonaco https://orcid.org/0000-0002-3808-206X

Robert Hoehndorf https://orcid.org/0000-0001-8149-5890

Article

Publisher ID: btab147

DOI: 10.1093/bioinformatics/btab147

PMC ID: 8428617

PubMed ID: 33682875

SO-VID: 2201cc99-d4bd-4697-91d8-c58ca73029d1

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 13 August 2020

Date revision received: 18 January 2021

Date: 28 February 2021

Date accepted: 01 March 2021

Page count

Pages: 8

Funding

Funded by: King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR);

Award ID: URF/1/3790-01-01
