
      DeepViral: prediction of novel virus–host interactions from protein sequences and infectious disease phenotypes

      research-article


          Abstract

          Motivation

          Infectious diseases caused by novel viruses have become a major public health concern. Rapid identification of virus–host interactions can reveal mechanistic insights into infectious diseases and shed light on potential treatments. Current computational prediction methods for novel viruses are based mainly on protein sequences. However, it is not clear to what extent other important features, such as the symptoms caused by the viruses, could contribute to a predictor. Disease phenotypes (i.e. signs and symptoms) are readily accessible from clinical diagnosis and we hypothesize that they may act as a potential proxy and an additional source of information for the underlying molecular interactions between the pathogens and hosts.

          Results

          We developed DeepViral, a deep learning based method that predicts protein–protein interactions (PPI) between humans and viruses. Motivated by the potential utility of infectious disease phenotypes, we first embedded human proteins and viruses in a shared space using their associated phenotypes and functions, supported by formalized background knowledge from biomedical ontologies. By jointly learning from protein sequences and phenotype features, DeepViral significantly improves over existing sequence-based methods for intra- and inter-species PPI prediction.
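The idea of jointly using sequence and phenotype features can be illustrated with a toy sketch. This is not the DeepViral architecture (see the linked repository for the authors' implementation): here amino-acid composition stands in for a learned sequence encoder, the phenotype embeddings are random placeholders for ontology-based embeddings, and the function names, sequences, and embedding size are all hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def sequence_features(seq):
    """Amino-acid composition: fraction of each standard residue in seq."""
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts) or 1  # avoid division by zero for empty input
    return [c / total for c in counts]


def joint_features(seq, phenotype_embedding):
    """Concatenate sequence features with a precomputed phenotype embedding."""
    return sequence_features(seq) + list(phenotype_embedding)


def interaction_score(human_vec, virus_vec):
    """Sigmoid of the dot product of the two joint representations."""
    dot = sum(h * v for h, v in zip(human_vec, virus_vec))
    return 1.0 / (1.0 + math.exp(-dot))


# Hypothetical inputs: short made-up sequences and random 8-dimensional
# embeddings standing in for phenotype/function embeddings.
rng = random.Random(0)
human_seq = "MKTAYIAKQR"
virus_seq = "MSDNGPQNQR"
human_emb = [rng.uniform(-1, 1) for _ in range(8)]
virus_emb = [rng.uniform(-1, 1) for _ in range(8)]

human_vec = joint_features(human_seq, human_emb)
virus_vec = joint_features(virus_seq, virus_emb)
score = interaction_score(human_vec, virus_vec)
print(f"toy interaction score: {score:.3f}")
```

In a trained model the composition vector and the dot product would be replaced by learned components; the sketch only shows how two feature sources for each interaction partner can be combined into a single score.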

          Availability and implementation

          Code and datasets for reproduction and customization are available at https://github.com/bio-ontology-research-group/DeepViral. Prediction results for 14 virus families are available at https://doi.org/10.5281/zenodo.4429824.

          Supplementary information

          Supplementary data are available at Bioinformatics online.


                Author and article information

                Contributors
                Role: Associate Editor
                Journal
                Bioinformatics
                Oxford University Press
                ISSN (print): 1367-4803
                ISSN (electronic): 1367-4811
                01 September 2021
                03 March 2021
                Volume: 37
                Issue: 17
                Pages: 2722-2729
                Affiliations
                [1] Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
                [2] Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
                [3] Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Wales SY23 3BQ, UK
                [4] Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
                Author notes
                To whom correspondence should be addressed. robert.hoehndorf@kaust.edu.sa
                Author information
                https://orcid.org/0000-0002-3808-206X
                https://orcid.org/0000-0001-8149-5890
                Article
                btab147
                DOI: 10.1093/bioinformatics/btab147
                PMC ID: 8428617
                PubMed ID: 33682875
                2201cc99-d4bd-4697-91d8-c58ca73029d1
                © The Author(s) 2021. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                13 August 2020
                18 January 2021
                28 February 2021
                01 March 2021
                Page count
                Pages: 8
                Funding
                Funded by: King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR);
                Award ID: URF/1/3790-01-01
                Categories
                Original Papers
                Databases and Ontologies
                AcademicSubjects/SCI01060

                Bioinformatics & Computational biology