      Is Open Access

      CoVEffect: interactive system for mining the effects of SARS-CoV-2 mutations and variants based on deep learning

      research-article


          Abstract

          Background

          The literature on SARS-CoV-2 widely discusses the effects of the variations that have spread over the past 3 years. This information is scattered across the text of many research articles, which makes it hard to integrate with related datasets (e.g., the millions of SARS-CoV-2 sequences available to the community). We aim to fill this gap by mining literature abstracts to extract, for each variant or mutation, its related effects (in epidemiological, immunological, clinical, or viral kinetics terms), each labeled with a higher or lower level relative to the nonmutated virus.

          Results

          The proposed framework comprises (i) the provisioning of abstracts from a COVID-19–related big data corpus (CORD-19) and (ii) the identification of mutation/variant effects in abstracts using a GPT2-based prediction model. The above techniques enable the prediction of mutations/variants with their effects and levels in 2 distinct scenarios: (i) the batch annotation of the most relevant CORD-19 abstracts and (ii) the on-demand annotation of any user-selected CORD-19 abstract through the CoVEffect web application (http://gmql.eu/coveffect), which assists expert users with semiautomated data labeling. On the interface, users can inspect the predictions and correct them; user inputs can then extend the training dataset used by the prediction model. Our prototype model was trained through a carefully designed process, using a minimal and highly diversified pool of samples.
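
To make the target of the prediction step concrete, the sketch below shows one way the structured output (a mutation or variant, an effect, and a higher/lower level) could be represented and recovered from generated text. It is an illustration only: the `EffectAnnotation` class, the `parse_predictions` function, and the pipe-separated line format are assumptions made for this example, not the serialization or code used by CoVEffect.

```python
from dataclasses import dataclass
from typing import List

# Assumed level vocabulary; the abstract only speaks of "higher/lower" levels.
LEVELS = {"higher", "lower"}


@dataclass(frozen=True)
class EffectAnnotation:
    """One structured triple extracted from an abstract (hypothetical schema)."""
    entity: str  # mutation or variant name, e.g. "D614G"
    effect: str  # effect name, e.g. "infectivity"
    level: str   # "higher" or "lower", relative to the nonmutated virus


def parse_predictions(generated: str) -> List[EffectAnnotation]:
    """Parse lines of the assumed form 'entity | effect | level'.

    Lines that do not have exactly three non-empty fields, or whose level
    is outside LEVELS, are skipped rather than repaired.
    """
    annotations = []
    for line in generated.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 3:
            continue
        entity, effect, level = parts
        level = level.lower()
        if not entity or not effect or level not in LEVELS:
            continue
        annotations.append(EffectAnnotation(entity, effect, level))
    return annotations
```

Skipping malformed lines instead of guessing at their meaning fits the semiautomated workflow described above, in which an expert inspects and corrects predictions before they extend the training dataset.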

          Conclusions

          The CoVEffect interface serves for the assisted annotation of abstracts, allowing the download of curated datasets for further use in data integration or analysis pipelines. The overall framework can be adapted to resolve similar unstructured-to-structured text translation tasks, which are typical of biomedical domains.
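
As a rough illustration of how a curated dataset might be handed to a downstream integration or analysis pipeline, the sketch below serializes annotation rows to CSV. The column names in `FIELDS` and the `to_csv` helper are hypothetical; the actual download format offered by the CoVEffect interface is not described in this abstract.

```python
import csv
import io
from typing import Dict, Iterable

# Hypothetical column names, chosen for this example only.
FIELDS = ["abstract_id", "entity", "effect", "level"]


def to_csv(rows: Iterable[Dict[str, str]]) -> str:
    """Serialize curated annotation rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # DictWriter raises ValueError if a row has keys outside FIELDS.
        writer.writerow(row)
    return buffer.getvalue()
```

A flat, one-row-per-annotation table is used here because it can be joined with sequence metadata on the entity column by most tabular tools; other layouts would serve equally well.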

                Author and article information

                Contributors
                Journal
                GigaScience
                Oxford University Press
                ISSN: 2047-217X
                Published: 23 May 2023
                Volume 12, article giad036
                Affiliations
                Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy
                Author notes
                Correspondence address. Anna Bernasconi. Via Ponzio 34/5, 20133, Milano, Italy. E-mail: anna.bernasconi@polimi.it
                Author information
                https://orcid.org/0000-0002-5465-6182
                https://orcid.org/0000-0002-5645-5886
                https://orcid.org/0009-0002-5423-6978
                https://orcid.org/0000-0003-0671-2415
                https://orcid.org/0000-0001-8016-5750
                Article
                DOI: 10.1093/gigascience/giad036
                PMCID: PMC10205000
                PMID: 37222749
                © The Author(s) 2023. Published by Oxford University Press GigaScience.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 05 December 2022
                : 11 April 2023
                : 27 April 2023
                Page count
                Pages: 15
                Funding
                Funded by: NextGenerationEU program
                Categories
                Research
                AcademicSubjects/SCI00960
                AcademicSubjects/SCI02254

                Keywords: deep learning, language models, machine learning interpretability, CORD-19 dataset, SARS-CoV-2, viral variants, viral mutations, web interface
