      Forecasting dominance of SARS-CoV-2 lineages by anomaly detection using deep AutoEncoders

      research-article


          Abstract

          The COVID-19 pandemic is marked by the successive emergence of new SARS-CoV-2 variants, lineages, and sublineages that outcompete earlier strains, largely due to factors like increased transmissibility and immune escape. We propose DeepAutoCoV, an unsupervised deep learning anomaly detection system, to predict future dominant lineages (FDLs). We define FDLs as viral (sub)lineages that will constitute >10% of all the viral sequences added to the GISAID, a public database supporting viral genetic sequence sharing, in a given week. DeepAutoCoV is trained and validated by assembling global and country-specific data sets from over 16 million Spike protein sequences sampled over a period of ~4 years. DeepAutoCoV successfully flags FDLs at very low frequencies (0.01%–3%), with median lead times of 4–17 weeks, and predicts FDLs between ~5 and ~25 times better than a baseline approach. For example, the B.1.617.2 vaccine reference strain was flagged as FDL when its frequency was only 0.01%, more than a year before it was considered for an updated COVID-19 vaccine. Furthermore, DeepAutoCoV outputs interpretable results by pinpointing specific mutations potentially linked to increased fitness and may provide significant insights for the optimization of public health ‘pre-emptive’ intervention strategies.
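The FDL definition in the abstract (a lineage exceeding 10% of the sequences added in a given week) can be illustrated with a minimal sketch. This is not the DeepAutoCoV implementation: the function names, the `(week, lineage)` record layout, and the toy data are assumptions made for illustration, and the anomaly detection step itself is not reproduced. The sketch only shows how weekly lineage shares, the first week of dominance, and a lead time could be computed once a lineage has been flagged.

```python
from collections import Counter, defaultdict

# ">10%" of all sequences added in a given week, as stated in the abstract.
FDL_THRESHOLD = 0.10


def weekly_shares(records):
    """Map each week to {lineage: share of that week's sequences}.

    `records` is an iterable of (week, lineage) pairs, one pair per
    submitted sequence. This layout is hypothetical, not GISAID's format.
    """
    counts = defaultdict(Counter)
    for week, lineage in records:
        counts[week][lineage] += 1
    shares = {}
    for week, counter in counts.items():
        total = sum(counter.values())
        shares[week] = {lin: n / total for lin, n in counter.items()}
    return shares


def first_dominant_week(shares, threshold=FDL_THRESHOLD):
    """Return {lineage: earliest week in which its share exceeds the threshold}."""
    first = {}
    for week in sorted(shares):
        for lineage, share in shares[week].items():
            if share > threshold and lineage not in first:
                first[lineage] = week
    return first


def lead_time_weeks(flag_week, dominant_week):
    """Weeks between an early flag and the first week of dominance."""
    return dominant_week - flag_week


if __name__ == "__main__":
    # Toy data: lineage "B" grows from 1% to 20% of weekly submissions.
    toy = (
        [(1, "A")] * 99 + [(1, "B")] * 1
        + [(2, "A")] * 95 + [(2, "B")] * 5
        + [(3, "A")] * 80 + [(3, "B")] * 20
    )
    dominant = first_dominant_week(weekly_shares(toy))
    print(dominant)
    # If "B" had been flagged in week 1 (share 1%), the lead time would be:
    print(lead_time_weeks(1, dominant["B"]))
```

In this toy series, a flag raised on lineage "B" in week 1, when it accounts for 1% of submissions, precedes its first week above the 10% threshold by two weeks; the lead times reported in the abstract are medians of this kind of difference.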

          Most cited references (33)

          A pneumonia outbreak associated with a new coronavirus of probable bat origin

          Since the outbreak of severe acute respiratory syndrome (SARS) 18 years ago, a large number of SARS-related coronaviruses (SARSr-CoVs) have been discovered in their natural reservoir host, bats [1–4]. Previous studies have shown that some bat SARSr-CoVs have the potential to infect humans [5–7]. Here we report the identification and characterization of a new coronavirus (2019-nCoV), which caused an epidemic of acute respiratory syndrome in humans in Wuhan, China. The epidemic, which started on 12 December 2019, had caused 2,794 laboratory-confirmed infections including 80 deaths by 26 January 2020. Full-length genome sequences were obtained from five patients at an early stage of the outbreak. The sequences are almost identical and share 79.6% sequence identity to SARS-CoV. Furthermore, we show that 2019-nCoV is 96% identical at the whole-genome level to a bat coronavirus. Pairwise protein sequence analysis of seven conserved non-structural proteins domains show that this virus belongs to the species of SARSr-CoV. In addition, 2019-nCoV virus isolated from the bronchoalveolar lavage fluid of a critically ill patient could be neutralized by sera from several patients. Notably, we confirmed that 2019-nCoV uses the same cell entry receptor—angiotensin converting enzyme II (ACE2)—as SARS-CoV.

            A new coronavirus associated with human respiratory disease in China

            Emerging infectious diseases, such as severe acute respiratory syndrome (SARS) and Zika virus disease, present a major threat to public health [1–3]. Despite intense research efforts, how, when and where new diseases appear are still a source of considerable uncertainty. A severe respiratory disease was recently reported in Wuhan, Hubei province, China. As of 25 January 2020, at least 1,975 cases had been reported since the first patient was hospitalized on 12 December 2019. Epidemiological investigations have suggested that the outbreak was associated with a seafood market in Wuhan. Here we study a single patient who was a worker at the market and who was admitted to the Central Hospital of Wuhan on 26 December 2019 while experiencing a severe respiratory syndrome that included fever, dizziness and a cough. Metagenomic RNA sequencing [4] of a sample of bronchoalveolar lavage fluid from the patient identified a new RNA virus strain from the family Coronaviridae, which is designated here ‘WH-Human 1’ coronavirus (and has also been referred to as ‘2019-nCoV’). Phylogenetic analysis of the complete viral genome (29,903 nucleotides) revealed that the virus was most closely related (89.1% nucleotide similarity) to a group of SARS-like coronaviruses (genus Betacoronavirus, subgenus Sarbecovirus) that had previously been found in bats in China [5]. This outbreak highlights the ongoing ability of viral spill-over from animals to cause severe disease in humans.

              SARS-CoV-2 variants, spike mutations and immune escape

              Although most mutations in the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome are expected to be either deleterious and swiftly purged or relatively neutral, a small proportion will affect functional properties and may alter infectivity, disease severity or interactions with host immunity. The emergence of SARS-CoV-2 in late 2019 was followed by a period of relative evolutionary stasis lasting about 11 months. Since late 2020, however, SARS-CoV-2 evolution has been characterized by the emergence of sets of mutations, in the context of ‘variants of concern’, that impact virus characteristics, including transmissibility and antigenicity, probably in response to the changing immune profile of the human population. There is emerging evidence of reduced neutralization of some SARS-CoV-2 variants by postvaccination serum; however, a greater understanding of correlates of protection is required to evaluate how this may impact vaccine effectiveness. Nonetheless, manufacturers are preparing platforms for a possible update of vaccine sequences, and it is crucial that surveillance of genetic and antigenic changes in the global virus population is done alongside experiments to elucidate the phenotypic impacts of mutations. In this Review, we summarize the literature on mutations of the SARS-CoV-2 spike protein, the primary antigen, focusing on their impacts on antigenicity and contextualizing them in the protein structure, and discuss them in the context of observed mutation frequencies in global sequence datasets. The evolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been characterized by the emergence of mutations and so-called variants of concern that impact virus characteristics, including transmissibility and antigenicity. 
In this Review, members of the COVID-19 Genomics UK (COG-UK) Consortium and colleagues summarize mutations of the SARS-CoV-2 spike protein, focusing on their impacts on antigenicity and contextualizing them in the protein structure, and discuss them in the context of observed mutation frequencies in global sequence datasets.

                Author and article information

                Contributors
                Journal: Briefings in Bioinformatics (Brief Bioinform; journal ID: bib)
                Publisher: Oxford University Press
                ISSN: 1467-5463 (print), 1477-4054 (electronic)
                Issue date: November 2024
                Published online: 24 October 2024
                Volume: 25
                Issue: 6
                Article number: bbae535
                Affiliations
                Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Adolfo Ferrata 5, Pavia, 27100, Italy
                Department of Epidemiology, College of Public Health and Health Professions, University of Florida, 2004 Mowry Road, Gainesville, FL 32610, United States
                Emerging Pathogens Institute, University of Florida, 2055 Mowry Road, Gainesville, FL 32610, United States
                Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, University of Florida, 1600 SW Archer Road, Gainesville, FL 32610, United States
                Author notes
                Corresponding authors. Marco Salemi, Emerging Pathogens Institute, University of Florida, 2055 Mowry Rd., Gainesville, FL 32610, United States. E-mail: salemi@pathology.ufl.edu; Simone Marini, Department of Epidemiology, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, United States. E-mail: simone.marini@ufl.edu; Riccardo Bellazzi, Department of Industrial Engineering & Information, University of Pavia, IT-27100 Pavia, Italy. E-mail: riccardo.bellazzi@unipv.it
                Author information
                https://orcid.org/0009-0002-4405-1697
                https://orcid.org/0000-0002-6974-9808
                https://orcid.org/0000-0002-5704-3533
                Article
                Publisher ID: bbae535
                DOI: 10.1093/bib/bbae535
                PMC ID: 11500442
                PubMed ID: 39446192
                5ff35147-e1e1-47d6-8fe2-9638ef06b7cf
                © The Author(s) 2024. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                History
                Received: 25 July 2024
                Revised: 10 September 2024
                Accepted: 08 October 2024
                Page count
                Pages: 8
                Funding
                Funded by: National Institutes of Health, DOI 10.13039/100000002;
                Funded by: NIH NIAID;
                Award ID: R01 AI170187
                Categories
                Problem Solving Protocol
                AcademicSubjects/SCI01060

                Bioinformatics & Computational biology
                Keywords: SARS-CoV-2 (sub)lineages, deep learning, anomaly detection, spike protein sequences, genomic surveillance
