      Is Open Access

      Survey of BERT-Base Models for Scientific Text Classification: COVID-19 Case Study

      Applied Sciences
      MDPI AG


          Abstract

          On 30 January 2020, the World Health Organization announced a new coronavirus, which later proved to be very dangerous. COVID-19 has since spread to become a pandemic affecting practically all regions of the world, and many medical researchers have contributed to fighting it. Given the rapid growth of scientific publications related to this pandemic, manual text and data retrieval has become a challenging task. To address this challenge, we propose CovBERT, a pre-trained language model based on BERT, to automate the literature review process. CovBERT relies on prior training on a large corpus of biomedical scientific publications related to COVID-19 to increase its performance on the literature review task. We evaluate CovBERT on short-text classification using COV-Dat-20, our scientific dataset of biomedical articles on COVID-19. We demonstrate statistically significant improvements by using BERT.


          Most cited references (25)

          Is Open Access

          BioBERT: a pre-trained biomedical language representation model for biomedical text mining

          Abstract

          Motivation: Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows. With the progress in natural language processing (NLP), extracting valuable information from biomedical literature has gained popularity among researchers, and deep learning has boosted the development of effective biomedical text mining models. However, directly applying the advancements in NLP to biomedical text mining often yields unsatisfactory results due to a word distribution shift from general domain corpora to biomedical corpora. In this article, we investigate how the recently introduced pre-trained language model BERT can be adapted for biomedical corpora.

          Results: We introduce BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), which is a domain-specific language representation model pre-trained on large-scale biomedical corpora. With almost the same architecture across tasks, BioBERT largely outperforms BERT and previous state-of-the-art models in a variety of biomedical text mining tasks when pre-trained on biomedical corpora. While BERT obtains performance comparable to that of previous state-of-the-art models, BioBERT significantly outperforms them on the following three representative biomedical text mining tasks: biomedical named entity recognition (0.62% F1 score improvement), biomedical relation extraction (2.80% F1 score improvement) and biomedical question answering (12.24% MRR improvement). Our analysis results show that pre-training BERT on biomedical corpora helps it to understand complex biomedical texts.

          Availability and implementation: We make the pre-trained weights of BioBERT freely available at https://github.com/naver/biobert-pretrained, and the source code for fine-tuning BioBERT available at https://github.com/dmis-lab/biobert.

            Coronavirus Disease 2019 (COVID-19): A Perspective from China

            Abstract: In December 2019, an outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection occurred in Wuhan, Hubei Province, China and spread across China and beyond. On February 12, 2020, WHO officially named the disease caused by the novel coronavirus as Coronavirus Disease 2019 (COVID-19). Since most COVID-19 infected patients were diagnosed with pneumonia and characteristic CT imaging patterns, radiological examinations have become vital in early diagnosis and assessment of disease course. To date, CT findings have been recommended as major evidence for clinical diagnosis of COVID-19 in Hubei, China. This review focuses on the etiology, epidemiology, and clinical symptoms of COVID-19, while highlighting the role of chest CT in prevention and disease control. A full translation of this article in Chinese is available in the supplement.

              Term-weighting approaches in automatic text retrieval


                Author and article information

                Journal: Applied Sciences (ASPCC7)
                Publisher: MDPI AG
                ISSN: 2076-3417
                Published: March 11, 2022
                Volume: 12
                Issue: 6
                Article number: 2891
                Type: Article
                DOI: 10.3390/app12062891
                ID: 515c920b-ed23-4389-a582-7fb6eef61a5f
                Copyright: © 2022
                License: https://creativecommons.org/licenses/by/4.0/
