      Accelerating Evidence Synthesis in Observational Studies: Development of a Living Natural Language Processing–Assisted Intelligent Systematic Literature Review System

      research-article


          Abstract

          Background

          Systematic literature review (SLR), a robust method to identify and summarize evidence from published sources, is considered to be a complex, time-consuming, labor-intensive, and expensive task.

          Objective

          This study aimed to present a solution based on natural language processing (NLP) that accelerates and streamlines the SLR process for observational studies using real-world data.

          Methods

          We followed an agile software development and iterative software engineering methodology to build a customized intelligent end-to-end living NLP-assisted solution for observational SLR tasks. Multiple machine learning–based NLP algorithms were adopted to automate article screening and data element extraction processes. The NLP prediction results can be further reviewed and verified by domain experts, following the human-in-the-loop design. The system integrates explainable artificial intelligence to provide evidence for NLP algorithms and add transparency to extracted literature data elements. The system was developed based on 3 existing SLR projects of observational studies, including the epidemiology studies of human papillomavirus–associated diseases, the disease burden of pneumococcal diseases, and cost-effectiveness studies on pneumococcal vaccines.
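
The human-in-the-loop step described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the platform's actual implementation: the `ScreeningRecord` class, its field names, the `review_queue` helper, and the idea of ordering unreviewed articles by model confidence are all hypothetical and do not come from the article.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScreeningRecord:
    """One article with a model prediction and an optional expert decision.

    Hypothetical structure for illustration; not taken from the article.
    """
    article_id: str
    predicted_label: str          # e.g. "include" or "exclude"
    confidence: float             # model confidence in predicted_label, 0..1
    reviewer_label: Optional[str] = None

    @property
    def final_label(self) -> str:
        # An expert decision, when present, overrides the model prediction.
        if self.reviewer_label is not None:
            return self.reviewer_label
        return self.predicted_label


def review_queue(records: List[ScreeningRecord]) -> List[ScreeningRecord]:
    """Return unreviewed records, least confident predictions first.

    Assumption: reviewers want to see uncertain predictions before others.
    """
    pending = [r for r in records if r.reviewer_label is None]
    return sorted(pending, key=lambda r: r.confidence)


records = [
    ScreeningRecord("A1", "include", 0.97),
    ScreeningRecord("A2", "exclude", 0.55),
    ScreeningRecord("A3", "include", 0.71, reviewer_label="exclude"),
]
queue_ids = [r.article_id for r in review_queue(records)]
final_labels = {r.article_id: r.final_label for r in records}
```

Here `queue_ids` lists the articles still awaiting expert review, and `final_labels` shows the expert decision replacing the model prediction for the one article that was already verified.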

          Results

          Our Intelligent SLR Platform covers major SLR steps, including study protocol setting, literature retrieval, abstract screening, full-text screening, data element extraction from full-text articles, results summary, and data visualization. The NLP algorithms achieved accuracy scores of 0.86-0.90 on article screening tasks (framed as text classification tasks) and macroaverage F1 scores of 0.57-0.89 on data element extraction tasks (framed as named entity recognition tasks).
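
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how accuracy and macroaverage F1 are computed from gold and predicted labels. It is a simplified, label-level illustration: the function names and the toy labels are assumptions, and named entity recognition is usually scored at the entity-span level, which this sketch does not attempt to reproduce.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of items whose predicted label equals the gold label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-label F1 scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted label p was wrong
            fn[t] += 1   # gold label t was missed
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); every label here has denom > 0.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy screening decisions (invented for illustration only).
gold = ["include", "exclude", "include", "exclude", "exclude"]
pred = ["include", "exclude", "exclude", "exclude", "include"]
acc = accuracy(gold, pred)
f1 = macro_f1(gold, pred)
```

Because macroaveraging weights every label equally, a rarely occurring data element with a low F1 pulls the average down as much as a frequent one, which is one reason extraction scores can span a wide range.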

          Conclusions

          Cutting-edge NLP algorithms expedite SLR for observational studies, thus allowing scientists to have more time to focus on the quality of data and the synthesis of evidence in observational studies. Aligning with the living SLR concept, the system has the potential to update literature data and enable scientists to easily stay current with the literature related to observational studies prospectively and continuously.

                Author and article information

                Journal
                JMIR Medical Informatics (JMIR Med Inform)
                Publisher: JMIR Publications (Toronto, Canada)
                ISSN: 2291-9694
                Published: 23 October 2024
                Volume 12, article e54653
                Affiliations
                [1] IMO Health, 9600 W Bryn Mawr Ave # 100, Rosemont, IL, 60018, United States
                [2] Merck & Co, Inc, 126 East Lincoln Ave, Rahway, NJ, United States, 1 619-643-2693
                Author notes
                Dong Wang, PhD, Merck & Co, Inc, 126 East Lincoln Ave., Rahway, NJ, United States, 1 619-643-2693; dong.wang10@ 123456merck.com

                DW, JC, DE, NC, PCF, and LY are employees of Merck Sharp & Dohme LLC, a subsidiary of Merck & Co., Inc., Rahway, NJ, USA. JD, BL, SW, XW, LH, JW, and FJM are employees of IMO.

                Author information
                http://orcid.org/0000-0003-1030-6348
                http://orcid.org/0000-0002-0322-4566
                http://orcid.org/0009-0003-3322-6284
                http://orcid.org/0009-0005-4574-1448
                http://orcid.org/0009-0008-3323-4803
                http://orcid.org/0009-0006-4859-0953
                http://orcid.org/0009-0004-7341-6901
                http://orcid.org/0000-0001-9902-9476
                http://orcid.org/0009-0006-4982-4547
                http://orcid.org/0000-0002-5187-6120
                Article
                Article ID: 54653
                DOI: 10.2196/54653
                PMCID: PMC11523763
                PMID: 39441204
                36a9c3a6-1b67-479c-af00-2da501fc360d
                Copyright © Frank J Manion, Jingcheng Du, Dong Wang, Long He, Bin Lin, Jingqi Wang, Siwei Wang, David Eckels, Jan Cervenka, Peter C Fiduccia, Nicole Cossrow, Lixia Yao. Originally published in JMIR Medical Informatics (https://medinform.jmir.org)

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.

                History
                : 17 November 2023
                : 24 April 2024
                : 23 July 2024
                Categories
                Methods and Instruments in Medical Informatics
                Original Paper
                Natural Language Processing
                New Technologies
                Data Science
                Epublishing and Open Access
                Information Retrieval
                New Methods

                Keywords: machine learning, deep learning, natural language processing, systematic literature review, artificial intelligence, software development, data extraction, epidemiology
