Accelerating Evidence Synthesis in Observational Studies: Development of a Living Natural Language Processing–Assisted Intelligent Systematic Literature Review System

Abstract

Background

Systematic literature review (SLR), a robust method to identify and summarize evidence from published sources, is considered to be a complex, time-consuming, labor-intensive, and expensive task.

Objective

This study aimed to present a solution based on natural language processing (NLP) that accelerates and streamlines the SLR process for observational studies using real-world data.

Methods

We followed an agile software development and iterative software engineering methodology to build a customized intelligent end-to-end living NLP-assisted solution for observational SLR tasks. Multiple machine learning–based NLP algorithms were adopted to automate article screening and data element extraction processes. The NLP prediction results can be further reviewed and verified by domain experts, following the human-in-the-loop design. The system integrates explainable artificial intelligence to provide evidence for NLP algorithms and add transparency to extracted literature data elements. The system was developed based on 3 existing SLR projects of observational studies, including the epidemiology studies of human papillomavirus–associated diseases, the disease burden of pneumococcal diseases, and cost-effectiveness studies on pneumococcal vaccines.

Results

Our Intelligent SLR Platform covers major SLR steps, including study protocol setting, literature retrieval, abstract screening, full-text screening, data element extraction from full-text articles, results summary, and data visualization. The NLP algorithms achieved accuracy scores of 0.86-0.90 on article screening tasks (framed as text classification tasks) and macroaverage F1 scores of 0.57-0.89 on data element extraction tasks (framed as named entity recognition tasks).
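
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how accuracy and a macroaverage F1 score can be computed from gold and predicted labels. It is an illustration only, not the authors' evaluation code: the function names and the toy labels are hypothetical, and the F1 here is computed per label over individual items, whereas named entity recognition is often scored at the entity-span level.

```python
def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label matches the gold label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores (macroaverage)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical screening decisions for four abstracts (toy data).
gold = ["include", "include", "exclude", "exclude"]
pred = ["include", "exclude", "exclude", "exclude"]

acc = accuracy(gold, pred)
f1 = macro_f1(gold, pred)
print(f"accuracy={acc:.2f}, macro F1={f1:.2f}")
```

Because the macroaverage weights every label equally, a rarely occurring data element that is extracted poorly lowers the score as much as a common one, which may help explain why a range as wide as 0.57-0.89 can appear across extraction tasks.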

Conclusions

Cutting-edge NLP algorithms expedite SLR for observational studies, giving scientists more time to focus on the quality of data and the synthesis of evidence. In line with the living SLR concept, the system has the potential to update literature data prospectively and continuously, enabling scientists to stay current with the literature related to observational studies.

Author and article information

Contributors

Frank J Manion:

ORCID: http://orcid.org/0000-0003-1030-6348

Jingcheng Du:

ORCID: http://orcid.org/0000-0002-0322-4566

Dong Wang:

ORCID: http://orcid.org/0009-0003-3322-6284

Long He

Bin Lin

Jingqi Wang:

ORCID: http://orcid.org/0009-0005-4574-1448

Siwei Wang:

ORCID: http://orcid.org/0009-0008-3323-4803

David Eckels:

ORCID: http://orcid.org/0009-0006-4859-0953

Jan Cervenka:

ORCID: http://orcid.org/0009-0004-7341-6901

Peter C Fiduccia:

ORCID: http://orcid.org/0000-0001-9902-9476

Nicole Cossrow:

ORCID: http://orcid.org/0009-0006-4982-4547

Lixia Yao:

ORCID: http://orcid.org/0000-0002-5187-6120

Journal

Journal ID (nlm-ta): JMIR Med Inform

Journal ID (iso-abbrev): JMIR Med Inform

Journal ID (hwp): JMI

Journal ID (publisher-id): medinform

Journal ID (index): 7

Title: JMIR Medical Informatics

Publisher: JMIR Publications (Toronto, Canada)

ISSN (Electronic): 2291-9694

Publication date Collection: 2024

Publication date (Electronic): 23 October 2024

Volume: 12

Electronic Location Identifier: e54653

Affiliations

[1] IMO Health, 9600 W Bryn Mawr Ave # 100, Rosemont, IL, 60018, United States

[2] Merck & Co, Inc, 126 East Lincoln Ave, Rahway, NJ, United States, 1 619-643-2693

Author notes

Dong Wang, PhD, Merck & Co, Inc, 126 East Lincoln Ave., Rahway, NJ, United States, 1 619-643-2693; dong.wang10@merck.com

DW, JC, DE, NC, PCF, and LY are employees of Merck Sharp & Dohme LLC, a subsidiary of Merck & Co., Inc., Rahway, NJ, USA. JD, BL, SW, XW, LH, JW, and FJM are employees of IMO.

Article

Publisher ID: 54653

DOI: 10.2196/54653

PMC ID: 11523763

PubMed ID: 39441204

SO-VID: 36a9c3a6-1b67-479c-af00-2da501fc360d

Copyright © Frank J Manion, Jingcheng Du, Dong Wang, Long He, Bin Lin, Jingqi Wang, Siwei Wang, David Eckels, Jan Cervenka, Peter C Fiduccia, Nicole Cossrow, Lixia Yao. Originally published in JMIR Medical Informatics (https://medinform.jmir.org)

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.

History

Date received: 17 November 2023

Date revision received: 24 April 2024

Date accepted: 23 July 2024
