
      Reliability of Medical Information Provided by ChatGPT: Assessment Against Clinical Guidelines and Patient Information Quality Instrument

Research Article
Harriet Louise Walker, MBChB, MSc 1; Shahi Ghani, MBBS, BSc, MSc, DHMS 1; Christoph Kuemmerli, MD 2; Christian Andreas Nebiker, MD, PD 3; Beat Peter Müller, Prof Dr 2; Dimitri Aristotle Raptis, MD, MSc, PhD 4; Sebastian Manuel Staubli, MD 1, 2
(Reviewer), (Reviewer), (Reviewer)
      Journal of Medical Internet Research
      JMIR Publications
      artificial intelligence, internet information, patient information, ChatGPT, EQIP tool, chatbot, chatbots, conversational agent, conversational agents, internal medicine, pancreas, liver, hepatic, biliary, gall, bile, gallstone, pancreatitis, pancreatic, medical information


          Abstract

          Background

          ChatGPT-4 is the latest release of a novel artificial intelligence (AI) chatbot able to answer freely formulated and complex questions. In the near future, ChatGPT could become the new standard for health care professionals and patients to access medical information. However, little is known about the quality of medical information provided by the AI.

          Objective

          We aimed to assess the reliability of medical information provided by ChatGPT.

          Methods

The quality of medical information provided by ChatGPT-4 on the 5 hepato-pancreatico-biliary (HPB) conditions with the highest global disease burden was assessed with the Ensuring Quality Information for Patients (EQIP) tool. The EQIP tool is used to measure the quality of internet-available information and consists of 36 items divided into 3 subsections. In addition, 5 guideline recommendations per analyzed condition were rephrased as questions and entered into ChatGPT, and agreement between the guidelines and the AI's answers was rated by 2 authors independently. All queries were repeated 3 times to measure the internal consistency of ChatGPT.
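To make the scoring arithmetic described above concrete, here is a minimal Python sketch of how per-condition EQIP totals and the guideline-agreement rate could be tabulated. It is illustrative only: the item totals and agreement flags below are placeholders, not the study data, and the variable names are invented for this example.

```python
# Illustrative sketch only -- placeholder numbers, not the study data.
# Assumes each of the 36 EQIP items is scored 1 (criterion met) or 0 (not met)
# and summed per condition, and that each guideline-derived question is marked
# True when ChatGPT's answer agreed with the guideline recommendation.
from statistics import median, quantiles

# Hypothetical per-condition EQIP totals out of 36 items.
eqip_totals = {
    "gallstone disease": 18,
    "pancreatitis": 16,
    "liver cirrhosis": 14,
    "pancreatic cancer": 15,
    "hepatocellular carcinoma": 19,
}

scores = sorted(eqip_totals.values())
q1, _, q3 = quantiles(scores, n=4)  # lower and upper quartiles for the IQR
print(f"median EQIP score: {median(scores)} (IQR {q1}-{q3})")

# Hypothetical agreement flags: 5 guideline questions for each of the 5 conditions.
agreement = [True] * 15 + [False] * 10
print(f"guideline agreement: {sum(agreement)}/{len(agreement)} "
      f"({sum(agreement) / len(agreement):.0%})")
```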

          Results

Five conditions were identified (gallstone disease, pancreatitis, liver cirrhosis, pancreatic cancer, and hepatocellular carcinoma). The median EQIP score across all conditions was 16 (IQR 14.5-18) out of a possible 36 items. Divided by subsection, median scores for content, identification, and structure data were 10 (IQR 9.5-12.5), 1 (IQR 1-1), and 4 (IQR 4-5), respectively. Agreement between guideline recommendations and answers provided by ChatGPT was 60% (15/25). Interrater agreement, as measured by the Fleiss κ, was 0.78 (P<.001), indicating substantial agreement. Internal consistency of the answers provided by ChatGPT was 100%.
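For readers who want to see how the interrater statistic reported above is obtained in practice, the sketch below computes a Fleiss κ for two raters using statsmodels. The ratings are simulated placeholders, not the study's actual ratings, and statsmodels is simply one library that implements this statistic.

```python
# Illustrative only: simulated ratings, not the study data.
# Rows = guideline-derived questions (25), columns = the 2 independent raters;
# 1 = ChatGPT's answer judged concordant with the guideline, 0 = discordant.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(seed=0)
rater_a = rng.integers(0, 2, size=25)
rater_b = rater_a.copy()
rater_b[:3] ^= 1  # flip a few ratings to simulate occasional disagreement

ratings = np.column_stack([rater_a, rater_b])   # shape: (25 subjects, 2 raters)
table, _ = aggregate_raters(ratings)            # per-subject counts per category
print("Fleiss kappa:", round(fleiss_kappa(table, method="fleiss"), 2))
```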

          Conclusions

          ChatGPT provides medical information of comparable quality to available static internet information. Although currently of limited quality, large language models could become the future standard for patients and health care professionals to gather medical information.


                Author and article information

Journal
J Med Internet Res (JMIR): Journal of Medical Internet Research
JMIR Publications (Toronto, Canada)
ISSN 1439-4456 (print); 1438-8871 (electronic)
2023; published 30 June 2023
Volume 25: e47479
                Affiliations
                [1 ] Royal Free London NHS Foundation Trust London United Kingdom
                [2 ] Clarunis – University Center for Gastrointestinal and Liver Diseases Basel Switzerland
                [3 ] Departement Chirurgie Kantonsspital Aarau Aarau Switzerland
                [4 ] Organ Transplant Center of Excellence King Faisal Specialist Hospital & Research Centre Riyadh Saudi Arabia
                Author notes
Corresponding Author: Sebastian Manuel Staubli, s.staubli@nhs.net
                Author information
                https://orcid.org/0009-0009-6278-6399
                https://orcid.org/0000-0001-7366-296X
                https://orcid.org/0000-0002-7109-3545
                https://orcid.org/0000-0002-7493-2850
                https://orcid.org/0000-0002-8552-8538
                https://orcid.org/0000-0002-0898-3270
                https://orcid.org/0000-0002-0818-9835
Article
Publisher ID: v25i1e47479
DOI: 10.2196/47479
PMCID: PMC10365578
PMID: 37389908
                ©Harriet Louise Walker, Shahi Ghani, Christoph Kuemmerli, Christian Andreas Nebiker, Beat Peter Müller, Dimitri Aristotle Raptis, Sebastian Manuel Staubli. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 30.06.2023.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.

                History
                : 7 April 2023
                : 30 May 2023
                : 7 June 2023
                : 15 June 2023
Categories
Original Paper

                Medicine
