      Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow: Development and Usability Study

          Abstract

          Background

          Large language model (LLM)–based artificial intelligence chatbots direct the power of large training data sets toward successive, related tasks as opposed to single-ask tasks, for which artificial intelligence already achieves impressive performance. The capacity of LLMs to assist in the full scope of iterative clinical reasoning via successive prompting, in effect acting as artificial physicians, has not yet been evaluated.

          Objective

          This study aimed to evaluate ChatGPT’s capacity for ongoing clinical decision support via its performance on standardized clinical vignettes.

          Methods

          We entered all 36 published clinical vignettes from the Merck Sharp & Dohme (MSD) Clinical Manual into ChatGPT and compared its accuracy on differential diagnosis, diagnostic testing, final diagnosis, and management, stratified by patient age, gender, and case acuity. Accuracy was measured as the proportion of correct responses to the questions posed within the clinical vignettes, as judged by human scorers. We further conducted linear regression to assess the factors contributing to ChatGPT's performance on clinical tasks.
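          To make the scoring and regression concrete, here is a minimal Python sketch of this kind of analysis. It is not the authors' code: the data frame, column names (correct, question, age, gender, acuity), and all values are hypothetical, and the linear probability model fit with statsmodels stands in for whatever regression specification the authors actually used.

          import pandas as pd
          import statsmodels.formula.api as smf

          # Hypothetical per-question scores: 1 if the human scorers judged
          # ChatGPT's answer correct, 0 otherwise. All values are illustrative.
          df = pd.DataFrame({
              "correct":  [1, 0, 1, 1, 0, 1, 1, 1],
              "question": ["ddx", "ddx", "testing", "final_dx",
                           "mgmt", "mgmt", "final_dx", "testing"],
              "age":      [34, 34, 61, 61, 47, 47, 29, 29],
              "gender":   ["F", "F", "M", "M", "F", "F", "M", "M"],
              "acuity":   ["emergent", "emergent", "routine", "routine",
                           "routine", "routine", "emergent", "emergent"],
          })

          # Overall accuracy: proportion of correct responses.
          print(f"Overall accuracy: {df['correct'].mean():.1%}")

          # Linear probability model: regress correctness on question type and
          # patient/case covariates to see which factors move performance.
          model = smf.ols("correct ~ C(question) + age + C(gender) + C(acuity)",
                          data=df).fit()
          print(model.summary())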

          Results

          ChatGPT achieved an overall accuracy of 71.7% (95% CI 69.3%-74.1%) across all 36 clinical vignettes. The LLM demonstrated the highest performance in making a final diagnosis with an accuracy of 76.9% (95% CI 67.8%-86.1%) and the lowest performance in generating an initial differential diagnosis with an accuracy of 60.3% (95% CI 54.2%-66.6%). Compared to answering questions about general medical knowledge, ChatGPT demonstrated inferior performance on differential diagnosis (β=–15.8%; P<.001) and clinical management (β=–7.4%; P=.02) question types.
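          As a sanity check on intervals like these, a normal-approximation (Wald) confidence interval for a proportion can be computed directly. The sketch below is illustrative only: the abstract does not report the total number of questions, so n=1350 is a hypothetical denominator chosen because it reproduces the reported 69.3%-74.1% interval under the Wald assumption; the authors' actual interval method may differ.

          import math

          def wald_ci(p_hat, n, z=1.96):
              # Normal-approximation 95% CI for a proportion.
              se = math.sqrt(p_hat * (1 - p_hat) / n)
              return p_hat - z * se, p_hat + z * se

          # n=1350 is hypothetical; the abstract reports proportions and
          # intervals but not the per-category question counts.
          lo, hi = wald_ci(0.717, 1350)
          print(f"71.7% accuracy, n=1350 -> 95% CI ({lo:.1%}, {hi:.1%})")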

          Conclusions

          ChatGPT achieves impressive accuracy in clinical decision-making, and its performance strengthens as more clinical information becomes available to it. In particular, ChatGPT demonstrates the greatest accuracy in final diagnosis, as compared with initial differential diagnosis. Limitations include possible model hallucinations and the unclear composition of ChatGPT's training data set.

          Most cited references (24)

          • Artificial intelligence in healthcare

          • Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models

            We evaluated the performance of a large language model called ChatGPT on the United States Medical Licensing Exam (USMLE), which consists of three exams: Step 1, Step 2CK, and Step 3. ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement. Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations. These results suggest that large language models may have the potential to assist with medical education, and potentially, clinical decision-making.

          • On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?

                Author and article information

                Journal
                Journal of Medical Internet Research (J Med Internet Res; JMIR)
                Publisher: JMIR Publications (Toronto, Canada)
                ISSN: 1439-4456 (print); 1438-8871 (electronic)
                Published: 22 August 2023
                Volume 25: Article e48659
                Affiliations
                [1] Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States
                [2] Harvard Medical School, Boston, MA, United States
                [3] Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
                [4] Department of Radiology, Brigham and Women's Hospital, Boston, MA, United States
                [5] Data Science Office, Mass General Brigham, Boston, MA, United States
                [6] Mass General Brigham Innovation, Mass General Brigham, Boston, MA, United States
                Author notes
                Corresponding Author: Marc D Succi, msucci@partners.org
                Author information
                https://orcid.org/0000-0003-3007-4812
                https://orcid.org/0000-0001-5619-9344
                https://orcid.org/0000-0003-4252-5916
                https://orcid.org/0000-0002-6698-5151
                https://orcid.org/0009-0002-0939-7449
                https://orcid.org/0000-0002-4409-6062
                https://orcid.org/0000-0002-2166-0521
                https://orcid.org/0000-0003-1207-6443
                https://orcid.org/0000-0002-1518-3984
                Article
                Article ID: v25i1e48659
                DOI: 10.2196/48659
                PMCID: PMC10481210
                PMID: 37606976
                ©Arya Rao, Michael Pang, John Kim, Meghana Kamineni, Winston Lie, Anoop K Prasad, Adam Landman, Keith Dreyer, Marc D Succi. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 22.08.2023.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.

                History
                Received: 2 May 2023
                Revisions requested: 7 July 2023
                Revised version received: 26 July 2023
                Accepted: 27 July 2023
                Categories
                Original Paper

                Medicine
                Keywords: large language models; LLMs; artificial intelligence; AI; clinical decision support; clinical vignettes; ChatGPT; generative pre-trained transformer; GPT; utility; development; usability; chatbot; accuracy; decision-making
