      Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow: Development and Usability Study

          Abstract

          Background

          Large language model (LLM)–based artificial intelligence chatbots direct the power of large training data sets toward successive, related tasks as opposed to single-ask tasks, for which artificial intelligence already achieves impressive performance. The capacity of LLMs to assist in the full scope of iterative clinical reasoning via successive prompting, in effect acting as artificial physicians, has not yet been evaluated.

          Objective

          This study aimed to evaluate ChatGPT’s capacity for ongoing clinical decision support via its performance on standardized clinical vignettes.

          Methods

          We entered all 36 published clinical vignettes from the Merck Sharp & Dohme (MSD) Clinical Manual into ChatGPT and compared its accuracy on differential diagnosis, diagnostic testing, final diagnosis, and management, stratified by patient age, gender, and case acuity. Accuracy was measured as the proportion of correct responses to the questions posed within the clinical vignettes, as judged by human scorers. We further conducted linear regression to assess the factors contributing to ChatGPT's performance on clinical tasks.
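          To make the scoring and regression concrete, here is a minimal Python sketch of this kind of analysis. It is not the authors' code: the data frame, column names (correct, question, age, gender, acuity), and all values are hypothetical, and the linear probability model fit with statsmodels stands in for whatever regression specification the authors actually used.

          import pandas as pd
          import statsmodels.formula.api as smf

          # Hypothetical per-question scores: 1 if the human scorers judged
          # ChatGPT's answer correct, 0 otherwise. All values are illustrative.
          df = pd.DataFrame({
              "correct":  [1, 0, 1, 1, 0, 1, 1, 1],
              "question": ["ddx", "ddx", "testing", "final_dx",
                           "mgmt", "mgmt", "final_dx", "testing"],
              "age":      [34, 34, 61, 61, 47, 47, 29, 29],
              "gender":   ["F", "F", "M", "M", "F", "F", "M", "M"],
              "acuity":   ["emergent", "emergent", "routine", "routine",
                           "routine", "routine", "emergent", "emergent"],
          })

          # Overall accuracy: proportion of correct responses.
          print(f"Overall accuracy: {df['correct'].mean():.1%}")

          # Linear probability model: regress correctness on question type and
          # patient/case covariates to see which factors move performance.
          model = smf.ols("correct ~ C(question) + age + C(gender) + C(acuity)",
                          data=df).fit()
          print(model.summary())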

          Results

          ChatGPT achieved an overall accuracy of 71.7% (95% CI 69.3%-74.1%) across all 36 clinical vignettes. The LLM demonstrated the highest performance in making a final diagnosis with an accuracy of 76.9% (95% CI 67.8%-86.1%) and the lowest performance in generating an initial differential diagnosis with an accuracy of 60.3% (95% CI 54.2%-66.6%). Compared to answering questions about general medical knowledge, ChatGPT demonstrated inferior performance on differential diagnosis (β=–15.8%; P<.001) and clinical management (β=–7.4%; P=.02) question types.
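          As a sanity check on intervals like these, a normal-approximation (Wald) confidence interval for a proportion can be computed directly. The sketch below is illustrative only: the abstract does not report the total number of questions, so n=1350 is a hypothetical denominator chosen because it reproduces the reported 69.3%-74.1% interval under the Wald assumption; the authors' actual interval method may differ.

          import math

          def wald_ci(p_hat, n, z=1.96):
              # Normal-approximation 95% CI for a proportion.
              se = math.sqrt(p_hat * (1 - p_hat) / n)
              return p_hat - z * se, p_hat + z * se

          # n=1350 is hypothetical; the abstract reports proportions and
          # intervals but not the per-category question counts.
          lo, hi = wald_ci(0.717, 1350)
          print(f"71.7% accuracy, n=1350 -> 95% CI ({lo:.1%}, {hi:.1%})")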

          Conclusions

          ChatGPT achieves impressive accuracy in clinical decision-making, and its performance strengthens as more clinical information becomes available to it. In particular, ChatGPT demonstrates the greatest accuracy in final diagnosis, as compared with initial differential diagnosis. Limitations include possible model hallucinations and the unclear composition of ChatGPT's training data set.

          Most cited references (24)

          • Artificial intelligence in healthcare

          • Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models

            We evaluated the performance of a large language model called ChatGPT on the United States Medical Licensing Exam (USMLE), which consists of three exams: Step 1, Step 2CK, and Step 3. ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement. Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations. These results suggest that large language models may have the potential to assist with medical education, and potentially, clinical decision-making.

          • On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?

                Author and article information

                Journal
                Journal of Medical Internet Research (J Med Internet Res; JMIR)
                Publisher: JMIR Publications (Toronto, Canada)
                ISSN: 1439-4456 (print); 1438-8871 (electronic)
                Published: 22 August 2023
                Volume 25: Article e48659
                Affiliations
                [1] Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States
                [2] Harvard Medical School, Boston, MA, United States
                [3] Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
                [4] Department of Radiology, Brigham and Women's Hospital, Boston, MA, United States
                [5] Data Science Office, Mass General Brigham, Boston, MA, United States
                [6] Mass General Brigham Innovation, Mass General Brigham, Boston, MA, United States
                Author notes
                Corresponding Author: Marc D Succi, msucci@partners.org
                Author information
                https://orcid.org/0000-0003-3007-4812
                https://orcid.org/0000-0001-5619-9344
                https://orcid.org/0000-0003-4252-5916
                https://orcid.org/0000-0002-6698-5151
                https://orcid.org/0009-0002-0939-7449
                https://orcid.org/0000-0002-4409-6062
                https://orcid.org/0000-0002-2166-0521
                https://orcid.org/0000-0003-1207-6443
                https://orcid.org/0000-0002-1518-3984
                Article
                Article ID: v25i1e48659
                DOI: 10.2196/48659
                PMCID: PMC10481210
                PMID: 37606976
                ©Arya Rao, Michael Pang, John Kim, Meghana Kamineni, Winston Lie, Anoop K Prasad, Adam Landman, Keith Dreyer, Marc D Succi. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 22.08.2023.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.

                History
                Received: 2 May 2023
                Revisions requested: 7 July 2023
                Revised version received: 26 July 2023
                Accepted: 27 July 2023
                Categories
                Original Paper

                Medicine
                Keywords: large language models; LLMs; artificial intelligence; AI; clinical decision support; clinical vignettes; ChatGPT; generative pre-trained transformer; GPT; utility; development; usability; chatbot; accuracy; decision-making
