      Evaluating ChatGPT-4’s Accuracy in Identifying Final Diagnoses Within Differential Diagnoses Compared With Those of Physicians: Experimental Study for Diagnostic Cases

          Abstract

          Background

          The potential of artificial intelligence (AI) chatbots, particularly ChatGPT with GPT-4 (OpenAI), in assisting with medical diagnosis is an emerging research area. However, it is not yet clear how well AI chatbots can evaluate whether the final diagnosis is included in differential diagnosis lists.

          Objective

          This study aims to assess the capability of GPT-4 in identifying the final diagnosis from differential-diagnosis lists and to compare its performance with that of physicians for case report series.

          Methods

          We used a database of differential-diagnosis lists generated from case reports in the American Journal of Case Reports, each paired with the corresponding final diagnosis. The lists were generated by 3 AI systems: GPT-4, Google Bard (currently Google Gemini), and Large Language Model Meta AI 2 (LLaMA2). The primary outcome was whether GPT-4’s evaluations identified the final diagnosis within these lists. None of these AI systems received additional medical training or reinforcement. For comparison, 2 independent physicians also evaluated the lists, with any inconsistencies resolved by another physician.
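The physician reference standard described above (2 independent evaluations, with disagreements resolved by another physician) can be sketched as follows. This is a minimal illustration only: the function name `reference_label`, the boolean encoding of "final diagnosis included in the list", and the `adjudicate` callback are all assumptions made for this example and are not taken from the study.

```python
from typing import Callable


def reference_label(
    rater_1: bool,
    rater_2: bool,
    adjudicate: Callable[[], bool],
) -> bool:
    """Return the reference judgment for one differential-diagnosis list.

    rater_1, rater_2: hypothetical boolean judgments from the 2 independent
        physicians (True = final diagnosis judged to be in the list).
    adjudicate: hypothetical callback standing in for the additional
        physician; it is only consulted when the first two disagree.
    """
    if rater_1 == rater_2:
        # Both physicians agree, so no adjudication is needed.
        return rater_1
    # Inconsistent judgments are resolved by the additional physician.
    return adjudicate()


# Toy usage with made-up judgments (not study data).
agreed = reference_label(True, True, adjudicate=lambda: False)
resolved = reference_label(True, False, adjudicate=lambda: False)
```

Passing the adjudicator as a callback keeps the third judgment lazy, which mirrors the described workflow in which the additional physician is involved only for inconsistent pairs.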

          Results

          The 3 AI systems generated a total of 1176 differential-diagnosis lists from 392 case descriptions. GPT-4’s evaluations concurred with those of the physicians in 966 out of 1176 lists (82.1%). The Cohen κ coefficient was 0.63 (95% CI 0.56-0.69), indicating fair to good agreement between GPT-4’s and the physicians’ evaluations.
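The arithmetic behind these figures can be checked with a short sketch. The counts (392 cases, 3 systems, 966 concordant lists) and κ = 0.63 come from the abstract; the function names and the toy rating vectors are assumptions for illustration, and the abstract does not report the underlying 2×2 table, so κ itself cannot be recomputed here. The chance-agreement value is backed out from the rounded reported numbers using κ = (p_o − p_e) / (1 − p_e) and is therefore approximate.

```python
def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    labels = set(ratings_a) | set(ratings_b)
    p_e = sum(
        (ratings_a.count(label) / n) * (ratings_b.count(label) / n)
        for label in labels
    )
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def implied_chance_agreement(p_o, kappa):
    """Solve kappa = (p_o - p_e) / (1 - p_e) for p_e."""
    return (p_o - kappa) / (1 - kappa)


# Figures reported in the abstract.
total_lists = 392 * 3                    # 3 AI systems x 392 case descriptions
observed_agreement = 966 / total_lists   # GPT-4 vs physicians
implied_p_e = implied_chance_agreement(observed_agreement, 0.63)

# Toy example with made-up labels (NOT study data): 1 = diagnosis in list.
toy_kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])

print(f"lists: {total_lists}")
print(f"observed agreement: {observed_agreement:.1%}")
print(f"implied chance agreement: {implied_p_e:.2f}")
print(f"toy kappa: {toy_kappa:.2f}")
```

Under these assumptions, roughly half of the observed agreement would be expected by chance alone, which is why κ (0.63) is noticeably lower than the raw 82.1% agreement.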

          Conclusions

          GPT-4 demonstrated fair to good agreement in identifying the final diagnosis from differential-diagnosis lists, comparable to physicians, for a case report series. Its ability to compare differential-diagnosis lists with final diagnoses suggests its potential to support clinical decision-making through diagnostic feedback. However, its application in real-world scenarios and further validation in diverse clinical environments are essential to fully understand its utility in the diagnostic process.

                Author and article information

                Contributors
                Journal
                JMIR Formative Research (JMIR Form Res)
                Publisher: JMIR Publications (Toronto, Canada)
                ISSN: 2561-326X
                Published: 26 June 2024
                Volume: 8
                Article number: e59267
                Affiliations
                [1] Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Tochigi, Japan
                [2] Department of General Medicine, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan
                Author notes
                Corresponding Author: Takanobu Hirosawa, hirosawa@dokkyomed.ac.jp
                Author information
                https://orcid.org/0000-0002-3573-8203
                https://orcid.org/0000-0001-6042-7397
                https://orcid.org/0009-0000-8822-7127
                https://orcid.org/0000-0001-9104-8891
                https://orcid.org/0000-0001-9513-6864
                https://orcid.org/0000-0002-3788-487X
                Article
                Publisher ID: v8i1e59267
                DOI: 10.2196/59267
                PMCID: PMC11237772
                PMID: 38924784
                ©Takanobu Hirosawa, Yukinori Harada, Kazuya Mizuta, Tetsu Sakamoto, Kazuki Tokumasu, Taro Shimizu. Originally published in JMIR Formative Research (https://formative.jmir.org), 26.06.2024.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.

                History
                : 8 April 2024
                : 24 April 2024
                : 28 April 2024
                : 4 May 2024
                Categories
                Original Paper

                Keywords: decision support system, diagnostic errors, diagnostic excellence, diagnosis, large language model, llm, natural language processing, gpt-4, chatgpt, diagnoses, physicians, artificial intelligence, ai, chatbots, medical diagnosis, assessment, decision-making support, application, applications, app, apps
