
      Reliability of Medical Information Provided by ChatGPT: Assessment Against Clinical Guidelines and Patient Information Quality Instrument

Research Article
Harriet Louise Walker, MBChB, MSc 1; Shahi Ghani, MBBS, BSc, MSc, DHMS 1; Christoph Kuemmerli, MD 2; Christian Andreas Nebiker, MD, PD 3; Beat Peter Müller, Prof Dr 2; Dimitri Aristotle Raptis, MD, MSc, PhD 4; Sebastian Manuel Staubli, MD 1, 2
(Reviewer), (Reviewer), (Reviewer)
      Journal of Medical Internet Research
      JMIR Publications
      artificial intelligence, internet information, patient information, ChatGPT, EQIP tool, chatbot, chatbots, conversational agent, conversational agents, internal medicine, pancreas, liver, hepatic, biliary, gall, bile, gallstone, pancreatitis, pancreatic, medical information


          Abstract

          Background

          ChatGPT-4 is the latest release of a novel artificial intelligence (AI) chatbot able to answer freely formulated and complex questions. In the near future, ChatGPT could become the new standard for health care professionals and patients to access medical information. However, little is known about the quality of medical information provided by the AI.

          Objective

          We aimed to assess the reliability of medical information provided by ChatGPT.

          Methods

The quality of medical information provided by ChatGPT-4 on the 5 hepato-pancreatico-biliary (HPB) conditions with the highest global disease burden was assessed with the Ensuring Quality Information for Patients (EQIP) tool. The EQIP tool is used to measure the quality of internet-available information and consists of 36 items divided into 3 subsections. In addition, 5 guideline recommendations per analyzed condition were rephrased as questions and entered into ChatGPT, and agreement between the guidelines and the AI's answers was rated by 2 authors independently. All queries were repeated 3 times to measure the internal consistency of ChatGPT.
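To make the scoring arithmetic described above concrete, here is a minimal Python sketch of how per-condition EQIP totals and the guideline-agreement rate could be tabulated. It is illustrative only: the item totals and agreement flags below are placeholders, not the study data, and the variable names are invented for this example.

```python
# Illustrative sketch only -- placeholder numbers, not the study data.
# Assumes each of the 36 EQIP items is scored 1 (criterion met) or 0 (not met)
# and summed per condition, and that each guideline-derived question is marked
# True when ChatGPT's answer agreed with the guideline recommendation.
from statistics import median, quantiles

# Hypothetical per-condition EQIP totals out of 36 items.
eqip_totals = {
    "gallstone disease": 18,
    "pancreatitis": 16,
    "liver cirrhosis": 14,
    "pancreatic cancer": 15,
    "hepatocellular carcinoma": 19,
}

scores = sorted(eqip_totals.values())
q1, _, q3 = quantiles(scores, n=4)  # lower and upper quartiles for the IQR
print(f"median EQIP score: {median(scores)} (IQR {q1}-{q3})")

# Hypothetical agreement flags: 5 guideline questions for each of the 5 conditions.
agreement = [True] * 15 + [False] * 10
print(f"guideline agreement: {sum(agreement)}/{len(agreement)} "
      f"({sum(agreement) / len(agreement):.0%})")
```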

          Results

Five conditions were identified (gallstone disease, pancreatitis, liver cirrhosis, pancreatic cancer, and hepatocellular carcinoma). The median EQIP score across all conditions was 16 (IQR 14.5-18) out of a possible 36 items. Divided by subsection, median scores for content, identification, and structure data were 10 (IQR 9.5-12.5), 1 (IQR 1-1), and 4 (IQR 4-5), respectively. Agreement between guideline recommendations and answers provided by ChatGPT was 60% (15/25). Interrater agreement, as measured by the Fleiss κ, was 0.78 (P<.001), indicating substantial agreement. Internal consistency of the answers provided by ChatGPT was 100%.
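For readers who want to see how the interrater statistic reported above is obtained in practice, the sketch below computes a Fleiss κ for two raters using statsmodels. The ratings are simulated placeholders, not the study's actual ratings, and statsmodels is simply one library that implements this statistic.

```python
# Illustrative only: simulated ratings, not the study data.
# Rows = guideline-derived questions (25), columns = the 2 independent raters;
# 1 = ChatGPT's answer judged concordant with the guideline, 0 = discordant.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(seed=0)
rater_a = rng.integers(0, 2, size=25)
rater_b = rater_a.copy()
rater_b[:3] ^= 1  # flip a few ratings to simulate occasional disagreement

ratings = np.column_stack([rater_a, rater_b])   # shape: (25 subjects, 2 raters)
table, _ = aggregate_raters(ratings)            # per-subject counts per category
print("Fleiss kappa:", round(fleiss_kappa(table, method="fleiss"), 2))
```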

          Conclusions

          ChatGPT provides medical information of comparable quality to available static internet information. Although currently of limited quality, large language models could become the future standard for patients and health care professionals to gather medical information.


                Author and article information

Journal
J Med Internet Res (JMIR): Journal of Medical Internet Research
JMIR Publications (Toronto, Canada)
ISSN 1439-4456 (print); 1438-8871 (electronic)
2023; published 30 June 2023
Volume 25: e47479
                Affiliations
                [1 ] Royal Free London NHS Foundation Trust London United Kingdom
                [2 ] Clarunis – University Center for Gastrointestinal and Liver Diseases Basel Switzerland
                [3 ] Departement Chirurgie Kantonsspital Aarau Aarau Switzerland
                [4 ] Organ Transplant Center of Excellence King Faisal Specialist Hospital & Research Centre Riyadh Saudi Arabia
                Author notes
Corresponding Author: Sebastian Manuel Staubli, s.staubli@nhs.net
                Author information
                https://orcid.org/0009-0009-6278-6399
                https://orcid.org/0000-0001-7366-296X
                https://orcid.org/0000-0002-7109-3545
                https://orcid.org/0000-0002-7493-2850
                https://orcid.org/0000-0002-8552-8538
                https://orcid.org/0000-0002-0898-3270
                https://orcid.org/0000-0002-0818-9835
Article
Publisher ID: v25i1e47479
DOI: 10.2196/47479
PMCID: PMC10365578
PMID: 37389908
                ©Harriet Louise Walker, Shahi Ghani, Christoph Kuemmerli, Christian Andreas Nebiker, Beat Peter Müller, Dimitri Aristotle Raptis, Sebastian Manuel Staubli. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 30.06.2023.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.

                History
                : 7 April 2023
                : 30 May 2023
                : 7 June 2023
                : 15 June 2023
Categories
Original Paper

                Medicine
