
      Large Language Models in Hematology Case Solving: A Comparative Study of ChatGPT-3.5, Google Bard, and Microsoft Bing

      research-article


          Abstract

          Background

          Large language models (LLMs), such as ChatGPT-3.5, Google Bard, and Microsoft Bing, have shown promising capabilities in various natural language processing (NLP) tasks. However, their performance and accuracy in solving domain-specific questions, particularly in the field of hematology, have not been extensively investigated.

          Objective

          This study aimed to explore the capability of three LLMs, namely ChatGPT-3.5, Google Bard, and Microsoft Bing (Precise), to solve hematology-related cases and to compare their performance.

          Methods

          This cross-sectional study was conducted in the Department of Physiology and Pathology, All India Institute of Medical Sciences, Deoghar, Jharkhand, India. We curated a set of 50 hematology cases covering a range of topics and complexities. The dataset included queries related to blood disorders, hematologic malignancies, laboratory test parameters, calculations, and treatment options. Each case and its related question was prepared with a set of correct answers for comparison. We used ChatGPT-3.5, Google Bard Experiment, and Microsoft Bing (Precise) for the question-answering tasks. Two physiologists and one pathologist checked the answers and rated them on a scale from one to five. The average scores of the three models were compared using Friedman’s test with Dunn’s post-hoc test. Because the curriculum from which the questions were curated has a 50% pass grade, the performance of each LLM was also compared with a median of 2.5 using a one-sample median test.
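The Friedman comparison described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the scores below are made-up values for six hypothetical cases (not the study's data), the function names are illustrative, and Dunn's post-hoc test is not covered. The closed-form p-value used here is valid only when exactly three models are compared, because the chi-square approximation then has two degrees of freedom.

```python
import math


def average_ranks(values):
    """Rank values from 1 to k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_test(rows):
    """Tie-corrected Friedman statistic.

    rows holds one list per case, with one score per model in a fixed order.
    Returns the statistic and the per-model rank sums.
    """
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    tie_term = 0
    for row in rows:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
        for value in set(row):
            t = row.count(value)
            tie_term += t ** 3 - t
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    correction = 1 - tie_term / (n * k * (k * k - 1))  # zero only if every row is fully tied
    return stat / correction, rank_sums


# Hypothetical ratings (1 to 5) for six cases; columns are three models.
hypothetical_scores = [
    [4, 2, 1],
    [3, 3, 2],
    [5, 2, 2],
    [4, 3, 1],
    [2, 1, 3],
    [4, 2, 2],
]

stat, rank_sums = friedman_test(hypothetical_scores)
# With k = 3 models the reference distribution is chi-square with 2 degrees
# of freedom, whose upper tail probability is exp(-x / 2).
p_value = math.exp(-stat / 2)
print(f"Friedman statistic = {stat:.3f}, p = {p_value:.4f}")
```

Each case acts as a block: the three models are ranked within a case, so the test compares models on the same questions without assuming the one-to-five ratings are normally distributed.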

          Results

          The scores among the three LLMs were significantly different (p-value < 0.0001) with the highest score by ChatGPT (3.15±1.19), followed by Bard (2.23±1.17) and Bing (1.98±1.01). The score of ChatGPT was significantly higher than 50% (p-value = 0.0004), Bard's score was close to 50% (p-value = 0.38), and Bing's score was significantly lower than the pass score (p-value = 0.0015).
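The comparison against the pass mark can be sketched in the same way. The abstract does not say which one-sample median test was used, so the exact two-sided sign test below is an assumed stand-in rather than the authors' procedure, and the ratings are made-up values rather than study data.

```python
from math import comb


def sign_test(scores, hypothesized_median):
    """Exact two-sided sign test of scores against a hypothesized median.

    Returns the counts above and below the median and the p-value.
    Scores equal to the hypothesized median are dropped, as is conventional.
    """
    above = sum(1 for s in scores if s > hypothesized_median)
    below = sum(1 for s in scores if s < hypothesized_median)
    n = above + below
    if n == 0:
        return above, below, 1.0
    smaller = min(above, below)
    # Probability of a split at least this lopsided under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return above, below, min(1.0, 2 * tail)


# Hypothetical ratings for one model across ten cases (not study data).
hypothetical_ratings = [4, 3, 5, 4, 2, 4, 3, 1, 4, 3]
PASS_MEDIAN = 2.5  # 50% of the maximum rating of 5, as stated in the Methods

above, below, p_value = sign_test(hypothetical_ratings, PASS_MEDIAN)
print(f"{above} above, {below} below, p = {p_value:.4f}")
```

A sign test uses only the direction of each score relative to 2.5. A one-sample Wilcoxon signed-rank test also uses the size of each difference and may give a different p-value on the same data.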

          Conclusion

          The three LLMs differed significantly in their ability to solve hematology case vignettes. ChatGPT achieved the highest score, followed by Google Bard and Microsoft Bing. The observed performance trends suggest that ChatGPT holds promising potential in the medical domain. However, none of the models was capable of answering all questions accurately. Further research on and optimization of language models could make them valuable for healthcare and medical education applications.


                Author and article information

                Journal: Cureus (Palo Alto, CA)
                ISSN: 2168-8184
                Published: 21 August 2023
                Volume: 15
                Issue: 8
                Article number: e43861
                Affiliations
                [1] Physiology, All India Institute of Medical Sciences, Deoghar, Deoghar, IND
                [2] Pathology, All India Institute of Medical Sciences, Deoghar, Deoghar, IND
                Article
                DOI: 10.7759/cureus.43861
                PMC ID: 10511207
                PubMed ID: 37736448
                0f2628b6-b313-4510-85c6-3cb0b13c4f5f
                Copyright © 2023, Kumari et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                21 August 2023
                Categories
                Medical Education
                Healthcare Technology
                Hematology

                Keywords: ai and robotics in healthcare, microsoft bing, google bard, chatgpt, pathology, hematology, hematologic diseases, natural language processing, search engine, pathologists
