      Evaluating the performance of ChatGPT-3.5 and ChatGPT-4 on the Taiwan plastic surgery board examination

Research article
Journal: Heliyon (Elsevier)
Keywords: ChatGPT-3.5, ChatGPT-4, Taiwan plastic surgery board examination


          Abstract

          Background

          Chat Generative Pre-Trained Transformer (ChatGPT) is a state-of-the-art large language model that has been evaluated across various medical fields, with mixed performance on licensing examinations. This study aimed to assess the performance of ChatGPT-3.5 and ChatGPT-4 in answering questions from the Taiwan Plastic Surgery Board Examination.

          Methods

          The study evaluated the performance of ChatGPT-3.5 and ChatGPT-4 on 1375 questions from the past 8 years of the Taiwan Plastic Surgery Board Examination, including 985 single-choice and 390 multiple-choice questions. We obtained the responses between June and July 2023, launching a new chat session for each question to eliminate memory retention bias.
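The abstract does not describe the querying pipeline itself; the sketch below shows one way a fresh-session-per-question protocol could be implemented against the OpenAI chat API. The model names, the single-turn prompt, the questions data structure, and the letter-matching grader are illustrative assumptions, not details taken from the paper (the authors' grading was presumably manual).

    # Minimal sketch of a fresh-session-per-question evaluation loop.
    # Assumptions (not from the paper): the OpenAI Python SDK, the model
    # names, and a list of (question_text, correct_answer) pairs.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def ask_fresh_session(model: str, question: str) -> str:
        """Send one exam question in a brand-new, single-turn chat so
        earlier questions cannot leak into the context (avoiding the
        memory retention bias the study describes)."""
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
        )
        return response.choices[0].message.content

    def correct_rate(model: str, questions: list[tuple[str, str]]) -> float:
        """Fraction of questions answered correctly."""
        correct = 0
        for text, answer in questions:
            reply = ask_fresh_session(model, text)
            # Naive grading for illustration: does the reply contain the
            # correct option letter? Real grading needs human review.
            if answer.strip().upper() in reply.upper():
                correct += 1
        return correct / len(questions)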

          Results

Overall, ChatGPT-4 outperformed ChatGPT-3.5, achieving a 59% correct answer rate compared to 41% for ChatGPT-3.5. ChatGPT-4 passed five of the eight yearly exams, whereas ChatGPT-3.5 failed all eight. On single-choice questions, ChatGPT-4 scored 66% correct, compared to 48% for ChatGPT-3.5. On multiple-choice questions, ChatGPT-4 achieved a 43% correct rate, nearly double ChatGPT-3.5's 23%.
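As a quick consistency check (our arithmetic, not the paper's), the reported overall rates match the question-count-weighted mean of the per-format rates:

    # Overall rate = weighted mean of single-choice (985 questions) and
    # multiple-choice (390 questions) rates reported above.
    n_single, n_multi = 985, 390

    def overall(rate_single: float, rate_multi: float) -> float:
        return (n_single * rate_single + n_multi * rate_multi) / (n_single + n_multi)

    print(f"{overall(0.66, 0.43):.1%}")  # ChatGPT-4:   59.5%, ~ the reported 59%
    print(f"{overall(0.48, 0.23):.1%}")  # ChatGPT-3.5: 40.9%, ~ the reported 41%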

          Conclusion

          As ChatGPT evolves, its performance on the Taiwan Plastic Surgery Board Examination is expected to improve further. The study suggests potential reforms, such as incorporating more problem-based scenarios, leveraging ChatGPT to refine exam questions, and integrating AI-assisted learning into candidate preparation. These advancements could enhance the assessment of candidates' critical thinking and problem-solving abilities in the field of plastic surgery.


                Author and article information

Journal: Heliyon (Elsevier); ISSN 2405-8440
Published: 18 July 2024 (online); 30 July 2024 (issue)
Volume: 10; Issue: 14; Article number: e34851
Affiliation: Department of Plastic Surgery, Kaohsiung Chang Gung Memorial Hospital, Chang Gung University and College of Medicine, Kaohsiung 83301, Taiwan
Corresponding author: Department of Plastic Surgery, Kaohsiung Chang Gung Memorial Hospital and Chang Gung University College of Medicine, No. 123, Ta-Pei Road, Niao-Song District, Kaohsiung City 833, Taiwan. m93chinghua@gmail.com
PII: S2405-8440(24)10882-1
DOI: 10.1016/j.heliyon.2024.e34851
PMC: PMC11324965
PMID: 39149010
                © 2024 The Authors

                This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

History: Received 23 May 2024; Revised 27 June 2024; Accepted 17 July 2024
Categories: Research Article

