
      GPT-4 can pass the Korean National Licensing Examination for Korean Medicine Doctors

Research Article


          Abstract

Traditional Korean medicine (TKM) emphasizes individualized diagnosis and treatment. This uniqueness makes AI modeling difficult due to limited data and implicit processes. Large language models (LLMs) have demonstrated impressive medical inference, even without advanced training on medical texts. This study assessed the capabilities of GPT-4 in TKM, using the Korean National Licensing Examination for Korean Medicine Doctors (K-NLEKMD) as a benchmark. The K-NLEKMD, administered by a national organization, encompasses 12 major subjects in TKM. GPT-4 answered 340 questions from the 2022 K-NLEKMD. We optimized prompts with Chinese-term annotation, English translation of questions and instructions, exam-optimized instruction, and self-consistency. GPT-4 with optimized prompts achieved 66.18% accuracy, surpassing both the examination’s average pass mark of 60% and the 40% minimum required for each subject. The gradual introduction of language-related prompts and prompting techniques raised accuracy from 51.82% to this maximum. GPT-4 showed low accuracy in subjects that are highly localized to Korea and TKM, including public health & medicine-related law, internal medicine (2), and acupuncture medicine. The model’s accuracy was lower for questions requiring TKM-specialized knowledge than for those that did not. It exhibited higher accuracy in diagnosis-based and recall-based questions than in intervention-based questions. A significant positive correlation was observed between the consistency and accuracy of GPT-4’s responses. This study unveils both the potential and the challenges of applying LLMs to TKM. These findings underline the potential of LLMs like GPT-4 in culturally adapted medicine, especially TKM, for tasks such as clinical assistance, medical education, and research, but they also point to the need to develop methods that mitigate the cultural bias inherent in LLMs and to validate their efficacy in real-world clinical settings.
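A minimal sketch of the self-consistency step described above is given below, assuming a Python environment with the openai client library; the build_prompt helper, the model name "gpt-4", and the sampling settings are illustrative assumptions, not the authors' published pipeline. The idea is simply to sample several answers at non-zero temperature and majority-vote over them.

    from collections import Counter
    from openai import OpenAI  # assumes the openai Python package (v1+) is installed

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def build_prompt(question: str) -> str:
        # Hypothetical stand-in for the paper's optimized prompt; in the study this step
        # also adds Chinese-character annotations and an English, exam-style instruction.
        return (
            "You are taking a licensing examination for Korean medicine doctors. "
            "Answer with the single letter of the best option.\n\n" + question
        )

    def self_consistent_answer(question: str, n_samples: int = 5) -> str:
        """Sample several answers at non-zero temperature and return the majority vote."""
        answers = []
        for _ in range(n_samples):
            response = client.chat.completions.create(
                model="gpt-4",
                messages=[{"role": "user", "content": build_prompt(question)}],
                temperature=0.7,  # some diversity is required for self-consistency to help
                max_tokens=5,
            )
            answers.append(response.choices[0].message.content.strip()[:1].upper())
        # The most frequent answer becomes the final prediction; its vote share can also
        # serve as a crude per-question consistency score.
        return Counter(answers).most_common(1)[0][0]

The vote share of the winning answer provides a simple per-question consistency measure, roughly the kind of quantity the reported consistency-accuracy correlation concerns.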

Most cited references (39)


          Attention Is All You Need

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
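For orientation, the core operation this reference introduces, scaled dot-product attention, can be written in a few lines of NumPy; the shapes and toy data below are illustrative and unrelated to the paper's experiments.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Single-head attention: softmax(Q K^T / sqrt(d_k)) applied to V."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        return weights @ V                               # weighted sum of values

    # Toy example: 3 query positions attending over 4 key/value positions of width 8.
    rng = np.random.default_rng(0)
    Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
    print(scaled_dot_product_attention(Q, K, V).shape)   # (3, 8)

The full Transformer stacks many such attention heads with feed-forward layers, but this single-head form captures the mechanism the abstract refers to.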

            BioBERT: a pre-trained biomedical language representation model for biomedical text mining

Motivation: Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows. With the progress in natural language processing (NLP), extracting valuable information from biomedical literature has gained popularity among researchers, and deep learning has boosted the development of effective biomedical text mining models. However, directly applying the advancements in NLP to biomedical text mining often yields unsatisfactory results due to a word distribution shift from general domain corpora to biomedical corpora. In this article, we investigate how the recently introduced pre-trained language model BERT can be adapted for biomedical corpora.
Results: We introduce BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), which is a domain-specific language representation model pre-trained on large-scale biomedical corpora. With almost the same architecture across tasks, BioBERT largely outperforms BERT and previous state-of-the-art models in a variety of biomedical text mining tasks when pre-trained on biomedical corpora. While BERT obtains performance comparable to that of previous state-of-the-art models, BioBERT significantly outperforms them on the following three representative biomedical text mining tasks: biomedical named entity recognition (0.62% F1 score improvement), biomedical relation extraction (2.80% F1 score improvement) and biomedical question answering (12.24% MRR improvement). Our analysis results show that pre-training BERT on biomedical corpora helps it to understand complex biomedical texts.
Availability and implementation: We make the pre-trained weights of BioBERT freely available at https://github.com/naver/biobert-pretrained, and the source code for fine-tuning BioBERT available at https://github.com/dmis-lab/biobert.
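As a rough illustration of how a pre-trained model like BioBERT is reused downstream, the snippet below loads it through the Hugging Face transformers library; the model identifier dmis-lab/biobert-base-cased-v1.1 is an assumed community mirror (the reference's own links above point to the original GitHub releases), and the classification head is freshly initialized, so it would still need task-specific fine-tuning.

    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Assumed Hugging Face mirror of the BioBERT weights; the original release is on GitHub.
    model_name = "dmis-lab/biobert-base-cased-v1.1"

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # A new 2-label classification head is placed on top of the pre-trained encoder;
    # it must be fine-tuned on a labeled biomedical task (e.g. relation extraction).
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    inputs = tokenizer(
        "Aspirin inhibits cyclooxygenase enzymes.",
        return_tensors="pt",
        truncation=True,
    )
    logits = model(**inputs).logits  # head is untrained, so these logits are not yet meaningful
    print(logits.shape)              # torch.Size([1, 2])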

              Language Models are Few-Shot Learners

Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
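The few-shot protocol described above amounts to packing worked examples into the prompt text itself, with no gradient updates; the toy sketch below uses made-up translation demonstrations rather than any benchmark data.

    # Few-shot prompting: demonstrations are supplied as text; the model's weights are never updated.
    demonstrations = [
        ("Translate to French: cheese", "fromage"),
        ("Translate to French: book", "livre"),
        ("Translate to French: house", "maison"),
    ]

    def few_shot_prompt(query: str) -> str:
        """Concatenate (input, answer) demonstrations followed by the new query."""
        lines = [f"{q}\nAnswer: {a}" for q, a in demonstrations]
        lines.append(f"{query}\nAnswer:")
        return "\n\n".join(lines)

    print(few_shot_prompt("Translate to French: water"))
    # Whatever the model generates after the final "Answer:" is taken as its few-shot prediction.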

                Author and article information

                Contributors
Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft
Roles: Formal analysis, Methodology, Software
Roles: Writing – review & editing
Roles: Funding acquisition, Writing – review & editing
Roles: Conceptualization, Formal analysis, Funding acquisition, Methodology, Project administration, Supervision, Writing – original draft, Writing – review & editing
Role: Editor
Journal
PLOS Digital Health (PLOS Digit Health)
Public Library of Science (San Francisco, CA, USA)
ISSN: 2767-3170
Published: 15 December 2023
Volume 2, Issue 12 (December 2023): e0000416
                Affiliations
                [1 ] Department of Physiology, College of Korean Medicine, Gachon University, Seongnam, Gyeonggi-do, Korea
                [2 ] Division of Integrated Art Therapy, School of Korean Medicine, Pusan National University, Yangsan, Gyeongsangnam-do, Korea
                [3 ] Division of Longevity and Biofunctional Medicine, School of Korean Medicine, Pusan National University, Yangsan, Gyeongsangnam-do, Korea
                [4 ] Department of Neurobiology, Stanford University School of Medicine, Stanford, California, United States of America
                Massachusetts Institute of Technology, UNITED STATES
                Author notes

                The authors have declared that no competing interests exist.

                Author information
                https://orcid.org/0000-0002-3546-8389
                https://orcid.org/0000-0003-3823-5799
                https://orcid.org/0000-0001-8281-9148
Article
Manuscript ID: PDIG-D-23-00147
DOI: 10.1371/journal.pdig.0000416
PMCID: 10723673
PMID: 38100393
52e21db1-c829-482b-a5c2-007c4b0556fa
© 2023 Jang et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History
Received: 19 April 2023
Accepted: 20 November 2023
                Page count
                Figures: 4, Tables: 2, Pages: 14
Funding
Funded by: the National Research Foundation of Korea; Award ID: 2020R1F1A1075145; Award Recipient: Y.-K. K.
Funded by: the National Research Foundation of Korea; Award ID: 2022R1F1A1068841; Award Recipient: C.-E. K.
                This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1F1A1075145 to Y.-K. K. and 2022R1F1A1068841 to C.-E. K.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
Biology and Life Sciences > Neuroscience > Cognitive Science > Cognitive Psychology > Language
Biology and Life Sciences > Psychology > Cognitive Psychology > Language
Social Sciences > Psychology > Cognitive Psychology > Language
Medicine and Health Sciences > Health Care > Health Care Providers > Physicians
People and Places > Population Groupings > Professions > Medical Personnel > Physicians
People and Places > Population Groupings > Ethnicities > Asian People > Korean People
Medicine and Health Sciences > Complementary and Alternative Medicine > Traditional Medicine
Biology and Life Sciences > Neuroscience > Cognitive Science > Cognitive Psychology > Decision Making
Biology and Life Sciences > Psychology > Cognitive Psychology > Decision Making
Social Sciences > Psychology > Cognitive Psychology > Decision Making
Biology and Life Sciences > Neuroscience > Cognitive Science > Cognition > Decision Making
Medicine and Health Sciences > Pediatrics
Medicine and Health Sciences > Public and Occupational Health
Social Sciences > Law and Legal Sciences > Medical Law
Medicine and Health Sciences > Medical Humanities > Medical Law
                Custom metadata
The benchmark dataset, the Korean National Licensing Examination, is available from the Korea Health Personnel Licensing Examination Institute (https://www.kuksiwon.or.kr/).
