Can ChatGPT and other large language models with internet-connected databases solve the questions and concerns of patients with prostate cancer and help democratize medical knowledge?

      letter
      Journal of Translational Medicine
      BioMed Central


          Abstract

To the editor,

Large language models (LLMs) represented by ChatGPT have shown promising potential in the field of medicine [1, 2]. However, it should be noted that the answers provided by ChatGPT may contain errors [3]. In addition, other companies have launched internet-connected LLMs that can access the latest data, potentially outperforming ChatGPT, which was trained on pre-September 2021 data. Prostate cancer (PCa) is the second most common cancer in men globally, with a relatively long survival time compared with other cancer types [4]. Taking PCa as an example, we evaluated whether these LLMs could provide correct and useful information on common problems related to PCa and offer appropriate humanistic care, thereby contributing to the democratization of medical knowledge.

We designed 22 questions based on patient education guidelines (CDC and UpToDate) and our own clinical experience, covering screening, prevention, treatment options, and postoperative complications (Table 1). The questions ranged from basic to advanced knowledge of PCa. Five state-of-the-art LLMs were included: ChatGPT (Free and Plus versions), YouChat, NeevaAI, Perplexity (concise and detailed modes), and Chatsonic. The quality of the answers was evaluated primarily on accuracy, comprehensiveness, patient readability, humanistic care, and stability.

Table 1 Questions and corresponding difficulty levels used to test the performance of LLMs
1. What is prostate cancer? (Basic)
2. What are the symptoms of prostate cancer? (Basic)
3. How can I prevent from prostate cancer? (Basic)
4. Who is at risk of prostate cancer? (Basic)
5. How is prostate cancer diagnosed? (Basic)
6. What is a prostate biopsy? (Basic)
7. How is prostate cancer treated? (Basic)
8. How long can I live if I have prostate cancer? (Basic)
9. How often do I need get a PSA test? (Basic)
10. What is prostate-specific antigen? (Basic)
11. What is screening for prostate cancer? (Basic)
12. Should I get screened for prostate cancer? (Basic)
13. My father had prostate cancer. Will I have prostate cancer too? (Hard)
14. I have a high PSA level. Do I have prostate cancer? (Hard)
15. What does a PSA level of 4 mean? (Hard)
16. What does a PSA level of 10 mean? (Hard)
17. What does a PSA level of 20 mean? (Hard)
18. The doctor said my prostate is totally removed by surgery. Why my PSA is still high after surgery? (Hard)
19. I have localized prostate cancer. Which is better, the radiation therapy or the surgery? (Hard)
20. Should I have robotic surgery or laparoscopic surgery if I have prostate cancer? (Hard)
21. What is the best medicine for Castration-resistant prostate cancer? (Hard)
22. Which is better for prostate cancer? Apalutamide or Enzalutamide? (Hard)

The accuracy of most LLMs' responses was above 90%, except for NeevaAI and Chatsonic (Fig. 1A). For basic questions with definite answers, most LLMs achieved high accuracy. Nevertheless, accuracy decreased for questions tied to a specific scenario or requiring summary and analysis (e.g., why PSA is still high after surgery). Among these LLMs, ChatGPT had the highest accuracy rate, and the free version of ChatGPT was slightly better than the paid version.

Fig. 1 The performance of several large language models (LLMs) in answering different questions. All responses were generated and recorded on February 19, 2023. Three experienced urologists worked together to complete the ratings. A Accuracy of responses, rated on a 3-point scale: 1 for correct, 2 for a mixture of correct and incorrect/outdated information, and 3 for completely incorrect; from left to right, performance on all questions, on basic questions, and on difficult questions. B Comprehensiveness of correctly answered responses, rated on a 5-point Likert scale, with 1 representing "very comprehensive" and 5 representing "very inadequate". C Readability of answers, rated on a 5-point Likert scale, with 1 representing "very easy to understand" and 5 representing "very difficult to understand". D Stability of responses, judged on whether the model's accuracy was consistent across different responses to the same question. Except for NeevaAI and Perplexity, the models generated a different response each time, so we generated three responses to each question in these models to examine their stability.

Evaluations of comprehensiveness show that LLMs perform well in answering most questions (Fig. 1B). For example, they can effectively highlight the significance of different PSA levels, remind patients that PSA is not the final diagnostic test, and suggest further examination. They can also compare treatment options in detail, outlining the pros and cons, and provide helpful references for patients to make informed decisions. In addition, it is commendable that most responses point out the need for patients to consult their doctors for more advice. The readability of responses from most LLMs, except NeevaAI, was satisfactory (Fig. 1C). We believe that patients can understand the information conveyed in the LLMs' responses in most cases. All LLMs provided humanistic care when discussing expected lifespan, informing patients about the relatively long survival time of PCa, which eased anxiety. However, they did not exhibit humanistic care when answering other inquiries. The LLMs' responses were generally stable, but inconsistent outcomes were detected in some instances (Fig. 1D).

We then analyzed the reasons for the poor performance of LLMs in some responses. The most common issue was a mixture of outdated or incorrect information in the answers, including claims that open surgery is a more common choice than robot-assisted surgery for radical prostatectomy [5], and inaccurate responses regarding the approved indications when comparing apalutamide and enzalutamide. Inadequate comprehensiveness was mainly due to a lack of specific details or the omission of key points. For instance, Perplexity missed screening as an important measure for preventing PCa. Regarding the frequency of PSA testing, some answers recommended only a case-by-case approach, without specifying testing frequency for different age groups. LLMs sometimes misunderstood background information and provided inaccurate answers, such as mechanically suggesting that "PSA testing is not the final diagnostic test for PCa" when monitoring PSA after prostatectomy is clearly not for the purpose of diagnosing PCa. It must also be noted that some AI models based on search engines, such as NeevaAI, tend to simply reproduce the content of the literature without summarizing or explaining it, leading to poor readability. While we anticipated that the internet-connected LLMs would surpass ChatGPT, they failed to do so. This suggests that model training may be more important than a real-time internet connection.
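For readers who want to see how such ratings translate into the figures reported above, the short Python sketch below shows one way the 3-point accuracy scores and the stability check described in the Fig. 1 caption could be tabulated. This is not the authors' code: the model names, example scores, and the exact definitions used here (accuracy as the share of responses rated 1, stability as identical ratings across the three repeated generations) are illustrative assumptions based only on the description in the letter.

    # Illustrative sketch only: tabulating 3-point accuracy ratings
    # (1 = correct, 2 = mixed correct/incorrect or outdated, 3 = incorrect)
    # and a simple consistency check across repeated generations.
    from statistics import mean

    # Hypothetical example data: ratings[model][question_no] is the list of
    # scores given to repeated responses (three generations for models with
    # non-deterministic output, one for NeevaAI and Perplexity).
    ratings = {
        "ChatGPT (Free)": {1: [1, 1, 1], 18: [2, 1, 1]},
        "NeevaAI": {1: [1], 18: [3]},
    }

    def accuracy(per_question):
        """Share of individual responses rated 1 (fully correct)."""
        scores = [s for runs in per_question.values() for s in runs]
        return sum(s == 1 for s in scores) / len(scores)

    def stability(per_question):
        """Share of questions whose repeated responses all got the same rating."""
        repeated = [runs for runs in per_question.values() if len(runs) > 1]
        if not repeated:
            return 1.0  # single-response models are trivially consistent here
        return mean(len(set(runs)) == 1 for runs in repeated)

    for model, per_question in ratings.items():
        print(f"{model}: accuracy={accuracy(per_question):.0%}, "
              f"stability={stability(per_question):.0%}")

The same structure extends naturally to the comprehensiveness and readability Likert scores by swapping in the corresponding rating dictionaries.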
Although not yet perfect, LLMs can provide correct answers to the basic questions that PCa patients are concerned about and can analyze specific situations to a certain extent. LLMs have the potential to be applied in patient education and consultation, providing patient-friendly information that helps patients understand their medical conditions and treatment options and enables shared decision-making. More importantly, LLMs can help democratize medical knowledge by providing timely access to accurate medical information regardless of geographic or socioeconomic status. This is especially important for underserved populations in medical deserts and for those facing longer waiting times for medical care during pandemics such as COVID-19. We believe that LLMs have unlimited potential given the rapid development of AI. However, current LLMs are not yet capable of completely replacing doctors: they may contain errors or omit key points in their responses, still have significant shortcomings in analyzing questions in specific contexts, and cannot ask patients additional questions to gather more information. Moreover, they still cannot comfort patients the way humans can.


Most cited references (5)


          Epidemiology of Prostate Cancer

Prostate cancer is the second most frequent cancer diagnosis made in men and the fifth leading cause of death worldwide. Prostate cancer may be asymptomatic at the early stage and often has an indolent course that may require only active surveillance. Based on GLOBOCAN 2018 estimates, 1,276,106 new cases of prostate cancer were reported worldwide in 2018, with higher prevalence in the developed countries. Differences in the incidence rates worldwide reflect differences in the use of diagnostic testing. Prostate cancer incidence and mortality rates are strongly related to age, with the highest incidence seen in elderly men (> 65 years of age). African-American men have the highest incidence rates and a more aggressive type of prostate cancer compared to White men. There is no evidence yet on how to prevent prostate cancer; however, it is possible to lower the risk by limiting high-fat foods, increasing the intake of vegetables and fruits, and performing more exercise. Screening is highly recommended at age 45 for men with familial history and African-American men. Up-to-date statistics on prostate cancer occurrence and outcomes, along with a better understanding of the etiology and causative risk factors, are essential for the primary prevention of this disease.

            Appropriateness of Cardiovascular Disease Prevention Recommendations Obtained From a Popular Online Chat-Based Artificial Intelligence Model

            This study examines the appropriateness of artificial intelligence model responses to fundamental cardiovascular disease prevention questions.

              Using ChatGPT to evaluate cancer myths and misconceptions: artificial intelligence and cancer information

Data about the quality of cancer information that chatbots and other artificial intelligence systems provide are limited. Here, we evaluate the accuracy of cancer information on ChatGPT compared with the National Cancer Institute's (NCI's) answers by using the questions on the "Common Cancer Myths and Misconceptions" web page. The NCI's answers and ChatGPT answers to each question were blinded, and then evaluated for accuracy (accurate: yes vs no). Ratings were evaluated independently for each question, and then compared between the blinded NCI and ChatGPT answers. Additionally, word count and Flesch-Kincaid readability grade level for each individual response were evaluated. Following expert review, the percentage of overall agreement for accuracy was 100% for NCI answers and 96.9% for ChatGPT outputs for questions 1 through 13 (κ = −0.03, standard error = 0.08). There were few noticeable differences in the number of words or the readability of the answers from NCI or ChatGPT. Overall, the results suggest that ChatGPT provides accurate information about common cancer myths and misconceptions.

                Author and article information

                Contributors
                drchenrui@foxmail.com
                Journal
J Transl Med
                Journal of Translational Medicine
BioMed Central (London)
                1479-5876
19 April 2023
2023
Volume: 21
Article: 269
                Affiliations
[1] Department of Urology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200127, China (GRID grid.16821.3c, ISNI 0000 0004 0368 8293)
[2] The First Clinical Medical School, Southern Medical University, 1023 Shatai South Road, Guangzhou 510515, Guangdong, China (GRID grid.284723.8, ISNI 0000 0000 8877 7471)
                Author information
                http://orcid.org/0000-0003-3728-2577
                Article
4123
DOI: 10.1186/s12967-023-04123-5
PMCID: 10115367
PMID: 37076876
                © The Author(s) 2023

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

History
Received: 24 March 2023
Accepted: 9 April 2023
                Funding
Funded by: Rising-Star Program of Science and Technology Commission of Shanghai Municipality
Award ID: 21QA1411500
Award Recipient:
Funded by: Natural Science Foundation of Science and Technology Commission of Shanghai
Award ID: 22ZR1478000
Award Recipient:
Funded by: the National Natural Science Foundation of China
Award ID: 82272905
Award Recipient:
                Categories
                Letter to the Editor

Medicine
