Evaluation of the Performance of Generative AI Large Language Models ChatGPT, Google Bard, and Microsoft Bing Chat in Supporting Evidence-Based Dentistry: Comparative Mixed Methods Study

Abstract

Background

The increasing application of generative artificial intelligence large language models (LLMs) in various fields, including dentistry, raises questions about their accuracy.

Objective

This study aims to comparatively evaluate the answers provided by 4 LLMs, namely Bard (Google LLC), ChatGPT-3.5 and ChatGPT-4 (OpenAI), and Bing Chat (Microsoft Corp), to clinically relevant questions from the field of dentistry.

Methods

The LLMs were queried with 20 open-type, clinical dentistry–related questions from different disciplines, developed by the respective faculty of the School of Dentistry, European University Cyprus. Two experienced faculty members graded the LLMs’ answers from 0 (minimum) to 10 (maximum) points using a rubric, as if they were answers to examination questions posed to students, judging them against strong, traditionally collected scientific evidence such as guidelines and consensus statements. The scores were compared statistically using the Friedman and Wilcoxon tests to identify the best-performing model. Moreover, the evaluators were asked to provide a qualitative evaluation of the comprehensiveness, scientific accuracy, clarity, and relevance of the LLMs’ answers.
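
The omnibus comparison described above can be illustrated with a minimal sketch. It assumes scores are arranged as one row per question and one column per model, averages the 2 evaluators' grades, and computes the Friedman statistic without a tie correction. All names (`mean_scores`, `average_ranks`, `friedman_statistic`, `chi2_sf_df3`) and all scores are hypothetical placeholders, not the study's code or data, and the pairwise Wilcoxon follow-up is not shown.

```python
from math import erfc, exp, pi, sqrt


def mean_scores(evaluator_1, evaluator_2):
    """Average two evaluators' grades cell by cell (rows = questions, columns = models)."""
    return [[(a + b) / 2 for a, b in zip(row_1, row_2)]
            for row_1, row_2 in zip(evaluator_1, evaluator_2)]


def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(rows):
    """Friedman chi-square for n questions x k models (simplification: no tie correction)."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom (k = 4 models)."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


# Made-up grades for 5 questions and 4 unnamed models (placeholders only).
evaluator_1 = [[6, 5, 9, 7], [7, 6, 8, 5], [5, 4, 9, 6], [8, 6, 9, 7], [6, 7, 10, 8]]
evaluator_2 = [[7, 5, 8, 6], [6, 5, 9, 6], [6, 5, 9, 7], [7, 6, 10, 8], [5, 6, 9, 7]]

averaged = mean_scores(evaluator_1, evaluator_2)
statistic = friedman_statistic(averaged)
print("Friedman chi-square:", round(statistic, 3))
print("Approximate P value (3 df):", round(chi2_sf_df3(statistic), 4))
```

In practice a statistics package would normally be used instead; for example, SciPy provides `scipy.stats.friedmanchisquare` and `scipy.stats.wilcoxon`, which also handle ties. The abstract does not state which software the authors used.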

Results

Overall, no statistically significant difference was detected between the scores given by the 2 evaluators; therefore, an average score was computed for every LLM. Although ChatGPT-4 statistically outperformed ChatGPT-3.5 (P=.008), Bing Chat (P=.049), and Bard (P=.045), all models occasionally exhibited inaccuracies, generality, outdated content, and a lack of source references. The evaluators noted instances where the LLMs delivered irrelevant information, vague answers, or information that was not fully accurate.

Conclusions

This study demonstrates that although LLMs hold promising potential as an aid in the implementation of evidence-based dentistry, their current limitations can lead to potentially harmful health care decisions if not used judiciously. Therefore, these tools should not replace the dentist’s critical thinking and in-depth understanding of the subject matter. Further research, clinical validation, and model improvements are necessary for these tools to be fully integrated into dental practice. Dental practitioners must be aware of the limitations of LLMs, as their imprudent use could potentially impact patient care. Regulatory measures should be established to oversee the use of these evolving technologies.

Author and article information

Contributors

Kostis Giannakopoulos:

ORCID: https://orcid.org/0000-0001-7008-7306

School of Dentistry, European University Cyprus, 6 Diogenis St, Engomi, Nicosia, 2404, Cyprus, 357 22559622, 357 22559622, k.giannakopoulos@euc.ac.cy

Journal

Journal ID (nlm-ta): J Med Internet Res

Journal ID (iso-abbrev): J Med Internet Res

Journal ID (publisher-id): JMIR

Title: Journal of Medical Internet Research

Publisher: JMIR Publications (Toronto, Canada)

ISSN (Print): 1439-4456

ISSN (Electronic): 1438-8871

Publication date (Collection): 2023

Publication date (Electronic): 28 December 2023

Volume: 25

Electronic Location Identifier: e51580

Affiliations

[1] School of Dentistry, European University Cyprus, Nicosia, Cyprus

[2] Information Management Systems Institute, ATHENA Research and Innovation Center, Athens, Greece

[3] School of Dentistry, Aristotle University of Thessaloniki, Thessaloniki, Greece

[4] Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, United Arab Emirates

Author notes

Corresponding Author: Kostis Giannakopoulos, k.giannakopoulos@euc.ac.cy

Author information

Kostis Giannakopoulos https://orcid.org/0000-0001-7008-7306

Argyro Kavadella https://orcid.org/0009-0003-0560-8373

Anas Aaqel Salim https://orcid.org/0000-0002-7731-6666

Vassilis Stamatopoulos https://orcid.org/0000-0002-9044-796X

Eleftherios G Kaklamanos https://orcid.org/0000-0002-0513-5110

Article

Publisher ID: v25i1e51580

DOI: 10.2196/51580

PMC ID: 10784979

PubMed ID: 38009003

SO-VID: 92277ed3-bf96-43e0-ab0b-b097fbeb8471

Copyright © Kostis Giannakopoulos, Argyro Kavadella, Anas Aaqel Salim, Vassilis Stamatopoulos, Eleftherios G Kaklamanos. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 28.12.2023.

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.

History

Date received: 4 August 2023

Date revision requested: 14 September 2023

Date revision received: 15 October 2023

Date accepted: 20 November 2023
