Artificial Hallucinations in ChatGPT: Implications in Scientific Writing

Abstract

While still in its infancy, ChatGPT (Generative Pretrained Transformer), introduced in November 2022, is bound to hugely impact many industries, including healthcare, medical education, biomedical research, and scientific writing. The implications of ChatGPT, the new chatbot introduced by OpenAI, for academic writing are largely unknown. In response to the Journal of Medical Science (Cureus) Turing Test, a call for case reports written with the assistance of ChatGPT, we present two cases: one of homocystinuria-associated osteoporosis and the other of late-onset Pompe disease (LOPD), a rare metabolic disorder. We asked ChatGPT to write about the pathogenesis of these conditions. We documented the positive, negative, and rather troubling aspects of the newly introduced chatbot’s performance.

Most cited references

ChatGPT listed as author on research papers: many scientists disapprove

Chris Stokel-Walker (2023)

Survey of Hallucination in Natural Language Generation

Ziwei Ji, Nayeon Lee, Rita Frieske … (2022)

Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent NLG, leading to improved development in downstream tasks such as abstractive summarization, dialogue generation and data-to-text generation. However, it is also apparent that deep learning based generation is prone to hallucinate unintended text, which degrades the system performance and fails to meet user expectations in many real-world scenarios. To address this issue, many studies have been presented in measuring and mitigating hallucinated texts, but these have never been reviewed in a comprehensive manner before. In this survey, we thus provide a broad overview of the research progress and challenges in the hallucination problem in NLG. The survey is organized into two parts: (1) a general overview of metrics, mitigation methods, and future directions; and (2) an overview of task-specific research progress on hallucinations in the following downstream tasks, namely abstractive summarization, dialogue generation, generative question answering, data-to-text generation, and machine translation. This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG.

Comparing scientific abstracts generated by ChatGPT to original abstracts using an artificial intelligence output detector, plagiarism detector, and blinded human reviewers

Catherine A. Gao, Frederick M. Howard, Nikolay S. Markov … (2022)

Background: Large language models such as ChatGPT can produce increasingly realistic text, with unknown information on the accuracy and integrity of using these models in scientific writing. Methods: We gathered ten research abstracts from each of five high impact factor medical journals (n=50) and asked ChatGPT to generate research abstracts based on their titles and journals. We evaluated the abstracts using an artificial intelligence (AI) output detector and a plagiarism detector, and had blinded human reviewers try to distinguish whether abstracts were original or generated. Results: All ChatGPT-generated abstracts were written clearly, but only 8% correctly followed the specific journal’s formatting requirements. Most generated abstracts were detected using the AI output detector, with scores (higher meaning more likely to be generated) of median [interquartile range] 99.98% [12.73, 99.98], compared with a very low probability of AI-generated output in the original abstracts of 0.02% [0.02, 0.09]. The AUROC of the AI output detector was 0.94. Generated abstracts scored very high on originality using the plagiarism detector (100% [100, 100] originality). Generated abstracts had a similar patient cohort size as original abstracts, though the exact numbers were fabricated. When given a mixture of original and generated abstracts, blinded human reviewers correctly identified 68% of generated abstracts as being generated by ChatGPT, but incorrectly identified 14% of original abstracts as being generated. Reviewers indicated that it was surprisingly difficult to differentiate between the two, but that the generated abstracts were vaguer and had a formulaic feel to the writing. Conclusion: ChatGPT writes believable scientific abstracts, though with completely generated data. These are original without any plagiarism detected but are often identifiable using an AI output detector and skeptical human reviewers. Abstract evaluation for journals and medical conferences must adapt policy and practice to maintain rigorous scientific standards; we suggest inclusion of AI output detectors in the editorial process and clear disclosure if these technologies are used. The boundaries of ethical and acceptable use of large language models to help scientific writing remain to be determined.

Preprint

Author and article information

Journal

Journal ID (nlm-ta): Cureus

Journal ID (iso-abbrev): Cureus

Journal ID (issn): 2168-8184

Title: Cureus

Publisher: Cureus (Palo Alto, CA)

ISSN (Electronic): 2168-8184

Publication date (Electronic, pub): 19 February 2023

Publication date (Electronic, collection): February 2023

Volume: 15

Issue: 2

Electronic Location Identifier: e35179

Affiliations

[1] Internal Medicine, Kings County Hospital Center, Brooklyn, USA

[2] Internal Medicine, Veterans Affairs Medical Center, Brooklyn, USA

[3] Internal Medicine, State University of New York Downstate Medical Center, Brooklyn, USA

Author notes

Hussam Alkaissi hussam.alkaissi@downstate.edu

Article

DOI: 10.7759/cureus.35179

PMC ID: 9939079

PubMed ID: 36811129

SO-VID: a8e0b958-e9b9-49d2-a07e-78b1803a28b7

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
