Background: The emergence of systems based on large language models (LLMs), such as OpenAI’s ChatGPT, has sparked a range of discussions in scholarly circles. Because LLMs generate grammatically correct and mostly relevant (yet sometimes outright wrong, irrelevant, or biased) output in response to prompts, using them for writing tasks, including peer review reports, could improve productivity. Given the significance of peer review in the current scholarly publication landscape, exploring the challenges and opportunities of using LLMs in peer review is urgent. Now that the first scholarly outputs have been generated with LLMs, we anticipate that peer review reports will also be produced with the help of these systems. However, there are currently no guidelines on how these systems should be used in review tasks.

Methods: To investigate the potential impact of LLMs on the peer review process, we used five core themes within discussions about peer review suggested by Tennant and Ross-Hellauer: 1) reviewers’ role, 2) editors’ role, 3) functions and quality of peer reviews, 4) reproducibility, and 5) the social and epistemic functions of peer reviews. We also provide a small-scale exploration of ChatGPT’s performance regarding the identified issues.

Results: LLMs have the potential to substantially alter the roles of both peer reviewers and editors. By supporting both actors in efficiently writing constructive reports or decision letters, LLMs could facilitate higher-quality reviews and address the shortage of reviewers. However, the fundamental opacity of LLMs’ training data, inner workings, data handling, and development processes raises concerns about potential biases, confidentiality, and the reproducibility of review reports. In addition, because editorial work plays a prominent role in defining and shaping epistemic communities, as well as in negotiating normative frameworks within such communities, partly outsourcing this work to LLMs may have unforeseen consequences for social and epistemic relations within academia. Regarding performance, we identified major improvements over a short period and expect LLMs to continue developing.

Conclusions: We believe that LLMs are likely to have a profound impact on academia and scholarly communication. While potentially beneficial to the scholarly communication system, their use entails many uncertainties and is not without risks. In particular, concerns about the amplification of existing biases and inequalities in access to appropriate infrastructure warrant further attention. For the moment, we recommend that reviewers and editors who use LLMs to write scholarly reviews or decision letters disclose their use and accept full responsibility for data security and confidentiality, and for their reports’ accuracy, tone, reasoning, and originality.

Supplementary Information: The online version contains supplementary material available at 10.1186/s41073-023-00133-5.
The increasing volume of research submissions to academic journals poses a significant challenge for traditional peer review. To address this issue, this study explores the potential of ChatGPT, an advanced large language model (LLM) developed by OpenAI, as an artificial intelligence (AI) reviewer for academic journals. By leveraging ChatGPT’s broad knowledge and natural language processing capabilities, we hypothesize that it may be possible to improve the efficiency, consistency, and quality of the peer review process. As a feasibility study, we compared ChatGPT’s critical analysis, acting as an AI reviewer, with human reviews of a single published article: a case report on scurvy. The entire article was given to ChatGPT as input with the prompt “Please perform a review of the following article and give points for revision.” Because the case report had a limited word count, the whole article fit in a single chat message. ChatGPT’s output was then compared with the human reviewers’ comments. Performance, including precision and overall agreement with the human reviews, was assessed subjectively.
ChatGPT’s critical analysis aligned with that of the human reviewers, as evidenced by the inter-rater agreement. Notably, ChatGPT showed a commendable capability to identify methodological flaws, articulate insightful feedback on the theoretical framework, and gauge the article’s overall contribution to its field. While the integration of ChatGPT showed considerable promise, certain challenges and caveats surfaced. Ambiguities may arise with complex research articles, leading to nuanced discrepancies between AI and human reviews. ChatGPT also cannot review figures and images, and lengthy articles must be reviewed in parts because the full text will not fit in a single chat message. The benefits include a reduction in the time journals need to review submissions, as well as an AI assistant offering a perspective on research papers different from that of the human reviewers. In conclusion, this research provides a foundation for incorporating ChatGPT among journal reviewers. The delineated guidelines distill key insights into operationalizing ChatGPT as a reviewer within academic journal workflows, paving the way for a more efficient and insightful review process.
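The workflow described above (prepending a fixed review instruction to the manuscript text and, for articles too long for one chat message, splitting the text into parts that are reviewed separately) can be sketched as follows. This is a minimal illustration rather than the study's actual procedure: the 3,000-word chunk limit and the function names are assumptions, and the prompt text is the one quoted in the abstract. Each resulting prompt would then be submitted to ChatGPT (for example via the OpenAI API) in its own chat turn.

```python
# Sketch of the described review workflow: the review instruction is
# prepended to the manuscript, and overlong manuscripts are split into
# word-bounded chunks that each fit in one chat message.
# NOTE: the 3,000-word limit is an assumed placeholder, not a figure
# reported in the study.

REVIEW_PROMPT = ("Please perform a review of the following article "
                 "and give points for revision.")

def chunk_article(text: str, max_words: int = 3000) -> list[str]:
    """Split an article into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def build_review_prompts(article: str, max_words: int = 3000) -> list[str]:
    """Return one review prompt per chunk.

    A short case report yields a single prompt; longer manuscripts are
    labeled "part i of n" so the model knows it sees a partial text.
    """
    chunks = chunk_article(article, max_words)
    if len(chunks) == 1:
        return [f"{REVIEW_PROMPT}\n\n{chunks[0]}"]
    return [f"{REVIEW_PROMPT} (part {i} of {len(chunks)})\n\n{chunk}"
            for i, chunk in enumerate(chunks, start=1)]

# Each prompt is sent to ChatGPT separately, and the returned comments
# are then compared against the human reviewers' reports.
```

Splitting on words rather than characters keeps sentences mostly intact; a production version would instead count model tokens against the context-window limit.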
Peer review is the established method for evaluating the quality and validity of research manuscripts in scholarly publishing. However, peer review faces challenges as the volume of submitted research has steadily increased in recent years. Time constraints and quality-assurance demands can burden reviewers, potentially discouraging their participation. Artificial intelligence (AI) tools might help relieve these pressures. This study explores the efficiency and effectiveness of the AI chatbot ChatGPT (Generative Pre-trained Transformer) in the peer review process.