BeHonest: Benchmarking Honesty of Large Language Models

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

Previous works on Large Language Models (LLMs) have mainly focused on evaluating their helpfulness or harmlessness. However, honesty, another crucial alignment criterion, has received relatively less attention. Dishonest behaviors in LLMs, such as spreading misinformation and defrauding users, eroding user trust, and causing real-world harm, present severe risks that intensify as these models approach superintelligence levels. Enhancing honesty in LLMs addresses critical deficiencies and helps uncover latent capabilities that are not readily expressed. This underscores the urgent need for reliable methods and benchmarks to effectively ensure and evaluate the honesty of LLMs. In this paper, we introduce BeHonest, a pioneering benchmark specifically designed to assess honesty in LLMs comprehensively. BeHonest evaluates three essential aspects of honesty: awareness of knowledge boundaries, avoidance of deceit, and consistency in responses. Building on this foundation, we designed 10 scenarios to evaluate and analyze 9 popular LLMs on the market, including both closed-source and open-source models from different model families with varied model sizes. Our findings indicate that there is still significant room for improvement in the honesty of LLMs. We also encourage the AI community to prioritize honesty alignment in LLMs. Our benchmark and code can be found at: \url{https://github.com/GAIR-NLP/BeHonest}.

Related collections

Author and article information

Journal

Publication date Created: 19 June 2024

Article

ArXiV ID: 2406.13261

SO-VID: 8db2333c-ea2c-4b8d-8b5d-9518f1c3b683

License:

http://creativecommons.org/licenses/by-sa/4.0/

History

Custom metadata

Categories cs.CL cs.AI

ScienceOpen disciplines: Theoretical computer science,Artificial intelligence

Data availability:

ScienceOpen disciplines: Theoretical computer science, Artificial intelligence

BeHonest: Benchmarking Honesty of Large Language Models

Read this article at

Abstract

Related collections

Radiology and Natural Language Processing

Author and article information

Journal

Article

History

Custom metadata

Comments

Comment on this article

Similar content 562