
AAAI 2025

Attributive Reasoning for Hallucination Diagnosis of Large Language Models

Conference Paper | AAAI Technical Track on Natural Language Processing I | Artificial Intelligence

Abstract

In recent years, large language models (LLMs) have demonstrated outstanding capabilities across a wide range of tasks. However, LLMs also have notable drawbacks, especially hallucination: the generation of content that does not align with the user input, or that contradicts previously generated content or world knowledge. Current research on hallucination mainly includes knowledge retrieval, prompt engineering, training data improvement, and reinforcement learning. However, these methods neither distinguish among different categories of hallucination, which is important for hallucination analysis, nor investigate the internal states of LLMs in detail, which indicate where hallucinations arise. Therefore, in our research, we introduce an attribution framework to trace the origins of hallucinations based on the internal signals of LLMs. To support this framework, we develop a new benchmark named RelQA-Cate, which covers eight categories of hallucination in LLM-generated answers. We then present a novel Differential Penalty Decoding (DPD) strategy that reduces hallucinations by adjusting the post-probability of each candidate answer. We conduct a series of experiments and observe significant improvements in answer reliability, up to 28.25%, which demonstrates the effectiveness of our proposed DPD and its generalization in mitigating hallucination in LLMs.
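The abstract describes DPD only at a high level: each candidate answer's post-probability is adjusted by a penalty before the final answer is selected. The sketch below illustrates that general idea, not the paper's actual algorithm; the function name, the `alpha` weight, and the assumption that penalties are scalar scores derived from internal model signals are all illustrative assumptions.

```python
import math

def penalty_adjusted_probs(logits, penalties, alpha=1.0):
    """Sketch of penalty-based answer re-ranking (NOT the paper's exact DPD).

    logits: per-candidate log-scores from the model.
    penalties: per-candidate penalty scores (assumed here to come from
               some internal hallucination signal; higher = more suspect).
    alpha: assumed penalty weight.

    Returns renormalized probabilities after subtracting the weighted
    penalty from each candidate's logit (numerically stable softmax).
    """
    adjusted = [l - alpha * p for l, p in zip(logits, penalties)]
    m = max(adjusted)                       # subtract max for stability
    exps = [math.exp(a - m) for a in adjusted]
    z = sum(exps)
    return [e / z for e in exps]

# Two equally scored candidates; the second carries a penalty,
# so its probability drops after adjustment.
probs = penalty_adjusted_probs([2.0, 2.0], [0.0, 1.0])
```

Under this sketch, decoding would pick the candidate with the highest adjusted probability, so a penalized answer is selected only if its original score outweighs its penalty.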

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
AAAI Conference on Artificial Intelligence
Archive span
1980-2026
Indexed papers
28718
Paper id
477603015601442631