DetailVerifyBench: A Benchmark for Dense Hallucination Localization in Long Image Captions

Xinran Wang, Yuxuan Zhang, Xiao Zhang, Haolong Yan, Muxi Diao, Songyu Xu, Zhonghao Yan, Hongbing Li, Kongming Liang*, Zhanyu Ma
Beijing University of Posts and Telecommunications, Beijing, China
Equal contribution * Corresponding author

Abstract

Accurately detecting and localizing hallucinations is critical to ensuring that image captions are accurate. In the era of Multimodal Large Language Models (MLLMs), captions have evolved from brief sentences into comprehensive narratives that often run to hundreds of words. This shift makes the task substantially harder: models must now pinpoint specific erroneous spans or words within long contexts, rather than merely flagging response-level inconsistencies. However, existing benchmarks lack the fine-grained annotation and domain diversity required to evaluate this capability. To bridge this gap, we introduce DetailVerifyBench, a rigorous benchmark comprising 1,000 high-quality images across five distinct domains. With an average caption length of over 200 words and dense, token-level annotations of multiple hallucination types, it stands as the most challenging benchmark to date for precise hallucination localization in long image captions.

1,000 Images
5 Domains
200+ Avg. Words per Caption
10 Hallucination Types

Benchmark Overview

Domain   Source                       #Images   Avg. Caption Length (words)   #Hallucination Words   Hallucination Rate
GUI      Screenspot Pro               200       196                           425                    68%
Nature   DOCCI                        200       148                           173                    26%
Chart    Echarts Examples             200       197                           322                    41%
Movie    CineTechBench + ShotBench    200       214                           1,094                  88%
Poster   IMDB + Movie Poster 100k     200       257                           1,235                  90%
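
The headline figures (1,000 images, an average caption length above 200 words) can be cross-checked against the per-domain rows above. The short sketch below does that arithmetic. The variable names are illustrative, and weighting each domain by its image count assumes one caption per image, which the page does not state explicitly.

```python
# Per-domain rows copied from the overview table:
# (domain, number of images, average caption length in words, hallucinated words)
ROWS = [
    ("GUI", 200, 196, 425),
    ("Nature", 200, 148, 173),
    ("Chart", 200, 197, 322),
    ("Movie", 200, 214, 1094),
    ("Poster", 200, 257, 1235),
]

total_images = sum(n for _, n, _, _ in ROWS)

# Weight each domain's average by its image count
# (assumption: one caption per image).
mean_caption_len = sum(n * avg for _, n, avg, _ in ROWS) / total_images

total_hallucinated_words = sum(h for _, _, _, h in ROWS)

print(total_images)              # 1000
print(mean_caption_len)          # 202.4
print(total_hallucinated_words)  # 3249
```

Because every domain contributes the same number of images, the weighted mean equals the plain mean of the five per-domain averages.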

Benchmark Construction Pipeline

[Figure: benchmark construction pipeline, step 1]

Leaderboard

Domain scores show token-level F1 per domain.
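
As a rough illustration of a token-level F1 score, the sketch below treats each caption as a sequence of binary token labels (1 = hallucinated, 0 = faithful) and pools counts over all captions in a domain. The tokenization, the pooling strategy, the handling of captions with no hallucinated tokens, and the function names `prf_counts`, `f1_from_counts`, and `per_domain_f1` are all assumptions made for this example. The benchmark's official scoring script may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def prf_counts(gold: Sequence[int], pred: Sequence[int]) -> Tuple[int, int, int]:
    """Return (tp, fp, fn) for the hallucinated class (label 1)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must label the same tokens")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    return tp, fp, fn


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*tp / (2*tp + fp + fn)."""
    if tp + fp + fn == 0:
        # Assumed convention: nothing to find and nothing predicted is a perfect score.
        return 1.0
    return 2 * tp / (2 * tp + fp + fn)


def per_domain_f1(
    records: Iterable[Tuple[str, Sequence[int], Sequence[int]]]
) -> Dict[str, float]:
    """Pool token counts over all captions of a domain, then compute F1 (assumed pooling)."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0])
    for domain, gold, pred in records:
        for i, count in enumerate(prf_counts(gold, pred)):
            totals[domain][i] += count
    return {domain: f1_from_counts(*c) for domain, c in totals.items()}


# Toy labels, not benchmark data.
records = [
    ("GUI", [0, 1, 1, 0], [0, 1, 0, 1]),
    ("GUI", [1, 0], [1, 0]),
    ("Chart", [0, 0], [0, 0]),
]
scores = per_domain_f1(records)
```

Pooling counts before computing F1 gives longer captions more weight than averaging per-caption scores would. Either choice is defensible, so comparisons across systems are only meaningful under the same convention.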

Citation

@misc{detailverifybench,
  title={DetailVerifyBench: A Benchmark for Dense Hallucination Localization in Long Image Captions},
  author={Xinran Wang and Yuxuan Zhang and Xiao Zhang and Haolong Yan and Muxi Diao and
          Songyu Xu and Zhonghao Yan and Hongbing Li and Kongming Liang and Zhanyu Ma},
  year={2025},
  url={https://github.com/zyx-hhnkh/DetailVerifyBench}
}