HandVQA: Diagnosing Fine-Grained Spatial Reasoning Failures in Vision-Language Models via Hand Pose Question Answering
Published in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
Authors. MD K. C. Sayem*, M. T. Chowdhury*, Y. Y. Tiruneh, M. A. Khan, M. S. Ali, B. Bhattarai, S. Baek (* equal contribution).
Project page. kcsayem.github.io/handvqa
Summary.
- Introduces HandVQA, a 1.6M-scale benchmark derived deterministically from 3D hand pose annotations across FreiHAND, InterHand2.6M, and FPHA.
- Provides a fine-grained diagnostic for spatial reasoning failures in modern vision–language models on hand pose question answering.
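To make "derived deterministically from 3D hand pose annotations" concrete, here is a minimal hypothetical sketch of the idea: given 3D hand keypoints, a spatial question and its ground-truth answer can be computed by a fixed rule with no human labeling. The joint layout, fingertip indices, and question template below are illustrative assumptions, not the benchmark's actual templates.

```python
import numpy as np

# Assumed fingertip indices in a common 21-joint hand layout
# (an assumption for illustration; HandVQA's indexing is not specified here).
TIPS = {"thumb": 4, "index": 8, "middle": 12, "ring": 16, "pinky": 20}

def qa_leftmost_fingertip(joints):
    """Deterministically derive a QA pair from 3D keypoints.

    joints: (21, 3) array of (x, y, z) coordinates.
    Returns a question string and the fingertip with the smallest x.
    """
    answer = min(TIPS, key=lambda name: joints[TIPS[name], 0])
    question = "Which fingertip is leftmost in the image?"
    return question, answer

# Toy pose with made-up coordinates, standing in for a dataset annotation.
rng = np.random.default_rng(0)
joints = rng.normal(size=(21, 3))
q, a = qa_leftmost_fingertip(joints)
print(q, "->", a)
```

Because the answer is a pure function of the annotation, the same rule applied across FreiHAND, InterHand2.6M, and FPHA yields consistent, label-free supervision at scale.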
