AAAI 2026
HARK: Hierarchical Agentic Retrieval with Keyframing for Video Understanding (Student Abstract)
Abstract
Current video understanding models struggle with temporal reasoning and with balancing detail preservation against computational efficiency. We propose a hierarchical memory system that segments videos into action and scene units, combined with question-aware agentic keyframe selection. Our method achieves 70.3% overall accuracy on the short-video split of the VideoMME benchmark.
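The abstract does not say how action and scene units are detected or how keyframes are chosen. The following is a minimal sketch under assumed choices, not the authors' implementation. It assumes precomputed per-frame embeddings and a question embedding that share one vector space. It uses cosine-similarity thresholds with hypothetical values for action and scene boundaries. In place of the agentic selection step it uses a fixed similarity ranking. All function names (`split_on_change`, `build_hierarchy`, `select_keyframes`) and thresholds are hypothetical.

```python
import numpy as np


def _unit(x):
    # L2-normalise along the last axis so dot products become cosine similarities
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def split_on_change(feats, threshold):
    """Split a sequence of feature vectors wherever the cosine similarity
    between neighbours drops below `threshold`; returns lists of indices."""
    feats = _unit(feats)
    sims = np.sum(feats[1:] * feats[:-1], axis=1)  # sims[i-1] compares item i with item i-1
    segments, current = [], [0]
    for i, s in enumerate(sims, start=1):
        if s < threshold:  # large change: close the current segment
            segments.append(current)
            current = []
        current.append(i)
    segments.append(current)
    return segments


def build_hierarchy(frame_feats, action_thr=0.9, scene_thr=0.7):
    """Hypothetical two-level memory: frames -> action units -> scene units."""
    actions = split_on_change(frame_feats, action_thr)
    # Represent each action unit by the mean of its frame features
    action_means = np.stack([frame_feats[a].mean(axis=0) for a in actions])
    # A coarser threshold groups consecutive, similar action units into scenes
    scene_groups = split_on_change(action_means, scene_thr)
    scenes = [[actions[j] for j in g] for g in scene_groups]
    return actions, scenes


def select_keyframes(frame_feats, question_feat, actions, k=2):
    """Static stand-in for question-aware selection: rank action units by mean
    frame-question similarity and keep the best-matching frame of the top k."""
    frame_scores = _unit(frame_feats) @ _unit(question_feat)
    unit_scores = [frame_scores[a].mean() for a in actions]
    top_units = np.argsort(unit_scores)[::-1][:k]
    keyframes = [max(actions[u], key=lambda i: frame_scores[i]) for u in top_units]
    return sorted(keyframes)


# Toy example: three blocks of identical frames; blocks A and B are similar
# enough to share a scene but distinct enough to form separate action units.
A, B, C = [1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0]
feats = np.array([A] * 3 + [B] * 3 + [C] * 3)
actions, scenes = build_hierarchy(feats)
print(actions)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(scenes)   # [[[0, 1, 2], [3, 4, 5]], [[6, 7, 8]]]
print(select_keyframes(feats, np.array(C), actions, k=1))  # [6]
```

The static top-k ranking only illustrates question-conditioned scoring over a hierarchical memory. An agentic selector would instead decide iteratively which units to inspect, which this sketch does not attempt.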
Context
- Venue: AAAI Conference on Artificial Intelligence
- Paper id: 490251843341450957