AAAI Conference 2026 Short Paper
HARK: Hierarchical Agentic Retrieval with Keyframing for Video Understanding (Student Abstract)
- Jingcheng Li
- Ye Qiao
- Sitao Huang
Current video understanding models struggle with temporal reasoning and with preserving fine detail at a manageable computational cost. We propose HARK, a hierarchical memory system that segments videos into action and scene units and combines them with question-aware agentic keyframe selection. Our method achieves 70.3% overall accuracy on the VideoMME short-video benchmark.
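
The abstract does not give implementation details, so the following is only a minimal sketch of how a two-level memory (scene units containing action units) could drive question-aware keyframe selection. Everything here is an assumption made for illustration: the names `ActionUnit`, `SceneUnit`, `overlap`, and `select_keyframes`, the use of text captions as memory entries, the word-overlap relevance score, and the choice of each action's midpoint as its keyframe. A real system would presumably rely on learned segmentation and a model-driven agent rather than these stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class ActionUnit:
    """Hypothetical fine-grained memory entry: one action inside a scene."""
    start: float  # seconds
    end: float    # seconds
    caption: str

    def keyframe_time(self) -> float:
        # Assumption: the temporal midpoint stands in for the unit's keyframe.
        return (self.start + self.end) / 2.0


@dataclass
class SceneUnit:
    """Hypothetical coarse memory entry: a scene holding its action units."""
    caption: str
    actions: list[ActionUnit] = field(default_factory=list)


def overlap(question: str, caption: str) -> int:
    """Toy relevance score: number of distinct words shared with the question."""
    return len(set(question.lower().split()) & set(caption.lower().split()))


def select_keyframes(scenes, question, top_scenes=1, top_actions=2):
    """Coarse-to-fine retrieval: rank scenes first, then actions within them."""
    ranked_scenes = sorted(
        scenes, key=lambda s: overlap(question, s.caption), reverse=True
    )[:top_scenes]
    times = []
    for scene in ranked_scenes:
        ranked_actions = sorted(
            scene.actions, key=lambda a: overlap(question, a.caption), reverse=True
        )[:top_actions]
        times.extend(a.keyframe_time() for a in ranked_actions)
    # Return timestamps in temporal order so downstream reasoning sees them in sequence.
    return sorted(times)


scenes = [
    SceneUnit("kitchen cooking", [
        ActionUnit(0.0, 4.0, "person chops onion"),
        ActionUnit(4.0, 10.0, "person stirs pot"),
    ]),
    SceneUnit("garden outside", [
        ActionUnit(10.0, 16.0, "dog runs on grass"),
        ActionUnit(16.0, 20.0, "person waters plants"),
    ]),
]

keyframes = select_keyframes(scenes, "who stirs the pot in the kitchen", top_actions=1)
```

In this toy run the question first narrows the search to the kitchen scene and then to the stirring action, so only one timestamp is kept instead of sampling frames uniformly across the whole video. That coarse-to-fine narrowing is the property the sketch is meant to illustrate.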