
AAAI 2026

MotionFlow: Attention-Driven Motion Transfer in Video Diffusion Models

Conference Paper · AAAI Technical Track on Computer Vision VII · Artificial Intelligence

Abstract

Text-to-video models have demonstrated impressive capabilities in producing diverse video content, yet often lack fine-grained control over motion. We address the problem of motion transfer: given a source video and a target text prompt, generate a new video that preserves the source motion while matching the target semantics and allowing large changes in appearance and scene layout. We introduce MotionFlow, a training-free framework that performs test-time latent optimization guided by attention-derived motion cues. MotionFlow first extracts cross-attention maps from a pre-trained video diffusion model and converts them into spatio-temporal motion masks for the source subject. During generation, it optimizes the target latents so that their evolving attention patterns align with these masks, while the target text controls appearance. This avoids direct attention-map replacement and any model-specific fine-tuning, reducing artifacts and improving flexibility. Qualitative and quantitative experiments, including a user study, show that MotionFlow outperforms existing methods in motion fidelity, temporal consistency, and versatility, even under drastic scene changes.
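The two mechanisms the abstract describes, converting source cross-attention maps into spatio-temporal motion masks and then optimizing the target latents so their attention stays inside those masks, can be sketched concretely. The snippet below is a minimal illustration, not the authors' implementation: the `unet` hook returning attention probabilities, the `(frames, heads, h*w, tokens)` attention layout, and the `guided_step` helper are all assumed for the example.

```python
import torch

def motion_masks_from_attention(cross_attn, subject_token_idx, threshold=0.3):
    """Turn per-frame cross-attention maps for the source subject token
    into binary spatio-temporal motion masks.

    cross_attn: (frames, heads, h*w, tokens) attention probabilities.
    Layout is an assumption for this sketch.
    """
    maps = cross_attn.mean(dim=1)[..., subject_token_idx]   # (frames, h*w)
    maps = maps / (maps.amax(dim=-1, keepdim=True) + 1e-8)  # per-frame normalize
    return (maps > threshold).float()                        # (frames, h*w)

def attention_alignment_loss(target_attn, masks, token_idx):
    """Penalize target-subject attention mass that falls outside the
    source motion masks."""
    maps = target_attn.mean(dim=1)[..., token_idx]
    maps = maps / (maps.sum(dim=-1, keepdim=True) + 1e-8)
    inside = (maps * masks).sum(dim=-1)   # attention mass inside mask, per frame
    return (1.0 - inside).mean()

def guided_step(latents, t, unet, text_emb, masks, token_idx, lr=0.05):
    """One test-time optimization step on the target latents.

    `unet` is a hypothetical callable returning (noise_pred, cross_attn);
    real pipelines would expose attention via processor hooks instead.
    """
    latents = latents.detach().requires_grad_(True)
    noise_pred, attn = unet(latents, t, text_emb)
    loss = attention_alignment_loss(attn, masks, token_idx)
    grad, = torch.autograd.grad(loss, latents)
    latents = (latents - lr * grad).detach()
    # ... then apply the scheduler's usual denoising update with noise_pred.
    return latents
```

Because guidance is applied as a soft loss on the latents rather than by copying attention maps directly, the target prompt remains free to change appearance and layout, which is the flexibility the abstract claims over replacement-based methods.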

Keywords

No keywords are indexed for this paper.

Context

Venue: AAAI Conference on Artificial Intelligence
Archive span: 1980-2026
Indexed papers: 28718
Paper id: 232783464894246052