Zero-to-Hero: Empowering Video Appearance Transfer with Zero-Shot Initialization and Holistic Restoration

Tongtong Su; Chengyu Wang; Haipeng Liao; Jun Huang; Dongming Lu

doi:10.1609/aaai.v40i11.37872

AAAI 2026

Conference Paper | AAAI Technical Track on Computer Vision VIII | Artificial Intelligence

Abstract

Appearance editing according to user needs is a pivotal task in video editing. Existing text-guided methods often leave user intentions ambiguous and restrict fine-grained control over specific aspects of objects. To overcome these limitations, this paper introduces Zero-to-Hero, a reference-based video editing approach that disentangles the editing process into two distinct problems: first editing an anchor frame to satisfy user requirements, which yields a reference image, and then consistently propagating its appearance across the other frames of the video. To achieve accurate appearance propagation, in the first stage of Zero-to-Hero (the Zero-Stage), we leverage correspondences within the original frames to guide the attention mechanism, which is more robust than previously proposed optical flow or temporal modules in memory-friendly video generative models, especially for objects exhibiting large motions. This offers a solid zero-shot initialization that ensures both accuracy and temporal consistency. However, intervening in the attention mechanism causes compounded image degradation, with unknown blurring and missing colors. Following the Zero-Stage, our Hero-Stage therefore holistically learns a conditional generative model for video restoration. To accurately evaluate appearance consistency, we construct a set of videos with multiple appearances using Blender, enabling fine-grained and deterministic evaluation. Our method outperforms the best-performing baseline with a PSNR improvement of 2.6 dB.
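
To put the reported 2.6 dB gain in context, the sketch below computes PSNR from the standard definition and converts a dB gain into the equivalent reduction in mean squared error. It is an illustration only: the abstract does not state the evaluation protocol (value range, per-frame versus per-video averaging), so the 8-bit peak value and the helper names `mse`, `psnr`, and `mse_ratio_for_gain` are assumptions introduced here, not the authors' code.

```python
import math


def mse(reference, rendered):
    """Mean squared error between two equal-length sequences of pixel values."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - x) ** 2 for r, x in zip(reference, rendered)) / len(reference)


def psnr(reference, rendered, max_value=255.0):
    """PSNR in dB: 10 * log10(max_value^2 / MSE). max_value=255 assumes 8-bit pixels."""
    err = mse(reference, rendered)
    if err == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / err)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)


if __name__ == "__main__":
    # A 2.6 dB PSNR gain corresponds to dividing the MSE by this factor.
    print(round(mse_ratio_for_gain(2.6), 2))
```

Under this definition, a 2.6 dB improvement corresponds to an MSE roughly 1.8 times smaller than the baseline's, assuming both methods are scored against the same ground-truth frames.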
