S²Flow: Towards Fast and Authentic Training-Free High-Resolution Video Generation

Chaoqun Wang; Shaobo Min; Xu Yang

doi:10.1609/aaai.v40i12.37932

AAAI 2026

Conference Paper; AAAI Technical Track on Computer Vision IX; Artificial Intelligence

Abstract

Rectified flow models have shown strong potential in high-fidelity video generation, yet extending them to high resolution remains challenging due to the high cost of full attention and error accumulation in the ODE-solving process. In this paper, we propose S²Flow, a training-free framework that enables efficient and authentic high-resolution video generation by jointly exploring Flow-guided Sparse attention and Second-order ODE solution. Specifically, S²Flow exploits and transfers the semantic and structural information from the low-resolution flow trajectory to guide the high-resolution flow in two aspects. First, S²Flow dynamically captures the sparse patterns of the spatio-temporal attention maps from low-resolution videos to construct localized 3D windows, enabling efficient window attention in high-resolution inference. This significantly reduces redundant computation while preserving contextual dependencies. Second, S²Flow adopts a second-order ODE solver based on Taylor expansion, where the high-order derivative is approximated via central difference from the low-resolution flow, facilitating accurate high-resolution denoising. Extensive experiments on the VBench dataset demonstrate that S²Flow outperforms prior methods in both visual quality and inference speed, enabling 4× acceleration on 2560×1536 video generation.
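To make the second-order idea concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a scalar velocity field dx/dt = -x as a stand-in for a flow model, and it uses a coarse Euler trajectory in the same space as a proxy for the low-resolution flow. All function names (`velocity`, `euler_trajectory`, `second_order_trajectory`) are hypothetical. The sketch only illustrates the generic step x(t+h) ≈ x + h·v + (h²/2)·dv/dt, with dv/dt estimated by central differences over velocities recorded along a reference trajectory.

```python
import numpy as np

def velocity(x, t):
    # Toy velocity field (assumption): dx/dt = -x, exact solution x0 * exp(-t).
    return -x

def euler_trajectory(x0, h, n_steps):
    # First-order reference trajectory; records the velocity at every step.
    # In this sketch it plays the role of the cheap "low-resolution" flow.
    xs = [np.asarray(x0, dtype=float)]
    vs = []
    for k in range(n_steps):
        v = velocity(xs[-1], k * h)
        vs.append(v)
        xs.append(xs[-1] + h * v)
    vs.append(velocity(xs[-1], n_steps * h))
    return np.stack(xs), np.stack(vs)

def second_order_trajectory(x0, h, n_steps, ref_velocities):
    # np.gradient uses central differences at interior points and
    # one-sided differences at the two ends of the time axis.
    dv_dt = np.gradient(ref_velocities, h, axis=0)
    x = np.asarray(x0, dtype=float)
    for k in range(n_steps):
        v = velocity(x, k * h)
        # Second-order Taylor step: x + h*v + (h^2 / 2) * dv/dt.
        x = x + h * v + 0.5 * h**2 * dv_dt[k]
    return x

x0 = np.array([1.0])
h, n_steps = 0.1, 10
ref_xs, ref_vs = euler_trajectory(x0, h, n_steps)
x_euler = ref_xs[-1]
x_second = second_order_trajectory(x0, h, n_steps, ref_vs)
exact = x0 * np.exp(-h * n_steps)
print("Euler error:       ", abs(x_euler - exact)[0])
print("Second-order error:", abs(x_second - exact)[0])
```

In this toy setting the second-order update lands closer to the exact solution than plain Euler with the same step count, which is the kind of error reduction the abstract attributes to the higher-order solver. How the low-resolution derivative is transferred to the high-resolution latent is not specified in the abstract and is not modeled here.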
