
NeurIPS 2025

Diffusion on Demand: Selective Caching and Modulation for Efficient Generation

Conference Paper · Main Conference Track · Artificial Intelligence · Machine Learning

Abstract

Diffusion transformers demonstrate strong potential across generation tasks but suffer from high computational cost. Recently, feature caching methods have been introduced to improve inference efficiency by storing features at certain timesteps and reusing them at subsequent timesteps. However, their effectiveness is limited because they can only choose between reusing cached features and performing full model inference. Motivated by the high cosine similarity between features across consecutive timesteps, we propose a cache-based framework that reuses features and selectively adapts them through linear modulation. In our framework, the selection is performed via a modulation gate, and both the gate and the modulation parameters are learned. Extensive experiments show that our method achieves generation performance similar to the original sampler while requiring significantly less computation. For example, FLOPs and inference latency are reduced by $2.93\times$ and $2.15\times$ for DiT-XL/2 and by $2.83\times$ and $1.50\times$ for PixArt-$\alpha$, respectively. We find that modulation is effective when applied to as little as 2\% of layers, resulting in negligible computation overhead.
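The cache-and-modulate idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `gate`, `scale`, and `shift` are hypothetical, and the gate here is a fixed per-timestep binary schedule rather than a learned module.

```python
import numpy as np

def diffusion_step_with_cache(x, t, compute_features, cache, gate, scale, shift):
    """One denoising step: run full inference, or reuse cached features
    adapted by a linear modulation.

    gate[t] == 1 -> reuse cache with modulation (cheap path)
    gate[t] == 0 -> full model pass, refresh the cache (expensive path)
    """
    if gate[t] and cache["features"] is not None:
        # Hypothetical linear modulation of the cached features:
        # f_t ~= scale[t] * f_cached + shift[t]
        return scale[t] * cache["features"] + shift[t]
    feats = compute_features(x, t)   # stands in for the full transformer pass
    cache["features"] = feats        # refresh the cache for later reuse
    return feats

# Toy usage: count how many full passes a 5-step schedule actually needs.
calls = {"n": 0}

def compute_features(x, t):
    calls["n"] += 1
    return x * (t + 1.0)

cache = {"features": None}
gate = [0, 1, 1, 0, 1]       # reuse cached features on 3 of 5 steps
scale = [1.0] * 5            # identity modulation for the toy example
shift = [0.0] * 5
x = np.ones(4)
for t in range(5):
    feats = diffusion_step_with_cache(x, t, compute_features, cache, gate, scale, shift)

print(calls["n"])  # 2 full passes instead of 5
```

In the paper, the gate decision and the modulation parameters are learned, so the schedule of cheap versus expensive steps is chosen by training rather than fixed by hand as in this sketch.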

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
Annual Conference on Neural Information Processing Systems
Archive span
1987-2025
Indexed papers
30776
Paper id
1137807014163333638