
NeurIPS 2025

Diffusion on Demand: Selective Caching and Modulation for Efficient Generation

Conference Paper · Main Conference Track · Artificial Intelligence · Machine Learning

Abstract

Diffusion transformers demonstrate strong potential across generation tasks but suffer from high computational cost. Recently, feature caching methods have been introduced to improve inference efficiency by storing features at certain timesteps and reusing them at subsequent timesteps. However, their effectiveness is limited because they can only choose between reusing cached features and performing full model inference. Motivated by the high cosine similarity between features across consecutive timesteps, we propose a cache-based framework that reuses features and selectively adapts them through linear modulation. In our framework, the selection is performed via a modulation gate, and both the gate and the modulation parameters are learned. Extensive experiments show that our method achieves generation performance similar to the original sampler while requiring significantly less computation. For example, FLOPs and inference latency are reduced by $2.93\times$ and $2.15\times$ for DiT-XL/2 and by $2.83\times$ and $1.50\times$ for PixArt-$\alpha$, respectively. We find that modulation is effective when applied to as little as 2\% of layers, resulting in negligible computation overhead.
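The cache-and-modulate idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `gate`, `scale`, and `shift` are hypothetical, and the gate here is a fixed per-timestep binary schedule rather than a learned module.

```python
import numpy as np

def diffusion_step_with_cache(x, t, compute_features, cache, gate, scale, shift):
    """One denoising step: run full inference, or reuse cached features
    adapted by a linear modulation.

    gate[t] == 1 -> reuse cache with modulation (cheap path)
    gate[t] == 0 -> full model pass, refresh the cache (expensive path)
    """
    if gate[t] and cache["features"] is not None:
        # Hypothetical linear modulation of the cached features:
        # f_t ~= scale[t] * f_cached + shift[t]
        return scale[t] * cache["features"] + shift[t]
    feats = compute_features(x, t)   # stands in for the full transformer pass
    cache["features"] = feats        # refresh the cache for later reuse
    return feats

# Toy usage: count how many full passes a 5-step schedule actually needs.
calls = {"n": 0}

def compute_features(x, t):
    calls["n"] += 1
    return x * (t + 1.0)

cache = {"features": None}
gate = [0, 1, 1, 0, 1]       # reuse cached features on 3 of 5 steps
scale = [1.0] * 5            # identity modulation for the toy example
shift = [0.0] * 5
x = np.ones(4)
for t in range(5):
    feats = diffusion_step_with_cache(x, t, compute_features, cache, gate, scale, shift)

print(calls["n"])  # 2 full passes instead of 5
```

In the paper, the gate decision and the modulation parameters are learned, so the schedule of cheap versus expensive steps is chosen by training rather than fixed by hand as in this sketch.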

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
Annual Conference on Neural Information Processing Systems
Archive span
1987-2025
Indexed papers
30776
Paper id
1137807014163333638