AAAI Conference 2026 Conference Paper
Exploring Position Encoding Mechanism in Diffusion U-Net for Training-free High-resolution Image Generation
- Feng Zhou
- Pu Cao
- Yiyang Ma
- Lu Yang
- Yonghao Dang
- Jianqin Yin
Denoising higher-resolution latents using a pre-trained U-Net often results in repetitive and disordered image patterns. In this work, we investigate the intrinsic cause of this pattern disruption in high-resolution image generation. Through theoretical analysis and empirical studies, we show that the pre-trained U-Net fails to provide sufficient positional information for tokens at high resolution. Specifically, 1) zero-padding serves as a critical position-encoding mechanism but lacks robustness across varying resolutions; and 2) tokens located farther from the feature-map boundaries have increasing difficulty acquiring positional awareness, leading to pattern disruption. Motivated by these findings, we propose Progressive Boundary Complement (PBC), a novel training-free approach for high-resolution generation. PBC creates dynamic virtual image boundaries inside the feature map to supplement positional information at high resolution, enabling high-quality, content-rich high-resolution image synthesis. Extensive experiments show that our method significantly improves high-resolution image synthesis in terms of visual quality and content richness, achieving state-of-the-art performance.
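The two observations about zero-padding can be illustrated with a toy experiment. The sketch below is not the PBC method and does not use a diffusion U-Net; it assumes a stack of 3x3 averaging convolutions with zero padding applied to a constant feature map, and the helper names `conv2d_zero_pad` and `boundary_aware_mask` are hypothetical. Because the input is constant, any output value that deviates from 1 can only have been influenced by the zero padding, so the mask marks the tokens that received a boundary signal.

```python
import numpy as np

def conv2d_zero_pad(x, kernel):
    """'Same'-size 2D cross-correlation with zero padding (toy, unoptimized)."""
    k = kernel.shape[0]
    pad = k // 2
    # Zero padding is the only source of non-constant input here.
    padded = np.pad(x, pad, mode="constant", constant_values=0.0)
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return out

def boundary_aware_mask(size, num_layers):
    """Mark tokens whose value was altered by the zero padding.

    Assumption (illustrative only): a constant input and an averaging
    kernel, so an unaffected token keeps the value 1.
    """
    x = np.ones((size, size))
    kernel = np.ones((3, 3)) / 9.0
    for _ in range(num_layers):
        x = conv2d_zero_pad(x, kernel)
    return ~np.isclose(x, 1.0)

# Same network depth, two feature-map resolutions.
small = boundary_aware_mask(size=8, num_layers=3)
large = boundary_aware_mask(size=32, num_layers=3)
print("fraction of boundary-aware tokens at  8x8 :", small.mean())
print("fraction of boundary-aware tokens at 32x32:", large.mean())
```

With three layers, the padding signal reaches only tokens within three positions of an edge. On the 8x8 map this covers 60 of 64 tokens (0.9375), while on the 32x32 map it covers 348 of 1024 (about 0.34), which mirrors, in a highly simplified setting, the claim that tokens far from the boundaries struggle to acquire positional awareness as resolution grows.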