Author name cluster

Kun Cheng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers

2 author rows

AAAI Conference 2026 Conference Paper

Mixture of Ranks with Degradation-Aware Routing for One-Step Real-World Image Super-Resolution

Xiao He
Zhijun Tu
Kun Cheng
Mingrui Zhu
Jie Hu
Nannan Wang
Xinbo Gao

The demonstrated success of sparsely-gated Mixture-of-Experts (MoE) architectures, exemplified by models such as DeepSeek and Grok, has motivated researchers to investigate their adaptation to diverse domains. In real-world image super-resolution (Real-ISR), existing approaches mainly rely on fine-tuning pre-trained diffusion models through Low-Rank Adaptation (LoRA) modules to reconstruct high-resolution (HR) images. However, these dense Real-ISR models are limited in their ability to adaptively capture the heterogeneous characteristics of complex real-world degraded samples or to enable knowledge sharing between inputs under equivalent computational budgets. To address this, we investigate the integration of sparse MoE into Real-ISR and propose a Mixture-of-Ranks (MoR) architecture for single-step image super-resolution. We introduce a fine-grained expert partitioning strategy that treats each rank in LoRA as an independent expert. This design enables flexible knowledge recombination while isolating fixed-position ranks as shared experts to preserve common-sense features and minimize routing redundancy. Furthermore, we develop a degradation estimation module that leverages CLIP embeddings and predefined positive-negative text pairs to compute relative degradation scores, dynamically guiding expert activation. To better accommodate varying sample complexities, we incorporate zero-expert slots and propose a degradation-aware load-balancing loss, which dynamically adjusts the number of active experts based on degradation severity, ensuring optimal computational resource allocation. Comprehensive experiments validate our framework's effectiveness and state-of-the-art performance.
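The rank-as-expert idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mixture_of_ranks`, the linear gate `gate_W`, and the plain top-k softmax gating are all assumptions made for illustration, and the zero-expert slots and degradation-guided routing mentioned in the abstract are omitted.

```python
import math


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def mixture_of_ranks(x, A, B, gate_W, shared=1, top_k=2):
    """Hypothetical LoRA update in which each rank acts as one expert.

    A:      r x d_in matrix (row i is the down-projection of rank i)
    B:      d_out x r matrix (column i is the up-projection of rank i)
    gate_W: (r - shared) x d_in matrix scoring the routed ranks (assumed linear gate)
    The first `shared` ranks are always active; the rest compete via top-k routing.
    """
    r = len(A)
    h = matvec(A, x)            # per-rank activations a_i . x
    logits = matvec(gate_W, x)  # one routing score per non-shared rank

    # Select the top-k routed ranks and normalize their scores with a softmax.
    routed = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    peak = max(logits[i] for i in routed)
    exps = {i: math.exp(logits[i] - peak) for i in routed}
    total = sum(exps.values())

    weights = [0.0] * r
    for i in range(shared):
        weights[i] = 1.0                       # shared ranks bypass the router
    for i in routed:
        weights[shared + i] = exps[i] / total  # sparse weights for routed ranks

    # Weighted sum of rank-one updates: delta = sum_i w_i * b_i * (a_i . x)
    delta = [sum(B[o][i] * weights[i] * h[i] for i in range(r)) for o in range(len(B))]
    return delta, weights
```

With `shared=0` and every weight fixed to 1, the same sum reduces to the ordinary dense LoRA update `B(Ax)`, which is the sense in which routing over individual ranks makes the adapter sparse.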


ICML Conference 2025 Conference Paper

Diff-MoE: Diffusion Transformer with Time-Aware and Space-Adaptive Experts

Kun Cheng
Xiao He 0014
Lei Yu
Zhijun Tu
Mingrui Zhu
Nannan Wang 0001
Xinbo Gao 0001
Jie Hu 0021

Diffusion models have transformed generative modeling but suffer from scalability limitations due to computational overhead and inflexible architectures that process all generative stages and tokens uniformly. In this work, we introduce Diff-MoE, a novel framework that combines Diffusion Transformers with Mixture-of-Experts to exploit both temporal adaptability and spatial flexibility. Our design incorporates expert-specific timestep conditioning, allowing each expert to process different spatial tokens while adapting to the generative stage, to dynamically allocate resources based on both the temporal and spatial characteristics of the generative task. Additionally, we propose a globally-aware feature recalibration mechanism that amplifies the representational capacity of expert modules by dynamically adjusting feature contributions based on input relevance. Extensive experiments on image generation benchmarks demonstrate that Diff-MoE significantly outperforms state-of-the-art methods. Our work demonstrates the potential of integrating diffusion models with expert-based designs, offering a scalable and effective framework for advanced generative modeling.


AAAI Conference 2025 Conference Paper

Effective Diffusion Transformer Architecture for Image Super-Resolution

Kun Cheng
Lei Yu
Zhijun Tu
Xiao He
Liyu Chen
Yong Guo
Mingrui Zhu
Nannan Wang

Recent advances indicate that diffusion models hold great promise in image super-resolution. While the latest methods are primarily based on latent diffusion models with convolutional neural networks, there are few attempts to explore transformers, which have demonstrated remarkable performance in image generation. In this work, we design an effective diffusion transformer for image super-resolution (DiT-SR) that achieves the visual quality of prior-based methods, but in a training-from-scratch manner. In practice, DiT-SR leverages an overall U-shaped architecture and adopts a uniform isotropic design for all the transformer blocks across different stages. The former facilitates multi-scale hierarchical feature extraction, while the latter reallocates computational resources to critical layers to further enhance performance. Moreover, we thoroughly analyze the limitations of the widely used AdaLN and present a frequency-adaptive time-step conditioning module, enhancing the model's capacity to process distinct frequency information at different time steps. Extensive experiments demonstrate that DiT-SR significantly outperforms existing training-from-scratch diffusion-based SR methods, and even beats some of the prior-based methods built on pretrained Stable Diffusion, proving the superiority of diffusion transformers in image super-resolution.


IJCAI Conference 2024 Conference Paper

Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors

Shiyin Dong
Mingrui Zhu
Kun Cheng
Nannan Wang
Xinbo Gao

The remarkable prowess of diffusion models in image generation has spurred efforts to extend their application beyond generative tasks. However, a persistent challenge is the lack of a unified approach for applying diffusion models to visual perception tasks with diverse semantic granularity requirements. Our purpose is to establish a unified visual perception framework, capitalizing on the potential synergies between generative and discriminative models. In this paper, we propose Vermouth, a simple yet effective framework comprising a pre-trained Stable Diffusion (SD) model containing rich generative priors, a unified head (U-head) capable of integrating hierarchical representations, and an Adapted-Expert providing discriminative priors. Comprehensive investigations unveil potential characteristics of Vermouth, such as the varying granularity of perception concealed in latent variables at distinct time steps and various U-net stages. We emphasize that there is no need to incorporate a heavyweight or intricate decoder to transform diffusion models into potent representation learners. Extensive comparative evaluations against tailored discriminative models showcase the efficacy of our approach on zero-shot sketch-based image retrieval (ZS-SBIR), few-shot classification, and open-vocabulary (OV) semantic segmentation tasks. The promising results demonstrate the potential of diffusion models as formidable learners, establishing their significance in furnishing informative and robust visual representations.


EAAI Journal 2024 Journal Article

Deep learning approach for accurate and stable recognition of driver's lateral intentions using naturalistic driving data

Kun Cheng
Dongye Sun
Datong Qin
Chong Chen

Accurate and stable recognition of a driver's lateral intention is a crucial prerequisite for the proper functioning of advanced driver-assistance systems (ADAS). Existing studies usually rely on auxiliary sensor signals, such as cameras and eye trackers; however, this reliance poses challenges in applying these methods to vehicles lacking such auxiliary sensors. Furthermore, existing studies have not fully leveraged the inherent temporal dependence of lateral intentions, leading to difficulties in avoiding erroneous recognition interruptions. Thus, this study proposes a deep-learning-based lateral intention recognition method to achieve accurate and stable recognition of lateral intention using onboard sensor signals. First, a real vehicle is used to collect a vast amount of driving data, helping to ensure the robustness and practicality of the recognition model. Subsequently, vehicle trajectories are extracted, and a trajectory clustering method is used to label the lateral intentions in the driving data; these intention labels and a feature selection algorithm are used to select the most representative recognition features. Then, a lateral driving intention recognition model is constructed using double convolutional neural networks with a long short-term memory layer (CNN-LSTM). This network architecture can fully utilize the temporal dependence of lateral intentions. Finally, the recognition performance of the designed double CNN-LSTM networks is validated using the existing driving data and real-world vehicle tests. The results indicate that the double CNN-LSTM networks can achieve stable recognition of lateral intention in real time, with an accuracy of 98.64% in the experiment.
