Author name cluster

Kun Cheng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

Mixture of Ranks with Degradation-Aware Routing for One-Step Real-World Image Super-Resolution

  • Xiao He
  • Zhijun Tu
  • Kun Cheng
  • Mingrui Zhu
  • Jie Hu
  • Nannan Wang
  • Xinbo Gao

The demonstrated success of sparsely gated Mixture-of-Experts (MoE) architectures, exemplified by models such as DeepSeek and Grok, has motivated researchers to adapt them to diverse domains. In real-world image super-resolution (Real-ISR), existing approaches mainly fine-tune pre-trained diffusion models with Low-Rank Adaptation (LoRA) modules to reconstruct high-resolution (HR) images. However, these dense Real-ISR models are limited in their ability to adaptively capture the heterogeneous characteristics of complex real-world degraded samples or to share knowledge across inputs under an equivalent computational budget. To address this, we investigate integrating sparse MoE into Real-ISR and propose a Mixture-of-Ranks (MoR) architecture for single-step image super-resolution. We introduce a fine-grained expert partitioning strategy that treats each rank in LoRA as an independent expert. This design enables flexible knowledge recombination, while ranks at fixed positions are isolated as shared experts to preserve common features and minimize routing redundancy. Furthermore, we develop a degradation estimation module that leverages CLIP embeddings and predefined positive-negative text pairs to compute relative degradation scores, which dynamically guide expert activation. To better accommodate varying sample complexity, we incorporate zero-expert slots and propose a degradation-aware load-balancing loss, which adjusts the number of active experts according to degradation severity so that computation is allocated where it is needed. Comprehensive experiments validate the effectiveness of our framework and its state-of-the-art performance.
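A minimal sketch of the rank-as-expert idea, under stated assumptions: it treats a LoRA update `B @ (A @ x)` as a sum of per-rank terms, keeps designated ranks always active as shared experts, and picks the top-k of the remaining ranks plus zero-expert slots with a softmax router. The `degradation_score` helper is a CLIP-IQA-style stand-in (a softmax over an image's similarity to a hypothetical "clean" and "degraded" prompt), not necessarily the module described above, and all function names are hypothetical. In the paper the number of active experts is shaped by the degradation-aware load-balancing loss; here `k` is simply passed in.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def rank_expert_outputs(x, A, B):
    """Split a LoRA update B @ (A @ x) into one contribution per rank.

    A: R x d_in (down-projection), B: d_out x R (up-projection).
    Rank r contributes B[:, r] * (A[r] . x), so the ranks sum to the full update.
    """
    outputs = []
    for r, a_row in enumerate(A):
        coeff = sum(a * xi for a, xi in zip(a_row, x))
        outputs.append([coeff * b_row[r] for b_row in B])
    return outputs


def degradation_score(sim_clean, sim_degraded, temperature=0.01):
    """Hypothetical relative degradation score in (0, 1).

    Softmax over the image's similarity to a "clean" and a "degraded" prompt;
    returns the share assigned to "degraded" (higher = more degraded).
    """
    return softmax([sim_clean / temperature, sim_degraded / temperature])[1]


def mixture_of_ranks(x, A, B, router_logits, shared, k):
    """Sparse rank-expert forward pass (illustrative sketch).

    router_logits: one logit per non-shared rank, followed by any number of
    zero-expert slots. Shared ranks are always active; the top-k entries of
    router_logits are selected and their softmax weights renormalised over the
    selection. A selected zero slot adds nothing, so its weight is dropped.
    Returns (output vector, number of routed rank experts actually used).
    """
    experts = rank_expert_outputs(x, A, B)
    out = [0.0] * len(B)
    for r in shared:
        out = [o + e for o, e in zip(out, experts[r])]
    routed = [r for r in range(len(A)) if r not in shared]
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])
    active = 0
    for i, w in zip(top, weights):
        if i < len(routed):  # a real rank expert
            out = [o + w * e for o, e in zip(out, experts[routed[i]])]
            active += 1
        # otherwise: zero-expert slot, contributes nothing
    return out, active
```

A caller might, for example, derive `k` with a hypothetical rule such as `k = 1 + round(degradation_score(s_clean, s_degraded) * 2)`, so that more degraded inputs can engage more rank experts, while a selected zero slot still lets an easy input use fewer.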

ICML Conference 2025 Conference Paper

Diff-MoE: Diffusion Transformer with Time-Aware and Space-Adaptive Experts

  • Kun Cheng
  • Xiao He 0014
  • Lei Yu
  • Zhijun Tu
  • Mingrui Zhu
  • Nannan Wang 0001
  • Xinbo Gao 0001
  • Jie Hu 0021

Diffusion models have transformed generative modeling but suffer from scalability limitations due to computational overhead and inflexible architectures that process all generative stages and tokens uniformly. In this work, we introduce Diff-MoE, a novel framework that combines Diffusion Transformers with Mixture-of-Experts to exploit both temporal adaptability and spatial flexibility. Our design incorporates expert-specific timestep conditioning, which allows each expert to process different spatial tokens while adapting to the current generative stage, so that resources are allocated dynamically according to both the temporal and spatial characteristics of the generative task. Additionally, we propose a globally aware feature recalibration mechanism that amplifies the representational capacity of expert modules by dynamically adjusting feature contributions based on input relevance. Extensive experiments on image generation benchmarks demonstrate that Diff-MoE significantly outperforms state-of-the-art methods. Our work highlights the potential of integrating diffusion models with expert-based designs, offering a scalable and effective framework for advanced generative modeling.
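The expert-specific timestep conditioning can be sketched as follows. Each expert owns its own projections from a sinusoidal timestep embedding to a per-channel scale and shift, and a router assigns each spatial token to its top-1 expert. Top-1 routing, the scale-and-shift form, and every name here (`TimeAwareExpert`, `diff_moe_layer`) are assumptions for illustration, and the globally aware feature recalibration mechanism is not modelled.

```python
import math


def matvec(matrix, vec):
    """Plain-Python matrix-vector product."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def timestep_embedding(t, dim):
    """Sinusoidal timestep embedding (dim assumed even)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


class TimeAwareExpert:
    """A linear expert with its own (hypothetical) timestep-dependent scale and shift."""

    def __init__(self, weight, scale_proj, shift_proj):
        self.weight = weight          # d_out x d_in
        self.scale_proj = scale_proj  # d_out x emb_dim, private to this expert
        self.shift_proj = shift_proj  # d_out x emb_dim, private to this expert

    def __call__(self, token, t_emb):
        h = matvec(self.weight, token)
        scale = matvec(self.scale_proj, t_emb)
        shift = matvec(self.shift_proj, t_emb)
        return [hi * (1.0 + s) + b for hi, s, b in zip(h, scale, shift)]


def diff_moe_layer(tokens, t, experts, router, emb_dim):
    """Route each spatial token to its top-1 expert; the expert also sees timestep t.

    Returns the gate-weighted outputs and the expert index chosen for each token.
    """
    t_emb = timestep_embedding(t, emb_dim)
    outputs, assignment = [], []
    for tok in tokens:
        probs = softmax(matvec(router, tok))
        e = max(range(len(probs)), key=probs.__getitem__)
        outputs.append([probs[e] * v for v in experts[e](tok, t_emb)])
        assignment.append(e)
    return outputs, assignment
```

Because the conditioning weights are private to each expert, two experts can respond differently to the same timestep: in this sketch, an expert whose projections are zero ignores `t`, while one with a nonzero shift projection changes its output as `t` changes.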

AAAI Conference 2025 Conference Paper

Effective Diffusion Transformer Architecture for Image Super-Resolution

  • Kun Cheng
  • Lei Yu
  • Zhijun Tu
  • Xiao He
  • Liyu Chen
  • Yong Guo
  • Mingrui Zhu
  • Nannan Wang

Recent advances indicate that diffusion models hold great promise for image super-resolution. While the latest methods are primarily based on latent diffusion models with convolutional neural networks, there have been few attempts to explore transformers, which have demonstrated remarkable performance in image generation. In this work, we design an effective diffusion transformer for image super-resolution (DiT-SR) that achieves the visual quality of prior-based methods while being trained from scratch. In practice, DiT-SR adopts an overall U-shaped architecture and a uniform isotropic design for all transformer blocks across different stages. The former facilitates multi-scale hierarchical feature extraction, while the latter reallocates computational resources to critical layers to further enhance performance. Moreover, we thoroughly analyze the limitations of the widely used AdaLN and present a frequency-adaptive time-step conditioning module, which enhances the model's capacity to process distinct frequency information at different time steps. Extensive experiments demonstrate that DiT-SR significantly outperforms existing training-from-scratch diffusion-based SR methods and even beats some prior-based methods built on pretrained Stable Diffusion, showing the strength of diffusion transformers for image super-resolution.
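The abstract does not spell out the frequency-adaptive conditioning module, so the following is only a toy illustration of the contrast it draws with AdaLN. An AdaLN-style time-dependent scale multiplies a whole channel by one factor, which rescales every spatial frequency component of that channel equally; a frequency-adaptive scheme can instead give each frequency band its own time-dependent gain. The sketch works on a 1D signal with a naive DFT, and the schedule in `frequency_gains` is purely hypothetical; in a real model such gains would presumably be learned and applied to 2D feature maps.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n)) for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)) / n for j in range(n)]


def frequency_gains(t, num_steps, n):
    """Hypothetical time-dependent gain for each DFT bin.

    Noisy steps (t near num_steps) suppress high frequencies; at t = 0 every gain is 1.
    Gains are symmetric in k and n - k, so a real input stays real.
    """
    progress = t / num_steps
    gains = []
    for k in range(n):
        freq = min(k, n - k) / (n / 2)  # normalised frequency in [0, 1]
        gains.append(1.0 - progress * freq)
    return gains


def frequency_adaptive_modulate(x, gains):
    """Reweight each frequency component of x, then return to the signal domain."""
    spectrum = dft(x)
    return [v.real for v in idft([g * s for g, s in zip(gains, spectrum)])]
```

With this schedule, a constant (zero-frequency) signal passes through unchanged at every step, while an alternating signal at the highest frequency is removed entirely at `t = num_steps`; a single shared scale could not treat the two differently.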

IJCAI Conference 2024 Conference Paper

Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors

  • Shiyin Dong
  • Mingrui Zhu
  • Kun Cheng
  • Nannan Wang
  • Xinbo Gao

The remarkable prowess of diffusion models in image generation has spurred efforts to extend their application beyond generative tasks. However, a persistent challenge is the lack of a unified approach for applying diffusion models to visual perception tasks with diverse semantic-granularity requirements. Our goal is to establish a unified visual perception framework that capitalizes on the potential synergies between generative and discriminative models. In this paper, we propose Vermouth, a simple yet effective framework comprising a pre-trained Stable Diffusion (SD) model containing rich generative priors, a unified head (U-head) capable of integrating hierarchical representations, and an Adapted-Expert providing discriminative priors. Comprehensive investigations reveal characteristics of Vermouth such as the varying granularity of perception concealed in latent variables at distinct time steps and at different U-Net stages. We emphasize that there is no need to incorporate a heavyweight or intricate decoder to turn diffusion models into potent representation learners. Extensive comparative evaluations against tailored discriminative models show the efficacy of our approach on zero-shot sketch-based image retrieval (ZS-SBIR), few-shot classification, and open-vocabulary (OV) semantic segmentation. The promising results demonstrate the potential of diffusion models as formidable learners that provide informative and robust visual representations.

EAAI Journal 2024 Journal Article

Deep learning approach for accurate and stable recognition of driver's lateral intentions using naturalistic driving data

  • Kun Cheng
  • Dongye Sun
  • Datong Qin
  • Chong Chen

Accurate and stable recognition of a driver's lateral intention is a crucial prerequisite for the proper functioning of advanced driver-assistance systems (ADAS). Existing studies usually rely on auxiliary sensor signals, such as cameras and eye trackers; however, this reliance makes these methods difficult to apply to vehicles lacking such sensors. Furthermore, existing studies have not fully leveraged the inherent temporal dependence of lateral intentions, which makes erroneous interruptions in recognition difficult to avoid. This study therefore proposes a deep-learning-based method that achieves accurate and stable recognition of lateral intention using onboard sensor signals. First, a real vehicle is used to collect a large amount of driving data, ensuring the robustness and practicality of the recognition model. Next, vehicle trajectories are extracted and a trajectory clustering method is used to label the lateral intentions in the driving data; these intention labels and a feature selection algorithm are then used to select the most representative recognition features. A lateral driving intention recognition model is then constructed using double convolutional neural networks with a long short-term memory layer (CNN-LSTM), an architecture that can fully exploit the temporal dependence of lateral intentions. Finally, the recognition performance of the double CNN-LSTM networks is validated using the existing driving data and real-world vehicle tests. The results indicate that the double CNN-LSTM networks achieve stable, real-time recognition of lateral intention, with an accuracy of 98.64% in the experiment.
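To make the CNN-LSTM pipeline concrete, here is a minimal plain-Python forward pass: a 1D convolution over time extracts local features from a window of onboard signals, an LSTM carries state across the window, and a softmax layer scores intention classes. This is an illustrative sketch, not the authors' model: it uses a single convolutional branch rather than the paper's "double" configuration (whose details the abstract does not give), random untrained weights, and hypothetical names (`conv1d`, `lstm_step`, `init_params`, `cnn_lstm_classify`). The three input channels and three output classes (for example keep lane, change left, change right) are assumptions.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def conv1d(seq, kernels, bias):
    """Valid 1D convolution over time followed by ReLU.

    seq: T x C_in, kernels: C_out x K x C_in, bias: C_out -> (T - K + 1) x C_out.
    """
    width = len(kernels[0])
    out = []
    for t in range(len(seq) - width + 1):
        row = []
        for f, kern in enumerate(kernels):
            acc = bias[f]
            for k in range(width):
                acc += sum(w * x for w, x in zip(kern[k], seq[t + k]))
            row.append(max(0.0, acc))
        out.append(row)
    return out


def lstm_step(x, h, c, W, U, b):
    """One LSTM step; W: 4H x D, U: 4H x H, b: 4H, gate order (i, f, g, o)."""
    H = len(h)
    z = [sum(w * xi for w, xi in zip(W[r], x)) + sum(u * hi for u, hi in zip(U[r], h)) + b[r]
         for r in range(4 * H)]
    i = [sigmoid(v) for v in z[0:H]]            # input gate
    f = [sigmoid(v) for v in z[H:2 * H]]        # forget gate
    g = [math.tanh(v) for v in z[2 * H:3 * H]]  # candidate cell state
    o = [sigmoid(v) for v in z[3 * H:4 * H]]    # output gate
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


def rand_matrix(rng, rows, cols, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def init_params(c_in=3, c_conv=4, width=3, hidden=5, classes=3, seed=0):
    """Random, untrained weights; all sizes are hypothetical."""
    rng = random.Random(seed)
    return {
        "kernels": [rand_matrix(rng, width, c_in) for _ in range(c_conv)],
        "conv_bias": [0.0] * c_conv,
        "W": rand_matrix(rng, 4 * hidden, c_conv),
        "U": rand_matrix(rng, 4 * hidden, hidden),
        "b": [0.0] * (4 * hidden),
        "W_out": rand_matrix(rng, classes, hidden),
        "b_out": [0.0] * classes,
    }


def cnn_lstm_classify(seq, params):
    """Conv features -> LSTM over time -> softmax over intention classes."""
    feats = conv1d(seq, params["kernels"], params["conv_bias"])
    hidden = len(params["b"]) // 4
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in feats:
        h, c = lstm_step(x, h, c, params["W"], params["U"], params["b"])
    logits = [sum(w * hi for w, hi in zip(row, h)) + bo
              for row, bo in zip(params["W_out"], params["b_out"])]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps], h
```

In a trained system the parameters would be fitted to the labelled windows produced by the trajectory-clustering step; the sketch only shows how a window of signals flows through the convolutional, recurrent, and classification layers.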