Author name cluster

Chenyang Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers

2 author rows

AAAI Conference 2026 Conference Paper

DialoGen: Towards Dialog Gesture Generation via Identity-Decoupled Style Guidance in Interactive Diffusion Model

Weiyu Zhao
Chenyang Wang
Liangxiao Hu
Zonglin Li
Wei Yu
Shengping Zhang

We propose DialoGen, a novel framework for generating realistic gestures for both interlocutors in dialog scenarios, conditioned on conversational audio. Unlike most existing methods that focus solely on a single speaker, DialoGen simultaneously generates synchronized gestures for both participants while also embedding identity-decoupled style into generated gestures that enhance realism and expressiveness. To ensure precise synchronization between interlocutors, DialoGen adopts an interactive dual-diffusion model with mutual interaction estimation, which integrates interaction correlation into the diffusion process. More importantly, by leveraging supervised contrastive learning, we develop the identity-decoupled style guidance to adaptively decompose the identity-specific style of interlocutors into latent space, enabling multi-style dialog gesture generation. Extensive experimental results demonstrate that our model significantly outperforms existing methods in generating realistic, speech-aligned, identity-specific gestures, offering a high-quality solution for various dialog scenarios.

EAAI Journal 2026 Journal Article

Event-triggered adaptive robust non-singular fast terminal sliding mode fault-tolerant control for intelligent vehicle stability systems under extreme conditions

Min Gao
Jiaqi Li
Jing Li
Jin Luo
Chenyang Wang

The vehicle stability control system serves as the fundamental guarantee for the active safety of intelligent vehicles. In light of existing challenges such as ensemble uncertainty interference, limited communication resources, and actuator faults, this study presents an event-triggered adaptive robust non-singular fast terminal sliding mode fault-tolerant control method. Adaptive laws are formulated to evaluate the switching gains, thereby circumventing complications associated with insufficient prior knowledge of lumped uncertainties. Subsequently, an event-triggered adaptive robust non-singular fast terminal sliding mode control strategy is introduced to conserve communication resources, mitigate chattering, and prevent singularities. Furthermore, fault factors are incorporated into the vehicle dynamics framework to enhance the torque optimization allocation strategy for fault-tolerant control in the presence of actuator faults. The application of the Lyapunov stability theorem confirms stability over a limited duration, as well as Zeno-free behavior within the vehicle stability control system. Ultimately, CarSim and Matlab/Simulink co-simulation is employed to verify the effectiveness of the proposed method, with numerical simulations conducted across various complex driving conditions. The simulation data indicate that the proposed method reduces the number of triggering events for the yaw rate controller and the side slip angle controller by 67.1%/75.2% and 36.1%/28.2% for Conditions A and B, respectively, when compared to the fixed-time-triggered sliding mode controller and the without-control method.
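
As a rough illustration of why event triggering conserves communication, the sketch below compares a controller that refreshes its measurement at every step with one that refreshes only when the held measurement drifts too far from the true state. It is not the method of the paper above: the scalar plant, the gains `a` and `k`, the relative-error trigger, and the function name `simulate` are all assumptions chosen for a minimal example.

```python
def simulate(threshold, steps=2000, dt=0.001, a=1.0, k=5.0, x0=1.0):
    """Simulate x' = a*x + u with u = -k * x_held (forward Euler).

    Hypothetical toy plant, not the vehicle model from the abstract.
    x_held is refreshed only when |x - x_held| > threshold * |x|.
    Returns (final state, number of measurement transmissions).
    """
    x = x0
    x_held = x0
    updates = 1  # the initial transmission of x0
    for _ in range(steps):
        # Event condition: transmit only if the held value is too stale.
        if abs(x - x_held) > threshold * abs(x):
            x_held = x
            updates += 1
        u = -k * x_held          # control uses the last transmitted state
        x += dt * (a * x + u)    # one Euler step of the plant
    return x, updates


# threshold = 0 behaves like a time-triggered controller (update every step).
x_time, n_time = simulate(threshold=0.0)
# A 10% relative threshold transmits far less often in this toy setup.
x_event, n_event = simulate(threshold=0.1)
print(n_time, n_event, abs(x_event) < 0.01)
```

In this toy case the state still converges because the error tolerated by the trigger is small relative to the stabilizing gain; real designs derive the threshold from a Lyapunov argument rather than picking it by hand.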

AAAI Conference 2026 Conference Paper

ForeDiffusion: Foresight-Conditioned Diffusion Policy via Future View Construction for Robot Manipulation

Weize Xie
Yi Ding
Ying He
Leilei Wang
Binwen Bai
Zheyi Zhao
Chenyang Wang
F. Richard Yu

Diffusion strategies have advanced visual motor control by progressively denoising high-dimensional action sequences, providing a promising method for robot manipulation. However, as task complexity increases, the success rate of existing baseline models decreases considerably. Analysis indicates that current diffusion strategies are confronted with two limitations. First, these strategies only rely on short-term observations as conditions. Second, the training objective remains limited to a single denoising loss, which leads to error accumulation and causes grasping deviations. To address these limitations, this paper proposes Foresight-Conditioned Diffusion (ForeDiffusion), which injects the predicted future view representation into the diffusion process. As a result, the policy is guided to be forward-looking, enabling it to correct trajectory deviations. Following this design, ForeDiffusion employs a dual loss mechanism, combining the traditional denoising loss and the consistency loss of future observations, to achieve unified optimization. Extensive evaluation on the Adroit suite and the MetaWorld benchmark demonstrates that ForeDiffusion achieves an average success rate of 80% for the overall task, significantly outperforming the existing mainstream diffusion methods by approximately 20% in high-difficulty tasks, while maintaining more stable performance across all tasks.
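
The dual loss described above can be pictured as a weighted sum of two terms. The sketch below is only a schematic: the abstract does not give the exact form of either loss or how they are balanced, so the mean-squared-error terms, the `weight` parameter, and the names `mse` and `dual_loss` are assumptions.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def dual_loss(pred_noise, true_noise, pred_future, target_future, weight=0.5):
    """Hypothetical combined objective: denoising term plus a weighted
    future-observation consistency term (both taken as MSE here)."""
    denoise = mse(pred_noise, true_noise)
    consistency = mse(pred_future, target_future)
    return denoise + weight * consistency


# Toy numbers: denoising MSE = 1.0, consistency MSE = 4.0, weight = 0.5.
print(dual_loss([0.0, 0.0], [1.0, 1.0], [1.0], [3.0], weight=0.5))  # 3.0
```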

AAAI Conference 2026 Conference Paper

Light-IF: Endowing LLMs with Generalizable Reasoning via Preview and Self-Checking for Complex Instruction Following

Chenyang Wang
Liang Wen
Shousheng Jia
Xiangzheng Zhang
Liang Xu

While advancements in the reasoning abilities of LLMs have significantly enhanced their performance in solving mathematical problems, coding tasks, and general puzzles, their effectiveness in accurately adhering to instructions remains inconsistent, particularly with more complex directives. Our investigation identifies lazy reasoning during the thinking stage as the primary factor contributing to poor instruction adherence. To mitigate this issue, we propose a comprehensive framework designed to enable rigorous reasoning processes involving preview and self-checking, essential for satisfying strict instruction constraints. Specifically, we first generate instructions with complex constraints and apply a filtering process to obtain valid prompts, resulting in three distinct prompt datasets categorized as hard, easy, and pass. Then, we employ rejection sampling on the pass prompts to curate a small yet high-quality dataset, enabling a cold-start initialization of the model and facilitating its adaptation to effective reasoning patterns. Subsequently, we employ an entropy-preserving supervised fine-tuning (Entropy-SFT) strategy coupled with token-wise entropy-adaptive reinforcement learning (TEA-RL) guided by rule-based dense rewards. This approach encourages the model to transform its reasoning mechanism, ultimately fostering generalizable reasoning abilities that encompass preview and self-checking. Extensive experiments conducted on instruction-following benchmarks demonstrate remarkable performance improvements across various model scales.

IROS Conference 2025 Conference Paper

A Robust Distributed Odometry for Mobile Robots with Steerable Wheels

Wang Xi
Jiaming Guo
Chenyang Wang
Shukun Wu
Jianping He

Odometry estimation remains a critical challenge for wheeled robots, as reducing its drift directly mitigates dependency on external localization systems. This paper proposes a distributed odometry framework for steerable wheels, named ICF-DO, which is applicable to both Steerable Wheeled Mobile Robots (SWMRs) and cooperative multi-single-wheel robot systems. The proposed method features low computational complexity and reduced drift, while demonstrating strong robustness in communication-restricted scenarios. Additionally, singularity can be processed in a distributed manner in the proposed framework. Experimental validation on a real physical SWMR platform demonstrates the effectiveness and practicality of the proposed method.

IROS Conference 2025 Conference Paper

A Two-Stage Lightweight Framework for Efficient Land-Air Bimodal Robot Autonomous Navigation

Yongjie Li
Zhou Liu
Wenshuai Yu
Zhangji Lu
Chenyang Wang
Fei Yu
Qingquan Li

Land-air bimodal robots (LABR) are gaining attention for autonomous navigation, combining high mobility from aerial vehicles with long endurance from ground vehicles. However, existing LABR navigation methods are limited by suboptimal trajectories from mapping-based approaches and the excessive computational demands of learning-based methods. To address this, we propose a two-stage lightweight framework that integrates global key points prediction with local trajectory refinement to generate efficient and reachable trajectories. In the first stage, the Global Key points Prediction Network (GKPN) is used to generate a hybrid land-air keypoint path. The GKPN includes a Sobel Perception Network (SPN) for improved obstacle detection and a Lightweight Attention Planning Network (LAPN) to improve predictive ability by capturing contextual information. In the second stage, the global path is segmented based on predicted key points and refined using a mapping-based planner to create smooth, collision-free trajectories. Experiments conducted on our LABR platform show that our framework reduces network parameters by 14% and energy consumption during land-air transitions by 35% compared to existing approaches. The framework achieves real-time navigation without GPU acceleration and enables zero-shot transfer from simulation to reality during deployment.

IJCAI Conference 2025 Conference Paper

Active Multimodal Distillation for Few-shot Action Recognition

Weijia Feng
Yichen Zhu
Ruojia Zhang
Chenyang Wang
Fei Ma
Xiaobao Wang
Xiaobai Li

Owing to its rapid progress and broad application prospects, few-shot action recognition has attracted considerable interest. However, current methods are predominantly based on limited single-modal data, which does not fully exploit the potential of multimodal information. This paper presents a novel framework that actively identifies reliable modalities for each sample using task-specific contextual cues, thus significantly improving recognition performance. Our framework integrates an Active Sample Inference (ASI) module, which utilizes active inference to predict reliable modalities based on posterior distributions and subsequently organizes them accordingly. Unlike reinforcement learning, active inference replaces rewards with evidence-based preferences, making more stable predictions. Additionally, we introduce an active mutual distillation module that enhances the representation learning of less reliable modalities by transferring knowledge from more reliable ones. Adaptive multimodal inference is employed during the meta-test to assign higher weights to reliable modalities. Extensive experiments across multiple benchmarks demonstrate that our method significantly outperforms existing approaches.

IJCAI Conference 2025 Conference Paper

Integration of Old and New Knowledge for Generalized Intent Discovery: A Consistency-driven Prototype-Prompting Framework

Xiao Wei
Xiaobao Wang
Ning Zhuang
Chenyang Wang
Longbiao Wang
Jianwu Dang

Intent detection aims to identify user intents from natural language inputs, where supervised methods rely heavily on labeled in-domain (IND) data and struggle with out-of-domain (OOD) intents, limiting their practical applicability. Generalized Intent Discovery (GID) addresses this by leveraging unlabeled OOD data to discover new intents without additional annotation. However, existing methods focus solely on clustering unsupervised data while neglecting domain adaptation. Therefore, we propose a consistency-driven prototype-prompting framework for GID from the perspective of integrating old and new knowledge, which includes a prototype-prompting framework for transferring old knowledge from external sources, and a hierarchical consistency constraint for learning new knowledge from target domains. We conducted extensive experiments and the results show that our method significantly outperforms all baseline methods, achieving state-of-the-art results, which strongly demonstrates the effectiveness and generalization of our methods. Our source code is publicly available at https://github.com/smileix/cpp.

NeurIPS Conference 2025 Conference Paper

PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models

Shi Qiu
Shaoyang Guo
Zhuo-Yang Song
Yunbo Sun
Zeyu Cai
Jiashen Wei
Tianyu Luo
Yixuan Yin

Current benchmarks for evaluating the reasoning capabilities of Large Language Models (LLMs) face significant limitations: task oversimplification, data contamination, and flawed evaluation items. These deficiencies necessitate more rigorous assessment methods. To address these limitations, we introduce PHYBench, a benchmark of 500 original physics problems ranging from high school to Physics Olympiad difficulty. PHYBench addresses data contamination through original content and employs a systematic curation pipeline to eliminate flawed items. Evaluations show that PHYBench activates more tokens and provides stronger differentiation between reasoning models compared to other baselines like AIME 2024, OlympiadBench and GPQA. Even the best-performing model, Gemini 2.5 Pro, achieves only 36.9% accuracy compared to human experts' 61.9%. To further enhance evaluation precision, we introduce the Expression Edit Distance (EED) Score for mathematical expression assessment, which improves sample efficiency by 204% over binary scoring. Moreover, PHYBench effectively elicits multi-step and multi-condition reasoning, providing a platform for examining models' reasoning robustness, preferences, and deficiencies. The benchmark results and dataset are publicly available at https://www.phybench.cn/.

IROS Conference 2024 Conference Paper

Adaptive Stochastic Nonlinear Model Predictive Control with Look-ahead Deep Reinforcement Learning for Autonomous Vehicle Motion Control

Baha Zarrouki
Chenyang Wang
Johannes Betz

Propagating uncertainties through nonlinear system dynamics in the context of Stochastic Nonlinear Model Predictive Control (SNMPC) is challenging, especially for high-dimensional systems requiring real-time control and operating under time-variant uncertainties such as autonomous vehicles. In this work, we propose an Adaptive SNMPC (aSNMPC) driven by Deep Reinforcement Learning (DRL) to optimize uncertainty handling, constraints robustification, feasibility, and closed-loop performance. To this end, our SNMPC uses Polynomial Chaos Expansion (PCE) for efficient uncertainty propagation, limits its propagation time through an Uncertainty Propagation Horizon (UPH), and transforms nonlinear chance constraints into robustified deterministic ones. We conceive a DRL agent to proactively anticipate upcoming control tasks and to dynamically reduce conservatism by determining the most suitable constraints robustification factor κ, and to enhance feasibility by choosing optimal UPH length T_u. We analyze the trained DRL agent’s decision-making process and highlight its ability to learn context-dependent optimal parameters. We showcase the enhanced robustness and feasibility of our DRL-driven aSNMPC through the real-time motion control task of an autonomous passenger vehicle when confronted with significant time-variant disturbances while achieving a minimum solution frequency of 110 Hz. The code used in this research is publicly accessible as open-source software: https://github.com/bzarr/TUM-CONTROL

AAAI Conference 2024 Conference Paper

Low-Light Face Super-resolution via Illumination, Structure, and Texture Associated Representation

Chenyang Wang
Junjun Jiang
Kui Jiang
Xianming Liu

Capturing human faces at night or in dimly lit environments has become common practice, and the resulting images suffer from complex low-light and low-resolution degradations. However, the existing face super-resolution (FSR) technologies and derived cascaded schemes are inadequate to recover credible textures. In this paper, we propose a novel approach that decomposes the restoration task into face structural fidelity maintaining and texture consistency learning. The former aims to enhance the quality of face images while improving the structural fidelity, while the latter focuses on eliminating perturbations and artifacts caused by low-light degradation and reconstruction. Based on this, we develop a novel low-light low-resolution face super-resolution framework. Our method consists of two steps: an illumination correction face super-resolution network (IC-FSRNet) for lighting the face and recovering the structural information, and a detail enhancement model (DENet) for improving facial details, thus making them more visually appealing and easier to analyze. As the relighted regions could provide complementary information to boost face super-resolution and vice versa, we introduce mutual learning to harness the informative components from relighted regions and reconstruction, and achieve iterative refinement. In addition, DENet, equipped with a diffusion probabilistic model, is built to further improve face image visual quality. Experiments demonstrate that the proposed joint optimization framework achieves significant improvements in reconstruction quality and perceptual quality over existing two-stage sequential solutions. Code is available at https://github.com/wcy-cs/IC-FSRDENet.

EAAI Journal 2023 Journal Article

A systematic empirical study on word embedding based methods in discovering Chinese black keywords

Chenyang Wang
Yi Shen
Yuwei Li
Min Zhang
Miao Hu
Jinghua Zheng

With the development of online transactions, the Chinese cyber black market is proliferating and facilitates many cybercrimes. It is difficult to understand the cyber black market due to the confusing jargon (called black keywords in this paper) used by criminals to conceal underground transactions. To discover black keywords automatically, some natural language processing based methods have been proposed by comparing the similarity of word vectors generated by word embedding models. Therefore, the quality of word vectors generated has a significant impact on black keyword discovery and it is necessary to evaluate different word embedding models in discovering black keywords. To this end, we design a Chinese black keyword discovery framework and conduct a systematic empirical study on six existing word embedding models including both static and dynamic types in discovering Chinese black keywords. Specifically, we classify Chinese black keywords into four types: domain specific words (DSWs), new meaning words (NMWs), similar pronunciation words (SPWs), and similar glyph words (SGWs). We experimentally find that different word embedding models vary greatly in performance when discovering black keywords, e.g., dynamic models perform well in discovering DSWs and NMWs, while static ones perform poorly in discovering NMWs. We improve the static word embedding model based NMW discovery algorithm by additionally comparing the differences in cross-corpus word nearest-neighbors before and after domain incremental training. For effectively discovering variant words like SPWs and SGWs, we additionally introduce Chinese pronunciation and glyph features. The experimental results demonstrate the effectiveness of the proposed Chinese black keyword discovery framework, with detection accuracies of over 90% for DSWs, 80% for NMWs, 90% for SPWs, and 61% for SGWs.
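
The idea of finding candidate keywords by comparing word-vector similarity can be sketched with plain cosine similarity. This is a generic nearest-neighbor lookup, not the framework from the paper above; the toy two-dimensional vectors, the placeholder words, and the function names `cosine` and `nearest` are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest(seed, vectors, top_k=2):
    """Rank every other word by cosine similarity to the seed word."""
    scores = [
        (word, cosine(vectors[seed], vec))
        for word, vec in vectors.items()
        if word != seed
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Hypothetical embeddings: "near" points almost the same way as "seed".
toy_vectors = {
    "seed": [1.0, 0.0],
    "near": [0.9, 0.1],
    "far": [0.0, 1.0],
}
print(nearest("seed", toy_vectors, top_k=1)[0][0])  # near
```

In practice the vectors would come from a trained embedding model, and the ranked neighbors of known jargon terms would serve as candidates for manual or automatic verification.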

AAAI Conference 2021 Conference Paper

Graph Heterogeneous Multi-Relational Recommendation

Chong Chen
Weizhi Ma
Min Zhang
Zhaowei Wang
Xiuqiang He
Chenyang Wang
Yiqun Liu
Shaoping Ma

Traditional studies on recommender systems usually leverage only one type of user behaviors (the optimization target, such as purchase), despite the fact that users also generate a large number of various types of interaction data (e.g., view, click, add-to-cart, etc.). Generally, these heterogeneous multi-relational data provide well-structured information and can be used for high-quality recommendation. Early efforts towards leveraging these heterogeneous data fail to capture the high-hop structure of user-item interactions, which are unable to make full use of them and may only achieve constrained recommendation performance. In this work, we propose a new multi-relational recommendation model named Graph Heterogeneous Collaborative Filtering (GHCF). To explore the high-hop heterogeneous user-item interactions, we take the advantages of Graph Convolutional Network (GCN) and further improve it to jointly embed both representations of nodes (users and items) and relations for multi-relational prediction. Moreover, to fully utilize the whole heterogeneous data, we perform the advanced efficient non-sampling optimization under a multi-task learning framework. Experimental results on two public benchmarks show that GHCF significantly outperforms the state-of-the-art recommendation methods, especially for cold-start users who have few primary item interactions. Further analysis verifies the importance of the proposed embedding propagation for modelling high-hop heterogeneous user-item interactions, showing the rationality and effectiveness of GHCF. Our implementation has been released (https://github.com/chenchongthu/GHCF).

ICRA Conference 2021 Conference Paper

Task Autocorrection for Immersive Teleoperation

Chenyang Wang
Simon Huber
Stelian Coros
Roi Poranne

Teleoperating robotic arms is a challenging task that requires years of training to master. It is mentally demanding, as the operator must internally compute transformations, or rely on muscle memory, to perform even the simplest tasks. Alternative methods that rely on embodiment (the immersive, first-person experience of controlling the robot from its point of view) have recently become more popular, thanks to the emergence of mixed reality devices. These methods create an intuitive experience by tracking the user’s motions and retargeting them to the robot. However, even recent hardware fails at achieving total immersion, due to inherent discrepancies such as latency, imperfect tracking, and the differences between human and robot motor systems. Thus, performing even simple pick-and-place tasks with these systems, while more intuitive, is still cumbersome, and far from the level of human performance. In this paper we propose an immersive system that aims to bridge this gap. The system tracks the user’s motion and retargets it to the robot as usual, but it also detects the user’s intent, that is, the task they wish to perform. Based on this knowledge, the system can autocorrect the motion when it is about to fail, in a seamless manner, such that the task is successfully performed. We evaluate the efficacy of our autocorrection system in a user study. The results show a statistically significant performance improvement in terms of operation accuracy and time.

AAAI Conference 2020 Conference Paper

D2D-LSTM: LSTM-Based Path Prediction of Content Diffusion Tree in Device-to-Device Social Networks

Heng Zhang
Xiaofei Wang
Jiawen Chen
Chenyang Wang
Jianxin Li

With the proliferation of mobile device users, Device-to-Device (D2D) communication has ascended to the spotlight in social networks for users to share and exchange enormous data. Different from classic online social networks (OSNs) like Twitter and Facebook, each single data file to be shared in the D2D social network is often very large in data size, e.g., video, image or document. Sometimes, a small number of interesting data files may dominate the network traffic and lead to heavy network congestion. To reduce the traffic congestion and design effective caching strategies, it is highly desirable to investigate how the data files are propagated in offline D2D social networks and derive the diffusion model that fits the new form of social network. However, existing works mainly concern link prediction, which cannot predict the overall diffusion path when network topology is unknown. In this article, we propose D2D-LSTM based on Long Short-Term Memory (LSTM), which aims to predict complete content propagation paths in D2D social networks. Taking the current user’s time, geography and category preference into account, historical features of the previous path can be captured as well. It utilizes prototype users for prediction so as to achieve a better generalization ability. To the best of our knowledge, it is the first attempt to use a real-world large-scale dataset of a mobile social network (MSN) to predict propagation path trees in a top-down order. Experimental results corroborate that the proposed algorithm can achieve better prediction performance than state-of-the-art approaches. Furthermore, D2D-LSTM can achieve 95% average precision for terminal class and 17% accuracy for tree path hit.

IJCAI Conference 2018 Conference Paper

Your Tweets Reveal What You Like: Introducing Cross-media Content Information into Multi-domain Recommendation

Weizhi Ma
Min Zhang
Chenyang Wang
Cheng Luo
Yiqun Liu
Shaoping Ma

Cold start is a challenging problem in recommender systems. Many previous studies attempt to utilize extra information from other platforms to alleviate the problem. Most of the leveraged information is on-topic, directly related to users' preferences in the target domain. Thought to be unrelated, users' off-topic content information (such as user tweets) is usually omitted. However, the off-topic content information also helps to indicate the similarity of users on their tastes, interests, and opinions, which matches the underlying assumption of Collaborative Filtering (CF) algorithms. In this paper, we propose a framework to capture the features from user's off-topic content information in social media and introduce them into Matrix Factorization (MF) based algorithms. The framework is easy to understand and flexible in different embedding approaches and MF based algorithms. To the best of our knowledge, there is no previous study in which user's off-topic content in other platforms is taken into consideration. By capturing the cross-platform content including both on-topic and off-topic information, multiple algorithms with several embedding learning approaches have achieved significant improvements in rating prediction on three datasets. Especially in cold start scenarios, we observe greater enhancement. The results confirm our suggestion that off-topic cross-media information also contributes to the recommendation.
