Author name cluster

Chenyang Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

DialoGen: Towards Dialog Gesture Generation via Identity-Decoupled Style Guidance in Interactive Diffusion Model

  • Weiyu Zhao
  • Chenyang Wang
  • Liangxiao Hu
  • Zonglin Li
  • Wei Yu
  • Shengping Zhang

We propose DialoGen, a novel framework for generating realistic gestures for both interlocutors in dialog scenarios, conditioned on conversational audio. Unlike most existing methods that focus solely on a single speaker, DialoGen simultaneously generates synchronized gestures for both participants while also embedding identity-decoupled style into the generated gestures, which enhances realism and expressiveness. To ensure precise synchronization between interlocutors, DialoGen adopts an interactive dual-diffusion model with mutual interaction estimation, which integrates interaction correlation into the diffusion process. More importantly, by leveraging supervised contrastive learning, we develop the identity-decoupled style guidance to adaptively decompose the identity-specific style of interlocutors into latent space, enabling multi-style dialog gesture generation. Extensive experimental results demonstrate that our model significantly outperforms existing methods in generating realistic, speech-aligned, identity-specific gestures, offering a high-quality solution for various dialog scenarios.

EAAI Journal 2026 Journal Article

Event-triggered adaptive robust non-singular fast terminal sliding mode fault-tolerant control for intelligent vehicle stability systems under extreme conditions

  • Min Gao
  • Jiaqi Li
  • Jing Li
  • Jin Luo
  • Chenyang Wang

The vehicle stability control system serves as the fundamental guarantee for the active safety of intelligent vehicles. In light of existing challenges such as ensemble uncertainty interference, limited communication resources, and actuator faults, this study presents an event-triggered adaptive robust non-singular fast terminal sliding mode fault-tolerant control method. Adaptive laws are formulated to evaluate the switching gains, thereby circumventing complications associated with insufficient prior knowledge of lumped uncertainties. Subsequently, an event-triggered adaptive robust non-singular fast terminal sliding mode control strategy is introduced to conserve communication resources, mitigate chattering, and prevent singularities. Furthermore, fault factors are incorporated into the vehicle dynamics framework to enhance the torque optimization allocation strategy for fault-tolerant control in the presence of actuator faults. The application of the Lyapunov stability theorem confirms stability over a limited duration, as well as Zeno-free behavior within the vehicle stability control system. Ultimately, CarSim and Matlab/Simulink co-simulation is employed to verify the effectiveness of the proposed method, with numerical simulations conducted across various complex driving conditions. The simulation data indicate that the proposed method reduces the number of triggering events for the yaw rate controller and the side slip angle controller by 67.1%/75.2% and 36.1%/28.2% for Conditions A and B, respectively, when compared to the fixed-time-triggered sliding mode controller and the without-control method.

AAAI Conference 2026 Conference Paper

ForeDiffusion: Foresight-Conditioned Diffusion Policy via Future View Construction for Robot Manipulation

  • Weize Xie
  • Yi Ding
  • Ying He
  • Leilei Wang
  • Binwen Bai
  • Zheyi Zhao
  • Chenyang Wang
  • F. Richard Yu

Diffusion strategies have advanced visual motor control by progressively denoising high-dimensional action sequences, providing a promising method for robot manipulation. However, as task complexity increases, the success rate of existing baseline models decreases considerably. Analysis indicates that current diffusion strategies face two limitations. First, these strategies rely only on short-term observations as conditions. Second, the training objective remains limited to a single denoising loss, which leads to error accumulation and causes grasping deviations. To address these limitations, this paper proposes Foresight-Conditioned Diffusion (ForeDiffusion), which injects the predicted future view representation into the diffusion process. As a result, the policy is guided to be forward-looking, enabling it to correct trajectory deviations. Following this design, ForeDiffusion employs a dual loss mechanism, combining the traditional denoising loss and a consistency loss on future observations, to achieve unified optimization. Extensive evaluation on the Adroit suite and the MetaWorld benchmark demonstrates that ForeDiffusion achieves an average success rate of 80% over all tasks, significantly outperforming the existing mainstream diffusion methods by approximately 20% on high-difficulty tasks, while maintaining more stable performance across all tasks.

AAAI Conference 2026 Conference Paper

Light-IF: Endowing LLMs with Generalizable Reasoning via Preview and Self-Checking for Complex Instruction Following

  • Chenyang Wang
  • Liang Wen
  • Shousheng Jia
  • Xiangzheng Zhang
  • Liang Xu

While advancements in the reasoning abilities of LLMs have significantly enhanced their performance in solving mathematical problems, coding tasks, and general puzzles, their effectiveness in accurately adhering to instructions remains inconsistent, particularly with more complex directives. Our investigation identifies lazy reasoning during the thinking stage as the primary factor contributing to poor instruction adherence. To mitigate this issue, we propose a comprehensive framework designed to enable rigorous reasoning processes involving preview and self-checking, essential for satisfying strict instruction constraints. Specifically, we first generate instructions with complex constraints and apply a filtering process to obtain valid prompts, resulting in three distinct prompt datasets categorized as hard, easy, and pass. Then, we employ rejection sampling on the pass prompts to curate a small yet high-quality dataset, enabling a cold-start initialization of the model and facilitating its adaptation to effective reasoning patterns. Subsequently, we employ an entropy-preserving supervised fine-tuning (Entropy-SFT) strategy coupled with token-wise entropy-adaptive reinforcement learning (TEA-RL) guided by rule-based dense rewards. This approach encourages the model to transform its reasoning mechanism, ultimately fostering generalizable reasoning abilities that encompass preview and self-checking. Extensive experiments conducted on instruction-following benchmarks demonstrate remarkable performance improvements across various model scales.

IROS Conference 2025 Conference Paper

A Robust Distributed Odometry for Mobile Robots with Steerable Wheels

  • Wang Xi
  • Jiaming Guo
  • Chenyang Wang
  • Shukun Wu
  • Jianping He

Odometry estimation remains a critical challenge for wheeled robots, as reducing its drift directly mitigates dependency on external localization systems. This paper proposes a distributed odometry framework for steerable wheels, named ICF-DO, which is applicable to both Steerable Wheeled Mobile Robots (SWMRs) and cooperative multi-single-wheel robot systems. The proposed method features low computational complexity and reduced drift, while demonstrating strong robustness in communication-restricted scenarios. Additionally, singularity can be processed in a distributed manner in the proposed framework. Experimental validation on a real physical SWMR platform demonstrates the effectiveness and practicality of the proposed method.

IROS Conference 2025 Conference Paper

A Two-Stage Lightweight Framework for Efficient Land-Air Bimodal Robot Autonomous Navigation

  • Yongjie Li
  • Zhou Liu
  • Wenshuai Yu
  • Zhangji Lu
  • Chenyang Wang
  • Fei Yu
  • Qingquan Li

Land-air bimodal robots (LABR) are gaining attention for autonomous navigation, combining high mobility from aerial vehicles with long endurance from ground vehicles. However, existing LABR navigation methods are limited by suboptimal trajectories from mapping-based approaches and the excessive computational demands of learning-based methods. To address this, we propose a two-stage lightweight framework that integrates global key point prediction with local trajectory refinement to generate efficient and reachable trajectories. In the first stage, the Global Key points Prediction Network (GKPN) is used to generate a hybrid land-air key point path. The GKPN includes a Sobel Perception Network (SPN) for improved obstacle detection and a Lightweight Attention Planning Network (LAPN) that improves predictive ability by capturing contextual information. In the second stage, the global path is segmented based on the predicted key points and refined using a mapping-based planner to create smooth, collision-free trajectories. Experiments conducted on our LABR platform show that our framework reduces network parameters by 14% and energy consumption during land-air transitions by 35% compared to existing approaches. The framework achieves real-time navigation without GPU acceleration and enables zero-shot transfer from simulation to reality during deployment.

IJCAI Conference 2025 Conference Paper

Active Multimodal Distillation for Few-shot Action Recognition

  • Weijia Feng
  • Yichen Zhu
  • Ruojia Zhang
  • Chenyang Wang
  • Fei Ma
  • Xiaobao Wang
  • Xiaobai Li

Owing to its rapid progress and broad application prospects, few-shot action recognition has attracted considerable interest. However, current methods are predominantly based on limited single-modal data, which does not fully exploit the potential of multimodal information. This paper presents a novel framework that actively identifies reliable modalities for each sample using task-specific contextual cues, thus significantly improving recognition performance. Our framework integrates an Active Sample Inference (ASI) module, which utilizes active inference to predict reliable modalities based on posterior distributions and subsequently organizes them accordingly. Unlike reinforcement learning, active inference replaces rewards with evidence-based preferences, making more stable predictions. Additionally, we introduce an active mutual distillation module that enhances the representation learning of less reliable modalities by transferring knowledge from more reliable ones. Adaptive multimodal inference is employed during the meta-test to assign higher weights to reliable modalities. Extensive experiments across multiple benchmarks demonstrate that our method significantly outperforms existing approaches.

IJCAI Conference 2025 Conference Paper

Integration of Old and New Knowledge for Generalized Intent Discovery: A Consistency-driven Prototype-Prompting Framework

  • Xiao Wei
  • Xiaobao Wang
  • Ning Zhuang
  • Chenyang Wang
  • Longbiao Wang
  • Jianwu Dang

Intent detection aims to identify user intents from natural language inputs, where supervised methods rely heavily on labeled in-domain (IND) data and struggle with out-of-domain (OOD) intents, limiting their practical applicability. Generalized Intent Discovery (GID) addresses this by leveraging unlabeled OOD data to discover new intents without additional annotation. However, existing methods focus solely on clustering unsupervised data while neglecting domain adaptation. Therefore, we propose a consistency-driven prototype-prompting framework for GID from the perspective of integrating old and new knowledge, which includes a prototype-prompting framework for transferring old knowledge from external sources, and a hierarchical consistency constraint for learning new knowledge from target domains. We conducted extensive experiments and the results show that our method significantly outperforms all baseline methods, achieving state-of-the-art results, which strongly demonstrates the effectiveness and generalization of our methods. Our source code is publicly available at https://github.com/smileix/cpp.

NeurIPS Conference 2025 Conference Paper

PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models

  • Shi Qiu
  • Shaoyang Guo
  • Zhuo-Yang Song
  • Yunbo Sun
  • Zeyu Cai
  • Jiashen Wei
  • Tianyu Luo
  • Yixuan Yin

Current benchmarks for evaluating the reasoning capabilities of Large Language Models (LLMs) face significant limitations: task oversimplification, data contamination, and flawed evaluation items. These deficiencies necessitate more rigorous assessment methods. To address these limitations, we introduce PHYBench, a benchmark of 500 original physics problems ranging from high school to Physics Olympiad difficulty. PHYBench addresses data contamination through original content and employs a systematic curation pipeline to eliminate flawed items. Evaluations show that PHYBench activates more tokens and provides stronger differentiation between reasoning models compared to other baselines like AIME 2024, OlympiadBench and GPQA. Even the best-performing model, Gemini 2.5 Pro, achieves only 36.9% accuracy compared to human experts' 61.9%. To further enhance evaluation precision, we introduce the Expression Edit Distance (EED) Score for mathematical expression assessment, which improves sample efficiency by 204% over binary scoring. Moreover, PHYBench effectively elicits multi-step and multi-condition reasoning, providing a platform for examining models' reasoning robustness, preferences, and deficiencies. The benchmark results and dataset are publicly available at https://www.phybench.cn/.

IROS Conference 2024 Conference Paper

Adaptive Stochastic Nonlinear Model Predictive Control with Look-ahead Deep Reinforcement Learning for Autonomous Vehicle Motion Control

  • Baha Zarrouki
  • Chenyang Wang
  • Johannes Betz

Propagating uncertainties through nonlinear system dynamics in the context of Stochastic Nonlinear Model Predictive Control (SNMPC) is challenging, especially for high-dimensional systems requiring real-time control and operating under time-variant uncertainties such as autonomous vehicles. In this work, we propose an Adaptive SNMPC (aSNMPC) driven by Deep Reinforcement Learning (DRL) to optimize uncertainty handling, constraints robustification, feasibility, and closed-loop performance. To this end, our SNMPC uses Polynomial Chaos Expansion (PCE) for efficient uncertainty propagation, limits its propagation time through an Uncertainty Propagation Horizon (UPH), and transforms nonlinear chance constraints into robustified deterministic ones. We conceive a DRL agent to proactively anticipate upcoming control tasks and to dynamically reduce conservatism by determining the most suitable constraints robustification factor κ, and to enhance feasibility by choosing optimal UPH length T_u. We analyze the trained DRL agent’s decision-making process and highlight its ability to learn context-dependent optimal parameters. We showcase the enhanced robustness and feasibility of our DRL-driven aSNMPC through the real-time motion control task of an autonomous passenger vehicle when confronted with significant time-variant disturbances while achieving a minimum solution frequency of 110 Hz. The code used in this research is publicly accessible as open-source software: https://github.com/bzarr/TUM-CONTROL

AAAI Conference 2024 Conference Paper

Low-Light Face Super-resolution via Illumination, Structure, and Texture Associated Representation

  • Chenyang Wang
  • Junjun Jiang
  • Kui Jiang
  • Xianming Liu

Capturing human faces at night or in dimly lit environments has become common practice, and the resulting images suffer from complex low-light and low-resolution degradations. However, the existing face super-resolution (FSR) technologies and derived cascaded schemes are inadequate to recover credible textures. In this paper, we propose a novel approach that decomposes the restoration task into face structural fidelity maintenance and texture consistency learning. The former aims to enhance the quality of face images while improving the structural fidelity, while the latter focuses on eliminating perturbations and artifacts caused by low-light degradation and reconstruction. Based on this, we develop a novel low-light low-resolution face super-resolution framework. Our method consists of two steps: an illumination correction face super-resolution network (IC-FSRNet) for lighting the face and recovering the structural information, and a detail enhancement model (DENet) for improving facial details, thus making them more visually appealing and easier to analyze. As the relighted regions could provide complementary information to boost face super-resolution and vice versa, we introduce mutual learning to harness the informative components from relighted regions and reconstruction, and achieve iterative refinement. In addition, DENet, equipped with a diffusion probabilistic model, is built to further improve face image visual quality. Experiments demonstrate that the proposed joint optimization framework achieves significant improvements in reconstruction quality and perceptual quality over existing two-stage sequential solutions. Code is available at https://github.com/wcy-cs/IC-FSRDENet.

EAAI Journal 2023 Journal Article

A systematic empirical study on word embedding based methods in discovering Chinese black keywords

  • Chenyang Wang
  • Yi Shen
  • Yuwei Li
  • Min Zhang
  • Miao Hu
  • Jinghua Zheng

With the development of online transactions, the Chinese cyber black market is proliferating and facilitates many cybercrimes. It is difficult to understand the cyber black market due to the confusing jargon (called black keywords in this paper) used by criminals to conceal underground transactions. To discover black keywords automatically, some natural language processing based methods have been proposed that compare the similarity of word vectors generated by word embedding models. Therefore, the quality of the generated word vectors has a significant impact on black keyword discovery, and it is necessary to evaluate different word embedding models in discovering black keywords. To this end, we design a Chinese black keyword discovery framework and conduct a systematic empirical study on six existing word embedding models, including both static and dynamic types, in discovering Chinese black keywords. Specifically, we classify Chinese black keywords into four types: domain specific words (DSWs), new meaning words (NMWs), similar pronunciation words (SPWs), and similar glyph words (SGWs). We experimentally find that different word embedding models vary greatly in performance when discovering black keywords, e.g., dynamic models perform well in discovering DSWs and NMWs, while static ones perform poorly in discovering NMWs. We improve the static word embedding model based NMW discovery algorithm by additionally comparing the differences in cross-corpus word nearest-neighbors before and after domain incremental training. For effectively discovering variant words like SPWs and SGWs, we additionally introduce Chinese pronunciation and glyph features. The experimental results demonstrate the effectiveness of the proposed Chinese black keyword discovery framework, with detection accuracies of over 90% for DSWs, 80% for NMWs, 90% for SPWs, and 61% for SGWs.

AAAI Conference 2021 Conference Paper

Graph Heterogeneous Multi-Relational Recommendation

  • Chong Chen
  • Weizhi Ma
  • Min Zhang
  • Zhaowei Wang
  • Xiuqiang He
  • Chenyang Wang
  • Yiqun Liu
  • Shaoping Ma

Traditional studies on recommender systems usually leverage only one type of user behavior (the optimization target, such as purchase), despite the fact that users also generate a large number of various types of interaction data (e.g., view, click, add-to-cart, etc.). Generally, these heterogeneous multi-relational data provide well-structured information and can be used for high-quality recommendation. Early efforts towards leveraging these heterogeneous data fail to capture the high-hop structure of user-item interactions, so they cannot make full use of the data and may only achieve constrained recommendation performance. In this work, we propose a new multi-relational recommendation model named Graph Heterogeneous Collaborative Filtering (GHCF). To explore the high-hop heterogeneous user-item interactions, we take advantage of the Graph Convolutional Network (GCN) and further improve it to jointly embed both representations of nodes (users and items) and relations for multi-relational prediction. Moreover, to fully utilize the whole heterogeneous data, we perform the advanced efficient non-sampling optimization under a multi-task learning framework. Experimental results on two public benchmarks show that GHCF significantly outperforms the state-of-the-art recommendation methods, especially for cold-start users who have few primary item interactions. Further analysis verifies the importance of the proposed embedding propagation for modelling high-hop heterogeneous user-item interactions, showing the rationality and effectiveness of GHCF. Our implementation has been released (https://github.com/chenchongthu/GHCF).

ICRA Conference 2021 Conference Paper

Task Autocorrection for Immersive Teleoperation

  • Chenyang Wang
  • Simon Huber
  • Stelian Coros
  • Roi Poranne

Teleoperating robotic arms is a challenging task that requires years of training to master. It is mentally demanding, as the operator must internally compute transformations, or rely on muscle memory, to perform even the simplest tasks. Alternative methods that rely on embodiment, the immersive, first-person experience of controlling the robot from its point of view, have recently become more popular, thanks to the emergence of mixed reality devices. These methods create an intuitive experience by tracking the user's motions and retargeting them to the robot. However, even recent hardware fails at achieving total immersion, due to inherent discrepancies such as latency, imperfect tracking, and the differences between human and robot motor systems. Thus, performing even simple pick-and-place tasks with these systems, while more intuitive, is still cumbersome, and far from the level of human performance. In this paper we propose an immersive system that aims to bridge this gap. The system tracks the user’s motion and retargets it to the robot as usual, but it also detects the user’s intent, that is, the task they wish to perform. Based on this knowledge, the system can autocorrect the motion when it is about to fail, in a seamless manner, such that the task is successfully performed. We evaluate the efficacy of our autocorrection system in a user study. The results show a statistically significant performance improvement in terms of operation accuracy and time.

AAAI Conference 2020 Conference Paper

D2D-LSTM: LSTM-Based Path Prediction of Content Diffusion Tree in Device-to-Device Social Networks

  • Heng Zhang
  • Xiaofei Wang
  • Jiawen Chen
  • Chenyang Wang
  • Jianxin Li

With the proliferation of mobile device users, Device-to-Device (D2D) communication has ascended to the spotlight in social networks for users to share and exchange enormous amounts of data. Different from classic online social networks (OSNs) like Twitter and Facebook, each single data file to be shared in the D2D social network is often very large in data size, e.g., video, image or document. Sometimes, a small number of interesting data files may dominate the network traffic and lead to heavy network congestion. To reduce the traffic congestion and design an effective caching strategy, it is highly desirable to investigate how the data files are propagated in offline D2D social networks and derive a diffusion model that fits this new form of social network. However, existing works mainly concern link prediction, which cannot predict the overall diffusion path when the network topology is unknown. In this article, we propose D2D-LSTM, based on Long Short-Term Memory (LSTM), which aims to predict complete content propagation paths in D2D social networks. Taking the current user’s time, geography and category preference into account, historical features of the previous path can be captured as well. It utilizes prototype users for prediction so as to achieve a better generalization ability. To the best of our knowledge, it is the first attempt to use a real-world large-scale dataset of a mobile social network (MSN) to predict propagation path trees in a top-down order. Experimental results corroborate that the proposed algorithm can achieve better prediction performance than state-of-the-art approaches. Furthermore, D2D-LSTM can achieve 95% average precision for terminal class and 17% accuracy for tree path hit.

IJCAI Conference 2018 Conference Paper

Your Tweets Reveal What You Like: Introducing Cross-media Content Information into Multi-domain Recommendation

  • Weizhi Ma
  • Min Zhang
  • Chenyang Wang
  • Cheng Luo
  • Yiqun Liu
  • Shaoping Ma

Cold start is a challenging problem in recommender systems. Many previous studies attempt to utilize extra information from other platforms to alleviate the problem. Most of the leveraged information is on-topic, directly related to users' preferences in the target domain. Thought to be unrelated, users' off-topic content information (such as user tweets) is usually omitted. However, the off-topic content information also helps to indicate the similarity of users on their tastes, interests, and opinions, which matches the underlying assumption of Collaborative Filtering (CF) algorithms. In this paper, we propose a framework to capture the features from user's off-topic content information in social media and introduce them into Matrix Factorization (MF) based algorithms. The framework is easy to understand and flexible in different embedding approaches and MF based algorithms. To the best of our knowledge, there is no previous study in which user's off-topic content in other platforms is taken into consideration. By capturing the cross-platform content including both on-topic and off-topic information, multiple algorithms with several embedding learning approaches have achieved significant improvements in rating prediction on three datasets. Especially in cold start scenarios, we observe greater enhancement. The results confirm our suggestion that off-topic cross-media information also contributes to the recommendation.