Author name cluster

Yi Cheng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers

2 author rows

AAAI Conference 2026 Conference Paper

SegMem-RAG: Adaptive Memory for Retrieval-Augmented Generation in Open-Ended Knowledge Environments

Xuanbo Fan
Tianqi Zhao
Yi Cheng
Chi Xiu
Jiaxin Guo
Boci Peng
Bingjing Xu
Jessica Zhang

Retrieval-Augmented Generation (RAG) improves the factual accuracy of large language models by grounding responses in external content. However, most RAG systems assume access to static and well-organized corpora with fixed retrieval logic. In practice, real-world sources are heterogeneous and unlabeled, including user-uploaded documents, manuals, and datasets. Effective access in such settings requires adaptive and self-directed retrieval behavior. We present SegMem-RAG, a memory-augmented RAG framework that learns to route queries across multiple unlabeled corpora based on experience. It incrementally updates a structured memory and uses self-reflection to guide retrieval over time without supervision. Experimental results demonstrate that SegMem-RAG significantly outperforms recent baselines in generation quality on multi-corpus QA tasks.

IROS Conference 2025 Conference Paper

ARC: Robots Adaptive Risk-aware Robust Control via Distributional Reinforcement Learning

Junlong Wu
Yi Cheng
Hang Liu
Houde Liu

Locomotion in robots remains an unsolved challenge, particularly for those with complex structures and dynamic environments. Consequently, the control systems for such robots must place greater emphasis on risk mitigation and safety considerations to ensure reliable and stable operation. Existing studies have explicitly incorporated risk factors into policy training, but lacked the ability to adaptively adjust the risk sensitivity for hazardous environments. This deficiency impacts the agent’s exploration during training and thus fails to select the optimal action. We innovatively introduce Adaptive Risk-aware Control (ARC) policies based on Distributional Reinforcement Learning (Dist. RL), a novel framework that dynamically adjusts risk sensitivity levels in response to changing environmental conditions. Our approach uniquely integrates two key components: (1) the Inter Quartile Range (IQR) for quantifying intrinsic environmental uncertainty, and (2) Random Network Distillation (RND) for evaluating parameter uncertainty. This dual-mechanism architecture represents a significant advancement in risk assessment methodologies. Simulations conducted on a variety of robots have demonstrated that our method achieves significantly more robust performance compared to other approaches. Furthermore, sim2real validation on a humanoid robot confirms the practical viability of our approach.
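
As a rough illustration of the first component, the sketch below computes the interquartile range of a set of sampled returns, such as the quantile outputs of a distributional critic. The function name `return_iqr`, the sample values, and the use of Python's `statistics.quantiles` are assumptions made for this example and are not taken from the paper; how ARC maps this quantity to a risk-sensitivity level is not shown here.

```python
import statistics
from typing import Sequence


def return_iqr(returns: Sequence[float]) -> float:
    """Interquartile range (Q3 - Q1) of sampled returns."""
    # n=4 yields the three quartile cut points; "inclusive" treats the
    # samples as the whole population rather than a draw from a larger one.
    q1, _median, q3 = statistics.quantiles(returns, n=4, method="inclusive")
    return q3 - q1


# Hypothetical return samples for a calm and a hazardous state.
calm = [0.0, 1.0, 2.0, 3.0, 4.0]
hazardous = [0.0, 10.0, 20.0, 30.0, 40.0]

print(return_iqr(calm))       # spread of the narrow distribution
print(return_iqr(hazardous))  # larger spread, i.e. more intrinsic uncertainty
```

A wider return distribution gives a larger IQR, which is the sense in which the IQR can serve as a proxy for intrinsic environmental uncertainty.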

IROS Conference 2025 Conference Paper

CushionCatch: A Compliant Catching Mechanism for Mobile Manipulators via Combined Optimization and Learning

Bingjie Chen
Keyu Fan
Qi Yang
Yi Cheng
Houde Liu
Kangkang Dong
Chongkun Xia
Liang Han

Catching flying objects with a cushioning process is a skill commonly performed by humans, yet it remains a significant challenge for robots. In this paper, we present a framework that combines optimization and learning to achieve compliant catching on mobile manipulators (CCMM). First, we propose a high-level capture planner for mobile manipulators (MM) that calculates the optimal capture point and joint configuration. Next, the pre-catching (PRC) planner ensures the robot reaches the target joint configuration as quickly as possible. To learn compliant catching strategies, we propose a network that leverages the strengths of LSTM for capturing temporal dependencies and positional encoding for spatial context (P-LSTM). This network is designed to effectively learn compliant strategies from human demonstrations. Following this, the post-catching (POC) planner tracks the compliant sequence output by the P-LSTM while avoiding potential collisions due to structural differences between humans and robots. We validate the CCMM framework through both simulated and real-world ball-catching scenarios, achieving a success rate of 98.70% in simulation, 92.59% in real-world tests, and a 28.7% reduction in impact torques. The open source code will be released for the reference of the community.

ICLR Conference 2025 Conference Paper

Integrative Decoding: Improving Factuality via Implicit Self-consistency

Yi Cheng
Xiao Liang
Yeyun Gong
Wen Xiao
Song Wang
Yuji Zhang 0002
Wenjun Hou
Kaishuai Xu

Self-consistency-based approaches, which involve repeatedly sampling multiple outputs and selecting the most consistent one as the final response, prove to be remarkably effective in improving the factual accuracy of large language models. Nonetheless, existing methods usually have strict constraints on the task format, largely limiting their applicability. In this paper, we present Integrative Decoding (ID), to unlock the potential of self-consistency in open-ended generation tasks. ID operates by constructing a set of inputs, each prepended with a previously sampled response, and then processes them concurrently, with the next token being selected by aggregating all their corresponding predictions at each decoding step. In essence, this simple approach implicitly incorporates self-consistency in the decoding objective. Extensive evaluation shows that ID consistently enhances factuality over a wide range of language models, with substantial improvements on the TruthfulQA (+11.2%), Biographies (+15.4%) and LongFact (+8.5%) benchmarks. The performance gains amplify progressively as the number of sampled responses increases, indicating the potential of ID to scale up with repeated sampling.
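
The decoding loop described in this abstract can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: `toy_model` and `VOCAB` are hypothetical stand-ins for a language model and its vocabulary, and summing log-probabilities across inputs is one plausible reading of "aggregating"; the abstract does not specify the aggregation rule.

```python
import math
from typing import Callable, List, Sequence

VOCAB = ["Paris", "Lyon", "<eos>"]  # hypothetical toy vocabulary


def toy_model(sampled_response: str, prefix: Sequence[str]) -> List[float]:
    """Hypothetical stand-in for an LLM: log-probs over VOCAB given one
    previously sampled response (the prepended context) and the prefix so far."""
    if prefix:
        # After one generated token, strongly prefer to stop.
        probs = [0.05, 0.05, 0.90]
    else:
        # First token: lean towards the city named in the sampled response.
        probs = [0.7 if tok == sampled_response else 0.2 for tok in VOCAB[:2]] + [0.1]
    return [math.log(p) for p in probs]


def integrative_decode(
    sampled: Sequence[str],
    model: Callable[[str, Sequence[str]], List[float]],
    max_steps: int = 5,
) -> List[str]:
    """Greedy decoding where each step aggregates predictions from all inputs."""
    prefix: List[str] = []
    for _ in range(max_steps):
        # One prediction per input; each input is conditioned on a different sample.
        rows = [model(resp, prefix) for resp in sampled]
        # Assumed aggregation rule: sum the log-probs token by token.
        totals = [sum(col) for col in zip(*rows)]
        best = max(range(len(totals)), key=totals.__getitem__)
        if VOCAB[best] == "<eos>":
            break
        prefix.append(VOCAB[best])
    return prefix


print(integrative_decode(["Paris", "Paris", "Lyon"], toy_model))
```

In this toy setup the token supported by the majority of sampled responses wins the aggregate, which is how the procedure folds self-consistency into the decoding step itself.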

IROS Conference 2025 Conference Paper

KD-RIEKF: Kinodynamic Right-Invariant EKF for Legged Robot State Estimation

Qi Yang
Bin Lan
Bingjie Chen
Jingjing Wang
Yi Cheng
Yizhe Li
Houde Liu
Bin Liang

We present KD-RIEKF, a novel state estimation framework that incorporates kinodynamic constraints into the Right-Invariant Extended Kalman Filter (RIEKF). Our framework integrates generalized momentum-based contact estimation, centroidal dynamics, and a noise-adaptive module, improving state estimation accuracy by probabilistically adjusting propagation noise to account for contact uncertainty and sensor noise. A key innovation is the expansion of the ground reaction force (GRF) into a state variable. By using GRF-based acceleration as a measurement, our method significantly reduces estimation errors in position, velocity, and orientation. The integration of contact-force-driven adaptive noise effectively boosts the stability of estimation, especially when the system is undergoing turning, acceleration, or deceleration processes. We validated our algorithm in simulation on highly uneven terrain, showing significant enhancements in z-axis position estimation compared to RIEKF. Further experiments on the Unitree Go2 robot across different speeds demonstrated that even in high-speed scenarios over 200 meters, our method reduced position estimation relative error (RE) by 47% and orientation estimation by 42%, confirming its robustness and accuracy under dynamic locomotion.

ICRA Conference 2025 Conference Paper

Tension Dependent Twisted String Actuator Modelling and Efficacy Benchmarking in Force and Impedance Control

Christopher Herneth
Yi Cheng
Amartya Ganguly
Sami Haddadin

This study presents a comprehensive experimental analysis of Twisted String Actuators (TSA), focused on enhancing contraction modelling accuracy and establishing a baseline for TSA tension and impedance control efficacy. A novel TSA string radius function is introduced, computing effective radii for multi-strand bundles based on axial actuator tension. The proposed model was validated in physical experiments, resulting in a reduction of maximal errors between measured and simulated actuator contraction trajectories from up to 60% in established models to around 10% in our work. Additionally, the tension-dependent radius modification effectively reduced errors between the estimated and the measured bundle tension by an order of magnitude, marking an essential step towards TSA control independent of bundle tension measurements. TSA tension control was assessed based on four metrics: accuracy, precision, impact stability, and bandwidth, following ISO 9283:1998 standards. The quality of tension control was found to be dependent on bundle tension, twisting angle and strand quantity, whereas impact stability was maintained in all configurations. Joint impedance control with TSA was evaluated for perturbation stability and position control bandwidth, where the latter was enhanced with increasing joint stiffness. The presented analysis informs designers about the capabilities of TSAs in different configurations, and their respective suitability for desired applications.

AAAI Conference 2024 Conference Paper

Arithmetic Feature Interaction Is Necessary for Deep Tabular Learning

Yi Cheng
Renjun Hu
Haochao Ying
Xing Shi
Jian Wu
Wei Lin

Until recently, the question of the effective inductive bias of deep models on tabular data has remained unanswered. This paper investigates the hypothesis that arithmetic feature interaction is necessary for deep tabular learning. To test this point, we create a synthetic tabular dataset with a mild feature interaction assumption and examine a modified transformer architecture enabling arithmetical feature interactions, referred to as AMFormer. Results show that AMFormer outperforms strong counterparts in fine-grained tabular data modeling, data efficiency in training, and generalization. This is attributed to its parallel additive and multiplicative attention operators and prompt-based optimization, which facilitate the separation of tabular samples in an extended space with arithmetically-engineered features. Our extensive experiments on real-world data also validate the consistent effectiveness, efficiency, and rationale of AMFormer, suggesting it has established a strong inductive bias for deep learning on tabular data. Code is available at https://github.com/aigc-apps/AMFormer.

AAAI Conference 2024 Conference Paper

Cooper: Coordinating Specialized Agents towards a Complex Dialogue Goal

Yi Cheng
Wenge Liu
Jian Wang
Chak Tou Leong
Yi Ouyang
Wenjie Li
Xian Wu
Yefeng Zheng

In recent years, there has been a growing interest in exploring dialogues with more complex goals, such as negotiation, persuasion, and emotional support, which go beyond traditional service-focused dialogue systems. Apart from the requirement for much more sophisticated strategic reasoning and communication skills, a significant challenge of these tasks lies in the difficulty of objectively measuring the achievement of their goals in a quantifiable way, making it difficult for existing research to directly optimize the dialogue procedure towards them. In our work, we emphasize the multifaceted nature of complex dialogue goals and argue that it is more feasible to accomplish them by comprehensively considering and jointly promoting their different aspects. To this end, we propose a novel dialogue framework, Cooper, which coordinates multiple specialized agents, each dedicated to a specific dialogue goal aspect separately, to approach the complex objective. Through this divide-and-conquer manner, we make complex dialogue goals more approachable and elicit greater intelligence via the collaboration of individual agents. Experiments on persuasion and emotional support dialogues demonstrate the superiority of our method over a set of competitive baselines. Our codes are available at https://github.com/YiCheng98/Cooper.

JBHI Journal 2024 Journal Article

Polygonal Approximation Learning for Convex Object Segmentation in Biomedical Images With Bounding Box Supervision

Wenhao Zheng
Jintai Chen
Kai Zhang
Jiahuan Yan
Jinhong Wang
Yi Cheng
Bang Du
Danny Z. Chen

As a common and critical medical image analysis task, deep learning based biomedical image segmentation is hindered by the dependence on costly fine-grained annotations. To alleviate this data dependence, in this article, a novel approach, called Polygonal Approximation Learning (PAL), is proposed for convex object instance segmentation with only bounding-box supervision. The key idea behind PAL is that the detection model for convex objects already contains the necessary information for segmenting them since their convex hulls, which can be generated approximately by the intersection of bounding boxes, are equivalent to the masks representing the objects. To extract the essential information from the detection model, a repeated detection approach is employed on biomedical images where various rotation angles are applied and a dice loss with the projection of the rotated detection results is utilized as a supervised signal in training our segmentation model. In biomedical imaging tasks involving convex objects, such as nuclei instance segmentation, PAL outperforms the known models (e.g., BoxInst) that rely solely on box supervision. Furthermore, PAL achieves comparable performance with mask-supervised models including Mask R-CNN and Cascade Mask R-CNN. Interestingly, PAL also demonstrates remarkable performance on non-convex object instance segmentation tasks, for example, surgical instrument and organ instance segmentation.
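
The geometric idea, that intersecting bounding boxes taken at several rotation angles approximates a convex object's hull, can be illustrated with a small sketch. It is a simplification under stated assumptions: the boxes here are computed directly from the object's own points rather than predicted by a detector on rotated images, and the helper names `rotated_extent` and `inside_box_intersection` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def rotated_extent(points: Sequence[Point], angle: float) -> Tuple[float, float, float, float]:
    """Bounding box of the points in a frame rotated by `angle` (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    us = [c * x + s * y for x, y in points]
    vs = [-s * x + c * y for x, y in points]
    return min(us), max(us), min(vs), max(vs)


def inside_box_intersection(
    p: Point, points: Sequence[Point], angles: Sequence[float], eps: float = 1e-9
) -> bool:
    """True if p lies inside the object's bounding box at every rotation angle."""
    for angle in angles:
        c, s = math.cos(angle), math.sin(angle)
        u, v = c * p[0] + s * p[1], -s * p[0] + c * p[1]
        u_min, u_max, v_min, v_max = rotated_extent(points, angle)
        if not (u_min - eps <= u <= u_max + eps and v_min - eps <= v <= v_max + eps):
            return False
    return True


# A diamond-shaped convex object; (0.9, 0.9) lies outside it but inside
# its axis-aligned bounding box.
diamond = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
corner = (0.9, 0.9)

print(inside_box_intersection(corner, diamond, [0.0]))               # one box only
print(inside_box_intersection(corner, diamond, [0.0, math.pi / 4]))  # two rotated boxes
```

With only the axis-aligned box the corner point is wrongly accepted; adding the box at 45 degrees excludes it, and more angles tighten the polygon further towards the convex hull.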

IROS Conference 2024 Conference Paper

Quadruped robot traversing 3D complex environments with limited perception

Yi Cheng
Hang Liu
Guoping Pan
Houde Liu
Linqi Ye

Traversing 3-D complex environments has always been a significant challenge for legged locomotion. Existing methods typically rely on external sensors such as vision and lidar to preemptively react to obstacles by acquiring environmental information. However, in scenarios like nighttime or dense forests, external sensors often fail to function properly, necessitating robots to rely on proprioceptive sensors to perceive diverse obstacles in the environment and respond promptly. This task is undeniably challenging. Our research finds that methods based on collision detection can enhance a robot’s perception of environmental obstacles. In this work, we propose an end-to-end learning-based quadruped robot motion controller that relies solely on proprioceptive sensing. This controller can accurately detect, localize, and agilely respond to collisions in unknown and complex 3D environments, thereby improving the robot’s traversability in complex environments. We demonstrate in both simulation and real-world experiments that our method enables quadruped robots to successfully traverse challenging obstacles in various complex environments. The videos and appendix can be found at Quad-Traverse-Go2.github.io

IROS Conference 2024 Conference Paper

Structural Optimization of Lightweight Bipedal Robot via SERL

Yi Cheng
Chenxi Han
Yuheng Min
Houde Liu
Linqi Ye
Hang Liu

Designing a bipedal robot is a complex and challenging task, especially when dealing with a multitude of structural parameters. Traditional design methods often rely on human intuition and experience. However, such approaches are time-consuming and labor-intensive, lack theoretical guidance, and struggle to reach optimal designs within vast design spaces, thus failing to fully exploit the inherent performance potential of robots. In this context, this paper introduces the SERL (Structure Evolution Reinforcement Learning) algorithm, which combines reinforcement learning for locomotion tasks with evolution algorithms. The aim is to identify the optimal parameter combinations within a given multidimensional design space. Through the SERL algorithm, we successfully designed a bipedal robot named Wow Orin, whose optimal leg lengths are obtained through optimization based on body structure and motor torque. We have experimentally validated the effectiveness of the SERL algorithm, which is capable of optimizing the best structure within specified design space and task conditions. Additionally, to assess the performance gap between our designed robot and the current state-of-the-art robots, we compared Wow Orin with mainstream bipedal robots Cassie and Unitree H1. A series of experimental results demonstrate the outstanding energy efficiency and performance of Wow Orin, further validating the feasibility of applying the SERL algorithm to practical design.

IJCAI Conference 2023 Conference Paper

Robust Image Ordinal Regression with Controllable Image Generation

Yi Cheng
Haochao Ying
Renjun Hu
Jinhong Wang
Wenhao Zheng
Xiao Zhang
Danny Chen
Jian Wu

Image ordinal regression has been mainly studied along the line of exploiting the order of categories. However, the issues of class imbalance and category overlap that are very common in ordinal regression were largely overlooked. As a result, the performance on minority categories is often unsatisfactory. In this paper, we propose a novel framework called CIG based on controllable image generation to directly tackle these two issues. Our main idea is to generate extra training samples with specific labels near category boundaries, and the sample generation is biased toward the less-represented categories. To achieve controllable image generation, we seek to separate structural and categorical information of images based on structural similarity, categorical similarity, and reconstruction constraints. We evaluate the effectiveness of our new CIG approach in three different image ordinal regression scenarios. The results demonstrate that CIG can be flexibly integrated with off-the-shelf image encoders or ordinal regression models to achieve improvement, and further, the improvement is more significant for minority categories.

IJCAI Conference 2022 Conference Paper

“My nose is running.” “Are you also coughing?”: Building A Medical Diagnosis Agent with Interpretable Inquiry Logics

Wenge Liu
Yi Cheng
Hao Wang
Jianheng Tang
Yafei Liu
Ruihui Zhao
Wenjie Li
Yefeng Zheng

With the rise of telemedicine, the task of developing Dialogue Systems for Medical Diagnosis (DSMD) has received much attention in recent years. Different from early researches that needed to rely on extra human resources and expertise to build the system, recent researches focused on how to build DSMD in a data-driven manner. However, the previous data-driven DSMD methods largely overlooked the system interpretability, which is critical for a medical application, and they also suffered from the data sparsity issue at the same time. In this paper, we explore how to bring interpretability to data-driven DSMD. Specifically, we propose a more interpretable decision process to implement the dialogue manager of DSMD by reasonably mimicking real doctors' inquiry logics, and we devise a model with highly transparent components to conduct the inference. Moreover, we collect a new DSMD dataset, which has a much larger scale, more diverse patterns, and is of higher quality than the existing ones. The experiments show that our method obtains 7.7%, 10.0%, 3.0% absolute improvement in diagnosis accuracy respectively on three datasets, demonstrating the effectiveness of its rational decision process and model design. Our codes and the GMD-12 dataset are available at https://github.com/lwgkzl/BR-Agent.

AAAI Conference 2019 Conference Paper

Look across Elapse: Disentangled Representation Learning and Photorealistic Cross-Age Face Synthesis for Age-Invariant Face Recognition

Jian Zhao
Yu Cheng
Yi Cheng
Yang Yang
Fang Zhao
Jianshu Li
Hengzhu Liu
Shuicheng Yan

Despite the remarkable progress in face recognition related technologies, reliably recognizing faces across ages still remains a big challenge. The appearance of a human face changes substantially over time, resulting in significant intraclass variations. As opposed to current techniques for age-invariant face recognition, which either directly extract age-invariant features for recognition, or first synthesize a face that matches target age before feature extraction, we argue that it is more desirable to perform both tasks jointly so that they can leverage each other. To this end, we propose a deep Age-Invariant Model (AIM) for face recognition in the wild with three distinct novelties. First, AIM presents a novel unified deep architecture jointly performing cross-age face synthesis and recognition in a mutual boosting way. Second, AIM achieves continuous face rejuvenation/aging with remarkable photorealistic and identity-preserving properties, avoiding the requirement of paired data and the true age of testing samples. Third, we develop effective and novel training strategies for end-to-end learning the whole deep architecture, which generates powerful age-invariant face representations explicitly disentangled from the age variation. Extensive experiments on several cross-age datasets (MORPH, CACD and FG-NET) demonstrate the superiority of the proposed AIM model over the state-of-the-arts. Benchmarking our model on one of the most popular unconstrained face recognition datasets IJB-C additionally verifies the promising generalizability of AIM in recognizing faces in the wild.

IJCAI Conference 2018 Conference Paper

3D-Aided Deep Pose-Invariant Face Recognition

Jian Zhao
Lin Xiong
Yu Cheng
Yi Cheng
Jianshu Li
Li Zhou
Yan Xu
Jayashree Karlekar

Learning from synthetic faces, though perhaps appealing for high data efficiency, may not bring satisfactory performance due to the distribution discrepancy of the synthetic and real face images. To mitigate this gap, we propose a 3D-Aided Deep Pose-Invariant Face Recognition Model (3D-PIM), which automatically recovers realistic frontal faces from arbitrary poses through a 3D face model in a novel way. Specifically, 3D-PIM incorporates a simulator with the aid of a 3D Morphable Model (3D MM) to obtain shape and appearance prior for accelerating face normalization learning, requiring less training data. It further leverages a global-local Generative Adversarial Network (GAN) with multiple critical improvements as a refiner to enhance the realism of both global structures and local details of the face simulator’s output using unlabelled real data only, while preserving the identity information. Qualitative and quantitative experiments on both controlled and in-the-wild benchmarks clearly demonstrate superiority of the proposed model over state-of-the-arts.
