Arrow Research Search

Author name cluster

Hechang Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers
2 author rows

Possible papers

17

AAAI Conference 2026 Conference Paper

Effective Robotic Cloth Grasping Through Suppressing False Discoveries

  • Xingyu Zhu
  • Zhiwen Tu
  • Yan Wu
  • Shan Luo
  • Hechang Chen
  • Yixing Gao

Enabling robots to grasp disorganized cloth for efficient storage is valuable in robot-assisted room organization. Diverse deformations of cloth and the stacking of multiple items limit grasping-pose estimation that relies on annotations. This necessitates segmenting each cloth item in an unsupervised manner before estimating the grasping position. However, existing segmentation methods primarily focus on improving metrics such as Intersection-over-Union and Pixel Accuracy, which cannot effectively measure the segmentation errors of the cloth area and thus lead to failed grasping-position estimation. To address this challenge, we use False Discovery Rate (FDR) as a novel measure of segmentation errors and analyze its impact on grasping success. Our preliminary study reveals a negative correlation between segmentation FDR and grasping success rate, highlighting the need for more reliable segmentation in cluttered cloth scenarios. Therefore, we propose an unsupervised cloth segmentation network based on feature distance-weighted constraints, designed to reduce the false discovery rate in cloth area perception without requiring expensive pixel-level manual annotations. Additionally, to estimate the grasping position on the perceived cloth area, we introduce a strategy based on cloth surface wrinkle analysis, which operates without the need for annotations or training. By integrating the proposed segmentation network and grasping strategy, we develop a robotic system capable of sequentially grasping cluttered cloth from a table. Extensive real-world robotic experiments demonstrate the effectiveness of our approach, which outperforms multiple baseline methods in segmentation FDR and grasping success rate.
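
For context, the FDR used here is the standard quantity from binary classification applied per pixel; a minimal sketch of computing it for a predicted segmentation mask (NumPy, variable names ours) might look like:

```python
import numpy as np

def segmentation_fdr(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """False Discovery Rate of a binary segmentation: FP / (FP + TP).

    A low FDR means few of the pixels predicted as 'cloth' are actually
    background, which matters more for grasp placement than IoU does.
    """
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    return float(fp) / (tp + fp) if (tp + fp) > 0 else 0.0
```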

NeurIPS Conference 2025 Conference Paper

Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning

  • Jifeng Hu
  • Sili Huang
  • Zhejian Yang
  • Shengchao Hu
  • Li Shen
  • Hechang Chen
  • Lichao Sun
  • Yi Chang

Conditional decision generation with diffusion models has shown powerful competitiveness in reinforcement learning (RL). Recent studies reveal the relation between energy-function-guided diffusion models and constrained RL problems. The main challenge lies in estimating the intermediate energy, which is intractable due to the log-expectation formulation that arises during the generation process. To address this issue, we propose Analytic Energy-guided Policy Optimization (AEPO). Specifically, we first provide a theoretical analysis and the closed-form solution of the intermediate guidance when the diffusion model obeys the conditional Gaussian transformation. Then, we analyze the posterior Gaussian distribution in the log-expectation formulation and obtain the target estimate of the log-expectation under mild assumptions. Finally, we train an intermediate energy neural network to approximate this target estimate. We apply our method to more than 30 offline RL tasks to demonstrate its effectiveness. Extensive experiments illustrate that our method surpasses numerous representative baselines on the D4RL offline reinforcement learning benchmarks.
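
For reference, the "log-expectation formulation" in the abstract is, in the standard energy-guided diffusion setting (notation ours, not necessarily the paper's), the intermediate energy:

```latex
% Intermediate energy in energy-guided diffusion sampling (standard form;
% \beta is an inverse-temperature scale; notation is illustrative):
\mathcal{E}_t(\mathbf{x}_t)
  = -\log \, \mathbb{E}_{p(\mathbf{x}_0 \mid \mathbf{x}_t)}
      \left[ e^{-\beta \, \mathcal{E}(\mathbf{x}_0)} \right]
```

The expectation over the denoising posterior p(x_0 | x_t) is what makes this intractable in general; the closed form claimed in the abstract applies when that posterior is conditionally Gaussian.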

ICML Conference 2025 Conference Paper

Calibrating Video Watch-time Predictions with Credible Prototype Alignment

  • Chao Cui
  • Shisong Tang
  • Fan Li 0017
  • Jiechao Gao
  • Hechang Chen

Accurately predicting user watch-time is crucial for enhancing user stickiness and retention in video recommendation systems. Existing watch-time prediction approaches typically transform watch-time labels for prediction and then reverse the transformation, ignoring both the natural distribution properties of the labels and the instance-representation confusion that results in inaccurate predictions. In this paper, we propose ProWTP, a two-stage method combining prototype learning and optimal transport for watch-time regression, suitable for any deep recommendation model. Specifically, we observe that the watch-ratio (the ratio of watch-time to video duration) within the same duration bucket exhibits a multimodal distribution. To facilitate incorporation into models, we use a hierarchical vector quantised variational autoencoder (HVQ-VAE) to convert the continuous label distribution into a high-dimensional discrete distribution, which serves as credible prototypes for calibration. Based on this, ProWTP views the alignment between prototypes and instance representations as a Semi-relaxed Unbalanced Optimal Transport (SUOT) problem in which the marginal constraints on the prototypes are relaxed; the corresponding optimization problem is then reformulated as a weighted Lasso problem and solved. Moreover, ProWTP introduces assignment and compactness losses to encourage instances to cluster closely around their respective prototypes, thereby enhancing prototype-level distinguishability. Finally, we conducted extensive offline experiments on two industrial datasets, demonstrating consistent superiority in real-world applications.
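
A small illustration of the preprocessing the abstract describes, computing the watch-ratio and grouping by duration bucket (values and bucket edges are ours, purely for illustration):

```python
import numpy as np

# Watch-ratio: the quantity the paper observes to be multimodal
# within a duration bucket (toy values, illustrative bucket edges).
watch_time = np.array([12.0, 55.0, 30.0, 8.0])   # seconds watched
duration   = np.array([60.0, 60.0, 30.0, 15.0])  # video length, seconds

watch_ratio = watch_time / duration              # typically in [0, 1]; replays can exceed 1
bucket = np.digitize(duration, bins=[15, 30, 60, 120])  # coarse duration buckets
```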

AAAI Conference 2025 Conference Paper

GRAIN: Multi-Granular and Implicit Information Aggregation Graph Neural Network for Heterophilous Graphs

  • Songwei Zhao
  • Yuan Jiang
  • Zijing Zhang
  • Yang Yu
  • Hechang Chen

Graph neural networks (GNNs) have shown significant success in learning graph representations. However, recent studies reveal that GNNs often fail to outperform simple MLPs on heterophilous graph tasks, where connected nodes may differ in features or labels, challenging the homophily assumption. Existing methods addressing this issue often overlook the importance of information granularity and rarely consider implicit relationships between distant nodes. To overcome these limitations, we propose the Granular and Implicit Graph Network (GRAIN), a novel GNN model specifically designed for heterophilous graphs. GRAIN enhances node embeddings by aggregating multi-view information at various granularity levels and incorporating implicit information from distant, non-neighboring nodes. This approach effectively integrates local and global information, resulting in smoother, more accurate node representations. We also introduce an adaptive graph information aggregator that efficiently combines multi-granularity and implicit information, significantly improving node representation quality, as shown by experiments on 13 datasets covering varying degrees of homophily and heterophily. GRAIN consistently outperforms 12 state-of-the-art models, excelling on both homophilous and heterophilous graphs.
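
As a generic sketch of multi-granularity aggregation (this is the textbook multi-hop view, not GRAIN's actual aggregator, which additionally weights implicit links to distant non-adjacent nodes):

```python
import torch

def multi_hop_features(x: torch.Tensor, adj: torch.Tensor, hops: int = 3) -> torch.Tensor:
    """Collect node features at several granularity levels (0-hop, 1-hop, ...).

    x:   (num_nodes, dim) node features
    adj: (num_nodes, num_nodes) dense adjacency matrix
    """
    # Row-normalize the adjacency so each hop averages over neighbors.
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
    p = adj / deg
    views, h = [x], x
    for _ in range(hops):
        h = p @ h                      # one more hop of propagation
        views.append(h)
    return torch.cat(views, dim=-1)    # let a downstream layer mix the scales
```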

NeurIPS Conference 2025 Conference Paper

InstructFlow: Adaptive Symbolic Constraint-Guided Code Generation for Long-Horizon Planning

  • Haotian Chi
  • Zeyu Feng
  • Yueming LYU
  • Chengqi Zheng
  • Linbo Luo
  • Yew Soon Ong
  • Ivor Tsang
  • Hechang Chen

Long-horizon planning in robotic manipulation tasks requires translating underspecified, symbolic goals into executable control programs that satisfy spatial, temporal, and physical constraints. However, language-model-based planners often struggle with long-horizon task decomposition, robust constraint satisfaction, and adaptive failure recovery. We introduce InstructFlow, a multi-agent framework that establishes a symbolic, feedback-driven flow of information for code generation in robotic manipulation tasks. InstructFlow employs an InstructFlow Planner to construct and traverse a hierarchical instruction graph that decomposes goals into semantically meaningful subtasks, while a Code Generator produces executable code snippets conditioned on this graph. Crucially, when execution failures occur, a Constraint Generator analyzes feedback and induces symbolic constraints, which are propagated back into the instruction graph to guide targeted code refinement without regenerating from scratch. This dynamic, graph-guided flow enables structured, interpretable, and failure-resilient planning, significantly improving task success rates and robustness across diverse manipulation benchmarks, especially in constraint-sensitive and long-horizon scenarios.
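
A minimal skeleton of the data structure this flow implies (naming is ours, not the paper's API): a hierarchical instruction graph whose nodes accumulate symbolic constraints induced from execution failures.

```python
from dataclasses import dataclass, field

@dataclass
class InstructionNode:
    goal: str
    children: list["InstructionNode"] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)  # symbolic, e.g. "above(cup, tray)"

def refine_on_failure(node: InstructionNode, feedback: str, induce_constraint):
    """Fold an induced symbolic constraint back into the failing subtask's
    node, so only that subtask's code needs regenerating, not the whole plan."""
    node.constraints.append(induce_constraint(feedback))
```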

NeurIPS Conference 2025 Conference Paper

Tackling Continual Offline RL through Selective Weights Activation on Aligned Spaces

  • Jifeng Hu
  • Sili Huang
  • Li Shen
  • Zhejian Yang
  • Shengchao Hu
  • Shisong Tang
  • Hechang Chen
  • Lichao Sun

Continual offline reinforcement learning (CORL) has shown impressive ability in diffusion-based continual learning systems by modeling the joint distributions of trajectories. However, most research focuses only on limited continual task settings in which the tasks share the same observation and action space, which deviates from the realistic demands of training agents in varied environments. In view of this, we propose the Vector-Quantized Continual Diffuser, named VQ-CD, to break the barrier of differing spaces between tasks. Specifically, our method contains two complementary components, where quantized-space alignment provides a unified basis for selective weight activation. For space alignment, we leverage vector quantization to align the different state and action spaces of various tasks, facilitating continual training in a shared space. We then leverage a unified diffusion model paired with an inverse dynamics model to master all tasks by selectively activating different weights according to task-related sparse masks. Finally, we conduct extensive experiments on 15 continual learning (CL) tasks, including conventional CL task settings (identical state and action spaces) and general CL task settings (varying state and action spaces). Compared with 17 baselines, our method reaches SOTA performance.
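
The generic vector-quantization step underlying the described alignment looks like the following (a standard nearest-codebook lookup, not VQ-CD's full pipeline, which also includes the task-specific sparse masks):

```python
import torch

def vector_quantize(z: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Map each continuous vector to its nearest codebook entry.

    z:        (batch, dim) continuous state/action embeddings
    codebook: (num_codes, dim) learned discrete codes
    """
    d = torch.cdist(z, codebook)   # pairwise distances, (batch, num_codes)
    idx = d.argmin(dim=1)          # nearest code per vector
    return codebook[idx]           # quantized representation in a shared space
```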

NeurIPS Conference 2024 Conference Paper

Decision Mamba: Reinforcement Learning via Hybrid Selective Sequence Modeling

  • Sili Huang
  • Jifeng Hu
  • Zhejian Yang
  • Liwei Yang
  • Tao Luo
  • Hechang Chen
  • Lichao Sun
  • Bo Yang

Recent works have shown the remarkable superiority of transformer models in reinforcement learning (RL), where the decision-making problem is formulated as sequential generation. Transformer-based agents can exhibit self-improvement in online environments when provided with task contexts, such as multiple trajectories, a paradigm called in-context RL. However, due to the quadratic computational complexity of attention in transformers, current in-context RL methods suffer from huge computational costs as the task horizon increases. In contrast, the Mamba model is renowned for its efficiency in processing long-term dependencies, which provides an opportunity for in-context RL to solve tasks that require long-term memory. To this end, we first implement Decision Mamba (DM) by replacing the backbone of the Decision Transformer (DT). Then, we propose Decision Mamba-Hybrid (DM-H), which combines the merits of transformers and Mamba: high-quality prediction and long-term memory. Specifically, DM-H first generates high-value sub-goals from long-term memory through the Mamba model. It then uses the sub-goals to prompt the transformer, establishing high-quality predictions. Experimental results demonstrate that DM-H achieves state-of-the-art performance on long- and short-term tasks, such as the D4RL, Grid World, and Tmaze benchmarks. Regarding efficiency, the online testing of DM-H on the long-term task is 28× faster than the transformer-based baselines.
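
The control flow the abstract describes can be sketched as follows (hypothetical callables standing in for the actual Mamba and transformer modules, which we do not reproduce here):

```python
def dm_h_step(history, mamba_subgoal_model, transformer_policy, k: int):
    """Mamba summarizes the long history into a sub-goal; the transformer
    then predicts actions from a short recent window, prompted by that sub-goal."""
    subgoal = mamba_subgoal_model(history)   # long-term memory path
    recent = history[-k:]                    # short window for the transformer
    return transformer_policy(recent, prompt=subgoal)

# Toy usage with stand-in callables:
actions = dm_h_step(
    history=list(range(100)),
    mamba_subgoal_model=lambda h: sum(h) / len(h),
    transformer_policy=lambda recent, prompt: [prompt] * len(recent),
    k=4,
)
```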

TIST Journal 2024 Journal Article

Discovering Expert-Level Air Combat Knowledge via Deep Excitatory-Inhibitory Factorized Reinforcement Learning

  • Hai Yin Piao
  • Shengqi Yang
  • Hechang Chen
  • Junnan Li
  • Jin Yu
  • Xuanqi Peng
  • Xin Yang
  • Zhen Yang

Artificial Intelligence (AI) has achieved a wide range of successes in autonomous air combat decision-making recently. Previous research demonstrated that AI-enabled air combat approaches could even acquire beyond-human-level capabilities. However, two major difficulties remain largely unaddressed. First, existing methods with fixed decision intervals are mostly devoted to solving what to act but pay little attention to when to act, which occasionally misses optimal decision opportunities. Second, reliance on an expert-crafted finite maneuver library leads to a lack of tactics diversity, which is vulnerable to an opponent equipped with new tactics. In view of this, we propose a novel hybrid autonomous air combat tactics-discovery algorithm combining Deep Reinforcement Learning (DRL) with prior knowledge, namely deep Excitatory-iNhibitory fACTorIzed maneuVEr (ENACTIVE) learning. The algorithm consists of two key modules, i.e., ENHANCE and FACTIVE. Specifically, ENHANCE learns to adjust the air combat decision-making intervals and appropriately seize key opportunities. FACTIVE factorizes maneuvers and then jointly optimizes them with significant gains in tactics diversity. Extensive experimental results reveal that the proposed method outperforms state-of-the-art algorithms with a 62% winning rate and further obtains a 2.85-fold increase in global tactic-space coverage. It also demonstrates that a variety of the discovered air combat tactics are comparable to human experts' knowledge.

ICML Conference 2024 Conference Paper

DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning

  • Siyuan Guo 0001
  • Cheng Deng 0001
  • Ying Wen 0001
  • Hechang Chen
  • Yi Chang 0001
  • Jun Wang 0012

In this work, we investigate the potential of large language model (LLM) based agents to automate data science tasks, with the goal of comprehending task requirements, then building and training the best-fit machine learning models. Despite their widespread success, existing LLM agents are hindered by generating unreasonable experiment plans within this scenario. To this end, we present DS-Agent, a novel automatic framework that harnesses LLM agents and case-based reasoning (CBR). In the development stage, DS-Agent follows the CBR framework to structure an automatic iteration pipeline, which can flexibly capitalize on expert knowledge from Kaggle and facilitate consistent performance improvement through a feedback mechanism. Moreover, DS-Agent implements a low-resource deployment stage with a simplified CBR paradigm that adapts past successful solutions from the development stage for direct code generation, significantly reducing the demands on the foundational capabilities of LLMs. Empirically, DS-Agent with GPT-4 achieves a 100% success rate in the development stage, while attaining a 36% improvement in average one-pass rate across alternative LLMs in the deployment stage. In both stages, DS-Agent achieves the best rank in performance, costing $1.60 and $0.13 per run with GPT-4, respectively. Our data and code are open-sourced at https://github.com/guosyjlu/DS-Agent.
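
The classic CBR cycle the abstract builds on (retrieve, reuse, evaluate, retain) can be sketched generically; the names and signatures below are ours, not DS-Agent's actual API:

```python
def cbr_iterate(task, case_bank, similarity, adapt, run_experiment, rounds: int = 3):
    """Generic case-based reasoning loop: retrieve the most similar past case,
    adapt it to the task, evaluate via experiment feedback, and retain the result."""
    best_score, best_plan = float("-inf"), None
    for _ in range(rounds):
        case = max(case_bank, key=lambda c: similarity(task, c))  # retrieve
        plan = adapt(case, task, best_plan)                       # reuse / revise
        score = run_experiment(plan)                              # evaluate via feedback
        if score > best_score:
            best_score, best_plan = score, plan
        case_bank.append(plan)                                    # retain for later retrieval
    return best_plan
```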

ICML Conference 2024 Conference Paper

In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought

  • Sili Huang
  • Jifeng Hu
  • Hechang Chen
  • Lichao Sun 0001
  • Bo Yang 0002

In-context learning is a promising approach for offline reinforcement learning (RL) to handle online tasks, achieved by providing task prompts. Recent works demonstrated that in-context RL can emerge with self-improvement in a trial-and-error manner when RL tasks are treated as an across-episodic sequential prediction problem. Although this self-improvement does not require gradient updates, current works still suffer from high computational costs when the across-episodic sequence grows with task horizons. To this end, we propose the In-context Decision Transformer (IDT) to achieve self-improvement in a high-level trial-and-error manner. Specifically, IDT is inspired by the efficient hierarchical structure of human decision-making and thus reconstructs the sequence to consist of high-level decisions instead of the low-level actions that interact with environments. As one high-level decision can guide multi-step low-level actions, IDT naturally avoids excessively long sequences and solves online tasks more efficiently. Experimental results show that IDT achieves state-of-the-art performance on long-horizon tasks over current in-context RL methods. In particular, the online evaluation time of IDT is 36× faster than baselines on the D4RL benchmark and 27× faster on the Grid World benchmark.
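
The hierarchical rollout the abstract describes can be illustrated as follows (hypothetical callables; this is a sketch of the idea that the sequence grows per decision rather than per action, not IDT's actual implementation):

```python
def hierarchical_rollout(env_step, high_policy, low_policy, obs, horizon: int, k: int):
    decisions = []                           # across-episodic sequence of high-level decisions
    for _ in range(horizon // k):
        decision = high_policy(obs, decisions)
        decisions.append(decision)           # sequence grows per decision, not per action
        for _ in range(k):                   # k low-level actions guided by one decision
            action = low_policy(obs, decision)
            obs, _reward, done = env_step(action)
            if done:
                return decisions
    return decisions
```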

NeurIPS Conference 2023 Conference Paper

Learning Generalizable Agents via Saliency-guided Features Decorrelation

  • Sili Huang
  • Yanchao Sun
  • Jifeng Hu
  • Siyuan Guo
  • Hechang Chen
  • Yi Chang
  • Lichao Sun
  • Bo Yang

In visual-based Reinforcement Learning (RL), agents often struggle to generalize well to environmental variations in the state space that were not observed during training. The variations can arise in both task-irrelevant features, such as background noise, and task-relevant features, such as robot configurations, that are related to the optimal decisions. To achieve generalization in both situations, agents are required to accurately understand the impact of changed features on the decisions, i.e., establishing the true associations between changed features and decisions in the policy model. However, due to the inherent correlations among features in the state space, the associations between features and decisions become entangled, making it difficult for the policy to distinguish them. To this end, we propose Saliency-Guided Features Decorrelation (SGFD) to eliminate these correlations through sample reweighting. Concretely, SGFD consists of two core techniques: Random Fourier Functions (RFF) and the saliency map. RFF is utilized to estimate the complex non-linear correlations in high-dimensional images, while the saliency map is designed to identify the changed features. Under the guidance of the saliency map, SGFD employs sample reweighting to minimize the estimated correlations related to changed features, thereby achieving decorrelation in visual RL tasks. Our experimental results demonstrate that SGFD can generalize well on a wide range of test environments and significantly outperforms state-of-the-art methods in handling both task-irrelevant variations and task-relevant variations.
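
The random Fourier feature map behind such non-linear correlation estimates is standard; a minimal sketch (generic RFF for an RBF kernel, not SGFD's exact estimator):

```python
import numpy as np

def random_fourier_features(x: np.ndarray, n_features: int = 256,
                            gamma: float = 1.0, seed: int = 0) -> np.ndarray:
    """Random Fourier feature map approximating the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2); linear correlations between
    these features can surface non-linear dependence in the inputs.

    x: (n_samples, dim) flattened observations.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    w = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))  # spectral samples
    b = rng.uniform(0, 2 * np.pi, size=n_features)                  # random phases
    return np.sqrt(2.0 / n_features) * np.cos(x @ w + b)
```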

AAAI Conference 2023 Conference Paper

The Sufficiency of Off-Policyness and Soft Clipping: PPO Is Still Insufficient according to an Off-Policy Measure

  • Xing Chen
  • Dongcui Diao
  • Hechang Chen
  • Hengshuai Yao
  • Haiyin Piao
  • Zhixiao Sun
  • Zhiwei Yang
  • Randy Goebel

The popular Proximal Policy Optimization (PPO) algorithm approximates the solution within a clipped policy space. Do better policies exist outside this space? Using a novel surrogate objective that employs the sigmoid function (which provides an interesting mode of exploration), we found that the answer is "YES", and that the better policies are in fact located very far from the clipped space. We show that PPO is insufficient in "off-policyness" according to an off-policy metric called DEON. Our algorithm explores a much larger policy space than PPO, and it maximizes the Conservative Policy Iteration (CPI) objective better than PPO during training. To the best of our knowledge, all current PPO methods retain the clipping operation and optimize within the clipped policy space. Our method is the first of its kind, advancing the understanding of CPI optimization and policy gradient methods. Code is available at https://github.com/raincchio/P3O.
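
For contrast, the clipped objective the abstract argues against is the well-known PPO surrogate; the sigmoid variant below is illustrative only (the exact P3O objective is in the linked repository):

```python
import torch

def ppo_clip_surrogate(ratio: torch.Tensor, adv: torch.Tensor, eps: float = 0.2):
    """Standard PPO clipped surrogate: updates vanish once the policy
    ratio leaves [1 - eps, 1 + eps], confining search to a clipped space."""
    return torch.min(ratio * adv, torch.clamp(ratio, 1 - eps, 1 + eps) * adv).mean()

def sigmoid_surrogate(ratio: torch.Tensor, adv: torch.Tensor, tau: float = 1.0):
    """Illustrative sigmoid-reweighted surrogate in the spirit of the abstract:
    gradients decay smoothly rather than being hard-clipped, permitting
    exploration far outside the clipped region. Not the paper's exact objective."""
    return (torch.sigmoid(ratio / tau) * adv).mean()
```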

NeurIPS Conference 2022 Conference Paper

Distributional Reward Estimation for Effective Multi-agent Deep Reinforcement Learning

  • Jifeng Hu
  • Yanchao Sun
  • Hechang Chen
  • Sili Huang
  • Haiyin Piao
  • Yi Chang
  • Lichao Sun

Multi-agent reinforcement learning has drawn increasing attention in practice, e.g., in robotics and autonomous driving, as it can explore optimal policies using samples generated by interacting with the environment. However, high reward uncertainty remains a problem when we want to train a satisfactory model, because obtaining high-quality reward feedback is usually expensive and even infeasible. To handle this issue, previous methods mainly focus on passive reward correction. At the same time, recent active reward estimation methods have proven to be a recipe for reducing the effect of reward uncertainty. In this paper, we propose a novel Distributional Reward Estimation framework for effective Multi-Agent Reinforcement Learning (DRE-MARL). Our main idea is to design multi-action-branch reward estimation and policy-weighted reward aggregation for stabilized training. Specifically, we design the multi-action-branch reward estimation to model reward distributions on all action branches. We then utilize reward aggregation to obtain stable updating signals during training. Our intuition is that considering all possible consequences of actions can be useful for learning policies. The superiority of DRE-MARL is demonstrated on benchmark multi-agent scenarios, compared with SOTA baselines in terms of both effectiveness and robustness.
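
One plausible reading of "policy-weighted reward aggregation" is an expectation of per-branch reward estimates under the current policy; a sketch (illustrative, not the paper's exact estimator):

```python
import torch

def policy_weighted_reward(reward_branches: torch.Tensor, policy: torch.Tensor) -> torch.Tensor:
    """Aggregate per-action-branch reward estimates with the policy's
    action probabilities, yielding a stable expected-reward signal.

    reward_branches: (batch, n_actions) estimated reward per action branch
    policy:          (batch, n_actions) action probabilities (rows sum to 1)
    """
    return (policy * reward_branches).sum(dim=-1)  # expected reward per state
```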

IJCAI Conference 2020 Conference Paper

Attention-based Multi-level Feature Fusion for Named Entity Recognition

  • Zhiwei Yang
  • Hechang Chen
  • Jiawei Zhang
  • Jing Ma
  • Yi Chang

Named entity recognition (NER) is a fundamental task in the natural language processing (NLP) area. Recently, representation learning methods (e.g., character embedding and word embedding) have achieved promising recognition results. However, existing models consider only partial features derived from words or characters and fail to integrate semantic and syntactic information (e.g., capitalization, inter-word relations, keywords, and lexical phrases) from multi-level perspectives. Intuitively, multi-level features can be helpful when recognizing named entities in complex sentences. In this study, we propose a novel framework called attention-based multi-level feature fusion (AMFF), which captures multi-level features from different perspectives to improve NER. Our model consists of four components that respectively capture local character-level, global character-level, local word-level, and global word-level features, which are then fed into a BiLSTM-CRF network for the final sequence labeling. Extensive experimental results on four benchmark datasets show that our proposed model outperforms a set of state-of-the-art baselines.
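
A generic sketch of attention-style fusion over such feature levels before a BiLSTM-CRF tagger (the layer shapes and scoring vector are ours, not AMFF's exact architecture):

```python
import torch

def attention_fuse(features: list[torch.Tensor], w: torch.Tensor) -> torch.Tensor:
    """Fuse multi-level token features with a learned attention over levels.

    features: list of (batch, seq, dim) tensors, e.g. local/global
              character-level and word-level views
    w:        (dim,) scoring vector for level attention
    """
    stacked = torch.stack(features, dim=2)          # (batch, seq, levels, dim)
    scores = torch.softmax(stacked @ w, dim=2)      # attention weights over levels
    return (scores.unsqueeze(-1) * stacked).sum(2)  # fused (batch, seq, dim)
```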