Arrow Research search

Author name cluster

Nuo Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers (13)

IS Journal 2026 Journal Article

A Dynamic Framework to Integrate Deep Reinforcement Learning with Hierarchical Symbolic Plans

  • Xuelong Liu
  • Nuo Chen
  • Wenji Mao
  • Daniel Zeng

The neuro-symbolic framework has become one of the mainstream paradigms in intelligent system design. For intelligent decision-making, Reinforcement Learning (RL) and automated planning are the representative neural and symbolic techniques, respectively, and the two can facilitate each other. Despite the rapid development and wide application of deep RL, its poor sample efficiency and slow convergence in sparse-reward environments have become major obstacles to its advancement. To address these issues, in this paper we propose a neuro-symbolic framework that integrates deep RL with hierarchical plans. Specifically, we develop a selective Monte-Carlo Tree Search algorithm in which hierarchical plans are dynamically constructed during the learning process. The constructed plans, in turn, provide high-level guidance for RL by constraining the subtasks that lead to goal attainment, thus reducing useless or redundant exploration. Experiments on five challenging scenarios show that our framework achieves better sample efficiency and faster convergence than state-of-the-art approaches.
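
For intuition only, here is a toy Python sketch of how an ordered symbolic plan can constrain an RL agent's exploration to subtask-relevant actions; the plan structure, subtask names, and action sets are hypothetical, and the paper's selective Monte-Carlo Tree Search construction is not reproduced here.

```python
import random

def choose_action(q_values, allowed_actions, epsilon=0.1):
    """Epsilon-greedy action selection restricted to the actions that the
    current symbolic subtask allows (the 'high-level guidance' in the abstract)."""
    if random.random() < epsilon:
        return random.choice(allowed_actions)
    return max(allowed_actions, key=lambda a: q_values[a])

# Hypothetical hierarchical plan: an ordered list of subtasks, each exposing
# the subset of primitive actions that can contribute to it.
plan = [
    {"subtask": "reach_key",  "allowed": [0, 1, 2]},
    {"subtask": "open_door",  "allowed": [2, 3]},
    {"subtask": "reach_goal", "allowed": [0, 1, 4]},
]
q_values = [0.2, 0.5, 0.1, 0.9, 0.0]  # toy Q-estimates over 5 primitive actions
for step in plan:
    print(step["subtask"], "->", choose_action(q_values, step["allowed"]))
```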

AAAI Conference 2026 Conference Paper

CoGenSAM: Codebook-Interactive Generative Labeling for Adapting SAM to Crack Segmentation

  • Zhuangzhuang Chen
  • Nuo Chen
  • Dachong Li
  • Zhiliang Lin
  • Xingyu Feng
  • Yifan Zhang
  • Jie Chen
  • Jianqiang Li

The goal of this work is to adapt Segment Anything Models (SAM) to crack segmentation tasks via automatic label generation, thus eliminating manual annotation cost. An intuitive approach is to extract edges of crack samples and generate labels via dilation and erosion for fine-tuning SAM. However, this simple solution cannot guarantee the quality of the generated labels, as crack regions are corrupted by imperfect edge detection. To this end, this paper proposes CoGenSAM, a novel Codebook-interactive Generative Labeling framework that enables annotation-free SAM fine-tuning. In the first stage, we pre-train a vector-quantized variational auto-encoder (VQVAE) by reconstructing synthesized crack-like structures, so that crack-aware priors are learned within the codebook. In the second stage, these priors help another VQVAE serve as a restoration model that restores randomly corrupted structures into uncorrupted ones. Specifically, we propose a crack-aware contrastive interaction to maximize the mutual information with the above priors via codebook interaction. High-quality labels can then be generated by restoring corrupted labels obtained from edge detection, enabling annotation-free SAM fine-tuning. We collect a new dataset, Bridge2025, to address the limited availability of bridge-oriented benchmarks. Experiments show that our performance is close to that of fully supervised methods.
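
As a rough illustration of the corruption-then-restoration idea, the sketch below punches random holes into a toy crack mask, producing the kind of corrupted/uncorrupted pair a restoration model could be trained on; the function name, toy mask, and corruption scheme are assumptions, not CoGenSAM's pipeline.

```python
import numpy as np

def randomly_corrupt_mask(mask: np.ndarray, num_holes: int = 5,
                          hole_size: int = 8, seed: int = 0) -> np.ndarray:
    """Sketch of a corruption step: punch random square holes into a crack
    mask so a restoration model can learn to map corrupted structures back
    to uncorrupted ones."""
    rng = np.random.default_rng(seed)
    corrupted = mask.copy()
    h, w = mask.shape
    for _ in range(num_holes):
        y = rng.integers(0, h - hole_size)
        x = rng.integers(0, w - hole_size)
        corrupted[y:y + hole_size, x:x + hole_size] = 0
    return corrupted

mask = np.zeros((64, 64), dtype=np.uint8)
mask[30:34, :] = 1  # a toy horizontal "crack"
print(mask.sum(), randomly_corrupt_mask(mask).sum())
```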

AAAI Conference 2026 Conference Paper

ExtendAttack: Attacking Servers of LRMs via Extending Reasoning

  • Zhenhao Zhu
  • Yue Liu
  • Zhiwei Xu
  • Yingwei Ma
  • Hongcheng Gao
  • Nuo Chen
  • Yanpei Guo
  • Wenjie Qu

Large Reasoning Models (LRMs) have demonstrated promising performance on complex tasks. However, their resource-consuming reasoning processes may be exploited by attackers to maliciously occupy server resources, leading to a crash, much like a DDoS attack in cyberspace. To this end, we propose a novel attack on LRMs, termed ExtendAttack, which maliciously occupies server resources by stealthily extending the reasoning processes of LRMs. Concretely, we systematically obfuscate characters within a benign prompt, transforming them into a complex, poly-base ASCII representation. This compels the model to perform a series of computationally intensive decoding sub-tasks that are deeply embedded within the semantic structure of the query itself. Extensive experiments demonstrate the effectiveness of ExtendAttack: it significantly increases response length and latency, with response length growing by over 2.7 times for the o3 model on the HumanEval benchmark. Moreover, it preserves the original meaning of the query and achieves comparable answer accuracy, demonstrating its stealthiness.
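
The poly-base obfuscation idea can be illustrated with a small Python sketch; the encoding format, base set, and marker syntax below are assumptions for illustration, not the paper's exact transformation.

```python
import random

def obfuscate(prompt: str, ratio: float = 0.3, bases=(2, 7, 8, 16), seed: int = 0) -> str:
    """Toy poly-base obfuscation: replace a fraction of characters with their
    character code written in a randomly chosen base, so the model must decode
    each marker before it can answer the underlying question."""
    rng = random.Random(seed)
    out = []
    for ch in prompt:
        if ch.isalnum() and rng.random() < ratio:
            base = rng.choice(bases)
            digits, code = [], ord(ch)
            while code:
                digits.append("0123456789abcdef"[code % base])
                code //= base
            out.append(f"<{''.join(reversed(digits))}>_{base}")  # e.g. 'A' -> <101>_8
        else:
            out.append(ch)
    return "".join(out)

print(obfuscate("Write a function that reverses a linked list."))
```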

TMLR Journal 2026 Journal Article

α-OCC: Uncertainty-Aware Camera-based 3D Semantic Occupancy Prediction

  • Sanbao Su
  • Nuo Chen
  • Chenchen Lin
  • Felix Juefei-Xu
  • Chen Feng
  • Fei Miao

Comprehending 3D scenes is paramount for tasks such as planning and mapping in autonomous vehicles and robotics. Camera-based 3D Semantic Occupancy Prediction (OCC) aims to infer scene geometry and semantics from limited observations. While it has gained popularity due to its affordability and rich visual cues, existing methods often neglect the inherent uncertainty in their models. To address this, we propose an uncertainty-aware OCC method (α-OCC). We first introduce Depth-UP, an uncertainty propagation framework that improves geometry completion by up to 11.58% and semantic segmentation by up to 12.95% across various OCC models. For uncertainty quantification (UQ), we propose a hierarchical conformal prediction (HCP) method that effectively handles the severe class imbalance in OCC datasets. At the geometry level, a novel KL-based score function significantly improves the occupied recall of safety-critical classes (by 45%) with minimal performance overhead (a 3.4% reduction). For UQ, our HCP achieves smaller prediction set sizes while maintaining the defined coverage guarantee: compared with baselines, it reduces set size by up to 90%, with a further 18% reduction when integrated with Depth-UP. Our contributions advance the accuracy and robustness of OCC, marking a noteworthy step forward for autonomous perception systems. Our code is publicly available at https://coperception.github.io/alpha-OCC/.
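
For readers unfamiliar with conformal prediction, the sketch below shows basic split-conformal classification (calibrate a score threshold, then include every class above it). It uses the standard least-ambiguous-classifier score, not the paper's hierarchical KL-based score, so treat it as background intuition rather than the method itself.

```python
import numpy as np

def calibrate_threshold(cal_probs: np.ndarray, cal_labels: np.ndarray,
                        alpha: float = 0.1) -> float:
    """Split-conformal calibration: find the nonconformity-score quantile
    that yields roughly (1 - alpha) coverage on held-out calibration data."""
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    return np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

def prediction_set(probs: np.ndarray, qhat: float) -> np.ndarray:
    """Include every class whose predicted probability is at least 1 - qhat."""
    return np.where(probs >= 1.0 - qhat)[0]

rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(4), size=500)            # fake classifier outputs
cal_labels = np.array([p.argmax() if rng.random() < 0.8 else rng.integers(4)
                       for p in cal_probs])                 # mostly-correct labels
qhat = calibrate_threshold(cal_probs, cal_labels)
print(prediction_set(np.array([0.70, 0.20, 0.07, 0.03]), qhat))
```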

NeurIPS Conference 2025 Conference Paper

Chain of Execution Supervision Promotes General Reasoning in Large Language Models

  • Nuo Chen
  • Zehua Li
  • Keqin Bao
  • Junyang Lin
  • Dayiheng Liu

Building robust and general reasoning ability is a central goal in the development of large language models (LLMs). Recent efforts increasingly turn to code as a rich training source, given its inherent logical structure and diverse reasoning paradigms, such as divide-and-conquer, topological ordering, and enumeration. However, reasoning in code is often expressed implicitly and entangled with syntactic or implementation noise, making direct training on raw code suboptimal. To address this, we introduce TraceMind, a large-scale corpus of 2.6 million samples that transforms code execution into explicit, step-by-step chain-of-thought style rationales, which we call Chain of Execution (CoE). The corpus spans domains including mathematics, classical algorithms, and algorithmic competition, and is enriched with variable-tracing questions and code rewritings to enhance logical granularity and code diversity. We evaluate Tracepile using three training setups: continued pretraining, instruction tuning after pretraining, and two-stage finetuning. Experiments across four base models (LLaMA 3, LLaMA 3.1, Qwen-2.5, and Qwen-2.5 Coder) and 20 benchmarks covering math, code, logic, and algorithms demonstrate consistent improvements. Notably, Tracepile boosts LLaMA3-8B by 9.2% on average across nine math datasets and delivers clear gains on LiveCodeBench, CRUX, and Zebra Logic under two-stage finetuning.
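
The notion of an execution-trace rationale can be approximated in a few lines of Python: run a function under sys.settrace and record the local variables at each executed line. The trace format below is an assumption for illustration and does not reflect the released corpus.

```python
import sys

def trace_execution(fn, *args):
    """Run `fn` and record a line-by-line trace of its local variables,
    i.e. a rough 'chain of execution' rationale for the call."""
    steps = []

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is fn.__code__:
            steps.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(None)
    return result, steps

def gcd(a, b):
    while b:
        a, b = b, a % b
    return a

result, steps = trace_execution(gcd, 48, 18)
for lineno, local_vars in steps:
    print(f"line {lineno}: {local_vars}")
print("answer:", result)
```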

ICLR Conference 2025 Conference Paper

GraphArena: Evaluating and Exploring Large Language Models on Graph Computation

  • Jianheng Tang
  • Qifan Zhang
  • Yuhan Li
  • Nuo Chen
  • Jia Li

The "arms race" of Large Language Models (LLMs) demands new benchmarks to examine their progress. In this paper, we introduce GraphArena, a benchmarking tool designed to evaluate LLMs on real-world graph computational problems. It offers a suite of four polynomial-time tasks (e.g., Shortest Distance) and six NP-complete challenges (e.g., the Traveling Salesman Problem). GraphArena features a rigorous evaluation framework that classifies LLM outputs as correct, suboptimal (feasible but not optimal), hallucinatory (properly formatted but infeasible), or missing. Evaluation of over 10 LLMs reveals that even top-performing models struggle with larger, more complex graph problems and exhibit hallucination issues. We further explore four potential solutions to address this issue and improve LLMs on graph computation: chain-of-thought prompting, instruction tuning, code writing, and scaling test-time compute, each demonstrating unique strengths and limitations. GraphArena complements existing LLM benchmarks and is open-sourced at https://github.com/squareRoot3/GraphArena.
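
The four-way rubric (correct / suboptimal / hallucinatory / missing) is easy to picture with a toy checker for a shortest-path task; the function and graph encoding below are hypothetical and are not GraphArena's evaluation code.

```python
def classify_path_answer(edges, source, target, answer_path, optimal_length):
    """Classify an LLM's shortest-path answer as 'missing', 'hallucinatory'
    (not a valid path in the graph), 'suboptimal' (valid but longer than
    optimal), or 'correct'."""
    if not answer_path:
        return "missing"
    adj = {}
    for u, v, w in edges:          # undirected weighted edges
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    if answer_path[0] != source or answer_path[-1] != target:
        return "hallucinatory"
    length = 0
    for u, v in zip(answer_path, answer_path[1:]):
        if v not in adj.get(u, {}):
            return "hallucinatory"
        length += adj[u][v]
    return "correct" if length == optimal_length else "suboptimal"

edges = [("A", "B", 1), ("B", "C", 1), ("A", "C", 5)]
print(classify_path_answer(edges, "A", "C", ["A", "B", "C"], 2))   # correct
print(classify_path_answer(edges, "A", "C", ["A", "C"], 2))        # suboptimal
print(classify_path_answer(edges, "A", "C", ["A", "D", "C"], 2))   # hallucinatory
```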

ECAI Conference 2025 Conference Paper

Significance-Driven Skeleton Map Convolution for Skeleton-Based Action Recognition

  • Nuo Chen
  • Liqun Huang
  • Xiaotao Huang

Graph Convolutional Networks (GCNs) have been widely used, with excellent performance, in skeleton-based human action recognition. In GCN-based methods, neighborhood graphs with semantics are important for the network. However, existing methods have a limitation: although they optimise the neighborhood matrix by adaptive weighting, it is difficult for them to focus on the key regions that contribute most to the action, resulting in a limited ability to discriminate visually similar actions. To solve this problem, we propose two approaches: 1) constructing an additional dynamic sensitive topology that focuses on the spatial associations of key nodes, by extracting the high-contribution vertex set based on the magnitude of motion changes; and 2) proposing a region-level spatio-temporal aggregation module to achieve fine-grained spatio-temporal semantic modeling. Finally, we validate the effectiveness of the proposed modules through various comparative experiments.
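
A minimal sketch of the key-node idea, assuming a (frames, joints, coords) skeleton array: rank joints by their motion magnitude and add an extra adjacency over the top-k. The selection rule and the paper's aggregation details are not reproduced here.

```python
import numpy as np

def dynamic_sensitive_topology(joints: np.ndarray, top_k: int = 5) -> np.ndarray:
    """Rank joints by frame-to-frame motion magnitude and connect the top-k
    'key' joints in an extra adjacency matrix, on top of the usual skeletal graph."""
    # joints: (T frames, V joints, C coords)
    motion = np.linalg.norm(np.diff(joints, axis=0), axis=-1).sum(axis=0)  # (V,)
    key = np.argsort(-motion)[:top_k]
    V = joints.shape[1]
    extra_adj = np.zeros((V, V))
    extra_adj[np.ix_(key, key)] = 1.0
    np.fill_diagonal(extra_adj, 0.0)  # no self-loops in the extra topology
    return extra_adj

seq = np.random.default_rng(0).normal(size=(64, 25, 3))  # e.g. a 25-joint skeleton
print(dynamic_sensitive_topology(seq).sum())
```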

ICRA Conference 2025 Conference Paper

Task-Oriented 6-DoF Grasp Pose Detection in Clutters

  • An-Lan Wang
  • Nuo Chen
  • Kun-Yu Lin
  • Yuan-Ming Li
  • Wei-Shi Zheng 0001

In general, humans grasp an object differently for different tasks, e.g., "grasping the handle of a knife to cut" vs. "grasping the blade to hand it over". In robotic grasp pose detection research, some existing works consider this task-oriented grasping and have made progress, but they are generally constrained to low-DoF gripper types or non-cluttered settings, which limits their applicability to human assistance in real life. Aiming for more general and practical grasp models, in this paper we investigate the problem of Task-Oriented 6-DoF Grasp Pose Detection in Clutters (TO6DGC), which extends the task-oriented problem to the more general setting of 6-DoF grasp pose detection in cluttered (multi-object) scenes. To this end, we construct a large-scale 6-DoF task-oriented grasping dataset, 6-DoF Task Grasp (6DTG), which features 4391 cluttered scenes with over 2 million 6-DoF grasp poses. Each grasp is annotated with a specific task, covering 6 tasks and 198 objects in total. Moreover, we propose One-Stage TaskGrasp (OSTG), a strong baseline for the TO6DGC problem. OSTG adopts a task-oriented point selection strategy to detect where to grasp, and a task-oriented grasp generation module to decide how to grasp for a given task. To evaluate the effectiveness of OSTG, extensive experiments are conducted on 6DTG. The results show that our method outperforms various baselines on multiple metrics. Real-robot experiments also verify that OSTG better perceives task-oriented grasp points and 6-DoF grasp poses.

NeurIPS Conference 2024 Conference Paper

Are Your Models Still Fair? Fairness Attacks on Graph Neural Networks via Node Injections

  • Zihan Luo
  • Hong Huang
  • Yongkang Zhou
  • Jiping Zhang
  • Nuo Chen
  • Hai Jin

Despite the remarkable capabilities demonstrated by Graph Neural Networks (GNNs) on graph-related tasks, recent research has revealed fairness vulnerabilities in GNNs under malicious adversarial attacks. However, all existing fairness attacks require manipulating the connectivity between existing nodes, which may be prohibited in reality. To this end, we introduce a Node Injection-based Fairness Attack (NIFA), which explores the vulnerability of GNN fairness in this more realistic setting. In detail, NIFA first designs two insightful principles for node injection operations, namely the uncertainty-maximization principle and the homophily-increase principle, and then optimizes the injected nodes' feature matrix to further ensure the effectiveness of the fairness attack. Comprehensive experiments on three real-world datasets consistently demonstrate that NIFA can significantly undermine the fairness of mainstream GNNs, including fairness-aware GNNs, by injecting merely 1% of nodes. We sincerely hope that our work draws increasing attention from researchers to the vulnerability of GNN fairness and encourages the development of corresponding defense mechanisms. Our code and data are released at: https://github.com/CGCL-codes/NIFA.
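
The two injection principles can be sketched in a few lines of numpy; the entropy-based target selection and mean-feature initialization below are illustrative assumptions, not NIFA's released implementation.

```python
import numpy as np

def select_injection_targets(class_probs: np.ndarray, budget: int) -> np.ndarray:
    """Uncertainty-maximization (sketch): target the nodes whose predicted
    class distributions have the highest entropy."""
    entropy = -(class_probs * np.log(class_probs + 1e-12)).sum(axis=1)
    return np.argsort(-entropy)[:budget]

def homophilous_features(node_features: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Homophily-increase (sketch): initialize an injected node's features as the
    mean of its intended neighbors so the new edges look plausible to the GNN;
    the actual attack optimizes these features further."""
    return node_features[targets].mean(axis=0)

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=100)   # fake GNN predictions for 100 nodes
feats = rng.normal(size=(100, 16))            # fake node features
targets = select_injection_targets(probs, budget=5)
injected = homophilous_features(feats, targets)
print(targets, injected.shape)
```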

AAAI Conference 2023 Conference Paper

FiTs: Fine-Grained Two-Stage Training for Knowledge-Aware Question Answering

  • Qichen Ye
  • Bowen Cao
  • Nuo Chen
  • Weiyuan Xu
  • Yuexian Zou

Knowledge-aware question answering (KAQA) requires a model to answer questions over a knowledge base, which is essential for both open-domain QA and domain-specific QA, especially when language models alone cannot provide all the knowledge needed. Despite the promising results of recent KAQA systems, which tend to integrate linguistic knowledge from pre-trained language models (PLMs) and factual knowledge from knowledge graphs (KGs) to answer complex questions, a bottleneck remains in effectively fusing the representations from PLMs and KGs because of (i) the semantic and distributional gaps between them, and (ii) the difficulty of joint reasoning over knowledge from both modalities. To address these two problems, we propose a Fine-grained Two-stage training framework (FiTs) to boost KAQA performance. The first stage, knowledge adaptive post-training, aims at aligning representations from the PLM and the KG, thus bridging the modality gap between them. The second stage, knowledge-aware fine-tuning, aims to improve the model's joint reasoning ability based on the aligned representations; in detail, we fine-tune the post-trained model via two auxiliary self-supervised tasks in addition to the QA supervision. Extensive experiments demonstrate that our approach achieves state-of-the-art performance on three benchmarks in the commonsense reasoning (i.e., CommonsenseQA, OpenbookQA) and medical question answering (i.e., MedQA-USMLE) domains.
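
As a loose illustration of the first-stage alignment idea, the sketch below pulls PLM and KG embeddings of matched entities together with a symmetric contrastive loss; the loss form, dimensions, and pairing scheme are assumptions, not FiTs' actual post-training objectives.

```python
import torch
import torch.nn.functional as F

def alignment_loss(text_emb: torch.Tensor, kg_emb: torch.Tensor,
                   temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive alignment: row i of text_emb and row i of kg_emb
    describe the same entity and should score higher than all mismatched pairs."""
    text = F.normalize(text_emb, dim=-1)
    kg = F.normalize(kg_emb, dim=-1)
    logits = text @ kg.T / temperature
    labels = torch.arange(len(text))
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2

print(alignment_loss(torch.randn(16, 256), torch.randn(16, 256)).item())
```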

AAAI Conference 2023 Conference Paper

Human Mobility Modeling during the COVID-19 Pandemic via Deep Graph Diffusion Infomax

  • Yang Liu
  • Yu Rong
  • Zhuoning Guo
  • Nuo Chen
  • Tingyang Xu
  • Fugee Tsung
  • Jia Li

Non-Pharmaceutical Interventions (NPIs), such as social gathering restrictions, have proven effective at slowing the transmission of COVID-19 by reducing contact between people. To support policy-makers, multiple studies have first modelled human mobility via macro indicators (e.g., average daily travel distance) and then studied the effectiveness of NPIs. In this work, we focus on mobility modelling and, from a micro perspective, aim to predict the locations that will be visited by COVID-19 cases. Since NPIs generally cause economic and societal losses, such a prediction helps governments design and evaluate them. However, in real-world situations, strict privacy and data-protection regulations lead to severe data sparsity (i.e., limited case and location information). To address these challenges and jointly model variables including a geometric graph, a set of diffusions, and a set of locations, we propose a model named Deep Graph Diffusion Infomax (DGDI). We show that the maximization of the DGDI objective can be bounded by two tractable components: a univariate Mutual Information (MI) between the geometric graph and the diffusion representation, and a univariate MI between the diffusion representation and the location representation. To facilitate research on COVID-19 prediction, we present two benchmarks that contain geometric graphs and location histories of COVID-19 cases. Extensive experiments on the two benchmarks show that DGDI significantly outperforms competing methods.

AAAI Conference 2022 Conference Paper

From Good to Best: Two-Stage Training for Cross-Lingual Machine Reading Comprehension

  • Nuo Chen
  • Linjun Shou
  • Ming Gong
  • Jian Pei

Cross-lingual Machine Reading Comprehension (xMRC) is challenging due to the lack of training data in low-resource languages. Recent approaches use training data only from a resource-rich language, such as English, to fine-tune large-scale cross-lingual pre-trained language models. Due to the large differences between languages, a model fine-tuned only on a source language may not perform well on target languages. Interestingly, we observe that while the top-1 results predicted by previous approaches often fail to hit the ground-truth answers, the correct answers are often contained in the top-k predicted results. Based on this observation, we develop a two-stage approach to enhance model performance. The first stage targets recall: we design a hard-learning (HL) algorithm to maximize the likelihood that the top-k predictions contain the accurate answer. The second stage focuses on precision: an answer-aware contrastive learning (AA-CL) mechanism is developed to learn the fine-grained difference between the accurate answer and other candidates. Extensive experiments show that our model significantly outperforms a series of strong baselines on two cross-lingual MRC benchmark datasets.
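
One way to picture a recall-oriented first stage is a margin loss that only penalizes the gold span when it falls outside the top-k candidates; this is an illustrative stand-in, not the paper's hard-learning (HL) algorithm, and the span-scoring setup is assumed.

```python
import torch

def top_k_recall_loss(span_scores: torch.Tensor, gold_index: int, k: int = 5,
                      margin: float = 1.0) -> torch.Tensor:
    """Recall-oriented objective (sketch): instead of forcing the gold span to
    rank first, penalize it only when it is not within margin of the k-th
    highest competing score, i.e. when it risks dropping out of the top-k."""
    gold_score = span_scores[gold_index]
    others = torch.cat([span_scores[:gold_index], span_scores[gold_index + 1:]])
    kth_competitor = torch.topk(others, k).values[-1]
    return torch.clamp(kth_competitor - gold_score + margin, min=0.0)

scores = torch.tensor([0.2, 1.5, 0.7, 2.0, -0.3, 0.9], requires_grad=True)
loss = top_k_recall_loss(scores, gold_index=0, k=3)
loss.backward()
print(loss.item(), scores.grad)
```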

IJCAI Conference 2021 Conference Paper

MRD-Net: Multi-Modal Residual Knowledge Distillation for Spoken Question Answering

  • Chenyu You
  • Nuo Chen
  • Yuexian Zou

Spoken question answering (SQA) has recently drawn considerable attention in the speech community. It requires systems to find correct answers from the given spoken passages. Common SQA systems consist of an automatic speech recognition (ASR) module and a text-based question answering module. However, previous methods suffer from severe performance degradation due to ASR errors. To alleviate this problem, this work proposes a novel multi-modal residual knowledge distillation method (MRD-Net), which further distills knowledge at the acoustic level from an audio-assistant (Audio-A). Specifically, we utilize a teacher (T) trained on manual transcriptions to guide the training of a student (S) on ASR transcriptions, and we show that introducing the Audio-A helps this procedure by learning the residual errors between T and S. Moreover, we propose a simple yet effective attention mechanism that adaptively leverages audio-text features as new deep attention knowledge to boost network performance. Extensive experiments demonstrate that the proposed MRD-Net achieves superior results compared with state-of-the-art methods on three spoken question answering benchmark datasets.
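
A minimal sketch of the residual-distillation idea, assuming logit-level losses: the student matches the teacher via a temperature-scaled KL term, while an audio branch is trained to predict the teacher-student gap. The loss shapes and equal weighting are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def residual_distillation_loss(student_logits, teacher_logits, audio_residual_logits,
                               temperature: float = 2.0) -> torch.Tensor:
    """Sketch: the student (ASR-transcript branch) is distilled toward the teacher
    (manual-transcript branch), and an audio-assistant branch regresses the
    teacher-student residual."""
    t = temperature
    kd = F.kl_div(F.log_softmax(student_logits / t, dim=-1),
                  F.softmax(teacher_logits / t, dim=-1),
                  reduction="batchmean") * (t * t)
    residual_target = (teacher_logits - student_logits).detach()
    residual = F.mse_loss(audio_residual_logits, residual_target)
    return kd + residual

student = torch.randn(8, 50, requires_grad=True)   # ASR-transcript branch logits
teacher = torch.randn(8, 50)                        # manual-transcript branch logits
audio = torch.randn(8, 50, requires_grad=True)      # audio-assistant branch logits
print(residual_distillation_loss(student, teacher, audio).item())
```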