Arrow Research search

Author name cluster

Runze Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers (8)

JBHI Journal 2026 Journal Article

Avatar-Based Picture Exchange Communication System Enhancing Joint Attention Training for Children With Autism

  • Yongjun Ren
  • Runze Liu
  • Huinan Sang
  • Xiaofeng Yu

Children with Autism Spectrum Disorder (ASD) often struggle with social communication and feel anxious in interactive situations. The Picture Exchange Communication System (PECS) is commonly used to build basic communication skills in children with ASD, but it falls short in reducing social anxiety during therapist interactions and in keeping children engaged. This paper proposes pairing PECS training with virtual character technology to address these issues. By integrating a virtual avatar, children's communication skills and ability to express needs can be gradually improved, while the training becomes less anxiety-inducing and more interactive and engaging. A t-test showed that avatar-assisted PECS significantly improves children's focus on activities and enhances their behavioral responsiveness. To address the poor accuracy of gaze estimation in unconstrained environments, this study further developed a visual feature-based gaze estimation algorithm, the three-channel gaze network (TCG-Net), which uses binocular images to refine the gaze direction and infers the primary focus from facial images. Our focus was on enhancing gaze tracking accuracy in natural environments, crucial for evaluating and improving Joint Attention (JA) in children during interactive processes. TCG-Net achieved an angular error of 4.0° on the MPIIGaze dataset, 5.0° on the EyeDiap dataset, and 6.8° on the RT-Gene dataset, confirming the effectiveness of our approach in improving gaze accuracy and the quality of social interactions.
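The angular errors reported above are the standard gaze-estimation metric: the angle between the predicted and ground-truth 3D gaze direction vectors. A minimal sketch of that metric (the function name and sample vectors are illustrative, not from the paper):

```python
import numpy as np

def angular_error_deg(pred, gt):
    """Angle in degrees between predicted and ground-truth 3D gaze vectors."""
    pred = pred / np.linalg.norm(pred)
    gt = gt / np.linalg.norm(gt)
    # Clip guards against floating-point values slightly outside [-1, 1].
    cos_sim = np.clip(np.dot(pred, gt), -1.0, 1.0)
    return np.degrees(np.arccos(cos_sim))

# Identical directions give 0 degrees; orthogonal directions give 90 degrees.
print(angular_error_deg(np.array([0.0, 0.0, -1.0]), np.array([0.0, 0.0, -1.0])))
print(angular_error_deg(np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])))
```

Benchmark numbers like 4.0° on MPIIGaze are typically the mean of this quantity over all test samples.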

AAAI Conference 2026 Conference Paper

GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning

  • Jian Zhao
  • Runze Liu
  • Kaiyan Zhang
  • Zhimu Zhou
  • Junqi Gao
  • Dong Li
  • Jiafei Lyu
  • Zhouyi Qian

Recent advancements in Large Language Models (LLMs) have shown that it is promising to utilize Process Reward Models (PRMs) as verifiers to enhance the performance of LLMs. However, current PRMs face three key challenges: (1) limited process supervision and generalization capabilities, (2) dependence on scalar value prediction without leveraging the generative abilities of LLMs, and (3) inability to scale the test-time compute of PRMs. In this work, we introduce GenPRM, a generative process reward model that performs explicit Chain-of-Thought (CoT) reasoning with code verification before providing judgment for each reasoning step. To obtain high-quality process supervision labels and rationale data, we propose Relative Progress Estimation (RPE) and a rationale synthesis framework that incorporates code verification. Experimental results on ProcessBench and several mathematical reasoning tasks show that GenPRM significantly outperforms prior PRMs with only 23K training samples from the MATH dataset. Through test-time scaling, a 1.5B GenPRM outperforms GPT-4o, and a 7B GenPRM surpasses Qwen2.5-Math-PRM-72B on ProcessBench. Additionally, GenPRM demonstrates strong abilities to serve as a critic model for policy model refinement. This work establishes a new paradigm for process supervision that bridges the gap between PRMs and critic models in LLMs.
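The "code verification" idea in the abstract can be illustrated with a toy sketch: instead of emitting a scalar score, a verifier re-derives a step's claimed result by executing code and then judges the step. This is an illustrative simplification, not GenPRM's actual pipeline; `verify_step` and the step record are hypothetical names:

```python
# Toy sketch: judge a reasoning step by re-executing its arithmetic claim.
def verify_step(expression: str, claimed: float) -> bool:
    # eval is restricted to a bare namespace here; real systems would use a
    # sandboxed interpreter rather than eval on model-generated strings.
    return abs(eval(expression, {"__builtins__": {}}) - claimed) < 1e-9

step = {"rationale": "12 * 7 = 84, then add 5 to get 89",
        "expression": "12*7+5",
        "claimed": 89}
print("step verified:", verify_step(step["expression"], step["claimed"]))
```

In a generative PRM, this kind of executable check is interleaved with CoT rationale text, so the judgment for each step is grounded in a computation rather than a single predicted scalar.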

JBHI Journal 2026 Journal Article

Multi-Channel Temporal Interference Retinal Stimulation Based on Reinforcement Learning

  • Xiayu Chen
  • Wennan Chan
  • Yingqiang Meng
  • Runze Liu
  • Yueyi Yu
  • Sheng Hu
  • Jijun Han
  • Xiaoxiao Wang

Retinal degenerative diseases such as age-related macular degeneration and retinitis pigmentosa cause severe vision impairment, while current electrical stimulation therapies are limited by poor spatial targeting precision. As a promising non-invasive alternative, the efficacy of temporal interference stimulation (TIS) for retinal targeting depends on optimized multi-electrode parameters. This study reconstructed a whole-head finite element model with detailed ocular structures and applied reinforcement learning (RL)-based multi-channel electrode parameter optimization to retinal stimulation. Systematic evaluation demonstrated that the focal precision of TIS improves with increasing channel numbers (consistent across all subject head models), with RL significantly outperforming conventional genetic algorithms (GA) and unsupervised neural networks (USNN) in focusing capability. Furthermore, by implementing the computationally intensive envelope calculation using the JAX framework, we achieved a nearly order-of-magnitude reduction in optimization time (to approx. 2 minutes per run on an RTX 4090D), significantly enhancing the practical feasibility of the proposed RL framework. This work provides a novel and computationally efficient methodology for precise non-invasive neuromodulation parameter optimization, applicable not only to retinal diseases but potentially to broader neurological conditions.
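Temporal interference stimulation relies on two high-frequency carriers whose sum is amplitude-modulated at their small frequency difference; it is this low-frequency envelope that the electrode optimization tries to focus. A minimal scalar sketch of that beat effect (frequencies and magnitudes are illustrative; the paper's actual envelope computation is vectorial, over a finite element head model, and implemented in JAX):

```python
import numpy as np

# Two carriers at f1 and f2 sum to a signal whose amplitude envelope
# oscillates at df = f2 - f1 (here 10 Hz), between |E1 - E2| and E1 + E2.
f1, f2 = 2000.0, 2010.0          # carrier frequencies in Hz
E1, E2 = 1.0, 0.6                # field magnitudes, arbitrary units
t = np.linspace(0.0, 0.2, 200_000)
s = E1 * np.cos(2 * np.pi * f1 * t) + E2 * np.cos(2 * np.pi * f2 * t)

# Peak of the combined signal reaches E1 + E2 when the carriers align,
# so the low-frequency modulation amplitude is 2 * min(E1, E2).
print(round(s.max(), 2))  # 1.6
```

Evaluating this envelope at every mesh node for every candidate electrode configuration is what makes the inner loop expensive, and why a vectorized JAX implementation yields the reported order-of-magnitude speedup.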

NeurIPS Conference 2025 Conference Paper

Bohdi: Heterogeneous LLM Fusion with Automatic Data Exploration

  • Junqi Gao
  • Zhichang Guo
  • Dazhi Zhang
  • Dong Li
  • Runze Liu
  • Pengfei Li
  • Kai Tian
  • Biqing Qi

Heterogeneous Large Language Model (LLM) fusion integrates the strengths of multiple source LLMs with different architectures into a target LLM with low computational overhead. While promising, existing methods suffer from two major limitations: 1) reliance on real data from limited domains for knowledge fusion, preventing the target LLM from fully acquiring knowledge across diverse domains, and 2) fixed data allocation proportions across domains, failing to dynamically adjust according to the target LLM's varying capabilities across domains, leading to a capability imbalance. To overcome these limitations, we propose Bohdi, a synthetic-data-only heterogeneous LLM fusion framework. Through the organization of knowledge domains into a hierarchical tree structure, Bohdi enables automatic domain exploration and multi-domain data generation through multi-model collaboration, thereby comprehensively extracting knowledge from source LLMs. By formalizing domain expansion and data sampling proportion allocation on the knowledge tree as a Hierarchical Multi-Armed Bandit problem, Bohdi leverages the designed DynaBranches mechanism to adaptively adjust sampling proportions based on the target LLM's performance feedback across domains. Integrated with our proposed Introspection-Rebirth (IR) mechanism, DynaBranches dynamically tracks capability shifts during the target LLM's updates via Sliding Window Binomial Likelihood Ratio Testing (SWBLRT), further enhancing its online adaptation capability. Comparative experimental results on a comprehensive suite of benchmarks demonstrate that Bohdi significantly outperforms existing baselines on multiple target LLMs, exhibits higher data efficiency, and virtually eliminates the imbalance in the target LLM's capabilities.
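A sliding-window binomial likelihood ratio test of the kind SWBLRT refers to can be sketched generically: compare the likelihood of recent pass/fail feedback under its empirical rate against a null rate, and flag a capability shift when the statistic crosses a chi-square threshold. This is a textbook binomial LRT under assumed parameters (window size, null rate, alpha), not the paper's exact SWBLRT specification:

```python
import math

def binom_llr(successes: int, n: int, p0: float) -> float:
    """Log-likelihood ratio of the MLE rate vs. a null rate p0 for binomial data."""
    if n == 0:
        return 0.0
    p_hat = successes / n
    def ll(q):
        q = min(max(q, 1e-12), 1 - 1e-12)   # guard log(0)
        return successes * math.log(q) + (n - successes) * math.log(1 - q)
    return ll(p_hat) - ll(p0)

# Sliding window of recent pass/fail feedback, tested against null rate 0.5.
# 2*LLR is asymptotically chi-square(1); ~3.84 is the alpha = 0.05 cutoff.
window = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
stat = 2 * binom_llr(sum(window), len(window), 0.5)
print(stat > 3.84)  # True: a shift from the null rate is detected
```

In an online setting the window slides over the newest feedback, so the test reacts to capability shifts during training rather than to the full history.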

IROS Conference 2025 Conference Paper

PCGE: Boosting 3D Visual Grounding via Progressive Comprehension and Geometric-topology Perception Enhancement

  • Zeyue Wang
  • Xixia Xu
  • Runze Liu
  • Dongchen Zhu
  • Jiamao Li

The 3D visual grounding task aims to establish correspondences between the 3D physical world and textual descriptions. Despite significant progress, several challenges remain. a) Scene-agnostic text reasoning causes misaligned target region concentration. b) Regional pseudo-center interference results in an inaccurate geometric center. c) Multi-modal features overemphasize semantics, leading to degraded geometric-topological perception for size regression. To address these issues, we propose a Progressive Comprehension and Geometric-topology Perception Enhancement (PCGE) one-stage framework, which decouples the task into keypoint estimation and size regression under textual constraints. Specifically, to enable coarse-to-fine keypoint estimation, we propose the STAR module to approximately localize the target region with a scene-specific reasoning mechanism, while the K2C module performs geometric calibration to alleviate pseudo-center bias. For size regression, we propose GTE to enhance geometric boundary perception during decoding, improving size regression by establishing topological matrices. Compared with previous methods, our approach achieves state-of-the-art performance on ScanRefer and Sr3D, with a 3.94% lead in Acc@0.50 on ScanRefer and a 3.7% lead on Sr3D.

AAAI Conference 2025 Conference Paper

RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors

  • Fengshuo Bai
  • Runze Liu
  • Yali Du
  • Ying Wen
  • Yaodong Yang

Evaluating deep reinforcement learning (DRL) agents against targeted behavior attacks is critical for assessing their robustness. These attacks aim to manipulate the victim into specific behaviors that align with the attacker’s objectives, often bypassing traditional reward-based defenses. Prior methods have primarily focused on reducing cumulative rewards; however, rewards are typically too generic to capture complex safety requirements effectively. As a result, focusing solely on reward reduction can lead to suboptimal attack strategies, particularly in safety-critical scenarios where more precise behavior manipulation is needed. To address these challenges, we propose RAT, a method designed for universal, targeted behavior attacks. RAT trains an intention policy that is explicitly aligned with human preferences, serving as a precise behavioral target for the adversary. Concurrently, an adversary manipulates the victim's policy to follow this target behavior. To enhance the effectiveness of these attacks, RAT dynamically adjusts the state occupancy measure within the replay buffer, allowing for more controlled and effective behavior manipulation. Our empirical results on robotic simulation tasks demonstrate that RAT outperforms existing adversarial attack algorithms in inducing specific behaviors. Additionally, RAT shows promise in improving agent robustness, leading to more resilient policies. We further validate RAT by guiding Decision Transformer agents to adopt behaviors aligned with human preferences in various MuJoCo tasks, demonstrating its effectiveness across diverse tasks.

NeurIPS Conference 2022 Conference Paper

Meta-Reward-Net: Implicitly Differentiable Reward Learning for Preference-based Reinforcement Learning

  • Runze Liu
  • Fengshuo Bai
  • Yali Du
  • Yaodong Yang

Setting up a well-designed reward function has been challenging for many reinforcement learning applications. Preference-based reinforcement learning (PbRL) provides a new framework that avoids reward engineering by leveraging human preferences (i.e., preferring apples over oranges) as the reward signal. Therefore, improving the efficiency with which preference data is used becomes critical. In this work, we propose Meta-Reward-Net (MRN), a data-efficient PbRL framework that incorporates bi-level optimization for both reward and policy learning. The key idea of MRN is to adopt the performance of the Q-function as the learning target. Based on this, MRN learns the Q-function and the policy in the inner level while updating the reward function adaptively according to the performance of the Q-function on the preference data in the outer level. Our experiments on simulated robotic manipulation and locomotion tasks demonstrate that MRN outperforms prior methods given few preference labels and significantly improves data efficiency, achieving state-of-the-art results in preference-based RL. Ablation studies further demonstrate that MRN learns a more accurate Q-function compared to prior work and shows clear advantages when only a small amount of human feedback is available. The source code and videos of this project are released at https://sites.google.com/view/meta-reward-net.

TIST Journal 2020 Journal Article

STARS

  • Rui Liu
  • Runze Liu
  • Andrea Pugliese
  • V. S. Subrahmanian

Customers of virtually all online marketplaces rely upon reviews in order to select the product or service they wish to buy. These marketplaces in turn deploy review fraud detection systems so that the integrity of reviews is preserved. A well-known problem with review fraud detection systems is their underlying assumption that the majority of reviews are honest: this assumption leads to a vulnerability where an attacker can try to generate many fake reviews of a product. In this article, we consider the case where a company wishes to fraudulently promote its product through fake reviews and propose the Sockpuppet-based Targeted Attack on Reviewing Systems (STARS for short). STARS enables an attacker to enter fake reviews for a product from multiple, apparently independent, sockpuppet accounts. We show that the STARS attack enables companies to successfully promote their product against seven recent, well-known review fraud detectors on four datasets (Amazon, Epinions, and the BitcoinAlpha and OTC exchanges) by significant margins. To protect against the STARS attack, we propose a new fraud detection algorithm called RTV. RTV introduces a new class of users (called trusted users) and also considers reviews left by verified users, which were not considered by existing review fraud detectors. We show that RTV significantly mitigates the impact of the STARS attack across the four datasets listed above.