Arrow Research · Search

Author name cluster

Zeyu Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers (8)

AAAI Conference 2026 · Conference Paper

InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization

  • Yuhang Liu
  • Zeyu Liu
  • Shuanghe Zhu
  • Pengxiang Li
  • Congkai Xie
  • Jiasheng Wang
  • Xueyu Hu
  • Xiaotian Han

The emergence of Multimodal Large Language Models (MLLMs) has propelled the development of autonomous agents that operate on Graphical User Interfaces (GUIs) using pure visual input. A fundamental challenge is robustly grounding natural language instructions. This requires a precise spatial alignment, which accurately locates the coordinates of each element, and, more critically, a correct semantic alignment, which matches the instructions to the functionally appropriate UI element. Although Reinforcement Learning with Verifiable Rewards (RLVR) has proven to be effective at improving spatial alignment for these MLLMs, we find that inefficient exploration bottlenecks semantic alignment, which prevents models from learning difficult semantic associations. To address this exploration problem, we present Adaptive Exploration Policy Optimization (AEPO), a new policy optimization framework. AEPO employs a multi-answer generation strategy to enforce broader exploration, which is then guided by a theoretically grounded Adaptive Exploration Reward (AER) function derived from first principles of efficiency η=U/C. Our AEPO-trained models, InfiGUI-G1-3B and InfiGUI-G1-7B, establish new state-of-the-art results across multiple challenging GUI grounding benchmarks, achieving significant relative improvements of up to 9.0% against the naive RLVR baseline on benchmarks designed to test generalization and semantic understanding.
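To make the efficiency idea above concrete, here is a minimal sketch of a multi-answer, verifiable-reward setup in the spirit of η = U/C. The function name, inputs, and the exact utility and cost definitions are assumptions for illustration only, not the authors' released AER implementation.

```python
from typing import List, Tuple

def adaptive_exploration_reward(
    predictions: List[Tuple[float, float]],          # sampled (x, y) click points for one instruction
    target_box: Tuple[float, float, float, float],   # ground-truth element box (x1, y1, x2, y2)
) -> float:
    """Hypothetical efficiency-style reward: utility U (did any sampled answer land
    inside the target element?) divided by cost C (number of sampled answers).
    This mirrors the eta = U / C idea from the abstract, not the paper's exact AER."""
    x1, y1, x2, y2 = target_box
    hits = [x1 <= x <= x2 and y1 <= y <= y2 for x, y in predictions]
    utility = 1.0 if any(hits) else 0.0   # verifiable reward: exploration found the element
    cost = max(len(predictions), 1)       # more sampled answers means higher exploration cost
    return utility / cost

# Example: three candidate clicks, one lands in the target element -> reward 1/3.
print(adaptive_exploration_reward([(10, 10), (52, 40), (90, 95)], (50, 30, 60, 50)))
```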

NeurIPS Conference 2025 · Conference Paper

ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models

  • Liyan Tang
  • Grace Kim
  • Xinyu Zhao
  • Thom Lake
  • Wenxuan Ding
  • Fangcong Yin
  • Prasann Singhal
  • Manya Wadhwa

Chart understanding presents a unique challenge for large vision-language models (LVLMs), as it requires the integration of sophisticated textual and visual reasoning capabilities. However, current LVLMs exhibit a notable imbalance between these skills, falling short on visual reasoning that is difficult to perform in text. We conduct a case study using a synthetic dataset solvable only through visual reasoning and show that model performance degrades significantly with increasing visual complexity, while human performance remains robust. We then introduce ChartMuseum, a new Chart Question Answering (QA) benchmark containing 1,162 expert-annotated questions spanning multiple reasoning types, curated from real-world charts across 184 sources, specifically built to evaluate complex visual and textual reasoning. Unlike prior chart understanding benchmarks, where frontier models perform similarly and near saturation, our benchmark exposes a substantial gap between model and human performance while effectively differentiating model capabilities: although humans achieve 93% accuracy, the best-performing model Gemini-2.5-Pro attains only 63.0%, and the leading open-source LVLM Qwen2.5-VL-72B-Instruct achieves only 38.5%. Moreover, on questions requiring primarily visual reasoning, all models experience a 35%-55% performance drop relative to their performance on text-reasoning-heavy questions. Lastly, our qualitative error analysis reveals specific categories of visual reasoning that are challenging for current LVLMs. Both ChartMuseum and the evaluation code are available at https://github.com/Liyan06/ChartMuseum.

IROS Conference 2025 · Conference Paper

EIC Framework for Hand Exoskeletons Based on a Multimodal Large Language Model

  • Houcheng Li
  • Zhenchan Su
  • Honglei Guo
  • Yifan Wang
  • Zeyu Liu
  • Long Cheng

Current hand exoskeleton interaction methods primarily focus on recognizing a limited range of hand motion intentions and rely on pre-programmed control to execute predefined commands. However, these approaches face significant limitations when confronted with unanticipated or non-predefined scenarios, such as performing various gestures or grasping different objects. To address this challenge, this paper proposes an embodied interaction control (EIC) framework for hand exoskeletons based on a multimodal large language model (MLLM). First, an embodied interaction method leveraging multimodal fusion of speech and image information is developed, enabling more intuitive, hands-free, accurate, and robust human-robot interaction. Using the multimodal data, the MLLM infers the user's hand motion intentions and generates corresponding motion plans for the exoskeleton. The underlying control strategy then executes the planned motion. Notably, leveraging the advanced reasoning and code-generation capabilities of MLLMs, the framework can generate gestures and grasping actions that were not predefined. Finally, experimental results validate the effectiveness and generalizability of the EIC framework.
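A rough sketch of the speech-plus-image pipeline described above, assuming generic interfaces; the MLLM wrapper and exoskeleton controller below are hypothetical placeholders, not the paper's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class MotionPlan:
    gesture: str                  # e.g. "pinch" or "cylindrical grasp"
    joint_angles: List[float]     # target angles for the exoskeleton's actuated joints

def infer_and_execute(
    speech_text: str,
    image_bytes: bytes,
    mllm: Callable[[str, bytes], MotionPlan],            # hypothetical multimodal-LLM wrapper
    send_to_exoskeleton: Callable[[List[float]], None],  # hypothetical low-level controller
) -> MotionPlan:
    """Sketch of the speech+image -> intention -> motion plan -> execution flow
    from the abstract; every interface here is an assumption, not the paper's code."""
    plan = mllm(speech_text, image_bytes)    # MLLM fuses both modalities and infers intent
    send_to_exoskeleton(plan.joint_angles)   # underlying control strategy executes the plan
    return plan

# Toy usage with stand-in callables (no real MLLM or hardware involved):
demo_mllm = lambda text, img: MotionPlan(gesture="pinch", joint_angles=[0.6, 0.4, 0.1])
infer_and_execute("help me pinch the small cube", b"<jpeg bytes>", demo_mllm, print)
```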

ICML Conference 2025 · Conference Paper

Local Identifying Causal Relations in the Presence of Latent Variables

  • Zheng Li
  • Zeyu Liu
  • Feng Xie 0002
  • Hao Zhang 0079
  • Chunchen Liu
  • Zhi Geng

We tackle the problem of identifying whether a variable is the cause of a specified target using observational data. State-of-the-art causal learning algorithms that handle latent variables typically rely on identifying the global causal structure, often represented as a partial ancestral graph (PAG), to infer causal relationships. Although effective, these approaches are often redundant and computationally expensive when the focus is limited to a specific causal relationship. In this work, we introduce novel local characterizations that are necessary and sufficient for various types of causal relationships between two variables, enabling us to bypass the need for global structure learning. Leveraging these local insights, we develop efficient and fully localized algorithms that accurately identify causal relationships from observational data. We theoretically demonstrate the soundness and completeness of our approach. Extensive experiments on benchmark networks and real-world datasets further validate the effectiveness and efficiency of our method.

TMLR Journal 2025 · Journal Article

When SNN meets ANN: Error-Free ANN-to-SNN Conversion for Extreme Edge Efficiency

  • Gourav Datta
  • Zeyu Liu
  • James Diffenderfer
  • Bhavya Kailkhura
  • Peter Anthony Beerel

Spiking Neural Networks (SNN) are now demonstrating comparable accuracy to convolutional neural networks (CNN), thanks to advanced ANN-to-SNN conversion techniques, all while delivering remarkable energy and latency efficiency when deployed on neuromorphic hardware. However, these conversion techniques incur a large number of time steps and, consequently, high spiking activity. In this paper, we propose a novel ANN-to-SNN conversion framework that requires an exponentially lower number of time steps than existing conversion approaches. Our framework modifies the standard integrate-and-fire (IF) neuron model used in SNNs with no change in computational complexity and shifts the bias term of each batch normalization (BN) layer in the trained ANN. To reduce spiking activity, we propose training the source ANN with a fine-grained $\ell_1$ regularizer with surrogate gradients that encourages high spike sparsity in the converted SNN. Our framework thus yields lossless SNNs with low latency and low compute energy, thanks to the few time steps and high spike sparsity, while maintaining high test accuracy, for example, 75.12% with only 4 time steps on the ImageNet dataset. Code is available at https://github.com/godatta/SNN_meets_ANN.
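For readers unfamiliar with the baseline the abstract modifies, here is a minimal integrate-and-fire simulation in NumPy. It illustrates the standard IF dynamics and the time-step versus spike-count trade-off, and deliberately does not attempt the paper's modified neuron or BN bias shift.

```python
import numpy as np

def if_neuron_run(input_current: np.ndarray, threshold: float = 1.0) -> np.ndarray:
    """Simulate standard integrate-and-fire neurons over T time steps.
    input_current has shape (T, n_neurons); returns a binary spike train of the same shape.
    This is the textbook IF model the abstract starts from, not the paper's variant."""
    T, n = input_current.shape
    membrane = np.zeros(n)
    spikes = np.zeros((T, n))
    for t in range(T):
        membrane += input_current[t]      # integrate incoming current
        fired = membrane >= threshold     # fire wherever the threshold is crossed
        spikes[t] = fired
        membrane[fired] -= threshold      # soft reset: subtract the threshold
    return spikes

# Fewer time steps and sparser spike trains mean fewer synaptic operations, which is the
# energy argument above; the abstract's fine-grained L1 penalty pushes sparsity further.
rng = np.random.default_rng(0)
out = if_neuron_run(rng.uniform(0, 0.6, size=(4, 3)))
print(out.sum(axis=0))  # spike counts per neuron over 4 time steps
```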

IROS Conference 2024 · Conference Paper

FOCWS: A High Sensitive Flexible Optical Curvature Sensor Inspired by Arthropod Sensory Systems

  • Jiachen Wei
  • Zhengwei Li
  • Zeyu Liu
  • Wei He
  • Long Cheng
  • Yanhong Liu

Flexible sensors for joint angle measurement play a crucial role in various human-robot interaction applications. In previous studies, sensors with various sensing mechanisms have been developed. Among them, optical waveguide sensors exhibit high resistance to environmental factors (such as temperature and humidity) and low sensitivity to electromagnetic interference. Researchers have enhanced the sensitivity of optical waveguide sensors to tensile strain by doping other substances (such as graphite) into the core material of the optical waveguide. However, the sensitivity of measuring joint angles based on tensile strain principles remains relatively low. In nature, arthropods use crack-like structures near their leg joints to perceive minute changes in mechanical stress. Here, we propose a curvature sensor based on a Flexible Optical Crack Waveguide Structure (FOCWS) inspired by these arthropod sensory systems. By cutting the optical core, we increase its light power loss during bending strain, thereby enhancing the sensor's sensitivity for angle measurement. The characteristics of light propagation and geometric parameters were studied through simulation, and experiments were designed to validate the simulation results. The average sensitivity is 0.068 dB/°, nearly 300 times higher than that of an uncut optical waveguide.
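As a back-of-the-envelope illustration of the reported 0.068 dB/° sensitivity, the snippet below inverts an assumed linear loss-angle relationship; real calibration curves for such sensors are typically nonlinear, so this is illustrative only.

```python
def angle_from_loss(loss_db: float, sensitivity_db_per_deg: float = 0.068) -> float:
    """Idealized linear readout: estimate the bend angle from measured optical power loss,
    using the average sensitivity quoted in the abstract (0.068 dB per degree).
    Assumes a purely linear response, which is a simplification for illustration."""
    return loss_db / sensitivity_db_per_deg

print(angle_from_loss(3.4))  # ~50 degrees of bending for 3.4 dB of loss
```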

IROS Conference 2023 · Conference Paper

A Two-Dimensional Reticular Core Optical Waveguide Sensor for Tactile and Positioning Sensing

  • Zeyu Liu
  • Zhengwei Li
  • Long Cheng

Tactile sensors based on optical waveguides are highly sensitive to pressure, possess good chemical inertness and electromagnetic resistance, and are unaffected by temperature changes in the surrounding environment. Researchers have developed various waveguide structures with multi-level cores to simultaneously measure tactile forces and positions. However, these designs result in thicker waveguides and reduced sensitivity in the lower levels. This study introduces a two-dimensional reticular core optical waveguide for tactile force and positioning sensing, in which perpendicular waveguides intersect each other. The reticular core reduces waveguide thickness and simplifies the fabrication process. Simulations investigate the characteristics of light propagation and the geometric parameters. Experimental results confirm the proposed reticular waveguide's force-sensing capability, with an average sensitivity of 0.36 dB/N. Compared to the split-level structure, the reticular waveguide demonstrates more consistent sensitivities along the two shear directions. Using a deep neural network for position estimation, the sensor achieves a spatial resolution of approximately 0.72 mm along the X-axis and 1.14 mm along the Y-axis, outperforming the split-level structure.
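The position-estimation step described above can be sketched as a small regression network mapping per-channel optical losses to a contact position. The channel count, network size, and synthetic data below are assumptions for illustration, not details taken from the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical setup: each sample is the per-channel optical power loss (dB) read from the
# crossing cores of the reticular waveguide, and the label is the (x, y) contact position in mm.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))            # 8 waveguide channels (assumed)
y = rng.uniform(0, 20, size=(200, 2))    # contact position in a 20 mm x 20 mm area (assumed)

# Small multilayer perceptron standing in for the paper's unspecified deep network.
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
model.fit(X, y)
print(model.predict(X[:1]))              # predicted (x, y) for the first reading
```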