Arrow Research search

Author name cluster

Shuai Ren

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
1 author row

Possible papers

5

AAAI Conference 2026 Conference Paper

UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning

  • Zhengxi Lu
  • Yuxiang Chai
  • Yaxuan Guo
  • Xi Yin
  • Liang Liu
  • Hao Wang
  • Han Xiao
  • Shuai Ren

The recent DeepSeek-R1 has showcased the emergence of reasoning capabilities in large language models (LLMs) through reinforcement learning (RL) with rule-based rewards. Despite its success in language tasks, its application in multimodal domains, particularly in graphical user interface (GUI) agent tasks, remains under-explored. To address this gap, we propose UI-R1, the first framework to investigate how rule-based RL can enhance the reasoning capabilities of multimodal large language models (MLLMs) for GUI action prediction tasks. UI-R1 introduces a novel rule-based action reward scheme, enabling model optimization via policy-based algorithms such as Group Relative Policy Optimization (GRPO). To further improve efficiency at inference time, we present UI-R1-Efficient, a two-stage training paradigm that reduces reasoning length while boosting overall performance. In addition, we construct a compact yet high-quality dataset containing 2K challenging tasks across five prevalent mobile device action types. Experiments show that our proposed models (e.g., UI-R1-3B) achieve substantial improvements over the base model (Qwen2.5-VL-3B) on both in-domain (ID) and out-of-domain (OOD) tasks, with average accuracy gains of 18.3% on ScreenSpot, 6.0% on ScreenSpot-Pro, and 10.9% on AndroidControl. Moreover, our efficient versions deliver competitive performance compared to considerably larger state-of-the-art models, underscoring the potential of reinforcement learning to advance GUI control and paving the way for future research in Human-Computer Interaction (HCI).

JBHI Journal 2025 Journal Article

Design of a Multi-Parameter Fusion Sensor and System for Respiratory Monitoring of Mechanically Ventilated Patients in the ICU

  • Shuai Ren
  • Xiaohan Wang
  • Maolin Cai
  • Yan Shi
  • Tao Wang
  • Zujin Luo

To achieve precise respiratory therapy for mechanically ventilated patients, real-time monitoring of the state parameters of inhaled and exhaled gases is required. These parameters are primarily measured by ventilators, which have limitations such as insufficient monitoring parameters, circuit leaks, and constraints imposed by distance and obstacles. This paper presents a low-power wireless sensor for multi-parameter monitoring near the patient, which can operate continuously for approximately 60 days. Based on this sensor, an intelligent respiratory monitoring system with a distributed architecture is proposed to achieve intelligent patient-ventilator asynchrony (PVA) perception. Experimental results show that the system can stably and accurately collect and transmit data, with measurement errors for pressure, flow, temperature, humidity, and CO₂ concentration of ±1.3%, ±2.1%, ±0.6 °C, ±1% RH, and ±0.3 mmHg, respectively. The proposed sensor and system have the potential to significantly enhance the efficiency and intelligence of medical care.

JBHI Journal 2025 Journal Article

Improving Patient-Ventilator Synchrony During Pressure Support Ventilation Based on Reinforcement Learning Algorithm

  • Liming Hao
  • Xiaohan Wang
  • Shuai Ren
  • Yan Shi
  • Maolin Cai
  • Tao Wang
  • Zujin Luo

Mechanical ventilation is an effective treatment for critically ill patients and those with pulmonary diseases. However, patient-ventilator asynchrony (PVA) remains a significant challenge, potentially leading to high mortality. Improving patient-ventilator synchrony poses a complex decision-making problem in clinical practice. Traditional methods rely heavily on clinicians' experience, often resulting in inefficiencies, delayed ventilator adjustments, and resource shortages. This paper proposes a novel approach using a deep reinforcement learning (RL) algorithm based on deep Q-learning (DQN) to enhance patient-ventilator synchrony during pressure support ventilation. The action space and reward function are established from clinical experience, and a pneumatic model of the mechanical ventilation system is constructed to simulate various patient conditions and types of PVAs. Clinical data are used to evaluate the RL algorithm qualitatively and quantitatively. The RL-optimized ventilation strategy reduces the proportion of breaths containing PVAs from 37.52% to 7.08%, demonstrating its effectiveness in assisting clinical decision-making, improving synchrony, and enabling intelligent ventilator control, bedside monitoring, and automatic weaning.

TMLR Journal 2025 Journal Article

LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects

  • Guangyi Liu
  • Pengxiang Zhao
  • Yaozhen Liang
  • Liang Liu
  • Yaxuan Guo
  • Han Xiao
  • Weifeng Lin
  • Yuxiang Chai

With the rapid rise of large language models (LLMs), phone automation has undergone transformative changes. This paper systematically reviews LLM-driven phone GUI agents, highlighting their evolution from script-based automation to intelligent, adaptive systems. We first contextualize key challenges, namely (i) limited generality, (ii) high maintenance overhead, and (iii) weak intent comprehension, and show how LLMs address these issues through advanced language understanding, multimodal perception, and robust decision-making. We then propose a taxonomy covering fundamental agent frameworks (single-agent, multi-agent, plan-then-act), modeling approaches (prompt engineering, training-based), and essential datasets and benchmarks. Furthermore, we detail task-specific architectures, supervised fine-tuning, and reinforcement learning strategies that bridge user intent and GUI operations. Finally, we discuss open challenges such as dataset diversity, on-device deployment efficiency, user-centric adaptation, and security concerns, offering forward-looking insights into this rapidly evolving field. By providing a structured overview and identifying pressing research gaps, this paper serves as a definitive reference for researchers and practitioners seeking to harness LLMs in designing scalable, user-friendly phone GUI agents. The collection of papers reviewed in this survey will be hosted and regularly updated on the GitHub repository: https://github.com/PhoneLLM/Awesome-LLM-Powered-Phone-GUI-Agents

NeurIPS Conference 2025 Conference Paper

UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

  • Han Xiao
  • Guozhi Wang
  • Yuxiang Chai
  • Zimu Lu
  • Weifeng Lin
  • Hao He
  • Lue Fan
  • Liuyang Bian

In this paper, we introduce UI-Genie, a self-improving framework addressing two key challenges in GUI agents: verifying trajectory outcomes is difficult, and high-quality training data are not scalable. These challenges are addressed by a reward model and a self-improving pipeline, respectively. The reward model, UI-Genie-RM, features an image-text interleaved architecture that efficiently processes historical context and unifies action-level and task-level rewards. To support the training of UI-Genie-RM, we develop deliberately designed data generation strategies, including rule-based verification, controlled trajectory corruption, and hard negative mining. To address the second challenge, a self-improvement pipeline progressively expands the set of solvable complex GUI tasks by enhancing both the agent and reward models through reward-guided exploration and outcome verification in dynamic environments. For model training, we generate UI-Genie-RM-517k and UI-Genie-Agent-16k, establishing the first reward-specific dataset for GUI agents while demonstrating high-quality synthetic trajectory generation without manual annotation. Experimental results show that UI-Genie achieves state-of-the-art performance across multiple GUI agent benchmarks with three generations of data-model self-improvement. We open-source our complete framework implementation and generated datasets to facilitate further research at https://github.com/Euphoria16/UI-Genie.