Arrow Research search

Author name cluster

Peng Yan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
2 author rows

Possible papers

6

JAIR Journal 2026 Journal Article

A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions

  • Pascal J. Sager
  • Benjamin Meyer
  • Peng Yan
  • Rebekka von Wartburg-Kottler
  • Layan Etaiwi
  • Aref Enayati
  • Gabriel Nobel
  • Ahmed Abdulkadir

Background: Agents for computer use (ACUs) are systems that execute complex tasks on digital devices, such as personal computers or mobile phones, given instructions in natural language. These agents automate tasks by controlling software through low-level actions like mouse clicks and touchscreen gestures. However, despite rapid progress, ACUs are not yet mature for everyday use.

Objectives: This survey examines the current state of the art, identifies trends, and points out research gaps in the development of practical ACUs. The goal is to provide a comprehensive review and analysis that helps advance general-purpose, robust, and scalable agents for real-world computer use.

Methods: We introduce a multifaceted taxonomy of ACUs across three dimensions: (I) the domain perspective, characterizing the contexts in which agents operate; (II) the interaction perspective, describing observation modalities (e.g., screenshots, HTML) and action modalities (e.g., mouse, keyboard, code execution); and (III) the agent perspective, detailing how agents perceive, reason, and learn. We review 87 original research papers on ACUs and 33 relevant datasets, covering both foundation-model-based and specialized approaches.

Results: Our taxonomy comprehensively structures state-of-the-art approaches and establishes the groundwork for guiding future ACU research. We identify three trends: a transition from specialized agents toward foundation-model-based agents, a shift from text-based to image-based observation spaces, and increasing adoption of behavior cloning. Furthermore, we identify six key research gaps: insufficient generalization, inefficient learning, limited planning, low task complexity in benchmarks, non-standardized evaluation, and a disconnect between research and practical conditions.

Conclusions: To continue rapid improvements in the field, we recommend focusing on: (a) vision-based observations and low-level control to enhance generalization; (b) adaptive learning beyond static prompting; (c) effective planning and reasoning capabilities; (d) realistic, high-complexity benchmarks; (e) standardized evaluation criteria based on task success; and (f) aligning agent design with real-world deployment constraints. Collectively, our findings and proposed directions help develop more general-purpose agents for everyday digital tasks.
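The survey's "interaction perspective" (observation modalities in, low-level actions out) can be pictured as a minimal observe-act loop. This sketch is illustrative only; all class and method names (`Observation`, `Action`, `ComputerUseAgent`, `run_episode`) are assumptions, not interfaces from the survey.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol

@dataclass
class Observation:
    screenshot_png: bytes       # image-based observation modality
    html: Optional[str] = None  # optional text-based modality

@dataclass
class Action:
    kind: str                   # low-level control, e.g. "click", "type", "scroll"
    x: int = 0
    y: int = 0
    text: str = ""

class ComputerUseAgent(Protocol):
    """Anything that maps an observation of the device to an action."""
    def act(self, obs: Observation) -> Action: ...

class ClickOriginAgent:
    """Trivial stand-in policy: always clicks the screen origin."""
    def act(self, obs: Observation) -> Action:
        return Action(kind="click", x=0, y=0)

def run_episode(agent: ComputerUseAgent, steps: int = 3) -> List[Action]:
    """Drive the observe-act loop against a dummy environment."""
    trace = []
    for _ in range(steps):
        obs = Observation(screenshot_png=b"\x89PNG")  # placeholder screenshot
        trace.append(agent.act(obs))
    return trace
```

A real ACU would replace `ClickOriginAgent` with a foundation-model policy and feed actions back into an actual device environment; the survey's point is that this narrow observation/action interface is what makes agents generalize across applications.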

AAAI Conference 2026 Conference Paper

Multi-Aspect Cross-modal Quantization for Generative Recommendation

  • Fuwei Zhang
  • Xiaoyu Liu
  • Dongbo Xi
  • Jishen Yin
  • Huan Chen
  • Peng Yan
  • Fuzhen Zhuang
  • Zhao Zhang

Generative Recommendation (GR) has emerged as a new paradigm in recommender systems. This approach relies on quantized representations to discretize item features, modeling users’ historical interactions as sequences of discrete tokens. Based on these tokenized sequences, GR predicts the next item via next-token prediction. The challenges of GR lie in constructing high-quality semantic identifiers (IDs) that are hierarchically organized, minimally conflicting, and conducive to effective generative model training. However, current approaches remain limited in their ability to harness multimodal information and to capture the deep and intricate interactions among diverse modalities, both of which are essential for learning high-quality semantic IDs and for effectively training GR models. To address this, we propose Multi-Aspect Cross-modal quantization for generative Recommendation (MACRec), which introduces multimodal information and incorporates it into both semantic ID learning and generative model training from different aspects. Specifically, we first introduce cross-modal quantization during the ID learning process, which effectively reduces conflict rates and thus improves codebook usability through the complementary integration of multimodal information. In addition, to further enhance the generative ability of our GR model, we incorporate multi-aspect cross-modal alignments, including both implicit and explicit alignments. Finally, we conduct extensive experiments on three well-known recommendation datasets to demonstrate the effectiveness of our proposed method.
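The abstract does not specify MACRec's exact quantizer, but the "hierarchically organized semantic IDs" it mentions are commonly built by residual quantization: each codebook level quantizes the residual left by the previous level, so every item becomes a short tuple of discrete tokens. The sketch below shows only that generic idea, under the assumption of a fused multimodal embedding per item; all names and sizes are illustrative.

```python
import random

def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)))

def residual_quantize(vec, codebooks):
    """Hierarchical semantic ID: one token per codebook level.
    Each level quantizes the residual left by the previous level."""
    residual = list(vec)
    tokens = []
    for cb in codebooks:
        idx = nearest(cb, residual)
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return tuple(tokens)

# Toy setup: 3 codebook levels of 8 codewords over 4-dim embeddings.
random.seed(0)
dim, levels, size = 4, 3, 8
codebooks = [[[random.gauss(0, 1) for _ in range(dim)] for _ in range(size)]
             for _ in range(levels)]
item_embedding = [0.5, -1.2, 0.3, 2.0]  # stand-in for fused text+image features
semantic_id = residual_quantize(item_embedding, codebooks)  # e.g. a 3-token ID
```

In a trained system the codebooks would be learned jointly with the encoder (and, in MACRec's case, across modalities to reduce ID conflicts); here they are random purely to make the mechanics runnable.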

IROS Conference 2025 Conference Paper

An Inflatable Deployable Origami Grasper for Adaptive and High-Load Grasping

  • Peng Yan
  • Guang Liang
  • Sen Wang
  • Hailin Huang
  • Wei Wang
  • Xu Li
  • Bing Li

Robotic graspers are essential for enhancing the efficiency and versatility of robots in grasping tasks. In this paper, we propose a novel inflatable deployable origami grasper with a rigid-flexible coupling structure. The proposed grasper can achieve multiple deployment configurations under a single pneumatic actuation, enabling both deployment and grasping operations while also allowing for passive self-folding during deflation. The design and fabrication of the grasper are presented. Then, the stiffness model for the inflatable deployable origami unit is developed based on the equivalent truss method. Experimental results show that the grasper successfully grasps objects of various shapes and sizes in both enveloping and fingertip grasping modes, using either two or four fingers. With its simple mechanical system and high deploy/fold ratio, the proposed grasper holds significant potential for applications in industrial automation and space exploration.

AAAI Conference 2025 Conference Paper

Self-Evolutionary Large Language Models Through Uncertainty-Enhanced Preference Optimization

  • Jianing Wang
  • Yang Zhou
  • Xiaocheng Zhang
  • Mengjiao Bao
  • Peng Yan

Iterative preference optimization has recently become one of the de facto training paradigms for large language models (LLMs), but performance remains underwhelming due to the large amount of noisy preference data produced in the loop. To combat this issue, we present an Uncertainty-enhanced Preference Optimization (UPO) framework that enables the LLM to self-evolve with reliable feedback. The key idea is to mitigate noisy preference pairs derived from the current policy and reward models by performing pair-wise uncertainty estimation and judiciously sampling reliable feedback. To this end, we introduce an estimator model that incorporates Monte Carlo (MC) dropout in a Bayesian neural network (BNN) to perform uncertainty estimation over batches of preference pairs. Compared with existing methods that directly filter generated responses based on the reward score, the estimator focuses on model uncertainty in a pair-wise manner and effectively bypasses the reward model's confirmation bias problem. Additionally, we propose an uncertainty-enhanced self-evolution algorithm that helps the LLM align robustly with this reliable feedback data. Extensive experiments over multiple benchmarks demonstrate that our framework substantially improves the performance of iterative preference optimization.
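The core mechanism, MC-dropout uncertainty over preference pairs, can be sketched without the paper's learned estimator: score each (chosen, rejected) pair many times with stochastic dropout-style masking, and keep only pairs the scorer is consistently confident about. In the paper the estimator is a learned Bayesian network; here a fixed linear scorer with a random feature mask stands in, and all names and thresholds are assumptions.

```python
import random

# Toy "estimator": a reward score over 8 features with all-positive weights.
WEIGHTS = [0.9, 0.2, 0.5, 0.1, 0.7, 0.4, 0.3, 0.6]

def score(features, mask):
    """Linear reward with a dropout-style mask over the features."""
    return sum(w * f * m for w, f, m in zip(WEIGHTS, features, mask))

def win_rate(pair, n_samples=50, p_drop=0.3):
    """MC-dropout-style estimate: repeated stochastic forward passes,
    returning the fraction in which the 'chosen' response wins."""
    chosen, rejected = pair
    wins = 0
    for _ in range(n_samples):
        mask = [0.0 if random.random() < p_drop else 1.0 for _ in WEIGHTS]
        if score(chosen, mask) > score(rejected, mask):
            wins += 1
    return wins / n_samples

def keep_reliable(pairs, tau=0.9):
    """Discard preference pairs the estimator is unsure about:
    keep only those with a near-unanimous verdict either way."""
    return [p for p in pairs if (r := win_rate(p)) >= tau or r <= 1 - tau]
```

Pairs whose win rate hovers near 0.5 across dropout samples are exactly the noisy feedback UPO filters out before each preference-optimization round.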

IJCAI Conference 2023 Conference Paper

Dual Personalization on Federated Recommendation

  • Chunxu Zhang
  • Guodong Long
  • Tianyi Zhou
  • Peng Yan
  • Zijian Zhang
  • Chengqi Zhang
  • Bo Yang

Federated recommendation is a new Internet service architecture that aims to provide privacy-preserving recommendation services in federated settings. Existing solutions typically combine distributed recommendation algorithms with privacy-preserving mechanisms, which inherently yields heavyweight models at the server and hinders the deployment of on-device intelligent models to end users. This paper proposes a novel Personalized Federated Recommendation (PFedRec) framework that learns many user-specific lightweight models to be deployed on smart devices, rather than a single heavyweight model on a server. Moreover, we propose a new dual personalization mechanism to effectively learn fine-grained personalization over both users and items. The overall learning process is formulated as a unified federated optimization framework. Specifically, unlike previous methods that share exactly the same item embeddings across users in a federated system, dual personalization allows mild fine-tuning of item embeddings for each user, generating user-specific views of item representations that can be integrated into existing federated recommendation methods for immediate improvements. Experiments on multiple benchmark datasets demonstrate the effectiveness of PFedRec and the dual personalization mechanism. Moreover, we provide visualizations and in-depth analysis of the personalization techniques in item embedding, which offer new insights into the design of recommender systems in federated settings. The code is available.
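The deployment pattern the abstract describes, private user embeddings that never leave the device plus mildly fine-tuned local views of shared item embeddings aggregated server-side, can be sketched as a toy federated round. PFedRec's actual objective and update rules are in the paper; the "gradient step" below is a stand-in, and all names (`Client`, `server_round`) are illustrative.

```python
import random

class Client:
    """One user device: a private user embedding plus a mildly
    fine-tuned local view of the shared item embeddings."""
    def __init__(self, dim, seed):
        rng = random.Random(seed)
        self.user = [rng.gauss(0, 0.1) for _ in range(dim)]  # stays on-device

    def local_update(self, global_items, lr=0.1):
        # Stand-in for a gradient step on the local loss: mildly nudge
        # each shared item embedding toward this user's taste vector,
        # producing a user-specific view of the item representations.
        return [[g + lr * u for g, u in zip(item, self.user)]
                for item in global_items]

def server_round(clients, global_items):
    """FedAvg-style aggregation of item embeddings only; the
    user embeddings never leave the devices."""
    updates = [c.local_update(global_items) for c in clients]
    n_items, dim, n = len(global_items), len(global_items[0]), len(updates)
    return [[sum(u[i][d] for u in updates) / n for d in range(dim)]
            for i in range(n_items)]

# One toy round: 3 clients, 2 items, 2-dim embeddings.
clients = [Client(dim=2, seed=s) for s in range(3)]
items = [[0.0, 0.0], [1.0, 1.0]]        # shared global item embeddings
items = server_round(clients, items)    # aggregated after local fine-tuning
```

The privacy property follows from the communication pattern: only the (already averaged) item-embedding updates cross the network, while each lightweight user model remains local.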