Arrow Research search

Author name cluster

Jiayi Pan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers

8

AAAI Conference 2026 Conference Paper

SpecDiff: Accelerating Diffusion Model Inference with Self-Speculation

  • Jiayi Pan
  • Jiaming Xu
  • Yongkang Zhou
  • Guohao Dai

Feature caching has recently emerged as a promising method for diffusion model acceleration. It effectively alleviates the inefficiency caused by high computational requirements by caching similar features during diffusion model inference. In this paper, we analyze existing feature caching methods from the perspective of information utilization and point out that relying solely on historical information leads to constrained accuracy and speed. We therefore propose a novel paradigm that introduces future information via self-speculation, based on the information similarity at the same time step across different iteration times. Building on this paradigm, we present SpecDiff, a training-free multi-level feature caching strategy that includes a cached feature selection algorithm and a multi-level feature classification algorithm. (1) Feature selection based on self-speculative information: SpecDiff computes a dynamic importance score for each token from self-speculative and historical information, and selects features to cache according to this score. (2) Multi-level feature classification based on importance scores: SpecDiff classifies tokens by the differences in their feature importance scores and introduces a multi-level feature calculation strategy. Extensive experiments show that SpecDiff achieves average speedups of 2.80×, 2.74×, and 3.17× with negligible quality loss on Stable Diffusion 3, Stable Diffusion 3.5, and FLUX, compared to RFlow on an NVIDIA A800-80GB GPU. By merging speculative and historical information, SpecDiff overcomes the speedup-accuracy trade-off bottleneck, pushing the Pareto frontier of speedup and accuracy in efficient diffusion model inference.
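The token-selection idea described in the abstract can be illustrated with a minimal sketch. All names, shapes, and the weighting scheme below are assumptions for illustration, not the paper's actual implementation: it scores each token by how far its current features drift from a cached historical copy and from a speculative estimate, then recomputes only the top-scoring tokens while the rest reuse the cache.

```python
import math

def importance_scores(curr, hist, spec, alpha=0.5):
    """Per-token importance from historical and self-speculative signals.
    `curr`, `hist`, `spec` are lists of per-token feature vectors
    (hypothetical shapes). Tokens that drift most from both the cached
    historical features and the speculative estimate score higher."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return [alpha * dist(c, h) + (1 - alpha) * dist(c, s)
            for c, h, s in zip(curr, hist, spec)]

def select_tokens(scores, keep_ratio=0.3):
    """Indices of the top-scoring tokens to recompute this step;
    all other tokens reuse their cached features."""
    k = max(1, int(len(scores) * keep_ratio))
    return sorted(range(len(scores)), key=scores.__getitem__)[-k:]
```

A multi-level variant, as the abstract suggests, would bucket tokens by score into several tiers with progressively cheaper computation, rather than making a single recompute-or-cache decision.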

AAAI Conference 2025 Conference Paper

Counterfactual Task-augmented Meta-learning for Cold-start Sequential Recommendation

  • Zhiqiang Wang
  • Jiayi Pan
  • Xingwang Zhao
  • Jianqing Liang
  • Chenjiao Feng
  • Kaixuan Yao

Cold-start sequential recommendation, where user interaction histories are sparse or minimal, remains a significant challenge in recommendation systems. Current meta-learning-based approaches rely heavily on the interaction histories of regular users to construct meta-tasks, aiming to acquire prior knowledge for cold-start adaptation. However, these methods often fail to account for preference discrepancies between regular and cold-start users, leading to biased preference modeling and suboptimal recommendations. To address this issue, we propose a novel counterfactual task-augmented meta-learning method for cold-start sequential recommendations. Our approach intervenes in user interaction histories to create counterfactual sequences that simulate potential but unrealized user behaviors, establishing counterfactual tasks within a meta-learning framework. Additionally, we aggregate meta-path neighbors to uncover latent relationships between items, enabling more detailed and accurate modeling of user preferences. Moreover, by integrating real and counterfactual task losses, we jointly optimize the model through a combination of global and local updates, enhancing its adaptability to cold-start scenarios. Extensive experiments demonstrate that our method significantly outperforms existing state-of-the-art techniques, achieving superior results in cold-start sequential recommendation tasks.

ICLR Conference 2025 Conference Paper

OpenHands: An Open Platform for AI Software Developers as Generalist Agents

  • Xingyao Wang 0002
  • Boxuan Li
  • Yufan Song
  • Frank F. Xu
  • Xiangru Tang
  • Mingchen Zhuge
  • Jiayi Pan
  • Yueqi Song

Software is one of the most powerful tools that we humans have at our disposal; it allows a skilled programmer to interact with the world in complex and profound ways. At the same time, thanks to improvements in large language models (LLMs), there has also been a rapid development in AI agents that interact with and effect change in their surrounding environments. In this paper, we introduce OpenHands, a platform for the development of powerful and flexible AI agents that interact with the world in similar ways to a human developer: by writing code, interacting with a command line, and browsing the web. We describe how the platform allows for the implementation of new agents, utilization of various LLMs, safe interaction with sandboxed environments for code execution, and incorporation of evaluation benchmarks. Based on our currently incorporated benchmarks, we perform an evaluation of agents over 13 challenging tasks, including software engineering (e.g., SWE-Bench) and web browsing (e.g., WebArena), amongst others. Released under the permissive MIT license, OpenHands is a community project spanning academia and industry with more than 2K contributions from over 186 contributors in less than six months of development, and will improve going forward.

ICML Conference 2025 Conference Paper

Training Software Engineering Agents and Verifiers with SWE-Gym

  • Jiayi Pan
  • Xingyao Wang 0002
  • Graham Neubig
  • Navdeep Jaitly
  • Heng Ji 0001
  • Alane Suhr
  • Yizhe Zhang 0002

We present SWE-Gym, the first environment for training real-world software engineering (SWE) agents. SWE-Gym contains 2,438 real-world Python task instances, each comprising a codebase with an executable runtime environment, unit tests, and a task specified in natural language. We use SWE-Gym to train language model based SWE agents, achieving up to 19% absolute gains in resolve rate on the popular SWE-Bench Verified and Lite test sets. We also experiment with inference-time scaling through verifiers trained on agent trajectories sampled from SWE-Gym. When combined with our fine-tuned SWE agents, we achieve 32.0% and 26.0% on SWE-Bench Verified and Lite, respectively, reflecting a new state-of-the-art for open-weight SWE agents. To facilitate further research, we publicly release SWE-Gym, models, and agent trajectories.

ICRA Conference 2024 Conference Paper

Bayesian-Guided Evolutionary Strategy with RRT for Multi-Robot Exploration

  • Shuge Wu
  • Chunzheng Wang
  • Jiayi Pan
  • Dongming Han
  • Zhongliang Zhao

With the increasing demand for multi-robot exploration of unknown environments, how to accomplish this task efficiently has become a focus of research. In such tasks, the strategies for frontier point detection and task allocation largely determine the overall efficiency of the system. Most existing methods detect frontier points with the Rapidly-Exploring Random Tree (RRT) and use greedy algorithms for task allocation. However, the classical RRT algorithm uses a fixed growth step, which makes it difficult to grow branches in narrow environments and lowers the efficiency and correctness of frontier point detection. Meanwhile, the greedy allocation strategy causes each robot to consider only the exploration area with the largest gain for itself, which easily leads to repeated exploration and reduces the overall efficiency of the system. To solve these problems, we propose an adaptive RRT tree growth strategy for frontier point detection, which adjusts the step size according to the known map information and thus improves the efficiency and accuracy of detection, and we introduce a Bayesian-guided evolutionary strategy (BGE) for efficient task allocation, which uses current and historical information to find the optimal allocation scheme from a global perspective. We conduct a comprehensive test of the proposed strategy in the ROS system as well as in the real world, which demonstrates its efficiency. Our code is open-sourced and available upon request.
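The adaptive-step idea from the abstract can be sketched in a few lines. The scaling rule, function names, and the 2-D geometry below are illustrative assumptions, not the paper's actual algorithm: the growth step shrinks when local free-space clearance is small (narrow passages) and expands toward a base step in open areas.

```python
import math

def adaptive_step(base_step, clearance, max_clearance):
    """Scale the RRT growth step by local free-space clearance
    (hypothetical rule): small steps in narrow passages, up to
    the full base step in open areas."""
    ratio = max(0.0, min(1.0, clearance / max_clearance))
    return base_step * (0.25 + 0.75 * ratio)

def grow_toward(tree_node, sample, step):
    """Steer from an existing tree node toward a random sample,
    moving at most `step` along the connecting line."""
    dx, dy = sample[0] - tree_node[0], sample[1] - tree_node[1]
    d = math.hypot(dx, dy)
    if d <= step:
        return sample
    return (tree_node[0] + step * dx / d, tree_node[1] + step * dy / d)
```

In a full planner, `clearance` would come from the known occupancy map around the node being extended, which is how the step size "adjusts according to the known map information."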

NeurIPS Conference 2024 Conference Paper

DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

  • Hao Bai
  • Yifei Zhou
  • Mert Cemri
  • Jiayi Pan
  • Alane Suhr
  • Sergey Levine
  • Aviral Kumar

Pre-trained vision language models (VLMs), though powerful, typically lack training on decision-centric data, rendering them sub-optimal for decision-making tasks such as in-the-wild device control through Graphical User Interfaces (GUIs) when used off-the-shelf. While training with static demonstrations has shown some promise, we show that such methods fall short when controlling real GUIs due to their failure to deal with real world stochasticity and dynamism not captured in static observational data. This paper introduces a novel autonomous RL approach, called DigiRL, for training in-the-wild device control agents through fine-tuning a pre-trained VLM in two stages: offline and offline-to-online RL. We first build a scalable and parallelizable Android learning environment equipped with a VLM-based general-purpose evaluator and then identify the key design choices for simple and effective RL in this domain. We demonstrate the effectiveness of DigiRL using the Android-in-the-Wild (AitW) dataset, where our 1.5B VLM trained with RL achieves a 49.5% absolute improvement -- from 17.7% to 67.2% success rate -- over supervised fine-tuning with static human demonstration data. It is worth noting that such improvement is achieved without any additional supervision or demonstration data. These results significantly surpass not only the prior best agents, including AppAgent with GPT-4V (8.3% success rate) and the 17B CogAgent trained with AitW data (14.4%), but also our implementation of the prior best autonomous RL approach based on filtered behavior cloning (57.8%), thereby establishing a new state-of-the-art for digital agents for in-the-wild device control.

NeurIPS Conference 2024 Conference Paper

Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

  • Yuexiang Zhai
  • Hao Bai
  • Zipeng Lin
  • Jiayi Pan
  • Shengbang Tong
  • Yifei Zhou
  • Alane Suhr
  • Saining Xie

Large vision-language models (VLMs) fine-tuned on specialized visual instruction-following data have exhibited impressive language reasoning capabilities across various scenarios. However, this fine-tuning paradigm may not be able to efficiently learn optimal decision-making agents in multi-step goal-directed tasks from interactive environments. To address this challenge, we propose an algorithmic framework that fine-tunes VLMs with reinforcement learning (RL). Specifically, our framework provides a task description and then prompts the VLM to generate chain-of-thought (CoT) reasoning, enabling the VLM to efficiently explore intermediate reasoning steps that lead to the final text-based action. Next, the open-ended text output is parsed into an executable action to interact with the environment to obtain goal-directed task rewards. Finally, our framework uses these task rewards to fine-tune the entire VLM with RL. Empirically, we demonstrate that our proposed framework enhances the decision-making capabilities of VLM agents across various tasks, enabling 7B models to outperform commercial models such as GPT-4V or Gemini. Furthermore, we find that CoT reasoning is a crucial component for performance improvement, as removing the CoT reasoning results in a significant decrease in the overall performance of our method.

AILAW Journal 2008 Journal Article

Regulation retrieval using industry specific taxonomies

  • Chin Pang Cheng
  • Gloria T. Lau
  • Kincho H. Law
  • Jiayi Pan
  • Albert Jones

Increasingly, taxonomies are being developed and used by industry practitioners to facilitate information interoperability and retrieval. Within a single industrial domain, there exist many taxonomies that are intended for different applications. Industry specific taxonomies often represent the vocabularies that are commonly used by the practitioners. Their jobs are multi-faceted, which include checking for code and regulatory compliance. As such, it would be very desirable if industry practitioners were able to easily locate and browse regulations of interest. In practice, multiple sources of government regulations exist, and they are often organized and classified by the needs of the issuing agencies that enforce them rather than the needs of the communities that use them. One way to bridge these two distinct needs is to develop methods and tools that enable practitioners to browse and retrieve government regulations using their own terms and vocabularies, for example, via existing industry taxonomies. The mapping from a single taxonomy to a single regulation is a trivial keyword matching task. We examine a relatedness analysis approach for mapping a single taxonomy to multiple regulations. We then present an approach for mapping multiple taxonomies to a single regulation by measuring the relatedness of concepts. Cosine similarity, the Jaccard coefficient, and market basket analysis are used to measure the semantic relatedness between concepts from two different taxonomies. Preliminary evaluations of the three relatedness analysis measures are performed using examples from the civil engineering and building industry. These examples illustrate the potential benefits for regulatory usage of mapping between various taxonomies and regulations.
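Two of the relatedness measures named in the abstract, cosine similarity and the Jaccard coefficient, are standard and can be sketched directly. The concept labels and tokenization below are invented for illustration; the paper's actual feature representation may differ.

```python
import math
from collections import Counter

def jaccard(a, b):
    """Jaccard coefficient between two concept term sets:
    |intersection| / |union|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine(a, b):
    """Cosine similarity between two bag-of-words term lists."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical concept labels from two taxonomies, tokenized into terms.
c1 = ["concrete", "slab", "reinforcement"]
c2 = ["slab", "reinforcement", "steel"]
print(jaccard(c1, c2))  # → 0.5
print(cosine(c1, c2))
```

Market basket analysis, the third measure, would instead mine co-occurrence rules between terms across documents; it needs a corpus rather than a single pair of term sets, so it is omitted from this sketch.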