Arrow Research search

Author name cluster

David Zhu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
2 author rows

Possible papers

5

TMLR Journal 2025 Journal Article

Two-Step Offline Preference-Based Reinforcement Learning on Explicitly Constrained Policies

  • Yinglun Xu
  • Tarun Suresh
  • Rohan Gumaste
  • David Zhu
  • Ruirui Li
  • Zhengyang Wang
  • Haoming Jiang
  • Xianfeng Tang

Preference-based reinforcement learning (PBRL) in the offline setting has succeeded greatly in industrial applications such as chatbots. A two-step learning framework that learns a reward model from an offline dataset first and then optimizes a policy over the learned reward model through online reinforcement learning has been widely adopted. However, such a method faces challenges from the risk of reward hacking and the complexity of reinforcement learning. To overcome the challenge, our insight is that both challenges come from the state-actions not supported in the dataset. Such state actions are unreliable and increase the complexity of the reinforcement learning problem. Based on the insight, we develop a novel two-step learning method called PRC: preference-based reinforcement learning on explicitly constrained policies. The high-level idea is to limit the reinforcement learning agent to optimize over policies supported on an explicitly constrained action space that excludes the out-of-distribution state-actions. We empirically verify that our method has high learning efficiency on various datasets in robotic control environments.

NeurIPS Conference 2022 Conference Paper

Robust Rent Division

  • Dominik Peters
  • Ariel D. Procaccia
  • David Zhu

In fair rent division, the problem is to assign rooms to roommates and fairly split the rent based on roommates' reported valuations for the rooms. Envy-free rent division is the most popular application on the fair division website Spliddit. The standard model assumes that agents can correctly report their valuations for each room. In practice, agents may be unsure about their valuations, for example because they have had only limited time to inspect the rooms. Our goal is to find a robust rent division that remains fair even if agent valuations are slightly different from the reported ones. We introduce the lexislack solution, which selects a rent division that remains envy-free for valuations within as large a radius as possible of the reported valuations. We also consider robustness notions for valuations that come from a probability distribution, and use results from learning theory to show how we can find rent divisions that (almost) maximize the probability of being envy-free, or that minimize the expected envy. We show that an almost optimal allocation can be identified based on polynomially many samples from the valuation distribution. Finding the best allocation given these samples is NP-hard, but in practice such an allocation can be found using integer linear programming.

ICRA Conference 1991 Conference Paper

Pipe routing-path planning (with many constraints)

  • David Zhu
  • Jean-Claude Latombe

An implemented system for designing pipe layouts automatically using robot path planning techniques is described. The authors introduce a new approach to pipe layout design automation in which pipe routes are treated as paths left behind by rigid objects (robots). They have implemented this approach in a basic pipe router, which is also described, and have extended this router in order to make it capable of treating a variety of other constraints which are typical of practical pipe layout design problems. These constraints relate to the process carried out in the pipes, to the design of their mechanical support, and to the constructability and the ease of operation and maintenance of the designed pipe systems. >

ICRA Conference 1990 Conference Paper

Constraint reformulation in a hierarchical path planner

  • David Zhu
  • Jean-Claude Latombe

Robot path planning consists of approximating the set of free configurations of the robot by a collection of rectangloid cells at successive levels of approximation and, at each level, searching the graph representing the adjacency relation among these cells. A set of algorithms which specifically address the main two computational issues underlying this approach, cell decomposition and graph searching, has been developed. The authors have implemented a planner which incorporates these algorithms and have experimented with it on examples. The experiments show that the planner is significantly faster than previous planners based on the same general approach. The authors focus on the planner's cell-decomposition algorithms. They use a constraint reformulation technique that consists of approximating the obstacles intersecting the cell to be decomposed by a collection of rectangloids and computing their complements in the cell. Two types of approximations are used, bounding and bounded approximations. This technique produces much less cells than previously proposed techniques, resulting in smaller search graphs. Experimental results are presented. >