Arrow Research search

Author name cluster

Benjamin Schiffer

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
2 author rows

Possible papers

4

ICML Conference 2025 Conference Paper

Clone-Robust AI Alignment

  • Ariel D. Procaccia
  • Benjamin Schiffer
  • Shirley Zhang 0001

A key challenge in training Large Language Models (LLMs) is properly aligning them with human preferences. Reinforcement Learning with Human Feedback (RLHF) uses pairwise comparisons from human annotators to train reward functions and has emerged as a popular alignment method. However, input datasets in RLHF can be unbalanced due to adversarial manipulation or inadvertent repetition. Therefore, we want RLHF algorithms to perform well even when the set of alternatives is not uniformly distributed. Drawing on insights from social choice theory, we introduce robustness to approximate clones, a desirable property of RLHF algorithms which requires that adding near-duplicate alternatives does not significantly change the learned reward function. We first demonstrate that the standard RLHF algorithm based on regularized maximum likelihood estimation (MLE) fails to satisfy this property. We then propose the weighted MLE, a new RLHF algorithm that modifies the standard regularized MLE by weighting alternatives based on their similarity to other alternatives. This new algorithm guarantees robustness to approximate clones while preserving desirable theoretical properties.

AAAI Conference 2025 Conference Paper

Improved Regret Bounds for Online Fair Division with Bandit Learning

  • Benjamin Schiffer
  • Shirley Zhang

We study online fair division when there are a finite number of item types and the player values for the items are drawn randomly from distributions with unknown means. In this setting, a sequence of indivisible items arrives according to a random online process, and each item must be allocated to a single player. The goal is to maximize expected social welfare while maintaining that the allocation satisfies proportionality in expectation. When player values are normalized, we show that it is possible to with high probability guarantee proportionality constraint satisfaction and achieve O(√T) regret. To achieve this result, we present an upper confidence bound (UCB) algorithm that uses two rounds of linear optimization. This algorithm highlights fundamental aspects of proportionality constraints that allow for a UCB algorithm despite the presence of many (potentially tight) constraints. This result improves upon the previous best regret rate of O(T^(2/3)).

AAAI Conference 2025 Conference Paper

Multi-Apartment Rent Division

  • Ariel D. Procaccia
  • Benjamin Schiffer
  • Shirley Zhang

Rent division is the well-studied problem of fairly assigning rooms and dividing rent among a set of roommates within a single apartment. A shortcoming of existing solutions is that renters are assumed to be considering apartments in isolation, whereas in reality, renters can choose among multiple apartments. In this paper, we generalize the rent division problem to the multi-apartment setting, where the goal is to both fairly choose an apartment among a set of alternatives and fairly assign rooms and rents within the chosen apartment. Our main contribution is a generalization of envy-freeness called *negotiated envy-freeness*. We show that a solution satisfying negotiated envy-freeness is guaranteed to exist and that it is possible to optimize over all negotiated envy-free solutions in polynomial time. We also define an even stronger fairness notion called *universal envy-freeness* and study its existence when values are drawn randomly.

NeurIPS Conference 2024 Conference Paper

Honor Among Bandits: No-Regret Learning for Online Fair Division

  • Ariel D. Procaccia
  • Benjamin Schiffer
  • Shirley Zhang

We consider the problem of online fair division of indivisible goods to players when there are a finite number of types of goods and player values are drawn from distributions with unknown means. Our goal is to maximize social welfare subject to allocating the goods fairly in expectation. When a player's value for an item is unknown at the time of allocation, we show that this problem reduces to a variant of (stochastic) multi-armed bandits, where there exists an arm for each player's value for each type of good. At each time step, we choose a distribution over arms which determines how the next item is allocated. We consider two sets of fairness constraints for this problem: envy-freeness in expectation and proportionality in expectation. Our main result is the design of an explore-then-commit algorithm that achieves $\tilde{O}(T^{2/3})$ regret while maintaining either fairness constraint. This result relies on unique properties fundamental to fair-division constraints that allow faster rates of learning, despite the restricted action space.