Author name cluster

Cyrus Cousins

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers

2 author rows

AAAI Conference 2026 Conference Paper

Moral Change or Noise? On Problems of Aligning AI with Temporally Unstable Human Feedback

Vijay Keswani
Cyrus Cousins
Breanna Nguyen
Vincent Conitzer
Hoda Heidari
Jana Schaich Borg
Walter Sinnott-Armstrong

Alignment methods in moral domains seek to elicit moral preferences of human stakeholders and incorporate them into AI. This presupposes moral preferences as static targets, but such preferences often evolve over time. Proper alignment of AI to dynamic human preferences should ideally account for "legitimate" changes to moral reasoning, while ignoring changes related to attention deficits, cognitive biases, or other arbitrary factors. However, common AI alignment approaches largely neglect temporal changes in preferences, posing serious challenges to proper alignment, especially in high-stakes applications of AI, e.g., in healthcare domains, where misalignment can jeopardize the trustworthiness of the system and yield serious individual and societal harms. This work investigates the extent to which people's moral preferences change over time, and the impact of such changes on AI alignment. Our study is grounded in the kidney allocation domain, where we elicit responses to pairwise comparisons of hypothetical kidney transplant patients from over 400 participants across 3-5 sessions. We find that, on average, participants change their response to the same scenario presented at different times around 6-20% of the time (exhibiting "response instability"). Additionally, we observe significant shifts in several participants' retrofitted decision-making models over time (capturing "model instability"). Predictive performance of simple AI models decreases as a function of both response and model instability. Moreover, predictive performance diminishes over time, highlighting the importance of accounting for temporal changes in preferences during training. These findings raise fundamental normative and technical challenges relevant to AI alignment, highlighting the need to better understand the object of alignment (what to align to) when user preferences change significantly over time, including the mechanisms underlying these changes.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Fair and Welfare-Efficient Constrained Multi-Matchings under Uncertainty

Elita Lobo
Justin Payan
Cyrus Cousins
Yair Zick

We study fair allocation of constrained resources, where a market designer optimizes overall welfare while maintaining group fairness. In many large-scale settings, utilities are not known in advance, but are instead observed after realizing the allocation. We therefore estimate agent utilities using machine learning. Optimizing over estimates requires trading-off between mean utilities and their predictive variances. We discuss these trade-offs under two paradigms for preference modeling – in the stochastic optimization regime, the market designer has access to a probability distribution over utilities, and in the robust optimization regime they have access to an uncertainty set containing the true utilities with high probability. We discuss utilitarian and egalitarian welfare objectives, and we explore how to optimize for them under stochastic and robust paradigms. We demonstrate the efficacy of our approaches on three publicly available conference reviewer assignment datasets. The approaches presented enable scalable constrained resource allocation under uncertainty for many combinations of objectives and preference models.

PDF Details DOI

RLJ Journal 2024 Journal Article

On Welfare-Centric Fair Reinforcement Learning

Cyrus Cousins
Kavosh Asadi
Elita Lobo
Michael Littman

We propose a welfare-centric fair reinforcement-learning setting, in which an agent enjoys vector-valued reward from a set of beneficiaries. Given a welfare function W(·), the task is to select a policy π̂ that approximately optimizes the welfare of theirvalue functions from start state s0, i.e., π̂ ≈ argmaxπ W Vπ1 (s0 ), Vπ2 (s0 ),..., Vπg (s0 ). We find that welfare-optimal policies are stochastic and start-state dependent. Whether individual actions are mistakes depends on the policy, thus mistake bounds, regret analysis, and PAC-MDP learning do not readily generalize to our setting. We develop the adversarial-fair KWIK (Kwik-Af) learning model, wherein at each timestep, an agent either takes an exploration action or outputs an exploitation policy, such that the number of exploration actions is bounded and each exploitation policy is ε-welfare optimal. Finally, we reduce PAC-MDP to Kwik-Af, introduce the Equitable Explicit Explore Exploit (E4) learner, and show that it Kwik-Af learns.

PDF Details

RLC Conference 2024 Conference Paper

On Welfare-Centric Fair Reinforcement Learning

Cyrus Cousins
Kavosh Asadi
Elita Lobo
Michael Littman

We propose a welfare-centric fair reinforcement-learning setting, in which an agent enjoys vector-valued reward from a set of beneficiaries. Given a welfare function W(·), the task is to select a policy π̂ that approximately optimizes the welfare of theirvalue functions from start state s0, i. e. , π̂ ≈ argmaxπ W Vπ1 (s0 ), Vπ2 (s0 ), .. ., Vπg (s0 ). We find that welfare-optimal policies are stochastic and start-state dependent. Whether individual actions are mistakes depends on the policy, thus mistake bounds, regret analysis, and PAC-MDP learning do not readily generalize to our setting. We develop the adversarial-fair KWIK (Kwik-Af) learning model, wherein at each timestep, an agent either takes an exploration action or outputs an exploitation policy, such that the number of exploration actions is bounded and each exploitation policy is ε-welfare optimal. Finally, we reduce PAC-MDP to Kwik-Af, introduce the Equitable Explicit Explore Exploit (E4) learner, and show that it Kwik-Af learns.

PDF Details

AAMAS Conference 2023 Conference Paper

Learning Properties in Simulation-Based Games

Cyrus Cousins
Bhaskar Mishra
Enrique Areyan Viqueira
Amy Greenwald

Empirical game-theoretic analysis (EGTA) is primarily concerned with learning equilibria of simulation-based games. Recent statistical approaches have tackled this problem by first learning a uniform approximation of the game’s utilities, and then applying precision-recall theorems: i. e. , all equilibria of the true game are approximate equilibria in the estimated game, and vice-versa. In this work, we generalize this approach to all game properties that are well-behaved (i. e. , Lipschitz continuous in utilities), including regret (which defines Nash and correlated equilibria), adversarial values, power-mean welfare, and Gini social welfare. We show that, given a well-behaved welfare function, while optimal welfare is well-behaved, the welfare of optimal (i. e. , welfare-maximizing or minimizing) equilibria is not well behaved. We thus define a related property based on a Lagrangian relaxation of the equilibrium constraints that is well behaved. We call this property Λ-stable welfare. As determining the welfare of an optimal equilibrium is an essential step in computing the price of anarchy, we conclude with a discussion of an alternative, more stable notion of anarchy based on Λ-stable welfare, which we call the anarchy gap.

PDF

EWRL Workshop 2023 Workshop Paper

Percentile Criterion Optimization in Offline Reinforcement Learning

Cyrus Cousins
Elita Lobo
Marek Petrik
Yair Zick

In reinforcement learning, robust policies for high-stakes decision-making problems with limited data are usually computed by optimizing the percentile criterion. The percentile criterion is optimized by constructing an uncertainty set that contains the true model with high probability and optimizing the policy for the worst model in the set. Since the percentile criterion is non-convex, constructing uncertainty sets is often challenging. Existing works use Bayesian credible regions as uncertainty sets, but they are often unnecessarily large and result in learning overly conservative policies. To overcome these shortcomings, we propose a novel Value-at-Risk based dynamic programming algorithm to optimize the percentile criterion without explicitly constructing any uncertainty sets. Our theoretical and empirical results show that our algorithm implicitly constructs much smaller uncertainty sets and learns less conservative robust policies.

PDF

NeurIPS Conference 2023 Conference Paper

Percentile Criterion Optimization in Offline Reinforcement Learning

Cyrus Cousins
Elita Lobo
Marek Petrik
Yair Zick

In reinforcement learning, robust policies for high-stakes decision-making problems with limited data are usually computed by optimizing the percentile criterion. The percentile criterion is optimized by constructing an uncertainty set that contains the true model with high probability and optimizing the policy for the worst model in the set. Since the percentile criterion is non-convex, constructing these sets itself is challenging. Existing works use Bayesian credible regions as uncertainty sets, but they are often unnecessarily large and result in learning overly conservative policies. To overcome these shortcomings, we propose a novel Value-at-Risk based dynamic programming algorithm to optimize the percentile criterion without explicitly constructing any uncertainty sets. Our theoretical and empirical results show that our algorithm implicitly constructs much smaller uncertainty sets and learns less-conservative robust policies.

PDF Details

ICML Conference 2021 Conference Paper

Adversarial Multi Class Learning under Weak Supervision with Performance Guarantees

Alessio Mazzetto
Cyrus Cousins
Dylan Sam
Stephen H. Bach
Eli Upfal

We develop a rigorous approach for using a set of arbitrarily correlated weak supervision sources in order to solve a multiclass classification task when only a very small set of labeled data is available. Our learning algorithm provably converges to a model that has minimum empirical risk with respect to an adversarial choice over feasible labelings for a set of unlabeled data, where the feasibility of a labeling is computed through constraints defined by rigorously estimated statistics of the weak supervision sources. We show theoretical guarantees for this approach that depend on the information provided by the weak supervision sources. Notably, this method does not require the weak supervision sources to have the same labeling space as the multiclass classification task. We demonstrate the effectiveness of our approach with experiments on various image classification tasks.

Details

NeurIPS Conference 2021 Conference Paper

An Axiomatic Theory of Provably-Fair Welfare-Centric Machine Learning

Cyrus Cousins

We address an inherent difficulty in welfare-theoretic fair machine learning (ML), by proposing an equivalently-axiomatically justified alternative setting, and studying the resulting computational and statistical learning questions. Welfare metrics quantify overall wellbeing across a population of groups, and welfare-based objectives and constraints have recently been proposed to incentivize fair ML methods to satisfy their diverse needs. However, many ML problems are cast as loss minimization tasks, rather than utility maximization, and thus require nontrivial modeling to construct utility functions. We define a complementary metric, termed malfare, measuring overall societal harm, with axiomatic justification via the standard axioms of cardinal welfare, and cast fair ML as malfare minimization over the risk values (expected losses) of each group. Surprisingly, the axioms of cardinal welfare (malfare) dictate that this is not equivalent to simply defining utility as negative loss and maximizing welfare. Building upon these concepts, we define fair-PAC learning, where a fair-PAC learner is an algorithm that learns an ε-δ malfare-optimal model with bounded sample complexity, for any data distribution and (axiomatically justified) malfare concept. Finally, we show conditions under which many standard PAC-learners may be converted to fair-PAC learners, which places fair-PAC learning on firm theoretical ground, as it yields statistical — and in some cases computational — efficiency guarantees for many well-studied ML models. Fair-PAC learning is also practically relevant, as it democratizes fair ML by providing concrete training algorithms with rigorous generalization guarantees.

PDF Details

NeurIPS Conference 2021 Conference Paper

Fast Doubly-Adaptive MCMC to Estimate the Gibbs Partition Function with Weak Mixing Time Bounds

Shahrzad Haddadan
Yue Zhuang
Cyrus Cousins
Eli Upfal

We present a novel method for reducing the computational complexity of rigorously estimating the partition functions of Gibbs (or Boltzmann) distributions, which arise ubiquitously in probabilistic graphical models. A major obstacle to applying the Gibbs distribution in practice is the need to estimate their partition function (normalizing constant). The state of the art in addressing this problem is multi-stage algorithms which consist of a cooling schedule and a mean estimator in each step of the schedule. While the cooling schedule in these algorithms is adaptive, the mean estimate computations use MCMC as a black-box to draw approximately-independent samples. Here we develop a doubly adaptive approach, combining the adaptive cooling schedule with an adaptive MCMC mean estimator, whose number of Markov chain steps adapts dynamically to the underlying chain. Through rigorous theoretical analysis, we prove that our method outperforms the state of the art algorithms in several factors: (1) The computational complexity of our method is smaller; (2) Our method is less sensitive to loose bounds on mixing times, an inherent components in these algorithms; and (3) The improvement obtained by our method is particularly significant in the most challenging regime of high precision estimates. We demonstrate the advantage of our method in experiments run on classic factor graphs, such as voting models and Ising models.

PDF Details

AAMAS Conference 2021 Conference Paper

Learning Competitive Equilibria in Noisy Combinatorial Markets

Enrique Areyan Viqueira
Cyrus Cousins
Amy Greenwald

We present a methodology to robustly estimate the competitive equilibria (CE) of combinatorial markets under the assumption that buyers do not know their precise valuations for bundles of goods, but instead can only provide noisy estimates. We first show tight lower- and upper-bounds on the buyers’ utility loss, and hence the set of CE, given a uniform approximation of one market by another. We then present two probably-approximately-correct algorithms for learning CE with finite-sample guarantees. The first is a baseline and the second leverages a connection between the first welfare theorem of economics and uniform approximations to adaptively prune value queries when it is determined that they are provably not part of a CE. Extensive experimentation shows that pruning achieves better estimates than the baseline with far fewer samples.

PDF

NeurIPS Conference 2020 Conference Paper

Sharp uniform convergence bounds through empirical centralization

Cyrus Cousins
Matteo Riondato

We introduce the use of empirical centralization to derive novel practical, probabilistic, sample-dependent bounds to the Supremum Deviation (SD) of empirical means of functions in a family from their expectations. Our bounds have optimal dependence on the maximum (i. e. , wimpy) variance and the function ranges, and the same dependence on the number of samples as existing SD bounds. To compute the SD bounds in practice, we develop tightly-concentrated Monte Carlo estimators of the empirical Rademacher average of the empirically-centralized family, and we show novel concentration results for the empirical wimpy variance. Our experimental evaluation shows that our bounds greatly outperform non-centralized bounds and are extremely practical even at small sample sizes.

PDF Details

UAI Conference 2019 Conference Paper

Empirical Mechanism Design: Designing Mechanisms from Data

Enrique Areyan Viqueira
Cyrus Cousins
Yasser Mohammad
Amy Greenwald

We introduce a methodology for the design of parametric mechanisms, which are multiagent systems inhabited by strategic agents, with knobs that can be adjusted to achieve specific goals. We assume agents play approximate equilibria, which we estimate using the probably approximately correct learning framework. Under this assumption, we further learn approximately optimal mechanism parameters. We do this theoretically, assuming a finite design space, and heuristically, using Bayesian optimization (BO). Our BO algorithm incorporates the noise associated with modern concentration inequalities, such as Hoeffding’s, into the underlying Gaussian process. We show experimentally that our search techniques outperform standard baselines in a stylized but rich model of advertisement exchanges.

Details

AAMAS Conference 2019 Conference Paper

Learning Simulation-Based Games from Data

Enrique Areyan Viqueira
Amy Greenwald
Cyrus Cousins
Eli Upfal

We tackle a fundamental problem in empirical game-theoretic analysis (EGTA), that of learning equilibria of simulation-based games. Such games cannot be described in analytical form; instead, a blackbox simulator can be queried to obtain noisy samples of utilities. Our approach to EGTA is in the spirit of probably approximately correct learning. We design algorithms that learn empirical games, which uniformly approximate the utilities of simulation-based games from finitely many samples. Our methodology learns all the equilibria of simulation-based games, as opposed to a single one.

PDF