Anand Krishna Papers

AAAI Conference 2025 Conference Paper

p-Mean Regret for Stochastic Bandits

Anand Krishna
Philips George John
Adarsh Barik
Vincent Y. F. Tan

In this work, we extend the concept of the p-mean welfare objective from social choice theory to study p-mean regret in stochastic multi-armed bandit problems. The p-mean regret, defined as the difference between the optimal mean among the arms and the p-mean of the expected rewards, offers a flexible framework for evaluating bandit algorithms, enabling algorithm designers to balance fairness and efficiency by adjusting the parameter p. Our framework encompasses both average cumulative regret and Nash regret as special cases. We introduce a simple, unified UCB-based algorithm (Explore-Then-UCB) that achieves novel p-mean regret bounds. Our algorithm consists of two phases: a carefully calibrated uniform exploration phase to initialize sample means, followed by the UCB1 algorithm of Auer et al. (2002). Under mild assumptions, we prove that our algorithm achieves a p-mean regret bound of Otilde( sqrt( k / T^{1/(2|p|)} ) ) for all p <= -1, where k represents the number of arms and T the time horizon. When -1< p < 0, we achieve a regret bound of Otilde( sqrt( k^{1.5} / T^{1/2} ) ). For the range 0 < p <= 1, we achieve a p-mean regret scaling as Otilde( sqrt( k / T ) ), which matches the previously established lower bound up to logarithmic factors. This result stems from the fact that the p-mean regret of any algorithm is at least its average cumulative regret for p <= 1. In the case of Nash regret (the limit as p approaches zero), our unified approach differs from prior work of Barman et al. (2023), which requires a new Nash Confidence Bound algorithm. Notably, we achieve the same regret bound up to constant factors using our more general method.

PDF Details DOI

IJCAI Conference 2022 Conference Paper

Achieving Envy-Freeness with Limited Subsidies under Dichotomous Valuations

Siddharth Barman
Anand Krishna
Yadati Narahari
Soumyarup Sadhukhan

We study the problem of allocating indivisible goods among agents in a fair manner. While envy-free allocations of indivisible goods are not guaranteed to exist, envy-freeness can be achieved by additionally providing some subsidy to the agents. These subsidies can be alternatively viewed as a divisible good (money) that is fractionally assigned among the agents to realize an envy-free outcome. In this setup, we bound the subsidy required to attain envy-freeness among agents with dichotomous valuations, i. e. , among agents whose marginal value for any good is either zero or one. We prove that, under dichotomous valuations, there exists an allocation that achieves envy-freeness with a per-agent subsidy of either 0 or 1. Furthermore, such an envy-free solution can be computed efficiently in the standard value-oracle model. Notably, our results hold for general dichotomous valuations and, in particular, do not require the (dichotomous) valuations to be additive, submodular, or even subadditive. Also, our subsidy bounds are tight and provide a linear (in the number of agents) factor improvement over the bounds known for general monotone valuations.

PDF Details DOI

Possible papers

p-Mean Regret for Stochastic Bandits

Achieving Envy-Freeness with Limited Subsidies under Dichotomous Valuations