Author name cluster

Yanjun Han

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers

2 author rows

NeurIPS Conference 2025 Conference Paper

Evolution of Information in Interactive Decision Making: A Case Study for Multi-Armed Bandits

Yuzhou Gu
Yanjun Han
Jian Qian

We study the evolution of information in interactive decision making through the lens of a stochastic multi-armed bandit problem. Focusing on a fundamental example where a unique optimal arm outperforms the rest by a fixed margin, we characterize the optimal success probability and mutual information over time. Our findings reveal distinct growth phases in mutual information (initially linear, transitioning to quadratic, and finally returning to linear), highlighting curious behavioral differences between interactive and non-interactive environments. In particular, we show that optimal success probability and mutual information can be decoupled, where achieving optimal learning does not necessarily require maximizing information gain. These findings shed new light on the intricate interplay between information and learning in interactive decision making.


NeurIPS Conference 2024 Conference Paper

Assouad, Fano, and Le Cam with Interaction: A Unifying Lower Bound Framework and Characterization for Bandit Learnability

Fan Chen
Dylan J. Foster
Yanjun Han
Jian Qian
Alexander Rakhlin
Yunbei Xu

We develop a unifying framework for information-theoretic lower bounds in statistical estimation and interactive decision making. Classical lower bound techniques, such as Fano's method, Le Cam's method, and Assouad's lemma, are central to the study of minimax risk in statistical estimation, yet are insufficient to provide tight lower bounds for interactive decision making algorithms that collect data interactively (e.g., algorithms for bandits and reinforcement learning). Recent work of Foster et al. provides minimax lower bounds for interactive decision making using seemingly different analysis techniques from the classical methods. These results, which are proven using a complexity measure known as the Decision-Estimation Coefficient (DEC), capture difficulties unique to interactive learning, yet do not recover the tightest known lower bounds for passive estimation. We propose a unified view of these distinct methodologies through a new lower bound approach called the interactive Fano method. As an application, we introduce a novel complexity measure, the Fractional Covering Number, which facilitates new lower bounds for interactive decision making that extend the DEC methodology by incorporating the complexity of estimation. Using the fractional covering number, we (i) provide a unified characterization of learnability for any stochastic bandit problem, and (ii) close the remaining gap between the upper and lower bounds in Foster et al. (up to polynomial factors) for any interactive decision making problem in which the underlying model class is convex.


NeurIPS Conference 2024 Conference Paper

Online Estimation via Offline Estimation: An Information-Theoretic Framework

Dylan J. Foster
Yanjun Han
Jian Qian
Alexander Rakhlin

The classical theory of statistical estimation aims to estimate a parameter of interest under data generated from a fixed design ("offline estimation"), while the contemporary theory of online learning provides algorithms for estimation under adaptively chosen covariates ("online estimation"). Motivated by connections between estimation and interactive decision making, we ask: is it possible to convert offline estimation algorithms into online estimation algorithms in a black-box fashion? We investigate this question from an information-theoretic perspective by introducing a new framework, Oracle-Efficient Online Estimation (OEOE), where the learner can only interact with the data stream indirectly through a sequence of offline estimators produced by a black-box algorithm operating on the stream. Our main results settle the statistical and computational complexity of online estimation in this framework. (i) Statistical complexity: we show that information-theoretically, there exist algorithms that achieve near-optimal online estimation error via black-box offline estimation oracles, and give a nearly-tight characterization for minimax rates in the OEOE framework. (ii) Computational complexity: we show that the guarantees above cannot be achieved in a computationally efficient fashion in general, but give a refined characterization for the special case of conditional density estimation: computationally efficient online estimation via black-box offline estimation is possible whenever it is possible via unrestricted algorithms. Finally, we apply our results to give offline oracle-efficient algorithms for interactive decision making.


NeurIPS Conference 2024 Conference Paper

Stochastic contextual bandits with graph feedback: from independence number to MAS number

Yuxiao Wen
Yanjun Han
Zhengyuan Zhou

We consider contextual bandits with graph feedback, a class of interactive learning problems with richer structures than vanilla contextual bandits, where taking an action reveals the rewards for all neighboring actions in the feedback graph under all contexts. Unlike the multi-armed bandits setting where a growing literature has painted a near-complete understanding of graph feedback, much remains unexplored in the contextual bandits counterpart. In this paper, we make inroads into this inquiry by establishing a regret lower bound $\Omega(\sqrt{\beta_M(G) T})$, where $M$ is the number of contexts, $G$ is the feedback graph, and $\beta_M(G)$ is our proposed graph-theoretic quantity that characterizes the fundamental learning limit for this class of problems. Interestingly, $\beta_M(G)$ interpolates between $\alpha(G)$ (the independence number of the graph) and $\mathsf{m}(G)$ (the maximum acyclic subgraph (MAS) number of the graph) as the number of contexts $M$ varies. We also provide algorithms that achieve near-optimal regret for important classes of context sequences and/or feedback graphs, such as transitively closed graphs that find applications in auctions and inventory control. In particular, with many contexts, our results show that the MAS number essentially characterizes the statistical complexity for contextual bandits, as opposed to the independence number in multi-armed bandits.
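To make the two graph quantities in this abstract concrete, the sketch below computes the independence number and the maximum acyclic subgraph (MAS) number of a small directed graph by brute force. The function names, the edge-list representation, and the choice to ignore self-loops are assumptions made for illustration; none of this comes from the paper, and the exhaustive search is only practical for a handful of nodes.

```python
from itertools import combinations


def is_independent(nodes, edges):
    """True if no edge joins two distinct nodes of the set."""
    s = set(nodes)
    return not any(u in s and v in s and u != v for u, v in edges)


def is_acyclic(nodes, edges):
    """True if the subgraph induced on `nodes` has no directed cycle.

    Self-loops are ignored (an assumption of this sketch).
    """
    s = set(nodes)
    succ = {u: [] for u in s}
    indeg = {u: 0 for u in s}
    for u, v in edges:
        if u in s and v in s and u != v:
            succ[u].append(v)
            indeg[v] += 1
    # Kahn's algorithm: repeatedly remove nodes with no incoming edges.
    queue = [u for u in s if indeg[u] == 0]
    removed = 0
    while queue:
        u = queue.pop()
        removed += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return removed == len(s)


def largest_subset(n, edges, predicate):
    """Size of the largest subset of {0, ..., n-1} satisfying `predicate`."""
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if predicate(subset, edges):
                return size
    return 0


def independence_number(n, edges):
    return largest_subset(n, edges, is_independent)


def mas_number(n, edges):
    return largest_subset(n, edges, is_acyclic)


# Transitively closed "total order" graph on 4 nodes: i -> j whenever i < j.
chain = [(i, j) for i in range(4) for j in range(i + 1, 4)]
# Directed 3-cycle.
cycle = [(0, 1), (1, 2), (2, 0)]

print(independence_number(4, chain), mas_number(4, chain))
print(independence_number(3, cycle), mas_number(3, cycle))
```

On the transitively closed chain the two quantities differ as much as possible (every pair of nodes is adjacent, yet the whole graph is acyclic), which is the kind of gap between the independence number and the MAS number that the abstract refers to.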


NeurIPS Conference 2023 Conference Paper

Learning and Collusion in Multi-unit Auctions

Simina Branzei
Mahsa Derakhshan
Negin Golrezaei
Yanjun Han

In a carbon auction, licenses for CO2 emissions are allocated among multiple interested players. Inspired by this setting, we consider repeated multi-unit auctions with uniform pricing, which are widely used in practice. Our contribution is to analyze these auctions in both the offline and online settings, by designing efficient bidding algorithms with low regret and giving regret lower bounds. We also analyze the quality of the equilibria in two main variants of the auction, finding that one variant is susceptible to collusion among the bidders while the other is not.


NeurIPS Conference 2022 Conference Paper

Beyond the Best: Distribution Functional Estimation in Infinite-Armed Bandits

Yifei Wang
Tavor Baharav
Yanjun Han
Jiantao Jiao
David Tse

In the infinite-armed bandit problem, each arm's average reward is sampled from an unknown distribution, and each arm can be sampled further to obtain noisy estimates of the average reward of that arm. Prior work focuses on the best arm, i.e., estimating the maximum of the average reward distribution. We consider a general class of distribution functionals beyond the maximum and obtain optimal sample complexities in both offline and online settings. We show that online estimation, where the learner can sequentially choose whether to sample a new or existing arm, offers no advantage over the offline setting for estimating the mean functional, but significantly reduces the sample complexity for other functionals such as the median, maximum, and trimmed mean. We propose unified meta algorithms for the online and offline settings and derive matching lower bounds using different Wasserstein distances. For the special case of median estimation, we identify a curious thresholding phenomenon on the indistinguishability between Gaussian convolutions with respect to the noise level, which may be of independent interest.


NeurIPS Conference 2022 Conference Paper

Leveraging the Hints: Adaptive Bidding in Repeated First-Price Auctions

Wei Zhang
Yanjun Han
Zhengyuan Zhou
Aaron Flores
Tsachy Weissman

With the advent and increasing consolidation of e-commerce, digital advertising has very recently replaced traditional advertising as the main marketing force in the economy. In the past four years, a particularly important development in the digital advertising industry is the shift from second-price auctions to first-price auctions for online display ads. This shift immediately motivated the intellectually challenging question of how to bid in first-price auctions, because unlike in second-price auctions, bidding one's private value truthfully is no longer optimal. Following a series of recent works in this area, we consider a differentiated setup: we do not make any assumption about other bidders' maximum bid (i.e., it can be adversarial over time), and instead assume that we have access to a hint that serves as a prediction of other bidders' maximum bid, where the prediction is learned through some blackbox machine learning model. We consider two types of hints: one where a single point-prediction is available, and the other where a hint interval (representing a type of confidence region into which others' maximum bid falls) is available. We establish minimax optimal regret bounds for both cases and highlight the quantitatively different behavior between the two settings. We also provide improved regret bounds when the others' maximum bid exhibits the further structure of sparsity. Finally, we complement the theoretical results with demonstrations using real bidding data.


NeurIPS Conference 2022 Conference Paper

Oracle-Efficient Online Learning for Smoothed Adversaries

Nika Haghtalab
Yanjun Han
Abhishek Shetty
Kunhe Yang

We study the design of computationally efficient online learning algorithms under smoothed analysis. In this setting, at every step, an adversary generates a sample from an adaptively chosen distribution whose density is upper bounded by $1/\sigma$ times the uniform density. Given access to an offline optimization (ERM) oracle, we give the first computationally efficient online algorithms whose sublinear regret depends only on the pseudo/VC dimension $d$ of the class and the smoothness parameter $\sigma$. In particular, we achieve oracle-efficient regret bounds of $O(\sqrt{T d\sigma^{-1}})$ for learning real-valued functions and $O(\sqrt{T d\sigma^{-\frac{1}{2}}})$ for learning binary-valued functions. Our results establish that online learning is computationally as easy as offline learning, under the smoothed analysis framework. This contrasts with the computational separation between online learning with worst-case adversaries and offline learning established by [HK16]. Our algorithms also achieve improved bounds for some settings with binary-valued functions and worst-case adversaries. These include an oracle-efficient algorithm with $O(\sqrt{T(d |\mathcal{X}|)^{1/2}})$ regret that refines the earlier $O(\sqrt{T|\mathcal{X}|})$ bound of [DS16] for finite domains, and an oracle-efficient algorithm with $O(T^{3/4} d^{1/2})$ regret for the transductive setting.


ICML Conference 2021 Conference Paper

Adversarial Combinatorial Bandits with General Non-linear Reward Functions

Yanjun Han
Yining Wang
Xi Chen

In this paper we study the adversarial combinatorial bandit with a known non-linear reward function, extending existing work on the adversarial linear combinatorial bandit. The adversarial combinatorial bandit with general non-linear reward is an important open problem in the bandit literature, and it is still unclear whether there is a significant gap from the case of linear reward, stochastic bandit, or semi-bandit feedback. We show that, with $N$ arms and subsets of $K$ arms being chosen at each of $T$ time periods, the minimax optimal regret is $\widetilde\Theta_{d}(\sqrt{N^d T})$ if the reward function is a $d$-degree polynomial with $d< K$, and $\Theta_K(\sqrt{N^K T})$ if the reward function is not a low-degree polynomial. Both bounds are significantly different from the bound $O(\sqrt{\mathrm{poly}(N, K)T})$ for the linear case, which suggests that there is a fundamental gap between the linear and non-linear reward structures. Our result also finds applications to the adversarial assortment optimization problem in online recommendation. We show that in the worst case of the adversarial assortment problem, the optimal algorithm must treat each of the $\binom{N}{K}$ assortments as independent.


SODA Conference 2021 Conference Paper

On the Competitive Analysis and High Accuracy Optimality of Profile Maximum Likelihood

Yanjun Han
Kirankumar Shiragur

A striking result of Acharya et al. [ADOS17] showed that to estimate symmetric properties of discrete distributions, plugging in the distribution that maximizes the likelihood of the observed multiset of frequencies, also known as the profile maximum likelihood (PML) distribution, is competitive compared with any estimator regardless of the symmetric property. Specifically, given $n$ observations from the discrete distribution, if some estimator incurs an error $\epsilon$ with probability at most $\delta$, then plugging in the PML distribution incurs an error $2\epsilon$ with probability at most $\delta \cdot \exp(3\sqrt{n})$. In this paper, we strengthen the above result and show that using a careful chaining argument, the error probability can be reduced to $\delta^{1-c} \cdot \exp(c' n^{1/3+c})$ for arbitrarily small constants $c > 0$ and some constant $c' > 0$. The improved competitive analysis leads to the optimality of the PML plug-in approach for estimating various symmetric properties within higher accuracy $\epsilon \gg n^{-1/3}$. In particular, we show that the PML distribution is an optimal estimator of the sorted distribution: it is $\epsilon$-close in sorted $\ell_1$ distance to the true distribution with support size $k$ for any $n = \Omega(k/(\epsilon^2 \log k))$ and $\epsilon \gg n^{-1/3}$, which are the information-theoretically optimal sample complexity and the largest error regime where the classical empirical distribution is sub-optimal, respectively. In order to strengthen the analysis of the PML, a key ingredient is to employ novel "continuity" properties of the PML distributions and construct a chain of suitable quantized PMLs, or "coverings". We also construct a novel approximation-based estimator for the sorted distribution with a near-optimal concentration property without any sample splitting, where as a byproduct we obtain better trade-offs between the polynomial approximation error and the maximum magnitude of coefficients in the Poisson approximation of 1-Lipschitz functions.


NeurIPS Conference 2021 Conference Paper

On the Value of Interaction and Function Approximation in Imitation Learning

Nived Rajaraman
Yanjun Han
Lin Yang
Jingbo Liu
Jiantao Jiao
Kannan Ramchandran

We study the statistical guarantees for the Imitation Learning (IL) problem in episodic MDPs. Rajaraman et al. (2020) show an information theoretic lower bound that in the worst case, a learner which can even actively query the expert policy suffers from a suboptimality growing quadratically in the length of the horizon, $H$. We study imitation learning under the $\mu$-recoverability assumption of Ross et al. (2011) which assumes that the difference in the $Q$-value under the expert policy across different actions in a state does not deviate beyond $\mu$ from the maximum. We show that the reduction proposed by Ross et al. (2010) is statistically optimal: the resulting algorithm upon interacting with the MDP for $N$ episodes results in a suboptimality bound of $\widetilde{\mathcal{O}} \left( \mu |\mathcal{S}| H / N \right)$ which we show is optimal up to log-factors. In contrast, we show that any algorithm which does not interact with the MDP and uses an offline dataset of $N$ expert trajectories must incur suboptimality growing as $\gtrsim |\mathcal{S}| H^2/N$ even under the $\mu$-recoverability assumption. This establishes a clear and provable separation of the minimax rates between the active setting and the no-interaction setting. We also study IL with linear function approximation. When the expert plays actions according to a linear classifier of known state-action features, we use the reduction to multi-class classification to show that with high probability, the suboptimality of behavior cloning is $\widetilde{O}(dH^2/N)$ given $N$ rollouts from the optimal policy. This is optimal up to log-factors but can be improved to $\widetilde{O}(dH/N)$ if we have a linear expert with parameter-sharing across time steps.
In contrast, when the MDP transition structure is known to the learner such as in the case of simulators, we demonstrate fundamental differences compared to the tabular setting in terms of the performance of an optimal algorithm, Mimic-MD (Rajaraman et al. (2020)) when extended to the function approximation setting. Here, we introduce a new problem called confidence set linear classification, that can be used to construct sample-efficient IL algorithms.


NeurIPS Conference 2021 Conference Paper

Optimal prediction of Markov chains with and without spectral gap

Yanjun Han
Soham Jana
Yihong Wu

We study the following learning problem with dependent data: Given a trajectory of length $n$ from a stationary Markov chain with $k$ states, the goal is to predict the distribution of the next state. For $3 \leq k \leq O(\sqrt{n})$, the optimal prediction risk in the Kullback-Leibler divergence is shown to be $\Theta(\frac{k^2}{n}\log \frac{n}{k^2})$, in contrast to the optimal rate of $\Theta(\frac{\log \log n}{n})$ for $k=2$ previously shown by Falahatgar et al. (2016). These nonparametric rates can be attributed to the memory in the data, as the spectral gap of the Markov chain can be arbitrarily small. To quantify the memory effect, we study irreducible reversible chains with a prescribed spectral gap. In addition to characterizing the optimal prediction risk for two states, we show that, as long as the spectral gap is not excessively small, the prediction risk in the Markov model is $O(\frac{k^2}{n})$, which coincides with that of an iid model with the same number of parameters.


JMLR Journal 2020 Journal Article

Lower Bounds for Learning Distributions under Communication Constraints via Fisher Information

Leighton Pate Barnes
Yanjun Han
Ayfer Ozgur

We consider the problem of learning high-dimensional, nonparametric and structured (e.g., Gaussian) distributions in distributed networks, where each node in the network observes an independent sample from the underlying distribution and can use $k$ bits to communicate its sample to a central processor. We consider three different models for communication. Under the independent model, each node communicates its sample to a central processor by independently encoding it into $k$ bits. Under the more general sequential or blackboard communication models, nodes can share information interactively but each node is restricted to write at most $k$ bits on the final transcript. We characterize the impact of the communication constraint $k$ on the minimax risk of estimating the underlying distribution under $\ell^2$ loss. We develop minimax lower bounds that apply in a unified way to many common statistical models and reveal that the impact of the communication constraint can be qualitatively different depending on the tail behavior of the score function associated with each model. A key ingredient in our proofs is a geometric characterization of Fisher information from quantized samples.


NeurIPS Conference 2020 Conference Paper

Minimax Optimal Nonparametric Estimation of Heterogeneous Treatment Effects

Zijun Gao
Yanjun Han

A central goal of causal inference is to detect and estimate the treatment effects of a given treatment or intervention on an outcome variable of interest, where one such quantity, the heterogeneous treatment effect (HTE), is of growing popularity in recent practical applications such as personalized medicine. In this paper, we model the HTE as a smooth nonparametric difference between two less smooth baseline functions, and determine the tight statistical limits of nonparametric HTE estimation as a function of the covariate geometry. In particular, a two-stage nearest-neighbor-based estimator that throws away observations with poor matching quality is near minimax optimal. We also establish the tight dependence on the density ratio without the usual assumption that the covariate densities are bounded away from zero, where a key step is to employ a novel maximal inequality which could be of independent interest.


NeurIPS Conference 2019 Conference Paper

Batched Multi-armed Bandits Problem

Zijun Gao
Yanjun Han
Zhimei Ren
Zhengqing Zhou

In this paper, we study the multi-armed bandit problem in the batched setting where the employed policy must split data into a small number of batches. While the minimax regret for the two-armed stochastic bandits has been completely characterized by Perchet et al. (2016), the effect of the number of arms on the regret for the multi-armed case is still open. Moreover, the question of whether adaptively chosen batch sizes will help to reduce the regret also remains underexplored. In this paper, we propose the BaSE (batched successive elimination) policy to achieve the rate-optimal regrets (within logarithmic factors) for batched multi-armed bandits, with matching lower bounds even if the batch sizes are determined in an adaptive manner.
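The general idea of successive elimination under a batch constraint can be sketched as follows. This is a simplified illustration and not the BaSE policy as specified in the paper: the equally spaced batch grid, the round-robin schedule inside a batch, the confidence radius, and all function and variable names are assumptions of this sketch.

```python
import math
import random


def batched_successive_elimination(pull, num_arms, horizon, batch_ends):
    """Simplified batched successive elimination (illustrative only).

    pull(arm) returns a reward in [0, 1]; batch_ends is an increasing list
    of time indices ending at `horizon`. Returns (surviving arms, total reward).
    """
    active = list(range(num_arms))
    sums = [0.0] * num_arms
    counts = [0] * num_arms
    total, t = 0.0, 0
    log_term = math.log(horizon * num_arms)
    for end in batch_ends:
        # Inside a batch the schedule is fixed: round-robin over active arms.
        i = 0
        while t < end:
            arm = active[i % len(active)]
            reward = pull(arm)
            sums[arm] += reward
            counts[arm] += 1
            total += reward
            t += 1
            i += 1
        # Only between batches: drop arms that look clearly worse than the best.
        pulled = [a for a in active if counts[a] > 0]
        if not pulled:
            continue
        best = max(sums[a] / counts[a] for a in pulled)
        active = [
            a for a in active
            if counts[a] == 0
            or sums[a] / counts[a] + 2.0 * math.sqrt(2.0 * log_term / counts[a]) >= best
        ]
    return active, total


# Hypothetical Bernoulli instance with one clearly better arm.
rng = random.Random(0)
means = [0.9, 0.1, 0.1]
survivors, total = batched_successive_elimination(
    lambda a: 1.0 if rng.random() < means[a] else 0.0,
    num_arms=3,
    horizon=3000,
    batch_ends=[1000, 2000, 3000],
)
print(survivors, total)
```

The point of the sketch is that the learner looks at the data only at the batch boundaries, so all adaptivity is confined to the elimination step between batches.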


NeurIPS Conference 2018 Conference Paper

Entropy Rate Estimation for Markov Chains with Large State Space

Yanjun Han
Jiantao Jiao
Chuan-Zheng Lee
Tsachy Weissman
Yihong Wu
Tiancheng Yu

Entropy estimation is one of the prototypical problems in distribution property testing. To consistently estimate the Shannon entropy of a distribution on $S$ elements with independent samples, the optimal sample complexity scales sublinearly with $S$ as $\Theta(\frac{S}{\log S})$ as shown by Valiant and Valiant (2011). Extending the theory and algorithms for entropy estimation to dependent data, this paper considers the problem of estimating the entropy rate of a stationary reversible Markov chain with $S$ states from a sample path of $n$ observations. We show that (i) provided the Markov chain mixes not too slowly, i.e., the relaxation time is at most $O(\frac{S}{\ln^3 S})$, consistent estimation is achievable when $n \gg \frac{S^2}{\log S}$; and (ii) provided the Markov chain has some slight dependency, i.e., the relaxation time is at least $1+\Omega(\frac{\ln^2 S}{\sqrt{S}})$, consistent estimation is impossible when $n \lesssim \frac{S^2}{\log S}$. Under both assumptions, the optimal estimation accuracy is shown to be $\Theta(\frac{S^2}{n \log S})$. In comparison, the empirical entropy rate requires at least $\Omega(S^2)$ samples to be consistent, even when the Markov chain is memoryless. In addition to synthetic experiments, we also apply the estimators that achieve the optimal sample complexity to estimate the entropy rate of the English language in the Penn Treebank and the Google One Billion Words corpora, which provides a natural benchmark for language modeling and relates it directly to the widely used perplexity measure.
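For orientation, the empirical (plug-in) entropy rate that the abstract uses as a baseline can be sketched in a few lines. This is the naive estimator, not the sample-optimal estimators of the paper; the function name and the convention of reporting the result in nats are assumptions of this sketch.

```python
import math
from collections import Counter


def empirical_entropy_rate(path):
    """Plug-in entropy rate (in nats) from a single sample path.

    Computes sum_i pi_hat(i) * H(P_hat(i, .)), where pi_hat and P_hat are
    the empirical state and transition frequencies along the path.
    """
    transitions = Counter(zip(path, path[1:]))
    visits = Counter(path[:-1])
    n = len(path) - 1
    rate = 0.0
    for (i, _j), c in transitions.items():
        # pi_hat(i) * P_hat(i, j) * log(1 / P_hat(i, j)) = (c / n) * log(visits[i] / c)
        rate += (c / n) * math.log(visits[i] / c)
    return rate


alternating = [0, 1] * 50        # deterministic chain: entropy rate 0
period_four = [0, 0, 1, 1] * 25  # each state stays or switches about equally often

print(empirical_entropy_rate(alternating))
print(empirical_entropy_rate(period_four))
```

The second path is deterministic, yet its first-order empirical transition frequencies look like a fair coin, so the plug-in value is close to $\log 2$; the estimator only sees pairwise transition counts.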


NeurIPS Conference 2018 Conference Paper

The Nearest Neighbor Information Estimator is Adaptively Near Minimax Rate-Optimal

Jiantao Jiao
Weihao Gao
Yanjun Han

We analyze the Kozachenko–Leonenko (KL) fixed k-nearest neighbor estimator for the differential entropy. We obtain the first uniform upper bound on its performance for any fixed k over Hölder balls on a torus without assuming any conditions on how close the density could be to zero. Accompanying a recent minimax lower bound over the Hölder ball, we show that the KL estimator for any fixed k achieves the minimax rates up to logarithmic factors without cognizance of the smoothness parameter s of the Hölder ball for $s \in (0, 2]$ and arbitrary dimension d, rendering it the first estimator that provably satisfies this property.
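A minimal sketch of a fixed k-nearest neighbor entropy estimator of this type is given below, restricted to one dimension and plain Euclidean distance on the real line. One common form of the estimator is assumed, namely the average of $\ln(n \cdot 2 R_{i,k})$ minus $\psi(k)$, where $R_{i,k}$ is the distance from sample $i$ to its k-th nearest neighbor; the paper's torus setting and its exact normalization are not reproduced, and all names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(k):
    """Digamma at a positive integer: psi(k) = -gamma + sum_{j=1}^{k-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, k))


def kl_entropy_1d(samples, k=1):
    """Fixed-k nearest neighbor estimate of differential entropy (nats), 1-D.

    Assumes distinct sample values; a repeated value would give a zero
    nearest-neighbor distance and an undefined logarithm.
    """
    n = len(samples)
    total = 0.0
    for i, x in enumerate(samples):
        dists = sorted(abs(x - y) for j, y in enumerate(samples) if j != i)
        r = dists[k - 1]                  # distance to the k-th nearest neighbor
        total += math.log(n * 2.0 * r)    # 2r is the length of the 1-D ball
    return total / n - digamma_int(k)


rng = random.Random(0)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(400)]
estimate = kl_entropy_1d(gaussian, k=3)
true_entropy = 0.5 * math.log(2.0 * math.pi * math.e)  # entropy of N(0, 1)
print(estimate, true_entropy)
```

The brute-force neighbor search is quadratic in the sample size, which is fine for a few hundred points; a sorted-array or tree-based search would be the natural replacement for larger samples.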


NeurIPS Conference 2010 Conference Paper

Avoiding False Positive in Multi-Instance Learning

Yanjun Han
Qing Tao
Jue Wang

In multi-instance learning, there are two kinds of prediction failure, i.e., false negative and false positive. Current research mainly focuses on avoiding the former. We attempt to utilize the geometric distribution of instances inside positive bags to avoid both the former and the latter. Based on kernel principal component analysis, we define a projection constraint for each positive bag to classify its constituent instances far away from the separating hyperplane while placing positive instances and negative instances on opposite sides. We apply the Constrained Concave-Convex Procedure to solve the resulting problem. Empirical results demonstrate that our approach offers improved generalization performance.
