Arrow Research search

Author name cluster

My Phan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
2 author rows

Possible papers

4

ICML Conference 2024 Conference Paper

When is Transfer Learning Possible?

  • My Phan
  • Kianté Brantley
  • Stephanie Milani
  • Soroush Mehri
  • Gokul Swamy
  • Geoffrey J. Gordon

We present a general framework for transfer learning that is flexible enough to capture transfer in supervised, reinforcement, and imitation learning. Our framework enables new insights into the fundamental question of when we can successfully transfer learned information across problems. We model the learner as interacting with a sequence of problem instances, or environments, each of which is generated from a common structural causal model (SCM) by choosing the SCM’s parameters from restricted sets. We derive a procedure that can propagate restrictions on SCM parameters through the SCM’s graph structure to other parameters that we are trying to learn. The propagated restrictions then enable more efficient learning (i.e., transfer). By analyzing the procedure, we are able to challenge widely held beliefs about transfer learning. First, we show that having sparse changes across environments is neither necessary nor sufficient for transfer. Second, we show an example where the common heuristic of freezing a layer in a network causes poor transfer performance. We then use our procedure to select a more refined set of parameters to freeze, leading to successful transfer learning.

ICML Conference 2021 Conference Paper

Towards Practical Mean Bounds for Small Samples

  • My Phan
  • Philip S. Thomas
  • Erik G. Learned-Miller

Historically, to bound the mean for small sample sizes, practitioners have had to choose between using methods with unrealistic assumptions about the unknown distribution (e.g., Gaussianity) and methods like Hoeffding’s inequality that use weaker assumptions but produce much looser (wider) intervals. In 1969, Anderson (1969) proposed a mean confidence interval strictly better than or equal to Hoeffding’s whose only assumption is that the distribution’s support is contained in an interval $[a, b]$. For the first time since then, we present a new family of bounds that compares favorably to Anderson’s. We prove that each bound in the family has guaranteed coverage, i.e., it holds with probability at least $1-\alpha$ for all distributions on an interval $[a, b]$. Furthermore, one of the bounds is tighter than or equal to Anderson’s for all samples. In simulations, we show that for many distributions, the gain over Anderson’s bound is substantial.
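For orientation, the Hoeffding baseline mentioned in this abstract can be sketched in a few lines. This is only the classical one-sided Hoeffding upper bound, not Anderson's bound and not the new family of bounds from the paper. The function name `hoeffding_upper_bound` and the clipping of the result to `b` are illustrative assumptions.

```python
import math


def hoeffding_upper_bound(samples, a, b, alpha):
    """One-sided Hoeffding upper bound on the mean.

    Assumes the samples are i.i.d. draws from a distribution supported
    on [a, b]. With probability at least 1 - alpha, the true mean is at
    most the returned value. (Hypothetical helper, for illustration only.)
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    sample_mean = sum(samples) / n
    # Hoeffding: P(mean - sample_mean >= t) <= exp(-2 n t^2 / (b - a)^2).
    # Setting the right-hand side to alpha and solving for t gives:
    width = (b - a) * math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    # The mean can never exceed b, so clip the bound there.
    return min(b, sample_mean + width)
```

With only four samples on $[0, 1]$ and $\alpha = 0.05$, the added width is about 0.61, so the bound is often clipped to the trivial value `b`. This is the looseness at small sample sizes that the abstract refers to.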

NeurIPS Conference 2020 Conference Paper

Model Selection in Contextual Stochastic Bandit Problems

  • Aldo Pacchiano
  • My Phan
  • Yasin Abbasi Yadkori
  • Anup Rao
  • Julian Zimmert
  • Tor Lattimore
  • Csaba Szepesvari

We study bandit model selection in stochastic environments. Our approach relies on a master algorithm that selects between candidate base algorithms. We develop a master-base algorithm abstraction that can work with general classes of base algorithms and different types of adversarial master algorithms. Our methods rely on a novel and generic smoothing transformation for bandit algorithms that permits us to obtain optimal $O(\sqrt{T})$ model selection guarantees for stochastic contextual bandit problems as long as the optimal base algorithm satisfies a high probability regret guarantee. We show through a lower bound that even when one of the base algorithms has $O(\log T)$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret in model selection, even asymptotically. Using our techniques, we address model selection in a variety of problems such as misspecified linear contextual bandits \citep{lattimore2019learning}, linear bandit with unknown dimension \citep{Foster-Krishnamurthy-Luo-2019} and reinforcement learning with unknown feature maps. Our algorithm requires the knowledge of the optimal base regret to adjust the master learning rate. We show that without such prior knowledge any master can suffer a regret larger than the optimal base regret.

NeurIPS Conference 2019 Conference Paper

Thompson Sampling and Approximate Inference

  • My Phan
  • Yasin Abbasi Yadkori
  • Justin Domke

We study the effects of approximate inference on the performance of Thompson sampling in $k$-armed bandit problems. Thompson sampling is a successful algorithm for online decision-making but requires posterior inference, which often must be approximated in practice. We show that even small constant inference error (in $\alpha$-divergence) can lead to poor performance (linear regret) due to under-exploration (for $\alpha < 1$) or over-exploration (for $\alpha > 0$) by the approximation. While for $\alpha > 0$ this is unavoidable, for $\alpha \leq 0$ the regret can be improved by adding a small amount of forced exploration even when the inference error is a large constant.
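As a rough illustration of the setting, the sketch below runs Thompson sampling on a Bernoulli $k$-armed bandit with an optional forced-exploration step. It makes several assumptions that do not come from the paper: it uses the exact Beta posterior rather than an approximate one, forced exploration is implemented as a uniform random pull with probability `explore_prob`, and all names are hypothetical.

```python
import random


def thompson_with_forced_exploration(true_means, horizon, explore_prob=0.0, seed=0):
    """Beta-Bernoulli Thompson sampling with optional forced exploration.

    Illustrative only: the posterior here is exact, whereas the abstract
    above studies what happens when it must be approximated.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(horizon):
        if rng.random() < explore_prob:
            # Forced exploration: pull an arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Draw one sample per arm from its Beta(1 + s, 1 + f) posterior
            # and play the arm with the largest sampled mean.
            draws = [
                rng.betavariate(successes[i] + 1, failures[i] + 1)
                for i in range(k)
            ]
            arm = max(range(k), key=draws.__getitem__)
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

Calling `thompson_with_forced_exploration([0.1, 0.9], horizon=2000, explore_prob=0.05)` returns per-arm pull counts. Replacing the exact `betavariate` draw with a draw from an approximate posterior would give a setup closer to the one analyzed in the abstract.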