Arrow Research search

Author name cluster

Jonathan Lee

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
1 author row

Possible papers

AAAI Conference 2025 Conference Paper

Cost-Aware Near-Optimal Policy Learning

  • Joy He-Yueya
  • Jonathan Lee
  • Matthew Jörke
  • Emma Brunskill

It is often of interest to learn a context-sensitive decision policy, such as in contextual multi-armed bandit processes. To quantify the efficiency of a machine learning algorithm for such settings, probably approximately correct (PAC) bounds, which bound the number of samples required, or cumulative regret guarantees, are typically used. However, real-world settings often have limited resources for experimentation, and decisions/interventions may differ in the amount of resources required (e.g., money or time). Therefore, it is of interest to consider how to design an experiment strategy that reduces the experimental budget needed to learn a near-optimal contextual policy. Unlike reinforcement learning or bandit approaches that embed costs into the reward function, we focus on reducing resource use in learning a near-optimal policy without resource constraints. We introduce two resource-aware algorithms for the contextual bandit setting and prove their soundness. Simulations based on real-world datasets demonstrate that our algorithms significantly reduce the resources needed to learn a near-optimal decision policy compared to previous resource-unaware methods.

TMLR Journal 2024 Journal Article

Estimating Optimal Policy Value in Linear Contextual Bandits Beyond Gaussianity

  • Jonathan Lee
  • Weihao Kong
  • Aldo Pacchiano
  • Vidya Muthukumar
  • Emma Brunskill

In many bandit problems, the maximal reward achievable by a policy is often unknown in advance. We consider the problem of estimating the optimal policy value in the sublinear data regime before the optimal policy is even learnable. We refer to this as $V^*$ estimation. It was previously shown that fast $V^*$ estimation is possible but only in disjoint linear bandits with Gaussian covariates. Whether this is possible for more realistic context distributions has remained an open and important question for tasks such as model selection. In this paper, we first provide lower bounds showing that this general problem is hard. However, under stronger assumptions, we give an algorithm and analysis proving that $\widetilde{\mathcal{O}}(\sqrt{d})$ sublinear estimation of $V^*$ is indeed information-theoretically possible, where $d$ is the dimension. We subsequently introduce a practical and computationally efficient algorithm that estimates a problem-specific upper bound on $V^*$, valid for general distributions and tight for Gaussian context distributions. We prove our algorithm requires only $\widetilde{\mathcal{O}}(\sqrt{d})$ samples to estimate the upper bound. We use this upper bound in conjunction with the estimator to derive novel and improved guarantees for several applications in bandit model selection and testing for treatment effects. We present promising experimental benefits on a semi-synthetic simulation using historical data on warfarin treatment dosage outcomes.

NeurIPS Conference 2023 Conference Paper

Experiment Planning with Function Approximation

  • Aldo Pacchiano
  • Jonathan Lee
  • Emma Brunskill

We study the problem of experiment planning with function approximation in contextual bandit problems. In settings where there is a significant overhead to deploying adaptive algorithms---for example, when the execution of the data collection policies is required to be distributed, or a human in the loop is needed to implement these policies---producing in advance a set of policies for data collection is paramount. We study the setting where a large dataset of contexts but not rewards is available and may be used by the learner to design an effective data collection strategy. Although when rewards are linear this problem has been well studied, results are still missing for more complex reward models. In this work we propose two experiment planning strategies compatible with function approximation. The first is an eluder planning and sampling procedure that can recover optimality guarantees depending on the eluder dimension of the reward function class. For the second, we show that a uniform sampler achieves competitive optimality rates in the setting where the number of actions is small. We finalize our results introducing a statistical gap fleshing out the fundamental differences between planning and adaptive learning and provide results for planning with model selection.

NeurIPS Conference 2023 Conference Paper

Supervised Pretraining Can Learn In-Context Reinforcement Learning

  • Jonathan Lee
  • Annie Xie
  • Aldo Pacchiano
  • Yash Chandak
  • Chelsea Finn
  • Ofir Nachum
  • Emma Brunskill

Large transformer models trained on diverse datasets have shown a remarkable ability to learn in-context, achieving high few-shot performance on tasks they were not explicitly trained to solve. In this paper, we study the in-context learning capabilities of transformers in decision-making problems, i.e., reinforcement learning (RL) for bandits and Markov decision processes. To do so, we introduce and study the Decision-Pretrained Transformer (DPT), a supervised pretraining method where a transformer predicts an optimal action given a query state and an in-context dataset of interactions from a diverse set of tasks. While simple, this procedure produces a model with several surprising capabilities. We find that the trained transformer can solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline, despite not being explicitly trained to do so. The model also generalizes beyond the pretraining distribution to new tasks and automatically adapts its decision-making strategies to unknown structure. Theoretically, we show DPT can be viewed as an efficient implementation of Bayesian posterior sampling, a provably sample-efficient RL algorithm. We further leverage this connection to provide guarantees on the regret of the in-context algorithm yielded by DPT, and prove that it can learn faster than algorithms used to generate the pretraining data. These results suggest a promising yet simple path towards instilling strong in-context decision-making abilities in transformers.

YNIMG Journal 2018 Journal Article

7T MR of intracranial pathology: Preliminary observations and comparisons to 3T and 1.5T

  • Emmanuel C. Obusez
  • Mark Lowe
  • Se-Hong Oh
  • Irene Wang
  • Jennifer Bullen
  • Paul Ruggieri
  • Virginia Hill
  • Daniel Lockwood

Purpose: There has been an increasing number of studies involving ultra-high-field 7T imaging of intracranial pathology; however, comprehensive clinical studies of neuropathology at 7T remain limited. 7T has the advantage of a higher signal-to-noise ratio and a higher contrast-to-noise ratio compared to current low-field clinical MR scanners. We hypothesized that 7T, applied clinically, may improve detection and characterization of intracranial pathology.
Materials and methods: We performed an IRB-approved prospective 7T study of patients with neurological disease who previously had lower-field 3T and 1.5T scans. All patients underwent 7T scans, using comparable clinical imaging protocols, with the aim of qualitatively comparing neurological lesions at 7T with 3T or 1.5T. To qualitatively assess lesion conspicuity at 7T compared with low field, 80 paired images were viewed by 10 experienced neuroradiologists and scored on a 5-point scale. Inter-rater agreement was characterized using raw percent agreement and mean weighted kappa.
Results: One hundred and four patients with known neurological disease have been scanned to date. These comprise 55 patients with epilepsy, 18 patients with mild traumatic brain injury, 11 patients with known or suspected multiple sclerosis, 9 patients with amyotrophic lateral sclerosis, 4 patients with intracranial neoplasm, 2 patients with orbital melanoma, 2 patients with cortical infarcts, 2 patients with cavernous malformations, and 1 patient with cerebral amyloid angiopathy. From qualitative observations, we found better resolution and improved detection of lesions at 7T compared to 3T. There was a 55% raw inter-rater agreement that lesions were more conspicuous on 7T than 3T/1.5T, compared with a 6% agreement that lesions were more conspicuous on 3T/1.5T than 7T.
Conclusion: Our findings show that the primary clinical advantages of 7T magnets, which include higher signal-to-noise ratio, higher contrast-to-noise ratio, smaller voxels and stronger susceptibility contrast, may increase lesion conspicuity, detection and characterization compared to low-field 1.5T and 3T. However, low-field imaging, which detects a plethora of intracranial pathology, remains the mainstay for diagnostic imaging until limitations at 7T are addressed and further evidence of utility is provided.