Author name cluster

Finale Doshi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers

1 author row

ICLR Conference 2025 Conference Paper

Connecting Federated ADMM to Bayes

Siddharth Swaroop
Mohammad Emtiyaz Khan
Finale Doshi

We provide new connections between two distinct federated learning approaches based on (i) ADMM and (ii) Variational Bayes (VB), and propose new variants by combining their complementary strengths. Specifically, we show that the dual variables in ADMM naturally emerge through the "site" parameters used in VB with isotropic Gaussian covariances. Using this, we derive two versions of ADMM from VB that use flexible covariances and functional regularisation, respectively. Through numerical experiments, we validate the improvements obtained in performance. The work shows connection between two fields that are believed to be fundamentally different and combines them to improve federated learning.

UAI Conference 2025 Conference Paper

Transparent Trade-offs between Properties of Explanations

Hiwot Belay Tadesse
Alihan Hüyük
Yaniv Yacoby
Weiwei Pan
Finale Doshi

When explaining machine learning models, it is important for explanations to have certain properties like faithfulness, robustness, smoothness, low complexity, etc. However, many properties are in tension with each other, making it challenging to achieve them simultaneously. For example, reducing the complexity of an explanation can make it less expressive, compromising its faithfulness. The ideal balance of trade-offs between properties tends to vary across different tasks and users. Motivated by these varying needs, we aim to find explanations that make optimal trade-offs while allowing for transparent control over the balance between different properties. Unlike existing methods that encourage desirable properties implicitly through their design, our approach optimizes explanations explicitly for a linear mixture of multiple properties. By adjusting the mixture weights, users can control the balance between those properties and create explanations with precisely what is needed for their particular task.

ICLR Conference 2023 Conference Paper

Performance Bounds for Model and Policy Transfer in Hidden-parameter MDPs

Haotian Fu
Jiayu Yao
Omer Gottesman
Finale Doshi
George Konidaris 0001

In the Hidden-Parameter MDP (HiP-MDP) framework, a family of reinforcement learning tasks is generated by varying hidden parameters specifying the dynamics and reward function for each individual task. HiP-MDP is a natural model for families of tasks in which meta- and lifelong-reinforcement learning approaches can succeed. Given a learned context encoder that infers the hidden parameters from previous experience, most existing algorithms fall into two categories: $\textit{model transfer}$ and $\textit{policy transfer}$, depending on which function the hidden parameters are used to parameterize. We characterize the robustness of model and policy transfer algorithms with respect to hidden parameter estimation error. We first show that the value function of HiP-MDPs is Lipschitz continuous under certain conditions. We then derive regret bounds for both settings through the lens of Lipschitz continuity. Finally, we empirically corroborate our theoretical analysis by experimentally varying the hyper-parameters governing the Lipschitz constants of two continuous control problems; the resulting performance is consistent with our predictions.

ICML Conference 2023 Conference Paper

The Unintended Consequences of Discount Regularization: Improving Regularization in Certainty Equivalence Reinforcement Learning

Sarah Rathnam
Sonali Parbhoo
Weiwei Pan
Susan A. Murphy
Finale Doshi

Discount regularization, using a shorter planning horizon when calculating the optimal policy, is a popular choice to restrict planning to a less complex set of policies when estimating an MDP from sparse or noisy data (Jiang et al. , 2015). It is commonly understood that discount regularization functions by de-emphasizing or ignoring delayed effects. In this paper, we reveal an alternate view of discount regularization that exposes unintended consequences. We demonstrate that planning under a lower discount factor produces an identical optimal policy to planning using any prior on the transition matrix that has the same distribution for all states and actions. In fact, it functions like a prior with stronger regularization on state-action pairs with more transition data. This leads to poor performance when the transition matrix is estimated from data sets with uneven amounts of data across state-action pairs. Our equivalence theorem leads to an explicit formula to set regularization parameters locally for individual state-action pairs rather than globally. We demonstrate the failures of discount regularization and how we remedy them using our state-action-specific method across simple empirical examples as well as a medical cancer simulator.

ICML Conference 2021 Conference Paper

Benchmarks, Algorithms, and Metrics for Hierarchical Disentanglement

Andrew Slavin Ross
Finale Doshi

In representation learning, there has been recent interest in developing algorithms to disentangle the ground-truth generative factors behind a dataset, and metrics to quantify how fully this occurs. However, these algorithms and metrics often assume that both representations and ground-truth factors are flat, continuous, and factorized, whereas many real-world generative processes involve rich hierarchical structure, mixtures of discrete and continuous variables with dependence between them, and even varying intrinsic dimensionality. In this work, we develop benchmarks, algorithms, and metrics for learning such hierarchical representations.

ICML Conference 2021 Conference Paper

State Relevance for Off-Policy Evaluation

Simon P. Shen
Yecheng Jason Ma 0001
Omer Gottesman
Finale Doshi

Importance sampling-based estimators for off-policy evaluation (OPE) are valued for their simplicity, unbiasedness, and reliance on relatively few assumptions. However, the variance of these estimators is often high, especially when trajectories are of different lengths. In this work, we introduce Omitting-States-Irrelevant-to-Return Importance Sampling (OSIRIS), an estimator which reduces variance by strategically omitting likelihood ratios associated with certain states. We formalize the conditions under which OSIRIS is unbiased and has lower variance than ordinary importance sampling, and we demonstrate these properties empirically.

ICML Conference 2020 Conference Paper

Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions

Omer Gottesman
Joseph Futoma
Yao Liu 0009
Sonali Parbhoo
Leo A. Celi
Emma Brunskill
Finale Doshi

Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education, but safe deployment in high stakes settings requires ways of assessing its validity. Traditional measures such as confidence intervals may be insufficient due to noise, limited data and confounding. In this paper we develop a method that could serve as a hybrid human-AI system, to enable human experts to analyze the validity of policy evaluation estimates. This is accomplished by highlighting observations in the data whose removal will have a large effect on the OPE estimate, and formulating a set of rules for choosing which ones to present to domain experts for validation. We develop methods to compute exactly the influence functions for fitted Q-evaluation with two different function classes: kernel-based and linear least squares, as well as importance sampling methods. Experiments on medical simulations and real-world intensive care unit data demonstrate that our method can be used to identify limitations in the evaluation process and make evaluation more robust.

UAI Conference 2020 Conference Paper

PoRB-Nets: Poisson Process Radial Basis Function Networks

Beau Coker
Melanie Fernandes Pradier
Finale Doshi

Bayesian neural networks (BNNs) are flexible function priors well-suited to situations in which data are scarce and uncertainty must be quantified. Yet, common weight priors are able to encode little functional knowledge and can behave in undesirable ways. We present a novel prior over radial basis function networks (RBFNs) that allows for independent specification of functional amplitude variance and lengthscale (i. e. , smoothness), where the inverse lengthscale corresponds to the concentration of radial basis functions. When the lengthscale is uniform over the input space, we prove consistency and approximate variance stationarity. This is in contrast to common BNN priors, which are highly nonstationary. When the input dependence of the lengthscale is unknown, we show how it can be inferred. We compare this model’s behavior to standard BNNs and Gaussian processes using synthetic and real examples.

ICML Conference 2019 Conference Paper

Combining parametric and nonparametric models for off-policy evaluation

Omer Gottesman
Yao Liu 0009
Scott Sussex
Emma Brunskill
Finale Doshi

We consider a model-based approach to perform batch off-policy evaluation in reinforcement learning. Our method takes a mixture-of-experts approach to combine parametric and non-parametric models of the environment such that the final value estimate has the least expected error. We do so by first estimating the local accuracy of each model and then using a planner to select which model to use at every time step as to minimize the return error estimate along entire trajectories. Across a variety of domains, our mixture-based approach outperforms the individual models alone as well as state-of-the-art importance sampling-based estimators.

ICML Conference 2018 Conference Paper

Decomposition of Uncertainty in Bayesian Deep Learning for Efficient and Risk-sensitive Learning

Stefan Depeweg
José Miguel Hernández-Lobato
Finale Doshi
Steffen Udluft

Bayesian neural networks with latent variables are scalable and flexible probabilistic models: they account for uncertainty in the estimation of the network weights and, by making use of latent variables, can capture complex noise patterns in the data. Using these models we show how to perform and utilize a decomposition of uncertainty in aleatoric and epistemic components for decision making purposes. This allows us to successfully identify informative points for active learning of functions with heteroscedastic and bimodal noise. Using the decomposition we further define a novel risk-sensitive criterion for reinforcement learningto identify policies that balance expected cost, model-bias and noise aversion.

ICML Conference 2018 Conference Paper

Structured Variational Learning of Bayesian Neural Networks with Horseshoe Priors

Soumya Ghosh
Jiayu Yao
Finale Doshi

Bayesian Neural Networks (BNNs) have recently received increasing attention for their ability to provide well-calibrated posterior uncertainties. However, model selection—even choosing the number of nodes—remains an open question. Recent work has proposed the use of a horseshoe prior over node pre-activations of a Bayesian neural network, which effectively turns off nodes that do not help explain the data. In this work, we propose several modeling and inference advances that consistently improve the compactness of the model learned while maintaining predictive performance, especially in smaller-sample settings including reinforcement learning.

ICRA Conference 2012 Conference Paper

A Bayesian nonparametric approach to modeling battery health

Joshua Mason Joseph
Finale Doshi
Nicholas Roy

The batteries of many consumer products are both a substantial portion of the product's cost and commonly a first point of failure. Accurately predicting remaining battery life can lower costs by reducing unnecessary battery replacements. Unfortunately, battery dynamics are extremely complex, and we often lack the domain knowledge required to construct a model by hand. In this work, we take a data-driven approach and aim to learn a model of battery time-to-death from training data. Using a Dirichlet process prior over mixture weights, we learn an infinite mixture model for battery health. The Bayesian aspect of our model helps to avoid over-fitting while the nonparametric nature of the model allows the data to control the size of the model, preventing under-fitting. We demonstrate our model's effectiveness by making time-to-death predictions using real data from nickel-metal hydride battery packs.

ICML Conference 2011 Conference Paper

Infinite Dynamic Bayesian Networks

Finale Doshi
David Wingate
Joshua B. Tenenbaum
Nicholas Roy

ICML Conference 2011 Conference Paper

Online Discovery of Feature Dependencies

Alborz Geramifard
Finale Doshi
Josh Redding
Nicholas Roy
Jonathan P. How

ICML Conference 2009 Conference Paper

Accelerated sampling for the Indian Buffet Process

Finale Doshi
Zoubin Ghahramani

We often seek to identify co-occurring hidden features in a set of observations. The Indian Buffet Process (IBP) provides a non-parametric prior on the features present in each observation, but current inference techniques for the IBP often scale poorly. The collapsed Gibbs sampler for the IBP has a running time cubic in the number of observations, and the uncollapsed Gibbs sampler, while linear, is often slow to mix. We present a new linear-time collapsed Gibbs sampler for conjugate likelihood models and demonstrate its efficacy on large real-world datasets.

UAI Conference 2009 Conference Paper

Correlated Non-Parametric Latent Feature Models

Finale Doshi
Zoubin Ghahramani

We are often interested in explaining data through a set of hidden factors or features. When the number of hidden features is unknown, the Indian Buffet Process (IBP) is a nonparametric latent feature model that does not bound the number of active features in dataset. However, the IBP assumes that all latent features are uncorrelated, making it inadequate for many realworld problems. We introduce a framework for correlated nonparametric feature models, generalising the IBP. We use this framework to generate several specific models and demonstrate applications on realworld datasets.

ICML Conference 2008 Conference Paper

Reinforcement learning with limited reinforcement: using Bayes risk for active learning in POMDPs

Finale Doshi
Joelle Pineau
Nicholas Roy

IROS Conference 2007 Conference Paper

Collision detection in legged locomotion using supervised learning

Finale Doshi
Emma Brunskill
Alexander C. Shkolnik
Thomas Kollar
Khashayar Rohanimanesh
Russ Tedrake
Nicholas Roy

We propose a fast approach for detecting collision- free swing-foot trajectories for legged locomotion over extreme terrains. Instead of simulating the swing trajectories and checking for collisions along them, our approach uses machine learning techniques to predict whether a swing trajectory is collision-free. Using a set of local terrain features, we apply supervised learning to train a classifier to predict collisions. Both in simulation and on a real quadruped platform, our results show that our classifiers can improve the accuracy of collision detection compared to a real-time geometric approach without significantly increasing the computation time.