Arrow Research

Author name cluster

Jonathan How

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
1 author row

Possible papers

AAAI Conference 2019 · Conference Paper

Collective Online Learning of Gaussian Processes in Massive Multi-Agent Systems

  • Trong Nghia Hoang
  • Quang Minh Hoang
  • Kian Hsiang Low
  • Jonathan How

This paper presents a novel Collective Online Learning of Gaussian Processes (COOL-GP) framework for enabling a massive number of GP inference agents to simultaneously perform (a) efficient online updates of their GP models using their local streaming data with varying correlation structures and (b) decentralized fusion of their resulting online GP models with different learned hyperparameter settings and inducing inputs. To realize this, we exploit the notion of a common encoding structure to encapsulate the local streaming data gathered by any GP inference agent into summary statistics based on our proposed representation, which is amenable to both an efficient online update via an importance sampling trick and multi-agent model fusion via decentralized message passing that can exploit sparse connectivity among agents for improving efficiency and enhancing the robustness of our framework against transmission loss. We provide a rigorous theoretical analysis of the approximation loss arising from our proposed representation to achieve efficient online updates and model fusion. Empirical evaluations show that COOL-GP is highly effective in model fusion, resilient to information disparity between agents, robust to transmission loss, and can scale to thousands of agents.
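
The fusion step described above reduces, at its simplest, to decentralized averaging of encoded summary statistics over a sparsely connected agent network. A minimal consensus-averaging sketch of that idea (plain gossip in NumPy; the GP-specific encoding and importance-sampling update are not reproduced, and all names are illustrative):

```python
import numpy as np

def gossip_average(stats, adjacency, rounds=50):
    """Decentralized averaging of per-agent summary statistics.

    stats[i] is agent i's local summary vector; adjacency is a symmetric
    0/1 matrix over agents. Each round, every agent mixes its statistics
    with its neighbors' values, and for a connected graph all agents
    converge to a common consensus (a weighted average of the initial
    values) using only local message passing.
    """
    X = np.array(stats, dtype=float)
    A = np.asarray(adjacency, dtype=float) + np.eye(len(X))  # include self-loop
    W = A / A.sum(axis=1, keepdims=True)                     # row-stochastic mixing
    for _ in range(rounds):
        X = W @ X                                            # one exchange round
    return X

# Three agents on a line graph: 0-1, 1-2.
stats = [[1.0, 0.0], [2.0, 0.0], [9.0, 3.0]]
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(gossip_average(stats, adjacency))
```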

NeurIPS Conference 2016 · Conference Paper

Improving PAC Exploration Using the Median Of Means

  • Jason Pazis
  • Ronald Parr
  • Jonathan How

We present the first application of the median of means in a PAC exploration algorithm for MDPs. Using the median of means allows us to significantly reduce the dependence of our bounds on the range of values that the value function can take, while introducing a dependence on the (potentially much smaller) variance of the Bellman operator. Additionally, ours is the first algorithm with PAC bounds that can be applied to MDPs with unbounded rewards.
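
For readers unfamiliar with the estimator the abstract builds on, here is a minimal sketch of the median of means itself (NumPy; this is the generic estimator, not the paper's exploration algorithm):

```python
import numpy as np

def median_of_means(samples, num_groups=5, rng=None):
    """Median-of-means estimate of E[X] from i.i.d. samples.

    Splits the samples into num_groups disjoint groups, averages each
    group, and returns the median of those averages. Unlike the plain
    empirical mean, the error of this estimate scales with the variance
    of the samples rather than their range, which is the property the
    paper exploits in its PAC bounds.
    """
    rng = np.random.default_rng(rng)
    samples = np.asarray(samples, dtype=float)
    perm = rng.permutation(len(samples))            # random disjoint groups
    groups = np.array_split(samples[perm], num_groups)
    return float(np.median([g.mean() for g in groups]))

# Heavy-tailed example: the group medians tame the outliers.
rng = np.random.default_rng(0)
data = rng.standard_t(df=2.5, size=1000)            # true mean is 0
print(median_of_means(data, num_groups=9))
print(data.mean())
```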

AAAI Conference 2016 · Conference Paper

Learning for Decentralized Control of Multiagent Systems in Large, Partially-Observable Stochastic Environments

  • Miao Liu
  • Christopher Amato
  • Emily Anesta
  • John Griffith
  • Jonathan How

Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general framework for multiagent sequential decision-making under uncertainty. Although Dec-POMDPs are typically intractable to solve for real-world problems, recent research on macro-actions (i.e., temporally-extended actions) has significantly increased the size of problems that can be solved. However, current methods assume the underlying Dec-POMDP model is known a priori or a full simulator is available during planning time. To accommodate more realistic scenarios, when such information is not available, this paper presents a policy-based reinforcement learning approach, which learns the agent policies based solely on trajectories generated by previous interaction with the environment (e.g., demonstrations). We show that our approach is able to generate valid macro-action controllers and develop an expectation-maximization (EM) algorithm (called Policy-based EM or PoEM), which has convergence guarantees for batch learning. Our experiments show PoEM is a scalable learning method that can learn optimal policies and improve upon hand-coded “expert” solutions.

RLDM Conference 2015 · Conference Abstract

Learning for Multiagent Decentralized Control in Large Partially Observable Stochastic Environments

  • Miao Liu
  • Christopher Amato
  • Emily Anesta
  • John Griffith
  • Jonathan How

This paper presents a probabilistic framework for learning decentralized control policies for cooperative multiagent systems operating in a large partially observable stochastic environment based on batch data (trajectories). In decentralized domains, because of communication limitations, the agents cannot share their entire belief states, so execution must proceed based on local information. Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general framework for modeling multiagent sequential decision making processes in the presence of uncertainty. Although Dec-POMDPs are typically intractable to solve for real-world problems, recent research on macro-actions in Dec-POMDPs has significantly increased the size of problems that can be solved. However, existing methods are confined to tree-based policies in finite-horizon problems, and assume the underlying POMDP models are known a priori. To accommodate more realistic scenarios when the full POMDP model is unavailable and the planning horizon is unbounded, this paper presents a policy-based reinforcement learning approach to learn the macro-action policies represented by Mealy machines. Based on trajectories of macro-actions, observations, and rewards generated by interacting with the environment with hand-coded policies (demonstrations) and random exploration, an expectation-maximization (EM) algorithm is proposed to learn the decentralized macro-action policies, leading to a new framework called POEM (Policy-based EM), which has convergence guarantees for batch learning. The performance of POEM is demonstrated on two domains, including a benchmark navigation-among-movable-obstacles problem and a newly designed large search-and-rescue problem. Our empirical study shows POEM is a scalable batch learning method that can learn optimal policies and achieve policy improvement over hand-coded (suboptimal) policies for missions in partially observable stochastic environments.

RLDM Conference 2015 · Conference Abstract

RLPy: A Value-Function-Based Reinforcement Learning Framework for Education and Research

  • Alborz Geramifard
  • Christoph Dann
  • Robert Klein
  • William Dabney
  • Jonathan How

RLPy (http://acl.mit.edu/rlpy) is an open-source reinforcement learning (RL) package with a focus on linear function approximation for value-based techniques and planning problems with discrete actions. The aim of this package is to: a) boost the RL education process, and b) enable crisp and easy-to-debug experimentation with existing and new methods. RLPy achieves these goals by providing a rich library of fine-grained, easily exchangeable components for learning agents (e.g., policies or representations of value functions). Developed in Python, RLPy allows fast prototyping, yet harnesses the power of state-of-the-art numerical libraries such as scipy and parallelization to scale to large problems. Furthermore, RLPy is self-contained: the package includes code profiling, domain visualizations, and data analysis. Finally, RLPy is available under the Modified BSD License, which allows integration with third-party software with little legal entanglement.
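
For flavor, a usage sketch in the component style the abstract describes. The class and keyword names below follow RLPy's GridWorld tutorial but should be treated as assumptions; they have shifted across package versions, so check them against your installed copy:

```python
# Sketch after RLPy's GridWorld tutorial; names may differ by version.
from rlpy.Domains import GridWorld
from rlpy.Agents import Q_Learning
from rlpy.Representations import Tabular
from rlpy.Policies import eGreedy
from rlpy.Experiments import Experiment

domain = GridWorld()                              # problem to solve
representation = Tabular(domain)                  # value-function features
policy = eGreedy(representation, epsilon=0.2)     # exploration strategy
agent = Q_Learning(policy=policy, representation=representation,
                   discount_factor=domain.discount_factor,
                   initial_learn_rate=0.1)
experiment = Experiment(agent=agent, domain=domain, exp_id=1,
                        max_steps=5000, path="./Results/GridWorld")
experiment.run()                                  # learn and log results
experiment.save()
```

Each component (domain, representation, policy, agent) can be swapped independently, which is the "fine-grained, easily exchangeable" design the abstract advertises.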

NeurIPS Conference 2015 · Conference Paper

Streaming, Distributed Variational Inference for Bayesian Nonparametrics

  • Trevor Campbell
  • Julian Straub
  • John Fisher III
  • Jonathan How

This paper presents a methodology for creating streaming, distributed inference algorithms for Bayesian nonparametric (BNP) models. In the proposed framework, processing nodes receive a sequence of data minibatches, compute a variational posterior for each, and make asynchronous streaming updates to a central model. In contrast to previous algorithms, the proposed framework is truly streaming, distributed, asynchronous, learning-rate-free, and truncation-free. The key challenge in developing the framework, arising from the fact that BNP models do not impose an inherent ordering on their components, is finding the correspondence between minibatch and central BNP posterior components before performing each update. To address this, the paper develops a combinatorial optimization problem over component correspondences, and provides an efficient solution technique. The paper concludes with an application of the methodology to the DP mixture model, with experimental results demonstrating its practical scalability and performance.
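
The component-correspondence step the abstract highlights can be illustrated with a toy version for Gaussian mixture components, using the Hungarian algorithm from SciPy. This is a simplification under stated assumptions, not the paper's optimization over BNP posteriors:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_and_merge(central, mini, new_cost=5.0):
    """Toy correspondence-then-merge step for Gaussian mixture components.

    central and mini are lists of (mean, count) pairs. Minibatch
    components are matched to central ones by squared distance between
    means (Hungarian algorithm); a match costlier than new_cost, or an
    unmatched minibatch component, instead creates a new central
    component. The paper solves a richer combinatorial problem over BNP
    posteriors, but the need to find a correspondence before any
    streaming update can be applied is the same.
    """
    means = [np.asarray(m, dtype=float) for m, _ in central]
    counts = [n for _, n in central]
    if not means:
        return [(np.asarray(m, dtype=float), n) for m, n in mini]
    cost = np.array([[np.sum((np.asarray(m) - c) ** 2) for c in means]
                     for m, _ in mini])
    rows, cols = linear_sum_assignment(cost)
    matched = dict(zip(rows, cols))
    for r, (m, n) in enumerate(mini):
        m = np.asarray(m, dtype=float)
        c = matched.get(r)
        if c is not None and cost[r, c] <= new_cost:
            total = counts[c] + n                 # count-weighted mean merge
            means[c] = (counts[c] * means[c] + n * m) / total
            counts[c] = total
        else:                                     # birth of a new component
            means.append(m)
            counts.append(n)
    return list(zip(means, counts))
```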

RLDM Conference 2013 · Conference Abstract

Bayesian Nonparametric Adaptive Control using Gaussian Processes

  • Girish Chowdhary
  • Hassan Kingravi
  • Jonathan How
  • Patricio Vela

The problem of making control decisions over time to achieve a desired behavior goal for a dynamical system has been widely studied in the control systems literature. The paradigm of model reference adaptive control is concerned with guaranteeing stability of the dynamical system being controlled and ensuring that it behaves like a designer-chosen reference model in the presence of uncertainty. Most current model reference adaptive control methods rely on parametric adaptive elements, in which the number of parameters of the adaptive element is fixed a priori, often through expert judgment. Examples of such adaptive elements are the commonly used Radial Basis Function (RBF) Neural Networks (NNs) with centers pre-allocated based on the expected operating domain. If the system operates outside of the expected operating domain, such adaptive elements can become ineffective, rendering the adaptive controller only semi-global in nature. This paper investigates Gaussian process based adaptive elements, which generalize the notion of Gaussian distributions to function approximation. We show through stochastic stability arguments that these nonparametric adaptive elements guarantee good closed-loop performance with minimal prior domain knowledge of the uncertainty. Online-implementable GP inference methods are evaluated in simulations and compared with RBF-NN adaptive controllers with pre-allocated centers.
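
A minimal sketch of the contrast being drawn: an adaptive element whose basis follows the visited states rather than pre-allocated centers. This uses scikit-learn's GP regressor with a sliding-window refit as a stand-in for the paper's online GP inference; all names are illustrative:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

class GPAdaptiveElement:
    """GP stand-in for a fixed-center RBF-NN adaptive element.

    Refits a GP on a sliding window of (state, model-error) pairs and
    returns its posterior mean as the adaptive correction. Unlike an
    RBF-NN with pre-allocated centers, the basis follows the visited
    states, so coverage is not limited to a pre-chosen operating domain.
    The windowed batch refit is a simplification of the paper's online
    sparse GP inference.
    """
    def __init__(self, window=200):
        self.window = window
        self.X, self.y = [], []
        self.gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                                           alpha=1e-3)

    def observe(self, state, model_error):
        """Record one (state, model-error) sample and refit the GP."""
        self.X.append(np.asarray(state, dtype=float))
        self.y.append(float(model_error))
        self.X, self.y = self.X[-self.window:], self.y[-self.window:]
        self.gp.fit(np.vstack(self.X), np.array(self.y))

    def correction(self, state):
        """Posterior-mean estimate of the model error at this state."""
        x = np.asarray(state, dtype=float).reshape(1, -1)
        return float(self.gp.predict(x)[0])
```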

NeurIPS Conference 2013 · Conference Paper

Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture

  • Trevor Campbell
  • Miao Liu
  • Brian Kulis
  • Jonathan How
  • Lawrence Carin

This paper presents a novel algorithm, based upon the dependent Dirichlet process mixture model (DDPMM), for clustering batch-sequential data containing an unknown number of evolving clusters. The algorithm is derived via a low-variance asymptotic analysis of the Gibbs sampling algorithm for the DDPMM, and provides a hard clustering with convergence guarantees similar to those of the k-means algorithm. Empirical results from a synthetic test with moving Gaussian clusters and a test with real ADS-B aircraft trajectory data demonstrate that the algorithm requires orders of magnitude less computational time than contemporary probabilistic and hard clustering algorithms, while providing higher accuracy on the examined datasets.
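
The flavor of such low-variance asymptotics is easiest to see in the static special case, the DP-means assignment rule sketched below; the paper's Dynamic Means algorithm additionally carries clusters across batches with birth, death, and motion costs, which are omitted here:

```python
import numpy as np

def dp_means(X, lam, max_iters=100):
    """Hard clustering with a cluster-creation penalty lam (DP-means style).

    The static special case of the low-variance asymptotics used in the
    paper: a point joins its nearest mean unless the squared distance
    exceeds lam, in which case it starts a new cluster.
    """
    X = np.asarray(X, dtype=float)
    means = [X[0].copy()]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iters):
        # Assignment step, with possible creation of new clusters.
        for i, x in enumerate(X):
            d2 = [np.sum((x - m) ** 2) for m in means]
            k = int(np.argmin(d2))
            if d2[k] > lam:
                means.append(x.copy())
                k = len(means) - 1
            labels[i] = k
        # Update step: recompute each non-empty cluster's mean.
        new_means = [X[labels == k].mean(axis=0) if np.any(labels == k)
                     else means[k] for k in range(len(means))]
        if all(np.allclose(a, b) for a, b in zip(means, new_means)):
            break
        means = new_means
    return means, labels
```

Larger lam yields fewer clusters; the number of clusters is never fixed in advance, which is what the asymptotic analysis preserves from the underlying DDPMM.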

RLDM Conference 2013 · Conference Abstract

Off-Policy Learning Combined with Automatic Feature Expansion for Solving Large MDPs

  • Alborz Geramifard
  • Christoph Dann
  • Jonathan How

Reinforcement learning (RL) techniques with cheap computational complexity and minimal hand-tuning that scale to large problems are highly desired among RL practitioners. Linear function approximation has scaled existing RL techniques to large problems [Lagoudakis and Parr, 2003; Silver et al., 2012], but that technique has two major drawbacks: 1) conventional off-policy techniques such as Q-Learning can be unstable when combined with linear function approximation [Baird, 1995], and 2) finding the “right” set of features for approximation can be challenging.

The first drawback has recently been addressed with the introduction of the Greedy-GQ algorithm, a convergent extension of Q-Learning [Maei et al., 2010]. The second drawback led to representation expansion techniques that add new features along the learning process [Geramifard et al., 2011; Keller et al., 2006; Parr et al., 2007]. Amongst these techniques, incremental Feature Dependency Discovery (iFDD) has shown great potential, as it scaled to large problems while enjoying convergence results [Geramifard et al., 2011]. Recently, iFDD+ [Geramifard et al., 2013b] improved the performance of iFDD in the prediction problem while outperforming the previous state-of-the-art batch expansion technique OMP-TD [Painter-Wakefield and Parr, 2012].

This paper connects Greedy-GQ learning with iFDD+ and, for the first time, introduces an online off-policy learning technique with automatic feature expansion. Given sparse features, the new algorithm has per-time-step complexity independent of the total number of features, while for most existing techniques feature discovery is at least quadratic in the number of features [Keller et al., 2006; Parr et al., 2007]. Empirical results across 3 domains with sizes up to 77 billion state-action pairs verify the scalability of our new approach.
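
A hedged sketch of the iFDD-style discovery loop the abstract refers to: accumulate TD-error relevance for conjunctions of co-active features and promote a conjunction to a full feature once its relevance crosses a threshold. Details (iFDD+'s normalized relevance, sorted discovery, sparse bookkeeping) follow the cited papers rather than this toy:

```python
from itertools import combinations
from collections import defaultdict

class IFDDSketch:
    """Sketch of incremental feature dependency discovery (iFDD-style).

    Tracks, for every pair of simultaneously active features, the
    accumulated absolute TD error; when a pair's total relevance crosses
    threshold, their conjunction becomes a new feature. iFDD+ instead
    normalizes relevance by visit counts, which is omitted here.
    """
    def __init__(self, num_base_features, threshold):
        self.num_features = num_base_features
        self.threshold = threshold
        self.relevance = defaultdict(float)
        self.conjunctions = {}          # frozenset of parents -> feature id

    def update(self, active_features, td_error):
        """Accumulate relevance; return any newly discovered feature ids."""
        discovered = []
        for pair in combinations(sorted(active_features), 2):
            key = frozenset(pair)
            if key in self.conjunctions:
                continue                # conjunction already a feature
            self.relevance[key] += abs(td_error)
            if self.relevance[key] > self.threshold:
                self.conjunctions[key] = self.num_features
                discovered.append(self.num_features)
                self.num_features += 1
        return discovered
```

With sparse activations, each update only touches pairs of currently active features, which is the source of the per-time-step complexity claim.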

RLDM Conference 2013 · Conference Abstract

Off-Policy Reinforcement Learning with Gaussian Processes

  • Girish Chowdhary
  • Miao Liu
  • Robert Grande
  • Jonathan How

An off-policy Bayesian nonparametric approximate reinforcement learning framework, termed GPQ, that employs a Gaussian process (GP) model of the value (Q) function is presented in both the batch and online settings. Sufficient conditions on GP hyperparameter selection are established to guarantee convergence of off-policy GPQ in the batch setting, and theoretical and practical extensions are provided for the online case. In particular, the convergence results in the batch case extend theoretical results on the Fitted Q-Iteration family of algorithms, and the online results provide a theoretical grounding for the use of sparse, budgeted GP representations. These results reveal a reason for potential divergence of off-policy approximate reinforcement learning employing Gaussian kernels, as well as hyperparameter selection conditions to eliminate this possibility. Empirical results demonstrate GPQ has competitive learning speeds in addition to its convergence guarantees and its ability to automatically choose its own basis locations.
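
The batch setting can be pictured as fitted Q-iteration with a GP regressor, sketched below with scikit-learn under that simplification; the paper's hyperparameter conditions and budgeted online updates are not reproduced:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gp_fitted_q(transitions, actions, gamma=0.95, iters=20):
    """Batch GP value-function approximation in the spirit of GPQ.

    transitions is a list of (s, a, r, s_next) tuples with 1-D state
    arrays; actions is the finite action set. Each iteration regresses a
    GP from (state, action) inputs to Bellman backups computed with the
    previous fit: plain fitted Q-iteration with a GP regressor.
    """
    X = np.array([np.append(s, a) for s, a, _, _ in transitions])
    R = np.array([r for _, _, r, _ in transitions])
    S_next = [s2 for _, _, _, s2 in transitions]
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
    gp.fit(X, np.zeros(len(R)))                   # Q_0 = 0 everywhere
    for _ in range(iters):
        targets = R + gamma * np.array([
            max(gp.predict(np.append(s2, a)[None, :])[0]
                for a in actions)                 # max-a Bellman backup
            for s2 in S_next])
        gp.fit(X, targets)                        # regress onto the backups
    return gp
```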

NeurIPS Conference 2013 · Conference Paper

Sensor Selection in High-Dimensional Gaussian Trees with Nuisances

  • Daniel Levine
  • Jonathan How

We consider the sensor selection problem on multivariate Gaussian distributions where only a subset of latent variables is of inferential interest. For pairs of vertices connected by a unique path in the graph, we show that there exist decompositions of nonlocal mutual information into local information measures that can be computed efficiently from the output of message passing algorithms. We integrate these decompositions into a computationally efficient greedy selector where the computational expense of quantification can be distributed across nodes in the network. Experimental results demonstrate the comparative efficiency of our algorithms for sensor selection in high-dimensional distributions. We additionally derive an online-computable performance bound based on augmentations of the relevant latent variable set that, when such a valid augmentation exists, is applicable for any distribution with nuisances.
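
The objective being greedily maximized can be written with Gaussian log-determinants. A brute-force sketch (NumPy) of greedy mutual-information selection follows; the paper's contribution is precisely to avoid these global log-det evaluations via local decompositions on trees:

```python
import numpy as np

def gaussian_mi(cov, idx_a, idx_b):
    """Mutual information I(x_A; x_B) for a joint Gaussian, via log-dets:
    I = 0.5 * (log|Cov_AA| + log|Cov_BB| - log|Cov_{A u B}|)."""
    a, b = list(idx_a), list(idx_b)
    ab = a + b
    _, ld_a = np.linalg.slogdet(cov[np.ix_(a, a)])
    _, ld_b = np.linalg.slogdet(cov[np.ix_(b, b)])
    _, ld_ab = np.linalg.slogdet(cov[np.ix_(ab, ab)])
    return 0.5 * (ld_a + ld_b - ld_ab)

def greedy_select(cov, relevant, candidates, budget):
    """Greedily pick budget sensors maximizing I(x_relevant; x_selected).

    Nuisance variables are simply those in neither set: they shape cov
    but carry no inferential reward of their own.
    """
    selected, remaining = [], list(candidates)
    for _ in range(budget):
        best = max(remaining,
                   key=lambda j: gaussian_mi(cov, relevant, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected
```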

RLDM Conference 2013 · Conference Abstract

Simultaneous Clustering on Representation Expansion for Learning Multimodel MDPs

  • Trevor Campbell
  • Robert Klein
  • Alborz Geramifard
  • Jonathan How

This paper addresses the problem of model learning in a Markov decision process (MDP) that exhibits an underlying multiple model structure. In particular, each observed episode from the MDP has a latent classification that determines from which of an unknown number of models it was generated, and the goal is to determine both the number and the parameterization of the underlying models. The main challenge in solving this problem arises from the coupling between the separation of observations into groupings and the selection of a low-dimensional representation for each group. Present approaches to multiple model learning involve computationally expensive probabilistic inference over Bayesian nonparametric models. We propose Simultaneous Clustering on Representation Expansion (SCORE), an iterative scheme based on classical clustering and adaptive linear representations, which addresses this codependence in an efficient manner and guarantees convergence to a local optimum in model error. Both a batch and an incremental version of SCORE are presented. Empirical results on simulated domains demonstrate the advantages of SCORE when compared to contemporary techniques with respect to both sample and time complexity.
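
The codependence described above can be pictured with the clustering half alone: alternate hard assignment of episodes to linear models and least-squares refits, as in the hedged sketch below (the representation-expansion half and SCORE's convergence machinery are omitted):

```python
import numpy as np

def k_linear_models(episodes, num_models, iters=50, rng=None):
    """Alternate hard assignment and least-squares refit (k-models loop).

    episodes is a list of (Phi, y) pairs: a feature matrix and target
    vector from one episode each. This captures only the clustering half
    of SCORE; the paper additionally grows each group's representation
    adaptively, which is what resolves the coupling the abstract names.
    """
    rng = np.random.default_rng(rng)
    d = episodes[0][0].shape[1]
    weights = rng.normal(size=(num_models, d))
    assign = np.zeros(len(episodes), dtype=int)
    for _ in range(iters):
        # Assignment: each episode joins the model with lowest error.
        for i, (Phi, y) in enumerate(episodes):
            errs = [np.mean((Phi @ w - y) ** 2) for w in weights]
            assign[i] = int(np.argmin(errs))
        # Refit: least squares per model over its assigned episodes.
        new_weights = weights.copy()
        for k in range(num_models):
            data = [(P, t) for (P, t), a in zip(episodes, assign) if a == k]
            if data:
                Phi = np.vstack([P for P, _ in data])
                y = np.concatenate([t for _, t in data])
                new_weights[k] = np.linalg.lstsq(Phi, y, rcond=None)[0]
        if np.allclose(new_weights, weights):
            break                       # local optimum in model error
        weights = new_weights
    return weights, assign
```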