Arrow Research

Author name cluster

Robby Goetschalckx

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers (8)

IJCAI Conference 2015 Conference Paper

Active Imitation Learning of Hierarchical Policies

  • Mandana Hamidi
  • Prasad Tadepalli
  • Robby Goetschalckx
  • Alan Fern

In this paper, we study the problem of imitation learning of hierarchical policies from demonstrations. The main difficulty in learning hierarchical policies by imitation is that the high-level intention structure of the policy, which is often critical for understanding the demonstration, is unobserved. We formulate this problem as active learning of Probabilistic State-Dependent Grammars (PSDGs) from demonstrations. Given a set of expert demonstrations, our approach learns a hierarchical policy by actively selecting demonstrations and using queries to explicate their intentional structure at selected points. Our contributions include a new algorithm for imitation learning of hierarchical policies and principled heuristics for the selection of demonstrations and queries. Experimental results in five different domains show successful learning with fewer queries than a variety of alternatives.
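
As a rough illustration of the active-query step, the toy sketch below picks the demonstration point whose latent intention the current model is least certain about. The entropy criterion and all names here are illustrative stand-ins, not the paper's principled selection heuristics.

    import numpy as np

    def select_query(intention_posteriors):
        """Pick the (demo_id, timestep) whose latent high-level intention
        is most uncertain under the current model. A simple entropy
        criterion standing in for the paper's selection heuristics."""
        def entropy(p):
            p = np.asarray(p, dtype=float)
            p = p[p > 0]
            return float(-(p * np.log(p)).sum())
        # intention_posteriors: {(demo_id, t): probability vector over intentions}
        return max(intention_posteriors, key=lambda k: entropy(intention_posteriors[k]))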

IJCAI Conference 2015 Conference Paper

Multitask Coactive Learning

  • Robby Goetschalckx
  • Alan Fern
  • Prasad Tadepalli

In this paper we investigate the use of coactive learning in a multitask setting. In coactive learning, an expert presents the learner with a problem and the learner returns a candidate solution. The expert then improves on the solution if necessary and presents the improved solution to the learner. The goal for the learner is to learn to produce solutions which cannot be further improved by the expert while minimizing the average expert effort. In this paper, we consider the setting where there are multiple experts (tasks), and in each iteration one expert presents a problem to the learner. While the experts are expected to have different solution preferences, they are also assumed to share similarities, which should enable generalization across experts. We analyze several algorithms for this setting and derive bounds on the average expert effort during learning. Our main contribution is the balanced Perceptron algorithm, the first coactive learning algorithm that both generalizes across experts when possible and guarantees convergence to optimal solutions for individual experts. Our experiments in three domains confirm that this algorithm is effective in the multitask setting compared to natural baselines.
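
The round structure of coactive learning is concrete enough to sketch. Below is a minimal rendering of one round with a shared-plus-per-expert weight decomposition; the decomposition and all names are assumptions for illustration, not the paper's balanced Perceptron algorithm.

    import numpy as np

    def coactive_round(w_shared, w_task, task, x, candidates, phi, expert_improve):
        """One coactive round for one expert (sketch). The learner scores
        candidates with a shared weight vector plus a task-specific one;
        the expert returns an improved solution; both vectors receive a
        Perceptron-style update. Illustrative, not the paper's algorithm."""
        score = lambda y: (w_shared + w_task[task]) @ phi(x, y)
        y_hat = max(candidates, key=score)        # learner's proposal
        y_bar = expert_improve(task, x, y_hat)    # expert's improvement
        delta = phi(x, y_bar) - phi(x, y_hat)
        w_shared += delta        # shared part: generalizes across experts
        w_task[task] += delta    # task part: specializes to this expert
        return y_bar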

AAAI Conference 2014 Conference Paper

Coactive Learning for Locally Optimal Problem Solving

  • Robby Goetschalckx
  • Alan Fern
  • Prasad Tadepalli

Coactive learning is an online problem-solving setting where the solutions provided by a solver are interactively improved by a domain expert, which in turn drives learning. In this paper we extend the study of coactive learning to problems where obtaining a globally optimal or near-optimal solution may be intractable or where an expert can only be expected to make small, local improvements to a candidate solution. The goal of learning in this new setting is to minimize the cost as measured by the expert effort over time. We first establish theoretical bounds on the average cost of the existing coactive Perceptron algorithm. In addition, we consider new online algorithms that use cost-sensitive and Passive-Aggressive (PA) updates, showing similar or improved theoretical bounds. We provide an empirical evaluation of the learners in various domains, which shows that the Perceptron-based algorithms are quite effective and that, unlike the case for online classification, the PA algorithms do not yield significant performance gains.
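
For intuition about the PA-style variant, here is a hedged sketch of a Passive-Aggressive update for one coactive round. The unit margin and the cap C (the PA-I flavor) are illustrative choices, not the paper's exact update rule.

    import numpy as np

    def pa_coactive_update(w, phi_improved, phi_proposed, C=1.0):
        """Passive-Aggressive style update for a coactive round (sketch).
        delta points from the learner's proposal toward the expert's
        locally improved solution; the step size tau is just large enough
        to score the improvement ahead by a unit margin, capped at C."""
        delta = phi_improved - phi_proposed
        norm_sq = delta @ delta
        if norm_sq == 0.0:
            return w  # expert made no change; nothing to learn
        loss = max(0.0, 1.0 - w @ delta)   # margin violation
        tau = min(C, loss / norm_sq)       # aggressive but bounded step
        return w + tau * delta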

AAAI Conference 2014 Conference Paper

Imitation Learning with Demonstrations and Shaping Rewards

  • Kshitij Judah
  • Alan Fern
  • Prasad Tadepalli
  • Robby Goetschalckx

Imitation Learning (IL) is a popular approach for teaching behavior policies to agents by demonstrating the desired target policy. While the approach has led to many successes, IL often requires a large set of demonstrations to achieve robust learning, which can be expensive for the teacher. In this paper, we consider a novel approach to improve the learning efficiency of IL by providing a shaping reward function in addition to the usual demonstrations. Shaping rewards are numeric functions of states (and possibly actions) that are generally easy to specify and capture general principles of desired behavior, without necessarily completely specifying the behavior. Shaping rewards have been used extensively in reinforcement learning, but have seldom been considered for IL. Our main contribution is to propose an IL approach that learns from both shaping rewards and demonstrations. We demonstrate the effectiveness of the approach across several IL problems, even when the shaping reward is not fully consistent with the demonstrations.
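
One natural way to blend the two signals is a demonstration loss with a shaping-reward bonus, as in the sketch below. The cross-entropy-plus-bonus objective and the weight lam are purely illustrative assumptions, not the paper's objective.

    import numpy as np

    def combined_loss(policy_logits, demo_actions, shaping_reward, lam=0.1):
        """Sketch: cross-entropy against demonstrated actions, minus a
        bonus proportional to the shaping reward of the actions the
        policy prefers. Illustrative only.
        policy_logits : (T, A) unnormalized action scores per step
        demo_actions  : (T,) demonstrated action indices
        shaping_reward: (T, A) shaping reward per state-action pair"""
        z = policy_logits - policy_logits.max(axis=1, keepdims=True)
        probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)  # softmax
        demo_actions = np.asarray(demo_actions)
        T = len(demo_actions)
        imitation = -np.log(probs[np.arange(T), demo_actions]).mean()
        shaping = (probs * shaping_reward).sum(axis=1).mean()
        return imitation - lam * shaping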

IJCAI Conference 2011 Conference Paper

Continuous Correlated Beta Processes

  • Robby Goetschalckx
  • Pascal Poupart
  • Jesse Hoey

In this paper we consider a (possibly continuous) space of Bernoulli experiments. We assume that the Bernoulli distributions of the points are correlated. All evidence data comes in the form of successful or failed experiments at different points. Current state-of-the-art methods for expressing a distribution over a continuum of Bernoulli distributions use logistic Gaussian processes or Gaussian copula processes. However, both of these require computationally expensive matrix operations (cubic in the general case). We introduce a more intuitive approach, directly correlating beta distributions by sharing evidence between them according to a kernel function, an approach which has linear time complexity. The approach extends easily to multiple outcomes, giving a continuous correlated Dirichlet process, and can be used for classification (both binary and multi-class) and for learning the actual probabilities of the Bernoulli distributions. We show results for a number of data sets, as well as a case study where a mixture of continuous beta processes is used as part of an automated stroke rehabilitation system.
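
The evidence-sharing idea translates almost directly into code: each observed trial contributes to nearby query points in proportion to a kernel weight, giving linear time per query. In the sketch below, the squared-exponential kernel and the uniform Beta(1, 1) prior are illustrative assumptions.

    import numpy as np

    def correlated_beta_posterior(x_query, xs, outcomes, alpha0=1.0, beta0=1.0, bandwidth=1.0):
        """Posterior Beta parameters at x_query under kernel-weighted
        evidence sharing (sketch). Kernel and prior are assumptions.
        xs       : (n, d) experiment locations
        outcomes : (n,) 0/1 trial results"""
        outcomes = np.asarray(outcomes, dtype=float)
        # Squared-exponential kernel weights between query and evidence points.
        d2 = ((xs - x_query) ** 2).sum(axis=1)
        k = np.exp(-d2 / (2.0 * bandwidth ** 2))
        alpha = alpha0 + (k * outcomes).sum()          # weighted successes
        beta = beta0 + (k * (1.0 - outcomes)).sum()    # weighted failures
        return alpha, beta  # point estimate: alpha / (alpha + beta)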

IJCAI Conference 2009 Conference Paper

Bayesian Real-Time Dynamic Programming

  • Scott Sanner
  • Robby Goetschalckx
  • Kurt Driessens
  • Guy Shani

Real-time dynamic programming (RTDP) solves Markov decision processes (MDPs) when the initial state is restricted, by focusing dynamic programming on the envelope of states reachable from an initial state set. RTDP often provides performance guarantees without visiting the entire state space. Building on RTDP, recent work has sought to improve its efficiency through various optimizations, including maintaining upper and lower bounds to both govern trial termination and prioritize state exploration. In this work, we take a Bayesian perspective on these upper and lower bounds and use a value of perfect information (VPI) analysis to govern trial termination and exploration in a novel algorithm we call VPI-RTDP. VPI-RTDP leads to an improvement over state-of-the-art RTDP methods, empirically yielding up to a three-fold reduction in the amount of time and number of visited states required to achieve comparable policy performance.
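
To make the bound-driven trial structure concrete, here is a minimal sketch of one bounded-RTDP trial over a tabular MDP. The fixed gap threshold is a crude stand-in for the paper's value-of-perfect-information criterion, and the data layout is an assumption.

    import random

    def bounded_rtdp_trial(s0, A, P, R, gamma, V_up, V_lo, gap_tol=1e-2, max_depth=200):
        """One trial of bounded RTDP (sketch). V_up / V_lo are dicts of
        upper and lower value bounds; the trial stops when the bound gap
        at the current state is small, a simple stand-in for the paper's
        VPI-based termination test.
        A[s] -> list of actions; P[s][a] -> list of (prob, s') pairs;
        R[s][a] -> immediate reward."""
        def q(s, a, V):
            return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

        s = s0
        for _ in range(max_depth):
            if not A[s] or V_up[s] - V_lo[s] < gap_tol:
                break  # terminal state, or little left to learn here
            V_up[s] = max(q(s, a, V_up) for a in A[s])   # back up both bounds
            V_lo[s] = max(q(s, a, V_lo) for a in A[s])
            a = max(A[s], key=lambda a: q(s, a, V_up))   # act greedily on upper bound
            probs, succs = zip(*P[s][a])
            s = random.choices(succs, weights=probs)[0]  # sample next state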

ECAI Conference 2008 Conference Paper

Reinforcement Learning with the Use of Costly Features

  • Robby Goetschalckx
  • Scott Sanner
  • Kurt Driessens

A common solution approach to reinforcement learning problems with large state spaces (where value functions cannot be represented exactly) is to compute an approximation of the value function in terms of state features. However, little attention has been paid to the cost of computing these state features (e.g., search-based features). To this end, we introduce FOVEA, a cost-sensitive sparse linear value-function approximation algorithm, and demonstrate its performance on an experimental domain with a range of feature costs.
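
As a rough illustration of the cost-performance trade-off, the sketch below runs greedy forward selection and admits a feature only while its reduction in squared error exceeds a multiple of its computation cost. This is an illustrative stand-in, not the published FOVEA algorithm.

    import numpy as np

    def cost_sensitive_forward_selection(X, y, costs, lam=0.1):
        """Greedy sketch of cost-sensitive sparse linear regression:
        add a feature only while its error reduction justifies
        lam times its cost. No intercept, for brevity.
        X: (n, d) feature matrix, y: (n,) targets, costs: (d,) feature costs."""
        n, d = X.shape
        selected = []
        err_now = float((y ** 2).mean())  # error of the empty (zero) model
        while True:
            best = None
            for j in range(d):
                if j in selected:
                    continue
                cols = selected + [j]
                w, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
                err = float(((y - X[:, cols] @ w) ** 2).mean())
                gain = err_now - err
                # Keep the candidate only if its gain justifies its cost.
                if gain > lam * costs[j] and (best is None or gain > best[1]):
                    best = (j, gain, err)
            if best is None:
                return selected
            selected.append(best[0])
            err_now = best[2]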

EWRL Workshop 2008 Workshop Paper

Reinforcement Learning with the Use of Costly Features

  • Robby Goetschalckx
  • Scott Sanner
  • Kurt Driessens

In many practical reinforcement learning problems, the state space is too large to permit an exact representation of the value function, much less the time required to compute it. In such cases, a common solution approach is to compute an approximation of the value function in terms of state features. However, relatively little attention has been paid to the cost of computing these state features. For example, search-based features may be useful for value prediction, but their computational cost must be traded off with their impact on value accuracy. To this end, we introduce a new cost-sensitive sparse linear regression paradigm for value function approximation in reinforcement learning where the learner is able to select only those costly features that are sufficiently informative to justify their computation. We illustrate the learning behavior of our approach using a simple experimental domain that allows us to explore the effects of a range of costs on the cost-performance trade-off.