Author name cluster

Paul Ruvolo

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers

2 author rows

IJCAI Conference 2015 Conference Paper

Autonomous Cross-Domain Knowledge Transfer in Lifelong Policy Gradient Reinforcement Learning

Haitham Bou Ammar
Eric Eaton
Jose Marcio Luna
Paul Ruvolo

Online multi-task learning is an important capability for lifelong learning agents, enabling them to acquire models for diverse tasks over time and rapidly learn new tasks by building upon prior experience. However, recent progress toward lifelong reinforcement learning (RL) has been limited to learning from within a single task domain. For truly versatile lifelong learning, the agent must be able to autonomously transfer knowledge between different task domains. A few methods for cross-domain transfer have been developed, but these methods are computationally inefficient for scenarios where the agent must learn tasks consecutively. In this paper, we develop the first cross-domain lifelong RL framework. Our approach efficiently optimizes a shared repository of transferable knowledge and learns projection matrices that specialize that knowledge to different task domains. We provide rigorous theoretical guarantees on the stability of this approach, and empirically evaluate its performance on diverse dynamical systems. Our results show that the proposed method can learn effectively from interleaved task domains and rapidly acquire high performance in new domains.


AAAI Conference 2015 Conference Paper

Unsupervised Cross-Domain Transfer in Policy Gradient Reinforcement Learning via Manifold Alignment

Haitham Bou Ammar
Eric Eaton
Paul Ruvolo
Matthew Taylor

The success of applying policy gradient reinforcement learning (RL) to difficult control tasks hinges crucially on the ability to determine a sensible initialization for the policy. Transfer learning methods tackle this problem by reusing knowledge gleaned from solving other related tasks. In the case of multiple task domains, these algorithms require an inter-task mapping to facilitate knowledge transfer across domains. However, there are currently no general methods to learn an inter-task mapping without requiring either background knowledge that is not typically present in RL settings, or an expensive analysis of an exponential number of inter-task mappings in the size of the state and action spaces. This paper introduces an autonomous framework that uses unsupervised manifold alignment to learn inter-task mappings and effectively transfer samples between different task domains. Empirical results on diverse dynamical systems, including an application to quadrotor control, demonstrate its effectiveness for cross-domain transfer in the context of policy gradient RL.


ICML Conference 2014 Conference Paper

Online Multi-Task Learning for Policy Gradient Methods

Haitham Bou-Ammar
Eric Eaton
Paul Ruvolo
Matthew E. Taylor

Policy gradient algorithms have shown considerable recent success in solving high-dimensional sequential decision making tasks, particularly in robotics. However, these methods often require extensive experience in a domain to achieve high performance. To make agents more sample-efficient, we developed a multi-task policy gradient method to learn decision making tasks consecutively, transferring knowledge between tasks to accelerate learning. Our approach provides robust theoretical guarantees, and we show empirically that it dramatically accelerates learning on a variety of dynamical systems, including an application to quadrotor control.


AAAI Conference 2014 Conference Paper

Online Multi-Task Learning via Sparse Dictionary Optimization

Paul Ruvolo
Eric Eaton

This paper develops an efficient online algorithm for learning multiple consecutive tasks based on the K-SVD algorithm for sparse dictionary optimization. We first derive a batch multi-task learning method that builds upon K-SVD, and then extend the batch algorithm to train models online in a lifelong learning setting. The resulting method has lower computational complexity than other current lifelong learning algorithms while maintaining nearly identical model performance. Additionally, the proposed method offers an alternate formulation for lifelong learning that supports both task and feature similarity matrices.


AAAI Conference 2013 Conference Paper

Active Task Selection for Lifelong Machine Learning

Paul Ruvolo
Eric Eaton

In a lifelong learning framework, an agent acquires knowledge incrementally over consecutive learning tasks, continually building upon its experience. Recent lifelong learning algorithms have achieved nearly identical performance to batch multi-task learning methods while reducing learning time by three orders of magnitude. In this paper, we further improve the scalability of lifelong learning by developing curriculum selection methods that enable an agent to actively select the next task to learn in order to maximize performance on future learning tasks. We demonstrate that active task selection is highly reliable and effective, allowing an agent to learn high performance models using up to 50% fewer tasks than when the agent has no control over the task order. We also explore a variant of transfer learning in the lifelong learning setting in which the agent can focus knowledge acquisition toward a particular target task.


ICML Conference 2013 Conference Paper

ELLA: An Efficient Lifelong Learning Algorithm

Paul Ruvolo
Eric Eaton

The problem of learning multiple consecutive tasks, known as lifelong learning, is of great importance to the creation of intelligent, general-purpose, and flexible machines. In this paper, we develop a method for online multi-task learning in the lifelong learning setting. The proposed Efficient Lifelong Learning Algorithm (ELLA) maintains a sparsely shared basis for all task models, transfers knowledge from the basis to learn each new task, and refines the basis over time to maximize performance across all tasks. We show that ELLA has strong connections to both online dictionary learning for sparse coding and state-of-the-art batch multi-task learning methods, and provide robust theoretical performance guarantees. We show empirically that ELLA yields nearly identical performance to batch multi-task learning while learning tasks sequentially in three orders of magnitude (over 1,000x) less time.


IROS Conference 2012 Conference Paper

Control by Gradient Collocation: Applications to optimal obstacle avoidance and minimum torque control

Paul Ruvolo
Tingfan Wu
Javier R. Movellan

We present a new machine learning algorithm for learning optimal feedback control policies to guide a robot to a goal in the presence of obstacles. Our method works by first reducing the problem of obstacle avoidance to a continuous state, action, and time control problem, and then uses efficient collocation methods to solve for an optimal feedback control policy. This formulation of the obstacle avoidance problem improves over standard approaches, such as potential field methods, by being resistant to local minima, allowing for moving obstacles, handling stochastic systems, and computing feedback control strategies that take into account the robot's (possibly non-linear) dynamics. In addition to contributing a new method for obstacle avoidance, our work contributes to the state-of-the-art in collocation methods for non-linear stochastic optimal control problems in two important ways: (1) we show that taking into account local gradient and second-order derivative information of the optimal value function at the collocation points allows us to exploit knowledge of the derivative information about the system dynamics, and (2) we show that computational savings can be achieved by directly fitting the gradient of the optimal value function rather than the optimal value function itself. We validate our approach on three problems: non-convex obstacle avoidance of a point-mass robot, obstacle avoidance for a 2 degree of freedom robotic manipulator, and optimal control of a non-linear dynamical system.


NeurIPS Conference 2010 Conference Paper

An Alternative to Low-level-Synchrony-Based Methods for Speech Detection

Javier Movellan
Paul Ruvolo

Determining whether someone is talking has applications in many areas such as speech recognition, speaker diarization, social robotics, facial expression recognition, and human-computer interaction. One popular approach to this problem is audio-visual synchrony detection. A candidate speaker is deemed to be talking if the visual signal around that speaker correlates with the auditory signal. Here we show that with the proper visual features (in this case movements of various facial muscle groups), a very accurate detector of speech can be created that does not use the audio signal at all. Further we show that this person-independent visual-only detector can be used to train very accurate audio-based person-dependent voice models. The voice model has the advantage of being able to identify when a particular person is speaking even when they are not visible to the camera (e.g., in the case of a mobile robot). Moreover, we show that a simple sensory fusion scheme between the auditory and visual models improves performance on the task of talking detection. The work here provides dramatic evidence about the efficacy of two very different approaches to multimodal speech detection on a challenging database.


IROS Conference 2010 Conference Paper

Approaches and databases for online calibration of binaural sound localization for robotic heads

Holger Finger
Shih-Chii Liu
Paul Ruvolo
Javier R. Movellan

In this paper, we evaluate adaptive sound localization algorithms for robotic heads. To this end we built a 3 degree-of-freedom head with two microphones encased in artificial pinnae (outer ears). The geometry of the head and pinnae induces temporal differences in the sound recorded at each microphone. These differences change with the frequency of the sound, location of the sound, and orientation of the robot in a complex manner. To learn the relationship between these auditory differences and the location of a sound source, we applied machine learning methods to a database of different audio source locations and robot head orientations. Our approach achieves a mean error of 2.5 degrees for azimuth and 11 degrees for elevation for estimating the position of an audio source. The impressive results highlight the benefits of a two-stage regression model to make use of the properties of the artificial pinnae for elevation estimation. In this work, the algorithms were trained using ground truth data provided by a motion capture system. We are currently generalizing the approach so that the training signal is provided online based on a real-time face detection and speech detection system.


NeurIPS Conference 2009 Conference Paper

Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise

Jacob Whitehill
Ting-Fan Wu
Jacob Bergsma
Javier Movellan
Paul Ruvolo

Modern machine learning-based approaches to computer vision require very large databases of labeled images. Some contemporary vision systems already require on the order of millions of images for training (e.g., Omron face detector). While the collection of these large databases is becoming a bottleneck, new Internet-based services that allow labelers from around the world to be easily hired and managed provide a promising solution. However, using these services to label large databases brings with it new theoretical and practical challenges: (1) the labelers may have wide-ranging levels of expertise which are unknown a priori, and in some cases may be adversarial; (2) images may vary in their level of difficulty; and (3) multiple labels for the same image must be combined to provide an estimate of the actual label of the image. Probabilistic approaches provide a principled way to approach these problems. In this paper we present a probabilistic model and use it to simultaneously infer the label of each image, the expertise of each labeler, and the difficulty of each image. On both simulated and real data, we demonstrate that the model outperforms the commonly used "Majority Vote" heuristic for inferring image labels, and is robust to both adversarial and noisy labelers.


ICRA Conference 2008 Conference Paper

Auditory mood detection for social and educational robots

Paul Ruvolo
Ian R. Fasel
Javier R. Movellan

Social robots face the fundamental challenge of detecting and adapting their behavior to the current social mood. For example, robots that assist teachers in early education must choose different behaviors depending on whether the children are crying, laughing, sleeping, or singing songs. Interactive robotic applications require perceptual algorithms that both run in real time and are adaptable to the challenging conditions of daily life. This paper explores a novel approach to auditory mood detection which was born out of our experience immersing social robots in classroom environments. We propose a new set of low-level spectral contrast features that extends a class of features which have proven very successful for object recognition in the modern computer vision literature. Features are selected and combined using machine learning approaches so as to make decisions about the ongoing auditory mood. We demonstrate excellent performance on two standard emotional speech databases (the Berlin Emotional Speech [W. Burkhardt et al., 2005], and the ORATOR dataset [H. Quast, 2001]). In addition we establish strong baseline performance for mood detection on a database collected from a social robot immersed in a classroom of 18-24 month old children [J. Movellan et al., 2007]. This approach operates in real time at little computational cost. It has the potential to greatly enhance the effectiveness of social robots in daily life environments.


NeurIPS Conference 2008 Conference Paper

Optimization on a Budget: A Reinforcement Learning Approach

Paul Ruvolo
Ian Fasel
Javier Movellan

Many popular optimization algorithms, like the Levenberg-Marquardt algorithm (LMA), use heuristic-based "controllers" that modulate the behavior of the optimizer during the optimization process. For example, in the LMA a damping parameter is dynamically modified based on a set of rules that were developed using various heuristic arguments. Reinforcement learning (RL) is a machine learning approach to learn optimal controllers by examples and thus is an obvious candidate to improve the heuristic-based controllers implicit in the most popular and heavily used optimization algorithms. Improving the performance of off-the-shelf optimizers is particularly important for time-constrained optimization problems. For example, the LMA algorithm has become popular for many real-time computer vision problems, including object tracking from video, where only a small amount of time can be allocated to the optimizer on each incoming video frame. Here we show that a popular modern reinforcement learning technique using a very simple state space can dramatically improve the performance of general purpose optimizers, like the LMA. Most surprisingly, the controllers learned for a particular domain appear to work very well also on very different optimization domains. For example, we used RL methods to train a new controller for the damping parameter of the LMA. This controller was trained on a collection of classic, relatively small, non-linear regression problems. The modified LMA performed better than the standard LMA on these problems. Most surprisingly, it also dramatically outperformed the standard LMA on a difficult large scale computer vision problem for which it had not been trained before. Thus the controller appeared to have extracted control rules that were not just domain specific but generalized across a wide range of optimization domains.
