Author name cluster

Reinaldo A. C. Bianchi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
2 author rows

Possible papers


ECAI Conference 2012 Conference Paper

Heuristically Accelerated Reinforcement Learning: Theoretical and Experimental Results

  • Reinaldo A. C. Bianchi
  • Carlos H. C. Ribeiro
  • Anna Helena Reali Costa

Since finding control policies using Reinforcement Learning (RL) can be very time consuming, in recent years several authors have investigated how to speed up RL algorithms by making improved action selections based on heuristics. In this work we present new theoretical results (convergence and an upper bound on value estimation errors) for the class that encompasses all heuristics-based algorithms, called Heuristically Accelerated Reinforcement Learning. We also expand this new class by proposing three new algorithms, the Heuristically Accelerated Q(λ), SARSA(λ) and TD(λ), the first algorithms that use both heuristics and eligibility traces. Empirical evaluations were conducted in traditional control problems, and the results show that using heuristics significantly enhances the performance of the learning process.
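As a rough illustration of what "improved action selections based on heuristics" can look like, the sketch below biases an epsilon-greedy choice by adding a weighted heuristic term to each action value. The function name `heuristic_action`, the weight `xi`, and the list-based layout are assumptions made for this example; the abstract does not specify the selection rule, so this should not be read as the authors' exact formulation.

```python
import random


def heuristic_action(q_row, h_row, xi=1.0, epsilon=0.1, rng=random):
    """Pick an action index for a single state.

    q_row: learned action values for the state, one per action.
    h_row: heuristic values for the same actions (hypothetical input).
    xi:    assumed weight controlling how strongly the heuristic counts.
    """
    # Explore: with probability epsilon, pick a uniformly random action.
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    # Exploit: pick the action with the highest heuristic-biased score.
    scores = [q + xi * h for q, h in zip(q_row, h_row)]
    return max(range(len(scores)), key=scores.__getitem__)
```

With `xi=0` the rule reduces to plain epsilon-greedy selection, so the heuristic only changes which actions are tried, not the value estimates themselves.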

IJCAI Conference 2011 Conference Paper

Using Cases as Heuristics in Reinforcement Learning: A Transfer Learning Application

  • Luiz A. Celiberto Jr.
  • Jackson P. Matsuura
  • Ramon Lopez de Mantaras
  • Reinaldo A. C. Bianchi

In this paper we propose to combine three AI techniques to speed up a Reinforcement Learning algorithm in a Transfer Learning problem: Case-based Reasoning, Heuristically Accelerated Reinforcement Learning and Neural Networks. To do so, we propose a new algorithm, called L3, which works in three stages: in the first stage, it uses Reinforcement Learning to learn how to perform one task, and stores the optimal policy for this problem as a case base; in the second stage, it uses a Neural Network to map actions from one domain to actions in the other domain; and in the third stage, it uses the case base learned in the first stage as heuristics to speed up learning in a related, but different, task. The RL algorithm used in the first stage is Q-learning, and the one used in the third stage is the recently proposed Case-based Heuristically Accelerated Q-learning. A set of empirical evaluations was conducted on transferring learning between two domains, the Acrobot and the Robocup 3D: the policy learned while solving the Acrobot problem is transferred and used to speed up the learning of stability policies for a humanoid robot in the Robocup 3D simulator. The results show that the use of this algorithm can lead to a significant improvement in the performance of the agent.

ECAI Conference 2010 Conference Paper

Case-Based Multiagent Reinforcement Learning: Cases as Heuristics for Selection of Actions

  • Reinaldo A. C. Bianchi
  • Ramón López de Mántaras

This work presents a new approach that allows the use of cases in a case base as heuristics to speed up Multiagent Reinforcement Learning algorithms, combining Case-Based Reasoning (CBR) and Multiagent Reinforcement Learning (MRL) techniques. This approach, called Case-Based Heuristically Accelerated Multiagent Reinforcement Learning (CB-HAMRL), builds upon an emerging technique, Heuristically Accelerated Reinforcement Learning (HARL), in which RL methods are accelerated by making use of heuristic information. CB-HAMRL is a subset of MRL that makes use of a heuristic function ℋ derived from a case base, in a Case-Based Reasoning manner. An algorithm that incorporates CBR techniques into the Heuristically Accelerated Minimax-Q is also proposed, and a set of empirical evaluations was conducted in a simulator for Littman's robot soccer domain, comparing three solutions for this problem: MRL, HAMRL and CB-HAMRL. Experimental results show that, using CB-HAMRL, the agents learn faster than when using MRL or HAMRL methods.

ECAI Conference 2008 Conference Paper

Learning to Select Object Recognition Methods for Autonomous Mobile Robots

  • Reinaldo A. C. Bianchi
  • Arnau Ramisa
  • Ramón López de Mántaras

Selecting which algorithms should be used by a mobile robot computer vision system is a decision that is usually made a priori by the system developer, based on past experience and intuition. Such a decision does not systematically take into account information that can be found in the images and in the visual process itself to learn which algorithm should be used at execution time. This paper presents a method that uses Reinforcement Learning to decide which algorithm should be used to recognize objects seen by a mobile robot in an indoor environment, based on simple attributes extracted on-line from the images, such as mean intensity and intensity deviation. Two state-of-the-art object recognition algorithms can be selected: the constellation method proposed by Lowe together with its interest point detector and descriptor, the Scale-Invariant Feature Transform, and a bag-of-features approach. A set of empirical evaluations was conducted using a household mobile robots image database, and the results obtained show that the approach adopted here is very promising.
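A minimal sketch of the selection idea described in this abstract, under stated assumptions: the two image attributes it names (mean intensity and intensity deviation) are computed from a grayscale image, discretized into a coarse state, and used to look up the preferred recognition method in a table of learned values. The labels in `ALGORITHMS`, the function names, the binning scheme, and the dictionary-based table are all hypothetical and are not taken from the paper.

```python
from statistics import fmean, pstdev

# Hypothetical labels for the two recognition methods named in the abstract.
ALGORITHMS = ("constellation", "bag_of_features")


def image_state(pixels, bins=4, max_value=255):
    """Map a grayscale image (list of rows) to a discrete (mean, deviation) state."""
    flat = [p for row in pixels for p in row]
    mean = fmean(flat)    # mean intensity
    dev = pstdev(flat)    # intensity (population standard) deviation
    width = (max_value + 1) / bins
    # Clamp to the last bin so values at max_value stay in range.
    return (min(int(mean // width), bins - 1), min(int(dev // width), bins - 1))


def select_algorithm(q_table, state):
    """Return the method with the highest learned value for this state."""
    # Unseen states default to all-zero values, so the first method wins ties.
    values = q_table.get(state, [0.0] * len(ALGORITHMS))
    best = max(range(len(ALGORITHMS)), key=values.__getitem__)
    return ALGORITHMS[best]
```

For example, `select_algorithm(q_table, image_state(pixels))` would return one of the two labels for a given image; how the values in `q_table` are learned is left out of this sketch.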

IJCAI Conference 2007 Conference Paper

  • Reinaldo A. C. Bianchi
  • Carlos H. C. Ribeiro
  • Anna H. R. Costa

This work presents a new algorithm, called Heuristically Accelerated Minimax-Q (HAMMQ), that allows the use of heuristics to speed up the well-known Multiagent Reinforcement Learning algorithm Minimax-Q. The HAMMQ algorithm is characterised by a heuristic function that influences the choice of actions. This function is associated with a preference policy that indicates that a certain action must be taken instead of another. A set of empirical evaluations was conducted for the proposed algorithm in a simplified simulator for the robot soccer domain, and experimental results show that even very simple heuristics significantly enhance the performance of the multiagent reinforcement learning algorithm.