Arrow Research

Author name cluster

Brahim Chaib-draa

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

48 papers
2 author rows

Possible papers

48

IJCAI Conference 2022 Conference Paper

Continual Semantic Segmentation Leveraging Image-level Labels and Rehearsal

  • Mathieu Pagé Fortin
  • Brahim Chaib-draa

Despite the remarkable progress of deep learning models for semantic segmentation, the success of these models is strongly limited by the following aspects: 1) large datasets with pixel-level annotations must be available and 2) training must be performed with all classes simultaneously. Indeed, in incremental learning scenarios, where new classes are added to an existing framework, these models are prone to catastrophic forgetting of previous classes. To address these two limitations, we propose a weakly-supervised mechanism for continual semantic segmentation that can leverage cheap image-level annotations and a novel rehearsal strategy that intertwines the learning of past and new classes. Specifically, we explore two rehearsal technique variants: 1) imprinting past objects on new images and 2) transferring past representations in intermediate feature maps. We conduct extensive experiments on Pascal-VOC by varying the proportion of fully- and weakly-supervised data in various setups and show that our contributions consistently improve the mIoU on both past and novel classes. Interestingly, we also observe that models trained with less data in incremental steps sometimes outperform the same architectures trained with more data. We discuss the significance of these results and propose some hypotheses regarding the dynamics between forgetting and learning.
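
The first rehearsal variant lends itself to a compact illustration. Below is a minimal numpy sketch of imprinting a stored past-class object onto a new training image and its label mask; the function name, array layout, and buffer format are illustrative assumptions, not the authors' code.

```python
import numpy as np

def imprint_object(image, label, obj_pixels, obj_mask, obj_class, top_left):
    """Paste a buffered past-class object onto a new training image.

    image:      (H, W, 3) uint8 new incremental-step image.
    label:      (H, W) int pixel-label map for `image`.
    obj_pixels: (h, w, 3) object crop stored in the rehearsal buffer.
    obj_mask:   (h, w) bool mask of the object inside the crop.
    obj_class:  integer id of the past class.
    top_left:   (row, col) paste position; the crop must fit in the image.
    """
    r, c = top_left
    h, w = obj_mask.shape
    image[r:r + h, c:c + w][obj_mask] = obj_pixels[obj_mask]  # imprint pixels
    label[r:r + h, c:c + w][obj_mask] = obj_class             # extend ground truth
    return image, label

# toy usage: imprint a 4x4 "object" of class 7 onto a blank image
img, lab = np.zeros((16, 16, 3), np.uint8), np.zeros((16, 16), int)
crop, mask = np.full((4, 4, 3), 255, np.uint8), np.ones((4, 4), bool)
imprint_object(img, lab, crop, mask, obj_class=7, top_left=(2, 3))
print(lab[2:6, 3:7])
```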

AAAI Conference 2021 Conference Paper

Multi-task Learning by Leveraging the Semantic Information

  • Fan Zhou
  • Brahim Chaib-draa
  • Boyu Wang

One crucial objective of multi-task learning is to align distributions across tasks so that the information between them can be transferred and shared. However, existing approaches have focused only on matching the marginal feature distribution while ignoring the semantic information, which may hinder the learning performance. To address this issue, we propose to leverage the label information in multi-task learning by exploring the semantic conditional relations among tasks. We first theoretically analyze the generalization bound of multi-task learning based on the notion of Jensen-Shannon divergence, which provides new insights into the value of label information in multi-task learning. Our analysis also leads to a concrete algorithm that jointly matches the semantic distribution and controls label distribution divergence. To confirm the effectiveness of the proposed method, we first compare the algorithm with several baselines on some benchmarks and then test the algorithms under label space shift conditions. Empirical results demonstrate that the proposed method outperforms most baselines and achieves state-of-the-art performance, particularly under label shift conditions.
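
For readers unfamiliar with the divergence the bound is stated in, here is a minimal Python sketch of the Jensen-Shannon divergence between two discrete label distributions; it is a generic textbook computation, not the paper's algorithm.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions (in nats)."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)                                     # the mixture midpoint
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))    # KL divergence
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# e.g. the label distributions of two tasks
print(js_divergence([0.7, 0.2, 0.1], [0.3, 0.4, 0.3]))
```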

IROS Conference 2019 Conference Paper

GQ-STN: Optimizing One-Shot Grasp Detection based on Robustness Classifier

  • Alexandre Gariépy
  • Jean-Christophe Ruel
  • Brahim Chaib-draa
  • Philippe Giguère

Grasping is a fundamental robotic task needed for the deployment of household robots or furthering warehouse automation. However, few approaches are able to perform grasp detection in real time (frame rate). To this effect, we present Grasp Quality Spatial Transformer Network (GQ-STN), a one-shot grasp detection network. Being based on the Spatial Transformer Network (STN), it produces not only a grasp configuration, but also directly outputs a depth image centered at this configuration. By connecting our architecture to an externally-trained grasp robustness evaluation network, we can train efficiently to satisfy a robustness metric via the backpropagation of the gradient emanating from the evaluation network. This removes the difficulty of training detection networks on sparsely annotated databases, a common issue in grasping. We further propose to use this robustness classifier to compare approaches, being more reliable than the traditional rectangle metric. Our GQ-STN is able to detect robust grasps on the depth images of the Dex-Net 2.0 dataset with 92.4% accuracy in a single pass of the network. We finally demonstrate in a physical benchmark that our method can propose robust grasps more often than previous sampling-based methods, while being more than 60 times faster.
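
The training trick described, backpropagating through an externally-trained, frozen robustness network to supervise the detector, can be sketched in a few lines of PyTorch; the tiny stand-in networks below are illustrative assumptions, not the GQ-STN architecture.

```python
import torch
import torch.nn as nn

# Tiny stand-ins: the real detector (GQ-STN) and robustness evaluator are CNNs.
detector = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 5))  # -> grasp configuration
evaluator = nn.Sequential(nn.Linear(5, 1), nn.Sigmoid())       # -> robustness in [0, 1]
for p in evaluator.parameters():
    p.requires_grad_(False)  # frozen: gradients flow *through* it, it is never updated

opt = torch.optim.Adam(detector.parameters(), lr=1e-4)
depth = torch.rand(8, 1, 64, 64)  # a batch of depth images

grasp = detector(depth)
loss = -evaluator(grasp).mean()   # train the detector to maximize predicted robustness
opt.zero_grad()
loss.backward()
opt.step()
```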

IJCAI Conference 2018 Conference Paper

Generative Adversarial Positive-Unlabelled Learning

  • Ming Hou
  • Brahim Chaib-draa
  • Chao Li
  • Qibin Zhao

In this work, we consider the task of classifying binary positive-unlabeled (PU) data. Existing discriminative-learning-based PU models attempt to seek an optimal reweighting strategy for U data so that a decent decision boundary can be found. However, given limited P data, conventional PU models tend to suffer from overfitting when adapted to very flexible deep neural networks. In contrast, we are the first to propose a totally new paradigm to attack the binary PU task from the perspective of generative learning, by leveraging the powerful generative adversarial networks (GAN). Our generative positive-unlabeled (GenPU) framework incorporates an array of discriminators and generators that are endowed with different roles in simultaneously producing positive and negative realistic samples. We provide theoretical analysis to justify that, at equilibrium, GenPU is capable of recovering both positive and negative data distributions. Moreover, we show GenPU is generalizable and closely related to semi-supervised classification. Given rather limited P data, experiments on both synthetic and real-world datasets demonstrate the effectiveness of our proposed framework. With infinite realistic and diverse sample streams generated from GenPU, a very flexible classifier can then be trained using deep neural networks.

IJCAI Conference 2017 Conference Paper

Fast Recursive Low-rank Tensor Learning for Regression

  • Ming Hou
  • Brahim Chaib-draa

In this work, we develop a fast sequential low-rank tensor regression framework, namely recursive higher-order partial least squares (RHOPLS). It addresses the great challenges posed by the limited storage space and fast processing time required by dynamic environments when dealing with large-scale, high-speed, general tensor sequences. By smartly integrating a low-rank modification strategy of Tucker into a PLS-based framework, we efficiently update the regression coefficients by effectively merging the new data into the previous low-rank approximation of the model at the small-scale factor (feature) level instead of the large raw-data (observation) level. Unlike batch models, which require accessing the entire data, RHOPLS conducts a blockwise recursive calculation scheme and thus only a small set of factors needs to be stored. Our approach is orders of magnitude faster than all other methods while maintaining predictability highly comparable to that of the cutting-edge batch methods, as verified on challenging real-life tasks.

AAAI Conference 2016 Conference Paper

Common and Discriminative Subspace Kernel-Based Multiblock Tensor Partial Least Squares Regression

  • Ming Hou
  • Qibin Zhao
  • Brahim Chaib-draa
  • Andrzej Cichocki

In this work, we introduce a new generalized nonlinear tensor regression framework called kernel-based multiblock tensor partial least squares (KMTPLS) for predicting a set of dependent tensor blocks from a set of independent tensor blocks through the extraction of a small number of common and discriminative latent components. By considering both common and discriminative features, KMTPLS effectively fuses the information from multiple tensorial data sources and unifies the single and multiblock tensor regression scenarios into one general model. Moreover, in contrast to multilinear models, KMTPLS successfully addresses the nonlinear dependencies between multiple response and predictor tensor blocks by combining kernel machines with joint Tucker decomposition, resulting in a significant performance gain in terms of predictability. An efficient learning algorithm for KMTPLS based on sequentially extracting common and discriminative latent vectors is also presented. Finally, to show the effectiveness and advantages of our approach, we test it on a real-life regression task in computer vision, i.e., reconstruction of human pose from multiview video sequences.

IROS Conference 2015 Conference Paper

Learning terrain types with the Pitman-Yor process mixtures of Gaussians for a legged robot

  • Patrick Dallaire
  • Krzysztof Walas
  • Philippe Giguère
  • Brahim Chaib-draa

One of the major goals for mobile robots is to be able to traverse any kind of terrain. A possible way to achieve this goal is through the use of legged robots, as they have increased mobility. However, this would require them to be able to modify their gaits based on the identification of the terrain that they are currently traversing. In this paper, we introduce a number of novel methods to address this issue of autonomous terrain classification and clustering, based on tactile data collected with a walking robot. The proposed learning methods are based on the Pitman-Yor process mixture of Gaussians, a Bayesian nonparametric prior well-suited for density estimation. This model is initially used to learn the non-Gaussian distribution of the features produced from proprioceptive (force/torque) signals from the legs, registered during the interaction of one robot foot with a terrain. Then, we exploit its capacity for clustering and discovering structures in the data to identify terrains in the feature space. Experiments were conducted on a six-legged robot, demonstrating the applicability of the Pitman-Yor process mixture of Gaussians for terrain identification. In particular, we obtained a classification success rate of 82% with our supervised learning approach and 51% accuracy with our unsupervised one.
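
The clustering behaviour of the Pitman-Yor prior can be illustrated with its Chinese-restaurant-process form. The sketch below samples cluster assignments under the standard Pitman-Yor seating rule; it shows only the nonparametric prior, not the paper's full mixture of Gaussians over force/torque features.

```python
import numpy as np

def pitman_yor_crp(n, alpha=1.0, d=0.5, seed=0):
    """Sample cluster (table) assignments from the Pitman-Yor Chinese
    restaurant process: a customer joins existing table k with probability
    proportional to (n_k - d), or opens a new table with probability
    proportional to (alpha + d * K), K being the current number of tables."""
    rng = np.random.default_rng(seed)
    counts, labels = [], []
    for _ in range(n):
        K = len(counts)
        probs = np.array([c - d for c in counts] + [alpha + d * K])
        probs /= probs.sum()
        k = int(rng.choice(K + 1, p=probs))
        if k == K:            # open a new table (a new terrain cluster)
            counts.append(0)
        counts[k] += 1
        labels.append(k)
    return labels

print(pitman_yor_crp(20))  # e.g. [0, 0, 1, 0, 2, ...]
```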

UAI Conference 2014 Conference Paper

Bayesian Filtering with Online Gaussian Process Latent Variable Models

  • Yali Wang
  • Marcus A. Brubaker
  • Brahim Chaib-draa
  • Raquel Urtasun

In this paper we present a novel non-parametric approach to Bayesian filtering, where the prediction and observation models are learned in an online fashion. Our approach is able to handle multimodal distributions over both models by employing a mixture model representation with Gaussian Processes (GP) based components. To cope with the increasing complexity of the estimation process, we explore two computationally efficient GP variants, sparse online GP and local GP, which help to manage computation requirements for each mixture component. Our experiments demonstrate that our approach can track human motion much more accurately than existing approaches that learn the prediction and observation models offline and do not update these models with the incoming data stream.

AAAI Conference 2014 Conference Paper

Learning the Structure of Probabilistic Graphical Models with an Extended Cascading Indian Buffet Process

  • Patrick Dallaire
  • Philippe Giguère
  • Brahim Chaib-draa

This paper presents an extension of the cascading Indian buffet process (CIBP) intended to learn arbitrary directed acyclic graph structures, as opposed to the CIBP, which is limited to purely layered structures. The extended cascading Indian buffet process (eCIBP) essentially consists in adding an extra sampling step to the CIBP to generate connections between non-consecutive layers. In the context of graphical model structure learning, the proposed approach allows learning structures having an unbounded number of hidden random variables and automatically selecting the model complexity. We evaluated the extended process on multivariate density estimation and structure identification tasks by measuring the structure complexity and predictive performance. The results suggest the extension leads to extracting simpler graphs without sacrificing predictive precision.

IJCAI Conference 2013 Conference Paper

A KNN Based Kalman Filter Gaussian Process Regression

  • Yali Wang
  • Brahim Chaib-draa

The standard Gaussian process (GP) regression is often intractable when a data set is large or spatially nonstationary. In this paper, we address these challenging data properties by designing a novel K nearest neighbor based Kalman filter Gaussian process (KNN-KFGP) regression. Based on a state space model established by the KNN driven data grouping, our KNN-KFGP recursively filters out the latent function values in a computationally efficient and accurate Kalman filtering framework. Moreover, KNN allows each test point to find its strongly correlated local training subset, so our KNN-KFGP provides a suitable way to deal with spatial nonstationary problems. We evaluate the performance of our KNN-KFGP on several synthetic and real data sets to show its validity.
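
The KNN-localization ingredient is easy to sketch in isolation: each test point is predicted from a GP posterior fit only to its K nearest training points. The numpy code below shows that step under standard RBF-kernel assumptions; the paper's state-space model and Kalman-filter recursion are omitted, and all names are illustrative.

```python
import numpy as np

def rbf(a, b, ls=1.0, var=1.0):
    """Squared-exponential kernel between row-stacked points."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return var * np.exp(-0.5 * d2 / ls ** 2)

def knn_gp_predict(X, y, x_star, k=20, noise=0.1):
    """GP posterior mean/variance at x_star using only its k nearest neighbors."""
    idx = np.argsort(np.sum((X - x_star) ** 2, axis=1))[:k]  # local training subset
    Xk, yk = X[idx], y[idx]
    K = rbf(Xk, Xk) + noise ** 2 * np.eye(k)
    ks = rbf(Xk, x_star[None, :])                            # (k, 1) cross-covariance
    mean = ks.T @ np.linalg.solve(K, yk)
    var = rbf(x_star[None, :], x_star[None, :]) - ks.T @ np.linalg.solve(K, ks)
    return mean.item(), var.item()

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 1))
y = np.sin(4 * X[:, 0]) + 0.1 * rng.standard_normal(500)
print(knn_gp_predict(X, y, np.array([0.5])))
```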

KER Journal 2013 Journal Article

Repeated games for multiagent systems: a survey

  • Andriy Burkov
  • Brahim Chaib-draa

Repeated games are an important mathematical formalism to model and study long-term economic interactions between multiple self-interested parties (individuals or groups of individuals). They open attractive perspectives in modeling long-term multiagent interactions. This overview paper discusses the most important results that currently exist for repeated games. These results arise from both economics and computer science. Contrary to a number of existing surveys of repeated games, most of which originated from the economic research community, we are the first to pay special attention to a number of important distinctive features proper to artificial agents. More precisely, artificial agents, as opposed to the human agents mainly targeted by economic research, are usually bounded, whether in terms of memory or performance. Therefore, their decisions have to be based on strategies defined using finite representations. Furthermore, these strategies have to be efficiently computed or approximated using the limited computational resources usually available to artificial agents.

NeurIPS Conference 2012 Conference Paper

A Marginalized Particle Gaussian Process Regression

  • Yali Wang
  • Brahim Chaib-draa

We present a novel marginalized particle Gaussian process (MPGP) regression, which provides a fast, accurate online Bayesian filtering framework to model the latent function. Using a state space model established by the data construction procedure, our MPGP recursively filters out the estimation of hidden function values by a Gaussian mixture. Meanwhile, it provides a new online method for training hyperparameters with a number of weighted particles. We demonstrate the estimation performance of our MPGP on both simulated and real large data sets. The results show that our MPGP is a robust estimation algorithm with high computational efficiency, which outperforms other state-of-the-art sparse GP methods.

ICRA Conference 2012 Conference Paper

An adaptive nonparametric particle filter for state estimation

  • Yali Wang
  • Brahim Chaib-draa

The particle filter is one of the most widely applied stochastic sampling tools for state estimation problems in practice. However, the proposal distribution in the traditional particle filter is the transition probability based on the state equation, which can heavily affect estimation performance because the samples are drawn blindly, without considering the current observation information. Additionally, the fixed particle number in the typical particle filter leads to wasteful computation, especially when the posterior distribution varies greatly over time. In this paper, an advanced adaptive nonparametric particle filter is proposed by incorporating a Gaussian process based proposal distribution into the KLD-Sampling particle filter framework, so that high-quality particles, in a quantity set adaptively by the KLD bound, are drawn from the learned proposal with observation information at each time step to improve approximation accuracy and efficiency. Our state estimation experiments on a univariate nonstationary growth model and a two-link robot arm show that the adaptive nonparametric particle filter outperforms the existing approaches with a smaller number of particles.
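
The adaptive particle count comes from the KLD-Sampling bound of Fox (2003), which the paper builds on. A minimal Python version of that bound is below; the GP-learned proposal distribution itself is not shown.

```python
from statistics import NormalDist

def kld_sample_size(k_bins, epsilon=0.05, delta=0.01):
    """Fox's KLD-Sampling bound on the number of particles needed so that,
    with probability 1 - delta, the KL divergence between the particle-based
    posterior (spread over k_bins occupied histogram bins) and the true
    posterior stays below epsilon."""
    if k_bins < 2:
        return 1
    z = NormalDist().inv_cdf(1.0 - delta)   # upper quantile of N(0, 1)
    a = 2.0 / (9.0 * (k_bins - 1))
    n = (k_bins - 1) / (2.0 * epsilon) * (1.0 - a + (a ** 0.5) * z) ** 3
    return int(n) + 1

# more occupied bins (a more spread-out posterior) -> more particles
for k in (2, 10, 100):
    print(k, kld_sample_size(k))
```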

JMLR Journal 2011 Journal Article

A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes

  • Stéphane Ross
  • Joelle Pineau
  • Brahim Chaib-draa
  • Pierre Kreitmann

Bayesian learning methods have recently been shown to provide an elegant solution to the exploration-exploitation trade-off in reinforcement learning. However, most investigations of Bayesian reinforcement learning to date focus on the standard Markov Decision Processes (MDPs). The primary focus of this paper is to extend these ideas to the case of partially observable domains, by introducing the Bayes-Adaptive Partially Observable Markov Decision Processes. This new framework can be used to simultaneously (1) learn a model of the POMDP domain through interaction with the environment, (2) track the state of the system under partial observability, and (3) plan (near-)optimal sequences of actions. An important contribution of this paper is to provide theoretical results showing how the model can be finitely approximated while preserving good learning performance. We present approximate algorithms for belief tracking and planning in this model, as well as empirical results that illustrate how the model estimate and agent's return improve as a function of experience.
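
At the heart of the Bayes-Adaptive POMDP is a belief over both the hidden state and Dirichlet counts for the transition and observation models. The numpy sketch below shows one such belief update under the expected-model (Dirichlet posterior mean) view; the representation (weighted hypothesis lists, count arrays) is an illustrative simplification, not the paper's algorithms.

```python
import numpy as np

def update_belief(belief, a, z, eps=1e-12):
    """One sketched Bayes-Adaptive POMDP belief update after action a, obs z.

    belief: list of [weight, s, phi, psi] hypotheses, where
      phi[s, a, s2] are Dirichlet transition counts and
      psi[s2, a, z] are Dirichlet observation counts;
    model probabilities are taken as the normalized counts.
    """
    new = []
    for w, s, phi, psi in belief:
        for s2 in range(phi.shape[2]):
            p_t = phi[s, a, s2] / phi[s, a].sum()   # expected transition prob.
            p_o = psi[s2, a, z] / psi[s2, a].sum()  # expected observation prob.
            w2 = w * p_t * p_o
            if w2 < eps:
                continue
            phi2, psi2 = phi.copy(), psi.copy()
            phi2[s, a, s2] += 1   # learning: bump the counts of the
            psi2[s2, a, z] += 1   # transition/observation we hypothesize
            new.append([w2, s2, phi2, psi2])
    total = sum(h[0] for h in new)
    return [[w / total, s, phi, psi] for w, s, phi, psi in new]

# two states, two actions, two observations, uniform Dirichlet priors
S, A, Z = 2, 2, 2
phi0, psi0 = np.ones((S, A, S)), np.ones((S, A, Z))
b = update_belief([[1.0, 0, phi0, psi0]], a=0, z=1)
print([(round(w, 3), s) for w, s, _, _ in b])
```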

AAMAS Conference 2011 Conference Paper

Toward Error-Bounded Algorithms for Infinite-Horizon DEC-POMDPs

  • Jilles S. Dibangoye
  • Abdel-Illah Mouaddib
  • Brahim Chaib-draa

Over the past few years, attempts to scale up infinite-horizon DEC-POMDPs are mainly due to approximate algorithms, but without the theoretical guarantees of their exact counterparts. In contrast, ε-optimal methods have only theoretical significance but are not efficient in practice. In this paper, we introduce an algorithmic framework (β-PI) that exploits the scalability of the former while preserving the theoretical properties of the latter. We build upon β-PI a family of approximate algorithms that can find (provably) error-bounded solutions in reasonable time. Among this family, H-PI uses a branch-and-bound search method that computes a near-optimal solution over distributions over histories experienced by the agents. These distributions often lie near a structured, low-dimensional subspace embedded in the high-dimensional sufficient statistic. By planning only on this subspace, H-PI successfully solves all tested benchmarks, outperforming standard algorithms, both in solution time and policy quality.

AAAI Conference 2010 Conference Paper

An Approximate Subgame-Perfect Equilibrium Computation Technique for Repeated Games

  • Andriy Burkov
  • Brahim Chaib-draa

This paper presents a technique for approximating, up to any precision, the set of subgame-perfect equilibria (SPE) in repeated games with discounting. The process starts with a single hypercube approximation of the set of SPE payoff profiles. Then the initial hypercube is gradually partitioned into a set of smaller adjacent hypercubes, while those hypercubes that cannot contain any SPE point are gradually withdrawn. Whether a given hypercube can contain an equilibrium point is verified by an appropriate mixed integer program. Special attention is paid to the question of extracting players' strategies and their representability in the form of finite automata.

ICRA Conference 2010 Conference Paper

Apprenticeship learning via soft local homomorphisms

  • Abdeslam Boularias
  • Brahim Chaib-draa

We consider the problem of apprenticeship learning when the expert's demonstration covers only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient solution to this problem based on the assumption that the expert is acting optimally in a Markov Decision Process (MDP). However, past work on IRL requires an accurate estimate of the frequency of encountering each feature of the states when the robot follows the expert's policy. Given that the complete policy of the expert is unknown, the feature frequencies can only be empirically estimated from the demonstrated trajectories. In this paper, we propose to use a transfer method, known as soft homomorphism, in order to generalize the expert's policy to unvisited regions of the state space. The generalized policy can be used either as the robot's final policy, or to calculate the feature frequencies within an IRL algorithm. Empirical results show that our approach is able to learn good policies from a small number of demonstrations.

NeurIPS Conference 2010 Conference Paper

Bootstrapping Apprenticeship Learning

  • Abdeslam Boularias
  • Brahim Chaib-draa

We consider the problem of apprenticeship learning where the examples, demonstrated by an expert, cover only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient tool for generalizing the demonstration, based on the assumption that the expert is maximizing a utility function that is a linear combination of state-action features. Most IRL algorithms use a simple Monte Carlo estimation to approximate the expected feature counts under the expert's policy. In this paper, we show that the quality of the learned policies is highly sensitive to the error in estimating the feature counts. To reduce this error, we introduce a novel approach for bootstrapping the demonstration by assuming that (i) the expert is (near-)optimal, and (ii) the dynamics of the system are known. Empirical results on gridworlds and car racing problems show that our approach is able to learn good policies from a small number of demonstrations.
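
The Monte Carlo estimate that both this paper and the wider IRL literature start from is the empirical discounted feature count. A minimal numpy sketch, with an illustrative toy feature map, is given below; the bootstrapping procedure that reduces its error is not shown.

```python
import numpy as np

def feature_expectations(trajectories, phi, gamma=0.99):
    """Monte Carlo estimate of the expert's expected discounted feature counts,
    mu = E[sum_t gamma^t phi(s_t, a_t)], from demonstrated trajectories."""
    s0, a0 = trajectories[0][0]
    mu = np.zeros_like(phi(s0, a0))
    for traj in trajectories:
        for t, (s, a) in enumerate(traj):
            mu += (gamma ** t) * phi(s, a)
    return mu / len(trajectories)

# toy demonstration: 2-dim features over integer states/actions (illustrative)
phi = lambda s, a: np.array([float(s == 0), float(a)])
trajs = [[(0, 1), (1, 0)], [(0, 0), (0, 1)]]
print(feature_expectations(trajs, phi))
```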

AAMAS Conference 2010 Conference Paper

Quasi Deterministic POMDPs and DecPOMDPs

  • Camille Besse
  • Brahim Chaib-draa

In this paper, we study a particular subclass of partially observable models, called quasi-deterministic partially observable Markov decision processes (QDET-POMDPs), characterized by deterministic transitions and stochastic observations. While this framework does not model the same general problems as POMDPs, it still captures a number of interesting and challenging problems and has, in some cases, interesting properties. By studying the observability available in this subclass, we suggest that QDET-POMDPs may fall many steps down the complexity hierarchy. An extension of this framework to the decentralized case also reveals a subclass of numerous problems that can be approximated in polynomial space.

ICRA Conference 2010 Conference Paper

Solving the continuous time multiagent patrol problem

  • Jean-Samuel Marier
  • Camille Besse
  • Brahim Chaib-draa

This paper compares two algorithms to solve a multiagent patrol problem with uncertain durations. The first algorithm is reactive and allows adaptive and robust behavior, while the second one uses planning to maximize longterm information retrieval. Experiments suggest that on the considered instances, using a reactive and local coordination algorithm performs almost as well as planning for long-term, while using much less computation time.

IJCAI Conference 2009 Conference Paper

Topological Order Planner for POMDPs

  • Jilles Steeve Dibangoye
  • Guy Shani
  • Brahim Chaib-draa
  • Abdel-Illah Mouaddib

Over the past few years, point-based POMDP solvers have scaled up to produce approximate solutions to mid-sized domains. However, to solve real-world problems, solvers must exploit the structure of the domain. In this paper we focus on the topological structure of the problem, where the state space contains layers of states. We present the Topological Order Planner (TOP), which utilizes the topological structure of the domain to compute belief space trajectories. TOP rapidly produces trajectories focused on the solvable regions of the belief space, thus reducing the number of redundant backups considerably. We demonstrate that TOP produces good-quality policies faster than any other point-based algorithm on domains with sufficient structure.

IROS Conference 2009 Conference Paper

Bayesian reinforcement learning in continuous POMDPs with gaussian processes

  • Patrick Dallaire
  • Camille Besse
  • Stéphane Ross
  • Brahim Chaib-draa

Partially Observable Markov Decision Processes (POMDPs) provide a rich mathematical model to handle real-world sequential decision processes but require a known model to be solved by most approaches. However, mainstream POMDP research focuses on the discrete case and this complicates its application to most realistic problems that are naturally modeled using continuous state spaces. In this paper, we consider the problem of optimal control in continuous and partially observable environments when the parameters of the model are unknown. We advocate the use of Gaussian Process Dynamical Models (GPDMs) so that we can learn the model through experience with the environment. Our results on the blimp problem show that the approach can learn good models of the sensors and actuators in order to maximize long-term rewards.

ICAART Conference 2009 Conference Paper

Learning User Intentions in Spoken Dialogue Systems

  • Hamid R. Chinaei
  • Brahim Chaib-draa
  • Luc Lamontagne

A common problem in spoken dialogue systems is finding the intention of the user. This problem deals with obtaining one or several topics for each transcribed, possibly noisy, sentence of the user. In this work, we apply a recent unsupervised learning method, Hidden Topic Markov Models (HTMM), for finding the intention of the user in dialogues. This technique combines Latent Dirichlet Allocation (LDA) and Hidden Markov Models (HMM) in order to learn the topics of documents. We show that HTMM can also be used for obtaining intentions from the noisy transcribed sentences of the user in spoken dialogue systems. We argue that in this way we can learn the possible states in a speech domain, which can be used in the design stage of its spoken dialogue system. Furthermore, we discuss how the learned model can be augmented and used in a POMDP (Partially Observable Markov Decision Process) dialogue manager of the spoken dialogue system.

ICRA Conference 2008 Conference Paper

Bayesian reinforcement learning in continuous POMDPs with application to robot navigation

  • Stéphane Ross
  • Brahim Chaib-draa
  • Joelle Pineau

We consider the problem of optimal control in continuous and partially observable environments when the parameters of the model are not known exactly. Partially observable Markov decision processes (POMDPs) provide a rich mathematical model to handle such environments but require a known model to be solved by most approaches. This is a limitation in practice, as the model parameters are often difficult to specify exactly. We adopt a Bayesian approach where a posterior distribution over the model parameters is maintained and updated through experience with the environment. We propose a particle filter algorithm to maintain the posterior distribution and an online planning algorithm, based on trajectory sampling, to plan the best action to perform under the current posterior. The resulting approach selects control actions which optimally trade off between 1) exploring the environment to learn the model, 2) identifying the system's state, and 3) exploiting its knowledge in order to maximize long-term rewards. Our preliminary results on a simulated robot navigation problem show that our approach is able to learn good models of the sensors and actuators, and performs as well as if it had the true model.

ICAPS Conference 2008 Conference Paper

Exact Dynamic Programming for Decentralized POMDPs with Lossless Policy Compression

  • Abdeslam Boularias
  • Brahim Chaib-draa

High dimensionality of belief space in DEC-POMDPs is one of the major causes that makes the optimal joint policy computation intractable. The belief state for a given agent is a probability distribution over the system states and the policies of other agents. Belief compression is an efficient POMDP approach that speeds up planning algorithms by projecting the belief state space to a low-dimensional one. In this paper, we introduce a new method for solving DEC-POMDP problems, based on the compression of the policy belief space. The reduced policy space contains sequences of actions and observations that are linearly independent. We tested our approach on two benchmark problems, and the preliminary results confirm that Dynamic Programming algorithm scales up better when the policy belief is compressed.

IJCAI Conference 2007 Conference Paper

AEMS: An Anytime Online Search Algorithm for Approximate Policy Refinement in Large POMDPs

  • Stéphane Ross
  • Brahim Chaib-draa

Solving large Partially Observable Markov Decision Processes (POMDPs) is a complex task which is often intractable. A lot of effort has been made to develop approximate offline algorithms to solve ever larger POMDPs. However, even state-of-the-art approaches fail to solve large POMDPs in reasonable time. Recent developments in online POMDP search suggest that combining offline computations with online computations is often more efficient and can also considerably reduce the error made by approximate policies computed offline. In the same vein, we propose a new anytime online search algorithm which seeks to minimize, as efficiently as possible, the error made by an approximate value function computed offline. In addition, we show how previous online computations can be reused in following time steps in order to prevent redundant computations. Our preliminary results indicate that our approach is able to tackle large state space and observation space efficiently and under real-time constraints.

AAMAS Conference 2007 Conference Paper

A Q-decomposition and Bounded RTDP Approach to Resource Allocation

  • Pierrick Plamondon
  • Brahim Chaib-draa
  • Abder Rezak Benaskeur

This paper contributes to effectively solving stochastic resource allocation problems, known to be NP-Complete. To address this complex resource management problem, a Q-decomposition approach is proposed for the case where the resources are already shared among the agents but the actions made by one agent may influence the reward obtained by at least one other agent. Q-decomposition allows these reward-separated agents to be coordinated and thus reduces the set of states and actions to consider. On the other hand, when the resources are available to all agents, no Q-decomposition is possible and we use heuristic search. In particular, bounded Real-Time Dynamic Programming (bounded RTDP) is used. Bounded RTDP concentrates planning on significant states only and prunes the action space. The pruning is accomplished by proposing tight upper and lower bounds on the value function.

ICRA Conference 2007 Conference Paper

Adaptive Play Q-Learning with Initial Heuristic Approximation

  • Andriy Burkov
  • Brahim Chaib-draa

The problem of effective coordination of multiple autonomous robots is one of the most important tasks of modern robotics. In turn, it is well known that learning to coordinate multiple autonomous agents in a multiagent system is one of the most complex challenges of state-of-the-art intelligent system design. Principally, this is because the dimensionality of the multiagent coordination problem grows exponentially with the number of learning agents: each state of the system is a joint state of all agents, and each action is a joint action composed of the actions of each agent. This challenge is known as the "curse of dimensionality". In this paper, we address this problem for the restricted class of environments known as goal-directed stochastic games with action-penalty representation. We use a single-agent problem solution as a heuristic approximation of the agents' initial preferences and, by so doing, we restrict to a great extent the space of multiagent learning. We show theoretically the correctness of such an initialization, and the results of experiments in a well-known two-robot grid world problem show a significant reduction in the complexity of the learning process.
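
The core idea, initializing Q-values with a single-agent solution before multiagent learning begins, can be sketched with plain tabular Q-learning. The toy chain world and the heuristic initialization below are illustrative assumptions, not the paper's two-robot grid world.

```python
import numpy as np

def q_learning(env_step, n_states, n_actions, q_init,
               episodes=500, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning whose table starts from a heuristic approximation
    (e.g. a relaxed single-agent solution) instead of zeros."""
    rng = np.random.default_rng(seed)
    Q = q_init.astype(float).copy()
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
            s2, r, done = env_step(s, a)
            target = r + (0.0 if done else gamma * Q[s2].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q

# toy 5-state chain with action-penalty (-1 per step) rewards; goal is state 4
def chain(s, a):
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    return s2, -1.0, s2 == 4

q0 = np.array([[-(4 - s)] * 2 for s in range(5)], float)  # single-agent heuristic
print(q_learning(chain, 5, 2, q0).argmax(axis=1))         # greedy: move right
```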

NeurIPS Conference 2007 Conference Paper

Bayes-Adaptive POMDPs

  • Stephane Ross
  • Brahim Chaib-draa
  • Joelle Pineau

Bayesian Reinforcement Learning has generated substantial interest recently, as it provides an elegant solution to the exploration-exploitation trade-off in reinforcement learning. However, most investigations of Bayesian reinforcement learning to date focus on the standard Markov Decision Processes (MDPs). Our goal is to extend these ideas to the more general Partially Observable MDP (POMDP) framework, where the state is a hidden variable. To address this problem, we introduce a new mathematical model, the Bayes-Adaptive POMDP. This new model allows us to (1) improve knowledge of the POMDP domain through interaction with the environment, and (2) plan optimal sequences of actions which can trade off between improving the model, identifying the state, and gathering reward. We show how the model can be finitely approximated while preserving the value function. We describe approximations for belief tracking and planning in this model. Empirical results on two domains show that the model estimate and agent's return improve over time, as the agent learns better model estimates.

AAMAS Conference 2007 Conference Paper

Multiagent Learning in Adaptive Dynamic Systems

  • Andriy Burkov
  • Brahim Chaib-draa

Classically, approaches to multiagent policy learning supposed that the agents, via interactions and/or by using preliminary knowledge about the reward functions of all players, would find an interdependent solution called an "equilibrium". Recently, however, certain researchers have questioned the necessity and the validity of the concept of equilibrium as the most important multiagent solution concept. They argue that a "good" learning algorithm is one that is efficient with respect to a certain class of counterparts. Adaptive players are an important class of agents that learn their policies separately from the maintenance of beliefs about their counterparts' future actions and make their decisions based on that policy and the current belief. In this paper, we propose an efficient learning algorithm in the presence of adaptive counterparts, called Adaptive Dynamics Learner (ADL), which is able to learn an efficient policy over the opponents' adaptive dynamics rather than over simple actions and beliefs and, by so doing, to exploit these dynamics to obtain a higher utility than any equilibrium strategy can provide. We tested our algorithm on a substantial representative set of the best-known and most demonstrative matrix games and observed that the ADL agent is highly efficient against the Adaptive Play Q-learning (APQ) agent and the Infinitesimal Gradient Ascent (IGA) agent. In self-play, when possible, ADL is able to converge to a Pareto-optimal strategy maximizing the welfare of all players.

AAMAS Conference 2007 Conference Paper

Periodic Real-Time Resource Allocation for Teams of Progressive Processing Agents

  • Gilles Steeve Dibangoye
  • Abdel-Illah Mouaddib
  • Brahim Chaib-draa

In this paper, we focus on the problem of finding a periodic allocation strategy for teams of resource-bounded agents. We propose a real-time dynamic algorithm, RTDA*, that exploits two key properties to avoid the exponential increase in the state and action spaces associated with multi-agent systems. First, resource allocation at each time period follows an earliest-deadline-first (EDF) order over agents. Second, the resources are undivided, i.e., the resources allocated to an agent restrict their availability to others over time. We can therefore view each incoming agent as a cyclic individual resource-bounded process, namely a "cyclic progressive reasoning unit" (C-PRU), and solve, off-line, the single-agent resource allocation problem. In the on-line phase, our algorithm exploits the pre-compiled policies, as heuristic metrics, to build near-optimal joint decisions at each time period.

AAMAS Conference 2007 Conference Paper

Reducing the Complexity of Multiagent Reinforcement Learning

  • Andriy Burkov
  • Brahim Chaib-draa

It is known that the complexity of reinforcement learning algorithms, such as Q-learning, may be exponential in the number of the environment's states. It was shown, however, that the learning complexity for goal-directed problems may be substantially reduced by initializing the Q-values with a "good" approximative function. In the multiagent case, there exists such a good approximation for a big class of problems, namely, for goal-directed stochastic games. These games, for example, can reflect coordination and common-interest problems of cooperative robotics. The approximative function for these games is nothing but the relaxed, single-agent, problem solution, which can easily be found by each agent individually. In this article, we show that (1) an optimal single-agent solution is a "good" approximation for goal-directed stochastic games with action-penalty representation and (2) the complexity is reduced when the learning is initialized with this approximative function, as compared to the uninformed case.

NeurIPS Conference 2007 Conference Paper

Theoretical Analysis of Heuristic Search Methods for Online POMDPs

  • Stephane Ross
  • Joelle Pineau
  • Brahim Chaib-draa

Planning in partially observable environments remains a challenging problem, despite significant recent advances in offline approximation techniques. A few online methods have also been proposed recently, and proven to be remarkably scalable, but without the theoretical guarantees of their offline counterparts. Thus it seems natural to try to unify offline and online techniques, preserving the theoretical properties of the former, and exploiting the scalability of the latter. In this paper, we provide theoretical guarantees on an anytime algorithm for POMDPs which aims to reduce the error made by approximate offline value iteration algorithms through the use of an efficient online searching procedure. The algorithm uses search heuristics based on an error analysis of lookahead search, to guide the online search towards reachable beliefs with the most potential to reduce error. We provide a general theorem showing that these search heuristics are admissible, and lead to complete and epsilon-optimal algorithms. This is, to the best of our knowledge, the strongest theoretical result available for online POMDP solution methods. We also provide empirical evidence showing that our approach is also practical, and can find (provably) near-optimal solutions in reasonable time.

JAAMAS Journal 2006 Journal Article

Conversational semantics sustained by commitments

  • Roberto A. Flores
  • Philippe Pasquier
  • Brahim Chaib-draa

We propose an operational model that combines message meaning and conversational structure in one comprehensive approach. Our long-term research goal is to lay down principles uniting message meaning and conversational structure while providing an operational foundation that could be implemented in open computer systems. In this paper we explore our advances in one aspect of meaning that in theories of language use is known as “signal meaning”, and propose a layered model in which the meaning of messages can be defined according to their fitness to advance the state of joint activities. Messages in our model are defined in terms of social commitments, which have been shown to entice conversational structure.

JAAMAS Journal 2006 Journal Article

DIAGAL: An Agent Communication Language Based on Dialogue Games and Sustained by Social Commitments

  • Brahim Chaib-draa
  • Marc-André Labrie
  • Philippe Pasquier

In recent years, social commitment based approaches have been proposed to solve problems arising from previous mentalistic-based semantics for agent communication languages. This paper follows the same line of thought since it presents the latest version of our dialogue game based agent communication language – DIAlogue-Game based Agent Language (DIAGAL) – which allows agents to manipulate the public layer of social commitments through dialogue, by creating, canceling and updating their social commitments. To make such commitments apparent, we consider here Agent Communication Language (ACL) from the dialectic point of view, where agents “play a game” based on commitments. Such games based on commitments are incorporated in the DIAGAL language, which has been developed with the following questions in mind: (a) What kind of structure does the game have? How are rules specified within the game? (b) What kinds of game compositions are allowed? (c) How do participants in conversations reach agreement on the current game? How are games opened or closed? Using such games we show how we can study the commitment dynamics to model agent dialogue, and we present metrics that can be used to evaluate the quality of a dialogue between agents. Next, we use an example (summer festival organization) to show how DIAGAL can be used in analyzing and modeling automated conversations in offices. Finally, we present the results and analysis of the summer festival simulations that we realized through our dialogue game simulator (DGS).
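
The public commitment layer the abstract describes is, at bottom, a store supporting create, cancel, and update operations. Below is a minimal Python sketch of such a store; the class names, fields, and string encoding of commitment content are illustrative assumptions, not DIAGAL's actual formalism or game rules.

```python
from dataclasses import dataclass

@dataclass
class Commitment:
    debtor: str    # the agent bound by the commitment
    creditor: str  # the agent the commitment is owed to
    content: str   # propositional or action content
    active: bool = True

class CommitmentStore:
    """A public layer of social commitments, manipulated through dialogue."""
    def __init__(self):
        self.items = []

    def create(self, debtor, creditor, content):
        self.items.append(Commitment(debtor, creditor, content))

    def cancel(self, debtor, content):
        for c in self.items:
            if c.active and c.debtor == debtor and c.content == content:
                c.active = False  # keep the record; mark it withdrawn

    def update(self, debtor, creditor, old_content, new_content):
        self.cancel(debtor, old_content)
        self.create(debtor, creditor, new_content)

store = CommitmentStore()
store.create("seller", "buyer", "deliver the tickets")
store.update("seller", "buyer", "deliver the tickets", "deliver the tickets by Friday")
print([(c.content, c.active) for c in store.items])
```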