Arrow Research search

Author name cluster

Barbara Caputo

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

24 papers
2 author rows

Possible papers (24)

ICML 2025 Conference Paper

Interaction-Aware Gaussian Weighting for Clustered Federated Learning

  • Alessandro Licciardi
  • Davide Leo
  • Eros Fanì
  • Barbara Caputo
  • Marco Ciccone

Federated Learning (FL) emerged as a decentralized paradigm to train models while preserving privacy. However, conventional FL struggles with data heterogeneity and class imbalance, which degrade model performance. Clustered FL balances personalization and decentralized training by grouping clients with analogous data distributions, enabling improved accuracy while adhering to privacy constraints. This approach effectively mitigates the adverse impact of heterogeneity in FL. In this work, we propose a novel clustering method for FL, FedGWC (Federated Gaussian Weighting Clustering), which groups clients based on their data distribution, allowing training of a more robust and personalized model on the identified clusters. FedGWC identifies homogeneous clusters by transforming individual empirical losses to model client interactions with a Gaussian reward mechanism. Additionally, we introduce the Wasserstein Adjusted Score, a new clustering metric for FL to evaluate cluster cohesion with respect to the individual class distribution. Our experiments on benchmark datasets show that FedGWC outperforms existing FL algorithms in cluster quality and classification accuracy, validating the efficacy of our approach.
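
The Gaussian reward mechanism is only summarized above, but its flavor can be sketched. Below is a minimal reading, assuming pairwise client affinities computed from empirical losses with a Gaussian kernel, followed by off-the-shelf spectral clustering; the kernel width and the clustering backend are illustrative choices, not details from the paper.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def gaussian_affinities(losses: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Turn per-client empirical losses into pairwise affinities.

    Clients whose losses behave similarly get weights close to 1;
    dissimilar clients decay toward 0 under a Gaussian kernel.
    """
    diff = losses[:, None] - losses[None, :]          # pairwise loss gaps
    return np.exp(-(diff ** 2) / (2.0 * sigma ** 2))  # Gaussian weighting

def cluster_clients(losses: np.ndarray, n_clusters: int, sigma: float = 1.0):
    W = gaussian_affinities(losses, sigma)
    model = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                               random_state=0)
    return model.fit_predict(W)  # one cluster label per client

# Example: 6 clients in two loss regimes -> two clusters.
losses = np.array([0.9, 1.0, 0.95, 2.4, 2.5, 2.6])
print(cluster_clients(losses, n_clusters=2))
```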

ICML 2024 Conference Paper

Accelerating Heterogeneous Federated Learning with Closed-form Classifiers

  • Eros Fanì
  • Raffaello Camoriano
  • Barbara Caputo
  • Marco Ciccone

Federated Learning (FL) methods often struggle in highly statistically heterogeneous settings. Indeed, non-IID data distributions cause client drift and biased local solutions, particularly pronounced in the final classification layer, negatively impacting convergence speed and accuracy. To address this issue, we introduce Federated Recursive Ridge Regression (Fed3R). Our method fits a Ridge Regression classifier computed in closed form leveraging pre-trained features. Fed3R is immune to statistical heterogeneity and is invariant to the sampling order of the clients. Therefore, it proves particularly effective in cross-device scenarios. Furthermore, it is fast and efficient in terms of communication and computation costs, requiring up to two orders of magnitude fewer resources than the competitors. Finally, we propose to leverage the Fed3R parameters as an initialization for a softmax classifier and subsequently fine-tune the model using any FL algorithm (Fed3R with Fine-Tuning, Fed3R+FT). Our findings also indicate that maintaining a fixed classifier aids in stabilizing the training and learning more discriminative features in cross-device settings. Official website: https://fed-3r.github.io/.
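
Closed-form ridge classifiers make the order-invariance claim easy to see: each client contributes only additive sufficient statistics, and addition commutes. A minimal sketch under the assumption that clients share pre-trained features and one-hot labels (all names here are illustrative, not the paper's API):

```python
import numpy as np

def client_statistics(X: np.ndarray, Y: np.ndarray):
    """Per-client sufficient statistics for ridge regression.

    X: (n_k, d) pre-trained features; Y: (n_k, C) one-hot labels.
    """
    return X.T @ X, X.T @ Y

def server_ridge_classifier(stats, lam: float = 1e-3) -> np.ndarray:
    """Aggregate client statistics and solve ridge regression in closed form.

    Summation is commutative, so the solution cannot depend on the
    order in which clients were sampled.
    """
    G = sum(g for g, _ in stats)
    b = sum(b_ for _, b_ in stats)
    d = G.shape[0]
    return np.linalg.solve(G + lam * np.eye(d), b)  # (d, C) classifier weights

# Toy run: three "clients" with random features and labels.
rng = np.random.default_rng(0)
stats = []
for _ in range(3):
    X = rng.normal(size=(50, 16))
    Y = np.eye(4)[rng.integers(0, 4, size=50)]
    stats.append(client_statistics(X, Y))
W = server_ridge_classifier(stats)
print(W.shape)  # (16, 4)
```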

IROS 2022 Conference Paper

FedDrive: Generalizing Federated Learning to Semantic Segmentation in Autonomous Driving

  • Lidia Fantauzzo
  • Eros Fanì
  • Debora Caldarola
  • Antonio Tavera
  • Fabio Cermelli
  • Marco Ciccone
  • Barbara Caputo

Semantic Segmentation is essential to make self-driving vehicles autonomous, enabling them to understand their surroundings by assigning individual pixels to known categories. However, it operates on sensitive data collected from the users' cars; thus, protecting the clients' privacy becomes a primary concern. For similar reasons, Federated Learning has been recently introduced as a new machine learning paradigm aiming to learn a global model while preserving privacy and leveraging data on millions of remote devices. Despite several efforts on this topic, no work has explicitly addressed the challenges of federated learning in semantic segmentation for driving so far. To fill this gap, we propose FedDrive, a new benchmark consisting of three settings and two datasets, incorporating the real-world challenges of statistical heterogeneity and domain generalization. We benchmark state-of-the-art algorithms from the federated learning literature through an in-depth analysis, combining them with style transfer methods to improve their generalization ability. We demonstrate that correctly handling normalization statistics is crucial to deal with the aforementioned challenges. Furthermore, style transfer improves performance when dealing with significant appearance shifts. Official website: https://feddrive.github.io.

ICRA 2019 Conference Paper

Knowledge is Never Enough: Towards Web Aided Deep Open World Recognition

  • Massimiliano Mancini
  • Hakan Karaoguz
  • Elisa Ricci 0001
  • Patric Jensfelt
  • Barbara Caputo

While today's robots are able to perform sophisticated tasks, they can only act on objects they have been trained to recognize. This is a severe limitation: any robot will inevitably see new objects in unconstrained settings, and thus will always have visual knowledge gaps. However, standard visual modules are usually built on a limited set of classes and are based on the strong prior that an object must belong to one of those classes. Identifying whether an instance does not belong to the set of known categories (i.e., open set recognition) only partially tackles this problem, as a truly autonomous agent should be able not only to detect what it does not know, but also to dynamically extend its knowledge about the world. We contribute to this challenge with a deep learning architecture that can dynamically update its known classes in an end-to-end fashion. The proposed deep network, based on a deep extension of a non-parametric model, detects whether a perceived object belongs to the set of categories known by the system and learns it without the need to retrain the whole system from scratch. Annotated images of the new category can be provided by an 'oracle' (i.e., human supervision), or by autonomous mining of the Web. Experiments on two different databases and on a robot platform demonstrate the promise of our approach.
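
The abstract does not name the non-parametric model, but a nearest-class-mean classifier over deep embeddings is a common choice for this kind of dynamic class extension. A hedged sketch, with the rejection threshold and the embedding dimensionality as assumptions:

```python
import numpy as np

class OpenWorldNCM:
    """Nearest-class-mean classifier over deep embeddings.

    Rejects samples far from every known class mean, and absorbs a new
    class from a handful of embeddings without retraining anything.
    """
    def __init__(self, reject_dist: float = 1.5):
        self.means = {}              # class name -> mean embedding
        self.reject_dist = reject_dist

    def add_class(self, name: str, embeddings: np.ndarray):
        self.means[name] = embeddings.mean(axis=0)

    def predict(self, z: np.ndarray) -> str:
        dists = {c: np.linalg.norm(z - m) for c, m in self.means.items()}
        best = min(dists, key=dists.get)
        return best if dists[best] < self.reject_dist else "unknown"

ncm = OpenWorldNCM()
rng = np.random.default_rng(0)
ncm.add_class("mug", rng.normal(0.0, 0.1, size=(20, 8)))
print(ncm.predict(rng.normal(3.0, 0.1, size=8)))  # -> "unknown"
ncm.add_class("plant", rng.normal(3.0, 0.1, size=(20, 8)))
print(ncm.predict(rng.normal(3.0, 0.1, size=8)))  # -> "plant"
```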

IROS 2019 Conference Paper

The RGB-D Triathlon: Towards Agile Visual Toolboxes for Robots

  • Fabio Cermelli
  • Massimiliano Mancini
  • Elisa Ricci 0001
  • Barbara Caputo

Deep networks have brought significant advances in robot perception, improving the capabilities of robots in several visual tasks, ranging from object detection and recognition to pose estimation, semantic scene segmentation and many others. Still, most approaches typically address visual tasks in isolation, resulting in overspecialized models which achieve strong performance in specific applications but work poorly in other (often related) tasks. This is clearly sub-optimal for a robot which is often required to perform multiple visual recognition tasks simultaneously in order to properly act and interact with the environment. This problem is exacerbated by the limited computational and memory resources typically available onboard a robotic platform. The problem of learning flexible models which can handle multiple tasks in a lightweight manner has recently gained attention in the computer vision community, and benchmarks supporting this research have been proposed. In this work we study this problem in the robot vision context, proposing a new benchmark, the RGB-D Triathlon, and evaluating state-of-the-art algorithms in this novel and challenging scenario. We also define a new evaluation protocol, better suited to the robot vision setting. Results shed light on the strengths and weaknesses of existing approaches and on open issues, suggesting directions for future research.

ICRA 2018 Conference Paper

Adaptive Deep Learning Through Visual Domain Localization

  • Gabriele Angeletti
  • Barbara Caputo
  • Tatiana Tommasi

A commercial robot, trained by its manufacturer to recognize a predefined number and type of objects, might be used in many settings that will, in general, differ in their illumination conditions, background, type and degree of clutter, and so on. Recent computer vision works tackle this generalization issue through domain adaptation methods, assuming as source the visual domain where the system is trained and as target the domain of deployment. All these approaches assume access to images from all classes of the target during training, an unrealistic condition in robotics applications. We address this issue by proposing an algorithm that takes into account the specific needs of robot vision. Our intuition is that the domain shift experienced in robotics is mostly local in nature. We exploit this through the learning of maps that spatially ground the domain and quantify the degree of shift, embedded into an end-to-end deep domain adaptation architecture. By explicitly localizing the roots of the domain shift we significantly reduce the number of architecture parameters to tune, we gain the flexibility necessary to deal with a subset of categories in the target domain at training time, and we provide clear feedback on the rationale behind any classification decision, which can be exploited in human-robot interactions. Experiments on two different settings of the iCub World database confirm the suitability of our method for robot vision.

IROS 2018 Conference Paper

Kitting in the Wild through Online Domain Adaptation

  • Massimiliano Mancini
  • Hakan Karaoguz
  • Elisa Ricci 0001
  • Patric Jensfelt
  • Barbara Caputo

Technological developments call for increasing perception and action capabilities of robots. Among other skills, vision systems that can adapt to any possible change in the working conditions are needed. Since these conditions are unpredictable, we need benchmarks that allow us to assess the generalization and robustness capabilities of our visual recognition algorithms. In this work we focus on robotic kitting in unconstrained scenarios. As a first contribution, we present a new visual dataset for the kitting task. Differently from standard object recognition datasets, we provide images of the same objects acquired under various conditions where camera, illumination and background are changed. This novel dataset allows for testing the robustness of robot visual recognition algorithms to a series of different domain shifts, both in isolation and combined. Our second contribution is a novel online adaptation algorithm for deep models, based on batch-normalization layers, which continuously adapts a model to the current working conditions. Differently from standard domain adaptation algorithms, it does not require any image from the target domain at training time. We benchmark the performance of the algorithm on the proposed dataset, showing its capability to close the gap between the performance of a standard architecture and that of its counterpart adapted offline to the given target domain.
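
One plausible reading of the batch-normalization-based adaptation is to refresh the BN running statistics with unlabeled frames from the current working conditions, touching no other weights. A minimal PyTorch sketch of that idea (the momentum value is an assumed hyperparameter, not one from the paper):

```python
import torch

@torch.no_grad()
def adapt_bn_online(model: torch.nn.Module, frames) -> None:
    """Continuously refresh BatchNorm running statistics on target frames.

    Only BN buffers change: no labels, no gradient steps, and no target
    images at training time, matching the constraint in the abstract.
    """
    model.train()  # BN layers update running stats only in train mode
    bn_types = (torch.nn.BatchNorm1d, torch.nn.BatchNorm2d, torch.nn.BatchNorm3d)
    for m in model.modules():
        if isinstance(m, bn_types):
            m.momentum = 0.1  # weight given to the newest batch (assumed)
    for batch in frames:      # unlabeled images from the robot's camera
        model(batch)          # the forward pass alone updates BN buffers
    model.eval()
```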

ICRA 2018 Conference Paper

Recognizing Objects in-the-Wild: Where do we Stand?

  • Mohammad Reza Loghmani
  • Barbara Caputo
  • Markus Vincze

The ability to recognize objects is an essential skill for a robotic system acting in human-populated environments. Despite decades of effort from the robotics and vision research communities, robots are still missing good visual perceptual systems, preventing the use of autonomous agents for real-world applications. The progress is slowed down by the lack of a testbed able to accurately represent the world perceived by the robot in-the-wild. In order to fill this gap, we introduce a large-scale, multi-view object dataset collected with an RGB-D camera mounted on a mobile robot. The dataset embeds the challenges faced by a robot in a real-life application and provides a useful tool for validating object recognition algorithms. Besides describing the characteristics of the dataset, the paper evaluates the performance of a collection of well-established deep convolutional networks on the new dataset and analyzes the transferability of deep representations from Web images to robotic data. Despite the promising results obtained with such representations, the experiments demonstrate that object classification with real-life robotic data is far from being solved. Finally, we provide a comparative study to analyze and highlight the open challenges in robot vision, explaining the discrepancies in performance.

ICRA 2017 Conference Paper

A deep representation for depth images from synthetic data

  • Fabio Maria Carlucci
  • Paolo Russo 0001
  • Barbara Caputo

Convolutional Neural Networks (CNNs) trained on large-scale RGB databases have become the secret sauce in the majority of recent approaches for object categorization from RGB-D data. Thanks to colorization techniques, these methods exploit the filters learned from 2D images to extract meaningful representations in 2.5D. Still, the perceptual signature of these two kinds of images is very different, with the first usually strongly characterized by textures, and the second mostly by silhouettes of objects. Ideally, one would like to have two CNNs, one for RGB and one for depth, each trained on a suitable data collection, able to capture the perceptual properties of each channel for the task at hand. This has not been possible so far, due to the lack of a suitable depth database. This paper addresses this issue, proposing to opt for synthetically generated images rather than collecting a large-scale 2.5D database by hand. While clearly a proxy for real data, synthetic images allow trading quality for quantity, making it possible to generate a virtually infinite amount of data. We show that the very same architecture typically used on visual data, trained on such a collection, learns very different filters, resulting in depth features (a) able to better characterize the different facets of depth images, and (b) complementary with respect to those derived from CNNs pre-trained on 2D datasets. Experiments on two publicly available databases show the power of our approach.

IROS 2017 Conference Paper

Learning deep visual object models from noisy web data: How to make it work

  • Nizar Massouh
  • Francesca Babiloni
  • Tatiana Tommasi
  • Jay Young
  • Nick Hawes
  • Barbara Caputo

Deep networks thrive when trained on large-scale data collections. This has given ImageNet a central role in the development of deep architectures for visual object classification. However, ImageNet was created during a specific period in time, and as such it is prone to aging, as well as dataset bias issues. Moving beyond fixed training datasets will lead to more robust visual systems, especially when deployed on robots in new environments which must train on the objects they encounter there. To make this possible, it is important to break free from the need for manual annotators. Recent work has begun to investigate how to use the massive amount of images available on the Web in place of manual image annotations. We contribute to this research thread with two findings: (1) a study correlating a given level of label noise with the expected drop in accuracy, for two deep architectures and two different types of noise, which clearly identifies GoogLeNet as a suitable architecture for learning from Web data; (2) a recipe for the creation of Web datasets with minimal noise and maximum visual variability, based on a visual and natural language processing concept expansion strategy. By combining these two results, we obtain a method for learning powerful deep object models automatically from the Web. We confirm the effectiveness of our approach through object categorization experiments using our Web-derived version of ImageNet on a popular robot vision benchmark database, and on a lifelong object discovery task on a mobile robot.

ICRA 2017 Conference Paper

Semantic web-mining and deep vision for lifelong object discovery

  • Jay Young
  • Lars Kunze
  • Valerio Basile
  • Elena Cabrio
  • Nick Hawes
  • Barbara Caputo

Autonomous robots that are to assist humans in their daily lives must recognize and understand the meaning of objects in their environment. However, the open nature of the world means robots must be able to learn and extend their knowledge about previously unknown objects on-line. In this work we investigate the problem of unknown object hypotheses generation, and employ a semantic Web-mining framework along with deep-learning-based object detectors. This allows us to make use of both visual and semantic features in combined hypotheses generation. Experiments on data from mobile robots in real world application deployments show that this combination improves performance over the use of either method in isolation.

JMLR 2012 Journal Article

Multi Kernel Learning with Online-Batch Optimization

  • Francesco Orabona
  • Luo Jie
  • Barbara Caputo

In recent years there has been a lot of interest in designing principled classification algorithms over multiple cues, based on the intuitive notion that using more features should lead to better performance. In the domain of kernel methods, a principled way to use multiple features is the Multi Kernel Learning (MKL) approach. Here we present a MKL optimization algorithm based on stochastic gradient descent that has a guaranteed convergence rate. We directly solve the MKL problem in the primal formulation. By having a p-norm formulation of MKL, we introduce a parameter that controls the level of sparsity of the solution, while leading to an easier optimization problem. We prove theoretically and experimentally that 1) our algorithm has a faster convergence rate as the number of kernels grows; 2) the training complexity is linear in the number of training examples; 3) very few iterations are sufficient to reach good solutions. Experiments on standard benchmark databases support our claims.
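
The p-norm formulation mentioned above is standard enough to write down. In the usual primal form (which the paper's exact objective may refine), a block p-norm over the per-kernel weight vectors replaces the single regularizer, and p interpolates between sparse (p near 1) and uniform (p = 2) kernel combinations:

```latex
\min_{\{w_m\},\, b,\, \xi \ge 0}\;
  \frac{1}{2}\Bigl(\sum_{m=1}^{M} \lVert w_m \rVert_2^{\,p}\Bigr)^{2/p}
  + C \sum_{i=1}^{N} \xi_i
\quad \text{s.t.} \quad
  y_i\Bigl(\sum_{m=1}^{M} \langle w_m, \phi_m(x_i) \rangle + b\Bigr) \ge 1 - \xi_i,
  \;\; i = 1, \dots, N.
```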

ICRA 2011 Conference Paper

Towards semi-supervised learning of semantic spatial concepts

  • Jesus Martínez-Gómez
  • Barbara Caputo

The ability to build robust semantic space representations of environments is crucial for the development of truly autonomous robots. This task, inherently connected with cognition, is traditionally achieved by training the robot with a supervised learning phase. We argue that the design of robust and autonomous systems would greatly benefit from adopting a semi-supervised online learning approach. Indeed, the support of open-ended, lifelong learning is fundamental in order to cope with the dazzling variability of the real world, and online learning provides precisely this kind of ability. Here we focus on the robot place recognition problem, and we present an online place classification algorithm that is able to detect gaps in its own knowledge based on a confidence measure. For every incoming image frame, the method is able to decide if (a) it is a known room with a familiar appearance, (b) it is a known room with a challenging appearance, or (c) it is a new, unknown room. Experiments on a subset of the challenging COLD database show the promise of our approach.

IROS 2010 Conference Paper

Object recognition using visuo-affordance maps

  • Arjan Gijsberts
  • Tatiana Tommasi
  • Giorgio Metta
  • Barbara Caputo

One of the major challenges in developing autonomous systems is to make them able to recognize and categorize objects robustly. However, the appearance-based algorithms that are widely employed for robot perception do not explore the functionality of objects, described in terms of their affordances. These affordances (e.g., manipulation, grasping) are discriminative for object categories and are important cues for reliable robot performance in everyday environments. In this paper, we propose a strategy for object recognition that integrates both visual appearance and grasp affordance features. Following previous work, we hypothesize that additional grasp information improves object recognition, even if we reconstruct the grasp modality from visual features using a mapping function. We considered two different representations for the grasp modality: (1) motor information of the hand posture while grasping and (2) a more general grasp affordance descriptor. Using a multi-modal classifier, we show that having real grasp information significantly boosts object recognition. This improvement is preserved, although to a lesser extent, if the grasp modality is reconstructed using the mapping function.

JMLR 2009 Journal Article

Bounded Kernel-Based Online Learning

  • Francesco Orabona
  • Joseph Keshet
  • Barbara Caputo

A common problem of kernel-based online algorithms, such as the kernel-based Perceptron algorithm, is the amount of memory required to store the online hypothesis, which may increase without bound as the algorithm progresses. Furthermore, the computational load of such algorithms grows linearly with the amount of memory used to store the hypothesis. To attack these problems, most previous work has focused on discarding some of the instances, in order to keep the memory bounded. In this paper we present a new algorithm, in which the instances are not discarded, but are instead projected onto the space spanned by the previous online hypothesis. We call this algorithm Projectron. While the memory size of the Projectron solution cannot be predicted before training, we prove that its solution is guaranteed to be bounded. We derive a relative mistake bound for the proposed algorithm, and deduce from it a slightly different algorithm which outperforms the Perceptron. We call this second algorithm Projectron++. We show that this algorithm can be extended to handle the multiclass and the structured output settings, resulting, as far as we know, in the first online bounded algorithm that can learn complex classification tasks. The method of bounding the hypothesis representation can be applied to any conservative online algorithm and to other online algorithms, as it is demonstrated for ALMA 2. Experimental results on various data sets show the empirical advantage of our technique compared to various bounded online algorithms, both in terms of memory and accuracy.
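
The projection step is concrete enough for a short sketch. Assuming an RBF kernel and a projection-error threshold eta (notation chosen here, not taken from the paper), each mistake either updates the coefficients of the existing support set or, when the projection error is too large, grows the set by one:

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    """Gaussian RBF kernel between two feature vectors."""
    return np.exp(-gamma * np.linalg.norm(a - b) ** 2)

class Projectron:
    """Bounded kernel Perceptron: mistakes are projected onto the span of
    the current support set; the set only grows when the projection error
    exceeds eta."""
    def __init__(self, kernel=rbf, eta=0.1):
        self.kernel, self.eta = kernel, eta
        self.sv, self.alpha = [], []  # support vectors and coefficients

    def score(self, x):
        return sum(a * self.kernel(s, x) for s, a in zip(self.sv, self.alpha))

    def update(self, x, y):
        """One online step with label y in {-1, +1}."""
        if y * self.score(x) > 0:
            return  # correct prediction: no update
        if not self.sv:
            self.sv, self.alpha = [x], [float(y)]
            return
        K = np.array([[self.kernel(s, t) for t in self.sv] for s in self.sv])
        k_x = np.array([self.kernel(s, x) for s in self.sv])
        d = np.linalg.solve(K + 1e-8 * np.eye(len(self.sv)), k_x)
        err_sq = self.kernel(x, x) - k_x @ d  # squared projection error
        if err_sq < self.eta ** 2:
            # Absorb the mistake into the existing coefficients (projection).
            self.alpha = [a + y * di for a, di in zip(self.alpha, d)]
        else:
            self.sv.append(x)
            self.alpha.append(float(y))
```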

ICRA 2009 Conference Paper

Model adaptation with least-squares SVM for adaptive hand prosthetics

  • Francesco Orabona
  • Claudio Castellini
  • Barbara Caputo
  • Angelo Emanuele Fiorilla
  • Giulio Sandini

The state-of-the-art in control of hand prosthetics is far from optimal. The main control interface is represented by surface electromyography (EMG): the activation potentials of the remnants of large muscles of the stump are used in a non-natural way to control one or, at best, two degrees-of-freedom. This has two drawbacks: first, the dexterity of the prosthesis is limited, leading to poor interaction with the environment; second, the patient undergoes a long training time. As more dexterous hand prostheses are put on the market, the need for a finer and more natural control arises. Machine learning can be employed to this end. A desired feature is that of providing a pre-trained model to the patient, so that a quicker and better interaction can be obtained. To this end we propose model adaptation with least-squares SVMs, a technique that allows the automatic tuning of the degree of adaptation. We test the effectiveness of the approach on a database of EMG signals gathered from human subjects. We show that, when pre-trained models are used, the number of training samples needed to reach a certain performance is reduced, and the overall performance is increased, compared to what would be achieved by starting from scratch.
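
Least-squares SVMs reduce training to a linear system, which makes the adaptation idea easy to sketch: regularize the new user's solution toward a pre-trained model instead of toward zero, with the degree of adaptation as an explicit knob. The linear features and parameter names below are assumptions (the paper tunes the adaptation degree automatically):

```python
import numpy as np

def adapted_ls_svm(X, y, w_prior, beta=1.0, C=10.0):
    """Linear least-squares SVM regularized toward a pre-trained model.

    Minimizes 0.5*||w - beta*w_prior||^2 + 0.5*C*sum((y_i - w.x_i)^2),
    so beta = 0 recovers plain ridge regression (training from scratch)
    and larger beta leans harder on the prior model.
    """
    d = X.shape[1]
    A = np.eye(d) + C * (X.T @ X)
    rhs = beta * w_prior + C * (X.T @ y)
    return np.linalg.solve(A, rhs)

# Toy EMG-like setup: a prior model from other subjects, few new samples.
rng = np.random.default_rng(1)
w_prior = rng.normal(size=8)
X_new = rng.normal(size=(15, 8))  # few samples from the new subject
y_new = np.sign(X_new @ (w_prior + 0.3 * rng.normal(size=8)))
w = adapted_ls_svm(X_new, y_new, w_prior, beta=1.0)
print(np.mean(np.sign(X_new @ w) == y_new))  # training accuracy
```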

NeurIPS 2009 Conference Paper

Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

  • Jie Luo
  • Barbara Caputo
  • Vittorio Ferrari

Given a corpus of news items consisting of images accompanied by text captions, we want to find out 'who's doing what', i.e., associate names and action verbs in the captions with the faces and body poses of the persons in the images. We present a joint model for simultaneously solving the image-caption correspondences and learning visual appearance models for the face and pose classes occurring in the corpus. These models can then be used to recognize people and actions in novel images without captions. We demonstrate experimentally that our joint face and pose model solves the correspondence problem better than earlier models covering only the face, and that it can perform recognition of new uncaptioned images.

IROS 2009 Conference Paper

You live, you learn, you forget: Continuous learning of visual places with a forgetting mechanism

  • Muhammad Muneeb Ullah
  • Francesco Orabona
  • Barbara Caputo

To fulfill the dream of having autonomous robots at home, there is a need for spatial representations augmented with semantic concepts. Vision has emerged recently as the key modality to recognize semantic categories like places (office, corridor, kitchen, etc.). A crucial aspect of these semantic place representations is that they change over time, due to the dynamism of the world. This calls for visual algorithms able to learn from experience while at the same time managing the continuous flow of incoming data. This paper addresses these issues by presenting an SVM-based algorithm able to (a) learn continuously from experience with a fast updating rule, and (b) control the memory growth via a random forgetting mechanism while at the same time preserving an accuracy comparable to that of the batch algorithm. We apply our method to two different scenarios where learning from experience plays an important role: (1) continuous learning of visual places under dynamic changes, and (2) knowledge transfer of visual concepts across robot platforms. For both scenarios, results confirm the effectiveness of our approach.
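
The random forgetting mechanism is simple to illustrate: when the support set exceeds a memory budget, drop a randomly chosen support vector rather than, say, the oldest one. The budget value and the generic support-set interface below are illustrative stand-ins for the paper's SVM-based algorithm:

```python
import random

class ForgettingBudget:
    """Cap a support set at `budget` entries by randomly forgetting one
    whenever a new support vector would overflow the memory."""
    def __init__(self, budget: int, seed: int = 0):
        self.budget = budget
        self.rng = random.Random(seed)

    def admit(self, sv: list, alpha: list, x, a: float):
        if len(sv) >= self.budget:
            i = self.rng.randrange(len(sv))  # uniform random forgetting
            sv.pop(i)
            alpha.pop(i)
        sv.append(x)
        alpha.append(a)

# Usage: keep at most 100 support vectors while learning online.
mem = ForgettingBudget(budget=100)
sv, alpha = [], []
for t in range(500):
    mem.admit(sv, alpha, x=t, a=1.0)
print(len(sv))  # 100
```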

ICRA 2008 Conference Paper

SVM-based discriminative accumulation scheme for place recognition

  • Andrzej Pronobis
  • Óscar Martínez Mozos
  • Barbara Caputo

Integrating information coming from different sensors is a fundamental capability for autonomous robots. For complex tasks like topological localization, it would be desirable to use multiple cues, possibly from different modalities, so as to achieve robust performance. This paper proposes a new method for integrating multiple cues. For each cue we train a large margin classifier which outputs a set of scores indicating the confidence of the decision. These scores are then used as input to a Support Vector Machine, which learns during training how to optimally weight each cue for each class. We call this algorithm the SVM-based Discriminative Accumulation Scheme (SVM-DAS). We applied our method to the topological localization task, using vision- and laser-based cues. Experimental results clearly show the value of our approach.
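
SVM-DAS is, in effect, a stacking scheme, which a few lines can capture: one large-margin classifier per cue produces per-class confidence scores, and a second SVM learns how to weight them. The sklearn building blocks and cue layout below are illustrative, not the paper's implementation:

```python
import numpy as np
from sklearn.svm import LinearSVC

def _cue_scores(models, cues):
    """Stack per-class confidence scores from every cue classifier."""
    return np.hstack([m.decision_function(X).reshape(len(X), -1)
                      for m, X in zip(models, cues)])

def train_svm_das(cues_train, y_train):
    """cues_train: list of (n, d_cue) feature arrays, one per cue
    (e.g. a vision descriptor and a laser-range descriptor)."""
    cue_models = [LinearSVC().fit(X, y_train) for X in cues_train]
    meta = LinearSVC().fit(_cue_scores(cue_models, cues_train), y_train)
    return cue_models, meta  # meta holds the per-cue, per-class weights

def predict_svm_das(cue_models, meta, cues_test):
    return meta.predict(_cue_scores(cue_models, cues_test))
```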

ICML 2008 Conference Paper

The projectron: a bounded kernel-based Perceptron

  • Francesco Orabona
  • Joseph Keshet
  • Barbara Caputo

We present a discriminative online algorithm with bounded memory growth, which is based on the kernel-based Perceptron. Generally, the memory the kernel-based Perceptron requires to store the online hypothesis is not bounded. Previous work has focused on discarding part of the instances in order to keep the memory bounded. In the proposed algorithm the instances are not discarded, but projected onto the space spanned by the previous online hypothesis. We derive a relative mistake bound and compare our algorithm both analytically and empirically to the state-of-the-art Forgetron algorithm (Dekel et al., 2007). The first variant of our algorithm, called Projectron, outperforms the Forgetron. The second variant, called Projectron++, outperforms even the Perceptron.

ICRA 2008 Conference Paper

Towards robust place recognition for robot localization

  • Muhammad Muneeb Ullah
  • Andrzej Pronobis
  • Barbara Caputo
  • Jie Luo
  • Patric Jensfelt
  • Henrik I. Christensen

Localization and context interpretation are two key competences for mobile robot systems. Visual place recognition, as opposed to purely geometrical models, holds promise of higher flexibility and association of semantics to the model. Ideally, a place recognition algorithm should be robust to dynamic changes and it should perform consistently when recognizing a room (for instance a corridor) in different geographical locations. Also, it should be able to categorize places, a crucial capability for transfer of knowledge and continuous learning. In order to test the suitability of visual recognition algorithms for these tasks, this paper presents a new database, acquired in three different labs across Europe. It contains image sequences of several rooms under dynamic changes, acquired at the same time with a perspective and omnidirectional camera, mounted on a socket. We assess this new database with an appearance-based algorithm that combines local features with support vector machines through an ad-hoc kernel. Results show the effectiveness of the approach and the value of the database.

IROS 2007 Conference Paper

Confidence-based cue integration for visual place recognition

  • Andrzej Pronobis
  • Barbara Caputo

A distinctive feature of intelligent systems is their capability to analyze their level of expertise for a given task; in other words, they know what they know. As a step towards this ambitious goal, this paper presents a recognition algorithm able to measure its own level of confidence and, in case of uncertainty, to seek extra information so as to increase its own knowledge and ultimately achieve better performance. We focus on the visual place recognition problem for topological localization, and we take an SVM approach. We propose a new method for measuring the confidence level of the classification output, based on the distance of a test image relative to the average distance of the training vectors. This method is combined with a discriminative accumulation scheme for cue integration. We show with extensive experiments that the resulting algorithm achieves better performance with two visual cues than the classic single-cue SVM on the same task, while minimising the computational load. More importantly, our method provides a reliable measure of the level of confidence of the decision.
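
One plausible reading of this confidence measure, with the acceptance threshold and the binary setup as simplifying assumptions, is to normalize a test image's margin by the average margin of the training vectors and flag low values so the system can seek an extra cue:

```python
import numpy as np
from sklearn.svm import SVC

class ConfidenceSVM:
    """Binary SVM that reports a confidence score per test image:
    its margin normalized by the average margin of the training set."""
    def __init__(self, threshold: float = 0.5, **svm_kw):
        self.svm = SVC(**svm_kw)
        self.threshold = threshold  # below this, ask for another cue

    def fit(self, X, y):
        self.svm.fit(X, y)
        # Average absolute distance of training vectors from the boundary.
        self.avg_train_margin = np.abs(self.svm.decision_function(X)).mean()
        return self

    def predict_with_confidence(self, X):
        margin = self.svm.decision_function(X)
        conf = np.abs(margin) / self.avg_train_margin  # relative margin
        return self.svm.predict(X), conf, conf >= self.threshold
```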

IROS 2007 Conference Paper

Incremental learning for place recognition in dynamic environments

  • Jie Luo
  • Andrzej Pronobis
  • Barbara Caputo
  • Patric Jensfelt

Vision-based place recognition is a desirable feature for an autonomous mobile system. In order to work in realistic scenarios, visual recognition algorithms should be adaptive, i.e., able to learn from experience and adapt continuously to changes in the environment. This paper presents a discriminative incremental learning approach to place recognition. We use a recently introduced version of the incremental SVM, which allows controlling the memory requirements as the system updates its internal representation. At the same time, it preserves the recognition performance of the batch algorithm. In order to assess the method, we acquired a database capturing the intrinsic variability of places over time. Extensive experiments show the power and the potential of the approach.

IROS 2006 Conference Paper

A Discriminative Approach to Robust Visual Place Recognition

  • Andrzej Pronobis
  • Barbara Caputo
  • Patric Jensfelt
  • Henrik I. Christensen

An important competence for a mobile robot system is the ability to localize and perform context interpretation. This is required to perform basic navigation and to facilitate local specific services. Usually localization is performed based on a purely geometric model. Through the use of vision and place recognition, a number of opportunities open up in terms of flexibility and association of semantics to the model. To achieve this, this paper presents an appearance-based method for place recognition. The method is based on a large margin classifier in combination with a rich global image descriptor. The method is robust to variations in illumination and minor scene changes. The method is evaluated across several different cameras, changes in time-of-day and weather conditions. The results clearly demonstrate the value of the approach.
