
IROS 2024

Multi-Modal Representation Learning with Tactile Data

Conference Paper · Accepted · Artificial Intelligence · Robotics

Abstract

Advances in embodied language models such as PaLM-E and RT-2 have significantly improved language-conditioned robotic manipulation. However, these advances remain focused predominantly on vision and language, overlooking the pivotal role of tactile feedback in contact-rich interactions. Our research introduces a novel approach that synergizes tactile information with vision and language. We present the Multi-Modal Wand (MMWand) dataset, enriched with linguistic descriptions and tactile data. By integrating tactile feedback, we aim to bridge the divide between human linguistic understanding and robotic sensory interpretation. Our multi-modal representation model is trained on this dataset using the multi-modal embedding-alignment principle of ImageBind, highlighting the potential of tactile data in robotic applications. We validate our approach on downstream robotics tasks, including texture-based object classification, cross-modal retrieval, and dense reward functions for visuomotor control. Our contributions underscore the importance of tactile feedback in multi-modal robotic learning and its potential to enhance robotic tasks. The MMWand dataset is publicly available at https://hyung-gun.me/mmwand/.
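The ImageBind-style embedding alignment referenced in the abstract is typically realized as a symmetric contrastive (InfoNCE) objective: embeddings from two modalities (here, tactile and vision) are pulled together for matched pairs and pushed apart otherwise. Below is a minimal illustrative sketch in PyTorch; the function name, temperature value, and toy tensors are our assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F


def infonce_alignment_loss(tactile_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE loss aligning tactile and image embeddings
    in a shared latent space (ImageBind-style paired contrast)."""
    # L2-normalize so the dot product is cosine similarity.
    t = F.normalize(tactile_emb, dim=-1)
    v = F.normalize(image_emb, dim=-1)
    logits = t @ v.T / temperature           # (B, B) pairwise similarities
    targets = torch.arange(logits.size(0))   # matched pairs lie on the diagonal
    # Contrast in both directions: tactile->image and image->tactile.
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.T, targets)) / 2


# Toy usage: random "embeddings" for a batch of 4 paired samples.
torch.manual_seed(0)
tactile = torch.randn(4, 32)
image = torch.randn(4, 32)
loss = infonce_alignment_loss(tactile, image)
```

In practice one modality's encoder (often the image encoder) is kept frozen and the tactile encoder is trained to project into its latent space, which is what enables cross-modal retrieval downstream.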

Authors

Keywords

  • Representation learning
  • Bridges
  • Tactile sensors
  • Linguistics
  • Data models
  • Robots
  • Intelligent robots
  • Multimodal Learning
  • Multimodal Representation
  • Tactile Data
  • Multimodal Representation Learning
  • Object Classification
  • Language Model
  • Robotic Applications
  • Robot Manipulator
  • Robotic Tasks
  • Tactile Information
  • Linguistic Description
  • Sensory Modalities
  • Latent Space
  • Inertial Measurement Unit
  • Robotic Arm
  • Elastography
  • Tactile Sensor
  • Description Language
  • Image Encoder
  • Image Annotation
  • Tactile Input
  • Text Modality
  • Perception Of The Robot
  • Texture Of Objects
  • Text Encoder
  • Tactile Interaction

Context

Venue
IEEE/RSJ International Conference on Intelligent Robots and Systems
Archive span
1988-2025
Indexed papers
26578
Paper id
128173416476748673