
IROS 2024

Multi-Modal Representation Learning with Tactile Data

Conference Paper · Accepted · Artificial Intelligence · Robotics

Abstract

Advances in embodied language models such as PaLM-E and RT-2 have significantly improved language-conditioned robotic manipulation. However, these advances remain focused predominantly on vision and language, overlooking the pivotal role of tactile feedback in contact-rich interactions. Our research introduces a novel approach that synergizes tactile information with vision and language. We present the Multi-Modal Wand (MMWand) dataset, enriched with linguistic descriptions and tactile data. By integrating tactile feedback, we aim to bridge the divide between human linguistic understanding and robotic sensory interpretation. Our multi-modal representation model is trained on this dataset using the multi-modal embedding-alignment principle of ImageBind, highlighting the potential of tactile data in robotic applications. We validate our approach on downstream robotics tasks, including texture-based object classification, cross-modal retrieval, and dense reward functions for visuomotor control. Our contributions underscore the importance of tactile feedback in multi-modal robotic learning and its potential to enhance robotic tasks. The MMWand dataset is publicly available at https://hyung-gun.me/mmwand/.
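The ImageBind-style embedding alignment referenced in the abstract is typically realized as a symmetric contrastive (InfoNCE) objective: embeddings from two modalities (here, tactile and vision) are pulled together for matched pairs and pushed apart otherwise. Below is a minimal illustrative sketch in PyTorch; the function name, temperature value, and toy tensors are our assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F


def infonce_alignment_loss(tactile_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE loss aligning tactile and image embeddings
    in a shared latent space (ImageBind-style paired contrast)."""
    # L2-normalize so the dot product is cosine similarity.
    t = F.normalize(tactile_emb, dim=-1)
    v = F.normalize(image_emb, dim=-1)
    logits = t @ v.T / temperature           # (B, B) pairwise similarities
    targets = torch.arange(logits.size(0))   # matched pairs lie on the diagonal
    # Contrast in both directions: tactile->image and image->tactile.
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.T, targets)) / 2


# Toy usage: random "embeddings" for a batch of 4 paired samples.
torch.manual_seed(0)
tactile = torch.randn(4, 32)
image = torch.randn(4, 32)
loss = infonce_alignment_loss(tactile, image)
```

In practice one modality's encoder (often the image encoder) is kept frozen and the tactile encoder is trained to project into its latent space, which is what enables cross-modal retrieval downstream.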

Authors

Keywords

  • Representation learning
  • Bridges
  • Tactile sensors
  • Linguistics
  • Data models
  • Robots
  • Intelligent robots
  • Multimodal Learning
  • Multimodal Representation
  • Tactile Data
  • Multimodal Representation Learning
  • Object Classification
  • Language Model
  • Robotic Applications
  • Robot Manipulator
  • Robotic Tasks
  • Tactile Information
  • Linguistic Description
  • Sensory Modalities
  • Latent Space
  • Inertial Measurement Unit
  • Robotic Arm
  • Elastography
  • Tactile Sensor
  • Description Language
  • Image Encoder
  • Image Annotation
  • Tactile Input
  • Text Modality
  • Perception Of The Robot
  • Texture Of Objects
  • Text Encoder
  • Tactile Interaction

Context

Venue
IEEE/RSJ International Conference on Intelligent Robots and Systems
Archive span
1988-2025
Indexed papers
26578
Paper id
128173416476748673