DAP: Diffusion-based Affordance Prediction for Multi-modality Storage

Haonan Chang; Kowndinya Boyalakuntla; Yuhan Liu; Xinyu Zhang; Liam Schramm; Abdeslam Boularias

IROS 2024

Conference Paper · Accepted Paper · Artificial Intelligence · Robotics

Details

Abstract

Storage problems, in which objects must be placed into containers with precise orientations and positions, present a distinct challenge that extends beyond traditional rearrangement tasks. The difficulty stems primarily from the need for fine-grained 6D manipulation and from the inherent multi-modality of the solution space: multiple viable goal configurations exist for the same storage container. We present a novel Diffusion-based Affordance Prediction (DAP) pipeline for the multi-modal object storage problem. DAP takes a two-step approach: it first identifies a placeable region on the container and then precisely computes the relative pose between the object and that region. Existing methods either struggle with multi-modality or require computation-intensive training. Our experiments on the RPDiff benchmark demonstrate DAP’s superior performance and training efficiency over the current state-of-the-art method, RPDiff. Additionally, our experiments showcase DAP’s data efficiency in real-world applications, an advancement over existing simulation-driven approaches. Our contribution fills a gap in robotic manipulation research by offering a solution that is both computationally efficient and capable of handling real-world variability. Code and supplementary material can be found at: https://github.com/changhaonan/DPS.git.
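
The two-step structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every function name below (select_region, toy_denoiser, refine_translation) is hypothetical, and the learned components are replaced by stand-ins. The region is chosen uniformly at random, and the "denoiser" simply points at the region centroid. Only a 3D translation is refined, whereas the paper addresses full 6D relative poses.

```python
import numpy as np


def select_region(container_points, region_labels, rng):
    """Step 1 (stand-in): pick one candidate region at random.

    A learned affordance model would score regions; here every labelled
    region counts as equally placeable, which mimics a multi-modal goal.
    """
    candidates = np.unique(region_labels)
    chosen = rng.choice(candidates)
    return container_points[region_labels == chosen]


def toy_denoiser(translation, region_points):
    """Hypothetical stand-in for a learned network.

    Returns the offset from the current estimate to the region centroid.
    """
    return region_points.mean(axis=0) - translation


def refine_translation(region_points, rng, steps=50, step_size=0.2):
    """Step 2 (stand-in): iterative refinement starting from Gaussian noise.

    Each step moves along the predicted correction and adds noise whose
    scale shrinks to zero by the final step.
    """
    t = rng.normal(size=3)
    for k in range(steps):
        noise_scale = 0.05 * (1.0 - (k + 1) / steps)
        t = t + step_size * toy_denoiser(t, region_points)
        t = t + noise_scale * rng.normal(size=3)
    return t


# Toy container with two equally valid slots (two modes).
rng = np.random.default_rng(0)
slot_a = rng.normal(loc=[0.0, 0.0, 0.0], scale=0.01, size=(100, 3))
slot_b = rng.normal(loc=[0.5, 0.0, 0.0], scale=0.01, size=(100, 3))
container_points = np.vstack([slot_a, slot_b])
region_labels = np.repeat([0, 1], 100)

region = select_region(container_points, region_labels, rng)
placement = refine_translation(region, rng)
```

Committing to one region before estimating the pose means the refinement step faces a single mode at a time, which is the intuition behind separating the two steps in this sketch.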

Keywords

Training
Codes
Affordances
Pipelines
Pose estimation
Containers
Prediction algorithms
Diffusion models
Computational efficiency
Intelligent robots
Training Efficiency
Relative Pose
Prediction Pipeline
Storage Problems
Multimodal Problems
Denoising
Bimodal
Point Cloud
Bounding Box
Diffusion Model
Days Of Training
Coordinates Of Points
Semantic Segmentation
Target Object
Labeled Data
Robotic Arm
Instance Segmentation
Dishwasher
Transformer Architecture
Conditional Variational Autoencoder
Point Cloud Registration
Neural Field
Pose Prediction
Point Cloud Segmentation
Gaussian Noise
Training Objective
Attention Mechanism

Context

Venue: IEEE/RSJ International Conference on Intelligent Robots and Systems
Archive span: 1988-2025
Indexed papers: 26578
Paper id: 1135221680222286823