From Discriminative to Generative: A Diffusion-Based Paradigm for Multi-Agent Collaborative Perception

Kexin Gong; Puyi Yao; Guiyang Luo; Quan Yuan; Tiange Fu; Hui Zhang; Jinglin Li

doi:10.1609/aaai.v40i6.42423

AAAI 2026

Conference Paper AAAI Technical Track on Computer Vision III Artificial Intelligence

Abstract

Collaborative perception leveraging intermediate feature fusion has emerged as a leading paradigm to significantly enhance the environmental perception capabilities of autonomous driving systems. However, existing methods typically rely on discriminative supervision guided by downstream tasks. This paradigm compels models to learn minimal, task-specific representations, which conflicts with the goal of cooperative perception to capture comprehensive information, thereby limiting generalization. To address this issue, we propose DiGS-CP, a novel two-stage generative supervised collaborative perception framework. Specifically, we introduce a diffusion-based generative task that conditions on fused object-level features to generate representations of object-level point clouds. The proposed generative supervision provides fine-grained, task-agnostic signals that encourage the fusion module to learn comprehensive representations beyond task-specific requirements. By preserving and integrating complementary information from collaborative agents, our approach overcomes the limitations of task-specific learning and enhances the generalizability of the learned features. Furthermore, our two-stage architecture requires agents to transmit only object-level features, significantly reducing communication overhead. Extensive experiments on three benchmark datasets demonstrate that DiGS-CP achieves state-of-the-art performance in 3D object detection, while maintaining low bandwidth requirements and exhibiting excellent generalization ability.
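
To make the idea of diffusion-based generative supervision concrete, the sketch below shows a generic conditional noise-prediction loss of the kind commonly used to train diffusion models. It is not the DiGS-CP implementation, which the abstract does not specify. The linear noise schedule, the single-layer `toy_denoiser`, and all names and dimensions are assumptions chosen for illustration; random vectors stand in for the fused object-level feature (`cond`) and for the target object-level point-cloud representation (`x0`).

```python
import numpy as np

def make_alpha_bar(num_steps=100, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Forward diffusion: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t]
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return x_t, eps

def toy_denoiser(x_t, t_frac, cond, params):
    """Hypothetical stand-in for a learned noise predictor.

    A single linear layer over the concatenation [x_t, t_frac, cond];
    a real model would be a deep network.
    """
    inp = np.concatenate([x_t, [t_frac], cond])
    return params["W"] @ inp + params["b"]

def generative_loss(x0, cond, params, alpha_bar, rng):
    """Noise-prediction MSE, conditioned on `cond`, at a random timestep."""
    t = rng.integers(0, len(alpha_bar))
    x_t, eps = add_noise(x0, t, alpha_bar, rng)
    eps_hat = toy_denoiser(x_t, t / len(alpha_bar), cond, params)
    return float(np.mean((eps_hat - eps) ** 2))

rng = np.random.default_rng(0)
target_dim, cond_dim = 8, 16            # illustrative sizes only
x0 = rng.standard_normal(target_dim)    # placeholder: object-level point-cloud representation
cond = rng.standard_normal(cond_dim)    # placeholder: fused object-level feature
params = {
    "W": 0.01 * rng.standard_normal((target_dim, target_dim + 1 + cond_dim)),
    "b": np.zeros(target_dim),
}
alpha_bar = make_alpha_bar()
loss = generative_loss(x0, cond, params, alpha_bar, rng)
print(f"conditional denoising loss: {loss:.4f}")
```

In a setup like this, the gradient of the loss with respect to `cond` would flow back into whatever module produced the fused feature, which is one plausible way a generative objective could push a fusion module toward retaining information that a detection loss alone does not require.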
