
ICRA 2025

E2B: A Single Modality Point-Based Tracker with Event Cameras

Conference Paper · Accepted Paper · Artificial Intelligence · Robotics

Abstract

High-speed object tracking is highly relevant across robotic domains such as drones and autonomous driving. Compared to conventional cameras, event cameras capture object motion at exceptionally high temporal resolution with relatively low power consumption, and they are immune to motion blur. Unfortunately, many existing methods adopt a frame-based approach that stacks events into Event Frames, which discards the sparsity and high temporal resolution of events. Such methods also rely on huge pre-trained backbones and plateau in performance while demanding unrealistically large networks and high power consumption, rendering them impractical for real-time use in battery-constrained robotic scenarios. In this paper, we propose an efficient and effective single-modality tracker based on a Point Cloud representation, named E2B (Event to Box). By directly handling the raw output of event cameras without data-format transformation, E2B leverages the events' coordinate guidance to accurately map Event Cloud features to 2D bounding boxes. Moreover, E2B incorporates a pyramid structure into its multi-stage feature extraction architecture to track objects effectively across diverse scales. In experiments, E2B performs outstandingly on two large-scale and one synthetic event-based tracking datasets, covering both indoor and outdoor environments, as well as rigid and non-rigid objects.
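The abstract's key idea is treating the raw event stream as a point cloud rather than stacking events into frames. As a rough illustration (not the authors' actual preprocessing, whose details are not given here), each event `(x, y, t, p)` can be mapped to a 3D point by normalizing pixel coordinates and the timestamp into a common range, keeping polarity as a per-point feature; the function name `events_to_cloud` and the sampling scheme below are assumptions for this sketch:

```python
import numpy as np

def events_to_cloud(events, width, height, max_points=1024):
    """Convert raw events (x, y, t, p) into a normalized 3D point cloud.

    Each event becomes a point (x, y, t) with all three coordinates
    scaled into [0, 1]; polarity is returned as a per-point feature.
    If more than `max_points` events fall in the window, a random
    subset is kept, preserving the sparsity of the representation.
    """
    ev = np.asarray(events, dtype=np.float64)  # shape (N, 4): x, y, t, p
    if len(ev) > max_points:
        idx = np.random.choice(len(ev), max_points, replace=False)
        ev = ev[idx]
    xyz = np.empty((len(ev), 3))
    xyz[:, 0] = ev[:, 0] / (width - 1)    # x normalized to [0, 1]
    xyz[:, 1] = ev[:, 1] / (height - 1)   # y normalized to [0, 1]
    t = ev[:, 2]
    span = t.max() - t.min()
    xyz[:, 2] = (t - t.min()) / span if span > 0 else 0.0  # t in [0, 1]
    return xyz, ev[:, 3]  # point coordinates and polarity features
```

A point-based backbone (e.g. a PointNet-style network) can then consume this cloud directly, avoiding the dense Event Frame conversion the abstract criticizes.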

Keywords

  • Point cloud compression
  • Power demand
  • Robot kinematics
  • Robot vision systems
  • Stacking
  • Cameras
  • Rendering (computer graphics)
  • Feature extraction
  • Real-time systems
  • Object tracking
  • Dynamic Vision Sensor
  • Point Cloud
  • Low Power Consumption
  • Diverse Scales
  • Cloud Features
  • Tracking Dataset
  • Event Frames
  • Point Cloud Representation
  • Convolutional Neural Network
  • K-nearest Neighbor
  • Pedestrian
  • Intersection Over Union
  • 3D Space
  • Multilayer Perceptron
  • Feature Points
  • Weight Coefficient
  • Temporal Domain
  • Feature Extraction Network
  • Fixed Time Interval
  • Search Region
  • Multi-stage Structure
  • Traditional Cameras
  • Feature Extraction Block
  • Point Cloud Features
  • Template Feature
  • Prediction Box
  • Template Region
  • Structural Hierarchy
  • Global Feature Extraction

Context

Venue
IEEE International Conference on Robotics and Automation
Archive span
1984-2025
Indexed papers
30179
Paper id
129679758423960663