Exploiting Spatial Invariance for Scalable Unsupervised Object Tracking

Eric Crawford; Joelle Pineau

AAAI 2020

Conference paper, AAAI Technical Track: Machine Learning (Artificial Intelligence)

Abstract

The ability to detect and track objects in the visual world is a crucial skill for any intelligent agent, as it is a necessary precursor to any object-level reasoning process. Moreover, it is important that agents learn to track objects without supervision (i.e., without access to annotated training videos), since this will allow agents to begin operating in new environments with minimal human assistance. The task of learning to discover and track objects in videos, which we call unsupervised object tracking, has grown in prominence in recent years; however, most architectures that address it still struggle to deal with large scenes containing many objects. In the current work, we propose an architecture that scales well to the large-scene, many-object setting by employing spatially invariant computations (convolutions and spatial attention) and representations (a spatially local object specification scheme). In a series of experiments, we demonstrate a number of attractive features of our architecture; most notably, that it outperforms competing methods at tracking objects in cluttered scenes with many objects, and that it generalizes well to videos that are larger and/or contain more objects than those encountered during training.
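The abstract attributes the architecture's scalability to spatially invariant computations and representations. The sketch below is not the authors' code; it only illustrates the general property with a plain-Python 2D convolution. One shared 3×3 kernel (9 parameters) runs unchanged on a 6×6 and a 12×12 frame, and shifting an object shifts the response by the same amount. The helper names (`conv2d_valid`, `make_frame`, `argmax2d`, `to_local`) are hypothetical, and `to_local` is only an assumed example of what a spatially local object specification could look like: a position stored as a grid cell plus an offset within that cell.

```python
from typing import List, Tuple

Grid = List[List[float]]


def conv2d_valid(image: Grid, kernel: Grid) -> Grid:
    """Slide one shared kernel over every position ('valid' padding, no flipping)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    out: Grid = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += kernel[di][dj] * image[i + di][j + dj]
            row.append(acc)
        out.append(row)
    return out


def make_frame(h: int, w: int, obj: Tuple[int, int]) -> Grid:
    """Hypothetical toy frame: all zeros except a single bright 'object' pixel."""
    frame = [[0.0] * w for _ in range(h)]
    frame[obj[0]][obj[1]] = 1.0
    return frame


def argmax2d(grid: Grid) -> Tuple[int, int]:
    """Return (row, col) of the strongest response."""
    _, i, j = max((v, i, j) for i, r in enumerate(grid) for j, v in enumerate(r))
    return i, j


def to_local(y: float, x: float, cell: float) -> Tuple[int, int, float, float]:
    """Assumed local scheme: global (y, x) -> (cell row, cell col, offsets in [0, 1))."""
    row, col = int(y // cell), int(x // cell)
    return row, col, (y - row * cell) / cell, (x - col * cell) / cell


# The same 9 weights are reused at every position and for every frame size.
kernel = [[0.1, 0.1, 0.1],
          [0.1, 1.0, 0.1],
          [0.1, 0.1, 0.1]]

small = conv2d_valid(make_frame(6, 6, (2, 2)), kernel)     # 4x4 response map
shifted = conv2d_valid(make_frame(6, 6, (3, 3)), kernel)   # object moved by (1, 1)
large = conv2d_valid(make_frame(12, 12, (7, 4)), kernel)   # 10x10 response map

print(len(small), len(small[0]), argmax2d(small))        # 4 4 (1, 1)
print(argmax2d(shifted))                                  # (2, 2)
print(len(large), len(large[0]), argmax2d(large))        # 10 10 (6, 3)
print(to_local(10.0, 37.0, 16.0))                         # (0, 2, 0.625, 0.3125)
```

Because the kernel never depends on the frame size, nothing has to be retrained to process the larger frame. Likewise, under the assumed local scheme an offset of 0.625 means the same thing in every cell, so a per-cell detector does not need to know how many cells the frame contains.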
