Differentiable Hierarchical Visual Tokenization

Marius Aasan; Martine Hjelkrem Tan; Nico Catalano; Changkyu Choi; Adín Ramírez Rivera

NeurIPS 2025

Conference Paper · Main Conference Track · Artificial Intelligence · Machine Learning

Abstract

Vision Transformers rely on fixed patch tokens that ignore the spatial and semantic structure of images. In this work, we introduce an end-to-end differentiable tokenizer that adapts to image content with pixel-level granularity while remaining backward-compatible with existing architectures for retrofitting pretrained models. Our method uses hierarchical model selection with information criteria to provide competitive performance in both image-level classification and dense-prediction tasks, and even supports out-of-the-box raster-to-vector conversion.
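
The abstract does not spell out how the hierarchical model selection works, so the following is only a toy illustration of the general idea of choosing a partition granularity with an information criterion. It is not the authors' tokenizer and it is not differentiable. It assumes a piecewise-constant model over nested square blocks, Gaussian noise with a known standard deviation `sigma`, and the Bayesian information criterion (BIC); the function names `bic_for_block_size` and `select_block_size` are hypothetical.

```python
import numpy as np


def bic_for_block_size(image, block, sigma):
    """BIC (up to an additive constant) of a block-constant model.

    Assumption: each block x block tile is modelled by its mean, with
    Gaussian noise of known standard deviation `sigma`.
    """
    h, w = image.shape
    assert h % block == 0 and w % block == 0, "block must divide image size"
    # View the image as a grid of tiles: (rows, block, cols, block).
    tiles = image.reshape(h // block, block, w // block, block)
    means = tiles.mean(axis=(1, 3), keepdims=True)
    rss = float(((tiles - means) ** 2).sum())  # residual sum of squares
    n = h * w                                  # number of observations
    k = (h // block) * (w // block)            # one mean parameter per tile
    # Data-fit term plus complexity penalty.
    return rss / sigma**2 + k * np.log(n)


def select_block_size(image, sigma, candidates=(1, 2, 4, 8, 16)):
    """Return the candidate block size with the lowest BIC, plus all scores."""
    scores = {b: bic_for_block_size(image, b, sigma) for b in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic 16x16 image built from 4x4 constant tiles plus mild noise.
rng = np.random.default_rng(0)
clean = np.kron(np.arange(16).reshape(4, 4) / 16.0, np.ones((4, 4)))
noisy = clean + rng.normal(scale=0.01, size=clean.shape)

best, scores = select_block_size(noisy, sigma=0.01)
print("selected block size:", best)
```

In this sketch, finer partitions always fit the pixels at least as well, but each extra region adds a `log(n)` penalty, so the criterion favors the coarsest level that still explains the data. A fixed patch grid corresponds to committing to a single block size regardless of image content.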

Context

Venue: Annual Conference on Neural Information Processing Systems
Paper id: 182031199987859285