Arrow Research search

Author name cluster

Torsten Sattler

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers
2 author rows

Possible papers (16)

NeurIPS 2025 Conference Paper

LODGE: Level-of-Detail Large-Scale Gaussian Splatting with Efficient Rendering

  • Jonas Kulhanek
  • Marie-Julie Rakotosaona
  • Fabian Manhardt
  • Christina Tsalicoglou
  • Michael Niemeyer
  • Torsten Sattler
  • Songyou Peng
  • Federico Tombari

In this work, we present a novel level-of-detail (LOD) method for 3D Gaussian Splatting that enables real-time rendering of large-scale scenes on memory-constrained devices. Our approach introduces a hierarchical LOD representation that iteratively selects optimal subsets of Gaussians based on camera distance, substantially reducing both rendering time and GPU memory usage. We construct each LOD level by applying a depth-aware 3D smoothing filter, followed by importance-based pruning and fine-tuning to maintain visual fidelity. To further reduce memory overhead, we partition the scene into spatial chunks and dynamically load only relevant Gaussians during rendering, employing an opacity-blending mechanism to avoid visual artifacts at chunk boundaries. Our method achieves state-of-the-art performance on both outdoor (Hierarchical 3DGS) and indoor (Zip-NeRF) datasets, delivering high-quality renderings with reduced latency and memory requirements.
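
To make the core idea concrete, a minimal sketch of distance-based level selection and boundary blending is shown below; the thresholds, margin, and function names are illustrative assumptions, not LODGE's actual implementation (which builds its hierarchy offline via smoothing, pruning, and fine-tuning).

```python
import numpy as np

def select_lod_level(centers, camera_pos, thresholds=(10.0, 30.0, 100.0)):
    """Assign each Gaussian an LOD level by camera distance:
    level 0 is the finest (closest); higher levels are coarser subsets."""
    dists = np.linalg.norm(centers - camera_pos, axis=1)
    return np.digitize(dists, thresholds)  # index of first threshold exceeded

def boundary_blend_weight(point, chunk_min, chunk_max, margin=1.0):
    """Linear opacity ramp near a chunk border, so Gaussians fade out
    instead of popping when a neighboring chunk is loaded or unloaded."""
    d = np.minimum(point - chunk_min, chunk_max - point)  # distance to faces
    return float(np.clip(d.min() / margin, 0.0, 1.0))
```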

NeurIPS 2025 Conference Paper

NerfBaselines: Consistent and Reproducible Evaluation of Novel View Synthesis Methods

  • Jonas Kulhanek
  • Torsten Sattler

Novel view synthesis is an important problem with many applications, including AR/VR, gaming, and robotic simulations. With the recent rapid development of Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (3DGS) methods, it is becoming difficult to keep track of the current state of the art (SoTA) due to methods using different evaluation protocols, codebases being difficult to install and use, and methods not generalizing well to novel 3D scenes. In our experiments, we show that even tiny differences in the evaluation protocols of various methods can artificially boost the performance of these methods. This raises questions about the validity of quantitative comparisons performed in the literature. To address these questions, we propose NerfBaselines, an evaluation framework which provides consistent benchmarking tools, ensures reproducibility, and simplifies the installation and use of various methods. We validate our implementation experimentally by reproducing the numbers reported in the original papers. For improved accessibility, we release a web platform that compares commonly used methods on standard benchmarks. We strongly believe NerfBaselines is a valuable contribution to the community as it ensures that quantitative results are comparable and thus truly measure progress in the field of novel view synthesis.
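
As a concrete illustration of how tiny protocol differences move numbers, the sketch below shows two aggregation schemes that are both reported as "PSNR" in the literature yet generally disagree; this is a hedged example, not NerfBaselines' API.

```python
import numpy as np

def psnr(pred, gt):
    """PSNR in dB for images with values in [0, 1]."""
    return float(-10.0 * np.log10(np.mean((pred - gt) ** 2)))

def mean_of_per_image_psnr(preds, gts):
    """Protocol A: average the per-image PSNRs."""
    return float(np.mean([psnr(p, g) for p, g in zip(preds, gts)]))

def psnr_of_pooled_mse(preds, gts):
    """Protocol B: one PSNR from the MSE pooled over all images.
    By Jensen's inequality A >= B, so the two protocols disagree."""
    mse = np.mean([np.mean((p - g) ** 2) for p, g in zip(preds, gts)])
    return float(-10.0 * np.log10(mse))
```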

IROS 2024 Conference Paper

Camera Pose Estimation from Bounding Boxes

  • Václav Vávra
  • Torsten Sattler
  • Zuzana Kukelova

Visual localization is an important part of many interesting applications, including robotics. The dominant localization strategy is to estimate the camera pose from 2D-3D matches between 2D pixel positions and 3D points. Yet, such approaches can be quite memory intensive and can lead to privacy risks. An interesting alternative to point-based matches is to use higher-level primitives for pose estimation. Consequently, this work investigates using correspondences between 2D and 3D bounding boxes for camera pose estimation. The resulting scene representation is compact and poses fewer privacy risks. In this setting, there are typically orders of magnitude fewer matches available compared to classical feature-based methods. In addition, the available correspondences are significantly noisier. We investigate multiple strategies based on converting bounding box correspondences to point correspondences and propose a novel and simple 2-point camera absolute pose solver (DP2P) that exploits the fact that the depths of the objects can be approximated from the sizes of their bounding boxes.
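
The depth approximation underlying DP2P is just the pinhole projection relation; a minimal sketch, assuming the focal length in pixels and a rough physical object size are known:

```python
def depth_from_bbox(object_height_m, bbox_height_px, focal_px):
    """Pinhole model: an object of physical height H at depth Z projects to
    h = f * H / Z pixels, so the depth can be approximated as Z = f * H / h."""
    return focal_px * object_height_m / bbox_height_px

# Example: a 1.5 m tall object spanning 100 px with f = 1000 px
# depth_from_bbox(1.5, 100, 1000) -> 15.0 (meters)
```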

NeurIPS 2024 Conference Paper

WildGaussians: 3D Gaussian Splatting In the Wild

  • Jonas Kulhanek
  • Songyou Peng
  • Zuzana Kukelova
  • Marc Pollefeys
  • Torsten Sattler

While the field of 3D scene reconstruction is dominated by NeRFs due to their photorealistic quality, 3D Gaussian Splatting (3DGS) has recently emerged, offering similar quality with real-time rendering speeds. However, both methods primarily excel with well-controlled 3D scenes, while in-the-wild data - characterized by occlusions, dynamic objects, and varying illumination - remains challenging. NeRFs can adapt to such conditions easily through per-image embedding vectors, but 3DGS struggles due to its explicit representation and lack of shared parameters. To address this, we introduce WildGaussians, a novel approach to handle occlusions and appearance changes with 3DGS. By leveraging robust DINO features and integrating an appearance modeling module within 3DGS, our method achieves state-of-the-art results. We demonstrate that WildGaussians matches the real-time rendering speed of 3DGS while surpassing both 3DGS and NeRF baselines in handling in-the-wild data, all within a simple architectural framework.
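
One way to read "appearance modeling module" in code is a per-image embedding that, together with a per-Gaussian feature, predicts an affine color correction. The sketch below uses assumed dimensions and layer choices; the paper's actual module differs in detail.

```python
import torch
import torch.nn as nn

class AppearanceModule(nn.Module):
    """Per-image embedding -> affine color correction per Gaussian.
    All dimensions here are illustrative assumptions."""
    def __init__(self, num_images, embed_dim=32, feat_dim=16, hidden=64):
        super().__init__()
        self.embeddings = nn.Embedding(num_images, embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 6),  # 3 color gains + 3 color biases
        )

    def forward(self, image_idx, gauss_feats, base_colors):
        # one embedding per training image, shared by all Gaussians it sees
        e = self.embeddings(torch.as_tensor(image_idx))
        e = e.expand(gauss_feats.shape[0], -1)
        gain_bias = self.mlp(torch.cat([e, gauss_feats], dim=-1))
        gain, bias = gain_bias[:, :3], gain_bias[:, 3:]
        return base_colors * (1.0 + gain) + bias  # appearance-corrected colors
```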

NeurIPS 2022 Conference Paper

MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction

  • Zehao Yu
  • Songyou Peng
  • Michael Niemeyer
  • Torsten Sattler
  • Andreas Geiger

In recent years, neural implicit surface reconstruction methods have become popular for multi-view 3D reconstruction. In contrast to traditional multi-view stereo methods, these approaches tend to produce smoother and more complete reconstructions due to the inductive smoothness bias of neural networks. State-of-the-art neural implicit methods allow for high-quality reconstructions of simple scenes from many input views. Yet, their performance drops significantly for larger and more complex scenes and scenes captured from sparse viewpoints. This is caused primarily by the inherent ambiguity in the RGB reconstruction loss that does not provide enough constraints, in particular in less-observed and textureless areas. Motivated by recent advances in the area of monocular geometry prediction, we systematically explore the utility these cues provide for improving neural implicit surface reconstruction. We demonstrate that depth and normal cues, predicted by general-purpose monocular estimators, significantly improve reconstruction quality and optimization time. Further, we analyse multiple design choices for representing neural implicit surfaces, ranging from monolithic MLP models to single-grid and multi-resolution grid representations. We observe that geometric monocular priors improve performance both for small-scale single-object as well as large-scale multi-object scenes, independent of the choice of representation.
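
In code, the monocular cues typically enter as additional loss terms; the sketch below aligns the (scale- and shift-ambiguous) monocular depth to the rendered depth in closed form and adds an L1-plus-angular normal consistency term. The paper's exact weightings and details are omitted.

```python
import torch

def align_scale_shift(mono, pred):
    """Least-squares (scale, shift) aligning monocular depth to rendered
    depth, since monocular predictions carry an affine ambiguity."""
    A = torch.stack([mono, torch.ones_like(mono)], dim=-1)    # [N, 2]
    sol = torch.linalg.lstsq(A, pred.unsqueeze(-1)).solution  # [2, 1]
    return (A @ sol).squeeze(-1)

def monocular_cue_losses(pred_depth, mono_depth, pred_normal, mono_normal):
    """pred_*: rendered depth [N] / unit normals [N, 3]; mono_*: cues from a
    general-purpose monocular predictor for the same pixels."""
    aligned = align_scale_shift(mono_depth.flatten(), pred_depth.flatten())
    depth_loss = torch.mean((aligned - pred_depth.flatten()) ** 2)
    normal_loss = ((pred_normal - mono_normal).abs().sum(-1).mean()
                   + (1.0 - (pred_normal * mono_normal).sum(-1)).abs().mean())
    return depth_loss, normal_loss
```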

ICRA 2020 Conference Paper

To Learn or Not to Learn: Visual Localization from Essential Matrices

  • Qunjie Zhou
  • Torsten Sattler
  • Marc Pollefeys
  • Laura Leal-Taixé

Visual localization is the problem of estimating the pose of a camera within a scene and a key technology for autonomous robots. State-of-the-art approaches for accurate visual localization use scene-specific representations, resulting in the overhead of constructing these models when applying the techniques to new scenes. Recently, learned approaches based on relative pose estimation have been proposed, carrying the promise of easily adapting to new scenes. However, they are currently significantly less accurate than state-of-the-art approaches. In this paper, we are interested in analyzing this behavior. To this end, we propose a novel framework for visual localization from relative poses. Using a classical feature-based approach within this framework, we show state-of-the-art performance. Replacing the classical approach with learned alternatives at various levels, we then identify the reasons why deep-learned approaches do not perform well. Based on our analysis, we make recommendations for future work.
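
The geometric building block of such a framework, recovering a relative pose from matched points via the essential matrix, is a standard recipe (shown below with OpenCV; this is not the paper's full framework). Note that the translation comes out only as a direction, which is why localizing a query requires relative poses to at least two database images.

```python
import cv2
import numpy as np

def relative_pose(pts_query, pts_db, K):
    """Relative pose between a query and a database image from 2D-2D matches.
    pts_*: [N, 2] float arrays of matched pixel coordinates; K: 3x3 intrinsics."""
    E, inliers = cv2.findEssentialMat(pts_query, pts_db, K,
                                      method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_query, pts_db, K, mask=inliers)
    return R, t  # rotation and unit-length translation direction
```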

ICRA 2019 Conference Paper

Efficient 2D-3D Matching for Multi-Camera Visual Localization

  • Marcel Geppert
  • Peidong Liu 0001
  • Zhaopeng Cui
  • Marc Pollefeys
  • Torsten Sattler

Visual localization, i.e., determining the position and orientation of a vehicle with respect to a map, is a key problem in autonomous driving. We present a multi-camera visual inertial localization algorithm for large scale environments. To efficiently and effectively match features against a pre-built global 3D map, we propose a prioritized feature matching scheme for multi-camera systems. In contrast to existing works, designed for monocular cameras, we (1) tailor the prioritization function to the multi-camera setup and (2) run feature matching and pose estimation in parallel. This significantly accelerates the matching and pose estimation stages and allows us to dynamically adapt the matching efforts based on the surrounding environment. In addition, we show how pose priors can be integrated into the localization system to increase efficiency and robustness. Finally, we extend our algorithm by fusing the absolute pose estimates with motion estimates from a multi-camera visual inertial odometry pipeline (VIO). This results in a system that provides reliable and drift-free pose estimation. Extensive experiments show that our localization runs fast and robustly under varying conditions, and that our extended algorithm enables reliable real-time pose estimation.
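
A toy version of prioritized matching with early termination is sketched below; the priority scores are left as an input precisely because the paper's contribution is how to compute them for a multi-camera rig.

```python
import heapq
import numpy as np

def prioritized_matching(cam_descs, cam_priorities, map_descs,
                         ratio=0.8, max_matches=50):
    """cam_descs: per-camera [Ni, D] descriptors; cam_priorities: per-feature
    scores (higher = matched sooner); map_descs: [M, D] map descriptors."""
    heap = [(-p, cam, i)
            for cam, prios in enumerate(cam_priorities)
            for i, p in enumerate(prios)]
    heapq.heapify(heap)                         # highest priority first
    matches = []
    while heap and len(matches) < max_matches:  # early termination bounds cost
        _, cam, i = heapq.heappop(heap)
        d = np.linalg.norm(map_descs - cam_descs[cam][i], axis=1)
        n1, n2 = np.partition(d, 1)[:2]         # two smallest distances
        if n1 < ratio * n2:                     # Lowe's ratio test
            matches.append((cam, i, int(np.argmin(d))))
    return matches
```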

ICRA 2019 Conference Paper

Incremental Visual-Inertial 3D Mesh Generation with Structural Regularities

  • Antoni Rosinol
  • Torsten Sattler
  • Marc Pollefeys
  • Luca Carlone

Visual-Inertial Odometry (VIO) algorithms typically rely on a point cloud representation of the scene that does not model the topology of the environment. A 3D mesh instead offers a richer, yet lightweight, model. Nevertheless, building a 3D mesh out of the sparse and noisy 3D landmarks triangulated by a VIO algorithm often results in a mesh that does not fit the real scene. In order to regularize the mesh, previous approaches decouple state estimation from the 3D mesh regularization step, and either limit the 3D mesh to the current frame [1], [2] or let the mesh grow indefinitely [3], [4]. We propose instead to tightly couple mesh regularization and state estimation by detecting and enforcing structural regularities in a novel factor-graph formulation. We also propose to incrementally build the mesh by restricting its extent to the time-horizon of the VIO optimization; the resulting 3D mesh covers a larger portion of the scene than a per-frame approach while its memory usage and computational complexity remain bounded. We show that our approach successfully regularizes the mesh, while improving localization accuracy, when structural regularities are present, and remains operational in scenes without regularities.
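
The per-keyframe mesh such systems start from is typically a 2D Delaunay triangulation of the tracked features whose connectivity is lifted to the 3D landmarks; a minimal sketch (the paper's regularity detection and factor-graph coupling go well beyond this):

```python
import numpy as np
from scipy.spatial import Delaunay

def mesh_from_tracked_features(pix_uv, landmarks_3d):
    """pix_uv: [N, 2] pixel coordinates of tracked features in a keyframe;
    landmarks_3d: [N, 3] VIO-triangulated landmarks. Triangulating in 2D is
    cheap, and the connectivity is reused for the 3D mesh."""
    tri = Delaunay(pix_uv)              # 2D Delaunay in image space
    return landmarks_3d, tri.simplices  # mesh vertices and triangle indices
```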

ICRA 2019 Conference Paper

Night-to-Day Image Translation for Retrieval-based Localization

  • Asha Anoosheh
  • Torsten Sattler
  • Radu Timofte
  • Marc Pollefeys
  • Luc Van Gool

Visual localization is a key step in many robotics pipelines, allowing the robot to (approximately) determine its position and orientation in the world. An efficient and scalable approach to visual localization is to use image retrieval techniques. These approaches identify the image most similar to a query photo in a database of geo-tagged images and approximate the query’s pose via the pose of the retrieved database image. However, image retrieval across drastically different illumination conditions, e.g., day and night, is still a problem with unsatisfactory results, even in this age of powerful neural models. This is due to a lack of a suitably diverse dataset with true correspondences to perform end-to-end learning. A recent class of neural models allows for realistic translation of images among visual domains with relatively little training data and, most importantly, without ground-truth pairings. In this paper, we explore the task of accurately localizing images captured from two traversals of the same area in both day and night. We propose ToDayGAN – a modified image-translation model to alter nighttime driving images to a more useful daytime representation. We then compare the daytime and translated night images to obtain a pose estimate for the night image using the known 6-DOF position of the closest day image. Our approach improves localization performance by over 250% compared to the current state-of-the-art, in the context of standard metrics in multiple categories.
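
In outline the retrieval pipeline is short, as sketched below; `generator` (the trained translation model) and `describe` (a global image descriptor) are stand-ins, not fixed APIs.

```python
import numpy as np

def localize_night_image(night_img, generator, describe, db_descs, db_poses):
    """db_descs: [M, D] descriptors of geo-tagged day images; db_poses: their
    known 6-DOF poses. The query pose is approximated by a nearest neighbor."""
    day_like = generator(night_img)          # night -> day translation
    q = describe(day_like)                   # global descriptor of the query
    sims = (db_descs @ q) / (np.linalg.norm(db_descs, axis=1)
                             * np.linalg.norm(q))
    return db_poses[int(np.argmax(sims))]    # pose of most similar day image
```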

ICRA 2019 Conference Paper

Project AutoVision: Localization and 3D Scene Perception for an Autonomous Vehicle with a Multi-Camera System

  • Lionel Heng
  • Benjamin Choi
  • Zhaopeng Cui
  • Marcel Geppert
  • Sixing Hu
  • Benson Kuan
  • Peidong Liu 0001
  • Rang M. H. Nguyen

Project AutoVision aims to develop localization and 3D scene perception capabilities for a self-driving vehicle. Such capabilities will enable autonomous navigation in urban and rural environments, in day and night, and with cameras as the only exteroceptive sensors. The sensor suite employs many cameras for both 360-degree coverage and accurate multi-view stereo; the use of low-cost cameras keeps the cost of this sensor suite to a minimum. In addition, the project seeks to extend the operating envelope to include GNSS-less conditions, which are typical for environments with tall buildings, foliage, and tunnels. Emphasis is placed on leveraging multi-view geometry and deep learning to enable the vehicle to localize and perceive in 3D space. This paper presents an overview of the project, and describes the sensor suite and current progress in the areas of calibration, localization, and perception.

ICRA 2019 Conference Paper

Real-Time Dense Mapping for Self-Driving Vehicles using Fisheye Cameras

  • Zhaopeng Cui
  • Lionel Heng
  • Ye Chuan Yeo
  • Andreas Geiger 0001
  • Marc Pollefeys
  • Torsten Sattler

We present a real-time dense geometric mapping algorithm for large-scale environments. Unlike existing methods which use pinhole cameras, our implementation is based on fisheye cameras whose large field of view benefits various computer vision applications for self-driving vehicles such as visual-inertial odometry, visual localization, and object detection. Our algorithm runs on in-vehicle PCs at approximately 15 Hz, enabling vision-only 3D scene perception for self-driving vehicles. For each synchronized set of images captured by multiple cameras, we first compute a depth map for a reference camera using plane-sweeping stereo. To maintain both accuracy and efficiency, while accounting for the fact that fisheye images have a lower angular resolution, we recover the depths using multiple image resolutions. We adopt the fast object detection framework, YOLOv3, to remove potentially dynamic objects. At the end of the pipeline, we fuse the fisheye depth images into the truncated signed distance function (TSDF) volume to obtain a 3D map. We evaluate our method on large-scale urban datasets, and results show that our method works well in complex dynamic environments.
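
The fusion step at the end of the pipeline is the standard truncated signed distance update, a per-voxel running weighted mean; a sketch with illustrative constants:

```python
import numpy as np

def tsdf_update(tsdf, weights, sdf_obs, trunc=0.3, max_weight=64.0):
    """Fuse one depth observation into the volume. sdf_obs holds per-voxel
    signed distances to the observed surface; truncation confines updates to
    a band around the surface, and the running mean converges as
    observations accumulate."""
    d = np.clip(sdf_obs / trunc, -1.0, 1.0)
    new_w = np.minimum(weights + 1.0, max_weight)
    tsdf = (tsdf * weights + d) / new_w
    return tsdf, new_w
```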

IROS 2018 Conference Paper

Incremental Object Database: Building 3D Models from Multiple Partial Observations

  • Fadri Furrer
  • Tonci Novkovic
  • Marius Fehr
  • Abel Gawel
  • Margarita Grinvald
  • Torsten Sattler
  • Roland Siegwart
  • Juan I. Nieto 0001

Collecting 3D object data sets involves a large amount of manual work and is time consuming. Getting complete models of objects either requires a 3D scanner that covers all the surfaces of an object or one needs to rotate it to completely observe it. We present a system that incrementally builds a database of objects as a mobile agent traverses a scene. Our approach requires no prior knowledge of the shapes present in the scene. Object-like segments are extracted from a global segmentation map, which is built online using the input of segmented RGB-D images. These segments are stored in a database, matched among each other, and merged with other previously observed instances. This allows us to create and improve object models on the fly and to use these merged models to also reconstruct unobserved parts of the scene. The database contains each (potentially merged) object model only once, together with a set of poses where it was observed. We evaluate our pipeline on one public dataset and on a newly created Google Tango dataset containing four indoor scenes with some of the objects appearing multiple times, both within and across scenes.
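
The incremental database logic can be sketched as match-then-merge; the cosine-similarity descriptor matching and the naive point-cloud merge below are assumptions for illustration, not the paper's choices.

```python
import numpy as np

def insert_segment(db, desc, cloud, pose, thresh=0.8):
    """db: list of {'desc', 'cloud', 'poses'} entries; desc: [D] segment
    descriptor; cloud: [N, 3] segment points; pose: observation pose."""
    for entry in db:
        sim = desc @ entry["desc"] / (np.linalg.norm(desc)
                                      * np.linalg.norm(entry["desc"]))
        if sim > thresh:                                   # match: merge
            entry["cloud"] = np.vstack([entry["cloud"], cloud])
            entry["poses"].append(pose)  # one model, many observation poses
            return entry
    db.append({"desc": desc, "cloud": cloud, "poses": [pose]})  # new object
    return db[-1]
```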

IROS 2018 Conference Paper

Towards Robust Visual Odometry with a Multi-Camera System

  • Peidong Liu 0001
  • Marcel Geppert
  • Lionel Heng
  • Torsten Sattler
  • Andreas Geiger 0001
  • Marc Pollefeys

We present a visual odometry (VO) algorithm for a multi-camera system, designed for robust operation in challenging environments. Our algorithm consists of a pose tracker and a local mapper. The tracker estimates the current pose by minimizing photometric errors between the most recent keyframe and the current frame. The mapper initializes the depths of all sampled feature points using plane-sweeping stereo. To reduce pose drift, a sliding window optimizer is used to refine poses and structure jointly. Our formulation is flexible enough to support an arbitrary number of stereo cameras. We evaluate our algorithm thoroughly on five datasets. The datasets were captured in different conditions: daytime, nighttime with near-infrared (NIR) illumination, and nighttime without NIR illumination. Experimental results show that a multi-camera setup makes the VO more robust to challenging environments, especially night-time conditions, in which a single stereo configuration fails easily due to the lack of features.
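
The tracker's objective is the classic direct-alignment photometric error; a scalar sketch with nearest-neighbor sampling (real trackers interpolate bilinearly and sum this cost over all cameras in the rig):

```python
import numpy as np

def photometric_error(ref_intensities, cur_image, proj_uv):
    """ref_intensities: [N] keyframe intensities of sampled feature points;
    proj_uv: [N, 2] their projections into the current frame under a
    candidate pose. The tracker minimizes this over the 6-DoF pose."""
    u = np.clip(np.round(proj_uv[:, 0]).astype(int), 0, cur_image.shape[1] - 1)
    v = np.clip(np.round(proj_uv[:, 1]).astype(int), 0, cur_image.shape[0] - 1)
    r = cur_image[v, u].astype(np.float64) - ref_intensities
    return 0.5 * float(np.sum(r * r))
```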

IROS 2017 Conference Paper

Direct visual odometry for a fisheye-stereo camera

  • Peidong Liu 0001
  • Lionel Heng
  • Torsten Sattler
  • Andreas Geiger 0001
  • Marc Pollefeys

We present a direct visual odometry algorithm for a fisheye-stereo camera. Our algorithm performs simultaneous camera motion estimation and semi-dense reconstruction. The pipeline consists of two threads: a tracking thread and a mapping thread. In the tracking thread, we estimate the camera pose via semi-dense direct image alignment. To have a wider field of view (FoV) which is important for robotic perception, we use fisheye images directly without converting them to conventional pinhole images which come with a limited FoV. To address the epipolar curve problem, plane-sweeping stereo is used for stereo matching and depth initialization. Multiple depth hypotheses are tracked for selected pixels to better capture the uncertainty characteristics of stereo matching. Temporal motion stereo is then used to refine the depth and remove false positive depth hypotheses. Our implementation runs at an average of 20 Hz on a low-end PC. We run experiments in outdoor environments to validate our algorithm, and discuss the experimental results. We experimentally show that we are able to estimate 6D poses with low drift, and at the same time, do semi-dense 3D reconstruction with high accuracy. To the best of our knowledge, there is no other existing semi-dense direct visual odometry algorithm for a fisheye-stereo camera.
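
Using fisheye images directly means swapping the pinhole projection for a fisheye model inside the alignment; the equidistant model below is one common choice, shown for illustration (the paper's calibrated camera model may differ).

```python
import numpy as np

def equidistant_project(points_cam, f, cx, cy):
    """Equidistant fisheye projection (r = f * theta): maps [N, 3] camera-
    frame points to [N, 2] pixels, supporting fields of view beyond what a
    pinhole model can represent."""
    x, y, z = points_cam.T
    theta = np.arctan2(np.hypot(x, y), z)  # angle from the optical axis
    phi = np.arctan2(y, x)
    r = f * theta
    return np.stack([cx + r * np.cos(phi), cy + r * np.sin(phi)], axis=1)
```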

ICRA 2017 Conference Paper

Embedded real-time multi-baseline stereo

  • Dominik Honegger
  • Torsten Sattler
  • Marc Pollefeys

Dense depth map estimation from stereo cameras has many applications in robotic vision, e.g., obstacle detection, especially when performed in real-time. The range in which depth values can be accurately estimated is usually limited for two-camera stereo setups due to the fixed baseline between the cameras. In addition, two-camera setups suffer from wrong depth estimates caused by local minima in the matching cost functions. Both problems can be alleviated by adding more cameras as this creates multiple baselines of different lengths and since multi-image matching leads to unique minima. However, using more cameras usually comes at an increase in run-time. In this paper, we present a novel embedded system for multi-baseline stereo. By exploiting the parallelization capabilities within FPGAs, we are able to estimate a depth map from multiple cameras in real-time. We show that our approach requires only a little more power and weight compared to a two-camera stereo system. At the same time, we show that our system produces significantly better depth maps and is able to handle occlusion of some cameras, resulting in the redundancy typically desired for autonomous vehicles. Our system is small and lightweight and can be employed even on a MAV platform with very strict power, weight, and size requirements.
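
Why extra baselines help follows from the disparity relation d_i = f * b_i / Z: a single depth hypothesis predicts one consistent disparity per baseline, and summing the matching costs across pairs suppresses the spurious local minima of any single pair. A sketch of that cost (the FPGA design evaluates such costs in parallel):

```python
import numpy as np

def expected_disparities(depth_m, focal_px, baselines_m):
    """Per-baseline disparity predicted by a single depth hypothesis."""
    return focal_px * np.asarray(baselines_m) / depth_m

def multi_baseline_cost(ref_patch, patches_per_cam):
    """Sum of per-camera SAD matching costs for one depth hypothesis."""
    return float(sum(np.abs(ref_patch - p).sum() for p in patches_per_cam))
```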

IROS 2015 Conference Paper

Obstacle detection for self-driving cars using only monocular cameras and wheel odometry

  • Christian Häne
  • Torsten Sattler
  • Marc Pollefeys

Mapping the environment is crucial to enable path planning and obstacle avoidance for self-driving vehicles and other robots. In this paper, we concentrate on ground-based vehicles and present an approach which extracts static obstacles from depth maps computed from multiple consecutive images. In contrast to existing approaches, our system does not require accurate visual inertial odometry estimation but solely relies on the readily available wheel odometry. To handle the resulting higher pose uncertainty, our system fuses obstacle detections over time and between cameras to estimate the free and occupied space around the vehicle. Using monocular fisheye cameras, we are able to cover a wider field of view and detect obstacles closer to the car, which are often not within the standard field of view of a classical binocular stereo camera setup. Our quantitative analysis shows that our system is accurate enough for navigation purposes of self-driving cars and runs in real-time.
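
Fusing detections over time is naturally expressed as a log-odds occupancy grid update; a 2D sketch with illustrative constants (the paper's exact fusion scheme may differ):

```python
import numpy as np

def fuse_detections(log_odds, occ_cells, free_cells, l_occ=0.85, l_free=-0.4):
    """log_odds: [H, W] grid around the vehicle; occ_cells / free_cells:
    [N, 2] integer cell indices observed as obstacle / free space in one
    frame. Repeated frames sharpen the map despite noisy odometry poses."""
    log_odds[tuple(occ_cells.T)] += l_occ
    log_odds[tuple(free_cells.T)] += l_free
    return np.clip(log_odds, -10.0, 10.0)  # keep values bounded
```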