Arrow Research search

Author name cluster

Philipp Henzler

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
2 author rows

Possible papers (4)

ICLR 2025 Conference Paper

CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation

  • Nikolai Kalischek
  • Michael Oechsle
  • Fabian Manhardt
  • Philipp Henzler
  • Konrad Schindler
  • Federico Tombari

We introduce a novel method for generating 360° panoramas from text prompts or images. Our approach leverages recent advances in 3D generation by employing multi-view diffusion models to jointly synthesize the six faces of a cubemap. Unlike previous methods that rely on processing equirectangular projections or autoregressive generation, our method treats each face as a standard perspective image, simplifying the generation process and enabling the use of existing multi-view diffusion models. We demonstrate that these models can be adapted to produce high-quality cubemaps without requiring correspondence-aware attention layers. Our model allows for fine-grained text control, generates high-resolution panorama images, and generalizes well beyond its training set, whilst achieving state-of-the-art results both qualitatively and quantitatively.
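The abstract's key observation is that each cubemap face is an ordinary 90°-field-of-view perspective image. The sketch below (not code from the paper; the face-orientation convention is one common choice among several) computes unit ray directions for the pixels of one cubemap face, which is the geometric fact that lets a standard multi-view diffusion model operate on the six faces directly.

```python
import numpy as np

def cubemap_face_directions(face: str, res: int) -> np.ndarray:
    """Unit ray directions for the pixels of one cubemap face.

    Each face is a standard perspective image with a 90-degree field of
    view; this is an illustrative sketch using one common face-axis
    convention, not an implementation from the CubeDiff paper.
    """
    # Pixel-center coordinates in [-1, 1] on the unit face plane.
    t = (np.arange(res) + 0.5) / res * 2.0 - 1.0
    u, v = np.meshgrid(t, -t)  # +u points right, +v points up
    ones = np.ones_like(u)
    axes = {
        "+x": (ones, v, -u), "-x": (-ones, v, u),
        "+y": (u, ones, -v), "-y": (u, -ones, v),
        "+z": (u, v, ones),  "-z": (-u, v, -ones),
    }
    d = np.stack(axes[face], axis=-1)
    # Normalize so every pixel maps to a unit direction on the sphere.
    return d / np.linalg.norm(d, axis=-1, keepdims=True)
```

With `res=1` the single pixel sits at the face center, so the `+z` face yields the direction `(0, 0, 1)`; the six faces together tile the full sphere of directions, which is why generating them jointly yields a complete 360° panorama.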

NeurIPS 2025 Conference Paper

ROGR: Relightable 3D Objects using Generative Relighting

  • Jiapeng Tang
  • Matthew Levine
  • Dor Verbin
  • Stephan Garbin
  • Matthias Niessner
  • Ricardo Martin-Brualla
  • Pratul Srinivasan
  • Philipp Henzler

We introduce ROGR, a novel approach that reconstructs a relightable 3D model of an object captured from multiple views, driven by a generative relighting model that simulates the effects of placing the object under novel environment illuminations. Our method samples the appearance of the object under multiple lighting environments, creating a dataset that is used to train a lighting-conditioned Neural Radiance Field (NeRF) that outputs the object's appearance under any input environmental lighting. The lighting-conditioned NeRF uses a novel dual-branch architecture to encode the general lighting effects and specularities separately. The optimized lighting-conditioned NeRF enables efficient feed-forward relighting under arbitrary environment maps without requiring per-illumination optimization or light transport simulation. We evaluate our approach on the established TensoIR and Stanford-ORB datasets, where it improves upon the state-of-the-art on most metrics, and showcase our approach on real-world object captures.

NeurIPS 2024 Conference Paper

CAT3D: Create Anything in 3D with Multi-View Diffusion Models

  • Ruiqi Gao
  • Aleksander Hołyński
  • Philipp Henzler
  • Arthur Brussee
  • Ricardo Martin-Brualla
  • Pratul Srinivasan
  • Jonathan T. Barron
  • Ben Poole

Advances in 3D reconstruction have enabled high-quality 3D capture, but require a user to collect hundreds to thousands of images to create a 3D scene. We present CAT3D, a method for creating anything in 3D by simulating this real-world capture process with a multi-view diffusion model. Given any number of input images and a set of target novel viewpoints, our model generates highly consistent novel views of a scene. These generated views can be used as input to robust 3D reconstruction techniques to produce 3D representations that can be rendered from any viewpoint in real-time. CAT3D can create entire 3D scenes in as little as one minute, and outperforms existing methods for single image and few-view 3D scene creation.

NeurIPS 2024 Conference Paper

IllumiNeRF: 3D Relighting Without Inverse Rendering

  • Xiaoming Zhao
  • Pratul P. Srinivasan
  • Dor Verbin
  • Keunhong Park
  • Ricardo Martin-Brualla
  • Philipp Henzler

Existing methods for relightable view synthesis (using a set of images of an object under unknown lighting to recover a 3D representation that can be rendered from novel viewpoints under a target illumination) are based on inverse rendering, and attempt to disentangle the object geometry, materials, and lighting that explain the input images. This typically involves optimization through differentiable Monte Carlo rendering, which is brittle and computationally expensive. In this work, we propose a simpler approach: we first relight each input image using an image diffusion model conditioned on the target environment lighting and estimated object geometry. We then reconstruct a Neural Radiance Field (NeRF) from these relit images, from which we render novel views under the target lighting. We demonstrate that this strategy is surprisingly competitive and achieves state-of-the-art results on multiple relighting benchmarks. Please see our project page at illuminerf.github.io.
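The two-stage pipeline in the abstract (relight every input image with a conditioned diffusion model, then fit a NeRF to the relit set) can be sketched as plain control flow. Every callable below is a hypothetical stand-in passed by the caller, not a real API from the paper; the point is only the ordering of the stages.

```python
def relight_then_reconstruct(images, geometry, target_env,
                             diffusion_relight, fit_nerf):
    """IllumiNeRF-style pipeline sketch with hypothetical callables.

    1) Relight each input image with a diffusion model conditioned on
       the target environment lighting and estimated object geometry.
    2) Fit a NeRF to the relit images; novel views rendered from it
       already show the target lighting, so no inverse rendering or
       light-transport simulation is needed at render time.
    """
    relit = [diffusion_relight(img, target_env, geometry) for img in images]
    return fit_nerf(relit)
```

Note the contrast with inverse rendering: the hard disentanglement of geometry, materials, and lighting is pushed entirely into stage 1's generative model, and stage 2 is an ordinary NeRF reconstruction.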