Arrow Research search

Author name cluster

Benjamin Planche

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers (11)

ICLR 2025 · Conference Paper

3D Vision-Language Gaussian Splatting

  • Qucheng Peng
  • Benjamin Planche
  • Zhongpai Gao
  • Meng Zheng 0002
  • Anwesa Choudhuri
  • Terrence Chen
  • Chen Chen 0001
  • Ziyan Wu 0001

Recent advancements in 3D reconstruction methods and vision-language models have propelled the development of multi-modal 3D scene understanding, which has vital applications in robotics, autonomous driving, and virtual/augmented reality. However, current multi-modal scene understanding approaches naively embed semantic representations into 3D reconstruction methods without striking a balance between the visual and language modalities, which leads to unsatisfying semantic rasterization of translucent or reflective objects, as well as over-fitting on the color modality. To alleviate these limitations, we propose a solution that adequately handles the distinct visual and semantic modalities: a 3D vision-language Gaussian splatting model for scene understanding that emphasizes representation learning for the language modality. We propose a novel cross-modal rasterizer that uses modality fusion along with a smoothed semantic indicator to enhance semantic rasterization. We also employ a camera-view blending technique to improve semantic consistency between existing and synthesized views, thereby effectively mitigating over-fitting. Extensive experiments demonstrate that our method achieves state-of-the-art performance in open-vocabulary semantic segmentation, surpassing existing methods by a significant margin.
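
As a rough illustration of how a cross-modal rasterizer can carry language features through splatting, the sketch below composites per-Gaussian semantic embeddings with the same front-to-back alpha blending used for color; the `sem_indicator` weight is a hypothetical stand-in for the paper's smoothed semantic indicator, not the authors' implementation.

```python
import numpy as np

def composite_semantics(alphas, colors, sem_feats, sem_indicator):
    """Front-to-back alpha compositing of color and semantic features.

    alphas:        (N,) per-Gaussian opacities along a ray, depth-sorted
    colors:        (N, 3) per-Gaussian RGB
    sem_feats:     (N, D) per-Gaussian language embeddings
    sem_indicator: (N,) hypothetical per-Gaussian semantic weight in [0, 1]
    """
    transmittance = 1.0
    rgb = np.zeros(3)
    sem = np.zeros(sem_feats.shape[1])
    for a, c, f, s in zip(alphas, colors, sem_feats, sem_indicator):
        w = transmittance * a
        rgb += w * c
        sem += w * s * f          # semantics modulated by the indicator
        transmittance *= (1.0 - a)
    return rgb, sem
```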

ICLR 2025 · Conference Paper

6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering

  • Zhongpai Gao
  • Benjamin Planche
  • Meng Zheng 0002
  • Anwesa Choudhuri
  • Terrence Chen
  • Ziyan Wu 0001

Novel view synthesis has advanced significantly with the development of neural radiance fields (NeRF) and 3D Gaussian splatting (3DGS). However, achieving high quality without compromising real-time rendering remains challenging, particularly for physically based rendering using ray/path tracing with view-dependent effects. Recently, N-dimensional Gaussians (N-DG) introduced a 6D spatial-angular representation to better incorporate view-dependent effects, but the Gaussian representation and control scheme are sub-optimal. In this paper, we revisit 6D Gaussians and introduce 6D Gaussian Splatting (6DGS), which enhances color and opacity representations and leverages the additional directional information in the 6D space for optimized Gaussian control. Our approach is fully compatible with the 3DGS framework and significantly improves real-time radiance field rendering by better modeling view-dependent effects and fine details. Experiments demonstrate that 6DGS significantly outperforms 3DGS and N-DG, achieving up to a 15.73 dB improvement in PSNR while using 66.5% fewer Gaussian points than 3DGS. The project page is: https://gaozhongpai.github.io/6dgs/.
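
For intuition on what a 6D spatial-angular Gaussian buys: given a query view direction, the joint Gaussian over (position, direction) can be sliced into a view-conditioned 3D Gaussian via the standard conditional-Gaussian identity. The sketch below shows only that identity; 6DGS's actual color, opacity, and control enhancements are not modeled here.

```python
import numpy as np

def condition_6d_gaussian(mu, cov, view_dir):
    """Slice a 6D spatial-angular Gaussian into a 3D spatial Gaussian
    conditioned on a view direction, via the standard conditional-Gaussian
    identity. Illustrative only.

    mu:       (6,) mean over (position, direction)
    cov:      (6, 6) joint covariance
    view_dir: (3,) query direction
    """
    mu_x, mu_d = mu[:3], mu[3:]
    S_xx, S_xd = cov[:3, :3], cov[:3, 3:]
    S_dx, S_dd = cov[3:, :3], cov[3:, 3:]
    S_dd_inv = np.linalg.inv(S_dd)
    mu_cond = mu_x + S_xd @ S_dd_inv @ (view_dir - mu_d)   # shifted mean
    cov_cond = S_xx - S_xd @ S_dd_inv @ S_dx               # shrunk covariance
    return mu_cond, cov_cond
```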

ICLR 2025 · Conference Paper

Order-aware Interactive Segmentation

  • Bin Wang 0068
  • Anwesa Choudhuri
  • Meng Zheng 0002
  • Zhongpai Gao
  • Benjamin Planche
  • Andong Deng
  • Qin Liu 0008
  • Terrence Chen

Interactive segmentation aims to accurately segment target objects with minimal user interaction. However, current methods often fail to accurately separate target objects from the background, due to a limited understanding of order, i.e., the relative depth between objects in a scene. To address this issue, we propose OIS: order-aware interactive segmentation, where we explicitly encode the relative depth between objects into order maps. We introduce a novel order-aware attention mechanism, where the order maps seamlessly guide the user interactions (in the form of clicks) to attend to the image features. We further present an object-aware attention module that incorporates strong object-level understanding to better differentiate objects with similar order. Our approach allows both dense and sparse integration of user clicks, enhancing both accuracy and efficiency compared to prior work. Experimental results demonstrate that OIS achieves state-of-the-art performance, improving mIoU after one click by 7.61 on the HQSeg44K dataset and 1.32 on the DAVIS dataset compared to the previous state of the art, SegNext, while also doubling inference speed relative to current leading methods.
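
The abstract does not spell out the order-map encoding, but one minimal way to materialize "relative depth between objects" is to rank object instances by depth and paint each pixel with its object's rank; the sketch below assumes per-instance masks and mean depths are available, which is an assumption of this illustration rather than the paper's pipeline.

```python
import numpy as np

def build_order_map(instance_masks, mean_depths):
    """Rank objects by relative depth and paint the rank into an order map.

    instance_masks: (K, H, W) boolean masks, one per object
    mean_depths:    (K,) estimated mean depth per object
    Returns an (H, W) map where 0 = closest object, -1 = background.
    """
    ranks = np.argsort(np.argsort(mean_depths))  # depth rank per object
    order_map = np.full(instance_masks.shape[1:], -1, dtype=np.int32)
    for k, mask in enumerate(instance_masks):
        order_map[mask] = ranks[k]
    return order_map
```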

NeurIPS 2024 · Conference Paper

DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering

  • Zhongpai Gao
  • Benjamin Planche
  • Meng Zheng
  • Xiao Chen
  • Terrence Chen
  • Ziyan Wu

Digitally reconstructed radiographs (DRRs) are simulated 2D X-ray images generated from 3D CT volumes, widely used in preoperative settings but limited in intraoperative applications due to computational bottlenecks. Physics-based Monte Carlo simulations provide accurate representations but are extremely computationally intensive. Analytical DRR renderers are much more efficient, but at the price of ignoring anisotropic X-ray image formation phenomena such as Compton scattering. We propose a novel approach that balances realistic physics-inspired X-ray simulation with efficient, differentiable DRR generation using 3D Gaussian splatting (3DGS). Our direction-disentangled 3DGS (DDGS) method decomposes the radiosity contribution into isotropic and direction-dependent components, approximating complex anisotropic interactions without costly runtime simulations. Additionally, we adapt the 3DGS initialization to account for tomography data properties, enhancing accuracy and efficiency. Our method outperforms state-of-the-art techniques in image accuracy and inference speed, demonstrating its potential for intraoperative applications and inverse problems such as pose registration.
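
To make the direction-disentangled idea concrete, the sketch below splits a Gaussian's contribution into a view-independent term plus a directional term, using a degree-1 spherical-harmonics stand-in for the direction-dependent component; the paper's actual parameterization may differ.

```python
import numpy as np

def ddgs_contribution(density, iso, sh_coeffs, ray_dir):
    """Per-Gaussian radiosity split into an isotropic term and a
    direction-dependent term (illustrative decomposition only).

    density:   scalar Gaussian density along the ray
    iso:       scalar isotropic radiosity (view-independent)
    sh_coeffs: (3,) hypothetical degree-1 SH coefficients
    ray_dir:   (3,) unit ray direction
    """
    directional = float(sh_coeffs @ ray_dir)   # degree-1 SH is linear in direction
    return density * max(0.0, iso + directional)
```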

AAAI 2024 · Conference Paper

Disguise without Disruption: Utility-Preserving Face De-identification

  • Zikui Cai
  • Zhongpai Gao
  • Benjamin Planche
  • Meng Zheng
  • Terrence Chen
  • M. Salman Asif
  • Ziyan Wu

With the rise of cameras and smart sensors, humanity generates data at an exponential rate. This valuable information, including underrepresented cases such as those found in medical AI settings, can fuel new deep-learning tools. However, data scientists must prioritize privacy for the individuals in these untapped datasets, especially for images or videos with faces, which are prime targets for identification methods. Proposed solutions to de-identify such images often compromise non-identifying facial attributes relevant to downstream tasks. In this paper, we introduce Disguise, a novel algorithm that seamlessly de-identifies facial images while ensuring the usability of the modified data. Unlike previous approaches, our solution is firmly grounded in the domains of differential privacy and ensemble-learning research. Our method extracts and substitutes depicted identities with synthetic ones, generated using variational mechanisms to maximize obfuscation and non-invertibility. Additionally, we leverage supervision from a mixture-of-experts to disentangle and preserve other utility attributes. We extensively evaluate our method on multiple datasets, demonstrating a higher de-identification rate and superior consistency compared to prior approaches across various downstream tasks.

AAAI 2024 · Conference Paper

Implicit Modeling of Non-rigid Objects with Cross-Category Signals

  • Yuchun Liu
  • Benjamin Planche
  • Meng Zheng
  • Zhongpai Gao
  • Pierre Sibut-Bourde
  • Fan Yang
  • Terrence Chen
  • Ziyan Wu

Deep implicit functions (DIFs) have emerged as a potent and articulate means of representing 3D shapes. However, methods modeling object categories or non-rigid entities have mainly focused on single-object scenarios. In this work, we propose MODIF, a multi-object deep implicit function that jointly learns the deformation fields and instance-specific latent codes for multiple objects at once. Our emphasis is on non-rigid, non-interpenetrating entities such as organs. To effectively capture the interrelation between these entities and ensure precise, collision-free representations, our approach facilitates signaling between category-specific fields to adequately rectify shapes. We also introduce a novel form of inter-object supervision: an attraction-repulsion loss formulated to refine contact regions between objects. Our approach is demonstrated on various medical benchmarks, involving the modeling of different groups of intricate anatomical entities. Experimental results illustrate that our model can proficiently learn the shape representation of each organ and its relations to the others, to the point that shapes missing from unseen instances can be consistently recovered. Finally, MODIF can also propagate semantic information throughout the population via accurate point correspondences.
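
As a hedged illustration of what an attraction-repulsion objective between implicit shapes could look like, the sketch below penalizes points where two signed distance fields overlap (repulsion) and pulls surfaces together across gaps larger than a margin (attraction); the exact MODIF formulation is not reproduced here.

```python
import numpy as np

def attraction_repulsion_loss(sdf_a, sdf_b, margin=0.01):
    """Illustrative inter-object loss on two signed distance fields sampled
    at shared contact-region points (negative = inside the shape).

    sdf_a, sdf_b: (N,) SDF values of the two organs at the sampled points
    """
    # Repulsion: both SDFs negative means the shapes interpenetrate.
    overlap = np.maximum(0.0, -sdf_a) * np.maximum(0.0, -sdf_b)
    # Attraction: both surfaces farther than `margin` leaves an open gap.
    gap = np.maximum(0.0, np.minimum(sdf_a, sdf_b) - margin)
    return overlap.mean() + gap.mean()
```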

ICLR 2024 · Conference Paper

PBADet: A One-Stage Anchor-Free Approach for Part-Body Association

  • Zhongpai Gao
  • Huayi Zhou 0001
  • Abhishek Sharma
  • Meng Zheng 0002
  • Benjamin Planche
  • Terrence Chen
  • Ziyan Wu 0001

The detection of human parts (e.g., hands, face) and their correct association with individuals is an essential task, e.g., for ubiquitous human-machine interfaces and action recognition. Traditional methods often employ multi-stage processes, rely on cumbersome anchor-based systems, or do not scale well to larger part sets. This paper presents PBADet, a novel one-stage, anchor-free approach for part-body association detection. Building upon the anchor-free object representation across multi-scale feature maps, we introduce a singular part-to-body center offset that effectively encapsulates the relationship between parts and their parent bodies. Our design is inherently versatile and capable of managing multiple parts-to-body associations without compromising on detection accuracy or robustness. Comprehensive experiments on various datasets underscore the efficacy of our approach, which not only outperforms existing state-of-the-art techniques but also offers a more streamlined and efficient solution to the part-body association challenge.
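
To illustrate the core decoding step implied by a part-to-body center offset, the sketch below regresses a predicted body center from each detected part and matches it to the nearest detected body; names and shapes are illustrative assumptions, not the paper's actual prediction heads.

```python
import numpy as np

def associate_parts_to_bodies(part_centers, part_offsets, body_centers):
    """Decode part-body association from a per-part offset vector.

    part_centers: (P, 2) part detection centers
    part_offsets: (P, 2) predicted part-to-body offsets
    body_centers: (B, 2) body detection centers
    Returns the matched body index for each part.
    """
    predicted = part_centers + part_offsets                     # (P, 2)
    dists = np.linalg.norm(predicted[:, None] - body_centers[None], axis=-1)
    return dists.argmin(axis=1)                                 # (P,)
```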

AAAI 2023 · Conference Paper

Progressive Multi-View Human Mesh Recovery with Self-Supervision

  • Xuan Gong
  • Liangchen Song
  • Meng Zheng
  • Benjamin Planche
  • Terrence Chen
  • Junsong Yuan
  • David Doermann
  • Ziyan Wu

To date, little attention has been given to multi-view 3D human mesh estimation, despite its real-life applicability (e.g., motion capture, sports analysis) and robustness to single-view ambiguities. Existing solutions typically suffer from poor generalization to new settings, largely due to the limited diversity of image/3D-mesh pairs in multi-view training data. To address this shortcoming, prior work has explored the use of synthetic images. But besides the usual visual gap between rendered and target data, synthetic-data-driven multi-view estimators also suffer from over-fitting to the camera viewpoint distribution sampled during training, which usually differs from real-world distributions. Tackling both challenges, we propose a novel simulation-based training pipeline for multi-view human mesh recovery, which (a) relies on intermediate 2D representations that are more robust to the synthetic-to-real domain gap; (b) leverages learnable calibration and triangulation to adapt to more diversified camera setups; and (c) progressively aggregates multi-view information in a canonical 3D space to remove ambiguities in the 2D representations. Through extensive benchmarking, we demonstrate the superiority of the proposed solution, especially in unseen in-the-wild scenarios.
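
As one plausible instantiation of the triangulation step (the paper's version is learnable), the sketch below applies standard direct linear transform (DLT) triangulation to lift a joint's 2D detections from calibrated views into 3D.

```python
import numpy as np

def triangulate_dlt(proj_mats, points_2d):
    """Standard DLT triangulation of one 3D joint from multi-view 2D keypoints.

    proj_mats: (V, 3, 4) camera projection matrices
    points_2d: (V, 2) the joint's 2D location in each view
    """
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        rows.append(u * P[2] - P[0])   # two linear constraints per view
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vh = np.linalg.svd(A)        # solution = right singular vector
    X = vh[-1]
    return X[:3] / X[3]                # dehomogenize
```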

NeurIPS 2019 · Conference Paper

Incremental Scene Synthesis

  • Benjamin Planche
  • Xuejian Rong
  • Ziyan Wu
  • Srikrishna Karanam
  • Harald Kosch
  • Yingli Tian
  • Jan Ernst
  • Andreas Hutter

We present a method to incrementally generate complete 2D or 3D scenes with the following properties: (a) the scene is globally consistent at each step according to a learned scene prior, (b) real observations of a scene can be incorporated while maintaining global consistency, (c) unobserved regions can be hallucinated locally in a manner consistent with previous observations, hallucinations, and global priors, and (d) hallucinations are statistical in nature, i.e., different scenes can be generated from the same observations. To achieve this, we model the virtual scene such that an active agent at each step can either perceive an observed part of the scene or generate a local hallucination. The latter can be interpreted as the agent's expectation of the scene at this step, and can be applied to autonomous navigation. In the limit of observing real data at each point, our method converges to solving the SLAM problem. It can otherwise sample entirely imagined scenes from prior distributions. Besides autonomous agents, applications include problems where large amounts of data are required to build robust real-world systems, but few samples are available. We demonstrate efficacy on various 2D as well as 3D data.

IROS 2019 · Conference Paper

Seeing Beyond Appearance - Mapping Real Images into Geometrical Domains for Unsupervised CAD-based Recognition

  • Benjamin Planche
  • Sergey Zakharov
  • Ziyan Wu 0001
  • Andreas Hutter
  • Harald Kosch
  • Slobodan Ilic

While convolutional neural networks are dominating the field of computer vision, one usually does not have access to the large amount of domain-relevant data needed for their training. Therefore, it has become common practice to use available synthetic samples along with domain adaptation schemes to prepare algorithms for the target domain. Tackling this problem from a different angle, we introduce a pipeline to map unseen target samples into the synthetic domain used to train task-specific methods. By denoising the data and retaining only the features these recognition algorithms are familiar with, our solution greatly improves their performance. As this mapping is easier to learn than the opposite one (i.e., generating realistic features to augment the source samples), we demonstrate how our whole solution can be trained purely on augmented synthetic data and still outperform methods trained with domain-relevant information (e.g., real images or realistic textures for the 3D models). Applying our approach to object recognition from texture-less CAD data, we present a custom generative network that fully utilizes the purely geometrical information to learn robust features and achieve a more refined mapping for unseen color images.

IROS 2017 · Conference Paper

3D object instance recognition and pose estimation using triplet loss with dynamic margin

  • Sergey Zakharov
  • Wadim Kehl
  • Benjamin Planche
  • Andreas Hutter
  • Slobodan Ilic

In this paper, we address the problem of 3D object instance recognition and pose estimation of localized objects in cluttered environments using convolutional neural networks. Inspired by the descriptor-learning approach of Wohlhart et al. [1], we propose a method that introduces a dynamic margin into the manifold-learning triplet loss function. Such a loss function is designed to map images of different objects under different poses to a lower-dimensional, similarity-preserving descriptor space on which efficient nearest-neighbor search algorithms can be applied. Introducing the dynamic margin allows for faster training times and better accuracy of the resulting low-dimensional manifolds. Furthermore, we contribute the following: adding in-plane rotations (ignored by the baseline method) to the training; proposing new background noise types that help to better mimic realistic scenarios and improve accuracy with respect to clutter; adding surface normals as another powerful image modality representing an object's surface, leading to better performance than depth alone; and implementing an efficient online batch generation scheme that allows for better variability during the training phase. We perform an exhaustive evaluation to demonstrate the effects of our contributions. Additionally, we assess the performance of the algorithm on the large BigBIRD dataset [2] to demonstrate the pipeline's good scalability with respect to the number of models.
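
A minimal sketch of a triplet loss with a dynamic margin, under the assumption that the margin grows with the angular pose difference when anchor and negative show the same object and is a fixed constant otherwise; the paper's exact margin schedule may differ.

```python
import numpy as np

def dynamic_margin_triplet_loss(d_pos, d_neg, same_object, angle_deg,
                                margin_const=1.0):
    """Triplet loss whose margin depends on the anchor-negative pair.

    d_pos, d_neg: (N,) descriptor distances anchor-positive / anchor-negative
    same_object:  (N,) bool, anchor and negative depict the same object
    angle_deg:    (N,) angular pose difference between anchor and negative
    """
    # Same object: margin proportional to pose difference (assumed schedule);
    # different objects: a fixed constant margin.
    margin = np.where(same_object, angle_deg / 180.0, margin_const)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()
```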