Arrow Research search

Author name cluster

Yuze He

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
2 author rows

Possible papers

5

NeurIPS Conference 2024 Conference Paper

AlphaTablets: A Generic Plane Representation for 3D Planar Reconstruction from Monocular Videos

  • Yuze He
  • Wang Zhao
  • Shaohui Liu
  • Yubin Hu
  • Yushi Bai
  • Yu-Hui Wen
  • Yong-Jin Liu

We introduce AlphaTablets, a novel and generic representation of 3D planes that features continuous 3D surface and precise boundary delineation. By representing 3D planes as rectangles with alpha channels, AlphaTablets combine the advantages of current 2D and 3D plane representations, enabling accurate, consistent and flexible modeling of 3D planes. We derive differentiable rasterization on top of AlphaTablets to efficiently render 3D planes into images, and propose a novel bottom-up pipeline for 3D planar reconstruction from monocular videos. Starting with 2D superpixels and geometric cues from pre-trained models, we initialize 3D planes as AlphaTablets and optimize them via differentiable rendering. An effective merging scheme is introduced to facilitate the growth and refinement of AlphaTablets. Through iterative optimization and merging, we reconstruct complete and accurate 3D planes with solid surfaces and clear boundaries. Extensive experiments on the ScanNet dataset demonstrate state-of-the-art performance in 3D planar reconstruction, underscoring the great potential of AlphaTablets as a generic 3D plane representation for various applications.

ICRA Conference 2024 Conference Paper

MMPI: a Flexible Radiance Field Representation by Multiple Multi-plane Images Blending

  • Yuze He
  • Peng Wang 0099
  • Yubin Hu 0001
  • Wang Zhao 0001
  • Ran Yi 0002
  • Yong-Jin Liu 0001
  • Wenping Wang 0001

This paper presents a flexible representation of neural radiance fields based on multi-plane images (MPI), for high-quality view synthesis of complex scenes. MPI with Normalized Device Coordinate (NDC) parameterization is widely used in NeRF learning for its simple definition, easy calculation, and powerful ability to represent unbounded scenes. However, existing NeRF works that adopt MPI representation for novel view synthesis can only handle simple forward-facing unbounded scenes (e. g. , the scenes in the LLFF dataset), where the input cameras are all observing in similar directions with small relative translations. Hence, extending these MPIbased methods to more complex scenes like large-range or even 360-degree scenes is very challenging. In this paper, we explore the potential of MPI and show that MPI can synthesize high-quality novel views of complex scenes with diverse camera distributions and view directions, which are not only limited to simple forward-facing scenes. Our key idea is to encode the neural radiance field with multiple MPIs facing different directions and blend them with an adaptive blending operation. For each region of the scene, the blending operation gives larger blending weights to those advantaged MPIs with stronger local representation abilities while giving lower weights to those with weaker representation abilities. Such blending operation automatically modulates the multiple MPIs to appropriately represent the diverse local density and color information. Experiments on the KITTI dataset and ScanNet dataset demonstrate that our proposed MMPI synthesizes high-quality images from diverse camera pose distributions and is fast to train, outperforming the previous fast-training NeRF methods for novel view synthesis. Moreover, we show that MMPI can encode extremely long trajectories and produce novel view renderings, demonstrating its potential in applications like autonomous driving. Our demo video is available at https://youtube.com/watch?v=mbNKwN5urC8.

AAAI Conference 2024 Conference Paper

O^2-Recon: Completing 3D Reconstruction of Occluded Objects in the Scene with a Pre-trained 2D Diffusion Model

  • Yubin Hu
  • Sheng Ye
  • Wang Zhao
  • Matthieu Lin
  • Yuze He
  • Yu-Hui Wen
  • Ying He
  • Yong-Jin Liu

Occlusion is a common issue in 3D reconstruction from RGB-D videos, often blocking the complete reconstruction of objects and presenting an ongoing problem. In this paper, we propose a novel framework, empowered by a 2D diffusion-based in-painting model, to reconstruct complete surfaces for the hidden parts of objects. Specifically, we utilize a pre-trained diffusion model to fill in the hidden areas of 2D images. Then we use these in-painted images to optimize a neural implicit surface representation for each instance for 3D reconstruction. Since creating the in-painting masks needed for this process is tricky, we adopt a human-in-the-loop strategy that involves very little human engagement to generate high-quality masks. Moreover, some parts of objects can be totally hidden because the videos are usually shot from limited perspectives. To ensure recovering these invisible areas, we develop a cascaded network architecture for predicting signed distance field, making use of different frequency bands of positional encoding and maintaining overall smoothness. Besides the commonly used rendering loss, Eikonal loss, and silhouette loss, we adopt a CLIP-based semantic consistency loss to guide the surface from unseen camera angles. Experiments on ScanNet scenes show that our proposed framework achieves state-of-the-art accuracy and completeness in object-level reconstruction from scene-level RGB-D videos. Code: https://github.com/THU-LYJ-Lab/O2-Recon.

NeurIPS Conference 2023 Conference Paper

Benchmarking Foundation Models with Language-Model-as-an-Examiner

  • Yushi Bai
  • Jiahao Ying
  • Yixin Cao
  • Xin Lv
  • Yuze He
  • Xiaozhi Wang
  • Jifan Yu
  • Kaisheng Zeng

Numerous benchmarks have been established to assess the performance of foundation models on open-ended question answering, which serves as a comprehensive test of a model's ability to understand and generate language in a manner similar to humans. Most of these works focus on proposing new datasets, however, we see two main issues within previous benchmarking pipelines, namely testing leakage and evaluation automation. In this paper, we propose a novel benchmarking framework, Language-Model-as-an-Examiner, where the LM serves as a knowledgeable examiner that formulates questions based on its knowledge and evaluates responses in a reference-free manner. Our framework allows for effortless extensibility as various LMs can be adopted as the examiner, and the questions can be constantly updated given more diverse trigger topics. For a more comprehensive and equitable evaluation, we devise three strategies: (1) We instruct the LM examiner to generate questions across a multitude of domains to probe for a broad acquisition, and raise follow-up questions to engage in a more in-depth assessment. (2) Upon evaluation, the examiner combines both scoring and ranking measurements, providing a reliable result as it aligns closely with human annotations. (3) We additionally propose a decentralized Peer-examination method to address the biases in a single examiner. Our data and benchmarking results are available at: http: //lmexam. xlore. cn.

AAAI Conference 2023 Conference Paper

DarkFeat: Noise-Robust Feature Detector and Descriptor for Extremely Low-Light RAW Images

  • Yuze He
  • Yubin Hu
  • Wang Zhao
  • Jisheng Li
  • Yong-Jin Liu
  • Yuxing Han
  • Jiangtao Wen

Low-light visual perception, such as SLAM or SfM at night, has received increasing attention, in which keypoint detection and local feature description play an important role. Both handcraft designs and machine learning methods have been widely studied for local feature detection and description, however, the performance of existing methods degrades in the extreme low-light scenarios in a certain degree, due to the low signal-to-noise ratio in images. To address this challenge, images in RAW format that retain more raw sensing information have been considered in recent works with a denoise-then-detect scheme. However, existing denoising methods are still insufficient for RAW images and heavily time-consuming, which limits the practical applications of such scheme. In this paper, we propose DarkFeat, a deep learning model which directly detects and describes local features from extreme low-light RAW images in an end-to-end manner. A novel noise robustness map and selective suppression constraints are proposed to effectively mitigate the influence of noise and extract more reliable keypoints. Furthermore, a customized pipeline of synthesizing dataset containing low-light RAW image matching pairs is proposed to extend end-to-end training. Experimental results show that DarkFeat achieves state-of-the-art performance on both indoor and outdoor parts of the challenging MID benchmark, outperforms the denoise-then-detect methods and significantly reduces computational costs up to 70%. Code is available at https://github.com/THU-LYJ-Lab/DarkFeat.