Arrow Research

Author name cluster

Shuhan Shen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers · 2 author rows

Possible papers (6)

ICLR 2025 · Conference Paper

NeuralPlane: Structured 3D Reconstruction in Planar Primitives with Neural Fields

  • Hanqiao Ye
  • Yuzhou Liu
  • Yangdong Liu
  • Shuhan Shen

3D maps assembled from planar primitives are compact and expressive in representing man-made environments. In this paper, we present **NeuralPlane**, a novel approach that explores **neural** fields for multi-view 3D **plane** reconstruction. Our method is centered upon the core idea of distilling geometric and semantic cues from inconsistent 2D plane observations into a unified 3D neural representation, which fully exploits plane attributes. It is accomplished through several key designs, including: 1) a monocular module that generates geometrically smooth and semantically meaningful segments that serve as 2D plane observations, 2) a plane-guided training procedure that implicitly learns accurate 3D geometry from the multi-view plane observations, and 3) a self-supervised feature field termed *Neural Coplanarity Field* that enables the modeling of scene semantics alongside the geometry. Without relying on prior plane annotations, our method achieves high-fidelity reconstruction comprising planar primitives that are not only crisp but also well-aligned with the semantic content. Comprehensive experiments on ScanNetv2 and ScanNet++ demonstrate the superiority of our method in both geometry and semantics.
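To make the plane-guided training idea concrete, here is a minimal sketch (an assumed interface, not the authors' code; the rendering and segment-assignment pipeline is taken as given) of a coplanarity-style loss: 3D points rendered from the neural field that fall in the same 2D plane segment are pulled onto a common least-squares plane.

```python
# Minimal sketch (assumed interface, not the authors' code): a coplanarity
# loss that pulls 3D points sharing a 2D plane-segment ID onto a common
# least-squares plane, in the spirit of plane-guided training.
import torch

def plane_consistency_loss(points: torch.Tensor, segment_ids: torch.Tensor) -> torch.Tensor:
    """points: (N, 3) points back-projected from rendered depth;
    segment_ids: (N,) ID of the 2D plane segment each point falls in."""
    uniq = segment_ids.unique()
    loss = points.new_zeros(())
    for sid in uniq:
        pts = points[segment_ids == sid]
        if pts.shape[0] < 3:
            continue  # at least 3 points are needed to define a plane
        centered = pts - pts.mean(dim=0, keepdim=True)
        # The right-singular vector of the smallest singular value is the
        # normal of the least-squares plane through the points.
        _, _, vh = torch.linalg.svd(centered, full_matrices=False)
        normal = vh[-1]
        # Mean squared point-to-plane distance for this segment.
        loss = loss + (centered @ normal).square().mean()
    return loss / max(len(uniq), 1)
```

A full system would combine a term like this with photometric and semantic objectives; the sketch only illustrates the geometric pull toward planarity.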

IROS 2024 · Conference Paper

BEV2PR: BEV-Enhanced Visual Place Recognition with Structural Cues

  • Fudong Ge
  • Yiwei Zhang
  • Shuhan Shen
  • Weiming Hu 0004
  • Yue Wang
  • Jin Gao

In this paper, we propose a new image-based visual place recognition (VPR) framework by exploiting the structural cues in bird’s-eye view (BEV) from a single monocular camera. The motivation arises from two key observations about place recognition methods based on both appearance and structure: 1) For methods relying on LiDAR sensors, the integration of LiDAR in robotic systems has led to increased expenses, while the alignment of data between different sensors is also a major challenge. 2) Other image-/camera-based methods, which integrate RGB images and their derived variants (e.g., pseudo depth images, pseudo 3D point clouds), exhibit several limitations, such as the failure to effectively exploit the explicit spatial relationships between different objects. To tackle the above issues, we design a new BEV-enhanced VPR framework, namely BEV2PR, generating a composite descriptor with both visual cues and spatial awareness based on a single camera. The key points lie in: 1) We use BEV features as an explicit source of structural knowledge in constructing global features. 2) The lower layers of the pretrained backbone from BEV generation are shared for the visual and structural streams in VPR, facilitating the learning of fine-grained local features in the visual stream. 3) The complementary visual and structural features can jointly enhance VPR performance. Our BEV2PR framework enables consistent performance improvements over several popular aggregation modules for RGB global features. Experiments on our collected VPR-NuScenes dataset demonstrate an absolute gain of 2.47% on Recall@1 over the strong Conv-AP baseline, achieving the best performance in our setting, and notably an 18.06% gain on the hard set. The code and dataset will be available at https://github.com/FudongGe/BEV2PR.
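As a rough illustration of the composite-descriptor idea (an assumed stand-in architecture, not the released BEV2PR code), the sketch below shares the lower backbone layers between a visual stream and a structural stream and concatenates their pooled global descriptors:

```python
# Assumed stand-in architecture (not the released BEV2PR code): shared
# lower layers feed a visual stream and a structural (BEV) stream whose
# pooled descriptors are concatenated into one retrieval descriptor.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStreamVPR(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Shared lower layers (stand-in for the pretrained BEV backbone).
        self.shared = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Visual stream: fine-grained appearance features.
        self.visual_head = nn.Sequential(
            nn.Conv2d(128, feat_dim, 1), nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Structural stream: a stand-in for BEV-derived features.
        self.bev_head = nn.Sequential(
            nn.Conv2d(128, feat_dim, 1), nn.AdaptiveMaxPool2d(1), nn.Flatten())

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        x = self.shared(image)
        desc = torch.cat([self.visual_head(x), self.bev_head(x)], dim=1)
        return F.normalize(desc, dim=1)  # unit-norm composite descriptor
```

Retrieval would then reduce to nearest-neighbor search over these unit-norm descriptors, e.g., dot products between a query descriptor and the database.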

AAAI 2022 · Conference Paper

MMA: Multi-Camera Based Global Motion Averaging

  • Hainan Cui
  • Shuhan Shen

In order to fully perceive the surrounding environment, many intelligent robots and self-driving cars are equipped with a multi-camera system. Based on this system, structure-from-motion (SfM) technology is used to realize scene reconstruction, but the fixed relative poses between cameras in the multi-camera system are usually not considered. This paper presents a tailor-made multi-camera based motion averaging system, where the fixed relative poses are utilized to improve the accuracy and robustness of SfM. Our approach starts by dividing the images into reference images and non-reference images, and edges in the view-graph are divided into four categories accordingly. Then, a multi-camera based rotation averaging problem is formulated and solved in two stages, where an iteratively re-weighted least squares scheme is used to deal with outliers. Finally, a multi-camera based translation averaging problem is formulated, and an L1-norm based optimization scheme is proposed to compute the relative translations of the multi-camera system and the reference camera positions simultaneously. Experiments demonstrate that our algorithm achieves superior accuracy and robustness on various datasets compared to state-of-the-art methods.
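The L1 translation averaging step can be illustrated as a linear program. The sketch below is a simplified single-camera variant under assumed inputs (known rotations, and relative translations with known scale), not the paper's multi-camera formulation with fixed rig constraints:

```python
# Simplified single-camera L1 translation averaging (illustrative; the
# paper's multi-camera variant adds fixed-rig constraints). Given assumed
# relative translations t_ij ~ c_j - c_i with known scale, recover camera
# centers by minimizing sum_ij ||c_j - c_i - t_ij||_1 as a linear program.
import numpy as np
from scipy.optimize import linprog

def l1_translation_averaging(n_cams, edges, t_rel):
    """edges: list of (i, j); t_rel: (E, 3) array of measured c_j - c_i."""
    E = len(edges)
    n_c, n_s = 3 * n_cams, 3 * E                     # positions and slacks
    cost = np.concatenate([np.zeros(n_c), np.ones(n_s)])
    A, b = [], []
    for k, (i, j) in enumerate(edges):
        for d in range(3):  # encode |(c_j - c_i - t_ij)_d| <= s_kd
            row_p = np.zeros(n_c + n_s)
            row_m = np.zeros(n_c + n_s)
            row_p[3*j + d], row_p[3*i + d], row_p[n_c + 3*k + d] = 1, -1, -1
            row_m[3*j + d], row_m[3*i + d], row_m[n_c + 3*k + d] = -1, 1, -1
            A += [row_p, row_m]
            b += [t_rel[k, d], -t_rel[k, d]]
    # Gauge fixing: pin camera 0 to the origin.
    A_eq = np.zeros((3, n_c + n_s))
    A_eq[np.arange(3), np.arange(3)] = 1.0
    res = linprog(cost, A_ub=np.array(A), b_ub=np.array(b),
                  A_eq=A_eq, b_eq=np.zeros(3), bounds=(None, None))
    return res.x[:n_c].reshape(n_cams, 3)

# e.g. l1_translation_averaging(3, [(0, 1), (1, 2)],
#                               np.array([[1., 0, 0], [1., 0, 0]]))
```

Minimizing the sum of per-axis slack variables is exactly the L1 objective, which is what gives the method its robustness to outlier edges compared with least squares.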

IROS 2022 · Conference Paper

Multi-Camera-LiDAR Auto-Calibration by Joint Structure-from-Motion

  • Diantao Tu
  • Baoyu Wang
  • Hainan Cui
  • Yuqian Liu
  • Shuhan Shen

Multiple sensors, especially cameras and LiDARs, are widely used in autonomous vehicles. In order to fuse data from different sensors accurately, precise calibrations are required, including camera intrinsic parameters and the relative poses between the multiple cameras and LiDARs. However, most existing camera-LiDAR calibration methods require manually designed calibration objects to be placed at multiple locations multiple times, which is time-consuming and labor-intensive and unsuitable for frequent use. To address this, in this paper we propose a novel calibration pipeline that automatically calibrates multiple cameras and multiple LiDARs in a Structure-from-Motion (SfM) process. In our pipeline, we first perform a global SfM on all images with the help of rough LiDAR data to get the initial poses of all sensors. Then, feature points on lines and planes are extracted from both the SfM point cloud and the LiDAR point clouds. With these features, a global Bundle Adjustment is performed to minimize the point reprojection errors, point-to-line errors, and point-to-plane errors together. During this minimization, the camera intrinsic parameters, the camera and LiDAR poses, and the SfM point cloud are refined jointly. The proposed method exploits the characteristics of natural scenes, does not require manually designed calibration objects, and incorporates all calibration parameters into a unified optimization framework. Experiments on autonomous vehicles with different sensor configurations demonstrate the effectiveness and robustness of the proposed method.
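For intuition, the three residual types named in the abstract can be written down directly. The sketch below uses assumed parameterizations (rotation matrix plus translation, unit plane normals and line directions), not the paper's implementation:

```python
# Sketch of the three residual types in the joint objective (assumed
# parameterizations, not the paper's implementation): one least-squares
# problem can stack all of them to refine intrinsics, poses, and structure.
import numpy as np

def reprojection_residual(K, R, t, X, uv):
    """Pinhole reprojection error of 3D point X observed at pixel uv."""
    x_cam = R @ X + t
    proj = (K @ x_cam)[:2] / x_cam[2]
    return proj - uv  # 2-vector

def point_to_plane_residual(X, n, d):
    """Signed distance of X to the plane n.x + d = 0 (n unit-norm)."""
    return np.dot(n, X) + d  # scalar

def point_to_line_residual(X, p, u):
    """Perpendicular offset of X from the line through p with unit direction u."""
    v = X - p
    return v - np.dot(v, u) * u  # 3-vector
```

In a solver such as scipy.optimize.least_squares, the parameter vector would stack the intrinsics, camera/LiDAR poses, and point coordinates, with the residual function concatenating all three terms over all observations.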

IROS 2021 · Conference Paper

Recalling Direct 2D-3D Matches for Large-Scale Visual Localization

  • Zhuo Song
  • Chuting Wang
  • Yuqian Liu
  • Shuhan Shen

Estimating the 6-DoF camera pose of an image with respect to a 3D scene model, known as visual localization, is a fundamental problem in many computer vision and robotics tasks. Among various visual localization methods, direct 2D-3D matching has become the preferred approach for many practical applications due to its computational efficiency. When using direct 2D-3D matching in large-scale scenes, a vocabulary tree can be used to accelerate the matching process, but this introduces quantization artifacts that reduce the inlier ratio and degrade localization accuracy. To this end, in this paper two simple and effective mechanisms, called visibility-based recalling and space-based recalling, are proposed to recover matches lost to quantization artifacts, which largely improves localization accuracy and success rate without adding much computational time. Experimental results on long-term visual localization benchmarks demonstrate the effectiveness of our method compared with state-of-the-art methods.
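One plausible reading of visibility-based recalling, sketched under the assumption that the SfM model exposes a co-visibility map between 3D points (a hypothetical data layout, not the authors' code): points co-visible with already-matched points are re-tested directly against unmatched query descriptors, bypassing the vocabulary tree.

```python
# Hypothetical sketch of visibility-based recalling (assumes a co-visibility
# map between 3D points; not the authors' code).
import numpy as np

def recall_by_visibility(matches, covis, pt_desc, query_desc, ratio=0.8):
    """matches: {query_idx: point_id} from vocabulary-tree matching;
    covis: {point_id: iterable of co-visible point_ids};
    pt_desc: {point_id: (D,) descriptor}; query_desc: (Q, D) array."""
    candidates = set()
    for pid in matches.values():          # expand via co-visibility
        candidates.update(covis.get(pid, []))
    candidates -= set(matches.values())   # drop points already matched
    if not candidates:
        return {}
    cand = sorted(candidates)
    cand_desc = np.stack([pt_desc[p] for p in cand])
    recalled = {}
    for q in range(len(query_desc)):
        if q in matches:
            continue                      # keep the original match
        d = np.linalg.norm(cand_desc - query_desc[q], axis=1)
        order = np.argsort(d)
        # Lowe-style ratio test against the second-best candidate.
        if len(cand) == 1 or d[order[0]] < ratio * d[order[1]]:
            recalled[q] = cand[order[0]]
    return recalled
```

The direct descriptor test over a small co-visible candidate set is what recovers matches the quantized vocabulary-tree assignment would have missed.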

ICRA 2021 · Conference Paper

Semantically Guided Multi-View Stereo for Dense 3D Road Mapping

  • Mingzhe Lv
  • Diantao Tu
  • Xincheng Tang
  • Yuqian Liu
  • Shuhan Shen

Compared to the LiDAR-based mapping widely used in the autonomous driving field, image-based mapping has the advantages of low cost, high resolution, and no need for complex calibration. However, image-based 3D mapping depends heavily on texture richness and often leaves holes and outliers in low-textured areas such as the road surface. To this end, this paper proposes a novel semantically guided Multi-View Stereo (MVS) method for dense 3D road mapping, which integrates semantic information into a PatchMatch-based MVS pipeline and uses image semantic segmentation as soft constraints in neighbor-view selection, depth-map initialization, depth propagation, and depth-map completion. Experimental results on public and our own datasets show that, with the help of semantics, the proposed method achieves superior completeness with comparable accuracy for 3D road mapping compared to state-of-the-art MVS methods.
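To show how semantics can act as a soft constraint rather than a hard mask, here is a minimal PatchMatch-style matching cost under an assumed weighting `lam` (illustrative, not the paper's exact formulation):

```python
# Minimal PatchMatch-style cost with semantics as a soft constraint
# (assumed weighting lam; illustrative, not the paper's exact formulation):
# photometric dissimilarity (1 - NCC) is blended with the fraction of
# semantic labels that disagree between the reference and warped patches.
import numpy as np

def matching_cost(ref_patch, src_patch, ref_labels, src_labels, lam=0.2):
    r = ref_patch.ravel() - ref_patch.mean()
    s = src_patch.ravel() - src_patch.mean()
    denom = np.linalg.norm(r) * np.linalg.norm(s) + 1e-8
    photometric = 1.0 - float(r @ s) / denom             # lower is better
    semantic = float(np.mean(ref_labels != src_labels))  # mismatch rate in [0, 1]
    return photometric + lam * semantic                  # soft, not hard, constraint
```

Because the semantic term only biases the cost rather than vetoing hypotheses, low-texture road pixels can still propagate plausible depths from semantically consistent neighbors.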