Arrow Research search

Author name cluster

Guilian Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers
2 author rows

Possible papers (3)

AAAI Conference 2026 · Conference Paper

VPSentry: Semi-supervised Video Polyp Segmentation via Sentry-guided Long-term Prototype Fusion with Correlation Dynamic Propagation

  • Guilian Chen
  • Xiaoling Luo
  • Huisi Wu
  • Jing Qin

Automated polyp segmentation in colonoscopy videos is an essential computer-aided technology for early detection and removal of polyps. However, most existing video polyp segmentation methods rely on pixel-level temporal learning mechanisms, which come at the cost of time-consuming frame-wise annotations. In this paper, we present VPSentry, a novel semi-supervised segmentation model with a sentry mechanism. Our model integrates a prototype memory to store the long-term spatiotemporal cues of colonoscopy videos. Moreover, we devise adaptive prototypes to capture and generalize critical representations from individual frames, enabling long-term temporal fusion across labeled and unlabeled frames. In addition, we propose a correlation dynamic propagation module that propagates information from prototypes to features while simultaneously extracting dynamic features to perceive variations in polyp details between adjacent frames. Since colonoscopy scenes may change across consecutive frames, we further employ a sentry mechanism to assess inter-frame continuity. This mechanism guides the prototype memory updating and the correlation dynamic propagation, further facilitating robust temporal propagation and dynamic detail perception for semi-supervised learning on long-term colonoscopy video sequences. Extensive experiments on the large-scale SUN-SEG dataset demonstrate that our model achieves optimal segmentation performance with real-time inference efficiency.
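
To make the sentry idea concrete, below is a minimal sketch of a prototype memory whose updates are gated by an inter-frame continuity score, roughly in the spirit of the mechanism described in the abstract. The class, the cosine-similarity gate, the momentum update, and all sizes are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn.functional as F

class SentryPrototypeMemory:
    # Minimal sketch only: the gate, update rule and sizes are assumptions,
    # not the VPSentry implementation.
    def __init__(self, num_prototypes=8, dim=256, momentum=0.9, continuity_thresh=0.5):
        self.prototypes = torch.randn(num_prototypes, dim)   # long-term prototypes
        self.momentum = momentum
        self.continuity_thresh = continuity_thresh
        self.prev_frame_vec = None

    def _frame_prototypes(self, feat):
        # feat: (C, H, W) -> (K, C) soft assignment of pixels to prototypes
        c, h, w = feat.shape
        flat = feat.view(c, h * w).t()                                        # (HW, C)
        sim = F.normalize(flat, dim=1) @ F.normalize(self.prototypes, dim=1).t()
        assign = sim.softmax(dim=1)                                           # (HW, K)
        return (assign.t() @ flat) / (assign.sum(dim=0, keepdim=True).t() + 1e-6)

    def update(self, feat):
        # Sentry: compare a global descriptor of the current frame with the
        # previous one; only update memory when the scene looks continuous.
        frame_vec = feat.mean(dim=(1, 2))
        continuity = (F.cosine_similarity(frame_vec, self.prev_frame_vec, dim=0)
                      if self.prev_frame_vec is not None else torch.tensor(1.0))
        self.prev_frame_vec = frame_vec
        if continuity >= self.continuity_thresh:
            new = self._frame_prototypes(feat)
            self.prototypes = self.momentum * self.prototypes + (1 - self.momentum) * new
        return continuity

    def propagate(self, feat):
        # Read: enrich pixel features with prototype information (residual add).
        c, h, w = feat.shape
        flat = F.normalize(feat.view(c, h * w).t(), dim=1)                    # (HW, C)
        attn = (flat @ F.normalize(self.prototypes, dim=1).t()).softmax(dim=1)
        read = attn @ self.prototypes                                          # (HW, C)
        return (feat.view(c, h * w) + read.t()).view(c, h, w)

In a semi-supervised loop of this kind, update() would be called on each frame's encoder features and propagate() before the segmentation head, with only labeled frames contributing to the supervised loss; this is a sketch under those assumptions.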

ICRA Conference 2023 · Conference Paper

MonoPGC: Monocular 3D Object Detection with Pixel Geometry Contexts

  • Zizhang Wu
  • Yuanzhu Gan
  • Lei Wang
  • Guilian Chen
  • Jian Pu

Monocular 3D object detection is an economical but challenging task in autonomous driving. Recently, center-based monocular methods have developed rapidly, offering a good trade-off between speed and accuracy; they usually depend on estimating the object center's depth from 2D features. However, visual semantic features that lack sufficient pixel geometry information may weaken the depth cues needed for spatial 3D detection. To alleviate this, we propose MonoPGC, a novel end-to-end Monocular 3D object detection framework with rich Pixel Geometry Contexts. We introduce pixel depth estimation as an auxiliary task and design a depth cross-attention pyramid module (DCPM) to inject local and global depth geometry knowledge into visual features. In addition, we present a depth-space-aware transformer (DSAT) to integrate 3D space position and depth-aware features efficiently. We also design a novel depth-gradient positional encoding (DGPE) to bring more distinct pixel geometry contexts into the transformer for better object detection. Extensive experiments demonstrate that our method achieves state-of-the-art performance on the KITTI dataset.
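
As a rough illustration of how depth geometry can be injected into visual features with cross-attention, here is a minimal single-scale sketch. The module name, tensor shapes, and layer sizes are assumptions made for illustration; the actual DCPM, DSAT, and DGPE designs are described only in the paper.

import torch
import torch.nn as nn

class DepthCrossAttentionSketch(nn.Module):
    # Minimal sketch: a single-scale stand-in for injecting depth geometry
    # into visual features via cross-attention; names and sizes are assumptions.
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.depth_head = nn.Conv2d(dim, 1, kernel_size=1)    # auxiliary pixel-depth prediction

    def forward(self, visual_feat, depth_feat):
        # visual_feat, depth_feat: (B, C, H, W) from the image backbone / depth branch
        b, c, h, w = visual_feat.shape
        q = visual_feat.flatten(2).transpose(1, 2)             # (B, HW, C) appearance queries
        kv = depth_feat.flatten(2).transpose(1, 2)             # (B, HW, C) geometry keys/values
        fused, _ = self.cross_attn(q, kv, kv)                  # depth-aware visual features
        fused = self.norm(q + fused).transpose(1, 2).reshape(b, c, h, w)
        depth_pred = self.depth_head(depth_feat)               # supervised by the auxiliary depth task
        return fused, depth_pred

# usage (shapes assumed):
# fused, depth = DepthCrossAttentionSketch()(torch.randn(2, 256, 24, 80), torch.randn(2, 256, 24, 80))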

ICRA Conference 2023 · Conference Paper

MVFusion: Multi-View 3D Object Detection with Semantic-aligned Radar and Camera Fusion

  • Zizhang Wu
  • Guilian Chen
  • Yuanzhu Gan
  • Lei Wang
  • Jian Pu

Multi-view radar-camera fused 3D object detection provides a longer detection range and more helpful features for autonomous driving, especially under adverse weather. Current radar-camera fusion methods adopt various designs to fuse radar information with camera data. However, these fusion approaches usually rely on straightforward concatenation of multi-modal features, which ignores the semantic alignment of the radar features and fails to exploit sufficient cross-modal correlations. In this paper, we present MVFusion, a novel Multi-View radar-camera Fusion method that achieves semantically aligned radar features and enhances cross-modal information interaction. To this end, we inject semantic alignment into the radar features via a semantic-aligned radar encoder (SARE) to produce image-guided radar features. Then, we propose a radar-guided fusion transformer (RGFT) that fuses the radar and image features to strengthen the correlation between the two modalities at a global scope via cross-attention. Extensive experiments show that MVFusion achieves state-of-the-art performance (51.7% NDS and 45.3% mAP) on the nuScenes dataset. We shall release our code and trained networks upon publication.
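
A minimal cross-attention fusion sketch in the spirit of the description above: image tokens attend to radar tokens at a global scope. Which modality serves as query, the token construction, and all sizes are assumptions here, not the released MVFusion model.

import torch
import torch.nn as nn

class RadarCameraFusionSketch(nn.Module):
    # Minimal sketch: global cross-attention between camera and radar tokens.
    # The query/key roles, projections and sizes are assumptions for illustration.
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.radar_proj = nn.Linear(dim, dim)                  # stand-in for image-guided radar encoding
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, image_tokens, radar_tokens):
        # image_tokens: (B, N_img, C) flattened multi-view image features
        # radar_tokens: (B, N_rad, C) encoded radar point features
        radar_tokens = self.radar_proj(radar_tokens)
        fused, _ = self.cross_attn(image_tokens, radar_tokens, radar_tokens)
        return self.norm(image_tokens + fused)                 # radar-enhanced image tokens

# usage (shapes assumed):
# out = RadarCameraFusionSketch()(torch.randn(2, 1200, 256), torch.randn(2, 300, 256))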