Author name cluster

Eng Gee Lim

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers

2 author rows

AAAI Conference 2025 Conference Paper

CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection

Xiaolei Wang
Xiaoyang Wang
Huihui Bai
Eng Gee Lim
Jimin Xiao

Existing unsupervised distillation-based methods rely on the differences between encoded and decoded features to locate abnormal regions in test images. However, a decoder trained only on normal samples still reconstructs abnormal patch features well, degrading performance. This issue is particularly pronounced in unsupervised multi-class anomaly detection tasks. We attribute this behavior to 'over-generalization' (OG) of the decoder: the significantly increased diversity of patch patterns in multi-class training enhances the model's generalization on normal patches, but also inadvertently broadens its generalization to abnormal patches. To mitigate OG, we propose a novel approach that leverages class-agnostic learnable prompts to capture common textual normality across various visual patterns, and then applies them to guide the decoded features towards a 'normal' textual representation, suppressing over-generalization of the decoder on abnormal patterns. To further improve performance, we also introduce a gated mixture-of-experts module to specialize in handling diverse patch patterns and reduce mutual interference between them in multi-class training. Our method achieves competitive performance on the MVTec AD and VisA datasets, demonstrating its effectiveness.
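As a rough illustration of the general idea the abstract starts from (scoring each patch by how much its decoded feature differs from its encoded feature), the following is a minimal sketch. It is not the CNC method: the function names, the use of cosine distance, and the toy 2x2 feature grids are all assumptions made for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: treat a zero vector as maximally dissimilar.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def anomaly_map(encoded, decoded):
    """Per-patch anomaly scores from two grids (rows x cols) of feature vectors."""
    return [
        [cosine_distance(e, d) for e, d in zip(enc_row, dec_row)]
        for enc_row, dec_row in zip(encoded, decoded)
    ]


# Toy 2x2 grids of 2-dimensional patch features (hypothetical values).
encoded = [[[1.0, 0.0], [0.0, 1.0]],
           [[1.0, 1.0], [1.0, 0.0]]]
decoded = [[[1.0, 0.0], [0.0, 1.0]],
           [[1.0, 1.0], [0.0, 1.0]]]

# Only the bottom-right patch is reconstructed poorly, so only it scores high.
scores = anomaly_map(encoded, decoded)
```

In this toy case a patch that the decoder reconstructs faithfully scores near zero. The over-generalization problem described above corresponds to abnormal patches also receiving low scores because the decoder reconstructs them too well.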


IJCAI Conference 2025 Conference Paper

DriftRemover: Hybrid Energy Optimizations for Anomaly Images Synthesis and Segmentation

Siyue Yao
Haotian Xu
Mingjie Sun
Siyue Yu
Jimin Xiao
Eng Gee Lim

This paper tackles the challenge of anomaly image synthesis and segmentation, generating diverse anomaly images and their segmentation labels to mitigate data scarcity. Existing approaches use precise masks to guide generation and rely on additional mask generators, leading to increased computational costs and limited anomaly diversity. Although a few works use coarse masks as guidance to expand diversity, they lack effective label generation for synthetic images, which reduces their practicality. Therefore, our proposed method simultaneously generates anomaly images and their corresponding masks by utilizing coarse masks and anomaly categories. The framework uses attention maps from the synthesis process as mask labels and employs two optimization modules to tackle drift challenges, which are mismatches between synthetic results and real situations. Our evaluation demonstrates that our method improves pixel-level AP by 1.3% and F1-MAX by 1.8% in anomaly detection tasks on the MVTec dataset. Additionally, its successful application in practical scenarios highlights its effectiveness, improving IoU by 37.2% and F-measure by 25.1% on the Floor Dirt dataset. The code is available at https://github.com/JJessicaYao/DriftRemover.


IROS Conference 2025 Conference Paper

NanoMVG: USV-Centric Low-Power Multi-Task Visual Grounding based on Prompt-Guided Camera and 4D mmWave Radar

Runwei Guan
Jianan Liu
Liye Jia
Haocheng Zhao
Shanliang Yao
Xiaohui Zhu
Ka Lok Man
Eng Gee Lim

Recently, visual grounding and multi-sensor settings have been incorporated into perception systems for terrestrial autonomous driving and Unmanned Surface Vessels (USVs), yet the high complexity of modern learning-based, multi-sensor visual grounding models prevents them from being deployed on USVs in real life. To this end, we design a low-power multi-task model named NanoMVG for waterway embodied perception, guiding both camera and 4D millimeter-wave radar to locate specific object(s) through natural language. NanoMVG can perform both box-level and mask-level visual grounding tasks simultaneously. Compared to other visual grounding models, NanoMVG achieves highly competitive performance on the WaterVG dataset, particularly in harsh environments. Moreover, real-world experiments with NanoMVG deployed on an embedded edge device of a USV demonstrate its fast inference speed for real-time perception and its ultra-low power consumption for long endurance.


ICRA Conference 2025 Conference Paper

Talk2Radar: Bridging Natural Language with 4D mmWave Radar for 3D Referring Expression Comprehension

Runwei Guan
Ruixiao Zhang 0001
Ningwei Ouyang
Jianan Liu
Ka Lok Man
Xiaohao Cai
Ming Xu 0011
Jeremy S. Smith

Embodied perception is essential for intelligent vehicles and robots in interactive environmental understanding. However, these advancements primarily focus on vision, with limited attention given to using 3D modeling sensors, restricting a comprehensive understanding of objects in response to prompts containing qualitative and quantitative queries. Recently, as a promising automotive sensor with affordable cost, 4D millimeter-wave radars provide denser point clouds than conventional radars and perceive both semantic and physical characteristics of objects, thereby enhancing the reliability of perception systems. To foster the development of natural language-driven context understanding in radar scenes for 3D visual grounding, we construct the first dataset, Talk2Radar, which bridges these two modalities for 3D Referring Expression Comprehension (REC). Talk2Radar contains 8,682 referring prompt samples with 20,558 referred objects. Moreover, we propose a novel model, T-RadarNet, for 3D REC on point clouds, achieving State-Of-The-Art (SOTA) performance on the Talk2Radar dataset compared to counterparts. Deformable-FPN and Gated Graph Fusion are meticulously designed for efficient point cloud feature modeling and cross-modal fusion between radar and text features, respectively. Comprehensive experiments provide deep insights into radar-based 3D REC. We release our project at https://github.com/GuanRunwei/Talk2Radar.


ECAI Conference 2024 Conference Paper

Adversarial Erasing Transformer for Weakly Supervised Semantic Segmentation

Bingfeng Zhang
Siyue Yu
Xuru Gao
Mingjie Sun
Eng Gee Lim
Jimin Xiao

Weakly supervised semantic segmentation has attracted a lot of attention recently. Previous methods can be divided into two types: single-stage training and multi-stage training. In this paper, we focus on multi-stage training for image-level weakly supervised semantic segmentation. Many recent methods have tried to use a transformer architecture as the backbone for CAM generation since it can capture global relationships to refine CAM accurately. However, we observe that such a backbone still fails to generate complete and smooth CAM. We argue that this is because the attention mechanism in the transformer can only pay attention to the most discriminative relationships. It is difficult to capture semantic-level long-range pair-wise relationships under image-level supervision. Thus, we propose an adversarial erasing transformer network called AETN, where an erasing attention mechanism is designed to establish more extensive pair-wise relationships. To cope with erasing, more target features will be forced to activate. Thus, better feature representation can be obtained for more accurate CAM generation. Besides, to further help our network learn better feature representation, we propose a self-consistent learning mechanism based on different augmentations. In this way, our AETN outperforms recent methods. Our AETN achieves 73.0 mIoU on the PASCAL VOC 2012 val set and 73.9 mIoU on the PASCAL VOC 2012 test set. Code is available at https://github.com/siyueyu/AETN.


IROS Conference 2024 Conference Paper

ASY-VRNet: Waterway Panoptic Driving Perception Model based on Asymmetric Fair Fusion of Vision and 4D mmWave Radar

Runwei Guan
Shanliang Yao
Ka Lok Man
Xiaohui Zhu
Yong Yue 0001
Jeremy S. Smith
Eng Gee Lim
Yutao Yue

Panoptic Driving Perception (PDP) is critical for the autonomous navigation of Unmanned Surface Vehicles (USVs). A PDP model typically integrates multiple tasks, necessitating the simultaneous and robust execution of various perception tasks to facilitate downstream path planning. The fusion of visual and radar sensors is currently acknowledged as a robust and cost-effective approach. However, most existing research has primarily focused on fusing visual and radar features dedicated to object detection or utilizing a shared feature space for multiple tasks, neglecting the individual representation differences between various tasks. To address this gap, we propose a pair of Asymmetric Fair Fusion (AFF) modules with favorable explainability designed to efficiently interact with independent features from both visual and radar modalities, tailored to the specific requirements of object detection and semantic segmentation tasks. The AFF modules treat image and radar maps as irregular point sets and transform these features into a crossed-shared feature space for multitasking, ensuring equitable treatment of vision and radar point cloud features. Leveraging AFF modules, we propose a novel and efficient PDP model, ASY-VRNet, which processes image and radar features based on irregular super-pixel point sets. Additionally, we propose an effective multi-task learning method specifically designed for PDP models. Compared to other lightweight models, ASY-VRNet achieves state-of-the-art performance in object detection, semantic segmentation, and drivable-area segmentation on the WaterScenes benchmark. Our project is publicly available at https://github.com/GuanRunwei/ASY-VRNet.


AAAI Conference 2021 Conference Paper

Structure-Consistent Weakly Supervised Salient Object Detection with Local Saliency Coherence

Siyue Yu
Bingfeng Zhang
Jimin Xiao
Eng Gee Lim

Sparse labels have been attracting much attention in recent years. However, the performance gap between weakly supervised and fully supervised salient object detection methods is huge, and most previous weakly supervised works adopt complex training methods with many bells and whistles. In this work, we propose a one-round end-to-end training approach for weakly supervised salient object detection via scribble annotations without pre/post-processing operations or extra supervision data. Since scribble labels fail to offer detailed salient regions, we propose a local coherence loss to propagate the labels to unlabeled regions based on image features and pixel distance, so as to predict integral salient regions with complete object structures. We design a saliency structure consistency loss as a self-consistent mechanism to ensure consistent saliency maps are predicted with different scales of the same image as input, which could be viewed as a regularization technique to enhance the model generalization ability. Additionally, we design an aggregation module (AGGM) to better integrate high-level features, low-level features and global context information for the decoder to aggregate various information. Extensive experiments show that our method achieves a new state-of-the-art performance on six benchmarks (e.g. for the ECSSD dataset: Fβ = 0.8995, Eξ = 0.9079 and MAE = 0.0489), with an average gain of 4.60% for F-measure, 2.05% for E-measure and 1.88% for MAE over the previous best method on this task. Source code is available at http://github.com/siyueyu/SCWSSOD.
