Arrow Research search

Author name cluster

Chris Xiaoxuan Lu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

20 papers
2 author rows

Possible papers (20)

AAAI Conference 2025 Conference Paper

Risk Controlled Image Retrieval

  • Kaiwen Cai
  • Chris Xiaoxuan Lu
  • Xingyu Zhao
  • Wei Huang
  • Xiaowei Huang

Most image retrieval research prioritizes improving predictive performance, often overlooking situations where the reliability of predictions is equally important. The gap between model performance and reliability requirements highlights the need for a systematic approach to analyze and address the risks associated with image retrieval. Uncertainty quantification techniques can be applied to mitigate this issue by assessing uncertainty for retrieval sets, but they provide only a heuristic estimate of uncertainty rather than a guarantee. To address these limitations, we present Risk Controlled Image Retrieval (RCIR), which generates retrieval sets with a coverage guarantee, i.e., retrieval sets that are guaranteed to contain the true nearest neighbors with a predefined probability. RCIR can be easily integrated with existing uncertainty-aware image retrieval systems and is agnostic to data distribution and model selection. To the best of our knowledge, this is the first work to provide coverage guarantees for image retrieval. The validity and efficiency of RCIR are demonstrated on four real-world datasets: CAR-196, CUB-200, Pittsburgh, and ChestX-Det.
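
The coverage guarantee described above is naturally read as split-conformal calibration. A minimal sketch under that reading; the function names, the nonconformity score, and the calibration interface are assumptions, not the paper's actual procedure:

```python
import numpy as np

def calibrate_radius(cal_dists, cal_match_masks, alpha=0.1):
    """Split-conformal calibration: choose a distance threshold so that a
    retrieval ball of this radius contains a true nearest neighbor with
    probability >= 1 - alpha on exchangeable data.
    cal_dists: list of per-query distance vectors to the gallery.
    cal_match_masks: boolean masks flagging each query's true matches."""
    scores = np.array([d[m].min() for d, m in zip(cal_dists, cal_match_masks)])
    n = len(scores)
    # Finite-sample-corrected quantile (standard split-conformal adjustment).
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level)

def retrieve_with_guarantee(query_dists, radius):
    """Retrieval set: every gallery item within the calibrated radius."""
    return np.flatnonzero(query_dists <= radius)
```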

IROS Conference 2025 Conference Paper

VISC: mmWave Radar Scene Flow Estimation using Pervasive Visual-Inertial Supervision

  • Kezhong Liu
  • Yiwen Zhou
  • Mozi Chen
  • Jianhua He
  • Jingao Xu
  • Zheng Yang 0002
  • Chris Xiaoxuan Lu
  • Shengkai Zhang

This work proposes an mmWave radar scene flow estimation framework supervised by data from a widespread visual-inertial (VI) sensor suite, allowing crowdsourced training data from smart vehicles. Current scene flow estimation methods for mmWave radar are typically supervised by dense point clouds from 3D LiDARs, which are expensive and not widely available in smart vehicles. While VI data are more accessible, visual images alone cannot capture the 3D motions of moving objects, making it difficult to supervise their scene flow. Moreover, the temporal drift of the VI rigid transformation also degrades the scene flow estimation of static points. To address these challenges, we propose a drift-free rigid transformation estimator that fuses kinematic model-based ego-motions with neural network-learned results. It provides strong supervision signals for radar-based rigid transformation estimation and infers the scene flow of static points. Then, we develop an optical-mmWave supervision extraction module that extracts the supervision signals for radar rigid transformation and scene flow. It strengthens the supervision by learning the scene flow of dynamic points under the joint constraints of optical and mmWave radar measurements. Extensive experiments demonstrate that, in smoke-filled environments, our method even outperforms state-of-the-art (SOTA) approaches using costly LiDARs.
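
One concrete piece of this pipeline is the scene flow that a rigid ego-motion induces on static points: once a drift-free rigid transformation is available, the flow of a static point follows directly. A small sketch (the interface is an assumption, not taken from the paper):

```python
import numpy as np

def static_scene_flow(points, R, t):
    """Flow induced on static points by ego-motion: a static point p seen in
    the previous frame reappears at R @ p + t in the current frame, so its
    apparent scene flow is (R @ p + t) - p.
    points: (N, 3) array; R: (3, 3) rotation; t: (3,) translation."""
    return points @ R.T + t - points
```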

IROS Conference 2024 Conference Paper

Click to Grasp: Zero-Shot Precise Manipulation via Visual Diffusion Descriptors

  • Nikolaos Tsagkas
  • Jack Rome
  • Subramanian Ramamoorthy
  • Oisin Mac Aodha
  • Chris Xiaoxuan Lu

Precise manipulation that is generalizable across scenes and objects remains a persistent challenge in robotics. Current approaches for this task heavily depend on having a significant number of training instances to handle objects with pronounced visual and/or geometric part ambiguities. Our work explores the grounding of fine-grained part descriptors for precise manipulation in a zero-shot setting by utilizing web-trained text-to-image diffusion-based generative models. We tackle the problem by framing it as a dense semantic part correspondence task. Our model returns a gripper pose for manipulating a specific part, using as reference a user-defined click from a source image of a visually different instance of the same object. We require no manual grasping demonstrations as we leverage the intrinsic object geometry and features. Practical experiments in a real-world tabletop scenario validate the efficacy of our approach, demonstrating its potential for advancing semantic-aware robotic manipulation. Web page: https://tsagkas.github.io/click2grasp
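
The dense semantic part correspondence step can be read as nearest-neighbor matching in a per-pixel feature space. A minimal sketch, assuming per-pixel descriptors (e.g., intermediate diffusion-model activations) are already extracted; all names here are hypothetical:

```python
import torch
import torch.nn.functional as F

def transfer_click(src_feats, tgt_feats, click_yx):
    """Match the descriptor under a user click in the source image to its
    most similar location in the target image via cosine similarity.
    src_feats, tgt_feats: (C, H, W) per-pixel feature maps."""
    y, x = click_yx
    query = F.normalize(src_feats[:, y, x], dim=0)    # (C,)
    keys = F.normalize(tgt_feats.flatten(1), dim=0)   # (C, H*W), unit columns
    sim = query @ keys                                # (H*W,)
    h, w = tgt_feats.shape[1:]
    return divmod(sim.argmax().item(), w)             # (row, col) in target
```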

ICRA Conference 2024 Conference Paper

Multimodal Indoor Localization Using Crowdsourced Radio Maps

  • Zhaoguang Yi
  • Xiangyu Wen 0001
  • Qiyue Xia
  • Peize Li
  • Francisco Zampella
  • Firas Alsehly
  • Chris Xiaoxuan Lu

Indoor Positioning Systems (IPS) traditionally rely on odometry and building infrastructures like WiFi, often supplemented by building floor plans for increased accuracy. However, the limitation of floor plans in terms of availability and timeliness of updates challenges their wide applicability. In contrast, the proliferation of smartphones and WiFi-enabled robots has made crowdsourced radio maps – databases pairing locations with their corresponding Received Signal Strengths (RSS) – increasingly accessible. These radio maps not only provide WiFi fingerprint-location pairs but encode movement regularities akin to the constraints imposed by floor plans. This work investigates the possibility of leveraging these radio maps as a substitute for floor plans in multimodal IPS. We introduce a new framework to address the challenges of radio map inaccuracies and sparse coverage. Our proposed system integrates an uncertainty-aware neural network model for WiFi localization and a bespoke Bayesian fusion technique for optimal fusion. Extensive evaluations on multiple real-world sites indicate a significant performance enhancement, with results showing ~25% improvement over the best baseline.
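
The Bayesian fusion step can be illustrated with the simplest case: fusing a motion-model prediction with an uncertainty-aware WiFi fix as a product of Gaussians. A sketch under that simplification (the paper's bespoke fusion technique is likely more involved):

```python
import numpy as np

def fuse_gaussian(mu_pred, cov_pred, mu_wifi, cov_wifi):
    """Product-of-Gaussians fusion of an odometry-predicted position with a
    WiFi fix; equivalent to a Kalman update with an identity observation
    model, so the more certain estimate dominates."""
    gain = cov_pred @ np.linalg.inv(cov_pred + cov_wifi)
    mu = mu_pred + gain @ (mu_wifi - mu_pred)
    cov = (np.eye(len(mu_pred)) - gain) @ cov_pred
    return mu, cov
```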

NeurIPS Conference 2024 Conference Paper

RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar

  • Fangqiang Ding
  • Xiangyu Wen
  • Yunzhou Zhu
  • Yiming Li
  • Chris Xiaoxuan Lu

The 3D occupancy-based perception pipeline has significantly advanced autonomous driving by capturing detailed scene descriptions and demonstrating strong generalizability across various object categories and shapes. Current methods predominantly rely on LiDAR or camera inputs for 3D occupancy prediction. These methods are susceptible to adverse weather conditions, limiting the all-weather deployment of self-driving cars. To improve perception robustness, we leverage the recent advances in automotive radars and introduce a novel approach that utilizes 4D imaging radar sensors for 3D occupancy prediction. Our method, RadarOcc, circumvents the limitations of sparse radar point clouds by directly processing the 4D radar tensor, thus preserving essential scene details. RadarOcc innovatively addresses the challenges associated with the voluminous and noisy 4D radar data by employing Doppler-bin descriptors, sidelobe-aware spatial sparsification, and range-wise self-attention mechanisms. To minimize the interpolation errors associated with direct coordinate transformations, we also devise a spherical-based feature encoding followed by spherical-to-Cartesian feature aggregation. We benchmark various baseline methods based on distinct modalities on the public K-Radar dataset. The results demonstrate RadarOcc's state-of-the-art performance in radar-based 3D occupancy prediction and promising results even when compared with LiDAR- or camera-based methods. Additionally, we present qualitative evidence of the superior performance of 4D radar in adverse weather conditions and explore the impact of key pipeline components through ablation studies.
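
The spherical-to-Cartesian aggregation exists because 4D radar tensors are indexed by range/azimuth/elevation bins rather than Cartesian voxels. A sketch of the coordinate mapping such an aggregation rests on (axis conventions assumed; RadarOcc's actual encoding is learned):

```python
import numpy as np

def spherical_to_cartesian(r, az, el):
    """Map radar-tensor bin centers (range, azimuth, elevation in radians)
    to Cartesian coordinates, so per-bin features can be aggregated into a
    Cartesian voxel grid instead of interpolating in spherical space."""
    x = r * np.cos(el) * np.cos(az)
    y = r * np.cos(el) * np.sin(az)
    z = r * np.sin(el)
    return np.stack([x, y, z], axis=-1)
```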

ICRA Conference 2024 Conference Paper

RaTrack: Moving Object Detection and Tracking with 4D Radar Point Cloud

  • Zhijun Pan
  • Fangqiang Ding
  • Hantao Zhong
  • Chris Xiaoxuan Lu

Mobile autonomy relies on the precise perception of dynamic environments. Robustly tracking moving objects in the 3D world thus plays a pivotal role for applications like trajectory prediction, obstacle avoidance, and path planning. While most current methods utilize LiDARs or cameras for Multiple Object Tracking (MOT), the capabilities of 4D imaging radars remain largely unexplored. Recognizing the challenges posed by radar noise and point sparsity in 4D radar data, we introduce RaTrack, an innovative solution tailored for radar-based tracking. Bypassing the typical reliance on specific object types and 3D bounding boxes, our method focuses on motion segmentation and clustering, enriched by a motion estimation module. Evaluated on the View-of-Delft dataset, RaTrack showcases superior tracking precision of moving objects, largely surpassing the performance of the state of the art. We release our code and model at https://github.com/LJacksonPan/RaTrack.
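
The "motion segmentation and clustering" route can be sketched in a few lines: keep the points a network scores as moving, then group them into class-agnostic instances with density-based clustering. The threshold, radius, and scoring network below are assumptions:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_moving_points(points, motion_scores, thresh=0.5, eps=1.0):
    """Class-agnostic instance extraction: filter points by a per-point
    motion score, then cluster the survivors (no 3D boxes, no class labels).
    Returns the moving points and their cluster labels (-1 = noise)."""
    moving = points[motion_scores > thresh]
    if len(moving) == 0:
        return moving, np.empty(0, dtype=int)
    labels = DBSCAN(eps=eps, min_samples=3).fit_predict(moving)
    return moving, labels
```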

ICRA Conference 2024 Conference Paper

Robust 3D Object Detection from LiDAR-Radar Point Clouds via Cross-Modal Feature Augmentation

  • Jianning Deng
  • Gabriel Chan
  • Hantao Zhong
  • Chris Xiaoxuan Lu

This paper presents a novel framework for robust 3D object detection from point clouds via cross-modal hallucination. Our proposed approach is agnostic to the hallucination direction between LiDAR and 4D radar. We introduce multiple alignments on both spatial and feature levels to achieve simultaneous backbone refinement and hallucination generation. Specifically, spatial alignment is proposed to deal with the geometry discrepancy for better instance matching between LiDAR and radar. The feature alignment step further bridges the intrinsic attribute gap between the sensing modalities and stabilizes the training. The trained object detection models can better handle difficult detection cases, even though only single-modal data is used as input during the inference stage. Extensive experiments on the View-of-Delft (VoD) dataset show that our proposed method outperforms the state-of-the-art (SOTA) methods for both radar and LiDAR object detection while maintaining competitive efficiency in runtime.
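
The feature-alignment idea can be sketched as an auxiliary loss that pulls features hallucinated from one modality toward the real features of the other for matched instances. A schematic reading, not the paper's exact objective:

```python
import torch.nn.functional as F

def hallucination_loss(real_feats, hallucinated_feats, detection_loss, w=1.0):
    """Train-time cross-modal hallucination: regress hallucinated features
    (e.g., radar-to-LiDAR) onto the real counterpart's features, detached so
    only the hallucination branch is pulled, alongside the detection loss."""
    align = F.mse_loss(hallucinated_feats, real_feats.detach())
    return detection_loss + w * align
```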

IROS Conference 2023 Conference Paper

Feature-based Visual Odometry for Bronchoscopy: A Dataset and Benchmark

  • Jianning Deng
  • Peize Li
  • Kevin Dhaliwal
  • Chris Xiaoxuan Lu
  • Mohsen Khadem

Bronchoscopy is a medical procedure that involves the insertion of a flexible tube with a camera into the airways to survey, diagnose and treat lung diseases. Due to the complex branching anatomical structure of the bronchial tree and the similarity of the inner surfaces of the segmental airways, navigation systems are now being routinely used to guide the operator during procedures to access the lung periphery. Current navigation systems rely on sensor-integrated bronchoscopes to track the position of the bronchoscope in real-time. This approach has limitations, including increased cost and limited use in non-specialized settings. To address this issue, researchers have proposed visual odometry algorithms to track the bronchoscope camera without the need for external sensors. However, due to the lack of publicly available datasets, limited progress has been made. To this end, we have developed a database of bronchoscopy videos in a phantom lung model and ex-vivo human lungs. The dataset contains 34 video sequences with over 23,000 frames and odometry ground truth collected using electromagnetic tracking sensors. With our dataset, we empower the robotics and machine learning community to advance the field. We share our insights on challenges in endoscopic visual odometry. Furthermore, we provide benchmark results for this dataset. State-of-the-art feature extraction algorithms including SIFT, ORB, SuperPoint, Shi-Tomasi, and LoFTR are tested on this dataset. The benchmark results demonstrate that the LoFTR algorithm outperforms other approaches, but still has significant errors in the presence of rapid movements and occlusions.
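
A single step of the feature-based visual odometry being benchmarked looks roughly like this with OpenCV (ORB here; parameters are illustrative, and monocular translation is recovered only up to scale):

```python
import cv2
import numpy as np

def two_view_pose(img1, img2, K):
    """One VO step: detect/describe ORB keypoints, match with cross-checked
    brute force, estimate the essential matrix with RANSAC, and decompose
    it into a relative rotation and (up-to-scale) translation."""
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    p1 = np.float32([k1[m.queryIdx].pt for m in matches])
    p2 = np.float32([k2[m.trainIdx].pt for m in matches])
    E, inliers = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, p1, p2, K, mask=inliers)
    return R, t
```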

NeurIPS Conference 2023 Conference Paper

MM-Fi: Multi-Modal Non-Intrusive 4D Human Dataset for Versatile Wireless Sensing

  • Jianfei Yang
  • He Huang
  • Yunjiao Zhou
  • Xinyan Chen
  • Yuecong Xu
  • Shenghai Yuan
  • Han Zou
  • Chris Xiaoxuan Lu

4D human perception plays an essential role in a myriad of applications, such as home automation and metaverse avatar simulation. However, existing solutions which mainly rely on cameras and wearable devices are either privacy intrusive or inconvenient to use. To address these issues, wireless sensing has emerged as a promising alternative, leveraging LiDAR, mmWave radar, and WiFi signals for device-free human sensing. In this paper, we propose MM-Fi, the first multi-modal non-intrusive 4D human dataset with 27 daily or rehabilitation action categories, to bridge the gap between wireless sensing and high-level human perception tasks. MM-Fi consists of over 320k synchronized frames of five modalities from 40 human subjects. Various annotations are provided to support potential sensing tasks, e.g., human pose estimation and action recognition. Extensive experiments have been conducted to compare the sensing capacity of each or several modalities in terms of multiple tasks. We envision that MM-Fi can contribute to wireless sensing research with respect to action recognition, human pose estimation, multi-modal learning, cross-modal supervision, and interdisciplinary healthcare research.

IROS Conference 2023 Conference Paper

RADA: Robust Adversarial Data Augmentation for Camera Localization in Challenging Conditions

  • Jialu Wang
  • Muhamad Risqi Utama Saputra
  • Chris Xiaoxuan Lu
  • Niki Trigoni
  • Andrew Markham

Camera localization is a fundamental problem for many applications in computer vision, robotics, and autonomy. Despite recent deep learning-based approaches, robustness in challenging conditions remains lacking due to changes in appearance caused by texture-less planes, repeating structures, reflective surfaces, motion blur, and illumination changes. Data augmentation is an attractive solution, but standard image perturbation methods fail to improve localization robustness. To address this, we propose RADA, which concentrates on perturbing the most vulnerable pixels, generating fewer but more effective image perturbations that perplex the network. Our method outperforms previous augmentation techniques, achieving up to twice the accuracy of state-of-the-art models even under ‘unseen’ challenging weather conditions. Videos of our results can be found at https://youtu.be/niOv7-fJeCA. The source code for RADA is publicly available at https://github.com/jialuwang123321/RADA.
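
The "most vulnerable pixels" idea can be sketched as a saliency-masked adversarial step: rank pixels by the gradient magnitude of the localization loss and perturb only the top-k. A hedged sketch; the model, loss, and hyperparameters below are placeholders:

```python
import torch

def perturb_vulnerable_pixels(model, img, target, loss_fn, eps=0.03, k=1000):
    """FGSM-style step restricted to the k pixels where the loss gradient is
    largest, so the perturbation stays small but still perplexes the network.
    img: (C, H, W) in [0, 1]."""
    img = img.clone().requires_grad_(True)
    loss_fn(model(img.unsqueeze(0)), target).backward()
    saliency = img.grad.abs().sum(dim=0)                 # (H, W) per-pixel saliency
    mask = torch.zeros_like(saliency).flatten()
    mask[saliency.flatten().topk(k).indices] = 1.0
    mask = mask.view_as(saliency)                        # keep only top-k pixels
    return (img + eps * img.grad.sign() * mask).detach().clamp(0, 1)
```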

ICRA Conference 2022 Conference Paper

AutoPlace: Robust Place Recognition with Single-chip Automotive Radar

  • Kaiwen Cai
  • Bing Wang 0013
  • Chris Xiaoxuan Lu

This paper presents a novel place recognition approach for autonomous vehicles using low-cost, single-chip automotive radar. Aimed at improving recognition robustness and fully exploiting the rich information provided by this emerging automotive radar, our approach follows a principled pipeline that comprises (1) dynamic point removal from instant Doppler measurements, (2) spatial-temporal feature embedding on radar point clouds, and (3) retrieved-candidate refinement from Radar Cross Section measurements. Extensive experimental results on the public nuScenes dataset demonstrate that existing visual/LiDAR/spinning-radar place recognition approaches are less suitable for single-chip automotive radar. In contrast, our purpose-built approach for automotive radar consistently outperforms a variety of baseline methods across a comprehensive set of metrics, providing insights into its efficacy in a realistic system.
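
Stage (1), dynamic point removal, follows from a simple consistency check: a static point's Doppler reading should equal the projection of the ego-velocity onto the line of sight. A sketch of that check (sign conventions and the tolerance are assumptions):

```python
import numpy as np

def remove_dynamic_points(points, doppler, ego_velocity, tol=0.5):
    """Drop points whose measured radial (Doppler) velocity disagrees with
    the value a static point would show given the ego-velocity.
    points: (N, 3); doppler: (N,) radial speeds; ego_velocity: (3,)."""
    los = points / np.linalg.norm(points, axis=1, keepdims=True)
    expected = los @ ego_velocity    # static-world radial speed (up to sign)
    return points[np.abs(doppler - expected) < tol]
```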

ICRA Conference 2022 Conference Paper

DC-Loc: Accurate Automotive Radar Based Metric Localization with Explicit Doppler Compensation

  • Pengen Gao
  • Shengkai Zhang
  • Wei Wang 0050
  • Chris Xiaoxuan Lu

Automotive mmWave radar has been widely used in the automotive industry due to its small size, low cost, and complementary advantages over optical sensors (e.g., cameras, LiDAR) in adverse weather, e.g., fog, rain, and snow. On the other hand, its large wavelength also poses fundamental challenges for perceiving the environment. Recent advances have made breakthroughs on its inherent drawbacks, i.e., the multipath reflection and the sparsity of mmWave radar's point clouds. However, the frequency-modulated continuous wave modulation of radar signals makes it more sensitive to vehicles’ mobility than optical sensors. This work focuses on the problem of frequency shift, i.e., the Doppler effect that distorts the radar ranging measurements, and its knock-on effect on metric localization. We propose a new radar-based metric localization framework, termed DC-Loc, which obtains more accurate location estimates by restoring the Doppler distortion. Specifically, we first design a new algorithm that explicitly compensates for the Doppler distortion of radar scans and then model the measurement uncertainty of the Doppler-compensated point cloud to further optimize the metric localization. Extensive experiments using the public nuScenes dataset and the CARLA simulator demonstrate that our method outperforms the state-of-the-art approach by 25.2% and 5.6% in terms of translation and rotation errors, respectively.
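
The range distortion being compensated stems from FMCW range-Doppler coupling: for a sawtooth chirp, the beat frequency mixes true range with a Doppler term, biasing the measured range by roughly v_r·f_c·T_chirp/B. A first-order sketch under that textbook model (DC-Loc's actual compensation and uncertainty modeling are more involved):

```python
def compensate_doppler_range(r_measured, v_radial, f_c, bandwidth, t_chirp):
    """First-order FMCW range-Doppler compensation: subtract the apparent
    range bias v_r * f_c * T_chirp / B induced by the target's radial
    velocity (sign depends on chirp direction and sign conventions)."""
    return r_measured - v_radial * f_c * t_chirp / bandwidth
```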

AAMAS Conference 2022 Conference Paper

Multiagent Model-based Credit Assignment for Continuous Control

  • Dongge Han
  • Chris Xiaoxuan Lu
  • Tomasz Michalak
  • Michael Wooldridge

Deep reinforcement learning (RL) has recently shown great promise in robotic continuous control tasks. Nevertheless, prior research in this vein centres on the centralised learning setting, which largely relies on communication availability among all the components of a robot. However, agents in the real world often operate in a decentralised fashion without communication due to latency requirements, limited power budgets and safety concerns. By formulating robotic components as a system of decentralised agents, this work presents a decentralised multiagent reinforcement learning framework for continuous control. To this end, we first develop a cooperative multiagent PPO framework that allows for centralised optimisation during training and decentralised operation during execution. However, the system only receives a global reward signal which is not attributed to each agent. To address this challenge, we further propose a generic game-theoretic credit assignment framework which computes agent-specific reward signals. We also incorporate a model-based RL module into our credit assignment framework, which leads to significant improvements in sample efficiency. Finally, we empirically demonstrate the effectiveness of our framework on MuJoCo locomotion control tasks.
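
The game-theoretic credit assignment can be illustrated with exact Shapley values, which split a global return into per-agent credits from coalition evaluations. The brute-force sketch below is exponential in the number of agents; the paper's framework would use a tractable estimator, and coalition_value is a placeholder:

```python
import itertools
import math

def shapley_credits(agents, coalition_value):
    """Exact Shapley credit: phi_i = sum over coalitions S not containing i
    of |S|!(n-|S|-1)!/n! * [v(S + {i}) - v(S)], where coalition_value(S) is
    the (e.g., model-estimated) return when only agents in S act."""
    n = len(agents)
    credits = {a: 0.0 for a in agents}
    for a in agents:
        others = [b for b in agents if b != a]
        for r in range(n):
            for S in itertools.combinations(others, r):
                w = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
                credits[a] += w * (coalition_value(set(S) | {a}) - coalition_value(set(S)))
    return credits
```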

IROS Conference 2022 Conference Paper

OdomBeyondVision: An Indoor Multi-modal Multi-platform Odometry Dataset Beyond the Visible Spectrum

  • Peize Li
  • Kaiwen Cai
  • Muhamad Risqi Utama Saputra
  • Zhuangzhuang Dai
  • Chris Xiaoxuan Lu

This paper presents a multimodal indoor odometry dataset, OdomBeyondVision, featuring multiple sensors across different parts of the spectrum and collected with different mobile platforms. Not only does OdomBeyondVision contain traditional navigation sensors such as IMUs, mechanical LiDAR, and an RGB-D camera, it also includes several emerging sensors such as the single-chip mmWave radar, LWIR thermal camera and solid-state LiDAR. With the above sensors mounted on UAV, UGV and handheld platforms, we recorded multimodal odometry data and the corresponding movement trajectories in various indoor scenes and different illumination conditions. We release exemplar radar, radar-inertial and thermal-inertial odometry implementations and their results for future work to compare against and improve upon. The full dataset including toolkit and documentation is publicly available at: https://github.com/MAPS-Lab/OdomBeyondVision.

IROS Conference 2022 Conference Paper

STUN: Self-Teaching Uncertainty Estimation for Place Recognition

  • Kaiwen Cai
  • Chris Xiaoxuan Lu
  • Xiaowei Huang 0001

Place recognition is key to Simultaneous Localization and Mapping (SLAM) and spatial perception. However, place recognition in the wild often suffers from erroneous predictions due to image variations, e.g., changing viewpoints and street appearance. Integrating uncertainty estimation into the life cycle of place recognition is a promising method to mitigate the impact of variations on place recognition performance. However, existing uncertainty estimation approaches in this vein are either computationally inefficient (e.g., Monte Carlo dropout) or come at the cost of reduced accuracy. This paper proposes STUN, a self-teaching framework that learns to simultaneously predict the place and estimate the prediction uncertainty given an input image. To this end, we first train a teacher net using a standard metric learning pipeline to produce embedding priors. Then, supervised by the pretrained teacher net, a student net with an additional variance branch is trained to finetune the embedding priors and estimate the uncertainty sample by sample. During the online inference phase, we only use the student net to generate a place prediction in conjunction with the uncertainty. Compared with place recognition systems that are ignorant of uncertainty, our framework provides uncertainty estimation for free without sacrificing any prediction accuracy. Our experimental results on the large-scale Pittsburgh30k dataset demonstrate that STUN outperforms state-of-the-art methods in both recognition accuracy and the quality of uncertainty estimation.
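
The student's variance branch is naturally trained with a heteroscedastic ("attenuated") regression loss against the frozen teacher's embeddings. A minimal sketch consistent with that description (tensor names and shapes are assumptions):

```python
import torch

def student_loss(student_mu, student_logvar, teacher_emb):
    """Distill the teacher embedding while learning per-sample uncertainty:
    large predicted variance downweights the residual but is penalized by
    the log-variance term, so the student cannot inflate it for free."""
    var = student_logvar.exp()
    residual = (student_mu - teacher_emb.detach()) ** 2
    return (residual / (2.0 * var) + 0.5 * student_logvar).mean()
```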

ICRA Conference 2021 Conference Paper

3D Motion Capture of an Unmodified Drone with Single-chip Millimeter Wave Radar

  • Peijun Zhao
  • Chris Xiaoxuan Lu
  • Bing Wang 0013
  • Niki Trigoni
  • Andrew Markham

Accurate motion capture of aerial robots in 3D is a key enabler for autonomous operation in indoor environments such as warehouses or factories, as well as for driving forward research in these areas. The most commonly used solutions at present are optical motion capture (e.g., VICON) and ultra-wideband (UWB), but these are costly and cumbersome to deploy, due to their requirement of multiple cameras/anchors spaced around the tracking area. They also require the drone to be modified to carry an active or passive marker. In this work, we present an inexpensive system that can be rapidly installed, based on single-chip millimeter wave (mmWave) radar. Importantly, the drone does not need to be modified or equipped with any markers, as we exploit the Doppler signals from the rotating propellers. Furthermore, 3D tracking is possible from a single point, greatly simplifying deployment. We develop a novel deep neural network and demonstrate decimeter-level 3D tracking at 10 Hz, achieving better performance than classical baselines. Our hope is that this low-cost system will act to catalyse inexpensive drone research and increased autonomy.

AAAI Conference 2020 Conference Paper

AtLoc: Attention Guided Camera Localization

  • Bing Wang
  • Changhao Chen
  • Chris Xiaoxuan Lu
  • Peijun Zhao
  • Niki Trigoni
  • Andrew Markham

Deep learning has achieved impressive results in camera localization, but current single-image techniques typically suffer from a lack of robustness, leading to large outliers. To some extent, this has been tackled by sequential (multi-image) or geometry-constraint approaches, which can learn to reject dynamic objects and illumination conditions to achieve better performance. In this work, we show that attention can be used to force the network to focus on more geometrically robust objects and features, achieving state-of-the-art performance on common benchmarks, even when using only a single image as input. Extensive experimental evidence is provided through public indoor and outdoor datasets. Through visualization of the saliency maps, we demonstrate how the network learns to reject dynamic objects, yielding superior global camera pose regression performance. The source code is available at https://github.com/BingCS/AtLoc.
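
The attention mechanism can be sketched as self-attention over image features ahead of the pose regressor, so the network reweights features toward geometrically stable structure. A schematic PyTorch module in that spirit (dimensions and the log-quaternion head are assumptions, not AtLoc's exact architecture):

```python
import torch
import torch.nn as nn

class AttentionPoseHead(nn.Module):
    def __init__(self, dim=2048):
        super().__init__()
        self.q, self.k = nn.Linear(dim, dim // 8), nn.Linear(dim, dim // 8)
        self.v = nn.Linear(dim, dim)
        self.fc_xyz = nn.Linear(dim, 3)      # translation
        self.fc_logq = nn.Linear(dim, 3)     # rotation as a log quaternion

    def forward(self, feats):                # feats: (B, N, dim) spatial tokens
        # Scaled dot-product self-attention over spatial locations.
        q, k, v = self.q(feats), self.k(feats), self.v(feats)
        attn = torch.softmax(q @ k.transpose(1, 2) / (q.size(-1) ** 0.5), dim=-1)
        ctx = (attn @ v).mean(dim=1)         # attended global descriptor
        return self.fc_xyz(ctx), self.fc_logq(ctx)
```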

ICRA Conference 2020 Conference Paper

Heart Rate Sensing with a Robot Mounted mmWave Radar

  • Peijun Zhao
  • Chris Xiaoxuan Lu
  • Bing Wang 0013
  • Changhao Chen
  • Linhai Xie
  • Mengyu Wang
  • Niki Trigoni
  • Andrew Markham

Heart rate monitoring at home is a useful metric for assessing health, e.g., of the elderly or patients in post-operative recovery. Although non-contact heart rate monitoring has been widely explored, typically using a static, wall-mounted device, measurements are limited to a single room and sensitive to user orientation and position. In this work, we propose mBeats, a robot-mounted millimeter wave (mmWave) radar system that provides periodic heart rate measurements under different user poses, without interfering with a user's daily activities. mBeats contains an mmWave servoing module that adaptively adjusts the sensor angle to achieve the best reflection profile. Furthermore, mBeats features a deep neural network predictor, which can estimate heart rate from the lower leg and additionally provides estimation uncertainty. Through extensive experiments, we demonstrate accurate and robust operation of mBeats in a range of scenarios. We believe that by integrating mobility and adaptability, mBeats can empower many downstream healthcare applications at home, such as palliative care, post-operative rehabilitation and telemedicine.

AAAI Conference 2019 Conference Paper

MotionTransformer: Transferring Neural Inertial Tracking between Domains

  • Changhao Chen
  • Yishu Miao
  • Chris Xiaoxuan Lu
  • Linhai Xie
  • Phil Blunsom
  • Andrew Markham
  • Niki Trigoni

Inertial information processing plays a pivotal role in ego-motion awareness for mobile agents, as inertial measurements are entirely egocentric and not environment dependent. However, they are affected greatly by changes in sensor placement/orientation or motion dynamics, and it is infeasible to collect labelled data from every domain. To overcome the challenges of domain adaptation on long sensory sequences, we propose MotionTransformer, a novel framework that extracts domain-invariant features from raw sequences of arbitrary domains and transforms them to new domains without any paired data. Through experiments, we demonstrate that it is able to efficiently and effectively convert a raw sequence from a new unlabelled target domain into an accurate inertial trajectory, benefiting from the motion knowledge transferred from the labelled source domain. We also conduct real-world experiments to show our framework can reconstruct physically meaningful trajectories from raw IMU measurements obtained with a standard mobile phone in various attachments.
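
One standard way to obtain the domain-invariant features described here is adversarial feature alignment; the sketch below uses a gradient-reversal layer as a stand-in (the paper itself uses a GAN-based transformation framework, so this is illustrative only, and all names are hypothetical):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward pass:
    the encoder is pushed to produce features the domain classifier cannot
    separate, i.e., domain-invariant features."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

def domain_adversarial_step(encoder, domain_clf, imu_seq, domain_label, bce):
    feats = encoder(imu_seq)                       # shared IMU sequence encoder
    logits = domain_clf(GradReverse.apply(feats))
    return bce(logits, domain_label)               # added to the tracking loss
```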