Arrow Research search

Author name cluster

Xiaohan Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers
2 author rows

Possible papers

9

AAAI Conference 2026 Conference Paper

Segment and Matte Anything in a Unified Model

  • Zezhong Fan
  • Xiaohan Li
  • Topojoy Biswas
  • Kaushiki Nag
  • Kannan Achan

Segment Anything (SAM) has recently pushed the boundaries of segmentation by demonstrating remarkable zero-shot generalization and flexible prompting after training on over one billion masks. Despite this, its mask prediction accuracy often falls short of the precision required in real-world applications. While several refinement modules have been proposed to boost SAM's segmentation quality, achieving highly accurate object delineation within a single, unified framework remains an open challenge. Furthermore, interactive image matting—which aims to generate fine-grained alpha mattes guided by diverse user hints—has not yet been explored in the context of SAM. Insights from recent studies highlight strong correlations between segmentation and matting, suggesting the feasibility of a unified model capable of both tasks. In this paper, we introduce Segment And Matte Anything (SAMA), a lightweight extension of SAM that delivers high-quality interactive image segmentation and matting with minimal extra parameters or computational cost. Our Multi-View Localization Encoder (MVLE) captures detailed features from local views, while the Localization Adapter (Local-Adapter) refines mask outputs by recovering subtle boundary details. We also incorporate a dedicated prediction head for each task into the architecture, generating segmentation and matting outputs simultaneously. Trained on a diverse dataset aggregated from publicly available sources, SAMA achieves state-of-the-art performance across multiple segmentation and matting benchmarks, showcasing its adaptability and effectiveness in a wide range of downstream tasks.

ECAI Conference 2025 Conference Paper

CG-FedLLM: How to Compress Gradients in Federated Fine-Tuning for Large Language Models

  • Huiwen Wu
  • Xiaogang Xu 0002
  • Deyi Zhang
  • Xiaohan Li
  • Jiafei Wu
  • Zhe Liu 0001

The success of current Large Language Models (LLMs) hinges on extensive training data that are collected and stored centrally, called Centralized Learning (CL). However, such a collection manner poses a privacy threat, and one potential solution is Federated Learning (FL), which transfers gradients, not raw data, among clients. Unlike traditional networks, FL for LLMs incurs significant communication costs due to their tremendous parameters. In this study, we introduce an innovative approach to compress gradients to improve communication efficiency during LLM FL, formulating the new FL pipeline named CG-FedLLM. This approach integrates an encoder on the client side to acquire the compressed gradient features and a decoder on the server side to reconstruct the gradients. We also develop a novel training strategy that comprises Temporal-ensemble Gradient-Aware Pre-training (TGAP) to identify characteristic gradients of the target model and Federated AutoEncoder-Involved Fine-tuning (FAF) to compress gradients adaptively. Extensive experiments confirm that our approach reduces communication costs and improves performance (e.g., an average 3-point improvement over traditional CL- and FL-based fine-tuning with several foundation models on the well-recognized MMLU and C-Eval benchmarks). This is because our encoder-decoder, trained via TGAP and FAF, can filter gradients while selectively preserving critical features. Furthermore, we present a series of experimental analyses that focus on communication efficiency, accuracy, and generalization ability within this privacy-centric framework, providing insights into the development of more efficient and private LLM fine-tuning.
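
The gradient-compression pipeline described above can be sketched with a toy linear autoencoder. This is an illustration only: the paper trains its encoder-decoder via TGAP and FAF, whereas the sketch below substitutes a random projection and its pseudo-inverse, with all dimensions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; real LLM gradients are far larger.
GRAD_DIM, CODE_DIM = 4096, 256

# Stand-in for the trained encoder/decoder pair: a random projection
# and its pseudo-inverse (the paper learns these via TGAP and FAF).
W_enc = rng.standard_normal((CODE_DIM, GRAD_DIM)) / np.sqrt(GRAD_DIM)
W_dec = np.linalg.pinv(W_enc)

def client_compress(grad):
    """Client side: transmit the low-dimensional code, not the raw gradient."""
    return W_enc @ grad

def server_reconstruct(code):
    """Server side: reconstruct an approximate gradient from the code."""
    return W_dec @ code

grad = rng.standard_normal(GRAD_DIM)
code = client_compress(grad)
recon = server_reconstruct(code)
print(code.size / grad.size)  # communication ratio: 0.0625
```

The key point is that only `code` crosses the network, cutting per-round communication by the ratio CODE_DIM/GRAD_DIM.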

IROS Conference 2025 Conference Paper

CODE: COllaborative Visual-UWB SLAM for Online Large-Scale Metric DEnse Mapping

  • Lin Chen 0042
  • Xuan Jia
  • Shuhui Bu
  • Guangming Wang 0001
  • Kun Li
  • Zhenyu Xia
  • Xiaohan Li
  • Pengcheng Han

This paper presents a novel collaborative online dense mapping system for multiple Unmanned Aerial Vehicles (UAVs). The system confers two primary benefits: it facilitates simultaneous UAV co-localization and real-time dense map reconstruction, and it recovers the metric scale even in GNSS-denied conditions. To achieve these advantages, Ultra-wideband (UWB) measurements, monocular Visual Odometry (VO), and co-visibility observations are jointly employed to recover both relative positions and global UAV poses, thereby ensuring optimality at both local and global scales. In the proposed methodology, a two-stage optimization strategy reduces the optimization burden. Initially, relative Sim3 transformations among UAVs are swiftly estimated, with UWB measurements facilitating metric scale recovery in the absence of GNSS. Subsequently, a global pose optimization is performed to effectively mitigate cumulative drift. By integrating UWB, VO, and co-visibility data within this framework, both local geometric consistency and global pose accuracy are robustly maintained. Through comprehensive simulation and empirical real-world testing, we demonstrate that our system not only improves UAV positioning accuracy in challenging scenarios but also facilitates the high-quality, online integration of dense point clouds in large-scale areas. This research offers valuable contributions and practical techniques for precise, real-time map reconstruction using an autonomous UAV fleet, particularly in GNSS-denied environments.
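
The UWB-based metric scale recovery mentioned above can be illustrated with a toy one-parameter fit. This is not the paper's estimator (which solves full Sim3 transformations); it only shows why metric UWB ranges pin down the scale that monocular VO leaves unknown. All numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# Monocular VO recovers inter-UAV distances only up to an unknown scale;
# UWB ranging measures the same distances in metres (with small noise).
true_scale = 2.5
d_vo = np.array([1.0, 2.0, 1.5, 3.0])                 # scale-free VO distances
d_uwb = true_scale * d_vo + rng.normal(0.0, 0.01, 4)  # metric UWB ranges

# Closed-form least squares for the single scale factor s:
# minimize ||s * d_vo - d_uwb||^2  =>  s = <d_vo, d_uwb> / <d_vo, d_vo>
s = d_vo @ d_uwb / (d_vo @ d_vo)
print(s)  # close to 2.5
```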

AAAI Conference 2025 Conference Paper

DR-Encoder: Encode Low-rank Gradients with Random Prior for Large Language Models Differentially Privately

  • Huiwen Wu
  • Deyi Zhang
  • Xiaohan Li
  • Xiaogang Xu
  • Jiafei Wu
  • Zhe Liu

The emergence of the large language model (LLM) has shown its superiority in a wide range of disciplines, including language understanding and translation, relational logic reasoning, and even solving partial differential equations. The Transformer is the pervasive backbone architecture for foundation model construction. It is vital to research how to adjust the Transformer architecture to achieve an end-to-end privacy guarantee in LLM fine-tuning. This paper investigates three potential information leaks during a federated fine-tuning procedure for LLMs (FedLLM). Based on the potential information leakage, we insert two-stage randomness into FedLLM to provide an end-to-end privacy guarantee. The first stage trains a gradient auto-encoder with a Gaussian random prior based on the statistical information of the gradients generated by local clients. The second stage fine-tunes the overall LLM with a differential privacy guarantee by adding appropriately calibrated Gaussian noise. We show our proposed method's efficiency and accuracy gains with several foundation models and two popular evaluation benchmarks. Furthermore, we present a comprehensive privacy analysis with Gaussian Differential Privacy (GDP) and Rényi Differential Privacy (RDP).
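
The second randomness stage described above is, in essence, the standard Gaussian mechanism. The sketch below shows the generic clip-then-noise step; the paper's contribution lies in how the noise is calibrated for an end-to-end guarantee, which this toy function with illustrative parameters does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

def clip_and_noise(grad, clip_norm=1.0, sigma=1.0):
    """Generic Gaussian mechanism: bound the gradient's L2 norm, then add
    isotropic Gaussian noise scaled to that bound (illustrative parameters)."""
    scale = min(1.0, clip_norm / (np.linalg.norm(grad) + 1e-12))
    clipped = grad * scale
    return clipped + rng.normal(0.0, sigma * clip_norm, size=grad.shape)

g = np.full(100, 1.0)          # raw gradient with L2 norm 10
private_g = clip_and_noise(g)  # what the optimizer actually sees
```

Accounting for the privacy loss of repeated applications of this step is exactly where GDP/RDP analyses such as the paper's come in.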

AAMAS Conference 2025 Conference Paper

OGS-SLAM: Hybrid ORB-Gaussian Splatting SLAM

  • Xiaohan Li
  • Wenxiang Shen
  • Dong Liu
  • Jun Wu

Traditional visual SLAM systems (dense and sparse) focus on building metric maps, but their internal representations are misaligned with human vision, making them insufficient for assisting robots in scene perception and interpretation. Conversely, aligning robot scene representation with human vision enables more intuitive human-to-robot commands and improves the generalization capability of deployed neural networks trained on natural images. Neural scene representation-based visual SLAM systems, with their consistent and high-fidelity mapping, provide a novel way to assist robots in detailed scene depiction and comprehensive perception. However, end-to-end methods suffer from low accuracy in robot localization, which inevitably degrades mapping quality and limits their practical applications. In this paper, we propose a robust hybrid SLAM system, named OGS-SLAM, which integrates traditional visual SLAM with 3D Gaussian Splatting (3D GS) mapping. This system inherits the high localization accuracy of traditional SLAM while providing a scene model that aligns with human cognition, thereby offering a reliable foundation for downstream human-robot interaction tasks. Experiments demonstrate that our method outperforms state-of-the-art (SOTA) end-to-end SLAM systems in localization, mapping, and map semantic segmentation. Code will be available at: https://github.com/realXiaohan/OGS-SLAM.

AAAI Conference 2024 Conference Paper

CTO-SLAM: Contour Tracking for Object-Level Robust 4D SLAM

  • Xiaohan Li
  • Dong Liu
  • Jun Wu

The demand for 4D (3D+time) SLAM systems is increasingly urgent, especially for decision-making and scene understanding. However, most existing simultaneous localization and mapping (SLAM) systems primarily assume static environments. They fail to represent dynamic scenarios due to the challenge of establishing robust long-term spatiotemporal associations in dynamic object tracking. We address this limitation and propose CTO-SLAM, a monocular and RGB-D object-level 4D SLAM system that tracks moving objects and estimates their motion simultaneously. In this paper, we propose contour tracking, which introduces contour features to enhance the keypoint representation of dynamic objects and couples it with pixel tracking to achieve long-term robust object tracking. Based on contour tracking, we propose a novel sampling-based object pose initialization algorithm and a correspondingly adapted bundle adjustment (BA) optimization algorithm to estimate dynamic object poses with high accuracy. The CTO-SLAM system is verified on the KITTI and VKITTI datasets. The experimental results demonstrate that our system effectively addresses cumulative errors in long-term spatiotemporal association and hence obtains substantial improvements over state-of-the-art systems. The source code is available at https://github.com/realXiaohan/CTO-SLAM.

IROS Conference 2024 Conference Paper

mini-PointNetPlus: A Local Feature Descriptor in Deep Learning Model for Real-time 3D Environment Perception

  • Chuanyu Luo
  • Nuo Cheng
  • Sikun Ma
  • Jun Xiang
  • Xiaohan Li
  • Shengguang Lei
  • Pu Li 0001

Common deep learning models for real-time 3D environment perception often use pillarization/voxelization methods to convert point cloud data into pillars/voxels and then process them with a 2D/3D convolutional neural network (CNN). The pioneering work PointNet has been widely applied as a local feature descriptor, a fundamental component in deep learning models for 3D perception, to extract features of a point cloud. This is achieved by using a symmetric max-pooling operator which provides unique pillar/voxel features. However, by ignoring most of the points, the max-pooling operator causes an information loss, which reduces the model performance. To address this issue, we propose a novel local feature descriptor, mini-PointNetPlus, as a plug-and-play alternative to PointNet. Our basic idea is to separately project the data points onto the individual features considered, each yielding a permutation-invariant representation. Thus, the proposed descriptor transforms an unordered point cloud into a stable order. The vanilla PointNet is proved to be a special case of our mini-PointNetPlus. By fully utilizing the features through the proposed descriptor, we demonstrate in experiments a considerable performance improvement for 3D perception.
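
The information-loss argument above is easy to demonstrate numerically. The sketch below contrasts PointNet-style max pooling with per-feature sorting, one simple way to realize "projecting points to individual features" that is permutation invariant yet keeps every value; it is an illustration, not the paper's exact descriptor.

```python
import numpy as np

pts = np.array([[0.2, 1.0],
                [0.9, 0.1],
                [0.5, 0.5]])      # 3 points, 2 features
perm = pts[[2, 0, 1]]             # same point set, different order

# PointNet-style max pooling: permutation invariant, but keeps only
# one value per feature, discarding the rest of the points.
assert np.array_equal(pts.max(axis=0), perm.max(axis=0))

# Per-feature sorting: also permutation invariant (same set -> same
# sorted columns), yet it retains every value in a stable order.
desc = np.sort(pts, axis=0)
assert np.array_equal(desc, np.sort(perm, axis=0))
print(desc.size, pts.max(axis=0).size)  # 6 values kept vs 2
```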