Arrow Research search

Author name cluster

Fan Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers
1 author row

Possible papers

16

AAAI Conference 2026 System Paper

AirNavigation: Let UAV Navigation Tell Its Own Story

  • Jianyu Jiang
  • Zequan Wang
  • Liang Yao
  • Shengxiang Xu
  • Fan Liu

Testing autonomous navigation algorithms of Unmanned Aerial Vehicles (UAVs) in real-world scenarios often entails significant safety risks. In this paper, we aim to build a flexible yet user-friendly UAV autonomous navigation simulator. Ideally, it should closely emulate real-world environments, support diverse UAV models and algorithms, and provide a flexible evaluation framework. Existing frameworks fail to satisfy all three requirements simultaneously. To this end, we present AirNavigation, an integrated simulation platform designed to support the end-to-end workflow of UAV navigation research. Specifically, our system leverages Unreal Engine to simulate highly realistic environments and diverse UAV models. It further facilitates semi-automated scene generation and multi-modal synthetic training data production. To lower the barrier of adoption, we develop a suite of user-friendly interfaces to enable seamless integration of diverse navigation algorithms. Moreover, we introduce a novel evaluation system powered by large language models to deliver personalized and fine-grained performance analysis.

AAAI Conference 2026 Conference Paper

RemoteReasoner: Towards Unifying Geospatial Reasoning Workflow

  • Liang Yao
  • Fan Liu
  • Hongbo Lu
  • Chuanyi Zhang
  • Rui Min
  • Shengxiang Xu
  • Shimin Di
  • Pai Peng

Remote sensing imagery presents vast, inherently unstructured spatial data, necessitating sophisticated reasoning to interpret complex user intents and contextual relationships beyond simple recognition tasks. In this paper, we aim to construct an Earth observation workflow to handle complex queries by reasoning about spatial context and user intent. As a reasoning workflow, it should autonomously explore and construct its own inference paths, rather than being confined to predefined ground‑truth sequences. Ideally, its architecture ought to be unified yet generalized, possessing capabilities to perform diverse reasoning tasks through one model without requiring additional fine-tuning. Existing remote sensing approaches rely on supervised fine-tuning paradigms and task‑specific heads, limiting both autonomous reasoning and unified generalization. To this end, we propose RemoteReasoner, a unified workflow for geospatial reasoning. The design of RemoteReasoner integrates a multi-modal large language model (MLLM) for interpreting user instructions and localizing targets, together with task transformation strategies that enable multi-granularity tasks, including object-, region-, and pixel-level. In contrast to existing methods, our framework is trained with reinforcement learning (RL) to endow the MLLM with sufficient reasoning autonomy. At the inference stage, our transformation strategies enable diverse task output formats without requiring task-specific decoders or further fine-tuning. Experiments demonstrate that RemoteReasoner achieves state-of-the-art performance across multi-granularity reasoning tasks. Furthermore, it retains the MLLM's inherent generalization capability, demonstrating robust performance on unseen tasks and categories.

NeurIPS Conference 2025 Conference Paper

Bag of Tricks for Inference-time Computation of LLM Reasoning

  • Fan Liu
  • Wen-Shuo Chao
  • Naiqiang Tan
  • Hao Liu

With the advancement of large language models (LLMs), solving complex tasks (e.g., math problems, code generation, etc.) has garnered increasing attention. Inference-time computation methods (e.g., Best-of-N, MCTS, etc.) are of significant importance, as they have the potential to enhance the reasoning capabilities of LLMs without requiring external training computation. However, due to the inherent challenges of this technique, most existing methods remain proof-of-concept and are not yet sufficiently effective. In this paper, we investigate and benchmark strategies for improving inference-time computation across a wide range of reasoning tasks. Since most current methods rely on a pipeline that first generates candidate solutions (e.g., generating chain-of-thought candidate solutions) and then selects them based on specific reward signals (e.g., RLHF reward, process reward, etc.), our research focuses on strategies for both candidate solution generation (e.g., instructing prompts, hyperparameters: temperature and top-p, etc.) and reward mechanisms (e.g., self-evaluation, reward types, etc.). The experimental results reveal that several previously overlooked strategies can be critical for the success of inference-time computation (e.g., simply adjusting the temperature can improve general reasoning task performance by up to 5%). Based on extensive experiments (more than 20,000 A100-80G GPU hours with over 1,000 experiments) across a variety of models (e.g., Llama, Qwen, and Mistral families) of various sizes, our proposed strategies outperform the baseline by a substantial margin in most cases, providing a stronger foundation for future research.

EAAI Journal 2025 Journal Article

Collaborative Semantic Contrastive for All-in-one Image Restoration

  • Bin Hu
  • Sai Yang
  • Fan Liu
  • Weiping Ding

All-in-one image restoration, aiming to develop a unified model to deal with multiple image restoration tasks simultaneously, has great impact on real-world applications of Artificial Intelligence (AI). Owing to the diversity of multiple tasks, all-in-one image restoration cannot avoid the feature entanglement problem. Thus, we propose Collaborative Semantic Contrastive for All-in-one Image Restoration (CSCAir) following the philosophy of normalizing multiple degradations into a common semantic category. Specifically, we propose a novel online knowledge distillation method of Collaborative Contrastive Learning (CCL). CCL employs a cohort of student models with identical architecture, namely Positive Network (PN), Negative Network (NN) and All-in-one Restoration Network (ARN), which are collaboratively trained on-the-fly. PN and NN are committed to establishing a semantic space intersected by the clear and unclear manifolds. ARN is pulled closer to the clear semantic domain while being pushed out of the unclear semantic scope with contrastive learning. During the learning process, a synergy loss is established to keep communication among all student models. CCL can construct a good semantic embedding whilst enforcing all types of degraded images into the clear manifold, thus obtaining excellent restoration results. Furthermore, we justify that common metrics in contrastive learning are inadequate in measuring the semantic similarity. Therefore, we propose utilizing Wasserstein distance upon dense local descriptors to guarantee semantic similarity without fidelity in CCL. This paper is significant for addressing the feature entanglement problem in the context of all-in-one image restoration. Comprehensive experiments have been conducted across multiple benchmarks under both single-task and all-in-one settings. Our method achieves 35.87 dB and 28.25 dB in Peak Signal-to-Noise Ratio (PSNR), and 0.9694 and 0.9643 in Structural Similarity Index Measure (SSIM), on the deraining and dehazing tasks respectively, outperforming the state-of-the-art approaches.

NeurIPS Conference 2025 Conference Paper

Foundation Models for Scientific Discovery: From Paradigm Enhancement to Paradigm Transition

  • Fan Liu
  • Jindong Han
  • Tengfei Lyu
  • Weijia Zhang
  • Zherui Yang
  • Lu Dai
  • Cancheng Liu
  • Hao Liu

Foundation models (FMs), such as GPT-4 and AlphaFold, are reshaping the landscape of scientific research. Beyond accelerating tasks such as hypothesis generation, experimental design, and result interpretation, they prompt a more fundamental question: Are FMs merely enhancing existing scientific methodologies, or are they redefining the way science is conducted? In this paper, we argue that FMs are catalyzing a transition toward a new scientific paradigm. We introduce a three-stage framework to describe this evolution: (1) Meta-Scientific Integration, where FMs enhance workflows within traditional paradigms; (2) Hybrid Human-AI Co-Creation, where FMs become active collaborators in problem formulation, reasoning, and discovery; and (3) Autonomous Scientific Discovery, where FMs operate as independent agents capable of generating new scientific knowledge with minimal human intervention. Through this lens, we review current applications and emerging capabilities of FMs across existing scientific paradigms. We further identify risks and future directions for FM-enabled scientific discovery. This position paper aims to support the scientific community in understanding the transformative role of FMs and to foster reflection on the future of scientific discovery.

AAAI Conference 2025 Conference Paper

Making Large Vision Language Models to Be Good Few-Shot Learners

  • Fan Liu
  • Wenwen Cai
  • Jian Huo
  • Chuanyi Zhang
  • Delong Chen
  • Jun Zhou

Few-shot classification (FSC) is a fundamental yet challenging task in computer vision that involves recognizing novel classes from limited data. While previous methods have focused on enhancing visual features or incorporating additional modalities, Large Vision Language Models (LVLMs) offer a promising alternative due to their rich knowledge and strong visual perception. However, LVLMs risk learning specific response formats rather than effectively extracting useful information from support data in FSC. In this paper, we investigate LVLMs' performance in FSC and identify key issues such as insufficient learning and the presence of severe position biases. To tackle the above challenges, we adopt a meta-learning strategy to teach models to "learn to learn". By constructing a rich set of meta-tasks for instruction fine-tuning, LVLMs enhance the ability to extract information from few-shot support data for classification. Additionally, we further boost LVLMs' few-shot learning capabilities through label augmentation (LA) and candidate selection (CS) in the fine-tuning and inference stages, respectively. LA is implemented via a character perturbation strategy to ensure the model focuses on support information. CS leverages attribute descriptions to filter out unreliable candidates and simplify the task. Extensive experiments demonstrate that our approach achieves superior performance on both general and fine-grained datasets. Furthermore, our candidate selection strategy has been proven beneficial for training-free LVLMs.

NeurIPS Conference 2025 Conference Paper

MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem

  • Fan Liu
  • Zherui Yang
  • Cancheng Liu
  • Tianrui Song
  • Xiaofeng Gao
  • Hao Liu

Mathematical modeling is a cornerstone of scientific discovery and engineering practice, enabling the translation of real-world problems into formal systems across domains such as physics, biology, and economics. Unlike mathematical reasoning, which assumes a predefined formulation, modeling requires open-ended problem analysis, abstraction, and principled formalization. While Large Language Models (LLMs) have shown strong reasoning capabilities, they fall short in rigorous model construction, limiting their utility in real-world problem-solving. To this end, we formalize the task of LLM-powered real-world mathematical modeling, where agents must analyze problems, construct domain-appropriate formulations, and generate complete end-to-end solutions. We introduce MM-Bench, a curated benchmark of 111 problems from the Mathematical Contest in Modeling (MCM/ICM), spanning the years 2000 to 2025 and across ten diverse domains such as physics, biology, and economics. To tackle this task, we propose MM-Agent, an expert-inspired framework that decomposes mathematical modeling into four stages: open-ended problem analysis, structured model formulation, computational problem solving, and report generation. Experiments on MM-Bench show that MM-Agent significantly outperforms baseline agents, achieving an 11.88% improvement over human expert solutions while requiring only 15 minutes and $0.88 per task using GPT-4o. Furthermore, under official MCM/ICM protocols, MM-Agent assisted two undergraduate teams in winning the Finalist Award (top 2.0% among 27,456 teams) in MCM/ICM 2025, demonstrating its practical effectiveness as a modeling copilot.

NeurIPS Conference 2024 Conference Paper

Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs

  • Zhao Xu
  • Fan Liu
  • Hao Liu

Although Large Language Models (LLMs) have demonstrated significant capabilities in executing complex tasks in a zero-shot manner, they are susceptible to jailbreak attacks and can be manipulated to produce harmful outputs. Recently, a growing body of research has categorized jailbreak attacks into token-level and prompt-level attacks. However, previous work primarily overlooks the diverse key factors of jailbreak attacks, with most studies concentrating on LLM vulnerabilities and lacking exploration of defense-enhanced LLMs. To address these issues, we introduce JailTrickBench to evaluate the impact of various attack settings on LLM performance and provide a baseline for jailbreak attacks, encouraging the adoption of a standardized evaluation framework. Specifically, we evaluate the eight key factors of implementing jailbreak attacks on LLMs from both target-level and attack-level perspectives. We further conduct seven representative jailbreak attacks on six defense methods across two widely used datasets, encompassing approximately 354 experiments with about 55,000 GPU hours on A800-80G. Our experimental results highlight the need for standardized benchmarking to evaluate these attacks on defense-enhanced LLMs. Our code is available at https://github.com/usail-hkust/JailTrickBench.

NeurIPS Conference 2024 Conference Paper

Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models

  • Jiaqi Li
  • Qianshan Wei
  • Chuanyi Zhang
  • Guilin Qi
  • Miaozeng Du
  • Yongrui Chen
  • Sheng Bi
  • Fan Liu

Machine unlearning (MU) empowers individuals with the "right to be forgotten" by removing their private or sensitive information encoded in machine learning models. However, it remains uncertain whether MU can be effectively applied to Multimodal Large Language Models (MLLMs), particularly in scenarios of forgetting the leaked visual data of concepts. To overcome the challenge, we propose an efficient method, Single Image Unlearning (SIU), to unlearn the visual recognition of a concept by fine-tuning a single associated image for a few steps. SIU consists of two key aspects: (i) Constructing multifaceted fine-tuning data. We introduce four targets, based on which we construct fine-tuning data for the concepts to be forgotten; (ii) Joint training loss. To synchronously forget the visual recognition of concepts and preserve the utility of MLLMs, we fine-tune MLLMs through a novel Dual Masked KL-divergence Loss combined with Cross Entropy loss. Alongside our method, we establish MMUBench, a new benchmark for MU in MLLMs, and introduce a collection of metrics for its evaluation. Experimental results on MMUBench show that SIU completely surpasses the performance of existing methods. Furthermore, we surprisingly find that SIU can avoid invasive membership inference attacks and jailbreak attacks. To the best of our knowledge, we are the first to explore MU in MLLMs. We will release the code and benchmark in the near future.

IJCAI Conference 2023 Conference Paper

Few-shot Classification via Ensemble Learning with Multi-Order Statistics

  • Sai Yang
  • Fan Liu
  • Delong Chen
  • Jun Zhou

Transfer learning has been widely adopted for few-shot classification. Recent studies reveal that obtaining a well-generalized representation of images on novel classes is the key to improving few-shot classification accuracy. To address this need, we prove theoretically that leveraging ensemble learning on the base classes can correspondingly reduce the true error on the novel classes. Following this principle, a novel method named Ensemble Learning with Multi-Order Statistics (ELMOS) is proposed in this paper. In this method, after the backbone network, we use multiple branches to create the individual learners in the ensemble, with the goal of reducing the storage cost. We then introduce different order statistics pooling in each branch to increase the diversity of the individual learners. The learners are optimized with supervised losses during the pre-training phase. After pre-training, features from different branches are concatenated for classifier evaluation. Extensive experiments demonstrate that each branch can complement the others and our method achieves state-of-the-art performance on multiple few-shot classification benchmark datasets.

EAAI Journal 2022 Journal Article

A review of driver fatigue detection and its advances on the use of RGB-D camera and deep learning

  • Fan Liu
  • Delong Chen
  • Jun Zhou
  • Feng Xu

Driver fatigue is a major cause of traffic accidents, posing a severe threat to people’s lives and property. In this review, we summarize the latest research findings and analyze the developmental trends of driver fatigue detection. Firstly, we analyze and discuss four types of fatigue detection technologies based on driver physiological signals, behavior features, vehicle running features, and information fusion, respectively. Then, we focus on the RGB-D camera and deep learning, which are two state-of-the-art solutions in this field. Finally, we present work on the integration of RGB-D cameras and deep learning, where Generative Adversarial Networks and multi-channel schemes are utilized to enhance performance. We conducted experiments to show that the fatigue features extracted by Convolutional Neural Networks are superior to traditional handcrafted ones, while single features cannot guarantee robustness. Moreover, the latent fatigue features extracted by deep learning methods have been demonstrated to be effective for fatigue detection.

NeurIPS Conference 2022 Conference Paper

Practical Adversarial Attacks on Spatiotemporal Traffic Forecasting Models

  • Fan Liu
  • Hao Liu
  • Wenzhao Jiang

Machine learning based traffic forecasting models leverage sophisticated spatiotemporal auto-correlations to provide accurate predictions of city-wide traffic states. However, existing methods assume a reliable and unbiased forecasting environment, which is not always available in the wild. In this work, we investigate the vulnerability of spatiotemporal traffic forecasting models and propose a practical adversarial spatiotemporal attack framework. Specifically, instead of simultaneously attacking all geo-distributed data sources, an iterative gradient guided node saliency method is proposed to identify the time-dependent set of victim nodes. Furthermore, we devise a spatiotemporal gradient descent based scheme to generate real-valued adversarial traffic states under a perturbation constraint. Meanwhile, we theoretically demonstrate the worst performance bound of adversarial traffic forecasting attacks. Extensive experiments on two real-world datasets show that the proposed two-step framework achieves up to 67.8% performance degradation on various advanced spatiotemporal forecasting models. Remarkably, we also show that adversarial training with our proposed attacks can significantly improve the robustness of spatiotemporal traffic forecasting models.

TIST Journal 2015 Journal Article

Local Structure-Based Sparse Representation for Face Recognition

  • Fan Liu
  • Jinhui Tang
  • Yan Song
  • Liyan Zhang
  • Zhenmin Tang

This article presents a simple yet effective face recognition method, called local structure-based sparse representation classification (LS_SRC). Motivated by the “divide-and-conquer” strategy, we first divide the face into local blocks and classify each local block, then integrate all the classification results to make the final decision. To classify each local block, we further divide each block into several overlapped local patches and assume that these local patches lie in a linear subspace. This subspace assumption reflects the local structure relationship of the overlapped patches, making sparse representation-based classification (SRC) feasible even when encountering the single-sample-per-person (SSPP) problem. To lighten the computing burden of LS_SRC, we further propose the local structure-based collaborative representation classification (LS_CRC). Moreover, the performance of LS_SRC and LS_CRC can be further improved by using the confusion matrix of the classifier. Experimental results on four public face databases show that our methods not only generalize well to the SSPP problem but also exhibit strong robustness to occlusion, slight pose variations, and variations in expression, illumination, and time.

TIST Journal 2015 Journal Article

Real-Time System for Driver Fatigue Detection by RGB-D Camera

  • Liyan Zhang
  • Fan Liu
  • Jinhui Tang

Drowsy driving is one of the major causes of fatal traffic accidents. In this article, we propose a real-time system that utilizes RGB-D cameras to automatically detect driver fatigue and generate alerts to drivers. By introducing RGB-D cameras, the depth data can be obtained, which provides extra evidence to benefit the tasks of head detection and head pose estimation. In this system, two important visual cues for driver fatigue detection (head pose and eye state) are extracted and leveraged simultaneously. We first present a real-time 3D head pose estimation method by leveraging RGB and depth data. Then we introduce a novel method to predict eye states employing the WLBP feature, a powerful local image descriptor that is robust to noise and illumination variations. Finally, we integrate the results from both head pose and eye states to generate the overall conclusion. The combination and collaboration of the two types of visual cues can reduce the uncertainties and resolve the ambiguity that a single cue may induce. The experiments were performed in an inside-car environment during the day and night, and they fully demonstrate the effectiveness and robustness of our system as well as the proposed methods of predicting head pose and eye states.

AAMAS Conference 2013 Conference Paper

Roundabout Collision Avoidance for Multiple Robots Based on Minimum Enclosing Rectangle

  • Fan Liu
  • Ajit Narayanan

This paper describes a novel and dynamic rectangular roundabout (‘rectabout’) collision avoidance system based on the Minimum Enclosing Rectangle (MER) paradigm. The approach is a fully decentralized maneuver based on equal priority and local views: each robot involved in a possible collision course computes the maneuver separately from its own local view. The virtual MER-based rectabout is placed between the positions of the two robots and is also capable of dealing with static obstacles. Simulated demonstrations indicate that the rectabout, in conjunction with the Minimal Predicted Distance (MPD), ensures that all robots remain free of collision, thereby making the procedure well-suited for real-time applications involving a number of independent robots planning routes to their goal destinations.