Arrow Research search

Author name cluster

Fan Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

34 papers
2 author rows

Possible papers


EAAI Journal 2026 Journal Article

Efficient surrogate-based optimization framework integrating physics-informed neural networks, deep active learning and deep reinforcement learning: Multilayer thin films case study

  • Jinglai Zheng
  • Jie Huang
  • Andi Lin
  • Fan Li
  • Haiming Huang

Surrogate-based optimization has been widely applied across various fields. However, existing frameworks suffer from slow data generation, extensive training data requirements, and inefficient design space exploration. To address these limitations, we propose a novel optimization framework integrating physics-informed neural networks (PINNs), deep active learning (DAL) and deep reinforcement learning (DRL). PINNs serve as high-fidelity solvers, accelerating data generation by transfer learning. DAL quantifies sample uncertainty, training a high-precision surrogate model with few samples. The surrogate model then acts as the environment for DRL, guiding the agent to rapidly explore optimization strategies. To validate the proposed framework, we employ multilayer thin films as a challenging case study, optimizing the thermal shock resistance in engineering applications. The results indicate that PINNs-based data generation is 38.37% faster than the finite element method, with a relative error below 0.68%. DAL reduces required samples, achieving at least 43.97% higher efficiency compared to other models for the same accuracy. Compared with particle swarm optimization and genetic algorithm, DRL achieves efficiency improvements of 66.34% and 66.95%. Overall, the synergetic integration of these artificial intelligence (AI) methods accelerates the entire optimization process by 70.75 h compared to existing frameworks, showing its high efficiency and significant potential in engineering applications.

AAAI Conference 2026 Conference Paper

FusedRec: Fused Embedding Communication for Distributed Recommendation Training on GPUs

  • Xuanteng Huang
  • Fan Li
  • Riyang Hu
  • Jianchang Zhang
  • Yuan Peng
  • Yang Zhou
  • Fangying Chen
  • Xianwei Zhang

Recent years have witnessed the wide adoption of deep learning recommendation models (DLRMs) for many online services. Unlike traditional DNN training, DLRMs leverage massive embeddings to represent sparse features, which are stored in distributed GPUs following the model parallel paradigm. Existing approaches adopt deduplication to eliminate replicated embeddings involved in AlltoAll transfers to avoid unnecessary communication. In practice, we have observed that such a deduplication design exacerbates interconnect inefficiency due to the fragmented embedding transfers with reduced message sizes, hindering the performance of distributed DLRM training. This paper introduces FusedRec, a fused embedding communication and lookup mechanism to tackle the inefficiency due to deduplication. By seeking opportunities to fuse embeddings from multiple categories into a group, FusedRec conducts the communication in a combined shot to alleviate bandwidth under-utilization. Meanwhile, a categorical-aware hashing algorithm is integrated into FusedRec to retain the category information during lookup without extra communication. Comprehensive results show that FusedRec, combined with efficient unique and recovery operations, achieves a 37.8% throughput speedup on average compared to the SOTA industry implementation, without hurting the recommendation quality of our in-house models used in online production environments.
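The fuse, deduplicate and recover pattern described in this abstract can be illustrated with a minimal sketch. This is not the FusedRec implementation: the key-packing scheme (`fuse_key`, `ID_BITS`), the function names, and the string stand-ins for embedding rows are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

ID_BITS = 32  # assumed width reserved for the per-category row id


def fuse_key(category: int, row_id: int) -> int:
    """Pack (category, row_id) into one integer so categories stay distinguishable."""
    if not 0 <= row_id < (1 << ID_BITS):
        raise ValueError("row_id out of range")
    return (category << ID_BITS) | row_id


def split_key(key: int) -> Tuple[int, int]:
    """Inverse of fuse_key."""
    return key >> ID_BITS, key & ((1 << ID_BITS) - 1)


def dedup(keys: List[int]) -> Tuple[List[int], List[int]]:
    """Return unique keys (first-seen order) and, per input, its index into them."""
    position: Dict[int, int] = {}
    unique: List[int] = []
    inverse: List[int] = []
    for k in keys:
        if k not in position:
            position[k] = len(unique)
            unique.append(k)
        inverse.append(position[k])
    return unique, inverse


def recover(unique_rows: List[str], inverse: List[int]) -> List[str]:
    """Expand rows fetched for the unique keys back to the original request order."""
    return [unique_rows[i] for i in inverse]


# Lookups from two categories are fused into a single batch.
requests = [(0, 7), (1, 7), (0, 7), (1, 3), (1, 7)]
keys = [fuse_key(c, r) for c, r in requests]
unique, inverse = dedup(keys)
# Stand-in for the single combined transfer: one row per unique key.
fetched = [f"emb[{c}][{r}]" for c, r in map(split_key, unique)]
rows = recover(fetched, inverse)
```

Because the category is packed into the key, row 7 of category 0 and row 7 of category 1 remain distinct after deduplication, which is the property the abstract attributes to its category-aware hashing.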

AAAI Conference 2026 Conference Paper

HeadHunt-VAD: Hunting Robust Anomaly-Sensitive Heads in MLLM for Tuning-Free Video Anomaly Detection

  • Zhaolin Cai
  • Fan Li
  • Ziwei Zheng
  • Haixia Bi
  • Lijun He

Video Anomaly Detection (VAD) aims to locate events that deviate from normal patterns in videos. Traditional approaches often rely on extensive labeled data and incur high computational costs. Recent tuning-free methods based on Multimodal Large Language Models (MLLMs) offer a promising alternative by leveraging their rich world knowledge. However, these methods typically rely on textual outputs, which introduces information loss, exhibits normalcy bias, and suffers from prompt sensitivity, making them insufficient for capturing subtle anomalous cues. To address these constraints, we propose HeadHunt-VAD, a novel tuning-free VAD paradigm that bypasses textual generation by directly hunting robust anomaly-sensitive internal attention heads within the frozen MLLM. Central to our method is a Robust Head Identification module that systematically evaluates all attention heads using a multi-criteria analysis of saliency and stability, identifying a sparse subset of heads that are consistently discriminative across diverse prompts. Features from these expert heads are then fed into a lightweight anomaly scorer and a temporal locator, enabling efficient and accurate anomaly detection with interpretable outputs. Extensive experiments show that HeadHunt-VAD achieves state-of-the-art performance among tuning-free methods on two major VAD benchmarks while maintaining high efficiency, validating head-level probing in MLLMs as a powerful and practical solution for real-world anomaly detection.
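One simple way to picture head selection by saliency and stability is sketched below. The abstract does not give the actual criteria, so the scoring rule here (mean discriminability across prompts minus a penalty on its spread), the `penalty` parameter, and the example scores are all hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def select_robust_heads(
    scores: Dict[str, List[float]], k: int, penalty: float = 1.0
) -> List[int]:
    """Rank heads by mean score across prompts minus a penalty on their spread."""
    per_prompt = list(scores.values())
    n_heads = len(per_prompt[0])
    if any(len(row) != n_heads for row in per_prompt):
        raise ValueError("every prompt must score the same number of heads")
    ranked = []
    for h in range(n_heads):
        column = [row[h] for row in per_prompt]
        saliency = mean(column)       # how discriminative the head is on average
        instability = pstdev(column)  # how much that varies with the prompt
        ranked.append((saliency - penalty * instability, h))
    ranked.sort(key=lambda t: (-t[0], t[1]))
    return sorted(h for _, h in ranked[:k])


# Hypothetical per-prompt discriminability scores for four heads.
scores = {
    "prompt_a": [0.90, 0.60, 0.20, 0.70],
    "prompt_b": [0.10, 0.62, 0.25, 0.72],
    "prompt_c": [0.85, 0.58, 0.15, 0.71],
}
selected = select_robust_heads(scores, k=2)
```

In this toy data, head 0 scores highly under two prompts but collapses under the third, so the stability penalty drops it in favour of heads 1 and 3.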

AAAI Conference 2026 Conference Paper

Invisible Triggers, Visible Threats! Road-Style Adversarial Creation Attack for Visual 3D Detection in Autonomous Driving

  • Jian Wang
  • Lijun He
  • Yixing Yong
  • Haixia Bi
  • Fan Li

Modern autonomous driving (AD) systems leverage 3D object detection to perceive foreground objects in 3D environments for subsequent prediction and planning. Visual 3D detection based on RGB cameras provides a cost-effective solution compared to the LiDAR paradigm. While achieving promising detection accuracy, current deep neural network-based models remain highly susceptible to adversarial examples. The underlying safety concerns motivate us to investigate realistic adversarial attacks in AD scenarios. Previous work has demonstrated the feasibility of placing adversarial posters on the road surface to induce hallucinations in the detector. However, the unnatural appearance of the posters makes them easily noticeable by humans, and their fixed content can be readily targeted and defended. To address these limitations, we propose AdvRoad to generate diverse road-style adversarial posters. The adversaries have naturalistic appearances resembling the road surface while compromising the detector to perceive non-existent objects at the attack locations. We employ a two-stage approach, termed Road-Style Adversary Generation and Scenario-Associated Adaptation, to maximize the attack effectiveness on the input scene while ensuring the natural appearance of the poster, allowing the attack to be carried out stealthily without drawing human attention. Extensive experiments show that AdvRoad generalizes well to different detectors, scenes, and spoofing locations. Moreover, physical attacks further demonstrate the practical threats in real-world environments.

AAAI Conference 2026 Conference Paper

LAMDA: Two-Phase HPO via Learning Prior from Low-Fidelity Data

  • Fan Li
  • Shengbo Wang
  • Ke Li

Hyperparameter Optimization (HPO) is crucial in machine learning, aiming to optimize hyperparameters to enhance model performance. Although existing methods that leverage prior knowledge (drawn from either previous experiments or expert insights) can accelerate optimization, acquiring a correct prior for a specific HPO task is non-trivial. In this work, we propose to relieve the reliance on external knowledge by learning a reliable prior directly from low-fidelity (LF) problems. We introduce Lamda, an algorithm-agnostic framework designed to boost any baseline HPO algorithm. Specifically, Lamda operates in two phases: (1) it learns a reliable prior by exploring the LF landscape under limited computational budgets, and (2) it leverages this learned prior to guide the HPO process. We showcase how the Lamda framework can be integrated with various HPO algorithms to boost their performance, and further conduct theoretical analysis towards the integrated Bayesian optimization and bandit-based Hyperband. We conduct experiments on 56 HPO problems spanning diverse domains and model scales. Results show that Lamda consistently enhances its baseline algorithms. Compared to nine state-of-the-art HPO algorithms, our Lamda variant achieves the best performance in 51 out of 56 HPO tasks while it is the second best algorithm in the other 5 cases.
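The two-phase idea can be sketched on a toy one-dimensional problem. This is only an illustration under stated assumptions, not the Lamda algorithm: the baseline is plain random search, the prior is a Gaussian fitted to the best low-fidelity samples, and the loss functions `hf_loss` and `lf_loss` are invented for the example.

```python
import random
from statistics import mean, pstdev
from typing import Callable, Tuple


def learn_prior(
    low_fidelity: Callable[[float], float],
    bounds: Tuple[float, float],
    budget: int,
    top_k: int,
    rng: random.Random,
) -> Tuple[float, float]:
    """Phase 1: probe the cheap objective and fit a Gaussian to its best points."""
    samples = [rng.uniform(*bounds) for _ in range(budget)]
    best = sorted(samples, key=low_fidelity)[:top_k]
    return mean(best), max(pstdev(best), 1e-3)


def guided_search(
    high_fidelity: Callable[[float], float],
    prior: Tuple[float, float],
    bounds: Tuple[float, float],
    budget: int,
    rng: random.Random,
) -> Tuple[float, float]:
    """Phase 2: spend the expensive budget on samples drawn from the prior."""
    mu, sigma = prior
    lo, hi = bounds
    candidates = [min(hi, max(lo, rng.gauss(mu, sigma))) for _ in range(budget)]
    best_x = min(candidates, key=high_fidelity)
    return best_x, high_fidelity(best_x)


# Toy losses (assumed): the cheap proxy's optimum is slightly shifted.
def hf_loss(x: float) -> float:
    return (x - 0.30) ** 2


def lf_loss(x: float) -> float:
    return (x - 0.35) ** 2 + 0.01


rng = random.Random(0)
bounds = (0.0, 1.0)
prior = learn_prior(lf_loss, bounds, budget=50, top_k=5, rng=rng)
best_x, best_loss = guided_search(hf_loss, prior, bounds, budget=10, rng=rng)
```

The sketch only helps when the low-fidelity optimum lies near the high-fidelity one; how a prior stays reliable when the two landscapes disagree is the harder question the abstract says the paper addresses.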

AAAI Conference 2026 Conference Paper

Refine3D: Scene-Adaptive Reference Point Refinement for Sparse 3D Object Detection

  • Fan Li
  • Jing Lu
  • Yunlu Xu
  • Changhong Wu
  • Tao Xu
  • Zhaoyi Xiang
  • Yi Niu

Sparse query-based detectors have emerged as the dominant paradigm in camera-only 3D object detection, owing to their exceptional performance and computational efficiency. A central component of these approaches is the use of reference points, which serve as learnable spatial anchors to guide queries in localizing target objects. However, existing methods typically employ a unified set of reference points across all scenes, a design we find suboptimal for handling complex scenarios with highly imbalanced object distributions, such as road intersections or occluded environments. In this paper, we investigate the adaptability of reference points and propose Refine3D, an adaptive refinement mechanism that achieves scene-level alignment between the distribution of reference points and ground-truth objects. In particular, we introduce a novel Reference Point Distribution Loss (RPD-Loss) to ensure reference points converge globally toward object positions, and a Scene-Adaptive Refinement head (SAR-Head) that predicts dynamic offsets for each reference point. Both components can be seamlessly integrated into mainstream sparse detectors. Extensive experiments on two challenging autonomous driving datasets demonstrate that Refine3D outperforms the state-of-the-art with improved detection accuracy and robustness.
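To make the notion of aligning reference points with object positions concrete, here is a small sketch. It is not the RPD-Loss or SAR-Head from the paper: a symmetric Chamfer distance is used as an assumed stand-in for a distribution-level loss, and the offsets are supplied by hand rather than predicted by a network.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def refine(points: List[Point], offsets: List[Point]) -> List[Point]:
    """Shift each reference point by its (here hand-written) offset."""
    return [
        (p[0] + o[0], p[1] + o[1], p[2] + o[2])
        for p, o in zip(points, offsets)
    ]


def chamfer_distance(refs: List[Point], objects: List[Point]) -> float:
    """Symmetric mean nearest-neighbour distance between the two point sets."""

    def one_way(src: List[Point], dst: List[Point]) -> float:
        return sum(min(math.dist(s, d) for d in dst) for s in src) / len(src)

    return one_way(refs, objects) + one_way(objects, refs)


# Hypothetical scene: two shared reference points and two object centres.
refs = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
objects = [(1.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
offsets = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]

before = chamfer_distance(refs, objects)
after = chamfer_distance(refine(refs, offsets), objects)
```

Here the scene-specific offsets move each reference point onto an object centre, so the distance falls from 2.0 to 0.0.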

AAAI Conference 2026 Conference Paper

RefSTAR: Blind Face Image Restoration with Reference Selection, Transfer, and Reconstruction

  • Zhicun Yin
  • Junjie Chen
  • Ming Liu
  • Zhixin Wang
  • Fan Li
  • Renjing Pei
  • Xiaoming Li
  • Rynson W. H. Lau

Introducing high-quality references can largely alleviate the uncertainty in blind face image restoration tasks, yet the ambiguous use of reference priors still makes it difficult to preserve human identity well. We attribute the identity inconsistency to two deficiencies of existing reference-based face restoration methods, namely the inability to effectively determine which features need to be transferred, and the failure to preserve the structure and details of the selected features. This work mainly focuses on these two issues, and we present a novel blind face image restoration method that considers reference selection, transfer, and reconstruction (RefSTAR) to introduce proper features from reference images. Specifically, we construct a reference selection (RefSel) module, which can generate accurate masks to select reference features. For training the RefSel module, we construct a RefSel-HQ dataset through a mask generation pipeline, which contains annotated masks for 10,000 ground truth-reference pairs. To guarantee the exact introduction of selected reference features, a feature fusion paradigm is designed for reference feature transferring, and a Mask-Compatible Cycle-Consistency Loss is redesigned based on reference reconstruction to further ensure the presence of selected reference image features in the output image. Experiments on various backbone models demonstrate superior performance, showing better identity preservation ability and reference feature transfer quality.

JBHI Journal 2026 Journal Article

Spatiospectral Representation and Neural Decoding of Somatic Perception of Acupuncture Stimulations

  • Haitao Yu
  • Zaidong Lin
  • Fan Li
  • Jialin Liu
  • Chen Liu
  • Jiang Wang

Characterizing the neural representations underlying somatic perception is crucial for neural decoding of external stimulations. Acupuncture is an important therapeutic method of traditional Chinese medicine and can effectively modulate brain activity for the treatment of neural diseases. In this work, we investigate the neural representations based on the power spectral density (PSD) estimated from electroencephalogram (EEG) across the whole brain with deep learning. Frequency and spatial characteristics of PSD can reliably represent the dynamical brain responses to acupuncture with different manipulations, manifesting enhanced alpha power in the parietal lobe. By removing aperiodic components, the periodic spatial spectrum shows a higher representation ability of different brain states during acupuncture stimulations, and twirling-rotating (TR) manipulation has a more pronounced modulatory effect than lifting-thrusting (LT) manipulation. Moreover, we further infer the low-dimensional feature-disentangled representations with a generative adversarial network (GAN), i.e., w-latents of StyleGAN, which can capture the latent features of the periodic spatial spectrum and strike a balance between separability and generalizability. The effectiveness of feature-disentangled representations is evaluated by decoding the acupuncture states, which can achieve a highest accuracy of 95.71% with a Transformer classifier. Compared with the high-dimensional spatial spectrum, low-dimensional latent features can best characterize different brain states, indicating a precise representation of somatic perception of acupuncture stimulations. Taken together, our results highlight the significant role of spatial spectral representation underlying somatic perception and serve as an important benchmark for the evaluation of acupuncture effect on human brain.

EAAI Journal 2025 Journal Article

A Dual Two-Stage Attention-based Model for interpretable hard landing prediction from flight data

  • Jiaxing Shang
  • Xiaoquan Li
  • Ruixiang Zhang
  • Linjiang Zheng
  • Xu Li
  • Riquan Zhang
  • Xinbin Zhao
  • Fan Li

Hard landings are a significant safety concern in aviation, with potential consequences ranging from poor passenger experiences to serious injuries or fatalities. Predicting and explaining hard landing events are equally important for enhancing flight safety: the former makes it possible to give proactive warnings, while the latter helps pilots identify the reasons and refine their skills. However, existing studies generally lack a comprehensive consideration for the fine-grained characteristics of flight data containing both inter-temporal and inter-parametric relationships, resulting in suboptimal prediction performance. In addition, most existing approaches aim at improving the prediction performance but fail to provide interpretability for the causes of hard landing. To address the above problems, we propose DUTSAM, a DUal Two-Stage Attention-based interpretable Model for hard landing prediction from quick access recorder (QAR) data. The model consists of dual parallel modules, each of which combines a convolutional feature encoder and a two-stage attention mechanism. The two encoders capture fine-grained characteristics by encoding multivariate data from the temporal domain and the parametric domain respectively. After that, the dual two-stage attention mechanism captures the inter-temporal and inter-parametric correlations in reverse order to predict hard landing and provide interpretation from both temporal and parametric perspectives. Experimental results on a real QAR dataset with 37,920 flights show that DUTSAM achieves better prediction performance compared with other state-of-the-art baselines in terms of Precision, Recall, and F1-score. Additionally, a case study demonstrates that DUTSAM can uncover key flight parameters and moments strongly correlated to the hard landing events.

NeurIPS Conference 2025 Conference Paper

Adaptive Gradient Masking for Balancing ID and MLLM-based Representations in Recommendation

  • Yidong Wu
  • Siyuan Chen
  • Binrui Wu
  • Fan Li
  • Jiechao Gao

In large-scale recommendation systems, multimodal (MM) content is increasingly introduced to enhance the generalization of ID features. The rise of Multimodal Large Language Models (MLLMs) enables the construction of unified user and item representations. However, the semantic distribution gap between MM and ID representations leads to convergence inconsistency during joint training: the ID branch converges quickly, while the MM branch requires more epochs, thus limiting overall performance. To address this, we propose a two-stage framework including MM representation learning and joint training optimization. First, we fine-tune the MLLM to generate unified user and item representations, and introduce collaborative signals by post-aligning user ID representations to alleviate semantic differences. Then, we propose an Adaptive Gradient Masking (AGM) training strategy to dynamically regulate parameter updates between ID and MLLM branches. AGM estimates the contribution of each representation with mutual information, and applies non-uniform gradient masking at the sub-network level to balance optimization. We provide theoretical analysis of AGM's effectiveness and further introduce an unbiased variant, AGM*, to enhance training stability. Experiments on offline and online A/B tests validate the effectiveness of our approach in mitigating convergence inconsistency and improving performance.
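A rough sketch of non-uniform gradient masking between two branches follows. It is an illustration only: the rule mapping contribution to keep probability, the `floor` parameter, and the contribution values are assumptions, and the inverse-probability rescaling shown is a standard way to keep a masked gradient unbiased in expectation rather than a confirmed description of AGM*.

```python
import random
from typing import Dict, List


def keep_probabilities(
    contribution: Dict[str, float], floor: float = 0.2
) -> Dict[str, float]:
    """Give the branch that contributes most the lowest keep probability (assumed rule)."""
    total = sum(contribution.values())
    return {
        name: max(floor, 1.0 - value / total)
        for name, value in contribution.items()
    }


def mask_gradient(
    grad: List[float], keep_prob: float, rng: random.Random, unbiased: bool = False
) -> List[float]:
    """Zero each entry with probability 1 - keep_prob; optionally rescale survivors."""
    scale = 1.0 / keep_prob if unbiased else 1.0
    return [g * scale if rng.random() < keep_prob else 0.0 for g in grad]


rng = random.Random(0)
# Hypothetical contribution estimates (e.g. from mutual information).
probs = keep_probabilities({"id": 3.0, "mm": 1.0})
grads = {"id": [1.0] * 10000, "mm": [1.0] * 10000}
masked = {
    name: mask_gradient(g, probs[name], rng, unbiased=True)
    for name, g in grads.items()
}
# With rescaling, the average masked gradient stays close to the original 1.0.
averages = {name: sum(g) / len(g) for name, g in masked.items()}
```

The dominant ID branch keeps only a quarter of its gradient entries per step while the MM branch keeps three quarters, which slows the faster branch without changing the expected update when rescaling is enabled.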

ICML Conference 2025 Conference Paper

Better to Teach than to Give: Domain Generalized Semantic Segmentation via Agent Queries with Diffusion Model Guidance

  • Fan Li
  • Xuan Wang
  • Min Qi
  • Zhaoxiang Zhang 0002
  • Yuelei Xu

Domain Generalized Semantic Segmentation (DGSS) trains a model on a labeled source domain to generalize to unseen target domains with consistent contextual distribution and varying visual appearance. Most existing methods rely on domain randomization or data generation but struggle to capture the underlying scene distribution, resulting in the loss of useful semantic information. Inspired by the diffusion model’s capability to generate diverse variations within a given scene context, we consider harnessing its rich prior knowledge of scene distribution to tackle the challenging DGSS task. In this paper, we propose a novel agent Query-driven learning framework based on Diffusion model guidance for DGSS, named QueryDiff. Our recipe comprises three key ingredients: (1) generating agent queries from segmentation features to aggregate semantic information about instances within the scene; (2) learning the inherent semantic distribution of the scene through agent queries guided by diffusion features; (3) refining segmentation features using optimized agent queries for robust mask predictions. Extensive experiments across various settings demonstrate that our method significantly outperforms previous state-of-the-art methods. Notably, it enhances the model’s ability to generalize effectively to extreme domains, such as cubist art styles. Code is available at https://github.com/FanLiHub/QueryDiff.

NeurIPS Conference 2025 Conference Paper

CamEdit: Continuous Camera Parameter Control for Photorealistic Image Editing

  • Xinran Qin
  • Zhixin Wang
  • Fan Li
  • Haoyu Chen
  • Renjing Pei
  • Wenbo Li
  • Xiaochun Cao

Recent advances in diffusion models have substantially improved text-driven image editing. However, existing frameworks based on discrete textual tokens struggle to support continuous control over camera parameters and smooth transitions in visual effects. These limitations hinder their applications to realistic, camera-aware, and fine-grained editing tasks. In this paper, we present CamEdit, a diffusion-based framework for photorealistic image editing that enables continuous and semantically meaningful manipulation of common camera parameters such as aperture and shutter speed. CamEdit incorporates a continuous parameter prompting mechanism and a parameter-aware modulation module that guides the model in smoothly adjusting focal plane, aperture, and shutter speed, reflecting the effects of varying camera settings within the diffusion process. To support supervised learning in this setting, we introduce CamEdit50K, a dataset specifically designed for photorealistic image editing with continuous camera parameter settings. It contains over 50k image pairs combining real and synthetic data with dense camera parameter variations across diverse scenes. Extensive experiments demonstrate that CamEdit enables flexible, consistent, and high-fidelity image editing, achieving state-of-the-art performance in camera-aware visual manipulation and fine-grained photographic control.

IROS Conference 2025 Conference Paper

Consistent Feature Alignment for Cross-Modal Knowledge Distillation in Monocular 3D Object Detection

  • Fan Li
  • Rui Ding
  • Meng Yang 0002
  • Xuguang Lan

Cross-modal knowledge distillation (CMKD) in monocular 3D object detection transfers LiDAR’s accurate depth information to compensate for the limitations of camera models. However, current methods directly align the intermediate features of the teacher and student networks, in which the modality gap between LiDAR and camera hinders their effectiveness. To mitigate this issue, we design two modules, namely, Consistent Alignment Module (CAM) and Deformable Adapter Module (DAM) to reduce the modality gap of CMKD. The CAM transforms intermediate features of LiDAR and camera into some consistent features through a lightweight Target Head. It is based on the observation that some high-level features such as heatmaps and depths are highly correlated in CMKD, even though a modality gap exists between LiDAR and camera. Therefore, these features can be effectively transferred from teacher to student in CMKD. The DAM introduces a deformable adapter for the intermediate features of the student network to reduce background noise in CMKD. This helps to dynamically align its intermediate features with the teacher network. We then propose a Consistent Feature Alignment network (MonoCFA) for CMKD to boost monocular 3D object detection. Our network integrates the two designed modules at different levels of the teacher and student networks, in order to align the intermediate features of LiDAR and camera more accurately and reliably. Our model can be widely applied to existing monocular 3D object detection models. For validation, we choose the representative MonoDLE, GUPNet, and DID-M3D as base models. Experiments on the KITTI benchmark show that our method significantly outperforms the three base models by 39%, 15.5%, and 15%, respectively, and achieves state-of-the-art performance when compared to other CMKD models.

NeurIPS Conference 2025 Conference Paper

No Object Is an Island: Enhancing 3D Semantic Segmentation Generalization with Diffusion Models

  • Fan Li
  • Xuan Wang
  • Xuanbin Wang
  • Zhaoxiang Zhang
  • Yuelei Xu

Enhancing the cross-domain generalization of 3D semantic segmentation is a pivotal task in computer vision that has recently gained increasing attention. Most existing methods, whether using consistency regularization or cross-modal feature fusion, focus solely on individual objects while overlooking implicit semantic dependencies among them, resulting in the loss of useful semantic information. Inspired by the diffusion model's ability to flexibly compose diverse objects into high-quality images across varying domains, we seek to harness its capacity for capturing underlying contextual distributions and spatial arrangements among objects to address the challenging task of cross-domain 3D semantic segmentation. In this paper, we propose a novel cross-modal learning framework based on diffusion models to enhance the generalization of 3D semantic segmentation, named XDiff3D. XDiff3D comprises three key ingredients: (1) constructing object agent queries from diffusion features to aggregate instance semantic information; (2) decoupling fine-grained local details from object agent queries to prevent interference with 3D semantic representation; (3) leveraging object agent queries as an interface to enhance the modeling of object semantic dependencies in 3D representations. Extensive experiments validate the effectiveness of our method, achieving state-of-the-art performance across multiple benchmarks in different task settings. Code is available at https://github.com/FanLiHub/XDiff3D.

IJCAI Conference 2025 Conference Paper

PCAN: A Pandemic-Compatible Attentive Neural Network for Retail Sales Forecasting

  • Fan Li
  • Guoxuan Wang
  • Huiyu Chu
  • Dawei Cheng
  • Xiaoyang Wang

The outbreak of a pandemic has a huge impact on production and consumption in the business world, especially for the retail sector. As a crucial component of decision-support technology in the retail industry, sales forecasting is significant for production planning and optimizing the supply of essential goods during the pandemic. However, due to the irregular fluctuation pattern caused by uncertainty and the complex temporal correlation between multiple covariates and sales, there is still no effective approach for sales forecasting in this extreme event. To fill this gap, we propose a Pandemic-Compatible Attentive Network (PCAN) for retail sales forecasting. Specifically, to capture the irregular fluctuation patterns from the sales series, we design a fluctuation attention mechanism based on association discrepancy in the time series. Then, a parallel attention module is developed to learn the complex relationship between target sales and various dynamic influence factors in a decoupled manner. Finally, we introduce a novel rectification decoding strategy to indicate fluctuation points in prediction. Evaluating PCAN on four real-world retail food datasets from the SF Express international supply chain system, we show that our method achieves superior performance over the existing state-of-the-art baselines. The model has been deployed in the supply chain system as a fundamental component to serve a world-leading food retailer.

NeurIPS Conference 2025 Conference Paper

PocketSR: The Super-Resolution Expert in Your Pocket Mobiles

  • Haoze Sun
  • Linfeng Jiang
  • Fan Li
  • Renjing Pei
  • Zhixin Wang
  • Yong Guo
  • Jiaqi Xu
  • Haoyu Chen

Real-world image super-resolution (RealSR) aims to enhance the visual quality of in-the-wild images, such as those captured by mobile phones. While existing methods leveraging large generative models demonstrate impressive results, the high computational cost and latency make them impractical for edge deployment. In this paper, we introduce PocketSR, an ultra-lightweight, single-step model that brings generative modeling capabilities to RealSR while maintaining high fidelity. To achieve this, we design LiteED, a highly efficient alternative to the original computationally intensive VAE in SD, reducing parameters by 97.5% while preserving high-quality encoding and decoding. Additionally, we propose online annealing pruning for the U-Net, which progressively shifts generative priors from heavy modules to lightweight counterparts, ensuring effective knowledge transfer and further optimizing efficiency. To mitigate the loss of prior knowledge during pruning, we incorporate a multi-layer feature distillation loss. Through an in-depth analysis of each design component, we provide valuable insights for future research. PocketSR, with a model size of 146M parameters, processes 4K images in just 0.8 seconds, achieving a remarkable speedup over previous methods. Notably, it delivers performance on par with state-of-the-art single-step and even multi-step RealSR models, making it a highly practical solution for edge-device applications.

IJCAI Conference 2025 Conference Paper

Priority Guided Explanation for Knowledge Tracing with Dual Ranking and Similarity Consistency

  • Fan Li
  • Tiancheng Zhang
  • Yifang Yin
  • Minghe Yu
  • Mengxiang Wang
  • Ge Yu

Knowledge tracing plays a pivotal role in enabling personalized learning on online platforms. While deep learning-based approaches have achieved impressive predictive performance, their limited interpretability poses a significant barrier to practical adoption. Existing explanation methods primarily focus on specific model architectures and fall short in 1) explicitly prioritizing critical interactions to generate fine-grained explanations, and 2) maintaining similarity consistency across interaction importance. These limitations hinder actionable insights for improving student outcomes. To bridge the gap, we propose a model-agnostic approach that provides enhanced explanations applicable to diverse knowledge tracing methods. Specifically, we propose a novel ranking loss designed to explicitly optimize the importance ranking of past interactions by comparing their corresponding perturbed outputs. Furthermore, we introduce a similarity loss to capture temporal dependencies, ensuring consistency in the assigned importance scores for conceptually similar interactions. Extensive experiments conducted on various knowledge tracing models and benchmark datasets demonstrate substantial enhancements in explanation quality.
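The idea of ranking past interactions by comparing perturbed outputs can be sketched as a pairwise hinge loss. This is an assumed formulation for illustration, not the loss defined in the paper: the `margin` value, the tie handling, and the example numbers are hypothetical.

```python
from itertools import combinations
from typing import List


def pairwise_ranking_loss(
    importance: List[float], output_change: List[float], margin: float = 0.1
) -> float:
    """Hinge loss over pairs: larger output change should mean larger importance."""
    if len(importance) != len(output_change):
        raise ValueError("inputs must have the same length")
    losses = []
    for i, j in combinations(range(len(importance)), 2):
        if output_change[i] == output_change[j]:
            continue  # no preference between tied interactions
        hi, lo = (i, j) if output_change[i] > output_change[j] else (j, i)
        losses.append(max(0.0, margin - (importance[hi] - importance[lo])))
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical prediction shifts observed when each past interaction is perturbed.
change = [0.40, 0.05, 0.20]
aligned = pairwise_ranking_loss([0.9, 0.1, 0.5], change)
reversed_order = pairwise_ranking_loss([0.1, 0.9, 0.5], change)
```

Importance scores that follow the order of the perturbation effects incur no loss, while scores in the opposite order are penalised on every pair.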

EAAI Journal 2024 Journal Article

A benchmarking framework for eye-tracking-based vigilance prediction of vessel traffic controllers

  • Zhimin Li
  • Ruilin Li
  • Liqiang Yuan
  • Jian Cui
  • Fan Li

Vessel Traffic Controllers (VTCs) play a crucial role in ensuring safe navigation by maintaining a high level of vigilance. Eye-tracking has been identified as one of the most popular bio-signals for vigilance prediction. However, the existing studies on eye-tracking-based vigilance prediction usually utilized a limited set of features for analysis. A comprehensive analysis of various eye-tracking features and a unified model with general high performance remains a gap in current research. To address this issue, this study introduces a benchmarking framework for eye-tracking-based vigilance prediction and feature analysis. In the framework, a hierarchical analysis method is proposed, which explores a diverse set of eye-tracking features at both individual and group levels. Additionally, a vigilance ensemble model is proposed. Model interpretation is carried out by using the Shapley additive explanation (SHAP) method. The results highlight the superior performance of the proposed ensemble model. Moreover, a comparative analysis between professional VTCs and novices is performed. High-importance features and feature groups are identified separately. Upon comparison, it is also found that professionals demonstrate efficient attention allocation, while novices exhibit unique patterns influenced by their exploration strategies and scattered attention. The conclusions can serve as a valuable reference for maritime practitioners and provide insights into vigilance prediction across various domains.

IJCAI Conference 2024 Conference Paper

Hypergraph Self-supervised Learning with Sampling-efficient Signals

  • Fan Li
  • Xiaoyang Wang
  • Dawei Cheng
  • Wenjie Zhang
  • Ying Zhang
  • Xuemin Lin

Self-supervised learning (SSL) provides a promising alternative for representation learning on hypergraphs without costly labels. However, existing hypergraph SSL models are mostly based on contrastive methods with the instance-level discrimination strategy, suffering from two significant limitations: (1) They select negative samples arbitrarily, which is unreliable in deciding similar and dissimilar pairs, causing training bias. (2) They often require a large number of negative samples, resulting in expensive computational costs. To address the above issues, we propose SE-HSSL, a hypergraph SSL framework with three sampling-efficient self-supervised signals. Specifically, we introduce two sampling-free objectives leveraging the canonical correlation analysis as the node-level and group-level self-supervised signals. Additionally, we develop a novel hierarchical membership-level contrast objective motivated by the cascading overlap relationship in hypergraphs, which can further reduce membership sampling bias and improve the efficiency of sample utilization. Through comprehensive experiments on 7 real-world hypergraphs, we demonstrate the superiority of our approach over the state-of-the-art method in terms of both effectiveness and efficiency.

IJCAI Conference 2024 Conference Paper

LLMs Can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought

  • Zhuoxuan Jiang
  • Haoyuan Peng
  • Shanshan Feng
  • Fan Li
  • Dongsheng Li

Self-correction is emerging as a promising approach to mitigate the issue of hallucination in Large Language Models (LLMs). To facilitate effective self-correction, recent research has proposed mistake detection as its initial step. However, current literature suggests that LLMs often struggle with reliably identifying reasoning mistakes when using simplistic prompting strategies. To address this challenge, we introduce a unique prompting strategy, termed the Pedagogical Chain-of-Thought (PedCoT), which is specifically designed to guide the identification of reasoning mistakes, particularly mathematical reasoning mistakes. PedCoT consists of pedagogical principles for prompts (PPP) design, two-stage interaction process (TIP) and grounded PedCoT prompts, all inspired by the educational theory of the Bloom Cognitive Model (BCM). We evaluate our approach on two public datasets featuring math problems of varying difficulty levels. The experiments demonstrate that our zero-shot prompting strategy significantly outperforms strong baselines. The proposed method can achieve the goal of reliable mathematical mistake identification and provide a foundation for automatic math answer grading. The results underscore the significance of educational theory, serving as domain knowledge, in guiding prompting strategy design for addressing challenging tasks with LLMs effectively.

IJCAI Conference 2023 Conference Paper

Fighting against Organized Fraudsters Using Risk Diffusion-based Parallel Graph Neural Network

  • Jiacheng Ma
  • Fan Li
  • Rui Zhang
  • Zhikang Xu
  • Dawei Cheng
  • Yi Ouyang
  • Ruihui Zhao
  • Jianguang Zheng

Medical insurance plays a vital role in modern society, yet organized healthcare fraud causes billions of dollars in annual losses, severely harming the sustainability of the social welfare system. Existing works mostly focus on detecting individual fraud entities or claims, ignoring hidden conspiracy patterns. Hence, they face severe challenges in tackling organized fraud. In this paper, we propose RDPGL, a novel Risk Diffusion-based Parallel Graph Learning approach, to fight against medical insurance criminal gangs. In particular, we first leverage a heterogeneous graph attention network to encode the local context from the beneficiary-provider graph. Then, we devise a community-aware risk diffusion model to infer the global context of organized fraud behaviors with the claim-claim relation graph. The local and global representations are concatenated in parallel and trained simultaneously in an end-to-end manner. Our approach is extensively evaluated on a real-world medical insurance dataset. The experimental results demonstrate the superiority of our proposed approach, which could detect more organized fraud claims with relatively high precision compared with state-of-the-art baselines.

NeurIPS Conference 2022 Conference Paper

Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization

  • Qing Guo
  • Junya Chen
  • Dong Wang
  • Yuewei Yang
  • Xinwei Deng
  • Jing Huang
  • Larry Carin
  • Fan Li

Successful applications of InfoNCE (Information Noise-Contrastive Estimation) and its variants have popularized the use of contrastive variational mutual information (MI) estimators in machine learning. While featuring superior stability, these estimators crucially depend on costly large-batch training, and they sacrifice bound tightness for variance reduction. To overcome these limitations, we revisit the mathematics of popular variational MI bounds from the lens of unnormalized statistical modeling and convex optimization. Our investigation yields a new unified theoretical framework encompassing popular variational MI bounds, and leads to a novel, simple, and powerful contrastive MI estimator we name FLO. Theoretically, we show that the FLO estimator is tight, and it converges under stochastic gradient descent. Empirically, the proposed FLO estimator overcomes the limitations of its predecessors and learns more efficiently. The utility of FLO is verified using extensive benchmarks, and we further inspire the community with novel applications in meta-learning. Our presentation underscores the foundational importance of variational MI estimation in data-efficient learning.

YNIMG Journal 2020 Journal Article

A multi-model deep convolutional neural network for automatic hippocampus segmentation and classification in Alzheimer’s disease

  • Manhua Liu
  • Fan Li
  • Hao Yan
  • Kundong Wang
  • Yixin Ma
  • Li Shen
  • Mingqing Xu

Alzheimer's disease (AD) is a progressive and irreversible brain degenerative disorder. Mild cognitive impairment (MCI) is a clinical precursor of AD. Although some treatments can delay its progression, no effective cures are available for AD. Accurate early-stage diagnosis of AD is vital for the prevention and intervention of the disease progression. The hippocampus is one of the first affected brain regions in AD. To help AD diagnosis, the shape and volume of the hippocampus are often measured using structural magnetic resonance imaging (MRI). However, these features encode limited information and may suffer from segmentation errors. Additionally, the extraction of these features is independent of the classification model, which could result in sub-optimal performance. In this study, we propose a multi-model deep learning framework based on convolutional neural network (CNN) for joint automatic hippocampal segmentation and AD classification using structural MRI data. Firstly, a multi-task deep CNN model is constructed for jointly learning hippocampal segmentation and disease classification. Then, we construct a 3D Densely Connected Convolutional Network (3D DenseNet) to learn features of the 3D patches extracted based on the hippocampal segmentation results for the classification task. Finally, the learned features from the multi-task CNN and DenseNet models are combined to classify disease status. Our method is evaluated on the baseline T1-weighted structural MRI data collected from 97 AD, 233 MCI, and 119 Normal Control (NC) subjects in the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The proposed method achieves a dice similarity coefficient of 87.0% for hippocampal segmentation. In addition, the proposed method achieves an accuracy of 88.9% and an AUC (area under the ROC curve) of 92.5% for classifying AD vs. NC subjects, and an accuracy of 76.2% and an AUC of 77.5% for classifying MCI vs. NC subjects. Our empirical study also demonstrates that the proposed multi-model method outperforms the single-model methods and several other competing methods.

IROS Conference 2020 Conference Paper

Learning Consistency Pursued Correlation Filters for Real-Time UAV Tracking

  • Changhong Fu 0001
  • Xiaoxiao Yang
  • Fan Li
  • Juntao Xu
  • Changjing Liu
  • Peng Lu 0003

Correlation filter (CF)-based methods have demonstrated exceptional performance in visual object tracking for unmanned aerial vehicle (UAV) applications, but suffer from the undesirable boundary effect. To solve this issue, spatially regularized correlation filters (SRDCF) proposes the spatial regularization to penalize filter coefficients, thereby significantly improving the tracking performance. However, the temporal information hidden in the response maps is not considered in SRDCF, which limits the discriminative power and the robustness for accurate tracking. This work proposes a novel approach with dynamic consistency pursued correlation filters, i.e., the CPCF tracker. Specifically, through a correlation operation between adjacent response maps, a practical consistency map is generated to represent the consistency level across frames. By minimizing the difference between the practical and the scheduled ideal consistency map, the consistency level is constrained to maintain temporal smoothness, and rich temporal information contained in response maps is introduced. Besides, a dynamic constraint strategy is proposed to further improve the adaptability of the proposed tracker in complex situations. Comprehensive experiments are conducted on three challenging UAV benchmarks, i.e., UAV123@10FPS, UAVDT, and DTB70. Based on the experimental results, the proposed tracker favorably surpasses the other 25 state-of-the-art trackers with real-time running speed (~43FPS) on a single CPU.

NeurIPS Conference 2020 Conference Paper

Reconsidering Generative Objectives For Counterfactual Reasoning

  • Danni Lu
  • Chenyang Tao
  • Junya Chen
  • Fan Li
  • Feng Guo
  • Lawrence Carin

There has been recent interest in exploring generative goals for counterfactual reasoning, such as individualized treatment effect (ITE) estimation. However, existing solutions often fail to address issues that are unique to causal inference, such as covariate balancing and (infeasible) counterfactual validation. As a step towards more flexible, scalable and accurate ITE estimation, we present a novel generative Bayesian estimation framework that integrates representation learning, adversarial matching and causal estimation. By appealing to the Robinson decomposition, we derive a reformulated variational bound that explicitly targets the causal effect estimation rather than specific predictive goals. Our procedure acknowledges the uncertainties in representation and solves a Fenchel mini-max game to resolve the representation imbalance for better counterfactual generalization, justified by new theory. Further, the latent variable formulation employed enables robustness to unobservable latent confounders, extending the scope of its applicability. The utility of the proposed solution is demonstrated via an extensive set of tests against competing solutions, both under various simulation setups and to real-world datasets, with encouraging results reported.

ICRA Conference 2020 Conference Paper

Training-Set Distillation for Real-Time UAV Object Tracking

  • Fan Li
  • Changhong Fu 0001
  • Fuling Lin
  • Yiming Li 0003
  • Peng Lu 0003

Correlation filter (CF) has recently exhibited promising performance in visual object tracking for unmanned aerial vehicle (UAV). Such an online learning method heavily depends on the quality of the training-set, yet complicated aerial scenarios like occlusion or out of view can reduce its reliability. In this work, a novel time slot-based distillation approach is proposed to efficiently and effectively optimize the training-set's quality on the fly. A cooperative energy minimization function is established to score the historical samples adaptively. To accelerate the scoring process, frames with highly confident tracking results are employed as the keyframes to divide the tracking process into multiple time slots. After the establishment of a new slot, the weighted fusion of the previous samples generates one key-sample, in order to reduce the number of samples to be scored. Besides, when the current time slot exceeds the maximum number of frames that can be scored, the sample with the lowest score will be discarded. Consequently, the training-set can be efficiently and reliably distilled. Comprehensive tests on two well-known UAV benchmarks prove the effectiveness of our method with real-time speed on a single CPU.

YNIMG Journal 2014 Journal Article

A semi-parametric nonlinear model for event-related fMRI

  • Tingting Zhang
  • Fan Li
  • Marlen Z. Gonzalez
  • Erin L. Maresh
  • James A. Coan

Nonlinearity in evoked hemodynamic responses often presents in event-related fMRI studies. Volterra series, a higher-order extension of linear convolution, has been used in the literature to construct a nonlinear characterization of hemodynamic responses. Estimation of the Volterra kernel coefficients in these models is usually challenging due to the large number of parameters. We propose a new semi-parametric model based on Volterra series for the hemodynamic responses that greatly reduces the number of parameters and enables “information borrowing” among subjects. This model assumes that in the same brain region and under the same stimulus, the hemodynamic responses across subjects share a common but unknown functional shape that can differ in magnitude, latency and degree of interaction. We develop a computationally-efficient strategy based on splines to estimate the model parameters, and a hypothesis test on nonlinearity. The proposed method is compared with several existing methods via extensive simulations, and is applied to a real event-related fMRI study.

YNIMG Journal 2013 Journal Article

A semi-parametric model of the hemodynamic response for multi-subject fMRI data

  • Tingting Zhang
  • Fan Li
  • Lane Beckes
  • James A. Coan

A semi-parametric model for estimating hemodynamic response function (HRF) from multi-subject fMRI data is introduced within the context of the General Linear Model. The new model assumes that the HRFs for a fixed brain voxel under a given stimulus share the same unknown functional form across subjects, but differ in height, time to peak, and width. A nonparametric spline-smoothing method is developed to evaluate this common functional form, based on which subject-specific characteristics of the HRFs can be estimated. This semi-parametric model explicitly characterizes the common properties shared across subjects and is flexible in describing various brain hemodynamic activities across different regions and stimuli. In addition, the temporal differentiability of the employed spline basis enables an easy-to-compute way of evaluating latency and width differences in hemodynamic activity. The proposed method is applied to data collected as part of an ongoing study of socially mediated emotion regulation. Comparison with several existing methods is conducted through simulations and real data analysis.

YNIMG Journal 2012 Journal Article

Nonparametric inference of the hemodynamic response using multi-subject fMRI data

  • Tingting Zhang
  • Fan Li
  • Lane Beckes
  • Casey Brown
  • James A. Coan

Estimation and inferences for the hemodynamic response functions (HRF) using multi-subject fMRI data are considered. Within the context of the General Linear Model, two new nonparametric estimators for the HRF are proposed. The first is a kernel-smoothed estimator, which is used to construct hypothesis tests on the entire HRF curve, in contrast to only summaries of the curve as in most existing tests. To cope with the inherent large data variance, we introduce a second approach which imposes Tikhonov regularization on the kernel-smoothed estimator. An additional bias-correction step, which uses multi-subject averaged information, is introduced to further improve efficiency and reduce the bias in estimation for individual HRFs. By utilizing the common properties of brain activity shared across subjects, this is the main improvement over the standard methods where each subject's data is usually analyzed independently. A fast algorithm is also developed to select the optimal regularization and smoothing parameters. The proposed methods are compared with several existing regularization methods through simulations. The methods are illustrated by an application to the fMRI data collected under a psychology design employing the Monetary Incentive Delay (MID) task.

TAAS Journal 2009 Journal Article

Self-organizing fault-tolerant topology control in large-scale three-dimensional wireless networks

  • Yu Wang
  • Lijuan Cao
  • Teresa A. Dahlberg
  • Fan Li
  • Xinghua Shi

Topology control protocol aims to efficiently adjust the network topology of wireless networks in a self-adaptive fashion to improve the performance and scalability of networks. This is especially essential to large-scale multihop wireless networks (e.g., wireless sensor networks). Fault-tolerant topology control has been studied recently. In order to achieve both sparseness (i.e., the number of links is linear with the number of nodes) and fault tolerance (i.e., can survive a certain level of node/link failures), different geometric topologies were proposed and used as the underlying network topologies for wireless networks. However, most of the existing topology control algorithms can only be applied to two-dimensional (2D) networks where all nodes are distributed in a 2D plane. In practice, wireless networks may be deployed in three-dimensional (3D) space, such as underwater wireless sensor networks in ocean or mobile ad hoc networks among space shuttles in space. This article seeks to investigate self-organizing fault-tolerant topology control protocols for large-scale 3D wireless networks. Our new protocols not only guarantee k-connectivity of the network, but also ensure the bounded node degree and constant power stretch factor even under k−1 node failures. All of our proposed protocols are localized algorithms, which only use one-hop neighbor information and constant messages with small time complexity. Thus, it is easy to update the topology efficiently and self-adaptively for large-scale dynamic networks. Our simulation confirms our theoretical proofs for all proposed 3D topologies.

NeurIPS Conference 2005 Conference Paper

From Lasso regression to Feature vector machine

  • Fan Li
  • Yiming Yang
  • Eric Xing

Lasso regression tends to assign zero weights to most irrelevant or redundant features, and hence is a promising technique for feature selection. Its limitation, however, is that it only offers solutions to linear models. Kernel machines with feature scaling techniques have been studied for feature selection with non-linear models. However, such approaches require solving hard non-convex optimization problems. This paper proposes a new approach named the Feature Vector Machine (FVM). It reformulates the standard Lasso regression into a form isomorphic to SVM, and this form can be easily extended for feature selection with non-linear models by introducing kernels defined on feature vectors. FVM generates sparse solutions in the nonlinear feature space and it is much more tractable compared to feature scaling kernel machines. Our experiments with FVM on simulated data show encouraging results in identifying the small number of dominating features that are non-linearly correlated to the response, a task the standard Lasso fails to complete.

AAAI Conference 2005 Conference Paper

Using Modified Lasso Regression to Learn Large Undirected Graphs in a Probabilistic Framework

  • Fan Li

Learning the structures of large undirected graphs with thousands of nodes from data has been an open challenge. In this paper, we use the graphical Gaussian model (GGM) as the underlying model and propose a novel ARD style Wishart prior for the precision matrix of the GGM, which encodes the graph structure we want to learn. With this prior, we can get the MAP estimation of the precision matrix by solving (a modified version of) Lasso regressions and achieve a sparse solution. We use our approach to learn genetic regulatory networks from genome-wide expression microarray data and protein-binding location analysis data. Evaluated on the basis of consistency with the GO annotations, the experiments show that our approach has a much better performance than the clustering-based approaches and BN learning approaches in discovering gene regulatory modules. Key words: data mining, machine learning, Bayesian networks.

JMLR Journal 2004 Journal Article

RCV1: A New Benchmark Collection for Text Categorization Research

  • David D. Lewis
  • Yiming Yang
  • Tony G. Rose
  • Fan Li

Reuters Corpus Volume I (RCV1) is an archive of over 800,000 manually categorized newswire stories recently made available by Reuters, Ltd. for research purposes. Use of this data for research on text categorization requires a detailed understanding of the real world constraints under which the data was produced. Drawing on interviews with Reuters personnel and access to Reuters documentation, we describe the coding policy and quality control procedures used in producing the RCV1 data, the intended semantics of the hierarchical category taxonomies, and the corrections necessary to remove errorful data. We refer to the original data as RCV1-v1, and the corrected data as RCV1-v2. We benchmark several widely used supervised learning methods on RCV1-v2, illustrating the collection's properties, suggesting new directions for research, and providing baseline results for future studies. We make available detailed, per-category experimental results, as well as corrected versions of the category assignments and taxonomy structures, via online appendices.