Arrow Research search

Author name cluster

Lin Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

56 papers
2 author rows

Possible papers

AAAI Conference 2026 Short Paper

Atom-level Adaptive Receptive Fields: A Pruning-Based Encoder for 2D Molecular Graphs (Student Abstract)

  • Yuhao Zhang
  • Ningkang Peng
  • Yafei Liu
  • Lin Li
  • Masaru Kitsuregawa
  • Yanhui Gu

The two-dimensional (2D) graph structure of a molecule encodes abundant latent property information. A well-designed molecular graph encoder can capture informative low-dimensional dense representations of molecules, which can subsequently be applied to a wide range of downstream tasks. To achieve fine-grained and discriminative molecular representations that capture localized structural information, we propose a novel atom-level adaptive receptive field encoder, enabling each atomic node in the molecular graph to dynamically adjust its receptive field size. To the best of our knowledge, we are the first to introduce an effective rank-guided pruning strategy for 2D molecular graphs.

AAAI Conference 2026 Conference Paper

C-GNN-PRUNE: A Unified Graph-Based Framework for Structure-Aware Pruning of Mixture-of-Experts Models

  • Lin Li
  • Yan Wang
  • Zhuopeng Wang

The Mixture-of-Experts (MoE) architecture has emerged as a promising paradigm for scaling large language models (LLMs) by activating only a sparse subset of experts per input. However, its massive parameter size remains a major obstacle to efficient deployment. Existing pruning methods often ignore two key aspects: the intricate structural dependencies among experts and the heterogeneous importance of different layers. To tackle these issues, we propose C-GNN-PRUNE, a unified and structure-aware compression framework tailored for MoE models. Our method introduces an Entropy-Guided Allocation Module that dynamically assigns pruning budgets by leveraging expert activation entropy, enabling adaptive handling of inter-layer heterogeneity. To preserve structural collaboration patterns, we construct an expert interaction graph that fuses functional similarity and routing behavior, and employ a GNN-Based Embedding Module to learn structure-aware expert representations. These embeddings, along with co-activation patterns, are fed into a Community Detection Module to identify expert clusters for structured pruning. Finally, an Activation-Aware Selection Module retains the most critical experts in each community, balancing sparsity and expressiveness. Experiments on multiple open-source MoE models demonstrate that C-GNN-PRUNE consistently outperforms prior methods under various pruning ratios, achieving better trade-offs between compression and accuracy. This framework provides a modular and effective solution for structure-preserving compression of large-scale MoE models.
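
As a rough illustration of the entropy-guided allocation idea, the sketch below computes the Shannon entropy of each layer's expert activation counts and splits a global pruning budget across layers. The function names, the toy counts, and the rule that lower-entropy layers (where a few experts dominate routing) receive a larger share are assumptions made for illustration; the abstract does not state the actual allocation rule.

```python
import math

def activation_entropy(counts):
    """Shannon entropy (in nats) of one layer's expert activation counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs)

def allocate_pruning_budget(layer_counts, total_to_prune):
    """Split a global pruning budget (number of experts) across layers.

    Assumption (not from the paper): layers with lower activation entropy
    are treated as more redundant and receive a larger share.
    """
    scores = []
    for counts in layer_counts:
        max_entropy = math.log(len(counts))
        if max_entropy > 0:
            # 0 for uniform routing, close to 1 when one expert dominates.
            scores.append(max(0.0, 1.0 - activation_entropy(counts) / max_entropy))
        else:
            scores.append(0.0)  # a single-expert layer has nothing to prune
    total_score = sum(scores)
    if total_score == 0:
        shares = [1.0 / len(scores)] * len(scores)
    else:
        shares = [s / total_score for s in scores]
    # Floor each share, then hand out the remainder by largest fractional part.
    raw = [share * total_to_prune for share in shares]
    budget = [int(r) for r in raw]
    by_fraction = sorted(range(len(raw)), key=lambda i: raw[i] - budget[i], reverse=True)
    for i in by_fraction[: total_to_prune - sum(budget)]:
        budget[i] += 1
    # Never prune every expert in a layer; this cap can leave budget unspent.
    return [min(b, len(c) - 1) for b, c in zip(budget, layer_counts)]
```

For example, `allocate_pruning_budget([[25, 25, 25, 25], [97, 1, 1, 1]], 2)` assigns the whole budget to the second layer, whose routing is concentrated on one expert.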

AAAI Conference 2026 Conference Paper

Exploring Selective Avoidance for Online User Behavior Analysis: A Forest of Thought Explanation

  • Xiaohua Wu
  • Lin Li
  • Kaize Shi
  • Xiaohui Tao
  • Jianwei Zhang
  • Yuefeng Li

The response behaviors observed in online user-generated content (UGC) frequently demonstrate non-linear characteristics, such as conditional branching and selective avoidance. These patterns present additional challenges for ensuring the trustworthiness of Large Language Model (LLM) reasoning, particularly as their unidirectional, left-to-right inference mechanisms may not adequately capture such complex reasoning dynamics. To address this, we propose Forest of Thought Explanation (FoTE), a novel prompting method that models the selective avoidance in UGC while ensuring explanation consensus through reasoning paths across all decision sub-trees. FoTE first generates various reasoning paths through an adaptive CoT prompting. Each generated thought is subsequently evaluated through cooperative game theory to quantify its fair influence. The thoughts with the top-k contribution scores are preserved and randomly sampled to emulate selective avoidance for the next reasoning iteration. Through extensive evaluations across three open-source LLMs and two established social science problems (spanning four benchmark datasets), FoTE demonstrates superior success rates compared to competing prompting strategies. Notably, its performance gains increase with the strength of selective avoidance in social problems. The trustworthiness of our FoTE is enhanced by the incorporation of (1) a solid theoretical foundation and (2) a transparent reasoning path that converges toward consensus.
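
The score-then-sample step described above can be sketched as follows. The example uses exact Shapley values, a standard attribution from cooperative game theory; whether FoTE uses Shapley values specifically is an assumption, and `coalition_value`, `USEFULNESS`, and `select_thoughts` are hypothetical names with toy data.

```python
import random
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley value of each player for a coalition value function."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Probability weight of a coalition of this size preceding p.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result

def select_thoughts(thoughts, value, k, n_keep, seed=0):
    """Keep the top-k thoughts by Shapley value, then randomly sample n_keep of them."""
    scores = shapley_values(thoughts, value)
    top_k = sorted(thoughts, key=lambda t: scores[t], reverse=True)[:k]
    rng = random.Random(seed)
    return rng.sample(top_k, min(n_keep, len(top_k)))

# Toy, hypothetical value function: each thought adds a fixed usefulness.
USEFULNESS = {"t1": 0.5, "t2": 0.3, "t3": 0.1, "t4": 0.0}

def coalition_value(coalition):
    return sum(USEFULNESS[t] for t in coalition)
```

With this additive toy game, each thought's Shapley value equals its own usefulness, so `select_thoughts(list(USEFULNESS), coalition_value, k=3, n_keep=2)` draws two thoughts from `t1`, `t2`, and `t3`. Exact computation enumerates every coalition and is only practical for a small number of thoughts.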

AAAI Conference 2026 Conference Paper

Guided Distillation and Risk Adaptive Evolution for Multi-Robot Navigation

  • Xuyang Li
  • Jianwu Fang
  • Lin Li
  • Boyuan Chen
  • Guangliang Li
  • Jianru Xue

Recent advancements in multi-robot navigation have explored methods that combine Large Language Models (LLMs) for tasks like scene understanding or high-level decision-making. However, these approaches face challenges with high inference latency and potential hallucinations. To address these challenges, we propose a knowledge-driven Reinforcement Learning (RL) framework, GUIDER, that utilizes an LLM in two different offline roles. First, we leverage the LLM as an offline knowledge source. Its expertise is distilled into a compact model, which is applied only when the RL agent is uncertain about its own value estimates and the model itself is confident in its prediction. Additionally, we utilize the LLM as an offline semantic engine. This process translates the LLM's high-level understanding of situational risk into a dynamic adjustment of the RL agent's behavioral style, evolving a function that optimally balances conservative and aggressive actions. We conduct extensive experiments in both terrestrial and maritime settings. Across all maritime scenarios (3–12 robots), GUIDER improves the task success rate and reduces the collision rate significantly compared to the state-of-the-art RL-based multi-robot navigation methods.
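
The two-condition gate described above (consult the distilled model only when the RL agent is uncertain and the distilled model is confident) can be written as a small decision rule. This is a minimal sketch: the function name, the scalar uncertainty and confidence inputs, and both threshold values are assumptions for illustration, not details from the paper.

```python
def choose_action(rl_action, rl_value_uncertainty,
                  distilled_action, distilled_confidence,
                  uncertainty_threshold=0.5, confidence_threshold=0.8):
    """Return the distilled model's action only when both gate conditions hold.

    The thresholds are hypothetical placeholders.
    """
    rl_is_uncertain = rl_value_uncertainty > uncertainty_threshold
    model_is_confident = distilled_confidence >= confidence_threshold
    if rl_is_uncertain and model_is_confident:
        return distilled_action
    # Otherwise keep the RL policy's own action.
    return rl_action
```

Because both conditions must hold, a confident distilled model never overrides an RL agent that is already sure of its value estimate, and an unsure distilled model is ignored even when the agent is uncertain.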

AAAI Conference 2026 Conference Paper

How Foundational Skills Influence VLM-based Embodied Agents: A Native Perspective

  • Bo Peng
  • Pi Bu
  • Keyu Pan
  • Xinrun Xu
  • Yingxiu Zhao
  • Miao Chen
  • Yang Du
  • Lin Li

Recent advances in vision–language models (VLMs) have shed light on human-level embodied intelligence. However, existing benchmarks for VLM-driven embodied agents still rely on high-level commands or discretised action spaces, "non-native" settings that diverge markedly from the real world. Moreover, current benchmarks focus exclusively on high-level tasks and lack joint evaluation and analysis at both the low and high levels. To bridge these gaps, we present NativeEmbodied, a challenging benchmark for VLM-driven embodied agents that adopts a unified, native low-level action space. Built upon diverse simulated scenes, NativeEmbodied first designs three representative high-level tasks in complex scenarios to evaluate overall performance. For more detailed and comprehensive performance analysis, we further decouple the entangled skills behind complex tasks and construct four types of low-level tasks, each corresponding to a key fundamental embodied skill. This joint evaluation across task and skill granularities enables a fine-grained assessment of embodied agents. Comprehensive experiments on the best VLMs reveal pronounced deficiencies in certain fundamental embodied skills. Further analysis shows that these bottlenecks severely constrain performance on high-level tasks. Our NativeEmbodied not only pinpoints the key challenges faced by current VLM-driven embodied agents, but also provides valuable insight for future development of this field.

JBHI Journal 2026 Journal Article

HSD: Hough-Based Structure-Aware Detection of B-Lines in Lung Ultrasound

  • Tuo Liu
  • Hao Zhou
  • Jia-Hao Wang
  • Yu Zhang
  • Chen Chen
  • Yang Chen
  • Guang-Quan Zhou
  • Lin Li

B-lines are artifacts produced by the interaction of the ultrasound with the small air-liquid interface, which often serve as crucial biomarkers for evaluating lung pathology, such as the presence of liquid. However, due to the reverberation phenomenon, B-lines manifest as blurred, strip-like comet tails perpendicularly originating from the pleural line, making their automatic identification in speckle-noisy ultrasound images particularly challenging. This study proposes a Hough-based structure-aware detection framework, dubbed HSD, which leverages structural priors and the intrinsic relationship between the pleural line and B-lines to enhance B-line detection in ultrasound images. First, the proposed method adopts a shared encoder and two collaborative decoders to improve B-line identification with auxiliary pleural line detection, ensuring effective representation learning of linear structural features under inherent prior constraints. Specifically, one decoder incorporates Hough-based regression to reinforce the modeling of the global linear nature for B-line detection, alleviating the appearance influences of the fuzzy comet-tail. Simultaneously, another pathway enhances the exploration of the slender, curved morphology by integrating semantic context learning with linear heatmap regression, thereby facilitating the detection of the pleural line for calibration of B-lines. Second, we introduce a position-aware rectification module to ensure the consistency of the pleural line and its perpendicular alignment with B-lines. This post-processing module reduces the influence of ambiguous pixels, improving the robustness of B-line detection. Extensive experimental results on an in-house ultrasound dataset demonstrate the superiority of the proposed approach, which achieves a precision of 0.743, a recall of 0.953, and an F-measure of 0.837, substantially ahead of other methods, suggesting its potential for detecting pathological indicators in lung ultrasound.

EAAI Journal 2026 Journal Article

Integrating domain knowledge in AI-driven geopolymer concrete property prediction system: A comprehensive survey

  • Yunong Li
  • Sasinipha Makklang
  • Lin Li
  • Zhengxin Chen

Artificial intelligence-driven geopolymer concrete property prediction systems have been increasingly adopted to support manufacturing-oriented mix design and performance forecasting. However, existing datasets used in these systems commonly exhibit four issues (unclear parameter selection, narrow range, insufficient samples, and incomplete consideration of influencing factors), which collectively constrain model reliability and interpretability for practical deployment. This paper presents the first comprehensive survey of domain knowledge integration between model input parameters and production-related factors affecting geopolymer concrete. A structured evidence mapping approach was applied to systematically analyze 47 published models, using seven study questions to examine algorithm selection, input parameter design, evaluation strategies, and interpretability practices. The review identifies 34 algorithm categories and 102 distinct input parameters used in existing models, while revealing 10 critical domain-informed factors that are consistently underrepresented. Finally, this study proposes a structured knowledge base for improving the reliability and interpretability of model prediction, providing guidance for the development of next-generation models suitable for industrial manufacturing applications.

AAAI Conference 2026 Conference Paper

Modeling Item-Level Dynamic Variability with Residual Diffusion for Bundle Recommendation

  • Dong Zhang
  • Lin Li
  • Ming Li
  • Amran Bhuiyan
  • Meng Sun
  • Xiaohui Tao
  • Jimmy Huang

Existing solutions for bundle recommendation (BR) have achieved remarkable effectiveness for predicting the user’s preference for prebuilt bundles. However, bundle-item (B-I) affiliation will vary dynamically in real scenarios. For example, a bundle themed as ‘casual outfit’ may add ‘hat’ or remove ‘watch’ due to factors such as seasonal variations, changes in user preferences or inventory adjustments. Our empirical study demonstrates that the performance of mainstream BR models may fluctuate or decline under item-level variability. This paper makes the first attempt to address the above problem and proposes Residual Diffusion for Bundle Recommendation (RDiffBR) as a model-agnostic generative framework which can assist a BR model in adapting to this scenario. During the initial training of the BR model, RDiffBR employs a residual diffusion model to process the item-level bundle embeddings which are generated by the BR model to represent bundle theme via a forward-reverse process. In the inference stage, RDiffBR reverses item-level bundle embeddings obtained by the well-trained bundle model under B-I variability scenarios to generate the effective item-level bundle embeddings. In particular, the residual connection in our residual approximator significantly enhances BR models’ ability to generate high-quality item-level bundle embeddings. Experiments on six BR models and four public datasets from different domains show that RDiffBR improves the Recall and NDCG of backbone BR models by up to 23% while increasing training time by only about 4%.

AAAI Conference 2026 Conference Paper

Personalize Anything for Free with Diffusion Transformer

  • Haoran Feng
  • Zehuan Huang
  • Lin Li
  • Lu Sheng

Personalized image generation aims to produce images of user-specified concepts while enabling flexible editing. Recent training-free approaches, while exhibiting higher computational efficiency than training-based methods, struggle with identity preservation, applicability, and compatibility with diffusion transformers (DiTs). In this paper, we uncover the untapped potential of DiT, where simply replacing denoising tokens with those of a reference subject achieves zero-shot subject reconstruction. This simple yet effective feature injection technique unlocks diverse scenarios, from personalization to image editing. Building upon this observation, we propose Personalize Anything, a training-free framework that achieves personalized image generation in DiT through: 1) timestep-adaptive token replacement that enforces subject consistency via early-stage injection and enhances flexibility through late-stage regularization, and 2) patch perturbation strategies to boost structural diversity. Our method seamlessly supports layout-guided generation, multi-subject personalization, and mask-controlled editing. Evaluations demonstrate that our method, without requiring any training, achieves state-of-the-art performance in identity preservation and versatility. Our work establishes new insights into DiTs while delivering a practical paradigm for efficient personalization.
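
To make the early-stage token replacement concrete, the sketch below swaps the denoising tokens at subject positions for the reference subject's tokens during the early part of the schedule and leaves them untouched afterwards. Tokens are plain Python values here; the function name, the boolean `subject_mask`, and the `early_fraction` cutoff are assumptions for illustration, and the late-stage regularization and patch perturbation from the abstract are not modeled.

```python
def replace_tokens(denoising_tokens, reference_tokens, subject_mask,
                   step, num_steps, early_fraction=0.8):
    """Return the tokens to use at this denoising step.

    During the early stage (step < early_fraction * num_steps), positions
    where subject_mask is True take the reference token. The cutoff value
    is a hypothetical placeholder.
    """
    if not (len(denoising_tokens) == len(reference_tokens) == len(subject_mask)):
        raise ValueError("token lists and mask must have the same length")
    in_early_stage = step < early_fraction * num_steps
    if not in_early_stage:
        # Late stage: no injection in this sketch.
        return list(denoising_tokens)
    return [ref if masked else tok
            for tok, ref, masked in zip(denoising_tokens, reference_tokens, subject_mask)]
```

For instance, with tokens `["a", "b", "c"]`, reference tokens `["X", "Y", "Z"]`, and mask `[False, True, True]`, step 0 of 10 yields `["a", "Y", "Z"]`, while step 9 returns the denoising tokens unchanged.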

AAAI Conference 2026 Conference Paper

Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension

  • Lin Li
  • Wei Chen
  • Jiahui Li
  • Kwang-Ting Cheng
  • Long Chen

Recent advances in multi-modal large language models (MLLMs) have significantly improved object-level grounding and region captioning. However, they remain limited in visual relation understanding, struggling even with binary relation detection, let alone N-ary relations involving multiple semantic roles. The core reason is the lack of modeling for structural semantic dependencies among multi-entities, leading to over-reliance on language priors (e.g., defaulting to "person drinks a milk" if a person is merely holding it). To this end, we propose Relation-R1, the first unified relation comprehension framework that explicitly integrates cognitive chain-of-thought (CoT)-guided supervised fine-tuning (SFT) and group relative policy optimization (GRPO) within a reinforcement learning (RL) paradigm. Specifically, we first establish foundational reasoning capabilities via SFT, enforcing structured outputs with thinking processes. Then, GRPO is utilized to refine these outputs via multi-rewards optimization, prioritizing visual-semantic grounding over language-induced biases, thereby improving generalization capability. Furthermore, we investigate the impact of various CoT strategies within this framework, demonstrating that a specific-to-general progressive approach in CoT guidance further improves generalization, especially in capturing synonymous N-ary relations. Extensive experiments on widely-used PSG and SWiG datasets demonstrate that Relation-R1 achieves state-of-the-art performance in both binary and N-ary relation understanding.

EAAI Journal 2025 Journal Article

A novel lightweight model combined with convolutional neural network and transformer for gearbox fault diagnosis using infrared thermal images

  • Xiao Zhuang
  • Jian Ge
  • Xiaolong Mao
  • Di Zhou
  • Hongbin Yao
  • Weifang Sun
  • Lin Li
  • Jiawei Xiang

Hybrid models combining convolutional neural network (CNN) and transformer show great promise in gearbox fault diagnosis, but their practical deployment still faces many bottlenecks due to the complex structure and high computational complexity of the transformer architecture. Therefore, this research proposes a novel framework, CNN-transformer lightweight network (CTLNet), which integrates CNN and transformer, for gearbox fault diagnosis to address the above challenges. First, this paper proposes a multi-scale dilation convolution module (MDC) to enhance the extraction of large local receptive field features in infrared images while keeping the parameter quantity unchanged. Second, the depthwise separable convolution residual module (SCR) is proposed and constructed to generate richer and diverse feature representations of the extracted infrared images, and reduces learning parameters and calculations. Finally, a global attention interaction (GAI) module is proposed to capture global information and generate attention feature maps with less calculation cost. A large number of experiments are conducted to verify the effectiveness of the proposed CTLNet. Results show that the proposed CTLNet outperforms other benchmarks with the highest accuracy of 99.9% and small calculation resource, while its parameters are only 0.28 million (M) and its floating-point operations per second (FLOPs) are 153.56 million (M). The proposed framework can accommodate the advantages of dilated convolution, separable convolution residual module, and global attention interaction to deal with the infrared thermal images, which differ from other existing lightweight CNN-Transformer hybrid approaches. The proposed CTLNet model can be deployed on edge computing devices, which further promotes the application of artificial intelligence models’ deployment in practical industrial application scenarios.

IROS Conference 2025 Conference Paper

A Recursive Total Least Squares Solution for Bearing-Only Target Motion Analysis and Circumnavigation

  • Lin Li
  • Xueming Liu
  • Zhoujingzi Qiu
  • Tianjiang Hu
  • Qingrui Zhang

Bearing-only Target Motion Analysis (TMA) is a promising technique for passive tracking in various applications as a bearing angle is easy to measure. Despite its advantages, bearing-only TMA is challenging due to the nonlinearity of the bearing measurement model and the lack of range information, which impairs observability and estimator convergence. This paper addresses these issues by proposing a Recursive Total Least Squares (RTLS) method for online target localization and tracking using mobile observers. The RTLS approach, inspired by previous results on Total Least Squares (TLS), mitigates biases in position estimation and improves computational efficiency compared to pseudo-linear Kalman filter (PLKF) methods. Additionally, we propose a circumnavigation controller to enhance system observability and estimator convergence by guiding the mobile observer in orbit around the target. Extensive simulations and experiments are performed to demonstrate the effectiveness and robustness of the proposed method. The proposed algorithm is also compared with the state-of-the-art approaches, which confirms its superior performance in terms of both accuracy and stability.

EAAI Journal 2025 Journal Article

A structure-aware routing based anomaly detection for industrial multi-sensor time series

  • Qixuan Zhao
  • Jingling Yuan
  • Peiliang Zhang
  • Xin Zhang
  • Jianquan Liu
  • Lin Li

Anomaly detection in multi-sensor time series (MTS) is a critical technology for ensuring the stable operation of modern industrial systems. Current mainstream methods identify anomalies by learning the structural consistency of normal data. As a result, natural structural breaks, a typical random non-stationary phenomenon in multi-sensor systems, are frequently misclassified as anomalies by these methods. To address this issue, we propose a Structure-Aware Routing (SaR) based Mixture-of-Experts (MoE) framework (SMoE) for anomaly detection. SMoE eliminates interference from structural breaks by assigning sensor series to specialized experts through SaR. First, the proposed SaR consists of Spatial Routing and Temporal Routing, which capture structural breaks at two levels: global breaks between sensors and local window-level breaks within individual sensors. Second, the SMoE-based anomaly detection framework can be applied to various sensor time series backbone networks, including large-scale models, significantly enhancing anomaly detection accuracy in MTS. Extensive experiments conducted on eight datasets across five industrial domains demonstrate that SMoE achieves an F1 score improvement ranging from 1% to 9% across four distinct backbone networks for anomaly detection. SMoE achieves an F1 score improvement of up to 8.4% compared to ten advanced baselines.

IJCAI Conference 2025 Conference Paper

A Survey on Multi-View Knowledge Graph: Generation, Fusion, Applications and Future Directions

  • Zihan Yang
  • Xiaohui Tao
  • Taotao Cai
  • Yifu Tang
  • Haoran Xie
  • Lin Li
  • Jianxin Li
  • Qing Li

Knowledge Graphs (KGs) have revolutionized structured knowledge representation, yet their capacity to model real-world complexity and heterogeneity remains fundamentally constrained. The emerging paradigm of Multi-View Knowledge Graphs (MVKGs) addresses this gap through multi-view learning, but existing research lacks systematic integration. This survey provides the first systematic consolidation of MVKG methodologies, with four pivotal contributions: 1) The first unified taxonomy of view generation paradigms that rigorously categorizes views into four types: structure, semantic, representation, and knowledge & modality; 2) A novel methodological typology for view fusion that systematically classifies techniques by fusion targets (feature, decision, and hybrid); 3) Task-centric application mapping that bridges theoretical MVKG constructs to node/link/graph-level downstream tasks; 4) A forward-looking roadmap identifying underexplored challenges. By unifying fragmented methodologies and formalizing MVKG design principles, this survey serves as a roadmap for advancing KG versatility in complex AI-driven scenarios. In doing so, it paves the way for more efficient knowledge integration, enhanced decision-making, and cross-domain learning in real-world applications.

EAAI Journal 2025 Journal Article

ASDS-you only look once version 8: A real-time segmentation method for cross-scale prefabricated laminated slab components

  • Lin Li
  • Qing Jiang
  • Guanting Ye
  • Xun Chong
  • Xinyu Zhu

Prefabricated laminated slabs (PLS) are widely used globally due to their convenience. However, this convenience often comes with challenges in quality control. Although factories currently conduct quality inspections of PLS component arrangements, these inspections mainly rely on manual visual detection methods, which are highly inefficient. This paper proposes an improved You Only Look Once version 8 (YOLOv8) instance segmentation network for PLS inspection. To address the difficulties in detecting PLS components, we introduced multilevel auxiliary information in tandem with the main branch, designed an additional small-target feature fusion layer and segmentation header, and enhanced the original YOLOv8. These improvements allow for the extraction and segmentation of cross-scale information, reducing information gradient loss. However, this approach generates excessive cross-scale information, requiring a balance between the fusion weights of large-scale and small-scale information. To achieve this, we introduced a multilevel feature fusion module Semantic and Detail Infusion (SDI) and a dynamic upsampling module (Dysample). Experimental results show that the proposed method achieved a mean average precision (mAP50) of 93.9% and a detection speed of 108.7 frames per second. Additionally, to support future research and applications, our method provides code that allows for direct derivation of the coordinates of each component class relative to the floor slab. Thus, the proposed detection method holds significant practical application value.

AAAI Conference 2025 Conference Paper

Dynamic Uncertainty Estimation for Offline Reinforcement Learning

  • Jiesheng Wang
  • Lin Li
  • Wei Wei
  • Yujia Zhang
  • Xin Yang

Offline reinforcement learning confronts the distributional shift challenge, a consequence of learning policy from static datasets. Current methods primarily handle this issue by aligning the learned policy with the behavior policy or conservatively estimating Q-values for out-of-distribution (OOD) actions. However, these approaches can lead to overly pessimistic estimation of Q-values of the OOD actions in unfamiliar situations, resulting in a suboptimal policy. To address this, we propose a new method, Dynamic Uncertainty estimation for Offline Reinforcement Learning. This method introduces a base density-truncated OOD data sampling approach to reduce the impact of extrapolation errors on uncertainty estimation. It enables conservative estimation of Q-values for OOD actions while avoiding negative impacts on in-distribution data. We also develop a dynamic uncertainty estimation mechanism to prevent excessive pessimism and enhance the generalization of the Q-function. This mechanism dynamically adjusts the degree of pessimism in the Q-function by minimizing the error between target and estimated values. Our method outperforms existing algorithms, as demonstrated by experimental results based on the D4RL benchmark, and proves its superiority in addressing the distributional shift challenge.

NeurIPS Conference 2025 Conference Paper

Factor Decorrelation Enhanced Data Removal from Deep Predictive Models

  • Wenhao Yang
  • Lin Li
  • Xiaohui Tao
  • Kaize Shi

The imperative of user privacy protection and regulatory compliance necessitates sensitive data removal in model training, yet this process often induces distributional shifts that undermine model performance, particularly in out-of-distribution (OOD) scenarios. We propose a novel data removal approach that enhances deep predictive models through factor decorrelation and loss perturbation. Our approach introduces: (1) a discriminative-preserving factor decorrelation module employing dynamic adaptive weight adjustment and iterative representation updating to reduce feature redundancy and minimize inter-feature correlations; and (2) a smoothed data removal mechanism with loss perturbation that creates information-theoretic safeguards against data leakage during removal operations. Extensive experiments on five benchmark datasets show that our approach outperforms other baselines and consistently achieves high predictive accuracy and robustness even under significant distribution shifts. The results highlight its superior efficiency and adaptability in both in-distribution and out-of-distribution scenarios.

EAAI Journal 2025 Journal Article

Generalized deep neural network for seismic site response prediction with transfer learning

  • Lin Li
  • Feng Jin
  • Duruo Huang
  • Chunhui He
  • Fulong Ma

Accurate prediction of site-specific seismic responses plays a pivotal role in evaluating earthquake effects on infrastructure. Traditional physics-based methods suffer from inherent model assumptions, significant parameter uncertainty, and high computational costs. This study proposes a generalized deep neural network that integrates seismic motion data and site information to predict three-directional seismic responses across various site types. Trained on an extensive dataset of recorded data from Kiban Kyoshin Network in Japan, the model demonstrated excellent performance on the test set, with correlation coefficients reaching 97% between the predicted and target results. Utilizing transfer learning techniques, it was adapted to seismic response prediction at new sites not included in the training set. Compared to the state-of-the-art finite element method, the retrained model significantly improved prediction accuracy, with an overall average error reduction of approximately 50%. Additionally, the model effectively captured the nonlinear response characteristics of a site during strong seismic events without any strong motion data to retrain. The proposed model demonstrated superior prediction accuracy, higher computational efficiency, and stronger generalization capabilities compared to traditional physics-based models.

JBHI Journal 2025 Journal Article

GuidedMorph: Two-Stage Deformable Registration for Breast MRI

  • Yaqian Chen
  • Hanxue Gu
  • Haoyu Dong
  • Qihang Li
  • Yuwen Chen
  • Nicholas Konz
  • Lin Li
  • Maciej A. Mazurowski

Accurately registering breast MR images from different time points enables the alignment of anatomical structures and tracking of tumor progression, supporting more effective breast cancer detection, diagnosis, and treatment planning. However, the complexity of dense tissue and its highly non-rigid nature pose challenges for conventional registration methods, which primarily focus on aligning general structures while overlooking intricate internal details. To address this, we propose GuidedMorph, a novel two-stage registration framework designed to better align dense tissue. In addition to a single-scale network for global structure alignment, we introduce a framework that utilizes dense tissue information to track breast movement. The learned transformation fields are fused by introducing the Dual Spatial Transformer Network (DSTN), improving overall alignment accuracy. A novel warping method based on the Euclidean distance transform (EDT) is also proposed to accurately warp the registered dense tissue and breast masks, preserving fine structural details during deformation. It also operates effectively with the VoxelMorph and TransMorph backbones, offering a versatile solution for breast registration. We validate our method on ISPY2 and an internal dataset, demonstrating superior performance in dense tissue, overall breast alignment, and breast structural similarity index measure (SSIM), with notable improvements by over 20.9% in dense tissue Dice, 2.1% in breast Dice, and 3.5% in breast SSIM compared to the best baseline. The code is available at https://github.com/mazurowski-lab/GuidedMorph.git

AAAI Conference 2025 Conference Paper

Improving Generalization in Offline Reinforcement Learning via Latent Distribution Representation Learning

  • Da Wang
  • Lin Li
  • Wei Wei
  • Qixian Yu
  • Jianye Hao
  • Jiye Liang

Dealing with the distribution shift is a significant challenge when building offline reinforcement learning (RL) models that can generalize from a static dataset to out-of-distribution (OOD) scenarios. Previous approaches have employed pessimism or conservatism strategies. More recently, data-driven work has taken a distributional perspective, treating offline data as a domain adaptation problem. However, these methods use heuristic techniques to simulate distribution shifts, resulting in a limited diversity of artificially created distribution gaps. In this paper, we propose a novel perspective: offline datasets inherently contain multiple latent distributions, with behavior data from diverse policies potentially following different distributions and data from the same policy across various time phases also exhibiting distribution variance. We introduce the Latent Distribution Representation Learning (LAD) framework, which aims to characterize the multiple latent distributions within offline data and reduce the distribution gaps between any pair of them. LAD consists of a min-max adversarial process: it first identifies the "worst-case" distributions to enlarge the diversity of distribution gaps and then reduces these gaps to learn invariant representations for generalization. We derive a generalization error bound to support LAD theoretically and verify its effectiveness through extensive experiments.

IJCAI Conference 2025 Conference Paper

Indirect Alignment and Relationship Preservation for Domain Generalization

  • Wei Wei
  • Zixiong Li
  • Jing Yan
  • Mingwen Shao
  • Lin Li

Domain generalization (DG) aims to train models on multiple source domains to generalize effectively to unseen target domains, addressing performance degradation caused by domain shifts. Many existing methods rely on direct feature alignment, which disrupts natural sequence relationships, causes misalignment and feature distortion, and leads to overfitting, especially with significant domain gaps. To tackle these issues, we propose a novel DG approach with two key modules: the Sample Difference Keeping (SDK) module, which preserves natural sequence relationships to enhance feature diversity and separability, and the Sample Consistency Alignment (SCA) module, which achieves indirect alignment by modeling inter-class and inter-domain relationship consistencies. This approach mitigates overfitting and misalignment, ensuring adaptability to significant domain gaps. Extensive experiments demonstrate that our framework consistently outperforms state-of-the-art methods.

NeurIPS Conference 2025 Conference Paper

Interaction-Centric Knowledge Infusion and Transfer for Open Vocabulary Scene Graph Generation

  • Lin Li
  • Chuhan ZHANG
  • Dong Zhang
  • Chong Sun
  • Chen Li
  • Long Chen

Open-vocabulary scene graph generation (OVSGG) extends traditional SGG by recognizing novel objects and relationships beyond predefined categories, leveraging the knowledge from pre-trained large-scale models. Existing OVSGG methods always adopt a two-stage pipeline: 1) Infusing knowledge into large-scale models via pre-training on large datasets; 2) Transferring knowledge from pre-trained models with fully annotated scene graphs during supervised fine-tuning. However, due to a lack of explicit interaction modeling, these methods struggle to distinguish between interacting and non-interacting instances of the same object category. This limitation induces critical issues in both stages of OVSGG: it generates noisy pseudo-supervision from mismatched objects during knowledge infusion, and causes ambiguous query matching during knowledge transfer. To this end, in this paper, we propose an interACtion-Centric end-to-end OVSGG framework (ACC) in an interaction-driven paradigm to minimize these mismatches. For interaction-centric knowledge infusion, ACC employs a bidirectional interaction prompt for robust pseudo-supervision generation to enhance the model's interaction knowledge. For interaction-centric knowledge transfer, ACC first adopts interaction-guided query selection that prioritizes pairing interacting objects to reduce interference from non-interacting ones. Then, it integrates interaction-consistent knowledge distillation to bolster robustness by pushing relational foreground away from the background while retaining general knowledge. Extensive experimental results on three benchmarks show that ACC achieves state-of-the-art performance, demonstrating the potential of interaction-centric paradigms for real-world applications.

IJCAI Conference 2025 Conference Paper

L2M2: A Hierarchical Framework Integrating Large Language Model and Multi-agent Reinforcement Learning

  • Minghong Geng
  • Shubham Pateria
  • Budhitama Subagdja
  • Lin Li
  • Xin Zhao
  • Ah-Hwee Tan

Multi-agent reinforcement learning (MARL) has demonstrated remarkable success in collaborative tasks, yet faces significant challenges in scaling to complex scenarios requiring sustained planning and coordination across long horizons. While hierarchical approaches help decompose these tasks, they typically rely on hand-crafted subtasks and domain-specific knowledge, limiting their generalizability. We present L2M2, a novel hierarchical framework that leverages large language models (LLMs) for high-level strategic planning and MARL for low-level execution. L2M2 enables zero-shot planning that supports both end-to-end training and direct integration with pre-trained MARL models. Experiments in the VMAS environment demonstrate that L2M2's LLM-guided MARL achieves superior performance while requiring less than 20% of the training samples compared to baseline methods. In the MOSMAC environment, L2M2 demonstrates strong performance with pre-defined subgoals and maintains substantial effectiveness without subgoals - scenarios where baseline methods consistently fail. Analysis through kernel density estimation reveals L2M2's ability to automatically generate appropriate navigation plans, demonstrating its potential for addressing complex multi-agent coordination tasks.

EAAI Journal 2025 Journal Article

Multi-task driver gaze estimation in real world driving scenes

  • Xinmei Wu
  • Lin Li
  • Gang Zhou
  • Qilong Wu
  • Xinkai Zuo
  • Haihong Zhu
  • Shen He

The driver gaze estimation task is pivotal for safe driving. However, challenges persist when dealing with changing illumination, eyeglasses, adjacent zones, or personal behavior and appearance. To tackle these problems, we introduce a driver gaze estimation approach, including gaze zone and direction estimation. We propose a global facial feature extraction convolutional neural network (gCNN) embedded with an attention network for driver gaze zone estimation. The incorporation of attention mechanisms in different dimensions (channel or spatial) at various stages helps the network efficiently capture overall generic features in early stages and concrete representations in later stages. This network is also applied to extract facial features in the gaze direction estimation task, while a local eye feature extraction convolutional neural network (LeCNN) is proposed for fine-grained eye feature extraction. The facial and eye features, as well as head pose, are concatenated and fused to regress the finer gaze direction. The experimental results show that the network achieves errors of 2.43° and 4.36° on the MPIIFaceGaze and EyeDIAP datasets, respectively, outperforming prior art. Furthermore, in the driver gaze zone estimation task, our method achieves an accuracy of 98.87% on the Laboratory for Intelligent and Safe Automobiles (LISA) Gaze dataset, a 3.91% improvement over prior art. It also achieves a comparable performance of 82.80% on the Driver Gaze in the Wild (DGW) dataset.

EAAI Journal 2025 Journal Article

Probabilistic intervals prediction based on adaptive regression with attention residual connections and covariance constraints

  • Fan Zhang
  • Min Wang
  • Lin Li
  • Yepeng Liu
  • Hua Wang

This paper introduces a novel prediction interval method called Adaptive Regression with Attention Residual Connection and Covariance Constraint (AR-ARCC). By integrating Monte Carlo and Bayesian methods, we leverage the strengths of both to achieve a more flexible and accurate method for generating prediction intervals. Additionally, through the optimization of the loss function, introduction of penalty terms, and improvement of mean squared error calculations, the model’s performance in interval prediction tasks is enhanced. Finally, the integration of an interactive channel heterogeneous self-attention module, combined with residual blocks, enhances the modeling capability of the neural network. The comprehensive application of these methods results in superior performance of the model in handling uncertainty and local variations.

NeurIPS Conference 2025 Conference Paper

Risk-aware Direct Preference Optimization under Nested Risk Measure

  • Lijun Zhang
  • Lin Li
  • Yajie Qi
  • Huizhong Song
  • Yaodong Yang
  • Jun Wang
  • Wei Wei

When fine-tuning pre-trained Large Language Models (LLMs) to align with human values and intentions, maximizing the estimated reward can lead to superior performance, but it also introduces potential risks due to deviations from the reference model's intended behavior. Most existing methods typically introduce KL divergence to constrain deviations between the trained model and the reference model; however, this may not be sufficient in certain applications that require tight risk control. In this paper, we introduce Risk-aware Direct Preference Optimization (Ra-DPO), a novel approach that incorporates risk-awareness by employing a class of nested risk measures. This approach formulates a constrained risk-aware advantage function maximization problem and then converts the Bradley-Terry model into a token-level representation. The objective function maximizes the likelihood of the policy while suppressing the deviation between a trained model and the reference model using a sequential risk ratio, thereby enhancing the model's risk-awareness. Experimental results across three open-source datasets: IMDb Dataset, Anthropic HH Dataset, and AlpacaEval, demonstrate the proposed method's superior performance in balancing alignment performance and model drift.

IJCAI Conference 2025 Conference Paper

Subgraph Information Bottleneck with Causal Dependency for Stable Molecular Relational Learning

  • Peiliang Zhang
  • Jingling Yuan
  • Chao Che
  • Yongjun Zhu
  • Lin Li

Molecular Relational Learning (MRL) is widely applied in molecular sciences. Recent studies attempt to retain molecular core information (e.g., substructures) by Graph Information Bottleneck but primarily focus on information compression without considering the causal dependencies of chemical reactions among substructures. This oversight neglects the core factors that determine molecular relationships, making maintaining stable MRL in distribution-shifted data challenging. To bridge this gap, we propose the Causal Subgraph Information Bottleneck (CausalGIB) for stable MRL. CausalGIB leverages causal dependency to guide substructure representation and integrates subgraph information bottleneck to optimize the core substructure representation, generating stable representations. Specifically, we distinguish causal and confounding substructures by noise injection and substructure interaction based on causal analysis. Furthermore, by minimizing the discrepancy between causal and confounding information within subgraph information bottleneck, CausalGIB captures core substructures composed of causal substructures and aggregates them into molecular representations to improve their stability. Experimental results on nine datasets demonstrate that CausalGIB outperforms state-of-the-art models in two tasks and significantly enhances the model’s stability in distribution-shifted data.

AAAI Conference 2025 Conference Paper

Zero-Shot Learning for Materials Science Texts: Leveraging Duck Typing Principles

  • Xin Zhang
  • Peiliang Zhang
  • Jingling Yuan
  • Lin Li

Materials science text mining (MSTM), involving tasks like property extraction and synthesis action retrieval, is pivotal for advancing research by deriving critical insights from scientific literature. Descriptors, serving as essential task labels, often vary in meaning depending on researchers' usage purposes across different mining tasks (e.g., 'Material' can refer to both synthesis components and participants in a fuel cell experiment). This difference in meaning makes it difficult for existing methods, fine-tuned to a specific task, to handle the same descriptors in other tasks. To overcome the above limitation, we propose MatDuck, a simple and effective approach for Zero-Shot MSTM by evoking material knowledge within Large Language Models (LLMs). Specifically, inspired by the Duck Typing principles in programming languages, we present a ClassDefinition-Style Descriptor generation method that evokes task-specific characteristics to address usage variation. Subsequently, we introduce code-style in-context learning for zero-shot tasks, reframing them into code to leverage LLMs' proficiency in code understanding. Extensive experiments on eight benchmark datasets demonstrate that MatDuck, as a plug-and-play approach, significantly improves the Zero-Shot MSTM performance of LLMs by an average of 11.3% across seven tasks.

ICRA Conference 2024 Conference Paper

Active Inference for Reactive Temporal Logic Motion Planning

  • Ziyang Chen
  • Zhangli Zhou
  • Lin Li
  • Zhen Kan

Reactive planning enables robots to deal with dynamic events in uncertain environments. However, existing methods rely heavily on predefined hard-coded robot behaviors, e.g., a pre-coded temporal logic formula that specifies how the robot should react. Little attention has been paid to the autonomous generation of reactive task specifications at runtime. As a first attempt towards this goal, this work develops a real-time decision-making and motion planning framework. It allows the robot to follow a global task planned offline while taking proactive decisions and generating temporal logic specifications for local reactive tasks when encountering dynamic events. Specifically, inspired by the causal knowledge graph, a proposition graph is developed, based on which the decision module encodes the environment and the task as Boolean logic and linear temporal logic (LTL), respectively. Based on the established proposition graph and perceived environment, the agent can autonomously generate an LTL formula to realize the local temporary task. A joint sampling algorithm is then developed, in which the automaton states of the local and global tasks are jointly considered to generate a feasible plan that satisfies both global and local tasks. Experiments demonstrate the effectiveness of the proposed decision-making and motion planning.

ICRA Conference 2024 Conference Paper

An Integrated Position-velocity-force Method for Safety-enhanced Shared Control in Robot-assisted Surgical Cutting

  • Xilin Xiao
  • Xiaojian Li
  • Yudong Shi
  • Jin Fang
  • Lin Li
  • Pengfei He
  • Hangjie Mo

Numerous studies have emphasized the application of autonomous intelligence in human-robot shared control to enhance surgical convenience and efficiency. However, the neglect of human dominance may reduce surgical safety. This paper develops a safety-enhanced human-robot shared control method by intelligently allocating control authority, with the surgeon remaining the leader during the surgical procedure. Three controllers are designed initially, including a master hand position (MP) controller and a master hand velocity (MV) controller related to the surgeon's manipulation, and a planned trajectory tracking (PT) controller related to the robot. In precision surgical manipulation scenarios, precise tracking of the human's operation is achieved by combining the MP and MV controllers, while a combination of the MV and PT controllers is developed for high-efficiency surgical scenarios, which relaxes the requirement for precise tracking of hand position and enables precise robot assistance guided by the velocity of the human hand. Autonomous switching between scenarios and controllers is accomplished through a motion fusion mechanism, which is achieved by optimizing evaluation functions that rely on future states. Furthermore, a force feedback mechanism is proposed to help the human understand the intent of autonomous control and thereby improve safety. The feasibility and effectiveness of this method have been validated through simulations and experiments.

TIST Journal 2024 Journal Article

Boosting Healthiness Exposure in Category-Constrained Meal Recommendation Using Nutritional Standards

  • Ming Li
  • Lin Li
  • Xiaohui Tao
  • Zhongwei Xie
  • Qing Xie
  • Jingling Yuan

Food computing, a newly emerging topic, is closely linked to human life through computational methodologies. Meal recommendation, a food-related study about human health, aims to provide users a meal with courses constrained from specific categories (e.g., appetizers, main dishes) that can be enjoyed as a service. Historical interaction data, important user information, is often used by existing models to learn user preferences. However, if a user’s preferences favor less healthy meals, the model will follow that preference and make similar recommendations, potentially negatively impacting the user’s long-term health. This emphasizes the necessity for health-oriented and responsible meal recommendation systems. In this article, we propose a healthiness-aware and category-wise meal recommendation model called CateRec, which boosts healthiness exposure by using nutritional standards as knowledge to guide the model training. Two fundamental questions are raised and answered: (1) How can the healthiness of meals be evaluated? Two well-known nutritional standards from the World Health Organization and the United Kingdom Food Standards Agency are used to calculate the healthiness score of the meal. (2) How can the model training be guided in a health-oriented manner? We construct category-wise personalization partial rankings and category-wise healthiness partial rankings, and theoretically analyze that they meet the necessary properties and assumptions required to be trained by the maximum posterior estimator under Bayesian probability. The data analysis confirms the existence of user preferences leaning towards less healthy meals in two public datasets. A comprehensive experiment demonstrates that our CateRec effectively boosts healthiness exposure in terms of mean healthiness score and ranking exposure while being comparable to the state-of-the-art model in terms of recommendation accuracy.

TCS Journal 2024 Journal Article

Constructions of 2-resilient rotation symmetric Boolean functions with odd number of variables

  • Jiao Du
  • Lin Li
  • Shaojing Fu
  • Longjiang Qu
  • Chao Li

In this paper, a new method for constructing 2-resilient rotation symmetric Boolean functions with an odd number of variables is presented. Based on an equivalent characterization of this class of functions and the relation between the orbit matrices and their complements, a system of equations about the 2-tuples distribution matrix is established. The construction of some 2-resilient rotation symmetric Boolean functions with an odd number of variables can then be converted into solving this system of equations. We also give a sufficient condition for the constructed functions to have the maximum algebraic degree. Moreover, we provide a lower bound on the number of nonlinear 2-resilient rotation symmetric Boolean functions with an odd number of variables. In particular, the lower bound is 152 for seven variables.

NeurIPS Conference 2024 Conference Paper

MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers

  • Ning Ding
  • Yehui Tang
  • Haochen Qin
  • Zhenli Zhou
  • Chao Xu
  • Lin Li
  • Kai Han
  • Heng Liao

In order to reduce the computational complexity of large language models, great efforts have been made to improve the efficiency of transformer models, such as linear attention and flash-attention. However, the model size and corresponding computational complexity are constantly scaled up in pursuit of higher performance. In this work, we present MemoryFormer, a novel transformer architecture which significantly reduces the computational complexity (FLOPs) from a new perspective. We eliminate nearly all the computations of the transformer model except for the necessary computation required by the multi-head attention operation. This is made possible by utilizing an alternative method for feature transformation to replace the linear projection of fully-connected layers. Specifically, we first construct a group of in-memory lookup tables that store a large number of discrete vectors to replace the weight matrix used in linear projection. We then use a hash algorithm to retrieve a correlated subset of vectors dynamically based on the input embedding. The retrieved vectors, combined together, form the output embedding, which provides an estimate of the result of the matrix multiplication operation in a fully-connected layer. Compared to conducting matrix multiplication, retrieving data blocks from memory is a much cheaper operation which requires little computation. We train MemoryFormer from scratch and conduct extensive experiments on various benchmarks to demonstrate the effectiveness of the proposed model.

AAAI Conference 2024 Conference Paper

MM-TTS: Multi-Modal Prompt Based Style Transfer for Expressive Text-to-Speech Synthesis

  • Wenhao Guan
  • Yishuang Li
  • Tao Li
  • Hukai Huang
  • Feng Wang
  • Jiayan Lin
  • Lingyan Huang
  • Lin Li

The style transfer task in Text-to-Speech (TTS) refers to the process of transferring style information into text content to generate corresponding speech with a specific style. However, most existing style transfer approaches are either based on fixed emotional labels or reference speech clips, which cannot achieve flexible style transfer. Recently, some methods have adopted text descriptions to guide style transfer. In this paper, we propose a more flexible multi-modal and style controllable TTS framework named MM-TTS. It can utilize any modality as the prompt in a unified multi-modal prompt space, including reference speech, emotional facial images, and text descriptions, to control the style of the generated speech in a single system. The challenges of modeling such a multi-modal style controllable TTS mainly lie in two aspects: 1) aligning the multi-modal information into a unified style space to enable the input of an arbitrary modality as the style prompt in a single system, and 2) efficiently transferring the unified style representation into the given text content, thereby enabling the generation of speech in the style of the prompt. To address these problems, we propose an aligned multi-modal prompt encoder that embeds different modalities into a unified style space, supporting style transfer for different modalities. Additionally, we present a new adaptive style transfer method named Style Adaptive Convolutions (SAConv) to achieve a better style representation. Furthermore, we design a Rectified Flow based Refiner to solve the problem of over-smoothed Mel-spectrograms and generate audio of higher fidelity. Since there is no public dataset for multi-modal TTS, we construct a dataset named MEAD-TTS, which is related to the field of expressive talking heads. Our experiments on the MEAD-TTS dataset and out-of-domain datasets demonstrate that MM-TTS can achieve satisfactory results based on multi-modal prompts.
The audio samples and constructed dataset are available at https://multimodal-tts.github.io.

TIST Journal 2024 Journal Article

Optimal Treatment Strategies for Critical Patients with Deep Reinforcement Learning

  • Simi Job
  • Xiaohui Tao
  • Lin Li
  • Haoran Xie
  • Taotao Cai
  • Jianming Yong
  • Qing Li

Personalized clinical decision support systems are increasingly being adopted due to the emergence of data-driven technologies, with this approach now gaining recognition in critical care. The task of incorporating diverse patient conditions and treatment procedures into critical care decision-making can be challenging due to the heterogeneous nature of medical data. Advances in Artificial Intelligence (AI), particularly Reinforcement Learning (RL) techniques, enable the development of personalized treatment strategies for severe illnesses by using a learning agent to recommend optimal policies. In this study, we propose a Deep Reinforcement Learning (DRL) model with a tailored reward function and an LSTM-GRU-derived state representation to formulate optimal treatment policies for vasopressor administration in stabilizing patient physiological states in critical care settings. Using an ICU dataset and the Medical Information Mart for Intensive Care (MIMIC-III) dataset, we focus on patients with Acute Respiratory Distress Syndrome (ARDS) that has led to Sepsis, to derive optimal policies that can prioritize patient recovery over patient survival. Both the DDQN (RepDRL-DDQN) and Dueling DDQN (RepDRL-DDDQN) versions of the DRL model surpass the baseline performance, with the proposed model’s learning agent achieving an optimal learning process across our performance measuring schemes. The robust state representation served as the foundation for enhancing the model’s performance, ultimately providing an optimal treatment policy focused on rapid patient recovery.

JBHI Journal 2024 Journal Article

Robust Epileptic Seizure Detection Based on Biomedical Signals Using an Advanced Multi-View Deep Feature Learning Approach

  • Ijaz Ahmad
  • Zhenzhen Liu
  • Lin Li
  • Inam Ullah
  • Sunday Timothy Aboyeji
  • Xin Wang
  • Oluwarotimi Williams Samuel
  • Guanglin Li

Epilepsy is a neurological disorder characterized by abnormal neuronal discharges that manifest in life-threatening seizures. These are often monitored via EEG signals, a key aspect of biomedical signal processing (BSP). Accurate epileptic seizure (ES) detection significantly depends on the precise identification of key EEG features, which requires a deep understanding of the data's intrinsic domain. Therefore, this study presents an Advanced Multi-View Deep Feature Learning (AMV-DFL) framework based on machine learning (ML) technology to enhance the detection of relevant EEG signal features for ES. Our method initially applies a fast Fourier transform (FFT) on EEG data for traditional frequency domain feature (TFD-F) extraction and directly incorporates time domain (TD) features from the raw EEG signals, establishing a comprehensive traditional multi-view feature (TMV-F). Deep features are subsequently extracted autonomously from optimal layers of one-dimensional convolutional neural networks (1D CNN), resulting in multi-view deep features (MV-DF) integrating both time and frequency domains. A multi-view forest (MV-F) is an interpretable rule-based advanced ML classifier used to construct a robust, generalized classification. Tree-based SHAP explainable artificial intelligence (T-XAI) is incorporated for interpreting and explaining the underlying rules. Experimental results confirm our method's superiority, surpassing models using TMV-FL and single-view deep features (SV-DF) by 4% and outperforming other state-of-the-art methods by an average of 3% in classification accuracy. The AMV-DFL approach aids clinicians in identifying EEG features indicative of ES, potentially discovering novel biomarkers, and improving diagnostic capabilities in epilepsy management.
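
The first step described in the abstract above, combining time-domain and frequency-domain views of a signal into one feature vector, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the three time-domain statistics are hypothetical, and a naive discrete Fourier transform stands in for the FFT (it produces the same spectrum, only more slowly). It is not the authors' AMV-DFL implementation.

```python
import cmath


def dft_magnitudes(signal):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(n^2) DFT)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def time_domain_features(signal):
    """Hypothetical time-domain view: mean, variance, and peak-to-peak range."""
    n = len(signal)
    mean = sum(signal) / n
    variance = sum((x - mean) ** 2 for x in signal) / n
    return [mean, variance, max(signal) - min(signal)]


def multi_view_features(signal):
    """Concatenate the time-domain and frequency-domain views into one vector."""
    return time_domain_features(signal) + dft_magnitudes(signal)


# Toy 8-sample segment: a pure oscillation completing two cycles in the window.
segment = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
features = multi_view_features(segment)
```

For this toy segment, the spectral energy concentrates in the bin for two cycles per window, while the time-domain entries summarize amplitude statistics; a downstream classifier would consume the concatenated vector.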

NeurIPS Conference 2024 Conference Paper

Scalable Constrained Policy Optimization for Safe Multi-agent Reinforcement Learning

  • Lijun Zhang
  • Lin Li
  • Wei Wei
  • Huizhong Song
  • Yaodong Yang
  • Jiye Liang

A challenging problem in seeking to bring multi-agent reinforcement learning (MARL) techniques into real-world applications, such as autonomous driving and drone swarms, is how to control multiple agents safely and cooperatively to accomplish tasks. Most existing safe MARL methods learn the centralized value function by introducing a global state to guide safety cooperation. However, the global coupling arising from agents’ safety constraints and the exponential growth of the state-action space size limit their applicability in instant communication or computing resource-constrained systems and larger multi-agent systems. In this paper, we develop a novel scalable and theoretically-justified multi-agent constrained policy optimization method. This method utilizes the rigorous bounds of the trust region method and the bounds of the truncated advantage function to provide a new local policy optimization objective for each agent. Also, we prove that the safety constraints and the joint policy improvement can be met when each agent adopts a sequential update scheme to optimize a $\kappa$-hop policy. Then, we propose a practical algorithm called Scalable MAPPO-Lagrangian (Scal-MAPPO-L). The proposed method’s effectiveness is verified on a collection of benchmark tasks, and the results support our theory that decentralized training with local interactions can still improve reward performance and satisfy safe constraints.

IROS Conference 2024 Conference Paper

Soft Task Planning with Hierarchical Temporal Logic Specifications

  • Ziyang Chen
  • Zhangli Zhou
  • Lin Li
  • Zhen Kan

This work exploits soft constraints in linear temporal logic task planning to enhance the agent’s capability in handling potentially conflicting or even infeasible tasks. Different from most existing works that focus on sticking to the original plan and trying to find a relaxed plan if the workspace does not permit, we augment the soft constraints to represent possible candidate sub-tasks that can be selected to fulfill the global task. Specifically, a hierarchical temporal logic specification is developed to represent LTL tasks with soft constraints and preferences. The hierarchical structure consists of an outer and an inner layer, where the outer layer uses co-safe LTL to specify the task-level specifications and the inner layer specifies the low-level task-related atomic propositions via soft constraints. To cope with the hierarchical temporal logic specification, a hierarchical iterative search (HIS) algorithm is developed, which incrementally searches feasible atomic propositions and automaton states, and returns a task plan with minimum cost. Rigorous analysis shows that HIS-based planning is feasible (i.e., the generated plan is applicable and satisfactory with respect to the task specification) and optimal (i.e., with minimum cost). Extensive simulation demonstrates the effectiveness of the proposed soft task planning approach.

JBHI Journal 2023 Journal Article

Large AI Models in Health Informatics: Applications, Challenges, and the Future

  • Jianing Qiu
  • Lin Li
  • Jiankai Sun
  • Jiachuan Peng
  • Peilun Shi
  • Ruiyang Zhang
  • Yinzhao Dong
  • Kyle Lam

Large AI models, or foundation models, are models recently emerging with massive scales both parameter-wise and data-wise, the magnitudes of which can reach beyond billions. Once pretrained, large AI models demonstrate impressive performance in various downstream tasks. A prime example is ChatGPT, whose capability has compelled people's imagination about the far-reaching influence that large AI models can have and their potential to transform different domains of our lives. In health informatics, the advent of large AI models has brought new paradigms for the design of methodologies. The scale of multi-modal data in the biomedical and health domain has been ever-expanding especially since the community embraced the era of deep learning, which provides the ground to develop, validate, and advance large AI models for breakthroughs in health-related areas. This article presents a comprehensive review of large AI models, from background to their applications. We identify seven key sectors in which large AI models are applicable and might have substantial influence, including: 1) bioinformatics; 2) medical diagnosis; 3) medical imaging; 4) medical informatics; 5) medical education; 6) public health; and 7) medical robotics. We examine their challenges, followed by a critical discussion about potential future directions and pitfalls of large AI models in transforming the field of health informatics.

AAAI Conference 2023 Short Paper

Long Legal Article Question Answering via Cascaded Key Segment Learning (Student Abstract)

  • Shugui Xie
  • Lin Li
  • Jingling Yuan
  • Qing Xie
  • Xiaohui Tao

Current sentence-level evidence extraction based methods may lose the discourse coherence of legal articles since they tend to make the extracted sentences scattered over the article. To solve the problem, this paper proposes a Cascaded Answer-guided key segment learning framework for long Legal article Question Answering, namely CALQA. The framework consists of three cascaded modules: Sifter, Reader, and Responder. The Sifter splits a long legal article into several segments and works in an answer-guided way by automatically sifting out key fact segments in a coarse-to-fine approach through multiple iterations. The Reader utilizes a set of attention mechanisms to obtain semantic representations of the question and key fact segments. Finally, treating the problem as a multi-label classification task, the Responder predicts final answers in a cascaded manner. CALQA outperforms state-of-the-art methods on the CAIL 2021 Law dataset.

AAAI Conference 2023 Short Paper

MGIA: Mutual Gradient Inversion Attack in Multi-Modal Federated Learning (Student Abstract)

  • Xuan Liu
  • Siqi Cai
  • Lin Li
  • Rui Zhang
  • Song Guo

Recent studies have demonstrated that local training data in Federated Learning can be recovered from gradients through what are called gradient inversion attacks. These attacks display powerful effects on either computer vision or natural language processing tasks. Because there are known correlations between multi-modality data, we argue that such attacks, when combined with Multi-modal Learning, may cause more severe effects. Different modalities may communicate through gradients to provide richer information for the attackers, thus improving the strength and efficiency of the gradient inversion attacks. In this paper, we propose the Mutual Gradient Inversion Attack (MGIA), which utilizes the shared labels between image and text modalities combined with the idea of knowledge distillation. Our experimental results show that MGIA achieves the best recovery quality for both modality data and labels in comparison with other methods. Meanwhile, MGIA verifies that multi-modality gradient inversion attacks are more likely to disclose private information than the existing single-modality attacks.

EAAI Journal 2023 Journal Article

Soil seismic response modeling of KiK-net downhole array sites with CNN and LSTM networks

  • Lin Li
  • Feng Jin
  • Duruo Huang
  • Gang Wang

Accurate prediction of soil seismic response is necessary for geotechnical engineering. Conventional physics-based models such as the finite element method (FEM) usually fail to obtain accurate predictions due to model assumptions and parameter uncertainties, and they are also computationally expensive. This study proposes deep learning models to develop data-driven surrogate models for the prediction of soil seismic response based on the recorded ground motions from KiK-net downhole array sites. Two kinds of advanced neural networks, the convolutional neural network (CNN) and the long short-term memory (LSTM) neural network, are applied in this framework respectively. These models do not rely on any prior knowledge about the soil site. The performance of the deep learning models is demonstrated through both numerical and recorded examples. Compared with state-of-the-art FEM models, the proposed models achieve better prediction performance with higher efficiency. The average prediction error is reduced by more than 40% in the time domain and 30% in the frequency domain. Even though great variability exists in seismic wave propagation in reality, the models can still produce satisfactory predictions.

EAAI Journal 2023 Journal Article

Vibration suppression of ball-screw drive system based on flexible dynamics model

  • Lin Li
  • Qiangwei Zhang
  • Tie Zhang
  • Yanbiao Zou

Aiming at the problem of residual vibration of the ball-screw drive system when it stops in high-speed motion, a vibration suppression method based on the flexible dynamics model is proposed. A simplified flexible dynamics model of the ball-screw system is developed using the Lagrange method and rewritten as a parametric identification equation containing only the motor’s rotation angle. A Particle Swarm Optimization algorithm based on Recursive Least Square finite search space (RLS-PSO) is proposed for dynamic parameter identification and the results are used to design a coupled ZVD shaper to suppress residual vibration in the ball-screw drive system. The experimental results of model identification show that RLS-PSO is more accurate than WLS, PSO and GA, and the convergence speed is much higher compared to PSO and GA. The simplified dynamics model can reflect the dynamic characteristics of the system accurately. The results of the vibration experiments demonstrate the effectiveness of the input shaper designed using the identification results in suppressing residual vibration of the ball-screw drive system.
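
As a rough illustration of the input-shaping idea mentioned in this abstract, the sketch below computes the impulse times and amplitudes of a textbook single-mode ZVD (zero vibration and derivative) shaper from a natural frequency and damping ratio. It is not the paper's coupled ZVD shaper or its RLS-PSO identification; the function name `zvd_shaper` and the example values are hypothetical, and the identified parameters are assumed to be given.

```python
import math

def zvd_shaper(natural_freq_hz, damping_ratio):
    """Return [(time_s, amplitude), ...] for a standard single-mode ZVD shaper.

    Assumes an underdamped second-order mode (0 <= damping_ratio < 1).
    """
    if not 0.0 <= damping_ratio < 1.0:
        raise ValueError("damping_ratio must be in [0, 1)")
    wn = 2.0 * math.pi * natural_freq_hz             # natural frequency, rad/s
    root = math.sqrt(1.0 - damping_ratio ** 2)
    damped_period = 2.0 * math.pi / (wn * root)      # period of damped vibration
    k = math.exp(-damping_ratio * math.pi / root)    # decay over half a period
    denom = (1.0 + k) ** 2                           # normalizes amplitudes to sum to 1
    amplitudes = [1.0 / denom, 2.0 * k / denom, k * k / denom]
    times = [0.0, damped_period / 2.0, damped_period]
    return list(zip(times, amplitudes))

# Hypothetical mode: 10 Hz, undamped.
impulses = zvd_shaper(10.0, 0.0)
```

Convolving a motion command with these three impulses gives the shaped command; the three-impulse form trades a delay of one damped period for reduced sensitivity to frequency error.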

NeurIPS Conference 2023 Conference Paper

Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models

  • Lin Li
  • Jun Xiao
  • Guikun Chen
  • Jian Shao
  • Yueting Zhuang
  • Long Chen

Pretrained vision-language models, such as CLIP, have demonstrated strong generalization capabilities, making them promising tools in the realm of zero-shot visual recognition. Visual relation detection (VRD) is a typical task that identifies relationship (or interaction) types between object pairs within an image. However, naively utilizing CLIP with prevalent class-based prompts for zero-shot VRD has several weaknesses, e.g., it struggles to distinguish between different fine-grained relation types and it neglects essential spatial information of two objects. To this end, we propose a novel method for zero-shot VRD: RECODE, which solves RElation detection via COmposite DEscription prompts. Specifically, RECODE first decomposes each predicate category into subject, object, and spatial components. Then, it leverages large language models (LLMs) to generate description-based prompts (or visual cues) for each component. Different visual cues enhance the discriminability of similar relation categories from different perspectives, which significantly boosts performance in VRD. To dynamically fuse different cues, we further introduce a chain-of-thought method that prompts LLMs to generate reasonable weights for different visual cues. Extensive experiments on four VRD benchmarks have demonstrated the effectiveness and interpretability of RECODE.

AAAI Conference 2022 Conference Paper

Controlling Underestimation Bias in Reinforcement Learning via Quasi-median Operation

  • Wei Wei
  • Yujia Zhang
  • Jiye Liang
  • Lin Li
  • Yuze Li

How to get a good value estimation is one of the key problems in reinforcement learning (RL). Current off-policy methods, such as Maxmin Q-learning, TD3, and TADD, suffer from the underestimation problem when solving the overestimation problem. In this paper, we propose the Quasi-Median Operation, a novel way to mitigate the underestimation bias by selecting the quasi-median from multiple state-action values. Based on the quasi-median operation, we propose Quasi-Median Q-learning (QMQ) for the discrete action tasks and Quasi-Median Delayed Deep Deterministic Policy Gradient (QMD3) for the continuous action tasks. Theoretically, the underestimation bias of our method is improved while the estimation variance is significantly reduced compared to Maxmin Q-learning, TD3, and TADD. We conduct extensive experiments on the discrete and continuous action tasks, and results show that our method outperforms the state-of-the-art methods.

IJCAI Conference 2022 Conference Paper

Towards the Quantitative Interpretability Analysis of Citizens Happiness Prediction

  • Lin Li
  • Xiaohua Wu
  • Miao Kong
  • Dong Zhou
  • Xiaohui Tao

Evaluating the high-effect factors of citizens' happiness is beneficial to a wide range of policy-making for economics and politics in most countries. Benefiting from the high efficiency of regression models, previous efforts by sociology scholars have analyzed the effect of happiness factors with high interpretability. However, restricted by their research concerns, they are specifically interested in some subset of factors modeled as linear functions. Recently, deep learning has shown promising prediction accuracy while facing challenges in interpretability. To this end, we introduce the Shapley value, which is grounded in solid theory for interpreting factor contributions, to work with deep learning models by taking into account interactions between multiple factors. The proposed solution computes the Shapley value of a factor, i.e., its average contribution to the prediction in different coalitions based on coalitional game theory. To evaluate the interpretability quality of our solution, experiments are conducted on a Chinese General Social Survey (CGSS) questionnaire dataset. Through systematic reviews, the experimental results of the Shapley value are highly consistent with academic studies in social science, which implies our solution for citizens' happiness prediction has two-fold implications, theoretically and practically.
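
The Shapley value described in this abstract, a factor's average marginal contribution over coalitions of the other factors, can be sketched with the exact textbook formula. This is a minimal illustration, not the paper's implementation: `shapley_values` and the toy scoring function `toy_value` are hypothetical, and exact enumeration is only practical for a handful of factors.

```python
from itertools import combinations
from math import factorial

def shapley_values(n_factors, value):
    """Exact Shapley values; `value` maps a frozenset of factor indices to a score."""
    players = range(n_factors)
    phi = [0.0] * n_factors
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Probability weight of coalitions of this size in the Shapley average.
            weight = (factorial(size) * factorial(n_factors - size - 1)
                      / factorial(n_factors))
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of factor i when it joins coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_value(coalition):
    """Hypothetical score: factors 0 and 1 contribute alone and also interact."""
    score = 0.0
    if 0 in coalition:
        score += 2.0
    if 1 in coalition:
        score += 1.0
    if 0 in coalition and 1 in coalition:
        score += 1.0  # interaction term, split equally by the Shapley value
    return score

phi = shapley_values(3, toy_value)
```

In this toy case the interaction bonus is shared equally between factors 0 and 1, factor 2 receives zero, and the values sum to the score of the full coalition.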

NeurIPS Conference 2021 Conference Paper

Adder Attention for Vision Transformer

  • Han Shu
  • Jiahao Wang
  • Hanting Chen
  • Lin Li
  • Yujiu Yang
  • Yunhe Wang

Transformer is a new kind of calculation paradigm for deep learning which has shown strong performance on a large variety of computer vision tasks. However, compared with conventional deep models (e.g., convolutional neural networks), vision transformers require more computational resources and thus cannot be easily deployed on mobile devices. To this end, we propose reducing energy consumption using the adder neural network (AdderNet). We first theoretically analyze the mechanism of self-attention and the difficulty of applying the adder operation to this module. Specifically, the feature diversity, i.e., the rank of the attention map, cannot be well preserved when using only additions. Thus, we develop an adder attention layer that includes an additional identity mapping. With the new operation, vision transformers constructed using additions can also provide powerful feature representations. Experimental results on several benchmarks demonstrate that the proposed approach can achieve performance highly competitive with that of the baselines while achieving a roughly 2~3× reduction in energy consumption.

TIST Journal 2020 Journal Article

CoFi-points

  • Lin Li
  • Weike Pan
  • Zhong Ming

With the explosive growth of web resources, an increasingly important task in recommender systems is to provide high-quality personalized services by learning users’ preferences from historically observed information. As an effective preference learning technology, collaborative filtering has been widely extended to model one-class or implicit feedback data, which is known as one-class collaborative filtering (OCCF). For a long time, the pairwise ranking-oriented learning scheme has been viewed as superior to the pointwise scheme for OCCF due to its higher accuracy in most cases. However, we argue that with appropriate model design, pointwise preference learning can achieve comparable or even better performance than its counterpart, i.e., pairwise preference learning. In particular, we propose a new preference assumption, i.e., pointwise preference on user/item-set. Based on this new assumption, we develop a novel, simple, and flexible solution called collaborative filtering via pointwise preference learning on user/item-set (CoFi-points). Furthermore, we derive two specific algorithms of CoFi-points with respect to the involved user-set and item-set, i.e., CoFi-points(u) and CoFi-points(i), referring to preference assumptions defined on user-set and item-set, respectively. Finally, we conduct extensive empirical studies on four real-world datasets against state-of-the-art methods, and find that our solution can achieve very promising performance with respect to several ranking-oriented evaluation metrics.

AAAI Conference 2020 Short Paper

Selecting Portfolios Directly Using Recurrent Reinforcement Learning (Student Abstract)

  • Lin Li

Portfolio selection has attracted increasing attention in the machine learning and AI communities recently. Existing portfolio selection using recurrent reinforcement learning (RRL) relies heavily on single-asset trading systems to heuristically obtain the portfolio weights. In this paper, we propose a novel method, direct portfolio selection using recurrent reinforcement learning (DPS-RRL), to select portfolios directly. Instead of trading single assets one by one to obtain portfolio weights, our method learns to quantify the asset allocation weights directly by optimizing the Sharpe ratio of financial portfolios. We empirically demonstrate the effectiveness of our method, which is able to outperform state-of-the-art portfolio selection methods.
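
The objective named in this abstract, the Sharpe ratio of a weighted portfolio, can be illustrated with a short sketch. This only evaluates the ratio for fixed weights; it does not reproduce the DPS-RRL training procedure. The function names, the zero risk-free rate default, and the use of the population standard deviation are assumptions made for illustration.

```python
import math

def portfolio_returns(weights, asset_returns):
    """Per-period portfolio return: weighted sum of each period's asset returns."""
    return [sum(w * r for w, r in zip(weights, period)) for period in asset_returns]

def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its (population) standard deviation."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    variance = sum((x - mean) ** 2 for x in excess) / len(excess)
    std = math.sqrt(variance)
    # Convention chosen here: a constant return series has ratio 0.0.
    return mean / std if std > 0 else 0.0

# Hypothetical returns for two assets over three periods.
history = [[0.01, 0.03], [0.02, -0.01], [0.03, 0.02]]
equal_weight = sharpe_ratio(portfolio_returns([0.5, 0.5], history))
first_only = sharpe_ratio(portfolio_returns([1.0, 0.0], history))
```

A learner that selects portfolios directly would adjust the weights to increase this quantity rather than deriving them from separate single-asset trading signals.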

AAAI Conference 2018 Short Paper

Enhancing RNN Based OCR by Transductive Transfer Learning From Text to Images

  • Yang He
  • Jingling Yuan
  • Lin Li

This paper presents a novel approach to optical character recognition (OCR) that uses text to accelerate training and avoid underfitting. Previously proposed OCR models typically take much time in the training phase and require a large amount of labelled data to avoid underfitting. In contrast, our method does not require such conditions. This is a challenging task related to transferring the character sequential relationship from text to OCR. We build a model based on transductive transfer learning to achieve domain adaptation from text to image. We thoroughly evaluate our approach on different datasets, including a general one and a relatively small one. We also compare the performance of our model with the general OCR model under different circumstances. We show that (1) our approach accelerates the training phase by 20-30% in time cost; and (2) our approach can avoid underfitting when the model is trained on a small dataset.

AAAI Conference 2014 Conference Paper

Identifying Domain-Dependent Influential Microblog Users: A Post-Feature Based Approach

  • Nian Liu
  • Lin Li
  • Guandong Xu
  • Zhenglu Yang

Users of a social network like to follow the posts published by influential users. Such posts are usually delivered quickly and thus produce a strong influence on public opinion. In this paper, we focus on the problem of identifying domain-dependent influential users (or topic experts). Some traditional approaches identify influential users based on the content of users' posts, which may be biased by spammers who try to make posts related to some topics through simple copy and paste. Others make use of user authentication information given by a service platform, or of user self-description (introduction or label), in finding influential users. However, what users have published is not necessarily related to what they have registered and described. In addition, if there are no comments from other users, it is less objective to assess a user's post quality. To improve the effectiveness of recognizing influential users in a microblog topic, we propose a post-feature based approach which is supplementary to post-content based approaches. Our experimental results show that the post-feature based approach produces relatively higher precision than the content based approach.

YNIMG Journal 2014 Journal Article

Interleaved imaging of cerebral hemodynamics and blood flow index to monitor ischemic stroke and treatment in rat by volumetric diffuse optical tomography

  • Zi-Jing Lin
  • Ming Ren
  • Lin Li
  • Yueming Liu
  • Jianzhong Su
  • Shao-Hua Yang
  • Hanli Liu

Diffuse optical tomography (DOT) has been used by several groups to assess cerebral hemodynamics of cerebral ischemia in humans and animals. In this study, we combined DOT with an indocyanine green (ICG)-tracking method to achieve interleaved images of cerebral hemodynamics and blood flow index (BFI) using two middle cerebral artery occlusion (MCAO) rat models. To achieve volumetric images with high spatial resolution, we first integrated a depth compensation algorithm (DCA) with a volumetric mesh-based rat head model to generate three-dimensional (3D) DOT on a rat brain atlas. Then, the experimental DOT data from two rat models were collected using an interleaved strategy for cerebral hemodynamics and BFI during and after ischemic stroke, with and without a thrombolytic therapy for the embolic MCAO model. The acquired animal data were further analyzed using the integrated rat-atlas-guided DOT method to form time-evolving 3D images of both cerebral hemodynamics and BFI. In particular, we were able to show and identify therapeutic outcomes of a thrombolytic treatment applied to the embolism-induced ischemic model. This paper demonstrates that volumetric DOT is capable of providing high-quality, interleaved images of cerebral hemodynamics and blood perfusion in small animals during and after ischemic stroke, with excellent 3D visualization and quantifications.

YNIMG Journal 2012 Journal Article

Comparison of neural correlates of risk decision making between genders: An exploratory fNIRS study of the Balloon Analogue Risk Task (BART)

  • Mary Cazzell
  • Lin Li
  • Zi-Jing Lin
  • Sonal J. Patel
  • Hanli Liu

Functional magnetic resonance imaging (fMRI) research rarely reports gender differences in the neural correlates of risk decision making due to small sample sizes. In this functional near-infrared spectroscopy (fNIRS)-based imaging study of active and passive risk decision making, gender differences in oxygenated hemoglobin (HbO) concentration changes were investigated in the prefrontal cortex (PFC) of healthy adults. Forty adult participants (25–44 years; males=23) completed two sets of 15 balloon trials in active and passive decision making modes of the Balloon Analogue Risk Task (BART). In active mode, participants chose the number of balloon inflations, decided when to collect money, or risked accrued money if balloons exploded. BART is psychometrically well established and has predictive validity to real-world risk taking. The blocked experimental design and modification of BART for fNIRS were guided by a previous fMRI study that examined the neural correlates of risk decision making in young adults [Rao, H., Korczykowski, M., Pluta, J., Hoang, A., Detre, J. A., 2008. Neural correlates of voluntary and involuntary risk taking in the human brain: An fMRI study of the Balloon Analog Risk Task (BART). NeuroImage 42, 902–910]. Our findings were consistent with the previous fMRI study: no or little PFC activation during passive mode but strong PFC activation during active wins and losses among total sample. Active losses in females were associated with more significant bilateral activation in dorsal lateral prefrontal cortex (DLPFC) than males; no significant gender differences were found in DLPFC activation during active wins. Gender differences existed in direction and strength of correlations between BART behavioral and hemodynamic data. This study shows that use of fNIRS is a feasible, accessible, and less costly way to achieve adequate study power and investigate gender differences in neural correlates of risk decision making.

AAAI Conference 2012 Conference Paper

Recommending Related Microblogs: A Comparison Between Topic and WordNet based Approaches

  • Xing Chen
  • Lin Li
  • Guandong Xu
  • Zhenglu Yang
  • Masaru Kitsuregawa

Computing similarity between short microblogs is an important step in microblog recommendation. In this paper, we investigate a topic based approach and a WordNet based approach to estimate similarity scores between microblogs and recommend the top related ones to users. An empirical study is conducted to compare their recommendation effectiveness using two evaluation measures. The results show that the WordNet based approach has relatively higher precision than the topic based approach, using 548 tweets as the dataset. In addition, the Kendall tau distance between the two lists recommended by the WordNet and topic approaches is calculated. Its average over all 548 list pairs indicates that the two approaches disagree relatively strongly in their ranking of related tweets.
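
The Kendall tau distance mentioned in this abstract counts the item pairs that two rankings place in opposite order. The sketch below assumes both lists rank exactly the same items, which may differ from how the paper handles its recommended lists; the function names and tweet identifiers are hypothetical.

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
    """Number of item pairs ordered differently by the two rankings."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    if pos_a.keys() != pos_b.keys():
        raise ValueError("both rankings must contain the same items")
    discordant = 0
    for x, y in combinations(rank_a, 2):
        # A negative product means the pair is in opposite order in the two lists.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0:
            discordant += 1
    return discordant

def normalized_kendall_tau_distance(rank_a, rank_b):
    """Distance scaled to [0, 1]: 0 for identical order, 1 for reversed order."""
    n = len(rank_a)
    total_pairs = n * (n - 1) // 2
    return kendall_tau_distance(rank_a, rank_b) / total_pairs if total_pairs else 0.0

# Hypothetical top-4 lists produced by two recommenders for one query tweet.
wordnet_list = ["t1", "t2", "t3", "t4"]
topic_list = ["t1", "t3", "t2", "t4"]
distance = kendall_tau_distance(wordnet_list, topic_list)
```

Averaging the normalized distance over all query tweets gives a single disagreement score between the two approaches.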