Author name cluster

Kai Lu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

19 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

Decoupling Understanding from Reasoning via Problem Space Mapping for Small-Scale Model Reasoning

  • Li Wang
  • Changhao Zhang
  • Zengqi Xiu
  • Kai Lu
  • Xin Yu
  • Kui Zhang
  • Wenjun Wu

Despite recent advances in the reasoning capabilities of Large Language Models (LLMs), improving the reasoning ability of Small Language Models (SLMs, e.g., up to 1.5B parameters) remains challenging. A key obstacle lies in the complexity and variability of natural language: essentially equivalent problems appear in diverse surface forms, often obscured by redundant or distracting details. This imposes a dual burden on SLMs: they must first extract the core problem from complex linguistic input, and then perform reasoning based on that understanding. The resulting vast and noisy problem space hinders optimization, particularly for models with limited capacity. To address this, we propose a new framework that decouples understanding from reasoning by mapping natural language problems into a canonical problem space, a semantically simplified yet expressive domain. This enables SLMs to focus on reasoning over standardized inputs, free from linguistic variability. Within this framework, we introduce DURIT (Decoupled Understanding from Reasoning via Iterative Training), a three-step algorithm that iteratively: (1) maps natural language problems via reinforcement learning, (2) aligns reasoning trajectories through self-distillation, and (3) trains reasoning policies in the problem space. The mapper and reasoner are co-trained in an alternating loop throughout this process. Experiments show that DURIT substantially improves SLMs' performance on both in-domain and out-of-domain mathematical and logical reasoning tasks. Beyond improving reasoning capabilities, DURIT also improves the robustness of reasoning, validating decoupling understanding from reasoning as an effective strategy for strengthening SLMs.

AAAI Conference 2026 Conference Paper

DeloopSGNN: Revisiting Spectral GNNs Through the Lens of Spatial Aggregation

  • Duanyu Li
  • Huijun Wu
  • Min Xie
  • Kai Lu
  • Wenzhe Zhang
  • Zhenwei Wu
  • Yong Dong
  • Ruibo Wang

Graph Neural Networks (GNNs) have been studied from two primary perspectives: spectral, which employs global graph signal filtering and is theoretically more expressive, and spatial, which builds on local neighborhood aggregation and generalizes well across diverse graph structures. While spectral GNNs are expected to perform better in theory, they often underperform in practice compared to spatial models. To better understand this gap, we introduce a novel theoretical framework for converting spectral GNNs into the spatial domain, allowing for more intuitive analysis. This transformation reveals that signal looping and repeated high-order aggregation are major causes of over-smoothing in spectral GNNs. By addressing these issues in the spatial domain and converting the model back to the spectral domain, we propose DeloopSGNN, a spectral GNN with improved expressive capacity. Experiments on benchmark datasets show that DeloopSGNN achieves consistently strong performance in terms of accuracy and adversarial robustness, demonstrating that spectral GNNs can benefit significantly from careful architectural design grounded in our proposed framework.
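The spectral-to-spatial view in the abstract above can be illustrated with a minimal sketch. This is not the paper's framework: it only shows the standard fact that a polynomial filter, sum over k of theta[k] * A^k x, can be evaluated by repeated neighbor aggregation, and that A^2 has a non-zero diagonal, so a two-hop aggregation returns part of each node's signal to itself (one simple reading of "signal looping"). The 4-node path graph, the coefficients `theta`, and all function names are assumptions made for illustration.

```python
def matvec(A, x):
    """One round of neighbor aggregation: (A x)_i = sum_j A[i][j] * x[j]."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def matmul(A, B):
    """Dense product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def polynomial_filter(A, x, theta):
    """Evaluate y = sum_k theta[k] * A^k x by repeated aggregation."""
    out = [0.0] * len(x)
    h = list(x)  # h holds A^k x, starting with k = 0
    for coeff in theta:
        out = [o + coeff * v for o, v in zip(out, h)]
        h = matvec(A, h)  # move from A^k x to A^(k+1) x
    return out


# Hypothetical path graph 0-1-2-3 with an unnormalized adjacency matrix.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
x = [1.0, 0.0, 0.0, 0.0]   # unit signal placed on node 0
theta = [0.5, 0.3, 0.2]    # made-up filter coefficients

y = polynomial_filter(A, x, theta)

# Diagonal of A^2: weight with which a 2-hop aggregation loops back to its source.
A2 = matmul(A, A)
loop_weights = [A2[i][i] for i in range(len(A))]
```

For an unnormalized adjacency matrix, the diagonal of A^2 equals the node degrees, so every non-isolated node receives its own signal back after two hops.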

EAAI Journal 2026 Journal Article

Spatiotemporal interactive multiple self-attention network for skeleton-based action recognition

  • Xin Wang
  • Long Liu
  • Siying Ren
  • Kai Lu

Graph Convolutional Networks (GCN) have shown significant advantages in skeleton-based action recognition as an effective technique for extracting action representations. However, the inherently limited receptive field of graph convolution restricts the ability of GCN-based methods to capture long-range dependencies among distant joints. Additionally, these methods normally utilize a uniform skeleton topology that models only the physically connected joints for all frames. This neglects the dependencies among non-physically connected joints and the temporal variability of joint features. To address these issues, we propose a SpatioTemporal Interactive Multiple Self-Attention (STI-MSA) network. The spatiotemporal interaction (STI) module first disentangles action features into spatiotemporal, spatial, and temporal sub-representations through convolution and multiple self-attention (MSA). Then, it performs sufficient cross-dimensional interactions to learn comprehensive and effective local and global spatiotemporal dependencies. We introduce a complementary integration of global distance encoding and the adjacency matrix as a unified prior for all representations. This enables the network to adaptively focus on the relationships between joints at varying distances, including physically and non-physically connected joints. The MSA constructs hierarchical topologies based on dimension-specific channel correlations, which are integrated with the global distance encoding and adjacency matrix to form shared local and global structures. This overcomes the limitations of fixed topologies and enhances representational capacity. Extensive experiments demonstrate the superior performance of our STI-MSA on public datasets.

AAAI Conference 2025 Conference Paper

Adaptive Dual Guidance Knowledge Distillation

  • Tong Li
  • Long Liu
  • Kang Liu
  • Xin Wang
  • Bo Zhou
  • Hongguang Yang
  • Kai Lu

Knowledge distillation (KD) aims to improve the performance of lightweight student networks under the guidance of pre-trained teachers. However, the large capacity gap between teachers and students limits the distillation gains. Previous methods addressing this problem have two weaknesses. First, most of them decrease the performance of pre-trained teachers, hindering students from achieving comparable performance. Second, these methods fail to dynamically adjust the transferred knowledge to be compatible with the representation ability of students, which makes them less effective in bridging the capacity gap. In this paper, we propose Adaptive Dual Guidance Knowledge Distillation (ADG-KD), which retains the guidance of the pre-trained teacher and uses the teacher's bidirectional optimization route to guide the student, alleviating the capacity gap problem. Specifically, ADG-KD introduces an initialized teacher, which has an identical structure to the pre-trained teacher and is optimized through bidirectional supervision from both the pre-trained teacher and the student. In this way, we construct the teacher's bidirectional optimization route to provide the students with an easy-to-hard and compatible knowledge sequence. ADG-KD trains the students under the proposed dual guidance approaches and automatically determines their importance weights, making the transferred knowledge more compatible with the representation ability of students. Extensive experiments on CIFAR-100, ImageNet, and MS-COCO demonstrate the effectiveness of our method.
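A rough sketch of what a two-teacher distillation objective could look like is given below. It uses the generic temperature-softened KL loss from standard knowledge distillation, not the ADG-KD objective itself, which the abstract does not spell out. The weighted sum, the fixed weights `w_pre` and `w_init` (standing in for the automatically determined importance weights), the logits, and all function names are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


def dual_guidance_loss(pre_logits, init_logits, student_logits,
                       w_pre=0.5, w_init=0.5):
    """Assumed form: fixed-weight sum of two KD terms, one per teacher."""
    return (w_pre * kd_loss(pre_logits, student_logits)
            + w_init * kd_loss(init_logits, student_logits))


# Made-up logits for a single 3-class example.
student = [1.0, 0.5, -0.5]
pretrained_teacher = [4.0, 1.0, -2.0]    # confident, far from the student
initialized_teacher = [1.5, 0.6, -0.8]   # closer to the student

loss = dual_guidance_loss(pretrained_teacher, initialized_teacher, student)
```

In this toy setup the teacher whose outputs are closer to the student contributes the smaller KD term, which is one way to picture an "easier" guidance signal.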

ICML Conference 2025 Conference Paper

Agent Reviewers: Domain-specific Multimodal Agents with Shared Memory for Paper Review

  • Kai Lu
  • Shixiong Xu
  • Jinqiu Li
  • Kun Ding
  • Gaofeng Meng

Feedback from peer review is essential to improve the quality of scientific articles. However, at present, many manuscripts do not receive sufficient external feedback for refinement before or during submission. Therefore, a system capable of providing detailed and professional feedback is crucial for enhancing research efficiency. In this paper, we have compiled the largest dataset of paper reviews to date by collecting historical open-access papers and their corresponding review comments and standardizing them using an LLM. We then developed an LLM-based multi-agent system that mimics real human review processes. This system, named Agent Reviewers, introduces multimodal reviewers to provide feedback on the visual elements of papers. Additionally, a shared memory pool that stores historical papers’ metadata is maintained, which supplies reviewer agents with background knowledge from different fields. Our system is evaluated using ICLR 2024 papers and achieves superior performance compared to existing AI-based review systems. Comprehensive ablation studies further demonstrate the effectiveness of each module and agent in this system.

AAAI Conference 2025 Conference Paper

Can Large Language Models Derive High-Level Cognition from Low-Level and Fragmented Foundational Information?

  • Yang Liu
  • Xiaoping Wang
  • Kai Lu

As one of the key technologies leading to Artificial General Intelligence (AGI), Large Language Models (LLMs) have achieved remarkable accomplishments. Exploring the capabilities of LLMs is crucial for scientific research, and many studies propose new challenges from various aspects to explore the boundaries of LLM capabilities. This paper attempts to push the challenges of information understanding, synthesis, and reasoning to the extreme, in order to explore the boundaries of more advanced cognitive capabilities in LLMs. We define this as the task of High-Level Cognition (HLC), which involves obtaining high-level conclusions from low-level and fragmented foundational information. To evaluate HLC, we construct a dataset based on soccer matches. Experiments and analysis on this dataset show that current state-of-the-art LLMs cannot effectively solve the HLC task, as their performance is equivalent to random level. However, fine-tuning Llama3-8B-Instruct yields improvements of 14.4%, 48.1%, and 19.4% over random level in three types of evaluation tasks. This indicates that LLMs have great potential to solve the task of HLC.

NeurIPS Conference 2025 Conference Paper

COOPERA: Continual Open-Ended Human-Robot Assistance

  • Chenyang Ma
  • Kai Lu
  • Ruta Desai
  • Xavier Puig
  • Andrew Markham
  • Niki Trigoni

To understand and collaborate with humans, robots must account for individual human traits, habits, and activities over time. However, most robotic assistants lack these abilities, as they primarily focus on predefined tasks in structured environments and lack a human model to learn from. This work introduces COOPERA, a novel framework for COntinual, OPen-Ended human-Robot Assistance, where simulated humans, driven by psychological traits and long-term intentions, interact with robots in complex environments. By integrating continuous human feedback, our framework, for the first time, enables the study of long-term, open-ended human-robot collaboration (HRC) in different collaborative tasks across various time-scales. Within COOPERA, we introduce a benchmark and an approach to personalize the robot's collaborative actions by learning human traits and context-dependent intents. Experiments validate the extent to which our simulated humans reflect realistic human behaviors and demonstrate the value of inferring and personalizing to human intents for open-ended and long-term HRC.

JBHI Journal 2025 Journal Article

Real-Time Epileptic Seizure Prediction Method With Spatio-Temporal Information Transfer Learning

  • Kunying Meng
  • Denghai Wang
  • Donghui Zhang
  • Kunlin Guo
  • Kai Lu
  • Junfeng Lu
  • Renping Yu
  • Lipeng Zhang

Despite numerous studies aimed at improving accuracy, the accurate prediction of epileptic seizures remains a challenge in clinical practice due to high computational cost, poor real-time performance, and over-reliance on labeled data. To address these issues, a real-time seizure prediction method with spatio-temporal information transfer learning (RTSPM-STITL) is proposed in this study. In the RTSPM-STITL method, the human brain is regarded as a time-varying high-dimensional neurodynamic system, in which epileptic seizures are viewed as state transitions caused by time-varying system parameters. Specifically, the spatio-temporal information transfer (STIT) model is first constructed from a recurrent neural network (RNN) and trained by Force Learning (a real-time learning mechanism). Then the STIT model is used to transform the high-dimensional neurodynamic data into low-dimensional time series to capture the dynamic features of epileptic seizures. In addition, the critical slowing down (CSD) effect of seizure dynamics is used to detect warning signals. The experimental results demonstrate that the proposed method can achieve higher accuracy and sensitivity without labeled data on both the CHB-MIT and Siena scalp EEG databases. In particular, the parameters of the STIT model can be updated in real time based on patient data, without iterative training. More importantly, the STIT model can maintain high sensitivity and accuracy with only 48,400 parameters, a reduction of more than 91% compared with the comparison models in this experiment. Therefore, the proposed method can significantly reduce the computational cost and accurately predict epileptic seizures, while offering strong real-time performance, practicality, applicability, and interpretability.
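Critical slowing down is commonly monitored through rising variance and rising lag-1 autocorrelation in a sliding window. The abstract does not say which indicators RTSPM-STITL uses, so the sketch below is only a generic illustration of those two textbook indicators; the toy `signal`, the window size, and all function names are assumptions.

```python
def variance(window):
    """Population variance of the samples in one window."""
    n = len(window)
    mean = sum(window) / n
    return sum((v - mean) ** 2 for v in window) / n


def lag1_autocorrelation(window):
    """Lag-1 autocorrelation; returns 0.0 for a constant window."""
    n = len(window)
    mean = sum(window) / n
    dev = [v - mean for v in window]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0
    return sum(dev[i] * dev[i + 1] for i in range(n - 1)) / denom


def rolling_indicators(series, window_size):
    """(variance, lag-1 autocorrelation) for every full sliding window."""
    results = []
    for start in range(len(series) - window_size + 1):
        window = series[start:start + window_size]
        results.append((variance(window), lag1_autocorrelation(window)))
    return results


# Toy low-dimensional series: fast alternation followed by a slow drift.
signal = [1, -1, 1, -1, 1, 2, 3, 4, 5]
indicators = rolling_indicators(signal, window_size=5)
```

In this toy series both indicators are higher in the final window than in the first, which is the pattern usually read as a warning signal.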

AAAI Conference 2025 Conference Paper

RUNA: Object-Level Out-of-Distribution Detection via Regional Uncertainty Alignment of Multimodal Representations

  • Bin Zhang
  • Jinggang Chen
  • Xiaoyang Qu
  • Guokuan Li
  • Kai Lu
  • Jiguang Wan
  • Jing Xiao
  • Jianzong Wang

Enabling object detectors to recognize out-of-distribution (OOD) objects is vital for building reliable systems. A primary obstacle stems from the fact that models frequently do not receive supervisory signals from unfamiliar data, leading to overly confident predictions regarding OOD objects. Despite previous progress that estimates OOD uncertainty based on the detection model and in-distribution (ID) samples, we explore using pre-trained vision-language representations for object-level OOD detection. We first discuss the limitations of applying image-level CLIP-based OOD detection methods to object-level scenarios. Building upon these insights, we propose RUNA, a novel framework that leverages a dual encoder architecture to capture rich contextual information and employs a regional uncertainty alignment mechanism to distinguish ID from OOD objects effectively. We introduce a few-shot fine-tuning approach that aligns region-level semantic representations to further improve the model's capability to discriminate between similar objects. Our experiments show that RUNA substantially surpasses state-of-the-art methods in object-level OOD detection, particularly in challenging scenarios with diverse and complex object instances.

JBHI Journal 2025 Journal Article

UKF-Based Model Parameter Estimation to Localize the Seizure Onset Zone in ECoG

  • Junfeng Lu
  • Donghui Zhang
  • Kunlin Guo
  • Kunying Meng
  • Denghai Wang
  • Kai Lu
  • Renping Yu
  • Lifang Yang

Drug-resistant epilepsy (DRE) patients typically require surgical intervention or neurostimulation. Therefore, accurate localization of the seizure onset zone (SOZ) is essential for effective clinical intervention. Although some physiologically meaningful parameters of neural computational models show substantial differences across brain regions during seizures, few studies have applied these model parameters to SOZ localization. To investigate whether such a parameter can be used for accurate SOZ localization, the unscented Kalman filter (UKF) is employed to estimate the excitatory-inhibitory balance parameter c of the Z6 neural computational model from DRE patients' electrocorticography (ECoG). The results indicate that this parameter follows a unimodal distribution during the pre-ictal and post-ictal periods, while exhibiting a bimodal distribution during the ictal period. Then, the distribution of this parameter is combined with machine learning methods, and a bagged tree classifier is constructed to localize the SOZ. The classification results demonstrate that the classifier based on parameter distributions exhibits excellent performance, particularly during the post-ictal period, with an average accuracy of 91.60%. Interestingly, SOZ localization is more accurate when no lesions are detected on magnetic resonance imaging (MRI) than when lesions are present. Finally, the model parameter distributions of the SOZs are used to predict the outcome of epilepsy surgery. Of note, the results demonstrate that the parameter distribution accurately predicts surgical outcomes with an average accuracy of 92.56%. These findings suggest that the distribution of neural computational model parameters may serve as a biomarker for SOZ localization and epilepsy surgery outcome prediction, providing valuable support for clinical decision-making.

NeurIPS Conference 2024 Conference Paper

SpatialPIN: Enhancing Spatial Reasoning Capabilities of Vision-Language Models through Prompting and Interacting 3D Priors

  • Chenyang Ma
  • Kai Lu
  • Ta-Ying Cheng
  • Niki Trigoni
  • Andrew Markham

Current state-of-the-art spatial reasoning-enhanced VLMs are trained to excel at spatial visual question answering (VQA). However, we believe that higher-level 3D-aware tasks, such as articulating dynamic scene changes and motion planning, require a fundamental and explicit 3D understanding beyond current spatial VQA datasets. In this work, we present SpatialPIN, a framework designed to enhance the spatial reasoning capabilities of VLMs through prompting and interacting with priors from multiple 3D foundation models in a zero-shot, training-free manner. Extensive experiments demonstrate that our spatial reasoning-imbued VLM performs well on various forms of spatial VQA and can extend to help in various downstream robotics tasks such as pick and stack and trajectory planning.

NeurIPS Conference 2023 Conference Paper

DynPoint: Dynamic Neural Point For View Synthesis

  • Kaichen Zhou
  • Jia-Xing Zhong
  • Sangyun Shin
  • Kai Lu
  • Yiyuan Yang
  • Andrew Markham
  • Niki Trigoni

The introduction of neural radiance fields has greatly improved the effectiveness of view synthesis for monocular videos. However, existing algorithms face difficulties when dealing with uncontrolled or lengthy scenarios, and require extensive training time specific to each new scenario. To tackle these limitations, we propose DynPoint, an algorithm designed to facilitate the rapid synthesis of novel views for unconstrained monocular videos. Rather than encoding the entirety of the scenario information into a latent representation, DynPoint concentrates on predicting the explicit 3D correspondence between neighboring frames to realize information aggregation. Specifically, this correspondence prediction is achieved through the estimation of consistent depth and scene flow information across frames. Subsequently, the acquired correspondence is utilized to aggregate information from multiple reference frames to a target frame, by constructing hierarchical neural point clouds. The resulting framework enables swift and accurate view synthesis for desired views of target frames. The experimental results demonstrate that our proposed method considerably accelerates training, typically by an order of magnitude, while yielding outcomes comparable to prior approaches. Furthermore, our method exhibits strong robustness in handling long-duration videos without learning a canonical representation of video content.

NeurIPS Conference 2023 Conference Paper

Multi-body SE(3) Equivariance for Unsupervised Rigid Segmentation and Motion Estimation

  • Jia-Xing Zhong
  • Ta-Ying Cheng
  • Yuhang He
  • Kai Lu
  • Kaichen Zhou
  • Andrew Markham
  • Niki Trigoni

A truly generalizable approach to rigid segmentation and motion estimation is fundamental to 3D understanding of articulated objects and moving scenes. In view of the closely intertwined relationship between segmentation and motion estimates, we present an SE(3) equivariant architecture and a training strategy to tackle this task in an unsupervised manner. Our architecture is composed of two interconnected, lightweight heads. These heads predict segmentation masks using point-level invariant features and estimate motion from SE(3) equivariant features, all without the need for category information. Our training strategy is unified and can be implemented online, which jointly optimizes the predicted segmentation and motion by leveraging the interrelationships among scene flow, segmentation mask, and rigid transformations. We conduct experiments on four datasets to demonstrate the superiority of our method. The results show that our method excels in both model performance and computational efficiency, with only 0.25M parameters and 0.92G FLOPs. To the best of our knowledge, this is the first work designed for category-agnostic part-level SE(3) equivariance in dynamic point clouds.

JBHI Journal 2022 Journal Article

Investigate the Neuro Mechanisms of Stereoscopic Visual Fatigue

  • Kang Yue
  • Mei Guo
  • Yue Liu
  • Haochen Hu
  • Kai Lu
  • Shanshan Chen
  • Danli Wang

Stereoscopic visual fatigue (SVF) due to prolonged immersion in a virtual environment can lead to a negative user experience, thus hindering the development of the virtual reality (VR) industry. Previous studies have focused on investigating the evaluation indicators associated with SVF, while few studies have been conducted to reveal the underlying neural mechanism, especially in VR applications. In this paper, a modified Go/NoGo paradigm was adopted to induce SVF in a VR environment, with Go trials for maintaining participants’ attention and NoGo trials for investigating the neural effects under SVF. Random dot stereograms (RDSs) with 11 disparities were presented to evoke the depth-related visual evoked potentials (DVEPs) during 64-channel EEG recordings. EEG datasets collected from 15 participants in NoGo trials were selected to conduct individual processing and group analysis, in which the characteristics of the DVEP components for various fatigue degrees were compared and independent components were clustered to explore the original cortex areas related to SVF. Point-by-point permutation statistics revealed that DVEP sample points from 230 ms to 280 ms (component P2) in most brain areas changed significantly when SVF increased. Additionally, independent component analysis (ICA) identified that component P2, which originated from the posterior cingulate cortex and precuneus, was statistically associated with SVF. We believe that SVF is a conscious state concerning changes in self-awareness or self-location awareness rather than a reduction in the performance of retinal image processing. Moreover, we suggest that indicators representing a higher conscious state may be better suited to SVF evaluation in VR environments.

AAAI Conference 2020 Conference Paper

Representation Learning with Multiple Lipschitz-Constrained Alignments on Partially-Labeled Cross-Domain Data

  • Songlei Jian
  • Liang Hu
  • Longbing Cao
  • Kai Lu

Cross-domain representation learning plays an important role in tasks including domain adaptation and transfer learning. However, existing cross-domain representation learning focuses on building one shared space and ignores the unlabeled data in the source domain, so it cannot effectively capture the distribution and structure heterogeneities in cross-domain data. To address this challenge, we propose a new cross-domain representation learning approach: MUltiple Lipschitz-constrained AligNments (MULAN) on partially-labeled cross-domain data. MULAN produces two representation spaces: a common representation space to incorporate knowledge from the source domain and a complementary representation space to complement the common representation with target local topological information by Lipschitz-constrained representation transformation. MULAN utilizes both unlabeled and labeled data in the source and target domains to address distribution heterogeneity by Lipschitz-constrained adversarial distribution alignment and structure heterogeneity by cluster assumption-based class alignment, while keeping the target local topological information in the complementary representation by self alignment. Moreover, MULAN is equipped with a customized learning process and an iterative parameter updating process. MULAN shows superior performance on partially-labeled semi-supervised domain adaptation and few-shot domain adaptation and outperforms the state-of-the-art visual domain adaptation models by up to 12.1%.

IJCAI Conference 2019 Conference Paper

Adversarial Examples for Graph Data: Deep Insights into Attack and Defense

  • Huijun Wu
  • Chen Wang
  • Yuriy Tyshetskiy
  • Andrew Docherty
  • Kai Lu
  • Liming Zhu

Graph deep learning models, such as graph convolutional networks (GCNs), achieve state-of-the-art performance for tasks on graph data. However, similar to other deep learning models, graph deep learning models are susceptible to adversarial attacks. Compared with non-graph data, the discrete nature of the graph connections and features provides unique challenges and opportunities for adversarial attacks and defenses. In this paper, we propose techniques for both an adversarial attack and a defense against adversarial attacks. First, we show that the problem of discrete graph connections and the discrete features of common datasets can be handled by using the integrated gradient technique, which accurately determines the effect of changing selected features or edges while still benefiting from parallel computations. In addition, we show that a graph adversarially manipulated by a targeted attack differs statistically from un-manipulated graphs. Based on this observation, we propose a defense approach that can detect and recover from a potential adversarial perturbation. Our experiments on a number of datasets show the effectiveness of the proposed techniques.
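The integrated gradient technique mentioned in the abstract above attributes a model's output to each input by averaging gradients along a straight path from a baseline to the input. The sketch below is a minimal, generic version on a toy logistic model with an analytic gradient. It is not the paper's attack on graph edges and features; the weights `w`, the bias `b`, the all-zeros baseline, and the function names are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy differentiable model: F(x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def model_gradient(x, w, b):
    """Analytic gradient of F with respect to x."""
    s = model(x, w, b)
    return [s * (1.0 - s) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight path from the baseline to the input.
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad = model_gradient(point, w, b)
        total = [t + g for t, g in zip(total, grad)]
    # Scale the averaged gradient by the input-minus-baseline difference.
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]


w = [2.0, -1.0, 0.5]         # hypothetical model weights
b = -0.25                    # hypothetical bias
x = [1.0, 1.0, 0.0]          # binary input, e.g. presence of three features
baseline = [0.0, 0.0, 0.0]   # all-absent baseline

attributions = integrated_gradients(x, baseline, w, b)
```

The attributions sum, up to numerical error, to `model(x, w, b) - model(baseline, w, b)`, and an entry that does not differ from the baseline receives zero attribution. With binary inputs, the path from 0 to 1 (or 1 to 0) corresponds to adding or removing a feature or edge.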

AAAI Conference 2019 Conference Paper

Evolutionarily Learning Multi-Aspect Interactions and Influences from Network Structure and Node Content

  • Songlei Jian
  • Liang Hu
  • Longbing Cao
  • Kai Lu
  • Hang Gao

The formation of a complex network is highly driven by multi-aspect node influences and interactions, reflected in network structures and the content embodied in network nodes. Limited work has jointly modeled all these aspects; existing work typically focuses on topological structures but overlooks the heterogeneous interactions behind node linkage and the contributions of node content to the interactive heterogeneities. Here, we propose a multi-aspect interaction and influence-unified evolutionary coupled system (MAI-ECS) for network representation by involving node content and linkage-based network structure. MAI-ECS jointly and iteratively learns two systems: a multi-aspect interaction learning system to capture heterogeneous hidden interactions between nodes and an influence propagation system to capture multi-aspect node influences and their propagation between nodes. MAI-ECS couples, unifies and optimizes the two systems toward an effective representation of explicit node content and network structure, and implicit node interactions and influences. MAI-ECS shows superior performance in node classification and link prediction in comparison with the state-of-the-art methods on two real-world datasets. Further, we demonstrate the semantic interpretability of the results generated by MAI-ECS.

AAAI Conference 2018 Conference Paper

Metric-Based Auto-Instructor for Learning Mixed Data Representation

  • Songlei Jian
  • Liang Hu
  • Longbing Cao
  • Kai Lu

Mixed data with both categorical and continuous features are ubiquitous in real-world applications. Learning a good representation of mixed data is critical yet challenging for further learning tasks. Existing methods for representing mixed data often overlook the heterogeneous coupling relationships between categorical and continuous features as well as the discrimination between objects. To address these issues, we propose an auto-instructive representation learning scheme to enable margin-enhanced distance metric learning for a discrimination-enhanced representation. Accordingly, we design a metric-based auto-instructor (MAI) model which consists of two collaborative instructors. Each instructor captures the feature-level couplings in mixed data with fully connected networks, and guides the infinite-margin metric learning for the peer instructor with a contrastive order. By feeding the learned representation into both partition-based and density-based clustering methods, our experiments on eight UCI datasets show highly significant learning performance improvement and much more distinguishable visualization outcomes over the baseline methods.

IJCAI Conference 2017 Conference Paper

Embedding-based Representation of Categorical Data by Hierarchical Value Coupling Learning

  • Songlei Jian
  • Longbing Cao
  • Guansong Pang
  • Kai Lu
  • Hang Gao

Learning the representation of categorical data with hierarchical value coupling relationships is very challenging but critical for the effective analysis and learning of such data. This paper proposes a novel coupled unsupervised categorical data representation (CURE) framework and its instantiation, i.e., a coupled data embedding (CDE) method, for representing categorical data by hierarchical value-to-value cluster coupling learning. Unlike existing embedding- and similarity-based representation methods which can capture only a part or none of these complex couplings, CDE explicitly incorporates the hierarchical couplings into its embedding representation. CDE first learns two complementary feature value couplings which are then used to cluster values with different granularities. It further models the couplings in value clusters within the same granularity and with different granularities to embed feature values into a new numerical space with independent dimensions. Substantial experiments show that CDE significantly outperforms three popular unsupervised embedding methods and three state-of-the-art similarity-based representation methods.