Arrow Research: Search

Author name cluster

Lin Zhu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers
2 author rows

Possible papers (18)

EAAI Journal 2026 Journal Article

Entity and relation feature learning framework for sparse temporal knowledge graph reasoning

  • Luyi Bai
  • Xiangxi Meng
  • Lin Zhu

In the realm of temporal knowledge graphs, reasoning mechanisms are essential for uncovering time-dependent relationships and ensuring high interpretability. However, existing models often struggle with sparsely populated temporal knowledge graphs, which record only critical knowledge units. To address these challenges, this paper proposes an Entity and Relation feature learning framework for Reasoning in Sparse Temporal Knowledge Graphs, denoted as STKGR-ER. STKGR-ER utilizes a graph attention network to dynamically aggregate entity features across relations and timestamps, enhancing semantic accuracy. Gated recurrent units then learn latent logical rules and temporal patterns, reinforcing relation embeddings. By enriching entity features and learning from relational sequences, STKGR-ER effectively addresses information scarcity and reduces irrelevant path interference. Experiments conducted on twelve sparse datasets, ranging in size from 870 to 4833 entities and containing up to 21,552 training quadruples, including subsets of ICEWS14 and ICEWS05-15 (ICEWS: Integrated Crisis Early Warning System), demonstrate that STKGR-ER significantly improves performance. Notably, on the Hits@10 metric, STKGR-ER surpasses the best multi-hop path baselines by 11.91%, 13.37%, and 18.09% on ICEWS14-10%, ICEWS14-20%, and ICEWS14-30%, respectively, and by 5.87%, 12.54%, and 13.07% on ICEWS05-15-2%, ICEWS05-15-3%, and ICEWS05-15-5%, respectively, highlighting its strong reasoning capabilities in sparse temporal environments.
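
As a rough illustration of the two components named above, the sketch below pairs an attention-based neighbor aggregator with a GRU over relation sequences. It assumes PyTorch; the class name, shapes, and wiring are illustrative stand-ins, not the STKGR-ER implementation.

```python
# Minimal sketch (not the authors' code): relation- and time-aware neighbor
# aggregation followed by a GRU over relation sequences, assuming PyTorch.
import torch
import torch.nn as nn

class SparseTKGEncoder(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.Linear(3 * dim, 1)        # scores (entity, relation, time) triples
        self.gru = nn.GRU(dim, dim, batch_first=True)

    def forward(self, ent, rel, time, rel_seq):
        # ent/rel/time: (n_neighbors, dim) features of one entity's neighborhood
        scores = torch.softmax(self.attn(torch.cat([ent, rel, time], dim=-1)), dim=0)
        ent_feat = (scores * ent).sum(dim=0)     # attention-weighted entity feature
        # rel_seq: (1, seq_len, dim) history of relations; the GRU picks up temporal patterns
        _, rel_state = self.gru(rel_seq)
        return ent_feat, rel_state.squeeze(0)

enc = SparseTKGEncoder(dim=16)
e, r = enc(torch.randn(5, 16), torch.randn(5, 16), torch.randn(5, 16),
           torch.randn(1, 7, 16))
print(e.shape, r.shape)  # torch.Size([16]) torch.Size([1, 16])
```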

YNIMG Journal 2026 Journal Article

Multimodal radiomics of precisely segmented hippocampal subfields: Iron deposition and structural biomarkers for early diagnosis of Alzheimer's disease

  • Dongxue Li
  • Junjie He
  • Benqin Liu
  • Lin Zhu
  • Yuezong Yang
  • Yunsong Peng
  • Lisha Nie
  • Rongpin Wang

Profiling imaging biomarkers of prodromal Alzheimer's disease (AD) against AD dementia may aid earlier diagnosis, yet approaches jointly capturing iron-related pathology and hippocampal subfield heterogeneity remain scarce. We developed a hippocampal-subfield multimodal radiomics framework integrating quantitative susceptibility mapping (QSM) and 3D T1-weighted MRI. A primary cohort of 92 participants (50 prodromal AD, 42 AD dementia) and an independent external cohort of 30 (15/15) were included. Twenty-four hippocampal subfields were segmented on super-resolution T1 images and propagated to co-registered QSM for feature extraction. Radiomic features were condensed into a radiomics score (Rad-score) via a training-only selection pipeline. Using the Rad-score as the sole predictor, a support vector machine (SVM) classifier was trained. On the external cohort, the SVM achieved an area under the receiver operating characteristic curve of 0.85 and an accuracy of 0.83. The predictive signature was dominated by QSM texture features in Cornu Ammonis 1 and the granule cell layer of the dentate gyrus, complemented by T1 first-order heterogeneity. Modality ablation suggested potential, but not definitive, complementarity of multimodal integration. This framework shows promise for AD stage classification and warrants further validation in larger independent cohorts.
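
The training-only Rad-score pipeline described here follows a common radiomics pattern; the sketch below shows one plausible version using scikit-learn, with a LASSO-derived Rad-score feeding an SVM. The feature selection choice and the synthetic data are placeholders, not the study's code.

```python
# Hypothetical sketch of a training-only Rad-score pipeline, assuming
# scikit-learn; the LASSO-based score and synthetic data are illustrative.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(92, 200)), rng.integers(0, 2, 92)   # radiomic features
X_ext, y_ext = rng.normal(size=(30, 200)), rng.integers(0, 2, 30)       # external cohort

# Condense features into a single Rad-score with coefficients fit on training data only.
lasso = make_pipeline(StandardScaler(), LassoCV(cv=5)).fit(X_train, y_train)
rad_train = lasso.predict(X_train).reshape(-1, 1)
rad_ext = lasso.predict(X_ext).reshape(-1, 1)

# SVM classifier with the Rad-score as its sole predictor.
svm = SVC(probability=True).fit(rad_train, y_train)
print("external accuracy:", svm.score(rad_ext, y_ext))
```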

NeurIPS Conference 2025 Conference Paper

$\Delta \mathrm{Energy}$: Optimizing Energy Change During Vision-Language Alignment Improves both OOD Detection and OOD Generalization

  • Lin Zhu
  • Yifeng Yang
  • Xinbing Wang
  • Qinying Gu
  • Nanyang Ye

Recent approaches for vision-language models (VLMs) have shown remarkable success in achieving fast downstream adaptation. When applied to real-world downstream tasks, VLMs inevitably encounter both in-distribution (ID) data and out-of-distribution (OOD) data. The OOD datasets often include both covariate shifts (e.g., known classes with changes in image styles) and semantic shifts (e.g., test-time unseen classes). This highlights the importance of improving VLMs' generalization ability to covariate-shifted OOD data, while effectively detecting open-set semantic-shifted OOD classes. In this paper, inspired by the substantial energy change observed in closed-set data when re-aligning vision-language modalities (specifically, by directly reducing the maximum cosine similarity to a low value), we introduce a novel OOD score, named $\Delta\mathrm{Energy}$. $\Delta\mathrm{Energy}$ significantly outperforms the vanilla energy-based OOD score and provides a more reliable approach for OOD detection. Furthermore, $\Delta\mathrm{Energy}$ can simultaneously improve OOD generalization under covariate shifts, which is achieved by lower-bound maximization for $\Delta\mathrm{Energy}$ (termed EBM). EBM is theoretically proven to not only enhance OOD detection but also yield a domain-consistent Hessian, which serves as a strong indicator for OOD generalization. Based on this finding, we developed a unified fine-tuning framework that allows for improving VLMs' robustness in both OOD generalization and OOD detection. Extensive experiments on challenging OOD detection and generalization benchmarks demonstrate the superiority of our method, outperforming recent approaches by 10%–25% in AUROC.
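
For intuition, a minimal sketch of an energy-style score of this kind is shown below, assuming PyTorch. The re-alignment step (forcing the top cosine similarity down to a low value) is a simplified reading of the abstract, not the authors' released implementation.

```python
# Minimal sketch of an energy-change OOD score over image-text cosine
# similarities, assuming PyTorch; the re-alignment rule is illustrative.
import torch

def energy(sim, T=1.0):
    # Standard energy score over class similarities: E = -T * logsumexp(sim / T)
    return -T * torch.logsumexp(sim / T, dim=-1)

def delta_energy(sim, low=0.0):
    # Re-align by forcing the top cosine similarity down to a low value,
    # then measure how much the energy changes; ID samples change more.
    realigned = sim.clone()
    idx = sim.argmax(dim=-1, keepdim=True)
    realigned.scatter_(-1, idx, low)
    return energy(realigned) - energy(sim)

sim = torch.tensor([[0.9, 0.1, 0.05],    # confident ID-like sample
                    [0.3, 0.28, 0.25]])  # flat OOD-like sample
print(delta_energy(sim))  # larger energy change for the confident sample
```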

ICLR Conference 2025 Conference Paper

Less is More: Masking Elements in Image Condition Features Avoids Content Leakages in Style Transfer Diffusion Models

  • Lin Zhu
  • Xinbing Wang
  • Chenghu Zhou
  • Qinying Gu
  • Nanyang Ye 0001

Given a style-reference image as the additional image condition, text-to-image diffusion models have demonstrated impressive capabilities in generating images that possess the content of text prompts while adopting the visual style of the reference image. However, current state-of-the-art methods often struggle to disentangle content and style from style-reference images, leading to issues such as content leakages. To address this issue, we propose a masking-based method that efficiently decouples content from style without tuning any model parameters. By simply masking specific elements in the style reference's image features, we uncover a critical yet under-explored principle: guiding with appropriately selected fewer conditions (e.g., dropping several image feature elements) can efficiently avoid unwanted content flowing into the diffusion models, enhancing the style transfer performance of text-to-image diffusion models. In this paper, we validate this finding both theoretically and experimentally. Extensive experiments across various styles demonstrate the effectiveness of our masking-based method and support our theoretical results.
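
A minimal sketch of element masking in a condition-feature tensor is given below, assuming PyTorch. The selection rule (zeroing the largest-magnitude elements per token) is an assumption for illustration; the paper's own selection criterion may differ.

```python
# Illustrative sketch, assuming PyTorch: drop selected elements of the style
# reference's image-condition features before they guide the diffusion model.
import torch

def mask_condition(features, drop_ratio=0.1):
    # features: (n_tokens, dim) image-condition features from the style reference
    k = int(features.shape[-1] * drop_ratio)
    idx = features.abs().topk(k, dim=-1).indices   # assumed selection rule
    masked = features.clone()
    masked.scatter_(-1, idx, 0.0)                  # zero out the selected elements
    return masked

feats = torch.randn(77, 768)
print(mask_condition(feats).count_nonzero(), feats.numel())
```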

NeurIPS Conference 2025 Conference Paper

Re-coding for Uncertainties: Edge-awareness Semantic Concordance for Resilient Event-RGB Segmentation

  • Nan Bao
  • Yifan Zhao
  • Lin Zhu
  • Jia Li

Semantic segmentation has achieved great success in ideal conditions. However, when facing extreme conditions (e.g., insufficient light, fierce camera motion), most existing methods suffer from significant RGB information loss, severely damaging segmentation results. Several studies exploit the high-speed, high-dynamic-range event modality as a complement, but event and RGB data are naturally heterogeneous, which leads to feature-level mismatch and inferior optimization in existing multi-modality methods. Unlike these studies, we delve into the edge secret of both modalities for resilient fusion and propose a novel Edge-awareness Semantic Concordance framework to unify the heterogeneous multi-modality features with latent edge cues. In this framework, we first propose Edge-awareness Latent Re-coding, which obtains uncertainty indicators while realigning event-RGB features into a unified semantic space guided by a re-coded distribution, and transfers event-RGB distributions into re-coded features by utilizing a pre-established edge dictionary as clues. We then propose Re-coded Consolidation and Uncertainty Optimization, which utilize re-coded edge features and uncertainty indicators to solve the heterogeneous event-RGB fusion issues under extreme conditions. We establish two synthetic and one real-world event-RGB semantic segmentation datasets for extreme scenario comparisons. Experimental results show that our method outperforms the state-of-the-art by 2.55% mIoU on our proposed DERS-XS, and possesses superior resilience under spatial occlusion. Our code and datasets are publicly available at https://github.com/iCVTEAM/ESC.

NeurIPS Conference 2025 Conference Paper

Rethinking Scale-Aware Temporal Encoding for Event-based Object Detection

  • Lin Zhu
  • Tengyu Long
  • Xiao Wang
  • Lizhi Wang
  • Hua Huang

Event cameras provide asynchronous, low-latency, and high-dynamic-range visual signals, making them ideal for real-time perception tasks such as object detection. However, effectively modeling the temporal dynamics of event streams remains a core challenge. Most existing methods follow frame-based detection paradigms, applying temporal modules only at high-level features, which limits early-stage temporal modeling. Transformer-based approaches introduce global attention to capture long-range dependencies, but often add unnecessary complexity and overlook fine-grained temporal cues. In this paper, we propose a CNN-RNN hybrid framework that rethinks temporal modeling for event-based object detection. Our approach is based on two key insights: (1) introducing recurrent modules at lower spatial scales to preserve detailed temporal information where events are most dense, and (2) utilizing Decoupled Deformable-enhanced Recurrent Layers specifically designed according to the inherent motion characteristics of event cameras to extract multiple spatiotemporal features, and performing independent downsampling at multiple spatiotemporal scales to enable flexible, scale-aware representation learning. These multi-scale features are then fused via a feature pyramid network to produce robust detection outputs. Experiments on the Gen1, 1 Mpx, and eTram datasets demonstrate that our approach achieves superior accuracy over recent transformer-based models, highlighting the importance of precise temporal feature extraction in early stages. This work offers a new perspective on designing architectures for event-driven vision beyond attention-centric paradigms. Code: https://github.com/BIT-Vision/SATE.
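
To make the first insight concrete (recurrence at an early, high-resolution scale), here is a generic ConvGRU cell applied to dense event feature maps, assuming PyTorch. It is a stand-in, not the paper's Decoupled Deformable-enhanced Recurrent Layer.

```python
# A minimal sketch, assuming PyTorch: apply a recurrent cell at an early,
# high-resolution scale of event feature maps. Generic ConvGRU, not the
# paper's decoupled layer.
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.gates = nn.Conv2d(2 * ch, 2 * ch, 3, padding=1)  # update + reset gates
        self.cand = nn.Conv2d(2 * ch, ch, 3, padding=1)       # candidate state

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], 1))).chunk(2, 1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * h_tilde

cell = ConvGRUCell(ch=8)
h = torch.zeros(1, 8, 64, 64)                 # early-stage, dense spatial scale
for frame in torch.randn(4, 1, 8, 64, 64):    # 4 event-tensor time bins
    h = cell(frame, h)
print(h.shape)  # torch.Size([1, 8, 64, 64])
```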

ICML Conference 2024 Conference Paper

CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection

  • Lin Zhu
  • Yifeng Yang
  • Qinying Gu
  • Xinbing Wang
  • Chenghu Zhou
  • Nanyang Ye 0001

Recent vision-language pre-trained models (VL-PTMs) have shown remarkable success in open-vocabulary tasks. However, downstream use cases often involve further fine-tuning of VL-PTMs, which may distort their general knowledge and impair their ability to handle distribution shifts. In real-world scenarios, machine learning systems inevitably encounter both covariate shifts (e.g., changes in image styles) and semantic shifts (e.g., test-time unseen classes). This highlights the importance of enhancing out-of-distribution (OOD) generalization on covariate shifts and simultaneously detecting semantic-shifted unseen classes. Thus a critical but underexplored question arises: How to improve VL-PTMs' generalization ability to closed-set OOD data, while effectively detecting open-set unseen classes during fine-tuning? In this paper, we propose a novel objective function of OOD detection that also serves to improve OOD generalization. We show that minimizing the gradient magnitude of energy scores on training data leads to domain-consistent Hessians of classification loss, a strong indicator for OOD generalization revealed by theoretical analysis. Based on this finding, we have developed a unified fine-tuning framework that allows for concurrent optimization of both tasks. Extensive experiments have demonstrated the superiority of our method. The code is available at https://github.com/LinLLLL/CRoFT.
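
A toy version of the gradient-magnitude penalty on energy scores might look like the following, assuming PyTorch; the linear model, the choice of differentiating with respect to parameters, and the weighting factor are illustrative assumptions, not CRoFT itself.

```python
# Sketch of an energy-gradient penalty in the spirit of the objective above,
# assuming PyTorch; model and weighting are placeholders, not CRoFT.
import torch
import torch.nn.functional as F

model = torch.nn.Linear(32, 4)
x, y = torch.randn(16, 32), torch.randint(0, 4, (16,))

logits = model(x)
energy = -torch.logsumexp(logits, dim=-1)          # energy score per sample
# Penalize the gradient magnitude of the energy w.r.t. model parameters.
grads = torch.autograd.grad(energy.mean(), model.parameters(), create_graph=True)
grad_penalty = sum(g.pow(2).sum() for g in grads)
loss = F.cross_entropy(logits, y) + 0.1 * grad_penalty
loss.backward()
print(float(loss))
```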

EAAI Journal 2024 Journal Article

Embedding-based entity alignment between multi-source temporal knowledge graphs

  • Lin Zhu
  • Nan Li
  • Luyi Bai

The goal of entity alignment is to identify entities in two multi-source knowledge graphs (KGs) that represent the same real-world object. Recent research on multi-source entity alignment has mainly concentrated on static KGs. In fact, temporal KGs have become valuable resources for numerous artificial intelligence applications, and entity alignment between multi-source temporal KGs is becoming more and more important. Current entity alignment models cannot support temporal tasks and fail to deal with attributes that have low literal similarity but share the same semantics through attribute embedding. Therefore, we propose an RDF (Resource Description Framework)-based model for representing temporal KGs, and an embedding-based entity alignment method for multi-source temporal KGs. This method computes the similarity of temporal information and generates aligned attribute pairs in the predicate alignment module. We design an interactive module that lets matched attributes and matched entities help find each other based on aligned attribute pairs. This module can calculate the similarity of attributes with low literal similarity. After obtaining the structural similarity from the structure embedding module, the final entity alignment result of temporal KGs is produced by a binary linear regression function. Experimental results demonstrate that our proposed model significantly outperforms existing approaches.
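
As a toy illustration of the final fusion step, the sketch below combines per-module similarities with a binary linear (logistic) model using scikit-learn; the three similarity features and the synthetic labels are placeholders, not the paper's pipeline.

```python
# Toy sketch, assuming scikit-learn: fuse module similarities with a binary
# linear model to produce the final alignment decision; data is illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Each row: [attribute similarity, temporal similarity, structure similarity]
sims = rng.uniform(size=(200, 3))
labels = (sims.mean(axis=1) > 0.5).astype(int)   # stand-in alignment labels

clf = LogisticRegression().fit(sims, labels)
candidate = np.array([[0.8, 0.7, 0.9]])
print("aligned probability:", clf.predict_proba(candidate)[0, 1])
```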

AAAI Conference 2024 Conference Paper

Finding Visual Saliency in Continuous Spike Stream

  • Lin Zhu
  • Xianzhang Chen
  • Xiao Wang
  • Hua Huang

As a bio-inspired vision sensor, the spike camera emulates the operational principles of the fovea, a compact retinal region, by employing spike discharges to encode the accumulation of per-pixel luminance intensity. Leveraging its high temporal resolution and bio-inspired neuromorphic design, the spike camera holds significant promise for advancing computer vision applications. Saliency detection mimics the behavior of human observers and captures the most salient regions of a scene. In this paper, we investigate visual saliency in the continuous spike stream for the first time. To effectively process the binary spike stream, we propose a Recurrent Spiking Transformer (RST) framework based on a full spiking neural network. Our framework enables the extraction of spatio-temporal features from the continuous spike stream while maintaining low power consumption. To facilitate the training and validation of our proposed model, we build a comprehensive real-world spike-based visual saliency dataset covering numerous lighting conditions. Extensive experiments demonstrate the superior performance of our Recurrent Spiking Transformer framework in comparison to other spiking neural network-based methods. Our framework exhibits a substantial margin of improvement in capturing and highlighting visual saliency in the spike stream, which not only provides a new perspective for spike-based saliency segmentation but also establishes a new paradigm for fully SNN-based transformer models. The code and dataset are available at https://github.com/BIT-Vision/SVS.
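
For readers unfamiliar with fully spiking models, the sketch below shows a leaky integrate-and-fire layer consuming a binary spike stream, assuming PyTorch. It illustrates the general mechanism only; the RST architecture and its parameters are not reproduced here.

```python
# Minimal sketch, assuming PyTorch: a leaky integrate-and-fire (LIF) layer
# processing a binary spike stream step by step. Parameters are illustrative.
import torch

def lif_forward(spikes, w, tau=2.0, v_th=1.0):
    # spikes: (T, n_in) binary stream; w: (n_in, n_out) synaptic weights
    v = torch.zeros(w.shape[1])
    out = []
    for s in spikes:
        v = v / tau + s @ w          # leak, then integrate input current
        fired = (v >= v_th).float()  # emit a spike where threshold is crossed
        v = v * (1 - fired)          # hard reset after firing
        out.append(fired)
    return torch.stack(out)

stream = (torch.rand(10, 4) > 0.7).float()            # toy binary spike stream
print(lif_forward(stream, torch.rand(4, 3)).sum(0))   # spike counts per neuron
```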

AAAI Conference 2024 Conference Paper

HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors

  • Xiao Wang
  • Zongzhen Wu
  • Bo Jiang
  • Zhimin Bao
  • Lin Zhu
  • Guoqi Li
  • Yaowei Wang
  • Yonghong Tian

Mainstream human activity recognition (HAR) algorithms are developed for RGB cameras, which usually suffer from illumination changes, fast motion, privacy concerns, and high energy consumption. Meanwhile, biologically inspired event cameras have attracted great interest due to their unique features, such as high dynamic range, dense temporal but sparse spatial resolution, low latency, and low power. As event cameras are a newly emerging sensor, no realistic large-scale dataset yet exists for HAR. Considering their great practical value, in this paper we propose a large-scale benchmark dataset to bridge this gap, termed HARDVS, which contains 300 categories and more than 100K event sequences. We evaluate and report the performance of multiple popular HAR algorithms, providing extensive baselines for future works to compare against. More importantly, we propose a novel spatial-temporal feature learning and fusion framework, termed ESTF, for event-stream-based human activity recognition. It first projects the event streams into spatial and temporal embeddings using StemNet, then encodes and fuses the dual-view representations using Transformer networks. Finally, the dual features are concatenated and fed into a classification head for activity prediction. Extensive experiments on multiple datasets fully validate the effectiveness of our model. Both the dataset and source code will be released at https://github.com/Event-AHU/HARDVS.
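
A rough sketch of the project-encode-fuse-classify flow described for ESTF is shown below, assuming PyTorch; StemNet is stubbed as a linear projection and the token summarization is an assumption, so this is an illustration rather than the released model.

```python
# Rough sketch, assuming PyTorch, of a dual-view encode-fuse-classify flow;
# "StemNet" is stubbed as a linear projection here.
import torch
import torch.nn as nn

class DualViewHAR(nn.Module):
    def __init__(self, dim=64, n_classes=300):
        super().__init__()
        self.spatial_stem = nn.Linear(128, dim)   # stand-in for StemNet (spatial)
        self.temporal_stem = nn.Linear(128, dim)  # stand-in for StemNet (temporal)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(2 * dim, n_classes)

    def forward(self, spatial_tokens, temporal_tokens):
        tokens = torch.cat([self.spatial_stem(spatial_tokens),
                            self.temporal_stem(temporal_tokens)], dim=1)
        fused = self.encoder(tokens)              # joint encoding of both views
        s, t = fused[:, 0], fused[:, -1]          # one summary token per view
        return self.head(torch.cat([s, t], dim=-1))

model = DualViewHAR()
print(model(torch.randn(2, 8, 128), torch.randn(2, 8, 128)).shape)  # (2, 300)
```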

EAAI Journal 2024 Journal Article

Quadruple mention text-enhanced temporal knowledge graph reasoning

  • Lin Zhu
  • Wenjun Zhao
  • Luyi Bai

Most temporal knowledge graphs (TKGs) are incomplete, and TKG reasoning can complete the missing information. TKG reasoning can utilize various kinds of external text information, among which quadruple mention text is particularly important. Existing temporal knowledge graph reasoning models only utilize structural quadruple information to complete reasoning tasks, ignoring the rich semantic information in quadruple mention texts. In this paper, we propose a Quadruple Mention text-enhanced TKG reasoning model (QM-mod). It can utilize both graph structure information and quadruple mention texts to accomplish TKG reasoning tasks. Specifically, we extract quadruple mention text information from the NOW (News on the Web) corpus. Then, we conduct experiments on multiple tasks, including link prediction, quadruple classification, ablation experiments, and embedding dimension analysis. Experimental results show that our model has performance advantages on most metrics, with an average improvement of 1.89% in the link prediction experiment and 8.1% in the quadruple classification experiment.

AAAI Conference 2023 Conference Paper

Bayesian Cross-Modal Alignment Learning for Few-Shot Out-of-Distribution Generalization

  • Lin Zhu
  • Xinbing Wang
  • Chenghu Zhou
  • Nanyang Ye

Recent advances in large pre-trained models have shown promising results in few-shot learning. However, their generalization ability on two-dimensional Out-of-Distribution (OoD) data, i.e., correlation shift and diversity shift, has not been thoroughly investigated. Research has shown that even with a significant amount of training data, few methods can achieve better performance than the standard empirical risk minimization method (ERM) in OoD generalization. This few-shot OoD generalization dilemma emerges as a challenging direction in deep neural network generalization research, where performance suffers from overfitting on few-shot examples and OoD generalization errors. In this paper, leveraging a broader supervision source, we explore a novel Bayesian cross-modal image-text alignment learning method (Bayes-CAL) to address this issue. Specifically, the model is designed so that only text representations are fine-tuned, via a Bayesian modelling approach with gradient orthogonalization loss and invariant risk minimization (IRM) loss. The Bayesian approach is essentially introduced to avoid overfitting the base classes observed during training and to improve generalization to broader unseen classes. The dedicated loss is introduced to achieve better image-text alignment by disentangling the causal and non-causal parts of image features. Numerical experiments demonstrate that Bayes-CAL achieves state-of-the-art OoD generalization performance on two-dimensional distribution shifts. Moreover, compared with CLIP-like models, Bayes-CAL yields more stable generalization performance on unseen classes. Our code is available at https://github.com/LinLLLL/BayesCAL.
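
One plausible form of a gradient orthogonalization loss is sketched below, assuming PyTorch; the two-domain setup and the squared-cosine penalty are illustrative choices, not necessarily Bayes-CAL's exact formulation.

```python
# Sketch, assuming PyTorch: push per-domain loss gradients toward
# orthogonality; the cosine form is an assumption for illustration.
import torch
import torch.nn.functional as F

model = torch.nn.Linear(16, 2)

def domain_grad(x, y):
    loss = F.cross_entropy(model(x), y)
    return torch.cat([g.flatten() for g in
                      torch.autograd.grad(loss, model.parameters(), create_graph=True)])

g1 = domain_grad(torch.randn(8, 16), torch.randint(0, 2, (8,)))
g2 = domain_grad(torch.randn(8, 16), torch.randint(0, 2, (8,)))
ortho_loss = F.cosine_similarity(g1, g2, dim=0).pow(2)  # 0 when gradients are orthogonal
ortho_loss.backward()
print(float(ortho_loss))
```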

AAAI Conference 2023 Conference Paper

Certifiable Out-of-Distribution Generalization

  • Nanyang Ye
  • Lin Zhu
  • Jia Wang
  • Zhaoyu Zeng
  • Jiayao Shao
  • Chensheng Peng
  • Bikang Pan
  • Kaican Li

Machine learning methods suffer from test-time performance degeneration when faced with out-of-distribution (OoD) data whose distribution is not necessarily the same as the training data distribution. Although a plethora of algorithms have been proposed to mitigate this issue, it has been demonstrated that achieving better performance than ERM simultaneously on different types of distributional shift datasets is challenging for existing approaches. Moreover, without theoretical guarantees it is unknown how and to what extent these methods work on any given OoD datum. In this paper, we propose a certifiable out-of-distribution generalization method that provides provable OoD generalization performance guarantees via a functional optimization framework leveraging random distributions and max-margin learning for each input datum. With this approach, the proposed algorithmic scheme can provide certified accuracy for each input datum's prediction on the semantic space and achieve better performance simultaneously on OoD datasets dominated by correlation shifts or diversity shifts. Our code is available at https://github.com/ZlatanWilliams/StochasticDisturbanceLearning.

AAAI Conference 2022 Conference Paper

Retinomorphic Object Detection in Asynchronous Visual Streams

  • Jianing Li
  • Xiao Wang
  • Lin Zhu
  • Jia Li
  • Tiejun Huang
  • Yonghong Tian

Due to high-speed motion blur and challenging illumination, conventional frame-based cameras have encountered an important challenge in object detection tasks. Neuromorphic cameras, which output asynchronous visual streams instead of intensity frames, take advantage of high temporal resolution and high dynamic range and have brought a new perspective to addressing this challenge. In this paper, we propose a novel problem setting, retinomorphic object detection, which is the first trial that integrates foveal-like and peripheral-like visual streams. Technically, we first build a large-scale multimodal neuromorphic object detection dataset (i.e., PKU-Vidar-DVS) with over 215.5k spatio-temporally synchronized labels. Then, we design temporal aggregation representations to preserve the spatio-temporal information from asynchronous visual streams. Finally, we present a novel bio-inspired unifying framework to fuse the two sensing modalities via a dynamic interaction mechanism. Our experimental evaluation shows that our approach yields significant improvements over state-of-the-art single-modality methods, especially in high-speed motion and low-light scenarios. We hope that our work will attract further research into this newly identified, yet crucial research direction. Our dataset is available at https://www.pkuml.org/resources/pku-vidar-dvs.html.

TIST Journal 2021 Journal Article

VSumVis: Interactive Visual Understanding and Diagnosis of Video Summarization Model

  • Guodao Sun
  • Hao Wu
  • Lin Zhu
  • Chaoqing Xu
  • Haoran Liang
  • Binwei Xu
  • Ronghua Liang

With the rapid development of the mobile Internet, the popularity of video capture devices has brought a surge in multimedia video resources. Utilizing machine learning methods combined with well-designed features, we can automatically obtain video summarizations to alleviate video resource consumption and retrieval issues. However, there always exists a gap between the summarization produced by a model and the ones annotated by users. How to help users understand this difference, provide insights for improving the model, and enhance trust in the model remains challenging in current studies. To address these challenges, we propose VSumVis, a visual analysis system with multi-feature examination and multi-level exploration designed under a user-centered methodology, which helps users explore and analyze video content as well as the intrinsic relationships within our video summarization model. The system contains multiple coordinated views, i.e., a video view, projection view, detail view, and sequential frames view. A multi-level analysis process that integrates video events and frames is presented with cluster and node visualizations in our system. Temporal patterns concerning the difference between the manual annotation score and the saliency score produced by our model are further investigated and distinguished in the sequential frames view. Moreover, we propose a set of rich user interactions that enable an in-depth, multi-faceted analysis of the features in our video summarization model. We conduct case studies and interviews with domain experts to provide anecdotal evidence of the effectiveness of our approach. Quantitative feedback from a user study confirms the usefulness of our visual system for exploring the video summarization model.

AAAI Conference 2019 Conference Paper

A Domain Generalization Perspective on Listwise Context Modeling

  • Lin Zhu
  • Yihong Chen
  • Bowen He

As one of the most popular techniques for solving the ranking problem in information retrieval, Learning-to-rank (LETOR) has received a lot of attention both in academia and industry due to its importance in a wide variety of data mining applications. However, most existing LETOR approaches learn a single global ranking function to handle all queries, ignoring the substantial differences that exist between queries. In this paper, we propose a domain generalization strategy to tackle this problem. We propose Query-Invariant Listwise Context Modeling (QILCM), a novel neural architecture which eliminates the detrimental influence of inter-query variability by learning query-invariant latent representations, such that the ranking system generalizes better to unseen queries. We evaluate our technique on benchmark datasets, demonstrating that QILCM outperforms previous state-of-the-art approaches by a substantial margin.

IJCAI Conference 2019 Conference Paper

HDI-Forest: Highest Density Interval Regression Forest

  • Lin Zhu
  • Jiaxing Lu
  • Yihong Chen

By seeking the narrowest prediction intervals (PIs) that satisfy the specified coverage probability requirements, the recently proposed quality-based PI learning principle can extract high-quality PIs that better summarize the predictive certainty in regression tasks, and has been widely applied to solve many practical problems. Currently, the state-of-the-art quality-based PI estimation methods are based on deep neural networks or linear models. In this paper, we propose Highest Density Interval Regression Forest (HDI-Forest), a novel quality-based PI estimation method that is instead based on Random Forest. HDI-Forest does not require additional model training, and directly reuses the trees learned in a standard Random Forest model. By utilizing the special properties of Random Forest, HDI-Forest could efficiently and more directly optimize the PI quality metrics. Extensive experiments on benchmark datasets show that HDI-Forest significantly outperforms previous approaches, reducing the average PI width by over 20% while achieving the same or better coverage probability.
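
A simplified variant of this idea, sketched below with scikit-learn, treats each tree's prediction as a sample and scans for the narrowest window covering the target fraction; the paper's method reuses the trees' leaf statistics more directly, so this is an approximation for intuition only.

```python
# Minimal sketch (not the paper's algorithm): extract a highest-density
# prediction interval from a trained random forest, assuming scikit-learn.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X[:, 0] + 0.3 * rng.normal(size=500)
forest = RandomForestRegressor(n_estimators=200).fit(X, y)

def hdi(x, gamma=0.9):
    preds = np.sort([t.predict(x.reshape(1, -1))[0] for t in forest.estimators_])
    k = int(np.ceil(gamma * len(preds)))            # points the interval must cover
    widths = preds[k - 1:] - preds[:len(preds) - k + 1]
    i = widths.argmin()                             # narrowest window of k predictions
    return preds[i], preds[i + k - 1]

print(hdi(X[0]))
```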

AAAI Conference 2005 Conference Paper

Simultaneous Heuristic Search for Conjunctive Subgoals

  • Lin Zhu

We study the problem of building effective heuristics for achieving conjunctive goals from heuristics for individual goals. We consider a straightforward method for building conjunctive heuristics that smoothly trades off between previous common methods. In addition to first explicitly formulating the problem of designing conjunctive heuristics, our major contribution is the discovery that this straightforward method substantially outperforms previously used methods across a wide range of domains. Based on a single positive real parameter k, our heuristic measure sums the individual heuristic values for the subgoal conjuncts, each raised to the k-th power. Varying k allows loose approximation and combination of the previous min, max, and sum approaches, while mitigating some of the weaknesses in those approaches. Our empirical work shows that for many benchmark planning domains there exist fixed parameter values that perform well; we give evidence that these values can be found automatically by training. Our method, applied to top-level conjunctive goals, shows dramatic improvements over the heuristic used in the FF planner across a wide range of planning competition benchmarks. Also, our heuristic, without computing landmarks, consistently improves upon the success ratio of the recently published landmark-based planner FF-L.
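
The combination rule is simple enough to state directly in code; the sketch below is a direct reading of the description above (the subgoal values and choices of k are illustrative). Small k compares states roughly by the count of unsatisfied subgoals, k = 1 recovers the plain sum, and large k lets the largest subgoal value dominate, behaving like max.

```python
# Combination rule described above: sum each subgoal's heuristic value
# raised to the k-th power (k is a single positive real parameter).
def conjunctive_heuristic(subgoal_values, k):
    return sum(h ** k for h in subgoal_values)

h = [3.0, 1.0, 4.0]
for k in (0.1, 1.0, 8.0):
    print(k, conjunctive_heuristic(h, k))  # count-like, sum, max-dominated
```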