Author name cluster

Jie Lin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers

2 author rows

JBHI Journal 2026 Journal Article

A 6G-Enabled Hierarchical Contrastive Learning Framework for Multi-Scale Medical Time Series Analysis

Le Sun
Jie Lin
Zhiguo Qu
Yimin Yu
Jinliang Liu
Deepak Gupta
Yanchun Zhang

Medical time series analysis, particularly for electrocardiogram (ECG) and electroencephalogram (EEG) signals, is essential in modern diagnostics, supporting early detection of conditions such as arrhythmias and epileptic seizures. However, existing approaches often struggle to capture multi-scale periodic patterns and long-range dependencies while meeting real-time processing demands. The envisioned 6G networks, with their terahertz communication and integrated sensing and communication (ISAC) capabilities, will generate vast volumes of high-fidelity physiological data at the network edge. This paradigm shift intensifies the conflict between the computational complexity of advanced AI models and the limited resources of edge devices, creating a critical bottleneck for deploying sophisticated analytics in real-world healthcare scenarios. To overcome these limitations, this paper introduces a 6G-enabled hierarchical contrastive learning framework, referred to as Hierarchical Contrastive Learning for Multi-Scale Medical time series analysis (HCL-MSM), which integrates three core components: a signal-adaptive encoder based on multi-period decomposition and 2D convolution, a patient-level contrastive module enhanced with decomposable multi-scale mixing, and a 6G-edge deployment module optimized via quantization and pruning. The framework effectively models nested physiological rhythms and cross-time dependencies in medical data, while maintaining low-latency operation under resource-constrained edge environments. We evaluated HCL-MSM on multiple clinical datasets under simulated 6G settings. Our framework achieves significant gains in arrhythmia detection (F1-score: 86.39 percent), seizure prediction (Recall: 87.72 percent), and neurological monitoring (Recall: 87.8 percent), outperforming existing state-of-the-art methods.

EAAI Journal 2026 Journal Article

Bridging local and global representations: An inter- and intra-window based transformer for unsupervised depth completion

Tao Li
Xiucheng Dong
Jie Lin
Yonghong Peng

Depth perception plays an important role in the tasks of autonomous driving and robot navigation. Unsupervised depth completion can generate dense depth maps from sparse depth measurements and aligned color images without the supervision of ground-truth annotations. As both local detail recovery and global semantic representation are crucial for unsupervised depth completion, unlike existing methods that rely solely on convolutional architectures, we propose a Transformer architecture, called inter-window and intra-window Transformer network, to leverage the long-range semantic modeling capability of Transformers. To bridge accurate local detail learning and efficient global representation within the Transformer architecture, we decompose the self-attention mechanism into parallel inter-window and intra-window self-attention. Specifically, the inter-window self-attention captures dependencies among window statistics to achieve efficient global context modeling. Moreover, for intra-window self-attention, point-to-plane distances are derived from depth features to guide the distinction of relevant and irrelevant image feature regions. Local attention is then applied to the arbitrary-shaped relevant regions to extract more informative and accurate image features. Extensive experiments on standard datasets, including Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI), New York University Depth Dataset Version 2 (NYUv2), and Visual Odometry with Inertial and Depth (VOID), demonstrate the superiority of our model over existing unsupervised methods. Compared to the state-of-the-art method Calibrated Backprojection Network (KBNet), our model achieves a 6.4% improvement in Root Mean Squared Error (RMSE) on the outdoor KITTI dataset, a 6.8% improvement on the indoor NYUv2 dataset, and a 2.6% improvement on the indoor and outdoor VOID dataset.

AAAI Conference 2026 Conference Paper

CRAF: A Clinical Reasoning-Adaptive Framework via Reinforcement Learning for Similar Case Retrieval

Jie Lin
Lei Jiang
Zongyi Chen
Liansheng Wang

With the advancement of information retrieval (IR) technologies toward deep semantic understanding, reasoning-based methods—featuring explicit chain-of-thought generation—have demonstrated significant advantages in multi-hop and causal reasoning tasks. However, in complex clinical case retrieval scenarios, implicit reasoning cues within clinical data often hinder current models from effectively capturing deep semantic associations between queries and cases. Query rewriting and expansion techniques based on reasoning offer a promising solution to this challenge by uncovering and completing the latent clinical intent behind user queries, thereby enhancing semantic coverage and reasoning sensitivity. In this paper, we propose CRAF, a clinically adaptive reasoning framework tailored for similar case retrieval. Our method generates clinical reasoning paths and incorporates a fine-grained semantic reward mechanism, enabling efficient query rewriting through reinforcement learning. Experimental results on the PMC-Patients benchmark demonstrate that CRAF consistently delivers robust improvements across multiple retrieval tasks, achieving reasoning performance comparable to that of commercial models.

AAAI Conference 2025 Conference Paper

Aerodynamic Coefficients Prediction via Cross-Attention Fusion and Physical-Informed Training

Yueqing Wang
Peng Zhang
Yushuang Liu
Jianing Zhao
Jie Lin
Yi Chen

Aerodynamic coefficient prediction is pivotal in aircraft and vehicles' design, performance evaluation, and motion control. Integrating artificial neural networks into aerodynamic coefficient prediction offers a promising alternative to traditional numerical methods burdened by extensive computations and high costs. Nevertheless, this data-driven approach faces several critical challenges, which limit its further performance enhancement: i) The current research lacks a profound understanding of the complex interplay between the shape of an object and its aerodynamic characteristics. ii) The scarcity of high-quality aerodynamic data poses a significant barrier. The models trained on limited datasets lack generalization ability, struggling to accurately predict and adapt to diverse aerodynamic performance under new shapes or conditions. To overcome these challenges, we introduce an innovative framework that employs cross-attention to capture the intimate interplay between shape and flow conditions and allows for the direct utilization of pre-trained models on general shape datasets to mitigate the scarcity of aerodynamic data. Furthermore, to bolster the inference capabilities of this data-driven approach, we integrate physical information constraints into the model, leveraging them as guiding principles to enhance the model's predictive power under unknown conditions. Experimental validation demonstrates that our proposed method performs excellently in multiple aerodynamic prediction tasks. This achievement brings a new technological breakthrough to the field of aerodynamic prediction and provides robust support for the design optimization of complex systems such as aircraft and vehicles.

ICRA Conference 2025 Conference Paper

Uncertainty-Aware Probabilistic Risk Quantification of SOTIF for Autonomous Vehicles

Botao Yao
Shuohan Huang
Chuanyi Liu
Peiyi Han
Jie Lin
Shaoming Duan

Ensuring the Safety of the Intended Functionality (SOTIF) for autonomous vehicles (AVs) is critical. Effective risk assessment helps AVs make decisions and avoid risks. However, existing methods face challenges due to environmental uncertainties, insufficient multi-dimensional risk quantification, and limited predictive accuracy. To address these challenges, we propose an uncertainty-aware probabilistic risk assessment framework that quantifies the risk of AVs violating safety constraints and calculates the expected average severity of such violations in uncertain environments. We first establish a general SOTIF risk model to characterize the static risk of the AV and surrounding traffic participants. Following this, we introduce a method for predicting dynamic uncertainty risks, resulting in probabilistic risk quantification. This framework accounts for multi-dimensional uncertainties and enhances safety under dynamic conditions. Extensive evaluations across typical traffic scenarios, including highways, intersections, and roundabouts, demonstrate that our method outperforms typical algorithms like Time Headway (THW) and Time-to-Collision (TTC). Empirical studies in extreme scenarios further validate the framework's ability to reduce risks and improve system generalization. The related code is available at: https://github.com/idslab-autosec/risk_uncertainty.

ICLR Conference 2024 Conference Paper

A Framework for Inference Inspired by Human Memory Mechanisms

Xiangyu Zeng
Jie Lin
Piao Hu
Ruizheng Huang
Zhicheng Zhang

How humans and machines make sense of current inputs for relation reasoning and question-answering, while putting the perceived information into the context of past memories, has been a challenging conundrum in cognitive science and artificial intelligence. Inspired by the human brain's memory system and cognitive architectures, we propose a PMI framework that consists of perception, memory and inference components. Notably, the memory module comprises working and long-term memory, with the latter endowed with a higher-order structure to retain extensive and complex relational knowledge and experience. Through a differentiable competitive write access, current perceptions update working memory, which is later merged with long-term memory via outer product associations, reducing information conflicts and averting memory overflow. In the inference module, relevant information is retrieved from two separate memory origins and associatively integrated to attain a more comprehensive and precise interpretation of current perceptions. We exploratively apply our PMI to improve prevailing Transformers and CNN models on question-answering tasks like bAbI-20k and Sort-of-CLEVR datasets, as well as detecting equilateral triangles, language modeling and image classification tasks, and in each case, our PMI enhancements consistently outshine their original counterparts significantly. Visualization analyses reveal that relational memory consolidation, along with the interaction and integration of information from diverse memory sources, substantially contributes to the model effectiveness on inference tasks.

EAAI Journal 2024 Journal Article

A hypervolume fraction-based adaptive evolutionary algorithm for many-objective optimization and the application to electromagnetic device design

Jie Lin
Sheng Xin Zhang
Yi Jiao Xu
Shao Yong Zheng

Performance of many-objective evolutionary algorithms (MaOEAs) heavily depends on the environmental selection strategy which determines the offspring for next generations. One kind of selection strategy may only suit certain kinds of optimization problems. Moreover, one single strategy might not always work well at different evolutionary stages. To adaptively adjust the environmental selection strategy, this paper proposes a hypervolume fraction-based adaptive evolutionary algorithm (HFAEA). First, a hypervolume fraction-based estimation method is proposed to address the difficulty in detecting the feature of the Pareto front. It calculates the ratio of the hypervolume of population coverage to the hypervolume of coordinate axis coverage. With a small or large hypervolume fraction, the Pareto front is regarded as irregular or regular respectively, and an adaptive switching strategy selects a proposed vector angle-based strategy or an improved reference vector-based strategy accordingly. HFAEA is compared with five state-of-the-art algorithms on 24 problems with a large hypervolume fraction and 24 problems with a small hypervolume fraction. Experimental results show that HFAEA is the most competitive in handling different kinds of problems. It outperforms algorithms that are designed for irregular problems as well as algorithms that use uniformly distributed reference vectors in irregular problems. These findings highlight the effectiveness of the proposed hypervolume fraction-based estimation method. The superior performance is also demonstrated in two electromagnetic device optimization problems, including the designs of a compact single-layer Butler matrix and a broadband filtering power divider, where better results than the original ones are achieved and HFAEA also outperforms state-of-the-art MaOEAs.

ICRA Conference 2024 Conference Paper

A Track-based Colon Endoscopic Robot with Depth Perception Stereo Cameras for Haustral Fold Detection during Colonic Navigation

Shujing He
Yujie Zhang
Baoyi Huang
Jie Lin
Chaoyang Shi
Chengzhi Hu

Colon endoscopic robots represent a promising screening modality for the visualization of colon cancers with high sensitivity. However, current colonoscopy robots are often characterized by intricate and bulky mechanical structures, which pose practical challenges when moving through the complex and narrow environment of the colon. Moreover, these robots are typically equipped with a single camera, limiting their ability to accurately estimate the depth of haustral folds in the colon, which is of great importance for the active colonic navigation of the robots. To address these challenges, we develop a track-based stereoscopic endoscopic robot (TSER) which is equipped with four tracks positioned at the corners of its body. This innovative design maximizes the contact between the tracks and the colon wall, enhancing maneuverability. The tracks are constructed from de-molded polydimethylsiloxane (PDMS) and incorporate micro-patterns on their outer surfaces. We have proposed a straightforward strategy for detecting haustral folds using TSER’s stereo camera, which allows for precise identification of their position and depth. The TSER achieves an average motion speed of 9.8 mm/s in a bellows tube that contains silicone oil and a speed of 5.2 mm/s in an ex vivo porcine intestinal segment. Impressively, the TSER boasts an 88.11% accuracy rate in haustral fold depth estimation, surpassing the performance of existing geometric shape fitting methods. These results demonstrate that the TSER holds great potential for effective and efficient movement and inspection within the colon, offering a promising solution for improved colon cancer screening.

EAAI Journal 2023 Journal Article

Locally informed gravitational search algorithm with hierarchical topological structure

Leyi Xiao
Chaodong Fan
Zhaoyang Ai
Jie Lin

Recently, gravitational search algorithm (GSA) has been successfully applied to solve various optimization problems. However, GSA tends to fall into local optimum because it ignores the environmental heterogeneity of agents. Therefore, locally informed gravitational search algorithm (LIGSA) based on neighborhood structure is proposed to balance exploration and exploitation. However, LIGSA ignores the differences in the evolutionary states between agents in the same neighborhood, and also ignores the differences in the evolutionary states between neighbors, which affects the performance of the algorithm. Therefore, a hierarchical locally informed gravitational search algorithm (HLIGSA) is proposed in this paper. This algorithm designs a hierarchical topology. In the lower layer, there are several non-overlapping neighborhoods, which constitute the whole population. Each agent in the neighborhood adaptively adjusts the gravitational constant according to its evolutionary state, so as to fully search the region of the neighborhood. The upper layer is a population composed of the best agents in each neighborhood, which performs gravity strategy or merging strategy for those neighborhoods that have lost the evolutionary capability, so as to balance the algorithm’s exploration and exploitation. The two layers work together to complete the entire search process. Experimental results show that HLIGSA outperforms many variants of GSA and many current state-of-the-art heuristic algorithms.

AAAI Conference 2021 Short Paper

Information Block Detection in Infographic Based on Spatial Proximity and Structural Similarity (Student Abstract)

Jie Lin
Xin Wu
Jianwei Lu
Yi Cai

The infographic is a type of visualization chart used to display information. Existing infographic understanding works utilize spatial proximity to group elements into information blocks. However, these works ignore structural features such as background color and boundary, which results in poor performance towards complex infographics. We propose a Spatial and Structural Feature Extraction model to group elements based on spatial proximity and structural similarity. We introduce a new dataset for information block detection. Experiments show that our model can effectively identify the information blocks in the infographic.

AAAI Conference 2021 Conference Paper

OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization

Peng Hu
Xi Peng
Hongyuan Zhu
Mohamed M. Sabry Aly
Jie Lin

As Deep Neural Networks (DNNs) usually are overparameterized and have millions of weight parameters, it is challenging to deploy these large DNN models on resource-constrained hardware platforms, e.g., smartphones. Numerous network compression methods such as pruning and quantization are proposed to reduce the model size significantly, of which the key is to find suitable compression allocation (e.g., pruning sparsity and quantization codebook) of each layer. Existing solutions obtain the compression allocation in an iterative/manual fashion while finetuning the compressed model, thus suffering from the efficiency issue. Different from the prior art, we propose a novel One-shot Pruning-Quantization (OPQ) in this paper, which analytically solves the compression allocation with pre-trained weight parameters only. During finetuning, the compression module is fixed and only weight parameters are updated. To our knowledge, OPQ is the first work that reveals that a pre-trained model is sufficient for solving pruning and quantization simultaneously, without any complex iterative/manual optimization at the finetuning stage. Furthermore, we propose a unified channel-wise quantization method that enforces all channels of each layer to share a common codebook, which leads to low bit-rate allocation without introducing extra overhead brought by traditional channel-wise quantization. Comprehensive experiments on ImageNet with AlexNet/MobileNet-V1/ResNet-50 show that our method improves accuracy and training efficiency while obtaining significantly higher compression rates compared to the state-of-the-art.

YNICL Journal 2021 Journal Article

Reply to Josef Finsterer’s letter referring to “Connectivity on fMRI in the MELAS brain may strongly depend on heteroplasmy and extension or dynamics of stroke-like lesions”

Rong Wang
Jie Lin
Yuxin Li
Liqin Yang

AAAI Conference 2020 Conference Paper

Semi-Supervised Multi-Modal Learning with Balanced Spectral Decomposition

Peng Hu
Hongyuan Zhu
Xi Peng
Jie Lin

Cross-modal retrieval aims to retrieve the relevant samples across different modalities, of which the key problem is how to model the correlations among different modalities while narrowing the large heterogeneous gap. In this paper, we propose a Semi-supervised Multimodal Learning Network method (SMLN) which correlates different modalities by capturing the intrinsic structure and discriminative correlation of the multimedia data. To be specific, the labeled and unlabeled data are used to construct a similarity matrix which integrates the cross-modal correlation, discrimination, and intra-modal graph information existing in the multimedia data. What is more important is that we propose a novel optimization approach to optimize our loss within a neural network which involves a spectral decomposition problem derived from a ratio trace criterion. Our optimization enjoys two advantages given below. On the one hand, the proposed approach is not limited to our loss, which could be applied to any case that is a neural network with the ratio trace criterion. On the other hand, the proposed optimization is different from existing ones which alternatively maximize the minor eigenvalues, thus overemphasizing the minor eigenvalues and ignoring the dominant ones. In contrast, our method will exactly balance all eigenvalues, thus being more competitive than existing methods. Thanks to our loss and optimization strategy, our method could well preserve the discriminative and instinct information into the common space and embrace the scalability in handling large-scale multimedia data. To verify the effectiveness of the proposed method, extensive experiments are carried out on three widely-used multimodal datasets in comparison with 13 state-of-the-art approaches.

AAAI Conference 2020 Conference Paper

Spatial-Temporal Gaussian Scale Mixture Modeling for Foreground Estimation

Qian Ning
Weisheng Dong
Fangfang Wu
Jinjian Wu
Jie Lin
Guangming Shi

Subtracting the backgrounds from the video frames is an important step for many video analysis applications. Assuming that the backgrounds are low-rank and the foregrounds are sparse, the robust principal component analysis (RPCA)-based methods have shown promising results. However, the RPCA-based methods suffered from the scale issue, i.e., the ℓ1-sparsity regularizer fails to model the varying sparsity of the moving objects. While several efforts have been made to address this issue with advanced sparse models, previous methods cannot fully exploit the spatial-temporal correlations among the foregrounds. In this paper, we proposed a novel spatial-temporal Gaussian scale mixture (STGSM) model for foreground estimation. In the proposed STGSM model, a temporal consistent constraint is imposed over the estimated foregrounds through nonzero-means Gaussian models. Specifically, the estimates of the foregrounds obtained in the previous frame are used as the prior for those of the current frame, and nonzero-means Gaussian scale mixture models (GSM) are developed. To better characterize the temporal correlations, the optical flow has been used to model the correspondences between foreground pixels in adjacent frames. The spatial correlations have also been exploited by considering that local correlated pixels should be characterized by the same STGSM model, leading to further performance improvements. Experimental results on real video datasets show that the proposed method performs comparably or even better than current state-of-the-art background subtraction methods.

YNICL Journal 2020 Journal Article

Topological reorganization of brain functional networks in patients with mitochondrial encephalomyopathy with lactic acidosis and stroke‐like episodes

Rong Wang
Jie Lin
Chong Sun
Bin Hu
Xueling Liu
Daoying Geng
Yuxin Li
Liqin Yang

Mitochondrial encephalomyopathy with lactic acidosis and stroke-like episodes (MELAS) is a rare maternally inherited genetic disease; however, little is known about its underlying brain basis. Furthermore, the topological organization of brain functional network in MELAS has not been explored. Here, 45 patients with MELAS (22 at acute stage, 23 at chronic stage) and 22 normal controls were studied using resting-state functional magnetic resonance imaging and graph theory analysis approaches. Topological properties of brain functional networks including global and nodal metrics, rich club organization and modularity were analyzed. At the global level, MELAS patients exhibited reduced clustering coefficient, normalized clustering coefficient, normalized characteristic path length and local network efficiency compared with the controls. At the nodal level, several nodes with abnormal degree centrality and nodal efficiency were detected in MELAS patients, and the distribution of these nodes was partly consistent with the stroke-like lesions. For rich club organization, rich club nodes were reorganized and the connections among them were decreased in MELAS patients. Modularity analysis revealed that MELAS patients had altered intra- or inter-modular connections in default mode network, fronto-parietal network, sensorimotor network, occipital network and cerebellum network. Notably, the patients at acute stage showed more obvious changes in these topological properties than the patients at chronic stage. These findings indicated that MELAS patients, particularly those at acute stage, exhibited topological reorganization of the whole-brain functional network. This study may help us to understand the neuropathological mechanisms of MELAS.

IJCAI Conference 2017 Conference Paper

Discriminative Deep Hashing for Scalable Face Image Retrieval

Jie Lin
Zechao Li
Jinhui Tang

With the explosive growth of images containing faces, scalable face image retrieval has attracted increasing attention. Due to its effectiveness, deep hashing has become a popular hashing method recently. In this work, we propose a new Discriminative Deep Hashing (DDH) network to learn discriminative and compact hash codes for large-scale face image retrieval. The proposed network incorporates the end-to-end learning, the divide-and-encode module and the desired discrete code learning into a unified framework. Specifically, a network with a stack of convolution-pooling layers is proposed to extract multi-scale and robust features by merging the outputs of the third max pooling layer and the fourth convolutional layer. To reduce the redundancy among hash codes and the network parameters simultaneously, a divide-and-encode module is used to generate compact hash codes. Moreover, a loss function is introduced to minimize the prediction errors of the learned hash codes, which can lead to discriminative hash codes. Extensive experiments on two datasets demonstrate that the proposed method achieves superior performance compared with some state-of-the-art hashing methods.

IJCAI Conference 2017 Conference Paper

Object Detection Meets Knowledge Graphs

Yuan Fang
Kingsley Kuan
Jie Lin
Cheston Tan
Vijay Chandrasekhar

Object detection in images is a crucial task in computer vision, with important applications ranging from security surveillance to autonomous vehicles. Existing state-of-the-art algorithms, including deep neural networks, only focus on utilizing features within an image itself, largely neglecting the vast amount of background knowledge about the real world. In this paper, we propose a novel framework of knowledge-aware object detection, which enables the integration of external knowledge such as knowledge graphs into any object detection algorithm. The framework employs the notion of semantic consistency to quantify and generalize knowledge, which improves object detection through a re-optimization process to achieve better consistency with background knowledge. Finally, empirical evaluation on two benchmark datasets shows that our approach can significantly increase recall by up to 6.3 points without compromising mean average precision, when compared to the state-of-the-art baseline.

IJCAI Conference 2015 Conference Paper

Hamming Compatible Quantization for Hashing

Zhe Wang
Ling-Yu Duan
Jie Lin
Xiaofang Wang
Tiejun Huang
Wen Gao

Hashing is one of the effective techniques for fast Approximate Nearest Neighbour (ANN) search. Traditional single-bit quantization (SBQ) in most hashing methods incurs lots of quantization error which seriously degrades the search performance. To address the limitation of SBQ, researchers have proposed promising multi-bit quantization (MBQ) methods to quantize each projection dimension with multiple bits. However, some MBQ methods need to adopt specific distance for binary code matching instead of the original Hamming distance, which would significantly decrease the retrieval speed. Two typical MBQ methods Hierarchical Quantization and Double Bit Quantization retain the Hamming distance, but both of them only consider the projection dimensions during quantization, ignoring the neighborhood structure of raw data inherent in Euclidean space. In this paper, we propose a multi-bit quantization method named Hamming Compatible Quantization (HCQ) to preserve the capability of similarity metric between Euclidean space and Hamming space by utilizing the neighborhood structure of raw data. Extensive experiment results have shown our approach significantly improves the performance of various state-of-the-art hashing methods while maintaining fast retrieval speed.
