Author name cluster

He Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

36 papers

2 author rows

TAAS Journal 2026 Journal Article

Adaptive Scheduling of Multimodal Large Language Model in Intelligent Edge Computing

Xingyu Yuan
He Li
Mianxiong Dong
Kaoru Ota

Multimodal Large Language Models (MLLMs) integrate multimodal encoders with Large Language Models (LLMs) to overcome the limitations of text-only models. Traditional LLMs are deployed on high-performance cloud servers, but MLLMs, which process multimodal data, face high transmission latency and privacy risks when tasks are offloaded to the cloud. Intelligent edge computing is a promising solution for supporting such latency-sensitive and privacy-sensitive tasks. However, the heterogeneity of edge environments makes efficient MLLM inference challenging. In this work, we enhance MLLM inference efficiency in heterogeneous edge environments by decoupling MLLM into LLM and multimodal encoders, deploying the LLM on high-performance devices and the multimodal encoders on lower-capability devices. Additionally, we observe that processing MLLM tasks in edge environments involves numerous configuration parameters that impact inference speed and energy consumption in an unknown and possibly time-varying fashion. To address this challenge, we present an adaptive scheduling algorithm that assigns parameters to tasks to minimize energy consumption while meeting maximum latency constraints. The results of extensive experimental trials demonstrate that the proposed approach consistently outperforms existing state-of-the-art methods, achieving significant improvements in both latency reduction and energy efficiency.

AAAI Conference 2026 Conference Paper

ARGH-Mark: Anchor-Synchronized Watermarking with Hamming Correction for Robust and Quality-Preserving LLM Attribution

He Li
Xiaojun Chen
Jingcheng He
Zhendong Zhao
Shuguang Yuan
Xin Zhao
Yunfei Yang

The proliferation of large language models has intensified demands for reliable content attribution, yet existing watermarking techniques face a fundamental trilemma: they cannot simultaneously optimize for robustness against attacks, minimal text quality degradation, and detection efficiency. To resolve this challenge, we propose ARGH-Mark, a novel watermarking framework that integrates three synergistic innovations: (1) Anchor-synchronized phase recovery for maintaining detection integrity under insertion/deletion attacks, (2) RG-balanced vocabulary modulation that dynamically partitions lexicons via contextual hashing to preserve generation quality, and (3) Hamming-based error correction enabling single-bit error rectification through algebraic coding. Comprehensive evaluations across question answering (ELI5), summarization (CNN/DailyMail), and text generation (C4) demonstrate state-of-the-art performance: the proposed ARGH-Mark framework achieves near-perfect match rate and bit accuracy across diverse configurations, while preserving the quality of the generated text. It significantly reduces detection latency, enabling real-time extraction, and maintains high robustness against token tampering attacks through integrated Hamming error correction, ensuring reliable attribution in adversarial settings. ARGH-Mark achieves a new Pareto frontier in the watermarking design space and advances trustworthy deployment of generative AI in alignment-critical applications.
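The single-bit error rectification mentioned in the abstract above can be illustrated with a textbook Hamming code. The sketch below assumes the classic Hamming(7,4) layout; the abstract does not state which code parameters ARGH-Mark uses, and the function names here are hypothetical, so this is an illustration of the general technique rather than the authors' implementation.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit Hamming(7,4) codeword.

    Codeword layout (1-indexed positions): p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit and return (data_bits, error_position).

    error_position is 0 when no error was detected, otherwise the
    1-indexed position of the bit that was corrected.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # syndrome read as a binary number
    if error_position:
        c[error_position - 1] ^= 1  # flip the corrupted bit back
    return [c[2], c[4], c[5], c[6]], error_position


if __name__ == "__main__":
    message = [1, 0, 1, 1]
    sent = hamming74_encode(message)
    received = list(sent)
    received[4] ^= 1  # simulate one tampered bit
    print(hamming74_decode(received))
```

A code of this kind repairs one flipped bit per 7-bit block; two or more flips in the same block are miscorrected, which is why it is usually described as single-error-correcting.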

EAAI Journal 2026 Journal Article

Automatic classification of circulating blood cell clusters based on multi-channel flow cytometry imaging

Suqiang Ma
Subhadeep Sengupta
Yao Lee
Beikang Gu
Xianyan Chen
Xianqiao Wang
Yang Liu
Mengjia Xu

Circulating blood cell clusters (CCCs) containing red blood cells (RBCs), white blood cells (WBCs), and platelets are significant biomarkers linked to pathological conditions like thrombosis, infection, and inflammation. Flow cytometry, paired with fluorescence staining, is commonly used to analyze these cell clusters, revealing cell morphology and protein profiles. While computational approaches based on machine learning have advanced the automatic analysis of single-cell flow cytometry images, there is a lack of effort to build tools to automatically analyze images containing CCCs. Unlike single cells, cell clusters exhibit irregular shapes and sizes. In addition, these cell clusters often consist of heterogeneous cell types, which require multi-channel staining to identify the specific cell types within the clusters. To address these challenges, we introduce a new computational framework for analyzing CCC images and identifying cell types within clusters. Our framework uses a two-step analysis strategy. First, it categorizes images into cell cluster and non-cluster groups by fine-tuning the You Only Look Once (YOLOv11) model, which outperforms both traditional convolutional neural networks (CNNs) and Vision Transformers (ViT). Then, it identifies cell types by overlaying cluster contours with regions from multi-channel fluorescence stains, thereby minimizing the impact of cell debris and staining artifacts. This approach achieved over 95% accuracy in both cluster classification and cell phenotype identification. In summary, our automated framework effectively analyzes CCC images from flow cytometry, leveraging both bright-field and fluorescence data. Initially tested on blood cells, it holds potential for broader applications, such as analyzing immune and tumor cell clusters, supporting cellular research across various diseases.

AAAI Conference 2026 Conference Paper

DeepTracer: Tracing Stolen Model via Deep Coupled Watermarks

Yunfei Yang
Xiaojun Chen
Yuexin Xuan
Zhendong Zhao
Xin Zhao
He Li

Model watermarking techniques can embed watermark information into the protected model for ownership declaration by constructing specific input-output pairs. However, existing watermarks are easily removed by model stealing attacks, making it difficult for model owners to effectively verify the copyright of stolen models. In this paper, we analyze the root cause of the failure of current watermarking methods under model stealing scenarios and then explore potential solutions. Specifically, we introduce a robust watermarking framework, DeepTracer, which leverages a novel watermark sample construction method and a same-class coupling loss constraint. DeepTracer induces a tight coupling between the watermark task and the primary task, so that adversaries inevitably learn the hidden watermark task when stealing the primary task functionality. Furthermore, we propose an effective watermark sample filtering mechanism that carefully selects the watermark key samples used in model ownership verification to enhance the reliability of watermarks. Extensive experiments across multiple datasets and models demonstrate that our method surpasses existing approaches in defending against various model stealing attacks, as well as watermark attacks, and achieves new state-of-the-art effectiveness and robustness.

JBHI Journal 2025 Journal Article

A Home-based Dual-mode Upper Limb Rehabilitation System: Teleoperation Mode and Bilateral Mode with sEMG and IMU

He Li
Shuxiang Guo
Ruijie He
Bin Wang
Mingchao Ding

Upper limb hemiplegia is a common functional disorder among stroke patients, significantly affecting their quality of life. To address this issue, robot-assisted upper limb rehabilitation training has emerged as a new therapeutic approach, breaking through time and space limitations of traditional rehabilitation. Based on the above, a home-based dual-mode upper limb rehabilitation system is built, including teleoperation mode based on a cloud server and bilateral mode with fusion of Surface Electromyography (sEMG) and Inertial Measurement Unit (IMU). In the telerehabilitation mode, patients can receive professional guidance and regular training at home, greatly enhancing the accessibility of rehabilitation services. The experiments with the master side in Beijing City (China) and the slave side in three different cities are conducted through a cloud server. The slave side is controlled by the master side, and the contact force is sent back to the master side. In the bilateral mode, the intention of continuous movements across subjects can be accurately predicted via the fusion of sEMG and IMU, improving the naturalness of human-robot interaction. In the subject-independent modeling, the Root Mean Square Error (RMSE) under fusion showed a relative decrease of 15.0329% (p < 10^-4) compared to IMU data alone, and a significantly greater reduction of 61.9376% (p < 10^-4) in comparison with sEMG data alone. Robot-assisted upper limb exoskeleton, cloud-based teleoperation and bilateral training based on sEMG and IMU collectively form a new rehabilitation system, representing part of the future rehabilitation trend.

NeurIPS Conference 2025 Conference Paper

DKDR: Dynamic Knowledge Distillation for Reliability in Federated Learning

Yueyang Yuan
Wenke Huang
Frank Wan
Kaiqi Guan
He Li
Mang Ye

Federated Learning (FL) has demonstrated a promising future in privacy-friendly collaboration but it faces the data heterogeneity problem. Knowledge Distillation (KD) can serve as an effective method to address this issue. However, challenges arise from the unreliability of existing distillation methods in multi-domain scenarios. Prevalent distillation solutions primarily aim to fit the distributions of the global model directly by minimizing forward Kullback-Leibler divergence (KLD). This results in significant bias when the outputs of the global model are multi-peaked, which indicates the unreliability of the distillation pathway. Meanwhile, cross-domain update conflicts can notably reduce the accuracy of the global model (teacher model) in certain domains, reflecting the unreliability of the teacher model in these domains. In this work, we propose DKDR (Dynamic Knowledge Distillation for Reliability in Federated Learning), which dynamically assigns weights to forward and reverse KLD based on knowledge discrepancies. This enables clients to fit the outputs from the teacher precisely. Moreover, we use knowledge decoupling to identify domain experts, thus clients can acquire reliable domain knowledge from experts. Empirical results from single-domain and multi-domain image classification tasks demonstrate the effectiveness of the proposed method and the efficiency of its key modules. The code is available at https://github.com/YueyangYuan/DKDR.
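The contrast between forward and reverse KLD in the abstract above can be made concrete with a small numeric sketch. The mixing weight `alpha` and the function names below are hypothetical: the abstract says only that the weights are assigned dynamically from knowledge discrepancies, and that rule is not reproduced here.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support.

    Terms with p_i == 0 contribute nothing; q_i is assumed positive
    wherever p_i is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mixed_kld(teacher, student, alpha):
    """Weighted sum of forward and reverse KLD.

    alpha is a hypothetical fixed weight in [0, 1]; a dynamic scheme
    would compute it per sample instead of taking it as an argument.
    """
    forward = kl_divergence(teacher, student)  # KL(teacher || student)
    reverse = kl_divergence(student, teacher)  # KL(student || teacher)
    return alpha * forward + (1.0 - alpha) * reverse


if __name__ == "__main__":
    teacher = [0.49, 0.02, 0.49]  # multi-peaked teacher output
    student = [0.90, 0.05, 0.05]  # student concentrated on one peak
    print("forward:", kl_divergence(teacher, student))
    print("reverse:", kl_divergence(student, teacher))
    print("mixed  :", mixed_kld(teacher, student, alpha=0.5))
```

Forward KLD penalizes the student heavily for missing any teacher peak, while reverse KLD tolerates a student that commits to a single peak, which is the usual motivation for combining the two.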

IJCAI Conference 2025 Conference Paper

Horae: A Domain-Agnostic Language for Automated Service Regulation

Yutao Sun
Mingshuai Chen
Tiancheng Zhao
Kangjia Zhao
He Li
Jintao Chen
Zhongyi Wang
Liqiang Lu

Artificial intelligence is rapidly encroaching on the field of service regulation. However, existing AI-based regulation techniques are often tailored to specific application domains and thus are difficult to generalize in an automated manner. This paper presents Horae, a unified specification language for modeling (multimodal) regulation rules across a diverse set of domains. We showcase how Horae facilitates an intelligent service regulation pipeline by further exploiting a fine-tuned large language model named RuleGPT that automates the Horae modeling process, thereby yielding an end-to-end framework for fully automated intelligent service regulation. The feasibility and effectiveness of our framework are demonstrated over a benchmark of various real-world regulation domains. In particular, we show that our open-sourced, fine-tuned RuleGPT with 7B parameters suffices to outperform GPT-3.5 and perform on par with GPT-4o.

NeurIPS Conference 2025 Conference Paper

MoodAngels: A Retrieval-augmented Multi-agent Framework for Psychiatry Diagnosis

Mengxi Xiao
Ben Liu
He Li
Jimin Huang
Qianqian Xie
Xiaofen Zong
Mang Ye
Min Peng

The application of AI in psychiatric diagnosis faces significant challenges, including the subjective nature of mental health assessments, symptom overlap across disorders, and privacy constraints limiting data availability. To address these issues, we present MoodAngels, the first specialized multi-agent framework for mood disorder diagnosis. Our approach combines granular-scale analysis of clinical assessments with a structured verification process, enabling more accurate interpretation of complex psychiatric data. Complementing this framework, we introduce MoodSyn, an open-source dataset of 1,173 synthetic psychiatric cases that preserves clinical validity while ensuring patient privacy. Experimental results demonstrate that MoodAngels outperforms conventional methods, with our baseline agent achieving 12.3% higher accuracy than GPT-4o on real-world cases, and our full multi-agent system delivering further improvements. Together, these contributions provide both an advanced diagnostic tool and a critical research resource for computational psychiatry, bridging important gaps in AI-assisted mental health assessment.

EAAI Journal 2025 Journal Article

Multi-modal multi-level feature representation learning for flow pattern identification of oil-water two-phase flow

Weihang Kong
Yaohan Chi
He Liu
Hongbao Tang
He Li

To tackle the difficulty that traditional experimental methods have in monitoring and identifying the flow process in real time, this paper introduces a novel flow pattern identification method for vertical oil-water two-phase flow based on multi-modal multi-level feature representation. The one-dimensional electromagnetic signals are encoded into two-dimensional feature spaces to explore their structural complexity, evolutionary probability laws and nonlinear characteristics in the multi-modal domain, thereby generating a multi-modal representation of the electromagnetic signals. Subsequently, a multi-modal multi-level feature fusion network is developed for flow pattern identification, which flexibly leverages effective information across different modalities and levels, thereby enhancing the identification accuracy. Experimental results demonstrate that the proposed method achieves high accuracy on the constructed multi-modal dataset, proving its feasibility and effectiveness in identifying the flow pattern of oil-water two-phase flow in vertical pipes.

AAAI Conference 2025 Conference Paper

NightReID: A Large-Scale Nighttime Person Re-Identification Benchmark

Yuxuan Zhao
Weijian Ruan
He Li
Mang Ye

Person re-identification (Re-ID) is crucial for intelligent surveillance systems, facilitating the identification of individuals across multiple camera views. While significant advancements have been made for daytime scenarios, ensuring reliable Re-ID performance during nighttime remains a significant challenge. Given the cost and limited accessibility of infrared cameras, we investigate a critical question: Can RGB cameras be effectively utilized for accurate Re-ID during nighttime? To address this, we introduce NightReID, a large-scale RGB Re-ID dataset collected from a real-world nighttime surveillance system. NightReID includes 1,500 identities and over 53,000 images, capturing diverse scenes with complex lighting and adverse weather conditions. This rich dataset provides a valuable benchmark for advancing nighttime Re-ID research. Moreover, we propose the Enhancement, Denoising, and Alignment (EDA) framework with two novel modules to enhance nighttime Re-ID performance. First, an unsupervised Image Enhancement and Denoising (IED) method is designed to improve the quality of nighttime images, preserving critical details while removing noise without requiring paired ground truth. Second, we introduce Data Distribution Alignment (DDA) through statistical priors, aligning the distributions between pre-training data and nighttime data to mitigate domain shift. Extensive experiments on multiple nighttime Re-ID datasets demonstrate the significance of NightReID and validate the efficacy, flexibility, and applicability of the EDA framework.

IJCAI Conference 2025 Conference Paper

Pixel-wise Divide and Conquer for Federated Vessel Segmentation

Tian Chen
Wenke Huang
Zhihao Wang
Zekun Shi
He Li
Wenhui Dong
Mang Ye
Bo Du

Accurate vessel segmentation is essential for diagnosing and managing vascular and ophthalmic diseases. Traditional learning-based vessel segmentation methods heavily rely on high-quality, pixel-level annotated datasets. However, segmentation performance suffers significantly when applied in federated learning settings due to vessel morphology inconsistency and vessel-background imbalance. The former limits the ability of models to capture fine-grained vessels, while the latter overemphasizes background pixels and biases the model towards them. To address these challenges, we propose a novel method named Federated Vessel-Aware Calibration (FVAC), which leverages global uncertainty to provide differentiated guidance for clients, focusing on pixels of various morphologies that are difficult to distinguish. Furthermore, we introduce a foreground-background decoupling alignment strategy that utilizes more stable and balanced global features to mitigate semantic drift caused by vessel-background imbalance in local clients. Comprehensive experiments confirm the effectiveness of our method.

IJCAI Conference 2025 Conference Paper

Prototype-guided Knowledge Propagation with Adaptive Learning for Lifelong Person Re-identification

Zhijie Lu
Wuxuan Shi
He Li
Mang Ye

Lifelong Person Re-identification (LReID) is essential in dynamic camera networks, which continually adapts to new environments while preserving previously acquired knowledge. Existing LReID techniques often preserve samples from past datasets to maintain old knowledge, potentially leading to privacy risks. While prototype-based methods offer privacy advantages, current approaches primarily focus on adjusting classifiers for image classification tasks, neglecting representation biases between old and new identities in person re-identification. This study introduces a novel Prototype-guided Knowledge Propagation (PKP) method, which mitigates discrepancies in similar identity images between old and new tasks by guiding prototype construction through triplet loss constraints. Additionally, to address disparities between prototypes and the updated feature extractor, an Adaptive Parameter Evolution (APE) strategy is proposed. APE optimizes the integration of the old and new models by assessing the importance of the new tasks, dynamically selecting the most pertinent parameters for updates according to their contribution to the current task. Extensive experiments on the LReID benchmark demonstrate that our approach surpasses state-of-the-art prototype-based LReID methods in terms of mAP and rank-1 accuracy. Code is available at https://github.com/joyner-7/IJCAI2025-PKA.

IROS Conference 2025 Conference Paper

TextInPlace: Indoor Visual Place Recognition in Repetitive Structures with Scene Text Spotting and Verification

Huaqi Tao
Bingxi Liu 0001
Calvin Chen
Tingjun Huang
He Li
Jinqiang Cui
Hong Zhang 0013

Visual Place Recognition (VPR) is a crucial capability for long-term autonomous robots, enabling them to identify previously visited locations using visual information. However, existing methods remain limited in indoor settings due to the highly repetitive structures inherent in such environments. We observe that scene texts frequently appear in indoor spaces and can help distinguish visually similar but different places. This inspires us to propose TextInPlace, a simple yet effective VPR framework that integrates Scene Text Spotting (STS) to mitigate visual perceptual ambiguity in repetitive indoor environments. Specifically, TextInPlace adopts a dual-branch architecture within a local parameter sharing network. The VPR branch employs attention-based aggregation to extract global descriptors for coarse-grained retrieval, while the STS branch utilizes a bridging text spotter to detect and recognize scene texts. Finally, the discriminative texts are filtered to compute text similarity and re-rank the top-K retrieved images. To bridge the gap between current text-based repetitive indoor scene datasets and the typical scenarios encountered in robot navigation, we establish an indoor VPR benchmark dataset, called Maze-with-Text. Extensive experiments on both custom and public datasets demonstrate that TextInPlace achieves superior performance over existing methods that rely solely on appearance information. The dataset, code, and trained models are publicly available at https://github.com/HqiTao/TextInPlace.

ICML Conference 2025 Conference Paper

Transformer-Based Spatial-Temporal Counterfactual Outcomes Estimation

He Li
Haoang Chi
Mingyu Liu
Wanrong Huang
Liyang Xu
Wenjing Yang 0002

The real world naturally has dimensions of time and space. Therefore, estimating the counterfactual outcomes with spatial-temporal attributes is a crucial problem. However, previous methods are based on classical statistical models, which still have limitations in performance and generalization. This paper proposes a novel framework for estimating counterfactual outcomes with spatial-temporal attributes using the Transformer, exhibiting stronger estimation ability. Under mild assumptions, the proposed estimator within this framework is consistent and asymptotically normal. To validate the effectiveness of our approach, we conduct simulation experiments and real data experiments. Simulation experiments show that our estimator has a stronger estimation capability than baseline methods. Real data experiments provide a valuable conclusion to the causal effect of conflicts on forest loss in Colombia. The source code is available at this URL.

EAAI Journal 2024 Journal Article

A parallel deep neural network for intelligent fault diagnosis of drilling pumps

Junyu Guo
Yulai Yang
He Li
Le Dai
Bangkui Huang

This paper introduces a novel parallel deep neural network for fault diagnosis of drilling pumps. It integrates the Convolutional Block Attention Module with the AlexNet and synchronizes with the Anomaly Transformer model to delve meticulously into both the time and time-frequency domains of signals. The method prioritizes the singular extraction and subsequent amalgamation of features, facilitating a detailed view of diagnostic data and mitigating the risk of interference and overfitting. The integration of the anomaly attention of the Anomaly Transformer with the features of the Convolutional Block Attention Module results in a distinctive dual attention mechanism that is critical to the methodology. This mechanism emphasizes essential features in both the time domain and the time-frequency domain, improving the accuracy of fault diagnosis. Verification with on-site data underscores the preeminence of the approach over existing models, signaling improved reliability and accuracy in diagnosing faults in drilling pumps. This meticulous approach offers promising advances in the study and application of fault diagnosis in energy equipment, demonstrating increased efficiency and refined accuracy.

EAAI Journal 2024 Journal Article

A self-decision ant colony clustering algorithm for electricity theft detection

Zhengqiang Yang
Linyue Liu
Ning Li
He Li

The load data features of some electricity-theft consumers during the theft period are similar to those of normal consumers, making these consumers outliers from the electricity-theft cluster. The current classification method, which uses the mean value to determine the cluster centers, is vulnerable to the influence of outliers. Therefore, this paper proposes a self-decision ant colony clustering algorithm for electricity theft detection that decides for itself which samples are used to update the cluster centers. The method constructs a dynamic weighting approach to determine the cluster centers based on the idea of backpropagation, and updates the weights of each sample in the clusters to reflect the different importance of different samples, thus reducing the influence of outlier samples. A new activation function, Odd, is proposed to enhance the ability of the proposed method to solve linearly inseparable problems. A self-decision dropout mechanism is also proposed, which evolves the mechanism of randomly stopping the work of samples in clusters into a targeted, self-decision mechanism that stops the work of redundant or non-active samples and increases the contribution of outlier samples with positive effects. The proposed method is tested on the electricity consumption data provided by the State Grid Corporation of China (SGCC) and the Smart* Data Set for Sustainability (SDSS) provided by the UMass Trace Repository. The experimental results show that the proposed method effectively solves the above problems with higher detection accuracy and has certain advantages over other current studies.
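The sensitivity of mean-based cluster centers to outliers, and the effect of per-sample weights, can be seen in a minimal sketch. The weights below are fixed by hand purely for illustration; the abstract describes them as learned dynamically, and that update rule is not reproduced here. The function name is hypothetical.

```python
def weighted_center(points, weights):
    """Return the weighted mean of a list of equal-length points.

    With all weights equal this reduces to the ordinary mean, which is
    the outlier-sensitive baseline the abstract refers to.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dims = len(points[0])
    return [
        sum(w * p[d] for p, w in zip(points, weights)) / total
        for d in range(dims)
    ]


if __name__ == "__main__":
    # Three typical load readings and one outlier (illustrative values only).
    points = [[1.0], [2.0], [3.0], [50.0]]
    print(weighted_center(points, [1, 1, 1, 1]))  # plain mean, pulled by the outlier
    print(weighted_center(points, [1, 1, 1, 0]))  # outlier given zero weight
```

Down-weighting a sample moves the center back toward the bulk of the cluster, which is the effect a learned weighting scheme aims for.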

NeurIPS Conference 2024 Conference Paper

Autoregressive Image Generation without Vector Quantization

Tianhong Li
Yonglong Tian
He Li
Mingyang Deng
Kaiming He

Conventional wisdom holds that autoregressive models for image generation are typically accompanied by vector-quantized tokens. We observe that while a discrete-valued space can facilitate representing a categorical distribution, it is not a necessity for autoregressive modeling. In this work, we propose to model the per-token probability distribution using a diffusion procedure, which allows us to apply autoregressive models in a continuous-valued space. Rather than using categorical cross-entropy loss, we define a Diffusion Loss function to model the per-token probability. This approach eliminates the need for discrete-valued tokenizers. We evaluate its effectiveness across a wide range of cases, including standard autoregressive models and generalized masked autoregressive (MAR) variants. By removing vector quantization, our image generator achieves strong results while enjoying the speed advantage of sequence modeling. We hope this work will motivate the use of autoregressive generation in other continuous-valued domains and applications. Code is available at https://github.com/LTH14/mar.

EAAI Journal 2024 Journal Article

Corrigendum to “A self-decision ant colony clustering algorithm for electricity theft detection” [Eng. Appl. Artif. Intell. 133, Part E (July 2024), 108442]

Zhengqiang Yang
Linyue Liu
Ning Li
He Li

EAAI Journal 2024 Journal Article

Cross-modal misalignment-robust feature fusion for crowd counting

Weihang Kong
Zepeng Yu
He Li
Junge Zhang

Mainstream crowd counting methods in the Red-Green-Blue (RGB)-Thermal (T) information processing field concentrate on how to realize cross-modal complementary feature fusion. However, the cross-modal image misalignment issue, which substantially affects precise feature extraction and fusion, has received almost no attention for the target task. Given this, this work intends to mitigate the adverse effects of the misalignment between the visible and thermal modalities on complementary feature representation. Specifically, we design a cross-modal feature alignment fusion network for crowd counting, with a cross-modal feature alignment (CFA) module and a triple-branch frequency-cascaded fusion (TFF) module as the core components. The CFA uses a differential deformable representation to calibrate the cross-modal feature misalignment. The TFF then employs three interactive differential branches with a time-frequency joint feature modeling block to realize a strong complementary feature representation. Extensive experiments, along with ablation studies, show the effectiveness of the proposed method for feature alignment and fusion on two challenging RGB-T crowd counting benchmarks. The extended experimental results also suggest the feasibility of the proposed method for RGB-Depth crowd counting. In particular, the proposed method achieves a 6.97% improvement in mean absolute error over the second-best method on the surveillance-view benchmark, and a 10.72% improvement in root mean square error. These results show that addressing the cross-modal misalignment issue can improve the final counting performance, and the method provides an effective solution for RGB-Thermal crowd counting as well as for other similar RGB-Thermal computer vision tasks.
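The two error metrics named in the abstract above, and a relative improvement between two methods, can be computed as in the following sketch. The function names and the counts in the usage example are hypothetical placeholders, not values from the paper.

```python
import math


def mae(true_counts, predicted_counts):
    """Mean absolute error between true and predicted crowd counts."""
    n = len(true_counts)
    return sum(abs(t - p) for t, p in zip(true_counts, predicted_counts)) / n


def rmse(true_counts, predicted_counts):
    """Root mean square error between true and predicted crowd counts."""
    n = len(true_counts)
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(true_counts, predicted_counts)) / n
    )


def relative_improvement(baseline_error, proposed_error):
    """Percentage by which the proposed error is lower than the baseline error."""
    return (baseline_error - proposed_error) / baseline_error * 100.0


if __name__ == "__main__":
    truth = [120, 45, 300]      # placeholder ground-truth counts
    baseline = [130, 40, 280]   # placeholder baseline predictions
    proposed = [125, 43, 290]   # placeholder predictions from a second method
    print(relative_improvement(mae(truth, baseline), mae(truth, proposed)))
    print(relative_improvement(rmse(truth, baseline), rmse(truth, proposed)))
```

Both metrics are errors, so an "improvement" in this sense means a reduction relative to the comparison method.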

NeurIPS Conference 2024 Conference Paper

Parameter Disparities Dissection for Backdoor Defense in Heterogeneous Federated Learning

Wenke Huang
Mang Ye
Zekun Shi
Guancheng Wan
He Li
Bo Du

Backdoor attacks pose a serious threat to federated systems, where malicious clients optimize on the triggered distribution to mislead the global model towards a predefined target. Existing backdoor defense methods typically require either homogeneous assumption, validation datasets, or client optimization conflicts. In our work, we observe that benign heterogeneous distributions and malicious triggered distributions exhibit distinct parameter importance degrees. We introduce the Fisher Discrepancy Cluster and Rescale (FDCR) method, which utilizes Fisher Information to calculate the degree of parameter importance for local distributions. This allows us to reweight client parameter updates and identify those with large discrepancies as backdoor attackers. Furthermore, we prioritize rescaling important parameters to expedite adaptation to the target distribution, encouraging significant elements to contribute more while diminishing the influence of trivial ones. This approach enables FDCR to handle backdoor attacks in heterogeneous federated learning environments. Empirical results on various heterogeneous federated scenarios under backdoor attacks demonstrate the effectiveness of our method.

EAAI Journal 2024 Journal Article

Priori-distribution-guided adaptive sparse attention for cross-domain feature mining in diesel engine fault diagnosis

He Li
Jinjie Zhang
Zhenjing Zhang
Zhinong Jiang
Zhiwei Mao

Accurately locating the fault impacts and extracting sensitive fault features of vibration signals are challenging problems in diesel engine fault diagnosis. To address the limited integration of existing attention mechanisms with knowledge of diesel engine operating principles, black-box feature extraction, and insufficient interpretability, a novel method called priori-distribution adaptive sparse attention (PASA) is devised. This method translates the established priori-distribution of impact features, guided by mechanistic knowledge, into learnable target formulas, driving the model to learn attention results aligned with the priori-distribution. Based on the attention results from PASA, a cross-domain feature mining (CDFM) method is proposed. Leveraging traditional thermodynamic and dynamic features associated with diesel engine operations, it accomplishes cross-domain feature extraction in the frequency, time, and envelope domains, constructing a fault-sensitive feature set. Furthermore, the model structure for feature extraction is optimized, reducing model parameter complexity and leading to the establishment of a diagnostic model. Fault experiments, including misfires, valve malfunctions, collisions, and bush faults, are conducted on two diesel engines to verify the proposed models. The results demonstrate that, compared to existing methods for fault diagnosis in diesel engines, the proposed approach accurately identifies vibration signal fault characteristics conforming to prior distributions. It performs well on four diagnostic indicators: accuracy, precision, recall, and F1 score.

NeurIPS Conference 2024 Conference Paper

Unveiling Causal Reasoning in Large Language Models: Reality or Mirage?

Haoang Chi
He Li
Wenjing Yang
Feng Liu
Long Lan
Xiaoguang Ren
Tongliang Liu
Bo Han

Causal reasoning capability is critical in advancing large language models (LLMs) towards artificial general intelligence (AGI). While versatile LLMs appear to have demonstrated capabilities in understanding contextual causality and providing responses that obey the laws of causality, it remains unclear whether they perform genuine causal reasoning akin to humans. However, current evidence indicates the contrary. Specifically, LLMs are only capable of performing shallow (level-1) causal reasoning, primarily attributed to the causal knowledge embedded in their parameters, but they lack the capacity for genuine human-like (level-2) causal reasoning. To support this hypothesis, methodologically, we delve into the autoregression mechanism of transformer-based LLMs, revealing that it is not inherently causal. Empirically, we introduce a new causal Q&A benchmark named CausalProbe 2024, whose corpus is fresh and nearly unseen for the studied LLMs. Empirical results show a significant performance drop on CausalProbe 2024 compared to earlier benchmarks, indicating that LLMs primarily engage in level-1 causal reasoning. To bridge the gap towards level-2 causal reasoning, we draw inspiration from the fact that human reasoning is usually facilitated by general knowledge and intended goals. Inspired by this, we propose G$^2$-Reasoner, an LLM causal reasoning method that incorporates general knowledge and goal-oriented prompts into LLMs' causal reasoning processes. Experiments demonstrate that G$^2$-Reasoner significantly enhances LLMs' causal reasoning capability, particularly in fresh and fictitious contexts. This work sheds light on a new path for LLMs to advance towards genuine causal reasoning, going beyond level-1 and making strides towards level-2.

TIST Journal 2023 Journal Article

3D-Guided Frontal Face Generation for Pose-Invariant Recognition

Hao Wu
Jianyang Gu
Xiaojin Fan
He Li
Lidong Xie
Jian Zhao

Although deep learning techniques have achieved extraordinary accuracy in recognizing human faces, the pose variances of images captured in real-world scenarios still hinder reliable model application. To mitigate this gap, we propose to recognize faces by generating frontal face images with a 3D-Guided Deep Pose-Invariant Face Recognition Model (3D-PIM) consisting of a simulator and a refiner module. The simulator employs a 3D Morphable Model (3DMM) to fit the shape and appearance features and recover primary frontal images with less training data. The refiner further enhances the image realism on both global facial structure and local details with adversarial training, while keeping the discriminative identity information consistent with original images. An Adaptive Weighting (AW) metric is then adopted to leverage the complementary information from recovered frontal faces and original profile faces and to obtain credible similarity scores for recognition. Extensive experiments verify the superiority of the proposed “recognition via generation” framework over the state of the art.

EAAI Journal 2023 Journal Article

A deep learning model for efficient end-to-end stratification of thrombotic risk in left atrial appendage

Qi Gao
Hongtao Lin
Jianghong Qian
Xingli Liu
Shengze Cai
He Li
Hongguang Fan
Zhe Zheng

Clot formation in the left atrial appendage (LAA) poses a high risk of ischemic strokes and systemic embolism to patients with atrial fibrillation (AF), the most common type of sustained heart arrhythmia that affects more than 35 million people worldwide. Hemodynamic metrics evaluated using computational fluid dynamics (CFD) have been employed to assess the risk of thrombosis in LAA, but their utilization in clinical settings is limited due to their cumbersome operations and high computational cost. To this end, we propose UDGCNN (U: U-net, DGCNN: Dynamic Graph Convolutional Neural Network) that can utilize data from the point cloud of patient-specific LAA geometries as inputs and predict multiple hemodynamic indexes for systematic assessment of the thrombotic risk. The novelty of the proposed model lies in introducing edge convolution layers to the PointNet structure to improve the model’s capacity to learn local features, which is essential for training and predicting complex patient-specific geometries. Moreover, the network structure of UDGCNN is optimized by integrating the Encoder-Decoder structure and skip-connection to extract hierarchical features from the point cloud. The accuracy and efficiency of UDGCNN are examined by training and testing on 371 LAA geometries from patients with AF, the largest patient-specific dataset in the literature. Using mean absolute error and model inference time as metrics, we demonstrate that UDGCNN can provide an assessment of multiple hemodynamic metrics in 3D patient-specific geometries with prediction error ∼30% lower than that of the state-of-the-art PointNet model, whereas the inference time is 500-fold shorter than the computational time of CFD simulation. It is noted that UDGCNN is a general computational framework that can be extended to study various cardiovascular diseases.
In summary, this study presents a new computational tool that enables end-to-end stratification of thrombotic risk in LAA based on bioimaging, thereby advancing the current screening approach in clinical practice.

ICML Conference 2023 Conference Paper

Distribution Free Domain Generalization

Peifeng Tong
Wu Su
He Li
Jialin Ding
Zhan Haoxiang
Song Xi Chen

Accurate prediction of out-of-distribution data is desired for a learning algorithm. In domain generalization, training data from source domains tend to have different distributions from that of the target domain, while the target data are absent in the training process. We propose a Distribution Free Domain Generalization (DFDG) procedure for classification by conducting standardization to avoid the dominance of a few domains in the training process. The essence of the DFDG is that it reformulates the cross domain/class discrepancy by pairwise two-sample test statistics and equally weights their importance or the covariance structures to avoid a dominant domain/class. A theoretical generalization bound is established for the multi-class classification problem. The DFDG is shown to offer superior performance in empirical studies with fewer hyperparameters, which means faster and easier implementation.

IJCAI Conference 2023 Conference Paper

Dynamic Group Link Prediction in Continuous-Time Interaction Network

Shijie Luo
He Li
Jianbin Huang

Recently, group link prediction has received increasing attention due to its important role in analyzing relationships between individuals and groups. However, most existing group link prediction methods emphasize static settings or only make cursory exploitation of historical information, so they fail to obtain good performance in dynamic applications. To this end, we attempt to solve the group link prediction problem in continuous-time dynamic scenes with fine-grained temporal information. We propose a novel continuous-time group link prediction method CTGLP to capture the patterns of future link formation between individuals and groups. A new graph neural network CTGNN is presented to learn the latent representations of individuals by biasedly aggregating neighborhood information. Moreover, we design an importance-based group modeling function to model the embedding of a group based on its known members. CTGLP eventually learns a probability distribution and predicts the link target. Experimental results on various datasets with and without unseen nodes show that CTGLP outperforms the state-of-the-art methods by 13.4% and 13.2% on average.

EAAI Journal 2023 Journal Article

Multi-level learning counting via pyramid vision transformer and CNN

Jiayu Liu
He Li
Weihang Kong

Severe scale variation has become a challenging issue hindering the improvement of accuracy in the crowd counting task. To tackle the problem, we propose a Pyramid Transformer CNN Network (PTCNet), an effective combination of the transformer and the CNN, which possesses both global receptive fields and locality to deal with severe scale variation and boost the prediction accuracy. First, we utilize the pyramid vision transformer to extract multi-level global context information of the crowd, aiming at different head scales. Then, the multi-level information is fully fused in the multi-level feature aggregation module, where detailed crowd characteristics from all feature spaces are preserved to be further processed. Finally, we design a multi-branch regression head to enrich the crowd features for strong representations and regress the density maps. Extensive experiments on challenging datasets with complex scenarios and multiple scales demonstrate the effectiveness of our method. The proposed method achieves competitive results compared with the state-of-the-art approaches and achieves state-of-the-art results (MAE: 51.7, RMSE: 79.6) on the ShanghaiTech Part_A dataset.

EAAI Journal 2023 Journal Article

TGM-Nets: A deep learning framework for enhanced forecasting of tumor growth by integrating imaging and modeling

Qijing Chen
Qi Ye
Weiqi Zhang
He Li
Xiaoning Zheng

Prediction and uncertainty quantification of tumor progression are vital in clinical practice, i.e., disease prognosis and decision-making on treatment strategies. In this work, we propose TGM-Nets, a deep learning framework that combines bioimaging and tumor growth modeling (TGM) for enhanced prediction of tumor growth. This proposed framework, developed based on physics-informed neural networks (PINNs), is capable of integrating the TGM and sequential observations of tumor morphology for patient-specific prediction of tumor growth. The novelties of the design of TGM-Nets include the employment of Fourier layers to extract the features of the input images as well as the utilization of sequential learning and fine-tuning with physics for extrapolation to improve the prediction accuracy. The validity of TGM-Nets for tumor growth forecasting is verified by testing the model performance on synthetic and in-vitro datasets, respectively. Our results show that the TGM-Nets not only can track the growth rates of the mild and aggressive tumors but also capture their detailed morphological features within and outside the training domain. In particular, TGM-Nets can be used to predict the long time dynamics of tumor growth in mild and aggressive cases. Our results show that the parameters inferred from the TGM-Nets can be used for long-time prediction for up to 4 months with a maximum error of ∼4%. We also systematically study the effects of the number of training points and noisy data on the performance of TGM-Nets as well as quantify the uncertainty of the model predictions. We show that TGM-Nets can integrate the biomedical images to predict the growth of the in-vitro cultured pancreatic cancer cells and identify the associated growth rates, demonstrating the possibilities of using TGM-Nets in clinical practice.
In summary, we propose a new deep learning model that combines imaging and TGM to improve the current approaches for predicting tumor growth and thus provide an advanced computational tool for patient-specific tumor prognosis.

TIST Journal 2022 Journal Article

Deep Reinforcement Learning-based Trajectory Pricing on Ride-hailing Platforms

Jianbin Huang
Longji Huang
Meijuan Liu
He Li
Qinglin Tan
Xiaoke Ma
Jiangtao Cui
De-Shuang Huang

Dynamic pricing plays an important role in solving problems such as traffic load reduction, congestion control, and revenue improvement. Efficient dynamic pricing strategies can increase capacity utilization, total revenue of service providers, and the satisfaction of both passengers and drivers. Many proposed dynamic pricing technologies focus on short-term optimization and scale poorly when modeling long-term goals because of limited solution optimality and prohibitive computation. In this article, a deep reinforcement learning framework is proposed to tackle the dynamic pricing problem for ride-hailing platforms. A soft actor-critic (SAC) algorithm is adopted in the reinforcement learning framework. First, the dynamic pricing problem is formulated as a Markov Decision Process (MDP) in a continuous action space, which removes the need to discretize the action space. Then, a new reward function is obtained from the order response rate and the KL-divergence between supply distribution and demand distribution. Experiments and case studies demonstrate that the proposed method outperforms the baselines in terms of order response rate and total revenue.

TIST Journal 2022 Journal Article

Deep Spatio-temporal Adaptive 3D Convolutional Neural Networks for Traffic Flow Prediction

He Li
Xuejiao Li
Liangcai Su
Duo Jin
Jianbin Huang
Deshuang Huang

Traffic flow prediction is the upstream problem of path planning, intelligent transportation systems, and other tasks. Many studies have been carried out on the traffic flow prediction of the spatio-temporal network, but the effects of spatio-temporal flexibility (historical data of the same type of time intervals in the same location will change flexibly) and spatio-temporal correlation (different road conditions have different effects at different times) have not been considered at the same time. We propose the Deep Spatio-temporal Adaptive 3D Convolution Neural Network (ST-A3DNet), a new scheme that addresses both spatio-temporal correlation and flexibility and also considers spatio-temporal complexity (complex external factors, such as weather and holidays). Different from other traffic forecasting models, ST-A3DNet captures the spatio-temporal relationship at the same time through the Adaptive 3D convolution module, assigns different weights flexibly according to the influence of historical data, and obtains the impact of external factors on the flow through the ex-mask module. Considering the holidays and weather conditions, we train our model for experiments in Xi’an and Chengdu. We evaluate ST-A3DNet, and the results show that it outperforms 11 baselines.

EAAI Journal 2022 Journal Article

Hierarchical pyramid attentive network with spatial separable convolution for crowd counting

Shihui Zhang
Xiaoxiao Zhang
He Li
Huan He
Dandan Song
Lei Wang

To tackle the challenging scale variation issue in crowd counting and improve counting accuracy, we present a novel method based on the Hierarchical Pyramid Attentive Network (HPANet). Specifically, a Scale-aware Pyramid Attentive (SPA) block is designed to extract rich multi-scale context. Its core component is a two-branch spatial separable convolution, which replaces the conventional convolution with a larger kernel size to reduce computation, and it adopts a self-attention operation for spatial feature aggregation. To further learn the scale-aware feature representation from the input image, we stack the SPA blocks hierarchically and fuse their features flexibly in another crucial module of HPANet, the Hierarchical Feature Fusion (HFF) module. Combining the SPA block and the HFF module, HPANet can remedy the scale variation issue and thus improve counting performance with a strong scale-aware feature representation. HPANet is evaluated on four publicly available benchmark datasets: ShanghaiTech, Mall, Beijing BRT, and UCF-QNRF. Extensive experimental results demonstrate that HPANet is effective for crowd counting, and ablation results validate the effectiveness of its components on the counting task. HPANet achieves better counting performance by alleviating the scale variation issue, without the cost of introducing too many additional parameters for the multi-column structure.

IROS Conference 2021 Conference Paper

Decentralized, Unlabeled Multi-Agent Navigation in Obstacle-Rich Environments using Graph Neural Networks

Xuebo Ji
He Li
Zherong Pan
Xifeng Gao
Changhe Tu

We propose a decentralized, learning-based solution to the challenging problem of unlabeled multi-agent navigation among obstacles, where robots need to simultaneously tackle the problems of goal assignment, local collision avoidance, and navigation. Our method has each robot infer their desired action by communicating with each other as well as a set of position-fixed routers. The inference is carried out on a graph neural network (GNN) with both robot and router nodes. We train our GNN using imitation learning on a small group of robots, where we modify the centralized version of the concurrent goal assignment and planning algorithm (CAPT) as our expert. By sharing weights among all robots and routers, our model can scale to unseen environments with any number of possibly kinodynamic agents during test time. We have achieved a success rate of 91.2% and 85.6% for point and car-like robots, respectively. Source code will be publicly available upon the publication of the work.

YNIMG Journal 2021 Journal Article

Relationship between the disrupted topological efficiency of the structural brain connectome and glucose hypometabolism in normal aging

Qiuhui Bi
Wenxiao Wang
Na Niu
He Li
Yezhou Wang
Weijie Huang
Kewei Chen
Kai Xu

Normal aging is accompanied by structural degeneration and glucose hypometabolism in the human brain. However, the relationship between structural network disconnections and hypometabolism in normal aging remains largely unknown. In the present study, by combining MRI and PET techniques, we investigated the metabolic mechanism of the structural brain connectome and its relationship with normal aging in a cross-sectional, community-based cohort of 42 cognitively normal elderly individuals aged 57–84 years. The structural connectome was constructed based on diffusion MRI tractography, and the network efficiency metrics were quantified using graph theory analyses. FDG-PET scanning was performed to evaluate the glucose metabolic level in the cortical regions of the individuals. The results of this study demonstrated that both network efficiency and cortical metabolism decrease with age (both p < 0.05). In the subregions of the bilateral thalamus, significant correlations between nodal efficiency and cortical metabolism could be observed across subjects. Individual-level analyses indicated that brain regions with higher nodal efficiency tend to exhibit higher metabolic levels, implying a tight coupling between nodal efficiency and glucose metabolism (r = 0.56, p = 1.15 × 10⁻²¹). Moreover, the efficiency-metabolism coupling coefficient significantly increased with age (r = 0.44, p = 0.0046). Finally, the main findings were also reproducible in the ADNI dataset. Together, our results demonstrate a close coupling between structural brain connectivity and cortical metabolism in normal elderly individuals and provide new insights that improve the present understanding of the metabolic mechanisms of structural brain disconnections in normal aging.

TCS Journal 2021 Journal Article

The g-component connectivity of graphs

He Li
Shumin Zhang
Chengfu Ye
Shuming Zhou

Connectivity is a classic metric to evaluate the reliability of multiprocessor systems under processor failures. Based on connectivity, more refined quantitative indicators for fault tolerance of multiprocessor systems have been extensively explored. The g-component connectivity of a graph G, denoted by $c\kappa_g(G)$, is the minimum number of vertices whose removal from G results in a disconnected graph with at least g components. So far, the values of the g-component (edge) connectivity of special networks with small g have been extensively investigated. For general graphs, the results on the g-component connectivity are very few. In this paper, we propose some lower and upper bounds for the g-component connectivity along with their sharpness, and then suggest some characterization of trees and general graphs with given g-component connectivity. Furthermore, we fix some related extremal problems.

YNIMG Journal 2014 Journal Article

Fiber connectivity between the striatum and cortical and subcortical regions is associated with temperaments in Chinese males

Xuemei Lei
Chuansheng Chen
Feng Xue
Qinghua He
Chunhui Chen
Qi Liu
Robert K. Moyzis
Gui Xue

The seven-factor biopsychosocial model of personality distinguished four biologically based temperaments and three psychosocially based characters. Previous studies have suggested that the four temperaments—novelty seeking (NS), reward dependence (RD), harm avoidance (HA), and persistence (P)—have their respective neurobiological correlates, especially in the striatum-connected subcortical and cortical networks. However, few studies have investigated their neurobiological basis in the form of fiber connectivity between brain regions. This study correlated temperaments with fiber connectivity between the striatum and subcortical and cortical hub regions in a sample of 50 Chinese adult males. Generally consistent with our hypotheses, results showed that: (1) NS was positively correlated with fiber connectivity from the medial and lateral orbitofrontal cortex (mOFC, lOFC) and amygdala to the striatum; (2) RD was positively correlated with fiber connectivity from the mOFC, posterior cingulate cortex/retrosplenial cortex (PCC), hippocampus, and amygdala to the striatum; (3) HA was positively linked to fiber connectivity from the dorsolateral prefrontal cortex (dlPFC) and PCC to the striatum; and (4) P was positively linked to fiber connectivity from the mOFC to the striatum. These results extended the research on the neurobiological basis of temperaments by identifying their anatomical fiber connectivity correlates within the subcortical–cortical neural networks.

AAAI Conference 2012 Conference Paper

Discovering Spammers in Social Networks

Yin Zhu
Xiao Wang
Erheng Zhong
Nathan Liu
He Li
Qiang Yang

As the popularity of the social media increases, as evidenced in Twitter, Facebook and China’s Renren, spamming activities also picked up in numbers and variety. On social network sites, spammers often disguise themselves by creating fake accounts and hijacking normal users’ accounts for personal gains. Different from the spammers in traditional systems such as SMS and email, spammers in social media behave like normal users and they continue to change their spamming strategies to fool anti-spamming systems. However, due to the privacy and resource concerns, many social media websites cannot fully monitor all the contents of users, making many of the previous approaches, such as topology-based and content-classification-based methods, infeasible to use. In this paper, we propose a Supervised Matrix Factorization method with Social Regularization (SMFSR) for spammer detection in social networks that exploits both social activities as well as users’ social relations in an innovative and highly scalable manner. The proposed method detects spammers collectively based on users’ social actions and social relations. We have empirically tested our method on data from Renren.com, which is one of the largest social networks in China, and demonstrated that our new method can improve the detection performance significantly.