Arrow Research search

Author name cluster

Yang Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

375 papers
2 author rows

Possible papers

375

EAAI Journal 2026 Journal Article

A comprehensive survey on table question answering: Datasets, methods and future directions

  • Weiqiang Xu
  • Yang Liu
  • Lingfeng Lu
  • Huakang Li
  • Guozi Sun

In the data-centric era of Industry 4.0, a vast amount of critical information — ranging from machining sensor logs and material synthesis recipes to financial statements — is stored in structured tabular formats. Table Question Answering (TableQA) aims to automatically interpret tabular data and provide precise answers to natural language (NL) queries, serving as a vital interface for intelligent engineering systems. This survey provides a comprehensive review of the table question answering landscape, bridging theoretical advances with practical engineering applications. Methodologically, we propose a unified taxonomy that categorizes datasets and approaches into table-only and non-table-only paradigms. We systematically trace the technical evolution from early rule-based semantic parsing and pre-trained language models (PLMs) to recent large language models (LLMs), highlighting innovations in numerical reasoning and cross-modal alignment. From an engineering perspective, we critically evaluate how these techniques are applied to solve domain-specific challenges, such as predictive maintenance in smart manufacturing, property extraction in material informatics, and decision support in business intelligence. Furthermore, going beyond academic benchmarks, we analyze pressing constraints for industrial deployment, including real-time inference latency, system reliability, and verification in safety-critical environments. Finally, we outline future research directions for building robust, verifiable, and computationally efficient table question answering systems across various industrial domains.

EAAI Journal 2026 Journal Article

A novel domain generalization framework for fault diagnosis of rotating machinery based on causal representation learning and causal feature identification

  • Yang Liu
  • Weidong Cheng
  • Weigang Wen
  • Qiqiang Fang
  • Hengshan Wu

Rotating machinery in engineering applications often operates under complex working conditions, leading to distribution shifts in monitoring data that degrade the performance of traditional intelligent diagnosis models. Domain generalization (DG) has emerged as a pivotal technology for deploying intelligent diagnosis methods in practical engineering. Recently, causal representation learning has attracted significant attention in DG research. However, the causality of the features obtained by existing methods has not been further tested, so diagnosis models fail to achieve optimal performance. To address this challenge, this study further tests the causality of features on the basis of causal representation learning and proposes a novel DG framework for fault diagnosis. Specifically, a causal analysis of the vibration signal generation and feature extraction processes is conducted, and a structural causal model (SCM) is established. Based on the SCM, a pretraining model is designed for causal representation learning. Furthermore, a causality test algorithm is proposed for causal feature identification. Finally, a three-stage DG framework is constructed by integrating active learning (pretraining model) with objective testing (causality test algorithm). The superiority of the proposed method is verified on five datasets covering bearings and gears. The proposed method demonstrates exceptional performance in DG for rotating machinery fault diagnosis while guiding model optimization and engineering deployment, indicating its broad application prospects in real-world engineering practice.

AIIM Journal 2026 Journal Article

A novel ECG QRS complex detection algorithm based on dynamic Bayesian network

  • Qince Li
  • Yang Liu
  • Na Zhao
  • Yongfeng Yuan
  • Runnan He

Accurate detection of the QRS complex, a crucial reference for heartbeat localization in electrocardiogram (ECG) signals, remains inadequate in wearable ECG devices due to complex noise interference. In this study, we propose a novel QRS complex detection method based on dynamic Bayesian network (DBN), integrating the probability distribution of RR intervals. Unlike methods focusing solely on ECG waveforms, our approach explicitly integrates ECG waveform and heart rhythm information into a unified probability model, enhancing noise robustness. Additionally, an unsupervised parameter optimization using expectation maximization (EM) adapts to individual differences of patients. Furthermore, several simplification strategies improve reasoning efficiency, and an online detection mode enables real-time applications. Our method outperforms other state-of-the-art QRS detection methods, including deep learning (DL) methods, on noisy datasets. In conclusion, the proposed DBN-based QRS detection algorithm demonstrates outstanding accuracy, noise robustness, generalization ability, real-time capability, and strong scalability, indicating its potential application in wearable ECG devices.

AAAI Conference 2026 Conference Paper

Addressing Polarization and Unfairness in Performative Prediction

  • Kun Jin
  • Tian Xie
  • Yang Liu
  • Xueru Zhang

In many real-world applications of machine learning—such as recommendations, hiring, and lending—deployed models influence the data they are trained on, leading to feedback loops between predictions and data distribution. The performative prediction (PP) framework captures this phenomenon by modeling the data distribution as a function of the deployed model. While prior work has focused on finding performative stable (PS) solutions for robustness, their societal impacts, particularly regarding fairness, remain underexplored. We show that PS solutions can lead to severe polarization and prediction performance disparities, and that conventional fairness interventions from previous works often fail under model-dependent distribution shifts because they do not satisfy the PS criteria. To address these challenges in PP, we introduce novel fairness mechanisms that provably ensure both stability and fairness, validated by theoretical analysis and empirical results.

EAAI Journal 2026 Journal Article

Automatic classification of circulating blood cell clusters based on multi-channel flow cytometry imaging

  • Suqiang Ma
  • Subhadeep Sengupta
  • Yao Lee
  • Beikang Gu
  • Xianyan Chen
  • Xianqiao Wang
  • Yang Liu
  • Mengjia Xu

Circulating blood cell clusters (CCCs) containing red blood cells (RBCs), white blood cells (WBCs), and platelets are significant biomarkers linked to pathological conditions like thrombosis, infection, and inflammation. Flow cytometry, paired with fluorescence staining, is commonly used to analyze these cell clusters, revealing cell morphology and protein profiles. While computational approaches based on machine learning have advanced the automatic analysis of single-cell flow cytometry images, little effort has been made to build tools that automatically analyze images containing CCCs. Unlike single cells, cell clusters exhibit irregular shapes and sizes. In addition, these cell clusters often consist of heterogeneous cell types, which require multi-channel staining to identify the specific cell types within the clusters. To address these challenges, we introduce a new computational framework for analyzing CCC images and identifying cell types within clusters. Our framework uses a two-step analysis strategy. First, it categorizes images into cell cluster and non-cluster groups by fine-tuning the You Only Look Once (YOLOv11) model, which outperforms traditional convolutional neural networks (CNNs) and Vision Transformers (ViTs). Then, it identifies cell types by overlaying cluster contours with regions from multi-channel fluorescence stains, thereby minimizing the impact of cell debris and staining artifacts. This approach achieved over 95% accuracy in both cluster classification and cell phenotype identification. In summary, our automated framework effectively analyzes CCC images from flow cytometry, leveraging both bright-field and fluorescence data. Initially tested on blood cells, it holds potential for broader applications, such as analyzing immune and tumor cell clusters, supporting cellular research across various diseases.

AAAI Conference 2026 Conference Paper

Controllable Financial Market Generation with Diffusion Guided Meta Agent

  • Yu-Hao Huang
  • Chang Xu
  • Yang Liu
  • Weiqing Liu
  • Wu-Jun Li
  • Jiang Bian

Generative modeling has transformed many fields, such as language and visual modeling, while its application in financial markets remains under-explored. As the minimal unit within a financial market is an order, order-flow modeling represents a fundamental generative financial task. However, current approaches often yield unsatisfactory fidelity in generating order flow, and their generation lacks controllability, thereby limiting their practical applications. In this paper, we formulate the challenge of controllable financial market generation, and propose a Diffusion Guided Meta Agent (DigMA) model to address it. Specifically, we employ a conditional diffusion model to capture the dynamics of the market state represented by time-evolving distribution parameters of the mid-price return rate and the order arrival rate, and we define a meta agent with financial economic priors to generate orders from the corresponding distributions. Extensive experimental results show that DigMA achieves superior controllability and generation fidelity. Moreover, we validate its effectiveness as a generative environment for downstream high-frequency trading tasks and its computational efficiency.

AIIM Journal 2026 Journal Article

Development and validation of deep continual learning model to sequentially learn multiple clinical prediction tasks for ICU patients

  • Zhixuan Zeng
  • Yang Liu
  • Shuo Yao
  • Xu Cai
  • Wenbin Nan
  • Yiyang Xie
  • Xun Gong

Background: ICU patients often suffer from critical and complex conditions, and multiple potential risks should be monitored to provide them comprehensive care. However, no prior study has proposed a continual learning (CL) model that can effectively solve multiple clinical prediction tasks without catastrophic forgetting. This study proposes three deep CL models for ICU patients. Methods: Three public ICU databases were employed. The included patients from MIMIC-III and MIMIC-IV were divided into eight task sets, and the patients from eICU-CRD composed the test set. We propose three CL models (CL_1, CL_2, CL_3) to sequentially learn eight prediction tasks on the eight task sets, and then externally validate them on the test set. We compare our models to three representative baseline CL models and to single-task (ST) and multi-task (MT) models. We train all the CL models under different task orders, evaluating their prediction performance by multiple metrics and their memory ability by backward transfer (BWT). We also analyze the effect of previously learned tasks on learning new tasks. Results: Our three CL models had comparable or slightly weaker performance than the ST and MT models on the eight tasks. They effectively mitigated catastrophic forgetting, and their performance was robust to different training orders. CL_2 and CL_3 even improved performance on the current task after learning some previous tasks. Our three CL models outperformed the baseline CL models in most experiments. Conclusions: Our CL models are promising for sequentially learning multiple clinical prediction tasks for ICU patients. CL_2 and CL_3 show the ability to utilize information from previous tasks to improve learning of new tasks. More datasets and tasks are still needed to further verify the validity of the CL models.

EAAI Journal 2026 Journal Article

Exploiting implicit knowledge for streaming perception object detection

  • Qingsong Tang
  • Jinting Guo
  • Xuexiao Zhou
  • Yongkang Li
  • Mingzhi Yang
  • Yang Liu

Streaming perception is a more challenging task than offline perception. Existing methods perform streaming perception object detection by endowing real-time detectors with the ability to predict the future. The difficulty of such methods mainly lies in perceiving complex and changing video background environments, as well as varying object speeds. In this context, we propose a real-time object detection model that utilizes implicit knowledge to enhance features. First, we use a channel implicit knowledge module to perform early fine-tuning on Argoverse-High Definition (Argoverse-HD). This allows the model to perceive the background environment and obtain rich positional features. Then, we use a spatial implicit knowledge module to refine the movement speed features of objects. These refined features are integrated with position features for final fine-tuning. In the final fine-tuning stage, we further weight the original dynamic top-k label assignment strategy to measure the importance of positive samples. Through this weighting, we aim to obtain finer-grained object localization. Our model achieves 37.8% streaming Average Precision (sAP) on Argoverse-HD (+0.9% over baseline) with merely 0.01G additional Floating Point Operations (FLOPs) and a latency increase of less than 3 milliseconds (ms). Code is available at https://github.com/GjtZ/ISYOLO.git.

AAAI Conference 2026 Conference Paper

Graph VQ-Transformer (GVT): Fast and Accurate Molecular Generation via High-Fidelity Discrete Latents

  • Haozhuo Zheng
  • Cheng Wang
  • Yang Liu

The de novo generation of molecules with desirable properties is a critical challenge, where diffusion models are computationally intensive and autoregressive models struggle with error propagation. In this work, we introduce the Graph VQ-Transformer (GVT), a two-stage generative framework that achieves both high accuracy and efficiency. The core of our approach is a novel Graph Vector Quantized Variational Autoencoder (VQ-VAE) that compresses molecular graphs into high-fidelity discrete latent sequences. By synergistically combining a Graph Transformer with canonical Reverse Cuthill-McKee (RCM) node ordering and Rotary Positional Embeddings (RoPE), our VQ-VAE achieves near-perfect reconstruction rates. An autoregressive Transformer is then trained on these discrete latents, effectively converting graph generation into a well-structured sequence modeling problem. Crucially, this mapping of complex graphs to high-fidelity discrete sequences bridges molecular design with the powerful paradigm of large-scale sequence modeling, unlocking potential synergies with Large Language Models (LLMs). Extensive experiments show that GVT achieves state-of-the-art or highly competitive performance across major benchmarks like ZINC250k, MOSES, and GuacaMol, and notably outperforms leading diffusion models on key distribution similarity metrics such as FCD and KL Divergence. With its superior performance, efficiency, and architectural novelty, GVT not only presents a compelling alternative to diffusion models but also establishes a strong new baseline for the field, paving the way for future research in discrete latent-space molecular generation.

AAAI Conference 2026 Conference Paper

Hidden in the Noise: Unveiling Backdoors in Audio LLMs Alignment Through Latent Acoustic Pattern Triggers

  • Liang Lin
  • Miao Yu
  • Kaiwen Luo
  • Yibo Zhang
  • Lilan Peng
  • Dexian Wang
  • Xuehai Tang
  • Yuanhe Zhang

As Audio Large Language Models (ALLMs) emerge as powerful tools for speech processing, their safety implications demand urgent attention. While considerable research has explored textual and vision safety, audio's distinct characteristics present significant challenges. This paper first investigates: Are ALLMs vulnerable to backdoor attacks that exploit acoustic triggers? In response to this question, we introduce Hidden in the Noise (HIN), a novel backdoor attack framework designed to exploit subtle, audio-specific features. HIN applies acoustic modifications to raw audio waveforms, such as alterations to temporal dynamics and strategic injection of spectrally tailored noise. These changes introduce consistent patterns that an ALLM's acoustic feature encoder captures, embedding robust triggers within the audio stream. To evaluate ALLM robustness against audio-feature-based triggers, we develop the AudioSafe benchmark, assessing nine distinct risk types. Extensive experiments on AudioSafe and three established safety datasets reveal critical vulnerabilities in existing ALLMs: (I) audio features like environment noise and speech rate variations achieve over 90% average attack success rate, (II) ALLMs exhibit significant sensitivity differences across acoustic features, particularly showing minimal response to volume as a trigger, and (III) poisoned sample inclusion causes only marginal loss curve fluctuations, highlighting the attack's stealth.

AAAI Conference 2026 Conference Paper

I2E: Real-Time Image-to-Event Conversion for High-Performance Spiking Neural Networks

  • Ruichen Ma
  • Liwei Meng
  • Guanchao Qiao
  • Ning Ning
  • Yang Liu
  • Shaogang Hu

Spiking neural networks (SNNs) promise highly energy-efficient computing, but their adoption is hindered by a critical scarcity of event-stream data. This work introduces I2E, an algorithmic framework that resolves this bottleneck by converting static images into high-fidelity event streams. By simulating microsaccadic eye movements with a highly parallelized convolution, I2E achieves a conversion speed over 300x faster than prior methods, uniquely enabling on-the-fly data augmentation for SNN training. The framework's effectiveness is demonstrated on large-scale benchmarks. An SNN trained on the generated I2E-ImageNet dataset achieves a state-of-the-art accuracy of 60.50%. Critically, this work establishes a powerful sim-to-real paradigm where pre-training on synthetic I2E data and fine-tuning on the real-world CIFAR10-DVS dataset yields an unprecedented accuracy of 92.5%. This result validates that synthetic event data can serve as a high-fidelity proxy for real sensor data, bridging a long-standing gap in neuromorphic engineering. By providing a scalable solution to the data problem, I2E offers a foundational toolkit for developing high-performance neuromorphic systems. The open-source algorithm and all generated datasets are provided to accelerate research in the field.

AAAI Conference 2026 Conference Paper

ICM-Fusion: In-Context Meta-Optimized LoRA Fusion for Multi-Task Adaptation

  • Yihua Shao
  • Xiaofeng Lin
  • Xinwei Long
  • Siyu Chen
  • Minxi Yan
  • Yang Liu
  • Ziyang Yan
  • Ao Ma

Enabling multi-task adaptation in pre-trained Low-Rank Adaptation (LoRA) models is crucial for enhancing their generalization capabilities. Most existing pre-trained LoRA fusion methods decompose weight matrices, sharing similar parameters while fusing divergent ones. However, this paradigm inevitably induces inter-weight conflicts and leads to catastrophic domain forgetting. While incremental learning enables adaptation to multiple tasks, it struggles to achieve generalization in few-shot scenarios. Consequently, when the weight data follows a long-tailed distribution, it can lead to forgetting in the fused weights. To address this issue, we propose In-Context Meta LoRA Fusion (ICM-Fusion), a novel framework that synergizes meta-learning with in-context adaptation. The key innovation lies in our task vector arithmetic, which dynamically balances conflicting optimization directions across domains through learned manifold projections. ICM-Fusion obtains the optimal task vector orientation for the fused model in the latent space by adjusting the orientation of the task vectors. Subsequently, the fused LoRA is reconstructed by a self-designed Fusion VAE (F-VAE) to realize multi-task LoRA generation. We have conducted extensive experiments on visual and linguistic tasks, and the results demonstrate that ICM-Fusion can be adapted to a wide range of architectural models and applied to various tasks. Compared to current pre-trained LoRA fusion methods, ICM-Fusion-fused LoRA can significantly reduce the multi-tasking loss and can even achieve task enhancement in few-shot scenarios.

AAAI Conference 2026 Conference Paper

Image-Text Knowledge Modeling for Unsupervised Multi-Scenario Person Re-Identification

  • Zhiqi Pang
  • Lingling Zhao
  • Yang Liu
  • Chunyu Wang
  • Gaurav Sharma

We propose unsupervised multi-scenario (UMS) person re-identification (ReID) as a new task that expands ReID across diverse scenarios (cross-resolution, clothing change, etc.) within a single coherent framework. To tackle UMS-ReID, we introduce image-text knowledge modeling (ITKM) -- a three-stage framework that effectively exploits the representational power of vision-language models. We start with a pre-trained CLIP model with an image encoder and a text encoder. In Stage I, we introduce a scenario embedding in the image encoder and fine-tune the encoder to adaptively leverage knowledge from multiple scenarios. In Stage II, we optimize a set of learned text embeddings to associate with pseudo-labels from Stage I and introduce a multi-scenario separation loss to increase the divergence between inter-scenario text representations. In Stage III, we first introduce cluster-level and instance-level heterogeneous matching modules to obtain reliable heterogeneous positive pairs (e.g., a visible image and an infrared image of the same person) within each scenario. Next, we propose a dynamic text representation update strategy to maintain consistency between text and image supervision signals. Experimental results across multiple scenarios demonstrate the superiority and generalizability of ITKM; it not only outperforms existing scenario-specific methods but also enhances overall performance by integrating knowledge from multiple scenarios.

AAAI Conference 2026 Conference Paper

Improving Sustainability of Adversarial Examples in Class-Incremental Learning

  • Taifeng Liu
  • Xinjing Liu
  • Liangqiu Dong
  • Yang Liu
  • Yilong Yang
  • Zhuo Ma

Current adversarial examples (AEs) are typically designed for static models. However, with the wide application of Class-Incremental Learning (CIL), models are no longer static and need to be updated with new data distributed and labeled differently from the old ones. As a result, existing AEs often fail after CIL updates due to significant domain drift. In this paper, we propose SAE to enhance the sustainability of AEs against CIL. The core idea of SAE is to enhance the robustness of AE semantics against domain drift by making them more similar to the target class while distinguishing them from all other classes. Achieving this is challenging, as relying solely on the initial CIL model to optimize AE semantics often leads to overfitting. To resolve the problem, we propose a Semantic Correction Module. This module encourages the AE semantics to be generalized, based on a generative model capable of producing universal semantics. Additionally, it incorporates the CIL model to correct the optimization direction of the AE semantics, guiding them closer to the target class. To further reduce fluctuations in AE semantics, we propose a Filtering-and-Augmentation Module, which first identifies non-target examples with target-class semantics in the latent space and then augments them to foster more stable semantics. Comprehensive experiments demonstrate that SAE outperforms baselines by an average of 31.28% when updated with a 9-fold increase in the number of classes.

EAAI Journal 2026 Journal Article

In-situ three-dimensional profilometry in high-temperature environment via laser structured light with conditional generative adversarial network-based adaptive speckle denoising

  • Hongsen Wang
  • Fujia Liu
  • Jianhong Yang
  • Yuxuan Jiang
  • Chaoyang Duan
  • Yang Liu

The laser structured light method has been proven effective for in-situ three-dimensional profilometry of high-temperature objects in various industrial and aerospace applications. However, during the measurement process, laser speckle noise inevitably occurs, which degrades the measurement accuracy. Owing to the specific requirements for restoring fringe structures and edges, the coupled effects of various types of noise in high-temperature environments, and the inherent difficulty of acquiring training data under such conditions, previous speckle suppression methods exhibit limited performance. Therefore, an adaptive speckle denoising conditional generative adversarial network is proposed in this paper to improve the measurement accuracy of the laser structured light method. Specifically, a multi-scale adaptive filtering module is introduced in the generator to reduce composite noise, and a scale-adaptive network is designed in the discriminator to provide supervision from pixel details to global structure. Moreover, a laser/light-emitting diode multiplexing optical path for the laser structured light measurement system is designed to improve the quality of the training dataset. Experiments on in-situ measurement of the contour of tungsten alloy sample blocks were carried out in an electron beam heating environment at 2015 °C, with an average error of 0.12 mm, representing a 60.1% improvement over existing methods and thereby demonstrating the advantages of the proposed method in high-temperature environments.

EAAI Journal 2026 Journal Article

Knowledge-data driven digital twin platform: intelligent prediction and control of tunnel face stability during large-diameter slurry shield construction

  • Xianguo Wu
  • Yu Lei
  • Feiming Su
  • Tiejun Li
  • Yang Liu

Safety in large-diameter shield construction (LDSS) is particularly important. To ensure safe excavation in LDSS projects, this paper proposes a digital twin (DT) platform integrated with a knowledge-data driven method to achieve prediction and control of tunnel face stability (TFS) during LDSS construction. The DT platform enables acquisition of physical data and expert knowledge, facilitating bidirectional synchronization between physical entities and virtual counterparts. The DT platform utilizes Bayesian Optimization (BO), Graph Convolutional Network (GCN), Bidirectional Long Short-Term Memory (BiLSTM) networks, and SHapley Additive exPlanations (SHAP), driven by physical data together with expert knowledge, to achieve knowledge-data driven prediction and control of TFS during LDSS construction. A case study of Wuhan Metro Line 12 construction demonstrates the platform's effectiveness, with key findings revealing: (1) The DT platform achieves accurate TFS prediction across six geological types, showing an average R2 value of 0.935 and root mean square error (RMSE) of 0.239. (2) Key construction parameters are identified through the knowledge-data driven method, including air chamber pressure, grouting pressure, advance rate, cutterhead rotation speed, and cutterhead torque. (3) The DT platform enables control of TFS during LDSS construction by maintaining the slurry pressure (SP) within an optimal range. The DT platform developed in this study fills TFS research gaps in LDSS construction and provides valuable references for similar complex projects.

AAAI Conference 2026 Conference Paper

MAGIC: Mastering Physical Adversarial Generation in Context Through Collaborative LLM Agents

  • Yun Xing
  • Nhat Chung
  • Jie Zhang
  • Yue Cao
  • Ivor Tsang
  • Yang Liu
  • Lei Ma
  • Qing Guo

Physical adversarial attacks in driving scenarios can expose critical vulnerabilities in visual perception models. However, developing such attacks remains non-trivial due to diverse real-world environmental influences. Existing approaches either struggle to generalize to dynamic environments or fail to achieve consistent physical attack performance. To address these challenges, we propose MAGIC (Mastering Physical Adversarial Generation In Context), a novel framework powered by multi-modal LLM agents to automatically understand the scene context during testing time and generate adversarial patches through synergistic interaction of language and vision understanding. Specifically, MAGIC orchestrates three specialized LLM agents: the adv-patch generation agent masters the creation of deceptive patches via strategic prompt manipulation for text-to-image models; the adv-patch deployment agent ensures contextual coherence by determining optimal deployment strategies based on scene understanding; and the self-examination agent completes this trilogy by providing critical oversight and iterative refinement of both processes. We validate our approach with both digital and physical scenarios, i.e., nuImage and real-world scenes, where both statistical and visual results demonstrate that our MAGIC is powerful and effective for attacking widely applied object detection systems, such as YOLO and DETR series.

AAAI Conference 2026 Conference Paper

Modeling Trend Dynamics with Variational Neural ODEs for Information Popularity Prediction

  • Yuchen Wang
  • Dongpeng Hou
  • Weikai Jing
  • Chao Gao
  • Xianghua Li
  • Yang Liu

Predicting the future popularity of information in online social networks is a crucial yet challenging task, due to the complex spatiotemporal dynamics underlying information diffusion. Existing methods typically use structural or sequential patterns within the observation window as direct inputs for subsequent popularity prediction. However, most approaches lack the ability to explicitly model the overall trend of popularity up to the prediction time, which leads to limited predictive capability. To address these limitations, we propose VNOIP, a novel method based on variational neural Ordinary Differential Equations (ODEs) for information popularity prediction. Specifically, VNOIP introduces bidirectional jump ODEs with attention mechanisms to capture long-range dependencies and bidirectional context within cascade sequences. Furthermore, by jointly considering both cascade patterns and overall trend temporal patterns, VNOIP explicitly models the continuous-time dynamics of popularity trend trajectories with variational neural ODEs. Additionally, a knowledge distillation loss is employed to align the evolution of prior and posterior latent variables. Extensive experiments on real-world datasets demonstrate that VNOIP is highly competitive in both prediction accuracy and efficiency compared to state-of-the-art baselines.

AAAI Conference 2026 Conference Paper

On the Alignment of Large Language Models with Global Human Opinion

  • Yang Liu
  • Masahiro Kaneko
  • Chenhui Chu

Today's large language models (LLMs) are capable of supporting multilingual scenarios, allowing users to interact with LLMs in their native languages. When LLMs respond to subjective questions posed by users, they are expected to align with the views of specific demographic groups or historical periods, shaped by the language in which the user interacts with the model. Existing studies mainly focus on the opinions represented by LLMs among demographic groups in the United States or a few countries, lacking worldwide country samples and studies on human opinions in different historical periods, as well as discussion of using language to steer LLMs. Moreover, they also overlook the potential influence of prompt language on the alignment of LLMs' opinions. In this study, our goal is to fill these gaps. To this end, we create an evaluation framework based on the World Values Survey (WVS) to systematically assess the alignment of LLMs with human opinions across different countries, languages, and historical periods around the world. We find that LLMs align appropriately with, or over-align with, the opinions of only a few countries while under-aligning with those of most countries. Furthermore, changing the language of the prompt to match the language used in the questionnaire can steer LLMs to align with the opinions of the corresponding country more effectively than existing steering methods. At the same time, LLMs are more aligned with the opinions of the contemporary population. To our knowledge, our study is the first comprehensive investigation of opinion alignment in LLMs across global, language, and temporal dimensions.

AAAI Conference 2026 Conference Paper

PCSR: Pseudo-label Consistency-Guided Sample Refinement for Noisy Correspondence Learning

  • Zhuoyao Liu
  • Yang Liu
  • Wentao Feng
  • Shudong Huang

Cross-modal retrieval aims to align different modalities via semantic similarity. However, existing methods often assume that image-text pairs are perfectly aligned, overlooking Noisy Correspondences in real data. These misaligned pairs misguide similarity learning and degrade retrieval performance. Previous methods often rely on coarse-grained categorizations that simply divide data into clean and noisy samples, overlooking the intrinsic diversity within noisy instances. Moreover, they typically apply uniform training strategies regardless of sample characteristics, resulting in suboptimal sample utilization for model optimization. To address the above challenges, we introduce a novel framework, called Pseudo-label Consistency-Guided Sample Refinement (PCSR), which enhances correspondence reliability by explicitly dividing samples based on pseudo-label consistency. Specifically, we first employ a confidence-based estimation to distinguish clean and noisy pairs, then refine the noisy pairs via pseudo-label consistency to uncover structurally distinct subsets. We further propose a Pseudo-label Consistency Score (PCS) to quantify prediction stability, enabling the separation of ambiguous and refinable samples within noisy pairs. Accordingly, we adopt Adaptive Pair Optimization (APO), where ambiguous samples are optimized with robust loss functions and refinable ones are enhanced via text replacement during training. Extensive experiments on CC152K, MS-COCO and Flickr30K validate the effectiveness of our method in improving retrieval robustness under noisy supervision.
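The pseudo-label consistency idea can be illustrated with a toy score measuring how stable a sample's predicted label is across epochs. This is an assumed simplification for illustration; the paper's PCS definition may differ:

```python
import numpy as np

def consistency_score(pred_history):
    """Fraction of epochs agreeing with the majority pseudo-label.

    pred_history: (epochs, classes) predicted probabilities for ONE sample.
    A score near 1.0 suggests a refinable sample; low scores flag ambiguity.
    """
    labels = pred_history.argmax(axis=1)
    _, counts = np.unique(labels, return_counts=True)
    return counts.max() / len(labels)

stable = np.array([[0.9, 0.1]] * 5)                          # same label every epoch
unstable = np.array([[0.6, 0.4], [0.4, 0.6]] * 2 + [[0.5, 0.5]])  # flip-flopping
print(consistency_score(stable), consistency_score(unstable))  # 1.0 0.6
```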

AAAI Conference 2026 Conference Paper

PhysPatch: A Physically Realizable and Transferable Adversarial Patch Attack for Multimodal Large Language Models-based Autonomous Driving Systems

  • Qi Guo
  • Xiaojun Jia
  • Shanmin Pang
  • Simeng Qin
  • Lin Wang
  • Ju Jia
  • Yang Liu
  • Qing Guo

Multimodal Large Language Models (MLLMs) are becoming integral to autonomous driving (AD) systems due to their strong vision-language reasoning capabilities. However, MLLMs are vulnerable to adversarial attacks—particularly adversarial patch attacks—which can pose serious threats in real-world scenarios. Existing patch-based attack methods are primarily designed for object detection models. Due to the more complex architectures and strong reasoning capabilities of MLLMs, these approaches perform poorly when transferred to MLLM-based systems. To address these limitations, we propose PhysPatch, a physically realizable and transferable adversarial patch framework tailored for MLLM-based AD systems. PhysPatch jointly optimizes patch location, shape, and content to enhance attack effectiveness and real-world applicability. It introduces a semantic-based mask initialization strategy for realistic placement, an SVD-based local alignment loss with patch-guided crop-resize to improve transferability, and a potential field-based mask refinement method. Extensive experiments across open-source, commercial, and reasoning-capable MLLMs demonstrate that PhysPatch significantly outperforms state-of-the-art (SOTA) methods in steering MLLM-based AD systems toward target-aligned perception and planning outputs. Moreover, PhysPatch consistently places adversarial patches in physically feasible regions of AD scenes, ensuring strong real-world applicability and deployability.

AAAI Conference 2026 Conference Paper

ProtSAE: Disentangling and Interpreting Protein Language Models via Semantically-Guided Sparse Autoencoders

  • Xiangyu Liu
  • Haodi Lei
  • Yi Liu
  • Yang Liu
  • Wei Hu

Sparse Autoencoder (SAE) has emerged as a powerful tool for mechanistic interpretability of large language models. Recent works apply SAE to protein language models (PLMs), aiming to extract and analyze biologically meaningful features from their latent spaces. However, SAE suffers from semantic entanglement, where individual neurons often mix multiple nonlinear concepts, making it difficult to reliably interpret or manipulate model behaviors. In this paper, we propose a semantically-guided SAE, called ProtSAE. Unlike existing SAEs, which require annotation datasets to filter and interpret activations, we guide semantic disentanglement during training using both annotation datasets and domain knowledge to mitigate the effects of entangled attributes. We design interpretability experiments showing that ProtSAE learns more biologically relevant and interpretable hidden features compared to previous methods. Performance analyses further demonstrate that ProtSAE maintains high reconstruction fidelity while achieving better results in interpretable probing. We also show the potential of ProtSAE in steering PLMs for downstream generation tasks.
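A vanilla sparse autoencoder of the kind ProtSAE builds on can be sketched in a few lines: a ReLU code, a linear decoder, and a reconstruction-plus-L1 objective. Weights and widths below are arbitrary placeholders, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_hidden = 16, 64  # overcomplete code: d_hidden > d_model

W_enc = rng.normal(scale=0.1, size=(d_model, d_hidden))
W_dec = rng.normal(scale=0.1, size=(d_hidden, d_model))
b_enc = np.zeros(d_hidden)

def sae(x, l1=1e-3):
    """SAE forward pass: sparse ReLU code, linear decode, recon + L1 loss."""
    h = np.maximum(x @ W_enc + b_enc, 0.0)                 # non-negative code
    x_hat = h @ W_dec                                      # reconstruction
    loss = np.mean((x - x_hat) ** 2) + l1 * np.abs(h).mean()
    return x_hat, h, loss

x = rng.normal(size=(4, d_model))  # a batch of (here synthetic) activations
x_hat, h, loss = sae(x)
print(h.shape, float(loss) > 0)
```

The semantic guidance described in the abstract would enter as additional loss terms tying particular code dimensions to annotations; that part is not shown.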

AAAI Conference 2026 Conference Paper

RABot: Reinforcement-Guided Graph Augmentation for Imbalanced and Noisy Social Bot Detection

  • Longlong Zhang
  • Xi Wang
  • Haotong Du
  • Yangyi Xu
  • Zhuo Liu
  • Yang Liu

Social bot detection is pivotal for safeguarding the integrity of online information ecosystems. Although recent graph neural network (GNN) solutions achieve strong results, they remain hindered by two practical challenges: (i) severe class imbalance arising from the high cost of generating bots, and (ii) topological noise introduced by bots that skillfully mimic human behavior and forge deceptive links. We propose the Reinforcement-guided graph Augmentation social Bot detector (RABot), a multi-granularity graph-augmentation framework that addresses both issues in a unified manner. RABot employs a neighborhood-aware oversampling strategy that linearly interpolates minority-class embeddings within local subgraphs, thereby stabilizing the decision boundary under low-resource regimes. Concurrently, a reinforcement-learning-driven edge-filtering module combines similarity-based edge features with adaptive threshold optimization to excise spurious interactions during message passing, yielding a cleaner topology. Extensive experiments on three real-world benchmarks and four GNN backbones demonstrate that RABot consistently surpasses state-of-the-art baselines. In addition, since its augmentation and filtering modules are orthogonal to the underlying architecture, RABot can be seamlessly integrated into existing GNN pipelines to boost performance with minimal overhead.

AAAI Conference 2026 Conference Paper

ReACT: Reward-informed Autoregressive Decision CAD Transformer

  • Yijie Ding
  • Yang Liu
  • Haobo Jiang
  • Jianmin Zheng

Reconstructing precise CAD modeling sequences from point clouds remains a challenging task, especially for objects with complex geometry and topology. In this paper, by formulating the CAD sequence reconstruction as a Markov decision process, we introduce ReACT, a novel Reward-informed Autoregressive decision Cad Transformer architecture for robust CAD sequence prediction. Beyond previous imitation-only approaches, our key innovation is to frame the CAD Transformer under a reinforcement learning paradigm and thereby integrate reward-inspired heuristic learning into our architecture. This allows ReACT to effectively leverage shape-aware long-term reward feedback to guide the inference of (nearly) optimal CAD commands. Specifically, conditioned on past tokens, comprising the historical CAD states, sketch-extrude commands (i.e., actions) and associated geometric rewards, ReACT autoregressively outputs the most promising CAD commands in a causal manner. In particular, we develop a novel scaffold-aware CAD state representation that integrates global point-command features with an incrementally constructed surface point scaffold, enabling fine-grained geometric reasoning for subsequent reconstruction prediction. Moreover, an effective local barrel points-guided dense reward function is designed to jointly evaluate surface fidelity and command efficiency for reliable reward guidance. Extensive evaluations on the DeepCAD and Fusion360 benchmarks demonstrate that ReACT can achieve superior CAD reconstruction quality, even for objects with complex shapes.

AAAI Conference 2026 Conference Paper

Robust Learning from Noisily Labeled Long-Tailed Data via Fairness Regularizer

  • Jiaheng Wei
  • Zhaowei Zhu
  • Gang Niu
  • Tongliang Liu
  • Sijia Liu
  • Masashi Sugiyama
  • Yang Liu

Both long-tailed and noisily labeled data frequently appear in real-world applications and impose significant challenges for learning. Most prior works treat either problem in an isolated way and do not explicitly consider the coupling effects of the two. Our empirical observation reveals that such solutions fail to consistently improve the learning when the dataset is long-tailed with label noise. Moreover, with the presence of label noise, existing methods do not observe universal improvements across different sub-populations; in other words, some sub-populations enjoy improved accuracy at the cost of hurting others. Based on these observations, we introduce the Fairness Regularizer (FR), inspired by regularizing the performance gap between any two sub-populations. We show that the introduced fairness regularizer improves the performance of tail sub-populations as well as the overall learning performance. Extensive experiments demonstrate the effectiveness of the proposed solution when complemented with certain existing popular robust or class-balanced methods.
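The core idea of penalizing performance gaps between sub-populations can be sketched as follows. This is a simplified reading: the squared-gap penalty and the `lam` coefficient are assumptions for illustration, not the paper's exact regularizer:

```python
import numpy as np

def fairness_regularized_loss(losses, groups, lam=1.0):
    """Mean loss plus a penalty on the largest gap between group mean losses.

    losses: per-sample loss values; groups: sub-population id per sample.
    """
    group_means = [losses[groups == g].mean() for g in np.unique(groups)]
    gap = max(group_means) - min(group_means)
    return losses.mean() + lam * gap ** 2

losses = np.array([0.2, 0.3, 0.9, 1.1])  # head samples easy, tail samples hard
groups = np.array([0, 0, 1, 1])
print(fairness_regularized_loss(losses, groups, lam=0.5))  # 0.90625
```

Minimizing the penalty pushes the optimizer to shrink the head/tail gap instead of improving one group at the other's expense.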

AAAI Conference 2026 Conference Paper

Stabilizing Self-Consuming Diffusion Models with Latent Space Filtering

  • Zhongteng Cai
  • Yaxuan Wang
  • Yang Liu
  • Xueru Zhang

As synthetic data proliferates across the Internet, it is often reused to train successive generations of generative models. This creates a "self-consuming loop" that can lead to training instability or *model collapse*. Common strategies to address the issue---such as accumulating historical training data or injecting fresh real data---either increase computational cost or require expensive human annotation. In this paper, we empirically analyze the latent space dynamics of self-consuming diffusion models and observe that the low-dimensional structure of latent representations extracted from synthetic data degrades over generations. Based on this insight, we propose *Latent Space Filtering* (LSF), a novel approach that mitigates model collapse by filtering out less realistic synthetic data from mixed datasets. Theoretically, we present a framework that connects latent space degradation to empirical observations. Experimentally, we show that LSF consistently outperforms existing baselines across multiple real-world datasets, effectively mitigating model collapse without increasing training cost or relying on human annotation.
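The filtering idea can be illustrated by scoring synthetic latents against a PCA subspace fitted to real-data latents and discarding the worst-explained ones. This PCA residual criterion is an assumed stand-in for LSF's actual scoring rule:

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_subspace(real_latents, k=2):
    """Fit a rank-k PCA subspace to latents extracted from real data."""
    mu = real_latents.mean(axis=0)
    _, _, vt = np.linalg.svd(real_latents - mu, full_matrices=False)
    return mu, vt[:k]

def filter_indices(synth, mu, basis, keep_ratio=0.5):
    """Indices of synthetic latents best explained by the real-data subspace."""
    centered = synth - mu
    resid = centered - (centered @ basis.T) @ basis  # off-subspace component
    err = (resid ** 2).sum(axis=1)
    return np.sort(np.argsort(err)[: int(len(synth) * keep_ratio)])

B = rng.normal(size=(2, 8))                  # hidden low-dimensional structure
real = rng.normal(size=(200, 2)) @ B
good = rng.normal(size=(50, 2)) @ B          # synthetic, structure preserved
bad = rng.normal(size=(50, 8)) * 3.0         # synthetic, structure degraded
kept = filter_indices(np.vstack([good, bad]), *fit_subspace(real))
print(kept)  # retained indices; the structure-degraded half should be dropped
```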

JBHI Journal 2026 Journal Article

T2Net: Tongue Image-Based T2DM Detection via Simulated Clinical Diagnostic Reasoning

  • Yang Liu
  • Peiyu Liu
  • Yanyi Huang
  • Liyun Li
  • Xiaojie Feng
  • Miao Xie
  • Junhao Chen
  • Jiayu Ye

Clinical studies indicate that the progression of Type 2 Diabetes Mellitus (T2DM) is associated with characteristic alterations in tongue features, which may facilitate non-invasive early detection. However, current deep learning–based tongue imaging approaches for diabetes diagnosis remain constrained by limited datasets, subtle feature variations, dependence on clinical expertise, and the lack of quantitative evaluation. To address these issues, we developed an open-source dataset for T2DM tongue diagnosis (DMT) and benchmarked it using multiple baseline models. Building on DMT, we propose T2Net, a tongue image recognition model for T2DM that simulates the clinical diagnostic process. T2Net comprises four core components: local inspection, pathological clue integration, syndrome identification, and diagnostic confidence estimation. First, T2Net automatically extracts key ROIs by combining large-kernel decomposition with multi-scale learning. Then, a multi-order feature interaction module enables effective fusion of tongue image features across scales to capture pathological clues. Meanwhile, we design a context-aware dynamic aggregation convolution to model long-range dependencies, and propose a flexible focal loss to mimic the diagnostic reasoning process of clinicians, enabling brain-inspired inference. Finally, we propose a clustering-based confidence estimation approach to quantitatively evaluate the reliability of model predictions. Experimental results demonstrate that T2Net achieves highly competitive performance on the DMT dataset, outperforming the second-best baseline by 2.7% in accuracy and 2.0% in F1 score. Moreover, the quantitative evaluation scores are largely consistent with clinical assessments by physicians.

AAAI Conference 2026 Conference Paper

Temporal-Consistent Video Restoration with Pre-trained Diffusion Models

  • Hengkang Wang
  • Yang Liu
  • Huidong Liu
  • Chien-Chih Wang
  • Yanhui Guo
  • Hongdong Li
  • Bryan Wang
  • Ju Sun

Video restoration (VR) aims to recover high-quality videos from degraded ones. Although recent zero-shot VR methods using pre-trained diffusion models (DMs) show good promise, they suffer from approximation errors during reverse diffusion and insufficient temporal consistency. Moreover, since it must process 3D video data, VR is inherently computationally intensive. In this paper, we advocate viewing the reverse process in DMs as a function and present a novel Maximum a Posterior (MAP) framework that directly parameterizes video frames in the seed space of DMs, eliminating approximation errors. We also introduce strategies to promote bilevel temporal consistency: semantic consistency by leveraging clustering structures in the seed space, and pixel-level consistency by progressive warping with optical flow refinements. Extensive experiments on multiple video restoration tasks demonstrate superior visual quality and temporal consistency achieved by our method compared to the state-of-the-art.

AILAW Journal 2026 Journal Article

The trade-off between robustness and reliability in chinese legal large language models: an empirical study

  • Yang Liu
  • Xukai Liu
  • Haozhen Huang
  • Fanfei Yu
  • Tao Xiong
  • Xiaoyiqi Xia
  • Bohao You
  • Jinqi Wu

Legal large language models (LLMs) deployed in high-stakes judicial settings must exhibit robustness against non-substantive linguistic variations while preserving acute sensitivity to legally determinative facts and norms. This study investigates this robustness–reliability trade-off within the context of Chinese legal tasks. We curate a dataset of 5,000 Chinese judicial question–answer pairs and generate semantic-preserving adversarial rewrites, retaining only those validated by an embedding-based semantic consistency filter. Holding the total training budget and fine-tuning protocol constant, we fine-tune model variants that differ exclusively in their injection ratio of these verified rewrites, establishing seven distinct injection groups (G0–G6). We evaluate model reliability utilizing a composite protocol that incorporates objective accuracy on exam-style questions, expert evaluations of open-ended responses, and embedding-based semantic similarity. For trademark infringement reasoning tasks, we additionally assess verdict accuracy and rationale quality. Across varying model capacities (4B, 20B, and 32B backbones, including Qwen3-4B, GPT-OSS-20B, and Qwen3-VL-32B-Instruct) and both evaluated tasks, our findings reveal an inverted‑U relationship: moderate robustness data injection enhances reliability, whereas excessive injection degrades overall performance and induces characteristic failure modes, such as the attenuation of legally salient distinctions, the generation of boilerplate rationales, and overly cautious abstention. These findings substantiate “moderate robustness injection” as a practical heuristic and underscore a broader principle of differential sensitivity—achieving insensitivity to superficial variations without blunting the model’s sensitivity to legally decisive elements.

EAAI Journal 2026 Journal Article

Two-phase strategy framework for spatial prediction of landslide hazards in wide-area power linear engineering projects: the case of the China's Renewable Energy Transmission Corridors

  • Bijing Jin
  • Kunlong Yin
  • Taorui Zeng
  • Shuhao Liu
  • Yang Liu
  • Haoran Yang
  • Kai Wang
  • Lei Gui

A critical knowledge gap persists in the development of high-precision spatial prediction frameworks for landslide susceptibility assessment along wide-area linear power infrastructure. Therefore, this study develops a novel two-phase optimization framework to address this gap, focusing on China's Renewable Energy Transmission Corridors (RETCs). Phase Ⅰ employs natural breaks (optimal at 26-level grading) to address spatial heterogeneity in conditioning factors, while Phase Ⅱ optimizes the selection of non-landslide samples based on different geological environment zones and areas with lower susceptibility levels. Six base machine learning models were evaluated, with two ensemble models (Stacking and Blending) achieving superior performance, with Area Under the Curve (AUC) values exceeding 0.88. The Blending model demonstrated peak accuracy (AUC = 0.927), identifying 35% of transmission towers in high and very high susceptibility zones across nine provinces. The framework enables tower-specific susceptibility assessment, crucial for protecting China's 80,000 km transmission network. These findings advance RETC resilience by: (1) establishing a continuous conditioning-factor optimal grading strategy for linear infrastructure, (2) introducing a replicable non-landslide sample optimization protocol, and (3) demonstrating ensemble models' superiority in energy corridor landslide susceptibility mapping. This framework provides robust support for securing stable clean energy delivery, with potential applications in global renewable energy grid landslide hazard management.
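Generic blending of base-model predictions via held-out meta-weights can be sketched as follows. The non-negative least-squares meta-learner and the toy data are assumptions for illustration, far simpler than the paper's base learners and setup:

```python
import numpy as np

def blend(holdout_preds, holdout_y, test_preds):
    """Blending: fit meta-weights on a held-out split, apply them to test preds.

    holdout_preds / test_preds: (n_samples, n_models) base-model probabilities.
    """
    w, *_ = np.linalg.lstsq(holdout_preds, holdout_y, rcond=None)
    w = np.clip(w, 0.0, None)                       # keep weights non-negative
    w = w / w.sum() if w.sum() > 0 else np.full(len(w), 1.0 / len(w))
    return test_preds @ w

rng = np.random.default_rng(3)
y = rng.integers(0, 2, size=200).astype(float)
good = np.clip(y + rng.normal(scale=0.2, size=200), 0.0, 1.0)  # informative model
noise = rng.random(200)                                        # pure-noise model
preds = np.column_stack([good, noise])
blended = blend(preds[:100], y[:100], preds[100:])
acc = ((blended > 0.5) == y[100:]).mean()
print(round(acc, 2))  # the blend should lean on the informative model
```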

AAAI Conference 2026 Conference Paper

Visual-Friendly Concept Protection via Selective Adversarial Perturbations

  • Xiaoyue Mi
  • Fan Tang
  • You Wu
  • Juan Cao
  • Peng Li
  • Yang Liu

Personalized concept generation by tuning diffusion models with a few images raises potential legal and ethical concerns regarding privacy and intellectual property rights. Researchers attempt to prevent malicious personalization using adversarial perturbations. However, previous efforts have mainly focused on the effectiveness of protection while neglecting the visibility of perturbations. They utilize global adversarial perturbations, which introduce noticeable alterations to original images and significantly degrade visual quality. In this work, we propose the Visual-Friendly Concept Protection (VCPro) framework, which prioritizes the protection of key concepts chosen by the image owner through adversarial perturbations with lower perceptibility. To ensure these perturbations are as inconspicuous as possible, we introduce a relaxed optimization objective to identify the least perceptible yet effective adversarial perturbations, solved using the Lagrangian multiplier method. Qualitative and quantitative experiments validate that VCPro achieves a better trade-off between the visibility of perturbations and protection effectiveness, effectively prioritizing the protection of target concepts in images with less perceptible perturbations.

JBHI Journal 2026 Journal Article

Whole-Process Evolutionary Heterogeneity Analysis for Glioblastoma Radiotherapy Response Prediction

  • Yao Zheng
  • Dong Huang
  • Jie Wei
  • Tianci Liu
  • Xiaoting Wu
  • Yuefei Feng
  • Chengwei Chen
  • Yang Liu

Glioblastoma (GBM) is a highly heterogeneous tumor, and its radiotherapy is a complex and dynamic process. Traditional methods for predicting treatment response often rely on one or a few fixed time points, but this static approach may fail to capture the detailed, individualized changes occurring throughout the treatment process. To address these limitations, we propose a novel approach called the Evolutionary Heterogeneity Analysis Framework (EvoHAF), which integrates tumor heterogeneity and the whole-process evolution of GBM radiotherapy. Our framework introduces an Image Heterogeneity Encoder, designed to capture the intricate spatial heterogeneity based on tumor subregions. Additionally, the Temporal Self-Attention Module (TSAM) integrates longitudinal imaging data throughout the course of radiotherapy, capturing the evolving nature of the tumor. We further introduce a Compensated Prediction Head (CPH) that dynamically refines predictions throughout the patient's radiotherapy. Experimental results on a cross-center cohort, including an internal dataset of 112 patients and an external validation dataset of 80 patients, demonstrate that EvoHAF achieves strong performance. For internal 5-fold validation, the AUC was 0.8519±0.0583, and for external validation, the AUC was 0.7675±0.0858. These results demonstrate the model's capability to provide accurate whole-process predictions. Moreover, the model's credibility is reinforced by visual explanations at both 2D and 3D subregional levels, establishing trust in its decisions and laying a strong foundation for clinical applications.

EAAI Journal 2025 Journal Article

A lightweight detection algorithm for Camellia oleifera buds, stamens, and flowers using you only look once with large selective Kernel Network

  • Fei Long
  • Lijun Li
  • Yang Liu
  • Haifei Chen
  • Yuyan Zhang
  • Haorui Wang

To address the issues of occlusion and missed detections in small targets during the recognition of Camellia oleifera buds, stamens, and flowers and to improve recognition accuracy and computation speed, this study proposes a lightweight detection model based on YOLOv8s (You Only Look Once version 8 small). Firstly, we enhanced detection effectiveness by replacing the original YOLOv8s backbone with the Large Selective Kernel Network (LSKNet) and introducing the Minimum Point Distance Intersection over Union (MPD-IoU) loss function to accelerate convergence and improve recognition of overlapping targets. Secondly, for model lightweighting, we incorporated the Partial Convolution (PCConv) module to reduce parameters and floating-point operations (FLOPs) while enhancing feature representation. An additional detection head was added to improve small bud target detection. Experimental results show our improved model increased precision (P), recall (R), and mean average precision (mAP) by 0.3%, 1.2%, and 1.2% respectively over baseline YOLOv8s. Compared to mainstream models – YOLOv3-tiny, ScaledYOLOv4, YOLOv5s, YOLOv7, YOLOv8s, and Faster Region-Based Convolutional Neural Network (Faster R-CNN) – our model achieved mAP improvements of 1.8%, 1.5%, 1.4%, 2.1%, 1.2%, and 7.7% respectively. The optimized model demonstrates faster, more accurate identification of Camellia oleifera buds, stamens, and flowers, making it suitable for mobile deployment.

EAAI Journal 2025 Journal Article

A safe multi-agent reinforcement learning algorithm using constraint update projection approach

  • Yang Liu
  • Xiang Feng
  • Huiqun Yu

Traditional reinforcement learning has a major limitation: it optimizes an agent's policy purely to maximize reward, completely ignoring safety considerations. However, in certain critical engineering fields, ensuring safety is of utmost importance; otherwise, incalculable losses can result. Therefore, this paper proposes a safe Multi-Agent Constrained Update Projection (MACUP) algorithm, which can safely control agents to complete tasks. We approach this problem from the perspective of constrained policy optimization. Firstly, we derive new bounds on the multi-agent policy performance difference based on a tighter general policy performance difference. These bounds contain generalized advantage estimates, and we utilize them as surrogate functions for the objective and constraints. Secondly, to address the coordination issue among multiple agents, we employ a multi-agent sequential policy update framework. Thirdly, we use a projection method to optimize policies, which has low computational complexity and does not require a convex approximation of the surrogate function, helping to reduce errors. Finally, we validate our algorithm in two different multi-agent safety environments, and the results show that it satisfies safety constraints while achieving higher rewards.
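A single constrained-projection policy step can be illustrated by linearizing the cost constraint and projecting the reward-ascent update back onto it when violated. This is a generic sketch of the projection idea, not MACUP itself; all names and values are placeholders:

```python
import numpy as np

def projected_update(theta, reward_grad, cost_grad, cost_value, cost_limit, lr=0.1):
    """Reward-ascent step, projected onto the linearized cost constraint.

    Linearization: c(theta + d) ~ cost_value + cost_grad @ d <= cost_limit.
    If the plain step d violates this half-space, project d onto its boundary.
    """
    d = lr * reward_grad
    slack = cost_limit - cost_value
    if cost_grad @ d > slack:  # predicted constraint violation
        d = d - ((cost_grad @ d - slack) / (cost_grad @ cost_grad)) * cost_grad
    return theta + d

theta = np.zeros(3)
new_theta = projected_update(
    theta,
    reward_grad=np.array([1.0, 0.0, 0.0]),
    cost_grad=np.array([1.0, 0.0, 0.0]),  # reward and cost pull the same way
    cost_value=0.0,
    cost_limit=0.02,
)
print(new_theta)  # step is clipped so the predicted cost stays within the limit
```

Here the plain step of 0.1 along the first coordinate would push the linearized cost past the limit of 0.02, so the projection shrinks it to exactly 0.02.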

NeurIPS Conference 2025 Conference Paper

Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

  • Xiaojun Jia
  • Sensen Gao
  • Simeng Qin
  • Tianyu Pang
  • Chao Du
  • Yihao Huang
  • Xinfeng Li
  • Yiming Li

Multimodal large language models (MLLMs) remain vulnerable to transferable adversarial examples. While existing methods typically achieve targeted attacks by aligning global features—such as CLIP’s [CLS] token—between adversarial and target samples, they often overlook the rich local information encoded in patch tokens. This leads to suboptimal alignment and limited transferability, particularly for closed-source models. To address this limitation, we propose a targeted transferable adversarial attack method based on feature optimal alignment, called FOA-Attack, to improve adversarial transfer capability. Specifically, at the global level, we introduce a global feature loss based on cosine similarity to align the coarse-grained features of adversarial samples with those of target samples. At the local level, given the rich local representations within Transformers, we leverage clustering techniques to extract compact local patterns to alleviate redundant local features. We then formulate local feature alignment between adversarial and target samples as an optimal transport (OT) problem and propose a local clustering optimal transport loss to refine fine-grained feature alignment. Additionally, we propose a dynamic ensemble model weighting strategy to adaptively balance the influence of multiple models during adversarial example generation, thereby further improving transferability. Extensive experiments across various models demonstrate the superiority of the proposed method, outperforming state-of-the-art methods, especially in transferring to closed-source MLLMs.
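The global-level alignment term described above can be illustrated as a cosine loss between adversarial and target features; the local optimal-transport term on patch tokens is omitted here, and the feature vectors are toy stand-ins:

```python
import numpy as np

def cosine_alignment_loss(adv_feat, tgt_feat):
    """Global feature loss: 1 - cosine similarity between the adversarial
    example's embedding and the target's (e.g. CLIP [CLS] features).
    Minimizing it pulls the adversarial features toward the target's."""
    cos = adv_feat @ tgt_feat / (np.linalg.norm(adv_feat) * np.linalg.norm(tgt_feat))
    return 1.0 - cos

a = np.array([1.0, 0.0])
print(cosine_alignment_loss(a, np.array([1.0, 0.0])),  # aligned    -> 0.0
      cosine_alignment_loss(a, np.array([0.0, 1.0])))  # orthogonal -> 1.0
```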

EAAI Journal 2025 Journal Article

An efficient anchor-free model for ore particle size detection

  • Kanghui Zhang
  • Qingkai Wang
  • Guobin Zou
  • Jiawei Yang
  • Tao Song
  • Yang Liu
  • Daoxi Liu

Accurate detection of ore size is crucial in mineral processing, directly impacting equipment efficiency and product quality. However, traditional anchor-based models often struggle with the irregular shapes and varying scales of ore particles, resulting in limited performance. To overcome these challenges, an anchor-free detection framework was proposed. It incorporates a cross-stage partial bottleneck and a spatial pyramid pooling cross-stage partial connections (SPPCSCP-DualConv), both enhanced with dual convolution, to improve feature extraction and multi-scale fusion. In the backbone, the dual convolution module combines group convolution with heterogeneous convolution to improve feature diversity. The SPPCSCP-DualConv module further enhances feature representation in complex backgrounds. Additionally, a simplified path aggregation network (simPANet) feature fusion module is employed in the neck to refine the integration of multi-scale features. The proposed model was trained using a combination of binary cross-entropy, complete intersection over union (IoU), and distribution focal loss to optimize detection accuracy. The proposed model achieved a mean average precision of 86. 80 % at an IoU threshold of. 5 and 78. 50 % across IoU thresholds from. 5 to. 95, surpassing existing methods while maintaining a lightweight architecture with only 10. 10 million parameters and 89. 45 giga floating point operations per second. Ablation studies confirmed the effectiveness of the simPANet and SPPCSPC-DualConv modules in enhancing feature representation. Generalization tests across mining sites with similar distributions demonstrated strong performance, although limitations remain for exceptionally large ore blocks due to dataset bias. The proposed model significantly improved the accuracy and efficiency of ore particle size detection, providing reliable real-time insights to improve grinding control and mineral processing operations.

IROS Conference 2025 Conference Paper

Annotation-Free Curb Detection Leveraging Altitude Difference Image

  • Fulong Ma
  • Peng Hou
  • Yuxuan Liu 0008
  • Yang Liu
  • Ming Liu 0001
  • Jun Ma 0008

Road curbs are considered one of the crucial and ubiquitous traffic features, essential for ensuring the safety of autonomous vehicles. Current methods for detecting curbs primarily rely on camera imagery or LiDAR point clouds. Image-based methods are vulnerable to fluctuations in lighting conditions and exhibit poor robustness, while methods based on point clouds circumvent the issues associated with lighting variations. However, significant processing delays are typically encountered due to the voluminous number of 3D points contained in each frame of point cloud data. Furthermore, the inherently unstructured characteristics of point clouds pose challenges for integrating the latest deep learning advancements into point cloud data applications. To address these issues, this work proposes an annotation-free curb detection method leveraging the Altitude Difference Image (ADI) (as shown in Fig. 1), which effectively mitigates the aforementioned challenges. Given that methods based on deep learning generally demand extensive, manually annotated datasets, which are both expensive and labor-intensive to create, we present an Automatic Curb Annotator (ACA) module. This module utilizes a deterministic curb detection algorithm to automatically generate a vast quantity of training data. Consequently, it facilitates the training of the curb detection model without necessitating any manual annotation of data. Finally, by incorporating a post-processing module, we achieve state-of-the-art results on the KITTI 3D curb dataset [1] with considerably reduced processing delays compared to existing methods, which underscores the effectiveness of our approach in curb detection tasks. Our code and data will be open-sourced at: https://sites.google.com/view/adi-curb-detection.
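An altitude difference image is straightforward to sketch: rasterize the point cloud into a grid and store max(z) - min(z) per cell, so curb cells show a small, sharp altitude jump. Cell size and grid extent below are placeholders, not the paper's configuration:

```python
import numpy as np

def altitude_difference_image(points, cell=0.5, shape=(10, 10)):
    """Rasterize (x, y, z) points into a grid of per-cell altitude ranges."""
    zmin = np.full(shape, np.inf)
    zmax = np.full(shape, -np.inf)
    ix = (points[:, 0] / cell).astype(int)
    iy = (points[:, 1] / cell).astype(int)
    ok = (ix >= 0) & (ix < shape[0]) & (iy >= 0) & (iy < shape[1])
    for x, y, z in zip(ix[ok], iy[ok], points[ok, 2]):
        zmin[x, y] = min(zmin[x, y], z)
        zmax[x, y] = max(zmax[x, y], z)
    adi = np.zeros(shape)                 # empty cells stay at 0
    mask = np.isfinite(zmin)
    adi[mask] = (zmax - zmin)[mask]
    return adi

road = np.array([[1.2, 1.2, 0.0], [1.3, 1.3, 0.0]])  # flat road surface
curb = np.array([[1.2, 1.2, 0.15]])                  # 15 cm step in the same cell
adi = altitude_difference_image(np.vstack([road, curb]))
print(adi.max())  # the curb cell carries the altitude jump
```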

IROS Conference 2025 Conference Paper

Articulation-Gen: 3D Part Segmentation and Articulated Object Generation

  • Zhuoqun Xu
  • Yang Liu

Recent advances in 3D content generation, particularly 3D Gaussian Splatting (3DGS) and diffusion models, have significantly improved the synthesis of static shapes and textures. However, the modeling of dynamic articulations remains a significant challenge. Existing datasets lack physics-aware joint annotations, segmentation methods overlook kinematic constraints, and procedural generation techniques often prioritize space coverage over physical plausibility and visual realism. Motivated by these challenges, we propose Articulation-Gen, a scalable and robust framework for generating physically compliant, multi-joint 3D objects. Our approach comprises three components: (1) a 3D semantic segmentation module that integrates 2D visual models (SAM2 and DINO) to achieve 91.4% part segmentation accuracy by resolving occlusions via multi-view fusion with semantic consistency; (2) a physics-guided joint optimizer that combines spatial sampling with heuristic search to reach 93.7% axis alignment accuracy, representing a 20.6% improvement; and (3) an LLM-augmented URDF synthesis mechanism that automatically produces physically plausible kinematic descriptions with language annotations, thereby improving generation accuracy by 87.5%. Leveraging existing 3D asset datasets and generation techniques, we further construct a large-scale articulation asset dataset comprising 10.6K articulated objects with 45.2K validated joints. This dataset enables faster articulated asset generation while ensuring URDF compliance. By proposing our pipeline and dataset, this work provides foundational tools for physics-based computer graphics and embodied AI, advancing the frontiers of 3D content creation and robotic simulation.

AAAI Conference 2025 Conference Paper

Asymmetric Visual Semantic Embedding Framework for Efficient Vision-Language Alignment

  • Yang Liu
  • Mengyuan Liu
  • Shudong Huang
  • Jiancheng Lv

Learning visual semantic similarity is a critical challenge in bridging the gap between images and texts. However, there exist inherent variations between vision and language data, such as information density, i.e., images can contain textual information from multiple different views, which makes it difficult to compute the similarity between these two modalities accurately and efficiently. In this paper, we propose a novel framework called Asymmetric Visual Semantic Embedding (AVSE) to dynamically select features from various regions of images tailored to different textual inputs for similarity calculation. To capture information from different views in the image, we design a radial bias sampling module to sample image patches and obtain image features from various views. Furthermore, AVSE introduces a novel module for efficient computation of visual semantic similarity between asymmetric image and text embeddings. Central to this module is the presumption of foundational semantic units within the embeddings, denoted as ``meta-semantic embeddings." It segments all embeddings into meta-semantic embeddings of the same dimension and calculates visual semantic similarity by finding the optimal match between the meta-semantic embeddings of the two modalities. Our proposed AVSE model is extensively evaluated on the large-scale MS-COCO and Flickr30K datasets, demonstrating its superiority over recent state-of-the-art methods.
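The meta-semantic matching idea in this abstract can be sketched minimally: split each embedding into fixed-size units and score similarity by pairing each text unit with its best-matching image unit. The unit dimension and the max-then-mean aggregation here are assumptions for illustration; the paper's exact matching scheme may differ.

```python
import numpy as np

def meta_semantic_similarity(img_emb, txt_emb, unit_dim=64):
    """Split both embeddings into equal-size 'meta-semantic' units and score
    similarity by matching each text unit to its best image unit."""
    img_units = img_emb.reshape(-1, unit_dim)
    txt_units = txt_emb.reshape(-1, unit_dim)
    # L2-normalize each unit so the dot product is a cosine similarity
    img_units = img_units / np.linalg.norm(img_units, axis=1, keepdims=True)
    txt_units = txt_units / np.linalg.norm(txt_units, axis=1, keepdims=True)
    sims = txt_units @ img_units.T          # (n_txt_units, n_img_units)
    return float(sims.max(axis=1).mean())   # best match per text unit, averaged
```

Note the asymmetry: the image embedding may contain more units than the text embedding, and only the best-matching image regions contribute to the score.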

NeurIPS Conference 2025 Conference Paper

Beyond the Surface: Enhancing LLM-as-a-Judge Alignment with Human via Internal Representations

  • Peng Lai
  • Jianjie Zheng
  • Sijie Cheng
  • Yun Chen
  • Peng Li
  • Yang Liu
  • Guanhua Chen

The growing scale of evaluation tasks has led to the widespread adoption of automated evaluation using LLMs, a paradigm known as “LLM-as-a-judge”. However, improving its alignment with human preferences without complex prompts or fine-tuning remains challenging. Previous studies mainly optimize based on shallow outputs, overlooking rich cross-layer representations. In this work, motivated by preliminary findings that middle-to-upper layers encode semantic and task-relevant representations that are often more aligned with human judgments than the final layer, we propose LAGER, a post-hoc, plug-and-play framework for improving the alignment of LLM-as-a-Judge point-wise evaluations with human scores by leveraging internal representations. LAGER produces fine-grained judgment scores by aggregating cross-layer score-token logits and computing the expected score from a softmax-based distribution, while keeping the LLM backbone frozen and ensuring no impact on the inference process. LAGER fully leverages the complementary information across different layers, overcoming the limitations of relying solely on the final layer. We evaluate our method on the standard alignment benchmarks Flask, HelpSteer, and BIGGen using Spearman correlation, and find that LAGER achieves improvements of up to 7.5% over the best baseline across these benchmarks. Without reasoning steps, LAGER matches or outperforms reasoning-based methods. Experiments on downstream applications, such as data selection and emotional understanding, further show the generalization of LAGER.
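The cross-layer aggregation and softmax-expected-score computation described in this abstract can be sketched as follows; the uniform layer weighting and the 1-5 score range are illustrative assumptions, not details from the paper.

```python
import numpy as np

def lager_score(layer_logits, layer_weights=None):
    """Aggregate per-layer logits over the score tokens (e.g. '1'..'5') and
    return the expected score under the resulting softmax distribution.
    layer_logits: array of shape (n_layers, n_score_tokens)."""
    n_layers, n_scores = layer_logits.shape
    w = (np.ones(n_layers) / n_layers if layer_weights is None
         else np.asarray(layer_weights))
    agg = (w[:, None] * layer_logits).sum(axis=0)  # weighted cross-layer fusion
    p = np.exp(agg - agg.max())                    # numerically stable softmax
    p /= p.sum()
    scores = np.arange(1, n_scores + 1)
    return float(p @ scores)                       # fine-grained expected score
```

Because this only reads logits that the forward pass already produces at each layer, the backbone stays frozen and inference is unchanged, matching the post-hoc, plug-and-play framing above.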

AAAI Conference 2025 Conference Paper

Can Large Language Models Derive High-Level Cognition from Low-Level and Fragmented Foundational Information?

  • Yang Liu
  • Xiaoping Wang
  • Kai Lu

As one of the key technologies leading to Artificial General Intelligence (AGI), Large Language Models (LLMs) have achieved remarkable accomplishments. Exploring the capabilities of LLMs is crucial for scientific research, and many studies propose new challenges from various aspects to explore the boundaries of capabilities in LLMs. This paper attempts to push the challenges of information understanding, synthesis, and reasoning to the extreme, in order to explore the boundaries of more advanced dimensions of cognitive capability in LLMs. We define the task of High-Level Cognition (HLC), which involves deriving high-level conclusions from low-level and fragmented foundational information. To evaluate HLC, we construct a dataset based on soccer matches. Experiments and analysis on this dataset show that current state-of-the-art LLMs lack the ability to effectively solve the HLC task, as their performance is no better than random. However, fine-tuning Llama3-8B-Instruct yields improvements of 14.4%, 48.1%, and 19.4% over random level in three types of evaluation tasks. This indicates that LLMs have great potential to solve the HLC task.

JBHI Journal 2025 Journal Article

Collaborative Learning Macroscopic Binding Trends and Microscopic Residue Interactions to Predict Peptide-Protein Interactions

  • Li Zeng
  • Yang Liu
  • Zu-guo Yu
  • Guosheng Han
  • Yuansheng Liu

Short peptides and their structural modifications have demonstrated significant potential in therapeutic drug development. During the research and development process, peptide-protein interaction plays a crucial role in screening highly effective peptides. Although traditional experimental methods can identify peptide-protein interactions, their time-consuming and resource-intensive nature has driven researchers to develop various computational alternatives. In addition, accurately predicting these interactions requires capturing both the macroscopic molecular binding affinity and the precise interaction patterns at the microscopic residue level. Existing computational methods face limitations, as they are typically confined to modeling at a single level, resulting in restricted prediction accuracy. To address this gap, we propose MMPepPro, a dual-level biofeature collaborative interaction learning framework that integrates macro-level binding trends with micro-level residue interaction features. Trained on 19,187 peptide-protein complexes, MMPepPro combines molecular-level and amino-acid-level features to achieve comprehensive modeling. Experimental validation demonstrates the model's superior performance across all evaluation metrics compared to other state-of-the-art methods in peptide-protein interaction prediction. More notably, its generalization performance across four other datasets validates the universality of this method, which will aid the development of peptide-protein drugs.

JAIR Journal 2025 Journal Article

Combinatorial Multi-Armed Bandits with Fairness Constraints: An Online Convex Optimization Perspective

  • Xiaosong Chen
  • Hanqin Zhuang
  • Yang Liu
  • Huanle Xu
  • Wing Cheong Lau

The problem of multi-armed bandit (MAB) with fairness constraints has emerged as an important research topic recently. For such problems, one common objective is to maximize the total rewards within a fixed number of pull rounds, while satisfying the fairness requirement of a minimum selection fraction for each individual arm in the long run. Previous works have made substantial advancements in designing various online selection solutions for MAB; however, when incorporating such fairness constraints, they fail to achieve a sublinear regret bound. In this paper, we study a combinatorial MAB problem with a concave objective and fairness constraints. In particular, we design a new selection algorithm that solves MAB problems from an online convex optimization perspective. Our algorithm is computationally efficient and, more importantly, achieves a sublinear regret bound of O(√T ln T) with high-probability guarantees over T selection rounds. We also extend our framework to include more general knapsack constraints. Finally, we assess the performance of our algorithm through extensive simulations and real dataset applications, demonstrating its significant advantages over baseline schemes.

NeurIPS Conference 2025 Conference Paper

Compiler-R1: Towards Agentic Compiler Auto-tuning with Reinforcement Learning

  • Haolin Pan
  • Hongyu Lin
  • Haoran Luo
  • Yang Liu
  • Kaichun Yao
  • Libo Zhang
  • Mingjie Xing
  • Yanjun Wu

Compiler auto-tuning optimizes pass sequences to improve performance metrics such as Intermediate Representation (IR) instruction count. Although recent advances leveraging Large Language Models (LLMs) have shown promise in automating compiler tuning, two significant challenges remain: the absence of high-quality reasoning datasets for agent training, and limited effective interaction with the compilation environment. In this work, we introduce Compiler-R1, the first reinforcement learning (RL)-driven framework specifically augmenting LLM capabilities for compiler auto-tuning. Compiler-R1 features a curated, high-quality reasoning dataset and a novel two-stage end-to-end RL training pipeline, enabling efficient environment exploration and learning through an outcome-based reward. Extensive experiments across seven datasets demonstrate that Compiler-R1 achieves an average 8.46% IR instruction count reduction compared to opt -Oz, showcasing the strong potential of RL-trained LLMs for compiler optimization. Our code and datasets are publicly available at https://github.com/Panhaolin2001/Compiler-R1.

NeurIPS Conference 2025 Conference Paper

CPSea: Large-scale cyclic peptide-protein complex dataset for machine learning in cyclic peptide design

  • Ziyi Yang
  • Hanyuan Xie
  • Yinjun Jia
  • Xiangzhe Kong
  • Jiqing Zheng
  • Ziting Zhang
  • Yang Liu
  • Lei Liu

Cyclic peptides exhibit better binding affinity and proteolytic stability compared to their linear counterparts. However, the development of cyclic peptide design models is hindered by the scarcity of data. To address this, we introduce **CPSea** (**C**yclic **P**eptide **Sea**), a dataset of 2.71 million cyclic peptide-receptor complexes, curated through systematic mining of the AlphaFold Database (AFDB). Our pipeline extracts compact domains from AFDB, identifies cyclization sites using $\beta$-carbon (C$_\beta$) distance thresholds, and applies multi-stage filtering to ensure structural fidelity and binding compatibility. Compared with experimental data on cyclic peptides, CPSea shows similar distributions in metrics of structural fidelity and wet-lab compatibility. To our knowledge, CPSea is the largest cyclic peptide-receptor dataset to date, enabling end-to-end model training for the first time. The dataset also showcases the feasibility of simulating inter-chain interactions using intra-chain interactions, expanding available resources for machine-learning models of protein-protein interactions. The dataset and relevant scripts are accessible on GitHub ([https://github.com/YZY010418/CPSea](https://github.com/YZY010418/CPSea)).
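The C-beta distance-threshold step mentioned in this abstract amounts to flagging residue pairs whose side-chain anchor atoms are close enough to be bridged. A minimal sketch follows; the distance window and minimum sequence separation are hypothetical values for illustration, not the paper's thresholds.

```python
import numpy as np

def cyclization_candidates(cbeta_coords, min_dist=4.0, max_dist=7.0, min_sep=5):
    """Flag residue pairs whose C-beta atoms fall within a distance window,
    making them candidate cyclization sites.
    cbeta_coords: array of shape (n_residues, 3)."""
    n = len(cbeta_coords)
    pairs = []
    for i in range(n):
        for j in range(i + min_sep, n):  # require sequence separation
            d = float(np.linalg.norm(cbeta_coords[i] - cbeta_coords[j]))
            if min_dist <= d <= max_dist:
                pairs.append((i, j, d))
    return pairs
```

In a mining pipeline like the one described, such candidate pairs would then pass through the multi-stage structural and binding-compatibility filters before a domain is accepted as a pseudo-cyclic complex.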

EAAI Journal 2025 Journal Article

Cross-domain fault diagnosis of marine diesel engines based on stepwise diffusion and iterative bidirectional optimization

  • Zhen Zhao
  • Ziru Jin
  • Xin Xin
  • Yutong Fu
  • Xiaotong Huang
  • Liang Li
  • Hongyan Qin
  • Chong Wei

Cross-domain fault diagnosis of marine diesel engines presents significant challenges due to variations in data distribution and the limited availability of labeled fault samples under different operating conditions. To address this, an unsupervised domain-adaptive diagnostic framework is proposed, integrating stepwise diffusion and iterative bidirectional optimization to enhance fault identification. First, the quadratic axial attention transformer introduces a fourth weight in the axial computation to effectively capture the long-range spatio-temporal correlations in the time–frequency representations and strengthen the cross-axis contextual dependence. Next, the domain stepwise diffusion bridge utilizes a Markov transform to gradually refine the significant distributional differences across domains into continuous sub-distributions, ensuring a smoother adaptation process. Finally, an iterative bidirectional optimization strategy is proposed to dynamically coordinate the interaction between stepwise diffusion and fault classification, where two complementary learning directions are alternately executed to preserve the semantic integrity of features. Experimental validation on a self-constructed dataset covering multiple operating conditions demonstrates the effectiveness of the proposed approach, achieving 93.80% average accuracy, 93.75% precision, and 93.45% recall. This approach not only breaks through the limitations of existing domain alignment methods, providing a new solution for cross-domain fault diagnosis, but also offers broad implications for future research and applications in this field. The code and model are available at: https://github.com/lazyJzr/UDAtask.

NeurIPS Conference 2025 Conference Paper

DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding

  • Yue Jiang
  • Jichu Li
  • Yang Liu
  • Dingkang Yang
  • Feng Zhou
  • Quyu Kong

We introduce DanmakuTPPBench, a comprehensive benchmark designed to advance multi-modal Temporal Point Process (TPP) modeling in the era of Large Language Models (LLMs). While TPPs have been widely studied for modeling temporal event sequences, existing datasets are predominantly unimodal, hindering progress in models that require joint reasoning over temporal, textual, and visual information. To address this gap, DanmakuTPPBench comprises two complementary components: (1) DanmakuTPP-Events, a novel dataset derived from the Bilibili video platform, where user-generated bullet comments (Danmaku) naturally form multi-modal events annotated with precise timestamps, rich textual content, and corresponding video frames; (2) DanmakuTPP-QA, a challenging question-answering dataset constructed via a novel multi-agent pipeline powered by state-of-the-art LLMs and multi-modal LLMs (MLLMs), targeting complex temporal-textual-visual reasoning. We conduct extensive evaluations using both classical TPP models and recent MLLMs, revealing significant performance gaps and limitations in current methods’ ability to model multi-modal event dynamics. Our benchmark establishes strong baselines and calls for further integration of TPP modeling into the multi-modal language modeling landscape. Project page: https://github.com/FRENKIE-CHIANG/DanmakuTPPBench.

NeurIPS Conference 2025 Conference Paper

DAPO: Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage-Based Policy Optimization

  • Jiacai Liu
  • Chaojie Wang
  • Chris Liu
  • Liang Zeng
  • Rui Yan
  • Yiwen Sun
  • Yang Liu

The role of reinforcement learning (RL) in enhancing the reasoning of large language models (LLMs) is becoming increasingly significant. Despite the success of RL in many scenarios, many challenges remain in improving the reasoning of LLMs. One key challenge is the sparse reward, which introduces more training variance into policy optimization and makes it difficult to obtain a good estimate of the value function in Actor-Critic (AC) methods. To address these issues, we introduce Direct Advantage-Based Policy Optimization (DAPO), a novel step-level offline RL algorithm with theoretical guarantees for enhancing the reasoning abilities of LLMs. Unlike response-level methods (such as DPO and GRPO), in which the update directions of all reasoning steps are governed uniformly by the outcome reward, DAPO employs a critic function to provide step-level dense signals for policy optimization. Additionally, the actor and critic in DAPO are trained independently, ensuring that the critic is a good estimate of the true state value function and avoiding the co-training instability observed in standard AC methods. We train DAPO on mathematical and code problems and then evaluate its performance on multiple benchmarks. Our results show that DAPO can effectively enhance the mathematical and code capabilities of both SFT models and RL models, demonstrating the effectiveness of DAPO.
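The contrast drawn in this abstract, dense step-level signals from a critic versus one uniform outcome reward, can be illustrated with a standard one-step advantage estimate along a reasoning trajectory whose only reward arrives at the end. This is a generic textbook estimator used for illustration, not the paper's exact formulation.

```python
def step_advantages(values, final_reward, gamma=1.0):
    """One-step advantages A_t = r_t + gamma * V(s_{t+1}) - V(s_t) along a
    trajectory with a single outcome reward at the final step.
    values: critic estimates V(s_0), ..., V(s_{T-1})."""
    T = len(values)
    adv = []
    for t in range(T):
        # Intermediate steps bootstrap from the next state's value;
        # the terminal step sees the outcome reward and no successor.
        next_v = values[t + 1] if t + 1 < T else 0.0
        r_t = final_reward if t == T - 1 else 0.0
        adv.append(r_t + gamma * next_v - values[t])
    return adv
```

Each step thus gets its own signed update signal (a step that raises the critic's value estimate gets positive advantage), rather than every step inheriting the trajectory's single outcome reward.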

EAAI Journal 2025 Journal Article

Data-driven joint multiobjective prediction and optimization for tunnel-induced adjacent bridge pier displacement: A case study in China

  • Hongyu Chen
  • Jun Liu
  • Qiping Geoffrey Shen
  • Tiejun Li
  • Yang Liu

To reduce the impact of tunnel construction on adjacent bridge pile foundations and ensure safety during construction, a hybrid intelligent framework combining Bayesian optimization (BO), categorical boosting (CatBoost), and the nondominated sorting genetic algorithm-III (NSGA-III) is proposed in this paper. The nonlinear mapping between the nine input parameters and the vertical and horizontal displacements of the bridge pier is established via BO-CatBoost. The key optimization parameters are determined via the Shapley additive explanations (SHAP) method, which also provides interpretability analysis. NSGA-III is applied with the goal of minimizing pier displacement. The applicability and validity of the proposed method are tested in a case study of the Wuhan Metro. The key findings of this study include the following. (1) The prediction model obtained by the BO-CatBoost algorithm, trained on measured engineering data, is highly accurate. On the bridge pier horizontal and vertical displacement test sets, the R2 values are 0.823 and 0.826, the RMSE values are 0.452 and 0.539, and the MAEs are 0.293 and 0.360, respectively. (2) The optimization effect on the two objectives is significant, with an average improvement of 35.54%. When five shield construction parameters are adjusted simultaneously, the optimization effect on the two objectives is best, with an average improvement of 54.76%. (3) The optimization effect of the developed BO-CatBoost-NSGA-III intelligent algorithm is greater than that of single-objective optimization. Therefore, the intelligent optimization framework proposed in this paper can provide guidance for the optimal control of pier displacement in shield underpass construction engineering.

AAAI Conference 2025 Conference Paper

DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification

  • Yuhao Wang
  • Yang Liu
  • Aihua Zheng
  • Pingping Zhang

Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by combining complementary information from multiple modalities. Existing multi-modal object ReID methods primarily focus on the fusion of heterogeneous features. However, they often overlook the dynamic quality changes in multi-modal imaging. In addition, the shared information between different modalities can weaken modality-specific information. To address these issues, we propose a novel feature learning framework called DeMo for multi-modal object ReID, which adaptively balances decoupled features using a mixture of experts. To be specific, we first deploy a Patch-Integrated Feature Extractor (PIFE) to extract multi-granularity and multi-modal features. Then, we introduce a Hierarchical Decoupling Module (HDM) to decouple multi-modal features into non-overlapping forms, preserving the modality uniqueness and increasing the feature diversity. Finally, we propose an Attention-Triggered Mixture of Experts (ATMoE), which replaces traditional gating with dynamic attention weights derived from decoupled features. With these modules, our DeMo can generate more robust multi-modal features. Extensive experiments on three object ReID benchmarks verify the effectiveness of our methods.

NeurIPS Conference 2025 Conference Paper

DepthVanish: Optimizing Adversarial Interval Structures for Stereo-Depth-Invisible Patches

  • Yun Xing
  • Yue Cao
  • Nhat Chung
  • Jie Zhang
  • Ivor Tsang
  • Ming-Ming Cheng
  • Yang Liu
  • Lei Ma

Stereo depth estimation is a critical task in autonomous driving and robotics, where inaccuracies (such as misidentifying nearby objects as distant) can lead to dangerous situations. Adversarial attacks against stereo depth estimation can help reveal vulnerabilities before deployment. Previous works have shown that repeating optimized textures can effectively mislead stereo depth estimation in digital settings. However, our research reveals that these naively repeated textures perform poorly in physical implementations, $\textit{i.e.}$, when deployed as patches, limiting their practical utility for stress-testing stereo depth estimation systems. In this work, for the first time, we discover that introducing regular intervals among the repeated textures, creating a grid structure, significantly enhances the patch attack performance. Through extensive experimentation, we analyze how variations of this novel structure influence the adversarial effectiveness. Based on these insights, we develop a novel stereo depth attack that jointly optimizes both the interval structure and texture elements. Our generated adversarial patches can be inserted into any scene and successfully attack advanced stereo depth estimation methods of different paradigms, $\textit{i.e.}$, RAFT-Stereo and STTR. Most critically, our patch can also attack commercial RGB-D cameras (Intel RealSense) in real-world conditions, demonstrating their practical relevance for security assessment of stereo systems. The code is officially released at: https://github.com/WiWiN42/DepthVanish

EAAI Journal 2025 Journal Article

Development of data-driven predictive model and enhanced multiobjective optimization to improve the excavation performance of large-diameter slurry shields

  • Feiming Su
  • Xianguo Wu
  • Tiejun Li
  • Yang Liu

Safety, efficiency, and energy consumption are important aspects for evaluating the performance of large-diameter slurry shields, and improving shield performance is crucial for safe and efficient excavation. To this end, a data-driven hybrid method is developed to improve the excavation performance of large-diameter slurry shields by intelligently regulating shield parameters. This method combines Bayesian optimization with categorical boosting (BO-CatBoost) and an enhanced multiobjective evolutionary algorithm based on decomposition (EMOEA/D). The method uses surface settlement, penetration, and specific energy as output targets and employs expert knowledge to select the input parameters. Subsequently, the trained BO-CatBoost model is employed to fit the input-output relationship. On this basis, the multiobjective optimization process is performed using EMOEA/D, with the important parameters determined by Shapley Additive exPlanations as decision variables and the nonlinear relationship fitted by BO-CatBoost as the objective function. Finally, the technique for order preference by similarity to ideal solution is applied to obtain optimal operational parameters, thereby enhancing the excavation performance of large-diameter slurry shields. The proposed method is applied to a Wuhan rail transit line to verify its effectiveness, and the results show that: (1) Our method can accurately predict the three targets, with goodness of fit ranging from 0.938 to 0.988. (2) The proposed method can effectively improve the excavation performance of the large-diameter slurry shield, with improvements reaching 13.88%, 5.21%, and 10.88% for the three targets, respectively. (3) An adaptive decision-making system for setting operational parameters is constructed, which is valuable for formulating operational control strategies for large-diameter slurry shields.

AAAI Conference 2025 Conference Paper

DoGA: Enhancing Grounded Object Detection via Grouped Pre-Training with Attributes

  • Yang Liu
  • Feng Hou
  • Yunjie Peng
  • Gangjian Zhang
  • Yao Zhang
  • Dong Xie
  • Peng Wang
  • Yang Zhang

Recent advances in vision-language pre-training have significantly enhanced model capabilities on grounded object detection. However, these studies often pre-train with coarse-grained text prompts, such as plain category names and brief grounded phrases. This limitation curtails the model's capacity for fine-grained linguistic comprehension and leads to a significant decline in performance when faced with detailed descriptions or contextual information. To tackle these problems, we develop DoGA: Detect objects with Grouped Attributes, which employs commonly apparent attributes to bridge different granular semantics and uses specific attributes to identify object discrepancies. Our DoGA incorporates three principal components: 1) Generation of attribute-based prompts, consisting of linguistic definitions enriched with common-sense visible attributes and hard negative notations deriving from the image-specific attribute features; 2) Paralleled entity fusion and optimization, designed to manage long attribute-based descriptions and negative concepts efficiently; and 3) Prompt-wise grouped training to accommodate the model to perform many-to-many assignments, facilitating simultaneous training and inferring with multiple attribute-based synonyms. Extensive experiments demonstrate that training with synonymous attribute-based prompts allows DoGA to generalize to multi-granular prompts and surpass previous state-of-the-art approaches, yielding 50.2 on the COCO and 38.0 on the LVIS benchmarks under the zero-shot setting. We will make our code publicly available upon acceptance.

AAAI Conference 2025 Conference Paper

Dynamic Graph Learning with Static Relations for Credit Risk Assessment

  • Qi Yuan
  • Yang Liu
  • Yateng Tang
  • Xinhuan Chen
  • Xuehao Zheng
  • Qing He
  • Xiang Ao

Credit risk assessment has increasingly become a prominent research field due to the dramatic increase in financial default incidents. Traditional graph-based methods have been developed to detect defaulters within user-merchant commercial payment networks. However, these methods face challenges in detecting complex risks, primarily due to their neglect of user-to-user fund transfer interactions and the under-utilization of temporal information. In this paper, we propose a novel framework named Dynamic Graph Neural Network with Static Relations (DGNN-SR) for credit risk assessment, which can encode the dynamic transaction graph and the static fund transfer graph simultaneously. To fully harness the temporal information, DGNN-SR employs a multi-view time encoder to explore the semantics of both relative and absolute time. To enhance the dynamic representations with static relations, we devise an adaptive re-weighting strategy to incorporate the static relations into the dynamic representations of the time encoder, which extracts more discriminative features for risk assessment. Extensive experiments on two real-world business datasets demonstrate that our proposed method achieves a 0.85% - 2.5% improvement over existing SOTA methods.

TMLR Journal 2025 Journal Article

Enhancing Parameter Efficiency and Generalization in Large Models: A Regularized and Masked Low-Rank Adaptation Approach

  • Yuzhu Mao
  • Zihao Zhao
  • Siqi Ping
  • Yang Liu
  • Wenbo Ding

Large pre-trained models, such as large language models (LLMs), present significant resource challenges for fine-tuning due to their extensive parameter sizes, especially for applications in mobile systems. To address this, Low-Rank Adaptation (LoRA) has been developed to reduce resource consumption while maintaining satisfactory fine-tuning results. Despite its effectiveness, the original LoRA method faces the challenge of suboptimal performance. This paper investigates the intrinsic dimension of the matrix updates approximated by the LoRA method and reveals the performance benefits of increasing this intrinsic dimension. By employing regularization and a gradient masking method that encourages higher intrinsic dimension, the proposed method, termed Regularized and Masked LoRA (RM-LoRA), achieves superior generalization performance with the same or lower trainable parameter budget compared to the original LoRA and its latest variants across various open-source vision and language datasets.
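The two ingredients named in this abstract, a low-rank update and gradient masking, can be sketched minimally as follows. The masking rule below (update a random subset of entries each step) is a hypothetical simplification of RM-LoRA's mechanism, and all shapes and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def lora_delta(A, B):
    """LoRA approximates a weight update as a low-rank product:
    Delta W = B @ A, with B in R^{d x r}, A in R^{r x k}, r << min(d, k)."""
    return B @ A

def rm_lora_step(A, grad_A, lr=1e-2, keep_prob=0.5, rng=rng):
    """Gradient masking: each step updates only a random subset of entries.
    The paper argues that masking (together with regularization) encourages a
    higher intrinsic dimension of the learned update; this sketch only shows
    the masked-update shape, not the authors' exact rule."""
    mask = (rng.random(A.shape) < keep_prob).astype(A.dtype)
    return A - lr * grad_A * mask
```

The rank of `lora_delta(A, B)` can never exceed `r`, which is what keeps the trainable-parameter budget small regardless of how the entries are masked.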

NeurIPS Conference 2025 Conference Paper

Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games

  • Runyu Lu
  • Peng Zhang
  • Ruochuan Shi
  • Yuanheng Zhu
  • Dongbin Zhao
  • Yang Liu
  • Dong Wang
  • Cesare Alippi

Equilibrium learning in adversarial games is an important topic widely examined in the fields of game theory and reinforcement learning (RL). Pursuit-evasion game (PEG), as an important class of real-world games from the fields of robotics and security, requires exponential time to be accurately solved. When the underlying graph structure varies, even the state-of-the-art RL methods require recomputation or at least fine-tuning, which can be time-consuming and impair real-time applicability. This paper proposes an Equilibrium Policy Generalization (EPG) framework to effectively learn a generalized policy with robust cross-graph zero-shot performance. In the context of PEGs, our framework is generally applicable to both pursuer and evader sides in both no-exit and multi-exit scenarios. These two generalizability properties, to our knowledge, are the first to appear in this domain. The core idea of the EPG framework is to train an RL policy across different graph structures against the equilibrium policy for each single graph. To construct an equilibrium oracle for single-graph policies, we present a dynamic programming (DP) algorithm that provably generates pure-strategy Nash equilibrium with near-optimal time complexity. To guarantee scalability with respect to pursuer number, we further extend DP and RL by designing a grouping mechanism and a sequence model for joint policy decomposition, respectively. Experimental results show that, using equilibrium guidance and a distance feature proposed for cross-graph PEG training, the EPG framework guarantees desirable zero-shot performance in various unseen real-world graphs. Besides, when trained under an equilibrium heuristic proposed for the graphs with exits, our generalized pursuer policy can even match the performance of the fine-tuned policies from the state-of-the-art PEG methods.

ICLR Conference 2025 Conference Paper

Erasing Concept Combination from Text-to-Image Diffusion Model

  • Hongyi Nie
  • Quanming Yao
  • Yang Liu
  • Zhen Wang 0004
  • Yatao An Bian

Advancements in the text-to-image diffusion model have raised security concerns due to their potential to generate images with inappropriate themes such as societal biases and copyright infringements. Current studies have made notable progress in preventing the model from generating images containing specific high-risk visual concepts. However, these methods neglect the issue that inappropriate themes may also arise from the combination of benign visual concepts. A crucial challenge arises because the same image theme can be represented through multiple distinct visual concept combinations, and the model's ability to generate individual concepts may become distorted when processing these combinations. Consequently, effectively erasing such visual concept combinations from the diffusion model remains a formidable challenge. To tackle this problem, we formalize the problem as the Concept Combination Erasing (CCE) problem and propose a Concept Graph-based high-level Feature Decoupling framework (CoGFD) to address CCE. CoGFD identifies and decomposes visual concept combinations with a consistent image theme from an LLM-induced concept logic graph, and erases these combinations through decoupling co-occurrent high-level features. These techniques enable CoGFD to eliminate undesirable visual concept combinations while minimizing adverse effects on the generative fidelity of related individual concepts, outperforming state-of-the-art baselines. Extensive experiments across diverse visual concept combination scenarios verify the effectiveness of CoGFD.

NeurIPS Conference 2025 Conference Paper

Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia

  • Chandler Smith
  • Marwa Abdulhai
  • Manfred Díaz
  • Marko Tesic
  • Rakshit Trivedi
  • Sasha Vezhnevets
  • Lewis Hammond
  • Jesse Clifton

Large Language Model (LLM) agents have demonstrated impressive capabilities for social interaction and are increasingly being deployed in situations where they might engage with both human and artificial agents. These interactions represent a critical frontier for LLM-based agents, yet existing evaluation methods fail to measure how well these capabilities generalize to novel social situations. In this paper, we introduce a method for evaluating the ability of LLM-based agents to cooperate in zero-shot, mixed-motive environments using Concordia, a natural language multi-agent simulation environment. Our method measures general cooperative intelligence by testing an agent's ability to identify and exploit opportunities for mutual gain across diverse partners and contexts. We present empirical results from the NeurIPS 2024 Concordia Contest, where agents were evaluated on their ability to achieve mutual gains across a suite of diverse scenarios ranging from negotiation to collective action problems. Our findings reveal significant gaps between current agent capabilities and the robust generalization required for reliable cooperation, particularly in scenarios demanding persuasion and norm enforcement.

NeurIPS Conference 2025 Conference Paper

Evaluating LLM-contaminated Crowdsourcing Data Without Ground Truth

  • Yichi Zhang
  • Jinlong Pang
  • Zhaowei Zhu
  • Yang Liu

The recent success of generative AI highlights the crucial role of high-quality human feedback in building trustworthy AI systems. However, the increasing use of large language models (LLMs) by crowdsourcing workers poses a significant challenge: datasets intended to reflect human input may be compromised by LLM-generated responses. Existing LLM detection approaches often rely on high-dimensional training data such as text, making them unsuitable for structured annotation tasks like multiple-choice labeling. In this work, we investigate the potential of peer prediction --- a mechanism that evaluates the information within workers' responses --- to mitigate LLM-assisted cheating in crowdsourcing with a focus on annotation tasks. Our method quantifies the correlations between worker answers while conditioning on (a subset of) LLM-generated labels available to the requester. Building on prior research, we propose a training-free scoring mechanism with theoretical guarantees under a novel model that accounts for LLM collusion. We establish conditions under which our method is effective and empirically demonstrate its robustness in detecting low-effort cheating on real-world crowdsourcing datasets.
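The conditional-agreement idea behind peer prediction can be illustrated in a few lines: score a worker by how much their answers agree with a peer's beyond what independent answering would produce. This is a toy sketch of the general mechanism, not the paper's scoring rule; `peer_score` and its inputs are hypothetical:

```python
from collections import Counter

def peer_score(answers_a, answers_b):
    """Toy peer-prediction score: observed agreement between two workers
    minus the agreement expected if their answers were independent.
    A score near zero suggests uninformative (constant or random) labels."""
    n = len(answers_a)
    observed = sum(a == b for a, b in zip(answers_a, answers_b)) / n
    freq_a = Counter(answers_a)
    freq_b = Counter(answers_b)
    expected = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / (n * n)
    return observed - expected
```

A worker who submits a constant answer (or one statistically unrelated to the peer's) scores near zero, while genuinely informative workers score positively; the paper conditions such correlations on LLM-generated labels to detect collusion.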

AAAI Conference 2025 Conference Paper

Exploring Enhanced Contextual Information for Video-Level Object Tracking

  • Ben Kang
  • Xin Chen
  • Simiao Lai
  • Yang Liu
  • Yi Liu
  • Dong Wang

Contextual information at the video level has become increasingly crucial for visual object tracking. However, existing methods typically use only a few tokens to convey this information, which can lead to information loss and limit their ability to fully capture the context. To address this issue, we propose a new video-level visual object tracking framework called MCITrack. It leverages Mamba's hidden states to continuously record and transmit extensive contextual information throughout the video stream, resulting in more robust object tracking. The core component of MCITrack is the Contextual Information Fusion module, which consists of the Mamba layer and the cross-attention layer. The Mamba layer stores historical contextual information, while the cross-attention layer integrates this information into the current visual features of each backbone block. This module enhances the model's ability to capture and utilize contextual information at multiple levels through deep integration with the backbone. Experiments demonstrate that MCITrack achieves competitive performance across numerous benchmarks. For instance, it achieves 76.6% AUC on LaSOT and 80.0% AO on GOT-10k, establishing a new state-of-the-art performance.

NeurIPS Conference 2025 Conference Paper

Exploring Structural Degradation in Dense Representations for Self-supervised Learning

  • Siran Dai
  • Qianqian Xu
  • Peisong Wen
  • Yang Liu
  • Qingming Huang

In this work, we observe a counterintuitive phenomenon in self-supervised learning (SSL): longer training may impair the performance of dense prediction tasks (e.g., semantic segmentation). We refer to this phenomenon as Self-supervised Dense Degradation (SDD) and demonstrate its consistent presence across sixteen state-of-the-art SSL methods with various losses, architectures, and datasets. When the model performs suboptimally on dense tasks at the end of training, measuring the performance during training becomes essential. However, evaluating dense performance effectively without annotations remains an open challenge. To tackle this issue, we introduce a Dense representation Structure Estimator (DSE), composed of a class-relevance measure and an effective dimensionality measure. The proposed DSE is both theoretically grounded and empirically validated to be closely correlated with the downstream performance. Based on this metric, we introduce a straightforward yet effective model selection strategy and a DSE-based regularization method. Experiments on sixteen SSL methods across four benchmarks confirm that model selection improves mIoU by $3.0\%$ on average with negligible computational cost. Additionally, DSE regularization consistently mitigates the effects of dense degradation. Code is available at \url{https://github.com/EldercatSAM/SSL-Degradation}.
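An effective-dimensionality measure of the kind DSE builds on can be sketched as a participation ratio over the eigenvalue spectrum of the feature covariance. This is a toy illustration under that assumption; the paper's actual estimator also includes a class-relevance term:

```python
import numpy as np

def effective_dimensionality(features):
    """Participation-ratio effective rank of a feature matrix (n_samples x dim):
    (sum of covariance eigenvalues)^2 / (sum of squared eigenvalues).
    Equals dim for isotropic features and ~1 for rank-1 (collapsed) features."""
    X = features - features.mean(axis=0, keepdims=True)
    cov = X.T @ X / len(X)
    eig = np.linalg.eigvalsh(cov)
    eig = np.clip(eig, 0, None)  # guard against tiny negative round-off
    return eig.sum() ** 2 / (eig ** 2).sum()
```

Tracking such a quantity during training, without labels, is the kind of signal that makes annotation-free model selection possible.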

AAAI Conference 2025 Conference Paper

FastPERT: Towards Fast Microservice Application Latency Prediction via Structural Inductive Bias over PERT Networks

  • Da Sun Handason Tam
  • Huanle Xu
  • Yang Liu
  • Siyue Xie
  • Wing Cheong Lau

The recent surge in popularity of cloud-native applications using microservice architectures has led to a focus on accurate end-to-end latency prediction for proactive resource allocation. Existing models apply Graph Transformers to Microservice Call Graphs or Program Evaluation and Review Technique (PERT) graphs to capture complex temporal dependencies between microservices. However, these models incur a high computational cost during both training and inference phases. This paper introduces FastPERT, an efficient model for predicting end-to-end latency in microservice applications. FastPERT dissects an execution trace into several microservice tasks, using observations from prior execution traces of the application, akin to the PERT approach. Subsequently, a prediction model is constructed to estimate the completion time for each individual task. This information, coupled with the computational and structural inductive bias of the PERT graph, facilitates the efficient computation of the end-to-end latency of an execution trace. As a result, FastPERT can efficiently capture the complex temporal causality of different microservice tasks without relying on Graph Neural Networks, leading to more accurate and robust latency predictions across a variety of applications. An evaluation based on datasets generated from large-scale Alibaba microservice traces reveals that FastPERT significantly improves training and inference efficiency without compromising performance, demonstrating its potential as a superior solution for real-time end-to-end latency prediction in cloud-native microservice applications.
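Once per-task durations are predicted, the PERT-style structural bias reduces to a longest-path (critical-path) computation over the task dependency DAG. A minimal sketch with hypothetical inputs (the paper learns the durations; here they are given):

```python
def pert_end_to_end_latency(durations, deps):
    """Critical-path completion time over a task DAG.
    durations: {task: predicted duration}; deps: {task: [prerequisite tasks]}.
    A task starts when its slowest prerequisite finishes."""
    finish = {}

    def finish_time(task):
        if task not in finish:
            start = max((finish_time(d) for d in deps.get(task, [])), default=0.0)
            finish[task] = start + durations[task]
        return finish[task]

    return max(finish_time(t) for t in durations)
```

For example, two parallel tasks of 2 s and 3 s feeding a 1 s task give an end-to-end latency of 4 s, since only the slower branch matters.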

NeurIPS Conference 2025 Conference Paper

Federated Continual Learning via Orchestrating Multi-Scale Expertise

  • Xiaoyang Yi
  • Yang Liu
  • Binhan Yang
  • Jian Zhang

Federated continual learning (FCL) aims to maintain the model's performance on old tasks (i.e., stability) while enhancing its ability to acquire knowledge from current tasks (i.e., plasticity). With the development of pre-trained models (PTMs), fine-tuning PTMs on clients has become a promising approach to leveraging their extensive knowledge in FCL. In this paper, we propose MultiFCL, a novel FCL framework that fine-tunes PTMs to adapt to FCL while preserving their strong generalization capabilities. Specifically, to ensure stability, MultiFCL introduces lightweight adapters for task adaptation, which are subsequently frozen to prevent catastrophic forgetting. Moreover, by utilizing the semantic features of old tasks, MultiFCL performs multi-modal initialization of new task class prototypes. To enhance plasticity, MultiFCL employs a multi-expert training mechanism that integrates multi-scale feature learning with multi-teacher dynamic self-distillation. Through intra-client and inter-client expert communication, MultiFCL facilitates cross-task and cross-client knowledge fusion. Experimental results demonstrate that MultiFCL achieves state-of-the-art performance across multiple datasets and settings, showcasing its effectiveness in FCL scenarios.

AAAI Conference 2025 Conference Paper

From Coarse to Fine: A Matching and Alignment Framework for Unsupervised Cross-View Geo-Localization

  • Xueyi Wang
  • Lele Zhang
  • Zheng Fan
  • Yang Liu
  • Chen Chen
  • Fang Deng

Cross-view geo-localization aims at determining the geographic location of a query image by matching it against reference images. The matching pairs can be captured from diverse perspectives, such as those from satellites and drones. Most existing methods are supervised, requiring location-labeled images or matched and unmatched image pairs for training, which results in high labor costs. Moreover, current unsupervised methods perform instance matching directly between perspectives with dramatic discrepancies, resulting in poor performance. To address these issues, this paper proposes a novel matching and alignment framework that proceeds from the coarse instance-cluster level to the fine intermediate instance level for unsupervised cross-view geo-localization. We first introduce cluster-based contrastive learning, assigning pseudo-labels to instances and generating clusters within each view. Then we design a cross-view location alignment module that fully exploits the feature relationships between instances and clusters both within and across views. Finally, we design an intermediate state transition module that facilitates further alignment between views by constructing intermediate states and bringing both views closer to the intermediate domain simultaneously. Extensive experiments demonstrate that our method surpasses state-of-the-art unsupervised cross-view geo-localization methods and even achieves comparable performance to state-of-the-art supervised methods.

NeurIPS Conference 2025 Conference Paper

Generating Full-field Evolution of Physical Dynamics from Irregular Sparse Observations

  • Panqi Chen
  • Yifan Sun
  • Lei Cheng
  • Yang Yang
  • Weichang Li
  • Yang Liu
  • Weiqing Liu
  • Jiang Bian

Modeling and reconstructing multidimensional physical dynamics from sparse and off-grid observations presents a fundamental challenge in scientific research. Recently, diffusion-based generative modeling shows promising potential for physical simulation. However, current approaches typically operate on on-grid data with preset spatiotemporal resolution, but struggle with the sparsely observed and continuous nature of real-world physical dynamics. To fill the gaps, we present SDIFT, Sequential DIffusion in Functional Tucker space, a novel framework that generates full-field evolution of physical dynamics from irregular sparse observations. SDIFT leverages the functional Tucker model as the latent space representer with proven universal approximation property, and represents sparse observations as latent functions and Tucker core sequences. We then construct a sequential diffusion model with temporally augmented UNet in the functional Tucker space, denoising noise drawn from a Gaussian process to generate the sequence of core tensors. At the posterior sampling stage, we propose a Message-Passing Posterior Sampling mechanism, enabling conditional generation of the entire sequence guided by observations at limited time steps. We validate SDIFT on three physical systems spanning astronomical (supernova explosions, light-year scale), environmental (ocean sound speed fields, kilometer scale), and molecular (organic liquid, millimeter scale) domains, demonstrating significant improvements in both reconstruction accuracy and computational efficiency compared to state-of-the-art approaches.

AAAI Conference 2025 Conference Paper

Generative Video Diffusion for Unseen Novel Semantic Video Moment Retrieval

  • Dezhao Luo
  • Shaogang Gong
  • Jiabo Huang
  • Hailin Jin
  • Yang Liu

Video moment retrieval (VMR) aims to locate the most likely video moment(s) corresponding to a text query in untrimmed videos. Training of existing methods is limited by the lack of diverse and generalisable VMR datasets, hindering their ability to generalise moment-text associations to queries containing novel semantic concepts (unseen both visually and textually in a training source domain). For model generalisation to novel semantics, existing methods rely heavily on the assumption of access to both video and text sentence pairs from a target domain in addition to the source domain pair-wise training data. This is neither practical nor scalable. In this work, we introduce a more generalisable approach by assuming only that text sentences describing new semantics are available in model training, without having seen any videos from a target domain. To that end, we propose a Fine-grained Video Editing framework, termed FVE, that explores generative video diffusion to facilitate fine-grained video editing from the seen source concepts to the unseen target sentences consisting of new concepts. This enables generative hypotheses of unseen video moments corresponding to the novel concepts in the target domain. This fine-grained generative video diffusion retains the original video structure and subject specifics from the source domain while introducing semantic distinctions of unseen novel vocabularies in the target domain. A critical challenge is how to make this generative fine-grained diffusion process meaningful in optimising VMR, rather than just synthesising visually pleasing videos. We solve this problem by introducing a hybrid selection mechanism that integrates three quantitative metrics to selectively incorporate synthetic video moments (novel video hypotheses) as enlarged additions to the original source training data, whilst minimising potential detrimental noise or unnecessary repetitions in the novel synthetic videos harmful to VMR learning. Experiments on three datasets demonstrate the effectiveness of FVE on unseen novel semantic video moment retrieval tasks.

IROS Conference 2025 Conference Paper

GFM-Planner: Perception-Aware Trajectory Planning with Geometric Feature Metric

  • Yue Lin
  • Xiaoxuan Zhang
  • Yang Liu
  • Dong Wang 0004
  • Huchuan Lu

Like humans who rely on landmarks for orientation, autonomous robots depend on feature-rich environments for accurate localization. In this paper, we propose the GFM-Planner, a perception-aware trajectory planning framework based on the geometric feature metric, which enhances LiDAR localization accuracy by guiding the robot to avoid degraded areas. First, we derive the Geometric Feature Metric (GFM) from the fundamental LiDAR localization problem. Next, we design a 2D grid-based Metric Encoding Map (MEM) to efficiently store GFM values across the environment. A constant-time decoding algorithm is further proposed to retrieve GFM values for arbitrary poses from the MEM. Finally, we develop a perception-aware trajectory planning algorithm that improves LiDAR localization capabilities by guiding the robot in selecting trajectories through feature-rich areas. Both simulation and real-world experiments demonstrate that our approach enables the robot to actively select trajectories that significantly enhance LiDAR localization accuracy.

EAAI Journal 2025 Journal Article

Health- and behavior-aware energy management strategy for fuel cell hybrid electric vehicles based on parallel deep deterministic policy gradient learning

  • Haochen Sun
  • Jing Li
  • Chun Cheng
  • Suzhen Shi
  • Jing Wang
  • Jingjing Lin
  • Yang Liu

Most existing research on the energy management strategy (EMS) of fuel cell hybrid electric vehicles (FCHEVs) focuses on external driving conditions, while the driver's behavior, an equally important internal factor, also needs to be taken into account. In this paper, a health- and behavior-aware two-layer hierarchical energy management framework using an improved adaptive parallel deep deterministic policy gradient (DDPG) learning algorithm is proposed for obtaining the optimal EMS of a multi-source FCHEV. In the upper layer, machine learning approaches are employed to recognize the driver's behavior in real time, and Pontryagin's minimum principle is applied to calculate the optimal equivalent factor for each driver behavior. In the lower layer, to protect the service life of the fuel cell and battery as well as to increase learning efficiency, an adaptive fuzzy filter is used, and a health- and behavior-aware multi-objective adaptive equivalent consumption minimization strategy model is constructed and solved by an improved adaptive parallel DDPG-based algorithm. Simulation results show that the EMS obtained by the proposed DDPG algorithm can achieve the highest fuel cell (FC) working efficiency (approximately 56%), markedly reduce the degree of battery (BAT) degradation from 0.42% to 0.28%, and achieve a 9.24% reduction in total usage cost compared with a deep Q network (DQN)-based EMS.
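The equivalent consumption minimization step can be illustrated with a toy power-split rule: choose the fuel-cell power level that minimizes fuel use plus an equivalence-factor-weighted battery discharge. This is a deliberately simplified sketch (constant efficiencies, no health or filtering terms), not the paper's multi-objective model; all names are hypothetical:

```python
def ecms_split(p_demand, equivalence_factor, fc_power_levels):
    """Toy equivalent consumption minimization: among candidate fuel-cell
    power levels, pick the one minimizing (fuel-cell power as a fuel proxy)
    plus equivalence-factor-weighted battery discharge power."""
    def equivalent_cost(p_fc):
        p_batt = p_demand - p_fc              # battery covers the remainder
        return p_fc + equivalence_factor * max(p_batt, 0.0)
    return min(fc_power_levels, key=equivalent_cost)
```

With a high equivalence factor (battery energy is "expensive"), the rule shifts load onto the fuel cell; with a low one, onto the battery. The upper layer in the paper adapts this factor to the recognized driver behavior.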

AAAI Conference 2025 Conference Paper

Human and AI Perceptual Differences in Image Classification Errors

  • Minghao Liu
  • Jiaheng Wei
  • Yang Liu
  • James Davis

Artificial intelligence (AI) models for computer vision trained with supervised machine learning are assumed to solve classification tasks by imitating human behavior learned from training labels. Most efforts in recent vision research focus on measuring model task performance using standardized benchmarks such as accuracy. However, limited work has sought to understand the perceptual difference between humans and machines. To fill this gap, this study first analyzes the statistical distributions of mistakes from the two sources, and then explores how task difficulty level affects these distributions. We find that even when AI learns an excellent model from the training data, one that outperforms humans in overall accuracy, these AI models have significant and consistent differences from human perception. We demonstrate the importance of studying these differences with a simple human-AI teaming algorithm that outperforms humans alone, AI alone, or AI-AI teaming.

IJCAI Conference 2025 Conference Paper

In-Context Meta LoRA Generation

  • Yihua Shao
  • Minxi Yan
  • Yang Liu
  • Siyu Chen
  • Wenjie Chen
  • Xinwei Long
  • Ziyang Yan
  • Lei Li

Low-rank Adaptation (LoRA) has demonstrated remarkable capabilities for task-specific fine-tuning. However, in scenarios that involve multiple tasks, training a separate LoRA model for each one results in considerable inefficiency in terms of storage and inference. Moreover, existing parameter generation methods fail to capture the correlations among these tasks, making multi-task LoRA parameter generation challenging. To address these limitations, we propose In-Context Meta LoRA (ICM-LoRA), a novel approach that efficiently achieves task-specific customization of large language models (LLMs). Specifically, we use training data from all tasks to train a tailored generator, a Conditional Variational Autoencoder (CVAE). The CVAE takes task descriptions as inputs and produces task-aware LoRA weights as outputs. These LoRA weights are then merged with LLMs to create task-specialized models without the need for additional fine-tuning. Furthermore, we utilize in-context meta-learning for knowledge enhancement and task mapping, to capture the relationship between tasks and parameter distributions. As a result, our method achieves more accurate LoRA parameter generation for diverse tasks using the CVAE. ICM-LoRA enables more accurate LoRA parameter reconstruction than current parameter reconstruction methods and is useful for implementing task-specific enhancements of LoRA parameters. At the same time, our method occupies only 283 MB, about 1% of the storage required by the original LoRA models. The code is available at https://github.com/YihuaJerry/ICM-LoRA.
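The decoder side of this idea, mapping a task embedding plus a latent sample to low-rank LoRA factors whose product is the weight update, can be sketched as follows. Everything here is hypothetical (random projections stand in for the trained CVAE decoder); the point is only the shape of the output, a rank-limited update that merges into a frozen weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)  # stand-in for trained decoder weights

def decode_lora(task_embedding, latent, d_model=8, rank=2):
    """Hypothetical decoder sketch: map a task embedding and a latent sample
    to low-rank LoRA factors A (rank x d_model) and B (d_model x rank);
    the adapter update merged into the frozen weight is B @ A."""
    z = np.concatenate([task_embedding, latent])
    W_a = rng.standard_normal((rank * d_model, z.size)) * 0.01
    W_b = rng.standard_normal((d_model * rank, z.size)) * 0.01
    A = (W_a @ z).reshape(rank, d_model)
    B = (W_b @ z).reshape(d_model, rank)
    return B @ A
```

Because the update is a product of rank-`rank` factors, the generated matrix never exceeds that rank, which is what keeps the generated adapters small regardless of `d_model`.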

NeurIPS Conference 2025 Conference Paper

Incentivizing LLMs to Self-Verify Their Answers

  • Fuxiang Zhang
  • Jiacheng Xu
  • Chaojie Wang
  • Ce Cui
  • Yang Liu
  • Bo An

Large Language Models (LLMs) have demonstrated remarkable progress in complex reasoning tasks through both post-training and test-time scaling laws. While prevalent test-time scaling approaches are often realized by using external reward models to guide the model generation process, we find that only marginal gains can be acquired when scaling a model post-trained on specific reasoning tasks. We identify that the limited improvement stems from distribution discrepancies between the specific post-trained generator and the general reward model. To address this, we propose a framework that incentivizes LLMs to self-verify their own answers. By unifying answer generation and verification within a single reinforcement learning (RL) process, we train models that can effectively assess the correctness of their own solutions. The trained model can further scale its performance at inference time by verifying its generations, without the need for external verifiers. We train our self-verification models based on Qwen2.5-Math-7B and DeepSeek-R1-Distill-Qwen-1.5B, demonstrating their capabilities across varying reasoning context lengths. Experiments on multiple mathematical reasoning benchmarks show that our models can not only improve post-training performance but also enable effective test-time scaling. Our code is available at https://github.com/mansicer/self-verification.

NeurIPS Conference 2025 Conference Paper

INST-IT: Boosting Instance Understanding via Explicit Visual Prompt Instruction Tuning

  • Wujian Peng
  • Lingchen Meng
  • Yitong Chen
  • Yiweng Xie
  • Yang Liu
  • Tao Gui
  • Hang Xu
  • Xipeng Qiu

Large Multimodal Models (LMMs) have made significant breakthroughs with the advancement of instruction tuning. However, while existing models can understand images and videos at a holistic level, they still struggle with instance-level understanding that requires a more fine-grained comprehension and alignment. Instance-level understanding is crucial for LMMs, as it focuses on the specific elements that we are most interested in. Excitingly, existing works find that the state-of-the-art LMMs exhibit strong instance understanding capabilities when provided with explicit visual cues. Motivated by this, we propose Inst-IT, a solution to enhance LMMs in Instance understanding via explicit visual prompt Instruction Tuning for instance guidance. Inst-IT consists of a benchmark to diagnose multimodal instance-level understanding, a large-scale instruction-tuning dataset, and a continuous instruction-tuning training paradigm to effectively enhance the spatial-temporal instance understanding capabilities of existing LMMs. Experimental results show that, enhanced by Inst-IT, our models not only achieve outstanding performance on Inst-IT-Bench and other instance understanding benchmarks, but also demonstrate significant improvements across various generic image and video understanding benchmarks. This highlights that our method not only boosts instance-level understanding but also strengthens the overall capabilities of generic image and video comprehension.

EAAI Journal 2025 Journal Article

Knowledge-enhanced multi-objective memetic algorithm for energy-efficient flexible job shop scheduling with limited multi-load automated guided vehicles

  • Lianghua Fan
  • Qi Lei
  • Yuchuan Song
  • Yang Liu
  • Yunfan Yang

In alignment with the national call for energy conservation and emission reduction, energy-efficient scheduling in manufacturing, especially in intelligent workshops, has become a key research area. Automated guided vehicles (AGVs), as the core component of intelligent logistics systems, especially multi-load AGVs, play a vital role in improving green manufacturing and optimizing logistics efficiency. While AGV transportation is considered in traditional energy-saving scheduling, most studies assume unlimited AGVs, each of which can only load one job. This paper is the first to study energy-efficient flexible job shop scheduling with limited multi-load AGVs (EFJSP-LMA), which integrates the sequencing of pickup and delivery tasks with the allocation strategy of machines and AGVs. To address this problem effectively, a multi-objective mixed-integer programming (MMIP) model is developed to optimize the makespan and total energy consumption. To solve the MMIP model, a knowledge-enhanced multi-objective memetic algorithm (KMMA) is proposed. In the proposed KMMA, problem-specific heuristics are designed to generate a high-quality initial population with strong convergence and diversity. Subsequently, five knowledge-enhanced variable neighborhood structures are designed to enhance the quality and diversity of solutions. Additionally, an energy-saving strategy is incorporated to further optimize energy consumption. The effect of AGV quantity and load modes on the performance of the production system is studied and analyzed. Furthermore, experimental results on 60 test instances indicate that KMMA outperforms comparison algorithms, demonstrating its effectiveness in addressing the EFJSP-LMA. Finally, real-world case studies further support our research, offering valuable insights for managing manufacturing environments.

NeurIPS Conference 2025 Conference Paper

LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS

  • Wanhua Li
  • Yujie Zhao
  • Minghan Qin
  • Yang Liu
  • Yuanhao Cai
  • Chuang Gan
  • Hanspeter Pfister

In this paper, we introduce LangSplatV2, which achieves high-dimensional feature splatting at 476.2 FPS and 3D open-vocabulary text querying at 384.6 FPS for high-resolution images, providing a 42× speedup and a 47× boost over LangSplat respectively, along with improved query accuracy. LangSplat employs Gaussian Splatting to embed 2D CLIP language features into 3D, significantly enhancing speed and learning a precise 3D language field with SAM semantics. Such advancements in 3D language fields are crucial for applications that require language interaction within complex scenes. However, LangSplat does not yet achieve real-time performance (8.2 FPS), even with advanced A100 GPUs, severely limiting its broader application. In this paper, we first conduct a detailed time analysis of LangSplat, identifying the heavyweight decoder as the primary speed bottleneck. Our solution, LangSplatV2, assumes that each Gaussian acts as a sparse code within a global dictionary, leading to the learning of a 3D sparse coefficient field that entirely eliminates the need for a heavyweight decoder. By leveraging this sparsity, we further propose an efficient sparse coefficient splatting method with CUDA optimization, rendering high-dimensional feature maps at high quality while incurring only the time cost of splatting an ultra-low-dimensional feature. Our experimental results demonstrate that LangSplatV2 not only achieves better or competitive query accuracy but is also significantly faster. Codes and demos are available at our project page: https://langsplat-v2.github.io.
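The core trick described above, splat only ultra-low-dimensional sparse coefficients, then decode once per pixel with a shared global dictionary, can be sketched in plain NumPy. Shapes and names here are hypothetical (the paper performs this inside an optimized CUDA splatting kernel):

```python
import numpy as np

def splat_feature(alpha_weights, coeffs, dictionary):
    """Render one pixel's high-dimensional feature: alpha-blend each Gaussian's
    low-dimensional coefficient vector first, then decode once with the shared
    dictionary. coeffs: (n_gaussians x n_atoms), dictionary: (n_atoms x feat_dim)."""
    blended = np.asarray(alpha_weights) @ np.asarray(coeffs)  # splat cheap coefficients
    return blended @ np.asarray(dictionary)                   # single decode per pixel
```

Because blending and decoding are both linear, blending coefficients first gives the same result as blending full high-dimensional features, while the per-Gaussian work scales with the small number of dictionary atoms rather than the CLIP feature dimension.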

NeurIPS Conference 2025 Conference Paper

Latent Retrieval Augmented Generation of Cross-Domain Protein Binders

  • Zishen Zhang
  • Xiangzhe Kong
  • Wenbing Huang
  • Yang Liu

Designing protein binders targeting specific sites, which requires generating realistic and functional interaction patterns, is a fundamental challenge in drug discovery. Current structure-based generative models are limited in generating interfaces with sufficient rationality and interpretability. In this paper, we propose Retrieval-Augmented Diffusion for Aligned interface (RADiAnce), a new framework that leverages known interfaces to guide the design of novel binders. By unifying retrieval and generation in a shared contrastive latent space, our model efficiently identifies relevant interfaces for a given binding site and seamlessly integrates them through a conditional latent diffusion generator, enabling cross-domain interface transfer. Extensive experiments show that RADiAnce significantly outperforms baseline models across multiple metrics, including binding affinity and recovery of geometries and interactions. Additional experimental results validate cross-domain generalization, demonstrating that retrieving interfaces from diverse domains, such as peptides, antibodies, and protein fragments, enhances the generation performance of binders for other domains. Our work establishes a new paradigm for protein binder design that successfully bridges retrieval-based knowledge and generative AI, opening new possibilities for drug discovery.

NeurIPS Conference 2025 Conference Paper

Learning 3D Anisotropic Noise Distributions Improves Molecular Force Fields

  • Xixian Liu
  • Rui Jiao
  • Zhiyuan Liu
  • Yurou Liu
  • Yang Liu
  • Ziheng Lu
  • Wenbing Huang
  • Yang Zhang

Coordinate denoising has emerged as a promising method for 3D molecular pretraining due to its theoretical connection to learning molecular force fields. However, existing denoising methods rely on oversimplified molecular dynamics that assume atomic motions to be isotropic and homoscedastic. To address these limitations, we propose a novel denoising framework, AniDS: Anisotropic Variational Autoencoder for 3D Molecular Denoising. AniDS introduces a structure-aware anisotropic noise generator that can produce atom-specific, full covariance matrices for Gaussian noise distributions to better reflect directional and structural variability in molecular systems. These covariances are derived from pairwise atomic interactions as anisotropic corrections to an isotropic base. Our design ensures that the resulting covariance matrices are symmetric, positive semi-definite, and SO(3)-equivariant, while providing greater capacity to model complex molecular dynamics. Extensive experiments show that AniDS outperforms prior isotropic and homoscedastic denoising models and other leading methods on the MD17 and OC22 benchmarks, achieving average relative improvements of 8.9% and 6.2% in force prediction accuracy. Our case study on a crystal and a molecular structure shows that AniDS adaptively suppresses noise along the bonding direction, consistent with physicochemical principles. Our code is available at https://github.com/ZeroKnighting/AniDS.
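The "isotropic base plus anisotropic correction" construction can be written directly: adding a term of the form C Cᵀ to a scaled identity is symmetric and positive semi-definite by construction. A toy sketch of just this building block (the paper additionally derives C from pairwise interactions and enforces SO(3)-equivariance):

```python
import numpy as np

def anisotropic_covariance(correction, base_scale=1.0):
    """Build a symmetric PSD 3x3 covariance as an isotropic base plus an
    anisotropic correction C @ C.T (PSD for any real matrix C).
    Every eigenvalue is at least base_scale."""
    C = np.asarray(correction, dtype=float).reshape(3, 3)
    return base_scale * np.eye(3) + C @ C.T
```

The base term keeps the distribution non-degenerate, while the learned correction stretches noise along structure-dependent directions such as chemical bonds.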

NeurIPS Conference 2025 Conference Paper

Learning CAD Modeling Sequences via Projection and Part Awareness

  • Yang Liu
  • Daxuan Ren
  • Yijie Ding
  • Jianmin Zheng
  • Fang Deng

This paper presents PartCAD, a novel framework for reconstructing CAD modeling sequences directly from point clouds by projection-guided, part-aware geometry reasoning. It consists of (1) an autoregressive approach that decomposes point clouds into part-aware latent representations, serving as interpretable anchors for CAD generation; (2) a projection guidance module that provides explicit cues about underlying design intent via triplane projections; and (3) a non-autoregressive decoder to generate sketch-extrusion parameters in a single forward pass, enabling efficient and structurally coherent CAD instruction synthesis. By bridging geometric signals and semantic understanding, PartCAD tackles the challenge of reconstructing editable CAD models—capturing underlying design processes—from 3D point clouds. Extensive experiments show that PartCAD significantly outperforms existing methods for CAD instruction generation in both accuracy and robustness. The work sheds light on part-driven reconstruction of interpretable CAD models, opening new avenues in reverse engineering and CAD automation.

NeurIPS Conference 2025 Conference Paper

Learning Counterfactual Outcomes Under Rank Preservation

  • Peng Wu
  • Haoxuan Li
  • Chunyuan Zheng
  • Yan Zeng
  • Jiawei Chen
  • Yang Liu
  • Ruocheng Guo
  • Kun Zhang

Counterfactual inference aims to estimate the counterfactual outcome at the individual level given knowledge of an observed treatment and the factual outcome, with broad applications in fields such as epidemiology, econometrics, and management science. Previous methods rely on a known structural causal model (SCM) or assume the homogeneity of the exogenous variable and strict monotonicity between the outcome and exogenous variable. In this paper, we propose a principled approach for identifying and estimating the counterfactual outcome. We first introduce a simple and intuitive rank preservation assumption to identify the counterfactual outcome without relying on a known structural causal model. Building on this, we propose a novel ideal loss for theoretically unbiased learning of the counterfactual outcome and further develop a kernel-based estimator for its empirical estimation. Our theoretical analysis shows that the rank preservation assumption is no stronger than the homogeneity and strict monotonicity assumptions, that the proposed ideal loss is convex, and that the proposed estimator is unbiased. Extensive semi-synthetic and real-world experiments are conducted to demonstrate the effectiveness of the proposed method.
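As a rough illustration of the kind of kernel-based estimator the abstract alludes to, here is a generic Nadaraya-Watson kernel regressor; the paper's actual loss and estimator differ, and all names below are illustrative.

```python
import numpy as np

# Hedged sketch: generic Nadaraya-Watson kernel regression, illustrating the
# general family of kernel-based estimators; not the paper's actual estimator.
def kernel_estimate(x0: float, xs: np.ndarray, ys: np.ndarray,
                    bandwidth: float = 0.5) -> float:
    w = np.exp(-((xs - x0) ** 2) / (2 * bandwidth ** 2))  # Gaussian kernel weights
    return float(np.sum(w * ys) / np.sum(w))
```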

AAAI Conference 2025 Conference Paper

Learning Dynamic Similarity by Bidirectional Hierarchical Sliding Semantic Probe for Efficient Text Video Retrieval

  • Yang Liu
  • Shudong Huang
  • Deng Xiong
  • Jiancheng Lv

Text-video retrieval is a foundational task in multi-modal research which aims to align texts and videos in the embedding space. The key challenge is to learn the similarity between videos and texts. A conventional approach involves directly aligning video-text pairs using cosine similarity. However, due to the disparity in the information conveyed by videos and texts, i.e., a single video can be described from multiple perspectives, the retrieval accuracy is suboptimal. An alternative approach employs cross-modal interaction to enable videos to dynamically acquire distinct features from various texts, thus facilitating similarity calculations. Nevertheless, this solution incurs a computational complexity of O(n^2) during retrieval. To this end, this paper proposes a novel method called Bidirectional Hierarchical Sliding Semantic Probe (BiHSSP), which calculates dynamic similarity between videos and texts with O(n) complexity during retrieval. We introduce a hierarchical semantic probe module that learns semantic probes at different scales for both video and text features. The semantic probes perform a sliding calculation of the cross-correlation between probes at different scales and embeddings from the other modality, allowing for dynamic similarity computation between video and text descriptions from various perspectives. Specifically, for text descriptions from different angles, we calculate the similarity at different locations within the video features and vice versa. This approach preserves the complete information of the video while addressing the issue of unequal information between video and text, without requiring cross-modal interaction. Additionally, our method can function as a plug-and-play module across various methods, thereby enhancing the corresponding performance. Experimental results demonstrate that our BiHSSP significantly outperforms the baseline.
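The sliding idea behind the O(n) claim can be sketched as follows (hypothetical names, not the BiHSSP implementation): slide a small text-derived probe along per-frame video features and keep the best-matching position, so a query costs linear time in the video length rather than full cross-modal interaction.

```python
import numpy as np

# Illustrative sketch of a sliding semantic probe (not the paper's code):
# cross-correlate a small probe with every window of the frame features and
# return the best match; one pass over n frames, i.e. O(n) per query.
def sliding_probe_similarity(video_feats: np.ndarray, probe: np.ndarray) -> float:
    """video_feats: (n, d) frame features; probe: (w, d) with w <= n."""
    n, w = video_feats.shape[0], probe.shape[0]
    scores = [float(np.sum(video_feats[i:i + w] * probe)) for i in range(n - w + 1)]
    return max(scores)
```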

NeurIPS Conference 2025 Conference Paper

LIFEBENCH: Evaluating Length Instruction Following in Large Language Models

  • Wei Zhang
  • Zhenhong Zhou
  • Kun Wang
  • Junfeng Fang
  • Rongwu Xu
  • Yuanhe Zhang
  • Rui Wang
  • Ge Zhang

While large language models (LLMs) can solve PhD-level reasoning problems over long context inputs, they still struggle with a seemingly simpler task: following explicit length instructions, e.g., write a 10,000-word novel. In practice, models often generate outputs that are far too short, terminate prematurely, or even refuse the request. Existing benchmarks focus primarily on evaluating generation quality, but often overlook whether the generations meet length constraints. To this end, we introduce the Length Instruction Following Evaluation Benchmark (LIFEBench) to comprehensively evaluate LLMs' ability to follow length instructions across diverse tasks and a wide range of specified lengths. LIFEBench consists of 10,800 instances across 4 task categories in both English and Chinese, covering length constraints ranging from 16 to 8192 words. We evaluate 26 widely-used LLMs and find that most models reasonably follow short-length instructions but deteriorate sharply beyond a certain threshold. Surprisingly, almost all models fail to reach the vendor-claimed maximum output lengths in practice, as further confirmed by our evaluations extending up to 32K words. Even long-context LLMs, despite their extended input-output windows, counterintuitively fail to improve length-instruction following. Notably, reasoning LLMs outperform even specialized long-text generation models, achieving state-of-the-art length following. Overall, LIFEBench uncovers fundamental limitations in current LLMs' length-instruction-following ability, offering critical insights for future progress.
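One simple way to score length-instruction following, shown here purely for illustration (this is not LIFEBench's official scorer), is the relative deviation of the word count from the instructed target, clipped to [0, 1].

```python
# Illustrative sketch (not the benchmark's scorer): score an output by how far
# its word count deviates from the instructed target, clipped to [0, 1].
def length_score(text: str, target_words: int) -> float:
    n = len(text.split())
    return max(0.0, 1.0 - abs(n - target_words) / target_words)
```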

AAAI Conference 2025 Conference Paper

Logic-Q: Improving Deep Reinforcement Learning-based Quantitative Trading via Program Sketch-based Tuning

  • Zhiming Li
  • Junzhe Jiang
  • Yushi Cao
  • Aixin Cui
  • Bozhi Wu
  • Bo Li
  • Yang Liu
  • Danny Dongning Sun

Deep reinforcement learning (DRL) has revolutionized quantitative trading (Q-trading) by achieving decent performance without significant human expert knowledge. Despite its achievements, we observe that the current state-of-the-art DRL models are still ineffective at identifying market trends, causing them to miss good trading opportunities or suffer large drawdowns when encountering market crashes. To address this limitation, a natural approach is to incorporate human expert knowledge in identifying market trends. However, such knowledge is abstract and hard to quantify. To effectively leverage abstract human expert knowledge, in this paper we propose a universal logic-guided deep reinforcement learning framework for Q-trading, called Logic-Q. In particular, Logic-Q adopts the program synthesis by sketching paradigm and introduces a logic-guided model design that leverages a lightweight, plug-and-play market trend-aware program sketch to determine the market trend and correspondingly adjusts the DRL policy in a post-hoc manner. Extensive evaluations on two popular quantitative trading tasks demonstrate that Logic-Q can significantly improve the performance of previous state-of-the-art DRL trading strategies.

TMLR Journal 2025 Journal Article

Long-Term Fairness Inquiries and Pursuits in Machine Learning: A Survey of Notions, Methods, and Challenges

  • Usman Gohar
  • Zeyu Tang
  • Jialu Wang
  • Kun Zhang
  • Peter Spirtes
  • Yang Liu
  • Lu Cheng

The widespread integration of Machine Learning systems in daily life, particularly in high-stakes domains, has raised concerns about the fairness implications. While prior works have investigated static fairness measures, recent studies reveal that automated decision-making has long-term implications and that off-the-shelf fairness approaches may not serve the purpose of achieving long-term fairness. Additionally, the existence of feedback loops and the interaction between models and the environment introduces additional complexities that may deviate from the initial fairness goals. In this survey, we review existing literature on long-term fairness from different perspectives and present a taxonomy for long-term fairness studies. We highlight key challenges and consider future research directions, analyzing both current issues and potential further explorations.

AAAI Conference 2025 Conference Paper

MambaPro: Multi-Modal Object Re-identification with Mamba Aggregation and Synergistic Prompt

  • Yuhao Wang
  • Xuehu Liu
  • Tianyu Yan
  • Yang Liu
  • Aihua Zheng
  • Pingping Zhang
  • Huchuan Lu

Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by utilizing complementary image information from different modalities. Recently, large-scale pre-trained models like CLIP have demonstrated impressive performance in traditional single-modal ReID tasks. However, they remain unexplored for multi-modal object ReID. Furthermore, current multi-modal aggregation methods have obvious limitations in dealing with long sequences from different modalities. To address the above issues, we introduce a novel framework called MambaPro for multi-modal object ReID. To be specific, we first employ a Parallel Feed-Forward Adapter (PFA) for adapting CLIP to multi-modal object ReID. Then, we propose the Synergistic Residual Prompt (SRP) to guide the joint learning of multi-modal features. Finally, leveraging Mamba's superior scalability for long sequences, we introduce Mamba Aggregation (MA) to efficiently model interactions between different modalities. As a result, MambaPro can extract more robust features with lower complexity. Extensive experiments on three multi-modal object ReID benchmarks (i.e., RGBNT201, RGBNT100 and MSVR310) validate the effectiveness of our proposed methods.

NeurIPS Conference 2025 Conference Paper

Mesh Interpolation Graph Network for Dynamic and Spatially Irregular Global Weather Forecasting

  • Zinan Zheng
  • Yang Liu
  • Jia Li

Graph neural networks have shown promising results in weather forecasting, which is critical for human activities such as agriculture planning and extreme weather preparation. However, most studies focus on finite and local areas for training, overlooking the influence of broader areas and limiting their ability to generalize effectively. Thus, in this work, we study global weather forecasting, where observations are irregularly distributed and dynamically varying in practice, requiring the model to generalize to unobserved locations. To address such challenges, we propose a general Mesh Interpolation Graph Network (MIGN) that models irregular weather station forecasting, consisting of two key designs: (1) learning spatially irregular data with a regular mesh interpolation network to align the data; (2) leveraging parametric spherical harmonics location embedding to further enhance spatial generalization ability. Extensive experiments on an up-to-date observation dataset show that MIGN significantly outperforms existing data-driven models. Besides, we show that MIGN has spatial generalization ability and is capable of generalizing to previously unseen stations.

NeurIPS Conference 2025 Conference Paper

MM-OPERA: Benchmarking Open-ended Association Reasoning for Large Vision-Language Models

  • Zimeng Huang
  • Jinxin Ke
  • Xiaoxuan Fan
  • Yufeng Yang
  • Yang Liu
  • Liu Zhonghan
  • Zedi Wang
  • Junteng Dai

Large Vision-Language Models (LVLMs) have exhibited remarkable progress. However, deficiencies remain compared to human intelligence, such as hallucination and shallow pattern matching. In this work, we aim to evaluate a fundamental yet underexplored intelligence: association, a cornerstone of human cognition for creative thinking and knowledge integration. Current benchmarks, often limited to closed-ended tasks, fail to capture the complexity of open-ended association reasoning vital for real-world applications. To address this, we present MM-OPERA, a systematic benchmark with 11,497 instances across two open-ended tasks: Remote-Item Association (RIA) and In-Context Association (ICA), aligning association intelligence evaluation with human psychometric principles. It challenges LVLMs to resemble the spirit of divergent thinking and convergent associative reasoning through free-form responses and explicit reasoning paths. We deploy tailored LLM-as-a-Judge strategies to evaluate open-ended outputs, applying process-reward-informed judgment to dissect reasoning with precision. Extensive empirical studies on state-of-the-art LVLMs, including sensitivity analysis of task instances, validity analysis of LLM-as-a-Judge strategies, and diversity analysis across abilities, domains, languages, cultures, etc., provide a comprehensive and nuanced understanding of the limitations of current LVLMs in associative reasoning, paving the way for more human-like and general-purpose AI. The dataset and code are available at https://github.com/MM-OPERA-Bench/MM-OPERA.

NeurIPS Conference 2025 Conference Paper

MOF-BFN: Metal-Organic Frameworks Structure Prediction via Bayesian Flow Networks

  • Rui Jiao
  • Hanlin Wu
  • Wenbing Huang
  • Yuxuan Song
  • Yawen Ouyang
  • Yu Rong
  • Tingyang Xu
  • Pengju Wang

Metal-Organic Frameworks (MOFs) have attracted considerable attention due to their unique properties including high surface area and tunable porosity, and promising applications in catalysis, gas storage, and drug delivery. Structure prediction for MOFs is a challenging task, as these frameworks are intrinsically periodic and hierarchically organized, where the entire structure is assembled from building blocks like metal nodes and organic linkers. To address this, we introduce MOF-BFN, a novel generative model for MOF structure prediction based on Bayesian Flow Networks (BFNs). Given the local geometry of building blocks, MOF-BFN jointly predicts the lattice parameters, as well as the positions and orientations of all building blocks within the unit cell. In particular, the positions are modeled in the fractional coordinate system to naturally incorporate the periodicity. Meanwhile, the orientations are modeled as unit quaternions sampled from learned Bingham distributions via the proposed Bingham BFN, enabling effective orientation generation on the 4D unit hypersphere. Experimental results demonstrate that MOF-BFN achieves state-of-the-art performance across multiple tasks, including structure prediction, geometric property evaluation, and de novo generation, offering a promising tool for designing complex MOF materials.

EAAI Journal 2025 Journal Article

Multi-class Agent Trajectory Prediction with Selective State Spaces for autonomous driving

  • Jin Fan
  • Zhanwen Liu
  • Yong Fang
  • Zeyu Huang
  • Yang Liu
  • Shan Lin

Understanding and predicting the movement of multi-class agents has become more critical and challenging in diverse applications such as autonomous driving and urban intelligent monitoring. Current research mainly focuses on the motion trajectories of single-class agents. However, due to the complexity of real traffic scenarios and the variability of interactive behaviors, the motion patterns displayed by various classes of agents show inherent randomness. In this paper, inspired by the linear-time sequence model Mamba, we propose Multi-class Agent Trajectory Prediction with Selective State Spaces (MTPSS) to model the interaction between different agents and better predict the trajectory of an individual. Specifically, MTPSS models relationships in both temporal and spatial dimensions. When encoding the spatial correlation within the trajectory graph, we construct a category-based sorting approach, which places large-size category nodes behind to enhance contextual access. The sorted nodes are then bi-directionally scanned through Mamba blocks, which makes the model more robust to permutations. In the temporal dimension, considering the highly dynamic nature of rapidly moving agents, we utilize Mamba's remarkable performance on sequential data to conduct temporal scans that capture long-range temporal dependencies. Finally, to compute physically feasible trajectories, MTPSS employs a Neural Ordinary Differential Equation to smooth the predicted trajectory of the agent. We conducted extensive experiments on two publicly available traffic datasets and compared our method with state-of-the-art methods. Quantitative experiments show that our performance metrics are superior to those of state-of-the-art methods, and qualitative experiments demonstrate that the predicted trajectories have good diversity, showing the method's potential in real-world traffic scenarios.

AAAI Conference 2025 Conference Paper

Multifaceted User Modeling in Recommendation: A Federated Foundation Models Approach

  • Chunxu Zhang
  • Guodong Long
  • Hongkuan Guo
  • Zhaojie Liu
  • Guorui Zhou
  • Zijian Zhang
  • Yang Liu
  • Bo Yang

Multifaceted user modeling aims to uncover fine-grained patterns and learn representations from user data, revealing their diverse interests and characteristics, such as profile, preference, and personality. Recent studies on foundation model-based recommendation have emphasized the Transformer architecture's remarkable ability to capture complex, non-linear user-item interaction relationships. This paper aims to advance foundation model-based recommender systems by introducing enhancements to multifaceted user modeling capabilities. We propose a novel Transformer layer designed specifically for recommendation, using the self-attention mechanism to capture sequential user-item interaction patterns. Specifically, we design a group gating network to identify user groups, enabling hierarchical discovery across different layers, thereby capturing the multifaceted nature of user interests through multiple Transformer layers. Furthermore, to broaden the data scope and further enhance multifaceted user modeling, we extend the framework to a federated setting, enabling the use of private datasets while ensuring privacy. Experimental validations on benchmark datasets demonstrate the superior performance of our proposed method.

YNIMG Journal 2025 Journal Article

Neural mechanisms of emotion-focused interventions: A meta-analytic review of fMRI studies

  • Yanlin Li
  • Geng Li
  • Yang Liu
  • Chengzhen Liu
  • Antao Chen

Emotion-focused interventions are emerging as promising tools to improve emotional functioning across clinical and nonclinical populations, yet their underlying neural mechanisms remain unclear. We conducted a coordinate-based meta-analysis (CBMA) using Seed-based d Mapping (SDM) of 20 task-based fMRI studies (N = 620) to quantify bidirectional activation changes associated with emotion-focused interventions. Results showed small-to-moderate improvements in emotional task performance (Hedges' g = 0.29) and self-reported affective outcomes (g = 0.54). Meta-analytic neuroimaging revealed increased activation in the right caudate and decreased activation in the right insula and left inferior frontal gyrus. Moderator analyses identified intervention type, emotional content, and delivery format as key modulators of these neural effects. Notably, reduced insula activity predicted better emotional outcomes, while right caudate activation increased with age. These findings are consistent with a dual-pathway model of neural plasticity: one marked by frontostriatal engagement (right caudate) and another by dampened salience and semantic-control responses (right insula, left inferior frontal gyrus). The results offer mechanistic insights into how emotion-focused training recalibrates regulatory networks and inform the development of targeted interventions.

NeurIPS Conference 2025 Conference Paper

Non-stationary Equivariant Graph Neural Networks for Physical Dynamics Simulation

  • Chaohao Yuan
  • Maoji Wen
  • Ercan KURUOGLU
  • Yang Liu
  • Jia Li
  • Tingyang Xu
  • Deli Zhao
  • Hong Cheng

To enhance the generalization ability of graph neural networks (GNNs) in learning and simulating physical dynamics, a series of equivariant GNNs have been developed to incorporate the symmetric inductive bias. However, the existing methods do not take into account the non-stationary nature of physical dynamics, where the joint distribution changes over time. Moreover, previous approaches for modeling non-stationary time series typically involve normalizing the data, which disrupts the symmetric assumption inherent in physical dynamics. To model non-stationary physical dynamics while preserving the symmetric inductive bias, we introduce a Non-Stationary Equivariant Graph Neural Network (NS-EGNN) that captures the non-stationarity in physical dynamics while preserving the symmetric property of the model. Specifically, NS-EGNN employs the Fourier transform on segments of physical dynamics to extract time-varying frequency information from the trajectories. It then uses first- and second-order differences to mitigate non-stationarity, followed by pooling for future predictions. By capturing varying frequency characteristics and alleviating the linear and quadratic trends in the raw physical dynamics, NS-EGNN better models the temporal dependencies in the physical dynamics. NS-EGNN has been applied to various types of physical dynamics, including molecular, motion, and protein dynamics. In various scenarios, NS-EGNN consistently surpasses the performance of existing state-of-the-art algorithms, underscoring its effectiveness. The implementation of NS-EGNN is available at https://github.com/MaojiWEN/NS-EGNN.
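The differencing step described above can be illustrated in a few lines (illustrative only, not the NS-EGNN architecture): first- and second-order differences remove linear and quadratic trends from a trajectory, leaving a more stationary signal.

```python
import numpy as np

# Illustrative sketch of differencing for non-stationarity (not NS-EGNN code):
# a second-order difference reduces a purely quadratic trend to a constant.
t = np.arange(10, dtype=float)
traj = 3.0 * t**2 + 2.0 * t + 1.0   # trajectory with a quadratic trend
d1 = np.diff(traj, n=1)             # removes the constant; a linear trend remains
d2 = np.diff(traj, n=2)             # removes the quadratic trend entirely
```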

AAAI Conference 2025 Conference Paper

Perception-Guided Jailbreak Against Text-to-Image Models

  • Yihao Huang
  • Le Liang
  • Tianlin Li
  • Xiaojun Jia
  • Run Wang
  • Weikai Miao
  • Geguang Pu
  • Yang Liu

In recent years, Text-to-Image (T2I) models have garnered significant attention due to their remarkable advancements. However, security concerns have emerged due to their potential to generate inappropriate or Not-Safe-For-Work (NSFW) images. In this paper, inspired by the observation that texts with different semantics can lead to similar human perceptions, we propose an LLM-driven perception-guided jailbreak method, termed PGJ. It is a black-box jailbreak method that requires no specific T2I model (model-free) and generates highly natural attack prompts. Specifically, we propose identifying a safe phrase that is similar in human perception yet inconsistent in text semantics with the target unsafe word and using it as a substitution. The experiments conducted on six open-source models and commercial online services with thousands of prompts have verified the effectiveness of PGJ.

IJCAI Conference 2025 Conference Paper

PeSANet: Physics-encoded Spectral Attention Network for Simulating PDE-Governed Complex Systems

  • Han Wan
  • Rui Zhang
  • Qi Wang
  • Yang Liu
  • Hao Sun

Accurately modeling and forecasting complex systems governed by partial differential equations (PDEs) is crucial in various scientific and engineering domains. However, traditional numerical methods struggle in real-world scenarios due to incomplete or unknown physical laws. Meanwhile, machine learning approaches often fail to generalize effectively when faced with scarce observational data and the challenge of capturing local and global features. To this end, we propose the Physics-encoded Spectral Attention Network (PeSANet), which integrates local and global information to forecast complex systems with limited data and incomplete physical priors. The model consists of two key components: a physics-encoded block that uses hard constraints to approximate local differential operators from limited data, and a spectral-enhanced block that captures long-range global dependencies in the frequency domain. Specifically, we introduce a novel spectral attention mechanism to model inter-spectrum relationships and learn long-range spatial features. Experimental results demonstrate that PeSANet outperforms existing methods across all metrics, particularly in long-term forecasting accuracy, providing a promising solution for simulating complex systems with limited data and incomplete physics.

JMLR Journal 2025 Journal Article

PFLlib: A Beginner-Friendly and Comprehensive Personalized Federated Learning Library and Benchmark

  • Jianqing Zhang
  • Yang Liu
  • Yang Hua
  • Hao Wang
  • Tao Song
  • Zhengui Xue
  • Ruhui Ma
  • Jian Cao

Amid the ongoing advancements in Federated Learning (FL), a machine learning paradigm that allows collaborative learning with data privacy protection, personalized FL (pFL) has gained significant prominence as a research direction within the FL domain. Whereas traditional FL (tFL) focuses on jointly learning a global model, pFL aims to balance each client's global and personalized goals in FL settings. To foster the pFL research community, we started and built PFLlib, a comprehensive pFL library with an integrated benchmark platform. In PFLlib, we implemented 37 state-of-the-art FL algorithms (8 tFL algorithms and 29 pFL algorithms) and provided various evaluation environments with three statistically heterogeneous scenarios and 24 datasets. At present, PFLlib has gained more than 1600 stars and 300 forks on GitHub.

ICLR Conference 2025 Conference Paper

PhyMPGN: Physics-encoded Message Passing Graph Network for spatiotemporal PDE systems

  • Bocheng Zeng
  • Qi Wang 0123
  • Mengtao Yan
  • Yang Liu
  • Ruizhi Chengze
  • Yi Zhang 0164
  • Hongsheng Liu 0002
  • Zidong Wang 0010

Solving partial differential equations (PDEs) serves as a cornerstone for modeling complex dynamical systems. Recent progress has demonstrated the great benefits of data-driven neural models for predicting spatiotemporal dynamics (e.g., tremendous speedup gains compared with classical numerical methods). However, most existing neural models rely on rich training data, have limited extrapolation and generalization abilities, and struggle to produce precise or reliable physical predictions under intricate conditions (e.g., irregular mesh or geometry, complex boundary conditions, diverse PDE parameters, etc.). To this end, we propose a new graph learning approach, namely, Physics-encoded Message Passing Graph Network (PhyMPGN), to model spatiotemporal PDE systems on irregular meshes given small training datasets. Specifically, we incorporate a GNN into a numerical integrator to approximate the temporal marching of spatiotemporal dynamics for a given PDE system. Considering that many physical phenomena are governed by diffusion processes, we further design a learnable Laplace block, which encodes the discrete Laplace-Beltrami operator, to aid and guide the GNN learning in a physically feasible solution space. A boundary condition padding strategy is also designed to improve the model's convergence and accuracy. Extensive experiments demonstrate that PhyMPGN is capable of accurately predicting various types of spatiotemporal dynamics on coarse unstructured meshes, consistently achieves state-of-the-art results, and outperforms other baselines with considerable gains.
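The discrete operator that such a Laplace block encodes can be sketched concretely (illustrative only, not the paper's learnable block): the graph Laplacian of a mesh, whose action on a scalar field gives the diffusion-style quantity the physics prior builds on.

```python
import numpy as np

# Illustrative sketch (not PhyMPGN's learnable Laplace block): the combinatorial
# graph Laplacian of an undirected mesh; L @ u approximates diffusion of u.
def graph_laplacian(edges, n: int) -> np.ndarray:
    L = np.zeros((n, n))
    for i, j in edges:                 # undirected mesh edges
        L[i, i] += 1.0; L[j, j] += 1.0
        L[i, j] -= 1.0; L[j, i] -= 1.0
    return L

L = graph_laplacian([(0, 1), (1, 2), (2, 0)], 3)  # a tiny triangular mesh
u = np.array([1.0, 0.0, 0.0])                      # scalar field on the nodes
```

Row sums of the Laplacian are zero, so diffusion conserves the total quantity, which is the kind of physically feasible behavior the block is meant to encode.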

ICLR Conference 2025 Conference Paper

PINP: Physics-Informed Neural Predictor with latent estimation of fluid flows

  • Huaguan Chen
  • Yang Liu
  • Hao Sun 0002

Accurately predicting fluid dynamics and evolution has been a long-standing challenge in physical sciences. Conventional deep learning methods often rely on the nonlinear modeling capabilities of neural networks to establish mappings between past and future states, overlooking the fluid dynamics, or only modeling the velocity field, neglecting the coupling of multiple physical quantities. In this paper, we propose a new physics-informed learning approach that incorporates coupled physical quantities into the prediction process to assist with forecasting. Central to our method lies in the discretization of physical equations, which are directly integrated into the model architecture and loss function. This integration enables the model to provide robust, long-term future predictions. By incorporating physical equations, our model demonstrates temporal extrapolation and spatial generalization capabilities. Experimental results show that our approach achieves the state-of-the-art performance in spatiotemporal prediction across both numerical simulations and real-world extreme-precipitation nowcasting benchmarks.

AAAI Conference 2025 Conference Paper

PlanLLM: Video Procedure Planning with Refinable Large Language Models

  • Dejie Yang
  • Zijing Zhao
  • Yang Liu

Video procedure planning, i.e., planning a sequence of action steps given the video frames of start and goal states, is an essential ability for embodied AI. Recent works utilize Large Language Models (LLMs) to generate enriched action step description texts to guide action step decoding. Although LLMs are introduced, these methods decode the action steps into a closed set of one-hot vectors, limiting the model's capability of generalizing to new steps or tasks. Additionally, fixed action step descriptions based on world-level commonsense may contain noise in specific instances of visual states. In this paper, we propose PlanLLM, a cross-modal joint learning framework with LLMs for video procedure planning. We propose an LLM-Enhanced Planning module which fully uses the generalization ability of LLMs to produce free-form planning output and to enhance action step decoding. We also propose a Mutual Information Maximization module to connect world-level commonsense of step descriptions and sample-specific information of visual states, enabling LLMs to employ their reasoning ability to generate step sequences. With the assistance of LLMs, our method can handle both closed-set and open-vocabulary procedure planning tasks. Our PlanLLM achieves superior performance on three benchmarks, demonstrating the effectiveness of our designs.

JBHI Journal 2025 Journal Article

Predicting Surgical Outcome in Patients With Drug-Resistant Epilepsy Using Autoregressive Connectivity and Virtual Resection

  • Chunsheng Li
  • Heng Su
  • Yang Liu

Epilepsy is a brain network disorder that manifests through recurrent seizures. In cases of drug-resistant epilepsy, surgical removal of pivotal nodes within the epileptic brain network can lead to seizure freedom. Virtual resection on patient-specific brain network models can aid in the prediction of surgical outcomes. Some studies have investigated virtual resection on undirected brain connectivity networks, such as those using Pearson correlation or structural connectivity. We hypothesize that directed functional connectivity enhances prediction performance. This study proposes a new approach for surgical outcome prediction by applying virtual resection of autoregressive (AR) connectivity networks in epilepsy patients. Intracranial EEG recordings from 16 drug-resistant epilepsy patients were analyzed. The performance of the proposed approach was evaluated based on patients' surgical volumes and prognosis outcomes. We compared the performance of AR connectivity with six other measures and concurrently explored three distinct neural mass models. The results show that virtual resection on AR connectivity achieved a predictive accuracy of 87.5% when paired with the bistable neural mass model. Notably, all eight patients with poor outcomes were accurately identified. In addition, our data show that the estimated epileptic network is relatively stable during the interictal interval. Leveraging the AR model yields estimated directional connectivity among epileptic brain regions, which can then be used effectively for virtual resection. Our approach offers a promising avenue for clinicians in preoperative evaluation and augments existing clinical methodologies.

NeurIPS Conference 2025 Conference Paper

PubSub-VFL: Towards Efficient Two-Party Split Learning in Heterogeneous Environments via Publisher/Subscriber Architecture

  • Yi Liu
  • Yang Liu
  • Leqian Zheng
  • Jue Hong
  • Junjie Shi
  • Qingyou Yang
  • Ye Wu
  • Cong Wang

With the rapid advancement of the digital economy, data collaboration between organizations has become a well-established business model, driving the growth of various industries. However, privacy concerns make direct data sharing impractical. To address this, Two-Party Split Learning (a.k.a. Vertical Federated Learning (VFL)) has emerged as a promising solution for secure collaborative learning. Despite its advantages, this architecture still suffers from low computational resource utilization and training efficiency. Specifically, its synchronous dependency design increases training latency, while resource and data heterogeneity among participants further hinder efficient computation. To overcome these challenges, we propose PubSub-VFL, a novel VFL paradigm with a Publisher/Subscriber architecture optimized for two-party collaborative learning with high computational efficiency. PubSub-VFL leverages the decoupling capabilities of the Pub/Sub architecture and the data parallelism of the parameter server architecture to design a hierarchical asynchronous mechanism, reducing training latency and improving system efficiency. Additionally, to mitigate the training imbalance caused by resource and data heterogeneity, we formalize an optimization problem based on participants' system profiles, enabling the selection of optimal hyperparameters while preserving privacy. We conduct a theoretical analysis to demonstrate that PubSub-VFL achieves stable convergence and is compatible with security protocols such as differential privacy. Extensive case studies on five benchmark datasets further validate its effectiveness, showing that, compared to state-of-the-art baselines, PubSub-VFL not only accelerates training by 2-7x without compromising accuracy but also achieves computational resource utilization of up to 91.07%.
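The publisher/subscriber decoupling described above can be sketched minimally (illustrative only, not the PubSub-VFL system): a bounded queue lets one party publish intermediate results while the other consumes them at its own pace, instead of the two running in lock-step.

```python
import queue
import threading

# Minimal sketch of Pub/Sub decoupling between two parties (illustrative only,
# not the PubSub-VFL implementation; all names are hypothetical).
def run_round(num_batches: int) -> list:
    buffer = queue.Queue(maxsize=4)   # bounded buffer between the two parties
    received = []

    def publisher():                  # party A publishes per-batch embeddings
        for b in range(num_batches):
            buffer.put(f"embedding-{b}")
        buffer.put(None)              # sentinel: round finished

    def subscriber():                 # party B subscribes and processes
        while (item := buffer.get()) is not None:
            received.append(item)

    threads = [threading.Thread(target=publisher), threading.Thread(target=subscriber)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Because `put` blocks when the buffer is full and `get` blocks when it is empty, neither side waits for a full synchronous round trip, which is the latency-hiding idea the abstract describes.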

NeurIPS Conference 2025 Conference Paper

Quadratic Coreset Selection: Certifying and Reconciling Sequence and Token Mining for Efficient Instruction Tuning

  • Ziliang Chen
  • Yongsen Zheng
  • Zhao-Rong Lai
  • Zhanfu Yang
  • Cuixi Li
  • Yang Liu
  • Liang Lin

Instruction Tuning (IT) has recently demonstrated impressive data efficiency in post-training large language models (LLMs). However, the pursuit of efficiency predominantly focuses on sequence-level curation, often overlooking the nuanced impact of critical tokens and the inherent risks of token noise and biases. Drawing inspiration from bi-level coreset selection, our work provides a principled view of the motivation behind selecting instructions' responses. It leads to our approach, Quadratic Coreset Selection (QCS), which reconciles sequence-level and token-level influence contributions, deriving more expressive LLMs with established theoretical results. Although the original QCS framework is challenged by the prohibitive computation of inverting LLM-scale Hessian matrices, we overcome this barrier by proposing a novel QCS probabilistic variant, which relaxes the original formulation through re-parameterized densities. This innovative solver is efficiently learned using hierarchical policy gradients without requiring back-propagation, achieving provable convergence and certified asymptotic equivalence to the original objective. Our experiments demonstrate QCS's superior sequence-level data efficiency and reveal how strategically leveraging token-level influence elevates the performance ceiling of data-efficient IT. Furthermore, QCS's adaptability is showcased through its successes in regular IT and challenging targeted IT scenarios, particularly in the cases of free-form complex instruction-following and CoT reasoning. These results underscore QCS's potential for a wide array of versatile post-training applications.

ICML Conference 2025 Conference Paper

Robust Multi-bit Text Watermark with LLM-based Paraphrasers

  • Xiaojun Xu
  • Jinghan Jia
  • Yuanshun Yao
  • Yang Liu
  • Hang Li 0001

We propose an imperceptible multi-bit text watermark embedded by paraphrasing with LLMs. We fine-tune a pair of LLM paraphrasers that are designed to behave differently so that their paraphrasing difference, reflected in the text semantics, can be identified by a trained decoder. To embed our multi-bit watermark, we use the two paraphrasers alternately to encode the pre-defined binary code at the sentence level. Then we use a text classifier as the decoder to decode each bit of the watermark. Through extensive experiments, we show that our watermarks can achieve over 99.99% detection AUC with small (1.1B) text paraphrasers while keeping the semantic information of the original sentence. More importantly, our pipeline is robust under word substitution and sentence paraphrasing perturbations and generalizes well to out-of-distribution data. We also show the stealthiness of our watermark with LLM-based evaluation.
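The sentence-level encode/decode loop described above can be illustrated with toy stand-ins; `paraphrase_0`, `paraphrase_1`, and `decode_bit` below are hypothetical placeholders for the two fine-tuned paraphrasers and the trained classifier:

```python
# One bit is embedded per sentence by choosing which of the two
# paraphrasers rewrites it; a decoder then classifies each sentence
# back into a bit. The toy paraphrasers here differ only in a suffix.

def paraphrase_0(sentence):       # stand-in for the "bit 0" paraphraser
    return sentence + " indeed."

def paraphrase_1(sentence):       # stand-in for the "bit 1" paraphraser
    return sentence + " in fact."

def decode_bit(sentence):         # stand-in for the trained text classifier
    return 1 if sentence.endswith("in fact.") else 0

def embed(sentences, bits):
    # pick the paraphraser matching each bit, one bit per sentence
    return [paraphrase_1(s) if b else paraphrase_0(s)
            for s, b in zip(sentences, bits)]

def extract(sentences):
    return [decode_bit(s) for s in sentences]

text = ["The model works well", "Results are strong", "Training is fast"]
code = [1, 0, 1]
watermarked = embed(text, code)
assert extract(watermarked) == code   # the binary code round-trips
```

In the actual pipeline the two rewriters are LLMs whose outputs differ only in subtle semantic cues, which is what makes the watermark imperceptible yet decodable.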

NeurIPS Conference 2025 Conference Paper

Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection

  • Zheng Zhan
  • Liliang Ren
  • Shuohang Wang
  • Liyuan Liu
  • Yang Liu
  • Yeyun Gong
  • Yanzhi Wang
  • Yelong Shen

State Space Models (SSMs) offer remarkable performance gains in efficient sequence modeling, with constant per-step inference-time computation and memory complexity. Recent advances, such as Mamba, further enhance SSMs with input-dependent gating and hardware-aware implementations, positioning them as strong alternatives to Transformers for long sequence modeling. However, efficiently scaling the expressive power of SSMs, particularly with Mixture of Experts (MoE), remains challenging, as naive integration attempts often falter or degrade performance. In this work, we introduce Routing Mamba (RoM), a novel approach that scales SSM parameters using sparse mixtures of linear projection experts. By sharing routing decisions between projection layers and lightweight sub-modules within Mamba across experts, RoM leverages synergies among linear projection experts for effective and efficient sparse scaling of Mamba layers. At a scale of 1.3B active parameters (10B total) and 16K training sequence length, RoM achieves language modeling performance equivalent to a dense Mamba model requiring over 2.3$\times$ more active parameters, and demonstrates consistent perplexity across context lengths. Experimental results further show RoM effectively scales hybrid language models, yielding a 23% FLOPS saving compared to dense Mamba scaling for similar performance. We release our training codebase at https://github.com/zhanzheng8585/Routing-Mamba.

AAAI Conference 2025 Conference Paper

S^3cMath: Spontaneous Step-Level Self-Correction Makes Large Language Models Better Mathematical Reasoners

  • Yuchen Yan
  • Jin Jiang
  • Yang Liu
  • Yixin Cao
  • Xin Xu
  • Mengdi Zhang
  • Xunliang Cai
  • Jian Shao

Self-correction is a novel method that can stimulate the potential reasoning abilities of large language models (LLMs). It involves detecting and correcting errors during the inference process when LLMs solve reasoning problems. However, recent works do not regard self-correction as a spontaneous and intrinsic capability of LLMs. Instead, such correction is achieved through post-hoc generation, external knowledge introduction, multi-model collaboration, and similar techniques. In this paper, we propose a series of mathematical LLMs called S^3cMath, which are able to perform Spontaneous Step-level Self-correction for Mathematical reasoning. This capability helps LLMs to recognize whether their ongoing inference tends to contain errors and simultaneously correct these errors to produce a more reliable response. We propose a method that employs step-level sampling to construct step-wise self-correction data for achieving this ability. Additionally, we implement a training strategy that uses the constructed data to equip LLMs with spontaneous step-level self-correction capacities. Our data and methods have been demonstrated to be effective across various foundation LLMs, consistently showing significant progress in evaluations on GSM8K, MATH, and other mathematical benchmarks. To the best of our knowledge, we are the first to introduce the spontaneous step-level self-correction ability of LLMs in mathematical reasoning.

IJCAI Conference 2025 Conference Paper

SAP: Privacy-Preserving Fine-Tuning on Language Models with Split-and-Privatize Framework

  • Xicong Shen
  • Yang Liu
  • Yi Liu
  • Peiran Wang
  • Huiqi Liu
  • Jue Hong
  • Bing Duan
  • Zirui Huang

Pre-trained Language Models (PLMs) have enabled a cost-effective approach to handling various downstream applications via Parameter-Efficient Fine-Tuning (PEFT) techniques. In this context, service providers have introduced a popular fine-tuning-based product service known as Model-as-a-Service (MaaS). This service offers users access to extensive PLMs and training resources. With MaaS, users can fine-tune, deploy, and utilize their customized models seamlessly, leveraging a one-stop platform that allows them to work with their private datasets efficiently. However, this service paradigm has recently been shown to risk leaking users' private data. To this end, we identify the data privacy leakage risks in MaaS-based PEFT and propose a Split-and-Privatize (SAP) framework, mitigating the privacy leakage by integrating split learning and differential privacy into MaaS PEFT. Furthermore, we propose Contributing-Token-Identification (CTI), a novel method to balance model utility degradation and privacy leakage. The proposed framework is comprehensively evaluated, demonstrating a 65% improvement in empirical privacy with only a 1% degradation in model performance on the Stanford Sentiment Treebank dataset, outperforming existing state-of-the-art baselines.

NeurIPS Conference 2025 Conference Paper

SeCon-RAG: A Two-Stage Semantic Filtering and Conflict-Free Framework for Trustworthy RAG

  • Xiaonan Si
  • Meilin Zhu
  • Simeng Qin
  • Lijia Yu
  • Lijun Zhang
  • Shuaitong Liu
  • Xinfeng Li
  • Ranjie Duan

Retrieval-augmented generation (RAG) systems enhance large language models (LLMs) with external knowledge but are vulnerable to corpus poisoning and contamination attacks, which can compromise output integrity. Existing defenses often apply aggressive filtering, leading to unnecessary loss of valuable information and reduced reliability in generation. To address this problem, we propose SeCon-RAG, a two-stage semantic filtering and conflict-free framework for trustworthy RAG. In the first stage, we perform joint semantic and cluster-based filtering guided by an entity-intent-relation extractor (EIRE). EIRE extracts entities, latent objectives, and entity relations from both the user query and filtered documents, scores their semantic relevance, and selectively adds valuable documents into the clean retrieval database. In the second stage, we propose an EIRE-guided conflict-aware filtering module, which analyzes semantic consistency between the query, candidate answers, and retrieved knowledge before final answer generation, filtering out internal and external contradictions that could mislead the model. Through this two-stage process, SeCon-RAG effectively preserves useful knowledge while mitigating conflict contamination, achieving significant improvements in both generation robustness and output trustworthiness. Extensive experiments across various LLMs and datasets demonstrate that SeCon-RAG markedly outperforms state-of-the-art defense methods.

ICML Conference 2025 Conference Paper

Self-cross Feature based Spiking Neural Networks for Efficient Few-shot Learning

  • Qi Xu 0008
  • Junyang Zhu
  • Dongdong Zhou
  • Hao Chen
  • Yang Liu
  • Jiangrong Shen
  • Qiang Zhang 0008

Deep neural networks (DNNs) excel in computer vision tasks, especially few-shot learning (FSL), which is increasingly important for generalizing from limited examples. However, DNNs are computationally expensive and face scalability issues in real-world settings. Spiking Neural Networks (SNNs), with their event-driven nature and low energy consumption, are particularly efficient in processing sparse and dynamic data, though they still encounter difficulties in capturing complex spatiotemporal features and performing accurate cross-class comparisons. To further enhance the performance and efficiency of SNNs in few-shot learning, we propose a few-shot learning framework based on SNNs, which combines a self-feature extractor module and a cross-feature contrastive module to refine feature representation and reduce power consumption. We apply the combination of temporal efficient training loss and InfoNCE loss to optimize the temporal dynamics of spike trains and enhance the discriminative power. Experimental results show that the proposed FSL-SNN significantly improves classification performance on the neuromorphic dataset N-Omniglot, and also achieves performance competitive with ANNs on static datasets such as CUB and miniImageNet with low power consumption.

IJCAI Conference 2025 Conference Paper

Sharpness-aware Zeroth-order Optimization for Graph Transformers

  • Yang Liu
  • Chuan Zhou
  • Yuhan Lin
  • Shuai Zhang
  • Yang Gao
  • Zhao Li
  • Shirui Pan

Graph Transformers (GTs) have emerged as powerful tools for handling graph-structured data through global attention mechanisms. While GTs can effectively capture long-range dependencies, they introduce difficulties in optimization due to their complex, non-differentiable operators, which cannot be directly handled by standard gradient-based optimizers (such as Adam or AdamW). To address these issues, this work adopts the Zeroth-Order Optimization (ZOO) line of techniques. However, direct integration of ZOO incurs considerable challenges due to the sharp loss landscape and steep gradients within the GT parameter space. Based on these observations, we propose a Sharpness-aware Zeroth-order Optimizer (SZO) that incorporates the Sharpness-Aware Minimization (SAM) technique to facilitate convergence within a flatter neighborhood and leverages parallel computing for efficient gradient estimation. Theoretically, we provide a comprehensive analysis of the optimizer from both convergence and generalization perspectives. Empirically, we conduct extensive experiments on various classical GTs across a wide range of benchmark datasets, which underscore the superior performance of SZO over state-of-the-art optimizers.
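As a minimal illustration of the zeroth-order building block SZO rests on (the full method additionally adds the SAM flatness term and parallel perturbations, omitted here), a two-point gradient estimator queries only loss values, never gradients:

```python
import random

# Two-point zeroth-order gradient estimate: average finite differences
# of the loss along random Gaussian directions. This needs only
# function evaluations, which is what makes ZOO applicable when
# operators are non-differentiable.

def zo_gradient(loss, x, mu=1e-3, n_samples=32):
    d = len(x)
    grad = [0.0] * d
    for _ in range(n_samples):
        u = [random.gauss(0.0, 1.0) for _ in range(d)]
        x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
        x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
        scale = (loss(x_plus) - loss(x_minus)) / (2.0 * mu)
        for i in range(d):
            grad[i] += scale * u[i] / n_samples
    return grad

# Sanity check on a quadratic: the true gradient of sum(x_i^2) is 2*x.
random.seed(0)
x = [1.0, -2.0]
g = zo_gradient(lambda v: sum(vi * vi for vi in v), x, n_samples=2000)
print(g)  # approximately [2.0, -4.0]
```

More samples reduce the estimator's variance, which is why the paper leans on parallel computing for the perturbation evaluations.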

EAAI Journal 2025 Journal Article

Short-term prediction of dissolved oxygen and water temperature using deep learning with dual proportional-integral-derivative error corrector in pond culture

  • Xinhui Zhou
  • Yinfeng Hao
  • Yang Liu
  • Lanxue Dang
  • Baojun Qiao
  • Xianyu Zuo

Dissolved oxygen (DO) and water temperature (WT) are the most important water quality indicators that directly affect the metabolism and development of aquatic products in pond cultures. Therefore, the accurate prediction of these two water quality factors is crucial for improving aquaculture efficiency; however, the complexity and variability of pond aquaculture environments make accurate and efficient water quality prediction challenging in this field. To address this problem, this study proposes a deep learning method combined with a proportional-integral-derivative (PID) error corrector for the accurate and rapid prediction of DO and WT. This method consists of two parts: a benchmark deep-network prediction model and a PID error corrector. The benchmark model is used for the multistep forward prediction of DO and WT, providing feedback input for the PID error corrector, which obtains the prediction error by comparing the feedback input with the actual values of the two predicted variables and then, based on the prediction error, calculates the error correction amount for the current predicted values of the benchmark model. The proposed method was validated using three chaotic time-series datasets and an actual pond aquaculture water environment dataset. The results demonstrate the effectiveness of the proposed PID error corrector, which significantly improved the prediction accuracy of the benchmark model despite its simple structure (with only three adjustable parameters). Thus, the PID error corrector has great engineering application prospects in the field of pond culture water quality monitoring.
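A discrete PID corrector with the three adjustable gains mentioned above can be sketched as follows; the gain values and signals are illustrative, not the paper's tuned settings:

```python
# Minimal discrete PID error corrector: the correction is a weighted
# sum of the current prediction error (P), its running sum (I), and
# its change since the last step (D) — exactly three tunable gains.

class PIDCorrector:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def correct(self, predicted, observed):
        error = observed - predicted            # feedback: actual minus forecast
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        # correction added to the benchmark model's next prediction
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PIDCorrector(kp=0.5, ki=0.1, kd=0.05)
raw_pred, actual = 7.2, 8.0                     # e.g. DO in mg/L
corrected = raw_pred + pid.correct(raw_pred, actual)
print(round(corrected, 3))  # 7.72
```

In the paper's setup the benchmark deep network supplies `raw_pred` at each step, and the corrector nudges it toward the most recent observations.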

NeurIPS Conference 2025 Conference Paper

SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning

  • Yang Liu
  • Ming Ma
  • Xiaomin Yu
  • Pengxiang Ding
  • Han Zhao
  • Mingyang Sun
  • Siteng Huang
  • Donglin Wang

Despite impressive advancements in Visual-Language Models (VLMs) for multi-modal tasks, their reliance on RGB inputs limits precise spatial understanding. Existing methods for integrating spatial cues, such as point clouds or depth, either require specialized sensors or fail to effectively exploit depth information for higher-order reasoning. To this end, we propose SSR, a novel Spatial Sense and Reasoning framework that transforms raw depth data into structured, interpretable textual rationales. These textual rationales serve as meaningful intermediate representations to significantly enhance spatial reasoning capabilities. Additionally, we leverage knowledge distillation to compress the generated rationales into compact latent embeddings, which facilitate resource-efficient and plug-and-play integration into existing VLMs without retraining. To enable comprehensive evaluation, we introduce SSR-CoT, a million-scale visual-language reasoning dataset enriched with intermediate spatial reasoning annotations, and present SSRBench, a comprehensive multi-task benchmark. Extensive experiments on multiple benchmarks demonstrate that SSR substantially improves depth utilization and enhances spatial reasoning, thereby advancing VLMs toward more human-like multi-modal understanding. Project page: https://yliu-cs.github.io/SSR.

NeurIPS Conference 2025 Conference Paper

TC-Light: Temporally Coherent Generative Rendering for Realistic World Transfer

  • Yang Liu
  • Chuanchen Luo
  • Zimo Tang
  • Yingyan Li
  • Yuran Yang
  • Yuanyong Ning
  • Lue Fan
  • Junran Peng

Illumination and texture rerendering are critical dimensions for world-to-world transfer, which is valuable for applications including sim2real and real2real visual data scaling up for embodied AI. Existing techniques generatively re-render the input video to realize the transfer, such as video relighting models and conditioned world generation models. Nevertheless, these models are predominantly limited to the domain of training data (e.g., portrait) or fall into the bottleneck of temporal consistency and computation efficiency, especially when the input video involves complex dynamics and long durations. In this paper, we propose TC-Light, a novel paradigm characterized by the proposed two-stage post optimization mechanism. Starting from the video preliminarily relighted by an inflated video relighting model, it optimizes appearance embedding in the first stage to align global illumination. Then it optimizes the proposed canonical video representation, i.e., Unique Video Tensor (UVT), to align fine-grained texture and lighting in the second stage. To comprehensively evaluate performance, we also establish a long and highly dynamic video benchmark. Extensive experiments show that our method enables physically plausible re-rendering results with superior temporal coherence and low computation cost. The code and video demos are available at our Project Page.

AIJ Journal 2025 Journal Article

TeachText: CrossModal text-video retrieval through generalized distillation

  • Ioana Croitoru
  • Simion-Vlad Bogolin
  • Marius Leordeanu
  • Hailin Jin
  • Andrew Zisserman
  • Yang Liu
  • Samuel Albanie

In recent years, considerable progress on the task of text-video retrieval has been achieved by leveraging large-scale pretraining on visual and audio datasets to construct powerful video encoders. By contrast, despite the natural symmetry, the design of effective algorithms for exploiting large-scale language pretraining remains under-explored. In this work, we investigate the design of such algorithms and propose a novel generalized distillation method, TeachText, which leverages complementary cues from multiple text encoders to provide an enhanced supervisory signal to the retrieval model. TeachText yields significant gains on a number of video retrieval benchmarks without incurring additional computational overhead during inference and was used to produce the winning entry in the Condensed Movie Challenge at ICCV 2021. We show how TeachText can be extended to include multiple video modalities, reducing computational cost at inference without compromising performance. Finally, we demonstrate the application of our method to the task of removing noisy descriptions from the training partitions of retrieval datasets to improve performance. Code and data can be found at https://www.robots.ox.ac.uk/~vgg/research/teachtext/.

NeurIPS Conference 2025 Conference Paper

Towards Minimizing Feature Drift in Model Merging: Layer-wise Task Vector Fusion for Adaptive Knowledge Integration

  • Wenju Sun
  • Qingyong Li
  • Wen Wang
  • Yang Liu
  • Yangliao Geng
  • Boyang Li

Multi-task model merging aims to consolidate knowledge from multiple fine-tuned task-specific experts into a unified model while minimizing performance degradation. Existing methods primarily approach this by minimizing differences between task-specific experts and the unified model, either from a parameter-level or a task-loss perspective. However, parameter-level methods exhibit a significant performance gap compared to the upper bound, while task-loss approaches entail costly secondary training procedures. In contrast, we observe that performance degradation closely correlates with feature drift, i.e., differences in feature representations of the same sample caused by model merging. Motivated by this observation, we propose Layer-wise Optimal Task Vector Merging (LOT Merging), a technique that explicitly minimizes feature drift between task-specific experts and the unified model in a layer-by-layer manner. LOT Merging can be formulated as a convex quadratic optimization problem, enabling us to analytically derive closed-form solutions for the parameters of linear and normalization layers. Consequently, LOT Merging achieves efficient model consolidation through basic matrix operations. Extensive experiments across vision and vision-language benchmarks demonstrate that LOT Merging significantly outperforms baseline methods, achieving improvements of up to 4.4% (ViT-B/32) over state-of-the-art approaches. The source code is available at https://github.com/SunWenJu123/model-merging.
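As a toy, scalar illustration of the closed-form idea (the paper's actual solution is the matrix analogue for full linear and normalization layers), minimizing feature drift for a single shared weight reduces to a least-squares problem:

```python
# Choose a merged weight w minimizing the feature drift
#   sum over tasks t and samples x of (w*x - w_t*x)^2,
# where w_t are the task experts' weights. Setting the derivative to
# zero gives the closed form below — no retraining, just arithmetic.

def lot_merge_scalar(expert_weights, features):
    sxx = sum(x * x for x in features)                        # sum of x^2
    sxy = sum(w * x * x for w in expert_weights for x in features)
    return sxy / (len(expert_weights) * sxx)

experts = [1.0, 3.0]          # two task-specific expert weights
xs = [0.5, -1.2, 2.0]         # shared input features
w = lot_merge_scalar(experts, xs)
print(w)  # ≈ 2.0: with identical features per task, the drift-minimizing
          # merge is the experts' mean
```

For a real linear layer the same derivation yields a matrix least-squares solution over the layer's input activations, which is why the method needs only basic matrix operations.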

EAAI Journal 2025 Journal Article

Towards non-visual bowel cancer diagnosis: A certainty-aware data-driven method of lesion characterisation using a vibrating capsule

  • Kenneth Omokhagbo Afebu
  • Yang Liu
  • Evangelos Papatheou
  • Shyam Prasad

Recent advances in miniaturised, dynamically actuated robots have opened new pathways for non-visual, in-situ disease diagnosis. This study explores a novel method for early bowel cancer detection using a self-propelled robotic capsule that navigates the bowel and detects lesions based on variations in tissue stiffness. The approach capitalises on the sensitivity of the capsule’s dynamic responses to surrounding tissue properties. A dual-phase machine learning framework is proposed. The first phase uses regression models including multilayer perceptron (MLP), support vector regression (SVR), and Gaussian process regression (GPR) to predict tissue stiffness from displacement signal features. The second phase uses a Gaussian mixture model (GMM) to cluster the predicted stiffness values into different categories. Unlike our previous work, this study emphasises the robustness of the models under varying data conditions using both accuracy and reliability-oriented metrics. Based on our studies, MLP provided the most reliable regression results for simulated data and downstream clustering, though GPR performed better on experimental datasets. SVR consistently underperformed, especially on experimental data. The GMM achieved over 89% clustering accuracy across both simulated and experimental datasets, with improved results when predictions from more accurate regression models are used as the inputs. This work demonstrates a promising step toward dynamic, in-situ lesion characterisation and highlights the potential for integrating lesion biomechanics into future endoscopic diagnosis.

EAAI Journal 2025 Journal Article

Uncovering the geometry-dependent optical asymmetry of gold nanorods helical assemblies using artificial neural networks

  • Yang Liu
  • Yongguang Chen
  • Xiyang Wei
  • Jianhua Shang
  • Lina Zhao

The optical asymmetry of gold nanorods (Au-NRs) helical assemblies is well-documented, with a wide range of applications. Nevertheless, the geometry-dependent optical asymmetry within these assemblies has not been adequately explored and quantified. The present study proposes a novel approach to predict the optical asymmetry of Au-NRs helical assemblies based on geometric characteristics using artificial neural networks (ANN). The performance of the ANN, termed 3NHL50NN, was significantly enhanced through optimization of the hidden layers and nodes, resulting in an R² of the outcomes exceeding 0.998 and a reduction in computational time exceeding 99.99%. In instances where specific geometric characteristics are needed to attain a desired optical asymmetry, retrieval of the geometric characteristics of Au-NRs helical assemblies was additionally investigated using a particle swarm optimization (PSO) algorithm featuring a traversing mechanism. The retrieval results were obtained within 6 s and demonstrate a high degree of accuracy and reliability. The combination of the 3NHL50NN and the PSO algorithm is capable of accurately predicting the optical asymmetry of Au-NRs helical assemblies and retrieving the geometric characteristics, thereby enabling a quantitative understanding of their overall geometry-dependent optical asymmetry.

NeurIPS Conference 2024 Conference Paper

3D Structure Prediction of Atomic Systems with Flow-based Direct Preference Optimization

  • Rui Jiao
  • Xiangzhe Kong
  • Wenbing Huang
  • Yang Liu

Predicting high-fidelity 3D structures of atomic systems is a fundamental yet challenging problem in scientific domains. While recent work demonstrates the advantage of generative models in this realm, the exploration of different probability paths is still insufficient, and hallucinations persistently occur during sampling. To address these pitfalls, we introduce FlowDPO, a novel framework that explores various probability paths with flow matching models and further suppresses hallucinations using Direct Preference Optimization (DPO) for structure generation. Our approach begins with a pre-trained flow matching model to generate multiple candidate structures for each training sample. These structures are then evaluated and ranked based on their distance to the ground truth, resulting in an automatic preference dataset. Using this dataset, we apply DPO to optimize the original model, improving its performance in generating structures closely aligned with the desired reference distribution. As confirmed by our theoretical analysis, such a paradigm and objective function are compatible with arbitrary Gaussian paths, exhibiting favorable universality. Extensive experimental results on antibodies and crystals demonstrate substantial benefits of our FlowDPO, highlighting its potential to advance the field of 3D structure prediction with generative models.
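The automatic preference-dataset step described above can be sketched as follows; structures are toy 2-D point sets and the distance metric is an illustrative stand-in for the structural error used in practice:

```python
import math

# Candidate structures are ranked by their distance to the ground
# truth; the closest and farthest become a DPO (chosen, rejected)
# pair, with no human labeling involved.

def distance(candidate, truth):
    # mean per-point Euclidean error between two point sets
    return sum(math.dist(p, q) for p, q in zip(candidate, truth)) / len(truth)

def preference_pair(candidates, truth):
    ranked = sorted(candidates, key=lambda c: distance(c, truth))
    return ranked[0], ranked[-1]   # (chosen, rejected)

truth = [(0.0, 0.0), (1.0, 0.0)]
cands = [
    [(0.1, 0.0), (1.0, 0.1)],      # close to the ground truth
    [(2.0, 2.0), (3.0, 2.0)],      # far from the ground truth
]
chosen, rejected = preference_pair(cands, truth)
assert chosen == cands[0] and rejected == cands[1]
```

The resulting (chosen, rejected) pairs then feed the standard DPO objective to fine-tune the flow matching model away from hallucinated structures.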

IJCAI Conference 2024 Conference Paper

3D Vision and Language Pretraining with Large-Scale Synthetic Data

  • Dejie Yang
  • Zhu Xu
  • Wentao Mo
  • Qingchao Chen
  • Siyuan Huang
  • Yang Liu

3D Vision-Language Pre-training (3D-VLP) aims to provide a pre-trained model that can bridge 3D scenes with natural language, which is an important technique for embodied intelligence. However, current 3D-VLP datasets are hindered by limited scene-level diversity and insufficient fine-grained annotations (only 1.2K scenes and 280K textual annotations in ScanScribe), primarily due to the labor-intensive process of collecting and annotating 3D scenes. To overcome these obstacles, we construct SynVL3D, a comprehensive synthetic scene-text corpus with 10K indoor scenes and 1M descriptions at object, view, and room levels, which has the advantages of diverse scene data, rich textual descriptions, multi-grained 3D-text associations, and low collection cost. Utilizing the rich annotations in SynVL3D, we pre-train a simple and unified Transformer for aligning 3D and language with multi-grained pretraining tasks. Moreover, we propose a synthetic-to-real domain adaptation in the downstream fine-tuning process to address the domain shift. Through extensive experiments, we verify the effectiveness of our model design by achieving state-of-the-art performance on downstream tasks including visual grounding, dense captioning, and question answering. Codes are available at: https://github.com/idejie/3DSyn

IJCAI Conference 2024 Conference Paper

A General Black-box Adversarial Attack on Graph-based Fake News Detectors

  • Peican Zhu
  • Zechen Pan
  • Yang Liu
  • Jiwei Tian
  • Keke Tang
  • Zhen Wang

Graph Neural Network (GNN)-based fake news detectors apply various methods to construct graphs, aiming to learn distinctive news embeddings for classification. Since the construction details are unknown for attackers in a black-box scenario, it is unrealistic to conduct the classical adversarial attacks that require a specific adjacency matrix. In this paper, we propose the first general black-box adversarial attack framework, i.e., General Attack via Fake Social Interaction (GAFSI), against detectors based on different graph structures. Specifically, as sharing is an important social interaction for GNN-based fake news detectors to construct the graph, we simulate sharing behaviors to fool the detectors. Firstly, we propose a fraudster selection module to select engaged users leveraging local and global information. In addition, a post injection module guides the selected users to create shared relations by sending posts. The sharing records will be added to the social context, leading to a general attack against different detectors. Experimental results on empirical datasets demonstrate the effectiveness of GAFSI.

TCS Journal 2024 Journal Article

A mechanism design approach for multi-party machine learning

  • Mengjing Chen
  • Yang Liu
  • Weiran Shen
  • Yiheng Shen
  • Pingzhong Tang
  • Qiang Yang

In a multi-party machine learning system, different parties cooperate on optimizing towards better models by sharing data in a privacy-preserving way. A major challenge in learning is the incentive issue. For example, if there is competition among the parties, one may strategically hide their data to prevent other parties from getting better models. In this paper, we study the problem through the lens of mechanism design and incorporate the features of multi-party learning in our setting. First, each agent's valuation has externalities that depend on others' types and actions. Second, each agent can only misreport a type lower than his true type, but not the other way round. We provide the optimal truthful mechanism in the separable utility setting, as well as necessary and sufficient conditions for truthful mechanisms in general cases. Finally, we propose an algorithm to find the desirable mechanism that is truthful, individually rational, efficient and weakly budget-balanced, and analyze the computational complexity of the algorithm.

NeurIPS Conference 2024 Conference Paper

A Simple Image Segmentation Framework via In-Context Examples

  • Yang Liu
  • Chenchen Jing
  • Hengtao Li
  • Muzhi Zhu
  • Hao Chen
  • Xinlong Wang
  • Chunhua Shen

Recently, there have been explorations of generalist segmentation models that can effectively tackle a variety of image segmentation tasks within a unified in-context learning framework. However, these methods still struggle with task ambiguity in in-context segmentation, as not all in-context examples can accurately convey the task information. To address this issue, we present SINE, a simple image $\textbf{S}$egmentation framework utilizing $\textbf{in}$-context $\textbf{e}$xamples. Our approach leverages a Transformer encoder-decoder structure, where the encoder provides high-quality image representations and the decoder is designed to yield multiple task-specific output masks to eliminate task ambiguity effectively. Specifically, we introduce an In-context Interaction module to complement in-context information and produce correlations between the target image and the in-context example, and a Matching Transformer that uses fixed matching and a Hungarian algorithm to eliminate differences between tasks. In addition, we further refine the current evaluation system for in-context image segmentation, aiming to facilitate a holistic appraisal of these models. Experiments on various segmentation tasks show the effectiveness of the proposed method.

NeurIPS Conference 2024 Conference Paper

Achievable Fairness on Your Data With Utility Guarantees

  • Muhammad F. Taufiq
  • Jean-François Ton
  • Yang Liu

In machine learning fairness, training models that minimize disparity across different sensitive groups often leads to diminished accuracy, a phenomenon known as the fairness-accuracy trade-off. The severity of this trade-off inherently depends on dataset characteristics such as dataset imbalances or biases and therefore, using a uniform fairness requirement across diverse datasets remains questionable. To address this, we present a computationally efficient approach to approximate the fairness-accuracy trade-off curve tailored to individual datasets, backed by rigorous statistical guarantees. By utilizing the You-Only-Train-Once (YOTO) framework, our approach mitigates the computational burden of having to train multiple models when approximating the trade-off curve. Crucially, we introduce a novel methodology for quantifying uncertainty in our estimates, thereby providing practitioners with a robust framework for auditing model fairness while avoiding false conclusions due to estimation errors. Our experiments spanning tabular (e.g., Adult), image (CelebA), and language (Jigsaw) datasets underscore that our approach not only reliably quantifies the optimum achievable trade-offs across various data modalities but also helps detect suboptimality in SOTA fairness methods.

EAAI Journal 2024 Journal Article

Automated pixel-level pavement marking detection based on a convolutional transformer

  • Hang Zhang
  • Anzheng He
  • Zishuo Dong
  • Allen A. Zhang
  • Yang Liu
  • You Zhan
  • Kelvin C.P. Wang
  • Zhihao Lin

Accurate detection of pavement markings at the pixel level is crucial for enhancing traffic safety. The majority of current advanced deep-learning networks predominantly focus on localized features, neglecting the global context of pavement images. Such networks often result in discontinuous segmentation outcomes and suboptimal recovery of local details. In this paper, a robust model named C-Transformer is proposed to provide an effective solution to this challenge. The contributions of this paper primarily involve two aspects. Firstly, the proposed C-Transformer is designed to succinctly integrate convolution operations and self-attention, facilitating a comprehensive understanding of essential features. Secondly, an efficient Feed-Forward Network called the Inverse Residual Feed-Forward Network is also proposed in this paper and deployed in C-Transformer to improve latent representations. Experimental results demonstrate that, compared to other state-of-the-art networks, the proposed C-Transformer achieves a performance enhancement of 0.93% in F-measure and a 1.64% improvement in Intersection-Over-Union. In particular, the robustness and effectiveness of the C-Transformer in accurate pavement marking detection are proved through field test results. This paper illustrates the feasibility of employing a hybrid Convolutional neural network-Transformer-based network for automatic robust pavement marking detection under noisy conditions.

EAAI Journal 2024 Journal Article

Benchmarking deep Facial Expression Recognition: An extensive protocol with balanced dataset in the wild

  • Gianmarco Ipinze Tutuianu
  • Yang Liu
  • Ari Alamäki
  • Janne Kauttonen

Facial expression recognition (FER) is crucial in enhancing human-computer interaction. While current FER methods, leveraging various open-source deep learning models and training techniques, have shown promising accuracy and generalizability, their efficacy often diminishes in real-world scenarios that are not extensively studied. Addressing this gap, we introduce a novel in-the-wild balanced testing facial expression dataset designed for cross-domain validation, called BTFER. We rigorously evaluated widely utilized networks and self-designed architectures, adhering to a standardized protocol. Additionally, we explored different configurations, including input resolutions, class balance management, and pre-training strategies, to ascertain their impact on performance. Through comprehensive testing across three major FER datasets and our in-depth cross-validation, we have ranked these network architectures and formulated a series of practical guidelines for implementing deep learning-based FER solutions in real-life applications. This paper also delves into the ethical considerations, privacy concerns, and regulatory aspects relevant to the deployment of FER technologies in sectors such as marketing, education, entertainment, and healthcare, aiming to foster responsible and effective use. The BTFER dataset and the implementation code are available on Kaggle and GitHub, respectively.

AAAI Conference 2024 Conference Paper

Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA

  • Wentao Mo
  • Yang Liu

In 3D Visual Question Answering (3D VQA), the scarcity of fully annotated data and limited visual content diversity hamper generalization to novel scenes and 3D concepts (e.g., only around 800 scenes are used in the ScanQA and SQA datasets). Current approaches resort to supplementing 3D reasoning with 2D information. However, these methods face challenges: either they use top-down 2D views that introduce overly complex and sometimes question-irrelevant visual clues, or they rely on globally aggregated scene/image-level representations from 2D VLMs, losing the fine-grained vision-language correlations. To overcome these limitations, our approach utilizes a question-conditional 2D view selection procedure, pinpointing semantically relevant 2D inputs for crucial visual clues. We then integrate this 2D knowledge into the 3D-VQA system via a two-branch Transformer structure. This structure, featuring a Twin-Transformer design, compactly combines the 2D and 3D modalities and captures fine-grained correlations between them, allowing the two modalities to mutually augment each other. Integrating the mechanisms above, we present BridgeQA, which offers a fresh perspective on multi-modal transformer-based architectures for 3D-VQA. Experiments validate that BridgeQA achieves state-of-the-art results on 3D-VQA datasets and significantly outperforms existing solutions. Code is available at https://github.com/matthewdm0816/BridgeQA.
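The question-conditional view selection described above can be pictured as a similarity ranking between a question embedding and candidate-view embeddings. A toy sketch, not the paper's implementation (the embedding shapes and cosine-similarity scoring are assumptions for illustration):

```python
import numpy as np

def select_views(q_emb, view_embs, k=2):
    # Rank candidate 2D views by cosine similarity to the question
    # embedding and keep the top-k as 2D input for the 3D branch.
    q = q_emb / np.linalg.norm(q_emb)
    v = view_embs / np.linalg.norm(view_embs, axis=1, keepdims=True)
    return np.argsort(-(v @ q))[:k]
```

With a question embedding close to one view's embedding, that view ranks first; question-irrelevant views (near-orthogonal or opposite embeddings) are filtered out before fusion.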

JMLR Journal 2024 Journal Article

Cluster-Adaptive Network A/B Testing: From Randomization to Estimation

  • Yang Liu
  • Yifan Zhou
  • Ping Li
  • Feifang Hu

The performance of A/B testing in both online and offline experimental settings hinges on mitigating network interference and achieving covariate balance. These experiments often involve an observable network with identifiable clusters, and measurable cluster-level and individual-level attributes. Exploiting these inherent characteristics holds potential for refining experimental design and subsequent statistical analyses. In this article, we propose a novel cluster-adaptive network A/B testing procedure, which comprises a cluster-adaptive randomization (CLAR) and a cluster-adjusted estimator (CAE) to facilitate the design of the experiment and enhance the performance of ATE estimation. CLAR sequentially assigns clusters to minimize the Mahalanobis distance, which in turn balances the cluster-level covariates and the within-cluster-averaged individual-level covariates. The cluster-adjusted estimator (CAE) is tailored to offset biases caused by network interference. The proposed procedure has two desirable properties. First, we show that the Mahalanobis distance calculated for the two levels of covariates is $O_p(m^{-1})$, where $m$ represents the number of clusters. This result justifies the simultaneous balance of the cluster-level and individual-level covariates. Second, under mild conditions, we derive the asymptotic normality of the CAE and demonstrate the benefit of covariate balancing in improving the precision of ATE estimation. The proposed A/B testing procedure is easy to compute, consistent, and achieves higher accuracy. Extensive numerical studies demonstrate the finite-sample properties of the proposed network A/B testing procedure.
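The sequential balancing idea can be sketched as a greedy rule: assign each arriving cluster to whichever arm keeps the Mahalanobis imbalance of covariate means smallest. This is a deterministic simplification for illustration, not the paper's CLAR procedure (which is a randomization scheme):

```python
import numpy as np

def imbalance(x_a, x_b, cov_inv):
    # Mahalanobis distance between arm-wise covariate means.
    d = x_a.mean(axis=0) - x_b.mean(axis=0)
    return float(d @ cov_inv @ d)

def sequential_assign(clusters):
    # clusters: (m, p) cluster-level covariates, in arrival order.
    # Greedy stand-in for cluster-adaptive assignment: put each new
    # cluster on the arm that minimizes covariate imbalance so far.
    cov_inv = np.linalg.pinv(np.cov(clusters.T))
    arms = [0, 1]  # seed one cluster per arm
    for i in range(2, len(clusters)):
        scores = []
        for arm in (0, 1):
            trial = np.array(arms + [arm])
            x = clusters[: i + 1]
            scores.append(imbalance(x[trial == 0], x[trial == 1], cov_inv))
        arms.append(int(np.argmin(scores)))
    return np.array(arms)
```

In the paper's setting the same distance is computed jointly over cluster-level and within-cluster-averaged individual-level covariates, and the assignment is randomized rather than greedy.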

AAAI Conference 2024 Conference Paper

Comprehensive Visual Grounding for Video Description

  • Wenhui Jiang
  • Yibo Cheng
  • Linxin Liu
  • Yuming Fang
  • Yuxin Peng
  • Yang Liu

The grounding accuracy of existing video captioners still falls short of expectations. The majority of existing methods perform grounded video captioning on sparse entity annotations, whereas the captioning accuracy often suffers from degenerated object appearances in the annotated area, such as motion blur and video defocus. Moreover, these methods seldom consider the complex interactions among entities. In this paper, we propose a comprehensive visual grounding network to improve video captioning, by explicitly linking the entities and actions to the visual clues across the video frames. Specifically, the network consists of spatial-temporal entity grounding and action grounding. The proposed entity grounding encourages the attention mechanism to focus on informative spatial areas across video frames, even though the entity is annotated in only one frame of a video. The action grounding dynamically associates the verbs to related subjects and the corresponding context, which keeps fine-grained spatial and temporal details for action prediction. Both entity grounding and action grounding are formulated as a unified task guided by a soft grounding supervision, which brings architecture simplification and improves training efficiency as well. We conduct extensive experiments on two challenging datasets, and demonstrate significant performance improvements of +2.3 CIDEr on ActivityNet-Entities and +2.2 CIDEr on MSR-VTT compared to state-of-the-art methods.

NeurIPS Conference 2024 Conference Paper

Customized Subgraph Selection and Encoding for Drug-drug Interaction Prediction

  • Haotong Du
  • Quanming Yao
  • Juzheng Zhang
  • Yang Liu
  • Zhen Wang

Subgraph-based methods have proven to be effective and interpretable in predicting drug-drug interactions (DDIs), which are essential for medical practice and drug development. Subgraph selection and encoding are critical stages in these methods, yet customizing these components remains underexplored due to the high cost of manual adjustments. In this study, inspired by the success of neural architecture search (NAS), we propose a method to search for data-specific components within subgraph-based frameworks. Specifically, we introduce extensive subgraph selection and encoding spaces that account for the diverse contexts of drug interactions in DDI prediction. To address the challenge of large search spaces and high sampling costs, we design a relaxation mechanism that uses an approximation strategy to efficiently explore optimal subgraph configurations. This approach allows for robust exploration of the search space. Extensive experiments demonstrate the effectiveness and superiority of the proposed method, with the discovered subgraphs and encoding functions highlighting the model’s adaptability.

IJCAI Conference 2024 Conference Paper

Denoising Diffusion-Augmented Hybrid Video Anomaly Detection via Reconstructing Noised Frames

  • Kai Cheng
  • Yaning Pan
  • Yang Liu
  • Xinhua Zeng
  • Rui Feng

Video Anomaly Detection (VAD) is crucial for enhancing security and surveillance systems through automatic identification of irregular events, thereby enabling timely responses and augmenting overall situational awareness. Although existing methods have achieved decent detection performances on benchmarks, their predicted objects still remain ambiguous in terms of the semantic aspect. To overcome this limitation, we propose the Denoising diffusion-augmented Hybrid Video Anomaly Detection (DHVAD) framework. The proposed Denoising diffusion-based Reconstruction Unit (DRU) enhances the understanding of semantically accurate normality as a crucial component in DHVAD. Meanwhile, we propose a detection strategy that integrates the advantages of a prediction-based Frame Prediction Unit (FPU) with DRU by exploring the spatial-temporal consistency seamlessly. The competitive performance of DHVAD compared with state-of-the-art methods on three benchmark datasets proves the effectiveness of our framework. The extended experimental analysis demonstrates that our framework can gain a better understanding of the normality in terms of semantic accuracy for VAD and efficiently leverage the strengths of both components.

EAAI Journal 2024 Journal Article

Discriminative feature learning based on multi-view attention network with diffusion joint loss for speech emotion recognition

  • Yang Liu
  • Xin Chen
  • Yuan Song
  • Yarong Li
  • Shengbei Wang
  • Weitao Yuan
  • Yongwei Li
  • Zhen Zhao

In speech emotion recognition, existing models often struggle to accurately classify emotions with high similarity. In this paper, we propose a novel architecture that integrates a multi-view attention network (MVAN) and a diffusion joint loss to alleviate confusion by placing a stronger focus on emotions that are challenging to classify accurately. First, we use logarithmic Mel-spectrograms (log-Mels), deltas, and delta-deltas of log-Mels as three-dimensional features to minimize external interference. Then, we design the MVAN to extract effective multi-time-scale emotion features, where channel and spatial attention are used to selectively localize the regions in the input features related to the target emotion. A multi-time-view bidirectional long short-term memory network is used to extract the shallow edge features and deep semantic features, and multi-scale self-attention fuses these features through cross-scale attention fusion to obtain multi-time-scale emotion features. Finally, a diffusion joint loss strategy is introduced to distinguish emotional embeddings with high similarity via complex emotion triplets generated in a diffusing fashion. We evaluated our proposed method on the Interactive Emotional Dyadic Motion Capture (IEMOCAP), Institute of Automation, Chinese Academy of Sciences (CASIA), and Berlin Emotional Speech Database (EMODB) corpora. The results show significant improvements over existing methods, achieving 86.87% WA, 86.60% UA, and 86.82% WF1 on IEMOCAP; 70.74% WA, 70.74% UA, and 70.25% WF1 on CASIA; and 93.65% WA, 91.13% UA, and 92.26% WF1 on EMODB. These results confirm the superiority of our method. Our code and model are available at https://github.com/Littleznnz/MVAN-DiffSEG.
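The delta and delta-delta features mentioned above are standard regression-based differences of log-Mel frames over time. A small sketch assuming frames along the second axis and a window width of 2 (both conventional choices, not specified in the abstract):

```python
import numpy as np

def deltas(c, w=2):
    # Regression-based delta features: weighted differences of frames up
    # to w steps away, with edge padding (the usual HTK-style formula).
    T = c.shape[1]
    pad = np.pad(c, ((0, 0), (w, w)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, w + 1))
    d = np.zeros_like(c, dtype=float)
    for k in range(1, w + 1):
        d += k * (pad[:, w + k : w + k + T] - pad[:, w - k : w - k + T])
    return d / denom
```

Applying deltas to the log-Mels gives the delta channel, and applying it again gives the delta-delta channel; stacking the three yields the three-dimensional input features.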

TMLR Journal 2024 Journal Article

Equivariant Graph Learning for High-density Crowd Trajectories Modeling

  • Yang Liu
  • Zinan Zheng
  • Yu Rong
  • Jia Li

Understanding the high-density crowd dynamics of urbanization plays an important role in architectural design and urban planning, preventing the occurrence of crowd crush. Most traditional methods rely on formulas designed based on expert knowledge, which are inflexible and incomplete for modeling complex real-world crowd trajectories. To address the issue, recent studies propose to simulate crowds via data-driven models. However, these models fail to learn the inherent symmetry of high-density crowd trajectories, leading to insufficient generalization ability. For example, existing models cannot predict left-to-right trajectories by learning right-to-left trajectories, even though they share similar patterns. In this work, we propose a novel Equivariant Graph Learning framework for high-density crowd dynamics modeling, called CrowdEGL. It utilizes an additional objective that encourages the model, given a transformed input, to predict the correspondingly transformed output. We summarize three types of transformation groups, which are determined by the symmetry of environments. To explicitly incorporate these augmented data, a multi-channel GNN is employed to learn the latent graph embedding of pedestrian patterns. Finally, to model dense crowd interactions, future positions of original and transformed inputs are obtained by multiple independent graph decoders. Extensive experiments on 8 datasets from 5 different environments show that CrowdEGL outperforms existing models by a large margin.

AAAI Conference 2024 Conference Paper

Fair Participation via Sequential Policies

  • Reilly Raab
  • Ross Boczar
  • Maryam Fazel
  • Yang Liu

Leading approaches to algorithmic fairness and policy-induced distribution shift are often misaligned with long-term objectives in sequential settings. We aim to correct these shortcomings by ensuring that both the objective and fairness constraints account for policy-induced distribution shift. First, we motivate this problem using an example in which individuals subject to algorithmic predictions modulate their willingness to participate with the policy maker. Fairness in this example is measured by the variance of group participation rates. Next, we develop a method for solving the resulting constrained, non-linear optimization problem and prove that this method converges to a fair, locally optimal policy given first-order information. Finally, we experimentally validate our claims in a semi-synthetic setting.

NeurIPS Conference 2024 Conference Paper

Fairness without Harm: An Influence-Guided Active Sampling Approach

  • Jinlong Pang
  • Jialu Wang
  • Zhaowei Zhu
  • Yuanshun Yao
  • Chen Qian
  • Yang Liu

The pursuit of fairness in machine learning (ML), ensuring that the models do not exhibit biases toward protected demographic groups, typically results in a compromise scenario. This compromise can be explained by a Pareto frontier where given certain resources (e.g., data), reducing the fairness violations often comes at the cost of lowering the model accuracy. In this work, we aim to train models that mitigate group fairness disparity without causing harm to model accuracy. Intuitively, acquiring more data is a natural and promising approach to achieve this goal by reaching a better Pareto frontier of the fairness-accuracy tradeoff. The current data acquisition methods, such as fair active learning approaches, typically require annotating sensitive attributes. However, these sensitive attribute annotations should be protected due to privacy and safety concerns. In this paper, we propose a tractable active data sampling algorithm that does not rely on training group annotations, instead only requiring group annotations on a small validation set. Specifically, the algorithm first scores each new example by its influence on fairness and accuracy evaluated on the validation dataset, and then selects a certain number of examples for training. We theoretically analyze how acquiring more data can improve fairness without causing harm, and validate the possibility of our sampling approach in the context of risk disparity. We also provide the upper bound of generalization error and risk disparity as well as the corresponding connections. Extensive experiments on real-world data demonstrate the effectiveness of our proposed algorithm. Our code is available at github.com/UCSC-REAL/FairnessWithoutHarm.
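The selection step described above — score each candidate by its influence on validation accuracy and fairness, then pick the best examples — can be sketched once per-example influence estimates are available. The combined score and the trade-off weight lam are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def select_examples(acc_influence, fair_influence, k, lam=1.0):
    # acc_influence[i]: estimated gain in validation accuracy from
    # training on example i; fair_influence[i]: estimated increase in
    # the fairness disparity. Keep the k best-scoring examples.
    score = acc_influence - lam * fair_influence
    return np.argsort(-score)[:k]
```

In the paper's setting the influences are evaluated on a small validation set that carries the group annotations, so the training pool itself never needs sensitive-attribute labels.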

IJCAI Conference 2024 Conference Paper

Federated Adaptation for Foundation Model-based Recommendations

  • Chunxu Zhang
  • Guodong Long
  • Hongkuan Guo
  • Xiao Fang
  • Yang Song
  • Zhaojie Liu
  • Guorui Zhou
  • Zijian Zhang

With the recent success of large language models, particularly foundation models with generalization abilities, applying foundation models for recommendations becomes a new paradigm to improve existing recommendation systems. It becomes a new open challenge to enable the foundation model to capture user preference changes in a timely manner with reasonable communication and computation costs while preserving privacy. This paper proposes a novel federated adaptation mechanism to enhance the foundation model-based recommendation system in a privacy-preserving manner. Specifically, each client will learn a lightweight personalized adapter using its private data. The adapter then collaborates with pre-trained foundation models to provide recommendation services efficiently in a fine-grained manner. Importantly, users' private behavioral data remains secure as it is not shared with the server. This data localization-based privacy preservation is embodied via the federated learning framework. The model can ensure that shared knowledge is incorporated into all adapters while simultaneously preserving each user's personal preferences. Experimental results on four benchmark datasets demonstrate our method's superior performance. The code is available.

AAAI Conference 2024 Conference Paper

FedFixer: Mitigating Heterogeneous Label Noise in Federated Learning

  • Xinyuan Ji
  • Zhaowei Zhu
  • Wei Xi
  • Olga Gadyatskaya
  • Zilong Song
  • Yong Cai
  • Yang Liu

Federated Learning (FL) heavily depends on label quality for its performance. However, the label distribution among individual clients is always both noisy and heterogeneous. The high loss incurred by client-specific samples in heterogeneous label noise poses challenges for distinguishing between client-specific and noisy label samples, impacting the effectiveness of existing label noise learning approaches. To tackle this issue, we propose FedFixer, where the personalized model is introduced to cooperate with the global model to effectively select clean client-specific samples. In the dual models, updating the personalized model solely at a local level can lead to overfitting on noisy data due to limited samples, consequently affecting both the local and global models’ performance. To mitigate overfitting, we address this concern from two perspectives. Firstly, we employ a confidence regularizer to alleviate the impact of unconfident predictions caused by label noise. Secondly, a distance regularizer is implemented to constrain the disparity between the personalized and global models. We validate the effectiveness of FedFixer through extensive experiments on benchmark datasets. The results demonstrate that FedFixer can perform well in filtering noisy label samples on different clients, especially in highly heterogeneous label noise scenarios.

NeurIPS Conference 2024 Conference Paper

FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models

  • Rui Ye
  • Rui Ge
  • Xinyu Zhu
  • Jingyi Chai
  • Yaxin Du
  • Yang Liu
  • Yanfeng Wang
  • Siheng Chen

Federated learning has enabled multiple parties to collaboratively train large language models without directly sharing their data (FedLLM). Following this training paradigm, the community has devoted massive effort to diverse aspects, including framework, performance, and privacy. However, an unpleasant fact is that there are currently no realistic datasets and benchmarks for FedLLM, and previous works all rely on artificially constructed datasets, failing to capture properties in real-world scenarios. Addressing this, we propose FedLLM-Bench, which involves 8 training methods, 4 training datasets, and 6 evaluation metrics, to offer a comprehensive testbed for the FedLLM community. FedLLM-Bench encompasses three datasets (e.g., a user-annotated multilingual dataset) for federated instruction tuning and one dataset (e.g., a user-annotated preference dataset) for federated preference alignment, whose scale of client number ranges from 38 to 747. Our datasets incorporate several representative diversities: language, quality, quantity, instruction, length, embedding, and preference, capturing properties in real-world scenarios. Based on FedLLM-Bench, we conduct experiments on all datasets to benchmark existing FL methods and provide empirical insights (e.g., multilingual collaboration). We believe that our FedLLM-Bench can benefit the FedLLM community by reducing required efforts, providing a practical testbed, and promoting fair comparisons. Code and datasets are available at https://github.com/rui-ye/FedLLM-Bench.

AAAI Conference 2024 Conference Paper

FedMut: Generalized Federated Learning via Stochastic Mutation

  • Ming Hu
  • Yue Cao
  • Anran Li
  • Zhiming Li
  • Chengwei Liu
  • Tianlin Li
  • Mingsong Chen
  • Yang Liu

Although Federated Learning (FL) enables collaborative model training without sharing the raw data of clients, it encounters low-performance problems caused by various heterogeneous scenarios. Due to the limitation of dispatching the same global model to clients for local training, traditional Federated Average (FedAvg)-based FL models face the problem of easily getting stuck in a sharp solution, which results in training a low-performance global model. To address this problem, this paper presents a novel FL approach named FedMut, which mutates the global model according to the gradient change to generate several intermediate models for the next round of training. Each intermediate model is dispatched to a client for local training. Eventually, the global model converges into a flat area within the range of mutated models and generalizes better than the global model trained by FedAvg. Experimental results on well-known datasets demonstrate the effectiveness of our FedMut approach in various data heterogeneity scenarios.
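The mutation step described above — spreading intermediate models around the global model along the direction of the recent update — can be sketched as a sign-flipping perturbation. The coordinate-wise random signs and the scale alpha are illustrative assumptions; the paper's exact mutation rule may differ:

```python
import numpy as np

def mutate(global_w, delta_w, n_models, alpha=0.5, seed=0):
    # global_w: flattened global parameters; delta_w: last round's
    # aggregated update. Each intermediate model adds the update with
    # coordinate-wise random signs, so the mutants bracket the global
    # model along the recent gradient-change direction.
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n_models,) + global_w.shape)
    return global_w + alpha * signs * delta_w
```

Each client then trains one mutant locally; because the signs are symmetric, the mutants average back to the global model in expectation while probing the loss surface around it.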

AAAI Conference 2024 Conference Paper

FedTGP: Trainable Global Prototypes with Adaptive-Margin-Enhanced Contrastive Learning for Data and Model Heterogeneity in Federated Learning

  • Jianqing Zhang
  • Yang Liu
  • Yang Hua
  • Jian Cao

Recently, Heterogeneous Federated Learning (HtFL) has attracted attention due to its ability to support heterogeneous models and data. To reduce the high communication cost of transmitting model parameters, a major challenge in HtFL, prototype-based HtFL methods are proposed to solely share class representatives, a.k.a, prototypes, among heterogeneous clients while maintaining the privacy of clients’ models. However, these prototypes are naively aggregated into global prototypes on the server using weighted averaging, resulting in suboptimal global knowledge which negatively impacts the performance of clients. To overcome this challenge, we introduce a novel HtFL approach called FedTGP, which leverages our Adaptive-margin-enhanced Contrastive Learning (ACL) to learn Trainable Global Prototypes (TGP) on the server. By incorporating ACL, our approach enhances prototype separability while preserving semantic meaning. Extensive experiments with twelve heterogeneous models demonstrate that our FedTGP surpasses state-of-the-art methods by up to 9.08% in accuracy while maintaining the communication and privacy advantages of prototype-based HtFL. Our code is available at https://github.com/TsingZ0/FedTGP.

NeurIPS Conference 2024 Conference Paper

Full-Atom Peptide Design with Geometric Latent Diffusion

  • Xiangzhe Kong
  • Yinjun Jia
  • Wenbing Huang
  • Yang Liu

Peptide design plays a pivotal role in therapeutics, opening up new possibilities to leverage target binding sites that were previously undruggable. Most existing methods are either inefficient or only concerned with the target-agnostic design of 1D sequences. In this paper, we propose a generative model for full-atom Peptide design with Geometric LAtent Diffusion (PepGLAD) given the binding site. We first establish a benchmark consisting of both 1D sequences and 3D structures from the Protein Data Bank (PDB) and the literature for systematic evaluation. We then identify two major challenges of leveraging current diffusion-based models for peptide design: the full-atom geometry and the variable binding geometry. To tackle the first challenge, PepGLAD derives a variational autoencoder that first encodes full-atom residues of variable size into fixed-dimensional latent representations, and then decodes back to the residue space after conducting the diffusion process in the latent space. For the second issue, PepGLAD explores a receptor-specific affine transformation to convert the 3D coordinates into a shared standard space, enabling better generalization ability across different binding shapes. Experimental results show that our method not only improves diversity and binding affinity significantly in the task of sequence-structure co-design, but also excels at recovering reference structures for binding conformation generation.
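The mapping into a shared standard space can be pictured as centering a point cloud and aligning its principal axes with the coordinate axes. This is a simplified rigid version for intuition only; the paper's transformation is affine and receptor-specific:

```python
import numpy as np

def to_standard_frame(coords):
    # Center 3D coordinates at their centroid, then rotate so the
    # principal axes of the point cloud align with the coordinate axes
    # (via the right singular vectors of the centered coordinates).
    center = coords.mean(axis=0)
    x = coords - center
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt.T, center, vt
```

After this transformation the coordinates have zero mean and a diagonal covariance, so binding sites of different shapes and poses land in comparable frames.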

AAAI Conference 2024 Conference Paper

Jointly Modeling Spatio-Temporal Features of Tactile Signals for Action Classification

  • Jimmy Lin
  • Junkai Li
  • Jiasi Gao
  • Weizhi Ma
  • Yang Liu

Tactile signals collected by wearable electronics are essential in modeling and understanding human behavior. One of the main applications of tactile signals is action classification, especially in healthcare and robotics. However, existing tactile classification methods fail to capture the spatial and temporal features of tactile signals simultaneously, which results in sub-optimal performances. In this paper, we design the Spatio-Temporal Aware tactility Transformer (STAT) to utilize continuous tactile signals for action classification. We propose spatial and temporal embeddings along with a new temporal pretraining task in our model, which aims to enhance the transformer in modeling the spatio-temporal features of tactile signals. Specifically, the designed temporal pretraining task differentiates the time order of tubelet inputs to model temporal properties explicitly. Experimental results on a public action classification dataset demonstrate that our model outperforms state-of-the-art methods in all metrics.

AAAI Conference 2024 Conference Paper

Knowledge Graph Error Detection with Contrastive Confidence Adaption

  • Xiangyu Liu
  • Yang Liu
  • Wei Hu

Knowledge graphs (KGs) often contain various errors. Previous works on detecting errors in KGs mainly rely on triplet embedding from graph structure. We conduct an empirical study and find that these works struggle to discriminate noise from semantically-similar correct triplets. In this paper, we propose a KG error detection model CCA to integrate both textual and graph structural information from triplet reconstruction for better distinguishing semantics. We design interactive contrastive learning to capture the differences between textual and structural patterns. Furthermore, we construct realistic datasets with semantically-similar noise and adversarial noise. Experimental results demonstrate that CCA outperforms state-of-the-art baselines, especially on semantically-similar noise and adversarial noise.

IJCAI Conference 2024 Conference Paper

Label Leakage in Vertical Federated Learning: A Survey

  • Yige Liu
  • Yiwei Lou
  • Yang Liu
  • Yongzhi Cao
  • Hanpin Wang

Vertical federated learning (VFL) is a distributed machine learning paradigm that collaboratively trains models using passive parties with features and an active party with additional labels. While VFL offers privacy preservation through data localization, the threat of label leakage remains a significant challenge. Label leakage occurs due to label inference attacks, where passive parties attempt to infer labels for their privacy and commercial value. Extensive research has been conducted on this specific VFL attack, but a comprehensive summary is still lacking. To bridge this gap, our paper aims to survey the existing label inference attacks and defenses. We propose two new taxonomies for both label inference attacks and defenses, respectively. Beyond summarizing the current state of research, we highlight techniques that we believe hold potential and could significantly influence future studies. Moreover, experimental benchmark datasets and evaluation metrics are summarized to provide a guideline for subsequent work.

NeurIPS Conference 2024 Conference Paper

Large Language Model Unlearning via Embedding-Corrupted Prompts

  • Chris Y. Liu
  • Yaxuan Wang
  • Jeffrey Flanigan
  • Yang Liu

Large language models (LLMs) have advanced to encompass extensive knowledge across diverse domains. Yet controlling what a large language model should not know is important for ensuring alignment and thus safe use. However, accurately and efficiently unlearning knowledge from an LLM remains challenging due to the potential collateral damage caused by the fuzzy boundary between retention and forgetting, and the large computational requirements for optimization across state-of-the-art models with hundreds of billions of parameters. In this work, we present Embedding-COrrupted (ECO) Prompts, a lightweight unlearning framework for large language models to address both the challenges of knowledge entanglement and unlearning efficiency. Instead of relying on the LLM itself to unlearn, we enforce an unlearned state during inference by employing a prompt classifier to identify and safeguard prompts to forget. We learn corruptions added to prompt embeddings via zeroth order optimization toward the unlearning objective offline and corrupt prompts flagged by the classifier during inference. We find that these embedding-corrupted prompts not only lead to desirable outputs that satisfy the unlearning objective but also closely approximate the output from a model that has never been trained on the data intended for forgetting. Through extensive experiments on unlearning, we demonstrate the superiority of our method in achieving promising unlearning at nearly zero side effects in general domains and domains closely related to the unlearned ones. Additionally, we highlight the scalability of our method to 100 LLMs, ranging from 0.5B to 236B parameters, incurring no additional cost as the number of parameters increases. We have made our code publicly available at https://github.com/chrisliu298/llm-unlearn-eco.
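The zeroth-order optimization mentioned above treats the unlearning objective as a black box and estimates gradients from loss queries alone. A generic two-point Gaussian-smoothing sketch (the quadratic loss in the usage below is a stand-in, not the paper's objective):

```python
import numpy as np

def zo_grad(loss, x, mu=1e-3, n=64, seed=0):
    # Two-point zeroth-order gradient estimate: probe the black-box loss
    # along random Gaussian directions and average the symmetric finite
    # differences. Needs only loss evaluations, no backpropagation.
    rng = np.random.default_rng(seed)
    g = np.zeros_like(x)
    for _ in range(n):
        u = rng.standard_normal(x.shape)
        g += (loss(x + mu * u) - loss(x - mu * u)) / (2 * mu) * u
    return g / n
```

In the ECO setting, x would be the corruption added to the prompt embeddings and the loss would score how well the corrupted prompt satisfies the unlearning objective; here we only verify the estimator on a known quadratic.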

NeurIPS Conference 2024 Conference Paper

Learning Superconductivity from Ordered and Disordered Material Structures

  • Pin Chen
  • Luoxuan Peng
  • Rui Jiao
  • Qing Mo
  • Zhen Wang
  • Wenbing Huang
  • Yang Liu
  • Yutong Lu

Superconductivity is a fascinating phenomenon observed in certain materials under certain conditions. However, some critical aspects of it, such as the relationship between superconductivity and materials' chemical/structural features, still need to be understood. Recent successes of data-driven approaches in material science strongly inspire researchers to study this relationship with them, but a corresponding dataset is still lacking. Hence, we present a new dataset for data-driven approaches, namely SuperCon3D, containing both 3D crystal structures and experimental superconducting transition temperature (Tc) for the first time. Based on SuperCon3D, we propose two deep learning methods for designing high Tc superconductors. The first is SODNet, a novel equivariant graph attention model for screening known structures, which differs from existing models in incorporating both ordered and disordered geometric content. The second is a diffusion generative model DiffCSP-SC for creating new structures, which enables high Tc-targeted generation. Extensive experiments demonstrate that both our proposed dataset and models are advantageous for designing new high Tc superconducting candidates.

NeurIPS Conference 2024 Conference Paper

Learning the Optimal Policy for Balancing Short-Term and Long-Term Rewards

  • Qinwei Yang
  • Xueqing Liu
  • Yan Zeng
  • Ruocheng Guo
  • Yang Liu
  • Peng Wu

Learning the optimal policy to balance multiple short-term and long-term rewards has extensive applications across various domains. Yet, there is a noticeable scarcity of research addressing policy learning strategies in this context. In this paper, we aim to learn the optimal policy capable of effectively balancing multiple short-term and long-term rewards, especially in scenarios where the long-term outcomes are often missing due to data collection challenges over extended periods. Towards this goal, the conventional linear weighting method, which aggregates multiple rewards into a single surrogate reward through weighted summation, can only achieve sub-optimal policies when multiple rewards are related. Motivated by this, we propose a novel decomposition-based policy learning (DPPL) method that converts the whole problem into subproblems. The DPPL method is capable of obtaining optimal policies even when multiple rewards are interrelated. Nevertheless, the DPPL method requires a set of preference vectors specified in advance, posing challenges in practical applications where selecting suitable preferences is non-trivial. To mitigate this, we further theoretically transform the optimization problem in DPPL into an $\varepsilon$-constraint problem, where $\varepsilon$ represents the minimum acceptable levels of other rewards while maximizing one reward. This transformation provides intuition into the selection of preference vectors. Extensive experiments validate the effectiveness of the proposed method.
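As a hedged illustration of the $\varepsilon$-constraint transformation mentioned above (the abstract does not give the exact formulation, so the symbols below are assumptions), one reward is kept as the objective while the remaining rewards are lower-bounded:

```latex
\max_{\pi}\; V_1(\pi)
\quad \text{s.t.} \quad V_j(\pi) \ge \varepsilon_j, \qquad j = 2, \dots, m,
```

where $V_j(\pi)$ denotes the value of the $j$-th short- or long-term reward under policy $\pi$, and $\varepsilon_j$ is the minimum acceptable level of reward $j$ while $V_1$ is maximized. Choosing the $\varepsilon_j$ thresholds then plays the role that selecting preference vectors plays in DPPL.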

JBHI Journal 2024 Journal Article

MAD-Former: A Traceable Interpretability Model for Alzheimer's Disease Recognition Based on Multi-Patch Attention

  • Jiayu Ye
  • An Zeng
  • Dan Pan
  • Yiqun Zhang
  • Jingliang Zhao
  • Qiuping Chen
  • Yang Liu

The integration of structural magnetic resonance imaging (sMRI) and deep learning techniques is one of the important research directions for the automatic diagnosis of Alzheimer's disease (AD). Despite the satisfactory performance achieved by existing voxel-based models based on convolutional neural networks (CNNs), such models only handle AD-related brain atrophy at a single spatial scale and lack spatial localization of abnormal brain regions based on model interpretability. To address the above limitations, we propose a traceable interpretability model for AD recognition based on multi-patch attention (MAD-Former). MAD-Former consists of two parts: recognition and interpretability. In the recognition part, we design a 3D brain feature extraction network to extract local features, followed by constructing a dual-branch attention structure with different patch sizes to achieve global feature extraction, forming a multi-scale spatial feature extraction framework. Meanwhile, we propose an important attention similarity position loss function to assist in model decision-making. The interpretability part proposes a traceable method that can obtain a 3D ROI space through attention-based selection and receptive field tracing. This space encompasses key brain tissues that influence model decisions. Experimental results reveal the significant role of brain tissues such as the Fusiform Gyrus (FuG) in AD recognition. MAD-Former achieves outstanding performance in different tasks on ADNI and OASIS datasets, demonstrating reliable model interpretability.

IJCAI Conference 2024 Conference Paper

MAS-SAM: Segment Any Marine Animal with Aggregated Features

  • Tianyu Yan
  • Zifu Wan
  • Xinhao Deng
  • Pingping Zhang
  • Yang Liu
  • Huchuan Lu

Recently, Segment Anything Model (SAM) shows exceptional performance in generating high-quality object masks and achieving zero-shot image segmentation. However, as a versatile vision model, SAM is primarily trained with large-scale natural light images. In underwater scenes, it exhibits substantial performance degradation due to light scattering and absorption. Meanwhile, the simplicity of the SAM's decoder might lead to the loss of fine-grained object details. To address the above issues, we propose a novel feature learning framework named MAS-SAM for marine animal segmentation, which involves integrating effective adapters into the SAM's encoder and constructing a pyramidal decoder. More specifically, we first build a new SAM's encoder with effective adapters for underwater scenes. Then, we introduce a Hypermap Extraction Module (HEM) to generate multi-scale features for comprehensive guidance. Finally, we propose a Progressive Prediction Decoder (PPD) to aggregate the multi-scale features and predict the final segmentation results. When grafted with the Fusion Attention Module (FAM), our method enables the extraction of richer marine information, from global contextual cues to fine-grained local details. Extensive experiments on four public MAS datasets demonstrate that our MAS-SAM can obtain better results than other typical segmentation methods. The source code is available at https://github.com/Drchip61/MAS-SAM.

NeurIPS Conference 2024 Conference Paper

Membership Inference on Text-to-Image Diffusion Models via Conditional Likelihood Discrepancy

  • Shengfang Zhai
  • Huanran Chen
  • Yinpeng Dong
  • Jiajun Li
  • Qingni Shen
  • Yansong Gao
  • Hang Su
  • Yang Liu

Text-to-image diffusion models have achieved tremendous success in the field of controllable image generation, while also raising issues of privacy leakage and data copyright. Membership inference arises in these contexts as a potential auditing method for detecting unauthorized data usage. While some efforts have been made on diffusion models, they are not applicable to text-to-image diffusion models due to the high computation overhead and enhanced generalization capabilities. In this paper, we first identify a conditional overfitting phenomenon in text-to-image diffusion models, indicating that these models tend to overfit the conditional distribution of images given the corresponding text rather than the marginal distribution of images only. Based on this observation, we derive an analytical indicator, namely Conditional Likelihood Discrepancy (CLiD), to perform membership inference, which reduces the stochasticity in estimating the memorization of individual samples. Experimental results demonstrate that our method significantly outperforms previous methods across various data distributions and dataset scales. Additionally, our method shows superior resistance to overfitting mitigation strategies, such as early stopping and data augmentation.
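One reading of the CLiD indicator consistent with the description above (illustrative only; the paper's exact estimator may differ) contrasts the unconditional and text-conditional diffusion losses of a candidate pair $(x, c)$:

```latex
\mathrm{CLiD}(x, c) \;\propto\; \mathbb{E}_{t,\epsilon}\big[\ell_\theta(x_t, \varnothing)\big] \;-\; \mathbb{E}_{t,\epsilon}\big[\ell_\theta(x_t, c)\big],
```

where $\ell_\theta$ is the denoising loss at timestep $t$ with or without the conditioning text. Under the conditional overfitting phenomenon, a member sample's conditional loss is depressed relative to its unconditional loss, enlarging the discrepancy.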

NeurIPS Conference 2024 Conference Paper

Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs

  • Xin Ma
  • Yang Liu
  • Jingjing Liu
  • Xiaoxu Ma

Large language models (LLMs), although having revolutionized many fields, still suffer from the challenging extrapolation problem, where the inference ability of LLMs sharply declines beyond their max training lengths. In this work, we conduct a theoretical analysis to better understand why No Position Encoding (NoPE) fails outside its effective range, as well as examining the power of Position Encoding (PE) in this context. Our findings reveal that with meticulous weave position, PE can indeed be extended beyond effective range. Our theorems establish that LLMs equipped with weave PE can achieve improved extrapolation performance without additional cost. Furthermore, we introduce a novel weave PE method, Mesa-Extrapolation, which utilizes a chunk-based triangular attention matrix and applies Stair PE to manage the final chunk. This method not only retains competitive performance but also offers substantial benefits such as significantly reduced memory demand and faster inference speed. Extensive experiments validate the effectiveness of Mesa-Extrapolation, demonstrating its potential as a scalable solution to enhancing LLMs’ applicative reach.

NeurIPS Conference 2024 Conference Paper

Mitigating Reward Overoptimization via Lightweight Uncertainty Estimation

  • Xiaoying Zhang
  • Jean-François Ton
  • Wei Shen
  • Hongning Wang
  • Yang Liu

Reinforcement Learning from Human Feedback (RLHF) has been pivotal in aligning Large Language Models with human values but often suffers from overoptimization due to its reliance on a proxy reward model. To mitigate this limitation, we first propose a lightweight uncertainty quantification method that assesses the reliability of the proxy reward using only the last layer embeddings of the reward model. Enabled by this efficient uncertainty quantification method, we formulate AdvPO, a distributionally robust optimization procedure to tackle the reward overoptimization problem in RLHF. Through extensive experiments on the Anthropic HH and TL;DR summarization datasets, we verify the effectiveness of AdvPO in mitigating the overoptimization problem, resulting in enhanced RLHF performance as evaluated through human-assisted evaluation.

ICML Conference 2024 Conference Paper

MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization

  • Yu Zhang 0133
  • Qi Zhang 0020
  • Zixuan Gong
  • Yiwei Shi
  • Yepeng Liu
  • Duoqian Miao 0001
  • Yang Liu
  • Ke Liu

Contrastive Language-Image Pretraining (CLIP) has achieved remarkable success, leading to rapid advancements in multimodal studies. However, CLIP faces a notable challenge in terms of inefficient data utilization. It relies on a single contrastive supervision for each image-text pair during representation learning, disregarding a substantial amount of valuable information that could offer richer supervision. Additionally, the retention of non-informative tokens leads to increased computational demands and time costs, particularly in CLIP’s ViT image encoder. To address these issues, we propose Multi-Perspective Language-Image Pretraining (MLIP). In MLIP, we leverage the frequency transform’s sensitivity to both high- and low-frequency variations, which complements the spatial domain’s sensitivity, limited to low-frequency variations only. By incorporating frequency transforms and token-level alignment, we expand CLIP’s single supervision into multi-domain and multi-level supervision, enabling a more thorough exploration of informative image features. Additionally, we introduce a token merging method guided by comprehensive semantics from the frequency and spatial domains. This allows us to merge tokens into multi-granularity tokens with a controllable compression rate to accelerate CLIP. Extensive experiments validate the effectiveness of our design.

NeurIPS Conference 2024 Conference Paper

Multi-LLM Debate: Framework, Principals, and Interventions

  • Andrew Estornell
  • Yang Liu

The flexible and generalized nature of large language models has allowed for their application in a wide array of language-based domains. Much like their human contemporaries, these models are capable of engaging in discussions and debates as a means of improving answer quality. We first take a theoretical approach to analyzing debate and provide a framework through which debate can be mathematically examined. Building on this framework, we provide several theoretical results for multi-agent debate. In particular, we demonstrate that similar model capabilities, or similar model responses, can result in static debate dynamics where the debate procedure simply converges to the majority opinion. When this majority opinion is the result of a common misconception (ingrained in the models through shared training data) debate is likely to converge to answers associated with that common misconception. Using insights from our theoretical results we then propose three interventions which improve the efficacy of debate. For each intervention, we provide theoretical results demonstrating how debate is improved. We also demonstrate that these interventions result in better performance on four common benchmark tasks.

ICML Conference 2024 Conference Paper

Multi-View Clustering by Inter-cluster Connectivity Guided Reward

  • Hao Dai
  • Yang Liu
  • Peng Su
  • Hecheng Cai
  • Shudong Huang
  • Jiancheng Lv 0001

Multi-view clustering has been widely explored for its effectiveness in harmonizing heterogeneity along with consistency in different views of data. Despite the significant progress made by recent works, the performance of most existing methods is heavily reliant on strong priori information regarding the true cluster number $\textit{K}$, which is rarely feasible in real-world scenarios. In this paper, we propose a novel graph-based multi-view clustering algorithm to infer unknown $\textit{K}$ through a graph consistency reward mechanism. To be specific, we evaluate the cluster indicator matrix during each iteration with respect to diverse $\textit{K}$. We formulate the inference process of unknown $\textit{K}$ as a parsimonious reinforcement learning paradigm, where the reward is measured by inter-cluster connectivity. As a result, our approach is capable of independently producing the final clustering result, free from the input of a predefined cluster number. Experimental results on multiple benchmark datasets demonstrate the effectiveness of our proposed approach in comparison to existing state-of-the-art methods.

AAAI Conference 2024 Conference Paper

Novel Class Discovery in Chest X-rays via Paired Images and Text

  • Jiaying Zhou
  • Yang Liu
  • Qingchao Chen

Novel class discovery (NCD) aims to identify new classes undefined during the model training phase with the help of knowledge of known classes. Many methods have been proposed and have notably boosted the performance of NCD on natural images. However, no work has been done on discovering new classes based on medical images and disease categories, which is crucial for understanding and diagnosing specific diseases. Moreover, most of the existing methods only utilize information from the image modality and use labels as the only supervisory information. In this paper, we propose a multi-modal novel class discovery method based on paired images and text, inspired by the low classification accuracy of chest X-ray images and the relatively higher accuracy of the paired text. Specifically, we first pretrain the image encoder and text encoder with multi-modal contrastive learning on the entire dataset, and then we generate pseudo-labels separately on the image branch and text branch. We utilize intra-modal consistency to assess the quality of the pseudo-labels and adjust the weights of the pseudo-labels from both branches to generate the ultimate pseudo-labels for training. Experiments on eight subset splits of the MIMIC-CXR-JPG dataset show that our method improves the clustering performance of unlabeled classes by about 10% on average compared to state-of-the-art methods. Code is available at: https://github.com/zzzzzzzzjy/MMNCD-main.

NeurIPS Conference 2024 Conference Paper

P$^2$C$^2$Net: PDE-Preserved Coarse Correction Network for efficient prediction of spatiotemporal dynamics

  • Qi Wang
  • Pu Ren
  • Hao Zhou
  • Xin-Yang Liu
  • Zhiwen Deng
  • Yi Zhang
  • Ruizhi Chengze
  • Hongsheng Liu

When solving partial differential equations (PDEs), classical numerical methods often require fine mesh grids and small time stepping to meet stability, consistency, and convergence conditions, leading to high computational cost. Recently, machine learning has been increasingly utilized to solve PDE problems, but such methods often encounter challenges related to interpretability, generalizability, and strong dependency on rich labeled data. Hence, we introduce a new PDE-Preserved Coarse Correction Network (P$^2$C$^2$Net) to efficiently solve spatiotemporal PDE problems on coarse mesh grids in small data regimes. The model consists of two synergistic modules: (1) a trainable PDE block that learns to update the coarse solution (i.e., the system state), based on a high-order numerical scheme with boundary condition encoding, and (2) a neural network block that consistently corrects the solution on the fly. In particular, we propose a learnable symmetric Conv filter, with weights shared over the entire model, to accurately estimate the spatial derivatives of the PDE based on the neural-corrected system state. The resulting physics-encoded model is capable of handling limited training data (e.g., 3--5 trajectories) and accelerates the prediction of PDE solutions on coarse spatiotemporal grids while maintaining high accuracy. P$^2$C$^2$Net achieves consistent state-of-the-art performance with over 50% gain (e.g., in terms of relative prediction error) across four datasets covering complex reaction-diffusion processes and turbulent flows.
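To make the derivative-estimation idea concrete, the sketch below uses a fixed symmetric central-difference stencil applied by convolution, standing in for the learnable, weight-shared symmetric Conv filter the abstract describes; the function name and test function are illustrative assumptions.

```python
import numpy as np

def d_dx(u, dx):
    """Estimate du/dx with a symmetric 3-point stencil applied as a
    convolution (np.convolve flips the kernel, hence the [::-1])."""
    stencil = np.array([-0.5, 0.0, 0.5]) / dx
    return np.convolve(u, stencil[::-1], mode="same")

# Check against an analytic derivative: d/dx sin(x) = cos(x).
x = np.linspace(0, 2 * np.pi, 400, endpoint=False)
u = np.sin(x)
du = d_dx(u, x[1] - x[0])
# Interior points should match cos(x) to second order in dx.
err = float(np.max(np.abs(du[1:-1] - np.cos(x[1:-1]))))
print(err)
```

A learnable version would simply treat the stencil entries as trainable parameters (with a symmetry constraint), shared across the whole grid, which is the weight-sharing property the abstract emphasizes.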

TIST Journal 2024 Journal Article

PerFedRec++: Enhancing Personalized Federated Recommendation with Self-Supervised Pre-Training

  • Sichun Luo
  • Yuanzhang Xiao
  • Xinyi Zhang
  • Yang Liu
  • Wenbo Ding
  • Linqi Song

Federated recommendation systems employ federated learning techniques to safeguard user privacy by transmitting model parameters instead of raw user data between user devices and the central server. Nevertheless, current federated recommender systems face three significant challenges: (1) data heterogeneity: the heterogeneity of users’ attributes and local data necessitates the acquisition of personalized models to improve the performance of federated recommendation; (2) model performance degradation: the privacy-preserving protocol design in federated recommendation, such as pseudo item labeling and differential privacy, can deteriorate model performance; (3) communication bottleneck: the standard federated recommendation algorithm can have a high communication overhead. Previous studies have attempted to address these issues, but none have been able to solve them simultaneously. In this article, we propose a novel framework, named PerFedRec++, to enhance personalized federated recommendation with self-supervised pre-training. Specifically, we utilize the privacy-preserving mechanism of federated recommender systems to generate two augmented graph views, which are used as contrastive tasks in self-supervised graph learning to pre-train the model. Pre-training enhances the performance of federated models by improving the uniformity of representation learning. Also, by providing a better initial state for federated training, pre-training makes the overall training converge faster, thus alleviating the heavy communication burden. We then construct a collaborative graph to learn the client representation through a federated graph neural network. Based on these learned representations, we cluster users into different user groups and learn personalized models for each cluster. Each user learns a personalized model by combining the global federated model, the cluster-level federated model, and its own fine-tuned local model. Experiments on three real-world datasets show that our proposed method achieves superior performance over existing methods.

AAAI Conference 2024 Conference Paper

Performative Federated Learning: A Solution to Model-Dependent and Heterogeneous Distribution Shifts

  • Kun Jin
  • Tongxin Yin
  • Zhongzhu Chen
  • Zeyu Sun
  • Xueru Zhang
  • Yang Liu
  • Mingyan Liu

We consider a federated learning (FL) system consisting of multiple clients and a server, where the clients aim to collaboratively learn a common decision model from their distributed data. Unlike the conventional FL framework that assumes the client's data is static, we consider scenarios where the clients' data distributions may be reshaped by the deployed decision model. In this work, we leverage the idea of distribution shift mappings in performative prediction to formalize this model-dependent data distribution shift and propose a performative FL framework. We first introduce necessary and sufficient conditions for the existence of a unique performative stable solution and characterize its distance to the performative optimal solution. Then we propose the performative FedAvg algorithm and show that it converges to the performative stable solution at a rate of O(1/T) under both full and partial participation schemes. In particular, we use novel proof techniques and show how the clients' heterogeneity influences the convergence. Numerical results validate our analysis and provide valuable insights into real-world applications.
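The fixed-point behavior underlying performative stability can be illustrated with a toy scalar example: the deployed parameter shifts the data distribution, and repeated retraining converges to a stable point. The mean-shift map and constants below are illustrative assumptions, not the paper's federated setting or its FedAvg analysis.

```python
# Toy performative prediction loop: deploying theta shifts the data
# distribution to D(theta) = N(base_mean + eps * theta, 1), and retraining
# fits the mean of the shifted distribution.
def retrain(theta_deployed, base_mean=1.0, eps=0.5):
    """Best response to the distribution induced by the deployed model."""
    return base_mean + eps * theta_deployed

theta = 0.0
for _ in range(50):
    theta = retrain(theta)

# The performatively stable point solves theta = base_mean + eps * theta,
# i.e. theta* = base_mean / (1 - eps) = 2.0 for these constants.
print(round(theta, 6))
```

The contraction condition (here, eps < 1 on the shift map) is the toy analogue of the existence/uniqueness conditions the abstract refers to.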

AAAI Conference 2024 Conference Paper

Personalization as a Shortcut for Few-Shot Backdoor Attack against Text-to-Image Diffusion Models

  • Yihao Huang
  • Felix Juefei-Xu
  • Qing Guo
  • Jie Zhang
  • Yutong Wu
  • Ming Hu
  • Tianlin Li
  • Geguang Pu

Although recent personalization methods have democratized high-resolution image synthesis by enabling swift concept acquisition with minimal examples and lightweight computation, they also present an exploitable avenue for highly accessible backdoor attacks. This paper investigates a critical and unexplored aspect of text-to-image (T2I) diffusion models - their potential vulnerability to backdoor attacks via personalization. By studying the prompt processing of popular personalization methods (epitomized by Textual Inversion and DreamBooth), we have devised dedicated personalization-based backdoor attacks according to the different ways of dealing with unseen tokens and divide them into two families: nouveau-token and legacy-token backdoor attacks. In comparison to conventional backdoor attacks involving the fine-tuning of the entire text-to-image diffusion model, our proposed personalization-based backdoor attack method can facilitate more tailored, efficient, and few-shot attacks. Through comprehensive empirical study, we endorse the utilization of the nouveau-token backdoor attack due to its impressive effectiveness, stealthiness, and integrity, markedly outperforming the legacy-token backdoor attack.

AIJ Journal 2024 Journal Article

Polarized message-passing in graph neural networks

  • Tiantian He
  • Yang Liu
  • Yew-Soon Ong
  • Xiaohu Wu
  • Xin Luo

In this paper, we present Polarized message-passing (PMP), a novel paradigm to revolutionize the design of message-passing graph neural networks (GNNs). In contrast to existing methods, PMP captures the power of node-node similarity and dissimilarity to acquire dual sources of messages from neighbors. The messages are then coalesced to enable GNNs to learn expressive representations from sparse but strongly correlated neighbors. Three novel GNNs based on the PMP paradigm, namely PMP graph convolutional network (PMP-GCN), PMP graph attention network (PMP-GAT), and PMP graph PageRank network (PMP-GPN) are proposed to perform various downstream tasks. Theoretical analysis is also conducted to verify the high expressiveness of the proposed PMP-based GNNs. In addition, an empirical study of five learning tasks based on 12 real-world datasets is conducted to validate the performances of PMP-GCN, PMP-GAT, and PMP-GPN. The proposed PMP-GCN, PMP-GAT, and PMP-GPN outperform numerous strong message-passing GNNs across all five learning tasks, demonstrating the effectiveness of the proposed PMP paradigm.

ICRA Conference 2024 Conference Paper

Policy Optimization by Looking Ahead for Model-based Offline Reinforcement Learning

  • Yang Liu
  • Marius Hofert

Offline reinforcement learning (RL) aims to optimize a policy, based on pre-collected data, to maximize the cumulative rewards after performing a sequence of actions. Existing approaches learn a value function from historical data and then guide the updating of the policy parameters by maximizing the value function at a single time step. Driven by the gap between maximizing the cumulative rewards of RL and the greedy strategy of existing methods, we propose an approach of policy optimization by looking ahead (POLA) to mitigate the gap. Concretely, we optimize the policy on both current and future states, where the future states are predicted by a transition model. A trajectory contains numerous actions before the task is done, and performing the best action at every step does not guarantee an optimal trajectory in the end; sub-optimal or locally negative actions must occasionally be allowed. Existing methods, however, focus on generating the optimal action at each step according to the principle of maximizing the Q-value. This motivates our looking-ahead approach. Besides, hidden confounding factors may affect the decision-making process. To that end, we incorporate the correlations among dimensions of the state into the policy, providing more information about the environment for the policy to make decisions. Empirical results on the MuJoCo dataset show the effectiveness of the proposed approach.

AAAI Conference 2024 Conference Paper

Providing Fair Recourse over Plausible Groups

  • Jayanth Yetukuri
  • Ian Hardy
  • Yevgeniy Vorobeychik
  • Berk Ustun
  • Yang Liu

Machine learning models now automate decisions in applications where we may wish to provide recourse to adversely affected individuals. In practice, existing methods to provide recourse return actions that fail to account for latent characteristics that are not captured in the model (e.g., age, sex, marital status). In this paper, we study how the cost and feasibility of recourse can change across these latent groups. We introduce a notion of group-level plausibility to identify groups of individuals with a shared set of latent characteristics. We develop a general-purpose clustering procedure to identify groups from samples. Further, we propose a constrained optimization approach to learn models that equalize the cost of recourse over latent groups. We evaluate our approach through an empirical study on simulated and real-world datasets, showing that it can produce models that have better performance in terms of overall costs and feasibility at a group level.

ICLR Conference 2024 Conference Paper

Reinforcement Symbolic Regression Machine

  • Yilong Xu
  • Yang Liu
  • Hao Sun 0002

In nature, the behavior of many complex systems can be described by parsimonious math equations. Symbolic Regression (SR) is defined as the task of automatically distilling equations from limited data. Considerable effort has been devoted to tackling this issue, with demonstrated success in SR. However, there still exist bottlenecks that current methods struggle to break when the space of expressions to explore tends toward infinity, and especially when the underlying math formula is intricate. To this end, we propose a novel Reinforcement Symbolic Regression Machine (RSRM) that masters the capability of uncovering complex math equations from only scarce data. The RSRM model is composed of three key modules: (1) a Monte Carlo tree search (MCTS) agent, designed for exploration, that explores optimal math expression trees consisting of pre-defined math operators and variables, (2) a Double Q-learning block, designed for exploitation, that helps reduce the feasible search space of MCTS by properly understanding the distribution of reward, and (3) a modulated sub-tree discovery block that heuristically learns and defines new math operators to improve the representation ability of math expression trees. The binding of these modules yields the SOTA performance of RSRM in SR, as demonstrated by multiple benchmark datasets. RSRM shows clear superiority over several representative baseline models.

AAAI Conference 2024 Conference Paper

Robust Evaluation Measures for Evaluating Social Biases in Masked Language Models

  • Yang Liu

Many evaluation measures are used to evaluate social biases in masked language models (MLMs). However, we find that these previously proposed evaluation measures lack robustness in scenarios with limited datasets. This is because these measures are obtained by comparing the pseudo-log-likelihood (PLL) scores of the stereotypical and anti-stereotypical samples using an indicator function. The disadvantage is that this mines the PLL score sets only shallowly, without capturing their distributional information. In this paper, we represent a PLL score set as a Gaussian distribution and use Kullback-Leibler (KL) divergence and Jensen–Shannon (JS) divergence to construct evaluation measures for the distributions of stereotypical and anti-stereotypical PLL scores. Experimental results on the publicly available datasets StereoSet (SS) and CrowS-Pairs (CP) show that our proposed measures are significantly more robust and interpretable than those proposed previously.
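The measure family described above can be sketched as follows: fit a univariate Gaussian to each PLL score set, then compare the two fitted distributions with KL (closed form) and JS (numeric, since no closed form exists for Gaussians). The score values below are synthetic and the exact measure definitions in the paper are not reproduced.

```python
import numpy as np

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(N(mu_p, var_p) || N(mu_q, var_q)) for 1-D Gaussians."""
    return 0.5 * (np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def js_divergence(mu_p, var_p, mu_q, var_q):
    """Jensen-Shannon divergence between two 1-D Gaussians via grid integration."""
    lo = min(mu_p, mu_q) - 6 * max(var_p, var_q) ** 0.5
    hi = max(mu_p, mu_q) + 6 * max(var_p, var_q) ** 0.5
    x = np.linspace(lo, hi, 4001)
    dx = x[1] - x[0]
    pdf = lambda m, v: np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)
    p, q = pdf(mu_p, var_p), pdf(mu_q, var_q)
    mix = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(np.maximum(a, 1e-300) / np.maximum(b, 1e-300))) * dx
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)

# Synthetic stereotypical / anti-stereotypical PLL score sets.
rng = np.random.default_rng(0)
ster = rng.normal(-4.0, 1.0, 200)
anti = rng.normal(-4.5, 1.2, 200)
mu_s, var_s = ster.mean(), ster.var()
mu_a, var_a = anti.mean(), anti.var()
print(gaussian_kl(mu_s, var_s, mu_a, var_a), js_divergence(mu_s, var_s, mu_a, var_a))
```

Unlike an indicator-function comparison of paired scores, these divergences vary smoothly with the fitted means and variances, which is the distributional information the measures are designed to capture.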

NeurIPS Conference 2024 Conference Paper

Scalable Optimization in the Modular Norm

  • Tim Large
  • Yang Liu
  • Minyoung Huh
  • Hyojin Bahng
  • Phillip Isola
  • Jeremy Bernstein

To improve performance in contemporary deep learning, one is interested in scaling up the neural network in terms of both the number and the size of the layers. When ramping up the width of a single layer, graceful scaling of training has been linked to the need to normalize the weights and their updates in the "natural norm" particular to that layer. In this paper, we significantly generalize this idea by defining the modular norm, which is the natural norm on the full weight space of any neural network architecture. The modular norm is defined recursively in tandem with the network architecture itself. We show that the modular norm has several promising applications. On the practical side, the modular norm can be used to normalize the updates of any base optimizer so that the learning rate becomes transferable across width and depth. This means that the user does not need to compute optimizer-specific scale factors in order to scale training. On the theoretical side, we show that for any neural network built from "well-behaved" atomic modules, the gradient of the network is Lipschitz-continuous in the modular norm, with the Lipschitz constant admitting a simple recursive formula. This characterization opens the door to porting standard ideas in optimization theory over to deep learning. We have created a Python package called Modula that automatically normalizes weight updates in the modular norm of the architecture. Both the Modula package and code for our experiments are provided in the supplementary material.
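A minimal sketch of the normalization idea, assuming a simple per-layer RMS norm as a stand-in for the paper's recursively defined modular norm: each layer's raw update is rescaled so the learning rate has the same meaning regardless of layer size.

```python
import numpy as np

def normalized_update(weights, grads, lr=0.1, eps=1e-12):
    """Rescale each layer's update to unit RMS norm before applying lr,
    so the effective step size is identical across layers of any width."""
    new = []
    for w, g in zip(weights, grads):
        rms = np.sqrt(np.mean(g ** 2)) + eps
        new.append(w - lr * g / rms)
    return new

# Two layers of very different widths receive steps of equal RMS size.
rng = np.random.default_rng(0)
w = [np.ones((4, 4)), np.ones((256, 256))]
g = [rng.standard_normal(layer.shape) for layer in w]
w2 = normalized_update(w, g)
steps = [float(np.sqrt(np.mean((a - b) ** 2))) for a, b in zip(w, w2)]
print(steps)
```

The actual modular norm is defined recursively over the architecture rather than per-layer, but the practical effect sketched here is the same: the learning rate becomes transferable across width without optimizer-specific scale factors.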

AAAI Conference 2024 Conference Paper

Semantic-Guided Novel Category Discovery

  • Weishuai Wang
  • Ting Lei
  • Qingchao Chen
  • Yang Liu

The Novel Category Discovery problem aims to cluster an unlabeled set with the help of a labeled set consisting of disjoint but related classes. However, existing models treat class names as discrete one-hot labels and ignore the semantic understanding of these classes. In this paper, we propose a new setting named Semantic-guided Novel Category Discovery (SNCD), which requires the model to not only cluster the unlabeled images but also semantically recognize these images based on a set of their class names. The first challenge we confront pertains to effectively leveraging the class names of unlabeled images, given the inherent gap between the visual and linguistic domains. To address this issue, we incorporate a semantic-aware recognition mechanism. This is achieved by constructing dynamic class-wise visual prototypes as well as a semantic similarity matrix that enables the projection of visual features into the semantic space. The second challenge originates from the granularity disparity between the classification and clustering tasks. To deal with this, we develop a semantic-aware clustering process to facilitate the exchange of knowledge between the two tasks. Through extensive experiments, we demonstrate the mutual benefits of the recognition and clustering tasks, which can be jointly optimized. Experimental results on multiple datasets confirm the effectiveness of our proposed method. Our code is available at https://github.com/wang-weishuai/Semantic-guided-NCD.

IS Journal 2024 Journal Article

Ship Grid: A Novel Anchor-Free Ship Detection Algorithm

  • Yantong Chen
  • Yanyan Zhang
  • Jialiang Wang
  • Yang Liu

Video-based ship detection is crucial for the real-time monitoring of maritime activities, aiding decision making in maritime traffic management, safety monitoring, and rescue operations. Current challenges include multiscale variations and occlusion issues affecting detection accuracy. Existing ship detection methods often address the multiscale problem by redesigning the network architecture, providing limited improvements. We present Ship Grid, an innovative anchor-free ship detection algorithm. Ship Grid tackles the challenges of ship feature capture in occluded scenarios by directly generating bounding boxes at the predicted centers during the label assignment phase. Moreover, it enables simultaneous ship feature extraction at multiple scales, effectively addressing the issues of insufficient feature extraction for small objects and imprecise localization for large objects caused by stark scale variations. In the bounding box regression phase, we introduce a scale-invariant localization loss that guides the regression process of prediction boxes at different scales. This approach allows the network to comprehensively learn ship features across multiple scales and further enhances performance in the presence of large ship scale variations. We rigorously evaluated Ship Grid on the SeaShips dataset, achieving 0.988 and 0.835 on the evaluation metrics of mean average precision (mAP) at an intersection over union (IoU) threshold of 0.5 and mAP at IoU thresholds ranging from 0.5 to 0.95, respectively. This outperforms state-of-the-art methods, demonstrating its advantage in ship detection.
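For readers unfamiliar with the metrics above: mAP@0.5 scores a detection as correct when its IoU with the ground-truth box exceeds 0.5, while mAP@[0.5:0.95] averages over thresholds 0.5, 0.55, ..., 0.95. A minimal IoU implementation for axis-aligned boxes (illustrative background only, not the paper's code):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # overlap area, 0 if disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```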

NeurIPS Conference 2024 Conference Paper

TopoFR: A Closer Look at Topology Alignment on Face Recognition

  • Jun Dan
  • Yang Liu
  • Jiankang Deng
  • Haoyu Xie
  • Siyuan Li
  • Baigui Sun
  • Shan Luo

The field of face recognition (FR) has undergone significant advancements with the rise of deep learning. Recently, the success of unsupervised learning and graph neural networks has demonstrated the effectiveness of data structure information. Considering that the FR task can leverage large-scale training data, which intrinsically contains significant structure information, we aim to investigate how to encode such critical structure information into the latent space. As revealed from our observations, directly aligning the structure information between the input and latent spaces inevitably suffers from an overfitting problem, leading to a structure collapse phenomenon in the latent space. To address this problem, we propose TopoFR, a novel FR model that leverages a topological structure alignment strategy called PTSA and a hard sample mining strategy named SDE. Concretely, PTSA uses persistent homology to align the topological structures of the input and latent spaces, effectively preserving the structure information and improving the generalization performance of the FR model. To mitigate the impact of hard samples on the latent space structure, SDE accurately identifies hard samples by automatically computing a structure damage score (SDS) for each sample, and directs the model to prioritize optimizing these samples. Experimental results on popular face benchmarks demonstrate the superiority of our TopoFR over the state-of-the-art methods. Code and models are available at: https://github.com/modelscope/facechain/tree/main/face_module/TopoFR.

NeurIPS Conference 2024 Conference Paper

Toward Robust Incomplete Multimodal Sentiment Analysis via Hierarchical Representation Learning

  • Mingcheng Li
  • Dingkang Yang
  • Yang Liu
  • Shunli Wang
  • Jiawei Chen
  • Shuaibing Wang
  • Jinjie Wei
  • Yue Jiang

Multimodal Sentiment Analysis (MSA) is an important research area that aims to understand and recognize human sentiment through multiple modalities. The complementary information provided by multimodal fusion promotes better sentiment analysis compared to utilizing only a single modality. Nevertheless, in real-world applications, many unavoidable factors may lead to situations of uncertain modality missing, thus hindering the effectiveness of multimodal modeling and degrading the model’s performance. To this end, we propose a Hierarchical Representation Learning Framework (HRLF) for the MSA task under uncertain missing modalities. Specifically, we propose a fine-grained representation factorization module that sufficiently extracts valuable sentiment information by factorizing modality into sentiment-relevant and modality-specific representations through crossmodal translation and sentiment semantic reconstruction. Moreover, a hierarchical mutual information maximization mechanism is introduced to incrementally maximize the mutual information between multi-scale representations to align and reconstruct the high-level semantics in the representations. Ultimately, we propose a hierarchical adversarial learning mechanism that further aligns and adapts the latent distribution of sentiment-relevant representations to produce robust joint multimodal representations. Comprehensive experiments on three datasets demonstrate that HRLF significantly improves MSA performance under uncertain modality missing cases.

NeurIPS Conference 2024 Conference Paper

Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking

  • Yuwei Zhang
  • Tong Xia
  • Jing Han
  • Yu Y. Wu
  • Georgios Rizos
  • Yang Liu
  • Mohammed Mosuily
  • Jagmohan Chauhan

Respiratory audio, such as coughing and breathing sounds, has predictive power for a wide range of healthcare applications, yet is currently under-explored. The main problem for those applications arises from the difficulty in collecting large labeled task-specific data for model development. Generalizable respiratory acoustic foundation models pretrained with unlabeled data would offer appealing advantages and possibly unlock this impasse. However, given the safety-critical nature of healthcare applications, it is pivotal to also ensure openness and replicability for any proposed foundation model solution. To this end, we introduce OPERA, an OPEn Respiratory Acoustic foundation model pretraining and benchmarking system, as the first approach answering this need. We curate large-scale respiratory audio datasets ($\sim$136K samples, over 400 hours), pretrain three pioneering foundation models, and build a benchmark consisting of 19 downstream respiratory health tasks for evaluation. Our pretrained models demonstrate superior performance (against existing acoustic models pretrained with general audio on 16 out of 19 tasks) and generalizability (to unseen datasets and new respiratory audio modalities). This highlights the great promise of respiratory acoustic foundation models and encourages more studies using OPERA as an open resource to accelerate research on respiratory audio for health. The system is accessible from https://github.com/evelyn0414/OPERA.

NeurIPS Conference 2024 Conference Paper

Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation

  • Muzhi Zhu
  • Yang Liu
  • Zekai Luo
  • Chenchen Jing
  • Hao Chen
  • Guangkai Xu
  • Xinlong Wang
  • Chunhua Shen

The Diffusion Model has not only garnered noteworthy achievements in the realm of image generation but has also demonstrated its potential as an effective pretraining method utilizing unlabeled data. Drawing from the extensive potential unveiled by the Diffusion Model in both semantic correspondence and open vocabulary segmentation, our work initiates an investigation into employing the Latent Diffusion Model for Few-shot Semantic Segmentation. Recently, inspired by the in-context learning ability of large language models, Few-shot Semantic Segmentation has evolved into In-context Segmentation tasks, morphing into a crucial element in assessing generalist segmentation models. In this context, we concentrate on Few-shot Semantic Segmentation, establishing a solid foundation for the future development of a Diffusion-based generalist model for segmentation. Our initial focus lies in understanding how to facilitate interaction between the query image and the support image, resulting in the proposal of a KV fusion method within the self-attention framework. Subsequently, we delve deeper into optimizing the infusion of information from the support mask and simultaneously re-evaluating how to provide reasonable supervision from the query mask. Based on our analysis, we establish a simple and effective framework named DiffewS, maximally retaining the original Latent Diffusion Model's generative framework and effectively utilizing the pre-training prior. Experimental results demonstrate that our method significantly outperforms the previous SOTA models in multiple settings.

NeurIPS Conference 2024 Conference Paper

User-Creator Feature Polarization in Recommender Systems with Dual Influence

  • Tao Lin
  • Kun Jin
  • Andrew Estornell
  • Xiaoying Zhang
  • Yiling Chen
  • Yang Liu

Recommender systems serve the dual purpose of presenting relevant content to users and helping content creators reach their target audience. The dual nature of these systems naturally influences both users and creators: users' preferences are affected by the items they are recommended, while creators may be incentivized to alter their content to attract more users. We define a model, called user-creator feature dynamics, to capture the dual influence of recommender systems. We prove that a recommender system with dual influence is guaranteed to polarize, causing diversity loss in the system. We then investigate, both theoretically and empirically, approaches for mitigating polarization and promoting diversity in recommender systems. Unexpectedly, we find that common diversity-promoting approaches do not work in the presence of dual influence, while relevancy-optimizing methods like top-$k$ truncation can prevent polarization and improve diversity of the system.
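Top-$k$ truncation, the relevancy-optimizing method the abstract finds effective against polarization, simply restricts each user's recommendation pool to their $k$ most relevant creators. A minimal sketch of this operation on a user-by-creator relevance matrix (illustrative only, not the paper's implementation; the function name is hypothetical):

```python
import numpy as np

def top_k_truncate(scores, k):
    """Zero out all but the k highest relevance scores in each row (user).
    The truncated matrix can then be normalized into recommendation
    probabilities, so each user is only ever matched to top-k creators."""
    out = np.zeros_like(scores, dtype=float)
    idx = np.argsort(scores, axis=1)[:, -k:]  # column indices of top-k per row
    np.put_along_axis(out, idx, np.take_along_axis(scores, idx, axis=1), axis=1)
    return out
```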

IJCAI Conference 2024 Conference Paper

Vision-based Discovery of Nonlinear Dynamics for 3D Moving Target

  • Zitong Zhang
  • Yang Liu
  • Hao Sun

Data-driven discovery of governing equations has kindled significant interests in many science and engineering areas. Existing studies primarily focus on uncovering equations that govern nonlinear dynamics based on direct measurement of the system states (e.g., trajectories). Limited efforts have been placed on distilling governing laws of dynamics directly from videos for moving targets in a 3D space. To this end, we propose a vision-based approach to automatically uncover governing equations of nonlinear dynamics for 3D moving targets via raw videos recorded by a set of cameras. The approach is composed of three key blocks: (1) a target tracking module that extracts plane pixel motions of the moving target in each video, (2) a Rodrigues' rotation formula-based coordinate transformation learning module that reconstructs the 3D coordinates with respect to a predefined reference point, and (3) a spline-enhanced library-based sparse regressor that uncovers the underlying governing law of dynamics. This framework is capable of effectively handling the challenges associated with measurement data, e.g., noise in the video, imprecise tracking of the target that causes data missing, etc. The efficacy of our method has been demonstrated through multiple sets of synthetic videos considering different nonlinear dynamics.

IJCAI Conference 2024 Conference Paper

VSGT: Variational Spatial and Gaussian Temporal Graph Models for EEG-based Emotion Recognition

  • Chenyu Liu
  • Xinliang Zhou
  • Jiaping Xiao
  • Zhengri Zhu
  • Liming Zhai
  • Ziyu Jia
  • Yang Liu

Electroencephalogram (EEG), which directly reflects the emotional activity of the brain, has been increasingly utilized for emotion recognition. Most works exploit the spatial and temporal dependencies in EEG to learn emotional feature representations, but they still have two limitations to reach their full potential. First, prior knowledge is rarely used to capture the spatial dependency of brain regions. Second, the cross temporal dependency between consecutive time slices for different brain regions is ignored. To address these limitations, in this paper, we propose Variational Spatial and Gaussian Temporal (VSGT) graph models to investigate the spatial and temporal dependencies for EEG-based emotion recognition. The VSGT has two key components: Variational Spatial Encoder (VSE) and Gaussian Temporal Encoder (GTE). The VSE leverages the upper bound theorem to identify the dynamic spatial dependency based on prior knowledge by the variational Bayesian method. Besides, the GTE exploits the conditional Gaussian graph transform that computes comprehensive temporal dependency between consecutive time slices. Finally, the VSGT utilizes a recurrent structure to calculate the spatial and temporal dependencies for all time slices. Extensive experiments show the superiority of VSGT over state-of-the-art methods on multiple EEG datasets.

NeurIPS Conference 2024 Conference Paper

WenMind: A Comprehensive Benchmark for Evaluating Large Language Models in Chinese Classical Literature and Language Arts

  • Jiahuan Cao
  • Yang Liu
  • Yongxin Shi
  • Kai Ding
  • Lianwen Jin

Large Language Models (LLMs) have made significant advancements across numerous domains, but their capabilities in Chinese Classical Literature and Language Arts (CCLLA) remain largely unexplored due to the limited scope and tasks of existing benchmarks. To fill this gap, we propose WenMind, a comprehensive benchmark dedicated to evaluating LLMs in CCLLA. WenMind covers the sub-domains of Ancient Prose, Ancient Poetry, and Ancient Literary Culture, comprising 4,875 question-answer pairs, spanning 42 fine-grained tasks, 3 question formats, and 2 evaluation scenarios: domain-oriented and capability-oriented. Based on WenMind, we conduct a thorough evaluation of 31 representative LLMs, including general-purpose models and ancient Chinese LLMs. The results reveal that even the best-performing model, ERNIE-4.0, only achieves a total score of 64.3, indicating significant room for improvement of LLMs in the CCLLA domain. We also provide insights into the strengths and weaknesses of different LLMs and highlight the importance of pre-training data in achieving better results. Overall, WenMind serves as a standardized and comprehensive baseline, providing valuable insights for future CCLLA research. Our benchmark and related code are available at \url{https://github.com/SCUT-DLVCLab/WenMind}.

IJCAI Conference 2024 Conference Paper

With a Little Help from Language: Semantic Enhanced Visual Prototype Framework for Few-Shot Learning

  • Hecheng Cai
  • Yang Liu
  • Shudong Huang
  • Jiancheng Lv

Few-shot learning (FSL) aims to recognize new categories given limited training samples. The core challenge is to avoid overfitting to the minimal data while ensuring good generalization to novel classes. One mainstream method employs prototypes from visual feature extractors as classifier weights, and its performance depends on the quality of the prototype. Since different categories may have similar visual features, the visual prototype has limitations. This is because existing methods only learn a simple visual feature extractor during the pre-training stage but neglect the importance of a well-developed feature space for the prototype. We introduce the Semantic Enhanced Visual Prototype framework (SEVpro) to address this issue. SEVpro refines prototype learning from the pre-training stage and serves as a versatile plug-and-play framework for all prototype-based FSL methods. Specifically, we enhance prototype discriminability by transforming semantic embeddings into the visual space, aiding in separating categories with similar visual features. For novel class learning, we leverage knowledge from base classes and incorporate semantic information to elevate prototype quality further. Extensive experiments on FSL benchmarks and ablation studies demonstrate the superiority of our proposed SEVpro for FSL.

ICLR Conference 2024 Conference Paper

ZeroFlow: Scalable Scene Flow via Distillation

  • Kyle Vedder
  • Neehar Peri
  • Nathaniel Chodosh
  • Ishan Khatri
  • Eric Eaton
  • Dinesh Jayaraman
  • Yang Liu
  • Deva Ramanan

Scene flow estimation is the task of describing the 3D motion field between temporally successive point clouds. State-of-the-art methods use strong priors and test-time optimization techniques, but require on the order of tens of seconds to process full-size point clouds, making them unusable as computer vision primitives for real-time applications such as open world object detection. Feedforward methods are considerably faster, running on the order of tens to hundreds of milliseconds for full-size point clouds, but require expensive human supervision. To address both limitations, we propose _Scene Flow via Distillation_, a simple, scalable distillation framework that uses a label-free optimization method to produce pseudo-labels to supervise a feedforward model. Our instantiation of this framework, _ZeroFlow_, achieves **state-of-the-art** performance on the _Argoverse 2 Self-Supervised Scene Flow Challenge_ while using zero human labels by simply training on large-scale, diverse unlabeled data. At test-time, ZeroFlow is over 1000$\times$ faster than label-free state-of-the-art optimization-based methods on full-size point clouds (34 FPS vs 0.028 FPS) and over 1000$\times$ cheaper to train on unlabeled data compared to the cost of human annotation (\\$394 vs ~\\$750,000). To facilitate further research, we will release our code, trained model weights, and high quality pseudo-labels for the Argoverse 2 and Waymo Open datasets.

AAAI Conference 2023 Conference Paper

A Simple Yet Effective Subsequence-Enhanced Approach for Cross-Domain NER

  • Jinpeng Hu
  • DanDan Guo
  • Yang Liu
  • Zhuo Li
  • Zhihong Chen
  • Xiang Wan
  • Tsung-Hui Chang

Cross-domain named entity recognition (NER), aiming to address the limitation of labeled resources in the target domain, is a challenging yet important task. Most existing studies alleviate the data discrepancy across different domains at the coarse level via combining NER with language modeling or introducing domain-adaptive pre-training (DAPT). Notably, source and target domains tend to share more fine-grained local information within denser subsequences than global information within the whole sequence, such that subsequence features are easier to transfer, which has not been explored well. Besides, compared to token-level representation, subsequence-level information can help the model distinguish different meanings of the same word in different domains. In this paper, we propose to incorporate subsequence-level features for promoting the cross-domain NER. In detail, we first utilize a pre-trained encoder to extract the global information. Then, we re-express each sentence as a group of subsequences and propose a novel bidirectional memory recurrent unit (BMRU) to capture features from the subsequences. Finally, an adaptive coupling unit (ACU) is proposed to combine global information and subsequence features for predicting entity labels. Experimental results on several benchmark datasets illustrate the effectiveness of our model, which achieves considerable improvements.

JBHI Journal 2023 Journal Article

Analysis and Recognition of Voluntary Facial Expression Mimicry Based on Depressed Patients

  • Jiayu Ye
  • Yanhong Yu
  • Gang Fu
  • Yunshao Zheng
  • Yang Liu
  • Yitao Zhu
  • Qingxiang Wang

Many clinical studies have shown that facial expression recognition and cognitive function are impaired in depressed patients. Different from spontaneous facial expression mimicry (SFEM), 164 subjects (82 in a case group and 82 in a control group) participated in our voluntary facial expression mimicry (VFEM) experiment using expressions of neutrality, anger, disgust, fear, happiness, sadness and surprise. Our research is as follows. First, we collected a large amount of subject data for VFEM. Second, we extracted the geometric features of subject facial expression images for VFEM and used Spearman correlation analysis, a random forest, and logistic regression-based recursive feature elimination (LR-RFE) to perform feature selection. The features selected revealed the difference between the case group and the control group. Third, we combined geometric features with the original images and improved advanced deep learning facial expression recognition (FER) algorithms in different systems. We propose E-ViT and E-ResNet based on VFEM, both of which achieved higher accuracies and F1 scores than the baseline models. Our research proved that it is effective to use feature selection to screen geometric features and combine them with a deep learning model for depression facial expression recognition.

AAAI Conference 2023 Conference Paper

Anonymization for Skeleton Action Recognition

  • Saemi Moon
  • Myeonghyeon Kim
  • Zhenyue Qin
  • Yang Liu
  • Dongwoo Kim

Skeleton-based action recognition attracts practitioners and researchers due to the lightweight, compact nature of datasets. Compared with RGB-video-based action recognition, skeleton-based action recognition is a safer way to protect the privacy of subjects while having competitive recognition performance. However, due to improvements in skeleton recognition algorithms as well as motion and depth sensors, more details of motion characteristics can be preserved in the skeleton dataset, leading to potential privacy leakage. We first train classifiers to categorize private information from skeleton trajectories to investigate the potential privacy leakage from skeleton datasets. Our preliminary experiments show that the gender classifier achieves 87% accuracy on average, and the re-identification classifier achieves 80% accuracy on average with three baseline models: Shift-GCN, MS-G3D, and 2s-AGCN. We propose an anonymization framework based on adversarial learning to protect potential privacy leakage from the skeleton dataset. Experimental results show that an anonymized dataset can reduce the risk of privacy leakage while having marginal effects on action recognition performance even with simple anonymizer architectures. The code used in our experiments is available at https://github.com/ml-postech/Skeleton-anonymization/

AAAI Conference 2023 Conference Paper

AudioEar: Single-View Ear Reconstruction for Personalized Spatial Audio

  • Xiaoyang Huang
  • Yanjun Wang
  • Yang Liu
  • Bingbing Ni
  • Wenjun Zhang
  • Jinxian Liu
  • Teng Li

Spatial audio, which focuses on immersive 3D sound rendering, is widely applied in the acoustic industry. One of the key problems of current spatial audio rendering methods is the lack of personalization based on different anatomies of individuals, which is essential to produce accurate sound source positions. In this work, we address this problem from an interdisciplinary perspective. The rendering of spatial audio is strongly correlated with the 3D shape of human bodies, particularly ears. To this end, we propose to achieve personalized spatial audio by reconstructing 3D human ears with single-view images. First, to benchmark the ear reconstruction task, we introduce AudioEar3D, a high-quality 3D ear dataset consisting of 112 point cloud ear scans with RGB images. To self-supervisedly train a reconstruction model, we further collect a 2D ear dataset composed of 2,000 images, each one with manual annotation of occlusion and 55 landmarks, named AudioEar2D. To our knowledge, both datasets have the largest scale and best quality of their kinds for public use. Further, we propose AudioEarM, a reconstruction method guided by a depth estimation network that is trained on synthetic data, with two loss functions tailored for ear data. Lastly, to fill the gap between the vision and acoustics community, we develop a pipeline to integrate the reconstructed ear mesh with an off-the-shelf 3D human body and simulate a personalized Head-Related Transfer Function (HRTF), which is the core of spatial audio rendering. Code and data are publicly available in https://github.com/seanywang0408/AudioEar.

AAAI Conference 2023 Conference Paper

Background-Mixed Augmentation for Weakly Supervised Change Detection

  • Rui Huang
  • Ruofei Wang
  • Qing Guo
  • Jieda Wei
  • Yuxiang Zhang
  • Wei Fan
  • Yang Liu

Change detection (CD) is to decouple object changes (i.e., object missing or appearing) from background changes (i.e., environment variations) like light and season variations in two images captured in the same scene over a long time span, presenting critical applications in disaster management, urban development, etc. In particular, the endless patterns of background changes require detectors to have a high generalization against unseen environment variations, making this task significantly challenging. Recent deep learning-based methods develop novel network architectures or optimization strategies with paired-training examples, which do not handle the generalization issue explicitly and require huge manual pixel-level annotation efforts. In this work, for the first attempt in the CD community, we study the generalization issue of CD from the perspective of data augmentation and develop a novel weakly supervised training algorithm that only needs image-level labels. Different from general augmentation techniques for classification, we propose the background-mixed augmentation that is specifically designed for change detection by augmenting examples under the guidance of a set of background changing images and letting deep CD models see diverse environment variations. Moreover, we propose the augmented & real data consistency loss that encourages the generalization increase significantly. Our method as a general framework can enhance a wide range of existing deep learning-based detectors. We conduct extensive experiments in two public datasets and enhance four state-of-the-art methods, demonstrating the advantages of our method. We release the code at https://github.com/tsingqguo/bgmix.

EAAI Journal 2023 Journal Article

Boosting fish counting in sonar images with global attention and point supervision

  • Yunhong Duan
  • Shubin Zhang
  • Yang Liu
  • Jincun Liu
  • Dong An
  • Yaoguang Wei

Automatically counting fish in sonar images has been attracting increasing attention in recent years because extreme efforts are needed in manual counting. Density map regression provides a promising approach in the counting field, but two obstacles hinder fish counting in low-resolution sonar images: the difficulty of distinguishing fish from similar background noise, and the inconsistency between the strip-shaped fishes in input images and the dot-shaped ground-truth density map. To address these issues, we present GPNet, a novel encoder-decoder network with global attention and point supervision, to boost sonar image-based fish counting accuracy. To alleviate the impact of background noise, we incorporate a segmentation module (SM) with global self-attention to the neck of the network to identify the fish region and separate out background noise. Furthermore, feature enhancement modules (FEM) with a global receptive field are introduced to the encoder to enhance the feature representation and discrimination. To break down the performance upper bound resulting from target shape inconsistency between input and ground truth, we leverage fish center coordinates instead of the Gaussian density map to supervise the network training directly. Extensive experiments on a challenging public sonar image-based fish counting dataset, the ARIS dataset, demonstrate that GPNet achieves state-of-the-art performance both in counting accuracy and noise removal.

IJCAI Conference 2023 Conference Paper

CostFormer: Cost Transformer for Cost Aggregation in Multi-view Stereo

  • Weitao Chen
  • Hongbin Xu
  • Zhipeng Zhou
  • Yang Liu
  • Baigui Sun
  • Wenxiong Kang
  • Xuansong Xie

The core of Multi-view Stereo (MVS) is the matching process among reference and source pixels. Cost aggregation plays a significant role in this process, while previous methods focus on handling it via CNNs. This may inherit the natural limitation of CNNs that fail to discriminate repetitive or incorrect matches due to limited local receptive fields. To handle the issue, we aim to involve Transformer into cost aggregation. However, another problem may occur due to the quadratically growing computational complexity caused by Transformer, resulting in memory overflow and inference latency. In this paper, we overcome these limits with an efficient Transformer-based cost aggregation network, namely CostFormer. The Residual Depth-Aware Cost Transformer (RDACT) is proposed to aggregate long-range features on cost volume via self-attention mechanisms along the depth and spatial dimensions. Furthermore, the Residual Regression Transformer (RRT) is proposed to enhance spatial attention. The proposed method is a universal plug-in to improve learning-based MVS methods.

JBHI Journal 2023 Journal Article

Cross-Domain Unpaired Learning for Low-Dose CT Imaging

  • Yang Liu
  • Gaofeng Chen
  • Shumao Pang
  • Dong Zeng
  • Youde Ding
  • Guoxi Xie
  • Jianhua Ma
  • Ji He

Supervised deep-learning techniques with paired training datasets have been widely studied for low-dose computed tomography (LDCT) imaging with excellent performance. However, the paired training datasets are usually difficult to obtain in clinical routine, which restricts the wide adoption of supervised deep-learning techniques in clinical practices. To address this issue, a general idea is to construct a pseudo paired training dataset based on the widely available unpaired data, after which, supervised deep-learning techniques can be adopted for improving the LDCT imaging performance by training on the pseudo paired training dataset. However, due to the complexity of noise properties in CT imaging, the LDCT data are difficult to generate in order to construct the pseudo paired training dataset. In this article, we propose a simple yet effective cross-domain unpaired learning framework for pseudo LDCT data generation and LDCT image reconstruction, which is denoted as CrossDuL. Specifically, a dedicated pseudo LDCT sinogram generative module is constructed based on a data-dependent noise model in the sinogram domain, and then instead of in the sinogram domain, a pseudo paired dataset is constructed in the image domain to train an LDCT image restoration module. To validate the effectiveness of the proposed framework, clinical datasets are adopted. Experimental results demonstrate that the CrossDuL framework can obtain promising LDCT imaging performance in both quantitative and qualitative measurements.

NeurIPS Conference 2023 Conference Paper

Crystal Structure Prediction by Joint Equivariant Diffusion

  • Rui Jiao
  • Wenbing Huang
  • Peijia Lin
  • Jiaqi Han
  • Pin Chen
  • Yutong Lu
  • Yang Liu

Crystal Structure Prediction (CSP) is crucial in various scientific disciplines. While CSP can be addressed by employing currently-prevailing generative models (e.g., diffusion models), this task encounters unique challenges owing to the symmetric geometry of crystal structures---the invariance of translation, rotation, and periodicity. To incorporate the above symmetries, this paper proposes DiffCSP, a novel diffusion model that learns the structure distribution from stable crystals. Specifically, DiffCSP jointly generates the lattice and atom coordinates for each crystal by employing a periodic-E(3)-equivariant denoising model, to better capture the crystal geometry. Notably, unlike related equivariant generative approaches, DiffCSP leverages fractional coordinates rather than Cartesian coordinates to represent crystals, considerably facilitating the diffusion and generation of atom positions. Extensive experiments verify that DiffCSP remarkably outperforms existing CSP methods, at a much lower computational cost than DFT-based methods. Moreover, the superiority of DiffCSP persists when it is extended to ab initio crystal generation.
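The fractional-coordinate representation expresses atom positions in the lattice basis: converting to Cartesian positions is a matrix product, and wrapping into [0, 1) makes the crystal's periodicity explicit. A minimal sketch (the lattice and atom positions below are toy values for illustration):

```python
import numpy as np

def frac_to_cart(frac, lattice):
    """Map fractional coordinates to Cartesian positions.
    Wrapping with % 1.0 applies periodic boundary conditions;
    the matrix product changes basis into Cartesian space."""
    return (frac % 1.0) @ lattice

lattice = np.array([[4.0, 0.0, 0.0],
                    [0.0, 4.0, 0.0],
                    [0.0, 0.0, 6.0]])   # toy orthorhombic cell
frac = np.array([[0.5, 0.5, 0.25],
                 [1.5, -0.5, 0.75]])    # second atom wraps periodically
cart = frac_to_cart(frac, lattice)
```

Because a lattice translation shifts every fractional coordinate by an integer, the wrapped representation is invariant to periodicity, which is the property the diffusion model exploits.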

NeurIPS Conference 2023 Conference Paper

Deep Insights into Noisy Pseudo Labeling on Graph Data

  • Botao Wang
  • Jia Li
  • Yang Liu
  • Jiashun Cheng
  • Yu Rong
  • Wenjia Wang
  • Fugee Tsung

Pseudo labeling (PL) is a widely applied strategy to enlarge the labeled dataset by self-annotating potential samples during the training process. Several works have shown that it can generally improve the performance of graph learning models. However, we notice that incorrect labels can be fatal to the graph training process: inappropriate PL may degrade performance, especially on graph data, where noise can propagate. Surprisingly, the corresponding error is seldom analyzed theoretically in the literature. In this paper, we aim to give deep insights into PL on graph learning models. We first present an error analysis of the PL strategy, showing that the error is bounded by the confidence of the PL threshold and the consistency of multi-view predictions. Then, we theoretically illustrate the effect of PL on convergence properties. Based on this analysis, we propose a cautious pseudo labeling methodology that pseudo labels the samples with the highest confidence and multi-view consistency. Finally, extensive experiments demonstrate that the proposed strategy improves the graph learning process and outperforms other PL strategies on link prediction and node classification tasks.
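The cautious selection rule can be sketched as follows; the confidence threshold and the two-view toy probabilities are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def cautious_pseudo_label(probs_views, threshold=0.9):
    """Select unlabeled nodes whose predictions agree across all views
    and whose worst-view confidence exceeds the threshold.

    probs_views: array of shape (n_views, n_nodes, n_classes) holding
    class probabilities from several augmented graph views.
    Returns the selected node indices and their pseudo labels.
    """
    preds = probs_views.argmax(axis=2)            # (n_views, n_nodes)
    conf = probs_views.max(axis=2).min(axis=0)    # worst-view confidence
    consistent = (preds == preds[0]).all(axis=0)  # all views agree
    mask = consistent & (conf >= threshold)
    return np.where(mask)[0], preds[0][mask]

# toy example: 2 views, 3 nodes, 2 classes
p = np.array([[[0.95, 0.05], [0.6, 0.4], [0.2, 0.8]],
              [[0.97, 0.03], [0.4, 0.6], [0.1, 0.9]]])
idx, labels = cautious_pseudo_label(p, threshold=0.75)  # node 1 is rejected
```

Node 1 is rejected because its two views disagree, which is exactly the error source the paper's bound penalizes.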

IJCAI Conference 2023 Conference Paper

DenseLight: Efficient Control for Large-scale Traffic Signals with Dense Feedback

  • Junfan Lin
  • Yuying Zhu
  • Lingbo Liu
  • Yang Liu
  • Guanbin Li
  • Liang Lin

Traffic Signal Control (TSC) aims to reduce the average travel time of vehicles in a road network, which in turn enhances fuel utilization efficiency, air quality, and road safety, benefiting society as a whole. Due to the complexity of long-horizon control and coordination, most prior TSC methods leverage deep reinforcement learning (RL) to search for a control policy and have witnessed great success. However, TSC still faces two significant challenges. 1) The travel time of a vehicle is delayed feedback on the effectiveness of the TSC policy at each intersection, since it is only obtained after the vehicle has left the road network. Although several heuristic reward functions have been proposed as substitutes for travel time, they are usually biased and do not lead the policy to improve in the correct direction. 2) The traffic condition of each intersection is influenced by non-local intersections, since vehicles traverse multiple intersections over time. Therefore, the TSC agent must leverage both local observations and non-local traffic conditions to comprehensively predict the long-horizon traffic conditions of each intersection. To address these challenges, we propose DenseLight, a novel RL-based TSC method that employs an unbiased reward function to provide dense feedback on policy effectiveness and a non-locally enhanced TSC agent to better predict future traffic conditions for more precise traffic control. Extensive experiments and ablation studies demonstrate that DenseLight consistently outperforms advanced baselines on various road networks with diverse traffic flows. The code is available at https://github.com/junfanlin/DenseLight.

AAAI Conference 2023 Conference Paper

EASAL: Entity-Aware Subsequence-Based Active Learning for Named Entity Recognition

  • Yang Liu
  • Jinpeng Hu
  • Zhihong Chen
  • Xiang Wan
  • Tsung-Hui Chang

Active learning is a critical technique for reducing labelling load by selecting the most informative data. Most previous works applied active learning to Named Entity Recognition (a token-level task) in the same way as text classification (a sentence-level task). They failed to consider the heterogeneity of uncertainty within each sentence and required the annotator to access the entire sentence when labelling. To overcome these limitations, in this paper we allow the active learning algorithm to query subsequences within sentences and propose Entity-Aware Subsequence-based Active Learning (EASAL), which utilizes an effective Head-Tail pointer to query one entity-aware subsequence per sentence based on BERT. For the tokens outside this subsequence, we randomly select 30% to be pseudo-labelled for joint training, where the model directly predicts their pseudo-labels. Experimental results on both news and biomedical datasets demonstrate the effectiveness of our proposed method. The code is released at https://github.com/lylylylylyly/EASAL.
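A simplified sketch of this querying recipe, assuming per-token uncertainty scores are already available (the brute-force span search below stands in for the learned Head-Tail pointer, and all names are illustrative):

```python
import random

def query_subsequence(token_uncertainty, max_len=5):
    """Pick the contiguous span with the highest total uncertainty,
    a simplified stand-in for the paper's Head-Tail pointer."""
    n = len(token_uncertainty)
    best, best_span = float("-inf"), (0, 0)
    for head in range(n):
        for tail in range(head, min(head + max_len, n)):
            score = sum(token_uncertainty[head:tail + 1])
            if score > best:
                best, best_span = score, (head, tail)
    return best_span

def split_tokens(n_tokens, span, pseudo_ratio=0.3, seed=0):
    """Tokens inside the span go to the annotator; a fraction of the
    remaining tokens are pseudo-labelled by the model, as in the
    training recipe described above."""
    head, tail = span
    outside = [i for i in range(n_tokens) if i < head or i > tail]
    rng = random.Random(seed)
    pseudo = sorted(rng.sample(outside, int(len(outside) * pseudo_ratio)))
    return list(range(head, tail + 1)), pseudo

u = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.1, 0.1, 0.1, 0.1]
span = query_subsequence(u, max_len=3)            # the uncertain middle span
annotate, pseudo = split_tokens(len(u), span)
```

Only the queried span ever reaches a human annotator, which is what reduces labelling load compared to sentence-level querying.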

NeurIPS Conference 2023 Conference Paper

Echoes Beyond Points: Unleashing the Power of Raw Radar Data in Multi-modality Fusion

  • Yang Liu
  • Feng Wang
  • Naiyan Wang
  • ZHAO-XIANG ZHANG

Radar is ubiquitous in autonomous driving systems due to its low cost and good adaptability to bad weather. Nevertheless, radar detection performance is usually inferior because its point cloud is sparse and inaccurate due to poor azimuth and elevation resolution. Moreover, point cloud generation algorithms already drop weak signals to reduce false targets, which may be suboptimal for deep fusion. In this paper, we propose a novel method named EchoFusion to skip the existing radar signal processing pipeline and incorporate the raw radar data with other sensors. Specifically, we first generate Bird's Eye View (BEV) queries and then take the corresponding spectrum features from the radar to fuse with other sensors. In this way, our method can utilize both the rich, lossless distance and speed clues from radar echoes and the rich semantic clues from images, surpassing all existing methods on the RADIal dataset and approaching the performance of LiDAR. The code will be released at https://github.com/tusen-ai/EchoFusion.

AAAI Conference 2023 Conference Paper

Energy-Motivated Equivariant Pretraining for 3D Molecular Graphs

  • Rui Jiao
  • Jiaqi Han
  • Wenbing Huang
  • Yu Rong
  • Yang Liu

Pretraining molecular representation models without labels is fundamental to various applications. Conventional methods mainly process 2D molecular graphs and focus solely on 2D tasks, making their pretrained models incapable of characterizing 3D geometry and thus defective for downstream 3D tasks. In this work, we tackle 3D molecular pretraining in a complete and novel sense. In particular, we first propose adopting an equivariant energy-based model as the backbone for pretraining, which enjoys the merit of fulfilling the symmetry of 3D space. Then we develop a node-level pretraining loss for force prediction, where we further exploit the Riemann-Gaussian distribution to ensure that the loss is E(3)-invariant, enabling more robustness. Moreover, a graph-level noise scale prediction task is also leveraged to further promote the eventual performance. We evaluate our model, pretrained on the large-scale 3D dataset GEOM-QM9, on two challenging 3D benchmarks: MD17 and QM9. Experimental results demonstrate the efficacy of our method against current state-of-the-art pretraining approaches, and verify the validity of each proposed component. Code is available at https://github.com/jiaor17/3D-EMGP.

IJCAI Conference 2023 Conference Paper

Fairness via Group Contribution Matching

  • Tianlin Li
  • Zhiming Li
  • Anran Li
  • Mengnan Du
  • Aishan Liu
  • Qing Guo
  • Guozhu Meng
  • Yang Liu

Fairness issues in deep learning models have recently received increasing attention due to their significant societal impact. Although methods for mitigating unfairness are constantly proposed, little research has been conducted to understand how discrimination and bias develop during the standard training process. In this study, we propose analyzing the contribution of each subgroup (i.e., a group of data with the same sensitive attribute) during training to understand how such bias develops. We propose a gradient-based metric to assess training subgroup contribution disparity, showing that unequal contributions from different subgroups are one source of such unfairness. One way to balance the contribution of each subgroup is through oversampling, which ensures that an equal number of samples are drawn from each subgroup during each training iteration. However, we have found that even with a balanced number of samples, the contribution of each group remains unequal, resulting in unfairness under the oversampling strategy. To address these issues, we propose a simple but effective group contribution matching (GCM) method to match the contribution of each subgroup. Our experiments show that GCM effectively improves fairness and significantly outperforms other methods.
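One way to make a gradient-based contribution metric concrete is to measure the norm of each subgroup's mean gradient; the logistic-regression setting and toy data below are illustrative assumptions, and GCM's exact metric and matching procedure may differ:

```python
import numpy as np

def subgroup_contributions(X, y, w, groups):
    """Per-subgroup contribution to a logistic-regression update,
    measured as the norm of each subgroup's mean gradient.
    A large disparity between groups suggests unequal influence
    on the parameter update at this training step."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))       # sigmoid predictions
    per_sample_grad = (p - y)[:, None] * X   # d loss / d w, one row per sample
    contrib = {}
    for g in np.unique(groups):
        contrib[g] = np.linalg.norm(per_sample_grad[groups == g].mean(axis=0))
    return contrib

# toy data: two samples per sensitive group
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])
groups = np.array([0, 0, 1, 1])
contrib = subgroup_contributions(X, y, np.zeros(2), groups)
```

Matching would then rescale each subgroup's gradient (or loss weight) so these norms are equalized across groups.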

NeurIPS Conference 2023 Conference Paper

Graph Contrastive Learning with Stable and Scalable Spectral Encoding

  • Deyu Bo
  • Yuan Fang
  • Yang Liu
  • Chuan Shi

Graph contrastive learning (GCL) aims to learn representations by capturing the agreements between different graph views. Traditional GCL methods generate views in the spatial domain, but it has been recently discovered that the spectral domain also plays a vital role in complementing spatial views. However, existing spectral-based graph views either ignore the eigenvectors that encode valuable positional information or suffer from high complexity when trying to address the instability of spectral features. To tackle these challenges, we first design an informative, stable, and scalable spectral encoder, termed EigenMLP, to learn effective representations from the spectral features. Theoretically, EigenMLP is invariant to the rotation and reflection transformations on eigenvectors and robust against perturbations. Then, we propose a spatial-spectral contrastive framework (Sp$^{2}$GCL) to capture the consistency between the spatial information encoded by graph neural networks and the spectral information learned by EigenMLP, thus effectively fusing these two graph views. Experiments on the node- and graph-level datasets show that our method not only learns effective graph representations but also achieves a 2--10x speedup over other spectral-based methods.

AAAI Conference 2023 Conference Paper

Human Mobility Modeling during the COVID-19 Pandemic via Deep Graph Diffusion Infomax

  • Yang Liu
  • Yu Rong
  • Zhuoning Guo
  • Nuo Chen
  • Tingyang Xu
  • Fugee Tsung
  • Jia Li

Non-Pharmaceutical Interventions (NPIs), such as social gathering restrictions, have shown effectiveness in slowing the transmission of COVID-19 by reducing contact between people. To support policy-makers, multiple studies have first modelled human mobility via macro indicators (e.g., average daily travel distance) and then studied the effectiveness of NPIs. In this work, we focus on mobility modelling and, from a micro perspective, aim to predict locations that will be visited by COVID-19 cases. Since NPIs generally cause economic and societal loss, such a prediction benefits governments when they design and evaluate them. However, in real-world situations, strict privacy data protection regulations result in severe data sparsity problems (i.e., limited case and location information). To address these challenges and jointly model variables including a geometric graph, a set of diffusions and a set of locations, we propose a model named Deep Graph Diffusion Infomax (DGDI). We show that the maximization of DGDI can be bounded by two tractable components: a univariate Mutual Information (MI) between the geometric graph and the diffusion representation, and a univariate MI between the diffusion representation and the location representation. To facilitate research on COVID-19 prediction, we present two benchmarks that contain geometric graphs and location histories of COVID-19 cases. Extensive experiments on the two benchmarks show that DGDI significantly outperforms other competing methods.

EAAI Journal 2023 Journal Article

Imbalanced data classification: Using transfer learning and active sampling

  • Yang Liu
  • Guoping Yang
  • Shaojie Qiao
  • Meiqi Liu
  • Lulu Qu
  • Nan Han
  • Guan Yuan
  • Tao Wu

Recently, deep learning models have made great breakthroughs in the field of computer vision, relying on large-scale class-balanced datasets. However, most of them do not consider class-imbalanced data. In reality, a class-imbalanced distribution can degrade model performance and reduce the generalization of these models. In addition, in the era of big data, many applications need real-time visual data. These data come from different mobile devices, which continuously generate huge volumes of visual data. However, few studies use real-time data from information systems: real-time data is easy to capture but difficult to use. To solve the above problems, we propose a new transfer-learning-based model, the Transfer Learning Classifier (TLC), to deal with class-imbalanced data. The model includes an active sampling module, a real-time data augmentation module and a DenseNet module. Among them, (1) the newly proposed active sampling module can dynamically adjust the number of samples with skewed distributions; (2) the data augmentation module can expand the real-time data to avoid over-fitting and insufficient data; (3) the DenseNet module is a standard DenseNet pre-trained on the ImageNet dataset and transferred to TLC for relearning, after which we adjust the memory usage of the standard DenseNet to make it more efficient. In addition, we have deployed a new end-to-end real-time data storage and analysis system. Extensive experiments have been carried out on four different long-tailed datasets. Experimental results show that the proposed TLC model can effectively deal with both static and real-time data, and its classification of imbalanced data is better than that of existing models.
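For context, the plain oversampling baseline that active sampling improves upon can be sketched as follows (TLC's module adjusts the per-class counts dynamically rather than fixing them, so this is only the static starting point):

```python
import random

def oversample_balanced(samples, labels, seed=0):
    """Draw an equal number of samples per class by resampling the
    minority classes with replacement, a simple baseline for
    class-imbalanced training data."""
    rng = random.Random(seed)
    by_class = {}
    for s, l in zip(samples, labels):
        by_class.setdefault(l, []).append(s)
    target = max(len(v) for v in by_class.values())
    out = []
    for l, items in by_class.items():
        # pad each minority class up to the majority-class count
        picked = items + [rng.choice(items) for _ in range(target - len(items))]
        out.extend((s, l) for s in picked)
    rng.shuffle(out)
    return out

# 8 samples of class 0 vs. 2 of class 1 -> balanced 8 vs. 8
data = oversample_balanced(list(range(10)), [0] * 8 + [1] * 2)
```

The known weakness of this baseline is repetition-driven over-fitting on the minority class, which is one motivation for adjusting sample counts adaptively during training instead.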

AAAI Conference 2023 Conference Paper

Improving Training and Inference of Face Recognition Models via Random Temperature Scaling

  • Lei Shang
  • Mouxiao Huang
  • Wu Shi
  • Yuchen Liu
  • Yang Liu
  • Wang Steven
  • Baigui Sun
  • Xuansong Xie

Data uncertainty is commonly observed in images for face recognition (FR). However, deep learning algorithms often make predictions with high confidence even for uncertain or irrelevant inputs. Intuitively, FR algorithms can benefit from both the estimation of uncertainty and the detection of out-of-distribution (OOD) samples. From a probabilistic view of the current classification model, the temperature scalar is exactly the scale of the uncertainty noise implicitly added in the softmax function. Meanwhile, the uncertainty of images in a dataset should follow a prior distribution. Based on this observation, a unified framework for uncertainty modeling and FR, Random Temperature Scaling (RTS), is proposed to learn a reliable FR algorithm. The benefits of RTS are two-fold. (1) In the training phase, it can adjust the learning strength of clean and noisy samples for stability and accuracy. (2) In the test phase, it can provide a confidence score to detect uncertain, low-quality and even OOD samples, without training on extra labels. Extensive experiments on FR benchmarks demonstrate that the magnitude of the variance in RTS, which serves as an OOD detection metric, is closely related to the uncertainty of the input image. RTS achieves top performance on both the FR and OOD detection tasks. Moreover, the model trained with RTS performs robustly on noisy datasets. The proposed module is lightweight and adds only negligible computation cost to the model.
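The core mechanism can be sketched as a softmax whose temperature is drawn per sample from a prior distribution; the gamma prior and its parameters below are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def rts_softmax(logits, rng, shape=2.0, scale=0.5):
    """Softmax with a per-sample random temperature drawn from a
    gamma prior. Larger sampled temperatures flatten the output
    distribution, modelling higher input uncertainty."""
    t = rng.gamma(shape, scale, size=(logits.shape[0], 1)) + 1e-6
    z = logits / t
    z = z - z.max(axis=1, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True), t

rng = np.random.default_rng(0)
probs, temps = rts_softmax(np.array([[2.0, 0.5, 0.1]]), rng)
```

Since dividing by a positive temperature preserves the logit ordering, the predicted class is unchanged; only the confidence mass shifts, which is what makes the learned variance usable as an OOD score at test time.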

IJCAI Conference 2023 Conference Paper

Incentivizing Recourse through Auditing in Strategic Classification

  • Andrew Estornell
  • Yatong Chen
  • Sanmay Das
  • Yang Liu
  • Yevgeniy Vorobeychik

The increasing automation of high-stakes decisions with direct impact on the lives and well-being of individuals raises a number of important considerations. Prominent among these is strategic behavior by individuals hoping to achieve a more desirable outcome. Two forms of such behavior are commonly studied: 1) misreporting of individual attributes, and 2) recourse, or actions that truly change such attributes. The former involves deception, and is inherently undesirable, whereas the latter may well be a desirable goal insofar as it changes true individual qualification. We study misreporting and recourse as strategic choices by individuals within a unified framework. In particular, we propose auditing as a means to incentivize recourse actions over attribute manipulation, and characterize optimal audit policies for two types of principals, utility-maximizing and recourse-maximizing. Additionally, we consider subsidies as an incentive for recourse over manipulation, and show that even a utility-maximizing principal would be willing to devote a considerable amount of audit budget to providing such subsidies. Finally, we consider the problem of optimizing fines for failed audits, and bound the total cost incurred by the population as a result of audits.

EAAI Journal 2023 Journal Article

Intelligent multiobjective optimization for high-performance concrete mix proportion design: A hybrid machine learning approach

  • Sai Yang
  • Hongyu Chen
  • Zongbao Feng
  • Yawei Qin
  • Jian Zhang
  • Yuan Cao
  • Yang Liu

The concrete mix proportion design process is complex but important, especially in cold, ocean, underground and other complex engineering environments. In this study, a hybrid intelligent optimization method based on random forest (RF), recursive feature elimination (RFE), Bayesian optimization (BO), the least squares support vector machine (LSSVM) and the non-dominated sorting genetic algorithm III (NSGA-III) was proposed to optimize the concrete mix proportion and rapidly and accurately predict frost resistance, chloride ion penetration resistance and concrete strength (CS). Taking a key project in Jilin Province as an example, the RF-RFE-BO-LSSVM-NSGA-III algorithm achieved a significant optimization effect on the chloride ion permeability coefficient (CIPC), relative dynamic elastic modulus (RDEM) and 28-day CS. After optimization, the chloride ion penetration resistance, frost resistance and CS increased by 34.6%, 4.1% and 3.7%, respectively, over the average levels of the sample data. This study can provide a basis for concrete mix proportion design in complex environments.

TMLR Journal 2023 Journal Article

Learning to Incentivize Improvements from Strategic Agents

  • Yatong Chen
  • Jialu Wang
  • Yang Liu

Machine learning systems are often used in settings where individuals adapt their features to obtain a desired outcome. In such settings, strategic behavior leads to a sharp loss in model performance in deployment. In this work, we aim to address this problem by learning classifiers that encourage decision subjects to change their features in a way that leads to improvement in both predicted and true outcome. We frame the dynamics of prediction and adaptation as a two-stage game, and characterize optimal strategies for the model designer and its decision subjects. In benchmarks on simulated and real-world datasets, we find that classifiers trained using our method maintain the accuracy of existing approaches while inducing higher levels of improvement and less manipulation.

NeurIPS Conference 2023 Conference Paper

Long-Term Fairness with Unknown Dynamics

  • Tongxin Yin
  • Reilly Raab
  • Mingyan Liu
  • Yang Liu

While machine learning can myopically reinforce social inequalities, it may also be used to dynamically seek equitable outcomes. In this paper, we formalize long-term fairness as an online reinforcement learning problem for a policy affecting human populations. This formulation accommodates dynamical control objectives, such as achieving equitable population states, that cannot be incorporated into static formulations of fairness. We demonstrate that algorithmic solutions to the proposed fairness problem can adapt to unknown dynamics and, by sacrificing short-term incentives, drive the policy-population system towards more desirable equilibria. For the proposed setting, we develop an algorithm that adapts recent work in online learning and prove that this algorithm achieves simultaneous probabilistic bounds on cumulative loss and cumulative violations of fairness. In the classification setting subject to group fairness, we compare our proposed algorithm to several baselines, including the repeated retraining of myopic or distributionally robust classifiers, and to a deep reinforcement learning algorithm that lacks fairness guarantees. Our experiments model human populations according to evolutionary game theory and integrate real-world datasets.

EAAI Journal 2023 Journal Article

LOSN: Lightweight ore sorting networks for edge device environment

  • Yang Liu
  • Xueyi Wang
  • Zelin Zhang
  • Fang Deng

Vision-based intelligent ore sorting technology has been widely applied in current mining production, a trend further facilitated by the emergence of deep learning. However, most available implementations are still based on image classification, i.e., dividing the overall sorting task into two processes, classification and localization, without end-to-end integration. Meanwhile, harsh sorting scenarios make edge computing devices the primary candidate for model deployment, imposing stringent limits on model size, computational complexity, and inference speed. Therefore, this study proposes integrating the operating processes to locate and classify ore particles simultaneously. Lightweight structures, attention mechanisms, and multi-scale feature fusion strategies are applied in the architecture design to meet the deployment requirements of edge device environments and achieve a preferred accuracy–efficiency tradeoff, leading to a new lightweight ore sorting network called LOSN. In the case study, LOSN achieves the highest accuracy in multi-type and multi-class ore sorting tasks (78.87% and 80.64% on the gas coal and anthracite datasets, respectively) with fewer parameters (5.970M), lower GFLOPs (6.829G) and higher FPS (89.92), outperforming commonly used high-performance object detection architectures (e.g., the YOLO series, EfficientDet, Faster R-CNN, and CenterNet). Grad-CAM visualizations also demonstrate the feature extraction capability of LOSN.

NeurIPS Conference 2023 Conference Paper

Model Sparsity Can Simplify Machine Unlearning

  • Jinghan Jia
  • Jiancheng Liu
  • Parikshit Ram
  • Yuguang Yao
  • Gaowen Liu
  • Yang Liu
  • Pranay Sharma
  • Sijia Liu

In response to recent data regulation requirements, machine unlearning (MU) has emerged as a critical process to remove the influence of specific examples from a given model. Although exact unlearning can be achieved through complete model retraining using the remaining dataset, the associated computational costs have driven the development of efficient, approximate unlearning techniques. Moving beyond data-centric MU approaches, our study introduces a novel model-based perspective: model sparsification via weight pruning, which is capable of reducing the gap between exact unlearning and approximate unlearning. We show in both theory and practice that model sparsity can boost the multi-criteria unlearning performance of an approximate unlearner, closing the approximation gap, while continuing to be efficient. This leads to a new MU paradigm, termed prune first, then unlearn, which infuses a sparse prior to the unlearning process. Building on this insight, we also develop a sparsity-aware unlearning method that utilizes sparsity regularization to enhance the training process of approximate unlearning. Extensive experiments show that our proposals consistently benefit MU in various unlearning scenarios. A notable highlight is the 77% unlearning efficacy gain of fine-tuning (one of the simplest approximate unlearning methods) when using our proposed sparsity-aware unlearning method. Furthermore, we showcase the practical impact of our proposed MU methods through two specific use cases: defending against backdoor attacks, and enhancing transfer learning through source class removal. These applications demonstrate the versatility and effectiveness of our approaches in addressing a variety of machine learning challenges beyond unlearning for data privacy. Codes are available at https://github.com/OPTML-Group/Unlearn-Sparse.
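The "prune first" step can be illustrated with one-shot magnitude pruning; approximate unlearning would then fine-tune only the unmasked weights on the retained data (a sketch under that assumption, not the paper's exact pipeline):

```python
import numpy as np

def magnitude_prune(w, sparsity=0.9):
    """Zero out the smallest-magnitude weights so that roughly
    `sparsity` of them are removed, and return the fixed binary mask.
    The mask is kept frozen during subsequent unlearning fine-tuning."""
    k = int(w.size * sparsity)
    thresh = np.sort(np.abs(w), axis=None)[k - 1] if k > 0 else -np.inf
    mask = np.abs(w) > thresh          # keep strictly-above-threshold weights
    return w * mask, mask

w = np.array([0.01, -2.0, 0.3, -0.02, 1.5])
pruned, mask = magnitude_prune(w, sparsity=0.6)
```

The intuition behind the paradigm is that a sparser model has fewer parameters that can memorize the forget set, which shrinks the gap between approximate unlearning and retraining from scratch.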

EAAI Journal 2023 Journal Article

MSE-Fusion: Weakly supervised medical image fusion with modal synthesis and enhancement

  • Lifang Wang
  • Yang Liu
  • Jia Mi
  • Jiong Zhang

Existing multi-modal image fusion methods take multi-modal images as input, which requires imaging patients multiple times, harming patients' bodies and incurring large costs. Moreover, image fusion needs a large number of registered images, which are time-consuming and difficult to obtain, and the fused images have unclear texture and structure. Therefore, a weakly supervised medical image fusion method with modal synthesis and enhancement is proposed. In modal synthesis, a weakly supervised approach is used to train the model, reducing the requirement for registered images; MR images are used as input to synthesize CT images through a deep-structure and shallow-detail generator, which reduces the required input modalities and makes texture and structure clearer. In image enhancement, MR images are passed through a trained generator to produce enhanced MR images with sharper texture and structure. The synthesized CT and enhanced MR images, together with the original PET images, are then used as input to achieve tri-modal image fusion. Compared with 13 state-of-the-art modal synthesis and image fusion methods on the same datasets, the proposed method significantly improves performance on 7 objective evaluation metrics. The subjective visual effects and objective evaluation metrics of our method are better than those of the compared image fusion methods.

EAAI Journal 2023 Journal Article

Neurodynamics-based configuration transformation with engineering application to robot manipulators using two intelligent approaches

  • Boyu Ma
  • Zongwu Xie
  • Xiaohang Yang
  • Yang Liu
  • Zhengpu Wang
  • Zainan Jiang

Before performing various tasks, it is important to transform a robot manipulator from its current configuration to the desired initial configuration. Therefore, this paper proposes a jerk-level configuration transformation (JCT) strategy based on neurodynamics that simultaneously avoids the physical limits on joint angle, velocity, acceleration, and jerk. Moreover, two intelligent approaches are designed to solve the JCT strategy, namely a dynamic recurrent neural network and an intelligent iterative optimizer, and their convergence and computational complexity are analyzed theoretically. In terms of superiority, the JCT strategy is compared with other typical strategies in simulations of a 7-degree-of-freedom manipulator. In experimental verification for engineering applications, the JCT strategy is applied to the space manipulator on the China Space Station and verified on a ground hardware-in-the-loop experimental system to demonstrate its effectiveness and physical realizability.

IROS Conference 2023 Conference Paper

OA-Bug: An Olfactory-Auditory Augmented Bug Algorithm for Swarm Robots in a Denied Environment

  • Siqi Tan
  • Xiaoya Zhang
  • Jingyao Li
  • Ruitao Jing
  • Mufan Zhao
  • Yang Liu
  • Quan Quan

Searching in a denied environment is challenging for swarm robots as no assistance from GNSS, mapping, data sharing, and central processing is allowed. However, using olfactory and auditory signals to cooperate like animals could be an important way to improve the collaboration of swarm robots. In this paper, an Olfactory-Auditory augmented Bug algorithm (OA-Bug) is proposed for a swarm of autonomous robots to explore a denied environment. A simulation environment is built to measure the performance of OA-Bug. The coverage of the search task can reach 96.93% using OA-Bug, which is significantly improved compared with a similar algorithm, SGBA [1]. Furthermore, experiments are conducted on real swarm robots to prove the validity of OA-Bug. Results show that OA-Bug can improve the performance of swarm robots in a denied environment. Video: https://youtu.be/vj9cRiSmgeM.

EAAI Journal 2023 Journal Article

Optimization of high-performance concrete mix ratio design using machine learning

  • Bin Chen
  • Lei Wang
  • Zongbao Feng
  • Yang Liu
  • Xianguo Wu
  • Yawei Qin
  • Lingyu Xia

High-durability concrete is required in extremely cold or ocean environments, making the design of concrete mixes highly important and complicated. In this study, a hybrid intelligent framework for multi-objective optimization based on random forest (RF) and the non-dominated sorting genetic algorithm version II (NSGA-II) is developed to efficiently predict concrete durability and optimize the concrete mix ratio. The relative dynamic elastic modulus of concrete after 300 freeze–thaw cycles and the chloride ion permeability coefficient at 28 days are defined as the standard measures of durability. The concrete mix ratio is taken as the influencing parameter, and orthogonal test data and engineering practice data are collected as the datasets. The proposed framework is applied to a realistic expressway project in a cold region of China. The results demonstrate that (1) a hybrid intelligent framework based on RF-NSGA-II can effectively predict concrete durability and optimize the mix ratio. (2) The developed RF model has an excellent regression learning ability: the goodness of fit (R²) for the two durability measures reaches 0.9503 and 0.9551, respectively, with root mean square error (RMSE) values of only 0.096 and 0.043 and mean absolute percentage error (MAPE) values of 2.54% and 2.17%. (3) After optimization, the concrete durability reaches a high standard, with a frost resistance of >95% and a chloride ion permeability coefficient of <3×10⁻⁸ cm²/s, at a unit volume cost of only 376.77 yuan. Hence, the proposed framework can be used to effectively optimize the concrete mix design and provide guidance for similar projects.

AAAI Conference 2023 Conference Paper

Phrase-Level Temporal Relationship Mining for Temporal Sentence Localization

  • Minghang Zheng
  • Sizhe Li
  • Qingchao Chen
  • Yuxin Peng
  • Yang Liu

In this paper, we address the problem of video temporal sentence localization, which aims to localize a target moment from videos according to a given language query. We observe that existing models suffer from a sheer performance drop when dealing with simple phrases contained in the sentence. It reveals the limitation that existing models only capture the annotation bias of the datasets but lack sufficient understanding of the semantic phrases in the query. To address this problem, we propose a phrase-level Temporal Relationship Mining (TRM) framework employing the temporal relationship relevant to the phrase and the whole sentence to have a better understanding of each semantic entity in the sentence. Specifically, we use phrase-level predictions to refine the sentence-level prediction, and use Multiple Instance Learning to improve the quality of phrase-level predictions. We also exploit the consistency and exclusiveness constraints of phrase-level and sentence-level predictions to regularize the training process, thus alleviating the ambiguity of each phrase prediction. The proposed approach sheds light on how machines can understand detailed phrases in a sentence and their compositions in their generality rather than learning the annotation biases. Experiments on the ActivityNet Captions and Charades-STA datasets show the effectiveness of our method on both phrase and sentence temporal localization and enable better model interpretability and generalization when dealing with unseen compositions of seen concepts. Code can be found at https://github.com/minghangz/TRM.

EAAI Journal 2023 Journal Article

Safety evaluation of buildings adjacent to shield construction in karst areas: An improved extension cloud approach

  • Hongyu Chen
  • Sai Yang
  • Zongbao Feng
  • Yang Liu
  • Yawei Qin

To accurately evaluate the safety risk status of buildings adjacent to karst shield construction areas, a safety evaluation standard for buildings adjacent to shield construction in karst areas and a safety risk assessment method based on optimal cloud entropy are proposed. Comprehensively considering tunnel characteristics, geological conditions, building conditions, construction, management and other influencing factors, a risk evaluation index system including 4 level-II indicators and 15 level-III indicators, together with evaluation criteria, is established for buildings adjacent to shield construction in karst areas. The traditional extension cloud theory is improved based on an optimal cloud entropy calculation method that adapts to the evaluation objects, and the clarity and fuzziness of index classification are considered. To verify the applicability of the proposed approach, it was applied to ten adjacent buildings in a karst geological section of Guiyang Rail Transit Line 3. The results show that (a) the proposed evaluation standard and improved extension cloud safety risk assessment method can effectively account for the uncertainty of risk events, and the evaluation results are consistent with the actual building safety risk status, with the calculated reliability factor of each building close to 1. (b) The key risk factors are identified through sensitivity analysis. According to the key risk factors and risk statuses, effective measures can be taken, and high-risk buildings can be monitored to maintain a safe control state. Thus, the proposed approach can be feasibly used in various applications and can provide guidance for other similar projects.

NeurIPS Conference 2023 Conference Paper

Sparse Modular Activation for Efficient Sequence Modeling

  • Liliang Ren
  • Yang Liu
  • Shuohang Wang
  • Yichong Xu
  • Chenguang Zhu
  • Cheng Xiang Zhai

Recent hybrid models combining Linear State Space Models (SSMs) with self-attention mechanisms have demonstrated impressive results across a range of sequence modeling tasks. However, current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs. To address this limitation, we introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely and dynamically activate sub-modules for sequence elements in a differentiable manner. Through allowing each element to skip non-activated sub-modules, SMA reduces computation and memory consumption of neural networks at both training and inference stages. To validate the effectiveness of SMA on sequence modeling, we design a novel neural architecture, SeqBoat, which employs SMA to sparsely activate a Gated Attention Unit (GAU) based on the state representations learned from an SSM. By constraining the GAU to only conduct local attention on the activated inputs, SeqBoat can achieve linear inference complexity with theoretically infinite attention span, and provide substantially better quality-efficiency trade-off than the chunking-based models. With experiments on a wide range of tasks, including long sequence modeling, speech classification and language modeling, SeqBoat brings new state-of-the-art results among hybrid models with linear complexity, and reveals the amount of attention needed for each task through the learned sparse activation patterns. Our code is publicly available at https://github.com/renll/SeqBoat.

ICML Conference 2023 Conference Paper

Tensor Gaussian Process with Contraction for Multi-Channel Imaging Analysis

  • Hu Sun
  • Ward Manchester
  • Meng Jin
  • Yang Liu
  • Yang Chen

Multi-channel imaging data is a prevalent data format in scientific fields such as astronomy and biology. The structured information and the high dimensionality of these 3-D tensor data make the analysis an intriguing but challenging topic for statisticians and practitioners. The low-rank scalar-on-tensor regression model, in particular, has received widespread attention and has been re-formulated as a tensor Gaussian Process (Tensor-GP) model with multi-linear kernel in Yu et al. (2018). In this paper, we extend the Tensor-GP model by introducing an integrative dimensionality reduction technique, called tensor contraction, with a Tensor-GP for a scalar-on-tensor regression task with multi-channel imaging data. This is motivated by the solar flare forecasting problem with high dimensional multi-channel imaging data. We first estimate a latent, reduced-size tensor for each data tensor and then apply a multi-linear Tensor-GP on the latent tensor data for prediction. We introduce an anisotropic total-variation regularization when conducting the tensor contraction to obtain a sparse and smooth latent tensor. We then propose an alternating proximal gradient descent algorithm for estimation. We validate our approach via extensive simulation studies and by applying it to the solar flare forecasting problem.

IJCAI Conference 2023 Conference Paper

The Importance of Human-Labeled Data in the Era of LLMs

  • Yang Liu

The advent of large language models (LLMs) has brought about a revolution in the development of tailored machine learning models and sparked debates on redefining data requirements. The automation facilitated by the training and implementation of LLMs has led to discussions and aspirations that human-level labeling interventions may no longer hold the same level of importance as in the era of supervised learning. This paper presents compelling arguments supporting the ongoing relevance of human-labeled data in the era of LLMs.

AAAI Conference 2023 Conference Paper

Towards Credible Human Evaluation of Open-Domain Dialog Systems Using Interactive Setup

  • Sijia Liu
  • Patrick Lange
  • Behnam Hedayatnia
  • Alexandros Papangelis
  • Di Jin
  • Andrew Wirth
  • Yang Liu
  • Dilek Hakkani-Tur

Evaluating open-domain conversation models has been an open challenge due to the open-ended nature of conversations. In addition to static evaluations, recent work has started to explore a variety of per-turn and per-dialog interactive evaluation mechanisms and provide advice on the best setup. In this work, we adopt the interactive evaluation framework and further apply it to multiple models with a focus on per-turn evaluation techniques. Apart from the widely used setting where participants select the best response among different candidates at each turn, one more novel per-turn evaluation setting is adopted, where participants can select all appropriate responses with different fallback strategies to continue the conversation when no response is selected. We evaluate these settings based on sensitivity and consistency using four GPT2-based models that differ in model sizes or fine-tuning data. To better generalize to any model groups with no prior assumptions on their rankings and control evaluation costs for all setups, we also propose a methodology to estimate the required sample size given a minimum performance gap of interest before running most experiments. Our comprehensive human evaluation results shed light on how to conduct credible human evaluations of open-domain dialog systems using the interactive setup, and suggest additional future directions.

NeurIPS Conference 2023 Conference Paper

Uncertainty-Aware Instance Reweighting for Off-Policy Learning

  • Xiaoying Zhang
  • Junpu Chen
  • Hongning Wang
  • Hong Xie
  • Yang Liu
  • John C. S. Lui
  • Hang Li

Off-policy learning, referring to the procedure of policy optimization with access only to logged feedback data, has shown its importance in various real-world applications, such as search engines and recommender systems. While the ground-truth logging policy is usually unknown, previous work simply takes its estimated value for off-policy learning, ignoring the negative impact of both the high bias and the high variance resulting from such an estimator. This impact is often magnified on samples with small and inaccurately estimated logging probabilities. The contribution of this work is to explicitly model the uncertainty in the estimated logging policy, and propose an Uncertainty-aware Inverse Propensity Score estimator (UIPS) for improved off-policy learning, with a theoretical convergence guarantee. Experimental results on synthetic and real-world recommendation datasets demonstrate that UIPS significantly improves the quality of the discovered policy when compared against an extensive list of state-of-the-art baselines.
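UIPS builds on the standard inverse propensity score (IPS) estimator, which reweights each logged reward by the ratio of the target policy's propensity to the (estimated) logging propensity; clipping the weight is a common variance-control baseline. An illustrative sketch of that baseline, with hypothetical names, not the paper's UIPS:

```python
def clipped_ips(logs, target_policy, clip=10.0):
    """Estimate the value of target_policy from logged data via clipped IPS.

    logs: iterable of (context, action, reward, logging_propensity) tuples.
    target_policy(context, action) -> probability under the target policy.
    """
    total = 0.0
    n = 0
    for context, action, reward, mu in logs:
        # Importance weight, clipped to limit variance from tiny propensities
        weight = min(target_policy(context, action) / mu, clip)
        total += weight * reward
        n += 1
    return total / n
```

When the target policy equals the logging policy every weight is 1 and the estimate reduces to the average logged reward, a useful sanity check.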

NeurIPS Conference 2023 Conference Paper

Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning

  • Hongyu Zang
  • Xin Li
  • Leiji Zhang
  • Yang Liu
  • Baigui Sun
  • Riashat Islam
  • Remi Tachet des Combes
  • Romain Laroche

While bisimulation-based approaches hold promise for learning robust state representations for Reinforcement Learning (RL) tasks, their efficacy in offline RL tasks has not been up to par. In some instances, their performance has even significantly underperformed alternative methods. We aim to understand why bisimulation methods succeed in online settings, but falter in offline tasks. Our analysis reveals that missing transitions in the dataset are particularly harmful to the bisimulation principle, leading to ineffective estimation. We also shed light on the critical role of reward scaling in bounding the scale of bisimulation measurements and of the value error they induce. Based on these findings, we propose to apply the expectile operator for representation learning to our offline RL setting, which helps to prevent overfitting to incomplete data. Meanwhile, by introducing an appropriate reward scaling strategy, we avoid the risk of feature collapse in representation space. We implement these recommendations on two state-of-the-art bisimulation-based algorithms, MICo and SimSR, and demonstrate performance gains on two benchmark suites: D4RL and Visual D4RL. Code is provided at https://github.com/zanghyu/Offline_Bisimulation.
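For reference, the expectile operator applied here generalizes the mean in the way quantiles generalize the median: the τ-expectile minimizes an asymmetrically weighted squared loss. A small sketch of that loss (illustrative only, not the paper's code):

```python
def expectile_loss(u, tau):
    """Asymmetric squared loss on the error u = target - prediction.

    Positive errors (underestimates) are weighted by tau, negative errors
    by (1 - tau); tau = 0.5 recovers half the ordinary squared loss.
    """
    weight = tau if u >= 0 else 1.0 - tau
    return weight * u * u
```

With τ > 0.5 the loss penalizes underestimates more heavily, which is why expectile-style operators can act as a soft, in-distribution stand-in for a max over incompletely observed data.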

AAAI Conference 2023 Conference Paper

Unsupervised Explanation Generation via Correct Instantiations

  • Sijie Cheng
  • Zhiyong Wu
  • Jiangjie Chen
  • Zhixing Li
  • Yang Liu
  • Lingpeng Kong

While large pre-trained language models (PLMs) have shown their great skills at solving discriminative tasks, a significant gap remains when compared with humans for explanation-related tasks. Among them, explaining the reason why a statement is wrong (e.g., against commonsense) is incredibly challenging. The major difficulty is finding the conflict point, where the statement contradicts our real world. This paper proposes Neon, a two-phase, unsupervised explanation generation framework. Neon first generates corrected instantiations of the statement (phase I), then uses them to prompt large PLMs to find the conflict point and complete the explanation (phase II). We conduct extensive experiments on two standard explanation benchmarks, i.e., ComVE and e-SNLI. According to both automatic and human evaluations, Neon outperforms baselines, even for those with human-annotated instantiations. In addition to explaining a negative prediction, we further demonstrate that Neon remains effective when generalizing to different scenarios. The resources of Neon are available at: https://github.com/Shark-NLP/Neon.

NeurIPS Conference 2022 Conference Paper

A Closer Look at the Adversarial Robustness of Deep Equilibrium Models

  • Zonghan Yang
  • Tianyu Pang
  • Yang Liu

Deep equilibrium models (DEQs) refrain from the traditional layer-stacking paradigm and turn to find the fixed point of a single layer. DEQs have achieved promising performance on different applications with notable memory efficiency. At the same time, the adversarial vulnerability of DEQs raises concerns. Several works propose to certify robustness for monotone DEQs. However, limited efforts are devoted to studying empirical robustness for general DEQs. To this end, we observe that an adversarially trained DEQ requires more forward steps to arrive at the equilibrium state, or even violates its fixed-point structure. Besides, the forward and backward tracks of DEQs are misaligned due to the black-box solvers. These facts cause gradient obfuscation when applying the ready-made attacks to evaluate or adversarially train DEQs. Given this, we develop approaches to estimate the intermediate gradients of DEQs and integrate them into the attacking pipelines. Our approaches facilitate fully white-box evaluations and lead to effective adversarial defense for DEQs. Extensive experiments on CIFAR-10 validate the adversarial robustness of DEQs competitive with deep networks of similar sizes.

IROS Conference 2022 Conference Paper

A Deep-Learning-based System for Indoor Active Cleaning

  • Yike Yun
  • Linjie Hou
  • Zijian Feng
  • Wei Jin
  • Yang Liu
  • Heng Wang
  • Ruonan He
  • Weitao Guo

Cleaning public areas like commercial complexes is challenging due to their sophisticated surroundings and the wide variety of real-life dirt. Robots are required to distinguish dirt types and apply corresponding cleaning strategies. In this work, we propose an active-cleaning framework utilizing deep-learning methods for both solid waste detection and liquid stain segmentation. Our system consists of 4 components: a Perception module integrated with deep-learning models, a Post-processing module for projection, a Tracking module for map localization, and a Planning and Control module for cleaning strategies. Compared with classic approaches, our vision-based system significantly improves cleaning efficiency. Besides, we released the largest real-world indoor hybrid dirt cleaning dataset (HD10K), containing 10K labeled images, together with a track-level evaluation metric for better cleaning performance measurement. The proposed deep-learning based system is verified with extensive experiments on our dataset, and deployed to Gaussian Robotics' robots operating globally. The dataset is available at: https://gaussianopensource.github.io/projects/active_cleaning.

AAAI Conference 2022 Conference Paper

A Label Dependence-Aware Sequence Generation Model for Multi-Level Implicit Discourse Relation Recognition

  • Changxing Wu
  • Liuwen Cao
  • Yubin Ge
  • Yang Liu
  • Min Zhang
  • Jinsong Su

Implicit discourse relation recognition (IDRR) is a challenging but crucial task in discourse analysis. Most existing methods train multiple models to predict multi-level labels independently, while ignoring the dependence between hierarchically structured labels. In this paper, we consider multi-level IDRR as a conditional label sequence generation task and propose a Label Dependence-aware Sequence Generation Model (LDSGM) for it. Specifically, we first design a label attentive encoder to learn the global representation of an input instance and its level-specific contexts, where the label dependence is integrated to obtain better label embeddings. Then, we employ a label sequence decoder to output the predicted labels in a top-down manner, where the predicted higher-level labels are directly used to guide the label prediction at the current level. We further develop a mutual learning enhanced training method to exploit the label dependence in a bottom-up direction, which is captured by an auxiliary decoder introduced during training. Experimental results on the PDTB dataset show that our model achieves the state-of-the-art performance on multi-level IDRR. We release our code at https://github.com/nlpersECJTU/LDSGM.

NeurIPS Conference 2022 Conference Paper

A Variant of Anderson Mixing with Minimal Memory Size

  • Fuchao Wei
  • Chenglong Bao
  • Yang Liu
  • Guangwen Yang

Anderson mixing (AM) is a useful method that can accelerate fixed-point iterations by exploring the information from historical iterations. Despite its numerical success in various applications, the memory requirement in AM remains a bottleneck when solving large-scale optimization problems on a resource-limited machine. To address this problem, we propose a novel variant of the AM method, called Min-AM, which stores only one vector pair, the minimal memory requirement in AM. Our method forms a symmetric approximation to the inverse Hessian matrix and is proved to be equivalent to the full-memory Type-I AM for solving strongly convex quadratic optimization. Moreover, for general nonlinear optimization problems, we establish the convergence properties of Min-AM under reasonable assumptions and show that the mixing parameters can be adaptively chosen by estimating the eigenvalues of the Hessian. Finally, we extend Min-AM to solve stochastic programming problems. Experimental results on logistic regression and network training problems validate the effectiveness of the proposed Min-AM.

NeurIPS Conference 2022 Conference Paper

Adaptive Data Debiasing through Bounded Exploration

  • Yifan Yang
  • Yang Liu
  • Parinaz Naghizadeh

Biases in existing datasets used to train algorithmic decision rules can raise ethical and economic concerns due to the resulting disparate treatment of different groups. We propose an algorithm for sequentially debiasing such datasets through adaptive and bounded exploration in a classification problem with costly and censored feedback. Exploration in this context means that at times, and to a judiciously-chosen extent, the decision maker deviates from its (current) loss-minimizing rule, and instead accepts some individuals that would otherwise be rejected, so as to reduce statistical data biases. Our proposed algorithm includes parameters that can be used to balance between the ultimate goal of removing data biases (which will in turn lead to more accurate and fair decisions) and the exploration risks incurred to achieve this goal. We analytically show that such exploration can help debias data in certain distributions. We further investigate how fairness criteria can work in conjunction with our data debiasing algorithm. We illustrate the performance of our algorithm using experiments on synthetic and real-world datasets.

JBHI Journal 2022 Journal Article

An Expectation Maximization Based Adaptive Group Testing Method for Improving Efficiency and Sensitivity of Large-Scale Screening of COVID-19

  • Xiaofang Xia
  • Yang Liu
  • Bo Yang
  • Yingfan Liu
  • Jiangtao Cui
  • Yinlong Zhang

The pathogen of the ongoing coronavirus disease 2019 (COVID-19) pandemic is a newly discovered virus called severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Testing individuals for SARS-CoV-2 plays a critical role in containing COVID-19. For saving medical personnel and consumables, many countries are implementing group testing against SARS-CoV-2. However, existing group testing methods have the following limitations: (1) The group size is determined without theoretical analysis, and hence is usually not optimal. This adversely impacts the screening efficiency. (2) These methods neglect the fact that mixing samples together usually leads to substantial dilution of the SARS-CoV-2 virus, which seriously impacts the sensitivity of tests. In this paper, we aim to screen individuals infected with COVID-19 with as few tests as possible, under the premise that the sensitivity of tests is high enough. We propose an eXpectation Maximization based Adaptive Group Testing (XMAGT) method. The basic idea is to adaptively adjust its testing strategy between a group testing strategy and an individual testing strategy such that the expected number of samples identified by a single test is larger. During the screening process, the XMAGT method can estimate the ratio of positive samples. With this ratio, the XMAGT method can determine a group size under which the group testing strategy can achieve a maximal expected number of negative samples and the sensitivity of tests is higher than a user-specified threshold. Experimental results show that the XMAGT method outperforms existing methods in terms of both efficiency and sensitivity.
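The efficiency trade-off XMAGT navigates can be seen in the classical Dorfman two-stage analysis (shown here purely for illustration; it is not XMAGT, which additionally estimates prevalence online and accounts for dilution): with prevalence p and pool size n, each sample costs one pooled test shared n ways plus an individual retest whenever its pool is positive.

```python
def expected_tests_per_sample(n, p):
    """Dorfman two-stage group testing: one pooled test per n samples,
    plus n individual retests when the pool tests positive."""
    return 1.0 / n + 1.0 - (1.0 - p) ** n

def optimal_group_size(p, n_max=100):
    """Pool size minimizing the expected number of tests per sample."""
    return min(range(2, n_max + 1),
               key=lambda n: expected_tests_per_sample(n, p))
```

At 1% prevalence the optimum is pools of 11, at roughly 0.196 expected tests per sample, about a fivefold saving over testing everyone individually.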

NeurIPS Conference 2022 Conference Paper

Certifying Some Distributional Fairness with Subpopulation Decomposition

  • Mintong Kang
  • Linyi Li
  • Maurice Weber
  • Yang Liu
  • Ce Zhang
  • Bo Li

Extensive efforts have been made to understand and improve the fairness of machine learning models based on observational metrics, especially in high-stakes domains such as medical insurance, education, and hiring decisions. However, there is a lack of certified fairness considering the end-to-end performance of an ML model. In this paper, we first formulate the certified fairness of an ML model trained on a given data distribution as an optimization problem based on the model performance loss bound on a fairness constrained distribution, which is within bounded distributional distance with the training distribution. We then propose a general fairness certification framework and instantiate it for both sensitive shifting and general shifting scenarios. In particular, we propose to solve the optimization problem by decomposing the original data distribution into analytical subpopulations and proving the convexity of the subproblems to solve them. We evaluate our certified fairness on six real-world datasets and show that our certification is tight in the sensitive shifting scenario and provides non-trivial certification under general shifting. Our framework is flexible to integrate additional non-skewness constraints and we show that it provides even tighter certification under different real-world scenarios. We also compare our certified fairness bound with adapted existing distributional robustness bounds on Gaussian data and demonstrate that our method is significantly tighter.

TIST Journal 2022 Journal Article

Communication-Efficient Federated Learning with Adaptive Quantization

  • Yuzhu Mao
  • Zihao Zhao
  • Guangfeng Yan
  • Yang Liu
  • Tian Lan
  • Linqi Song
  • Wenbo Ding

Federated learning (FL) has attracted tremendous attention in recent years due to its privacy-preserving measures and great potential in some distributed but privacy-sensitive applications, such as finance and health. However, high communication overhead for transmitting high-dimensional networks and extra security masks remains a bottleneck of FL. This article proposes a communication-efficient FL framework with an Adaptive Quantized Gradient (AQG), which adaptively adjusts the quantization level based on a local gradient’s update to fully utilize the heterogeneity of local data distribution for reducing unnecessary transmissions. In addition, client dropout issues are taken into account and an Augmented AQG is developed, which could limit the dropout noise with an appropriate amplification mechanism for transmitted gradients. Theoretical analysis and experiment results show that the proposed AQG leads to 18% to 50% of additional transmission reduction as compared with existing popular methods, including Quantized Gradient Descent (QGD) and Lazily Aggregated Quantized (LAQ) gradient-based methods, without deteriorating convergence properties. Experiments with heterogeneous data distributions corroborate a more significant transmission reduction compared with independent and identically distributed data. The proposed AQG is robust to a client dropping rate up to 90% empirically, and the Augmented AQG manages to further improve the FL system’s communication efficiency with the presence of moderate-scale client dropouts commonly seen in practical FL scenarios.
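The kind of quantization AQG adapts can be illustrated with a fixed-level uniform scheme: each coordinate of a gradient vector is encoded as a shared norm plus a small signed integer level out of s, so only the levels (a few bits each) and one float need be transmitted. A deterministic (rounding) sketch, not the paper's AQG:

```python
def quantize(v, s):
    """Uniformly quantize vector v to s levels per coordinate.

    Returns (norm, levels); the per-coordinate reconstruction error is
    at most norm / (2 * s).
    """
    norm = max(abs(x) for x in v)  # shared max-norm scale
    if norm == 0.0:
        return 0.0, [0] * len(v)
    return norm, [round(x / norm * s) for x in v]

def dequantize(norm, levels, s):
    """Reconstruct the approximate vector from its quantized form."""
    return [norm * level / s for level in levels]
```

Increasing s trades bandwidth for precision, which is exactly the knob an adaptive scheme like AQG turns per client and per round.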

JBHI Journal 2022 Journal Article

Customized Federated Learning for Multi-Source Decentralized Medical Image Classification

  • Jeffry Wicaksana
  • Zengqiang Yan
  • Xin Yang
  • Yang Liu
  • Lixin Fan
  • Kwang-Ting Cheng

The performance of deep networks for medical image analysis is often constrained by limited medical data, which is privacy-sensitive. Federated learning (FL) alleviates the constraint by allowing different institutions to collaboratively train a federated model without sharing data. However, the federated model is often suboptimal with respect to the characteristics of each client's local data. Instead of training a single global model, we propose Customized FL (CusFL), for which each client iteratively trains a client-specific/private model based on a federated global model aggregated from all private models trained in the immediate previous iteration. Two overarching strategies employed by CusFL lead to its superior performance: 1) the federated model is mainly for feature alignment and thus only consists of feature extraction layers; 2) the federated feature extractor is used to guide the training of each private model. In that way, CusFL allows each client to selectively learn useful knowledge from the federated model to improve its personalized model. We evaluated CusFL on multi-source medical image datasets for the identification of clinically significant prostate cancer and the classification of skin lesions.

AAAI Conference 2022 Conference Paper

DialogLM: Pre-trained Model for Long Dialogue Understanding and Summarization

  • Ming Zhong
  • Yang Liu
  • Yichong Xu
  • Chenguang Zhu
  • Michael Zeng

Dialogue is an essential part of human communication and cooperation. Existing research mainly focuses on short dialogue scenarios in a one-on-one fashion. However, multi-person interactions in the real world, such as meetings or interviews, are frequently over a few thousand words. There is still a lack of corresponding research and powerful tools to understand and process such long dialogues. Therefore, in this work, we present a pre-training framework for long dialogue understanding and summarization. Considering the nature of long conversations, we propose a window-based denoising approach for generative pre-training. For a dialogue, it corrupts a window of text with dialogue-inspired noise, and guides the model to reconstruct this window based on the content of the remaining conversation. Furthermore, to process longer input, we augment the model with sparse attention which is combined with conventional attention in a hybrid manner. We conduct extensive experiments on five datasets of long dialogues, covering tasks of dialogue summarization, abstractive question answering and topic segmentation. Experimentally, we show that our pre-trained model DIALOGLM significantly surpasses the state-of-the-art models across datasets and tasks. Source code and all the pretrained models are available on our GitHub repository (https://github.com/microsoft/DialogLM).

ICML Conference 2022 Conference Paper

Directed Acyclic Transformer for Non-Autoregressive Machine Translation

  • Fei Huang 0005
  • Hao Zhou 0012
  • Yang Liu
  • Hang Li 0001
  • Minlie Huang

Non-autoregressive Transformers (NATs) significantly reduce the decoding latency by generating all tokens in parallel. However, such independent predictions prevent NATs from capturing the dependencies between the tokens for generating multiple possible translations. In this paper, we propose Directed Acyclic Transformer (DA-Transformer), which represents the hidden states in a Directed Acyclic Graph (DAG), where each path of the DAG corresponds to a specific translation. The whole DAG simultaneously captures multiple translations and facilitates fast predictions in a non-autoregressive fashion. Experiments on the raw training data of the WMT benchmark show that DA-Transformer substantially outperforms previous NATs by about 3 BLEU on average, making it the first NAT model to achieve competitive results with autoregressive Transformers without relying on knowledge distillation.

IJCAI Conference 2022 Conference Paper

Distilling Governing Laws and Source Input for Dynamical Systems from Videos

  • Lele Luan
  • Yang Liu
  • Hao Sun

Distilling interpretable physical laws from videos has led to expanded interest in the computer vision community recently thanks to the advances in deep learning, but still remains a great challenge. This paper introduces an end-to-end unsupervised deep learning framework to uncover the explicit governing equations of dynamics presented by moving object(s), based on recorded videos. Instead of in the pixel (spatial) coordinate system of image space, the physical law is modeled in a regressed underlying physical coordinate system where the physical states follow potential explicit governing equations. A numerical integrator-based sparse regression module is designed to serve as a physical constraint on the autoencoder and coordinate system regression and, meanwhile, to uncover the parsimonious closed-form governing equations from the learned physical states. Experiments on simulated dynamical scenes show that the proposed method is able to distill closed-form governing equations and simultaneously identify unknown excitation input for several dynamical systems recorded by videos, which fills in the gap in the literature where no existing methods are available and applicable for solving this type of problem.

AAAI Conference 2022 Conference Paper

DMN4: Few-Shot Learning via Discriminative Mutual Nearest Neighbor Neural Network

  • Yang Liu
  • Tu Zheng
  • Jie Song
  • Deng Cai
  • Xiaofei He

Few-shot learning (FSL) aims to classify images under low-data regimes, where the conventional pooled global feature is likely to lose useful local characteristics. Recent work has achieved promising performances by using deep descriptors. They generally take all deep descriptors from neural networks into consideration while ignoring that some of them are useless in classification due to their limited receptive field; e.g., task-irrelevant descriptors could be misleading and multiple aggregative descriptors from background clutter could even overwhelm the object’s presence. In this paper, we argue that a Mutual Nearest Neighbor (MNN) relation should be established to explicitly select the query descriptors that are most relevant to each task and discard less relevant ones from aggregative clutters in FSL. Specifically, we propose the Discriminative Mutual Nearest Neighbor Neural Network (DMN4) for FSL. Extensive experiments demonstrate that our method outperforms existing state-of-the-art methods on both fine-grained and generalized datasets.
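The MNN relation at the heart of DMN4 can be stated simply: a query descriptor is kept only if its nearest support descriptor also has that query as its own nearest neighbor, which filters out clutter that nothing reciprocates. A brute-force sketch over plain feature vectors (illustrative only, not the paper's network):

```python
def mutual_nearest_pairs(queries, supports):
    """Return (query_index, support_index) pairs that are mutual nearest
    neighbors under squared Euclidean distance."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Nearest support for each query, and nearest query for each support
    nn_q = [min(range(len(supports)), key=lambda j: d2(q, supports[j]))
            for q in queries]
    nn_s = [min(range(len(queries)), key=lambda i: d2(s, queries[i]))
            for s in supports]
    # Keep only reciprocal matches
    return [(i, j) for i, j in enumerate(nn_q) if nn_s[j] == i]
```

A query descriptor whose nearest support "belongs" to some other query is dropped, mirroring how DMN4 discards task-irrelevant descriptors.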

AAAI Conference 2022 Conference Paper

Exploring Motion and Appearance Information for Temporal Sentence Grounding

  • Daizong Liu
  • Xiaoye Qu
  • Pan Zhou
  • Yang Liu

This paper addresses temporal sentence grounding. Previous works typically solve this task by learning frame-level video features and aligning them with the textual information. A major limitation of these works is that they fail to distinguish ambiguous video frames with subtle appearance differences due to frame-level feature extraction. Recently, a few methods have adopted Faster R-CNN to extract detailed object features in each frame to differentiate fine-grained appearance similarities. However, the object-level features extracted by Faster R-CNN lack motion analysis, since the object detection model has no temporal modeling. To solve this issue, we propose a novel Motion-Appearance Reasoning Network (MARN), which incorporates both motion-aware and appearance-aware object features to better reason about object relations for modeling the activity among successive frames. Specifically, we first introduce two individual video encoders to embed the video into corresponding motion-oriented and appearance-aspect object representations. Then, we develop separate motion and appearance branches to learn motion-guided and appearance-guided object relations, respectively. Finally, both motion and appearance information from the two branches are associated to generate more representative features for final grounding. Extensive experiments on two challenging datasets (Charades-STA and TACoS) show that our proposed MARN significantly outperforms previous state-of-the-art methods by a large margin.

NeurIPS Conference 2022 Conference Paper

Fairness Transferability Subject to Bounded Distribution Shift

  • Yatong Chen
  • Reilly Raab
  • Jialu Wang
  • Yang Liu

Given an algorithmic predictor that is "fair" on some source distribution, will it still be fair on an unknown target distribution that differs from the source within some bound? In this paper, we study the transferability of statistical group fairness for machine learning predictors (i.e., classifiers or regressors) subject to bounded distribution shift. Such shifts may be introduced by initial training data uncertainties, user adaptation to a deployed predictor, dynamic environments, or the use of pre-trained models in new settings. Herein, we develop a bound that characterizes such transferability, flagging potentially inappropriate deployments of machine learning for socially consequential tasks. We first develop a framework for bounding violations of statistical fairness subject to distribution shift, formulating a generic upper bound for transferred fairness violations as our primary result. We then develop bounds for specific worked examples, focusing on two commonly used fairness definitions (i.e., demographic parity and equalized odds) and two classes of distribution shift (i.e., covariate shift and label shift). Finally, we compare our theoretical bounds to deterministic models of distribution shift and against real-world data, finding that we are able to estimate fairness violation bounds in practice, even when simplifying assumptions are only approximately satisfied.

TIST Journal 2022 Journal Article

FedCVT: Semi-supervised Vertical Federated Learning with Cross-view Training

  • Yan Kang
  • Yang Liu
  • Xinle Liang

Federated learning allows multiple parties to build machine learning models collaboratively without exposing data. In particular, vertical federated learning (VFL) enables participating parties to build a joint machine learning model based upon distributed features of aligned samples. However, VFL requires all parties to share a sufficient amount of aligned samples. In reality, the set of aligned samples may be small, leaving the majority of the non-aligned data unused. In this article, we propose Federated Cross-view Training (FedCVT), a semi-supervised learning approach that improves the performance of the VFL model with limited aligned samples. More specifically, FedCVT estimates representations for missing features, predicts pseudo-labels for unlabeled samples to expand the training set, and trains three classifiers jointly based upon different views of the expanded training set to improve the VFL model’s performance. FedCVT does not require parties to share their original data and model parameters, thus preserving data privacy. We conduct experiments on NUS-WIDE, Vehicle, and CIFAR10 datasets. The experimental results demonstrate that FedCVT significantly outperforms vanilla VFL that only utilizes aligned samples. Finally, we perform ablation studies to investigate the contribution of each component of FedCVT to the performance of FedCVT.

NeurIPS Conference 2022 Conference Paper

GALOIS: Boosting Deep Reinforcement Learning via Generalizable Logic Synthesis

  • Yushi Cao
  • Zhiming Li
  • Tianpei Yang
  • Hao Zhang
  • Yan Zheng
  • Yi Li
  • Jianye Hao
  • Yang Liu

Despite achieving superior performance in human-level control problems, deep reinforcement learning (DRL), unlike humans, lacks high-order intelligence (e.g., logic deduction and reuse) and thus learns and generalizes less effectively than humans in complex problems. Previous works attempt to directly synthesize a white-box logic program as the DRL policy, manifesting logic-driven behaviors. However, most synthesis methods are built on imperative or declarative programming, each of which has a distinct limitation. The former ignores cause-effect logic during synthesis, resulting in low generalizability across tasks. The latter is strictly proof-based and thus fails to synthesize programs with complex hierarchical logic. In this paper, we combine the two paradigms and propose a novel Generalizable Logic Synthesis (GALOIS) framework to synthesize hierarchical and strict cause-effect logic programs. GALOIS leverages the program sketch and defines a new sketch-based hybrid program language for guiding the synthesis. Based on that, GALOIS proposes a sketch-based program synthesis method to automatically generate white-box programs with generalizable and interpretable cause-effect logic. Extensive evaluations on various decision-making tasks with complex logic demonstrate the superiority of GALOIS over mainstream baselines regarding asymptotic performance, generalizability, and knowledge reusability across different environments.

TIST Journal 2022 Journal Article

GTG-Shapley: Efficient and Accurate Participant Contribution Evaluation in Federated Learning

  • Zelei Liu
  • Yuanyuan Chen
  • Han Yu
  • Yang Liu
  • Lizhen Cui

Federated Learning (FL) bridges the gap between collaborative machine learning and preserving data privacy. To sustain the long-term operation of an FL ecosystem, it is important to attract high-quality data owners with appropriate incentive schemes. As an important building block of such incentive schemes, it is essential to fairly evaluate participants’ contribution to the performance of the final FL model without exposing their private data. Shapley Value (SV)-based techniques have been widely adopted to provide a fair evaluation of FL participant contributions. However, existing approaches incur significant computation costs, making them difficult to apply in practice. In this article, we propose the Guided Truncation Gradient Shapley (GTG-Shapley) approach to address this challenge. It reconstructs FL models from gradient updates for SV calculation instead of repeatedly training with different combinations of FL participants. In addition, we design a guided Monte Carlo sampling approach combined with within-round and between-round truncation to further reduce the number of model reconstructions and evaluations required. We evaluate GTG-Shapley through extensive experiments under diverse realistic data distribution settings. The results demonstrate that GTG-Shapley can closely approximate actual Shapley values while significantly increasing computational efficiency compared with the state of the art, especially under non-i.i.d. settings.
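The sampling-with-truncation idea can be illustrated with a generic truncated Monte Carlo Shapley estimator. This is a sketch, not GTG-Shapley itself: the `utility` callback stands in for evaluating a model reconstructed from gradient updates, and the guided-sampling heuristics are omitted.

```python
import random

def truncated_mc_shapley(players, utility, rounds=200, trunc_tol=1e-4, seed=0):
    """Truncated Monte Carlo estimate of Shapley values.

    players: list of participant ids
    utility: callable scoring a frozenset of participants
    """
    rng = random.Random(seed)
    phi = {p: 0.0 for p in players}
    full = utility(frozenset(players))      # grand-coalition utility
    for t in range(1, rounds + 1):
        perm = list(players)
        rng.shuffle(perm)
        prev = utility(frozenset())
        for k, p in enumerate(perm):
            if full - prev < trunc_tol:
                # Within-round truncation: remaining marginals are negligible,
                # so skip further (expensive) utility evaluations.
                marginal = 0.0
            else:
                cur = utility(frozenset(perm[:k + 1]))
                marginal, prev = cur - prev, cur
            phi[p] += (marginal - phi[p]) / t   # running average over rounds
    return phi
```

For an additive utility the marginal contributions are exact in every permutation, so the estimate matches the true Shapley values.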

JMLR Journal 2022 Journal Article

Improving Bayesian Network Structure Learning in the Presence of Measurement Error

  • Yang Liu
  • Anthony C. Constantinou
  • Zhigao Guo

Structure learning algorithms that learn the graph of a Bayesian network from observational data often do so by assuming the data correctly reflect the true distribution of the variables. However, this assumption does not hold in the presence of measurement error, which can lead to spurious edges. This is one of the reasons why the synthetic performance of these algorithms often overestimates real-world performance. This paper describes a heuristic algorithm that can be added as an additional learning phase at the end of any structure learning algorithm, and serves as a correction learning phase that removes potential false positive edges. The results show that the proposed correction algorithm successfully improves the graphical score of five well-established structure learning algorithms spanning different classes of learning in the presence of measurement error.

ICML Conference 2022 Conference Paper

Investigating Generalization by Controlling Normalized Margin

  • Alexander R. Farhang
  • Jeremy Bernstein
  • Kushal Tirumala
  • Yang Liu
  • Yisong Yue

Weight norm $\|w\|$ and margin $\gamma$ participate in learning theory via the normalized margin $\gamma/\|w\|$. Since standard neural net optimizers do not control normalized margin, it is hard to test whether this quantity causally relates to generalization. This paper designs a series of experimental studies that explicitly control normalized margin and thereby tackle two central questions. First: does normalized margin always have a causal effect on generalization? The paper finds that the answer is no: networks can be produced where normalized margin has seemingly no relationship with generalization, counter to the theory of Bartlett et al. (2017). Second: does normalized margin ever have a causal effect on generalization? The paper finds that the answer is yes: in a standard training setup, test performance closely tracks normalized margin. The paper suggests a Gaussian process model as a promising explanation for this behavior.
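For a linear classifier the quantity in question is straightforward to compute; a small sketch (binary labels in {-1, +1} assumed):

```python
import numpy as np

def normalized_margin(w, X, y):
    """Normalized margin gamma / ||w|| of a linear classifier w on (X, y).

    gamma is the smallest signed margin y_i * <w, x_i> over the data;
    dividing by ||w|| makes the quantity invariant to rescaling w.
    """
    margins = y * (X @ w)
    return margins.min() / np.linalg.norm(w)
```

Scaling `w` by any positive constant leaves the value unchanged, which is precisely why standard optimizers, which let $\|w\|$ grow freely, do not control it.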

IJCAI Conference 2022 Conference Paper

Learning Prototype via Placeholder for Zero-shot Recognition

  • Zaiquan Yang
  • Yang Liu
  • Wenjia Xu
  • Chong Huang
  • Lei Zhou
  • Chao Tong

Zero-shot learning (ZSL) aims to recognize unseen classes by exploiting semantic descriptions shared between seen classes and unseen classes. Current methods show that it is effective to learn visual-semantic alignment by projecting semantic embeddings into the visual space as class prototypes. However, such a projection function is only concerned with seen classes. When applied to unseen classes, the prototypes often perform suboptimally due to domain shift. In this paper, we propose to learn prototypes via placeholders, termed LPL, to eliminate the domain shift between seen and unseen classes. Specifically, we combine seen classes to hallucinate new classes which act as placeholders for the unseen classes in the visual and semantic space. Placed between seen classes, the placeholders encourage the prototypes of seen classes to be highly dispersed, sparing more space for the insertion of well-separated unseen classes. Empirically, well-separated prototypes help counteract the visual-semantic misalignment caused by domain shift. Furthermore, we exploit a novel semantic-oriented fine-tuning method to guarantee the semantic reliability of the placeholders. Extensive experiments on five benchmark datasets demonstrate the significant performance gain of LPL over state-of-the-art methods.

JBHI Journal 2022 Journal Article

Margin Preserving Self-Paced Contrastive Learning Towards Domain Adaptation for Medical Image Segmentation

  • Zhizhe Liu
  • Zhenfeng Zhu
  • Shuai Zheng
  • Yang Liu
  • Jiayu Zhou
  • Yao Zhao

To bridge the gap between the source and target domains in unsupervised domain adaptation (UDA), the most common strategy focuses on matching the marginal distributions in the feature space through adversarial learning. However, such category-agnostic global alignment fails to exploit the class-level joint distributions, leaving the aligned distribution less discriminative. To address this issue, we propose in this paper a novel Margin Preserving Self-paced Contrastive Learning (MPSCL) model for cross-modal medical image segmentation. Unlike the conventional construction of contrastive pairs in contrastive learning, the domain-adaptive category prototypes are utilized to constitute the positive and negative sample pairs. With the guidance of progressively refined semantic prototypes, a novel margin preserving contrastive loss is proposed to boost the discriminability of the embedded representation space. To enhance the supervision for contrastive learning, more informative pseudo-labels are generated in the target domain in a self-paced way, thus benefiting the category-aware distribution alignment for UDA. Furthermore, the domain-invariant representations are learned through joint contrastive learning between the two domains. Extensive experiments on cross-modal cardiac segmentation tasks demonstrate that MPSCL significantly improves semantic segmentation performance, and outperforms a wide variety of state-of-the-art methods by a large margin.

NeurIPS Conference 2022 Conference Paper

Molecule Generation by Principal Subgraph Mining and Assembling

  • Xiangzhe Kong
  • Wenbing Huang
  • Zhixing Tan
  • Yang Liu

Molecule generation is central to a variety of applications. Recent work approaches the generation task as subgraph prediction and assembling. Nevertheless, these methods usually rely on hand-crafted or external subgraph construction, and the subgraph assembling depends solely on local arrangement. In this paper, we define a novel notion, the principal subgraph, that is closely related to the informative patterns within molecules. Interestingly, our proposed merge-and-update subgraph extraction method can automatically discover frequent principal subgraphs from the dataset, which previous methods are incapable of. Moreover, we develop a two-step subgraph assembling strategy, which first predicts a set of subgraphs in a sequence-wise manner and then assembles all generated subgraphs globally as the final output molecule. Built upon a graph variational auto-encoder, our model is demonstrated to be effective in terms of several evaluation metrics and efficiency, compared with state-of-the-art methods on distribution learning and (constrained) property optimization tasks.

ICLR Conference 2022 Conference Paper

On Robust Prefix-Tuning for Text Classification

  • Zonghan Yang
  • Yang Liu

Recently, prefix-tuning has gained increasing attention as a parameter-efficient finetuning method for large-scale pretrained language models. The method keeps the pretrained models fixed and only updates the prefix token parameters for each downstream task. Despite being lightweight and modular, prefix-tuning still lacks robustness to textual adversarial attacks. However, most currently developed defense techniques necessitate auxiliary model update and storage, which inevitably hamper the modularity and low storage of prefix-tuning. In this work, we propose a robust prefix-tuning framework that preserves the efficiency and modularity of prefix-tuning. The core idea of our framework is leveraging the layerwise activations of the language model by correctly-classified training data as the standard for additional prefix finetuning. During the test phase, an extra batch-level prefix is tuned for each batch and added to the original prefix for robustness enhancement. Extensive experiments on three text classification benchmarks show that our framework substantially improves robustness over several strong baselines against five textual attacks of different types while maintaining comparable accuracy on clean texts. We also interpret our robust prefix-tuning framework from the optimal control perspective and pose several directions for future research.

AAAI Conference 2022 Conference Paper

Perceptual Quality Assessment of Omnidirectional Images

  • Yuming Fang
  • Liping Huang
  • Jiebin Yan
  • Xuelin Liu
  • Yang Liu

Omnidirectional images, also called 360° images, have attracted extensive attention in recent years due to the rapid development of virtual reality (VR) technologies. During omnidirectional image processing, including capture, transmission, and consumption, measuring the perceptual quality of omnidirectional images is highly desired, since it plays a key role in guaranteeing the immersive quality of experience (IQoE). In this paper, we conduct a comprehensive study of the perceptual quality of omnidirectional images from both subjective and objective perspectives. Specifically, we construct the largest subjective omnidirectional image quality database to date, where we consider several key influential elements from the user view, i.e., realistic non-uniform distortion, viewing conditions, and viewing behavior. In addition to subjective quality scores, we also record head and eye movement data. Besides, we make the first attempt to use the proposed database to train a convolutional neural network (CNN) for blind omnidirectional image quality assessment. To be consistent with human viewing behavior in a VR device, we extract viewports from each omnidirectional image and naturally incorporate the user viewing conditions in the proposed model. The proposed model is composed of two parts: a multi-scale CNN-based feature extraction module and a perceptual quality prediction module. The feature extraction module incorporates the multi-scale features, and the perceptual quality prediction module regresses them to perceived quality scores. The experimental results on our database verify that the proposed model achieves competitive performance compared with state-of-the-art methods.

AAAI Conference 2022 Conference Paper

SCALoss: Side and Corner Aligned Loss for Bounding Box Regression

  • Tu Zheng
  • Shuai Zhao
  • Yang Liu
  • Zili Liu
  • Deng Cai

Bounding box regression is an important component in object detection. Recent work achieves promising performance by optimizing the Intersection over Union (IoU). However, IoU-based losses suffer from a vanishing gradient problem in the case of low-overlapping bounding boxes, and the model can easily ignore these cases. In this paper, we propose the Side Overlap (SO) loss, which maximizes the side overlap of two bounding boxes and thus puts more penalty on low-overlapping cases. Besides, to speed up convergence, the Corner Distance (CD) is added to the objective function. Combining the Side Overlap and Corner Distance, we obtain a new regression objective, the Side and Corner Aligned Loss (SCALoss). SCALoss is well-correlated with IoU loss, which also benefits the evaluation metric, but produces more penalty for low-overlapping cases. It can serve as a comprehensive similarity measure, leading to better localization performance and faster convergence. Experiments on the COCO, PASCAL VOC, and LVIS benchmarks show that SCALoss brings consistent improvement and outperforms $\ell_n$ loss and IoU-based losses with popular object detectors such as YOLOv3, SSD, and Faster R-CNN. Code is available at: https://github.com/Turoad/SCALoss.
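The vanishing-gradient issue the abstract describes is easy to see from plain IoU: any two disjoint boxes score exactly zero no matter how far apart they are, so an IoU-only loss gives no signal for pulling them together. A standard IoU routine (not the paper's SCALoss) makes this concrete:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # clamp at zero
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

A nearby disjoint box and a distant one both get IoU 0, which is the flat region SCALoss's side-overlap and corner-distance terms are designed to penalize.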

IJCAI Conference 2022 Conference Paper

Towards Controlling the Transmission of Diseases: Continuous Exposure Discovery over Massive-Scale Moving Objects

  • Ke Li
  • Lisi Chen
  • Shuo Shang
  • Haiyan Wang
  • Yang Liu
  • Panos Kalnis
  • Bin Yao

Infectious diseases have been recognized as major public health concerns for decades. Close contact discovery plays an indispensable role in preventing epidemic transmission. In this light, we study the continuous exposure search problem: given a collection of moving objects and a collection of moving queries, we continuously discover all objects that have been directly or indirectly exposed to at least one query over a period of time. Our problem targets a variety of applications, including but not limited to disease control, epidemic pre-warning, information spreading, and co-movement mining. To solve this problem, we develop an exact group processing algorithm with optimization strategies. Further, we propose an approximate algorithm that substantially improves efficiency without false dismissal. Extensive experiments offer insight into the effectiveness and efficiency of our proposed algorithms.

AAAI Conference 2022 Conference Paper

TransZero: Attribute-Guided Transformer for Zero-Shot Learning

  • Shiming Chen
  • Ziming Hong
  • Yang Liu
  • Guo-Sen Xie
  • Baigui Sun
  • Hao Li
  • Qinmu Peng
  • Ke Lu

Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic knowledge from seen classes to unseen ones. Semantic knowledge is learned from attribute descriptions shared between different classes, which act as strong priors for localizing object attributes that represent discriminative region features, enabling significant visual-semantic interaction. Although some attention-based models have attempted to learn such region features in a single image, the transferability and discriminative attribute localization of visual features are typically neglected. In this paper, we propose an attribute-guided Transformer network, termed TransZero, to refine visual features and learn attribute localization for discriminative visual embedding representations in ZSL. Specifically, TransZero takes a feature augmentation encoder to alleviate the cross-dataset bias between ImageNet and ZSL benchmarks, and improves the transferability of visual features by reducing the entangled relative geometry relationships among region features. To learn locality-augmented visual features, TransZero employs a visual-semantic decoder to localize the image regions most relevant to each attribute in a given image, under the guidance of semantic attribute information. Then, the locality-augmented visual features and semantic vectors are used to conduct effective visual-semantic interaction in a visual-semantic embedding network. Extensive experiments show that TransZero achieves the new state of the art on three ZSL benchmarks. The code is available at: https://github.com/shiming-chen/TransZero.

IJCAI Conference 2022 Conference Paper

Vision Shared and Representation Isolated Network for Person Search

  • Yang Liu
  • Yingping Li
  • Chengyu Kong
  • Yuqiu Kong
  • Shenglan Liu
  • Feilong Wang

Person search is a widely studied computer vision task that aims to jointly solve the problems of pedestrian detection and person re-identification in panoramic scenes. However, pedestrian detection focuses on the consistency of pedestrians, while person re-identification attempts to extract their discriminative features. This inevitable conflict greatly restricts research on one-stage person search methods. To address this issue, we propose a Vision Shared and Representation Isolated (VSRI) network to decouple the two conflicting subtasks simultaneously, through which two independent representations are constructed for the two subtasks. To enhance the discrimination of the re-ID representation, a Multi-Level Feature Fusion (MLFF) module is proposed. The MLFF adopts the Spatial Pyramid Feature Fusion (SPFF) module to obtain diverse features from the stem network. Moreover, the multi-head self-attention mechanism is employed to construct a Multi-head Attention Driven Extraction (MADE) module, and the cascaded convolution unit is adopted to devise a Feature Decomposition and Cascaded Integration (FDCI) module, which facilitates the MLFF in obtaining more discriminative representations of the pedestrians. The proposed method outperforms state-of-the-art methods on the mainstream datasets.

AAAI Conference 2022 Conference Paper

Weakly Supervised Video Moment Localization with Contrastive Negative Sample Mining

  • Minghang Zheng
  • Yanjie Huang
  • Qingchao Chen
  • Yang Liu

Video moment localization aims at localizing the video segments most related to a given free-form natural language query. The weakly supervised setting, where only a video-level description is available during training, is receiving increasing attention due to its lower annotation cost. Prior weakly supervised methods mainly use sliding windows to generate temporal proposals, which are independent of video content and of low quality, and train the model to distinguish matched video-query pairs from unmatched ones collected from different videos, neglecting that what the model really needs is to distinguish the unaligned segments within the video. In this work, we propose a novel weakly supervised solution by introducing Contrastive Negative sample Mining (CNM). Specifically, we use a learnable Gaussian mask to generate positive samples, highlighting the video frames most related to the query, and consider other frames of the video and the whole video as easy and hard negative samples, respectively. We then train our network with an Intra-Video Contrastive loss to make our positive and negative samples more discriminative. Our method has two advantages: (1) our proposal generation process with a learnable Gaussian mask is more efficient and yields higher-quality positive samples; (2) the more difficult intra-video negative samples enable our model to distinguish highly confusing scenes. Experiments on two datasets show the effectiveness of our method. Code can be found at https://github.com/minghangz/cnm.

JBHI Journal 2021 Journal Article

Automatic Detection of QRS Complexes Using Dual Channels Based on U-Net and Bidirectional Long Short-Term Memory

  • Runnan He
  • Yang Liu
  • Kuanquan Wang
  • Na Zhao
  • Yongfeng Yuan
  • Qince Li
  • Henggui Zhang

Objective: Detecting changes in the QRS complexes of ECG signals is regarded as a straightforward, noninvasive, inexpensive, preliminary diagnostic approach for evaluating the cardiac health of patients. Therefore, detecting QRS complexes in ECG signals must be accurate over short times. However, the reliability of automatic QRS detection is restricted by various kinds of noise and complex signal morphologies. The objective of this paper is to address the automatic detection of QRS complexes. Methods: We propose a new algorithm for automatic detection of QRS complexes using dual channels based on U-Net and bidirectional long short-term memory. First, a proposed preprocessor with mean filtering and discrete wavelet transform is applied to remove different types of noise. Next, the signal is transformed and the annotations are relabeled. Finally, a method combining U-Net and bidirectional long short-term memory with dual channels is used for the automatic detection of QRS complexes. Results: The proposed algorithm was trained and tested using 44 ECG records from the MIT-BIH arrhythmia database and the CPSC2019 dataset, achieving 99.06% and 95.13% sensitivity, 99.22% and 82.03% positive predictivity, and 98.29% and 78.73% accuracy on the two datasets, respectively. Conclusion: Experimental results show that the proposed method is useful for the automatic detection of QRS complexes. Significance: The proposed method not only has application potential for QRS complex detection in large ECG datasets, but can also be extended to other medical signal research fields.
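The mean-filtering step in such a preprocessor can be sketched as moving-average baseline removal, a common first step against baseline wander before QRS detection. This is a hypothetical sketch (the window length is an assumption), not the paper's exact pipeline, which additionally uses discrete wavelet transform denoising:

```python
import numpy as np

def remove_baseline(signal, win=200):
    """Subtract a moving-average (mean-filter) estimate of the baseline.

    signal: 1-D array of ECG samples
    win:    averaging window length in samples (illustrative choice)
    """
    kernel = np.ones(win) / win
    baseline = np.convolve(signal, kernel, mode='same')  # local mean
    return signal - baseline
```

On a flat signal the estimated baseline equals the signal in the interior, so the output there is zero; sharp QRS spikes, being much shorter than the window, survive largely intact.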

IJCAI Conference 2021 Conference Paper

AVA: Adversarial Vignetting Attack against Visual Recognition

  • Binyu Tian
  • Felix Juefei-Xu
  • Qing Guo
  • Xiaofei Xie
  • Xiaohong Li
  • Yang Liu

Vignetting is an inherent imaging phenomenon in almost all optical systems, appearing as a radial intensity darkening toward the corners of an image. Since it is a common effect in photography and usually appears as a slight intensity variation, people usually regard it as part of a photo and would not even want to post-process it. Due to this natural advantage, in this work, we study vignetting from a new viewpoint, i.e., the adversarial vignetting attack (AVA), which aims to embed intentionally misleading information into the vignetting and produce a natural adversarial example without noise patterns. This example can fool state-of-the-art deep convolutional neural networks (CNNs) but is imperceptible to humans. To this end, we first propose the radial-isotropic adversarial vignetting attack (RI-AVA) based on the physical model of vignetting, where the physical parameters (e.g., illumination factor and focal length) are tuned through the guidance of target CNN models. To achieve higher transferability across different CNNs, we further propose the radial-anisotropic adversarial vignetting attack (RA-AVA), which allows the effective regions of vignetting to be radial-anisotropic and shape-free. Moreover, we propose a geometry-aware level-set optimization method to solve for the adversarial vignetting regions and physical parameters jointly. We validate the proposed methods on three popular datasets, i.e., DEV, CIFAR10, and Tiny ImageNet, by attacking four CNNs, i.e., ResNet50, EfficientNet-B0, DenseNet121, and MobileNet-V2, demonstrating the advantages of our methods over baseline methods in both transferability and image quality.

NeurIPS Conference 2021 Conference Paper

Bandit Learning with Delayed Impact of Actions

  • Wei Tang
  • Chien-Ju Ho
  • Yang Liu

We consider a stochastic multi-armed bandit (MAB) problem with delayed impact of actions. In our setting, actions taken in the past impact the arm rewards in the subsequent future. This delayed impact of actions is prevalent in the real world. For example, the capability to pay back a loan for people in a certain social group might depend on how frequently that group has historically had loan applications approved. If banks keep rejecting loan applications from people in a disadvantaged group, it could create a feedback loop and further damage the chance of getting loans for people in that group. In this paper, we formulate this delayed and long-term impact of actions within the context of multi-armed bandits. We generalize the bandit setting to encode the dependency of this "bias" on the action history during learning. The goal is to maximize the collected utilities over time while taking into account the dynamics created by the delayed impacts of historical actions. We propose an algorithm that achieves a regret of $\tilde{O}(KT^{2/3})$ and show a matching regret lower bound of $\Omega(KT^{2/3})$, where $K$ is the number of arms and $T$ is the learning horizon. Our results complement the bandit literature by adding techniques to deal with actions that have long-term impacts, and have implications for designing fair algorithms.

NeurIPS Conference 2021 Conference Paper

Can Less be More? When Increasing-to-Balancing Label Noise Rates Considered Beneficial

  • Yang Liu
  • Jialu Wang

In this paper, we answer the question of when inserting label noise (less informative labels) can instead return more accurate and fair models. We are primarily inspired by three observations: 1) in contrast to reducing label noise rates, increasing the noise rates is easy to implement; 2) increasing a certain class of instances' label noise to balance the noise rates (increasing-to-balancing) results in an easier learning problem; 3) increasing-to-balancing improves fairness guarantees against label bias. In this paper, we first quantify the trade-offs introduced by increasing a certain group of instances' label noise rate w.r.t. the loss of label informativeness and the lowered learning difficulty. We analytically demonstrate when such an increase is beneficial, in terms of either improved generalization power or fairness guarantees. We then present a method to insert label noise properly for the task of learning with noisy labels, either with or without a fairness constraint. The primary technical challenge we face is that we would not know which data instances are suffering from higher noise, and we would not have the ground truth labels to verify any possible hypothesis. We propose a detection method that informs us which group of labels might suffer from higher noise, without using ground truth labels. We formally establish the effectiveness of the proposed solution and demonstrate it with extensive experiments.
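The increasing-to-balancing operation can be sketched for binary labels: flip extra labels in the lower-noise group until its total flip rate matches the higher-noise group's. Everything here is an illustrative assumption (the function, its inputs, and the premise that the two groups' noise rates are known, which is precisely what the paper's detection method avoids requiring):

```python
import random

def balance_noise_rates(labels, groups, noise_rates, seed=0):
    """Flip extra binary labels in the lower-noise group so both groups
    end up with (approximately) the higher group's noise rate.

    labels:      list of 0/1 labels
    groups:      parallel list of group ids
    noise_rates: dict group -> current (assumed known) flip probability
    """
    rng = random.Random(seed)
    lo = min(noise_rates, key=noise_rates.get)   # lower-noise group
    hi = max(noise_rates, key=noise_rates.get)
    # An independent extra flip with probability p turns a flip rate e_lo
    # into e_lo + p - 2*e_lo*p (flips can cancel); solving for e_hi gives:
    p = (noise_rates[hi] - noise_rates[lo]) / (1 - 2 * noise_rates[lo])
    noisy = list(labels)
    for i, g in enumerate(groups):
        if g == lo and rng.random() < p:
            noisy[i] = 1 - noisy[i]              # inject extra noise
    return noisy
```

The derivation assumes e_lo < 1/2; the higher-noise group is left untouched, matching observation 1) that adding noise is cheap while removing it is not.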

AAAI Conference 2021 Conference Paper

Decision-Guided Weighted Automata Extraction from Recurrent Neural Networks

  • Xiyue Zhang
  • Xiaoning Du
  • Xiaofei Xie
  • Lei Ma
  • Yang Liu
  • Meng Sun

Recurrent Neural Networks (RNNs) have demonstrated their effectiveness in learning and processing sequential data (e.g., speech and natural language). However, due to the black-box nature of neural networks, understanding the decision logic of RNNs is quite challenging. Some recent progress has been made to approximate the behavior of an RNN by weighted automata. They provide better interpretability, but still suffer from poor scalability. In this paper, we propose a novel approach to extracting weighted automata with the guidance of a target RNN’s decision and context information. In particular, we identify the patterns of RNN’s step-wise predictive decisions to instruct the formation of automata states. Further, we propose a state composition method to enhance the context-awareness of the extracted model. Our in-depth evaluations on typical RNN tasks, including language modeling and classification, demonstrate the effectiveness and advantage of our method over state-of-the-art approaches. The evaluation results show that our method can achieve accurate approximation of an RNN even on large-scale tasks.

NeurIPS Conference 2021 Conference Paper

Deconvolutional Networks on Graph Data

  • Jia Li
  • Jiajin Li
  • Yang Liu
  • Jianwei Yu
  • Yueting Li
  • Hong Cheng

In this paper, we consider an inverse problem in the graph learning domain -- "given the graph representations smoothed by Graph Convolutional Network (GCN), how can we reconstruct the input graph signal?" We propose Graph Deconvolutional Network (GDN) and motivate the design of GDN via a combination of inverse filters in the spectral domain and de-noising layers in the wavelet domain, as the inverse operation results in a high frequency amplifier and may amplify the noise. We demonstrate the effectiveness of the proposed method on several tasks including graph feature imputation and graph structure generation.

AAAI Conference 2021 Conference Paper

EfficientDeRain: Learning Pixel-wise Dilation Filtering for High-Efficiency Single-Image Deraining

  • Qing Guo
  • Jingyang Sun
  • Felix Juefei-Xu
  • Lei Ma
  • Xiaofei Xie
  • Wei Feng
  • Yang Liu
  • Jianjun Zhao

Single-image deraining is rather challenging due to the unknown rain model. Existing methods often make specific assumptions of the rain model, which can hardly cover many diverse circumstances in the real world, compelling them to employ complex optimization or progressive refinement. This, however, significantly affects these methods’ efficiency and effectiveness for many efficiency-critical applications. To fill this gap, in this paper, we regard single-image deraining as a general image-enhancing problem and originally propose a model-free deraining method, i.e., EfficientDeRain, which is able to process a rainy image within 10 ms (i.e., around 6 ms on average), over 80 times faster than the state-of-the-art method (i.e., RCDNet), while achieving similar de-rain effects. We first propose the novel pixel-wise dilation filtering. In particular, a rainy image is filtered with the pixel-wise kernels estimated from a kernel prediction network, by which suitable multi-scale kernels for each pixel can be efficiently predicted. Then, to eliminate the gap between synthetic and real data, we further propose an effective data augmentation method (i.e., RainMix) that helps to train the network for handling real rainy images. We perform a comprehensive evaluation on both synthetic and real-world rainy datasets to demonstrate the effectiveness and efficiency of our method. We release the model and code at https://github.com/tsingqguo/efficientderain.git.
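The core filtering step can be sketched independently of the kernel prediction network. A minimal numpy version (assumed shapes and names; in the paper the `kernels` tensor would be predicted per image): every pixel gets its own k×k kernel, applied over a dilated neighbourhood:

```python
import numpy as np

def pixelwise_dilation_filter(img, kernels, dilation=1):
    """Apply a separate kxk kernel at every pixel (a sketch of pixel-wise
    dilation filtering; `kernels` would come from a kernel-prediction net).

    img     : (H, W) grayscale image
    kernels : (H, W, k, k) per-pixel kernels
    """
    H, W, k, _ = kernels.shape
    r = (k // 2) * dilation              # padding radius for the dilated window
    padded = np.pad(img, r, mode="edge")
    out = np.zeros_like(img, dtype=float)
    offs = (np.arange(k) - k // 2) * dilation  # dilated tap offsets
    for i in range(k):
        for j in range(k):
            patch = padded[r + offs[i]: r + offs[i] + H,
                           r + offs[j]: r + offs[j] + W]
            out += kernels[:, :, i, j] * patch
    return out
```

A sanity check: delta kernels (1 at the centre tap, 0 elsewhere) must reproduce the input image at any dilation.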

JMLR Journal 2021 Journal Article

FATE: An Industrial Grade Platform for Collaborative Learning With Data Protection

  • Yang Liu
  • Tao Fan
  • Tianjian Chen
  • Qian Xu
  • Qiang Yang

Collaborative and federated learning has become an emerging solution for many industrial applications where data values from different sites are exploited jointly with privacy protection. We introduce FATE, an industrial-grade project that supports enterprises and institutions in building machine learning models collaboratively at large scale in a distributed manner. FATE supports a variety of secure computation protocols and machine learning algorithms, and features out-of-the-box usability with end-to-end building modules and visualization tools. Documentation is available at https://github.com/FederatedAI/FATE. Case studies and other information are available at https://www.fedai.org.

IS Journal 2021 Journal Article

Federated Digital Gateway: Methodologies, Tools, and Applications

  • Yang Liu
  • Ruolan Wang
  • Shishuai Du
  • Junbo Zhang
  • Yu Zheng

Federated machine learning (FML) is a new machine learning paradigm that is focused on training distributed models, where data are scattered in different places known as data silos, only necessary modeling information (not raw data) is exchanged, and data privacy and security are protected during the modeling. This research area has grown quickly over the past years, but the vision of making it a practical solution is still not fulfilled. Motivated by this, here we introduce an intelligent architecture, termed Federated Digital Gateway. It is designed to help algorithm engineers easily deploy FML methods for real-life tasks. It provides different modules such as secure communication tools, a database interface, an authentication center, an account system, and a user interface. This architecture has been shown to function smoothly in two real-world applications. Overall, the Federated Digital Gateway is practical and deployable for applying federated learning to solve real-life tasks.

IJCAI Conference 2021 Conference Paper

Fine-tuning Is Not Enough: A Simple yet Effective Watermark Removal Attack for DNN Models

  • Shangwei Guo
  • Tianwei Zhang
  • Han Qiu
  • Yi Zeng
  • Tao Xiang
  • Yang Liu

Watermarking has become a popular technique for protecting the intellectual property of DNN models. Recent works, from the adversary's perspective, have attempted to subvert watermarking mechanisms by designing watermark removal attacks. However, these attacks mainly adopt sophisticated fine-tuning techniques, which have certain fatal drawbacks or unrealistic assumptions. In this paper, we propose a novel watermark removal attack from a different perspective. Instead of just fine-tuning the watermarked models, we design a simple yet powerful transformation algorithm by combining imperceptible pattern embedding and spatial-level transformations, which can effectively and blindly destroy the memorization of watermarked models to the watermark samples. We also introduce a lightweight fine-tuning strategy to preserve the model performance. Our solution requires far fewer resources and less knowledge about the watermarking scheme than prior works. Extensive experimental results indicate that our attack can bypass state-of-the-art watermarking solutions with very high success rates. Based on our attack, we propose watermark augmentation techniques to enhance the robustness of existing watermarks.

NeurIPS Conference 2021 Conference Paper

How Powerful are Performance Predictors in Neural Architecture Search?

  • Colin White
  • Arber Zela
  • Robin Ru
  • Yang Liu
  • Frank Hutter

Early methods in the rapidly developing field of neural architecture search (NAS) required fully training thousands of neural networks. To reduce this extreme computational cost, dozens of techniques have since been proposed to predict the final performance of neural architectures. Despite the success of such performance prediction methods, it is not well-understood how different families of techniques compare to one another, due to the lack of an agreed-upon evaluation metric and optimization for different constraints on the initialization time and query time. In this work, we give the first large-scale study of performance predictors by analyzing 31 techniques ranging from learning curve extrapolation, to weight-sharing, to supervised learning, to zero-cost proxies. We test a number of correlation- and rank-based performance measures in a variety of settings, as well as the ability of each technique to speed up predictor-based NAS frameworks. Our results act as recommendations for the best predictors to use in different settings, and we show that certain families of predictors can be combined to achieve even better predictive power, opening up promising research directions. We release our code, featuring a library of 31 performance predictors.

NeurIPS Conference 2021 Conference Paper

HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning

  • Shiming Chen
  • Guosen Xie
  • Yang Liu
  • Qinmu Peng
  • Baigui Sun
  • Hao Li
  • Xinge You
  • Ling Shao

Zero-shot learning (ZSL) tackles the unseen class recognition problem, transferring semantic knowledge from seen classes to unseen ones. Typically, to guarantee desirable knowledge transfer, a common (latent) space is adopted for associating the visual and semantic domains in ZSL. However, existing common space learning methods align the semantic and visual domains by merely mitigating distribution disagreement through one-step adaptation. This strategy is usually ineffective due to the heterogeneous nature of the feature representations in the two domains, which intrinsically contain both distribution and structure variations. To address this and advance ZSL, we propose a novel hierarchical semantic-visual adaptation (HSVA) framework. Specifically, HSVA aligns the semantic and visual domains by adopting a hierarchical two-step adaptation, i.e., structure adaptation and distribution adaptation. In the structure adaptation step, we take two task-specific encoders to encode the source data (visual domain) and the target data (semantic domain) into a structure-aligned common space. To this end, a supervised adversarial discrepancy (SAD) module is proposed to adversarially minimize the discrepancy between the predictions of two task-specific classifiers, thus making the visual and semantic feature manifolds more closely aligned. In the distribution adaptation step, we directly minimize the Wasserstein distance between the latent multivariate Gaussian distributions to align the visual and semantic distributions using a common encoder. Finally, the structure and distribution adaptation are derived in a unified framework under two partially-aligned variational autoencoders. Extensive experiments on four benchmark datasets demonstrate that HSVA achieves superior performance on both conventional and generalized ZSL. The code is available at \url{https://github.com/shiming-chen/HSVA}.

ICML Conference 2021 Conference Paper

Learning by Turning: Neural Architecture Aware Optimisation

  • Yang Liu
  • Jeremy Bernstein
  • Markus Meister
  • Yisong Yue

Descent methods for deep networks are notoriously capricious: they require careful tuning of step size, momentum and weight decay, and which method will work best on a new benchmark is a priori unclear. To address this problem, this paper conducts a combined study of neural architecture and optimisation, leading to a new optimiser called Nero: the neuronal rotator. Nero trains reliably without momentum or weight decay, works in situations where Adam and SGD fail, and requires little to no learning rate tuning. Also, Nero’s memory footprint is square root that of Adam or LAMB. Nero combines two ideas: (1) projected gradient descent over the space of balanced networks; (2) neuron-specific updates, where the step size sets the angle through which each neuron’s hyperplane turns. The paper concludes by discussing how this geometric connection between architecture and optimisation may impact theories of generalisation in deep learning.
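The abstract's two ingredients, projection onto balanced networks and neuron-specific steps, can be sketched in a few lines. This is a simplified illustration (names and the exact constraint set are assumptions, not the paper's implementation): after each gradient step, every neuron's weight vector is projected back to zero mean and unit norm, so the learning rate effectively controls the angle through which the neuron's hyperplane turns:

```python
import numpy as np

def nero_like_step(W, grad, lr=0.01):
    """One sketch of a Nero-style update: a plain gradient step followed by
    projection of each neuron (row of W) back onto the 'balanced' set of
    zero-mean, unit-norm weight vectors."""
    W = W - lr * grad                                    # gradient step
    W = W - W.mean(axis=1, keepdims=True)                # re-center each neuron
    W = W / np.linalg.norm(W, axis=1, keepdims=True)     # re-normalise each neuron
    return W
```

Because only the direction of each neuron survives the projection, no weight-decay or momentum state needs to be stored, which is consistent with the small memory footprint the abstract describes.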

AAAI Conference 2021 Conference Paper

Mind-the-Gap! Unsupervised Domain Adaptation for Text-Video Retrieval

  • Qingchao Chen
  • Yang Liu
  • Samuel Albanie

When can we expect a text-video retrieval system to work effectively on datasets that differ from its training domain? In this work, we investigate this question through the lens of unsupervised domain adaptation in which the objective is to match natural language queries and video content in the presence of domain shift at query-time. Such systems have significant practical applications since they are capable of generalising to new data sources without requiring corresponding text annotations. We make the following contributions: (1) We propose the UDAVR (Unsupervised Domain Adaptation for Video Retrieval) benchmark and employ it to study the performance of text-video retrieval in the presence of domain shift. (2) We propose Concept-Aware-Pseudo-Query (CAPQ), a method for learning discriminative and transferable features that bridge these cross-domain discrepancies to enable effective target domain retrieval using source domain supervision. (3) We show that CAPQ outperforms alternative domain adaptation strategies on UDAVR.

IROS Conference 2021 Conference Paper

Overlap Displacement Error: Are Your SLAM Poses Map-Consistent?

  • Christian Mostegel
  • Jianbo Ye
  • Yu Luo
  • Yang Liu

Localization is an essential module that supports many intelligent functions of a mobile robot such as transportation or inspection. However, justifying that a localization module is sufficiently accurate for supporting all downstream tasks is one of the most difficult questions to answer in practice. To overcome this problem, we move away from the traditional calculation of pose errors and propose a new approach that instead evaluates the potential map inconsistency introduced by those pose errors. For this purpose, we propose a new metric, which we call Overlap Displacement Error (ODE). This metric measures the relative displacements between multiple overlapping sensor frustums with respect to the ground truth. All that is needed to compute this metric is a query trajectory, a ground truth trajectory and the sensor frustum used for mapping. Having the sensor frustum and the map representation as part of the metric, the ODE is customized to the hardware configuration and the mapping strategy. This design allows the analysis of pose accuracy in a space that matters to map creation, and also allows the identification of problems sitting in the interplay between localization and mapping. We demonstrate the potential of this new analysis tool on synthetic and real-world sequences.

IJCAI Conference 2021 Conference Paper

Physics-informed Spline Learning for Nonlinear Dynamics Discovery

  • Fangzheng Sun
  • Yang Liu
  • Hao Sun

Dynamical systems are typically governed by a set of linear/nonlinear differential equations. Distilling the analytical form of these equations from very limited data remains intractable in many disciplines such as physics, biology, climate science, engineering and social science. To address this fundamental challenge, we propose a novel Physics-informed Spline Learning (PiSL) framework to discover parsimonious governing equations for nonlinear dynamics, based on sparsely sampled noisy data. The key concept is to (1) leverage splines to interpolate locally the dynamics, perform analytical differentiation and build the library of candidate terms, (2) employ sparse representation of the governing equations, and (3) use the physics residual in turn to inform the spline learning. The synergy between splines and discovered underlying physics leads to the robust capacity of dealing with high-level data scarcity and noise. A hybrid sparsity-promoting alternating direction optimization strategy is developed for systematically pruning the sparse coefficients that form the structure and explicit expression of the governing equations. The efficacy and superiority of the proposed method have been demonstrated by multiple well-known nonlinear dynamical systems, in comparison with two state-of-the-art methods.

NeurIPS Conference 2021 Conference Paper

Policy Learning Using Weak Supervision

  • Jingkang Wang
  • Hongyi Guo
  • Zhaowei Zhu
  • Yang Liu

Most existing policy learning solutions require the learning agents to receive high-quality supervision signals, e.g., rewards in reinforcement learning (RL) or high-quality expert demonstrations in behavioral cloning (BC). These quality supervisions are either infeasible or prohibitively expensive to obtain in practice. We aim for a unified framework that leverages the available cheap weak supervisions to perform policy learning efficiently. To handle this problem, we treat the ``weak supervision'' as imperfect information coming from a peer agent, and evaluate the learning agent's policy based on a ``correlated agreement'' with the peer agent's policy (instead of simple agreements). Our approach explicitly punishes a policy for overfitting to the weak supervision. In addition to theoretical guarantees, extensive evaluations on tasks including RL with noisy reward, BC with weak demonstrations, and standard policy co-training (RL + BC) show that our method leads to substantial performance improvements, especially when the complexity or the noise of the learning environments is high.
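The correlated-agreement idea is closely related to peer-loss-style objectives: score the agent's predictions against the weak labels, then subtract the same loss evaluated on randomly re-paired labels, so that blindly agreeing with noisy supervision is penalised. A minimal supervised-learning sketch (function names and the binary setup are assumptions for illustration):

```python
import numpy as np

def peer_loss(scores, weak_labels, rng, alpha=1.0):
    """Peer-loss sketch: usual loss on the weak labels minus the loss
    against randomly re-paired labels; the subtracted term punishes
    a predictor for overfitting to (possibly noisy) supervision.

    scores      : (n,) predicted probabilities for class 1
    weak_labels : (n,) 0/1 weak supervision signals
    """
    eps = 1e-12

    def ce(p, y):  # binary cross-entropy
        return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

    peer_y = weak_labels[rng.permutation(len(weak_labels))]  # shuffled peer labels
    return np.mean(ce(scores, weak_labels) - alpha * ce(scores, peer_y))
```

A predictor that matches the labels scores well on the first term but, since the shuffled labels disagree half the time on balanced data, pays a large subtracted term, making its peer loss negative; a constant predictor gains nothing from either term.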

IS Journal 2021 Journal Article

SecureBoost: A Lossless Federated Learning Framework

  • Kewei Cheng
  • Tao Fan
  • Yilun Jin
  • Yang Liu
  • Tianjian Chen
  • Dimitrios Papadopoulos
  • Qiang Yang

The protection of user privacy is an important concern in machine learning, as evidenced by the rolling out of the General Data Protection Regulation (GDPR) in the European Union (EU) in May 2018. The GDPR is designed to give users more control over their personal data, which motivates us to explore machine learning frameworks for data sharing that do not violate user privacy. To meet this goal, in this article, we propose a novel lossless privacy-preserving tree-boosting system known as SecureBoost in the setting of federated learning. SecureBoost first conducts entity alignment under a privacy-preserving protocol and then constructs boosting trees across multiple parties with a carefully designed encryption strategy. This federated learning system allows the learning process to be jointly conducted over multiple parties with common user samples but different feature sets, which corresponds to a vertically partitioned dataset. An advantage of SecureBoost is that it provides the same level of accuracy as the non-privacy-preserving approach while revealing no information about any private data provider. We show that the SecureBoost framework is as accurate as other non-federated gradient tree-boosting algorithms that require centralized data, and thus, it is highly scalable and practical for industrial applications such as credit risk analysis. To this end, we discuss information leakage during the protocol execution and propose ways to provably reduce it.

IJCAI Conference 2021 Conference Paper

Spline Positional Encoding for Learning 3D Implicit Signed Distance Fields

  • Peng-Shuai Wang
  • Yang Liu
  • Yu-Qi Yang
  • Xin Tong

Multilayer perceptrons (MLPs) have been successfully used to represent 3D shapes implicitly and compactly, by mapping 3D coordinates to the corresponding signed distance values or occupancy values. In this paper, we propose a novel positional encoding scheme, called Spline Positional Encoding, to map the input coordinates to a high dimensional space before passing them to MLPs, which helps recover 3D signed distance fields with fine-scale geometric details from unorganized 3D point clouds. We verified the superiority of our approach over other positional encoding schemes on tasks of 3D shape reconstruction and 3D shape space learning from input point clouds. The efficacy of our approach extended to image reconstruction is also demonstrated and evaluated.

TIST Journal 2021 Journal Article

StarFL: Hybrid Federated Learning Architecture for Smart Urban Computing

  • Anbu Huang
  • Yang Liu
  • Tianjian Chen
  • Yongkai Zhou
  • Quan Sun
  • Hongfeng Chai
  • Qiang Yang

From facial recognition to autonomous driving, Artificial Intelligence (AI) will transform the way we live and work over the next couple of decades. Existing AI approaches for urban computing suffer from various challenges, including dealing with the synchronization and processing of the vast amount of data generated from edge devices, as well as the privacy and security of individual users, including their bio-metrics, locations, and itineraries. Traditional centralized approaches require data in each organization to be uploaded to a central database, which may be prohibited by data protection acts, such as GDPR and CCPA. To decouple model training from the need to store the data in the cloud, a new training paradigm called Federated Learning (FL) is proposed. FL enables multiple devices to collaboratively learn a shared model while keeping the training data on devices locally, which can significantly mitigate privacy leakage risk. However, under urban computing scenarios, data are often communication-heavy, high-frequency, and asynchronous, posing new challenges to FL implementation. To handle these challenges, we propose a new hybrid federated learning architecture called StarFL. By combining Trusted Execution Environment (TEE), Secure Multi-Party Computation (MPC), and (Beidou) satellites, StarFL enables safe key distribution, encryption, and decryption, and provides a verification mechanism for each participant to ensure the security of the local data. In addition, StarFL can provide accurate timestamp matching to facilitate synchronization of multiple clients. All these improvements make StarFL more applicable to the security-sensitive scenarios for the next generation of urban computing.

NeurIPS Conference 2021 Conference Paper

Stochastic Anderson Mixing for Nonconvex Stochastic Optimization

  • Fuchao Wei
  • Chenglong Bao
  • Yang Liu

Anderson mixing (AM) is an acceleration method for fixed-point iterations. Despite its success and wide usage in scientific computing, the convergence theory of AM remains unclear, and its applications to machine learning problems are not well explored. In this paper, by introducing damped projection and adaptive regularization to the classical AM, we propose a Stochastic Anderson Mixing (SAM) scheme to solve nonconvex stochastic optimization problems. Under mild assumptions, we establish the convergence theory of SAM, including the almost sure convergence to stationary points and the worst-case iteration complexity. Moreover, the complexity bound can be improved when randomly choosing an iterate as the output. To further accelerate the convergence, we incorporate a variance reduction technique into the proposed SAM. We also propose a preconditioned mixing strategy for SAM which can empirically achieve faster convergence or better generalization ability. Finally, we apply the SAM method to train various neural networks including the vanilla CNN, ResNets, WideResNet, ResNeXt, DenseNet and LSTM. Experimental results on image classification and language modeling demonstrate the advantages of our method.
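For readers unfamiliar with the classical method being stochasticized here, a minimal deterministic Anderson mixing loop for x = g(x) looks as follows (a textbook sketch, not the paper's SAM variant): keep a short history of iterates and residuals, solve a small least-squares problem over residual differences, and extrapolate:

```python
import numpy as np

def anderson_fixed_point(g, x0, m=5, iters=50, tol=1e-10):
    """Classical Anderson mixing for the fixed-point problem x = g(x)."""
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    X, F = [], []                        # histories of iterates and residuals
    for _ in range(iters):
        gx = np.atleast_1d(g(x))
        f = gx - x                       # residual f(x) = g(x) - x
        if np.linalg.norm(f) < tol:
            break
        X.append(x)
        F.append(f)
        mk = min(m, len(F))
        if mk == 1:
            x = gx                       # plain fixed-point step
        else:
            # least-squares fit of the residual over recent residual differences
            dF = np.stack([F[-i] - F[-i - 1] for i in range(1, mk)], axis=1)
            dX = np.stack([X[-i] - X[-i - 1] for i in range(1, mk)], axis=1)
            gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
            x = gx - (dX + dF) @ gamma   # mixed (extrapolated) update
    return x
```

On the scalar test problem x = cos(x), this converges to the fixed point near 0.7391 much faster than plain iteration.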

NeurIPS Conference 2021 Conference Paper

Synthetic Benchmarks for Scientific Research in Explainable Machine Learning

  • Yang Liu
  • Sujay Khandagale
  • Colin White
  • Willie Neiswanger

As machine learning models grow more complex and their applications become more high-stakes, tools for explaining model predictions have become increasingly important. This has spurred a flurry of research in model explainability and has given rise to feature attribution methods such as LIME and SHAP. Despite their widespread use, evaluating and comparing different feature attribution methods remains challenging: evaluations ideally require human studies, and empirical evaluation metrics are often data-intensive or computationally prohibitive on real-world datasets. In this work, we address this issue by releasing XAI-BENCH: a suite of synthetic datasets along with a library for benchmarking feature attribution algorithms. Unlike real-world datasets, synthetic datasets allow the efficient computation of conditional expected values that are needed to evaluate ground-truth Shapley values and other metrics. The synthetic datasets we release offer a wide variety of parameters that can be configured to simulate real-world data. We demonstrate the power of our library by benchmarking popular explainability techniques across several evaluation metrics and across a variety of settings. The versatility and efficiency of our library will help researchers bring their explainability methods from development to deployment. Our code is available at https://github.com/abacusai/xai-bench.

AAMAS Conference 2021 Conference Paper

Temporal Watermarks for Deep Reinforcement Learning Models

  • Kangjie Chen
  • Shangwei Guo
  • Tianwei Zhang
  • Shuxin Li
  • Yang Liu

Watermarking has become a popular and attractive technique to protect the Intellectual Property (IP) of Deep Learning (DL) models. However, very few studies explore the possibility of watermarking Deep Reinforcement Learning (DRL) models. Common approaches in the DL context embed backdoors into the protected model and use special samples to verify the model ownership. These solutions are easy to detect, and can potentially affect the performance and behaviors of the target model. Such limitations make existing solutions less applicable to safety- and security-critical tasks and scenarios, where DRL has been widely used. In this work, we propose a novel watermarking scheme for DRL protection. Instead of using spatial watermarks as in DL models, we introduce temporal watermarks, which can reduce the potential impact and damage to the target model, while achieving ownership verification with high fidelity. Specifically, (1) we design a new damage metric to select sequential states for watermark generation; (2) we introduce a new reward function to efficiently alter the model’s behaviors for watermark embedding; (3) we propose to utilize a predefined probability density function of actions over the watermark states as the verification evidence. Our method is general and can be applied to various DRL tasks with either deterministic or stochastic reinforcement learning algorithms. Extensive experimental results show that it can effectively preserve the functionality of DRL models and exhibit significant robustness against common model modifications, e.g., fine-tuning and model compression.

IJCAI Conference 2021 Conference Paper

Understanding Structural Vulnerability in Graph Convolutional Networks

  • Liang Chen
  • Jintang Li
  • Qibiao Peng
  • Yang Liu
  • Zibin Zheng
  • Carl Yang

Recent studies have shown that Graph Convolutional Networks (GCNs) are vulnerable to adversarial attacks on the graph structure. Although multiple works have been proposed to improve their robustness against such structural adversarial attacks, the reasons for the success of the attacks remain unclear. In this work, we theoretically and empirically demonstrate that structural adversarial examples can be attributed to the non-robust aggregation scheme (i.e., the weighted mean) of GCNs. Specifically, our analysis takes advantage of the breakdown point which can quantitatively measure the robustness of aggregation schemes. The key insight is that weighted mean, as the basic design of GCNs, has a low breakdown point and its output can be dramatically changed by injecting a single edge. We show that adopting the aggregation scheme with a high breakdown point (e.g., median or trimmed mean) could significantly enhance the robustness of GCNs against structural attacks. Extensive experiments on four real-world datasets demonstrate that such a simple but effective method achieves the best robustness performance compared to state-of-the-art models.
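The breakdown-point argument is easy to demonstrate in isolation. In the toy aggregation below (illustrative only, not the paper's GCN code), a single adversarially injected neighbour with extreme features shifts the mean arbitrarily far, while the median barely moves:

```python
import numpy as np

def aggregate(features, how="mean"):
    """Aggregate a neighbourhood's feature rows. The mean has breakdown
    point 0 (one corrupted neighbour can move it arbitrarily); the median
    has breakdown point 1/2."""
    if how == "median":
        return np.median(features, axis=0)
    return np.mean(features, axis=0)

# nine benign neighbours with feature value 1, plus one injected neighbour
benign = np.ones((9, 4))
attacked = np.vstack([benign, 1000.0 * np.ones((1, 4))])
```

Here `aggregate(attacked, "mean")` jumps to about 100.9 per coordinate, while `aggregate(attacked, "median")` stays at 1.0, which is exactly the robustness gap the abstract exploits.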

NeurIPS Conference 2021 Conference Paper

Unintended Selection: Persistent Qualification Rate Disparities and Interventions

  • Reilly Raab
  • Yang Liu

Realistically---and equitably---modeling the dynamics of group-level disparities in machine learning remains an open problem. In particular, we desire models that do not suppose inherent differences between artificial groups of people---but rather endogenize disparities by appeal to unequal initial conditions of insular subpopulations. In this paper, agents each have a real-valued feature $X$ (e.g., credit score) informed by a ``true'' binary label $Y$ representing qualification (e.g., for a loan). Each agent alternately (1) receives a binary classification label $\hat{Y}$ (e.g., loan approval) from a Bayes-optimal machine learning classifier observing $X$ and (2) may update their qualification $Y$ by imitating successful strategies (e.g., seek a raise) within an isolated group $G$ of agents to which they belong. We consider the disparity of qualification rates $\Pr(Y=1)$ between different groups and how this disparity changes subject to a sequence of Bayes-optimal classifiers repeatedly retrained on the global population. We model the evolving qualification rates of each subpopulation (group) using the replicator equation, which derives from a class of imitation processes. We show that differences in qualification rates between subpopulations can persist indefinitely for a set of non-trivial equilibrium states due to uninformed classifier deployments, even when groups are identical in all aspects except initial qualification densities. We next simulate the effects of commonly proposed fairness interventions on this dynamical system along with a new feedback control mechanism capable of permanently eliminating group-level qualification rate disparities. We conclude by discussing the limitations of our model and findings and by outlining potential future work.
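The replicator dynamic at the heart of this model is compact enough to sketch. In the one-group toy below (a hedged illustration; the function names and the scalar `fitness_gap` abstraction are ours, not the paper's), the group's qualification rate q evolves as dq/dt = q(1 - q) * (fitness of qualified minus fitness of unqualified), where in the paper that gap would be induced by the deployed classifier's acceptance behavior:

```python
import numpy as np

def replicator_step(q, fitness_gap, dt=0.01):
    """One Euler step of the replicator equation for a group's
    qualification rate q: dq/dt = q * (1 - q) * fitness_gap."""
    return q + dt * q * (1.0 - q) * fitness_gap

def simulate(q0, gap_fn, steps=2000, dt=0.01):
    """Integrate the replicator dynamic; gap_fn(q) models how the
    deployed classifier rewards qualification at rate q."""
    q = q0
    for _ in range(steps):
        q = replicator_step(q, gap_fn(q), dt)
    return q
```

Note the fixed points at q = 0 and q = 1: a group starting at zero qualification never moves regardless of the incentive, which is the kind of persistent, initial-condition-driven equilibrium the abstract describes.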

AAAI Conference 2021 Conference Paper

Unsupervised 3D Learning for Shape Analysis via Multiresolution Instance Discrimination

  • Peng-Shuai Wang
  • Yu-Qi Yang
  • Qian-Fang Zou
  • Zhirong Wu
  • Yang Liu
  • Xin Tong

We propose an unsupervised method for learning a generic and efficient shape encoding network for different shape analysis tasks. Our key idea is to jointly encode and learn shape and point features from unlabeled 3D point clouds. For this purpose, we adapt HRNet to octree-based convolutional neural networks for jointly encoding shape and point features with fused multiresolution subnetworks and design a simple-yet-efficient Multiresolution Instance Discrimination (MID) loss for jointly learning the shape and point features. Our network takes a 3D point cloud as input and outputs both shape and point features. After training, our network is concatenated with simple task-specific back-ends and fine-tuned for different shape analysis tasks. We evaluate the efficacy and generality of our method with a set of shape analysis tasks, including shape classification, semantic shape segmentation, as well as shape registration tasks. With simple back-ends, our network demonstrates the best performance among all unsupervised methods and achieves competitive performance to supervised methods. For fine-grained shape segmentation on the PartNet dataset, our method even surpasses existing supervised methods by a large margin.

IJCAI Conference 2020 Conference Paper

A Multi-player Game for Studying Federated Learning Incentive Schemes

  • Kang Loon Ng
  • Zichen Chen
  • Zelei Liu
  • Han Yu
  • Yang Liu
  • Qiang Yang

Federated Learning (FL) enables participants to ``share'' their sensitive local data in a privacy preserving manner and collaboratively build machine learning models. In order to sustain long-term participation by high quality data owners (especially if they are businesses), FL systems need to provide suitable incentives. To design an effective incentive scheme, it is important to understand how FL participants respond under such schemes. This paper proposes FedGame, a multi-player game to study how FL participants make action selection decisions under different incentive schemes. It allows human players to role-play under various conditions. The decision-making processes can be analyzed and visualized to inform FL incentive mechanism design in the future.

IS Journal 2020 Journal Article

A Secure Federated Transfer Learning Framework

  • Yang Liu
  • Yan Kang
  • Chaoping Xing
  • Tianjian Chen
  • Qiang Yang

Machine learning relies on the availability of vast amounts of data for training. However, in reality, data are mostly scattered across different organizations and cannot be easily integrated due to many legal and practical constraints. To address this important challenge in the field of machine learning, we introduce a new technique and framework, known as federated transfer learning (FTL), to improve statistical modeling under a data federation. FTL allows knowledge to be shared without compromising user privacy and enables complementary knowledge to be transferred across domains in a data federation, thereby enabling a target-domain party to build flexible and effective models by leveraging rich labels from a source domain. This framework requires minimal modifications to the existing model structure and provides the same level of accuracy as non-privacy-preserving transfer learning. It is flexible and can be effectively adapted to various secure multiparty machine learning tasks.

IS Journal 2020 Journal Article

A Sustainable Incentive Scheme for Federated Learning

  • Han Yu
  • Zelei Liu
  • Yang Liu
  • Tianjian Chen
  • Mingshu Cong
  • Xi Weng
  • Dusit Niyato
  • Qiang Yang

In federated learning (FL), a federation distributedly trains a collective machine learning model by leveraging privacy preserving technologies. However, FL participants need to incur some cost for contributing to the FL models. The training and commercialization of the models will take time. Thus, there will be delays before the federation could pay back the participants. This temporary mismatch between contributions and rewards has not been accounted for by existing payoff-sharing schemes. To address this limitation, we propose the FL incentivizer (FLI). It dynamically divides a given budget in a context-aware manner among data owners in a federation by jointly maximizing the collective utility while minimizing the inequality among the data owners, in terms of the payoff received and the waiting time for receiving payoffs. Comparisons with five state-of-the-art payoff-sharing schemes show that FLI attracts high-quality data owners and achieves the highest expected revenue for a federation.

AAAI Conference 2020 Conference Paper

Adaptive Quantitative Trading: An Imitative Deep Reinforcement Learning Approach

  • Yang Liu
  • Qi Liu
  • Hongke Zhao
  • Zhen Pan
  • Chuanren Liu

In recent years, considerable efforts have been devoted to developing AI techniques for finance research and applications. For instance, AI techniques (e.g., machine learning) can help traders in quantitative trading (QT) by automating two tasks: market condition recognition and trading strategies execution. However, existing methods in QT face challenges such as representing noisy high-frequent financial data and finding the balance between exploration and exploitation of the trading agent with AI techniques. To address the challenges, we propose an adaptive trading model, namely iRDPG, to automatically develop QT strategies by an intelligent trading agent. Our model is enhanced by deep reinforcement learning (DRL) and imitation learning techniques. Specifically, considering the noisy financial data, we formulate the QT process as a Partially Observable Markov Decision Process (POMDP). Also, we introduce imitation learning to leverage classical trading strategies useful to balance between exploration and exploitation. For better simulation, we train our trading agent in the real financial market using minute-frequent data. Experimental results demonstrate that our model can extract robust market features and be adaptive in different markets.

IJCAI Conference 2020 Conference Paper

Combinatorial Multi-Armed Bandits with Concave Rewards and Fairness Constraints

  • Huanle Xu
  • Yang Liu
  • Wing Cheong Lau
  • Rui Li

The problem of multi-armed bandit (MAB) with fairness constraint has emerged as an important research topic recently. For such problems, one common objective is to maximize the total rewards within a fixed round of pulls, while satisfying the fairness requirement of a minimum selection fraction for each individual arm in the long run. Previous works have made substantial advancements in designing efficient online selection solutions, however, they fail to achieve a sublinear regret bound when incorporating such fairness constraints. In this paper, we study a combinatorial MAB problem with concave objective and fairness constraints. In particular, we adopt a new approach that combines online convex optimization with bandit methods to design selection algorithms. Our algorithm is computationally efficient, and more importantly, manages to achieve a sublinear regret bound with probability guarantees. Finally, we evaluate the performance of our algorithm via extensive simulations and demonstrate that it outperforms the baselines substantially.

AAAI Conference 2020 Conference Paper

Continuous Multiagent Control Using Collective Behavior Entropy for Large-Scale Home Energy Management

  • Jianwen Sun
  • Yan Zheng
  • Jianye Hao
  • Zhaopeng Meng
  • Yang Liu

With the increasing popularity of electric vehicles, distributed energy generation and storage facilities in smart grid systems, an efficient Demand-Side Management (DSM) is urgent for energy savings and peak loads reduction. Traditional DSM works that focus on optimizing the energy activities of a single household cannot scale up to large-scale home energy management problems. Multi-agent Deep Reinforcement Learning (MA-DRL) shows a potential way to solve the problem of scalability, where modern homes interact together to reduce energy consumption while striking a balance between energy cost and peak loads reduction. However, it is difficult to solve such an environment with the non-stationarity, and existing MA-DRL approaches cannot effectively give incentives for expected group behavior. In this paper, we propose a collective MA-DRL algorithm with continuous action space to provide fine-grained control on a large-scale microgrid. To mitigate the non-stationarity of the microgrid environment, a novel predictive model is proposed to measure the collective market behavior. Besides, a collective behavior entropy is introduced to reduce the high peak loads incurred by the collective behaviors of all householders in the smart grid. Empirical results show that our approach significantly outperforms the state-of-the-art methods regarding power cost reduction and daily peak loads optimization.

IS Journal 2020 Journal Article

Crowd Intelligence: Conducting Asymmetric Impact-Performance Analysis Based on Online Reviews

  • Jian-Wu Bi
  • Yang Liu
  • Zhi-Ping Fan

Asymmetric impact-performance analysis (AIPA) is an effective technique for understanding customer satisfaction and formulating improvement strategies for products and services. Typically, AIPA is conducted based on data obtained from customer surveys, which are expensive in terms of time and money. As a new data source, online reviews have many advantages and are promising for conducting AIPA. To this end, this article proposes a method for conducting AIPA based on online reviews. To illustrate the feasibility and validity of the proposed method, a case study of AIPA for a five-star hotel in Singapore is given. The proposed method gives managers one more option for conducting AIPA at lower cost and in less time, since online reviews of products and services can be easily collected.

ICLR Conference 2020 Conference Paper

Discrepancy Ratio: Evaluating Model Performance When Even Experts Disagree on the Truth

  • Igor Lovchinsky
  • Alon Daks
  • Israel Malkin
  • Pouya Samangouei
  • Ardavan Saeedi
  • Yang Liu
  • Swami Sankaranarayanan
  • Tomer Gafner

In most machine learning tasks unambiguous ground truth labels can easily be acquired. However, this luxury is often not afforded to many high-stakes, real-world scenarios such as medical image interpretation, where even expert human annotators typically exhibit very high levels of disagreement with one another. While prior works have focused on overcoming noisy labels during training, the question of how to evaluate models when annotators disagree about ground truth has remained largely unexplored. To address this, we propose the discrepancy ratio: a novel, task-independent and principled framework for validating machine learning models in the presence of high label noise. Conceptually, our approach evaluates a model by comparing its predictions to those of human annotators, taking into account the degree to which annotators disagree with one another. While our approach is entirely general, we show that in the special case of binary classification, our proposed metric can be evaluated in terms of simple, closed-form expressions that depend only on aggregate statistics of the labels and not on any individual label. Finally, we demonstrate how this framework can be used effectively to validate machine learning models using two real-world tasks from medical imaging. The discrepancy ratio metric reveals what conventional metrics do not: that our models not only vastly exceed the average human performance, but even exceed the performance of the best human experts in our datasets.

IS Journal 2020 Journal Article

Distributed Privacy-Preserving Iterative Summation Protocols

  • Yang Liu
  • Qingchen Liu
  • Xiong Zhang
  • Shuqi Qin
  • Xiaoping Lei

In this article, we study the problem of summation evaluation of secrets. The secrets are distributed over a network of nodes that form a ring graph. Privacy-preserving iterative protocols for computing the sum of the secrets are proposed, which are resilient against dynamic node join and leave situations. Theoretic bounds are derived regarding the utility and accuracy, and the proposed protocols are shown to comply with differential privacy requirements. Based on utility, accuracy, and privacy, we also provide guidance on appropriate selections of random noise parameters. Additionally, a few numerical examples that demonstrate their effectiveness and superiority are provided.
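
The paper's iterative protocols are not reproduced here, but the core idea of privacy-preserving summation on a ring can be conveyed with a classic additive-masking sketch (illustrative only; the paper's protocols additionally inject calibrated noise for differential privacy and handle node join/leave):

```python
import random

def ring_sum(secrets, mask_range=10**6):
    """Privacy-preserving summation on a ring of nodes via additive masking.

    The initiator (node 0) adds a large random mask to its secret before
    passing the running total around the ring, so no node ever sees another
    node's raw secret; the mask is removed when the total returns to node 0.
    """
    mask = random.randint(0, mask_range)
    running = secrets[0] + mask          # node 0 hides its own secret
    for s in secrets[1:]:                # each node folds in its secret
        running += s
    return running - mask                # node 0 removes the mask

print(ring_sum([3, 5, 7, 11]))  # 26
```

Because the mask cancels exactly, the result is the true sum; differential-privacy guarantees of the kind analyzed in the article instead come from noise that does not fully cancel, trading accuracy for privacy.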

IJCAI Conference 2020 Conference Paper

FakeSpotter: A Simple yet Robust Baseline for Spotting AI-Synthesized Fake Faces

  • Run Wang
  • Felix Juefei-Xu
  • Lei Ma
  • Xiaofei Xie
  • Yihao Huang
  • Jian Wang
  • Yang Liu

In recent years, generative adversarial networks (GANs) and their variants have achieved unprecedented success in image synthesis. They are widely adopted in synthesizing facial images, which brings potential security concerns as the fakes spread and fuel misinformation. However, robust detectors of these AI-synthesized fake faces are still in their infancy and are not ready to fully tackle this emerging challenge. In this work, we propose a novel approach, named FakeSpotter, based on monitoring neuron behaviors to spot AI-synthesized fake faces. Studies on neuron coverage and interactions have successfully shown that they can serve as testing criteria for deep learning systems, especially under the settings of being exposed to adversarial attacks. Here, we conjecture that monitoring neuron behavior can also serve as an asset in detecting fake faces, since layer-by-layer neuron activation patterns may capture more subtle features that are important for the fake detector. Experimental results on detecting four types of fake faces synthesized with the state-of-the-art GANs and evading four perturbation attacks show the effectiveness and robustness of our approach.

AAAI Conference 2020 Conference Paper

Generating Adversarial Examples for Holding Robustness of Source Code Processing Models

  • Huangzhao Zhang
  • Zhuo Li
  • Ge Li
  • Lei Ma
  • Yang Liu
  • Zhi Jin

Automated processing, analysis, and generation of source code are among the key activities in the software and system lifecycle. To this end, while deep learning (DL) exhibits a certain level of capability in handling these tasks, the current state-of-the-art DL models still suffer from non-robustness issues and can be easily fooled by adversarial attacks. Different from adversarial attacks on images, audio, and natural languages, the structured nature of programming languages brings new challenges. In this paper, we propose a Metropolis-Hastings sampling-based identifier renaming technique, named Metropolis-Hastings Modifier (MHM), which generates adversarial examples for DL models specialized for source code processing. Our in-depth evaluation on a functionality classification benchmark demonstrates the effectiveness of MHM in generating adversarial examples of source code. The higher robustness and performance achieved through adversarial training with MHM further confirm the usefulness of DL-based methods for future fully automated source code processing.

IJCAI Conference 2020 Conference Paper

Generating Behavior-Diverse Game AIs with Evolutionary Multi-Objective Deep Reinforcement Learning

  • Ruimin Shen
  • Yan Zheng
  • Jianye Hao
  • Zhaopeng Meng
  • Yingfeng Chen
  • Changjie Fan
  • Yang Liu

Generating diverse behaviors for game artificial intelligence (Game AI) has been long recognized as a challenging task in the game industry. Designing a Game AI with a satisfying behavioral characteristic (style) heavily depends on the domain knowledge and is hard to achieve manually. Deep reinforcement learning sheds light on advancing the automatic Game AI design. However, most of them focus on creating a superhuman Game AI, ignoring the importance of behavioral diversity in games. To bridge the gap, we introduce a new framework, named EMOGI, which can automatically generate desirable styles with almost no domain knowledge. More importantly, EMOGI succeeds in creating a range of diverse styles, providing behavior-diverse Game AIs. Evaluations on the Atari and real commercial games indicate that, compared to existing algorithms, EMOGI performs better in generating diverse behaviors and significantly improves the efficiency of Game AI design.

NeurIPS Conference 2020 Conference Paper

How do fair decisions fare in long-term qualification?

  • Xueru Zhang
  • Ruibo Tu
  • Yang Liu
  • Mingyan Liu
  • Hedvig Kjellstrom
  • Kun Zhang
  • Cheng Zhang

Although many fairness criteria have been proposed for decision making, their long-term impact on the well-being of a population remains unclear. In this work, we study the dynamics of population qualification and algorithmic decisions under a partially observed Markov decision problem setting. By characterizing the equilibrium of such dynamics, we analyze the long-term impact of static fairness constraints on the equality and improvement of group well-being. Our results show that static fairness constraints can either promote equality or exacerbate disparity depending on the driving factor of qualification transitions and the effect of sensitive attributes on feature distributions. We also consider possible interventions that can effectively improve group qualification or promote equality of group qualification. Our theoretical results and experiments on static real-world datasets with simulated dynamics show that our framework can be used to facilitate social science studies.

AIJ Journal 2020 Journal Article

How do fairness definitions fare? Testing public attitudes towards three algorithmic definitions of fairness in loan allocations

  • Nripsuta Ani Saxena
  • Karen Huang
  • Evan DeFilippis
  • Goran Radanovic
  • David C. Parkes
  • Yang Liu

What is the best way to define algorithmic fairness? While many definitions of fairness have been proposed in the computer science literature, there is no clear agreement over a particular definition. In this work, we investigate ordinary people's perceptions of three of these fairness definitions. Across three online experiments, we test which definitions people perceive to be the fairest in the context of loan decisions, and whether fairness perceptions change with the addition of sensitive information (i.e., race or gender of the loan applicants). Overall, one definition (calibrated fairness) tends to be more preferred than the others, and the results also provide support for the principle of affirmative action.

AAAI Conference 2020 Conference Paper

Image Formation Model Guided Deep Image Super-Resolution

  • Jinshan Pan
  • Yang Liu
  • Deqing Sun
  • Jimmy Ren
  • Ming-Ming Cheng
  • Jian Yang
  • Jinhui Tang

We present a simple and effective image super-resolution algorithm that imposes an image formation constraint on the deep neural networks via pixel substitution. The proposed algorithm first uses a deep neural network to estimate intermediate high-resolution images, blurs the intermediate images using known blur kernels, and then substitutes values of the pixels at the un-decimated positions with those of the corresponding pixels from the low-resolution images. The output of the pixel substitution process strictly satisfies the image formation model and is further refined by the same deep neural network in a cascaded manner. The proposed framework is trained in an end-to-end fashion and can work with existing feed-forward deep neural networks for super-resolution and converges fast in practice. Extensive experimental results show that the proposed algorithm performs favorably against state-of-the-art methods.
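
The pixel-substitution constraint described above can be sketched in a few lines (a simplification with illustrative names: grayscale image, a single known blur kernel, edge padding, and integer scale factor; the paper applies this inside a cascaded deep network):

```python
import numpy as np

def pixel_substitution(hr_estimate, lr_image, blur_kernel, scale):
    """One pixel-substitution step: blur the intermediate high-resolution
    estimate with the known kernel, then overwrite the pixels at the
    un-decimated (sampled) positions with the corresponding low-resolution
    pixel values, so the output satisfies the assumed image formation
    model exactly at those positions.
    """
    kh, kw = blur_kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(hr_estimate, ((ph, ph), (pw, pw)), mode="edge")
    blurred = np.zeros_like(hr_estimate, dtype=float)
    H, W = hr_estimate.shape
    for i in range(H):                       # naive 2-D convolution
        for j in range(W):
            blurred[i, j] = np.sum(padded[i:i + kh, j:j + kw] * blur_kernel)
    blurred[::scale, ::scale] = lr_image     # substitute sampled positions
    return blurred
```

The substituted output would then be fed back into the same network for refinement, as the abstract describes.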

IS Journal 2020 Journal Article

Introduction to the Special Issue on Federated Machine Learning

  • Yang Liu
  • Han Yu
  • Qiang Yang

The articles in this special section focus on federated machine learning, an emerging research paradigm focusing on solving data-silos challenges in real-world industrial applications. It is a broad discipline that touches many topics, including distributed and collaborative learning, privacy-preserving machine learning, edge computing, and data valuation, etc. Its interdisciplinary nature calls for collaborative efforts from a variety of fields to establish new protocols, frameworks and systems to address unique challenges, and open problems. These articles highlight a selection of high-quality and original works in this new area, including accepted papers to the 1st International Workshop on Federated Machine Learning in conjunction with IJCAI 2019.

NeurIPS Conference 2020 Conference Paper

Learning Strategy-Aware Linear Classifiers

  • Yiling Chen
  • Yang Liu
  • Chara Podimata

We address the question of repeatedly learning linear classifiers against agents who are strategically trying to game the deployed classifiers, and we use the Stackelberg regret to measure the performance of our algorithms. First, we show that Stackelberg and external regret for the problem of strategic classification are strongly incompatible: i.e., there exist worst-case scenarios where any sequence of actions providing sublinear external regret might result in linear Stackelberg regret and vice versa. Second, we present a strategy-aware algorithm for minimizing the Stackelberg regret for which we prove nearly matching upper and lower regret bounds. Finally, we provide simulations to complement our theoretical analysis. Our results advance the growing literature of learning from revealed preferences, which has so far focused on "smoother" assumptions from the perspective of the learner and the agents respectively.

IJCAI Conference 2020 Conference Paper

Modeling Voting for System Combination in Machine Translation

  • Xuancheng Huang
  • Jiacheng Zhang
  • Zhixing Tan
  • Derek F. Wong
  • Huanbo Luan
  • Jingfang Xu
  • Maosong Sun
  • Yang Liu

System combination is an important technique for combining the hypotheses of different machine translation systems to improve translation performance. Although early statistical approaches to system combination have been proven effective in analyzing the consensus between hypotheses, they suffer from the error propagation problem due to the use of pipelines. While this problem has been alleviated by end-to-end training of multi-source sequence-to-sequence models recently, these neural models do not explicitly analyze the relations between hypotheses and fail to capture their agreement because the attention to a word in a hypothesis is calculated independently, ignoring the fact that the word might occur in multiple hypotheses. In this work, we propose an approach to modeling voting for system combination in machine translation. The basic idea is to enable words in hypotheses from different systems to vote on words that are representative and should get involved in the generation process. This can be done by quantifying the influence of each voter and its preference for each candidate. Our approach combines the advantages of statistical and neural methods since it can not only analyze the relations between hypotheses but also allow for end-to-end training. Experiments show that our approach is capable of better taking advantage of the consensus between hypotheses and achieves significant improvements over state-of-the-art baselines on Chinese-English and English-German machine translation tasks.

AAAI Conference 2020 Conference Paper

Multi-Zone Unit for Recurrent Neural Networks

  • Fandong Meng
  • Jinchao Zhang
  • Yang Liu
  • Jie Zhou

Recurrent neural networks (RNNs) have been widely used to deal with sequence learning problems. The input-dependent transition function, which folds new observations into hidden states to sequentially construct fixed-length representations of arbitrary-length sequences, plays a critical role in RNNs. Based on single space composition, transition functions in existing RNNs often have difficulty in capturing complicated long-range dependencies. In this paper, we introduce a new Multi-zone Unit (MZU) for RNNs. The key idea is to design a transition function that is capable of modeling multiple space composition. The MZU consists of three components: zone generation, zone composition, and zone aggregation. Experimental results on multiple datasets of the character-level language modeling task and the aspect-based sentiment analysis task demonstrate the superiority of the MZU.

ICLR Conference 2020 Conference Paper

On Identifiability in Transformers

  • Gino Brunner
  • Yang Liu
  • Damian Pascual
  • Oliver Richter
  • Massimiliano Ciaramita
  • Roger Wattenhofer

In this paper we delve deep in the Transformer architecture by investigating two of its core components: self-attention and contextual embeddings. In particular, we study the identifiability of attention weights and token embeddings, and the aggregation of context into hidden tokens. We show that, for sequences longer than the attention head dimension, attention weights are not identifiable. We propose effective attention as a complementary tool for improving explanatory interpretations based on attention. Furthermore, we show that input tokens retain to a large degree their identity across the model. We also find evidence suggesting that identity information is mainly encoded in the angle of the embeddings and gradually decreases with depth. Finally, we demonstrate strong mixing of input information in the generation of contextual embeddings by means of a novel quantification method based on gradient attribution. Overall, we show that self-attention distributions are not directly interpretable and present tools to better understand and further investigate Transformer models.

NeurIPS Conference 2020 Conference Paper

Optimal Query Complexity of Secure Stochastic Convex Optimization

  • Wei Tang
  • Chien-Ju Ho
  • Yang Liu

We study the secure stochastic convex optimization problem: a learner aims to learn the optimal point of a convex function through sequentially querying a (stochastic) gradient oracle; in the meantime, there exists an adversary who aims to free-ride and infer the learning outcome of the learner from observing the learner's queries. The adversary observes only the points of the queries but not the feedback from the oracle. The goal of the learner is to optimize the accuracy, i.e., obtaining an accurate estimate of the optimal point, while securing her privacy, i.e., making it difficult for the adversary to infer the optimal point. We formally quantify this tradeoff between the learner's accuracy and privacy and characterize the lower and upper bounds on the learner's query complexity as a function of desired levels of accuracy and privacy. For the analysis of lower bounds, we provide a general template based on information-theoretic analysis and then tailor the template to several families of problems, including stochastic convex optimization and (noisy) binary search. We also present a generic secure learning protocol that achieves the matching upper bound up to logarithmic factors.

AAAI Conference 2020 Conference Paper

Reinforcement Learning with Perturbed Rewards

  • Jingkang Wang
  • Yang Liu
  • Bo Li

Recent studies have shown that reinforcement learning (RL) models are vulnerable in various noisy scenarios. For instance, the observed reward channel is often subject to noise in practice (e.g., when rewards are collected through sensors), and is therefore not credible. In addition, for applications such as robotics, a deep reinforcement learning (DRL) algorithm can be manipulated to produce arbitrary errors by receiving corrupted rewards. In this paper, we consider noisy RL problems with perturbed rewards, which can be approximated with a confusion matrix. We develop a robust RL framework that enables agents to learn in noisy environments where only perturbed rewards are observed. Our solution framework builds on existing RL/DRL algorithms and firstly addresses the biased noisy reward setting without any assumptions on the true distribution (e.g., zero-mean Gaussian noise as made in previous works). The core ideas of our solution include estimating a reward confusion matrix and defining a set of unbiased surrogate rewards. We prove the convergence and sample complexity of our approach. Extensive experiments on different DRL platforms show that trained policies based on our estimated surrogate reward can achieve higher expected rewards and converge faster than existing baselines. For instance, the state-of-the-art PPO algorithm is able to obtain 84.6% and 80.8% improvements on average score for five Atari games, with error rates of 10% and 30% respectively.
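
For intuition, the unbiased-surrogate construction in the binary-reward case can be sketched as follows (illustrative variable names; the paper's general construction covers larger confusion matrices and estimates the flip rates rather than assuming them known):

```python
def surrogate_rewards(r_minus, r_plus, e_minus, e_plus):
    """Surrogate rewards for a binary reward channel perturbed by a known
    2x2 confusion matrix: the true reward r_plus is flipped to r_minus with
    probability e_plus, and r_minus to r_plus with probability e_minus.
    Returns (rhat_minus, rhat_plus) such that the surrogate of the observed
    reward is an unbiased estimate of the true reward."""
    denom = 1.0 - e_minus - e_plus       # assumes e_minus + e_plus < 1
    rhat_plus = ((1.0 - e_minus) * r_plus - e_plus * r_minus) / denom
    rhat_minus = ((1.0 - e_plus) * r_minus - e_minus * r_plus) / denom
    return rhat_minus, rhat_plus

# Unbiasedness check: the expectation of the surrogate under the noise
# model recovers the true reward exactly.
rm, rp = surrogate_rewards(r_minus=0.0, r_plus=1.0, e_minus=0.3, e_plus=0.1)
expected_when_true_plus = (1 - 0.1) * rp + 0.1 * rm    # -> 1.0
expected_when_true_minus = (1 - 0.3) * rm + 0.3 * rp   # -> 0.0
```

An RL agent trained on the surrogate values therefore optimizes the same expected return as one trained on clean rewards, which is what makes the framework plug into existing RL/DRL algorithms.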

AAAI Conference 2020 Conference Paper

Stealthy and Efficient Adversarial Attacks against Deep Reinforcement Learning

  • Jianwen Sun
  • Tianwei Zhang
  • Xiaofei Xie
  • Lei Ma
  • Yan Zheng
  • Kangjie Chen
  • Yang Liu

Adversarial attacks against conventional Deep Learning (DL) systems and algorithms have been widely studied, and various defenses were proposed. However, the possibility and feasibility of such attacks against Deep Reinforcement Learning (DRL) are less explored. As DRL has achieved great success in various complex tasks, designing effective adversarial attacks is an indispensable prerequisite towards building robust DRL algorithms. In this paper, we introduce two novel adversarial attack techniques to stealthily and efficiently attack the DRL agents. These two techniques enable an adversary to inject adversarial samples in a minimal set of critical moments while causing the most severe damage to the agent. The first technique is the critical point attack: the adversary builds a model to predict the future environmental states and agent's actions, assesses the damage of each possible attack strategy, and selects the optimal one. The second technique is the antagonist attack: the adversary automatically learns a domain-agnostic model to discover the critical moments of attacking the agent in an episode. Experimental results demonstrate the effectiveness of our techniques. Specifically, to successfully attack the DRL agent, our critical point technique only requires 1 (TORCS) or 2 (Atari Pong and Breakout) steps, and the antagonist technique needs fewer than 5 steps (4 Mujoco tasks), which are significant improvements over state-of-the-art methods.

AAAI Conference 2020 Conference Paper

Structure-Aware Feature Fusion for Unsupervised Domain Adaptation

  • Qingchao Chen
  • Yang Liu

Unsupervised domain adaptation (UDA) aims to learn and transfer generalized features from a labelled source domain to a target domain without any annotations. Existing methods align only the high-level representation, without exploiting the complex multi-class structure and local spatial structure. This is problematic as 1) the model is prone to negative transfer when the features from different classes are misaligned; 2) missing the local spatial structure poses a major obstacle to performing fine-grained feature alignment. In this paper, we integrate the valuable information conveyed in the classifier prediction and local feature maps into the global feature representation and then perform a single mini-max game to make it domain invariant. In this way, the domain-invariant feature not only describes the holistic representation of the original image but also preserves mode-structure and fine-grained spatial structural information. The feature integration is achieved by estimating and maximizing the mutual information (MI) among the global feature, local feature and classifier prediction simultaneously. As the MI is hard to measure directly in high-dimension spaces, we adopt a new objective function that implicitly maximizes the MI via an effective sampling strategy and a discriminator design. Our STructure-Aware Feature Fusion (STAFF) network achieves the state-of-the-art performances in various UDA datasets.

NeurIPS Conference 2020 Conference Paper

Watch out! Motion is Blurring the Vision of Your Deep Neural Networks

  • Qing Guo
  • Felix Juefei-Xu
  • Xiaofei Xie
  • Lei Ma
  • Jian Wang
  • Bing Yu
  • Wei Feng
  • Yang Liu

The state-of-the-art deep neural networks (DNNs) are vulnerable against adversarial examples with additive random-like noise perturbations. While such examples are hardly found in the physical world, the image blurring effect caused by object motion, on the other hand, commonly occurs in practice, making its study greatly important, especially for widely adopted real-time image processing tasks (e.g., object detection, tracking). In this paper, we initiate the first step to comprehensively investigate the potential hazards of the blur effect for DNNs caused by object motion. We propose a novel adversarial attack method that can generate visually natural motion-blurred adversarial examples, named motion-based adversarial blur attack (ABBA). To this end, we first formulate the kernel-prediction-based attack, where an input image is convolved with kernels in a pixel-wise way, and the misclassification capability is achieved by tuning the kernel weights. To generate visually more natural and plausible examples, we further propose the saliency-regularized adversarial kernel prediction, where the salient region serves as a moving object, and the predicted kernel is regularized to achieve naturally visual effects. Besides, the attack is further enhanced by adaptively tuning the translations of object and background. A comprehensive evaluation on the NeurIPS'17 adversarial competition dataset demonstrates the effectiveness of ABBA by considering various kernel sizes, translations, and regions. The in-depth study further confirms that our method shows a more effective penetrating capability against the state-of-the-art GAN-based deblurring mechanisms compared with other blurring methods. We release the code at https://github.com/tsingqguo/ABBA.
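
The pixel-wise kernel convolution at the heart of the kernel-prediction formulation can be sketched as follows (a simplified, grayscale illustration with assumed names; in the attack itself the per-pixel kernel weights would be optimized to induce misclassification):

```python
import numpy as np

def pixelwise_blur(image, kernels):
    """Apply a distinct blur kernel at every pixel.

    image:   (H, W) array.
    kernels: (H, W, k, k) array, one k-by-k kernel per pixel; a uniform
             line-shaped kernel approximates linear motion blur, while a
             delta kernel leaves the pixel unchanged.
    """
    H, W, k, _ = kernels.shape
    p = k // 2
    padded = np.pad(image, p, mode="edge")
    out = np.empty((H, W), dtype=float)
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + k, j:j + k]     # local neighborhood
            out[i, j] = np.sum(patch * kernels[i, j])
    return out
```

Because each pixel gets its own kernel, the formulation can blur a salient object differently from the background, which is what the saliency-regularized variant exploits.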

AAAI Conference 2019 Conference Paper

A Multi-Agent Communication Framework for Question-Worthy Phrase Extraction and Question Generation

  • Siyuan Wang
  • Zhongyu Wei
  • Zhihao Fan
  • Yang Liu
  • Xuanjing Huang

Question generation aims to produce questions automatically given a piece of text as input. Existing research follows a sequence-to-sequence fashion that constructs a single question based on the input. Considering that each question usually focuses on a specific fragment of the input, especially in the scenario of reading comprehension, it is reasonable to identify the corresponding focus before constructing the question. In this paper, we propose to identify question-worthy phrases first and generate questions with the assistance of these phrases. We introduce a multi-agent communication framework, taking phrase extraction and question generation as two agents, and learn these two tasks simultaneously via a message-passing mechanism. Experimental results show the effectiveness of our framework: we can extract question-worthy phrases, which improve the performance of question generation. Besides, our system is able to extract more than one question-worthy phrase and generate multiple questions accordingly.

AAAI Conference 2019 Conference Paper

Bayesian Fairness

  • Christos Dimitrakakis
  • Yang Liu
  • David C. Parkes
  • Goran Radanovic

We consider the problem of how decision making can be fair when the underlying probabilistic model of the world is not known with certainty. We argue that recent notions of fairness in machine learning need to explicitly incorporate parameter uncertainty, hence we introduce the notion of Bayesian fairness as a suitable candidate for fair decision rules. Using balance, a definition of fairness introduced in (Kleinberg, Mullainathan, and Raghavan 2016), we show how a Bayesian perspective can lead to well-performing and fair decision rules even under high uncertainty.

AAAI Conference 2019 Conference Paper

Dependency Grammar Induction with a Neural Variational Transition-Based Parser

  • Bowen Li
  • Jianpeng Cheng
  • Yang Liu
  • Frank Keller

Dependency grammar induction is the task of learning dependency syntax without annotated training data. Traditional graph-based models with global inference achieve state-of-the-art results on this task, but they require O(n³) run time. Transition-based models enable faster inference with O(n) time complexity, but their performance still lags behind. In this work, we propose a neural transition-based parser for dependency grammar induction, whose inference procedure utilizes rich neural features with O(n) time complexity. We train the parser with an integration of variational inference, posterior regularization and variance reduction techniques. The resulting framework outperforms previous unsupervised transition-based dependency parsers and achieves performance comparable to graph-based models, both on the English Penn Treebank and on the Universal Dependency Treebank. In an empirical comparison, we show that our approach substantially increases parsing speed over graph-based models.

NeurIPS Conference 2019 Conference Paper

Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks

  • Yaqin Zhou
  • Shangqing Liu
  • Jingkai Siow
  • Xiaoning Du
  • Yang Liu

Vulnerability identification is crucial to protecting software systems from cyber attacks. It is especially important to localize the vulnerable functions among the source code to facilitate the fix. However, it is a challenging and tedious process, and it also requires specialized security expertise. Inspired by the work on manually defined patterns of vulnerabilities from various code representation graphs and recent advances in graph neural networks, we propose Devign, a general graph neural network based model for graph-level classification through learning on a rich set of code semantic representations. It includes a novel Conv module to efficiently extract useful features from the learned rich node representations for graph-level classification. The model is trained over manually labeled datasets built on 4 diversified large-scale open-source C projects that incorporate the high complexity and variety of real source code instead of the synthetic code used in previous works. The results of the extensive evaluation on the datasets demonstrate that Devign significantly outperforms the state of the art, with on average 10.51% higher accuracy and 8.68% higher F1 score, of which the Conv module contributes an average gain of 4.66% in accuracy and 6.37% in F1.

IJCAI Conference 2019 Conference Paper

DiffChaser: Detecting Disagreements for Deep Neural Networks

  • Xiaofei Xie
  • Lei Ma
  • Haijun Wang
  • Yuekang Li
  • Yang Liu
  • Xiaohong Li

Platform migration and customization have become an indispensable part of the deep neural network (DNN) development lifecycle. A high-precision but complex DNN trained in the cloud on massive data and powerful GPUs often goes through an optimization phase (e.g., quantization, compression) before deployment to a target device (e.g., a mobile device). A test set that effectively uncovers the disagreements between a DNN and its optimized variant provides useful feedback to debug and further enhance the optimization procedure. However, the minor inconsistencies between a DNN and its optimized version are often hard to detect and easily bypass the original test set. This paper proposes DiffChaser, an automated black-box testing framework to detect untargeted/targeted disagreements between version variants of a DNN. We demonstrate 1) its effectiveness by comparing with the state-of-the-art techniques, and 2) its usefulness in real-world DNN product deployment involving quantization and optimization.
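The notion of a disagreement between a model and its optimized variant can be illustrated with a toy black-box search. This sketch uses plain random perturbation of a seed input, whereas DiffChaser itself uses a guided genetic search; all names here are hypothetical:

```python
import numpy as np

def find_disagreement(model_a, model_b, seed_input, n_trials=1000, seed=0):
    """Black-box search for an input on which two versions of a model
    disagree (untargeted): randomly perturb a seed input and compare
    top-1 predictions. Returns a disagreeing input, or None."""
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        x = seed_input + rng.normal(0.0, 0.1, size=seed_input.shape)
        if model_a(x).argmax() != model_b(x).argmax():
            return x  # the two variants classify this input differently
    return None
```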

AAAI Conference 2019 Short Paper

EWGAN: Entropy-Based Wasserstein GAN for Imbalanced Learning

  • Jinfu Ren
  • Yang Liu
  • Jiming Liu

In this paper, we propose a novel oversampling strategy dubbed Entropy-based Wasserstein Generative Adversarial Network (EWGAN) to generate data samples for minority classes in imbalanced learning. First, we construct an entropy-weighted label vector for each class to characterize the data imbalance in different classes. Then we concatenate this entropy-weighted label vector with the original feature vector of each data sample, and feed it into the WGAN model to train the generator. After the generator is trained, we concatenate the entropy-weighted label vector with random noise feature vectors, and feed them into the generator to generate data samples for minority classes. Experimental results on two benchmark datasets show that the samples generated by the proposed oversampling strategy can help to improve the classification performance when the data are highly imbalanced. Furthermore, the proposed strategy outperforms other state-of-the-art oversampling algorithms in terms of the classification accuracy.
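The first two steps (building entropy-weighted label vectors, then concatenating them with features) might look as follows. The abstract does not spell out the weighting formula, so the per-class -p·log p weighting below is an illustrative assumption:

```python
import numpy as np

def entropy_weighted_labels(y, n_classes):
    """One entropy-weighted label vector per class: a one-hot vector
    scaled by the class's contribution -p_c * log(p_c) to the label
    distribution's entropy (illustrative weighting, not necessarily
    the paper's exact formula)."""
    counts = np.bincount(y, minlength=n_classes).astype(float)
    p = counts / counts.sum()
    w = np.where(p > 0, -p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    return np.eye(n_classes) * w  # row c is w_c * one_hot(c)

def augment(X, y, n_classes):
    """Concatenate each sample's features with its class's
    entropy-weighted label vector before feeding the GAN."""
    labels = entropy_weighted_labels(y, n_classes)
    return np.hstack([X, labels[y]])
```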

AIJ Journal 2019 Journal Article

Exploiting reverse target-side contexts for neural machine translation via asynchronous bidirectional decoding

  • Jinsong Su
  • Xiangwen Zhang
  • Qian Lin
  • Yue Qin
  • Junfeng Yao
  • Yang Liu

Based on a unified encoder-decoder framework with attentional mechanism, neural machine translation (NMT) models have attracted much attention and become the mainstream in the community of machine translation. Generally, NMT decoders produce translation in a left-to-right way. As a result, only left-to-right target-side contexts from the generated translations are exploited, while the right-to-left target-side contexts are completely unexploited for translation. In this paper, we extend the conventional attentional encoder-decoder NMT framework by introducing a backward decoder, in order to explore asynchronous bidirectional decoding for NMT. In the first step after encoding, our backward decoder learns to generate the target-side hidden states in a right-to-left manner. Next, in each timestep of translation prediction, our forward decoder concurrently considers both the source-side and the reverse target-side hidden states via two attention models. Compared with previous models, this architecture enables our model to fully exploit contexts from both the source side and the target side, which improves translation quality. We conducted experiments on NIST Chinese-English, WMT English-German and Finnish-English translation tasks to investigate the effectiveness of our model. Experimental results show that (1) our improved RNN-based NMT model achieves significant improvements over the conventional RNNSearch by 1.44/-3.02, 1.11/-1.01, and 1.23/-1.27 average BLEU and TER points, respectively; and (2) our enhanced Transformer outperforms the standard Transformer by 1.56/-1.49, 1.76/-2.49, and 1.29/-1.33 average BLEU and TER points, respectively. We released our code at https://github.com/DeepLearnXMU/ABD-NMT.

IJCAI Conference 2019 Conference Paper

Fair and Explainable Dynamic Engagement of Crowd Workers

  • Han Yu
  • Yang Liu
  • Xiguang Wei
  • Chuyu Zheng
  • Tianjian Chen
  • Qiang Yang
  • Xiong Peng

Years of rural-urban migration have resulted in a significant population in China seeking ad-hoc work in large urban centres. At the same time, many businesses face large fluctuations in demand for manpower and require more efficient ways to satisfy such demands. This paper outlines AlgoCrowd, an artificial intelligence (AI)-empowered algorithmic crowdsourcing platform. Equipped with an efficient, explainable task-worker matching optimization approach designed to focus on fair treatment of workers while maximizing collective utility, the platform provides explainable task recommendations to workers' personal work-management mobile apps, which are becoming popular, with the aim of addressing the above societal challenge.

TIST Journal 2019 Journal Article

Federated Machine Learning

  • Qiang Yang
  • Yang Liu
  • Tianjian Chen
  • Yongxin Tong

Today’s artificial intelligence still faces two major challenges. One is that, in most industries, data exists in the form of isolated islands. The other is the strengthening of data privacy and security. We propose a possible solution to these challenges: secure federated learning. Beyond the federated-learning framework first proposed by Google in 2016, we introduce a comprehensive secure federated-learning framework, which includes horizontal federated learning, vertical federated learning, and federated transfer learning. We provide definitions, architectures, and applications for the federated-learning framework, and provide a comprehensive survey of existing works on this subject. In addition, we propose building data networks among organizations based on federated mechanisms as an effective solution to allowing knowledge to be shared without compromising user privacy.
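For the horizontal setting, the canonical aggregation step is federated averaging: clients train locally, only model weights leave the device, and the server combines them weighted by local dataset size. A generic FedAvg-style sketch, not a specific system from this survey:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """One aggregation round of horizontal federated averaging:
    combine locally trained weight vectors, weighted by each
    client's dataset size. Raw data never leaves the clients."""
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()  # size-proportional mixing weights
    return sum(c * w for c, w in zip(coeffs, client_weights))
```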

IJCAI Conference 2019 Conference Paper

Graph and Autoencoder Based Feature Extraction for Zero-shot Learning

  • Yang Liu
  • Deyan Xie
  • Quanxue Gao
  • Jungong Han
  • Shujian Wang
  • Xinbo Gao

Zero-shot learning (ZSL) aims to build models that recognize novel visual categories that have no associated labelled training samples. The basic framework is to transfer knowledge from seen classes to unseen classes by learning a visual-semantic embedding. However, most approaches do not preserve the underlying sub-manifold of samples in the embedding space. In addition, whether the mapping can precisely reconstruct the original visual features has not been investigated in depth. To solve these problems, we formulate a novel framework named Graph and Autoencoder Based Feature Extraction (GAFE) to seek a low-rank mapping that preserves the sub-manifold of samples. Taking the encoder-decoder paradigm, the encoder part learns a mapping from the visual features to the semantic space, while the decoder part reconstructs the original features with the learned mapping. In addition, a graph is constructed to guarantee that the learned mapping preserves the local intrinsic structure of the data. To this end, an L2,1-norm sparsity constraint is imposed on the mapping to identify features relevant to the target domain. Extensive experiments on five attribute datasets demonstrate the effectiveness of the proposed model.

IJCAI Conference 2019 Conference Paper

Heterogeneous Gaussian Mechanism: Preserving Differential Privacy in Deep Learning with Provable Robustness

  • NhatHai Phan
  • Minh N. Vu
  • Yang Liu
  • Ruoming Jin
  • Dejing Dou
  • Xintao Wu
  • My T. Thai

In this paper, we propose a novel Heterogeneous Gaussian Mechanism (HGM) to preserve differential privacy in deep neural networks, with provable robustness against adversarial examples. We first relax the constraint on the privacy budget in the traditional Gaussian Mechanism from (0, 1] to (0, ∞), with a new bound on the noise scale to preserve differential privacy. The noise in our mechanism can be arbitrarily redistributed, offering a distinctive ability to address the trade-off between model utility and privacy loss. To derive provable robustness, our HGM is applied to inject Gaussian noise into the first hidden layer. Then, a tighter robustness bound is proposed. Theoretical analysis and thorough evaluations show that our mechanism notably improves the robustness of differentially private deep neural networks, compared with baseline approaches, under a variety of model attacks.
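For reference, the traditional Gaussian Mechanism that HGM relaxes calibrates its noise as follows; this is the standard (ε, δ)-differential-privacy result, and the ε ∈ (0, 1] constraint is exactly what the paper lifts:

```python
import numpy as np

def gaussian_mechanism(value, sensitivity, epsilon, delta, rng=None):
    """Classical Gaussian Mechanism: for epsilon in (0, 1], adding
    N(0, sigma^2) noise with
        sigma >= sqrt(2 * ln(1.25 / delta)) * sensitivity / epsilon
    gives (epsilon, delta)-differential privacy."""
    assert 0 < epsilon <= 1, "classical bound only holds for epsilon in (0, 1]"
    if rng is None:
        rng = np.random.default_rng()
    sigma = np.sqrt(2 * np.log(1.25 / delta)) * sensitivity / epsilon
    return value + rng.normal(0.0, sigma, size=np.shape(value))
```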

AAAI Conference 2019 Conference Paper

Implanting Rational Knowledge into Distributed Representation at Morpheme Level

  • Zi Lin
  • Yang Liu

Previously, researchers paid no attention to the creation of unambiguous morpheme embeddings independent from the corpus, while such information plays an important role in expressing the exact meanings of words for parataxis languages like Chinese. In this paper, after constructing the Chinese lexical and semantic ontology based on word-formation, we propose a novel approach to implanting the structured rational knowledge into distributed representation at morpheme level, naturally avoiding heavy disambiguation in the corpus. We design a template to create the instances as pseudo-sentences merely from the pieces of knowledge of morphemes built in the lexicon. To exploit hierarchical information and tackle the data sparseness problem, the instance proliferation technique is applied based on similarity to expand the collection of pseudo-sentences. The distributed representation for morphemes can then be trained on these pseudo-sentences using word2vec. For evaluation, we validate the paradigmatic and syntagmatic relations of morpheme embeddings, and apply the obtained embeddings to word similarity measurement, achieving significant improvements over the classical models by more than 5 Spearman scores or 8 percentage points, which shows very promising prospects for adoption of the new source of knowledge.

EAAI Journal 2019 Journal Article

Improving the effectiveness of keyword search in databases using query logs

  • Ziqiang Yu
  • Ajith Abraham
  • Xiaohui Yu
  • Yang Liu
  • Jing Zhou
  • Kun Ma

Using query logs to enhance user experience has been extensively studied in the Web IR literature. However, in the area of keyword search on structured data (relational databases in particular), most existing works have focused on improving search result quality via designing better scoring functions, without giving explicit consideration to query logs. However, query logs can reflect the user preferences, so our work taps into the wealth of information contained in query logs and aims to enhance the search effectiveness by explicitly taking into account the log information when ranking the query results. Different from existing approaches only relying on a schema graph or a data graph, our work designs a comprehensive solution based on both the schema graph and the data graph for discovering top-k results with two stages. First, we identify top-k candidate networks with a query-log-aware ranking strategy by employing the largest frequent subtrees mined from query logs. Since a candidate network usually corresponds to multiple joined tuple trees, we further rank these joined tuple trees with the PageRank principle based on the data graph in the second stage. Finally, user studies on a real dataset validate the effectiveness of the proposed ranking strategy.

IJCAI Conference 2019 Conference Paper

Matching User with Item Set: Collaborative Bundle Recommendation with Deep Attention Network

  • Liang Chen
  • Yang Liu
  • Xiangnan He
  • Lianli Gao
  • Zibin Zheng

Most recommendation research has concentrated on recommending single items to users, such as the considerable work on collaborative filtering that models the interaction between a user and an item. However, in many real-world scenarios, the platform needs to show users a set of items, e.g., a marketing strategy that offers multiple items for sale as one bundle. In this work, we consider recommending a set of items to a user, i.e., the Bundle Recommendation task, which concerns modeling the interaction between a user and a set of items. We contribute a neural network solution named DAM, short for Deep Attentive Multi-Task model, which features two special designs: 1) we design a factorized attention network to aggregate the item embeddings in a bundle to obtain the bundle's representation; 2) we jointly model user-bundle interactions and user-item interactions in a multi-task manner to alleviate the scarcity of user-bundle interactions. Extensive experiments on a real-world dataset show that DAM outperforms the state-of-the-art solution, verifying the effectiveness of our attention design and of multi-task learning in DAM.
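In its simplest form, the first design (aggregating a bundle's item embeddings with attention) reduces to softmax-weighted pooling. The single-vector scorer below is a simplification of the paper's factorized attention, and the names are ours:

```python
import numpy as np

def attention_pool(item_embs, w, temperature=1.0):
    """Aggregate a bundle's item embeddings (n_items, dim) into one
    bundle vector: score each item against a learned vector `w`,
    softmax the scores, and take the weighted sum."""
    scores = item_embs @ w / temperature
    scores -= scores.max()          # numerical stability
    alpha = np.exp(scores)
    alpha /= alpha.sum()            # attention weights sum to 1
    return alpha @ item_embs        # (dim,) bundle representation
```

With a zero scoring vector the weights are uniform and the pooling degenerates to a plain mean, which makes the role of the learned scorer easy to see.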

IJCAI Conference 2019 Conference Paper

Multi-Agent Visualization for Explaining Federated Learning

  • Xiguang Wei
  • Quan Li
  • Yang Liu
  • Han Yu
  • Tianjian Chen
  • Qiang Yang

As an alternative, decentralized training approach, Federated Learning enables distributed agents to collaboratively learn a machine learning model while keeping personal/private information on local devices. However, one significant issue of this framework is the lack of transparency, which obscures understanding of the working mechanism of Federated Learning systems. This paper proposes a multi-agent visualization system that illustrates what Federated Learning is and how it supports multi-agent coordination. To be specific, it allows users to participate in Federated Learning-empowered multi-agent coordination. The input and output of Federated Learning are visualized simultaneously, which provides an intuitive explanation of Federated Learning for users, helping them gain a deeper understanding of the technology.

AAAI Conference 2019 Conference Paper

Randomized Wagering Mechanisms

  • Yiling Chen
  • Yang Liu
  • Juntao Wang

Wagering mechanisms are one-shot betting mechanisms that elicit agents’ predictions of an event. For deterministic wagering mechanisms, an existing impossibility result has shown the incompatibility of some desirable theoretical properties. In particular, Pareto optimality (no profitable side bet before allocation) cannot be achieved together with weak incentive compatibility, weak budget balance and individual rationality. In this paper, we expand the design space of wagering mechanisms to allow randomization and ask whether there are randomized wagering mechanisms that can achieve all previously considered desirable properties, including Pareto optimality. We answer this question positively with two classes of randomized wagering mechanisms: i) a simple randomized lottery-type implementation of existing deterministic wagering mechanisms, and ii) another family of randomized wagering mechanisms, named surrogate wagering mechanisms, which are robust to noisy ground truth. Surrogate wagering mechanisms are inspired by an idea of learning with noisy labels (Natarajan et al. 2013) as well as a recent extension of this idea to the information elicitation without verification setting (Liu and Chen 2018). We show that a broad set of randomized wagering mechanisms satisfy all desirable theoretical properties.

AAAI Conference 2019 Conference Paper

Recognizing Unseen Attribute-Object Pair with Generative Model

  • Zhixiong Nan
  • Yang Liu
  • Nanning Zheng
  • Song-Chun Zhu

In this paper, we study the problem of recognizing attribute-object pairs that do not appear in the training dataset, which is called unseen attribute-object pair recognition. Existing methods mainly learn a discriminative classifier or compose multiple classifiers to tackle this problem, and they exhibit poor performance for unseen pairs. The key reasons for this failure are that 1) they have not learned an intrinsic attribute-object representation, and 2) the attribute and object are processed either separately or equally, so that the inner relation between the attribute and object has not been explored. To explore the inner relation of attribute and object as well as the intrinsic attribute-object representation, we propose a generative model with an encoder-decoder mechanism that bridges visual and linguistic information in a unified end-to-end network. The encoder-decoder mechanism presents impressive potential to find an intrinsic attribute-object feature representation. In addition, combining visual and linguistic features in a unified model makes it possible to mine the relation between attribute and object. We conducted extensive experiments to compare our method with several state-of-the-art methods on two challenging datasets. The results show that our method outperforms all other methods.

YNIMG Journal 2019 Journal Article

Reduction of cerebral blood flow in community-based adults with subclinical cerebrovascular atherosclerosis: A 3.0T magnetic resonance imaging study

  • Hualu Han
  • Runhua Zhang
  • Gaifen Liu
  • Huiyu Qiao
  • Zhensen Chen
  • Yang Liu
  • Xiaoyi Chen
  • Dongye Li

Reduction in cerebral blood flow (CBF), one of the major metrics for cerebral perfusion, is associated with many brain disorders. Therefore, early characterization of CBF prior to the occurrence of symptoms is essential for the prevention of cerebral ischemic events. We hypothesized that large artery atherosclerosis might be a potential indicator of decline in cerebral perfusion. The aim of this study was to investigate the relationship between large artery atherosclerosis and CBF in asymptomatic adults. A total of 134 asymptomatic subjects (mean age, 56.2 ± 12.8 years; 54 males) were recruited and underwent magnetic resonance (MR) imaging of the brain and the intracranial and extracranial carotid arteries. Presence or absence of cerebrovascular atherosclerosis was determined on MR vessel wall images. CBF was measured with pseudo-continuous arterial spin labeling (pCASL) imaging. The CBF values in the internal carotid artery (ICA) (37.2 ± 5.8 vs. 39.0 ± 4.9 ml/100 g/min, P = 0.049) and vertebrobasilar artery (VA-BA) territories (42.0 ± 6.8 vs. 44.8 ± 7.0 ml/100 g/min, P = 0.023) were significantly reduced in subjects with cerebrovascular plaque compared to those without. Presence of cerebrovascular plaque was significantly associated with CBF of the VA-BA territory both before (odds ratio, 2.89; 95% confidence interval, 1.37–6.08; P = 0.005) and after adjusting for confounding factors including age, gender, body mass index, diabetes, systolic blood pressure, hyperlipidemia and history of cardiovascular disease (odds ratio, 2.76; 95% confidence interval, 1.18–6.46; P = 0.019). In conclusion, the presence of cerebrovascular atherosclerosis is independently associated with reduction in CBF measured by pCASL in asymptomatic adults, suggesting that cerebrovascular large artery atherosclerosis might be an effective indicator of impairment of cerebral microcirculation hemodynamics.

AAMAS Conference 2019 Conference Paper

Self-Improving Generative Adversarial Reinforcement Learning

  • Yang Liu
  • Yifeng Zeng
  • Yingke Chen
  • Jing Tang
  • Yinghui Pan

The lack of data efficiency and stability is one of the main challenges in end-to-end model-free reinforcement learning (RL) methods. Recent research addresses the problem by resorting to supervised learning methods that utilize human expert demonstrations, e.g., imitation learning. In this paper we present a novel framework which builds a self-improving process upon a policy improvement operator, used as a black box so that it has multiple implementation options for various applications. An agent is trained to iteratively imitate behaviors that are generated by the operator. Hence the agent can learn by itself without domain knowledge from humans. We employ generative adversarial networks (GAN) to implement the imitation module in the new framework. We evaluate the framework's performance over multiple application domains and provide comparative results in support.

IJCAI Conference 2019 Conference Paper

Worst-Case Discriminative Feature Selection

  • Shuangli Liao
  • Quanxue Gao
  • Feiping Nie
  • Yang Liu
  • Xiangdong Zhang

Feature selection plays a critical role in data mining, driven by increasing feature dimensionality in target problems. In this paper, we propose a new criterion for discriminative feature selection, worst-case discriminative feature selection (WDFS). Unlike Fisher Score and other methods based on discriminative criteria that consider the overall (or average) separation of the data, WDFS adopts a new perspective, the worst-case view, which arguably is more suitable for classification applications. Specifically, WDFS directly maximizes the ratio of the minimum between-class variance over all class pairs to the maximum within-class variance, and thus it duly considers the separation of all classes. In addition, we take a greedy strategy, finding one feature at a time, which is very easy to implement. Moreover, we also utilize the correlation between features to help reduce redundancy and extend WDFS to uncorrelated WDFS (UWDFS). To evaluate the effectiveness of the proposed algorithm, we conduct classification experiments on many real data sets. In the experiments, we calculate the correlation coefficients using either the original features or the score vectors of features over all class pairs, and analyze the experimental results in these two ways. Experimental results demonstrate the effectiveness of WDFS and UWDFS.
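The criterion and the one-feature-at-a-time strategy can be sketched as follows. Scoring features individually, as the greedy loop below does, is a simplified, hypothetical reading of the paper's approach, and the function names are ours:

```python
import numpy as np

def wdfs_score(x, y):
    """Worst-case discriminative score of a single feature: minimum
    squared between-class mean difference over all class pairs,
    divided by the maximum within-class variance. Higher is better."""
    classes = np.unique(y)
    means = np.array([x[y == c].mean() for c in classes])
    variances = np.array([x[y == c].var() for c in classes])
    between = min(
        (means[i] - means[j]) ** 2
        for i in range(len(classes))
        for j in range(i + 1, len(classes))
    )
    return between / variances.max()

def greedy_wdfs(X, y, k):
    """Pick the k individually highest-scoring features."""
    scores = [wdfs_score(X[:, j], y) for j in range(X.shape[1])]
    return sorted(np.argsort(scores)[::-1][:k].tolist())
```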

AAAI Conference 2018 Conference Paper

Asynchronous Bidirectional Decoding for Neural Machine Translation

  • Xiangwen Zhang
  • Jinsong Su
  • Yue Qin
  • Yang Liu
  • Rongrong Ji
  • Hongji Wang

The dominant neural machine translation (NMT) models apply unified attentional encoder-decoder neural networks for translation. Traditionally, the NMT decoders adopt recurrent neural networks (RNNs) to perform translation in a left-to-right manner, leaving the target-side contexts generated from right to left unexploited during translation. In this paper, we equip the conventional attentional encoder-decoder NMT framework with a backward decoder, in order to explore bidirectional decoding for NMT. Attending to the hidden state sequence produced by the encoder, our backward decoder first learns to generate the target-side hidden state sequence from right to left. Then, the forward decoder performs translation in the forward direction, while in each translation prediction timestep, it simultaneously applies two attention models to consider the source-side and reverse target-side hidden states, respectively. With this new architecture, our model is able to fully exploit source- and target-side contexts to improve translation quality. Experimental results on NIST Chinese-English and WMT English-German translation tasks demonstrate that our model achieves substantial improvements over the conventional NMT by 3.14 and 1.38 BLEU points, respectively. The source code of this work can be obtained from https://github.com/DeepLearnXMU/ABD-NMT.

AAAI Conference 2018 Short Paper

Bayesian Network Structure Learning: The Two-Step Clustering-Based Algorithm

  • Yikun Zhang
  • Jiming Liu
  • Yang Liu

In this paper we introduce a two-step clustering-based strategy, which can automatically generate prior information from data in order to further improve the accuracy and time efficiency of state-of-the-art algorithms for Bayesian network structure learning. Our clustering-based strategy is composed of two steps. In the first step, we divide the potential nodes into several groups via clustering analysis and apply Bayesian network structure learning to obtain some pre-existing arcs within each cluster. In the second step, with all the within-cluster arcs being well preserved, we learn the between-cluster structure of the given network. Experimental results on benchmark datasets show that a wide range of structure learning algorithms benefit from the proposed clustering-based strategy in terms of both accuracy and efficiency.

AAAI Conference 2018 Conference Paper

Dictionary Learning Inspired Deep Network for Scene Recognition

  • Yang Liu
  • Qingchao Chen
  • Wei Chen
  • Ian Wassell

Scene recognition remains one of the most challenging problems in image understanding. With the help of fully connected layers (FCL) and rectified linear units (ReLU), deep networks can extract the moderately sparse and discriminative feature representation required for scene recognition. However, few methods consider exploiting a sparsity model for learning the feature representation in order to provide enhanced discriminative capability. In this paper, we replace the conventional FCL and ReLU with a new dictionary learning layer, which is composed of a finite number of recurrent units, to simultaneously enhance the sparse representation and discriminative abilities of features via the determination of optimal dictionaries. In addition, with the help of the structure of the dictionary, we propose a new label-discriminative regressor to boost the discrimination ability. We also propose new constraints to prevent overfitting by incorporating the advantages of the Mahalanobis and Euclidean distances to balance recognition accuracy and generalization performance. Our proposed approach is evaluated using various scene datasets and shows performance superior to many state-of-the-art approaches.

AAAI Conference 2018 Conference Paper

Early Detection of Fake News on Social Media Through Propagation Path Classification with Recurrent and Convolutional Networks

  • Yang Liu
  • Yi-Fang Wu

In the midst of today’s pervasive influence of social media, automatically detecting fake news is drawing significant attention from both the academic communities and the general public. Existing detection approaches rely on machine learning algorithms with a variety of news characteristics to detect fake news. However, such approaches have a major limitation in detecting fake news early, i.e., the information required for detecting fake news is often unavailable or inadequate at the early stage of news propagation. As a result, the accuracy of early detection of fake news is low. To address this limitation, in this paper, we propose a novel model for early detection of fake news on social media through classifying news propagation paths. We first model the propagation path of each news story as a multivariate time series in which each tuple is a numerical vector representing characteristics of a user who engaged in spreading the news. Then, we build a time series classifier that incorporates both recurrent and convolutional networks, which capture the global and local variations of user characteristics along the propagation path respectively, to detect fake news. Experimental results on three real-world datasets demonstrate that our proposed model can detect fake news with 85% and 92% accuracy on Twitter and Sina Weibo, respectively, within 5 minutes after it starts to spread, which is significantly faster than state-of-the-art baselines.

IJCAI Conference 2018 Conference Paper

Energy-efficient Amortized Inference with Cascaded Deep Classifiers

  • Jiaqi Guan
  • Yang Liu
  • Qiang Liu
  • Jian Peng

Deep neural networks have been remarkably successful in various AI tasks but often incur high computation and energy costs in energy-constrained applications such as mobile sensing. We address this problem by proposing a novel framework that optimizes prediction accuracy and energy cost simultaneously, thus enabling an effective cost-accuracy trade-off at test time. In our framework, each data instance is pushed into a cascade of deep neural networks of increasing size, and a selection module is used to sequentially determine when a sufficiently accurate classifier can be used for this data instance. The cascade of neural networks and the selection module are jointly trained in an end-to-end fashion with the REINFORCE algorithm to optimize a trade-off between computational cost and predictive accuracy. Our method is able to improve accuracy and efficiency simultaneously by learning to assign easy instances to fast yet sufficiently accurate classifiers to save computation and energy, while assigning harder instances to deeper and more powerful classifiers to ensure satisfactory accuracy. Moreover, we demonstrate our method's effectiveness with extensive experiments on CIFAR-10/100, ImageNet32x32 and the original ImageNet dataset.
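The early-exit idea behind the cascade can be sketched in a few lines. The sketch below is not the paper's method: the paper learns the selection module jointly with the networks via REINFORCE, whereas here the `make_model` stand-ins, the fixed confidence thresholds, and the doubling cost per stage are illustrative assumptions.

```python
def make_model(k):
    # Toy stand-in for the k-th network: larger k means more "capacity",
    # modelled here as higher confidence on the same input.
    def model(x):
        pred = 1 if x > 0 else 0
        conf = min(1.0, abs(x) * (k + 1))
        return pred, conf
    return model

def cascade_predict(x, models, thresholds):
    """Early-exit cascade: run models in order of increasing cost and stop
    once the current model is confident enough. (A fixed threshold stands
    in for the paper's learned selection module.)"""
    cost = 0
    for k, model in enumerate(models):
        cost += 2 ** k                      # assume stage k costs 2**k units
        pred, conf = model(x)
        if conf >= thresholds[k] or k == len(models) - 1:
            return pred, cost

models = [make_model(k) for k in range(3)]
thresholds = [0.9, 0.9, 0.0]                # last stage always answers
```

An easy instance (large |x|) exits at the first, cheapest stage, while a hard instance pays for the full cascade, which is exactly the cost-accuracy trade-off the abstract describes.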

AAAI Conference 2018 Conference Paper

Euler Sparse Representation for Image Classification

  • Yang Liu
  • Quanxue Gao
  • Jungong Han
  • Shujian Wang

Sparse representation based classification (SRC) has achieved great success in image recognition. Motivated by the fact that the kernel trick can capture the nonlinear similarity of features, which may help improve the separability and margin between nearby data points, we propose Euler SRC for image classification, which is essentially SRC with Euler sparse representation. To be specific, it first maps the images into the complex space by the Euler representation, which is largely insensitive to outliers and illumination, and then performs complex SRC with the Euler representation. The major advantage of our method is that the Euler representation is explicit, with no increase in the dimensionality of the image space, thereby enabling this technique to be easily deployed in real applications. To solve Euler SRC, we present an efficient algorithm which is fast and converges well. Extensive experimental results illustrate that Euler SRC outperforms traditional SRC and achieves better performance for image classification.

AAMAS Conference 2018 Conference Paper

Gossip Gradient Descent

  • Yang Liu
  • Ji Liu
  • Tamer Basar

We consider the problem of learning a linear regression model distributively with a network of N interconnected agents which receive private streaming data. Each agent can deploy an online learning algorithm, e.g., stochastic gradient descent, to adaptively learn the regression model using its received private data. The goal is to devise an algorithm for each agent, under the constraint that each of them can communicate only with its neighboring agents based on a communication graph, that enables each agent to converge to the true model with performance comparable to that of the traditional centralized solution. We propose an algorithm called gossip gradient descent, and establish O(√(log t / ((1 − λ₂)Nt))) convergence in expectation and mean square, where λ₂ is the second largest eigenvalue of the expected gossip matrix corresponding to the underlying communication graph. For the case when agents are privacy sensitive, we propose a differentially private variant of the algorithm, which achieves ε-differential privacy and O(√(log² t / (ε(1 − λ₂)Nt))) convergence.
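As a rough illustration of the setting (not the paper's algorithm or its rates), the NumPy sketch below has N agents run local SGD on private streams generated from a shared linear model and periodically average iterates with a neighbour; the ring topology, step-size schedule, and noise level are all made-up assumptions.

```python
import numpy as np

def gossip_gradient_descent(n_agents=8, dim=3, steps=3000, seed=0):
    """Toy sketch: local SGD on private data plus pairwise gossip
    averaging over a ring graph (topology and step size are assumptions;
    the paper's algorithm and analysis are more refined)."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=dim)            # unknown regression model
    W = np.zeros((n_agents, dim))            # per-agent estimates
    for t in range(1, steps + 1):
        eta = 1.0 / (t + 5)                  # diminishing step size
        for i in range(n_agents):            # one private sample per agent
            x = rng.normal(size=dim)
            y = x @ w_true + 0.1 * rng.normal()
            W[i] -= eta * (W[i] @ x - y) * x  # local SGD step
        i = rng.integers(n_agents)           # pairwise gossip with a ring neighbour
        j = (i + 1) % n_agents
        W[i] = W[j] = (W[i] + W[j]) / 2
    return W, w_true
```

Running this drives every agent's estimate toward `w_true` even though each agent only ever exchanges iterates, never raw data, with its neighbours.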

AAAI Conference 2018 Conference Paper

Improved Text Matching by Enhancing Mutual Information

  • Yang Liu
  • Wenge Rong
  • Zhang Xiong

Text matching is a core issue for question answering (QA), information retrieval (IR) and many other fields. We propose to reformulate the original text, i.e., generate a new text that is semantically equivalent to the original, to improve the degree of text matching. Intuitively, the generated text increases the mutual information between the two text sequences. We employ a generative adversarial network as the reformulation model, in which a discriminator guides the text generation process. In this work, we focus on matching questions and answers. The task is to rank answers by their QA matching degree. We first reformulate the original question without changing the asker’s intent, then compute a relevance score for each answer. To evaluate the method, we collected questions and answers from Zhihu. In addition, we conduct substantial experiments on public data such as SemEval and WikiQA to compare our method with existing methods. Experimental results demonstrate that after adding the reformulated question, the ranking performance of different matching models improves consistently, indicating that the reformulated question has enhanced mutual information and effectively bridged the semantic gap between questions and answers.

AAAI Conference 2018 Conference Paper

Incentivizing High Quality User Contributions: New Arm Generation in Bandit Learning

  • Yang Liu
  • Chien-Ju Ho

We study the problem of incentivizing high quality contributions in user generated content platforms, in which users arrive sequentially with unknown quality. We are interested in designing a content displaying strategy which decides which content should be shown to users, with the goal of maximizing user experience (i.e., the likelihood of users liking the content). This goal naturally leads to a joint problem of incentivizing high quality contributions and learning the unknown content quality. To address the incentive issue, we consider a model in which users are strategic in deciding whether to contribute and are motivated by exposure, i.e., they aim to maximize the number of times their contributions are viewed. From the learning perspective, we model content quality as the probability of obtaining positive feedback (e.g., a like or upvote) from a random user. Naturally, the platform needs to resolve the classical trade-off between exploration (collecting feedback for all content) and exploitation (displaying the best content). We formulate this problem as a multi-armed bandit problem, where the number of arms (i.e., contributions) increases over time and depends on the strategic choices of arriving users. We first show that applying standard bandit algorithms incentivizes a flood of low cost contributions, which in turn leads to linear regret. We then propose Rand UCB, which adds an additional layer of randomization on top of the UCB algorithm to address the issue of flooding contributions. We show that Rand UCB eliminates the incentives for low quality contributions, provides incentives for high quality contributions (due to the bounded number of explorations of the low quality ones), and achieves sub-linear regret with respect to displaying the current best arms.
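For context, the base learner that Rand UCB randomizes on top of is plain UCB1 over a fixed arm set. The sketch below shows only that baseline with Bernoulli "like" feedback; Rand UCB's randomization layer and the strategic, growing arm set described above are omitted, and the arm means are made-up numbers.

```python
import math
import random

def ucb1(true_means, horizon, seed=0):
    """Plain UCB1 with Bernoulli feedback (a like/upvote from a random
    user). This is only the exploration/exploitation baseline; it lacks
    the paper's randomization layer and arriving arms."""
    rng = random.Random(seed)
    pulls = [0] * len(true_means)
    wins = [0.0] * len(true_means)
    for t in range(1, horizon + 1):
        if t <= len(true_means):
            a = t - 1                        # initialize: pull each arm once
        else:
            a = max(range(len(true_means)),  # highest upper confidence bound
                    key=lambda i: wins[i] / pulls[i]
                    + math.sqrt(2 * math.log(t) / pulls[i]))
        reward = 1.0 if rng.random() < true_means[a] else 0.0
        pulls[a] += 1
        wins[a] += reward
    return pulls
```

Over a long horizon the index concentrates the display budget on the highest-quality arm while still sampling the others logarithmically often, the behaviour the abstract says must be modified once arms are contributed strategically.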

NeurIPS Conference 2018 Conference Paper

Inference Aided Reinforcement Learning for Incentive Mechanism Design in Crowdsourcing

  • Zehong Hu
  • Yitao Liang
  • Jie Zhang
  • Zhao Li
  • Yang Liu

Incentive mechanisms for crowdsourcing are designed to incentivize financially self-interested workers to generate and report high-quality labels. Existing mechanisms are often developed as one-shot static solutions, assuming a certain level of knowledge about worker models (expertise levels, costs for exerting effort, etc.). In this paper, we propose a novel inference-aided reinforcement mechanism that acquires data sequentially and requires no such prior assumptions. Specifically, we first design a Gibbs sampling augmented Bayesian inference algorithm to estimate workers' labeling strategies from the collected labels at each step. Then we propose a reinforcement incentive learning (RIL) method, built on top of the above estimates, to uncover how workers respond to different payments. RIL dynamically determines the payment without accessing any ground-truth labels. We theoretically prove that RIL is able to incentivize rational workers to provide high-quality labels both at each step and in the long run. Empirical results show that our mechanism performs consistently well under both rational and non-fully rational (adaptive learning) worker models. Besides, the payments offered by RIL are more robust and have lower variance than existing one-shot mechanisms.

IJCAI Conference 2018 Conference Paper

Joint Learning of Phenotypes and Diagnosis-Medication Correspondence via Hidden Interaction Tensor Factorization

  • Kejing Yin
  • William K. Cheung
  • Yang Liu
  • Benjamin C. M. Fung
  • Jonathan Poon

Non-negative tensor factorization has been shown effective for discovering phenotypes from EHR data with minimal human supervision. In most cases, an interaction tensor of the elements in the EHR (e.g., diagnoses and medications) has to be established before the factorization can be applied. Such correspondence information, however, is often missing. While different heuristics can be used to estimate the missing correspondence, any errors introduced will in turn cause inaccuracy in the subsequent phenotype discovery task. This is especially true for patients with multiple diagnosed diseases (e.g., under critical care). To alleviate this limitation, we propose hidden interaction tensor factorization (HITF), in which the diagnosis-medication correspondence and the underlying phenotypes are inferred simultaneously. We formulate it under a Poisson non-negative tensor factorization framework and learn the HITF model via maximum likelihood estimation. For performance evaluation, we applied HITF to the MIMIC III dataset. Our empirical results show that both the phenotypes and the correspondence inferred are clinically meaningful. In addition, the inferred HITF model outperforms a number of state-of-the-art methods for mortality prediction.

IJCAI Conference 2018 Conference Paper

Learning with Adaptive Neighbors for Image Clustering

  • Yang Liu
  • Quanxue Gao
  • Zhaohua Yang
  • Shujian Wang

Due to the importance and efficiency of learning complex structures hidden in data, graph-based methods have been widely studied and proven successful in unsupervised learning. Generally, most existing graph-based clustering methods require post-processing on the original data graph to extract the clustering indicators. However, there are two drawbacks to these methods: (1) the cluster structures are not explicit in the clustering results; (2) the final clustering performance is sensitive to the construction of the original data graph. To solve these problems, in this paper a novel learning model is proposed to learn a graph based on the given data graph such that the newly obtained optimal graph is more suitable for the clustering task. We also propose an efficient algorithm to solve the model. Extensive experimental results illustrate that the proposed model outperforms other state-of-the-art clustering algorithms.

IJCAI Conference 2018 Conference Paper

Multi-scale and Discriminative Part Detectors Based Features for Multi-label Image Classification

  • Gong Cheng
  • Decheng Gao
  • Yang Liu
  • Junwei Han

Convolutional neural networks (CNNs) have shown their promise for the image classification task. However, global CNN features still lack the geometric invariance needed to address intra-class variations and so are not optimal for multi-label image classification. This paper proposes a new and effective framework built upon CNNs to learn Multi-scale and Discriminative Part Detectors (MsDPD)-based feature representations for multi-label image classification. Specifically, at each scale level, we (i) first present an entropy-rank based scheme to generate and select a set of discriminative part detectors (DPD), and then (ii) obtain a number of DPD-based convolutional feature maps, with each feature map representing the occurrence probability of a particular part detector, and learn DPD-based features by using a task-driven pooling scheme. The two steps are formulated into a unified framework by developing a new objective function, which jointly trains part detectors incrementally and integrates the learning of feature representations into the classification task. Finally, the multi-scale features are fused to produce the predictions. Experimental results on the PASCAL VOC 2007 and VOC 2012 datasets demonstrate that the proposed method achieves better accuracy than existing state-of-the-art multi-label classification methods.

AAAI Conference 2018 Conference Paper

Robust Formulation for PCA: Avoiding Mean Calculation With L2,p-norm Maximization

  • Shuangli Liao
  • Jin Li
  • Yang Liu
  • Quanxue Gao
  • Xinbo Gao

Most existing robust principal component analysis (PCA) methods involve mean estimation for extracting the low-dimensional representation. However, they do not obtain the optimal mean for real data, which include outliers, under different robust distance metrics such as the L1-norm and L2,1-norm. This affects the robustness of the algorithms. Motivated by the fact that the variance of data can be characterized by the variation between each pair of data points, we propose a novel robust formulation for PCA that avoids computing the mean of the data in the criterion function. Our method employs the L2,p-norm as the distance metric to measure the variation in the criterion function, and seeks the projection matrix that maximizes the sum of the variation between each pair of projected data points. Both theoretical analysis and experimental results demonstrate that our methods are efficient and superior to most existing robust methods for data reconstruction.

IJCAI Conference 2018 Conference Paper

Zero Shot Learning via Low-rank Embedded Semantic AutoEncoder

  • Yang Liu
  • Quanxue Gao
  • Jin Li
  • Jungong Han
  • Ling Shao

Zero-shot learning (ZSL) has been widely researched and has achieved success in machine learning. Most existing ZSL methods aim to accurately recognize objects of unseen classes by learning a shared mapping from the feature space to a semantic space. However, such methods do not investigate in depth whether the mapping can precisely reconstruct the original visual features. Motivated by the fact that data often have low intrinsic dimensionality, e.g., lie in a low-dimensional subspace, we formulate a novel framework named Low-rank Embedded Semantic AutoEncoder (LESAE) to jointly seek a low-rank mapping that links visual features with their semantic representations. Following the encoder-decoder paradigm, the encoder part learns a low-rank mapping from the visual features to the semantic space, while the decoder part reconstructs the original data with the learned mapping. In addition, a non-greedy iterative algorithm is adopted to solve our model. Extensive experiments on six benchmark datasets demonstrate its superiority over several state-of-the-art algorithms.

AAAI Conference 2018 Conference Paper

Zero-Resource Neural Machine Translation with Multi-Agent Communication Game

  • Yun Chen
  • Yang Liu
  • Victor Li

While end-to-end neural machine translation (NMT) has achieved notable success in the past years in translating a handful of resource-rich language pairs, it still suffers from the data scarcity problem for low-resource language pairs and domains. To tackle this problem, we propose an interactive multimodal framework for zero-resource neural machine translation. Instead of being passively exposed to large amounts of parallel corpora, our learners (implemented as encoder-decoder architecture) engage in cooperative image description games, and thus develop their own image captioning or neural machine translation model from the need to communicate in order to succeed at the game. Experimental results on the IAPR-TC12 and Multi30K datasets show that the proposed learning mechanism significantly improves over the state-of-the-art methods.

EAAI Journal 2017 Journal Article

A hybrid harmony search algorithm with efficient job sequence scheme and variable neighborhood search for the permutation flow shop scheduling problems

  • Fuqing Zhao
  • Yang Liu
  • Yi Zhang
  • Weimin Ma
  • Chuck Zhang

The permutation flow shop scheduling problem (PFSSP), one of the most widely studied production scheduling problems, is a typical NP-hard combinatorial optimization problem. In this paper, a hybrid harmony search algorithm with an efficient job sequence mapping scheme and variable neighborhood search (VNS), named HHS, is proposed to solve the PFSSP with the objective of minimizing the makespan. First of all, to extend the HHS algorithm to solve the PFSSP effectively, an efficient smallest order value (SOV) rule based on random keys is introduced to convert a continuous harmony vector into a discrete job permutation, after fully investigating the effect of different job sequence mapping schemes. Secondly, an effective initialization scheme, based on the NEH heuristic combined with a chaotic sequence, is employed with the aim of improving the solution quality of the initial harmony memory (HM). Thirdly, an opposition-based learning technique in the selection process and the best harmony (best individual) in the pitch adjustment process are exploited to accelerate convergence and improve solution accuracy. Meanwhile, parameter sensitivity is studied to investigate the properties of HHS, and recommended values for the parameters adopted in HHS are presented. Finally, by making use of a novel variable neighborhood search, efficient insert and swap structures are incorporated into HHS to adequately emphasize local exploitation ability. Experimental simulations and comparisons on both continuous and combinatorial benchmark problems demonstrate that the HHS algorithm outperforms the standard HS algorithm and other recently proposed efficient algorithms in terms of solution quality and stability.
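The smallest order value (SOV) mapping mentioned in the abstract ranks the components of a continuous harmony vector to obtain a discrete job permutation. A minimal sketch, interpreting SOV as an argsort (details of the paper's random-key variant may differ):

```python
def sov(harmony):
    """Map a continuous harmony vector to a job permutation: the job with
    the smallest component is scheduled first, and so on (an argsort)."""
    return sorted(range(len(harmony)), key=harmony.__getitem__)
```

This is what lets a continuous metaheuristic like harmony search explore the discrete space of job permutations.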

AAAI Conference 2017 Conference Paper

Bilingual Lexicon Induction from Non-Parallel Data with Minimal Supervision

  • Meng Zhang
  • Haoruo Peng
  • Yang Liu
  • Huanbo Luan
  • Maosong Sun

Building bilingual lexica from non-parallel data is a longstanding natural language processing research problem that could benefit thousands of resource-scarce languages which lack parallel data. Recent advances of continuous word representations have opened up new possibilities for this task, e. g. by establishing cross-lingual mapping between word embeddings via a seed lexicon. The method is however unreliable when there are only a limited number of seeds, which is a reasonable setting for resource-scarce languages. We tackle the limitation by introducing a novel matching mechanism into bilingual word representation learning. It captures extra translation pairs exposed by the seeds to incrementally improve the bilingual word embeddings. In our experiments, we find the matching mechanism to substantially improve the quality of the bilingual vector space, which in turn allows us to induce better bilingual lexica with seeds as few as 10.

IJCAI Conference 2017 Conference Paper

Crowd Learning: Improving Online Decision Making Using Crowdsourced Data

  • Yang Liu
  • Mingyan Liu

We analyze an online learning problem that arises in crowdsourcing systems for users facing crowdsourced data: a user at each discrete time step t can choose K out of a total of N options (bandits), and receives randomly generated rewards dependent on user-specific and option-specific statistics unknown to the user. Each user aims to maximize her expected total rewards over a certain time horizon through a sequence of exploration and exploitation steps. Different from the typical regret/bandit learning setting, in this case a user may also exploit crowdsourced information to augment her learning process, i. e. , other users' choices or rewards from using these options. We consider two scenarios, one in which only their choices are shared, and the other in which users share full information including their choices and subsequent rewards. In both cases we derive bounds on the weak regret, the difference between the user's expected total reward and the reward from a user-specific best single-action policy; and show how they improve over their individual-learning counterpart. We also evaluate the performance of our algorithms using simulated data as well as the real-world movie ratings dataset MovieLens.

TCS Journal 2017 Journal Article

Fast quantum algorithms for least squares regression and statistic leverage scores

  • Yang Liu
  • Shengyu Zhang

Least squares regression is the simplest and most widely used technique for solving overdetermined systems of linear equations Ax = b, where A ∈ ℝ^{n×p} has full column rank and b ∈ ℝ^n. Though there is a well-known unique solution x* ∈ ℝ^p minimizing the squared error ‖Ax − b‖₂², the best known classical algorithm to find x* takes time Ω(n), even for sparse and well-conditioned matrices A, a fairly large class of input instances commonly seen in practice. In this paper, we design an efficient quantum algorithm to generate a quantum state proportional to |x*⟩. The algorithm takes only O(log n) time for sparse and well-conditioned A. When the condition number of A is large, a canonical solution is to use regularization. We give efficient quantum algorithms for two regularized regression problems, namely ridge regression and δ-truncated SVD, with similar costs and solution approximation. Given a matrix A ∈ ℝ^{n×p} of rank r with SVD A = UΣVᵀ, where U ∈ ℝ^{n×r}, Σ ∈ ℝ^{r×r} and V ∈ ℝ^{p×r}, the statistical leverage scores of A are the squared row norms of U, defined as sᵢ = ‖Uᵢ‖₂² for i = 1, …, n. The matrix coherence is the largest statistical leverage score. These quantities play an important role in many machine learning algorithms. The best known classical algorithm to approximate these values runs in time Ω(np). In this work, we introduce an efficient quantum algorithm to approximate sᵢ in time O(log n) when A is sparse and the ratio between A's largest singular value and smallest non-zero singular value is constant. This gives an exponential speedup over the best known classical algorithms. Unlike previous quantum speedups, which are mainly algebraic or number-theoretic, this problem is linear-algebraic. It also differs from previous quantum algorithms for solving linear equations and least squares regression, whose outputs compress the p-dimensional solution into a log(p)-qubit quantum state.

TIST Journal 2017 Journal Article

Implicit Visual Learning

  • Yan Liu
  • Yang Liu
  • Shenghua Zhong
  • Songtao Wu

According to the involvement of consciousness, human learning can be roughly classified into explicit learning and implicit learning. In strong contrast to explicit learning with clear targets and rules, such as school study of mathematics, learning is implicit when we acquire new information without intending to do so. Research from psychology indicates that implicit learning is ubiquitous in our daily life. Moreover, implicit learning plays an important role in human visual perception. But in the past 60 years, most well-known machine-learning models have aimed to simulate explicit learning, while work on modeling implicit learning has been relatively limited, especially for computer vision applications. This article proposes a novel unsupervised computational model for implicit visual learning by exploring dissipative systems, which provide a unifying macroscopic theory connecting biology with physics. We test the proposed Dissipative Implicit Learning Model (DILM) on various datasets. The experiments show that DILM not only provides a good match to human behavior but also noticeably improves explicit machine-learning performance on image classification tasks.

IJCAI Conference 2017 Conference Paper

Joint Training for Pivot-based Neural Machine Translation

  • Yong Cheng
  • Qian Yang
  • Yang Liu
  • Maosong Sun
  • Wei Xu

While recent neural machine translation approaches have delivered state-of-the-art performance for resource-rich language pairs, they suffer from the data scarcity problem for resource-scarce language pairs. Although this problem can be alleviated by exploiting a pivot language to bridge the source and target languages, the source-to-pivot and pivot-to-target translation models are usually independently trained. In this work, we introduce a joint training algorithm for pivot-based neural machine translation. We propose three methods to connect the two models and enable them to interact with each other during training. Experiments on Europarl and WMT corpora show that joint training of source-to-pivot and pivot-to-target models leads to significant improvements over independent training across various languages.

AAAI Conference 2017 Conference Paper

Lattice-Based Recurrent Neural Network Encoders for Neural Machine Translation

  • Jinsong Su
  • Zhixing Tan
  • Deyi Xiong
  • Rongrong Ji
  • Xiaodong Shi
  • Yang Liu

Neural machine translation (NMT) heavily relies on word-level modelling to learn semantic representations of input sentences. However, for languages without natural word delimiters (e.g., Chinese), where input sentences have to be tokenized first, conventional NMT is confronted with two issues: 1) it is difficult to find an optimal tokenization granularity for source sentence modelling, and 2) errors in 1-best tokenizations may propagate to the encoder of NMT. To handle these issues, we propose word-lattice based Recurrent Neural Network (RNN) encoders for NMT, which generalize the standard RNN to word lattice topology. The proposed encoders take as input a word lattice that compactly encodes multiple tokenizations, and learn to generate new hidden states from arbitrarily many inputs and hidden states in preceding time steps. As such, the word-lattice based encoders not only alleviate the negative impact of tokenization errors but also are more expressive and flexible in embedding input sentences. Experiment results on Chinese-English translation demonstrate the superiority of the proposed encoders over the conventional encoder.

IJCAI Conference 2017 Conference Paper

Maximum Expected Likelihood Estimation for Zero-resource Neural Machine Translation

  • Hao Zheng
  • Yong Cheng
  • Yang Liu

While neural machine translation (NMT) has made remarkable progress in translating a handful of high-resource language pairs recently, parallel corpora are not always available for many zero-resource language pairs. To deal with this problem, we propose an approach to zero-resource NMT via maximum expected likelihood estimation. The basic idea is to maximize the expectation with respect to a pivot-to-source translation model for the intended source-to-target model on a pivot-target parallel corpus. To approximate the expectation, we propose two methods to connect the pivot-to-source and source-to-target models. Experiments on two zero-resource language pairs show that the proposed approach yields substantial gains over baseline methods. We also observe that when trained jointly with the source-to-target model, the pivot-to-source translation model also obtains improvements over independent training.

AAAI Conference 2017 Conference Paper

Maximum Reconstruction Estimation for Generative Latent-Variable Models

  • Yong Cheng
  • Yang Liu
  • Wei Xu

Generative latent-variable models are important for natural language processing due to their capability of providing compact representations of data. As conventional maximum likelihood estimation (MLE) is prone to focus on explaining irrelevant but common correlations in data, we apply maximum reconstruction estimation (MRE) to learning generative latent-variable models alternatively, which aims to find model parameters that maximize the probability of reconstructing the observed data. We develop tractable algorithms to directly learn hidden Markov models and IBM translation models using the MRE criterion, without the need to introduce a separate reconstruction model to facilitate efficient inference. Experiments on unsupervised part-of-speech induction and unsupervised word alignment show that our approach enables generative latent-variable models to better discover intended correlations in data and outperforms maximum likelihood estimators significantly.

AAAI Conference 2017 Conference Paper

Neural Machine Translation with Reconstruction

  • Zhaopeng Tu
  • Yang Liu
  • Lifeng Shang
  • Xiaohua Liu
  • Hang Li

Although end-to-end Neural Machine Translation (NMT) has achieved remarkable progress in the past two years, it suffers from a major drawback: translations generated by NMT systems often lack adequacy. It has been widely observed that NMT tends to repeatedly translate some source words while mistakenly ignoring others. To alleviate this problem, we propose a novel encoder-decoder-reconstructor framework for NMT. The reconstructor, incorporated into the NMT model, reconstructs the input source sentence from the hidden layer of the output target sentence, to ensure that the information on the source side is carried over to the target side as much as possible. Experiments show that the proposed framework significantly improves the adequacy of NMT output and achieves superior translation results over state-of-the-art NMT and statistical MT systems.

AAAI Conference 2017 Conference Paper

Sequential Peer Prediction: Learning to Elicit Effort using Posted Prices

  • Yang Liu
  • Yiling Chen

Peer prediction mechanisms are often adopted to elicit truthful contributions from crowd workers when no ground-truth verification is available. Recently, mechanisms of this type have been developed to incentivize effort exertion, in addition to truthful elicitation. In this paper, we study a sequential peer prediction problem where a data requester wants to dynamically determine the reward level to optimize the trade-off between the quality of information elicited from workers and the total expected payment. In this problem, workers have homogeneous expertise and heterogeneous costs for exerting effort, both unknown to the requester. We propose a sequential posted-price mechanism to dynamically learn the optimal reward level from workers’ contributions and to incentivize effort exertion and truthful reporting. We show that (1) in our mechanism, workers exerting effort according to a non-degenerate threshold policy and then reporting truthfully is an equilibrium that returns the highest utility for every worker, and (2) the regret of our learning mechanism w.r.t. offering the optimal reward (price) is upper bounded by Õ(T^{3/4}), where T is the learning horizon. We further show the power of our learning approach when the reports of workers do not necessarily follow the game-theoretic equilibrium.

NeurIPS Conference 2016 Conference Paper

A Bandit Framework for Strategic Regression

  • Yang Liu
  • Yiling Chen

We consider a learner's problem of acquiring data dynamically for training a regression model, where the training data are collected from strategic data sources. A fundamental challenge is to incentivize data holders to exert effort to improve the quality of their reported data, despite the fact that this quality is not directly verifiable by the learner. In this work, we study a dynamic data acquisition process where data holders can contribute multiple times. Using a bandit framework, we leverage the long-term incentive of future job opportunities to incentivize high-quality contributions. We propose a Strategic Regression-Upper Confidence Bound (SR-UCB) framework, a UCB-style index combined with a simple payment rule, where the index of a worker approximates the quality of his past contributions and is used by the learner to determine whether the worker receives future work. For linear regression and a certain family of non-linear regression problems, we show that SR-UCB enables a $O(\sqrt{\log T/T})$-Bayesian Nash Equilibrium (BNE) in which each worker exerts the target effort level chosen by the learner, with $T$ being the number of data acquisition stages. The SR-UCB framework also has other desirable properties: (1) the indexes can be updated in an online fashion (hence computationally light); (2) a slight variant, namely Private SR-UCB (PSR-UCB), is able to preserve $(O(\log^{-1} T), O(\log^{-1} T))$-differential privacy for workers' data, with only a small compromise on incentives (achieving a $O(\log^{6} T/\sqrt{T})$-BNE).

IJCAI Conference 2016 Conference Paper

Agreement-Based Joint Training for Bidirectional Attention-Based Neural Machine Translation

  • Yong Cheng
  • Shiqi Shen
  • Zhongjun He
  • Wei He
  • Hua Wu
  • Maosong Sun
  • Yang Liu

The attentional mechanism has proven to be effective in improving end-to-end neural machine translation. However, due to the intricate structural divergence between natural languages, unidirectional attention-based models might only capture partial aspects of attentional regularities. We propose agreement-based joint training for bidirectional attention-based end-to-end neural machine translation. Instead of training source-to-target and target-to-source translation models independently, our approach encourages the two complementary models to agree on word alignment matrices on the same training data. Experiments on Chinese-English and English-French translation tasks show that agreement-based joint training significantly improves both alignment and translation quality over independent training.

AAAI Conference 2016 Conference Paper

Building Earth Mover’s Distance on Bilingual Word Embeddings for Machine Translation

  • Meng Zhang
  • Yang Liu
  • Huanbo Luan
  • Maosong Sun
  • Tatsuya Izuha
  • Jie Hao

Following their monolingual counterparts, bilingual word embeddings are also on the rise. As a major application task, word translation has been relying on the nearest neighbor to connect embeddings cross-lingually. However, the nearest neighbor strategy suffers from its inherently local nature and fails to cope with variations in realistic bilingual word embeddings. Furthermore, it lacks a mechanism to deal with many-to-many mappings that often show up across languages. We introduce Earth Mover’s Distance to this task by providing a natural formulation that translates words in a holistic fashion, addressing the limitations of the nearest neighbor. We further extend the formulation to a new task of identifying parallel sentences, which is useful for statistical machine translation systems, thereby expanding the application realm of bilingual word embeddings. We show encouraging performance on both tasks.

AAAI Conference 2016 Conference Paper

Finding One’s Best Crowd: Online Learning By Exploiting Source Similarity

  • Yang Liu
  • Mingyan Liu

We consider an online learning problem (classification or prediction) involving disparate sources of sequentially arriving data, whereby a user over time learns the best set of data sources to use in constructing the classifier by exploiting their similarity. We first show that, when (1) the similarity information among data sources is known, and (2) data from different sources can be acquired without cost, then a judicious selection of data from different sources can effectively enlarge the training sample size compared to using a single data source, thereby improving the rate and performance of learning; this is achieved by bounding the classification error of the resulting classifier. We then relax assumption (1) and characterize the loss in learning performance when the similarity information must also be acquired through repeated sampling. We further relax both (1) and (2) and present a cost-efficient algorithm that identifies a best crowd from a potentially large set of data sources in terms of both classifier performance and data acquisition cost. This problem has various applications, including online prediction systems with time series data of various forms, such as financial markets, advertisement and network measurement.

AAAI Conference 2016 Conference Paper

Implicit Discourse Relation Classification via Multi-Task Neural Networks

  • Yang Liu
  • Sujian Li
  • Xiaodong Zhang
  • Zhifang Sui

Without discourse connectives, classifying implicit discourse relations is a challenging task and a bottleneck for building a practical discourse parser. Previous research usually makes use of one kind of discourse framework such as PDTB or RST to improve the classification performance on discourse relations. Actually, under different discourse annotation frameworks, there exist multiple corpora which have internal connections. To exploit the combination of different discourse corpora, we design related discourse classification tasks specific to a corpus, and propose a novel Convolutional Neural Network embedded multi-task learning system to synthesize these tasks by learning both unique and shared representations for each task. The experimental results on the PDTB implicit discourse relation classification task demonstrate that our model achieves significant gains over baseline systems.

AAAI Conference 2016 Conference Paper

Is It Harmful When Advisors Only Pretend to Be Honest?

  • Dongxia Wang
  • Tim Muller
  • Jie Zhang
  • Yang Liu

In trust systems, unfair rating attacks – where advisors provide ratings dishonestly – influence the accuracy of trust evaluation. A secure trust system should function properly under all possible unfair rating attacks, including dynamic attacks. In the literature, camouflage attacks are the most studied dynamic attacks, but an open question is whether more harmful dynamic attacks exist. We propose random processes to model and measure dynamic attacks. The harm of an attack is influenced by a user’s ability to learn from the past. We consider three types of users: blind users, aware users, and general users. We found that for all three types, camouflage attacks are far from the most harmful. We identified the most harmful attacks, under which we found that the ratings may still be useful to users.

IJCAI Conference 2016 Conference Paper

Learning to Incentivize: Eliciting Effort via Output Agreement

  • Yang Liu
  • Yiling Chen

In crowdsourcing, when there is a lack of verification for contributed answers, output agreement mechanisms are often used to incentivize participants to provide truthful answers when the correct answer is held by the majority. In this paper, we focus on using output agreement mechanisms to elicit effort, in addition to eliciting truthful answers, from a population of workers. We consider a setting where workers have heterogeneous costs of effort exertion and examine the data requester's problem of deciding the reward level in output agreement for optimal elicitation. In particular, when the requester knows the cost distribution, we derive the optimal reward level for output agreement mechanisms. This is achieved by first characterizing Bayesian Nash equilibria of output agreement mechanisms for a given reward level. When the cost distribution is unknown to the requester, we develop sequential mechanisms that combine learning the cost distribution with incentivizing effort exertion to approximately determine the optimal reward level.
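
The basic output agreement rule, and the threshold structure of effort exertion that the abstract analyzes, can be sketched as follows. The matching probabilities and the simple linear utility are assumptions for illustration, not the paper's model.

```python
def output_agreement_payment(report_i, report_j, reward):
    """Basic output agreement rule: a worker is paid the reward iff
    their answer matches a randomly chosen peer's answer."""
    return reward if report_i == report_j else 0.0

def exerts_effort(cost, reward, p_match_effort, p_match_no_effort):
    """Threshold rule (assumed form): a worker exerts effort iff the
    expected gain in matching probability, scaled by the reward,
    covers the worker's private cost of effort."""
    return reward * (p_match_effort - p_match_no_effort) >= cost
```

Under this toy utility, raising the reward raises the cost threshold below which workers exert effort, which is the trade-off the requester's reward-level choice navigates.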

JAIR Journal 2016 Journal Article

ProMoca: Probabilistic Modeling and Analysis of Agents in Commitment Protocols

  • Akın Günay
  • Yang Liu
  • Jie Zhang

Social commitment protocols regulate interactions of agents in multiagent systems. Several methods have been developed to analyze properties of commitment protocols. However, analysis of an agent's behavior in a commitment protocol, which should take into account the agent's goals and beliefs, has received less attention. In this paper we present the ProMoca framework to address this issue. Firstly, we develop an expressive formal language to model agents with respect to their commitments. Our language provides dedicated elements to define commitment protocols, and model agents in terms of their goals, behaviors, and beliefs. Furthermore, our language provides probabilistic and non-deterministic elements to model uncertainty in agents' beliefs. Secondly, we identify two essential properties of an agent with respect to a commitment protocol, namely compliance and goal satisfaction. We formalize these properties using a probabilistic variant of linear temporal logic. Thirdly, we adapt a probabilistic model checking algorithm to automatically analyze compliance and goal satisfaction properties. Finally, we present empirical results about the efficiency and scalability of ProMoca.

AAAI Conference 2016 Conference Paper

To Swap or Not to Swap? Exploiting Dependency Word Pairs for Reordering in Statistical Machine Translation

  • Christian Hadiwinoto
  • Yang Liu
  • Hwee Tou Ng

Reordering poses a major challenge in machine translation (MT) between two languages with significant differences in word order. In this paper, we present a novel reordering approach utilizing sparse features based on dependency word pairs. Each instance of these features captures whether two words, which are related by a dependency link in the source sentence dependency parse tree, follow the same order or are swapped in the translation output. Experiments on Chinese-to-English translation show a statistically significant improvement of 1.21 BLEU points using our approach, compared to a state-of-the-art statistical MT system that incorporates prior reordering approaches.

EAAI Journal 2015 Journal Article

A flood inundation modelling using v-support vector machine regression model

  • Yang Liu
  • Gareth Pender

Full two-dimensional (2D) hydrodynamic models have proven to be successful in a wide range of applications. The limitation of using full 2D models is their expensive computational requirement. Flood risk analysis and model uncertainty analysis usually need to run the numerical model and evaluate its performance thousands of times. However, in real-world applications, there is simply not enough time and resources to perform such a huge number of model runs. In this study, a computational framework, known as v-Support Vector Regression (SVR)-Fine Grid Model (FGM) or linear regression (LR)-FGM, is presented for solving computationally expensive simulation problems. The v-SVR-FGM or LR-FGM approach is demonstrated via a small number of FGM runs using a nonlinear or linear regression model with data preprocessing. The approximation model predicts the results of the FGM instead of running the time-consuming FGM itself. This approach can substantially reduce computational running time without loss of the FGM's accuracy. The simulation results suggest that the proposed method is able to achieve good predictive results (water depth and velocity) as well as provide considerable savings in computer time.

AAAI Conference 2015 Conference Paper

A Novel Neural Topic Model and Its Supervised Extension

  • Ziqiang Cao
  • Sujian Li
  • Yang Liu
  • Wenjie Li
  • Heng Ji

Topic modeling techniques have the benefits of modeling words and documents uniformly under a probabilistic framework. However, they also suffer from the limitations of sensitivity to initialization and unigram topic distribution, which can be remedied by deep learning techniques. To explore the combination of topic modeling and deep learning techniques, we first explain the standard topic model from the perspective of a neural network. Based on this, we propose a novel neural topic model (NTM) where the representations of words and documents are efficiently and naturally combined into a uniform framework. Extending from NTM, we can easily add a label layer and propose the supervised neural topic model (sNTM) to tackle supervised tasks. Experiments show that our models are competitive in both topic discovery and classification/regression tasks.

AAAI Conference 2015 Conference Paper

Automated Analysis of Commitment Protocols Using Probabilistic Model Checking

  • Akın Günay
  • Song Songzheng
  • Yang Liu
  • Jie Zhang

Commitment protocols provide an effective formalism for the regulation of agent interaction. Although existing work mainly focuses on the design-time development of static commitment protocols, recent studies propose methods to create them dynamically at run-time with respect to the goals of the agents. These methods require agents to verify new commitment protocols taking their goals, and beliefs about the other agents’ behavior, into account. Accordingly, in this paper, we first propose a probabilistic model to formally capture commitment protocols according to agents’ beliefs. Secondly, we identify a set of important properties for the verification of a new commitment protocol from an agent’s perspective and formalize these properties in our model. Thirdly, we develop probabilistic model checking algorithms with advanced reduction for efficient verification of these properties. Finally, we implement these algorithms as a tool and evaluate the proposed properties over different commitment protocols.

AAAI Conference 2015 Conference Paper

Contrastive Unsupervised Word Alignment with Non-Local Features

  • Yang Liu
  • Maosong Sun

Word alignment is an important natural language processing task that indicates the correspondence between natural languages. Recently, unsupervised learning of log-linear models for word alignment has received considerable attention as it combines the merits of generative and discriminative approaches. However, a major challenge still remains: it is intractable to calculate the expectations of non-local features that are critical for capturing the divergence between natural languages. We propose a contrastive approach that aims to differentiate observed training examples from noises. It not only introduces prior knowledge to guide unsupervised learning but also cancels out partition functions. Based on the observation that the probability mass of log-linear models for word alignment is usually highly concentrated, we propose to use top-n alignments to approximate the expectations with respect to posterior distributions. This allows for efficient and accurate calculation of expectations of non-local features. Experiments show that our approach achieves significant improvements over state-of-the-art unsupervised word alignment methods.

IJCAI Conference 2015 Conference Paper

Iterative Learning of Parallel Lexicons and Phrases from Non-Parallel Corpora

  • Meiping Dong
  • Yang Liu
  • Huanbo Luan
  • Maosong Sun
  • Tatsuya Izuha
  • Dakun Zhang

While parallel corpora are an indispensable resource for data-driven multilingual natural language processing tasks such as machine translation, they are limited in quantity, quality and coverage. As a result, learning translation models from nonparallel corpora has become increasingly important nowadays, especially for low-resource languages. In this work, we propose a joint model for iteratively learning parallel lexicons and phrases from non-parallel corpora. The model is trained using a Viterbi EM algorithm that alternates between constructing parallel phrases using lexicons and updating lexicons based on the constructed parallel phrases. Experiments on Chinese-English datasets show that our approach learns better parallel lexicons and phrases and improves translation performance significantly.

IJCAI Conference 2015 Conference Paper

Joint POS Tagging and Text Normalization for Informal Text

  • Chen Li
  • Yang Liu

Text normalization and part-of-speech (POS) tagging for social media data have been investigated recently; however, prior work has treated them separately. In this paper, we propose a joint Viterbi decoding process to determine each token’s POS tag and each non-standard token’s correct form at the same time. In order to evaluate our approach, we create two new data sets with POS tag labels and non-standard tokens’ correct forms. These are the first data sets with such annotation. The experiment results demonstrate the effect of non-standard words on POS tagging, and also show that our proposed methods perform better than the state-of-the-art systems in both POS tagging and normalization.
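
The joint decoding idea can be illustrated with a generic Viterbi over composite states, where each state would pair a POS tag with a candidate normalized form. The toy scoring functions below are stand-ins for the learned models, not the paper's actual features.

```python
def viterbi_joint(tokens, states, emit, trans, start):
    """Generic Viterbi over joint states. For the joint task each state
    is a (POS tag, normalized form) pair; emit/trans/start are log-score
    functions or tables supplied by the caller (toy stand-ins here)."""
    # dp[s] = (best log-score of a path ending in state s, that path)
    dp = {s: (start.get(s, float("-inf")) + emit(tokens[0], s), [s])
          for s in states}
    for tok in tokens[1:]:
        nxt = {}
        for s in states:
            best_score, best_path = max(
                ((dp[p][0] + trans.get((p, s), float("-inf")), dp[p][1])
                 for p in states),
                key=lambda x: x[0])
            nxt[s] = (best_score + emit(tok, s), best_path + [s])
        dp = nxt
    _, path = max(dp.values(), key=lambda x: x[0])
    return path
```

On a two-token example like "u c", decoding jointly can recover both the normalized forms ("you see") and their POS tags in a single pass, which is the point of coupling the two tasks.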

AAAI Conference 2015 Conference Paper

Learning Entity and Relation Embeddings for Knowledge Graph Completion

  • Yankai Lin
  • Zhiyuan Liu
  • Maosong Sun
  • Yang Liu
  • Xuan Zhu

Knowledge graph completion aims to perform link prediction between entities. In this paper, we consider the approach of knowledge graph embeddings. Recently, models such as TransE and TransH build entity and relation embeddings by regarding a relation as a translation from head entity to tail entity. We note that these models simply put both entities and relations within the same semantic space. In fact, an entity may have multiple aspects and various relations may focus on different aspects of entities, which makes a common space insufficient for modeling. We therefore propose TransR to build entity and relation embeddings in separate entity and relation spaces. Afterwards, we learn embeddings by first projecting entities from entity space to the corresponding relation space and then building translations between projected entities. In experiments, we evaluate our models on three tasks including link prediction, triple classification and relational fact extraction. Experimental results show significant and consistent improvements compared to state-of-the-art baselines including TransE and TransH. The source code of this paper can be obtained from https://github.com/mrlyk423/relation_extraction.
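
The TransR scoring idea, projecting entities into a relation-specific space before measuring how well the relation vector translates head to tail, can be sketched as follows. The matrices and vectors are toy values rather than learned embeddings, and the squared L2 dissimilarity is one common choice; this shows only the score function, not the margin-based training.

```python
def matvec(M, v):
    """Apply a relation-specific projection matrix M to an entity vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def transr_score(h, r, t, M_r):
    """TransR-style dissimilarity: project head h and tail t into the
    relation space with M_r, then measure how far M_r h + r falls from
    M_r t (squared L2 norm; lower means a more plausible triple)."""
    h_r, t_r = matvec(M_r, h), matvec(M_r, t)
    return sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h_r, r, t_r))
```

With an identity projection, a triple whose tail equals head-plus-relation scores 0, while a mismatched tail scores strictly higher, which is the ranking signal link prediction relies on.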

IJCAI Conference 2015 Conference Paper

Quantifying Robustness of Trust Systems against Collusive Unfair Rating Attacks Using Information Theory

  • Dongxia Wang
  • Tim Muller
  • Jie Zhang
  • Yang Liu

Unfair rating attacks happen in existing trust and reputation systems, lowering the quality of the systems. There exists a formal model that measures the maximum impact of independent attackers [Wang et al., 2015] – based on information theory. We improve on these results in multiple ways: (1) we alter the methodology to be able to reason about colluding attackers as well, and (2) we extend the method to be able to measure the strength of any attack (rather than just the strongest attack). Using (1), we identify the strongest collusion attacks, helping to construct robust trust systems. Using (2), we identify the strength of (classes of) attacks that we found in the literature. Based on this, we help to overcome a shortcoming of current research into collusion-resistance – specific (types of) attacks are used in simulations, disallowing direct comparisons between analyses of systems.

AAAI Conference 2015 Conference Paper

Topical Word Embeddings

  • Yang Liu
  • Zhiyuan Liu
  • Tat-Seng Chua
  • Maosong Sun

Most word embedding models typically represent each word using a single vector, which makes these models indiscriminative for ubiquitous homonymy and polysemy. In order to enhance discriminativeness, we employ latent topic models to assign topics for each word in the text corpus, and learn topical word embeddings (TWE) based on both words and their topics. In this way, contextual word embeddings can be flexibly obtained to measure contextual word similarity. We can also build document representations, which are more expressive than some widely-used document models such as latent topic models. In the experiments, we evaluate the TWE models on two tasks, contextual word similarity and text classification. The experimental results show that our models outperform typical word embedding models including the multi-prototype version on contextual word similarity, and also exceed latent topic models and other representative document models on text classification. The source code of this paper can be obtained from https://github.com/largelymfs/topical_word_embeddings.
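
The core TWE construction, a contextual embedding built from a word's vector and the vector of the topic assigned to it in that context, reduces to concatenation followed by cosine similarity. The two-dimensional vectors below are made-up illustrations, not trained embeddings.

```python
import math

def contextual_embedding(word_vec, topic_vec):
    """TWE-style contextual embedding: concatenate the word's vector
    with the vector of the topic assigned to it in this context."""
    return word_vec + topic_vec

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)
```

The payoff is that the same word under two different topic assignments (say, "bank" under a finance topic vs. a river topic) yields two different contextual embeddings, so polysemous uses stop being forced onto a single vector.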

JBHI Journal 2014 Journal Article

Adaptive Shape Prior Constrained Level Sets for Bladder MR Image Segmentation

  • Xianjing Qin
  • Xuelong Li
  • Yang Liu
  • Hongbing Lu
  • Pingkun Yan

Three-dimensional bladder wall segmentation for thickness measurement can be very useful for bladder magnetic resonance (MR) image analysis, since thickening of the bladder wall can indicate abnormality. However, it is a challenging task due to the artifacts inside the bladder lumen, weak boundaries in the apex and base areas, and complicated outside intensity distributions. To deal with these difficulties, in this paper, an adaptive shape prior constrained directional level set model is proposed to segment the inner and outer boundaries of the bladder wall. In addition, a coupled directional level set model is presented to refine the segmentation by exploiting the prior knowledge of region information and minimum thickness. With our proposed method, the influence of the artifacts in the bladder lumen and the complicated outside tissues surrounding the bladder can be appreciably reduced. Furthermore, leakage on the weak boundaries can be avoided. Compared with other related methods, better results were obtained on 11 patients' 3-D bladder MR images by using the proposed method.

AAAI Conference 2013 Conference Paper

An Extended GHKM Algorithm for Inducing Lambda-SCFG

  • Peng Li
  • Yang Liu
  • Maosong Sun

Semantic parsing, which aims at mapping a natural language (NL) sentence into its formal meaning representation (e.g., logical form), has received increasing attention in recent years. While synchronous context-free grammar (SCFG) augmented with lambda calculus (λ-SCFG) provides an effective mechanism for semantic parsing, how to learn such λ-SCFG rules still remains a challenge because of the difficulty in determining the correspondence between NL sentences and logical forms. To alleviate this structural divergence problem, we extend the GHKM algorithm, which is a state-of-the-art algorithm for learning synchronous grammars in statistical machine translation, to induce λ-SCFG from pairs of NL sentences and logical forms. By treating logical forms as trees, we reformulate the theory behind GHKM that gives formal semantics to the alignment between NL words and logical form tokens. Experiments on the GEOQUERY dataset show that our semantic parser achieves an F-measure of 90.2%, the best result published to date.

IJCAI Conference 2013 Conference Paper

Improving Question Retrieval in Community Question Answering Using World Knowledge

  • Guangyou Zhou
  • Yang Liu
  • Fang Liu
  • Daojian Zeng
  • Jun Zhao

Community question answering (cQA), which provides a platform for people with diverse background to share information and knowledge, has become an increasingly popular research topic. In this paper, we focus on the task of question retrieval. The key problem of question retrieval is to measure the similarity between the queried questions and the historical questions which have been solved by other users. The traditional methods measure the similarity based on the bag-of-words (BOWs) representation. This representation neither captures dependencies between related words, nor handles synonyms or polysemous words. In this work, we first propose a way to build a concept thesaurus based on the semantic relations extracted from the world knowledge of Wikipedia. Then, we develop a unified framework to leverage these semantic relations in order to enhance the question similarity in the concept space. Experiments conducted on a real cQA data set show that with the help of Wikipedia thesaurus, the performance of question retrieval is improved as compared to the traditional methods.

TCS Journal 2013 Journal Article

On testing monomials in multivariate polynomials

  • Zhixiang Chen
  • Bin Fu
  • Yang Liu
  • Robert Schweller

This paper presents a summary of our initial work on developing a theory of testing monomials in multivariate polynomials. The central question is to ask whether a polynomial represented by a certain economically compact structure has a multilinear monomial in its sum-product expansion. The complexity aspects of this problem and its variants are investigated with two objectives. One is to understand how this problem relates to critical problems in complexity, and if so to what extent. The other is to exploit possibilities of applying algebraic properties of polynomials to the study of those problems. A series of results about ΠΣΠ and ΠΣ polynomials is obtained in this paper, laying a basis for further study along this line. Several randomized and deterministic algorithms are devised for testing multilinear monomials or p-monomials in certain respective types of polynomials, where p is prime.
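
For the simplest case touched on in the abstract, a ΠΣ polynomial given as a product of sums of variables, the expansion contains a multilinear monomial exactly when one variable can be chosen from each factor with all choices distinct. A brute-force check makes the question concrete (exponential in the number of factors, where the paper's algorithms are far more efficient):

```python
from itertools import product

def has_multilinear_monomial(factors):
    """Brute-force test for a Pi-Sigma polynomial written as a product of
    sums of variables (each factor is a list of variable names): the
    expansion has a multilinear monomial iff some choice of one variable
    per factor uses each variable at most once. Illustration only."""
    for choice in product(*factors):
        if len(set(choice)) == len(choice):
            return True
    return False
```

For example, (x+y)(y+z)(x+z) expands to include the multilinear monomial xyz, while (x)(x) expands only to x^2, which is not multilinear.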

IJCAI Conference 2013 Conference Paper

Opinion Target Extraction Using Partially-Supervised Word Alignment Model

  • Kang Liu
  • Liheng Xu
  • Yang Liu
  • Jun Zhao

Mining opinion targets from online reviews is an important and challenging task in opinion mining. This paper proposes a novel approach to extract opinion targets by using partially-supervised word alignment model (PSWAM). At first, we apply PSWAM in a monolingual scenario to mine opinion relations in sentences and estimate the associations between words. Then, a graph-based algorithm is exploited to estimate the confidence of each candidate, and the candidates with higher confidence will be extracted as the opinion targets. Compared with existing syntax-based methods, PSWAM can effectively avoid parsing errors when dealing with informal sentences in online reviews. Compared with the methods using alignment model, PSWAM can capture opinion relations more precisely through partial supervision from partial alignment links. Moreover, when estimating candidate confidence, we make penalties on higher-degree vertices in our graph-based algorithm in order to decrease the probability of the random walk running into the unrelated regions in the graph. As a result, some errors can be avoided. The experimental results on three data sets with different sizes and languages show that our approach outperforms state-of-the-art methods.

YNIMG Journal 2012 Journal Article

Inferring consistent functional interaction patterns from natural stimulus FMRI data

  • Jiehuan Sun
  • Xintao Hu
  • Xiu Huang
  • Yang Liu
  • Kaiming Li
  • Xiang Li
  • Junwei Han
  • Lei Guo

There has been increasing interest in how the human brain responds to natural stimulus such as video watching in the neuroimaging field. Along this direction, this paper presents our effort in inferring consistent and reproducible functional interaction patterns under the natural stimulus of video watching among known functional brain regions identified by task-based fMRI. We applied and compared four statistical approaches to infer consistent and reproducible functional interaction patterns among these brain regions: Bayesian network modeling with the search algorithms greedy equivalence search (GES), Peter and Clark (PC) analysis, and independent multiple greedy equivalence search (IMaGES), as well as the commonly used Granger causality analysis (GCA). Interestingly, a number of reliable and consistent functional interaction patterns were identified by the GES, PC and IMaGES algorithms in different participating subjects when they watched multiple video shots of the same semantic category. These interaction patterns are meaningful given current neuroscience knowledge and are reasonably reproducible across different brains and video shots. In particular, these consistent functional interaction patterns are supported by structural connections derived from diffusion tensor imaging (DTI) data, suggesting the structural underpinnings of consistent functional interactions. Our work demonstrates that specific consistent patterns of functional interactions among relevant brain regions might reflect the brain's fundamental mechanisms of online processing and comprehension of video messages.

AAAI Conference 2012 Conference Paper

Sequence Labeling with Non-Negative Weighted Higher Order Features

  • Xian Qian
  • Yang Liu

In sequence labeling, using higher order features leads to high inference complexity. A lot of studies have been conducted to address this problem. In this paper, we propose a new exact decoding algorithm under the assumption that weights of all higher order features are non-negative. In the worst case, the time complexity of our algorithm is quadratic on the number of higher order features. Comparing with existing algorithms, our method is more efficient and easier to implement. We evaluate our method on two sequence labeling tasks: Optical Character Recognition and Chinese part-of-speech tagging. Our experimental results demonstrate that adding higher order features significantly improves the performance without much additional inference time.

AIIM Journal 2011 Journal Article

Exploring a corpus-based approach for detecting language impairment in monolingual English-speaking children

  • Keyur Gabani
  • Thamar Solorio
  • Yang Liu
  • Khairun-nisa Hassanali
  • Christine A. Dollaghan

Objectives: This paper explores the use of an automated method for analyzing narratives of monolingual English-speaking children to accurately predict the presence or absence of a language impairment. The goal is to exploit corpus-based approaches inspired by the fields of natural language processing and machine learning.
Methods and materials: We extract a large variety of features from language samples and use them to train language models and well-known machine learning algorithms as the underlying predictors. The methods are evaluated on two different datasets and three language tasks. One dataset contains samples of two spontaneous narrative tasks performed by 118 children with an average age of 13 years, and a second dataset contains play sessions from over 600 younger children with an average age of 6 years.
Results: We compare results against a cut-off baseline method and show that our results are far superior, reaching F-measures of over 85% in two of the three language tasks, and 48% in the third one.
Conclusions: The different experiments we present here show that corpus-based approaches can yield good prediction results in the problem of language impairment detection. These findings warrant further exploration of natural language processing techniques in the field of communication disorders. Moreover, the proposed framework can be easily adapted to analyze samples in languages other than English, since most of the features are language independent or can be customized with little effort.

TCS Journal 2011 Journal Article

Improved deterministic algorithms for weighted matching and packing problems

  • Jianer Chen
  • Qilong Feng
  • Yang Liu
  • Songjian Lu
  • Jianxin Wang

Based on the method of (n,k)-universal sets, we present a deterministic parameterized algorithm for the weighted r-d matching problem with time complexity O*(4^((r−1)k+o(k))), improving the previous best upper bound O*(4^(rk+o(k))). In particular, the algorithm applied to the unweighted 3-d matching problem results in a deterministic algorithm with time O*(16^(k+o(k))), improving the previous best result O*(21.26^k). For the weighted r-set packing problem, we present a deterministic parameterized algorithm with time complexity O*(2^((2r−1)k+o(k))), improving the previous best result O*(2^(2rk+o(k))). The algorithm, when applied to the unweighted 3-set packing problem, has running time O*(32^(k+o(k))), improving the previous best result O*(43.62^(k+o(k))). Moreover, for the weighted r-set packing and weighted r-d matching problems, we give a kernel of size O(k^r), which is the first kernelization algorithm for the weighted versions of these problems.

AAAI Conference 2011 Conference Paper

Ordinal Regression via Manifold Learning

  • Yang Liu
  • Yan Liu
  • Keith Chan

Ordinal regression is an important research topic in machine learning. It aims to automatically determine the implied rating of a data item on a fixed, discrete rating scale. In this paper, we present a novel ordinal regression approach via manifold learning, which is capable of uncovering the embedded nonlinear structure of the data set according to the observations in the high-dimensional feature space. By optimizing the order information of the observations and preserving the intrinsic geometry of the data set simultaneously, the proposed algorithm provides faithful ordinal regression for newly arriving data points. To offer a more general solution for data with a natural tensor structure, we further introduce the multilinear extension of the proposed algorithm, which can support ordinal regression of high-order data such as images. Experiments on various data sets validate the effectiveness of the proposed algorithm as well as its extension.

AAAI Conference 2010 Conference Paper

Forest-Based Semantic Role Labeling

  • Hao Xiong
  • Haitao Mi
  • Yang Liu
  • Qun Liu

Parsing plays an important role in semantic role labeling (SRL) because most SRL systems infer semantic relations from 1-best parses. Therefore, parsing errors inevitably lead to labeling mistakes. To alleviate this problem, we propose to use a packed forest, which compactly encodes all parses for a sentence. We design an algorithm to exploit exponentially many parses to learn semantic relations efficiently. Experimental results on the CoNLL-2005 shared task show that using forests achieves an absolute improvement of 1.2% in terms of F1 score over using 1-best parses and 0.6% over using 50-best parses.

AAAI Conference 2010 Conference Paper

Multilinear Maximum Distance Embedding Via L1-Norm Optimization

  • Yang Liu
  • Yan Liu
  • Keith Chan

Dimensionality reduction plays an important role in many machine learning and pattern recognition tasks. In this paper, we present a novel dimensionality reduction algorithm called multilinear maximum distance embedding (M2DE), which includes three key components. To preserve the local geometry and discriminant information in the embedded space, M2DE utilizes a new objective function, which aims to maximize the distances between some particular pairs of data points, such as the distances between nearby points and the distances between data points from different classes. To make the mapping of new data points straightforward, and more importantly, to keep the natural tensor structure of high-order data, M2DE integrates multilinear techniques to learn the transformation matrices sequentially. To provide reasonable and stable embedding results, M2DE employs the L1-norm, which is more robust to outliers, to measure the dissimilarity between data points. Experiments on various datasets demonstrate that M2DE achieves good embedding results of high-order data for classification tasks.
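
The distance-maximizing objective can be illustrated with a toy sketch. Everything below is illustrative rather than the paper's exact formulation: a single linear map `W` stands in for the sequentially learned multilinear transforms, and the neighbor/class pair selection is a plausible reading of "particular pairs":

```python
import numpy as np

def m2de_objective(X, y, W, k_neighbors=2):
    """Toy M2DE-style objective: sum of L1 distances between embedded
    nearest-neighbor pairs and embedded pairs from different classes.
    X: (n, d) data, y: (n,) labels, W: (d, m) linear map standing in
    for the paper's multilinear transformation matrices."""
    Z = X @ W                                   # embed the data
    n = len(X)
    total = 0.0
    for i in range(n):
        # L1 distances from point i to all others in the ORIGINAL space
        d = np.abs(X - X[i]).sum(axis=1)
        d[i] = np.inf                           # exclude the point itself
        neighbors = np.argsort(d)[:k_neighbors]
        for j in range(n):
            if j == i:
                continue
            if j in neighbors or y[j] != y[i]:
                # maximize embedded L1 distance for these particular pairs
                total += np.abs(Z[i] - Z[j]).sum()
    return total
```

Because the L1 norm is positively homogeneous, scaling `W` scales this objective linearly, which is why practical formulations constrain the transforms rather than maximizing freely.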

ICRA Conference 2010 Conference Paper

Performance approximation and bottleneck identification in re-entrant lines

  • Yang Liu
  • Jingshan Li
  • Shu-Yin Chiang

In this paper, we study a re-entrant line with unreliable exponential machines and finite buffers, operating under the last buffer first serve scheduling policy. First, an approximation method is presented to estimate the throughput of the re-entrant line. Then, a system-level approach to identifying the bottleneck based on blockage and starvation information is proposed. It is shown that the approximation method achieves acceptable accuracy, and that the bottleneck identification method correctly detects the bottleneck in most cases.
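
The blockage/starvation idea can be illustrated with the arrow-based rule known from the serial production line literature (a sketch under the assumption of a plain serial line; the paper's re-entrant, LBFS-scheduled setting is more involved, and this is not claimed to be its exact rule):

```python
def bottleneck_by_arrows(blockage, starvation):
    """Arrow-based bottleneck identification for a serial line.
    blockage[i], starvation[i]: measured probabilities that machine i
    is blocked / starved.  For each adjacent pair, an arrow points from
    machine i to i+1 when BL_i > ST_{i+1}, otherwise from i+1 to i;
    machines with no emanating arrows are declared bottlenecks."""
    n = len(blockage)
    emanating = [False] * n
    for i in range(n - 1):
        if blockage[i] > starvation[i + 1]:
            emanating[i] = True        # arrow i -> i+1
        else:
            emanating[i + 1] = True    # arrow i+1 -> i
    return [i for i in range(n) if not emanating[i]]
```

The appeal of such rules is that blockage and starvation frequencies are directly measurable on the factory floor, so no analytical model of the line is needed at identification time.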

ICRA Conference 2008 Conference Paper

Modeling and analysis of Bernoulli production systems with split and merge

  • Yang Liu
  • Jingshan Li

Many production systems have split and merge operations to increase production capacity and variety, improve product quality, and implement product control and scheduling policies. In this paper, we present analytical methods to model and analyze Bernoulli production systems with circulate and priority split/merge policies. The recursive procedures for performance analysis are derived, the convergence of the procedures and uniqueness of the solutions, along with the structural properties, are proved analytically, and the accuracy of the estimation is justified numerically with high precision.
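
The paper's method is analytical (recursive performance-evaluation procedures); as an independent sanity check, the throughput of the basic building block, a two-machine Bernoulli line, can be estimated by Monte Carlo simulation (a sketch with illustrative parameters and one particular blocking convention, not the authors' procedure):

```python
import random

def simulate_bernoulli_line(p1, p2, N, steps=200_000, seed=0):
    """Monte Carlo throughput estimate for a two-machine Bernoulli line
    with buffer capacity N.  Each time slot, machine i succeeds with
    probability p_i; machine 1 is blocked when the buffer is full and
    machine 2 is starved when it is empty (both act on the state at the
    start of the slot).  Returns parts produced per slot."""
    rng = random.Random(seed)
    buf, produced = 0, 0
    for _ in range(steps):
        take = buf > 0 and rng.random() < p2   # machine 2 consumes a part
        put = buf < N and rng.random() < p1    # machine 1 feeds the buffer
        buf += (1 if put else 0) - (1 if take else 0)
        produced += take
    return produced / steps
```

Enlarging the buffer decouples the two machines, so the estimated throughput should increase monotonically in N toward min(p1, p2); split/merge topologies like those in the paper compose such building blocks.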

IROS Conference 2005 Conference Paper

QoS management of supermedia enhanced teleoperation via overlay networks

  • Zhiwei Cen
  • Matt W. Mutka
  • Yang Liu
  • Amit Goradia
  • Ning Xi 0001

In supermedia-enhanced Internet-based teleoperation systems, the data flowing between the operator and the robot include robotic control commands, video, audio, haptic feedback, and other media types. The differences between an Internet-based teleoperation system and other Internet applications are that (1) teleoperation systems involve many media types, each with its own quality of service (QoS) requirement; and (2) some media types are very latency sensitive. Overlay networks have been proposed to improve the QoS of teleoperation applications. However, efficiently using overlay network resources and optimally distributing these resources among all supermedia streams remains an important problem. This paper provides a framework for QoS management of teleoperation systems over overlay networks. The validity and performance of the system are evaluated using the PlanetLab overlay network.

ICRA Conference 2004 Conference Paper

Multisensory Gripper and Local Autonomy of Extravehicular Mobile Robot

  • Yang Liu
  • Tao Mei
  • Xiaohua Wang
  • Bin Liang

This paper presents the development of a multisensory robot gripper for an extravehicular mobile robot (EMR) and its sensor-based local autonomy. For stable extravehicular walking and for performing delicate tasks in unstructured and complex environments, the EMR gripper employs a simple and reliable mechanism and is equipped with a multisensory apparatus. Local autonomy is an important requirement for on-orbit manipulation by a space robot, and detecting the contact state between the gripper and the environment is essential to achieving it. However, sensory information is often insufficient to determine the contact state. A new way to detect the contact state under inadequate sensory information is proposed: by combining force sensor information with gripper geometry and mechanical analysis, spatial contact information between the robot and the trusswork can be derived. The robot can then adjust its position and orientation through fine-motion displacement based on the contact information to achieve a steady grasp. This method is implemented on a walking/grasping task, a simple and important fundamental task for an extravehicular space robot.