Author name cluster

Yan Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

90 papers

2 author rows

AAAI Conference 2026 Conference Paper

From Pixels to Logic: A Perception-Reasoning Decomposition Framework for Open-World Referring Expression Comprehension

Lihong Huang
Sheng-hua Zhong
Zhi Zhang
Yan Liu

Recent advances in Referring Expression Comprehension (REC) have been largely driven by supervised learning on curated datasets, where each expression is assumed to refer to exactly one known object. However, such assumptions rarely hold in real-world scenarios, where expressions can refer to multiple objects, fail to refer to any, or involve novel categories and complex semantics. These challenges define the task of open-world REC, which demands robust generalization and structured reasoning beyond the scope of traditional REC methods. In this work, we introduce a novel, training-free framework that decouples visual perception from linguistic reasoning to address open-world REC. Our method first transforms the visual scene into a rich textual representation using an open-vocabulary multimodal perception module. It then employs a reasoning language model to interpret the referring expression and perform explicit logical inference over the perceived scene, enabling transparent decision-making and strong generalization in open-world scenarios. Experiments on three standard REC benchmarks as well as two more challenging ones, gRefCOCO and D³, demonstrate that our framework achieves highly competitive zero-shot performance, often surpassing supervised baselines.

AAAI Conference 2026 Conference Paper

Is Symbolic Music a Specific Language? Exploring Inspiration-to-Structure Machine Composition via LLMs

Zhejing Hu
Yan Liu
Zhi Zhang
Aiwei Zhang
Sheng-hua Zhong
Bruce X.B. Yu
Gong Chen

Large Language Models (LLMs) have demonstrated remarkable proficiency in diverse tasks. This success raises a fundamental question in machine composition: Can symbolic music be considered a special form of language that can be jointly modeled with natural language for composition tasks? Recent studies validate that symbolic music can be modeled as a human language, yet composing structured music from partial symbolic inputs through natural language interaction remains underexplored. Even LLMs struggle to generate structurally coherent compositions in such hybrid input-output scenarios, highlighting a fundamental gap that calls for a domain-specific learning paradigm. To this end, we propose Inspiration-to-Structure (IoS), a cognitively inspired framework that enables LLMs to generate structured musical sections from melodic ideas. IoS employs a three-phase process (semantic, structural, and collaborative cognition) and is supported by two key components: (1) a new dataset and construction protocol called Structured Triplet Data (STD), and (2) a training method, Dual-Instance Structural Contrastive Optimization (DiSCO), designed to enhance structural awareness. Experiments show that IoS improves structural coherence by 47.8% and artistic creativity by 21.8% compared to the conventional language modeling paradigm, supervised fine-tuning, and even enables smaller LLMs to surpass larger LLMs. These results suggest that symbolic music, while language-like, demands specialized modeling beyond standard language modeling paradigms. IoS enables LLMs to transform music theory knowledge into structured composition, empowering users to compose music interactively via language and advancing toward general creative AI.

JBHI Journal 2026 Journal Article

Mutual Generation for Cross-Domain Challenge in Stroke Patients' Motor Imagery Classification and Functional Recovery Prediction

Rongrong Lu
Wenchang Deng
Tianhao Gao
Songhua Huang
Zhi Zhang
Yan Liu
Sheng-hua Zhong

A growing body of research indicates that Motor Imagery (MI)-BCIs have the potential to enhance the quality of life for individuals with disabilities and to advance our understanding of brain function and rehabilitation strategies. Stroke is the leading cause of long-term motor disability across the globe, underscoring the need for innovative rehabilitation strategies such as MI-BCI technologies. Despite these expectations, the majority of existing research is built upon data obtained from healthy subjects. The construction of effective classification models for Motor Imagery tasks in patients with brain diseases, particularly stroke, remains a significant challenge. The lateralization of the left and right hemispheres is more pronounced in patients who have suffered a stroke than in healthy individuals. Moreover, the specific locations of lesions and the regions of influence result in significant variations in the electroencephalogram (EEG) data of patients with different hemiplegic sides. This paper explores the potential of generative models in addressing the domain differences in EEG data that arise from different hemiplegic sides. Furthermore, this paper uses a label softening algorithm to avoid the potential adverse effects that rigorous optimization of low-quality samples has on model performance. Two MI-EEG datasets of stroke patients performing Motor Imagery tasks are used to validate our method. Compared with both classical machine learning methods and state-of-the-art models for MI classification, the classification model in this paper achieves a noticeable performance improvement under different data partitioning strategies, including subject-dependent and subject-independent scenarios. Each sub-module and each designed loss function contributes to the final performance gain.
In addition, this paper also investigates the potential of the proposed framework for predicting a patient's level of functional recovery. Our findings indicate that the addition of a prediction layer to the proposed model enables the accurate prediction of functional recovery level in stroke patients.

AAAI Conference 2026 Conference Paper

RAGFort: Dual-Path Defense Against Proprietary Knowledge Base Extraction in Retrieval-Augmented Generation

Qinfeng Li
Miao Pan
Ke Xiong
Ge Su
Zhiqiang Shen
Yan Liu
Sun Bing
Hao Peng

Retrieval-Augmented Generation (RAG) systems deployed over proprietary knowledge bases face growing threats from reconstruction attacks that aggregate model responses to replicate knowledge bases. Such attacks exploit both intra-class and inter-class paths—progressively extracting fine-grained knowledge within topics and diffusing it across semantically related ones, thereby enabling comprehensive extraction of the original knowledge base. However, existing defenses target only one path, leaving the other unprotected. We conduct a systematic exploration to assess the impact of protecting each path independently and find that joint protection is essential for effective defense. Based on this, we propose RAGFort, a structure-aware dual-module defense combining contrastive reindexing for inter-class isolation and constrained cascade generation for intra-class protection. Experiments across security, performance, and robustness confirm that RAGFort significantly reduces reconstruction success while preserving answer quality, offering the first comprehensive defense against knowledge base extraction attacks.

EAAI Journal 2026 Journal Article

Remaining useful life prediction of rotating machine via long short-term memory network with uncertainty quantification

Jialong He
Zhenbiao Ma
Yan Liu
Zhaojun Yang

Remaining useful life (RUL) prediction of rotating machinery is critical for intelligent maintenance and ensuring equipment reliability. However, existing methods often struggle to capture the long-term degradation trends and fail to adequately quantify the uncertainty in the predictions. To address these challenges, this paper proposes a novel RUL prediction method based on a long short-term memory-Wiener process-Bayesian optimization (LSTM-WP-Bo) degradation model. First, based on the Wiener process (WP), a long short-term memory (LSTM) network is used to model the drift function of the degradation process. Secondly, based on the concept of first hitting time (FHT), an approximate expression for the probability density function (PDF) of RUL is derived, while the uncertainty of the prediction is quantified. Lastly, the drift and diffusion coefficients are estimated using maximum likelihood estimation (MLE), and the LSTM network's hyperparameters are optimized through Bayesian optimization (Bo). The proposed method is analyzed comparatively on three datasets. For example, after validation on the servo turret power head system degradation dataset, the proposed method achieves a root mean square error (RMSE) of 15.55 and a mean absolute percentage error (MAPE) of 12.91%, demonstrating significant improvements in prediction accuracy and robustness when compared to existing methods.
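
For orientation on the first-hitting-time step, the textbook constant-drift case is sketched here. It is not the paper's derivation, which models the drift with an LSTM and yields only an approximate PDF. Assume a degradation path $X(t) = x_0 + \mu t + \sigma B(t)$ with constant drift $\mu > 0$, diffusion coefficient $\sigma > 0$, standard Brownian motion $B(t)$, and failure threshold $w > x_0$; all of these symbols are illustrative and do not come from the abstract. Under those assumptions the RUL $T$ follows an inverse Gaussian distribution:

```latex
T = \inf\{\, t > 0 : X(t) \ge w \,\}, \qquad
f_T(t) = \frac{w - x_0}{\sqrt{2\pi\sigma^{2} t^{3}}}
         \exp\!\left( -\frac{(w - x_0 - \mu t)^{2}}{2\sigma^{2} t} \right),
\quad t > 0 .
```

In this simplified setting the mean RUL is $(w - x_0)/\mu$, and the spread of $f_T$ is what gives a point prediction its uncertainty band.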

EAAI Journal 2026 Journal Article

Robot welding trajectory planning for branch-pipes “clustered” intersecting structure based on metaheuristic algorithms

Yan Liu
Qiu Tang

The multi-pipe intersecting structure is widely used in fire fighting, construction, and other industries. Due to the high complexity of the weld seam, its welding is mainly done manually. To improve welding efficiency, a new method of robot welding trajectory planning for the multi-pipe intersecting structure based on metaheuristic algorithms is proposed by establishing the ideal intersection model. First, this paper establishes the mathematical model of the branch-pipes “clustered” intersecting structure through investigation, and proposes a calculation method for the intersecting curve expression based on the improved proportional-integral-derivative based search algorithm. Second, the multi-pipe intersecting curve is spatially segmented and discontinuous. Taking into account factors such as path distance, attitude mutation, and trajectory smoothness, this paper proposes a path optimization framework for robot welding based on metaheuristic algorithms. Third, to guarantee the seamless transition of the robot trajectory and attitude among sub-paths, a model-driven robot welding trajectory planning approach is presented. Moreover, the arc transition and unit quaternion interpolation algorithms are integrated to prevent abrupt changes of the robot welding attitude. Finally, experiments are designed to verify the feasibility and accuracy of the aforementioned approach.
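
The abstract does not say which unit quaternion interpolation scheme is used. As a minimal sketch, assuming the common choice of spherical linear interpolation (slerp) and a (w, x, y, z) component order, attitude blending between two sub-path endpoints could look like this; the function name `slerp` and the 0.9995 fallback threshold are illustrative choices, not taken from the paper.

```python
import math


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z).

    Illustrative sketch only; the paper's actual interpolation may differ.
    """
    dot = sum(a * b for a, b in zip(q0, q1))
    # q and -q encode the same rotation; flip one to follow the shorter arc.
    if dot < 0.0:
        q1 = tuple(-c for c in q1)
        dot = -dot
    dot = min(1.0, dot)  # guard acos against rounding slightly above 1
    if dot > 0.9995:
        # Nearly identical attitudes: linear blend avoids dividing by sin(0).
        out = tuple(a + t * (b - a) for a, b in zip(q0, q1))
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        out = tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
    # Renormalize so the result stays a unit quaternion.
    norm = math.sqrt(sum(c * c for c in out))
    return tuple(c / norm for c in out)


# Blend halfway between no rotation and a 90-degree rotation about z.
identity = (1.0, 0.0, 0.0, 0.0)
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
halfway = slerp(identity, quarter_turn_z, 0.5)
```

Because slerp moves along the great arc at constant angular rate, the interpolated attitude changes smoothly, which is the property the abstract relies on to avoid abrupt torch reorientation.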

JBHI Journal 2026 Journal Article

Structure-Constrained Regression Network for Efficient and Topology-Guaranteed Retinal Layer Segmentation in OCT Images

Yi-Peng Liu
Zhanqing Li
Junhao Qu
Yilong Zhang
Yan Liu

Retinal layer segmentation in optical coherence tomography (OCT) images enables the quantification of retinal morphology, which is critical for diagnosing and monitoring ophthalmic diseases. However, interference factors such as speckle noise, intensity variations, and pathological abnormalities hinder efficient and topology-guided layer segmentation. As topology-guaranteed layers can be efficiently derived from structured layer boundaries, this study presents a structure-constrained regression network (SCRNet): a lightweight, end-to-end deep network that leverages structural priors inherent in OCT retinal layers to segment these boundaries efficiently. As the retinal structure is robust to interference factors and critical for establishing structured layer boundaries, SCRNet introduces a lightweight, two-stream architecture to capture structural information, with one stream targeting the layer topology and the other targeting boundary continuity. Each stream incorporates a tailored structural feature module (SFM) and a structure-constrained loss (SCL) to extract structural information effectively from a global perspective. Then, a structure-constrained regression module (SCRM) integrates the complementary structural information from both streams to enable structured boundary regression, enhancing accuracy and robustness. Extensive experiments on two publicly available benchmark datasets demonstrate that SCRNet achieves state-of-the-art performance in segmenting structured layer boundaries and topology-guaranteed layers while maintaining high efficiency. The source code will be publicly available at https://github.com/cyan323/SCRNet.

TMLR Journal 2026 Journal Article

TS-Reasoner: Domain-Oriented Time Series Inference Agents for Reasoning and Automated Analysis

Wen Ye
Wei Yang
Defu Cao
Yizhou Zhang
Lumingyuan Tang
Jie Cai
Yan Liu

Time series analysis is crucial in real-world applications, yet traditional methods focus only on isolated tasks, and recent studies on time series reasoning are either limited to single-step inference or constrained to natural language answers. In this work, we introduce TS-Reasoner, a domain-specialized agent designed for multi-step time series inference. By integrating large language model (LLM) reasoning with domain-specific computational tools and an error feedback loop, TS-Reasoner enables domain-informed, constraint-aware analytical workflows that combine symbolic reasoning with precise numerical analysis. We assess the system’s capabilities along two axes: 1) fundamental time series understanding, assessed by TimeSeriesExam, and 2) complex, multi-step inference, evaluated by a newly proposed dataset designed to test both compositional reasoning and computational precision in time series analysis. Experiments show that our approach outperforms standalone general-purpose LLMs in both basic time series concept understanding and the multi-step time series inference task, highlighting the promise of domain-specialized agents for automating real-world time series reasoning and analysis.