Arrow Research search

Author name cluster

Hao Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

97 papers
2 author rows

Possible papers

97

EAAI Journal 2026 Journal Article

A lightweight multi-window attention transformer for image super-resolution

  • Yuqing Yang
  • Hao Liu
  • Jun Zhang
  • Wenfei Luo
  • Jiaqian Wang
  • Yuxiang Shi
  • Hongxia Deng

In recent years, Transformer-based models have achieved strong performance in image super-resolution (SR). However, their high computational complexity and parameter cost still limit deployment on resource-constrained devices. To better balance efficiency and representation capability, this paper proposes a lightweight Transformer for image super-resolution, termed Multi-Window Attention Transformer for Image Super-Resolution (MWAT-SR), which adopts a hierarchical multi-window attention strategy. In shallow layers, Local Dense Attention (LDA) with small windows is used to preserve local high-frequency details. In deeper layers, larger windows are introduced together with a Hybrid Sparse-Channel Attention (HSCA) mechanism, which combines sparse spatial interaction and channel-wise semantic modeling to enlarge the effective receptive field under controlled computational cost. In addition, a Window-Adaptive Multi-Scale Convolutional Feed-Forward Network (WAMC-FFN) is designed to adjust convolution kernel sizes according to the window scale, thereby enhancing multi-scale texture representation. Experimental results on standard benchmark datasets show that MWAT-SR achieves competitive reconstruction performance across ×2, ×3, and ×4 settings, while maintaining a favorable trade-off between reconstruction quality and computational complexity.
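The hierarchical design above rests on standard window attention: the feature map is split into non-overlapping windows and self-attention is computed independently within each window, so cost scales with window size rather than the full image. A minimal NumPy sketch of that core operation (a generic illustration with arbitrary sizes, not the authors' MWAT-SR code):

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into (num_windows, ws*ws, C) blocks."""
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def window_self_attention(x, ws):
    """Plain single-head self-attention computed independently per window."""
    windows = window_partition(x, ws)           # (n, ws*ws, C)
    q = k = v = windows                         # identity projections for brevity
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)    # softmax over each window
    return attn @ v                             # (n, ws*ws, C)

feat = np.random.rand(16, 16, 8)
out = window_self_attention(feat, ws=4)
print(out.shape)  # (16, 16, 8): 16 windows x 16 tokens x 8 channels
```

Small windows in shallow layers keep this cheap; per the abstract, MWAT-SR's LDA and HSCA variants add dense and sparse interaction patterns on top of this basic partitioning.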

AAAI Conference 2026 Conference Paper

Beyond Passive Critical Thinking: Fostering Proactive Questioning to Enhance Human-AI Collaboration

  • Ante Wang
  • Yujie Lin
  • Jingyao Liu
  • Suhang Wu
  • Hao Liu
  • Xinyan Xiao
  • Jinsong Su

Critical thinking is essential for building robust AI systems, preventing them from blindly accepting flawed data or biased reasoning. However, prior work has primarily focused on passive critical thinking, where models simply reject problematic queries without taking constructive steps to address user requests. In this work, we introduce proactive critical thinking, a paradigm where models actively seek missing or clarifying information from users to better resolve their queries. To evaluate this capability, we present GSM-MC and GSM-MCE, two novel benchmarks based on GSM8K for assessing mathematical reasoning under incomplete or misleading conditions. Experiments on Qwen3 and Llama series models show that, while these models excel in traditional reasoning tasks, they struggle with proactive critical thinking, especially the smaller ones. However, we demonstrate that reinforcement learning (RL) can significantly improve this ability. By incorporating heuristic information into the reward function, we achieve substantial gains, boosting Qwen3-1.7B's accuracy from 0.15% to 73.98% on GSM-MC. We hope this work advances models that collaborate more effectively with users in problem-solving through proactive critical thinking.

EAAI Journal 2026 Journal Article

Defect detection of monocrystalline silicon wafers for photovoltaic applications using an improved you only look once version 8 small algorithm

  • Wenbo Bi
  • Xinyu Wang
  • Na Liu
  • Xu Xing
  • Lu Li
  • Hao Liu

Defects on the surface of photovoltaic monocrystalline silicon wafers, such as cracks, corners, and water stains, lead to significant performance degradation and economic losses during manufacturing. To address this, this paper proposes an improved You Only Look Once version 8 small (YOLOv8s) model. The proposed architecture integrates four strategic innovations. First, an Efficient Multi-Scale Convolution (EMSC) module is combined with the Cross-Stage Partial Bottleneck module with two convolutions (C2f) to enhance multi-scale feature extraction capabilities. Second, Spatial Pyramid Pooling-Fast (SPPF) is fused with the Large Separable Kernel Attention (LSKA) module to overcome limitations in processing local details. Third, the DySample dynamic upsampling operator is introduced to maintain a compact model size while effectively improving detection speed. Finally, the Normalized Wasserstein Distance (NWD) is utilized as the loss function to address the sensitivity of the Intersection over Union (IoU) metric to positional deviations, enhancing precision for small targets. Experimental results demonstrate that the improved model, termed Efficient Lightweight Detection Network (ELDN), achieves superior performance on the validation set with a mean Average Precision (mAP) of 92.8%. Notably, it exhibits robust generalization on an independent external test set, attaining a mAP of 92.6%. Validation confirms that YOLOv8s-ELDN consistently outperforms mainstream models. Future research will focus on further optimizing efficiency for deployment on resource-constrained edge devices and addressing defect detection in complex manufacturing environments.
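For context, the NWD metric mentioned above is commonly defined by modeling each box as a 2D Gaussian and mapping the Gaussians' 2-Wasserstein distance to a bounded similarity; unlike IoU, it varies smoothly even when boxes do not overlap. A sketch of the generic formulation (the constant C is a dataset-dependent hyperparameter, and its value here is illustrative; this is not the paper's exact loss code):

```python
import math

def nwd(box1, box2, C=12.8):
    """Normalized Wasserstein Distance between two (cx, cy, w, h) boxes.

    Each box is modeled as a 2D Gaussian; C is a dataset-dependent
    scale constant (hyperparameter, value here is illustrative).
    """
    cx1, cy1, w1, h1 = box1
    cx2, cy2, w2, h2 = box2
    # Squared 2-Wasserstein distance between the two Gaussians
    w2_sq = ((cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
             + ((w1 - w2) / 2) ** 2 + ((h1 - h2) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / C)

print(nwd((10, 10, 4, 4), (10, 10, 4, 4)))        # 1.0 for identical boxes
print(nwd((10, 10, 4, 4), (11, 10, 4, 4)) > 0.9)  # True: small shift, smooth penalty
```

The smooth decay is what makes the metric less sensitive than IoU to small positional deviations on tiny targets.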

AAAI Conference 2026 Conference Paper

Edge-Centric Relational Reasoning for 3D Scene Graph Prediction

  • Yanni Ma
  • Hao Liu
  • Yulan Guo
  • Theo Gevers
  • Martin R. Oswald

3D scene graph prediction aims to abstract complex 3D environments into structured graphs consisting of objects and their pairwise relationships. Existing approaches typically adopt object-centric graph neural networks, where relation edge features are iteratively updated by aggregating messages from connected object nodes. However, this design inherently restricts relation representations to pairwise object context, making it difficult to capture high-order relational dependencies that are essential for accurate relation prediction. To address this limitation, we propose a Link-guided Edge-centric relational reasoning framework with Object-aware fusion, namely LEO, which enables progressive reasoning from relation-level context to object-level understanding. Specifically, LEO first predicts potential links between object pairs to suppress irrelevant edges, and then transforms the original scene graph into a line graph where each relation is treated as a node. A line graph neural network is applied to perform edge-centric relational reasoning to capture inter-relation context. The enriched relation features are subsequently integrated into the original object-centric graph to enhance object-level reasoning and improve relation prediction. Our framework is model-agnostic and can be integrated with any existing object-centric method. Experiments on the 3DSSG dataset with two competitive baselines show consistent improvements, highlighting the effectiveness of our edge-to-object reasoning paradigm.
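The line-graph transformation at the heart of LEO is easy to state: every edge (relation) of the scene graph becomes a node, and two such nodes are adjacent exactly when the original edges share an endpoint. A tiny sketch of that generic construction (not the authors' implementation):

```python
from itertools import combinations

def line_graph(edges):
    """Build the line graph: each original edge becomes a node; two
    line-graph nodes are connected if the edges share an endpoint."""
    nodes = list(edges)
    lg_edges = [(e1, e2) for e1, e2 in combinations(nodes, 2)
                if set(e1) & set(e2)]
    return nodes, lg_edges

# A path graph 0-1-2-3: its relations become nodes of the line graph.
nodes, lg = line_graph([(0, 1), (1, 2), (2, 3)])
print(len(lg))  # 2: (0,1)-(1,2) share node 1; (1,2)-(2,3) share node 2
```

Message passing on this structure is what lets relation features aggregate context from other relations directly, rather than only from their two endpoint objects.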

JBHI Journal 2026 Journal Article

EISegNet: Enhancing Instrument Segmentation Network via Dual-View Disparity Estimation

  • Yongming Yang
  • Zhaoshuo Diao
  • Ziliang Song
  • Shenglin Zhang
  • Tiancong Liu
  • Chengdong Wu
  • Weiliang Bai
  • Hao Liu

Accurate segmentation of endoscopic instruments is essential in robot-assisted surgery, supporting precise navigation, enhancing safety, and advancing surgical automation. However, this task is challenging due to factors like complex environments, instrument-tissue similarity, and lighting variations. Instruments, due to their material properties, have distinct depth distributions compared to surrounding tissues. This aspect is often overlooked in monocular video segmentation methods. To address this issue, we propose EISegNet, a multi-task framework that prioritizes instrument segmentation with an auxiliary disparity estimation task. The framework integrates an asymmetric cross-attention mechanism to enhance segmentation performance by fusing features from both tasks. Moreover, by leveraging the geometric properties of motion, EISegNet adapts the stereo disparity estimation strategy for dual-view depth estimation, broadening its applicability to various endoscopic surgeries beyond laparoscopic procedures. Furthermore, EISegNet incorporates a Gaussian-weighted loss function to emphasize edge features, which are particularly challenging for disparity estimation. This function reduces overall loss and improves segmentation accuracy. Extensive cross-dataset experiments demonstrate the superior accuracy and generalization of our method, achieving a 5.97% increase in IoU (Intersection over Union). Qualitative evaluations on clinical datasets further demonstrate its promising performance in real-world scenarios.

AAAI Conference 2026 Conference Paper

Hard vs. Noise: Resolving Hard-Noisy Sample Confusion in Recommender Systems via Large Language Models

  • Tianrui Song
  • Wen-Shuo Chao
  • Hao Liu

Implicit feedback, employed in training recommender systems, inevitably contains noise due to factors such as misclicks and position bias. Previous studies have attempted to identify noisy samples through their divergent data patterns, such as higher loss values, and to mitigate their influence through sample dropping or reweighting. However, we observed that noisy samples and hard samples display similar patterns, leading to a hard-noisy confusion issue. Such confusion is problematic because hard samples are vital for modeling user preferences. To solve this problem, we propose the LLMHNI framework, which leverages two auxiliary user-item relevance signals generated by Large Language Models (LLMs) to differentiate hard and noisy samples. LLMHNI obtains user-item semantic relevance from LLM-encoded embeddings, which is used in negative sampling to select hard negatives while filtering out noisy false negatives. An objective alignment strategy is proposed to project LLM-encoded embeddings, originally intended for general language tasks, into a representation space optimized for user-item relevance modeling. LLMHNI also exploits LLM-inferred logical relevance within user-item interactions to identify hard and noisy samples. These LLM-inferred interactions are integrated into the interaction graph and guide denoising via cross-graph contrastive alignment. To eliminate the impact of unreliable interactions induced by LLM hallucination, we propose a graph contrastive learning strategy that aligns representations from randomly edge-dropped views to suppress unreliable edges. Empirical results demonstrate that LLMHNI significantly improves denoising and recommendation performance.

AAAI Conference 2026 Conference Paper

ICLR: Inter-Chrominance and Luminance Interaction for Natural Color Restoration in Low-Light Image Enhancement

  • Xin Xu
  • Hao Liu
  • Wei Liu
  • Wei Wang
  • Jiayi Wu
  • Kui Jiang

The Low-Light Image Enhancement (LLIE) task aims to improve contrast while restoring details and textures in images captured under low-light conditions. The HVI color space has driven significant progress in this task by enabling precise decoupling of chrominance and luminance. However, in the interaction between the chrominance and luminance branches, the substantial distributional differences between the two branches that are prevalent in natural images limit complementary feature extraction, and luminance errors propagate to the chrominance channels through the nonlinear parameter. Furthermore, in the interaction between different chrominance branches, images with large homogeneous-color regions usually exhibit weak correlation between the branches due to their concentrated distributions. Traditional pixel-wise losses exploit strong inter-branch correlations for co-optimization, causing gradient conflicts in weakly correlated regions. We therefore propose an Inter-Chrominance and Luminance Interaction (ICLR) framework comprising a Dual-stream Interaction Enhancement Module (DIEM) and a Covariance Correction Loss (CCL). The DIEM improves the extraction of complementary information along two dimensions, fusion and enhancement, respectively. The CCL utilizes luminance residual statistics to penalize chrominance errors and balances gradient conflicts by constraining the covariance of the chrominance branches. Experimental results on multiple datasets show that the proposed ICLR framework outperforms state-of-the-art methods.

EAAI Journal 2026 Journal Article

Inverse compensation and adaptive fuzzy integral sliding-mode control for the underactuated soft massage physiotherapy robot

  • Zixin Huang
  • Chengsong Yu
  • Junjie Lu
  • Hao Liu
  • Peng Huang

Acupoint massage physiotherapy is an effective method for preventing and treating disease. Soft robotics technology is thriving and has potential applications in the field of acupoint massage physiotherapy. A soft massage physiotherapy robot (SMPR) uses soft robotics technology to perform the acupoint massage physiotherapy function. In this paper, an SMPR consisting of a wearable armor and several pneumatic physiotherapy actuators (PPAs) is designed and fabricated. To describe the complex hysteresis behavior of the SMPR, a dynamic model of its PPA is established and identified, comprising two parts: a linear model and an asymmetric Prandtl–Ishlinskii hysteresis (APIH) model. An inverse compensator is then designed to compensate for the hysteresis behavior of the SMPR based on the APIH model, yielding an approximately linearized system. Then, by means of an artificial intelligence method, a fuzzy approximator is designed to approximate the control system's lumped uncertainty, which includes external disturbances, modeling errors, and parameter perturbations. Further, an adaptive fuzzy integral sliding-mode control (AFISMC) scheme is employed to handle this lumped uncertainty. Moreover, based on the back-stepping control method, a nominal controller is designed to control the approximately linearized system. By combining the inverse compensator, fuzzy approximator, AFISMC, and nominal controller, control of the SMPR is realized and the acupoint massage physiotherapy can be regulated accurately. The stability of the control system is theoretically demonstrated. Finally, experimental results from multiple test scenarios conclusively demonstrate the efficacy and trajectory tracking capability of the developed control strategy.

AAAI Conference 2026 Conference Paper

Multi-Value Alignment for LLMs via Value Decorrelation and Extrapolation

  • Hefei Xu
  • Le Wu
  • Chen Cheng
  • Hao Liu

With the rapid advancement of large language models (LLMs), aligning them with human values for safety and ethics has become a critical challenge. This problem is especially challenging when multiple, potentially conflicting human values must be considered and balanced. Although several variants of existing alignment methods (such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO)) have been proposed to address multi-value alignment, they suffer from notable limitations: 1) they are often unstable and inefficient in multi-value optimization; and 2) they fail to effectively handle value conflicts. As a result, these approaches typically struggle to achieve optimal trade-offs when aligning multiple values. To address this challenge, we propose a novel framework called Multi-Value Alignment (MVA). It mitigates alignment degradation caused by parameter interference among diverse human values by minimizing their mutual information. Furthermore, we propose a value extrapolation strategy to efficiently explore the Pareto frontier, thereby constructing a set of LLMs with diverse value preferences. Extensive experiments demonstrate that MVA consistently outperforms existing baselines in aligning LLMs with multiple human values.
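The value extrapolation idea can be illustrated as simple weight-space arithmetic: moving a model's parameters along the direction from a base model to a value-aligned model, with coefficients above 1 pushing past the aligned endpoint. A toy sketch (the paper's mutual-information decorrelation step is omitted; names and values here are illustrative):

```python
import numpy as np

def extrapolate(theta_base, theta_aligned, lam):
    """Weight-space extrapolation along the alignment direction.

    lam in (0, 1) interpolates between the two models; lam > 1
    extrapolates beyond the aligned model. This is a generic sketch,
    not the paper's full MVA procedure.
    """
    return theta_base + lam * (theta_aligned - theta_base)

base = np.array([0.0, 0.0])
aligned = np.array([1.0, 0.5])   # parameters fine-tuned toward one value
# Sweeping lam produces a family of models along one value direction.
frontier = [extrapolate(base, aligned, lam) for lam in (0.5, 1.0, 1.5)]
print(frontier[2])  # lam=1.5 pushes past the aligned model
```

Sweeping such coefficients per value direction is one cheap way to populate candidate trade-off points without retraining each one, which is the spirit of exploring the Pareto frontier described above.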

EAAI Journal 2026 Journal Article

Noise-aware dynamic graph recalibration for interpretable aero-engine anomaly detection

  • Haiyan Zhang
  • Youchao Sun
  • Honglan Wu
  • Hao Liu

Early detection of anomalies in aero-engines is critical to flight safety. In practical operational environments, however, dynamic noise interference distorts the topological structure of sensor networks, undermining the reliability of feature propagation in graph neural networks. Existing models lack dynamic graph optimization capabilities under noisy conditions and offer limited interpretability, as they fail to explicitly model the relationship between noise intensity and graph structural evolution. To overcome these limitations, this paper introduces a noise-guided framework for dynamic graph threshold recalibration. Specifically, the framework incorporates a noise-aware graph recalibrator that statistically infers noise levels from feature dispersion to dynamically adjust connection thresholds, and a feature fidelity convolutional layer that gates neighborhood aggregation to prevent noise accumulation and mitigate feature degradation. Experiments on public datasets and real aero-engine operational data demonstrate that the proposed framework significantly outperforms state-of-the-art methods in detection accuracy and noise resilience. Quantitative and visualization analyses confirm its noise-aware characteristics, yielding interpretable edge selection and establishing a rigorous causal link between internal dynamic thresholds and external physical interference. Validation on real-world data further substantiates the framework's potential for practical engineering applications in early anomaly detection.

AAAI Conference 2026 Conference Paper

RecCocktail: A Generalizable and Efficient Framework for LLM-Based Recommendation

  • Min Hou
  • Chenxi Bai
  • Le Wu
  • Hao Liu
  • Kai Zhang
  • Weiwen Liu
  • Richang Hong
  • Ruiming Tang

Large Language Models (LLMs) have achieved remarkable success in recent years, owing to their impressive generalization capabilities and rich world knowledge. To capitalize on the potential of using LLMs as recommender systems, mainstream approaches typically focus on two paradigms. The first paradigm designs multi-domain or multi-task instruction data for generalizable recommendation, so as to align LLMs with general recommendation areas and deal with cold-start recommendation. The second paradigm focuses on enhancing domain-specific recommendation tasks, improving performance in warm recommendation scenarios. While most previous works treat these two paradigms separately, we argue that they have complementary advantages, and combining them can yield better results. In this paper, we propose a generalizable and efficient LLM-based recommendation framework, RecCocktail. Our approach begins with fine-tuning a "base spirit" LoRA module using domain-general recommendation instruction data to align the LLM with recommendation knowledge. Next, given user behavior in a specific domain, we construct a domain-specific "ingredient" LoRA module. We then provide an entropy-guided adaptive merging method to mix the "base spirit" and the "ingredient" in the weight space. Notably, RecCocktail combines the advantages of the two existing paradigms without introducing additional time or space overhead during the inference phase. Moreover, RecCocktail is efficient and plug-and-play: the "base spirit" LoRA is trained only once, and any domain-specific "ingredient" can be mixed in with only domain-specific fine-tuning. Extensive experiments on multiple datasets under both warm and cold-start recommendation scenarios validate the effectiveness and generality of the proposed RecCocktail.
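Weight-space merging of LoRA modules, the mechanism behind the "cocktail" metaphor, can be sketched as adding each low-rank product into the frozen base weight with a mixing coefficient. In the sketch below the coefficients are fixed placeholders; the paper derives them with an entropy-guided rule that is not reproduced here:

```python
import numpy as np

def merge_loras(W0, loras, lambdas):
    """Merge LoRA modules into a frozen base weight in weight space.

    Each LoRA is a (B, A) pair of shapes (d, r) and (r, d); lambdas are
    mixing coefficients (stand-ins for the paper's entropy-guided rule).
    The merged matrix has the same shape as W0, so inference cost is
    unchanged.
    """
    W = W0.copy()
    for (B, A), lam in zip(loras, lambdas):
        W += lam * (B @ A)          # low-rank update folded into the weight
    return W

d, r = 8, 2
W0 = np.random.randn(d, d)
base_spirit = (np.random.randn(d, r), np.random.randn(r, d))
ingredient = (np.random.randn(d, r), np.random.randn(r, d))
W = merge_loras(W0, [base_spirit, ingredient], lambdas=[1.0, 0.5])
print(W.shape)  # (8, 8): no extra parameters at inference time
```

Because the merge happens once, offline, serving the merged model costs exactly the same as serving the base model, consistent with the abstract's no-overhead claim.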

EAAI Journal 2026 Journal Article

The accidental explosion tracing model of architecture glass damage based on shuffle attention

  • Hao Liu
  • Zhen Qing Wang
  • Shuai Qin
  • Qiang Zhao
  • Lei Zhang

Explosion tracing is the basis of hazard and risk analysis for accidental explosions. Accidental explosion damage-effects data have typical multi-source heterogeneous characteristics. Data fusion can aggregate the redundant or complementary information from multiple sensors to obtain more complete information. A machine learning model with shuffle attention for tracing accidental explosions from architectural glass damage is presented, based on experimental and simulated data of tempered glass plates under blast loading. Two branch models were established through multi-source data fusion: a blast-wave propagation tracing model (Process-model) and a glass-plate dynamic response tracing model (Response-model). A final decision tracing model (Decision-model) was then constructed at the decision level. The Mean Absolute Percentage Error (MAPE) of the Decision-model was reduced to 0.1605 compared with the two branch models. Data from three different positions on the glass plate were used in the tracing process; the results indicated that data from the peripheral area showed the largest error. Considering the incompleteness of real explosion-accident investigation data, an accidental explosion verification test with emulsion explosive was carried out to verify the model's real-world applicability. The MAPE on the measured, imperfect dataset is 0.2423. The results show that the tracing model maintains its prediction accuracy even when part of the input data is missing in practical applications, providing a reliable analysis tool for accidental explosion risk assessment.

EAAI Journal 2026 Journal Article

The research on the diagnostic technology for aortic dissection and acute myocardial infarction based on Raman and infrared spectroscopy combined with multimodal deep learning

  • Lei Yan
  • Guangyao Ma
  • Cheng Chen
  • Chen Chen
  • Jing Tao
  • Xuguang Zhou
  • Ting Tian
  • Hao Liu

Background: Aortic dissection and myocardial infarction are two common and life-threatening cardiovascular emergencies characterized by sudden onset, high mortality, and overlapping clinical symptoms such as chest pain and respiratory distress, which make accurate and timely clinical differentiation particularly challenging. Current mainstream diagnostic techniques, including computed tomography and transesophageal echocardiography, provide valuable anatomical and functional information but are often costly, time-consuming, and insensitive to early-stage biochemical alterations, which may result in missed or incorrect diagnoses in emergency settings. Aortic dissection often requires immediate repair of the damaged vessel to prevent further expansion or rupture, whereas myocardial infarction requires rapid restoration of blood flow to the myocardium. The treatment approaches for the two conditions are distinct, and misdiagnosis can have severe consequences. Therefore, more convenient, rapid, and efficient diagnostic methods are urgently needed. Methods: Vibrational spectroscopy is a noninvasive analytical technique with high sensitivity to molecular and biochemical changes in biological samples; Raman spectroscopy and infrared spectroscopy target distinct molecular vibrational modes, providing complementary pathological information. In this study, a multimodal attention fusion network was developed to integrate Raman and infrared spectroscopy data for rapid disease classification. Results: Experimental results demonstrated that the proposed method achieved a diagnostic accuracy of 94.06% and a specificity of 97.03% in distinguishing aortic dissection, myocardial infarction, and non-critical cases. Conclusion: This method provides an innovative and efficient decision-support tool for the clinical differentiation of aortic dissection and myocardial infarction, offering significant clinical value.

EAAI Journal 2025 Journal Article

A single-cell RNA sequencing data imputation method based on non-negative matrix factorization and multi-kernel similarity network fusion

  • Pei Liu
  • Cheng Chen
  • Hao Liu
  • Jin Gu
  • Xinya Chen
  • Ying Su
  • Zhiyuan Cheng
  • Xiaoyi Lv

Artificial intelligence-based single-cell RNA sequencing (scRNA-seq) technology is widely used in cell type identification and disease research, but its data often contain a large number of missing values and zero values due to technical limitations and biological differences. These zero values not only affect downstream analysis, but also make it difficult to distinguish technical zero values from biological zero values. Therefore, this paper proposes, for the first time, a scRNA-seq data imputation method (sc-MKNMF) based on non-negative matrix factorization and multi-kernel similarity network fusion. This method improves the accuracy of cell clustering by accurately filling some zero values. First, sc-MKNMF uses gene-cell dual-level analysis to distinguish technical zero values from biological zero values, and then computes multi-kernel fused similarity networks for genes and cells respectively. Next, the method uses non-negative matrix factorization combined with the similarity networks to construct the objective function, and introduces sparse regularization terms to preserve the similarity between genes and cells and improve stability. In addition, sc-MKNMF is equipped with an efficient optimization algorithm that promotes convergence by iteratively updating the objective function. Finally, verification and comparative experiments on 12 scRNA-seq datasets show that sc-MKNMF outperforms other advanced data imputation methods. Moreover, extending sc-MKNMF to two further tasks, cell trajectory inference and differentially expressed gene analysis, showed significant improvement and excellent versatility.

NeurIPS Conference 2025 Conference Paper

A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone

  • Jitai Hao
  • Qiang Huang
  • Hao Liu
  • Xinyan Xiao
  • Zhaochun Ren
  • Jun Yu

Training high-performing Small Language Models (SLMs) remains computationally expensive, even with knowledge distillation and pruning from larger teacher models. Existing approaches often face three key challenges: (1) information loss from hard pruning, (2) inefficient alignment of representations, and (3) underutilization of informative activations, particularly from Feed-Forward Networks (FFNs). To address these challenges, we introduce Low-Rank Clone (LRC), an efficient pre-training method that constructs SLMs aspiring to behavioral equivalence with strong teacher models. LRC trains a set of low-rank projection matrices that jointly enable soft pruning by compressing teacher weights, and activation clone by aligning student activations, including FFN signals, with those of the teacher. This unified design maximizes knowledge transfer while removing the need for explicit alignment modules. Extensive experiments with open-source teachers such as Llama-3.2-3B-Instruct and Qwen2.5-3B/7B-Instruct show that LRC matches or surpasses the performance of state-of-the-art models trained on trillions of tokens, while using only 20B tokens, achieving over 1,000× greater training efficiency. Our code and model checkpoints are available at https://github.com/CURRENTF/LowRankClone and https://huggingface.co/JitaiHao/LRC-4B-Base.

IROS Conference 2025 Conference Paper

Automated UAV-based Wind Turbine Blade Inspection: Blade Stop Angle Estimation and Blade Detail Prioritized Exposure Adjustment

  • Yichuan Shi
  • Hao Liu
  • Haowen Zheng
  • Haowen Yu
  • Xianqi Liang
  • Jie Li
  • Minmin Ma
  • Ximin Lyu

Unmanned aerial vehicles (UAVs) are critical in the automated inspection of wind turbine blades. Nevertheless, several issues persist in this domain. Firstly, existing inspection platforms encounter challenges in meeting the demands of automated inspection tasks and scenarios. Moreover, current blade stop angle estimation methods are vulnerable to environmental factors, restricting their robustness. Additionally, there is an absence of real-time blade detail prioritized exposure adjustment during capture, where lost details cannot be restored through post-optimization. To address these challenges, we introduce a platform and two approaches. Initially, a UAV inspection platform is presented to meet the automated inspection requirements. Subsequently, a Fermat point based blade stop angle estimation approach is introduced, achieving higher precision and success rates. Finally, we propose a blade detail prioritized exposure adjustment approach to ensure appropriate brightness and preserve details during image capture. Extensive tests, comprising over 120 flights across 10 wind turbine models in 5 operational wind farms, validate the effectiveness of the proposed approaches in enhancing inspection autonomy.

NeurIPS Conference 2025 Conference Paper

Bag of Tricks for Inference-time Computation of LLM Reasoning

  • Fan Liu
  • Wen-Shuo Chao
  • Naiqiang Tan
  • Hao Liu

With the advancement of large language models (LLMs), solving complex tasks (e.g., math problems, code generation) has garnered increasing attention. Inference-time computation methods (e.g., Best-of-N, MCTS) are of significant importance, as they have the potential to enhance the reasoning capabilities of LLMs without requiring additional training computation. However, due to the inherent challenges of this technique, most existing methods remain proof-of-concept and are not yet sufficiently effective. In this paper, we investigate and benchmark strategies for improving inference-time computation across a wide range of reasoning tasks. Since most current methods rely on a pipeline that first generates candidate solutions (e.g., chain-of-thought candidate solutions) and then selects among them based on specific reward signals (e.g., RLHF reward, process reward), our research focuses on strategies for both candidate solution generation (e.g., instructing prompts; hyperparameters such as temperature and top-p) and reward mechanisms (e.g., self-evaluation, reward types). The experimental results reveal that several previously overlooked strategies can be critical to the success of inference-time computation (e.g., simply tuning the temperature can improve general reasoning task performance by up to 5%). Based on extensive experiments (more than 20,000 A100-80G GPU hours with over 1,000 experiments) across a variety of models (e.g., the Llama, Qwen, and Mistral families) of various sizes, our proposed strategies outperform the baseline by a substantial margin in most cases, providing a stronger foundation for future research.
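As a concrete instance of the generate-then-select pipeline described above, Best-of-N reduces to sampling several candidates and keeping the one a reward function prefers. A minimal sketch with stand-in sampler and reward (the real setting uses an LLM and a learned or self-evaluated reward model):

```python
import random

def best_of_n(generate, reward, prompt, n=8, temperature=0.7):
    """Best-of-N: sample n candidates, return the one with the highest reward.

    `generate` and `reward` are stand-ins for an LLM sampler and a
    reward model (e.g. self-evaluation or a process reward model).
    """
    candidates = [generate(prompt, temperature) for _ in range(n)]
    return max(candidates, key=reward)

# Toy stand-ins: candidates are numbers; the reward prefers values near 42.
random.seed(0)
gen = lambda prompt, t: random.gauss(40, 5 * t)
rew = lambda c: -abs(c - 42)
best = best_of_n(gen, rew, "solve x", n=16)
print(abs(best - 42) < 5)  # True: the best of 16 samples lands near the target
```

The temperature and n knobs in this sketch are exactly the kind of generation-side hyperparameters the paper benchmarks.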

NeurIPS Conference 2025 Conference Paper

BlockScan: Detecting Anomalies in Blockchain Transactions

  • Jiahao Yu
  • Xian Wu
  • Hao Liu
  • Wenbo Guo
  • Xinyu Xing

We propose BlockScan, a customized Transformer for anomaly detection in blockchain transactions. Unlike existing methods that rely on rule-based systems or directly apply off-the-shelf large language models (LLMs), BlockScan introduces a series of customized designs to effectively model the unique data structure of blockchain transactions. First, a blockchain transaction is multi-modal, containing blockchain-specific tokens, texts, and numbers. We design a novel modularized tokenizer to handle these multi-modal inputs, balancing the information across different modalities. Second, we design a customized masked language modeling mechanism for pretraining the Transformer architecture, incorporating RoPE embedding and FlashAttention for handling longer sequences. Finally, we design a novel anomaly detection method based on the model outputs. We further provide theoretical analysis for the detection method of our system. Extensive evaluations on Ethereum and Solana transactions demonstrate BlockScan's exceptional capability in anomaly detection while maintaining a low false positive rate. Remarkably, BlockScan is the only method that successfully detects anomalous transactions on Solana with high accuracy, whereas all other approaches achieved very low or zero detection recall scores. This work sets a new benchmark for applying Transformer-based approaches in blockchain data analysis.
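A common way a masked-LM-style detector scores a sequence is by the model's negative log-likelihood over its tokens: transactions the pretrained model finds improbable score high. A toy sketch of that scoring step (illustrative probabilities and threshold; not BlockScan's exact detection method):

```python
import numpy as np

def anomaly_score(token_probs):
    """Average negative log-likelihood of a transaction's tokens.

    Anomalous transactions contain tokens the model assigns low
    probability, so they receive a high score. (Generic masked-LM-style
    scoring, standing in for BlockScan's detection method.)
    """
    return -np.mean(np.log(np.asarray(token_probs)))

def detect(transactions_probs, threshold=2.0):
    """Flag transactions whose score exceeds a calibrated threshold."""
    return [anomaly_score(p) > threshold for p in transactions_probs]

normal = [0.9, 0.8, 0.85, 0.9]       # the model expects every token
suspicious = [0.9, 0.01, 0.8, 0.02]  # two highly unexpected tokens
print(detect([normal, suspicious]))  # [False, True]
```

In practice the threshold would be calibrated on held-out benign transactions to keep the false positive rate low, which is the trade-off the abstract emphasizes.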

JMLR Journal 2025 Journal Article

Deep Neural Networks are Adaptive to Function Regularity and Data Distribution in Approximation and Estimation

  • Hao Liu
  • Jiahui Cheng
  • Wenjing Liao

Deep learning has exhibited remarkable results across diverse areas. To understand its success, substantial research has been directed towards its theoretical foundations. Nevertheless, the majority of these studies examine how well deep neural networks can model functions with uniform regularities. In this paper, we explore a different angle: how deep neural networks can adapt to varying degrees of smoothness in functions and nonuniform data distributions across different locations and scales. More precisely, we focus on a broad class of functions defined by nonlinear tree-based approximation methods. This class encompasses a range of function types, such as functions with uniform regularities and discontinuous functions. We develop nonparametric approximation and estimation theories for this class using deep ReLU networks. Our results show that deep neural networks are adaptive to the nonuniform smoothness of functions and nonuniform data distributions at different locations and scales. We apply our results to several function classes, and derive the corresponding approximation and generalization errors. The validity of our results is demonstrated through numerical experiments.

AAAI Conference 2025 Conference Paper

Erase Then Rectify: A Training-Free Parameter Editing Approach for Cost-Effective Graph Unlearning

  • Zhe-Rui Yang
  • Jindong Han
  • Chang-Dong Wang
  • Hao Liu

Graph unlearning, which aims to eliminate the influence of specific nodes, edges, or attributes from a trained Graph Neural Network (GNN), is essential in applications where privacy, bias, or data obsolescence is a concern. However, existing graph unlearning techniques often necessitate additional training on the remaining data, leading to significant computational costs, particularly with large-scale graphs. To address these challenges, we propose a two-stage training-free approach, Erase then Rectify (ETR), designed for efficient and scalable graph unlearning while preserving the model utility. Specifically, we first build a theoretical foundation showing that masking parameters critical for unlearned samples enables effective unlearning. Building on this insight, the Erase stage strategically edits model parameters to eliminate the impact of unlearned samples and their propagated influence on intercorrelated nodes. To further ensure the GNN's utility, the Rectify stage devises a gradient approximation method to estimate the model's gradient on the remaining dataset, which is then used to enhance model performance. Overall, ETR achieves graph unlearning without additional training or full training data access, significantly reducing computational overhead and preserving data privacy. Extensive experiments on seven public datasets demonstrate the consistent superiority of ETR in model utility, unlearning efficiency, and unlearning effectiveness, establishing it as a promising solution for real-world graph unlearning challenges.

EAAI Journal 2025 Journal Article

Fabric defect detection via Explicit De-Background

  • Yuntao Chen
  • Hao Liu
  • Jiuzhen Liang

Fabric defect detection under complex backgrounds faces challenges like high interference, false positives, and lack of robustness. To address these issues, a frequency-domain Explicit De-Background method is proposed to separate background from defects and enhance defect focus. The network uses a pixel-level De-background Layer to suppress noise and emphasize defects after extracting multi-scale features with Swin Transformer. This layer includes the Frequency-domain Background Extraction Module (F-BEM) and the Background Suppression Unit (BSU): F-BEM leverages Fourier amplitude to capture global background features, while BSU suppresses them to produce a difference map that accentuates defect regions. The De-Background Attention (DBA) module leverages the difference map as a weighting matrix to enhance spatial focus on defect features while minimizing background interference. To integrate multi-scale information, the Feature Cross-Shrinking Decoder (FCSD) progressively fuses adjacent layers via Cross Aggregation Nodes (CAN), ensuring semantic consistency, reducing redundancy, and mitigating information loss and gradient vanishing for precise defect segmentation. Our method enhances robustness and accuracy in complex backgrounds using explicit background separation and multi-stage feature processing. It surpasses state-of-the-art techniques on fabric defect datasets and shows good generalization in complex and transfer learning scenarios, providing a practical solution for industrial defect detection.

NeurIPS Conference 2025 Conference Paper

Foundation Models for Scientific Discovery: From Paradigm Enhancement to Paradigm Transition

  • Fan Liu
  • Jindong Han
  • Tengfei Lyu
  • Weijia Zhang
  • Zherui Yang
  • Lu Dai
  • Cancheng Liu
  • Hao Liu

Foundation models (FMs), such as GPT-4 and AlphaFold, are reshaping the landscape of scientific research. Beyond accelerating tasks such as hypothesis generation, experimental design, and result interpretation, they prompt a more fundamental question: Are FMs merely enhancing existing scientific methodologies, or are they redefining the way science is conducted? In this paper, we argue that FMs are catalyzing a transition toward a new scientific paradigm. We introduce a three-stage framework to describe this evolution: (1) Meta-Scientific Integration, where FMs enhance workflows within traditional paradigms; (2) Hybrid Human-AI Co-Creation, where FMs become active collaborators in problem formulation, reasoning, and discovery; and (3) Autonomous Scientific Discovery, where FMs operate as independent agents capable of generating new scientific knowledge with minimal human intervention. Through this lens, we review current applications and emerging capabilities of FMs across existing scientific paradigms. We further identify risks and future directions for FM-enabled scientific discovery. This position paper aims to support the scientific community in understanding the transformative role of FMs and to foster reflection on the future of scientific discovery.

EAAI Journal 2025 Journal Article

High-order graph convolutional networks for circular Ribonucleic Acid and disease association prediction incorporating multiple biological relationships

  • Hao Liu
  • Chen Chen
  • Xiaoyi Lv
  • Jin Gu
  • Enguang Zuo
  • Chenjie Chang
  • Ying Su
  • Cheng Chen

Background: The search for circular Ribonucleic Acid (circRNA) associated with complex diseases holds considerable importance for disease diagnosis, treatment and research, helping to improve the early recognition and therapeutic efficacy of diseases, deepen the understanding of disease mechanisms, and provide guidance for new drug development. Methods: This study presents an innovative high-order graph convolutional neural network, which leverages Gaussian kernels to compute the second-order proximity between nodes, thereby capturing long-range dependencies more effectively. Based on the topological structure of nodes in the graph, the model derives high-order embeddings, which not only enhance the preservation of the global network structure but also overcome the limitations of traditional methods that focus solely on local neighborhoods. Furthermore, by integrating this model with heterogeneous networks composed of multiple biological relationships, we successfully implement accurate predictions of circRNA-disease associations. Results: This study achieved an area under the curve (AUC) of 0.9491 and an accuracy of 0.9920 on the constructed benchmark dataset, significantly outperforming existing methods in predictive performance, while most of the candidate circRNAs screened in the case studies of breast neoplasms and glioma have been confirmed in the literature. Conclusions: This method provides a new perspective for integrating heterogeneous biological data in the study of complex disease-related circRNAs, and will advance further research and practical applications in this field.

ICRA Conference 2025 Conference Paper

In-Pipe Navigation Development Environment and a Smooth Path Planning Method on Pipeline Surface

  • Hao Liu
  • Xiang Li
  • Xiang Zhang
  • Gang Liu
  • Mingquan Lu

Autonomous in-pipe inspection robots can automatically navigate through complex pipeline networks and detect potential risks from corrosion and defects, demonstrating great potential for replacing costly manual inspections. However, to the best of our knowledge, there is no publicly available simulation environment in which researchers can validate their in-pipe navigation algorithms, and navigation algorithms on the constrained 3D pipe surface, the critical software component, are rarely discussed. Firstly, this paper proposes an open-source In-Pipe Navigation Development Environment. It contains various pipeline models, a magnetic wheel climbing robot model realized by the adhesion plugin, and baseline algorithms for navigation tasks. Secondly, a novel and effective path planning method is introduced. Instead of planning based on surface structures, the proposed method plans along the pipeline axis and maps the result into a local path using the Frenet-Serret formula, thereby generating smooth, feasible, and efficient paths. Finally, we conduct both qualitative and quantitative experiments in the proposed simulation and real-world environments. The results show the usability of the development environment, as well as the robustness and efficiency of the proposed planning method.

TIST Journal 2025 Journal Article

LLM-Enhanced User–Item Interactions: Leveraging Edge Information for Optimized Recommendations

  • Xinyuan Wang
  • Liang Wu
  • Liangjie Hong
  • Hao Liu
  • Yanjie Fu

Graph recommendation methods, representing a connected interaction perspective, reformulate user–item interactions as graphs to leverage graph structure and topology for recommendation, and have proven practically effective at scale. Large language models (LLMs), representing a textual generative perspective, excel at modeling user languages, understanding behavioral contexts, capturing user–item semantic relationships, analyzing textual sentiments, and generating coherent and contextually relevant texts as recommendations. However, there is a gap between the connected graph perspective and the text generation perspective, as the task formulations are different. A research question arises: how can we effectively integrate the two perspectives for more personalized RecSys? To fill this gap, we propose to incorporate graph-edge information into LLMs via prompt and attention innovations. We reformulate recommendations as a probabilistic generative problem using prompts. We develop a framework to incorporate graph edge information from the prompt and attention mechanisms for graph-structured LLM recommendations. We develop a new prompt design that brings in both first-order and second-order graph relationships, and we devise an improved LLM attention mechanism to directly embed the spatial and connectivity information of edges. Our evaluation on real-world datasets demonstrates the framework's ability to understand connectivity information in graph data and to improve the relevance and quality of recommendation results. Our code is released at: https://github.com/anord-wang/LLM4REC.git.

NeurIPS Conference 2025 Conference Paper

MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem

  • Fan Liu
  • Zherui Yang
  • Cancheng Liu
  • Tianrui Song
  • Xiaofeng Gao
  • Hao Liu

Mathematical modeling is a cornerstone of scientific discovery and engineering practice, enabling the translation of real-world problems into formal systems across domains such as physics, biology, and economics. Unlike mathematical reasoning, which assumes a predefined formulation, modeling requires open-ended problem analysis, abstraction, and principled formalization. While Large Language Models (LLMs) have shown strong reasoning capabilities, they fall short in rigorous model construction, limiting their utility in real-world problem-solving. To this end, we formalize the task of LLM-powered real-world mathematical modeling, where agents must analyze problems, construct domain-appropriate formulations, and generate complete end-to-end solutions. We introduce MM-Bench, a curated benchmark of 111 problems from the Mathematical Contest in Modeling (MCM/ICM), spanning the years 2000 to 2025 and across ten diverse domains such as physics, biology, and economics. To tackle this task, we propose MM-Agent, an expert-inspired framework that decomposes mathematical modeling into four stages: open-ended problem analysis, structured model formulation, computational problem solving, and report generation. Experiments on MM-Bench show that MM-Agent significantly outperforms baseline agents, achieving an 11.88% improvement over human expert solutions while requiring only 15 minutes and $0.88 per task using GPT-4o. Furthermore, under official MCM/ICM protocols, MM-Agent assisted two undergraduate teams in winning the Finalist Award (top 2.0% among 27,456 teams) in MCM/ICM 2025, demonstrating its practical effectiveness as a modeling copilot.

EAAI Journal 2025 Journal Article

Multi-objective optimization of buckling load and natural frequency in functionally graded porous nanobeams using non-dominated sorting genetic Algorithm-II

  • Hao Liu
  • Ali Basem
  • Dheyaa J. Jasim
  • Mohammad Hashemian
  • S. Ali Eftekhari
  • Halah Jawad Al-fanhrawi
  • Barno Abdullaeva
  • Soheil Salahshour

This study investigates the fundamental natural frequency and critical buckling load of Functionally Graded Porous nanobeams supported by an elastic medium, addressing the need for optimized designs in advanced nanostructures. Utilizing a Genetic Algorithm and Non-Dominated Sorting Genetic Algorithm-II, the research aims to identify the Pareto front for these two objectives while incorporating surface effects. The nanobeam is modeled using Nonlocal Strain Gradient Theory and Gurtin-Murdoch surface elasticity theory, with governing equations solved via the Generalized Differential Quadrature Method based on Reddy's Third-order Shear Deformation Theory. Key input parameters, including temperature gradient, residual surface stress, porosity, and elastic foundation properties, are varied to train two Artificial Neural Networks for output prediction. Results indicate that for the fundamental frequency, significant factors include the material length scale and the Pasternak shear foundation parameter, while the critical buckling load is mainly influenced by the temperature gradient and the same material parameters. These findings provide critical insights for designers, allowing them to make informed decisions based on optimal values for eight input parameters.

NeurIPS Conference 2025 Conference Paper

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

  • Ling Fu
  • Zhebin Kuang
  • Jiajun Song
  • Mingxin Huang
  • Biao Yang
  • Yuzhe Li
  • Linghao Zhu
  • Qidi Luo

Scoring the Optical Character Recognition (OCR) capabilities of Large Multimodal Models (LMMs) has witnessed growing interest. Existing benchmarks have highlighted the impressive performance of LMMs in text recognition; however, their abilities in certain challenging tasks, such as text localization, handwritten content extraction, and logical reasoning, remain underexplored. To bridge this gap, we introduce OCRBench v2, a large-scale bilingual text-centric benchmark with currently the most comprehensive set of tasks (4× more tasks than the previous multi-scene benchmark OCRBench), the widest coverage of scenarios (31 diverse scenarios), and thorough evaluation metrics, with 10,000 human-verified question-answering pairs and a high proportion of difficult samples. Moreover, we construct a private test set with 1,500 manually annotated images. The consistent evaluation trends observed across both public and private test sets validate OCRBench v2's reliability. After carefully benchmarking state-of-the-art LMMs, we find that most LMMs score below 50 (out of 100) and suffer from five types of limitations, including less frequently encountered text recognition, fine-grained perception, layout perception, complex element parsing, and logical reasoning. The benchmark and evaluation scripts are available at https://github.com/Yuliang-Liu/MultimodalOCR.

IJCAI Conference 2025 Conference Paper

RePST: Language Model Empowered Spatio-Temporal Forecasting via Semantic-Oriented Reprogramming

  • Hao Wang
  • Jindong Han
  • Wei Fan
  • Leilei Sun
  • Hao Liu

Spatio-temporal forecasting is pivotal in numerous real-world applications, including transportation planning, energy management, and climate monitoring. In this work, we aim to harness the reasoning and generalization abilities of Pre-trained Language Models (PLMs) for more effective spatio-temporal forecasting, particularly in data-scarce scenarios. However, recent studies uncover that PLMs, which are primarily trained on textual data, often falter when tasked with modeling the intricate correlations in numerical time series, thereby limiting their effectiveness in comprehending spatio-temporal data. To bridge the gap, we propose RePST, a semantic-oriented PLM reprogramming framework tailored for spatio-temporal forecasting. Specifically, we first propose a semantic-oriented decomposer that adaptively disentangles spatially correlated time series into interpretable sub-components, which facilitates PLM to understand sophisticated spatio-temporal dynamics via a divide-and-conquer strategy. Moreover, we propose a selective discrete reprogramming scheme, which introduces an expanded spatio-temporal vocabulary space to project spatio-temporal series into discrete representations. This scheme minimizes the information loss during reprogramming and enriches the representations derived by PLMs. Extensive experiments on real-world datasets show that the proposed RePST outperforms twelve state-of-the-art baseline methods, particularly in data-scarce scenarios, highlighting the effectiveness and superior generalization capabilities of PLMs for spatio-temporal forecasting. Codes and Appendix can be found at https://github.com/usail-hkust/REPST.

EAAI Journal 2025 Journal Article

SAMGCN: A spatially-augmented multi-view graph convolutional network for identifying spatial domains

  • Hao Liu
  • Yue Gao
  • Ying-Lian Gao
  • Cui-Na Jiao
  • Junliang Shang
  • Jin-Xing Liu

Recent innovations in spatial transcriptomics have enabled the measurement of gene expression profiles while preserving the spatial organization of cells. This provides extensive opportunities to explore gene expression patterns in the tissue microenvironment. However, it remains a challenge to combine spatial information with gene expression to accurately identify spatial domains. In this study, a spatially-augmented multi-view graph convolutional network for identifying spatial domains (SAMGCN) is proposed. First, SAMGCN reconstructs gene expression data by incorporating spatial neighborhood information, which enhances gene expression features. It improves the quality of gene expression data and augments the characterization of spatial domains through the construction of spatial graphs, feature graphs, and spatial expression-weighted graphs. By extracting spatial information and gene expression data via convolutional operations, SAMGCN learns multi-view-specific embeddings and employs a contrastive strategy to refine and augment spatial neighborhood relationships, addressing limitations in spatial gene expression data. An attention mechanism is then employed to flexibly merge these embeddings, generating the final spot embedding. Additionally, a zero-inflated negative binomial decoder is used to capture the global probability distribution of gene expression profiles. Finally, the performance of SAMGCN has been validated across various platforms and spatial transcriptomics datasets of different scales, demonstrating its exceptional capability to process spatial transcriptomics data.

YNIMG Journal 2025 Journal Article

Structural damage-driven brain compensation among near-centenarians and centenarians without dementia

  • Hui Tang
  • Haichao Zhao
  • Hao Liu
  • Jiyang Jiang
  • Nicole Kochan
  • Jing Jing
  • Henry Brodaty
  • Wei Wen

Compensation has been proposed as a mechanism to explain how individuals in very old age remain able to maintain normal cognitive functioning. Previous studies have provided evidence on the role of increasing functional connectivity as a compensatory mechanism for age-related white matter damage. However, we lack direct investigation into how these mechanisms contribute to the preservation of cognition in the very old population. We examined a cohort of near-centenarians and centenarians without dementia (aged 95-103 years, n=44). We constructed a structural disconnection matrix based on the disruption of white matter pathways caused by white matter hyperintensities (WMHs), aiming to explore the relationship between functional connections, cognitive preservation and white matter damage. Our results revealed that structural damage can reliably explain the variations of functional connections or cognitive maintenance. Notably, we found significant correlations between the weights in the functional connectivity model and the weights in the cognition model. We observed positive correlations between models for brain disconnections and cognitive function in near-centenarians and centenarians. The strongest effects were found between attention and somatomotor network (SMN) (r=0.397, p<0.001), memory and SMN (r=0.333, p<0.001), fluency and visual network (VIS) - control network (CN) (r=0.406, p<0.001), language and VIS (r=0.309, p<0.001), visuospatial ability and VIS-default mode network (DMN) (r=0.464, p<0.001), as well as global cognition and VIS-DMN (r=0.335, p<0.001). These findings suggest that enhancement of functional connectivity may serve as a compensatory mechanism, such that it mitigates the effects of white matter damage and contributes to preserved cognitive performance in very old age.

TIST Journal 2025 Journal Article

Towards Predicting Urban Land Use Changes: A Dynamic Graph Alignment Perspective

  • Yu Fan
  • Xinjiang Lu
  • Hao Liu
  • Pengfei Wang
  • Liang Liu
  • Huadong Ma
  • Jingbo Zhou

Urban land use, intrinsically linked to people’s daily activities, undergoes continuous evolution, presenting a complex interplay that remains partially understood. To bridge this gap, our study leverages fine-grained human mobility data to predict these changes, adopting a novel approach that conceptualizes “community-level” land use shifts as a regression problem and represents citywide changes through dynamic graphs. We harness recent advancements in graph neural networks (GNNs), which, despite their success in various applications, face challenges in directly predicting land use changes due to the temporal mismatch between the slow evolution of urban land and the immediacy of human mobility data. Our research stands out by introducing a temporal skeleton for dynamic GNNs to synchronize human activity graphs with urban land use changes, a dynamic heterogeneous GNN approach for integrating diverse human activity data to capture essential temporal dependencies, and a novel algorithm powered by causal inference to elucidate the primary factors influencing land use predictions at the community level, all of which contribute to a training process informed by the generated causal graph. Empirically validated on three real-world datasets, our model demonstrates a performance leap over state-of-the-art baselines, marking a pivotal step toward understanding and predicting the dynamics of urban land use.

AAAI Conference 2024 Conference Paper

A Cross-View Hierarchical Graph Learning Hypernetwork for Skill Demand-Supply Joint Prediction

  • Wenshuo Chao
  • Zhaopeng Qiu
  • Likang Wu
  • Zhuoning Guo
  • Zhi Zheng
  • Hengshu Zhu
  • Hao Liu

The rapidly changing landscape of technology and industries leads to dynamic skill requirements, making it crucial for employees and employers to anticipate such shifts to maintain a competitive edge in the labor market. Existing efforts in this area either rely on domain-expert knowledge or regard skill evolution as a simplified time series forecasting problem. However, both approaches overlook the sophisticated relationships among different skills and the interconnection between skill demand and supply variations. In this paper, we propose a Cross-view Hierarchical Graph learning Hypernetwork (CHGH) framework for joint skill demand-supply prediction. Specifically, CHGH is an encoder-decoder network consisting of i) a cross-view graph encoder to capture the interconnection between skill demand and supply, ii) a hierarchical graph encoder to model the co-evolution of skills from a cluster-wise perspective, and iii) a conditional hyper-decoder to jointly predict demand and supply variations by incorporating historical demand-supply gaps. Extensive experiments on three real-world datasets demonstrate the superiority of the proposed framework compared to seven baselines and the effectiveness of the three modules.

NeurIPS Conference 2024 Conference Paper

AdaSociety: An Adaptive Environment with Social Structures for Multi-Agent Decision-Making

  • Yizhe Huang
  • Xingbo Wang
  • Hao Liu
  • Fanqi Kong
  • Aoyang Qin
  • Min Tang
  • Song-Chun Zhu
  • Mingjie Bi

Traditional interactive environments limit agents' intelligence growth with fixed tasks. Recently, single-agent environments address this by generating new tasks based on agent actions, enhancing task diversity. We consider the decision-making problem in multi-agent settings, where tasks are further influenced by social connections, affecting rewards and information access. However, existing multi-agent environments lack a combination of adaptive physical surroundings and social connections, hindering the learning of intelligent behaviors. To address this, we introduce AdaSociety, a customizable multi-agent environment featuring expanding state and action spaces, alongside explicit and alterable social structures. As agents progress, the environment adaptively generates new tasks with social structures for agents to undertake. In AdaSociety, we develop three mini-games showcasing distinct social structures and tasks. Initial results demonstrate that specific social structures can promote both individual and collective benefits, though current reinforcement learning and LLM-based algorithms show limited effectiveness in leveraging social structures to enhance performance. Overall, AdaSociety serves as a valuable research platform for exploring intelligence in diverse physical and social settings. The code is available at https://github.com/bigai-ai/AdaSociety.

NeurIPS Conference 2024 Conference Paper

Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs

  • Zhao Xu
  • Fan Liu
  • Hao Liu

Although Large Language Models (LLMs) have demonstrated significant capabilities in executing complex tasks in a zero-shot manner, they are susceptible to jailbreak attacks and can be manipulated to produce harmful outputs. Recently, a growing body of research has categorized jailbreak attacks into token-level and prompt-level attacks. However, previous work primarily overlooks the diverse key factors of jailbreak attacks, with most studies concentrating on LLM vulnerabilities and lacking exploration of defense-enhanced LLMs. To address these issues, we introduced JailTrickBench to evaluate the impact of various attack settings on LLM performance and provide a baseline for jailbreak attacks, encouraging the adoption of a standardized evaluation framework. Specifically, we evaluate the eight key factors of implementing jailbreak attacks on LLMs from both target-level and attack-level perspectives. We further conduct seven representative jailbreak attacks on six defense methods across two widely used datasets, encompassing approximately 354 experiments with about 55,000 GPU hours on A800-80G. Our experimental results highlight the need for standardized benchmarking to evaluate these attacks on defense-enhanced LLMs. Our code is available at https://github.com/usail-hkust/JailTrickBench.

ICRA Conference 2024 Conference Paper

Bio-Inspired Pupal-Mode Actuator with Ultra-Crossing Capability for Soft Robots

  • Zhenxing Wang
  • Xiao He
  • Yuhang Zhang
  • Cheng Zhang
  • Lei Sun
  • Zhidong Wang
  • Shun Xu
  • Hao Liu

Robot-assisted Natural Orifice Transluminal Endoscopic Surgery (NOTES) represents a paradigm shift in surgical practice, significantly minimizing patient morbidity. However, the variability of inner diameter and the inter-luminal crossing within the luminal tracts pose challenges for effective robotic intervention. Inspired by the motion of the chrysalis during its transformation, we designed an innovative pupal-mode actuator for NOTES robots. Through the manipulation of its internal air chambers, this actuator is capable of replicating wriggle-like movements. Through experimental analysis, we have acquired the constitutive characteristics of this actuator. Subsequently, an innovative gastric endoscopy robot is developed based on this actuator and tested in a phantom. The results of the task simulations substantiate that the pupal-mode actuator has the capability to reduce resistance and enhance the safety of endoscopic intervention.

JMLR Journal 2024 Journal Article

Deep Nonparametric Estimation of Operators between Infinite Dimensional Spaces

  • Hao Liu
  • Haizhao Yang
  • Minshuo Chen
  • Tuo Zhao
  • Wenjing Liao

Learning operators between infinitely dimensional spaces is an important learning task arising in machine learning, imaging science, mathematical modeling and simulations, etc. This paper studies the nonparametric estimation of Lipschitz operators using deep neural networks. Non-asymptotic upper bounds are derived for the generalization error of the empirical risk minimizer over a properly chosen network class. Under the assumption that the target operator exhibits a low dimensional structure, our error bounds decay as the training sample size increases, with an attractive fast rate depending on the intrinsic dimension in our estimation. Our assumptions cover most scenarios in real applications and our results give rise to fast rates by exploiting low dimensional structures of data in operator estimation. We also investigate the influence of network structures (e.g., network width, depth, and sparsity) on the generalization error of the neural network estimator and propose a general suggestion on the choice of network structures to maximize the learning efficiency quantitatively.

NeurIPS Conference 2024 Conference Paper

Disentangling Linear Quadratic Control with Untrusted ML Predictions

  • Tongxin Li
  • Hao Liu
  • Yisong Yue

Uncertain perturbations in dynamical systems often arise from diverse resources, represented by latent components. The predictions for these components, typically generated by "black-box" machine learning tools, are prone to inaccuracies. To tackle this challenge, we introduce DISC, a novel policy that learns a confidence parameter online to harness the potential of accurate predictions while also mitigating the impact of erroneous forecasts. When predictions are precise, DISC leverages this information to achieve near-optimal performance. Conversely, in the case of significant prediction errors, it still has a worst-case competitive ratio guarantee. We provide competitive ratio bounds for DISC under both linear mixing of latent variables as well as a broader class of mixing functions. Our results highlight a first-of-its-kind "best-of-both-worlds" integration of machine-learned predictions, thus leading to a near-optimal consistency and robustness tradeoff, which provably improves on what can be obtained without learning the confidence parameter. We validate the applicability of DISC across a spectrum of practical scenarios.
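The confidence-weighted blending that DISC performs can be illustrated with a toy sketch. The function names and the exponential error-to-confidence mapping below are illustrative choices of ours, not the paper's actual online update rule:

```python
import math

def disc_action(u_pred: float, u_robust: float, lam: float) -> float:
    # Convex combination: weight lam on the ML-predicted action,
    # weight (1 - lam) on the worst-case-safe robust action.
    return lam * u_pred + (1.0 - lam) * u_robust

def update_confidence(lam: float, pred_error: float, lr: float = 0.2) -> float:
    # Illustrative online rule: convert the observed prediction error
    # into an accuracy score in (0, 1] and move lam toward it.
    accuracy = math.exp(-pred_error)
    lam = lam + lr * (accuracy - lam)
    return min(1.0, max(0.0, lam))
```

With perfect predictions the confidence drifts toward 1 and the policy tracks the predictions (consistency); with large errors it drifts toward 0 and falls back on the robust controller (robustness), mirroring the trade-off the abstract describes.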

TMLR Journal 2024 Journal Article

Distributionally Robust Policy Evaluation under General Covariate Shift in Contextual Bandits

  • Yihong Guo
  • Hao Liu
  • Yisong Yue
  • Anqi Liu

We introduce a distributionally robust approach that enhances the reliability of offline policy evaluation in contextual bandits under general covariate shifts. Our method aims to deliver robust policy evaluation results in the presence of discrepancies in both context and policy distribution between logging and target data. Central to our methodology is the application of robust regression, a distributionally robust technique tailored here to improve the estimation of the conditional reward distribution from logging data. Utilizing the reward model obtained from robust regression, we develop a comprehensive suite of policy value estimators by integrating our reward model into established evaluation frameworks, namely direct methods and doubly robust methods. Through theoretical analysis, we further establish that the proposed policy value estimators offer a finite-sample upper bound on the bias, providing a clear advantage over traditional methods, especially when the shift is large. Finally, we design an extensive range of policy evaluation scenarios, covering diverse magnitudes of shifts and a spectrum of logging and target policies. Our empirical results indicate that our approach significantly outperforms baseline methods, most notably in 90% of the cases under the policy-shift-only settings and 72% of the scenarios under the general covariate shift settings.
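The doubly robust estimator into which the abstract's reward model is plugged follows a standard form: a model-based value term plus an importance-weighted correction of the model's residual on logged actions. A minimal sketch (variable names are illustrative, not from the paper):

```python
import numpy as np

def doubly_robust_value(r, p_log, p_tgt, q_log, v_tgt):
    """Doubly robust off-policy value estimate from logged bandit data.

    r     : observed rewards for the logged actions
    p_log : logging-policy probabilities of those actions
    p_tgt : target-policy probabilities of the same actions
    q_log : reward-model predictions for the logged actions
    v_tgt : reward-model expected reward under the target policy
    """
    w = p_tgt / p_log  # importance weights
    return float(np.mean(v_tgt + w * (r - q_log)))

# Sanity check: with an exact reward model the correction term vanishes
# and the estimate reduces to the model-based (direct-method) value.
r = np.array([1.0, 0.0, 1.0])
q_log = r.copy()                 # exact model on the logged actions
v_tgt = np.array([0.6, 0.4, 0.8])
est = doubly_robust_value(r, np.full(3, 0.5), np.full(3, 0.25), q_log, v_tgt)
```

The "doubly robust" property is visible here: the estimate is unbiased if either the reward model or the importance weights are correct.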

IJCAI Conference 2024 Conference Paper

Dynamicity-aware Social Bot Detection with Dynamic Graph Transformers

  • Buyun He
  • Yingguang Yang
  • Qi Wu
  • Hao Liu
  • Renyu Yang
  • Hao Peng
  • Xiang Wang
  • Yong Liao

Detecting social bots has evolved into a pivotal yet intricate task, aimed at combating the dissemination of misinformation and preserving the authenticity of online interactions. While earlier graph-based approaches, which leverage the topological structure of social networks, yielded notable outcomes, they overlooked the inherent dynamicity of social networks: in reality, they largely depicted the social network as a static graph and solely relied on its most recent state. Due to the absence of dynamicity modeling, such approaches are vulnerable to evasion, particularly when advanced social bots interact with other users to camouflage their identities and escape detection. To tackle these challenges, we propose BotDGT, a novel framework that not only considers the topological structure, but also effectively incorporates the dynamic nature of social networks. Specifically, we characterize a social network as a dynamic graph. A structural module is employed to acquire topological information from each historical snapshot. Additionally, a temporal module is proposed to integrate historical context and model the evolving behavior patterns exhibited by social bots and legitimate users. Experimental results demonstrate the superiority of BotDGT against leading methods that neglected the dynamic nature of social networks in terms of accuracy, recall, and F1-score.

AAAI Conference 2024 Conference Paper

Grab What You Need: Rethinking Complex Table Structure Recognition with Flexible Components Deliberation

  • Hao Liu
  • Xin Li
  • Mingming Gong
  • Bing Liu
  • Yunfei Wu
  • Deqiang Jiang
  • Yinsong Liu
  • Xing Sun

Recently, the Table Structure Recognition (TSR) task, which aims to identify table structure in machine-readable formats, has received increasing interest in the community. Despite impressive success, most single-table-component-based methods cannot perform well on unregularized table cases distracted by not only complicated inner structure but also exterior capture distortion. In this paper, we raise this as the Complex TSR problem, where the performance degeneration of existing methods is attributable to their inefficient component usage and redundant post-processing. To mitigate it, we shift our perspective from table component extraction toward efficiently leveraging multiple components, which awaits further exploration in the field. Specifically, we propose a seminal method, termed GrabTab, equipped with a newly proposed Component Deliberator, to handle various types of tables in a unified framework. Thanks to its progressive deliberation mechanism, GrabTab can flexibly accommodate most complex tables with reasonable components selected but without complicated post-processing involved. Quantitative experimental results on public benchmarks demonstrate that our method significantly outperforms the state-of-the-art, especially under more challenging scenes.

NeurIPS Conference 2024 Conference Paper

Harmonizing Visual Text Comprehension and Generation

  • Zhen Zhao
  • Jingqun Tang
  • Binghong Wu
  • Chunhui Lin
  • Shu Wei
  • Hao Liu
  • Xin Tan
  • Zhizhong Zhang

In this work, we present TextHarmony, a unified and versatile multimodal generative model proficient in comprehending and generating visual text. Simultaneously generating images and texts typically results in performance degradation due to the inherent inconsistency between vision and language modalities. To overcome this challenge, existing approaches resort to modality-specific data for supervised fine-tuning, necessitating distinct model instances. We propose Slide-LoRA, which dynamically aggregates modality-specific and modality-agnostic LoRA experts, partially decoupling the multimodal generation space. Slide-LoRA harmonizes the generation of vision and language within a singular model instance, thereby facilitating a more unified generative process. Additionally, we develop a high-quality image caption dataset, DetailedTextCaps-100K, synthesized with a sophisticated closed-source MLLM to enhance visual text generation capabilities further. Comprehensive experiments across various benchmarks demonstrate the effectiveness of the proposed approach. Empowered by Slide-LoRA, TextHarmony achieves comparable performance to modality-specific fine-tuning results with only a 2% increase in parameters and shows an average improvement of 2.5% in visual text comprehension tasks and 4.0% in visual text generation tasks. Our work delineates the viability of an integrated approach to multimodal generation within the visual text domain, setting a foundation for subsequent inquiries. Code is available at https://github.com/bytedance/TextHarmony.

ICRA Conference 2024 Conference Paper

Optimal Containment Control of Multiple Quadrotors via Reinforcement Learning

  • Ming Cheng
  • Hao Liu
  • Deyuan Liu
  • Haibo Gu
  • Xiangke Wang

This paper explores the optimal containment control problem for nonlinear and underactuated quadrotors with multiple team leaders governed by nonlinear dynamics, employing reinforcement learning. A cascade controller is formulated, comprising a position control component to ensure containment achievement and an attitude control component to govern the rotational channel. The proposed optimal control protocols are derived from historical data collected from quadrotor systems, without requiring exact knowledge of vehicle dynamics. The simulation illustrates the effectiveness of the proposed controller in managing a quadrotor team with multiple leaders.

ICML Conference 2024 Conference Paper

Position: TrustLLM: Trustworthiness in Large Language Models

  • Yue Huang 0001
  • Lichao Sun 0001
  • Haoran Wang 0005
  • Siyuan Wu 0001
  • Qihui Zhang
  • Yuan Li
  • Chujie Gao
  • Yixin Huang

Large language models (LLMs) have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, an established benchmark, evaluation and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings first show that, in general, trustworthiness and capability (i.e., functional effectiveness) are positively related. Secondly, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones, suggesting that open-source models can achieve high levels of trustworthiness without additional mechanisms like moderators, offering valuable insights for developers in this field. Thirdly, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Beyond these observations, we have uncovered key insights into the multifaceted trustworthiness of LLMs. We emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. We advocate that establishing an AI alliance among industry, academia, and the open-source community to foster collaboration is imperative to advance the trustworthiness of LLMs.

YNIMG Journal 2024 Journal Article

Relationships between brain structure-function coupling in normal aging and cognition: A cross-ethnicity population-based study

  • Chang Liu
  • Jing Jing
  • Jiyang Jiang
  • Wei Wen
  • Wanlin Zhu
  • Zixiao Li
  • Yuesong Pan
  • Xueli Cai

Increased efforts in neuroscience seek to understand how macro-anatomical and physiological connectomes cooperatively work to generate cognitive behaviors. However, the structure-function coupling characteristics of normal aging individuals remain unclear. Here, we developed an index, the Coupling in Brain Structural connectome and Functional connectome (C-BSF) index, to quantify regional structure-function coupling in a large community-based cohort. C-BSF used diffusion tensor imaging (DTI) and resting-state functional magnetic resonance imaging (fMRI) data from the Polyvascular Evaluation for Cognitive Impairment and Vascular Events study (PRECISE) cohort (2007 individuals, age: 61.15 ± 6.49 years) and the Sydney Memory and Ageing Study (MAS) cohort (254 individuals, age: 83.45 ± 4.33 years). We observed that structure-function coupling was the strongest in the visual network and the weakest in the ventral attention network. We also observed that weaker structure-function coupling was associated with increased age and worse cognitive performance. Meanwhile, the structure-function coupling in the visual network was associated with visuospatial performance and partially mediated the connection between age and visuospatial function. This work contributes to our understanding of the underlying brain mechanisms by which aging affects cognition and also helps establish early diagnosis and treatment approaches for neurological diseases in the elderly.

AAAI Conference 2024 Conference Paper

Self-Paced Unified Representation Learning for Hierarchical Multi-Label Classification

  • Zixuan Yuan
  • Hao Liu
  • Haoyi Zhou
  • Denghui Zhang
  • Xiao Zhang
  • Hao Wang
  • Hui Xiong

Hierarchical Multi-Label Classification (HMLC) is a well-established problem that aims at assigning data instances to multiple classes stored in a hierarchical structure. Despite its importance, existing approaches often face two key limitations: (i) They employ dense networks to solely explore the class hierarchy as hard criterion for maintaining taxonomic consistency among predicted classes, yet without leveraging rich semantic relationships between instances and classes; (ii) They struggle to generalize in settings with deep class levels, since the mini-batches uniformly sampled from different levels ignore the varying complexities of data and result in a non-smooth model adaptation to sparse data. To mitigate these issues, we present a Self-Paced Unified Representation (SPUR) learning framework, which focuses on the interplay between instance and classes to flexibly organize the training process of HMLC algorithms. Our framework consists of two lightweight encoders designed to capture the semantics of input features and the topological information of the class hierarchy. These encoders generate unified embeddings of instances and class hierarchy, which enable SPUR to exploit semantic dependencies between them and produce predictions in line with taxonomic constraints. Furthermore, we introduce a dynamic hardness measurement strategy that considers both class hierarchy and instance features to estimate the learning difficulty of each instance. This strategy is achieved by incorporating the propagation loss obtained at each hierarchical level, allowing for a more comprehensive assessment of learning complexity. Extensive experiments on several empirical benchmarks demonstrate the effectiveness and efficiency of SPUR compared to state-of-the-art methods, especially in scenarios with missing features.

NeurIPS Conference 2024 Conference Paper

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy

  • Weichao Zhao
  • Hao Feng
  • Qi Liu
  • Jingqun Tang
  • Shu Wei
  • Binghong Wu
  • Lei Liao
  • Yongjie Ye

Tables contain factual and quantitative data accompanied by various structures and contents that pose challenges for machine comprehension. Previous methods generally design task-specific architectures and objectives for individual tasks, resulting in modal isolation and intricate workflows. In this paper, we present a novel large vision-language model, TabPedia, equipped with a concept synergy mechanism. In this mechanism, all the involved diverse visual table understanding (VTU) tasks and multi-source visual embeddings are abstracted as concepts. This unified framework allows TabPedia to seamlessly integrate VTU tasks, such as table detection, table structure recognition, table querying, and table question answering, by leveraging the capabilities of large language models (LLMs). Moreover, the concept synergy mechanism enables table perception-related and comprehension-related tasks to work in harmony, as they can effectively leverage the needed clues from the corresponding source perception embeddings. Furthermore, to better evaluate the VTU task in real-world scenarios, we establish a new and comprehensive table VQA benchmark, ComTQA, featuring approximately 9,000 QA pairs. Extensive quantitative and qualitative experiments on both table perception and comprehension tasks, conducted across various public benchmarks, validate the effectiveness of our TabPedia. The superior performance further confirms the feasibility of using LLMs for understanding visual tables when all concepts work in synergy. The benchmark ComTQA has been open-sourced at https://huggingface.co/datasets/ByteDance/ComTQA. The source code and model have also been released at https://github.com/zhaowc-ustc/TabPedia.

NeurIPS Conference 2024 Conference Paper

UrbanKGent: A Unified Large Language Model Agent Framework for Urban Knowledge Graph Construction

  • Yansong Ning
  • Hao Liu

The urban knowledge graph has recently emerged as a building block for distilling critical knowledge from multi-sourced urban data for diverse urban application scenarios. Despite its promising benefits, urban knowledge graph construction (UrbanKGC) still heavily relies on manual effort, hindering its potential advancement. This paper presents UrbanKGent, a unified large language model agent framework for urban knowledge graph construction. Specifically, we first construct the knowledgeable instruction set for UrbanKGC tasks (such as relational triplet extraction and knowledge graph completion) via heterogeneity-aware and geospatial-infused instruction generation. Moreover, we propose a tool-augmented iterative trajectory refinement module to enhance and refine the trajectories distilled from GPT-4. Through hybrid instruction fine-tuning with augmented trajectories on the Llama 2 and Llama 3 families, we obtain the UrbanKGC agent family, consisting of the UrbanKGent-7/8/13B versions. We perform a comprehensive evaluation on two real-world datasets using both human and GPT-4 self-evaluation. The experimental results demonstrate that the UrbanKGent family can not only significantly outperform 31 baselines in UrbanKGC tasks, but also surpass the state-of-the-art LLM, GPT-4, by more than 10% with approximately 20 times lower cost. Compared with the existing benchmark, the UrbanKGent family could help construct an UrbanKG with hundreds of times richer relationships using only one-fifth of the data. Our data and code are available at https://github.com/usail-hkust/UrbanKGent.

EAAI Journal 2023 Journal Article

A geometry-aware deep network for depth estimation in monocular endoscopy

  • Yongming Yang
  • Shuwei Shao
  • Tao Yang
  • Peng Wang
  • Zhuo Yang
  • Chengdong Wu
  • Hao Liu

Monocular depth estimation is critical for endoscopists to perform spatial perception and 3D navigation of surgical sites. However, most of the existing methods ignore the important geometric structural consistency, which inevitably leads to performance degradation and distortion of 3D reconstruction. To address this issue, we introduce a gradient loss to penalize ambiguous edge fluctuations around stepped edge structures and a normal loss to explicitly express the sensitivity to frequently occurring small structures, and propose a geometric consistency loss to spread the spatial information across the sample grids to constrain the global geometric anatomy structures. In addition, we develop a synthetic RGB-Depth dataset that captures the anatomical structures under reflections and illumination variations. The proposed method is extensively validated across different datasets and clinical images and achieves mean RMSE values of 0.066 (stomach), 0.029 (small intestine), and 0.139 (colon) on the EndoSLAM dataset. The generalizability of the proposed method achieves mean RMSE values of 12.604 (T1-L1), 9.930 (T2-L2), and 13.893 (T3-L3) on the ColonDepth dataset. The experimental results show that our method exceeds previous state-of-the-art competitors and generates more consistent depth maps and reasonable anatomical structures. The quality of intraoperative 3D structure perception from endoscopic videos of the proposed method meets the accuracy requirements of video-CT registration algorithms for endoscopic navigation. The dataset and the source code will be available at https://github.com/YYM-SIA/LINGMI-MR.

NeurIPS Conference 2023 Conference Paper

Blockwise Parallel Transformers for Large Context Models

  • Hao Liu
  • Pieter Abbeel

Transformers have emerged as the cornerstone of state-of-the-art natural language processing models, showcasing exceptional performance across a wide range of AI applications. However, the memory demands posed by the self-attention mechanism and the large feedforward network in Transformers limit their ability to handle long sequences, thereby creating challenges for tasks involving multiple long sequences or long-term dependencies. We present a distinct approach, Blockwise Parallel Transformer (BPT), that leverages blockwise computation of self-attention and feedforward network fusion to minimize memory costs. By processing longer input sequences while maintaining memory efficiency, BPT enables training sequences 32 times longer than vanilla Transformers and up to 4 times longer than previous memory-efficient methods. Extensive experiments on language modeling and reinforcement learning tasks demonstrate the effectiveness of BPT in reducing memory requirements and improving performance.
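The blockwise computation idea in this abstract can be sketched in a few lines. The sketch below blocks only the queries so that a block-sized slice of the attention scores is materialized at a time; the actual BPT also processes keys/values blockwise with a streaming softmax and fuses the feedforward network, which is omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # stabilize before exp
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def blockwise_attention(q, k, v, block=4):
    """Compute attention one query block at a time, so the full
    (n, n) score matrix is never materialized at once."""
    n, d = q.shape
    out = np.empty_like(v)
    for i in range(0, n, block):
        scores = q[i:i + block] @ k.T / np.sqrt(d)  # (block, n) slice
        out[i:i + block] = softmax(scores) @ v
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 16)) for _ in range(3))
full = softmax(q @ k.T / np.sqrt(16)) @ v          # reference computation
out_blocked = blockwise_attention(q, k, v, block=4)
```

Because softmax is applied row-wise, blocking the queries changes only the order of computation, not the result, which is what makes the memory saving exact rather than approximate.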

NeurIPS Conference 2023 Conference Paper

DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models

  • Ying Fan
  • Olivia Watkins
  • Yuqing Du
  • Hao Liu
  • Moonkyung Ryu
  • Craig Boutilier
  • Pieter Abbeel
  • Mohammad Ghavamzadeh

Learning from human feedback has been shown to improve text-to-image models. These techniques first learn a reward function that captures what humans care about in the task and then improve the models based on the learned reward function. Even though relatively simple approaches (e.g., rejection sampling based on reward scores) have been investigated, fine-tuning text-to-image models with the reward function remains challenging. In this work, we propose using online reinforcement learning (RL) to fine-tune text-to-image models. We focus on diffusion models, defining the fine-tuning task as an RL problem, and updating the pre-trained text-to-image diffusion models using policy gradient to maximize the feedback-trained reward. Our approach, coined DPOK, integrates policy optimization with KL regularization. We conduct an analysis of KL regularization for both RL fine-tuning and supervised fine-tuning. In our experiments, we show that DPOK is generally superior to supervised fine-tuning with respect to both image-text alignment and image quality. Our code is available at https://github.com/google-research/google-research/tree/master/dpok.

JBHI Journal 2023 Journal Article

Efficient Large-Scale Virtual Screening Based on Heterogeneous Many-Core Supercomputing System

  • Hao Liu
  • Cunji Wang
  • Peng Liu
  • Chengchao Liu
  • Zhuoya Wang
  • Zhiqiang Wei

With the rapid growth of virtual drug databases, the need for efficient molecular docking tools for large-scale screening is also growing. We have developed Vina@QNLM 2.0, a novel molecular docking system that leverages the logical processing units and computational processing arrays of heterogeneous multicore architecture processors. Compared to Vina@QNLM, the new version optimizes docking speed without sacrificing accuracy, greatly improving the scoring capability for large molecules (molecular weight > 500). Simultaneously, the new system provides enhanced support for applications such as reverse target finding through an improved parallel strategy. Vina@QNLM 2.0 achieves a speedup 20 times higher than that obtained using only logical processing units during a single docking process. Additionally, we successfully scaled the reverse target finding task to 122,401 kernel groups with a robust scalability of 80.01%. In practice, we completed reverse target-seeking for nine glycan molecules against 10,094 proteins within 1 hour.

NeurIPS Conference 2023 Conference Paper

Extending the Design Space of Graph Neural Networks by Rethinking Folklore Weisfeiler-Lehman

  • Jiarui Feng
  • Lecheng Kong
  • Hao Liu
  • Dacheng Tao
  • Fuhai Li
  • Muhan Zhang
  • Yixin Chen

Message passing neural networks (MPNNs) have emerged as the most popular framework of graph neural networks (GNNs) in recent years. However, their expressive power is limited by the 1-dimensional Weisfeiler-Lehman (1-WL) test. Some works are inspired by $k$-WL/FWL (Folklore WL) and design the corresponding neural versions. Despite the high expressive power, there are serious limitations in this line of research. In particular, (1) $k$-WL/FWL requires at least $O(n^k)$ space complexity, which is impractical for large graphs even when $k=3$; (2) The design space of $k$-WL/FWL is rigid, with the only adjustable hyper-parameter being $k$. To tackle the first limitation, we propose an extension, $(k, t)$-FWL. We theoretically prove that even if we fix the space complexity to $O(n^2)$ (for any $k \geq 2$) in $(k, t)$-FWL, we can construct an expressiveness hierarchy up to solving the graph isomorphism problem. To tackle the second problem, we propose $k$-FWL+, which considers any equivariant set as neighbors instead of all nodes, thereby greatly expanding the design space of $k$-FWL. Combining these two modifications results in a flexible and powerful framework, $(k, t)$-FWL+. We demonstrate that $(k, t)$-FWL+ can implement most existing models with matching expressiveness. We then introduce an instance of $(k, t)$-FWL+ called Neighborhood$^2$-FWL (N$^2$-FWL), which is practically and theoretically sound. We prove that N$^2$-FWL is no less powerful than 3-WL and can encode many substructures while only requiring $O(n^2)$ space. Finally, we design its neural version, named N$^2$-GNN, and evaluate its performance on various tasks. N$^2$-GNN achieves record-breaking results on ZINC-Subset (0.059), outperforming previous SOTA results by 10.6%. Moreover, N$^2$-GNN achieves new SOTA results on the BREC dataset (71.8%) among all existing high-expressive GNN methods.

NeurIPS Conference 2023 Conference Paper

Guide Your Agent with Adaptive Multimodal Rewards

  • Changyeon Kim
  • Younggyo Seo
  • Hao Liu
  • Lisa Lee
  • Jinwoo Shin
  • Honglak Lee
  • Kimin Lee

Developing an agent capable of adapting to unseen environments remains a difficult challenge in imitation learning. This work presents Adaptive Return-conditioned Policy (ARP), an efficient framework designed to enhance the agent's generalization ability using natural language task descriptions and pre-trained multimodal encoders. Our key idea is to calculate the similarity between visual observations and natural language instructions in a pre-trained multimodal embedding space (such as CLIP) and use it as a reward signal. We then train a return-conditioned policy using expert demonstrations labeled with multimodal rewards. Because the multimodal rewards provide adaptive signals at each timestep, ARP effectively mitigates goal misgeneralization. This results in superior generalization performance, even when faced with unseen text instructions, compared to existing text-conditioned policies. To improve the quality of the rewards, we also introduce a fine-tuning method for the pre-trained multimodal encoders, further enhancing performance. Video demonstrations and source code are available on the project website: https://sites.google.com/view/2023arp.

NeurIPS Conference 2023 Conference Paper

Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment

  • Hao Liu
  • Wilson Yan
  • Pieter Abbeel

Recent progress in scaling up large language models has shown impressive capabilities in performing few-shot learning across a wide range of natural language tasks. However, a key limitation is that these language models fundamentally lack grounding in visual perception, a crucial attribute needed to extend to real-world tasks such as visual question answering and robotics. While prior works have largely connected image to text through pretraining or fine-tuning, learning such alignments is generally costly due to a combination of curating massive datasets and large computational burdens. In order to resolve these limitations, we propose a simple yet effective approach called Language-Quantized AutoEncoder (LQAE), a modification of VQ-VAE that learns to align text-image data in an unsupervised manner by leveraging pretrained language model denoisers (e.g., BERT). Our main idea is to encode images as sequences of text tokens by directly quantizing image embeddings using a pretrained language codebook. We then feed a masked version of the quantized embeddings into a BERT to reconstruct the original input. By doing so, LQAE learns to represent similar images with similar clusters of text tokens, thereby aligning these two modalities without the use of aligned text-image pairs. We show LQAE learns text-aligned image tokens that enable few-shot multimodal learning with large language models, outperforming baseline methods in tasks such as image classification and VQA while requiring as few as 1-10 image-text pairs.
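The core quantization step this abstract describes, encoding image embeddings as tokens from a frozen language codebook, reduces to a nearest-neighbor lookup. A minimal sketch with random stand-in data (the codebook, dimensions, and patch construction below are illustrative, not from the paper):

```python
import numpy as np

def quantize_to_codebook(embeddings, codebook):
    """Map each embedding to its nearest codebook row (Euclidean
    distance), yielding token ids a language model could consume
    plus the quantized vectors themselves."""
    # (num_patches, 1, d) - (1, vocab, d) -> pairwise squared distances
    d2 = ((embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    ids = d2.argmin(axis=1)
    return ids, codebook[ids]

rng = np.random.default_rng(1)
codebook = rng.standard_normal((32, 8))   # stand-in for a frozen LM codebook
# "Patch embeddings" that sit very close to codebook entries 3, 17, and 5:
patches = codebook[[3, 17, 5]] + 0.01 * rng.standard_normal((3, 8))
ids, quantized = quantize_to_codebook(patches, codebook)
```

Because the codebook is frozen, the resulting token ids live in the language model's own vocabulary space, which is what lets a pretrained denoiser like BERT operate on them downstream.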

AAAI Conference 2023 Conference Paper

Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA

  • Yongxin Zhu
  • Zhen Liu
  • Yukang Liang
  • Xin Li
  • Hao Liu
  • Changcun Bao
  • Linli Xu

In this paper, we propose a novel multi-modal framework for Scene Text Visual Question Answering (STVQA), which requires models to read scene text in images for question answering. Apart from text or visual objects, which can exist independently, scene text naturally links the text and visual modalities together by conveying linguistic semantics while simultaneously being a visual object in an image. Different from conventional STVQA models, which take the linguistic semantics and visual semantics in scene text as two separate features, in this paper we propose a paradigm of "Locate Then Generate" (LTG), which explicitly unifies these two semantics with the spatial bounding box as a bridge connecting them. Specifically, LTG first locates the region in an image that may contain the answer words with an answer location module (ALM) consisting of a region proposal network and a language refinement network, which can be transformed into each other via a one-to-one mapping through the scene text bounding box. Next, given the answer words selected by ALM, LTG generates a readable answer sequence with an answer generation module (AGM) based on a pre-trained language model. As a benefit of the explicit alignment of the visual and linguistic semantics, even without any scene-text-based pre-training tasks, LTG can boost the absolute accuracy by +6.06% and +6.92% on the TextVQA dataset and the ST-VQA dataset respectively, compared with a non-pre-training baseline. We further demonstrate that LTG effectively unifies visual and text modalities through the spatial bounding box connection, which is underappreciated in previous methods.

NeurIPS Conference 2023 Conference Paper

MAG-GNN: Reinforcement Learning Boosted Graph Neural Network

  • Lecheng Kong
  • Jiarui Feng
  • Hao Liu
  • Dacheng Tao
  • Yixin Chen
  • Muhan Zhang

While Graph Neural Networks (GNNs) have recently become powerful tools in graph learning tasks, considerable effort has been spent on improving GNNs' structural encoding ability. A particular line of work proposed subgraph GNNs that use subgraph information to improve GNNs' expressivity and achieved great success. However, this effectiveness sacrifices the efficiency of GNNs by enumerating all possible subgraphs. In this paper, we analyze the necessity of complete subgraph enumeration and show that a model can achieve a comparable level of expressivity by considering a small subset of the subgraphs. We then formulate the identification of the optimal subset as a combinatorial optimization problem and propose Magnetic Graph Neural Network (MAG-GNN), a reinforcement learning (RL) boosted GNN, to solve the problem. Starting with a candidate subgraph set, MAG-GNN employs an RL agent to iteratively update the subgraphs to locate the most expressive set for prediction. This reduces the exponential complexity of subgraph enumeration to the constant complexity of a subgraph search algorithm while keeping good expressivity. We conduct extensive experiments on many datasets, showing that MAG-GNN achieves competitive performance with state-of-the-art methods and even outperforms many subgraph GNNs. We also demonstrate that MAG-GNN effectively reduces the running time of subgraph GNNs.

JBHI Journal 2023 Journal Article

Multi-Level Constrained Intra and Inter Subject Feature Representation for Facial Video Based BVP Signal Measurement

  • Bin Li
  • Wei Zhang
  • Hong Fu
  • Hao Liu
  • Feng Xu

Facial video-based blood volume pulse (BVP) signal measurement holds great potential for remote health monitoring, but existing methods have issues with convolutional kernel perceptual field constraints. This article proposes an end-to-end multi-level constrained spatiotemporal representation structure for facial video-based BVP signal measurement. First, an intra- and inter-subject feature representation is proposed to strengthen BVP-related feature generation at the high, semantic, and shallow levels, respectively. Second, a global-local association is presented to enhance BVP signal period pattern learning, and the global temporal features are introduced into the local spatial convolution of each frame by adaptive kernel weights. Finally, the multi-dimensional fused features are mapped to one-dimensional BVP signals by the task-oriented signal estimator. The experimental results on the publicly available MMSE-HR dataset demonstrate that the proposed structure outperforms state-of-the-art methods (e.g., AutoHR) in BVP signal measurement, with 20% and 40% reductions in mean absolute error and root mean squared error, respectively. The proposed structure would be a powerful tool for telemedical and non-contact heart health monitoring.

AAAI Conference 2023 Conference Paper

TaCo: Textual Attribute Recognition via Contrastive Learning

  • Chang Nie
  • Yiqing Hu
  • Yanqiu Qu
  • Hao Liu
  • Deqiang Jiang
  • Bo Ren

As textual attributes like font are core design elements of document format and page style, automatic attribute recognition favors comprehensive practical applications. Existing approaches already yield satisfactory performance in differentiating disparate attributes, but they still struggle to distinguish similar attributes with only subtle differences. Moreover, their performance drops severely in real-world scenarios where unexpected and obvious imaging distortions appear. In this paper, we aim to tackle these problems by proposing TaCo, a contrastive framework for textual attribute recognition tailored toward the most common document scenes. Specifically, TaCo leverages contrastive learning to dispel the ambiguity trap arising from vague and open-ended attributes. To realize this goal, we design the learning paradigm from three perspectives: 1) generating attribute views, 2) extracting subtle but crucial details, and 3) exploiting valued view pairs for learning, to fully unlock the pre-training potential. Extensive experiments show that TaCo surpasses its supervised counterparts and advances the state of the art remarkably on multiple attribute recognition tasks. Online services of TaCo will be made available.

AAAI Conference 2023 Conference Paper

The Devil Is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-training

  • Hao Liu
  • Xinghua Jiang
  • Xin Li
  • Antai Guo
  • Yiqing Hu
  • Deqiang Jiang
  • Bo Ren

The self-supervised Masked Image Modeling (MIM) schema, following the "mask-and-reconstruct" pipeline of recovering content from a masked image, has recently captured increasing interest in the community, owing to its excellent ability to learn visual representations from unlabeled data. Aiming at learning representations with highly abstracted semantics, one group of works attempts to reconstruct non-semantic pixels with a large-ratio masking strategy, which may suffer from an "over-smoothing" problem, while others directly infuse semantics into the targets in an off-line way that requires extra data. Different from them, we shift the perspective to the Fourier domain, which naturally has a global perspective, and present a new MIM method, termed Geminated Gestalt Autoencoder (Ge^2-AE), for visual pre-training. Specifically, we equip our model with geminated decoders in charge of reconstructing image contents from both pixel and frequency space, where each serves as not only the complement of but also a reciprocal constraint on the other. In this way, more robust representations can be learned in the pre-trained encoders, whose effectiveness is confirmed by the juxtaposed experimental results on downstream recognition tasks. We also conduct several quantitative and qualitative experiments to investigate the learning behavior of our method. To the best of our knowledge, this is the first MIM work to approach visual pre-training through the lens of the frequency domain.

EAAI Journal 2023 Journal Article

U-SMR: U-SwinT & multi-residual network for fabric defect detection

  • Hao Qu
  • Lan Di
  • Jiuzhen Liang
  • Hao Liu

Fabric defect detection methods based on deep networks are widely used in the textile industry, but they often suffer from poor model generalization and blurry edge detection. To resolve these challenges, we propose a novel network called “U-SMR Net”, which integrates global contextual features, defect detail features, and high-level semantic features through the combination of ResNet-50 and Swin Transformer modules. Our U-SMR network includes a lightweight multiscale feature extraction module, the dual-branch pyramid module (DBPM), which is nested to preserve high-resolution, shallow semantic information. We propose a recursive multi-level residual decoding block for multiscale fusion to refine, filter, and enhance input characteristics, generating prediction maps at multiple stages, and employ an improved binary cross-entropy loss function to supervise the saliency maps. Experimental results on four groups from the ZJU-Leaper dataset demonstrate the superior performance of our approach compared to other competitive methods, achieving an average F-measure score of 75.33%; testing results on both the ZJU-Leaper-Total dataset and the HKU-Fabric dataset further support U-SMR Net’s validity and generalization ability.

EAAI Journal 2023 Journal Article

Ultimate bearing capacity prediction method and sensitivity analysis of PBL

  • Yixin Chen
  • Yanke Huang
  • Hao Liu
  • Yongsheng Liu
  • Ting Zhang

The ultimate bearing capacity of the Perfobond leiste (PBL) connector is one of the key parameters for evaluating the bearing capacity and reliability of steel–concrete structures, so it is very important to predict the ultimate bearing capacity of PBL accurately. Based on an improved cuckoo search (CS) algorithm, two prediction models built on the back propagation neural network (BPNN) and the extreme learning machine (ELM) were proposed. The local search ability of the CS algorithm was improved by a triangular mutation operator and a distance-based distributed discovery probability, and the global search ability was improved by a multi-step selection strategy. The weights, thresholds, number of input parameters, and number of hidden-layer nodes of the BPNN and ELM were optimized by the triangular multi-step cuckoo search (TMCS). The comprehensive sensitivity analysis (CSA) method and the Morris sensitivity analysis (MSA) method were used to analyze the sensitivity of six key parameters of PBL: thickness of the perforated steel plate, diameter of the perforated holes, number of perforated holes, diameter of the through reinforcement, yield strength of the through reinforcement, and compressive strength of the concrete. Experimental data from push-out tests in the published literature were selected as samples, and the results show that the proposed TMCS-ELM and TMCS-BPNN algorithms can accurately predict the ultimate bearing capacity of PBL, with average errors of 4.17% and 2.16%, respectively. The sensitivity analysis results show that the compressive strength of concrete has the greatest influence on the bearing capacity of PBL, followed by the yield strength of the through reinforcement.

NeurIPS Conference 2023 Conference Paper

UUKG: Unified Urban Knowledge Graph Dataset for Urban Spatiotemporal Prediction

  • Yansong Ning
  • Hao Liu
  • Hao Wang
  • Zhenyu Zeng
  • Hui Xiong

Accurate Urban SpatioTemporal Prediction (USTP) is of great importance to the development and operation of the smart city. As an emerging building block, multi-sourced urban data are usually integrated as urban knowledge graphs (UrbanKGs) to provide critical knowledge for urban spatiotemporal prediction models. However, existing UrbanKGs are often tailored for specific downstream prediction tasks and are not publicly available, which limits the potential advancement. This paper presents UUKG, the unified urban knowledge graph dataset for knowledge-enhanced urban spatiotemporal predictions. Specifically, we first construct UrbanKGs consisting of millions of triplets for two metropolises by connecting heterogeneous urban entities such as administrative boroughs, POIs, and road segments. Moreover, we conduct qualitative and quantitative analysis on constructed UrbanKGs and uncover diverse high-order structural patterns, such as hierarchies and cycles, that can be leveraged to benefit downstream USTP tasks. To validate and facilitate the use of UrbanKGs, we implement and evaluate 15 KG embedding methods on the KG completion task and integrate the learned KG embeddings into 9 spatiotemporal models for five different USTP tasks. The extensive experimental results not only provide benchmarks of knowledge-enhanced USTP models under different task settings but also highlight the potential of state-of-the-art high-order structure-aware UrbanKG embedding methods. We hope the proposed UUKG fosters research on urban knowledge graphs and broad smart city applications. The dataset and source code are available at https://github.com/usail-hkust/UUKG/.

NeurIPS Conference 2023 Conference Paper

When Visual Prompt Tuning Meets Source-Free Domain Adaptive Semantic Segmentation

  • Xinhong Ma
  • Yiming Wang
  • Hao Liu
  • Tianyu Guo
  • Yunhe Wang

Source-free domain adaptive semantic segmentation aims to adapt a pre-trained source model to the unlabeled target domain without accessing the private source data. Previous methods usually fine-tune the entire network, which suffers from expensive parameter tuning. To avoid this problem, we propose to utilize visual prompt tuning for parameter-efficient adaptation. However, the existing visual prompt tuning methods are unsuitable for source-free domain adaptive semantic segmentation due to the following two reasons: (1) Commonly used visual prompts like input tokens or pixel-level perturbations cannot reliably learn informative knowledge beneficial for semantic segmentation. (2) Visual prompts require sufficient labeled data to fill the gap between the pre-trained model and downstream tasks. To alleviate these problems, we propose a universal unsupervised visual prompt tuning (Uni-UVPT) framework, which is applicable to various transformer-based backbones. Specifically, we first divide the source pre-trained backbone with frozen parameters into multiple stages, and propose a lightweight prompt adapter for progressively encoding informative knowledge into prompts and enhancing the generalization of target features between adjacent backbone stages. Cooperatively, a novel adaptive pseudo-label correction strategy with a multiscale consistency loss is designed to alleviate the negative effect of target samples with noisy pseudo labels and raise the capacity of visual prompts to spatial perturbations. Extensive experiments demonstrate that Uni-UVPT achieves state-of-the-art performance on GTA5 → Cityscapes and SYNTHIA → Cityscapes tasks and can serve as a universal and parameter-efficient framework for large-model unsupervised knowledge transfer. Code will be available at https://gitee.com/mindspore/models/tree/master/research/cv/uni-uvpt and https://github.com/huawei-noah/noah-research/tree/master/uni-uvpt.

ICLR Conference 2022 Conference Paper

Continual Learning with Recursive Gradient Optimization

  • Hao Liu
  • Huaping Liu

Learning multiple tasks sequentially without forgetting previous knowledge, called Continual Learning (CL), remains a long-standing challenge for neural networks. Most existing methods rely on additional network capacity or data replay. In contrast, we introduce a novel approach which we refer to as Recursive Gradient Optimization (RGO). RGO is composed of an iteratively updated optimizer that modifies the gradient to minimize forgetting without data replay, and a virtual Feature Encoding Layer (FEL) that represents different long-term structures with only task descriptors. Experiments demonstrate that RGO has significantly better performance on popular continual classification benchmarks when compared to the baselines and achieves new state-of-the-art performance on 20-split-CIFAR100 (82.22%) and 20-split-miniImageNet (72.63%). With higher average accuracy than Single-Task Learning (STL), this method is flexible and reliable enough to provide continual learning capabilities for learning models that rely on gradient descent.

IJCAI Conference 2022 Conference Paper

Feature and Instance Joint Selection: A Reinforcement Learning Perspective

  • Wei Fan
  • Kunpeng Liu
  • Hao Liu
  • Hengshu Zhu
  • Hui Xiong
  • Yanjie Fu

Feature selection and instance selection are two important data processing techniques. However, such selections have mostly been studied separately, while existing work on joint selection conducts feature/instance selection coarsely, neglecting the latent fine-grained interaction between the feature space and the instance space. To address this challenge, we propose a reinforcement learning solution that accomplishes the joint selection task and simultaneously captures the interaction between the selection of each feature and each instance. In particular, a sequential-scanning mechanism is designed as the agents' action strategy, and a collaborative-changing environment is used to enhance agent collaboration. In addition, an interactive paradigm introduces prior selection knowledge to help agents explore more efficiently. Finally, extensive experiments on real-world datasets have demonstrated improved performance.

IROS Conference 2022 Conference Paper

Hierarchical Learning and Control for In-Hand Micromanipulation Using Multiple Laser-Driven Micro-Tools

  • Yongyi Jia
  • Yu Chen
  • Hao Liu
  • Xiu Li 0001
  • Xiang Li 0009

Laser-driven micro-tools are formed by treating highly focused laser beams as actuators, controlling a tool's motion to contact and then manipulate a micro object; this allows them to manipulate opaque micro objects, or large cells, without causing photodamage. However, most existing laser-driven tools are limited to relatively simple tasks, such as moving and caging, and cannot carry out in-hand dexterous tasks. This is mainly because in-hand manipulation involves continuously coordinating multiple laser beams, micro-tools, and the object itself, which has high degrees of freedom (DoF) and poses a challenge for planner and controller design. This paper presents a new hierarchical formulation for the grasping and manipulation of micro objects using multiple laser-driven micro-tools. In hardware, multiple laser-driven tools are assembled to act as a robotic hand that carries out in-hand tasks (e.g., rotating); in software, a hierarchical scheme is developed to shrink the action space and coordinate the motion of multiple tools, subject to both the parametric uncertainty in the tool and the unknown dynamic model of the object. Such a formulation provides potential for achieving robotic in-hand manipulation at the micro scale. The performance of the proposed system is validated in simulation studies under different scenarios.

AAAI Conference 2022 Conference Paper

Learning to Walk with Dual Agents for Knowledge Graph Reasoning

  • Denghui Zhang
  • Zixuan Yuan
  • Hao Liu
  • Xiaodong Lin
  • Hui Xiong

Graph walking based on reinforcement learning (RL) has shown great success in navigating an agent to automatically complete various reasoning tasks over an incomplete knowledge graph (KG) by exploring multi-hop relational paths. However, existing multi-hop reasoning approaches only work well on short reasoning paths and tend to miss the target entity as the path length increases. This is undesirable for many reasoning tasks in real-world scenarios, where short paths connecting the source and target entities are not available in incomplete KGs, and thus reasoning performance drops drastically unless the agent is able to seek out more clues from longer paths. To address the above challenge, in this paper, we propose a dual-agent reinforcement learning framework, which trains two agents (GIANT and DWARF) to walk over a KG jointly and search for the answer collaboratively. Our approach tackles the reasoning challenge in long paths by assigning one of the agents (GIANT) to search on cluster-level paths quickly and provide stage-wise hints for the other agent (DWARF). Finally, experimental results on several KG reasoning benchmarks show that our approach can search answers more accurately and efficiently, and outperforms existing RL-based methods for long path queries by a large margin.

NeurIPS Conference 2022 Conference Paper

Masked Autoencoding for Scalable and Generalizable Decision Making

  • Fangchen Liu
  • Hao Liu
  • Aditya Grover
  • Pieter Abbeel

We are interested in learning scalable agents for reinforcement learning that can learn from large-scale, diverse sequential data similar to current large vision and language models. To this end, this paper presents masked decision prediction (MaskDP), a simple and scalable self-supervised pretraining method for reinforcement learning (RL) and behavioral cloning (BC). In our MaskDP approach, we apply a masked autoencoder (MAE) to state-action trajectories, wherein we randomly mask state and action tokens and reconstruct the missing data. By doing so, the model is required to infer the masked-out states and actions and extract information about the dynamics. We find that masking different proportions of the input sequence significantly helps with learning a better model that generalizes well to multiple downstream tasks. In our empirical study we find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching, and it can zero-shot infer skills from a few example transitions. In addition, MaskDP transfers well to offline RL and shows promising scaling behavior w.r.t. model size. It is amenable to data-efficient finetuning, achieving competitive results with prior methods based on autoregressive pretraining.

NeurIPS Conference 2022 Conference Paper

Palm up: Playing in the Latent Manifold for Unsupervised Pretraining

  • Hao Liu
  • Tom Zahavy
  • Volodymyr Mnih
  • Satinder Singh

Large and diverse datasets have been the cornerstones of many impressive advancements in artificial intelligence. Intelligent creatures, however, learn by interacting with the environment, which changes the input sensory signals and the state of the environment. In this work, we aim to bring the best of both worlds together and propose an algorithm that exhibits exploratory behavior while utilizing large diverse datasets. Our key idea is to leverage deep generative models that are pretrained on static datasets and introduce a dynamic model in the latent space. The transition dynamics simply mixes an action with a randomly sampled latent; it then applies an exponential moving average for temporal persistency, and the resulting latent is decoded to an image using the pretrained generator. We then employ an unsupervised reinforcement learning algorithm to explore in this environment and perform unsupervised representation learning on the collected data. We further leverage the temporal information of this data to pair data points as a natural supervision for representation learning. Our experiments suggest that the learned representations can be successfully transferred to downstream tasks in both vision and reinforcement learning domains.

AAAI Conference 2022 Conference Paper

Perceiving Stroke-Semantic Context: Hierarchical Contrastive Learning for Robust Scene Text Recognition

  • Hao Liu
  • Bin Wang
  • Zhimin Bao
  • Mobai Xue
  • Sheng Kang
  • Deqiang Jiang
  • Yinsong Liu
  • Bo Ren

We introduce Perceiving Stroke-Semantic Context (PerSec), a new approach to self-supervised representation learning tailored for the Scene Text Recognition (STR) task. Considering that scene text images carry both visual and semantic properties, we equip our PerSec with dual context perceivers which can contrast and learn latent representations from low-level stroke and high-level semantic contextual spaces simultaneously via hierarchical contrastive learning on unlabeled text image data. Experiments in un- and semi-supervised learning settings on STR benchmarks demonstrate that our proposed framework can yield a more robust representation for both CTC-based and attention-based decoders than other contrastive learning methods. To fully investigate the potential of our method, we also collect a dataset of 100 million unlabeled text images, named UTI-100M, covering 5 scenes and 4 languages. By leveraging hundred-million-level unlabeled data, our PerSec shows significant performance improvement when fine-tuning the learned representation on labeled data. Furthermore, we observe that the representation learned by PerSec generalizes well, especially in scenarios with little labeled data.

NeurIPS Conference 2022 Conference Paper

Practical Adversarial Attacks on Spatiotemporal Traffic Forecasting Models

  • Fan Liu
  • Hao Liu
  • Wenzhao Jiang

Machine learning based traffic forecasting models leverage sophisticated spatiotemporal auto-correlations to provide accurate predictions of city-wide traffic states. However, existing methods assume a reliable and unbiased forecasting environment, which is not always available in the wild. In this work, we investigate the vulnerability of spatiotemporal traffic forecasting models and propose a practical adversarial spatiotemporal attack framework. Specifically, instead of simultaneously attacking all geo-distributed data sources, an iterative gradient guided node saliency method is proposed to identify the time-dependent set of victim nodes. Furthermore, we devise a spatiotemporal gradient descent based scheme to generate real-valued adversarial traffic states under a perturbation constraint. Meanwhile, we theoretically demonstrate the worst performance bound of adversarial traffic forecasting attacks. Extensive experiments on two real-world datasets show that the proposed two-step framework achieves up to 67.8% performance degradation on various advanced spatiotemporal forecasting models. Remarkably, we also show that adversarial training with our proposed attacks can significantly improve the robustness of spatiotemporal traffic forecasting models.

NeurIPS Conference 2022 Conference Paper

Unsupervised Reinforcement Learning with Contrastive Intrinsic Control

  • Michael Laskin
  • Hao Liu
  • Xue Bin Peng
  • Denis Yarats
  • Aravind Rajeswaran
  • Pieter Abbeel

We introduce Contrastive Intrinsic Control (CIC), an unsupervised reinforcement learning (RL) algorithm that maximizes the mutual information between state-transitions and latent skill vectors. CIC utilizes contrastive learning between state-transitions and skills vectors to learn behaviour embeddings and maximizes the entropy of these embeddings as an intrinsic reward to encourage behavioural diversity. We evaluate our algorithm on the Unsupervised RL Benchmark (URLB) in the asymptotic state-based setting, which consists of a long reward-free pre-training phase followed by a short adaptation phase to downstream tasks with extrinsic rewards. We find that CIC improves over prior exploration algorithms in terms of adaptation efficiency to downstream tasks on state-based URLB.

YNIMG Journal 2021 Journal Article

A slower rate of sulcal widening in the brains of the nondemented oldest old

  • Hui Tang
  • Tao Liu
  • Hao Liu
  • Jiyang Jiang
  • Jian Cheng
  • Haijun Niu
  • Shuyu Li
  • Henry Brodaty

The relationships between aging and brain morphology have been reported in many previous structural brain studies. However, the trajectories of successful brain aging in the extremely old remain underexplored. In the limited research on the oldest old, covering individuals aged 85 years and older, there are very few studies that have focused on the cortical morphology, especially cortical sulcal features. In this paper, we measured sulcal width and depth as well as cortical thickness from T1-weighted scans of 290 nondemented community-dwelling participants aged between 76 and 103 years. We divided the participants into young old (between 76 and 84; mean = 80.35±2.44; male/female = 76/88) and oldest old (between 85 and 103; mean = 91.74±5.11; male/female = 60/66) groups. The results showed that most of the examined sulci significantly widened with increased age and that the rates of sulcal widening were lower in the oldest old. The spatial pattern of the cortical thinning partly corresponded with that of sulcal widening. Compared to females, males had significantly wider sulci, especially in the oldest old. This study builds a foundation for future investigations of neurocognitive disorders and neurodegenerative diseases in the oldest old, including centenarians.

NeurIPS Conference 2021 Conference Paper

Behavior From the Void: Unsupervised Active Pre-Training

  • Hao Liu
  • Pieter Abbeel

We introduce a new unsupervised pre-training method for reinforcement learning called APT, which stands for Active Pre-Training. APT learns behaviors and representations by actively searching for novel states in reward-free environments. The key novel idea is to explore the environment by maximizing a non-parametric entropy computed in an abstract representation space, which avoids challenging density modeling and consequently allows our approach to scale much better in environments that have high-dimensional observations (e.g., image observations). We empirically evaluate APT by exposing task-specific reward after a long unsupervised pre-training phase. In Atari games, APT achieves human-level performance on 12 games and obtains highly competitive performance compared to canonical fully supervised RL algorithms. On DMControl suite, APT beats all baselines in terms of asymptotic performance and data efficiency and dramatically improves performance on tasks that are extremely difficult to train from scratch.

IJCAI Conference 2021 Conference Paper

Bipartite Matching for Crowd Counting with Point Supervision

  • Hao Liu
  • Qiang Zhao
  • Yike Ma
  • Feng Dai

For the crowd counting task, it has been demonstrated that imposing Gaussians on point annotations hurts generalization performance. Several methods attempt to utilize point annotations as supervision directly, and they have made significant improvement compared with density-map based methods. However, these point based methods ignore the inevitable annotation noises and still suffer from low robustness to noisy annotations. To address the problem, we propose a bipartite matching based method for crowd counting with only point supervision (BM-Count). In BM-Count, we select a subset of most similar pixels from the predicted density map to match annotated pixels via bipartite matching. Then loss functions can be defined based on the matching pairs to alleviate the bad effect caused by those annotated dots with incorrect positions. Under the noisy annotations, our method reduces MAE and RMSE by 9% and 11.2% respectively. Moreover, we propose a novel ranking distribution learning framework to address the imbalanced distribution problem of head counts, which encodes the head counts as classification distribution in the ranking domain and refines the estimated count map in the continuous domain. Extensive experiments on four datasets show that our method achieves state-of-the-art performance and performs better crowd localization.

AAAI Conference 2021 Conference Paper

Community-Aware Multi-Task Transportation Demand Prediction

  • Hao Liu
  • Qiyu Wu
  • Fuzhen Zhuang
  • Xinjiang Lu
  • Dejing Dou
  • Hui Xiong

Transportation demand prediction is of great importance to urban governance and has become an essential function in many online applications. While many efforts have been made for regional transportation demand prediction, predicting the diversified transportation demand of different communities (e.g., the aged, the juveniles) remains an unexplored problem. However, this task is challenging because of the joint influence of spatio-temporal correlation among regions and implicit correlation among different communities. To this end, in this paper, we propose the Multi-task Spatio-Temporal Network with Mutually-supervised Adaptive task grouping (Ada-MSTNet) for community-aware transportation demand prediction. Specifically, we first construct a sequence of multi-view graphs from both spatial and community perspectives, and devise a spatio-temporal neural network to simultaneously capture the sophisticated correlations between regions and communities, respectively. Then, we propose an adaptively clustered multi-task learning module, where the prediction of each region-community specific transportation demand is regarded as a distinct task. Moreover, a mutually supervised adaptive task grouping strategy is introduced to softly cluster each task into different task groups, by leveraging the supervision signal from another graph view. In such a way, Ada-MSTNet is not only able to share common knowledge among highly related communities and regions, but also to shield against the noise from unrelated tasks in an end-to-end fashion. Finally, extensive experiments on two real-world datasets demonstrate the effectiveness of our approach compared with seven baselines.

NeurIPS Conference 2021 Conference Paper

Generalized Data Weighting via Class-Level Gradient Manipulation

  • Can Chen
  • Shuhao Zheng
  • Xi Chen
  • Erqun Dong
  • Xue (Steve) Liu
  • Hao Liu
  • Dejing Dou

Label noise and class imbalance are two major issues coexisting in real-world datasets. To alleviate the two issues, state-of-the-art methods reweight each instance by leveraging a small amount of clean and unbiased data. Yet, these methods overlook class-level information within each instance, which can be further utilized to improve performance. To this end, in this paper, we propose Generalized Data Weighting (GDW) to simultaneously mitigate label noise and class imbalance by manipulating gradients at the class level. To be specific, GDW unrolls the loss gradient to class-level gradients by the chain rule and reweights the flow of each gradient separately. In this way, GDW achieves remarkable performance improvement on both issues. Aside from the performance gain, GDW efficiently obtains class-level weights without introducing any extra computational cost compared with instance weighting methods. Specifically, GDW performs a gradient descent step on class-level weights, which only relies on intermediate gradients. Extensive experiments in various settings verify the effectiveness of GDW. For example, GDW outperforms state-of-the-art methods by 2.56% under the 60% uniform noise setting on CIFAR10. Our code is available at https://github.com/GGchen1997/GDW-NIPS2021.

AAAI Conference 2021 Conference Paper

Joint Air Quality and Weather Prediction Based on Multi-Adversarial Spatiotemporal Networks

  • Jindong Han
  • Hao Liu
  • Hengshu Zhu
  • Hui Xiong
  • Dejing Dou

Accurate and timely air quality and weather predictions are of great importance to urban governance and human livelihood. Though many efforts have been made for air quality or weather prediction, most of them simply employ one another as feature input, which ignores the inner-connection between the two predictive tasks. On the one hand, the accurate prediction of one task can help improve another task’s performance. On the other hand, geospatially distributed air quality and weather monitoring stations provide additional hints for city-wide spatiotemporal dependency modeling. Inspired by the above two insights, in this paper, we propose the Multi-adversarial spatiotemporal recurrent Graph Neural Networks (MasterGNN) for joint air quality and weather predictions. Specifically, we first propose a heterogeneous recurrent graph neural network to model the spatiotemporal autocorrelation among air quality and weather monitoring stations. Then, we develop a multi-adversarial graph learning framework to defend against observation noise propagation introduced by spatiotemporal modeling. Moreover, we present an adaptive training strategy by formulating multi-adversarial learning as a multi-task learning problem. Finally, extensive experiments on two real-world datasets show that MasterGNN achieves the best performance compared with seven baselines on both air quality and weather prediction tasks.

AAAI Conference 2021 Conference Paper

Out-of-Town Recommendation with Travel Intention Modeling

  • Haoran Xin
  • Xinjiang Lu
  • Tong Xu
  • Hao Liu
  • Jingjing Gu
  • Dejing Dou
  • Hui Xiong

Out-of-town recommendation is designed for users who leave their home-town areas and visit areas they have never been to before. It is challenging to recommend Points-of-Interest (POIs) for out-of-town users since out-of-town check-in behavior is determined not only by the user’s home-town preference but also by the user’s travel intention. Besides, users’ travel intentions are complex and dynamic, which makes it very difficult to understand such intentions precisely. In this paper, we propose a TRAvel-INtention-aware Out-of-town Recommendation framework, named TRAINOR. The proposed TRAINOR framework distinguishes itself from existing out-of-town recommenders in three aspects. First, graph neural networks are explored to represent users’ home-town check-in preference and the geographical constraints in out-of-town check-in behaviors. Second, a user-specific travel intention is formulated as an aggregation combining home-town preference and generic travel intention, where the generic travel intention is regarded as a mixture of inherent intentions that can be learned by a Neural Topic Model (NTM). Third, a non-linear mapping function and a matrix factorization method are employed to transfer users’ home-town preference and estimate out-of-town POIs’ representations, respectively. Extensive experiments on real-world datasets validate the effectiveness of the TRAINOR framework. Moreover, the learned travel intention can deliver meaningful explanations for understanding a user’s travel purposes.

AAAI Conference 2021 Conference Paper

Self-Supervised Prototype Representation Learning for Event-Based Corporate Profiling

  • Zixuan Yuan
  • Hao Liu
  • Renjun Hu
  • Denghui Zhang
  • Hui Xiong

Event-based corporate profiling aims to assess the evolving operational status of the corresponding corporate from its event sequence. Existing studies on corporate profiling have partially addressed the problem via (i) case-by-case empirical analysis by leveraging traditional financial methods, or (ii) automatic profile inference by reformulating the problem into a supervised learning task. However, both approaches heavily rely on domain knowledge and are labor-intensive. More importantly, the task-specific nature of both approaches prevents the obtained corporate profiles from being applied to diversified downstream applications. To this end, in this paper, we propose a Self-Supervised Prototype Representation Learning (SePaL) framework for dynamic corporate profiling. By exploiting the topological information of an event graph and exploring self-supervised learning techniques, SePaL can obtain unified corporate representations that are robust to event noises and can be easily fine-tuned to benefit various downstream applications with only a few annotated samples. Specifically, we first infer the initial cluster distribution of noise-resistant event prototypes based on latent representations of events. Then, we construct four permutation-invariant self-supervision signals to guide the representation learning of the event prototypes. In terms of applications, we exploit the learned time-evolving corporate representations for both stock price spike prediction and corporate default risk evaluation. Experimental results on two real-world corporate event datasets demonstrate the effectiveness of SePaL for these two applications.

NeurIPS Conference 2021 Conference Paper

URLB: Unsupervised Reinforcement Learning Benchmark

  • Misha Laskin
  • Denis Yarats
  • Hao Liu
  • Kimin Lee
  • Albert Zhan
  • Kevin Lu
  • Catherine Cang
  • Lerrel Pinto

Deep Reinforcement Learning (RL) has emerged as a powerful paradigm to solve a range of complex yet specific control tasks. Training generalist agents that can quickly adapt to new tasks remains an outstanding challenge. Recent advances in unsupervised RL have shown that pre-training RL agents with self-supervised intrinsic rewards can result in efficient adaptation. However, these algorithms have been hard to compare and develop due to the lack of a unified benchmark. To this end, we introduce the Unsupervised Reinforcement Learning Benchmark (URLB). URLB consists of two phases: reward-free pre-training and downstream task adaptation with extrinsic rewards. Building on the DeepMind Control Suite, we provide twelve continuous control tasks from three domains for evaluation and open-source code for eight leading unsupervised RL methods. We find that the implemented baselines make progress but are not able to solve URLB and propose directions for future research.

EAAI Journal 2020 Journal Article

A modified particle swarm optimization for multimodal multi-objective optimization

  • XuWei Zhang
  • Hao Liu
  • LiangPing Tu

As an effective evolutionary algorithm, particle swarm optimization (PSO) has been widely used to solve single- or multi-objective optimization problems. However, the performance of PSO in solving multi-objective problems is unsatisfactory, so a variety of PSO variants have been proposed to enhance the performance of PSO on multi-objective optimization problems. In this paper, a modified particle swarm optimization (AMPSO) is proposed to solve multimodal multi-objective problems. Firstly, a dynamic neighborhood-based learning strategy is introduced to replace the global learning strategy, which enhances the diversity of the population. Meanwhile, to further enhance the performance of PSO, the offering competition mechanism is utilized. Eleven multimodal multi-objective optimization functions are used to verify the feasibility and effectiveness of the proposed AMPSO. Experimental results and statistical analysis indicate that AMPSO has competitive performance compared with 5 state-of-the-art multimodal multi-objective algorithms.

AAAI Conference 2020 Conference Paper

Accurate Structured-Text Spotting for Arithmetical Exercise Correction

  • Yiqing Hu
  • Yan Zheng
  • Hao Liu
  • Dequang Jiang
  • Yinsong Liu
  • Bo Ren

Correcting arithmetical exercises is a labor-intensive and time-consuming task for primary school teachers. To reduce their burden, we propose Arithmetical Exercise Checker (AEC), the first system that automatically evaluates all arithmetical expressions (AEs) on exercise images. The major challenge is that an AE is formed by printed and handwritten texts with particular arithmetical patterns (e.g., multi-line, fraction). Despite being part of an AE, handwritten texts usually lead to zigzag boundaries and tangled rows. What’s worse, an AE may be arithmetically incorrect, which makes the contextual information less valuable for recognition. To tackle these problems, we introduce integrated detection, recognition, and evaluation branches by leveraging AE’s intrinsic features, namely 1) indistinct boundaries, 2) locally relevant patterns, and 3) globally irrelevant symbols. Experimental results demonstrate that AEC yields a 93.72% correction accuracy on 40 kinds of mainstream primary arithmetical exercises. So far, the online service of AEC processes 75,000 arbitrary exercises on average per day, and has already reduced the burden of over 1,000,000 users. AEC shows the benefits of implementing a vision-based system as a way to aid teachers in reducing repetitive tasks.

AAAI Conference 2020 Conference Paper

Semi-Supervised Hierarchical Recurrent Graph Neural Network for City-Wide Parking Availability Prediction

  • Weijia Zhang
  • Hao Liu
  • Yanchi Liu
  • Jingbo Zhou
  • Hui Xiong

The ability to predict city-wide parking availability is crucial for the successful development of Parking Guidance and Information (PGI) systems. Indeed, the effective prediction of city-wide parking availability can improve parking efficiency, help urban planning, and ultimately alleviate city congestion. However, predicting city-wide parking availability is a non-trivial task because of three major challenges: 1) the non-Euclidean spatial autocorrelation among parking lots, 2) the dynamic temporal autocorrelation inside of and between parking lots, and 3) the scarcity of information about real-time parking availability obtained from real-time sensors (e.g., camera, ultrasonic sensor, and GPS). To this end, we propose the Semi-supervised Hierarchical Recurrent Graph Neural Network (SHARE) for predicting city-wide parking availability. Specifically, we first propose a hierarchical graph convolution structure to model non-Euclidean spatial autocorrelation among parking lots. Along this line, a contextual graph convolution block and a soft clustering graph convolution block are respectively proposed to capture local and global spatial dependencies between parking lots. Additionally, we adopt a recurrent neural network to incorporate dynamic temporal dependencies of parking lots. Moreover, we propose a parking availability approximation module to estimate missing real-time parking availabilities from both spatial and temporal domains. Finally, experiments on two real-world datasets demonstrate that the prediction performance of SHARE outperforms seven state-of-the-art baselines.

IJCAI Conference 2020 Conference Paper

Why We Go Where We Go: Profiling User Decisions on Choosing POIs

  • Renjun Hu
  • Xinjiang Lu
  • Chuanren Liu
  • Yanyan Li
  • Hao Liu
  • Jingjing Gu
  • Shuai Ma
  • Hui Xiong

While Point-of-Interest (POI) recommendation has been a popular topic of study for some time, little progress has been made toward understanding why and how people make their decisions for the selection of POIs. To this end, in this paper, we propose a user decision profiling framework, named PROUD, which can identify the key factors in people's decisions on choosing POIs. Specifically, we treat each user decision as a set of factors and provide a method for learning factor embeddings. A unique perspective of our approach is to identify key factors, while preserving decision structures seamlessly, via a novel scalar projection maximization objective. Exactly solving the objective is non-trivial due to a sparsity constraint. To address this, our PROUD adopts a self-projection attention and an L2-regularized sparse activation to directly estimate the likelihood of each factor being a key factor. Finally, extensive experiments on real-world data validate the advantage of PROUD in preserving user decision structures. Also, our case study indicates that the identified key decision factors can help us to provide more interpretable recommendations and analyses.

AAAI Conference 2019 Conference Paper

Joint Representation Learning for Multi-Modal Transportation Recommendation

  • Hao Liu
  • Ting Li
  • Renjun Hu
  • Yanjie Fu
  • Jingjing Gu
  • Hui Xiong

Multi-modal transportation recommendation has a goal of recommending a travel plan which considers various transportation modes, such as walking, cycling, automobile, and public transit, and how to connect among these modes. The successful development of multi-modal transportation recommendation systems can help to satisfy the diversified needs of travelers and improve the efficiency of transport networks. However, existing transport recommender systems mainly focus on unimodal transport planning. To this end, in this paper, we propose a joint representation learning framework for multi-modal transportation recommendation based on a carefully-constructed multi-modal transportation graph. Specifically, we first extract a multi-modal transportation graph from large-scale map query data to describe the concurrency of users, Origin-Destination (OD) pairs, and transport modes. Then, we provide effective solutions for the optimization problem and develop an anchor embedding for transport modes to initialize the embeddings of transport modes. Moreover, we infer user relevance and OD pair relevance, and incorporate them to regularize the representation learning. Finally, we exploit the learned representations for online multi-modal transportation recommendations. Indeed, our method has been deployed into one of the largest navigation Apps to serve hundreds of millions of users, and extensive experimental results with real-world map query data demonstrate the enhanced performance of the proposed method for multi-modal transportation recommendations.

ICML Conference 2019 Conference Paper

Taming MAML: Efficient unbiased meta-reinforcement learning

  • Hao Liu
  • Richard Socher
  • Caiming Xiong

While meta reinforcement learning (Meta-RL) methods have achieved remarkable success, obtaining correct and low-variance estimates for policy gradients remains a significant challenge. In particular, the need to estimate a large Hessian, poor sample efficiency, and unstable training continue to make Meta-RL difficult. We propose a surrogate objective function, named Taming MAML (TMAML), that adds control variates into gradient estimation via automatic differentiation. TMAML improves the quality of gradient estimation by reducing variance without introducing bias. We further propose a version of our method that extends the meta-learning framework to learning the control variates themselves, enabling efficient and scalable learning from a distribution of MDPs. We empirically compare our approach with MAML and other variance-bias trade-off methods including DICE, LVC, and action-dependent control variates. Our approach is easy to implement and outperforms existing methods in terms of the variance and accuracy of gradient estimation, ultimately yielding higher performance across a variety of challenging Meta-RL environments.

AAAI Conference 2018 Conference Paper

Deep Learning for Case-Based Reasoning Through Prototypes: A Neural Network That Explains Its Predictions

  • Oscar Li
  • Hao Liu
  • Chaofan Chen
  • Cynthia Rudin

Deep neural networks are widely used for classification. These deep models often suffer from a lack of interpretability – they are particularly difficult to understand because of their non-linear nature. As a result, neural networks are often treated as “black box” models, and in the past, have been trained purely to optimize the accuracy of predictions. In this work, we create a novel network architecture for deep learning that naturally explains its own reasoning for each prediction. This architecture contains an autoencoder and a special prototype layer, where each unit of that layer stores a weight vector that resembles an encoded training input. The encoder of the autoencoder allows us to do comparisons within the latent space, while the decoder allows us to visualize the learned prototypes. The training objective has four terms: an accuracy term, a term that encourages every prototype to be similar to at least one encoded input, a term that encourages every encoded input to be close to at least one prototype, and a term that encourages faithful reconstruction by the autoencoder. The distances computed in the prototype layer are used as part of the classification process. Since the prototypes are learned during training, the learned network naturally comes with explanations for each prediction, and the explanations are loyal to what the network actually computes.

IJCAI Conference 2018 Conference Paper

Structured Inference for Recurrent Hidden Semi-markov Model

  • Hao Liu
  • Lirong He
  • Haoli Bai
  • Bo Dai
  • Kun Bai
  • Zenglin Xu

Segmentation and labeling for high-dimensional time series is an important yet challenging task in a number of applications, such as behavior understanding and medical diagnosis. Recent advances in modeling the nonlinear dynamics of such time series data suggest incorporating recurrent neural networks into Hidden Markov Models. However, this makes the inference procedure much more complicated, often leading to intractable inference, especially for the discrete variables of segmentation and labeling. To achieve both flexibility and tractability in modeling the nonlinear dynamics of discrete variables, we present a structured and stochastic sequential neural network (SSNN), which is composed of a generative network and an inference network. In detail, the generative network aims to not only capture the long-term dependencies but also model the uncertainty of the segmentation labels via semi-Markov models. More importantly, for efficient and accurate inference, the proposed bi-directional inference network reparameterizes the categorical segmentation with the Gumbel-Softmax approximation and resorts to Stochastic Gradient Variational Bayes. We evaluate the proposed model in a number of tasks, including speech modeling, automatic segmentation and labeling in behavior understanding, and sequential multi-object recognition. Experimental results demonstrate that our proposed model achieves significant improvement over the state-of-the-art methods.

YNIMG Journal 2018 Journal Article

UBO Detector – A cluster-based, fully automated pipeline for extracting white matter hyperintensities

  • Jiyang Jiang
  • Tao Liu
  • Wanlin Zhu
  • Rebecca Koncz
  • Hao Liu
  • Teresa Lee
  • Perminder S. Sachdev
  • Wei Wen

We present ‘UBO Detector’, a cluster-based, fully automated pipeline for extracting and calculating variables for regions of white matter hyperintensities (WMH) (available for download at https://cheba.unsw.edu.au/group/neuroimaging-pipeline). It takes T1-weighted and fluid attenuated inversion recovery (FLAIR) scans as input, and SPM12 and FSL functions are utilised for pre-processing. The candidate clusters are then generated by FMRIB's Automated Segmentation Tool (FAST). A supervised machine learning algorithm, k-nearest neighbor (k-NN), is applied to determine whether the candidate clusters are WMH or non-WMH. UBO Detector generates both image and text (volumes and the number of WMH clusters) outputs for whole brain, periventricular, deep, and lobar WMH, as well as WMH in arterial territories. The computation time for each brain is approximately 15 min. We validated the performance of UBO Detector by showing a) high segmentation (similarity index (SI) = 0.848) and volumetric (intraclass correlation coefficient (ICC) = 0.985) agreement between the UBO Detector-derived and manually traced WMH; b) highly correlated (r2 > 0.9) and steadily increasing WMH volumes over time; and c) significant associations of periventricular (t = 22.591, p < 0.001) and deep (t = 14.523, p < 0.001) WMH volumes generated by UBO Detector with Fazekas rating scores. With parallel computing enabled in UBO Detector, the processing can take advantage of multi-core CPUs that are commonly available on workstations. In conclusion, UBO Detector is a reliable, efficient and fully automated WMH segmentation pipeline.

NeurIPS Conference 2018 Conference Paper

Variational Inference with Tail-adaptive f-Divergence

  • Dilin Wang
  • Hao Liu
  • Qiang Liu

Variational inference with α-divergences has been widely used in modern probabilistic machine learning. Compared to the Kullback-Leibler (KL) divergence, a major advantage of using α-divergences (with positive α values) is their mass-covering property. However, estimating and optimizing α-divergences requires the use of importance sampling, which could have extremely large or infinite variance due to the heavy tails of the importance weights. In this paper, we propose a new class of tail-adaptive f-divergences that adaptively change the convex function f with the tail of the importance weights, in a way that theoretically guarantees finite moments, while simultaneously achieving mass-covering properties. We test our methods on Bayesian neural networks, as well as deep reinforcement learning in which our method is applied to improve a recent soft actor-critic (SAC) algorithm (Haarnoja et al., 2018). Our results show that our approach yields significant advantages compared with existing methods based on classical KL and α-divergences.

NeurIPS Conference 2017 Conference Paper

ALICE: Towards Understanding Adversarial Learning for Joint Distribution Matching

  • Chunyuan Li
  • Hao Liu
  • Changyou Chen
  • Yuchen Pu
  • Liqun Chen
  • Ricardo Henao
  • Lawrence Carin

We investigate the non-identifiability issues associated with bidirectional adversarial training for joint distribution matching. Within a framework of conditional entropy, we propose both adversarial and non-adversarial approaches to learn desirable matched joint distributions for unsupervised and supervised tasks. We unify a broad family of adversarial models as joint distribution matching problems. Our approach stabilizes learning of unsupervised bidirectional adversarial learning methods. Further, we introduce an extension for semi-supervised learning tasks. Theoretical results are validated in synthetic data and real-world applications.

AIIM Journal 2017 Journal Article

From SNOMED CT to Uberon: Transferability of evaluation methodology between similarly structured ontologies

  • Gai Elhanan
  • Christopher Ochs
  • Jose L.V. Mejino
  • Hao Liu
  • Christopher J. Mungall
  • Yehoshua Perl

Objective To examine whether disjoint partial-area taxonomy, a semantically-based evaluation methodology that has been successfully tested in SNOMED CT, will perform with similar effectiveness on Uberon, an anatomical ontology that belongs to a structurally similar family of ontologies as SNOMED CT. Method A disjoint partial-area taxonomy was generated for Uberon. One hundred randomly selected test concepts that overlap between partial-areas were matched to a same-size control sample of non-overlapping concepts. The samples were blindly inspected for non-critical issues and presumptive errors first by a general domain expert whose results were then confirmed or rejected by a highly experienced anatomical ontology domain expert. Reported issues were subsequently reviewed by Uberon’s curators. Results Overlapping concepts in Uberon’s disjoint partial-area taxonomy exhibited a significantly higher rate of all issues. Clear-cut presumptive errors trended similarly but did not reach statistical significance. A sub-analysis of overlapping concepts with three or more relationship types indicated a much higher rate of issues. Conclusions Overlapping concepts from Uberon’s disjoint abstraction network are quite likely (up to 28.9%) to exhibit issues. The results suggest that the methodology can transfer well between same-family ontologies. Although Uberon exhibited relatively few overlapping concepts, the methodology can be combined with other semantic indicators to expand the process to other concepts within the ontology that will generate high yields of discovered issues.

IJCAI Conference 2017 Conference Paper

Learning User's Intrinsic and Extrinsic Interests for Point-of-Interest Recommendation: A Unified Approach

  • Huayu Li
  • Yong Ge
  • Defu Lian
  • Hao Liu

Point-of-Interest (POI) recommendation has been an important service on location-based social networks. However, it is very challenging to generate accurate recommendations due to the complex nature of user's interest in POI and the data sparseness. In this paper, we propose a novel unified approach that could effectively learn fine-grained and interpretable user's interest, and adaptively model the missing data. Specifically, a user's general interest in POI is modeled as a mixture of her intrinsic and extrinsic interests, upon which we formulate the ranking constraints in our unified recommendation approach. Furthermore, a self-adaptive location-oriented method is proposed to capture the inherent property of missing data, which is formulated as squared error based loss in our unified optimization objective. Extensive experiments on real-world datasets demonstrate the effectiveness and advantage of our approach.

NeurIPS Conference 2017 Conference Paper

Triangle Generative Adversarial Networks

  • Zhe Gan
  • Liqun Chen
  • Weiyao Wang
  • Yuchen Pu
  • Yizhe Zhang
  • Hao Liu
  • Chunyuan Li
  • Lawrence Carin

A Triangle Generative Adversarial Network ($\Delta$-GAN) is developed for semi-supervised cross-domain joint distribution matching, where the training data consists of samples from each domain, and supervision of domain correspondence is provided by only a few paired samples. $\Delta$-GAN consists of four neural networks, two generators and two discriminators. The generators are designed to learn the two-way conditional distributions between the two domains, while the discriminators implicitly define a ternary discriminative function, which is trained to distinguish real data pairs and two kinds of fake data pairs. The generators and discriminators are trained together using adversarial learning. Under mild assumptions, in theory the joint distributions characterized by the two generators concentrate to the data distribution. In experiments, three different kinds of domain pairs are considered, image-label, image-image and image-attribute pairs. Experiments on semi-supervised image classification, image-to-image translation and attribute-based image generation demonstrate the superiority of the proposed approach.

IROS Conference 2014 Conference Paper

Robust attitude controller for uncertain hexarotor micro aerial vehicles (MAVs)

  • Dafizal Derawi
  • Nurul Dayana Salim
  • Hairi Zamzuri
  • Hao Liu
  • Mohd Azizi Abdul Rahman
  • Saiful Amri Mazlan

This paper proposes a practical robust attitude controller for uncertain hexarotor micro aerial vehicles (MAVs). The proposed robust controller consists of a nominal linear time-invariant controller and a robust compensator for the pitch, roll, and yaw subsystems. The nominal controller is an inner-outer loop structure of the PI+PID (proportional-integral plus proportional-integral-derivative) control method to achieve the desired tracking of the nominal system, whilst the robust compensator is added to restrain the influence of the uncertainties (equivalent disturbances), which contain parametric uncertainties, coupling, nonlinear dynamics, and external disturbances. The real-time experimental results on the hexarotor demonstrate the effectiveness of the proposed controller in real flight conditions, and the attitude tracking errors are proven to be ultimately bounded within specified boundaries.