Arrow Research search

Author name cluster

Wei Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

185 papers
2 author rows

Possible papers

185

AAAI Conference 2026 Conference Paper

AdaDepth: Exploiting Inherent Scene Information for Self-Supervised Depth Estimation in Dynamic Scenes

  • Xuanang Gao
  • Xiongbin Wu
  • Zhiwei Ning
  • Runze Yang
  • Zhonglong Zheng
  • Jie Yang
  • Wei Liu

Self-supervised monocular depth estimation methods severely compromise accuracy on dynamic objects due to their static-scene assumption. Existing approaches for dynamic scenes suffer from two critical shortcomings: 1) reliance on supervised segmentation models (requiring costly annotations) or computationally intensive multi-branch models to isolate moving objects, and 2) simple integration of 2D/3D motion flow without reliable supervision for dynamic objects. We propose AdaDepth, a two-stage framework that jointly performs unsupervised scene decomposition and dynamic-aware depth learning. In the initial structural stage, our geometry-motion joint scene decomposition (GMoDecomp) module ensures the robust generation of a depth prior and simultaneously partitions the scene into multiple regions through the fusion of geometric and motion cues. In the region-adaptive refinement stage, we exploit the depth prior and decomposed regions to introduce motion-aware and geometry-consistent constraints, effectively improving depth estimation in dynamic scenes. AdaDepth achieves accurate depth prediction in highly dynamic scenes without relying on external labels or specialized segmentation models. Extensive experiments on KITTI, Cityscapes, and Waymo Open demonstrate its superiority over state-of-the-art approaches.

AAAI Conference 2026 Conference Paper

Agent-SAMA: State-Aware Mobile Assistant

  • Linqiang Guo
  • Wei Liu
  • Yi Wen Heng
  • Tse-Hsun (Peter) Chen
  • Yang Wang

Mobile Graphical User Interface (GUI) agents aim to autonomously complete tasks within or across apps based on user instructions. While recent Multimodal Large Language Models (MLLMs) enable these agents to interpret UI screens and perform actions, existing agents remain fundamentally reactive. They reason over the current UI screen but lack a structured representation of the app navigation flow, limiting GUI agents’ ability to understand execution context, detect unexpected execution results, and recover from errors. We introduce Agent-SAMA, a state-aware multi-agent framework that models app execution as a Finite State Machine (FSM), treating UI screens as states and user actions as transitions. Agent-SAMA implements four specialized agents that collaboratively construct and use FSMs in real time to guide task planning, execution verification, and recovery. We evaluate Agent-SAMA on two types of benchmarks: cross-app (Mobile-Eval-E, SPA-Bench) and mostly single-app (AndroidWorld). On Mobile-Eval-E, Agent-SAMA achieves an 84.0% success rate and a 71.9% recovery rate. On SPA-Bench, it reaches an 80.0% success rate with a 66.7% recovery rate. Compared to prior methods, Agent-SAMA improves task success by up to 12% and recovery success by 13.8%. On AndroidWorld, Agent-SAMA achieves a 63.7% success rate, outperforming the baselines. Our results demonstrate that structured state modeling enhances robustness and can serve as a lightweight, model-agnostic memory layer for future GUI agents.
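
As a rough illustration of the FSM abstraction this abstract describes (screens as states, actions as transitions), a minimal sketch follows. It is not Agent-SAMA's implementation; all names (ScreenFSM, the screen and action labels) are hypothetical.

```python
# Minimal finite state machine over UI screens: screens are states,
# user actions are transitions. Purely illustrative; hypothetical names.
class ScreenFSM:
    def __init__(self, start):
        self.state = start
        self.transitions = {}  # (state, action) -> next state

    def add_transition(self, state, action, next_state):
        self.transitions[(state, action)] = next_state

    def step(self, action):
        key = (self.state, action)
        if key not in self.transitions:
            # An unexpected result: an agent could flag this and attempt recovery.
            raise ValueError(f"unexpected transition from {self.state!r} via {action!r}")
        self.state = self.transitions[key]
        return self.state

fsm = ScreenFSM("home")
fsm.add_transition("home", "tap_settings", "settings")
fsm.add_transition("settings", "tap_back", "home")
print(fsm.step("tap_settings"))  # settings
```

The `step` failure path is the point of the abstraction: a transition absent from the constructed FSM signals an unexpected execution result, which is where verification and recovery can hook in.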

JBHI Journal 2026 Journal Article

CQH-MPN: A Classical–Quantum Hybrid Prototype Network With Fuzzy Proximity-Based Classification for Early Glaucoma Diagnosis

  • Wei Liu
  • Haijian Shao
  • Xing Deng
  • Yingtao Jiang

Glaucoma is the second leading cause of blindness worldwide and a leading cause of irreversible vision loss, making early and accurate diagnosis essential. Although deep learning has revolutionized medical image analysis, its dependence on large-scale annotated datasets poses a significant barrier, especially in clinical scenarios with limited labeled data. To address this challenge, we propose a Classical–Quantum Hybrid Mean Prototype Network (CQH-MPN) tailored for few-shot glaucoma diagnosis. CQH-MPN integrates a quantum feature encoder, which exploits quantum superposition and entanglement for enhanced global representation learning, with a classical convolutional encoder to capture local structural features. These dual encodings are fused and projected into a shared embedding space, where mean prototype representations are computed for each class. We introduce a fuzzy proximity-based metric that extends traditional prototype distance measures by incorporating intra-class variability and inter-class ambiguity, thereby improving classification sensitivity under uncertainty. Our model is evaluated on two public retinal fundus image datasets—ACRIMA and ORIGA—under 1-shot, 3-shot, and 5-shot settings. Results show that CQH-MPN consistently outperforms other models, achieving an accuracy of 94.50% ± 1.04% on the ACRIMA dataset under the 1-shot setting. Moreover, the proposed method demonstrates significant performance improvements across different shot configurations on both datasets. By effectively bridging the representational power of quantum computing with classical deep learning, CQH-MPN demonstrates robust generalization in data-scarce environments. This work lays the foundation for quantum-augmented few-shot learning in medical imaging and offers a viable solution for real-world, low-resource diagnostic applications.
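
For context, the mean-prototype step the abstract builds on is the classical prototypical-network classification rule: average the support embeddings per class, then assign a query to the nearest prototype. The sketch below uses plain Euclidean distance as a stand-in; CQH-MPN's fused quantum-classical encoder and fuzzy proximity metric are not reproduced here, and all names and numbers are illustrative.

```python
# Illustrative mean-prototype classification in an embedding space.
import math

def mean_prototype(embeddings):
    """Per-class prototype: coordinate-wise mean of support embeddings."""
    dim = len(embeddings[0])
    return [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)]

def classify(query, prototypes):
    """Assign the query embedding to the class with the nearest prototype."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(prototypes, key=lambda c: dist(query, prototypes[c]))

# Toy 2-D embeddings standing in for encoded fundus images.
protos = {
    "glaucoma": mean_prototype([[1.0, 0.0], [0.8, 0.2]]),
    "healthy":  mean_prototype([[0.0, 1.0], [0.2, 0.8]]),
}
print(classify([0.9, 0.1], protos))  # glaucoma
```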

JBHI Journal 2026 Journal Article

Detecting Driver Sleepiness From Physiological Indicators Using a CNN-LSTM Self-Attention Model

  • Yingying Jiao
  • Yifan Zhang
  • Wei Liu
  • Zhuqing Jiao

Sleepiness at the wheel is an important factor contributing to road traffic accidents. Based on the characteristic changes in Electroencephalography (EEG) and Electrooculography (EOG) signals, a dozing state is refined into three sub-states: the onset, duration, and end state. Each state is characterized by different physiological indicators such as the EEG alpha waves, the rising edge, and falling edge waveforms in EOG signals. To enable real-time detection of these physiological indicators, we propose a framework integrating three Convolutional Neural Network–Long Short-Term Memory–Self-Attention (CLSA) models, which combine CNN-based local feature extraction with self-attention mechanism for global context capture. The framework is evaluated for performance on continuous test data from 12 subjects. Our results demonstrate that by detecting alpha waves and the rising edge waveform, the alpha wave epoch (AWE) at the onset of the dozing state can be identified with high accuracy and precision. Thus, the onset sub-state is calculated as the period from the start time of the rising edge waveform to the time when the AWE is valid. Subsequently, the duration sub-state corresponds to the sustained presence of alpha waves. Furthermore, the falling edge waveform is detected with high accuracy, enabling the classification of the end state into two distinct phenomena: alpha blocking phenomenon or alpha wave attenuation-disappearance phenomenon, representing the sleepiness level—relaxed wakefulness or sleep onset, respectively. Utilizing three-channel signal processing, this framework provides a promising approach for real-time sleepiness detection in real-world driving scenarios.

JBHI Journal 2026 Journal Article

Digital Twins Framework for Clinical Decision-Centric Co-Management of Patient Monitoring and Environment Management

  • Wei Liu
  • Yuanyuan Sun
  • Jing Wang
  • Nanchang Yin

The convergence of continuous physiological monitoring and intelligent building systems in smart clinics offers a transformative opportunity for patient-centered care, yet it introduces the challenge of harmonizing clinical fidelity, patient comfort, and operational sustainability. We present DT-ECO, a privacy-preserving digital twins framework that enables decision-centric co-management of multi-modal patient monitoring and clinical environmental systems. DT-ECO constructs a hybrid digital twin that integrates a physics-informed building model with graph-temporal physiological inference and battery electrochemistry, enabling real-time synchronization between patient state, IoT device operation, and environmental dynamics within a differentiable programming environment. On this foundation, a hierarchical control strategy is developed, in which a constrained deep reinforcement learning agent adaptively schedules wearable IoT sensor sampling to extend device lifetime, while a model predictive controller orchestrates HVAC operation and on-site energy resources to maintain a therapeutic environment. Extensive evaluations on DOE reference hospitals and public ECG datasets demonstrate that DT-ECO achieves a 31.8% reduction in annual energy consumption and extends median wearable battery life by 28%, while rigorously maintaining clinical standards, evidenced by less than 0.6% thermal comfort violation and no degradation in arrhythmia detection capability (F1-score 0.956). By bridging the gap between patient physiology and the clinical environment, DT-ECO establishes a pathway toward precision healthcare facilities that are simultaneously patient-centric, diagnostically robust, and operationally sustainable.

AAAI Conference 2026 Conference Paper

Domain-Aware Suppression and Aggregation for Federated DG ReID

  • Zhixi Yu
  • Wei Liu
  • Wenke Huang
  • Bin Yang
  • Qian Bie
  • Guancheng Wan
  • Xin Xu

Federated domain generalization in person re-identification (FedDG-ReID) aims to learn a privacy-preserving server model from decentralized client source domains that generalizes to unseen domains. Existing approaches enhance the generalizability of the server model by increasing the diversity of client person data. However, these methods overlook that ReID model parameters are easily biased by client-specific data distributions, leading to the capture of excessive domain-specific identity information. Such identity information (e.g., clothing style) conflicts with the identity information in unseen domains, thereby hindering the generalization ability of the server model. To address this, we propose FedSupWA, a novel FedDG-ReID framework that mainly consists of Domain-aware Parameter Suppression (DPS) and Domain-invariant Weighted Aggregation (DWA). Specifically, DPS adaptively attenuates the update magnitude of the parameters based on the fit of the parameters to the client's domain, encouraging the model to focus on more generalized domain-independent identity information, such as pedestrian contours, and other consistent information across domains. DWA enhances the server model’s generalization by evaluating the effectiveness of the client model in maintaining the consistency of pedestrian identities to measure the importance of the learned domain-independent identity information and assigning greater aggregation weights to clients that contribute more generalized information. Extensive experiments demonstrate the effectiveness of FedSupWA, showing that it achieves state-of-the-art performance.

AAAI Conference 2026 Conference Paper

Exploiting All Mamba Fusion for Efficient RGB-D Tracking

  • Ge Ying
  • Dawei Zhang
  • Chengzhuan Yang
  • Wei Liu
  • Sang-Woon Jeon
  • Hua Wang
  • Changqin Huang
  • Zhonglong Zheng

Despite the progress made through deep learning, existing Visual Object Tracking (VOT) frameworks struggle with real-world challenges. Recent approaches incorporate additional modalities like Depth, Thermal Infrared, and Language to enhance the robustness of VOT, particularly with the improvement of depth sensor precision, facilitating RGB-D tracking. However, current RGB-D trackers often copy RGB tracking paradigms, leading to inefficiency due to two-stream architectures that fail to exploit heterogeneous features, and reliance on simplistic or large-parameter fusion methods. To address these challenges, we propose AMTrack, a one-stream RGB-D tracker leveraging Mamba's linear complexity for simultaneous feature extraction and two-stage cross-modal feature fusion. Our innovation also includes a low-parameter Multimodal Mix Mamba (3M) module, which optimizes deep feature fusion and reduces computational overhead. The advantage of the 3M module stems from our Multimodal State Space Model (MSSM), a multimodal feature interaction component reconstructed based on SSM. Experiments across multiple RGB-D tracking datasets indicate that AMTrack achieves superior performance with fewer parameters and lower memory demands compared to state-of-the-art trackers.

AAAI Conference 2026 Conference Paper

FedARKS: Federated Aggregation via Robust and Discriminative Knowledge Selection and Integration for Person Re-identification

  • Xin Xu
  • Binchang Ma
  • Zhixi Yu
  • Wei Liu

The application of federated domain generalization in person re-identification (FedDG-ReID) aims to enhance the model's generalization ability in unseen domains while protecting client data privacy. However, existing mainstream methods typically rely on global feature representations and simple averaging operations for model aggregation, leading to two limitations in domain generalization: (1) Using only global features makes it difficult to capture subtle, domain-invariant local details (such as accessories or textures); (2) Uniform parameter averaging treats all clients as equivalent, ignoring their differences in robust feature extraction capabilities, thereby diluting the contributions of high-quality clients. To address these issues, we propose a novel federated learning framework—Federated Aggregation via Robust and Discriminative Knowledge Selection and Integration (FedARKS)—comprising two mechanisms: RK (Robust Knowledge) and KS (Knowledge Selection). In our design, each client employs a dual-branch network of RK: the Global Feature Processing Branch serves as the primary component, extracting overall representations for model aggregation and server-side updates; while the Body Part Processing Branch acts as an auxiliary component, focusing on extracting domain-invariant local details to supplement and guide the local training process during global feature learning. Additionally, our KS mechanism adaptively assigns corresponding aggregation weights to clients based on their ability to extract domain-invariant knowledge, enabling the server to better integrate cross-domain invariant knowledge extracted by clients. Extensive experiments validate that FedARKS achieves state-of-the-art generalization results on the FedDG-ReID benchmark, demonstrating that learning subtle body part features can effectively assist and reinforce global representations, thereby enabling robust cross-domain person ReID capabilities.

AAAI Conference 2026 Conference Paper

ICLR: Inter-Chrominance and Luminance Interaction for Natural Color Restoration in Low-Light Image Enhancement

  • Xin Xu
  • Hao Liu
  • Wei Liu
  • Wei Wang
  • Jiayi Wu
  • Kui Jiang

The Low-Light Image Enhancement (LLIE) task aims at improving contrast while restoring details and textures for images captured in low-light conditions. The HVI color space has enabled significant progress in this task by allowing precise decoupling of chrominance and luminance. However, for the interaction of chrominance and luminance branches, substantial distributional differences between the two branches prevalent in natural images limit complementary feature extraction, and luminance errors are propagated to chrominance channels through the nonlinear parameter. Furthermore, for interaction between different chrominance branches, images with large homogeneous-color regions usually exhibit weak correlation between chrominance branches due to concentrated distributions. Traditional pixel-wise losses exploit strong inter-branch correlations for co-optimization, causing gradient conflicts in weakly correlated regions. Therefore, we propose an Inter-Chrominance and Luminance Interaction (ICLR) framework including a Dual-stream Interaction Enhancement Module (DIEM) and a Covariance Correction Loss (CCL). The DIEM improves the extraction of complementary information from two dimensions, fusion and enhancement, respectively. The CCL utilizes luminance residual statistics to penalize chrominance errors and balances gradient conflicts by constraining the covariance of the chrominance branches. Experimental results on multiple datasets show that the proposed ICLR framework outperforms state-of-the-art methods.

TMLR Journal 2026 Journal Article

LoDAdaC: a unified local training-based decentralized framework with adaptive gradients and compressed communication

  • Wei Liu
  • Anweshit Panda
  • Ujwal Pandey
  • Haven Cook
  • George Slota
  • Naigang Wang
  • Jie Chen
  • Yangyang Xu

In decentralized distributed learning, achieving fast convergence and low communication cost is essential for scalability and high efficiency. Adaptive gradient methods, such as Adam, have demonstrated strong practical performance in deep learning and centralized distributed settings. However, their convergence properties remain largely unexplored in decentralized settings involving multiple local training steps, such as federated learning. To address this limitation, we propose LoDAdaC, a unified multiple Local Training (MLT) Decentralized framework with Adam-type updates and Compressed communication (CC). LoDAdaC accommodates a broad class of optimizers for its local adaptive updates, including AMSGrad, Adam, and AdaGrad; it is compatible with standard (possibly biased) compressors such as low-bit quantization and sparsification. MLT and CC enable LoDAdaC to achieve multiplied reduction of communication cost, while the technique of adaptive updates enables fast convergence. We rigorously prove the combined advantage through complexity analysis. In addition, experiments on image classification and GPT-style language model training validate our theoretical findings and show that LoDAdaC significantly outperforms existing decentralized algorithms in terms of convergence speed and communication efficiency.
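
As context for the "standard (possibly biased) compressors" the abstract mentions, a minimal top-k sparsification sketch is shown below. It is generic textbook compression, not the paper's code; only the kept (index, value) pairs would be communicated between workers.

```python
# Illustrative top-k sparsification: keep the k largest-magnitude gradient
# entries, drop the rest. A standard biased compressor; not LoDAdaC's code.
def topk_compress(grad, k):
    """Return a sparse dict of the k largest-magnitude (index, value) pairs."""
    idx = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    return {i: grad[i] for i in idx}

def decompress(sparse, n):
    """Reconstruct a dense length-n vector, zeros where entries were dropped."""
    out = [0.0] * n
    for i, v in sparse.items():
        out[i] = v
    return out

g = [0.1, -2.0, 0.05, 1.5]
print(decompress(topk_compress(g, 2), len(g)))  # [0.0, -2.0, 0.0, 1.5]
```

Sending 2 of 4 entries here halves the payload; combined with multiple local steps between communication rounds, this is the source of the multiplied communication savings the abstract claims.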

AAAI Conference 2026 Conference Paper

Long-form RewardBench: Evaluating Reward Models for Long-form Generation

  • Hui Huang
  • Yancheng He
  • Wei Liu
  • Muyun Yang
  • Jiaheng Liu
  • Kehai Chen
  • Bing Xu
  • Conghui Zhu

The widespread adoption of reinforcement learning-based alignment highlights the growing importance of reward models. Various benchmarks have been built to evaluate reward models across diverse domains and scenarios. However, a significant gap remains in assessing reward models for long-form generation, despite its critical role in real-world applications. To bridge this, we introduce Long-form RewardBench, the first reward modeling testbed specifically designed for long-form generation. Our benchmark encompasses five key subtasks: QA, RAG, Chat, Writing, and Reasoning. We collected instruction and preference data through a meticulously designed multi-stage data collection process, and conducted extensive experiments on 20+ mainstream reward models, including both classifiers and generative models. Our findings reveal that current models still lack long-form reward modeling capabilities. Furthermore, we designed a novel Long-form Needle-in-a-Haystack Test, which revealed a correlation between reward modeling performance and the error's position within a response, as well as the overall response length, with distinct characteristics observed between classification and generative models. Finally, we demonstrate that classifiers exhibit better generalizability compared to generative models trained on the same data. As the first benchmark for long-form reward modeling, this work aims to offer a robust platform for visualizing progress in this crucial area.

AAAI Conference 2026 Conference Paper

MambaOVSR: Multiscale Fusion with Global Motion Modeling for Chinese Opera Video Super-Resolution

  • Hua Chang
  • Xin Xu
  • Wei Liu
  • Wei Wang
  • Xin Yuan
  • Kui Jiang

Chinese opera is celebrated for preserving classical art. However, early filming equipment limitations have degraded videos of last-century performances by renowned artists (e.g., low frame rates and resolution), hindering archival efforts. Although space-time video super-resolution (STVSR) has advanced significantly, applying it directly to opera videos remains challenging. The scarcity of datasets impedes the recovery of high-frequency details, and existing STVSR methods lack global modeling capabilities—compromising visual quality when handling opera’s characteristic large motions. To address these challenges, we pioneer a large-scale Chinese Opera Video Clip (COVC) dataset and propose the Mamba-based multiscale fusion network for space-time Opera Video Super-Resolution (MambaOVSR). Specifically, MambaOVSR involves three novel components: the Global Fusion Module (GFM) for motion modeling through a multiscale alternating scanning mechanism, and the Multiscale Synergistic Mamba Module (MSMM) for alignment across different sequence lengths. Additionally, our MambaVR block resolves feature artifacts and positional information loss during alignment. Experimental results on the COVC dataset show that MambaOVSR significantly outperforms the SOTA STVSR method by an average of 1.86 dB in terms of PSNR.
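
For reference, the PSNR gain quoted above is measured by the standard peak signal-to-noise ratio. The generic definition, not tied to MambaOVSR, is sketched below with toy pixel values.

```python
# PSNR = 10 * log10(MAX^2 / MSE), the metric used in the result above.
# Generic definition with illustrative toy inputs.
import math

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-length pixel sequences."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")  # identical signals
    return 10.0 * math.log10(max_val ** 2 / mse)

print(round(psnr([100, 120, 140], [101, 119, 141]), 2))  # 48.13
```

Because the scale is logarithmic, an average improvement of 1.86 dB corresponds to a roughly 35% reduction in mean squared error.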

AAAI Conference 2026 Conference Paper

Not All Inconsistency Is Equal: Decomposing LVLM Uncertainty into Belief Divergence and Belief Conflict

  • Jie Shi
  • Xiaodong Yue
  • Wei Liu
  • Yufei Chen
  • Feifan Dong

Uncertainty Quantification (UQ) is critical for detecting hallucinations in black-box Large Vision-Language Models (LVLMs). However, prevailing methods like Discrete Semantic Entropy (DSE) are unreliable, as their scores are primarily dominated by the number of semantic clusters. This renders them incapable of distinguishing between benign semantic ambiguity (varied but coherent responses) and severe belief conflict (contradictory responses). We address this limitation by proposing a novel framework rooted in Dempster-Shafer theory of evidence, built on the premise that not all inconsistency is equal. Our method decomposes uncertainty into two complementary metrics: Belief Divergence, which quantifies ambiguity by measuring the separation between viewpoints, and Belief Conflict, which captures direct logical contradictions. Extensive experiments demonstrate that our framework provides a more reliable measure of uncertainty.
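
For context, the Dempster-Shafer quantity that the abstract's "Belief Conflict" idea is rooted in is the conflict mass K from Dempster's rule of combination: the total mass assigned by two belief functions to incompatible focal sets. The sketch below is generic DS theory, not the paper's exact metric.

```python
# Conflict mass K between two mass functions (Dempster-Shafer theory):
# K = sum of m1(A) * m2(B) over all focal-set pairs with empty intersection.
# Focal sets are frozensets over a toy frame {"yes", "no"}; illustrative only.
def conflict_mass(m1, m2):
    return sum(v1 * v2
               for a, v1 in m1.items()
               for b, v2 in m2.items()
               if not (a & b))

yes, no = frozenset({"yes"}), frozenset({"no"})
m1 = {yes: 0.9, no: 0.1}  # one response strongly answers "yes"
m2 = {yes: 0.1, no: 0.9}  # another strongly answers "no"
print(round(conflict_mass(m1, m2), 2))  # 0.82
```

Contradictory responses like these yield high K, while varied-but-coherent responses (mass spread over overlapping sets) yield low K, which is the distinction between belief conflict and benign ambiguity that cluster counting alone cannot make.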

AAAI Conference 2026 Conference Paper

Orthogonal Spatial-temporal Distributional Transfer for 4D Generation

  • Wei Liu
  • Shengqiong Wu
  • Bobo Li
  • Haoyu Zhao
  • Hao Fei
  • Mong-Li Lee
  • Wynne Hsu

In the AIGC era, generating high-quality 4D content has garnered increasing research attention. Unfortunately, current 4D synthesis research is severely constrained by the lack of large-scale 4D datasets, preventing models from adequately learning the critical spatial-temporal features necessary for high-quality 4D generation, thus hindering progress in this domain. To combat this, we propose a novel framework that transfers rich spatial priors from existing 3D diffusion models and temporal priors from video diffusion models to enhance 4D synthesis. We develop a spatial-temporal-disentangled 4D (STD-4D) Diffusion model, which synthesizes 4D-aware videos through disentangled spatial and temporal latents. To facilitate the best feature transfer, we design a novel Orthogonal Spatial-temporal Distributional Transfer (Orster) mechanism, where the spatiotemporal feature distributions are carefully modeled and injected into the STD-4D Diffusion. Further, during the 4D construction, we devise a spatial-temporal-aware HexPlane (ST-HexPlane) to integrate the transferred spatiotemporal features for better 4D deformation and 4D Gaussian feature modeling. Experiments demonstrate that our method significantly outperforms existing approaches, achieving superior spatial-temporal consistency and higher-quality 4D synthesis.

AAAI Conference 2026 Conference Paper

PSEO: Optimizing Post-hoc Stacking Ensemble Through Hyperparameter Tuning

  • Beicheng Xu
  • Wei Liu
  • Keyao Ding
  • Yupeng Lu
  • Bin Cui

The Combined Algorithm Selection and Hyperparameter Optimization (CASH) problem is fundamental in Automated Machine Learning (AutoML). Inspired by the success of ensemble learning, recent AutoML systems construct post-hoc ensembles for final predictions rather than relying on the best single model. However, while most CASH methods conduct extensive searches for the optimal single model, they typically employ fixed strategies during the ensemble phase that fail to adapt to specific task characteristics. To tackle this issue, we propose PSEO, a framework for post-hoc stacking ensemble optimization. First, we conduct base model selection through binary quadratic programming, with a trade-off between diversity and performance. Furthermore, we introduce two mechanisms to fully realize the potential of multi-layer stacking. Finally, PSEO builds a hyperparameter space and searches for the optimal post-hoc ensemble strategy within it. Empirical results on 80 public datasets show that PSEO achieves the best average test rank (2.96) among 16 methods, including post-hoc designs in recent AutoML systems and state-of-the-art ensemble learning methods.

AAAI Conference 2026 Short Paper

ResNet-GA: Evolutionary Deep Learning Models for Adversarial Defense (Student Abstract)

  • Li-Chiao Wang
  • Chung-Shou Liao
  • Wei Liu

Adversarial attacks remain a major challenge for deep learning models, as they can undermine both performance and reliability in practical applications such as image recognition. Although evolutionary algorithms (EAs) have proven effective in optimizing complex systems, their use for directly enhancing model robustness for adversarial defense has been limited. In this study, we introduce ResNet-GA, a method that applies evolutionary deep learning (EDL) to develop ResNet-like networks specifically designed to resist different forms of adversarial perturbations. The approach evolves network architectures with a genetic algorithm (GA), adapting the Residual Blocks at every stage in ResNet according to the needs of each dataset and attack type. Experimental results show that ResNet-GA strengthens model robustness beyond standard baselines, highlighting the value of iterative evolutionary design for building more dependable deep learning systems under various adversarial conditions.

AAAI Conference 2026 Conference Paper

Think-J: Learning to Think for Generative LLM-as-a-Judge

  • Hui Huang
  • Yancheng He
  • Hongli Zhou
  • Rui Zhang
  • Wei Liu
  • Weixun Wang
  • Jiaheng Liu
  • Wenbo Su

LLM-as-a-Judge refers to the automatic modeling of preferences for responses generated by Large Language Models (LLMs), which is of significant importance for both LLM evaluation and reward modeling. Although generative LLMs have made substantial progress in various tasks, their performance as LLM-Judge still falls short of expectations. In this work, we propose Think-J, which improves generative LLM-as-a-Judge by learning how to think. We first utilize a small amount of curated data to develop the model with initial judgment thinking capabilities. Subsequently, we optimize the judgment thinking traces based on reinforcement learning (RL). We propose two methods for judgment thinking optimization, based on offline and online RL, respectively. The offline method requires training a critic model to construct positive and negative examples for learning. The online method defines rule-based reward as feedback for optimization. Experimental results show that our approach can significantly enhance the evaluation capability of generative LLM-Judge, surpassing both generative and classifier-based LLM-Judge without requiring extra human annotations.

JBHI Journal 2026 Journal Article

XAI Driven Intelligent IoMT Secure Data Management Framework

  • Wei Liu
  • Feng Zhao
  • Lewis Nkenyereye
  • Shalli Rani
  • Keqin Li
  • Jianhui Lv

The Internet of Medical Things (IoMT) has transformed traditional healthcare systems by enabling real-time monitoring, remote diagnostics, and data-driven treatment. However, security and privacy remain significant concerns for IoMT adoption due to the sensitive nature of medical data. Therefore, we propose an integrated framework leveraging blockchain and explainable artificial intelligence (XAI) to enable secure, intelligent, and transparent management of IoMT data. First, the traceability and tamper resistance of blockchain are used to secure IoMT data transactions, which are modeled as a two-stage Stackelberg game. The dual-chain architecture is used to ensure the security and privacy protection of the transaction. The main-chain manages regular IoMT data transactions, while the side-chain deals with data trading activities aimed at resale. Simultaneously, the perceptual hash technology is used to realize data rights confirmation, which maximally protects the rights and interests of each participant in the transaction. Subsequently, medical time-series data is modeled using bidirectional simple recurrent units to detect anomalies and cyberthreats accurately while overcoming vanishing gradients. Lastly, an adversarial sample generation method based on local interpretable model-agnostic explanations is provided to evaluate, secure, and improve the anomaly detection model, as well as to make it more explainable and resilient to possible adversarial attacks. Simulation results are provided to illustrate the high performance of the integrated secure data management framework leveraging blockchain and XAI, compared with the benchmarks.

AAAI Conference 2025 Short Paper

Assessing Vulnerabilities in State-of-the-Art Large Language Models Through Hex Injection (Student Abstract)

  • Da Cheng Gu
  • Wei Liu

State-of-the-art large language models (LLMs) are designed with robust safeguards to prevent the disclosure of harmful information and dangerous procedures. However, "jailbreaking" techniques can circumvent these protections by exploiting vulnerabilities in the models. This paper introduces a novel method, Hex Injection, which leverages a specific weakness in LLMs' ability to decode encoded text to uncover concealed dangerous instructions. Hex Injection distinguishes itself from traditional methods by combining encoded instructions with plaintext prompts to reveal unsafe content more effectively. Our approach involves encoding potentially malicious prompts in hexadecimal and integrating them with plaintext prompts. We observe a 94% average success rate (ASR) with a combination of plaintext, encoded, and role-play for Llama 3 and 3.1 models, and an 86% ASR for the Gemma 2 model. This research not only advances the understanding of LLM security but also offers valuable insights for improving safety mechanisms in artificial intelligence systems.

AAAI Conference 2025 Conference Paper

Auto-Regressive Diffusion for Generating 3D Human-Object Interactions

  • Zichen Geng
  • Zeeshan Hayder
  • Wei Liu
  • Ajmal Saeed Mian

Text-driven Human-Object Interaction (Text-to-HOI) generation is an emerging field with applications in animation, video games, virtual reality, and robotics. A key challenge in HOI generation is maintaining interaction consistency in long sequences. Existing Text-to-Motion-based approaches, such as discrete motion tokenization, cannot be directly applied to HOI generation due to limited data in this domain and the complexity of the modality. To address the problem of interaction consistency in long sequences, we propose an autoregressive diffusion model (ARDHOI) that predicts the next continuous token. Specifically, we introduce a Contrastive Variational Autoencoder (cVAE) to learn a physically plausible space of continuous HOI tokens, thereby ensuring that generated human-object motions are realistic and natural. For generating sequences autoregressively, we develop a Mamba-based context encoder to capture and maintain consistent sequential actions. Additionally, we implement an MLP-based denoiser to generate the subsequent token conditioned on the encoded context. Our model has been evaluated on the OMOMO and BEHAVE datasets, where it outperforms existing state-of-the-art methods in terms of both performance and inference speed. This makes ARDHOI a robust and efficient solution for text-driven HOI tasks.

NeurIPS Conference 2025 Conference Paper

CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching

  • Chen Chen
  • Pengsheng Guo
  • Liangchen Song
  • Jiasen Lu
  • Rui Qian
  • Tsu-Jui Fu
  • Xinze Wang
  • Wei Liu

Conditional generative modeling aims to learn a conditional data distribution from samples containing data-condition pairs. For this, diffusion and flow-based methods have attained compelling results. These methods use a learned (flow) model to transport an initial standard Gaussian noise that ignores the condition to the conditional data distribution. The model is hence required to learn both mass transport and conditional injection. To ease the demand on the model, we propose Condition-Aware Reparameterization for Flow Matching (CAR-Flow), a lightweight, learned shift that conditions the source, the target, or both distributions. By relocating these distributions, CAR-Flow shortens the probability path the model must learn, leading to faster training in practice. On low-dimensional synthetic data, we visualize and quantify the effects of CAR-Flow. On higher-dimensional natural image data (ImageNet-256), equipping SiT-XL/2 with CAR-Flow reduces FID from 2.07 to 1.68, while introducing less than 0.6% additional parameters.

AAAI Conference 2025 Conference Paper

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

  • Junxian Li
  • Di Zhang
  • Xunzhi Wang
  • Zeying Hao
  • Jingdi Lei
  • Qian Tan
  • Cai Zhou
  • Wei Liu

Large Language Models (LLMs) have achieved remarkable success and have been applied across various scientific fields, including chemistry. However, many chemical tasks require the processing of visual information, which cannot be successfully handled by existing chemical LLMs. This brings a growing need for models capable of integrating multimodal information in the chemical domain. In this paper, we introduce ChemVLM, an open-source chemical multimodal large language model specifically designed for chemical applications. ChemVLM is trained on a carefully curated bilingual multimodal dataset that enhances its ability to understand both textual and visual chemical information, including molecular structures, reactions, and chemistry examination questions. We develop three datasets for comprehensive evaluation, tailored to Chemical Optical Character Recognition (OCR), Multimodal Chemical Reasoning (MMCR), and Multimodal Molecule Understanding tasks. We benchmark ChemVLM against a range of open-source and proprietary multimodal large language models on various tasks. Experimental results demonstrate that ChemVLM achieves competitive performance across all evaluated tasks.

TMLR Journal 2025 Journal Article

Compressed Decentralized Momentum Stochastic Gradient Methods for Nonconvex Optimization

  • Wei Liu
  • Anweshit Panda
  • Ujwal Pandey
  • Christopher Brissette
  • Yikang Shen
  • George Slota
  • Naigang Wang
  • Jie Chen

In this paper, we design two compressed decentralized algorithms for solving nonconvex stochastic optimization under two different scenarios. Both algorithms adopt a momentum technique to achieve fast convergence and a message-compression technique to save communication costs. Though momentum acceleration and compressed communication have been used in the literature, it is highly nontrivial to theoretically prove the effectiveness of their composition in a decentralized algorithm that can maintain the benefits of both sides, because of the need to simultaneously control the consensus error, the compression error, and the bias from the momentum gradient. For the scenario where gradients are bounded, our proposal is a compressed decentralized adaptive method. To the best of our knowledge, this is the first decentralized adaptive stochastic gradient method with compressed communication. For the scenario of data heterogeneity without bounded gradients, our proposal is a compressed decentralized heavy-ball method, which applies a gradient tracking technique to address the challenge of data heterogeneity. Notably, both methods achieve an optimal convergence rate, and they can achieve linear speedup and adopt topology-independent algorithmic parameters within a certain regime of the user-specified error tolerance. Superior empirical performance is observed over state-of-the-art methods on training deep neural networks (DNNs) and Transformers.
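As a generic illustration of combining momentum with message compression (not the paper's actual algorithms, which also handle consensus and gradient tracking; the names and constants here are made up), a heavy-ball step that transmits only a top-k sparsified momentum vector might look like:

```python
import numpy as np

def top_k_compress(vec: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude entries of vec; zero out the rest."""
    keep = np.argpartition(np.abs(vec), -k)[-k:]
    out = np.zeros_like(vec)
    out[keep] = vec[keep]
    return out

def compressed_momentum_step(w, grad, m, lr=0.1, beta=0.9, k=2):
    """One heavy-ball update where only a compressed message is 'sent'."""
    m = beta * m + grad          # momentum (heavy-ball) buffer
    msg = top_k_compress(m, k)   # compressed message a node would transmit
    w = w - lr * msg             # update using the compressed direction
    return w, m

w = np.ones(4)
m = np.zeros(4)
grad = np.array([0.5, -0.01, 0.02, -2.0])
w, m = compressed_momentum_step(w, grad, m)
# Only the two largest-magnitude momentum entries move the parameters.
```

The bias the abstract mentions is visible even here: the small momentum entries are silently dropped from the transmitted message, and controlling that accumulated error is what makes the convergence analysis nontrivial.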

IJCAI Conference 2025 Conference Paper

Decision-Aware Preference Modeling for Multi-Behavior Recommendation

  • Qingfeng Li
  • Wei Liu
  • Zaiqiao Meng
  • Jian Yin

In recommender systems, multi-behavior methods have demonstrated significant effectiveness in addressing issues such as data sparsity—challenges commonly encountered by traditional single-behavior recommendation methods. These methods typically infer user preferences from various auxiliary behaviors and apply them to recommendations for the target behavior. However, existing methods face challenges in uncovering the interaction patterns for different behaviors from multi-behavior implicit feedback, as users exhibit varying preference strengths for different items across behaviors. To address this issue, this paper introduces a novel approach, Decision-Aware Preference Modeling (DAPM), for multi-behavior recommendation. We first construct a behavior-agnostic graph to learn comprehensive representations that are not affected by behavior factors, complementing the behavior-specific representations. Subsequently, we introduce an innovative contrastive learning paradigm that emphasizes inter-behavior consistency and intra-behavior uniformity to alleviate the “false repulsion” problem in traditional contrastive learning. Furthermore, we propose a multi-behavior hinge loss with boundary constraints to explicitly model users' decision boundaries across different behaviors, thereby enhancing the model’s ability to accurately capture users' inconsistent preference intensities. Extensive experiments on three real-world datasets demonstrate the consistent improvements achieved by DAPM over thirteen state-of-the-art baselines. We release our code at https://github.com/Breeze-del/DAPM.

ICLR Conference 2025 Conference Paper

DeepTAGE: Deep Temporal-Aligned Gradient Enhancement for Optimizing Spiking Neural Networks

  • Wei Liu
  • Li Yang
  • Mingxuan Zhao
  • Shuxun Wang
  • Jin Gao
  • Wenjuan Li
  • Bing Li
  • Weiming Hu

Spiking Neural Networks (SNNs), with their biologically inspired spatio-temporal dynamics and spike-driven processing, are emerging as a promising low-power alternative to traditional Artificial Neural Networks (ANNs). However, the complex neuronal dynamics and non-differentiable spike communication mechanisms in SNNs present substantial challenges for efficient training. By analyzing the membrane potentials in spiking neurons, we found that their distributions can increasingly deviate from the firing threshold as time progresses, which tends to cause diminished backpropagation gradients and unbalanced optimization. To address these challenges, we propose Deep Temporal-Aligned Gradient Enhancement (DeepTAGE), a novel approach that improves optimization gradients in SNNs from both internal surrogate gradient functions and external supervision methods. Our DeepTAGE dynamically adjusts surrogate gradients in accordance with the membrane potential distribution across different time steps, enhancing their respective gradients in a temporal-aligned manner that promotes balanced training. Moreover, to mitigate issues of gradient vanishing or deviating during backpropagation, DeepTAGE incorporates deep supervision at both spatial (network stages) and temporal (time steps) levels to ensure more effective and robust network optimization. Importantly, our method can be seamlessly integrated into existing SNN architectures without imposing additional inference costs or requiring extra control modules. We validate the efficacy of DeepTAGE through extensive experiments on static benchmarks (CIFAR10, CIFAR100, and ImageNet-1k) and a neuromorphic dataset (DVS-CIFAR10), demonstrating significant performance improvements.

AAAI Conference 2025 Conference Paper

Enhancing Multi-View Classification Reliability with Adaptive Rejection

  • Wei Liu
  • Yufei Chen
  • Xiaodong Yue

Multi-view classification based on evidence theory aims to enhance result reliability by effectively quantifying prediction uncertainty at the evidence level, particularly when dealing with low-quality views. However, these methods face limitations in real-world applications due to the sensitivity of estimated uncertainty to view distribution, leading to two main issues: 1) difficulty in making clear judgments about whether to trust predictions based on vague uncertainty scores, and 2) the potential negative impact of integrating information from low-quality views on multi-view classification performance. Both limitations compromise the reliability of multi-view decisions. To address these challenges, we introduce an adaptive rejection mechanism based on estimated uncertainty, which is free of data distribution constraints. By integrating this adaptive rejection mechanism into the fusion of multiple views, our method not only indicates whether predictions should be adopted or rejected at the view level but also enhances classification performance by minimizing the impact of unreliable information. The effectiveness of our method is demonstrated through comprehensive theoretical analysis and empirical experiments on various multi-view datasets, establishing its superiority in enhancing the reliability of multi-view classification.

JBHI Journal 2025 Journal Article

Explainable AI for Medical Image Analysis in Medical Cyber-Physical Systems: Enhancing Transparency and Trustworthiness of IoMT

  • Wei Liu
  • Feng Zhao
  • Achyut Shankar
  • Carsten Maple
  • James Dinesh Peter
  • Byung-Gyu Kim
  • Adam Slowik
  • Bidare Divakarachari Parameshachari

This study explores the application of explainable artificial intelligence (XAI) in the context of medical image analysis within medical cyber-physical systems (MCPS) to enhance transparency and trustworthiness. Meanwhile, this study proposes an explainable framework that integrates machine learning and knowledge reasoning. The explainability of the model is realized when the framework evolution target feature results and reasoning results are the same and are relatively reliable. However, using these technologies also presents new challenges, including the need to ensure the security and privacy of patient data from the Internet of Medical Things (IoMT). Therefore, attack detection is an essential aspect of MCPS security. For the MCPS model with only sensor attacks, the necessary and sufficient conditions for detecting attacks are given based on the definition of sparse observability. The corresponding attack detector and state estimator are designed by assuming that some IoMT sensors are under protection. It is expounded that the IoMT sensors under protection play an important role in improving the efficiency of attack detection and state estimation. The experimental results show that the XAI in the context of medical image analysis within MCPS improves the accuracy of lesion classification, effectively removes low-quality medical images, and realizes the explainability of recognition results. This helps doctors understand the logic of the system's decision-making and choose whether to trust the results based on the explanation given by the framework.

JBHI Journal 2025 Journal Article

Exploring Microbe-Drug Association Prediction via Multi-Attribute Dual-Decoder Graph Autoencoder

  • Wei Liu
  • Xiangcheng Deng
  • Xingen Sun
  • Xu Lu
  • Xing Chen

Predicting potential microbe-drug associations (MDA) can help study pathogenesis, expedite pharmaceutical innovation, and enhance targeted therapeutics. Given the time and labor intensity of traditional biological experiments, an increasing number of computational approaches are being employed to predict MDA. The method based on graph embedding is one of the most widely used. However, most of these methods only consider node embedding or graph structure information in isolation, which leads to restricted predictive accuracy. In this work, we propose a method called exploring microbe-drug association prediction via multi-attribute dual-decoder graph autoencoder (MDGAEMDA). Specifically, we first construct a heterogeneous network containing microbe similarity, drug similarity, and known associations. Second, to enrich the node information, multi-attribute features are obtained by importing the topological information of microbes and drugs. Then, two heterogeneous networks constructed by the graph masking strategy are input into a dual-decoder graph autoencoder that contains one encoder and two decoders (a node decoder and a structure decoder) to learn both node embedding and graph structure information. Finally, the two low-dimensional features are concatenated into features of MDA pairs and predicted by random forest. The model was compared with multiple advanced methods using public datasets. The experimental outcomes showed that our model significantly outperformed other methods. A case study of widely used drugs demonstrated the reliability of the proposed method to predict MDA.

AAAI Conference 2025 Conference Paper

Follow-Your-Click: Open-domain Regional Image Animation via Motion Prompts

  • Yue Ma
  • Yingqing He
  • Hongfa Wang
  • Andong Wang
  • Leqi Shen
  • Chenyang Qi
  • Jixuan Ying
  • Chengfei Cai

Despite recent advances in image-to-video generation, better controllability and local animation are less explored. Most existing image-to-video methods are not locally aware and tend to move the entire scene. However, human artists may need to control the movement of different objects or regions. Additionally, current I2V methods require users not only to describe the target motion but also to provide redundant detailed descriptions of frame contents. These two issues hinder the practical utilization of current I2V tools. In this paper, we propose a practical framework, named Follow-Your-Click, to achieve image animation with a simple user click (for specifying what to move) and a motion prompt (for specifying how to move). Technically, we propose the first-frame masking strategy, which significantly improves the video generation quality, and a motion-augmented module equipped with a motion prompt dataset to improve the motion prompt following abilities of our model. To further control the motion speed, we propose flow-based motion magnitude control to regulate the speed of target movement more precisely. Extensive experiments compared with 7 baselines, including both commercial tools and research methods on 8 metrics, suggest the superiority of our approach.

AAAI Conference 2025 Conference Paper

GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization

  • Yirui Chen
  • Xudong Huang
  • Quan Zhang
  • Wei Li
  • Mingjian Zhu
  • Qiangyu Yan
  • Simiao Li
  • Hanting Chen

The extraordinary ability of generative models emerges as a new trend in image editing and generating realistic images, posing a serious threat to the trustworthiness of multimedia data and driving the research of image manipulation detection and localization (IMDL). However, the lack of a large-scale data foundation makes the IMDL task unattainable. In this paper, we build a local manipulation data generation pipeline that integrates the powerful capabilities of SAM, LLM, and generative models. Upon this basis, we propose the GIM dataset, which has the following advantages: 1) Large scale: GIM includes over one million pairs of AI-manipulated images and real images. 2) Rich image content: GIM encompasses a broad range of image classes. 3) Diverse generative manipulation: the images are manipulated with state-of-the-art generators and various manipulation tasks. The aforementioned advantages allow for a more comprehensive evaluation of IMDL methods, extending their applicability to diverse images. We introduce the GIM benchmark with two settings to evaluate existing IMDL methods. In addition, we propose a novel IMDL framework, termed GIMFormer, which consists of a ShadowTracer, a Frequency-Spatial block (FSB), and a Multi-Window Anomalous Modeling (MWAM) module. Extensive experiments on GIM demonstrate that GIMFormer surpasses the previous state-of-the-art approach on two different benchmarks.

IJCAI Conference 2025 Conference Paper

HyperTrans: Efficient Hypergraph-Driven Cross-Domain Pattern Transfer in Image Anomaly Detection

  • Tengyu Zhang
  • Deyu Zeng
  • Baoqiang Li
  • Wei Wang
  • Wei Liu
  • Zongze Wu

Anomaly detection plays a pivotal role in industrial quality assurance processes, with cross-domain problems, exemplified by the model upgrade from RGB to 3D, being prevalent in real-world scenarios yet remaining systematically underexplored. To address the severe challenges posed by the extreme lack of datasets in the target domain, we retain the knowledge from source models and explore a novel solution for anomaly detection through cross-domain learning, introducing HyperTrans. Targeting few-shot scenarios, HyperTrans centers around hypergraphs to model the relationships among the limited patch features and employs a perturbation-rectification-scoring architecture. The domain perturbation module injects and adapts channel-level statistical perturbations, mitigating style shifts during domain transfer. Subsequently, a residual hypergraph restoration module utilizes a cross-domain hypergraph to capture higher-order correlations in patches and align them across domains. Ultimately, with feature patterns exhibiting reduced domain shifts, an inter-domain scoring module aggregates similarity information between patches and normal patterns within the multi-domain subhypergraphs to make an integrated decision, generating multi-level anomaly predictions. Extensive experiments demonstrate that HyperTrans offers significant advantages in anomaly classification and anomaly segmentation tasks, outperforming state-of-the-art non-cross-domain methods in image-wise ROCAUC by 13%, 12%, and 15% in 1-shot, 2-shot, and 5-shot settings on MVTec3D AD.

AAAI Conference 2025 Conference Paper

Infinite-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation

  • Qihua Chen
  • Yue Ma
  • Hongfa Wang
  • Junkun Yuan
  • Wenzhe Zhao
  • Qi Tian
  • Hongmei Wang
  • Shaobo Min

This paper explores higher-resolution video outpainting with extensive content generation. We point out common issues faced by existing methods when attempting to largely outpaint videos: the generation of low-quality content and limitations imposed by GPU memory. To address these challenges, we propose a diffusion-based method called Infinite-Canvas. It builds upon two core designs. First, instead of employing the common practice of "single-shot" outpainting, we distribute the task across spatial windows and seamlessly merge them. It allows us to outpaint videos of any size and resolution without being constrained by GPU memory. Second, the source video and its relative positional relation are injected into the generation process of each window. It makes the generated spatial layout within each window harmonize with the source video. Coupling these two designs enables us to generate higher-resolution outpainting videos with rich content while keeping spatial and temporal consistency. Infinite-Canvas excels in large-scale video outpainting, e.g., from 512 × 512 to 1152 × 2048 (9×), while producing high-quality and aesthetically pleasing results. It achieves the best quantitative results across various resolution and scale setups. The code is available at https://github.com/mayuelala/FollowYourCanvas.

AAAI Conference 2025 Conference Paper

Just a Few Glances: Open-Set Visual Perception with Image Prompt Paradigm

  • Jinrong Zhang
  • Penghui Wang
  • Chunxiao Liu
  • Wei Liu
  • Dian Jin
  • Qiong Zhang
  • Erli Meng
  • Zhengnan Hu

To break through the limitations of pre-training models on fixed categories, Open-Set Object Detection (OSOD) and Open-Set Segmentation (OSS) have attracted a surge of interest from researchers. Inspired by large language models, mainstream OSOD and OSS methods generally utilize text as a prompt, achieving remarkable performance. Following the SAM paradigm, some researchers use visual prompts, such as points, boxes, and masks that cover detection or segmentation targets. Although these two prompt paradigms exhibit excellent performance, they also reveal inherent limitations. On the one hand, it is difficult to accurately describe the characteristics of a specialized category using textual descriptions. On the other hand, existing visual prompt paradigms heavily rely on multi-round human interaction, which hinders their application to fully automated pipelines. To address the above issues, we propose a novel prompt paradigm for OSOD and OSS: the Image Prompt Paradigm. This brand-new prompt paradigm enables detecting or segmenting specialized categories without multi-round human intervention. To achieve this goal, the proposed image prompt paradigm uses just a few image instances as prompts, and we propose a novel framework named MI Grounding for this new paradigm. In this framework, high-quality image prompts are automatically encoded, selected, and fused, achieving single-stage, non-interactive inference. We conduct extensive experiments on public datasets, showing that MI Grounding achieves competitive performance on OSOD and OSS benchmarks compared to text prompt paradigm methods and visual prompt paradigm methods. Moreover, MI Grounding greatly outperforms existing methods on our constructed specialized ADR50K dataset.

AAAI Conference 2025 Conference Paper

Local Conditional Controlling for Text-to-Image Diffusion Models

  • Yibo Zhao
  • Liang Peng
  • Yang Yang
  • Zekai Luo
  • Hengjia Li
  • Yao Chen
  • Zheng Yang
  • Xiaofei He

Diffusion models have exhibited impressive prowess in the text-to-image task. Recent methods add image-level structure controls, e.g., edge and depth maps, to manipulate the generation process together with text prompts to obtain desired images. This controlling process is globally operated on the entire image, which limits the flexibility of control regions. In this paper, we explore a novel and practical task setting: local control. It focuses on controlling a specific local region according to user-defined image conditions, while the remaining regions are only conditioned by the original text prompt. However, it is non-trivial to achieve this. The naive manner of directly adding local conditions may lead to the local control dominance problem, which forces the model to focus on the controlled region and neglect object generation in other regions. To mitigate this problem, we propose a Regional Discriminate Loss to update the noised latents, aiming at enhanced object generation in non-control regions. Furthermore, the proposed Focused Token Response suppresses weaker attention scores which lack the strongest response to enhance object distinction and reduce duplication. Lastly, we adopt a Feature Mask Constraint to reduce quality degradation in images caused by information differences across the local control region. All proposed strategies are operated at the inference stage. Extensive experiments demonstrate that our method can synthesize high-quality images aligned with the text prompt under local control conditions.

NeurIPS Conference 2025 Conference Paper

MI-TRQR: Mutual Information-Based Temporal Redundancy Quantification and Reduction for Energy-Efficient Spiking Neural Networks

  • Dengfeng Xue
  • Wenjuan Li
  • Yifan Lu
  • Chunfeng Yuan
  • Yufan Liu
  • Wei Liu
  • Man Yao
  • Li Yang

Brain-inspired spiking neural networks (SNNs) provide energy-efficient computation through event-driven processing. However, the shared weights across multiple timesteps lead to serious temporal feature redundancy, limiting both efficiency and performance. This issue is further aggravated when processing static images due to the duplicated input. To mitigate this problem, we propose a parameter-free and plug-and-play module named Mutual Information-based Temporal Redundancy Quantification and Reduction (MI-TRQR), constructing energy-efficient SNNs. Specifically, Mutual Information (MI) is properly introduced to quantify redundancy between discrete spike features at different timesteps on two spatial scales: pixel (local) and the entire spatial features (global). Based on the multi-scale redundancy quantification, we apply a probabilistic masking strategy to remove redundant spikes. The final representation is subsequently recalibrated to account for the spike removal. Extensive experimental results demonstrate that our MI-TRQR achieves sparser spiking firing, higher energy efficiency, and better performance concurrently with different SNN architectures in tasks of neuromorphic data classification, static data classification, and time-series forecasting. Notably, MI-TRQR increases accuracy by 1.7% on CIFAR10-DVS with 4 timesteps while reducing energy cost by 37.5%. Our codes are available at https://github.com/dfxue/MI-TRQR.
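The quantification step relies on mutual information between binary spike patterns at different timesteps; a minimal empirical estimator for two binary arrays can be sketched as follows (a generic plug-in MI estimate in bits, not the paper's multi-scale module; variable names are illustrative):

```python
import numpy as np

def binary_mutual_information(x: np.ndarray, y: np.ndarray) -> float:
    """Plug-in estimate of MI (in bits) between two binary {0,1} arrays."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))  # empirical joint probability
            p_a = np.mean(x == a)                # empirical marginals
            p_b = np.mean(y == b)
            if p_ab > 0:                         # convention: 0 * log 0 = 0
                mi += p_ab * np.log2(p_ab / (p_a * p_b))
    return mi

spikes_t0 = np.array([0, 0, 1, 1])
identical = binary_mutual_information(spikes_t0, spikes_t0)
independent = binary_mutual_information(spikes_t0, np.array([0, 1, 0, 1]))
```

Two identical spike patterns yield the maximal 1 bit here, while statistically independent patterns yield 0, matching the intuition that high MI across timesteps flags redundant spikes as candidates for masking.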

AAAI Conference 2025 Conference Paper

Modeling All Response Surfaces in One for Conditional Search Spaces

  • Jiaxing Li
  • Wei Liu
  • Chao Xue
  • Yibing Zhan
  • Xiaoxing Wang
  • Weifeng Liu
  • Dacheng Tao

Bayesian Optimization (BO) is a sample-efficient black-box optimizer commonly used in search spaces where hyperparameters are independent. However, in many practical AutoML scenarios, there will be dependencies among hyperparameters, forming a conditional search space, which can be partitioned into structurally distinct subspaces. The structure and dimensionality of hyperparameter configurations vary across these subspaces, challenging the application of BO. Some previous BO works have proposed solutions to develop multiple Gaussian Process models in these subspaces. However, these approaches tend to be inefficient as they require a substantial number of observations to guarantee each GP's performance and cannot capture relationships between hyperparameters across different subspaces. To address these issues, this paper proposes a novel approach to model the response surfaces of all subspaces in one, which can model the relationships between hyperparameters elegantly via a self-attention mechanism. Concretely, we design a structure-aware hyperparameter embedding to preserve the structural information. Then, we introduce an attention-based deep feature extractor, capable of projecting configurations with different structures from various subspaces into a unified feature space, where the response surfaces can be formulated using a single standard Gaussian Process. The empirical results on a simulation function, various real-world tasks, and HPO-B benchmark demonstrate that our proposed approach improves the efficacy and efficiency of BO within conditional search spaces.

AAAI Conference 2025 Conference Paper

MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls

  • Yuxuan Bian
  • Ailing Zeng
  • Xuan Ju
  • Xian Liu
  • Zhaoyang Zhang
  • Wei Liu
  • Qiang Xu

Whole-body multimodal motion generation, controlled by text, speech, or music, has numerous applications including video generation and character animation. However, employing a unified model to process different condition modalities presents two main challenges: motion distribution drifts across different tasks (e.g., co-speech gestures and text-driven daily actions) and the complex optimization of mixed conditions with varying granularities (e.g., text and audio). In this paper, we propose MotionCraft, a unified diffusion transformer that crafts whole-body motion with plug-and-play multimodal control. Our framework employs a coarse-to-fine training strategy, starting with the text-to-motion semantic pre-training, followed by the multimodal low-level control adaptation. To effectively learn and transfer motion knowledge across different distributions, we design MC-Attn for parallel modeling of static and dynamic human topology graphs. To overcome the motion format inconsistency of existing benchmarks, we introduce MC-Bench, the first available multimodal whole-body motion generation benchmark based on the unified SMPL-X format. Extensive experiments show that MotionCraft achieves state-of-the-art performance on various standard motion generation tasks.

NeurIPS Conference 2025 Conference Paper

Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving

  • Daoguang Zan
  • Zhirong Huang
  • Wei Liu
  • Hanwu Chen
  • Shulin Xin
  • Linhao Zhang
  • Qi Liu
  • Li Aoyan

The task of issue resolving aims to modify a codebase to generate a patch that addresses a given issue. However, most existing benchmarks focus almost exclusively on Python, making them insufficient for evaluating Large Language Models (LLMs) across different programming languages. To bridge this gap, we introduce a multilingual issue-resolving benchmark, called Multi-SWE-bench, covering 8 languages: Python, Java, TypeScript, JavaScript, Go, Rust, C, and C++. In particular, this benchmark includes a total of 2,132 high-quality instances, carefully curated by 68 expert annotators, ensuring a reliable and accurate evaluation of LLMs on the issue-resolving task. Based on human-annotated results, the issues are further classified into three difficulty levels. We evaluate a series of state-of-the-art models on Multi-SWE-bench, utilizing both procedural and agent-based frameworks for issue resolving. Our experiments reveal three key findings: (1) Limited generalization across languages: While existing LLMs perform well on Python issues, their ability to generalize across other languages remains limited; (2) Performance aligned with human-annotated difficulty: LLM-based agents' performance closely aligns with human-assigned difficulty, with resolution rates decreasing as issue complexity rises; and (3) Performance drop on cross-file issues: The performance of current methods significantly deteriorates when handling cross-file issues. These findings highlight the limitations of current LLMs and underscore the need for more robust models capable of handling a broader range of programming languages and complex issue scenarios.

ECAI Conference 2025 Conference Paper

PMR: Physical Model-Driven Multi-Stage Restoration of Turbulent Dynamic Videos

  • Tao Wu
  • Jingyuan Ye
  • Cheng Zhou
  • Wenlong Chen
  • Zheng Liu
  • Huiming Zheng
  • Wei Liu
  • Ying Fu

Geometric distortions and blurring caused by atmospheric turbulence degrade the quality of long-range dynamic scene videos. Existing methods struggle with restoring edge details and eliminating mixed distortions, especially under conditions of strong turbulence and complex dynamics. To address these challenges, we introduce a Dynamic Efficiency Index (DEI), which combines turbulence intensity, optical flow, and proportions of dynamic regions to accurately quantify video dynamic intensity under varying turbulence conditions and provide a high-dynamic turbulence training dataset. Additionally, we propose a Physical Model-Driven Multi-Stage Video Restoration (PMR) framework that consists of three stages: de-tilting for geometric stabilization, motion segmentation enhancement for dynamic region refinement, and de-blurring for quality restoration. PMR employs lightweight backbones and stage-wise joint training to ensure both efficiency and high restoration quality. Experimental results demonstrate that the proposed method effectively suppresses motion trailing artifacts, restores edge details and exhibits strong generalization capability, especially in real-world scenarios characterized by high-turbulence and complex dynamics. We will make the code and datasets openly available.

NeurIPS Conference 2025 Conference Paper

Quantifying Distributional Invariance in Causal Subgraph for IRM-Free Graph Generalization

  • Yang Qiu
  • Yixiong Zou
  • Jun Wang
  • Wei Liu
  • Xiangyu Fu
  • Ruixuan Li

Out-of-distribution generalization under distributional shifts remains a critical challenge for graph neural networks. Existing methods generally adopt the Invariant Risk Minimization (IRM) framework, requiring costly environment annotations or heuristically generated synthetic splits. To circumvent these limitations, in this work, we aim to develop an IRM-free method for capturing causal subgraphs. We first identify that causal subgraphs exhibit substantially smaller distributional variations than non-causal components across diverse environments, which we formalize as the Invariant Distribution Criterion and theoretically prove in this paper. Building on this criterion, we systematically uncover the quantitative relationship between distributional shift and representation norm for identifying the causal subgraph, and investigate its underlying mechanisms in depth. Finally, we propose an IRM-free method by introducing a norm-guided invariant distribution objective for causal subgraph discovery and prediction. Extensive experiments on two widely used benchmarks demonstrate that our method consistently outperforms state-of-the-art methods in graph generalization. Code is available at https://github.com/anders1123/IDG.

NeurIPS Conference 2025 Conference Paper

Retro-R1: LLM-based Agentic Retrosynthesis

  • Wei Liu
  • Jiangtao Feng
  • Hongli Yu
  • Yuxuan Song
  • Yuqiang Li
  • Shufei Zhang
  • Lei Bai
  • Wei-Ying Ma

Retrosynthetic planning is a fundamental task in chemical discovery. Due to the vast combinatorial search space, identifying viable synthetic routes remains a significant challenge--even for expert chemists. Recent advances in Large Language Models (LLMs), particularly equipped with reinforcement learning, have demonstrated strong human-like reasoning and planning abilities, especially in mathematics and code problem solving. This raises a natural question: Can the reasoning capabilities of LLMs be harnessed to develop an AI chemist capable of learning effective policies for multi-step retrosynthesis? In this study, we introduce Retro-R1, a novel LLM-based retrosynthesis agent trained via reinforcement learning to design molecular synthesis pathways. Unlike prior approaches, which typically rely on single-turn, question-answering formats, Retro-R1 interacts dynamically with plug-in single-step retrosynthesis tools and learns from environmental feedback. Experimental results show that Retro-R1 achieves a 55.79% pass@1 success rate, surpassing the previous state of the art by 8.95%. Notably, Retro-R1 demonstrates strong generalization to out-of-domain test cases, where existing methods tend to fail despite their high in-domain performance. Our work marks a significant step toward equipping LLMs with advanced, chemist-like reasoning abilities, highlighting the promise of reinforcement learning for enabling data-efficient, generalizable, and sophisticated scientific problem-solving in LLM-based agents.

AAAI Conference 2025 Conference Paper

Stability and Generalization of Zeroth-Order Decentralized Stochastic Gradient Descent with Changing Topology

  • Xiaolin Hu
  • Zixuan Gong
  • Gengze Xu
  • Wei Liu
  • Jian Luan
  • Bin Wang
  • Yong Liu

Zeroth-order (ZO) optimization as the gradient-free method has become a powerful tool when the first-order gradient is unavailable or expensive to obtain, especially in decentralized learning scenarios where data and computational resources are distributed across multiple clients. There have been many efforts to analyze the optimization convergence rate of zeroth-order decentralized stochastic gradient descent (ZO-DSGD) algorithms. However, the generalization of these methods has not been well studied. In this paper, we provide a generalization analysis of ZO-DSGD with changing topology, where the clients run zeroth-order SGD with local data and communicate with each other according to time-varying topology. We systematically analyze the generalization error in convex, strongly convex, and non-convex cases. The obtained results in the convex and strongly convex cases with zeroth-order oracles recover the results of SGD. Moreover, the generalization bounds derived in non-convex cases align with that of DSGD. To capture the influence of communication topology on the generalization performance, we analyze local generalization bounds concerning local models held at different clients. The obtained results reflect the influence of the number of clients, local sample size, and topology on the generalization error. To the best of our knowledge, this is the first work that provides a generalization analysis of zeroth-order decentralized stochastic gradient descent methods and recovers the results of SGD.
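The zeroth-order oracle behind ZO-DSGD-style methods can be pictured with the standard two-point random-direction estimator: the gradient is approximated purely from function evaluations, no derivatives required. The sketch below is a generic illustration of that estimator driving a toy gradient-free SGD loop; the smoothing parameter, step size, and number of directions are arbitrary choices, not the paper's settings.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-4, num_dirs=10, rng=None):
    """Two-point zeroth-order gradient estimate of f at x:
    averages (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u over random directions u."""
    rng = np.random.default_rng(rng)
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / num_dirs

# Gradient-free SGD on a toy quadratic: each step only queries f at points.
f = lambda x: float(np.sum(x ** 2))
x = np.ones(5)
rng = np.random.default_rng(0)
for _ in range(500):
    x = x - 0.05 * zo_gradient(f, x, rng=rng)
```

In the decentralized setting analyzed by the paper, each client would run such a step on its local data and then average models with its neighbors under the current communication topology.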

IJCAI Conference 2025 Conference Paper

Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization

  • Xinhao Yao
  • Hongjin Qian
  • Xiaolin Hu
  • Gengze Xu
  • Wei Liu
  • Jian Luan
  • Bin Wang
  • Yong Liu

Large Language Models (LLMs), built on Transformer architectures, exhibit remarkable generalization across a wide range of tasks. However, fine-tuning these models for specific tasks remains resource-intensive due to their extensive parameterization. In this paper, we explore two remarkable phenomena related to the attention mechanism during the fine-tuning of LLMs (where Wq, Wk, and Wv denote the weights of the query, key, and value layers, respectively). The first phenomenon, termed “Unequal Importance of Attention Matrices”, highlights the impact of fine-tuning different weight matrices. It shows that optimizing the Wv matrix yields significantly better performance than optimizing the Wk matrix. Fine-tuning only the Wq and Wv matrices is computationally efficient while delivering results comparable to, or even better than, fine-tuning all three matrices (Wq, Wk, and Wv). The second phenomenon, “Attention Matrices with Customized Learning Rate Lead to Better Convergence”, emphasizes the importance of assigning distinct learning rates to these matrices. Specifically, a higher learning rate for the Wv matrix compared to Wq and Wk accelerates convergence and improves performance. Building on these insights, we propose a new strategy that improves fine-tuning efficiency in terms of both storage and time. Experimental results on benchmark datasets validate the effectiveness of this approach, supporting our theoretical findings. Our analysis lays the theoretical groundwork for configuring and improving algorithms in LLMs fine-tuning.
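As a rough illustration of the second phenomenon, per-matrix learning rates are typically expressed as optimizer parameter groups. The fragment below is purely hypothetical: the parameter names and the 4x ratio are assumptions for illustration, not the paper's prescribed values.

```python
# Illustrative optimizer configuration (not the paper's exact recipe):
# give W_v a larger learning rate than W_q and W_k, per the second phenomenon.
base_lr = 1e-4
param_groups = [
    {"params": "attn.W_q", "lr": base_lr},
    {"params": "attn.W_k", "lr": base_lr},      # may even be frozen, per the first phenomenon
    {"params": "attn.W_v", "lr": 4 * base_lr},  # faster updates for the value matrix
]
```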

AAAI Conference 2025 Conference Paper

Towards More Discriminative Feature Learning in SNNs with Temporal-Self-Erasing Supervision

  • Wei Liu
  • Li Yang
  • Mingxuan Zhao
  • Dengfeng Xue
  • Shuxun Wang
  • Boyu Cai
  • Jin Gao
  • Wenjuan Li

Spiking Neural Networks (SNNs) are biologically inspired models that process visual inputs over multiple time steps. However, they often struggle with limited feature discrimination along the temporal dimension due to inherent spatiotemporal invariance. This limitation arises from the redundant activation of certain regions and shared supervision for multiple time steps, constraining the network’s ability to adapt and learn diverse features. To address this challenge, we propose a novel Temporal-Self-Erasing (TSE) supervision method that dynamically adapts the learning regions of interest for different time steps. The TSE method operates by identifying highly activated regions from predictions across multiple time steps and adaptively suppressing them during model training, thereby encouraging the network to focus on less activated yet potentially informative regions. This approach not only enhances the feature discrimination capability of SNNs but also facilitates more effective multi-time-step inference by exploiting more semantic information. Experimental results on benchmark datasets demonstrate that our TSE method significantly improves the classification accuracy and robustness of SNNs.

ICLR Conference 2025 Conference Paper

UniMatch: Universal Matching from Atom to Task for Few-Shot Drug Discovery

  • Ruifeng Li
  • Mingqian Li
  • Wei Liu
  • Yuhua Zhou
  • Xiangxin Zhou
  • Yuan Yao
  • Qiang Zhang 0026
  • Hongyang Chen 0001

Drug discovery is crucial for identifying candidate drugs for various diseases. However, its low success rate often results in a scarcity of annotations, posing a few-shot learning problem. Existing methods primarily focus on single-scale features, overlooking the hierarchical molecular structures that determine different molecular properties. To address these issues, we introduce Universal Matching Networks (UniMatch), a dual matching framework that integrates explicit hierarchical molecular matching with implicit task-level matching via meta-learning, bridging multi-level molecular representations and task-level generalization. Specifically, our approach explicitly captures structural features across multiple levels—atoms, substructures, and molecules—via hierarchical pooling and matching, facilitating precise molecular representation and comparison. Additionally, we employ a meta-learning strategy for implicit task-level matching, allowing the model to capture shared patterns across tasks and quickly adapt to new ones. This unified matching framework ensures effective molecular alignment while leveraging shared meta-knowledge for fast adaptation. Our experimental results demonstrate that UniMatch outperforms state-of-the-art methods on the MoleculeNet and FS-Mol benchmarks, achieving improvements of 2.87% in AUROC and 6.52% in ∆AUPRC. UniMatch also shows excellent generalization ability on the Meta-MolNet benchmark.

ICRA Conference 2025 Conference Paper

X-MOBILITY: End-to-End Generalizable Navigation via World Modeling

  • Wei Liu
  • Huihua Zhao
  • Chenran Li
  • Joydeep Biswas
  • Billy Okal
  • Pulkit Goyal
  • Yan Chang
  • Soha Pouya

General-purpose navigation in challenging environments remains a significant problem in robotics, with current state-of-the-art approaches facing myriad limitations. Classical approaches struggle with cluttered settings and require extensive tuning, while learning-based methods face difficulties generalizing to out-of-distribution environments. This paper introduces X-Mobility, an end-to-end generalizable navigation model that overcomes existing challenges by leveraging three key ideas. First, X-Mobility employs an auto-regressive world modeling architecture with a latent state space to capture world dynamics. Second, a diverse set of multi-head decoders enables the model to learn a rich state representation that correlates strongly with effective navigation skills. Third, by decoupling world modeling from action policy, our architecture can train effectively on a variety of data sources, both with and without expert policies: off-policy data allows the model to learn world dynamics, while on-policy data with supervisory control enables optimal action policy learning. Through extensive experiments, we demonstrate that X-Mobility not only generalizes effectively but also surpasses current state-of-the-art navigation approaches. Additionally, X-Mobility achieves zero-shot Sim2Real transferability and shows strong potential for cross-embodiment generalization. Project page: https://nvlabs.github.io/X-MOBILITY.

NeurIPS Conference 2024 Conference Paper

$\text{Di}^2\text{Pose}$: Discrete Diffusion Model for Occluded 3D Human Pose Estimation

  • Weiquan Wang
  • Jun Xiao
  • Chunping Wang
  • Wei Liu
  • Zhao Wang
  • Long Chen

Diffusion models have demonstrated their effectiveness in addressing the inherent uncertainty and indeterminacy in monocular 3D human pose estimation (HPE). Despite their strengths, the need for large search spaces and the corresponding demand for substantial training data make these models prone to generating biomechanically unrealistic poses. This challenge is particularly noticeable in occlusion scenarios, where the complexity of inferring 3D structures from 2D images intensifies. In response to these limitations, we introduce the **Di**screte **Di**ffusion **Pose** (**$\text{Di}^2\text{Pose}$**), a novel framework designed for occluded 3D HPE that capitalizes on the benefits of a discrete diffusion model. Specifically, **$\text{Di}^2\text{Pose}$** employs a two-stage process: it first converts 3D poses into a discrete representation through a pose quantization step, which is subsequently modeled in latent space through a discrete diffusion process. This methodological innovation restrictively confines the search space towards physically viable configurations and enhances the model’s capability to comprehend how occlusions affect human pose within the latent space. Extensive evaluations conducted on various benchmarks (e.g., Human3.6M, 3DPW, and 3DPW-Occ) have demonstrated its effectiveness.
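The pose-quantization step can be pictured as VQ-style nearest-neighbor assignment against a learned codebook: each continuous pose vector is replaced by the index of its closest code, and the discrete diffusion process then operates over those indices. The toy sketch below (shapes and values illustrative, not the paper's actual quantizer) shows the assignment step.

```python
import numpy as np

def quantize(poses, codebook):
    """Map each continuous pose vector to its nearest codebook entry.

    poses:    (N, D) array of pose vectors
    codebook: (K, D) array of learned code vectors
    Returns integer code indices of shape (N,).
    """
    # Squared Euclidean distance from every pose to every code.
    d2 = ((poses[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
poses = np.array([[0.1, -0.1], [0.9, 1.2]])
idx = quantize(poses, codebook)  # → array([0, 1])
recon = codebook[idx]            # discrete stand-ins for the original poses
```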

NeurIPS Conference 2024 Conference Paper

AFBench: A Large-scale Benchmark for Airfoil Design

  • Jian Liu
  • Jianyu Wu
  • Hairun Xie
  • Guoqing Zhang
  • Jing Wang
  • Wei Liu
  • Wanli Ouyang
  • Junjun Jiang

Data-driven generative models have emerged as promising approaches towards achieving efficient mechanical inverse design. However, due to prohibitively high cost in time and money, there is still a lack of open-source and large-scale benchmarks in this field. This is especially the case for airfoil inverse design, which requires generating and editing diverse geometric-qualified and aerodynamic-qualified airfoils following multimodal instructions, i.e., dragging points and physical parameters. This paper presents the open-source endeavors in airfoil inverse design, AFBench, including a large-scale dataset with 200 thousand airfoils and high-quality aerodynamic and geometric labels, two novel and practical airfoil inverse design tasks, i.e., conditional generation on multimodal physical parameters and controllable editing, and comprehensive metrics to evaluate various existing airfoil inverse design methods. Our aim is to establish AFBench as an ecosystem for training and evaluating airfoil inverse design methods, with a specific focus on data-driven controllable inverse design models driven by multimodal instructions, capable of bridging the gap between ideas and execution, and between academic research and industrial applications. We have provided baseline models, comprehensive experimental observations, and analysis to accelerate future research. Our baseline model is trained on an RTX 3090 GPU within 16 hours. The codebase, datasets and benchmarks will be available at https://hitcslj.github.io/afbench/.

NeurIPS Conference 2024 Conference Paper

Autonomous Agents for Collaborative Task under Information Asymmetry

  • Wei Liu
  • Chenxi Wang
  • Yifei Wang
  • Zihao Xie
  • Rennai Qiu
  • Yufan Dang
  • Zhuoyun Du
  • Weize Chen

Large Language Model Multi-Agent Systems (LLM-MAS) have greatly progressed in solving complex tasks. Agents within such a system communicate with one another to collaboratively solve tasks, under the premise of shared information. However, when agents' collaborations are leveraged to perform multi-person tasks, a new challenge arises due to information asymmetry, since each agent can only access the information of its own human user. Previous MAS struggle to complete tasks under this condition. To address this, we propose a new MAS paradigm termed iAgents, which denotes Informative Multi-Agent Systems. In iAgents, the human social network is mirrored in the agent network, where agents proactively exchange human information necessary for task resolution, thereby overcoming information asymmetry. iAgents employs a novel agent reasoning mechanism, InfoNav, to navigate agents' communication towards effective information exchange. Together with InfoNav, iAgents organizes human information in a mixed memory to provide agents with accurate and comprehensive information for exchange. Additionally, we introduce InformativeBench, the first benchmark tailored for evaluating LLM agents' task-solving ability under information asymmetry. Experimental results show that iAgents can collaborate within a social network of 140 individuals and 588 relationships, autonomously communicate over 30 turns, and retrieve information from nearly 70,000 messages to complete tasks within 3 minutes.

AAAI Conference 2024 Conference Paper

Decoupling Representation and Knowledge for Few-Shot Intent Classification and Slot Filling

  • Jie Han
  • Yixiong Zou
  • Haozhao Wang
  • Jun Wang
  • Wei Liu
  • Yao Wu
  • Tao Zhang
  • Ruixuan Li

Few-shot intent classification and slot filling are important but challenging tasks due to the scarcity of finely labeled data. Therefore, current works first train a model on source domains with sufficiently labeled data, and then transfer the model to target domains where only rarely labeled data is available. However, experience transferring as a whole usually suffers from gaps that exist among source domains and target domains. For instance, transferring domain-specific-knowledge-related experience is difficult. To tackle this problem, we propose a new method that explicitly decouples the transferring of general-semantic-representation-related experience and the domain-specific-knowledge-related experience. Specifically, for domain-specific-knowledge-related experience, we design two modules to capture intent-slot relation and slot-slot relation respectively. Extensive experiments on Snips and FewJoint datasets show that our method achieves state-of-the-art performance. The method improves the joint accuracy metric from 27.72% to 42.20% in the 1-shot setting, and from 46.54% to 60.79% in the 5-shot setting.

AAAI Conference 2024 Conference Paper

DreamIdentity: Enhanced Editability for Efficient Face-Identity Preserved Image Generation

  • Zhuowei Chen
  • Shancheng Fang
  • Wei Liu
  • Qian He
  • Mengqi Huang
  • Zhendong Mao

While large-scale pre-trained text-to-image models can synthesize diverse and high-quality human-centric images, an intractable problem is how to preserve the face identity and follow the text prompts simultaneously for conditioned input face images and texts. Despite existing encoder-based methods achieving high efficiency and decent face similarity, the generated image often fails to follow the textual prompts. To ease this editability issue, we present DreamIdentity to learn edit-friendly and accurate face-identity representations in the word embedding space. Specifically, we propose self-augmented editability learning to enhance the editability for projected embedding, which is achieved by constructing paired generated celebrity faces and edited celebrity images for training, aiming at transferring the mature editability of off-the-shelf text-to-image models on celebrities to unseen identities. Furthermore, we design a novel dedicated face-identity encoder to learn an accurate representation of human faces, which applies multi-scale ID-aware features followed by a multi-embedding projector to generate the pseudo words in the text embedding space directly. Extensive experiments show that our method can generate more text-coherent and ID-preserved images with negligible time overhead compared to the standard text-to-image generation process.

NeurIPS Conference 2024 Conference Paper

Empowering and Assessing the Utility of Large Language Models in Crop Science

  • Hang Zhang
  • Jiawei Sun
  • Renqi Chen
  • Wei Liu
  • Zhonghang Yuan
  • Xinzhe Zheng
  • Zhefan Wang
  • Zhiyuan Yang

Large language models (LLMs) have demonstrated remarkable efficacy across knowledge-intensive tasks. Nevertheless, their untapped potential in crop science presents an opportunity for advancement. To narrow this gap, we introduce CROP, which includes a novel instruction tuning dataset specifically designed to enhance LLMs’ professional capabilities in the crop science sector, along with a benchmark that serves as a comprehensive evaluation of LLMs’ understanding of the domain knowledge. The CROP dataset is curated through a task-oriented and LLM-human integrated pipeline, comprising 210,038 single-turn and 1,871 multi-turn dialogues related to crop science scenarios. The CROP benchmark includes 5,045 multiple-choice questions covering three difficulty levels. Our experiments based on the CROP benchmark demonstrate notable enhancements in crop science-related tasks when LLMs are fine-tuned with the CROP dataset. To the best of our knowledge, the CROP dataset is the first-ever instruction tuning dataset in the crop science domain. We anticipate that CROP will accelerate the adoption of LLMs in the domain of crop science, ultimately contributing to global food production.

NeurIPS Conference 2024 Conference Paper

Is the MMI Criterion Necessary for Interpretability? Degenerating Non-causal Features to Plain Noise for Self-Rationalization

  • Wei Liu
  • Zhiying Deng
  • Zhongyu Niu
  • Jun Wang
  • Haozhao Wang
  • YuanKai Zhang
  • Ruixuan Li

An important line of research in the field of explainability is to extract a small subset of crucial rationales from the full input. The most widely used criterion for rationale extraction is the maximum mutual information (MMI) criterion. However, in certain datasets, there are spurious features that are non-causally correlated with the label yet attain high mutual information, complicating the loss landscape of MMI. Although some penalty-based methods have been developed to penalize the spurious features (e.g., invariance penalty, intervention penalty, etc.) to help MMI work better, these are merely remedial measures. In the optimization objectives of these methods, spurious features are still distinguished from plain noise, which hinders the discovery of causal rationales. This paper aims to develop a new criterion that treats spurious features as plain noise, allowing the model to work on datasets rich in spurious features as if it were working on clean datasets, thereby making rationale extraction easier. We theoretically observe that removing either plain noise or spurious features from the input does not alter the conditional distribution of the remaining components relative to the task label. However, significant changes in the conditional distribution occur only when causal features are eliminated. Based on this discovery, the paper proposes a criterion for Maximizing the Remaining Discrepancy (MRD). Experiments on six widely used datasets show that our MRD criterion improves rationale quality (measured by the overlap with human-annotated rationales) by up to 10.4% as compared to several recent competitive MMI variants. Code: https://github.com/jugechengzi/Rationalization-MRD.

AAAI Conference 2024 Conference Paper

MathAttack: Attacking Large Language Models towards Math Solving Ability

  • Zihao Zhou
  • Qiufeng Wang
  • Mingyu Jin
  • Jie Yao
  • Jianan Ye
  • Wei Liu
  • Wei Wang
  • Xiaowei Huang

With the boom of Large Language Models (LLMs), the research of solving Math Word Problems (MWPs) has recently made great progress. However, there are few studies examining the robustness of LLMs' math solving ability. Instead of attacking prompts in the use of LLMs, we propose a MathAttack model to attack MWP samples, which is closer to the essence of robustness in solving math problems. Compared to traditional text adversarial attacks, it is essential to preserve the mathematical logic of the original MWPs during the attack. To this end, we propose logical entity recognition to identify logical entities, which are then frozen. Subsequently, the remaining text is attacked by adopting a word-level attacker. Furthermore, we propose a new dataset RobustMath to evaluate the robustness of LLMs in math solving ability. Extensive experiments on our RobustMath and two other math benchmark datasets, GSM8K and MultiArith, show that MathAttack can effectively attack the math solving ability of LLMs. In the experiments, we observe that (1) Our adversarial samples from higher-accuracy LLMs are also effective for attacking LLMs with lower accuracy (e.g., transfer from larger to smaller-size LLMs, or from few-shot to zero-shot prompts); (2) Complex MWPs (such as more solving steps, longer text, more numbers) are more vulnerable to attack; (3) We can improve the robustness of LLMs by using our adversarial samples in few-shot prompts. Finally, we hope our practice and observations can serve as an important attempt towards enhancing the robustness of LLMs in math solving ability. The code and dataset are available at: https://github.com/zhouzihao501/MathAttack.
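The freeze-then-attack idea can be sketched minimally: mask the logical entities so a word-level attacker cannot touch them, perturb the remaining text, then restore the frozen values. Here only numeric tokens are frozen; the paper's logical entity recognition is presumably broader, and the placeholder format is an assumption.

```python
import re

def freeze_numbers(text):
    """Replace each number with a placeholder so a word-level attacker
    cannot perturb it; returns the masked text and the frozen values."""
    frozen = []
    def stash(m):
        frozen.append(m.group(0))
        return f"<NUM{len(frozen) - 1}>"
    return re.sub(r"\d+(?:\.\d+)?", stash, text), frozen

def thaw(text, frozen):
    """Put the frozen values back after the attack."""
    for i, v in enumerate(frozen):
        text = text.replace(f"<NUM{i}>", v)
    return text

mwp = "Tom has 3 apples and buys 12 more."
masked, vals = freeze_numbers(mwp)
# ... a word-level attacker would perturb only the unfrozen words here ...
restored = thaw(masked, vals)
```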

NeurIPS Conference 2024 Conference Paper

OPUS: Occupancy Prediction Using a Sparse Set

  • Jiabao Wang
  • Zhaojiang Liu
  • Qiang Meng
  • Liujiang Yan
  • Ke Wang
  • Jie Yang
  • Wei Liu
  • Qibin Hou

Occupancy prediction, aiming at predicting the occupancy status within a voxelized 3D environment, is quickly gaining momentum within the autonomous driving community. Mainstream occupancy prediction works first discretize the 3D environment into voxels, then perform classification on such dense grids. However, inspection of sample data reveals that the vast majority of voxels are unoccupied. Performing classification on these empty voxels leads to suboptimal allocation of computation resources, and reducing such empty voxels necessitates complex algorithm designs. To this end, we present a novel perspective on the occupancy prediction task: formulating it as a streamlined set prediction paradigm without the need for explicit space modeling or complex sparsification procedures. Our proposed framework, called OPUS, utilizes a transformer encoder-decoder architecture to simultaneously predict occupied locations and classes using a set of learnable queries. Firstly, we employ the Chamfer distance loss to scale the set-to-set comparison problem to unprecedented magnitudes, making it possible to train such a model end-to-end. Subsequently, semantic classes are adaptively assigned using nearest neighbor search based on the learned locations. In addition, OPUS incorporates a suite of non-trivial strategies to enhance model performance, including coarse-to-fine learning, consistent point sampling, and adaptive re-weighting. Finally, compared with current state-of-the-art methods, our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at nearly 2× the FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
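The Chamfer distance that makes the set-to-set comparison tractable matches each point to its nearest neighbor in the other set, so no explicit one-to-one assignment ever has to be solved. A minimal sketch of the symmetric squared-L2 form (a generic illustration, not the paper's implementation):

```python
import numpy as np

def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance between two point sets (squared-L2 form)."""
    d2 = ((pred[:, None, :] - gt[None, :, :]) ** 2).sum(-1)  # (N, M) pairwise
    # Nearest ground-truth point for every prediction, and vice versa.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
gt   = np.array([[0.0, 0.0, 0.0], [1.0, 0.1, 0.0]])
loss = chamfer_distance(pred, gt)
```

Because the loss is built from nearest-neighbor minima rather than a bipartite matching, it scales to very large query sets, which is what enables the end-to-end training the abstract describes.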

JBHI Journal 2024 Journal Article

scDMAE: A Generative Denoising Model Adopted Mask Strategy for scRNA-Seq Data Recovery

  • Wei Liu
  • Youze Pan
  • Zhijie Teng
  • Junlin Xu

The advent of single-cell RNA sequencing (scRNA-seq) technology has revolutionized gene expression studies at the single-cell level. However, the presence of technical noise and data sparsity in scRNA-seq often undermines the accuracy of subsequent analyses. Existing methods for denoising and imputing scRNA-seq data often rely on stringent assumptions about data distribution, limiting the effectiveness of data recovery. In this study, we propose the scDMAE model for denoising and recovery of scRNA-seq data. First, the model fuses gene expression features and topological features to discern the primary expression patterns of genes in cells. Then, an autoencoder with a masking strategy is used to model dropout events and separate potential noise in the data. Finally, the model incorporates the original raw data to recover the true biological expression value. By conducting experiments on various types of scRNA-Seq datasets, scDMAE demonstrates superior performance compared to other comparative methods based on six distinct evaluation metrics in downstream analysis. The scDMAE method can accurately cluster similar cell populations, identify differential genes and infer cell trajectories.

AAAI Conference 2024 Conference Paper

SeqGPT: An Out-of-the-Box Large Language Model for Open Domain Sequence Understanding

  • Tianyu Yu
  • Chengyue Jiang
  • Chao Lou
  • Shen Huang
  • Xiaobin Wang
  • Wei Liu
  • Jiong Cai
  • Yangning Li

Large language models (LLMs) have shown impressive abilities for open-domain NLP tasks. However, LLMs are sometimes too footloose for natural language understanding (NLU) tasks which always have restricted output and input format. Their performances on NLU tasks are highly related to prompts or demonstrations and are shown to be poor at performing several representative NLU tasks, such as event extraction and entity typing. To this end, we present SeqGPT, a bilingual (i.e., English and Chinese) open-source autoregressive model specially enhanced for open-domain natural language understanding. We express all NLU tasks with two atomic tasks, which define fixed instructions to restrict the input and output format but still ``open'' for arbitrarily varied label sets. The model is first instruction-tuned with extremely fine-grained labeled data synthesized by ChatGPT and then further fine-tuned by 233 different atomic tasks from 152 datasets across various domains. The experimental results show that SeqGPT has decent classification and extraction ability, and is capable of performing language understanding tasks on unseen domains. We also conduct empirical studies on the scaling of data and model size as well as on the transfer across tasks. Our models are accessible at https://github.com/Alibaba-NLP/SeqGPT.

AAAI Conference 2024 Conference Paper

SoftCLIP: Softer Cross-Modal Alignment Makes CLIP Stronger

  • Yuting Gao
  • Jinfeng Liu
  • Zihan Xu
  • Tong Wu
  • Enwei Zhang
  • Ke Li
  • Jie Yang
  • Wei Liu

During the preceding biennium, vision-language pre-training has achieved noteworthy success on several downstream tasks. Nevertheless, acquiring high-quality image-text pairs, where the pairs are entirely exclusive of each other, remains a challenging task, and noise exists in the commonly used datasets. To address this issue, we propose SoftCLIP, a novel approach that relaxes the strict one-to-one constraint and achieves a soft cross-modal alignment by introducing a softened target, which is generated from the fine-grained intra-modal self-similarity. The intra-modal guidance is indicative, enabling two pairs to share some local similarities and modeling many-to-many relationships between the two modalities. Besides, since the positive still dominates in the softened target distribution, we disentangle the negatives in the distribution to further boost the relation alignment with the negatives in cross-modal learning. Extensive experiments demonstrate the effectiveness of SoftCLIP. In particular, on the ImageNet zero-shot classification task, using CC3M/CC12M as the pre-training dataset, SoftCLIP brings a top-1 accuracy improvement of 6.8%/7.2% over the CLIP baseline.

NeurIPS Conference 2024 Conference Paper

United We Stand, Divided We Fall: Fingerprinting Deep Neural Networks via Adversarial Trajectories

  • Tianlong Xu
  • Chen Wang
  • Gaoyang Liu
  • Yang Yang
  • Kai Peng
  • Wei Liu

In recent years, deep neural networks (DNNs) have witnessed extensive applications, and protecting their intellectual property (IP) is thus crucial. As a non-invasive way for model IP protection, model fingerprinting has become popular. However, existing single-point based fingerprinting methods are highly sensitive to changes in the decision boundary, and may suffer from misjudging the resemblance of sparse fingerprints, yielding high false positives on innocent models. In this paper, we propose ADV-TRA, a more robust fingerprinting scheme that utilizes adversarial trajectories to verify the ownership of DNN models. Benefiting from its intrinsically progressive adversarial levels, the trajectory is capable of tolerating a greater degree of alteration in decision boundaries. We further design novel schemes to generate a surface trajectory that involves a series of fixed-length trajectories with dynamically adjusted step sizes. Such a design enables more unique and reliable fingerprinting with relatively low querying costs. Experiments on three datasets against four types of removal attacks show that ADV-TRA exhibits superior performance in distinguishing between infringing and innocent models, outperforming the state-of-the-art comparisons.

NeurIPS Conference 2024 Conference Paper

Unlearnable 3D Point Clouds: Class-wise Transformation Is All You Need

  • Xianlong Wang
  • Minghui Li
  • Wei Liu
  • Hangtao Zhang
  • Shengshan Hu
  • Yechao Zhang
  • Ziqi Zhou
  • Hai Jin

Traditional unlearnable strategies have been proposed to prevent unauthorized users from training on 2D image data. With more 3D point cloud data containing sensitive information, unauthorized usage of this new type of data has also become a serious concern. To address this, we propose the first integral unlearnable framework for 3D point clouds, comprising two processes: (i) an unlearnable data protection scheme, involving a class-wise setting established by a category-adaptive allocation strategy and multi-transformations assigned to samples; (ii) a data restoration scheme that utilizes class-wise inverse matrix transformation, thus enabling authorized-only training on unlearnable data. This restoration process addresses a practical issue overlooked in most existing unlearnable literature, i.e., that even authorized users struggle to gain knowledge from 3D unlearnable data. Both theoretical and empirical results (including 6 datasets, 16 models, and 2 tasks) demonstrate the effectiveness of our proposed unlearnable framework. Our code is available at https://github.com/CGCL-codes/UnlearnablePC.

AAAI Conference 2023 Conference Paper

Adjective Scale Probe: Can Language Models Encode Formal Semantics Information?

  • Wei Liu
  • Ming Xiang
  • Nai Ding

It is an open question what semantic representations transformer-based language models can encode and whether they have access to more abstract aspects of semantic meaning. Here, we propose a diagnostic dataset to investigate how well language models understand the degree semantics of adjectives. In the dataset, referred to as the Adjective Scale Probe (ASP), we semi-automatically generate 8 tests of Natural Language Inference (NLI) questions to test 8 key capabilities of adjective interpretation. We apply the ASP dataset to evaluate the performance of 3 language models, i.e., BERT, DeBERTa, and T0. It is found that language models perform below the majority baseline for most tests of the ASP, even when the models have been fine-tuned to achieve high performance on the large-scale MNLI dataset. But after we fine-tune the pre-trained models on a subset of the ASP, DeBERTa can achieve high performance on untrained adjectives and untrained tests, suggesting that DeBERTa may have captured degree semantic information of adjectives through pre-training but needs specific training data to learn how to apply such information to the current tasks. In sum, the ASP provides an easy-to-use method to test fine-grained formal semantic properties of adjectives, and reveals language models' abilities to access formal semantic information.

JBHI Journal 2023 Journal Article

An Enhanced EEG Microstate Recognition Framework Based on Deep Neural Networks: An Application to Parkinson's Disease

  • Chunguang Chu
  • Zhen Zhang
  • Zhenxi Song
  • Zifan Xu
  • Jiang Wang
  • Fei Wang
  • Wei Liu
  • Liying Lu

Variations in brain activity patterns reveal impairments of motor and cognitive functions in the human brain. Electroencephalogram (EEG) microstates embody brain activity patterns at a microscopic time scale. However, current microstate analysis methods can recognize less than 90% of EEG signals per subject, which severely limits the characterization of dynamic brain activity. As an application to early Parkinson's disease (PD), we propose an enhanced EEG microstate recognition framework based on deep neural networks, which yields recognition rates from 90% to 99%, accompanied by a strong anti-artifact property. Additionally, gradient-weighted class activation mapping, as a visualization technique, is employed to locate the activated functional brain regions of each microstate class. We find that each microstate class corresponds to a particular activated brain region. Finally, based on the improved identification of microstate sequences, we explore the EEG microstate characteristics and their clinical associations. We show that the decreased occurrence of a particular microstate class reflects the degree of cognitive decline in early PD, and that reduced transitions between certain microstates suggest injury in motor-related brain regions. The novel EEG microstate recognition framework paves the way to revealing more effective biomarkers for early PD.

JMLR Journal 2023 Journal Article

An Inexact Augmented Lagrangian Algorithm for Training Leaky ReLU Neural Network with Group Sparsity

  • Wei Liu
  • Xin Liu
  • Xiaojun Chen

The leaky ReLU network with a group sparse regularization term has been widely used in recent years. However, training such a network yields a nonsmooth nonconvex optimization problem, and approaches that deterministically compute a stationary point have been lacking. In this paper, we first resolve the multi-layer composite term in the original optimization problem by introducing auxiliary variables and additional constraints. We show that the new model has a nonempty and bounded solution set and that its feasible set satisfies the Mangasarian-Fromovitz constraint qualification. Moreover, we show the relationship between the new model and the original problem. Remarkably, we propose an inexact augmented Lagrangian algorithm for solving the new model, and show the convergence of the algorithm to a KKT point. Numerical experiments demonstrate that our algorithm is more efficient for training sparse leaky ReLU neural networks than some well-known algorithms.

AAAI Conference 2023 Conference Paper

CFFT-GAN: Cross-Domain Feature Fusion Transformer for Exemplar-Based Image Translation

  • Tianxiang Ma
  • Bingchuan Li
  • Wei Liu
  • Miao Hua
  • Jing Dong
  • Tieniu Tan

Exemplar-based image translation refers to the task of generating images with a desired style while conditioning on a certain input image. Most current methods learn the correspondence between two input domains but neglect the mining of information within each domain. In this paper, we propose a more general learning approach that considers the two domains' features as a whole and learns both inter-domain correspondence and intra-domain potential information interactions. Specifically, we propose a Cross-domain Feature Fusion Transformer (CFFT) to learn inter- and intra-domain feature fusion. Based on CFFT, the proposed CFFT-GAN works well on exemplar-based image translation. Moreover, CFFT-GAN is able to decouple and fuse features from multiple domains by cascading CFFT modules. We conduct rich quantitative and qualitative experiments on several image translation tasks, and the results demonstrate the superiority of our approach compared to state-of-the-art methods. Ablation studies show the importance of our proposed CFFT. Application experimental results reflect the potential of our method.

NeurIPS Conference 2023 Conference Paper

D-Separation for Causal Self-Explanation

  • Wei Liu
  • Jun Wang
  • Haozhao Wang
  • Ruixuan Li
  • Zhiying Deng
  • YuanKai Zhang
  • Yang Qiu

Rationalization aims to strengthen the interpretability of NLP models by extracting a subset of human-intelligible pieces of their input texts. Conventional works generally employ the maximum mutual information (MMI) criterion to find the rationale that is most indicative of the target label. However, this criterion can be influenced by spurious features that correlate with the causal rationale or the target label. Instead of attempting to rectify the issues of the MMI criterion, we propose a novel criterion to uncover the causal rationale, termed the Minimum Conditional Dependence (MCD) criterion, which is grounded on our finding that the non-causal features and the target label are \emph{d-separated} by the causal rationale. By minimizing the dependence between the non-selected parts of the input and the target label conditioned on the selected rationale candidate, all the causes of the label are compelled to be selected. In this study, we employ a simple and practical measure for dependence, specifically the KL-divergence, to validate our proposed MCD criterion. Empirically, we demonstrate that MCD improves the F1 score by up to 13.7% compared to previous state-of-the-art MMI-based methods. Our code is in an anonymous repository: https://anonymous.4open.science/r/MCD-CE88.

AAAI Conference 2023 Conference Paper

DrugOOD: Out-of-Distribution Dataset Curator and Benchmark for AI-Aided Drug Discovery – a Focus on Affinity Prediction Problems with Noise Annotations

  • Yuanfeng Ji
  • Lu Zhang
  • Jiaxiang Wu
  • Bingzhe Wu
  • Lanqing Li
  • Long-Kai Huang
  • Tingyang Xu
  • Yu Rong

AI-aided drug discovery (AIDD) is gaining popularity due to its potential to make the search for new pharmaceuticals faster, less expensive, and more effective. Despite its extensive use in numerous fields (e.g., ADMET prediction, virtual screening), little research has been conducted on the out-of-distribution (OOD) learning problem with noise. We present DrugOOD, a systematic OOD dataset curator and benchmark for AIDD. Particularly, we focus on the drug-target binding affinity prediction problem, which involves both macromolecule (protein target) and small-molecule (drug compound). DrugOOD offers an automated dataset curator with user-friendly customization scripts, rich domain annotations aligned with biochemistry knowledge, realistic noise level annotations, and rigorous benchmarking of SOTA OOD algorithms, as opposed to only providing fixed datasets. Since the molecular data is often modeled as irregular graphs using graph neural network (GNN) backbones, DrugOOD also serves as a valuable testbed for graph OOD learning problems. Extensive empirical studies have revealed a significant performance gap between in-distribution and out-of-distribution experiments, emphasizing the need for the development of more effective schemes that permit OOD generalization under noise for AIDD.

NeurIPS Conference 2023 Conference Paper

Evaluating Post-hoc Explanations for Graph Neural Networks via Robustness Analysis

  • Junfeng Fang
  • Wei Liu
  • Yuan Gao
  • Zemin Liu
  • An Zhang
  • Xiang Wang
  • Xiangnan He

This work studies the evaluation of explaining graph neural networks (GNNs), which is crucial to the credibility of post-hoc explainability in practical usage. Conventional evaluation metrics, and even explanation methods -- which mainly follow the paradigm of feeding the explanatory subgraph and measuring output difference -- always suffer from the notorious out-of-distribution (OOD) issue. In this work, we endeavor to confront the issue by introducing a novel evaluation metric, termed OOD-resistant Adversarial Robustness (OAR). Specifically, we draw inspiration from the notion of adversarial robustness and evaluate post-hoc explanation subgraphs by calculating their robustness under attack. On top of that, an elaborate OOD reweighting block is inserted into the pipeline to confine the evaluation process to the original data distribution. For applications involving large datasets, we further devise a Simplified version of OAR (SimOAR), which achieves a significant improvement in computational efficiency at the cost of a small amount of performance. Extensive empirical studies validate the effectiveness of our OAR and SimOAR.

NeurIPS Conference 2023 Conference Paper

Exploiting Contextual Objects and Relations for 3D Visual Grounding

  • Li Yang
  • Chunfeng Yuan
  • Ziqi Zhang
  • Zhongang Qi
  • Yan Xu
  • Wei Liu
  • Ying Shan
  • Bing Li

3D visual grounding, the task of identifying visual objects in 3D scenes based on natural language inputs, plays a critical role in enabling machines to understand and engage with the real-world environment. However, this task is challenging due to the necessity to capture 3D contextual information to distinguish target objects from complex 3D scenes. The absence of annotations for contextual objects and relations further exacerbates the difficulties. In this paper, we propose a novel model, CORE-3DVG, to address these challenges by explicitly learning about contextual objects and relations. Our method accomplishes 3D visual grounding via three sequential modular networks, including a text-guided object detection network, a relation matching network, and a target identification network. During training, we introduce a pseudo-label self-generation strategy and a weakly-supervised method to facilitate the learning of contextual objects and relations, respectively. The proposed techniques allow the networks to focus more effectively on referred objects within 3D scenes by understanding their context better. We validate our model on the challenging Nr3D, Sr3D, and ScanRefer datasets and demonstrate state-of-the-art performance. Our code will be public at https://github.com/yangli18/CORE-3DVG.

AAAI Conference 2023 Short Paper

Fraud’s Bargain Attacks to Textual Classifiers via Metropolis-Hasting Sampling (Student Abstract)

  • Mingze Ni
  • Zhensu Sun
  • Wei Liu

Recent studies on adversarial examples expose vulnerabilities of natural language processing (NLP) models. Existing techniques for generating adversarial examples are typically driven by deterministic heuristic rules that are agnostic to the optimal adversarial examples, a strategy that often results in attack failures. To this end, this research proposes the Fraud's Bargain Attack (FBA), which utilizes a novel randomization mechanism to enlarge the search space and enables high-quality adversarial examples to be generated with high probability. FBA applies the Metropolis-Hastings algorithm to enhance the selection of adversarial examples from all candidates proposed by a customized Word Manipulation Process (WMP). WMP perturbs one word at a time via insertion, removal, or substitution in a context-aware manner. Extensive experiments demonstrate that FBA outperforms the baselines in terms of attack success rate and imperceptibility.

IJCAI Conference 2023 Conference Paper

HDFormer: High-order Directed Transformer for 3D Human Pose Estimation

  • Hanyuan Chen
  • Jun-Yan He
  • Wangmeng Xiang
  • Zhi-Qi Cheng
  • Wei Liu
  • Hanbing Liu
  • Bin Luo
  • Yifeng Geng

Human pose estimation is a challenging task due to its structured data sequence nature. Existing methods primarily focus on pair-wise interaction of body joints, which is insufficient for scenarios involving overlapping joints and rapidly changing poses. To overcome these issues, we introduce a novel approach, the High-order Directed Transformer (HDFormer), which leverages high-order bone and joint relationships for improved pose estimation. Specifically, HDFormer incorporates both self-attention and high-order attention to formulate a multi-order attention module. This module facilitates first-order "joint-joint", second-order "bone-joint", and high-order "hyperbone-joint" interactions, effectively addressing issues in complex and occlusion-heavy situations. In addition, modern CNN techniques are integrated into the transformer-based architecture, balancing the trade-off between performance and efficiency. HDFormer significantly outperforms state-of-the-art (SOTA) models on the Human3.6M and MPI-INF-3DHP datasets, requiring only 1/10 of the parameters and significantly lower computational costs. Moreover, HDFormer demonstrates broad real-world applicability, enabling real-time, accurate 3D pose estimation. The source code is in https://github.com/hyer/HDFormer.

AAAI Conference 2023 Conference Paper

PointCA: Evaluating the Robustness of 3D Point Cloud Completion Models against Adversarial Examples

  • Shengshan Hu
  • Junwei Zhang
  • Wei Liu
  • Junhui Hou
  • Minghui Li
  • Leo Yu Zhang
  • Hai Jin
  • Lichao Sun

Point cloud completion, as the upstream procedure of 3D recognition and segmentation, has become an essential part of many tasks such as navigation and scene understanding. While various point cloud completion models have demonstrated their powerful capabilities, their robustness against adversarial attacks, which have been proven to be fatally malicious towards deep neural networks, remains unknown. In addition, existing attack approaches towards point cloud classifiers cannot be applied to the completion models due to different output forms and attack purposes. In order to evaluate the robustness of the completion models, we propose PointCA, the first adversarial attack against 3D point cloud completion models. PointCA can generate adversarial point clouds that maintain high similarity with the original ones, while being completed as another object with totally different semantic information. Specifically, we minimize the representation discrepancy between the adversarial example and the target point set to jointly explore the adversarial point clouds in the geometry space and the feature space. Furthermore, to launch a stealthier attack, we innovatively employ the neighbourhood density information to tailor the perturbation constraint, leading to geometry-aware and distribution-adaptive modifications for each point. Extensive experiments against different premier point cloud completion networks show that PointCA can cause the performance degradation from 77.9% to 16.7%, with the structure chamfer distance kept below 0.01. We conclude that existing completion models are severely vulnerable to adversarial examples, and state-of-the-art defenses for point cloud classification will be partially invalid when applied to incomplete and uneven point cloud data.

JBHI Journal 2023 Journal Article

Predicting CircRNA-Disease Associations via Feature Convolution Learning With Heterogeneous Graph Attention Network

  • Li Peng
  • Cheng Yang
  • Yifan Chen
  • Wei Liu

Exploring the relationship between circular RNA (circRNA) and disease is beneficial for revealing the mechanisms of disease pathogenesis. However, a blind search for all possible associations between circRNAs and diseases through biological experiments is time-consuming. Although some prediction methods have been proposed, they still have limitations. In this study, a novel computational framework, called GATCL2CD, is proposed to forecast unknown circRNA-disease associations (CDAs). First, we calculate Gaussian interactive profile kernel (GIP) similarity and semantic similarity for diseases, circRNA sequence similarity and function similarity, and GIPs for circRNAs. Then, we combine them to construct a heterogeneous graph. Thereafter, GATCL2CD proposes a feature convolution learning framework that uses a multi-head dynamic attention mechanism to obtain different aggregated representations of features corresponding to the nodes in the heterogeneous graph. It then extracts rich higher-order features from the stacked feature representations of each node using a single-layer convolutional neural network with filter kernels of different sizes. Finally, a pairwise element-wise product operation is implemented to capture the interactions of higher-order feature representations, and a multilayer perceptron neural network is introduced as an efficient classifier for inferring potential CDAs. Major experimental results under 5-fold cross-validation (5-fold CV) on three different datasets show that GATCL2CD is superior to five other state-of-the-art methods. Furthermore, case studies demonstrate the suitability of GATCL2CD as a useful tool for identifying potential disease-related circRNAs.

NeurIPS Conference 2023 Conference Paper

Punctuation-level Attack: Single-shot and Single Punctuation Can Fool Text Models

  • Wenqiang Wang
  • Chongyang Du
  • Tao Wang
  • Kaihao Zhang
  • Wenhan Luo
  • Lin Ma
  • Wei Liu
  • Xiaochun Cao

Adversarial attacks have attracted increasing attention in various fields, including natural language processing. Current textual attacking models primarily focus on fooling models by adding character-/word-/sentence-level perturbations, ignoring their influence on human perception. In this paper, for the first time in the community, we propose a novel mode of textual attack, the punctuation-level attack. With various types of perturbations, including insertion, displacement, deletion, and replacement, the punctuation-level attack achieves promising fooling rates against SOTA models on typical textual tasks while maintaining minimal influence on human perception and understanding of the text, by perturbing merely a single punctuation mark in a single shot. Furthermore, we propose a search method named Text Position Punctuation Embedding and Paraphrase (TPPEP) to accelerate the pursuit of the optimal position to deploy the attack, without exhaustive search, and we present a mathematical interpretation of TPPEP. Thanks to the integrated Text Position Punctuation Embedding (TPPE), the punctuation attack can be applied at a constant cost of time. Experimental results on public datasets and SOTA models demonstrate the effectiveness of the punctuation attack and the proposed TPPE. We additionally apply the single punctuation attack to summarization, semantic-similarity-scoring, and text-to-image tasks, and achieve encouraging results.

AAAI Conference 2023 Conference Paper

ReGANIE: Rectifying GAN Inversion Errors for Accurate Real Image Editing

  • Bingchuan Li
  • Tianxiang Ma
  • Peng Zhang
  • Miao Hua
  • Wei Liu
  • Qian He
  • Zili Yi

The StyleGAN family succeeds in high-fidelity image generation and allows for flexible and plausible editing of generated images by manipulating the semantic-rich latent style space. However, projecting a real image into its latent space encounters an inherent trade-off between inversion quality and editability. Existing encoder-based or optimization-based StyleGAN inversion methods attempt to mitigate the trade-off but see limited performance. To fundamentally resolve this problem, we propose a novel two-phase framework that designates two separate networks to tackle editing and reconstruction respectively, instead of balancing the two. Specifically, in Phase I, a W-space-oriented StyleGAN inversion network is trained and used to perform image inversion and editing, which assures editability but sacrifices reconstruction quality. In Phase II, a carefully designed rectifying network is utilized to rectify the inversion errors and perform ideal reconstruction. Experimental results show that our approach yields near-perfect reconstructions without sacrificing editability, thus allowing accurate manipulation of real images. Further, we evaluate the performance of our rectifying network and observe great generalizability towards unseen manipulation types and out-of-domain images.

AAAI Conference 2023 Conference Paper

Safe Multi-View Deep Classification

  • Wei Liu
  • Yufei Chen
  • Xiaodong Yue
  • Changqing Zhang
  • Shaorong Xie

Multi-view deep classification is expected to obtain better classification performance than using a single view. However, due to the uncertainty and inconsistency of data sources, adding data views does not necessarily lead to performance improvements in multi-view classification. How to avoid worsening classification performance when adding views is crucial for multi-view deep learning but rarely studied. To tackle this limitation, in this paper, we reformulate the multi-view classification problem from the perspective of safe learning and thereby propose a Safe Multi-view Deep Classification (SMDC) method, which can guarantee that the classification performance does not deteriorate when fusing multiple views. In the SMDC method, we dynamically integrate multiple views and estimate the inherent uncertainties among them, with different root causes, based on evidence theory. By minimizing these uncertainties, SMDC promotes the evidence from data views for correct classification and meanwhile excludes incorrect evidence to produce safe multi-view classification results. Furthermore, we theoretically prove that in safe multi-view classification, adding data views will certainly not increase the empirical risk of classification. Experiments on various kinds of multi-view datasets validate that the proposed SMDC method can achieve precise and safe classification results.

AAAI Conference 2023 Short Paper

Summarization Attack via Paraphrasing (Student Abstract)

  • Jiyao Li
  • Wei Liu

Many natural language processing models are perceived to be fragile under adversarial attacks. Recent work on adversarial attacks has demonstrated a high success rate against sentiment analysis and classification models. However, attacks on summarization models have not been well studied. Summarization tasks are rarely influenced by word substitution, since advanced abstractive summarization models utilize sentence-level information. In this paper, we propose a paraphrasing-based attack method against summarization models. We first rank the sentences in the document according to their impact on summarization. Then, we apply a paraphrasing procedure to generate adversarial samples. Finally, we test our algorithm on benchmark datasets against other methods. Our approach achieved the highest success rate and the lowest sentence substitution rate. In addition, the adversarial samples have high semantic similarity with the original sentences.

AAAI Conference 2023 Conference Paper

Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task

  • Stan Weixian Lei
  • Difei Gao
  • Jay Zhangjie Wu
  • Yuxuan Wang
  • Wei Liu
  • Mengmi Zhang
  • Mike Zheng Shou

VQA is an ambitious task aiming to answer any image-related question. However, in reality, it is hard to build such a system once and for all, since the needs of users are continuously updated and the system has to implement new functions. Thus, Continual Learning (CL) ability is a must in developing advanced VQA systems. Recently, a pioneering work split a VQA dataset into disjoint answer sets to study this topic. However, CL on VQA involves not only the expansion of label sets (new Answer sets). It is crucial to study how to answer questions when deploying VQA systems to new environments (new Visual scenes) and how to answer questions requiring new functions (new Question types). Thus, we propose CLOVE, a benchmark for Continual Learning On Visual quEstion answering, which contains scene- and function-incremental settings for the two aforementioned CL scenarios. In terms of methodology, the main difference between CL on VQA and classification is that the former additionally involves expanding and preventing forgetting of reasoning mechanisms, while the latter focuses on class representation. Thus, we propose a real-data-free replay-based method tailored for CL on VQA, named Scene Graph as Prompt for Symbolic Replay. Using a piece of scene graph as a prompt, it replays pseudo scene graphs to represent past images, along with correlated QA pairs. A unified VQA model is also proposed to utilize the current and replayed data to enhance its QA ability. Finally, experimental results reveal challenges in CLOVE and demonstrate the effectiveness of our method. Code and data are available at https://github.com/showlab/CLVQA.

AAAI Conference 2023 Conference Paper

Towards In-Distribution Compatible Out-of-Distribution Detection

  • Boxi Wu
  • Jie Jiang
  • Haidong Ren
  • Zifan Du
  • Wenxiao Wang
  • Zhifeng Li
  • Deng Cai
  • Xiaofei He

Deep neural networks, despite their remarkable capability of discriminating targeted in-distribution samples, show poor performance on detecting anomalous out-of-distribution data. To address this defect, state-of-the-art solutions choose to train deep networks on an auxiliary dataset of outliers. Various training criteria for these auxiliary outliers have been proposed based on heuristic intuitions. However, we find that these intuitively designed outlier training criteria can hurt in-distribution learning and eventually lead to inferior performance. To this end, we identify three causes of the in-distribution incompatibility: contradictory gradient, false likelihood, and distribution shift. Based on our new understandings, we propose a new out-of-distribution detection method by adapting both the top design of deep models and the loss function. Our method achieves in-distribution compatibility by pursuing less interference with the probabilistic characteristics of in-distribution features. On several benchmarks, our method not only achieves state-of-the-art out-of-distribution detection performance but also improves in-distribution accuracy.

AAAI Conference 2023 Conference Paper

Trusted Fine-Grained Image Classification through Hierarchical Evidence Fusion

  • Zhikang Xu
  • Xiaodong Yue
  • Ying Lv
  • Wei Liu
  • Zihao Li

Fine-Grained Image Classification (FGIC) aims to classify images into specific subordinate classes of a superclass. Due to insufficient training data and confusing data samples, FGIC may produce uncertain classification results that cannot be trusted for data applications. In fact, FGIC can be viewed as a hierarchical classification process, and the multilayer information helps to reduce uncertainty and improve the reliability of FGIC. In this paper, we adopt evidence theory to measure uncertainty and confidence in the hierarchical classification process and propose a trusted FGIC method through fusing multilayer classification evidence. Compared with traditional approaches, the trusted FGIC method not only generates accurate classification results but also reduces the uncertainty of fine-grained classification. Specifically, we construct an evidence extractor at each classification layer to extract multilayer (multi-grained) evidence for image classification. To fuse the extracted multi-grained evidence from coarse to fine, we formulate evidence fusion with the Dirichlet hyper probability distribution and thereby hierarchically decompose the evidence of coarse-grained classes into fine-grained classes to enhance classification performance. Ablation experiments validate that the hierarchical evidence fusion can improve the precision and reduce the uncertainty of fine-grained classification. Comparison with state-of-the-art FGIC methods shows that our proposed method achieves competitive performance.

ICML Conference 2022 Conference Paper

Constrained Variational Policy Optimization for Safe Reinforcement Learning

  • Zuxin Liu
  • Zhepeng Cen
  • Vladislav Isenbaev
  • Wei Liu
  • Zhiwei Steven Wu
  • Bo Li 0026
  • Ding Zhao

Safe reinforcement learning (RL) aims to learn policies that satisfy certain constraints before deploying them to safety-critical applications. Previous primal-dual style approaches suffer from instability issues and lack optimality guarantees. This paper overcomes the issues from the perspective of probabilistic inference. We introduce a novel Expectation-Maximization approach to naturally incorporate constraints during the policy learning: 1) a provable optimal non-parametric variational distribution could be computed in closed form after a convex optimization (E-step); 2) the policy parameter is improved within the trust region based on the optimal variational distribution (M-step). The proposed algorithm decomposes the safe RL problem into a convex optimization phase and a supervised learning phase, which yields a more stable training performance. A wide range of experiments on continuous robotic tasks shows that the proposed method achieves significantly better constraint satisfaction performance and better sample efficiency than baselines. The code is available at https://github.com/liuzuxin/cvpo-safe-rl.

NeurIPS Conference 2022 Conference Paper

Egocentric Video-Language Pretraining

  • Kevin Qinghong Lin
  • Jinpeng Wang
  • Mattia Soldan
  • Michael Wray
  • Rui Yan
  • Eric Z. XU
  • Difei Gao
  • Rong-Cheng Tu

Video-Language Pretraining (VLP), which aims to learn transferable representation to advance a wide range of video-text downstream tasks, has recently received increasing attention. Best performing works rely on large-scale, 3rd-person video-text datasets, such as HowTo100M. In this work, we exploit the recently released Ego4D dataset to pioneer Egocentric VLP along three directions. (i) We create EgoClip, a 1st-person video-text pretraining dataset comprising 3.8M clip-text pairs well-chosen from Ego4D, covering a large variety of human daily activities. (ii) We propose a novel pretraining objective, dubbed EgoNCE, which adapts video-text contrastive learning to the egocentric domain by mining egocentric-aware positive and negative samples. (iii) We introduce EgoMCQ, a development benchmark that is close to EgoClip and hence can support effective validation and fast exploration of our design decisions in EgoClip and EgoNCE. Furthermore, we demonstrate strong performance on five egocentric downstream tasks across three datasets: video-text retrieval on EPIC-KITCHENS-100; action recognition on Charades-Ego; natural language query, moment query, and object state change classification on Ego4D challenge benchmarks. The dataset and code are available at https://github.com/showlab/EgoVLP.

AAAI Conference 2022 Conference Paper

Fast and Constrained Absent Keyphrase Generation by Prompt-Based Learning

  • Huanqin Wu
  • Baijiaxin Ma
  • Wei Liu
  • Tao Chen
  • Dan Nie

Generating absent keyphrases, which do not appear in the input document, is challenging in the keyphrase prediction task. Most previous works treat the problem as an autoregressive sequence-to-sequence generation task, which demonstrates promising results for generating grammatically correct and fluent absent keyphrases. However, such an end-to-end, fully data-driven process is unconstrained and prone to generating keyphrases inconsistent with the input document. In addition, the existing autoregressive decoding method forces keyphrases to be generated from left to right, leading to slow inference. In this paper, we propose a constrained absent keyphrase generation method in a prompt-based learning fashion. Specifically, a prompt is first created based on the keywords, which are defined as the words overlapping between an absent keyphrase and the document. Then, a mask-predict decoder is used to complete the absent keyphrase under the constraint of the prompt. Experiments on keyphrase generation benchmarks have demonstrated the effectiveness of our approach. In addition, we evaluate the performance of constrained absent keyphrase generation from an information retrieval perspective. The results show that our approach can generate more consistent keyphrases, which improve document retrieval performance. What's more, with a non-autoregressive decoding manner, our model speeds up absent keyphrase generation by 8.67× compared with the autoregressive method.

NeurIPS Conference 2022 Conference Paper

FR: Folded Rationalization with a Unified Encoder

  • Wei Liu
  • Haozhao Wang
  • Jun Wang
  • Ruixuan Li
  • Chao Yue
  • YuanKai Zhang

Rationalization aims to strengthen the interpretability of NLP models by extracting a subset of human-intelligible pieces from their input texts. Conventional works generally employ a two-phase model in which a generator selects the most important pieces, followed by a predictor that makes predictions based on the selected pieces. However, such a two-phase model may incur the degeneration problem, where the predictor overfits to the noise generated by a not-yet-well-trained generator and, in turn, leads the generator to converge to a suboptimal model that tends to select senseless pieces. To tackle this challenge, we propose Folded Rationalization (FR), which folds the two phases of the rationale model into one from the perspective of text semantic extraction. The key idea of FR is to employ a unified encoder between the generator and predictor, based on which FR can facilitate a better predictor through access to valuable information blocked by the generator in the traditional two-phase model, and thus bring about a better generator. Empirically, we show that FR improves the F1 score by up to 10.3% compared to state-of-the-art methods.

TIST Journal 2022 Journal Article

Redundant Label Learning via Subspace Representation and Global Disambiguation

  • Gengyu Lyu
  • Songhe Feng
  • Wei Liu
  • Shuoyan Liu
  • Congyan Lang

Redundant Label Learning (RLL) aims at inducing a robust model from training data where each example is associated with a set of candidate labels, among which some are incorrect. Most existing approaches deal with this problem by disambiguating the candidate labels first and then inducing the predictive model from the disambiguated data. However, these approaches only focus on disambiguation within each instance's candidate label set, while the global label context tends to be ignored. Meanwhile, these approaches usually induce the objective model by directly utilizing the original feature information, which may lead to model overfitting due to high-dimensional redundant features. To tackle the above issues, we propose a novel feature SubspacE Representation and label Global disambiguatIOn (SERGIO) approach, which improves the generalization ability of the learning system from the perspective of both the feature space and the label space. Specifically, we project the original high-dimensional feature space into a low-dimensional subspace, where the projection matrix is regularized with an orthogonality constraint to make the subspace more compact. Meanwhile, we introduce a label confidence matrix and constrain it with ℓ1-norm and trace-norm regularization simultaneously, which explore global label correlations and accord with the nature of the single-label and multi-label classification problems, respectively. Extensive experiments on both single-label and multi-label RLL datasets demonstrate that our proposed method achieves competitive performance against state-of-the-art approaches.

JMLR Journal 2022 Journal Article

Towards Practical Adam: Non-Convexity, Convergence Theory, and Mini-Batch Acceleration

  • Congliang Chen
  • Li Shen
  • Fangyu Zou
  • Wei Liu

Adam is one of the most influential adaptive stochastic algorithms for training deep neural networks, yet it has been shown to be divergent even in the simple convex setting via a few simple counterexamples. Many attempts, such as decreasing the adaptive learning rate, adopting a large batch size, incorporating a temporal decorrelation technique, and seeking an analogous surrogate, have been made to promote the convergence of Adam-type algorithms. In contrast with existing approaches, we introduce an alternative easy-to-check sufficient condition, which merely depends on the base learning rate and combinations of historical second-order moments, to guarantee the global convergence of generic Adam for solving large-scale non-convex stochastic optimization. This observation, coupled with the sufficient condition, gives a much deeper interpretation of the divergence of Adam. On the other hand, in practice, mini-batch Adam and distributed Adam are widely used without any theoretical guarantee. We further analyze how the batch size or the number of nodes in a distributed system affects the convergence of Adam, showing theoretically that mini-batch and distributed Adam can be linearly accelerated by using a larger mini-batch size or a larger number of nodes. Finally, we apply generic Adam and mini-batch Adam with the sufficient condition to the counterexample and to training several neural networks on various real-world datasets. The experimental results are exactly in accord with our theoretical analysis.
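For context, the generic Adam iteration that such analyses study can be written in standard textbook notation (a sketch, not the paper's exact parameterization):

```latex
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2, \qquad
\theta_{t+1} = \theta_t - \frac{\alpha_t}{\sqrt{v_t} + \epsilon}\, m_t .
```

The sufficient condition discussed in the abstract constrains the base learning rate \(\alpha_t\) together with combinations of the historical second-order moments \(v_t\).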

AAAI Conference 2022 Conference Paper

Trusted Multi-View Deep Learning with Opinion Aggregation

  • Wei Liu
  • Xiaodong Yue
  • Yufei Chen
  • Thierry Denoeux

Multi-view deep learning is performed based on the deep fusion of data from multiple sources, i.e., data with multiple views. However, due to property differences and inconsistency among data sources, deep learning results based on the fusion of multi-view data may be uncertain and unreliable. It is necessary to reduce the uncertainty in data fusion and implement trusted multi-view deep learning. To address this problem, we revisit multi-view learning from the perspective of opinion aggregation and thereby devise a trusted multi-view deep learning method. Within this method, we adopt evidence theory to formulate the uncertainty of opinions as learning results from different data sources and measure the uncertainty of opinion aggregation as the multi-view learning result through evidence accumulation. We prove that accumulating the evidence from multiple data views decreases the uncertainty in multi-view deep learning and helps achieve trusted learning results. Experiments on various kinds of multi-view datasets verify the reliability and robustness of the proposed multi-view deep learning method.

IJCAI Conference 2021 Conference Paper

BESA: BERT-based Simulated Annealing for Adversarial Text Attacks

  • Xinghao Yang
  • Weifeng Liu
  • Dacheng Tao
  • Wei Liu

Modern Natural Language Processing (NLP) models are known to be immensely brittle towards text adversarial examples. Recent attack algorithms usually adopt word-level substitution strategies following a pre-computed word replacement mechanism. However, their resultant adversarial examples are still imperfect in achieving grammatical correctness and semantic similarity, largely because of their unsuitable candidate word selections and static optimization methods. In this research, we propose BESA, a BERT-based Simulated Annealing algorithm, to address these two problems. Firstly, we leverage the BERT Masked Language Model (MLM) to generate context-aware candidate words to produce fluent adversarial text and avoid grammar errors. Secondly, we employ Simulated Annealing (SA) to adaptively determine the word substitution order. SA provides sufficient word replacement options via internal simulations, with the objective of obtaining both a high attack success rate and a low word substitution rate. Besides, our algorithm is able to jump out of local optima with a controlled probability, bringing it closer to the best possible attack (i.e., the global optimum). Experiments on five popular datasets manifest the superiority of BESA compared with existing methods, including TextFooler, BAE, BERT-Attack, PWWS, and PSO.

AAAI Conference 2021 Conference Paper

Bigram and Unigram Based Text Attack via Adaptive Monotonic Heuristic Search

  • Xinghao Yang
  • Weifeng Liu
  • James Bailey
  • Dacheng Tao
  • Wei Liu

Deep neural networks (DNNs) are known to be vulnerable to adversarial images, while their robustness in text classification is rarely studied. Several lines of text attack methods have been proposed in the literature, such as character-level, word-level, and sentence-level attacks. However, it is still a challenge to minimize the number of word distortions necessary to induce misclassification while simultaneously ensuring lexical correctness, syntactic correctness, and semantic similarity. In this paper, we propose the Bigram and Unigram based Monotonic Heuristic Search (BU-MHS) method to examine the vulnerability of deep models. Our method has three major merits. Firstly, we propose to attack text documents not only at the unigram word level but also at the bigram level to avoid producing meaningless outputs. Secondly, we propose a hybrid method that replaces input words with both their synonyms and sememe candidates, which greatly enriches the potential substitutions compared to using synonyms alone. Lastly, we design a search algorithm, i.e., Monotonic Heuristic Search (MHS), to determine the priority of word replacements, aiming to reduce the modification cost of an adversarial attack. We evaluate the effectiveness of BU-MHS on the IMDB, AG's News, and Yahoo! Answers text datasets by attacking four popular DNN models. Results show that BU-MHS achieves the highest attack success rate while changing the smallest number of words compared with baselines.

NeurIPS Conference 2021 Conference Paper

Generalized and Discriminative Few-Shot Object Detection via SVD-Dictionary Enhancement

  • Aming WU
  • Suqi Zhao
  • Cheng Deng
  • Wei Liu

Few-shot object detection (FSOD) aims to detect new objects based on few annotated samples. To alleviate the impact of having few samples, enhancing the generalization and discrimination abilities of detectors on new objects plays an important role. In this paper, we explore employing Singular Value Decomposition (SVD) to boost both the generalization and discrimination abilities. Specifically, we propose a novel method, namely SVD-Dictionary enhancement, to build two separated spaces based on the sorted singular values. Concretely, the eigenvectors corresponding to larger singular values are used to build the generalization space in which localization is performed, as these eigenvectors generally suppress certain variations (e.g., the variation of styles) and contain intrinsic characteristics of objects. Meanwhile, since the eigenvectors corresponding to relatively smaller singular values may contain richer category-related information, we utilize them to build the discrimination space in which classification is performed. Dictionary learning is further leveraged to capture high-level discriminative information from the discrimination space, which is beneficial for improving detection accuracy. In the experiments, we separately verify the effectiveness of our method on the PASCAL VOC and COCO benchmarks. In particular, for the 2-shot case in VOC split1, our method significantly outperforms the baseline by 6.2%. Moreover, visualization analysis shows that our method is instrumental in FSOD.

ICRA Conference 2021 Conference Paper

Kinematic analysis of a flexible surgical instrument for robot-assisted minimally invasive surgery

  • Mei Feng
  • Zhixue Ni
  • Yili Fu
  • Xingze Jin
  • Wei Liu
  • Xiuquan Lu

Flexible surgical instruments can flexibly adjust their posture with a high degree of freedom, which makes them highly suitable for performing surgical tasks in narrow workspaces. However, redundant degrees of freedom increase their kinematic difficulty, which may cause redundant solutions, complex calculations, and low speeds. In this paper, a flexible surgical instrument is presented. The structural characteristics of this flexible instrument were explored in terms of force balance, and it was concluded that the instrument has a constant curvature during bending. Based on this, the forward and inverse kinematics were solved via the geometric and Newton iteration methods, respectively. Our experiments showed that the proposed method for solving the flexible instrument kinematics has high precision, a unique solution, and high speed; the instrument can be well controlled to perform refined operations. The proposed geometric method for solving the flexible instrument kinematics avoids calculating the Jacobian matrix, making it fast and capable of meeting the master-slave control requirement for real-time surgery. Furthermore, the proposed kinematics solution method is not limited by the mechanical structure, so it can be used for other flexible instruments owing to their constant-curvature bending.

NeurIPS Conference 2021 Conference Paper

Neural Routing by Memory

  • Kaipeng Zhang
  • Zhenqiang Li
  • Zhifeng Li
  • Wei Liu
  • Yoichi Sato

Recent Convolutional Neural Networks (CNNs) have achieved significant success by stacking multiple convolutional blocks, named procedures in this paper, to extract semantic features. However, they use the same procedure sequence for all inputs, regardless of the intermediate features. This paper proffers a simple yet effective idea of constructing parallel procedures and assigning similar intermediate features to the same specialized procedures in a divide-and-conquer fashion. It relieves each procedure's learning difficulty and thus leads to superior performance. Specifically, we propose a routing-by-memory mechanism for existing CNN architectures. In each stage of the network, we introduce parallel Procedural Units (PUs). A PU consists of a memory head and a procedure. The memory head maintains a summary of a type of features. For an intermediate feature, we search its closest memory and forward it to the corresponding procedure in both training and testing. In this way, different procedures are tailored to different features and therefore tackle them better. Networks with the proposed mechanism can be trained efficiently using a four-step training strategy. Experimental results show that our method improves VGGNet, ResNet, and EfficientNet's accuracies on Tiny ImageNet, ImageNet, and CIFAR-100 benchmarks with a negligible extra computational cost.

TIST Journal 2021 Journal Article

Quantized Adam with Error Feedback

  • Congliang Chen
  • Li Shen
  • Haozhi Huang
  • Wei Liu

In this article, we present a distributed variant of an adaptive stochastic gradient method for training deep neural networks in the parameter-server model. To reduce the communication cost among the workers and server, we incorporate two types of quantization schemes, i.e., gradient quantization and weight quantization, into the proposed distributed Adam. In addition, to reduce the bias introduced by quantization operations, we propose an error-feedback technique to compensate for the quantized gradient. Theoretically, in the stochastic nonconvex setting, we show that the distributed adaptive gradient method with gradient quantization and error feedback converges to the first-order stationary point, and that the distributed adaptive gradient method with weight quantization and error feedback converges to the point related to the quantized level under both the single-worker and multi-worker modes. Last, we apply the proposed distributed adaptive gradient methods to train deep neural networks. Experimental results demonstrate the efficacy of our methods.
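The error-feedback idea described above can be sketched in a few lines: each worker quantizes the sum of the current gradient and the carried-over quantization error, sends the quantized message, and keeps the residual for the next step, so the quantization bias is compensated over iterations. This is a generic illustration with a simple uniform quantizer, assuming nothing about the paper's actual quantization scheme; all function and variable names here are made up for the example.

```python
import numpy as np

def quantize(v, num_levels=4):
    """Uniform quantizer onto num_levels evenly spaced levels over [min(v), max(v)].
    A stand-in for whatever gradient/weight quantizer is actually used."""
    lo, hi = v.min(), v.max()
    if hi == lo:                      # constant vector: nothing to quantize
        return v.copy()
    step = (hi - lo) / (num_levels - 1)
    return lo + np.round((v - lo) / step) * step

def error_feedback_step(grad, error):
    """One communication round with error feedback.
    Returns (quantized message actually sent, residual carried to next step)."""
    corrected = grad + error          # compensate the previous quantization error
    message = quantize(corrected)     # low-precision message sent to the server
    new_error = corrected - message   # residual remembered locally
    return message, new_error
```

By construction, the message plus the new residual equals the error-corrected gradient, which is what makes the bias vanish in the long run rather than accumulate.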

AAAI Conference 2020 Conference Paper

A Generalized Framework for Edge-Preserving and Structure-Preserving Image Smoothing

  • Wei Liu
  • Pingping Zhang
  • Yinjie Lei
  • Xiaolin Huang
  • Jie Yang
  • Ian Reid

Image smoothing is a fundamental procedure in applications of both computer vision and graphics. The required smoothing properties can be different or even contradictory among different tasks. Nevertheless, the inherent smoothing nature of one smoothing operator is usually fixed and thus cannot meet the various requirements of different applications. In this paper, a non-convex non-smooth optimization framework is proposed to achieve diverse smoothing natures, where even contradictory smoothing behaviors can be achieved. To this end, we first introduce the truncated Huber penalty function, which has seldom been used in image smoothing. A robust framework is then proposed. When combined with the strong flexibility of the truncated Huber penalty function, our framework is capable of a range of applications and can outperform the state-of-the-art approaches in several tasks. In addition, an efficient numerical solution is provided, and its convergence is theoretically guaranteed even though the optimization framework is non-convex and non-smooth. The effectiveness and superior performance of our approach are validated through comprehensive experimental results in a range of applications.

IJCAI Conference 2020 Conference Paper

A Spatial Missing Value Imputation Method for Multi-view Urban Statistical Data

  • Yongshun Gong
  • Zhibin Li
  • Jian Zhang
  • Wei Liu
  • Bei Chen
  • Xiangjun Dong

Large volumes of urban statistical data with multiple views imply rich knowledge about the development degree of cities. These data present crucial statistics which play an irreplaceable role in regional analysis and urban computing. In reality, however, statistical data divided into fine-grained regions usually suffer from missing data problems. Those missing values hide useful information and may result in a distorted data analysis. Thus, in this paper, we propose a spatial missing data imputation method for multi-view urban statistical data. To address this problem, we exploit an improved spatial multi-kernel clustering method to guide the imputation process, cooperating with an adaptive-weight non-negative matrix factorization strategy. Intensive experiments are conducted with other state-of-the-art approaches on six real-world urban statistical datasets. The results not only show the superiority of our method over other comparative methods on different datasets, but also demonstrate the strong generalizability of our model.

NeurIPS Conference 2020 Conference Paper

Adversarial Learning for Robust Deep Clustering

  • Xu Yang
  • Cheng Deng
  • Kun Wei
  • Junchi Yan
  • Wei Liu

Deep clustering integrates embedding and clustering together to obtain the optimal nonlinear embedding space, which is more effective in real-world scenarios compared with conventional clustering methods. However, the robustness of the clustering network is prone to being attenuated especially when it encounters an adversarial attack. A small perturbation in the embedding space will lead to diverse clustering results since the labels are absent. In this paper, we propose a robust deep clustering method based on adversarial learning. Specifically, we first attempt to define adversarial samples in the embedding space for the clustering network. Meanwhile, we devise an adversarial attack strategy to explore samples that easily fool the clustering layers but do not impact the performance of the deep embedding. We then provide a simple yet efficient defense algorithm to improve the robustness of the clustering network. Experimental results on two popular datasets show that the proposed adversarial learning method can significantly enhance the robustness and further improve the overall clustering performance. Particularly, the proposed method is generally applicable to multiple existing clustering frameworks to boost their robustness. The source code is available at https://github.com/xdxuyang/ALRDC.

AAAI Conference 2020 Conference Paper

Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression

  • Zhaohui Zheng
  • Ping Wang
  • Wei Liu
  • Jinze Li
  • Rongguang Ye
  • Dongwei Ren

Bounding box regression is the crucial step in object detection. In existing methods, while ℓn-norm loss is widely adopted for bounding box regression, it is not tailored to the evaluation metric, i.e., Intersection over Union (IoU). Recently, IoU loss and generalized IoU (GIoU) loss have been proposed to benefit the IoU metric, but still suffer from the problems of slow convergence and inaccurate regression. In this paper, we propose a Distance-IoU (DIoU) loss by incorporating the normalized distance between the predicted box and the target box, which converges much faster in training than IoU and GIoU losses. Furthermore, this paper summarizes three geometric factors in bounding box regression, i.e., overlap area, central point distance and aspect ratio, based on which a Complete IoU (CIoU) loss is proposed, thereby leading to faster convergence and better performance. By incorporating DIoU and CIoU losses into state-of-the-art object detection algorithms, e.g., YOLO v3, SSD and Faster R-CNN, we achieve notable performance gains in terms of not only IoU metric but also GIoU metric. Moreover, DIoU can be easily adopted into non-maximum suppression (NMS) to act as the criterion, further boosting performance improvement. The source code and trained models are available at https://github.com/Zzh-tju/DIoU.
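The DIoU loss described above admits a compact sketch: it is 1 − IoU plus the squared distance between the box centers, normalized by the squared diagonal of the smallest enclosing box. The following is an illustrative implementation from that formula, not the authors' released code; boxes are assumed valid (x1 < x2, y1 < y2) in (x1, y1, x2, y2) format, and the names are made up for the example.

```python
def diou_loss(box_a, box_b):
    """Distance-IoU loss between two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection and union areas (assumes well-formed boxes, so union > 0)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter)

    # Squared distance between the two box centers
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + \
           ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2

    return 1.0 - iou + rho2 / c2
```

Unlike plain 1 − IoU, the center-distance term stays informative even when the boxes do not overlap, which is the source of the faster convergence the abstract reports.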

IJCAI Conference 2020 Conference Paper

Few-shot Visual Learning with Contextual Memory and Fine-grained Calibration

  • Yuqing Ma
  • Wei Liu
  • Shihao Bai
  • Qingyu Zhang
  • Aishan Liu
  • Weimin Chen
  • Xianglong Liu

Few-shot learning aims to learn a model that can be readily adapted to new unseen classes (concepts) by accessing one or a few examples. Despite the successful progress, most few-shot learning approaches, concentrating on either global or local characteristics of examples, still suffer from weak generalization abilities. Inspired by the inverted pyramid theory, to address this problem, we propose an inverted pyramid network (IPN) that imitates the human coarse-to-fine cognition paradigm. The proposed IPN consists of two consecutive stages, namely a global stage and a local stage. At the global stage, a class-sensitive contextual memory network (CCMNet) is introduced to learn discriminative support-query relation embeddings and predict the query-to-class similarity based on the contextual memory. Then, at the local stage, a fine-grained calibration is further appended to complement the coarse relation embeddings, targeting more precise query-to-class similarity evaluation. To the best of our knowledge, IPN is the first work that simultaneously integrates both global and local characteristics in few-shot learning, approximately imitating the human cognition mechanism. Our extensive experiments on multiple benchmark datasets demonstrate the superiority of IPN compared to a number of state-of-the-art approaches.

NeurIPS Conference 2020 Conference Paper

Fewer is More: A Deep Graph Metric Learning Perspective Using Fewer Proxies

  • Yuehua Zhu
  • Muli Yang
  • Cheng Deng
  • Wei Liu

Deep metric learning plays a key role in various machine learning tasks. Most of the previous works have been confined to sampling from a mini-batch, which cannot precisely characterize the global geometry of the embedding space. Although researchers have developed proxy- and classification-based methods to tackle the sampling issue, those methods inevitably incur a redundant computational cost. In this paper, we propose a novel Proxy-based deep Graph Metric Learning (ProxyGML) approach from the perspective of graph classification, which uses fewer proxies yet achieves better comprehensive performance. Specifically, multiple global proxies are leveraged to collectively approximate the original data points for each class. To efficiently capture local neighbor relationships, a small number of such proxies are adaptively selected to construct similarity subgraphs between these proxies and each data point. Further, we design a novel reverse label propagation algorithm, by which the neighbor relationships are adjusted according to ground-truth labels, so that a discriminative metric space can be learned during the process of subgraph classification. Extensive experiments carried out on widely-used CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate the superiority of the proposed ProxyGML over the state-of-the-art methods in terms of both effectiveness and efficiency. The source code is publicly available at https://github.com/YuehuaZhu/ProxyGML.

AAAI Conference 2020 Conference Paper

Multi-Task Driven Feature Models for Thermal Infrared Tracking

  • Qiao Liu
  • Xin Li
  • Zhenyu He
  • Nana Fan
  • Di Yuan
  • Wei Liu
  • Yongsheng Liang

Existing deep Thermal InfraRed (TIR) trackers usually use the feature models of RGB trackers for representation. However, these feature models learned on RGB images are neither effective in representing TIR objects nor taking fine-grained TIR information into consideration. To this end, we develop a multi-task framework to learn the TIR-specific discriminative features and fine-grained correlation features for TIR tracking. Specifically, we first use an auxiliary classification network to guide the generation of TIR-specific discriminative features for distinguishing the TIR objects belonging to different classes. Second, we design a fine-grained aware module to capture more subtle information for distinguishing the TIR objects belonging to the same class. These two kinds of features complement each other and recognize TIR objects at the inter-class and intra-class levels respectively. These two feature models are learned using a multi-task matching framework and are jointly optimized on the TIR tracking task. In addition, we develop a large-scale TIR training dataset to train the network for adapting the model to the TIR domain. Extensive experimental results on three benchmarks show that the proposed algorithm achieves a relative gain of 10% over the baseline and performs favorably against the state-of-the-art methods. Codes and the proposed TIR dataset are available at https://github.com/QiaoLiuHit/MMNet.

NeurIPS Conference 2020 Conference Paper

Optimal Epoch Stochastic Gradient Descent Ascent Methods for Min-Max Optimization

  • Yan Yan
  • Yi Xu
  • Qihang Lin
  • Wei Liu
  • Tianbao Yang

The epoch gradient descent method (a.k.a. Epoch-GD) proposed by Hazan and Kale (2011) was deemed a breakthrough for stochastic strongly convex minimization, achieving the optimal convergence rate of O(1/T) for the objective gap with T iterative updates. However, its extension to solving stochastic min-max problems with strong convexity and strong concavity still remains open, and it is still unclear whether a fast rate of O(1/T) for the duality gap is achievable for stochastic min-max optimization under strong convexity and strong concavity. Although some recent studies have proposed stochastic algorithms with fast convergence rates for min-max problems, they require additional assumptions about the problem, e.g., smoothness, bi-linear structure, etc. In this paper, we bridge this gap by providing a sharp analysis of the epoch-wise stochastic gradient descent ascent method (referred to as Epoch-GDA) for solving strongly convex strongly concave (SCSC) min-max problems, without imposing any additional assumption about smoothness or the function's structure. To the best of our knowledge, our result is the first to show that Epoch-GDA can achieve the optimal rate of O(1/T) for the duality gap of general SCSC min-max problems. We emphasize that such generalization of Epoch-GD for strongly convex minimization problems to Epoch-GDA for SCSC min-max problems is non-trivial and requires novel technical analysis. Moreover, we notice that the key lemma can also be used for proving the convergence of Epoch-GDA for weakly-convex strongly-concave min-max problems, leading to a nearly optimal complexity without resorting to smoothness or other structural conditions.
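As a point of reference, one stochastic gradient descent ascent step for \(\min_x \max_y f(x, y)\) within epoch \(k\) takes the standard form (a textbook sketch; the paper's contribution is the epoch-wise duality-gap analysis, not the update itself):

```latex
x_{t+1} = \Pi_{\mathcal{X}}\big(x_t - \eta_k \nabla_x f(x_t, y_t; \xi_t)\big), \qquad
y_{t+1} = \Pi_{\mathcal{Y}}\big(y_t + \eta_k \nabla_y f(x_t, y_t; \xi_t)\big),
```

where \(\Pi\) denotes projection onto the feasible set and \(\xi_t\) a stochastic sample; an epoch-wise scheme typically restarts the next epoch from an averaged iterate with a decreased step size \(\eta_{k+1}\).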

IJCAI Conference 2020 Conference Paper

Population Location and Movement Estimation through Cross-domain Data Analysis

  • Xinghao Yang
  • Wei Liu

Estimations of people's movement behaviour within a country can provide valuable information for government strategic resource planning. In this paper, we propose to utilize multi-domain statistical data to estimate people's movements under the assumption that most of the population tends to move to areas with similar or better living conditions. We design a Multi-domain Matrix Factorization (MdMF) model to discover the underlying consistency patterns from these cross-domain data and estimate the movement trends using the proposed model. This research can provide important theoretical support to governments and agencies in strategic resource planning and investment.

AAAI Conference 2020 Conference Paper

Potential Passenger Flow Prediction: A Novel Study for Urban Transportation Development

  • Yongshun Gong
  • Zhibin Li
  • Jian Zhang
  • Wei Liu
  • Jinfeng Yi

Recently, practical applications for passenger flow prediction have brought many benefits to urban transportation development. With the development of urbanization, a real-world demand from transportation managers is to construct a new metro station in a city area that was never planned before. Authorities are interested in the picture of the future volume of commuters before constructing a new station, and in estimating how it would affect other areas. In this paper, this specific problem is termed potential passenger flow (PPF) prediction, which is a novel and important study connected with urban computing and intelligent transportation systems. For example, an accurate PPF predictor can provide invaluable knowledge to designers, such as advice on station scales and influences on other areas. To address this problem, we propose a multi-view localized correlation learning method. The core idea of our strategy is to learn the passenger flow correlations between the target areas and their localized areas with adaptive weights. To improve the prediction accuracy, other domain knowledge is involved via a multi-view learning process. We conduct intensive experiments to evaluate the effectiveness of our method with real-world official transportation datasets. The results demonstrate that our method can achieve excellent performance compared with other available baselines. Besides, our method can provide an effective solution to the cold-start problem in recommender systems as well, as shown by its superior experimental results.

KR Conference 2020 Conference Paper

Seq2KG: An End-to-End Neural Model for Domain Agnostic Knowledge Graph (not Text Graph) Construction from Text

  • Michael Stewart
  • Wei Liu

Knowledge Graph Construction (KGC) from text unlocks information held within unstructured text and is critical to a wide range of downstream applications. General approaches to KGC from text are heavily reliant on the existence of knowledge bases, yet most domains do not even have an external knowledge base readily available. In many situations this results in information loss, as a wealth of key information is held within "non-entities". Domain-specific approaches to KGC typically adopt unsupervised pipelines, using carefully crafted linguistic and statistical patterns to extract co-occurring noun phrases as triples, essentially constructing text graphs rather than true knowledge graphs. In this research, for the first time, in the same flavour as Collobert et al.'s seminal 2011 work "Natural language processing (almost) from scratch", we propose a Seq2KG model attempting to achieve "knowledge graph construction (almost) from scratch". An end-to-end Sequence to Knowledge Graph (Seq2KG) neural model jointly learns to generate triples and resolve entity types as a multi-label classification task through deep neural networks. In addition, a novel evaluation metric that takes both semantic and structural closeness into account is developed for measuring the performance of triple extraction. We show that our end-to-end Seq2KG model performs on par with a state-of-the-art rule-based system which outperformed other neural models and won first prize in the first Knowledge Graph Contest in 2019. A new annotation scheme and three high-quality manually annotated datasets are available to help promote this direction of research.

IS Journal 2020 Journal Article

The Study for Public Management Policy Utility Evaluation and Optimization System under the Framework of Social Computing Perspective

  • Le Chen
  • Xianzhi Yuan
  • Gaoyu Zhang
  • Qinghua Guo
  • Wei Liu
  • Shuyi Zhang

In recent years, in order to rationalize the allocation of social resources and optimize the implementation of public management policies, scholars have conducted in-depth research on policy effectiveness. However, at present, most of this work remains at the level of macro-level qualitative analysis and lacks a quantitative analysis and evaluation system for the effectiveness of policy implementation. The goal of this article is to discuss the utility evaluation system of public management policy from the perspective of social computing. First, based on the data obtained through a questionnaire survey, we derive indicators from the survey data using factor analysis, create a new BDI (belief–desire–intention) model based on the observation indicators, and then construct the simulation platform; next, a brand-new quantitative analysis method for policy optimization is proposed using modified logistic functions as a tool. As an application, we conducted a case study of the “Targeted poverty alleviation policy in Yulin region” (Guangxi, China), in which the key indicators for poverty were established, and then policy optimization suggestions were given based on the results of simulation experiments. This case study has Chinese characteristics, but its approach might be applied to poverty alleviation work globally.

NeurIPS Conference 2020 Conference Paper

Towards Playing Full MOBA Games with Deep Reinforcement Learning

  • Deheng Ye
  • Guibin Chen
  • Wen Zhang
  • Sheng Chen
  • Bo Yuan
  • Bo Liu
  • Jia Chen
  • Zhao Liu

MOBA games, e.g., Honor of Kings, League of Legends, and Dota 2, pose grand challenges to AI systems, such as multi-agent coordination, an enormous state-action space, complex action control, etc. Developing AI for playing MOBA games has accordingly raised much attention. However, existing work falls short in handling the raw game complexity caused by the explosion of agent combinations, i.e., lineups, when expanding the hero pool; for instance, OpenAI's Dota AI limits the play to a pool of only 17 heroes. As a result, full MOBA games without restrictions are far from being mastered by any existing AI system. In this paper, we propose a MOBA AI learning paradigm that methodologically enables playing full MOBA games with deep reinforcement learning. Specifically, we develop a combination of novel and existing learning techniques, including off-policy adaption, multi-head value estimation, curriculum self-play learning, policy distillation, and Monte-Carlo tree search, in training and playing a large pool of heroes, meanwhile addressing the scalability issue skillfully. Tested on Honor of Kings, a popular MOBA game, we show how to build superhuman AI agents that can defeat top esports players. The superiority of our AI is demonstrated by the first large-scale performance test of a MOBA AI agent in the literature.

IJCAI Conference 2020 Conference Paper

Transductive Relation-Propagation Network for Few-shot Learning

  • Yuqing Ma
  • Shihao Bai
  • Shan An
  • Wei Liu
  • Aishan Liu
  • Xiantong Zhen
  • Xianglong Liu

Few-shot learning, aiming to learn novel concepts from few labeled examples, is an interesting and very challenging problem with many practical advantages. To accomplish this task, one should concentrate on revealing the accurate relations of the support-query pairs. We propose a transductive relation-propagation graph neural network (TRPN) to explicitly model and propagate such relations across support-query pairs. Our TRPN treats the relation of each support-query pair as a graph node, named relational node, and resorts to the known relations between support samples, including both intra-class commonality and inter-class uniqueness, to guide the relation propagation in the graph, generating the discriminative relation embeddings for support-query pairs. A pseudo relational node is further introduced to propagate the query characteristics, and a fast, yet effective transductive learning strategy is devised to fully exploit the relation information among different queries. To the best of our knowledge, this is the first work that explicitly takes the relations of support-query pairs into consideration in few-shot learning, which might offer a new way to solve the few-shot learning problem. Extensive experiments conducted on several benchmark datasets demonstrate that our method can significantly outperform a variety of state-of-the-art few-shot learning methods.

AAAI Conference 2020 Short Paper

Travel Time Prediction on Un-Monitored Roads: A Spatial Factorization Machine Based Approach (Student Abstract)

  • Lile Li
  • Wei Liu

Real-time traffic monitoring is one of the most important factors for route planning and estimated time of arrival (ETA). Many major roads in large cities are installed with live traffic monitoring systems, inferring the current traffic congestion status and ETAs to other locations. However, there are also many other roads, especially small roads and paths, that are not monitored. Yet, live traffic status on such un-monitored small roads can play a non-negligible role in personalized route planning and re-routing when a road incident happens. How to estimate the traffic status on such un-monitored roads is thus a valuable problem to be addressed. In this paper, we propose a model called Spatial Factorization Machines (SFM) to address this problem. A major advantage of the SFM model is that it incorporates physical distances and structures of road networks into the estimation of traffic status on un-monitored roads. Our experiments on real-world traffic data demonstrate that the SFM model significantly outperforms other existing models on ETA of un-monitored roads.

IJCAI Conference 2019 Conference Paper

A Compliance Checking Framework for DNN Models

  • Sunny Verma
  • Chen Wang
  • Liming Zhu
  • Wei Liu

Growing awareness towards the ethical use of machine learning (ML) models has created a surge in the development of fair models. Existing work in this regard assumes the presence of sensitive attributes in the data and hence can build classifiers whose decisions remain agnostic to such attributes. However, in real-world settings, the end-user of the ML model is unaware of the training data; besides, building custom models is not always feasible. Moreover, a pre-trained model with high accuracy on a certain dataset cannot be assumed to be fair. Unknown biases in the training data are the true culprit for unfair models (i.e., disparate performance for groups in the dataset). In this preliminary research, we propose a different lens for building fair models by enabling the user with tools to discover blind spots and biases in a pre-trained model and augment them with corrective measures.

NeurIPS Conference 2019 Conference Paper

Category Anchor-Guided Unsupervised Domain Adaptation for Semantic Segmentation

  • Qiming Zhang
  • Jing Zhang
  • Wei Liu
  • Dacheng Tao

Unsupervised domain adaptation (UDA) aims to enhance the generalization capability of a certain model from a source domain to a target domain. UDA is of particular significance since no extra effort is devoted to annotating target domain samples. However, the different data distributions in the two domains, or domain shift/discrepancy, inevitably compromise the UDA performance. Although there has been progress in matching the marginal distributions between two domains, the classifier favors the source domain features and makes incorrect predictions on the target domain due to category-agnostic feature alignment. In this paper, we propose a novel category anchor-guided (CAG) UDA model for semantic segmentation, which explicitly enforces category-aware feature alignment to learn shared discriminative features and classifiers simultaneously. First, the category-wise centroids of the source domain features are used as guided anchors to identify the active features in the target domain and also assign them pseudo-labels. Then, we leverage an anchor-based pixel-level distance loss and a discriminative loss to drive the intra-category features closer and the inter-category features further apart, respectively. Finally, we devise a stagewise training mechanism to reduce the error accumulation and adapt the proposed model progressively. Experiments on both the GTA5→Cityscapes and SYNTHIA→Cityscapes scenarios demonstrate the superiority of our CAG-UDA model over the state-of-the-art methods. The code is available at https://github.com/RogerZhangzz/CAG_UDA.
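The anchor step described above, using source category centroids to pick out confident ("active") target samples and give them pseudo-labels, can be sketched on plain feature vectors (a simplified illustration with a hypothetical distance threshold, not the paper's full training losses):

```python
import numpy as np

def anchor_pseudo_labels(src_feats, src_labels, tgt_feats, threshold=1.0):
    """Assign pseudo-labels to target features by nearest source
    category centroid; only targets within `threshold` of a centroid
    are marked active, i.e., confident enough to get a pseudo-label."""
    classes = np.unique(src_labels)
    centroids = np.stack([src_feats[src_labels == c].mean(axis=0)
                          for c in classes])
    dists = np.linalg.norm(tgt_feats[:, None, :] - centroids[None, :, :],
                           axis=-1)
    labels = classes[dists.argmin(axis=1)]      # nearest-centroid label
    active = dists.min(axis=1) < threshold      # confident targets only
    return labels, active

# Two well-separated 2-D classes; nearby target points are labelled and
# active, while a far-away outlier is left inactive.
src = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
lab = np.array([0, 0, 1, 1])
tgt = np.array([[0.3, 0.1], [5.1, 4.9], [20.0, 20.0]])
pl, act = anchor_pseudo_labels(src, lab, tgt)
```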

TIST Journal 2019 Journal Article

Correlated Multi-label Classification with Incomplete Label Space and Class Imbalance

  • Ali Braytee
  • Wei Liu
  • Ali Anaissi
  • Paul J. Kennedy

Multi-label classification is defined as the problem of identifying the multiple labels or categories of new observations based on labeled training data. Multi-labeled data poses several challenges, including class imbalance, label correlation, incomplete multi-label matrices, and noisy and irrelevant features. In this article, we propose an integrated multi-label classification approach with incomplete label space and class imbalance (ML-CIB) for simultaneously training the multi-label classification model and addressing the aforementioned challenges. The model learns a new label matrix and captures new label correlations, because it is difficult to find a complete label vector for each instance in real-world data. We also propose a label regularization to handle the imbalanced multi-label issue in the new labels, and an l1 regularization norm is incorporated in the objective function to select the relevant sparse features. A multi-label feature selection (ML-CIB-FS) method is presented as a variant of the proposed ML-CIB to show the efficacy of the proposed method in selecting the relevant features. ML-CIB is formulated as a constrained objective function. We use the accelerated proximal gradient method to solve the proposed optimisation problem. Lastly, extensive experiments are conducted on 19 regular-scale and large-scale imbalanced multi-label datasets. The promising results show that our method significantly outperforms the state-of-the-art.

AAAI Conference 2019 Short Paper

Cross-Domain Recommendation via Coupled Factorization Machines

  • Lile Li
  • Quan Do
  • Wei Liu

Data across many business domains can be represented by two or more coupled data sets. Correlations among these coupled datasets have been studied in the literature for making more accurate cross-domain recommender systems. However, existing cross-domain recommendation methods mostly assume that the coupled modes of the data sets share identical latent factors, which limits the discovery of potentially useful domain-specific properties of the original data. In this paper, we propose a novel cross-domain recommendation method called Coupled Factorization Machine (CoFM) that addresses this limitation. Compared to existing models, ours is the first that uses factorization machines to capture the common characteristics of coupled domains while simultaneously preserving the differences among them. Our experiments with real-world datasets confirm the advantages of our method in making cross-domain recommendations.
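A degree-2 factorization machine, the building block that CoFM couples across domains, scores a feature vector as w0 + Σᵢ wᵢxᵢ + Σ_{i<j} ⟨vᵢ, vⱼ⟩ xᵢxⱼ; the pairwise term admits the well-known O(nk) rewrite sketched below (a generic FM, not CoFM's coupled objective):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Degree-2 factorization machine score (Rendle-style form).

    x : (n,) features, w0 : bias, w : (n,) linear weights,
    V : (n, k) latent factors. The pairwise term uses the identity
        sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ],
    which costs O(n*k) instead of O(n^2).
    """
    linear = w0 + w @ x
    s = V.T @ x                                     # (k,) per-factor sums
    pairwise = 0.5 * (s @ s - ((V ** 2).T @ (x ** 2)).sum())
    return linear + pairwise

# Tiny check against the naive O(n^2) double loop.
rng = np.random.default_rng(0)
n, k = 5, 3
x = rng.normal(size=n)
w0, w, V = 0.1, rng.normal(size=n), rng.normal(size=(n, k))
naive = w0 + w @ x + sum(V[i] @ V[j] * x[i] * x[j]
                         for i in range(n) for j in range(i + 1, n))
fast = fm_predict(x, w0, w, V)
```

Coupling two such models would then amount to sharing part of V across domains while keeping domain-specific factors separate, which is the limitation CoFM targets.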

NeurIPS Conference 2019 Conference Paper

Cross-Modal Learning with Adversarial Samples

  • Chao Li
  • Shangqian Gao
  • Cheng Deng
  • De Xie
  • Wei Liu

With the rapid developments of deep neural networks, numerous deep cross-modal analysis methods have been presented and are being applied in widespread real-world applications, including healthcare and safety-critical environments. However, the recent studies on robustness and stability of deep neural networks show that a microscopic modification, known as adversarial sample, which is even imperceptible to humans, can easily fool a well-performed deep neural network and brings a new obstacle to deep cross-modal correlation exploring. In this paper, we propose a novel Cross-Modal correlation Learning with Adversarial samples, namely CMLA, which for the first time presents the existence of adversarial samples in cross-modal data. Moreover, we provide a simple yet effective adversarial sample learning method, where inter- and intra- modality similarity regularizations across different modalities are simultaneously integrated into the learning of adversarial samples. Finally, our proposed CMLA is demonstrated to be highly effective in cross-modal hashing based retrieval. Extensive experiments on two cross-modal benchmark datasets show that the adversarial examples produced by our CMLA are efficient in fooling a target deep cross-modal hashing network. On the other hand, such adversarial examples can significantly strengthen the robustness of the target network by conducting an adversarial training.

IJCAI Conference 2019 Conference Paper

DeepCU: Integrating both Common and Unique Latent Information for Multimodal Sentiment Analysis

  • Sunny Verma
  • Chen Wang
  • Liming Zhu
  • Wei Liu

Multimodal sentiment analysis combines information available from visual, textual, and acoustic representations for sentiment prediction. Recent multimodal fusion schemes combine multiple modalities as a tensor and obtain either the common information, by utilizing neural networks, or the unique information, by modeling a low-rank representation of the tensor. However, both kinds of information are essential, as they render inter-modal and intra-modal relationships of the data. In this research, we first propose a novel deep architecture to extract the common information from the multi-mode representations. Furthermore, we propose unique networks to obtain the modality-specific information that enhances the generalization performance of our multimodal system. Finally, we integrate these two aspects of information via a fusion layer and propose a novel multimodal data fusion architecture, which we call DeepCU (Deep network with both Common and Unique latent information). The proposed DeepCU consolidates the two networks for joint utilization and discovery of all-important latent information. Comprehensive experiments are conducted to demonstrate the effectiveness of utilizing both common and unique information discovered by DeepCU on multiple real-world datasets. The source code of DeepCU is available at https://github.com/sverma88/DeepCU-IJCAI19.

AAAI Conference 2019 Conference Paper

Enhanced Random Forest Algorithms for Partially Monotone Ordinal Classification

  • Christopher Bartley
  • Wei Liu
  • Mark Reynolds

One of the factors hindering the use of classification models in decision making is that their predictions may contradict expectations. In domains such as finance and medicine, the ability to include knowledge of monotone (nondecreasing) relationships is sought after to increase accuracy and user satisfaction. Since Random Forest is one of the most successful classifiers, attempts have been made to incorporate such knowledge into it. Ideally a solution would (a) maximise accuracy; (b) have low complexity and scale well; (c) guarantee global monotonicity; and (d) cater for multi-class problems. This paper first reviews the state-of-the-art from both the literature and statistical libraries, and identifies opportunities for improvement. A new rule-based method is then proposed, with a maximal-accuracy variant and a faster approximate variant. Simulated and real datasets are then used to perform the most comprehensive ordinal classification benchmarking in the monotone forest literature. The proposed approaches are shown to reduce the bias induced by monotonisation and thereby improve accuracy.

IJCAI Conference 2019 Conference Paper

Geo-ALM: POI Recommendation by Fusing Geographical Information and Adversarial Learning Mechanism

  • Wei Liu
  • Zhi-Jie Wang
  • Bin Yao
  • Jian Yin

Learning a user’s preference from check-in data is important for POI recommendation. Yet a user has usually visited only some POIs, while most POIs are unvisited (i.e., negative samples). To leverage these “no-behavior” POIs, a typical approach is pairwise ranking, which constructs ranking pairs for the user and POIs. Although this approach is generally effective, the negative samples in ranking pairs are obtained randomly, which may fail to leverage “critical” negative samples in the model training. On the other hand, previous studies have also utilized geographical features to improve the recommendation quality. Nevertheless, most previous works did not exploit geographical information comprehensively, which may also affect the performance. To alleviate these issues, we propose a geographical information based adversarial learning model (Geo-ALM), which can be viewed as a fusion of geographic features and generative adversarial networks. Its core idea is to learn the discriminator and generator interactively, by exploiting two granularities of geographic features (i.e., region and POI features). Experimental results show that Geo-ALM can achieve competitive performance compared to several state-of-the-art methods.

JMLR Journal 2019 Journal Article

Scaling Up Sparse Support Vector Machines by Simultaneous Feature and Sample Reduction

  • Bin Hong
  • Weizhong Zhang
  • Wei Liu
  • Jieping Ye
  • Deng Cai
  • Xiaofei He
  • Jie Wang

Sparse support vector machine (SVM) is a popular classification technique that can simultaneously learn a small set of the most interpretable features and identify the support vectors. It has achieved great successes in many real-world applications. However, for large-scale problems involving a huge number of samples and ultra-high dimensional features, solving sparse SVMs remains challenging. By noting that sparse SVMs induce sparsities in both feature and sample spaces, we propose a novel approach, which is based on accurate estimations of the primal and dual optima of sparse SVMs, to simultaneously identify the inactive features and samples that are guaranteed to be irrelevant to the outputs. Thus, we can remove the identified inactive samples and features from the training phase, leading to substantial savings in the computational cost without sacrificing the accuracy. Moreover, we show that our method can be extended to multi-class sparse support vector machines. To the best of our knowledge, the proposed method is the first static feature and sample reduction method for sparse SVMs and multi-class sparse SVMs. Experiments on both synthetic and real data sets demonstrate that our approach significantly outperforms state-of-the-art methods and the speedup gained by our approach can be orders of magnitude.

NeurIPS Conference 2019 Conference Paper

Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos

  • Yitian Yuan
  • Lin Ma
  • Jingwen Wang
  • Wei Liu
  • Wenwu Zhu

Temporal sentence grounding in videos aims to detect and localize one target video segment, which semantically corresponds to a given sentence. Existing methods mainly tackle this task via matching and aligning semantics between a sentence and candidate video segments, while neglect the fact that the sentence information plays an important role in temporally correlating and composing the described contents in videos. In this paper, we propose a novel semantic conditioned dynamic modulation (SCDM) mechanism, which relies on the sentence semantics to modulate the temporal convolution operations for better correlating and composing the sentence related video contents over time. More importantly, the proposed SCDM performs dynamically with respect to the diverse video contents so as to establish a more precise matching relationship between sentence and video, thereby improving the temporal grounding accuracy. Extensive experiments on three public datasets demonstrate that our proposed model outperforms the state-of-the-arts with clear margins, illustrating the ability of SCDM to better associate and localize relevant video contents for temporal sentence grounding. Our code for this paper is available at https://github.com/yytzsy/SCDM.

IJCAI Conference 2018 Conference Paper

A Reinforced Topic-Aware Convolutional Sequence-to-Sequence Model for Abstractive Text Summarization

  • Li Wang
  • Junlin Yao
  • Yunzhe Tao
  • Li Zhong
  • Wei Liu
  • Qiang Du

In this paper, we propose a deep learning approach to tackle the automatic summarization tasks by incorporating topic information into the convolutional sequence-to-sequence (ConvS2S) model and using self-critical sequence training (SCST) for optimization. Through jointly attending to topics and word-level alignment, our approach can improve coherence, diversity, and informativeness of generated summaries via a biased probability generation mechanism. On the other hand, reinforcement training, like SCST, directly optimizes the proposed model with respect to the non-differentiable metric ROUGE, which also avoids the exposure bias during inference. We carry out the experimental evaluation with state-of-the-art methods over the Gigaword, DUC-2004, and LCSTS datasets. The empirical results demonstrate the superiority of our proposed method in the abstractive summarization.

AAAI Conference 2018 Conference Paper

Attention-Based Transactional Context Embedding for Next-Item Recommendation

  • Shoujin Wang
  • Liang Hu
  • Longbing Cao
  • Xiaoshui Huang
  • Defu Lian
  • Wei Liu

To recommend the next item to a user in a transactional context is practical yet challenging in applications such as marketing campaigns. Transactional context refers to the items that are observable in a transaction. Most existing transaction-based recommender systems (TBRSs) make recommendations by mainly considering recently occurring items instead of all the ones observed in the current context. Moreover, they often assume a rigid order between items within a transaction, which is not always practical. More importantly, a long transaction often contains many items irrelevant to the next choice, which tend to overwhelm the influence of a few truly relevant ones. Therefore, we posit that a good TBRS should not only consider all the observed items in the current transaction but also weight them with different relevance to build an attentive context that outputs the proper next item with a high probability. To this end, we design an effective attention-based transaction embedding model (ATEM) for context embedding to weight each observed item in a transaction without assuming order. The empirical study on real-world transaction datasets proves that ATEM significantly outperforms the state-of-the-art methods in terms of both accuracy and novelty.

AAAI Conference 2018 Conference Paper

Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition

  • Wei Liu
  • Chaofeng Chen
  • Kwan-Yee Wong

In this paper, we present a Character-Aware Neural Network (Char-Net) for recognizing distorted scene text. Our Char-Net is composed of a word-level encoder, a character-level encoder, and a LSTM-based decoder. Unlike previous work which employed a global spatial transformer network to rectify the entire distorted text image, we take an approach of detecting and rectifying individual characters. To this end, we introduce a novel hierarchical attention mechanism (HAM) which consists of a recurrent RoIWarp layer and a character-level attention layer. The recurrent RoIWarp layer sequentially extracts a feature region corresponding to a character from the feature map produced by the word-level encoder, and feeds it to the character-level encoder which removes the distortion of the character through a simple spatial transformer and further encodes the character region. The character-level attention layer then attends to the most relevant features of the feature map produced by the character-level encoder and composes a context vector, which is finally fed to the LSTM-based decoder for decoding. This approach of adopting a simple local transformation to model the distortion of individual characters not only results in an improved efficiency, but can also handle different types of distortion that are hard, if not impossible, to be modelled by a single global transformation. Experiments have been conducted on six public benchmark datasets. Our results show that Char-Net can achieve state-of-the-art performance on all the benchmarks, especially on the IC-IST which contains scene text with large distortion. Code will be made available.

NeurIPS Conference 2018 Conference Paper

Deep Non-Blind Deconvolution via Generalized Low-Rank Approximation

  • Wenqi Ren
  • Jiawei Zhang
  • Lin Ma
  • Jinshan Pan
  • Xiaochun Cao
  • Wangmeng Zuo
  • Wei Liu
  • Ming-Hsuan Yang

In this paper, we present a deep convolutional neural network to capture the inherent properties of image degradation, which can handle different kernels and saturated pixels in a unified framework. The proposed neural network is motivated by the low-rank property of pseudo-inverse kernels. We first compute a generalized low-rank approximation for a large number of blur kernels, and then use separable filters to initialize the convolutional parameters in the network. Our analysis shows that the estimated decomposed matrices contain the most essential information of the input kernel, which ensures the proposed network to handle various blurs in a unified framework and generate high-quality deblurring results. Experimental results on benchmark datasets with noise and saturated pixels demonstrate that the proposed algorithm performs favorably against state-of-the-art methods.

NeurIPS Conference 2018 Conference Paper

Distilled Wasserstein Learning for Word Embedding and Topic Modeling

  • Hongteng Xu
  • Wenlin Wang
  • Wei Liu
  • Lawrence Carin

We propose a novel Wasserstein method with a distillation mechanism, yielding joint learning of word embeddings and topics. The proposed method is based on the fact that the Euclidean distance between word embeddings may be employed as the underlying distance in the Wasserstein topic model. The word distributions of topics, their optimal transport to the word distributions of documents, and the embeddings of words are learned in a unified framework. When learning the topic model, we leverage a distilled ground-distance matrix to update the topic distributions and smoothly calculate the corresponding optimal transports. Such a strategy provides the updating of word embeddings with robust guidance, improving algorithm convergence. As an application, we focus on patient admission records, in which the proposed method embeds the codes of diseases and procedures and learns the topics of admissions, obtaining superior performance on clinically-meaningful disease network construction, mortality prediction as a function of admission codes, and procedure recommendation.
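The optimal transport between a topic's word distribution and a document's can be computed, for instance, with entropy-regularized Sinkhorn iterations, a standard tool for Wasserstein distances over embedding-based ground costs (a generic sketch with assumed toy inputs; the paper's distillation mechanism is not shown):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.5, iters=1000):
    """Entropy-regularized optimal transport (Sinkhorn iterations).

    a, b : source/target probability vectors; C : ground-cost matrix,
    e.g., Euclidean distances between word embeddings as in Wasserstein
    topic models. Returns the transport plan P with marginals ~ (a, b).
    """
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(iters):
        v = b / (K.T @ u)                # match column marginals
        u = a / (K @ v)                  # match row marginals
    return u[:, None] * K * v[None, :]

# Transport between two 3-word distributions with a toy cost matrix.
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.4, 0.4, 0.2])
C = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
P = sinkhorn(a, b, C)
```

Smaller `eps` approximates the unregularized Wasserstein plan more closely but needs more iterations and more careful numerics.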

NeurIPS Conference 2018 Conference Paper

Generalizing Graph Matching beyond Quadratic Assignment Model

  • Tianshu Yu
  • Junchi Yan
  • Yilin Wang
  • Wei Liu
  • Baoxin Li

Graph matching has received persistent attention over decades, which can be formulated as a quadratic assignment problem (QAP). We show that a large family of functions, which we define as Separable Functions, can approximate discrete graph matching in the continuous domain asymptotically by varying the approximation controlling parameters. We also study the properties of global optimality and devise convex/concave-preserving extensions to the widely used Lawler's QAP form. Our theoretical findings show the potential for deriving new algorithms and techniques for graph matching. We deliver solvers based on two specific instances of Separable Functions, and the state-of-the-art performance of our method is verified on popular benchmarks.

AAAI Conference 2018 Conference Paper

Learning to Guide Decoding for Image Captioning

  • Wenhao Jiang
  • Lin Ma
  • Xinpeng Chen
  • Hanwang Zhang
  • Wei Liu

Recently, much progress has been made in image captioning, and an encoder-decoder framework has achieved outstanding performance for this task. In this paper, we propose an extension of the encoder-decoder framework by adding a component called a guiding network. The guiding network models the attribute properties of input images, and its output is leveraged to compose the input of the decoder at each time step. The guiding network can be plugged into the current encoder-decoder framework and trained in an end-to-end manner. Hence, the guiding vector can be adaptively learned according to the signal from the decoder, enabling it to embed information from both image and language. Additionally, discriminative supervision can be employed to further improve the quality of guidance. The advantages of our proposed approach are verified by experiments carried out on the MS COCO dataset.

IJCAI Conference 2018 Conference Paper

Long-Term Human Motion Prediction by Modeling Motion Context and Enhancing Motion Dynamics

  • Yongyi Tang
  • Lin Ma
  • Wei Liu
  • Wei-Shi Zheng

Human motion prediction aims at generating future frames of human motion based on an observed sequence of skeletons. Recent methods employ the latest hidden states of a recurrent neural network (RNN) to encode the historical skeletons, which can only address short-term prediction. In this work, we propose motion context modeling that summarizes the historical human motion with respect to the current prediction. A modified highway unit (MHU) is proposed for efficiently eliminating motionless joints and estimating the next pose given the motion context. Furthermore, we enhance the motion dynamics by minimizing a Gram matrix loss for long-term motion prediction. Experimental results show that the proposed model can promisingly forecast future human movements, yielding superior performance over related state-of-the-art approaches. Moreover, specifying the motion context with activity labels enables our model to perform human motion transfer.

NeurIPS Conference 2018 Conference Paper

Nonlocal Neural Networks, Nonlocal Diffusion and Nonlocal Modeling

  • Yunzhe Tao
  • Qi Sun
  • Qiang Du
  • Wei Liu

Nonlocal neural networks have been proposed and shown to be effective in several computer vision tasks, where the nonlocal operations can directly capture long-range dependencies in the feature space. In this paper, we study the nature of diffusion and damping effect of nonlocal networks by doing spectrum analysis on the weight matrices of the well-trained networks, and then propose a new formulation of the nonlocal block. The new block not only learns the nonlocal interactions but also has stable dynamics, thus allowing deeper nonlocal structures. Moreover, we interpret our formulation from the general nonlocal modeling perspective, where we make connections between the proposed nonlocal network and other nonlocal models, such as nonlocal diffusion process and Markov jump process.

NeurIPS Conference 2018 Conference Paper

Parsimonious Quantile Regression of Financial Asset Tail Dynamics via Sequential Learning

  • Xing Yan
  • Weizhong Zhang
  • Lin Ma
  • Wei Liu
  • Qi Wu

We propose a parsimonious quantile regression framework to learn the dynamic tail behaviors of financial asset returns. Our model captures well both the time-varying characteristic and the asymmetrical heavy-tail property of financial time series. It combines the merits of a popular sequential neural network model, i.e., the LSTM, with a novel parametric quantile function that we construct to represent the conditional distribution of asset returns. Our model also captures individually the serial dependences of higher moments, rather than just the volatility. Across a wide range of asset classes, the out-of-sample forecasts of conditional quantiles or VaR from our model outperform the GARCH family. Further, the proposed approach does not suffer from the issue of quantile crossing, nor is it exposed to the ill-posedness of the parametric probability density function approach.

IJCAI Conference 2018 Conference Paper

Salient Object Detection by Lossless Feature Reflection

  • Pingping Zhang
  • Wei Liu
  • Huchuan Lu
  • Chunhua Shen

Salient object detection, which aims to identify and locate the most salient pixels or regions in images, has been attracting more and more interest due to its various real-world applications. However, this vision task is quite challenging, especially under complex image scenes. Inspired by the intrinsic reflection of natural images, in this paper we propose a novel feature learning framework for large-scale salient object detection. Specifically, we design a symmetrical fully convolutional network (SFCN) to learn complementary saliency features under the guidance of lossless feature reflection. The location, contextual, and semantic information of salient objects is jointly utilized to supervise the proposed network for more accurate saliency predictions. In addition, to overcome the blurry boundary problem, we propose a new structural loss function to learn clear object boundaries and spatially consistent saliency. The coarse prediction results are effectively refined by this structural information for performance improvements. Extensive experiments on seven saliency detection datasets demonstrate that our approach achieves consistently superior performance and outperforms the very recent state-of-the-art methods.

IJCAI Conference 2018 Conference Paper

Semantic Structure-based Unsupervised Deep Hashing

  • Erkun Yang
  • Cheng Deng
  • Tongliang Liu
  • Wei Liu
  • Dacheng Tao

Hashing is becoming increasingly popular for approximate nearest neighbor searching in massive databases due to its storage and search efficiency. Recent supervised hashing methods, which usually construct semantic similarity matrices to guide hash code learning using label information, have shown promising results. However, it is relatively difficult to capture and utilize the semantic relationships between points in unsupervised settings. To address this problem, we propose a novel unsupervised deep framework called Semantic Structure-based unsupervised Deep Hashing (SSDH). We first empirically study the deep feature statistics, and find that the distribution of the cosine distance for point pairs can be estimated by two half Gaussian distributions. Based on this observation, we construct the semantic structure by considering points with distances obviously smaller than the others as semantically similar and points with distances obviously larger than the others as semantically dissimilar. We then design a deep architecture and a pair-wise loss function to preserve this semantic structure in Hamming space. Extensive experiments show that SSDH significantly outperforms current state-of-the-art methods.

AAAI Conference 2018 Conference Paper

Stochastic Non-Convex Ordinal Embedding With Stabilized Barzilai-Borwein Step Size

  • Ke Ma
  • Jinshan Zeng
  • Jiechao Xiong
  • Qianqian Xu
  • Xiaochun Cao
  • Wei Liu
  • Yuan Yao

Learning representations from relative similarity comparisons, often called ordinal embedding, has gained rising attention in recent years. Most existing methods are batch methods designed mainly on the basis of convex optimization, e.g., the projected gradient descent method. However, they are generally time-consuming because singular value decomposition (SVD) is commonly adopted during the update, especially when the data size is very large. To overcome this challenge, we propose a stochastic algorithm called SVRG-SBB, which has the following features: (a) it is SVD-free via dropping convexity, with good scalability through the use of a stochastic algorithm, i.e., stochastic variance reduced gradient (SVRG), and (b) it chooses the step size adaptively via a new stabilized Barzilai-Borwein (SBB) method, since the original version for convex problems might fail for the considered stochastic non-convex optimization problem. Moreover, we show that the proposed algorithm converges to a stationary point at a rate O(1/T) in our setting, where T is the number of total iterations. Numerous simulations and real-world data experiments are conducted to show the effectiveness of the proposed algorithm in comparison with state-of-the-art methods; in particular, it achieves much lower computational cost with good prediction performance.
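As a rough illustration of the step-size rule named in this abstract, the sketch below computes a Barzilai-Borwein step with a simple stabilizing term added to the denominator. The `eps` stabilization and the quadratic test problem are illustrative assumptions, not the paper's exact SBB formulation:

```python
import numpy as np

def bb_step(x_prev, x_curr, g_prev, g_curr, eps=1e-2):
    """Barzilai-Borwein step size with a simple stabilization term.

    The plain BB1 step is ||s||^2 / (s^T y); adding eps*||s||^2 to the
    denominator (an illustrative stabilization, not necessarily the
    paper's exact SBB rule) keeps the step bounded when s^T y is near 0.
    """
    s = x_curr - x_prev          # iterate difference
    y = g_curr - g_prev          # gradient difference
    return (s @ s) / (abs(s @ y) + eps * (s @ s))

# Gradient descent on f(x) = 0.5 * x^T A x using BB steps.
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x
x_prev = np.array([1.0, 1.0])
g_prev = grad(x_prev)
x = x_prev - 0.1 * g_prev        # one fixed-step iteration to bootstrap BB
for _ in range(30):
    g = grad(x)
    alpha = bb_step(x_prev, x, g_prev, g)
    x_prev, g_prev = x, g
    x = x - alpha * g            # BB-sized gradient step
```

On this well-conditioned toy quadratic the iterates shrink toward the minimizer at the origin; the point of the stabilization is only that the step stays finite even when the curvature estimate `s @ y` degenerates.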

AAAI Conference 2017 Conference Paper

Adaptive Proximal Average Approximation for Composite Convex Minimization

  • Li Shen
  • Wei Liu
  • Junzhou Huang
  • Yu-Gang Jiang
  • Shiqian Ma

We propose a fast first-order method to solve multi-term nonsmooth composite convex minimization problems by employing a recent proximal average approximation technique and a novel adaptive parameter tuning technique. Thanks to this powerful parameter tuning technique, the proximal gradient step can be performed with a much larger stepsize in the algorithm implementation compared with the prior PA-APG method (Yu 2013), which is the core to enabling significant improvements in practical performance. Moreover, by choosing the approximation parameter adaptively, the proposed method is shown to enjoy O(1/k) iteration complexity theoretically without any extra computational cost, while the PA-APG method incurs many more iterations for convergence. The preliminary experimental results on overlapping group Lasso and graph-guided fused Lasso problems confirm our theoretical claim, and indicate that the proposed method is almost five times faster than the state-of-the-art PA-APG method and is therefore suitable for higher-precision optimization.

AAAI Conference 2017 Short Paper

Extracting Highly Effective Features for Supervised Learning via Simultaneous Tensor Factorization

  • Sunny Verma
  • Wei Liu
  • Chen Wang
  • Liming Zhu

Real-world data is usually generated over multiple time periods associated with multiple labels, which can be represented as multiple labeled tensor sequences. These sequences are linked together, sharing some common features while exhibiting their own unique features. Conventional tensor factorization techniques are limited to extracting either common or unique features, but not both simultaneously. However, both types of features are important in many machine learning systems as they inherently affect the systems’ performance. In this paper, we propose a novel supervised tensor factorization technique which simultaneously extracts ordered common and unique features. Classification results using features extracted by our method on the CIFAR-10 dataset achieve significantly better performance than other factorization methods, illustrating the effectiveness of the proposed technique.

NeurIPS Conference 2017 Conference Paper

Geometric Descent Method for Convex Composite Minimization

  • Shixiang Chen
  • Shiqian Ma
  • Wei Liu

In this paper, we extend the geometric descent method recently proposed by Bubeck, Lee and Singh to tackle nonsmooth and strongly convex composite problems. We prove that our proposed algorithm, dubbed geometric proximal gradient method (GeoPG), converges with a linear rate $(1-1/\sqrt{\kappa})$ and thus achieves the optimal rate among first-order methods, where $\kappa$ is the condition number of the problem. Numerical results on linear regression and logistic regression with elastic net regularization show that GeoPG compares favorably with Nesterov's accelerated proximal gradient method, especially when the problem is ill-conditioned.

NeurIPS Conference 2017 Conference Paper

Mixture-Rank Matrix Approximation for Collaborative Filtering

  • Dongsheng Li
  • Chao Chen
  • Wei Liu
  • Tun Lu
  • Ning Gu
  • Stephen Chu

Low-rank matrix approximation (LRMA) methods have achieved excellent accuracy among today's collaborative filtering (CF) methods. In existing LRMA methods, the rank of user/item feature matrices is typically fixed, i.e., the same rank is adopted to describe all users/items. However, our studies show that submatrices with different ranks could coexist in the same user-item rating matrix, so that approximations with fixed ranks cannot perfectly describe the internal structures of the rating matrix, thus leading to inferior recommendation accuracy. In this paper, a mixture-rank matrix approximation (MRMA) method is proposed, in which user-item ratings can be characterized by a mixture of LRMA models with different ranks. Meanwhile, a learning algorithm capitalizing on iterated conditional modes is proposed to tackle the non-convex optimization problem pertaining to MRMA. Experimental studies on MovieLens and Netflix datasets demonstrate that MRMA can outperform six state-of-the-art LRMA-based CF methods in terms of recommendation accuracy.

AAAI Conference 2017 Conference Paper

Pairwise Relationship Guided Deep Hashing for Cross-Modal Retrieval

  • Erkun Yang
  • Cheng Deng
  • Wei Liu
  • Xianglong Liu
  • Dacheng Tao
  • Xinbo Gao

With the benefits of low storage cost and fast query speed, cross-modal hashing has received considerable attention recently. However, almost all existing methods for cross-modal hashing cannot obtain powerful hash codes, because they directly utilize hand-crafted features or ignore heterogeneous correlations across different modalities, which greatly degrades retrieval performance. In this paper, we propose a novel deep cross-modal hashing method to generate compact hash codes through an end-to-end deep learning architecture, which can effectively capture the intrinsic relationships between various modalities. Our architecture integrates different types of pairwise constraints to encourage the similarities of the hash codes from an intra-modal view and an inter-modal view, respectively. Moreover, additional decorrelation constraints are introduced to this architecture, thus enhancing the discriminative ability of each hash bit. Extensive experiments show that our proposed method yields state-of-the-art results on two cross-modal retrieval datasets.

IJCAI Conference 2017 Conference Paper

Positive unlabeled learning via wrapper-based adaptive sampling

  • Pengyi Yang
  • Wei Liu
  • Jean Yang

Learning from positive and unlabeled data frequently occurs in applications where only a subset of positive instances is available while the rest of the data are unlabeled. In such scenarios, often the goal is to create a discriminant model that can accurately classify both positive and negative data by modelling from labeled and unlabeled instances. In this study, we propose an adaptive sampling (AdaSampling) approach that utilises prediction probabilities from a model to iteratively update the training data. Starting with equal prior probabilities for all unlabeled data, our method "wraps" around a predictive model to iteratively update these probabilities to distinguish positive and negative instances in unlabeled data. Subsequently, one or more robust negative set(s) can be drawn from unlabeled data, according to the likelihood of each instance being negative, to train a single classification model or ensemble of models.
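The wrapper loop this abstract describes can be sketched generically: start from equal priors on the unlabeled data, draw a pseudo-negative set weighted by the current probability of being negative, refit, and update. The nearest-centroid "model", the data, and all names below are illustrative stand-ins for whatever probabilistic classifier is wrapped, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def centroid_proba(X_pos, X_neg, X):
    """Toy probabilistic classifier: softmax over distances to the two
    class centroids (a stand-in for any model outputting P(positive))."""
    d_pos = np.linalg.norm(X - X_pos.mean(axis=0), axis=1)
    d_neg = np.linalg.norm(X - X_neg.mean(axis=0), axis=1)
    return np.exp(-d_pos) / (np.exp(-d_pos) + np.exp(-d_neg))

def ada_sampling(X_pos, X_unl, n_iter=10):
    """Wrapper-style adaptive sampling: iteratively (1) draw a
    pseudo-negative set from the unlabeled pool weighted by the current
    P(negative), (2) refit, (3) update the probabilities."""
    p_pos = np.full(len(X_unl), 0.5)              # equal prior for unlabeled
    for _ in range(n_iter):
        p_neg = 1.0 - p_pos
        idx = rng.choice(len(X_unl), size=len(X_pos), replace=False,
                         p=p_neg / p_neg.sum())   # favor likely negatives
        p_pos = centroid_proba(X_pos, X_unl[idx], X_unl)
    return p_pos

# Two well-separated Gaussian blobs: 20 labeled positives; the
# unlabeled pool mixes 30 hidden positives with 30 hidden negatives.
X_pos = rng.normal(loc=3.0, size=(20, 2))
X_unl = np.vstack([rng.normal(loc=3.0, size=(30, 2)),
                   rng.normal(loc=-3.0, size=(30, 2))])
p = ada_sampling(X_pos, X_unl)   # P(positive) for each unlabeled point
```

After a few iterations the hidden positives in the unlabeled pool receive high `p` and the hidden negatives low `p`, from which one or more pseudo-negative training sets can then be drawn, as the abstract outlines.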

IJCAI Conference 2017 Conference Paper

Theoretic Analysis and Extremely Easy Algorithms for Domain Adaptive Feature Learning

  • Wenhao Jiang
  • Cheng Deng
  • Wei Liu
  • Feiping Nie
  • Fu-lai Chung
  • Heng Huang

Domain adaptation problems arise in a variety of applications, where a training dataset from the source domain and a test dataset from the target domain typically follow different distributions. The primary difficulty in designing effective learning models to solve such problems lies in how to bridge the gap between the source and target distributions. In this paper, we provide a comprehensive analysis of feature learning algorithms used in conjunction with linear classifiers for domain adaptation. Our analysis shows that in order to achieve good adaptation performance, the second moments of the source domain distribution and target domain distribution should be similar. Based on our new analysis, a novel, extremely easy feature learning algorithm for domain adaptation is proposed. Furthermore, our algorithm is extended by leveraging multiple layers, leading to another feature learning algorithm. We evaluate the effectiveness of the proposed algorithms on domain adaptation tasks using the Amazon review and spam datasets from the ECML/PKDD 2006 discovery challenge.

IJCAI Conference 2016 Conference Paper

Coordinate Discrete Optimization for Efficient Cross-View Image Retrieval

  • Yadong Mu
  • Wei Liu
  • Cheng Deng
  • Zongting Lv
  • Xinbo Gao

Learning compact hash codes has been a vibrant research topic for large-scale similarity search owing to the low storage cost and expedited search operation. A recent research thrust aims to learn compact codes jointly from multiple sources, referred to as cross-view (or cross-modal) hashing in the literature. The main theme of this paper is to develop a novel formulation and optimization scheme for cross-view hashing. As a key differentiator, our proposed method directly conducts optimization on discrete binary hash codes, rather than on relaxed continuous variables as in existing cross-view hashing methods. In this way, relaxation-induced search accuracy loss can be avoided. We attack the cross-view hashing problem by simultaneously capturing semantic neighboring relations and maximizing the generative probability of the learned hash codes in each view. Specifically, to enable effective optimization on discrete hash codes, the optimization proceeds in a block coordinate descent fashion. Each iteration sequentially updates a single bit with the others clamped. We transform the resultant sub-problem into an equivalent, more tractable quadratic form and devise an active set based solver on the discrete codes. Rigorous theoretical analysis is provided for the convergence and local optimality condition. Comprehensive evaluations are conducted on three image benchmarks. The clearly superior experimental results demonstrate the merits of the proposed method.
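The single-bit update scheme this abstract mentions can be illustrated with a generic sketch: set each bit to whichever sign lowers some objective, with all other bits clamped. The toy objective ||rS − BBᵀ||² and every name below are illustrative assumptions, not the paper's actual cross-view formulation or its active-set per-bit solver:

```python
import numpy as np

def coordinate_descent_bits(B, objective, n_sweeps=5):
    """Block coordinate descent on binary codes: each step sets one bit
    to whichever sign yields the lower objective, others clamped, so
    the objective is non-increasing by construction."""
    n, r = B.shape
    for _ in range(n_sweeps):
        for i in range(n):
            for k in range(r):
                B[i, k] = 1.0
                f_pos = objective(B)
                B[i, k] = -1.0
                f_neg = objective(B)
                B[i, k] = 1.0 if f_pos <= f_neg else -1.0
    return B

rng = np.random.default_rng(1)
n, r = 8, 4
Z = rng.choice([-1.0, 1.0], size=(n, r))
S = (Z @ Z.T) / r                         # similarities realizable by some code
objective = lambda B: np.linalg.norm(r * S - B @ B.T) ** 2

B0 = rng.choice([-1.0, 1.0], size=(n, r))  # random initial codes
f0 = objective(B0)
B = coordinate_descent_bits(B0.copy(), objective)
```

Because each bit update evaluates both candidate signs, including the current one, every sweep can only keep or lower the objective, which is the basic monotonicity property such discrete coordinate schemes rely on.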

IJCAI Conference 2016 Conference Paper

Fast Structural Binary Coding

  • Dongjin Song
  • Wei Liu
  • David A. Meyer

Binary coding techniques, which compress originally high-dimensional data samples into short binary codes, are becoming increasingly popular due to their efficiency for information retrieval. Leveraging supervised information can dramatically enhance the coding quality, and hence improve search performance. There are few methods, however, that efficiently learn coding functions that optimize the precision at the top of the Hamming distance ranking list while approximately preserving the geometric relationships between database examples. In this paper, we propose a novel supervised binary coding approach, namely Fast Structural Binary Coding (FSBC), to optimize the precision at the top of a Hamming distance ranking list and ensure that similar images can be returned as a whole. The key idea is to train disciplined coding functions by optimizing a lower bound of the area under the ROC (Receiver Operating Characteristic) curve (AUC) and penalize this objective so that the geometric relationships between database examples in the original Euclidean space are approximately preserved in the Hamming space. To find such a coding function, we relax the original discrete optimization objective with a continuous surrogate, and then derive a stochastic gradient descent method to optimize the surrogate objective efficiently. Empirical studies based upon two image datasets demonstrate that the proposed binary coding approach achieves superior image search performance over the state-of-the-art.

AAAI Conference 2016 Conference Paper

Scalable Sequential Spectral Clustering

  • Yeqing Li
  • Junzhou Huang
  • Wei Liu

In the past decades, Spectral Clustering (SC) has become one of the most effective clustering approaches. Although it has been widely used, one significant drawback of SC is its expensive computation cost. Many efforts have been devoted to accelerating SC algorithms and promising results have been achieved. However, most of the existing algorithms rely on the assumption that data can be stored in the computer memory. When data cannot fit in the memory, these algorithms suffer severe performance degradation. In order to overcome this issue, we propose a novel sequential SC algorithm for tackling large-scale clustering with limited computational resources, e.g., memory. We begin by investigating an effective way of approximating the graph affinity matrix by leveraging a bipartite graph. Then we choose a smart graph construction and optimization strategy to avoid random access to data. These efforts lead to an efficient SC algorithm whose memory usage is independent of the number of input data points. Extensive experiments carried out on large datasets demonstrate that the proposed sequential SC algorithm is up to a thousand times faster than the state-of-the-art.
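The bipartite-graph approximation of the affinity matrix mentioned above can be sketched generically: connect each point only to a small set of anchors, so the full n×n affinity never has to be materialized. The Gaussian weights, row normalization, and the use of the small ZᵀZ matrix below are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def anchor_affinities(X, anchors, sigma=1.0):
    """Point-to-anchor weight matrix Z (n x m): Gaussian affinities to a
    few anchors, row-normalized. The implicit n x n graph affinity can
    then be taken as Z @ Z.T without ever forming it explicitly."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=-1)
    Z = np.exp(-d2 / (2.0 * sigma ** 2))
    return Z / Z.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),     # cluster 1
               rng.normal(4.0, 0.3, size=(20, 2))])    # cluster 2
anchors = X[rng.choice(len(X), size=6, replace=False)]
Z = anchor_affinities(X, anchors, sigma=0.5)

# Spectral embedding from the small 6 x 6 matrix Z^T Z, instead of
# eigendecomposing the full 40 x 40 affinity matrix.
evals, evecs = np.linalg.eigh(Z.T @ Z)
embedding = Z @ evecs[:, -2:]            # top-2 spectral coordinates
```

The memory point the abstract makes shows up here too: only the n×m matrix `Z` (and the tiny m×m `Z.T @ Z`) is ever stored, which is what makes a sequential, memory-bounded variant possible.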

AAAI Conference 2016 Conference Paper

Teaching-to-Learn and Learning-to-Teach for Multi-label Propagation

  • Chen Gong
  • Dacheng Tao
  • Jie Yang
  • Wei Liu

Multi-label propagation aims to transmit the multi-label information from labeled examples to unlabeled examples based on a weighted graph. Existing methods ignore the specific propagation difficulty of different unlabeled examples and conduct the propagation in an imperfect sequence, leading to the error-prone classification of some difficult examples with uncertain labels. To address this problem, this paper associates each possible label with a “teacher”, and proposes a “Multi-Label Teaching-to-Learn and Learning-to-Teach” (ML-TLLT) algorithm, so that the entire propagation process is guided by the teachers and manipulated from simple examples to more difficult ones. In the teaching-to-learn step, the teachers select the simplest examples for the current propagation by investigating both the definitiveness of each possible label of the unlabeled examples, and the dependencies between labels revealed by the labeled examples. In the learning-to-teach step, the teachers reversely learn from the learner’s feedback to properly select the simplest examples for the next propagation. Thorough empirical studies show that due to the optimized propagation sequence designed by the teachers, ML-TLLT yields generally better performance than seven state-of-the-art methods on the typical multi-label benchmark datasets.

AAAI Conference 2016 Conference Paper

Towards Optimal Binary Code Learning via Ordinal Embedding

  • Hong Liu
  • Rongrong Ji
  • Yongjian Wu
  • Wei Liu

Binary code learning, a.k.a. hashing, has recently become popular due to its high efficiency in large-scale similarity search and recognition. It typically maps high-dimensional data points to binary codes, where data similarity can be efficiently computed via rapid Hamming distance. Most existing unsupervised hashing schemes pursue binary codes by reducing the quantization error from an original real-valued data space to a resulting Hamming space. On the other hand, most existing supervised hashing schemes constrain binary code learning to correlate with pairwise similarity labels. However, few methods consider ordinal relations in the binary code learning process, which serve as a very significant cue to learn the optimal binary codes for similarity search. In this paper, we propose a novel hashing scheme, dubbed Ordinal Embedding Hashing (OEH), which embeds given ordinal relations among data points to learn ranking-preserving binary codes. The core idea is to construct a directed unweighted graph to capture the ordinal relations, and then train the hash functions using this ordinal graph to preserve the permutation relations in the Hamming space. To learn such hash functions effectively, we further relax the discrete constraints and design a stochastic gradient descent algorithm to obtain the optimal solution. Experimental results on two large-scale benchmark datasets demonstrate that the proposed OEH method can achieve superior performance over state-of-the-art approaches. Finally, an evaluation on a query-by-humming dataset demonstrates that OEH also performs well for music retrieval using a user’s humming or singing.

IJCAI Conference 2016 Conference Paper

Visual Tracking with Reliable Memories

  • Shu Wang
  • Shaoting Zhang
  • Wei Liu
  • Dimitris N. Metaxas

In this paper, we propose a novel visual tracking framework that intelligently discovers reliable patterns from a wide range of video to resist drift error in long-term tracking tasks. First, we design a Discrete Fourier Transform (DFT) based tracker which is able to exploit a large number of tracked samples while still ensuring real-time performance. Second, we propose a clustering method with temporal constraints to explore and memorize consistent patterns from previous frames, named reliable memories. By virtue of this method, our tracker can utilize uncontaminated information to alleviate drifting issues. Experimental results show that our tracker performs favorably against other state-of-the-art methods on benchmark datasets. Furthermore, it is notably effective in handling drift and able to robustly track challenging long videos of over 4,000 frames, while most of the others lose track in early frames.

AAAI Conference 2015 Conference Paper

Actionable Combined High Utility Itemset Mining

  • Jingyu Shao
  • Junfu Yin
  • Wei Liu
  • Longbing Cao

The itemsets discovered by traditional High Utility Itemsets Mining (HUIM) methods are more useful than frequent itemset mining outcomes; however, they are usually disordered, not actionable, and sometimes accidental, because utility is the only criterion and no relations among itemsets are considered. In this paper, we introduce the concept of combined mining to select combined itemsets that are not only of high utility and high frequency, but also involve relations between itemsets. An effective method for mining such actionable combined high utility itemsets is proposed. The experimental results are promising compared to those from a traditional HUIM algorithm (UP-Growth).

AAAI Conference 2015 Conference Paper

Coupled Collaborative Filtering for Context-aware Recommendation

  • Xinxin Jiang
  • Wei Liu
  • Longbing Cao
  • Guodong Long

Context-aware features have been widely recognized as important factors in recommender systems. However, as a major technique in recommender systems, traditional Collaborative Filtering (CF) does not provide a straightforward way of integrating context-aware information into personal recommendation. We propose a Coupled Collaborative Filtering (CCF) model to measure the contextual information and use it to improve recommendations. In the proposed approach, a coupled similarity is computed from the inter-item, intra-context, and inter-context interactions among item, user, and context-aware factors. Experiments based on different types of CF models demonstrate the effectiveness of our design.

AAAI Conference 2015 Conference Paper

Low-Rank Similarity Metric Learning in High Dimensions

  • Wei Liu
  • Cun Mu
  • Rongrong Ji
  • Shiqian Ma
  • John Smith
  • Shih-Fu Chang

Metric learning has become a widely used tool in machine learning. To reduce the expensive costs brought by increasing dimensionality, low-rank metric learning has arisen, as it can be more economical in storage and computation. However, existing low-rank metric learning algorithms usually adopt nonconvex objectives, and are hence sensitive to the choice of a heuristic low-rank basis. In this paper, we propose a novel low-rank metric learning algorithm to yield bilinear similarity functions. This algorithm scales linearly with input dimensionality in both space and time, and is therefore applicable to high-dimensional data domains. A convex objective free of heuristics is formulated by leveraging trace norm regularization to promote low-rankness. Crucially, we prove that all globally optimal metric solutions must retain a certain low-rank structure, which enables our algorithm to decompose the high-dimensional learning task into two steps: an SVD-based projection and a metric learning problem with reduced dimensionality. The latter step can be tackled efficiently by employing a linearized Alternating Direction Method of Multipliers. The efficacy of the proposed algorithm is demonstrated through experiments performed on four benchmark datasets with tens of thousands of dimensions.

IJCAI Conference 2015 Conference Paper

Modeling Inter- and Intra-Part Deformations for Object Structure Parsing

  • Ling Cai
  • Rongrong Ji
  • Wei Liu
  • Gang Hua

Part deformation has been a longstanding challenge for object parsing, of which the primary difficulty lies in modeling the highly diverse object structures. To this end, we propose a novel structure parsing model to capture deformable object structures. The proposed model consists of two deformable layers: the top layer is an undirected graph that incorporates inter-part deformations to infer object structures; the base layer is consisted of various independent nodes to characterize local intra-part deformations. To learn this two-layer model, we design a layer-wise learning algorithm, which employs matching pursuit and belief propagation for a low computational complexity inference. Specifically, active basis sparse coding is leveraged to build the nodes at the base layer, while the edge weights are estimated by a structural support vector machine. Experimental results on two benchmark datasets (i. e. , faces and horses) demonstrate that the proposed model yields superior parsing performance over state-of-the-art models.

IJCAI Conference 2015 Conference Paper

Multi-View Matrix Decomposition: A New Scheme for Exploring Discriminative Information

  • Cheng Deng
  • Zongting Lv
  • Wei Liu
  • Junzhou Huang
  • Dacheng Tao
  • Xinbo Gao

Recent studies have demonstrated the advantages of fusing information from multiple views for various machine learning applications. However, most existing approaches assume a shared component common to all views and ignore the private components of individual views, which restricts the learning performance. In this paper, we propose a new multi-view, low-rank, and sparse matrix decomposition scheme to seamlessly integrate diverse yet complementary information stemming from multiple views. Unlike previous approaches, our approach decomposes an input data matrix concatenated from multiple views into the sum of low-rank, sparse, and noisy parts. A unified optimization framework is then established, where low-rankness and group-structured sparsity constraints are imposed to simultaneously capture the shared and private components at both the instance and view levels. A provably convergent optimization algorithm is developed to solve this problem, yielding a learned augmented representation that is used as features for classification tasks. Extensive experiments conducted on six benchmark image datasets show that our approach enjoys superior performance over the state-of-the-art approaches.

AAAI Conference 2015 Conference Paper

Optimizing Bag Features for Multiple-Instance Retrieval

  • Zhouyu Fu
  • Feifei Pan
  • Cheng Deng
  • Wei Liu

Multiple-Instance (MI) learning is an important supervised learning technique which deals with collections of instances called bags. While existing research in MI learning has mainly focused on classification, in this paper we propose a new approach for MI retrieval to enable effective similarity retrieval of bags of instances, where training data are presented in the form of similar and dissimilar bag pairs. An embedding scheme is devised that encodes each bag into a single bag-level feature vector by exploiting a similarity-based transformation. In this way, the original MI problem is converted into a single-instance version. Furthermore, we develop a principled approach for optimizing bag features specifically for similarity retrieval by leveraging pairwise label information at the bag level. The experimental results demonstrate the effectiveness of the proposed approach in comparison with the alternatives for MI retrieval.

AAAI Conference 2015 Conference Paper

Refer-to-as Relations as Semantic Knowledge

  • Song Feng
  • Sujith Ravi
  • Ravi Kumar
  • Polina Kuznetsova
  • Wei Liu
  • Alexander Berg
  • Tamara Berg
  • Yejin Choi

We study Refer-to-as relations as a new type of semantic knowledge. Compared to the much studied Is-a relation, which concerns factual taxonomic knowledge, Refer-to-as relations aim to address pragmatic semantic knowledge. For example, a “penguin” is a “bird” from a taxonomic point of view, but people rarely refer to a “penguin” as a “bird” in vernacular use. This observation closely relates to the entry-level categorization studied in Psychology. We posit that Refer-to-as relations can be learned from data, and that both textual and visual information would be helpful in inferring the relations. By integrating existing lexical structure knowledge with language statistics and visual similarities, we formulate a collective inference approach to map all object names in an encyclopedia to commonly used names for each object. Our contributions include a new labeled data set, the collective inference and optimization approach, and the computed mappings and similarities.

TIST Journal 2015 Journal Article

When Location Meets Social Multimedia

  • Rongrong Ji
  • Yue Gao
  • Wei Liu
  • Xing Xie
  • Qi Tian
  • Xuelong Li

With the growing popularity of multimedia-sharing platforms such as Facebook and Flickr, recent years have witnessed an explosive growth of geographical tags on social multimedia content. This trend enables a wide variety of emerging applications, for example mobile location search, landmark recognition, scene reconstruction, and tourism recommendation, which range from pure research prototypes to commercial systems. In this article, we give a comprehensive survey of these applications, covering recent advances in the recognition and mining of geography-aware social multimedia. We review related work from the past decade on location recognition, scene summarization, tourism suggestion, 3D building modeling, mobile visual search, and city navigation. Finally, we discuss potential challenges, future topics, and open issues related to geo-social multimedia computing, recognition, mining, and analytics.

NeurIPS Conference 2014 Conference Paper

Discrete Graph Hashing

  • Wei Liu
  • Cun Mu
  • Sanjiv Kumar
  • Shih-Fu Chang

Hashing has emerged as a popular technique for fast nearest neighbor search in gigantic databases. In particular, learning-based hashing has received considerable attention due to its appealing storage and search efficiency. However, the performance of most unsupervised learning-based hashing methods deteriorates rapidly as the hash code length increases. We argue that the degraded performance is due to inferior optimization procedures used to achieve discrete binary codes. This paper presents a graph-based unsupervised hashing model to preserve the neighborhood structure of massive data in a discrete code space. We cast the graph hashing problem into a discrete optimization framework that directly learns the binary codes. A tractable alternating maximization algorithm is then proposed to explicitly deal with the discrete constraints, yielding high-quality codes that capture the local neighborhoods well. Extensive experiments performed on four large datasets with up to one million samples show that our discrete optimization based graph hashing method obtains superior search accuracy over state-of-the-art unsupervised hashing methods, especially for longer codes.
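The idea of alternating maximization over discrete codes can be illustrated with a toy sign-update loop. This is a simplified sketch of the general technique, not the paper's algorithm (which operates on an anchor graph with balance and decorrelation constraints); `discrete_graph_hash` and the plain RBF similarity graph are assumptions for the example.

```python
import numpy as np

def discrete_graph_hash(S, n_bits, n_iter=30, seed=0):
    """Toy alternating maximization of tr(B^T S B) over binary codes
    B in {-1, +1}^(n x n_bits): repeatedly replace the codes by the
    sign of their graph-smoothed version until a fixed point."""
    rng = np.random.default_rng(seed)
    B = np.sign(rng.normal(size=(S.shape[0], n_bits)))
    for _ in range(n_iter):
        B_new = np.sign(S @ B)             # discrete update, no relaxation
        B_new[B_new == 0] = 1              # break ties deterministically
        if np.array_equal(B_new, B):
            break                          # converged to a fixed point
        B = B_new
    return B

pts = np.random.default_rng(1).normal(size=(8, 2))
d2 = ((pts[:, None] - pts[None]) ** 2).sum(-1)
S = np.exp(-d2)                            # RBF similarity graph
B = discrete_graph_hash(S, n_bits=4)
print(B.shape)                             # (8, 4), entries in {-1, +1}
```

The point of such updates is exactly the abstract's argument: the binary constraint is handled directly in the loop rather than by relaxing to continuous codes and thresholding afterwards.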

ICRA Conference 2014 Conference Paper

Salient region detection based on local and global saliency

  • Peng Wang 0024
  • Zhi Zhou
  • Wei Liu
  • Hong Qiao

A new and effective salient-region detection method based on local and global saliency information is proposed. To preserve the completeness of salient regions, the input image is first segmented into several regions. For each region, local and global saliency are then computed: the local saliency by multi-scale neighborhood contrast, and the global saliency from the global spatial distribution and inter-region isolation of features. The final saliency is obtained as a weighted combination of the two. Comparison experiments demonstrate the effective performance of the proposed algorithm on salient region detection.
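The final combination step admits a very small sketch. The per-region local and global scores are taken as given (the paper computes them from contrast and spatial statistics); the function name `combine_saliency`, the weight `w`, and the min-max normalization are illustrative assumptions.

```python
import numpy as np

def combine_saliency(local_s, global_s, w=0.5):
    """Weighted combination of per-region local and global saliency,
    min-max normalized to [0, 1]."""
    s = w * local_s + (1.0 - w) * global_s
    lo, hi = s.min(), s.max()
    return (s - lo) / (hi - lo) if hi > lo else np.zeros_like(s)

local_s = np.array([0.2, 0.9, 0.4])    # e.g. multi-scale contrast per region
global_s = np.array([0.1, 0.8, 0.7])   # e.g. spatial spread / isolation
print(combine_saliency(local_s, global_s, w=0.6))
```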

AAAI Conference 2014 Conference Paper

Sub-Selective Quantization for Large-Scale Image Search

  • Yeqing Li
  • Chen Chen
  • Wei Liu
  • Junzhou Huang

Recently, with the explosive growth of visual content on the Internet, large-scale image search has attracted intensive attention. It has been shown that mapping high-dimensional image descriptors to compact binary codes can lead to considerable efficiency gains in both the storage and similarity computation of images. However, most existing methods still suffer from the expensive training devoted to large-scale binary code learning. To address this issue, we propose a sub-selection-based matrix manipulation algorithm that can significantly reduce the computational cost of code learning. As case studies, we apply the sub-selection algorithm to two popular quantization techniques: PCA Quantization (PCAQ) and Iterative Quantization (ITQ). Crucially, we can justify the resulting sub-selective quantization by proving its theoretical properties. Extensive experiments are carried out on three image benchmarks with up to one million samples, corroborating the efficacy of the sub-selective quantization method in image retrieval.
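The sub-selection idea applied to PCAQ can be sketched in a few lines: learn the projection from a random row subset, then binarize all points with it. This is a hedged illustration of the general recipe, not the paper's provably justified algorithm; `pcaq_codes` and the uniform row sampling are assumptions for the example.

```python
import numpy as np

def pcaq_codes(X, n_bits, sample_frac=0.1, seed=0):
    """PCA Quantization with sub-selection: fit the PCA projection on a
    random subset of rows, then binarize the full data by sign thresholding."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=max(n_bits, int(sample_frac * n)), replace=False)
    sub = X[idx] - X[idx].mean(0)            # train projection on subset only
    _, _, Vt = np.linalg.svd(sub, full_matrices=False)
    proj = (X - X.mean(0)) @ Vt[:n_bits].T   # project every point
    return (proj > 0).astype(np.uint8)       # compact binary codes

X = np.random.default_rng(1).normal(size=(1000, 32))
codes = pcaq_codes(X, n_bits=8)
print(codes.shape)   # (1000, 8)
```

The saving is that the SVD runs on roughly `sample_frac * n` rows instead of all `n`, which is where the reduced training cost the abstract claims comes from.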

NeurIPS Conference 2014 Conference Paper

Zeta Hull Pursuits: Learning Nonconvex Data Hulls

  • Yuanjun Xiong
  • Wei Liu
  • Deli Zhao
  • Xiaoou Tang

Selecting a small informative subset from a given dataset, also called column sampling, has drawn much attention in machine learning. For incorporating structured data information into column sampling, research efforts have been devoted to the cases where data points are fitted with clusters, simplices, or general convex hulls. This paper aims to study nonconvex hull learning, which has rarely been investigated in the literature. In order to learn data-adaptive nonconvex hulls, a novel approach is proposed based on a graph-theoretic measure that leverages graph cycles to characterize the structural complexity of input data points. Employing this measure, we present a greedy algorithmic framework, dubbed Zeta Hulls, to perform structured column sampling. The process of pursuing a Zeta hull involves the computation of a matrix inverse. To accelerate this matrix inversion and reduce its space complexity, we exploit a low-rank approximation to the graph adjacency matrix using an efficient anchor graph technique. Extensive experimental results show that data representations learned by Zeta Hulls achieve state-of-the-art accuracy in text and image classification tasks.

IJCAI Conference 2013 Conference Paper

Semi-Supervised Learning with Manifold Fitted Graphs

  • Tongtao Zhang
  • Rongrong Ji
  • Wei Liu
  • Dacheng Tao
  • Gang Hua

In this paper, we propose a locality-constrained and sparsity-encouraged manifold fitting approach, aiming to capture the locally sparse manifold structure in neighborhood graph construction by exploiting a principled optimization model. The proposed model formulates neighborhood graph construction as a sparse coding problem with a locality constraint, thereby achieving simultaneous neighbor selection and edge weight optimization. The core idea underlying our model is to perform a sparse manifold fitting task for each data point, so that close-by points lying on the same local manifold are automatically chosen to connect, while the connection weights are acquired by simple geometric reconstruction. We term the novel neighborhood graph generated by our optimization model the M-Fitted Graph, since such a graph stems from sparse manifold fitting. To evaluate the robustness and effectiveness of M-fitted graphs, we use graph-based semi-supervised learning as the testbed. Extensive experiments on six benchmark datasets validate that the proposed M-fitted graph is superior to state-of-the-art neighborhood graphs in terms of classification accuracy with popular graph-based semi-supervised learning methods.

AAAI Conference 2010 Conference Paper

Constrained Metric Learning Via Distance Gap Maximization

  • Wei Liu
  • Xinmei Tian
  • Dacheng Tao
  • Jianzhuang Liu

Vectored data frequently occur in a variety of fields and are easy to handle, since they can be mathematically abstracted as points residing in a Euclidean space. An appropriate distance metric in the data space is in great demand for a large number of applications. In this paper, we pose robust and tractable metric learning under pairwise constraints that are expressed as similarity judgements between data pairs. The major features of our approach are: 1) it maximizes the gap between the average squared distance among dissimilar pairs and the average squared distance among similar pairs; 2) it is capable of propagating similarity constraints to all data pairs; and 3) it is easy to implement, in contrast to existing approaches that rely on expensive optimization such as semidefinite programming. Our constrained metric learning approach is widely applicable without being limited to particular problem settings. Quantitative experiments on classification and retrieval tasks demonstrate the effectiveness of the proposed approach.
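The gap criterion in feature 1) has a simple closed-form sketch: take the difference of the average dissimilar-pair and similar-pair scatter matrices, then project onto the positive-semidefinite cone so the result is a valid Mahalanobis metric. This is an illustrative construction under those assumptions, not the paper's full algorithm; `gap_metric` is a hypothetical name.

```python
import numpy as np

def gap_metric(X, similar, dissimilar):
    """Metric M favoring a large gap: average dissimilar-pair scatter
    minus average similar-pair scatter, projected onto the PSD cone."""
    def scatter(pairs):
        D = np.array([X[i] - X[j] for i, j in pairs])
        return D.T @ D / len(pairs)               # mean outer product
    C = scatter(dissimilar) - scatter(similar)    # the distance "gap" direction
    w, V = np.linalg.eigh(C)
    return (V * np.clip(w, 0, None)) @ V.T        # drop negative eigenvalues

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 4))
M = gap_metric(X, similar=[(0, 1), (2, 3)], dissimilar=[(0, 5), (1, 7)])
print(np.linalg.eigvalsh(M).min() >= -1e-9)       # M is positive semidefinite
```

With such an `M`, the squared distance `(x - y).T @ M @ (x - y)` tends to be larger for dissimilar pairs than for similar ones, which is the gap the abstract maximizes.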

IJCAI Conference 2009 Conference Paper

  • Wei Liu
  • Buyue Qian
  • Jingyu Cui
  • Jianzhuang Liu

Typical graph-theoretic approaches for semi-supervised classification infer the labels of unlabeled instances with the help of graph Laplacians. Founded on the spectral decomposition of the graph Laplacian, this paper learns a kernel matrix by minimizing the leave-one-out classification error on the labeled instances. To this end, an efficient algorithm based on linear programming is presented, resulting in a transductive spectral kernel. The idea of our algorithm stems from regularization methodology and also has a nice interpretation in terms of spectral clustering. A simple classifier can be readily built upon the learned kernel, which suffices to give predictions for any data point beyond those in the available dataset. Besides this usage, the spectral kernel can be used effectively in tandem with conventional kernel machines such as SVMs. We demonstrate the efficacy of the proposed algorithm through experiments on challenging classification tasks.
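The "kernel from the spectral decomposition of the graph Laplacian" construction can be sketched directly: keep the smoothest Laplacian eigenvectors and reweight them. Here the weights follow a fixed decay for illustration, whereas the paper instead optimizes them against the leave-one-out error via linear programming; `spectral_kernel` and the decay form are assumptions.

```python
import numpy as np

def spectral_kernel(W, r=5, decay=2.0):
    """Kernel built from the r smoothest eigenvectors of the graph
    Laplacian, with larger weight on smoother (low-eigenvalue) ones."""
    L = np.diag(W.sum(1)) - W              # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)         # eigenvalues in ascending order
    mu = 1.0 / (1.0 + decay * vals[:r])    # fixed decaying spectral weights
    U = vecs[:, :r]
    return (U * mu) @ U.T                  # K = U diag(mu) U^T

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], float)        # toy adjacency matrix
K = spectral_kernel(W, r=3)
print(np.allclose(K, K.T))                 # symmetric PSD kernel matrix
```

Such a `K` can be handed to any kernel machine (e.g. an SVM), which matches the "in tandem with conventional kernel machines" usage the abstract mentions.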

IJCAI Conference 2007 Conference Paper

  • Wei Liu
  • Xiaoou Tang
  • Jianzhuang Liu

This paper develops a statistical inference approach, Bayesian Tensor Inference, for style transformation between photo images and sketch images of human faces. Motivated by the rationale that image appearance is determined by two cooperating factors, image content and image style, we first model the interaction between these factors by learning a patch-based tensor model. Second, by introducing a common variation space, we capture the inherent connection between the photo-patch space and the sketch-patch space, thus building a bidirectional mapping between the two spaces. Subsequently, we formulate a Bayesian approach accounting for the statistical inference from sketches to their corresponding photos in terms of the learned tensor model. Comparative experiments contrast the proposed method with state-of-the-art facial sketch synthesis algorithms in a novel face hallucination scenario: sketch-based facial photo hallucination. The encouraging results convincingly validate the effectiveness of our method.