Arrow Research search

Author name cluster

Yang Yu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

160 papers
2 author rows

Possible papers (160)

AAAI Conference 2026 Conference Paper

An LLM-based Quantitative Framework for Evaluating High-Stealthy Backdoor Risks in OSS Supply Chains

  • Zihe Yan
  • Kai Luo
  • Haoyu Yang
  • Yang Yu
  • Zhuosheng Zhang
  • Guancheng Li

In modern software development workflows, the open-source software supply chain significantly contributes to efficient and convenient engineering practices. With increasing system complexity, it has become a common practice to use open-source software as third-party dependencies. However, due to the lack of maintenance for underlying dependencies and insufficient community auditing, ensuring the security of source code and the legitimacy of repository maintainers has become a challenge, particularly in the context of high-stealth backdoor attacks such as the XZ-Util incident. To address these problems, we propose a fine-grained project evaluation framework for backdoor risk assessment in open-source software. Our evaluation framework models highly stealthy backdoor attacks from the attacker’s perspective and defines targeted metrics for each attack stage. Moreover, to overcome the limitations of static analysis in assessing the reliability of repository maintenance activities, such as irregular committer privilege escalation and insufficient review participation, we employ large language models (LLMs) to perform semantic evaluation of code repositories while avoiding reliance on manually crafted patterns. The effectiveness of our framework is validated on 66 high-priority packages in the Debian ecosystem, and the experimental results reveal that the current open-source software supply chain is exposed to a series of security risks.

AAAI Conference 2026 Conference Paper

Exploring Reliable Spatiotemporal Dependencies for Efficient Visual Tracking

  • Junze Shi
  • Yang Yu
  • Jian Shi
  • Haibo Luo

Recent advances in transformer-based lightweight object tracking have established new standards across benchmarks, leveraging the global receptive field and powerful feature extraction capabilities of attention mechanisms. Despite these achievements, existing methods universally employ sparse sampling during training—utilizing only one template and one search image per sequence—which fails to comprehensively explore spatiotemporal information in videos. This limitation constrains performance and widens the gap between lightweight and high-performance trackers. To bridge this divide while maintaining real-time efficiency, we propose STDTrack, a framework that pioneers the integration of reliable spatiotemporal dependencies into lightweight trackers. Our approach implements dense video sampling to maximize spatiotemporal information utilization. We introduce a temporally propagating spatiotemporal token to guide per-frame feature extraction. To ensure comprehensive target state representation, we design the Multi-frame Information Fusion Module (MFIFM), which augments current dependencies using historical context. The MFIFM operates on features stored in our constructed Spatiotemporal Token Maintainer (STM), where a quality-based update mechanism ensures information reliability. Considering the scale variation among tracking targets, we develop a multi-scale prediction head to dynamically adapt to objects of different sizes. Extensive experiments demonstrate state-of-the-art results across six benchmarks. Notably, on GOT-10k, STDTrack rivals certain high-performance non-real-time trackers (e.g., MixFormer) while operating at 192 FPS (GPU) and 41 FPS (CPU).

AAAI Conference 2026 Conference Paper

HEV Generative Sandbox: A Framework for Assessing Domain-Specific Social Risks Through Human-LLM Simulation

  • Yiran Liu
  • Zhiyi Hou
  • Xiaoang Xu
  • Shuo Wang
  • Huijia Wu
  • Kaicheng Yu
  • Yang Yu
  • ChengXiang Zhai

Deploying Large Language Models (LLMs) in specialized domains introduces significant societal and compliance risks, including bias amplification, misinformation propagation, and privacy violations. These risks predominantly emerge from the dynamic interactions between LLMs and humans in specific contexts. Different domains face unique distributions of hazards, and varying interaction modalities introduce distinct levels of exposure and vulnerability. However, current risk assessment frameworks lack a systematic methodology to capture this dynamic interplay. In this work, we introduce the HEV Generative Sandbox, a novel risk evaluation framework that simulates human-LLM behavior to quantify domain-contextual risks across three interdependent dimensions: 1) Hazard (H): Domain-specific threats inherent to a given context; 2) Exposure (E): The extent to which the LLM and its users are subjected to hazardous scenarios; 3) Vulnerability (V): The susceptibility of the system to risk due to human interaction or model weaknesses. Our approach pioneers "domain-rooted scenario generation", wherein we sample contextual distributions from domain-specific corpora and simulate diverse inputs. By unifying dynamic scenario simulation, causal risk decomposition, and closed-loop evaluation, the HEV Generative Sandbox provides a scalable, domain-sensitive methodology for responsible LLM deployment. This work contributes to advancing the safe deployment of LLMs by providing a comprehensive and automated risk evaluation framework.

AAAI Conference 2026 Conference Paper

Multi-agent In-context Coordination via Decentralized Memory Retrieval

  • Tao Jiang
  • Zichuan Lin
  • Lihe Li
  • Yi-Chen Li
  • Cong Guan
  • Lei Yuan
  • Zongzhang Zhang
  • Yang Yu

Large transformer models, trained on diverse datasets, have demonstrated impressive few-shot performance on previously unseen tasks without requiring parameter updates. This capability has also been explored in Reinforcement Learning (RL), where agents interact with the environment to retrieve context and maximize cumulative rewards, showcasing strong adaptability in complex settings. However, in cooperative Multi-Agent Reinforcement Learning (MARL), where agents must coordinate toward a shared goal, decentralized policy deployment can lead to mismatches in task alignment and reward assignment, limiting the efficiency of policy adaptation. To address this challenge, we introduce Multi-agent In-context Coordination via Decentralized Memory Retrieval (MAICC), a novel approach designed to enhance coordination by fast adaptation. Our method involves training a centralized embedding model to capture fine-grained trajectory representations, followed by decentralized models that approximate the centralized one to obtain team-level task information. Based on the learned embeddings, relevant trajectories are retrieved as context, which, combined with the agents' current sub-trajectories, inform decision-making. During decentralized execution, we introduce a novel memory mechanism that effectively balances test-time online data with offline memory. Based on the constructed memory, we propose a hybrid utility score that incorporates both individual- and team-level returns, ensuring credit assignment across agents. Extensive experiments on cooperative MARL benchmarks, including Level-Based Foraging (LBF) and SMAC (v1/v2), show that MAICC enables faster adaptation to unseen tasks compared to existing methods.

JBHI Journal 2026 Journal Article

Point-Supervised Coronary Semantic Segmentation in X-Ray Angiographic Images

  • Ying Chen
  • Danni Ai
  • Jianyu Du
  • Yuanyuan Wang
  • Tianyu Fu
  • Deqiang Xiao
  • Yucong Lin
  • Long Shao

Coronary semantic segmentation in X-ray angiography is essential for computer-aided diagnosis and treatment planning of coronary artery disease (CAD). Despite its importance, this task remains highly challenging due to the complex and interconnected vascular topology, as well as the similar visual characteristics among different branches, making dense pixel-level manual annotation difficult and labor-intensive. To alleviate this burden, we propose a point-supervised coronary semantic segmentation framework that significantly reduces annotation effort without compromising segmentation accuracy. The primary challenge of point-label-based supervision lies in the model's tendency to overfit sparse point labels, leading to limited generalization to pixel-level predictions. To enrich the supervision signals and stabilize the training process with the sparse point labels, we propose an adaptive foreground mask generation module and a region regularization strategy to ensure accurate semantic guidance while maximizing meaningful coverage of the vascular structures. To enhance coronary topology perception and branch differentiation, we propose a multi-task learning framework that jointly performs keypoint detection and coronary semantic segmentation through a shared feature extraction encoder and two task-specific decoders. The experimental results demonstrate that our point-supervised model achieves performance comparable to the fully supervised model, and outperforms the existing state-of-the-art point-supervised semantic segmentation methods.

AAAI Conference 2026 Conference Paper

Reward Model Evaluation via Automatically-Ranked Policy Alignment

  • Aoran Wang
  • Lei Ou
  • Yang Yu
  • Zongzhang Zhang

Evaluating reward models is a fundamental challenge in Reinforcement Learning (RL), particularly in settings where the reward model is learned or manually designed. The standard paradigm for Reward Model Evaluation (RME) involves training an optimal policy via RL on the given reward model and assessing model quality through the performance of the resulting policy. However, this approach conflates the quality of the reward model with the effectiveness of RL training, and is computationally expensive due to the need for policy optimization. Recent RME methods attempt to circumvent this issue by evaluating reward models directly, without RL, but often rely on impractical assumptions such as access to a ground-truth reward or fail to utilize available supervision in a fine-grained manner. To overcome these limitations, we propose the Policy Preference Alignment Coefficient (PPAC), a novel metric for RME that requires neither RL training nor ground-truth rewards. PPAC first generates a sequence of automatically ranked policy preferences that guarantee monotonic improvement in the policy value, and then quantifies the alignment between these generated preferences and those implied by the candidate reward model. Experimental results across gridworld and continuous control tasks demonstrate that PPAC yields preference sequences with consistently increasing policy values and outperforms existing metrics in evaluating reward model quality.

JBHI Journal 2026 Journal Article

RIHA: Report-Image Hierarchical Alignment for Radiology Report Generation

  • Yucheng Chen
  • Yang Yu
  • Yufei Shi
  • Conghao Xiong
  • Xulei Yang
  • Si Yong Yeo

Radiology report generation (RRG) has emerged as a promising approach to alleviate radiologists' workload and reduce human errors by automatically generating diagnostic reports from medical images. A key challenge in RRG is achieving fine-grained alignment between complex visual features and the hierarchical structure of long-form radiology reports. Although recent methods have improved image-text representation learning, they often treat reports as flat sequences, overlooking their structured sections and semantic hierarchies. This simplification hinders precise cross-modal alignment and weakens RRG accuracy. To address this challenge, we propose RIHA (Report-Image Hierarchical Alignment Transformer), a novel end-to-end framework that performs multi-level alignment between radiological images and their corresponding reports across paragraph, sentence, and word levels. This hierarchical alignment enables more precise cross-modal mapping, essential for capturing the nuanced semantics embedded in clinical narratives. Specifically, RIHA introduces a Visual Feature Pyramid (VFP) to extract multi-scale visual features and a Text Feature Pyramid (TFP) to represent multi-granularity textual structures. These components are integrated through a Cross-modal Hierarchical Alignment (CHA) module, leveraging optimal transport to effectively align visual and textual features across various levels. Furthermore, we incorporate Relative Positional Encoding (RPE) into the decoder to model spatial and semantic relationships among tokens, enhancing the token-level alignment between visual features and generated text. Extensive experiments on two benchmark chest X-ray datasets, IU-Xray and MIMIC-CXR, demonstrate that RIHA outperforms existing state-of-the-art models in both natural language generation and clinical efficacy metrics.

JBHI Journal 2026 Journal Article

Simultaneous Decoding of Wrist Angles and Grasp Forces Based on Channel-Wise Cumulative Spike Trains

  • Yang Yu
  • Yang Xu
  • Jiamin Zhao
  • Dongxuan Li
  • Weichao Guo
  • Xinjun Sheng
  • Xiangyang Zhu

Understanding the underlying mechanism of the neuromuscular system in motion/force generation is essential for human-machine interfacing. However, simultaneous decoding of wrist angles and grasp forces from neural signals remains an open challenge in the field of neural interfacing. In this study, we proposed a scheme leveraging channel-wise cumulative spike trains (cw-CSTs) of motor units to simultaneously decode wrist angles and grasp forces. Specifically, a spatial spike detection method was utilized to detect cw-CSTs from surface electromyography, observing as many motor unit activities as possible. Accordingly, we extracted three neural features to drive the decoders: a twitch force model-based feature (cw-MUdrive) and a discharge rate-based feature (DR-cwCST), both derived from cw-CSTs, and the discharge rate of motor units (DR-MUST) decomposed by a conventional blind source separation algorithm. Wrist- and hand-specific decoders were built to estimate wrist angles and grasp forces via Gaussian process regression. Experiments were conducted with ten subjects, in which they activated wrist motions and grasp forces concurrently. We evaluated the performance with both accuracy and output stability. Results demonstrated that the cwCST-based neural features outperformed the conventional DR-MUST features with both higher accuracy and stability metrics. Additionally, cw-MUdrive performed better than DR-cwCST in grasp force estimation and comparably to DR-cwCST in wrist angle estimation. The outcome provides an effective solution for simultaneously decoding wrist movements and hand grasp forces, promoting the development of natural control in neural interfaces.

AAAI Conference 2026 Conference Paper

SurgPub-Video: A Comprehensive Surgical Video Framework for Enhanced Surgical Intelligence in Vision-Language Model

  • Yaoqian Li
  • Xikai Yang
  • Dunyuan Xu
  • Yang Yu
  • Litao Zhao
  • Xiaowei Hu
  • Jinpeng Li
  • Pheng-Ann Heng

Vision-Language Models (VLMs) have shown significant potential in surgical scene analysis, yet existing models are limited by frame-level datasets and lack high-quality video data with procedural surgical knowledge. To address these challenges, we make the following contributions: (i) SurgPub-Video, a comprehensive dataset of over 3,000 surgical videos and 25 million annotated frames across 11 specialities, sourced from peer-reviewed clinical journals, (ii) SurgLLaVA-Video, a specialized VLM for surgical video understanding, built upon the TinyLLaVA-Video architecture that supports both video-level and frame-level inputs, and (iii) a video-level surgical Visual Question Answering (VQA) benchmark, covering 11 diverse surgical specialities, such as vascular, cardiology, and thoracic. Extensive experiments, conducted on the proposed benchmark and three additional surgical downstream tasks (action recognition, skill assessment, and triplet recognition), show that SurgLLaVA-Video significantly outperforms both general-purpose and surgical-specific VLMs with only three billion parameters.

NeurIPS Conference 2025 Conference Paper

Adaptable Safe Policy Learning from Multi-task Data with Constraint Prioritized Decision Transformer

  • Ruiqi Xue
  • Ziqian Zhang
  • Lihe Li
  • Cong Guan
  • Lei Yuan
  • Yang Yu

Learning safe reinforcement learning (RL) policies from offline multi-task datasets without direct environmental interaction is crucial for efficient and reliable deployment of RL agents. Benefiting from their scalability and strong in-context learning capabilities, recent approaches attempt to utilize Decision Transformer (DT) architectures for offline safe RL, demonstrating promising adaptability across varying safety budgets. However, these methods primarily focus on single-constraint scenarios and struggle with diverse constraint configurations across multiple tasks. Additionally, their reliance on heuristically defined Return-To-Go (RTG) inputs limits flexibility and reduces learning efficiency, particularly in complex multi-task environments. To address these limitations, we propose CoPDT, a novel DT-based framework designed to enhance adaptability to diverse constraints and varying safety budgets. Specifically, CoPDT introduces a constraint prioritized prompt encoder, which leverages sparse binary cost signals to accurately identify constraints, and a constraint prioritized Return-To-Go (CPRTG) token mechanism, which dynamically generates RTGs based on identified constraints and corresponding safety budgets. Extensive experiments on the OSRL benchmark demonstrate that CoPDT achieves superior efficiency and significantly enhanced safety compliance across diverse multi-task scenarios, surpassing state-of-the-art DT-based methods by satisfying safety constraints in more than twice as many tasks.

JBHI Journal 2025 Journal Article

Cognitive Load Prediction From Multimodal Physiological Signals Using Multiview Learning

  • Yingxin Liu
  • Yang Yu
  • Hong Tao
  • Zeqi Ye
  • Si Wang
  • Hao Li
  • Dewen Hu
  • Zongtan Zhou

Predicting cognitive load is a crucial issue in the emerging field of human-computer interaction and holds significant practical value, particularly in flight scenarios. Although previous studies have realized efficient cognitive load classification, new research is still needed to adapt the current state-of-the-art multimodal fusion methods. Here, we proposed a feature selection framework based on multiview learning to address the challenges of information redundancy and reveal the common physiological mechanisms underlying cognitive load. Specifically, the multimodal signal features [electroencephalogram (EEG), electrodermal activity (EDA), electrocardiogram (ECG), electrooculogram (EOG), & eye movements] at three cognitive load levels were estimated during multiattribute task battery (MATB) tasks performed by 22 healthy participants and fed into a feature selection-multiview classification with cohesion and diversity (FS-MCCD) framework. The optimized feature set was extracted from the original feature set by integrating the weight of each view and the feature weights to formulate the ranking criteria. The cognitive load prediction model, evaluated using real-time classification results, achieved an average accuracy of 81.08% and an average F1-score of 80.94% for three-class classification among 22 participants. Furthermore, the weights of the physiological signal features revealed the physiological mechanisms related to cognitive load. Specifically, heightened cognitive load was linked to amplified δ and θ power in the frontal lobe, reduced α power in the parietal lobe, and an increase in pupil diameter. Thus, the proposed multimodal feature fusion framework emphasizes the effectiveness and efficiency of using these features to predict cognitive load.

ICRA Conference 2025 Conference Paper

Dual-AEB: Synergizing Rule-Based and Multimodal Large Language Models for Effective Emergency Braking

  • Wei Zhang 0012
  • Pengfei Li 0007
  • Junli Wang
  • Bingchuan Sun
  • Qihao Jin
  • Guangjun Bao
  • Shibo Rui
  • Yang Yu

Automatic Emergency Braking (AEB) systems are a crucial component in ensuring the safety of passengers in autonomous vehicles. Conventional AEB systems primarily rely on closed-set perception modules to recognize traffic conditions and assess collision risks. To enhance the adaptability of AEB systems in open scenarios, we propose Dual-AEB, a system that combines an advanced multimodal large language model (MLLM) for comprehensive scene understanding and a conventional rule-based rapid AEB to ensure quick response times. To the best of our knowledge, Dual-AEB is the first method to incorporate MLLMs within AEB systems. Through extensive experimentation, we have validated the effectiveness of our method. Codes will be publicly available at https://github.com/ChipsICU/Dual-AEB.

TMLR Journal 2025 Journal Article

Efficient Multi-Agent Cooperation Learning through Teammate Lookahead

  • Feng Chen
  • Xinwei Chen
  • Rong-Jun Qin
  • Cong Guan
  • Lei Yuan
  • Zongzhang Zhang
  • Yang Yu

Cooperative Multi-Agent Reinforcement Learning (MARL) is a rapidly growing research field that has achieved outstanding results across a variety of challenging cooperation tasks. However, existing MARL algorithms typically overlook the concurrent updates of teammate agents. An agent learns from data gathered while cooperating with one set of (current) teammates, but then practices with another set of (updated) teammates. This phenomenon, termed "teammate delay", leads to a discrepancy between the agent's learning objective and the actual evaluation scenario, which can degrade learning stability and efficiency. In this paper, we tackle this challenge by introducing a lookahead strategy that enables agents to learn to cooperate with predicted future teammates, allowing the explicit awareness of concurrent teammate updates. This lookahead strategy is designed to seamlessly integrate with existing policy-gradient-based MARL methods, enhancing their performance without significant modifications to their underlying structures. The extensive experiments demonstrate the effectiveness of this approach, showing that the lookahead strategy can enhance the cooperation learning efficiency and achieve superior performance over the state-of-the-art MARL algorithms.

NeurIPS Conference 2025 Conference Paper

Focus-Then-Reuse: Fast Adaptation in Visual Perturbation Environments

  • Jiahui Wang
  • Chao Chen
  • Jiacheng Xu
  • Zongzhang Zhang
  • Yang Yu

Visual reinforcement learning has shown promise in various real-world applications. However, deploying policies in complex real-world environments with visual perturbations remains a significant challenge. We notice that humans tend to filter information at the object level prior to decision-making, facilitating efficient skill transfer across different contexts. Inspired by this, we introduce Focus-Then-Reuse (FTR), a method utilizing a novel object selection mechanism to focus on task-relevant objects, and directly reuse the simulation-trained policy on them. The training of the object selection mechanism integrates prior knowledge from a vision-language model and feedback from the environment. Experimental results on challenging tasks based on DeepMind Control Suite and Franka Emika Robotics demonstrate that FTR enables rapid adaptation in visual perturbation environments and achieves state-of-the-art performance. The source code is available at https://github.com/LAMDA-RL/FTR.

JBHI Journal 2025 Journal Article

Fourier-Based Frequency Space Disentanglement and Augmentation for Generalizable Face Anti-Spoofing

  • Yang Yu
  • Zhekai Du
  • Heng Luo
  • Chengwei Xiao
  • Jiang Hu

Generalizing face anti-spoofing (FAS) models to unseen distributions is challenging due to domain shifts. Previous domain generalization (DG) based FAS methods focus on learning invariant features across domains in the spatial space, which may be ineffective in detecting subtle spoof patterns. In this paper, we propose a novel approach called Frequency Space Disentanglement and Augmentation (FSDA) for generalizable FAS. Specifically, we leverage Fourier transformation to analyze face images in the frequency space, where the amplitude spectrum captures low-level texture information that forms distinct visual appearances, and the phase spectrum corresponds to the content information. We hypothesize that the liveness of a face is more related to these low-level patterns rather than high-level content information. To locate spoof traces, we disentangle the amplitude spectrum into domain-related and spoof-related components using either empirical or learnable strategies. We then propose a frequency space augmentation technique that mixes the disentangled components of two images to synthesize new variations. By imposing a distillation loss and a consistency loss on the augmented samples, our model learns to capture spoof patterns that are robust to both domain and spoof type variations. Extensive experiments on four FAS datasets demonstrate the superiority of our method in improving the generalization ability of FAS models in various unseen scenarios.

NeurIPS Conference 2025 Conference Paper

Geometric Mixture Models for Electrolyte Conductivity Prediction

  • Anyi Li
  • Jiacheng Cen
  • Songyou Li
  • Mingze Li
  • Yang Yu
  • Wenbing Huang

Accurate prediction of ionic conductivity in electrolyte systems is crucial for advancing numerous scientific and technological applications. While significant progress has been made, current research faces two fundamental challenges: (1) the lack of high-quality standardized benchmarks, and (2) inadequate modeling of geometric structure and intermolecular interactions in mixture systems. To address these limitations, we first reorganize and enhance the CALiSol and DiffMix electrolyte datasets by incorporating geometric graph representations of molecules. We then propose GeoMix, a novel geometry-aware framework that preserves Set-SE(3) equivariance—an essential but challenging property for mixture systems. At the heart of GeoMix lies the Geometric Interaction Network (GIN), an equivariant module specifically designed for intermolecular geometric message passing. Comprehensive experiments demonstrate that GeoMix consistently outperforms diverse baselines (including MLPs, GNNs, and geometric GNNs) across both datasets, validating the importance of cross-molecular geometric interactions and equivariant message passing for accurate property prediction. This work not only establishes new benchmarks for electrolyte research but also provides a general geometric learning framework that advances modeling of mixture systems in energy materials, pharmaceutical development, and beyond.

AAAI Conference 2025 Conference Paper

GRAIN: Multi-Granular and Implicit Information Aggregation Graph Neural Network for Heterophilous Graphs

  • Songwei Zhao
  • Yuan Jiang
  • Zijing Zhang
  • Yang Yu
  • Hechang Chen

Graph neural networks (GNNs) have shown significant success in learning graph representations. However, recent studies reveal that GNNs often fail to outperform simple MLPs on heterophilous graph tasks, where connected nodes may differ in features or labels, challenging the homophily assumption. Existing methods addressing this issue often overlook the importance of information granularity and rarely consider implicit relationships between distant nodes. To overcome these limitations, we propose the Granular and Implicit Graph Network (GRAIN), a novel GNN model specifically designed for heterophilous graphs. GRAIN enhances node embeddings by aggregating multi-view information at various granularity levels and incorporating implicit data from distant, non-neighboring nodes. This approach effectively integrates local and global information, resulting in smoother, more accurate node representations. We also introduce an adaptive graph information aggregator that efficiently combines multi-granularity and implicit data, significantly improving node representation quality, as shown by experiments on 13 datasets covering varying homophily and heterophily. GRAIN consistently outperforms 12 state-of-the-art models, excelling on both homophilous and heterophilous graphs.

AAAI Conference 2025 Conference Paper

GuideNER: Annotation Guidelines Are Better than Examples for In-Context Named Entity Recognition

  • Shizhou Huang
  • Bo Xu
  • Yang Yu
  • Changqun Li
  • Xin Alex Lin

Large language models (LLMs) demonstrate impressive performance on downstream tasks through in-context learning (ICL). However, there is a significant gap between their performance on Named Entity Recognition (NER) and that of fine-tuning methods. We believe this discrepancy is due to inconsistencies in labeling definitions in NER. In addition, recent research indicates that LLMs do not learn the specific input-label mappings from the demonstrations. Therefore, we argue that using examples to implicitly capture the mapping between inputs and labels in in-context learning is not suitable for NER. Instead, it requires explicitly informing the model of the range of entities contained in the labels, such as annotation guidelines. In this paper, we propose GuideNER, which uses LLMs to summarize concise annotation guidelines as contextual information in ICL. We have conducted experiments on widely used NER datasets, and the experimental results indicate that our method can consistently and significantly outperform state-of-the-art methods, while using shorter prompts. Especially on the GENIA dataset, our model outperforms the previous state-of-the-art model by 12.63 F1 points.

AAMAS Conference 2025 Conference Paper

InCLET: Large Language Model In-context Learning can Improve Embodied Instruction-following

  • Peng-Yuan Wang
  • Jing-Cheng Pang
  • Chen-Yang Wang
  • Xuhui Liu
  • Tian-Shuo Liu
  • Si-Hang Yang
  • Hong Qian
  • Yang Yu

Natural language-conditioned reinforcement learning (NLC-RL) empowers embodied agents to complete various tasks following human instruction. However, the unbounded natural language examples still introduce much complexity for the agent that solves concrete RL tasks, which can distract policy learning from completing the task. Consequently, extracting effective task representation from human instruction emerges as the critical component of NLC-RL. While previous methods have attempted to address this issue by learning task-related representation using large language models (LLMs), they highly rely on pre-collected task data and require an extra training procedure. In this study, we uncover the inherent capability of LLMs to generate task representations and present a novel method, in-context learning embedding as task representation (InCLET). InCLET is grounded on a foundational finding that LLM in-context learning using trajectories can greatly help represent tasks. We thus first employ the LLM to imagine task trajectories following the natural language instruction, then use in-context learning of the LLM to generate task representations, and finally aggregate and project them into a compact low-dimensional task representation. This representation is then used to train a human instruction-following agent. We conduct experiments on various embodied control environments and results show that InCLET creates effective task representations. Furthermore, this representation can significantly improve the RL training efficiency, compared to the baseline methods.

TMLR Journal 2025 Journal Article

Interactive Large Language Models for Reliable Answering under Incomplete Context

  • Jing-Cheng Pang
  • Heng-Bo Fan
  • Pengyuan Wang
  • Jia-Hao Xiao
  • Nan Tang
  • Si-Hang Yang
  • Chengxing Jia
  • Ming-Kun Xie

The rise of large language models (LLMs) has revolutionized the way humans interact with artificial intelligence systems. However, their reliability in sensitive applications, such as personal consultations or clinical decision-making, remains limited. A critical shortfall lies in LLMs' inherent lack of interactivity: these models generate responses even when essential context or domain-specific knowledge is absent, risking inaccurate or misleading outputs. A potential approach to mitigate this issue is to enable LLMs to pose clarifying questions, thereby uncovering the missing information required to provide accurate responses. However, previous methods tend to greedily prompt LLMs to ask questions, which burdens the user with potentially irrelevant questions and makes the system less flexible. In this paper, we introduce LaMSeI (Language Model with Selective Interaction), a method that enhances LLMs' ability to judge when interaction is necessary under ambiguous or incomplete contexts. The idea behind LaMSeI is to measure the LLM's uncertainty about the user query and interact with the user only when the uncertainty is high. Additionally, we incorporate active learning techniques to select the most informative questions from a pool of candidates, effectively uncovering the missing context. Our empirical studies across various challenging question-answering benchmarks, where LLMs are posed queries with incomplete context, demonstrate the effectiveness of LaMSeI: it improves answer accuracy from 31.9% to 50.9%, outperforming other leading question-answering frameworks. Moreover, in experiments involving human participants, LaMSeI consistently generates answers superior or comparable to baselines in more than 82% of cases. Finally, we verify the performance of LaMSeI on various LLMs, such as LLAMA2, LLAMA3, Vicuna, and GPT-3.5, highlighting its capability to improve interactive language models.

ICLR Conference 2025 Conference Paper

LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch

  • Caigao Jiang
  • Xiang Shu
  • Hong Qian
  • Xingyu Lu
  • Jun Zhou
  • Aimin Zhou
  • Yang Yu

Optimization problems are prevalent across various scenarios. Formulating and then solving optimization problems described in natural language often requires highly specialized human expertise, which can block the widespread application of optimization-based decision making. To automate problem formulation and solving, leveraging large language models (LLMs) has emerged as a potential approach. However, this kind of approach suffers from the issue of optimization generalization: the accuracy of most current LLM-based methods and the generality of the optimization problem types they can model are still limited. In this paper, we propose a unified learning-based framework called LLMOPT to boost optimization generalization. Starting from natural language descriptions of optimization problems and a pre-trained LLM, LLMOPT constructs the introduced five-element formulation as a universal model for learning to define diverse optimization problem types. LLMOPT then employs multi-instruction tuning to enhance both problem formalization and solver code generation accuracy and generality. After that, to prevent hallucinations in LLMs, such as sacrificing solving accuracy to avoid execution errors, model alignment and a self-correction mechanism are adopted in LLMOPT. We evaluate the optimization generalization ability of LLMOPT and competing methods across six real-world datasets covering roughly 20 fields, such as health, environment, energy, and manufacturing. Extensive experimental results show that LLMOPT is able to model various optimization problem types, such as linear/nonlinear programming, mixed integer programming, and combinatorial optimization, and achieves a notable 11.08% average solving-accuracy improvement over state-of-the-art methods. The code is available at https://github.com/caigaojiang/LLMOPT.

NeurIPS Conference 2025 Conference Paper

Multi-Agent Imitation by Learning and Sampling from Factorized Soft Q-Function

  • Yi-Chen Li
  • Zhongxiang Ling
  • Tao Jiang
  • Fuxiang Zhang
  • Pengyuan Wang
  • Lei Yuan
  • Zongzhang Zhang
  • Yang Yu

Learning from multi-agent expert demonstrations, known as Multi-Agent Imitation Learning (MAIL), provides a promising approach to sequential decision-making. However, existing MAIL methods, including Behavior Cloning (BC) and Adversarial Imitation Learning (AIL), face significant challenges: BC suffers from the compounding-error issue, while the very nature of adversarial optimization makes AIL prone to instability. In this work, we propose Multi-Agent imitation by learning and sampling from FactorIzed Soft Q-function (MAFIS), a novel method that addresses these limitations in both online and offline MAIL settings. Built upon the single-agent IQ-Learn framework, MAFIS introduces a value decomposition network to factorize the imitation objective at the agent level, thus enabling scalable training for multi-agent systems. Moreover, we observe that the soft Q-function implicitly defines the optimal policy as an energy-based model, from which we can sample actions via stochastic gradient Langevin dynamics. This allows us to estimate the gradient of the factorized optimization objective for continuous control tasks, avoiding the adversarial optimization between the soft Q-function and the policy required by prior work. By doing so, we obtain a tractable and non-adversarial objective for both discrete and continuous multi-agent control. Experiments on common benchmarks, including the discrete control tasks StarCraft Multi-Agent Challenge v2 (SMACv2), Gold Miner, and Multi Particle Environments (MPE), as well as the continuous control task Multi-Agent MuJoCo (MaMuJoCo), demonstrate that MAFIS achieves superior performance compared with baselines. Our code is available at https://github.com/LAMDA-RL/MAFIS.

NeurIPS Conference 2025 Conference Paper

TCM-Ladder: A Benchmark for Multimodal Question Answering on Traditional Chinese Medicine

  • Jiacheng Xie
  • Yang Yu
  • Ziyang Zhang
  • Shuai Zeng
  • Jiaxuan He
  • Ayush Vasireddy
  • Xiaoting Tang
  • Congyu Guo

Traditional Chinese Medicine (TCM), as an effective alternative medicine, has been receiving increasing attention. In recent years, the rapid development of large language models (LLMs) tailored for TCM has highlighted the urgent need for an objective and comprehensive evaluation framework to assess their performance on real-world tasks. However, existing evaluation datasets are limited in scope and primarily text-based, lacking a unified and standardized multimodal question-answering (QA) benchmark. To address this issue, we introduce TCM-Ladder, the first comprehensive multimodal QA dataset specifically designed for evaluating large TCM language models. The dataset covers multiple core disciplines of TCM, including fundamental theory, diagnostics, herbal formulas, internal medicine, surgery, pharmacognosy, and pediatrics. In addition to textual content, TCM-Ladder incorporates various modalities such as images and videos. The dataset was constructed using a combination of automated and manual filtering processes and comprises over 52,000 questions, including single-choice, multiple-choice, fill-in-the-blank, diagnostic dialogue, and visual comprehension tasks. We trained a reasoning model on TCM-Ladder and conducted comparative experiments against nine state-of-the-art general-domain and five leading TCM-specific LLMs to evaluate their performance on the dataset. Moreover, we propose Ladder-Score, an evaluation method specifically designed for TCM question answering that effectively assesses answer quality in terms of terminology usage and semantic expression. To the best of our knowledge, this is the first work to systematically evaluate mainstream general-domain and TCM-specific LLMs on a unified multimodal benchmark. The datasets and leaderboard are publicly available at https://tcmladder.com and will be continuously updated. The source code is available at https://github.com/orangeshushu/TCM-Ladder.

NeurIPS Conference 2025 Conference Paper

Uncertainty-Sensitive Privileged Learning

  • Fan-Ming Luo
  • Lei Yuan
  • Yang Yu

Privileged learning efficiently tackles high-dimensional, partially observable decision-making problems by first training a privileged policy (PP) on low-dimensional privileged observations, and then deriving a deployment policy (DP) either by imitating the PP or coupling it with an observation encoder. However, since the DP relies on local and partial observations, a behavioral divergence (BD) often emerges between the DP and the PP, ultimately degrading deployment performance. A promising strategy is to train a PP to learn the optimal behaviors attainable under the DP's observation space by applying reward penalties in regions with large BD. However, producing these behaviors is challenging for the PP because they rely on the DP's information-gathering progress, which is invisible to the PP. In this paper, we quantify the DP's information-gathering progress by estimating the prediction uncertainty of privileged observations reconstructed from partial observations, and accordingly propose the framework of Uncertainty-Sensitive Privileged Learning (USPL). USPL feeds this uncertainty estimation to the PP and combines reward transformation with privileged-observation blurring, driving the PP to choose actions that actively reduce uncertainty and thus gather the necessary information. Experiments across nine tasks demonstrate that USPL significantly reduces the behavioral discrepancies, achieving superior deployment performance compared to baselines. Additional visualization results show that the DP accurately quantifies its uncertainty, and the PP effectively adapts to uncertainty variations. Code is available at https://github.com/FanmingL/USPL.

AAAI Conference 2025 Conference Paper

VA-AR: Learning Velocity-Aware Action Representations with Mixture of Window Attention

  • Jiangning Wei
  • Lixiong Qin
  • Bo Yu
  • Tianjian Zou
  • Chuhan Yan
  • Dandan Xiao
  • Yang Yu
  • Lan Yang

Action recognition is a crucial task in artificial intelligence, with significant implications across various domains. We initially perform a comprehensive analysis of seven prominent action recognition methods across five widely-used datasets. This analysis reveals a critical, yet previously overlooked, observation: as the velocity of actions increases, the performance of these methods variably declines, undermining their robustness. This decline in performance poses significant challenges for their application in real-world scenarios. Building on these findings, we introduce the Velocity-Aware Action Recognition (VA-AR) framework to obtain robust action representations across different velocities. Our principal insight is that rapid actions (e.g., the giant circle backward in uneven bars or a smash in badminton) occur within short time intervals, necessitating smaller temporal attention windows to accurately capture intricate changes. Conversely, slower actions (e.g., drinking water or wiping face) require larger windows to effectively encompass the broader context. VA-AR employs a Mixture of Window Attention (MoWA) strategy, dynamically adjusting its attention window size based on the action's velocity. This adjustment enables VA-AR to obtain a velocity-aware representation, thereby enhancing the accuracy of action recognition. Extensive experiments confirm that VA-AR achieves state-of-the-art performance on the same five datasets, demonstrating VA-AR's effectiveness across a broad spectrum of action recognition scenarios.

AAAI Conference 2024 Conference Paper

ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning

  • Chen-Xiao Gao
  • Chenyang Wu
  • Mingjun Cao
  • Rui Kong
  • Zongzhang Zhang
  • Yang Yu

Decision Transformer (DT), which employs expressive sequence modeling techniques to perform action generation, has emerged as a promising approach to offline policy optimization. However, DT generates actions conditioned on a desired future return, which is known to bear some weaknesses such as the susceptibility to environmental stochasticity. To overcome DT's weaknesses, we propose to empower DT with dynamic programming. Our method comprises three steps. First, we employ in-sample value iteration to obtain approximated value functions, which involves dynamic programming over the MDP structure. Second, we evaluate action quality in context with estimated advantages. We introduce two types of advantage estimators, IAE and GAE, which are suitable for different tasks. Third, we train an Advantage-Conditioned Transformer (ACT) to generate actions conditioned on the estimated advantages. Finally, during testing, ACT generates actions conditioned on a desired advantage. Our evaluation results validate that, by leveraging the power of dynamic programming, ACT demonstrates effective trajectory stitching and robust action generation in spite of the environmental stochasticity, outperforming baseline methods across various benchmarks. Additionally, we conduct an in-depth analysis of ACT's various design choices through ablation studies. Our code is available at https://github.com/LAMDA-RL/ACT.
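
Of the two advantage estimators the abstract names, GAE is the standard generalized advantage estimator from the RL literature; IAE is specific to the paper and not sketched here. A minimal self-contained GAE sketch (toy rewards and values, not from the paper) is:

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for one episode.

    `values` has len(rewards) + 1 entries (bootstrap value last).
    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t); each advantage is a
    discounted sum of deltas with decay gamma * lam, computed backward.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

# Toy episode: three steps, made-up rewards and value estimates.
adv = gae([1.0, 0.0, 1.0], [0.5, 0.4, 0.3, 0.0])
print(adv)
```

Conditioning the transformer on these estimated advantages, rather than on a raw return-to-go, is what the abstract describes as the key change from DT.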

IJCAI Conference 2024 Conference Paper

ADMN: Agent-Driven Modular Network for Dynamic Parameter Sharing in Cooperative Multi-Agent Reinforcement Learning

  • Yang Yu
  • Qiyue Yin
  • Junge Zhang
  • Pei Xu
  • Kaiqi Huang

Parameter sharing is a common strategy in multi-agent reinforcement learning (MARL) to make training more efficient and scalable. However, applying parameter sharing among agents indiscriminately hinders the emergence of agent diversity and degrades the final cooperative performance. To better balance parameter sharing and agent diversity, we propose a novel Agent-Driven Modular Network (ADMN), where agents share a base network consisting of multiple specialized modules, and each agent has its own routing to connect these modules. In ADMN, modules are shared among agents to improve training efficiency, while the combination of different modules brings rich diversity. The agent routing at different time steps is learned end-to-end to achieve a dynamic and adaptive balance. We also propose an information-theoretic regularization between agents' routings and their behaviors to further guarantee the identifiability of different routings. We evaluated ADMN in challenging StarCraft micromanagement games and Google Research Football games, and the results demonstrate the superior performance of ADMN, particularly in larger or heterogeneous cooperative tasks.

AAAI Conference 2024 Conference Paper

ANEDL: Adaptive Negative Evidential Deep Learning for Open-Set Semi-supervised Learning

  • Yang Yu
  • Danruo Deng
  • Furui Liu
  • Qi Dou
  • Yueming Jin
  • Guangyong Chen
  • Pheng Ann Heng

Semi-supervised learning (SSL) methods assume that labeled data, unlabeled data, and test data are from the same distribution. Open-set semi-supervised learning (Open-set SSL) considers a more practical scenario, where unlabeled data and test data contain new categories (outliers) not observed in labeled data (inliers). Most previous works focused on outlier detection via binary classifiers, which suffer from insufficient scalability and an inability to distinguish different types of uncertainty. In this paper, we propose a novel framework, Adaptive Negative Evidential Deep Learning (ANEDL), to tackle these limitations. Concretely, we first introduce evidential deep learning (EDL) as an outlier detector to quantify different types of uncertainty, and design different uncertainty metrics for self-training and inference. Furthermore, we propose a novel adaptive negative optimization strategy, making EDL more tailored to the unlabeled dataset containing both inliers and outliers. As demonstrated empirically, our proposed method outperforms existing state-of-the-art methods across four datasets.

NeurIPS Conference 2024 Conference Paper

Bias and Volatility: A Statistical Framework for Evaluating Large Language Model's Stereotypes and the Associated Generation Inconsistency

  • Yiran Liu
  • Ke Yang
  • Zehan Qi
  • Xiao Liu
  • Yang Yu
  • ChengXiang Zhai

We present a novel statistical framework for analyzing stereotypes in large language models (LLMs) by systematically estimating the bias and variation in their generation. Current evaluation metrics in the alignment literature often overlook the randomness of stereotypes caused by the inconsistent generative behavior of LLMs. For example, this inconsistency can result in LLMs displaying contradictory stereotypes, including those related to gender or race, for identical professions across varied contexts. Neglecting such inconsistency can lead to misleading conclusions in alignment evaluations and hinder the accurate assessment of the risk of LLM applications perpetuating or amplifying social stereotypes and unfairness. This work proposes a Bias-Volatility Framework (BVF) that estimates the probability distribution function of LLM stereotypes. Specifically, since the stereotype distribution fully captures an LLM's generation variation, BVF enables the assessment of both the likelihood and the extent to which its outputs are against vulnerable groups, thereby allowing for the quantification of the LLM's aggregated discrimination risk. Furthermore, we introduce a mathematical framework to decompose an LLM's aggregated discrimination risk into two components: bias risk and volatility risk, originating from the mean and variation of the LLM's stereotype distribution, respectively. We apply BVF to assess 12 commonly adopted LLMs and compare their risk levels. Our findings reveal that: i) bias risk is the primary cause of discrimination risk in LLMs; ii) most LLMs exhibit significant pro-male stereotypes for nearly all careers; iii) alignment with reinforcement learning from human feedback lowers discrimination by reducing bias, but increases volatility; iv) discrimination risk in LLMs correlates with key socio-economic factors such as professional salaries.
Finally, we emphasize that BVF can also be used to assess other dimensions of generation inconsistency's impact on LLM behavior beyond stereotypes, such as knowledge mastery.
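
The decomposition of aggregated discrimination risk into bias (from the mean of the stereotype distribution) and volatility (from its variation) can be illustrated with a toy calculation; the scoring scheme and numbers below are hypothetical, not the paper's estimator:

```python
import statistics

def bias_volatility(scores):
    """Toy bias/volatility decomposition of stereotype scores.

    `scores` are hypothetical stereotype scores in [-1, 1] from repeated
    generations for the same prompt (sign encodes direction, e.g.
    pro-male vs. pro-female). Bias risk reflects the distribution's
    mean; volatility risk reflects its sample standard deviation.
    """
    bias = statistics.mean(scores)
    volatility = statistics.stdev(scores)
    return bias, volatility

# Repeated generations for one profession, scored by a hypothetical judge.
samples = [0.6, 0.4, 0.7, 0.5, 0.3]
bias, volatility = bias_volatility(samples)
print(f"bias risk: {bias:.2f}, volatility risk: {volatility:.3f}")
```

A model with near-zero bias but high volatility would still pose risk, since individual generations can land far from the mean, which is why the framework tracks both components.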

ICML Conference 2024 Conference Paper

Causality Based Front-door Defense Against Backdoor Attack on Language Models

  • Yiran Liu
  • Xiaoang Xu
  • Zhiyi Hou
  • Yang Yu

We have developed a new framework based on the theory of causal inference to protect language models against backdoor attacks. Backdoor attackers can poison language models with different types of triggers, such as words, sentences, grammar, and style, enabling them to selectively modify the decision-making of the victim model. However, existing defense approaches are only effective when the backdoor attack form meets specific assumptions, making it difficult to counter diverse backdoor attacks. We propose a new defense framework, Front-door Adjustment for Backdoor Elimination (FABE), based on causal reasoning that does not rely on assumptions about the form of triggers. This method effectively differentiates between spurious and legitimate associations by creating a 'front door' that maps out the actual causal relationships. The term 'front door' refers to a text that retains the semantic equivalence of the initial input, generated by an additional fine-tuned language model, denoted as the defense model. Our defense experiments against various attack methods at the token, sentence, and syntactic levels reduced the attack success rate from 93.63% to 15.12%, improving the defense effect by 2.91 times compared to the best baseline result of 66.61%, achieving state-of-the-art results. Through an ablation study, we analyzed the effect of each module in FABE, demonstrating the importance of complying with the front-door criterion and the front-door adjustment formula, which also explains why previous methods failed. Our code to reproduce the experiments is available at: https://github.com/lyr17/Frontdoor-Adjustment-Backdoor-Elimination.

IJCAI Conference 2024 Conference Paper

Continual Multi-Objective Reinforcement Learning via Reward Model Rehearsal

  • Lihe Li
  • Ruotong Chen
  • Ziqian Zhang
  • Zhichao Wu
  • Yi-Chen Li
  • Cong Guan
  • Yang Yu
  • Lei Yuan

Multi-objective reinforcement learning (MORL) approaches address real-world problems with multiple objectives by learning policies maximizing returns weighted by different user preferences. Typical methods assume the objectives remain unchanged throughout the agent's lifetime. However, in some real-world situations, the agent may encounter dynamically changing learning objectives, i.e., different vector-valued reward functions at different learning stages. This issue has not been considered in problem formulation or algorithm design. To address this issue, we formalize the setting as a continual MORL (CMORL) problem for the first time, accounting for the evolution of objectives throughout the learning process. Subsequently, we propose Continual Multi-Objective Reinforcement Learning via Reward Model Rehearsal (CORe3), incorporating a dynamic agent network for rapid adaptation to new objectives. Moreover, we develop a reward model rehearsal technique to recover the reward signals for previous objectives, thus alleviating catastrophic forgetting. Experiments on four CMORL benchmarks showcase that CORe3 effectively learns policies satisfying different preferences on all encountered objectives, and outperforms the best baseline by 171%, highlighting the capability of CORe3 to handle situations with evolving objectives.

AAMAS Conference 2024 Conference Paper

Cost-aware Offline Safe Meta Reinforcement Learning with Robust In-Distribution Online Task Adaptation

  • Cong Guan
  • Ruiqi Xue
  • Ziqian Zhang
  • Lihe Li
  • Yi-Chen Li
  • Lei Yuan
  • Yang Yu

Despite the prominence gained by reinforcement learning (RL) in various domains, ensuring safety in real-world applications remains a significant challenge. Offline safe RL, which learns safe policies from pre-collected data, has emerged to address these concerns. However, existing approaches assume a single constraint mode and lack adaptability to diverse safety constraints. In real-world scenarios, we often find ourselves working with datasets gathered from various tasks, with the aim of developing a generalized policy capable of handling unknown tasks during testing. To deal with this offline safe meta-RL problem, we introduce a novel framework called COSTA, designed to facilitate the learning of a safe generalized policy that can adapt and transfer to unknown tasks during testing. COSTA addresses two key challenges in offline safe meta-RL. First, it develops a cost-aware task inference module using contrastive learning to distinguish tasks based on safety constraints, mitigating the MDP ambiguity problem. Second, COSTA introduces a novel metric, the Safe In-Distribution Score (SIDS), to assess the in-distribution degree of trajectories, considering both reward maximization and cost constraint satisfaction. Trajectories collected with a safe exploration policy are filtered using SIDS for robust online task adaptation. Experimental results in a tailored benchmark suite within the MuJoCo environments demonstrate that COSTA consistently balances safety and reward maximization, outperforming multiple baselines.

AAMAS Conference 2024 Conference Paper

Deep Anomaly Detection via Active Anomaly Search

  • Chao Chen
  • Dawei Wang
  • Feng Mao
  • Jiacheng Xu
  • Zongzhang Zhang
  • Yang Yu

Anomaly detection (AD) holds substantial practical value, and considering the limited labeled data, semi-supervised anomaly detection techniques have garnered increasing attention. We find that previous methods suffer from insufficient exploitation of labeled data and under-exploration of unlabeled data. To tackle this problem, we aim to search for possible anomalies in unlabeled data and use the searched anomalies to enhance performance. We innovatively model this search process as a Markov decision process and utilize a reinforcement learning algorithm to solve it. Our method, Deep Anomaly Detection and Search (DADS), integrates the exploration of unlabeled data and the exploitation of labeled data into one framework. Experimentally, we compare DADS with several state-of-the-art methods on widely used benchmarks, and the results show that DADS can efficiently search anomalies from unlabeled data and learn from them, achieving good performance. Code: https://github.com/LAMDA-RL/DADS

AAMAS Conference 2024 Conference Paper

Disentangling Policy from Offline Task Representation Learning via Adversarial Data Augmentation

  • Chengxing Jia
  • Fuxiang Zhang
  • Yi-Chen Li
  • Chen-Xiao Gao
  • Xu-Hui Liu
  • Lei Yuan
  • Zongzhang Zhang
  • Yang Yu

Offline meta-reinforcement learning (OMRL) proficiently allows an agent to tackle novel tasks while relying solely on a static dataset. For precise and efficient task identification, existing OMRL research suggests learning separate task representations that can be incorporated with the policy input, thus forming a context-based meta-policy. A major approach to training task representations is to adopt contrastive learning using multi-task offline data. The dataset typically encompasses interactions from various policies (i.e., the behavior policies), thus providing a plethora of contextual information regarding different tasks. Nonetheless, amassing data from a substantial number of policies is not only impractical but often unattainable in realistic settings. Instead, we resort to a more constrained yet practical scenario, where multi-task data collection occurs with a limited number of policies. We observe that task representations learned by previous OMRL methods tend to correlate spuriously with the behavior policy instead of reflecting the essential characteristics of the task, resulting in unfavorable out-of-distribution generalization. To alleviate this issue, we introduce a novel algorithm to disentangle the impact of the behavior policy from task representation learning through a process called adversarial data augmentation.
Specifically, the objective of adversarial data augmentation is not merely to generate data analogous to offline data distribution; instead, it aims to create adversarial examples designed to confound learned task representations and lead to incorrect task identification. Our experiments show that learning from such adversarial samples significantly enhances the robustness and effectiveness of the task identification process and realizes satisfactory out-of-distribution generalization. The results in MuJoCo locomotion tasks demonstrate that our approach surpasses other OMRL baselines across various meta-learning task sets.

ICRA Conference 2024 Conference Paper

Distributional Reinforcement Learning with Sample-set Bellman Update

  • Weijian Zhang
  • Jianshu Wang
  • Yang Yu

Distributional Reinforcement Learning (DRL) not only endeavors to optimize expected returns, but also strives to accurately characterize the full distribution of these returns, a key aspect in enhancing risk-aware decision-making. Previous DRL implementations often inappropriately treat statistical estimations as concrete samples, which undermines the integrity of learning. While several studies have addressed this issue, they frequently give rise to new complications, including computational burdens and diminished stochastic behavior. In our work, we present a novel DRL framework that leverages the Gaussian mixture model to adeptly depict the distribution of returns. This approach ensures precise, authentic sampling critical for robust learning, while also preserving computational tractability. Through extensive evaluation on a diverse array of 59 Atari games, our method not only surpasses the efficacy of prior DRL algorithms but also presents formidable competition to contemporary top-tier RL algorithms, signifying a substantial advancement in the field.

NeurIPS Conference 2024 Conference Paper

Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific Learning Rate

  • Fan-Ming Luo
  • Zuolin Tu
  • Zefang Huang
  • Yang Yu

Real-world decision-making tasks are usually partially observable Markov decision processes (POMDPs), where the state is not fully observable. Recent progress has demonstrated that recurrent reinforcement learning (RL), which consists of a context encoder based on recurrent neural networks (RNNs) for unobservable state prediction and a multilayer perceptron (MLP) policy for decision making, can mitigate partial observability and serve as a robust baseline for POMDP tasks. However, prior recurrent RL algorithms have faced issues with training instability. In this paper, we find that this instability stems from the autoregressive nature of RNNs, which causes even small changes in RNN parameters to produce large output variations over long trajectories. We therefore propose Recurrent Off-policy RL with Context-Encoder-Specific Learning Rate (RESeL) to tackle this issue. Specifically, RESeL uses a lower learning rate for the context encoder than for the other MLP layers, ensuring the stability of the former while maintaining the training efficiency of the latter. We integrate this technique into existing off-policy RL methods, resulting in the RESeL algorithm. We evaluated RESeL in 18 POMDP tasks, including classic, meta-RL, and credit assignment scenarios, as well as five MDP locomotion tasks. The experiments demonstrate significant improvements in training stability with RESeL. Comparative results show that RESeL achieves notable performance improvements over previous recurrent RL baselines in POMDP tasks, and is competitive with or even surpasses state-of-the-art methods in MDP tasks. Further ablation studies highlight the necessity of applying a distinct learning rate for the context encoder. Code is available at https://github.com/FanmingL/Recurrent-Offpolicy-RL.
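
The core trick, a context-encoder-specific learning rate, amounts to using different step sizes for different parameter groups within the same update. A toy sketch (illustrative names and values, not the paper's implementation):

```python
# Toy illustration of a context-encoder-specific learning rate: the RNN
# context encoder takes a much smaller step than the MLP policy layers
# during the same gradient update. Parameter names and learning rates
# here are hypothetical.

ENCODER_LR = 1e-5   # lower LR to stabilize the autoregressive encoder
MLP_LR = 3e-4       # higher LR to keep the policy layers training fast

params = {"encoder.w": 0.5, "mlp.w": 0.5}
grads = {"encoder.w": 1.0, "mlp.w": 1.0}

def sgd_step(params, grads):
    """One SGD step with a per-group learning rate chosen by name prefix."""
    for name, g in grads.items():
        lr = ENCODER_LR if name.startswith("encoder.") else MLP_LR
        params[name] -= lr * g
    return params

sgd_step(params, grads)
print(params)  # the encoder weight moves far less than the MLP weight
```

In practice this is commonly realized by passing separate parameter groups with different `lr` values to a single optimizer (e.g. PyTorch's per-parameter-group options), rather than hand-rolling the update.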

AAAI Conference 2024 Conference Paper

Episodic Return Decomposition by Difference of Implicitly Assigned Sub-trajectory Reward

  • Haoxin Lin
  • Hongqiu Wu
  • Jiaji Zhang
  • Yihao Sun
  • Junyin Ye
  • Yang Yu

Real-world decision-making problems are usually accompanied by delayed rewards, which affects the sample efficiency of Reinforcement Learning, especially in the extremely delayed case where the only feedback is the episodic reward obtained at the end of an episode. Episodic return decomposition is a promising way to deal with the episodic-reward setting. Several corresponding algorithms have shown remarkable effectiveness of the learned step-wise proxy rewards from return decomposition. However, these existing methods lack either attribution or representation capacity, leading to inefficient decomposition in the case of long-term episodes. In this paper, we propose a novel episodic return decomposition method called Diaster (Difference of implicitly assigned sub-trajectory reward). Diaster decomposes any episodic reward into credits of two divided sub-trajectories at any cut point, and the step-wise proxy rewards come from differences in expectation. We theoretically and empirically verify that the decomposed proxy reward function can guide the policy to be nearly optimal. Experimental results show that our method outperforms previous state-of-the-art methods in terms of both sample efficiency and performance. The code is available at https://github.com/HxLyn3/Diaster.
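
The difference-based decomposition has a convenient telescoping property: if each step-wise proxy reward is the difference of a sub-trajectory reward over successive prefixes, the proxies sum back to the episodic reward. A toy sketch with a stand-in reward function (not the paper's learned model, which works in expectation):

```python
def g(prefix):
    # Stand-in "learned" sub-trajectory reward: any function of a prefix
    # works for illustrating the telescoping identity. g([]) == 0 here.
    return sum(s * s for s in prefix) / (1 + len(prefix))

# Toy trajectory of scalar "states"; values are made up.
trajectory = [1.0, -2.0, 0.5, 3.0]

# Step-wise proxy reward: difference of prefix credits at each step.
proxy = [g(trajectory[: t + 1]) - g(trajectory[:t])
         for t in range(len(trajectory))]

# Telescoping: sum of proxies equals g(full trajectory) - g(empty).
episodic = g(trajectory)
print(sum(proxy), episodic)
```

This identity is what lets a dense proxy reward replace the single end-of-episode signal without changing the total return the policy optimizes.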

AAAI Conference 2024 Conference Paper

Focus-Then-Decide: Segmentation-Assisted Reinforcement Learning

  • Chao Chen
  • Jiacheng Xu
  • Weijian Liao
  • Hao Ding
  • Zongzhang Zhang
  • Yang Yu
  • Rui Zhao

Visual Reinforcement Learning (RL) is a promising approach to achieve human-like intelligence. However, it currently faces challenges in learning efficiently within noisy environments. In contrast, humans can quickly identify task-relevant objects in distraction-filled surroundings by applying previously acquired common knowledge. Recently, foundational models in natural language processing and computer vision have achieved remarkable successes, and the common knowledge within these models can significantly benefit downstream task training. Inspired by these achievements, we aim to incorporate common knowledge from foundational models into visual RL. We propose a novel Focus-Then-Decide (FTD) framework, allowing the agent to make decisions based solely on task-relevant objects. To achieve this, we introduce an attention mechanism to select task-relevant objects from the object set returned by a foundational segmentation model, and only use the task-relevant objects for the subsequent training of the decision module. Additionally, we employ two generic self-supervised objectives to facilitate the rapid learning of this attention mechanism. Experimental results on challenging tasks based on DeepMind Control Suite and Franka Emika Robotics demonstrate that our method can quickly and accurately pinpoint objects of interest in noisy environments. Consequently, it achieves a significant performance improvement over current state-of-the-art algorithms. Project Page: https://www.lamda.nju.edu.cn/chenc/FTD.html Code: https://github.com/LAMDA-RL/FTD

AAMAS Conference 2024 Conference Paper

Foresight Distribution Adjustment for Off-policy Reinforcement Learning

  • Ruifeng Chen
  • Xu-Hui Liu
  • Tian-Shuo Liu
  • Shengyi Jiang
  • Feng Xu
  • Yang Yu

Off-policy reinforcement learning algorithms maintain a replay buffer to utilize samples obtained from earlier policies. The sampling strategy that prioritizes certain data in a buffer to train the value function or the policy has been shown to significantly influence the sample efficiency and the final performance of the algorithm. However, which distribution is the best choice for experience prioritization has not been explored thoroughly. In this paper, we prove that the post-update policy distribution (i.e., the visitation distribution of the policy after the current iteration of update) is the best Q-training distribution to benefit policy improvement. Nevertheless, accessing this "future" distribution is not straightforward. In this work, we find that the current experiences can be modulated by the critic information to simulate the post-update policy distribution. Technically, we derive the gradient of the visitation distribution with respect to the policy parameters and obtain an explicit expression to approximate the post-update policy distribution. The derived method, named Foresight Distribution Adjustment (FoDA), seamlessly integrates with conventional off-policy actor-critic algorithms. Our experiments validate FoDA's ability to closely approximate the post-update policy distribution, and demonstrate its utility in enhancing performance across continuous control task benchmarks.

AAAI Conference 2024 Conference Paper

Generalizable Task Representation Learning for Offline Meta-Reinforcement Learning with Data Limitations

  • Renzhe Zhou
  • Chen-Xiao Gao
  • Zongzhang Zhang
  • Yang Yu

Generalization and sample efficiency have been long-standing issues in reinforcement learning, and thus the field of Offline Meta-Reinforcement Learning (OMRL) has gained increasing attention due to its potential to solve a wide range of problems with static and limited offline data. Existing OMRL methods often assume sufficient training tasks and data coverage to apply contrastive learning to extract task representations. However, such assumptions do not hold in several real-world applications and thus undermine the generalization ability of the representations. In this paper, we consider OMRL with two types of data limitations: limited training tasks and limited behavior diversity, and propose a novel algorithm called GENTLE for learning generalizable task representations in the face of data limitations. GENTLE employs a Task Auto-Encoder (TAE), an encoder-decoder architecture that extracts the characteristics of the tasks. Unlike existing methods, TAE is optimized solely by reconstruction of the state transition and reward, which captures the generative structure of the task models and produces generalizable representations when training tasks are limited. To alleviate the effect of limited behavior diversity, we consistently construct pseudo-transitions to align the data distribution used to train TAE with the data distribution encountered during testing. Empirically, GENTLE significantly outperforms existing OMRL methods on both in-distribution and out-of-distribution tasks across both the given-context protocol and the one-shot protocol.

JBHI Journal 2024 Journal Article

Graph-Driven Simultaneous and Proportional Estimation of Wrist Angle and Grasp Force via High-Density EMG

  • Dongxuan Li
  • Peiqi Kang
  • Yang Yu
  • Peter B. Shull

Myoelectric prostheses are generally unable to accurately control the position and force simultaneously, prohibiting natural and intuitive human-machine interaction. This issue is attributed to the limitations of myoelectric interfaces in effectively decoding multi-degree-of-freedom (multi-DoF) kinematic and kinetic information. We thus propose a novel multi-task, spatial-temporal model driven by graphical high-density electromyography (HD-EMG) for simultaneous and proportional control of wrist angle and grasp force. Twelve subjects were recruited to perform three multi-DoF movements, including wrist pronation/supination, wrist flexion/extension, and wrist abduction/adduction while varying grasp force. Experimental results demonstrated that the proposed model outperformed five baseline models, with normalized root mean square errors of 13.2% and 9.7% and correlation coefficients of 89.6% and 91.9% for wrist angle and grasp force estimation, respectively. In addition, the proposed model still maintained comparable accuracy even with a significant reduction in the number of HD-EMG electrodes. To the best of our knowledge, this is the first study to achieve simultaneous and proportional wrist angle and grasp force control via HD-EMG and has the potential to empower prostheses users to perform a broader range of tasks with greater precision and control, ultimately enhancing their independence and quality of life.

NeurIPS Conference 2024 Conference Paper

KALM: Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts

  • Jing-Cheng Pang
  • Si-Hang Yang
  • Kaiyuan Li
  • Xiong-Hui Chen
  • Nan Tang
  • Yang Yu

Reinforcement learning (RL) traditionally trains agents using interaction data, which limits their capabilities to the scope of the training data. To create more knowledgeable agents, leveraging knowledge from large language models (LLMs) has emerged as a promising way. Despite various attempts to combine LLMs with RL, there is commonly a semantic gap between action signals and LLM tokens, which hinders their integration. This paper introduces a novel approach, KALM (Knowledgeable Agents from Language Model Rollouts), to learn knowledgeable agents by bridging this gap. KALM extracts knowledge from LLMs in the form of imaginary rollouts, which agents can learn from through offline RL. To overcome the limitation that LLMs are inherently text-based and may be incompatible with numerical environmental data, KALM fine-tunes the LLM to perform bidirectional translation between textual goals and rollouts. This process enables the LLM to understand the environment better, facilitating the generation of meaningful rollouts. Experiments on robotic manipulation tasks demonstrate that KALM allows agents to rephrase complex goals and tackle novel tasks requiring new optimal behaviors. KALM achieves a 46% success rate in completing 1,400 varied novel goals, significantly outperforming the 26% success rate of baseline methods. Project homepage: https://kalmneurips2024.github.io.

NeurIPS Conference 2024 Conference Paper

Multi-Agent Domain Calibration with a Handful of Offline Data

  • Tao Jiang
  • Lei Yuan
  • Lihe Li
  • Cong Guan
  • Zongzhang Zhang
  • Yang Yu

The shift in dynamics results in significant performance degradation of policies trained in the source domain when deployed in a different target domain, posing a challenge for the practical application of reinforcement learning (RL) in real-world scenarios. Domain transfer methods aim to bridge this dynamics gap through techniques such as domain adaptation or domain calibration. While domain adaptation involves refining the policy through extensive interactions in the target domain, it may not be feasible for sensitive fields like healthcare and autonomous driving. On the other hand, offline domain calibration utilizes only static data from the target domain to adjust the physics parameters of the source domain (e.g., a simulator) to align with the target dynamics, enabling the direct deployment of the trained policy without sacrificing performance, which makes it the most promising approach for policy deployment. However, existing techniques primarily rely on evolutionary algorithms for calibration, resulting in low sample efficiency. To tackle this issue, we propose a novel framework, Madoc (Multi-agent domain calibration). Firstly, we formulate a bandit RL objective to match the target trajectory distribution by learning a couple of classifiers. We then address the challenge of a large domain parameter space by modeling domain calibration as a cooperative multi-agent reinforcement learning (MARL) problem. Specifically, we utilize a Variational Autoencoder (VAE) to automatically cluster physics parameters with similar effects on the dynamics, grouping them into distinct agents. These grouped agents coordinately train calibration policies to adjust multiple parameters using MARL. Our empirical evaluation on 21 offline locomotion tasks in the D4RL and NeoRL benchmarks showcases the superior performance of our method compared to strong existing offline model-based RL, offline domain calibration, and hybrid offline-and-online RL baselines.

TMLR Journal 2024 Journal Article

One by One, Continual Coordinating with Humans via Hyper-Teammate Identification

  • Cong Guan
  • Feng Chen
  • Ke Xue
  • Chunpeng Fan
  • Lichao Zhang
  • Ziqian Zhang
  • Pengyao Zhao
  • Zongzhang Zhang

One of the primary objectives in modern artificial intelligence research is to empower agents to effectively coordinate with diverse teammates, particularly human teammates. Previous studies focused on training agents either with a fixed population of pre-generated teammates or through the co-evolution of distinct populations of agents and teammates. However, it is challenging to enumerate all possible teammates in advance, and it is costly, or even impractical, to maintain a sufficiently diverse population and repeatedly interact with previously encountered teammates. Additional design considerations, such as prioritized sampling, are also required to ensure efficient training. To address these challenges and obtain an efficient human-AI coordination paradigm, we propose a novel approach called Concord. Considering that human participants tend to arrive in a sequential manner, we model the training process with different teammates as a continual learning framework, akin to how humans learn and adapt in the real world. We propose a mechanism based on hyper-teammate identification to prevent catastrophic forgetting while promoting forward knowledge transfer. Concretely, we introduce a teammate recognition module that captures the identification of corresponding teammates. Leveraging this identification, a well-coordinated AI policy can be generated via the hyper-network. The entire framework is trained in a decomposed policy-gradient manner, allowing for effective credit assignment among agents. This approach enables us to train agents with each generated teammate or human one by one, ensuring that agents can coordinate effectively with concurrent teammates without forgetting previous knowledge. Our approach outperforms multiple baselines in various multi-agent benchmarks, either with generated human proxies or real human participants.

NeurIPS Conference 2024 Conference Paper

Policy Learning from Tutorial Books via Understanding, Rehearsing and Introspecting

  • Xiong-Hui Chen
  • Ziyan Wang
  • Yali Du
  • Shengyi Jiang
  • Meng Fang
  • Yang Yu
  • Jun Wang

When humans need to learn a new skill, we can acquire knowledge through written books, including textbooks, tutorials, etc. However, current research on decision-making, like reinforcement learning (RL), has primarily required numerous real interactions with the target environment to learn a skill, failing to utilize the existing knowledge already summarized in text. The success of Large Language Models (LLMs) sheds light on utilizing the knowledge behind such books. In this paper, we discuss a new policy learning problem called Policy Learning from tutorial Books (PLfB), built on the shoulders of LLM systems, which aims to leverage rich resources such as tutorial books to derive a policy network. Inspired by how humans learn from books, we solve the problem via a three-stage framework: Understanding, Rehearsing, and Introspecting (URI). In particular, it first rehearses decision-making trajectories based on the knowledge derived from understanding the books, then introspects on the imaginary dataset to distill a policy network. We build two benchmarks for PLfB based on the Tic-Tac-Toe and Football games. In experiments, URI's policy achieves at least a 44% net win rate against GPT-based agents without any real data; in the Football game, a complex scenario, URI's policy beats the built-in AIs with a 37% winning rate, while the GPT-based agent achieves only a 6% winning rate. The project page: https://plfb-football.github.io.

IJCAI Conference 2024 Conference Paper

Pre-training General User Representation with Multi-type APP Behaviors

  • Yuren Zhang
  • Min Hou
  • Kai Zhang
  • Yuqing Yuan
  • Chao Song
  • Zhihao Ye
  • Enhong Chen
  • Yang Yu

In numerous user-centric services on mobile applications (apps), accurately mining user interests and generating effective user representations are paramount. Traditional approaches, which often involve training task-specific user representations, are becoming increasingly impractical due to their high computational costs and limited adaptability. This paper introduces a novel solution to this challenge: the Multi-type App-usage Fusion Network (MAFN). MAFN innovatively pre-trains universal user representations, leveraging multi-type app behaviors to overcome key limitations in existing methods. We address two primary challenges: 1) the varying frequency of user behaviors (ranging from low-frequency actions like (un)installations to high-frequency yet insightful app launches); and 2) the integration of multi-type behaviors to form a cohesive representation. Our approach involves the creation of novel pre-training tasks that harness self-supervised signals from diverse app behaviors, capturing both long-term and short-term user interests. MAFN's unique fusion approach effectively amalgamates these interests into a unified vector space, facilitating the development of a versatile, general-purpose user representation. With a practical workflow, extensive experiments with three typical downstream tasks on real-world datasets verify the effectiveness of our approach.

NeurIPS Conference 2024 Conference Paper

Provably and Practically Efficient Adversarial Imitation Learning with General Function Approximation

  • Tian Xu
  • Zhilong Zhang
  • Ruishuo Chen
  • Yihao Sun
  • Yang Yu

As a prominent category of imitation learning methods, adversarial imitation learning (AIL) has garnered significant practical success powered by neural network approximation. However, existing theoretical studies on AIL are primarily limited to simplified scenarios such as tabular and linear function approximation and involve complex algorithmic designs that hinder practical implementation, highlighting a gap between theory and practice. In this paper, we explore the theoretical underpinnings of online AIL with general function approximation. We introduce a new method called optimization-based AIL (OPT-AIL), which centers on performing online optimization for reward functions and optimism-regularized Bellman error minimization for Q-value functions. Theoretically, we prove that OPT-AIL achieves polynomial expert sample complexity and interaction complexity for learning near-expert policies. To our best knowledge, OPT-AIL is the first provably efficient AIL method with general function approximation. Practically, OPT-AIL only requires the approximate optimization of two objectives, thereby facilitating practical implementation. Empirical studies demonstrate that OPT-AIL outperforms previous state-of-the-art deep AIL methods in several challenging tasks.

AAAI Conference 2024 Conference Paper

Rethinking the Development of Large Language Models from the Causal Perspective: A Legal Text Prediction Case Study

  • Haotian Chen
  • Lingwei Zhang
  • Yiran Liu
  • Yang Yu

While large language models (LLMs) exhibit impressive performance on a wide range of NLP tasks, most of them fail to learn causality from correlation, which prevents them from learning rationales for prediction. Rethinking the whole development process of LLMs is of great urgency as they are adopted in various critical tasks that need rationales, including legal text prediction (e.g., legal judgment prediction). In this paper, we first explain the underlying theoretical mechanism of their failure and argue that both the data imbalance and the omission of causality in model design and selection render the current training-testing paradigm unable to select the unique causality-based model from correlation-based models. Second, we take the legal text prediction task as the testbed and reconstruct the development process of LLMs by simultaneously infusing causality into model architectures and organizing causality-based adversarial attacks for evaluation. Specifically, we base our reconstruction on our theoretical analysis and propose a causality-aware self-attention mechanism (CASAM), which prevents LLMs from entangling causal and non-causal information by restricting the interaction between causal and non-causal words. Meanwhile, we propose eight kinds of legal-specific attacks to form causality-based model selection. Our extensive experimental results demonstrate that our proposed CASAM achieves state-of-the-art (SOTA) performance and the strongest robustness on three commonly used legal text prediction benchmarks. We make our code publicly available at https://github.com/Carrot-Red/Rethink-LLM-development.

NeurIPS Conference 2024 Conference Paper

Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning

  • Lanqing Li
  • Hai Zhang
  • Xinyu Zhang
  • Shatong Zhu
  • Yang Yu
  • Junqiao Zhao
  • Pheng-Ann Heng

As a marriage between offline RL and meta-RL, the advent of offline meta-reinforcement learning (OMRL) has shown great promise in enabling RL agents to multi-task and quickly adapt while acquiring knowledge safely. Among these, context-based OMRL (COMRL), a popular paradigm, aims to learn a universal policy conditioned on effective task representations. In this work, by examining several key milestones in the field of COMRL, we propose to integrate these seemingly independent methodologies into a unified framework. Most importantly, we show that the pre-existing COMRL algorithms are essentially optimizing the same mutual information objective between the task variable $M$ and its latent representation $Z$ by implementing various approximate bounds. Such theoretical insight offers ample design freedom for novel algorithms. As demonstrations, we propose a supervised and a self-supervised implementation of $I(Z; M)$, and empirically show that the corresponding optimization algorithms exhibit remarkable generalization across a broad spectrum of RL benchmarks, context shift scenarios, data qualities and deep learning architectures. This work lays the information theoretic foundation for COMRL methods, leading to a better understanding of task representation learning in the context of reinforcement learning. Given its generality, we envision our framework as a promising offline pre-training paradigm of foundation models for decision making.

ICRA Conference 2023 Conference Paper

A General Locomotion Approach for a Novel Multi-legged Spherical Robot

  • Dun Yang
  • Yun Fei Liu
  • Yang Yu

As a kind of ground mobile robot, deformable robots have many advantages, such as strong terrain adaptability, light weight, and portability. Among these robots, the radial skeleton robot has better stability and controllability. However, because the friction between foot and ground is hard to predict, the accuracy of the gait generation algorithms studied so far is very low. Furthermore, there is currently no closed-loop control scheme for this kind of robot. We design a 12-legged radial skeleton robot with high-extension-ratio legs, propose a high-precision gait generation algorithm for any multi-legged radial skeleton robot, and present the first closed-loop control scheme for this kind of robot. A dynamic model considering contact friction is established, and the robot offers omnidirectional motion, high-precision trajectory tracking, and motion robustness. Prototype experiments verify that our method achieves the highest accuracy when tracking trajectories and remains robust in unknown environments.

JBHI Journal 2023 Journal Article

A Novel and Efficient Surface Electromyography Decomposition Algorithm Using Local Spatial Information

  • Yang Xu
  • Yang Yu
  • Miaojuan Xia
  • Xinjun Sheng
  • Xiangyang Zhu

Motor unit spike trains (MUSTs) decomposed from surface electromyography (sEMG) have been an emerging solution for neural interfacing, especially for the control of upper limb prosthetics. Accurate and efficient decomposition techniques are essential and desirable. However, most decomposition methods are designed for motor units (MUs) with a global energy maximum in a single or large muscle, while forearm muscles are usually small and slender with low global energy. Thus, we propose a novel approach using local spatial information towards more accurate and efficient sEMG decomposition of forearm muscles. A fast spatial spike detection method is proposed to replace the time-consuming iteration process of blind source separation (BSS) methods. Here, spatial distribution characteristics of motor unit action potentials are leveraged to pre-classify the candidate MUs, and further to create initial MU templates, aiming to avoid repeated convergence to high-energy MUs. The results on both simulated and experimental sEMG signals show that low-energy MUs from small muscles are more easily found compared with the conventional BSS algorithm. Specifically, the proposed method can identify 40% more reliable MUs while requiring only 30% of the computation time. The outcomes provide a novel solution for more efficient sEMG decomposition, potentially paving the way for MUST-based non-invasive neural interfaces.

IJCAI Conference 2023 Conference Paper

A Unified View of Deep Learning for Reaction and Retrosynthesis Prediction: Current Status and Future Challenges

  • Ziqiao Meng
  • Peilin Zhao
  • Yang Yu
  • Irwin King

Reaction and retrosynthesis prediction are two fundamental tasks in computational chemistry. In recent years, these two tasks have attracted great attention from both the machine learning and drug discovery communities. Various deep learning approaches have been proposed to tackle these two problems and achieved initial success. In this survey, we conduct a comprehensive investigation of advanced deep learning-based reaction and retrosynthesis prediction models. We first summarize the design mechanisms, strengths and weaknesses of the state-of-the-art approaches. Then we further discuss limitations of current solutions and open challenges in the problem itself. Last but not least, we present some promising directions to facilitate future research. To the best of our knowledge, this paper is the first comprehensive and systematic survey on a unified understanding of reaction and retrosynthesis prediction.

NeurIPS Conference 2023 Conference Paper

AdaptSSR: Pre-training User Model with Augmentation-Adaptive Self-Supervised Ranking

  • Yang Yu
  • Qi Liu
  • Kai Zhang
  • Yuren Zhang
  • Chao Song
  • Min Hou
  • Yuqing Yuan
  • Zhihao Ye

User modeling, which aims to capture users' characteristics or interests, heavily relies on task-specific labeled data and suffers from the data sparsity issue. Several recent studies tackled this problem by pre-training the user model on massive user behavior sequences with a contrastive learning task. Generally, these methods assume different views of the same behavior sequence constructed via data augmentation are semantically consistent, i.e., reflecting similar characteristics or interests of the user, and thus maximize their agreement in the feature space. However, due to the diverse interests and heavy noise in user behaviors, existing augmentation methods tend to lose certain characteristics of the user or introduce noisy behaviors. Thus, forcing the user model to directly maximize the similarity between the augmented views may result in a negative transfer. To this end, we propose to replace the contrastive learning task with a new pretext task: Augmentation-Adaptive Self-Supervised Ranking (AdaptSSR), which alleviates the requirement of semantic consistency between the augmented views while pre-training a discriminative user model. Specifically, we adopt a multiple pairwise ranking loss which trains the user model to capture the similarity orders between the implicitly augmented view, the explicitly augmented view, and views from other users. We further employ an in-batch hard negative sampling strategy to facilitate model training. Moreover, considering the distinct impacts of data augmentation on different behavior sequences, we design an augmentation-adaptive fusion mechanism to automatically adjust the similarity order constraint applied to each sample based on the estimated similarity between the augmented views. Extensive experiments on both public and industrial datasets with six downstream tasks verify the effectiveness of AdaptSSR.

NeurIPS Conference 2023 Conference Paper

Adversarial Counterfactual Environment Model Learning

  • Xiong-Hui Chen
  • Yang Yu
  • Zhengmao Zhu
  • Zhihua Yu
  • Chen Zhenjun
  • Chenghe Wang
  • Yinan Wu
  • Rong-Jun Qin

An accurate environment dynamics model is crucial for various downstream tasks in sequential decision-making, such as counterfactual prediction, off-policy evaluation, and offline reinforcement learning. Currently, these models are learned through empirical risk minimization (ERM) by step-wise fitting of historical transition data. This approach was previously believed to be unreliable over long-horizon rollouts because of compounding errors, which can lead to uncontrollable inaccuracies in predictions. In this paper, we find that the challenge extends beyond just long-term prediction errors: we reveal that even when planning with one step, learned dynamics models can also perform poorly due to the selection bias of behavior policies during data collection. This issue significantly misleads the policy optimization process even in identifying single-step optimal actions, further leading to greater risk in sequential decision-making scenarios. To tackle this problem, we introduce a novel model-learning objective called adversarial weighted empirical risk minimization (AWRM). AWRM incorporates an adversarial policy that exploits the model to generate a data distribution that weakens the model's prediction accuracy, and subsequently, the model is learned under this adversarial data distribution. We implement a practical algorithm, GALILEO, for AWRM and evaluate it on two synthetic tasks, three continuous-control tasks, and a real-world application. The experiments demonstrate that GALILEO can accurately predict counterfactual actions and improve various downstream tasks, including offline policy evaluation and improvement, as well as online decision-making.

AAAI Conference 2023 Short Paper

Anti-drifting Feature Selection via Deep Reinforcement Learning (Student Abstract)

  • Aoran Wang
  • Hongyang Yang
  • Feng Mao
  • Zongzhang Zhang
  • Yang Yu
  • Xiaoyang Liu

Feature selection (FS) is a crucial procedure in machine learning pipelines for its significant benefits in removing data redundancy and mitigating model overfitting. Since concept drift is a widespread phenomenon in streaming data and could severely affect model performance, effective FS on concept drifting data streams is imminent. However, existing state-of-the-art FS algorithms fail to adjust their selection strategy adaptively when the effective feature subset changes, making them unsuitable for drifting streams. In this paper, we propose a dynamic FS method that selects effective features on concept drifting data streams via deep reinforcement learning. Specifically, we present two novel designs: (i) a skip-mode reinforcement learning environment that shrinks action space size for high-dimensional FS tasks; (ii) a curiosity mechanism that generates intrinsic rewards to address the long-horizon exploration problem. The experiment results show that our proposed method outperforms other FS methods and can dynamically adapt to concept drifts.

NeurIPS Conference 2023 Conference Paper

CMMA: Benchmarking Multi-Affection Detection in Chinese Multi-Modal Conversations

  • Yazhou Zhang
  • Yang Yu
  • Qing Guo
  • Benyou Wang
  • Dongming Zhao
  • Sagar Uprety
  • Dawei Song
  • Qiuchi Li

Human communication has a multi-modal and multi-affection nature. The inter-relatedness of different emotions and sentiments poses a challenge to jointly detecting multiple human affections with multi-modal clues. Recent advances in this field employed multi-task learning paradigms to render the inter-relatedness across tasks, but the scarcity of publicly available resources sets a limit to the potential of such work. To fill this gap, we build the first Chinese Multi-modal Multi-Affection conversation (CMMA) dataset, which contains 3,000 multi-party conversations and 21,795 multi-modal utterances collected from various styles of TV series. CMMA contains a wide variety of affection labels, including sentiment, emotion, sarcasm and humor, as well as novel inter-correlation values between certain pairs of tasks. Moreover, it provides the topic and speaker information in conversations, which promotes better modeling of conversational context. On the dataset, we empirically analyze the influence of different data modalities and conversational contexts on different affection analysis tasks, and exhibit the practical benefit of inter-task correlations. The full dataset is publicly available for research at https://github.com/annoymity2022/Chinese-Dataset.

JBHI Journal 2023 Journal Article

Cumulative Spike Train Estimation for Muscle Excitation Assessment From Surface EMG Using Spatial Spike Detection

  • Yang Xu
  • Yang Yu
  • Zeming Zhao
  • Chen Chen
  • Xinjun Sheng

Estimating the cumulative spike train (CST) of motor units (MUs) from surface electromyography (sEMG) is essential for the effective control of neural interfaces. However, the limited accuracy of existing estimation methods greatly hinders the further development of neural interfaces. This paper proposes a simple but effective approach for identifying the CST based on spatial spike detection from high-density sEMG. Specifically, we use a spatial sliding window to detect spikes according to the spatial propagation characteristics of the motor unit action potential, focusing on the spikes of activated MUs in a local area rather than those of a specific MU. We validated the effectiveness of our proposed method through an experiment involving wrist flexion/extension and pronation/supination, comparing it with a recognized CST estimation method and an MU decomposition-based method. The results demonstrated that the proposed method obtained higher accuracy on multi-DoF wrist torque estimation leveraging the estimated CST compared to the other methods. On average, the correlation coefficient (R) and the normalized root mean square error (nRMSE) between the estimation results and recorded force were 0.96 $\pm$ 0.03 and 10.1% $\pm$ 3.7%, respectively. Moreover, there was extremely high agreement between the CSTs of the proposed method and the MU decomposition method. The outcomes reveal the superiority of the proposed method in identifying CSTs, which can provide promising driving signals for neural interfaces.

AAAI Conference 2023 Short Paper

Deep Anomaly Detection and Search via Reinforcement Learning (Student Abstract)

  • Chao Chen
  • Dawei Wang
  • Feng Mao
  • Zongzhang Zhang
  • Yang Yu

Semi-supervised anomaly detection is a data mining task which aims at learning features from partially-labeled datasets. We propose Deep Anomaly Detection and Search (DADS) with reinforcement learning. During the training process, the agent searches for possible anomalies in the unlabeled dataset to enhance performance. Empirically, we compare DADS with several methods in the settings of leveraging known anomalies to detect both other known and unknown anomalies. Results show that DADS achieves good performance.

IJCAI Conference 2023 Conference Paper

Doubly Stochastic Graph-based Non-autoregressive Reaction Prediction

  • Ziqiao Meng
  • Peilin Zhao
  • Yang Yu
  • Irwin King

Organic reaction prediction is a critical task in drug discovery. Recently, researchers have achieved non-autoregressive reaction prediction by modeling the redistribution of electrons, resulting in state-of-the-art top-1 accuracy, and enabling parallel sampling. However, the current non-autoregressive decoder does not satisfy two essential rules of electron redistribution modeling simultaneously: the electron-counting rule and the symmetry rule. This violation of the physical constraints of chemical reactions impairs model performance. In this work, we propose a new framework called ReactionSink that combines two doubly stochastic self-attention mappings to obtain electron redistribution predictions that follow both constraints. We further extend our solution to a general multi-head attention mechanism with augmented constraints. To achieve this, we apply Sinkhorn's algorithm to iteratively update self-attention mappings, which imposes doubly conservative constraints as additional informative priors on electron redistribution modeling. We theoretically demonstrate that our ReactionSink can simultaneously satisfy both rules, which the current decoder mechanism cannot do. Empirical results show that our approach consistently improves the predictive performance of non-autoregressive models and does not bring an unbearable additional computational cost.
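The core mechanism this abstract relies on, Sinkhorn's algorithm, alternates row and column normalizations until a score matrix becomes (approximately) doubly stochastic. A minimal illustrative sketch in NumPy, not the paper's implementation (the function name, iteration count, and random inputs are assumptions):

```python
import numpy as np

def sinkhorn(logits, n_iters=100):
    """Approximately project a square score matrix onto the set of
    doubly stochastic matrices by alternating row/column normalization
    in log space (Sinkhorn-Knopp iteration)."""
    log_p = logits.astype(float).copy()
    for _ in range(n_iters):
        # normalize rows, then columns, working in log space for stability
        log_p -= np.log(np.exp(log_p).sum(axis=1, keepdims=True))
        log_p -= np.log(np.exp(log_p).sum(axis=0, keepdims=True))
    return np.exp(log_p)

rng = np.random.default_rng(0)
A = sinkhorn(rng.normal(size=(5, 5)))  # rows and columns both sum to ~1
```

Every row and column of the returned matrix sums to roughly one, which mirrors the "doubly conservative" constraints imposed on the self-attention maps above.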

AAMAS Conference 2023 Conference Paper

How To Guide Your Learner: Imitation Learning with Active Adaptive Expert Involvement

  • Xu-Hui Liu
  • Feng Xu
  • Xinyu Zhang
  • Tianyuan Liu
  • Shengyi Jiang
  • Ruifeng Chen
  • Zongzhang Zhang
  • Yang Yu

Imitation learning aims to mimic the behavior of experts without explicit reward signals. Passive imitation learning methods which use static expert datasets typically suffer from compounding error, low sample efficiency, and high hyper-parameter sensitivity. In contrast, active imitation learning methods solicit expert interventions to address these limitations. However, recent active imitation learning methods are designed based on human intuitions or empirical experience without theoretical guarantee. In this paper, we propose a novel active imitation learning framework based on a teacher-student interaction model, in which the teacher's goal is to identify the best teaching behavior and actively affect the student's learning process. By solving the optimization objective of this framework, we propose a practical implementation, which we name AdapMen. Theoretical analysis shows that AdapMen can improve the error bound and avoid compounding error under mild conditions. Experiments on the MetaDrive benchmark and Atari 2600 games validate our theoretical analysis and show that our method achieves near-expert performance with much less expert involvement and fewer total sampling steps than previous methods. The code is available at https://github.com/liuxhym/AdapMen.

NeurIPS Conference 2023 Conference Paper

Imitation Learning from Imperfection: Theoretical Justifications and Algorithms

  • Ziniu Li
  • Tian Xu
  • Zeyu Qin
  • Yang Yu
  • Zhi-Quan Luo

Imitation learning (IL) algorithms excel in acquiring high-quality policies from expert data for sequential decision-making tasks. However, their effectiveness is hampered when expert data is limited. To tackle this challenge, a novel framework called (offline) IL with supplementary data has been proposed, which enhances learning by incorporating an additional yet imperfect dataset obtained inexpensively from sub-optimal policies. Nonetheless, learning becomes challenging due to the potential inclusion of out-of-expert-distribution samples. In this work, we propose a mathematical formalization of this framework, uncovering its limitations. Our theoretical analysis reveals that a naive approach, applying the behavioral cloning (BC) algorithm concept to the combined set of expert and supplementary data, may fall short of vanilla BC, which solely relies on expert data. This deficiency arises due to the distribution shift between the two data sources. To address this issue, we propose a new importance-sampling-based technique for selecting data within the expert distribution. We prove that the proposed method eliminates the gap of the naive approach, highlighting its efficacy when handling imperfect data. Empirical studies demonstrate that our method outperforms previous state-of-the-art methods in tasks including robotic locomotion control, Atari video games, and image classification. Overall, our work underscores the potential of improving IL by leveraging diverse data sources through effective data selection.
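The importance-sampling idea described above can be illustrated with a toy one-dimensional example: supplementary samples are reweighted by the density ratio between the expert distribution and the supplementary distribution, then resampled. Everything below (the Gaussian stand-ins and the closed-form ratio) is an illustrative assumption; in practice the ratio would be estimated from data rather than computed from known densities:

```python
import numpy as np

rng = np.random.default_rng(0)
expert = rng.normal(0.0, 1.0, size=2000)  # stand-in for expert-distribution data
supp = rng.normal(1.5, 1.0, size=2000)    # sub-optimal supplementary data (shifted)

def density_ratio(x):
    """p_expert(x) / p_supp(x) for the two unit-variance Gaussians above."""
    return np.exp((-(x - 0.0) ** 2 + (x - 1.5) ** 2) / 2.0)

# Self-normalized importance weights, then resample the supplementary
# data so it behaves as if drawn from the expert distribution.
w = density_ratio(supp)
w /= w.sum()
resampled = rng.choice(supp, size=2000, replace=True, p=w)
```

After resampling, the supplementary data's mean shifts from about 1.5 toward the expert's mean of 0, which is the kind of distribution-shift correction the theoretical analysis formalizes.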

AAAI Conference 2023 Short Paper

Learning Generalizable Batch Active Learning Strategies via Deep Q-networks (Student Abstract)

  • Yi-Chen Li
  • Wen-Jie Shen
  • Boyu Zhang
  • Feng Mao
  • Zongzhang Zhang
  • Yang Yu

To handle a large amount of unlabeled data, batch active learning (BAL) queries humans for the labels of a batch of the most valuable data points at every round. Most current BAL strategies are based on human-designed heuristics, such as uncertainty sampling or mutual information maximization. However, there exists a disagreement between these heuristics and the ultimate goal of BAL, i.e., optimizing the model's final performance within the query budgets. This disagreement leads to a limited generality of these heuristics. To this end, we formulate BAL as an MDP and propose a data-driven approach based on deep reinforcement learning. Our method learns the BAL strategy by maximizing the model's final performance. Experiments on the UCI benchmark show that our method can achieve competitive performance compared to existing heuristics-based approaches.

NeurIPS Conference 2023 Conference Paper

Learning World Models with Identifiable Factorization

  • Yuren Liu
  • Biwei Huang
  • Zhengmao Zhu
  • Honglong Tian
  • Mingming Gong
  • Yang Yu
  • Kun Zhang

Extracting a stable and compact representation of the environment is crucial for efficient reinforcement learning in high-dimensional, noisy, and non-stationary environments. Different categories of information coexist in such environments -- how to effectively extract and disentangle the information remains a challenging problem. In this paper, we propose IFactor, a general framework to model four distinct categories of latent state variables that capture various aspects of information within the RL system, based on their interactions with actions and rewards. Our analysis establishes block-wise identifiability of these latent variables, which not only provides a stable and compact representation but also discloses that all reward-relevant factors are significant for policy learning. We further present a practical approach to learning the world model with identifiable blocks, ensuring the removal of redundancies but retaining minimal and sufficient information for policy optimization. Experiments in synthetic worlds demonstrate that our method accurately identifies the ground-truth latent variables, substantiating our theoretical findings. Moreover, experiments in variants of the DeepMind Control Suite and RoboDesk showcase the superior performance of our approach over baselines.

AAAI Conference 2023 Short Paper

Model-Based Offline Weighted Policy Optimization (Student Abstract)

  • Renzhe Zhou
  • Zongzhang Zhang
  • Yang Yu

A promising direction for applying reinforcement learning to the real world is learning from offline datasets. Offline reinforcement learning aims to learn policies from pre-collected datasets without online interaction with the environment. Due to the lack of further interaction, offline reinforcement learning faces severe extrapolation error, leading to policy learning failure. In this paper, we investigate the weighted Bellman update in model-based offline reinforcement learning. We explore uncertainty estimation in ensemble dynamics models, then use a variational autoencoder to fit the behavioral prior, and finally propose an algorithm called Model-Based Offline Weighted Policy Optimization (MOWPO), which uses a combination of model confidence and behavioral prior as weights to reduce the impact of inaccurate samples on policy optimization. Experiment results show that MOWPO achieves better performance than state-of-the-art algorithms, and both the model confidence weight and the behavioral prior weight can play an active role in offline policy optimization.

NeurIPS Conference 2023 Conference Paper

Natural Language Instruction-following with Task-related Language Development and Translation

  • Jing-Cheng Pang
  • Xin-Yu Yang
  • Si-Hang Yang
  • Xiong-Hui Chen
  • Yang Yu

Natural language-conditioned reinforcement learning (RL) enables agents to follow human instructions. Previous approaches generally implemented language-conditioned RL by providing the policy with human instructions in natural language (NL) and training the policy to follow instructions. In this outside-in approach, the policy must comprehend the NL and manage the task simultaneously. However, the unbounded NL examples often bring much extra complexity for solving concrete RL tasks, which can distract policy learning from completing the task. To ease the learning burden of the policy, we investigate an inside-out scheme for natural language-conditioned RL by developing a task language (TL) that is task-related and easily understood by the policy. Besides, we employ a translator to translate natural language into the TL, which is used in RL to achieve efficient policy training. We implement this scheme as TALAR (TAsk Language with predicAte Representation) that learns multiple predicates to model object relationships as the TL. Experiments indicate that TALAR not only better comprehends NL instructions but also leads to a better instruction-following policy that significantly improves the success rate over baselines and adapts to unseen expressions of NL instruction. Besides, the TL is also an effective sub-task abstraction compatible with hierarchical RL.

AAAI Conference 2023 Conference Paper

Policy-Independent Behavioral Metric-Based Representation for Deep Reinforcement Learning

  • Weijian Liao
  • Zongzhang Zhang
  • Yang Yu

Behavioral metrics can calculate the distance between states or state-action pairs from differences in rewards and transitions. By virtue of their capability to filter out task-irrelevant information in theory, using them to shape a state embedding space becomes a new trend of representation learning for deep reinforcement learning (RL), especially when there are explicit distracting factors in observation backgrounds. However, due to the tight coupling between the metric and the RL policy, such metric-based methods may result in less informative embedding spaces which can weaken their aid to the baseline RL algorithm and even consume more samples to learn. We resolve this by proposing a new behavioral metric. It decouples the learning of the RL policy and the metric owing to its independence of the RL policy. We theoretically justify its scalability to continuous state and action spaces and design a practical way to incorporate it into an RL procedure as a representation learning target. We evaluate our approach on DeepMind control tasks with default and distracting backgrounds. By statistically reliable evaluation protocols, our experiments demonstrate our approach is superior to previous metric-based methods in terms of sample efficiency and asymptotic performance in both backgrounds.

AAMAS Conference 2023 Conference Paper

Prioritized Tasks Mining for Multi-Task Cooperative Multi-Agent Reinforcement Learning

  • Yang Yu
  • Qiyue Yin
  • Junge Zhang
  • Kaiqi Huang

Multi-task learning improves data efficiency in cooperative multi-agent reinforcement learning, since agents can learn multiple related tasks simultaneously and the cooperation knowledge in a task can be utilized by others. However, existing methods mainly learn multiple cooperation tasks uniformly, regardless of their complexity and significance. In this paper, we propose a new framework called Prioritized Tasks Mining (PTM) for multi-task cooperation problems, which helps agents to identify and mine higher-priority cooperation tasks, so as to learn more effective coordinated strategies for multiple cooperation tasks. Specifically, agents use hindsight during training to identify the priority of different tasks, and explore and exploit higher-priority cooperative tasks to mine more sophisticated coordinated strategies. We evaluate PTM in challenging multi-task StarCraft micromanagement games with different scales, and results demonstrate that our method consistently outperforms all strong baselines.

AAAI Conference 2023 Conference Paper

Robust Multi-Agent Coordination via Evolutionary Generation of Auxiliary Adversarial Attackers

  • Lei Yuan
  • Ziqian Zhang
  • Ke Xue
  • Hao Yin
  • Feng Chen
  • Cong Guan
  • Lihe Li
  • Chao Qian

Cooperative Multi-agent Reinforcement Learning (CMARL) has shown to be promising for many real-world applications. Previous works mainly focus on improving coordination ability via solving MARL-specific challenges (e.g., non-stationarity, credit assignment, scalability), but ignore the policy perturbation issue when testing in a different environment. This issue hasn't been considered in problem formulation or efficient algorithm design. To address this issue, we firstly model the problem as a Limited Policy Adversary Dec-POMDP (LPA-Dec-POMDP), where some coordinators from a team might accidentally and unpredictably encounter a limited number of malicious action attacks, but the regular coordinators still strive for the intended goal. Then, we propose Robust Multi-Agent Coordination via Evolutionary Generation of Auxiliary Adversarial Attackers (ROMANCE), which enables the trained policy to encounter diversified and strong auxiliary adversarial attacks during training, thus achieving high robustness under various policy perturbations. Concretely, to avoid the ego-system overfitting to a specific attacker, we maintain a set of attackers, which is optimized to guarantee the attackers high attacking quality and behavior diversity. The goal of quality is to minimize the ego-system coordination effect, and a novel diversity regularizer based on sparse action is applied to diversify the behaviors among attackers. The ego-system is then paired with a population of attackers selected from the maintained attacker set, and alternately trained against the constantly evolving attackers. Extensive experiments on multiple scenarios from SMAC indicate our ROMANCE provides comparable or better robustness and generalization ability than other baselines.

AAMAS Conference 2023 Conference Paper

Self-Motivated Multi-Agent Exploration

  • Shaowei Zhang
  • Jiahan Cao
  • Lei Yuan
  • Yang Yu
  • De-Chuan Zhan

In cooperative multi-agent reinforcement learning (CMARL), it is critical for agents to achieve a balance between self-exploration and team collaboration. However, agents can hardly accomplish the team task without coordination and they would be trapped in a local optimum where easy cooperation is accessed without enough individual exploration. Recent works mainly concentrate on agents' coordinated exploration, which brings about the exponentially grown exploration of the state space. To address this issue, we propose Self-Motivated Multi-Agent Exploration (SMMAE), which aims to achieve success in team tasks by adaptively finding a trade-off between self-exploration and team cooperation. In SMMAE, we train an independent exploration policy for each agent to maximize their own visited state space. Each agent learns an adjustable exploration probability based on the stability of the joint team policy. The experiments on highly cooperative tasks in the StarCraft II micromanagement benchmark (SMAC) demonstrate that SMMAE can explore task-related states more efficiently, accomplish coordinated behaviours and boost the learning performance.

ICML Conference 2023 Conference Paper

Uncertainty Estimation by Fisher Information-based Evidential Deep Learning

  • Danruo Deng
  • Guangyong Chen
  • Yang Yu
  • Furui Liu
  • Pheng-Ann Heng

Uncertainty estimation is a key factor that makes deep learning reliable in practical applications. Recently proposed evidential neural networks explicitly account for different uncertainties by treating the network's outputs as evidence to parameterize the Dirichlet distribution, and achieve impressive performance in uncertainty estimation. However, for samples with high data uncertainty that are annotated with one-hot labels, the evidence-learning process for the mislabeled classes is over-penalized and remains hindered. To address this problem, we propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL). In particular, we introduce the Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network focus more on the representation learning of uncertain classes. The generalization ability of our network is further improved by optimizing the PAC-Bayesian bound. As demonstrated empirically, our proposed method consistently outperforms traditional EDL-related algorithms in multiple uncertainty estimation tasks, especially in the more challenging few-shot classification settings.

AAAI Conference 2023 Conference Paper

Untargeted Attack against Federated Recommendation Systems via Poisonous Item Embeddings and the Defense

  • Yang Yu
  • Qi Liu
  • Likang Wu
  • Runlong Yu
  • Sanshi Lei Yu
  • Zaixi Zhang

Federated recommendation (FedRec) can train personalized recommenders without collecting user data, but the decentralized nature makes it susceptible to poisoning attacks. Most previous studies focus on the targeted attack to promote certain items, while the untargeted attack that aims to degrade the overall performance of the FedRec system remains less explored. In fact, untargeted attacks can disrupt the user experience and bring severe financial loss to the service provider. However, existing untargeted attack methods are either inapplicable or ineffective against FedRec systems. In this paper, we delve into the untargeted attack and its defense for FedRec systems. (i) We propose ClusterAttack, a novel untargeted attack method. It uploads poisonous gradients that converge the item embeddings into several dense clusters, which make the recommender generate similar scores for these items in the same cluster and perturb the ranking order. (ii) We propose a uniformity-based defense mechanism (UNION) to protect FedRec systems from such attacks. We design a contrastive learning task that regularizes the item embeddings toward a uniform distribution. Then the server filters out these malicious gradients by estimating the uniformity of updated item embeddings. Experiments on two public datasets show that ClusterAttack can effectively degrade the performance of FedRec systems while circumventing many defense methods, and UNION can improve the resistance of the system against various untargeted attacks, including our ClusterAttack.
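The uniformity estimate that drives the UNION defense can be sketched with a standard log-mean-Gaussian-kernel score over normalized embeddings (in the spirit of Wang and Isola's uniformity loss; the exact loss in the paper may differ, and all names below are illustrative):

```python
import numpy as np

def uniformity(emb, t=2.0):
    """Uniformity score of L2-normalized embeddings: log of the mean
    Gaussian-kernel similarity over all distinct pairs. More negative
    means the embeddings are more uniformly spread on the hypersphere."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sq_dists = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(axis=-1)
    off_diag = ~np.eye(len(emb), dtype=bool)
    return np.log(np.exp(-t * sq_dists[off_diag]).mean())

rng = np.random.default_rng(0)
spread = rng.normal(size=(200, 16))                            # roughly uniform directions
clustered = rng.normal(size=(1, 16)) + 0.05 * rng.normal(size=(200, 16))
```

Item embeddings driven into dense clusters by an attack score close to 0 under this measure, while benign, well-spread embeddings score much lower, so a server could flag updates that push this score upward.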

AAAI Conference 2022 Conference Paper

Adapt to Environment Sudden Changes by Learning a Context Sensitive Policy

  • Fan-Ming Luo
  • Shengyi Jiang
  • Yang Yu
  • Zongzhang Zhang
  • Yi-Feng Zhang

Dealing with real-world reinforcement learning (RL) tasks, we shall be aware that the environment may have sudden changes. We expect that a robust policy is able to handle such changes and adapt to the new environment rapidly. Context-based meta reinforcement learning aims at learning environment adaptable policies. These methods adopt a context encoder to perceive the environment on-the-fly, following which a contextual policy makes environment adaptive decisions according to the context. However, previous methods show lagged and unstable context extraction, which are hard to handle sudden changes well. This paper proposes an environment sensitive contextual policy learning (ESCP) approach, in order to improve both the sensitivity and the robustness of context encoding. ESCP is composed of three key components: variance minimization that forces a rapid and stable encoding of the environment context, relational matrix determinant maximization that avoids trivial solutions, and a history-truncated recurrent neural network model that avoids old memory interference. We use a grid-world task and 5 locomotion controlling tasks with changing parameters to empirically assess our algorithm. Experiment results show that in environments with both in-distribution and out-of-distribution parameter changes, ESCP can not only better recover the environment encoding, but also adapt more rapidly to the post-change environment (10× faster in the grid-world) while the return performance is kept or improved, compared with state-of-the-art meta RL methods.

NeurIPS Conference 2022 Conference Paper

Bayesian Optimistic Optimization: Optimistic Exploration for Model-based Reinforcement Learning

  • Chenyang Wu
  • Tianci Li
  • Zongzhang Zhang
  • Yang Yu

Reinforcement learning (RL) is a general framework for modeling sequential decision making problems, at the core of which lies the dilemma of exploitation and exploration. An agent failing to explore systematically will inevitably fail to learn efficiently. Optimism in the face of uncertainty (OFU) is a conventionally successful strategy for efficient exploration. An agent following the OFU principle explores actively and efficiently. However, when applied to model-based RL, it involves specifying a confidence set of the underlying model and solving a series of nonlinear constrained optimization, which can be computationally intractable. This paper proposes an algorithm, Bayesian optimistic optimization (BOO), which adopts a dynamic weighting technique for enforcing the constraint rather than explicitly solving a constrained optimization problem. BOO is a general algorithm proved to be sample-efficient for models in a finite-dimensional reproducing kernel Hilbert space. We also develop techniques for effective optimization and show through some simulation experiments that BOO is competitive with the existing algorithms.

JMLR Journal 2022 Journal Article

Distributed Bootstrap for Simultaneous Inference Under High Dimensionality

  • Yang Yu
  • Shih-Kang Chao
  • Guang Cheng

We propose a distributed bootstrap method for simultaneous inference on high-dimensional massive data that are stored and processed with many machines. The method produces an $\ell_\infty$-norm confidence region based on a communication-efficient de-biased lasso, and we propose an efficient cross-validation approach to tune the method at every iteration. We theoretically prove a lower bound on the number of communication rounds $\tau_{\min}$ that warrants the statistical accuracy and efficiency. Furthermore, $\tau_{\min}$ only increases logarithmically with the number of workers and the intrinsic dimensionality, while nearly invariant to the nominal dimensionality. We test our theory by extensive simulation studies, and a variable screening task on a semi-synthetic dataset based on the US Airline On-Time Performance dataset. The code to reproduce the numerical results is available in Supplementary Material.

NeurIPS Conference 2022 Conference Paper

Efficient Multi-agent Communication via Self-supervised Information Aggregation

  • Cong Guan
  • Feng Chen
  • Lei Yuan
  • Chenghe Wang
  • Hao Yin
  • Zongzhang Zhang
  • Yang Yu

Utilizing messages from teammates can improve coordination in cooperative Multi-agent Reinforcement Learning (MARL). To obtain meaningful information for decision-making, previous works typically combine raw messages generated by teammates with local information as inputs for policy. However, neglecting the aggregation of multiple messages poses great inefficiency for policy learning. Motivated by recent advances in representation learning, we argue that efficient message aggregation is essential for good coordination in MARL. In this paper, we propose Multi-Agent communication via Self-supervised Information Aggregation (MASIA), with which agents can aggregate the received messages into compact representations with high relevance to augment the local policy. Specifically, we design a permutation invariant message encoder to generate common information aggregated representation from raw messages and optimize it via reconstructing and shooting future information in a self-supervised manner. Each agent would utilize the most relevant parts of the aggregated representation for decision-making by a novel message extraction mechanism. Empirical results demonstrate that our method significantly outperforms strong baselines on multiple cooperative MARL tasks for various task settings.

IJCAI Conference 2022 Conference Paper

Efficient Multi-Agent Communication via Shapley Message Value

  • Di Xue
  • Lei Yuan
  • Zongzhang Zhang
  • Yang Yu

Utilizing messages from teammates is crucial in cooperative multi-agent tasks due to the partially observable nature of the environment. Naively asking messages from all teammates without pruning may confuse individual agents, hindering the learning process and impairing the whole system's performance. Most previous work either utilizes a gate or employs an attention mechanism to extract relatively important messages. However, they do not explicitly evaluate each message's value, failing to learn an efficient communication protocol in more complex scenarios. To tackle this issue, we model the teammates of an agent as a message coalition and calculate the Shapley Message Value (SMV) of each agent within it. SMV reflects the contribution of each message to an agent and redundant messages can be spotted in this way effectively. On top of that, we design a novel framework named Shapley Message Selector (SMS), which learns to predict the SMVs of teammates for an agent solely based on local information so that the agent can only query those teammates with positive SMVs. Empirically, we demonstrate that our method can prune redundant messages and achieve comparable or better performance in various multi-agent cooperative scenarios than full communication settings and existing strong baselines.

AAAI Conference 2022 Conference Paper

Invariant Action Effect Model for Reinforcement Learning

  • Zheng-Mao Zhu
  • Shengyi Jiang
  • Yu-Ren Liu
  • Yang Yu
  • Kun Zhang

Good representations can help RL agents perform concise modeling of their surroundings, and thus support effective decision-making in complex environments. Previous methods learn good representations by imposing extra constraints on dynamics. However, from the causal perspective, the causation between the action and its effect is not fully considered in those methods, which leads to the ignorance of the underlying relations among the action effects on the transitions. Based on the intuition that the same action always causes similar effects among different states, we induce such causation by taking the invariance of action effects among states as the relation. By explicitly utilizing such invariance, in this paper, we show that a better representation can be learned and potentially improves the sample efficiency and the generalization ability of the learned policy. We propose the Invariant Action Effect Model (IAEM) to capture the invariance in action effects, where the effect of an action is represented as the residual of representations from neighboring states. IAEM is composed of two parts: (1) a new contrastive-based loss to capture the underlying invariance of action effects; (2) an individual action effect module with a self-adapted weighting strategy to tackle the corner cases where the invariance does not hold. The extensive experiments on two benchmarks, i.e., Grid-World and Atari, show that the representations learned by IAEM preserve the invariance of action effects. Moreover, with the invariant action effect, IAEM can accelerate the learning process by 1.6x, rapidly generalize to new environments by fine-tuning on a few components, and outperform other dynamics-based representation methods by 1.4x in limited steps.

IJCAI Conference 2022 Conference Paper

Multi-Agent Concentrative Coordination with Decentralized Task Representation

  • Lei Yuan
  • Chenghe Wang
  • Jianhao Wang
  • Fuxiang Zhang
  • Feng Chen
  • Cong Guan
  • Zongzhang Zhang
  • Chongjie Zhang

Value-based multi-agent reinforcement learning (MARL) methods hold the promise of promoting coordination in cooperative settings. Popular MARL methods mainly focus on the scalability or the representational capacity of value functions. Such a learning paradigm can reduce agents' uncertainties and promote coordination. However, they fail to leverage the task structure decomposability, which generally exists in real-world multi-agent systems (MASs), leading to a significant amount of time exploring the optimal policy in complex scenarios. To address this limitation, we propose a novel framework Multi-Agent Concentrative Coordination (MACC) based on task decomposition, with which an agent can implicitly form local groups to reduce the learning space to facilitate coordination. In MACC, agents first learn representations for subtasks from their local information and then implement an attention mechanism to concentrate on the most relevant ones. Thus, agents can pay targeted attention to specific subtasks and improve coordination. Extensive experiments on various complex multi-agent benchmarks demonstrate that MACC achieves remarkable performance compared to existing methods.

NeurIPS Conference 2022 Conference Paper

Multi-agent Dynamic Algorithm Configuration

  • Ke Xue
  • Jiacheng Xu
  • Lei Yuan
  • Miqing Li
  • Chao Qian
  • Zongzhang Zhang
  • Yang Yu

Automated algorithm configuration relieves users from tedious, trial-and-error tuning tasks. A popular algorithm configuration tuning paradigm is dynamic algorithm configuration (DAC), in which an agent learns dynamic configuration policies across instances by reinforcement learning (RL). However, in many complex algorithms, there may exist different types of configuration hyperparameters, and such heterogeneity may bring difficulties for classic DAC which uses a single-agent RL policy. In this paper, we aim to address this issue and propose multi-agent DAC (MA-DAC), with one agent working for one type of configuration hyperparameter. MA-DAC formulates the dynamic configuration of a complex algorithm with multiple types of hyperparameters as a contextual multi-agent Markov decision process and solves it by a cooperative multi-agent RL (MARL) algorithm. To instantiate, we apply MA-DAC to a well-known optimization algorithm for multi-objective optimization problems. Experimental results show the effectiveness of MA-DAC in not only achieving superior performance compared with other configuration tuning approaches based on heuristic rules, multi-armed bandits, and single-agent RL, but also being capable of generalizing to different problem classes. Furthermore, we release the environments in this paper as a benchmark for testing MARL algorithms, with the hope of facilitating the application of MARL.

AAAI Conference 2022 Conference Paper

Multi-Agent Incentive Communication via Decentralized Teammate Modeling

  • Lei Yuan
  • Jianhao Wang
  • Fuxiang Zhang
  • Chenghe Wang
  • Zongzhang Zhang
  • Yang Yu
  • Chongjie Zhang

Effective communication can improve coordination in cooperative multi-agent reinforcement learning (MARL). One popular communication scheme is exchanging agents’ local observations or latent embeddings and using them to augment individual local policy inputs. Such a communication paradigm can reduce uncertainty for local decision-making and induce implicit coordination. However, it enlarges agents’ local policy spaces and increases learning complexity, leading to poor coordination in complex settings. To handle this limitation, this paper proposes a novel framework named Multi-Agent Incentive Communication (MAIC) that allows each agent to learn to generate incentive messages and bias other agents’ value functions directly, resulting in effective explicit coordination. Our method first learns targeted teammate models, with which each agent can anticipate teammates’ action selections and generate tailored messages for specific agents. We further introduce a novel regularization to leverage interaction sparsity and improve communication efficiency. MAIC is agnostic to specific MARL algorithms and can be flexibly integrated with different value function factorization methods. Empirical results demonstrate that our method significantly outperforms baselines and achieves excellent performance on multiple cooperative MARL tasks.

NeurIPS Conference 2022 Conference Paper

NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning

  • Rong-Jun Qin
  • Xingyuan Zhang
  • Songyi Gao
  • Xiong-Hui Chen
  • Zewen Li
  • Weinan Zhang
  • Yang Yu

Offline reinforcement learning (RL) aims at learning effective policies from historical data without extra environment interactions. In our experience of applying offline RL, we noticed that previous offline RL benchmarks commonly involve significant reality gaps, which we identify as overly rich and exploratory datasets, degraded baselines, and missing policy validation. In many real-world situations, to ensure system safety, running an overly exploratory policy to collect diverse data is prohibited, so only a narrow data distribution is available. The resulting policy is regarded as effective if it is better than the working behavior policy, and the policy model can be deployed only if it has been well validated, rather than merely having completed training. In this paper, we present a Near real-world offline RL benchmark, named NeoRL, to reflect these properties. NeoRL datasets are collected with a more conservative strategy. Moreover, NeoRL contains an offline training and offline validation pipeline before the online test, corresponding to real-world situations. We then evaluate recent state-of-the-art offline RL algorithms on NeoRL. The empirical results demonstrate that some offline RL algorithms are less competitive than behavior cloning and the deterministic behavior policy, implying that they could be less effective in real-world tasks than in previous benchmarks. We also find that current offline policy evaluation methods can hardly select the best policy. We hope this work will shed some light on future research and on deploying RL in real-world systems.

JBHI Journal 2022 Journal Article

Non-Invasive Analysis of Motor Unit Activation During Simultaneous and Continuous Wrist Movements

  • Chen Chen
  • Yang Yu
  • Xinjun Sheng
  • Xiangyang Zhu

Surface electromyography (EMG) signals have shown promising applications in human-machine interfacing (HMI) systems such as orthotics, prosthetics, and exoskeletons. Nevertheless, existing myoelectric control methods, generally based on time-domain or frequency-domain features, could not directly interpret neural commands. EMG decomposition techniques have become a prevailing solution to decode the motor neuron discharges from the spinal cord, whereas only single degree-of-freedom (DoF) movements are primarily involved in the current neural-based interfaces, resulting in limited intuitiveness and functionality. Here, we propose a non-invasive framework to analyze motor unit activities and estimate wrist torques during simultaneous contractions of multiple DoFs. Motor unit discharges were decoded from surface EMG signals and pooled into groups during sequential wrist movements. Then three neural features were extracted and linearly projected to the torques of multi-DoF tasks. On average, there were 44 ± 13 motor units identified for each motion with a PNR value of 25.8 ± 2.9 dB. The neural features outperformed the classic EMG feature on the estimation accuracy with higher correlation coefficients and smoothness. These results demonstrate the feasibility and superiority of the proposed framework in kinetics estimation of simultaneous movements, extending the potential applications of surface EMG decomposition in human-machine interfaces.

JAIR Journal 2022 Journal Article

On Efficient Reinforcement Learning for Full-length Game of StarCraft II

  • Ruo-Ze Liu
  • Zhen-Jia Pang
  • Zhou-Yu Meng
  • Wenhai Wang
  • Yang Yu
  • Tong Lu

StarCraft II (SC2) poses a grand challenge for reinforcement learning (RL); the main difficulties include a huge state space, a varying action space, and a long time horizon. In this work, we investigate a set of RL techniques for the full-length game of StarCraft II. We investigate a hierarchical RL approach, where the hierarchy involves two levels. One is the extracted macro-actions from experts’ demonstration trajectories, which reduce the action space by an order of magnitude. The other is a hierarchical architecture of neural networks, which is modular and facilitates scaling. We investigate a curriculum transfer training procedure that trains the agent from the simplest level to the hardest level. We train the agent on a single machine with 4 GPUs and 48 CPU threads. On a 64x64 map and using restrictive units, we achieve a win rate of 99% against the difficulty level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat models, we achieve a 93% win rate against the most difficult non-cheating built-in AI (level-7). In this extended version of the paper, we improve our architecture to train the agent against the most difficult cheating-level AIs (level-8, level-9, and level-10). We also test our method on different maps to evaluate the extensibility of our approach. With a final 3-layer hierarchical architecture and significant training tricks, we increase the win rates against level-8, level-9, and level-10 to 96%, 97%, and 94%, respectively. Our codes and models are all open-sourced at https://github.com/liuruoze/HierNet-SC2. To provide a baseline with reference to AlphaStar for our work, as well as for the research and open-source community, we reproduce a scaled-down version of it, mini-AlphaStar (mAS). The latest version of mAS is 1.07; it can be trained using supervised learning and reinforcement learning on the raw action space, which has 564 actions. It is designed to run training on a single common machine, by making the hyper-parameters adjustable and some settings simplified. We can then compare our work with mAS using the same computing resources and training time. The experiment results show that our method is more effective when using limited resources. The inference and training codes of mini-AlphaStar are all open-sourced at https://github.com/liuruoze/mini-AlphaStar. We hope our study could shed some light on future research of efficient reinforcement learning on SC2 and other large-scale games.

NeurIPS Conference 2021 Conference Paper

Adaptive Online Packing-guided Search for POMDPs

  • Chenyang Wu
  • Guoyu Yang
  • Zongzhang Zhang
  • Yang Yu
  • Dong Li
  • Wulong Liu
  • Jianye Hao

The partially observable Markov decision process (POMDP) provides a general framework for modeling an agent's decision process under state uncertainty, and online planning plays a pivotal role in solving it. A belief is a distribution over states representing state uncertainty. Methods for large-scale POMDP problems rely on the same idea of sampling both states and observations: instead of exact belief updating, a collection of sampled states is used to approximate the belief; instead of considering all possible observations, only a set of sampled observations is considered. Inspired by this, we take one step further and propose an online planning algorithm, Adaptive Online Packing-guided Search (AdaOPS), to better approximate beliefs with an adaptive particle filtering technique and to balance estimation bias and variance by fusing similar observation branches. Theoretically, our algorithm is guaranteed to find an $\epsilon$-optimal policy with high probability given enough planning time under some mild assumptions. We evaluate our algorithm on several tricky POMDP domains, and it outperforms the state-of-the-art in all of them.

AAAI Conference 2021 Conference Paper

Circles are like Ellipses, or Ellipses are like Circles? Measuring the Degree of Asymmetry of Static and Contextual Word Embeddings and the Implications to Representation Learning

  • Wei Zhang
  • Murray Campbell
  • Yang Yu
  • Sadhana Kumaravel

Human judgments of word similarity have been a popular method of evaluating the quality of word embeddings, but they fail to measure geometric properties such as asymmetry. For example, it is more natural to say “Ellipses are like Circles” than “Circles are like Ellipses”. Such asymmetry has been observed in word evocation experiments, where one word is used to recall another. These association data have been understudied as a measure of embedding quality. In this paper, we use three well-known evocation datasets for this purpose and study both static embeddings and contextual embeddings such as BERT. To handle the dynamic nature of BERT embeddings, we probe BERT’s conditional probabilities as a language model, using a large number of Wikipedia contexts to derive a theoretically justifiable Bayesian asymmetry score. The results show that asymmetry judgments and similarity judgments disagree, and that the asymmetry judgment aligns with BERT’s strong performance on “extrinsic evaluations”. This is the first time contextual embeddings’ strength has been shown on an intrinsic evaluation, and the asymmetry judgment provides a new perspective for evaluating contextual embeddings and new insights for representation learning.

NeurIPS Conference 2021 Conference Paper

Cross-modal Domain Adaptation for Cost-Efficient Visual Reinforcement Learning

  • Xiong-Hui Chen
  • Shengyi Jiang
  • Feng Xu
  • Zongzhang Zhang
  • Yang Yu

In visual-input sim-to-real scenarios, to overcome the reality gap between images rendered in simulators and those from the real world, domain adaptation, i.e., learning an aligned representation space between simulators and the real world and then training and deploying policies in the aligned representation, is a promising direction. Previous methods focus on same-modal domain adaptation. However, those methods require building and running simulators that render high-quality images, which can be difficult and costly. In this paper, we consider a more cost-efficient setting of visual-input sim-to-real where only low-dimensional states are simulated. We first point out that the objective of learning mapping functions in previous methods that align the representation spaces is ill-posed and prone to yield an incorrect mapping. When the mapping crosses modalities, previous methods fail more easily. Our algorithm, Cross-mOdal Domain Adaptation with Sequential structure (CODAS), mitigates the ill-posedness by utilizing the sequential nature of the data sampling process in RL tasks. Experiments on MuJoCo and Hand Manipulation Suite tasks show that agents deployed with our method achieve performance similar to that in the source domain, while those deployed with previous methods designed for same-modal domain adaptation suffer a larger performance gap.

AAAI Conference 2021 Short Paper

Enhancing Context-Based Meta-Reinforcement Learning Algorithms via An Efficient Task Encoder (Student Abstract)

  • Feng Xu
  • Shengyi Jiang
  • Hao Yin
  • Zongzhang Zhang
  • Yang Yu
  • Ming Li
  • Dong Li
  • Wulong Liu

Meta-Reinforcement Learning (meta-RL) algorithms enable agents to adapt to new tasks from small amounts of exploration, based on the experience of similar tasks. Recent studies have pointed out that a good representation of a task is key to the success of off-policy context-based meta-RL. Inspired by contrastive methods in unsupervised representation learning, we propose a new method to learn the task representation based on the mutual information between transition tuples in a trajectory and the task embedding. We also propose a new estimation of task similarity based on the Q-function, which can be used to form a constraint on the distribution of the encoded task variables, making the encoded task variables more effective on new tasks. Experiments on meta-RL tasks show that the newly proposed method outperforms existing meta-RL algorithms.

IJCAI Conference 2021 Conference Paper

Fast Pareto Optimization for Subset Selection with Dynamic Cost Constraints

  • Chao Bian
  • Chao Qian
  • Frank Neumann
  • Yang Yu

Subset selection with cost constraints is a fundamental problem with various applications such as influence maximization and sensor placement. The goal is to select a subset from a ground set to maximize a monotone objective function such that a monotone cost function is upper bounded by a budget. Previous algorithms with bounded approximation guarantees include the generalized greedy algorithm, POMC and EAMC, all of which can achieve the best known approximation guarantee. In real-world scenarios, the resources often vary, i.e., the budget often changes over time, requiring the algorithms to adapt their solutions quickly. However, when the budget changes dynamically, all three of these algorithms either achieve arbitrarily bad approximation guarantees or require a long running time. In this paper, we propose a new algorithm, FPOMC, that combines the merits of the generalized greedy algorithm and POMC: FPOMC introduces a greedy selection strategy into POMC. We prove that FPOMC efficiently maintains the best known approximation guarantee.
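The generalized greedy baseline that FPOMC borrows its selection strategy from can be sketched as below. This is a minimal illustration, not the paper's FPOMC, and it omits the usual final comparison with the best single item; the coverage instance and names are hypothetical:

```python
def generalized_greedy(ground_set, f, c, budget):
    """Repeatedly add the item with the best marginal-gain /
    marginal-cost ratio while the cost stays within the budget."""
    S, remaining = [], set(ground_set)
    while remaining:
        best, best_ratio = None, float("-inf")
        for v in remaining:
            dc = c(S + [v]) - c(S)  # marginal cost of adding v
            if c(S + [v]) > budget or dc <= 0:
                continue
            ratio = (f(S + [v]) - f(S)) / dc
            if ratio > best_ratio:
                best, best_ratio = v, ratio
        if best is None:  # no feasible item left
            break
        S.append(best)
        remaining.remove(best)
    return S

# Toy maximum-coverage instance: cover {'a','b','c'} with at most 2 sets.
sets = {1: {'a', 'b'}, 2: {'b'}, 3: {'c'}}
f = lambda S: len(set().union(*(sets[i] for i in S)))  # covered elements
c = lambda S: len(S)                                   # unit costs
chosen = generalized_greedy(sets, f, c, budget=2)
```

A dynamic budget is exactly where this fixed-time greedy struggles: when the budget changes, it must restart from scratch, whereas FPOMC maintains a population it can adapt.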

AAAI Conference 2021 Short Paper

Incorporating Bidirection-Interactive Information and Semantic Features for Relational Facts Extraction (Student Abstract)

  • Yang Yu
  • Guohua Wang
  • Haopeng Ren
  • Yi Cai

The interaction between named entity recognition and relation classification is essential for the extraction of relational triplets. However, most joint extraction works consider only unidirectional interaction between the two subtasks, or even neglect the interactive information entirely. To tackle these problems, we propose a novel unified joint extraction model that considers bidirection-interactive information between the two subtasks. Our model consists of two modules. The first module utilizes a Bi-LSTM and a GCN to capture the sequential and structure-semantic features of a sentence. The second module utilizes two layers to capture bidirection-interactive information between the two subtasks and generates relational triplets accordingly. The experimental results show that our proposed model outperforms the state-of-the-art models on two public datasets.

AAAI Conference 2021 Short Paper

LB-DESPOT: Efficient Online POMDP Planning Considering Lower Bound in Action Selection (Student Abstract)

  • Chenyang Wu
  • Rui Kong
  • Guoyu Yang
  • Xianghan Kong
  • Zongzhang Zhang
  • Yang Yu
  • Dong Li
  • Wulong Liu

Partially observable Markov decision process (POMDP) is an extension of MDP. It handles state uncertainty by specifying the probability of receiving a particular observation given the current state. DESPOT is one of the most popular scalable online planning algorithms for POMDPs; it manages to significantly reduce the size of the decision tree while deriving a near-optimal policy by considering only K scenarios. Nevertheless, there is a gap in action selection criteria between planning and execution in DESPOT. During the planning stage, it keeps choosing the action with the highest upper bound, whereas when planning ends, the action with the highest lower bound is chosen for execution. Here, we propose LB-DESPOT, which utilizes the lower bound when selecting an action branch to expand, to alleviate this issue. Empirically, our method attains better performance than DESPOT and POMCP, another state-of-the-art method, on several challenging POMDP benchmark tasks.

NeurIPS Conference 2021 Conference Paper

Offline Model-based Adaptable Policy Learning

  • Xiong-Hui Chen
  • Yang Yu
  • Qingyang Li
  • Fan-Ming Luo
  • Zhiwei Qin
  • Wenjie Shang
  • Jieping Ye

In reinforcement learning, a promising direction to avoid online trial-and-error costs is learning from an offline dataset. Current offline reinforcement learning methods commonly learn in the policy space constrained to in-support regions by the offline dataset, in order to ensure the robustness of the outcome policies. Such constraints, however, also limit the potential of the outcome policies. In this paper, to release the potential of offline policy learning, we investigate the decision-making problems in out-of-support regions directly and propose offline Model-based Adaptable Policy LEarning (MAPLE). By this approach, instead of learning in in-support regions, we learn an adaptable policy that can adapt its behavior in out-of-support regions when deployed. We conduct experiments on MuJoCo controlling tasks with offline datasets. The results show that the proposed method can make robust decisions in out-of-support regions and achieve better performance than SOTA algorithms.

NeurIPS Conference 2021 Conference Paper

Regret Minimization Experience Replay in Off-Policy Reinforcement Learning

  • Xu-Hui Liu
  • Zhenghai Xue
  • Jingcheng Pang
  • Shengyi Jiang
  • Feng Xu
  • Yang Yu

In reinforcement learning, experience replay stores past samples for further reuse. Prioritized sampling is a promising technique to better utilize these samples. Previous criteria of prioritization include TD error, recentness and corrective feedback, which are mostly heuristically designed. In this work, we start from the regret minimization objective, and obtain an optimal prioritization strategy for Bellman update that can directly maximize the return of the policy. The theory suggests that data with higher hindsight TD error, better on-policiness and more accurate Q value should be assigned with higher weights during sampling. Thus most previous criteria only consider this strategy partially. We not only provide theoretical justifications for previous criteria, but also propose two new methods to compute the prioritization weight, namely ReMERN and ReMERT. ReMERN learns an error network, while ReMERT exploits the temporal ordering of states. Both methods outperform previous prioritized sampling algorithms in challenging RL benchmarks, including MuJoCo, Atari and Meta-World.
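The hindsight-TD-error ingredient of this prioritization can be illustrated with a standard proportional-prioritization sketch in the style of prioritized experience replay. This is a toy baseline only; the actual ReMERN/ReMERT weights also incorporate on-policiness and Q-value accuracy, which are omitted here:

```python
import numpy as np

def priority_weights(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: p_i is proportional to
    (|delta_i| + eps) ** alpha, normalized into a distribution."""
    p = (np.abs(td_errors) + eps) ** alpha
    return p / p.sum()

def importance_weights(probs, n, beta=0.4):
    """Importance-sampling correction so the prioritized Bellman
    update stays (approximately) unbiased; normalized by the max."""
    w = (n * probs) ** (-beta)
    return w / w.max()

td = np.array([0.1, 2.0, 0.5, 0.0])       # hindsight TD errors of 4 samples
probs = priority_weights(td)              # sampling distribution over the buffer
w = importance_weights(probs, n=len(td))  # per-sample loss weights
```

Transitions with larger hindsight TD error are sampled more often, and the IS weights shrink their per-sample gradient contribution in compensation.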

JBHI Journal 2021 Journal Article

Wrist Torque Estimation via Electromyographic Motor Unit Decomposition and Image Reconstruction

  • Yang Yu
  • Chen Chen
  • Xinjun Sheng
  • Xiangyang Zhu

Neural interfaces using decomposed motor units (MUs) from surface electromyography (sEMG) have allowed non-invasive access to neural control signals and provide a novel approach for intuitive human-machine interaction. However, most of the existing methods based on decomposed MUs merely adopt the discharge rate (DR) as the feature representation, which may lack local information around the discharge instant and ignore the subtle interactions of different MUs. In this study, we proposed an MU-specific image-based scheme for wrist torque estimation. Specifically, the high-density sEMG signals were decoded into motor unit spike trains (MUSTs), and then MU-specific images were reconstructed from the MUSTs and the corresponding motor unit action potentials (MUAPs). A convolutional neural network was used to learn representative features from the MU-specific images automatically, and further to estimate wrist torques. The results demonstrated that the proposed method outperformed three conventional regression approaches and a deep-learning regression approach using DR features, with estimation accuracies (R²) of 0.82 ± 0.09 and 0.89 ± 0.06, and nRMSE of 12.6 ± 2.5% and 11.0 ± 3.1%, for pronation/supination and flexion/extension, respectively. Further, the features extracted from the MU-specific images showed a higher correlation with the recorded torques than DR, indicating the effectiveness of the proposed method. The outcomes of this study provide a novel and promising perspective for the intuitive control of neural interfacing.

AAAI Conference 2020 Conference Paper

An Efficient Evolutionary Algorithm for Subset Selection with General Cost Constraints

  • Chao Bian
  • Chao Feng
  • Chao Qian
  • Yang Yu

In this paper, we study the problem of selecting a subset from a ground set to maximize a monotone objective function f such that a monotone cost function c is bounded by an upper limit. State-of-the-art algorithms include the generalized greedy algorithm and POMC. The former is an efficient fixed-time algorithm, but its performance is limited by its greedy nature. The latter is an anytime algorithm that can find better subsets using more time, but without any polynomial-time approximation guarantee. We propose a new anytime algorithm, EAMC, which employs a simple evolutionary algorithm to optimize a surrogate objective integrating f and c. We prove that EAMC achieves the best known approximation guarantee in polynomial expected running time. Experimental results on the applications of maximum coverage, influence maximization and sensor placement show the excellent performance of EAMC.

NeurIPS Conference 2020 Conference Paper

Error Bounds of Imitating Policies and Environments

  • Tian Xu
  • Ziniu Li
  • Yang Yu

Imitation learning trains a policy by mimicking expert demonstrations. Various imitation methods have been proposed and empirically evaluated; meanwhile, their theoretical understanding needs further study. In this paper, we first analyze the value gap between the expert policy and imitated policies for two imitation methods, behavioral cloning and generative adversarial imitation. The results support that generative adversarial imitation can reduce the compounding errors compared to behavioral cloning, and thus has a better sample complexity. Noting that, by considering the environment transition model as a dual agent, imitation learning can also be used to learn the environment model, we further analyze the performance of imitating environments based on the bounds for imitating policies. The results show that environment models can be more effectively imitated by generative adversarial imitation than by behavioral cloning, suggesting a novel application of adversarial imitation for model-based reinforcement learning. We hope these results could inspire future advances in imitation learning and model-based reinforcement learning.
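In the discounted setting with effective horizon $1/(1-\gamma)$, the headline comparison between the two methods is commonly stated as follows (a standard form of these value-gap bounds, with $\epsilon$ the imitation error on the expert distribution; see the paper for the precise assumptions and constants):

$$\left|V^{\pi_E} - V^{\pi_{\mathrm{BC}}}\right| = O\!\left(\frac{\epsilon}{(1-\gamma)^2}\right), \qquad \left|V^{\pi_E} - V^{\pi_{\mathrm{GAIL}}}\right| = O\!\left(\frac{\epsilon}{1-\gamma}\right),$$

i.e., generative adversarial imitation avoids the quadratic compounding of errors over the effective horizon that behavioral cloning suffers.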

NeurIPS Conference 2020 Conference Paper

Offline Imitation Learning with a Misspecified Simulator

  • Shengyi Jiang
  • Jingcheng Pang
  • Yang Yu

In real-world decision-making tasks, learning an optimal policy without a trial-and-error process is an appealing challenge. When expert demonstrations are available, imitation learning that mimics expert actions can learn a good policy efficiently. Learning in simulators is another commonly adopted approach to avoid real-world trial and error. However, neither sufficient expert demonstrations nor high-fidelity simulators are easy to obtain. In this work, we investigate policy learning under the condition of a few expert demonstrations and a simulator with misspecified dynamics. Under the mild assumption that local states remain partially aligned under a dynamics mismatch, we propose imitation learning with horizon-adaptive inverse dynamics (HIDIL), which matches simulator states with expert states over an $H$-step horizon and accurately recovers actions based on inverse dynamics policies. In the real environment, HIDIL can effectively derive adapted actions from the matched states. Experiments are conducted in four MuJoCo locomotion environments with modified friction, gravity, and density configurations. The results show that HIDIL achieves significant improvement in terms of performance and stability in all of the real environments, compared with imitation learning methods and transfer methods in reinforcement learning.

NeurIPS Conference 2020 Conference Paper

RetroXpert: Decompose Retrosynthesis Prediction Like A Chemist

  • Chaochao Yan
  • Qianggang Ding
  • Peilin Zhao
  • Shuangjia Zheng
  • Jinyu Yang
  • Yang Yu
  • Junzhou Huang

Retrosynthesis is the process of recursively decomposing target molecules into available building blocks. It plays an important role in solving problems in organic synthesis planning. To automate or assist in the retrosynthesis analysis, various retrosynthesis prediction algorithms have been proposed. However, most of them are cumbersome and lack interpretability about their predictions. In this paper, we devise a novel template-free algorithm for automatic retrosynthetic expansion inspired by how chemists approach retrosynthesis prediction. Our method disassembles retrosynthesis into two steps: i) identify the potential reaction center of the target molecule through a novel graph neural network and generate intermediate synthons, and ii) generate the reactants associated with synthons via a robust reactant generation model. While outperforming the state-of-the-art baselines by a significant margin, our model also provides chemically reasonable interpretation.

NeurIPS Conference 2019 Conference Paper

Bridging Machine Learning and Logical Reasoning by Abductive Learning

  • Wang-Zhou Dai
  • Qiuling Xu
  • Yang Yu
  • Zhi-Hua Zhou

Perception and reasoning are two representative abilities of intelligence that are integrated seamlessly during human problem-solving processes. In the area of artificial intelligence (AI), the two abilities are usually realised by machine learning and logic programming, respectively. However, the two categories of techniques were developed separately throughout most of the history of AI. In this paper, we present the abductive learning targeted at unifying the two AI paradigms in a mutually beneficial way, where the machine learning model learns to perceive primitive logic facts from data, while logical reasoning can exploit symbolic domain knowledge and correct the wrongly perceived facts for improving the machine learning models. Furthermore, we propose a novel approach to optimise the machine learning model and the logical reasoning model jointly. We demonstrate that by using abductive learning, machines can learn to recognise numbers and resolve unknown mathematical operations simultaneously from images of simple hand-written equations. Moreover, the learned models can be generalised to longer equations and adapted to different tasks, which is beyond the capability of state-of-the-art deep learning models.

IJCAI Conference 2019 Conference Paper

Cascaded Algorithm-Selection and Hyper-Parameter Optimization with Extreme-Region Upper Confidence Bound Bandit

  • Yi-Qi Hu
  • Yang Yu
  • Jun-Da Liao

An automatic machine learning (AutoML) task is to select the best algorithm and its hyper-parameters simultaneously. Previously, the hyper-parameters of all algorithms were joined into a single search space, which is not only huge but also redundant, because many hyper-parameter dimensions are irrelevant to the selected algorithm. In this paper, we propose a cascaded approach for algorithm selection and hyper-parameter optimization. While a search procedure is employed at the level of hyper-parameter optimization, a bandit strategy runs at the level of algorithm selection to allocate the budget based on search feedback. Since the bandit is required to select the algorithm with the maximum performance, instead of the average performance, we propose the extreme-region upper confidence bound (ER-UCB) strategy, which focuses on the extreme region of the underlying feedback distribution. We show theoretically that ER-UCB has a regret upper bound of O(K ln n) with independent feedbacks, which is as efficient as the classical UCB bandit. We also conduct experiments on a synthetic problem as well as a set of AutoML tasks. The results verify the effectiveness of the proposed method.
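The classical UCB strategy that ER-UCB is compared against can be sketched as below. This implements plain UCB1 only; ER-UCB itself replaces the empirical mean with a statistic of the extreme region of the feedback distribution, which is not reproduced here, and the two-armed example is hypothetical:

```python
import math
import random

def ucb1(pulls, horizon, rng=random.Random(0)):
    """Plain UCB1: play the arm maximizing empirical mean plus an
    exploration bonus; returns per-arm pull counts."""
    K = len(pulls)
    counts, sums = [0] * K, [0.0] * K
    for t in range(horizon):
        if t < K:
            arm = t  # initialize: play each arm once
        else:
            arm = max(range(K),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t + 1) / counts[i]))
        r = pulls[arm](rng)
        counts[arm] += 1
        sums[arm] += r
    return counts

# Two toy "algorithms" with feedback means 0.25 and 0.75.
arms = [lambda rng: rng.random() * 0.5,
        lambda rng: 0.5 + rng.random() * 0.5]
counts = ucb1(arms, horizon=2000)
```

The AutoML motivation for the extreme-region variant is that the budget should favor the algorithm whose *best* configurations score highest, not the one with the best average feedback, which plain UCB1 targets.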

AAAI Conference 2019 Conference Paper

Multi-Fidelity Automatic Hyper-Parameter Tuning via Transfer Series Expansion

  • Yi-Qi Hu
  • Yang Yu
  • Wei-Wei Tu
  • Qiang Yang
  • Yuqiang Chen
  • Wenyuan Dai

Automatic machine learning (AutoML) aims at automatically choosing the best configuration for machine learning tasks. However, a configuration evaluation can be very time-consuming, particularly on learning tasks with large datasets. This limitation usually restrains derivative-free optimization from releasing its full power for a fine configuration search using many evaluations. To alleviate this limitation, in this paper, we propose a derivative-free optimization framework for AutoML using multi-fidelity evaluations. It uses many low-fidelity evaluations on small data subsets and very few high-fidelity evaluations on the full dataset. However, the low-fidelity evaluations can be badly biased and need to be corrected at only a very low cost. We thus propose Transfer Series Expansion (TSE), which learns the low-fidelity correction predictor efficiently by linearly combining a set of base predictors. The base predictors can be obtained cheaply from down-scaled and experienced tasks. Experimental results on real-world AutoML problems verify that the proposed framework can significantly accelerate derivative-free configuration search by making use of the multi-fidelity evaluations.
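The linear-combination idea behind TSE can be sketched with ordinary least squares on toy data. The base predictors and the synthetic residual below are illustrative assumptions only; in the paper, the base predictors come from down-scaled, experienced tasks:

```python
import numpy as np

# Hypothetical base predictors mapping a low-fidelity score to a
# predicted correction; TSE obtains these cheaply from other tasks.
base_predictors = [lambda x: x, lambda x: x ** 2, lambda x: np.ones_like(x)]

def fit_tse_weights(x_low, residual):
    """Learn the correction predictor as a linear combination of the
    base predictors via least squares on observed residuals."""
    Phi = np.stack([g(x_low) for g in base_predictors], axis=1)
    w, *_ = np.linalg.lstsq(Phi, residual, rcond=None)
    return w

def corrected(x_low, w):
    """Low-fidelity score plus the predicted correction."""
    Phi = np.stack([g(x_low) for g in base_predictors], axis=1)
    return x_low + Phi @ w

# Toy ground truth: high-fidelity score = low-fidelity score + 0.3x - 0.1,
# so only a handful of paired evaluations are needed to fit the weights.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=20)
residual = 0.3 * x - 0.1
w = fit_tse_weights(x, residual)
```

Because only the few combination weights are fitted, very few expensive high-fidelity evaluations suffice to calibrate the correction.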

AAAI Conference 2019 Conference Paper

On Reinforcement Learning for Full-Length Game of StarCraft

  • Zhen-Jia Pang
  • Ruo-Ze Liu
  • Zhou-Yu Meng
  • Yi Zhang
  • Yang Yu
  • Tong Lu

StarCraft II poses a grand challenge for reinforcement learning. The main difficulties include a huge state space, a varying action space, and a long horizon. In this paper, we investigate a set of reinforcement learning techniques for the full-length game of StarCraft II. We investigate a hierarchical approach, where the hierarchy involves two levels of abstraction. One is the macro-actions extracted from experts' demonstration trajectories, which can reduce the action space by an order of magnitude while remaining effective. The other is a two-layer hierarchical architecture, which is modular and easy to scale. We also investigate a curriculum transfer learning approach that trains the agent from the simplest opponent to harder ones. On a 64×64 map with restrictive units, we train the agent on a single machine with 4 GPUs and 48 CPU threads. We achieve a winning rate of more than 99% against the difficulty level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat models, we achieve over a 93% winning rate against the most difficult non-cheating built-in AI (level-7) within days. We hope this study can shed some light on future research in large-scale reinforcement learning.

IJCAI Conference 2019 Conference Paper

Reinforcement Learning Experience Reuse with Policy Residual Representation

  • WenJi Zhou
  • Yang Yu
  • Yingfeng Chen
  • Kai Guan
  • Tangjie Lv
  • Changjie Fan
  • Zhi-Hua Zhou

Experience reuse is key to sample-efficient reinforcement learning. One of the critical issues is how the experience is represented and stored. Previously, experience has been stored in the form of features, individual models, or the average model, each lying at a different granularity. However, new tasks may require experience across multiple granularities. In this paper, we propose the policy residual representation (PRR) network, which can extract and store multiple levels of experience. The PRR network is trained on a set of tasks with a multi-level architecture, where a module in each level corresponds to a subset of the tasks. Therefore, the PRR network represents the experience in a spectrum-like way. When training on a new task, PRR can provide different levels of experience to accelerate the learning. We experiment with the PRR network on a set of grid-world navigation tasks, locomotion tasks, and fighting tasks in a video game. The results show that the PRR network leads to better reuse of experience and thus outperforms some state-of-the-art approaches.

AAMAS Conference 2019 Conference Paper

Reinforcement Learning with Derivative-Free Exploration

  • Xiong-Hui Chen
  • Yang Yu

Effective exploration is key to sample-efficient reinforcement learning. While the most popular general approaches to exploration (e.g., ϵ-greedy) are still of low efficiency, derivative-free optimization offers efficient exploration mechanisms for better global search, which reinforcement learning usually desires. In this paper, we introduce a derivative-free exploration method called DFE as a general, efficient exploration method for early-stage reinforcement learning. DFE overcomes the optimization inefficiency and poor scalability of purely derivative-free-optimization-based reinforcement learning methods. Our experiments explore trajectories with DFE in the deterministic off-policy method DDPG and the stochastic off-policy method ACER, applied to Atari and MuJoCo, which represent a high-dimensional discrete-action environment and a continuous control environment; the results show that DFE is an efficient and general exploration method.

AAAI Conference 2019 Conference Paper

Virtual-Taobao: Virtualizing Real-World Online Retail Environment for Reinforcement Learning

  • Jing-Cheng Shi
  • Yang Yu
  • Qing Da
  • Shi-Yong Chen
  • An-Xiang Zeng

Applying reinforcement learning in physical-world tasks is extremely challenging. It is commonly infeasible to sample a large number of trials, as required by current reinforcement learning methods, in a physical environment. This paper reports our project on using reinforcement learning for better commodity search in Taobao, one of the largest online retail platforms and, meanwhile, a physical environment with a high sampling cost. Instead of training reinforcement learning in Taobao directly, we present our environment-building approach: we build Virtual-Taobao, a simulator learned from historical customer behavior data, and then train policies in Virtual-Taobao with no physical sampling cost. To improve the simulation precision, we propose GAN-SD (GAN for Simulating Distributions) for customer feature generation with a better-matched distribution, and MAIL (Multiagent Adversarial Imitation Learning) for generating more generalizable customer actions. To further avoid overfitting the imperfections of the simulator, we propose the ANC (Action Norm Constraint) strategy to regularize the policy model. In experiments, Virtual-Taobao is trained from hundreds of millions of real Taobao customers' records. Compared with the real Taobao, Virtual-Taobao faithfully recovers important properties of the real environment. We further show, through online A/B tests, that policies trained purely in Virtual-Taobao, at zero physical sampling cost, can achieve significantly superior real-world performance to traditional supervised approaches. We hope this work may shed some light on applying reinforcement learning in complex physical environments.

IJCAI Conference 2018 Conference Paper

Approximation Guarantees of Stochastic Greedy Algorithms for Subset Selection

  • Chao Qian
  • Yang Yu
  • Ke Tang

Subset selection is a fundamental problem in many areas, which aims to select the best subset of size at most $k$ from a universe. Greedy algorithms are widely used for subset selection, and have shown good approximation performance in deterministic situations. However, their behavior is stochastic in many realistic situations (e.g., large-scale and noisy ones). For general stochastic greedy algorithms, bounded approximation guarantees were previously obtained only for subset selection with monotone submodular objective functions, while real-world applications often involve non-monotone or non-submodular objective functions and can be subject to constraints more general than a size constraint. This work proves their approximation guarantees in these cases, and thus largely extends the applicability of stochastic greedy algorithms.

IJCAI Conference 2018 Conference Paper

Experienced Optimization with Reusable Directional Model for Hyper-Parameter Search

  • Yi-Qi Hu
  • Yang Yu
  • Zhi-Hua Zhou

Hyper-parameter selection is a crucial yet difficult issue in machine learning. For this problem, derivative-free optimization has been playing an irreplaceable role. However, derivative-free optimization commonly requires many hyper-parameter samples, while each sample can have a high cost due to the costly evaluation of a learning model. To tackle this issue, in this paper we propose an experienced optimization approach, i.e., learning how to optimize better from a set of historical optimization processes. From the historical optimization processes on previous datasets, a directional model is trained to predict the direction toward the next good hyper-parameter. The directional model is then reused to guide the optimization on new datasets. We implement this mechanism within SRacos, a state-of-the-art derivative-free optimization method, and conduct experiments on learning the hyper-parameters of heterogeneous ensembles and neural network architectures. Experimental results verify that the proposed approach can significantly improve the learning accuracy within a limited hyper-parameter sample budget.

IJCAI Conference 2018 Conference Paper

Learning Environmental Calibration Actions for Policy Self-Evolution

  • Chao Zhang
  • Yang Yu
  • Zhi-Hua Zhou

Reinforcement learning in the physical world is often expensive. Simulators are commonly employed to train policies. Due to simulation error, policies trained in a simulator are hard to deploy directly in the physical world. Therefore, how to efficiently reuse these policies in the real environment is a key issue. To address this issue, this paper presents a policy self-evolution process: in the target environment, the agent first executes a few calibration actions to perceive the environment, and then reuses the previous policies according to the observations. In this way, the mission of policy learning in the target environment is reduced to the task of environment identification through the calibration actions, which needs far fewer samples than learning a policy from scratch. We propose the POSEC (POlicy Self-Evolution by Calibration) approach, which learns the most informative calibration actions for policy self-evolution. Taking three robotic arm control tasks as test beds, we show that the proposed method can learn a good policy for a new arm with only a few (e.g., five) samples of the target environment.

IJCAI Conference 2018 Conference Paper

Mixture of GANs for Clustering

  • Yang Yu
  • Wen-Ji Zhou

For data clustering, the Gaussian mixture model (GMM) is a typical method that trains several Gaussian models to capture the data. Each Gaussian model then provides the distribution information of one cluster. For clustering of high-dimensional and complex data, models more flexible than Gaussians are desired. Recently, generative adversarial networks (GANs) have shown effectiveness in capturing complex data distributions. Therefore, a GAN mixture model (GANMM) would be a promising alternative to GMM. However, we notice that the non-flexibility of the Gaussian model is essential to the expectation-maximization procedure for training GMM. A GAN has much higher flexibility, which disables the commonly employed expectation-maximization procedure, since the maximization can no longer change the result of the expectation. In this paper, we propose the epsilon-expectation-maximization procedure for training GANMM. Experiments show that the proposed GANMM performs well on complex data as well as simple data.

NeurIPS Conference 2018 Conference Paper

Multi-Layered Gradient Boosting Decision Trees

  • Ji Feng
  • Yang Yu
  • Zhi-Hua Zhou

Multi-layered distributed representation is believed to be the key ingredient of deep neural networks, especially in cognitive tasks like computer vision. While non-differentiable models such as gradient boosting decision trees (GBDTs) are still the dominant methods for modeling discrete or tabular data, it is hard to equip them with such representation learning ability. In this work, we propose the multi-layered GBDT forest (mGBDTs), with an explicit emphasis on exploring the ability to learn hierarchical distributed representations by stacking several layers of regression GBDTs as its building blocks. The model can be jointly trained by a variant of target propagation across layers, without the need for backpropagation or differentiability. Experiments confirm the effectiveness of the model in terms of performance and representation learning ability.

AAAI Conference 2018 Conference Paper

Noisy Derivative-Free Optimization With Value Suppression

  • Hong Wang
  • Hong Qian
  • Yang Yu

Derivative-free optimization has shown advantages in solving sophisticated problems such as policy search, when the environment is noise-free. Many real-world environments are noisy, where solution evaluations are inaccurate due to the noise. Noisy evaluation can badly injure derivative-free optimization, as it may make a worse solution look better. Sampling is a straightforward way to reduce noise, while previous studies have shown that delaying the noise handling to the comparison time point (i.e., threshold selection) can be helpful for derivative-free optimization. This work further delays the noise handling, and proposes a simple noise-handling mechanism, i.e., value suppression. With value suppression, we do nothing about the noise until the best-so-far solution has not been improved for a period, and then suppress the value of the best-so-far solution and continue the optimization. On synthetic problems as well as reinforcement learning tasks, experiments verify that value suppression can be significantly more effective than previous methods.

IJCAI Conference 2018 Conference Paper

Towards Sample Efficient Reinforcement Learning

  • Yang Yu

Reinforcement learning is a major tool to realize intelligent agents that can be autonomously adaptive to the environment. With deep models, reinforcement learning has shown great potential in complex tasks such as playing games from pixels. However, current reinforcement learning techniques still suffer from requiring a huge amount of interaction data, which can result in unbearable costs in real-world applications. In this article, we share our understanding of the problem, and discuss possible ways to alleviate the sample cost of reinforcement learning, from the aspects of exploration, optimization, environment modeling, experience transfer, and abstraction. We also discuss some challenges in real-world applications, with the hope of inspiring future research.

IJCAI Conference 2017 Conference Paper

AGRA: An Analysis-Generation-Ranking Framework for Automatic Abbreviation from Paper Titles

  • Jianbing Zhang
  • Yixin Sun
  • Shujian Huang
  • Cam-Tu Nguyen
  • Xiaoliang Wang
  • Xinyu Dai
  • Jiajun Chen
  • Yang Yu

People sometimes choose word-like abbreviations to refer to items with long descriptions. These abbreviations usually come from the descriptive text of the item and are easy to remember and pronounce, while preserving the key idea of the item. Coming up with a good abbreviation is not an easy job, even for humans. Previous assistant naming systems compose names by applying hand-written rules, which may not perform well. In this paper, we propose to view the naming task as an artificial intelligence problem and create a dataset in the domain of academic naming. To generate more delicate names, we propose a three-step framework comprising description analysis, candidate generation, and abbreviation ranking, each of which is parameterized and optimizable. We conduct experiments comparing different settings of our framework with several analysis approaches from different perspectives. Compared to online and baseline systems, our framework achieves the best results.

IJCAI Conference 2017 Conference Paper

Binary Linear Compression for Multi-label Classification

  • Wen-Ji Zhou
  • Yang Yu
  • Min-Ling Zhang

In multi-label classification tasks, labels are commonly related to each other. It has been well recognized that utilizing label relationships is essential to multi-label learning. One way to utilize label relationships is to map labels to a lower-dimensional space of uncorrelated labels, where the relationships can be encoded in the mapping. Previous linear mapping methods commonly result in regression subproblems in the lower-dimensional label space. In this paper, we disclose that mapping to a low-dimensional multi-label regression problem can be worse than mapping to a classification problem, since regression requires a more complex model than classification. We then propose the binary linear compression (BILC) method, which results in a binary label space, leading to classification subproblems. Experiments on several multi-label datasets show that employing classification in the embedded space results in much simpler models than regression, leading to smaller structural risk. The proposed method is also shown to be superior to some state-of-the-art approaches.

IJCAI Conference 2017 Conference Paper

Life-Stage Modeling by Customer-Manifold Embedding

  • Jing-Wen Yang
  • Yang Yu
  • Xiao-Peng Zhang

A person experiences different stages throughout life, causing dramatically varying behavior patterns. In applications such as online shopping, it has been observed that customer behaviors are largely affected by their life stages and evolve over time. Although this phenomenon has been recognized previously, very few studies have tried to model the life stage and make use of it. In this paper, we propose to discover a latent space, called the customer-manifold, on which a position corresponds to a customer stage. The customer-manifold allows us to train a static prediction model that captures dynamic customer behavior patterns. We further embed the learned customer-manifold into a neural network model as a hidden-layer output, resulting in an efficient and accurate customer behavior prediction system. We apply this system to online-shopping recommendation. Experiments on real-world data show that taking the customer-manifold into account can improve the performance of the recommender system. Moreover, visualization of the customer-manifold space may also be helpful for understanding evolving customer behaviors.

IJCAI Conference 2017 Conference Paper

On Subset Selection with General Cost Constraints

  • Chao Qian
  • Jing-Cheng Shi
  • Yang Yu
  • Ke Tang

This paper considers the subset selection problem with a monotone objective function and a monotone cost constraint, which relaxes the submodularity assumption of previous studies. We first show that the approximation ratio of the generalized greedy algorithm is $\frac{\alpha}{2}(1 - \frac{1}{e^{\alpha}})$ (where $\alpha$ is the submodularity ratio); we then propose POMC, an anytime randomized iterative approach that can utilize more time to find better solutions than the generalized greedy algorithm. We show that POMC can obtain the same general approximation guarantee as the generalized greedy algorithm, but can achieve better solutions in some cases and applications.
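The generalized greedy baseline that POMC is compared against can be sketched as follows: repeatedly add the affordable element with the best marginal gain per unit of marginal cost. This is a minimal illustration only; the full algorithm analyzed in such work typically also compares the greedy result against the best single feasible element, which is omitted here.

```python
def generalized_greedy(universe, f, cost, budget):
    """Greedy subset selection under a monotone cost budget.

    f and cost each take a set and return a number; both are assumed
    monotone.  Each round adds the element maximizing marginal gain
    divided by marginal cost among those still within budget.
    """
    selected = set()
    while True:
        best, best_ratio = None, 0.0
        for v in universe - selected:
            cand = selected | {v}
            dc = cost(cand) - cost(selected)
            if cost(cand) > budget or dc <= 0:
                continue  # unaffordable or degenerate cost increase
            ratio = (f(cand) - f(selected)) / dc
            if ratio > best_ratio:
                best, best_ratio = v, ratio
        if best is None:
            return selected  # nothing affordable improves f
        selected.add(best)
```

With a cardinality cost ($\mathrm{cost}(S) = |S|$) this reduces to the ordinary greedy algorithm for size-constrained subset selection.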

IJCAI Conference 2017 Conference Paper

Open Category Classification by Adversarial Sample Generation

  • Yang Yu
  • Wei-Yang Qu
  • Nan Li
  • Zimin Guo

In real-world classification tasks, it is difficult to collect training samples from all possible categories of the environment. Therefore, when an instance of an unseen class appears at prediction time, a robust classifier should be able to tell that it is from an unseen class, instead of classifying it into any known category. In this paper, adopting the idea of adversarial learning, we propose the ASG framework for open-category classification. ASG generates positive and negative samples of seen categories in an unsupervised manner via an adversarial learning strategy. With the generated samples, ASG then learns to tell seen from unseen in a supervised manner. Experiments performed on several datasets show the effectiveness of ASG.

IJCAI Conference 2017 Conference Paper

Optimizing Ratio of Monotone Set Functions

  • Chao Qian
  • Jing-Cheng Shi
  • Yang Yu
  • Ke Tang
  • Zhi-Hua Zhou

This paper considers the problem of minimizing the ratio of two set functions, i.e., $f/g$. Previous work assumed both functions to be monotone and submodular, while we consider a more general situation where $g$ is not necessarily submodular. We derive that the greedy approach GreedRatio, as a fixed-time algorithm, achieves a $\frac{|X^*|}{(1+(|X^*| - 1)(1 - \kappa_f))\gamma(g)}$ approximation ratio, which also improves the previous bound for submodular $g$. If more time can be spent, we present the PORM algorithm, an anytime randomized iterative approach minimizing $f$ and $-g$ simultaneously. We show that PORM using reasonable time has the same general approximation guarantee as GreedRatio, but can achieve better solutions in some cases and applications.

AAAI Conference 2017 Conference Paper

Sequential Classification-Based Optimization for Direct Policy Search

  • Yi-Qi Hu
  • Hong Qian
  • Yang Yu

Direct policy search often results in high-quality policies in complex reinforcement learning problems; it employs an optimization algorithm to search the parameters of the policy for maximizing its total reward. Classification-based optimization is a recently developed framework for derivative-free optimization, which has been shown to be effective and efficient for non-convex optimization problems with many local optima, and may provide a powerful optimization tool for direct policy search. However, this framework requires sampling a batch of solutions for every update of the search model, while in reinforcement learning the environment often offers only sequential policy evaluations. Thus classification-based optimization may not be efficient for direct policy search, where solutions have to be sampled sequentially. In this paper, we adapt classification-based optimization to sequentially sampled solutions by forming the sample batch via reusing historical solutions. Experiments on a helicopter hovering task and control tasks in OpenAI Gym show that the new algorithm significantly improves performance over several state-of-the-art derivative-free optimization approaches.

AAAI Conference 2017 Conference Paper

Solving High-Dimensional Multi-Objective Optimization Problems with Low Effective Dimensions

  • Hong Qian
  • Yang Yu

Multi-objective (MO) optimization problems require simultaneously optimizing two or more objective functions. An MO algorithm needs to find solutions that reach different optimal balances of the objective functions, i.e., the optimal Pareto front; therefore, high dimensionality of the solution space can hurt MO optimization much more severely than single-objective optimization, which was little addressed in previous studies. This paper proposes ReMO, a general, theoretically grounded yet simple approach that can scale current derivative-free MO algorithms to high-dimensional non-convex MO functions with low effective dimensions, using random embedding. We prove the conditions under which an MO function has a low effective dimension, and for such functions, we prove that ReMO possesses the desirable properties of optimal Pareto front preservation, time complexity reduction, and rotation perturbation invariance. Experimental results indicate that ReMO is effective for optimizing high-dimensional MO functions with low effective dimensions, and is even effective for high-dimensional MO functions where all dimensions are effective but most have only a small and bounded effect on the function value.

NeurIPS Conference 2017 Conference Paper

Subset Selection under Noise

  • Chao Qian
  • Jing-Cheng Shi
  • Yang Yu
  • Ke Tang
  • Zhi-Hua Zhou

The problem of selecting the best $k$-element subset from a universe is involved in many applications. While previous studies assumed a noise-free environment or a noisy monotone submodular objective function, this paper considers a more realistic and general situation where the evaluation of a subset is a noisy monotone function (not necessarily submodular), with both multiplicative and additive noises. To understand the impact of the noise, we first show the approximation ratios of the greedy algorithm and POSS, two powerful algorithms for noise-free subset selection, in noisy environments. We then propose to incorporate a noise-aware strategy into POSS, resulting in the new PONSS algorithm. We prove that PONSS can achieve a better approximation ratio under some assumptions, such as an i.i.d. noise distribution. Empirical results on influence maximization and sparse regression problems show the superior performance of PONSS.

AAMAS Conference 2016 Conference Paper

Boosting Nonparametric Policies

  • Yang Yu
  • Peng-Fei Hou
  • Qing Da
  • Yu Qian

Learning complex policies is a key step toward real-world applications of reinforcement learning. While boosting approaches have been widely applied in state-of-the-art supervised learning techniques to adaptively learn nonparametric functions, boosting-style approaches have been little investigated in reinforcement learning. Only a few pieces of previous work explored this direction; however, their theoretical properties are still unclear and their empirical performance is quite limited. In this paper, we propose the PolicyBoost method. It optimizes a finite-sample objective function, which leads to maximization of the expected total reward, by employing the GradientBoost approach. Experimental results verify the effectiveness as well as the robustness of PolicyBoost, even without feature engineering.

AAAI Conference 2016 Conference Paper

Decentralized Robust Subspace Clustering

  • Bo Liu
  • Xiao-Tong Yuan
  • Yang Yu
  • Qingshan Liu
  • Dimitris Metaxas

We consider the problem of subspace clustering using the SSC (Sparse Subspace Clustering) approach, which has several desirable theoretical properties and has been shown to be effective in various computer vision applications. We develop a large-scale distributed framework for the computation of SSC via an alternating direction method of multipliers (ADMM) algorithm. The proposed framework solves SSC in column blocks and only involves parallel multivariate Lasso regression subproblems and sample-wise operations. This appealing property allows us to allocate multiple cores/machines for the processing of individual column blocks. We evaluate our algorithm on a shared-memory architecture. Experimental results on real-world datasets confirm that the proposed block-wise ADMM framework is substantially more efficient than its matrix counterpart used by SSC, without sacrificing accuracy. Moreover, our approach is directly applicable to decentralized neighborhood selection for Gaussian graphical model structure estimation.

IJCAI Conference 2016 Conference Paper

Derivative-Free Optimization of High-Dimensional Non-Convex Functions by Sequential Random Embeddings

  • Hong Qian
  • Yi-Qi Hu
  • Yang Yu

Derivative-free optimization methods are suitable for sophisticated optimization problems, but are hard to scale to high dimensionality (e.g., larger than 1,000). Previously, the random embedding technique has been shown to be successful for solving high-dimensional problems with low effective dimensions. However, it is unrealistic to assume a low effective dimension in many applications. This paper studies high-dimensional problems with low optimal epsilon-effective dimensions, which allow all dimensions to be effective but many of them to have only a small, bounded effect. We characterize the properties of random embedding for this kind of problem, and propose sequential random embeddings (SRE) to reduce the embedding gap while running optimization algorithms in the low-dimensional spaces. We apply SRE to several state-of-the-art derivative-free optimization methods, and conduct experiments on synthetic functions as well as non-convex classification tasks with up to 100,000 variables. Experimental results verify the effectiveness of SRE.
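The basic random embedding trick underlying this line of work is simple to state: draw a random Gaussian matrix $A \in \mathbb{R}^{D \times d}$ and optimize $g(y) = f(Ay)$ over the low-dimensional $y$ instead of $f$ over the high-dimensional $x$. A minimal single-embedding sketch is below (SRE itself additionally chains several such embeddings sequentially, which is not shown):

```python
import numpy as np

def random_embedding_objective(f, D, d, rng=None):
    """Wrap a D-dimensional objective f for search in d dimensions.

    A random Gaussian matrix A maps a low-dimensional point y to the
    high-dimensional point A @ y at which f is evaluated; any
    derivative-free optimizer can then run in the d-dimensional space.
    """
    rng = rng or np.random.default_rng(0)
    A = rng.standard_normal((D, d))
    return lambda y: f(A @ y)
```

Any off-the-shelf optimizer can then be pointed at the returned callable with a $d$-dimensional search space, e.g. `g = random_embedding_objective(f, D=100_000, d=10)`.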

AAAI Conference 2016 Conference Paper

Derivative-Free Optimization via Classification

  • Yang Yu
  • Hong Qian
  • Yi-Qi Hu

Many randomized heuristic derivative-free optimization methods share a framework that iteratively learns a model of promising search areas and samples solutions from the model. This paper studies a particular setting of this framework, where the model is implemented by a classification model discriminating good solutions from bad ones. This setting allows a general theoretical characterization, in which factors critical to the optimization are discovered. We also prove that optimization problems with local Lipschitz continuity can be solved in polynomial time by proper configurations of this framework. Following the critical factors, we propose the randomized coordinate shrinking classification algorithm to learn the model, forming the RACOS algorithm, for optimization in continuous and discrete domains. Experiments on testing functions as well as on machine learning tasks, including spectral clustering and classification with the ramp loss, demonstrate the effectiveness of RACOS.

AAAI Conference 2016 Conference Paper

MicroScholar: Mining Scholarly Information from Chinese Microblogs

  • Yang Yu
  • Xiaojun Wan

For many researchers, one of the biggest issues is the lack of an efficient method to keep up with the latest academic progress in related research fields. We notice that many researchers tend to share their research progress or recommend scholarly information on their microblogs. In order to exploit microblogging to benefit scientific research, we build a system called MicroScholar to automatically collect and mine scholarly information from Chinese microblogs. In this paper, we briefly introduce the system framework and focus on the component for scholarly microblog categorization. Several kinds of features are used in this component, and experimental results demonstrate their usefulness.

IJCAI Conference 2016 Conference Paper

Parallel Pareto Optimization for Subset Selection

  • Chao Qian
  • Jing-Cheng Shi
  • Yang Yu
  • Ke Tang
  • Zhi-Hua Zhou

Subset selection, which selects a few variables from a large set, is a fundamental problem in many areas. The recently emerged Pareto Optimization for Subset Selection (POSS) method is a powerful approximation solver for this problem. However, POSS is not readily parallelizable, restricting its large-scale applications on modern computing architectures. In this paper, we propose PPOSS, a parallel version of POSS. Our theoretical analysis shows that PPOSS has good properties for parallelization while preserving the approximation quality: when the number of processors is limited (less than the total number of variables), the running time of PPOSS can be reduced almost linearly with respect to the number of processors; with an increasing number of processors, the running time can be further reduced, eventually to a constant. Empirical studies verify the effectiveness of PPOSS, and moreover suggest that the asynchronous implementation is more efficient, with little quality loss.

AAAI Conference 2016 Conference Paper

Scaling Simultaneous Optimistic Optimization for High-Dimensional Non-Convex Functions with Low Effective Dimensions

  • Hong Qian
  • Yang Yu

Simultaneous optimistic optimization (SOO) is a recently proposed global optimization method with a strong theoretical foundation. Previous studies have shown that SOO performs well in low-dimensional optimization problems; however, its performance is unsatisfactory when the dimensionality is high. This paper adapts random embedding to scale SOO, resulting in the RESOO algorithm. We prove that the simple regret of RESOO depends only on the effective dimension of the problem, while that of SOO depends on the dimension of the solution space. Empirically, on some high-dimensional non-convex testing functions as well as hyper-parameter tuning tasks for multi-class support vector machines, RESOO shows significantly improved performance over SOO.

IJCAI Conference 2015 Conference Paper

On Constrained Boolean Pareto Optimization

  • Chao Qian
  • Yang Yu
  • Zhi-Hua Zhou

Pareto optimization solves a constrained optimization task by reformulating the task as a bi-objective problem. Pareto optimization has been shown to be quite effective in applications; however, it has little theoretical support. This work theoretically compares Pareto optimization with the penalty approach, a common method that transforms a constrained optimization problem into an unconstrained one. We prove that on two large classes of constrained Boolean optimization problems, minimum matroid optimization (P-solvable) and minimum cost coverage (NP-hard), Pareto optimization is more efficient than the penalty function method for obtaining optimal and approximate solutions, respectively. Furthermore, on a minimum cost coverage instance, we also show the advantage of Pareto optimization over a greedy algorithm.

AAAI Conference 2015 Conference Paper

Pareto Ensemble Pruning

  • Chao Qian
  • Yang Yu
  • Zhi-Hua Zhou

Ensemble learning is among the state-of-the-art learning techniques, which trains and combines many base learners. Ensemble pruning removes some of the base learners of an ensemble and has been shown to further improve generalization performance. However, the two goals of ensemble pruning, i.e., maximizing generalization performance and minimizing the number of base learners, can conflict when pushed to the limit. Most previous ensemble pruning approaches optimize objectives that mix the two goals. In this paper, motivated by recent theoretical advances in evolutionary optimization, we investigate solving the two goals explicitly in a bi-objective formulation and propose the PEP (Pareto Ensemble Pruning) approach. We show that PEP not only achieves significantly better performance than state-of-the-art approaches but also gains theoretical support.

NeurIPS Conference 2015 Conference Paper

Subset Selection by Pareto Optimization

  • Chao Qian
  • Yang Yu
  • Zhi-Hua Zhou

Selecting the optimal subset from a large set of variables is a fundamental problem in various learning tasks such as feature selection, sparse regression, and dictionary learning. In this paper, we propose the POSS approach, which employs evolutionary Pareto optimization to find a small-sized subset with good performance. We prove that for sparse regression, POSS is able to efficiently achieve the best-so-far theoretically guaranteed approximation performance. In particular, for the Exponential Decay subclass, POSS is proven to achieve an optimal solution. Empirical studies verify the theoretical results and exhibit the superior performance of POSS over greedy and convex relaxation methods.
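As a concrete illustration of the bi-objective idea, here is a minimal POSS-style loop (a sketch, not the authors' implementation) for maximizing a set function under a size constraint. For simplicity the archive keeps only the best subset seen at each size up to k, whereas the full method also retains somewhat larger intermediate solutions; the example objective is a toy maximum-coverage function.

```python
import random

def poss(f, n, k, iters=5000, seed=0):
    """POSS-style subset selection sketch: treat (maximize f(S),
    minimize |S|) as a bi-objective problem and evolve an archive of
    non-dominated solutions by bit-wise mutation."""
    rng = random.Random(seed)
    front = {0: (frozenset(), f(frozenset()))}   # size -> (subset, value)
    for _ in range(iters):
        s, _ = rng.choice(list(front.values()))
        # flip each of the n membership bits independently with prob 1/n
        s2 = frozenset(i for i in range(n) if (i in s) != (rng.random() < 1.0 / n))
        if len(s2) > k:                          # discard oversized offspring
            continue
        v2 = f(s2)
        if len(s2) not in front or v2 > front[len(s2)][1]:
            front[len(s2)] = (s2, v2)            # offspring is non-dominated
    return max(front.values(), key=lambda p: p[1])

# toy maximum coverage: pick at most k=3 of these sets to cover most elements
sets = [{0, 1, 2}, {2, 3}, {3, 4, 5}, {6, 7}, {7, 8, 9}, {0, 9}]
cover = lambda S: len(set().union(*(sets[i] for i in S)))
best, value = poss(cover, n=len(sets), k=3)
```

On this instance the optimum under the size constraint covers 9 of the 10 elements (covering all 10 requires four sets); the loop reliably finds it because the archive preserves good partial solutions at every size.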

AAAI Conference 2014 Conference Paper

Learning with Augmented Class by Exploiting Unlabeled Data

  • Qing Da
  • Yang Yu
  • Zhi-Hua Zhou

In many real-world applications of learning, the environment is open and changes gradually, which requires the learning system to be able to detect and adapt to the changes. Class-incremental learning (C-IL) is an important and practical problem in which data from unseen augmented classes are fed to the system, but it has not been well studied in the past. In C-IL, the system should beware of predicting instances from augmented classes as belonging to a seen class, and thus faces the challenge that no such instances were observed during the training stage. In this paper, we tackle the challenge by using unlabeled data, which can be cheaply collected in many real-world applications. We propose the LACU framework as well as the LACU-SVM approach to learn the concept of seen classes while incorporating the structure present in the unlabeled data, so that the misclassification risks among the seen classes as well as between the augmented and the seen classes are minimized simultaneously. Experiments on diverse datasets show the effectiveness of the proposed approach.

IJCAI Conference 2013 Conference Paper

On the Approximation Ability of Evolutionary Optimization with Application to Minimum Set Cover: Extended Abstract

  • Yang Yu
  • Xin Yao
  • Zhi-Hua Zhou

Evolutionary algorithms (EAs) are a large family of heuristic optimization algorithms inspired by natural phenomena, and are often used in practice to obtain satisficing instead of optimal solutions. In this work, we investigate a largely underexplored issue: the approximation performance of EAs in terms of how close the obtained solution is to an optimal solution. We study an EA framework named simple EA with isolated population (SEIP) that can be implemented as a single- or multi-objective EA. We present general approximation results for SEIP, and specifically on the minimum set cover problem, we find that SEIP achieves the currently best-achievable approximation ratio. Moreover, on an instance class of the k-set cover problem, we disclose how SEIP can overcome the difficulty that limits the greedy algorithm.

IJCAI Conference 2011 Conference Paper

Diversity Regularized Machine

  • Yang Yu
  • Yu-Feng Li
  • Zhi-Hua Zhou

Ensemble methods, which train multiple learners for a task, are among the state-of-the-art learning approaches. The diversity of the component learners has been recognized as a key to a good ensemble, and existing ensemble methods try different ways to encourage diversity, mostly by heuristics. In this paper, we propose the diversity regularized machine (DRM) in a mathematical programming framework, which efficiently generates an ensemble of diverse support vector machines (SVMs). Theoretical analysis discloses that the diversity constraint used in DRM can lead to an effective reduction on its hypothesis space complexity, implying that the diversity control in ensemble methods indeed plays a role of regularization as in popular statistical learning approaches. Experiments show that DRM can significantly improve generalization ability and is superior to some state-of-the-art SVM ensemble methods.

AAAI Conference 2006 Conference Paper

A New Approach to Estimating the Expected First Hitting Time of Evolutionary Algorithms

  • Yang Yu

The expected first hitting time is an important issue in theoretical analyses of evolutionary algorithms since it implies the average computational time complexity. In this paper, by exploiting the relationship between the convergence rate and the expected first hitting time, a new approach to estimating the expected first hitting time is proposed. This approach is then applied to four evolutionary algorithms which involve operators of mutation, mutation with population, mutation with recombination, and time-variant mutation, respectively. The results show that the proposed approach is helpful for analyzing a broad range of evolutionary algorithms.