Arrow Research search

Author name cluster

Yang Yao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
1 author row

Possible papers

AAAI Conference 2026 Conference Paper

CADiff: Context-Aware Diffusion for Controllable Anomaly Generation in Anomaly Detection

  • Xuan Tong
  • Yuxuan Lin
  • Junxiong Lin
  • Xinji Mai
  • Haoran Wang
  • Zeng Tao
  • Yang Yao
  • Ruofan Wang

Generating anomalies is a crucial method for enhancing detection and classification performance by expanding the repository of anomalous data. However, existing anomaly generation methods overlook the intrinsic entanglement between diverse anomaly types and product structures, leading to semantic ambiguity. We propose CADiff, a context-aware generation framework that reframes anomalies as compositional perturbations. Firstly, we propose Context-aware Text Prompt (CTP), a mechanism that uses multiple tokens to characterize anomalies and products separately, enhancing the contextual consistency of generated images and refining the local variability of anomalies. Secondly, we develop Self-adaptive Spatial Control (SSC), a self-adaptive interaction design that mitigates anomaly leakage and missing anomalies. Thirdly, we introduce Intensity-controllable Attention Re-weighting (IAR), an inference scheduling scheme with the ability to amplify or attenuate abnormal semantic effects to improve generation diversity. Extensive experiments on the MVTec AD and VisA datasets demonstrate the superiority of our proposed method over state-of-the-art methods in both realism and diversity of the generated results, and show that it significantly improves the performance of downstream tasks, including anomaly detection, anomaly localization, and anomaly classification.

AAAI Conference 2026 Conference Paper

The Other Mind: How Language Models Exhibit Human Temporal Cognition

  • Lingyu Li
  • Yang Yao
  • Yixu Wang
  • Chunbo Li
  • Yan Teng
  • Yingchun Wang

As Large Language Models (LLMs) continue to advance, they exhibit certain cognitive patterns similar to those of humans that are not directly specified in training data. This study investigates this phenomenon by focusing on temporal cognition in LLMs. Leveraging the similarity judgment task, we find that larger models spontaneously establish a subjective temporal reference point and adhere to the Weber-Fechner law, whereby the perceived distance logarithmically compresses as years recede from this reference point. To uncover the mechanisms behind this behavior, we conducted multiple analyses across neuronal, representational, and informational levels. We first identify a set of temporal-preferential neurons and find that this group exhibits minimal activation at the subjective reference point and implements a logarithmic coding scheme convergently found in biological systems. Probing representations of years reveals a hierarchical construction process, where years evolve from basic numerical values in shallow layers to abstract temporal orientation in deep layers. Finally, using pre-trained embedding models, we found that the training corpus itself possesses an inherent, non-linear temporal structure, which provides the raw material for the model's internal construction. In discussion, we propose an experientialist perspective for understanding these findings, where the LLMs' cognition is viewed as a subjective construction of the external world by its internal representational system. This nuanced perspective implies the potential emergence of alien cognitive frameworks that humans cannot intuitively predict, pointing toward a direction for AI alignment that focuses on guiding internal constructions.
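The logarithmic compression described in the abstract above can be illustrated with a minimal sketch. The functional form `log(1 + |year - reference|)`, the function names, and the reference year 2025 are assumptions chosen for illustration; they are not taken from the paper.

```python
import math

def perceived_position(year: int, reference_year: int) -> float:
    """Signed, log-compressed position of a year relative to a reference point.

    Assumed form: sign(offset) * log(1 + |offset|). The paper's actual
    fitting procedure is not reproduced here.
    """
    offset = year - reference_year
    return math.copysign(math.log1p(abs(offset)), offset)

def perceived_distance(year_a: int, year_b: int, reference_year: int) -> float:
    """Distance between two years on the compressed scale."""
    return abs(
        perceived_position(year_a, reference_year)
        - perceived_position(year_b, reference_year)
    )

REFERENCE_YEAR = 2025  # hypothetical subjective reference point

# The same one-year gap looks smaller the further it lies from the reference.
near = perceived_distance(2020, 2021, REFERENCE_YEAR)
far = perceived_distance(1920, 1921, REFERENCE_YEAR)
print(f"near gap: {near:.4f}, far gap: {far:.4f}")
```

Under this assumed form, a one-year gap close to the reference point maps to a larger perceived distance than the same gap a century away, which is the qualitative pattern the abstract attributes to the Weber-Fechner law.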

AAAI Conference 2025 Conference Paper

JAQ: Joint Efficient Architecture Design and Low-Bit Quantization with Hardware-Software Co-Exploration

  • Mingzi Wang
  • Yuan Meng
  • Chen Tang
  • Weixiang Zhang
  • Yijian Qin
  • Yang Yao
  • Yingxin Li
  • Tongtong Feng

The co-design of neural network architectures, quantization precisions, and hardware accelerators offers a promising approach to achieving an optimal balance between performance and efficiency, particularly for model deployment on resource-constrained edge devices. In this work, we propose the JAQ Framework, which jointly optimizes the three critical dimensions. However, effectively automating the design process across the vast search space of those three dimensions poses significant challenges, especially when pursuing extremely low-bit quantization. Specifically, the primary challenges include: (1) Memory overhead on the software side: Low-precision quantization-aware training can lead to significant memory usage due to storing large intermediate features and latent weights for backpropagation, potentially causing memory exhaustion. (2) Time-consuming search on the hardware side: The discrete nature of hardware parameters and the complex interplay between compiler optimizations and individual operators make the accelerator search time-consuming. To address these issues, JAQ mitigates the memory overhead through a channel-wise sparse quantization (CSQ) scheme, selectively applying quantization to the most sensitive components of the model during optimization. Additionally, JAQ designs BatchTile, which employs a hardware generation network to encode all possible tiling modes, thereby speeding up the search for the optimal compiler mapping strategy. Extensive experiments demonstrate the effectiveness of JAQ, achieving approximately 7% higher Top-1 accuracy on ImageNet compared to previous methods and reducing the hardware search time per iteration to 0.15 seconds.

NeurIPS Conference 2025 Conference Paper

SafeVid: Toward Safety Aligned Video Large Multimodal Models

  • Yixu Wang
  • Jiaxin Song
  • Yifeng Gao
  • Xin Wang
  • Yang Yao
  • Yan Teng
  • Xingjun Ma
  • Yingchun Wang

As Video Large Multimodal Models (VLMMs) rapidly advance, their inherent complexity introduces significant safety challenges, particularly the issue of mismatched generalization where static safety alignments fail to transfer to dynamic video contexts. We introduce SafeVid, a framework designed to instill video-specific safety principles in VLMMs. SafeVid uniquely transfers robust textual safety alignment capabilities to the video domain by employing detailed textual video descriptions as an interpretive bridge, facilitating LLM-based rule-driven safety reasoning. This is achieved through a closed-loop system comprising: 1) generation of SafeVid-350K, a novel 350,000-pair video-specific safety preference dataset; 2) targeted alignment of VLMMs using Direct Preference Optimization (DPO); and 3) comprehensive evaluation via our new SafeVidBench benchmark. Alignment with SafeVid-350K significantly enhances VLMM safety, with models like LLaVA-NeXT-Video demonstrating substantial improvements (e.g., up to 42.39%) on SafeVidBench. SafeVid provides critical resources and a structured approach, demonstrating that leveraging textual descriptions as a conduit for safety reasoning markedly improves the safety alignment of VLMMs in complex multimodal scenarios.
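For readers unfamiliar with Direct Preference Optimization, the standard per-pair DPO objective can be sketched as follows. This is the generic published DPO loss, not SafeVid's training code; the function name, the `beta` value, and the log-probabilities are illustrative assumptions.

```python
import math

def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed temperature; not a value from the paper
) -> float:
    """Standard DPO loss for one (chosen, rejected) preference pair.

    loss = -log(sigmoid(beta * margin)), where the margin is how much more
    the policy favors the chosen response than the frozen reference does.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)), written in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

# Made-up sequence log-probabilities for illustration only.
neutral = dpo_loss(-10.0, -10.0, -10.0, -10.0)    # policy identical to reference
preferred = dpo_loss(-8.0, -12.0, -10.0, -10.0)   # policy favors the chosen response
print(f"neutral: {neutral:.4f}, preferred: {preferred:.4f}")
```

The loss falls as the policy assigns relatively more probability to the preferred (here, safer) response than the reference model does.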

AAAI Conference 2024 Conference Paper

Data-Augmented Curriculum Graph Neural Architecture Search under Distribution Shifts

  • Yang Yao
  • Xin Wang
  • Yijian Qin
  • Ziwei Zhang
  • Wenwu Zhu
  • Hong Mei

Graph neural architecture search (NAS) has achieved great success in designing architectures for graph data processing. However, distribution shifts pose great challenges for graph NAS, since the optimal searched architectures for the training graph data may fail to generalize to the unseen test graph data. The sole prior work tackles this problem by customizing architectures for each graph instance through learning graph structural information, but fails to consider data augmentation during training, which has been proven by existing works to be able to improve generalization. In this paper, we propose Data-augmented Curriculum Graph Neural Architecture Search (DCGAS), which learns an architecture customizer with good generalizability to data under distribution shifts. Specifically, we design an embedding-guided data generator, which can generate sufficient graphs for training to help the model better capture graph structural information. In addition, we design a two-factor uncertainty-based curriculum weighting strategy, which can evaluate the importance of data in enabling the model to learn key information in real-world distribution and reweight them during training. Experimental results on synthetic datasets and real datasets with distribution shifts demonstrate that our proposed method learns generalizable mappings and outperforms existing methods.

JBHI Journal 2020 Journal Article

Mitigation of Instrument-Dependent Variability in Ballistocardiogram Morphology: Case Study on Force Plate and Customized Weighing Scale

  • Yang Yao
  • Zahra Ghasemi
  • Md. Mobashir Hasan Shandhi
  • Hazar Ashouri
  • Lisheng Xu
  • Ramakrishna Mukkamala
  • Omer T. Inan
  • Jin-Oh Hahn

The objective of this study was to investigate the measurement instrument-dependent variability in the morphology of the ballistocardiogram (BCG) waveform in human subjects and computational methods to mitigate the variability. The BCG was measured in 22 young healthy subjects using a high-performance force plate and a customized commercial weighing scale under upright standing posture. The timing and amplitude features associated with the major I, J, K waves in the BCG waveforms were extracted and quantitatively analyzed. The results indicated that 1) the I, J, K waves associated with the weighing scale BCG exhibited delayed timings within the cardiac cycle relative to the ECG R wave, as well as attenuated absolute amplitudes, compared with their force plate counterparts, whereas 2) the time intervals between the I, J, K waves were comparable. Then, two alternative computational methods were conceived in an attempt to mitigate the discrepancy between force plate and weighing-scale BCG: a transfer function and an amplitude-phase correction. The results suggested that both methods effectively mitigated the discrepancy in the timings and amplitudes associated with the I, J, K waves between the force plate and weighing-scale BCG. Hence, signal processing may serve as a viable solution to the mitigation of the instrument-induced morphological variability in the BCG, thereby facilitating the standardized analysis and interpretation of the timing and amplitude features in the BCG across wide-ranging measurement platforms.

JBHI Journal 2017 Journal Article

Validation of an Adaptive Transfer Function Method to Estimate the Aortic Pressure Waveform

  • Yang Yao
  • Lisheng Xu
  • Yingxian Sun
  • Qiang Fu
  • Shuran Zhou
  • Dianning He
  • Yahui Zhang
  • Liang Guo

The aortic pulse wave reflects cardiovascular status but, unlike the peripheral pulse wave, is difficult to measure reliably using noninvasive techniques. Thus, the estimation of the aortic pulse wave from peripheral ones is of great significance. This study proposed an adaptive transfer function (ATF) method to estimate the aortic pulse wave from the brachial pulse wave. Aortic and brachial pulse waves were derived from 26 patients who underwent cardiac catheterization. Generalized transfer functions (GTF) were derived based on the autoregressive exogenous model. Then, the GTF was adapted by its peak resonance frequency. And the optional peak resonance frequency for an individual was determined by regression formulas using brachial systolic blood pressure. The method was validated using the leave-one-out cross validation method. Compared with previous studies, the ATF method showed better performance in estimating the aortic pulse wave and predicting the feature parameters. The prediction errors of the aortic systolic blood pressure and pulse pressure were 0.2 ± 3.1 and -0.9 ± 3.1 mmHg, respectively. The percentage errors of augmentation index, percentage notch amplitude, and ejection duration were -2.1 ± 32.7%, 12.4 ± 9.2%, and -2.4 ± 3.3%, respectively.
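The leave-one-out validation of a regression formula mentioned in the abstract above can be sketched in a few lines. This is a generic illustration, not the study's code: the straight-line model, the function names, and the synthetic values are all assumptions, and the numbers are placeholders rather than patient data.

```python
from statistics import mean

def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def leave_one_out_errors(xs: list[float], ys: list[float]) -> list[float]:
    """For each subject, fit on all other subjects and predict the held-out one."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        errors.append(slope * xs[i] + intercept - ys[i])
    return errors

# Synthetic placeholder data: brachial systolic pressure (mmHg) and an
# exactly linear made-up target standing in for peak resonance frequency.
brachial_sbp = [110.0, 120.0, 130.0, 140.0, 150.0]
resonance_hz = [0.02 * x + 1.0 for x in brachial_sbp]

errors = leave_one_out_errors(brachial_sbp, resonance_hz)
print(f"mean held-out error: {mean(errors):.6f}")
```

Because each prediction comes from a model that never saw the held-out subject, the spread of these errors estimates how the formula would perform on a new individual, which matters with a cohort as small as 26 patients.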