Arrow Research search

Author name cluster

Xiaoling Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
1 author row

Possible papers

AAAI Conference 2026 Conference Paper

EgoCross: Benchmarking Multimodal Large Language Models for Cross-Domain Egocentric Video Question Answering

  • Yanjun Li
  • Yuqian Fu
  • Tianwen Qian
  • Qi'Ao Xu
  • Silong Dai
  • Danda Pani Paudel
  • Luc Van Gool
  • Xiaoling Wang

Recent advances in Multimodal Large Language Models (MLLMs) have significantly pushed the frontier of egocentric video question answering (EgocentricQA). However, existing benchmarks and studies are mainly limited to common daily activities such as cooking and cleaning. In contrast, real-world deployment inevitably encounters domain shifts, where target domains differ substantially in both visual style and semantic content. To bridge this gap, we introduce EgoCross, a comprehensive benchmark designed to evaluate the cross-domain generalization of MLLMs in EgocentricQA. EgoCross covers four diverse and challenging domains, including surgery, industry, extreme sports, and animal perspective, representing realistic and high-impact application scenarios. It comprises approximately 1,000 QA pairs across 798 video clips, spanning four key QA tasks: prediction, recognition, localization, and counting. Each QA pair provides both OpenQA and CloseQA formats to support fine-grained evaluation. Extensive experiments show that most existing MLLMs, whether general-purpose or egocentric-specialized, struggle to generalize to domains beyond daily life, highlighting the limitations of current models. Furthermore, we conduct several pilot studies, e.g., fine-tuning and reinforcement learning, to explore potential improvements. We hope EgoCross and our accompanying analysis will serve as a foundation for advancing domain-adaptive, robust egocentric video understanding.

AAAI Conference 2025 Conference Paper

Hierarchical Divide-and-Conquer for Fine-Grained Alignment in LLM-Based Medical Evaluation

  • Shunfan Zheng
  • Xiechi Zhang
  • Gerard de Melo
  • Xiaoling Wang
  • Linlin Wang

In the rapidly evolving landscape of large language models (LLMs) for medical applications, ensuring the reliability and accuracy of these models in clinical settings is paramount. Existing benchmarks often focus on fixed-format tasks like multiple-choice QA, which fail to capture the complexity of real-world clinical diagnostics. Moreover, traditional evaluation metrics and LLM-based evaluators struggle with misalignment, often providing oversimplified assessments that do not adequately reflect human judgment. To address these challenges, we introduce HDCEval, a Hierarchical Divide-and-Conquer Evaluation framework tailored for fine-grained alignment in medical evaluation. HDCEval is built on a set of fine-grained medical evaluation guidelines developed in collaboration with professional doctors, encompassing Patient Question Relevance, Medical Knowledge Correctness, and Expression. The framework decomposes complex evaluation tasks into specialized subtasks, each evaluated by expert models trained through Attribute-Driven Token Optimization (ADTO) on a meticulously curated preference dataset. This hierarchical approach ensures that each aspect of the evaluation is handled with expert precision, leading to a significant improvement in alignment with human evaluators.

AAAI Conference 2025 Conference Paper

NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning

  • Xin Yi
  • Shunfan Zheng
  • Linlin Wang
  • Gerard de Melo
  • Xiaoling Wang
  • Liang He

The emergence of fine-tuning-as-a-service has revealed a new vulnerability in large language models (LLMs). A mere handful of malicious data uploaded by users can subtly manipulate the fine-tuning process, leading to a compromised alignment state. Existing methods to counteract fine-tuning attacks typically require substantial computational resources. Even with parameter-efficient techniques like LoRA, gradient updates remain essential. To address these challenges, we propose Neuron-Level Safety Realignment (NLSR), a training-free framework that restores the safety of LLMs based on the similarity difference of safety-critical neurons before and after fine-tuning. The core of our framework is first to construct a safety reference model from an initially aligned model to amplify safety-related features in neurons. We then utilize this reference model to identify safety-critical neurons, which we prepare as patches. Finally, we selectively restore only those neurons that exhibit significant similarity differences by transplanting these prepared patches, thereby minimally altering the fine-tuned model. Extensive experiments demonstrate significant safety enhancements in fine-tuned models across multiple downstream tasks, while greatly maintaining task-level accuracy. Our findings indicate that safety-critical neurons exhibit significant regional variations after fine-tuning, which can be effectively corrected through neuron transplantation from the reference model without the need for additional training.
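The transplant step described in this abstract can be illustrated roughly as follows. This is a toy sketch, not the authors' implementation. It assumes each neuron is one row of a weight matrix and uses plain cosine similarity as a stand-in for the paper's similarity measure. The names `cosine`, `transplant_neurons`, and `threshold` are hypothetical. Building the safety reference model and identifying safety-critical neurons are omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def transplant_neurons(reference, fine_tuned, threshold=0.9):
    """Return (patched_rows, restored_indices).

    A row of `fine_tuned` is replaced by the matching row of `reference`
    when their cosine similarity falls below `threshold` (an assumed,
    illustrative criterion); all other rows are left as fine-tuned.
    """
    patched, restored = [], []
    for idx, (ref_row, ft_row) in enumerate(zip(reference, fine_tuned)):
        if cosine(ref_row, ft_row) < threshold:
            patched.append(list(ref_row))   # transplant the reference patch
            restored.append(idx)
        else:
            patched.append(list(ft_row))    # keep the fine-tuned neuron
    return patched, restored

# Toy data: only the second neuron drifted far from the reference.
reference = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fine_tuned = [[0.9, 0.1], [1.0, 0.2], [1.1, 0.9]]
patched, restored = transplant_neurons(reference, fine_tuned)
print(restored)  # prints [1]
```

Because no gradients are computed, the procedure is training-free in the sense the abstract describes; how closely this toy criterion matches the paper's is not something the abstract settles.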

JBHI Journal 2025 Journal Article

SDPR: Prescription Recommendation With Syndrome Differentiation in Traditional Chinese Medicine

  • Wenjing Yue
  • Wendi Ji
  • Xinyu Wang
  • Xin Ma
  • Pengfei Wang
  • Xiaoling Wang

Prescription recommendation is critical for clinical decision support in Traditional Chinese Medicine (TCM), aiming to recommend a herb set based on a patient's symptoms. The core principle of TCM clinical practice, treatment based on syndrome differentiation (SD), follows a four-step progressive process: symptoms to syndromes, therapeutic methods, and herbs. However, existing models oversimplify this process by overlooking therapeutic methods, directly mapping symptoms to herbs or syndromes to herbs, resulting in information loss and reducing the effectiveness of recommended prescriptions. Furthermore, the implicit, sparse, and many-to-many relationships between syndromes and therapeutic methods, coupled with the nonlinear interactions between therapeutic methods and herbs, further hinder the modeling of the complete SD process. To address these challenges, we propose a novel four-partite graph paradigm that explicitly models the four key components of SD and their interactions, preserving critical information at each step and aligning more closely with clinicians' decision-making logic. Building on this, we develop SDPR, an SD-based prescription recommendation model comprising four modules aligned with all SD steps. We then integrate them into a multi-task learning framework to fully capture the progressive prescription process. To handle the implicit and complex relationships among syndromes, therapeutic methods, and herbs, we introduce a syndrome-induced pre-training strategy and a therapeutic method-aware contrastive learning framework. Extensive experiments on public and real-world datasets validate SDPR's effectiveness in herb recommendation and prescription retrieval, confirming the strength of the four-partite graph paradigm. Our broader goal is to advance the intelligent development of TCM in healthcare.

NeurIPS Conference 2024 Conference Paper

Typicalness-Aware Learning for Failure Detection

  • Yijun Liu
  • Jiequan Cui
  • Zhuotao Tian
  • Senqiao Yang
  • Qingdong He
  • Xiaoling Wang
  • Jingyong Su

Deep neural networks (DNNs) often suffer from the overconfidence issue, where incorrect predictions are made with high confidence scores, hindering the applications in critical systems. In this paper, we propose a novel approach called Typicalness-Aware Learning (TAL) to address this issue and improve failure detection performance. We observe that, with the cross-entropy loss, model predictions are optimized to align with the corresponding labels via increasing logit magnitude or refining logit direction. However, regarding atypical samples, the image content and their labels may exhibit disparities. This discrepancy can lead to overfitting on atypical samples, ultimately resulting in the overconfidence issue that we aim to address. To address this issue, we have devised a metric that quantifies the typicalness of each sample, enabling the dynamic adjustment of the logit magnitude during the training process. By allowing relatively atypical samples to be adequately fitted while preserving reliable logit direction, the problem of overconfidence can be mitigated. TAL has been extensively evaluated on benchmark datasets, and the results demonstrate its superiority over existing failure detection methods. Specifically, TAL achieves a more than 5% improvement on CIFAR100 in terms of the Area Under the Risk-Coverage Curve (AURC) compared to the state-of-the-art. Code is available at https://github.com/liuyijungoon/TAL.

AAAI Conference 2021 Conference Paper

Dual Sparse Attention Network For Session-based Recommendation

  • Jiahao Yuan
  • Zihan Song
  • Mingyou Sun
  • Xiaoling Wang
  • Wayne Xin Zhao

Session-based Recommendations recommend the next possible item for the user with anonymous sessions, whose challenge is that the user’s behavioral preference can only be analyzed in a limited sequence to meet their need. Recent advances evaluate the effectiveness of the attention mechanism in the session-based recommendation. However, two simplifying assumptions are made by most of these attention-based models. One is to regard the last-click as the query vector to denote the user’s current preference, and the other is to consider that all items within the session are favorable for the final result, including the effect of unrelated items (i.e., spurious user behaviors). In this paper, we propose a novel Dual Sparse Attention Network for the session-based recommendation called DSAN to address these shortcomings. In this proposed method, we explore a learned target item embedding to model the user’s current preference and apply an adaptively sparse transformation function to eliminate the effect of the unrelated items. Experimental results on two real public datasets show that the proposed method is superior to the state-of-the-art session-based recommendation algorithm in all tests and also demonstrate that not all actions within the session are useful. To make our results reproducible, we have published our code on https://github.com/SamHaoYuan/DSANForAAAI2021.
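To show how a sparse transformation can assign exactly zero attention to unrelated items, here is a minimal sketch using sparsemax, a well-known sparse alternative to softmax. This is a stand-in chosen for illustration: the abstract does not specify which transformation DSAN uses, and the fixed (non-adaptive) sparsemax below has no learned sparsity parameter. The function name and example scores are hypothetical.

```python
def sparsemax(scores):
    """Euclidean projection of `scores` onto the probability simplex.

    Unlike softmax, low-scoring entries receive a weight of exactly 0.0,
    so they contribute nothing to an attention-weighted sum.
    """
    z = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z_k in enumerate(z, start=1):
        cumsum += z_k
        # Entries satisfying this condition form a prefix of the sorted
        # list; the last one that satisfies it fixes the threshold tau.
        if 1.0 + k * z_k > cumsum:
            tau = (cumsum - 1.0) / k
    return [max(s - tau, 0.0) for s in scores]

# Attention scores for three items in a session (illustrative values).
print(sparsemax([2.0, 1.0, -1.0]))  # prints [1.0, 0.0, 0.0]
weights = sparsemax([0.5, 0.3, -2.0])
```

In the second call the third item, with a much lower score, is dropped entirely while the remaining two share the attention mass.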

AAAI Conference 2018 Conference Paper

Personalized Time-Aware Tag Recommendation

  • Keqiang Wang
  • Yuanyuan Jin
  • Haofen Wang
  • Hongwei Peng
  • Xiaoling Wang

Personalized tag recommender systems suggest a list of tags to a user when he or she wants to annotate an item. They utilize users’ preferences and the features of items. Tensor factorization techniques have been widely used in tag recommendation. Given the user-item pair, although the classic PITF (Pairwise Interaction Tensor Factorization) explicitly models the pairwise interactions among users, items and tags, it overlooks users’ short-term interests and suffers from data sparsity. On the other hand, given the user-item-time triple, time-aware approaches like BLL (Base-Level Learning) utilize the time effect to capture the temporal dynamics and the most popular tags on items to handle the cold-start situation of new users. However, they work only at the individual level and the target resource level, and so cannot find users’ potential interests. In this paper, we propose a unified tag recommendation approach that considers both time awareness and personalization, extending PITF by adding weights to the user-tag interaction and the item-tag interaction respectively. Compared to PITF, our proposed model can capture the temporal factor through temporal weights and relieve the data sparsity problem by referencing the most popular tags on items. Further, our model brings collaborative filtering (CF) to time-aware models, which can mine information from global data and help improve the ability to recommend new tags. Unlike the power-form functions used in existing time-aware recommendation models, we use the Hawkes process with an exponential intensity function to improve the model’s efficiency. The experimental results show that our proposed model outperforms state-of-the-art tag recommendation methods in accuracy and is better able to recommend new tags.
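One plausible reason an exponential kernel helps efficiency is that an exponentially decayed sum over past events can be updated recursively, whereas a power-form sum generally has to revisit every past event. The sketch below illustrates that property only. The decay rate `delta`, the unit jump per event, and all names are assumptions for illustration, not the paper's formulation.

```python
import math

def exp_weight(event_times, now, delta=0.5):
    """Direct sum of exponentially decayed contributions from past events."""
    return sum(math.exp(-delta * (now - t)) for t in event_times if t <= now)

class RunningExpWeight:
    """Maintains the same quantity with constant work per new event."""

    def __init__(self, delta=0.5):
        self.delta = delta
        self.value = 0.0        # weight as of the most recent event
        self.last_time = None   # timestamp of the most recent event

    def add_event(self, t):
        """Register an event at time t (times must be non-decreasing)."""
        if self.last_time is not None:
            # Decay the accumulated weight over the elapsed interval.
            self.value *= math.exp(-self.delta * (t - self.last_time))
        self.value += 1.0
        self.last_time = t

    def at(self, now):
        """Weight at a query time `now` at or after the last event."""
        if self.last_time is None:
            return 0.0
        return self.value * math.exp(-self.delta * (now - self.last_time))

# A user applied the same tag at times 1, 3 and 4; query the weight at time 6.
times = [1.0, 3.0, 4.0]
running = RunningExpWeight(delta=0.5)
for t in times:
    running.add_event(t)
direct = exp_weight(times, 6.0, delta=0.5)
recursive = running.at(6.0)
```

Both `direct` and `recursive` give the same value up to floating-point error, but the running version never stores the event history.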

AAAI Conference 2017 Conference Paper

Additional Multi-Touch Attribution for Online Advertising

  • Wendi Ji
  • Xiaoling Wang

Multi-Touch Attribution studies the effects of various types of online advertisements on purchase conversions. It is a very important problem in computational advertising, as it allows marketers to assign credits for conversions to different advertising channels and optimize advertising campaigns. In this paper, we propose an additional multi-touch attribution model (AMTA) based on two obvious assumptions: (1) the effect of an ad exposure fades with time and (2) the effects of ad exposures on the browsing path of a user are additive. AMTA borrows techniques from survival analysis and uses the hazard rate to measure the influence of an ad exposure. In addition, we take both the conversion time and the intrinsic conversion rate of users into consideration to generate the probability of a conversion. Experimental results on a large real-world advertising dataset illustrate that our proposed method is superior to state-of-the-art techniques in conversion rate prediction and that the credit allocation based on AMTA is reasonable.
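The two assumptions in this abstract (time-fading and additive effects) can be sketched with a simple additive hazard model. Everything specific here is an assumption for illustration: the exponential decay form, the per-channel parameters `beta` (strength) and `omega` (decay rate), the credit rule, and the function names. The users' intrinsic conversion rate mentioned in the abstract is left out.

```python
import math

def hazard(exposures, t, beta, omega):
    """Additive hazard at time t: each past exposure adds a decaying term."""
    return sum(
        beta[ch] * omega[ch] * math.exp(-omega[ch] * (t - ts))
        for ch, ts in exposures
        if ts <= t
    )

def conversion_probability(exposures, horizon, beta, omega):
    """1 - survival, where survival = exp(-integral of the hazard up to horizon)."""
    cumulative = sum(
        beta[ch] * (1.0 - math.exp(-omega[ch] * (horizon - ts)))
        for ch, ts in exposures
        if ts <= horizon
    )
    return 1.0 - math.exp(-cumulative)

def credits(exposures, t, beta, omega):
    """Split credit at conversion time t in proportion to each exposure's hazard term."""
    parts = [
        beta[ch] * omega[ch] * math.exp(-omega[ch] * (t - ts)) if ts <= t else 0.0
        for ch, ts in exposures
    ]
    total = sum(parts)
    return [p / total for p in parts] if total > 0 else [0.0] * len(parts)

# Hypothetical browsing path: (channel, exposure time in days).
path = [("search", 0.0), ("display", 2.0)]
beta = {"search": 0.3, "display": 0.1}
omega = {"search": 1.0, "display": 0.5}

p_convert = conversion_probability(path, 5.0, beta, omega)
shares = credits(path, 5.0, beta, omega)
```

Here `shares` sums to 1 across the exposures on the path, and `p_convert` grows as the observation horizon lengthens because the cumulative hazard is non-decreasing.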

IJCAI Conference 2016 Conference Paper

Bayesian Probabilistic Multi-Topic Matrix Factorization for Rating Prediction

  • Keqiang Wang
  • Wayne Xin Zhao
  • Hongwei Peng
  • Xiaoling Wang

Recently, Local Matrix Factorization (LMF) has been shown to be more effective than traditional matrix factorization for rating prediction. The core idea of LMF is to first partition the original matrix into several smaller submatrices and then exploit the local structures of the submatrices for better low-rank approximation. Various clustering-based methods with heuristic extensions have been proposed for LMF in the literature. To develop a more principled solution for LMF, this paper presents a Bayesian Probabilistic Multi-Topic Matrix Factorization model. We treat the set of items rated by a user as a document, and employ latent topic models to cluster items as topics. Subsequently, a user has a distribution over the set of topics. We further set topic-specific latent vectors for both users and items. The final prediction is obtained by an ensemble of the results from the corresponding topic-specific latent vectors in each topic. Using a multi-topic latent representation, our model is better able to reflect the complex characteristics of users and items in rating prediction, and enhances model interpretability. Extensive experiments on large real-world datasets demonstrate the effectiveness of the proposed model.