Arrow Research search

Author name cluster

Xinghua Qu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers

11

AAAI Conference 2026 Conference Paper

Diagnostic-Guided Dynamic Profile Optimization for LLM-based User Simulators in Sequential Recommendation

  • Hongyang Liu
  • Zhu Sun
  • Tianjun Wei
  • Yan Wang
  • Jiajie Zhu
  • Xinghua Qu

Recent advances in large language models (LLMs) have enabled realistic user simulators for developing and evaluating recommender systems (RSs). However, existing LLM-based simulators for RSs face two major limitations: (1) static, single-step prompt-based inference that leads to inaccurate and incomplete user profile construction; (2) an unrealistic, single-round recommendation-feedback interaction pattern that fails to capture real-world scenarios. To address these limitations, we propose DGDPO (Diagnostic-Guided Dynamic Profile Optimization), a novel framework that constructs user profiles through a dynamic and iterative optimization process to enhance simulation fidelity. Specifically, DGDPO incorporates two core modules within each optimization loop: first, a specialized LLM-based diagnostic module, calibrated through our novel training strategy, accurately identifies specific defects in the user profile. Subsequently, a generalized LLM-based treatment module analyzes the diagnosed defect and generates targeted suggestions to refine the profile. Furthermore, unlike existing LLM-based user simulators that are limited to single-round interactions, we are the first to integrate DGDPO with sequential recommenders, enabling a bidirectional evolution where user profiles and recommendation strategies adapt to each other over multi-round interactions. Extensive experiments conducted on three real-world datasets demonstrate the effectiveness of our proposed framework.
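
A minimal sketch of the diagnose-then-treat optimization loop described above, assuming hypothetical `diagnose` and `treat` helpers in place of the paper's calibrated LLM modules; the names, heuristics, and stopping rule are illustrative only, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class UserProfile:
    interests: List[str] = field(default_factory=list)
    notes: str = ""

def diagnose(profile: UserProfile, history: List[str]) -> Optional[str]:
    """Stand-in for the calibrated diagnostic LLM: return a defect description or None."""
    # Placeholder heuristic: flag an empty profile as defective.
    return "profile lists no interests" if not profile.interests else None

def treat(profile: UserProfile, defect: str, history: List[str]) -> UserProfile:
    """Stand-in for the treatment LLM: refine the profile given the diagnosed defect."""
    # Placeholder refinement: derive interests from the interaction history.
    return UserProfile(interests=sorted(set(history)), notes=f"addressed: {defect}")

def optimize_profile(history: List[str], max_rounds: int = 5) -> UserProfile:
    profile = UserProfile()
    for _ in range(max_rounds):
        defect = diagnose(profile, history)        # diagnostic module
        if defect is None:                         # stop when no defect remains
            break
        profile = treat(profile, defect, history)  # treatment module
    return profile

if __name__ == "__main__":
    print(optimize_profile(["sci-fi movie", "jazz album", "sci-fi movie"]))
```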

ICML Conference 2025 Conference Paper

Improving Zero-Shot Adversarial Robustness in Vision-Language Models by Closed-form Alignment of Adversarial Path Simplices

  • Junhao Dong 0001
  • Piotr Koniusz
  • Yifei Zhang
  • Hao Zhu 0010
  • Weiming Liu 0005
  • Xinghua Qu
  • Yew-Soon Ong

Vision-Language Models (VLMs) such as CLIP excel at zero-shot classification due to large-scale pre-training but are vulnerable to adversarial examples. Adversarial fine-tuning robustifies zero-shot models by aligning prediction scores of individual adversaries with their clean counterparts, which typically overlooks intermediate adversarial samples along the adversarial trajectory crossing the decision boundary. Such intermediate adversaries and their vicinity produce informative representations that capture the decision boundary in detail. They can be improved by sampling adversarial candidates from simplices formed by joining two consecutive vertices on the adversarial trajectory and their clean counterpart. However, sampling simplices for adversaries is very costly. To train robust VLMs, we overcome these limitations via a Taylor expansion, formulating an upper bound on the alignment loss that depends on the Jacobian/Hessian obtained at clean samples. As regions between clean and intermediate adversarial samples capture a larger portion of the decision landscape, we robustify the VLM with plausible adversaries from simplices through our closed-form formulation, which is equivalent to infinite uniform sampling of the simplex. We obtain state-of-the-art robustness across 15 datasets and diverse vision-language tasks.
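
A hedged sketch of how such a closed form can arise (our notation, not necessarily the paper's exact derivation): with a second-order Taylor expansion at the clean sample, the expected loss over uniformly sampled simplex points reduces to Dirichlet moments, so no explicit sampling is needed.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Sketch under our own notation: clean sample $x$, consecutive adversarial
% offsets $\delta_1,\delta_2$; uniform sampling of the simplex spanned by
% $x$, $x+\delta_1$, $x+\delta_2$ corresponds to Dirichlet$(1,1,1)$ weights.
Let $\delta(\lambda)=\lambda_1\delta_1+\lambda_2\delta_2$ with
$(\lambda_0,\lambda_1,\lambda_2)\sim\mathrm{Dir}(1,1,1)$.
With gradient $g=\nabla_x\mathcal{L}(x)$ and Hessian $H=\nabla_x^2\mathcal{L}(x)$
at the clean sample, a second-order Taylor expansion gives
\begin{align}
\mathbb{E}_\lambda\!\left[\mathcal{L}\big(x+\delta(\lambda)\big)\right]
\approx \mathcal{L}(x)
 + \tfrac{1}{3}\,g^\top(\delta_1+\delta_2)
 + \tfrac{1}{12}\left(\delta_1^\top H\,\delta_1
 + \delta_2^\top H\,\delta_2
 + \delta_1^\top H\,\delta_2\right),
\end{align}
using $\mathbb{E}[\lambda_i]=\tfrac13$, $\mathbb{E}[\lambda_i^2]=\tfrac16$, and
$\mathbb{E}[\lambda_i\lambda_j]=\tfrac1{12}$ for $i\neq j$. Only $g$ and $H$
(or Hessian-vector products) at the clean sample are required.
\end{document}
```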

AAAI Conference 2025 Conference Paper

LLM4RSR: Large Language Models as Data Correctors for Robust Sequential Recommendation

  • Yatong Sun
  • Xiaochun Yang
  • Zhu Sun
  • Yan Wang
  • Bin Wang
  • Xinghua Qu

Sequential Recommenders (SRs) are trained to predict the next item as the target given its preceding items as the input, assuming every input-target pair is matched and reliable for training. However, users can be induced by external distractions to click on items inconsistent with their true preferences, resulting in unreliable training instances with mismatched input-target pairs. To resist unreliable data, researchers have attempted to develop Robust SRs (RSRs). However, our data analysis reveals that existing RSRs are purely data-driven: for most instances formed by infrequently co-occurring items, existing RSRs are uncertain about their reliability. To fill this gap, we propose a generic framework -- LLM4RSR (Large Language Models for Robust Sequential Recommendation) -- to semantically complement data-driven RSRs by correcting uncertain instances into reliable ones based on LLMs' semantic comprehension of items beyond co-occurrence. In this way, RSRs can be re-trained on the corrected data for better accuracy. This is a selective knowledge distillation procedure, where the LLM acts as a teacher guiding student RSRs via uncertain instances. To align LLMs with the data correction task and mitigate inherent hallucinations, we equip the LLM with profile, plan, and memory modules, which are automatically optimized via textual gradient descent, eliminating the need for human effort and expertise. Experiments on four real-world datasets spanning eight backbones verify the generality, effectiveness, and efficiency of LLM4RSR.
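
A toy sketch of the selective-correction idea, assuming hypothetical `rsr_confidence` and `llm_correct_target` stand-ins for the RSR's reliability signal and the LLM corrector; the framework's profile, plan, and memory modules are not modeled here.

```python
from typing import List, Tuple

Instance = Tuple[List[str], str]  # (input item sequence, target item)

def rsr_confidence(instance: Instance) -> float:
    """Stand-in: reliability score from the data-driven robust recommender."""
    seq, target = instance
    return 1.0 if target in seq else 0.3  # toy co-occurrence proxy

def llm_correct_target(instance: Instance) -> str:
    """Stand-in: LLM proposes a semantically consistent target item."""
    seq, _ = instance
    return seq[-1]  # toy correction: repeat the most recent item

def correct_dataset(data: List[Instance], threshold: float = 0.5) -> List[Instance]:
    corrected = []
    for inst in data:
        if rsr_confidence(inst) < threshold:                      # uncertain instance
            corrected.append((inst[0], llm_correct_target(inst))) # LLM teacher corrects it
        else:
            corrected.append(inst)                                # reliable instance kept as-is
    return corrected

if __name__ == "__main__":
    data = [(["a", "b", "c"], "z"), (["a", "b"], "b")]
    print(correct_dataset(data))  # the first (mismatched) pair gets corrected
```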

NeurIPS Conference 2025 Conference Paper

Machine Unlearning via Task Simplex Arithmetic

  • Junhao Dong
  • Hao Zhu
  • Yifei Zhang
  • Xinghua Qu
  • Yew Soon Ong
  • Piotr Koniusz

As foundation Vision-Language Models (VLMs) unlock fine-tuning on smaller datasets while leveraging large-scale pre-training data, machine unlearning becomes critical for addressing privacy concerns and regulatory compliance. The task vector, representing the difference between parameters of models fine-tuned with and without specific data, is a popular retraining-free unlearning strategy. However, we observe that task vectors exhibit substantial sensitivity to various fine-tuning configurations, resulting in unstable unlearning effectiveness that correlates negatively with the prediction-level variance. While aggregating multiple functions (e.g., a VLM with classifier) whose parameters are represented by different task vectors reduces function variance and improves unlearning, obtaining numerous task vectors and aggregating the corresponding functions is computationally expensive. Thus, in order to capture the space of task vectors induced by diverse fine-tuning strategies, we propose modeling it within the convex hull of a $(Q-1)$-simplex whose vertices represent $Q$ task vectors. Although a function ensemble could be formed by sampling numerous task vectors from such a simplex, we derive a closed-form ensemble of an infinite number of functions whose parameters are uniformly sampled from the simplex, enabling efficient function-level task vector ensembling with enhanced unlearning performance. Extensive experiments and analyses across diverse datasets and scenarios demonstrate the efficacy of our method.
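
A Monte-Carlo illustration of task-vector unlearning over a simplex of task vectors, with invented parameter shapes and a toy negation-based unlearning rule; the paper's contribution is a closed-form function-level ensemble, which this sketch only approximates by sampling.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
theta_pretrained = rng.normal(size=dim)

# Q task vectors: differences between fine-tuned and pre-trained parameters,
# obtained under Q different (hypothetical) fine-tuning configurations.
Q = 3
task_vectors = [rng.normal(scale=0.1, size=dim) for _ in range(Q)]

def unlearned_params(lmbda: float, weights: np.ndarray) -> np.ndarray:
    """Negate a convex combination of task vectors to 'forget' the data."""
    tau = sum(w * t for w, t in zip(weights, task_vectors))
    return theta_pretrained - lmbda * tau

# Uniform sampling from the (Q-1)-simplex corresponds to Dirichlet(1, ..., 1).
samples = rng.dirichlet(np.ones(Q), size=256)
ensemble = np.mean([unlearned_params(0.5, w) for w in samples], axis=0)

# At the parameter level (an affine map of the weights) the expectation
# collapses to the simplex centroid; the paper instead derives this kind of
# averaging at the function level in closed form.
centroid = unlearned_params(0.5, np.ones(Q) / Q)
print(np.abs(ensemble - centroid).max())  # small sampling gap
```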

NeurIPS Conference 2025 Conference Paper

Robust SuperAlignment: Weak-to-Strong Robustness Generalization for Vision-Language Models

  • Junhao Dong
  • Cong Zhang
  • Xinghua Qu
  • Zejun Ma
  • Piotr Koniusz
  • Yew Soon Ong

Numerous well-established studies have demonstrated the superhuman capabilities of modern Vision-Language Models (VLMs) across a wide range of tasks. However, doubt is growing about the continued availability of reliable, high-quality labels (supervision) from human annotators, which threatens to stagnate model performance. To address this challenge, "superalignment" employs the so-called weak-to-strong generalization paradigm, in which supervision from a weak model provides generalizable knowledge for a strong model. While effective in aligning knowledge on clean samples between the strong and weak models, the standard weak-to-strong approach typically fails to transfer adversarial robustness, exposing strong VLMs to adversarial attacks. This failure arises because adversarial samples are normally missing in the superalignment stage. To this end, we are the first to propose a weak-to-strong (adversarial) robustness generalization method that elicits zero-shot robustness in large-scale models via an unsupervised scheme, mitigating the unreliable information source for alignment from two perspectives: alignment re-weighting and source guidance refinement. We analyze the settings under which robustness generalization is possible. Extensive experiments across various vision-language benchmarks validate the effectiveness of our method in numerous scenarios, demonstrating its plug-and-play applicability to large-scale VLMs.
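
One possible reading of "alignment re-weighting" as a confidence-weighted weak-to-strong alignment loss; this is a guess for illustration only, and the paper's actual objective may differ.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def reweighted_alignment_loss(strong_logits_adv, weak_logits_clean):
    """KL(weak || strong) per sample, down-weighted when the weak teacher is unsure."""
    p_weak = softmax(weak_logits_clean)    # weak supervision on clean inputs
    p_strong = softmax(strong_logits_adv)  # strong model on adversarial inputs
    kl = (p_weak * (np.log(p_weak + 1e-8) - np.log(p_strong + 1e-8))).sum(-1)
    confidence = p_weak.max(-1)            # re-weighting by teacher confidence
    return float((confidence * kl).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(reweighted_alignment_loss(rng.normal(size=(4, 10)), rng.normal(size=(4, 10))))
```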

ICLR Conference 2025 Conference Paper

You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs

  • Yihong Luo
  • Xiaolong Chen 0003
  • Xinghua Qu
  • Tianyang Hu 0001
  • Jing Tang 0004

Recently, some works have tried to combine diffusion models and Generative Adversarial Networks (GANs) to alleviate the computational cost of iterative denoising inference in Diffusion Models (DMs). However, existing works in this line suffer from either training instability and mode collapse or subpar one-step generation learning efficiency. To address these issues, we introduce YOSO, a novel generative model designed for rapid, scalable, and high-fidelity one-step image synthesis with high training stability and mode coverage. Specifically, we smooth the adversarial divergence with the denoising generator itself, performing self-cooperative learning. We show that our method can serve as a one-step generation model trained from scratch with competitive performance. Moreover, we extend YOSO to one-step text-to-image generation based on pre-trained models through several effective training techniques (i.e., a latent perceptual loss and a latent discriminator for efficient training with latent DMs, informative prior initialization (IPI), and a quick adaptation stage that fixes the flawed noise scheduler). Experimental results show that YOSO achieves state-of-the-art one-step generation performance even with Low-Rank Adaptation (LoRA) fine-tuning. In particular, YOSO-PixArt-$\alpha$ can generate images in one step when trained at 512 resolution, with the capability of adapting to 1024 resolution without extra explicit training, requiring only ~10 A800 days for fine-tuning. Our code is available at: [https://github.com/Luo-Yihong/YOSO](https://github.com/Luo-Yihong/YOSO)

IJCAI Conference 2023 Conference Paper

AudioQR: Deep Neural Audio Watermarks For QR Code

  • Xinghua Qu
  • Xiang Yin
  • Pengfei Wei
  • Lu Lu
  • Zejun Ma

Image-based quick response (QR) codes are widely used but create barriers for visually impaired people. With the goal of "AI for good", this paper proposes AudioQR, a barrier-free QR coding mechanism for the visually impaired population via deep neural audio watermarks. Previous audio watermarking approaches are mainly based on handcrafted pipelines, which are less secure and difficult to apply in large-scale scenarios. In contrast, AudioQR is the first comprehensive end-to-end pipeline that hides watermarks in audio imperceptibly and robustly. To achieve this, we jointly train an encoder and decoder, where the encoder is structured as a concatenation of transposed convolutions and multi-receptive-field fusion modules. Moreover, we customize the decoder training with a stochastic data augmentation chain to make the watermarked audio robust to different audio distortions, such as environmental background, room impulse response when playing through the air, surrounding music, and Gaussian noise. Experimental results indicate that AudioQR can efficiently hide arbitrary information in audio without introducing a significant perceptible difference. Our code is available at https://github.com/xinghua-qu/AudioQR.
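
A toy joint encoder/decoder watermarking sketch with an additive-noise stand-in for the stochastic augmentation chain; the layer choices and scales are ours and do not reproduce the AudioQR architecture.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Embeds a bit string into the waveform as a small additive residual."""
    def __init__(self, n_bits=16):
        super().__init__()
        self.net = nn.Sequential(nn.Conv1d(1 + n_bits, 16, 9, padding=4),
                                 nn.ReLU(), nn.Conv1d(16, 1, 9, padding=4))
    def forward(self, audio, bits):
        msg = bits[:, :, None].expand(-1, -1, audio.shape[-1])  # broadcast bits over time
        return audio + 0.01 * self.net(torch.cat([audio, msg], dim=1))

class Decoder(nn.Module):
    """Recovers the bit string from (possibly distorted) watermarked audio."""
    def __init__(self, n_bits=16):
        super().__init__()
        self.net = nn.Sequential(nn.Conv1d(1, 16, 9, padding=4), nn.ReLU(),
                                 nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                                 nn.Linear(16, n_bits))
    def forward(self, audio):
        return self.net(audio)

def distort(audio):
    """Stand-in for the stochastic augmentation chain (here: additive noise only)."""
    return audio + 0.005 * torch.randn_like(audio)

enc, dec = Encoder(), Decoder()
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
audio = torch.randn(4, 1, 16000)                 # batch of 1-second clips at 16 kHz
bits = torch.randint(0, 2, (4, 16)).float()      # arbitrary payload
for _ in range(3):                               # a few toy training steps
    wm = enc(audio, bits)
    loss = nn.functional.binary_cross_entropy_with_logits(dec(distort(wm)), bits) \
           + 10.0 * nn.functional.mse_loss(wm, audio)   # imperceptibility term
    opt.zero_grad()
    loss.backward()
    opt.step()
print(loss.item())
```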

NeurIPS Conference 2023 Conference Paper

Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective

  • Pengfei Wei
  • Lingdong Kong
  • Xinghua Qu
  • Yi Ren
  • Zhiqiang Xu
  • Jing Jiang
  • Xiang Yin

Unsupervised video domain adaptation is a practical yet challenging task. In this work, for the first time, we tackle it from a disentanglement view. Our key idea is to handle the spatial and temporal domain divergence separately through disentanglement. Specifically, we consider the generation of cross-domain videos from two sets of latent factors, one encoding the static information and the other encoding the dynamic information. A Transfer Sequential VAE (TranSVAE) framework is then developed to model such generation. To better serve adaptation, we propose several objectives to constrain the latent factors. With these constraints, the spatial divergence can be readily removed by disentangling out the static, domain-specific information, and the temporal divergence is further reduced at both the frame and video levels through adversarial learning. Extensive experiments on the UCF-HMDB, Jester, and Epic-Kitchens datasets verify the effectiveness and superiority of TranSVAE compared with several state-of-the-art approaches.
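
A simplified, non-variational sketch of the two-latent-factor split (one static code per video, one dynamic code per frame); the actual TranSVAE adds variational inference, the proposed latent constraints, and adversarial adaptation losses.

```python
import torch
import torch.nn as nn

class DisentangledVideoAE(nn.Module):
    def __init__(self, feat_dim=64, static_dim=8, dynamic_dim=8):
        super().__init__()
        self.static_enc = nn.Linear(feat_dim, static_dim)    # appearance/domain info
        self.dynamic_enc = nn.Linear(feat_dim, dynamic_dim)  # motion/semantic info
        self.dec = nn.Linear(static_dim + dynamic_dim, feat_dim)

    def forward(self, frames):                               # frames: (batch, time, feat_dim)
        z_static = self.static_enc(frames.mean(dim=1))       # one code per video
        z_dynamic = self.dynamic_enc(frames)                 # one code per frame
        z_static_rep = z_static[:, None, :].expand(-1, frames.shape[1], -1)
        recon = self.dec(torch.cat([z_static_rep, z_dynamic], dim=-1))
        return recon, z_static, z_dynamic

model = DisentangledVideoAE()
frames = torch.randn(2, 16, 64)               # 2 videos, 16 frames of pooled features
recon, z_s, z_d = model(frames)
loss = nn.functional.mse_loss(recon, frames)  # reconstruction; adaptation losses omitted
print(loss.item())
```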

IJCAI Conference 2022 Conference Paper

Next Point-of-Interest Recommendation with Inferring Multi-step Future Preferences

  • Lu Zhang
  • Zhu Sun
  • Ziqing Wu
  • Jie Zhang
  • Yew Soon Ong
  • Xinghua Qu

Existing studies on next point-of-interest (POI) recommendation mainly attempt to learn user preference from past and current sequential behaviors. They, however, completely ignore the impact of future behaviors on decision-making, thus hindering the quality of user preference learning. Intuitively, users' next POI visits may also be affected by their multi-step future behaviors, as users often have activity planning in mind. To fill this gap, we propose a novel Context-aware Future Preference inference Recommender (CFPRec) to help infer user future preference in a self-ensembling manner. In particular, it delicately derives multi-step future preferences from the learned past preference thanks to the periodic property of users' daily check-ins, so as to implicitly mimic a user's activity planning before the next visit. The inferred future preferences are then seamlessly integrated with the current preference for more expressive user preference learning. Extensive experiments on three datasets demonstrate the superiority of CFPRec against state-of-the-art methods.
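
A toy illustration of inferring future-step preferences from same-time-slot history and fusing them with the current preference; the averaging and fusion weights here are invented, whereas CFPRec learns this inside the recommender.

```python
import numpy as np

def infer_future_preferences(past_prefs_by_hour, future_hours):
    """past_prefs_by_hour: {hour: [preference vectors]}; average same-hour history."""
    return [np.mean(past_prefs_by_hour[h], axis=0) for h in future_hours]

rng = np.random.default_rng(0)
past = {9: [rng.normal(size=4) for _ in range(3)],   # morning check-in history
        18: [rng.normal(size=4) for _ in range(3)]}  # evening check-in history
current_pref = rng.normal(size=4)
future = infer_future_preferences(past, future_hours=[18, 9])  # next two planned slots

fused = 0.6 * current_pref + 0.4 * np.mean(future, axis=0)     # self-ensembling fusion
print(fused)
```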

AAMAS Conference 2022 Conference Paper

Spiking Pitch Black: Poisoning an Unknown Environment to Attack Unknown Reinforcement Learners

  • Hang Xu
  • Xinghua Qu
  • Zinovi Rabinovich

As reinforcement learning (RL) systems are deployed in various safety-critical applications, it is imperative to understand how vulnerable they are to adversarial attacks. Of these, an environment-poisoning attack (EPA) is considered particularly insidious, since environment hyper-parameters are significant factors in determining an RL policy, yet are prone to being accessed by third parties. The success of EPAs relies on comprehensive prior knowledge of the attacked RL system, including the RL agent's learning mechanism and/or its environment model. Unfortunately, such an assumption of prior knowledge creates an unrealistic attack, one that poses limited threat to real-world RL systems. In this paper, we propose a Double-Black-Box EPA framework that assumes only the attacker's ability to alter environment hyper-parameters. Considering that environment alteration comes at a cost, we seek minimal poisoning in an unknown environment and aim to force a black-box RL agent to learn an attacker-designed policy. To this end, we incorporate an inference module in our framework to capture the internal information of the unknown RL system and, accordingly, learn an adaptive strategy based on an approximation of our attack objective. We empirically show the threat posed by our attack to both tabular-RL and deep-RL algorithms, in both discrete and continuous environments.
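
A toy outer-loop sketch of the double-black-box poisoning idea on a two-armed bandit, with invented stand-ins for the poisoned environment hyper-parameter, the alteration cost, and the victim learner; the paper's inference module and attack objective are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
target_action = 1           # action the attacker wants the learner to prefer

def train_victim(reward_bonus):
    """Stand-in black-box RL learner on a 2-armed bandit whose rewards the
    attacker can shift by `reward_bonus` (the poisoned hyper-parameter)."""
    q = np.zeros(2)
    base_reward = np.array([1.0, 0.5]) + np.array([0.0, reward_bonus])
    for _ in range(200):
        a = rng.integers(2) if rng.random() < 0.1 else int(np.argmax(q))
        q[a] += 0.1 * (base_reward[a] + rng.normal(scale=0.1) - q[a])
    return int(np.argmax(q))

bonus, step = 0.0, 0.2
for _ in range(20):                              # attacker's outer loop
    learned_action = train_victim(bonus)         # observe victim behaviour only
    if learned_action == target_action:
        break                                    # minimal poisoning reached
    bonus += step                                # adapt the environment alteration
print(f"final bonus={bonus:.1f} (alteration cost grows with |bonus|)")
```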