Arrow Research search

Author name cluster

Junkang Wu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
2 author rows

Possible papers (10)

AAAI Conference 2026 Conference Paper

Causal-HalBench: Uncovering LVLMs Object Hallucinations Through Causal Intervention

  • Zhe Xu
  • Zhicai Wang
  • Junkang Wu
  • Jinda Lu
  • Xiang Wang

Large Vision-Language Models (LVLMs) often suffer from object hallucination, making erroneous judgments about the presence of objects in images. We propose that this primarily stems from spurious correlations arising when models strongly associate highly co-occurring objects during training, leading to hallucinated objects influenced by visual context. Current benchmarks mainly focus on hallucination detection but lack a formal characterization and quantitative evaluation of spurious correlations in LVLMs. To address this, we introduce causal analysis into the object recognition scenario of LVLMs, establishing a Structural Causal Model (SCM). Utilizing the language of causality, we formally define spurious correlations arising from co-occurrence bias. To quantify the influence induced by these spurious correlations, we develop Causal-HalBench, a benchmark specifically constructed with counterfactual samples and integrated with comprehensive causal metrics designed to assess model robustness against spurious correlations. Concurrently, we propose an extensible pipeline for the construction of these counterfactual samples, leveraging the capabilities of proprietary LVLMs and Text-to-Image (T2I) models for their generation. Our evaluations on mainstream LVLMs using Causal-HalBench demonstrate that these models exhibit susceptibility to spurious correlations, albeit to varying extents.
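
As a rough illustration of the kind of causal metric such counterfactual samples enable, the sketch below scores how often a model still claims an object is present after the object has been edited out while its co-occurring context is kept. The function name and metric form are our assumptions, not part of the benchmark's released code.

```python
# Hypothetical helper; not Causal-HalBench's API.
def spurious_hallucination_rate(factual_yes, counterfactual_yes):
    """factual_yes[i]: model correctly said the target object is present in the original image.
    counterfactual_yes[i]: model still says it is present in the counterfactual image,
    where the object was removed but its co-occurring context was kept.

    Among images the model got right, the fraction whose counterfactual answer stays
    "yes" is a crude proxy for the co-occurrence-driven (spurious) effect."""
    flips = [cf for f, cf in zip(factual_yes, counterfactual_yes) if f]
    return sum(flips) / max(len(flips), 1)
```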

ICML Conference 2025 Conference Paper

AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization

  • Junkang Wu
  • Xue Wang 0010
  • Zhengyi Yang 0007
  • Jiancan Wu
  • Jinyang Gao
  • Bolin Ding
  • Xiang Wang 0010
  • Xiangnan He 0001

Aligning large language models (LLMs) with human preferences requires balancing policy optimization with computational stability. While recent offline methods like DPO and SimPO bypass reinforcement learning’s complexity, they face critical limitations: DPO relies on static reference models that degrade with policy updates, and SimPO assumes a uniform target reward margin that ignores instance-wise preference strength. We propose AlphaDPO, an adaptive preference optimization framework that dynamically reparameterizes the reference distribution to address these issues. Our key innovation lies in an implicit reference model $\hat{\pi}_{\text{ref}} \propto U(y|x)(\pi_\theta/\pi_{\text{ref}})^\alpha$, which interpolates between policy-driven specialization and uniform exploration while enabling instance-adaptive reward margins. Theoretically, we prove AlphaDPO implicitly controls sequential KL divergence between iterative policy updates, ensuring stability even with poorly calibrated reference models. Empirically, AlphaDPO achieves state-of-the-art performance on AlpacaEval 2 (58.7% LC win rate) and Arena-Hard (35.7% win rate) across Mistral2-7B, Llama3-8B, and Gemma2-9B, demonstrating robust alignment without multi-stage training. Our work establishes adaptive reference reparameterization as a principled mechanism for preference optimization.
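
A minimal sketch of what an instance-adaptive reward margin of this kind could look like on top of a DPO/SimPO-style objective, assuming sequence-level log-probabilities have already been gathered. The exact reparameterization in the paper is not reproduced here; treat the margin rule, parameter names, and defaults as illustrative.

```python
import torch.nn.functional as F

def alphadpo_style_loss(pi_w, pi_l, ref_w, ref_l, beta=2.0, alpha=0.5, gamma=0.5):
    """Illustrative AlphaDPO-style loss (not the authors' implementation).

    The target margin is shifted per instance by a reference-model term scaled by
    alpha; alpha = 0 reduces to a SimPO-like uniform margin gamma.
    """
    policy_margin = pi_w - pi_l            # policy's log-prob margin for the pair
    ref_margin = ref_w - ref_l             # reference model's margin for the same pair
    target_margin = gamma + alpha * ref_margin
    return -F.logsigmoid(beta * (policy_margin - target_margin)).mean()
```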

ICML Conference 2025 Conference Paper

DAMA: Data- and Model-aware Alignment of Multi-modal LLMs

  • Jinda Lu
  • Junkang Wu
  • Jinghan Li
  • Xiaojun Jia
  • Shuo Wang 0008
  • Yifan Zhang 0004
  • Junfeng Fang
  • Xiang Wang 0010

Direct Preference Optimization (DPO) has shown effectiveness in aligning multi-modal large language models (MLLMs) with human preferences. However, existing methods exhibit an imbalanced responsiveness to data of varying hardness, tending to overfit on the easy-to-distinguish data while underfitting on the hard-to-distinguish data. In this paper, we propose Data- and Model-aware DPO (DAMA) to dynamically adjust the optimization process from two key aspects: (1) a data-aware strategy that incorporates data hardness, and (2) a model-aware strategy that integrates real-time model responses. By combining the two strategies, DAMA enables the model to effectively adapt to data with varying levels of hardness. Extensive experiments on five benchmarks demonstrate that DAMA not only significantly enhances trustworthiness but also improves effectiveness on general tasks. For instance, on the Object HalBench, our DAMA-7B reduces response-level and mentioned-level hallucination by 90.0% and 95.3%, respectively, surpassing the performance of GPT-4V.
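
One way to read the two strategies is as a per-pair loss weighting, sketched below under our own assumptions; the weighting functions and hyperparameters are illustrative and are not DAMA's released formulation.

```python
import torch
import torch.nn.functional as F

def dama_style_loss(pi_w, pi_l, ref_w, ref_l, beta=0.1, k_data=1.0, k_model=1.0):
    """Illustrative data- and model-aware DPO weighting (not DAMA's code).

    Easy pairs (large margins) are down-weighted so optimization focuses on
    hard-to-distinguish data; hardness is read both from the static reference
    margin (data-aware) and from the live policy margin (model-aware).
    """
    data_hardness = torch.sigmoid(-k_data * (ref_w - ref_l))                    # from the data
    model_hardness = torch.sigmoid(-k_model * ((pi_w - ref_w) - (pi_l - ref_l)))  # from the model
    weight = (data_hardness * model_hardness).detach()
    logits = beta * ((pi_w - ref_w) - (pi_l - ref_l))                           # standard DPO logit
    return (weight * -F.logsigmoid(logits)).mean()
```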

ICML Conference 2025 Conference Paper

Larger or Smaller Reward Margins to Select Preferences for LLM Alignment?

  • Kexin Huang
  • Junkang Wu
  • Ziqian Chen
  • Xue Wang 0010
  • Jinyang Gao
  • Bolin Ding
  • Jiancan Wu
  • Xiangnan He 0001

Preference learning is critical for aligning large language models (LLMs) with human values, with the quality of preference datasets playing a crucial role in this process. While existing metrics primarily assess data quality based on either explicit or implicit reward margins, their single-margin focus often leads to contradictory evaluations for the same data. To address this issue, we propose a new metric of alignment potential, $M_{AP}$, which integrates both margins to quantify the gap from the model’s current implicit reward margin to the target explicit reward margin, thereby estimating the model’s potential to align on the preference data. Empirical results demonstrate that training on the data selected by $M_{AP}$ consistently enhances alignment performance, surpassing existing metrics across different base models and optimization objectives. Furthermore, our method can be extended to self-play data generation frameworks, where we use this metric to identify high-quality data within the self-generated content by LLMs. Under this data generation scenario, our method surpasses current state-of-the-art methods across various training settings and demonstrates continuous improvements with increasing dataset size and training iterations.
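
A hedged sketch of such a gap-based selection score, assuming DPO-style implicit rewards from the current policy and a separate reward model supplying the explicit margin; the exact definition of $M_{AP}$ in the paper may differ.

```python
import torch

def alignment_potential(explicit_margin, pi_w, pi_l, ref_w, ref_l, beta=0.1):
    """Illustrative M_AP-style data-selection score (not the paper's exact formula).

    explicit_margin: reward-model score gap r(x, y_w) - r(x, y_l)
    implicit margin: beta-scaled DPO implicit reward gap under the current policy
    The score is the remaining gap from the implicit margin to the explicit target;
    larger values suggest more "alignment potential" left in the pair.
    """
    implicit_margin = beta * ((pi_w - ref_w) - (pi_l - ref_l))
    return explicit_margin - implicit_margin

# Usage sketch: keep the k pairs with the largest remaining potential.
# scores = alignment_potential(explicit, pi_w, pi_l, ref_w, ref_l)
# selected = scores.topk(k).indices
```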

ICML Conference 2025 Conference Paper

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment

  • Yifan Zhang 0004
  • Tao Yu
  • Haochen Tian 0001
  • Chaoyou Fu
  • Peiyan Li 0001
  • Jianshu Zeng
  • Wulin Xie
  • Yang Shi 0009

Existing efforts to align multimodal large language models (MLLMs) with human preferences have only achieved progress in narrow areas, such as hallucination reduction, but remain limited in practical applicability and generalizability. To this end, we introduce MM-RLHF, a dataset containing 120k fine-grained, human-annotated preference comparison pairs. This dataset represents a substantial advancement over existing resources, offering superior size, diversity, annotation granularity, and quality. Leveraging this dataset, we propose several key innovations to improve both the quality of reward models and the efficiency of alignment algorithms. Notably, we introduce the Critique-Based Reward Model, which generates critiques of model outputs before assigning scores, offering enhanced interpretability and more informative feedback compared to traditional scalar reward mechanisms. Additionally, we propose Dynamic Reward Scaling, a method that adjusts the loss weight of each sample according to the reward signal, thereby optimizing the use of high-quality comparison pairs. Our approach is rigorously evaluated across 10 distinct dimensions, encompassing 27 benchmarks, with results demonstrating significant and consistent improvements in model performance (Figure 1).
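
A toy sketch of the Dynamic Reward Scaling idea: pairs with a larger reward-model gap receive a larger loss weight. The scaling rule below is our assumption for illustration, not the paper's formula.

```python
import torch
import torch.nn.functional as F

def reward_scaled_dpo_loss(pi_w, pi_l, ref_w, ref_l, reward_gap, beta=0.1, k=1.0):
    """Illustrative dynamic reward scaling on top of a DPO loss (not MM-RLHF's code).

    reward_gap: reward-model score difference for each pair; pairs the reward model
    is more confident about contribute more to the gradient.
    """
    weight = (1.0 + k * torch.tanh(reward_gap)).detach()   # hypothetical scaling rule
    logits = beta * ((pi_w - ref_w) - (pi_l - ref_l))
    return (weight * -F.logsigmoid(logits)).mean()
```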

NeurIPS Conference 2025 Conference Paper

RePO: Understanding Preference Learning Through ReLU-Based Optimization

  • Junkang Wu
  • Kexin Huang
  • Xue Wang
  • Jinyang Gao
  • Bolin Ding
  • Jiancan Wu
  • Xiangnan He
  • Xiang Wang

Preference learning has become a common approach in various recent methods for aligning large language models with human values. These methods optimize the preference margin between chosen and rejected responses, subject to certain constraints for avoiding over-optimization. In this paper, we report surprising empirical findings that a simple ReLU activation can learn meaningful alignments even using none of the following: (i) sigmoid-based gradient constraints, (ii) explicit regularization terms. Our experiments show that over-optimization does exist, but a threshold parameter $\gamma$ plays an essential role in preventing it by dynamically filtering training examples. We further provide theoretical analysis demonstrating that ReLU-based Preference Optimization (RePO) corresponds to the convex envelope of the 0-1 loss, establishing its fundamental soundness. Our RePO method achieves competitive or superior results compared to established preference optimization approaches. We hope this simple baseline will motivate researchers to rethink the fundamental mechanisms behind preference optimization for language model alignment.
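
The core objective the abstract describes fits in a few lines. In this sketch the margin is a plain log-probability difference; the paper may use a length-normalized variant, so treat the details as assumptions.

```python
import torch

def repo_style_loss(pi_logps_w, pi_logps_l, gamma=1.0):
    """Minimal ReLU-based preference loss in the spirit of RePO (illustrative).

    Pairs whose margin already exceeds gamma contribute zero gradient, which is
    the dynamic filtering role the abstract attributes to the threshold.
    """
    margin = pi_logps_w - pi_logps_l
    return torch.relu(gamma - margin).mean()
```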

ICLR Conference 2025 Conference Paper

Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization

  • Junkang Wu
  • Yuexiang Xie
  • Zhengyi Yang 0007
  • Jiancan Wu
  • Jiawei Chen 0007
  • Jinyang Gao
  • Bolin Ding
  • Xiang Wang 0010

This study addresses the challenge of noise in training datasets for Direct Preference Optimization (DPO), a method for aligning Large Language Models (LLMs) with human preferences. We categorize noise into pointwise noise, which includes low-quality data points, and pairwise noise, which encompasses erroneous data pair associations that affect preference rankings. Utilizing Distributionally Robust Optimization (DRO), we enhance DPO's resilience to these types of noise. Our theoretical insights reveal that DPO inherently embeds DRO principles, conferring robustness to pointwise noise, with the regularization coefficient $\beta$ playing a critical role in its noise resistance. Extending this framework, we introduce Distributionally Robustifying DPO (Dr. DPO), which integrates pairwise robustness by optimizing against worst-case pairwise scenarios. The novel hyperparameter $\beta'$ in Dr. DPO allows for fine-tuned control over data pair reliability, providing a strategic balance between exploration and exploitation in noisy training environments. Empirical evaluations demonstrate that Dr. DPO substantially improves the quality of generated text and response accuracy in preference datasets, showcasing enhanced performance in both noisy and noise-free settings.
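
Reading the abstract literally, the pairwise-robust objective aggregates per-pair DPO losses with a $\beta'$-controlled soft weighting over worst-case pair reweightings. The aggregation form below is our interpretation, sketched for clarity rather than reproduced from the paper.

```python
import torch
import torch.nn.functional as F

def dr_dpo_style_loss(pi_w, pi_l, ref_w, ref_l, beta=0.1, beta_prime=1.0):
    """Illustrative Dr. DPO-style pairwise-robust aggregation (our reading).

    Per-pair DPO losses are combined via -beta' * log mean(exp(-loss / beta')):
    small beta' leans on low-loss (more reliable) pairs, large beta' recovers the
    plain average, giving a knob for trusting or discounting noisy pairs.
    """
    logits = beta * ((pi_w - ref_w) - (pi_l - ref_l))
    per_pair = -F.logsigmoid(logits)                       # standard DPO loss per pair
    n = torch.tensor(per_pair.numel(), dtype=per_pair.dtype)
    return -beta_prime * (torch.logsumexp(-per_pair / beta_prime, dim=0) - torch.log(n))
```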

NeurIPS Conference 2024 Conference Paper

$\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$

  • Junkang Wu
  • Yuexiang Xie
  • Zhengyi Yang
  • Jiancan Wu
  • Jinyang Gao
  • Bolin Ding
  • Xiang Wang
  • Xiangnan He

Direct Preference Optimization (DPO) has emerged as a compelling approach for training Large Language Models (LLMs) to adhere to human preferences. However, the performance of DPO is sensitive to the fine-tuning of its trade-off parameter $\beta$, as well as to the quality of the preference data. We analyze the impact of $\beta$ and data quality on DPO, uncovering that optimal $\beta$ values vary with the informativeness of pairwise data. Addressing the limitations of static $\beta$ values, we introduce a novel framework that dynamically calibrates $\beta$ at the batch level, informed by data quality considerations. Additionally, our method incorporates $\beta$-guided data filtering to safeguard against the influence of outliers. Through empirical evaluation, we demonstrate that our dynamic $\beta$ adjustment technique significantly improves DPO’s performance across a range of models and datasets, offering a more robust and adaptable training paradigm for aligning LLMs with human feedback. The code is available at https://anonymous.4open.science/r/beta-DPO-EE6C.
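
A minimal sketch of batch-level $\beta$ calibration, assuming the data informativeness of a batch is measured by its mean implicit reward discrepancy; the calibration rule is simplified and the $\beta$-guided outlier filtering step is omitted.

```python
import torch.nn.functional as F

def dynamic_beta_dpo_loss(pi_w, pi_l, ref_w, ref_l, beta0=0.1, alpha=0.5, m0=0.0):
    """Illustrative batch-level dynamic-beta DPO loss (not the released code).

    The batch's mean reward discrepancy is compared to a target m0; batches with a
    larger discrepancy (more informative pairs) get a larger effective beta.
    """
    gap = (pi_w - ref_w) - (pi_l - ref_l)            # per-pair reward discrepancies
    m_batch = gap.mean().detach()
    beta = beta0 * (1.0 + alpha * (m_batch - m0))    # batch-level calibration
    return -F.logsigmoid(beta * gap).mean()
```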

IJCAI Conference 2023 Conference Paper

Discriminative-Invariant Representation Learning for Unbiased Recommendation

  • Hang Pan
  • Jiawei Chen
  • Fuli Feng
  • Wentao Shi
  • Junkang Wu
  • Xiangnan He

Selection bias hinders recommendation models from learning unbiased user preferences. Recent works empirically reveal that pursuing invariant user and item representations across biased and unbiased data is crucial for counteracting selection bias. However, our theoretical analysis reveals that simply optimizing representation invariance is insufficient for addressing selection bias: recommendation performance is bounded by both representation invariance and discriminability. Worse still, current invariant representation learning methods in recommendation neglect or even hurt representation discriminability due to data sparsity and label shift. In this light, we propose a new Discriminative-Invariant Representation Learning framework for unbiased recommendation, which incorporates label-conditional clustering and prior-guided contrasting into conventional invariant representation learning to mitigate the impact of data sparsity and label shift, respectively. We conduct extensive experiments on three real-world datasets, validating the rationality and effectiveness of the proposed framework. Code and supplementary materials are available at https://github.com/HungPaan/DIRL.
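
As a very rough sketch of the two extra objectives named in the abstract, using our own simplified loss forms (mean alignment for invariance, within-class scatter for label-conditional clustering); the paper's actual objectives are more involved.

```python
import torch

def dirl_style_penalties(z_biased, z_unbiased, labels):
    """Illustrative invariance and label-conditional clustering penalties (assumed forms).

    z_biased / z_unbiased: representations learned from biased / unbiased data, [n, d]
    labels: interaction labels (e.g., 0/1 feedback) for the biased batch, [n]
    """
    # Invariance: align the first moments of the two representation distributions.
    invariance = (z_biased.mean(dim=0) - z_unbiased.mean(dim=0)).pow(2).sum()

    # Label-conditional clustering: pull representations toward their class centroid.
    clustering = z_biased.new_zeros(())
    for c in labels.unique():
        members = z_biased[labels == c]
        clustering = clustering + (members - members.mean(dim=0)).pow(2).sum(dim=1).mean()
    return invariance, clustering
```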

NeurIPS Conference 2023 Conference Paper

Understanding Contrastive Learning via Distributionally Robust Optimization

  • Junkang Wu
  • Jiawei Chen
  • Jiancan Wu
  • Wentao Shi
  • Xiang Wang
  • Xiangnan He

This study reveals the inherent tolerance of contrastive learning (CL) towards sampling bias, wherein negative samples may encompass similar semantics (e.g., labels). However, existing theories fall short in explaining this phenomenon. We bridge this research gap by analyzing CL through the lens of distributionally robust optimization (DRO), yielding several key insights: (1) CL essentially conducts DRO over the negative sampling distribution, thus enabling robust performance across a variety of potential distributions and demonstrating robustness to sampling bias; (2) the design of the temperature $\tau$ is not merely heuristic but acts as a Lagrange coefficient, regulating the size of the potential distribution set; (3) a theoretical connection is established between DRO and mutual information, thus presenting fresh evidence for "InfoNCE as an estimate of MI" and a new estimation approach for $\phi$-divergence-based generalized mutual information. We also identify CL's potential shortcomings, including over-conservatism and sensitivity to outliers, and introduce a novel Adjusted InfoNCE loss (ADNCE) to mitigate these issues. It refines the potential distribution, improving performance and accelerating convergence. Extensive experiments on various domains (image, sentence, and graph) validate the effectiveness of the proposal.
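
The DRO reading is easiest to see in the InfoNCE loss itself: dividing similarities by $\tau$ before the log-sum-exp is a soft-max aggregation over the negative distribution, with $\tau$ playing the role of the Lagrange coefficient. Below is a standard single-anchor InfoNCE implementation annotated with that view; ADNCE's reweighting of the potential distribution is not reproduced here.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, tau=0.5):
    """Plain InfoNCE for one anchor, annotated with the DRO interpretation.

    anchor, positive: [d]; negatives: [n, d].
    The logsumexp over temperature-scaled similarities is a soft worst-case
    (DRO) aggregation over negatives; smaller tau allows a more adversarial
    reweighting of the negative distribution.
    """
    pos = F.cosine_similarity(anchor, positive, dim=0) / tau
    negs = F.cosine_similarity(anchor.unsqueeze(0), negatives, dim=1) / tau
    return -pos + torch.logsumexp(torch.cat([pos.unsqueeze(0), negs]), dim=0)
```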