Arrow Research search

Author name cluster

Jingyao Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
2 author rows

Possible papers

6

AAAI Conference 2026 Conference Paper

Exploring Transferability of Self-Supervised Learning by Task Conflict Calibration

  • Huijie Guo
  • Jingyao Wang
  • Peizheng Guo
  • Xingchen Shen
  • Changwen Zheng
  • Wenwen Qiang

In this paper, we explore the transferability of SSL by addressing two central questions: (i) what is the representation transferability of SSL, and (ii) how can we effectively model this transferability? Transferability is defined as the ability of a representation learned from one task to support the objective of another. Inspired by the meta-learning paradigm, we construct multiple SSL tasks within each training batch to support explicitly modeling transferability. Based on empirical evidence and causal analysis, we find that although introducing task-level information improves transferability, it is still hindered by task conflict. To address this issue, we propose a Task Conflict Calibration method to alleviate the impact of task conflict. Specifically, it first splits batches to create multiple SSL tasks, infusing task-level information. Next, it uses a factor extraction network to produce causal generative factors for all tasks and a weight extraction network to assign dedicated weights to each sample, employing data reconstruction, orthogonality, and sparsity to ensure effectiveness. Finally, the method calibrates sample representations during SSL training and integrates into the pipeline via a two-stage bi-level optimization framework to boost the transferability of learned representations. Experimental results on multiple downstream tasks demonstrate that our method consistently improves the transferability of SSL models.

AAAI Conference 2026 Conference Paper

Group Causal Policy Optimization for Post-Training Large Language Models

  • Ziyin Gu
  • Jingyao Wang
  • Ran Zuo
  • Chuxiong Sun
  • Zeen Song
  • Changwen Zheng
  • Wenwen Qiang

Recent advances in large language models (LLMs) have broadened their applicability across diverse tasks, yet specialized domains still require targeted post-training. Among existing methods, Group Relative Policy Optimization (GRPO) stands out for its efficiency, leveraging groupwise relative rewards while avoiding costly value function learning. However, GRPO treats candidate responses as independent, overlooking semantic interactions such as complementarity and contradiction. To address this challenge, we first introduce a Structural Causal Model (SCM) that reveals hidden dependencies among candidate responses induced by conditioning on a final integrated output, forming a collider structure. Then, our causal analysis leads to two insights: (1) projecting responses onto a causally-informed subspace improves prediction quality, and (2) this projection yields a better baseline than query-only conditioning. Building on these insights, we propose Group Causal Policy Optimization (GCPO), which integrates causal structure into optimization through two key components: a causally-informed reward adjustment and a novel KL-regularization term that aligns the policy with a causally-projected reference distribution. Comprehensive experimental evaluations on various benchmarks demonstrate that GCPO consistently surpasses existing methods.

NeurIPS Conference 2025 Conference Paper

Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs

  • Jingyao Wang
  • Wenwen Qiang
  • Zeen Song
  • Changwen Zheng
  • Hui Xiong

Large language models (LLMs) excel at complex tasks thanks to advances in their reasoning abilities. However, existing methods overlook the trade-off between reasoning effectiveness and efficiency, often encouraging unnecessarily long reasoning chains and wasting tokens. To address this, we propose Learning to Think (L2T), an information-theoretic reinforcement fine-tuning framework for LLMs to make the models achieve optimal reasoning with fewer tokens. Specifically, L2T treats each query-response interaction as a hierarchical session of multiple episodes and proposes a universal dense process reward, i. e. , quantifies the episode-wise information gain in parameters, requiring no extra annotations or task-specific evaluators. We propose a method to quickly estimate this reward based on PAC-Bayes bounds and the Fisher information matrix. Theoretical analyses show that it significantly reduces computational complexity with high estimation accuracy. By immediately rewarding each episode's contribution and penalizing excessive updates, L2T optimizes the model via reinforcement learning to maximize the use of each episode and achieve effective updates. Empirical results on various reasoning benchmarks and base models demonstrate the advantage of L2T across different tasks, boosting both reasoning effectiveness and efficiency.

ICML Conference 2025 Conference Paper

On the Out-of-Distribution Generalization of Self-Supervised Learning

  • Wenwen Qiang
  • Jingyao Wang
  • Zeen Song
  • Jiangmeng Li
  • Changwen Zheng

In this paper, we focus on the out-of-distribution (OOD) generalization of self-supervised learning (SSL). By analyzing the mini-batch construction during the SSL training phase, we first give one plausible explanation for SSL having OOD generalization. Then, from the perspective of data generation and causal inference, we analyze and conclude that SSL learns spurious correlations during the training process, which leads to a reduction in OOD generalization. To address this issue, we propose a post-intervention distribution (PID) grounded in the Structural Causal Model. PID offers a scenario where the spurious variable and label variable is mutually independent. Besides, we demonstrate that if each mini-batch during SSL training satisfies PID, the resulting SSL model can achieve optimal worst-case OOD performance. This motivates us to develop a batch sampling strategy that enforces PID constraints through the learning of a latent variable model. Through theoretical analysis, we demonstrate the identifiability of the latent variable model and validate the effectiveness of the proposed sampling strategy. Experiments conducted on various downstream OOD tasks demonstrate the effectiveness of the proposed sampling strategy.

ICML Conference 2025 Conference Paper

Towards the Causal Complete Cause of Multi-Modal Representation Learning

  • Jingyao Wang
  • Siyu Zhao
  • Wenwen Qiang
  • Jiangmeng Li
  • Changwen Zheng
  • Fuchun Sun 0001
  • Hui Xiong 0001

Multi-Modal Learning (MML) aims to learn effective representations across modalities for accurate predictions. Existing methods typically focus on modality consistency and specificity to learn effective representations. However, from a causal perspective, they may lead to representations that contain insufficient and unnecessary information. To address this, we propose that effective MML representations should be causally sufficient and necessary. Considering practical issues like spurious correlations and modality conflicts, we relax the exogeneity and monotonicity assumptions prevalent in prior works and explore the concepts specific to MML, i. e. , Causal Complete Cause ($C^3$). We begin by defining $C^3$, which quantifies the probability of representations being causally sufficient and necessary. We then discuss the identifiability of $C^3$ and introduce an instrumental variable to support identifying $C^3$ with non-exogeneity and non-monotonicity. Building on this, we conduct the $C^3$ measurement, i. e. , $C^3$ risk. We propose a twin network to estimate it through (i) the real-world branch: utilizing the instrumental variable for sufficiency, and (ii) the hypothetical-world branch: applying gradient-based counterfactual modeling for necessity. Theoretical analyses confirm its reliability. Based on these results, we propose $C^3$ Regularization, a plug-and-play method that enforces the causal completeness of the learned representations by minimizing $C^3$ risk. Extensive experiments demonstrate its effectiveness.

IJCAI Conference 2024 Conference Paper

Hacking Task Confounder in Meta-Learning

  • Jingyao Wang
  • Yi Ren
  • Zeen Song
  • Jianqi Zhang
  • Changwen Zheng
  • Wenwen Qiang

Meta-learning enables rapid generalization to new tasks by learning knowledge from various tasks. It is intuitively assumed that as the training progresses, a model will acquire richer knowledge, leading to better generalization performance. However, our experiments reveal an unexpected result: there is negative knowledge transfer between tasks, affecting generalization performance. To explain this phenomenon, we conduct Structural Causal Models (SCMs) for causal analysis. Our investigation uncovers the presence of spurious correlations between task-specific causal factors and labels in meta-learning. Furthermore, the confounding factors differ across different batches. We refer to these confounding factors as ``Task Confounders". Based on these findings, we propose a plug-and-play Meta-learning Causal Representation Learner (MetaCRL) to eliminate task confounders. It encodes decoupled generating factors from multiple tasks and utilizes an invariant-based bi-level optimization mechanism to ensure their causality for meta-learning. Extensive experiments on various benchmark datasets demonstrate that our work achieves state-of-the-art (SOTA) performance. The code is provided in https: //github. com/WangJingyao07/MetaCRL.