Arrow Research search

Author name cluster

Kun Kuang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

30 papers
1 author row

Possible papers

30

AAAI Conference 2026 Conference Paper

Detecting Unobserved Confounders: A Kernelized Regression Approach

  • Yikai Chen
  • Yunxin Mao
  • Chunyuan Zheng
  • Hao Zou
  • Shanzhi Gu
  • Shixuan Liu
  • Yang Shi
  • Wenjing Yang

Detecting unobserved confounders is crucial for reliable causal inference in observational studies. Existing methods require either linearity assumptions or multiple heterogeneous environments, limiting applicability to nonlinear single-environment settings. To bridge this gap, we propose Kernel Regression Confounder Detection (KRCD), a novel method for detecting unobserved confounding in nonlinear observational data under single-environment conditions. KRCD leverages reproducing kernel Hilbert spaces to model complex dependencies. By comparing standard and higher-order kernel regressions, we derive a test statistic whose significant deviation from zero indicates unobserved confounding. Theoretically, we prove two key results: First, in infinite samples, regression coefficients coincide if and only if no unobserved confounders exist. Second, finite-sample differences converge to zero-mean Gaussian distributions with tractable variance. Extensive experiments on synthetic benchmarks and the Twins dataset demonstrate that KRCD not only outperforms existing baselines but also achieves superior computational efficiency.
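The abstract's core idea, that regressions built from different-order transformations of the data agree exactly when no unobserved confounder is present, can be illustrated with a deliberately simplified moment-based sketch. This is not the paper's RKHS construction or its formal test; the exponential confounder and all variable names are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, b = 200_000, 1.5

def moment_gap(x, y):
    # Compare the ordinary regression ratio E[xy]/E[x^2] with a
    # higher-order analogue E[x^3 y]/E[x^4].  If the regression
    # residual is independent of x, both ratios estimate the same
    # coefficient; with non-Gaussian data, an unobserved confounder
    # generically drives them apart.
    r1 = np.mean(x * y) / np.mean(x ** 2)
    r2 = np.mean(x ** 3 * y) / np.mean(x ** 4)
    return r2 - r1

u = rng.exponential(1.0, n) - 1.0   # skewed hidden confounder
nx = rng.normal(size=n)

# Confounded case: u affects both x and y.
x_c = u + nx
y_c = b * x_c + u + rng.normal(size=n)

# Unconfounded case: same x distribution, independent noise on y.
x_u = u + nx
y_u = b * x_u + rng.normal(size=n)

gap_conf = moment_gap(x_c, y_c)    # clearly nonzero
gap_clean = moment_gap(x_u, y_u)   # close to zero
```

The gap being statistically distinguishable from zero signals confounding, mirroring (in a toy way) the role of KRCD's test statistic.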

AAAI Conference 2026 Conference Paper

P2S: Probabilistic Process Supervision for General-Domain Reasoning Question Answering

  • Wenlin Zhong
  • Chengyuan Liu
  • Yiquan Wu
  • Bovin Tan
  • Changlong Sun
  • Yi Wang
  • Xiaozhong Liu
  • Kun Kuang

While reinforcement learning with verifiable rewards (RLVR) has advanced LLM reasoning in structured domains like mathematics and programming, its application to general-domain reasoning tasks remains challenging due to the absence of verifiable reward signals. To this end, methods like Reinforcement Learning with Reference Probability Reward (RLPR) have emerged, leveraging the probability of generating the final answer as a reward signal. However, these outcome-focused approaches neglect crucial step-by-step supervision of the reasoning process itself. To address this gap, we introduce Probabilistic Process Supervision (P2S), a novel self-supervision framework that provides fine-grained process rewards without requiring a separate reward model or human-annotated reasoning steps. During reinforcement learning, P2S synthesizes and filters a high-quality reference reasoning chain (gold-CoT). The core of our method is to calculate a Path Faithfulness Reward (PFR) for each reasoning step, which is derived from the conditional probability of generating the gold-CoT's suffix, given the model's current reasoning prefix. Crucially, this PFR can be flexibly integrated with any outcome-based reward, directly tackling the reward sparsity problem by providing dense guidance. Extensive experiments on reading comprehension and medical Question Answering benchmarks show that P2S significantly outperforms strong baselines.
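As a rough sketch of the Path Faithfulness Reward idea, one can score a reasoning prefix by the length-normalized conditional probability the model assigns to the gold-CoT suffix. The lookup table below stands in for the policy model's next-token log-probabilities; all tokens and numbers are invented for illustration, not taken from the paper:

```python
import math

# Stand-in for a policy model's next-token log-probabilities.
# A real PFR would query the LLM; this table is purely illustrative.
LOGPROB = {
    ("q", "s1"): math.log(0.9),
    ("s1", "s2"): math.log(0.8),
    ("s2", "ans"): math.log(0.7),
    ("bad", "s2"): math.log(0.05),
}

def path_faithfulness_reward(prefix, gold_suffix):
    """Geometric-mean conditional probability of generating the
    gold-CoT suffix given the model's current reasoning prefix."""
    ctx, total = prefix[-1], 0.0
    for tok in gold_suffix:
        total += LOGPROB.get((ctx, tok), math.log(1e-6))
        ctx = tok
    return math.exp(total / len(gold_suffix))

faithful = path_faithfulness_reward(["q", "s1"], ["s2", "ans"])
drifted = path_faithfulness_reward(["q", "bad"], ["s2", "ans"])
```

A prefix that stays on the gold path scores higher than one that has drifted, giving a dense per-step signal that can be mixed with any outcome-based reward.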

AAAI Conference 2026 Conference Paper

Think Then Rewrite: Reasoning Enhanced Query Rewriting for Domain Specific Retrieval

  • Ang Li
  • Yufei Shi
  • Yuxuan Si
  • Yiquan Wu
  • Ming Cai
  • Xu Tan
  • Yi Wang
  • Changlong Sun

Query rewriting is a crucial task for improving retrieval, especially in professional domains such as law and medicine, where user queries are often underspecified and ambiguous. While large language models (LLMs) offer strong understanding and generation capabilities, existing LLM-based approaches reduce the task to text transformation or expansion, neglecting the reasoning needed to disambiguate queries, and thus fail to bridge the cognitive gap between user queries and specialized documents. In this paper, we propose Think-Then-Rewrite (TTR), a reinforcement-learning-based framework that unleashes LLMs' reasoning ability for domain-specific query rewriting. TTR introduces a contrastive mutual information reward to encourage the LLM to generate reasoning processes that effectively distinguish confusing distractors. To boost early-stage training, TTR also constructs golden query rewrites as off-policy data, providing strong guidance for RL learning. A mixed-policy optimization then combines on-policy and off-policy signals, ensuring both effectiveness and stability. Extensive experiments on legal and medical retrieval benchmarks demonstrate that TTR achieves state-of-the-art performance.

AAAI Conference 2025 Conference Paper

FedCFA: Alleviating Simpson’s Paradox in Model Aggregation with Counterfactual Federated Learning

  • Zhonghua Jiang
  • Jimin Xu
  • Shengyu Zhang
  • Tao Shen
  • Jiwei Li
  • Kun Kuang
  • Haibin Cai
  • Fei Wu

Federated learning (FL) is a promising technology for data privacy and distributed optimization, but it suffers from data imbalance and heterogeneity among clients. Existing FL methods try to solve these problems by aligning the client model with the server model or by correcting the client model with control variables. These methods excel on IID and general non-IID data but perform mediocrely in Simpson's Paradox scenarios. Simpson's Paradox refers to the phenomenon that a trend observed on the global dataset disappears or reverses on a subset, which may mean that the global model obtained through aggregation in FL does not accurately reflect the distribution of the global data. Thus, we propose FedCFA, a novel FL framework employing counterfactual learning to generate counterfactual samples by replacing critical factors of local data with global average data, aligning local data distributions with the global distribution and mitigating Simpson's Paradox effects. In addition, to improve the quality of the counterfactual samples, we introduce a factor decorrelation (FDC) loss to reduce the correlation among features and thus improve the independence of extracted factors. We conduct extensive experiments on six datasets and verify that our method outperforms other FL methods in terms of efficiency and global model accuracy under limited communication rounds.
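The abstract describes the FDC loss only as reducing correlation among extracted factors. One common way to write such a penalty, an assumption here rather than the paper's exact form, is the mean squared off-diagonal entry of the factors' correlation matrix:

```python
import numpy as np

def fdc_loss(z):
    """Decorrelation penalty: mean squared off-diagonal entry of the
    correlation matrix of a factor matrix z with shape (n, d)."""
    zc = z - z.mean(axis=0)
    cov = zc.T @ zc / (len(z) - 1)
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    d = len(corr)
    off = corr - np.eye(d)
    return float((off ** 2).sum() / (d * (d - 1)))

rng = np.random.default_rng(0)
a = rng.normal(size=(1000, 1))
# Two nearly duplicated factors plus one independent factor:
z_corr = np.hstack([a, a + 0.1 * rng.normal(size=(1000, 1)),
                    rng.normal(size=(1000, 1))])
z_ind = rng.normal(size=(1000, 3))   # independent factors
```

Minimizing such a term pushes the extracted factors toward pairwise decorrelation, which is the stated goal of FDC.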

AAAI Conference 2025 Conference Paper

Learning Causal Transition Matrix for Instance-dependent Label Noise

  • Jiahui Li
  • Tai-Wei Chang
  • Kun Kuang
  • Ximing Li
  • Long Chen
  • Jun Zhou

Noisy labels are both inevitable and problematic in machine learning methods, as they negatively impact models' generalization ability by causing overfitting. In the context of learning with noise, the transition matrix plays a crucial role in the design of statistically consistent algorithms. However, the transition matrix is often considered unidentifiable. One strand of methods typically addresses this problem by assuming that the transition matrix is instance-independent; that is, the probability of mislabeling a particular instance is not influenced by its characteristics or attributes. This assumption is clearly invalid in complex real-world scenarios. To better understand the transition relationship and relax this assumption, we propose to study the data generation process of noisy labels from a causal perspective. We discover that an unobservable latent variable can affect either the instance itself, the label annotation procedure, or both, which complicates the identification of the transition matrix. To address various scenarios, we have unified these observations within a new causal graph. In this graph, the input instance is divided into a noise-resistant component and a noise-sensitive component based on whether they are affected by the latent variable. These two components contribute to identifying the “causal transition matrix”, which approximates the true transition matrix with theoretical guarantee. In line with this, we have designed a novel training framework that explicitly models this causal relationship and, as a result, achieves a more accurate model for inferring the clean label.

AAAI Conference 2025 Conference Paper

MergeNet: Knowledge Migration Across Heterogeneous Models, Tasks, and Modalities

  • Kunxi Li
  • Tianyu Zhan
  • Kairui Fu
  • Shengyu Zhang
  • Kun Kuang
  • Jiwei Li
  • Zhou Zhao
  • Fan Wu

In this study, we focus on heterogeneous knowledge transfer across entirely different model architectures, tasks, and modalities. Existing knowledge transfer methods (e.g., backbone sharing, knowledge distillation) often hinge on shared elements within model structures or task-specific features/labels, limiting transfers to complex model types or tasks. To overcome these challenges, we present MergeNet, which learns to bridge the gap of parameter spaces of heterogeneous models, facilitating the direct interaction, extraction, and application of knowledge within these parameter spaces. The core mechanism of MergeNet lies in the parameter adapter, which operates by querying the source model's low-rank parameters and adeptly learning to identify and map parameters into the target model. MergeNet is learned alongside both models, allowing our framework to dynamically transfer and adapt knowledge relevant to the current stage, including the training trajectory knowledge of the source model. Extensive experiments on heterogeneous knowledge transfer demonstrate significant improvements in challenging settings, where representative approaches may falter or prove less applicable.

NeurIPS Conference 2025 Conference Paper

Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging

  • Jinluan Yang
  • Dingnan Jin
  • Anke Tang
  • Li Shen
  • Didi Zhu
  • Zhengyu Chen
  • Ziyu Zhao
  • Daixin Wang

Achieving balanced alignment of large language models (LLMs) in terms of Helpfulness, Honesty, and Harmlessness (3H optimization) constitutes a cornerstone of responsible AI. Existing methods like data mixture strategies face limitations, including heavy reliance on expert knowledge and conflicting optimization signals. While model merging offers parameter-level conflict-resolution strategies by integrating specialized models' parameters, its potential for 3H optimization remains underexplored. This paper systematically compares the effectiveness of model merging and data mixture methods in constructing 3H-aligned LLMs for the first time, revealing previously overlooked collaborative and conflicting relationships among the 3H dimensions, and discusses the advantages and drawbacks of data mixture (data-level) and model merging (parameter-level) methods in mitigating the conflicts for balanced 3H optimization. Specifically, we propose a novel Reweighting Enhanced task Singular Merging method (RESM), which uses outlier weighting and sparsity-aware rank selection strategies to address the challenges of preference-noise accumulation and layer-sparsity adaptation inherent in 3H-aligned LLM merging. Extensive evaluations verify the effectiveness and robustness of RESM over previous data mixture (2%-5% gain) and model merging (1%-3% gain) methods in achieving balanced LLM alignment.

NeurIPS Conference 2025 Conference Paper

MS-Bench: Evaluating LMMs in Ancient Manuscript Study through a Dunhuang Case Study

  • Yuqing Zhang
  • Yue Han
  • Shuanghe Zhu
  • Haoxiang Wu
  • Hangqi Li
  • Shengyu Zhang
  • Junchi Yan
  • Zemin Liu

Analyzing ancient manuscripts has traditionally been a labor-intensive and time-consuming task for philologists. While recent advancements in LMMs have demonstrated their potential across diverse domains, their effectiveness in manuscript study remains underexplored. In this paper, we introduce MS-Bench, the first comprehensive benchmark co-developed with archaeologists, comprising 5,076 high-resolution images from the 4th to the 14th century and 9,982 expert-curated questions across nine sub-tasks aligned with archaeological workflows. Through four prompting strategies, we systematically evaluate 32 LMMs on their effectiveness, robustness, and cultural contextualization. Our analysis reveals scale-driven improvements in performance and reliability, the impact of prompting strategies (CoT has a two-sided effect, while visual retrieval-augmented prompts provide a consistent boost), and task-specific preferences depending on an LMM's visual capabilities. Although current LMMs are not yet capable of replacing domain expertise, they demonstrate promising potential to accelerate manuscript research through future human-AI collaboration.

AAAI Conference 2025 Conference Paper

Optimize Incompatible Parameters Through Compatibility-aware Knowledge Integration

  • Zheqi Lv
  • Keming Ye
  • Zishu Wei
  • Qi Tian
  • Shengyu Zhang
  • Wenqiao Zhang
  • Wenjie Wang
  • Kun Kuang

Deep neural networks have become foundational to advancements in multiple domains, including recommendation systems and natural language processing. Despite their successes, these models often contain incompatible parameters that can be underutilized or detrimental to model performance, particularly when faced with specific, varying data distributions. Existing research excels at removing such parameters or at merging the outputs of multiple different pretrained models. However, the former focuses on efficiency rather than performance, while the latter requires several times more computing and storage resources to support inference. In this paper, we set the goal of explicitly improving these incompatible parameters by leveraging the complementary strengths of different models, thereby directly enhancing the models without any additional parameters. Specifically, we propose Compatibility-aware Knowledge Integration (CKI), which consists of two components: Parameter Compatibility Assessment, which evaluates the knowledge content of multiple models, and Parameter Splicing, which integrates that knowledge into one model. The integrated model can be used directly for inference or for further fine-tuning. Extensive experiments on various recommendation and language datasets show that CKI can effectively optimize incompatible parameters under multiple tasks and settings to break through the training limit of the original model without increasing the inference cost.
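Parameter Splicing can be pictured as an element-wise selection between two models' weights, driven by whatever compatibility scores the assessment step produces. The sketch below is a minimal stand-in under that reading; in the paper the assessment is part of the method, whereas here the scores are simply given as inputs:

```python
import numpy as np

def splice_parameters(theta_a, theta_b, score_a, score_b):
    """Keep, element-wise, the parameter from whichever model the
    compatibility assessment favors; scores are supplied externally."""
    return np.where(score_a >= score_b, theta_a, theta_b)

theta_a = np.array([0.5, -1.2, 3.0])
theta_b = np.array([0.4, 0.8, 2.9])
score_a = np.array([0.9, 0.1, 0.6])   # hypothetical compatibility scores
score_b = np.array([0.2, 0.7, 0.5])
spliced = splice_parameters(theta_a, theta_b, score_a, score_b)
```

The spliced vector mixes both sources without adding any parameters, matching the abstract's claim of no extra inference cost.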

AILAW Journal 2025 Journal Article

Specialized or general AI? a comparative evaluation of LLMs’ performance in legal tasks

  • Xue Guo
  • Yuting Huang
  • Bin Wei
  • Kun Kuang
  • Yiquan Wu
  • Leilei Gan
  • Xianshan Huang
  • Xianglin Dong

The rise of large language models (LLMs) such as ChatGPT and GPT-4, developed by OpenAI, has generated significant interest in the legal domain due to their sophisticated language processing capabilities. In particular, regions like China are vigorously developing legal-specific LLMs for legal purposes. Fine-tuned with fewer parameters on judicial documents and Chinese case datasets, these specialized LLMs are widely expected to meet practical needs in the judicial field more effectively. However, the ability of these law-specific LLMs to perform legal tasks, and their potential to outperform general LLMs, has not yet been established. To fill this research gap, we systematically evaluate a range of general and legal-specific LLMs on various legal tasks. The results show that GPT-4 maintains superior performance on most legal tasks, although legal-specific LLMs show superior performance in specific cases. This study provides insight into the factors leading to these results, hoping to enrich the discourse on the use of LLMs in the legal field.

AAAI Conference 2024 Conference Paper

CGMGM: A Cross-Gaussian Mixture Generative Model for Few-Shot Semantic Segmentation

  • Junao Shen
  • Kun Kuang
  • Jiaheng Wang
  • Xinyu Wang
  • Tian Feng
  • Wei Zhang

Few-shot semantic segmentation (FSS) aims to segment unseen objects in a query image using a few pixel-wise annotated support images, thus expanding the capabilities of semantic segmentation. The main challenge lies in extracting sufficient information from the limited support images to guide the segmentation process. Conventional methods typically address this problem by generating single or multiple prototypes from the support images and calculating their cosine similarity to the query image. However, these methods often fail to capture meaningful information for modeling the de facto joint distribution of pixel and category. Consequently, they result in incomplete segmentation of foreground objects and mis-segmentation of the complex background. To overcome this issue, we propose the Cross Gaussian Mixture Generative Model (CGMGM), a novel Gaussian Mixture Model (GMM)-based FSS method, which establishes the joint distribution of pixel and category in both the support and query images. Specifically, our method initially matches the feature representations of the query image with those of the support images to generate and refine an initial segmentation mask. It then employs GMMs to accurately model the joint distribution of foreground and background using the support masks and the initial segmentation mask. Subsequently, a parametric decoder applies Bayes' theorem to the joint distribution to obtain the posterior probability of each pixel in the query image and generates the final segmentation mask. Experimental results on the PASCAL-5i and COCO-20i datasets demonstrate CGMGM's effectiveness and superior performance compared to state-of-the-art methods.

AAAI Conference 2024 Conference Paper

Contrastive Balancing Representation Learning for Heterogeneous Dose-Response Curves Estimation

  • Minqin Zhu
  • Anpeng Wu
  • Haoxuan Li
  • Ruoxuan Xiong
  • Bo Li
  • Xiaoqing Yang
  • Xuan Qin
  • Peng Zhen

Estimating individuals' potential responses to varying treatment doses is crucial for decision-making in areas such as precision medicine and management science. Most recent studies predict counterfactual outcomes by learning a covariate representation that is independent of the treatment variable. However, such independence constraints neglect much of the covariate information that is useful for counterfactual prediction, especially when the treatment variables are continuous. To tackle this issue, in this paper we first theoretically demonstrate the importance of balancing and prognostic representations for unbiased estimation of heterogeneous dose-response curves; that is, the learned representations are constrained to satisfy conditional independence between the covariates and both the treatment variables and the potential responses. Based on this, we propose a novel Contrastive balancing Representation learning Network using a partial distance measure, called CRNet, for estimating heterogeneous dose-response curves without losing the continuity of treatments. Extensive experiments on synthetic and real-world datasets demonstrate that our proposal significantly outperforms previous methods.

AAAI Conference 2024 Conference Paper

CoreRec: A Counterfactual Correlation Inference for Next Set Recommendation

  • Kexin Li
  • Chengjiang Long
  • Shengyu Zhang
  • Xudong Tang
  • Zhichao Zhai
  • Kun Kuang
  • Jun Xiao

Next set recommendation aims to predict the items that are likely to be bought in the next purchase. Central to this endeavor is the task of capturing intra-set and cross-set correlations among items. However, modeling cross-set correlations poses specific challenges. Primarily, these correlations are often implicit, and the prevailing approach of establishing an indiscriminate link across the entire set of objects neglects factors like purchase frequency and correlations between purchased items; such hastily formed connections across sets introduce substantial noise. Additionally, the preeminence of high-frequency items in numerous sets could potentially overshadow and distort correlation modeling for low-frequency items. We therefore devote ourselves to mitigating misleading inter-set correlations. With a fresh perspective rooted in causality, we delve into the question of whether correlations between a particular item and items from other sets should be relied upon for item representation learning and set prediction. Technically, we introduce the Counterfactual Correlation Inference framework for next set recommendation, denoted as CoreRec. This framework establishes a counterfactual scenario in which the recommendation model impedes cross-set correlations to generate intervened predictions. By contrasting these intervened predictions with the original ones, we gauge the causal impact of inter-set neighbors on set prediction, essentially assessing whether they contribute to spurious correlations. During testing, we introduce a post-trained switch module that selects between set-aware item representations derived from either the original or the counterfactual scenario. To validate our approach, we extensively experiment on three real-world datasets, affirming both the effectiveness of CoreRec and the cogency of our analytical approach.

AAAI Conference 2024 Conference Paper

De-biased Attention Supervision for Text Classification with Causality

  • Yiquan Wu
  • Yifei Liu
  • Ziyu Zhao
  • Weiming Lu
  • Yating Zhang
  • Changlong Sun
  • Fei Wu
  • Kun Kuang

In text classification models, while the unsupervised attention mechanism can enhance performance, it often produces attention distributions that are puzzling to humans, such as assigning high weight to seemingly insignificant conjunctions. Recently, numerous studies have explored Attention Supervision (AS) to guide the model toward more interpretable attention distributions. However, such AS can impact classification performance, especially in specialized domains. In this paper, we address this issue from a causality perspective. Firstly, we leverage the causal graph to reveal two biases in AS: 1) bias caused by the label distribution of the dataset; and 2) bias caused by words' differing occurrence ranges, in that some words occur across labels while others occur only under a particular label. We then propose a novel De-biased Attention Supervision (DAS) method to eliminate these biases with causal techniques. Specifically, we adopt backdoor adjustment for the label-caused bias and reduce the word-caused bias by subtracting the direct causal effect of the word. Through extensive experiments on two professional text classification datasets (medicine and law), we demonstrate that our method achieves improved classification accuracy along with more coherent attention distributions.
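The backdoor adjustment invoked for the label-caused bias has a standard discrete form, P(y | do(x)) = Σ_z P(y | x, z) P(z). A toy numeric example, with a binary confounder z playing the role of the biasing variable and every probability invented for illustration:

```python
# Toy backdoor adjustment: P(y=1 | do(x)) = sum_z P(y=1 | x, z) * P(z).
# All numbers below are made up for illustration.
P_z = {0: 0.7, 1: 0.3}            # marginal of the confounder
P_y1_given_xz = {                 # P(y=1 | x, z)
    (0, 0): 0.2, (0, 1): 0.6,
    (1, 0): 0.5, (1, 1): 0.9,
}

def p_y1_do_x(x):
    # Average the conditional over the confounder's own marginal,
    # rather than its distribution given x.
    return sum(P_y1_given_xz[(x, z)] * P_z[z] for z in P_z)

effect = p_y1_do_x(1) - p_y1_do_x(0)   # interventional contrast
```

Averaging over P(z) instead of P(z | x) is what removes the confounder's influence from the estimated effect.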

AAAI Conference 2024 Conference Paper

Learning to Reweight for Generalizable Graph Neural Network

  • Zhengyu Chen
  • Teng Xiao
  • Kun Kuang
  • Zheqi Lv
  • Min Zhang
  • Jinluan Yang
  • Chengqiang Lu
  • Hongxia Yang

Graph Neural Networks (GNNs) show promising results for graph tasks. However, existing GNNs' generalization ability degrades when there are distribution shifts between testing and training graph data. The fundamental reason for the severe degeneration is that most GNNs are designed under the i.i.d. hypothesis. In such a setting, GNNs tend to exploit subtle statistical correlations in the training set for predictions, even when a correlation is spurious. In this paper, we study the generalization ability of GNNs in Out-Of-Distribution (OOD) settings. To solve this problem, we propose Learning to Reweight for Generalizable Graph Neural Networks (L2R-GNN) to enhance generalization and achieve satisfactory performance on unseen testing graphs whose distributions differ from the training graphs. We propose a novel nonlinear graph decorrelation method, which substantially improves out-of-distribution generalization and compares favorably to previous methods in restraining the over-reduced effective sample size. The variables of the graph representation are clustered based on the stability of their correlations, and the graph decorrelation method learns weights to remove correlations between variables of different clusters rather than between any two variables. Besides, we introduce an effective stochastic algorithm based on bi-level optimization for the L2R-GNN framework, which enables simultaneously learning the optimal weights and GNN parameters and avoids the over-fitting issue. Experiments show that L2R-GNN greatly outperforms baselines on various graph prediction benchmarks under distribution shifts.

AAMAS Conference 2023 Conference Paper

Adaptive Value Decomposition with Greedy Marginal Contribution Computation for Cooperative Multi-Agent Reinforcement Learning

  • Shanqi Liu
  • Yujing Hu
  • Runze Wu
  • Dong Xing
  • Yu Xiong
  • Changjie Fan
  • Kun Kuang
  • Yong Liu

Real-world cooperation often requires intensive, simultaneous coordination among agents. This task has been extensively studied within the framework of cooperative multi-agent reinforcement learning (MARL), and value decomposition methods are among the cutting-edge solutions. However, traditional methods that learn the value function as a monotonic mixing of per-agent utilities cannot solve tasks with non-monotonic returns, which hinders their application in generic scenarios. Recent methods tackle this problem from the perspective of implicit credit assignment by learning value functions with complete expressiveness or by using additional structures to improve cooperation. However, they are either difficult to learn due to large joint action spaces or insufficient to capture the complicated interactions among agents that are essential to solving tasks with non-monotonic returns. Moreover, real-world applications usually require policies to be interpretable, but interpretability is limited in implicit credit assignment methods. To address these problems, we propose a novel explicit credit assignment method for the non-monotonic problem. Our method, Adaptive Value decomposition with Greedy Marginal contribution (AVGM), is based on an adaptive value decomposition that learns the cooperative value of a group of dynamically changing agents. We first illustrate that the proposed value decomposition can consider the complicated interactions among agents and is feasible to learn in large-scale scenarios. Then, our method uses a greedy marginal contribution computed from the value decomposition as an individual credit to incentivize agents to learn the optimal cooperative policy. We further extend the module with an action encoder to guarantee linear time complexity for computing the greedy marginal contribution. Experimental results demonstrate that our method achieves significant performance improvements in several non-monotonic domains. Besides, we showcase that our model maintains a good sense of interpretability and rationality, suggesting it can be applied to scenarios with more realistic demands.

NeurIPS Conference 2023 Conference Paper

HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception

  • Junkun Yuan
  • Xinyu Zhang
  • Hao Zhou
  • Jian Wang
  • Zhongwei Qiu
  • Zhiyin Shao
  • Shaofeng Zhang
  • Sifan Long

Model pre-training is essential in human-centric perception. In this paper, we first introduce masked image modeling (MIM) as a pre-training approach for this task. Upon revisiting the MIM training strategy, we reveal that human structure priors offer significant potential. Motivated by this insight, we further incorporate an intuitive human structure prior - human parts - into pre-training. Specifically, we employ this prior to guide the mask sampling process. Image patches, corresponding to human part regions, have high priority to be masked out. This encourages the model to concentrate more on body structure information during pre-training, yielding substantial benefits across a range of human-centric perception tasks. To further capture human characteristics, we propose a structure-invariant alignment loss that enforces different masked views, guided by the human part prior, to be closely aligned for the same image. We term the entire method as HAP. HAP simply uses a plain ViT as the encoder yet establishes new state-of-the-art performance on 11 human-centric benchmarks, and an on-par result on one dataset. For example, HAP achieves 78.1% mAP on MSMT17 for person re-identification, 86.54% mA on PA-100K for pedestrian attribute recognition, 78.2% AP on MS COCO for 2D pose estimation, and 56.0 PA-MPJPE on 3DPW for 3D pose and shape estimation.

AAAI Conference 2023 Conference Paper

Learning Chemical Rules of Retrosynthesis with Pre-training

  • Yinjie Jiang
  • Ying Wei
  • Fei Wu
  • Zhengxing Huang
  • Kun Kuang
  • Zhihua Wang

Retrosynthesis aided by artificial intelligence has been a very active and burgeoning area of research, given its critical role in drug discovery as well as materials science. Three categories of solutions, i.e., template-based, template-free, and semi-template methods, constitute the mainstream solutions to this problem. In this paper, we focus on template-free methods, which are known to be less bothered by the template generalization issue and the atom-mapping challenge. Among the several remaining problems with template-free methods, the failure to conform to chemical rules is pronounced. To address this issue, we seek a pre-training solution that endows the pre-trained model with encoded chemical rules. Concretely, we enforce the atom conservation rule via a molecule reconstruction pre-training task, and the reaction rule that dictates reaction centers via a reaction-type-guided contrastive pre-training task. In our empirical evaluation, the proposed pre-training solution substantially improves single-step retrosynthesis accuracies on three downstream datasets.

AAAI Conference 2023 Conference Paper

Learning from Good Trajectories in Offline Multi-Agent Reinforcement Learning

  • Qi Tian
  • Kun Kuang
  • Furui Liu
  • Baoxiang Wang

Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets, an important step toward deploying multi-agent systems in real-world applications. However, in practice, each individual behavior policy that generates multi-agent joint trajectories usually performs at a different level; e.g., one agent may be a random policy while the other agents are medium policies. In cooperative games with a global reward, an agent learned by existing offline MARL often inherits this random policy, jeopardizing the utility of the entire team. In this paper, we investigate offline MARL with explicit consideration of the diversity of agent-wise trajectories and propose a novel framework called Shared Individual Trajectories (SIT) to address this problem. Specifically, an attention-based reward decomposition network assigns credit to each agent through a differentiable key-value memory mechanism in an offline manner. These decomposed credits are then used to reconstruct the joint offline datasets into prioritized experience replay with individual trajectories, after which agents can share their good trajectories and conservatively train their policies with a graph attention network (GAT) based critic. We evaluate our method in both discrete control (i.e., StarCraft II and the multi-agent particle environment) and continuous control (i.e., multi-agent MuJoCo). The results indicate that our method achieves significantly better results on complex and mixed offline multi-agent datasets, especially when the difference in data quality between individual trajectories is large.

AAAI Conference 2023 Conference Paper

Learning Instrumental Variable from Data Fusion for Treatment Effect Estimation

  • Anpeng Wu
  • Kun Kuang
  • Ruoxuan Xiong
  • Minqin Zhu
  • Yuxuan Liu
  • Bo Li
  • Furui Liu
  • Zhihua Wang

The advent of the big data era has brought new opportunities and challenges for estimating treatment effects under data fusion, i.e., from a mixed dataset collected from multiple sources (each source with an independent treatment assignment mechanism). Due to possibly omitted source labels and unmeasured confounders, traditional methods cannot estimate individual treatment assignment probabilities or infer treatment effects effectively. Therefore, we propose to reconstruct the source label and model it as a Group Instrumental Variable (GIV) to implement IV-based regression for treatment effect estimation. In this paper, we conceptualize this line of thought and develop a unified framework (Meta-EM) to (1) map the raw data into a representation space to construct linear mixed models for the assigned treatment variable; (2) estimate the distribution differences and model the GIV for the different treatment assignment mechanisms; and (3) adopt an alternating training strategy to iteratively optimize the representations and the joint distribution to model the GIV for IV regression. Empirical results demonstrate the advantages of our Meta-EM compared with state-of-the-art methods. The project page with the code and the supplementary materials is available at https://github.com/causal-machine-learning-lab/meta-em.

AILAW Journal 2023 Journal Article

LK-IB: a hybrid framework with legal knowledge injection for compulsory measure prediction

  • Xiang Zhou
  • Qi Liu
  • Yiquan Wu
  • Qiangchao Chen
  • Kun Kuang

The interpretability of AI is just as important as its performance. In the LegalAI field, there have been efforts to enhance the interpretability of models, but a trade-off between interpretability and prediction accuracy remains inevitable. In this paper, we introduce a novel framework called LK-IB for compulsory measure prediction (CMP), one of the critical tasks in LegalAI. LK-IB leverages legal knowledge and combines an interpretable model and a black-box model to balance interpretability and prediction performance. Specifically, LK-IB involves three steps: (1) inputting cases into the first module, where first-order logic (FOL) rules are used to make predictions and output them directly if possible; (2) sending cases to the second module if no FOL rule is applicable, where a case distributor categorizes them as either “simple” or “complex”; and (3) sending simple cases to an interpretable model with strong interpretability and complex cases to a black-box model with outstanding performance. Experimental results demonstrate that the LK-IB framework provides more interpretable and accurate predictions than other state-of-the-art models. Given that the majority of cases in LegalAI are simple, the idea of model combination has significant potential for practical applications.
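The three-step dispatch described in the abstract can be sketched as a small routing function. This is a minimal illustration with hard-coded stub rules and stub models (the rule set, the "simple/complex" heuristic, and the prediction labels below are all hypothetical placeholders; in LK-IB these components are learned or expert-curated):

```python
# Hypothetical sketch of the LK-IB three-step dispatch; all rules and
# models below are illustrative stubs, not the paper's actual components.

def fol_rules(case: str):
    # Step 1: first-order logic rules; return a prediction, or None if no rule fires.
    if "released on bail" in case:
        return "bail"
    return None

def is_simple(case: str) -> bool:
    # Step 2: case distributor labels a case "simple" or "complex" (stub heuristic).
    return len(case.split()) < 50

def interpretable_model(case: str) -> str:
    # Step 3a: interpretable model handles simple cases (stub prediction).
    return "detention"

def black_box_model(case: str) -> str:
    # Step 3b: black-box model handles complex cases (stub prediction).
    return "arrest"

def lk_ib_predict(case: str) -> str:
    pred = fol_rules(case)
    if pred is not None:
        return pred                       # rule fired: output directly
    if is_simple(case):
        return interpretable_model(case)  # simple case -> interpretable model
    return black_box_model(case)          # complex case -> black-box model
```

The design keeps the most trustworthy component first: a rule-based answer, when available, short-circuits both learned models, which is why the framework stays interpretable on the majority of (simple) cases.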

NeurIPS Conference 2023 Conference Paper

Two Heads are Better Than One: A Simple Exploration Framework for Efficient Multi-Agent Reinforcement Learning

  • Jiahui Li
  • Kun Kuang
  • Baoxiang Wang
  • Xingchen Li
  • Fei Wu
  • Jun Xiao
  • Long Chen

Exploration strategy plays an important role in reinforcement learning, especially in sparse-reward tasks. In cooperative multi-agent reinforcement learning (MARL), designing a suitable exploration strategy is much more challenging due to the large state space and the complex interactions among agents. Currently, mainstream exploration methods in MARL either focus on exploring unfamiliar states, which are large and sparse, or on measuring the interactions among agents at high computational cost. We find an interesting phenomenon: different kinds of exploration play different roles in different MARL scenarios, and choosing a suitable one is often more effective than designing an exquisite algorithm. In this paper, we propose an exploration method that incorporates CuriOsity-based and INfluence-based exploration (COIN), which is simple but effective in various situations. First, COIN measures the influence of each agent on the other agents based on mutual information theory and designs it as intrinsic rewards which are applied to each individual value function. Moreover, COIN computes curiosity-based intrinsic rewards via prediction errors, which are added to the extrinsic reward. To integrate the two kinds of intrinsic rewards, COIN utilizes a novel framework in which they complement each other and lead to sufficient and effective exploration on cooperative MARL tasks. We perform extensive experiments on different challenging benchmarks, and results across different scenarios show the superiority of our method.
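The reward-shaping structure described above, a curiosity bonus added to the shared extrinsic reward and per-agent influence bonuses applied to individual value functions, can be sketched in a few lines. The forward model, the influence estimates, and the coefficients `beta` and `alpha` below are illustrative stand-ins, not the paper's actual networks or mutual-information estimator:

```python
import numpy as np

# Toy sketch of COIN-style reward shaping; all models and coefficients
# here are hypothetical placeholders for illustration only.

def curiosity_reward(pred_next_state, true_next_state):
    # Curiosity: squared prediction error of a forward model.
    return float(np.sum((pred_next_state - true_next_state) ** 2))

def shaped_rewards(extrinsic, curiosity, influence, beta=0.1, alpha=0.1):
    # Curiosity bonus augments the shared extrinsic reward; influence
    # bonuses are kept per-agent, for the individual value functions.
    team_reward = extrinsic + beta * curiosity
    individual_bonuses = [alpha * inf for inf in influence]
    return team_reward, individual_bonuses

pred = np.array([0.5, 0.5])
true = np.array([1.0, 0.0])
cur = curiosity_reward(pred, true)  # 0.25 + 0.25 = 0.5
team, bonuses = shaped_rewards(1.0, cur, influence=[0.2, 0.0])
```

Keeping the two signals at different levels (team vs. individual) is what lets them complement each other rather than compete within a single scalar reward.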

AIIM Journal 2022 Journal Article

Attribute-aware interpretation learning for thyroid ultrasound diagnosis

  • Ming Kong
  • Qing Guo
  • Shuowen Zhou
  • Mengze Li
  • Kun Kuang
  • Zhengxing Huang
  • Fei Wu
  • Xiaohong Chen

Thyroid nodule diagnosis from ultrasound images is a critical computer-aided diagnosis task. Previous works tried to imitate the doctor's diagnostic logic by considering key attributes to improve diagnosis performance and explain the conclusion. However, their clinical feasibility remains ambiguous because they ignore the correlation between attribute features and global characteristics, and lack a clinical effectiveness evaluation of their result interpretations. Following the common logic of ultrasonic investigation, we design a novel Attribute-Aware Interpretation Learning (AAIL) model, consisting of an attribute property discovery module and an attribute-global feature fusion module. Adequate result interpretation ensures the reliability and transparency of diagnostic conclusions, including the visualization of attribute features and the relationship between attributes and the global feature. Extensive experiments on a practical dataset demonstrate the model's effectiveness, and an innovative human-computer collaborative experiment demonstrates the auxiliary diagnostic ability of the interpretations, which can benefit professional doctors.

NeurIPS Conference 2022 Conference Paper

ConfounderGAN: Protecting Image Data Privacy with Causal Confounder

  • Qi Tian
  • Kun Kuang
  • Kelu Jiang
  • Furui Liu
  • Zhihua Wang
  • Fei Wu

The success of deep learning is partly attributed to the availability of massive data downloaded freely from the Internet. However, it also means that users' private data may be collected by commercial organizations without consent and used to train their models. Therefore, it is important and necessary to develop a method or tool to prevent unauthorized data exploitation. In this paper, we propose ConfounderGAN, a generative adversarial network (GAN) that can make personal image data unlearnable to protect the data privacy of its owners. Specifically, the noise produced by the generator for each image has the confounder property: it builds spurious correlations between images and labels, so that the model cannot learn the correct mapping from images to labels from this noise-added dataset. Meanwhile, the discriminator is used to ensure that the generated noise is small and imperceptible, thereby preserving the normal utility of the encrypted image for humans. The experiments are conducted on six image classification datasets, including three natural object datasets and three medical datasets. The results demonstrate that our method not only outperforms state-of-the-art methods in standard settings, but can also be applied to fast encryption scenarios. Moreover, we present a series of transferability and stability experiments to further illustrate the effectiveness and superiority of our method.

NeurIPS Conference 2022 Conference Paper

GRASP: Navigating Retrosynthetic Planning with Goal-driven Policy

  • Yemin Yu
  • Ying Wei
  • Kun Kuang
  • Zhengxing Huang
  • Huaxiu Yao
  • Fei Wu

Retrosynthetic planning occupies a crucial position in synthetic chemistry and, accordingly, drug discovery; it aims to find synthetic pathways to a target molecule through a sequential decision-making process over a set of feasible reactions. While the majority of recent works focus on predicting feasible reactions at each step, there have been limited attempts to improve the sequential decision-making policy itself. Existing strategies rely on either expensive, high-variance value estimation by online rollout, or a fixed value-estimation neural network pre-trained on simulated pathways of limited diversity and with no negative feedback. Besides, how to return multiple candidate pathways that are not only diverse but also desirable for chemists (e.g., built from affordable building block materials) remains an open challenge. To this end, we propose a Goal-dRiven Actor-critic retroSynthetic Planning (GRASP) framework, in which we identify a policy that performs goal-driven retrosynthesis navigation toward a user-specified objective. Our experiments on the benchmark Pistachio dataset and a chemist-designed dataset demonstrate that the framework outperforms state-of-the-art approaches by up to 32.2% in search efficiency and 5.6% in quality. Remarkably, our user studies show that GRASP successfully plans pathways that accomplish a user-designated goal (e.g., specified building block materials).

AAAI Conference 2021 Conference Paper

Judgment Prediction via Injecting Legal Knowledge into Neural Networks

  • Leilei Gan
  • Kun Kuang
  • Yi Yang
  • Fei Wu

Legal Judgment Prediction (LJP) is a key problem in legal artificial intelligence, which aims to predict a law case's judgment based on a given text describing the facts of the case. Most previous works treat LJP as a text classification task and generally adopt deep neural network (DNN) based methods to solve it. However, existing DNN-based models are data-hungry, and it is hard to explain which legal knowledge a prediction is based on. Thus, injecting legal knowledge into neural networks to make the model interpretable and improve its performance remains a significant problem. In this paper, we propose to represent declarative legal knowledge as a set of first-order logic rules and to integrate these logic rules explicitly into a co-attention network-based model. The use of logic rules equips neural networks with direct logical reasoning capabilities and makes the model more interpretable. We take the private loan scenario as a case study and demonstrate the effectiveness of the proposed method through comprehensive experiments and analyses conducted on the collected dataset.

AAAI Conference 2021 Conference Paper

Stable Adversarial Learning under Distributional Shifts

  • Jiashuo Liu
  • Zheyan Shen
  • Peng Cui
  • Linjun Zhou
  • Kun Kuang
  • Bo Li
  • Yishi Lin

Machine learning algorithms based on empirical risk minimization are vulnerable under distributional shifts due to their greedy adoption of all the correlations found in the training data. Recently, robust learning methods have addressed this problem by minimizing the worst-case risk over an uncertainty set. However, they treat all covariates equally when forming the decision sets, regardless of the stability of their correlations with the target, resulting in an overwhelmingly large set and low confidence of the learner. In this paper, we propose the Stable Adversarial Learning (SAL) algorithm, which leverages heterogeneous data sources to construct a more practical uncertainty set and conduct differentiated robustness optimization, where covariates are differentiated according to the stability of their correlations with the target. We theoretically show that our method is tractable for stochastic gradient-based optimization and provide performance guarantees for our method. Empirical studies on both simulated and real datasets validate the effectiveness of our method in terms of uniformly good performance across unknown distributional shifts.

IJCAI Conference 2020 Conference Paper

Decorrelated Clustering with Data Selection Bias

  • Xiao Wang
  • Shaohua Fan
  • Kun Kuang
  • Chuan Shi
  • Jiawei Liu
  • Bai Wang

Most existing clustering algorithms are proposed without considering selection bias in the data. In many real applications, however, one cannot guarantee that the data are unbiased. Selection bias may introduce unexpected correlations between features, and ignoring those unexpected correlations will hurt the performance of clustering algorithms. Therefore, how to remove the unexpected correlations induced by selection bias is extremely important yet largely unexplored for clustering. In this paper, we propose a novel Decorrelation regularized K-Means algorithm (DCKM) for clustering with data selection bias. Specifically, the decorrelation regularizer aims to learn global sample weights which are capable of balancing the sample distribution, so as to remove unexpected correlations among features. Meanwhile, the learned weights are combined with k-means, which makes the reweighted k-means cluster on the inherent data distribution without the influence of unexpected correlations. Moreover, we derive updating rules to effectively infer the parameters in DCKM. Extensive experimental results on real-world datasets demonstrate that our DCKM algorithm achieves significant performance gains, indicating the necessity of removing unexpected feature correlations induced by selection bias when clustering.
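The reweighting idea behind DCKM, per-sample weights entering the centroid update, can be sketched with a small weighted variant of Lloyd's algorithm. Here the weights are taken as given; in DCKM they are learned jointly by the decorrelation regularizer, which this sketch does not implement:

```python
import numpy as np

# Sketch of k-means with per-sample weights in the centroid update.
# The weights w are assumed given (DCKM learns them; not shown here).

def weighted_kmeans(X, w, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Weighted centroid update: centers move toward high-weight samples.
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = np.average(X[mask], axis=0, weights=w[mask])
    return labels, centers

# Two well-separated blobs; with uniform weights this reduces to plain k-means.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels, centers = weighted_kmeans(X, np.ones(len(X)), k=2)
```

With non-uniform weights, over-represented regions of a biased sample can be down-weighted so the centroids reflect the inherent rather than the sampled distribution.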

AAAI Conference 2020 Conference Paper

Stable Prediction with Model Misspecification and Agnostic Distribution Shift

  • Kun Kuang
  • Ruoxuan Xiong
  • Peng Cui
  • Susan Athey
  • Bo Li

For many machine learning algorithms, two main assumptions are required to guarantee performance. One is that the test data are drawn from the same distribution as the training data, and the other is that the model is correctly specified. In real applications, however, we often have little prior knowledge of the test data and of the underlying true model. Under model misspecification, agnostic distribution shift between training and test data leads to inaccurate parameter estimation and instability of prediction across unknown test data. To address these problems, we propose a novel Decorrelated Weighting Regression (DWR) algorithm which jointly optimizes a variable decorrelation regularizer and a weighted regression model. The variable decorrelation regularizer estimates a weight for each sample such that variables are decorrelated on the weighted training data. These weights are then used in the weighted regression to improve the accuracy of estimating the effect of each variable, thus helping to improve the stability of prediction across unknown test data. Extensive experiments clearly demonstrate that our DWR algorithm can significantly improve the accuracy of parameter estimation and the stability of prediction under model misspecification and agnostic distribution shift.
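The weighted-regression half of DWR has a standard closed form. The sketch below assumes the per-sample weights w are already available (in DWR they come from the variable-decorrelation regularizer, which is not implemented here) and solves the weighted least-squares problem beta = (XᵀWX)⁻¹XᵀWy with W = diag(w):

```python
import numpy as np

# Closed-form weighted least squares; the decorrelation step that
# produces w in DWR is not shown, w is assumed given.

def weighted_least_squares(X, y, w):
    # Solve (X^T W X) beta = X^T W y with W = diag(w).
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

# Noiseless example: any positive weights recover the true coefficients.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
beta_true = np.array([2.0, -1.0])
y = X @ beta_true
beta = weighted_least_squares(X, y, np.array([1.0, 2.0, 3.0, 4.0]))
```

The point of the reweighting is that when variables are decorrelated on the weighted data, each coefficient estimate is less contaminated by the effects of the other variables, which is what stabilizes prediction under shift.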

AAAI Conference 2017 Conference Paper

Treatment Effect Estimation with Data-Driven Variable Decomposition

  • Kun Kuang
  • Peng Cui
  • Bo Li
  • Meng Jiang
  • Shiqiang Yang
  • Fei Wang

One fundamental problem in causal inference is treatment effect estimation in observational studies when variables are confounded. Control for confounding is generally handled via the propensity score. But this treats all observed variables as confounders and ignores the adjustment variables, which have no influence on treatment but are predictive of the outcome. Recently, it has been demonstrated that adjustment variables are effective in reducing the variance of the estimated treatment effect. However, how to automatically separate the confounders and adjustment variables in observational studies is still an open problem, especially in scenarios with high-dimensional variables, which are common in the big data era. In this paper, we propose a Data-Driven Variable Decomposition (D²VD) algorithm, which can (1) automatically separate confounders and adjustment variables with a data-driven approach, and (2) simultaneously estimate treatment effects in observational studies with high-dimensional variables. Under standard assumptions, we show experimentally that our D²VD algorithm can automatically separate the variables precisely, and estimate treatment effects more accurately and with tighter confidence intervals than state-of-the-art methods on both synthetic data and a real online advertising dataset.
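For context, the propensity-score baseline the abstract contrasts against is typically the inverse-propensity-weighting (IPW) estimator of the average treatment effect. The sketch below shows IPW with known propensities on a toy randomized sample; it is background for the baseline only, not D²VD's confounder/adjustment separation:

```python
import numpy as np

# IPW estimate of the average treatment effect (ATE), the standard
# propensity-score baseline; propensities are assumed known here.

def ipw_ate(t, y, propensity):
    # ATE = E[T*Y / e(X)] - E[(1-T)*Y / (1 - e(X))]
    treated = np.mean(t * y / propensity)
    control = np.mean((1 - t) * y / (1 - propensity))
    return treated - control

# Toy randomized sample with e(X) = 0.5 for everyone:
t = np.array([1, 1, 0, 0])
y = np.array([3.0, 5.0, 1.0, 1.0])
ate = ipw_ate(t, y, np.full(4, 0.5))  # treated mean 4, control mean 1
```

Because IPW weights every observed covariate into the propensity model, variance can be inflated by outcome-only (adjustment) variables, which is precisely the inefficiency D²VD's decomposition is designed to remove.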