Author name cluster

Baohong Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
1 author row

Possible papers

ICML Conference 2025 Conference Paper

Generalizing Causal Effects from Randomized Controlled Trials to Target Populations across Diverse Environments

  • Baohong Li
  • Yingrong Wang
  • Anpeng Wu
  • Ming Ma
  • Ruoxuan Xiong
  • Kun Kuang 0001

Generalizing causal effects from Randomized Controlled Trials (RCTs) to target populations across diverse environments is of significant practical importance, as RCTs are often costly and logistically complex to conduct. A key challenge is environmental shift, defined as changes in the distribution and availability of covariates between source and target environments. A common approach to this challenge is to identify a separating set (covariates that govern both treatment effect heterogeneity and environmental differences) and combine RCT samples with target populations matched on this set. However, this approach assumes that the separating set is fully observed and shared across datasets, an assumption often violated in practice. We propose a novel Two-Stage Doubly Robust (2SDR) method that relaxes this assumption by allowing the separating set to be observed in only one of the two datasets. 2SDR leverages shadow variables to impute missing components of the separating set and generalizes treatment effects across environments in a two-stage procedure. We establish identification of causal effects in target environments under 2SDR and demonstrate its effectiveness through extensive experiments on both synthetic and real-world datasets.
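The "common approach" that 2SDR relaxes (matching the RCT to the target population on a fully observed separating set) can be illustrated with a minimal post-stratification sketch. It assumes a single binary separating-set covariate `x` observed in both the RCT and the target sample, and uses simulated data; all function names are hypothetical, and this is the baseline transport estimator, not 2SDR itself.

```python
import random


def simulate_rct(n=20000, p_x=0.2, seed=0):
    """Simulated RCT where binary covariate x modifies the effect (true CATE = 1 + 2x)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < p_x else 0
        t = 1 if rng.random() < 0.5 else 0  # randomized treatment assignment
        y = t * (1.0 + 2.0 * x) + rng.gauss(0.0, 1.0)
        rows.append((x, t, y))
    return rows


def mean(values):
    return sum(values) / len(values)


def rct_ate(rows):
    """Difference in means over the whole RCT sample (the RCT's own population)."""
    return mean([y for _, t, y in rows if t == 1]) - mean([y for _, t, y in rows if t == 0])


def stratum_effects(rows):
    """Difference in means within each level of the separating-set covariate x."""
    effects = {}
    for level in sorted({x for x, _, _ in rows}):
        treated = [y for x, t, y in rows if x == level and t == 1]
        control = [y for x, t, y in rows if x == level and t == 0]
        effects[level] = mean(treated) - mean(control)
    return effects


def transported_ate(rows, target_x):
    """Reweight RCT stratum effects by the target distribution of x.

    Assumes every level of x present in the target also appears in the RCT (positivity);
    otherwise the dictionary lookup raises KeyError.
    """
    effects = stratum_effects(rows)
    n = len(target_x)
    return sum(effects[level] * target_x.count(level) / n for level in set(target_x))


rct = simulate_rct()
rng_target = random.Random(1)
target_x = [1 if rng_target.random() < 0.8 else 0 for _ in range(20000)]

print(round(rct_ate(rct), 2))                    # expected near 1 + 2 * 0.2 = 1.4
print(round(transported_ate(rct, target_x), 2))  # expected near 1 + 2 * 0.8 = 2.6
```

The transported estimate only works here because `x` is observed in both samples; the abstract's point is that this requirement often fails in practice.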

ICML Conference 2025 Conference Paper

Rethinking Causal Ranking: A Balanced Perspective on Uplift Model Evaluation

  • Minqin Zhu
  • Zexu Sun
  • Ruoxuan Xiong
  • Anpeng Wu
  • Baohong Li
  • Caizhi Tang
  • Jun Zhou 0011
  • Fei Wu 0001

Uplift modeling is crucial for identifying individuals likely to respond to a treatment in applications like marketing and customer retention, but evaluating these models is challenging because counterfactual outcomes are inaccessible in real-world settings. In this paper, we identify a fundamental limitation in existing evaluation metrics, such as the uplift and Qini curves, which fail to accurately rank individuals with binary negative outcomes. This can lead to biased evaluations in which biased models receive higher curve values than unbiased ones, resulting in suboptimal model selection. To address this, we propose the Principled Uplift Curve (PUC), a novel evaluation metric that assigns equal curve values to individuals with positive and negative binary outcomes, offering a more balanced and unbiased assessment. We then derive the Principled Uplift Loss (PUL) function from the PUC and integrate it into a new uplift model, the Principled Treatment and Outcome Network (PTONet), to reduce bias during uplift model training. Experiments on both simulated and real-world datasets demonstrate that the PUC provides less biased evaluations, while PTONet outperforms existing methods. The source code is available at: https://github.com/euzmin/PUC.
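For context on the baseline metric being critiqued, here is a minimal sketch of the conventional Qini curve: units are ranked by predicted uplift, and at each cut-off the curve reports treated responders minus control responders rescaled to the treated count. The function name and toy data are hypothetical; this is not the PUC, which is defined in the paper and its repository.

```python
def qini_curve(scores, treatment, outcome):
    """Conventional Qini curve for binary treatment and binary outcome.

    Returns one value per cut-off k = 0..n after sorting by descending score.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_t = n_c = 0
    y_t = y_c = 0.0
    curve = [0.0]
    for i in order:
        if treatment[i] == 1:
            n_t += 1
            y_t += outcome[i]
        else:
            n_c += 1
            y_c += outcome[i]
        # Before any control unit is seen, treat the control term as zero.
        control_term = y_c * n_t / n_c if n_c > 0 else 0.0
        curve.append(y_t - control_term)
    return curve


scores = [0.9, 0.8, 0.3, 0.1]
treatment = [1, 0, 1, 0]
outcome = [1, 0, 0, 1]
print(qini_curve(scores, treatment, outcome))  # [0.0, 1.0, 1.0, 1.0, 0.0]
```

In this toy run, the treated unit with a negative outcome (third in the ranking) leaves the curve unchanged, which gives a concrete feel for how negative outcomes enter the conventional curve.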

ICML Conference 2024 Conference Paper

A Generative Approach for Treatment Effect Estimation under Collider Bias: From an Out-of-Distribution Perspective

  • Baohong Li
  • Haoxuan Li 0001
  • Anpeng Wu
  • Minqin Zhu
  • Shiyuan Peng
  • Qingyu Cao
  • Kun Kuang 0001

Collider bias, which results from non-random sample selection caused by both the treatment and the outcome, poses a unique challenge to treatment effect estimation using observational data whose distribution differs from that of the target population. In this paper, we rethink collider bias from an out-of-distribution (OOD) perspective, viewing the entire data space of the target population as two different environments: the observational data selected from the target population belong to a seen environment labeled $S=1$, and the missing unselected data belong to an unseen environment labeled $S=0$. Based on this OOD formulation, we utilize small-scale representative data from the entire data space with no environment labels and propose a novel method, the Coupled Counterfactual Generative Adversarial Model (C$^2$GAM), to simultaneously generate the missing $S=0$ samples in the observational data and the missing $S$ labels in the small-scale representative data. With C$^2$GAM, collider bias can be addressed by combining the generated $S=0$ samples with the observational data to estimate treatment effects. Extensive experiments on synthetic and real-world data demonstrate that plugging C$^2$GAM into existing treatment effect estimators yields significant performance improvements.
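To see why selection driven by both treatment and outcome biases a naive estimate, here is a small simulation sketch. The selection rule (treated units are always recorded, control units only when $y > 0$) and all names are hypothetical choices for illustration; the code only demonstrates the bias on the $S=1$ data and does not implement C$^2$GAM.

```python
import random


def simulate(n=20000, effect=1.0, seed=0):
    """Full population (S=1 and S=0) with a randomized treatment and true effect `effect`."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        t = 1 if rng.random() < 0.5 else 0
        y = effect * t + rng.gauss(0.0, 1.0)
        # Hypothetical selection rule that depends on both T and Y (a collider).
        s = 1 if (t == 1 or y > 0) else 0
        rows.append((t, y, s))
    return rows


def diff_in_means(rows):
    treated = [y for t, y, _ in rows if t == 1]
    control = [y for t, y, _ in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


rows = simulate()
full_estimate = diff_in_means(rows)                              # uses S=1 and S=0
selected_estimate = diff_in_means([r for r in rows if r[2] == 1])  # S=1 only

print(round(full_estimate, 2), round(selected_estimate, 2))
```

Under this rule the selected controls have mean outcome $E[Y \mid Y > 0] \approx 0.80$ instead of $0$, so the naive estimate on $S=1$ data lands near $0.2$ while the full-population estimate stays near the true effect of $1.0$. Recovering the missing $S=0$ part is what the method above targets.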

ICML Conference 2024 Conference Paper

Learning Shadow Variable Representation for Treatment Effect Estimation under Collider Bias

  • Baohong Li
  • Haoxuan Li 0001
  • Ruoxuan Xiong
  • Anpeng Wu
  • Fei Wu 0001
  • Kun Kuang 0001

One of the significant challenges in treatment effect estimation is collider bias, a specific form of sample selection bias induced when selection is a common effect of both the treatment and the outcome. Identifying treatment effects under collider bias requires well-defined shadow variables in observational data, which are assumed to be associated with the outcome and independent of the sample selection mechanism, conditional on the other observed variables. However, finding a valid shadow variable is difficult in real-world scenarios and requires domain-specific expert knowledge. Therefore, in this paper, we propose a novel method that automatically learns shadow-variable representations from observational data without prior knowledge. To ensure that the learned representations satisfy the shadow-variable assumptions, we introduce a tester that performs hypothesis testing during representation learning: we iteratively generate representations and test whether they satisfy the shadow-variable assumptions until they pass the test. With the learned shadow-variable representations, we propose a novel treatment effect estimator to address collider bias. Experiments show that the proposed methods outperform existing treatment effect estimation methods under collider bias and demonstrate their potential practical value.
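For reference, one common way to formalize the two shadow-variable conditions described above is sketched below, assuming covariates $X$, treatment $T$, outcome $Y$, selection indicator $S$, and candidate shadow variable $Z$ (this notation is an assumption for illustration, not necessarily the paper's):

$$Z \perp\!\!\!\perp S \mid (X, T, Y), \qquad Z \not\perp\!\!\!\perp Y \mid (X, T).$$

The first condition says that $Z$ carries no extra information about selection once the outcome and the other observed variables are known; the second says that $Z$ remains informative about the outcome. These are the properties the tester is meant to check for learned representations.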

ICML Conference 2024 Conference Paper

Two-Stage Shadow Inclusion Estimation: An IV Approach for Causal Inference under Latent Confounding and Collider Bias

  • Baohong Li
  • Anpeng Wu
  • Ruoxuan Xiong
  • Kun Kuang 0001

Latent confounding bias and collider bias are two key challenges of causal inference in observational studies. Latent confounding bias occurs when unmeasured covariates that are common causes of treatments and outcomes are not controlled for, and it can be addressed using the Instrumental Variable (IV) approach. Collider bias arises from non-random sample selection caused by both treatments and outcomes, and it can be addressed using a different type of instrument, i.e., shadow variables. However, in most scenarios these two biases exist simultaneously in observational data, and previous methods that focus on only one of them are inadequate. To the best of our knowledge, no approach has been developed for causal inference when both biases exist. In this paper, we propose a novel IV approach, Two-Stage Shadow Inclusion (2SSI), which simultaneously addresses latent confounding bias and collider bias by using the residual of the treatment as a shadow variable. Extensive experimental results on benchmark synthetic datasets and a real-world dataset show that 2SSI achieves noticeable performance improvements over existing methods when both biases exist.
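As background on the IV half of this problem, here is a minimal sketch of standard two-stage least squares with one instrument on simulated data containing a latent confounder. The data-generating process and names are hypothetical. The first-stage residual of the treatment is returned only because the abstract says 2SSI uses that residual as a shadow variable; the sketch does not model collider bias and does not implement 2SSI.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, effect=2.0, seed=0):
    rng = random.Random(seed)
    z, t, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)  # instrument: affects t, not y directly
        ui = rng.gauss(0.0, 1.0)  # latent confounder of t and y
        ti = zi + ui + rng.gauss(0.0, 1.0)
        yi = effect * ti + 2.0 * ui + rng.gauss(0.0, 1.0)
        z.append(zi)
        t.append(ti)
        y.append(yi)
    return z, t, y


def two_stage_least_squares(z, t, y):
    # Stage 1: regress t on z; keep fitted values and residuals.
    b1 = cov(z, t) / cov(z, z)
    a1 = mean(t) - b1 * mean(z)
    t_hat = [a1 + b1 * zi for zi in z]
    residual = [ti - th for ti, th in zip(t, t_hat)]
    # Stage 2: regress y on the fitted treatment.
    effect = cov(t_hat, y) / cov(t_hat, t_hat)
    return effect, residual


z, t, y = simulate()
iv_estimate, first_stage_residual = two_stage_least_squares(z, t, y)
ols_estimate = cov(t, y) / cov(t, t)  # naive regression, biased by the confounder

print(round(ols_estimate, 2), round(iv_estimate, 2))
```

With this design the naive slope converges to $2 + 2 \cdot \tfrac{1}{3} \approx 2.67$, while the 2SLS estimate stays near the true effect of $2.0$, because the instrument is independent of the latent confounder.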