Author name cluster

Yewei Xia

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

Invariant Feature Learning for Counterfactual Watch-time Prediction in Video Recommendation

  • Chenghou Jin
  • Yixin Ren
  • Hongxu Ma
  • Yewei Xia
  • Yi Guan
  • Hao Zhang
  • Jiandong Ding
  • Jihong Guan

Video recommendation systems heavily rely on user watch time feedback, making accurate watch time prediction a crucial task. However, this task inherently suffers from bias, as recommendation models tend to favor long-duration videos to maximize watch time. This issue, known as duration bias in the watch-time prediction context, can be explained from a causal perspective, where video duration acts as a confounder. Recent works address this bias using backdoor adjustment, isolating the direct effect of content on watch time from observational data. These methods typically discretize video duration into groups, estimate group-wise effects, and then aggregate them via a unified prediction model. However, this aggregation strategy is prone to model misspecification due to feature distribution shift across groups. In this paper, we reinterpret the problem through the lens of invariant learning and propose a novel framework: Duration-Invariant Feature Learning (DIFL). DIFL employs a kernel-based regularization that enforces representation invariance across duration groups, reducing sensitivity to group design and improving generalization. This enables more accurate modeling of the direct causal effect and supports counterfactual inference. Extensive experiments on both public and real large-scale production datasets demonstrate the effectiveness of our approach, which achieves state-of-the-art (SOTA) performance.
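
As an illustration of what a kernel-based invariance regularizer across duration groups might look like, here is a minimal sketch. It assumes the penalty is the average pairwise squared maximum mean discrepancy (MMD) with a Gaussian kernel; the abstract does not specify the kernel statistic, so this choice, the function names, and the bandwidth parameter are all hypothetical and not the DIFL implementation.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian RBF kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd2(a, b, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return (
        rbf_kernel(a, a, bandwidth).mean()
        + rbf_kernel(b, b, bandwidth).mean()
        - 2.0 * rbf_kernel(a, b, bandwidth).mean()
    )

def invariance_penalty(features, group_ids, bandwidth=1.0):
    """Average pairwise MMD^2 between the feature sets of the duration groups.

    Assumption: MMD is used as the invariance measure; this is illustrative only.
    """
    groups = [features[group_ids == g] for g in np.unique(group_ids)]
    pairs = [(i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))]
    if not pairs:  # a single group has nothing to align against
        return 0.0
    return float(np.mean([mmd2(groups[i], groups[j], bandwidth) for i, j in pairs]))
```

In a training loop, a penalty of this kind would typically be added to the prediction loss with a weighting coefficient, so that representations whose distribution differs across duration groups are discouraged.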

IJCAI Conference 2025 Conference Paper

Efficient Constraint-based Window Causal Graph Discovery in Time Series with Multiple Time Lags

  • Yewei Xia
  • Yixin Ren
  • Hong Cheng
  • Hao Zhang
  • Jihong Guan
  • Minchuan Xu
  • Shuigeng Zhou

We address the identification of direct causes in time series with multiple time lags, and propose a constraint-based window causal graph discovery method. A key advantage of our method is that the number of required conditional independence (CI) tests scales quadratically with the number of sub-series. The method first uses CI tests to find the minimum trek lag between two arbitrary sub-series, and then applies an efficient CI testing strategy to identify the direct causes between them. We show that the method is both sound and complete under some graph constraints. We compare the proposed method with typical baselines on various datasets. Experimental results show that our method outperforms all baselines in both accuracy and running speed.

ICML Conference 2025 Conference Paper

Extracting Rare Dependence Patterns via Adaptive Sample Reweighting

  • Yiqing Li
  • Yewei Xia
  • Xiaofei Wang
  • Zhengming Chen 0002
  • Liuhua Peng
  • Mingming Gong
  • Kun Zhang 0001

Discovering dependence patterns between variables from observational data is a fundamental issue in data analysis. However, existing testing methods often fail to detect subtle yet critical patterns that occur within small regions of the data distribution, patterns we term rare dependence. These rare dependencies obscure the true underlying dependence structure in variables, particularly in causal discovery tasks. To address this issue, we propose a novel testing method that combines kernel-based (conditional) independence testing with adaptive sample importance reweighting. By learning and assigning higher importance weights to data points exhibiting significant dependence, our method amplifies the patterns and can detect them successfully. Theoretically, we analyze the asymptotic distributions of the statistics in this method and establish a uniform bound for the learning scheme. Furthermore, we integrate our tests into the PC algorithm, a constraint-based approach for causal discovery, equipping it to uncover causal relationships even in the presence of rare dependence. Empirical evaluation on synthetic and real-world datasets comprehensively demonstrates the efficacy of our method.

ICML Conference 2025 Conference Paper

Identification of Latent Confounders via Investigating the Tensor Ranks of the Nonlinear Observations

  • Zhengming Chen 0002
  • Yewei Xia
  • Feng Xie 0002
  • Jie Qiao
  • Zhifeng Hao
  • Ruichu Cai
  • Kun Zhang 0001

We study the problem of learning discrete latent variable causal structures from mixed-type observational data. Traditional methods, such as those based on the tensor rank condition, are designed to identify discrete latent structure models and provide robust identification bounds for discrete causal models. However, when observed variables—specifically, those representing the children of latent variables—are collected at various levels with continuous data types, the tensor rank condition is not applicable, limiting further causal structure learning for latent variables. In this paper, we consider a more general case where observed variables can be either continuous or discrete, and further allow for scenarios where multiple latent parents cause the same set of observed variables. We show that, under the completeness condition, it is possible to discretize the data in a way that satisfies the full-rank assumption required by the tensor rank condition. This enables the identifiability of discrete latent structure models within mixed-type observational data. Moreover, we introduce the two-sufficient measurement condition, a more general structural assumption under which the tensor rank condition holds and the underlying latent causal structure is identifiable by a proposed two-stage identification algorithm. Extensive experiments on both simulated and real-world data validate the effectiveness of our method.

IJCAI Conference 2025 Conference Paper

Identifying Causal Mechanism Shifts Under Additive Models with Arbitrary Noise

  • Yewei Xia
  • Xueliang Cui
  • Hao Zhang
  • Yixin Ren
  • Feng Xie
  • Jihong Guan
  • Ruxin Wang
  • Shuigeng Zhou

In many real-world scenarios, the goal is to identify variables whose causal mechanisms change across related datasets. Examples include detecting abnormal root nodes in manufacturing and identifying key genes that influence cancer by analyzing differences in gene regulatory mechanisms between healthy individuals and cancer patients. This can be done by recovering the causal structure for each dataset independently and then comparing them to identify differences, but the performance is often suboptimal. Typically, existing methods directly identify causal mechanism shifts based on linear additive noise models (ANMs) or by imposing restrictive assumptions on the noise distribution. In this paper, we introduce CMSI, a novel and more general algorithm based on nonlinear ANMs that identifies variables with shifting causal mechanisms under arbitrary noise distributions. Evaluated on various synthetic datasets, CMSI consistently outperforms existing baselines in terms of F1 score. Additionally, we demonstrate CMSI's applicability on gene expression datasets of ovarian cancer patients at different disease stages.

NeurIPS Conference 2024 Conference Paper

Efficiently Learning Significant Fourier Feature Pairs for Statistical Independence Testing

  • Yixin Ren
  • Yewei Xia
  • Hao Zhang
  • Jihong Guan
  • Shuigeng Zhou

We propose a novel method to efficiently learn significant Fourier feature pairs for maximizing the power of Hilbert-Schmidt Independence Criterion (HSIC) based independence tests. We first reinterpret HSIC in the frequency domain, which reveals its limited discriminative power due to the inability to adapt to specific frequency-domain features under the current inflexible configuration. To remedy this shortcoming, we introduce a module of learnable Fourier features, thereby developing a new criterion. We then derive a finite sample estimate of the test power by modeling the behavior of the criterion, thus formulating an optimization objective for learning significant Fourier feature pairs. We show that this optimization objective can be computed in linear time (with respect to the sample size n), which ensures fast independence tests. We also prove the convergence property of the optimization objective and establish the consistency of the independence tests. Extensive empirical evaluation on both synthetic and real datasets validates our method's superiority in effectiveness and efficiency, particularly in handling high-dimensional data and dealing with large-scale scenarios.
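
To make the linear-time claim concrete, the following is a minimal sketch of an HSIC-style statistic computed from Fourier features. It assumes fixed random Fourier features drawn from a standard normal distribution, which approximates a Gaussian-kernel HSIC; the paper learns the feature pairs instead, so this is a simplified stand-in and not the proposed criterion. The function names and the default of 50 features are hypothetical.

```python
import numpy as np

def fourier_features(x, freqs, phases):
    """Map samples of shape (n, d) to cosine Fourier features of shape (n, m)."""
    m = freqs.shape[1]
    return np.sqrt(2.0 / m) * np.cos(x @ freqs + phases)

def rff_hsic(x, y, n_features=50, seed=0):
    """HSIC-style statistic from random (not learned) Fourier features.

    Cost is O(n * m^2) for n samples and m features, so linear in n.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    fx = fourier_features(
        x,
        rng.normal(size=(x.shape[1], n_features)),
        rng.uniform(0.0, 2.0 * np.pi, size=n_features),
    )
    fy = fourier_features(
        y,
        rng.normal(size=(y.shape[1], n_features)),
        rng.uniform(0.0, 2.0 * np.pi, size=n_features),
    )
    # Center the features, then take the squared Frobenius norm of the
    # empirical cross-covariance between the two feature sets.
    fx = fx - fx.mean(axis=0)
    fy = fy - fy.mean(axis=0)
    cross_cov = fx.T @ fy / n
    return float(np.sum(cross_cov**2))
```

Because the statistic only needs the m-by-m cross-covariance of the features, no n-by-n kernel matrix is ever formed, which is where the linear scaling in the sample size comes from.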

AAAI Conference 2023 Conference Paper

Differentially Private Nonlinear Causal Discovery from Numerical Data

  • Hao Zhang
  • Yewei Xia
  • Yixin Ren
  • Jihong Guan
  • Shuigeng Zhou

Recently, several methods such as private ANM, EM-PC and Priv-PC have been proposed to perform differentially private causal discovery in various scenarios including bivariate, multivariate Gaussian and categorical cases. However, little work has addressed private nonlinear causal discovery from numerical data. This work tackles this problem. To this end, we propose a method to infer nonlinear causal relations from observed numerical data by using a regression-based conditional independence test (RCIT) that consists of kernel ridge regression (KRR) and the Hilbert-Schmidt independence criterion (HSIC) with permutation approximation. A sensitivity analysis for RCIT is given, and a private constraint-based causal discovery framework with a differential privacy guarantee is developed. Extensive simulations and real-world experiments for both conditional independence testing and causal discovery are conducted, which show that our method is effective in handling nonlinear numerical cases and easy to implement. The source code of our method and data are available at https://github.com/Causality-Inference/PCD.
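
The test structure described above (KRR residuals followed by an HSIC permutation test) can be sketched as follows. This is a non-private illustration only: it omits the sensitivity analysis and the noise needed for differential privacy, and the kernel bandwidth, ridge parameter, and function names are assumptions rather than the authors' code from the repository above.

```python
import numpy as np

def rbf_gram(a, bandwidth=1.0):
    """Gaussian RBF Gram matrix of the rows of a."""
    sq_norms = np.sum(a**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * a @ a.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def krr_residuals(target, cond, ridge=1e-2):
    """Residuals of a kernel ridge regression of target on cond."""
    n = target.shape[0]
    gram = rbf_gram(cond)
    fitted = gram @ np.linalg.solve(gram + ridge * n * np.eye(n), target)
    return target - fitted

def hsic(a, b):
    """Biased empirical HSIC with Gaussian kernels."""
    n = a.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(rbf_gram(a) @ centering @ rbf_gram(b) @ centering)) / n**2

def ci_test(x, y, z, n_permutations=200, seed=0):
    """Permutation p-value for 'X independent of Y given Z' via residual HSIC."""
    rng = np.random.default_rng(seed)
    rx, ry = krr_residuals(x, z), krr_residuals(y, z)
    observed = hsic(rx, ry)
    # Permuting one set of residuals breaks any remaining dependence,
    # which gives samples from the null distribution of the statistic.
    null = [hsic(rx, ry[rng.permutation(len(ry))]) for _ in range(n_permutations)]
    exceed = sum(stat >= observed for stat in null)
    return (1 + exceed) / (1 + n_permutations)
```

A small p-value suggests that X and Y remain dependent after regressing out Z. A private variant would additionally need calibrated noise based on the sensitivity of the statistic, which this sketch does not attempt.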

AAAI Conference 2023 Conference Paper

Multi-Level Wavelet Mapping Correlation for Statistical Dependence Measurement: Methodology and Performance

  • Yixin Ren
  • Hao Zhang
  • Yewei Xia
  • Jihong Guan
  • Shuigeng Zhou

We propose a new criterion for measuring dependence between two real variables, namely, Multi-level Wavelet Mapping Correlation (MWMC). MWMC can capture the nonlinear dependencies between variables by measuring their correlation under different levels of wavelet mappings. We show that the empirical estimate of MWMC converges exponentially to its population quantity. To better support independence testing with MWMC, we further design a permutation test based on MWMC and prove that our test can not only control the type I error rate (the rate of false positives) well but also ensure that the type II error rate (the rate of false negatives) is upper bounded by O(1/n) (n is the sample size) with finite permutations. Through extensive experiments on (conditional) independence tests and causal discovery, we show that our method outperforms existing independence test methods.