Arrow Research search

Author name cluster

Yingyu Lin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers
2 author rows

Possible papers

3

ICLR Conference 2025 Conference Paper

A Skewness-Based Criterion for Addressing Heteroscedastic Noise in Causal Discovery

  • Yingyu Lin
  • Yuxing Huang
  • Wenqin Liu
  • Haoran Deng
  • Ignavier Ng
  • Kun Zhang 0001
  • Mingming Gong
  • Yian Ma

Real-world data often violates the equal-variance assumption (homoscedasticity), making it essential to account for heteroscedastic noise in causal discovery. In this work, we explore heteroscedastic symmetric noise models (HSNMs), where the effect $Y$ is modeled as $Y = f(X) + \sigma(X)N$, with $X$ as the cause and $N$ as independent noise following a symmetric distribution. We introduce a novel criterion for identifying HSNMs based on the skewness of the score (i.e., the gradient of the log density) of the data distribution. This criterion establishes a computationally tractable measurement that is zero in the causal direction but nonzero in the anticausal direction, enabling the causal direction discovery. We extend this skewness-based criterion to the multivariate setting and propose \texttt{SkewScore}, an algorithm that handles heteroscedastic noise without requiring the extraction of exogenous noise. We also conduct a case study on the robustness of \texttt{SkewScore} in a bivariate model with a latent confounder, providing theoretical insights into its performance. Empirical studies further validate the effectiveness of the proposed method.

NeurIPS Conference 2025 Conference Paper

Purifying Approximate Differential Privacy with Randomized Post-processing

  • Yingyu Lin
  • Erchi Wang
  • Yian Ma
  • Yu-Xiang Wang

We propose a framework to convert $(\varepsilon, \delta)$-approximate Differential Privacy (DP) mechanisms into $(\varepsilon', 0)$-pure DP mechanisms under certain conditions, a process we call ``purification. '' This algorithmic technique leverages randomized post-processing with calibrated noise to eliminate the $\delta$ parameter while achieving near-optimal privacy-utility tradeoff for pure DP. It enables a new design strategy for pure DP algorithms: first run an approximate DP algorithm with certain conditions, and then purify. This approach allows one to leverage techniques such as strong composition and propose-test-release that require $\delta>0$ in designing pure-DP methods with $\delta=0$. We apply this framework in various settings, including Differentially Private Empirical Risk Minimization (DP-ERM), stability-based release, and query release tasks. To the best of our knowledge, this is the first work with a statistically and computationally efficient reduction from approximate DP to pure DP. Finally, we illustrate the use of this reduction for proving lower bounds under approximate DP constraints with explicit dependence in $\delta$, avoiding the sophisticated fingerprinting code construction.

ICLR Conference 2024 Conference Paper

Tractable MCMC for Private Learning with Pure and Gaussian Differential Privacy

  • Yingyu Lin
  • Yian Ma
  • Yu-Xiang Wang 0003
  • Rachel Redberg
  • Zhiqi Bu

Posterior sampling, i.e., exponential mechanism to sample from the posterior distribution, provides $\varepsilon$-pure differential privacy (DP) guarantees and does not suffer from potentially unbounded privacy breach introduced by $(\varepsilon,\delta)$-approximate DP. In practice, however, one needs to apply approximate sampling methods such as Markov chain Monte Carlo (MCMC), thus re-introducing the unappealing $\delta$-approximation error into the privacy guarantees. To bridge this gap, we propose the Approximate SAample Perturbation (abbr. ASAP) algorithm which perturbs an MCMC sample with noise proportional to its Wasserstein-infinity ($W_\infty$) distance from a reference distribution that satisfies pure DP or pure Gaussian DP (i.e., $\delta=0$). We then leverage a Metropolis-Hastings algorithm to generate the sample and prove that the algorithm converges in W$_\infty$ distance. We show that by combining our new techniques with a localization step, we obtain the first nearly linear-time algorithm that achieves the optimal rates in the DP-ERM problem with strongly convex and smooth losses.