Author name cluster

Yang Ning

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers

1 author row

JMLR Journal 2025 Journal Article

DisC2o-HD: Distributed causal inference with covariates shift for analyzing real-world high-dimensional data

Jiayi Tong
Jie Hu
George Hripcsak
Yang Ning
Yong Chen

High-dimensional healthcare data, such as electronic health records (EHR) data and claims data, present two primary challenges due to the large number of variables and the need to consolidate data from multiple clinical sites. The third key challenge is the potential existence of heterogeneity in terms of covariate shift. In this paper, we propose a distributed learning algorithm accounting for covariate shift to estimate the average treatment effect (ATE) for high-dimensional data, named DisC2o-HD. Leveraging the surrogate likelihood method, our method calibrates the estimates of the propensity score and outcome models to approximately attain the desired covariate balancing property, while accounting for the covariate shift across multiple clinical sites. We show that our distributed covariate balancing propensity score estimator can approximate the pooled estimator, which is obtained by pooling the data from multiple sites together. The proposed estimator remains consistent if either the propensity score model or the outcome regression model is correctly specified. The semiparametric efficiency bound is achieved when both the propensity score and the outcome models are correctly specified. We conduct simulation studies to demonstrate the performance of the proposed algorithm; additionally, we conduct an empirical study to present the readiness of implementation and validity.
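The double-robustness property mentioned in this abstract (consistency when either the propensity score model or the outcome regression model is correct) can be illustrated with the standard augmented inverse probability weighting (AIPW) estimator of the ATE. The sketch below is a textbook single-site estimator, not the distributed DisC2o-HD procedure; the function name `aipw_ate` and the toy numbers are hypothetical.

```python
from statistics import fmean


def aipw_ate(y, t, e, m1, m0):
    """Textbook AIPW estimate of the average treatment effect.

    y  : observed outcomes
    t  : treatment indicators (1 = treated, 0 = control)
    e  : estimated propensity scores P(T = 1 | X), each strictly in (0, 1)
    m1 : outcome-model predictions under treatment
    m0 : outcome-model predictions under control
    """
    terms = []
    for yi, ti, ei, m1i, m0i in zip(y, t, e, m1, m0):
        # Outcome-model prediction plus an inverse-probability-weighted residual.
        treated = m1i + ti * (yi - m1i) / ei
        control = m0i + (1 - ti) * (yi - m0i) / (1 - ei)
        terms.append(treated - control)
    return fmean(terms)


# Made-up toy data: the outcome model is exact, the propensity scores are arbitrary.
y = [3.0, 1.0, 4.0, 2.0]
t = [1, 0, 1, 0]
m1 = [3.0, 2.0, 4.0, 3.0]
m0 = [2.0, 1.0, 3.0, 2.0]
e = [0.5, 0.3, 0.7, 0.4]
ate = aipw_ate(y, t, e, m1, m0)
```

In this toy case every residual is zero, so the estimate equals the mean of `m1 - m0` regardless of the propensity scores supplied, which is one half of the double-robustness argument.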

JMLR Journal 2025 Journal Article

Exponential Family Graphical Models: Correlated Replicates and Unmeasured Confounders, with Applications to fMRI Data

Yanxin Jin
Yang Ning
Kean Ming Tan

Graphical models have been used extensively for modeling brain connectivity networks. However, unmeasured confounders and correlations among measurements are often overlooked during model fitting, which may lead to spurious scientific discoveries. Motivated by functional magnetic resonance imaging (fMRI) studies, we propose a novel method for constructing brain connectivity networks with correlated replicates and latent effects. In a typical fMRI study, each participant is scanned and fMRI measurements are collected across a period of time. In many cases, subjects may have different states of mind that cannot be measured during the brain scan: for instance, some subjects may be awake during the first half of the brain scan, and may fall asleep during the second half of the brain scan. To model the correlation among replicates and latent effects induced by the different states of mind, we assume that the correlated replicates within each independent subject follow a one-lag vector autoregressive model, and that the latent effects induced by the unmeasured confounders are piecewise constant. Theoretical guarantees are established for parameter estimation. We demonstrate via extensive numerical studies that our method is able to estimate latent variable graphical models with correlated replicates more accurately than existing methods.

AIIM Journal 2025 Journal Article

TIPs: Tooth instance and pulp segmentation based on hierarchical extraction and fusion of anatomical priors from cone-beam CT

Tao Zhong
Yang Ning
Xueyang Wu
Li Ye
Chichi Li
Yu Zhang
Yu Du

Accurate instance segmentation of tooth and pulp from cone-beam computed tomography (CBCT) images is essential but highly challenging due to the pulp’s small structures and indistinct boundaries. To address these critical challenges, we propose TIPs, designed for Tooth Instance and Pulp segmentation. TIPs initially employs a backbone model to segment a binary mask of the tooth from CBCT images, which is then utilized to derive a position prior of the tooth and a shape prior of the pulp. Subsequently, we propose the Hierarchical Fusion Mamba models to leverage the strengths of both anatomical priors and CBCT images by extracting and integrating shallow and deep features from Convolutional Neural Networks (CNNs) and State Space Sequence Models (SSMs), respectively. This process achieves tooth instance and pulp segmentation, which are then combined to obtain the final pulp instance segmentation. Extensive experiments on CBCT scans from 147 patients demonstrate that TIPs significantly outperforms state-of-the-art methods in terms of segmentation accuracy. Furthermore, we have encapsulated this framework into an openly accessible tool for one-click use. To our knowledge, this is the first toolbox capable of segmentation of tooth and pulp instances, with its performance validated on two external datasets comprising 59 samples from the Toothfairy2 dataset and 48 samples from the STS dataset. These results demonstrate the potential of TIPs as a practical tool to boost clinical workflows in digital dentistry, enhancing the precision and efficiency of dental diagnostics and treatment planning.

IJCAI Conference 2025 Conference Paper

Towards Region-Adaptive Feature Disentanglement and Enhancement for Small Object Detection

Yanchao Bi
Yang Ning
Xiushan Nie
Xiankai Lu
Yongshun Gong
Leida Li

Current feature fusion strategies often fail to adequately account for the influence of activation intensity across different scales on small object features, which impedes the effective detection of small objects. To address this limitation, we propose the Region-Adaptive Feature Disentanglement and Enhancement (RAFDE) strategy, which improves both downsampling and feature fusion by leveraging activation intensity variations at multiple scales. First, we introduce the Boundary Transitional Region-enhanced Downsampling (BTRD) module, which enhances boundary transitional regions containing both strongly and weakly activated features, thereby mitigating the loss of crucial boundary information for small objects. Second, we present the Regional-Adaptive Feature Fusion (RAFF) module, which adaptively disentangles and fuses co-activated and uni-activated regions from adjacent levels into the current level, effectively reducing the risk of small objects being overwhelmed. Extensive experiments on several public datasets demonstrate that the RAFDE strategy is highly effective and outperforms state-of-the-art methods. The code is available at https://github.com/b-yanchao/RAFDE.git.

JMLR Journal 2022 Journal Article

Estimation and inference on high-dimensional individualized treatment rule in observational data using split-and-pooled de-correlated score

Muxuan Liang
Young-Geun Choi
Yang Ning
Maureen A Smith
Ying-Qi Zhao

With the increasing adoption of electronic health records, there is an increasing interest in developing individualized treatment rules, which recommend treatments according to patients' characteristics, from large observational data. However, there is a lack of valid inference procedures for such rules developed from this type of data in the presence of high-dimensional covariates. In this work, we develop a penalized doubly robust method to estimate the optimal individualized treatment rule from high-dimensional data. We propose a split-and-pooled de-correlated score to construct hypothesis tests and confidence intervals. Our proposal adopts data splitting to conquer the slow convergence rate of nuisance parameter estimations, such as non-parametric methods for outcome regression or propensity models. We establish the limiting distributions of the split-and-pooled de-correlated score test and the corresponding one-step estimator in the high-dimensional setting. Simulation and real data analysis are conducted to demonstrate the superiority of the proposed method.

JMLR Journal 2020 Journal Article

High-Dimensional Inference for Cluster-Based Graphical Models

Carson Eisenach
Florentina Bunea
Yang Ning
Claudiu Dinicu

Motivated by modern applications in which one constructs graphical models based on a very large number of features, this paper introduces a new class of cluster-based graphical models, in which variable clustering is applied as an initial step for reducing the dimension of the feature space. We employ model assisted clustering, in which the clusters contain features that are similar to the same unobserved latent variable. Two different cluster-based Gaussian graphical models are considered: the latent variable graph, corresponding to the graphical model associated with the unobserved latent variables, and the cluster-average graph, corresponding to the vector of features averaged over clusters. Our study reveals that likelihood based inference for the latent graph, not analyzed previously, is analytically intractable. Our main contribution is the development and analysis of alternative estimation and inference strategies, for the precision matrix of an unobservable latent vector Z. We replace the likelihood of the data by an appropriate class of empirical risk functions, that can be specialized to the latent graphical model and to the simpler, but under-analyzed, cluster-average graphical model. The estimators thus derived can be used for inference on the graph structure, for instance on edge strength or pattern recovery. Inference is based on the asymptotic limits of the entry-wise estimates of the precision matrices associated with the conditional independence graphs under consideration. While taking the uncertainty induced by the clustering step into account, we establish Berry-Esseen central limit theorems for the proposed estimators. It is noteworthy that, although the clusters are estimated adaptively from the data, the central limit theorems regarding the entries of the estimated graphs are proved under the same conditions one would use if the clusters were known in advance. 
As an illustration of the usage of these newly developed inferential tools, we show that they can be reliably used for recovery of the sparsity pattern of the graphs we study, under FDR control, which is verified via simulation studies and an fMRI data analysis. These experimental results confirm the theoretically established difference between the two graph structures. Furthermore, the data analysis suggests that the latent variable graph, corresponding to the unobserved cluster centers, can help provide more insight into the understanding of the brain connectivity networks relative to the simpler, average-based, graph.

AAAI Conference 2020 Conference Paper

Regularized Training and Tight Certification for Randomized Smoothed Classifier with Provable Robustness

Huijie Feng
Chunpeng Wu
Guoyang Chen
Weifeng Zhang
Yang Ning

Recently smoothing deep neural network based classifiers via isotropic Gaussian perturbation is shown to be an effective and scalable way to provide state-of-the-art probabilistic robustness guarantee against ℓ2 norm bounded adversarial perturbations. However, how to train a good base classifier that is accurate and robust when smoothed has not been fully investigated. In this work, we derive a new regularized risk, in which the regularizer can adaptively encourage the accuracy and robustness of the smoothed counterpart when training the base classifier. It is computationally efficient and can be implemented in parallel with other empirical defense methods. We discuss how to implement it under both standard (nonadversarial) and adversarial training scheme. At the same time, we also design a new certification algorithm, which can leverage the regularization effect to provide tighter robustness lower bound that holds with high probability. Our extensive experimentation demonstrates the effectiveness of the proposed training and certification approaches on CIFAR-10 and ImageNet datasets.
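For context on what a robustness certificate for a Gaussian-smoothed classifier looks like, the sketch below computes the widely used baseline ℓ2 radius from the randomized smoothing literature: the noise level times the standard normal quantile of a lower confidence bound on the top-class probability. This is not the tighter certification algorithm proposed in the paper; the function name `certified_l2_radius` and the example inputs are hypothetical.

```python
from statistics import NormalDist


def certified_l2_radius(sigma, p_lower):
    """Baseline l2 certificate for a Gaussian-smoothed classifier.

    sigma   : standard deviation of the isotropic Gaussian noise
    p_lower : lower confidence bound on the probability that the base
              classifier returns the top class under that noise

    Returns sigma * Phi^{-1}(p_lower), or 0.0 when p_lower <= 0.5,
    in which case no radius can be certified.
    """
    if not 0.0 < p_lower < 1.0:
        raise ValueError("p_lower must lie strictly between 0 and 1")
    if p_lower <= 0.5:
        return 0.0
    return sigma * NormalDist().inv_cdf(p_lower)


# Hypothetical inputs: noise level 0.25, top-class lower bound 0.9.
radius = certified_l2_radius(0.25, 0.9)
```

The radius grows with both the noise level and the confidence in the top class, which is why training a base classifier that stays accurate under noise directly affects how large a certificate can be.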

JMLR Journal 2019 Journal Article

Efficient augmentation and relaxation learning for individualized treatment rules using observational data

Ying-Qi Zhao
Eric B. Laber
Yang Ning
Sumona Saha
Bruce E. Sands

Individualized treatment rules aim to identify if, when, which, and to whom treatment should be applied. A globally aging population, rising healthcare costs, and increased access to patient-level data have created an urgent need for high-quality estimators of individualized treatment rules that can be applied to observational data. A recent and promising line of research for estimating individualized treatment rules recasts the problem of estimating an optimal treatment rule as a weighted classification problem. We consider a class of estimators for optimal treatment rules that are analogous to convex large-margin classifiers. The proposed class applies to observational data and is doubly-robust in the sense that correct specification of either a propensity or outcome model leads to consistent estimation of the optimal individualized treatment rule. Using techniques from semiparametric efficiency theory, we derive rates of convergence for the proposed estimators and use these rates to characterize the bias-variance trade-off for estimating individualized treatment rules with classification-based methods. Simulation experiments informed by these results demonstrate that it is possible to construct new estimators within the proposed framework that significantly outperform existing ones. We illustrate the proposed methods using data from a labor training program and a study of inflammatory bowel syndrome.

JMLR Journal 2018 Journal Article

On Semiparametric Exponential Family Graphical Models

Zhuoran Yang
Yang Ning
Han Liu

We propose a new class of semiparametric exponential family graphical models for the analysis of high dimensional mixed data. Different from the existing mixed graphical models, we allow the nodewise conditional distributions to be semiparametric generalized linear models with unspecified base measure functions. Thus, one advantage of our method is that it is unnecessary to specify the type of each node and the method is more convenient to apply in practice. Under the proposed model, we consider both problems of parameter estimation and hypothesis testing in high dimensions. In particular, we propose a symmetric pairwise score test for the presence of a single edge in the graph. Compared to the existing methods for hypothesis tests, our approach takes into account the symmetry of the parameters, such that the inferential results are invariant with respect to the different parametrizations of the same edge. Thorough numerical simulations and a real data example are provided to back up our theoretical results.

NeurIPS Conference 2015 Conference Paper

High Dimensional EM Algorithm: Statistical Optimization and Asymptotic Normality

Zhaoran Wang
Quanquan Gu
Yang Ning
Han Liu

We provide a general theory of the expectation-maximization (EM) algorithm for inferring high dimensional latent variable models. In particular, we make two contributions: (i) For parameter estimation, we propose a novel high dimensional EM algorithm which naturally incorporates sparsity structure into parameter estimation. With an appropriate initialization, this algorithm converges at a geometric rate and attains an estimator with the (near-)optimal statistical rate of convergence. (ii) Based on the obtained estimator, we propose a new inferential procedure for testing hypotheses for low dimensional components of high dimensional parameters. For a broad family of statistical models, our framework establishes the first computationally feasible approach for optimal estimation and asymptotic inference in high dimensions.
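The abstract does not spell out how sparsity enters the algorithm. One common device in sparse iterative estimation is a hard-thresholding (truncation) step applied after each parameter update, which keeps only the largest-magnitude coordinates. The sketch below shows that step alone, as an assumption about the general technique rather than a description of the paper's procedure; the name `keep_top_s` is hypothetical.

```python
def keep_top_s(beta, s):
    """Hard-threshold a parameter vector to its s largest-magnitude entries.

    beta : list of floats (current parameter estimate)
    s    : number of coordinates to keep (assumed sparsity level)

    Entries outside the top s in absolute value are set to 0.0.
    Ties are broken in favor of the earlier index.
    """
    if s < 0:
        raise ValueError("s must be non-negative")
    # Indices sorted by decreasing magnitude; sorted() is stable, so ties keep index order.
    order = sorted(range(len(beta)), key=lambda i: -abs(beta[i]))
    keep = set(order[:s])
    return [b if i in keep else 0.0 for i, b in enumerate(beta)]


# Hypothetical update result truncated to two nonzero coordinates.
truncated = keep_top_s([0.1, -3.0, 0.5, 2.0], 2)
```

In an iterative scheme, a step like this would follow each update so that every iterate stays sparse; choosing `s` is a tuning decision that the sketch leaves to the user.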
