Arrow Research search
Back to ICML

ICML 2023

Kernel Sufficient Dimension Reduction and Variable Selection for Compositional Data via Amalgamation

Conference Paper Accepted Paper Artificial Intelligence ยท Machine Learning

Abstract

Compositional data with a large number of components and an abundance of zeros are frequently observed in many fields recently. Analyzing such sparse high-dimensional compositional data naturally calls for dimension reduction or, more preferably, variable selection. Most existing approaches lack interpretability or cannot handle zeros properly, as they rely on a log-ratio transformation. We approach this problem with sufficient dimension reduction (SDR), one of the most studied dimension reduction frameworks in statistics. Characterized by the conditional independence of the data to the response on the found subspace, the SDR framework has been effective for both linear and nonlinear dimension reduction problems. This work proposes a compositional SDR that can handle zeros naturally while incorporating the nonlinear nature and spurious negative correlations among components rigorously. A critical consideration of sub-composition versus amalgamation for compositional variable selection is discussed. The proposed compositional SDR is shown to be statistically consistent in constructing a sub-simplex consisting of true signal variables. Simulation and real microbiome data are used to demonstrate the performance of the proposed SDR compared to existing state-of-art approaches.

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
International Conference on Machine Learning
Archive span
1993-2025
Indexed papers
16471
Paper id
594934230964592137