Arrow Research search

Author name cluster

Crystal Maung

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers
1 author row

Possible papers

9

AAAI Conference 2025 Short Paper

Multi-View Unsupervised Column Subset Selection via Combinatorial Search (Student Abstract)

  • Guihong Wan
  • Ninghui Hao
  • Crystal Maung
  • Haim Schweitzer
  • Chen Zhao
  • Kun-Hsing Yu
  • Yevgeniy R. Semenov

Given a data matrix, unsupervised column subset selection refers to the problem of identifying a subset of columns that can be used to linearly approximate the original data matrix. This problem has many applications, such as feature selection and representative selection, but solving it optimally is known to be NP-hard. We consider multi-view unsupervised column subset selection, which extends the concept of (single-view) column subset selection to data represented in multiple views or modalities. We introduce a combinatorial search algorithm for this generalized problem. One variant of the algorithm is guaranteed to compute an optimal solution in a setting similar to the classical A* algorithm. Other suboptimal variants, in a setting similar to the weighted A* algorithm, are much faster and provide a solution along with a bound on its quality.

AAAI Conference 2019 Conference Paper

Heuristic Search Algorithm for Dimensionality Reduction Optimally Combining Feature Selection and Feature Extraction

  • Baokun He
  • Swair Shah
  • Crystal Maung
  • Gordon Arnold
  • Guihong Wan
  • Haim Schweitzer

The following are two classical approaches to dimensionality reduction: 1. Approximating the data with a small number of features that exist in the data (feature selection). 2. Approximating the data with a small number of arbitrary features (feature extraction). We study a generalization that approximates the data with both selected and extracted features. We show that an optimal solution to this hybrid problem involves a combinatorial search, and cannot be trivially obtained even if one can optimally solve the separate problems of selection and extraction. Our approach, which yields both optimal and approximate solutions, uses a “best first” heuristic search. The algorithm comes with both an a priori and an a posteriori optimality guarantee similar to those that can be obtained for the classical weighted A* algorithm. Experimental results show the effectiveness of the proposed approach.

AAAI Conference 2018 Short Paper

Solving Generalized Column Subset Selection With Heuristic Search

  • Swair Shah
  • Baokun He
  • Ke Xu
  • Crystal Maung
  • Haim Schweitzer

We address the problem of approximating a matrix by the linear combination of a column sparse matrix and a low rank matrix. Two variants of a heuristic search algorithm are described. The first produces an optimal solution but may be slow, as these problems are believed to be NP-hard. The second is much faster, but only guarantees a suboptimal solution. The quality of the approximation and the optimality criterion can be specified in terms of unitarily invariant norms.

AAAI Conference 2017 Conference Paper

Cleaning the Null Space: A Privacy Mechanism for Predictors

  • Ke Xu
  • Tongyi Cao
  • Swair Shah
  • Crystal Maung
  • Haim Schweitzer

In the standard machine learning and regression setting, feature values are used to predict some desired information. The privacy challenge considered here is to prevent an adversary from using available feature values to predict confidential information that one wishes to keep secret. We show that this can sometimes be achieved with almost no effect on the quality of predicting desired information. We describe two algorithms aimed at providing such privacy when the predictors have a linear operator in the first stage. The desired effect can be achieved by zeroing out feature components in the approximate null space of the linear operator.
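The null-space mechanism described in the abstract can be sketched in a few lines of linear algebra. This is an illustrative reconstruction, not the paper's code: `W` stands in for the desired predictor's linear operator, the data is random, and the null-space components are removed exactly rather than via the approximate null space the paper uses.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 10))  # linear operator of the desired predictor
x = rng.standard_normal(10)       # one feature vector

# Orthonormal basis for the row space of W. Components of x orthogonal to
# this basis lie in W's null space and cannot affect the desired prediction.
_, s, Vt = np.linalg.svd(W)
row_basis = Vt[:len(s)]

# Zero out the null-space components: the desired prediction is unchanged,
# but x_clean carries no information outside what the predictor already uses.
x_clean = row_basis.T @ (row_basis @ x)

print(np.allclose(W @ x_clean, W @ x))  # True: desired prediction unchanged
```

Any confidential predictor that relied on the discarded null-space components is degraded, while `W @ x_clean` is numerically identical to `W @ x`.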

AAAI Conference 2017 Short Paper

Enhancing the Privacy of Predictors

  • Ke Xu
  • Swair Shah
  • Tongyi Cao
  • Crystal Maung
  • Haim Schweitzer

The privacy challenge considered here is to prevent an adversary from using available feature values to predict confidential information. We propose an algorithm providing such privacy for predictors that have a linear operator in the first stage. Privacy is achieved by zeroing out feature components in the approximate null space of the linear operator. We show that this has little effect on predicting desired information.

AAAI Conference 2016 Conference Paper

Unsupervised Feature Selection by Heuristic Search with Provable Bounds on Suboptimality

  • Hiromasa Arai
  • Crystal Maung
  • Ke Xu
  • Haim Schweitzer

Identifying a small number of features that can represent the data is a known problem that comes up in areas such as machine learning, knowledge representation, data mining, and numerical linear algebra. Computing an optimal solution is believed to be NP-hard, and there is extensive work on approximation algorithms. Classic approaches exploit the algebraic structure of the underlying matrix, while more recent approaches use randomization. An entirely different approach that uses the A* heuristic search algorithm to find an optimal solution was recently proposed. Not surprisingly it is limited to effectively selecting only a small number of features. We propose a similar approach related to the Weighted A* algorithm. This gives algorithms that are not guaranteed to find an optimal solution but run much faster than the A* approach, enabling effective selection of many features from large datasets. We demonstrate experimentally that these new algorithms are more accurate than the current state-of-the-art while still being practical. Furthermore, they come with an adjustable guarantee on how different their error may be from the smallest possible (optimal) error. Their accuracy can always be increased at the expense of a longer running time.
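The weighted A* trade-off the abstract relies on can be illustrated on a toy shortest-path problem. This is generic weighted A*, not the paper's feature-selection search: with an admissible heuristic `h` and weight `w >= 1`, the returned cost is guaranteed to be at most `w` times the optimal cost, and larger `w` typically means faster search.

```python
import heapq

def weighted_astar(start, goal, neighbors, h, w=2.0):
    """Best-first search on f = g + w*h; cost returned is <= w * optimal."""
    heap = [(w * h(start), 0.0, start)]
    best_g = {start: 0.0}
    while heap:
        f, g, node = heapq.heappop(heap)
        if node == goal:
            return g
        for nxt, cost in neighbors(node):
            ng = g + cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(heap, (ng + w * h(nxt), ng, nxt))
    return float("inf")

# Toy instance: 5x5 grid with unit moves and an admissible
# Manhattan-distance heuristic toward the corner (4, 4).
def neighbors(p):
    x, y = p
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < 5 and 0 <= ny < 5:
            yield (nx, ny), 1.0

h = lambda p: abs(p[0] - 4) + abs(p[1] - 4)
cost = weighted_astar((0, 0), (4, 4), neighbors, h, w=2.0)
print(cost)  # at most w times the optimal cost of 8
```

The a priori guarantee mentioned in the abstract is this `w`-bound; the a posteriori guarantee comes from comparing the returned cost against the best `f` value left on the frontier.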

AAAI Conference 2015 Conference Paper

Optimal Column Subset Selection by A-Star Search

  • Hiromasa Arai
  • Crystal Maung
  • Haim Schweitzer

Approximating a matrix by a small subset of its columns is a known problem in numerical linear algebra. Algorithms that address this problem have been used in areas which include, among others, sparse approximation, unsupervised feature selection, data mining, and knowledge representation. Such algorithms were investigated since the 1960’s, with recent results that use randomization. The problem is believed to be NP-Hard, and to the best of our knowledge there are no previously published algorithms aimed at computing optimal solutions. We show how to model the problem as a graph search, and propose a heuristic based on eigenvalues of related matrices. Applying the A∗ search strategy with this heuristic is guaranteed to find the optimal solution. Experimental results on common datasets show that the proposed algorithm can effectively select columns from moderate size matrices, typically improving by orders of magnitude the run time of exhaustive search.
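A minimal version of the graph-search formulation can be sketched as follows. This is not the paper's algorithm: the admissible heuristic here is a crude lower bound (the error after adding every still-allowed column, which no k-completion can beat), whereas the paper's eigenvalue-based heuristic prunes far more aggressively. The optimality argument is the same: the first size-k subset popped from the frontier is optimal.

```python
import heapq
import numpy as np

def residual_error(A, cols):
    """Squared Frobenius error of approximating A by its columns in `cols`."""
    cols = sorted(cols)
    if not cols:
        return float(np.linalg.norm(A) ** 2)
    C = A[:, cols]
    proj = C @ np.linalg.lstsq(C, A, rcond=None)[0]
    return float(np.linalg.norm(A - proj) ** 2)

def astar_css(A, k):
    """Optimal k-column subset by best-first search with an admissible bound."""
    n = A.shape[1]
    # Subsets are built in increasing column order, so each subset is
    # generated exactly once. f lower-bounds the error of any k-completion.
    heap = [(residual_error(A, range(n)), (), -1)]
    while heap:
        f, cols, last = heapq.heappop(heap)
        if len(cols) == k:
            return list(cols), residual_error(A, cols)
        for j in range(last + 1, n):
            child = cols + (j,)
            rest = tuple(range(j + 1, n))
            # At a goal node the bound is the exact error.
            cf = residual_error(A, child if len(child) == k else child + rest)
            heapq.heappush(heap, (cf, child, j))

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))  # tiny illustrative matrix
cols, err = astar_css(A, 2)
print(cols, err)
```

Even on this toy instance the bound prunes most of the 15 possible 2-subsets; a tighter heuristic is what makes the approach practical on real matrices.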

NeurIPS Conference 2013 Conference Paper

Pass-efficient unsupervised feature selection

  • Crystal Maung
  • Haim Schweitzer

The goal of unsupervised feature selection is to identify a small number of important features that can represent the data. We propose a new algorithm, a modification of the classical pivoted QR algorithm of Businger and Golub, that requires a small number of passes over the data. The improvements are based on two ideas: keeping track of multiple features in each pass, and skipping calculations that can be shown not to affect the final selection. Our algorithm selects the exact same features as the classical pivoted QR algorithm, and has the same favorable numerical stability. We describe experiments on real-world datasets which sometimes show improvements of several orders of magnitude over the classical algorithm. These results appear to be competitive with recently proposed randomized algorithms in terms of pass efficiency and run time. On the other hand, the randomized algorithms may produce better features, at the cost of small probability of failure.
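The classical baseline here, Businger–Golub column-pivoted QR, is available directly in SciPy, so the selection that the paper's pass-efficient algorithm provably reproduces can be sketched as follows (the matrix and the choice of k are illustrative):

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))  # rows: samples, columns: features
k = 5                              # number of features to select

# Column-pivoted QR greedily picks, at each step, the column with the
# largest residual norm; the first k pivots are the selected features.
_, _, piv = qr(A, pivoting=True)
selected = sorted(piv[:k])

# Quality check: least-squares reconstruction of A from the selected columns.
C = A[:, selected]
err = np.linalg.norm(A - C @ np.linalg.lstsq(C, A, rcond=None)[0])
print(selected, err)
```

The classical routine sweeps the data once per selected column; the paper's contribution is obtaining this exact same `selected` set in far fewer passes.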