Arrow Research search

Author name cluster

Kai Zhong

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers
2 author rows

Possible papers (18)

EAAI Journal 2023 Journal Article

Overview of fault prognosis for traction systems in high-speed trains: A deep learning perspective

  • Kai Zhong
  • Jiayi Wang
  • Shuiqing Xu
  • Chao Cheng
  • Hongtian Chen

As the “heart” of a high-speed train, the traction system plays an important role in safe train operation, yet its operation and maintenance level still cannot meet the needs of modern railway transportation. Fortunately, a variety of advanced fault prognosis methods have been developed to address this problem; among them, deep learning methods have received special attention due to their unique advantages. This paper first reviews the structural characteristics of traction systems in high-speed trains. Then, representative deep learning based prognosis methods are compared and summarized, with a focus on their pros and cons. Finally, we point out the challenges and speculate on future trends in this field. This paper may serve as a reference for researchers interested in fault prognosis and high-speed train traction systems, along with pointers to potential directions.

JMLR Journal 2022 Journal Article

PECOS: Prediction for Enormous and Correlated Output Spaces

  • Hsiang-Fu Yu
  • Kai Zhong
  • Jiong Zhang
  • Wei-Cheng Chang
  • Inderjit S. Dhillon

Many large-scale applications amount to finding relevant results from an enormous output space of potential candidates, for example, finding the best matching product from a large catalog or suggesting related search phrases on a search engine. The size of the output space for these problems can range from millions to billions, and can even be infinite in some applications. Moreover, training data is often limited for the “long-tail” items in the output space. Fortunately, items in the output space are often correlated, thereby presenting an opportunity to alleviate the data sparsity issue. In this paper, we propose the Prediction for Enormous and Correlated Output Spaces (PECOS) framework, a versatile and modular machine learning framework for solving prediction problems for very large output spaces, and apply it to the eXtreme Multilabel Ranking (XMR) problem: given an input instance, find and rank the most relevant items from an enormous but fixed and finite output space. We propose a three-phase framework for PECOS: (i) in the first phase, PECOS organizes the output space using a semantic indexing scheme, (ii) in the second phase, PECOS uses the indexing to narrow down the output space by orders of magnitude using a machine-learned matching scheme, and (iii) in the third phase, PECOS ranks the matched items using a final ranking scheme. The versatility and modularity of PECOS allows for easy plug-and-play of various choices for the indexing, matching, and ranking phases. The indexing and matching phases alleviate the data sparsity issue by leveraging correlations across different items in the output space. For the critical matching phase, we develop a recursive machine-learned matching strategy with both linear and neural matchers. When applied to eXtreme Multilabel Ranking where the input instances are in textual form, we find that the recursive Transformer matcher gives state-of-the-art accuracy results, at the cost of two orders of magnitude increased training time compared to the recursive linear matcher. For example, on a dataset where the output space is of size 2.8 million, the recursive Transformer matcher results in a 6% increase in precision@1 (from 48.6% to 54.2%) over the recursive linear matcher but takes 100× more time to train. Thus it is up to the practitioner to evaluate the trade-offs and decide whether the increased training time and infrastructure cost are warranted for their application; indeed, the flexibility of the PECOS framework seamlessly allows different strategies to be used. We also develop very fast inference procedures which allow us to perform XMR predictions in real time; for example, inference takes less than 1 millisecond per input on the dataset with 2.8 million labels. The PECOS software is available at https://libpecos.org.
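
A minimal sketch of the three-phase idea on toy data, making no claims about the real PECOS implementation: k-means over crude label embeddings stands in for semantic indexing, a multiclass logistic regression stands in for the learned matcher, and (purely for brevity) the true label weights stand in for a learned ranker. All names and data here are illustrative, not the libpecos API.

```python
# Toy sketch of a PECOS-style three-phase pipeline (indexing -> matching ->
# ranking). Illustrative only; the real library at https://libpecos.org uses
# hierarchical indexing and recursive linear/Transformer matchers.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d, L, C = 2000, 32, 200, 8              # samples, features, labels, clusters
X = rng.normal(size=(n, d))
W_true = rng.normal(size=(L, d))           # hidden label weights (toy)
Y = (X @ W_true.T) > 8.0                   # sparse binary relevance matrix

# Phase 1: semantic indexing -- cluster labels via crude label embeddings
# (mean feature vector of the instances tagged with each label).
label_emb = np.stack([X[Y[:, l]].mean(axis=0) if Y[:, l].any() else np.zeros(d)
                      for l in range(L)])
cluster_of = KMeans(n_clusters=C, n_init=10, random_state=0).fit_predict(label_emb)

# Phase 2: matching -- learn to shortlist clusters for a given input.
y_cluster = np.array([np.bincount(cluster_of[Y[i]], minlength=C).argmax()
                      if Y[i].any() else 0 for i in range(n)])
matcher = LogisticRegression(max_iter=1000).fit(X, y_cluster)

# Phase 3: ranking -- rank labels only inside the shortlisted clusters.
def predict(x, beam=2, topk=5):
    probs = matcher.predict_proba(x.reshape(1, -1))[0]
    shortlist = matcher.classes_[np.argsort(probs)[-beam:]]
    cand = np.where(np.isin(cluster_of, shortlist))[0]   # candidate labels
    scores = W_true[cand] @ x        # stand-in for a learned per-label ranker
    return cand[np.argsort(scores)[-topk:][::-1]]

print("top labels for sample 0:", predict(X[0]))
```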

EAAI Journal 2020 Journal Article

Online prediction of noisy time series: Dynamic adaptive sparse kernel recursive least squares from sparse and adaptive tracking perspective

  • Kai Zhong
  • Junzhu Ma
  • Min Han

Research on kernel recursive least squares (KRLS) has deepened, and the method has been widely applied to online time series prediction. However, existing approaches usually ignore extraneous and redundant factors in the raw data, which can bias predictions. In addition, the data usually contain both noise and non-stationary characteristics, resulting in deteriorated prediction accuracy and reduced model efficiency. To ease these two drawbacks of conventional KRLS, this brief presents a dynamic adaptive sparse kernel recursive least squares (DASKRLS) filtering algorithm. It first uses an online vector projection criterion and the approximate linear dependence criterion to effectively constrain the kernel matrix dimension and reduce the computational complexity of the model. A regularized maximum correntropy criterion is then used to process noise-contaminated data, improving generalization ability. Moreover, an adaptive update mechanism dynamically tracks the real-time weights of non-stationary signals. The dynamic sparsification is essentially equivalent to a feature selection process that maintains low-dimensional manifold information. Lorenz benchmark experiments and real-data experiments show that DASKRLS achieves better prediction performance in complex systems with noise and non-stationarity.
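
A minimal sketch of the sparsification idea, assuming an RBF kernel: an approximate-linear-dependence (ALD) novelty test decides whether a new sample enters the dictionary, and non-novel samples trigger only a cheap LMS-style weight correction. This is not the DASKRLS algorithm (no correntropy criterion or adaptive forgetting), and it refits on the dictionary naively rather than performing a true recursive update.

```python
# Sparse online kernel least squares with an ALD dictionary test -- an
# illustrative KRLS-style sketch, not the paper's DASKRLS algorithm.
import numpy as np

class SparseOnlineKRLS:
    def __init__(self, nu=0.1, lam=1e-3, gamma=1.0, eta=0.1):
        self.nu, self.lam, self.gamma, self.eta = nu, lam, gamma, eta
        self.D, self.t = None, None          # dictionary, targets

    def _k(self, A, b):                      # RBF kernel vector k(A_i, b)
        return np.exp(-self.gamma * np.sum((A - b) ** 2, axis=1))

    def _refit(self):                        # naive O(m^3) refit for clarity
        G = np.exp(-self.gamma *
                   np.sum((self.D[:, None] - self.D[None, :]) ** 2, axis=2))
        R = G + self.lam * np.eye(len(self.D))
        self.alpha, self.Rinv = np.linalg.solve(R, self.t), np.linalg.inv(R)

    def predict(self, x):
        return float(self._k(self.D, x) @ self.alpha)

    def update(self, x, y):
        if self.D is None:
            self.D, self.t = x[None, :], np.array([y])
            self._refit()
            return
        k = self._k(self.D, x)
        # Regularized ALD novelty score (k(x, x) = 1 for the RBF kernel).
        delta = 1.0 - k @ self.Rinv @ k
        err = y - k @ self.alpha
        if delta > self.nu:                  # novel: grow dictionary, refit
            self.D = np.vstack([self.D, x])
            self.t = np.append(self.t, y)
            self._refit()
        else:                                # not novel: cheap LMS correction
            self.alpha += self.eta * err * k

# Usage: one-step-ahead prediction on a noisy sine series.
rng = np.random.default_rng(0)
s = np.sin(0.2 * np.arange(400)) + 0.05 * rng.normal(size=400)
model, errs = SparseOnlineKRLS(nu=0.05, gamma=5.0), []
for i in range(3, 399):
    x, y = s[i - 3:i], s[i]
    if model.D is not None:
        errs.append((model.predict(x) - y) ** 2)
    model.update(x, y)
print(f"dictionary size: {len(model.D)}, mean sq. err: {np.mean(errs):.4f}")
```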

NeurIPS Conference 2019 Conference Paper

Provable Non-linear Inductive Matrix Completion

  • Kai Zhong
  • Zhao Song
  • Prateek Jain
  • Inderjit Dhillon

Consider a standard recommendation/retrieval problem where, given a query, the goal is to retrieve the most relevant items. The inductive matrix completion (IMC) method is a standard approach for this problem, where the given query as well as the items are embedded in a common low-dimensional space. The inner product between a query embedding and an item embedding reflects the relevance of the (query, item) pair. Non-linear IMC (NIMC) uses non-linear networks to embed the query as well as the items, and is known to be highly effective for a variety of tasks, such as video recommendations for users and semantic web search. Despite its wide usage, the existing literature lacks a rigorous understanding of NIMC models. A key challenge in analyzing such models is to deal with the non-convexity arising from the non-linear embeddings, in addition to the non-convexity arising from the low-dimensional restriction of the embedding space, which is akin to the low-rank restriction in the standard matrix completion problem. In this paper, we provide the first theoretical analysis for a simple NIMC model in the realizable setting, where the relevance score of a (query, item) pair is formulated as the inner product between their single-layer neural representations. Our results show that, under mild assumptions, we can recover the ground-truth parameters of the NIMC model using standard (stochastic) gradient descent methods if the methods are initialized within a small distance of the optimal parameters. We show that a standard tensor method can be used to initialize the solution within the required distance. Furthermore, we show that the number of query-item relevance observations required, a key parameter in learning such models, scales nearly linearly with the input dimensionality, thus matching existing results for standard linear inductive matrix completion.
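
A toy instance of the realizable setting analyzed in the paper: relevance is the inner product of single-layer ReLU embeddings, fit by plain gradient descent from a warm start that stands in for the tensor-method initialization. Sizes, step size, and the perturbation scale are illustrative assumptions.

```python
# Toy realizable NIMC model: score(q, p) = relu(U q) . relu(V p),
# trained by plain gradient descent from a warm start.
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 16, 4, 5000                       # input dim, embedding dim, samples
relu = lambda z: np.maximum(z, 0.0)

U_star, V_star = rng.normal(size=(k, d)), rng.normal(size=(k, d))
Q, P = rng.normal(size=(n, d)), rng.normal(size=(n, d))   # queries, items
y = np.sum(relu(Q @ U_star.T) * relu(P @ V_star.T), axis=1)

U = U_star + 0.3 * rng.normal(size=(k, d))  # warm start (paper: tensor method)
V = V_star + 0.3 * rng.normal(size=(k, d))
for _ in range(2000):
    A, B = relu(Q @ U.T), relu(P @ V.T)     # n x k embeddings
    r = np.sum(A * B, axis=1) - y           # residuals
    # Gradients of 0.5 * mean(r^2); relu'(z) = 1[z > 0].
    U -= 0.01 * ((r[:, None] * B * (A > 0)).T @ Q) / n
    V -= 0.01 * ((r[:, None] * A * (B > 0)).T @ P) / n

final = np.sum(relu(Q @ U.T) * relu(P @ V.T), axis=1)
print("final mean squared error:", np.mean((final - y) ** 2))
```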

ICML Conference 2018 Conference Paper

Binary Classification with Karmic, Threshold-Quasi-Concave Metrics

  • Bowei Yan
  • Sanmi Koyejo
  • Kai Zhong
  • Pradeep Ravikumar

Complex performance measures, beyond the popular measure of accuracy, are increasingly being used in the context of binary classification. These complex performance measures are typically not even decomposable, that is, the loss evaluated on a batch of samples cannot typically be expressed as a sum or average of losses evaluated at individual samples, which in turn requires new theoretical and methodological developments beyond standard treatments of supervised learning. In this paper, we advance this understanding of binary classification for complex performance measures by identifying two key properties: a so-called Karmic property, and a more technical threshold-quasi-concavity property, which we show is milder than existing structural assumptions imposed on performance measures. Under these properties, we show that the Bayes optimal classifier is a threshold function of the conditional probability of the positive class. We then leverage this result to develop a computationally practical plug-in classifier, via a novel threshold estimator, and further provide a novel statistical analysis of classification error with respect to complex performance measures.
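
A sketch of the plug-in recipe, with logistic regression as the estimator of eta(x) = P(y = 1 | x) and F1 as a stand-in for a general Karmic, threshold-quasi-concave metric; the paper's threshold estimator is more refined than this grid search.

```python
# Plug-in classifier sketch: estimate eta(x), then pick the threshold that
# maximizes the target metric on validation data. Grid search shown for
# clarity; not the paper's threshold estimator.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, weights=[0.9], random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.5, random_state=0)

eta = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)   # estimate eta(x)
p_va = eta.predict_proba(X_va)[:, 1]

# Valid strategy because the Bayes-optimal classifier for such metrics is a
# threshold function of eta(x).
grid = np.linspace(0.01, 0.99, 99)
scores = [f1_score(y_va, p_va >= t, zero_division=0) for t in grid]
t_star = grid[int(np.argmax(scores))]
print(f"best threshold {t_star:.2f}, F1 {max(scores):.3f} "
      f"(vs {f1_score(y_va, p_va >= 0.5):.3f} at the default 0.5)")
```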

NeurIPS Conference 2018 Conference Paper

MixLasso: Generalized Mixed Regression via Convex Atomic-Norm Regularization

  • Ian En-Hsu Yen
  • Wei-Cheng Lee
  • Kai Zhong
  • Sung-En Chang
  • Pradeep Ravikumar
  • Shou-De Lin

We consider a generalization of mixed regression where the response is an additive combination of several mixture components. Standard mixed regression is a special case where each response is generated from exactly one component. Typical approaches to the mixture regression problem employ local search methods such as Expectation Maximization (EM) that are prone to spurious local optima. On the other hand, a number of recent theoretically-motivated tensor-based methods either have high sample complexity or require knowledge of the input distribution, which is not available in most practical situations. In this work, we study a novel convex estimator, MixLasso, for the estimation of generalized mixed regression, based on an atomic norm specifically constructed to regularize the number of mixture components. Our estimator admits a risk bound that trades off prediction accuracy against model sparsity without imposing stringent assumptions on the input/output distribution, and can be easily adapted to the case of non-linear functions. In numerical experiments on mixtures of linear as well as non-linear regressions, the proposed method yields high-quality solutions in a wider range of settings than existing approaches.

ICRA Conference 2017 Conference Paper

Fast second-order cone programming for safe mission planning

  • Kai Zhong
  • Prateek Jain 0002
  • Ashish Kapoor

This paper considers the problem of safe mission planning for dynamic systems operating in uncertain environments. Much of the prior work on achieving robust and safe control requires solving second-order cone programs (SOCPs). Unfortunately, existing general-purpose SOCP methods are often infeasible for real-time robotic tasks due to the high memory and computational requirements they impose. The key contribution of this paper is a fast and memory-efficient algorithm for SOCP that enables robust and safe mission planning on board robots in real time. Our algorithm has no external dependencies, can efficiently utilize the warm starts available in safe planning settings, and in fact leads to significant speedups over standard optimization packages (like SDPT3) even for standard SOCP problems. For example, for a standard quadrotor problem, our method achieves a 1000× speedup over SDPT3 without any deterioration in solution quality. Our method is based on two insights: (a) SOCPs can be interpreted as optimizing a function over a polytope with infinitely many faces, and (b) a linear function can be efficiently optimized over this polytope. We combine these observations with a novel use of Wolfe's algorithm [1] to obtain an efficient optimization method that can be easily implemented on small embedded devices. In addition to the above algorithm, we also design a two-level sensing method based on Gaussian processes for complex obstacles with non-linear boundaries such as cylinders.
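
For reference, the generic SOCP form being accelerated, modeled with the off-the-shelf cvxpy layer; this is a general-purpose solver, not the paper's Wolfe-based method, and the cone data are a random toy stand-in.

```python
# Generic SOCP: minimize f'x subject to ||A_i x + b_i||_2 <= c_i'x + d_i.
# Solved here with cvxpy, NOT the paper's fast Wolfe-based algorithm.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 10, 5, 4                       # variables, cones, rows per cone
f = rng.normal(size=n)
x = cp.Variable(n)

constraints = [cp.norm(x, 2) <= 10]      # keep the toy problem bounded
for _ in range(m):
    A, b = rng.normal(size=(k, n)), rng.normal(size=k)
    c, d = rng.normal(size=n), 10.0      # d large enough that x = 0 is feasible
    constraints.append(cp.SOC(c @ x + d, A @ x + b))   # ||A x + b|| <= c'x + d

prob = cp.Problem(cp.Minimize(f @ x), constraints)
prob.solve()
print("status:", prob.status, " optimal value:", round(prob.value, 4))
```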

ICML Conference 2017 Conference Paper

Recovery Guarantees for One-hidden-layer Neural Networks

  • Kai Zhong
  • Zhao Song 0002
  • Prateek Jain 0002
  • Peter L. Bartlett
  • Inderjit S. Dhillon

In this paper, we consider regression problems with one-hidden-layer neural networks (1NNs). We distill properties of activation functions that lead to local strong convexity in the neighborhood of the ground-truth parameters for the 1NN squared-loss objective; most popular non-linear activation functions satisfy these properties, including rectified linear units (ReLUs), leaky ReLUs, squared ReLUs, and sigmoids. For activation functions that are also smooth, we show local linear convergence guarantees for gradient descent under a resampling rule. For homogeneous activations, we show that tensor methods are able to initialize the parameters so that they fall into the local strong convexity region. As a result, tensor initialization followed by gradient descent is guaranteed to recover the ground truth with sample complexity $d \cdot \log(1/\epsilon) \cdot \mathrm{poly}(k, \lambda)$ and computational complexity $n \cdot d \cdot \mathrm{poly}(k, \lambda)$ for smooth homogeneous activations, with high probability, where $d$ is the dimension of the input, $k$ ($k \leq d$) is the number of hidden nodes, $\lambda$ is a conditioning property of the ground-truth parameter matrix between the input layer and the hidden layer, $\epsilon$ is the target precision, and $n$ is the number of samples. To the best of our knowledge, this is the first work that provides recovery guarantees for 1NNs with both sample complexity and computational complexity linear in the input dimension and logarithmic in the precision.
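
A toy-scale run of the pipeline for the smooth (sigmoid) case: realizable one-hidden-layer data with unit output weights, a warm start standing in for the tensor-method initialization, then gradient descent on the squared loss. Sizes and the perturbation radius are illustrative assumptions.

```python
# Realizable 1NN recovery sketch (sigmoid activation, warm start).
import numpy as np

rng = np.random.default_rng(1)
d, k, n = 10, 3, 4000
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

W_star = rng.normal(size=(k, d))            # ground-truth hidden weights
X = rng.normal(size=(n, d))
y = sig(X @ W_star.T).sum(axis=1)           # realizable 1NN targets

W = W_star + 0.2 * rng.normal(size=(k, d))  # stand-in for tensor initialization
for _ in range(2000):
    S = sig(X @ W.T)                        # n x k hidden activations
    r = S.sum(axis=1) - y                   # residuals
    # Gradient of 0.5 * mean(r^2); sigma'(z) = sigma(z) (1 - sigma(z)).
    W -= 0.5 * ((r[:, None] * S * (1.0 - S)).T @ X) / n

err = np.linalg.norm(W - W_star) / np.linalg.norm(W_star)
print(f"relative parameter error: {err:.2e}")
```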

NeurIPS Conference 2016 Conference Paper

Coordinate-wise Power Method

  • Qi Lei
  • Kai Zhong
  • Inderjit Dhillon

In this paper, we propose a coordinate-wise version of the power method from an optimization viewpoint. The vanilla power method simultaneously updates all coordinates of the iterate, which is essential for its convergence analysis. However, different coordinates converge to the optimal value at different speeds. Our proposed algorithm, which we call the coordinate-wise power method, is able to select and update the most important k coordinates in O(kn) time at each iteration, where n is the dimension of the matrix and k ≤ n is the size of the active set. Inspired by the "greedy" nature of our method, we further propose a greedy coordinate descent algorithm applied to a non-convex objective function specialized for symmetric matrices. We provide convergence analyses for both methods. Experimental results on both synthetic and real data show that our methods achieve speedups of up to 20× over the basic power method. Meanwhile, due to their coordinate-wise nature, our methods are very suitable for the important case when the data cannot fit into memory. Finally, we discuss how the coordinate-wise mechanism could be applied to other iterative methods used in machine learning.
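
A sketch of the selection rule, under the simplifying assumption that the full product A x is recomputed each iteration; an efficient implementation maintains z = A x incrementally so that each update costs O(kn).

```python
# Coordinate-wise power method sketch: per iteration, update only the k
# coordinates whose power-method update would change the most.
import numpy as np

def coordinate_wise_power(A, k=8, iters=1000, seed=0):
    n = A.shape[0]
    x = np.random.default_rng(seed).normal(size=n)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        z = A @ x                             # O(n^2) here; maintainable in O(kn)
        y = z / np.linalg.norm(z)             # full power-method target
        idx = np.argsort(np.abs(y - x))[-k:]  # k most-changed coordinates
        x[idx] = y[idx]                       # update only the active set
        x /= np.linalg.norm(x)
    return x

rng = np.random.default_rng(0)
B = rng.normal(size=(200, 200))
A = B @ B.T                                   # symmetric PSD test matrix
x = coordinate_wise_power(A, k=8)
print(f"Rayleigh quotient {x @ A @ x:.2f} "
      f"vs true top eigenvalue {np.linalg.eigvalsh(A)[-1]:.2f}")
```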

NeurIPS Conference 2016 Conference Paper

Dual Decomposed Learning with Factorwise Oracle for Structural SVM of Large Output Domain

  • Ian En-Hsu Yen
  • Xiangru Huang
  • Kai Zhong
  • Ruohan Zhang
  • Pradeep Ravikumar
  • Inderjit Dhillon

Many applications of machine learning involve structured outputs with large domains, where learning a structured predictor is prohibitive due to repeated calls to an expensive inference oracle. In this work, we show that, by decomposing the training of a Structural Support Vector Machine (SVM) into a series of multiclass SVM problems connected through messages, one can replace the expensive structured oracle with a Factorwise Maximization Oracle (FMO) that allows an efficient implementation with complexity sublinear in the size of the factor domain. A Greedy Direction Method of Multipliers (GDMM) algorithm is proposed to exploit the sparsity of messages; it guarantees $\epsilon$-suboptimality after $O(\log(1/\epsilon))$ passes of FMO calls. We conduct experiments on chain-structured and fully-connected problems with large output domains. The proposed approach is orders of magnitude faster than state-of-the-art training algorithms for Structural SVMs.

NeurIPS Conference 2016 Conference Paper

Mixed Linear Regression with Multiple Components

  • Kai Zhong
  • Prateek Jain
  • Inderjit Dhillon

In this paper, we study the mixed linear regression (MLR) problem, where the goal is to recover multiple underlying linear models from their unlabeled linear measurements. We propose a non-convex objective function which we show is locally strongly convex in the neighborhood of the ground truth. We use a tensor method for initialization so that the initial models are in the local strong convexity region. We then employ general convex optimization algorithms to minimize the objective function. To the best of our knowledge, our approach provides the first exact recovery guarantees for the MLR problem with $K \geq 2$ components. Moreover, our method has near-optimal computational complexity $\tilde O(Nd)$ as well as near-optimal sample complexity $\tilde O(d)$ for constant $K$. Furthermore, we show that our non-convex formulation can be extended to solve the subspace clustering problem as well. In particular, when initialized within a small constant distance of the true subspaces, our method converges to the global optimum (and recovers the true subspaces) in time linear in the number of points. Our empirical results indicate that even with random initialization, our approach converges to the global optimum in linear time, providing a speedup of up to two orders of magnitude.
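
For contrast with the paper's approach, the simplest alternating-minimization baseline for MLR, assuming K = 2 and a random initialization; the paper instead minimizes a locally strongly convex objective from a tensor-method start, precisely to avoid the local optima this loop can hit.

```python
# Mixed linear regression via alternating minimization: assign each sample
# to the component that currently fits it best, then refit each component by
# least squares. Illustrative baseline, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
d, n, K = 10, 2000, 2
W_star = rng.normal(size=(K, d))            # ground-truth components
X = rng.normal(size=(n, d))
z = rng.integers(K, size=n)                 # hidden component of each sample
y = np.einsum('ij,ij->i', X, W_star[z]) + 0.01 * rng.normal(size=n)

W = rng.normal(size=(K, d))                 # random init (paper: tensor init)
for _ in range(50):
    resid = (X @ W.T - y[:, None]) ** 2     # n x K squared residuals
    assign = resid.argmin(axis=1)           # assignment step: best component
    for j in range(K):                      # refit step: per-component LS
        S = assign == j
        if S.sum() >= d:
            W[j] = np.linalg.lstsq(X[S], y[S], rcond=None)[0]

# Compare up to permutation of the two components.
err = min(np.linalg.norm(W - W_star), np.linalg.norm(W[::-1] - W_star))
print(f"parameter error: {err:.3f}")
```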

ICML Conference 2016 Conference Paper

PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification

  • Ian En-Hsu Yen
  • Xiangru Huang
  • Pradeep Ravikumar
  • Kai Zhong
  • Inderjit S. Dhillon

We consider multiclass and multilabel classification with an extremely large number of classes, of which only a few are assigned to each instance. In this setting, standard methods whose training and prediction costs are linear in the number of classes become intractable. State-of-the-art methods therefore aim to reduce complexity by exploiting correlations between labels, under the assumption that label similarity can be captured by structures such as a low-rank matrix or a balanced tree. However, as the diversity of labels in the feature space increases, such structural assumptions are easily violated, which degrades test performance. In this work, we show that a margin-maximizing loss with an l1 penalty yields, in the extreme classification setting, an extremely sparse solution in both the primal and the dual without sacrificing the expressive power of the predictor. We thus propose a Fully-Corrective Block-Coordinate Frank-Wolfe (FC-BCFW) algorithm that exploits both primal and dual sparsity to achieve complexity sublinear in the number of primal and dual variables. A bi-stochastic search method is proposed to further improve efficiency. In our experiments on both multiclass and multilabel problems, the proposed method achieves significantly higher accuracy than existing extreme classification approaches, with very competitive training and prediction time.
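
A quick off-the-shelf illustration of the core observation, not the FC-BCFW algorithm: fitting a max-margin loss with an l1 penalty (here via scikit-learn's LinearSVC on synthetic data) drives most primal weights to exactly zero.

```python
# l1-penalized max-margin training yields an extremely sparse primal
# solution -- the property PD-Sparse exploits at extreme scale.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=2000, n_features=300, n_informative=50,
                           n_classes=20, random_state=0)
clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=0.1,
                max_iter=5000).fit(X, y)
print(f"fraction of exactly-zero weights: {np.mean(clf.coef_ == 0.0):.2%}")
```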

ICML Conference 2015 Conference Paper

A Convex Exemplar-based Approach to MAD-Bayes Dirichlet Process Mixture Models

  • Ian En-Hsu Yen
  • Xin Lin
  • Kai Zhong
  • Pradeep Ravikumar
  • Inderjit S. Dhillon

MAD-Bayes (MAP-based Asymptotic Derivations) has recently been proposed as a general technique for deriving scalable algorithms for Bayesian nonparametric models. However, the combinatorial nature of objective functions derived from MAD-Bayes yields hard optimization problems, for which current practice employs heuristic algorithms analogous to k-means to find a local minimum. In this paper, we consider the exemplar-based version of the MAD-Bayes formulation for DP and Hierarchical DP (HDP) mixture models. We show that an exemplar-based MAD-Bayes formulation can be relaxed to a convex structurally-regularized program that, under cluster-separation conditions, shares the same optimal solution as its combinatorial counterpart. An algorithm based on the Alternating Direction Method of Multipliers (ADMM) is then proposed to solve this program. In experiments on several benchmark data sets, the proposed method finds the optimal solution of the combinatorial problem and significantly improves on existing methods in terms of the exemplar-based objective.

NeurIPS Conference 2015 Conference Paper

Sparse Linear Programming via Primal and Dual Augmented Coordinate Descent

  • Ian En-Hsu Yen
  • Kai Zhong
  • Cho-Jui Hsieh
  • Pradeep Ravikumar
  • Inderjit Dhillon

Over the past decades, Linear Programming (LP) has been widely used in different areas and is considered one of the mature technologies in numerical optimization. However, the complexity of state-of-the-art algorithms (i.e., interior-point and primal/dual simplex methods) is still unsatisfactory for machine learning problems with huge numbers of variables and constraints. In this paper, we investigate a general LP algorithm based on the combination of Augmented Lagrangian and Coordinate Descent (AL-CD), giving an iteration complexity of $O((\log(1/\epsilon))^2)$ with $O(nnz(A))$ cost per iteration, where $nnz(A)$ is the number of non-zeros in the $m\times n$ constraint matrix $A$; in practice, an active-set strategy can further reduce the per-iteration cost to the order of non-zeros in the columns (rows) corresponding to the active primal (dual) variables. The algorithm thus yields a tractable alternative to standard LP methods for large-scale problems with sparse solutions and $nnz(A)\ll mn$. We conduct experiments on large-scale LP instances from $\ell_1$-regularized multi-class SVM, sparse inverse covariance estimation, and nonnegative matrix factorization, where the proposed approach finds solutions of $10^{-3}$ precision orders of magnitude faster than state-of-the-art implementations of interior-point and simplex methods.
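
A dense toy sketch of the AL-CD idea for a standard-form LP, min c'x subject to Ax = b, x >= 0: an augmented-Lagrangian outer loop with exact coordinate minimization inside. No active-set strategy or sparse data structures, so this is illustrative only, not the paper's implementation.

```python
# Augmented Lagrangian + coordinate descent for a standard-form LP (toy).
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 60
A = rng.normal(size=(m, n))
b = A @ rng.uniform(0.0, 1.0, size=n)    # constructed so the LP is feasible
c = rng.uniform(0.5, 1.5, size=n)        # positive costs keep the LP bounded

x, y, rho = np.zeros(n), np.zeros(m), 1.0
colsq = np.sum(A * A, axis=0)            # ||A_j||^2 for each column
r = A @ x - b                            # running residual Ax - b
for outer in range(100):
    for _ in range(20):                  # inner coordinate-descent sweeps
        for j in range(n):
            g = c[j] + A[:, j] @ (y + rho * r)         # partial derivative
            t = max(0.0, x[j] - g / (rho * colsq[j]))  # exact 1-D min, projected
            r += A[:, j] * (t - x[j])    # update the residual incrementally
            x[j] = t
    y += rho * r                         # multiplier (dual) update
print(f"objective {c @ x:.4f}, infeasibility {np.linalg.norm(r):.2e}")
```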

NeurIPS Conference 2014 Conference Paper

Proximal Quasi-Newton for Computationally Intensive L1-regularized M-estimators

  • Kai Zhong
  • Ian En-Hsu Yen
  • Inderjit Dhillon
  • Pradeep Ravikumar

We consider the class of optimization problems arising from computationally intensive L1-regularized M-estimators, where the function or gradient values are very expensive to compute. A particular instance of interest is the L1-regularized MLE for learning Conditional Random Fields (CRFs), which are a popular class of statistical models for structured prediction problems such as sequence labeling, alignment, and classification with a label taxonomy. L1-regularized MLEs for CRFs are particularly expensive to optimize since computing the gradient values requires an expensive inference step. In this work, we propose the use of a carefully constructed proximal quasi-Newton algorithm for such computationally intensive M-estimation problems, where we employ an aggressive active-set selection technique. In a key contribution of the paper, we show that our proximal quasi-Newton algorithm is provably super-linearly convergent even in the absence of strong convexity, by leveraging a restricted variant of strong convexity. In our experiments, the proposed algorithm converges considerably faster than the current state of the art on sequence labeling and hierarchical classification problems.
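
The proximal building block underneath the paper's method: the prox operator of the l1 norm is soft-thresholding. The sketch below uses it in its simplest first-order form (ISTA) on l1-regularized least squares; the paper replaces the gradient step with a quasi-Newton model plus an aggressive active-set strategy.

```python
# ISTA on l1-regularized least squares: the basic proximal step that
# proximal quasi-Newton methods refine with second-order information.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(0)
n, d = 200, 50
X = rng.normal(size=(n, d))
w_star = np.zeros(d)
w_star[:5] = rng.normal(size=5)             # sparse ground truth
y = X @ w_star + 0.1 * rng.normal(size=n)

lam = 0.1
L = np.linalg.norm(X, 2) ** 2 / n           # Lipschitz constant of the gradient
w = np.zeros(d)
for _ in range(500):
    grad = X.T @ (X @ w - y) / n
    w = soft_threshold(w - grad / L, lam / L)   # proximal gradient step

print("nonzeros:", np.count_nonzero(w),
      " error:", round(float(np.linalg.norm(w - w_star)), 4))
```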

YNIMG Journal 2011 Journal Article

Phase contrast imaging in neonates

  • Kai Zhong
  • Thomas Ernst
  • Steve Buchthal
  • Oliver Speck
  • Lynn Anderson
  • Linda Chang

Magnetic resonance phase images can yield superior gray and white matter contrast compared to conventional magnitude images. However, the underlying contrast mechanisms are not yet fully understood. Previous studies have been limited to high-field acquisitions in adult volunteers and patients. In this study, phase imaging in the neonatal brain is demonstrated for the first time. Compared to adults, phase differences between gray and white matter are significantly reduced, but not inverted, in neonates, whose brains have little myelination and few iron deposits. The remaining phase difference between the neonatal and adult brain may be due to a different macromolecule concentration in the unmyelinated neonatal brain, and thus a different water resonance frequency due to water–macromolecule exchange. Additionally, the susceptibility contrast from brain myelination can be studied separately in neonates during brain development. Magnetic resonance phase imaging is therefore suggested as a novel tool to study neonatal brain development and pathologies.

YNIMG Journal 2008 Journal Article

The molecular basis for gray and white matter contrast in phase imaging

  • Kai Zhong
  • Jochen Leupold
  • Dominik von Elverfeldt
  • Oliver Speck

Direct magnetic resonance phase images acquired at high field have been shown to yield gray and white matter contrast up to 10-fold higher than conventional magnitude images. However, the underlying contrast mechanism is not yet understood. This study demonstrates that the water resonance frequency is directly shifted by water–macromolecule exchange processes (0.040 ppm/mM for bovine serum albumin) and might be a major contributor to in vivo phase image contrast. Magnetic resonance phase imaging based on the proposed contrast mechanism could therefore potentially be applied to in vivo studies of pathologies at the macromolecular level.

YNIMG Journal 2007 Journal Article

MR-Encephalography: Fast multi-channel monitoring of brain physiology with magnetic resonance

  • Juergen Hennig
  • Kai Zhong
  • Oliver Speck

A new approach to measuring activation-related changes in the brain with magnetic resonance is described, offering a high temporal resolution of 10–100 measurements per second. This is achieved by simultaneous multi-channel reception, where the spatial resolution during continuous observation is determined by the sensitive volume of each coil alone, without any additional spatial encoding gradients. Experimental results demonstrate the very high sensitivity of this approach, which allows direct measurement and monitoring of the stimulus-dependent hemodynamic response as well as ECG- and breathing-related signal fluctuations. One-dimensional spatial encoding, either parallel or orthogonal to the cortex, demonstrates that vascular signals can be identified by the pronounced signal variation at the ECG frequency. Noise analysis at different frequencies reveals regional signal fluctuations in the frequency range between 2 and 10 Hz. Furthermore, initial results show that frequency changes on the order of <0.03 Hz, corresponding to <1 nanotesla, can be detected. In addition to its potential use in neuroscience studies, this new method opens a wide range of applications for fast physiological monitoring and can easily be combined with conventional high-resolution imaging.