Arrow Research search

Author name cluster

Yining Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

23 papers
2 author rows

Possible papers

23

TIST Journal 2025 Journal Article

Multi-Autonomous Underwater Vehicle Trajectory Planning in Ocean Current Based on Hierarchical Hunting and Evolutionary Learning

  • Bin Jiang
  • Yining Wang
  • Fanhui Kong
  • Jian Wang

In the context of rising demands for marine resource exploitation and scientific research, collaborative trajectory planning for multiple Autonomous Underwater Vehicles (AUVs) in complex underwater environments—marked by obstacles, ocean currents, and low visibility—remains a critical challenge. Although the Gray Wolf Optimization (GWO) algorithm has advanced multi-objective trajectory planning, it faces issues such as poor high-dimensional space adaptability, susceptibility to local optima, and insufficient constraint handling. To address these, this article proposes a multi-AUV trajectory planning algorithm (EA-GWO) based on evolutionary learning to improve GWO. The method optimizes multi-AUV trajectory planning by leveraging hierarchical population hunting behavior, integrating position update equations to prioritize population bootstrapping, and balancing exploration and exploitation via fitness-based population distribution. Experimental validation across general, ocean current, and threat environments compares EA-GWO with the traditional GWO and multiple population GWO (MP-GWO). For sailing time: in the general environment, EA-GWO reduces total time by 90.6% compared to GWO and 90.6% compared to MP-GWO; in the ocean current environment, it cuts time by 0.9% versus GWO and 2.4% versus MP-GWO; in the threat environment, it cuts time by 13.6% versus GWO and 14.9% versus MP-GWO. For sailing distance: in the general environment, EA-GWO shortens total distance by 9.8% compared to GWO and 3.4% compared to MP-GWO; in the ocean current environment, it reduces distance by 2.3% versus GWO and 4.9% versus MP-GWO; in the threat environment, it shortens distance by 5.5% versus GWO and 1.0% versus MP-GWO. In terms of convergence performance reflected by the fitness curve: across the three environments, EA-GWO demonstrates faster convergence speed. 
These results highlight that EA-GWO outperforms the other two algorithms in sailing time, distance, and convergence efficiency, verifying its effectiveness in real-time dynamic coordination and constraint handling for multi-AUV missions.
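The hierarchical hunting behavior underlying GWO can be illustrated with the textbook position-update equations: each wolf moves toward an average of guidance from the three best wolves (alpha, beta, delta), with a parameter $a$ decreased from 2 to 0 to shift from exploration to exploitation. This is a generic sketch of standard GWO on a toy cost, not the paper's EA-GWO (the evolutionary-learning and constraint-handling components are omitted, and all names are illustrative):

```python
import numpy as np

def gwo_step(wolves, fitness, a, rng):
    """One standard Grey Wolf Optimization position update.
    wolves : (n, d) array of candidate positions
    fitness: position -> scalar cost (lower is better)
    a      : exploration parameter, linearly decreased from 2 to 0"""
    scores = np.array([fitness(w) for w in wolves])
    alpha, beta, delta = wolves[np.argsort(scores)[:3]]   # hierarchy leaders
    new = np.empty_like(wolves)
    for i, x in enumerate(wolves):
        guided = []
        for leader in (alpha, beta, delta):
            A = 2 * a * rng.random(x.shape) - a   # large |A|: explore
            C = 2 * rng.random(x.shape)
            guided.append(leader - A * np.abs(C * leader - x))
        new[i] = np.mean(guided, axis=0)          # average leader guidance
    return new

# Minimize the sphere function as a toy stand-in for a trajectory cost.
rng = np.random.default_rng(0)
wolves = rng.uniform(-5, 5, size=(20, 3))
cost = lambda z: float(np.sum(z ** 2))
for t in range(200):
    a = 2 * (1 - t / 200)
    wolves = gwo_step(wolves, cost, a, rng)
best = min(cost(w) for w in wolves)
print(best)  # close to 0
```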

JBHI Journal 2024 Journal Article

Cross-Anatomy Transfer Learning via Shape-Aware Adaptive Fine-Tuning for 3D Vessel Segmentation

  • Tao Han
  • Danni Ai
  • Jingfan Fan
  • Hong Song
  • Deqiang Xiao
  • Yining Wang
  • Jian Yang

Deep learning methods have recently achieved remarkable performance in vessel segmentation applications, yet they require large amounts of labeled data that are labor-intensive to produce. To alleviate the requirement of manual annotation, transfer learning methods can potentially be used to acquire knowledge of tubular structures from public large-scale labeled vessel datasets for target vessel segmentation in other anatomic sites of the human body. However, cross-anatomy transfer is challenging due to the formidable discrepancy among vessel structures in different anatomies, which limits the performance of transfer learning. Therefore, we propose a cross-anatomy transfer learning framework for 3D vessel segmentation, which first generates a pre-trained model on a public hepatic vessel dataset and then adaptively fine-tunes our target segmentation network initialized from the model for segmentation of other anatomic vessels. In the framework, the adaptive fine-tuning strategy is presented to dynamically decide, via a proxy network, which filters of the target network to freeze or fine-tune for each input sample. Moreover, we develop a Gaussian-based signed distance map that explicitly encodes vessel-specific shape context. The prediction of the map is added as an auxiliary task in the segmentation network to capture geometry-aware knowledge during fine-tuning. We demonstrate the effectiveness of our method through extensive experiments on two small-scale datasets of coronary arteries and brain vessels. The results indicate that the proposed method effectively overcomes the discrepancy of cross-anatomy domain shift to achieve accurate vessel segmentation on these two datasets.

NeurIPS Conference 2024 Conference Paper

Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity

  • Qian Yu
  • Yining Wang
  • Baihe Huang
  • Qi Lei
  • Jason D. Lee

Optimization of convex functions under stochastic zeroth-order feedback has been a major and challenging question in online learning. In this work, we consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm has access only to noisy evaluations of the objective function at the points it queries. We provide the first tight characterization of the rate of the minimax simple regret by developing matching upper and lower bounds. We propose an algorithm that combines a bootstrapping stage with a mirror-descent stage. Our main technical innovation consists of a sharp characterization of the spherical-sampling gradient estimator under higher-order smoothness conditions, which allows the algorithm to optimally balance the bias-variance tradeoff, and a new iterative method for the bootstrapping stage, which maintains performance even when the Hessian is unbounded.
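The spherical-sampling gradient estimator mentioned above can be sketched generically. This is the standard two-point construction $\hat g = \frac{d}{2\delta}\left(f(x+\delta u) - f(x-\delta u)\right)u$ with $u$ uniform on the unit sphere, not necessarily the paper's exact estimator; all names here are illustrative.

```python
import numpy as np

def spherical_gradient(f, x, delta, n_samples, rng):
    """Two-point spherical-sampling gradient estimator (a standard
    construction; not claimed to be the paper's exact variant)."""
    d = x.size
    g = np.zeros(d)
    for _ in range(n_samples):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)            # uniform direction on the sphere
        g += (d / (2 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u
    return g / n_samples

rng = np.random.default_rng(1)
f = lambda z: float(z @ z)                # gradient is 2z
x = np.array([1.0, -2.0, 0.5])
g = spherical_gradient(f, x, delta=1e-3, n_samples=20000, rng=rng)
print(g)  # approximately [2, -4, 1]
```

For a quadratic objective the two-point estimator is unbiased, which is why the averaged estimate lands close to the true gradient here.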

JBHI Journal 2023 Journal Article

A novel measure of cardiopulmonary coupling during sleep based on the synchrosqueezing transform algorithm

  • Yining Wang
  • Wenbin Shi
  • Chien-Hung Yeh

Objective: This paper presents a novel method to quantify cardiopulmonary dynamics for automatic sleep apnea detection by integrating the synchrosqueezing transform (SST) algorithm with the standard cardiopulmonary coupling (CPC) method. Methods: Simulated data were designed to validate the reliability of the proposed method, with varying levels of signal bandwidth and noise contamination. Real data were collected from the Physionet sleep apnea database, consisting of 70 single-lead ECGs with expert-labeled apnea annotations on a minute-by-minute basis. Three signal processing techniques, namely the short-time Fourier transform, the continuous wavelet transform, and the synchrosqueezing transform, were applied to the sinus interbeat interval and respiratory time series. Subsequently, the CPC index was computed to construct sleep spectrograms. Features derived from these spectrograms were used as input to five machine-learning-based classifiers, including decision trees, support vector machines, and k-nearest neighbors. Results: The simulation results showed that the SST-CPC method is robust to both noise level and signal bandwidth, outperforming Fourier-based and wavelet-based approaches. Meanwhile, the SST-CPC spectrogram exhibited relatively explicit temporal-frequency biomarkers compared with the rest. Furthermore, by integrating SST-CPC features with commonly used heart rate and respiratory features, accuracy for per-minute apnea detection improved from 72% to 83%, validating the added value of CPC biomarkers in sleep apnea detection. Conclusion: The SST-CPC method improves the accuracy of automatic sleep apnea detection and performs comparably with automated algorithms reported in the literature. Significance: The proposed SST-CPC method enhances sleep diagnostic capabilities, and may serve as a complementary tool to the routine diagnosis of sleep respiratory events.
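For orientation, one common formulation of the CPC index multiplies the magnitude-squared coherence of the interbeat-interval and respiratory series by their cross-spectral power; the paper's contribution is computing the underlying time-frequency representation with SST rather than the Fourier transform. A minimal Fourier-based sketch on toy data (the exact weighting varies across implementations, and all names are illustrative):

```python
import numpy as np
from scipy.signal import coherence, csd

def cpc_spectrum(rr, resp, fs):
    """Product of magnitude-squared coherence and cross-spectral power
    (one common CPC formulation; not the paper's SST-based variant)."""
    f, coh = coherence(rr, resp, fs=fs, nperseg=256)
    _, pxy = csd(rr, resp, fs=fs, nperseg=256)
    return f, coh * np.abs(pxy)

# Toy series sharing a 0.25 Hz oscillation (respiratory sinus arrhythmia band).
fs = 4.0
t = np.arange(0, 300, 1 / fs)
rng = np.random.default_rng(2)
shared = np.sin(2 * np.pi * 0.25 * t)
rr = shared + 0.3 * rng.standard_normal(t.size)
resp = shared + 0.3 * rng.standard_normal(t.size)
f, cpc = cpc_spectrum(rr, resp, fs)
peak = f[np.argmax(cpc)]
print(peak)  # near 0.25 Hz, where the two series are coupled
```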

NeurIPS Conference 2023 Conference Paper

Sample Complexity for Quadratic Bandits: Hessian Dependent Bounds and Optimal Algorithms

  • Qian Yu
  • Yining Wang
  • Baihe Huang
  • Qi Lei
  • Jason D. Lee

In stochastic zeroth-order optimization, a problem of practical relevance is understanding how to fully exploit the local geometry of the underlying objective function. We consider a fundamental setting in which the objective function is quadratic, and provide the first tight characterization of the optimal Hessian-dependent sample complexity. Our contribution is twofold. First, from an information-theoretic point of view, we prove tight lower bounds on Hessian-dependent complexities by introducing a concept called "energy allocation", which captures the interaction between the searching algorithm and the geometry of objective functions. A matching upper bound is obtained by solving the optimal energy spectrum. Then, algorithmically, we show the existence of a Hessian-independent algorithm that universally achieves the asymptotic optimal sample complexities for all Hessian instances. The optimal sample complexities achieved by our algorithm remain valid for heavy-tailed noise distributions, which are enabled by a truncation method.

ICML Conference 2021 Conference Paper

Adversarial Combinatorial Bandits with General Non-linear Reward Functions

  • Yanjun Han
  • Yining Wang
  • Xi Chen

In this paper we study the adversarial combinatorial bandit with a known non-linear reward function, extending existing work on the adversarial linear combinatorial bandit. The adversarial combinatorial bandit with general non-linear reward is an important open problem in the bandit literature, and it is still unclear whether there is a significant gap from the case of linear reward, stochastic bandit, or semi-bandit feedback. We show that, with $N$ arms and subsets of $K$ arms being chosen at each of $T$ time periods, the minimax optimal regret is $\widetilde\Theta_{d}(\sqrt{N^d T})$ if the reward function is a $d$-degree polynomial with $d < K$, and $\Theta_K(\sqrt{N^K T})$ if the reward function is not a low-degree polynomial. Both bounds are significantly different from the bound $O(\sqrt{\mathrm{poly}(N, K)T})$ for the linear case, which suggests that there is a fundamental gap between the linear and non-linear reward structures. Our result also finds applications to the adversarial assortment optimization problem in online recommendation. We show that in the worst case of the adversarial assortment problem, the optimal algorithm must treat each of the $\binom{N}{K}$ assortments as independent.

JMLR Journal 2020 Journal Article

Dynamic Assortment Optimization with Changing Contextual Information

  • Xi Chen
  • Yining Wang
  • Yuan Zhou

In this paper, we study the dynamic assortment optimization problem over a finite selling season of length $T$. At each time period, the seller offers an arriving customer an assortment of substitutable products under a cardinality constraint, and the customer makes the purchase among offered products according to a discrete choice model. Most existing work associates each product with a real-valued fixed mean utility and assumes a multinomial logit (MNL) choice model. In many practical applications, feature/contextual information of products is readily available. In this paper, we incorporate the feature information by assuming a linear relationship between the mean utility and the feature. In addition, we allow the feature information of products to change over time so that the underlying choice model can also be non-stationary. To solve the dynamic assortment optimization under this changing contextual MNL model, we need to simultaneously learn the underlying unknown coefficient and make the decision on the assortment. To this end, we develop an upper confidence bound (UCB) based policy and establish a regret bound on the order of $\tilde{O}(d\sqrt{T})$, where $d$ is the dimension of the feature and $\tilde{O}$ suppresses logarithmic dependence. We further establish a lower bound $\Omega(d\sqrt{T}/K)$, where $K$ is the cardinality constraint of an offered assortment, which is usually small. When $K$ is a constant, our policy is optimal up to logarithmic factors. In the exploitation phase of the UCB algorithm, we need to solve a combinatorial optimization problem for assortment optimization based on the learned information. We further develop an approximation algorithm and an efficient greedy heuristic. The effectiveness of the proposed policy is further demonstrated by our numerical studies.
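The contextual MNL model at the core of this problem is easy to state concretely: product $i$ has mean utility $\theta^\top x_i$, a no-purchase option has utility 0, and choice probabilities follow the logit form. A minimal sketch of the model itself (not the paper's UCB policy; names and numbers are illustrative):

```python
import numpy as np

def mnl_choice_probs(theta, features):
    """Choice probabilities under a contextual MNL model: utility of
    product i is theta @ x_i; the no-purchase option has utility 0,
    so P(no purchase) = 1 / (1 + sum_j exp(u_j))."""
    w = np.exp(features @ theta)
    return w / (1.0 + w.sum())

def expected_revenue(theta, features, prices):
    """Expected per-period revenue of offering this assortment."""
    return float(mnl_choice_probs(theta, features) @ prices)

theta = np.array([0.5, -0.2])                       # unknown in the paper; learned online
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # one row per offered product
prices = np.array([3.0, 2.0, 4.0])
p = mnl_choice_probs(theta, features)
rev = expected_revenue(theta, features, prices)
print(p, rev)
```

The UCB policy would replace `theta` with an optimistic estimate and pick the assortment maximizing this expected revenue.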

IJCAI Conference 2019 Conference Paper

Co-Attentive Multi-Task Learning for Explainable Recommendation

  • Zhongxia Chen
  • Xiting Wang
  • Xing Xie
  • Tong Wu
  • Guoqing Bu
  • Yining Wang
  • Enhong Chen

Despite widespread adoption, recommender systems remain mostly black boxes. Recently, providing explanations about why items are recommended has attracted increasing attention due to its capability to enhance user trust and satisfaction. In this paper, we propose a co-attentive multi-task learning model for explainable recommendation. Our model improves both the prediction accuracy and the explainability of recommendation by fully exploiting the correlations between the recommendation task and the explanation task. In particular, we design an encoder-selector-decoder architecture inspired by the human information-processing model in cognitive psychology. We also propose a hierarchical co-attentive selector to effectively model the cross knowledge transferred for both tasks. Our model not only enhances prediction accuracy of the recommendation task, but also generates linguistic explanations that are fluent, useful, and highly personalized. Experiments on three public datasets demonstrate the effectiveness of our model.

NeurIPS Conference 2018 Conference Paper

How Many Samples are Needed to Estimate a Convolutional Neural Network?

  • Simon Du
  • Yining Wang
  • Xiyu Zhai
  • Sivaraman Balakrishnan
  • Russ Salakhutdinov
  • Aarti Singh

A widespread folklore for explaining the success of Convolutional Neural Networks (CNNs) is that CNNs use a more compact representation than the Fully-connected Neural Network (FNN) and thus require fewer training samples to accurately estimate their parameters. We initiate the study of rigorously characterizing the sample complexity of estimating CNNs. We show that for an $m$-dimensional convolutional filter with linear activation acting on a $d$-dimensional input, the sample complexity of achieving population prediction error of $\epsilon$ is $\widetilde{O}(m/\epsilon^2)$, whereas the sample complexity for its FNN counterpart is lower bounded by $\Omega(d/\epsilon^2)$ samples. Since in typical settings $m \ll d$, this result demonstrates the advantage of using a CNN. We further consider the sample complexity of estimating a one-hidden-layer CNN with linear activation where both the $m$-dimensional convolutional filter and the $r$-dimensional output weights are unknown. For this model, we show that the sample complexity is $\widetilde{O}\left((m+r)/\epsilon^2\right)$ when the ratio between the stride size and the filter size is a constant. For both models, we also present lower bounds showing our sample complexities are tight up to logarithmic factors. Our main tools for deriving these results are a localized empirical process analysis and a new lemma characterizing the convolutional structure. We believe that these tools may inspire further developments in understanding CNNs.

NeurIPS Conference 2018 Conference Paper

Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models

  • Yining Wang
  • Xi Chen
  • Yuan Zhou

In this paper we consider the dynamic assortment selection problem under an uncapacitated multinomial-logit (MNL) model. By carefully analyzing a revenue potential function, we show that a trisection-based algorithm achieves an item-independent regret bound of $O(\sqrt{T \log\log T})$, which matches information-theoretic lower bounds up to iterated logarithmic terms. Our proof technique draws tools from the unimodal/convex bandit literature as well as adaptive confidence parameters in minimax multi-armed bandit problems.
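The trisection idea can be sketched in its noiseless form: for a unimodal function, comparing values at two interior points lets one discard a third of the search interval per step. The paper's algorithm wraps this skeleton with confidence bounds to handle noisy revenue estimates; the toy objective below is illustrative only.

```python
def trisection_max(g, lo, hi, tol=1e-8):
    """Trisection search for the maximizer of a unimodal function on
    [lo, hi] (noiseless skeleton; the paper's version handles noise)."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1          # maximizer must lie in [m1, hi]
        else:
            hi = m2          # maximizer must lie in [lo, m2]
    return (lo + hi) / 2

x_star = trisection_max(lambda x: -(x - 0.3) ** 2, 0.0, 1.0)
print(x_star)  # close to 0.3
```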

NeurIPS Conference 2018 Conference Paper

Optimization of Smooth Functions with Noisy Observations: Local Minimax Rates

  • Yining Wang
  • Sivaraman Balakrishnan
  • Aarti Singh

We consider the problem of global optimization of an unknown non-convex smooth function with noisy zeroth-order feedback. We propose a local minimax framework to study the fundamental difficulty of optimizing smooth functions with adaptive function evaluations. We show that for functions with fast growth around their global minima, carefully designed optimization algorithms can identify a near global minimizer with many fewer queries than worst-case global minimax theory predicts. For the special case of strongly convex and smooth functions, our implied convergence rates match the ones developed for zeroth-order convex optimization problems. On the other hand, we show that in the worst case no algorithm can converge faster than the minimax rate of estimating an unknown function in $\ell_\infty$-norm. Finally, we show that non-adaptive algorithms, although optimal in a global minimax sense, do not attain the optimal local minimax rate.

IJCAI Conference 2018 Conference Paper

Phrase Table as Recommendation Memory for Neural Machine Translation

  • Yang Zhao
  • Yining Wang
  • Jiajun Zhang
  • Chengqing Zong

Neural Machine Translation (NMT) has recently drawn much attention due to its promising translation performance. However, several studies indicate that NMT often generates fluent but unfaithful translations. In this paper, we propose a method to alleviate this problem by using a phrase table as recommendation memory. The main idea is to add a bonus to words worthy of recommendation, so that NMT can make correct predictions. Specifically, we first derive a prefix tree to accommodate all the candidate target phrases by searching the phrase translation table according to the source sentence. Then, we construct a recommendation word set by matching between candidate target phrases and previously translated target words by NMT. After that, we determine the specific bonus value for each recommendable word by using the attention vector and phrase translation probability. Finally, we integrate this bonus value into NMT to improve the translation results. The extensive experiments demonstrate that the proposed method obtains remarkable improvements over a strong attention-based NMT baseline.

JMLR Journal 2018 Journal Article

Provably Correct Algorithms for Matrix Column Subset Selection with Selectively Sampled Data

  • Yining Wang
  • Aarti Singh

We consider the problem of matrix column subset selection, which selects a subset of columns from an input matrix such that the input can be well approximated by the span of the selected columns. Column subset selection has been applied to numerous real-world data applications such as population genetics summarization, electronic circuits testing and recommendation systems. In many applications the complete data matrix is unavailable and one needs to select representative columns by inspecting only a small portion of the input matrix. In this paper we propose the first provably correct column subset selection algorithms for partially observed data matrices. Our proposed algorithms exhibit different merits and limitations in terms of statistical accuracy, computational efficiency, sample complexity and sampling schemes, which provides a nice exploration of the tradeoff between these desired properties for column subset selection. The proposed methods employ the idea of feedback-driven sampling and are inspired by several sampling schemes previously introduced for low-rank matrix approximation tasks (Drineas et al., 2008; Frieze et al., 2004; Deshpande and Vempala, 2006; Krishnamurthy and Singh, 2014). Our analysis shows that, under the assumption that the input data matrix has incoherent rows but possibly coherent columns, all algorithms provably converge to the best low-rank approximation of the original data as the number of selected columns increases. Furthermore, two of the proposed algorithms enjoy a relative error bound, which is preferred for column subset selection and matrix approximation purposes. We also demonstrate through both theoretical and empirical analysis the power of feedback-driven sampling compared to uniform random sampling on input matrices with highly correlated columns.
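As a point of reference, the norm-based column sampling of Frieze et al. (2004) that these algorithms build on can be sketched as follows (illustrative only; the paper's algorithms operate on partially observed matrices with feedback-driven sampling):

```python
import numpy as np

def norm_column_sampling(A, k, rng):
    """Sample k columns with probability proportional to squared column
    norms (the Frieze-Kannan-Vempala scheme; a building block, not the
    paper's partial-observation variant)."""
    probs = np.sum(A ** 2, axis=0)
    probs /= probs.sum()
    return rng.choice(A.shape[1], size=k, replace=False, p=probs)

rng = np.random.default_rng(3)
# Exactly rank-2 matrix: a few sampled columns should span its column space.
A = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 30))
idx = norm_column_sampling(A, 5, rng)
C = A[:, idx]
# Project A onto span(C) and measure the relative reconstruction error.
proj = C @ np.linalg.lstsq(C, A, rcond=None)[0]
err = np.linalg.norm(A - proj) / np.linalg.norm(A)
print(err)  # near zero, since A is exactly rank 2
```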

JMLR Journal 2017 Journal Article

On Computationally Tractable Selection of Experiments in Measurement-Constrained Regression Models

  • Yining Wang
  • Adams Wei Yu
  • Aarti Singh

We derive computationally tractable methods to select a small subset of experiment settings from a large pool of given design points. The primary focus is on linear regression models, while the technique extends to generalized linear models and Delta's method (estimating functions of linear regression models) as well. The algorithms are based on a continuous relaxation of an otherwise intractable combinatorial optimization problem, with sampling or greedy procedures as post-processing steps. Formal approximation guarantees are established for both algorithms, and numerical results on both synthetic and real-world data confirm the effectiveness of the proposed methods.

NeurIPS Conference 2017 Conference Paper

On the Power of Truncated SVD for General High-rank Matrix Estimation Problems

  • Simon Du
  • Yining Wang
  • Aarti Singh

We show that given an estimate $\widehat{\mathbf{A}}$ that is close to a general high-rank positive semi-definite (PSD) matrix $\mathbf{A}$ in spectral norm (i.e., $\|\widehat{\mathbf{A}}-\mathbf{A}\|_2 \leq \delta$), the simple truncated singular value decomposition of $\widehat{\mathbf{A}}$ produces a multiplicative approximation of $\mathbf{A}$ in Frobenius norm. This observation leads to many interesting results on general high-rank matrix estimation problems: 1. High-rank matrix completion: we show that it is possible to recover a general high-rank matrix $\mathbf{A}$ up to $(1+\varepsilon)$ relative error in Frobenius norm from partial observations, with sample complexity independent of the spectral gap of $\mathbf{A}$. 2. High-rank matrix denoising: we design algorithms that recover a matrix $\mathbf{A}$ with relative error in Frobenius norm from its noise-perturbed observations, without assuming $\mathbf{A}$ is exactly low-rank. 3. Low-dimensional estimation of high-dimensional covariance: given $N$ i.i.d. samples of dimension $n$ from $\mathcal{N}_n(\mathbf{0}, \mathbf{A})$, we show that it is possible to estimate the covariance matrix $\mathbf{A}$ with relative error in Frobenius norm with $N \approx n$, improving over classical covariance estimation results which require $N \approx n^2$.
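The estimator analyzed here is simply the rank-$k$ truncation of the noisy estimate's SVD. A small numerical sketch on toy data (the constants and rates from the paper are not reproduced; sizes and the noise level are illustrative):

```python
import numpy as np

def truncated_svd(M, k):
    """Rank-k truncation of M via SVD: keep the top-k singular triplets."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

rng = np.random.default_rng(4)
n = 60
# Full-rank PSD matrix with decaying spectrum 1, 1/2, 1/3, ...
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(1.0 / np.arange(1, n + 1)) @ Q.T
# Symmetric perturbation with spectral norm 1e-3, playing the role of delta.
S = rng.standard_normal((n, n)); S = S + S.T
A_hat = A + 1e-3 * S / np.linalg.norm(S, 2)
A_k = truncated_svd(A_hat, k=10)
rel = np.linalg.norm(A - A_k) / np.linalg.norm(A)
print(rel)  # modest relative Frobenius error despite A being full-rank
```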

ICML Conference 2017 Conference Paper

Sequence Modeling via Segmentations

  • Chong Wang 0002
  • Yining Wang
  • Po-Sen Huang
  • Abdel-rahman Mohamed
  • Dengyong Zhou
  • Li Deng 0001

Segmental structure is a common pattern in many types of sequences such as phrases in human languages. In this paper, we present a probabilistic model for sequences via their segmentations. The probability of a segmented sequence is calculated as the product of the probabilities of all its segments, where each segment is modeled using existing tools such as recurrent neural networks. Since the segmentation of a sequence is usually unknown in advance, we sum over all valid segmentations to obtain the final probability for the sequence. An efficient dynamic programming algorithm is developed for forward and backward computations without resorting to any approximation. We demonstrate our approach on text segmentation and speech recognition tasks. In addition to quantitative results, we also show that our approach can discover meaningful segments in their respective application contexts.
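The sum over all valid segmentations admits a standard forward dynamic program: if $\alpha_j$ is the total log-probability of the length-$j$ prefix, then $\alpha_j = \log\sum_{i<j}\exp\big(\alpha_i + \log p(x_{i+1..j})\big)$. A minimal sketch with a toy segment model standing in for the paper's RNN (the geometric segment probability is purely illustrative):

```python
import math

def _logaddexp(a, b):
    if a == -math.inf:
        return b
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def log_sum_over_segmentations(seq, log_seg_prob, max_len):
    """Forward DP summing segment probabilities over all segmentations.
    log_seg_prob can be any segment model (an RNN in the paper)."""
    n = len(seq)
    alpha = [-math.inf] * (n + 1)
    alpha[0] = 0.0                      # empty prefix has probability 1
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            alpha[j] = _logaddexp(alpha[j], alpha[i] + log_seg_prob(seq[i:j]))
    return alpha[n]

# Toy segment model: a segment of length L has probability 0.5**L * 0.5.
toy = lambda seg: len(seg) * math.log(0.5) + math.log(0.5)
total = math.exp(log_sum_over_segmentations("abcd", toy, max_len=4))
print(total)
```

With this toy model the eight segmentations of a length-4 sequence can also be enumerated by hand, which makes the DP easy to check.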

NeurIPS Conference 2016 Conference Paper

Data Poisoning Attacks on Factorization-Based Collaborative Filtering

  • Bo Li
  • Yining Wang
  • Aarti Singh
  • Yevgeniy Vorobeychik

Recommendation and collaborative filtering systems are important in modern information and e-commerce applications. As these systems are becoming increasingly popular in industry, their outputs could affect business decision making, introducing incentives for an adversarial party to compromise the availability or integrity of such systems. We introduce a data poisoning attack on collaborative filtering systems. We demonstrate how a powerful attacker with full knowledge of the learner can generate malicious data so as to maximize his/her malicious objectives, while at the same time mimicking normal user behaviors to avoid being detected. While the complete knowledge assumption seems extreme, it enables a robust assessment of the vulnerability of collaborative filtering schemes to highly motivated attacks. We present efficient solutions for two popular factorization-based collaborative filtering algorithms: the alternating minimization formulation and the nuclear norm minimization method. Finally, we test the effectiveness of our proposed algorithms on real-world data and discuss potential defensive strategies.

AAAI Conference 2016 Conference Paper

Noise-Adaptive Margin-Based Active Learning and Lower Bounds under Tsybakov Noise Condition

  • Yining Wang
  • Aarti Singh

We present a simple noise-robust margin-based active learning algorithm to find homogeneous (passing through the origin) linear separators and analyze its error convergence when labels are corrupted by noise. We show that when the imposed noise satisfies the Tsybakov low noise condition (Mammen, Tsybakov, and others 1999; Tsybakov 2004) the algorithm is able to adapt to an unknown level of noise and achieves the optimal statistical rate up to polylogarithmic factors. We also derive lower bounds for margin-based active learning algorithms under Tsybakov noise conditions (TNC) for the membership query synthesis scenario (Angluin 1988). Our result implies lower bounds for the stream-based selective sampling scenario (Cohn 1990) under TNC for some fairly simple data distributions. Quite surprisingly, we show that the sample complexity cannot be improved even if the underlying data distribution is as simple as the uniform distribution on the unit ball. Our proof involves the construction of a well-separated hypothesis set on the d-dimensional unit ball along with carefully designed label distributions for the Tsybakov noise condition. Our analysis might provide insights for other forms of lower bounds as well.

NeurIPS Conference 2016 Conference Paper

Online and Differentially-Private Tensor Decomposition

  • Yining Wang
  • Anima Anandkumar

Tensor decomposition is positioned to be a pervasive tool in the era of big data. In this paper, we resolve many of the key algorithmic questions regarding robustness, memory efficiency, and differential privacy of tensor decomposition. We propose simple variants of the tensor power method which enjoy these strong properties. We propose the first streaming method with a linear memory requirement. Moreover, we present a noise-calibrated tensor power method with efficient privacy guarantees. At the heart of all these guarantees lies a careful perturbation analysis derived in this paper which significantly improves upon existing results.
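The tensor power method these variants build on can be sketched for a symmetric third-order tensor: repeatedly apply the tensor to the current vector and renormalize. This shows only the deterministic core on a rank-1 toy tensor; the paper's streaming and privacy mechanisms are omitted, and all names are illustrative.

```python
import numpy as np

def tensor_power_iteration(T, n_iters, rng):
    """Tensor power method for a symmetric 3rd-order tensor:
    iterate u <- T(I, u, u) / ||T(I, u, u)|| to recover one component."""
    d = T.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    for _ in range(n_iters):
        u = np.einsum('ijk,j,k->i', T, u, u)   # contract along two modes
        u /= np.linalg.norm(u)
    lam = np.einsum('ijk,i,j,k->', T, u, u, u)  # recovered eigenvalue
    return lam, u

rng = np.random.default_rng(5)
# Rank-1 symmetric tensor 2 * v (x) v (x) v with a unit vector v.
v = np.array([3.0, 4.0]) / 5.0
T = 2.0 * np.einsum('i,j,k->ijk', v, v, v)
lam, u = tensor_power_iteration(T, 30, rng)
print(lam, u)  # recovers 2 and v
```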

NeurIPS Conference 2015 Conference Paper

Differentially private subspace clustering

  • Yining Wang
  • Yu-Xiang Wang
  • Aarti Singh

Subspace clustering is an unsupervised learning problem that aims at grouping data points into multiple "clusters" so that data points in a single cluster lie approximately on a low-dimensional linear subspace. It was originally motivated by 3D motion segmentation in computer vision, but has recently been generically applied to a wide range of statistical machine learning problems, which often involve sensitive datasets about human subjects. This raises a dire concern for data privacy. In this work, we build on the framework of "differential privacy" and present two provably private subspace clustering algorithms. We demonstrate via both theory and experiments that one of the presented methods enjoys formal privacy and utility guarantees; the other one asymptotically preserves differential privacy while having good performance in practice. Along the course of the proof, we also obtain two new provable guarantees for the agnostic subspace clustering and the graph connectivity problem which might be of independent interest.

NeurIPS Conference 2015 Conference Paper

Fast and Guaranteed Tensor Decomposition via Sketching

  • Yining Wang
  • Hsiao-Yu Tung
  • Alexander Smola
  • Anima Anandkumar

Tensor CANDECOMP/PARAFAC (CP) decomposition has wide applications in statistical learning of latent variable models and in data mining. In this paper, we propose fast and randomized tensor CP decomposition algorithms based on sketching. We build on the idea of count sketches, but introduce many novel ideas which are unique to tensors. We develop novel methods for randomized computation of tensor contractions via FFTs, without explicitly forming the tensors. Such tensor contractions are encountered in decomposition methods such as tensor power iterations and alternating least squares. We also design novel colliding hashes for symmetric tensors to further save time in computing the sketches. We then combine these sketching ideas with existing whitening and tensor power iterative techniques to obtain the fastest algorithm on both sparse and dense tensors. The quality of approximation under our method does not depend on properties such as sparsity, uniformity of elements, etc. We apply the method for topic modeling and obtain competitive results.
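The count-sketch building block can be sketched for a plain vector (illustrative only; the paper's contribution is extending such sketches to tensor contractions via FFTs and colliding hashes):

```python
import numpy as np

def count_sketch(x, b, seed):
    """Count sketch of a vector: hash each coordinate to one of b buckets
    with a random sign, then accumulate (the classic Charikar-Chen-
    Farach-Colton construction, not the paper's tensor variant)."""
    rng = np.random.default_rng(seed)
    h = rng.integers(0, b, size=x.size)        # bucket hash
    s = rng.choice([-1.0, 1.0], size=x.size)   # sign hash
    sk = np.zeros(b)
    np.add.at(sk, h, s * x)                    # signed bucket sums
    return sk, h, s

def estimate(sk, h, s, i):
    """Unbiased estimate of x[i] recovered from the sketch."""
    return s[i] * sk[h[i]]

x = np.zeros(1000); x[7] = 5.0; x[42] = -3.0   # sparse 1000-dim vector
sk, h, s = count_sketch(x, b=64, seed=0)
est = estimate(sk, h, s, 7)
print(est)  # close to 5, up to a possible bucket collision with x[42]
```

Because the sign hash makes collisions cancel in expectation, averaging over several independent sketches sharpens the estimate.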

AAAI Conference 2014 Conference Paper

Small-Variance Asymptotics for Dirichlet Process Mixtures of SVMs

  • Yining Wang
  • Jun Zhu

Infinite SVM (iSVM) is a Dirichlet process (DP) mixture of large-margin classifiers. Though flexible in learning nonlinear classifiers and discovering latent clustering structures, iSVM has a difficult inference task, and existing methods could hinder its applicability to large-scale problems. This paper presents a small-variance asymptotic analysis to derive a simple and efficient algorithm, which monotonically optimizes a max-margin DP-means (M2DPM) problem, an extension of DP-means for both predictive learning and descriptive clustering. Our analysis is built on Gibbs infinite SVMs, an alternative DP mixture of large-margin machines, which admits a partially collapsed Gibbs sampler without truncation by exploring data augmentation techniques. Experimental results show that M2DPM runs much faster than similar algorithms without sacrificing prediction accuracies.

NeurIPS Conference 2014 Conference Paper

Spectral Methods for Supervised Topic Models

  • Yining Wang
  • Jun Zhu

Supervised topic models simultaneously model the latent topic structure of large collections of documents and a response variable associated with each document. Existing inference methods are based on either variational approximation or Monte Carlo sampling. This paper presents a novel spectral decomposition algorithm to recover the parameters of supervised latent Dirichlet allocation (sLDA) models. The Spectral-sLDA algorithm is provably correct and computationally efficient. We prove a sample complexity bound and subsequently derive a sufficient condition for the identifiability of sLDA. Thorough experiments on a diverse range of synthetic and real-world datasets verify the theory and demonstrate the practical effectiveness of the algorithm.