Author name cluster

Michael Lyu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

20 papers

1 author row

AAAI Conference 2020 Conference Paper

Few Shot Network Compression via Cross Distillation

Haoli Bai
Jiaxiang Wu
Irwin King
Michael Lyu

Model compression has been widely adopted to obtain lightweight deep neural networks. Most prevalent methods, however, require fine-tuning with sufficient training data to ensure accuracy, which could be challenged by privacy and security issues. As a compromise between privacy and performance, in this paper we investigate few shot network compression: given few samples per class, how can we effectively compress the network with negligible performance drop? The core challenge of few shot network compression lies in high estimation errors from the original network during inference, since the compressed network can easily overfit the few training instances. The estimation errors could propagate and accumulate layer by layer and finally deteriorate the network output. To address the problem, we propose cross distillation, a novel layer-wise knowledge distillation approach. By interweaving hidden layers of the teacher and student networks, layer-wise accumulated estimation errors can be effectively reduced. The proposed method offers a general framework compatible with prevalent network compression techniques such as pruning. Extensive experiments on benchmark datasets demonstrate that cross distillation can significantly improve the student network’s accuracy when only a few training instances are available.

AAAI Conference 2020 Conference Paper

Real-Time Emotion Recognition via Attention Gated Hierarchical Memory Network

Wenxiang Jiao
Michael Lyu
Irwin King

Real-time emotion recognition (RTER) in conversations is significant for developing emotionally intelligent chatting machines. Without the future context in RTER, it becomes critical to build the memory bank carefully for capturing historical context and to summarize the memories appropriately to retrieve relevant information. We propose an Attention Gated Hierarchical Memory Network (AGHMN) to address the problems of prior work: (1) Commonly used convolutional neural networks (CNNs) for utterance feature extraction are less compatible with the memory modules; (2) Unidirectional gated recurrent units (GRUs) only allow each historical utterance to have context before it, preventing information propagation in the opposite direction; (3) The Soft Attention for summarizing loses the positional and ordering information of memories, regardless of how the memory bank is built. Particularly, we propose a Hierarchical Memory Network (HMN) with a bidirectional GRU (BiGRU) as the utterance reader and a BiGRU fusion layer for the interaction between historical utterances. For memory summarizing, we propose an Attention GRU (AGRU) where we utilize the attention weights to update the internal state of the GRU. We further extend the AGRU to a bidirectional variant (BiAGRU) to balance the contextual information from recent memories and that from distant memories. We conduct experiments on two emotion conversation datasets with extensive analysis, demonstrating the efficacy of our AGHMN models.
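The idea of letting attention weights drive the GRU state update can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: it assumes one scalar attention weight per memory takes the place of the GRU update gate, and `toy_candidate` is a hypothetical, weight-free stand-in for the learned candidate-state computation (a real model would use trained weights and a reset gate).

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def attention_gru(
    inputs: Sequence[Vector],
    attention: Sequence[float],
    candidate: Callable[[Vector, Vector], Vector],
    h0: Vector,
) -> Vector:
    """Run an attention-gated GRU-style recurrence and return the final state.

    Assumed update: h_t = a_t * h_cand + (1 - a_t) * h_{t-1}, where the scalar
    attention weight a_t replaces the usual update gate.
    """
    if len(inputs) != len(attention):
        raise ValueError("need one attention weight per input")
    h = list(h0)
    for x_t, a_t in zip(inputs, attention):
        h_cand = candidate(x_t, h)
        # Blend the candidate with the previous state using the attention weight.
        h = [a_t * c + (1.0 - a_t) * p for c, p in zip(h_cand, h)]
    return h


def toy_candidate(x: Vector, h: Vector) -> Vector:
    # Hypothetical placeholder: elementwise tanh(x + h), no learned parameters.
    return [math.tanh(xi + hi) for xi, hi in zip(x, h)]
```

With an attention weight of 0 a memory leaves the state untouched, and with a weight of 1 it overwrites the state with the candidate, which is how attention can preserve ordering information that a plain weighted sum discards.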

NeurIPS Conference 2020 Conference Paper

Revisiting Parameter Sharing for Automatic Neural Channel Number Search

Jiaxing Wang
Haoli Bai
Jiaxiang Wu
Xupeng Shi
Junzhou Huang
Irwin King
Michael Lyu
Jian Cheng

Recent advances in neural architecture search inspire many channel number search algorithms (CNS) for convolutional neural networks. To improve search efficiency, parameter sharing is widely applied, which reuses parameters among different channel configurations. Nevertheless, it is unclear how parameter sharing affects the search process. In this paper, we aim at providing a better understanding and exploitation of parameter sharing for CNS. Specifically, we propose affine parameter sharing (APS) as a general formulation to unify and quantitatively analyze existing channel search algorithms. We find that with parameter sharing, weight updates of one architecture can simultaneously benefit other candidates. However, parameter sharing also results in less confidence in choosing good architectures. We thus propose a new strategy of parameter sharing towards a better balance between training efficiency and architecture discrimination. Extensive analysis and experiments demonstrate the superiority of the proposed strategy in channel configuration against many state-of-the-art counterparts on benchmark datasets.
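As a rough illustration of why weight updates for one channel configuration can benefit others under parameter sharing, the sketch below uses one common, simple sharing scheme: a configuration with c output channels reuses the first c filters of a single shared weight bank. This scheme is an assumption chosen for illustration; it is not the paper's affine parameter sharing formulation, and the class `SharedConv` is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


class SharedConv:
    """Toy layer whose candidate widths share one bank of flattened filters.

    Hypothetical scheme: width c uses the first c filters of the bank.
    """

    def __init__(self, max_channels: int, filter_size: int) -> None:
        self.weights: Matrix = [[0.0] * filter_size for _ in range(max_channels)]

    def weights_for(self, channels: int) -> Matrix:
        # The slice holds references to the same inner lists, so any update
        # made through one width is visible to every wider width as well.
        return self.weights[:channels]

    def sgd_step(self, channels: int, grads: Matrix, lr: float) -> None:
        """Apply a gradient step to the filters used by the sampled width only."""
        if len(grads) != channels:
            raise ValueError("need one gradient row per used filter")
        for row, g_row in zip(self.weights_for(channels), grads):
            for k, g in enumerate(g_row):
                row[k] -= lr * g
```

Training width 2 here also moves the first two filters of width 3 and width 4, which is the efficiency benefit the abstract describes; the flip side is that candidates become harder to tell apart, since they share most of their weights.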

NeurIPS Conference 2020 Conference Paper

Unsupervised Text Generation by Learning from Search

Jingjing Li
Zichao Li
Lili Mou
Xin Jiang
Michael Lyu
Irwin King

In this work, we propose TGLS, a novel framework for unsupervised Text Generation by Learning from Search. We start by applying a strong search algorithm (in particular, simulated annealing) towards a heuristically defined objective that (roughly) estimates the quality of sentences. Then, a conditional generative model learns from the search results and meanwhile smooths out the noise of search. The alternation between search and learning can be repeated for performance bootstrapping. We demonstrate the effectiveness of TGLS on two real-world natural language generation tasks, unsupervised paraphrasing and text formalization. Our model significantly outperforms unsupervised baseline methods in both tasks. In particular, it achieves performance comparable to strong supervised methods for paraphrase generation.
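The search half of the search-and-learn loop relies on simulated annealing. The following is a generic simulated annealing sketch in Python, not TGLS itself: the toy integer objective, the ±1 proposal, and the geometric cooling schedule are illustrative assumptions standing in for the paper's sentence-quality objective and its edits to sentences.

```python
import math
import random
from typing import Callable, TypeVar

S = TypeVar("S")


def simulated_annealing(
    initial: S,
    score: Callable[[S], float],
    propose: Callable[[S, random.Random], S],
    steps: int = 1000,
    t0: float = 1.0,
    cooling: float = 0.995,
    seed: int = 0,
) -> S:
    """Maximize `score` by simulated annealing and return the best state seen."""
    rng = random.Random(seed)
    current, current_score = initial, score(initial)
    best, best_score = current, current_score
    temperature = t0
    for _ in range(steps):
        candidate = propose(current, rng)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Metropolis rule: always accept improvements, accept a worse
        # candidate with probability exp(delta / temperature).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
        # Geometric cooling, floored to avoid dividing by zero.
        temperature = max(temperature * cooling, 1e-8)
    return best
```

For example, `simulated_annealing(0, lambda x: -(x - 7) ** 2, lambda x, rng: x + rng.choice([-1, 1]), steps=2000)` should climb from 0 to the maximizer 7. Accepting occasional worse moves early on, while the temperature is high, is what lets the search escape poor local optima; in a text setting the proposals would be sentence edits and the score a heuristic quality estimate.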

IJCAI Conference 2019 Conference Paper

Difficulty Controllable Generation of Reading Comprehension Questions

Yifan Gao
Lidong Bing
Wang Chen
Michael Lyu
Irwin King

We investigate the difficulty levels of questions in reading comprehension datasets such as SQuAD, and propose a new question generation setting, named Difficulty-controllable Question Generation (DQG). Taking as input a sentence in the reading comprehension paragraph and some of its text fragments (i.e., answers) that we want to ask questions about, a DQG method needs to generate questions, each of which has a given text fragment as its answer, while the generation is controlled by specified difficulty labels: the output questions should satisfy the specified difficulty as much as possible. To solve this task, we propose an end-to-end framework to generate questions of designated difficulty levels by exploring a few important intuitions. For evaluation, we prepared the first dataset of reading comprehension questions with difficulty labels. The results show that the questions generated by our framework not only have better quality under metrics like BLEU but also comply with the specified difficulty labels.

IJCAI Conference 2019 Conference Paper

Parallel Wasserstein Generative Adversarial Nets with Multiple Discriminators

Yuxin Su
Shenglin Zhao
Xixian Chen
Irwin King
Michael Lyu

Wasserstein Generative Adversarial Nets (WGANs) are recently proposed GAN algorithms that are widely used in computer vision, web mining, information retrieval, etc. However, the existing algorithms with approximated Wasserstein loss converge slowly due to heavy computation cost and usually generate unstable results as well. In this paper, we solve the computation cost problem by speeding up Wasserstein GANs with a well-designed, communication-efficient parallel architecture. Specifically, we develop a new problem formulation targeting the accurate evaluation of Wasserstein distance and propose an easily parallelizable optimization algorithm to train Wasserstein GANs. Compared to traditional parallel architectures, our proposed framework is designed explicitly for the skewed parameter updates between the generator network and the discriminator network. Rigorous experiments reveal that our proposed framework achieves a significant improvement in convergence speed, with comparable stability in generating images, compared to state-of-the-art Wasserstein GAN algorithms.

NeurIPS Conference 2018 Conference Paper

Almost Optimal Algorithms for Linear Stochastic Bandits with Heavy-Tailed Payoffs

Han Shao
XiaoTian Yu
Irwin King
Michael Lyu

In linear stochastic bandits, it is commonly assumed that payoffs have sub-Gaussian noise. In this paper, under a weaker assumption on the noise, we study the problem of linear stochastic bandits with heavy-tailed payoffs (LinBET), where the distributions have finite moments of order $1+\epsilon$ for some $\epsilon\in (0, 1]$. We rigorously analyze the regret lower bound of LinBET as $\Omega(T^{\frac{1}{1+\epsilon}})$, implying that finite moments of order 2 (i.e., finite variances) yield the bound of $\Omega(\sqrt{T})$, with $T$ being the total number of rounds to play bandits. The provided lower bound also indicates that the state-of-the-art algorithms for LinBET are far from optimal. By adopting median of means with a well-designed allocation of decisions and truncation based on historical information, we develop two novel bandit algorithms, where the regret upper bounds match the lower bound up to polylogarithmic factors. To the best of our knowledge, we are the first to solve LinBET optimally in the sense of the polynomial order on $T$. Our proposed algorithms are evaluated on synthetic datasets and outperform the state-of-the-art results.
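Median of means is what makes heavy-tailed payoffs manageable: one extreme sample can spoil a plain average, but not the median of several group averages. A minimal scalar sketch in Python follows; the consecutive equal-size grouping is an assumption, and the paper's allocation of decisions in the linear bandit setting is not shown.

```python
import statistics
from typing import Sequence


def median_of_means(samples: Sequence[float], num_groups: int) -> float:
    """Median-of-means estimate of the mean of `samples`.

    Splits the samples into `num_groups` consecutive groups of equal size
    (any remainder is dropped), averages each group, and returns the median
    of the group means.
    """
    if not 1 <= num_groups <= len(samples):
        raise ValueError("num_groups must be between 1 and len(samples)")
    size = len(samples) // num_groups
    group_means = [
        statistics.fmean(samples[g * size:(g + 1) * size]) for g in range(num_groups)
    ]
    return statistics.median(group_means)
```

For example, with nine samples equal to 1.0 and one outlier of 1000.0, the plain mean is 100.9, whereas `median_of_means(data, 5)` returns 1.0 because the outlier corrupts only one of the five group means.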

AAAI Conference 2016 Conference Paper

STELLAR: Spatial-Temporal Latent Ranking for Successive Point-of-Interest Recommendation

Shenglin Zhao
Tong Zhao
Haiqin Yang
Michael Lyu
Irwin King

Successive point-of-interest (POI) recommendation in location-based social networks (LBSNs) has become a significant task since it helps users navigate a number of candidate POIs and provides the best POI recommendations based on users’ most recent check-in knowledge. However, all existing methods for successive POI recommendation only focus on modeling the correlation between POIs based on users’ check-in sequences, but ignore an important fact: successive POI recommendation is a time-subtle recommendation task. In fact, even with the same previous check-in information, users would prefer different successive POIs at different times. To capture the impact of time on successive POI recommendation, in this paper we propose a spatial-temporal latent ranking (STELLAR) method to explicitly model the interactions among user, POI, and time. In particular, the proposed STELLAR model is built upon a ranking-based pairwise tensor factorization framework with a fine-grained modeling of user-POI, POI-time, and POI-POI interactions for successive POI recommendation. Moreover, we propose a new interval-aware weight utility function to differentiate successive check-ins’ correlations, which breaks the time interval constraint in prior work. Evaluations on two real-world datasets demonstrate that the STELLAR model outperforms the state-of-the-art successive POI recommendation model by about 20% in Precision@5 and Recall@5.

AAAI Conference 2015 Conference Paper

Incorporating Implicit Link Preference Into Overlapping Community Detection

Hongyi Zhang
Irwin King
Michael Lyu

Community detection is an important technique to understand structures and patterns in complex networks. Recently, overlapping community detection has become a trend due to the ubiquity of overlapping and nested communities in the real world. However, existing approaches have ignored the use of implicit link preference information, i.e., links can reflect a node’s preference on the targets of connections it wants to build. This information has a strong impact on community detection since a node prefers to build links with nodes inside its community rather than with those outside it. In this paper, we propose a preference-based nonnegative matrix factorization (PNMF) model to incorporate implicit link preference information. Unlike conventional matrix factorization approaches, which simply approximate the original adjacency matrix in value, our model maximizes the likelihood of the preference order for each node by following the intuition that a node prefers its neighbors to other nodes. Our model overcomes the indiscriminate penalty problem in which non-linked pairs inside one community are penalized in objective functions as heavily as those across two communities. We propose a learning algorithm which can learn a node-community membership matrix via stochastic gradient descent with bootstrap sampling. We evaluate our PNMF model on several real-world networks. Experimental results show that our model outperforms state-of-the-art approaches and can be applied to large datasets.
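One way to read "maximizing the likelihood of the preference order" is a pairwise, BPR-style objective. The Python sketch below assumes that form, which may differ from the paper's exact PNMF objective and update: node i's preference score for node j is the inner product of their nonnegative community-membership rows, the log-likelihood that i prefers neighbor j over non-neighbor k is log σ(x_ij − x_ik), and each stochastic gradient step is followed by clipping at zero to keep memberships nonnegative. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def _dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def _log_sigmoid(z: float) -> float:
    # Numerically stable log(1 / (1 + exp(-z))).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def preference_log_likelihood(F: Matrix, i: int, j: int, k: int) -> float:
    """log P(node i prefers j over k), assuming x_ab = <F[a], F[b]>."""
    return _log_sigmoid(_dot(F[i], F[j]) - _dot(F[i], F[k]))


def sgd_step(F: Matrix, i: int, j: int, k: int, lr: float) -> None:
    """One gradient-ascent step on the (i, j, k) triple, then clip to >= 0 in place."""
    if len({i, j, k}) != 3:
        raise ValueError("i, j, k must be distinct nodes")
    z = _dot(F[i], F[j]) - _dot(F[i], F[k])
    scale = math.exp(_log_sigmoid(-z))  # d log(sigmoid(z)) / dz = sigmoid(-z)
    fi, fj, fk = list(F[i]), list(F[j]), list(F[k])  # pre-update copies
    F[i] = [max(0.0, a + lr * scale * (b - c)) for a, b, c in zip(fi, fj, fk)]
    F[j] = [max(0.0, b + lr * scale * a) for a, b in zip(fi, fj)]
    F[k] = [max(0.0, c - lr * scale * a) for a, c in zip(fi, fk)]
```

Because only the ordering of x_ij over x_ik is scored, a non-linked pair inside the same community is not pushed toward zero as hard as a pair across communities, which is the intuition behind avoiding the indiscriminate penalty.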

AAAI Conference 2015 Conference Paper

Kernelized Online Imbalanced Learning with Fixed Budgets

Junjie Hu
Haiqin Yang
Irwin King
Michael Lyu
Anthony Man-Cho So

Online learning from imbalanced streaming data to capture the nonlinearity and heterogeneity of the data is significant in machine learning and data mining. To tackle this problem, we propose a kernelized online imbalanced learning (KOIL) algorithm to directly maximize the area under the ROC curve (AUC). We address two further challenges: 1) how to control the number of support vectors without sacrificing model performance; and 2) how to restrict the fluctuation of the learned decision function to attain smooth updating. To this end, we introduce two buffers with fixed budgets (buffer sizes) for the positive class and the negative class, respectively, to store the learned support vectors, which allows us to capture the global information of the decision boundary. When determining the weight of a new support vector, we confine its influence only to its k-nearest opposite support vectors. This can restrict the effect of new instances and limit harm from outliers. More importantly, we design a sophisticated scheme to compensate the model after replacement is conducted when either buffer is full. With this compensation, the learned model approaches the one learned with infinite budgets. We present both theoretical analysis and extensive experimental comparison to demonstrate the effectiveness of our proposed KOIL.
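The AUC that KOIL maximizes is the probability that a randomly drawn positive example is scored above a randomly drawn negative one. The short Python sketch below computes its empirical, pairwise form over finite score lists, counting ties as one half; it is an evaluation helper for illustration, not the KOIL update rule.

```python
from typing import Sequence


def empirical_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))
```

Because it is defined over positive-negative pairs, the AUC is insensitive to class imbalance, which is why it suits imbalanced streams; the pairwise form also explains why KOIL keeps separate buffers of positive and negative support vectors.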

NeurIPS Conference 2014 Conference Paper

Combinatorial Pure Exploration of Multi-Armed Bandits

Shouyuan Chen
Tian Lin
Irwin King
Michael Lyu
Wei Chen

We study the combinatorial pure exploration (CPE) problem in the stochastic multi-armed bandit setting, where a learner explores a set of arms with the objective of identifying the optimal member of a decision class, which is a collection of subsets of arms with certain combinatorial structures such as size-$K$ subsets, matchings, spanning trees, or paths. The CPE problem represents a rich class of pure exploration tasks which covers not only many existing models but also novel cases where the object of interest has a non-trivial combinatorial structure. In this paper, we provide a series of results for the general CPE problem. We present general learning algorithms which work for all decision classes that admit offline maximization oracles in both fixed confidence and fixed budget settings. We prove problem-dependent upper bounds for our algorithms. Our analysis exploits the combinatorial structures of the decision classes and introduces a new analytic tool. We also establish a general problem-dependent lower bound for the CPE problem. Our results show that the proposed algorithms achieve the optimal sample complexity (within logarithmic factors) for many decision classes. In addition, applying our results back to the problems of top-$K$ arms identification and multiple bandit best arms identification, we recover the best available upper bounds up to constant factors and partially resolve a conjecture on the lower bounds.
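For the size-$K$ decision class, an offline maximization oracle simply returns the $K$ arms with the largest weights. The Python sketch below pairs that oracle with one round of a CLUCB-style confidence check, as we understand the fixed-confidence procedure: perturb the weights pessimistically inside the current answer and optimistically outside it, re-run the oracle, and either stop or pull the most uncertain arm on which the two answers disagree. The confidence radii are taken as inputs (how they are computed is not shown), and both function names are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple


def top_k_oracle(weights: Sequence[float], k: int) -> Set[int]:
    """Offline maximization oracle for the size-k decision class."""
    ranked = sorted(range(len(weights)), key=lambda arm: weights[arm], reverse=True)
    return set(ranked[:k])


def clucb_round(
    means: Sequence[float], radii: Sequence[float], k: int
) -> Tuple[Optional[Set[int]], Optional[int]]:
    """One decision round: (answer, None) to stop, or (None, arm_to_pull)."""
    m_t = top_k_oracle(means, k)
    # Pessimistic weights for arms in the current answer, optimistic for the rest.
    adjusted: List[float] = [
        mu - r if arm in m_t else mu + r
        for arm, (mu, r) in enumerate(zip(means, radii))
    ]
    m_tilde = top_k_oracle(adjusted, k)
    if sum(adjusted[a] for a in m_t) >= sum(adjusted[a] for a in m_tilde):
        return m_t, None  # no rival answer looks better even optimistically
    # Otherwise sample the most uncertain arm where the two answers disagree.
    disagreement = m_t ^ m_tilde
    return None, max(disagreement, key=lambda arm: radii[arm])
```

The only place the decision class enters is the oracle, so swapping `top_k_oracle` for, say, a maximum spanning tree routine would reuse the same round logic; that separation is what lets one algorithm cover every class with an offline oracle.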

NeurIPS Conference 2013 Conference Paper

Exact and Stable Recovery of Pairwise Interaction Tensors

Shouyuan Chen
Michael Lyu
Irwin King
Zenglin Xu

Tensor completion from incomplete observations is a problem of significant practical interest. However, it is unlikely that there exists an efficient algorithm with provable guarantees to recover a general tensor from a limited number of observations. In this paper, we study the recovery algorithm for pairwise interaction tensors, which have recently gained considerable attention for modeling multiple attribute data due to their simplicity and effectiveness. Specifically, in the absence of noise, we show that one can exactly recover a pairwise interaction tensor by solving a constrained convex program which minimizes the weighted sum of nuclear norms of matrices from $O(nr\log^2(n))$ observations. For the noisy cases, we also prove error bounds for a constrained convex program for recovering the tensors. Our experiments on the synthetic dataset demonstrate that the recovery performance of our algorithm agrees well with the theory. In addition, we apply our algorithm to a temporal collaborative filtering task and obtain state-of-the-art results.
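To make the object concrete: in the pairwise interaction model, each tensor entry is assumed to be a sum of three matrix entries, one per pair of modes, T[i][j][k] = A[i][j] + B[j][k] + C[k][i], with the three matrices presumably low-rank given the rank parameter $r$ in the sample bound. The Python sketch below only builds such a tensor from given matrices; the nuclear-norm recovery program is not shown.

```python
from typing import List

Matrix = List[List[float]]
Tensor = List[List[List[float]]]


def pairwise_interaction_tensor(A: Matrix, B: Matrix, C: Matrix) -> Tensor:
    """Build T with T[i][j][k] = A[i][j] + B[j][k] + C[k][i].

    Expected shapes: A is n1 x n2, B is n2 x n3, C is n3 x n1.
    """
    n1, n2 = len(A), len(A[0])
    n3 = len(B[0])
    if len(B) != n2 or len(C) != n3 or any(len(row) != n1 for row in C):
        raise ValueError("expected A: n1 x n2, B: n2 x n3, C: n3 x n1")
    return [
        [[A[i][j] + B[j][k] + C[k][i] for k in range(n3)] for j in range(n2)]
        for i in range(n1)
    ]
```

Since an $n_1 \times n_2 \times n_3$ tensor of this form is determined by three matrices, its number of free parameters grows with the matrix sizes rather than the full tensor size, which is what makes recovery from few observations plausible.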

AAAI Conference 2012 Conference Paper

Fused Matrix Factorization with Geographical and Social Influence in Location-Based Social Networks

Chen Cheng
Haiqin Yang
Irwin King
Michael Lyu

Recently, location-based social networks (LBSNs), such as Gowalla, Foursquare, Facebook, and Brightkite, have attracted millions of users to share their social friendship and their locations via check-ins. The available check-in information makes it possible to mine users’ preference on locations and to provide favorite recommendations. Personalized point-of-interest (POI) recommendation is a significant task in LBSNs since it can help targeted users explore their surroundings as well as help third-party developers provide personalized services. To solve this task, matrix factorization is a promising tool due to its success in recommender systems. However, previously proposed matrix factorization (MF) methods do not explore geographical influence, e.g., the multi-center check-in property, which yields suboptimal solutions for the recommendation. In this paper, to the best of our knowledge, we are the first to fuse MF with geographical and social influence for POI recommendation in LBSNs. We first capture the geographical influence via modeling the probability of a user’s check-in on a location as a Multi-center Gaussian Model (MGM). Next, we include social information and fuse the geographical influence into a generalized matrix factorization framework. Our solution to POI recommendation is efficient and scales linearly with the number of observations. Finally, we conduct thorough experiments on a large-scale real-world LBSN dataset and demonstrate that the fused matrix factorization framework with MGM utilizes the distance information sufficiently and outperforms other state-of-the-art methods significantly.

AAAI Conference 2010 Conference Paper

Diversifying Query Suggestion Results

Hao Ma
Michael Lyu
Irwin King

In order to improve the user search experience, Query Suggestion, a technique for generating alternative queries to Web users, has become an indispensable feature for commercial search engines. However, previous work mainly focuses on suggesting queries relevant to the original query while ignoring the diversity in the suggestions, which may leave Web users’ information needs unsatisfied. In this paper, we present a novel unified method to suggest both semantically relevant and diverse queries to Web users. The proposed approach is based on Markov random walk and hitting time analysis on the query-URL bipartite graph. It can effectively prevent semantically redundant queries from receiving a high rank, hence encouraging diversity in the results. We evaluate our method on a large commercial clickthrough dataset in terms of relevance measurement and diversity measurement. The experimental results show that our method is very effective in generating both relevant and diverse query suggestions.
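Hitting time, the expected number of random-walk steps needed to first reach a set of nodes, is the quantity the approach analyzes. A minimal Python sketch of a truncated hitting-time computation on a small transition matrix follows; the dense list-of-lists matrix, the truncation length, and the function name are illustrative assumptions, and constructing the query-URL bipartite graph from clickthrough data is not shown.

```python
from typing import List, Set


def truncated_hitting_time(
    P: List[List[float]], targets: Set[int], steps: int
) -> List[float]:
    """Expected steps to first reach `targets`, truncated after `steps` iterations.

    Iterates h(i) = 0 for i in targets and h(i) = 1 + sum_j P[i][j] * h(j)
    otherwise, starting from h = 0. P must be row-stochastic.
    """
    n = len(P)
    h = [0.0] * n
    for _ in range(steps):
        h = [
            0.0 if i in targets else 1.0 + sum(P[i][j] * h[j] for j in range(n))
            for i in range(n)
        ]
    return h
```

On the path graph 0 - 1 - 2 with target node 2, a long truncation converges to hitting times of 4 from node 0 and 3 from node 1. Truncation keeps every value bounded by `steps`, which matters on large graphs where some nodes can barely reach the target set; a node that takes long to reach already-chosen suggestions is, intuitively, semantically far from them.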

AAAI Conference 2010 Conference Paper

Smooth Optimization for Effective Multiple Kernel Learning

Zenglin Xu
Rong Jin
Shenghuo Zhu
Michael Lyu
Irwin King

Multiple Kernel Learning (MKL) can be formulated as a convex-concave min-max optimization problem, whose saddle point corresponds to the optimal solution to MKL. Most MKL methods employ L1-norm simplex constraints on the combination weights of kernels, which therefore involves optimization of a non-smooth function of the kernel weights. These methods usually divide the optimization into two cycles: one cycle deals with the optimization on the kernel combination weights, and the other cycle updates the parameters of SVM. Despite their efficiency, they tend to discard informative complementary kernels. To improve accuracy, we introduce smoothness to the optimization procedure. Furthermore, we transform the optimization into a single smooth convex optimization problem and employ Nesterov’s method to solve it efficiently. Experiments on benchmark data sets demonstrate that the proposed algorithm clearly improves current MKL methods in a number of scenarios.

AAAI Conference 2010 Conference Paper

UserRec: A User Recommendation Framework in Social Tagging Systems

Tom Zhou
Hao Ma
Michael Lyu
Irwin King

Social tagging systems have emerged as an effective way for users to annotate and share objects on the Web. However, with the growth of social tagging systems, users are easily overwhelmed by the large amount of data, and it is very difficult for them to dig out the information they are interested in. Though tagging systems provide interest-based social network features that enable users to keep track of other users’ tagging activities, there is still no automatic and effective way for a user to discover other users with common interests. In this paper, we propose a User Recommendation (UserRec) framework for user interest modeling and interest-based user recommendation, aiming to boost information sharing among users with similar interests. Our work brings three major contributions to the research community: (1) we propose a tag-graph based community detection method to model the users’ personal interests, which are further represented by discrete topic distributions; (2) the similarity values between users’ topic distributions are measured by Kullback-Leibler divergence (KL-divergence), and the similarity values are further used to perform interest-based user recommendation; and (3) by analyzing users’ roles in a tagging system, we find that users’ roles in a tagging system are similar to those of Web pages on the Internet. Experiments on a tagging dataset of Web pages (Yahoo! Delicious) show that UserRec outperforms other state-of-the-art recommender system approaches.
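KL-divergence between two users' discrete topic distributions can be computed as below. This is a minimal Python sketch of the standard definition only; note that KL is asymmetric and grows as distributions differ, so turning it into a similarity value (for example by symmetrizing or transforming it) is a separate step that the abstract does not specify and that this sketch does not assume.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # 0 * log(0 / q_i) is taken as 0 by convention
        if q_i == 0.0:
            return math.inf  # p puts mass on a topic where q has none
        total += p_i * math.log(p_i / q_i)
    return total
```

Identical topic distributions give 0.0, and a user concentrated on one topic compared with a user split evenly over two topics gives log 2 (about 0.693). The infinite case shows why topic distributions are often smoothed before comparison in practice.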

NeurIPS Conference 2009 Conference Paper

Adaptive Regularization for Transductive Support Vector Machine

Zenglin Xu
Rong Jin
Jianke Zhu
Irwin King
Michael Lyu
Zhirong Yang

We discuss the framework of Transductive Support Vector Machine (TSVM) from the perspective of the regularization strength induced by the unlabeled data. In this framework, SVM and TSVM can be regarded as a learning machine without regularization and one with full regularization from the unlabeled data, respectively. Therefore, to supplement this framework of the regularization strength, it is necessary to introduce data-dependent partial regularization. To this end, we reformulate TSVM into a form with controllable regularization strength, which includes SVM and TSVM as special cases. Furthermore, we introduce a method of adaptive regularization that is data dependent and is based on the smoothness assumption. Experiments on a set of benchmark data sets indicate the promising results of the proposed work compared with state-of-the-art TSVM algorithms.

NeurIPS Conference 2008 Conference Paper

An Extended Level Method for Efficient Multiple Kernel Learning

Zenglin Xu
Rong Jin
Irwin King
Michael Lyu

We consider the problem of multiple kernel learning (MKL), which can be formulated as a convex-concave problem. In the past, two efficient methods, i.e., Semi-Infinite Linear Programming (SILP) and Subgradient Descent (SD), have been proposed for large-scale multiple kernel learning. Despite their success, both methods have their own shortcomings: (a) the SD method utilizes the gradient of only the current solution, and (b) the SILP method does not regularize the approximate solution obtained from the cutting plane model. In this work, we extend the level method, which was originally designed for optimizing non-smooth objective functions, to convex-concave optimization, and apply it to multiple kernel learning. The extended level method overcomes the drawbacks of SILP and SD by exploiting all the gradients computed in past iterations and by regularizing the solution via a projection to a level set. Empirical study with eight UCI datasets shows that the extended level method can significantly improve efficiency by saving on average 91.9% of computational time over the SILP method and 70.3% over the SD method.

NeurIPS Conference 2008 Conference Paper

Learning with Consistency between Inductive Functions and Kernels

Haixuan Yang
Irwin King
Michael Lyu

Regularized Least Squares (RLS) algorithms have the ability to avoid over-fitting problems and to express solutions as kernel expansions. However, we observe that the current RLS algorithms cannot provide a satisfactory interpretation even on a constant function. On the other hand, while kernel-based algorithms have been developed to the point that almost all learning algorithms are kernelized or are being kernelized, a basic fact is often ignored: the learned function from the data and the kernel fits the data well, but may not be consistent with the kernel. Based on these considerations and on the intuition that a good kernel-based inductive function should be consistent with both the data and the kernel, a novel learning scheme is proposed. The advantages of this scheme lie in its corresponding Representer Theorem, its strong interpretation ability about what kind of functions should not be penalized, and its promising accuracy improvements shown in a number of experiments. Furthermore, we provide a detailed technical description of heat kernels, which serves as an example for readers to apply similar techniques to other kernels. Our work provides a preliminary step in a new direction to explore the varying consistency between inductive functions and kernels under various distributions.

NeurIPS Conference 2007 Conference Paper

Efficient Convex Relaxation for Transductive Support Vector Machine

Zenglin Xu
Rong Jin
Jianke Zhu
Irwin King
Michael Lyu

We consider the problem of Support Vector Machine transduction, which involves a combinatorial problem with exponential computational complexity in the number of unlabeled examples. Although several studies are devoted to Transductive SVM, they suffer either from high computational complexity or from locally optimal solutions. To address this problem, we propose solving Transductive SVM via a convex relaxation, which converts the NP-hard problem to a semi-definite program. Compared with the other SDP relaxation for Transductive SVM, the proposed algorithm is computationally more efficient, with the number of free parameters reduced from O(n^2) to O(n), where n is the number of examples. Empirical study with several benchmark data sets shows the promising performance of the proposed algorithm in comparison with other state-of-the-art implementations of Transductive SVM.