Author name cluster

Haiqin Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers

2 author rows

UAI Conference 2024 Conference Paper

Dirichlet Continual Learning: Tackling Catastrophic Forgetting in NLP

Min Zeng
Haiqin Yang
Wei Xue 0002
Qifeng Liu
Yike Guo

Catastrophic forgetting poses a significant challenge in continual learning (CL). In the context of Natural Language Processing, generative-based rehearsal CL methods have made progress in avoiding expensive retraining. However, generating pseudo samples that accurately capture the task-specific distribution remains a daunting task. In this paper, we propose Dirichlet Continual Learning (DCL), a novel generative-based rehearsal strategy designed specifically for CL. Different from the conventional use of Gaussian latent variable in Conditional Variational Autoencoder, DCL employs the flexibility of the Dirichlet distribution to model the latent variable. This allows DCL to effectively capture sentence-level features from previous tasks and guide the generation of pseudo samples. Additionally, we introduce Jensen-Shannon Knowledge Distillation, a robust logit-based knowledge distillation method that enhances knowledge transfer during pseudo-sample generation. Our extensive experiments show that DCL outperforms state-of-the-art methods in two typical tasks of task-oriented dialogue systems, demonstrating its efficacy.

Details

ICML Conference 2023 Conference Paper

D2Match: Leveraging Deep Learning and Degeneracy for Subgraph Matching

Xuanzhou Liu
Lin Zhang
Jiaqi Sun
Yujiu Yang 0001
Haiqin Yang

Subgraph matching is a fundamental building block for graph-based applications and is challenging due to its high-order combinatorial nature. Existing studies usually tackle it by combinatorial optimization or learning-based methods. However, they suffer from exponential computational costs or searching the matching without theoretical guarantees. In this paper, we develop $D^2$Match by leveraging the efficiency of Deep learning and Degeneracy for subgraph matching. More specifically, we first prove that subgraph matching can degenerate to subtree matching, and subsequently is equivalent to finding a perfect matching on a bipartite graph. We can then yield an implementation of linear time complexity by the built-in tree-structured aggregation mechanism on graph neural networks. Moreover, circle structures and node attributes can be easily incorporated in $D^2$Match to boost the matching performance. Finally, we conduct extensive experiments to show the superior performance of our $D^2$Match and confirm that our $D^2$Match indeed exploits the subtrees and differs from existing GNNs-based subgraph matching methods that depend on memorizing the data distribution divergence.

Details

ICML Conference 2023 Conference Paper

Do Not Train It: A Linear Neural Architecture Search of Graph Neural Networks

Peng Xu 0052
Lin Zhang
Xuanzhou Liu
Jiaqi Sun
Yue Zhao 0016
Haiqin Yang
Bei Yu 0001

Neural architecture search (NAS) for Graph neural networks (GNNs), called NAS-GNNs, has achieved significant performance over manually designed GNN architectures. However, these methods inherit issues from the conventional NAS methods, such as high computational cost and optimization difficulty. More importantly, previous NAS methods have ignored the uniqueness of GNNs, where GNNs possess expressive power without training. With the randomly-initialized weights, we can then seek the optimal architecture parameters via the sparse coding objective and derive a novel NAS-GNNs method, namely neural architecture coding (NAC). Consequently, our NAC holds a no-update scheme on GNNs and can efficiently compute in linear time. Empirical evaluations on multiple GNN benchmark datasets demonstrate that our approach leads to state-of-the-art performance, which is up to $200\times$ faster and $18. 8%$ more accurate than the strong baselines.

Details

IJCAI Conference 2022 Conference Paper

Vision-and-Language Pretrained Models: A Survey

Siqu Long
Feiqi Cao
Soyeon Caren Han
Haiqin Yang

Pretrained models have produced great success in both Computer Vision (CV) and Natural Language Processing (NLP). This progress leads to learning joint representations of vision and language pretraining by feeding visual and linguistic contents into a multi-layer transformer, Visual-Language Pretrained Models (VLPMs). In this paper, we present an overview of the major advances achieved in VLPMs for producing joint representations of vision and language. As the preliminaries, we briefly describe the general task definition and genetic architecture of VLPMs. We first discuss the language and vision data encoding methods and then present the mainstream VLPM structure as the core content. We further summarise several essential pretraining and fine-tuning strategies. Finally, we highlight three future directions for both CV and NLP researchers to provide insightful guidance.

PDF Details DOI

IJCAI Conference 2021 Conference Paper

Progressive Open-Domain Response Generation with Multiple Controllable Attributes

Haiqin Yang
Xiaoyuan Yao
Yiqun Duan
Jianping Shen
Jie Zhong
Kun Zhang

It is desirable to include more controllable attributes to enhance the diversity of generated responses in open-domain dialogue systems. However, existing methods can generate responses with only one controllable attribute or lack a flexible way to generate them with multiple controllable attributes. In this paper, we propose a Progressively trained Hierarchical Encoder-Decoder (PHED) to tackle this task. More specifically, PHED deploys Conditional Variational AutoEncoder (CVAE) on Transformer to include one aspect of attributes at one stage. A vital characteristic of the CVAE is to separate the latent variables at each stage into two types: a global variable capturing the common semantic features and a specific variable absorbing the attribute information at that stage. PHED then couples the CVAE latent variables with the Transformer encoder and is trained by minimizing a newly derived ELBO and controlled losses to produce the next stage's input and produce responses as required. Finally, we conduct extensive evaluations to show that PHED significantly outperforms the state-of-the-art neural generation models and produces more diverse responses as expected.

PDF Details DOI

IJCAI Conference 2018 Conference Paper

TreeNet: Learning Sentence Representations with Unconstrained Tree Structure

Zhou Cheng
Chun Yuan
Jiancheng Li
Haiqin Yang

Recursive neural network (RvNN) has been proved to be an effective and promising tool to learn sentence representations by explicitly exploiting the sentence structure. However, most existing work can only exploit simple tree structure, e. g. , binary trees, or ignore the order of nodes, which yields suboptimal performance. In this paper, we proposed a novel neural network, namely TreeNet, to capture sentences structurally over the raw unconstrained constituency trees, where the number of child nodes can be arbitrary. In TreeNet, each node is learning from its left sibling and right child in a bottom-up left-to-right order, thus enabling the net to learn over any tree. Furthermore, multiple soft gates and a memory cell are employed in implementing the TreeNet to determine to what extent it should learn, remember and output, which proves to be a simple and efficient mechanism for semantic synthesis. Moreover, TreeNet significantly suppresses convolutional neural networks (CNN) and Long Short-Term Memory (LSTM) with fewer parameters. It improves the classification accuracy by 2%-5% with 42% of the best CNN’s parameters or 94% of standard LSTM’s. Extensive experiments demonstrate TreeNet achieves the state-of-the-art performance on all four typical text classification tasks.

PDF Details

AAAI Conference 2017 Conference Paper

Efficient Non-Oblivious Randomized Reduction for Risk Minimization with Improved Excess Risk Guarantee

Yi Xu
Haiqin Yang
Lijun Zhang
Tianbao Yang

In this paper, we address learning problems for high dimensional data. Previously, oblivious random projection based approaches that project high dimensional features onto a random subspace have been used in practice for tackling highdimensionality challenge in machine learning. Recently, various non-oblivious randomized reduction methods have been developed and deployed for solving many numerical problems such as matrix product approximation, low-rank matrix approximation, etc. However, they are less explored for the machine learning tasks, e. g. , classiﬁcation. More seriously, the theoretical analysis of excess risk bounds for risk minimization, an important measure of generalization performance, has not been established for non-oblivious randomized reduction methods. It therefore remains an open problem what is the beneﬁt of using them over previous oblivious random projection based approaches. To tackle these challenges, we propose an algorithmic framework for employing non-oblivious randomized reduction method for general empirical risk minimizing in machine learning tasks, where the original high-dimensional features are projected onto a random subspace that is derived from the data with a small matrix approximation error. We then derive the ﬁrst excess risk bound for the proposed non-oblivious randomized reduction approach without requiring strong assumptions on the training data. The established excess risk bound exhibits that the proposed approach provides much better generalization performance and it also sheds more insights about different randomized reduction approaches. Finally, we conduct extensive experiments on both synthetic and real-world benchmark datasets, whose dimension scales to O(107 ), to demonstrate the efﬁcacy of our proposed approach.

PDF Details

TIST Journal 2016 Journal Article

A Unified Point-of-Interest Recommendation Framework in Location-Based Social Networks

Chen Cheng
Haiqin Yang
Irwin King
Michael R. Lyu

Location-based social networks (LBSNs), such as Gowalla, Facebook, Foursquare, Brightkite, and so on, have attracted millions of users to share their social friendship and their locations via check-ins in the past few years. Plenty of valuable information is accumulated based on the check-in behaviors, which makes it possible to learn users’ moving patterns as well as their preferences. In LBSNs, point-of-interest (POI) recommendation is one of the most significant tasks because it can help targeted users explore their surroundings as well as help third-party developers provide personalized services. Matrix factorization is a promising method for this task because it can capture users’ preferences to locations and is widely adopted in traditional recommender systems such as movie recommendation. However, the sparsity of the check-in data makes it difficult to capture users’ preferences accurately. Geographical influence can help alleviate this problem and have a large impact on the final recommendation result. By studying users’ moving patterns, we find that users tend to check in around several centers and different users have different numbers of centers. Based on this, we propose a Multi-center Gaussian Model (MGM) to capture this pattern via modeling the probability of a user’s check-in on a location. Moreover, users are usually more interested in the top 20 or even top 10 recommended POIs, which makes personalized ranking important in this task. From previous work, directly optimizing for pairwise ranking like Bayesian Personalized Ranking (BPR) achieves better performance in the top- k recommendation than directly using matrix matrix factorization that aims to minimize the point-wise rating error. To consider users’ preferences, geographical influence and personalized ranking, we propose a unified POI recommendation framework, which unifies all of them together. Specifically, we first fuse MGM with matrix factorization methods and further with BPR using two different approaches. We conduct experiments on Gowalla and Foursquare datasets, which are two large-scale real-world LBSN datasets publicly available online. The results on both datasets show that our unified POI recommendation framework can produce better performance.

Details DOI

AAAI Conference 2016 Conference Paper

STELLAR: Spatial-Temporal Latent Ranking for Successive Point-of-Interest Recommendation

Shenglin Zhao
Tong Zhao
Haiqin Yang
Michael Lyu
Irwin King

Successive point-of-interest (POI) recommendation in location-based social networks (LBSNs) becomes a signiﬁcant task since it helps users to navigate a number of candidate POIs and provides the best POI recommendations based on users’ most recent check-in knowledge. However, all existing methods for successive POI recommendation only focus on modeling the correlation between POIs based on users’ check-in sequences, but ignore an important fact that successive POI recommendation is a time-subtle recommendation task. In fact, even with the same previous check-in information, users would prefer different successive POIs at different time. To capture the impact of time on successive POI recommendation, in this paper, we propose a spatial-temporal latent ranking (STELLAR) method to explicitly model the interactions among user, POI, and time. In particular, the proposed STELLAR model is built upon a ranking-based pairwise tensor factorization framework with a ﬁne-grained modeling of user-POI, POI-time, and POI-POI interactions for successive POI recommendation. Moreover, we propose a new interval-aware weight utility function to differentiate successive check-ins’ correlations, which breaks the time interval constraint in prior work. Evaluations on two real-world datasets demonstrate that the STELLAR model outperforms state-of-the-art successive POI recommendation model about 20% in Precision@5 and Recall@5.

PDF Details

AAAI Conference 2015 Conference Paper

Kernelized Online Imbalanced Learning with Fixed Budgets

Junjie Hu
Haiqin Yang
Irwin King
Michael Lyu
Anthony Man-Cho So

Online learning from imbalanced streaming data to capture the nonlinearity and heterogeneity of the data is significant in machine learning and data mining. To tackle this problem, we propose a kernelized online imbalanced learning (KOIL) algorithm to directly maximize the area under the ROC curve (AUC). We address two more challenges: 1) How to control the number of support vectors without sacrificing model performance; and 2) how to restrict the fluctuation of the learned decision function to attain smooth updating. To this end, we introduce two buffers with fixed budgets (buffer sizes) for positive class and negative class, respectively, to store the learned support vectors, which can allow us to capture the global information of the decision boundary. When determining the weight of a new support vector, we confine its influence only to its k-nearest opposite support vectors. This can restrict the effect of new instances and prevent the harm of outliers. More importantly, we design a sophisticated scheme to compensate the model after replacement is conducted when either buffer is full. With this compensation, the learned model approaches the one learned with infinite budgets. We present both theoretical analysis and extensive experimental comparison to demonstrate the effectiveness of our proposed KOIL.

PDF Details

IJCAI Conference 2015 Conference Paper

Training-Efficient Feature Map for Shift-Invariant Kernels

Xixian Chen
Haiqin Yang
Irwin King
Michael R. Lyu

Random feature map is popularly used to scale up kernel methods. However, employing a large number of mapped features to ensure an accurate approximation will still make the training time consuming. In this paper, we aim to improve the training efficiency of shift-invariant kernels by using fewer informative features without sacrificing precision. We propose a novel feature map method by extending Random Kitchen Sinks through fast datadependent subspace embedding to generate the desired features. More specifically, we describe two algorithms with different tradeoffs on the running speed and accuracy, and prove that O(l) features induced by them are able to perform as accurately as O(l2 ) features by other feature map methods. In addition, several experiments are conducted on the real-world datasets demonstrating the superiority of our proposed algorithms.

PDF Details

IJCAI Conference 2013 Conference Paper

Where You Like to Go Next: Successive Point-of-Interest Recommendation

Chen Cheng
Haiqin Yang
Michael R. Lyu
Irwin King

Personalized point-of-interest (POI) recommendation is a signiﬁcant task in location-based social networks (LBSNs) as it can help provide better user experience as well as enable third-party services, e. g. , launching advertisements. To provide a good recommendation, various research has been conducted in the literature. However, pervious efforts mainly consider the “check-ins” in a whole and omit their temporal relation. They can only recommend POI globally and cannot know where a user would like to go tomorrow or in the next few days. In this paper, we consider the task of successive personalized POI recommendation in LB- SNs, which is a much harder task than standard personalized POI recommendation or prediction. To solve this task, we observe two prominent properties in the check-in sequence: personalized Markov chain and region localization. Hence, we propose a novel matrix factorization method, namely FPMC- LR, to embed the personalized Markov chains and the localized regions. Our proposed FPMC-LR not only exploits the personalized Markov chain in the check-in sequence, but also takes into account users’ movement constraint, i. e. , moving around a localized region. More importantly, utilizing the information of localized regions, we not only reduce the computation cost largely, but also discard the noisy information to boost recommendation. Results on two real-world LBSNs datasets demonstrate the merits of our proposed FPMC-LR.

PDF Details DOI

AAAI Conference 2012 Conference Paper

Fused Matrix Factorization with Geographical and Social Influence in Location-Based Social Networks

Chen Cheng
Haiqin Yang
Irwin King
Michael Lyu

Recently, location-based social networks (LBSNs), such as Gowalla, Foursquare, Facebook, and Brightkite, etc. , have attracted millions of users to share their social friendship and their locations via check-ins. The available check-in information makes it possible to mine users’ preference on locations and to provide favorite recommendations. Personalized Point-of-interest (POI) recommendation is a significant task in LBSNs since it can help targeted users explore their surroundings as well as help third-party developers to provide personalized services. To solve this task, matrix factorization is a promising tool due to its success in recommender systems. However, previously proposed matrix factorization (MF) methods do not explore geographical influence, e. g. , multi-center check-in property, which yields suboptimal solutions for the recommendation. In this paper, to the best of our knowledge, we are the first to fuse MF with geographical and social influence for POI recommendation in LBSNs. We first capture the geographical influence via modeling the probability of a user’s check-in on a location as a Multi-center Gaussian Model (MGM). Next, we include social information and fuse the geographical influence into a generalized matrix factorization framework. Our solution to POI recommendation is efficient and scales linearly with the number of observations. Finally, we conduct thorough experiments on a large-scale real-world LBSNs dataset and demonstrate that the fused matrix factorization framework with MGM utilizes the distance information sufficiently and outperforms other state-of-the-art methods significantly.

PDF Details

UAI Conference 2012 Conference Paper

Response Aware Model-Based Collaborative Filtering

Guang Ling
Haiqin Yang
Michael R. Lyu
Irwin King

Previous work on recommender systems mainly focus on fitting the ratings provided by users. However, the response patterns, i.e., some items are rated while others not, are generally ignored. We argue that failing to observe such response patterns can lead to biased parameter estimation and sub-optimal model performance. Although several pieces of work have tried to model users’ response patterns, they miss the effectiveness and interpretability of the successful matrix factorization collaborative filtering approaches. To bridge the gap, in this paper, we unify explicit response models and PMF to establish the Response Aware Probabilistic Matrix Factorization (RAPMF) framework. We show that RAPMF subsumes PMF as a special case. Empirically we demonstrate the merits of RAPMF from various aspects.

Details

ICML Conference 2010 Conference Paper

Online Learning for Group Lasso

Haiqin Yang
Zenglin Xu
Irwin King
Michael R. Lyu

Details

ICML Conference 2010 Conference Paper

Simple and Efficient Multiple Kernel Learning by Group Lasso

Zenglin Xu
Rong Jin 0001
Haiqin Yang
Irwin King
Michael R. Lyu

Details

ICML Conference 2004 Conference Paper

Learning large margin classifiers locally and globally

Kaizhu Huang
Haiqin Yang
Irwin King
Michael R. Lyu

Details

JMLR Journal 2004 Journal Article

The Minimum Error Minimax Probability Machine

Kaizhu Huang
Haiqin Yang
Irwin King
Michael R. Lyu
Laiwan Chan

We construct a distribution-free Bayes optimal classifier called the Minimum Error Minimax Probability Machine (MEMPM) in a worst-case setting, i.e., under all possible choices of class-conditional densities with a given mean and covariance matrix. By assuming no specific distributions for the data, our model is thus distinguished from traditional Bayes optimal approaches, where an assumption on the data distribution is a must. This model is extended from the Minimax Probability Machine (MPM), a recently-proposed novel classifier, and is demonstrated to be the general case of MPM. Moreover, it includes another special case named the Biased Minimax Probability Machine, which is appropriate for handling biased classification. One appealing feature of MEMPM is that it contains an explicit performance indicator, i.e., a lower bound on the worst-case accuracy, which is shown to be tighter than that of MPM. We provide conditions under which the worst-case Bayes optimal classifier converges to the Bayes optimal classifier. We demonstrate how to apply a more general statistical framework to estimate model input parameters robustly. We also show how to extend our model to nonlinear classification by exploiting kernelization techniques. A series of experiments on both synthetic data sets and real world benchmark data sets validates our proposition and demonstrates the effectiveness of our model. [abs] [ pdf ]

PDF Details