Author name cluster

Jian Pei

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers

1 author row

AAAI Conference 2023 Conference Paper

A Graph Fusion Approach for Cross-Lingual Machine Reading Comprehension

Zenan Xu
Linjun Shou
Jian Pei
Ming Gong
Qinliang Su
Xiaojun Quan
Daxin Jiang

Although great progress has been made for Machine Reading Comprehension (MRC) in English, scaling out to a large number of languages remains a huge challenge due to the lack of large amounts of annotated training data in non-English languages. To address this challenge, some recent efforts of cross-lingual MRC employ machine translation to transfer knowledge from English to other languages, through either explicit alignment or implicit attention. For effective knowledge transition, it is beneficial to leverage both semantic and syntactic information. However, the existing methods fail to explicitly incorporate syntax information in model learning. Consequently, the models are not robust to errors in alignment and noises in attention. In this work, we propose a novel approach, which jointly models the cross-lingual alignment information and the mono-lingual syntax information using a graph. We develop a series of algorithms, including graph construction, learning, and pre-training. The experiments on two benchmark datasets for cross-lingual MRC show that our approach outperforms all strong baselines, which verifies the effectiveness of syntax information for cross-lingual MRC.

PDF Details DOI

AAAI Conference 2023 Conference Paper

Identify Event Causality with Knowledge and Analogy

Sifan Wu
Ruihui Zhao
Yefeng Zheng
Jian Pei
Bang Liu

Event causality identification (ECI) aims to identify the causal relationship between events, which plays a crucial role in deep text understanding. Due to the diversity of real-world causality events and difficulty in obtaining sufficient training data, existing ECI approaches have poor generalizability and struggle to identify the relation between seldom seen events. In this paper, we propose to utilize both external knowledge and internal analogy to improve ECI. On the one hand, we utilize a commonsense knowledge graph called ConceptNet to enrich the description of an event sample and reveal the commonalities or associations between different events. On the other hand, we retrieve similar events as analogy exam- ples and glean useful experiences from such analogous neigh- bors to better identify the relationship between a new event pair. By better understanding different events through exter- nal knowledge and making an analogy with similar events, we can alleviate the data sparsity issue and improve model gener- alizability. Extensive evaluations on two benchmark datasets show that our model outperforms other baseline methods by around 18% on the F1-value on average

PDF Details DOI

JMLR Journal 2022 Journal Article

Accelerated Zeroth-Order and First-Order Momentum Methods from Mini to Minimax Optimization

Feihu Huang
Shangqian Gao
Jian Pei
Heng Huang

In the paper, we propose a class of accelerated zeroth-order and first-order momentum methods for both nonconvex mini-optimization and minimax-optimization. Specifically, we propose a new accelerated zeroth-order momentum (Acc-ZOM) method for black-box mini-optimization where only function values can be obtained. Moreover, we prove that our Acc-ZOM method achieves a lower query complexity of $\tilde{O}(d^{3/4}\epsilon^{-3})$ for finding an $\epsilon$-stationary point, which improves the best known result by a factor of $O(d^{1/4})$ where $d$ denotes the variable dimension. In particular, our Acc-ZOM does not need large batches required in the existing zeroth-order stochastic algorithms. Meanwhile, we propose an accelerated zeroth-order momentum descent ascent (Acc-ZOMDA) method for black-box minimax optimization, where only function values can be obtained. Our Acc-ZOMDA obtains a low query complexity of $\tilde{O}((d_1+d_2)^{3/4}\kappa_y^{4.5}\epsilon^{-3})$ without requiring large batches for finding an $\epsilon$-stationary point, where $d_1$ and $d_2$ denote variable dimensions and $\kappa_y$ is condition number. Moreover, we propose an accelerated first-order momentum descent ascent (Acc-MDA) method for minimax optimization, whose explicit gradients are accessible. Our Acc-MDA achieves a low gradient complexity of $\tilde{O}(\kappa_y^{4.5}\epsilon^{-3})$ without requiring large batches for finding an $\epsilon$-stationary point. In particular, our Acc-MDA can obtain a lower gradient complexity of $\tilde{O}(\kappa_y^{2.5}\epsilon^{-3})$ with a batch size $O(\kappa_y^4)$, which improves the best known result by a factor of $O(\kappa_y^{1/2})$. Extensive experimental results on black-box adversarial attack to deep neural networks and poisoning attack to logistic regression demonstrate efficiency of our algorithms. [abs] [ pdf ][ bib ] &copy JMLR 2022. ( edit, beta )

PDF Details

AAAI Conference 2022 Conference Paper

Cosine Model Watermarking against Ensemble Distillation

Laurent Charette
Lingyang Chu
Yizhou Chen
Jian Pei
Lanjun Wang
Yong Zhang

Many model watermarking methods have been developed to prevent valuable deployed commercial models from being stealthily stolen by model distillations. However, watermarks produced by most existing model watermarking methods can be easily evaded by ensemble distillation, because averaging the outputs of multiple ensembled models can significantly reduce or even erase the watermarks. In this paper, we focus on tackling the challenging task of defending against ensemble distillation. We propose a novel watermarking technique named CosWM to achieve outstanding model watermarking performance against ensemble distillation. CosWM is not only elegant in design, but also comes with desirable theoretical guarantees. Our extensive experiments on public data sets demonstrate the excellent performance of CosWM and its advantages over the state-of-the-art baselines.

PDF Details

AAAI Conference 2022 Conference Paper

From Good to Best: Two-Stage Training for Cross-Lingual Machine Reading Comprehension

Nuo Chen
Linjun Shou
Ming Gong
Jian Pei

Cross-lingual Machine Reading Comprehension (xMRC) is challenging due to the lack of training data in low-resource languages. The recent approaches use training data only in a resource-rich language like English to fine-tune large-scale cross-lingual pre-trained language models. Due to the big difference between languages, a model fine-tuned only by a source language may not perform well for target languages. Interestingly, we observe that while the top-1 results predicted by the previous approaches may often fail to hit the ground-truth answers, the correct answers are often contained in the top-k predicted results. Based on this observation, we develop a two-stage approach to enhance the model performance. The first stage targets at recall: we design a hard-learning (HL) algorithm to maximize the likelihood that the top-k predictions contain the accurate answer. The second stage focuses on precision: an answer-aware contrastive learning (AA-CL) mechanism is developed to learn the fine difference between the accurate answer and other candidates. Our extensive experiments show that our model significantly outperforms a series of strong baselines on two cross-lingual MRC benchmark datasets.

PDF Details

NeurIPS Conference 2022 Conference Paper

Revisiting Graph Contrastive Learning from the Perspective of Graph Spectrum

Nian Liu
Xiao Wang
Deyu Bo
Chuan Shi
Jian Pei

Graph Contrastive Learning (GCL), learning the node representations by augmenting graphs, has attracted considerable attentions. Despite the proliferation of various graph augmentation strategies, there are still some fundamental questions unclear: what information is essentially learned by GCL? Are there some general augmentation rules behind different augmentations? If so, what are they and what insights can they bring? In this paper, we answer these questions by establishing the connection between GCL and graph spectrum. By an experimental investigation in spectral domain, we firstly find the General grAph augMEntation (GAME) rule for GCL, i. e. , the difference of the high-frequency parts between two augmented graphs should be larger than that of low-frequency parts. This rule reveals the fundamental principle to revisit the current graph augmentations and design new effective graph augmentations. Then we theoretically prove that GCL is able to learn the invariance information by contrastive invariance theorem, together with our GAME rule, for the first time, we uncover that the learned representations by GCL essentially encode the low-frequency information, which explains why GCL works. Guided by this rule, we propose a spectral graph contrastive learning module (SpCo), which is a general and GCL-friendly plug-in. We combine it with different existing GCL models, and extensive experiments well demonstrate that it can further improve the performances of a wide variety of different GCL methods.

PDF Details

AAAI Conference 2021 Conference Paper

Knowledge-Enhanced Hierarchical Graph Transformer Network for Multi-Behavior Recommendation

Lianghao Xia
Chao Huang
Yong Xu
Peng Dai
Xiyue Zhang
Hongsheng Yang
Jian Pei
Liefeng Bo

Accurate user and item embedding learning is crucial for modern recommender systems. However, most existing recommendation techniques have thus far focused on modeling users’ preferences over singular type of user-item interactions. Many practical recommendation scenarios involve multi-typed user interactive behaviors (e. g. , page view, addto-favorite and purchase), which presents unique challenges that cannot be handled by current recommendation solutions. In particular: i) complex inter-dependencies across different types of user behaviors; ii) the incorporation of knowledgeaware item relations into the multi-behavior recommendation framework; iii) dynamic characteristics of multityped user-item interactions. To tackle these challenges, this work proposes a Knowledge-Enhanced Hierarchical Graph Transformer Network (KHGT), to investigate multi-typed interactive patterns between users and items in recommender systems. Speciﬁcally, KHGT is built upon a graph-structured neural architecture to i) capture type-speciﬁc behavior characteristics; ii) explicitly discriminate which types of user-item interactions are more important in assisting the forecasting task on the target behavior. Additionally, we further integrate the graph attention layer with the temporal encoding strategy, to empower the learned embeddings be reﬂective of both dedicated multiplex user-item and item-item relations, as well as the underlying interaction dynamics. Extensive experiments conducted on three real-world datasets show that KHGT consistently outperforms many state-of-the-art recommendation methods across various evaluation settings. Our implementation code is available in https: //github. com/akaxlh/KHGT.

PDF Details

AAAI Conference 2021 Conference Paper

Personalized Cross-Silo Federated Learning on Non-IID Data

Yutao Huang
Lingyang Chu
Zirui Zhou
Lanjun Wang
Jiangchuan Liu
Jian Pei
Yong Zhang

Non-IID data present a tough challenge for federated learning. In this paper, we explore a novel idea of facilitating pairwise collaborations between clients with similar data. We propose FedAMP, a new method employing federated attentive message passing to facilitate similar clients to collaborate more. We establish the convergence of FedAMP for both convex and non-convex models, and propose a heuristic method to further improve the performance of FedAMP when clients adopt deep neural networks as personalized models. Our extensive experiments on benchmark data sets demonstrate the superior performance of the proposed methods.

PDF Details

AAAI Conference 2021 Conference Paper

Reinforced Multi-Teacher Selection for Knowledge Distillation

Fei Yuan
Linjun Shou
Jian Pei
Wutao Lin
Ming Gong
Yan Fu
Daxin Jiang

In natural language processing (NLP) tasks, slow inference speed and huge footprints in GPU usage remain the bottleneck of applying pre-trained deep models in production. As a popular method for model compression, knowledge distillation transfers knowledge from one or multiple large (teacher) models to a small (student) model. When multiple teacher models are available in distillation, the state-of-the-art methods assign a fixed weight to a teacher model in the whole distillation. Furthermore, most of the existing methods allocate an equal weight to every teacher model. In this paper, we observe that, due to the complexity of training examples and the differences in student model capability, learning differentially from teacher models can lead to better performance of student models distilled. We systematically develop a reinforced method to dynamically assign weights to teacher models for different training instances and optimize the performance of student model. Our extensive experimental results on several NLP tasks clearly verify the feasibility and effectiveness of our approach.

PDF Details

NeurIPS Conference 2021 Conference Paper

Robust Counterfactual Explanations on Graph Neural Networks

Mohit Bajaj
Lingyang Chu
Zi Yu Xue
Jian Pei
Lanjun Wang
Peter Cho-Ho Lam
Yong Zhang

Massive deployment of Graph Neural Networks (GNNs) in high-stake applications generates a strong demand for explanations that are robust to noise and align well with human intuition. Most existing methods generate explanations by identifying a subgraph of an input graph that has a strong correlation with the prediction. These explanations are not robust to noise because independently optimizing the correlation for a single input can easily overfit noise. Moreover, they are not counterfactual because removing an identified subgraph from an input graph does not necessarily change the prediction result. In this paper, we propose a novel method to generate robust counterfactual explanations on GNNs by explicitly modelling the common decision logic of GNNs on similar input graphs. Our explanations are naturally robust to noise because they are produced from the common decision boundaries of a GNN that govern the predictions of many similar input graphs. The explanations are also counterfactual because removing the set of edges identified by an explanation from the input graph changes the prediction significantly. Exhaustive experiments on many public datasets demonstrate the superior performance of our method.

PDF Details

IJCAI Conference 2020 Conference Paper

Sinkhorn Regression

Lei Luo
Jian Pei
Heng Huang

This paper introduces a novel Robust Regression (RR) model, named Sinkhorn regression, which imposes Sinkhorn distances on both loss function and regularization. Traditional RR methods target at searching for an element-wise loss function (e. g. , Lp-norm) to characterize the errors such that outlying data have a relatively smaller influence on the regression estimator. Due to the neglect of the geometric information, they often lead to the suboptimal results in the practical applications. To address this problem, we use a cross-bin distance function, i. e. , Sinkhorn distances, to capture the geometric knowledge of real data. Sinkhorn distances is invariant in movement, rotation and zoom. Thus, our method is more robust to variations of data than traditional regression models. Meanwhile, we leverage Kullback-Leibler divergence to relax the proposed model with marginal constraints into its unbalanced formulation to adapt more types of features. In addition, we propose an efficient algorithm to solve the relaxed model and establish its complete statistical guarantees under mild conditions. Experiments on the five publicly available microarray data sets and one mass spectrometry data set demonstrate the effectiveness and robustness of our method.

PDF Details DOI

AAAI Conference 2018 Conference Paper

TIMERS: Error-Bounded SVD Restart on Dynamic Networks

Ziwei Zhang
Peng Cui
Jian Pei
Xiao Wang
Wenwu Zhu

Singular Value Decomposition (SVD) is a popular approach in various network applications, such as link prediction and network parameter characterization. Incremental SVD approaches are proposed to process newly changed nodes and edges in dynamic networks. However, incremental SVD approaches suffer from serious error accumulation inevitably due to approximation on incremental updates. SVD restart is an effective approach to reset the aggregated error, but when to restart SVD for dynamic networks is not addressed in literature. In this paper, we propose TIMERS, Theoretically Instructed Maximum-Error-bounded Restart of SVD, a novel approach which optimally sets the restart time in order to reduce error accumulation in time. Speciﬁcally, we monitor the margin between reconstruction loss of incremental updates and the minimum loss in SVD model. To reduce the complexity of monitoring, we theoretically develop a lower bound of SVD minimum loss for dynamic networks and use the bound to replace the minimum loss in monitoring. By setting a maximum tolerated error as a threshold, we can trigger SVD restart automatically when the margin exceeds this threshold. We prove that the time complexity of our method is linear with respect to the number of local dynamic changes, and our method is general across different types of dynamic networks. We conduct extensive experiments on several synthetic and real dynamic networks. The experimental results demonstrate that our proposed method signiﬁcantly outperforms the existing methods by reducing 27% to 42% in terms of the maximum error for dynamic network reconstruction when ﬁxing the number of restarts. Our method reduces the number of restarts by 25% to 50% when ﬁxing the maximum error tolerated.

PDF Details

AAAI Conference 2017 Conference Paper

Community Preserving Network Embedding

Xiao Wang
Peng Cui
Jing Wang
Jian Pei
Wenwu Zhu
Shiqiang Yang

Network embedding, aiming to learn the low-dimensional representations of nodes in networks, is of paramount importance in many real applications. One basic requirement of network embedding is to preserve the structure and inherent properties of the networks. While previous network embedding methods primarily preserve the microscopic structure, such as the ﬁrst- and second-order proximities of nodes, the mesoscopic community structure, which is one of the most prominent feature of networks, is largely ignored. In this paper, we propose a novel Modularized Nonnegative Matrix Factorization (M-NMF) model to incorporate the community structure into network embedding. We exploit the consensus relationship between the representations of nodes and community structure, and then jointly optimize NMF based representation learning model and modularity based community detection model in a uniﬁed framework, which enables the learned representations of nodes to preserve both of the microscopic and community structures. We also provide efﬁcient updating rules to infer the parameters of our model, together with the correctness and convergence guarantees. Extensive experimental results on a variety of real-world networks show the superior performance of the proposed method over the state-of-the-arts.

PDF Details

TIST Journal 2013 Journal Article

Mining search and browse logs for web search

Daxin Jiang
Jian Pei
Hang Li

Huge amounts of search log data have been accumulated at Web search engines. Currently, a popular Web search engine may receive billions of queries and collect terabytes of records about user search behavior daily. Beside search log data, huge amounts of browse log data have also been collected through client-side browser plugins. Such massive amounts of search and browse log data provide great opportunities for mining the wisdom of crowds and improving Web search. At the same time, designing effective and efficient methods to clean, process, and model log data also presents great challenges. In this survey, we focus on mining search and browse log data for Web search. We start with an introduction to search and browse log data and an overview of frequently-used data summarizations in log mining. We then elaborate how log mining applications enhance the five major components of a search engine, namely, query understanding, document understanding, document ranking, user understanding, and monitoring and feedback. For each aspect, we survey the major tasks, fundamental principles, and state-of-the-art methods.

Details DOI

AAAI Conference 2013 Conference Paper

Towards Cohesive Anomaly Mining

Yun Xiong
Yangyong Zhu
Philip Yu
Jian Pei

In some applications, such as bioinformatics, social network analysis, and computational criminology, it is desirable to ﬁnd compact clusters formed by a (very) small portion of objects in a large data set. Since such clusters are comprised of a small number of objects, they are extraordinary and anomalous with respect to the entire data set. This speciﬁc type of clustering task cannot be solved well by the conventional clustering methods since generally those methods try to assign most of the data objects into clusters. In this paper, we model this novel and application-inspired task as the problem of mining cohesive anomalies. We propose a general framework and a principled approach to tackle the problem. The experimental results on both synthetic and real data sets verify the effectiveness and efﬁciency of our approach.

PDF Details

TIST Journal 2011 Journal Article

Mining Concept Sequences from Large-Scale Search Logs for Context-Aware Query Suggestion

Zhen Liao
Daxin Jiang
Enhong Chen
Jian Pei
Huanhuan Cao
Hang Li

Query suggestion plays an important role in improving usability of search engines. Although some recently proposed methods provide query suggestions by mining query patterns from search logs, none of them models the immediately preceding queries as context systematically, and uses context information effectively in query suggestions. Context-aware query suggestion is challenging in both modeling context and scaling up query suggestion using context. In this article, we propose a novel context-aware query suggestion approach. To tackle the challenges, our approach consists of two stages. In the first, offline model-learning stage, to address data sparseness, queries are summarized into concepts by clustering a click-through bipartite. A concept sequence suffix tree is then constructed from session data as a context-aware query suggestion model. In the second, online query suggestion stage, a user’s search context is captured by mapping the query sequence submitted by the user to a sequence of concepts. By looking up the context in the concept sequence suffix tree, we suggest to the user context-aware queries. We test our approach on large-scale search logs of a commercial search engine containing 4.0 billion Web queries, 5.9 billion clicks, and 1.87 billion search sessions. The experimental results clearly show that our approach outperforms three baseline methods in both coverage and quality of suggestions.

Details DOI

IJCAI Conference 2009 Conference Paper

Zhengzheng Xing
Jian Pei
Philip S. Yu

In this paper, we formulate the problem of early classiﬁcation of time series data, which is important in some time-sensitive applications such as healthinformatics. We introduce a novel concept of MPL (Minimum Prediction Length) and develop ECTS (Early Classiﬁcation on Time Series), an effective 1-nearest neighbor classiﬁcation method. ECTS makes early predictions and at the same time retains the accuracy comparable to that of a 1NN classiﬁer using the full-length time series. Our empirical study using benchmark time series data sets shows that ECTS works well on the real data sets where 1NN classiﬁcation is effective.

PDF Details