Arrow Research search

Author name cluster

Peng Cui

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

41 papers
1 author row

Possible papers (41)

AAAI Conference 2026 Conference Paper

Error Slice Discovery via Manifold Compactness

  • Han Yu
  • Hao Zou
  • Jiashuo Liu
  • Renzhe Xu
  • Yue He
  • Xingxuan Zhang
  • Peng Cui

Despite the great performance of deep learning models in many areas, they still make mistakes and underperform on certain subsets of data, i.e., error slices. Given a trained model, it is important to identify its semantically coherent error slices that are easy to interpret, which is referred to as the error slice discovery problem. However, there is no proper metric of slice coherence that does not rely on extra information such as predefined slice labels. Current evaluation of slice coherence requires access to predefined slices formulated from metadata such as attributes or subclasses. Its validity heavily relies on the quality and abundance of metadata, and some possible patterns could be ignored. Besides, current algorithms cannot directly incorporate the constraint of coherence into their optimization objective due to the absence of an explicit coherence metric, which could potentially hinder their effectiveness. In this paper, we propose manifold compactness, a coherence metric that requires no extra information because it incorporates data geometry into its design, and experiments on typical datasets empirically validate the rationality of the metric. We then develop Manifold Compactness based error Slice Discovery (MCSD), a novel algorithm that directly treats risk and coherence as the optimization objective and can be flexibly applied to models for various tasks. Extensive experiments on the benchmark and case studies on other typical datasets demonstrate the superiority of MCSD.
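
As a rough illustration of the data-geometry intuition (an illustrative proxy, not the metric defined in the paper): a slice can be scored by how densely its members sit on a k-NN graph in feature space.

```python
# Hypothetical manifold-compactness proxy for a candidate error slice.
# NOT the paper's metric: it scores a slice by the average distance to
# each member's k nearest neighbors within the slice (lower = tighter).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def slice_compactness(features: np.ndarray, k: int = 5) -> float:
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    dists, _ = nn.kneighbors(features)   # column 0 is the self-distance (0)
    return float(dists[:, 1:].mean())

# Toy check: a tight cluster scores lower than a widely scattered set.
rng = np.random.default_rng(0)
tight = rng.normal(0.0, 0.3, size=(200, 16))
spread = rng.normal(0.0, 3.0, size=(200, 16))
print(slice_compactness(tight), slice_compactness(spread))
```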

AAAI Conference 2026 Conference Paper

Generating Risky Samples with Conformity Constraints via Diffusion Models

  • Han Yu
  • Hao Zou
  • Xingxuan Zhang
  • Zhengyi Wang
  • Yue He
  • Kehan Li
  • Peng Cui

Although neural networks achieve promising performance on many tasks, they may still fail on certain examples and thereby pose risks to applications. To discover risky samples, previous literature searches for patterns of risky samples within existing datasets or injects perturbations into them. In this way, however, the diversity of risky samples is limited by the coverage of existing datasets. To overcome this limitation, recent works adopt diffusion models to produce new risky samples beyond the coverage of existing datasets. However, these methods struggle to ensure conformity between generated samples and expected categories, which can introduce label noise and severely limit their effectiveness in applications. To address this issue, we propose RiskyDiff, which incorporates the embeddings of both texts and images as implicit constraints on category conformity. We also design a conformity score to further strengthen category conformity explicitly, and introduce embedding screening and risky gradient guidance mechanisms to boost the risk of generated samples. Extensive experiments reveal that RiskyDiff greatly outperforms existing methods in terms of degree of risk, generation quality, and conformity with conditioned categories. We also show empirically that the generalization ability of models can be enhanced by augmenting training data with generated samples of high conformity.

NeurIPS Conference 2025 Conference Paper

Environment Inference for Learning Generalizable Dynamical System

  • Shixuan Liu
  • Yue He
  • Haotian Wang
  • Wenjing Yang
  • Yunfei Wang
  • Peng Cui
  • Zhong Liu

Data-driven methods offer efficient and robust solutions for analyzing complex dynamical systems but rely on the assumption of i.i.d. data, driving the development of generalization techniques for handling environmental differences. These techniques, however, are limited by their dependence on environment labels, which are often unavailable during training due to data acquisition challenges, privacy concerns, and environmental variability, particularly in large public datasets and privacy-sensitive domains. In response, we propose DynaInfer, a novel method that infers environment specifications by analyzing prediction errors from fixed neural networks within each training round, enabling environment assignments directly from data. We prove our algorithm effectively solves the alternating optimization problem in unlabeled scenarios and validate it through extensive experiments across diverse dynamical systems. Results show that DynaInfer outperforms existing environment assignment techniques, converges rapidly to true labels, and even achieves superior performance when environment labels are available.
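
A minimal sketch of the core idea as described in the abstract (the clustering choice here is an assumption, not the authors' implementation): hold the predictor fixed, compute per-sample losses, and cluster them to obtain environment assignments for the next training round.

```python
# Illustrative environment inference from per-sample prediction errors.
# Assumption: samples whose losses cluster together are assigned to the
# same environment; the paper's actual assignment rule may differ.
import numpy as np
from sklearn.cluster import KMeans

def infer_environments(per_sample_loss: np.ndarray, n_envs: int = 2) -> np.ndarray:
    losses = per_sample_loss.reshape(-1, 1)
    return KMeans(n_clusters=n_envs, n_init=10, random_state=0).fit_predict(losses)

# Toy usage: two latent regimes with different error scales.
rng = np.random.default_rng(1)
losses = np.concatenate([rng.exponential(0.1, 500), rng.exponential(1.0, 500)])
print(np.bincount(infer_environments(losses)))
```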

NeurIPS Conference 2024 Conference Paper

Bridging Multicalibration and Out-of-distribution Generalization Beyond Covariate Shift

  • Jiayun Wu
  • Jiashuo Liu
  • Peng Cui
  • Zhiwei S. Wu

We establish a new model-agnostic optimization framework for out-of-distribution generalization via multicalibration, a criterion that ensures a predictor is calibrated across a family of overlapping groups. Multicalibration is known to be associated with the robustness of statistical inference under covariate shift. We further establish a link between multicalibration and robustness for prediction tasks both under and beyond covariate shift. We accomplish this by extending multicalibration to incorporate grouping functions that consider covariates and labels jointly. This leads to an equivalence between the extended multicalibration and invariance, an objective for robust learning in the presence of concept shift. We show that the grouping function class has a linear structure spanned by density ratios, yielding a unifying framework for robust learning via the design of specific grouping functions. We propose MC-Pseudolabel, a post-processing algorithm that achieves both extended multicalibration and out-of-distribution generalization. The algorithm, with lightweight hyperparameters and optimization through a series of supervised regression steps, achieves superior performance on real-world datasets with distribution shift.

AAAI Conference 2023 Conference Paper

Covariate-Shift Generalization via Random Sample Weighting

  • Yue He
  • Xinwei Shen
  • Renzhe Xu
  • Tong Zhang
  • Yong Jiang
  • Wenchao Zou
  • Peng Cui

Shifts in the marginal distribution of covariates from the training to the test phase, termed covariate shift, often lead to unstable prediction performance on agnostic test data, especially under model misspecification. Recent literature on invariant learning attempts to learn an invariant predictor from heterogeneous environments. However, the performance of the learned predictor depends heavily on the availability and quality of the provided environments. In this paper, we propose a simple and effective non-parametric method for generating heterogeneous environments via Random Sample Weighting (RSW). Given a training dataset from a single source environment, we randomly generate a set of covariate-determined sample weights and use each weighted training distribution to simulate an environment. We theoretically show that under appropriate conditions, such random sample weighting can produce sufficient heterogeneity to be exploited by common invariance constraints to find the invariant variables for stable prediction under covariate shift. Extensive experiments on both simulated and real-world datasets clearly validate the effectiveness of our method.
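
The mechanism is simple enough to sketch, with the caveat that the specific weighting family below is an assumption for illustration; the paper characterizes the admissible weighting functions itself.

```python
# Sketch of RSW-style environment generation: every random direction over
# the covariates induces one set of covariate-determined sample weights,
# and each weighting simulates one environment. The softmax-style form
# here is an illustrative choice, not the paper's exact construction.
import numpy as np

def random_environments(X: np.ndarray, n_envs: int, seed: int = 0):
    rng = np.random.default_rng(seed)
    for _ in range(n_envs):
        scores = X @ rng.normal(size=X.shape[1])   # depends only on covariates
        w = np.exp(scores - scores.max())
        yield w / w.sum()                          # one simulated environment

X = np.random.default_rng(2).normal(size=(1000, 5))
for i, w in enumerate(random_environments(X, n_envs=3)):
    print(f"env {i}: effective sample size = {1.0 / np.sum(w ** 2):.1f}")
```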

NeurIPS Conference 2023 Conference Paper

Learning Sample Difficulty from Pre-trained Models for Reliable Prediction

  • Peng Cui
  • Dan Zhang
  • Zhijie Deng
  • Yinpeng Dong
  • Jun Zhu

Large-scale pre-trained models have achieved remarkable success in many applications, but how to leverage them to improve the prediction reliability of downstream models remains under-explored. Moreover, modern neural networks have been found to be poorly calibrated and to make overconfident predictions regardless of inherent sample difficulty and data uncertainty. To address this issue, we propose to utilize large-scale pre-trained models to guide downstream model training with sample-difficulty-aware entropy regularization. Pre-trained models that have been exposed to large-scale datasets and do not overfit the downstream training classes enable us to measure each training sample's difficulty via feature-space Gaussian modeling and relative Mahalanobis distance computation. Importantly, by adaptively penalizing overconfident predictions based on sample difficulty, we simultaneously improve accuracy and uncertainty calibration across challenging benchmarks (e.g., +0.55% ACC and −3.7% ECE on ImageNet1k using ResNet34), consistently surpassing competitive baselines for reliable prediction. The improved uncertainty estimates further improve selective classification (abstaining from erroneous predictions) and out-of-distribution detection.
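
The difficulty measure lends itself to a compact sketch (a simplified variant with a shared class covariance; the paper's exact estimator may differ in detail): fit one Gaussian per class plus a background Gaussian in the pre-trained feature space, and score a sample by its smallest class-conditional Mahalanobis distance minus its background distance.

```python
# Simplified relative Mahalanobis distance in a pre-trained feature space.
# Hedged sketch: shared covariance across classes, pseudo-inverse for
# stability; higher score = harder / more atypical for every class.
import numpy as np

def relative_mahalanobis(feats, labels, query):
    classes = np.unique(labels)
    mus = {c: feats[labels == c].mean(axis=0) for c in classes}
    centered = np.vstack([feats[labels == c] - mus[c] for c in classes])
    cov_inv = np.linalg.pinv(np.cov(centered.T))        # class-conditional
    mu0 = feats.mean(axis=0)
    cov0_inv = np.linalg.pinv(np.cov((feats - mu0).T))  # background Gaussian

    def md(x, mu, ci):
        d = x - mu
        return float(d @ ci @ d)

    d_class = min(md(query, mus[c], cov_inv) for c in classes)
    return d_class - md(query, mu0, cov0_inv)

rng = np.random.default_rng(3)
feats, labels = rng.normal(size=(600, 8)), rng.integers(0, 3, 600)
print(relative_mahalanobis(feats, labels, rng.normal(size=8)))
```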

NeurIPS Conference 2023 Conference Paper

On the Need for a Language Describing Distribution Shifts: Illustrations on Tabular Datasets

  • Jiashuo Liu
  • Tianyu Wang
  • Peng Cui
  • Hongseok Namkoong

Different distribution shifts require different algorithmic and operational interventions, and methodological research must be grounded in the specific shifts it addresses. Although nascent benchmarks provide a promising empirical foundation, they implicitly focus on covariate shifts, and the validity of empirical findings depends on the type of shift: e.g., previous observations on algorithmic performance can fail to hold when the $Y|X$ distribution changes. We conduct a thorough investigation of natural shifts in 5 tabular datasets over 86,000 model configurations, and find that $Y|X$-shifts are most prevalent. To encourage researchers to develop a refined language for distribution shifts, we build WhyShift, an empirical testbed of curated real-world shifts where we characterize the type of shift we benchmark performance over. Since $Y|X$-shifts are prevalent in tabular settings, we identify the covariate regions that suffer the biggest $Y|X$-shifts and discuss implications for algorithmic and data-based interventions. Our testbed highlights the importance of future research that builds an understanding of why distributions differ.

AAAI Conference 2023 Conference Paper

Stable Learning via Sparse Variable Independence

  • Han Yu
  • Peng Cui
  • Yue He
  • Zheyan Shen
  • Yong Lin
  • Renzhe Xu
  • Xingxuan Zhang

The problem of covariate-shift generalization has attracted intensive research attention. Previous stable learning algorithms employ sample reweighting schemes to decorrelate the covariates when there is no explicit domain information about the training data. However, with finite samples it is difficult to achieve the desirable weights that ensure perfect independence and thereby eliminate the unstable variables. Besides, decorrelating within the stable variables may induce high variance in the learned models because of an over-reduced effective sample size, so a tremendous sample size is required for these algorithms to work. In this paper, with theoretical justification, we propose SVI (Sparse Variable Independence) for the covariate-shift generalization problem. We introduce a sparsity constraint to compensate for the imperfection of sample reweighting under the finite-sample setting in previous methods. Furthermore, we organically combine independence-based sample reweighting and sparsity-based variable selection in an iterative way to avoid decorrelating within the stable variables, increasing the effective sample size and alleviating variance inflation. Experiments on both synthetic and real-world datasets demonstrate the improvement in covariate-shift generalization performance brought by SVI.

NeurIPS Conference 2023 Conference Paper

Towards Accelerated Model Training via Bayesian Data Selection

  • Zhijie Deng
  • Peng Cui
  • Jun Zhu

Mislabeled, duplicated, or biased data in real-world scenarios can lead to prolonged training and even hinder model convergence. Traditional solutions prioritizing easy or hard samples lack the flexibility to handle such a variety simultaneously. Recent work has proposed a more reasonable data selection principle by examining the data's impact on the model's generalization loss. However, its practical adoption relies on less principled approximations and additional holdout data. This work solves these problems by leveraging a lightweight Bayesian treatment and incorporating off-the-shelf zero-shot predictors built on large-scale pre-trained models. The resulting algorithm is efficient and easy to implement. We perform extensive empirical studies on challenging benchmarks with considerable data noise and imbalance in the online batch selection scenario, and observe superior training efficiency over competitive baselines. Notably, on the challenging WebVision benchmark, our method can achieve similar predictive performance with significantly fewer training iterations than leading data selection methods.

NeurIPS Conference 2022 Conference Paper

Confidence-based Reliable Learning under Dual Noises

  • Peng Cui
  • Yang Yue
  • Zhijie Deng
  • Jun Zhu

Deep neural networks (DNNs) have achieved remarkable success in a variety of computer vision tasks, where massive labeled images are routinely required for model optimization. Yet data collected from the open world are unavoidably polluted by noise, which may significantly undermine the efficacy of the learned models. Various attempts have been made to reliably train DNNs under data noise, but they account separately for either the noise in the labels or the noise in the images. A naive combination of the two lines of work would inherit the limitations of both sides and miss the opportunity to handle the two kinds of noise in parallel. This work provides a first, unified framework for reliable learning under joint (image, label)-noise. Technically, we develop a confidence-based sample filter to progressively filter out noisy data without the need to pre-specify a noise ratio. We then penalize the model uncertainty on the detected noisy data instead of letting the model continue over-fitting the misleading information in them. Experimental results on various challenging synthetic and real-world noisy datasets verify that the proposed method outperforms competing baselines in terms of classification performance.
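
A minimal sketch of the filtering half of the idea (my simplification: the paper's filter is progressive and paired with an uncertainty penalty, neither of which is shown here):

```python
# Simplified confidence-based sample filter: keep samples whose maximum
# softmax confidence exceeds a threshold, with no pre-specified noise
# ratio. One static step of what the paper applies progressively.
import numpy as np

def confidence_filter(probs: np.ndarray, threshold: float = 0.7) -> np.ndarray:
    """probs: (n_samples, n_classes) softmax outputs; returns a boolean
    mask of samples treated as clean for the next training step."""
    return probs.max(axis=1) >= threshold

probs = np.array([[0.95, 0.03, 0.02],    # confident -> kept
                  [0.40, 0.35, 0.25]])   # ambiguous -> flagged as noisy
print(confidence_filter(probs))          # [ True False]
```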

NeurIPS Conference 2022 Conference Paper

Distributionally Robust Optimization with Data Geometry

  • Jiashuo Liu
  • Jiayun Wu
  • Bo Li
  • Peng Cui

Distributionally Robust Optimization (DRO) serves as a robust alternative to empirical risk minimization (ERM); it optimizes for the worst-case distribution in an uncertainty set, typically specified by distance metrics such as $f$-divergence and the Wasserstein distance. Metrics defined in the ostensible high-dimensional space lead to exceedingly large uncertainty sets, resulting in the underperformance of most existing DRO methods. It has been well documented that high-dimensional data approximately reside on low-dimensional manifolds. In this work, to further constrain the uncertainty set, we incorporate data geometric properties into the design of the distance metric, obtaining our novel Geometric Wasserstein DRO (GDRO). Empowered by gradient flow, we derive a generically applicable approximate algorithm for the optimization of GDRO, and provide a bounded error rate for the approximation as well as the convergence rate of our algorithm. We also theoretically characterize the edge cases in which certain existing DRO methods are degenerate cases of GDRO. Extensive experiments justify the superiority of GDRO over existing DRO methods in multiple settings with strong distributional shifts, and confirm that the uncertainty set of GDRO adapts to data geometry.
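
For orientation, the generic objective behind any DRO method of this kind can be written as
$$\min_{\theta}\ \sup_{Q \in \mathcal{B}_{\epsilon}(P_{\mathrm{tr}})}\ \mathbb{E}_{(x,y)\sim Q}\big[\ell(f_{\theta}(x), y)\big],$$
where $\mathcal{B}_{\epsilon}(P_{\mathrm{tr}})$ is the uncertainty set around the training distribution. Classical methods instantiate it with an $f$-divergence or Wasserstein ball, while GDRO defines the ball with a geometric (manifold-restricted) Wasserstein distance. The notation here is a schematic summary, not the paper's exact formulation.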

NeurIPS Conference 2022 Conference Paper

Product Ranking for Revenue Maximization with Multiple Purchases

  • Renzhe Xu
  • Xingxuan Zhang
  • Bo Li
  • Yafeng Zhang
  • Xiaolong Chen
  • Peng Cui

Product ranking is the core problem for revenue-maximizing online retailers. To design proper product ranking algorithms, various consumer choice models have been proposed to characterize consumers' behaviors when they are presented with a list of products. However, existing works assume that each consumer purchases at most one product or keeps viewing the product list after purchasing a product, which does not agree with common practice in real scenarios. In this paper, we assume that each consumer can purchase multiple products at will. To model consumers' willingness to view and purchase, we assign each consumer a random attention span and purchase budget, which determine the maximal number of products that he/she views and purchases, respectively. Under this setting, we first design an optimal ranking policy for the case when the online retailer can precisely model consumers' behaviors. Based on this policy, we further develop the Multiple-Purchase-with-Budget UCB (MPB-UCB) algorithms with $\tilde{O}(\sqrt{T})$ regret, which estimate consumers' behaviors and maximize revenue simultaneously in online settings. Experiments on both synthetic and semi-synthetic datasets demonstrate the effectiveness of the proposed algorithms.
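
For readers unfamiliar with the machinery behind an $\tilde{O}(\sqrt{T})$ regret guarantee, the standard index that UCB-style algorithms adapt is
$$\mathrm{UCB}_i(t) = \hat{\mu}_i(t) + \sqrt{\frac{2\ln t}{n_i(t)}},$$
where $\hat{\mu}_i(t)$ is the empirical estimate of an unknown behavior parameter for product $i$ and $n_i(t)$ is the number of times it has been observed up to round $t$. This is the generic UCB1 form, shown only for orientation; MPB-UCB's actual optimistic indices over attention spans and budgets are defined in the paper.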

NeurIPS Conference 2022 Conference Paper

ZIN: When and How to Learn Invariance Without Environment Partition?

  • Yong Lin
  • Shengyu Zhu
  • Lu Tan
  • Peng Cui

It is commonplace to encounter heterogeneous data in which some aspects of the data distribution vary but the underlying causal mechanisms remain constant. When data are divided into distinct environments according to this heterogeneity, recent invariant learning methods propose to learn robust and invariant models using the environment partition. It is hence tempting to exploit the inherent heterogeneity even when an environment partition is not provided. Unfortunately, in this work, we show that learning invariant features under this circumstance is fundamentally impossible without further inductive biases or additional information. We then propose a framework to jointly learn the environment partition and the invariant representation, assisted by additional auxiliary information. We derive sufficient and necessary conditions for our framework to provably identify invariant features under a fairly general setting. Experimental results on both synthetic and real-world datasets validate our analysis and demonstrate the improved performance of the proposed framework. Our findings also highlight the need to make the role of inductive biases more explicit in future work on learning invariant models without environment partition. Code is available at https://github.com/linyongver/ZIN_official.

NeurIPS Conference 2021 Conference Paper

Integrated Latent Heterogeneity and Invariance Learning in Kernel Space

  • Jiashuo Liu
  • Zheyuan Hu
  • Peng Cui
  • Bo Li
  • Zheyan Shen

The ability to generalize under distributional shifts is essential to reliable machine learning, yet models optimized with empirical risk minimization usually fail on non-i.i.d. test data. Recently, invariant learning methods for out-of-distribution (OOD) generalization have proposed to find causally invariant relationships using multiple environments. However, modern datasets are frequently multi-sourced without explicit source labels, rendering many invariant learning methods inapplicable. In this paper, we propose the Kernelized Heterogeneous Risk Minimization (KerHRM) algorithm, which performs both latent heterogeneity exploration and invariant learning in kernel space, and then gives feedback to the original neural network by assigning an invariant gradient direction. We theoretically justify our algorithm and empirically validate its effectiveness with extensive experiments.

AAAI Conference 2021 Conference Paper

Reinforcement Learning with a Disentangled Universal Value Function for Item Recommendation

  • Kai Wang
  • Zhene Zou
  • Qilin Deng
  • Jianrong Tao
  • Runze Wu
  • Changjie Fan
  • Liang Chen
  • Peng Cui

In recent years, there has been great interest, as well as great challenges, in applying reinforcement learning (RL) to recommendation systems (RS). In this paper, we summarize three key practical challenges of large-scale RL-based recommender systems: massive state and action spaces, a high-variance environment, and the unspecific reward setting in recommendation. All of these problems remain largely unexplored in the existing literature and make the application of RL challenging. We develop a model-based reinforcement learning framework, called GoalRec. Inspired by the ideas of the world model (model-based), value function estimation (model-free), and goal-based RL, we propose a novel disentangled universal value function designed for item recommendation. It can generalize to the various goals that a recommender may have, and disentangles the stochastic environmental dynamics from the high-variance reward signals accordingly. As part of the value function, free from the sparse and high-variance reward signals, a high-capacity reward-independent world model is trained to simulate complex environmental dynamics under a certain goal. Based on the predicted environmental dynamics, the disentangled universal value function is related to the user's future trajectory instead of a monolithic state and a scalar reward. We demonstrate the superiority of GoalRec over previous approaches with respect to the above three practical challenges in a series of simulations and a real application.

AAAI Conference 2021 Conference Paper

Stable Adversarial Learning under Distributional Shifts

  • Jiashuo Liu
  • Zheyan Shen
  • Peng Cui
  • Linjun Zhou
  • Kun Kuang
  • Bo Li
  • Yishi Lin

Machine learning algorithms based on empirical risk minimization are vulnerable under distributional shifts due to the greedy adoption of all the correlations found in the training data. Recently, robust learning methods have targeted this problem by minimizing the worst-case risk over an uncertainty set. However, they treat all covariates equally when forming the decision sets, regardless of the stability of their correlations with the target, resulting in overwhelmingly large sets and low confidence of the learner. In this paper, we propose the Stable Adversarial Learning (SAL) algorithm, which leverages heterogeneous data sources to construct a more practical uncertainty set and conducts differentiated robustness optimization, where covariates are differentiated according to the stability of their correlations with the target. We theoretically show that our method is tractable for stochastic gradient-based optimization and provide performance guarantees for our method. Empirical studies on both simulated and real datasets validate the effectiveness of our method in terms of uniformly good performance across unknown distributional shifts.

AAAI Conference 2020 Conference Paper

A Restricted Black-Box Adversarial Framework Towards Attacking Graph Embedding Models

  • Heng Chang
  • Yu Rong
  • Tingyang Xu
  • Wenbing Huang
  • Honglei Zhang
  • Peng Cui
  • Wenwu Zhu
  • Junzhou Huang

With the great success of graph embedding models in both academia and industry, the robustness of graph embedding against adversarial attack inevitably becomes a central problem in the graph learning domain. Despite the fruitful progress, most current works perform the attack in a white-box fashion: they need access to the model predictions and labels to construct their adversarial loss. However, the inaccessibility of model predictions in real systems makes white-box attacks impractical for real graph learning systems. This paper promotes current frameworks in a more general and flexible sense: we aim to attack various kinds of graph embedding models in a black-box manner. To this end, we begin by investigating the theoretical connections between graph signal processing and graph embedding models in a principled way, and formulate the graph embedding model as a general graph signal process with a corresponding graph filter. From this, a generalized adversarial attacker, GF-Attack, is constructed from the graph filter and the feature matrix. Instead of accessing any knowledge of the target classifiers used on top of the graph embedding, GF-Attack performs the attack only on the graph filter, in a black-box fashion. To validate the generality of GF-Attack, we construct the attacker on four popular graph embedding models. Extensive experimental results validate the effectiveness of our attacker on several benchmark datasets. In particular, even small graph perturbations, such as a one-edge flip, can consistently mount a strong attack on different graph embedding models.

NeurIPS Conference 2020 Conference Paper

Calibrated Reliable Regression using Maximum Mean Discrepancy

  • Peng Cui
  • Wenbo Hu
  • Jun Zhu

Accurate quantification of uncertainty is crucial for real-world applications of machine learning. However, modern deep neural networks still produce unreliable predictive uncertainty, often yielding over-confident predictions. In this paper, we are concerned with obtaining well-calibrated predictions in regression tasks. We propose a calibrated regression method that uses the maximum mean discrepancy, minimizing a kernel embedding measure. Theoretically, the calibration error of our method converges asymptotically to zero when the sample size is large enough. Experiments on non-trivial real datasets show that our method can produce well-calibrated and sharp prediction intervals, outperforming related state-of-the-art methods.
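
As a concrete anchor for the kernel-embedding machinery, here is the generic unbiased MMD$^2$ estimator under an RBF kernel (the standard form due to Gretton et al.; the paper's training objective builds on a quantity of this kind rather than this exact snippet):

```python
# Unbiased squared MMD between samples X and Y under an RBF kernel.
import numpy as np

def mmd2_unbiased(X, Y, gamma=1.0):
    def k(A, B):
        sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * sq)
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    n, m = len(X), len(Y)
    np.fill_diagonal(Kxx, 0.0); np.fill_diagonal(Kyy, 0.0)   # drop i == j terms
    return Kxx.sum() / (n * (n - 1)) + Kyy.sum() / (m * (m - 1)) - 2 * Kxy.mean()

rng = np.random.default_rng(4)
same = mmd2_unbiased(rng.normal(size=(300, 1)), rng.normal(size=(300, 1)))
shifted = mmd2_unbiased(rng.normal(size=(300, 1)), rng.normal(2.0, 1.0, (300, 1)))
print(f"same distribution: {same:.4f}, shifted: {shifted:.4f}")
```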

NeurIPS Conference 2020 Conference Paper

Counterfactual Prediction for Bundle Treatment

  • Hao Zou
  • Peng Cui
  • Bo Li
  • Zheyan Shen
  • Jianxin Ma
  • Hongxia Yang
  • Yue He

Estimating counterfactual outcome of different treatments from observational data is an important problem to assist decision making in a variety of fields. Among the various forms of treatment specification, bundle treatment has been widely adopted in many scenarios, such as recommendation systems and online marketing. The bundle treatment usually can be abstracted as a high dimensional binary vector, which makes it more challenging for researchers to remove the confounding bias in observational data. In this work, we assume the existence of low dimensional latent structure underlying bundle treatment. Via the learned latent representations of treatments, we propose a novel variational sample re-weighting (VSR) method to eliminate confounding bias by decorrelating the treatments and confounders. Finally, we conduct extensive experiments to demonstrate that the predictive model trained on this re-weighted dataset can achieve more accurate counterfactual outcome prediction.

AAAI Conference 2020 Conference Paper

Rule-Guided Compositional Representation Learning on Knowledge Graphs

  • Guanglin Niu
  • Yongfei Zhang
  • Bo Li
  • Peng Cui
  • Si Liu
  • Jingyang Li
  • Xiaowei Zhang

Representation learning on a knowledge graph (KG) aims to embed entities and relations of the KG into low-dimensional continuous vector spaces. Early KG embedding methods only pay attention to structured information encoded in triples, which leads to limited performance due to the structural sparsity of KGs. Some recent attempts consider path information to expand the structure of KGs but lack explainability in the process of obtaining the path representations. In this paper, we propose a novel Rule and Path-based Joint Embedding (RPJE) scheme, which takes full advantage of the explainability and accuracy of logic rules, the generalization ability of KG embedding, and the supplementary semantic structure of paths. Specifically, logic rules of different lengths (the number of relations in the rule body), in the form of Horn clauses, are first mined from the KG and elaborately encoded for representation learning. Then, rules of length 2 are applied to compose paths accurately, while rules of length 1 are explicitly employed to create semantic associations among relations and constrain relation embeddings. Moreover, the confidence level of each rule is considered in optimization to guarantee the availability of applying the rule to representation learning. Extensive experimental results illustrate that RPJE outperforms state-of-the-art baselines on the KG completion task, which also demonstrates the superiority of utilizing logic rules as well as paths for improving the accuracy and explainability of representation learning.

AAAI Conference 2020 Conference Paper

Stable Learning via Sample Reweighting

  • Zheyan Shen
  • Peng Cui
  • Tong Zhang
  • Kun Kuang

We consider the problem of learning linear prediction models with model misspecification bias. In such cases, collinearity among input variables may inflate the error of parameter estimation, resulting in instability of prediction results when training and test distributions do not match. In this paper we theoretically analyze this fundamental problem and propose a sample reweighting method that reduces collinearity among input variables. Our method can be seen as a pretreatment of the data to improve the condition of the design matrix, and it can be combined with any standard learning method for parameter estimation and variable selection. Empirical studies on both simulated and real datasets demonstrate the effectiveness of our method in achieving more stable performance across differently distributed data.
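
The pretreatment step can be sketched as follows (an illustrative softmax-parameterized optimizer of my own construction, shown only to make the objective concrete; the paper derives its own weighting scheme): learn sample weights that shrink the off-diagonal entries of the weighted covariate covariance.

```python
# Illustrative decorrelating sample reweighting (not the paper's
# estimator): find normalized weights w that shrink the off-diagonal
# entries of the weighted covariance of the covariates X.
import numpy as np
from scipy.optimize import minimize

def decorrelating_weights(X: np.ndarray) -> np.ndarray:
    n, _ = X.shape

    def off_diag_cov(theta):
        w = np.exp(theta - theta.max()); w /= w.sum()   # softmax weights
        mu = w @ X
        cov = (X - mu).T @ ((X - mu) * w[:, None])      # weighted covariance
        off = cov - np.diag(np.diag(cov))
        return np.sum(off ** 2)

    res = minimize(off_diag_cov, np.zeros(n), method="L-BFGS-B",
                   options={"maxiter": 100})
    w = np.exp(res.x - res.x.max())
    return w / w.sum()

rng = np.random.default_rng(5)
z = rng.normal(size=300)
X = np.column_stack([z, 0.9 * z + 0.3 * rng.normal(size=300)])  # collinear pair
w = decorrelating_weights(X)
mu = w @ X
cov = (X - mu).T @ ((X - mu) * w[:, None])
print("weighted corr:", cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1]))
```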

AAAI Conference 2020 Conference Paper

Stable Prediction with Model Misspecification and Agnostic Distribution Shift

  • Kun Kuang
  • Ruoxuan Xiong
  • Peng Cui
  • Susan Athey
  • Bo Li

For many machine learning algorithms, two main assumptions are required to guarantee performance: one is that the test data are drawn from the same distribution as the training data, and the other is that the model is correctly specified. In real applications, however, we often have little prior knowledge about the test data and the underlying true model. Under model misspecification, agnostic distribution shift between training and test data leads to inaccurate parameter estimation and unstable prediction across unknown test data. To address these problems, we propose a novel Decorrelated Weighting Regression (DWR) algorithm which jointly optimizes a variable decorrelation regularizer and a weighted regression model. The variable decorrelation regularizer estimates a weight for each sample such that variables are decorrelated on the weighted training data. These weights are then used in the weighted regression to improve the accuracy of the estimated effect of each variable, thus helping to improve the stability of prediction across unknown test data. Extensive experiments clearly demonstrate that our DWR algorithm can significantly improve the accuracy of parameter estimation and the stability of prediction under model misspecification and agnostic distribution shift.

IJCAI Conference 2019 Conference Paper

Disparity-preserved Deep Cross-platform Association for Cross-platform Video Recommendation

  • Shengze Yu
  • Xin Wang
  • Wenwu Zhu
  • Peng Cui
  • Jingdong Wang

Cross-platform recommendation aims to improve recommendation accuracy by associating information from different platforms. Existing cross-platform recommendation approaches assume all cross-platform information to be consistent and alignable. However, two challenges remain unsolved: i) there exist inconsistencies in cross-platform association due to platform-specific disparity, and ii) data from distinct platforms may have different semantic granularities. In this paper, we propose a cross-platform association model for cross-platform video recommendation, i.e., Disparity-preserved Deep Cross-platform Association (DCA), which takes platform-specific disparity and granularity differences into consideration. The proposed DCA model employs a partially-connected multi-modal autoencoder, which is capable of explicitly capturing platform-specific information, and utilizes nonlinear mapping functions to handle granularity differences. We then present a cross-platform video recommendation approach based on the proposed DCA model. Extensive experiments on a real-world dataset demonstrate that the proposed DCA model significantly outperforms existing cross-platform recommendation methods on various evaluation metrics.

AAAI Conference 2019 Conference Paper

Incorporating Network Embedding into Markov Random Field for Better Community Detection

  • Di Jin
  • Xinxin You
  • Weihao Li
  • Dongxiao He
  • Peng Cui
  • Françoise Fogelman-Soulié
  • Tanmoy Chakraborty

Recent research on community detection focuses on learning representations of nodes using different network embedding methods and then feeding them as normal features to clustering algorithms. However, we find that although one may obtain good results by clustering directly on such network embedding features, there is ample room for improvement. More seriously, in many real networks, some statistically significant nodes which play pivotal roles are often assigned to incorrect communities by network embedding methods. This is because, while distance measures are used to capture the spatial relationship between nodes in the embedding, the nodes are essentially no longer coupled after mapping to feature vectors, losing important structural information. To address this problem, we propose a general Markov Random Field (MRF) framework that incorporates coupling into network embedding and thereby allows better detection of network communities. By smartly utilizing properties of MRF, the new framework not only preserves the advantages of network embedding (e.g., low complexity, high parallelizability, and applicability to traditional machine learning), but also alleviates its core drawback of inadequately represented dependencies by restoring the missing coupling relationships. Experiments on real networks show that our new approach improves the accuracy of existing embedding methods (e.g., Node2Vec, DeepWalk, and M-NMF) and corrects most wrongly assigned statistically significant nodes, which makes network embedding genuinely suitable for real community detection applications. The new approach also outperforms other state-of-the-art conventional community detection methods.

NeurIPS Conference 2019 Conference Paper

Learning Disentangled Representations for Recommendation

  • Jianxin Ma
  • Chang Zhou
  • Peng Cui
  • Hongxia Yang
  • Wenwu Zhu

User behavior data in recommender systems are driven by the complex interactions of many latent factors behind the users’ decision making processes. The factors are highly entangled, and may range from high-level ones that govern user intentions, to low-level ones that characterize a user’s preference when executing an intention. Learning representations that uncover and disentangle these latent factors can bring enhanced robustness, interpretability, and controllability. However, learning such disentangled representations from user behavior is challenging, and remains largely neglected by the existing literature. In this paper, we present the MACRo-mIcro Disentangled Variational Auto-Encoder (MacridVAE) for learning disentangled representations from user behavior. Our approach achieves macro disentanglement by inferring the high-level concepts associated with user intentions (e.g., to buy a shirt or a cellphone), while capturing the preference of a user regarding the different concepts separately. A micro-disentanglement regularizer, stemming from an information-theoretic interpretation of VAEs, then forces each dimension of the representations to independently reflect an isolated low-level factor (e.g., the size or the color of a shirt). Empirical results show that our approach can achieve substantial improvement over the state-of-the-art baselines. We further demonstrate that the learned representations are interpretable and controllable, which can potentially lead to a new paradigm for recommendation where users are given fine-grained control over targeted aspects of the recommendation lists.

AAAI Conference 2018 Conference Paper

Deep Asymmetric Transfer Network for Unbalanced Domain Adaptation

  • Daixin Wang
  • Peng Cui
  • Wenwu Zhu

Recently, domain adaptation based on deep models has emerged as a promising way to deal with domains that have scarce labeled data, a critical problem for deep learning models. Domain adaptation propagates knowledge from a source domain with rich information to the target domain. In reality, the source and target domains are mostly unbalanced, in that the source domain is more resource-rich and thus has more reliable knowledge than the target domain. However, existing deep domain adaptation approaches often presume the source and target domains to be balanced and treat them equally, leading to a compromise solution between the source and target domains that is not optimal for unbalanced domain adaptation. In this paper, we propose a novel Deep Asymmetric Transfer Network (DATN) to address the problem of unbalanced domain adaptation. Specifically, our model learns a transfer function from the target domain to the source domain while adapting the source-domain classifier, with its greater discriminative power, to the target domain. By doing this, the deep model is able to adaptively put more emphasis on the resource-rich source domain. To alleviate the scarcity of supervised data, we further propose an unsupervised transfer method that propagates knowledge from abundant unsupervised data by minimizing the distribution discrepancy over the unlabeled data of the two domains. Experiments on two real-world datasets demonstrate that DATN attains a substantial gain over state-of-the-art methods.

AAAI Conference 2018 Conference Paper

DepthLGP: Learning Embeddings of Out-of-Sample Nodes in Dynamic Networks

  • Jianxin Ma
  • Peng Cui
  • Wenwu Zhu

Network embedding algorithms to date are primarily designed for static networks, where all nodes are known before learning. How to infer embeddings for out-of-sample nodes, i.e., nodes that arrive after learning, remains an open problem. The problem poses great challenges to existing methods, since the inferred embeddings should preserve intricate network properties such as high-order proximity, share similar characteristics (i.e., belong to a homogeneous space) with in-sample node embeddings, and be of low computational cost. To overcome these challenges, we propose a Deeply Transformed High-order Laplacian Gaussian Process (DepthLGP) method to infer embeddings for out-of-sample nodes. DepthLGP combines the strengths of nonparametric probabilistic modeling and deep learning. In particular, we design a high-order Laplacian Gaussian process (hLGP) to encode network properties, which permits fast and scalable inference. To further ensure homogeneity, we then employ a deep neural network to learn a nonlinear transformation from latent states of the hLGP to node embeddings. DepthLGP is general in that it is applicable to embeddings learned by any network embedding algorithm. We theoretically prove the expressive power of DepthLGP, and conduct extensive experiments on real-world networks. Empirical results demonstrate that our approach can achieve significant performance gains over existing approaches.

IJCAI Conference 2018 Conference Paper

Power-law Distribution Aware Trust Prediction

  • Xiao Wang
  • Ziwei Zhang
  • Jing Wang
  • Peng Cui
  • Shiqiang Yang

Trust prediction, which aims to predict the trust relations between users in a social network, is key to helping users discover reliable information. Many trust prediction methods are based on the low-rank assumption of a trust network. However, one typical property of trust networks is that the trust relations follow a power-law distribution, i.e., a few users are trusted by many other users, while most tail users have few trustors. Due to these tail users, the fundamental low-rank assumption made by existing methods is seriously violated and becomes unrealistic. In this paper, we propose a simple yet effective method to address the problem of the violated low-rank assumption. Instead of discovering the low-rank component of the trust network alone, we simultaneously learn a sparse component of the trust network to describe the tail users. With both the learned low-rank and sparse components, the trust relations in the whole network can be better captured. Moreover, the transitive closure structure of the trust relations is also integrated into our model. We then derive an effective iterative algorithm to infer the parameters of our model, along with a proof of correctness. Extensive experimental results on real-world trust networks demonstrate the superior performance of our proposed method over the state of the art.
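
The decomposition admits a standard convex form worth noting (schematic only; the paper's model additionally integrates the transitive closure structure): split the observed trust matrix $M$ into a low-rank part $L$ capturing the bulk of users and a sparse part $S$ absorbing the power-law tail,
$$\min_{L,\,S}\ \|L\|_{*} + \lambda\,\|S\|_{1} \quad \text{s.t.}\quad M = L + S,$$
where the nuclear norm $\|\cdot\|_{*}$ encourages low rank and the $\ell_1$ norm encourages sparsity.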

AAAI Conference 2018 Conference Paper

Structural Deep Embedding for Hyper-Networks

  • Ke Tu
  • Peng Cui
  • Xiao Wang
  • Fei Wang
  • Wenwu Zhu

Network embedding has recently attracted much attention in data mining. Existing network embedding methods mainly focus on networks with pairwise relationships. In the real world, however, the relationships among data points can go beyond pairwise, i.e., three or more objects may be involved in each relationship, represented by a hyperedge, thus forming hyper-networks. These hyper-networks pose great challenges to existing network embedding methods when the hyperedges are indecomposable, that is, when no subset of the nodes in a hyperedge can form another hyperedge. Such indecomposable hyperedges are especially common in heterogeneous networks. In this paper, we propose a novel Deep Hyper-Network Embedding (DHNE) model to embed hyper-networks with indecomposable hyperedges. More specifically, we theoretically prove that any linear similarity metric in the embedding space commonly used by existing methods cannot maintain the indecomposability property in hyper-networks, and we thus propose a new deep model to realize a non-linear tuplewise similarity function while preserving both local and global proximities in the formed embedding space. We conduct extensive experiments on four different types of hyper-networks, including a GPS network, an online social network, a drug network, and a semantic network. The empirical results demonstrate that our method can significantly and consistently outperform state-of-the-art algorithms.

AAAI Conference 2018 Conference Paper

TIMERS: Error-Bounded SVD Restart on Dynamic Networks

  • Ziwei Zhang
  • Peng Cui
  • Jian Pei
  • Xiao Wang
  • Wenwu Zhu

Singular Value Decomposition (SVD) is a popular approach in various network applications, such as link prediction and network parameter characterization. Incremental SVD approaches have been proposed to process newly changed nodes and edges in dynamic networks. However, incremental SVD approaches inevitably suffer from serious error accumulation due to approximation in the incremental updates. SVD restart is an effective way to reset the accumulated error, but when to restart SVD for dynamic networks has not been addressed in the literature. In this paper, we propose TIMERS, Theoretically Instructed Maximum-Error-bounded Restart of SVD, a novel approach that optimally sets the restart time so as to reduce error accumulation over time. Specifically, we monitor the margin between the reconstruction loss of the incremental updates and the minimum loss of the SVD model. To reduce the complexity of monitoring, we theoretically develop a lower bound of the SVD minimum loss for dynamic networks and use the bound in place of the minimum loss in monitoring. By setting a maximum tolerated error as a threshold, we can trigger an SVD restart automatically when the margin exceeds this threshold. We prove that the time complexity of our method is linear with respect to the number of local dynamic changes, and that our method is general across different types of dynamic networks. We conduct extensive experiments on several synthetic and real dynamic networks. The experimental results demonstrate that our proposed method significantly outperforms existing methods, reducing the maximum error for dynamic network reconstruction by 27% to 42% when fixing the number of restarts, and reducing the number of restarts by 25% to 50% when fixing the maximum tolerated error.
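
The trigger itself is easy to render schematically (the names and structure below are hypothetical stand-ins; the paper's contribution is the concrete lower bound that makes this monitoring cheap, which is not reproduced here):

```python
# Schematic restart loop in the spirit of TIMERS. `svd_fn`,
# `incremental_update`, `current_loss`, and `loss_lower_bound` are
# hypothetical stand-ins for the paper's actual procedures.
def run_with_restarts(A, updates, svd_fn, incremental_update,
                      current_loss, loss_lower_bound, tol=0.05):
    model = svd_fn(A)                          # full SVD at time 0
    for delta in updates:                      # stream of graph changes
        A = A + delta
        model = incremental_update(model, delta)
        bound = loss_lower_bound(A)            # cheap stand-in for min loss
        margin = current_loss(model, A) - bound
        if margin > tol * bound:               # accumulated error too large
            model = svd_fn(A)                  # restart: recompute full SVD
    return model
```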

AAAI Conference 2017 Conference Paper

Community Preserving Network Embedding

  • Xiao Wang
  • Peng Cui
  • Jing Wang
  • Jian Pei
  • Wenwu Zhu
  • Shiqiang Yang

Network embedding, which aims to learn low-dimensional representations of nodes in networks, is of paramount importance in many real applications. One basic requirement of network embedding is to preserve the structure and inherent properties of the networks. While previous network embedding methods primarily preserve microscopic structure, such as the first- and second-order proximities of nodes, the mesoscopic community structure, which is one of the most prominent features of networks, is largely ignored. In this paper, we propose a novel Modularized Nonnegative Matrix Factorization (M-NMF) model to incorporate community structure into network embedding. We exploit the consensus relationship between the representations of nodes and the community structure, and jointly optimize an NMF-based representation learning model and a modularity-based community detection model in a unified framework, which enables the learned node representations to preserve both the microscopic and community structures. We also provide efficient updating rules to infer the parameters of our model, together with correctness and convergence guarantees. Extensive experimental results on a variety of real-world networks show the superior performance of the proposed method over the state of the art.
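
Schematically, the joint objective couples an NMF reconstruction with a modularity term (a paraphrase for orientation only; the paper's exact formulation and constraints take precedence):
$$\min_{M,\,U,\,H,\,C\,\ge\,0}\ \|S - MU^{\top}\|_F^2 \;+\; \alpha\,\|H - UC^{\top}\|_F^2 \;-\; \beta\,\operatorname{tr}\!\big(H^{\top} B H\big),$$
where $S$ holds node similarities, $U$ the node representations, $H$ the community indicators, $B$ the modularity matrix, and the middle consensus term is what ties representations to communities.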

AAAI Conference 2017 Conference Paper

Treatment Effect Estimation with Data-Driven Variable Decomposition

  • Kun Kuang
  • Peng Cui
  • Bo Li
  • Meng Jiang
  • Shiqiang Yang
  • Fei Wang

One fundamental problem in causal inference is treatment effect estimation in observational studies when variables are confounded. Control for confounding is generally handled by the propensity score, but this treats all observed variables as confounders and ignores adjustment variables, which have no influence on the treatment but are predictive of the outcome. Recently, it has been demonstrated that adjustment variables are effective in reducing the variance of the estimated treatment effect. However, how to automatically separate confounders and adjustment variables in observational studies is still an open problem, especially with high-dimensional variables, which are common in the big data era. In this paper, we propose a Data-Driven Variable Decomposition (D2VD) algorithm, which can 1) automatically separate confounders and adjustment variables in a data-driven way, and 2) simultaneously estimate the treatment effect in observational studies with high-dimensional variables. Under standard assumptions, we show experimentally that our D2VD algorithm can separate the variables precisely and estimate the treatment effect more accurately, and with tighter confidence intervals, than state-of-the-art methods on both synthetic data and a real online advertising dataset.

AAAI Conference 2016 Conference Paper

Little Is Much: Bridging Cross-Platform Behaviors through Overlapped Crowds

  • Meng Jiang
  • Peng Cui
  • Nicholas Jing Yuan
  • Xing Xie
  • Shiqiang Yang

People often use multiple platforms to fulfill their different information needs. With the ultimate goal of serving people intelligently, a fundamental approach is to obtain a comprehensive understanding of user needs, so it is important to organically integrate and bridge cross-platform information in a human-centric way. Existing transfer learning assumes the users of different platforms are either fully overlapped or non-overlapped. The real case, however, is that the users of different platforms are partially overlapped. The number of overlapped users is often small, and the number of explicitly known overlapped users is even smaller due to the lack of a unified ID for a user across different platforms. In this paper, we propose a novel semi-supervised transfer learning method, called XPTRANS, to address the problem of cross-platform behavior prediction. To alleviate the sparsity issue, it fully exploits the small number of overlapped crowds to optimally bridge a user's behaviors across different platforms. Extensive experiments across two real social networks show that XPTRANS significantly outperforms the state of the art. We demonstrate that by fully exploiting 26% overlapped users, XPTRANS can predict the behaviors of non-overlapped users with the same accuracy as overlapped users, which means the small overlapped crowd can successfully bridge the information across different platforms.

IS Journal 2016 Journal Article

Online Behavioral Analysis and Modeling [Guest Editorial]

  • Peng Cui
  • Huan Liu
  • Charu Aggarwal
  • Fei Wang

Online behavioral analysis and modeling has aroused considerable interest from closely related research fields such as data mining, machine learning, and information retrieval. This special issue provides a forum for researchers in behavior analysis to review pressing needs, discuss challenging research issues, and showcase state-of-the-art research and development in modern Web platforms.

IS Journal 2016 Journal Article

Suspicious Behavior Detection: Current Trends and Future Directions

  • Meng Jiang
  • Peng Cui
  • Christos Faloutsos

Different real-world applications have varying definitions of suspicious behaviors. Detection methods often look for the most suspicious parts of the data by optimizing scores, but quantifying the suspiciousness of a behavioral pattern is still an open issue.

IS Journal 2016 Journal Article

Uncovering and Predicting Human Behaviors

  • Peng Cui
  • Huan Liu
  • Charu Aggarwal
  • Fei Wang

This installment of Trends & Controversies provides an array of perspectives on the latest research in modeling user behavior. Peng Cui, Huan Liu, Charu Aggarwal, and Fei Wang introduce the field in "Uncovering and Predicting Human Behaviors." The essays included are "Computational Modeling of Complex User Behaviors: Challenges and Opportunities," by Peng Cui, Huan Liu, Charu Aggarwal, and Fei Wang; "Non-IID Recommendation Theories and Systems," by Longbing Cao and Philip S. Yu; "User Behavior Modeling and Fraud Detection," by Alex Beutel and Christos Faloutsos; and "Transfer Learning for Behavior Prediction," by Weike Pan and Qiang Yang.

IJCAI Conference 2015 Conference Paper

Deep Multimodal Hashing with Orthogonal Regularization

  • Daixin Wang
  • Peng Cui
  • Mingdong Ou
  • Wenwu Zhu

Hashing is an important method for performing efficient similarity search. With the explosive growth of multimodal data, learning hashing-based compact representations for multimodal data becomes highly non-trivial. Compared with shallow-structured models, deep models are superior at capturing multimodal correlations due to their high nonlinearity. However, to make the learned representations more accurate and compact, how to reduce the redundant information lying in the multimodal representations and incorporate the different complexities of different modalities in deep models remains an open problem. In this paper, we propose a novel deep multimodal hashing method, Deep Multimodal Hashing with Orthogonal Regularization (DMHOR), which fully exploits intra-modality and inter-modality correlations. In particular, to reduce redundant information, we impose an orthogonal regularizer on the weighting matrices of the model, and theoretically prove that the learned representation is guaranteed to be approximately orthogonal. Moreover, we find that a better representation can be attained with different numbers of layers for different modalities, owing to their different complexities. Comprehensive experiments on WIKI and NUS-WIDE demonstrate a substantial gain of DMHOR over state-of-the-art methods.
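
The orthogonal regularizer is easy to make concrete (a generic penalty of the usual form; the paper's analysis concerns the approximate orthogonality of the learned representation rather than just this term):

```python
# Generic orthogonal regularizer on a weight matrix W: penalize the
# deviation of W^T W from the identity so that learned code dimensions
# carry non-redundant information. Add to the main hashing loss with
# a weighting coefficient.
import torch

def orthogonal_penalty(W: torch.Tensor) -> torch.Tensor:
    gram = W.T @ W
    eye = torch.eye(W.shape[1], device=W.device)
    return torch.norm(gram - eye, p="fro") ** 2

W = torch.randn(256, 64, requires_grad=True)
loss = orthogonal_penalty(W)
loss.backward()                     # gradients flow to W as usual
print(loss.item())
```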

AAAI Conference 2015 Conference Paper

Perceiving Group Themes from Collective Social and Behavioral Information

  • Peng Cui
  • Tianyang Zhang
  • Fei Wang
  • Peng He

Collective social and behavioral information commonly exists in nature, and there is a widespread intuitive sense that its characteristics are to some extent related to the themes (or semantics) of the activities or targets it surrounds. In this paper, we explicitly validate the interplay between collective social and behavioral information and group themes using a large-scale real dataset of online groups, and demonstrate the possibility of perceiving group themes from collective social and behavioral information. We propose a REgularized miXEd Regression (REXER) model based on matrix factorization to infer hierarchical semantics (including both group category and group labels) from the collective social and behavioral information of group members. We extensively evaluate the proposed method on a large-scale real online group dataset. For the prediction of group themes, REXER achieves satisfactory performance under various criteria. More specifically, we can predict the category of a group (among 6 categories) purely from the collective social and behavioral information of the group with a Precision@1 of 55.16%, without any assistance from group labels or conversation contents. We also show, perhaps counterintuitively, that collective social and behavioral information is more reliable than the titles and labels of groups for inferring group categories.

AAAI Conference 2015 Conference Paper

Probabilistic Attributed Hashing

  • Mingdong Ou
  • Peng Cui
  • Jun Wang
  • Fei Wang
  • Wenwu Zhu

Due to their simplicity and efficiency, many hashing methods have recently been developed for large-scale similarity search. Most existing hashing methods focus on mapping low-level features to binary codes, but neglect the attributes that are commonly associated with data samples. Attribute data, such as image tags, product brands, and user profiles, can represent human recognition better than low-level features. However, attributes have specific characteristics, including high-dimensional, sparse, and categorical properties, which are hard to leverage in existing hashing learning frameworks. In this paper, we propose a hashing learning framework, Probabilistic Attributed Hashing (PAH), to integrate attributes with low-level features. The connections between attributes and low-level features are built by sharing a common set of latent binary variables, i.e., hash codes, through which attributes and features can complement each other. Finally, we develop an efficient iterative learning algorithm which is generally feasible for large-scale applications. Extensive experiments and comparison studies are conducted on two public datasets, i.e., DBLP and NUS-WIDE. The results clearly demonstrate that the proposed PAH method substantially outperforms peer methods.

AAAI Conference 2011 Conference Paper

Item-Level Social Influence Prediction with Probabilistic Hybrid Factor Matrix Factorization

  • Peng Cui
  • Fei Wang
  • Shiqiang Yang
  • Lifeng Sun

Social influence has become the essential factor driving the dynamic evolution of social network structure and user behaviors. Previous research often focuses on social influence analysis at the network level or topic level. In this paper, we concentrate on predicting item-level social influence to reveal users' influence at a finer granularity. We formulate the social influence prediction problem as the estimation of a user-post matrix, where each entry represents the social influence strength the corresponding user has with respect to the corresponding web post. To deal with the sparsity and complex-factor challenges in this problem, we extend the probabilistic matrix factorization method to incorporate rich prior knowledge on both the user dimension and the web post dimension, and propose the Probabilistic Hybrid Factor Matrix Factorization (PHF-MF) approach. Intensive experiments are conducted on a real-world online social network to demonstrate the advantages and characteristics of the proposed method.
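
The backbone is standard probabilistic matrix factorization, sketched below in its generic form (the hybrid priors are the paper's contribution; the prior means here are hypothetical placeholders for them):
$$R_{ij} \sim \mathcal{N}\!\big(U_i^{\top} V_j,\ \sigma^2\big), \qquad U_i \sim \mathcal{N}\!\big(\mu_u(i),\ \sigma_u^2 I\big), \qquad V_j \sim \mathcal{N}\!\big(\mu_v(j),\ \sigma_v^2 I\big),$$
where $R_{ij}$ is the influence strength of user $i$ for post $j$. Plain PMF takes zero-mean priors on the latent factors $U_i$ and $V_j$, while PHF-MF enriches the priors $\mu_u$ and $\mu_v$ with knowledge from the user and post dimensions.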