Arrow Research search

Author name cluster

Weiwei Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

53 papers
2 author rows

Possible papers

53

AAAI Conference 2026 Conference Paper

On the Robustness of Bandit Multiple Testing

  • Zhengyu Zhou
  • Weiwei Liu

Bandit multiple hypothesis testing has broad applications in biological sciences, clinical testing for drug discovery, and online A/B/n testing. The framework utilizes an adaptive sampling strategy for multiple testing which aims to maximize statistical power while ensuring anytime false discovery rate control. This paper proposes a robust approach for bandit multiple testing, allowing for at most an epsilon fraction of arbitrary distribution corruption, as in Huber’s contamination model. Specifically, we introduce two adaptive sampling strategies designed to minimize the number of samples required to exceed a target true positive rate, while providing anytime control over the false discovery rate. We analyze the sample complexity of our proposed methods and perform numerical simulations to demonstrate their efficiency and robustness. Furthermore, we extend our methods to address scenarios where distributions have infinite variance and situations involving multiple agents collaborating on the same bandit task.

TMLR Journal 2026 Journal Article

Quantitative Analysis of the Effect of Density Ratio Estimation in Covariate Shift Adaptation

  • Chenglin Yu
  • Zhengyu Zhou
  • Weiwei Liu

In supervised learning, it is essential to assume that the test sample and the training sample come from the same distribution. In reality, however, this assumption is frequently violated, which can lead to subpar performance from the learned model. We examine the learning problem under \emph{covariate shift}, in which the conditional distribution of labels given covariates does not change despite the covariate distribution shifting. Two-step procedures, which first compute the density ratio and then carry out importance-weighted empirical risk minimization, are a popular family of methods for addressing covariate shift. However, the two-step techniques' performance could degrade due to estimation error of the density ratio. Unfortunately, the extent to which the density ratio estimation error affects the accuracy of learning algorithms is rarely analyzed. This paper accordingly provides a quantitative answer to this question. Specifically, we formulate the two-step covariate shift adaptation methods as a meta-algorithm. We show that the effect of the density ratio estimation error on the excess risk bound of the meta-algorithm is of the fourth order, i.e., $\mathcal{O}\left(\epsilon_{1}\left(\mathcal{G}, S_{s1}, S_t, \delta/2\right)^4\right)$, if the true risk satisfies a requirement known as the \emph{derivative vanishing} property, where $\epsilon_{1}\left(\mathcal{G}, S_{s1}, S_t, \delta/2\right)$ is the convergence rate of the density ratio estimation algorithm, $\mathcal{G}$ is the density ratio function class, $S_{s1}$ and $S_t$ are the samples generated by the training distribution and the test distribution, respectively, and $\delta/2$ is the confidence parameter. Moreover, we analyze the impact of two specific density ratio estimation algorithms, the Kullback-Leibler Importance Estimation Procedure (KLIEP) and Kernel unconstrained Least-Squares Importance Fitting, on the final classifier's generalization error.
We also report the experimental results of two-step covariate shift adaptation with a toy classification dataset using KLIEP.
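
As a minimal sketch of the second step of such two-step procedures, the importance-weighted empirical risk simply reweights each training loss by the estimated density ratio. The function name and toy numbers below are illustrative, not from the paper:

```python
def importance_weighted_risk(losses, ratios):
    """Importance-weighted empirical risk (1/n) * sum_i w_i * loss_i,
    where w_i estimates the density ratio p_test(x_i) / p_train(x_i)."""
    n = len(losses)
    return sum(w * l for w, l in zip(ratios, losses)) / n

# Hypothetical toy numbers: upweight the training points
# that the test distribution emphasizes.
print(importance_weighted_risk([0.2, 0.9, 0.4], [0.5, 2.0, 1.0]))
# -> (0.1 + 1.8 + 0.4) / 3 = 0.766...
```

An estimation error in `ratios` enters the risk multiplicatively, which is why its effect on the learned classifier merits the quantitative analysis described above.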

AAAI Conference 2026 Conference Paper

Rademacher Complexity for Distributionally Robust Learning

  • Zhengyu Zhou
  • Weiwei Liu

The goal of distributionally robust learning is to learn models capable of performing well against distributional shifts, such as latent heterogeneous subpopulations, unknown covariate shifts, or unmodeled temporal effects. Recently, Duchi and Namkoong (2021) have proven an upper bound on the excess risk of distributionally robust learning through a covering number argument. However, there are situations where the covering argument fails. This motivates us to study the generalization bound through the lens of Rademacher complexity. More specifically, we consider the Cressie-Read divergence, $f_k(t) \propto t^k - 1$. Our theoretical results indicate that the excess risk is of the order $\mathcal{O}_P\left(n^{-1/(2k^*)}\right)$, where $k^* = k/(k-1)$. The decay rate of the excess risk increases with increasing $k$. As illustrative examples, we consider three learning settings: 1) linear classifier; 2) Gaussian reproducing kernel Hilbert space; 3) one-hidden-layer networks. The empirical results validate our theoretical findings.
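
One way to make the Cressie-Read worst-case objective concrete is the dual representation popularized by Duchi and Namkoong, $\sup_{P} \mathbb{E}_P[Z] = \inf_\eta \, c_k(\rho)\, \mathbb{E}[(Z-\eta)_+^{k^*}]^{1/k^*} + \eta$ with $c_k(\rho) = (1 + k(k-1)\rho)^{1/k}$. The crude grid search over $\eta$ and the toy losses below are illustrative assumptions, not this paper's construction:

```python
def cressie_read_dro(losses, k, rho, n_grid=2001):
    """Worst-case expected loss over a Cressie-Read f_k-divergence
    ball of radius rho, via the dual
        inf_eta  c_k(rho) * E[(Z - eta)_+^{k*}]^(1/k*) + eta,
    with k* = k/(k-1), evaluated here by a crude grid search."""
    ks = k / (k - 1.0)                                # k* = k/(k-1)
    c = (1.0 + k * (k - 1.0) * rho) ** (1.0 / k)      # c_k(rho)
    lo, hi = min(losses) - 1.0, max(losses) + 1.0
    n = len(losses)
    best = float("inf")
    for i in range(n_grid):
        eta = lo + (hi - lo) * i / (n_grid - 1)
        m = sum(max(z - eta, 0.0) ** ks for z in losses) / n
        best = min(best, c * m ** (1.0 / ks) + eta)
    return best

# Toy empirical losses; the robust value sits between their
# mean (rho = 0) and their maximum (rho -> infinity).
print(cressie_read_dro([0.1, 0.4, 0.9], k=2.0, rho=0.5))
```

For $k = 2$ this reduces to the familiar $\chi^2$-divergence case, and larger $k$ interpolates toward optimizing the worst single loss.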

ICML Conference 2025 Conference Paper

Model Uncertainty Quantification by Conformal Prediction in Continual Learning

  • Rui Gao 0004
  • Weiwei Liu

Continual learning has attracted increasing research attention in recent years due to its promising experimental results in real-world applications. In this paper, we study calibration in continual learning, i.e., reliably quantifying the uncertainty of model predictions. Conformal prediction (CP) provides a general framework for model calibration, which outputs prediction intervals or sets with a theoretically guaranteed coverage level as long as the samples are exchangeable. However, the tasks in continual learning are learned in sequence, which violates the principle that data should be exchangeable. Meanwhile, the model learns the current task with limited or no access to data from previous tasks, which is not conducive to constructing the calibration set. To address these issues, we propose a CP-based method for model uncertainty quantification in continual learning (CPCL), which also reveals the connection between prediction interval length and forgetting. We analyze the oracle prediction interval in continual learning and theoretically prove the asymptotic coverage guarantee of CPCL. Finally, extensive experiments on simulated and real data empirically verify the validity of our proposed method.
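
For reference, the exchangeability-based split conformal step that such CP methods build on can be sketched as follows; the function name and calibration scores are hypothetical, and this is the standard i.i.d. recipe rather than CPCL itself:

```python
import math

def split_conformal_interval(residuals, alpha):
    """Split conformal prediction: given calibration residuals
    |y_i - f(x_i)| and miscoverage level alpha, return the radius q
    such that [f(x) - q, f(x) + q] covers a new exchangeable point
    with probability >= 1 - alpha."""
    n = len(residuals)
    # conformal rank: ceil((n + 1) * (1 - alpha))
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(residuals)[k - 1]

residuals = [0.1, 0.3, 0.2, 0.5, 0.4]   # hypothetical calibration scores
print(split_conformal_interval(residuals, alpha=0.2))   # -> 0.5
```

When tasks arrive sequentially, the residuals above are no longer exchangeable, which is exactly the obstacle the abstract describes.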

NeurIPS Conference 2025 Conference Paper

On the SAC-BL Algorithm for Anomaly Detection

  • Xinsong Ma
  • Jie Wu
  • Weiwei Liu

Visual anomaly detection is significant in safety-critical and reliability-sensitive scenarios. Prior studies mainly emphasize the design and training of scoring functions, while little effort has been devoted to constructing decision rules based on these score functions. A recent work (Ma et al., 2025b) highlights this issue and proposes the SAC-BL algorithm to address it. This method consists of a strong anomaly constraint (SAC) network and a betting-like (BL) algorithm serving as the decision rule. The SAC-BL algorithm can control the false discovery rate (FDR). However, the performance of the SAC-BL algorithm on anomalous examples, i.e., its false positive rate (FPR), has not been thoroughly investigated. This paper provides a deeper analysis of this problem and explores how to theoretically reduce its FPR. First, we show that as the number of testing examples tends to infinity, the SAC-BL algorithm performs well on abnormal data if the scores follow the generalized Gaussian-like distribution family. However, such conditions on the number of testing examples and the distribution of scores are overly restrictive for real-world applications. We therefore attempt to decrease the FPR of the SAC-BL algorithm under the condition of finite samples for practical anomaly detection. To this end, we redesign the BL algorithm by incorporating a randomization strategy and propose a novel stochastic BL (SBL) algorithm. The combination of the SAC network and the SBL algorithm yields our method, SAC-SBL. Theoretical results show that the SAC-SBL algorithm can achieve a smaller FPR than the SAC-BL algorithm while controlling its FDR. Finally, extensive experimental results demonstrate the superiority of our method over the SAC-BL algorithm on multiple visual anomaly detection benchmarks.

NeurIPS Conference 2024 Conference Paper

A Boosting-Type Convergence Result for AdaBoost.MH with Factorized Multi-Class Classifiers

  • Xin Zou
  • Zhengyu Zhou
  • Jingyuan Xu
  • Weiwei Liu

AdaBoost is a well-known boosting algorithm. Schapire and Singer propose an extension of AdaBoost, named AdaBoost.MH, for multi-class classification problems. Kégl shows empirically that AdaBoost.MH works better when the classical one-against-all base classifiers are replaced by factorized base classifiers containing a binary classifier and a vote (or code) vector. However, the factorization makes it much more difficult to provide a convergence result for the factorized version of AdaBoost.MH. Kégl therefore raises an open problem at COLT 2014, asking for a convergence result for the factorized AdaBoost.MH. In this work, we resolve this open problem by presenting a convergence result for AdaBoost.MH with factorized multi-class classifiers.
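
For intuition, the exponential weight update at the heart of AdaBoost, which AdaBoost.MH extends to the multi-class case, can be sketched with plain binary threshold stumps. This toy code is the classical binary algorithm, not the factorized variant analyzed in the paper:

```python
import math

def adaboost_stumps(X, y, T):
    """Minimal binary AdaBoost with threshold stumps, labels in {-1, +1}.
    Illustrates the exponential reweighting step only."""
    n = len(X)
    w = [1.0 / n] * n                       # example weights
    ensemble = []
    for _ in range(T):
        best = None                         # (weighted error, j, theta, s, preds)
        for j in range(len(X[0])):
            for theta in sorted({x[j] for x in X}):
                for s in (1, -1):
                    preds = [s if x[j] > theta else -s for x in X]
                    err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                    if best is None or err < best[0]:
                        best = (err, j, theta, s, preds)
        err, j, theta, s, preds = best
        err = min(max(err, 1e-12), 1.0 - 1e-12)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, j, theta, s))
        # misclassified examples gain weight, correct ones lose weight
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        z = sum(w)
        w = [wi / z for wi in w]
    def predict(x):
        score = sum(a * (s if x[j] > theta else -s) for a, j, theta, s in ensemble)
        return 1 if score >= 0 else -1
    return predict

clf = adaboost_stumps([[0.0], [1.0], [2.0], [3.0]], [-1, -1, 1, 1], T=3)
print(clf[0] if False else (clf([0.0]), clf([3.0])))   # -> (-1, 1)
```

The factorized base classifiers of AdaBoost.MH replace each stump above with a binary classifier paired with a vote vector over classes, which is what complicates the convergence analysis.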

AAAI Conference 2024 Conference Paper

A Closer Look at Curriculum Adversarial Training: From an Online Perspective

  • Lianghe Shi
  • Weiwei Liu

Curriculum adversarial training empirically finds that gradually increasing the hardness of adversarial examples can further improve the adversarial robustness of the trained model compared to conventional adversarial training. However, theoretical understanding of this strategy remains limited. In an attempt to bridge this gap, we analyze the adversarial training process from an online perspective. Specifically, we treat adversarial examples in different iterations as samples from different adversarial distributions. We then introduce the time series prediction framework and deduce novel generalization error bounds. Our theoretical results not only demonstrate the effectiveness of the conventional adversarial training algorithm but also explain why curriculum adversarial training methods can further improve adversarial generalization. We conduct comprehensive experiments to support our theory.

AAAI Conference 2024 Conference Paper

Coverage-Guaranteed Prediction Sets for Out-of-Distribution Data

  • Xin Zou
  • Weiwei Liu

Out-of-distribution (OOD) generalization has attracted increasing research attention in recent years, due to its promising experimental results in real-world applications. In this paper, we study the confidence set prediction problem in the OOD generalization setting. Split conformal prediction (SCP) is an efficient framework for handling the confidence set prediction problem. However, the validity of SCP requires the examples to be exchangeable, which is violated in the OOD setting. Empirically, we show that trivially applying SCP results in a failure to maintain the marginal coverage when the unseen target domain is different from the source domain. To address this issue, we develop a method for forming confident prediction sets in the OOD setting and theoretically prove the validity of our method. Finally, we conduct experiments on simulated data to empirically verify the correctness of our theory and the validity of our proposed method.

AAAI Conference 2024 Conference Paper

DRF: Improving Certified Robustness via Distributional Robustness Framework

  • Zekai Wang
  • Zhengyu Zhou
  • Weiwei Liu

Randomized smoothing (RS) has provided state-of-the-art (SOTA) certified robustness against adversarial perturbations for large neural networks. Among studies in this field, methods based on adversarial training (AT) achieve remarkably robust performance by applying adversarial examples to construct the smoothed classifier. These AT-based RS methods typically seek a pointwise adversary that generates the worst-case adversarial examples by perturbing each input independently. However, there are unexplored benefits to considering such adversarial robustness across the entire data distribution. To this end, we provide a novel framework called DRF, which connects AT-based RS methods with distributional robustness (DR), and show that these methods are special cases of their counterparts in our framework. Due to the advantages conferred by DR, our framework can control the trade-off between the clean accuracy and certified robustness of smoothed classifiers to a significant extent. Our experiments demonstrate that DRF can substantially improve the certified robustness of AT-based RS.
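
As background, the smoothed classifier whose robustness RS methods certify is typically estimated by a majority vote under Gaussian noise. A minimal Monte Carlo sketch follows; the base classifier and parameters are toy choices, not DRF itself:

```python
import random
from collections import Counter

def smoothed_predict(base_classifier, x, sigma, n_samples, rng):
    """Monte Carlo estimate of the smoothed classifier
    g(x) = argmax_c P(base_classifier(x + N(0, sigma^2 I)) = c),
    the object whose certified radius randomized smoothing studies."""
    votes = Counter()
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[base_classifier(noisy)] += 1
    return votes.most_common(1)[0][0]

# Toy base classifier (hypothetical): sign of the first coordinate.
clf = lambda x: int(x[0] > 0)
print(smoothed_predict(clf, [2.0, -1.0], sigma=0.5, n_samples=200,
                       rng=random.Random(0)))   # -> 1
```

AT-based RS methods train `base_classifier` on adversarial examples; the DRF framework described above instead reasons about such adversaries at the level of the whole data distribution.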

NeurIPS Conference 2024 Conference Paper

Error Analysis of Spherically Constrained Least Squares Reformulation in Solving the Stackelberg Prediction Game

  • Xiyuan Li
  • Weiwei Liu

The Stackelberg prediction game (SPG) is a popular model for characterizing strategic interactions between a learner and an adversarial data provider. Although optimization problems in SPGs are often NP-hard, a notable special case involving the least squares loss (SPG-LS) has gained significant research attention recently (Bishop et al. 2020; Wang et al. 2021; Wang et al. 2022). The latest state-of-the-art method for solving the SPG-LS problem is the spherically constrained least squares reformulation (SCLS) method proposed in the work of Wang et al. (2022). However, the lack of theoretical analysis on the error of the SCLS method limits its large-scale applications. In this paper, we investigate the estimation error between the learner obtained by the SCLS method and the actual learner. Specifically, we reframe the estimation error of the SCLS method as a Primary Optimization ($\textbf{PO}$) problem and utilize the Convex Gaussian min-max theorem (CGMT) to transform the $\textbf{PO}$ problem into an Auxiliary Optimization ($\textbf{AO}$) problem. Subsequently, we provide a theoretical error analysis for the SCLS method based on this simplified $\textbf{AO}$ problem. This analysis not only strengthens the theoretical framework of the SCLS method but also confirms the reliability of the learner produced by it. We further conduct experiments to validate our theorems, and the results are in excellent agreement with our theoretical predictions.

NeurIPS Conference 2024 Conference Paper

The Reliability of OKRidge Method in Solving Sparse Ridge Regression Problems

  • Xiyuan Li
  • Youjun Wang
  • Weiwei Liu

Sparse ridge regression problems play a significant role across various domains. To solve sparse ridge regression, Liu et al. (2023) recently propose an advanced algorithm, Scalable Optimal $K$-Sparse Ridge Regression (OKRidge), which is both faster and more accurate than existing approaches. However, the absence of theoretical analysis on the error of OKRidge impedes its large-scale applications. In this paper, we reframe the estimation error of OKRidge as a Primary Optimization ($\textbf{PO}$) problem and employ the Convex Gaussian min-max theorem (CGMT) to simplify the $\textbf{PO}$ problem into an Auxiliary Optimization ($\textbf{AO}$) problem. Subsequently, we provide a theoretical error analysis for OKRidge based on the $\textbf{AO}$ problem. This error analysis improves the theoretical reliability of OKRidge. We also conduct experiments to verify our theorems and the results are in excellent agreement with our theoretical findings.
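
For contrast with OKRidge's scalable approach, the $k$-sparse ridge objective itself can be solved exactly by brute-force support enumeration on tiny problems. This reference sketch (not the OKRidge algorithm, and with toy data) makes the objective concrete:

```python
import itertools

def solve(A, b):
    """Gaussian elimination with partial pivoting for the small
    per-support linear systems."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def k_sparse_ridge(X, y, k, lam):
    """Exact k-sparse ridge regression by enumerating supports:
        min_{|S| = k} min_beta ||y - X_S beta||^2 + lam * ||beta||^2.
    Exponential in the number of features; a tiny reference point for
    the problem OKRidge targets at scale."""
    n, d = len(X), len(X[0])
    best = (float("inf"), None, None)
    for S in itertools.combinations(range(d), k):
        # normal equations restricted to S: (X_S^T X_S + lam I) beta = X_S^T y
        A = [[sum(X[i][p] * X[i][q] for i in range(n)) + (lam if p == q else 0.0)
              for q in S] for p in S]
        b = [sum(X[i][p] * y[i] for i in range(n)) for p in S]
        beta = solve(A, b)
        resid = sum((y[i] - sum(X[i][p] * beta[j] for j, p in enumerate(S))) ** 2
                    for i in range(n))
        obj = resid + lam * sum(bb * bb for bb in beta)
        if obj < best[0]:
            best = (obj, S, beta)
    return best

obj, support, beta = k_sparse_ridge([[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]],
                                    [2.0, 1.0, 0.0], k=1, lam=0.01)
print(support)   # -> (1,)
```

OKRidge avoids this exponential enumeration via branch-and-bound with tight lower bounds; the error analysis above concerns how close its estimate is to the true parameter, not this combinatorial search.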

IJCAI Conference 2024 Conference Paper

Zero-shot Learning for Preclinical Drug Screening

  • Kun Li
  • Weiwei Liu
  • Yong Luo
  • Xiantao Cai
  • Jia Wu
  • Wenbin Hu

Conventional deep learning methods typically employ supervised learning for drug response prediction (DRP). This entails dependence on labeled response data from drugs for model training. However, practical applications in the preclinical drug screening phase demand that DRP models predict responses for novel compounds, often with unknown drug responses. This presents a challenge, rendering supervised deep learning methods unsuitable for such scenarios. In this paper, we propose a zero-shot learning solution for the DRP task in preclinical drug screening. Specifically, we propose a Multi-branch Multi-Source Domain Adaptation Test Enhancement Plug-in, called MSDA. MSDA can be seamlessly integrated with conventional DRP methods, learning invariant features from the prior response data of similar drugs to enhance real-time predictions of unlabeled compounds. The results of experiments on two large drug response datasets showed that MSDA efficiently predicts drug responses for novel compounds, leading to a general performance improvement of 5-10% in the preclinical drug screening phase. The significance of this solution resides in its potential to accelerate the drug discovery process, improve drug candidate assessment, and facilitate the success of drug discovery. The code is available at https://github.com/DrugD/MSDA.

NeurIPS Conference 2023 Conference Paper

A Theory of Transfer-Based Black-Box Attacks: Explanation and Implications

  • Yanbo Chen
  • Weiwei Liu

Transfer-based attacks are a practical method of black-box adversarial attacks, in which the attacker aims to craft adversarial examples from a source (surrogate) model that is transferable to the target model. A wide range of empirical works has tried to explain the transferability of adversarial examples from different angles. However, these works only provide ad hoc explanations without quantitative analyses. The theory behind transfer-based attacks remains a mystery. This paper studies transfer-based attacks under a unified theoretical framework. We propose an explanatory model, called the manifold attack model, that formalizes popular beliefs and explains the existing empirical results. Our model explains why adversarial examples are transferable even when the source model is inaccurate. Moreover, our model implies that the existence of transferable adversarial examples depends on the “curvature” of the data manifold, which quantitatively explains why the success rates of transfer-based attacks are hard to improve. We also discuss the expressive power and the possible extensions of our model in general applications.

NeurIPS Conference 2023 Conference Paper

Adversarial Self-Training Improves Robustness and Generalization for Gradual Domain Adaptation

  • Lianghe Shi
  • Weiwei Liu

Gradual Domain Adaptation (GDA), in which the learner is provided with additional intermediate domains, has been theoretically and empirically studied in many contexts. Despite its vital role in security-critical scenarios, the adversarial robustness of the GDA model remains unexplored. In this paper, we adopt the effective gradual self-training method and replace vanilla self-training with adversarial self-training (AST). AST first predicts labels on the unlabeled data and then adversarially trains the model on the pseudo-labeled distribution. Intriguingly, we find that gradual AST improves not only adversarial accuracy but also clean accuracy on the target domain. We reveal that this is because adversarial training (AT) performs better than standard training when the pseudo-labels contain a portion of incorrect labels. Accordingly, we first present the generalization error bounds for gradual AST in a multiclass classification setting. We then use the optimal value of the Subset Sum Problem to bridge the standard error on a real distribution and the adversarial error on a pseudo-labeled distribution. The result indicates that AT may obtain a tighter bound than standard training on data with incorrect pseudo-labels. We further present an example of a conditional Gaussian distribution to provide more insights into why gradual AST can improve the clean accuracy for GDA.

NeurIPS Conference 2023 Conference Paper

Characterization of Overfitting in Robust Multiclass Classification

  • Jingyuan Xu
  • Weiwei Liu

This paper considers the following question: Given the number of classes m, the number of robust accuracy queries k, and the number of test examples in the dataset n, how much can adaptive algorithms robustly overfit the test dataset? We solve this problem by equivalently giving near-matching upper and lower bounds of the robust overfitting bias in multiclass classification problems.

ICML Conference 2023 Conference Paper

DDGR: Continual Learning with Deep Diffusion-based Generative Replay

  • Rui Gao 0004
  • Weiwei Liu

Popular deep-learning models in the field of image classification suffer from catastrophic forgetting: models will forget previously acquired skills when learning new ones. Generative replay (GR), which typically consists of a generator and a classifier, is an efficient way to mitigate catastrophic forgetting. However, conventional GR methods only focus on a single instruction relationship (generator-to-classifier), where the generator synthesizes samples for previous tasks to instruct the training of the classifier, while ignoring the ways in which the classifier can benefit the generator. In addition, most generative replay methods typically reuse the generated samples to update the generator, which causes the samples regenerated by the generator to deviate from the distribution of previous tasks. To overcome these two issues, we propose a novel approach, called deep diffusion-based generative replay (DDGR), which adopts a diffusion model as the generator and calculates an instruction-operator through the classifier to instruct the generation of samples. Extensive experiments in class incremental (CI) and class incremental with repetition (CIR) settings demonstrate the advantages of DDGR. Our code is available at https://github.com/xiaocangshengGR/DDGR.

IJCAI Conference 2023 Conference Paper

Deep Partial Multi-Label Learning with Graph Disambiguation

  • Haobo Wang
  • Shisong Yang
  • Gengyu Lyu
  • Weiwei Liu
  • Tianlei Hu
  • Ke Chen
  • Songhe Feng
  • Gang Chen

In partial multi-label learning (PML), each data example is equipped with a candidate label set, which consists of multiple ground-truth labels and other false-positive labels. Recently, graph-based methods, which demonstrate a good ability to estimate accurate confidence scores from candidate labels, have become prevalent for dealing with PML problems. However, we observe that existing graph-based PML methods typically adopt linear multi-label classifiers and thus fail to achieve superior performance. In this work, we attempt to remove several obstacles to extending them to deep models and propose a novel deep Partial multi-Label model with grAph-disambIguatioN (PLAIN). Specifically, we introduce instance-level and label-level similarities to recover label confidences as well as exploit label dependencies. At each training epoch, labels are propagated on the instance and label graphs to produce relatively accurate pseudo-labels; then, we train the deep model to fit the numerical labels. Moreover, we provide a careful analysis of the risk functions to guarantee the robustness of the proposed model. Extensive experiments on various synthetic datasets and three real-world PML datasets demonstrate that PLAIN achieves significantly superior results to state-of-the-art methods.

JMLR Journal 2023 Journal Article

Generalization Bounds for Adversarial Contrastive Learning

  • Xin Zou
  • Weiwei Liu

Deep networks are well-known to be fragile to adversarial attacks, and adversarial training is one of the most popular methods used to train a robust model. To take advantage of unlabeled data, recent works have applied adversarial training to contrastive learning (Adversarial Contrastive Learning; ACL for short) and obtained promising robust performance. However, the theory of ACL is not well understood. To fill this gap, we leverage the Rademacher complexity to analyze the generalization performance of ACL, with a particular focus on linear models and multi-layer neural networks under $\ell_p$ attack ($p \ge 1$). Our theory shows that the average adversarial risk of the downstream tasks can be upper bounded by the adversarial unsupervised risk of the upstream task. The experimental results validate our theory.

NeurIPS Conference 2023 Conference Paper

On the Adversarial Robustness of Out-of-distribution Generalization Models

  • Xin Zou
  • Weiwei Liu

Out-of-distribution (OOD) generalization has attracted increasing research attention in recent years, due to its promising experimental results in real-world applications. Interestingly, we find that existing OOD generalization methods are vulnerable to adversarial attacks. This motivates us to study OOD adversarial robustness. We first present theoretical analyses of OOD adversarial robustness in two different complementary settings. Motivated by the theoretical results, we design two algorithms to improve the OOD adversarial robustness. Finally, we conduct experiments to validate the effectiveness of our proposed algorithms.

JMLR Journal 2023 Journal Article

RVCL: Evaluating the Robustness of Contrastive Learning via Verification

  • Zekai Wang
  • Weiwei Liu

Contrastive adversarial training has successfully improved the robustness of contrastive learning (CL). However, the robustness metric in these methods depends on attack algorithms, image labels, and downstream tasks, introducing reliability concerns. To address these issues, this paper proposes a novel Robustness Verification framework for Contrastive Learning (RVCL). Specifically, we define the verification problem of CL from deterministic and probabilistic perspectives, then provide several effective metrics to evaluate the robustness of CL encoder. Furthermore, we use extreme value theory to reveal the relationship between the robust radius of the CL encoder and that of the supervised downstream task. Extensive experiments on various benchmark models and datasets validate theoretical findings, and further demonstrate RVCL's capability to evaluate the robustness of both CL encoders and images. Our code is available at https://github.com/wzekai99/RVCL-JMLR.

JMLR Journal 2023 Journal Article

Sample Complexity for Distributionally Robust Learning under chi-square divergence

  • Zhengyu Zhou
  • Weiwei Liu

This paper investigates the sample complexity of learning a distributionally robust predictor under a particular distributional shift based on $\chi^2$-divergence, which is well known for its computational feasibility and statistical properties. We demonstrate that any hypothesis class $\mathcal{H}$ with finite VC dimension is distributionally robustly learnable. Moreover, we show that when the perturbation size is smaller than a constant, finite VC dimension is also necessary for distributionally robust learning by deriving a lower bound of sample complexity in terms of VC dimension.

AAAI Conference 2023 Conference Paper

WAT: Improve the Worst-Class Robustness in Adversarial Training

  • Boqi Li
  • Weiwei Liu

Deep Neural Networks (DNN) have been shown to be vulnerable to adversarial examples. Adversarial training (AT) is a popular and effective strategy to defend against adversarial attacks. Recent works have shown that a robust model well-trained by AT exhibits a remarkable robustness disparity among classes, and propose various methods to obtain consistent robust accuracy across classes. Unfortunately, these methods sacrifice a good deal of the average robust accuracy. Accordingly, this paper proposes a novel framework of worst-class adversarial training and leverages no-regret dynamics to solve this problem. Our goal is to obtain a classifier that performs well on the worst class while sacrificing only a little average robust accuracy. We then rigorously analyze the theoretical properties of our proposed algorithm and derive a generalization error bound in terms of the worst-class robust risk. Furthermore, we propose a measure to evaluate the proposed method in terms of both the average and worst-class accuracies. Experiments on various datasets and networks show that our proposed method outperforms the state-of-the-art approaches.

NeurIPS Conference 2022 Conference Paper

Defending Against Adversarial Attacks via Neural Dynamic System

  • Xiyuan Li
  • Xin Zou
  • Weiwei Liu

Although deep neural networks (DNN) have achieved great success, their applications in safety-critical areas are hindered due to their vulnerability to adversarial attacks. Some recent works have accordingly proposed to enhance the robustness of DNN from a dynamic system perspective. Following this line of inquiry, and inspired by the asymptotic stability of the general nonautonomous dynamical system, we propose to make each clean instance an asymptotically stable equilibrium point of a slowly time-varying system in order to defend against adversarial attacks. We present a theoretical guarantee that if a clean instance is an asymptotically stable equilibrium point and the adversarial instance is in the neighborhood of this point, the asymptotic stability will reduce the adversarial noise to bring the adversarial instance close to the clean instance. Motivated by our theoretical results, we go on to propose a nonautonomous neural ordinary differential equation (ASODE) and place constraints on its corresponding linear time-variant system to make all clean instances act as its asymptotically stable equilibrium points. Our analysis suggests that the constraints can be converted to regularizers in implementation. The experimental results show that ASODE improves robustness against adversarial attacks and outperforms state-of-the-art methods.

NeurIPS Conference 2022 Conference Paper

On Robust Multiclass Learnability

  • Jingyuan Xu
  • Weiwei Liu

This work analyzes the robust learning problem in the multiclass setting. Under the framework of Probably Approximately Correct (PAC) learning, we first show that the graph dimension and the Natarajan dimension, which characterize the standard multiclass learnability, are no longer applicable in robust learning problem. We then generalize these notions to the robust learning setting, denoted as the adversarial graph dimension (AG-dimension) and the adversarial Natarajan dimension (AN-dimension). Upper and lower bounds of the sample complexity of robust multiclass learning are rigorously derived based on the AG-dimension and AN-dimension, respectively. Moreover, we calculate the AG-dimension and AN-dimension of the class of linear multiclass predictors, and show that the graph (Natarajan) dimension is of the same order as the AG(AN)-dimension. Finally, we prove that the AG-dimension and AN-dimension are not equivalent.

NeurIPS Conference 2022 Conference Paper

On the Tradeoff Between Robustness and Fairness

  • Xinsong Ma
  • Zekai Wang
  • Weiwei Liu

Interestingly, recent experimental results [2, 26, 22] have identified a robust fairness phenomenon in adversarial training (AT), namely that a robust model well-trained by AT exhibits a remarkable disparity in standard accuracy and robust accuracy among different classes compared with natural training. However, the effect of different perturbation radii in AT on robust fairness has not been studied, and one natural question is raised: does a tradeoff exist between average robustness and robust fairness? Our extensive experimental results provide an affirmative answer to this question: with an increasing perturbation radius, stronger AT will lead to a larger class-wise disparity of robust accuracy. Theoretically, we analyze the class-wise performance of adversarially trained linear models with mixture Gaussian distribution. Our theoretical results support our observations. Moreover, our theory shows that adversarial training easily leads to a more serious robust fairness issue than natural training does. Motivated by the theoretical results, we propose a fairly adversarial training (FAT) method to mitigate the tradeoff between average robustness and robust fairness. Experimental results validate the effectiveness of our proposed method.

IJCAI Conference 2020 Conference Paper

Collaboration Based Multi-Label Propagation for Fraud Detection

  • Haobo Wang
  • Zhao Li
  • Jiaming Huang
  • Pengrui Hui
  • Weiwei Liu
  • Tianlei Hu
  • Gang Chen

Detecting fraud users, who fraudulently promote certain target items, is a challenging issue faced by e-commerce platforms. Generally, many fraud users exhibit several spam behaviors simultaneously, e.g., spam transactions, clicks and reviews. Existing solutions have two main limitations: 1) the correlations among multiple spam behaviors are neglected; 2) large-scale computations are intractable when dealing with an enormous user set. To remedy these problems, this work proposes a collaboration based multi-label propagation (CMLP) algorithm. We first introduce a general-purpose version that involves a collaboration technique to exploit label correlations. Specifically, it breaks the final prediction into two parts: 1) its own prediction part; 2) the prediction of others, i.e., the collaborative part. Then, to accelerate it on large-scale e-commerce data, we propose a heterogeneous graph based variant that detects communities on the user-item graph directly. Both theoretical analysis and empirical results clearly validate the effectiveness and scalability of our proposals.

AAAI Conference 2020 Conference Paper

Incorporating Label Embedding and Feature Augmentation for Multi-Dimensional Classification

  • Haobo Wang
  • Chen Chen
  • Weiwei Liu
  • Ke Chen
  • Tianlei Hu
  • Gang Chen

Feature augmentation, which manipulates the feature space by integrating label information, is one of the most popular strategies for solving Multi-Dimensional Classification (MDC) problems. However, vanilla feature augmentation approaches fail to consider the intra-class exclusiveness and may achieve degenerated performance. To fill this gap, a novel neural network based model is proposed that seamlessly integrates the Label Embedding and Feature Augmentation (LEFA) techniques to learn label correlations. Specifically, based on an attentional factorization machine, a cross correlation aware network is introduced to learn a low-dimensional label representation that simultaneously depicts the inter-class correlations and the intra-class exclusiveness. The learned latent label vector can then be used to augment the original feature space. Extensive experiments on seven real-world datasets demonstrate the superiority of LEFA over state-of-the-art MDC approaches.

IJCAI Conference 2020 Conference Paper

Learning From Multi-Dimensional Partial Labels

  • Haobo Wang
  • Weiwei Liu
  • Yang Zhao
  • Tianlei Hu
  • Ke Chen
  • Gang Chen

Multi-dimensional classification (MDC) has attracted considerable attention from the community. Though most studies consider fully annotated data, obtaining fully labeled data in MDC tasks is usually intractable in practice. In this paper, we propose a novel learning paradigm: Multi-Dimensional Partial Label Learning (MDPL), where the ground-truth labels of each instance are concealed in multiple candidate label sets. We first introduce the partial Hamming loss for MDPL, which incurs a large loss if the predicted labels are not in the candidate label sets, and provide an empirical risk minimization (ERM) framework. Theoretically, we rigorously prove the conditions for ERM learnability of MDPL in both the independent and dependent cases. Furthermore, we present two MDPL algorithms under our proposed ERM framework. Comprehensive experiments on both synthetic and real-world datasets validate the effectiveness of our proposals.
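
The idea behind the partial Hamming loss, penalizing each dimension whose prediction falls outside that dimension's candidate set, can be sketched as follows (an illustrative toy formulation, not the authors' exact definition):

```python
def partial_hamming_loss(predicted, candidate_sets):
    """Toy partial Hamming loss for multi-dimensional partial labels.

    `predicted` holds one predicted label per dimension;
    `candidate_sets` holds the candidate label set of each dimension.
    A unit loss is incurred on every dimension whose prediction lies
    outside its candidate set, averaged over the dimensions.
    """
    misses = sum(
        1 for y_hat, cands in zip(predicted, candidate_sets)
        if y_hat not in cands
    )
    return misses / len(predicted)

# The second of three dimensions contradicts its candidate set.
loss = partial_hamming_loss(
    ["cat", "red", "big"],
    [{"cat", "dog"}, {"blue"}, {"big", "small"}],
)
```

Since the ground truth is hidden inside the candidate sets, this is the strongest penalty the learner can evaluate from partially labeled data alone.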

IJCAI Conference 2020 Conference Paper

Multichannel Color Image Denoising via Weighted Schatten p-norm Minimization

  • Xinjian Huang
  • Bo Du
  • Weiwei Liu

The R, G and B channels of a color image generally have different noise statistical properties or noise strengths. It is thus problematic to apply grayscale image denoising algorithms to color image denoising. In this paper, based on the non-local self-similarity of an image and the different noise strengths across channels, we propose a MultiChannel Weighted Schatten p-Norm Minimization (MCWSNM) model for RGB color image denoising. More specifically, considering a small local RGB patch in a noisy image, we first find its non-local similar cubic patches in a search window of appropriate size. These similar cubic patches are then vectorized and grouped to construct a noisy low-rank matrix, which can be recovered using the Schatten p-norm minimization framework. Moreover, a weight matrix is introduced to balance each channel’s contribution to the final denoising results. The proposed MCWSNM can be solved via the alternating direction method of multipliers. The convergence property of the proposed method is also theoretically analyzed. Experiments conducted on both synthetic and real noisy color image datasets demonstrate highly competitive denoising performance, outperforming comparison algorithms, including several methods based on neural networks.

IJCAI Conference 2020 Conference Paper

Opinion Maximization in Social Trust Networks

  • Pinghua Xu
  • Wenbin Hu
  • Jia Wu
  • Weiwei Liu

Social media sites are now becoming very important platforms for product promotion and marketing campaigns. Therefore, there is broad interest in determining ways to guide a site to react more positively to a product with a limited budget. However, the practical significance of the existing studies on this subject is limited for two reasons. First, most studies have investigated the issue in oversimplified networks in which several important network characteristics are ignored. Second, in numerous studies the opinions of individuals are modeled as bipartite states (e.g., support or not); however, this setting is too strict for many real scenarios. In this study, we focus on social trust networks (STNs), which have the significant characteristics ignored in previous studies. We generalize a well-known continuous-valued opinion dynamics model to STNs, which is more consistent with real scenarios. We subsequently formalize two novel problems for addressing the issue in STNs. In addition, we develop two matrix-based methods for these problems and conduct experiments on real-world datasets to demonstrate the practical utility of our methods.

AAAI Conference 2020 Conference Paper

Temporal Network Embedding with High-Order Nonlinear Information

  • Zhenyu Qiu
  • Wenbin Hu
  • Jia Wu
  • Weiwei Liu
  • Bo Du
  • Xiaohua Jia

Temporal network embedding, which aims to learn low-dimensional representations of nodes in temporal networks that capture and preserve the network structure and evolution pattern, has attracted much attention from the scientific community. However, existing methods suffer from two main disadvantages: 1) they cannot preserve the node temporal proximity, which captures important properties of the network structure; and 2) they cannot represent the nonlinear structure of temporal networks. In this paper, we propose a high-order nonlinear information preserving (HNIP) embedding method to address these issues. Specifically, we define three orders of temporal proximity by exploring network historical information with a time exponential decay model to quantify the temporal proximity between nodes. Then, we propose a novel deep guided auto-encoder to capture the highly nonlinear structure, whose training set is generated by the temporal random walk (TRW) algorithm. By training the proposed deep guided auto-encoder with a specific mini-batch stochastic gradient descent algorithm, HNIP can efficiently preserve the temporal proximities and highly nonlinear structure of temporal networks. Experimental results on four real-world networks demonstrate the effectiveness of the proposed method.

NeurIPS Conference 2019 Conference Paper

Copula Multi-label Learning

  • Weiwei Liu

A formidable challenge in multi-label learning is to model the interdependencies between labels and features. Unfortunately, the statistical properties of existing multi-label dependency modelings are still not well understood. Copulas are a powerful tool for modeling the dependence of multivariate data and have achieved great success in a wide range of applications, such as finance, econometrics and systems neuroscience. This inspires us to develop a novel copula multi-label learning paradigm for modeling label and feature dependencies. The copula based paradigm enables us to reveal new statistical insights in multi-label learning. In particular, the paper first leverages the kernel trick to construct a continuous distribution in the output space, and then estimates our proposed model semiparametrically, where the copula is modeled parametrically while the marginal distributions are modeled nonparametrically. Theoretically, we show that our estimator is unbiased and consistent and asymptotically follows a normal distribution. Moreover, we bound the mean squared error of the estimator. Experimental results from various domains validate the superiority of our proposed approach.

IJCAI Conference 2019 Conference Paper

Discriminative and Correlative Partial Multi-Label Learning

  • Haobo Wang
  • Weiwei Liu
  • Yang Zhao
  • Chen Zhang
  • Tianlei Hu
  • Gang Chen

In partial multi-label learning (PML), each instance is associated with a candidate label set that contains multiple relevant labels and other false positive labels. The most challenging issue in PML is that the training procedure is prone to being affected by labeling noise. We observe that state-of-the-art PML methods are either powerless to disambiguate the correct labels from the candidate labels or incapable of sufficiently extracting the label correlations. To fill this gap, a two-stage DiscRiminative and correlAtive partial Multi-label leArning (DRAMA) algorithm is presented in this work. In the first stage, a confidence value is learned for each label by utilizing the feature manifold, indicating how likely a label is to be correct. In the second stage, a gradient boosting model is induced to fit the label confidences. Specifically, to explore the label correlations, we augment the feature space with the previously elicited labels on each boosting round. Extensive experiments on various real-world datasets clearly validate the superiority of our proposed method.

AAAI Conference 2019 Conference Paper

Two-Stage Label Embedding via Neural Factorization Machine for Multi-Label Classification

  • Chen Chen
  • Haobo Wang
  • Weiwei Liu
  • Xingyuan Zhao
  • Tianlei Hu
  • Gang Chen

Label embedding has been widely used as a method to exploit label dependency with dimension reduction in multi-label classification tasks. However, existing embedding methods intend to extract label correlations directly, and thus they might easily be trapped by complex label hierarchies. To tackle this issue, we propose a novel Two-Stage Label Embedding (TSLE) paradigm that involves a Neural Factorization Machine (NFM) to jointly project features and labels into a latent space. In the encoding phase, we introduce a Twin Encoding Network (TEN) that digs out pairwise feature and label interactions in the first stage and then efficiently learns higher-order correlations with deep neural networks (DNNs) in the second stage. After the codewords are obtained, a set of hidden layers is applied to recover the output labels in the decoding phase. Moreover, we develop a novel learning model by leveraging a max margin encoding loss and a label-correlation aware decoding loss, and we adopt mini-batch Adam to optimize our learning model. Lastly, we also provide a kernel insight to better understand our proposed TSLE. Extensive experiments on various real-world datasets demonstrate that our proposed model significantly outperforms other state-of-the-art approaches.

AAAI Conference 2018 Conference Paper

Compact Multi-Label Learning

  • Xiaobo Shen
  • Weiwei Liu
  • Ivor Tsang
  • Quan-Sen Sun
  • Yew-Soon Ong

Embedding methods have shown promising performance in multi-label prediction, as they can discover the dependency of labels. Most embedding methods cannot align the input and output well, which leads to degradation in prediction performance. Besides, they suffer from expensive prediction costs when applied to large-scale datasets. To address these issues, this paper proposes a Co-Hashing (CoH) method by formulating multi-label learning from the perspective of cross-view learning. CoH first regards the input and output as two views, and then aims to learn a common latent Hamming space, where input and output pairs are compressed into compact binary embeddings. CoH enjoys two key benefits: 1) the input and output can be well aligned, and their correlations are explored; 2) prediction is very efficient using fast cross-view kNN search in the Hamming space. Moreover, we provide the generalization error bound for our method. Extensive experiments on eight real-world datasets demonstrate the superiority of the proposed CoH over state-of-the-art methods in terms of both prediction accuracy and efficiency.

IJCAI Conference 2018 Conference Paper

Deep Discrete Prototype Multilabel Learning

  • Xiaobo Shen
  • Weiwei Liu
  • Yong Luo
  • Yew-Soon Ong
  • Ivor W. Tsang

kNN embedding methods, such as the state-of-the-art LM-kNN, have shown impressive results in multi-label learning. Unfortunately, these approaches suffer expensive computation and memory costs in large-scale settings. To fill this gap, this paper proposes a novel deep prototype compression method, DBPC, for fast multi-label prediction. DBPC compresses the database into a small set of short discrete prototypes and uses the prototypes for prediction. The benefit of DBPC comes from two aspects: 1) the number of distance comparisons is reduced by using the prototypes; 2) the distance computation cost is significantly decreased in the reduced space. We propose to jointly learn the deep latent subspace and the discrete prototypes within one framework. Encoding and decoding neural networks are employed to make the deep discrete prototypes represent the instances and labels well. Extensive experiments on several large-scale datasets demonstrate that DBPC achieves several orders of magnitude lower storage and prediction complexity than state-of-the-art multi-label methods, while achieving competitive accuracy.

IJCAI Conference 2018 Conference Paper

Discrete Network Embedding

  • Xiaobo Shen
  • Shirui Pan
  • Weiwei Liu
  • Yew-Soon Ong
  • Quan-Sen Sun

Network embedding aims to seek low-dimensional vector representations for network nodes by preserving the network structure. Network embeddings are typically represented as continuous vectors, which imposes formidable challenges in storage and computation costs, particularly in large-scale applications. To address this issue, this paper proposes a novel discrete network embedding (DNE) for more compact representations. In particular, DNE learns short binary codes to represent each node. The Hamming similarity between two binary embeddings is then employed to closely approximate the ground-truth similarity. A novel discrete multi-class classifier is also developed to expedite classification. Moreover, we propose to jointly learn the discrete embedding and the classifier within a unified framework to improve the compactness and discrimination of the network embedding. Extensive experiments on node classification consistently demonstrate that DNE exhibits lower storage and computational complexity than state-of-the-art network embedding methods, while obtaining competitive classification results.
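
The Hamming similarity used to compare two binary embeddings is simple to state (a minimal sketch; normalizing by the code length is our own illustrative choice):

```python
def hamming_similarity(code_a, code_b):
    """Fraction of matching bits between two equal-length binary
    node embeddings; 1.0 means identical codes, 0.0 fully opposite."""
    assert len(code_a) == len(code_b)
    matches = sum(a == b for a, b in zip(code_a, code_b))
    return matches / len(code_a)

# Three of four bits agree, so the nodes are fairly similar.
sim = hamming_similarity([1, 0, 1, 1], [1, 0, 0, 1])
```

Because this reduces to XOR-and-popcount on packed bits, it is far cheaper than a dot product between dense continuous embeddings, which is the storage/computation advantage the abstract refers to.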

AAAI Conference 2018 Conference Paper

Doubly Approximate Nearest Neighbor Classification

  • Weiwei Liu
  • Zhuanghua Liu
  • Ivor Tsang
  • Wenjie Zhang
  • Xuemin Lin

Nonparametric classification models, such as K-Nearest Neighbor (KNN), have become particularly powerful tools in machine learning and data mining, due to their simplicity and flexibility. However, the testing time of the KNN classifier becomes unacceptable, and its performance deteriorates significantly, when applied to data sets with millions of dimensions. We observe that state-of-the-art approximate nearest neighbor (ANN) methods aim either to reduce the number of distance comparisons using a tree structure or to decrease the cost of distance computation through dimension reduction. In this paper, we propose a doubly approximate nearest neighbor classification strategy that marries the two branches: it compresses the dimensions to decrease the distance computation cost and reduces the number of distance comparisons instead of performing a full scan. Under this strategy, we build a compressed dimensional tree (CD-Tree) to avoid unnecessary distance calculations. In each decision node, we propose a novel feature selection paradigm that optimizes the feature selection vector as well as the separator (indicator variables for splitting instances) with the maximum margin. An efficient algorithm is then developed to find the globally optimal solution with a convergence guarantee. Furthermore, we provide a data-dependent generalization error bound for our model, which reveals a new insight for the design of ANN classification algorithms. Our empirical studies show that our algorithm consistently obtains competitive or better classification results on all data sets, while running up to three orders of magnitude faster than state-of-the-art libraries in very high dimensions.

TIST Journal 2018 Journal Article

Multiview Discrete Hashing for Scalable Multimedia Search

  • Xiaobo Shen
  • Fumin Shen
  • Li Liu
  • Yun-Hao Yuan
  • Weiwei Liu
  • Quan-Sen Sun

Hashing techniques have recently gained increasing research interest in multimedia studies. Most existing hashing methods only employ single features for hash code learning. Multiview data, with each view corresponding to a type of feature, generally provide more comprehensive information. How to efficiently integrate multiple views for learning compact hash codes still remains challenging. In this article, we propose a novel unsupervised hashing method, dubbed multiview discrete hashing (MvDH), that effectively explores multiview data. Specifically, MvDH performs matrix factorization to generate the hash codes as the latent representations shared by multiple views, during which spectral clustering is performed simultaneously. The joint learning of hash codes and cluster labels enables MvDH to generate more discriminative hash codes, which are optimal for classification. An efficient alternating algorithm is developed to solve the proposed optimization problem with guaranteed convergence and low computational complexity. The binary codes are optimized via the discrete cyclic coordinate descent (DCC) method to reduce the quantization errors. Extensive experimental results on three large-scale benchmark datasets demonstrate the superiority of the proposed method over several state-of-the-art methods in terms of both accuracy and scalability.

IJCAI Conference 2018 Conference Paper

Ranking Preserving Nonnegative Matrix Factorization

  • Jing Wang
  • Feng Tian
  • Weiwei Liu
  • Xiao Wang
  • Wenjie Zhang
  • Kenji Yamanishi

Nonnegative matrix factorization (NMF), a well-known technique for finding parts-based representations of nonnegative data, has been widely studied. In reality, ordinal relations often exist among data, such as data point i being more related to j than to q. Such relative order is naturally available and, more importantly, truly reflects the latent data structure. Preserving the ordinal relations enables us to find structured representations of data that are faithful to the relative order, so that the learned representations become more discriminative. However, current NMFs pay no attention to this. In this paper, we make the first attempt towards incorporating the ordinal relations and propose a novel ranking preserving nonnegative matrix factorization (RPNMF) approach, which enforces the learned representations to be ranked according to the relations. We derive iterative updating rules to solve RPNMF's objective function with convergence guaranteed. Experimental results on several datasets for clustering and classification demonstrate that RPNMF outperforms the state-of-the-art methods, not only in terms of accuracy but also in the interpretation of the ordinal data structure.

JMLR Journal 2017 Journal Article

An Easy-to-hard Learning Paradigm for Multiple Classes and Multiple Labels

  • Weiwei Liu
  • Ivor W. Tsang
  • Klaus-Robert Müller

Many applications, such as human action recognition and object detection, can be formulated as a multiclass classification problem. One-vs-rest (OVR) is one of the most widely used approaches for multiclass classification due to its simplicity and excellent performance. However, many confusing classes in such applications will degrade its results. For example, hand clap and boxing are two confusing actions. Hand clap is easily misclassified as boxing, and vice versa. Therefore, precisely classifying confusing classes remains a challenging task. To obtain better performance for multiclass classifications that have confusing classes, we first develop a classifier chain model for multiclass classification (CCMC) to transfer class information between classifiers. Then, based on an analysis of our proposed model, we propose an easy-to-hard learning paradigm for multiclass classification to automatically identify easy and hard classes and then use the predictions from simpler classes to help solve harder classes. Similar to CCMC, the classifier chain (CC) model is also proposed by Read et al. (2009) to capture the label dependency for multi-label classification. However, CC does not consider the order of difficulty of the labels and achieves degenerated performance when there are many confusing labels. Therefore, it is non-trivial to learn the appropriate label order for CC. Motivated by our analysis for CCMC, we also propose the easy-to-hard learning paradigm for multi-label classification to automatically identify easy and hard labels, and then use the predictions from simpler labels to help solve harder labels. We also demonstrate that our proposed strategy can be successfully applied to a wide range of applications, such as ordinal classification and relationship prediction. Extensive empirical studies validate our analysis and the effectiveness of our proposed easy-to-hard learning strategies.

AAAI Conference 2017 Conference Paper

Compressed K-Means for Large-Scale Clustering

  • Xiaobo Shen
  • Weiwei Liu
  • Ivor Tsang
  • Fumin Shen
  • Quan-Sen Sun

Large-scale clustering has been widely used in many applications, and has received much attention. Most existing clustering methods suffer from both expensive computation and memory costs when applied to large-scale datasets. In this paper, we propose a novel clustering method, dubbed compressed k-means (CKM), for fast large-scale clustering. Specifically, high-dimensional data are compressed into short binary codes, which are well suited for fast clustering. CKM enjoys two key benefits: 1) storage can be significantly reduced by representing data points as binary codes; 2) distance computation is very efficient using the Hamming metric between binary codes. We propose to jointly learn binary codes and clusters within one framework. Extensive experimental results on four large-scale datasets, including two million-scale datasets, demonstrate that CKM outperforms the state-of-the-art large-scale clustering methods in terms of both computation and memory cost, while achieving comparable clustering accuracy.
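
The second ingredient above, clustering binary codes under the Hamming metric, can be sketched as follows (a toy illustration only: CKM learns codes and clusters jointly, whereas this sketch takes the binary codes as given and updates centroids by bitwise majority vote, with ties broken towards 1):

```python
def hamming(a, b):
    """Hamming distance between two equal-length binary codes."""
    return sum(x != y for x, y in zip(a, b))

def kmeans_hamming(codes, centroids, iters=10):
    """Toy k-means over binary codes: assign each code to its
    nearest centroid in Hamming distance, then update each centroid
    bit by a majority vote over its cluster."""
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for code in codes:
            j = min(range(len(centroids)),
                    key=lambda t: hamming(code, centroids[t]))
            clusters[j].append(code)
        # Majority vote per bit; empty clusters keep their old centroid.
        centroids = [
            [int(2 * sum(bits) >= len(cl)) for bits in zip(*cl)] if cl else ctr
            for cl, ctr in zip(clusters, centroids)
        ]
    return centroids, clusters

codes = [[0, 0, 0, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 1, 0]]
cents, groups = kmeans_hamming(codes, [[0, 0, 0, 0], [1, 1, 1, 1]])
```

Because Hamming distances reduce to XOR-and-popcount on packed bits, both the storage and the per-distance cost drop sharply relative to dense Euclidean k-means, which is the speedup the abstract claims.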

JMLR Journal 2017 Journal Article

Making Decision Trees Feasible in Ultrahigh Feature and Label Dimensions

  • Weiwei Liu
  • Ivor W. Tsang

Due to their non-linear but highly interpretable representations, decision tree (DT) models have attracted significant attention from researchers. However, it is difficult to understand and interpret DT models in ultrahigh dimensions, and DT models usually suffer from the curse of dimensionality, achieving degenerated performance when there are many noisy features. To address these issues, this paper first presents a novel data-dependent generalization error bound for the perceptron decision tree (PDT), which provides the theoretical justification to learn a sparse linear hyperplane in each decision node and to prune the tree. Following our analysis, we introduce the notion of a budget-aware classifier (BAC) with a budget constraint on the weight coefficients, and propose a supervised budgeted tree (SBT) algorithm to achieve non-linear prediction performance. To avoid generating an unstable and complicated decision tree and to improve the generalization of the SBT, we present a pruning strategy that learns classifiers to minimize cross-validation errors on each BAC. To deal with ultrahigh label dimensions, based on three important phenomena of real-world data sets from a variety of application domains, we develop a sparse coding tree framework for multi-label annotation problems and provide the theoretical analysis. Extensive empirical studies verify that 1) SBT is easy to understand and interpret in ultrahigh dimensions and is more resilient to noisy features; 2) compared with state-of-the-art algorithms, our proposed sparse coding tree framework is more efficient, yet accurate, in ultrahigh label and feature dimensions.

NeurIPS Conference 2017 Conference Paper

Sparse Embedded $k$-Means Clustering

  • Weiwei Liu
  • Xiaobo Shen
  • Ivor Tsang

The $k$-means clustering algorithm is a ubiquitous tool in data mining and machine learning that shows promising performance. However, its high computational cost has hindered its applications in broad domains. Researchers have successfully addressed these obstacles with dimensionality reduction methods. Recently, [1] develop a state-of-the-art random projection (RP) method for faster $k$-means clustering. Their method delivers many improvements over other dimensionality reduction methods. For example, compared to the advanced singular value decomposition based feature extraction approach, [1] reduce the running time by a factor of $\min\{n, d\}\,\epsilon^2 \log(d)/k$ for a data matrix $X \in \mathbb{R}^{n\times d}$ with $n$ data points and $d$ features, while losing only a factor of one in approximation accuracy. Unfortunately, they still require $\mathcal{O}\left(\frac{ndk}{\epsilon^2 \log(d)}\right)$ time for matrix multiplication, and this cost is prohibitive for large values of $n$ and $d$. To break this bottleneck, we carefully build a sparse embedded $k$-means clustering algorithm which requires only $\mathcal{O}(nnz(X))$ time ($nnz(X)$ denotes the number of non-zeros in $X$) for fast matrix multiplication. Moreover, our proposed algorithm improves on [1]'s results for approximation accuracy by a factor of one. Our empirical studies corroborate our theoretical findings and demonstrate that our approach significantly accelerates $k$-means clustering while achieving satisfactory clustering performance.
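
The kind of sparse embedding that makes an $\mathcal{O}(nnz(X))$ projection cost possible can be illustrated with a CountSketch-style construction (an illustrative sketch under our own assumptions, not necessarily the exact embedding used in the paper): each input feature is hashed to a single output coordinate with a random sign, so projecting a row touches only its non-zero entries.

```python
import random

def sparse_embed(X, k, seed=0):
    """CountSketch-style sparse embedding: each of the d input
    features is hashed to one of k output coordinates with a random
    sign, so projecting costs O(nnz(X)) rather than a dense O(ndk)
    matrix multiply."""
    rng = random.Random(seed)
    d = len(X[0])
    bucket = [rng.randrange(k) for _ in range(d)]   # target coordinate
    sign = [rng.choice((-1.0, 1.0)) for _ in range(d)]
    out = []
    for row in X:
        y = [0.0] * k
        for j, v in enumerate(row):
            if v:  # only non-zero entries do any work
                y[bucket[j]] += sign[j] * v
        out.append(y)
    return out

X = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
Y = sparse_embed(X, 2)
```

Rows with a single non-zero keep their norm exactly under this map; in general the embedding only preserves norms and inner products approximately, which is where the $\epsilon$ accuracy terms in the abstract come from.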

AAAI Conference 2016 Conference Paper

Sparse Perceptron Decision Tree for Millions of Dimensions

  • Weiwei Liu
  • Ivor Tsang

Due to their nonlinear but highly interpretable representations, decision tree (DT) models have attracted significant attention from researchers. However, DT models usually suffer from the curse of dimensionality and achieve degenerated performance when there are many noisy features. To address these issues, this paper first presents a novel data-dependent generalization error bound for the perceptron decision tree (PDT), which provides the theoretical justification to learn a sparse linear hyperplane in each decision node and to prune the tree. Following our analysis, we introduce the notion of a sparse perceptron decision node (SPDN) with a budget constraint on the weight coefficients, and propose a sparse perceptron decision tree (SPDT) algorithm to achieve nonlinear prediction performance. To avoid generating an unstable and complicated decision tree and to improve the generalization of the SPDT, we present a pruning strategy that learns classifiers to minimize cross-validation errors on each SPDN. Extensive empirical studies verify that our SPDT is more resilient to noisy features and effectively generates a small, yet accurate, decision tree. Compared with state-of-the-art DT methods and SVM, our SPDT achieves better generalization performance on ultrahigh dimensional problems with more than 1 million features.

AAAI Conference 2015 Conference Paper

Effectively Predicting Whether and When a Topic Will Become Prevalent in a Social Network

  • Weiwei Liu
  • Zhi-Hong Deng
  • Xiuwen Gong
  • Frank Jiang
  • Ivor Tsang

Effective forecasting of future prevalent topics plays an important role in social network business development. It involves two challenging aspects: predicting whether a topic will become prevalent, and when. This cannot be directly handled by existing algorithms in topic modeling, item recommendation and action forecasting. The classic forecasting framework based on time series models may be able to predict a hot topic when its user-discussion frequency changes periodically in a systematic way. However, the frequency with which topics are discussed by users often changes irregularly in social networks. In this paper, a generic probabilistic framework is proposed for hot topic prediction, and machine learning methods are explored to predict hot topic patterns. Two effective models, PreWHether and PreWHen, are introduced to predict whether and when a topic will become prevalent. In the PreWHether model, we simulate the constructed features of previously observed frequency changes for better prediction. In the PreWHen model, distributions of the time intervals between a topic's emergence and its prevalence are modeled. Extensive experiments on real datasets demonstrate that our method outperforms the baselines and generates more effective predictions.

AAAI Conference 2015 Conference Paper

Large Margin Metric Learning for Multi-Label Prediction

  • Weiwei Liu
  • Ivor Tsang

Canonical correlation analysis (CCA) and maximum margin output coding (MMOC) methods have shown promising results for multi-label prediction, where each instance is associated with multiple labels. However, these methods require an expensive decoding procedure to recover the multiple labels of each testing instance. The testing complexity becomes unacceptable when there are many labels. To avoid decoding completely, we present a novel large margin metric learning paradigm for multi-label prediction. In particular, the proposed method learns a distance metric to discover label dependency such that instances with very different multiple labels will be moved far away. To handle many labels, we present an accelerated proximal gradient procedure to speed up the learning process. Comprehensive experiments demonstrate that our proposed method is significantly faster than CCA and MMOC in terms of both training and testing complexities. Moreover, our method achieves superior prediction performance compared with state-of-the-art methods.

NeurIPS Conference 2015 Conference Paper

On the Optimality of Classifier Chain for Multi-label Classification

  • Weiwei Liu
  • Ivor Tsang

To capture the interdependencies between labels in multi-label classification problems, classifier chain (CC) tries to take the multiple labels of each instance into account under a deterministic high-order Markov Chain model. Since its performance is sensitive to the choice of label order, the key issue is how to determine the optimal label order for CC. In this work, we first generalize the CC model over a random label order. Then, we present a theoretical analysis of the generalization error for the proposed generalized model. Based on our results, we propose a dynamic programming based classifier chain (CC-DP) algorithm to search the globally optimal label order for CC and a greedy classifier chain (CC-Greedy) algorithm to find a locally optimal CC. Comprehensive experiments on a number of real-world multi-label data sets from various domains demonstrate that our proposed CC-DP algorithm outperforms state-of-the-art approaches and the CC-Greedy algorithm achieves comparable prediction performance with CC-DP.
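
The chain mechanism underlying CC can be sketched in a few lines (a generic toy illustration of chained inference, not the CC-DP or CC-Greedy algorithms; the threshold rules below are invented for the example):

```python
def chain_predict(x, classifiers):
    """Classifier chain inference: the i-th classifier receives the
    features plus the i-1 labels predicted so far, so the label order
    determines what information each link can condition on."""
    preds = []
    for clf in classifiers:
        # Each prediction is appended to the input of the next link.
        preds.append(clf(x + preds))
    return preds

# Toy chain of threshold rules over features [x0, x1]:
clfs = [
    lambda z: int(z[0] > 0),           # label 1 from the raw features
    lambda z: int(z[1] > 0 or z[2]),   # label 2 may reuse label 1's prediction
]
out = chain_predict([1, -1], clfs)
```

Reordering `clfs` changes which earlier predictions each link sees, which is exactly why the label order matters and why searching for a good order (as CC-DP and CC-Greedy do) affects accuracy.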