Arrow Research search

Author name cluster

Feiping Nie

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

105 papers
1 author row

Possible papers

105

AAAI Conference 2026 Conference Paper

Adaptive and Asymptotic Mean-based Subclass Discriminant Analysis

  • Yuzhe Feng
  • Yunlong Gao
  • Feiping Nie

Traditional discriminant analysis (DA) is one of the classical supervised learning algorithms for reducing the dimensionality of data under a Gaussian assumption. Since the single class mean in traditional DA cannot adequately model non-Gaussian data distributions, some existing DA algorithms based on clustering criteria learn multiple means in each class so as to address the non-Gaussian issue. However, clustering-based DA inevitably involves a constrained optimization problem to learn the multiple means, which may lead to locally optimal solutions. To address these issues, inspired by smooth approximation theory and the concept of the Kolmogorov mean, this paper explores an unconstrained function with an asymptotic property as an alternative proxy for clustering-based DA algorithms. The derived algorithm, adaptive and asymptotic mean-based subclass discriminant analysis (AASDA), not only leverages multiple means to represent different subclasses within the same class but also adaptively and asymptotically learns a suitable mean for each sample in the learned optimal subspace via a gradient-based optimizer. The asymptotic analysis of the unconstrained function, together with the gradient analysis and convergence guarantee of the proposed criterion, verifies the effectiveness of AASDA. Its merits are thoroughly assessed in a suite of experiments on synthetic and real-world data.
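
As background for the Kolmogorov-mean idea mentioned in this abstract, a standard smooth surrogate for the minimum over per-subclass distances is the power (Kolmogorov) mean; the notation below is illustrative and is not taken from the paper.

$$ M_p(d_1,\dots,d_m) \;=\; \Big(\tfrac{1}{m}\sum_{j=1}^{m} d_j^{\,p}\Big)^{1/p}, \qquad \lim_{p\to-\infty} M_p(d_1,\dots,d_m) \;=\; \min_{j} d_j \quad (d_j > 0), $$

so replacing each sample's hard assignment to its nearest subclass mean with $M_p$ yields an unconstrained, differentiable objective that approaches the clustering-based criterion asymptotically as $p \to -\infty$.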

AAAI Conference 2026 Conference Paper

Reliable-View 2D-3D Key-Part Aligned Transformer with Reinforced Masking for 3D Point Cloud Understanding

  • Xianglong Jin
  • Zheng Wang
  • Rong Wang
  • Feiping Nie

Self-supervised 3D point cloud understanding is crucial for scene understanding, where Masked Autoencoders (MAE) have achieved excellent performance in point cloud representation learning. However, existing MAE-style methods fail to consider spatial-semantic variations in masking strategies, and joint learning with multi-view images often overlooks view redundancy. To address these challenges, we propose an MAE framework enhanced with reliable multi-view 2D-3D Key-part alignment and Reinforced masking, named as KR-MAE. Our approach comprises three key innovations: Reinforced Masking (RM) strategically samples visible tokens based on semantic saliency to enhance reconstruction fidelity; Reliable Multi-View Selector (RVS) dynamically refines the most informative image subset by filtering occluded or low-texture views, mitigating detrimental redundancy; Reliable-view 2D-3D Key-part Aligned Transformer (KAT) establishes semantic-aligned correspondence between salient 3D point cloud parts and reliable multi-view 2D image patches, leveraging rich texture cues from 2D images to compensate for sparse geometry in point cloud. Extensive experiments on 3D classification and segmentation benchmarks demonstrate that KR-MAE achieves state-of-the-art performance, surpassing prior multi-modal methods.

AAAI Conference 2026 Conference Paper

S2-Boost: Synergistic Semantic Boosting for Coarse-to-Fine Ensemble Learning

  • Guanxiong He
  • Zheng Wang
  • Jie Wang
  • Liaoyuan Tang
  • Rong Wang
  • Feiping Nie

Neuroscientific evidence reveals that human visual recognition is not an instantaneous event but a hierarchical process, where the brain constructs a holistic perception by progressively integrating simple features like edges or texture into complex scenes. Ensemble learning successfully utilizes this principle, yet existing methods typically integrate models at the decision level, neglecting the rich, complementary information within the feature space itself and thus fundamentally limiting their potential. To address this, we introduce Synergistic Semantic Boosting (S2-Boosting), a framework that employs a self-supervised hierarchical semantic learning module to decompose an image into complementary, semantically meaningful parts autonomously. These parts guide a boosting procedure where a sequence of specialized learners, each focusing on a specific semantic partition, collaboratively corrects the ensemble's errors. We further present encouraging results on real-world image datasets, highlighting the intrinsic interpretability, paving the way for more robust and transparent models.

AAAI Conference 2026 Conference Paper

Towards Federated Clustering: A Client-wise Private Graph Aggregation Framework

  • Guanxiong He
  • Zheng Wang
  • Jie Wang
  • Liaoyuan Tang
  • Rong Wang
  • Feiping Nie

Federated clustering addresses the critical challenge of extracting patterns from decentralized, unlabeled data. However, it is hampered by the flaw that current approaches are forced to accept a compromise between performance and privacy: transmitting embedding representations risks sensitive data leakage, while sharing only abstract cluster prototypes leads to diminished model accuracy. To resolve this dilemma, we propose Structural Privacy-Preserving Federated Graph Clustering (SPP-FGC), a novel algorithm that innovatively leverages local structural graphs as the primary medium for privacy-preserving knowledge sharing, thus moving beyond the limitations of conventional techniques. Our framework operates on a clear client-server logic; on the client-side, each participant constructs a private structural graph that captures intrinsic data relationships, which the server then securely aggregates and aligns to form a comprehensive global graph from which a unified clustering structure is derived. The framework offers two distinct modes to suit different needs. SPP-FGC is designed as an efficient one-shot method that completes its task in a single communication round, ideal for rapid analysis. For more complex, unstructured data like images, SPP-FGC+ employs an iterative process where clients and the server collaboratively refine feature representations to achieve superior downstream performance. Extensive experiments demonstrate that our framework achieves state-of-the-art performance, improving clustering accuracy by up to 10% (NMI) over federated baselines while maintaining provable privacy guarantees.

IJCAI Conference 2025 Conference Paper

Capturing Individuality and Commonality Between Anchor Graphs for Multi-View Clustering

  • Zhoumin Lu
  • Yongbo Yu
  • Linru Ma
  • Feiping Nie
  • Rong Wang

The use of anchors often leads to better efficiency and scalability, making them highly favored. However, there is a challenge in anchor-based multi-view subspace learning: a unified anchor graph overly emphasizes the commonality between views and fails to adequately capture view-specific individuality. This has led some models to independently explore the individuality of each view before aligning and integrating them, often achieving better performance but making the process more cumbersome. Therefore, this paper proposes a new model that simultaneously captures the individuality and commonality between anchor graphs for multi-view clustering. The model has three notable advantages: First, it allows view-specific anchor graphs to align in real time with a common anchor graph as a reference, eliminating the need for post-alignment. Second, it enforces a cluster-wise structure among anchors and balances sample distribution among them, providing strong discriminative power. Lastly, it maintains linear complexity with respect to the numbers of samples and anchors, avoiding the significant time costs associated with their increase. Comprehensive experiments demonstrate the effectiveness and efficiency of our method compared to various state-of-the-art algorithms.

AAAI Conference 2025 Conference Paper

Language Pre-training Guided Masking Representation Learning for Time Series Classification

  • Liaoyuan Tang
  • Zheng Wang
  • Jie Wang
  • Guanxiong He
  • Zhezheng Hao
  • Rong Wang
  • Feiping Nie

The representation learning of time series supports a wide range of downstream tasks and applications in many practical scenarios. However, due to the complexity, spatiotemporality, and continuity of sequential stream data, self-supervised representation learning for time series is even more challenging than for structured data such as images/videos. Besides, the direct application of existing contrastive learning and masked autoencoder based approaches to time series representation learning encounters inherent theoretical limitations, such as ineffective augmentation and masking strategies. To this end, we propose Language Pre-training guided Masking Representation Learning (LPMRL) for time series classification. Specifically, we first propose a novel language pre-training guided masking encoder for adaptively sampling semantic spatiotemporal patches via natural language descriptions and improving the discriminability of latent representations. Furthermore, we present a dual-information contrastive learning mechanism to explore both local and global information by meticulously designing high-quality hard negative samples of time series data. We also design various experiments, such as visualization of masking positions and distributions and of reconstruction error, to verify the reasonableness of the proposed language-guided masking technique. Last, we evaluate the proposed representation learning via a classification task conducted on 106 time series datasets, which demonstrates the effectiveness of the proposed method.

TMLR Journal 2024 Journal Article

Hyperspherical Prototype Node Clustering

  • Jitao Lu
  • Danyang Wu
  • Feiping Nie
  • Rong Wang
  • Xuelong Li

The general workflow of deep node clustering is to encode the nodes into node embeddings via graph neural networks and uncover clustering decisions from them, so clustering performance is heavily affected by the embeddings. However, existing works only consider preserving the semantics of the graph but ignore the inter-cluster separability of the nodes, so there's no guarantee that the embeddings can present a clear clustering structure. To remedy this deficiency, we propose Hyperspherical Prototype Node Clustering (HPNC), an end-to-end clustering paradigm that explicitly enhances the inter-cluster separability of learned node embeddings. Concretely, we constrain the embedding space to a unit-hypersphere, enabling us to scatter the cluster prototypes over the space with maximized pairwise distances. Then, we employ a graph autoencoder to map nodes onto the same hypersphere manifold. Consequently, cluster affinities can be directly retrieved from cosine similarities between node embeddings and prototypes. A clustering-oriented loss is imposed to sharpen the affinity distribution so that the learned node embeddings are encouraged to have small intra-cluster distances and large inter-cluster distances. Based on the proposed HPNC paradigm, we devise two schemes (HPNC-IM and HPNC-DEC) with distinct clustering backbones. Empirical results on popular benchmark datasets demonstrate the superiority of our method compared to other state-of-the-art clustering methods, and visualization results illustrate improved separability of the learned embeddings.
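
A minimal sketch of the affinity computation described above, in numpy: node embeddings and prototypes are L2-normalized onto the hypersphere, cosine similarities are computed, and a temperature-scaled softmax gives the soft cluster affinities. All names, shapes, and the temperature value are illustrative assumptions, not the authors' code.

import numpy as np

def cluster_affinities(Z, C, tau=0.1):
    """Soft cluster affinities from cosine similarity to hyperspherical prototypes.

    Z: (n, d) node embeddings; C: (k, d) cluster prototypes; tau: temperature,
    where smaller values sharpen the affinity distribution.
    """
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)     # project onto the unit hypersphere
    C = C / np.linalg.norm(C, axis=1, keepdims=True)
    logits = Z @ C.T / tau                               # (n, k) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)              # rows sum to 1

# Toy usage: 5 embeddings, 3 prototypes
rng = np.random.default_rng(0)
A = cluster_affinities(rng.normal(size=(5, 8)), rng.normal(size=(3, 8)))
print(A.argmax(axis=1))                                  # hard cluster assignments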

AAAI Conference 2024 Conference Paper

Multi-Class Support Vector Machine with Maximizing Minimum Margin

  • Feiping Nie
  • Zhezheng Hao
  • Rong Wang

Support Vector Machine (SVM) stands out as a prominent machine learning technique widely applied in practical pattern recognition tasks. It achieves binary classification by maximizing the "margin", which represents the minimum distance between instances and the decision boundary. Although many efforts have been dedicated to extending SVM to the multi-class case through strategies such as one versus one and one versus the rest, satisfactory solutions remain to be developed. In this paper, we propose a novel method for multi-class SVM that incorporates pairwise class loss considerations and maximizes the minimum margin. Adhering to this concept, we embrace a new formulation that imparts heightened flexibility to multi-class SVM. Furthermore, the correlations between the proposed method and multiple forms of multi-class SVM are analyzed. The proposed regularizer, akin to the concept of "margin", can serve as a seamless enhancement over the softmax in deep learning, providing guidance for network parameter learning. Empirical evaluations demonstrate the effectiveness and superiority of our proposed method over existing multi-classification methods. The complete version is available at https://arxiv.org/pdf/2312.06578.pdf. Code is available at https://github.com/zz-haooo/M3SVM.

IJCAI Conference 2024 Conference Paper

Perturbation Guiding Contrastive Representation Learning for Time Series Anomaly Detection

  • Liaoyuan Tang
  • Zheng Wang
  • Guanxiong He
  • Rong Wang
  • Feiping Nie

Time series anomaly detection is a critical task with applications in various domains. Due to annotation challenges, self-supervised methods have become the mainstream approach for time series anomaly detection in recent years. However, current contrastive methods categorize data perturbations into binary classes, normal or anomalous, which leaves the specific impact of different perturbation methods unclear. Inspired by the hypothesis that "the higher the probability of misclassifying perturbation types, the higher the probability of anomalies", we propose PCRTA. Our approach first devises a perturbation classifier to learn the pseudo-labels of data perturbations. Furthermore, to address the "class collapse" issue in contrastive learning, we propose a perturbation-guided positive and negative sample selection strategy by introducing learnable perturbation classification networks. Extensive experiments on six real-world datasets demonstrate the significant superiority of our model over thirteen state-of-the-art competitors, with average improvements of 5.14% and 8.24% in F1 score and AUC-PR, respectively.

AAAI Conference 2023 Conference Paper

Efficient Top-K Feature Selection Using Coordinate Descent Method

  • Lei Xu
  • Rong Wang
  • Feiping Nie
  • Xuelong Li

Sparse learning based feature selection has been widely investigated in recent years. In this study, we focus on l2,0-norm based feature selection, which is effective for exact top-k feature selection but challenging to optimize. To solve general l2,0-norm constrained problems, we develop a parameter-free optimization framework based on the coordinate descent (CD) method, termed CD-LSR. Specifically, we devise a skillful conversion from the original problem to solving one continuous matrix and one discrete selection matrix. The nontrivial l2,0-norm constraint can then be handled efficiently by solving for the selection matrix with the CD method. We impose the l2,0-norm on a vanilla least squares regression (LSR) model for feature selection and optimize it with CD-LSR. Extensive experiments exhibit the efficiency of CD-LSR, as well as the discrimination ability of the l2,0-norm to identify informative features. More importantly, the versatility of CD-LSR facilitates applications of the l2,0-norm in more sophisticated models. Given the competitive performance of the l2,0-norm on the baseline LSR model, satisfactory performance in these applications can reasonably be expected. The source MATLAB code is available at: https://github.com/solerxl/Code_For_AAAI_2023.
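
For concreteness, the l2,0-norm constrained least squares regression referred to above can be written as the following problem, where $\|W\|_{2,0}$ counts the nonzero rows of $W$ (i.e., the selected features); details such as the bias term may differ from the paper's exact formulation.

$$ \min_{W,\;b} \;\|X^{\top}W + \mathbf{1}b^{\top} - Y\|_F^2 \quad \text{s.t.} \quad \|W\|_{2,0} = k, $$

so exactly $k$ rows of $W$ are nonzero and the corresponding $k$ features are selected.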

NeurIPS Conference 2023 Conference Paper

Joint Feature and Differentiable $k$-NN Graph Learning using Dirichlet Energy

  • Lei Xu
  • Lei Chen
  • Rong Wang
  • Feiping Nie
  • Xuelong Li

Feature selection (FS) plays an important role in machine learning, which extracts important features and accelerates the learning process. In this paper, we propose a deep FS method that simultaneously conducts feature selection and differentiable $k$-NN graph learning based on the Dirichlet Energy. The Dirichlet Energy identifies important features by measuring their smoothness on the graph structure, and facilitates the learning of a new graph that reflects the inherent structure in the new feature subspace. We employ Optimal Transport theory to address the non-differentiability issue of learning $k$-NN graphs in neural networks, which theoretically makes our method applicable to other graph neural networks for dynamic graph learning. Furthermore, the proposed framework is interpretable, since all modules are designed algorithmically. We validate the effectiveness of our model with extensive experiments on both synthetic and real-world datasets.
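
The Dirichlet energy mentioned above has a standard closed form on a graph with adjacency matrix $A$, degree matrix $D$ and Laplacian $L = D - A$; this is a textbook identity rather than a formula quoted from the paper.

$$ E(f) \;=\; \tfrac{1}{2}\sum_{i,j} A_{ij}\,(f_i - f_j)^2 \;=\; f^{\top} L f, $$

so a feature $f$ that varies smoothly over the learned $k$-NN graph has low energy, which is the criterion used to score and select features.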

IJCAI Conference 2022 Conference Paper

EMGC²F: Efficient Multi-view Graph Clustering with Comprehensive Fusion

  • Danyang Wu
  • Jitao Lu
  • Feiping Nie
  • Rong Wang
  • Yuan Yuan

This paper proposes an Efficient Multi-view Graph Clustering with Comprehensive Fusion (EMGC²F) model and a corresponding efficient optimization algorithm to address multi-view graph clustering tasks effectively and efficiently. Compared to existing works, our proposals have the following highlights: 1) EMGC²F directly finds a consistent cluster indicator matrix with a Super Nodes Similarity Minimization module from multiple views, which avoids time-consuming spectral decomposition in previous works. 2) EMGC²F comprehensively mines information from multiple views. More formally, it captures the consistency of multiple views via a Cross-view Nearest Neighbors Voting (CN²V) mechanism, meanwhile capturing the importance of multiple views via an adaptive weighted-learning mechanism. 3) EMGC²F is a parameter-free model and the time complexity of the proposed algorithm is far less than existing works, demonstrating the practicability. Empirical results on several benchmark datasets demonstrate that our proposals outperform SOTA competitors both in effectiveness and efficiency.

IJCAI Conference 2021 Conference Paper

Discrete Multiple Kernel k-means

  • Rong Wang
  • Jitao Lu
  • Yihang Lu
  • Feiping Nie
  • Xuelong Li

The multiple kernel k-means (MKKM) and its variants utilize complementary information from different kernels, achieving better performance than kernel k-means (KKM). However, the optimization procedures of previous works all comprise two stages, learning the continuous relaxed label matrix and obtaining the discrete one by extra discretization procedures. Such a two-stage strategy gives rise to a mismatched problem and severe information loss. To address this problem, we elaborate a novel Discrete Multiple Kernel k-means (DMKKM) model solved by an optimization algorithm that directly obtains the cluster indicator matrix without subsequent discretization procedures. Moreover, DMKKM can strictly measure the correlations among kernels, which is capable of enhancing kernel fusion by reducing redundancy and improving diversity. What’s more, DMKKM is parameter-free avoiding intractable hyperparameter tuning, which makes it feasible in practical applications. Extensive experiments illustrated the effectiveness and superiority of the proposed model.

AAAI Conference 2021 Conference Paper

Fast Multi-view Discrete Clustering with Anchor Graphs

  • Qianyao Qiang
  • Bin Zhang
  • Fei Wang
  • Feiping Nie

Generally, existing graph-based multi-view clustering models consist of two steps: (1) graph construction; (2) eigendecomposition of the graph Laplacian matrix to compute a continuous cluster assignment matrix, followed by a postprocessing algorithm to obtain the discrete one. However, both the graph construction and the eigendecomposition are time-consuming, and the two-stage process may deviate from directly solving the primal problem. To this end, we propose Fast Multi-view Discrete Clustering (FMDC) with anchor graphs, focusing on directly solving the spectral clustering problem with a small time cost. We efficiently generate representative anchors and construct anchor graphs on different views. The discrete cluster assignment matrix is directly obtained by performing clustering on the automatically aggregated graph. FMDC has linear computational complexity with respect to the data scale, a significant improvement over the quadratic one. Extensive experiments on benchmark datasets demonstrate its efficiency and effectiveness.

IJCAI Conference 2021 Conference Paper

GSPL: A Succinct Kernel Model for Group-Sparse Projections Learning of Multiview Data

  • Danyang Wu
  • Jin Xu
  • Xia Dong
  • Meng Liao
  • Rong Wang
  • Feiping Nie
  • Xuelong Li

This paper explores a succinct kernel model for Group-Sparse Projections Learning (GSPL) to handle the multiview feature selection task completely. Compared to previous works, our model has the following useful properties: 1) Strictness: GSPL learns group-sparse projections strictly on multiview data via an ℓ2,0-norm constraint, which differs from previous works that encourage group-sparse projections only softly. 2) Adaptivity: In the GSPL model, when the total number of selected features is given, the numbers of selected features from different views can be determined adaptively, which avoids artificial settings. Besides, GSPL can capture the differences among multiple views adaptively, which handles the inconsistency among different views. 3) Succinctness: Except for the intrinsic parameters of the projection-based feature selection task, GSPL does not introduce extra parameters, which guarantees its applicability in practice. To solve the optimization problem involved in GSPL, a novel iterative algorithm is proposed with rigorous theoretical guarantees. Experimental results demonstrate the superb performance of GSPL on synthetic and real datasets.

AAAI Conference 2021 Conference Paper

Integrating Static and Dynamic Data for Improved Prediction of Cognitive Declines Using Augmented Genotype-Phenotype Representations

  • Hoon Seo
  • Lodewijk Brand
  • Hua Wang
  • Feiping Nie

Alzheimer’s Disease (AD) is a chronic neurodegenerative disease that causes severe problems in patients’ thinking, memory, and behavior. An early diagnosis is crucial to prevent AD progression; to this end, many algorithmic approaches have recently been proposed to predict cognitive decline. However, these predictive models often fail to integrate heterogeneous genetic and neuroimaging biomarkers and struggle to handle missing data. In this work we propose a novel objective function and an associated optimization algorithm to identify cognitive decline related to AD. Our approach is designed to incorporate dynamic neuroimaging data by way of a participant-specific augmentation combined with multimodal data integration aligned via a regression task. Our approach, in order to incorporate additional side-information, utilizes structured regularization techniques popularized in recent AD literature. Armed with the fixed-length vector representation learned from the multimodal dynamic and static modalities, conventional machine learning methods can be used to predict the clinical outcomes associated with AD. Our experimental results show that the proposed augmentation model improves the prediction performance on cognitive assessment scores for a collection of popular machine learning algorithms. The results of our approach are interpreted to validate existing genetic and neuroimaging biomarkers that have been shown to be predictive of cognitive decline.

IJCAI Conference 2020 Conference Paper

Discriminative Feature Selection via A Structured Sparse Subspace Learning Module

  • Zheng Wang
  • Feiping Nie
  • Lai Tian
  • Rong Wang
  • Xuelong Li

In this paper, we first propose a novel Structured Sparse Subspace Learning (S^3L) module to address the long-standing subspace sparsity issue. Built on the proposed module, we design a new discriminative feature selection method, named Subspace Sparsity Discriminant Feature Selection (S^2DFS), which enables the following new functionalities: 1) The proposed S^2DFS method directly combines a trace ratio objective with a structured sparse subspace constraint via the L2,0-norm to learn a row-sparse subspace, which improves the discriminability of the model and avoids the parameter-tuning trouble of methods that use L2,1-norm regularization; 2) An alternative iterative optimization algorithm based on the proposed S^3L module is presented to explicitly solve the proposed problem with a closed-form solution and a strict convergence proof. To the best of our knowledge, such an objective function and solver are first proposed in this paper, which provides a new line of thought for the development of feature selection methods. Extensive experiments conducted on several high-dimensional datasets demonstrate the discriminability of the features selected via S^2DFS in comparison with several related SOTA feature selection methods. Source MATLAB code: https://github.com/StevenWangNPU/L20-FS.
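
A plausible reading of the joint objective described above, with $S_b$ and $S_w$ the between-class and within-class scatter matrices and $\|W\|_{2,0}$ counting the nonzero rows of the projection $W$; the precise scatter matrices and constraint set in the paper may differ.

$$ \max_{W} \;\frac{\operatorname{tr}(W^{\top} S_b W)}{\operatorname{tr}(W^{\top} S_w W)} \quad \text{s.t.} \quad W^{\top}W = I, \;\; \|W\|_{2,0} = k, $$

so the trace ratio is maximized over row-sparse orthogonal projections, and the $k$ nonzero rows of $W$ directly index the selected features.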

NeurIPS Conference 2020 Conference Paper

Efficient Clustering Based On A Unified View Of $K$-means And Ratio-cut

  • Shenfei Pei
  • Feiping Nie
  • Rong Wang
  • Xuelong Li

Spectral clustering and $k$-means, two major traditional clustering methods, are still attracting a lot of attention, although a variety of novel clustering algorithms have been proposed in recent years. First, a unified framework of $k$-means and ratio-cut is revisited, and a novel and efficient clustering algorithm is then proposed based on this framework. The time and space complexity of our method are both linear with respect to the number of samples and, more importantly, independent of the number of clusters to construct. These properties mean that it is easily scalable and applicable to large practical problems. Extensive experiments on 12 real-world benchmark and 8 facial datasets validate the advantages of the proposed algorithm compared to state-of-the-art clustering algorithms. In particular, over 15x and 7x speed-ups are obtained with respect to $k$-means on a synthetic dataset of 1 million samples and the benchmark dataset (CelebA) of 200k samples, respectively [GitHub].

NeurIPS Conference 2020 Conference Paper

Learning Feature Sparse Principal Subspace

  • Lai Tian
  • Feiping Nie
  • Rong Wang
  • Xuelong Li

This paper presents new algorithms to solve the feature-sparsity constrained PCA problem (FSPCA), which performs feature selection and PCA simultaneously. Existing optimization methods for FSPCA require data distribution assumptions and lack a global convergence guarantee. Though the general FSPCA problem is NP-hard, we show that, for a low-rank covariance, FSPCA can be solved globally (Algorithm 1). We then propose another strategy (Algorithm 2) to solve FSPCA for a general covariance by iteratively building a carefully designed proxy. We prove (data-dependent) approximation bounds and convergence guarantees for the new algorithms. For covariance spectra with exponential/Zipf's distributions, we provide exponential/posynomial approximation bounds. Experimental results show the promising performance and efficiency of the new algorithms compared with the state-of-the-art on both synthetic and real-world datasets.
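
For orientation, the feature-sparsity constrained PCA problem can be stated as follows, with $S$ the (sample) covariance matrix and the row-support constraint limiting how many features participate in the principal subspace; this is a sketch consistent with the abstract, not the paper's exact notation.

$$ \max_{W \in \mathbb{R}^{d\times k}} \;\operatorname{tr}(W^{\top} S W) \quad \text{s.t.} \quad W^{\top}W = I_k, \;\; \|W\|_{2,0} \le s, $$

where $\|W\|_{2,0}$ counts the nonzero rows of $W$, so the same $s$ features support all $k$ principal directions.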

TIST Journal 2020 Journal Article

Self-weighted Robust LDA for Multiclass Classification with Edge Classes

  • Caixia Yan
  • Xiaojun Chang
  • Minnan Luo
  • Qinghua Zheng
  • Xiaoqin Zhang
  • Zhihui Li
  • Feiping Nie

Linear discriminant analysis (LDA) is a popular technique to learn the most discriminative features for multi-class classification. A vast majority of existing LDA algorithms are prone to be dominated by the class with very large deviation from the others, i.e., edge class, which occurs frequently in multi-class classification. First, the existence of edge classes often makes the total mean biased in the calculation of between-class scatter matrix. Second, the exploitation of ℓ2-norm based between-class distance criterion magnifies the extremely large distance corresponding to edge class. In this regard, a novel self-weighted robust LDA with ℓ2,1-norm based pairwise between-class distance criterion, called SWRLDA, is proposed for multi-class classification especially with edge classes. SWRLDA can automatically avoid the optimal mean calculation and simultaneously learn adaptive weights for each class pair without setting any additional parameter. An efficient re-weighted algorithm is exploited to derive the global optimum of the challenging ℓ2,1-norm maximization problem. The proposed SWRLDA is easy to implement and converges fast in practice. Extensive experiments demonstrate that SWRLDA performs favorably against other compared methods on both synthetic and real-world datasets while presenting superior computational efficiency in comparison with other techniques.

IJCAI Conference 2020 Conference Paper

Semi-supervised Clustering via Pairwise Constrained Optimal Graph

  • Feiping Nie
  • Han Zhang
  • Rong Wang
  • Xuelong Li

In this paper, we present a technique that definitively addresses the pairwise constraints in semi-supervised clustering. Our method contributes to formulating the cannot-link relations and propagating them over the affinity graph flexibly. The pairwise constrained instances are provably guaranteed to be in the same or different connected components of the graph. Combined with the Laplacian rank constraint, the proposed model learns a Pairwise Constrained structured Optimal Graph (PCOG), from which the specified c clusters supporting the known pairwise constraints are directly obtained. An efficient algorithm invoked by label propagation is designed to solve the formulation. Additionally, we provide a compact criterion for acquiring the key pairwise constraints that prompt semi-supervised graph clustering. Substantial experimental results show that the proposed method achieves significant improvements using only a few prior pairwise constraints.

AAAI Conference 2019 Conference Paper

A Probabilistic Derivation of LASSO and L12-Norm Feature Selections

  • Di Ming
  • Chris Ding
  • Feiping Nie

LASSO and ℓ2,1-norm based feature selection have achieved success in many application areas. In this paper, we first derive LASSO and ℓ1,2-norm feature selection from a probabilistic framework, which provides a point of view independent of the usual sparse coding perspective. From there, we further propose a feature selection approach based on the probability-derived ℓ1,2-norm. We point out an inflexibility in standard feature selection: with the widely used ℓ2,1-norm, which enforces joint sparsity across all data instances, the features selected for all classes are forced to be exactly the same. The probability-derived ℓ1,2-norm feature selection allows the selected features to differ across classes, and the resulting features lead to better classification on six benchmark datasets.
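
The two matrix norms contrasted above are commonly defined as follows for $W \in \mathbb{R}^{d\times c}$ with rows $w^i$; conventions vary between papers, so treat these as the usual definitions rather than the authors' exact notation.

$$ \|W\|_{2,1} \;=\; \sum_{i=1}^{d} \|w^i\|_2, \qquad \|W\|_{1,2} \;=\; \Big(\sum_{i=1}^{d} \|w^i\|_1^{\,2}\Big)^{1/2}. $$

The ℓ2,1-norm drives whole rows to zero, so the same features are kept or discarded for every class, whereas the ℓ1,2-norm sparsifies entries within rows, allowing the selected features to differ across classes.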

IJCAI Conference 2019 Conference Paper

Learning Robust Distance Metric with Side Information via Ratio Minimization of Orthogonally Constrained L21-Norm Distances

  • Kai Liu
  • Lodewijk Brand
  • Hua Wang
  • Feiping Nie

Metric Learning, which aims at learning a distance metric for a given data set, plays an important role in measuring the distance or similarity between data objects. Due to its broad usefulness, it has attracted a lot of interest in machine learning and related areas in the past few decades. This paper proposes to learn the distance metric from the side information in the forms of must-links and cannot-links. Given the pairwise constraints, our goal is to learn a Mahalanobis distance that minimizes the ratio of the distances of the data pairs in the must-links to those in the cannot-links. Different from many existing papers that use the traditional squared L2-norm distance, we develop a robust model that is less sensitive to data noise or outliers by using the not-squared L2-norm distance. In our objective, the orthonormal constraint is enforced to avoid degenerate solutions. To solve our objective, we have derived an efficient iterative solution algorithm. We have conducted extensive experiments, which demonstrated the superiority of our method over state-of-the-art.
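
The ratio objective described above can be sketched with an orthonormal projection $W$ (so the learned Mahalanobis metric is $M = WW^{\top}$), must-link set $\mathcal{M}$, cannot-link set $\mathcal{C}$, and not-squared $\ell_2$ distances; this is a reading of the abstract, not the paper's exact formulation.

$$ \min_{W^{\top}W = I} \;\; \frac{\sum_{(i,j)\in\mathcal{M}} \|W^{\top}(x_i - x_j)\|_2}{\sum_{(i,j)\in\mathcal{C}} \|W^{\top}(x_i - x_j)\|_2}, $$

where using the unsquared distances limits the influence of outlying pairs and the orthonormality constraint rules out the degenerate solution $W = 0$.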

IJCAI Conference 2019 Conference Paper

Learning Strictly Orthogonal p-Order Nonnegative Laplacian Embedding via Smoothed Iterative Reweighted Method

  • Haoxuan Yang
  • Kai Liu
  • Hua Wang
  • Feiping Nie

Laplacian Embedding (LE) is a powerful method to reveal the intrinsic geometry of high-dimensional data by using graphs. Imposing the orthogonal and nonnegative constraints onto the LE objective has proved to be effective to avoid degenerate and negative solutions, which, though, are challenging to achieve simultaneously because they are nonlinear and nonconvex. In addition, recent studies have shown that using the p-th order of the L2-norm distances in LE can find the best solution for clustering and promote the robustness of the embedding model against outliers, although this makes the optimization objective nonsmooth and difficult to efficiently solve in general. In this work, we study LE that uses the p-th order of the L2-norm distances and satisfies both orthogonal and nonnegative constraints. We introduce a novel smoothed iterative reweighted method to tackle this challenging optimization problem and rigorously analyze its convergence. We demonstrate the effectiveness and potential of our proposed method by extensive empirical studies on both synthetic and real data sets.

AAAI Conference 2019 Short Paper

Semi-Supervised Feature Selection with Adaptive Discriminant Analysis

  • Weichan Zhong
  • Xiaojun Chen
  • Guowen Yuan
  • Yiqin Li
  • Feiping Nie

In this paper, we propose a novel Adaptive Discriminant Analysis for semi-supervised feature selection, namely SADA. Instead of computing fixed similarities before performing feature selection, SADA simultaneously learns an adaptive similarity matrix S and a projection matrix W with an iterative method. In each iteration, S is computed from the projected distances with the learned W, and W is computed with the learned S. Therefore, SADA can learn a better projection matrix W by weakening the effect of noise features with the adaptive similarity matrix. Experimental results on 4 data sets show the superiority of SADA compared to 5 semi-supervised feature selection methods.

IJCAI Conference 2019 Conference Paper

Worst-Case Discriminative Feature Selection

  • Shuangli Liao
  • Quanxue Gao
  • Feiping Nie
  • Yang Liu
  • Xiangdong Zhang

Feature selection plays a critical role in data mining, driven by increasing feature dimensionality in target problems. In this paper, we propose a new criterion for discriminative feature selection, worst-case discriminative feature selection (WDFS). Unlike Fisher Score and other methods based on discriminative criteria that consider the overall (or average) separation of data, WDFS adopts a new, worst-case view, which arguably is more suitable for classification applications. Specifically, WDFS directly maximizes the ratio of the minimum between-class variance over all class pairs to the maximum within-class variance, and thus it duly considers the separation of all classes. Moreover, we take a greedy strategy, finding one feature at a time, which is very easy to implement. We also utilize the correlation between features to help reduce redundancy and extend WDFS to uncorrelated WDFS (UWDFS). To evaluate the effectiveness of the proposed algorithm, we conduct classification experiments on many real data sets. In the experiments, we respectively use the original features and the score vectors of features over all class pairs to calculate the correlation coefficients, and analyze the experimental results in these two ways. Experimental results demonstrate the effectiveness of WDFS and UWDFS.
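
The worst-case criterion described above can be written as follows for a candidate feature (or feature subset), with $\sigma_b^{pq}$ the between-class variance of class pair $(p,q)$ and $\sigma_w^{p}$ the within-class variance of class $p$; the notation is illustrative.

$$ \max \;\; \frac{\min_{p \neq q}\; \sigma_b^{pq}}{\max_{p}\; \sigma_w^{p}}, $$

so the least-separated class pair and the most-scattered class drive the score, rather than an average over classes as in Fisher Score.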

AAAI Conference 2018 Conference Paper

Balanced Clustering via Exclusive Lasso: A Pragmatic Approach

  • Zhihui Li
  • Feiping Nie
  • Xiaojun Chang
  • Zhigang Ma
  • Yi Yang

Clustering is an effective technique in data mining to generate groups that are of interest. Among various clustering approaches, the family of k-means algorithms and min-cut algorithms gain the most popularity due to their simplicity and efficacy. The classical k-means algorithm partitions a number of data points into several subsets by iteratively updating the clustering centers and the associated data points. By contrast, min-cut algorithms construct a weighted undirected graph and partition its vertices into two sets. However, existing clustering algorithms tend to cluster a minority of data points into a subset, which should be avoided when the target dataset is balanced. To achieve more accurate clustering for balanced datasets, we propose to leverage the exclusive lasso on k-means and min-cut to regulate the balance degree of the clustering results. By optimizing our objective functions built atop the exclusive lasso, we can make the clustering result as balanced as possible. Extensive experiments on several large-scale datasets validate the advantage of the proposed algorithms compared to state-of-the-art clustering algorithms.

AAAI Conference 2018 Short Paper

Discriminative Semi-Supervised Feature Selection via Rescaled Least Squares Regression-Supplement

  • Guowen Yuan
  • Xiaojun Chen
  • Chen Wang
  • Feiping Nie
  • Liping Jing

In this paper, we propose a Discriminative Semi-Supervised Feature Selection (DSSFS) method. In this method, a dragging technique is introduced into Rescaled Least Squares Regression in order to enlarge the distances between different classes. An iterative method is proposed to simultaneously learn the regression coefficients and the dragging matrix and to predict the unknown class labels. Experimental results show the superiority of DSSFS.

AAAI Conference 2018 Conference Paper

Multi-Rate Gated Recurrent Convolutional Networks for Video-Based Pedestrian Re-Identification

  • Zhihui Li
  • Lina Yao
  • Feiping Nie
  • Dingwen Zhang
  • Min Xu

Matching pedestrians across multiple camera views has attracted lots of recent research attention due to its apparent importance in surveillance and security applications. While most existing works address this problem in a still-image setting, we consider the more informative and challenging video-based person re-identification problem, where a video of a pedestrian as seen in one camera needs to be matched to a gallery of videos captured by other non-overlapping cameras. We employ a convolutional network to extract the appearance and motion features from raw video sequences, and then feed them into a multi-rate recurrent network to exploit the temporal correlations, and more importantly, to take into account the fact that pedestrians, sometimes even the same pedestrian, move in different speeds across different camera views. The combined network is trained in an end-to-end fashion, and we further propose an initialization strategy via context reconstruction to largely improve the performance. We conduct extensive experiments on the iLIDS-VID and PRID-2011 datasets, and our experimental results confirm the effectiveness and the generalization ability of our model.

AAAI Conference 2018 Conference Paper

New l2,1-Norm Relaxation of Multi-Way Graph Cut for Clustering

  • Xu Yang
  • Cheng Deng
  • Xianglong Liu
  • Feiping Nie

Clustering methods have attracted ever-increasing attention in the machine learning and computer vision communities in recent years. Exploring manifold information in multi-way graph cut clustering, such as ratio cut clustering, has shown promising performance. However, the traditional multi-way ratio cut clustering problem is NP-hard, and thus the spectral solution may deviate from the optimal one. In this paper, we propose a new relaxed multi-way graph cut clustering method, where the l2,1-norm distance, instead of the squared distance, is utilized so that the solution has much clearer cluster structure. Furthermore, the resulting solution is constrained with normalization to obtain a sparser representation, which encourages the solution to contain more discrete values with many zeros. The objective function is very difficult to optimize because it minimizes the ratio of two non-smooth terms. To address this problem, we transform the objective function into a quadratic problem on the Stiefel manifold (QPSM), and introduce a novel yet efficient iterative algorithm to solve it. Experimental results on several benchmark datasets show that our method significantly outperforms several state-of-the-art clustering approaches.

AAAI Conference 2017 Conference Paper

A Multiview-Based Parameter Free Framework for Group Detection

  • Xuelong Li
  • Mulin Chen
  • Feiping Nie
  • Qi Wang

Group detection is fundamentally important for analyzing crowd behaviors, and has attracted plenty of attention in artificial intelligence. However, existing works mostly have limitations due to insufficient utilization of crowd properties and the arbitrary processing of individuals. In this paper, we propose the Multiview-based Parameter Free (MPF) approach to detect groups in crowd scenes. The main contributions made in this study are threefold: (1) a new structural context descriptor is designed to characterize the structural property of individuals in crowd motions; (2) a self-weighted multiview clustering method is proposed to cluster feature points by incorporating their motion and context similarities; (3) a novel framework is introduced for group detection, which is able to determine the group number automatically without any parameter or threshold to be tuned. Extensive experiments on various real-world datasets demonstrate the effectiveness of the proposed approach, and show its superiority against state-of-the-art group detection techniques.

IJCAI Conference 2017 Conference Paper

Adaptive Semi-Supervised Learning with Discriminative Least Squares Regression

  • Minnan Luo
  • Lingling Zhang
  • Feiping Nie
  • Xiaojun Chang
  • Buyue Qian
  • Qinghua Zheng

Semi-supervised learning plays a significant role in multi-class classification, where a small number of labeled data are more deterministic while substantial unlabeled data might cause large uncertainties and potential threats. In this paper, we distinguish the label fitting of labeled and unlabeled training data through a probabilistic vector with an adaptive parameter, which always ensures the significant importance of labeled data and characterizes the contribution of unlabeled instance according to its uncertainty. Instead of using traditional least squares regression (LSR) for classification, we develop a new discriminative LSR by equipping each label with an adjustment vector. This strategy avoids incorrect penalization on samples that are far away from the boundary and simultaneously facilitates multi-class classification by enlarging the geometrical distance of instances belonging to different classes. An efficient alternative algorithm is exploited to solve the proposed model with closed form solution for each updating rule. We also analyze the convergence and complexity of the proposed algorithm theoretically. Experimental results on several benchmark datasets demonstrate the effectiveness and superiority of the proposed model for multi-class classification tasks.

IJCAI Conference 2017 Conference Paper

Angle Principal Component Analysis

  • Qianqian Wang
  • Quanxue Gao
  • Xinbo Gao
  • Feiping Nie

Recently, many ℓ1-norm based PCA methods have been developed for dimensionality reduction, but they do not explicitly consider the reconstruction error. Moreover, they do not take into account the relationship between the reconstruction error and the variance of the projected data. This reduces the robustness of the algorithms. To handle this problem, a novel formulation for PCA, namely Angle PCA, is proposed. Angle PCA employs the ℓ2-norm to measure the reconstruction error and the variance of the projected data, and maximizes the sum of the ratio between the variance and the reconstruction error of each data point. Angle PCA is not only robust to outliers but also retains PCA's desirable properties such as rotational invariance. To solve Angle PCA, we propose an iterative algorithm that has a closed-form solution in each iteration. Extensive experiments on several face image databases illustrate that our method is overall superior to other robust PCA algorithms, such as PCA, PCA-L1 greedy, PCA-L1 nongreedy and HQ-PCA.
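
The per-sample ratio objective described above can be written as follows for an orthonormal projection $W$ on centered data; this is a reading of the abstract rather than a quotation of the paper's equation.

$$ \max_{W^{\top}W = I} \;\sum_{i=1}^{n} \frac{\|W^{\top}x_i\|_2}{\|x_i - WW^{\top}x_i\|_2}, $$

where the numerator is the projected length (variance contribution) of sample $i$ and the denominator is its reconstruction error. Each term depends only on the angle between $x_i$ and the subspace, not on the magnitude of $x_i$, which is why large-norm outliers cannot dominate the objective.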

AAAI Conference 2017 Conference Paper

Balanced Clustering with Least Square Regression

  • Hanyang Liu
  • Junwei Han
  • Feiping Nie
  • Xuelong Li

Clustering is a fundamental research topic in data mining. A balanced clustering result is often required in a variety of applications. Many existing clustering algorithms have good clustering performance, yet fail to produce balanced clusters. In this paper, we propose a novel and simple method for clustering, referred to as Balanced Clustering with Least Square regression (BCLS), which minimizes a least squares linear regression with a balance constraint to regularize the clustering model. In BCLS, the linear regression is applied to estimate the class-specific hyperplanes that partition each class of data from the others, thus guiding the clustering of the data points into different clusters. A balance constraint is utilized to regularize the clustering; minimizing it helps produce balanced clusters. In addition, we apply the method of augmented Lagrange multipliers (ALM) to optimize the objective model. Experiments on seven real-world benchmarks demonstrate that our approach not only produces good clustering performance but also guarantees a balanced clustering result.

AAAI Conference 2017 Conference Paper

Bilateral k-Means Algorithm for Fast Co-Clustering

  • Junwei Han
  • Kun Song
  • Feiping Nie
  • Xuelong Li

With the development of information technology, the amount of data, e.g., text, images and video, has increased rapidly. Efficiently clustering such large-scale data sets is a challenge. To address this problem, this paper proposes a novel co-clustering method named the bilateral k-means algorithm (BKM) for fast co-clustering. Different from traditional k-means algorithms, the proposed method has two indicator matrices P and Q and a diagonal matrix S to be solved, which represent the cluster memberships of samples and features, and the co-cluster centres, respectively. Therefore, it can perform clustering on the samples and features simultaneously. We also introduce an effective approach to solve the proposed method, which involves fewer multiplications. The computational complexity is analyzed. Extensive experiments on various types of data sets are conducted. Compared with state-of-the-art clustering methods, the proposed BKM not only has faster computational speed, but also achieves promising clustering results.
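
The factorization described above can be sketched as the following approximation problem, where $P$ and $Q$ are the sample and feature cluster indicator matrices and $S$ holds the co-cluster centres on its diagonal; the norm and exact constraints are written in the most common form and may differ from the paper.

$$ \min_{P,\,S,\,Q} \;\|X - P S Q^{\top}\|_F^2 \quad \text{s.t.} \quad P \in \{0,1\}^{n\times k},\; Q \in \{0,1\}^{m\times k},\; P\mathbf{1} = \mathbf{1},\; Q\mathbf{1} = \mathbf{1},\; S \ \text{diagonal}, $$

so each sample and each feature is assigned to exactly one of the $k$ co-clusters.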

IJCAI Conference 2017 Conference Paper

Convolutional 2D LDA for Nonlinear Dimensionality Reduction

  • Qi Wang
  • Zequn Qin
  • Feiping Nie
  • Yuan Yuan

Representing high-volume and high-order data is an essential problem, especially in the machine learning field. Although existing two-dimensional (2D) discriminant analysis achieves promising performance, its single, linear projection features make it difficult to analyze more complex data. In this paper, we propose a novel convolutional two-dimensional linear discriminant analysis (2D LDA) method for data representation. In order to deal with nonlinear data, a specially designed Convolutional Neural Network (CNN) is presented, which can be proved to have an objective function equivalent to that of common 2D LDA. In this way, the discriminant ability can benefit not only from the nonlinearity of Convolutional Neural Networks, but also from the powerful learning process. Experimental results on several datasets show that the proposed method performs better than other state-of-the-art methods in terms of classification accuracy.

IJCAI Conference 2017 Conference Paper

Feature Selection via Scaling Factor Integrated Multi-Class Support Vector Machines

  • Jinglin Xu
  • Feiping Nie
  • Junwei Han

In data mining, we often encounter high-dimensional and noisy features, which may not only increase the load on computational resources but also result in model overfitting. Feature selection is often adopted to address this issue. In this paper, we propose a novel feature selection method based on multi-class SVM, which introduces a scaling factor with a flexible parameter to readjust the distribution of feature weights and select the most discriminative features. Concretely, the proposed method designs a scaling factor with a p/2 power to control the distribution of weights adaptively and to search for the optimal sparsity of the weighting matrix. In addition, to solve the proposed model, we provide an alternative and iterative optimization method. It not only solves for the weighting matrix and the scaling factor independently, but also provides a better way to address the problem of solving the L2,0-norm. Comprehensive experiments are conducted on six datasets to demonstrate that this work obtains better performance compared with a number of existing state-of-the-art multi-class feature selection methods.

IJCAI Conference 2017 Conference Paper

Flexible Orthogonal Neighborhood Preserving Embedding

  • Tianji Pang
  • Feiping Nie
  • Junwei Han

In this paper, we propose a novel linear subspace learning algorithm called Flexible Orthogonal Neighborhood Preserving Embedding (FONPE), which is a linear approximation of the Locally Linear Embedding (LLE) algorithm. Our novel objective function integrates two terms related to manifold smoothness and a flexible penalty defined on the projection fitness. Different from Neighborhood Preserving Embedding (NPE), we relax the hard constraint by modeling the mismatch between the approximate linear embedding and the original nonlinear embedding instead of enforcing them to be equal, which makes it cope better with data sampled from a nonlinear manifold. Besides, instead of enforcing orthogonality between the projected points, we enforce the mapping to be orthogonal. By using this method, FONPE tends to preserve distances and thus the overall geometry can be preserved. Unlike LLE, as FONPE has an explicit linear mapping between the input and the reduced spaces, it can handle novel testing data straightforwardly. Moreover, when the projection matrix in our model becomes an identity matrix, our model can be transformed into denoising LLE (DLLE). Compared with standard LLE, we demonstrate that DLLE can handle noisy data better. Comprehensive experiments on several benchmark databases demonstrate the effectiveness of our algorithm.

IJCAI Conference 2017 Conference Paper

Joint Capped Norms Minimization for Robust Matrix Recovery

  • Feiping Nie
  • Zhouyuan Huo
  • Heng Huang

The low-rank matrix recovery is an important machine learning research topic with various scientific applications. Most existing low-rank matrix recovery methods relax the rank minimization problem via the trace norm minimization. However, such a relaxation makes the solution seriously deviate from the original one. Meanwhile, most matrix recovery methods minimize the squared prediction errors on the observed entries, which is sensitive to outliers. In this paper, we propose a new robust matrix recovery model to address the above two challenges. The joint capped trace norm and capped $\ell_1$-norm are used to tightly approximate the rank minimization and enhance the robustness to outliers. The evaluation experiments are performed on both synthetic data and real world applications in collaborative filtering and social network link prediction. All empirical results show our new method outperforms the existing matrix recovery methods.
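
The two capped norms named above are commonly defined as follows, with caps $\varepsilon_1, \varepsilon_2 > 0$ and $\sigma_i(\cdot)$ the singular values; these are the usual definitions in the robust matrix recovery literature, not necessarily the paper's exact notation.

$$ \sum_{i,j} \min\big(|E_{ij}|,\ \varepsilon_1\big) \quad \text{(capped } \ell_1\text{-norm)}, \qquad \sum_{i} \min\big(\sigma_i(X),\ \varepsilon_2\big) \quad \text{(capped trace norm)}, $$

so residuals and singular values beyond the caps contribute only a constant, which bounds the influence of outliers and makes the rank surrogate tighter than the plain trace norm.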

AAAI Conference 2017 Conference Paper

Large Graph Hashing with Spectral Rotation

  • Xuelong Li
  • Di Hu
  • Feiping Nie

Faced with the demands of processing huge amounts of data nowadays, hashing techniques have attracted much attention due to their efficient storage and search ability. Among these techniques, the ones based on spectral graphs show remarkable performance, as they embed the data on a low-dimensional manifold and maintain the neighborhood structure via a non-linear spectral eigenmap. However, the real-valued spectral solution of such methods may deviate from the discrete solution. The common practice is to perform a simple rounding operation to obtain the final binary codes, which can break the constraints and even result in worse solutions. In this paper, we propose to impose a so-called spectral rotation technique on the spectral hashing objective, which transforms the candidate solution into a new one that better approximates the discrete solution. Moreover, the binary codes are obtained from the modified solution by minimizing the Euclidean distance, which yields more semantic correlation within the manifold, while the constraints on the codes always hold. We provide an efficient alternative algorithm to solve the above problems, and a manifold learning perspective motivating the proposed method is also given. Extensive experiments are conducted on three large-scale benchmark datasets and the results show our method outperforms state-of-the-art hashing methods, especially the spectral graph ones.
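
The spectral rotation step referred to above is typically posed as the following joint problem, where $Y$ is the real-valued spectral embedding, $R$ an orthogonal rotation and $B$ the binary codes; this standard formulation is given for orientation, since the paper's full objective also involves the graph term.

$$ \min_{B,\,R} \;\|B - YR\|_F^2 \quad \text{s.t.} \quad B \in \{-1, +1\}^{n\times r}, \;\; R^{\top}R = I, $$

which is usually solved by alternating a sign step for $B$ with a Procrustes (SVD) step for $R$, so the discrete codes stay close to the relaxed spectral solution.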

NeurIPS Conference 2017 Conference Paper

Learning A Structured Optimal Bipartite Graph for Co-Clustering

  • Feiping Nie
  • Xiaoqian Wang
  • Cheng Deng
  • Heng Huang

Co-clustering methods have been widely applied to document clustering and gene expression analysis. These methods make use of the duality between features and samples such that the co-occurring structure of sample and feature clusters can be extracted. In graph based co-clustering methods, a bipartite graph is constructed to depict the relation between features and samples. Most existing co-clustering methods conduct clustering on the graph achieved from the original data matrix, which doesn’t have explicit cluster structure, thus they require a post-processing step to obtain the clustering results. In this paper, we propose a novel co-clustering method to learn a bipartite graph with exactly k connected components, where k is the number of clusters. The new bipartite graph learned in our model approximates the original graph but maintains an explicit cluster structure, from which we can immediately get the clustering results without post-processing. Extensive empirical results are presented to verify the effectiveness and robustness of our model.
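
The "exactly k connected components" property used above corresponds to a standard rank condition on the bipartite graph Laplacian (a spectral-graph fact, not a formula quoted from the paper): if $B \in \mathbb{R}^{n\times m}$ is the learned sample-feature affinity and $L$ is the Laplacian of the $(n+m)$-node graph with adjacency $\begin{pmatrix} 0 & B \\ B^{\top} & 0 \end{pmatrix}$, then the graph has exactly $k$ connected components if and only if $\operatorname{rank}(L) = n + m - k$, i.e., the Laplacian has exactly $k$ zero eigenvalues. Constraining this rank during learning therefore yields a bipartite graph whose connected components are the $k$ co-clusters, with no post-processing.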

IJCAI Conference 2017 Conference Paper

Linear Manifold Regularization with Adaptive Graph for Semi-supervised Dimensionality Reduction

  • Kai Xiong
  • Feiping Nie
  • Junwei Han

Many previous graph-based methods perform dimensionality reduction on a pre-defined graph. However, due to the noise and redundant information in the original data, the pre-defined graph has no clear structure and may not be appropriate for the subsequent task. To overcome the drawbacks, in this paper, we propose a novel approach called linear manifold regularization with adaptive graph (LMRAG) for semi-supervised dimensionality reduction. LMRAG directly incorporates the graph construction into the objective function, thus the projection matrix and the optimal graph can be simultaneously optimized. Due to the structure constraint, the learned graph is sparse and has clear structure. Extensive experiments on several benchmark datasets demonstrate the effectiveness of the proposed method.

AAAI Conference 2017 Conference Paper

Local Centroids Structured Non-Negative Matrix Factorization

  • Hongchang Gao
  • Feiping Nie
  • Heng Huang

Non-negative Matrix Factorization (NMF) has attracted much attention and been widely used in real-world applications. As a clustering method, it fails to handle the case where data points lie in a complicated geometric structure. Existing methods adopt a single global centroid for each cluster, failing to capture the manifold structure. In this paper, we propose a novel local centroids structured NMF to address this drawback. Instead of using a single centroid for each cluster, we introduce multiple local centroids for each cluster such that the manifold structure can be captured by the local centroids. Such a novel NMF method can improve the clustering performance effectively. Furthermore, a novel bipartite graph is incorporated to obtain the clustering indicator directly without any post-processing. Experiments on both toy datasets and real-world datasets have verified the effectiveness of the proposed method.

IS Journal 2017 Journal Article

Local PurTree Spectral Clustering for Massive Customer Transaction Data

  • Xiaojun Chen
  • Si Peng
  • Joshua Zhexue Huang
  • Feiping Nie
  • Yong Ming

The clustering of customer transaction data is very important to retail and e-commerce companies. The authors propose a local PurTree spectral clustering algorithm for massive customer transaction data that uses a purchase tree to represent customer transaction data and a PurTree distance to compute the distance between two trees. The new method learns a data similarity matrix from the local distances and the level weights in the PurTree distance simultaneously. An iterative optimization algorithm is proposed to optimize the proposed model. The authors conducted experiments to compare their method with four commonly used clustering methods for transaction data on six real-life datasets. The experimental results show that the new method outperformed the other clustering algorithms.

IJCAI Conference 2017 Conference Paper

Locality Adaptive Discriminant Analysis

  • Xuelong Li
  • Mulin Chen
  • Feiping Nie
  • Qi Wang

Linear Discriminant Analysis (LDA) is a popular technique for supervised dimensionality reduction, and its performance is satisfying when dealing with Gaussian distributed data. However, the neglect of local data structure makes LDA inapplicable to many real-world situations. Some works therefore focus on the discriminant analysis between neighboring points, which can be easily affected by the noise in the original data space. In this paper, we propose a new supervised dimensionality reduction method, Locality Adaptive Discriminant Analysis (LADA), to learn a representative subspace of the data. Compared to LDA and its variants, the proposed method has three salient advantages: (1) it finds the principal projection directions without imposing any assumption on the data distribution; (2) it is able to exploit the local manifold structure of data in the desired subspace; (3) it exploits the points' neighbor relationships automatically without introducing any additional parameter to be tuned. Experiments on synthetic datasets and real-world benchmark datasets demonstrate the superiority of the proposed method.

IJCAI Conference 2017 Conference Paper

Multi-Class Support Vector Machine via Maximizing Multi-Class Margins

  • Jie Xu
  • Xianglong Liu
  • Zhouyuan Huo
  • Cheng Deng
  • Feiping Nie
  • Heng Huang

Support Vector Machine (SVM) was originally proposed as a binary classification model, and it has already achieved great success in different applications. In reality, we more often need to solve problems with more than two classes, so it is natural to extend SVM to a multi-class classifier. Many works have been proposed to construct a multi-class classifier based on binary SVM, such as the one-versus-all strategy, the one-versus-one strategy, and Weston's multi-class SVM. The one-versus-all and one-versus-one strategies split the multi-class problem into multiple binary classification subproblems and require training multiple binary classifiers. Weston's multi-class SVM is formed by ensuring risk constraints and imposing a specific regularization, such as the Frobenius norm; it is not derived by maximizing the margin between the hyperplane and the training data, which is the motivation of SVM. In this paper, we propose a multi-class SVM model from the perspective of maximizing the margin between training points and the hyperplane, and analyze the relation between our model and other related methods. The experiments show that our model obtains better or comparable results when compared with other related methods.

AAAI Conference 2017 Conference Paper

Multi-View Clustering and Semi-Supervised Classification with Adaptive Neighbours

  • Feiping Nie
  • Guohao Cai
  • Xuelong Li

Due to the efficiency of learning relationships and complex structures hidden in data, graph-oriented methods have been widely investigated and achieve promising performance in multi-view learning. Generally, these learning algorithms construct an informative graph for each view or fuse different views into one graph, on which the following procedures are based. However, in many real-world datasets, the original data always contain noise and outlying entries that result in unreliable and inaccurate graphs, which cannot be ameliorated by previous methods. In this paper, we propose a novel multi-view learning model which performs clustering/semi-supervised classification and local structure learning simultaneously. The obtained optimal graph can be partitioned into specific clusters directly. Moreover, our model can allocate an ideal weight for each view automatically without additional weight and penalty parameters. An efficient algorithm is proposed to optimize this model. Extensive experimental results on different real-world datasets show that the proposed model outperforms other state-of-the-art multi-view algorithms.

AAAI Conference 2017 Conference Paper

Multi-View Correlated Feature Learning by Uncovering Shared Component

  • Xiaowei Xue
  • Feiping Nie
  • Sen Wang
  • Xiaojun Chang
  • Bela Stantic
  • Min Yao

Learning multiple heterogeneous features from different data sources is challenging. One research topic is how to exploit and utilize the correlations among various features across multiple views with the aim of improving the performance of learning tasks, such as classification. In this paper, we propose a new multi-view feature learning algorithm that simultaneously analyzes features from different views. Compared to most of the existing subspace learning methods that only focus on exploiting a shared latent subspace, our algorithm not only learns individual information in each view but also captures feature correlations among multiple views by learning a shared component. By assuming that such a component is shared by all views, we simultaneously exploit the shared component and individual information of each view in a batch mode. Since the objective function is non-smooth and difficult to solve, we propose an efficient iterative algorithm for optimization with guaranteed convergence. Extensive experiments are conducted on several benchmark datasets. The results demonstrate that our proposed algorithm performs better than all the compared multi-view learning algorithms.

IJCAI Conference 2017 Conference Paper

Multi-view Feature Learning with Discriminative Regularization

  • Jinglin Xu
  • Junwei Han
  • Feiping Nie

More and more multi-view data which can capture rich information from heterogeneous features are widely used in real world applications. How to integrate different types of features, and how to learn low dimensional and discriminative information from high dimensional data are two main challenges. To address these challenges, this paper proposes a novel multi-view feature learning framework, which is regularized by discriminative information and obtains a feature learning model that contains multiple discriminative feature weighting matrices for different views, and then yields multiple low dimensional features used for subsequent multi-view clustering. To optimize the formulated objective function, we transform the proposed framework into a trace optimization problem which obtains the global solution in a closed form. Experimental evaluations on four widely used datasets and comparisons with a number of state-of-the-art multi-view clustering algorithms demonstrate the superiority of the proposed work.

AAAI Conference 2017 Conference Paper

Multiclass Capped ℓp-Norm SVM for Robust Classifications

  • Feiping Nie
  • Xiaoqian Wang
  • Heng Huang

The support vector machine (SVM) model is one of the most successful machine learning methods and has been successfully applied to numerous real-world applications. Because SVM methods use the hinge loss or squared hinge loss for classification, they usually outperform other classification approaches, e.g., methods based on the least squares loss. However, like most supervised learning algorithms, they learn classifiers from the labeled training data without any specific strategy to deal with noisy data. In many real-world applications, we often have outliers in the training set, which can misguide classifier learning such that the classification performance becomes suboptimal. To address this problem, we propose a novel capped ℓp-norm SVM classification model that uses a capped ℓp-norm based hinge loss in the objective and can deal with both light and heavy outliers. We utilize the new formulation to naturally build the multiclass capped ℓp-norm SVM. More importantly, we derive a novel optimization algorithm to efficiently minimize the capped ℓp-norm based objectives, and rigorously prove the convergence of the proposed algorithm. We present experimental results showing that the new capped ℓp-norm SVM method consistently improves the classification performance, especially when the data noise level increases.
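
For intuition, one plausible form of a capped ℓp hinge loss consistent with this description is shown below (the cap ε and exponent p are illustrative symbols; the paper's exact formulation may differ). Truncating the loss at ε bounds the influence any single outlier can have on the objective.

```latex
\ell_{\mathrm{cap}}\big(y_i, f(x_i)\big) \;=\; \min\!\Big( \max\big(0,\ 1 - y_i f(x_i)\big)^{p},\ \varepsilon \Big),
\qquad 0 < p \le 2,\ \ \varepsilon > 0 .
```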

IJCAI Conference 2017 Conference Paper

Orthogonal and Nonnegative Graph Reconstruction for Large Scale Clustering

  • Junwei Han
  • Kai Xiong
  • Feiping Nie

Spectral clustering has been widely used in recent years due to its simplicity for solving the graph clustering problem. However, it suffers from high computational cost as data grow in scale, and is limited by the performance of post-processing. To address these two problems simultaneously, in this paper, we propose a novel approach denoted orthogonal and nonnegative graph reconstruction (ONGR) that scales linearly with the data size. For the relaxation of Normalized Cut, we add a nonnegativity constraint to the objective. Due to the nonnegativity, ONGR offers interpretability in that the final cluster labels can be directly obtained without post-processing. Extensive experiments on clustering tasks demonstrate the effectiveness of the proposed method.

AAAI Conference 2017 Conference Paper

Parameter Free Large Margin Nearest Neighbor for Distance Metric Learning

  • Kun Song
  • Feiping Nie
  • Junwei Han
  • Xuelong Li

We introduce a novel supervised metric learning algorithm named parameter-free large margin nearest neighbor (PFLMNN), which can be seen as an improvement of the classical large margin nearest neighbor (LMNN) algorithm. The contributions of our work consist of two aspects. First, our method discards the cost term in LMNN that shrinks the distances between a query input and its k target neighbors (the k nearest neighbors with the same label as the query), and only focuses on pushing the impostors (samples with labels different from the query) out of the neighborhood of the query. As a result, our method has no parameter that needs to be tuned on a validation set, which makes it more convenient to use. Second, by leveraging the geometric information of the impostors, we construct a novel cost function to penalize the small distances between each query and its impostors. Different from LMNN, which considers every impostor located in the neighborhood of each query, our method only takes care of the nearest impostor, because once the nearest impostor is pushed out of the neighborhood of its query, all the other impostors are out as well. In this way, the constraints in our model are much fewer than those of LMNN, which makes it much easier to find the optimal distance metric. Consequently, our method not only learns a better distance metric than LMNN, but also runs faster. Extensive experiments on different datasets with various sizes and difficulties are conducted, and the results show that, compared with LMNN, PFLMNN achieves better classification results.

AAAI Conference 2017 Conference Paper

Probabilistic Non-Negative Matrix Factorization and Its Robust Extensions for Topic Modeling

  • Minnan Luo
  • Feiping Nie
  • Xiaojun Chang
  • Yi Yang
  • Alexander Hauptmann
  • Qinghua Zheng

Traditional topic models with maximum likelihood estimation inevitably suffer from the assumption of conditional independence of words given the document's topic distribution. In this paper, we follow the generative procedure of the topic model and learn the topic-word distribution and the topic distribution by directly approximating the word-document co-occurrence matrix with matrix decomposition techniques. These methods include: (1) approximating the normalized document-word conditional distribution with the document probability matrix and word probability matrix based on probabilistic non-negative matrix factorization (NMF); (2) since the standard NMF is well known to be non-robust to noise and outliers, extending the probabilistic NMF of the topic model to its robust versions using ℓ2,1-norm and capped ℓ2,1-norm based loss functions, respectively. The proposed framework inherits the explicit probabilistic meaning of factors in topic models and simultaneously makes the conditional independence assumption on words unnecessary. Straightforward and efficient algorithms are developed to solve the corresponding non-smooth and non-convex problems. Experimental results over several benchmark datasets illustrate the effectiveness and superiority of the proposed methods.

IJCAI Conference 2017 Conference Paper

Scalable Normalized Cut with Improved Spectral Rotation

  • Xiaojun Chen
  • Feiping Nie
  • Joshua Zhexue Huang
  • Min Yang

Many spectral clustering algorithms have been proposed and successfully applied to many high-dimensional applications. However, there are still two problems that need to be solved: 1) existing methods for obtaining the final clustering assignments may deviate from the true discrete solution, and 2) most of these methods usually have very high computational complexity. In this paper, we propose a Scalable Normalized Cut method for clustering of large-scale data. In the new method, an efficient procedure is used to construct a small representation matrix, and clustering is then performed on the representation matrix. In the clustering process, an improved spectral rotation method is proposed to obtain the final clustering assignments. A series of experiments were conducted on 14 benchmark datasets and the experimental results show the superior performance of the new method.

IJCAI Conference 2017 Conference Paper

Self-weighted Multiview Clustering with Multiple Graphs

  • Feiping Nie
  • Jing Li
  • Xuelong Li

In multiview learning, it is essential to assign a reasonable weight to each view according to its importance. Thus, for the multiview clustering task, a wise and elegant method should cluster multiview data while learning the view weights. In this paper, we address this problem by exploring a Laplacian rank constrained graph, which can be regarded approximately as the centroid of the graphs built for each view with different confidences. We start our work with the natural thought that the weights can be learned by introducing a hyperparameter. By analyzing its weakness, we further propose a new multiview clustering method which is totally self-weighted. Furthermore, once the target graph is obtained in our models, we can directly assign the cluster label to each data point and do not need any postprocessing such as $K$-means in standard spectral clustering. Evaluations on two synthetic datasets prove the effectiveness of our methods. Compared with several representative graph-based multiview clustering approaches on four real-world datasets, experimental results demonstrate that the proposed methods achieve better performance and our new clustering method is more practical to use.

AAAI Conference 2017 Conference Paper

Semi-Supervised Classifications via Elastic and Robust Embedding

  • Yun Liu
  • Yiming Guo
  • Hua Wang
  • Feiping Nie
  • Heng Huang

Transductive semi-supervised learning can only predict labels for unlabeled data appearing in the training data, and cannot predict labels for testing data never appearing in the training set. To handle this out-of-sample problem, many inductive methods impose a constraint such that the predicted label matrix should be exactly equal to a linear model. In practice, this constraint might be too rigid to capture the manifold structure of data. In this paper, we relax this rigid constraint and propose to use an elastic constraint on the predicted label matrix such that the manifold structure can be better explored. Moreover, since unlabeled data are often very abundant in practice and usually contain some outliers, we use a non-squared loss instead of the traditional squared loss to learn a robust model. The derived problem, although convex, has many nonsmooth terms, which makes it very challenging to solve. In this paper, we propose an efficient optimization algorithm to solve a more general problem, based on which we find the optimal solution to the derived problem.

IJCAI Conference 2017 Conference Paper

Semi-supervised Feature Selection via Rescaled Linear Regression

  • Xiaojun Chen
  • Guowen Yuan
  • Feiping Nie
  • Joshua Zhexue Huang

With the rapid increase of complex and high-dimensional sparse data, demands for new methods to select features by exploiting both labeled and unlabeled data have increased. Least squares regression based feature selection methods usually learn a projection matrix and evaluate the importance of features using the projection matrix, which lacks a theoretical explanation. Moreover, these methods cannot find a solution of the projection matrix that is both global and sparse. In this paper, we propose a novel semi-supervised feature selection method which can learn a global and sparse solution of the projection matrix. The new method extends the least squares regression model by rescaling the regression coefficients with a set of scale factors, which are used for ranking the features. It is shown that the new model can learn a global and sparse solution. Moreover, the introduction of scale factors provides a theoretical explanation for why we can use the projection matrix to rank the features. A simple yet effective algorithm with proved convergence is proposed to optimize the new model. Experimental results on eight real-life datasets show the superiority of the method.

IJCAI Conference 2017 Conference Paper

Semi-supervised Orthogonal Graph Embedding with Recursive Projections

  • Hanyang Liu
  • Junwei Han
  • Feiping Nie

Many graph based semi-supervised dimensionality reduction algorithms utilize a projection matrix to linearly map the data matrix from the original feature space to a lower dimensional representation. But the dimensionality after reduction is inevitably restricted to the number of classes, and the learned non-orthogonal projection matrix usually fails to preserve distances well and to balance the weights on different projection directions. This paper proposes a novel dimensionality reduction method, called semi-supervised orthogonal graph embedding with recursive projections (SOGE). We integrate the manifold smoothness and label fitness as well as a penalization of the linear mapping mismatch, and learn the orthogonal projection on the Stiefel manifold that empirically demonstrates better performance. Moreover, we recursively update the projection matrix in its orthocomplemented space to continuously learn more projection vectors, so as to better control the dimension of reduction. Comprehensive experiments on several benchmarks demonstrate the significant improvement over the existing methods.

IJCAI Conference 2017 Conference Paper

Theoretic Analysis and Extremely Easy Algorithms for Domain Adaptive Feature Learning

  • Wenhao Jiang
  • Cheng Deng
  • Wei Liu
  • Feiping Nie
  • Fu-lai Chung
  • Heng Huang

Domain adaptation problems arise in a variety of applications, where a training dataset from the source domain and a test dataset from the target domain typically follow different distributions. The primary difficulty in designing effective learning models to solve such problems lies in how to bridge the gap between the source and target distributions. In this paper, we provide comprehensive analysis of feature learning algorithms used in conjunction with linear classifiers for domain adaptation. Our analysis shows that in order to achieve good adaptation performance, the second moments of the source domain distribution and target domain distribution should be similar. Based on our new analysis, a novel extremely easy feature learning algorithm for domain adaptation is proposed. Furthermore, our algorithm is extended by leveraging multiple layers, leading to another feature learning algorithm. We evaluate the effectiveness of the proposed algorithms in terms of domain adaptation tasks on Amazon review and spam datasets from the ECML/PKDD 2006 discovery challenge.

IJCAI Conference 2017 Conference Paper

Two dimensional Large Margin Nearest Neighbor for Matrix Classification

  • Kun Song
  • Feiping Nie
  • Junwei Han

Matrices are common forms of data encountered in a wide range of real applications. How to classify this kind of data is an important research topic. In this paper, we propose a novel distance metric learning method named two dimensional large margin nearest neighbor (2DLMNN), for improving the performance of the k nearest neighbor (KNN) classifier in matrix classification. In the proposed method, left and right projection matrices are employed to define a matrix-based Mahalanobis distance, which is used to construct an objective aimed at separating points in different classes by a large margin. The parameters in those two projection matrices are far fewer than those in the vector-based counterpart, so our method reduces the risk of overfitting. We also introduce a framework for solving the proposed 2DLMNN. The convergence behavior, initialization, and parameter determination are also analyzed. Compared with vector-based methods, 2DLMNN performs better for matrix data classification. Promising experimental results on several datasets are provided to demonstrate the effectiveness of our method.

AAAI Conference 2017 Conference Paper

Unsupervised Large Graph Embedding

  • Feiping Nie
  • Wei Zhu
  • Xuelong Li

There are many successful spectral based unsupervised dimensionality reduction methods, including Laplacian Eigenmap (LE), Locality Preserving Projection (LPP), and Spectral Regression (SR). LPP and SR are two different linear spectral based methods; however, we discover that LPP and SR are equivalent if the symmetric similarity matrix is doubly stochastic, Positive Semi-Definite (PSD) and of rank p, where p is the reduced dimension. This discovery prompts us to seek a low-rank and doubly stochastic similarity matrix, and we then propose an unsupervised linear dimensionality reduction method, called Unsupervised Large Graph Embedding (ULGE). ULGE starts from a similar idea as LPP: it adopts an efficient approach to construct the similarity matrix and then performs spectral analysis efficiently, so that the computational complexity is reduced to O(ndm), a significant improvement compared to conventional spectral based methods which need at least O(n^2 d), where n, d and m are the numbers of samples, dimensions and anchors, respectively. Extensive experiments on several publicly available datasets demonstrate the efficiency and effectiveness of the proposed method.
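
The O(ndm) complexity reported here is characteristic of anchor-based graph construction. The Python sketch below shows a generic anchor graph of this kind (k-means anchors, sparse sample-to-anchor weights, and the implied similarity S = Z Λ⁻¹ Zᵀ); the anchor selection, weighting scheme, and function names are illustrative assumptions rather than the exact ULGE procedure.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def anchor_graph(X, m=500, knn=5):
    """Build a sparse n x m sample-to-anchor affinity Z and the implied similarity S.

    Anchors are k-means centers; each sample is linked to its `knn` nearest
    anchors with inverse-distance weights normalized to sum to one.
    """
    anchors = KMeans(n_clusters=m, n_init=4, random_state=0).fit(X).cluster_centers_
    d2 = cdist(X, anchors, metric="sqeuclidean")          # n x m squared distances
    Z = np.zeros_like(d2)
    nearest = np.argsort(d2, axis=1)[:, :knn]
    for i, cols in enumerate(nearest):
        w = 1.0 / (d2[i, cols] + 1e-12)
        Z[i, cols] = w / w.sum()
    lam = Z.sum(axis=0)                                   # anchor degrees
    S = Z @ np.diag(1.0 / (lam + 1e-12)) @ Z.T            # shown for illustration only;
    return Z, S                                           # at large n one keeps Z, not S
```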

AAAI Conference 2016 Conference Paper

Discriminative Vanishing Component Analysis

  • Chenping Hou
  • Feiping Nie
  • Dacheng Tao

Vanishing Component Analysis (VCA) is a recently proposed prominent work in machine learning. It narrows the gap between machine learning tools and computational algebra: the vanishing ideal and its applications to the classification problem. In this paper, we analyze VCA from the kernel view, which is another important research direction in machine learning. Under a very weak assumption, we provide a different point of view on VCA and make the kernel trick on VCA possible. We demonstrate that the projection matrix derived by VCA lies in the same space as that of Kernel Principal Component Analysis (KPCA) with a polynomial kernel, and the two groups of projections can express each other by a linear transformation. Furthermore, we prove that KPCA and VCA have identical discriminative power, provided that the ratio trace criterion is employed as the measurement. We also show that the kernel formed by the inner products of VCA's projections can be expressed linearly by KPCA's kernel. Based on the analysis above, we propose a novel Discriminative Vanishing Component Analysis (DVCA) approach. Experimental results are provided for demonstration.

IJCAI Conference 2016 Conference Paper

Fast Robust Non-Negative Matrix Factorization for Large-Scale Human Action Data Clustering

  • De Wang
  • Feiping Nie
  • Heng Huang

Human action recognition is important for improving human life in various aspects. However, the outliers and noise in data often hamper the clustering task, so there is a great need for robust data clustering techniques. Non-negative matrix factorization (NMF) and non-negative matrix tri-factorization (NMTF) methods have been widely researched in recent years and applied to many data clustering applications. In the presence of outliers, most previous NMF/NMTF models fail to achieve optimal clustering performance. To address this challenge, in this paper, we propose three new NMF and NMTF models which are robust to outliers. Efficient algorithms are derived, which converge much faster than previous NMF methods and as fast as the K-means algorithm, and scale to large datasets. Experimental results on both synthetic and real-world datasets show that our methods outperform other NMF and NMTF methods in most cases while taking much less computational time.

AAAI Conference 2016 Conference Paper

Graph-without-cut: An Ideal Graph Learning for Image Segmentation

  • Lianli Gao
  • Jingkuan Song
  • Feiping Nie
  • Fuhao Zou
  • Nicu Sebe
  • Heng Tao Shen

Graph-based image segmentation organizes the image elements into graphs and partitions an image based on the graph. It has been widely used and many promising results have been obtained. Since the segmentation performance highly depends on the graph, most existing methods focus on obtaining a precise similarity graph or on designing efficient cutting/merging strategies. However, these two components are often conducted in two separate steps, and thus the obtained graph similarity may not be the optimal one for segmentation, which may lead to suboptimal results. In this paper, we propose a novel framework, Graph-Without-Cut (GWC), for learning the similarity graph and image segmentation simultaneously. GWC learns the similarity graph by assigning adaptive and optimal neighbors to each vertex based on the spatial and visual information. Meanwhile, a rank constraint is imposed on the Laplacian matrix of the similarity graph, such that the number of connected components in the resulting similarity graph is exactly equal to the region number. Extensive empirical results on three public datasets (i.e., BSDS300, BSDS500 and MSRC) show that our unsupervised GWC achieves state-of-the-art performance compared with supervised and unsupervised image segmentation approaches.

AAAI Conference 2016 Conference Paper

New l1-Norm Relaxations and Optimizations for Graph Clustering

  • Feiping Nie
  • Hua Wang
  • Cheng Deng
  • Xinbo Gao
  • Xuelong Li
  • Heng Huang

In recent data mining research, graph clustering methods, such as normalized cut and ratio cut, have been well studied and applied to solve many unsupervised learning applications. The original graph clustering problems are NP-hard. Traditional approaches used spectral relaxation to solve them; the main disadvantage of these approaches is that the obtained spectral solutions can severely deviate from the true solution. To solve this problem, in this paper, we propose a new relaxation mechanism for graph clustering methods. Instead of minimizing the squared distances of clustering results, we use the ℓ1-norm distance. More importantly, considering the normalized consistency, we also use the ℓ1-norm for the normalization terms in the new graph clustering relaxations. Due to the sparse result from ℓ1-norm minimization, the solutions of our new relaxed graph clustering methods take discrete values with many zeros, which are close to the ideal solutions. Our new objectives are difficult to optimize, because the minimization problem involves a ratio of nonsmooth terms, and existing sparse learning optimization algorithms cannot be applied. In this paper, we propose a new optimization algorithm to solve this difficult non-smooth ratio minimization problem. Extensive experiments have been performed on three two-way clustering and eight multi-way clustering benchmark datasets. All empirical results show that our new relaxation methods consistently enhance the normalized cut and ratio cut clustering results.

IJCAI Conference 2016 Conference Paper

Parameter-Free Auto-Weighted Multiple Graph Learning: A Framework for Multiview Clustering and Semi-Supervised Classification

  • Feiping Nie
  • Jing Li
  • Xuelong Li

Graph-based approaches have been successful in unsupervised and semi-supervised learning. In this paper, we focus on real-world applications where the same instance can be represented by multiple heterogeneous features. The key point of utilizing graph-based knowledge to deal with this kind of data is to reasonably integrate the different representations and obtain the manifold most consistent with the real data distribution. We propose a novel framework via a reformulation of the standard spectral learning model, which can be used for multiview clustering and semi-supervised tasks. Unlike other methods in the literature, the proposed method can learn an optimal weight for each graph automatically, without introducing an additional parameter as previous methods do. Furthermore, our objective under semi-supervised learning is convex and the globally optimal result will be obtained. Extensive empirical results on different real-world datasets demonstrate that the proposed method achieves comparable performance with the state-of-the-art approaches and can be used more practically.
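
The "learn an optimal weight for each graph automatically" mechanism is typically achieved by coupling the per-view spectral terms through a square root, so the implied weights follow from the objective itself. A hedged sketch in standard spectral notation (L_v the Laplacian of view v, F the shared embedding; these symbols are assumed, not taken from the paper):

```latex
\min_{F^{\top}F = I}\ \sum_{v=1}^{V}\sqrt{\operatorname{tr}\!\left(F^{\top} L_v F\right)},
\qquad
w_v \;=\; \frac{1}{2\sqrt{\operatorname{tr}\!\left(F^{\top} L_v F\right)}},
```

where, at a stationary point, the problem behaves like a weighted sum of the per-view terms with weights w_v determined by the current solution, so no additional weight hyperparameter has to be introduced.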

IJCAI Conference 2016 Conference Paper

Robust and Sparse Fuzzy K-Means Clustering

  • Jinglin Xu
  • Junwei Han
  • Kai Xiong
  • Feiping Nie

Partition-based clustering algorithms, like K-Means and fuzzy K-Means, have been the most widely and successfully used in data mining over the past decades. In this paper, we present a robust and sparse fuzzy K-Means clustering algorithm, an extension of the standard fuzzy K-Means algorithm that incorporates a robust function, rather than the squared data-fitting term, to handle outliers. More importantly, combined with the concept of sparseness, the new algorithm further introduces a penalty term to give the object-cluster membership of each sample suitable sparseness. Experimental results on benchmark datasets demonstrate that the proposed algorithm not only ensures the robustness of such a soft clustering algorithm in real-world applications, but also avoids performance degradation by considering the membership sparsity.

IJCAI Conference 2016 Conference Paper

Subspace Clustering via New Low-Rank Model with Discrete Group Structure Constraint

  • Feiping Nie
  • Heng Huang

We propose a new subspace clustering model to segment data that are drawn from multiple linear or affine subspaces. Unlike the well-known sparse subspace clustering (SSC) and low-rank representation (LRR), which turn the subspace clustering problem into a two-step algorithm consisting of building the affinity matrix and then performing spectral clustering, our proposed model directly learns the indicators of the different subspaces, so that the low-rank based groups are obtained explicitly. To better approximate the low-rank constraint, we suggest using the Schatten p-norm to relax the rank constraint instead of the trace norm. We tactically avoid the integer programming problem imposed by the group indicator constraint to make our algorithm more efficient and scalable. Furthermore, we extend our discussion to the general case in which the subspaces do not pass through the origin. The convergence of the new algorithm is given, and both synthetic and real-world datasets demonstrate our proposed model's effectiveness.

AAAI Conference 2016 Conference Paper

The Constrained Laplacian Rank Algorithm for Graph-Based Clustering

  • Feiping Nie
  • Xiaoqian Wang
  • Michael Jordan
  • Heng Huang

Graph-based clustering methods perform clustering on a fixed input data graph. If this initial construction is of low quality then the resulting clustering may also be of low quality. Moreover, existing graph-based clustering methods require post-processing on the data graph to extract the clustering indicators. We address both of these drawbacks by allowing the data graph itself to be adjusted as part of the clustering procedure. In particular, our Constrained Laplacian Rank (CLR) method learns a graph with exactly k connected components (where k is the number of clusters). We develop two versions of this method, based upon the L1-norm and the L2-norm, which yield two new graph-based clustering objectives. We derive optimization algorithms to solve these objectives. Experimental results on synthetic datasets and real-world benchmark datasets exhibit the effectiveness of this new graph-based clustering method.
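
The "exactly k connected components" guarantee used by CLR relies on the fact that the number of zero eigenvalues of a graph Laplacian equals the number of connected components. A small self-contained check of that fact (the tolerance and function name are illustrative):

```python
import numpy as np

def num_connected_components(S, tol=1e-9):
    """Count connected components of a graph with symmetric affinity matrix S."""
    S = (S + S.T) / 2.0
    L = np.diag(S.sum(axis=1)) - S          # unnormalized graph Laplacian
    eigvals = np.linalg.eigvalsh(L)         # real eigenvalues in ascending order
    return int(np.sum(eigvals < tol))       # multiplicity of the zero eigenvalue

# Toy example: two disjoint 2-node cliques -> 2 components
S = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
assert num_connected_components(S) == 2
```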

AAAI Conference 2016 Conference Paper

Unsupervised Feature Selection with Structured Graph Optimization

  • Feiping Nie
  • Wei Zhu
  • Xuelong Li

Since large amounts of unlabelled and high-dimensional data need to be processed, unsupervised feature selection has become an important and challenging problem in machine learning. Conventional embedded unsupervised methods always need to construct a similarity matrix, which makes the selected features highly dependent on the learned structure. However, real-world data always contain many noisy samples and features, so the similarity matrix obtained from the original data cannot be fully relied on. We propose an unsupervised feature selection approach which performs feature selection and local structure learning simultaneously, so that the similarity matrix can be determined adaptively. Moreover, we constrain the similarity matrix to make it contain more accurate information about the data structure, thus the proposed approach can select more valuable features. An efficient and simple algorithm is derived to optimize the problem. Experiments on various benchmark datasets, including handwritten digit data, face image data and biomedical data, validate the effectiveness of the proposed approach.

AAAI Conference 2015 Conference Paper

A Closed Form Solution to Multi-View Low-Rank Regression

  • Shuai Zheng
  • Xiao Cai
  • Chris Ding
  • Feiping Nie
  • Heng Huang

Real-life data often include information from different channels. For example, in computer vision, we can describe an image using different image features, such as pixel intensity, color, HOG, GIST and SIFT features. These different aspects of the same objects are often called multi-view (or multi-modal) data. The low-rank regression model has been proven to be an effective learning mechanism that exploits the low-rank structure of real-life data. But previous low-rank regression models only work on single-view data. In this paper, we propose a multi-view low-rank regression model by imposing low-rank constraints on the multi-view regression model. Most importantly, we provide a closed-form solution to the multi-view low-rank regression model. Extensive experiments on 4 multi-view datasets show that the multi-view low-rank regression model outperforms single-view regression models and reveal that the multi-view low-rank structure is very helpful.

AAAI Conference 2015 Conference Paper

A Convex Formulation for Spectral Shrunk Clustering

  • Xiaojun Chang
  • Feiping Nie
  • Zhigang Ma
  • Yi Yang
  • Xiaofang Zhou

Spectral clustering is a fundamental technique in the field of data mining and information processing. Most existing spectral clustering algorithms integrate dimensionality reduction into the clustering process assisted by manifold learning in the original space. However, the manifold in the reduced-dimensional subspace is likely to exhibit altered properties in contrast with the original space. Thus, applying manifold information obtained from the original space to the clustering process in a low-dimensional subspace is prone to inferior performance. Aiming to address this issue, we propose a novel convex algorithm that mines the manifold structure in the low-dimensional subspace. In addition, our unified learning process makes the manifold learning particularly tailored for the clustering. Compared with other related methods, the proposed algorithm produces more structured clustering results. To validate the efficacy of the proposed algorithm, we perform extensive experiments on several benchmark datasets in comparison with some state-of-the-art clustering approaches. The experimental results demonstrate that the proposed algorithm has quite promising clustering performance.

IJCAI Conference 2015 Conference Paper

Discriminative Unsupervised Dimensionality Reduction

  • Xiaoqian Wang
  • Yun Liu
  • Feiping Nie
  • Heng Huang

As an important machine learning topic, dimensionality reduction has been widely studied and utilized in various kinds of areas. A multitude of dimensionality reduction methods have been developed, among which unsupervised dimensionality reduction is more desirable when obtaining label information requires onerous work. However, most previous unsupervised dimensionality reduction methods call for an affinity graph constructed beforehand, with which the following dimensionality reduction steps can then be performed. This separation of graph construction and dimensionality reduction makes the dimensionality reduction process highly dependent on the quality of the input graph. In this paper, we propose a novel graph embedding method for unsupervised dimensionality reduction. We simultaneously conduct dimensionality reduction along with graph construction by assigning adaptive and optimal neighbors according to the projected local distances. Our method doesn't need an affinity graph constructed in advance, but instead learns the graph concurrently with dimensionality reduction. Thus, the learned graph is optimal for dimensionality reduction. Meanwhile, our learned graph has an explicit block diagonal structure, from which the clustering results can be directly revealed without any postprocessing steps. Extensive empirical results on dimensionality reduction as well as clustering are presented to corroborate the performance of our method.

AAAI Conference 2015 Conference Paper

Large-Scale Multi-View Spectral Clustering via Bipartite Graph

  • Yeqing Li
  • Feiping Nie
  • Heng Huang
  • Junzhou Huang

In this paper, we address the problem of large-scale multi-view spectral clustering. In many real-world applications, data can be represented in various heterogeneous features or views. Different views often provide different aspects of information that are complementary to each other. Several previous methods of clustering have demonstrated that better accuracy can be achieved using integrated information of all the views than just using each view individually. One important class of such methods is multi-view spectral clustering, which is based on graph Laplacian. However, existing methods are not applicable to large-scale problem for their high computational complexity. To this end, we propose a novel large-scale multi-view spectral clustering approach based on the bipartite graph. Our method uses local manifold fusion to integrate heterogeneous features. To improve efficiency, we approximate the similarity graphs using bipartite graphs. Furthermore, we show that our method can be easily extended to handle the out-of-sample problem. Extensive experimental results on five benchmark datasets demonstrate the effectiveness and efficiency of the proposed method, where our method runs up to nearly 3000 times faster than the state-of-the-art methods.

AAAI Conference 2015 Conference Paper

Learning Robust Locality Preserving Projection via p-Order Minimization

  • Hua Wang
  • Feiping Nie
  • Heng Huang

Locality preserving projection (LPP) is an effective dimensionality reduction method based on manifold learning, which is defined over the graph-weighted squared ℓ2-norm distances in the projected subspace. Since the squared ℓ2-norm distance is prone to outliers, it is desirable to develop a robust LPP method. In this paper, motivated by existing studies that improve the robustness of statistical learning models via ℓ1-norm or not-squared ℓ2-norm formulations, we propose a robust LPP (rLPP) formulation that minimizes the p-th order of the ℓ2-norm distances, which can better tolerate large outlying data samples because it suppresses the introduced bias more than the ℓ1-norm or not-squared ℓ2-norm minimizations. However, solving the formulated objective is very challenging because it is not only non-smooth but also non-convex. As an important theoretical contribution of this work, we systematically derive an efficient iterative algorithm to solve the general p-th order ℓ2-norm minimization problem, which, to the best of our knowledge, is solved for the first time in the literature. Extensive empirical evaluations of the proposed rLPP method have been performed, in which our new method outperforms the related state-of-the-art methods in a variety of experimental settings, demonstrating its effectiveness in seeking better subspaces on both noiseless and noisy data.
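
A hedged sketch of the p-th order objective described here, written in conventional LPP notation (s_ij the graph affinities, W the projection; the exact normalization or orthogonality constraint used in the paper may differ):

```latex
\min_{W^{\top}W = I}\ \sum_{i,j} s_{ij}\, \big\| W^{\top} x_i - W^{\top} x_j \big\|_{2}^{\,p},
\qquad 0 < p \le 2,
```

where choosing p < 2 reduces the influence of pairs with large projected distances, which is the source of the robustness to outliers.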

IJCAI Conference 2015 Conference Paper

Robust Dictionary Learning with Capped l1-Norm

  • Wenhao Jiang
  • Feiping Nie
  • Heng Huang

Expressing data vectors as sparse linear combinations of basis elements (a dictionary) is widely used in machine learning, signal processing, and statistics. It has been found that dictionaries learned from data are more effective than off-the-shelf ones, and dictionary learning has become an important tool for computer vision. Traditional dictionary learning methods use a quadratic loss function, which is known to be sensitive to outliers, and hence they cannot learn good dictionaries when outliers exist. In this paper, aiming at learning dictionaries resistant to outliers, we propose capped ℓ1-norm based dictionary learning and an efficient iteratively re-weighted algorithm to solve the problem. We provide theoretical analysis and carry out extensive experiments on real-world and synthetic datasets to show the effectiveness of our method.

AAAI Conference 2014 Conference Paper

A Convex Formulation for Semi-Supervised Multi-Label Feature Selection

  • Xiaojun Chang
  • Feiping Nie
  • Yi Yang
  • Heng Huang

The explosive growth of multimedia data has brought the challenge of how to efficiently browse, retrieve and organize these data. Under this circumstance, different approaches have been proposed to facilitate multimedia analysis. Several semi-supervised feature selection algorithms have been proposed to exploit both labeled and unlabeled data. However, they are implemented based on graphs, such that they cannot handle large-scale datasets. How to conduct semi-supervised feature selection on large-scale datasets has become a challenging research problem. Moreover, existing multi-label feature selection algorithms rely on eigen-decomposition with a heavy computational burden, which further prevents current feature selection algorithms from being applied to big data. In this paper, we propose a novel convex semi-supervised multi-label feature selection algorithm, which can be applied to large-scale datasets. We evaluate the performance of the proposed algorithm over five benchmark datasets and compare the results with state-of-the-art supervised and semi-supervised feature selection algorithms as well as a baseline using all features. The experimental results demonstrate that our proposed algorithm consistently achieves superior performance.

NeurIPS Conference 2014 Conference Paper

Exclusive Feature Learning on Arbitrary Structures via $\ell_{1,2}$-norm

  • Deguang Kong
  • Ryohei Fujimaki
  • Ji Liu
  • Feiping Nie
  • Chris Ding

Group lasso is widely used to enforce structural sparsity, which achieves sparsity at the inter-group level. In this paper, we propose a new formulation called ``exclusive group lasso'', which brings out sparsity at the intra-group level in the context of feature selection. The proposed exclusive group lasso is applicable to any feature structure, regardless of overlapping or non-overlapping structures. We analyze the properties of the exclusive group lasso, and propose an effective iteratively re-weighted algorithm to solve the corresponding optimization problem with rigorous convergence analysis. We show applications of the exclusive group lasso to uncorrelated feature selection. Extensive experiments on both synthetic and real-world datasets indicate the good performance of the proposed methods.
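
The exclusive (intra-group) sparsity regularizer is commonly written as a sum of squared ℓ1-norms over groups; a sketch in that standard form (assumed notation, with 𝒢 the set of feature groups):

```latex
\Omega_{\mathrm{exclusive}}(w) \;=\; \sum_{g \in \mathcal{G}} \|w_{g}\|_{1}^{2}
\;=\; \sum_{g \in \mathcal{G}} \Big( \sum_{j \in g} |w_j| \Big)^{2},
```

where the inner ℓ1-norm makes coordinates within a group compete (intra-group sparsity), in contrast to group lasso's inter-group sparsity.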

AAAI Conference 2014 Conference Paper

Feature Selection at the Discrete Limit

  • Miao Zhang
  • Chris Ding
  • Ya Zhang
  • Feiping Nie

Feature selection plays an important role in many machine learning and data mining applications. In this paper, we propose to use the ℓ2,p-norm for feature selection, with emphasis on small p. As p → 0, feature selection becomes the discrete feature selection problem. We provide two algorithms, a proximal gradient algorithm and a rank-one update algorithm, the latter being more efficient at large regularization λ. We provide closed-form solutions of the proximal operator at p = 0, 1/2. Experiments on real-life datasets show that features selected at small p consistently outperform features selected at p = 1, the standard ℓ2,1 approach, and other popular feature selection methods.
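
For reference, the ℓ2,p row-sparsity norm discussed here can be written as follows (W a d×c projection matrix with rows w^i; notation assumed), and its p-th power counts nonzero rows as p → 0, which is the discrete feature selection limit mentioned in the abstract:

```latex
\|W\|_{2,p} \;=\; \Big( \sum_{i=1}^{d} \|w^{i}\|_{2}^{\,p} \Big)^{1/p},
\qquad 0 < p \le 1,
\qquad
\lim_{p \to 0}\ \|W\|_{2,p}^{\,p} \;=\; \big|\{\, i : w^{i} \neq 0 \,\}\big| .
```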

IJCAI Conference 2013 Conference Paper

Adaptive Loss Minimization for Semi-Supervised Elastic Embedding

  • Feiping Nie
  • Hua Wang
  • Heng Huang
  • Chris Ding

Semi-supervised learning usually only predicts labels for the unlabeled data appearing in the training data, and cannot effectively predict labels for testing data never appearing in the training set. To handle this out-of-sample problem, many inductive methods impose a constraint such that the predicted label matrix should be exactly equal to a linear model. In practice, this constraint is too rigid to capture the manifold structure of data. Motivated by this deficiency, we relax the rigid linear embedding constraint and propose to use an elastic embedding constraint on the predicted label matrix such that the manifold structure can be better explored. To solve our new objective, and also a more general optimization problem, we study a novel adaptive loss with an efficient optimization algorithm. Our new adaptive loss minimization method takes the advantages of both the ℓ1-norm and the ℓ2-norm: it is robust to data outliers under a Laplacian distribution and can efficiently learn the normal data under a Gaussian distribution. Experiments have been performed on image classification tasks and our approach outperforms other state-of-the-art methods.

IJCAI Conference 2013 Conference Paper

Early Active Learning via Robust Representation and Structured Sparsity

  • Feiping Nie
  • Hua Wang
  • Heng Huang
  • Chris Ding

Labeling training data is quite time-consuming but essential for supervised learning models. To solve this problem, active learning has been studied and applied to select the informative and representative data points for labeling. However, during the early stage of experiments, only a small number (or none) of labeled data points exist, thus the most representative samples should be selected first. In this paper, we propose a novel robust active learning method to handle the early-stage experimental design problem and select the most representative data points. Selecting the representative samples is an NP-hard problem, thus we employ a structured sparsity-inducing norm to relax the objective to an efficient convex formulation. Meanwhile, the robust sparse representation loss function is utilized to reduce the effect of outliers. A new efficient optimization algorithm is introduced to solve our non-smooth objective with low computational cost and proven global convergence. Empirical results on both single-label and multi-label classification benchmark datasets show the promising results of our method.

IJCAI Conference 2013 Conference Paper

Exact Top-k Feature Selection via ℓ2,0-Norm Constraint

  • Xiao Cai
  • Feiping Nie
  • Heng Huang

In this paper, we propose a novel robust and pragmatic feature selection approach. Unlike those sparse learning based feature selection methods which tackle an approximate problem by imposing sparsity regularization in the objective function, the proposed method has only one ℓ2,1-norm loss term with an explicit ℓ2,0-norm equality constraint. An efficient algorithm based on the augmented Lagrangian method is derived to solve the above constrained optimization problem and find a stable local solution. Extensive experiments on four biological datasets show that, although our proposed model is not a convex problem, it outperforms the approximate convex counterparts and state-of-the-art feature selection methods in terms of classification accuracy with two popular classifiers. What is more, since the regularization parameter of our method has an explicit meaning, i.e., the number of features selected, it avoids the burden of tuning the parameter, making it a pragmatic feature selection method.
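
A hedged sketch of the kind of objective described (an ℓ2,1-norm fitting loss with an explicit ℓ2,0 equality constraint on the rows of W; the data-fitting term and symbols are illustrative, not necessarily the paper's exact formulation):

```latex
\min_{W}\ \|X^{\top} W - Y\|_{2,1}
\quad \text{s.t.} \quad \|W\|_{2,0} = k,
\qquad
\|W\|_{2,0} := \big|\{\, i : \|w^{i}\|_{2} \neq 0 \,\}\big|,
```

so the constraint keeps exactly k nonzero rows, i.e. exactly k selected features, and k itself is the only parameter.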

IJCAI Conference 2013 Conference Paper

Multi-View K-Means Clustering on Big Data

  • Xiao Cai
  • Feiping Nie
  • Heng Huang

In the past decade, more and more data have been collected from multiple sources or represented by multiple views, where different views describe distinct perspectives of the data. Although each view could be used individually for finding patterns by clustering, the clustering performance can be more accurate by exploring the rich information among multiple views. Several multi-view clustering methods have been proposed to integrate different views of data in an unsupervised manner. However, they are graph based approaches, e.g., based on spectral clustering, such that they cannot handle large-scale data. How to combine these heterogeneous features for unsupervised large-scale data clustering has become a challenging problem. In this paper, we propose a new robust large-scale multi-view clustering method to integrate heterogeneous representations of large-scale data. We evaluate the proposed new methods on six benchmark datasets and compare the performance with several commonly used clustering approaches as well as baseline multi-view clustering methods. In all experimental results, our proposed methods consistently achieve superior clustering performance.

IJCAI Conference 2013 Conference Paper

Social Trust Prediction Using Rank-k Matrix Recovery

  • Jin Huang
  • Feiping Nie
  • Heng Huang
  • Yu Lei
  • Chris Ding

Trust prediction, which explores the unobserved relationships between online community users, is an emerging and important research topic in social network analysis and many web applications. Similar to other social-based recommender systems, trust relationships between users can also be modeled in the form of matrices. Recent studies show that users generally establish friendships due to a few latent factors, so it is reasonable to assume the trust matrices are of low rank. As a result, many recommendation system strategies can be applied here. In particular, trace norm minimization, which uses a matrix's trace norm to approximate its rank, is especially appealing. However, recent articles cast doubts on the validity of the trace norm approximation. In this paper, instead of using trace norm minimization, we propose a new robust rank-k matrix completion method, which explicitly seeks a matrix with exact rank. Moreover, our method is robust to noise or corrupted observations. We optimize the new objective function in an alternating manner, based on a combination of ancillary variables and the Augmented Lagrangian Multiplier (ALM) method. We perform experiments on three real-world datasets and all empirical results demonstrate the effectiveness of our method.

AAAI Conference 2013 Conference Paper

Spectral Rotation versus K-Means in Spectral Clustering

  • Jin Huang
  • Feiping Nie
  • Heng Huang

Spectral clustering has been a popular data clustering algorithm. This category of approaches often resorts to other clustering methods, such as K-Means, to obtain the final clusters. The potential flaw of this common practice is that the obtained relaxed continuous spectral solution can severely deviate from the true discrete solution. In this paper, we propose to impose an additional orthonormal constraint to better approximate the optimal continuous solution to the graph cut objective functions. Such a method, called spectral rotation in the literature, optimizes the spectral clustering objective functions better than K-Means and improves the clustering accuracy. We provide an efficient algorithm to solve the new problem rigorously, which is not significantly more costly than K-Means. We also establish the connection between our method and K-Means to provide a theoretical motivation for our method. Experimental results show that our algorithm consistently reaches a better cut value and outperforms classic spectral clustering methods on clustering metrics.

AAAI Conference 2013 Conference Paper

Supervised and Projected Sparse Coding for Image Classification

  • Jin Huang
  • Feiping Nie
  • Heng Huang
  • Chris Ding

The classic sparse representation for classification (SRC) method fails to incorporate the label information of training images, and meanwhile has poor scalability due to the expensive computation of the ℓ1-norm. In this paper, we propose a novel subspace sparse coding method that utilizes label information to effectively classify the images in the subspace. Our new approach unifies the tasks of dimension reduction and supervised sparse vector learning by simultaneously preserving the sparse structure of the data and seeking the optimal projection direction in the training stage, and therefore accelerates the classification process in the test stage. Our method achieves both flat and structured sparsity for the vector representations, making our framework more discriminative during subspace learning and the subsequent classification. The empirical results on 4 benchmark datasets demonstrate the effectiveness of our method.

IJCAI Conference 2013 Conference Paper

Thinking of Images as What They Are: Compound Matrix Regression for Image Classification

  • Zhigang Ma
  • Yi Yang
  • Feiping Nie
  • Nicu Sebe

In this paper, we propose a new classification framework for image matrices. The approach is realized by learning two groups of classification vectors for the two dimensions of the image matrices. One novelty is that we utilize compound regression models in the learning process, which endows the algorithm with an increased degree of freedom. On top of that, we extend the two-dimensional classification method to a semi-supervised classifier which leverages both labeled and unlabeled data. A fast iterative solution is then proposed to solve the objective function. The proposed method is evaluated in several different applications. The experimental results show that our method outperforms several classification approaches. In addition, we observe that our method attains respectable classification performance even when only a few labeled training samples are provided. This advantage is especially desirable for real-world problems since precisely annotated images are scarce.

NeurIPS Conference 2012 Conference Paper

Forging The Graphs: A Low Rank and Positive Semidefinite Graph Learning Approach

  • Dijun Luo
  • Heng Huang
  • Feiping Nie
  • Chris Ding

In many graph-based machine learning and data mining approaches, the quality of the graph is critical. However, in real-world applications, especially in semi-supervised and unsupervised learning, evaluating the quality of a graph is often expensive and sometimes even impossible, due to the cost or the unavailability of ground truth. In this paper, we propose a robust approach with convex optimization to ``forge'' a graph: given an input graph, it learns a graph of higher quality. Our major concern is that an ideal graph should satisfy all of the following constraints: non-negative, symmetric, low rank, and positive semidefinite. We develop a graph learning algorithm by solving a convex optimization problem and further develop an efficient optimization to obtain globally optimal solutions with theoretical guarantees. With only one non-sensitive parameter, our method is shown by experimental results to be robust and to achieve higher accuracy in semi-supervised learning and clustering under various settings. As a preprocessing of graphs, our method has a wide range of potential applications in machine learning and data mining.

NeurIPS Conference 2012 Conference Paper

High-Order Multi-Task Feature Learning to Identify Longitudinal Phenotypic Markers for Alzheimer's Disease Progression Prediction

  • Hua Wang
  • Feiping Nie
  • Heng Huang
  • Jingwen Yan
  • Sungeun Kim
  • Shannon Risacher
  • Andrew Saykin
  • Li Shen

Alzheimer's disease (AD) is a neurodegenerative disorder characterized by progressive impairment of memory and other cognitive functions. Regression analysis has been studied to relate neuroimaging measures to cognitive status. However, whether these measures have further predictive power to infer a trajectory of cognitive performance over time is still an under-explored but important topic in AD research. We propose a novel high-order multi-task learning model to address this issue. The proposed model explores the temporal correlations existing in data features and regression tasks via structured sparsity-inducing norms. In addition, the sparsity of the model enables the selection of a small number of MRI measures while maintaining high prediction accuracy. The empirical studies, using the baseline MRI and serial cognitive data of the ADNI cohort, have yielded promising results.

AAAI Conference 2012 Conference Paper

Low-Rank Matrix Recovery via Efficient Schatten p-Norm Minimization

  • Feiping Nie
  • Heng Huang
  • Chris Ding

As an emerging machine learning and information retrieval technique, matrix completion has been successfully applied to many scientific applications, such as collaborative prediction in information retrieval and video completion in computer vision. Matrix completion aims to recover a low-rank matrix with a fraction of its entries arbitrarily corrupted. Instead of solving the popularly used trace-norm or nuclear-norm based objective, we directly minimize the original formulations of the trace norm and the rank norm. We propose a novel Schatten p-norm optimization framework that unifies the different norm formulations. An efficient algorithm is derived to solve the new objective, followed by a rigorous theoretical proof of convergence. The previous main solution strategy for this problem requires computing singular value decompositions, a task whose cost grows rapidly as matrix sizes and ranks increase. Our algorithm has a closed-form solution in each iteration and hence converges fast. As a consequence, it has the capacity to solve large-scale matrix completion problems. Empirical studies on recommendation system data sets demonstrate the promising performance of our new optimization framework and efficient algorithm.
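
For reference, the Schatten p-norm in the title is the ℓp norm of a matrix's singular values: p = 1 gives the trace (nuclear) norm, and smaller p interpolates toward the rank. A minimal numpy sketch of the quantity itself (not the paper's minimization algorithm):

```python
import numpy as np

def schatten_p_norm(X, p):
    """Schatten p-norm: the l_p norm of the singular values of X."""
    sv = np.linalg.svd(X, compute_uv=False)
    return float((sv ** p).sum() ** (1.0 / p))

X = np.random.randn(20, 30)
print(schatten_p_norm(X, p=1.0))   # nuclear / trace norm
print(schatten_p_norm(X, p=0.5))   # a nonconvex surrogate closer to the rank
```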

IJCAI Conference 2011 Conference Paper

Fast Nonnegative Matrix Tri-Factorization for Large-Scale Data Co-Clustering

  • Hua Wang
  • Feiping Nie
  • Heng Huang
  • Fillia Makedon

Nonnegative Matrix Factorization (NMF) based co-clustering methods have attracted increasing attention in recent years because of their mathematical elegance and encouraging empirical results. However, the algorithms used to solve NMF problems usually involve intensive matrix multiplications, which make them computationally inefficient. In this paper, instead of constraining the factor matrices of NMF to be nonnegative as existing methods do, we propose a novel Fast Nonnegative Matrix Tri-Factorization (FNMTF) approach that constrains them to be cluster indicator matrices, a special type of nonnegative matrix. As a result, the optimization problem of our approach can be decoupled into much smaller subproblems requiring far fewer matrix multiplications, so our approach works well for large-scale input data. Moreover, the resulting factor matrices can directly assign cluster labels to data points and features due to the nature of indicator matrices. In addition, by exploiting the manifold structures in both the data and feature spaces, we further introduce the Locality Preserved FNMTF (LP-FNMTF) approach, which improves clustering performance. The promising results of extensive experimental evaluations validate the effectiveness of the proposed methods.
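
The key idea, constraining the factor matrices to be cluster indicators so that updates decouple into assignment steps, can be illustrated with a block k-means style alternation. The sketch below is a hypothetical simplification for intuition, not the FNMTF or LP-FNMTF algorithm.

```python
import numpy as np

def block_coclustering(X, k_rows, k_cols, n_iter=30, seed=0):
    """Alternating co-clustering in the spirit of X ~ F S G^T with hard
    row/column cluster indicators F, G; illustrative simplification only."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    row = rng.integers(k_rows, size=n)            # row cluster labels
    col = rng.integers(k_cols, size=m)            # column cluster labels
    for _ in range(n_iter):
        # S[k, l]: mean of the block of X assigned to row cluster k, column cluster l
        S = np.zeros((k_rows, k_cols))
        for k in range(k_rows):
            for l in range(k_cols):
                block = X[np.ix_(row == k, col == l)]
                S[k, l] = block.mean() if block.size else 0.0
        # reassign rows to the closest row prototype, then columns likewise
        proto_rows = S[:, col]                                        # (k_rows, m)
        row = ((X[None, :, :] - proto_rows[:, None, :]) ** 2).sum(-1).argmin(0)
        proto_cols = S[row, :]                                        # (n, k_cols)
        col = ((X[:, :, None] - proto_cols[:, None, :]) ** 2).sum(0).argmin(1)
    return row, col, S
```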

IJCAI Conference 2011 Conference Paper

Feature Selection via Joint Embedding Learning and Sparse Regression

  • Chenping Hou
  • Feiping Nie
  • Dongyun Yi
  • Yi Wu

The problem of feature selection has attracted considerable research interest in the past few years. Traditional learning-based feature selection methods separate embedding learning from feature ranking. In this paper, we introduce a novel unsupervised feature selection approach via Joint Embedding Learning and Sparse Regression (JELSR). Instead of simply employing the graph Laplacian for embedding learning followed by regression, we use weights obtained via locally linear approximation to construct the graph, and unify embedding learning and sparse regression to perform feature selection. By adding ℓ2,1-norm regularization, we can learn a sparse matrix for feature ranking. We also provide an effective method to solve the proposed problem. Compared with traditional unsupervised feature selection methods, our approach integrates the merits of embedding learning and sparse regression simultaneously. Extensive experimental results are provided to demonstrate its validity.
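
For reference, the ℓ2,1 regularizer mentioned here sums the ℓ2 norms of the rows of the coefficient matrix, so entire rows (and hence the corresponding features) are driven to zero together. A one-function numpy sketch of the norm itself:

```python
import numpy as np

def l21_norm(W):
    """l2,1 norm: sum over rows of the row-wise l2 norm. Rows pushed to zero
    by this regularizer correspond to features that can be discarded."""
    return float(np.sqrt((W ** 2).sum(axis=1)).sum())
```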

AAAI Conference 2011 Conference Paper

Learning Instance Specific Distance for Multi-Instance Classification

  • Hua Wang
  • Feiping Nie
  • Heng Huang

Multi-Instance Learning (MIL) deals with problems where each training example is a bag, and each bag contains a set of instances. The multi-instance representation is useful in many real-world applications because it captures more structural information than a traditional flat single-instance representation. However, it also brings new challenges. Specifically, the distance between data objects in MIL is a set-to-set distance, which is harder to estimate than the vector distances used for single-instance data. Moreover, because labels in MIL are assigned to bags instead of instances, even though a bag belongs to a class, some, or even most, of its instances may not be truly related to that class. To address these difficulties, in this paper we propose a novel Instance Specific Distance (ISD) method for MIL, which computes the Class-to-Bag (C2B) distance by further considering the relevance of training instances with respect to their labeled classes. Taking into account the outliers caused by the weak label association in MIL, we learn ISD by solving an ℓ0+-norm minimization problem. An efficient algorithm to solve the optimization problem is presented, together with a rigorous proof of its convergence. The promising results on five benchmark multi-instance data sets and two real-world multi-instance applications validate the effectiveness of the proposed method.
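
A Class-to-Bag distance can be pictured with a naive example: measure how far each of a class's training instances is from its nearest instance in the bag, optionally weighting instances by their relevance. The snippet below is a hypothetical illustration of that idea, not the learned ISD metric or its minimization formulation.

```python
import numpy as np

def naive_c2b_distance(class_instances, bag, weights=None):
    """Naive Class-to-Bag (C2B) distance: relevance-weighted average of each
    class instance's distance to its nearest bag instance. Illustrative only."""
    C = np.asarray(class_instances)                   # (n_c, d) instances of one class
    B = np.asarray(bag)                               # (n_b, d) instances of one bag
    if weights is None:
        weights = np.ones(len(C)) / len(C)            # uniform relevance weights
    dist = np.sqrt(((C[:, None, :] - B[None, :, :]) ** 2).sum(-1))  # (n_c, n_b)
    return float(weights @ dist.min(axis=1))
```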

NeurIPS Conference 2011 Conference Paper

Maximum Margin Multi-Instance Learning

  • Hua Wang
  • Heng Huang
  • Farhad Kamangar
  • Feiping Nie
  • Chris Ding

Multi-instance learning (MIL) considers the input as bags of instances, where labels are assigned to bags. MIL is useful in many real-world applications. For example, in image categorization the semantic meanings (labels) of an image mostly arise from its regions (instances) instead of the entire image (bag). Existing MIL methods typically build their models using the Bag-to-Bag (B2B) distance, which is often computationally expensive and may not truly reflect the semantic similarities. To tackle this, in this paper we approach MIL problems from a new perspective using the Class-to-Bag (C2B) distance, which directly assesses the relationships between classes and bags. Taking into account the two major challenges in MIL, high heterogeneity in the data and weak label association, we propose a novel Maximum Margin Multi-Instance Learning (M3I) approach that parameterizes the C2B distance by introducing class-specific distance metrics and locally adaptive significance coefficients. We apply our new approach to automatic image categorization tasks on three (one single-label and two multi-label) benchmark data sets. Extensive experiments have demonstrated promising results that validate the proposed method.

AAAI Conference 2011 Conference Paper

Nonnegative Spectral Clustering with Discriminative Regularization

  • Yi Yang
  • Heng Shen
  • Feiping Nie
  • Rongrong Ji
  • Xiaofang Zhou

Clustering is a fundamental research topic in the field of data mining. Optimizing the objective functions of clustering algorithms, e.g., normalized cut and k-means, is an NP-hard optimization problem. Existing algorithms usually relax the elements of the cluster indicator matrix from discrete values to continuous ones. Eigenvalue decomposition is then performed to obtain a relaxed continuous solution, which must be discretized. The main problem is that the signs of the relaxed continuous solution are mixed. Such results may deviate severely from the true solution, making it a nontrivial task to obtain the cluster labels. To address the problem, we impose an explicit nonnegative constraint for a more accurate solution during the relaxation. In addition, we introduce a discriminative regularization into the objective to avoid overfitting. A new iterative approach is proposed to optimize the objective. We show that the algorithm is a general one that naturally leads to other extensions. Experiments demonstrate the effectiveness of our algorithm.

NeurIPS Conference 2010 Conference Paper

Efficient and Robust Feature Selection via Joint ℓ2,1-Norms Minimization

  • Feiping Nie
  • Heng Huang
  • Xiao Cai
  • Chris Ding

Feature selection is an important component of many machine learning applications. Especially in many bioinformatics tasks, efficient and robust feature selection methods are desired to extract meaningful features and eliminate noisy ones. In this paper, we propose a new robust feature selection method emphasizing joint ℓ2,1-norm minimization on both the loss function and the regularization. The ℓ2,1-norm based loss function is robust to outliers in the data points, and the ℓ2,1-norm regularization selects features across all data points with joint sparsity. An efficient algorithm is introduced with a proof of convergence. Our regression-based objective makes the feature selection process more efficient. Our method has been applied to both genomic and proteomic biomarker discovery. Extensive empirical studies were performed on six data sets to demonstrate the effectiveness of our feature selection method.
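
A common way to handle ℓ2,1 terms is an iteratively reweighted least-squares loop, with features then ranked by the row norms of the learned weight matrix. The sketch below uses a squared loss plus ℓ2,1 regularization, so it illustrates the joint-sparsity idea rather than reproducing the paper's algorithm, which also places the ℓ2,1 norm on the loss term.

```python
import numpy as np

def l21_feature_ranking(X, Y, lam=1.0, n_iter=50, eps=1e-8):
    """Iteratively reweighted solver for min_W ||X W - Y||_F^2 + lam * ||W||_{2,1},
    then rank features by the row norms of W. Illustrative sketch only."""
    n, d = X.shape
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)   # ridge warm start
    for _ in range(n_iter):
        row_norms = np.sqrt((W ** 2).sum(axis=1)) + eps
        D = np.diag(1.0 / (2.0 * row_norms))                  # reweighting matrix
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
    scores = np.sqrt((W ** 2).sum(axis=1))                    # per-feature importance
    return np.argsort(scores)[::-1], W                        # most important first
```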

AAAI Conference 2010 Conference Paper

Local and Global Regressive Mapping for Manifold Learning with Out-of-Sample Extrapolation

  • Yi Yang
  • Feiping Nie
  • Shiming Xiang
  • Yueting Zhuang
  • Wenhua Wang

Over the past few years, a large family of manifold learning algorithms have been proposed and applied to various applications. While designing new manifold learning algorithms has attracted much research attention, fewer research efforts have been focused on out-of-sample extrapolation of the learned manifold. In this paper, we propose a novel manifold learning algorithm, Local and Global Regressive Mapping (LGRM), which employs local regression models to grasp the manifold structure. We additionally impose a global regression term as regularization to learn a model for out-of-sample data extrapolation. Based on the algorithm, we propose a new manifold learning framework. Our framework can be applied to any manifold learning algorithm to simultaneously learn the low-dimensional embedding of the training data and a model that provides an explicit mapping of out-of-sample data to the learned manifold. Experiments demonstrate that the proposed framework uncovers the manifold structure precisely and can be freely applied to unseen data.
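
One simple way to obtain an explicit out-of-sample mapping is to regress a precomputed low-dimensional embedding on the input features and apply the learned map to unseen points. The ridge-regression sketch below is a generic stand-in for that idea, not the LGRM objective with its local regression models.

```python
import numpy as np

def out_of_sample_map(X_train, Y_embed, X_new, reg=1e-2):
    """Fit a linear map from features to a given low-dimensional embedding
    (ridge regression), then project unseen points. Generic illustration only."""
    d = X_train.shape[1]
    W = np.linalg.solve(X_train.T @ X_train + reg * np.eye(d), X_train.T @ Y_embed)
    return X_new @ W
```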

IJCAI Conference 2009 Conference Paper
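
Spectral Embedded Clustering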

  • Feiping Nie
  • Dong Xu
  • Ivor W. Tsang
  • Changshui Zhang

In this paper, we propose a new spectral clustering method, referred to as Spectral Embedded Clustering (SEC), to minimize the normalized cut criterion in spectral clustering while controlling the mismatch between the cluster assignment matrix and the low-dimensional embedded representation of the data. SEC is based on the observation that the cluster assignment matrix of high-dimensional data can be represented by a low-dimensional linear mapping of the data. We also reveal the connections between SEC and other clustering methods, such as spectral clustering, clustering with local and global regularization, k-means, and discriminative k-means. Experiments on many real-world data sets show that SEC significantly outperforms existing spectral clustering methods as well as k-means-related clustering methods.
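
For context, the normalized-cut pipeline that SEC starts from can be sketched in a few lines: embed with the bottom eigenvectors of the normalized graph Laplacian, then cluster the embedding with k-means. This is only the vanilla baseline, not the SEC objective with its linear-mapping mismatch term.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering_baseline(W, k):
    """Plain normalized-cut spectral clustering on an affinity matrix W."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    L_sym = np.eye(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt   # normalized Laplacian
    vals, vecs = np.linalg.eigh(L_sym)
    U = vecs[:, :k]                                            # k smallest eigenvectors
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12) # row-normalize
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```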

AAAI Conference 2008 Conference Paper

Trace Ratio Criterion for Feature Selection

  • Feiping Nie
  • Shiming Xiang
  • Changshui Zhang

Fisher score and Laplacian score are two popular feature selection algorithms, both of which belong to the general graph-based feature selection framework. In this framework, a feature subset is selected based on a corresponding score (the subset-level score), which is calculated in a trace ratio form. Since the number of possible feature subsets is huge, it is often prohibitively expensive to search in a brute-force manner for the feature subset with the maximum subset-level score. Instead of calculating the scores of all feature subsets, traditional methods calculate the score for each feature and then select the leading features based on the rank of these feature-level scores. However, selecting the feature subset based on feature-level scores cannot guarantee the optimality of the subset-level score. In this paper, we directly optimize the subset-level score and propose a novel algorithm to efficiently find the globally optimal feature subset such that the subset-level score is maximized. Extensive experiments demonstrate the effectiveness of our proposed algorithm in comparison with traditional feature selection methods.
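
The subset-level trace ratio score admits a simple Dinkelbach-style iteration when each feature contributes a numerator and a denominator term (e.g., per-feature between- and within-class scatter). The sketch below, with hypothetical inputs a and b, illustrates that iteration; it is a simplification rather than the paper's exact procedure.

```python
import numpy as np

def trace_ratio_select(a, b, m, n_iter=100):
    """Maximize (sum_{i in S} a_i) / (sum_{i in S} b_i) over size-m subsets S.
    Dinkelbach-style: for the current ratio lam, pick the m features with the
    largest a_i - lam * b_i, then update lam; a, b are hypothetical per-feature
    numerator/denominator contributions (assumed positive)."""
    lam = a.sum() / b.sum()
    for _ in range(n_iter):
        idx = np.argsort(a - lam * b)[::-1][:m]   # best subset for the current lam
        new_lam = a[idx].sum() / b[idx].sum()
        if np.isclose(new_lam, lam):
            break
        lam = new_lam
    return idx, lam
```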

IJCAI Conference 2007 Conference Paper
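
Neighborhood MinMax Projections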

  • Feiping Nie
  • Shiming Xiang
  • Changshui Zhang

A new algorithm, Neighborhood MinMax Projections (NMMP), is proposed for supervised dimensionality reduction in this paper. The algorithm aims at learning a linear transformation and focuses only on pairwise points in which the two points are neighbors of each other. After the transformation, the considered pairwise points within the same class are as close as possible, while those between different classes are as far apart as possible. We formulate this problem as a constrained optimization problem in which the global optimum can be obtained effectively and efficiently. Compared with the popular supervised method, Linear Discriminant Analysis (LDA), our method has three significant advantages. First, it is able to extract more discriminative features. Second, it can deal with cases where the class distributions are more complex than Gaussian. Third, the singularity problem that arises in LDA naturally does not occur in our method. The performance on several data sets demonstrates the effectiveness of the proposed method.