Arrow Research search

Author name cluster

Xiaofei He

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

47 papers
1 author row

Possible papers

AAAI Conference 2026 Conference Paper

FGD-Align: Pluralistic Alignment for Large Language Models via Fuzzy Group Decision-Making

  • Weihang Pan
  • Zhengxu Yu
  • Yong Wu
  • Xun Liang
  • Zhongming Jin
  • Qiang Fu
  • Penghui Shang
  • Binbin Lin

Ensuring alignment with human values is essential for modern large language models (LLMs), especially amid growing concerns around AI safety and social impact. Yet achieving such alignment remains challenging due to the limited, noisy, and often conflicting nature of human feedback from diverse annotators. Most existing approaches, such as Direct Preference Optimization (DPO), assume consistent and conflict-free supervision, overlooking the ambiguity, inconsistency, and value trade-offs inherent in real-world preferences—often leading to reduced robustness and exclusion of minority views. To address this, we propose FGD-Align, a novel pluralistic alignment framework grounded in Fuzzy Group Decision-Making theory. Our approach rigorously models and aggregates human preferences while retaining the complexity of real-world value trade-offs. Unlike traditional methods that rely on coarse-grained preference pairs, FGD-Align introduces fuzzy preference modeling via triangular fuzzy numbers to capture nuanced, multi-criteria human judgments. We further develop a new training objective, Probabilistic Fuzzy DPO, which incorporates fuzzy preference strength as adaptive loss weights and gradient filters, enhancing robustness to ambiguity and inconsistency in feedback. Comprehensive experiments demonstrate that FGD-Align consistently outperforms both DPO variants and advanced preference aggregation methods in terms of preference accuracy and robustness to ambiguity. It achieves superior alignment stability and better preserves minority preferences, all with minimal computational overhead. Our work bridges the gap between algorithmic tractability and the nuanced landscape of human values, enabling more scalable, inclusive, and socially-aware AI alignment.

AAAI Conference 2025 Conference Paper

Local Conditional Controlling for Text-to-Image Diffusion Models

  • Yibo Zhao
  • Liang Peng
  • Yang Yang
  • Zekai Luo
  • Hengjia Li
  • Yao Chen
  • Zheng Yang
  • Xiaofei He

Diffusion models have exhibited impressive prowess in the text-to-image task. Recent methods add image-level structure controls, e.g., edge and depth maps, to manipulate the generation process together with text prompts to obtain desired images. This controlling process is globally operated on the entire image, which limits the flexibility of control regions. In this paper, we explore a novel and practical task setting: local control. It focuses on controlling a specific local region according to user-defined image conditions, while the remaining regions are only conditioned by the original text prompt. However, it is non-trivial to achieve it. The naive manner of directly adding local conditions may lead to the local control dominance problem, which forces the model to focus on the controlled region and neglect object generation in other regions. To mitigate this problem, we propose Regional Discriminate Loss to update the noised latents, aiming at enhanced object generation in non-control regions. Furthermore, the proposed Focused Token Response suppresses weaker attention scores which lack the strongest response to enhance object distinction and reduce duplication. Lastly, we adopt Feature Mask Constraint to reduce quality degradation in images caused by information differences across the local control region. All proposed strategies are operated at the inference stage. Extensive experiments demonstrate that our method can synthesize high-quality images aligned with the text prompt under local control conditions.

NeurIPS Conference 2025 Conference Paper

Self-Supervised Direct Preference Optimization for Text-to-Image Diffusion Models

  • Liang Peng
  • Boxi Wu
  • Haoran Cheng
  • Yibo Zhao
  • Xiaofei He

Direct preference optimization (DPO) is an effective method for aligning generative models with human preferences and has been successfully applied to fine‑tune text‑to‑image diffusion models. Its practical adoption, however, is hindered by a labor‑intensive pipeline that first produces a large set of candidate images and then requires humans to rank them pairwise. We address this bottleneck with self‑supervised direct preference optimization, a new paradigm that removes the need for any pre‑generated images or manual ranking. During training, we create preference pairs on the fly through self‑supervised image transformations, allowing the model to learn from fresh and diverse comparisons at every iteration. This online strategy eliminates costly data collection and annotation while remaining plug‑and‑play for any text‑to‑image diffusion method. Surprisingly, the on‑the‑fly pairs produced by the proposed method not only match but exceed the effectiveness of conventional DPO, which we attribute to the greater diversity of preferences sampled during training. Extensive experiments with Stable Diffusion 1.5 and Stable Diffusion XL confirm that our method delivers substantial gains.

NeurIPS Conference 2024 Conference Paper

AutoManual: Constructing Instruction Manuals by LLM Agents via Interactive Environmental Learning

  • Minghao Chen
  • Yihang Li
  • Yanting Yang
  • Shiyu Yu
  • Binbin Lin
  • Xiaofei He

Large Language Model (LLM) based agents have shown promise in autonomously completing tasks across various domains, e.g., robotics, games, and web navigation. However, these agents typically require elaborate design and expert prompts to solve tasks in specific domains, which limits their adaptability. We introduce AutoManual, a framework enabling LLM agents to autonomously build their understanding through interaction and adapt to new environments. AutoManual categorizes environmental knowledge into diverse rules and optimizes them in an online fashion by two agents: 1) The Planner codes actionable plans based on current rules for interacting with the environment. 2) The Builder updates the rules through a well-structured rule system that facilitates online rule management and essential detail retention. To mitigate hallucinations in managing rules, we introduce a case-conditioned prompting strategy for the Builder. Finally, the Formulator agent compiles these rules into a comprehensive manual. The self-generated manual can not only improve the adaptability but also guide the planning of smaller LLMs while being human-readable. Given only one simple demonstration, AutoManual significantly improves task success rates, achieving 97.4% with GPT-4-turbo and 86.2% with GPT-3.5-turbo on ALFWorld benchmark tasks. The code is available at https://github.com/minghchen/automanual.

NeurIPS Conference 2024 Conference Paper

EMVP: Embracing Visual Foundation Model for Visual Place Recognition with Centroid-Free Probing

  • Qibo Qiu
  • Shun Zhang
  • Haiming Gao
  • Honghui Yang
  • Haochao Ying
  • Wenxiao Wang
  • Xiaofei He

Visual Place Recognition (VPR) is essential for mobile robots as it enables them to retrieve images from a database closest to their current location. The progress of Visual Foundation Models (VFMs) has significantly advanced VPR by capturing representative descriptors in images. However, existing fine-tuning efforts for VFMs often overlook the crucial role of probing in effectively adapting these descriptors for improved image representation. In this paper, we propose the Centroid-Free Probing (CFP) stage, making novel use of second-order features for more effective use of descriptors from VFMs. Moreover, to control the preservation of task-specific information adaptively based on the context of the VPR, we introduce the Dynamic Power Normalization (DPN) module in both the recalibration and CFP stages, forming a novel Parameter Efficiency Fine-Tuning (PEFT) pipeline (EMVP) tailored for the VPR task. Extensive experiments demonstrate the superiority of the proposed CFP over existing probing methods. Moreover, the EMVP pipeline can further enhance fine-tuning performance in terms of accuracy and efficiency. Specifically, it achieves 93.9%, 96.5%, and 94.6% Recall@1 on the MSLS Validation, Pitts250k-test, and SPED datasets, respectively, while saving 64.3% of trainable parameters compared with the existing SOTA PEFT method.

NeurIPS Conference 2024 Conference Paper

Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control

  • Yuxin Xiao
  • Chaoqun Wan
  • Yonggang Zhang
  • Wenxiao Wang
  • Binbin Lin
  • Xiaofei He
  • Xu Shen
  • Jieping Ye

As the development and application of Large Language Models (LLMs) continue to advance rapidly, enhancing their trustworthiness and aligning them with human preferences has become a critical area of research. Traditional methods rely heavily on extensive data for Reinforcement Learning from Human Feedback (RLHF), but representation engineering offers a new, training-free approach. This technique leverages semantic features to control the representation of an LLM's intermediate hidden states, enabling the model to meet specific requirements such as increased honesty or heightened safety awareness. However, a significant challenge arises when attempting to fulfill multiple requirements simultaneously. It proves difficult to encode various semantic contents, like honesty and safety, into a singular semantic feature, restricting its practicality. In this work, we address this challenge through Sparse Activation Control. By delving into the intrinsic mechanisms of LLMs, we manage to identify and pinpoint modules that are closely related to specific tasks within the model, i.e., attention heads. These heads display sparse characteristics that allow for near-independent control over different tasks. Our experiments, conducted on the open-source Llama series models, have yielded encouraging results. The models were able to align with human preferences on issues of safety, factualness, and bias concurrently.

IJCAI Conference 2024 Conference Paper

Task-Agnostic Self-Distillation for Few-Shot Action Recognition

  • Bin Zhang
  • Yuanjie Dang
  • Peng Chen
  • Ronghua Liang
  • Nan Gao
  • Ruohong Huan
  • Xiaofei He

Task-oriented matching is one of the core aspects of few-shot Action Recognition. Most previous works leverage the metric features within the support and query sets of individual tasks, without considering the metric information across different matching tasks. This oversight represents a significant limitation in this task. Specifically, the task-specific metric feature can decrease the generalization ability and ignore the general matching feature applicable across different tasks. To address these challenges, we propose a novel meta-distillation framework for few-shot action recognition that learns the task-agnostic metric features and generalizes them to different tasks. First, to extract the task-agnostic metric information, we design a task-based self-distillation framework to learn the metric features from the training process progressively. Additionally, to enable the model with fine-grained matching capabilities, we design a multi-dimensional distillation module that extracts more detailed relations from the temporal, spatial, and channel dimensions within video pairs and improves the representative performance of metric features for each individual task. After that, the few-shot predictions can be obtained by feeding the embedded task-agnostic metric features to a common feature matcher. Extensive experimental results on standard datasets demonstrate our method’s superior performance compared to existing state-of-the-art methods.

AAAI Conference 2023 Conference Paper

Towards In-Distribution Compatible Out-of-Distribution Detection

  • Boxi Wu
  • Jie Jiang
  • Haidong Ren
  • Zifan Du
  • Wenxiao Wang
  • Zhifeng Li
  • Deng Cai
  • Xiaofei He

Deep neural networks, despite their remarkable capability of discriminating targeted in-distribution samples, show poor performance on detecting anomalous out-of-distribution data. To address this defect, state-of-the-art solutions choose to train deep networks on an auxiliary dataset of outliers. Various training criteria for these auxiliary outliers are proposed based on heuristic intuitions. However, we find that these intuitively designed outlier training criteria can hurt in-distribution learning and eventually lead to inferior performance. To this end, we identify three causes of the in-distribution incompatibility: contradictory gradient, false likelihood, and distribution shift. Based on our new understandings, we propose a new out-of-distribution detection method by adapting both the top-design of deep models and the loss function. Our method achieves in-distribution compatibility by pursuing less interference with the probabilistic characteristic of in-distribution features. On several benchmarks, our method not only achieves the state-of-the-art out-of-distribution detection performance but also improves the in-distribution accuracy.

AAAI Conference 2022 Conference Paper

DMN4: Few-Shot Learning via Discriminative Mutual Nearest Neighbor Neural Network

  • Yang Liu
  • Tu Zheng
  • Jie Song
  • Deng Cai
  • Xiaofei He

Few-shot learning (FSL) aims to classify images under low-data regimes, where the conventional pooled global feature is likely to lose useful local characteristics. Recent work has achieved promising performances by using deep descriptors. They generally take all deep descriptors from neural networks into consideration while ignoring that some of them are useless in classification due to their limited receptive field, e.g., task-irrelevant descriptors could be misleading and multiple aggregative descriptors from background clutter could even overwhelm the object’s presence. In this paper, we argue that a Mutual Nearest Neighbor (MNN) relation should be established to explicitly select the query descriptors that are most relevant to each task and discard less relevant ones from aggregative clutters in FSL. Specifically, we propose Discriminative Mutual Nearest Neighbor Neural Network (DMN4) for FSL. Extensive experiments demonstrate that our method outperforms the existing state-of-the-art methods on both fine-grained and generalized datasets.

NeurIPS Conference 2021 Conference Paper

Do Wider Neural Networks Really Help Adversarial Robustness?

  • Boxi Wu
  • Jinghui Chen
  • Deng Cai
  • Xiaofei He
  • Quanquan Gu

Adversarial training is a powerful type of defense against adversarial examples. Previous empirical results suggest that adversarial training requires wider networks for better performances. However, it remains elusive how neural network width affects model robustness. In this paper, we carefully examine the relationship between network width and model robustness. Specifically, we show that the model robustness is closely related to the tradeoff between natural accuracy and perturbation stability, which is controlled by the robust regularization parameter λ. With the same λ, wider networks can achieve better natural accuracy but worse perturbation stability, leading to a potentially worse overall model robustness. To understand the origin of this phenomenon, we further relate the perturbation stability with the network's local Lipschitzness. By leveraging recent results on neural tangent kernels, we theoretically show that wider networks tend to have worse perturbation stability. Our analyses suggest that: 1) the common strategy of first fine-tuning λ on small networks and then directly using it for wide model training could lead to deteriorated model robustness; 2) one needs to properly enlarge λ to unleash the robustness potential of wider models fully. Finally, we propose a new Width Adjusted Regularization (WAR) method that adaptively enlarges λ on wide models and significantly saves the tuning time.

IJCAI Conference 2020 Conference Paper

MaCAR: Urban Traffic Light Control via Active Multi-agent Communication and Action Rectification

  • Zhengxu Yu
  • Shuxian Liang
  • Long Wei
  • Zhongming Jin
  • Jianqiang Huang
  • Deng Cai
  • Xiaofei He
  • Xian-Sheng Hua

Urban traffic light control is an important and challenging real-world problem. By regarding intersections as agents, most of the Reinforcement Learning (RL) based methods generate actions of agents independently. They can cause action conflict and result in overflow or road resource waste in adjacent intersections. Recently, some collaborative methods have alleviated the above problems by extending the observable surroundings of agents, which can be considered as inactive cross-agent communication methods. However, when agents act synchronously in these works, the perceived action value is biased and the information exchanged is insufficient. In this work, we propose a novel Multi-agent Communication and Action Rectification (MaCAR) framework. It enables active communication between agents by considering the impact of synchronous actions of agents. MaCAR consists of two parts: (1) an active Communication Agent Network (CAN) involving a Message Propagation Graph Neural Network (MPGNN); (2) a Traffic Forecasting Network (TFN) which learns to predict the traffic after agents' synchronous actions and the corresponding action values. By using predicted information, we mitigate the action value bias during training to help rectify agents' future actions. In experiments, we show that our proposal can outperform state-of-the-art methods on both synthetic and real-world datasets.

AAAI Conference 2020 Conference Paper

PI-RCNN: An Efficient Multi-Sensor 3D Object Detector with Point-Based Attentive Cont-Conv Fusion Module

  • Liang Xie
  • Chao Xiang
  • Zhengxu Yu
  • Guodong Xu
  • Zheng Yang
  • Deng Cai
  • Xiaofei He

LIDAR point clouds and RGB images are both essential for 3D object detection, so many state-of-the-art 3D detection algorithms are dedicated to fusing these two types of data effectively. However, their fusion methods based on Bird’s Eye View (BEV) or voxel format are not accurate. In this paper, we propose a novel fusion approach named the Point-based Attentive Cont-conv Fusion (PACF) module, which fuses multi-sensor features directly on 3D points. Besides continuous convolution, we additionally add a Point-Pooling and an Attentive Aggregation to make the fused features more expressive. Moreover, based on the PACF module, we propose a 3D multi-sensor multi-task network called Pointcloud-Image RCNN (PI-RCNN for short), which handles the image segmentation and 3D object detection tasks. PI-RCNN employs a segmentation sub-network to extract full-resolution semantic feature maps from images and then fuses the multi-sensor features via the powerful PACF module. Benefiting from the effectiveness of the PACF module and the expressive semantic features from the segmentation module, PI-RCNN can improve much in 3D object detection. We demonstrate the effectiveness of the PACF module and PI-RCNN on the KITTI 3D Detection benchmark, and our method can achieve state-of-the-art performance on the metric of 3D AP.

IJCAI Conference 2019 Conference Paper

COP: Customized Deep Model Compression via Regularized Correlation-Based Filter-Level Pruning

  • Wenxiao Wang
  • Cong Fu
  • Jishun Guo
  • Deng Cai
  • Xiaofei He

Neural network compression empowers the effective yet unwieldy deep convolutional neural networks (CNN) to be deployed in resource-constrained scenarios. Most state-of-the-art approaches prune the model at the filter level according to the "importance" of filters. Despite their success, we notice they suffer from at least two of the following problems: 1) The redundancy among filters is not considered because the importance is evaluated independently. 2) Cross-layer filter comparison is unachievable since the importance is defined locally within each layer. Consequently, we must manually specify layer-wise pruning ratios. 3) They are prone to generate sub-optimal solutions because they neglect the inequality between reducing parameters and reducing computational cost. Reducing the same number of parameters in different positions in the network may reduce different computational cost. To address the above problems, we develop a novel algorithm named COP (correlation-based pruning), which can detect the redundant filters efficiently. We enable the cross-layer filter comparison through global normalization. We add parameter-quantity and computational-cost regularization terms to the importance, which enables the users to customize the compression according to their preference (smaller or faster). Extensive experiments have shown COP outperforms the others significantly. The code is released at https://github.com/ZJULearning/COP.

IJCAI Conference 2019 Conference Paper

Open-Ended Long-Form Video Question Answering via Hierarchical Convolutional Self-Attention Networks

  • Zhu Zhang
  • Zhou Zhao
  • Zhijie Lin
  • Jingkuan Song
  • Xiaofei He

Open-ended video question answering aims to automatically generate the natural-language answer from referenced video contents according to the given question. Currently, most existing approaches focus on short-form video question answering with multi-modal recurrent encoder-decoder networks. Although these works have achieved promising performance, they may still be ineffective for long-form video question answering due to the lack of long-range dependency modeling and their heavy computational cost. To tackle these problems, we propose a fast hierarchical convolutional self-attention encoder-decoder network. Concretely, we first develop a hierarchical convolutional self-attention encoder to efficiently model long-form video contents, which builds the hierarchical structure for video sequences and captures question-aware long-range dependencies from video context. We then devise a multi-scale attentive decoder to incorporate multi-layer video representations for answer generation, which avoids the information missing of the top encoder layer. Extensive experiments show the effectiveness and efficiency of our method.

IJCAI Conference 2019 Conference Paper

Progressive Transfer Learning for Person Re-identification

  • Zhengxu Yu
  • Zhongming Jin
  • Long Wei
  • Jishun Guo
  • Jianqiang Huang
  • Deng Cai
  • Xiaofei He
  • Xian-Sheng Hua

Model fine-tuning is a widely used transfer learning approach in person Re-identification (ReID) applications, which fine-tunes a pre-trained feature extraction model for the target scenario instead of training a model from scratch. It is challenging due to the significant variations inside the target scenario, e.g., different camera viewpoints, illumination changes, and occlusion. These variations result in a gap between the distribution of each mini-batch and the distribution of the whole dataset when using mini-batch training. In this paper, we study model fine-tuning from the perspective of the aggregation and utilization of the global information of the dataset when using mini-batch training. Specifically, we introduce a novel network structure called Batch-related Convolutional Cell (BConv-Cell), which progressively collects the global information of the dataset into a latent state and uses this latent state to rectify the extracted feature. Based on BConv-Cells, we further propose the Progressive Transfer Learning (PTL) method to facilitate the model fine-tuning process by jointly training the BConv-Cells and the pre-trained ReID model. Empirical experiments show that our proposal can improve the performance of the ReID model greatly on MSMT17, Market-1501, CUHK03 and DukeMTMC-reID datasets. The code will be released later at https://github.com/ZJULearning/PTL.

JMLR Journal 2019 Journal Article

Scaling Up Sparse Support Vector Machines by Simultaneous Feature and Sample Reduction

  • Bin Hong
  • Weizhong Zhang
  • Wei Liu
  • Jieping Ye
  • Deng Cai
  • Xiaofei He
  • Jie Wang

Sparse support vector machine (SVM) is a popular classification technique that can simultaneously learn a small set of the most interpretable features and identify the support vectors. It has achieved great successes in many real-world applications. However, for large-scale problems involving a huge number of samples and ultra-high dimensional features, solving sparse SVMs remains challenging. By noting that sparse SVMs induce sparsities in both feature and sample spaces, we propose a novel approach, which is based on accurate estimations of the primal and dual optima of sparse SVMs, to simultaneously identify the inactive features and samples that are guaranteed to be irrelevant to the outputs. Thus, we can remove the identified inactive samples and features from the training phase, leading to substantial savings in the computational cost without sacrificing the accuracy. Moreover, we show that our method can be extended to multi-class sparse support vector machines. To the best of our knowledge, the proposed method is the first static feature and sample reduction method for sparse SVMs and multi-class sparse SVMs. Experiments on both synthetic and real data sets demonstrate that our approach significantly outperforms state-of-the-art methods and the speedup gained by our approach can be orders of magnitude.

IJCAI Conference 2018 Conference Paper

Attentional Image Retweet Modeling via Multi-Faceted Ranking Network Learning

  • Zhou Zhao
  • Lingtao Meng
  • Jun Xiao
  • Min Yang
  • Fei Wu
  • Deng Cai
  • Xiaofei He
  • Yueting Zhuang

Retweet prediction is a challenging problem in social media sites (SMS). In this paper, we study the problem of image retweet prediction in social media, which predicts the image sharing behavior that the user reposts the image tweets from their followees. Unlike previous studies, we learn a user preference ranking model from their past retweeted image tweets in SMS. We first propose a heterogeneous image retweet modeling network (IRM) that exploits users' past retweeted image tweets with associated contexts, their following relations in SMS and preference of their followees. We then develop a novel attentional multi-faceted ranking network learning framework with multi-modal neural networks for the proposed heterogeneous IRM network to learn the joint image tweet representations and user preference representations for the prediction task. The extensive experiments on a large-scale dataset from Twitter show that our method achieves better performance than other state-of-the-art solutions to the problem.

NeurIPS Conference 2018 Conference Paper

MacNet: Transferring Knowledge from Machine Comprehension to Sequence-to-Sequence Models

  • Boyuan Pan
  • Yazheng Yang
  • Hao Li
  • Zhou Zhao
  • Yueting Zhuang
  • Deng Cai
  • Xiaofei He

Machine Comprehension (MC) is one of the core problems in natural language processing, requiring both understanding of the natural language and knowledge about the world. Rapid progress has been made since the release of several benchmark datasets, and recently the state-of-the-art models even surpass human performance on the well-known SQuAD evaluation. In this paper, we transfer knowledge learned from machine comprehension to the sequence-to-sequence tasks to deepen the understanding of the text. We propose MacNet: a novel encoder-decoder supplementary architecture to the widely used attention-based sequence-to-sequence models. Experiments on neural machine translation (NMT) and abstractive text summarization show that our proposed framework can significantly improve the performance of the baseline models, and our method for the abstractive text summarization achieves the state-of-the-art results on the Gigaword dataset.

IJCAI Conference 2018 Conference Paper

Multi-Turn Video Question Answering via Multi-Stream Hierarchical Attention Context Network

  • Zhou Zhao
  • Xinghua Jiang
  • Deng Cai
  • Jun Xiao
  • Xiaofei He
  • Shiliang Pu

Conversational video question answering is a challenging task in visual information retrieval, which generates the accurate answer from the referenced video contents according to the visual conversation context and given question. However, the existing visual question answering methods mainly tackle the problem of single-turn video question answering, which may be ineffectively applied for multi-turn video question answering directly, due to the insufficiency of modeling the sequential conversation context. In this paper, we study the problem of multi-turn video question answering from the viewpoint of multi-step hierarchical attention context network learning. We first propose the hierarchical attention context network for context-aware question understanding by modeling the hierarchically sequential conversation context structure. We then develop the multi-stream spatio-temporal attention network for learning the joint representation of the dynamic video contents and context-aware question embedding. We next devise the hierarchical attention context network learning method with multi-step reasoning process for multi-turn video question answering. We construct two large-scale multi-turn video question answering datasets. The extensive experiments show the effectiveness of our method.

IJCAI Conference 2018 Conference Paper

Translating Embeddings for Knowledge Graph Completion with Relation Attention Mechanism

  • Wei Qian
  • Cong Fu
  • Yu Zhu
  • Deng Cai
  • Xiaofei He

Knowledge graph embedding is an essential problem in knowledge extraction. Recently, translation based embedding models (e.g., TransE) have received increasing attention. These methods try to interpret the relations among entities as translations from head entity to tail entity and achieve promising performance on knowledge graph completion. Previous researchers attempt to transform the entity embedding concerning the given relation for distinguishability. Also, they naturally think the relation-related transforming should reflect attention mechanism, which means it should focus on only a part of the attributes. However, we found previous methods fail to create an attention mechanism, and the reason is that they ignore the hierarchical routine of human cognition. When predicting whether a relation holds between two entities, people first check the category of entities, then they focus on fine-grained relation-related attributes to make the decision. In other words, the attention should take effect on entities filtered by the right category. In this paper, we propose a novel knowledge graph embedding method named TransAt to learn the translation based embedding, relation-related categories of entities and relation-related attention simultaneously. Extensive experiments show that our approach outperforms state-of-the-art methods significantly on public datasets, and our method can learn the true attention varying among relations.

AAAI Conference 2017 Conference Paper

Community-Based Question Answering via Asymmetric Multi-Faceted Ranking Network Learning

  • Zhou Zhao
  • Hanqing Lu
  • Vincent Zheng
  • Deng Cai
  • Xiaofei He
  • Yueting Zhuang

Community-based question answering (CQA) sites have become popular Internet-based web services, which have accumulated millions of questions and their posted answers over time. Thus, question answering becomes an essential problem in CQA sites, which ranks the high-quality answers to the given question. Currently, most of the existing works study the problem of question answering based on the deep semantic matching model to rank the answers based on their semantic relevance, while ignoring the authority of answerers to the given question. In this paper, we consider the problem of community-based question answering from the viewpoint of asymmetric multi-faceted ranking network embedding. We propose a novel asymmetric multi-faceted ranking network learning framework for community-based question answering by jointly exploiting the deep semantic relevance between question-answer pairs and the answerers’ authority to the given question. We then develop an asymmetric ranking network learning method with deep recurrent neural networks by integrating both answers’ relative quality rank to the given question and the answerers’ following relations in CQA sites. The extensive experiments on a large-scale dataset from a real-world CQA site show that our method achieves better performance than other state-of-the-art solutions to the problem.

IJCAI Conference 2017 Conference Paper

Link Prediction via Ranking Metric Dual-Level Attention Network Learning

  • Zhou Zhao
  • Ben Gao
  • Vincent W. Zheng
  • Deng Cai
  • Xiaofei He
  • Yueting Zhuang

Link prediction is a challenging problem for complex network analysis, arising in many disciplines such as social networks and telecommunication networks. Currently, many existing approaches estimate the proximity of the link endpoints for link prediction from their feature or the local neighborhood around them, which suffer from the localized view of network connections and insufficiency of discriminative feature representation. In this paper, we consider the problem of link prediction from the viewpoint of learning discriminative path-based proximity ranking metric embedding. We propose a novel ranking metric network learning framework by jointly exploiting both node-level and path-level attentional proximity of the endpoints for link prediction. We then develop the path-based dual-level reasoning attentional learning method with recurrent neural network for proximity ranking metric embedding. The extensive experiments on two large-scale datasets show that our method achieves better performance than other state-of-the-art solutions to the problem.

IJCAI Conference 2017 Conference Paper

Microblog Sentiment Classification via Recurrent Random Walk Network Learning

  • Zhou Zhao
  • Hanqing Lu
  • Deng Cai
  • Xiaofei He
  • Yueting Zhuang

Microblog Sentiment Classification (MSC) is a challenging task in microblog mining, arising in many applications such as stock price prediction and crisis management. Currently, most of the existing approaches learn the user sentiment model from their posted tweets in microblogs, which suffer from the insufficiency of discriminative tweet representation. In this paper, we consider the problem of microblog sentiment classification from the viewpoint of heterogeneous MSC network embedding. We propose a novel recurrent random walk network learning framework for the problem by exploiting both users’ posted tweets and their social relations in microblogs. We then introduce the deep recurrent neural networks with random-walk layer for heterogeneous MSC network embedding, which can be trained end-to-end from scratch. We employ the back-propagation method for training the proposed recurrent random walk network model. The extensive experiments on the large-scale public datasets from Twitter show that our method achieves better performance than other state-of-the-art solutions to the problem.

IJCAI Conference 2017 Conference Paper

Robust Asymmetric Bayesian Adaptive Matrix Factorization

  • Xin Guo
  • Boyuan Pan
  • Deng Cai
  • Xiaofei He

Low rank matrix factorizations (LRMF) have attracted much attention due to their wide range of applications in computer vision, such as image inpainting and video denoising. Most of the existing methods assume that the loss between an observed measurement matrix and its bilinear factorization follows a symmetric distribution, like the Gaussian or gamma families. However, in real-world situations, this assumption is often found too idealized, because pictures under various illumination and angles may suffer from multi-peak, asymmetric and irregular noise. To address these problems, this paper assumes that the loss follows a mixture of Asymmetric Laplace distributions and proposes a robust Asymmetric Laplace Adaptive Matrix Factorization model (ALAMF) under the Bayesian matrix factorization framework. The assumption of Laplace distribution makes our model more robust and the asymmetric attribute makes our model more flexible and adaptable to real-world noise. A variational method is then devised for model inference. We compare ALAMF with other state-of-the-art matrix factorization methods on both synthetic and real-world data sets. The experimental results demonstrate the effectiveness of our proposed approach.

IJCAI Conference 2017 Conference Paper

Video Question Answering via Hierarchical Spatio-Temporal Attention Networks

  • Zhou Zhao
  • Qifan Yang
  • Deng Cai
  • Xiaofei He
  • Yueting Zhuang

Open-ended video question answering is a challenging problem in visual information retrieval, which automatically generates the natural language answer from the referenced video content according to the question. However, the existing visual question answering works only focus on the static image, which may be ineffectively applied to video question answering due to the temporal dynamics of video contents. In this paper, we consider the problem of open-ended video question answering from the viewpoint of spatio-temporal attentional encoder-decoder learning framework. We propose the hierarchical spatio-temporal attention network for learning the joint representation of the dynamic video contents according to the given question. We then develop the encoder-decoder learning method with reasoning recurrent neural networks for open-ended video question answering. We construct a large-scale video question answering dataset. The extensive experiments show the effectiveness of our method.

AAAI Conference 2016 Conference Paper

Accelerated Sparse Linear Regression via Random Projection

  • Weizhong Zhang
  • Lijun Zhang
  • Rong Jin
  • Deng Cai
  • Xiaofei He

In this paper, we present an accelerated numerical method based on random projection for sparse linear regression. Previous studies have shown that under appropriate conditions, gradient-based methods enjoy a geometric convergence rate when applied to this problem. However, the time complexity of evaluating the gradient is as large as O(nd), where n is the number of data points and d is the dimensionality, making those methods inefficient for large-scale and high-dimensional datasets. To address this limitation, we first utilize random projection to find a rank-k approximator for the data matrix, and reduce the cost of gradient evaluation to O(nk + dk), a significant improvement when k is much smaller than d and n. Then, we solve the sparse linear regression problem via a proximal gradient method with a homotopy strategy to generate sparse intermediate solutions. Theoretical analysis shows that our method also achieves a global geometric convergence rate, and moreover the sparsity of all the intermediate solutions is well-bounded over the iterations. Finally, we conduct experiments to demonstrate the efficiency of the proposed method.

IJCAI Conference 2016 Conference Paper

Expert Finding for Community-Based Question Answering via Ranking Metric Network Learning

  • Zhou Zhao
  • Qifan Yang
  • Deng Cai
  • Xiaofei He
  • Yueting Zhuang

Expert finding for question answering is a challenging problem in Community-based Question Answering (CQA) sites, arising in many applications such as question routing and the identification of best answers. In order to provide high-quality experts, many existing approaches learn the user model mainly from their past question-answering activities in CQA sites, which suffer from the sparsity problem of CQA data. In this paper, we consider the problem of expert finding from the viewpoint of learning ranking metric embedding. We propose a novel ranking metric network learning framework for expert finding by exploiting both users' relative quality rank to given questions and their social relations. We then develop a random-walk based learning method with recurrent neural networks for ranking metric network embedding. The extensive experiments on a large-scale dataset from a real-world CQA site show that our method achieves better performance than other state-of-the-art solutions to the problem.

IJCAI Conference 2016 Conference Paper

Non-Negative Matrix Factorization with Sinkhorn Distance

  • Wei Qian
  • Bin Hong
  • Deng Cai
  • Xiaofei He
  • Xuelong Li

Non-negative Matrix Factorization (NMF) has received considerable attention in various areas for its psychological and physiological interpretation of naturally occurring data whose representation may be parts-based in the human brain. Despite its good practical performance, one shortcoming of the original NMF is that it ignores the intrinsic structure of the data set. On one hand, samples might lie on a manifold, and thus one may hope that geometric information can be exploited to improve NMF's performance. On the other hand, features might correlate with each other, so the conventional L2 distance cannot measure the distance between samples well. Although some works have been proposed to solve these problems, few connect them together. In this paper, we propose a novel method that exploits knowledge in both the data manifold and feature correlation. We adopt an approximation of Earth Mover's Distance (EMD) as the metric and add a graph regularized term based on EMD to NMF. Furthermore, we propose an efficient multiplicative iteration algorithm to solve it. Our empirical study shows the encouraging results of the proposed algorithm compared with other NMF methods.

EAAI Journal 2015 Journal Article

Data-driven minimization of pump operating and maintenance cost

  • Zijun Zhang
  • Xiaofei He
  • Andrew Kusiak

A data-driven model for scheduling pumps in a wastewater treatment process is proposed. The objective is to minimize the cost of pump operations and maintenance. A neural network algorithm is applied to model performance of the pumps using the data collected at a municipal wastewater treatment plant. The discrete-state Markov process is utilized to develop a model of maintenance decisions. The developed pump performance and maintenance models are integrated into a scheduling model. A hierarchical particle swarm optimization algorithm is designed to solve the proposed scheduling model. The concepts developed in this paper are illustrated with two case studies.

IJCAI Conference 2015 Conference Paper

Mobile Query Recommendation via Tensor Function Learning

  • Zhou Zhao
  • Ruihua Song
  • Xing Xie
  • Xiaofei He
  • Yueting Zhuang

With the prevalence of mobile search nowadays, the benefits of mobile query recommendation are well recognized, which provide formulated queries sticking to users’ search intent. In this paper, we introduce the problem of query recommendation on mobile devices and model the user-location-query relations with a tensor representation. Unlike previous studies based on tensor decomposition, we study this problem via tensor function learning. That is, we learn the tensor function from the side information of users, locations and queries, and then predict users’ search intent. We develop an efficient alternating direction method of multipliers (ADMM) scheme to solve the introduced problem. We empirically evaluate our approach based on the mobile query dataset from Bing search engine in the city of Beijing, China, and show that our method can outperform several state-of-the-art approaches.

AAAI Conference 2014 Conference Paper

Sparse Learning for Stochastic Composite Optimization

  • Weizhong Zhang
  • Lijun Zhang
  • Yao Hu
  • Rong Jin
  • Deng Cai
  • Xiaofei He

In this paper, we focus on Stochastic Composite Optimization (SCO) for sparse learning that aims to learn a sparse solution. Although many SCO algorithms have been developed for sparse learning with an optimal convergence rate O(1/T), they often fail to deliver sparse solutions at the end either because of the limited sparsity regularization during stochastic optimization or due to the limitation in online-to-batch conversion. To improve the sparsity of solutions obtained by SCO, we propose a simple but effective stochastic optimization scheme that adds a novel sparse online-to-batch conversion to the traditional SCO algorithms. The theoretical analysis shows that our scheme can find a solution with better sparse patterns without affecting the convergence rate. Experimental results on both synthetic and real-world data sets show that the proposed methods are more effective in recovering the sparse solution and have comparable convergence rate as the state-of-the-art SCO algorithms for sparse learning.

IJCAI Conference 2013 Conference Paper

A Unified Approximate Nearest Neighbor Search Scheme by Combining Data Structure and Hashing

  • Debing Zhang
  • Genmao Yang
  • Yao Hu
  • Zhongming Jin
  • Deng Cai
  • Xiaofei He

Nowadays, Nearest Neighbor Search becomes more and more important when facing the challenge of big data. Traditionally, to solve this problem, researchers mainly focus on building effective data structures such as hierarchical k-means tree or using hashing methods to accelerate the query process. In this paper, we propose a novel unified approximate nearest neighbor search scheme to combine the advantages of both the effective data structure and the fast Hamming distance computation in hashing methods. In this way, the searching procedure can be further accelerated. Computational complexity analysis and extensive experiments have demonstrated the effectiveness of our proposed scheme.

IJCAI Conference 2013 Conference Paper

Active Learning via Neighborhood Reconstruction

  • Yao Hu
  • Debing Zhang
  • Zhongming Jin
  • Deng Cai
  • Xiaofei He

In many real-world scenarios, active learning methods are used to select the most informative points for labeling to reduce the expensive human action. One direction for active learning is selecting the most representative points, i.e., selecting points such that the other points can be approximated by linear combinations of the selected points. However, these methods fail to consider the local geometrical information of the data space. In this paper, we propose a novel framework named Active Learning via Neighborhood Reconstruction (ALNR) by taking into account the locality information directly during the selection. Specifically, for the linear reconstruction of a target point, the nearer neighbors should have a greater effect and the selected points distant from the target point should be penalized severely. We further develop an efficient two-stage iterative procedure to solve the final optimization problem. Our empirical study shows encouraging results of the proposed algorithms in comparison to other state-of-the-art active learning algorithms on both synthetic and real visual data sets.

IJCAI Conference 2013 Conference Paper

Harmonious Hashing

  • Bin Xu
  • Jiajun Bu
  • Yue Lin
  • Chun Chen
  • Xiaofei He
  • Deng Cai

Hashing-based fast nearest neighbor search has attracted great attention in both research and industry recently. Many existing hashing approaches encode data with projection-based hash functions and represent each projected dimension by 1 bit. However, the dimensions with high variance hold a large share of the energy or information of the data but are treated the same as dimensions with low variance, which leads to a serious information loss. In this paper, we introduce a novel hashing algorithm called Harmonious Hashing which aims at learning hash functions with low information loss. Specifically, we learn a set of optimized projections to preserve the maximum cumulative energy and meet the constraint of equivalent variance on each dimension as much as possible. In this way, we can minimize the information loss after binarization. Despite its extreme simplicity, our method outperforms many state-of-the-art hashing methods in large-scale and high-dimensional nearest neighbor search experiments.
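The information loss described above can be seen in a small sketch of generic 1-bit sign binarization. This is the baseline encoding the abstract criticizes, not Harmonious Hashing's learned projections. The helper names `binarize` and `hamming` and the projected values are hypothetical, chosen only to make the effect visible.

```python
def binarize(projected):
    """Encode each projected dimension with one bit: 1 if positive, else 0."""
    return [1 if value > 0 else 0 for value in projected]


def hamming(code_a, code_b):
    """Number of bit positions where the two codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


# Hypothetical projections: the first dimension has far more spread than the second.
point_a = [9.0, 0.1]
point_b = [0.5, 0.2]

code_a = binarize(point_a)
code_b = binarize(point_b)

# The points differ by 8.5 on the high-variance dimension,
# yet one bit per dimension cannot express that gap.
distance = hamming(code_a, code_b)
```

Both points receive the same code here, so their Hamming distance is zero even though they are far apart along the high-variance dimension.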

JMLR Journal 2013 Journal Article

Parallel Vector Field Embedding

  • Binbin Lin
  • Xiaofei He
  • Chiyuan Zhang
  • Ming Ji

We propose a novel local isometry based dimensionality reduction method from the perspective of vector fields, which is called parallel vector field embedding (PFE). We first give a discussion on local isometry and global isometry to show the intrinsic connection between parallel vector fields and isometry. The problem of finding an isometry turns out to be equivalent to finding orthonormal parallel vector fields on the data manifold. Therefore, we first find orthonormal parallel vector fields by solving a variational problem on the manifold. Then each embedding function can be obtained by requiring its gradient field to be as close to the corresponding parallel vector field as possible. Theoretical results show that our method can precisely recover the manifold if it is isometric to a connected open subset of Euclidean space. Both synthetic and real data examples demonstrate the effectiveness of our method even if there is heavy noise and high curvature.

AAAI Conference 2012 Conference Paper

Document Summarization Based on Data Reconstruction

  • Zhanying He
  • Chun Chen
  • Jiajun Bu
  • Can Wang
  • Lijun Zhang
  • Deng Cai
  • Xiaofei He

Document summarization is of great value to many real-world applications, such as snippet generation for search results and news headline generation. Traditionally, document summarization is implemented by extracting sentences that cover the main topics of a document with minimum redundancy. In this paper, we take a different perspective from data reconstruction and propose a novel framework named Document Summarization based on Data Reconstruction (DSDR). Specifically, our approach generates a summary which consists of those sentences that can best reconstruct the original document. To model the relationship among sentences, we introduce two objective functions: (1) linear reconstruction, which approximates the document by linear combinations of the selected sentences; (2) nonnegative linear reconstruction, which allows only additive, not subtractive, linear combinations. In this framework, the reconstruction error becomes a natural criterion for measuring the quality of the summary. For each objective function, we develop an efficient algorithm to solve the corresponding optimization problem. Extensive experiments on summarization benchmark data sets DUC 2006 and DUC 2007 demonstrate the effectiveness of our proposed approach.

AAAI Conference 2012 Conference Paper

Efficient Online Learning for Large-Scale Sparse Kernel Logistic Regression

  • Lijun Zhang
  • Rong Jin
  • Chun Chen
  • Jiajun Bu
  • Xiaofei He

In this paper, we study the problem of large-scale Kernel Logistic Regression (KLR). A straightforward approach is to apply stochastic approximation to KLR. We refer to this approach as non-conservative online learning algorithm because it updates the kernel classifier after every received training example, leading to a dense classifier. To improve the sparsity of the KLR classifier, we propose two conservative online learning algorithms that update the classifier in a stochastic manner and generate sparse solutions. With appropriately designed updating strategies, our analysis shows that the two conservative algorithms enjoy similar theoretical guarantee as that of the non-conservative algorithm. Empirical studies on several benchmark data sets demonstrate that compared to batch-mode algorithms for KLR, the proposed conservative online learning algorithms are able to produce sparse KLR classifiers, and achieve similar classification accuracy but with significantly shorter training time. Furthermore, both the sparsity and classification accuracy of our methods are comparable to those of the online kernel SVM.

NeurIPS Conference 2012 Conference Paper

Multi-task Vector Field Learning

  • Binbin Lin
  • Sen Yang
  • Chiyuan Zhang
  • Jieping Ye
  • Xiaofei He

Multi-task learning (MTL) aims to improve generalization performance by learning multiple related tasks simultaneously and identifying the shared information among tasks. Most existing MTL methods focus on learning linear models under the supervised setting. We propose a novel semi-supervised and nonlinear approach for MTL using vector fields. A vector field is a smooth mapping from the manifold to the tangent spaces which can be viewed as a directional derivative of functions on the manifold. We argue that vector fields provide a natural way to exploit the geometric structure of data as well as the shared differential structure of tasks, both of which are crucial for semi-supervised multi-task learning. In this paper, we develop multi-task vector field learning (MTVFL), which learns the prediction functions and the vector fields simultaneously. MTVFL has the following key properties: (1) the vector fields we learn are close to the gradient fields of the prediction functions; (2) within each task, the vector field is required to be as parallel as possible and is expected to span a low dimensional subspace; (3) the vector fields from all tasks share a low dimensional subspace. We formalize our idea in a regularization framework and also provide a convex relaxation method to solve the original non-convex problem. The experimental results on synthetic and real data demonstrate the effectiveness of our proposed approach.

AAAI Conference 2012 Conference Paper

Random Projection with Filtering for Nearly Duplicate Search

  • Yue Lin
  • Rong Jin
  • Deng Cai
  • Xiaofei He

High dimensional nearest neighbor search is a fundamental problem and has found applications in many domains. Although many hashing based approaches have been proposed for approximate nearest neighbor search in high dimensional space, one main drawback is that they often return many false positives that need to be filtered out by a post procedure. We propose a novel method to address this limitation in this paper. The key idea is to introduce a filtering procedure within the search algorithm, based on the compressed sensing theory, that effectively removes the false positive answers. We first obtain a sparse representation for each data point by the landmark based approach, after which we solve the nearly duplicate search problem, in which the difference between the query and its nearest neighbors forms a sparse vector living in a small ℓp ball, where p ≤ 1. Our empirical study on real-world datasets demonstrates the effectiveness of the proposed approach compared to the state-of-the-art hashing methods.

NeurIPS Conference 2011 Conference Paper

Semi-supervised Regression via Parallel Field Regularization

  • Binbin Lin
  • Chiyuan Zhang
  • Xiaofei He

This paper studies the problem of semi-supervised learning from the vector field perspective. Many existing works use the graph Laplacian to ensure the smoothness of the prediction function on the data manifold. However, beyond smoothness, it is suggested by recent theoretical work that we should ensure second order smoothness for achieving faster rates of convergence for semi-supervised regression problems. To achieve this goal, we show that the second order smoothness measures the linearity of the function, and the gradient field of a linear function has to be a parallel vector field. Consequently, we propose to find a function which minimizes the empirical error, and simultaneously requires its gradient field to be as parallel as possible. We give a continuous objective function on the manifold and discuss how to discretize it by using random points. The discretized optimization problem turns out to be a sparse linear system which can be solved very efficiently. The experimental results have demonstrated the effectiveness of our proposed approach.

AAAI Conference 2010 Conference Paper

Gaussian Mixture Model with Local Consistency

  • Jialu Liu
  • Deng Cai
  • Xiaofei He

Gaussian Mixture Model (GMM) is one of the most popular data clustering methods which can be viewed as a linear combination of different Gaussian components. In GMM, each cluster obeys Gaussian distribution and the task of clustering is to group observations into different components through estimating each cluster’s own parameters. The Expectation-Maximization algorithm is always involved in such estimation problem. However, many previous studies have shown naturally occurring data may reside on or close to an underlying submanifold. In this paper, we consider the case where the probability distribution is supported on a submanifold of the ambient space. We take into account the smoothness of the conditional probability distribution along the geodesics of data manifold. That is, if two observations are “close” in intrinsic geometry, their distributions over different Gaussian components are similar. Simply speaking, we introduce a novel method based on manifold structure for data clustering, called Locally Consistent Gaussian Mixture Model (LCGMM). Specifically, we construct a nearest neighbor graph and adopt Kullback-Leibler Divergence as the “distance” measurement to regularize the objective function of GMM. Experiments on several data sets demonstrate the effectiveness of such regularization.

IJCAI Conference 2009 Conference Paper

  • Deng Cai
  • Xiaofei He
  • Xuanhui Wang
  • Hujun Bao
  • Jiawei Han

Matrix factorization techniques have been frequently applied in information processing tasks. Among them, Non-negative Matrix Factorization (NMF) has received considerable attention due to its psychological and physiological interpretation of naturally occurring data whose representation may be parts-based in the human brain. On the other hand, from a geometric perspective the data is usually sampled from a low dimensional manifold embedded in a high dimensional ambient space. One hopes then to find a compact representation which uncovers the hidden topics and simultaneously respects the intrinsic geometric structure. In this paper, we propose a novel algorithm, called Locality Preserving Non-negative Matrix Factorization (LPNMF), for this purpose. For two data points, we use KL-divergence to evaluate their similarity on the hidden topics. The optimal maps are obtained such that the feature values on hidden topics are restricted to be non-negative and vary smoothly along the geodesics of the data manifold. Our empirical study shows the encouraging results of the proposed algorithm in comparison to the state-of-the-art algorithms on two large high-dimensional databases.

IJCAI Conference 2009 Conference Paper

  • Xiaofei He
  • Ming Ji
  • Hujun Bao

Recently, graph based dimensionality reduction has received a lot of interest in many fields of information processing. Central to it is a graph structure which models the geometrical and discriminant structure of the data manifold. When label information is available, it is usually incorporated into the graph structure by modifying the weights between data points. In this paper, we propose a novel dimensionality reduction algorithm, called Constrained Graph Embedding, which considers the label information as additional constraints. Specifically, we constrain the space of the solutions that we explore only to contain embedding results that are consistent with the labels. Experimental results on two real life data sets illustrate the effectiveness of our proposed method.

IJCAI Conference 2007 Conference Paper

  • Deng Cai
  • Xiaofei He
  • Kun Zhou
  • Jiawei Han
  • Hujun Bao

Linear Discriminant Analysis (LDA) is a popular data-analytic tool for studying the class relationship between data points. A major disadvantage of LDA is that it fails to discover the local geometrical structure of the data manifold. In this paper, we introduce a novel linear algorithm for discriminant analysis, called Locality Sensitive Discriminant Analysis (LSDA). When there are insufficient training samples, local structure is generally more important than global structure for discriminant analysis. By discovering the local manifold structure, LSDA finds a projection which maximizes the margin between data points from different classes at each local area. Specifically, the data points are mapped into a subspace in which the nearby points with the same label are close to each other while the nearby points with different labels are far apart. Experiments carried out on several standard face databases show a clear improvement over the results of LDA-based recognition.

NeurIPS Conference 2005 Conference Paper

Laplacian Score for Feature Selection

  • Xiaofei He
  • Deng Cai
  • Partha Niyogi

In supervised learning scenarios, feature selection has been studied widely in the literature. Selecting features in unsupervised learning scenarios is a much harder problem, due to the absence of class labels that would guide the search for relevant information. Moreover, almost all previous unsupervised feature selection methods are "wrapper" techniques that require a learning algorithm to evaluate the candidate feature subsets. In this paper, we propose a "filter" method for feature selection which is independent of any learning algorithm. Our method can be performed in either supervised or unsupervised fashion. The proposed method is based on the observation that, in many real world classification problems, data from the same class are often close to each other. The importance of a feature is evaluated by its power of locality preserving, or, Laplacian Score. We compare our method with data variance (unsupervised) and Fisher score (supervised) on two data sets. Experimental results demonstrate the effectiveness and efficiency of our algorithm.
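The abstract does not give the formula, so the following is only a sketch under the commonly cited formulation of the Laplacian Score: with graph weights S, degree matrix D and Laplacian L = D − S, a feature f is centered with respect to the degrees and scored as (f̃ᵀ L f̃) / (f̃ᵀ D f̃), where smaller values indicate better locality preservation. The hand-built four-point graph and the function name `laplacian_score` are assumptions for illustration; in practice a nearest-neighbor graph would be built from the data.

```python
def laplacian_score(feature, weights):
    """Score one feature against a symmetric graph weight matrix (lower is better)."""
    n = len(feature)
    degree = [sum(row) for row in weights]
    # Degree-weighted mean, removed so that constant features carry no signal.
    mean = sum(f * d for f, d in zip(feature, degree)) / sum(degree)
    centered = [f - mean for f in feature]
    # f~^T L f~ equals half the weighted sum of squared differences over all pairs.
    numerator = 0.5 * sum(
        weights[i][j] * (centered[i] - centered[j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    # f~^T D f~ is the degree-weighted variance of the feature.
    denominator = sum(d * c * c for d, c in zip(degree, centered))
    return numerator / denominator


# Hypothetical graph: points 0-1 are neighbors and points 2-3 are neighbors.
weights = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

# This feature is constant within each neighborhood.
smooth_score = laplacian_score([0.0, 0.0, 1.0, 1.0], weights)
# This feature changes across every edge of the graph.
rough_score = laplacian_score([0.0, 1.0, 0.0, 1.0], weights)
```

The feature that agrees with the neighborhood structure gets the lower score, which matches the abstract's description of ranking features by their locality preserving power.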

NeurIPS Conference 2005 Conference Paper

Tensor Subspace Analysis

  • Xiaofei He
  • Deng Cai
  • Partha Niyogi

Previous work has demonstrated that the image variations of many objects (human faces in particular) under variable lighting can be effectively modeled by low dimensional linear spaces. The typical linear subspace learning algorithms include Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Locality Preserving Projection (LPP). All of these methods consider an n1 × n2 image as a high dimensional vector in R^(n1×n2), while an image represented in the plane is intrinsically a matrix. In this paper, we propose a new algorithm called Tensor Subspace Analysis (TSA). TSA considers an image as the second order tensor in R^n1 ⊗ R^n2, where R^n1 and R^n2 are two vector spaces. The relationship between the column vectors of the image matrix and that between the row vectors can be naturally characterized by TSA. TSA detects the intrinsic local geometrical structure of the tensor space by learning a lower dimensional tensor subspace. We compare our proposed approach with PCA, LDA and LPP methods on two standard databases. Experimental results demonstrate that TSA achieves better recognition rate, while being much more efficient.

NeurIPS Conference 2003 Conference Paper

Locality Preserving Projections

  • Xiaofei He
  • Partha Niyogi

Many problems in information processing involve some form of dimensionality reduction. In this paper, we introduce Locality Preserving Projections (LPP). These are linear projective maps that arise by solving a variational problem that optimally preserves the neighborhood structure of the data set. LPP should be seen as an alternative to Principal Component Analysis (PCA) – a classical linear technique that projects the data along the directions of maximal variance. When the high dimensional data lies on a low dimensional manifold embedded in the ambient space, the Locality Preserving Projections are obtained by finding the optimal linear approximations to the eigenfunctions of the Laplace Beltrami operator on the manifold. As a result, LPP shares many of the data representation properties of nonlinear techniques such as Laplacian Eigenmaps or Locally Linear Embedding. Yet LPP is linear and more crucially is defined everywhere in ambient space rather than just on the training data points. This is borne out by illustrative examples on some high dimensional data sets.