Arrow Research search

Author name cluster

Hanqing Lu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

27 papers
2 author rows

Possible papers

27

TMLR Journal 2026 Journal Article

DiffKGW: Stealthy and Robust Diffusion Model Watermarking

  • Tianxin Wei
  • Ruizhong Qiu
  • Yifan Chen
  • Yunzhe Qi
  • Jiacheng Lin
  • Wenxuan Bao
  • Wenju Xu
  • Sreyashi Nag

Diffusion models are known for their supreme capability to generate realistic images. However, ethical concerns, such as copyright protection and the generation of inappropriate content, pose significant challenges for the practical deployment of diffusion models. Recent work has proposed a flurry of watermarking techniques that inject artificial patterns into the initial latent representations of diffusion models, offering a promising solution to these issues. However, enforcing a specific pattern on selected elements can disrupt the Gaussian distribution of the initial latent representation. Inspired by watermarks for large language models (LLMs), we generalize the LLM KGW watermark to image diffusion models and propose DiffKGW, a stealthy probability-adjustment approach that preserves the Gaussian distribution of the initial latent representation. In addition, we dissect the design principles of state-of-the-art watermarking techniques and introduce a unified framework. We identify a set of dimensions that explain the manipulation enforced by watermarking methods, including the distribution of individual elements, the specification of watermark shapes within each channel, and the choice of channels for watermark embedding. Through empirical studies on regular text-to-image applications and the first systematic attempt at watermarking image-to-image diffusion models, we thoroughly verify the effectiveness of our proposed framework. On all the diffusion models evaluated, including Stable Diffusion, the approach derived from the proposed framework not only preserves image quality but also outperforms existing methods in robustness against a wide range of attacks.

ICML Conference 2024 Conference Paper

Language Models as Semantic Indexers

  • Bowen Jin
  • Hansi Zeng
  • Guoyin Wang 0001
  • Xiusi Chen
  • Tianxin Wei
  • Ruirui Li 0002
  • Zhengyang Wang
  • Zheng Li 0018

Semantic identifier (ID) is an important concept in information retrieval that aims to preserve the semantics of objects such as documents and items inside their IDs. Previous studies typically adopt a two-stage pipeline to learn semantic IDs by first procuring embeddings using off-the-shelf text encoders and then deriving IDs based on the embeddings. However, each step introduces potential information loss, and there is usually an inherent mismatch between the distribution of embeddings within the latent space produced by text encoders and the anticipated distribution required for semantic indexing. It is non-trivial to design a method that can learn the document’s semantic representations and its hierarchical structure simultaneously, given that semantic IDs are discrete and sequentially structured, and the semantic supervision is deficient. In this paper, we introduce LMIndexer, a self-supervised framework to learn semantic IDs with a generative language model. We tackle the challenge of sequential discrete ID by introducing a semantic indexer capable of generating neural sequential discrete representations with progressive training and contrastive learning. In response to the semantic supervision deficiency, we propose to train the model with a self-supervised document reconstruction objective. We show the high quality of the learned IDs and demonstrate their effectiveness on three tasks including recommendation, product search, and document retrieval on five datasets from various domains. Code is available at https://github.com/PeterGriffinJin/LMIndexer.

ICLR Conference 2024 Conference Paper

Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond

  • Tianxin Wei
  • Bowen Jin
  • Ruirui Li 0002
  • Hansi Zeng
  • Zhengyang Wang
  • Jianhui Sun
  • Qingyu Yin
  • Hanqing Lu

Developing a universal model that can effectively harness heterogeneous resources and respond to a wide range of personalized needs has been a longstanding community aspiration. Our daily choices, especially in domains like fashion and retail, are substantially shaped by multi-modal data, such as pictures and textual descriptions. These modalities not only offer intuitive guidance but also cater to personalized user preferences. However, the predominant personalization approaches mainly focus on ID or text-based recommendation problems, failing to comprehend the information spanning various tasks or modalities. In this paper, our goal is to establish a Unified paradigm for Multi-modal Personalization systems (UniMP), which effectively leverages multi-modal data while eliminating the complexities associated with task- and modality-specific customization. We argue that the advancements in foundational generative modeling have provided the flexibility and effectiveness necessary to achieve this objective. In light of this, we develop a generic and extensible personalization generative framework that can handle a wide range of personalized needs including item recommendation, product search, preference prediction, explanation generation, and further user-guided image generation. Our methodology enhances the capabilities of foundational language models for personalized tasks by seamlessly ingesting interleaved cross-modal user history information, ensuring a more precise and customized experience for users. To train and evaluate the proposed multi-modal personalized tasks, we also introduce a novel and comprehensive benchmark covering a variety of user requirements. Our experiments on the real-world benchmark showcase the model's potential, outperforming competitive methods specialized for each task.

NeurIPS Conference 2023 Conference Paper

Amazon-M2: A Multilingual Multi-locale Shopping Session Dataset for Recommendation and Text Generation

  • Wei Jin
  • Haitao Mao
  • Zheng Li
  • Haoming Jiang
  • Chen Luo
  • Hongzhi Wen
  • Haoyu Han
  • Hanqing Lu

Modeling customer shopping intentions is a crucial task for e-commerce, as it directly impacts user experience and engagement. Thus, accurately understanding customer preferences is essential for providing personalized recommendations. Session-based recommendation, which utilizes customer session data to predict their next interaction, has become increasingly popular. However, existing session datasets have limitations in terms of item attributes, user diversity, and dataset scale. As a result, they cannot comprehensively capture the spectrum of user behaviors and preferences. To bridge this gap, we present the Amazon Multilingual Multi-locale Shopping Session Dataset, namely Amazon-M2. It is the first multilingual dataset consisting of millions of user sessions from six different locales, where the major languages of products are English, German, Japanese, French, Italian, and Spanish. Remarkably, the dataset can help us enhance personalization and understanding of user preferences, which can benefit various existing tasks as well as enable new tasks. To test the potential of the dataset, we introduce three tasks in this work: (1) next-product recommendation, (2) next-product recommendation with domain shifts, and (3) next-product title generation. With the above tasks, we benchmark a range of algorithms on our proposed dataset, drawing new insights for further research and practice. In addition, based on the proposed dataset and tasks, we hosted a competition in the KDD Cup 2023 (https://www.aicrowd.com/challenges/amazon-kdd-cup-23-multilingual-recommendation-challenge), which attracted thousands of users and submissions. The winning solutions and the associated workshop can be accessed at our website: https://kddcup23.github.io/.

AAAI Conference 2023 Conference Paper

Asynchronous Event Processing with Local-Shift Graph Convolutional Network

  • Linhui Sun
  • Yifan Zhang
  • Jian Cheng
  • Hanqing Lu

Event cameras are bio-inspired sensors that produce sparse and asynchronous event streams instead of frame-based images at a high rate. Recent works utilizing graph convolutional networks (GCNs), which model the event stream as a spatio-temporal graph, have achieved remarkable performance in recognition tasks. However, the computational mechanism of graph convolution introduces redundant computation when aggregating neighbor features, which limits the low-latency nature of the events. Moreover, these works perform synchronous inference and thus cannot respond quickly to asynchronous event signals. This paper proposes a local-shift graph convolutional network (LSNet), which utilizes a novel local-shift operation equipped with a local spatio-temporal attention component to achieve efficient and adaptive aggregation of neighbor features. To improve the efficiency of the pooling operation in feature extraction, we design a node-importance-based parallel pooling method (NIPooling) for sparse and low-latency event data. Based on the calculated importance of each node, NIPooling can efficiently obtain uniform sampling results in parallel, which retains the diversity of event streams. Furthermore, to achieve a fast response to asynchronous event signals, an asynchronous event processing procedure is proposed to restrict recomputation of activations to only those network nodes affected by a newly arrived event. Experimental results show that the computational cost can be reduced by nearly 9 times using the local-shift operation, and the proposed asynchronous procedure further improves inference efficiency, while achieving state-of-the-art performance on gesture recognition and object recognition.

AAAI Conference 2021 Conference Paper

Consistent-Separable Feature Representation for Semantic Segmentation

  • Xingjian He
  • Jing Liu
  • Jun Fu
  • Xinxin Zhu
  • Jinqiao Wang
  • Hanqing Lu

Cross-entropy loss combined with softmax is one of the most commonly used supervision components in most existing segmentation methods. The softmax loss is typically good at optimizing the inter-class difference, but not at reducing the intra-class variation, which can be suboptimal for the semantic segmentation task. In this paper, we propose a Consistent-Separable Feature Representation Network to model Consistent-Separable (C-S) features, which are intra-class consistent and inter-class separable, improving the discriminative power of the deep features. Specifically, we develop a Consistent-Separable Feature Learning Module to obtain C-S features through a new loss, called the Class-Aware Consistency loss. This loss function is proposed to force the deep features to be consistent within the same class and apart between different classes. Moreover, we design an Adaptive Feature Aggregation Module to fuse the C-S features and original features from the backbone for better semantic prediction. We show that compared with various baselines, the proposed method brings consistent performance improvement. Our proposed approach achieves state-of-the-art performance on Cityscapes (82.6% mIoU on the test set), ADE20K (46.65% mIoU on the validation set), COCO Stuff (41.3% mIoU on the validation set) and PASCAL Context (55.9% mIoU on the test set).

IJCAI Conference 2020 Conference Paper

Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning

  • Longteng Guo
  • Jing Liu
  • Xinxin Zhu
  • Xingjian He
  • Jie Jiang
  • Hanqing Lu

Most image captioning models are autoregressive, i.e., they generate each word by conditioning on previously generated words, which leads to heavy latency during inference. Recently, non-autoregressive decoding has been proposed in machine translation to speed up inference by generating all words in parallel. Typically, these models use the word-level cross-entropy loss to optimize each word independently. However, such a learning process fails to consider sentence-level consistency, resulting in inferior generation quality from these non-autoregressive models. In this paper, we propose a Non-Autoregressive Image Captioning (NAIC) model with a novel training paradigm: Counterfactuals-critical Multi-Agent Learning (CMAL). CMAL formulates NAIC as a multi-agent reinforcement learning system where positions in the target sequence are viewed as agents that learn to cooperatively maximize a sentence-level reward. Besides, we propose to utilize massive unlabeled images to boost captioning performance. Extensive experiments on the MSCOCO image captioning benchmark show that our NAIC model achieves performance comparable to state-of-the-art autoregressive models, while bringing a 13.9x decoding speedup.

AAAI Conference 2020 Conference Paper

Progressive Bi-C3D Pose Grammar for Human Pose Estimation

  • Lu Zhou
  • Yingying Chen
  • Jinqiao Wang
  • Hanqing Lu

In this paper, we propose a progressive pose grammar network learned with Bi-C3D (Bidirectional Convolutional 3D) for human pose estimation. Exploiting the dependencies among human body parts proves effective in solving problems such as complex articulation and occlusion. Therefore, we propose two articulated grammars learned with Bi-C3D to build the relationships of the human joints and exploit the contextual information of human body structure. First, a local multi-scale Bi-C3D kinematics grammar is proposed to promote the message passing process among the locally related joints. The multi-scale kinematics grammar excavates different levels of human context learned by the network. Moreover, a global sequential grammar is put forward to capture the long-range dependencies among the human body joints. The whole procedure can be regarded as a local-global progressive refinement process. Without bells and whistles, our method achieves competitive performance on both the MPII and LSP benchmarks compared with previous methods, which confirms the feasibility and effectiveness of C3D in information interactions.

IJCAI Conference 2019 Conference Paper

Densely Connected Attention Flow for Visual Question Answering

  • Fei Liu
  • Jing Liu
  • Zhiwei Fang
  • Richang Hong
  • Hanqing Lu

Learning effective interactions between multi-modal features is at the heart of visual question answering (VQA). A common defect of existing VQA approaches is that they consider only a very limited amount of interactions, which may not be enough to model the complex latent image-question relations necessary for accurately answering questions. Therefore, in this paper, we propose a novel DCAF (Densely Connected Attention Flow) framework for modeling dense interactions. It densely connects all pairwise layers of the network via Attention Connectors, capturing fine-grained interplay between image and question across all hierarchical levels. The proposed Attention Connector efficiently connects the multi-modal features at any two layers with symmetric co-attention, and produces interaction-aware attention features. Experimental results on three publicly available datasets show that the proposed method achieves state-of-the-art performance.

IJCAI Conference 2019 Conference Paper

Reading selectively via Binary Input Gated Recurrent Unit

  • Zhe Li
  • Peisong Wang
  • Hanqing Lu
  • Jian Cheng

Recurrent Neural Networks (RNNs) have shown great promise in sequence modeling tasks. The Gated Recurrent Unit (GRU) is one of the most widely used recurrent structures, striking a good trade-off between performance and computational cost. However, its practical implementation based on soft gates only partially achieves the goal of controlling information flow, and we can hardly explain what the network has learnt internally. Inspired by human reading, we introduce the binary input gated recurrent unit (BIGRU), a GRU-based model using a binary input gate instead of the reset gate in GRU. By doing so, our model can read selectively during inference. In our experiments, we show that BIGRU mainly ignores the conjunctions, adverbs and articles that do not make a big difference to document understanding, which helps us further understand how the network works. In addition, due to reduced interference from redundant information, our model achieves better performance than the baseline GRU in all the testing tasks.

AAAI Conference 2018 Conference Paper

Learning Coarse-to-Fine Structured Feature Embedding for Vehicle Re-Identification

  • Haiyun Guo
  • Chaoyang Zhao
  • Zhiwei Liu
  • Jinqiao Wang
  • Hanqing Lu

Vehicle re-identification (re-ID) is to identify the same vehicle across different cameras. It is a significant but challenging topic, which has received little attention due to the complex intra-class and inter-class variation of vehicle images and the lack of large-scale vehicle re-ID datasets. Previous methods focus on pulling images from different vehicles apart but neglect the discrimination between vehicles from different vehicle models, which is actually quite important for obtaining a correct ranking order in vehicle re-ID. In this paper, we learn a structured feature embedding for vehicle re-ID with a novel coarse-to-fine ranking loss to pull images of the same vehicle as close as possible and achieve discrimination between images from different vehicles as well as vehicles from different vehicle models. In the learnt feature space, both intra-class compactness and inter-class distinction are well guaranteed, and the Euclidean distance between features directly reflects the semantic similarity of vehicle images. Furthermore, we build so far the largest vehicle re-ID dataset, “Vehicle-1M”, which involves nearly 1 million images captured in various surveillance scenarios. Experimental results on “Vehicle-1M” and “VehicleID” demonstrate the superiority of our proposed approach.

AAAI Conference 2017 Conference Paper

Community-Based Question Answering via Asymmetric Multi-Faceted Ranking Network Learning

  • Zhou Zhao
  • Hanqing Lu
  • Vincent Zheng
  • Deng Cai
  • Xiaofei He
  • Yueting Zhuang

Community-based question answering (CQA) sites have become popular Internet-based web services, accumulating millions of questions and their posted answers over time. Question answering thus becomes an essential problem on CQA sites: ranking the high-quality answers to a given question. Currently, most existing works study the question answering problem with deep semantic matching models that rank answers by their semantic relevance, while ignoring the authority of answerers on the given question. In this paper, we consider the problem of community-based question answering from the viewpoint of asymmetric multi-faceted ranking network embedding. We propose a novel asymmetric multi-faceted ranking network learning framework for community-based question answering that jointly exploits the deep semantic relevance between question-answer pairs and the answerers’ authority on the given question. We then develop an asymmetric ranking network learning method with deep recurrent neural networks by integrating both the answers’ relative quality rank for the given question and the answerers’ following relations in CQA sites. Extensive experiments on a large-scale dataset from a real-world CQA site show that our method achieves better performance than other state-of-the-art solutions to the problem.

AAAI Conference 2017 Short Paper

Community-Based Question Answering via Contextual Ranking Metric Network Learning

  • Hanqing Lu
  • Ming Kong

The exponential growth of information on Community-based Question Answering (CQA) sites has raised challenges for the accurate matching of high-quality answers to given questions. Many existing approaches learn the matching model mainly based on the semantic similarity between questions and answers, which cannot effectively handle the ambiguity of questions and the sparsity of CQA data. In this paper, we propose to solve these two problems by exploiting users’ social contexts. Specifically, we propose a novel framework for the CQA task that exploits both the question-answer content on the CQA site and users’ social contexts. Experiments on a real-world dataset show the effectiveness of our method.

NeurIPS Conference 2017 Conference Paper

Decoding with Value Networks for Neural Machine Translation

  • Di He
  • Hanqing Lu
  • Yingce Xia
  • Tao Qin
  • Liwei Wang
  • Tie-Yan Liu

Neural Machine Translation (NMT) has become a popular technology in recent years, and beam search is its de facto decoding method due to the shrunk search space and reduced computational complexity. However, since it only searches for local optima at each time step through one-step forward looking, it usually cannot output the best target sentence. Inspired by the success and methodology of AlphaGo, in this paper we propose using a prediction network to improve beam search, which takes the source sentence $x$, the currently available decoding output $y_1, \cdots, y_{t-1}$ and a candidate word $w$ at step $t$ as inputs and predicts the long-term value (e.g., BLEU score) of the partial target sentence if it is completed by the NMT model. Following the practice in reinforcement learning, we call this prediction network the “value network”. Specifically, we propose a recurrent structure for the value network, and train its parameters from bilingual data. At test time, when choosing a word $w$ for decoding, we consider both its conditional probability given by the NMT model and its long-term value predicted by the value network. Experiments show that such an approach can significantly improve translation accuracy on several translation tasks.
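The decoding rule this abstract describes, weighing the NMT model's conditional probability against the value network's long-term prediction when choosing a word, can be sketched as follows. The linear interpolation weight `alpha`, the candidate-triple format, and taking the log of the predicted value are illustrative assumptions, not details from the paper:

```python
import math

def choose_word(candidates, alpha=0.5):
    """Pick the word maximizing a blend of the local log-probability
    (from the NMT model) and a predicted long-term value (e.g., an
    estimate of the final BLEU score, from the value network).

    candidates: iterable of (word, log_prob, predicted_value) triples.
    alpha: hypothetical interpolation weight between the two signals.
    """
    def score(candidate):
        word, log_prob, value = candidate
        # Blend the one-step model signal with the sentence-level
        # value prediction, both in log space.
        return alpha * log_prob + (1.0 - alpha) * math.log(value)
    return max(candidates, key=score)

# "good" is locally more likely but predicted to lead to a poor
# completion; "fine" is less likely now but scores higher long-term.
best = choose_word([("good", -0.1, 0.2), ("fine", -0.5, 0.9)])
```

With `alpha = 1.0` the rule reduces to ordinary beam scoring by model probability alone; smaller values let the value network override locally greedy choices.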

IJCAI Conference 2017 Conference Paper

Microblog Sentiment Classification via Recurrent Random Walk Network Learning

  • Zhou Zhao
  • Hanqing Lu
  • Deng Cai
  • Xiaofei He
  • Yueting Zhuang

Microblog Sentiment Classification (MSC) is a challenging task in microblog mining, arising in many applications such as stock price prediction and crisis management. Currently, most existing approaches learn the user sentiment model from users’ posted tweets in microblogs, which suffer from the insufficiency of discriminative tweet representation. In this paper, we consider the problem of microblog sentiment classification from the viewpoint of heterogeneous MSC network embedding. We propose a novel recurrent random walk network learning framework for the problem by exploiting both users’ posted tweets and their social relations in microblogs. We then introduce deep recurrent neural networks with a random-walk layer for heterogeneous MSC network embedding, which can be trained end-to-end from scratch. We employ the back-propagation method for training the proposed recurrent random walk network model. Extensive experiments on large-scale public datasets from Twitter show that our method achieves better performance than other state-of-the-art solutions to the problem.

IJCAI Conference 2016 Conference Paper

Action Recognition with Joints-Pooled 3D Deep Convolutional Descriptors

  • Congqi Cao
  • Yifan Zhang
  • Chunjie Zhang
  • Hanqing Lu

Torso joints can be considered the landmarks of the human body. An action consists of a series of body poses determined by the positions of the joints. With the rapid development of RGB-D camera techniques and pose estimation research, the acquisition of body joints has become much easier than before. Thus, we propose to incorporate joint positions with currently popular deep-learned features for action recognition. In this paper, we present a simple, yet effective method to aggregate convolutional activations of a 3D deep convolutional neural network (3D CNN) into discriminative descriptors based on joint positions. Two pooling schemes for mapping body joints into convolutional feature maps are discussed. The joints-pooled 3D deep convolutional descriptors (JDDs) are more effective and robust than the original 3D CNN features and other competing features. We evaluate the proposed descriptors on recognizing both short actions and complex activities. Experimental results on real-world datasets show that our method generates promising results, significantly outperforming state-of-the-art results.

AAAI Conference 2016 Conference Paper

MC-HOG Correlation Tracking with Saliency Proposal

  • Guibo Zhu
  • Jinqiao Wang
  • Yi Wu
  • Xiaoyu Zhang
  • Hanqing Lu

Designing effective features and handling the model drift problem are two important aspects of online visual tracking. For feature representation, gradient and color features are the most widely used, but how to effectively combine them for visual tracking is still an open problem. In this paper, we propose a rich feature descriptor, MC-HOG, by leveraging rich gradient information across multiple color channels or spaces. MC-HOG features are then embedded into the correlation tracking framework to estimate the state of the target. For handling the model drift problem caused by occlusion or distracters, we propose saliency proposals as prior information to provide candidates and reduce background interference. In addition to saliency proposals, a ranking strategy is proposed to determine the importance of these proposals by exploiting the learnt appearance filter, historically preserved object samples and the distracting proposals. In this way, the proposed approach can effectively explore the color-gradient characteristics and alleviate the model drift problem. Extensive evaluations performed on the benchmark dataset show the superiority of the proposed method.

TIST Journal 2016 Journal Article

Multimedia News Summarization in Search

  • Zechao Li
  • Jinhui Tang
  • Xueming Wang
  • Jing Liu
  • Hanqing Lu

It is a necessary but challenging task to relieve users of the proliferating news information and allow them to quickly and comprehensively grasp the whats and hows happening in the world every day. In this article, we develop a novel approach to multimedia news summarization for search results on the Internet, which uncovers the underlying topics among query-related news information and threads the news events within each topic to generate a query-related brief overview. First, the hierarchical latent Dirichlet allocation (hLDA) model is introduced to discover the hierarchical topic structure from query-related news documents, and a new approach based on weighted aggregation and max pooling is proposed to identify one representative news article for each topic. One representative image is also selected to visualize each topic as a complement to the text information. Given the representative documents selected for each topic, a time-bias maximum spanning tree (MST) algorithm is proposed to thread them into a coherent and compact summary of their parent topic. Finally, we design a friendly interface to present users with the hierarchical summarization of their required news information. Extensive experiments conducted on a large-scale news dataset collected from multiple news Web sites demonstrate the encouraging performance of the proposed solution for news summarization in news retrieval.

IJCAI Conference 2015 Conference Paper

Face Clustering in Videos with Proportion Prior

  • Zhiqiang Tang
  • Yifan Zhang
  • Zechao Li
  • Hanqing Lu

In this paper, we investigate the problem of face clustering in real-world videos. In many cases, the distribution of the face data is unbalanced: in movies or TV series, the leading cast appears quite often and the others appear much less. However, many clustering algorithms cannot handle such severe imbalance in the data distribution well, with the result that the large class is split apart while the small class is merged into the large ones and thus lost. On the other hand, the data distribution proportions may be known beforehand. For example, we can obtain such information by counting the spoken lines of the characters in the script text. Hence, we propose to make use of this proportion prior to regularize the clustering. A Hidden Conditional Random Field (HCRF) model is presented to incorporate the proportion prior. In experiments on a public dataset from real-world videos, we observe improvements in clustering performance over state-of-the-art methods.

ICML Conference 2015 Conference Paper

Hashing for Distributed Data

  • Cong Leng
  • Jiaxiang Wu 0001
  • Jian Cheng 0001
  • Xi Zhang 0018
  • Hanqing Lu

Recently, hashing-based approximate nearest neighbor search has attracted much attention. Numerous centralized hashing algorithms have been proposed and achieved promising performance. However, due to the large scale of many applications, data is often stored or even collected in a distributed manner. Learning hash functions by aggregating all the data into a fusion center is infeasible because of the prohibitively expensive communication and computation overhead. In this paper, we develop a novel hashing model to learn hash functions in a distributed setting. We cast a centralized hashing model as a set of subproblems with consensus constraints. We find these subproblems can be analytically solved in parallel on the distributed compute nodes. Since no training data is transmitted across the nodes in the learning process, the communication cost of our model is independent of the data size. Extensive experiments on several large-scale datasets containing up to 100 million samples demonstrate the efficacy of our method.

IJCAI Conference 2015 Conference Paper

Weakly Supervised RBM for Semantic Segmentation

  • Yong Li
  • Jing Liu
  • Yuhang Wang
  • Hanqing Lu
  • Songde Ma

In this paper, we propose a weakly supervised Restricted Boltzmann Machine (WRBM) approach to deal with the task of semantic segmentation with only image-level labels available. In WRBM, the hidden nodes are divided into multiple blocks, and each block corresponds to a specific label. Accordingly, semantic segmentation can be directly modeled by learning the mapping from the visible layer to the hidden layer of WRBM. Specifically, based on the standard RBM, we introduce two additional terms to make full use of image-level labels and alleviate the effect of noisy labels. First, we expect the hidden response of each superpixel to be suppressed on the labels outside its parent image-level label set, and a non-image-level label suppression term is formulated to implicitly import the image-level labels as weak supervision. Second, semantic graph propagation is employed to exploit the co-occurrence between visually similar regions and labels. Besides, we deal with the problems of label imbalance and diverse backgrounds by adapting the block size to the label frequency and appending hidden response blocks corresponding to backgrounds, respectively. Extensive experiments on two real-world datasets demonstrate the good performance of our approach compared with several state-of-the-art methods.

AAAI Conference 2014 Conference Paper

Learning Low-Rank Representations with Classwise Block-Diagonal Structure for Robust Face Recognition

  • Yong Li
  • Jing Liu
  • Zechao Li
  • Yangmuzi Zhang
  • Hanqing Lu
  • Songde Ma

Face recognition has been widely studied due to its importance in various applications. However, the case in which both training images and testing images are corrupted is not well addressed. Motivated by the success of low-rank matrix recovery, we propose a novel semi-supervised low-rank matrix recovery algorithm for robust face recognition. The proposed method can learn robust discriminative representations for both training images and testing images simultaneously by exploiting the classwise block-diagonal structure. Specifically, low-rank matrix approximation can handle possible contamination of the data. Moreover, the classwise block-diagonal structure is exploited to promote the discrimination of representations for robust recognition. The above issues are formulated into a unified objective function, and we design an efficient optimization procedure based on the augmented Lagrange multiplier method to solve it. Extensive experiments on three public databases are performed to validate the effectiveness of our approach. The strong identification capability of representations with block-diagonal structure is verified.

AAAI Conference 2014 Conference Paper

Recommendation by Mining Multiple User Behaviors with Group Sparsity

  • Ting Yuan
  • Jian Cheng
  • Xi Zhang
  • Shuang Qiu
  • Hanqing Lu

Recently, some recommendation methods try to improve prediction results by integrating information from a user's multiple types of behaviors; how to model the dependence and independence between different behaviors is critical for them. In this paper, we propose a novel recommendation model, Group-Sparse Matrix Factorization (GSMF), which factorizes the rating matrices of multiple behaviors into the user and item latent factor space with group sparsity regularization. It can (1) select different subsets of latent factors for different behaviors, reflecting that users' decisions on different behaviors are determined by different sets of factors; (2) model the dependence and independence between behaviors by automatically learning the shared and private factors for multiple behaviors; and (3) allow the shared factors between different behaviors to differ, instead of forcing all behaviors to share the same set of factors. Experiments on a real-world dataset demonstrate that our model integrates users' multiple types of behaviors into recommendation better than other state-of-the-art methods.
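Group sparsity regularization selects or discards whole blocks of latent factors at once via the group-lasso proximal operator; a minimal sketch (hypothetical grouping, not GSMF's actual solver) is:

```python
import numpy as np

def group_soft_threshold(v, groups, lam):
    """Proximal operator of lam * sum_g ||v[g]||_2 (group lasso):
    each factor group is shrunk toward zero, and groups whose norm
    falls below lam are zeroed out entirely (the group is deselected)."""
    out = v.copy()
    for g in groups:
        norm = np.linalg.norm(v[g])
        out[g] = 0.0 if norm <= lam else (1.0 - lam / norm) * v[g]
    return out

# Latent factors split into two groups; the weak second group is
# dropped as a whole, so this behavior ignores those factors.
v = np.array([3.0, 4.0, 0.1, 0.1])
out = group_soft_threshold(v, [slice(0, 2), slice(2, 4)], lam=0.5)
```

Applied per behavior, this is what lets each behavior keep its own subset of factors while zeroed groups stay private to other behaviors.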

TIST Journal 2014 Journal Article

Snap & Play

  • Si Liu
  • Qiang Chen
  • Shuicheng Yan
  • Changsheng Xu
  • Hanqing Lu

In this article, by taking a popular game, the Find-the-Difference (FiDi) game, as a concrete example, we explore how state-of-the-art image processing techniques can assist in developing a personalized, automatic, and dynamic game. Unlike the traditional FiDi game, where image pairs (source image and target image) with five different patches are manually produced by professional game developers, the proposed Personalized FiDi (P-FiDi) electronic game can be played in a fully automatic Snap & Play mode. Snap means that players first take photos with their digital cameras. The newly captured photos are used as source images and fed into the P-FiDi system to autogenerate the counterpart target images for users to play. Four steps are adopted to autogenerate target images: enhancing the visual quality of source images, extracting changeable patches from the source image, selecting the most suitable combination of changeable patches and difference styles for the image, and generating the differences on the target image with state-of-the-art image processing techniques. In addition, the P-FiDi game can be easily adapted for in-game advertising. Extensive experiments show that the P-FiDi electronic game is satisfying in terms of player experience, seamless advertisement, and technical feasibility.

AAAI Conference 2012 Conference Paper

Unsupervised Feature Selection Using Nonnegative Spectral Analysis

  • Zechao Li
  • Yi Yang
  • Jing Liu
  • Xiaofang Zhou
  • Hanqing Lu

In this paper, a new unsupervised learning algorithm, Nonnegative Discriminative Feature Selection (NDFS), is proposed. To exploit discriminative information in unsupervised scenarios, we perform spectral clustering to learn the cluster labels of the input samples, during which feature selection is performed simultaneously. The joint learning of the cluster labels and the feature selection matrix enables NDFS to select the most discriminative features. To learn more accurate cluster labels, a nonnegative constraint is explicitly imposed on the class indicators. To reduce redundant or even noisy features, an ℓ2,1-norm minimization constraint is added to the objective function, which guarantees that the feature selection matrix is sparse in rows. Our algorithm thus exploits discriminative information and feature correlation simultaneously to select a better feature subset. A simple yet efficient iterative algorithm is designed to optimize the proposed objective function. Experimental results on different real-world datasets demonstrate the encouraging performance of our algorithm over state-of-the-art methods.
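Once a feature selection matrix W is learned, the ℓ2,1 row-sparsity yields a per-feature score: features with (near-)zero rows are discarded. A sketch of this scoring step (names assumed; the joint spectral-clustering optimization itself is not shown):

```python
import numpy as np

def l21_norm(W):
    """l2,1 norm: the sum of the l2 norms of the rows of W."""
    return np.sum(np.linalg.norm(W, axis=1))

def select_features(W, k):
    """Rank features by the l2 norm of their row of W and keep the top k;
    a row-sparse W makes unimportant features score (near) zero."""
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(scores)[::-1][:k]

# Toy feature selection matrix with only rows 0 and 3 active.
W = np.zeros((5, 3))
W[0] = 1.0
W[3] = 2.0
picked = select_features(W, 2)
```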

AAAI Conference 2011 Conference Paper

Size Adaptive Selection of Most Informative Features

  • Si Liu
  • Hairong Liu
  • Longin Jan Latecki
  • Shuicheng Yan
  • Changsheng Xu
  • Hanqing Lu

In this paper, we propose a novel method to select the most informative subset of features, one with little redundancy and very strong discriminating power. Our approach automatically determines the optimal number of features and selects the best subset accordingly by maximizing the average pairwise informativeness, and thus has a clear advantage over traditional filter methods. By relaxing the underlying combinatorial optimization problem into a standard quadratic programming problem, the most informative feature subset can be obtained efficiently, and a strategy to dynamically compute the redundancy between feature pairs further accelerates our method by avoiding unnecessary computations of mutual information. As shown by extensive experiments, the proposed method successfully selects the most informative subset of features, and the obtained classification results significantly outperform state-of-the-art results on most test datasets.
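A quadratic program over the simplex of this kind can be attacked with simple multiplicative (replicator) updates, and the support of the solution then gives both the subset and its size. A sketch under an assumed symmetric, nonnegative pairwise-informativeness matrix A (illustrative, not the paper's exact solver):

```python
import numpy as np

def replicator_qp(A, iters=500):
    """Maximize x^T A x over the probability simplex with replicator
    dynamics (A symmetric, nonnegative). The support of the solution
    marks the selected features, so the subset size emerges automatically."""
    x = np.full(A.shape[0], 1.0 / A.shape[0])
    for _ in range(iters):
        Ax = A @ x
        x = x * Ax / (x @ Ax)  # multiplicative update; x stays on the simplex
    return x

# Features 0 and 1 are highly informative about each other; 2 and 3 are not.
A = np.array([[0.0, 0.9, 0.1, 0.1],
              [0.9, 0.0, 0.1, 0.1],
              [0.1, 0.1, 0.0, 0.1],
              [0.1, 0.1, 0.1, 0.0]])
x = replicator_qp(A)
selected = np.where(x > 1e-3)[0]  # support of x = selected subset
```

Mass drains off the weak features, so the subset {0, 1} is recovered without fixing its size in advance.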

IROS Conference 2008 Conference Paper

IQ evaluation based adaptive wavelet denoising and enhancement for a VTRAN system

  • Haoting Liu
  • Hanqing Lu

An Image Quality (IQ) evaluation based wavelet domain denoising and enhancement model for a Vehicle Target Recognition and Assistant Navigation (VTRAN) system is introduced. In order to adapt to a complex atmosphere environment, our model utilizes the IQ parameters as a criterion to search the proper estimation values of the wavelet domain denoising and enhancement model. Firstly, before the wavelet processing, we evaluate the IQ and estimate the expected image gray values from some sub-regions of the original image. Then our algorithm will add some ldquocleanrdquo and ldquoclearrdquo points with those expected gray values in the original image. After that our model will calculate the wavelet based denoising and enhancement model and evaluate the result by the IQ with those added reference points. Finally, if the IQ is not good enough, a feedback will be given and the model parameters will be modified until it gets a better result. Experiment results show our algorithm works well in many outdoor tests and it can be used in a complex atmosphere environment.