Arrow Research search

Author name cluster

Luo Si

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

32 papers
2 author rows

Possible papers

32

NeurIPS Conference 2023 Conference Paper

From Cloze to Comprehension: Retrofitting Pre-trained Masked Language Models to Pre-trained Machine Reader

  • Weiwen Xu
  • Xin Li
  • Wenxuan Zhang
  • Meng Zhou
  • Wai Lam
  • Luo Si
  • Lidong Bing

We present Pre-trained Machine Reader (PMR), a novel method for retrofitting pre-trained masked language models (MLMs) to pre-trained machine reading comprehension (MRC) models without acquiring labeled data. PMR can resolve the discrepancy between model pre-training and downstream fine-tuning of existing MLMs. To build the proposed PMR, we constructed a large volume of general-purpose and high-quality MRC-style training data by using Wikipedia hyperlinks and designed a Wiki Anchor Extraction task to guide the MRC-style pre-training. Apart from its simplicity, PMR effectively solves extraction tasks, such as Extractive Question Answering and Named Entity Recognition. PMR shows tremendous improvements over existing approaches, especially in low-resource scenarios. When applied to the sequence classification task in the MRC formulation, PMR enables the extraction of high-quality rationales to explain the classification process, thereby providing greater prediction explainability. PMR also has the potential to serve as a unified model for tackling various extraction and classification tasks in the MRC formulation.
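The MRC formulation this abstract relies on can be made concrete with a small data-preparation sketch: each entity type becomes a natural-language query over the same context, so NER reduces to span extraction. The function, example sentence, and queries below are illustrative assumptions, not PMR's actual code.

```python
def ner_to_mrc(tokens, spans, label_queries):
    """Turn one NER example into MRC-style (query, context, answers) records.

    tokens: context tokens; spans: (start, end, label) with end exclusive;
    label_queries: entity label -> natural-language query.
    """
    records = []
    for label, query in label_queries.items():
        # keep only the spans whose label matches this query's entity type
        answers = [(s, e) for (s, e, l) in spans if l == label]
        records.append({"query": query, "context": tokens, "answers": answers})
    return records

# hypothetical example: two entity types over one sentence
example = ner_to_mrc(
    ["Alibaba", "is", "based", "in", "Hangzhou"],
    [(0, 1, "ORG"), (4, 5, "LOC")],
    {"ORG": "Which organizations are mentioned?",
     "LOC": "Which locations are mentioned?"},
)
# example[0]["answers"] == [(0, 1)]; example[1]["answers"] == [(4, 5)]
```

Once every extraction task is phrased this way, a single span-extraction head can serve NER, extractive QA, and (with yes/no-style queries) classification, which is the unification the abstract points at.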

AAAI Conference 2023 Conference Paper

Graphix-T5: Mixing Pre-trained Transformers with Graph-Aware Layers for Text-to-SQL Parsing

  • Jinyang Li
  • Binyuan Hui
  • Reynold Cheng
  • Bowen Qin
  • Chenhao Ma
  • Nan Huo
  • Fei Huang
  • Wenyu Du

The task of text-to-SQL parsing, which aims at converting natural language questions into executable SQL queries, has garnered increasing attention in recent years. One of the major challenges in text-to-SQL parsing is domain generalization, i.e., how to generalize well to unseen databases. Recently, the pre-trained text-to-text transformer model, namely T5, though not specialized for text-to-SQL parsing, has achieved state-of-the-art performance on standard benchmarks targeting domain generalization. In this work, we explore ways to further augment the pre-trained T5 model with specialized components for text-to-SQL parsing. Such components are expected to introduce structural inductive bias into text-to-SQL parsers, thus improving the model's capacity for (potentially multi-hop) reasoning, which is critical for generating structure-rich SQL. To this end, we propose a new architecture, GRAPHIX-T5, a mixed model in which the standard pre-trained transformer is augmented with specially-designed graph-aware layers. Extensive experiments and analysis demonstrate the effectiveness of GRAPHIX-T5 across four text-to-SQL benchmarks: SPIDER, SYN, REALISTIC and DK. GRAPHIX-T5 surpasses all other T5-based parsers by a significant margin, achieving new state-of-the-art performance. Notably, GRAPHIX-T5-large outperforms the original T5-large by 5.7% on exact match (EM) accuracy and 6.6% on execution accuracy (EX), and even outperforms T5-3B by 1.2% on EM and 1.5% on EX.

NeurIPS Conference 2022 Conference Paper

Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning

  • Xiang Chen
  • Lei Li
  • Ningyu Zhang
  • Xiaozhuan Liang
  • Shumin Deng
  • Chuanqi Tan
  • Fei Huang
  • Luo Si

Prompt learning approaches have made waves in natural language processing by inducing better few-shot performance, but they still follow a parametric-based learning paradigm in which forgetting and rote memorization can cause unstable generalization. Specifically, vanilla prompt learning may struggle to utilize atypical instances by rote during fully-supervised training or overfit shallow patterns with low-shot data. To alleviate such limitations, we develop RetroPrompt with the motivation of decoupling knowledge from memorization to help the model strike a balance between generalization and memorization. In contrast with vanilla prompt learning, RetroPrompt constructs an open-book knowledge-store from training instances and implements a retrieval mechanism during input, training and inference, thus equipping the model with the ability to retrieve related contexts from the training corpus as cues for enhancement. Extensive experiments demonstrate that RetroPrompt can obtain better performance in both few-shot and zero-shot settings. Besides, we further illustrate that our proposed RetroPrompt can yield better generalization abilities on new datasets. Detailed analysis of memorization indeed reveals that RetroPrompt can reduce the reliance of language models on memorization, thus improving generalization for downstream tasks. Code is available at https://github.com/zjunlp/PromptKG/tree/main/research/RetroPrompt.
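The retrieval mechanism described above can be sketched as a nearest-neighbour lookup over an embedded knowledge-store; the toy embeddings and helper names here are illustrative assumptions, not the released RetroPrompt implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors (assumed non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve_cues(query_vec, store, k=2):
    """store: list of (embedding, text) pairs acting as the open-book knowledge-store."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

# invented 2-d embeddings standing in for encoder outputs
store = [([1.0, 0.0], "great battery life"),
         ([0.9, 0.1], "battery lasts long"),
         ([0.0, 1.0], "screen is dim")]
cues = retrieve_cues([1.0, 0.05], store)  # the two battery-related cues rank first
```

The retrieved texts would then be concatenated with the input as extra context, which is the "cues for enhancement" role the abstract describes.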

AAAI Conference 2022 Conference Paper

GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-supervised Learning and Explicit Policy Injection

  • Wanwei He
  • Yinpei Dai
  • Yinhe Zheng
  • Yuchuan Wu
  • Zheng Cao
  • Dermot Liu
  • Peng Jiang
  • Min Yang

Pre-trained models have proved to be powerful in enhancing task-oriented dialog systems. However, current pre-training methods mainly focus on enhancing dialog understanding and generation tasks while neglecting the exploitation of dialog policy. In this paper, we propose GALAXY, a novel pre-trained dialog model that explicitly learns dialog policy from limited labeled dialogs and large-scale unlabeled dialog corpora via semi-supervised learning. Specifically, we introduce a dialog act prediction task for policy optimization during pre-training and employ a consistency regularization term to refine the learned representation with the help of unlabeled dialogs. We also implement a gating mechanism to weigh suitable unlabeled dialog samples. Empirical results show that GALAXY substantially improves the performance of task-oriented dialog systems, and achieves new state-of-the-art results on benchmark datasets: In-Car, MultiWOZ 2.0 and MultiWOZ 2.1, improving their end-to-end combined scores by 2.5, 5.3 and 5.5 points, respectively. We also show that GALAXY has a stronger few-shot ability than existing models under various low-resource settings. For reproducibility, we release the code and data at https://github.com/siat-nlp/GALAXY.
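The consistency-regularization term mentioned above is commonly instantiated as a divergence between the model's predictions on the same unlabeled dialog under different perturbations (e.g., two dropout passes). The symmetric-KL toy below illustrates that general idea as an assumption; it is not GALAXY's exact loss.

```python
import math

def kl(p, q):
    """KL divergence between two discrete distributions (q assumed strictly positive)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def consistency_loss(p1, p2):
    """Symmetric KL between two predicted dialog-act distributions."""
    return 0.5 * (kl(p1, p2) + kl(p2, p1))

# identical predictions incur no penalty; diverging ones do
same = consistency_loss([0.7, 0.2, 0.1], [0.7, 0.2, 0.1])   # 0.0
drift = consistency_loss([0.7, 0.2, 0.1], [0.4, 0.4, 0.2])  # > 0
```

Minimizing such a term on unlabeled dialogs pushes the model toward predictions that are stable under perturbation, which is how unlabeled data can refine the learned representation.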

IJCAI Conference 2021 Conference Paper

Document-level Relation Extraction as Semantic Segmentation

  • Ningyu Zhang
  • Xiang Chen
  • Xin Xie
  • Shumin Deng
  • Chuanqi Tan
  • Mosha Chen
  • Fei Huang
  • Luo Si

Document-level relation extraction aims to extract relations among multiple entity pairs from a document. Previously proposed graph-based or transformer-based models utilize the entities independently, regardless of global information among relational triples. This paper approaches the problem by predicting an entity-level relation matrix to capture local and global information, parallel to the semantic segmentation task in computer vision. Herein, we propose a Document U-shaped Network for document-level relation extraction. Specifically, we leverage an encoder module to capture the context information of entities and a U-shaped segmentation module over the image-style feature map to capture global interdependency among triples. Experimental results show that our approach can obtain state-of-the-art performance on three benchmark datasets DocRED, CDR, and GDA.
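The entity-level relation matrix described above can be pictured as an N x N grid over the document's entities, one cell per ordered entity pair, which the U-shaped module then predicts like a segmentation map. This toy constructor (entity names and labels invented for illustration) only shows the target structure, not the paper's model.

```python
def relation_matrix(entities, predicted_relations):
    """entities: list of entity names; predicted_relations: dict (head, tail) -> label.

    Returns an N x N grid of relation labels, with "NA" for unrelated pairs.
    """
    n = len(entities)
    matrix = [["NA"] * n for _ in range(n)]
    for (head, tail), label in predicted_relations.items():
        matrix[entities.index(head)][entities.index(tail)] = label
    return matrix

# invented biomedical-style example
ents = ["aspirin", "headache", "Bayer"]
rels = {("aspirin", "headache"): "treats", ("Bayer", "aspirin"): "produces"}
m = relation_matrix(ents, rels)
# m[0][1] == "treats"; m[2][0] == "produces"; all other cells "NA"
```

Treating the whole grid as one prediction target is what lets a segmentation-style network capture global interdependencies among triples instead of scoring each pair in isolation.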

AAAI Conference 2021 Conference Paper

Dynamic Hybrid Relation Exploration Network for Cross-Domain Context-Dependent Semantic Parsing

  • Binyuan Hui
  • Ruiying Geng
  • Qiyu Ren
  • Binhua Li
  • Yongbin Li
  • Jian Sun
  • Fei Huang
  • Luo Si

Semantic parsing has long been a fundamental problem in natural language processing. Recently, cross-domain context-dependent semantic parsing has become a new focus of research. Central to the problem is the challenge of leveraging contextual information of both natural language utterances and database schemas in the interaction history. In this paper, we present a dynamic graph framework that is capable of effectively modelling contextual utterances, tokens, database schemas, and their complicated interactions as the conversation proceeds. The framework employs a dynamic memory decay mechanism that incorporates inductive bias to integrate enriched contextual relation representation, which is further enhanced with a powerful reranking model. At the time of writing, the proposed framework outperforms all existing models by large margins, achieving new state-of-the-art performance on two large-scale benchmarks, the SParC and CoSQL datasets. Specifically, the model attains a 55.8% question-match and 30.8% interaction-match accuracy on SParC, and a 46.8% question-match and 17.0% interaction-match accuracy on CoSQL.

AAAI Conference 2021 Conference Paper

Knowledge-aware Named Entity Recognition with Alleviating Heterogeneity

  • Binling Nie
  • Ruixue Ding
  • Pengjun Xie
  • Fei Huang
  • Chen Qian
  • Luo Si

Named Entity Recognition (NER) is a fundamental and important research topic for many downstream NLP tasks, aiming at detecting and classifying named entities (NEs) mentioned in unstructured text into pre-defined categories. Learning from labeled data only is far from enough when it comes to domain-specific or temporally-evolving entities (e.g., medical terminologies or restaurant names). Luckily, open-source Knowledge Bases (KBs) (e.g., Wikidata and Freebase) contain NEs that are manually labeled with predefined types in different domains, which is potentially beneficial for identifying entity boundaries and recognizing entity types more accurately. However, the type system of a domain-specific NER task is typically independent of that of current KBs and thus inevitably exhibits a heterogeneity issue, which makes matching between the original NER and KB types (e.g., Person in NER potentially matches President in KBs) less likely, or introduces unintended noise if domain-specific knowledge is not considered (e.g., Band in NER should be mapped to Out of Entity Types in the restaurant-related task). To better incorporate and denoise the abundant knowledge in KBs, we propose a new KB-aware NER framework (KaNa), which utilizes type-heterogeneous knowledge to improve NER. Specifically, for an entity mention along with a set of candidate entities linked from KBs, KaNa first uses a type projection mechanism that maps the mention type and entity types into a shared space to homogenize the heterogeneous entity types. Then, based on the projected types, a noise detector filters out certain less-confident candidate entities in an unsupervised manner. Finally, the filtered mention-entity pairs are injected into a NER model as a graph to predict answers. The experimental results demonstrate KaNa's state-of-the-art performance on five public benchmark datasets from different domains.

AAAI Conference 2021 Conference Paper

Unsupervised Learning of Deterministic Dialogue Structure with Edge-Enhanced Graph Auto-Encoder

  • Yajing Sun
  • Yong Shan
  • Chengguang Tang
  • Yue Hu
  • Yinpei Dai
  • Jing Yu
  • Jian Sun
  • Fei Huang

It is important for task-oriented dialogue systems to discover the dialogue structure (i.e., the general dialogue flow) from dialogue corpora automatically. Previous work models dialogue structure by first extracting latent states for each utterance and then calculating the transition probabilities among states. These two-stage methods ignore the contextual information when calculating the probabilities, which makes the transitions between the states ambiguous. This paper proposes a conversational graph (CG) to represent deterministic dialogue structure, where nodes and edges represent the utterance and context information respectively. An unsupervised Edge-Enhanced Graph Auto-Encoder (EGAE) architecture is designed to model local-contextual and global-structural information for conversational graph learning. Furthermore, a self-supervised objective is introduced with the response selection task to guide the unsupervised learning of the dialogue structure. Experimental results on several public datasets demonstrate that the novel model outperforms several alternatives in aggregating utterances with similar semantics. The effectiveness of the learned dialogue structure is also verified by more than 5% joint accuracy improvement in the downstream task of low-resource dialogue state tracking.

ECAI Conference 2020 Conference Paper

Behavior Based Dynamic Summarization on Product Aspects via Reinforcement Neighbour Selection

  • Zheng Gao 0001
  • Lujun Zhao
  • Heng Huang
  • Hongsong Li
  • Changlong Sun
  • Luo Si
  • Xiaozhong Liu 0001

Dynamic summarization on product aspects, as a newly proposed topic, is an important task in E-commerce for tracking and understanding the nature of products. This can benefit both customers and sellers in different downstream tasks, such as explainable recommendations. However, most existing research focuses on analyzing static product reviews and misses dynamic sentiment changes. In this paper, we propose an innovative multi-task model to sample neighbour products whose information is simultaneously utilized to generate product summarization. In detail, a reinforcement learning approach selects neighbour products from a group of seed products by considering their pairwise similarities calculated from user behaviors. Meanwhile, a generative model helps to summarize product aspects via product descriptive phrases and selected neighbour products' sentimental phrases. To the best of our knowledge, this is the first work that studies dynamic product summarization leveraging user behaviors instead of self-reviews. This means the proposed approach can naturally address the cold-start scenario where few recent product reviews are available. Extensive experiments are conducted with real-world reviews plus behavior data to validate the proposed method against several strong alternatives.

AAAI Conference 2020 Conference Paper

Knowing What, How and Why: A Near Complete Solution for Aspect-Based Sentiment Analysis

  • Haiyun Peng
  • Lu Xu
  • Lidong Bing
  • Fei Huang
  • Wei Lu
  • Luo Si

Target-based sentiment analysis or aspect-based sentiment analysis (ABSA) refers to addressing various sentiment analysis tasks at a fine-grained level, which includes but is not limited to aspect extraction, aspect sentiment classification, and opinion extraction. There exist many solvers of the above individual subtasks or a combination of two subtasks, and they can work together to tell a complete story, i.e., the discussed aspect, the sentiment on it, and the cause of the sentiment. However, no previous ABSA research tried to provide a complete solution in one shot. In this paper, we introduce a new subtask under ABSA, named aspect sentiment triplet extraction (ASTE). Particularly, a solver of this task needs to extract triplets (What, How, Why) from the inputs, which show WHAT the targeted aspects are, HOW their sentiment polarities are and WHY they have such polarities (i.e., opinion reasons). For instance, one triplet from “Waiters are very friendly and the pasta is simply average” could be (‘Waiters’, positive, ‘friendly’). We propose a two-stage framework to address this task. The first stage predicts what, how and why in a unified model, and then the second stage pairs up the predicted what (how) and why from the first stage to output triplets. In the experiments, our framework has set a benchmark performance in this novel triplet extraction task. Meanwhile, it outperforms a few strong baselines adapted from state-of-the-art related methods.
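The second-stage pairing step can be illustrated with a toy heuristic: given first-stage aspect (what/how) and opinion (why) predictions, match each aspect to an opinion span. The nearest-span rule and the token positions below are illustrative stand-ins for the learned pairing model, not the paper's method.

```python
def pair_triplets(aspects, opinions):
    """aspects: list of (position, term, polarity); opinions: list of (position, term).

    Pairs each aspect with its nearest opinion span (a toy stand-in for the
    learned second-stage pairing model) and emits (What, How, Why) triplets.
    """
    triplets = []
    for a_pos, a_term, polarity in aspects:
        o_pos, o_term = min(opinions, key=lambda o: abs(o[0] - a_pos))
        triplets.append((a_term, polarity, o_term))
    return triplets

# "Waiters are very friendly and the pasta is simply average"
# (positions are illustrative, not exact token indices)
aspects = [(0, "Waiters", "positive"), (6, "pasta", "neutral")]
opinions = [(2, "friendly"), (8, "average")]
triplets = pair_triplets(aspects, opinions)
# → [('Waiters', 'positive', 'friendly'), ('pasta', 'neutral', 'average')]
```

The point of the sketch is the decomposition: stage one proposes the what/how/why pieces, and stage two only has to decide which pieces belong together.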

AAAI Conference 2020 Conference Paper

Sentiment Classification in Customer Service Dialogue with Topic-Aware Multi-Task Learning

  • Jiancheng Wang
  • Jingjing Wang
  • Changlong Sun
  • Shoushan Li
  • Xiaozhong Liu
  • Luo Si
  • Min Zhang
  • Guodong Zhou

Sentiment analysis in dialogues plays a critical role in dialogue data analysis. However, previous studies on sentiment classification in dialogues largely ignore topic information, which is important for capturing overall information in some types of dialogues. In this study, we focus on the sentiment classification task in an important type of dialogue, namely customer service dialogue, and propose a novel approach which captures overall information to enhance the classification performance. Specifically, we propose a topic-aware multi-task learning (TML) approach which learns topic-enriched utterance representations in customer service dialogue by capturing various kinds of topic information. In the experiment, we propose a large-scale and high-quality annotated corpus for the sentiment classification task in customer service dialogue, and empirical studies on the proposed corpus show that our approach significantly outperforms several strong baselines.

ICLR Conference 2020 Conference Paper

StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

  • Wei Wang 0225
  • Bin Bi
  • Ming Yan 0008
  • Chen Wu 0006
  • Jiangnan Xia
  • Zuyi Bao
  • Liwei Peng
  • Luo Si

Recently, the pre-trained language model, BERT (and its robustly optimized version RoBERTa), has attracted a lot of attention in natural language understanding (NLU), and achieved state-of-the-art accuracy in various NLU tasks, such as sentiment classification, natural language inference, semantic textual similarity and question answering. Inspired by the linearization exploration work of Elman, we extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks to make the most of the sequential order of words and sentences, which leverage language structures at the word and sentence levels, respectively. As a result, the new model is adapted to different levels of language understanding required by downstream tasks. StructBERT with structural pre-training gives surprisingly good empirical results on a variety of downstream tasks, including pushing the state-of-the-art on the GLUE benchmark to 89.0 (outperforming all published models at the time of model submission), the F1 score on SQuAD v1.1 question answering to 93.0, and the accuracy on SNLI to 91.7.
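The word-level structural objective can be approximated by a simple data-corruption sketch: shuffle a short span of tokens and train the model to restore the original order. The span length, seeding, and helper name below are assumptions for illustration, not the released StructBERT code.

```python
import random

def make_span_shuffle_example(tokens, span_len=3, seed=0):
    """Pick one span, shuffle it, and return (corrupted tokens, (start, original span)).

    A model trained on such pairs must predict the original order of the span,
    which is the word-level structural objective in miniature. Assumes the
    chosen span contains at least two distinct tokens.
    """
    rng = random.Random(seed)
    start = rng.randrange(len(tokens) - span_len + 1)
    span = tokens[start:start + span_len]
    shuffled = span[:]
    while shuffled == span:          # make sure the span actually changed
        rng.shuffle(shuffled)
    corrupted = tokens[:start] + shuffled + tokens[start + span_len:]
    return corrupted, (start, span)

corrupted, (start, original) = make_span_shuffle_example(
    ["the", "cat", "sat", "on", "the", "mat"])
```

Because the corruption is generated from raw text, the objective needs no labels, which is what makes it usable as an auxiliary pre-training task.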

AAAI Conference 2019 Conference Paper

A Deep Cascade Model for Multi-Document Reading Comprehension

  • Ming Yan
  • Jiangnan Xia
  • Chen Wu
  • Bin Bi
  • Zhongzhou Zhao
  • Ji Zhang
  • Luo Si
  • Rui Wang

A fundamental trade-off between effectiveness and efficiency needs to be balanced when designing an online question answering system. Effectiveness comes from sophisticated functions such as extractive machine reading comprehension (MRC), while efficiency is obtained from improvements in preliminary retrieval components such as candidate document selection and paragraph ranking. Given the complexity of the real-world multi-document MRC scenario, it is difficult to jointly optimize both in an end-to-end system. To address this problem, we develop a novel deep cascade learning model, which progressively evolves from the document-level and paragraph-level ranking of candidate texts to more precise answer extraction with machine reading comprehension. Specifically, irrelevant documents and paragraphs are first filtered out with simple functions for efficiency consideration. Then we jointly train three modules on the remaining texts for better tracking the answer: document extraction, paragraph extraction and answer extraction. Experiment results show that the proposed method outperforms the previous state-of-the-art methods on two large-scale multi-document benchmark datasets, i.e., TriviaQA and DuReader. In addition, our online system can stably serve typical scenarios with millions of daily requests in less than 50ms.

IJCAI Conference 2019 Conference Paper

Self-attentive Biaffine Dependency Parsing

  • Ying Li
  • Zhenghua Li
  • Min Zhang
  • Rui Wang
  • Sheng Li
  • Luo Si

The current state-of-the-art dependency parsing approaches employ BiLSTMs to encode input sentences. Motivated by the success of transformer-based machine translation, this work for the first time applies the self-attention mechanism to dependency parsing as a replacement for the BiLSTM-based encoders, leading to competitive performance on both English and Chinese benchmark data. Based on detailed error analysis, we then combine the power of both BiLSTM and self-attention via model ensembles, demonstrating their complementary capability of capturing contextual information. Finally, we explore the recently proposed contextualized word representations as extra input features, and further improve the parsing performance.
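For context, the biaffine arc scorer these parsers build on computes score(i, j) = h_i^T U d_j + b^T d_j for a head representation h_i and a dependent representation d_j; the parser then picks, for each word, the head with the highest score. A pure-Python toy with made-up numbers (not the paper's code):

```python
def biaffine_score(h, d, U, b):
    """h, d: vectors; U: weight matrix; b: bias vector. Returns a scalar arc score."""
    # U @ d
    Ud = [sum(U[r][c] * d[c] for c in range(len(d))) for r in range(len(U))]
    # h^T (U d) + b^T d
    return sum(h[r] * Ud[r] for r in range(len(h))) + sum(b[c] * d[c] for c in range(len(d)))

# toy 2-d representations for a root and a candidate dependent
U = [[1.0, 0.0], [0.0, 1.0]]   # identity weight matrix for readability
b = [0.1, 0.1]
h_root = [1.0, 0.0]
d_word = [0.8, 0.2]
score = biaffine_score(h_root, d_word, U, b)  # 0.8 + 0.1*0.8 + 0.1*0.2 = 0.9
```

The encoder choice the abstract discusses (BiLSTM vs. self-attention) only changes how h and d are produced; the biaffine scoring on top stays the same.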

AAAI Conference 2019 Conference Paper

Syntax-Aware Neural Semantic Role Labeling

  • Qingrong Xia
  • Zhenghua Li
  • Min Zhang
  • Meishan Zhang
  • Guohong Fu
  • Rui Wang
  • Luo Si

Semantic role labeling (SRL), also known as shallow semantic parsing, is an important yet challenging task in NLP. Motivated by the close correlation between syntactic and semantic structures, traditional discrete-feature-based SRL approaches make heavy use of syntactic features. In contrast, deep-neural-network-based approaches usually encode the input sentence as a word sequence without considering the syntactic structures. In this work, we investigate several previous approaches for encoding syntactic trees, and make a thorough study of whether extra syntax-aware representations are beneficial for neural SRL models. Experiments on the benchmark CoNLL-2005 dataset show that syntax-aware SRL approaches can effectively improve performance over a strong baseline with external word representations from ELMo. With the extra syntax-aware representations, our approaches achieve new state-of-the-art 85.6 F1 (single model) and 86.6 F1 (ensemble) on the test data, outperforming the corresponding strong baselines with ELMo by 0.8 and 1.0, respectively. Detailed error analysis is conducted to gain more insights into the investigated approaches.

AAAI Conference 2019 Conference Paper

Unsupervised Learning Helps Supervised Neural Word Segmentation

  • Xiaobin Wang
  • Deng Cai
  • Linlin Li
  • Guangwei Xu
  • Hai Zhao
  • Luo Si

By exploiting unlabeled data for further performance improvement of Chinese word segmentation, this work makes the first attempt at adding unsupervised segmentation information into a supervised neural segmenter. We survey various effective strategies, including extending the character embedding, augmenting the word score and applying multi-task learning, for leveraging unsupervised information derived from abundant unlabeled data. Experiments on standard data sets show that the explored strategies indeed improve the recall rate of out-of-vocabulary words and thus boost the segmentation accuracy. Moreover, the model enhanced by the proposed methods outperforms state-of-the-art models in the closed test and shows a promising improvement trend when adopting the three different strategies with the help of a large unlabeled data set. Our thorough empirical study eventually verifies that the proposed approach outperforms the widely-used pre-training approach in terms of effectively making use of freely abundant unlabeled data.

AAAI Conference 2019 Conference Paper

“Bilingual Expert” Can Find Translation Errors

  • Kai Fan
  • Jiayi Wang
  • Bo Li
  • Fengming Zhou
  • Boxing Chen
  • Luo Si

The performance of machine translation (MT) systems is usually evaluated with the BLEU metric when golden references are provided. However, in the case of model inference or production deployment, golden references are usually expensive to obtain, requiring human annotation with bilingual expertise. In order to address the issue of translation quality estimation (QE) without references, we propose a general framework for automatic evaluation of translation output for the QE task in the Conference on Statistical Machine Translation (WMT). We first build a conditional target language model with a novel bidirectional transformer, named the neural bilingual expert model, which is pre-trained on large parallel corpora for feature extraction. For QE inference, the bilingual expert model can simultaneously produce the joint latent representation between the source and the translation, and real-valued measurements of possible erroneous tokens based on the prior knowledge learned from parallel data. Subsequently, the features are fed into a simple Bi-LSTM predictive model for quality estimation. The experimental results show that our approach achieves state-of-the-art performance on most publicly available datasets of the WMT 2017/2018 QE task.

AAAI Conference 2018 Conference Paper

A Multi-Task Learning Approach for Improving Product Title Compression with User Search Log Data

  • Jingang Wang
  • Junfeng Tian
  • Long Qiu
  • Sheng Li
  • Jun Lang
  • Luo Si
  • Man Lan

It is a challenging and practical research problem to obtain effective compression of lengthy product titles for E-commerce. This is particularly important as more and more users browse mobile E-commerce apps and more merchants make the original product titles redundant and lengthy for Search Engine Optimization. Traditional text summarization approaches often incur substantial preprocessing costs and do not capture the important issue of conversion rate in E-commerce. This paper proposes a novel multi-task learning approach for improving product title compression with user search log data. In particular, a pointer network-based sequence-to-sequence approach is utilized for title compression with an attentive mechanism as an extractive method, and an attentive encoder-decoder approach is utilized for generating user search queries. The encoding parameters (i.e., semantic embedding of original titles) are shared among the two tasks and the attention distributions are jointly optimized. An extensive set of experiments with both human annotated data and online deployment demonstrates the advantage of the proposed research for both compression quality and online business value.

IJCAI Conference 2018 Conference Paper

Aspect Sentiment Classification with both Word-level and Clause-level Attention Networks

  • Jingjing Wang
  • Jie Li
  • Shoushan Li
  • Yangyang Kang
  • Min Zhang
  • Luo Si
  • Guodong Zhou

Aspect sentiment classification, a challenging task in sentiment analysis, has been attracting more and more attention in recent years. In this paper, we highlight the need for incorporating the importance degrees of both words and clauses inside a sentence and propose a hierarchical network with both word-level and clause-level attentions for aspect sentiment classification. Specifically, we first adopt sentence-level discourse segmentation to segment a sentence into several clauses. Then, we leverage multiple Bi-directional LSTM layers to encode all clauses and propose a word-level attention layer to capture the importance degrees of words in each clause. Third and finally, we leverage another Bi-directional LSTM layer to encode the outputs from the former layers and propose a clause-level attention layer to capture the importance degrees of all the clauses inside a sentence. Experimental results on the laptop and restaurant datasets from SemEval-2015 demonstrate the effectiveness of our proposed approach to aspect sentiment classification.

IJCAI Conference 2015 Conference Paper

Determining Expert Research Areas with Multi-Instance Learning of Hierarchical Multi-Label Classification Model

  • Tao Wu
  • Qifan Wang
  • Zhiwei Zhang
  • Luo Si

Automatically identifying the research areas of academic/industry researchers is an important task for building expertise organizations or search systems. In general, this task can be viewed as text classification that generates a set of research areas given the expertise of a researcher, such as documents of publications. However, this task is challenging because the evidence for a research area may only exist in a few documents instead of all documents. Moreover, the research areas are often organized in a hierarchy, which limits the effectiveness of existing text categorization methods. This paper proposes a novel approach, Multi-instance Learning of Hierarchical Multi-label Classification Model (MIHML) for the task, which effectively identifies multiple research areas in a hierarchy from individual documents within the profile of a researcher. An Expectation-Maximization (EM) optimization algorithm is designed to learn the model parameters. Extensive experiments have been conducted to demonstrate the superior performance of the proposed approach on a real-world application.

IJCAI Conference 2015 Conference Paper

Learning to Hash on Partial Multi-Modal Data

  • Qifan Wang
  • Luo Si
  • Bin Shen

Hashing approaches have become popular for fast similarity search in many large-scale applications. Real-world data usually have multiple modalities or different representations from multiple sources. Various hashing methods have been proposed to generate compact binary codes from multi-modal data. However, most existing multi-modal hashing techniques assume that each data example appears in all modalities, or that at least one modality contains all data examples. But in real applications, it is often the case that every modality suffers from missing data, resulting in many partial examples, i.e., examples with some modalities missing. In this paper, we present a novel hashing approach to deal with Partial Multi-Modal data. In particular, the hashing codes are learned by simultaneously ensuring data consistency among different modalities via latent subspace learning, and preserving data similarity within the same modality through a graph Laplacian. We then further improve the codes via orthogonal rotation based on the orthogonal invariant property of our formulation. Experiments on two multi-modal datasets demonstrate the superior performance of the proposed approach over several state-of-the-art multi-modal hashing methods.
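The graph-Laplacian term mentioned above for preserving within-modality similarity rests on a standard identity: with symmetric similarity matrix W, degree matrix D, and L = D - W, the smoothness penalty x^T L x equals (1/2) Σ_ij W_ij (x_i - x_j)^2, so codes that differ across similar examples are penalized. A toy numerical check (numbers invented):

```python
def laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric similarity matrix W."""
    n = len(W)
    D = [sum(row) for row in W]  # degrees
    return [[(D[i] if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]

def quadratic_form(L, x):
    """x^T L x for a vector of (one-dimensional) codes x."""
    return sum(x[i] * L[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))

W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
x = [1.0, -1.0, 0.5]
lhs = quadratic_form(laplacian(W), x)
rhs = 0.5 * sum(W[i][j] * (x[i] - x[j]) ** 2
                for i in range(3) for j in range(3))
# lhs == rhs: similar examples (large W[i][j]) with different codes dominate the penalty
```

Minimizing this quadratic form over the hashing codes is what "preserving data similarity within the same modality" amounts to in the abstract.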

AAAI Conference 2015 Conference Paper

Learning to Hash on Structured Data

  • Qifan Wang
  • Luo Si
  • Bin Shen

Hashing techniques have been widely applied for large-scale similarity search problems due to their computational and memory efficiency. However, most existing hashing methods assume data examples are independently and identically distributed. But there often exists additional dependency/structure information between data examples in many real-world applications. Ignoring this structure information may limit the performance of existing hashing algorithms. This paper explores the research problem of learning to Hash on Structured Data (HSD) and formulates a novel framework that considers the additional structure information. In particular, the hashing function is learned in a unified learning framework by simultaneously ensuring the structural consistency and preserving the similarities between data examples. An iterative gradient descent algorithm is designed as the optimization procedure. Furthermore, we improve the effectiveness of the hashing function through orthogonal transformation by minimizing the quantization error. Experimental results on two datasets clearly demonstrate the advantages of the proposed method over several state-of-the-art hashing methods.

IJCAI Conference 2015 Conference Paper

Ranking Preserving Hashing for Fast Similarity Search

  • Qifan Wang
  • Zhiwei Zhang
  • Luo Si

Hashing methods have become popular for large-scale similarity search due to their storage and computational efficiency. Many machine learning techniques, ranging from unsupervised to supervised, have been proposed to design compact hashing codes. Most existing hashing methods generate binary codes to efficiently find data examples similar to a query, but they do not model the ranking accuracy among the retrieved data examples, even though in many real-world applications ranking measures are important for evaluating the quality of hashing codes. In this paper, we propose a novel Ranking Preserving Hashing (RPH) approach that directly optimizes a popular ranking measure, Normalized Discounted Cumulative Gain (NDCG), to obtain effective hashing codes with high ranking accuracy. The main difficulty in directly optimizing the NDCG measure is that it depends on the ranking order of data examples, which forms a non-convex, non-smooth optimization problem. We address this challenge by optimizing the expectation of the NDCG measure calculated based on a linear hashing function, and design a gradient descent method to achieve this goal. An extensive set of experiments on two large-scale datasets demonstrates the superior ranking performance of the proposed approach over several state-of-the-art hashing methods.
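The NDCG measure that RPH optimizes (in expectation) can be stated concretely. The snippet below is the standard definition with graded gains and logarithmic discounts, not the authors' implementation; the score convention (higher score = more similar, e.g. a negated Hamming distance) is an assumption.

```python
import numpy as np

def dcg(rels):
    """Discounted cumulative gain of relevance grades listed in rank order."""
    rels = np.asarray(rels, dtype=float)
    ranks = np.arange(1, len(rels) + 1)
    return float(np.sum((2.0 ** rels - 1.0) / np.log2(ranks + 1)))

def ndcg(scores, rels):
    """NDCG of the ranking induced by `scores`, e.g. negated Hamming
    distances between database hashing codes and the query code."""
    order = np.argsort(-np.asarray(scores, dtype=float))  # descending scores
    ideal = np.sort(np.asarray(rels, dtype=float))[::-1]  # best possible ordering
    return dcg(np.asarray(rels, dtype=float)[order]) / dcg(ideal)
```

A ranking that orders examples exactly by relevance scores NDCG = 1; any misordering of graded items scores strictly less, which is what makes the measure depend non-smoothly on the ranking order.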

AAAI Conference 2014 Conference Paper

Adaptive Knowledge Transfer for Multiple Instance Learning in Image Classification

  • Qifan Wang
  • Lingyun Ruan
  • Luo Si

Multiple Instance Learning (MIL) is a popular learning technique in various vision tasks, including image classification. However, most existing MIL methods do not consider the problem of insufficient examples in the given target category, where it is difficult for traditional MIL methods to build an accurate classifier due to the lack of training examples. Motivated by the empirical success of transfer learning, this paper proposes a novel approach of Adaptive Knowledge Transfer for Multiple Instance Learning (AKT-MIL) in image classification. The new method transfers cross-category knowledge from source categories under the multiple instance setting to boost the learning process. A unified learning framework with a data-dependent mixture model is designed to adaptively combine the transferred knowledge from the sources with a weak classifier built in the target domain. Based on this framework, an iterative coordinate descent method with Constrained Concave-Convex Programming (CCCP) is proposed as the optimization procedure. An extensive set of experimental results demonstrates that the proposed AKT-MIL approach substantially outperforms several state-of-the-art algorithms on two benchmark datasets, especially when very few training examples are available in the target domain.

ICML Conference 2013 Conference Paper

MILEAGE: Multiple Instance LEArning with Global Embedding

  • Dan Zhang 0007
  • Jingrui He
  • Luo Si
  • Richard D. Lawrence

Multiple Instance Learning (MIL) methods generally represent each example as a collection of instances so that the features of local objects can be better captured, whereas traditional learning methods typically extract a single global feature vector for each example. However, there is limited research on which of the two learning scenarios performs better. This paper proposes a novel framework, Multiple Instance LEArning with Global Embedding (MILEAGE), in which the global feature vectors used by traditional learning methods are integrated into the MIL setting. MILEAGE can leverage the benefits of both learning settings. Within the proposed framework, a large margin method is formulated. In particular, the proposed method adaptively tunes the weights on the two different kinds of feature representations (i.e., global and multiple instance) for each example and trains the classifier simultaneously. An algorithm that extends the bundle method to the non-convex case is proposed to solve the resulting optimization problem. Some important properties of the proposed method, such as the convergence rate and the generalization error rate, are analyzed. A series of experiments demonstrates the advantages of the proposed method over several state-of-the-art multiple instance and traditional learning methods.

NeurIPS Conference 2011 Conference Paper

Multiple Instance Learning on Structured Data

  • Dan Zhang
  • Yan Liu
  • Luo Si
  • Jian Zhang
  • Richard Lawrence

Most existing Multiple-Instance Learning (MIL) algorithms assume data instances and/or data bags are independently and identically distributed. But there often exists rich additional dependency/structure information between instances/bags in many applications of MIL. Ignoring this structure information limits the performance of existing MIL algorithms. This paper explores the research problem of multiple instance learning on structured data (MILSD) and formulates a novel framework that considers the additional structure information. In particular, an effective and efficient optimization algorithm is proposed to solve the original non-convex optimization problem by combining the Constrained Concave-Convex Procedure (CCCP) with an adapted Cutting Plane method, which deals with the two sets of constraints caused by learning on instances within individual bags and learning on structured data. Our method has a nice convergence property, achieving a specified precision on each set of constraints. Experimental results on three different applications, i.e., webpage classification, market targeting, and protein fold identification, clearly demonstrate the advantages of the proposed method over state-of-the-art methods.

AAAI Conference 2010 Conference Paper

Non-Negative Matrix Factorization Clustering on Multiple Manifolds

  • Bin Shen
  • Luo Si

Nonnegative Matrix Factorization (NMF) is a widely used technique in many applications such as clustering. It approximates the nonnegative data in an original high dimensional space with a linear representation in a low dimensional space by using the product of two nonnegative matrices. In many applications with data such as human faces or digits, data often reside on multiple manifolds, which may overlap or intersect. But the traditional NMF method and other existing variants of NMF do not consider this. This paper proposes a novel clustering algorithm that explicitly models the intrinsic geometrical structure of the data on multiple manifolds with NMF. The idea of the proposed algorithm is that a data point generated by several neighboring points on a specific manifold in the original space should be constructed in a similar way in the low dimensional subspace. A set of experimental results on two real world datasets demonstrate the advantage of the proposed algorithm.
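For reference, the basic NMF factorization that this paper extends can be sketched with the classic multiplicative updates of Lee and Seung; the multi-manifold regularization described in the abstract is not included in this sketch, and the iteration count and initialization are illustrative choices.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0, eps=1e-10):
    """Approximate a nonnegative matrix X (d x n) as W @ H with
    W (d x k) >= 0 and H (k x n) >= 0 by minimizing ||X - W H||_F^2
    with multiplicative updates; each update keeps the factors nonnegative."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    W = rng.random((d, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)  # update coefficients
        W *= (X @ H.T) / (W @ H @ H.T + eps)  # update basis
    return W, H
```

For clustering, each column of X (one data point) is then typically assigned to the row of H with its largest coefficient, which is the step the multi-manifold structure of the paper would regularize.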

IJCAI Conference 2009 Conference Paper

Maximum Margin Multiple Instance Clustering

  • Dan Zhang
  • Fei Wang
  • Luo Si
  • Tao Li

Clustering, classification, and regression are three major research topics in machine learning. So far, much work has been conducted on multiple instance classification and multiple instance regression problems, where supervised training patterns are given as bags and each bag consists of some instances, but research on unsupervised multiple instance clustering is still limited. This paper formulates a novel Maximum Margin Multiple Instance Clustering (M3IC) problem for the multiple instance clustering task. To avoid solving a non-convex optimization problem directly, M3IC is further relaxed, which enables an efficient optimization solution based on a combination of the Constrained Concave-Convex Procedure (CCCP) and the Cutting Plane method. Furthermore, this paper analyzes some important properties of the proposed method and its relationship to other related methods. An extensive set of empirical results demonstrates the advantages of the proposed method over existing research in both effectiveness and efficiency.

UAI Conference 2004 Conference Paper

A Bayesian Approach toward Active Learning for Collaborative Filtering

  • Rong Jin 0001
  • Luo Si

Collaborative filtering is a useful technique for exploiting the preference patterns of a group of users to predict the utility of items for the active user. In general, the performance of collaborative filtering depends on the number of rated examples given by the active user: the more rated examples the active user provides, the more accurate the predicted ratings will be. Active learning provides an effective way to acquire the most informative rated examples from active users. Previous work on active learning for collaborative filtering only considers the expected loss function based on the estimated model, which can be misleading when the estimated model is inaccurate. This paper takes one step further by taking into account the posterior distribution of the estimated model, which results in a more robust active learning algorithm. Empirical studies with datasets of movie ratings show that when the number of ratings from the active user is restricted to be small, active learning methods based only on the estimated model do not perform well, while the active learning method using the model distribution achieves substantially better performance.

UAI Conference 2003 Conference Paper

Preference-based Graphic Models for Collaborative Filtering

  • Rong Jin 0001
  • Luo Si
  • ChengXiang Zhai

Collaborative filtering is a very useful general technique for exploiting the preference patterns of a group of users to predict the utility of items to a particular user. Previous research has studied several probabilistic graphic models for collaborative filtering with promising results. However, while these models have succeeded in capturing the similarity among users and items in one way or another, none of them has considered the fact that users with similar interests in items can have very different rating patterns; some users tend to assign higher ratings to all items than other users do. In this paper, we propose and study two new graphic models that address the distinction between user preferences and ratings. In one model, called the decoupled model, we introduce two different variables to decouple a user's preferences from his ratings. In the other, called the preference model, we model the orderings of items preferred by a user, rather than the user's numerical ratings of items. An empirical study over two datasets of movie ratings shows that appropriately modeling the distinction between user preferences and ratings improves performance substantially and consistently. Specifically, the proposed decoupled model significantly outperforms all five existing approaches that we compare with, while the preference model is not very successful. These results suggest that explicit modeling of the underlying user preferences is very important for collaborative filtering, but we cannot afford to ignore the rating information completely.