Arrow Research search

Author name cluster

Xueqi Cheng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

70 papers
2 author rows

Possible papers (70)

AAAI Conference 2026 Conference Paper

RLKD: Distilling LLMs’ Reasoning via Reinforcement Learning

  • Shicheng Xu
  • Liang Pang
  • Yunchang Zhu
  • Jia Gu
  • Zihao Wei
  • Jingcheng Deng
  • Feiyang Pan
  • Huawei Shen

Distilling reasoning paths from teacher to student models via supervised fine-tuning (SFT) provides a shortcut for improving the reasoning ability of smaller large language models (LLMs). However, the reasoning paths generated by teacher models often reflect only surface-level traces of their underlying authentic reasoning. Insights from cognitive neuroscience suggest that authentic reasoning involves a complex interweaving of meta-reasoning, which selects the appropriate sub-problem from multiple candidates, and solving, which addresses that sub-problem. This means that authentic reasoning has an implicit multi-branch structure. SFT collapses this rich structure into flat token prediction over the teacher's reasoning path and therefore cannot distill the structure to the student. To address this limitation, we propose RLKD, a reinforcement learning (RL)-based distillation framework guided by a novel Generative Structure Reward Model (GSRM). Our GSRM converts a reasoning path into a sequence of meta-reasoning-solving steps and produces a reward that measures the alignment between the reasoning structures of student and teacher. RLKD combines this reward with RL, enabling the student LLM to internalize the teacher's implicit multi-branch reasoning structure rather than merely mimicking the teacher's fixed output paths. Experiments show that RLKD, even when trained on only 0.1% of the data under an RL-only regime, surpasses standard SFT-RL pipelines and unlocks more of the student LLM's latent reasoning ability than SFT-based distillation does.

AAAI Conference 2026 Conference Paper

Thinking Forward and Backward: Multi-Objective Reinforcement Learning for Retrieval-Augmented Reasoning

  • Wenda Wei
  • Yu-An Liu
  • Ruqing Zhang
  • Jiafeng Guo
  • Lixin Su
  • Shuaiqiang Wang
  • Dawei Yin
  • Maarten de Rijke

Retrieval-augmented generation (RAG) has proven effective at mitigating hallucinations in large language models, yet its effectiveness remains limited in complex, multi-step reasoning scenarios. Recent efforts have incorporated search-based interactions into RAG, enabling iterative reasoning with real-time retrieval. Most of these approaches, however, rely on outcome-based supervision and offer no explicit guidance for intermediate steps, which often leads to reward hacking and degraded response quality. We propose Bi-RAR, a novel retrieval-augmented reasoning framework that evaluates each intermediate step jointly in both forward and backward directions. To assess the information completeness of each step, we introduce a bidirectional information distance grounded in Kolmogorov complexity, approximated via language model generation probabilities. This quantification measures both how far the current reasoning is from the answer and how well it addresses the question. To optimize reasoning under these bidirectional signals, we adopt a multi-objective reinforcement learning framework with a cascading reward structure that emphasizes early trajectory alignment. Empirical results on seven question answering benchmarks demonstrate that Bi-RAR surpasses previous methods and enables efficient interaction and reasoning with the search engine during training and inference.
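
As a rough illustration of how a Kolmogorov-complexity-based distance can be approximated with language model probabilities, here is a minimal Python sketch. The helper names and the choice of GPT-2 are our own; Bi-RAR's exact distance, normalization, and models may differ.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
lm.eval()

def cond_nll(x: str, y: str) -> float:
    """Approximate K(y|x) by -log p_LM(y|x) in nats. Tokenizer
    boundary effects make this an approximation."""
    n_x = tok(x, return_tensors="pt").input_ids.shape[1]
    ids = tok(x + y, return_tensors="pt").input_ids
    with torch.no_grad():
        logp = torch.log_softmax(lm(ids).logits[0, :-1], dim=-1)
    # position i of logp scores token i+1; y occupies ids[0, n_x:]
    return -sum(logp[i, ids[0, i + 1]].item()
                for i in range(n_x - 1, ids.shape[1] - 1))

def info_distance(a: str, b: str) -> float:
    """Classic information distance E(a,b) = max(K(a|b), K(b|a)),
    with K approximated by the LM negative log-likelihood."""
    return max(cond_nll(b, a), cond_nll(a, b))
```

In Bi-RAR's terms, such a quantity could score an intermediate step both against the answer (how far the reasoning still is) and against the question (how well the step addresses it).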

ICLR Conference 2025 Conference Paper

A Theory for Token-Level Harmonization in Retrieval-Augmented Generation

  • Shicheng Xu
  • Liang Pang 0001
  • Huawei Shen
  • Xueqi Cheng

Retrieval-augmented generation (RAG) utilizes retrieved texts to enhance large language models (LLMs). Studies show that while RAG provides valuable external information (benefit), it may also mislead LLMs (detriment) with noisy or incorrect retrieved texts. Although many existing methods attempt to preserve benefit and avoid detriment, they lack a theoretical explanation for RAG. The benefit and detriment in the next token prediction of RAG remain a 'black box' that cannot be quantified or compared in an explainable manner, so existing methods are data-driven and require additional utility evaluators or post-hoc analysis. This paper takes the first step towards providing a theory to explain and trade off the benefit and detriment in RAG. We model RAG as the fusion between the distribution of the LLM's knowledge and the distribution of the retrieved texts. Then, we formalize the trade-off between the value of external knowledge (benefit) and its potential risk of misleading the LLM (detriment) in the next token prediction of RAG by the distribution difference in this fusion. Finally, we prove that the actual effect of RAG on a token, i.e., the comparison between benefit and detriment, can be predicted without any training or access to the utility of retrieval. Based on our theory, we propose a practical novel method, Tok-RAG, which achieves collaborative generation between the pure LLM and RAG at the token level to preserve benefit and avoid detriment. Experiments on real-world tasks using LLMs such as OPT, LLaMA-2, and Mistral show the effectiveness of our method and support our theoretical findings. Code is in the supplemental material and will be released on GitHub after acceptance.
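
A hypothetical sketch of what token-level collaboration could look like at decoding time: compare the next-token distributions of the same model with and without retrieved context and pick a source per token. The `lower_entropy_wins` criterion below is a stand-in heuristic only; Tok-RAG's contribution is precisely a training-free comparison of benefit and detriment derived from its fusion theory, which this sketch does not reproduce.

```python
import torch

def lower_entropy_wins(p_pure: torch.Tensor, p_rag: torch.Tensor) -> bool:
    # Stand-in heuristic: prefer RAG when its next-token distribution
    # is more confident. The paper derives this decision theoretically.
    entropy = lambda p: -(p * (p + 1e-12).log()).sum()
    return bool(entropy(p_rag) < entropy(p_pure))

def collaborative_next_token(lm, ids_pure, ids_rag) -> int:
    """One decoding step: ids_rag is ids_pure prefixed with retrieved
    passages; lm is any HF-style causal LM returning .logits."""
    with torch.no_grad():
        p_pure = torch.softmax(lm(ids_pure).logits[0, -1], dim=-1)
        p_rag = torch.softmax(lm(ids_rag).logits[0, -1], dim=-1)
    chosen = p_rag if lower_entropy_wins(p_pure, p_rag) else p_pure
    return int(chosen.argmax())
```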

AAAI Conference 2025 Conference Paper

Attack-in-the-Chain: Bootstrapping Large Language Models for Attacks Against Black-Box Neural Ranking Models

  • Yu-An Liu
  • Ruqing Zhang
  • Jiafeng Guo
  • Maarten de Rijke
  • Yixing Fan
  • Xueqi Cheng

Neural ranking models (NRMs) have been shown to be highly effective in terms of retrieval performance. Unfortunately, they have also displayed a higher degree of sensitivity to attacks than previous generation models. To help expose and address this lack of robustness, we introduce a novel ranking attack framework named Attack-in-the-Chain, which tracks interactions between large language models (LLMs) and NRMs based on chain-of-thought (CoT) prompting to generate adversarial examples under black-box settings. Our approach starts by identifying anchor documents with higher ranking positions than the target document as nodes in the reasoning chain. We then dynamically assign the number of perturbation words to each node and prompt LLMs to execute attacks. Finally, we verify the attack performance of all nodes at each reasoning step and proceed to generate the next reasoning step. Empirical results on two web search benchmarks show the effectiveness of our method.

ICLR Conference 2025 Conference Paper

CLIPure: Purification in Latent Space via CLIP for Adversarially Robust Zero-Shot Classification

  • Mingkun Zhang
  • Keping Bi
  • Wei Chen 0034
  • Jiafeng Guo
  • Xueqi Cheng

In this paper, we aim to build an adversarially robust zero-shot image classifier that can accurately and efficiently classify unseen examples while defending against unforeseen adversarial attacks, addressing critical challenges in real-world safety-sensitive scenarios. To achieve this, we focus on two key challenges: zero-shot classification and defense against unforeseen attacks. We ground our work on CLIP, a vision-language pre-trained model, to perform zero-shot classification. To defend against unforeseen attacks, we adopt a purification approach, as it is independent of specific attack types. We then define a purification risk as the KL divergence between the joint distributions of the purification process and the attack process. The derived lower bound of the purification risk inspires us to explore purification in CLIP's multi-modal latent space. We propose a CLIP-based purification method called CLIPure, which has two variants: CLIPure-Diff, which models image likelihood with a generative process over the image's latent vector, and CLIPure-Cos, which models the likelihood based on the similarity between the embeddings of the image and a blank template. As far as we know, CLIPure is the first purification method in latent space and CLIPure-Cos is the first purification method not relying on generative models, substantially improving defense efficiency. Extensive experimental results show that the robustness achieved by CLIPure is within a small gap of clean accuracy, outperforming SOTA robustness by a large margin, e.g., from 71.7% to 91.1% on CIFAR-10, from 59.6% to 72.6% on ImageNet, and a 108% relative improvement in average robustness across the 13 datasets over the previous SOTA, with only 14% extra inference cost and no additional training.
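
A minimal sketch of the CLIPure-Cos idea, assuming the CLIP image embedding and the embedding of a blank template (e.g., "a photo of a .") are precomputed. The anchor term, step count, and learning rate are illustrative choices of ours, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def clipure_cos_purify(img_emb: torch.Tensor, blank_emb: torch.Tensor,
                       steps: int = 10, lr: float = 0.1,
                       anchor: float = 0.5) -> torch.Tensor:
    """Ascend the cosine similarity between the image embedding and the
    blank-template embedding, used as a likelihood proxy, in CLIP's
    latent space. The anchor term (our addition) keeps the purified
    embedding close to the input."""
    z0 = img_emb.detach()
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.SGD([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = (-F.cosine_similarity(z, blank_emb, dim=-1).mean()
                + anchor * (z - z0).pow(2).sum(-1).mean())
        loss.backward()
        opt.step()
    return z.detach()  # classify via cosine with class-prompt embeddings
```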

ICLR Conference 2025 Conference Paper

Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models

  • Shicheng Xu
  • Liang Pang 0001
  • Yunchang Zhu
  • Huawei Shen
  • Xueqi Cheng

Vision-language alignment in Large Vision-Language Models (LVLMs) successfully enables LLMs to understand visual input. However, we find that existing vision-language alignment methods fail to transfer the safety mechanism that LLMs already have for text over to vision, which leaves LVLMs vulnerable to toxic images. To explore the cause of this problem, we explain where and how the safety mechanism of LVLMs operates and conduct a comparative analysis between text and vision. We find that the hidden states at specific transformer layers play a crucial role in the successful activation of the safety mechanism, while the vision-language alignment at the hidden-state level in current methods is insufficient. This results in a semantic shift for input images compared to text in the hidden states, which in turn misleads the safety mechanism. To address this, we propose a novel Text-Guided vision-language Alignment method (TGA) for LVLMs. TGA retrieves texts related to the input image and uses them to guide the projection of vision into the hidden-state space of the LLM. Experiments show that TGA not only successfully transfers the safety mechanism for text in base LLMs to vision during vision-language alignment for LVLMs, without any safety fine-tuning on the visual modality, but also maintains general performance on various vision tasks (Safe and Good). Code is in the supplemental material and will be released on GitHub after acceptance.

ICLR Conference 2025 Conference Paper

Everything is Editable: Extend Knowledge Editing to Unstructured Data in Large Language Models

  • Jingcheng Deng
  • Zihao Wei
  • Liang Pang 0001
  • Hanxing Ding
  • Huawei Shen
  • Xueqi Cheng

Recent knowledge editing methods have primarily focused on modifying structured knowledge in large language models. However, this task setting overlooks the fact that a significant portion of real-world knowledge is stored in an unstructured format, characterized by long-form content, noise, and a complex yet comprehensive nature. Techniques like "local layer key-value storage" and "term-driven optimization", as used in previous methods like MEMIT, are not effective for handling unstructured knowledge. To address these challenges, we propose a novel Unstructured Knowledge Editing method, namely UnKE, which extends previous assumptions in the layer dimension and the token dimension. First, in the layer dimension, we propose non-local block key-value storage to replace local layer key-value storage, increasing the representation capacity of key-value pairs and incorporating attention-layer knowledge. Second, in the token dimension, we replace "term-driven optimization" with "cause-driven optimization", which edits the last token directly while preserving context, avoiding the need to locate terms and preventing the loss of context information. Results on the newly proposed unstructured knowledge editing dataset (UnKEBench) and traditional structured datasets demonstrate that UnKE achieves remarkable performance, surpassing strong baselines. In addition, UnKE has robust batch editing and sequential editing capabilities.

TIST Journal 2025 Journal Article

Fairness and Diversity in Recommender Systems: A Survey

  • Yuying Zhao
  • Yu Wang
  • Yunchao Liu
  • Xueqi Cheng
  • Charu C. Aggarwal
  • Tyler Derr

Recommender systems (RS) are effective tools for mitigating information overload and have seen extensive applications across various domains. However, the single focus on utility goals proves to be inadequate in addressing real-world concerns, leading to increasing attention to fairness-aware and diversity-aware RS. While most existing studies explore fairness and diversity independently, we identify strong connections between these two domains. In this survey, we first discuss each of them individually and then dive into their connections. Additionally, motivated by the concepts of user-level and item-level fairness, we broaden the understanding of diversity to encompass not only the item level but also the user level. With this expanded perspective on user and item-level diversity, we re-interpret fairness studies from the viewpoint of diversity. This fresh perspective enhances our understanding of fairness-related work and paves the way for potential future research directions. Articles discussed in this survey along with public code links are available at: https://github.com/YuyingZhao/Awesome-Fairness-and-Diversity-Papers-in-Recommender-Systems

NeurIPS Conference 2025 Conference Paper

Inference-time Alignment in Continuous Space

  • Yige Yuan
  • Teng Xiao
  • Li Yunfan
  • Bingbing Xu
  • Shuchang Tao
  • Yunqi Qiu
  • Huawei Shen
  • Xueqi Cheng

Aligning large language models with human feedback at inference time has received increasing attention due to its flexibility. Existing methods rely on generating multiple responses from the base policy for search using a reward model, which can be considered as searching in a discrete response space. However, these methods struggle to explore informative candidates when the base policy is weak or the candidate set is small, resulting in limited effectiveness. In this paper, to address this problem, we propose Simple Energy Adaptation (SEA), a simple yet effective algorithm for inference-time alignment. In contrast to expensive search over the discrete space, SEA directly adapts original responses from the base policy toward the optimal one via gradient-based sampling in a continuous latent space. Specifically, SEA formulates inference as an iterative optimization procedure on an energy function over actions in the continuous space defined by the optimal policy, enabling simple and effective alignment. For instance, despite its simplicity, SEA outperforms the second-best baseline with a relative improvement of up to 77.51% on AdvBench and 16.36% on MATH. Code is publicly available at https://github.com/yuanyige/sea.
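
As a hedged sketch of gradient-based sampling in a continuous space: if the optimal aligned policy is energy-based, Langevin dynamics can move an initial response representation toward low-energy (high-reward) regions. The energy decomposition in the comment is one common reading of inference-time alignment, not necessarily SEA's exact formulation.

```python
import torch

def langevin_adapt(z0: torch.Tensor, energy, steps: int = 50,
                   eta: float = 1e-2) -> torch.Tensor:
    """Langevin update z <- z - eta * grad E(z) + sqrt(2*eta) * noise.
    `energy` encodes the target policy, e.g. (illustratively)
    E(z) = -log pi_base(z) - reward(z) / beta."""
    z = z0.clone().detach()
    for _ in range(steps):
        z.requires_grad_(True)
        (g,) = torch.autograd.grad(energy(z).sum(), z)
        with torch.no_grad():
            z = z - eta * g + (2 * eta) ** 0.5 * torch.randn_like(z)
    return z
```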

ICLR Conference 2025 Conference Paper

Is Factuality Enhancement a Free Lunch For LLMs? Better Factuality Can Lead to Worse Context-Faithfulness

  • Baolong Bi
  • Shenghua Liu
  • Yiwei Wang 0001
  • Lingrui Mei
  • Junfeng Fang
  • Hongcheng Gao
  • Shiyu Ni
  • Xueqi Cheng

As the modern tools of choice for text understanding and generation, large language models (LLMs) are expected to output accurate answers by leveraging the input context. This requires LLMs to possess both context-faithfulness and factual accuracy. While extensive efforts aim to reduce hallucinations through factuality enhancement methods, these methods also pose a risk of hindering context-faithfulness: factuality enhancement can lead LLMs to become overly confident in their parametric knowledge, causing them to overlook the relevant input context. In this work, we argue that current factuality enhancement methods can significantly undermine the context-faithfulness of LLMs. We first revisit the current factuality enhancement methods and evaluate their effectiveness in enhancing factual accuracy. Next, we evaluate their performance on knowledge editing tasks to assess the potential impact on context-faithfulness. The experimental results reveal that while these methods may yield inconsistent improvements in factual accuracy, they also cause a more severe decline in context-faithfulness, with the largest decrease reaching a striking 69.7%. To explain these declines, we analyze the hidden states and logit distributions for the tokens representing new knowledge and parametric knowledge, respectively, highlighting the limitations of current approaches. Our findings highlight the complex trade-offs inherent in enhancing LLMs. We therefore recommend that future research on factuality enhancement strive to minimize the sacrifice of context-faithfulness.

ICLR Conference 2024 Conference Paper

A Topological Perspective on Demystifying GNN-Based Link Prediction Performance

  • Yu Wang 0160
  • Tong Zhao 0003
  • Yuying Zhao
  • Yunchao Liu 0001
  • Xueqi Cheng
  • Neil Shah
  • Tyler Derr

Graph Neural Networks (GNNs) have shown great promise in learning node embeddings for link prediction (LP). While numerous studies improve GNNs' overall LP performance, none have explored their varying performance across different nodes and the underlying reasons. To this end, we demystify which nodes perform better from the perspective of their local topology. Despite the widespread belief that low-degree nodes exhibit worse LP performance, we surprisingly observe an inconsistent performance trend. This prompts us to propose a node-level metric, Topological Concentration (TC), based on the intersection of the local subgraph of each node with the ones of its neighbors. We empirically demonstrate that TC correlates with LP performance more strongly than other node-level topological metrics, identifying low-performing nodes better than degree does. With TC, we discover a novel topological distribution shift issue, in which nodes' newly joined neighbors tend to become less interactive with their existing neighbors, compromising the generalizability of node embeddings for LP at testing time. To make the computation of TC scalable, we further propose Approximated Topological Concentration (ATC) and justify its efficacy in approximating TC with reduced computational complexity. Given the positive correlation between a node's TC and its LP performance, we explore the potential of boosting LP performance by enhancing TC through re-weighting edges in message passing, and discuss its effectiveness and limitations. Our code is publicly available at https://github.com/YuWVandy/Topo_LP_GNN.
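
For intuition, a simplified one-hop version of Topological Concentration can be computed with NetworkX as below; the paper's definition also involves higher-order neighborhoods and weighting, so treat this as a toy reading of the "intersection of local subgraphs" idea.

```python
import networkx as nx

def topological_concentration(G: nx.Graph, v) -> float:
    """One-hop toy TC: average Jaccard overlap between v's neighborhood
    and each neighbor's neighborhood."""
    Nv = set(G.neighbors(v))
    if not Nv:
        return 0.0
    return sum(len(Nv & set(G.neighbors(u))) / len(Nv | set(G.neighbors(u)))
               for u in Nv) / len(Nv)

# e.g. topological_concentration(nx.karate_club_graph(), 0)
```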

NeurIPS Conference 2024 Conference Paper

CausalDiff: Causality-Inspired Disentanglement via Diffusion Model for Adversarial Defense

  • Mingkun Zhang
  • Keping Bi
  • Wei Chen
  • Quanrun Chen
  • Jiafeng Guo
  • Xueqi Cheng

Despite ongoing efforts to defend neural classifiers from adversarial attacks, they remain vulnerable, especially to unseen attacks. In contrast, humans are hard to fool with subtle manipulations, since we make judgments based only on essential factors. Inspired by this observation, we attempt to model label generation with essential label-causative factors and incorporate label-non-causative factors to assist data generation. For an adversarial example, we aim to discriminate the perturbations as non-causative factors and make predictions based only on the label-causative factors. Concretely, we propose a causal diffusion model (CausalDiff) that adapts diffusion models for conditional data generation and disentangles the two types of causal factors by learning towards a novel causal information bottleneck objective. Empirically, CausalDiff has significantly outperformed state-of-the-art defense methods on various unseen attacks, achieving an average robustness of 86.39% (+4.01%) on CIFAR-10, 56.25% (+3.13%) on CIFAR-100, and 82.62% (+4.93%) on GTSRB (German Traffic Sign Recognition Benchmark). The code is available at https://github.com/CAS-AISafetyBasicResearchGroup/CausalDiff.

ECAI Conference 2024 Conference Paper

Classifier Guidance Enhances Diffusion-Based Adversarial Purification by Preserving Predictive Information

  • Mingkun Zhang
  • Jianing Li
  • Wei Chen 0034
  • Jiafeng Guo
  • Xueqi Cheng

Adversarial purification is one of the promising approaches to defending neural networks against adversarial attacks. Recently, methods utilizing diffusion probabilistic models have achieved great success for adversarial purification in image classification tasks. However, such methods face a dilemma between noise removal and information preservation. This paper points out that existing adversarial purification methods based on diffusion models gradually lose sample information during the core denoising process, causing occasional label shift in subsequent classification tasks. As a remedy, we suggest suppressing such information loss by introducing guidance from the classifier confidence. Specifically, we propose the Classifier-cOnfidence gUided Purification (COUP) algorithm, which purifies adversarial examples while keeping away from the classifier's decision boundary. Experimental results show that COUP achieves better adversarial robustness under strong attack methods.

NeurIPS Conference 2024 Conference Paper

Generative Retrieval Meets Multi-Graded Relevance

  • Yubao Tang
  • Ruqing Zhang
  • Jiafeng Guo
  • Maarten de Rijke
  • Wei Chen
  • Xueqi Cheng

Generative retrieval represents a novel approach to information retrieval, utilizing an encoder-decoder architecture to directly produce relevant document identifiers (docids) for queries. While this method offers benefits, current implementations are limited to scenarios with binary relevance data, overlooking the potential for documents to have multi-graded relevance. Extending generative retrieval to accommodate multi-graded relevance poses challenges, including the need to reconcile likelihood probabilities for docid pairs and the possibility of multiple relevant documents sharing the same identifier. To address these challenges, we introduce a new framework called GRaded Generative Retrieval (GR²). Our approach focuses on two key components: ensuring relevant and distinct identifiers, and implementing multi-graded constrained contrastive training. First, we aim to create identifiers that are both semantically relevant and sufficiently distinct to represent individual documents effectively. This is achieved by jointly optimizing the relevance and distinctness of docids through a combination of docid generation and autoencoder models. Second, we incorporate information about the relationship between relevance grades to guide the training process. Specifically, we leverage a constrained contrastive training strategy to bring the representations of queries and the identifiers of their relevant documents closer together, based on their respective relevance grades. Extensive experiments on datasets with both multi-graded and binary relevance demonstrate the effectiveness of our method.

AAAI Conference 2024 Conference Paper

PDE+: Enhancing Generalization via PDE with Adaptive Distributional Diffusion

  • Yige Yuan
  • Bingbing Xu
  • Bo Lin
  • Liang Hou
  • Fei Sun
  • Huawei Shen
  • Xueqi Cheng

The generalization of neural networks is a central challenge in machine learning, especially concerning performance under distributions that differ from the training ones. Current methods, mainly based on the data-driven paradigm such as data augmentation, adversarial training, and noise injection, may encounter limited generalization due to model non-smoothness. In this paper, we propose to investigate generalization from a Partial Differential Equation (PDE) perspective, aiming to enhance it directly through the underlying function of neural networks, rather than focusing on adjusting input data. Specifically, we first establish the connection between neural network generalization and the smoothness of the solution to a specific PDE, namely the transport equation. Building upon this, we propose a general framework that introduces adaptive distributional diffusion into the transport equation to enhance the smoothness of its solution, thereby improving generalization. In the context of neural networks, we put this theoretical framework into practice as PDE+ (PDE with Adaptive Distributional Diffusion), which diffuses each sample into a distribution covering semantically similar inputs. This enables better coverage of potentially unobserved distributions in training, thus improving generalization beyond merely data-driven methods. The effectiveness of PDE+ is validated through extensive experimental settings, demonstrating its superior performance compared to state-of-the-art methods. Our code is available at https://github.com/yuanyige/pde-add.

AAAI Conference 2024 Conference Paper

Perturbation-Invariant Adversarial Training for Neural Ranking Models: Improving the Effectiveness-Robustness Trade-Off

  • Yu-An Liu
  • Ruqing Zhang
  • Mingkun Zhang
  • Wei Chen
  • Maarten de Rijke
  • Jiafeng Guo
  • Xueqi Cheng

Neural ranking models (NRMs) have shown great success in information retrieval (IR). But their predictions can easily be manipulated using adversarial examples, which are crafted by adding imperceptible perturbations to legitimate documents. This vulnerability raises significant concerns about their reliability and hinders the widespread deployment of NRMs. By incorporating adversarial examples into training data, adversarial training has become the de facto defense approach to adversarial attacks against NRMs. However, this defense mechanism is subject to a trade-off between effectiveness and adversarial robustness. In this study, we establish theoretical guarantees regarding the effectiveness-robustness trade-off in NRMs. We decompose the robust ranking error into two components, i.e., a natural ranking error for effectiveness evaluation and a boundary ranking error for assessing adversarial robustness. Then, we define the perturbation invariance of a ranking model and prove it to be a differentiable upper bound on the boundary ranking error, which makes the computation attainable. Informed by our theoretical analysis, we design a novel perturbation-invariant adversarial training (PIAT) method for ranking models to achieve a better effectiveness-robustness trade-off. We design a regularized surrogate loss in which one term encourages the effectiveness to be maximized while the regularization term encourages the output to be smooth, so as to improve adversarial robustness. Experimental results on several ranking models demonstrate the superiority of PIAT compared to existing adversarial defenses.

NeurIPS Conference 2024 Conference Paper

Understanding and Improving Adversarial Collaborative Filtering for Robust Recommendation

  • Kaike Zhang
  • Qi Cao
  • Yunfan Wu
  • Fei Sun
  • Huawei Shen
  • Xueqi Cheng

Adversarial Collaborative Filtering (ACF), which typically applies adversarial perturbations to user and item embeddings through adversarial training, is widely recognized as an effective strategy for enhancing the robustness of Collaborative Filtering (CF) recommender systems against poisoning attacks. Moreover, numerous studies have empirically shown that ACF can also improve recommendation performance compared to traditional CF. Despite these empirical successes, the theoretical understanding of ACF's effectiveness in terms of both performance and robustness remains unclear. To bridge this gap, in this paper, we first theoretically show that ACF can achieve a lower recommendation error than traditional CF with the same number of training epochs, in both clean and poisoned data contexts. Furthermore, by establishing bounds on the reductions in recommendation error during ACF's optimization process, we find that applying personalized magnitudes of perturbation for different users based on their embedding scales can further improve ACF's effectiveness. Building on these theoretical understandings, we propose Personalized Magnitude Adversarial Collaborative Filtering (PamaCF). Extensive experiments demonstrate that PamaCF effectively defends against various types of poisoning attacks while significantly enhancing recommendation performance.
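
The personalized-magnitude idea admits a compact sketch: scale each user's perturbation radius by the norm of their embedding before taking a normalized gradient step. Where exactly PamaCF applies the perturbation and how it sets the scale may differ from this illustration.

```python
import torch

def personalized_perturb(user_emb: torch.Tensor, grad: torch.Tensor,
                         base_eps: float = 0.1) -> torch.Tensor:
    """FGSM-style adversarial perturbation of user embeddings with a
    per-user radius proportional to the embedding norm (illustrative)."""
    eps = base_eps * user_emb.norm(dim=-1, keepdim=True)
    delta = eps * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)
    return user_emb + delta  # use inside the adversarial training step
```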

AAAI Conference 2023 Conference Paper

A Provable Framework of Learning Graph Embeddings via Summarization

  • Houquan Zhou
  • Shenghua Liu
  • Danai Koutra
  • Huawei Shen
  • Xueqi Cheng

Given a large graph, can we learn its node embeddings from a smaller summary graph? What is the relationship between embeddings learned from original graphs and their summary graphs? Graph representation learning plays an important role in many graph mining applications, but learning embeddings of large-scale graphs remains a challenge. Recent works try to alleviate it via graph summarization, which typically includes three steps: reducing the graph size by combining nodes and edges into supernodes and superedges, learning the supernode embeddings on the summary graph, and then restoring the embeddings of the original nodes. However, the justification behind those steps is still unknown. In this work, we propose GELSUMM, a well-formulated graph embedding learning framework based on graph summarization, in which we show the theoretical ground of learning from summary graphs and the restoration with three well-known graph embedding approaches in a closed form. Through extensive experiments on real-world datasets, we demonstrate that our methods can learn graph embeddings with matching or better performance on downstream tasks. This work provides theoretical analysis for learning node embeddings via summarization and helps explain and understand the mechanism of the existing works.

NeurIPS Conference 2023 Conference Paper

Augmentation-Aware Self-Supervision for Data-Efficient GAN Training

  • Liang Hou
  • Qi Cao
  • Yige Yuan
  • Songtao Zhao
  • Chongyang Ma
  • Siyuan Pan
  • Pengfei Wan
  • Zhongyuan Wang

Training generative adversarial networks (GANs) with limited data is challenging because the discriminator is prone to overfitting. Previously proposed differentiable augmentation demonstrates improved data efficiency of training GANs. However, the augmentation implicitly introduces undesired invariance to augmentation for the discriminator since it ignores the change of semantics in the label space caused by data transformation, which may limit the representation learning ability of the discriminator and ultimately affect the generative modeling performance of the generator. To mitigate the negative impact of invariance while inheriting the benefits of data augmentation, we propose a novel augmentation-aware self-supervised discriminator that predicts the augmentation parameter of the augmented data. Particularly, the prediction targets of real data and generated data are required to be distinguished since they are different during training. We further encourage the generator to adversarially learn from the self-supervised discriminator by generating augmentation-predictable real and not fake data. This formulation connects the learning objective of the generator and the arithmetic-harmonic mean divergence under certain assumptions. We compare our method with state-of-the-art (SOTA) methods using the class-conditional BigGAN and unconditional StyleGAN2 architectures on data-limited CIFAR-10, CIFAR-100, FFHQ, LSUN-Cat, and five low-shot datasets. Experimental results demonstrate significant improvements of our method over SOTA methods in training data-efficient GANs.

AAAI Conference 2023 Conference Paper

Learning Adversarially Robust Sparse Networks via Weight Reparameterization

  • Chenhao Li
  • Qiang Qiu
  • Zhibin Zhang
  • Jiafeng Guo
  • Xueqi Cheng

Although increasing model size can enhance the adversarial robustness of deep neural networks, resource-constrained environments impose critical sparsity constraints. While recent robust pruning technologies show a promising direction for obtaining adversarially robust sparse networks, they perform poorly at high sparsity. In this work, we bridge this performance gap by reparameterizing network parameters to simultaneously learn the sparse structure and the robustness. Specifically, we introduce Twin-Rep, which reparameterizes the original weights into the product of two factors during training and performs pruning on the reparameterized weights to satisfy the target sparsity constraint. Twin-Rep implicitly adds the sparsity constraint without changing the robust training objective, and thus can enhance robustness under high sparsity. We also introduce another variant of weight reparameterization for better channel pruning. At inference time, we restore the original weight structure to obtain compact and robust networks. Extensive experiments on diverse datasets demonstrate that our method achieves state-of-the-art results, outperforming the current sparse robust training method and robustness-aware pruning method. Our code is available at https://github.com/UCAS-LCH/Twin-Rep.

AAAI Conference 2023 Conference Paper

Rich Event Modeling for Script Event Prediction

  • Long Bai
  • Saiping Guan
  • Zixuan Li
  • Jiafeng Guo
  • Xiaolong Jin
  • Xueqi Cheng

Script is a kind of structured knowledge extracted from texts, which contains a sequence of events. Based on such knowledge, script event prediction aims to predict the subsequent event. To do so, two aspects should be considered for events, namely, event description (i.e., what the events should contain) and event encoding (i.e., how they should be encoded). Most existing methods describe an event by a verb together with a few core arguments (i.e., subject, object, and indirect object), which are not precise enough. In addition, existing event encoders are limited to a fixed number of arguments, which are not flexible enough to deal with extra information. Thus, in this paper, we propose the Rich Event Prediction (REP) framework for script event prediction. Fundamentally, it is based on the proposed rich event description, which enriches the existing ones with three kinds of important information, namely, the senses of verbs, extra semantic roles, and types of participants. REP contains an event extractor to extract such information from texts. Based on the extracted rich information, a predictor then selects the most probable subsequent event. The core component of the predictor is a transformer-based event encoder that integrates the above information flexibly. Experimental results on the widely used Gigaword Corpus show the effectiveness of the proposed framework.

ICML Conference 2022 Conference Paper

Conditional GANs with Auxiliary Discriminative Classifier

  • Liang Hou
  • Qi Cao 0005
  • Huawei Shen
  • Siyuan Pan
  • Xiaoshuang Li
  • Xueqi Cheng

Conditional generative models aim to learn the underlying joint distribution of data and labels to achieve conditional data generation. Among them, the auxiliary classifier generative adversarial network (AC-GAN) has been widely used, but suffers from the problem of low intra-class diversity of the generated samples. The fundamental reason pointed out in this paper is that the classifier of AC-GAN is generator-agnostic, which therefore cannot provide informative guidance for the generator to approach the joint distribution, resulting in a minimization of the conditional entropy that decreases the intra-class diversity. Motivated by this understanding, we propose a novel conditional GAN with an auxiliary discriminative classifier (ADC-GAN) to resolve the above problem. Specifically, the proposed auxiliary discriminative classifier becomes generator-aware by recognizing the class-labels of the real data and the generated data discriminatively. Our theoretical analysis reveals that the generator can faithfully learn the joint distribution even without the original discriminator, making the proposed ADC-GAN robust to the value of the coefficient hyperparameter and the selection of the GAN loss, and stable during training. Extensive experimental results on synthetic and real-world datasets demonstrate the superiority of ADC-GAN in conditional generative modeling compared to state-of-the-art classifier-based and projection-based conditional GANs.

IJCAI Conference 2022 Conference Paper

MGAD: Learning Descriptional Representation Distilled from Distributional Semantics for Unseen Entities

  • Yuanzheng Wang
  • Xueqi Cheng
  • Yixing Fan
  • Xiaofei Zhu
  • Huasheng Liang
  • Qiang Yan
  • Jiafeng Guo

Entity representation plays a central role in building effective entity retrieval models. Recent works propose to learn entity representations based on entity-centric contexts, achieving SOTA performance on many tasks. However, these methods produce poor representations for unseen entities, since they rely on a multitude of occurrences of each entity to enable accurate representations. To address this issue, we propose to learn enhanced descriptional representations for unseen entities by distilling knowledge from distributional semantics into descriptional embeddings. Specifically, we infer enhanced embeddings for unseen entities based on descriptions by aligning the descriptional embedding space to the distributional embedding space at different granularities, i.e., element-level, batch-level, and space-level alignment. Experimental results on four benchmark datasets show that our approach improves performance over all baseline methods. In particular, our approach can achieve the effectiveness of the teacher model on almost all entities, and maintains this high performance on unseen entities.

AAAI Conference 2021 Conference Paper

AugSplicing: Synchronized Behavior Detection in Streaming Tensors

  • Jiabao Zhang
  • Shenghua Liu
  • Wenting Hou
  • Siddharth Bhatia
  • Huawei Shen
  • Wenjian Yu
  • Xueqi Cheng

How can we track synchronized behavior in a stream of timestamped tuples, such as mobile devices installing and uninstalling applications in lockstep to boost their ranks in the app store? We model such tuples as entries in a streaming tensor, which augments attribute sizes in its modes over time. Synchronized behavior tends to form dense blocks (i.e., subtensors) in such a tensor, signaling anomalous behavior or interesting communities. However, existing dense block detection methods are either based on a static tensor or lack an efficient algorithm in a streaming setting. Therefore, we propose a fast streaming algorithm, AugSplicing, which can detect the top dense blocks by incrementally splicing the previous detection with the incoming ones in new tuples, avoiding re-runs over all the historical data at every tracking time step. AugSplicing is based on a splicing condition that guides the algorithm. Compared to the state-of-the-art methods, our method is (1) effective, detecting fraudulent behavior in installation data of real-world apps and finding a synchronized group of students with interesting features in campus Wi-Fi data; (2) robust, with a splicing theory for dense block detection; (3) streaming and faster than the existing streaming algorithm, with closely comparable accuracy.

AAAI Conference 2021 Conference Paper

Learning to Truncate Ranked Lists for Information Retrieval

  • Chen Wu
  • Ruqing Zhang
  • Jiafeng Guo
  • Yixing Fan
  • Yanyan Lan
  • Xueqi Cheng

Ranked list truncation is of critical importance in a variety of professional information retrieval applications such as patent search or legal search. The goal is to dynamically determine the number of returned documents according to some user-defined objectives, in order to reach a balance between the overall utility of the results and user effort. Existing methods formulate this task as a sequential decision problem and take some pre-defined loss as a proxy objective, which suffers from the limitations of local decision-making and indirect optimization. In this work, we propose a global-decision-based truncation model named AttnCut, which directly optimizes user-defined objectives for ranked list truncation. Specifically, we adopt the successful transformer architecture to capture the global dependency within the ranked list for the truncation decision, and employ reward augmented maximum likelihood (RAML) for direct optimization. We consider two types of user-defined objectives of practical usage. One is a widely adopted metric such as F1, which acts as a balanced objective; the other is the best F1 under some minimal recall constraint, which represents a typical objective in professional search. Empirical results on the Robust04 and MQ2007 datasets demonstrate the effectiveness of our approach as compared with state-of-the-art baselines.

AAAI Conference 2021 Conference Paper

SDGNN: Learning Node Representation for Signed Directed Networks

  • Junjie Huang
  • Huawei Shen
  • Liang Hou
  • Xueqi Cheng

Network embedding aims at mapping nodes in a network into low-dimensional vector representations. Graph Neural Networks (GNNs) have received widespread attention and achieve state-of-the-art performance in learning node representations. However, most GNNs only work in unsigned networks, where only positive links exist. It is not trivial to transfer these models to signed directed networks, which are widely observed in the real world yet less studied. In this paper, we first review two fundamental sociological theories (i.e., status theory and balance theory) and conduct empirical studies on real-world datasets to analyze the social mechanisms in signed directed networks. Guided by these sociological theories, we propose a novel Signed Directed Graph Neural Network model named SDGNN to learn node embeddings for signed directed networks. The proposed model simultaneously reconstructs link signs, link directions, and signed directed triangles. We validate our model's effectiveness on five real-world datasets, which are commonly used as benchmarks for signed network embedding. Experiments demonstrate that the proposed model outperforms existing models, including feature-based methods, network embedding methods, and several GNN methods.

NeurIPS Conference 2021 Conference Paper

Self-Supervised GANs with Label Augmentation

  • Liang Hou
  • Huawei Shen
  • Qi Cao
  • Xueqi Cheng

Recently, transformation-based self-supervised learning has been applied to generative adversarial networks (GANs) to mitigate catastrophic forgetting in the discriminator by introducing a stationary learning environment. However, the separate self-supervised tasks in existing self-supervised GANs cause a goal inconsistent with generative modeling, because their self-supervised classifiers are agnostic to the generator distribution. To address this problem, we propose a novel self-supervised GAN that unifies the GAN task with the self-supervised task by augmenting the GAN labels (real or fake) via self-supervision of data transformation. Specifically, the original discriminator and self-supervised classifier are unified into a label-augmented discriminator that predicts the augmented labels to be aware of both the generator distribution and the data distribution under every transformation, and then provides the discrepancy between them to optimize the generator. Theoretically, we prove that the optimal generator can converge to replicate the real data distribution. Empirically, we show that the proposed method significantly outperforms previous self-supervised and data-augmentation GANs on both generative modeling and representation learning across benchmark datasets.
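
A compact sketch of the label-augmentation objective: with K transformations, the discriminator solves a single 2K-way classification over (real/fake) x (transformation), instead of separate GAN and self-supervised tasks. Shapes and the generator loss here are illustrative simplifications of the paper's objective.

```python
import torch.nn.functional as F

def label_augmented_losses(logits_real, logits_fake, t_real, t_fake, K):
    """logits_*: (B, 2K) discriminator outputs; t_*: transformation ids
    in [0, K). Classes 0..K-1 mean (real, t); classes K..2K-1 mean
    (fake, t), so the discriminator stays aware of both the data and
    the generator distribution under every transformation."""
    d_loss = (F.cross_entropy(logits_real, t_real)
              + F.cross_entropy(logits_fake, t_fake + K))
    # The generator pushes its samples toward the (real, t) classes.
    g_loss = F.cross_entropy(logits_fake, t_fake)
    return d_loss, g_loss
```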

AAAI Conference 2021 Conference Paper

Sketch and Customize: A Counterfactual Story Generator

  • Changying Hao
  • Liang Pang
  • Yanyan Lan
  • Yan Wang
  • Jiafeng Guo
  • Xueqi Cheng

Recent text generation models readily generate relevant and fluent text for a given input, yet they lack causal reasoning ability when some parts of that input are changed. Counterfactual story rewriting is a recently proposed task to test the causal reasoning ability of text generation models: a model must predict the corresponding story ending when the condition is modified to a counterfactual one. Previous works have shown that the traditional sequence-to-sequence model cannot handle this problem well, as it often captures spurious correlations between the original and counterfactual endings instead of the causal relations between conditions and endings. To address this issue, we propose a sketch-and-customize generation model guided by the causality implicated in the conditions and endings. In the sketch stage, a skeleton is extracted from the original ending by removing words that conflict with the counterfactual condition. In the customize stage, a generation model fills proper words into the skeleton under the guidance of the counterfactual condition. In this way, the obtained counterfactual ending is both relevant to the original ending and consistent with the counterfactual condition. Experimental results show that the proposed model generates much better endings than the traditional sequence-to-sequence model.

AAAI Conference 2021 Conference Paper

Slimmable Generative Adversarial Networks

  • Liang Hou
  • Zehuan Yuan
  • Lei Huang
  • Huawei Shen
  • Xueqi Cheng
  • Changhu Wang

Generative adversarial networks (GANs) have achieved remarkable progress in recent years, but the continuously growing scale of models makes them challenging to deploy widely in practical applications. In particular, for real-time generation tasks, different devices require generators of different sizes due to varying computing power. In this paper, we introduce slimmable GANs (SlimGANs), which can flexibly switch the width of the generator to accommodate various quality-efficiency trade-offs at runtime. Specifically, we leverage multiple discriminators that share partial parameters to train the slimmable generator. To facilitate consistency between generators of different widths, we present a stepwise in-place distillation technique that encourages narrow generators to learn from wide ones. For class-conditional generation, we propose a sliceable conditional batch normalization that incorporates the label information into different widths. Our methods are validated, both quantitatively and qualitatively, by extensive experiments and a detailed ablation study.

AAAI Conference 2021 Conference Paper

Towards Consumer Loan Fraud Detection: Graph Neural Networks with Role-Constrained Conditional Random Field

  • Bingbing Xu
  • Huawei Shen
  • Bingjie Sun
  • Rong An
  • Qi Cao
  • Xueqi Cheng

Consumer loans, i.e., loans that finance consumers' purchases of certain types of expenditures, are increasingly popular on e-commerce platforms. Different from traditional loans with mortgages, online consumer loans take only personal credit as collateral. Consequently, loan fraud detection is particularly critical for lenders to avoid economic loss. Previous methods mainly leverage an applicant's attributes and historical behavior for loan fraud detection. Although these methods succeed at detecting potential charge-offs, they perform worse when multiple persons with various roles (e.g., sellers, intermediaries) collude to apply for fraudulent loans. To combat this challenge, we consider the problem of loan fraud detection by exploiting the roles of users and multi-type social relationships among users. We propose a novel Graph neural network with a Role-constrained Conditional random field, namely GRC, to learn representations of applicants and detect loan fraud based on the learned representations. The proposed model characterizes the multiple types of relationships via a self-attention mechanism and employs a conditional random field to constrain users with the same role to have similar representations. We validate the proposed model through experiments in a large-scale auto-loan scenario. Extensive experiments demonstrate that our model achieves state-of-the-art results in loan fraud detection on Alipay, an online credit payment service serving more than 450 million users in China.

NeurIPS Conference 2021 Conference Paper

Uncertainty Calibration for Ensemble-Based Debiasing Methods

  • Ruibin Xiong
  • Yimeng Chen
  • Liang Pang
  • Xueqi Cheng
  • Zhi-Ming Ma
  • Yanyan Lan

Ensemble-based debiasing methods have been shown effective in mitigating the reliance of classifiers on specific dataset bias, by exploiting the output of a bias-only model to adjust the learning target. In this paper, we focus on the bias-only model in these ensemble-based methods, which plays an important role but has not gained much attention in the existing literature. Theoretically, we prove that the debiasing performance can be damaged by inaccurate uncertainty estimations of the bias-only model. Empirically, we show that existing bias-only models fall short in producing accurate uncertainty estimations. Motivated by these findings, we propose to conduct calibration on the bias-only model, thus achieving a three-stage ensemble-based debiasing framework, including bias modeling, model calibrating, and debiasing. Experimental results on NLI and fact verification tasks show that our proposed three-stage debiasing framework consistently outperforms the traditional two-stage one in out-of-distribution accuracy.

ECAI Conference 2020 Conference Paper

Dual Rejection Sampling for Wasserstein Auto-Encoders

  • Liang Hou
  • Huawei Shen
  • Xueqi Cheng

Deep generative models enhanced by the Wasserstein distance have achieved remarkable success in recent years. Wasserstein Auto-Encoders (WAEs) are auto-encoder-based generative models that aim to minimize the Wasserstein distance between the data distribution and the generated distribution. The quality of a WAE's generated samples depends on the distance between the data distribution and the generated distribution. However, WAE actually minimizes a Wasserstein distance between the data distribution and the reconstructed distribution in data space plus a penalty divergence between the aggregated posterior and the prior in latent space, leading to a gap between theory and practice. Consequently, the quality of WAE's generated samples is not satisfactory. In this paper, we propose a novel dual rejection sampling method to improve the generated samples of WAE in the sampling phase. The proposed method first corrects the generative prior via a discriminator-based rejection sampling scheme in latent space and then rectifies the generated distribution via another discriminator-based rejection sampling in data space. Our method is validated, both qualitatively and quantitatively, by extensive experiments on three real-world datasets.
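
The shared building block is discriminator-based rejection sampling, applied here once in latent space and once in data space. A generic sketch using the standard density-ratio acceptance rule (the cap M and clipping are common practical choices, not necessarily the paper's exact scheme):

```python
import numpy as np

def rejection_sample(propose, disc, n: int, m_cap: float = 20.0):
    """If disc approximates p_target / (p_target + p_proposal), then the
    density ratio is d / (1 - d); accept with probability ratio / M."""
    out = []
    while len(out) < n:
        x = propose()
        d = float(np.clip(disc(x), 1e-6, 1 - 1e-6))
        if np.random.rand() < min(d / (1.0 - d) / m_cap, 1.0):
            out.append(x)
    return out
# Dual scheme: first filter latent codes, then filter decoded samples.
```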

AAAI Conference 2020 Short Paper

Entity Type Enhanced Neural Model for Distantly Supervised Relation Extraction (Student Abstract)

  • Long Bai
  • Xiaolong Jin
  • Chuanzhi Zhuang
  • Xueqi Cheng

Distantly Supervised Relation Extraction (DSRE) has been widely studied, since it can automatically extract relations from very large corpora. However, existing DSRE methods only use little semantic information about entities, such as the information of entity type. Thus, in this paper, we propose a method for integrating entity type information into a neural network based DSRE model. It also adopts two attention mechanisms, namely, sentence attention and type attention. The former selects the representative sentences for a sentence bag, while the latter selects appropriate type information for entities. Experimental comparison with existing methods on a benchmark dataset demonstrates its merits.

IJCAI Conference 2020 Conference Paper

Evaluating Natural Language Generation via Unbalanced Optimal Transport

  • Yimeng Chen
  • Yanyan Lan
  • Ruibin Xiong
  • Liang Pang
  • Zhiming Ma
  • Xueqi Cheng

Embedding-based evaluation measures have shown promising improvements on the correlation with human judgments in natural language generation. In these measures, various intrinsic metrics are used in the computation, including generalized precision, recall, F-score and the earth mover's distance. However, the relations between these metrics are unclear, making it difficult to determine which measure to use in real applications. In this paper, we provide an in-depth study on the relations between these metrics. Inspired by optimal transportation theory, we prove that these metrics correspond to the optimal transport problem with different hard marginal constraints. However, these hard marginal constraints may cause the problem of incomplete and noisy matching in the evaluation process. Therefore we propose a family of new evaluation metrics, namely Lazy Earth Mover's Distances, based on the more general unbalanced optimal transport problem. Experimental results on WMT18 and WMT19 show that our proposed metrics produce evaluation results that are more consistent with human judgments, as compared with existing intrinsic metrics.
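
For reference, the hard-constrained transport problem that the existing intrinsic metrics correspond to, and the unbalanced relaxation that distances like the Lazy Earth Mover's Distances can build on, can be written as follows (our notation; the paper may penalize the marginals differently):

```latex
% Balanced OT: hard marginal constraints force a complete matching.
\min_{\pi \ge 0} \; \langle C, \pi \rangle
\quad \text{s.t.} \quad \pi \mathbf{1} = a, \;\; \pi^{\top} \mathbf{1} = b

% Unbalanced OT: soft KL penalties replace the hard constraints,
% tolerating incomplete and noisy matchings between the texts.
\min_{\pi \ge 0} \; \langle C, \pi \rangle
  + \tau_1 \, \mathrm{KL}\!\left(\pi \mathbf{1} \,\middle\|\, a\right)
  + \tau_2 \, \mathrm{KL}\!\left(\pi^{\top} \mathbf{1} \,\middle\|\, b\right)
```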

AAAI Conference 2020 Conference Paper

FlowScope: Spotting Money Laundering Based on Graphs

  • Xiangfeng Li
  • Shenghua Liu
  • Zifeng Li
  • Xiaotian Han
  • Chuan Shi
  • Bryan Hooi
  • He Huang
  • Xueqi Cheng

Given a graph of the money transfers between accounts of a bank, how can we detect money laundering? Money laundering refers to criminals using the bank’s services to move massive amounts of illegal money to untraceable destination accounts, in order to inject their illegal money into the legitimate financial system. Existing graph fraud detection approaches focus on dense subgraph detection, without considering the fact that money laundering involves high-volume flows of funds through chains of bank accounts, thereby decreasing their detection accuracy. Instead, we propose to model the transactions using a multipartite graph, and detect the complete flow of money from source to destination using a scalable algorithm, FlowScope. Theoretical analysis shows that FlowScope provides guarantees in terms of the amount of money that fraudsters can transfer without being detected. FlowScope outperforms state-of-the-art baselines in accurately detecting the accounts involved in money laundering, in both injected and real-world data settings.

AAAI Conference 2020 Short Paper

Link Prediction between Group Entities in Knowledge Graphs (Student Abstract)

  • Jialin Su
  • Yuanzhuo Wang
  • Xiaolong Jin
  • Yantao Jia
  • Xueqi Cheng

Link prediction in knowledge graphs (KGs) aims at predicting potential links between entities in KGs. Existing knowledge graph embedding (KGE) based methods represent individual entities and links in KGs as vectors in a low-dimensional space. However, these methods focus mainly on link prediction between individual entities, yet neglect link prediction between group entities, which exist widely in real-world KGs. In this paper, we propose a KGE-based method, called GTransA, for link prediction between group entities in a heterogeneous network by integrating individual entity links into group entity links during prediction. Experiments show that GTransA decreases mean rank by 5.4% compared to TransA.

ICML Conference 2020 Conference Paper

On the Relation between Quality-Diversity Evaluation and Distribution-Fitting Goal in Text Generation

  • Jianing Li
  • Yanyan Lan
  • Jiafeng Guo
  • Xueqi Cheng

The goal of text generation models is to fit the underlying real probability distribution of text. For performance evaluation, quality and diversity metrics are usually applied. However, it is still not clear to what extent the quality-diversity evaluation reflects the distribution-fitting goal. In this paper, we investigate this relation theoretically. We prove that under certain conditions, a linear combination of quality and diversity constitutes a divergence metric between the generated distribution and the real distribution. We also show that the commonly used BLEU/Self-BLEU metric pair fails to match any divergence metric, and thus propose CR/NRR as a substitute quality/diversity metric pair.

AAAI Conference 2020 Conference Paper

Structure Learning for Headline Generation

  • Ruqing Zhang
  • Jiafeng Guo
  • Yixing Fan
  • Yanyan Lan
  • Xueqi Cheng

Headline generation is an important problem in natural language processing, which aims to describe a document with a compact and informative headline. Some recent successes on this task have been achieved by advanced graph-based neural models, which marry the representational power of deep neural networks with the structural modeling ability of relational sentence graphs. The advantage of graph-based neural models over traditional Seq2Seq models lies in that they can encode long-distance relationships between sentences beyond the surface linear structure. However, since documents are typically weakly structured data, modern graph-based neural models usually rely on manually designed rules or heuristics to construct the sentence graph a priori. This may largely limit the power and increase the cost of graph-based methods. In this paper, therefore, we propose to incorporate structure learning into graph-based neural models for headline generation. That is, we want to learn the sentence graph automatically in a data-driven way, so that we can unveil the document structure flexibly without prior heuristics or rules. To achieve this goal, we employ a deep & wide network to encode rich relational information between sentences for sentence graph learning. For the deep component, we leverage neural matching models, either representation-focused or interaction-focused, to learn semantic similarity between sentences. For the wide component, we encode a variety of discourse relations between sentences. A Graph Convolutional Network (GCN) is then applied over the sentence graph to generate high-level relational representations for headline generation. The whole model can be optimized end-to-end so that the structure and the representations are learned jointly. Empirical studies show that our model significantly outperforms state-of-the-art headline generation models.
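
The deep-and-wide construction lends itself to a short sketch: build a soft adjacency from sentence-embedding similarity (the "deep" signal; the paper additionally encodes discourse relations as the "wide" signal), then propagate with a standard GCN layer. Function names and the softmax normalization are our illustrative choices.

```python
import torch
import torch.nn.functional as F

def learn_sentence_graph(S: torch.Tensor) -> torch.Tensor:
    """S: (n_sent, d) sentence embeddings -> (n_sent, n_sent) soft
    adjacency from pairwise cosine similarity."""
    sim = F.cosine_similarity(S.unsqueeze(0), S.unsqueeze(1), dim=-1)
    return torch.softmax(sim, dim=-1)

def gcn_layer(A: torch.Tensor, H: torch.Tensor, W: torch.Tensor):
    """One GCN step: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + torch.eye(A.size(0))
    d_inv_sqrt = torch.diag(A_hat.sum(-1).pow(-0.5))
    return F.relu(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W)
```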

AAAI Conference 2019 Short Paper

An Adaptive Framework for Conversational Question Answering

  • Lixin Su
  • Jiafeng Guo
  • Yixing Fan
  • Yanyan Lan
  • Ruqing Zhang
  • Xueqi Cheng

In Conversational Question Answering (CoQA), humans pose a series of questions to satisfy their information needs. Based on our preliminary analysis, there are two major types of questions, namely verification questions and knowledge-seeking questions. The first verifies existing facts, while the latter seeks new knowledge about a specific object. These two types of questions differ significantly in how they should be answered. However, existing methods usually treat them uniformly, which may easily be biased toward the dominant type of questions and yield inferior overall performance. In this work, we propose an adaptive framework that handles these two types of questions in different ways based on their own characteristics. We conduct experiments on the recently released CoQA benchmark dataset, and the results demonstrate that our method outperforms state-of-the-art baseline methods.

IJCAI Conference 2019 Conference Paper

BeatGAN: Anomalous Rhythm Detection using Adversarially Generated Time Series

  • Bin Zhou
  • Shenghua Liu
  • Bryan Hooi
  • Xueqi Cheng
  • Jing Ye

Given a large-scale rhythmic time series containing mostly normal data segments (or "beats"), can we learn how to detect anomalous beats in an effective yet efficient way? For example, how can we detect anomalous beats from electrocardiogram (ECG) readings? Existing approaches either require excessively large amounts of labeled and balanced data for classification, or rely on less regularized reconstructions, resulting in lower accuracy in anomaly detection. Therefore, we propose BeatGAN, an unsupervised anomaly detection algorithm for time series data. BeatGAN outputs explainable results to pinpoint the anomalous time ticks of an input beat, by comparing them to adversarially generated beats. Its robustness is guaranteed by its regularization of reconstruction error using an adversarial generation approach, as well as data augmentation using time series warping. Experiments show that BeatGAN accurately and efficiently detects anomalous beats in ECG time series, and routes doctors' attention to anomalous time ticks, achieving nearly 0.95 AUC and very fast inference (2.6 ms per beat). In addition, we show that BeatGAN accurately detects unusual motions from multivariate motion-capture time series data, illustrating its generality.
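
The detection rule common to reconstruction-based methods like BeatGAN is easy to state: a beat the trained generator cannot reconstruct well is flagged, and per-tick residuals explain where. A minimal sketch of that scoring step, where the reconstruct function is a hypothetical stand-in for the trained generator:

    import numpy as np

    def anomaly_score(beat, reconstruct):
        # Pointwise squared reconstruction error: a beat the generator cannot
        # reconstruct well gets a high overall score, and large per-tick
        # residuals pinpoint (explain) where the anomaly lies.
        residual = (beat - reconstruct(beat)) ** 2
        return residual.sum(), residual

    # Toy usage: a clean "generator" fails to reproduce a spike at tick 30.
    clean = np.sin(np.linspace(0, 2 * np.pi, 60))
    beat = clean.copy(); beat[30] += 3.0
    score, per_tick = anomaly_score(beat, lambda x: clean)
    print(score, per_tick.argmax())  # high score; argmax points at tick 30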

AAAI Conference 2019 Conference Paper

Differentiated Distribution Recovery for Neural Text Generation

  • Jianing Li
  • Yanyan Lan
  • Jiafeng Guo
  • Jun Xu
  • Xueqi Cheng

Neural language models based on recurrent neural networks (RNNLM) have significantly improved the performance of text generation, yet the quality of generated text, as measured by Turing Test pass rate, is still far from satisfactory. Some researchers propose adversarial training or reinforcement learning to promote quality; however, such methods usually introduce great challenges in training and parameter tuning. Through our analysis, we find that the problem of RNNLM comes from the use of maximum likelihood estimation (MLE) as the objective function, which requires the generated distribution to precisely recover the true distribution. This requirement favors high generation diversity, which restricts generation quality. It is not suitable when the overall quality is low, since high generation diversity usually indicates many errors rather than diverse good samples. In this paper, we propose differentiated distribution recovery, DDR for short. The key idea is to make the optimal generation probability proportional to the β-th power of the true probability, where β > 1. In this way, generation quality can be greatly improved by sacrificing the diversity contributed by noise and rare patterns. Experiments on synthetic data and two public text datasets show that our DDR method achieves a more flexible quality-diversity trade-off and a higher Turing Test pass rate than baseline methods including RNNLM, SeqGAN and LeakGAN.
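
The distribution transformation at the heart of DDR can be illustrated in a few lines: sharpening a toy distribution p into one proportional to p^β shifts probability mass toward high-quality modes and away from noise and rare patterns (illustrative values, NumPy):

    import numpy as np

    p = np.array([0.5, 0.3, 0.15, 0.05])  # a toy "true" distribution
    beta = 2.0
    q = p ** beta / (p ** beta).sum()     # DDR target: proportional to p**beta
    print(q)  # ~[0.685, 0.247, 0.062, 0.007]: mass shifts to likely modes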

IJCAI Conference 2019 Conference Paper

Graph Convolutional Networks using Heat Kernel for Semi-supervised Learning

  • Bingbing Xu
  • Huawei Shen
  • Qi Cao
  • Keting Cen
  • Xueqi Cheng

Graph convolutional networks have achieved remarkable success in semi-supervised learning on graph-structured data. The key to graph-based semi-supervised learning is capturing the smoothness of labels or features over nodes exerted by the graph structure. Previous methods, both spectral and spatial, define graph convolution as a weighted average over neighboring nodes, and then learn graph convolution kernels that leverage this smoothness to improve the performance of graph-based semi-supervised learning. One open challenge is how to determine an appropriate neighborhood that reflects the relevant smoothness information manifested in the graph structure. In this paper, we propose GraphHeat, which leverages the heat kernel to enhance low-frequency filters and enforce smoothness in the signal variation on the graph. GraphHeat uses the local structure of a target node under heat diffusion to determine its neighboring nodes flexibly, without the order constraint that previous methods suffer from. GraphHeat achieves state-of-the-art results in graph-based semi-supervised classification on three benchmark datasets: Cora, Citeseer and Pubmed.
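
The heat kernel behind GraphHeat is e^(-sL) for graph Laplacian L and scale s, a low-pass filter that diffuses a node's signal over a flexibly sized neighborhood rather than a fixed-order one. A small illustrative sketch on a toy path graph (SciPy):

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3-node path
    L = np.diag(A.sum(1)) - A     # combinatorial graph Laplacian
    s = 1.0                       # diffusion scale; larger s = smoother signals
    H = expm(-s * L)              # heat kernel: low-pass filter over the graph
    x = np.array([1.0, 0.0, 0.0]) # a signal concentrated on node 0
    print(H @ x)                  # heat diffuses mass toward the neighbors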

AAAI Conference 2019 Conference Paper

HAS-QA: Hierarchical Answer Spans Model for Open-Domain Question Answering

  • Liang Pang
  • Yanyan Lan
  • Jiafeng Guo
  • Jun Xu
  • Lixin Su
  • Xueqi Cheng

This paper is concerned with open-domain question answering (i.e., OpenQA). Recently, some works have viewed this problem as a reading comprehension (RC) task, and directly applied successful RC models to it. However, the performance of such models is not as good as on the RC task. In our opinion, the RC perspective ignores three characteristics of the OpenQA task: 1) many paragraphs without the answer span are included in the data collection; 2) multiple answer spans may exist within one given paragraph; 3) the end position of an answer span depends on its start position. In this paper, we first propose a new probabilistic formulation of OpenQA based on a three-level hierarchical structure, i.e., the question level, the paragraph level and the answer span level. Then a Hierarchical Answer Spans Model (HAS-QA) is designed to capture each probability. HAS-QA is able to tackle the above three problems, and experiments on public OpenQA datasets show that it significantly outperforms traditional RC baselines and recent OpenQA baselines.

AAAI Conference 2019 Short Paper

Teaching Machines to Extract Main Content for Machine Reading Comprehension

  • Zhaohui Li
  • Yue Feng
  • Jun Xu
  • Jiafeng Guo
  • Yanyan Lan
  • Xueqi Cheng

Machine reading comprehension, whose goal is to find answers in candidate passages for a given question, has attracted a lot of research effort in recent years. One of the key challenges in machine reading comprehension is identifying the main content from a large, redundant, and overlapping set of candidate sentences. In this paper we propose to tackle this challenge with a Markov Decision Process, in which main content identification is formalized as sequential decision making and each action corresponds to selecting a sentence. Policy gradient is used to learn the model parameters. Experimental results on MS MARCO show that the proposed model, called MC-MDP, can select high-quality main content and significantly improves the performance of answer span prediction.

IJCAI Conference 2018 Conference Paper

Exploiting POI-Specific Geographical Influence for Point-of-Interest Recommendation

  • Hao Wang
  • Huawei Shen
  • Wentao Ouyang
  • Xueqi Cheng

Point-of-interest (POI) recommendation, i.e., recommending unvisited POIs to users, is a fundamental problem for location-based social networks. POI recommendation distinguishes itself from traditional item recommendation, e.g., movie recommendation, through the geographical influence among POIs. Existing methods model the geographical influence between two POIs as the probability or propensity that the two POIs are co-visited by the same user given their physical distance. These methods assume that geographical influence between POIs is determined solely by their physical distance, failing to capture the asymmetry of geographical influence and its high variation across POIs. In this paper, we exploit POI-specific geographical influence to improve POI recommendation. We model the geographical influence between two POIs using three factors: the geo-influence of a POI, the geo-susceptibility of a POI, and their physical distance. Geo-influence captures a POI's capacity to exert geographical influence on other POIs, and geo-susceptibility reflects a POI's propensity to be geographically influenced by other POIs. Experimental results on two real-world datasets demonstrate that POI-specific geographical influence significantly improves the performance of POI recommendation.

AAAI Conference 2018 Short Paper

Fast Approximate Nearest Neighbor Search via k-Diverse Nearest Neighbor Graph

  • Yan Xiao
  • Jiafeng Guo
  • Yanyan Lan
  • Jun Xu
  • Xueqi Cheng

Approximate nearest neighbor search is a fundamental problem that has been studied for decades. Recently, graph-based indexing methods have demonstrated great efficiency; their main idea is to construct a neighborhood graph offline and perform a greedy search online, starting from sampled points of the graph. Most existing graph-based methods focus on either the precise k-nearest neighbor (k-NN) graph, which has good exploitation ability, or the diverse graph, which has good exploration ability. In this paper, we propose the k-diverse nearest neighbor (k-DNN) graph, which balances the precision and diversity of the graph, achieving good exploitation and exploration abilities simultaneously. We introduce an efficient indexing algorithm for constructing the k-DNN graph, inspired by a well-known diverse ranking algorithm in information retrieval (IR). Experimental results show that our method can outperform both state-of-the-art precise graph and diverse graph methods.
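
The online stage shared by graph-based ANN indexes, including the k-DNN graph, is a greedy walk over the neighborhood graph: repeatedly move to whichever neighbor is closest to the query until no neighbor improves. A minimal sketch under toy assumptions (plain Python/NumPy; hypothetical graph and points, not the paper's index):

    import numpy as np

    def greedy_search(graph, points, query, start):
        # graph: node id -> list of neighbor ids; points: node id -> vector.
        current = start
        while True:
            best, best_d = current, np.linalg.norm(points[current] - query)
            for nb in graph[current]:
                d = np.linalg.norm(points[nb] - query)
                if d < best_d:
                    best, best_d = nb, d
            if best == current:   # no neighbor improves: local minimum reached
                return current, best_d
            current = best

    pts = {i: v for i, v in enumerate(np.random.default_rng(0).normal(size=(6, 2)))}
    g = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 5], 4: [2, 5], 5: [3, 4]}
    print(greedy_search(g, pts, np.zeros(2), start=0))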

IJCAI Conference 2018 Conference Paper

NeuCast: Seasonal Neural Forecast of Power Grid Time Series

  • Pudi Chen
  • Shenghua Liu
  • Chuan Shi
  • Bryan Hooi
  • Bai Wang
  • Xueqi Cheng

In the smart power grid, short-term load forecasting (STLF) is a crucial step in scheduling and planning for future load, improving the reliability of the power grid and reducing its cost and emissions. Unlike traditional time series forecasting, STLF is a more challenging task, because of the complex demand for active and reactive power from numerous categories of electrical loads and the effects of the environment. Therefore, we propose NeuCast, a seasonal neural forecasting method that dynamically models various loads as co-evolving time series in a hidden space, together with extra weather conditions, in a neural network structure. NeuCast captures seasonality and patterns of the time series by integrating factor modeling and hidden state recognition. NeuCast can also detect anomalies and forecast under different temperature assumptions. Extensive experiments on 134 real-world datasets show the improvements of NeuCast over state-of-the-art methods.

IJCAI Conference 2018 Conference Paper

Reinforcing Coherence for Sequence to Sequence Model in Dialogue Generation

  • Hainan Zhang
  • Yanyan Lan
  • Jiafeng Guo
  • Jun Xu
  • Xueqi Cheng

The sequence-to-sequence (Seq2Seq) approach has gained great attention in the field of single-turn dialogue generation. However, one serious problem is that most existing Seq2Seq-based models tend to generate common responses lacking specific meaning. Our analysis shows that the underlying reason is that Seq2Seq is equivalent to optimizing the Kullback–Leibler (KL) divergence, and thus does not penalize the case where the generated probability is high while the true probability is low. However, the true probability is unknown, which makes this problem hard to tackle. Inspired by the fact that the coherence (i.e., similarity) between post and response is consistent with human evaluation, we hypothesize that the true probability of a response is proportional to its coherence degree. The coherence scores are then used as the reward function in a reinforcement learning framework to penalize the case where the generated probability is high while the true probability is low. Three different types of coherence models, including an unlearned similarity function, a pretrained semantic matching function, and an end-to-end dual learning architecture, are proposed in this paper. Experimental results on both a Chinese Weibo dataset and an English Subtitle dataset show that the proposed models produce more specific and meaningful responses, yielding better performance than Seq2Seq models in terms of both metric-based and human evaluations.

AAAI Conference 2018 Conference Paper

Towards Efficient Detection of Overlapping Communities in Massive Networks

  • Bing-Jie Sun
  • Huawei Shen
  • Jinhua Gao
  • Wentao Ouyang
  • Xueqi Cheng

Community detection is essential to analyzing and exploring natural networks such as social networks, biological networks, and citation networks. However, few methods can be used as off-the-shelf tools to detect communities in real-world networks, for two reasons. On the one hand, most existing methods for community detection cannot handle massive networks that contain millions or even hundreds of millions of nodes. On the other hand, communities in real-world networks generally overlap heavily, requiring that a community detection method capture mixed community membership. In this paper, we aim to offer an off-the-shelf method to detect overlapping communities in massive real-world networks. For this purpose, we take the widely used Poisson model for overlapping community detection as a starting point and design two speedup strategies to achieve high efficiency. Extensive tests on synthetic and large-scale real networks demonstrate that the proposed strategies speed up the Poisson-model-based community detection method by one to two orders of magnitude, while achieving comparable accuracy in community detection.

IJCAI Conference 2017 Conference Paper

Cascade Dynamics Modeling with Attention-based Recurrent Neural Network

  • Yongqing Wang
  • Huawei Shen
  • Shenghua Liu
  • Jinhua Gao
  • Xueqi Cheng

The ability to model and predict resharing cascades is crucial to understanding information propagation and to launching viral marketing campaigns. Conventional methods for cascade prediction depend heavily on the hypotheses of diffusion models, e.g., the independent cascade model and the linear threshold model. Recently, researchers have attempted to circumvent this problem using sequential models (e.g., recurrent neural networks, RNNs) that do not require knowing the underlying diffusion model. Existing sequential models employ a chain structure to capture the memory effect. However, for cascade prediction, each cascade generally corresponds to a diffusion tree, causing cross-dependence within the cascade: one sharing behavior can be triggered by a non-immediate predecessor in the memory chain. In this paper, we propose an attention-based RNN to capture this cross-dependence. Furthermore, we introduce a coverage strategy to combat the misallocation of attention caused by the memorylessness of the traditional attention mechanism. Extensive experiments on both synthetic and real-world datasets demonstrate that the proposed models outperform state-of-the-art models at both cascade prediction and inferring the diffusion tree.

IJCAI Conference 2017 Conference Paper

Cross-Domain Recommendation: An Embedding and Mapping Approach

  • Tong Man
  • Huawei Shen
  • Xiaolong Jin
  • Xueqi Cheng

Data sparsity is one of the most challenging problems for recommender systems. One promising solution is cross-domain recommendation, i.e., leveraging feedback or ratings from multiple domains to improve recommendation performance collectively. In this paper, we propose an Embedding and Mapping framework for Cross-Domain Recommendation, called EMCDR. The proposed EMCDR framework distinguishes itself from existing cross-domain recommendation models in two aspects. First, a multi-layer perceptron is used to capture the nonlinear mapping function across domains, which offers high flexibility for learning domain-specific features of entities in each domain. Second, only entities with sufficient data are used to learn the mapping function, guaranteeing its robustness to noise caused by data sparsity in a single domain. Extensive experiments on two cross-domain recommendation scenarios demonstrate that EMCDR significantly outperforms state-of-the-art cross-domain recommendation methods.
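
The mapping step can be pictured as a small supervised regression problem: for entities observed in both domains, fit a nonlinear function f so that f(u_source) ≈ u_target. A toy sketch of that idea with a one-hidden-layer MLP trained by plain gradient descent (NumPy; the actual EMCDR architecture and training details may differ):

    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 16, 200
    Us = rng.normal(size=(n, d))                 # source-domain embeddings (overlap)
    Ut = np.tanh(Us @ rng.normal(size=(d, d)))   # toy target-domain embeddings
    W1 = rng.normal(scale=0.1, size=(d, 32))     # hidden-layer weights
    W2 = rng.normal(scale=0.1, size=(32, d))     # output-layer weights

    for _ in range(500):                         # gradient descent on MSE
        H = np.tanh(Us @ W1)
        G = 2 * (H @ W2 - Ut) / n                # gradient of MSE w.r.t. predictions
        W1 -= 0.1 * Us.T @ ((G @ W2.T) * (1 - H ** 2))
        W2 -= 0.1 * H.T @ G

    print(((np.tanh(Us @ W1) @ W2 - Ut) ** 2).mean())  # fit error shrinks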

TIST Journal 2017 Journal Article

Directly Optimize Diversity Evaluation Measures

  • Jun Xu
  • Long Xia
  • Yanyan Lan
  • Jiafeng Guo
  • Xueqi Cheng

The queries issued to search engines are often ambiguous or multifaceted, which requires search engines to return diverse results that fulfill as many different information needs as possible; this is called search result diversification. Recently, the relational learning-to-rank model, which designs a learnable ranking function following the criterion of maximal marginal relevance, has shown effectiveness in search result diversification [Zhu et al. 2014]. The goodness of a diverse ranking model is usually evaluated with diversity evaluation measures such as α-NDCG [Clarke et al. 2008], ERR-IA [Chapelle et al. 2009], and D#-NDCG [Sakai and Song 2011]. Ideally, the learning algorithm would train a ranking model that directly optimizes the diversity evaluation measures on the training data. Existing relational learning-to-rank algorithms, however, only train ranking models by optimizing loss functions that loosely relate to the evaluation measures. To deal with this problem, we propose a general framework for learning relational ranking models by directly optimizing any diversity evaluation measure. In learning, a loss function upper-bounding the basic loss function defined on a diverse ranking measure is minimized. New diverse ranking algorithms can be derived under this framework, and several are created based on different upper bounds over the basic loss function. We compared the proposed algorithms with conventional diverse ranking methods on the TREC benchmark datasets. Experimental results show that the algorithms derived under the diverse learning-to-rank framework consistently and significantly outperform the state-of-the-art baselines.

IJCAI Conference 2017 Conference Paper

Learning Concise Representations of Users' Influences through Online Behaviors

  • Shenghua Liu
  • Houdong Zheng
  • Huawei Shen
  • Xueqi Cheng
  • Xiangwen Liao

Although it is well known that social network users influence each other, a fundamental problem in influence maximization, opinion formation and viral marketing is that users' influences are difficult to quantify. Previous work directly defines an independent model parameter to capture the interpersonal influence between each pair of users. However, such models do not consider how influences depend on each other when they originate from, or act on, the same user. Moreover, they need a parameter for each pair of users, resulting in high-dimensional models that are easily trapped in overfitting. Given these problems, a different parameterization is needed that accounts for these dependencies. We therefore propose a model that equips every user with a latent influence vector and a susceptibility vector. Such low-dimensional and distributed representations naturally couple the interpersonal influences involving the same user, reducing the model's complexity. Additionally, the model can easily incorporate the sentiment polarities of users' messages and how sentiment affects users' influences. In this study, we conduct extensive experiments on real microblog data, showing that our model with distributed representations achieves better accuracy than state-of-the-art and pairwise models, and that learning influences on sentiments benefits performance.

AAAI Conference 2016 Conference Paper

A Deep Architecture for Semantic Matching with Multiple Positional Sentence Representations

  • Shengxian Wan
  • Yanyan Lan
  • Jiafeng Guo
  • Jun Xu
  • Liang Pang
  • Xueqi Cheng

Matching natural language sentences is central to many applications such as information retrieval and question answering. Existing deep models rely on a single sentence representation or multiple-granularity representations for matching. However, such methods cannot well capture the contextualized local information in the matching process. To tackle this problem, we present a new deep architecture that matches two sentences with multiple positional sentence representations. Specifically, each positional sentence representation is a sentence representation at a given position, generated by a bidirectional long short-term memory network (Bi-LSTM). The matching score is produced by aggregating interactions between these different positional sentence representations, through k-max pooling and a multi-layer perceptron. Our model has several advantages: (1) by using a Bi-LSTM, rich context of the whole sentence is leveraged to capture the contextualized local information in each positional sentence representation; (2) by matching with multiple positional sentence representations, the model can flexibly aggregate the important contextualized local information in a sentence to support the matching; (3) experiments on different tasks such as question answering and sentence completion demonstrate the superiority of our model.

AAAI Conference 2016 Conference Paper

Inside Out: Two Jointly Predictive Models for Word Representations and Phrase Representations

  • Fei Sun
  • Jiafeng Guo
  • Yanyan Lan
  • Jun Xu
  • Xueqi Cheng

The distributional hypothesis lies at the root of most existing word representation models, which infer a word's meaning from its external contexts. However, distributional models cannot handle rare and morphologically complex words very well, and they fail to identify some fine-grained linguistic regularities because they ignore word forms. Morphology, by contrast, points out that words are built from basic units, i.e., morphemes. The meaning and function of rare words can therefore be inferred from words sharing the same morphemes, and many syntactic relations can be identified directly from word forms. The limitation of morphology, however, is that it cannot infer the relationship between two words that share no morphemes. Considering the advantages and limitations of both approaches, we propose two novel models, called BEING and SEING, that build better word representations by modeling both external contexts and internal morphemes in a jointly predictive way. These two models can also be extended to learn phrase representations according to distributed morphology theory. We evaluate the proposed models on similarity tasks and analogy tasks. The results demonstrate that the proposed models significantly outperform state-of-the-art models on both word and phrase representation learning.

AAAI Conference 2016 Conference Paper

Locally Adaptive Translation for Knowledge Graph Embedding

  • Yantao Jia
  • Yuanzhuo Wang
  • Hailun Lin
  • Xiaolong Jin
  • Xueqi Cheng

Knowledge graph embedding aims to represent the entities and relations of a large-scale knowledge graph as elements in a continuous vector space. Existing methods, e.g., TransE and TransH, learn embedding representations by defining a global margin-based loss function over the data. However, the optimal loss function is determined experimentally, with its parameters examined over a closed set of candidates. Moreover, embeddings over two knowledge graphs with different entities and relations share the same set of candidate loss functions, ignoring the locality of each graph. This limits the performance of embedding-related applications. In this paper, we propose a locally adaptive translation method for knowledge graph embedding, called TransA, which finds the optimal loss function by adaptively determining its margin over different knowledge graphs. Experiments on two benchmark datasets demonstrate the superiority of the proposed method compared to state-of-the-art ones.
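
The global margin-based loss that TransA makes adaptive has a standard hinge form: each observed triple should score better (lower distance) than a corrupted one by at least a margin γ. A minimal sketch of that objective for a single pair of triples, with illustrative scores; TransA's contribution is choosing γ adaptively per knowledge graph rather than fixing it by hand:

    def margin_loss(pos_score, neg_score, gamma=1.0):
        # Hinge objective: penalize whenever the corrupted triple's score
        # comes within gamma of the observed triple's score (lower = better).
        return max(0.0, gamma + pos_score - neg_score)

    print(margin_loss(pos_score=0.4, neg_score=1.9))  # 0.0 -> margin satisfied
    print(margin_loss(pos_score=0.4, neg_score=1.1))  # 0.3 -> violation penalized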

TIST Journal 2016 Journal Article

Location Prediction

  • Yantao Jia
  • Yuanzhuo Wang
  • Xiaolong Jin
  • Xueqi Cheng

In social networks, predicting a user’s location mainly depends on those of his/her friends, where the key lies in how to select his/her most influential friends. In this article, we analyze the theoretically maximal accuracy of location prediction based on friends’ locations and compare it with the practical accuracy obtained by the state-of-the-art location prediction methods. Upon observing a big gap between the theoretical and practical accuracy, we propose a new strategy for selecting influential friends in order to improve the practical location prediction accuracy. Specifically, several features are defined to measure the influence of the friends on a user’s location, based on which we put forth a sequential random-walk-with-restart procedure to rank the friends of the user in terms of their influence. By dynamically selecting the top N most influential friends of the user per time slice, we develop a temporal-spatial Bayesian model to characterize the dynamics of friends’ influence for location prediction. Finally, extensive experimental results on datasets of real social networks demonstrate that the proposed influential friend selection method and temporal-spatial Bayesian model can significantly improve the accuracy of location prediction.

IJCAI Conference 2016 Conference Paper

Match-SRNN: Modeling the Recursive Matching Structure with Spatial RNN

  • Shengxian Wan
  • Yanyan Lan
  • Jun Xu
  • Jiafeng Guo
  • Liang Pang
  • Xueqi Cheng

Semantic matching, which aims to determine the matching degree between two texts, is a fundamental problem for many NLP applications. Recently, deep learning approaches have been applied to this problem and significant improvements have been achieved. In this paper, we propose to view the generation of the global interaction between two texts as a recursive process: the interaction of two texts at each position is a composition of the interactions between their prefixes and the word-level interaction at the current position. Based on this idea, we propose a novel deep architecture, namely Match-SRNN, to model this recursive matching structure. First, a tensor is constructed to capture the word-level interactions. Then a spatial RNN is applied to integrate the local interactions recursively, with importance determined by four types of gates. Finally, the matching score is calculated based on the global interaction. We show that, when degenerated to the exact matching scenario, Match-SRNN can approximate the dynamic programming process for the longest common subsequence, so there exists a clear interpretation of Match-SRNN. Our experiments on two semantic matching tasks show the effectiveness of Match-SRNN and its ability to visualize the learned matching structure.
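
The interpretation claim is concrete: under exact word matching, Match-SRNN's recursion mirrors the dynamic program for the longest common subsequence, where each cell is composed from its left, top, and diagonal predecessors. The classical DP being approximated looks like this (plain Python):

    def lcs_length(a, b):
        # h[i][j] = LCS length of a[:i] and b[:j]; the recursion over
        # (i-1, j), (i, j-1), (i-1, j-1) is what the spatial RNN generalizes
        # with learned, gated combinations instead of max and +1.
        h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                if a[i - 1] == b[j - 1]:
                    h[i][j] = h[i - 1][j - 1] + 1
                else:
                    h[i][j] = max(h[i - 1][j], h[i][j - 1])
        return h[len(a)][len(b)]

    print(lcs_length("the cat sat".split(), "the cat is sad".split()))  # 2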

IJCAI Conference 2016 Conference Paper

Predict Anchor Links across Social Networks via an Embedding Approach

  • Tong Man
  • Huawei Shen
  • Shenghua Liu
  • Xiaolong Jin
  • Xueqi Cheng

Predicting anchor links across social networks has important implications for an array of applications, including cross-network information diffusion and cross-domain recommendation. One challenging problem is whether, and to what extent, we can address anchor link prediction when only the structural information of the networks is available. Most existing methods, unsupervised or supervised, work directly on the networks themselves rather than on their intrinsic structural regularities, so their effectiveness is sensitive to the high dimensionality and sparsity of networks. To offer a robust method, we propose a novel supervised model, called PALE, which employs network embedding with awareness of observed anchor links as supervised information to capture the major and specific structural regularities, and further learns a stable cross-network mapping for predicting anchor links. Through extensive experiments on two realistic datasets, we demonstrate that PALE significantly outperforms state-of-the-art methods.

AAAI Conference 2016 Conference Paper

SPAN: Understanding a Question with Its Support Answers

  • Liang Pang
  • Yanyan Lan
  • Jiafeng Guo
  • Jun Xu
  • Xueqi Cheng

Matching a question to its best answer is a common task in community question answering. In this paper, we focus on non-factoid questions and aim to pick out the best answer from the candidate answers. Most existing deep models directly measure the similarity between question and answer via their individual sentence embeddings. To tackle the lack of information in question descriptions and the lexical gap between questions and answers, we propose a novel deep architecture named SPAN. Specifically, we introduce support answers to help understand the question, defined as the best answers of questions similar to the original one. We then obtain two kinds of similarities: one between the question and the candidate answer, and the other between the support answers and the candidate answer. The matching score is generated by combining them. Experiments on Yahoo! Answers demonstrate that SPAN outperforms the baseline models.

AAAI Conference 2016 Conference Paper

Text Matching as Image Recognition

  • Liang Pang
  • Yanyan Lan
  • Jiafeng Guo
  • Jun Xu
  • Shengxian Wan
  • Xueqi Cheng

Matching two texts is a fundamental problem in many natural language processing tasks. An effective way is to extract meaningful matching patterns from words, phrases, and sentences to produce the matching score. Inspired by the success of convolutional neural networks in image recognition, where neurons can capture complicated patterns built from elementary visual patterns such as oriented edges and corners, we propose to model text matching as a problem of image recognition. First, a matching matrix whose entries represent the similarities between words is constructed and viewed as an image. Then a convolutional neural network is utilized to capture rich matching patterns layer by layer. We show that, by resembling the compositional hierarchies of patterns in image recognition, our model can successfully identify salient signals such as n-gram and n-term matching. Experimental results demonstrate its superiority over the baselines.
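
The first step is straightforward to reproduce: build a word-by-word similarity matrix between the two texts and treat it as a single-channel image for the downstream CNN. A sketch of the matrix construction using cosine similarity (NumPy; random stand-ins for word embeddings):

    import numpy as np

    def matching_matrix(E1, E2):
        # E1: (len1, dim), E2: (len2, dim) word embeddings of the two texts.
        # Entry (i, j) is the cosine similarity between word i and word j;
        # the resulting (len1, len2) "image" is fed to a CNN downstream.
        E1 = E1 / np.linalg.norm(E1, axis=1, keepdims=True)
        E2 = E2 / np.linalg.norm(E2, axis=1, keepdims=True)
        return E1 @ E2.T

    rng = np.random.default_rng(0)
    M = matching_matrix(rng.normal(size=(5, 50)), rng.normal(size=(7, 50)))
    print(M.shape)  # (5, 7)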

AAAI Conference 2015 Conference Paper

A Probabilistic Model for Bursty Topic Discovery in Microblogs

  • Xiaohui Yan
  • Jiafeng Guo
  • Yanyan Lan
  • Jun Xu
  • Xueqi Cheng

Bursty topic discovery in microblogs is important for people to grasp essential and valuable information. However, the task is challenging since microblog posts are particularly short and noisy. This work develops a novel probabilistic model, namely the Bursty Biterm Topic Model (BBTM), to deal with the task. BBTM extends the Biterm Topic Model (BTM) by incorporating the burstiness of biterms as prior knowledge for bursty topic modeling, which has the following merits: 1) like BTM, it can well solve the data sparsity problem in topic modeling over short texts; 2) it can automatically discover high-quality bursty topics in microblogs in a principled and efficient way. Extensive experiments on a standard Twitter dataset show that our approach significantly outperforms state-of-the-art baselines.

AAAI Conference 2015 Conference Paper

Learning User-Specific Latent Influence and Susceptibility from Information Cascades

  • Yongqing Wang
  • Huawei Shen
  • Shenghua Liu
  • Xueqi Cheng

Predicting cascade dynamics has important implications for understanding information propagation and launching viral marketing. Previous works mainly adopt a pairwise approach, modeling the propagation probability between pairs of users with n² independent parameters for n users. Consequently, these models suffer from a severe overfitting problem, especially for pairs of users without direct interactions, limiting their prediction accuracy. Here we propose to model cascade dynamics by learning two low-dimensional user-specific vectors from observed cascades, capturing each user's influence and susceptibility respectively. This model requires far fewer parameters and thus combats overfitting. Moreover, it naturally models context-dependent factors such as the cumulative effect in information propagation. Extensive experiments on a synthetic dataset and a large-scale microblogging dataset demonstrate that this model outperforms existing pairwise models at predicting cascade dynamics, cascade size, and “who will be retweeted”.
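
The parameter reduction is the heart of the model: instead of n² pairwise weights, each user gets a latent influence vector and a susceptibility vector, and the propagation probability between two users is read off their inner product. A sketch of this factorized parameterization (NumPy; the sigmoid link here is assumed for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 1000, 10                          # 1000 users, 10 latent dimensions
    I = rng.normal(scale=0.1, size=(n, k))   # per-user influence vectors
    S = rng.normal(scale=0.1, size=(n, k))   # per-user susceptibility vectors

    def propagation_prob(u, v):
        # 2*n*k parameters instead of n*n: all influences that involve the
        # same user share that user's vector, coupling them by construction.
        return 1.0 / (1.0 + np.exp(-I[u] @ S[v]))

    print(propagation_prob(3, 7))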

UAI Conference 2014 Conference Paper

Position-Aware ListMLE: A Sequential Learning Process for Ranking

  • Yanyan Lan
  • Yadong Zhu
  • Jiafeng Guo
  • Shuzi Niu
  • Xueqi Cheng

ListMLE is a state-of-the-art listwise learning-to-rank algorithm, which has been shown to work very well in applications. It defines the probability distribution based on the Plackett-Luce model in a top-down style to take position information into account. However, both empirical contradictions and theoretical results indicate that ListMLE cannot well capture position importance, which is a key factor in ranking. To amend this problem, this paper proposes a new listwise ranking method, called position-aware ListMLE (p-ListMLE for short). It views the ranking problem as a sequential learning process, with each step learning a subset of parameters that maximize the corresponding stepwise probability distribution. To solve this sequential multi-objective optimization problem, we propose to use a linear scalarization strategy to transform it into a single-objective optimization problem, which is efficient to compute. Our theoretical study shows that p-ListMLE is better than ListMLE in statistical consistency with respect to the typical ranking evaluation measure NDCG. Furthermore, our experiments on benchmark datasets demonstrate that the proposed method can significantly improve the performance of ListMLE and outperform state-of-the-art listwise learning-to-rank algorithms as well.
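
Both ListMLE and p-ListMLE start from the Plackett-Luce likelihood of the observed permutation: the probability of a ranking is a product of softmax terms over the items remaining at each position. A sketch of the resulting negative log-likelihood (NumPy; p-ListMLE would additionally weight the stepwise terms by position importance):

    import numpy as np

    def listmle_loss(scores):
        # scores are the model outputs ordered by the ground-truth ranking.
        # Plackett-Luce: P(pi) = prod_i exp(s_i) / sum_{j >= i} exp(s_j),
        # so the NLL sums log-sum-exp over each suffix minus the item score.
        s = np.asarray(scores, dtype=float)
        tail_logsumexp = np.log(np.cumsum(np.exp(s[::-1]))[::-1])
        return float(np.sum(tail_logsumexp - s))

    print(listmle_loss([2.0, 1.0, 0.5]))  # lower when scores agree with the order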

AAAI Conference 2014 Conference Paper

Ranking Tweets by Labeled and Collaboratively Selected Pairs with Transitive Closure

  • Shenghua Liu
  • Xueqi Cheng
  • Fangtao Li

Tweet ranking is important for information acquisition in microblogs. Due to content sparsity and the lack of labeled data, it is preferable to employ semi-supervised learning methods that utilize the unlabeled data. However, most previous semi-supervised learning methods do not consider the pair conflict problem: newly selected unlabeled data may have ordering conflicts with the labeled and previously selected data. Learning performance suffers if the training data contain many conflicting pairs. In this paper, we propose a new collaborative semi-supervised SVM ranking model (CSR-TC) that selects unlabeled data based on a dynamically maintained transitive closure graph to avoid pair conflicts. We also investigate two views of features, intrinsic and content-relevant, for the proposed model. Extensive experiments are conducted on the TREC Microblogging corpus. The results demonstrate that our proposed method achieves significant improvement compared to several state-of-the-art models.

UAI Conference 2013 Conference Paper

Stochastic Rank Aggregation

  • Shuzi Niu
  • Yanyan Lan
  • Jiafeng Guo
  • Xueqi Cheng

This paper addresses the problem of rank aggregation, which aims to find a consensus ranking among multiple ranking inputs. Traditional rank aggregation methods are deterministic and can be categorized into explicit and implicit methods depending on whether rank information is explicitly or implicitly utilized. Surprisingly, experimental results on real datasets show that explicit rank aggregation methods do not work as well as implicit methods, although rank information is critical for the task. Our analysis indicates that the major reason might be the unreliable rank information from incomplete ranking inputs. To solve this problem, we propose to incorporate uncertainty into rank aggregation and tackle the problem in both unsupervised and supervised scenarios. We call this novel framework stochastic rank aggregation (St.Agg for short). Specifically, we introduce a prior distribution on ranks, and transform the ranking functions or objectives in traditional explicit methods into their expectations over this distribution. Our experiments on benchmark datasets show that the proposed St.Agg outperforms the baselines in both unsupervised and supervised scenarios.

NeurIPS Conference 2012 Conference Paper

Statistical Consistency of Ranking Methods in A Rank-Differentiable Probability Space

  • Yanyan Lan
  • Jiafeng Guo
  • Xueqi Cheng
  • Tie-Yan Liu

This paper is concerned with the statistical consistency of ranking methods. Recently, it was proven that many commonly used pairwise ranking methods are inconsistent with the weighted pairwise disagreement loss (WPDL), which can be viewed as the true loss of ranking, even in a low-noise setting. This result is interesting but also surprising, given that pairwise ranking methods have been shown to be very effective in practice. In this paper, we argue that the aforementioned result might not be conclusive, depending on the assumptions used. We introduce a new assumption, that the labels of objects to rank lie in a rank-differentiable probability space (RDPS), and prove that pairwise ranking methods become consistent with WPDL under this assumption. What is especially inspiring is that RDPS is not stronger than, but similar to, the low-noise setting. Our studies provide theoretical justification for some previously unexplained empirical findings on pairwise ranking methods, bridging the gap between theory and applications.

IJCAI Conference 2011 Conference Paper

Multi-Select Faceted Navigation Based on Minimum Description Length Principle

  • Chao He
  • Xueqi Cheng
  • Jiafeng Guo
  • Huawei Shen

Faceted navigation can effectively reduce user effort in reaching targeted resources in databases by suggesting dynamic facet values for iterative query refinement. A key issue is minimizing the navigation cost in a user query session. The conventional navigation scheme assumes that at each step users select only one suggested value to filter the resources containing it. To make faceted navigation more flexible and effective, this paper introduces a multi-select scheme in which multiple suggested values can be selected at one step, and a selected value can be used either to retain or to exclude the resources containing it. Previous algorithms for cost-driven value suggestion can hardly work well under our navigation scheme. Therefore, we propose to optimize the navigation cost using the Minimum Description Length principle, which balances the number of navigation steps and the number of suggested values per step under our new scheme. An empirical study demonstrates that our approach is more cost-saving and efficient than state-of-the-art approaches.

ECAI Conference 2010 Conference Paper

Social Recommendation with Interpersonal Influence

  • Junming Huang 0001
  • Xueqi Cheng
  • Jiafeng Guo
  • Huawei Shen
  • Kun Yang 0001

Social recommendation, in which an individual recommends an item to another, has gained popularity and success in web applications such as online sharing and shopping services. It differs substantially from traditional recommendation, where an automatic system recommends an item to a user. In a social recommendation, interpersonal influence plays a critical role but is usually ignored by traditional recommendation systems, which recommend items based on user-item utility. In this paper, we propose an approach that models the utility of a social recommendation by combining three factors, i.e., receiver interests, item qualities and interpersonal influences. In our approach, the values of all factors can be learned from user behaviors. Experiments compare our approach with three conventional methods in social recommendation prediction. Empirical results show the effectiveness of our approach, with an observed 26% increase in prediction accuracy.