Author name cluster

Defu Lian

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

59 papers

2 author rows

AIJ Journal 2026 Journal Article

Interactive graph convolutional filtering

Jin Zhang
Defu Lian
Hong Xie
Yawen Li
Enhong Chen

AAAI Conference 2026 Conference Paper

Multiple-play Stochastic Bandits with Prioritized Arm Capacity Sharing

Hong Xie
Haoran Gu
Yanying Huang
Tao Tan
Defu Lian

This paper proposes a variant of multiple-play stochastic bandits tailored to resource allocation problems arising from LLM applications, edge intelligence, etc. The model is composed of a finite number of arms and plays. Each arm has a stochastic number of capacities, and each unit of capacity is associated with a reward function. Each play is associated with a priority weight. When multiple plays compete for the arm capacity, the arm capacity is allocated in a larger-priority-weight-first manner. Instance-independent and instance-dependent regret lower bounds are proved, revealing the impact of model parameters on the hardness of learning the optimal allocation policy. When model parameters are given, we design an algorithm named MSB-PRS-OffOpt to locate the optimal play allocation policy with polynomial computational complexity in the number of arms and plays. Utilizing MSB-PRS-OffOpt as a subroutine, an approximate upper confidence bound (UCB) based algorithm is designed, which has instance-independent and instance-dependent regret upper bounds matching the corresponding lower bounds up to acceptable factors. To this end, we address nontrivial technical challenges arising from optimizing and learning under a special nonlinear combinatorial utility function induced by the prioritized resource sharing mechanism.
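
The prioritized sharing rule described in the abstract above can be illustrated with a small sketch. It is not the paper's MSB-PRS-OffOpt algorithm; it only shows how capacity on each arm might be handed out to competing plays in descending priority weight. The function name `allocate_capacity`, the one-unit-per-play demand, and the tie-break by play index are assumptions made for illustration.

```python
from collections import defaultdict


def allocate_capacity(assignment, weights, capacity):
    """Return the set of plays that receive a capacity unit.

    assignment[p] is the arm chosen by play p, weights[p] is the priority
    weight of play p, and capacity[a] is the realized number of capacity
    units on arm a. Assumption: every play requests exactly one unit.
    """
    plays_on_arm = defaultdict(list)
    for play, arm in enumerate(assignment):
        plays_on_arm[arm].append(play)

    served = set()
    for arm, plays in plays_on_arm.items():
        # Larger priority weight first; ties broken by play index (assumption).
        ranked = sorted(plays, key=lambda p: (-weights[p], p))
        served.update(ranked[:capacity[arm]])
    return served


if __name__ == "__main__":
    # Three plays compete for arm 0 (two units); one play takes arm 1.
    print(allocate_capacity([0, 0, 0, 1], [0.2, 0.9, 0.5, 0.1], [2, 1]))
```

In this toy instance the lowest-priority play on arm 0 is the one left without capacity, which is the source of the nonlinear, combinatorial utility the abstract mentions.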

NeurIPS Conference 2025 Conference Paper

Accurate KV Cache Eviction via Anchor Direction Projection for Efficient LLM Inference

Zijie Geng
Jie Wang
Ziqi Liu
Feng Ju
Yiming Li
Xing Li
Mingxuan Yuan
Jianye Hao

Key-Value (KV) cache eviction, which retains the KV pairs of the most important tokens while discarding less important ones, is a critical technique for optimizing both memory usage and inference latency in large language models (LLMs). However, existing approaches often rely on simple heuristics, such as attention weights, to measure token importance, overlooking the spatial relationships between token value states in the vector space. This often leads to suboptimal token selections and thus performance degradation. To tackle this problem, we propose a novel method, namely AnDPro (Anchor Direction Projection), which introduces a projection-based scoring function to more accurately measure token importance. Specifically, AnDPro operates in the space of value vectors and leverages the projections of these vectors onto an "Anchor Direction" (the direction of the pre-eviction output) to measure token importance and guide more accurate token selection. Experiments on 16 datasets from the LongBench benchmark demonstrate that AnDPro can maintain 96.07% of the full cache accuracy using only 3.44% KV cache budget, reducing KV cache budget size by 46.0% without compromising quality compared to previous state-of-the-art methods.
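
A rough sketch of projection-based token scoring, in the spirit of the abstract above, is given below. The abstract does not state the exact scoring function, so this is an assumption: each token is scored by its attention weight times the projection of its value vector onto the unit direction of the pre-eviction output. The names `anchor_projection_scores` and `select_tokens` are hypothetical.

```python
import math


def anchor_projection_scores(attn, values):
    """Score tokens by the contribution of a_i * v_i along the anchor direction.

    attn is a list of attention weights and values a list of value vectors.
    The anchor direction is the normalized pre-eviction output sum_i a_i * v_i.
    Assumption: the score is a_i * <v_i, anchor>, not the paper's exact formula.
    """
    dim = len(values[0])
    output = [sum(a * v[d] for a, v in zip(attn, values)) for d in range(dim)]
    norm = math.sqrt(sum(x * x for x in output))
    if norm == 0.0:
        return [0.0] * len(values)
    anchor = [x / norm for x in output]
    return [a * sum(vd * ad for vd, ad in zip(v, anchor))
            for a, v in zip(attn, values)]


def select_tokens(attn, values, budget):
    """Keep the indices of the `budget` highest-scoring tokens, in order."""
    scores = anchor_projection_scores(attn, values)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


if __name__ == "__main__":
    attn = [0.4, 0.35, 0.25]
    values = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
    print(select_tokens(attn, values, budget=2))
```

In this toy case the token with the second-largest attention weight points against the output direction, so it is evicted even though ranking by attention weight alone would have kept it. Under this particular scoring choice the scores also sum to the norm of the pre-eviction output.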

NeurIPS Conference 2025 Conference Paper

Advancing Machine-Generated Text Detection from an Easy to Hard Supervision Perspective

Chenwang Wu
Yiu-ming Cheung
Bo Han
Defu Lian

Existing machine-generated text (MGT) detection methods implicitly assume labels as the "golden standard". However, we reveal boundary ambiguity in MGT detection, implying that traditional training paradigms are inexact. Moreover, limitations of human cognition and the superintelligence of detectors make inexact learning widespread and inevitable. To this end, we propose an easy-to-hard enhancement framework to provide reliable supervision under such inexact conditions. Distinct from knowledge distillation, our framework employs an easy supervisor targeting relatively simple longer-text detection tasks (despite weaker capabilities), to enhance the more challenging target detector. Firstly, longer texts targeted by supervisors theoretically alleviate the impact of inexact labels, laying the foundation for reliable supervision. Secondly, by structurally incorporating the detector into the supervisor, we theoretically model the supervisor as a lower performance bound for the detector. Thus, optimizing the supervisor indirectly optimizes the detector, ultimately approximating the underlying "golden" labels. Extensive experiments across diverse practical scenarios, including cross-LLM, cross-domain, mixed text, and paraphrase attacks, demonstrate the framework's significant detection effectiveness. The code is available at https://github.com/tmlr-group/Easy2Hard.

ICML Conference 2025 Conference Paper

From Feature Interaction to Feature Generation: A Generative Paradigm of CTR Prediction Models

Mingjia Yin
Junwei Pan
Hao Wang 0076
Ximei Wang
Shangyu Zhang
Jie Jiang 0015
Defu Lian
Enhong Chen

Click-Through Rate (CTR) prediction, a core task in recommendation systems, aims to estimate the probability of users clicking on items. Existing models predominantly follow a discriminative paradigm, which relies heavily on explicit interactions between raw ID embeddings. However, this paradigm inherently renders them susceptible to two critical issues: embedding dimensional collapse and information redundancy, stemming from the over-reliance on feature interactions over raw ID embeddings. To address these limitations, we propose a novel Supervised Feature Generation (SFG) framework, shifting the paradigm from discriminative "feature interaction" to generative "feature generation". Specifically, SFG comprises two key components: an Encoder that constructs hidden embeddings for each feature, and a Decoder tasked with regenerating the feature embeddings of all features from these hidden representations. Unlike existing generative approaches that adopt self-supervised losses, we introduce a supervised loss to utilize the supervised signal, i.e., click or not, in the CTR prediction task. This framework exhibits strong generalizability: it can be seamlessly integrated with most existing CTR models, reformulating them under the generative paradigm. Extensive experiments demonstrate that SFG consistently mitigates embedding collapse and reduces information redundancy, while yielding substantial performance gains across various datasets and base models. The code is available at https://github.com/USTC-StarTeam/GE4Rec.

NeurIPS Conference 2025 Conference Paper

HawkBench: Investigating Resilience of RAG Methods on Stratified Information-Seeking Tasks

Hongjin Qian
Zheng Liu
Chao Gao
Yankai Wang
Defu Lian
Zhicheng Dou

In real-world information-seeking scenarios, users have dynamic and diverse needs, requiring RAG systems to demonstrate adaptable resilience. To comprehensively evaluate the resilience of current RAG methods, we introduce HawkBench, a human-labeled, multi-domain benchmark designed to rigorously assess RAG performance across categorized task types. By stratifying tasks based on information-seeking behaviors, HawkBench provides a systematic evaluation of how well RAG systems adapt to diverse user needs. Unlike existing benchmarks, which focus primarily on specific task types (mostly factoid queries) and rely on varying knowledge bases, HawkBench offers: (1) systematic task stratification to cover a broad range of query types, including both factoid and rationale queries, (2) integration of multi-domain corpora across all task types to mitigate corpus bias, and (3) rigorous annotation for high-quality evaluation. HawkBench includes 1,600 high-quality test samples, evenly distributed across domains and task types. Using this benchmark, we evaluate representative RAG methods, analyzing their performance in terms of answer quality and response latency. Our findings highlight the need for dynamic task strategies that integrate decision-making, query interpretation, and global knowledge understanding to improve RAG generalizability. We believe HawkBench serves as a pivotal benchmark for advancing the resilience of RAG methods and their ability to achieve general-purpose information seeking.

ICML Conference 2025 Conference Paper

HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking

Runquan Gui
Zhihai Wang
Jie Wang 0005
Chi Ma
Huiling Zhen
Mingxuan Yuan
Jianye Hao
Defu Lian

Recent advancements have significantly enhanced the performance of large language models (LLMs) in tackling complex reasoning tasks, achieving notable success in domains like mathematical and logical reasoning. However, these methods encounter challenges with complex planning tasks, primarily due to extended reasoning steps, diverse constraints, and the challenge of handling multiple distinct sub-tasks. To address these challenges, we propose HyperTree Planning (HTP), a novel reasoning paradigm that constructs hypertree-structured planning outlines for effective planning. The hypertree structure enables LLMs to engage in hierarchical thinking by flexibly employing the divide-and-conquer strategy, effectively breaking down intricate reasoning steps, accommodating diverse constraints, and managing multiple distinct sub-tasks in a well-organized manner. We further introduce an autonomous planning framework that completes the planning process by iteratively refining and expanding the hypertree-structured planning outlines. Experiments demonstrate the effectiveness of HTP, achieving state-of-the-art accuracy on the TravelPlanner benchmark with Gemini-1.5-Pro, resulting in a 3.6× performance improvement over o1-preview.

NeurIPS Conference 2025 Conference Paper

LogicTree: Improving Complex Reasoning of LLMs via Instantiated Multi-step Synthetic Logical Data

Zehao Wang
Lin Yang
Jie Wang
Kehan Wang
Hanzhu Chen
Bin Wang
Jianye Hao
Defu Lian

Despite their remarkable performance on various tasks, Large Language Models (LLMs) still struggle with logical reasoning, particularly in complex and multi-step reasoning processes. Among various efforts to enhance LLMs' reasoning capabilities, synthesizing large-scale, high-quality logical reasoning datasets has emerged as a promising direction. However, existing methods often rely on predefined templates for logical reasoning data generation, limiting their adaptability to real-world scenarios. To address this limitation, we propose LogicTree, a novel framework for efficiently synthesizing multi-step logical reasoning datasets that excel in both complexity and instantiation. By iteratively searching for applicable logic rules based on structural pattern matching to perform backward deduction, LogicTree constructs multi-step logic trees that capture complex reasoning patterns. Furthermore, we employ a two-stage LLM-based approach to instantiate various real-world scenarios for each logic tree, generating consistent real-world reasoning processes that carry contextual significance. This helps LLMs develop generalizable logical reasoning abilities across diverse scenarios rather than merely memorizing templates. Experiments on multiple benchmarks demonstrate that our approach achieves an average improvement of 9.4% in accuracy on complex logical reasoning tasks.

ICLR Conference 2025 Conference Paper

Making Text Embedders Few-Shot Learners

Chaofan Li
Minghao Qin
Shitao Xiao
Jianlyu Chen
Kun Luo
Defu Lian
Yingxia Shao
Zheng Liu 0011

Large language models (LLMs) with decoder-only architectures have demonstrated exceptional text-generation capabilities across a variety of tasks. Some researchers have also adapted these models for text representation tasks. However, in text representation tasks, these models often face performance degradation on unseen tasks. In-context learning (ICL), which leverages examples provided in the input context, enables LLMs to handle unseen tasks effectively. Inspired by this, we aim to fully utilize the inherent properties of LLMs to enhance text representation performance across different tasks through the ICL approach. In this paper, we introduce a simple yet effective training strategy, which significantly improves text representation capabilities. Unlike previous models that prepend task instructions to the text, our method randomly samples a varying number of examples during training, endowing the embedding model with in-context learning abilities while maintaining its zero-shot capabilities. This approach does not require additional data construction or modifications to the model architecture. On the contrary, we find that some popular modifications to the model, such as bidirectional attention, can degrade performance, undermining the inherent characteristics of LLMs. We have publicly released our method at https://github.com/FlagOpen/FlagEmbedding.

ICLR Conference 2025 Conference Paper

Making Transformer Decoders Better Differentiable Indexers

Wuchao Li
Kai Zheng 0001
Defu Lian
Qi Liu 0003
Wentian Bao
Yunen Yu
Yang Song 0008
Han Li 0005

Retrieval aims to find the top-k items most relevant to a query/user from a large dataset. Traditional retrieval models represent queries/users and items as embedding vectors and use Approximate Nearest Neighbor (ANN) search for retrieval. Recently, researchers have proposed a generative-based retrieval method that represents items as token sequences and uses a decoder model for autoregressive training. Compared to traditional methods, this approach uses more complex models and integrates index structure during training, leading to better performance. However, these methods remain two-stage processes, where index construction is separate from the retrieval model, limiting the model's overall capacity. Additionally, existing methods construct indices by clustering pre-trained item representations in Euclidean space. However, real-world scenarios are more complex, making this approach less accurate. To address these issues, we propose a Unified framework for Retrieval and Indexing, termed URI. URI ensures strong consistency between index construction and the retrieval model, typically a Transformer decoder. URI simultaneously builds the index and trains the decoder, constructing the index through the decoder itself. It no longer relies on one-sided item representations in Euclidean space but constructs the index within the interactive space between queries and items. Experimental comparisons on three real-world datasets show that URI significantly outperforms existing methods.

NeurIPS Conference 2025 Conference Paper

P-Law: Predicting Quantitative Scaling Law with Entropy Guidance in Large Recommendation Models

Tingjia Shen
Hao Wang
Chuhan Wu
Jin Yao Chin
Wei Guo
Yong Liu
Huifeng Guo
Defu Lian

With the growing size of data and models in Large Recommendation Models, the time required for debugging has become increasingly prohibitive, underscoring the urgent need for effective guidance in parameter configuration. The Scaling Law (SL) offers analogous guidance in the Sequential Language domain, having achieved significant success by predicting model loss when scaling model size. However, the existing guidance from SL for Sequential Recommendation (SR) remains qualitative, because quantitative analysis of SL on SR encounters challenges with quality measurement on redundant sequences along with loss-performance discrepancy. In response, we introduce the Performance Law (P-Law) for SR models, which predicts model performance across various settings, intending to provide a quantitative framework for guiding the parameter optimization of future models. First, Performance Law utilizes Real Entropy to measure data quality, aiming to remove the low-quality influence of low-entropy redundant sequences. Subsequently, Performance Law investigates a fitting decay term, which facilitates the prediction of the major loss-performance discrepancy phenomenon, overfitting, ultimately achieving quantitative performance prediction. Extensive experiments on various datasets demonstrate the effectiveness of Performance Law by displaying exceptional quantitative prediction ability against the original and modified qualitative SL. Additional application experiments on optimal parameter prediction and model expansion potential prediction also demonstrate the broad applicability of the Performance Law.

ICLR Conference 2025 Conference Paper

RecFlow: An Industrial Full Flow Recommendation Dataset

Qi Liu 0003
Kai Zheng 0007
Rui Huang 0009
Wuchao Li
Kuo Cai
Yuan Chai
Yanan Niu
Yiqun Hui

Industrial recommendation systems (RS) rely on the multi-stage pipeline to balance effectiveness and efficiency when delivering items from a vast corpus to users. Existing RS benchmark datasets primarily focus on the exposure space, where novel RS algorithms are trained and evaluated. However, when these algorithms transition to real-world industrial RS, they face two critical challenges: (1) handling unexposed items—a significantly larger space than the exposed one, profoundly impacting their practical performance; and (2) overlooking the intricate interplay between multiple stages of the recommendation pipeline, resulting in suboptimal system performance. To bridge the gap between offline RS benchmarks and real-world online environments, we introduce RecFlow—an industrial full-flow recommendation dataset. Unlike existing datasets, RecFlow includes samples not only from the exposure space but also from unexposed items filtered at each stage of the RS funnel. RecFlow comprises 38 million interactions from 42,000 users across nearly 9 million items with additional 1.9 billion stage samples collected from 9.3 million online requests over 37 days and spanning 6 stages. Leveraging RecFlow, we conduct extensive experiments to demonstrate its potential in designing novel algorithms that enhance effectiveness by incorporating stage-specific samples. Some of these algorithms have already been deployed online at KuaiShou, consistently yielding significant gains. We propose RecFlow as the first comprehensive whole-pipeline benchmark dataset for the RS community, enabling research on algorithm design across the entire recommendation pipeline, including selection bias study, debiased algorithms, multi-stage consistency and optimality, multi-task recommendation, and user behavior modeling.

ICLR Conference 2025 Conference Paper

TDDBench: A Benchmark for Training data detection

Zhihao Zhu 0002
Yi Yang 0042
Defu Lian

Training Data Detection (TDD) is a task aimed at determining whether a specific data instance is used to train a machine learning model. In the computer security literature, TDD is also referred to as Membership Inference Attack (MIA). Given its potential to assess the risks of training data breaches, ensure copyright authentication, and verify model unlearning, TDD has garnered significant attention in recent years, leading to the development of numerous methods. Despite these advancements, there is no comprehensive benchmark to thoroughly evaluate the effectiveness of TDD methods. In this work, we introduce TDDBench, which consists of 13 datasets spanning three data modalities: image, tabular, and text. We benchmark 21 different TDD methods across four detection paradigms and evaluate their performance from five perspectives: average detection performance, best detection performance, memory consumption, and computational efficiency in both time and memory. With TDDBench, researchers can identify bottlenecks and areas for improvement in TDD algorithms, while practitioners can make informed trade-offs between effectiveness and efficiency when selecting TDD algorithms for specific use cases. Our extensive experiments also reveal the generally unsatisfactory performance of TDD algorithms across different datasets. To enhance accessibility and reproducibility, we open-source TDDBench for the research community at https://github.com/zzh9568/TDDBench.

ICLR Conference 2025 Conference Paper

ToolACE: Winning the Points of LLM Function Calling

Weiwen Liu
Xu Huang 0008
Xingshan Zeng
Xinlong Hao
Shuai Yu
Dexun Li
Shuai Wang 0020
Weinan Gan

Function calling significantly extends the application boundary of large language models (LLMs), where high-quality and diverse training data is critical for unlocking this capability. However, collecting and annotating real function-calling data is challenging, while synthetic data from existing pipelines often lacks coverage and accuracy. In this paper, we present ToolACE, an automatic agentic pipeline designed to generate accurate, complex, and diverse tool-learning data, specifically tailored to the capabilities of LLMs. ToolACE leverages a novel self-evolution synthesis process to curate a comprehensive API pool of 26,507 diverse APIs. Dialogs are further generated through the interplay among multiple agents, under the guidance of a complexity evaluator. To ensure data accuracy, we implement a dual-layer verification system combining rule-based and model-based checks. We demonstrate that models trained on our synthesized data, even with only 8B parameters, achieve state-of-the-art performance, comparable to the latest GPT-4 models. Our model and a subset of the data are publicly available at https://huggingface.co/Team-ACE.

NeurIPS Conference 2025 Conference Paper

Towards A Generalist Code Embedding Model Based On Massive Data Synthesis

Chaofan Li
Jianlyu Chen
Yingxia Shao
Defu Lian
Zheng Liu

Code embedding models attract increasing attention due to the widespread popularity of retrieval-augmented generation (RAG) in software development. These models are expected to capture the rich semantic relationships inherent to code, which differ significantly from those found in text. However, existing models remain severely limited due to the scarcity of high-quality training data. In this work, we introduce CodeR (Code Retrieval), a state-of-the-art embedding model for general-purpose code retrieval. The superior performance of CodeR is built upon CodeR-Pile, a large-scale synthetic dataset constructed under the DRU (Diversity, Reliability, Usability) principle via a novel data synthesis pipeline. To optimize training effectiveness, we propose Annealing, a curriculum learning strategy that enables effective knowledge transfer across heterogeneous sources of data. We evaluate CodeR based on 16 diverse code retrieval tasks, where it significantly outperforms existing baselines and exhibits strong out-of-domain generalization performance. We have publicly released our code and the well-trained model to facilitate further research in this critical area (https://github.com/FlagOpen/FlagEmbedding/tree/master/research/BGE_Coder).

ICLR Conference 2025 Conference Paper

Unlocking the Power of Function Vectors for Characterizing and Mitigating Catastrophic Forgetting in Continual Instruction Tuning

Gangwei Jiang
Caigao Jiang
Zhaoyi Li
Siqiao Xue
Jun Zhou 0011
Linqi Song
Defu Lian
Ying Wei 0001

Catastrophic forgetting (CF) poses a significant challenge in machine learning, where a model forgets previously learned information upon learning new tasks. Despite the advanced capabilities of Large Language Models (LLMs), they continue to face challenges with CF during continual learning. The majority of existing research focuses on analyzing forgetting patterns through a singular training sequence, thereby overlooking the intricate effects that diverse tasks have on model behavior. Our study explores CF across various settings, discovering that model forgetting is influenced by both the specific training tasks and the models themselves. To this end, we interpret forgetting by examining the function vector (FV), a compact representation of functions in LLMs, offering a model-dependent indicator for the occurrence of CF. Through theoretical and empirical analyses, we demonstrated that CF in LLMs primarily stems from biases in function activation rather than the overwriting of task processing functions. Leveraging these insights, we propose a novel function vector guided training methodology, incorporating a regularization technique to stabilize the FV and mitigate forgetting. Empirical tests on four benchmarks confirm the effectiveness of our proposed training method, substantiating our theoretical framework concerning CF and model function dynamics.

IJCAI Conference 2024 Conference Paper

Adaptive Order Q-learning

Tao Tan
Hong Xie
Defu Lian

This paper revisits the estimation bias control problem of Q-learning, motivated by the fact that the estimation bias is not always evil, i.e., some environments benefit from overestimation bias or underestimation bias, while others suffer from these biases. Different from previous coarse-grained bias control methods, this paper proposes a fine-grained bias control algorithm called Order Q-learning. It uses the order statistic of multiple independent Q-tables to control bias and flexibly meet the personalized bias needs of different environments, i.e., the bias can vary from underestimation bias to overestimation bias as one selects a higher order Q-value. We derive the expected estimation bias and its lower bound and upper bound. They reveal that the expected estimation bias is inversely proportional to the number of Q-tables and proportional to the index of order statistic function. To show the versatility of Order Q-learning, we design an adaptive parameter adjustment strategy, leading to AdaOrder (Adaptive Order) Q-learning. It adaptively selects the number of Q-tables and the index of order statistic function via the number of visits to state-action pair and the average Q-value. We extend Order Q-learning and AdaOrder Q-learning to the large scale setting with function approximation, leading to Order DQN and AdaOrder DQN, respectively. Finally, we consider two experiment settings: deep reinforcement learning experiments show that our method outperforms several SOTA baselines drastically; tabular MDP experiments reveal fundamental insights into why our method can achieve superior performance. Our supplementary file can be found at https://1drv.ms/f/s!Atddp1iaDmL2gjv31CaGquw5WwYI.
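
To make the order-statistic idea in the abstract above concrete, here is a minimal tabular sketch. It is an illustration under stated assumptions, not the authors' exact algorithm: the target uses the k-th smallest estimate across the Q-tables for each action, and one randomly chosen table is updated per step (a choice borrowed from ensemble methods such as Maxmin Q-learning). The names `order_q_values` and `order_q_update` are hypothetical.

```python
import random


def order_q_values(q_tables, state, order):
    """Per action, return the order-th smallest estimate across all tables.

    q_tables is a list of tables indexed as table[state][action];
    order is 1-indexed (order=1 is the minimum, order=len(q_tables) the maximum).
    """
    n_actions = len(q_tables[0][state])
    return [sorted(q[state][a] for q in q_tables)[order - 1]
            for a in range(n_actions)]


def order_q_update(q_tables, s, a, r, s_next, order,
                   alpha=0.1, gamma=0.99, rng=random):
    """One temporal-difference step using the order-statistic target."""
    target = r + gamma * max(order_q_values(q_tables, s_next, order))
    table = rng.choice(q_tables)  # assumption: update a single random table
    table[s][a] += alpha * (target - table[s][a])
    return target


if __name__ == "__main__":
    tables = [
        [[0.0, 1.0], [2.0, 0.0]],
        [[0.5, 0.0], [1.0, 3.0]],
        [[1.0, 2.0], [0.0, 1.0]],
    ]
    for k in (1, 2, 3):
        print(k, order_q_values(tables, state=1, order=k))
```

Raising `order` moves the target from the pessimistic minimum toward the optimistic maximum, which matches the abstract's statement that bias shifts from underestimation to overestimation as a higher-order Q-value is selected.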

AAAI Conference 2024 Conference Paper

AT4CTR: Auxiliary Match Tasks for Enhancing Click-Through Rate Prediction

Qi Liu
Xuyang Hou
Defu Lian
Zhe Wang
Haoran Jin
Jia Cheng
Jun Lei

Click-through rate (CTR) prediction is a vital task in industrial recommendation systems. Most existing methods focus on the network architecture design of the CTR model for better accuracy and suffer from the data sparsity problem. Especially in industrial recommendation systems, the widely applied negative sample down-sampling technique due to resource limitation worsens the problem, resulting in a decline in performance. In this paper, we propose Auxiliary Match Tasks for enhancing Click-Through Rate (AT4CTR) prediction accuracy by alleviating the data sparsity problem. Specifically, we design two match tasks inspired by collaborative filtering to enhance the relevance modeling between user and item. As the "click" action is a strong signal which indicates the user's preference towards the item directly, we make the first match task aim at pulling closer the representation between the user and the item regarding the positive samples. Since the user's past click behaviors can also be treated as the user him/herself, we apply the next item prediction as the second match task. For both the match tasks, we choose the InfoNCE as their loss function. The two match tasks can provide meaningful training signals to speed up the model's convergence and alleviate the data sparsity. We conduct extensive experiments on one public dataset and one large-scale industrial recommendation dataset. The result demonstrates the effectiveness of the proposed auxiliary match tasks. AT4CTR has been deployed in the real industrial advertising system and has gained remarkable revenue.
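
The abstract above names InfoNCE as the loss for both match tasks. A minimal sketch of an InfoNCE loss is shown below; the dot-product similarity, the temperature value, and the use of in-batch negatives (every other item in the batch serves as a negative for a given user) are assumptions for illustration and are not taken from the paper.

```python
import math


def info_nce(user_vecs, item_vecs, temperature=0.1):
    """Mean InfoNCE loss where item_vecs[i] is the positive for user_vecs[i].

    All other items in the batch act as negatives (assumption).
    """
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    losses = []
    for i, u in enumerate(user_vecs):
        logits = [dot(u, v) / temperature for v in item_vecs]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


if __name__ == "__main__":
    users = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(info_nce(users, aligned), info_nce(users, swapped))
```

The loss is small when each user representation is closest to its own positive item and large when positives and negatives are swapped, which is the "pulling closer" behavior the first match task relies on.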

NeurIPS Conference 2024 Conference Paper

Breaking Determinism: Fuzzy Modeling of Sequential Recommendation Using Discrete State Space Diffusion Model

Wenjia Xie
Hao Wang
Luankang Zhang
Rui Zhou
Defu Lian
Enhong Chen

Sequential recommendation (SR) aims to predict items that users may be interested in based on their historical behavior sequences. We revisit SR from a novel information-theoretic perspective and find that conventional sequential modeling methods fail to adequately capture the randomness and unpredictability of user behavior. Inspired by fuzzy information processing theory, this paper introduces the DDSR model, which uses fuzzy sets of interaction sequences to overcome the limitations and better capture the evolution of users' real interests. Formally, DDSR is based on diffusion transition processes in discrete state spaces, unlike common diffusion models such as DDPM that operate in continuous domains. It is better suited for discrete data, using structured transitions instead of arbitrary noise introduction to avoid information loss. Additionally, to address the inefficiency of matrix transformations due to the vast discrete space, we use semantic labels derived from quantization or RQ-VAE to replace item IDs, enhancing efficiency and improving cold start issues. Testing on three public benchmark datasets shows that DDSR outperforms existing state-of-the-art methods in various settings, demonstrating its potential and effectiveness in handling SR tasks.

AAAI Conference 2024 Conference Paper

Federated Contextual Cascading Bandits with Asynchronous Communication and Heterogeneous Users

Hantao Yang
Xutong Liu
Zhiyong Wang
Hong Xie
John C. S. Lui
Defu Lian
Enhong Chen

We study the problem of federated contextual combinatorial cascading bandits, where agents collaborate under the coordination of a central server to provide tailored recommendations to users. Existing works consider either a synchronous framework, necessitating full agent participation and global synchronization, or assume user homogeneity with identical behaviors. We overcome these limitations by considering (1) federated agents operating in an asynchronous communication paradigm, where no mandatory synchronization is required and all agents communicate independently with the server, (2) heterogeneous user behaviors, where users can be stratified into latent user clusters, each exhibiting distinct preferences. For this setting, we propose a UCB-type algorithm with delicate communication protocols. Through theoretical analysis, we give sub-linear regret bounds on par with those achieved in the synchronous framework, while incurring only logarithmic communication costs. Empirical evaluation on synthetic and real-world datasets validates our algorithm's superior performance in terms of regrets and communication costs.

NeurIPS Conference 2024 Conference Paper

FilterNet: Harnessing Frequency Filters for Time Series Forecasting

Kun Yi
Jingru Fei
Qi Zhang
Hui He
Shufeng Hao
Defu Lian
Wei Fan

Given the ubiquitous presence of time series data across various domains, precise forecasting of time series holds significant importance and finds widespread real-world applications such as energy, weather, healthcare, etc. While numerous forecasters have been proposed using different network architectures, the Transformer-based models have state-of-the-art performance in time series forecasting. However, forecasters based on Transformers are still suffering from vulnerability to high-frequency signals, efficiency in computation, and bottleneck in full-spectrum utilization, which essentially are the cornerstones for accurately predicting time series with thousands of points. In this paper, we explore a novel perspective of enlightening signal processing for deep time series forecasting. Inspired by the filtering process, we introduce one simple yet effective network, namely FilterNet, built upon our proposed learnable frequency filters to extract key informative temporal patterns by selectively passing or attenuating certain components of time series signals. Concretely, we propose two kinds of learnable filters in the FilterNet: (i) Plain shaping filter, that adopts a universal frequency kernel for signal filtering and temporal modeling; (ii) Contextual shaping filter, that utilizes filtered frequencies examined in terms of its compatibility with input signals for dependency learning. Equipped with the two filters, FilterNet can approximately surrogate the linear and attention mappings widely adopted in time series literature, while enjoying superb abilities in handling high-frequency noises and utilizing the whole frequency spectrum that is beneficial for forecasting. Finally, we conduct extensive experiments on eight time series forecasting benchmarks, and experimental results have demonstrated our superior performance in terms of both effectiveness and efficiency compared with state-of-the-art methods. Our code is available at the link given in footnote 1 of the paper.
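
The basic operation behind a frequency filter, as described in the abstract above, can be sketched as follows: transform the series to the frequency domain, multiply each component by a kernel weight, and transform back. This is only a fixed-kernel illustration with a naive DFT from the standard library; in FilterNet the kernel would be learned, and the function names here (`dft`, `idft`, `frequency_filter`) are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform, O(n^2), for illustration only."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def frequency_filter(series, kernel):
    """Scale each frequency component by kernel[k] and return the real part.

    A kernel weight of 1 passes a component, 0 removes it, and values in
    between attenuate it. The kernel is fixed here; a learnable kernel is
    what the abstract describes.
    """
    spectrum = dft(series)
    filtered = [s * h for s, h in zip(spectrum, kernel)]
    return [v.real for v in idft(filtered)]


if __name__ == "__main__":
    series = [1.0, 2.0, 3.0, 4.0]
    print(frequency_filter(series, [1.0, 0.0, 0.0, 0.0]))  # keeps only the mean
```

Element-wise multiplication in the frequency domain corresponds to circular convolution in the time domain, which is why a single kernel can act as a global temporal mixing operator.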

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Generalization Error Bounds for Two-stage Recommender Systems with Tree Structure

Jin Zhang
Ze Liu
Defu Lian
Enhong Chen

Two-stage recommender systems play a crucial role in efficiently identifying relevant items and personalizing recommendations from a vast array of options. This paper, based on an error decomposition framework, analyzes the generalization error for two-stage recommender systems with a tree structure, which consist of an efficient tree-based retriever and a more precise yet time-consuming ranker. We use the Rademacher complexity to establish the generalization upper bound for various tree-based retrievers using beam search, as well as for different ranker models under a shifted training distribution. Both theoretical insights and practical experiments on real-world datasets indicate that increasing the branches in tree-based retrievers and harmonizing distributions across stages can enhance the generalization performance of two-stage recommender systems.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Learning from Highly Sparse Spatio-temporal Data

Leyan Deng
Defu Lian
Chenwang Wu
Enhong Chen

Incomplete spatio-temporal data in the real world has spawned much research. However, existing methods often utilize iterative message-passing across temporal and spatial dimensions, resulting in substantial information loss and high computational cost. We provide a theoretical analysis revealing that such iterative models are not only susceptible to data sparsity but also to graph sparsity, causing unstable performance on different datasets. To overcome these limitations, we introduce a novel method named One-step Propagation and Confidence-based Refinement (OPCR). In the first stage, OPCR leverages inherent spatial and temporal relationships by employing a sparse attention mechanism. These modules propagate limited observations directly to the global context through one-step imputation, which is theoretically affected only by data sparsity. Following this, we assign confidence levels to the initial imputations by correlating missing data with valid data. This confidence-based propagation refines the separate spatial and temporal imputation results through spatio-temporal dependencies. We evaluate the proposed model across various downstream tasks involving highly sparse spatio-temporal data. Empirical results indicate that our model outperforms state-of-the-art imputation methods, demonstrating its superior effectiveness and robustness.

PDF Details DOI

ICML Conference 2024 Conference Paper

Learning-Efficient Yet Generalizable Collaborative Filtering for Item Recommendation

Yuanhao Pu
Xiaolong Chen
Xu Huang 0008
Jin Chen 0008
Defu Lian
Enhong Chen

The weighted squared loss is a common component in several Collaborative Filtering (CF) algorithms for item recommendation, including the representative implicit Alternating Least Squares (iALS). Despite its widespread use, this loss function lacks a clear connection to ranking objectives such as Discounted Cumulative Gain (DCG), posing a fundamental challenge in explaining the exceptional ranking performance observed in these algorithms. In this work, we make a breakthrough by establishing a connection between squared loss and ranking metrics through a Taylor expansion of the DCG-consistent surrogate loss—softmax loss. We also discover a new surrogate squared loss function, namely Ranking-Generalizable Squared (RG$^2$) loss, and conduct thorough theoretical analyses on the DCG-consistency of the proposed loss function. Later, we present an example of utilizing the RG$^2$ loss with Matrix Factorization (MF), coupled with a generalization upper bound and an ALS optimization algorithm that leverages closed-form solutions over all items. Experimental results over three public datasets demonstrate the effectiveness of the RG$^2$ loss, exhibiting ranking performance on par with, or even surpassing, the softmax loss while achieving faster convergence.

Details

ICLR Conference 2024 Conference Paper

NoiseDiffusion: Correcting Noise for Image Interpolation with Diffusion Models beyond Spherical Linear Interpolation

Pengfei Zheng
Yonggang Zhang 0003
Zhen Fang 0001
Tongliang Liu
Defu Lian
Bo Han 0003

Image interpolation based on diffusion models is promising in creating fresh and interesting images. Advanced interpolation methods mainly focus on spherical linear interpolation, where images are encoded into the noise space and then interpolated for denoising to images. However, existing methods face challenges in effectively interpolating natural images (not generated by diffusion models), thereby restricting their practical applicability. Our experimental investigations reveal that these challenges stem from the invalidity of the encoding noise, which may no longer obey the expected noise distribution, e.g., a normal distribution. To address these challenges, we propose a novel approach to correct noise for image interpolation, NoiseDiffusion. Specifically, NoiseDiffusion moves the invalid noise toward the expected distribution by introducing subtle Gaussian noise and introduces a constraint to suppress noise with extreme values. In this context, promoting noise validity contributes to mitigating image artifacts, but the constraint and introduced exogenous noise typically lead to a reduction in signal-to-noise ratio, i.e., loss of original image information. Hence, NoiseDiffusion performs interpolation within the noisy image space and injects raw images into these noisy counterparts to address the challenge of information loss. Consequently, NoiseDiffusion enables us to interpolate natural images without causing artifacts or information loss, thus achieving the best interpolation results.
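
For readers unfamiliar with the spherical linear interpolation baseline that this abstract refers to, a minimal Python sketch follows. It shows only standard slerp between two noise vectors, not the NoiseDiffusion correction itself; the function name `slerp` and the list-based vectors are assumptions made for illustration.

```python
import math

def slerp(a, b, t):
    """Spherical linear interpolation between vectors a and b for t in [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Angle between the two vectors, clamped for numerical safety.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    theta = math.acos(cos_theta)
    if theta < 1e-8:
        # Nearly parallel vectors: fall back to ordinary linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    sin_theta = math.sin(theta)
    weight_a = math.sin((1 - t) * theta) / sin_theta
    weight_b = math.sin(t * theta) / sin_theta
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]
```

For two orthogonal unit vectors, the midpoint `slerp([1.0, 0.0], [0.0, 1.0], 0.5)` stays on the unit circle, whereas plain linear interpolation would shrink the norm.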

Details

NeurIPS Conference 2023 Conference Paper

Frequency-domain MLPs are More Effective Learners in Time Series Forecasting

Kun Yi
Qi Zhang
Wei Fan
Shoujin Wang
Pengyang Wang
Hui He
Ning An
Defu Lian

Time series forecasting has played a key role in different industrial domains, including finance, traffic, energy, and healthcare. While the existing literature has designed many sophisticated architectures based on RNNs, GNNs, or Transformers, another kind of approach based on multi-layer perceptrons (MLPs) has been proposed with simple structure, low complexity, and superior performance. However, most MLP-based forecasting methods suffer from point-wise mappings and an information bottleneck, which largely hinder the forecasting performance. To overcome this problem, we explore a novel direction of applying MLPs in the frequency domain for time series forecasting. We investigate the learned patterns of frequency-domain MLPs and discover two inherent characteristics that benefit forecasting: (i) global view: the frequency spectrum gives MLPs a complete view of signals so that they learn global dependencies more easily, and (ii) energy compaction: frequency-domain MLPs concentrate on a smaller key part of frequency components with compact signal energy. Then, we propose FreTS, a simple yet effective architecture built upon Frequency-domain MLPs for Time Series forecasting. FreTS mainly involves two stages, (i) Domain Conversion, that transforms time-domain signals into complex numbers of frequency domain; (ii) Frequency Learning, that performs our redesigned MLPs for the learning of real and imaginary part of frequency components. The above stages, operated on both inter-series and intra-series scales, further contribute to channel-wise and time-wise dependency learning. Extensive experiments on 13 real-world benchmarks (including 7 benchmarks for short-term forecasting and 6 benchmarks for long-term forecasting) demonstrate our consistent superiority over state-of-the-art methods. Code is available at this repository: https://github.com/aikunyi/FreTS.

PDF Details

ECAI Conference 2023 Conference Paper

GridFormer: Spatial-Temporal Transformer Network for Citywide Crowd Flow Prediction

Chaoqun Su
Chenwang Wu
Defu Lian

Crowd flow prediction plays a vital role in various fields such as traffic management, public safety, and urban planning. The main challenge in crowd flow prediction lies in effectively modeling the periodic temporal dependency and long-range spatial dependency. In the temporal domain, crowd flow shows a strong periodicity which is exploited by existing works to build multi-time-scale spatial-temporal features. However, these works hardly consider the disturbance of periods, that is, the crowd flow is not strictly periodic. In the spatial domain, existing works mainly utilize CNN to capture spatial dependency, but the small receptive field of the convolution operator limits the ability to capture the long-range dependency between crowd flows in different regions. In this paper, we propose GridFormer, a Transformer network, in which a periodically shifted sampling method and attention mechanism are employed to handle the temporal shifting in the daily and weekly periodicity, and a pyramid 3D Swin Transformers network is designed to capture long-range spatial dependency in a hierarchical manner. Meanwhile, the pyramid 3D Swin Transformers network jointly models spatial-temporal features to enable better interaction between the spatial and temporal domains. Experimental results on three crowd flow datasets demonstrate that our GridFormer outperforms the state-of-the-art crowd flow prediction methods.

Details

IJCAI Conference 2023 Conference Paper

KMF: Knowledge-Aware Multi-Faceted Representation Learning for Zero-Shot Node Classification

Likang Wu
Junji Jiang
Hongke Zhao
Hao Wang
Defu Lian
Mengdi Zhang
Enhong Chen

Recently, Zero-Shot Node Classification (ZNC) has been an emerging and crucial task in graph data analysis. This task aims to predict nodes from unseen classes which are unobserved in the training process. Existing work mainly utilizes Graph Neural Networks (GNNs) to associate features' prototypes and labels' semantics thus enabling knowledge transfer from seen to unseen classes. However, the multi-faceted semantic orientation in the feature-semantic alignment has been neglected by previous work, i.e., the content of a node usually covers diverse topics that are relevant to the semantics of multiple labels. It's necessary to separate and judge the semantic factors that tremendously affect the cognitive ability to improve the generality of models. To this end, we propose a Knowledge-Aware Multi-Faceted framework (KMF) that enhances the richness of label semantics via the extracted KG (Knowledge Graph)-based topics. And then the content of each node is reconstructed to a topic-level representation that offers multi-faceted and fine-grained semantic relevancy to different labels. Due to the particularity of the graph's instance (i.e., node) representation, a novel geometric constraint is developed to alleviate the problem of prototype drift caused by node information aggregation. Finally, we conduct extensive experiments on several public graph datasets and design an application of zero-shot cross-domain recommendation. The quantitative results demonstrate both the effectiveness and generalization of KMF with the comparison of state-of-the-art baselines.

PDF Details DOI

NeurIPS Conference 2023 Conference Paper

Knowledge Distillation for High Dimensional Search Index

Zepu Lu
Jin Chen
Defu Lian
Zaixi Zhang
Yong Ge
Enhong Chen

Lightweight compressed models are prevalent in Approximate Nearest Neighbor Search (ANNS) and Maximum Inner Product Search (MIPS) owing to their superiority of retrieval efficiency in large-scale datasets. However, results given by compressed methods are less accurate due to the curse of dimension and the limitations of optimization objectives (e.g., lacking interactions between queries and documents). Thus, we are encouraged to design a new learning algorithm for the compressed search index on high dimensions to improve retrieval performance. In this paper, we propose a novel Knowledge Distillation for high dimensional search index framework (KDindex), with the aim of efficiently learning lightweight indexes by distilling knowledge from high-precision ANNS and MIPS models such as graph-based indexes. Specifically, the student is guided to keep the same ranking order of the top-k relevant results yielded by the teacher model, which acts as the additional supervision signals between queries and documents to learn the similarities between documents. Furthermore, to avoid the trivial solutions that all candidates are partitioned to the same centroid, the reconstruction loss that minimizes the compressed error, and the posting list balance strategy that equally allocates the candidates, are integrated into the learning objective. Experiment results demonstrate that KDindex outperforms existing learnable quantization-based indexes and is 40× lighter than the state-of-the-art non-exhaustive methods while achieving comparable recall quality.

PDF Details

ICLR Conference 2023 Conference Paper

Learned Index with Dynamic $\epsilon$

Daoyuan Chen
Wuchao Li
Yaliang Li
Bolin Ding
Kai Zeng 0002
Defu Lian
Jingren Zhou 0001

The index structure is a fundamental component in databases and facilitates broad data retrieval applications. Recent learned index methods show superior performance by learning hidden yet useful data distribution with the help of machine learning, and provide a guarantee that the prediction error is no more than a pre-defined $\epsilon$. However, existing learned index methods adopt a fixed $\epsilon$ for all the learned segments, neglecting the diverse characteristics of different data localities. In this paper, we propose a mathematically-grounded learned index framework with dynamic $\epsilon$, which is efficient and pluggable to existing learned index methods. We theoretically analyze prediction error bounds that link $\epsilon$ with data characteristics for an illustrative learned index method. Under the guidance of the derived bounds, we learn how to vary $\epsilon$ and improve the index performance with a better space-time trade-off. Experiments with real-world datasets and several state-of-the-art methods demonstrate the efficiency, effectiveness and usability of the proposed framework.
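
To make the role of the error bound $\epsilon$ concrete, the following Python sketch fits a single linear segment over sorted keys and then searches only within $\epsilon$ positions of the predicted location. It is a simplified illustration under stated assumptions, not the framework proposed in the paper: the names `fit_segment` and `lookup` are hypothetical, keys are assumed sorted and distinct, and $\epsilon$ is measured from one fitted segment rather than pre-defined or varied per segment.

```python
import bisect
import math

def fit_segment(keys):
    """Least-squares fit of position ~ slope * key + intercept.

    Returns (slope, intercept, epsilon), where epsilon is the smallest integer
    bounding the prediction error observed on the given keys.
    """
    n = len(keys)
    mean_key = sum(keys) / n
    mean_pos = (n - 1) / 2
    var = sum((k - mean_key) ** 2 for k in keys)
    cov = sum((k - mean_key) * (i - mean_pos) for i, k in enumerate(keys))
    slope = cov / var if var else 0.0
    intercept = mean_pos - slope * mean_key
    max_err = max(abs(slope * k + intercept - i) for i, k in enumerate(keys))
    return slope, intercept, math.ceil(max_err)

def lookup(keys, model, key):
    """Return the position of key, or None, searching only the epsilon window."""
    slope, intercept, eps = model
    pred = slope * key + intercept
    lo = max(0, math.floor(pred) - eps)
    hi = max(lo, min(len(keys), math.ceil(pred) + eps + 1))
    idx = bisect.bisect_left(keys, key, lo, hi)
    return idx if idx < hi and keys[idx] == key else None
```

A smaller $\epsilon$ narrows the final search window but generally requires more segments to cover the data, which is the space-time trade-off the abstract mentions.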

Details

AAAI Conference 2023 Conference Paper

Query-Aware Quantization for Maximum Inner Product Search

Jin Zhang
Defu Lian
Haodi Zhang
Baoyun Wang
Enhong Chen

Maximum Inner Product Search (MIPS) plays an essential role in many applications ranging from information retrieval, recommender systems to natural language processing. However, exhaustive MIPS is often expensive and impractical when there are a large number of candidate items. The state-of-the-art quantization method of approximated MIPS is product quantization with a score-aware loss, developed by assuming that queries are uniformly distributed in the unit sphere. However, in real-world datasets, the above assumption about queries does not necessarily hold. To this end, we propose a quantization method based on the distribution of queries combined with sampled softmax. Further, we introduce a general framework encompassing the proposed method and multiple quantization methods, and we develop an effective optimization for the proposed general framework. The proposed method is evaluated on three real-world datasets. The experimental results show that it outperforms the state-of-the-art baselines.

PDF Details DOI

AAAI Conference 2022 Conference Paper

Anisotropic Additive Quantization for Fast Inner Product Search

Jin Zhang
Qi Liu
Defu Lian
Zheng Liu
Le Wu
Enhong Chen

Maximum Inner Product Search (MIPS) plays an important role in many applications ranging from information retrieval, recommender systems to natural language processing and machine learning. However, exhaustive MIPS is often expensive and impractical when there are a large number of candidate items. The state-of-the-art approximated MIPS is product quantization with a score-aware loss, which weighs more heavily on items with larger inner product scores. However, it is challenging to extend the score-aware loss for additive quantization due to parallel-orthogonal decomposition of residual error. Learning additive quantization with respect to this loss is important since additive quantization can achieve a lower approximation error than product quantization. To this end, we propose a quantization method called Anisotropic Additive Quantization to combine the score-aware anisotropic loss and additive quantization. To efficiently update the codebooks in this algorithm, we develop a new alternating optimization algorithm. The proposed algorithm is extensively evaluated on three real-world datasets. The experimental results show that it outperforms the state-of-the-art baselines with respect to approximate search accuracy while guaranteeing a similar retrieval efficiency.

PDF Details

NeurIPS Conference 2022 Conference Paper

Cache-Augmented Inbatch Importance Resampling for Training Recommender Retriever

Jin Chen
Defu Lian
Yucheng Li
Baoyun Wang
Kai Zheng
Enhong Chen

Recommender retrievers aim to rapidly retrieve a fraction of items from the entire item corpus when a user query requests, with the representative two-tower model trained with the log softmax loss. For efficiently training recommender retrievers on modern hardware, inbatch sampling, where the items in the mini-batch are shared as negatives to estimate the softmax function, has attained growing interest. However, existing inbatch sampling based strategies just correct the sampling bias of inbatch items with item frequency, being unable to distinguish the user queries within the mini-batch and still incurring significant bias from the softmax. In this paper, we propose a Cache-Augmented Inbatch Importance Resampling (XIR) for training recommender retrievers, which not only offers different negatives to user queries with inbatch items, but also adaptively achieves a more accurate estimation of the softmax distribution. Specifically, XIR resamples items from the given mini-batch training pairs based on certain probabilities, where a cache with more frequently sampled items is adopted to augment the candidate item set, with the purpose of reusing the historical informative samples. XIR enables sampling query-dependent negatives based on inbatch items and capturing dynamic changes of model training, which leads to a better approximation of the softmax and further contributes to better convergence. Finally, we conduct experiments to validate the superior performance of the proposed XIR compared with competitive approaches.

PDF Details

NeurIPS Conference 2022 Conference Paper

Graph Convolution Network based Recommender Systems: Learning Guarantee and Item Mixture Powered Strategy

Leyan Deng
Defu Lian
Chenwang Wu
Enhong Chen

Inspired by their powerful representation ability on graph-structured data, Graph Convolution Networks (GCNs) have been widely applied to recommender systems, and have shown superior performance. Despite their empirical success, there is a lack of theoretical explorations such as generalization properties. In this paper, we take a first step towards establishing a generalization guarantee for GCN-based recommendation models under inductive and transductive learning. We mainly investigate the roles of graph normalization and non-linear activation, providing some theoretical understanding, and construct extensive experiments to further verify these findings empirically. Furthermore, based on the proven generalization bound and the challenge of existing models in discrete data learning, we propose Item Mixture (IMix) to enhance recommendation. It models discrete spaces in a continuous manner by mixing the embeddings of positive-negative item pairs, and its effectiveness can be strictly guaranteed from empirical and theoretical aspects.

PDF Details

NeurIPS Conference 2022 Conference Paper

Recommender Forest for Efficient Retrieval

Chao Feng
Wuchao Li
Defu Lian
Zheng Liu
Enhong Chen

Recommender systems (RS) have to select the top-N items from a massive item set. For the sake of efficient recommendation, RS usually represents user and item as latent embeddings, and relies on approximate nearest neighbour search (ANNs) to retrieve the recommendation result. Despite the reduction of running time, the representation learning is independent of ANNs index construction; thus, the two operations can be incompatible, which results in potential loss of recommendation accuracy. To overcome the above problem, we propose the Recommender Forest (a.k.a. RecForest), which jointly learns latent embedding and index for efficient and high-fidelity recommendation. RecForest consists of multiple k-ary trees, each of which is a partition of the item set via hierarchical balanced clustering such that each item is uniquely represented by a path from the root to a leaf. Given such a data structure, an encoder-decoder based routing network is developed: it first encodes the context, i.e., user information, into hidden states; then, leveraging a transformer-based decoder, it identifies the top-N items via beam search. Compared with the existing methods, RecForest brings in the following advantages: 1) the false partition of the boundary items can be effectively alleviated by the use of multiple trees; 2) the routing operation becomes much more accurate thanks to the powerful transformer decoder; 3) the tree parameters are shared across different tree levels, making the index extremely memory-efficient. The experimental studies are performed on five popular recommendation datasets: with a significantly simplified training cost, RecForest outperforms competitive baseline approaches in terms of both recommendation accuracy and efficiency.

PDF Details

AAAI Conference 2021 Conference Paper

Efficient Optimal Selection for Composited Advertising Creatives with Tree Structure

Jin Chen
Tiezheng Ge
Gangwei Jiang
Zhiqiang Zhang
Defu Lian
Kai Zheng

Ad creatives are one of the prominent mediums for online e-commerce advertisements. Ad creatives with enjoyable visual appearance may increase the click-through rate (CTR) of products. Ad creatives are typically handcrafted by advertisers and then delivered to the advertising platforms for advertisement. In recent years, advertising platforms are capable of instantly compositing ad creatives with arbitrarily designated elements of each ingredient, so advertisers are only required to provide basic materials. While facilitating the advertisers, a great number of potential ad creatives can be composited, making it difficult to accurately estimate CTR for them given limited real-time feedback. To this end, we propose an Adaptive and Efficient ad creative Selection (AES) framework based on a tree structure. The tree structure on compositing ingredients enables dynamic programming for efficient ad creative selection on the basis of CTR. Due to limited feedback, the CTR estimator is usually of high variance. Exploration techniques based on Thompson sampling are widely used for reducing variances of the CTR estimator, alleviating feedback sparsity. Based on the tree structure, Thompson sampling is adapted with dynamic programming, leading to efficient exploration for potential ad creatives with the largest CTR. We finally evaluate the proposed algorithm on the synthetic dataset and the real-world dataset. The results show that our approach can outperform competing baselines in terms of convergence rate and overall CTR.

PDF Details

NeurIPS Conference 2021 Conference Paper

GraphFormers: GNN-nested Transformers for Representation Learning on Textual Graph

Junhan Yang
Zheng Liu
Shitao Xiao
Chaozhuo Li
Defu Lian
Sanjay Agrawal
Amit Singh
Guangzhong Sun

The representation learning on textual graph is to generate low-dimensional embeddings for the nodes based on the individual textual features and the neighbourhood information. Recent breakthroughs on pretrained language models and graph neural networks push forward the development of corresponding techniques. The existing works mainly rely on the cascaded model architecture: the textual features of nodes are independently encoded by language models at first; the textual embeddings are aggregated by graph neural networks afterwards. However, the above architecture is limited due to the independent modeling of textual features. In this work, we propose GraphFormers, where layerwise GNN components are nested alongside the transformer blocks of language models. With the proposed architecture, the text encoding and the graph aggregation are fused into an iterative workflow, making each node's semantic accurately comprehended from the global perspective. In addition, a progressive learning strategy is introduced, where the model is successively trained on manipulated data and original data to reinforce its capability of integrating information on graph. Extensive evaluations are conducted on three large-scale benchmark datasets, where GraphFormers outperform the SOTA baselines with comparable running efficiency. The source code is released at https://github.com/microsoft/GraphFormers.

PDF Details

NeurIPS Conference 2021 Conference Paper

Meta-learning with an Adaptive Task Scheduler

Huaxiu Yao
Yu Wang
Ying Wei
Peilin Zhao
Mehrdad Mahdavi
Defu Lian
Chelsea Finn

To benefit the learning of a new task, meta-learning has been proposed to transfer a well-generalized meta-model learned from various meta-training tasks. Existing meta-learning algorithms randomly sample meta-training tasks with a uniform probability, under the assumption that tasks are of equal importance. However, it is likely that tasks are detrimental with noise or imbalanced given a limited number of meta-training tasks. To prevent the meta-model from being corrupted by such detrimental tasks or dominated by tasks in the majority, in this paper, we propose an adaptive task scheduler (ATS) for the meta-training process. In ATS, for the first time, we design a neural scheduler to decide which meta-training tasks to use next by predicting the probability being sampled for each candidate task, and train the scheduler to optimize the generalization capacity of the meta-model to unseen tasks. We identify two meta-model-related factors as the input of the neural scheduler, which characterize the difficulty of a candidate task to the meta-model. Theoretically, we show that a scheduler taking the two factors into account improves the meta-training loss and also the optimization landscape. Under the setting of meta-learning with noise and limited budgets, ATS improves the performance on both miniImageNet and a real-world drug discovery benchmark by up to 13% and 18%, respectively, compared to state-of-the-art task schedulers.

PDF Details

TIST Journal 2021 Journal Article

Predicting Human Mobility with Reinforcement-Learning-Based Long-Term Periodicity Modeling

Shuo Tao
Jingang Jiang
Defu Lian
Kai Zheng
Enhong Chen

Mobility prediction plays an important role in a wide range of location-based applications and services. However, there are three problems in the existing literature: (1) explicit high-order interactions of spatio-temporal features are not systemically modeled; (2) most existing algorithms place attention mechanisms on top of a recurrent network, so they cannot allow for full parallelism and are inferior to self-attention for capturing long-range dependence; (3) most literature does not make good use of long-term historical information and does not effectively model the long-term periodicity of users. To this end, we propose MoveNet and RLMoveNet. MoveNet is a self-attention-based sequential model, predicting each user’s next destination based on her most recent visits and historical trajectory. MoveNet first introduces a cross-based learning framework for modeling feature interactions. With self-attention on both the most recent visits and historical trajectory, MoveNet can use an attention mechanism to capture the user’s long-term regularity in a more efficient way. Based on MoveNet, to model long-term periodicity more effectively, we add a reinforcement learning layer and name the resulting model RLMoveNet. RLMoveNet regards the human mobility prediction as a reinforcement learning problem, using the reinforcement learning layer as the regularization part to drive the model to pay attention to the behavior with periodic actions, which can help us make the algorithm more effective. We evaluate both of them with three real-world mobility datasets. MoveNet outperforms the state-of-the-art mobility predictor by around 10% in terms of accuracy, and simultaneously achieves faster convergence and over 4x training speedup. Moreover, RLMoveNet achieves higher prediction accuracy than MoveNet, which proves that modeling periodicity explicitly from the perspective of reinforcement learning is more effective.

Details DOI

IJCAI Conference 2021 Conference Paper

Preference-Adaptive Meta-Learning for Cold-Start Recommendation

Li Wang
Binbin Jin
Zhenya Huang
Hongke Zhao
Defu Lian
Qi Liu
Enhong Chen

In recommender systems, the cold-start problem is a critical issue. To alleviate this problem, an emerging direction adopts meta-learning frameworks and achieves success. Most existing works aim to learn globally shared prior knowledge across all users so that it can be quickly adapted to a new user with sparse interactions. However, globally shared prior knowledge may be inadequate to discern users’ complicated behaviors and causes poor generalization. Therefore, we argue that prior knowledge should be locally shared by users with similar preferences who can be recognized by social relations. To this end, in this paper, we propose a Preference-Adaptive Meta-Learning approach (PAML) to improve existing meta-learning frameworks with better generalization capacity. Specifically, to address two challenges imposed by social relations, we first identify reliable implicit friends to strengthen a user’s social relations based on our defined palindrome paths. Then, a coarse-fine preference modeling method is proposed to leverage social relations and capture the preference. Afterwards, a novel preference-specific adapter is designed to adapt the globally shared prior knowledge to the preference-specific knowledge so that users who have similar tastes share similar knowledge. We conduct extensive experiments on two publicly available datasets. Experimental results validate the power of social relations and the effectiveness of PAML.

PDF Details DOI

AAAI Conference 2020 Conference Paper

A Variational Point Process Model for Social Event Sequences

Zhen Pan
Zhenya Huang
Defu Lian
Enhong Chen

Many events occur in real-world and social networks. Events are related to the past and there are patterns in the evolution of event sequences. Understanding the patterns can help us better predict the type and arriving time of the next event. In the literature, both feature-based approaches and generative approaches are utilized to model the event sequence. Feature-based approaches extract a variety of features, and train a regression or classification model to make a prediction. Yet, their performance is dependent on the experience-based feature extraction. Generative approaches usually assume the evolution of events follows a stochastic point process (e.g., a Poisson process or its more complex variants). However, the true distribution of events is never known and the performance depends on the design of the stochastic process in practice. To solve the above challenges, in this paper, we present a novel probabilistic generative model for event sequences. The model is termed Variational Event Point Process (VEPP). Our model introduces a variational auto-encoder to event sequence modeling that can better use the latent information and capture the distribution over inter-arrival time and types of event sequences. Experiments on real-world datasets demonstrate the effectiveness of our proposed model.

PDF Details

IS Journal 2020 Journal Article

Collaborative Filtering With Ranking-Based Priors on Unknown Ratings

Jin Chen
Defu Lian
Kai Zheng

Advanced collaborative filtering methods based on explicit feedback assume that unknown ratings are missing not at random. The state-of-the-art algorithm hypothesizes that unknown items are weakly rated and sets an explicit prior to unknown ratings. However, the prior assuming unknown ratings be close to zero may be questionable and it is challenging to set appropriate prior ratings for unknown items. In this article, to avert the use of prior ratings, we propose a ranking-based prior by hypothesizing that each user's unknown ratings are close to each other. This prior essentially acts as a regularizer to penalize the discrepancy of predicted ratings between any two unknown items. With the ranking-based prior, we design a generic collaborative filtering framework for explicit feedback and develop an efficient optimization algorithm for parameter learning. We finally evaluate the proposed algorithms on four real-world rating datasets. The results show that the proposed algorithms consistently outperform the state-of-the-art baselines and that the ranking-based prior leads to superior recommendation accuracy.
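
As a rough illustration of a regularizer that penalizes the discrepancy between any two unknown items' predicted ratings, the sketch below assumes the discrepancy is a squared difference (the abstract does not state the exact form). Under that assumption the sum over all pairs collapses to a closed form in the number of items, the sum of ratings, and the sum of squared ratings, so it can be computed in linear time. The function names are hypothetical.

```python
from typing import List


def pairwise_penalty_naive(ratings: List[float]) -> float:
    """Sum of squared differences over all unordered pairs, O(n^2)."""
    n = len(ratings)
    return sum(
        (ratings[i] - ratings[j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def pairwise_penalty_fast(ratings: List[float]) -> float:
    """Same quantity in O(n): n * sum(r^2) - (sum r)^2."""
    n = len(ratings)
    total = sum(ratings)
    total_sq = sum(r * r for r in ratings)
    return n * total_sq - total * total


if __name__ == "__main__":
    predicted = [1.0, 2.0, 4.0]  # hypothetical predicted ratings of unknown items
    print(pairwise_penalty_naive(predicted), pairwise_penalty_fast(predicted))
```

The penalty is zero exactly when all predicted ratings of the unknown items coincide, which matches the hypothesis that each user's unknown ratings are close to each other.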

Details DOI

IJCAI Conference 2020 Conference Paper

GoGNN: Graph of Graphs Neural Network for Predicting Structured Entity Interactions

Hanchen Wang
Defu Lian
Ying Zhang
Lu Qin
Xuemin Lin

Entity interaction prediction is essential in many important applications such as chemistry, biology, material science, and medical science. The problem becomes quite challenging when each entity is represented by a complex structure, namely structured entity, because two types of graphs are involved: local graphs for structured entities and a global graph to capture the interactions between structured entities. We observe that existing works on structured entity interaction prediction cannot properly exploit the unique graph of graphs model. In this paper, we propose a Graph of Graphs Neural Network, namely GoGNN, which extracts the features in both structured entity graphs and the entity interaction graph in a hierarchical way. We also propose the dual-attention mechanism that enables the model to preserve the neighbor importance in both levels of graphs. Extensive experiments on real-world datasets show that GoGNN outperforms the state-of-the-art methods on two representative structured entity interaction prediction tasks: chemical-chemical interaction prediction and drug-drug interaction prediction. Our code is available at Github.

PDF Details DOI

AAAI Conference 2020 Conference Paper

Graph Convolutional Networks with Markov Random Field Reasoning for Social Spammer Detection

Yongji Wu
Defu Lian
Yiheng Xu
Le Wu
Enhong Chen

The recent growth of social networking platforms also led to the emergence of social spammers, who overwhelm legitimate users with unwanted content. The existing social spammer detection methods can be characterized into two categories: feature-based ones and propagation-based ones. Feature-based methods mainly rely on matrix factorization using tweet text features, and regularization using social graphs is incorporated. However, these methods are fully supervised and can only utilize the labeled part of social graphs, so they fail to work in a real-world semi-supervised setting. The propagation-based methods primarily employ Markov Random Fields (MRFs) to capture human intuitions in user following relations, which cannot take advantage of rich text features. In this paper, we propose a novel social spammer detection model based on Graph Convolutional Networks (GCNs) that operate on directed social graphs by explicitly considering three types of neighbors. Furthermore, inspired by the propagation-based methods, we propose an MRF layer with refining effects to encapsulate these human insights in social relations, which can be formulated as an RNN through mean-field approximate inference, and stacked on top of GCN layers to enable end-to-end training. We evaluate our proposed method on two real-world social network datasets, and the results demonstrate that our method outperforms the state-of-the-art approaches.

PDF Details

NeurIPS Conference 2020 Conference Paper

Sampling-Decomposable Generative Adversarial Recommender

Binbin Jin
Defu Lian
Zheng Liu
Qi Liu
Jianhui Ma
Xing Xie
Enhong Chen

Recommendation techniques are important approaches for alleviating information overload. Being often trained on implicit user feedback, many recommenders suffer from the sparsity challenge due to the lack of explicitly negative samples. The GAN-style recommenders (i.e., IRGAN) address the challenge by learning a generator and a discriminator adversarially, such that the generator produces increasingly difficult samples for the discriminator to accelerate optimizing the discrimination objective. However, producing samples from the generator is very time-consuming, and our empirical study shows that the discriminator performs poorly in top-k item recommendation. To this end, a theoretical analysis is made for the GAN-style algorithms, showing that a generator of limited capacity diverges from the optimal generator. This may explain the limitation of the discriminator's performance. Based on these findings, we propose a Sampling-Decomposable Generative Adversarial Recommender (SD-GAR). In the framework, the divergence between some generator and the optimum is compensated by self-normalized importance sampling; the efficiency of sample generation is improved with a sampling-decomposable generator, such that each sample can be generated in O(1) with the Vose-Alias method. Interestingly, due to decomposability of sampling, the generator can be optimized with the closed-form solutions in an alternating manner, being different from policy gradient in the GAN-style algorithms. We extensively evaluate the proposed algorithm with five real-world recommendation datasets. The results show that SD-GAR outperforms IRGAN by 12.4% and the SOTA recommender by 10% on average. Moreover, discriminator training can be 20x faster on the dataset with more than 120K items.
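
For readers unfamiliar with the Vose alias method mentioned in the abstract, the following is a minimal generic sketch of it: a table is built once in linear time, after which each draw from a fixed discrete distribution costs O(1). This is the textbook algorithm, not SD-GAR's generator; the function names and the example distribution are illustrative assumptions.

```python
import random
from typing import List, Tuple


def build_alias_table(probs: List[float]) -> Tuple[List[float], List[int]]:
    """Build the alias table in O(n) for a distribution that sums to 1."""
    n = len(probs)
    scaled = [p * n for p in probs]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]          # keep s with this probability
        alias[s] = l                 # otherwise fall through to l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    # Leftover columns are full up to floating-point error.
    for i in large + small:
        prob[i] = 1.0
        alias[i] = i
    return prob, alias


def alias_sample(prob: List[float], alias: List[int], rng: random.Random) -> int:
    """Draw one index in O(1): pick a column, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


if __name__ == "__main__":
    table_prob, table_alias = build_alias_table([0.5, 0.3, 0.2])
    rng = random.Random(0)
    print([alias_sample(table_prob, table_alias, rng) for _ in range(10)])
```

The constant-time draw is what makes repeated negative sampling cheap once the table for a distribution has been built.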

PDF Details

AAAI Conference 2019 Conference Paper

Adversarial Binary Collaborative Filtering for Implicit Feedback

Haoyu Wang
Nan Shao
Defu Lian

Fast item recommendation based on implicit feedback is vital in practical scenarios due to data abundance, but challenging because of the lack of negative samples and the large number of recommended items. Recent adversarial methods unifying generative and discriminative models are promising, since the generative model, as a negative sampler, gradually improves as iteration continues. However, the binary-valued generative model is still unexplored within the min-max framework, but important for accelerating item recommendation. Optimizing binary-valued models is difficult because such models are non-smooth and non-differentiable. To this end, we propose two novel methods to relax the binarization based on the error function and Gumbel trick so that the generative model can be optimized by many popular solvers, such as SGD and ADMM. The binary-valued generative model is then evaluated within the min-max framework on four real-world datasets and shows its superiority to competing hashing-based recommendation algorithms. In addition, our proposed framework can approximate discrete variables precisely and be applied to solve other discrete optimization problems.

PDF Details

IJCAI Conference 2019 Conference Paper

Binarized Collaborative Filtering with Distilling Graph Convolutional Network

Haoyu Wang
Defu Lian
Yong Ge

The efficiency of top-K item recommendation based on implicit feedback is vital to recommender systems in the real world, but it is very challenging due to the lack of negative samples and the large number of candidate items. To address the challenges, we first introduce an improved Graph Convolutional Network (GCN) model with high-order feature interaction considered. Then we distill the ranking information derived from GCN into binarized collaborative filtering, which makes use of binary representation to improve the efficiency of online recommendation. However, binary codes are not only hard to optimize but also likely to incur the loss of information during the training process. Therefore, we propose a novel framework to convert the binary constrained optimization problem into an equivalent continuous optimization problem with a stochastic penalty. The binarized collaborative filtering model is then easily optimized by many popular solvers like SGD and Adam. The proposed algorithm is finally evaluated on three real-world datasets and shows superiority to the competing baselines.

PDF Details

IJCAI Conference 2019 Conference Paper

Graph Convolutional Networks on User Mobility Heterogeneous Graphs for Social Relationship Inference

Yongji Wu
Defu Lian
Shuowei Jin
Enhong Chen

Inferring social relations from user trajectory data is of great value in real-world applications such as friend recommendation and ride-sharing. Most existing methods predict relationships with a pairwise approach using some hand-crafted features or rely on a simple skip-gram based model to learn embeddings on graphs. Using hand-crafted features often fails to capture the complex dynamics in human social relations, while the graph embedding based methods only use random walks to propagate information and cannot incorporate external semantic data provided. We propose a novel model that utilizes Graph Convolutional Networks (GCNs) to learn user embeddings on the User Mobility Heterogeneous Graph in an unsupervised manner. This model is capable of propagating relations layer-wise as well as combining both the rich structural information in the heterogeneous graph and predictive node features provided. Our method can also be extended to a semi-supervised setting if a part of the social network is available. The evaluation on three real-world datasets demonstrates that our method outperforms the state-of-the-art approaches.

PDF Details

AAAI Conference 2019 Conference Paper

Improving One-Class Collaborative Filtering via Ranking-Based Implicit Regularizer

Jin Chen
Defu Lian
Kai Zheng

One-class collaborative filtering (OCCF) problems are vital in many applications of recommender systems, such as news and music recommendation, but suffer from sparsity issues and lack negative examples. To address this problem, state-of-the-art methods assign smaller weights to unobserved samples and perform low-rank approximation. However, the ground-truth ratings of unobserved samples are usually set to zero but are ill-defined. In this paper, we propose a ranking-based implicit regularizer and provide a new general framework for OCCF, to avert the ground-truth ratings of unobserved samples. We then exploit it to regularize a ranking-based loss function and design efficient optimization algorithms to learn model parameters. Finally, we evaluate them on three real-world datasets. The results show that the proposed regularizer significantly improves ranking-based algorithms and that the proposed framework outperforms the state-of-the-art OCCF algorithms.

PDF Details

TIST Journal 2019 Journal Article

Predicting Academic Performance for College Students

Huaxiu Yao
Defu Lian
Yi Cao
Yifan Wu
Tao Zhou

Detecting abnormal behaviors of students in time and providing personalized intervention and guidance at the early stage is important in educational management. Academic performance prediction is an important building block to enabling this pre-intervention and guidance. Most of the previous studies are based on questionnaire surveys and self-reports, which suffer from small sample size and social desirability bias. In this article, we collect longitudinal behavioral data from the smart cards of 6,597 students and propose three major types of discriminative behavioral factors, diligence, orderliness, and sleep patterns. Empirical analysis demonstrates these behavioral factors are strongly correlated with academic performance. Furthermore, motivated by the social influence theory, we analyze the correlation between each student’s academic performance with his/her behaviorally similar students’. Statistical tests indicate this correlation is significant. Based on these factors, we further build a multi-task predictive framework based on a learning-to-rank algorithm for academic performance prediction. This framework captures inter-semester correlation, inter-major correlation, and integrates student similarity to predict students’ academic performance. The experiments on a large-scale real-world dataset show the effectiveness of our methods for predicting academic performance and the effectiveness of proposed behavioral factors.

Details DOI

AAAI Conference 2019 Conference Paper

Preference-Aware Task Assignment in Spatial Crowdsourcing

Yan Zhao
Jinfu Xia
Guanfeng Liu
Han Su
Defu Lian
Shuo Shang
Kai Zheng

With the ubiquity of smart devices, Spatial Crowdsourcing (SC) has emerged as a new transformative platform that engages mobile users to perform spatio-temporal tasks by physically traveling to specified locations. Thus, various SC techniques have been studied for performance optimization, among which one of the major challenges is how to assign workers the tasks that they are really interested in and willing to perform. In this paper, we propose a novel preference-aware spatial task assignment system based on workers’ temporal preferences, which consists of two components: History-based Context-aware Tensor Decomposition (HCTD) for workers’ temporal preferences modeling and preference-aware task assignment. We model worker preferences with a three-dimensional tensor (worker-task-time). Supplementing the missing entries of the tensor through HCTD with the assistance of historical data and two other context matrices, we recover worker preferences for different categories of tasks in different time slots. Several preference-aware task assignment algorithms are then devised, aiming to maximize the total number of task assignments at every time instance, in which we give higher priorities to the workers who are more interested in the tasks. We conduct extensive experiments using a real dataset, verifying the practicability of our proposed methods.

PDF Details

AAAI Conference 2018 Conference Paper

Attention-Based Transactional Context Embedding for Next-Item Recommendation

Shoujin Wang
Liang Hu
Longbing Cao
Xiaoshui Huang
Defu Lian
Wei Liu

To recommend the next item to a user in a transactional context is practical yet challenging in applications such as marketing campaigns. Transactional context refers to the items that are observable in a transaction. Most existing transaction-based recommender systems (TBRSs) make recommendations by mainly considering recently occurring items instead of all the ones observed in the current context. Moreover, they often assume a rigid order between items within a transaction, which is not always practical. More importantly, a long transaction often contains many items irrelevant to the next choice, which tends to overwhelm the influence of a few truly relevant ones. Therefore, we posit that a good TBRS should not only consider all the observed items in the current transaction but also weight them with different relevance to build an attentive context that outputs the proper next item with a high probability. To this end, we design an effective attention-based transaction embedding model (ATEM) for context embedding to weight each observed item in a transaction without assuming order. The empirical study on real-world transaction datasets proves that ATEM significantly outperforms the state-of-the-art methods in terms of both accuracy and novelty.

PDF Details

AAAI Conference 2018 Conference Paper

Sparse Modeling-Based Sequential Ensemble Learning for Effective Outlier Detection in High-Dimensional Numeric Data

Guansong Pang
Longbing Cao
Ling Chen
Defu Lian
Huan Liu

The large proportion of irrelevant or noisy features in real-life high-dimensional data presents a significant challenge to subspace/feature selection-based high-dimensional outlier detection (a.k.a. outlier scoring) methods. These methods often perform the two dependent tasks: relevant feature subset search and outlier scoring independently, consequently retaining features/subspaces irrelevant to the scoring method and downgrading the detection performance. This paper introduces a novel sequential ensemble-based framework SEMSE and its instance CINFO to address this issue. SEMSE learns the sequential ensembles to mutually refine feature selection and outlier scoring by iterative sparse modeling with outlier scores as the pseudo target feature. CINFO instantiates SEMSE by using three successive recurrent components to build such sequential ensembles. Given outlier scores output by an existing outlier scoring method on a feature subset, CINFO first defines a Cantelli’s inequality-based outlier thresholding function to select outlier candidates with a false positive upper bound. It then performs lasso-based sparse regression by treating the outlier scores as the target feature and the original features as predictors on the outlier candidate set to obtain a feature subset that is tailored for the outlier scoring method. Our experiments show that two different outlier scoring methods enabled by CINFO (i) perform significantly better on 11 real-life high-dimensional data sets, and (ii) have much better resilience to noisy features, compared to their bare versions and three state-of-the-art competitors. The source code of CINFO is available at https://sites.google.com/site/gspangsite/sourcecode.
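
To make the idea of a Cantelli's inequality-based threshold concrete, here is a small sketch under stated assumptions. Cantelli's one-sided inequality says that for any distribution with mean mu and standard deviation sigma, the probability of a value at or above mu + a * sigma is at most 1 / (1 + a^2) for a > 0. The sketch flags scores at or above that cutoff and reports the bound. The function name, the use of the population standard deviation, and the choice of `a` are assumptions for illustration and are not taken from the CINFO source code.

```python
import statistics
from typing import List, Tuple


def cantelli_candidates(scores: List[float], a: float) -> Tuple[List[int], float]:
    """Return indices with score >= mean + a * std, and the Cantelli bound.

    The bound 1 / (1 + a^2) caps the fraction of a distribution that can lie
    at or above the cutoff, whatever the shape of the distribution (a > 0).
    """
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores)
    threshold = mu + a * sigma
    candidates = [i for i, s in enumerate(scores) if s >= threshold]
    bound = 1.0 / (1.0 + a * a)
    return candidates, bound


if __name__ == "__main__":
    # Hypothetical outlier scores: nine ordinary points and one extreme point.
    example_scores = [1.0] * 9 + [10.0]
    print(cantelli_candidates(example_scores, a=1.0))
```

A larger `a` tightens the bound and yields fewer candidates, which is the trade-off a thresholding function of this kind exposes.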

PDF Details

AAAI Conference 2017 Conference Paper

Discrete Personalized Ranking for Fast Collaborative Filtering from Implicit Feedback

Yan Zhang
Defu Lian
Guowu Yang

Personalized ranking is usually considered as an ultimate goal of recommendation systems, but it suffers from efficiency issues when making recommendations. To this end, we propose a learning-based hashing framework called Discrete Personalized Ranking (DPR), to map users and items to a Hamming space, where user-item affinity can be efficiently calculated via Hamming distance. Due to the existence of discrete constraints, it is possible to exploit a two-stage learning procedure for learning binary codes according to most existing methods. This two-stage procedure consists of relaxed optimization by discarding discrete constraints and subsequent binary quantization. However, such a procedure has been shown to result in a large quantization loss, so that longer binary codes would be required. To this end, DPR directly tackles the discrete optimization problem of personalized ranking. The balance and un-correlation constraints of binary codes are imposed to derive compact but informative binary codes. Based on the evaluation on several datasets, the proposed framework shows consistent superiority to the competing baselines even when using shorter binary codes.
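
The efficiency claim rests on a simple fact: when users and items are binary codes, affinity reduces to an XOR followed by a bit count. The sketch below shows this with codes packed into Python integers, together with the standard identity that the inner product of two +1/-1 vectors of length r equals r minus twice their Hamming distance. All names and example codes are hypothetical and not part of DPR itself.

```python
from typing import Dict, List


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two codes packed into ints."""
    return bin(a ^ b).count("1")


def inner_product(a: int, b: int, num_bits: int) -> int:
    """Inner product of the matching +1/-1 vectors: r - 2 * Hamming distance."""
    return num_bits - 2 * hamming_distance(a, b)


def rank_items(user_code: int, item_codes: Dict[str, int]) -> List[str]:
    """Order items by increasing Hamming distance to the user's code."""
    return sorted(
        item_codes,
        key=lambda item: hamming_distance(user_code, item_codes[item]),
    )


if __name__ == "__main__":
    user = 0b1011  # hypothetical 4-bit user code
    items = {"a": 0b1011, "b": 0b0010, "c": 0b0100}  # hypothetical item codes
    print(rank_items(user, items))
```

Because a smaller Hamming distance means a larger inner product, ranking by distance gives the same order as ranking by the +1/-1 inner-product score.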

PDF Details

IJCAI Conference 2017 Conference Paper

Learning User's Intrinsic and Extrinsic Interests for Point-of-Interest Recommendation: A Unified Approach

Huayu Li
Yong Ge
Defu Lian
Hao Liu

Point-of-Interest (POI) recommendation has been an important service on location-based social networks. However, it is very challenging to generate accurate recommendations due to the complex nature of user's interest in POI and the data sparseness. In this paper, we propose a novel unified approach that could effectively learn fine-grained and interpretable user's interest, and adaptively model the missing data. Specifically, a user's general interest in POI is modeled as a mixture of her intrinsic and extrinsic interests, upon which we formulate the ranking constraints in our unified recommendation approach. Furthermore, a self-adaptive location-oriented method is proposed to capture the inherent property of missing data, which is formulated as squared error based loss in our unified optimization objective. Extensive experiments on real-world datasets demonstrate the effectiveness and advantage of our approach.

PDF Details

IJCAI Conference 2016 Conference Paper

A Relaxed Ranking-Based Factor Model for Recommender System from Implicit Feedback

Huayu Li
Richang Hong
Defu Lian
Zhiang Wu
Meng Wang
Yong Ge

Implicit feedback based recommendation has recently been an important task with the accumulated user-item interaction data. However, it is very challenging to produce recommendations from implicit feedback due to the sparseness of data and the lack of negative feedback/rating. Although various factor models have been proposed to tackle this problem, they either focus on rating prediction that may lead to inaccurate top-k recommendations or are dependent on the sampling of negative feedback that often results in bias. To this end, we propose a Relaxed Ranking-based Factor Model, RRFM, to relax pairwise ranking into an SVM-like task, where positive and negative feedback are separated by soft boundaries, and their non-separate property is employed to capture the characteristic of unobserved data. A smooth and scalable algorithm is developed to solve group- and instance-level optimization and parameter estimation. Extensive experiments based on real-world datasets demonstrate the effectiveness and advantage of our approach.

PDF Details

IJCAI Conference 2016 Conference Paper

Sparse Bayesian Content-Aware Collaborative Filtering for Implicit Feedback

Defu Lian
Yong Ge
Nicholas Jing Yuan
Xing Xie
Hui Xiong

The popularity of social media creates a large amount of user-generated content, playing an important role in addressing cold-start problems in recommendation. Although much effort has been devoted to incorporating this information into recommendation, past work mainly targets explicit feedback. There is still no general framework tailored to implicit feedback, such as views, listens, or visits. To this end, we propose a sparse Bayesian content-aware collaborative filtering framework especially for implicit feedback, and develop a scalable optimization algorithm to jointly learn latent factors and hyperparameters. Due to the adaptive update of hyperparameters, automatic feature selection is naturally embedded in this framework. Convincing experimental results on three different implicit feedback datasets indicate the superiority of the proposed algorithm to state-of-the-art content-aware recommendation methods.

PDF Details

TIST Journal 2015 Journal Article

CEPR

Defu Lian
Xing Xie
Vincent W. Zheng
Nicholas Jing Yuan
Fuzheng Zhang
Enhong Chen

With the growing popularity of location-based social networks, numerous location visiting records (e.g., check-ins) continue to accumulate over time. The more these records are collected, the better we can understand users’ mobility patterns and the more accurately we can predict their future locations. However, due to the personality trait of neophilia, people also show propensities of novelty seeking in human mobility, that is, exploring unvisited but tailored locations for them to visit. As such, the existing prediction algorithms, mainly relying on regular mobility patterns, face severe challenges because such behavior is beyond the reach of regularity. As a matter of fact, the prediction of this behavior not only relies on the forecast of novelty-seeking tendency but also depends on how to determine unvisited candidate locations. To this end, we put forward a Collaborative Exploration and Periodically Returning model (CEPR), based on a novel problem, Exploration Prediction (EP), which forecasts whether people will seek unvisited locations to visit, in the following. When people are predicted to do exploration, a state-of-the-art recommendation algorithm, armed with collaborative social knowledge and assisted by geographical influence, will be applied for seeking the suitable candidates; otherwise, a traditional prediction algorithm, incorporating both regularity and the Markov model, will be put into use for figuring out the most possible locations to visit. We then perform case studies on check-ins and evaluate them on two large-scale check-in datasets with 6M and 36M records, respectively. The evaluation results show that EP achieves a roughly 20% classification error rate on both datasets, greatly outperforming the baselines, and that CEPR improves performances by as much as 30% compared to the traditional location prediction algorithms.

Details DOI

TIST Journal 2014 Journal Article

Mining Check-In History for Personalized Location Naming

Defu Lian
Xing Xie

Many innovative location-based services have been established to offer users greater convenience in their everyday lives. These services usually cannot map user's physical locations into semantic names automatically. The semantic names of locations provide important context for mobile recommendations and advertisements. In this article, we proposed a novel location naming approach which can automatically provide semantic names for users given their locations and time. In particular, when a user opens a GPS device and submits a query with her physical location and time, she will be returned the most appropriate semantic name. In our approach, we drew an analogy between location naming and local search, and designed a local search framework to propose a spatiotemporal and user preference (STUP) model for location naming. STUP combined three components, user preference (UP), spatial preference (SP), and temporal preference (TP), by leveraging learning-to-rank techniques. We evaluated STUP on 466,190 check-ins of 5,805 users from Shanghai and 135,052 check-ins of 1,361 users from Beijing. The results showed that SP was most effective among three components and that UP can provide personalized semantic names, and thus it was a necessity for location naming. Although TP was not as discriminative as the others, it can still be beneficial when integrated with SP and UP. Finally, according to the experimental results, STUP outperformed the proposed baselines and returned accurate semantic names for 23.6% and 26.6% of the testing queries from Beijing and Shanghai, respectively.

Details DOI