Arrow Research search

Author name cluster

Fuli Feng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

28 papers
2 author rows

Possible papers

28

AAAI Conference 2026 Conference Paper

Don’t Start Over: A Cost-Effective Framework for Migrating Personalized Prompts Between LLMs

  • Ziyi Zhao
  • Chongming Gao
  • Yang Zhang
  • Haoyan Liu
  • Weinan Gan
  • Huifeng Guo
  • Yong Liu
  • Fuli Feng

Personalization in Large Language Models (LLMs) often relies on user-specific soft prompts. However, these prompts become obsolete when the foundation model is upgraded, necessitating costly, full-scale retraining. To overcome this limitation, we propose the Prompt-level User Migration Adapter (PUMA), a lightweight framework to efficiently migrate personalized prompts across incompatible models. PUMA utilizes a parameter-efficient adapter to bridge the semantic gap, combined with a group-based user selection strategy to significantly reduce training costs. Experiments on three large-scale datasets show our method matches or even surpasses the performance of retraining from scratch, reducing computational cost by up to 98%. The framework demonstrates strong generalization across diverse model architectures and robustness in advanced scenarios like chained and aggregated migrations, offering a practical path for the sustainable evolution of personalized AI by decoupling user assets from the underlying models.

AAAI Conference 2026 Conference Paper

Navigating Through Paper Flood: Advancing LLM-Based Paper Evaluation Through Domain-Aware Retrieval and Latent Reasoning

  • Wuqiang Zheng
  • Yiyan Xu
  • Xinyu Lin
  • Chongming Gao
  • Wenjie Wang
  • Fuli Feng

With the rapid and continuous increase in academic publications, identifying high-quality research has become an increasingly pressing challenge. While recent methods leveraging Large Language Models (LLMs) for automated paper evaluation have shown great promise, they are often constrained by outdated domain knowledge and limited reasoning capabilities. In this work, we present PaperEval, a novel LLM-based framework for automated paper evaluation that addresses these limitations through two key components: 1) a domain-aware paper retrieval module that retrieves relevant concurrent work to support contextualized assessments of novelty and contributions, and 2) a latent reasoning mechanism that enables deep understanding of complex motivations and methodologies, along with comprehensive comparison against concurrently related work, to support more accurate and reliable evaluation. To guide the reasoning process, we introduce a progressive ranking optimization strategy that encourages the LLM to iteratively refine its predictions with an emphasis on relative comparison. Experiments on two datasets demonstrate that PaperEval consistently outperforms existing methods in both academic impact and paper quality evaluation. In addition, we deploy PaperEval in a real-world paper recommendation system for filtering high-quality papers, which has gained strong engagement on social media---amassing over 8,000 subscribers and attracting over 10,000 views for many filtered high-quality papers---demonstrating the practical effectiveness of PaperEval.

AAAI Conference 2025 Conference Paper

CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG

  • Boyi Deng
  • Wenjie Wang
  • Fengbin Zhu
  • Qifan Wang
  • Fuli Feng

Retrieval-Augmented Generation (RAG) can alleviate hallucinations of Large Language Models (LLMs) by referencing external documents. However, misinformation in external documents may mislead LLMs' generation. To address this issue, we explore the task of "credibility-aware RAG", in which LLMs automatically adjust the influence of retrieved documents based on their credibility scores to counteract misinformation. To this end, we introduce a plug-and-play method named Credibility-aware Attention Modification (CrAM). CrAM identifies influential attention heads in LLMs and adjusts their attention weights based on the credibility of the documents, thereby reducing the impact of low-credibility documents. Experiments on Natural Questions and TriviaQA using Llama2-13B, Llama3-8B, and Qwen1.5-7B show that CrAM improves the RAG performance of LLMs against misinformation pollution by over 20%, even surpassing supervised fine-tuning methods.
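CrAM's core operation, rescaling attention weights by document credibility and renormalizing, can be sketched in a few lines. This is an illustrative simplification over a single head's weights, not the paper's implementation, which additionally identifies the influential heads inside the LLM:

```python
import numpy as np

def credibility_adjusted_attention(scores, credibility):
    """Turn raw attention logits into weights, downweight tokens from
    low-credibility documents, and renormalize."""
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights = weights / weights.sum()
    adjusted = weights * credibility          # credibility scores in [0, 1]
    return adjusted / adjusted.sum()

# Three context tokens; the last comes from a low-credibility document.
att = credibility_adjusted_attention(
    np.array([1.0, 1.0, 1.0]),
    np.array([1.0, 1.0, 0.2]),
)
```

With uniform logits, the low-credibility token's weight falls from 1/3 to 0.2/2.2, while the weights still sum to one.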

ICLR Conference 2025 Conference Paper

Efficient Inference for Large Language Model-based Generative Recommendation

  • Xinyu Lin 0001
  • Chaoqun Yang 0002
  • Wenjie Wang 0007
  • Yongqi Li 0001
  • Cunxiao Du
  • Fuli Feng
  • See-Kiong Ng
  • Tat-Seng Chua

Large Language Model (LLM)-based generative recommendation has achieved notable success, yet its practical deployment is costly, particularly due to excessive inference latency caused by autoregressive decoding. For lossless LLM decoding acceleration, Speculative Decoding (SD) has emerged as a promising solution. However, applying SD to generative recommendation presents unique challenges due to the requirement of generating top-K items (i.e., K distinct token sequences) as a recommendation list by beam search. This leads to more stringent verification in SD, where all the top-K sequences from the target LLM must be successfully drafted by the draft model at each decoding step. To alleviate this, we consider 1) boosting top-K sequence alignment between the draft model and the target LLM, and 2) relaxing the verification strategy to reduce trivial LLM calls. To this end, we propose an alignment framework named AtSpeed, which presents the AtSpeed-S optimization objective for top-K alignment under the strict top-K verification. Moreover, we introduce a relaxed sampling verification strategy that allows high-probability non-top-K drafted sequences to be accepted, significantly reducing LLM calls. Correspondingly, we propose AtSpeed-R for top-K alignment under this relaxed sampling verification. Empirical results on two real-world datasets demonstrate that AtSpeed significantly accelerates LLM-based generative recommendation, e.g., near 2x speedup under strict top-K verification and up to 2.5x speedup under relaxed sampling verification. The code and datasets are available at https://github.com/Linxyhaha/AtSpeed.

NeurIPS Conference 2025 Conference Paper

Fine-grained List-wise Alignment for Generative Medication Recommendation

  • Chenxiao Fan
  • Chongming Gao
  • Wentao Shi
  • Yaxin Gong
  • Zhao Zihao
  • Fuli Feng

Accurate and safe medication recommendations are critical for effective clinical decision-making, especially in multimorbidity cases. However, existing systems rely on point-wise prediction paradigms that overlook synergistic drug effects and potential adverse drug-drug interactions (DDIs). We propose FLAME, a fine-grained list-wise alignment framework for large language models (LLMs), enabling drug-by-drug generation of drug lists. FLAME formulates recommendation as a sequential decision process, where each step adds or removes a single drug. To provide fine-grained learning signals, we devise step-wise Group Relative Policy Optimization (GRPO) with potential-based reward shaping, which explicitly models DDIs and optimizes the contribution of each drug to the overall prescription. Furthermore, FLAME enhances patient modeling by integrating structured clinical knowledge and collaborative information into the representation space of LLMs. Experiments on benchmark datasets demonstrate that FLAME achieves state-of-the-art performance, delivering superior accuracy, controllable safety–accuracy trade-offs, and strong generalization across diverse clinical scenarios. Our code is available at https://github.com/cxfann/Flame.

NeurIPS Conference 2025 Conference Paper

IGD: Token Decisiveness Modeling via Information Gain in LLMs for Personalized Recommendation

  • Zijie Lin
  • Yang Zhang
  • Xiaoyan Zhao
  • Fengbin Zhu
  • Fuli Feng
  • Tat-Seng Chua

Large Language Models (LLMs) have shown strong potential for recommendation by framing item prediction as a token-by-token language generation task. However, existing methods treat all item tokens equally, simply pursuing likelihood maximization during both optimization and decoding. This overlooks crucial token-level differences in decisiveness—many tokens contribute little to item discrimination yet can dominate optimization or decoding. To quantify token decisiveness, we propose a novel perspective that models item generation as a decision process, measuring token decisiveness by the Information Gain (IG) each token provides in reducing uncertainty about the generated item. Our empirical analysis reveals that most tokens have low IG but often correspond to high logits, disproportionately influencing training loss and decoding, which may impair model performance. Building on these insights, we introduce an Information Gain-based Decisiveness-aware Token handling (IGD) strategy that integrates token decisiveness into both tuning and decoding. Specifically, IGD downweights low-IG tokens during tuning and rebalances decoding to emphasize tokens with high IG. In this way, IGD moves beyond pure likelihood maximization, effectively prioritizing high-decisiveness tokens. Extensive experiments on four benchmark datasets with two LLM backbones demonstrate that IGD consistently improves recommendation accuracy, achieving significant gains on widely used ranking metrics compared to strong baselines. Our code is available at https://github.com/ZJLin2oo1/IGD.
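The token-level Information Gain that IGD measures can be illustrated on a toy item catalog: under a uniform prior over the items consistent with the current prefix, a token's IG is the entropy drop it causes. The catalog and token sequences below are hypothetical, chosen only to show why some tokens are far more decisive than others:

```python
import math

# Toy catalog: items identified by token sequences, as in generative recommendation.
ITEMS = [("sci-fi", "space", "opera"),
         ("sci-fi", "space", "western"),
         ("sci-fi", "time", "travel"),
         ("romance", "period", "drama")]

def information_gain(prefix, token):
    """Entropy reduction over the item set when `token` extends `prefix`,
    assuming a uniform prior over items consistent with the prefix."""
    before = [it for it in ITEMS if it[:len(prefix)] == prefix]
    after = [it for it in before if it[len(prefix)] == token]
    return math.log2(len(before)) - math.log2(len(after))

# "sci-fi" rules out only 1 of 4 items: low decisiveness.
low = information_gain((), "sci-fi")
# "time" after "sci-fi" pins the item down uniquely: high decisiveness.
high = information_gain(("sci-fi",), "time")
```

IGD would downweight the low-IG token during tuning and emphasize the high-IG one during decoding.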

NeurIPS Conference 2025 Conference Paper

Less is More: Improving LLM Alignment via Preference Data Selection

  • Xun Deng
  • Han Zhong
  • Rui Ai
  • Fuli Feng
  • Zheng Wang
  • Xiangnan He

Direct Preference Optimization (DPO) has emerged as a promising approach for aligning large language models with human preferences. While prior work mainly extends DPO from the aspect of the objective function, we instead improve DPO from the largely overlooked but critical aspect of data selection. Specifically, we address the issue of parameter shrinkage caused by noisy data by proposing a novel margin-maximization principle for dataset curation in DPO training. To further mitigate the noise in different reward models, we propose a Bayesian Aggregation approach that unifies multiple margin sources (external and implicit) into a single preference probability. Extensive experiments in diverse settings demonstrate the consistently high data efficiency of our approach. Remarkably, by using just 10% of the Ultrafeedback dataset, our approach achieves 3% to 8% improvements across various Llama, Mistral, and Qwen models on the AlpacaEval2 benchmark. Furthermore, our approach seamlessly extends to iterative DPO, yielding a roughly 3% improvement with 25% online data, revealing the high redundancy in this presumed high-quality data construction manner. These results highlight the potential of data selection strategies for advancing preference optimization.
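The margin-maximization principle can be illustrated with a single external reward margin per preference pair; the paper additionally unifies multiple margin sources via Bayesian aggregation, which this simplified sketch omits:

```python
import numpy as np

def select_by_margin(margins, keep_frac=0.1):
    """Keep the preference pairs whose reward margin (chosen-response
    reward minus rejected-response reward) is largest; noisy, low-margin
    pairs are dropped from DPO training."""
    margins = np.asarray(margins, dtype=float)
    k = max(1, int(len(margins) * keep_frac))
    return np.argsort(margins)[::-1][:k]   # indices of the top-k margins

# Ten pairs with varying margins; keep the top 20%.
idx = select_by_margin(
    [0.1, 2.0, -0.5, 0.3, 1.5, 0.0, 0.9, -1.2, 0.4, 0.2],
    keep_frac=0.2,
)
```

The pairs with margins 2.0 and 1.5 survive; the near-zero and negative margins, the likeliest label noise, are discarded.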

AAAI Conference 2025 Conference Paper

Pre-trained Behavioral Model for Malicious User Prediction on Social Platform

  • Meng Jiang
  • Wenjie Wang
  • Shaofeng Hu
  • Kaishen Ou
  • Zhenjing Zheng
  • Fuli Feng

The proliferation of malicious users on social platforms poses significant financial and psychological threats, with activities ranging from scams to the dissemination of illicit content. Existing malicious user prediction methods comprise supervised and self-supervised learning approaches. However, the former rely on extensive labeled malicious users for training, while the latter typically focus on one form of malicious activity and depend heavily on manually crafted rules and features during pre-training. Moreover, existing pre-training methods fail to effectively capture the crucial repetitive and sporadic behavior patterns of malicious users. To address these limitations, we propose a Malicious User Behavior Pre-training framework (MaP) to build pre-trained behavior models. MaP integrates malicious pattern recognition with behavior consistency augmentation and local disruption augmentation strategies for contrastive learning to capture repetitive and sporadic malicious patterns, respectively. We instantiate MaP on a billion-level behavior pre-training scenario within an industry context. Both online and offline evaluations validate the superior performance of MaP in malicious user detection and classification.

ICML Conference 2024 Conference Paper

A3S: A General Active Clustering Method with Pairwise Constraints

  • Xun Deng
  • Junlong Liu
  • Han Zhong
  • Fuli Feng
  • Chen Shen 0003
  • Xiangnan He 0001
  • Jieping Ye
  • Zheng Wang 0027

Active clustering aims to boost the clustering performance by integrating human-annotated pairwise constraints through strategic querying. Conventional approaches with semi-supervised clustering schemes encounter high query costs when applied to large datasets with numerous classes. To address these limitations, we propose a novel Adaptive Active Aggregation and Splitting (A3S) framework, falling within the cluster-adjustment scheme in active clustering. A3S features strategic active clustering adjustment on the initial cluster result, which is obtained by an adaptive clustering algorithm. In particular, our cluster adjustment is inspired by the quantitative analysis of Normalized mutual information gain under the information theory framework and can provably improve the clustering quality. The proposed A3S framework significantly elevates the performance and scalability of active clustering. In extensive experiments across diverse real-world datasets, A3S achieves desired results with significantly fewer human queries compared with existing methods.

ICLR Conference 2024 Conference Paper

Be Aware of the Neighborhood Effect: Modeling Selection Bias under Interference

  • Haoxuan Li 0001
  • Chunyuan Zheng 0001
  • Sihao Ding 0003
  • Peng Wu 0012
  • Zhi Geng
  • Fuli Feng
  • Xiangnan He 0001

Selection bias in recommender systems arises from the recommendation process of system filtering and the interactive process of user selection. Many previous studies have focused on addressing selection bias to achieve unbiased learning of the prediction model, but ignore the fact that potential outcomes for a given user-item pair may vary with the treatments assigned to other user-item pairs, named the neighborhood effect. To fill the gap, this paper formally formulates the neighborhood effect as an interference problem from the perspective of causal inference, and introduces a treatment representation to capture the neighborhood effect. On this basis, we propose a novel ideal loss that can be used to deal with selection bias in the presence of the neighborhood effect. We further develop two new estimators for estimating the proposed ideal loss. We theoretically establish the connection between the proposed and previous debiasing methods that ignore the neighborhood effect, showing that the proposed methods can achieve unbiased learning when both selection bias and neighborhood effects are present, while the existing methods are biased. Extensive semi-synthetic and real-world experiments are conducted to demonstrate the effectiveness of the proposed methods.

ICML Conference 2024 Conference Paper

On the Maximal Local Disparity of Fairness-Aware Classifiers

  • Jinqiu Jin
  • Haoxuan Li 0001
  • Fuli Feng

Fairness has become a crucial aspect in the development of trustworthy machine learning algorithms. Current fairness metrics to measure the violation of demographic parity have the following drawbacks: (i) the average difference of model predictions on two groups cannot reflect their distribution disparity, and (ii) the overall calculation along all possible predictions conceals the extreme local disparity at or around certain predictions. In this work, we propose a novel fairness metric called Maximal Cumulative ratio Disparity along varying Predictions' neighborhood (MCDP), for measuring the maximal local disparity of fairness-aware classifiers. To accurately and efficiently calculate the MCDP, we develop a provably exact algorithm and an approximate algorithm that greatly reduces the computational complexity with low estimation error. We further propose a bi-level optimization algorithm using a differentiable approximation of the MCDP for improving algorithmic fairness. Extensive experiments on both tabular and image datasets validate that our fair training algorithm can achieve superior fairness-accuracy trade-offs.
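The exact MCDP definition and its algorithms are given in the paper; as a loose illustration of "maximal local disparity", the sketch below computes the largest pointwise gap between group-wise empirical CDFs of model scores (a Kolmogorov-Smirnov-style quantity), which a group-average difference can entirely miss:

```python
import numpy as np

def max_cdf_gap(scores_a, scores_b):
    """Largest pointwise gap between the empirical CDFs of model scores
    for two demographic groups, evaluated at all observed scores."""
    grid = np.union1d(scores_a, scores_b)
    cdf_a = np.searchsorted(np.sort(scores_a), grid, side="right") / len(scores_a)
    cdf_b = np.searchsorted(np.sort(scores_b), grid, side="right") / len(scores_b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

group_a = np.array([0.1, 0.4, 0.5, 0.9])
same = max_cdf_gap(group_a, group_a)                       # identical: 0
skew = max_cdf_gap(group_a, np.array([0.6, 0.7, 0.8, 0.9]))  # large local gap
```

Even when two groups share the same mean prediction, a large CDF gap at some threshold reveals a severe local disparity.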

AAAI Conference 2024 Conference Paper

Temporally and Distributionally Robust Optimization for Cold-Start Recommendation

  • Xinyu Lin
  • Wenjie Wang
  • Jujia Zhao
  • Yongqi Li
  • Fuli Feng
  • Tat-Seng Chua

Collaborative Filtering (CF) recommender models highly depend on user-item interactions to learn CF representations, thus falling short of recommending cold-start items. To address this issue, prior studies mainly introduce item features (e.g., thumbnails) for cold-start item recommendation. They learn a feature extractor on warm-start items to align feature representations with interactions, and then leverage the feature extractor to extract the feature representations of cold-start items for interaction prediction. Unfortunately, the features of cold-start items, especially the popular ones, tend to diverge from those of warm-start ones due to temporal feature shifts, preventing the feature extractor from accurately learning feature representations of cold-start items. To alleviate the impact of temporal feature shifts, we consider using Distributionally Robust Optimization (DRO) to enhance the generalization ability of the feature extractor. Nonetheless, existing DRO methods face an inconsistency issue: the worst-case warm-start items emphasized during DRO training might not align well with the cold-start item distribution. To capture the temporal feature shifts and combat this inconsistency issue, we propose a novel temporal DRO with new optimization objectives, namely, 1) to integrate a worst-case factor to improve the worst-case performance, and 2) to devise a shifting factor to capture the shifting trend of item features and enhance the optimization of the potentially popular groups in cold-start items. Substantial experiments on three real-world datasets validate the superiority of our temporal DRO in enhancing the generalization ability of cold-start recommender models.

AAAI Conference 2023 Conference Paper

Black-Box Adversarial Attack on Time Series Classification

  • Daizong Ding
  • Mi Zhang
  • Fuli Feng
  • Yuanmin Huang
  • Erling Jiang
  • Min Yang

With the increasing use of deep neural networks (DNNs) in time series classification (TSC), recent work reveals the threat of adversarial attacks, where the adversary can construct adversarial examples to cause model mistakes. However, existing research on adversarial attacks against TSC typically adopts an unrealistic white-box setting with model details transparent to the adversary. In this work, we study a more rigorous black-box setting with attack detection applied, which restricts gradient access and requires the adversarial example to also be stealthy. Theoretical analyses reveal that the key lies in: estimating the black-box gradient with the diversity and non-convexity of TSC models resolved, and restricting the l0 norm of the perturbation to construct adversarial samples. Towards this end, we propose a new framework named BlackTreeS, which solves the hard optimization issue for adversarial example construction with two simple yet effective modules. In particular, we propose a tree search strategy to find influential positions in a sequence, and independently estimate the black-box gradients for these positions. Extensive experiments on three real-world TSC datasets and five DNN-based models validate the effectiveness of BlackTreeS, e.g., it improves the attack success rate from 19.3% to 27.3%, and decreases the detection success rate from 90.9% to 6.8% for LSTM on the UWave dataset.

IJCAI Conference 2023 Conference Paper

Discriminative-Invariant Representation Learning for Unbiased Recommendation

  • Hang Pan
  • Jiawei Chen
  • Fuli Feng
  • Wentao Shi
  • Junkang Wu
  • Xiangnan He

Selection bias hinders recommendation models from learning unbiased user preference. Recent works empirically reveal that pursuing invariant user and item representations across biased and unbiased data is crucial for counteracting selection bias. However, our theoretical analysis reveals that simply optimizing representation invariance is insufficient for addressing the selection bias — recommendation performance is bounded by both representation invariance and discriminability. Worse still, current invariant representation learning methods in recommendation neglect or even hurt the representation discriminability due to data sparsity and label shift. In this light, we propose a new Discriminative-Invariant Representation Learning framework for unbiased recommendation, which incorporates label-conditional clustering and prior-guided contrasting into conventional invariant representation learning to mitigate the impact of data sparsity and label shift, respectively. We conduct extensive experiments on three real-world datasets, validating the rationality and effectiveness of the proposed framework. Code and supplementary materials are available at: https://github.com/HungPaan/DIRL.

NeurIPS Conference 2023 Conference Paper

Fairly Recommending with Social Attributes: A Flexible and Controllable Optimization Approach

  • Jinqiu Jin
  • Haoxuan Li
  • Fuli Feng
  • Sihao Ding
  • Peng Wu
  • Xiangnan He

Item-side group fairness (IGF) requires a recommendation model to treat different item groups similarly, and has a crucial impact on information diffusion, consumption activity, and market equilibrium. Previous IGF notions only focus on the direct utility of the item exposures, i.e., the exposure numbers across different item groups. Nevertheless, the item exposures also facilitate utility gained from the neighboring users via social influence, called social utility, such as information sharing on social media. To fill this gap, this paper introduces two social attribute-aware IGF metrics, which require similar user social attributes on the exposed items across the different item groups. In light of the trade-off between the direct utility and social utility, we formulate a new multi-objective optimization problem for training recommender models with flexible trade-off while ensuring controllable accuracy. To solve this problem, we develop a gradient-based optimization algorithm and theoretically show that the proposed algorithm can find Pareto optimal solutions with varying trade-offs and guaranteed accuracy. Extensive experiments on two real-world datasets validate the effectiveness of our approach.

NeurIPS Conference 2023 Conference Paper

Removing Hidden Confounding in Recommendation: A Unified Multi-Task Learning Approach

  • Haoxuan Li
  • Kunhan Wu
  • Chunyuan Zheng
  • Yanghao Xiao
  • Hao Wang
  • Zhi Geng
  • Fuli Feng
  • Xiangnan He

In recommender systems, the collected data used for training is always subject to selection bias, which poses a great challenge for unbiased learning. Previous studies proposed various debiasing methods based on observed user and item features, but ignored the effect of hidden confounding. To address this problem, recent works suggest the use of sensitivity analysis for worst-case control of the unknown true propensity, which is only valid when the true propensity is close to the nominal propensity within a finite bound. In this paper, we first perform theoretical analysis to reveal the possible failure of previous approaches, including propensity-based, multi-task learning, and bi-level optimization methods, in achieving unbiased learning when hidden confounding is present. Then, we propose a unified multi-task learning approach to remove hidden confounding, which uses a few unbiased ratings to calibrate the learned nominal propensities and nominal error imputations from biased data. We conduct extensive experiments on three publicly available benchmark datasets, including a fully exposed large-scale industrial dataset, validating the effectiveness of the proposed methods in removing hidden confounding.

AAAI Conference 2023 Conference Paper

Video-Audio Domain Generalization via Confounder Disentanglement

  • Shengyu Zhang
  • Xusheng Feng
  • Wenyan Fan
  • Wenjing Fang
  • Fuli Feng
  • Wei Ji
  • Shuo Li
  • Li Wang

Existing video-audio understanding models are trained and evaluated in an intra-domain setting, facing performance degeneration in real-world applications where multiple domains and distribution shifts naturally exist. The key to video-audio domain generalization (VADG) lies in alleviating spurious correlations over multi-modal features. To achieve this goal, we resort to causal theory and attribute such correlation to confounders affecting both video-audio features and labels. We propose a DeVADG framework that conducts uni-modal and cross-modal deconfounding through back-door adjustment. DeVADG performs cross-modal disentanglement and obtains fine-grained confounders at both class-level and domain-level using half-sibling regression and unpaired domain transformation, which essentially identifies domain-variant factors and class-shared factors that cause spurious correlations between features and false labels. To promote VADG research, we collect a VADG-Action dataset for video-audio action recognition with over 5,000 video clips across four domains (e.g., cartoon and game) and ten action classes (e.g., cooking and riding). We conduct extensive experiments, i.e., multi-source DG, single-source DG, and qualitative analysis, validating the rationality of our causal analysis and the effectiveness of the DeVADG framework.

IJCAI Conference 2022 Conference Paper

Copy Motion From One to Another: Fake Motion Video Generation

  • Zhenguang Liu
  • Sifan Wu
  • Chejian Xu
  • Xiang Wang
  • Lei Zhu
  • Shuang Wu
  • Fuli Feng

One compelling application of artificial intelligence is to generate a video of a target person performing arbitrary desired motion (from a source person). While the state-of-the-art methods are able to synthesize a video demonstrating similar broad-stroke motion details, they are generally lacking in texture details. A pertinent manifestation appears as distorted face, feet, and hands, and such flaws are very sensitively perceived by human observers. Furthermore, current methods typically employ GANs with an L2 loss to assess the authenticity of the generated videos, inherently requiring a large amount of training samples to learn the texture details for adequate video generation. In this work, we tackle these challenges from three aspects: 1) We disentangle each video frame into foreground (the person) and background, focusing on generating the foreground to reduce the underlying dimension of the network output. 2) We propose a theoretically motivated Gromov-Wasserstein loss that facilitates learning the mapping from a pose to a foreground image. 3) To enhance texture details, we encode facial features with geometric guidance and employ local GANs to refine the face, feet, and hands. Extensive experiments show that our method is able to generate realistic target person videos, faithfully copying complex motions from a source person. Our code and datasets are released at https://github.com/Sifann/FakeMotion.

IJCAI Conference 2022 Conference Paper

GL-RG: Global-Local Representation Granularity for Video Captioning

  • Liqi Yan
  • Qifan Wang
  • Yiming Cui
  • Fuli Feng
  • Xiaojun Quan
  • Xiangyu Zhang
  • Dongfang Liu

Video captioning is a challenging task as it needs to accurately transform visual understanding into natural language description. To date, state-of-the-art methods inadequately model global-local representation across video frames for caption generation, leaving plenty of room for improvement. In this work, we approach the video captioning task from a new perspective and propose a GL-RG framework for video captioning, namely Global-Local Representation Granularity. Our GL-RG demonstrates three advantages over the prior efforts: 1) we explicitly exploit extensive visual representations from different video ranges to improve linguistic expression; 2) we devise a novel global-local encoder to produce a rich semantic vocabulary to obtain a descriptive granularity of video contents across frames; 3) we develop an incremental training strategy which organizes model learning in an incremental fashion to incur an optimal captioning behavior. Experimental results on the challenging MSR-VTT and MSVD datasets show that our GL-RG outperforms recent state-of-the-art methods by a significant margin. Code is available at https://github.com/ylqi/GL-RG.

AAAI Conference 2021 Conference Paper

Conceptualized and Contextualized Gaussian Embedding

  • Chen Qian
  • Fuli Feng
  • Lijie Wen
  • Tat-Seng Chua

Word embedding can represent a word as a point vector or a Gaussian distribution in high-dimensional spaces. A Gaussian distribution is innately more expressive than a point vector owing to the ability to additionally capture semantic uncertainties of words, and thus can express asymmetric relations among words more naturally (e.g., animal entails cat but not the reverse). However, previous Gaussian embedders neglect inner-word conceptual knowledge and lack a tailored Gaussian contextualizer, leading to inferior performance on both intrinsic (context-agnostic) and extrinsic (context-sensitive) tasks. In this paper, we first propose a novel Gaussian embedder which explicitly accounts for inner-word conceptual units (sememes) to represent word semantics more precisely; during learning, we propose Gaussian Distribution Attention over Gaussian representations to adaptively aggregate multiple sememe distributions into a word distribution, which guarantees the Gaussian linear combination property. Additionally, we propose a Gaussian contextualizer to utilize outer-word contexts in a sentence, producing contextualized Gaussian representations for context-sensitive tasks. Extensive experiments on intrinsic and extrinsic tasks demonstrate the effectiveness of the proposed approach, achieving state-of-the-art performance with nearly 5.00% relative improvement.

IJCAI Conference 2020 Conference Paper

Bilinear Graph Neural Network with Neighbor Interactions

  • Hongmin Zhu
  • Fuli Feng
  • Xiangnan He
  • Xiang Wang
  • Yan Li
  • Kai Zheng
  • Yongdong Zhang

Graph Neural Network (GNN) is a powerful model to learn representations and make predictions on graph data. Existing efforts on GNN have largely defined the graph convolution as a weighted sum of the features of the connected nodes to form the representation of the target node. Nevertheless, the operation of weighted sum assumes the neighbor nodes are independent of each other, and ignores the possible interactions between them. When such interactions exist, such as the co-occurrence of two neighbor nodes is a strong signal of the target node's characteristics, existing GNN models may fail to capture the signal. In this work, we argue the importance of modeling the interactions between neighbor nodes in GNN. We propose a new graph convolution operator, which augments the weighted sum with pairwise interactions of the representations of neighbor nodes. We term this framework as Bilinear Graph Neural Network (BGNN), which improves GNN representation ability with bilinear interactions between neighbor nodes. In particular, we specify two BGNN models named BGCN and BGAT, based on the well-known GCN and GAT, respectively. Empirical results on three public benchmarks of semi-supervised node classification verify the effectiveness of BGNN --- BGCN (BGAT) outperforms GCN (GAT) by 1.6% (1.5%) in classification accuracy. Codes are available at: https://github.com/zhuhm1996/bgnn.
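The bilinear aggregator's idea, augmenting a plain weighted sum with pairwise neighbor interactions, can be sketched as follows. The mean aggregation and the alpha mixing weight are illustrative choices, not BGNN's exact parameterization:

```python
import numpy as np

def bilinear_aggregate(neighbors):
    """Average elementwise products over all neighbor pairs: the bilinear
    term that captures co-occurrence signals a weighted sum ignores."""
    n = len(neighbors)
    pair_sum = np.zeros_like(neighbors[0], dtype=float)
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            pair_sum += neighbors[i] * neighbors[j]
            count += 1
    return pair_sum / count

def bgnn_layer(neighbors, alpha=0.5):
    """Mix a mean aggregation (GCN-style) with the bilinear pairwise term."""
    mean_part = np.mean(neighbors, axis=0)
    return (1 - alpha) * mean_part + alpha * bilinear_aggregate(neighbors)

h = [np.array([1.0, 0.0]), np.array([1.0, 2.0]), np.array([0.0, 2.0])]
out = bgnn_layer(h)
```

The second feature dimension, where two neighbors co-occur with value 2, is reinforced by the bilinear term rather than merely averaged.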

AAAI Conference 2020 Conference Paper

Solving Sequential Text Classification as Board-Game Playing

  • Chen Qian
  • Fuli Feng
  • Lijie Wen
  • Zhenpeng Chen
  • Li Lin
  • Yanan Zheng
  • Tat-Seng Chua

Sequential Text Classification (STC) aims to classify a sequence of text fragments (e.g., words in a sentence or sentences in a document) into a sequence of labels. In addition to the intra-fragment text contents, considering the inter-fragment context dependencies is also important for STC. Previous sequence labeling approaches largely generate a sequence of labels in left-to-right reading order. However, the need for context information in making decisions varies across different fragments and is not strictly organized in a left-to-right order. Therefore, it is appealing to label the fragments that need less consideration of context information first, before labeling the fragments that need more. In this paper, we propose a novel model that labels a sequence of fragments in jumping order. Specifically, we devise a dedicated board-game to develop a correspondence between solving STC and board-game playing. By defining proper game rules and devising a game state evaluator in which context clues are injected, at each round, each player is effectively pushed to find the optimal move without position restrictions by considering the current game state, which corresponds to producing a label for an unlabeled fragment, in jumping order, with consideration of the context clues. The final game-end state is viewed as the optimal label sequence. Extensive results on three representative datasets show that the proposed approach outperforms the state-of-the-art methods with statistical significance.

IJCAI Conference 2019 Conference Paper

Enhancing Stock Movement Prediction with Adversarial Training

  • Fuli Feng
  • Huimin Chen
  • Xiangnan He
  • Ji Ding
  • Maosong Sun
  • Tat-Seng Chua

This paper contributes a new machine learning solution for stock movement prediction, which aims to predict whether the price of a stock will be up or down in the near future. The key novelty is that we propose to employ adversarial training to improve the generalization of a neural network prediction model. The rationale for adversarial training here is that the input features to stock prediction are typically based on stock price, which is essentially a stochastic variable and continuously changes with time by nature. As such, normal training with static price-based features (e.g., the close price) can easily overfit the data, being insufficient to obtain reliable models. To address this problem, we propose to add perturbations to simulate the stochasticity of the price variable, and train the model to work well under small yet intentional perturbations. Extensive experiments on two real-world stock datasets show that our method outperforms the state-of-the-art solution [Xu and Cohen, 2018] with 3.11% relative improvement on average w.r.t. accuracy, validating the usefulness of adversarial training for the stock prediction task.
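The core training idea — perturbing inputs in the direction that most increases the loss, then training on both clean and perturbed versions — can be sketched for a linear classifier with an FGSM-style step. This is a deliberate simplification: the linear model, the input-level (rather than latent-feature) perturbation, and the `eps`/`lam` values are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_example(x, y, w, eps=0.01):
    """Perturb feature vector x along the sign of the input loss gradient.

    For a linear classifier p = sigmoid(w . x) with log loss, the gradient
    of the loss w.r.t. x is (p - y) * w; stepping along its sign is the
    perturbation that (locally) most increases the loss.
    """
    p = sigmoid(w @ x)
    grad = (p - y) * w
    return x + eps * np.sign(grad)

def adversarial_loss(x, y, w, eps=0.01, lam=0.5):
    """Sum of the clean loss and (weighted) loss on the perturbed input."""
    x_adv = adversarial_example(x, y, w, eps)
    def nll(xi):
        p = sigmoid(w @ xi)
        return -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return nll(x) + lam * nll(x_adv)
```

Minimizing `adversarial_loss` pushes the model to classify correctly not just at the observed price features but in a small neighborhood around them, mirroring the stochasticity of prices.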

AAAI Conference 2019 Conference Paper

Explicit Interaction Model towards Text Classification

  • Cunxiao Du
  • Zhaozheng Chen
  • Fuli Feng
  • Lei Zhu
  • Tian Gan
  • Liqiang Nie

Text classification is one of the fundamental tasks in natural language processing. Recently, deep neural networks have achieved promising performance in the text classification task compared to shallow models. Despite the significance of deep models, they ignore fine-grained classification clues (matching signals between words and classes), since their classifications mainly rely on text-level representations. To address this problem, we introduce the interaction mechanism to incorporate word-level matching signals into the text classification task. In particular, we design a novel framework, EXplicit interAction Model (dubbed EXAM), equipped with the interaction mechanism. We justified the proposed approach on several benchmark datasets covering both multi-label and multi-class text classification tasks. Extensive experimental results demonstrate the superiority of the proposed method. As a byproduct, we have released the codes and parameter settings to facilitate other research.

IJCAI Conference 2018 Conference Paper

Cross-Domain Depression Detection via Harvesting Social Media

  • Tiancheng Shen
  • Jia Jia
  • Guangyao Shen
  • Fuli Feng
  • Xiangnan He
  • Huanbo Luan
  • Jie Tang
  • Thanassis Tiropanis

Depression detection is a significant issue for human well-being. In previous studies, online detection has proven effective on Twitter, enabling proactive care for depressed users. Owing to cultural differences, however, replicating the method on other social media platforms, such as Chinese Weibo, might lead to poor performance because of insufficient available labeled (self-reported depression) data for model training. In this paper, we study an interesting but challenging problem of enhancing detection in a certain target domain (e.g., Weibo) with ample Twitter data as the source domain. We first systematically analyze the depression-related feature patterns across domains and summarize two major detection challenges, namely isomerism and divergency. We further propose a cross-domain Deep Neural Network model with a Feature Adaptive Transformation & Combination strategy (DNN-FATC) that transfers the relevant information across heterogeneous domains. Experiments demonstrate improved performance compared to existing heterogeneous transfer methods or training directly in the target domain (over 3.4% improvement in F1), indicating the potential of our model to enable depression detection via social media for more countries with different cultural settings.

IJCAI Conference 2018 Conference Paper

Discrete Factorization Machines for Fast Feature-based Recommendation

  • Han Liu
  • Xiangnan He
  • Fuli Feng
  • Liqiang Nie
  • Rui Liu
  • Hanwang Zhang

User and item features of side information are crucial for accurate recommendation. However, the large number of feature dimensions, e.g., usually larger than 10^7, results in expensive storage and computational cost. This prohibits fast recommendation, especially on mobile applications where computational resources are very limited. In this paper, we develop a generic feature-based recommendation model, called Discrete Factorization Machine (DFM), for fast and accurate recommendation. DFM binarizes the real-valued model parameters (e.g., float32) of every feature embedding into binary codes (e.g., boolean), and thus supports efficient storage and fast user-item score computation. To avoid the severe quantization loss of the binarization, we propose a convergent updating rule that resolves the challenging discrete optimization of DFM. Through extensive experiments on two real-world datasets, we show that 1) DFM consistently outperforms state-of-the-art binarized recommendation models, and 2) DFM shows very competitive performance compared to its real-valued version (FM), demonstrating the minimized quantization loss.
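The storage and scoring benefit of binary codes can be sketched as follows. Note the hedge: DFM learns its codes via the convergent discrete updating rule the abstract mentions; the naive sign quantization below is only an assumption used to illustrate how {-1, +1} codes make FM-style pairwise scoring cheap (inner products of sign vectors reduce to integer arithmetic, or popcounts on packed bits).

```python
import numpy as np

def binarize(embedding):
    """Naive sign quantization of real-valued parameters to {-1, +1} codes."""
    return np.where(np.asarray(embedding) >= 0, 1, -1).astype(np.int8)

def fm_score_binary(codes):
    """FM-style pairwise-interaction score over binary feature codes.

    Uses sum_{i<j} <b_i, b_j> = 0.5 * (||sum_i b_i||^2 - sum_i ||b_i||^2);
    for k-dimensional +/-1 codes, ||b_i||^2 is always exactly k.
    """
    B = np.asarray(codes, dtype=np.int32)  # (num_active_features, k)
    n, k = B.shape
    s = B.sum(axis=0)
    return 0.5 * (s @ s - n * k)
```

Compared with float32 embeddings, the +/-1 codes cut storage by 32x when bit-packed, and the score needs only integer additions, which is the efficiency argument behind DFM.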

IJCAI Conference 2018 Conference Paper

Quality Matters: Assessing cQA Pair Quality via Transductive Multi-View Learning

  • Xiaochi Wei
  • Heyan Huang
  • Liqiang Nie
  • Fuli Feng
  • Richang Hong
  • Tat-Seng Chua

Community-based question answering (cQA) sites have become important knowledge sharing platforms, as massive cQA pairs are archived, but the uneven quality of cQA pairs leaves information seekers unsatisfied. Various efforts have been dedicated to predicting the quality of cQA contents. Most of them concatenate different features into single vectors and then feed them into regression models. In fact, the quality of cQA pairs is influenced by different views, and the agreement among them is essential for quality assessment. Besides, the lack of labeled data significantly hinders quality prediction performance. Toward this end, we present a transductive multi-view learning model. It is designed to find a latent common space by unifying and preserving information from various views, including question, answer, QA relevance, asker, and answerer. Additionally, rich information in the unlabeled test cQA pairs is utilized via transductive learning to enhance the representation ability of the common space. Extensive experiments on real-world datasets have well validated the proposed model.

IJCAI Conference 2017 Conference Paper

Depression Detection via Harvesting Social Media: A Multimodal Dictionary Learning Solution

  • Guangyao Shen
  • Jia Jia
  • Liqiang Nie
  • Fuli Feng
  • Cunjun Zhang
  • Tianrui Hu
  • Tat-Seng Chua
  • Wenwu Zhu

Depression is a major contributor to the overall global burden of diseases. Traditionally, doctors diagnose depressed people face to face by referring to clinical depression criteria. However, more than 70% of patients would not consult doctors at early stages of depression, which leads to further deterioration of their conditions. Meanwhile, people are increasingly relying on social media to disclose emotions and share their daily lives; thus social media have successfully been leveraged for helping detect physical and mental diseases. Inspired by these, our work aims to make timely depression detection via harvesting social media data. We construct well-labeled depression and non-depression datasets on Twitter, and extract six depression-related feature groups covering not only the clinical depression criteria, but also online behaviors on social media. With these feature groups, we propose a multimodal depressive dictionary learning model to detect depressed users on Twitter. A series of experiments are conducted to validate this model, which outperforms several baselines (+3% to +10%). Finally, we analyze a large-scale dataset on Twitter to reveal the differences in online behaviors between depressed and non-depressed users.