
Author name cluster

Xiaolin Zheng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

36 papers
2 author rows

Possible papers


AAAI Conference 2026 Conference Paper

DAPrompt: Dual Alignment Prompt of Structure and Semantics for Few-shot Graph Learning

  • Lifan Jiang
  • Mengying Zhu
  • Yangyang Wu
  • Xuan Liu
  • Xiaolin Zheng
  • Shenglin Ben

Few-shot graph learning remains a fundamental yet challenging problem, especially under heterophilic graph settings where connected nodes are likely to belong to different classes. In such scenarios, two key challenges arise: (1) unreliable or noisy graph structures that hinder effective message passing, and (2) semantic inconsistency: in heterophilic graphs, aggregating messages from neighbors of different classes entangles representations and introduces misleading semantics. These issues are further exacerbated by the limited labeled data inherent to few-shot learning, making it difficult to adaptively repair structure or disentangle semantics. To address these challenges, we propose DAPrompt, a Dual Alignment Prompt framework that jointly calibrates graph structure and semantic representations across the learning pipeline. In the pretraining stage, DAPrompt incorporates a graph structure learning module to denoise and repair the underlying topology, enhancing structural reliability. In the prompt tuning stage, we introduce two coordinated modules: a structure-aware prompt learner, which employs prompt tokens to repair unreliable graph structures and capture structure-level alignment, and a semantics-aligned prompt learner, which enhances the graph using target node semantics to mitigate representation noise caused by class-mismatched propagation. Extensive experiments on both node-level and graph-level few-shot benchmarks validate its effectiveness, achieving state-of-the-art performance and highlighting the value of structure-semantic dual alignment in heterophilic few-shot graph learning.

AAAI Conference 2026 Conference Paper

Potent but Stealthy: Rethink Profile Pollution Against Sequential Recommendation via Bi-Level Constrained Reinforcement Paradigm

  • Jiajie Su
  • Zihan Nan
  • Yunshan Ma
  • Xiaobo Xia
  • XiaoHua Feng
  • Weiming Liu
  • Xiang Chen
  • Xiaolin Zheng

Sequential Recommenders, which exploit dynamic user intents through interaction sequences, are vulnerable to adversarial attacks. While existing attacks primarily rely on data poisoning, they require large-scale user access or fake profiles and thus lack practicality. In this paper, we focus on the Profile Pollution Attack (PPA) that subtly contaminates partial user interactions to induce targeted mispredictions. Previous PPA methods suffer from two limitations, i.e., i) over-reliance on sequence horizon impact restricts fine-grained perturbations on item transitions, and ii) holistic modifications cause detectable distribution shifts. To address these challenges, we propose a constrained reinforcement driven attack, CREAT, that synergizes a bi-level optimization framework with multi-reward reinforcement learning to balance adversarial efficacy and stealthiness. We first develop a Pattern Balanced Rewarding Policy, which integrates pattern inversion rewards to invert critical patterns and distribution consistency rewards to minimize detectable shifts via unbalanced co-optimal transport. Then we employ a Constrained Group Relative Reinforcement Learning paradigm, enabling step-wise perturbations through dynamic barrier constraints and group-shared experience replay, achieving targeted pollution with minimal detectability. Extensive experiments demonstrate the effectiveness of CREAT.

AAAI Conference 2026 Conference Paper

TermGPT: Multi-Level Contrastive Fine-Tuning for Terminology Adaptation in Legal and Financial Domains

  • Yidan Sun
  • Mengying Zhu
  • Feiyue Chen
  • Yangyang Wu
  • Xiaolei Dan
  • Mengyuan Yang
  • Xiaolin Zheng
  • Shenglin Ben

Large language models (LLMs) have demonstrated impressive performance in text generation tasks; however, their embedding spaces often suffer from the isotropy problem, resulting in poor discrimination of domain-specific terminology, particularly in legal and financial contexts. This weakness in term-level representation can severely hinder downstream tasks such as legal judgment prediction or financial risk analysis, where subtle semantic distinctions are critical. To address this problem, we propose TermGPT, a multi-level contrastive fine-tuning framework designed for terminology adaptation. We first construct a sentence graph to capture semantic and structural relations, and generate semantically consistent yet discriminative positive and negative samples based on contextual and topological cues. We then devise a multi-level contrastive learning approach at both the sentence and token levels, enhancing global contextual understanding and fine-grained term discrimination. To support robust evaluation, we construct the first financial terminology dataset derived from official regulatory documents. Experiments show that TermGPT outperforms existing baselines in term discrimination tasks within the finance and legal domains.

ICLR Conference 2025 Conference Paper

Controllable Unlearning for Image-to-Image Generative Models via ϵ-Constrained Optimization

  • Xiaohua Feng 0002
  • Yuyuan Li 0001
  • Chaochao Chen 0001
  • Li Zhang
  • Longfei Li
  • Jun Zhou 0011
  • Xiaolin Zheng

While generative models have made significant advancements in recent years, they also raise concerns such as privacy breaches and biases. Machine unlearning has emerged as a viable solution, aiming to remove specific training data, e.g., data containing private information and bias, from models. In this paper, we study the machine unlearning problem in Image-to-Image (I2I) generative models. Previous studies mainly treat it as a single-objective optimization problem, offering a solitary solution and thereby neglecting the varied user expectations towards the trade-off between complete unlearning and model utility. To address this issue, we propose a controllable unlearning framework that uses a control coefficient $\epsilon$ to control the trade-off. We reformulate the I2I generative model unlearning problem as an $\epsilon$-constrained optimization problem and solve it with a gradient-based method to find optimal solutions for unlearning boundaries. These boundaries define the valid range for the control coefficient. Within this range, every yielded solution is theoretically guaranteed to be Pareto optimal. We also analyze the convergence rate of our framework under various control functions. Extensive experiments on two benchmark datasets across three mainstream I2I models demonstrate the effectiveness of our controllable unlearning framework.
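As a rough illustration of the $\epsilon$-constraint idea named in the abstract above, the generic textbook form keeps one objective as the minimization target and bounds the other by $\epsilon$. The symbols $\theta$, $f_1$, and $f_2$ are assumed names for illustration; the abstract does not state which objective is constrained or how the two losses are defined.

$$
\min_{\theta} \; f_1(\theta) \quad \text{subject to} \quad f_2(\theta) \le \epsilon
$$

Sweeping $\epsilon$ over its valid range traces out different trade-offs between the two objectives, which is the sense in which $\epsilon$ acts as a control coefficient.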

AAAI Conference 2025 Conference Paper

DR-VAE: Debiased and Representation-enhanced Variational Autoencoder for Collaborative Recommendation

  • Fan Wang
  • Chaochao Chen
  • Weiming Liu
  • Minye Lei
  • Jintao Chen
  • Yuwen Liu
  • Xiaolin Zheng
  • Jianwei Yin

Recommender Systems (RSs) are widely applied for navigating information, and Collaborative Filtering (CF) is one of the prominent recommendation techniques due to its advantages of domain independence and easy interpretation. Among the numerous CF methods, Variational Autoencoders (VAEs), benefiting from probabilistic modeling, stand out in capturing user preferences through representation learning. Despite this superiority, VAE-based CF models still suffer from two challenging problems: (1) Exposure bias: during training, models are exposed to a limited, biased sample of data, leading to a skewed understanding of users' true preferences; (2) Posterior collapse: models excessively simplify the learned latent variable distributions, generating naive representations that are unable to encapsulate complex data patterns and thereby resulting in improper recommendations. In this paper, we propose a Debiased and Representation-enhanced Variational AutoEncoder (DR-VAE) framework for collaborative recommendation. Specifically, for the exposure bias problem, DR-VAE incorporates a Debiasing Estimator that mitigates the impact of exposure bias. For the posterior collapse issue, DR-VAE introduces a Flow-based Representation Enhancement module, which enables the model to encapsulate complex data patterns by fitting complex and intricate posterior distributions directly. We provide experimental validation on four datasets to substantiate the efficacy of our DR-VAE framework.

ICML Conference 2025 Conference Paper

Efficient Source-free Unlearning via Energy-Guided Data Synthesis and Discrimination-Aware Multitask Optimization

  • Xiuyuan Wang 0002
  • Chaochao Chen 0001
  • Weiming Liu 0005
  • Xinting Liao
  • Fan Wang 0020
  • Xiaolin Zheng

With growing privacy concerns and the enforcement of data protection regulations, machine unlearning has emerged as a promising approach for removing the influence of forget data while maintaining model performance on retain data. However, most existing unlearning methods require access to the original training data, which is often impractical due to privacy policies, storage constraints, and other limitations. This gives rise to the challenging task of source-free unlearning, where unlearning must be accomplished without accessing the original training data. The few existing source-free unlearning methods rely on knowledge distillation and model retraining, which impose substantial computational costs. In this work, we propose the Data Synthesis-based Discrimination-Aware (DSDA) unlearning framework, which enables efficient source-free unlearning in two stages: (1) Accelerated Energy-Guided Data Synthesis (AEGDS), which employs Langevin dynamics to model the training data distribution while integrating Runge–Kutta methods and momentum to enhance efficiency; and (2) Discrimination-Aware Multitask Optimization (DAMO), which refines the feature distribution of retain data and mitigates the gradient conflicts among multiple unlearning objectives. Extensive experiments on three benchmark datasets demonstrate that DSDA outperforms existing unlearning methods, validating its effectiveness and efficiency in source-free unlearning.
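For readers unfamiliar with Langevin dynamics, the following is a minimal sketch of the plain (unadjusted) update rule, not the accelerated AEGDS procedure from the paper: it omits the Runge–Kutta integration and momentum mentioned in the abstract. The function names, the toy quadratic energy, and the step size are all assumptions chosen for illustration.

```python
import math
import random


def langevin_sample(grad_energy, x0, step=0.01, n_steps=1000, rng=None):
    """Unadjusted Langevin dynamics.

    Repeats x <- x - step * grad E(x) + sqrt(2 * step) * noise,
    which approximately samples from the density proportional to exp(-E(x)).
    """
    rng = rng or random.Random(0)
    x = list(x0)
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        grad = grad_energy(x)
        x = [xi - step * gi + noise_scale * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, grad)]
    return x


def quadratic_grad(x):
    # Hypothetical toy energy E(x) = 0.5 * ||x||^2, so grad E(x) = x
    # and exp(-E) is a standard normal density.
    return list(x)


# Run 200 independent chains, each started far from the mode at x = 5.
samples = [
    langevin_sample(quadratic_grad, [5.0], rng=random.Random(seed))[0]
    for seed in range(200)
]
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
```

With this toy energy the chains forget their starting point and the samples end up roughly standard normal, which is the behavior an energy-guided synthesizer relies on when the energy is defined by a trained model instead.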

AAAI Conference 2025 Conference Paper

FedGOG: Federated Graph Out-of-Distribution Generalization with Diffusion Data Exploration and Latent Embedding Decorrelation

  • Pengyang Zhou
  • Chaochao Chen
  • Weiming Liu
  • Xinting Liao
  • Wenkai Shen
  • Jiahe Xu
  • Zhihui Fu
  • Jun Wang

Federated graph learning (FGL) has emerged as a promising approach to enable collaborative training of graph models while preserving data privacy. However, current FGL methods overlook the out-of-distribution (OOD) shifts that occur in real-world scenarios. The distribution shifts between training and testing datasets in each client impact FGL performance. To address this issue, we propose a federated graph OOD generalization framework, FedGOG, which includes two modules, i.e., diffusion data exploration (DDE) and latent embedding decorrelation (LED). In DDE, all clients jointly train score models to accurately estimate the global graph data distribution and sufficiently explore the sample space using score-based graph diffusion with conditional generation. In LED, each client models a global invariant GNN and a personalized spurious GNN. LED aims to decorrelate spuriousness from invariant relationships by minimizing the mutual information between two categories of latent embeddings from different GNN models. Extensive experiments on six benchmark datasets demonstrate the superiority of FedGOG.

ICML Conference 2025 Conference Paper

FOCoOp: Enhancing Out-of-Distribution Robustness in Federated Prompt Learning for Vision-Language Models

  • Xinting Liao
  • Weiming Liu 0005
  • Jiaming Qian
  • Pengyang Zhou 0001
  • Jiahe Xu 0003
  • Wenjie Wang 0007
  • Chaochao Chen 0001
  • Xiaolin Zheng

Federated prompt learning (FPL) for vision-language models is a powerful approach to collaboratively adapt models across distributed clients while preserving data privacy. However, existing FPL approaches suffer from a trade-off between performance and robustness, particularly in out-of-distribution (OOD) shifts, limiting their reliability in real-world scenarios. The inherent in-distribution (ID) data heterogeneity among different clients makes it more challenging to maintain this trade-off. To fill this gap, we introduce a Federated OOD-aware Context Optimization (FOCoOp) framework, which captures diverse distributions among clients using ID global prompts, local prompts, and OOD prompts. Specifically, FOCoOp leverages three sets of prompts to create both class-level and distribution-level separations, which adapt to OOD shifts through bi-level distributionally robust optimization. Additionally, FOCoOp improves the discrimination consistency among clients, i.e., calibrating global prompts, seemingly OOD prompts, and OOD prompts by semi-unbalanced optimal transport. Extensive experiments on real-world datasets demonstrate that FOCoOp effectively captures decentralized heterogeneous distributions and enhances robustness to different OOD shifts. The project is available at GitHub.

AAAI Conference 2025 Conference Paper

LoGoFair: Post-Processing for Local and Global Fairness in Federated Learning

  • Li Zhang
  • Chaochao Chen
  • Zhongxuan Han
  • Qiyong Zhong
  • Xiaolin Zheng

Federated learning (FL) has garnered considerable interest for its capability to learn from decentralized data sources. Given the increasing application of FL in decision-making scenarios, addressing fairness issues across different sensitive groups (e.g., female, male) in FL is crucial. Current research typically focuses on facilitating fairness at each client's data (local fairness) or within the entire dataset across all clients (global fairness). However, existing approaches that focus exclusively on either global or local fairness fail to address two key challenges: (CH1) Under statistical heterogeneity, global fairness does not imply local fairness, and vice versa. (CH2) Achieving fairness under a model-agnostic setting. To tackle these challenges, this paper proposes a novel post-processing framework for achieving both Local and Global Fairness in the FL context, namely LoGoFair. To address CH1, LoGoFair seeks the Bayes optimal classifier under local and global fairness constraints, which strikes the optimal accuracy-fairness balance in the probabilistic sense. To address CH2, LoGoFair employs a model-agnostic federated post-processing procedure that enables clients to collaboratively optimize global fairness while ensuring local fairness, thereby achieving the optimal fair classifier within FL. Experimental results on three real-world datasets further illustrate the effectiveness of the proposed LoGoFair framework.
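The claim in CH1 that global fairness does not imply local fairness can be made concrete with a small toy example. The sketch below assumes demographic parity (the gap in positive-prediction rates between two groups) as the fairness notion; the abstract does not name a specific metric, and the helper `dp_gap` and the two synthetic clients are hypothetical.

```python
def dp_gap(preds, groups):
    """Absolute difference in positive-prediction rate between groups 0 and 1."""
    rate = {}
    for g in (0, 1):
        members = [p for p, a in zip(preds, groups) if a == g]
        rate[g] = sum(members) / len(members)
    return abs(rate[0] - rate[1])


# Each client holds 10 samples from group 0 followed by 10 from group 1.
groups = [0] * 10 + [1] * 10

# Client A favors group 0 (8/10 positives vs 2/10); client B does the opposite.
client_a_preds = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
client_b_preds = [1] * 2 + [0] * 8 + [1] * 8 + [0] * 2

local_gaps = [dp_gap(client_a_preds, groups), dp_gap(client_b_preds, groups)]

# Pooling both clients: each group receives 10/20 positive predictions.
global_gap = dp_gap(client_a_preds + client_b_preds, groups + groups)
```

Here both clients are locally unfair by the same margin but in opposite directions, so the pooled predictions look perfectly fair. Statistical heterogeneity across clients is what makes this cancellation possible.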

NeurIPS Conference 2025 Conference Paper

UMU-Bench: Closing the Modality Gap in Multimodal Unlearning Evaluation

  • Chengye Wang
  • Yuyuan Li
  • XiaoHua Feng
  • Chaochao Chen
  • Xiaolin Zheng
  • Jianwei Yin

Although Multimodal Large Language Models (MLLMs) have advanced numerous fields, their training on extensive multimodal datasets introduces significant privacy concerns, prompting the necessity for efficient unlearning methods. However, current multimodal unlearning approaches often directly adapt techniques from unimodal contexts, largely overlooking the critical issue of modality alignment, i.e., consistently removing knowledge across both unimodal and multimodal settings. To close this gap, we introduce UMU-bench, a unified benchmark specifically targeting modality misalignment in multimodal unlearning. UMU-bench consists of a meticulously curated dataset featuring 653 individual profiles, each described with both unimodal and multimodal knowledge. Additionally, novel tasks and evaluation metrics focusing on modality alignment are introduced, facilitating a comprehensive analysis of unimodal and multimodal unlearning effectiveness. Through extensive experimentation with state-of-the-art unlearning algorithms on UMU-bench, we demonstrate prevalent modality misalignment issues in existing methods. These findings underscore the critical need for novel multimodal unlearning approaches explicitly considering modality alignment.

AAAI Conference 2024 Conference Paper

ECHO-GL: Earnings Calls-Driven Heterogeneous Graph Learning for Stock Movement Prediction

  • Mengpu Liu
  • Mengying Zhu
  • Xiuyuan Wang
  • Guofang Ma
  • Jianwei Yin
  • Xiaolin Zheng

Stock movement prediction plays an important role in quantitative trading. Despite advances in existing models that enhance stock movement prediction by incorporating stock relations, these prediction models face two limitations, i.e., constructing either insufficient or static stock relations, which fail to effectively capture the complex dynamic stock relations because such relations are influenced by various factors in the ever-changing financial market. To tackle these limitations, we propose a novel stock movement prediction model, ECHO-GL, based on stock relations derived from earnings calls. ECHO-GL not only constructs comprehensive stock relations by exploiting the rich semantic information in earnings calls but also captures the movement signals between related stocks based on multimodal and heterogeneous graph learning. Moreover, ECHO-GL customizes learnable stock stochastic processes based on the post earnings announcement drift (PEAD) phenomenon to generate the temporal stock price trajectory, which can be easily plugged into any investment strategy with different time horizons to meet investment demands. Extensive experiments on two financial datasets demonstrate the effectiveness of ECHO-GL on stock price movement prediction tasks, together with high prediction accuracy and trading profitability.

AAAI Conference 2024 Conference Paper

Fine-Tuning Large Language Model Based Explainable Recommendation with Explainable Quality Reward

  • Mengyuan Yang
  • Mengying Zhu
  • Yan Wang
  • Linxun Chen
  • Yilei Zhao
  • Xiuyuan Wang
  • Bing Han
  • Xiaolin Zheng

Large language model-based explainable recommendation (LLM-based ER) systems can provide remarkable human-like explanations and have widely received attention from researchers. However, the original LLM-based ER systems face three low-quality problems in their generated explanations, i.e., lack of personalization, inconsistency, and questionable explanation data. To address these problems, we propose a novel LLM-based ER model denoted as LLM2ER to serve as a backbone and devise two innovative explainable quality reward models for fine-tuning such a backbone in a reinforcement learning paradigm, ultimately yielding a fine-tuned model denoted as LLM2ER-EQR, which can provide high-quality explanations. LLM2ER-EQR can generate personalized, informative, and consistent high-quality explanations learned from questionable-quality explanation datasets. Extensive experiments conducted on three real-world datasets demonstrate that our model can generate fluent, diverse, informative, and highly personalized explanations.

NeurIPS Conference 2024 Conference Paper

FOOGD: Federated Collaboration for Both Out-of-distribution Generalization and Detection

  • Xinting Liao
  • Weiming Liu
  • Pengyang Zhou
  • Fengyuan Yu
  • Jiahe Xu
  • Jun Wang
  • Wenjie Wang
  • Chaochao Chen

Federated learning (FL) is a promising machine learning paradigm that collaborates with client models to capture global knowledge. However, deploying FL models in real-world scenarios remains unreliable due to the coexistence of in-distribution data and unexpected out-of-distribution (OOD) data, such as covariate-shift and semantic-shift data. Current FL research typically addresses either covariate-shift data through OOD generalization or semantic-shift data via OOD detection, overlooking the simultaneous occurrence of various OOD shifts. In this work, we propose FOOGD, a method that estimates the probability density of each client and obtains a reliable global distribution as guidance for the subsequent FL process. First, SM3D in FOOGD estimates a score model for arbitrary distributions without prior constraints and detects semantic-shift data powerfully. Then SAG in FOOGD provides invariant yet diverse knowledge for both local covariate-shift generalization and client performance generalization. In empirical validations, FOOGD enjoys three main advantages: (1) reliably estimating non-normalized decentralized distributions, (2) detecting semantic-shift data via score values, and (3) generalizing to covariate-shift data by regularizing the feature extractor. The project is available at https://github.com/XeniaLLL/FOOGD-main.git.

AAAI Conference 2024 Conference Paper

Intra- and Inter-group Optimal Transport for User-Oriented Fairness in Recommender Systems

  • Zhongxuan Han
  • Chaochao Chen
  • Xiaolin Zheng
  • Meng Li
  • Weiming Liu
  • Binhui Yao
  • Yuyuan Li
  • Jianwei Yin

Recommender systems are typically biased toward a small group of users, leading to severe unfairness in recommendation performance, i.e., User-Oriented Fairness (UOF) issue. Existing research on UOF exhibits notable limitations in two phases of recommendation models. In the training phase, current methods fail to tackle the root cause of the UOF issue, which lies in the unfair training process between advantaged and disadvantaged users. In the evaluation phase, the current UOF metric lacks the ability to comprehensively evaluate varying cases of unfairness. In this paper, we aim to address the aforementioned limitations and ensure recommendation models treat user groups of varying activity levels equally. In the training phase, we propose a novel Intra- and Inter-GrOup Optimal Transport framework (II-GOOT) to alleviate the data sparsity problem for disadvantaged users and narrow the training gap between advantaged and disadvantaged users. In the evaluation phase, we introduce a novel metric called?-UOF, which enables the identification and assessment of various cases of UOF. This helps prevent recommendation models from leading to unfavorable fairness outcomes, where both advantaged and disadvantaged users experience subpar recommendation performance. We conduct extensive experiments on three real-world datasets based on four backbone recommendation models to prove the effectiveness of?-UOF and the efficiency of our proposed II-GOOT.

AAAI Conference 2024 Conference Paper

Learning Accurate and Bidirectional Transformation via Dynamic Embedding Transportation for Cross-Domain Recommendation

  • Weiming Liu
  • Chaochao Chen
  • Xinting Liao
  • Mengling Hu
  • Yanchao Tan
  • Fan Wang
  • Xiaolin Zheng
  • Yew Soon Ong

With the rapid development of Internet and Web techniques, Cross-Domain Recommendation (CDR) models have been widely explored for resolving the data-sparsity and cold-start problems. Meanwhile, most CDR models must utilize explicit domain-shareable information (e.g., overlapped users or items) for knowledge transfer across domains. However, this assumption may not always be satisfied since users and items are always non-overlapped in real practice. The performance of many previous works will be severely impaired when this domain-shareable information is not available. To address these issues, we propose the Joint Preference Exploration and Dynamic Embedding Transportation model (JPEDET), a novel framework for solving the CDR problem when users and items are non-overlapped. JPEDET includes two main modules, i.e., a joint preference exploration module and a dynamic embedding transportation module. The joint preference exploration module aims to fuse rating and review information for modelling user preferences. The dynamic embedding transportation module is set to share knowledge via neural ordinary differential equations for dual transformation across domains. Moreover, we propose the dynamic transport flow equipped with linear interpolation guidance on the barycentric Wasserstein path for achieving accurate and bidirectional transformation. Our empirical study on Amazon datasets demonstrates that JPEDET significantly outperforms the state-of-the-art models under the CDR setting.

IJCAI Conference 2024 Conference Paper

Protecting Split Learning by Potential Energy Loss

  • Fei Zheng
  • Chaochao Chen
  • Lingjuan Lyu
  • Xinyi Fu
  • Xing Fu
  • Weiqiang Wang
  • Xiaolin Zheng
  • Jianwei Yin

As a practical privacy-preserving learning method, split learning has drawn much attention in academia and industry. However, its security is constantly being questioned since the intermediate results are shared during training and inference. In this paper, we focus on the privacy leakage from the forward embeddings of split learning. Specifically, since the forward embeddings contain too much information about the label, the attacker can either use a few labeled samples to fine-tune the top model or perform unsupervised attacks such as clustering to infer the true labels from the forward embeddings. To prevent this kind of privacy leakage, we propose the potential energy loss to make the forward embeddings more 'complicated' by pushing embeddings of the same class towards the decision boundary. As a result, it is hard for the attacker to learn from the forward embeddings. Experimental results show that our method significantly lowers the performance of both fine-tuning attacks and clustering attacks.

ICML Conference 2024 Conference Paper

Reducing Item Discrepancy via Differentially Private Robust Embedding Alignment for Privacy-Preserving Cross Domain Recommendation

  • Weiming Liu 0005
  • Xiaolin Zheng
  • Chaochao Chen 0001
  • Jiahe Xu 0003
  • Xinting Liao
  • Fan Wang 0020
  • Yanchao Tan
  • Yew-Soon Ong

Cross-Domain Recommendation (CDR) has become increasingly appealing by leveraging useful information to tackle the data sparsity problem across domains. Most of the latest CDR models assume that domain-shareable user-item information (e.g., ratings and reviews on overlapped users or items) is accessible across domains. However, these assumptions become impractical due to strict data privacy protection policies. In this paper, we propose the Reducing Item Discrepancy (RidCDR) model for solving the Privacy-Preserving Cross-Domain Recommendation (PPCDR) problem. Specifically, we aim to enhance the model performance on both source and target domains without overlapped users and items while protecting data privacy. We propose a private-robust embedding alignment module in RidCDR for knowledge sharing across domains while avoiding negative transfer privately. Our empirical study on Amazon and Douban datasets demonstrates that RidCDR significantly outperforms the state-of-the-art models under the PPCDR setting without overlapped users and items.

IJCAI Conference 2023 Conference Paper

Deep Hashing-based Dynamic Stock Correlation Estimation via Normalizing Flow

  • Xiaolin Zheng
  • Mengpu Liu
  • Mengying Zhu

In financial scenarios, influenced by common factors such as global macroeconomic and sector-specific factors, stocks exhibit varying degrees of correlations with each other, which is essential in risk-averse portfolio allocation. Because the real risk matrix is unobservable, the covariance-based correlation matrix is widely used for constructing diversified stock portfolios. However, studies have seldom focused on dynamic correlation matrix estimation under the non-stationary financial market. Moreover, as the number of stocks in the market grows, existing correlation matrix estimation methods face more serious challenges with regard to efficiency and effectiveness. In this paper, we propose a novel hash-based dynamic correlation forecasting model (HDCF) to estimate dynamic stock correlations. Under structural assumptions on the correlation matrix, HDCF learns the hash representation based on normalizing flows instead of the real-valued representation, which performs extremely efficiently in high-dimensional settings. Experiments show that our proposed model outperforms baselines on portfolio decisions in terms of effectiveness and efficiency.
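The covariance-based correlation matrix mentioned above is the standard static baseline that dynamic estimators aim to improve on. As a minimal sketch (this is the textbook normalization, not the HDCF model, and the example covariance values are made up), each covariance entry is divided by the product of the two stocks' standard deviations:

```python
import math


def cov_to_corr(cov):
    """Convert a covariance matrix (list of lists) into a correlation matrix."""
    n = len(cov)
    std = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (std[i] * std[j]) for j in range(n)] for i in range(n)]


# Hypothetical two-stock covariance matrix: variances 4 and 9, covariance 2.
cov = [[4.0, 2.0], [2.0, 9.0]]
corr = cov_to_corr(cov)
```

The diagonal of the result is 1 by construction, and the off-diagonal entry here is 2 / (2 * 3). Recomputing this from a rolling window is the simplest way to make the estimate time-varying, at the cost of noise when the window is short relative to the number of stocks.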

AAAI Conference 2023 Conference Paper

Positive Distribution Pollution: Rethinking Positive Unlabeled Learning from a Unified Perspective

  • Qianqiao Liang
  • Mengying Zhu
  • Yan Wang
  • Xiuyuan Wang
  • Wanjia Zhao
  • Mengyuan Yang
  • Hua Wei
  • Bing Han

Positive Unlabeled (PU) learning, which has a wide range of applications, is becoming increasingly prevalent. However, it suffers from problems such as data imbalance, selection bias, and prior agnostic in real scenarios. Existing studies focus on addressing part of these problems, which fail to provide a unified perspective to understand these problems. In this paper, we first rethink these problems by analyzing a typical PU scenario and come up with an insightful point of view that all these problems are inherently connected to one problem, i.e., positive distribution pollution, which refers to the inaccuracy in estimating positive data distribution under very little labeled data. Then, inspired by this insight, we devise a variational model named CoVPU, which addresses all three problems in a unified perspective by targeting the positive distribution pollution problem. CoVPU not only accurately separates the positive data from the unlabeled data based on discrete normalizing flows, but also effectively approximates the positive distribution based on our derived unbiased rebalanced risk estimator and supervises the approximation based on a novel prior-free variational loss. Rigorous theoretical analysis proves the convergence of CoVPU to an optimal Bayesian classifier. Extensive experiments demonstrate the superiority of CoVPU over the state-of-the-art PU learning methods under these problems.

AAAI Conference 2023 Conference Paper

PPGenCDR: A Stable and Robust Framework for Privacy-Preserving Cross-Domain Recommendation

  • Xinting Liao
  • Weiming Liu
  • Xiaolin Zheng
  • Binhui Yao
  • Chaochao Chen

Privacy-preserving cross-domain recommendation (PPCDR) refers to preserving the privacy of users when transferring the knowledge from source domain to target domain for better performance, which is vital for the long-term development of recommender systems. Existing work on cross-domain recommendation (CDR) reaches advanced and satisfying recommendation performance, but mostly neglects preserving privacy. To fill this gap, we propose a privacy-preserving generative cross-domain recommendation (PPGenCDR) framework for PPCDR. PPGenCDR includes two main modules, i.e., stable privacy-preserving generator module, and robust cross-domain recommendation module. Specifically, the former isolates data from different domains with a generative adversarial network (GAN) based model, which stably estimates the distribution of private data in the source domain with the Rényi differential privacy (RDP) technique. Then the latter aims to robustly leverage the perturbed but effective knowledge from the source domain with the raw data in target domain to improve recommendation performance. Three key modules, i.e., (1) selective privacy preserver, (2) GAN stabilizer, and (3) robustness conductor, guarantee the cost-effective trade-off between utility and privacy, the stability of GAN when using RDP, and the robustness of leveraging transferable knowledge accordingly. The extensive empirical studies on Douban and Amazon datasets demonstrate that PPGenCDR significantly outperforms the state-of-the-art recommendation models while preserving privacy.

IJCAI Conference 2023 Conference Paper

Spotlight News Driven Quantitative Trading Based on Trajectory Optimization

  • Mengyuan Yang
  • Mengying Zhu
  • Qianqiao Liang
  • Xiaolin Zheng
  • Menghan Wang

News-driven quantitative trading (NQT) has been popularly studied in recent years. Most existing NQT methods are performed in a two-step paradigm, i.e., first analyzing markets by a financial prediction task and then making trading decisions, which is doomed to failure due to the nearly futile financial prediction task. To bypass the financial prediction task, we focus on a reinforcement learning (RL) based NQT paradigm, which leverages news to make profitable trading decisions directly. In this paper, we propose a novel NQT framework SpotlightTrader based on decision trajectory optimization, which can effectively stitch together a continuous and flexible sequence of trading decisions to maximize profits. In addition, we enhance this framework by constructing a spotlight-driven state trajectory that obeys a stochastic process with irregular abrupt jumps caused by spotlight news. Furthermore, in order to adapt to non-stationary financial markets, we propose an effective training pipeline for this framework, which blends offline pretraining with online finetuning to balance exploration and exploitation effectively during online trading. Extensive experiments on three real-world datasets demonstrate our proposed model’s superiority over the state-of-the-art NQT methods.

NeurIPS Conference 2023 Conference Paper

UltraRE: Enhancing RecEraser for Recommendation Unlearning via Error Decomposition

  • Yuyuan Li
  • Chaochao Chen
  • Yizhao Zhang
  • Weiming Liu
  • Lingjuan Lyu
  • Xiaolin Zheng
  • Dan Meng
  • Jun Wang

With growing concerns regarding privacy in machine learning models, regulations have committed to granting individuals the right to be forgotten while mandating companies to develop non-discriminatory machine learning systems, thereby fueling the study of the machine unlearning problem. Our attention is directed toward a practical unlearning scenario, i.e., recommendation unlearning. As the state-of-the-art framework, i.e., RecEraser, naturally achieves full unlearning completeness, our objective is to enhance it in terms of model utility and unlearning efficiency. In this paper, we rethink RecEraser from an ensemble-based perspective and focus on its three potential losses, i.e., redundancy, relevance, and combination. Under the theoretical guidance of the above three losses, we propose a new framework named UltraRE, which simplifies and powers RecEraser for recommendation tasks. Specifically, for redundancy loss, we incorporate transport weights in the clustering algorithm to optimize the equilibrium between collaboration and balance while enhancing efficiency; for relevance loss, we ensure that sub-models reach convergence on their respective group data; for combination loss, we simplify the combination estimator without compromising its efficacy. Extensive experiments on three real-world datasets demonstrate the effectiveness of UltraRE.

IJCAI Conference 2022 Conference Paper

A Smart Trader for Portfolio Management based on Normalizing Flows

  • Mengyuan Yang
  • Xiaolin Zheng
  • Qianqiao Liang
  • Bing Han
  • Mengying Zhu

In this paper, we study a new kind of portfolio problem, named trading point aware portfolio optimization (TPPO), which aims to obtain excess intraday profit by deciding the portfolio weights and their trading points simultaneously based on microscopic information. However, a strategy for the TPPO problem faces two challenging problems, i.e., modeling the ever-changing and irregular microscopic stock price time series and deciding the scattering candidate trading points. To address these problems, we propose a novel TPPO strategy named STrader based on normalizing flows. STrader is not only promising in reversibly transforming the geometric Brownian motion process to the unobservable and complicated stochastic process of the microscopic stock price time series for modeling such series, but also has the ability to earn excess intraday profit by capturing the appropriate trading points of the portfolio. Extensive experiments conducted on three public datasets demonstrate STrader's superiority over the state-of-the-art portfolio strategies.

IJCAI Conference 2022 Conference Paper

HCFRec: Hash Collaborative Filtering via Normalized Flow with Structural Consensus for Efficient Recommendation

  • Fan Wang
  • Weiming Liu
  • Chaochao Chen
  • Mengying Zhu
  • Xiaolin Zheng

The ever-increasing data scale of user-item interactions makes it challenging to build an effective and efficient recommender system. Recently, hash-based collaborative filtering (Hash-CF) approaches employ efficient Hamming distance of learned binary representations of users and items to accelerate recommendations. However, Hash-CF often faces two challenging problems, i.e., optimization on discrete representations and preserving semantic information in learned representations. To address the above two challenges, we propose HCFRec, a novel Hash-CF approach for effective and efficient recommendations. Specifically, HCFRec not only innovatively introduces normalized flow to learn the optimal hash code by efficiently fitting a proposed approximate mixture multivariate normal distribution, a continuous but approximately discrete distribution, but also deploys a cluster consistency preserving mechanism to preserve the semantic structure in representations for more accurate recommendations. Extensive experiments conducted on six real-world datasets demonstrate the superiority of our HCFRec compared to the state-of-the-art methods in terms of effectiveness and efficiency.

TIST Journal 2022 Journal Article

Toward Scalable and Privacy-preserving Deep Neural Network via Algorithmic-Cryptographic Co-design

  • Jun Zhou
  • Longfei Zheng
  • Chaochao Chen
  • Yan Wang
  • Xiaolin Zheng
  • Bingzhe Wu
  • Cen Chen
  • Li Wang

Deep Neural Networks (DNNs) have achieved remarkable progress in various real-world applications, especially when abundant training data are provided. However, data isolation has become a serious problem currently. Existing works build privacy-preserving DNN models from either algorithmic perspective or cryptographic perspective. The former mainly splits the DNN computation graph between data holders or between data holders and server, which demonstrates good scalability but suffers from accuracy loss and potential privacy risks. In contrast, the latter leverages time-consuming cryptographic techniques, which has strong privacy guarantee but poor scalability. In this article, we propose SPNN—a Scalable and Privacy-preserving deep Neural Network learning framework, from an algorithmic-cryptographic co-perspective. From algorithmic perspective, we split the computation graph of DNN models into two parts, i.e., the private-data-related computations that are performed by data holders and the rest heavy computations that are delegated to a semi-honest server with high computation ability. From cryptographic perspective, we propose using two types of cryptographic techniques, i.e., secret sharing and homomorphic encryption, for the isolated data holders to conduct private-data-related computations privately and cooperatively. Furthermore, we implement SPNN in a decentralized setting and introduce user-friendly APIs. Experimental results conducted on real-world datasets demonstrate the superiority of our proposed SPNN.

IJCAI Conference 2022 Conference Paper

Vertically Federated Graph Neural Network for Privacy-Preserving Node Classification

  • Chaochao Chen
  • Jun Zhou
  • Longfei Zheng
  • Huiwen Wu
  • Lingjuan Lyu
  • Jia Wu
  • Bingzhe Wu
  • Ziqi Liu

Recently, Graph Neural Network (GNN) has achieved remarkable progress in various real-world tasks on graph data, consisting of node features and the adjacent information between different nodes. High-performance GNN models always depend on both rich features and complete edge information in the graph. However, such information could possibly be isolated by different data holders in practice, which is the so-called data isolation problem. To solve this problem, in this paper, we propose VFGNN, a federated GNN learning paradigm for privacy-preserving node classification task under data vertically partitioned setting, which can be generalized to existing GNN models. Specifically, we split the computation graph into two parts. We leave the private data (i.e., features, edges, and labels) related computations on data holders, and delegate the rest of computations to a semi-honest server. We also propose to apply differential privacy to prevent potential information leakage from the server. We conduct experiments on three benchmarks and the results demonstrate the effectiveness of VFGNN.

IJCAI Conference 2021 Conference Paper

An Adaptive News-Driven Method for CVaR-sensitive Online Portfolio Selection in Non-Stationary Financial Markets

  • Qianqiao Liang
  • Mengying Zhu
  • Xiaolin Zheng
  • Yan Wang

CVaR-sensitive online portfolio selection (CS-OLPS) becomes increasingly important for investors because of its effectiveness to minimize conditional value at risk (CVaR) and control extreme losses. However, the non-stationary nature of financial markets makes it very difficult to address the CS-OLPS problem effectively. To address the CS-OLPS problem in non-stationary markets, we propose an effective news-driven method, named CAND, which adaptively exploits news to determine the adjustment tendency and adjustment scale for tracking the dynamic optimal portfolio with minimal CVaR in each trading round. In addition, we devise a filtering mechanism to reduce the errors caused by the noisy news for further improving CAND's effectiveness. We rigorously prove a sub-linear regret of CAND. Extensive experiments on three real-world datasets demonstrate CAND’s superiority over the state-of-the-art portfolio methods in terms of returns and risks.

NeurIPS Conference 2021 Conference Paper

Leveraging Distribution Alignment via Stein Path for Cross-Domain Cold-Start Recommendation

  • Weiming Liu
  • Jiajie Su
  • Chaochao Chen
  • Xiaolin Zheng

Cross-Domain Recommendation (CDR) has been popularly studied to utilize different domain knowledge to solve the cold-start problem in recommender systems. In this paper, we focus on the Cross-Domain Cold-Start Recommendation (CDCSR) problem. That is, how to leverage the information from a source domain, where items are 'warm', to improve the recommendation performance of a target domain, where items are 'cold'. Unfortunately, previous approaches on cold-start and CDR cannot reduce the latent embedding discrepancy across domains efficiently and lead to model degradation. To address this issue, we propose DisAlign, a cross-domain recommendation framework for the CDCSR problem, which utilizes both rating and auxiliary representations from the source domain to improve the recommendation performance of the target domain. Specifically, we first propose Stein path alignment for aligning the latent embedding distributions across domains, and then further propose its improved version, i.e., proxy Stein path, which can reduce the operation consumption and improve efficiency. Our empirical study on Douban and Amazon datasets demonstrates that DisAlign significantly outperforms the state-of-the-art models under the CDCSR setting.

IJCAI Conference 2020 Conference Paper

A Graphical and Attentional Framework for Dual-Target Cross-Domain Recommendation

  • Feng Zhu
  • Yan Wang
  • Chaochao Chen
  • Guanfeng Liu
  • Xiaolin Zheng

The conventional single-target Cross-Domain Recommendation (CDR) only improves the recommendation accuracy on a target domain with the help of a source domain (with relatively richer information). In contrast, the novel dual-target CDR has been proposed to improve the recommendation accuracies on both domains simultaneously. However, dual-target CDR faces two new challenges: (1) how to generate more representative user and item embeddings, and (2) how to effectively optimize the user/item embeddings on each domain. To address these challenges, in this paper, we propose a graphical and attentional framework, called GA-DTCDR. In GA-DTCDR, we first construct two separate heterogeneous graphs based on the rating and content information from two domains to generate more representative user and item embeddings. Then, we propose an element-wise attention mechanism to effectively combine the embeddings of common users learned from both domains. Both steps significantly enhance the quality of user and item embeddings and thus improve the recommendation accuracy on each domain. Extensive experiments conducted on four real-world datasets demonstrate that GA-DTCDR significantly outperforms the state-of-the-art approaches.

IJCAI Conference 2020 Conference Paper

Online Portfolio Selection with Cardinality Constraint and Transaction Costs based on Contextual Bandit

  • Mengying Zhu
  • Xiaolin Zheng
  • Yan Wang
  • Qianqiao Liang
  • Wenfang Zhang

Online portfolio selection (OLPS) is a fundamental and challenging problem in financial engineering, which faces two practical constraints during the real trading, i.e., cardinality constraint and non-zero transaction costs. In order to achieve greater feasibility in financial markets, in this paper, we propose a novel online portfolio selection method named LExp4.TCGP with theoretical guarantee of sublinear regret to address the OLPS problem with the two constraints. In addition, we incorporate side information into our method based on contextual bandit, which further improves the effectiveness of our method. Extensive experiments conducted on four representative real-world datasets demonstrate that our method significantly outperforms the state-of-the-art methods when cardinality constraint and non-zero transaction costs co-exist.

TIST Journal 2020 Journal Article

Practical Privacy Preserving POI Recommendation

  • Chaochao Chen
  • Jun Zhou
  • Bingzhe Wu
  • Wenjing Fang
  • Li Wang
  • Yuan Qi
  • Xiaolin Zheng

Point-of-Interest (POI) recommendation has been extensively studied and successfully applied in industry recently. However, most existing approaches build centralized models on the basis of collecting users’ data. Both private data and models are held by the recommender, which causes serious privacy concerns. In this article, we propose a novel Privacy preserving POI Recommendation (PriRec) framework. First, to protect data privacy, users’ private data (features and actions) are kept on their own side, e.g., Cellphone or Pad. Meanwhile, the public data that need to be accessed by all the users are kept by the recommender to reduce the storage costs of users’ devices. Those public data include: (1) static data only related to the status of POI, such as POI categories, and (2) dynamic data dependent on user-POI actions such as visited counts. The dynamic data could be sensitive, and we develop local differential privacy techniques to release such data to the public with privacy guarantees. Second, PriRec follows the representations of Factorization Machine (FM) that consists of a linear model and the feature interaction model. To protect the model privacy, the linear models are saved on the users’ side, and we propose a secure decentralized gradient descent protocol for users to learn it collaboratively. The feature interaction model is kept by the recommender since there is no privacy risk, and we adopt a secure aggregation strategy in a federated learning paradigm to learn it. To this end, PriRec keeps users’ private raw data and models in users’ own hands, and protects user privacy to a large extent. We apply PriRec in real-world datasets, and comprehensive experiments demonstrate that, compared with FM, PriRec achieves comparable or even better recommendation accuracy.

AAAI Conference 2018 Conference Paper

Collaborative Filtering With Social Exposure: A Modular Approach to Social Recommendation

  • Menghan Wang
  • Xiaolin Zheng
  • Yang Yang
  • Kun Zhang

This paper is concerned with how to make efficient use of social information to improve recommendations. Most existing social recommender systems assume people share similar preferences with their social friends. This assumption, however, may not hold true due to various motivations of making online friends and dynamics of online social networks. Inspired by recent causal process based recommendations that first model user exposures towards items and then use these exposures to guide rating prediction, we utilize social information to capture user exposures rather than user preferences. We assume that people get information of products from their online friends and they do not have to share similar preferences, which is less restrictive and seems closer to reality. Under this new assumption, in this paper, we present a novel recommendation approach (named SERec) to integrate social exposure into collaborative filtering. We propose two methods to implement SERec, namely social regularization and social boosting, each with different ways to construct social exposures. Experiments on four real-world datasets demonstrate that our methods outperform the state-of-the-art methods on top-N recommendations. Further study compares the robustness and scalability of the two proposed methods.

NeurIPS Conference 2018 Conference Paper

Modeling Dynamic Missingness of Implicit Feedback for Recommendation

  • Menghan Wang
  • Mingming Gong
  • Xiaolin Zheng
  • Kun Zhang

Implicit feedback is widely used in collaborative filtering methods for recommendation. It is well known that implicit feedback contains a large number of values that are missing not at random (MNAR); and the missing data is a mixture of negative and unknown feedback, making it difficult to learn user's negative preferences. Recent studies modeled exposure, a latent missingness variable which indicates whether an item is missing to a user, to give each missing entry a confidence of being negative feedback. However, these studies use static models and ignore the information in temporal dependencies among items, which seems to be an essential underlying factor to subsequent missingness. To model and exploit the dynamics of missingness, we propose a latent variable named "user intent" to govern the temporal changes of item missingness, and a hidden Markov model to represent such a process. The resulting framework captures the dynamic item missingness and incorporates it into matrix factorization (MF) for recommendation. We also explore two types of constraints to achieve a more compact and interpretable representation of user intents. Experiments on real-world datasets demonstrate the superiority of our method against state-of-the-art recommender systems.

IJCAI Conference 2018 Conference Paper

Recurrent Collaborative Filtering for Unifying General and Sequential Recommender

  • Disheng Dong
  • Xiaolin Zheng
  • Ruixun Zhang
  • Yan Wang

General recommender and sequential recommender are two commonly applied modeling paradigms for recommendation tasks. General recommender focuses on modeling the general user preferences, ignoring the sequential patterns in user behaviors; whereas sequential recommender focuses on exploring the item-to-item sequential relations, failing to model the global user preferences. In addition, better recommendation performance has recently been achieved by adopting an approach to combine them. However, previous approaches are unable to solve both tasks in a unified way and cannot capture the whole historical sequential information. In this paper, we propose a recommendation model named Recurrent Collaborative Filtering (RCF), which unifies both paradigms within a single model. Specifically, we combine recurrent neural network (the sequential recommender part) and matrix factorization model (the general recommender part) in a multi-task learning framework, where we perform joint optimization with shared model parameters enforcing the two parts to regularize each other. Furthermore, we empirically demonstrate on MovieLens and Netflix datasets that our model outperforms the state-of-the-art methods across the tasks of both sequential and general recommender.

AAAI Conference 2016 Conference Paper

Capturing Semantic Correlation for Item Recommendation in Tagging Systems

  • Chaochao Chen
  • Xiaolin Zheng
  • Yan Wang
  • Fuxing Hong
  • Deren Chen

The popularity of tagging systems provides a great opportunity to improve the performance of item recommendation. Although existing approaches use topic modeling to mine the semantic information of items by grouping the tags labelled for items, they overlook an important property that tags link users and items as a bridge. Thus these methods cannot deal with the data sparsity without commonly rated items (DS-WO-CRI) problem, limiting their recommendation performance. Towards solving this challenging problem, we propose a novel tag and rating based collaborative filtering (CF) model for item recommendation, which first uses topic modeling to mine the semantic information of tags for each user and for each item respectively, and then incorporates the semantic information into matrix factorization to factorize rating information and to capture the bridging feature of tags and ratings between users and items. As a result, our model captures the semantic correlation between users and items, and is able to greatly improve recommendation performance, especially in DS-WO-CRI situations. Experiments conducted on two popular real-world datasets demonstrate that our proposed model significantly outperforms the conventional CF approach, the state-of-the-art social relation based CF approach, and the state-of-the-art topic modeling based CF approaches in terms of both precision and recall, and it is an effective approach to the DS-WO-CRI problem.

AAAI Conference 2014 Conference Paper

Context-Aware Collaborative Topic Regression with Social Matrix Factorization for Recommender Systems

  • Chaochao Chen
  • Xiaolin Zheng
  • Yan Wang
  • Fuxing Hong
  • Zhen Lin

Online social networking sites have become popular platforms on which users can link with each other and share information, not only basic rating information but also information such as contexts, social relationships, and item contents. However, as far as we know, no existing works systematically combine diverse types of information to build more accurate recommender systems. In this paper, we propose a novel context-aware hierarchical Bayesian method. First, we propose the use of spectral clustering for user-item subgrouping, so that users and items in similar contexts are grouped. We then propose a novel hierarchical Bayesian model that can make predictions for each user-item subgroup; our model incorporates not only topic modeling to mine item content but also social matrix factorization to handle ratings and social relationships. Experiments on an Epinions dataset show that our method significantly improves recommendation performance compared with six categories of state-of-the-art recommendation methods in terms of both prediction accuracy and recall. We have also conducted experiments to study the extent to which ratings, contexts, social relationships, and item contents contribute to recommendation performance in terms of prediction accuracy and recall.