Arrow Research search

Author name cluster

Ruixuan Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

26 papers
1 author row

Possible papers

26

AAAI Conference 2026 Conference Paper

Data-Centric Sequential Recommendation with Relation-Augmented Generation

  • Yichen Li
  • Yichen Tan
  • Yijing Shan
  • Haozhao Wang
  • Rui Zhang
  • Imran Razzak
  • Ruixuan Li

Data-Centric Sequential Recommendation (DaCSR) has emerged as a promising technique that enhances dataset quality to better capture user preferences without increasing training complexity. However, mining item relations to improve data quality remains challenging due to the intricate nature of interaction sequences. Existing methods predominantly either: 1) optimize models to learn such item relations from fixed datasets at significant training cost, or 2) employ generative models to adaptively learn only interaction patterns, which lack interpretability and cannot guarantee effective data quality enhancement. In this paper, we pioneer a relation-guided dataset augmentation and regeneration framework for sequential recommendation called \textbf{RaSR}. This framework can significantly improve model performance on original datasets while maintaining training efficiency without modifying the model architecture. Specifically, we first preprocess user interactions to construct standardized sequential data and extract semantic representations via a Large Language Model (LLM). We then build a multi-relation graph with manually predefined metrics and semantic representations to generate augmented datasets. Finally, a relation-aware generator can produce regenerated datasets with both the multi-relation graph and the augmented dataset. To verify the effectiveness of RaSR, we conduct experiments on various backbone models and datasets, and achieve significant performance improvement compared to training the model only on the original dataset.

TIST Journal 2026 Journal Article

SNEAK: Synonymous Sentences-Aware Adversarial Attack on Natural Language Video Localization

  • Wen Shi
  • Wenchao Xu
  • Haozhao Wang
  • Xingshuo Han
  • Pan Zhou
  • Ruixuan Li

Natural language video localization (NLVL) is an important task in the vision-language understanding area, which calls for an in-depth understanding of not only computer vision and natural language side alone, but more importantly the interplay between both sides. Adversarial attack has been well-recognized as a critical security issue of deep neural network models, which requires prudent investigation. Despite its extensive yet separated studies in video and language two kinds of unimodal tasks, current understanding of the adversarial attack in NLVL tasks including both the vision and language modality is less developed. This paper therefore aims to comprehensively investigate the adversarial attacks of NLVL models by examining three different facets of adversarial attack methods. To achieve the attack goal, we propose a new adversarial attack paradigm called synonymous sentences-aware adversarial attack on NLVL (SNEAK), which captures the cross-modality interplay between the vision and language sides. To further enhance the stealthiness of SNEAK, we propose a frame importance-guided pruning with SNEAK attack (PSA) mechanism to reduce the amount of perturbed frames. Extensive experiments on five NLVL models and two datasets demonstrate the effectiveness and stealthiness of the proposed attack methods.

AAAI Conference 2026 Conference Paper

Unbiased Rectification for Sequential Recommender Systems Under Fake Orders

  • Qiyu Qin
  • Yichen Li
  • Haozhao Wang
  • Cheng Wang
  • Rui Zhang
  • Ruixuan Li

Fake orders pose increasing threats to sequential recommender systems by misleading recommendation results through artificially manipulated interactions, including click farming, context-irrelevant substitutions, and sequential perturbations. Unlike injecting carefully designed fake users to influence recommendation performance, fake orders embedded within genuine user sequences aim to disrupt user preferences and mislead recommendation results, thereby manipulating exposure rates of specific items to gain competitive advantages. To protect users' authentic interest preferences and eliminate misleading information, this paper aims to perform precise and efficient rectification on compromised sequential recommender systems while avoiding the enormous computational and time costs of retraining existing models. Specifically, we identify that fake orders are not absolutely harmful—in certain cases, partial fake orders can even have a data augmentation effect. Based on this insight, we propose Dual-view Identification and Targeted Rectification (DITaR), which primarily identifies harmful samples to achieve unbiased rectification of the system. The core idea of this method is to obtain differentiated representations from collaborative and semantic views for precise detection, and then filters detected suspicious fake orders to select truly harmful ones for targeted rectification with gradient ascent. This ensures that useful information in fake orders is not removed while preventing bias residue. Moreover, it maintains the original data volume and sequence structure, thus protecting system performance and trustworthiness to achieve optimal unbiased rectification. Extensive experiments on three datasets demonstrate that DITaR achieves superior performance compared to state-of-the-art methods in terms of recommendation quality, computational efficiency, and system robustness.

NeurIPS Conference 2025 Conference Paper

Beyond Higher Rank: Token-wise Input-Output Projections for Efficient Low-Rank Adaptation

  • Shiwei Li
  • Xiandi Luo
  • Haozhao Wang
  • Xing Tang
  • Ziqiang Cui
  • Dugang Liu
  • Yuhua Li
  • Xiuqiang He

Low-rank adaptation (LoRA) is a parameter-efficient fine-tuning (PEFT) method widely used in large language models (LLMs). LoRA essentially describes the projection of an input space into a low-dimensional output space, with the dimensionality determined by the LoRA rank. In standard LoRA, all input tokens share the same weights and undergo an identical input-output projection. This limits LoRA's ability to capture token-specific information due to the inherent semantic differences among tokens. To address this limitation, we propose **Token-wise Projected Low-Rank Adaptation (TopLoRA)**, which dynamically adjusts LoRA weights according to the input token, thereby learning token-wise input-output projections in an end-to-end manner. Formally, the weights of TopLoRA can be expressed as $B\Sigma_X A$, where $A$ and $B$ are low-rank matrices (as in standard LoRA), and $\Sigma_X$ is a diagonal matrix generated from each input token $X$. Notably, TopLoRA does not increase the rank of LoRA weights but achieves more granular adaptation by learning token-wise LoRA weights (i. e. , token-wise input-output projections). Extensive experiments across multiple models and datasets demonstrate that TopLoRA consistently outperforms LoRA and its variants. The code is available at https: //github. com/Leopold1423/toplora-neurips25.

NeurIPS Conference 2025 Conference Paper

ChatbotID: Identifying Chatbots with Granger Causality Test

  • Xiaoquan Yi
  • Haozhao Wang
  • Yining Qi
  • Wenchao Xu
  • Rui Zhang
  • Yuhua Li
  • Ruixuan Li

With the increasing sophistication of Large Language Models (LLMs), it is crucial to develop reliable methods to accurately identify whether an interlocutor in real-time dialogue is human or chatbot. However, existing detection methods are primarily designed for analyzing full documents, not the unique dynamics and characteristics of dialogue. These approaches frequently overlook the nuances of interaction that are essential in conversational contexts. This work identifies two key patterns in dialogues: (1) Human-Human (H-H) interactions exhibit significant bidirectional sentiment influence, while (2) Human-Chatbot (H-C) interactions display a clear asymmetric pattern. We propose an innovative approach named ChatbotID, which applies the Granger Causality Test (GCT) to extract a novel set of interactional features that capture the evolving, predictive relationships between conversational attributes. By synergistically fusing these GCT-based interactional features with contextual embeddings, and optimizing the model through a meticulous loss function. Experimental results across multiple datasets and detection models demonstrate the effectiveness of our framework, with significant improvements in accuracy for distinguishing between H-H and H-C dialogues.

EAAI Journal 2025 Journal Article

Context-aware resemblance detection for data deduplication with neural network

  • Xuming Ye
  • Wenlong Tian
  • Yaping Wan
  • Ruixuan Li
  • Weijun Xiao
  • Zhiyong Xu

As the prevalence of cloud storage increases, many individuals and companies prefer outsourcing their data for backup and management. However, this has led to a significant increase in redundancy, decreasing storage utilization and wasting network bandwidth. While conventional resemblance detection methods remove redundancy among similar data by comparing the features extracted from each chunk’s content. However, we observed that small changes between similar data chunks may cause false dissimilarity detection by conventional resemblance detection techniques. This is because features derived solely from the chunk content are highly susceptible to various modification patterns. Fortunately, we have discovered that two chunks are likely to be similar if their surrounding chunks are also similar, a concept we refer to as “chunk-context”. Therefore, we propose a novel chunk-context aware resemblance detection method, called CARD, which includes a network-based chunk-context aware model and an N-sub-chunk shingles-based initial feature extraction strategy. By leveraging the Neural network, it can discover the complex patterns between the chunk-context and chunk content itself. A high-level understanding of the contextual information with chunk content can be synthesized into the representation of a chunk. The primary difference compared with others is that our design can significantly improves the accuracy or efficiency of resemblance detection by considering the chunk-context with chunk content itself. Furthermore, we implemented a CARD prototype and conducted extensive experiments using real workload, demonstrating that CARD can detect up to 75. 03% more redundant data and accelerate the resemblance detection operations by 5. 6 × to 86. 7 × faster than state-of-the-art work.

NeurIPS Conference 2025 Conference Paper

Efficient Knowledge Transfer in Federated Recommendation for Joint Venture Ecosystem

  • Yichen Li
  • Yijing Shan
  • Yi Liu
  • Haozhao Wang
  • Cheng Wang
  • Yi Wang
  • Ruixuan Li

The current Federated Recommendation System (FedRS) focuses on personalized recommendation services and assumes clients are personalized IoT devices (e. g. , Mobile phones). In this paper, we deeply dive into new but practical FedRS applications within the joint venture ecosystem. Subsidiaries engage as participants with their users and items. However, in such a situation, merely exchanging item embedding is insufficient, as user bases always exhibit both overlaps and exclusive segments, demonstrating the complexity of user information. Meanwhile, directly uploading user information is a violation of privacy and unacceptable. To tackle the above challenges, we propose an efficient and privacy-enhanced federated recommendation for the joint venture ecosystem (FR-JVE) that each client transfers more common knowledge from other clients with a distilled user's \textit{rating preference} from the local dataset. More specifically, we first transform the local data into a new format and apply model inversion techniques to distill the rating preference with frozen user gradients before the federated training. Then, a bridge function is employed on each client side to align the local rating preference and aggregated global preference in a privacy-friendly manner. Finally, each client matches similar users to make a better prediction for overlapped users. From a theoretical perspective, we analyze how effectively FR-JVE can guarantee user privacy. Empirically, we show that FR-JVE achieves superior performance compared to state-of-the-art methods.

NeurIPS Conference 2025 Conference Paper

Enhancing Privacy in Multimodal Federated Learning with Information Theory

  • Tianzhe Xiao
  • Yichen Li
  • Yining Qi
  • Yi Liu
  • Haozhao Wang
  • Yi Wang
  • Ruixuan Li

Multimodal federated learning (MMFL) has gained increasing popularity due to its ability to leverage the correlation between various modalities, meanwhile preserving data privacy for different clients. However, recent studies show that correlation between modalities increase the vulnerability of federated learning against Gradient Inversion Attack (GIA). The complicated situation of MMFL privacy preserving can be summarized as follows: 1) different modality transmits different amounts of information, thus requires various protection strength; 2) correlation between modalities should be taken into account. This paper introduces an information theory perspective to analyze the leaked privacy in process of MMFL, and tries to propose a more reasonable protection method \textbf{Sec-MMFL} based on assessing different information leakage possibilities of each modality by conditional mutual information and adjust the corresponding protection strength. Moreover, we use mutual information to reduce the cross-modality information leakage in MMFL. Experiments have proven that our method can bring more balanced and comprehensive protection at an acceptable cost.

NeurIPS Conference 2025 Conference Paper

Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning

  • Yichen Li
  • Xiuying Wang
  • Wenchao Xu
  • Haozhao Wang
  • Yining Qi
  • Jiahua Dong
  • Ruixuan Li

Model-Heterogeneous Federated Learning (Hetero-FL) has attracted growing attention for its ability to aggregate knowledge from heterogeneous models while keeping private data locally. To better aggregate knowledge from clients, ensemble distillation, as a widely used and effective technique, is often employed after global aggregation to enhance the performance of the global model. However, simply combining Hetero-FL and ensemble distillation does not always yield promising results and can make the training process unstable. The reason is that existing methods primarily focus on logit distillation, which, while being model-agnostic with softmax predictions, fails to compensate for the knowledge bias arising from heterogeneous models. To tackle this challenge, we propose a stable and efficient Feature Distillation for model-heterogeneous Federated learning, dubbed FedFD, that can incorporate aligned feature information via orthogonal projection to integrate knowledge from heterogeneous models better. Specifically, a new feature-based ensemble federated knowledge distillation paradigm is proposed. The global model on the server needs to maintain a projection layer for each client-side model architecture to align the features separately. Orthogonal techniques are employed to re-parameterize the projection layer to mitigate knowledge bias from heterogeneous models and thus maximize the distilled knowledge. Extensive experiments show that FedFD achieves superior performance compared to state-of-the-art methods.

NeurIPS Conference 2025 Conference Paper

FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models

  • Jintao Tong
  • Wenwei Jin
  • Pengda Qin
  • Anqi Li
  • Yixiong Zou
  • Yuhong Li
  • Yuhua Li
  • Ruixuan Li

Large vision-language models (LVLMs) excel at multimodal understanding but suffer from high computational costs due to redundant vision tokens. Existing pruning methods typically rely on single-layer attention scores to rank and prune redundant visual tokens to solve this inefficiency. However, as the interaction between tokens and layers is complicated, this raises a basic question: Is such a simple single-layer criterion sufficient to identify redundancy? To answer this question, we rethink the emergence of redundant visual tokens from a fundamental perspective: information flow, which models the interaction between tokens and layers by capturing how information moves between tokens across layers. We find (1) the CLS token acts as an information relay, which can simplify the complicated flow analysis; (2) the redundancy emerges progressively and dynamically via layer-wise attention concentration; and (3) relying solely on attention scores from single layers can lead to contradictory redundancy identification. Based on this, we propose FlowCut, an information-flow-aware pruning framework, mitigating the insufficiency of the current criterion for identifying redundant tokens and better aligning with the model's inherent behaviors. Extensive experiments show FlowCut achieves superior results, outperforming SoTA by 1. 6% on LLaVA-1. 5-7B with 88. 9% token reduction, and by 4. 3% on LLaVA-NeXT-7B with 94. 4% reduction, delivering 3. 2$\times$ speed-up in the prefilling stage. Our code is available at https: //github. com/TungChintao/FlowCut.

NeurIPS Conference 2025 Conference Paper

Quantifying Distributional Invariance in Causal Subgraph for IRM-Free Graph Generalization

  • Yang Qiu
  • Yixiong Zou
  • Jun Wang
  • Wei Liu
  • Xiangyu Fu
  • Ruixuan Li

Out-of-distribution generalization under distributional shifts remains a critical challenge for graph neural networks. Existing methods generally adopt the Invariant Risk Minimization (IRM) framework, requiring costly environment annotations or heuristically generated synthetic splits. To circumvent these limitations, in this work, we aim to develop an IRM-free method for capturing causal subgraphs. We first identify that causal subgraphs exhibit substantially smaller distributional variations than non-causal components across diverse environments, which we formalize as the Invariant Distribution Criterion and theoretically prove in this paper. Building on this criterion, we systematically uncover the quantitative relationship between distributional shift and representation norm for identifying the causal subgraph, and investigate its underlying mechanisms in depth. Finally, we propose an IRM-free method by introducing a norm-guided invariant distribution objective for causal subgraph discovery and prediction. Extensive experiments on two widely used benchmarks demonstrate that our method consistently outperforms state-of-the-art methods in graph generalization. Code is available at https: //github. com/anders1123/IDG.

AAAI Conference 2025 Conference Paper

Reconstruction Target Matters in Masked Image Modeling for Cross-Domain Few-Shot Learning

  • Ran Ma
  • Yixiong Zou
  • Yuhua Li
  • Ruixuan Li

Cross-Domain Few-Shot Learning (CDFSL) requires the model to transfer knowledge from the data-abundant source domain to data-scarce target domains for fast adaptation, where the large domain gap makes CDFSL a challenging problem. Masked Autoencoder (MAE) excels in effectively using unlabeled data and learning image’s global structures, enhancing model generalization and robustness. However, in the CDFSL task with significant domain shifts, we find MAE even shows lower performance than the baseline supervised models. In this paper, we first delve into this phenomenon for an interpretation. We find that MAE tends to focus on low-level domain information during reconstructing pixels while changing the reconstruction target to token features could mitigate this problem. However, not all features are beneficial, as we then find reconstructing high-level features can hardly improve the model’s transferability, indicating a trade-off between filtering domain information and preserving the image’s global structure. In all, the reconstruction target matters for the CDFSL task. Based on the above findings and interpretations, we further propose Domain-Agnostic Masked Image Modeling (DAMIM) for the CDFSL task. DAMIM includes an Aggregated Feature Reconstruction module to automatically aggregate features for reconstruction, with balanced learning of domain-agnostic information and images’ global structure, and a Lightweight Decoder module to further benefit the encoder’s generalizability. Experiments on four CDFSL datasets demonstrate that our method achieves state-of-the-art performance.

NeurIPS Conference 2025 Conference Paper

Resource-Constrained Federated Continual Learning: What Does Matter?

  • Yichen Li
  • Yuying Wang
  • Jiahua Dong
  • Haozhao Wang
  • Yining Qi
  • Rui Zhang
  • Ruixuan Li

Federated Continual Learning (FCL) aims to enable sequential privacy-preserving model training on streams of incoming data that vary in edge devices by preserving previous knowledge while adapting to new data. Current FCL literature focuses on restricted data privacy and access to previously seen data while imposing no constraints on the training overhead. This is unreasonable for FCL applications in real-world scenarios, where edge devices are primarily constrained by resources such as storage, computational budget, and label rate. We revisit this problem with a large-scale benchmark and analyze the performance of state-of-the-art FCL approaches under different resource-constrained settings. Various typical FCL techniques and six datasets in two incremental learning scenarios (Class-IL and Domain-IL) are involved in our experiments. Through extensive experiments amounting to a total of over 1, 000+ GPU hours, we find that, under limited resource-constrained settings, existing FCL approaches, with no exception, fail to achieve the expected performance. Our conclusions are consistent in the sensitivity analysis. This suggests that most existing FCL methods are particularly too resource-dependent for real-world deployment. Moreover, we study the performance of typical FCL techniques with resource constraints and shed light on future research directions in FCL.

NeurIPS Conference 2024 Conference Paper

A Closer Look at the CLS Token for Cross-Domain Few-Shot Learning

  • Yixiong Zou
  • Shuai Yi
  • Yuhua Li
  • Ruixuan Li

Vision Transformer (ViT) has shown great power in learning from large-scale datasets. However, collecting sufficient data for expert knowledge is always difficult. To handle this problem, Cross-Domain Few-Shot Learning (CDFSL) has been proposed to transfer the source-domain knowledge learned from sufficient data to target domains where only scarce data is available. In this paper, we find an intriguing phenomenon neglected by previous works for the CDFSL task based on ViT: leaving the CLS token to random initialization, instead of loading source-domain trained parameters, could consistently improve target-domain performance. We then delve into this phenomenon for an interpretation. We find the CLS token naturally absorbs domain information due to the inherent structure of the ViT, which is represented as the low-frequency component in the Fourier frequency space of images. Based on this phenomenon and interpretation, we further propose a method for the CDFSL task to decouple the domain information in the CLS token during the source-domain training, and adapt the CLS token on the target domain for efficient few-shot learning. Extensive experiments on four benchmarks validate our rationale and state-of-the-art performance. Our codes are available at https: //github. com/Zoilsen/CLS Token CDFSL.

NeurIPS Conference 2024 Conference Paper

Attention Temperature Matters in ViT-Based Cross-Domain Few-Shot Learning

  • Yixiong Zou
  • Ran Ma
  • Yuhua Li
  • Ruixuan Li

Cross-domain few-shot learning (CDFSL) is proposed to transfer knowledge from large-scale source-domain datasets to downstream target-domain datasets with only a few training samples. However, Vision Transformer (ViT), as a strong backbone network to achieve many top performances, is still under-explored in the CDFSL task in its transferability against large domain gaps. In this paper, we find an interesting phenomenon of ViT in the CDFSL task: by simply multiplying a temperature (even as small as 0) to the attention in ViT blocks, the target-domain performance consistently increases, even though the attention map is downgraded to a uniform map. In this paper, we delve into this phenomenon for an interpretation. Through experiments, we interpret this phenomenon as a remedy for the ineffective target-domain attention caused by the query-key attention mechanism under large domain gaps. Based on it, we further propose a simple but effective method for the CDFSL task to boost ViT's transferability by resisting the learning of query-key parameters and encouraging that of non-query-key ones. Experiments on four CDFSL datasets validate the rationale of our interpretation and method, showing we can consistently outperform state-of-the-art methods. Our codes are available at https: //github. com/Zoilsen/Attn Temp CDFSL.

AAAI Conference 2024 Conference Paper

Decoupling Representation and Knowledge for Few-Shot Intent Classification and Slot Filling

  • Jie Han
  • Yixiong Zou
  • Haozhao Wang
  • Jun Wang
  • Wei Liu
  • Yao Wu
  • Tao Zhang
  • Ruixuan Li

Few-shot intent classification and slot filling are important but challenging tasks due to the scarcity of finely labeled data. Therefore, current works first train a model on source domains with sufficiently labeled data, and then transfer the model to target domains where only rarely labeled data is available. However, experience transferring as a whole usually suffers from gaps that exist among source domains and target domains. For instance, transferring domain-specific-knowledge-related experience is difficult. To tackle this problem, we propose a new method that explicitly decouples the transferring of general-semantic-representation-related experience and the domain-specific-knowledge-related experience. Specifically, for domain-specific-knowledge-related experience, we design two modules to capture intent-slot relation and slot-slot relation respectively. Extensive experiments on Snips and FewJoint datasets show that our method achieves state-of-the-art performance. The method improves the joint accuracy metric from 27.72% to 42.20% in the 1-shot setting, and from 46.54% to 60.79% in the 5-shot setting.

IJCAI Conference 2024 Conference Paper

Delve into Base-Novel Confusion: Redundancy Exploration for Few-Shot Class-Incremental Learning

  • Haichen Zhou
  • Yixiong Zou
  • Ruixuan Li
  • Yuhua Li
  • Kui Xiao

Few-shot class-incremental learning (FSCIL) aims to acquire knowledge from novel classes with limited samples while retaining information about base classes. Existing methods address catastrophic forgetting and overfitting by freezing the feature extractor during novel-class learning. However, these methods usually tend to cause the confusion between base and novel classes, i. e. , classifying novel-class samples into base classes. In this paper, we delve into this phenomenon to study its cause and solution. We first interpret the confusion as the collision between the novel-class and the base-class region in the feature space. Then, we find the collision is caused by the label-irrelevant redundancies within the base-class feature and pixel space. Through qualitative and quantitative experiments, we identify this redundancy as the shortcut in the base-class training, which can be decoupled to alleviate the collision. Based on this analysis, to alleviate the collision between base and novel classes, we propose a method for FSCIL named Redundancy Decoupling and Integration (RDI). RDI first decouples redundancies from base-class space to shrink the intra-base-class feature space. Then, it integrates the redundancies as a dummy class to enlarge the inter-base-class feature space. This process effectively compresses the base-class feature space, creating buffer space for novel classes and alleviating the model's confusion between the base and novel classes. Extensive experiments across benchmark datasets, including CIFAR-100, miniImageNet, and CUB-200-2011 demonstrate that our method achieves state-of-the-art performance.

NeurIPS Conference 2024 Conference Paper

Generate Universal Adversarial Perturbations for Few-Shot Learning

  • Yiman Hu
  • Yixiong Zou
  • Ruixuan Li
  • Yuhua Li

Deep networks are known to be vulnerable to adversarial examples which are deliberately designed to mislead the trained model by introducing imperceptible perturbations to input samples. Compared to traditional perturbations crafted specifically for each data point, Universal Adversarial Perturbations (UAPs) are input-agnostic and shown to be more practical in the real world. However, UAPs are typically generated in a close-set scenario that shares the same classification task during the training and testing phases. This paper demonstrates the ineffectiveness of traditional UAPs in open-set scenarios like Few-Shot Learning (FSL). Through analysis, we identify two primary challenges that hinder the attacking process: the task shift and the semantic shift. To enhance the transferability of UAPs in FSL, we propose a unifying attacking framework addressing these two shifts. The task shift is addressed by aligning proxy tasks to the downstream tasks, while the semantic shift is handled by leveraging the generalizability of pre-trained encoders. The proposed Few-Shot Attacking FrameWork, denoted as FSAFW, can effectively generate UAPs across various FSL training paradigms and different downstream tasks. Our approach not only sets a new standard for state-of-the-art works but also significantly enhances attack performance, exceeding the baseline method by over 16\%.

NeurIPS Conference 2024 Conference Paper

Is the MMI Criterion Necessary for Interpretability? Degenerating Non-causal Features to Plain Noise for Self-Rationalization

  • Wei Liu
  • Zhiying Deng
  • Zhongyu Niu
  • Jun Wang
  • Haozhao Wang
  • YuanKai Zhang
  • Ruixuan Li

An important line of research in the field of explainability is to extract a small subset of crucial rationales from the full input. The most widely used criterion for rationale extraction is the maximum mutual information (MMI) criterion. However, in certain datasets, there are spurious features non-causally correlated with the label and also get high mutual information, complicating the loss landscape of MMI. Although some penalty-based methods have been developed to penalize the spurious features (e. g. , invariance penalty, intervention penalty, etc) to help MMI work better, these are merely remedial measures. In the optimization objectives of these methods, spurious features are still distinguished from plain noise, which hinders the discovery of causal rationales. This paper aims to develop a new criterion that treats spurious features as plain noise, allowing the model to work on datasets rich in spurious features as if it were working on clean datasets, thereby making rationale extraction easier. We theoretically observe that removing either plain noise or spurious features from the input does not alter the conditional distribution of the remaining components relative to the task label. However, significant changes in the conditional distribution occur only when causal features are eliminated. Based on this discovery, the paper proposes a criterion for \textbf{M}aximizing the \textbf{R}emaining \textbf{D}iscrepancy (MRD). Experiments on six widely used datasets show that our MRD criterion improves rationale quality (measured by the overlap with human-annotated rationales) by up to $10. 4\%$ as compared to several recent competitive MMI variants. Code: \url{https: //github. com/jugechengzi/Rationalization-MRD}.

NeurIPS Conference 2024 Conference Paper

Lightweight Frequency Masker for Cross-Domain Few-Shot Semantic Segmentation

  • Jintao Tong
  • Yixiong Zou
  • Yuhua Li
  • Ruixuan Li

Cross-domain few-shot segmentation (CD-FSS) is proposed to first pre-train the model on a large-scale source-domain dataset, and then transfer the model to data-scarce target-domain datasets for pixel-level segmentation. The significant domain gap between the source and target datasets leads to a sharp decline in the performance of existing few-shot segmentation (FSS) methods in cross-domain scenarios. In this work, we discover an intriguing phenomenon: simply filtering different frequency components for target domains can lead to a significant performance improvement, sometimes even as high as 14% mIoU. Then, we delve into this phenomenon for an interpretation, and find such improvements stem from the reduced inter-channel correlation in feature maps, which benefits CD-FSS with enhanced robustness against domain gaps and larger activated regions for segmentation. Based on this, we propose a lightweight frequency masker, which further reduces channel correlations by an Amplitude-Phase Masker (APM) module and an Adaptive Channel Phase Attention (ACPA) module. Notably, APM introduces only 0. 01% additional parameters but improves the average performance by over 10%, and ACPA imports only 2. 5% parameters but further improves the performance by over 1. 5%, which significantly surpasses the state-of-the-art CD-FSS methods.

AAAI Conference 2023 Conference Paper

Adaptive Low-Precision Training for Embeddings in Click-Through Rate Prediction

  • Shiwei Li
  • Huifeng Guo
  • Lu Hou
  • Wei Zhang
  • Xing Tang
  • Ruiming Tang
  • Rui Zhang
  • Ruixuan Li

Embedding tables are usually huge in click-through rate (CTR) prediction models. To train and deploy the CTR models efficiently and economically, it is necessary to compress their embedding tables. To this end, we formulate a novel quantization training paradigm to compress the embeddings from the training stage, termed low-precision training (LPT). Also, we provide theoretical analysis on its convergence. The results show that stochastic weight quantization has a faster convergence rate and a smaller convergence error than deterministic weight quantization in LPT. Further, to reduce accuracy degradation, we propose adaptive low-precision training (ALPT) which learns the step size (i.e., the quantization resolution). Experiments on two real-world datasets confirm our analysis and show that ALPT can significantly improve the prediction accuracy, especially at extremely low bit width. For the first time in CTR models, we successfully train 8-bit embeddings without sacrificing prediction accuracy.

IJCAI Conference 2023 Conference Paper

CSGCL: Community-Strength-Enhanced Graph Contrastive Learning

  • Han Chen
  • Ziwen Zhao
  • Yuhua Li
  • Yixiong Zou
  • Ruixuan Li
  • Rui Zhang

Graph Contrastive Learning (GCL) is an effective way to learn generalized graph representations in a self-supervised manner, and has grown rapidly in recent years. However, the underlying community semantics has not been well explored by most previous GCL methods. Research that attempts to leverage communities in GCL regards them as having the same influence on the graph, leading to extra representation errors. To tackle this issue, we define ''community strength'' to measure the difference of influence among communities. Under this premise, we propose a Community-Strength-enhanced Graph Contrastive Learning (CSGCL) framework to preserve community strength throughout the learning process. Firstly, we present two novel graph augmentation methods, Communal Attribute Voting (CAV) and Communal Edge Dropping (CED), where the perturbations of node attributes and edges are guided by community strength. Secondly, we propose a dynamic ''Team-up'' contrastive learning scheme, where community strength is used to progressively fine-tune the contrastive objective. We report extensive experiment results on three downstream tasks: node classification, node clustering, and link prediction. CSGCL achieves state-of-the-art performance compared with other GCL methods, validating that community strength brings effectiveness and generality to graph representations. Our code is available at https: //github. com/HanChen-HUST/CSGCL.

NeurIPS Conference 2023 Conference Paper

D-Separation for Causal Self-Explanation

  • Wei Liu
  • Jun Wang
  • Haozhao Wang
  • Ruixuan Li
  • Zhiying Deng
  • YuanKai Zhang
  • Yang Qiu

Rationalization aims to strengthen the interpretability of NLP models by extracting a subset of human-intelligible pieces of their inputting texts. Conventional works generally employ the maximum mutual information (MMI) criterion to find the rationale that is most indicative of the target label. However, this criterion can be influenced by spurious features that correlate with the causal rationale or the target label. Instead of attempting to rectify the issues of the MMI criterion, we propose a novel criterion to uncover the causal rationale, termed the Minimum Conditional Dependence (MCD) criterion, which is grounded on our finding that the non-causal features and the target label are \emph{d-separated} by the causal rationale. By minimizing the dependence between the non-selected parts of the input and the target label conditioned on the selected rationale candidate, all the causes of the label are compelled to be selected. In this study, we employ a simple and practical measure for dependence, specifically the KL-divergence, to validate our proposed MCD criterion. Empirically, we demonstrate that MCD improves the F1 score by up to 13. 7% compared to previous state-of-the-art MMI-based methods. Our code is in an anonymous repository: https: //anonymous. 4open. science/r/MCD-CE88.

NeurIPS Conference 2022 Conference Paper

FR: Folded Rationalization with a Unified Encoder

  • Wei Liu
  • Haozhao Wang
  • Jun Wang
  • Ruixuan Li
  • Chao Yue
  • YuanKai Zhang

Rationalization aims to strengthen the interpretability of NLP models by extracting a subset of human-intelligible pieces of their inputting texts. Conventional works generally employ a two-phase model in which a generator selects the most important pieces, followed by a predictor that makes predictions based on the selected pieces. However, such a two-phase model may incur the degeneration problem where the predictor overfits to the noise generated by a not yet well-trained generator and in turn, leads the generator to converge to a suboptimal model that tends to select senseless pieces. To tackle this challenge, we propose Folded Rationalization (FR) that folds the two phases of the rationale model into one from the perspective of text semantic extraction. The key idea of FR is to employ a unified encoder between the generator and predictor, based on which FR can facilitate a better predictor by access to valuable information blocked by the generator in the traditional two-phase model and thus bring a better generator. Empirically, we show that FR improves the F1 score by up to 10. 3% as compared to state-of-the-art methods.

NeurIPS Conference 2022 Conference Paper

Margin-Based Few-Shot Class-Incremental Learning with Class-Level Overfitting Mitigation

  • Yixiong Zou
  • Shanghang Zhang
  • Yuhua Li
  • Ruixuan Li

Few-shot class-incremental learning (FSCIL) is designed to incrementally recognize novel classes with only few training samples after the (pre-)training on base classes with sufficient samples, which focuses on both base-class performance and novel-class generalization. A well known modification to the base-class training is to apply a margin to the base-class classification. However, a dilemma exists that we can hardly achieve both good base-class performance and novel-class generalization simultaneously by applying the margin during the base-class training, which is still under explored. In this paper, we study the cause of such dilemma for FSCIL. We first interpret this dilemma as a class-level overfitting (CO) problem from the aspect of pattern learning, and then find its cause lies in the easily-satisfied constraint of learning margin-based patterns. Based on the analysis, we propose a novel margin-based FSCIL method to mitigate the CO problem by providing the pattern learning process with extra constraint from the margin-based patterns themselves. Extensive experiments on CIFAR100, Caltech-USCD Birds-200-2011 (CUB200), and miniImageNet demonstrate that the proposed method effectively mitigates the CO problem and achieves state-of-the-art performance.

JAIR Journal 2019 Journal Article

Polynomial and Exponential Bounded Logic Programs with Function Symbols: Some New Decidable Classes

  • Vernon Asuncion
  • Yan Zhang
  • Heng Zhang
  • Ruixuan Li

A logic program with function symbols is called finitely ground if there is a finite propositional logic program whose stable models are exactly the same as the stable models of this program. Finite groundability is an important property for logic programs with function symbols because it makes feasible to compute such programs' stable models using traditional ASP solvers. In this paper, we introduce new decidable classes of finitely ground programs called poly-bounded and k-EXP-bounded programs, which, to the best of our knowledge, strictly contain all other decidable classes of finitely ground programs discovered so far in the literature. We also study the relevant complexity properties for these classes of programs. We prove that the membership complexities for poly-bounded and k-EXP-bounded programs are EXPTIME-complete and (k+1)-EXPTIME-complete, respectively.