Arrow Research search

Author name cluster

Feifei Kou

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers (13)

AAAI Conference 2026 Conference Paper

Adaptive Graph Attention Based Discrete Hashing for Incomplete Cross-modal Retrieval

  • Shuang Zhang
  • Yue Wu
  • Lei Shi
  • Huilong Jin
  • Feifei Kou
  • Pengfei Zhang
  • Mingying Xu
  • Pengtao Lv

Cross-modal hashing has emerged as a pivotal solution for efficient retrieval across diverse modalities, such as images and texts, by mapping them into compact binary hash spaces. In real-world scenarios, however, modality data are often missing or misaligned. Most existing methods rely on fully paired training data and ignore missing or misaligned modality data, resulting in semantic inconsistencies. To address these challenges, we propose an Adaptive Graph Attention-Based Discrete Hashing (AGADH) method, which consists of three parts. First, to handle missing modalities, AGADH employs a masked completion strategy to reconstruct them. Second, to mitigate semantic misalignment, AGADH leverages a Graph Attention Network (GAT) encoder-decoder architecture with an alignment module to construct features from the different modalities. Third, to improve fusion, we propose an adaptive fusion module that dynamically adjusts the contributions of the image and text modalities via learnable weighting coefficients. Extensive experiments on three benchmark datasets, MS-COCO, NUS-WIDE, and MIRFlickr-25K, demonstrate that AGADH outperforms state-of-the-art methods in both fully paired and incompletely paired scenarios, showing its robustness and effectiveness in cross-modal retrieval tasks.
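
As a rough illustration of the adaptive fusion idea, here is a minimal PyTorch sketch in which two learnable logits are softmax-normalized into per-modality weights; the module and variable names are hypothetical, not from the paper's code.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Hypothetical adaptive fusion: softmax over two learnable logits
    yields positive per-modality weights that sum to 1."""
    def __init__(self):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(2))  # one logit per modality

    def forward(self, img_feat, txt_feat):
        w = torch.softmax(self.logits, dim=0)       # learnable coefficients
        return w[0] * img_feat + w[1] * txt_feat

fused = AdaptiveFusion()(torch.randn(4, 128), torch.randn(4, 128))
print(fused.shape)  # torch.Size([4, 128])
```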

AAAI Conference 2026 Conference Paper

Collaborative Transformers with Multi-Level Forensic Attention for Image Manipulation Localization

  • Jiwei Zhang
  • Wenbo Feng
  • Siwei Wang
  • Feifei Kou
  • Haoyang Yu
  • Shaozhang Niu

The proliferation of tampered images on social media can pose serious societal risks, influencing public opinion and causing panic. Image Manipulation Localization techniques have advanced to address this, but some methods focus on microscopic traces while overlooking the macroscopic semantics that deceive viewers. To address this problem, we propose a novel Image Manipulation Localization framework called Collaborative Transformers (Co-Transformers), designed to fully explore and exploit the collaborative information between macroscopic semantics and microscopic traces. The framework is built on two Vision Transformer variants: the first captures the semantic logic of the image, while the second delves into microscopic tampering traces. By dynamically fusing these two complementary features, the framework enables interaction between macroscopic semantic inconsistencies and microscopic abnormal traces, effectively coordinating their relationship in the latent space. Furthermore, we introduce a new Multi-Level Forensic Attention (MLF-Attention) mechanism, which can be integrated into our framework, to enhance the model's ability to extract diverse tampering traces. Compared with existing methods, our framework achieves state-of-the-art localization accuracy and shows good robustness against various attacks.
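
To make the fusion of the two ViT branches concrete, here is a hedged sketch using standard cross-attention, with semantic tokens attending to trace tokens; the dimensions and names are illustrative assumptions, and the paper's actual fusion mechanism may differ.

```python
import torch
import torch.nn as nn

# Macroscopic-semantic tokens attend to microscopic-trace tokens.
attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
semantic = torch.randn(2, 196, 256)  # tokens from the semantic ViT branch
traces = torch.randn(2, 196, 256)    # tokens from the trace ViT branch
fused, _ = attn(query=semantic, key=traces, value=traces)
print(fused.shape)  # torch.Size([2, 196, 256])
```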

AAAI Conference 2026 Conference Paper

DiMA: Distinguishing Resident and Tourist Preferences via Multi-Modal LLM Alignment for Out-of-Town Cross-Domain Recommendation

  • Fan Zhang
  • Jinpeng Chen
  • Tao Wang
  • Huan Li
  • Senzhang Wang
  • Feifei Kou
  • Ye Ji
  • Kaimin Wei

Out-of-Town (OOT) recommendation aims to provide personalized suggestions for users in unfamiliar cities. However, OOT recommendation faces two fundamental challenges: the difficulty of reasoning across modalities, as preference signals in disparate formats such as images and text are hard to compare; and the preference deviation problem, since a user's resident and tourist preferences often diverge, rendering simple preference transfer ineffective. To address these challenges, we propose Distinguishing Resident and Tourist Preferences via Multi-Modal LLM Alignment for Out-of-Town Cross-Domain Recommendation (DiMA), a framework for re-ranking Points of Interest (POIs). To tackle the multimodal challenge, DiMA first leverages Multimodal Large Language Models and Large Language Models (LLMs) to transform heterogeneous POI data into unified semantic tags, enabling both cross-modal reasoning and efficient downstream processing. To address preference deviation, a "teacher" LLM executes a custom Chain-of-Thought (CoT) process to disentangle resident and tourist preferences from multi-city histories for re-ranking. Finally, a lightweight student model learns this CoT reasoning via Supervised Fine-Tuning and is then refined with Direct Preference Optimization to align with true user choices, with the potential to surpass the teacher. Extensive experiments on a real-world dataset demonstrate that DiMA significantly enhances the performance of baseline models in the OOT recommendation re-ranking task.
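
A minimal sketch of the teacher-side CoT step, assuming a hypothetical llm() wrapper; the prompt wording paraphrases the abstract and is not DiMA's actual prompt.

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in for a teacher-LLM API call."""
    raise NotImplementedError

def disentangle_preferences(home_history, travel_history, candidates):
    # Custom Chain-of-Thought structure, paraphrased from the abstract.
    prompt = (
        "Step 1: Summarize the user's RESIDENT preferences from their "
        f"home-city POI tags: {home_history}\n"
        "Step 2: Summarize the user's TOURIST preferences from their "
        f"out-of-town POI tags: {travel_history}\n"
        "Step 3: Explain how the two differ, then re-rank these candidate "
        f"POIs for the destination city: {candidates}"
    )
    return llm(prompt)
```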

AAAI Conference 2026 Conference Paper

MusicRec: Multi-modal Semantic-Enhanced Identifier with Collaborative Signals for Generative Recommendation

  • Yuqiu Zhao
  • Lei Shi
  • Yan Zhong
  • Feifei Kou
  • Pengfei Zhang
  • Jiwei Zhang
  • Mingying Xu
  • Yanchao Liu

Generative recommendation is a new paradigm that is shaping the development of recommender systems. It aims to assign items identifiers that capture rich semantic and collaborative information, and then to predict item identifiers via autoregressive generation using Large Language Models (LLMs). Existing approaches primarily tokenize item text into semantic IDs drawn from RQ-VAE codebooks, or tokenize the different modality features of items separately. However, existing tokenization methods face two major challenges: (1) learning decoupled multi-modal features limits the quality of the semantic representation, and (2) ignoring collaborative signals from interaction histories limits the comprehensiveness of the identifiers. To address these limitations, we propose a multi-modal semantic-enhanced identifier with collaborative signals for generative recommendation, named MusicRec. In MusicRec, we propose a tokenization approach based on shared-specific modal fusion, enabling the generated identifiers to preserve semantic information from all modalities more comprehensively. In addition, we incorporate collaborative signals from user interactions to guide identifier generation, preserving collaborative patterns in the semantic representation space. Extensive experiments on three public datasets demonstrate that MusicRec achieves state-of-the-art performance compared to existing baseline methods.
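
For context, the RQ-VAE-style tokenization mentioned above assigns one codeword per level to the residual left by the previous level; below is a generic NumPy sketch of residual quantization, using random codebooks rather than MusicRec's trained ones.

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Assign an item embedding one codeword per level, RQ-style:
    each level quantizes the residual left by the previous one."""
    ids, residual = [], x.copy()
    for cb in codebooks:  # cb: (num_codes, dim)
        idx = np.argmin(np.linalg.norm(cb - residual, axis=1))
        ids.append(int(idx))
        residual = residual - cb[idx]
    return ids            # the item's semantic-ID tuple

rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(256, 64)) for _ in range(3)]
print(residual_quantize(rng.normal(size=64), codebooks))  # e.g. [17, 203, 88]
```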

ICML Conference 2025 Conference Paper

CFPT: Empowering Time Series Forecasting through Cross-Frequency Interaction and Periodic-Aware Timestamp Modeling

  • Feifei Kou
  • Jiahao Wang
  • Lei Shi 0030
  • Yuhan Yao 0001
  • Yawen Li 0001
  • Suguo Zhu
  • Zhongbao Zhang
  • Junping Du 0001

Long-term time series forecasting has been widely studied, yet two aspects remain insufficiently explored: interaction learning between different frequency components, and exploitation of the periodic characteristics inherent in timestamps. To address these issues, we propose CFPT, a novel method that empowers time series forecasting through Cross-Frequency Interaction (CFI) and Periodic-Aware Timestamp Modeling (PTM). To learn cross-frequency interactions, we design the CFI branch to process signals in the frequency domain and capture their interactions through a feature fusion mechanism. Furthermore, to improve prediction by leveraging timestamp periodicity, we develop the PTM branch, which transforms timestamp sequences into 2D periodic tensors and applies 2D convolution to capture both intra-period dependencies and inter-period correlations of the time series based on timestamp patterns. Extensive experiments on multiple real-world benchmarks demonstrate that CFPT achieves state-of-the-art performance in long-term forecasting tasks. The code is publicly available at https://github.com/BUPT-SN/CFPT.
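
The 2D periodic tensor idea can be shown generically: fold a 1D series into (periods × positions-within-period) so a Conv2d sees intra-period structure along one axis and inter-period structure along the other. The shapes below are illustrative assumptions, not CFPT's actual configuration.

```python
import torch
import torch.nn as nn

period, length, channels = 24, 168, 1  # e.g. hourly data, one-week window
x = torch.randn(1, length, channels)   # (batch, time, channels)

# Fold the 1D sequence into a 2D tensor: rows are periods, columns are
# positions within a period.
x2d = x.reshape(1, length // period, period, channels).permute(0, 3, 1, 2)
conv = nn.Conv2d(channels, 8, kernel_size=3, padding=1)
print(conv(x2d).shape)  # torch.Size([1, 8, 7, 24])
```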

NeurIPS Conference 2025 Conference Paper

Dynamic Masking and Auxiliary Hash Learning for Enhanced Cross-Modal Retrieval

  • Shuang Zhang
  • Yue Wu
  • Lei Shi
  • Yingxue Zhang
  • Feifei Kou
  • Huilong Jin
  • Pengfei Zhang
  • Meiyu Liang

The demand for multimodal data processing drives the development of information technology. Cross-modal hash retrieval has attracted considerable attention because it overcomes modality differences while enabling efficient retrieval, and it has shown great application potential in many practical scenarios. However, existing cross-modal hashing methods struggle to fully capture the semantic information of the different modalities, leaving a significant semantic gap between them. Moreover, these methods often ignore differences in channel importance and, constrained by a single training objective, produce hash codes whose matching quality suffers. To address these issues, we propose a Dynamic Masking and Auxiliary Hash Learning (AHLR) method for enhanced cross-modal retrieval. By jointly leveraging dynamic masking and auxiliary hash learning, our approach resolves channel-information imbalance and insufficient capture of key information, thereby significantly improving retrieval accuracy. Specifically, we introduce a dynamic masking mechanism that automatically screens and weights the key information in images and texts during training, improving the accuracy of feature matching. We further construct an auxiliary hash layer that adaptively balances the feature weights across channels, compensating for the deficiencies of traditional methods in key-information capture and channel processing. In addition, we design a contrastive loss function that optimizes hash-code generation and enhances discriminative power, further improving cross-modal retrieval performance. Comprehensive experimental results on the NUS-WIDE, MIRFlickr-25K, and MS-COCO benchmark datasets show that the proposed AHLR algorithm outperforms several existing algorithms.
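
As a hedged sketch of a contrastive objective over relaxed hash codes, here is a generic InfoNCE-style loss in PyTorch, where matched image/text pairs sit on the diagonal of the similarity matrix; AHLR's actual loss may be formulated differently.

```python
import torch
import torch.nn.functional as F

def hash_contrastive_loss(img_codes, txt_codes, temperature=0.1):
    """Toy contrastive loss over tanh-relaxed hash codes: the i-th image
    should match the i-th text more than any other text in the batch."""
    img = F.normalize(torch.tanh(img_codes), dim=1)
    txt = F.normalize(torch.tanh(txt_codes), dim=1)
    logits = img @ txt.t() / temperature
    targets = torch.arange(img.size(0))  # matched pairs on the diagonal
    return F.cross_entropy(logits, targets)

loss = hash_contrastive_loss(torch.randn(8, 64), torch.randn(8, 64))
print(loss.item())
```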

IJCAI Conference 2025 Conference Paper

EVICheck: Evidence-Driven Independent Reasoning and Combined Verification Method for Fact-Checking

  • Lingxiao Wang
  • Lei Shi
  • Feifei Kou
  • Ligu Zhu
  • Chen Ma
  • Pengfei Zhang
  • Mingying Xu
  • Zeyu Li

Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) have demonstrated significant potential for automated fact-checking. However, existing methods suffer from insufficient evidence utilization and a lack of explicit verification criteria. Specifically, these approaches aggregate evidence for collective reasoning without analyzing each piece independently, which prevents them from exploiting the available information thoroughly. Additionally, they rely on simple prompts or few-shot learning for verification, which makes truthfulness judgments less reliable, especially for complex claims. To address these limitations, we propose EVICheck, a novel method that enhances evidence utilization and introduces explicit verification criteria. Our approach reasons over each piece of evidence independently and synthesizes the results, enabling more thorough exploration and greater interpretability. By further incorporating fine-grained truthfulness criteria, we make the model's verification process more structured and reliable, especially when handling complex claims. Experimental results on the public RAWFC dataset demonstrate that EVICheck achieves state-of-the-art performance across all evaluation metrics, showing strong potential for fake-news verification and significantly improving accuracy.
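
The independent-reasoning-then-synthesis structure can be sketched in a few lines of plain Python, assuming a hypothetical judge() callable that wraps an LLM; EVICheck's fine-grained truthfulness criteria would replace the simple majority vote used here.

```python
from collections import Counter

def verify_claim(claim, evidence_list, judge):
    """Stage 1: reason over each piece of evidence independently.
    `judge` is a hypothetical LLM wrapper returning one of
    'supports', 'refutes', or 'neutral' per claim/evidence pair."""
    verdicts = [judge(claim, ev) for ev in evidence_list]
    # Stage 2: synthesize. A majority vote stands in for the paper's
    # structured, criteria-based verification.
    final = Counter(verdicts).most_common(1)[0][0]
    return final, verdicts
```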

AAAI Conference 2025 Conference Paper

IWRN: A Robust Blind Watermarking Method for Artwork Image Copyright Protection Against Noise Attack

  • Feifei Kou
  • Yuhan Yao
  • Siyuan Yao
  • Jiahao Wang
  • Lei Shi
  • Yawen Li
  • Xuejing Kang

Adding imperceptible watermarks to artwork images, such as paintings and photographs, can effectively safeguard their copyright without compromising usability. However, existing blind watermarking techniques face two major challenges in this task: imperceptibility and robustness, particularly under various noise attacks. In this paper, we propose IWRN, a blind watermarking method for artwork image copyright protection that ensures both the Imperceptibility of the Watermark and Robustness against Noise attacks. For imperceptibility, we design a Learnable Wavelet Network (LWN) that adaptively embeds the watermark into the high-frequency region, where it is least visible. For robustness, we establish a Deform-Attention based Invertible Neural Network (DA-INN) with decoding optimization, which offers the advantage of computational reversibility and combines the deform-attention mechanism with decoding optimization to strengthen the model's resistance to noise. Additionally, we design a Joint Contrast Learning (JCL) mechanism to improve imperceptibility and robustness simultaneously. Experiments show that IWRN outperforms other state-of-the-art blind watermarking methods, achieving an average of 41.55 dB PSNR and 99.57% accuracy on the Coco2017, Wikiart, and Div2k datasets under 12 kinds of noise attacks.
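
For intuition about high-frequency embedding, here is a classic (non-learned) wavelet-domain toy using PyWavelets: watermark bits are added as small perturbations to the diagonal detail sub-band. This stands in for, and is far simpler than, the paper's learnable wavelet network.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
image = rng.random((64, 64))
bits = rng.integers(0, 2, size=(32, 32))  # toy watermark payload

# Decompose, perturb the high-frequency diagonal sub-band, reconstruct.
cA, (cH, cV, cD) = pywt.dwt2(image, "haar")
cD_marked = cD + 0.01 * (2 * bits - 1)    # small +/- perturbation per bit
marked = pywt.idwt2((cA, (cH, cV, cD_marked)), "haar")
print(np.abs(marked - image).max())       # change stays imperceptibly small
```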

AAAI Conference 2025 Conference Paper

Leveraging the Dual Capabilities of LLM: LLM-Enhanced Text Mapping Model for Personality Detection

  • Weihong Bi
  • Feifei Kou
  • Lei Shi
  • Yawen Li
  • Haisheng Li
  • Jinpeng Chen
  • Mingying Xu

Personality detection aims to deduce a user’s personality from their published posts; the goal is to map posts to specific personality types. Existing methods encode post information into user vectors, which are then mapped to personality labels. However, they face two main issues. First, using only small models makes it hard to accurately extract semantic features from multiple long documents. Second, the relationship between user vectors and personality labels is not fully considered. To address poor user representation, we exploit the text-embedding capability of LLMs; to address the under-modeled relationship between user vectors and personality labels, we exploit their text-generation capability. We therefore propose the LLM-Enhanced Text Mapping Model (ETM) for personality detection. The model applies the LLM’s text-embedding capability to enhance user vector representations. Additionally, it uses the LLM’s text-generation capability to create multi-perspective interpretations of the labels, which are then used within a contrastive learning framework to strengthen the mapping from user vectors to personality labels. Experimental results show that our model achieves state-of-the-art performance on benchmark datasets.
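
Below is a toy version of mapping an LLM-embedded user vector to labels via multiple generated interpretations per label, scored by cosine similarity; the embeddings here are random placeholders and the max-similarity scoring rule is an assumption, not ETM's exact model.

```python
import numpy as np

def personality_scores(user_vec, label_interp_vecs):
    """Score each label by its best-matching interpretation embedding
    (cosine similarity against the user vector)."""
    u = user_vec / np.linalg.norm(user_vec)
    scores = {}
    for label, vecs in label_interp_vecs.items():
        v = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
        scores[label] = float((v @ u).max())
    return scores

rng = np.random.default_rng(0)
labels = {"INTJ": rng.normal(size=(3, 16)), "ENFP": rng.normal(size=(3, 16))}
print(personality_scores(rng.normal(size=16), labels))
```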

NeurIPS Conference 2025 Conference Paper

OSTAR: Optimized Statistical Text-classifier with Adversarial Resistance

  • Yuhan Yao
  • Feifei Kou
  • Lei Shi
  • Xiao Yang
  • Zhongbao Zhang
  • Suguo Zhu
  • Jiwei Zhang
  • Lirong Qiu

Advances in generative models and real-world attacks involving machine-generated text (MGT) create demand for more robust detection methods. Existing MGT detection methods for adversarial environments consist primarily of manually designed statistical methods and fine-tuned classifiers. Statistical methods extract intrinsic features but suffer from rigid decision boundaries that are vulnerable to adaptive attacks, while fine-tuned classifiers achieve outstanding performance at the cost of overfitting to superficial textual features. We argue that detection in current adversarial environments hinges on extracting intrinsic, invariant features and giving the classifier dynamic adaptability. We therefore propose OSTAR, a novel MGT detection framework for adversarial environments, composed of a statistically enhanced classifier and Multi-Faceted Contrastive Learning (MFCL). On the classifier side, our Multi-Dimensional Statistical Profiling (MDSP) module extracts intrinsic differences between human and machine texts, complementing the classifier with stable, useful features. On the optimization side, the MFCL strategy enhances robustness by contrasting feature variations before and after text attacks, jointly optimizing the statistical feature mapping and the baseline pre-trained model. Experimental results on three public datasets under various adversarial scenarios demonstrate that our framework outperforms existing MGT detection methods, achieving state-of-the-art performance and robustness against attacks. The code is available at https://github.com/BUPT-SN/OSTAR.
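
As an illustration of statistical profiling, here are a few generic text statistics in the spirit of MDSP; the paper's actual feature set is richer and is not reproduced here.

```python
import string

def statistical_profile(text: str) -> dict:
    """A handful of simple, model-free text statistics of the kind a
    statistical profiler might feed to a classifier."""
    words = text.split()
    return {
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "type_token_ratio": len({w.lower() for w in words}) / max(len(words), 1),
        "punct_rate": sum(c in string.punctuation for c in text) / max(len(text), 1),
    }

print(statistical_profile("The quick brown fox jumps over the lazy dog."))
```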

NeurIPS Conference 2024 Conference Paper

An End-To-End Graph Attention Network Hashing for Cross-Modal Retrieval

  • Huilong Jin
  • Yingxue Zhang
  • Lei Shi
  • Shuang Zhang
  • Feifei Kou
  • Jiapeng Yang
  • Chuangying Zhu
  • Jia Luo

Due to its low storage cost and fast search speed, hashing-based cross-modal retrieval has attracted widespread attention and is widely used in real-world social media search. However, most existing hashing methods are limited by incomplete feature representations and semantic associations, which greatly restricts their performance and applicability in practice. To deal with this challenge, in this paper we propose an end-to-end graph attention network hashing (EGATH) method for cross-modal retrieval, which not only captures direct semantic associations between images and texts but also matches semantic content across modalities. We combine contrastive language-image pretraining (CLIP) with a Transformer to improve understanding and generalization of semantic consistency across data modalities. A classifier based on a graph attention network produces predicted labels that enhance the cross-modal feature representation. We construct hash codes using an optimization strategy and a loss function that preserve the semantic information and compactness of the codes. Comprehensive experiments on the NUS-WIDE, MIRFlickr25K, and MS-COCO benchmark datasets show that EGATH significantly outperforms several state-of-the-art methods.
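
For background, here is a minimal single-head graph attention layer of the kind such classifiers build on; this is the standard GAT formulation, not EGATH's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Minimal single-head graph attention layer: attention scores come
    from a learned function of each concatenated node pair, masked to
    the graph's edges and normalized over neighbors."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, h, adj):
        z = self.W(h)                                # (N, out_dim)
        n = z.size(0)
        pairs = torch.cat([z.unsqueeze(1).expand(n, n, -1),
                           z.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))  # (N, N) raw scores
        e = e.masked_fill(adj == 0, float("-inf"))   # keep real edges only
        alpha = torch.softmax(e, dim=1)              # attention per neighbor
        return alpha @ z

layer = GATLayer(8, 16)
print(layer(torch.randn(5, 8), torch.eye(5)).shape)  # torch.Size([5, 16])
```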

TIST Journal 2021 Journal Article

MVGAN: Multi-View Graph Attention Network for Social Event Detection

  • Wanqiu Cui
  • Junping Du
  • Dawei Wang
  • Feifei Kou
  • Zhe Xue

Social networks are critical sources for event detection thanks to their publicity and rapid dissemination. Unfortunately, the randomness and semantic sparsity of social network text pose significant challenges for the event detection task. Besides text, time is another vital element for characterizing events, since events are typically discussed over a period of time. Therefore, in this article we propose a novel method named Multi-View Graph Attention Network (MVGAN) for event detection in social networks. It enriches event semantics through both neighbor aggregation and multi-view fusion in a heterogeneous social event graph. Specifically, we first construct a heterogeneous graph by adding hashtags to associate otherwise isolated short texts and describe events comprehensively. Then, we learn view-specific representations of events through graph convolutional networks, from the perspectives of text semantics and temporal distribution respectively. Finally, we design a hashtag-based multi-view graph attention mechanism that captures the intrinsic interaction across views and integrates the feature representations to discover events. Extensive experiments on public benchmark datasets demonstrate that MVGAN performs favorably against many state-of-the-art social network event detection algorithms. They also show that additional meaningful signals, such as publication time and hashtags, can improve event detection in social networks.
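
The hashtag trick, connecting otherwise isolated short posts through shared tags, can be sketched as a simple graph construction; the data format below is a made-up example, not MVGAN's actual pipeline.

```python
from collections import defaultdict

def build_hashtag_graph(posts):
    """Link any two posts that share a hashtag, giving sparse short
    texts a neighborhood to aggregate over."""
    tag_to_posts = defaultdict(set)
    for pid, text in posts.items():
        for token in text.split():
            if token.startswith("#"):
                tag_to_posts[token.lower()].add(pid)
    edges = set()
    for pids in tag_to_posts.values():
        pids = sorted(pids)
        edges.update((a, b) for i, a in enumerate(pids) for b in pids[i + 1:])
    return edges

posts = {1: "flood downtown #storm", 2: "power outage #storm", 3: "new cafe #food"}
print(build_hashtag_graph(posts))  # {(1, 2)}
```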

TIST Journal 2019 Journal Article

Short Text Analysis Based on Dual Semantic Extension and Deep Hashing in Microblog

  • Wanqiu Cui
  • Junping Du
  • Dawei Wang
  • Xunpu Yuan
  • Feifei Kou
  • Liyan Zhou
  • Nan Zhou

Short text analysis is a challenging task owing to semantic sparsity and limited context. Semantic extension approaches learn the meaning of a short text by introducing external knowledge. However, because short text descriptions in microblogs are highly variable, traditional extension methods cannot accurately mine semantics suited to the microblog theme. We therefore use the prominent, refined hashtag information in microblogs, together with complex social relationships, to provide implicit guidance for the semantic extension of short texts. Specifically, we design a deep hash model based on social and conceptual semantic extension, which consists of dual semantic extension and deep hashing representation. In the extension step, the short text is first conceptualized to construct a hashtag graph in concept space; associated hashtags are then generated by correlation calculations that integrate social relationships and concepts, extending the short text. In the deep hash model, we use semantic hashing to encode the enriched semantic features into compact, meaningful binary codes. Finally, extensive experiments demonstrate that our method learns and represents short texts well by exploiting these more meaningful semantic signals, effectively enhancing and guiding the semantic analysis and understanding of short texts in microblogs.
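
Finally, the semantic hashing step, turning real-valued vectors into compact binary codes for Hamming-distance retrieval, can be illustrated with generic median-threshold binarization; this is a standard baseline, not the paper's exact model.

```python
import numpy as np

def to_hash_codes(embeddings):
    """Binarize each dimension against its per-dimension median, so
    roughly half the corpus gets a 1 in every bit position."""
    return (embeddings > np.median(embeddings, axis=0)).astype(np.uint8)

def hamming_search(query_code, codes):
    """Rank documents by Hamming distance, nearest first."""
    return np.argsort((codes != query_code).sum(axis=1))

rng = np.random.default_rng(0)
codes = to_hash_codes(rng.normal(size=(100, 32)))
print(hamming_search(codes[0], codes)[:5])  # doc 0 should rank first
```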