Arrow Research search

Author name cluster

Weifeng Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers
1 author row

Possible papers

AAAI Conference 2026 Conference Paper

HCC-3D: Hierarchical Compensatory Compression for 98% 3D Token Reduction in Vision-Language Models

  • Liheng Zhang
  • Jin Wang
  • Hui Li
  • Bingfeng Zhang
  • Weifeng Liu

3D understanding has drawn significant attention recently, leveraging Vision-Language Models (VLMs) to enable multi-modal reasoning between point cloud and text data. Current 3D-VLMs directly embed the 3D point clouds into 3D tokens, following large 2D-VLMs with powerful reasoning capabilities. However, this framework incurs a high computational cost that limits its application, and we identify that the bottleneck lies in processing all 3D tokens in the Large Language Model (LLM) part. This raises the question: how can we reduce the computational overhead introduced by 3D tokens while preserving the integrity of their essential information? To address this question, we introduce Hierarchical Compensatory Compression (HCC-3D) to efficiently compress 3D tokens while retaining critical details. Specifically, we first propose a global structure compression (GSC), in which we design global queries to compress all 3D tokens into a few key tokens while keeping overall structural information. Then, to compensate for the information loss in GSC, we further propose an adaptive detail mining (ADM) module that selectively recompresses salient but under-attended features through complementary scoring. Extensive experiments demonstrate that HCC-3D not only achieves extreme compression ratios (approximately 98%) compared to previous 3D VLMs, but also achieves new state-of-the-art performance, showing great improvements in both efficiency and performance.
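The idea of compressing many tokens into a few with global queries can be illustrated with plain cross-attention pooling. The sketch below is not the HCC-3D implementation: it omits learned projections and the ADM stage, and the function names `compress` and `reduction`, the token counts (500 in, 10 out), and the feature dimension are assumptions chosen only so that the arithmetic lands on a 98% reduction.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def compress(tokens, queries):
    """Hypothetical query-based pooling: each query attends over all
    tokens and emits one attention-weighted average of them."""
    d = len(tokens[0])
    pooled = []
    for q in queries:
        # Scaled dot-product score between this query and every token.
        scores = [sum(qi * ti for qi, ti in zip(q, t)) / math.sqrt(d) for t in tokens]
        weights = softmax(scores)
        pooled.append([sum(w * t[k] for w, t in zip(weights, tokens)) for k in range(d)])
    return pooled


def reduction(n_in, n_out):
    """Fraction of tokens removed by the compression."""
    return 1.0 - n_out / n_in


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed sizes: 500 input tokens, 10 queries, 4-dimensional features.
    tokens = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(500)]
    queries = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]
    out = compress(tokens, queries)
    print(len(tokens), "->", len(out), "tokens;", f"reduction = {reduction(500, 10):.0%}")
```

Because each output is a convex combination of the inputs, the language model downstream would see 10 vectors instead of 500, which is where the cost saving would come from under these assumptions.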

AAAI Conference 2025 Conference Paper

Excluding the Impossible for Open Vocabulary Semantic Segmentation

  • Shiyuan Zhao
  • Baodi Liu
  • Yu Bai
  • Weifeng Liu
  • Shuai Shao

Open vocabulary semantic segmentation is a hot topic in research, focusing on segmenting and recognizing a diverse array of categories in varied environments, including those previously unknown, thereby holding significant practical value. Mainstream studies utilize the CLIP model for direct semantic segmentation (denoted as “forward methods”), which often struggles to represent underrepresented categories effectively. To address this issue, this paper introduces a novel approach, the Excluding the ImpossibLe Semantic Segmentation Network (ELSE-Net), based on reverse thinking. By excluding improbable categories, ELSE-Net narrows the selection range for forward methods, significantly reducing the risk of misclassification. In implementation, we initially draw on leading research to design the General Processing Block (GP-Block), which generates inclusion probabilities (the likelihood of belonging to a category) by using the CLIP model in cooperation with a Mask Proposal Network (MPN). We then present the EXcluding the ImPossible Block (EXP-Block), which computes exclusion probabilities (the likelihood of not belonging to a category) through the CLIPN model and a custom-designed Reverse Retrieval Adapter (R2-Adapter). These exclusion probabilities are subsequently used to refine the inclusion probabilities, which are ultimately employed to annotate class-agnostic masks. Moreover, the core component of our EXP-Block is model-agnostic, enabling it to enhance the capabilities of existing frameworks. Experimental results on four benchmark datasets validate the effectiveness of ELSE-Net and underscore the seamless model-agnostic functionality of the EXP-Block.

AAAI Conference 2025 Conference Paper

Modeling All Response Surfaces in One for Conditional Search Spaces

  • Jiaxing Li
  • Wei Liu
  • Chao Xue
  • Yibing Zhan
  • Xiaoxing Wang
  • Weifeng Liu
  • Dacheng Tao

Bayesian Optimization (BO) is a sample-efficient black-box optimizer commonly used in search spaces where hyperparameters are independent. However, in many practical AutoML scenarios, there are dependencies among hyperparameters, forming a conditional search space, which can be partitioned into structurally distinct subspaces. The structure and dimensionality of hyperparameter configurations vary across these subspaces, challenging the application of BO. Some previous BO works have proposed solutions to develop multiple Gaussian Process models in these subspaces. However, these approaches tend to be inefficient as they require a substantial number of observations to guarantee each GP's performance and cannot capture relationships between hyperparameters across different subspaces. To address these issues, this paper proposes a novel approach to model the response surfaces of all subspaces in one, which can model the relationships between hyperparameters elegantly via a self-attention mechanism. Concretely, we design a structure-aware hyperparameter embedding to preserve the structural information. Then, we introduce an attention-based deep feature extractor, capable of projecting configurations with different structures from various subspaces into a unified feature space, where the response surfaces can be formulated using a single standard Gaussian Process. The empirical results on a simulation function, various real-world tasks, and the HPO-B benchmark demonstrate that our proposed approach improves the efficacy and efficiency of BO within conditional search spaces.

AAAI Conference 2025 Conference Paper

Transfer Learning of Real Image Features with Soft Contrastive Loss for Fake Image Detection

  • Ziyou Liang
  • Weifeng Liu
  • Run Wang
  • Mengjie Wu
  • Boheng Li
  • Yuyang Zhang
  • Lina Wang
  • Xinyi Yang

In the last few years, the artifact patterns in fake images synthesized by different generative models have been inconsistent, leading to the failure of previous research that relied on spotting subtle differences between real and fake. In our preliminary experiments, we find that the artifacts in fake images always change with the development of the generative model, while natural images exhibit stable statistical properties. In this paper, we employ natural traces shared only by real images as an additional target for a classifier. Specifically, we introduce a self-supervised feature mapping process for natural trace extraction and develop a transfer learning method based on soft contrastive loss to bring them closer to real images and further away from fake ones. This motivates the detector to make decisions based on the proximity of images to the natural traces. To conduct a comprehensive experiment, we built a high-quality and diverse dataset that includes generative models comprising GANs and diffusion models, to evaluate the effectiveness in generalizing to unknown forgery techniques and robustness in surviving different transformations. Experimental results show that our proposed method achieves 96.2% mAP, significantly outperforming the baselines. Extensive experiments conducted on the widely recognized platform Midjourney reveal that our proposed method achieves an accuracy exceeding 78.4%, underscoring its practicality for real-world application deployment.

AAAI Conference 2024 Conference Paper

A Dual Stealthy Backdoor: From Both Spatial and Frequency Perspectives

  • Yudong Gao
  • Honglong Chen
  • Peng Sun
  • Junjian Li
  • Anqing Zhang
  • Zhibo Wang
  • Weifeng Liu

Backdoor attacks pose serious security threats to deep neural networks (DNNs). Backdoored models make arbitrary (targeted) incorrect predictions on inputs containing well-designed triggers, while behaving normally on clean inputs. Prior studies have explored the invisibility of backdoor triggers to enhance attack stealthiness. However, most of them only focus on the invisibility in the spatial domain, neglecting the generation of invisible triggers in the frequency domain. This limitation renders the generated poisoned images easily detectable by recent defense methods. To address this issue, we propose a DUal stealthy BAckdoor attack method named DUBA, which simultaneously considers the invisibility of triggers in both the spatial and frequency domains, to achieve desirable attack performance, while ensuring strong stealthiness. Specifically, we first use Wavelet Transform to embed the high-frequency information of the trigger image into the clean image to ensure attack effectiveness. Then, to attain strong stealthiness, we incorporate Fourier Transform and Cosine Transform to mix the poisoned image and clean image in the frequency domain. Moreover, DUBA adopts a novel attack strategy, training the model with weak triggers and attacking with strong triggers to further enhance attack performance and stealthiness. DUBA is evaluated extensively on four datasets against popular image classifiers, showing significant superiority over state-of-the-art backdoor attacks in attack success rate and stealthiness.

NeurIPS Conference 2024 Conference Paper

Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Lip-Syncing DeepFakes

  • Weifeng Liu
  • Tianyi She
  • Jiawei Liu
  • Boheng Li
  • Dongyu Yao
  • Ziyou Liang
  • Run Wang

In recent years, DeepFake technology has achieved unprecedented success in high-quality video synthesis, but these methods also pose potential and severe security threats to humanity. DeepFake can be bifurcated into entertainment applications like face swapping and illicit uses such as lip-syncing fraud. However, lip-forgery videos, which neither change identity nor have discernible visual artifacts, present a formidable challenge to existing DeepFake detection methods. Our preliminary experiments have shown that the effectiveness of the existing methods often drastically decreases or even fails when tackling lip-syncing videos. In this paper, for the first time, we propose a novel approach dedicated to lip-forgery identification that exploits the inconsistency between lip movements and audio signals. We also mimic human natural cognition by capturing subtle biological links between lips and head regions to boost accuracy. To better illustrate the effectiveness and advances of our proposed method, we create a high-quality LipSync dataset, AVLips, by employing the state-of-the-art lip generators. We hope this high-quality and diverse dataset can serve further research in this challenging and interesting field. Experimental results show that our approach gives an average accuracy of more than 95.3% in spotting lip-syncing videos, significantly outperforming the baselines. Extensive experiments demonstrate the capability to tackle deepfakes and the robustness in surviving diverse input transformations. Our method achieves an accuracy of up to 90.2% in real-world scenarios (e.g., WeChat video call) and shows its powerful capabilities in real scenario deployment. To facilitate the progress of this research community, we release all resources at https://github.com/AaronComo/LipFD.

IJCAI Conference 2023 Conference Paper

Annealing Genetic-based Preposition Substitution for Text Rubbish Example Generation

  • Chen Li
  • Xinghao Yang
  • Baodi Liu
  • Weifeng Liu
  • Honglong Chen

Modern Natural Language Processing (NLP) models expose under-sensitivity towards text rubbish examples. The text rubbish example is the heavily modified input text which is nonsensical to humans but does not change the model’s prediction. Prior work crafts rubbish examples by iteratively deleting words and determining the deletion order with beam search. However, the produced rubbish examples usually cause a reduction in model confidence and sometimes deliver human-readable text. To address these problems, we propose an Annealing Genetic based Preposition Substitution (AGPS) algorithm for text rubbish sample generation with two major merits. Firstly, the AGPS crafts rubbish text examples by substituting input words with meaningless prepositions instead of directly removing them, which brings less degradation to the model’s confidence. Secondly, we design an Annealing Genetic algorithm to optimize the word replacement priority, which allows the Genetic Algorithm (GA) to jump out of local optima with some probability. This is significant in achieving better objectives, i.e., a high word modification rate and a high model confidence. Experimental results on five popular datasets manifest the superiority of AGPS compared with the baseline and expose the fact that NLP models cannot really understand the semantics of sentences, as they give the same prediction with even higher confidence for the nonsensical preposition sequences.

IJCAI Conference 2021 Conference Paper

BESA: BERT-based Simulated Annealing for Adversarial Text Attacks

  • Xinghao Yang
  • Weifeng Liu
  • Dacheng Tao
  • Wei Liu

Modern Natural Language Processing (NLP) models are known to be immensely brittle towards text adversarial examples. Recent attack algorithms usually adopt word-level substitution strategies following a pre-computed word replacement mechanism. However, their resultant adversarial examples are still imperfect in achieving grammar correctness and semantic similarities, which is largely because of their unsuitable candidate word selections and static optimization methods. In this research, we propose BESA, a BERT-based Simulated Annealing algorithm, to address these two problems. Firstly, we leverage the BERT Masked Language Model (MLM) to generate context-aware candidate words to produce fluent adversarial text and avoid grammar errors. Secondly, we employ Simulated Annealing (SA) to adaptively determine the word substitution order. The SA provides sufficient word replacement options via internal simulations, with an objective to obtain both a high attack success rate and a low word substitution rate. Besides, our algorithm is able to jump out of local optima with a controlled probability, bringing it closer to achieving the best possible attack (i.e., the global optimum). Experiments on five popular datasets manifest the superiority of BESA compared with existing methods, including TextFooler, BAE, BERT-Attack, PWWS, and PSO.
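The "controlled probability" of escaping local optima mentioned in this abstract is, in textbook simulated annealing, the Metropolis acceptance rule: a worse candidate is accepted with probability exp(-Δ/T), where T is a temperature that is lowered over time. The sketch below shows that generic rule on a toy integer problem. It is not BESA's code; the names `accept` and `anneal`, the geometric cooling schedule, and the toy cost function are all assumptions made for illustration.

```python
import math
import random


def accept(delta, temperature, rng):
    """Metropolis rule: always take improvements (delta <= 0); take a
    worse move with probability exp(-delta / temperature)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def anneal(cost, neighbor, start, t0=1.0, cooling=0.95, steps=200, seed=0):
    """Generic simulated annealing loop with geometric cooling (assumed
    schedule). Returns the best state seen."""
    rng = random.Random(seed)
    current = best = start
    t = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        if accept(cost(candidate) - cost(current), t, rng):
            current = candidate
            if cost(current) < cost(best):
                best = current
        t *= cooling  # lower the temperature, making uphill moves rarer
    return best


if __name__ == "__main__":
    # Toy landscape: odd integers carry a penalty, so every even integer
    # other than 8 is a local minimum with respect to +/-1 moves.
    toy_cost = lambda x: abs(x - 8) + (3 if x % 2 else 0)
    step = lambda x, rng: x + rng.choice([-1, 1])
    print(anneal(toy_cost, step, start=0))
```

At high temperature the walker can step across the penalized odd integers; as the temperature falls, the rule approaches pure greedy descent.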

AAAI Conference 2021 Conference Paper

Bigram and Unigram Based Text Attack via Adaptive Monotonic Heuristic Search

  • Xinghao Yang
  • Weifeng Liu
  • James Bailey
  • Dacheng Tao
  • Wei Liu

Deep neural networks (DNNs) are known to be vulnerable to adversarial images, while their robustness in text classification is rarely studied. Several lines of text attack methods have been proposed in the literature, such as character-level, word-level, and sentence-level attacks. However, it is still a challenge to minimize the number of word distortions necessary to induce misclassification, while simultaneously ensuring the lexical correctness, syntactic correctness, and semantic similarity. In this paper, we propose the Bigram and Unigram based Monotonic Heuristic Search (BU-MHS) method to examine the vulnerability of deep models. Our method has three major merits. Firstly, we propose to attack text documents not only at the unigram word level but also at the bigram level to avoid producing meaningless outputs. Secondly, we propose a hybrid method to replace the input words with both their synonyms and sememe candidates, which greatly enriches potential substitutions compared to only using synonyms. Lastly, we design a search algorithm, i.e., Monotonic Heuristic Search (MHS), to determine the priority of word replacements, aiming to reduce the modification cost in an adversarial attack. We evaluate the effectiveness of BU-MHS on IMDB, AG’s News, and Yahoo! Answers text datasets by attacking four popular DNN models. Results show that our BU-MHS achieves the highest attack success rate by changing the smallest number of words compared with baselines.

TIST Journal 2021 Journal Article

MKEL: Multiple Kernel Ensemble Learning via Unified Ensemble Loss for Image Classification

  • Xiangjun Shen
  • Kou Lu
  • Sumet Mehta
  • Jianming Zhang
  • Weifeng Liu
  • Jianping Fan
  • Zhengjun Zha

In this article, a novel ensemble model, called Multiple Kernel Ensemble Learning (MKEL), is developed by introducing a unified ensemble loss. Different from the previous multiple kernel learning (MKL) methods, which attempt to seek a linear combination of basis kernels as a unified kernel, our MKEL model aims to find multiple solutions in corresponding Reproducing Kernel Hilbert Spaces (RKHSs) simultaneously. To achieve this goal, multiple individual kernel losses are integrated into a unified ensemble loss. Therefore, each model can co-optimize to learn its optimal parameters by minimizing a unified ensemble loss in multiple RKHSs. Furthermore, we apply our proposed ensemble loss to the deep network paradigm and take the sub-network as a kernel mapping from the original input space into a feature space, named Deep-MKEL (D-MKEL). Our D-MKEL model can combine the diversified deep individual sub-networks into a whole unified network to improve the classification performance. With this unified loss design, our D-MKEL model can make our network much wider than other traditional deep kernel networks, so that more parameters are learned and optimized. Experimental results on several medium-sized UCI classification and computer vision datasets demonstrate that our MKEL model can achieve the best classification performance among comparative MKL methods, such as Simple MKL, GMKL, Spicy MKL, and Matrix-Regularized MKL. In addition, experimental results on large-scale CIFAR-10 and SVHN datasets concretely show the advantages and potentialities of the proposed D-MKEL approach compared to state-of-the-art deep kernel methods.