Arrow Research search

Author name cluster

Qiufeng Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers
1 author row

Possible papers

16

AAAI Conference 2026 Conference Paper

Adaptive-Learngene: Continual Expansion and Task-Aware Selection of Learngenes for Dynamic Environments

  • Shuxia Lin
  • Qiufeng Wang
  • Chang Liu
  • Xu Yang
  • Xin Geng

Pre-trained Vision Transformer (ViT) models have achieved impressive performance across various computer vision tasks. However, most existing pre-trained models are built on fixed datasets and lack the flexibility to incorporate new pre-training data. When additional data becomes available, previous models must typically be retrained on both old and new data, which is costly and impractical, especially in privacy-sensitive or resource-constrained environments. Moreover, direct fine-tuning on downstream tasks does not provide mechanisms to adapt to the specific data distributions of those tasks, and it only supports fixed model sizes. To address these challenges, we propose Adaptive-Learngene, a novel framework in which the ancestry model is trained solely on newly available data, and a new component, termed a learngene, is extracted and added to a global learngene pool that expands incrementally. This design enables a dynamically evolving pool of learngenes without requiring access to previous data. For each new downstream task, the Task-Adaptive Learngene Selector (TALS) retrieves a sparse combination of learngenes that best match the data distribution of the target task. TALS requires only a small amount of downstream data for this selection, enabling descendant models of different sizes to be efficiently initialized and tailored to specific data distributions and resource constraints. Extensive experiments on diverse downstream tasks demonstrate that our method matches or outperforms existing approaches while offering superior scalability, adaptability, and efficiency in dynamic learning environments.
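
The abstract does not specify how TALS scores learngenes, so as a loose illustration only: retrieving a sparse combination for a new task could look like scoring each pool entry against a small-sample task embedding and keeping a softmax-weighted top-k. All names and the cosine-similarity scoring here are hypothetical, not the authors' method.

```python
import numpy as np

def select_learngenes(pool_embeds, task_embed, k=2):
    """Hypothetical sketch of a task-adaptive selector: rank learngenes in the
    pool by cosine similarity to a task embedding, return a sparse top-k
    combination as (indices, softmax weights)."""
    pool = pool_embeds / np.linalg.norm(pool_embeds, axis=1, keepdims=True)
    task = task_embed / np.linalg.norm(task_embed)
    scores = pool @ task                          # cosine similarity per learngene
    top = np.argsort(scores)[::-1][:k]            # sparse: keep only k entries
    w = np.exp(scores[top])
    return top, w / w.sum()                       # normalized combination weights
```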

NeurIPS Conference 2025 Conference Paper

Can MLLMs Absorb Math Reasoning Abilities from LLMs as Free Lunch?

  • Yijie Hu
  • Zihao Zhou
  • Kaizhu Huang
  • Xiaowei Huang
  • Qiufeng Wang

Math reasoning is a crucial ability of large language models (LLMs), where significant advancements have been achieved in recent years. However, most efforts focus on LLMs by curating high-quality annotation data and intricate training (or inference) paradigms, while the math reasoning performance of multi-modal LLMs (MLLMs) still lags behind. Since an MLLM typically consists of an LLM and a vision block, we wonder: can MLLMs directly absorb math reasoning abilities from off-the-shelf math LLMs without tuning? Recent model-merging approaches may offer insights into this question. However, they overlook the alignment between the MLLM and the LLM, where we find a large gap between their parameter spaces that results in lower performance. Our empirical evidence reveals two key factors behind this issue: the identification of crucial reasoning-associated layers in the model and the mitigation of the gaps in parameter space. Based on these empirical insights, we propose IP-Merging, which first Identifies the reasoning-associated parameters in both the MLLM and the math LLM, then Projects them into the subspace of the MLLM to maintain alignment, and finally merges the parameters in this subspace. IP-Merging is a tuning-free approach since parameters are adjusted directly. Extensive experiments demonstrate that IP-Merging can enhance the math reasoning ability of MLLMs directly from math LLMs without compromising their other capabilities.
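
The abstract does not give IP-Merging's exact projection, so the following is only a minimal sketch of the general idea: merge a math-LLM weight matrix into the MLLM's own low-rank subspace, so the update stays aligned with the MLLM's parameter space. The function name, rank, and merge coefficient are illustrative assumptions.

```python
import numpy as np

def subspace_merge(w_mllm, w_math, rank=8, alpha=0.5):
    """Illustrative subspace merge: project the parameter gap onto the
    top-`rank` singular subspace of the MLLM weights before merging."""
    delta = w_math - w_mllm                       # parameter-space gap
    u, s, vt = np.linalg.svd(w_mllm, full_matrices=False)
    u_k, vt_k = u[:, :rank], vt[:rank, :]         # the MLLM's dominant subspace
    # keep only the component of the gap that lies in that subspace
    delta_proj = u_k @ (u_k.T @ delta @ vt_k.T) @ vt_k
    return w_mllm + alpha * delta_proj            # tuning-free direct adjustment
```

With `alpha=0` the MLLM is returned unchanged, which makes the merge strength explicit and easy to ablate.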

AAAI Conference 2025 Conference Paper

GNS: Solving Plane Geometry Problems by Neural-Symbolic Reasoning with Multi-Modal LLMs

  • Maizhen Ning
  • Zihao Zhou
  • Qiufeng Wang
  • Xiaowei Huang
  • Kaizhu Huang

With the outstanding capabilities of Large Language Models (LLMs), solving math word problems (MWPs) has progressed greatly, achieving higher performance on several benchmark datasets. However, solving plane geometry problems (PGPs) is more challenging because it requires understanding, reasoning, and computation over two modalities, geometry diagrams and textual questions, where Multi-Modal Large Language Models (MLLMs) have not been extensively explored. Previous works simply regarded a plane geometry problem as a multi-modal QA task, which ignored the importance of explicitly parsing geometric elements from problems. To tackle this limitation, we propose to solve plane Geometry problems by Neural-Symbolic reasoning with MLLMs (GNS). We first leverage an MLLM to understand PGPs through knowledge prediction and symbolic parsing, then perform mathematical reasoning to obtain solutions, and finally adopt a symbolic solver to compute answers. Correspondingly, we introduce the largest PGPs dataset, GNS-260K, with multiple annotations covering symbolic parsing, understanding, reasoning, and computation. In experiments, our Phi3-Vision-based MLLM wins first place on the PGP-solving task of the MathVista benchmark, outperforming GPT-4o, Gemini Ultra, and other much larger MLLMs, while our LLaVA-13B-based MLLM markedly exceeds other closed-source and open-source MLLMs on the MathVerse benchmark and also achieves a new SOTA on the GeoQA dataset.

AAAI Conference 2025 Conference Paper

Inheriting Generalized Learngene for Efficient Knowledge Transfer across Multiple Tasks

  • Yuankun Zu
  • Shiyu Xia
  • Xu Yang
  • Qiufeng Wang
  • Han Zhang
  • Xin Geng

In practical applications, it is often necessary to transfer knowledge from large pre-trained models to small models with various architectures for tackling different tasks. The recently proposed Learngene framework first extracts a compact module, termed a learngene, from a large well-trained model, and then uses the learngene to build descendant models for handling diverse tasks. In this paper, we aim to extract and inherit a learngene that generalizes across different model architectures and tasks, which remains understudied in previous works. Inspired by existing observations that large-kernel convolutional neural networks (CNNs) exhibit significant generalization potential across various architectures and tasks, we propose a novel two-stage Learngene method termed CLKG (Convolutional Learngene for Knowledge Generalization), which inherits convolutional kernels containing generalized knowledge as the learngene to build diverse models for multiple tasks. Specifically, we construct an auxiliary model composed of small kernels and train it through dense feature distillation to inherit the feature extraction ability of large-kernel CNNs. After distillation, we select certain kernels from the auxiliary model as the learngene based on three criteria: direct kernel extraction, priority to edge kernels, and continuous kernel selection. Subsequently, we adapt the learngene to the width of each descendant model and use it to initialize the backbone of descendant models. Experiments on diverse vision tasks such as image classification, object detection, and semantic segmentation demonstrate the superiority of CLKG. For example, compared with training from scratch, it brings a 2.89% improvement on VOC12+SBD while requiring roughly half the training data and training epochs to achieve better results. Furthermore, compared to knowledge distillation, CLKG significantly reduces negative transfer on certain datasets, e.g., achieving a 1.88% performance improvement on the NAO dataset despite domain differences.

NeurIPS Conference 2025 Conference Paper

Predictable Scale (Part II) --- Farseer: A Refined Scaling Law in LLMs

  • Houyi Li
  • Wenzhen Zheng
  • Qiufeng Wang
  • Zhenyu Ding
  • Haoying Wang
  • Zili Wang
  • Shijie Xuyang
  • Ning Ding

Training Large Language Models (LLMs) is prohibitively expensive, creating a critical scaling gap where insights from small-scale experiments often fail to transfer to resource-intensive production systems, thereby hindering efficient innovation. To bridge this, we introduce Farseer, a novel and refined scaling law offering enhanced predictive accuracy across scales. By systematically constructing a model loss surface $L(N, D)$, Farseer achieves a significantly better fit to empirical data than prior laws (e.g., Chinchilla's law). Our methodology yields accurate, robust, and highly generalizable predictions, demonstrating excellent extrapolation capabilities, outperforming Chinchilla's law, whose extrapolation error is 433\% higher. This allows for the reliable evaluation of competing training strategies across all $(N, D)$ settings, enabling conclusions from small-scale ablation studies to be confidently extrapolated to predict large-scale performance. Furthermore, Farseer provides new insights into optimal compute allocation, better reflecting the nuanced demands of modern LLM training. To validate our approach, we trained an extensive suite of approximately 1,000 LLMs across diverse scales and configurations, consuming roughly 3 million NVIDIA H100 GPU hours. To foster further research, we are comprehensively open-sourcing all code, data, and results (https://github.com/Farseer-Scaling-Law/Farseer), all training logs (https://wandb.ai/billzid/Farseer?nw=nwuserbillzid), and all models used in scaling law fitting (https://huggingface.co/Farseer-Scaling-Law).
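
Farseer's functional form is not stated in the abstract, so the sketch below only illustrates what fitting a loss surface $L(N, D)$ to empirical runs looks like, using the Chinchilla-style parameterization $L = E + A/N^{\alpha} + B/D^{\beta}$ on synthetic data. The parametric form, constants, and run grid are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def loss_surface(x, E, A, B, alpha, beta):
    """Chinchilla-style loss surface over params N and tokens D
    (illustrative stand-in for a refined law like Farseer)."""
    N, D = x
    return E + A / N**alpha + B / D**beta

# synthetic "training runs": model size N, token count D, observed final loss
N = np.array([1e7, 1e8, 1e9, 1e7, 1e8, 1e9, 1e10, 1e10])
D = np.array([1e9, 1e9, 1e9, 1e10, 1e10, 1e10, 1e10, 1e11])
L = loss_surface((N, D), 1.7, 400.0, 1100.0, 0.34, 0.28)

# fit the law to the runs; small-scale fits can then be extrapolated
popt, _ = curve_fit(loss_surface, (N, D), L,
                    p0=[2.0, 100.0, 100.0, 0.3, 0.3], maxfev=20000)
```

The same fit-then-extrapolate loop is what lets small-scale ablations predict large-scale $(N, D)$ settings; a refined law differs in how well that extrapolation holds.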

AAAI Conference 2025 Conference Paper

Template-Driven LLM-Paraphrased Framework for Tabular Math Word Problem Generation

  • Xiaoqiang Kang
  • Zimu Wang
  • Xiaobo Jin
  • Wei Wang
  • Kaizhu Huang
  • Qiufeng Wang

Solving tabular math word problems (TMWPs) plays a critical role in evaluating the mathematical reasoning ability of large language models (LLMs), where large-scale TMWP samples are commonly required for fine-tuning. Since collecting high-quality TMWP datasets is costly and time-consuming, recent research has concentrated on automatic TMWP generation. However, currently generated samples usually suffer from issues of either correctness or diversity. In this paper, we propose a Template-driven LLM-paraphrased (TeLL) framework for generating high-quality TMWP samples with diverse backgrounds and accurate tables, questions, answers, and solutions. To this end, we first extract templates from existing real samples to generate initial problems, ensuring correctness. Then, we adopt an LLM to extend templates and paraphrase problems, obtaining diverse TMWP samples. Furthermore, we find that reasoning annotation is important for solving TMWPs, so we propose to enrich each solution with illustrative reasoning steps. With the proposed framework, we construct a high-quality dataset, TabMWP-TeLL, by adhering to the question types in the TabMWP dataset, and we conduct extensive experiments on a variety of LLMs to demonstrate the effectiveness of TabMWP-TeLL in improving TMWP-solving performance.

AAAI Conference 2025 Conference Paper

Towards Better Robustness Against Natural Corruptions in Document Tampering Localization

  • Huiru Shao
  • Kaizhu Huang
  • Wei Wang
  • Xiaowei Huang
  • Qiufeng Wang

Marvelous advances have been exhibited in recent document tampering localization (DTL) systems. However, when confronted with corrupted tampered document images, their vulnerability is fatal in real-world scenarios. While robustness against adversarial attacks has been extensively studied through adversarial training (AT), robustness to natural corruptions remains under-explored for DTL. In this paper, to overcome forensic dependency, we propose adversarial forensic regularization (AFR) based on min-max optimization to improve robustness. Specifically, we adopt mutual information (MI) to represent the forensic dependency between two random variables over the tampered and authentic pixel spaces, where the MI can be approximated by the Jensen-Shannon divergence (JSD) with empirical sampling. To further enable a trade-off between predictive representations on clean tampered document pixels and robust ones on corrupted pixels, an additional regularization term is formulated with the divergence between the clean and perturbed pixel distributions (DDR). Following the min-max optimization framework, our method also works well against adversarial attacks. To evaluate the proposed method, we collect a dataset (i.e., TSorie-CRP) for evaluating robustness against natural corruptions in real scenarios. Extensive experiments demonstrate the effectiveness of our method against natural corruptions. Unsurprisingly, our method also achieves good performance against adversarial attacks on DTL benchmark datasets.
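
The abstract approximates MI with the Jensen-Shannon divergence via empirical sampling; a minimal discrete-distribution JSD is shown below purely to make that quantity concrete. This is not the authors' estimator, which operates on learned representations.

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions:
    JSD(p, q) = 0.5*KL(p || m) + 0.5*KL(q || m), with m = (p + q) / 2."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

JSD is symmetric and bounded by log 2, which makes it a convenient surrogate for MI-style dependency terms inside a min-max objective.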

TIST Journal 2024 Journal Article

Biomedical Information Retrieval with Positive-Unlabeled Learning and Knowledge Graphs

  • Yuqi Wang
  • Qiuyi Chen
  • Haiyang Zhang
  • Wei Wang
  • Qiufeng Wang
  • Yushan Pan
  • Liangru Xie
  • Kaizhu Huang

The rapid growth of biomedical publications has presented significant challenges in the field of information retrieval. Most existing work focuses on document retrieval given explicit queries. However, in real applications such as curated biomedical database maintenance, explicit queries are missing. In this paper, we propose a two-step model for biomedical information retrieval in the case that only a small set of example documents is available without explicit queries. Initially, we extract keywords from the observed documents using large pre-trained language models and biomedical knowledge graphs. These keywords are then enriched with domain-specific entities. Information retrieval techniques can subsequently use the collected entities to rank the documents. Following this, we introduce an iterative Positive-Unlabeled learning method to classify all unlabeled documents. Experiments conducted on the PubMed dataset demonstrate that the proposed technique outperforms the state-of-the-art positive-unlabeled learning methods. The results underscore the effectiveness of integrating large language models and biomedical knowledge graphs in improving zero-shot information retrieval performance in the biomedical domain.

NeurIPS Conference 2024 Conference Paper

Cluster-Learngene: Inheriting Adaptive Clusters for Vision Transformers

  • Qiufeng Wang
  • Xu Yang
  • Fu Feng
  • Jing Wang
  • Xin Geng

In recent years, the merging of vast datasets with powerful computational resources has led to the emergence of large pre-trained models in deep learning. However, common practice often overgeneralizes the applicability of these models, overlooking task-specific resource constraints. To mitigate this issue, we propose Cluster-Learngene, which effectively clusters critical internal modules from a large ancestry model and then inherits them to initialize descendant models of elastic scales. Specifically, based on the density characteristics of attention heads, our method adaptively clusters the attention heads of each layer and the position-wise feed-forward networks (FFNs) in the ancestry model as the learngene. Moreover, we introduce priority weight-sharing and learnable parameter transformations that expand the learngene to initialize descendant models of elastic scales. Through extensive experimentation, we demonstrate that Cluster-Learngene is not only more efficient than other initialization methods but also customizes models of elastic scales according to downstream task resources.

NeurIPS Conference 2024 Conference Paper

Interpret Your Decision: Logical Reasoning Regularization for Generalization in Visual Classification

  • Zhaorui Tan
  • Xi Yang
  • Qiufeng Wang
  • Anh Nguyen
  • Kaizhu Huang

Vision models excel at image classification but struggle to generalize to unseen data, such as classifying images from unseen domains or discovering novel categories. In this paper, we explore the relationship between logical reasoning and deep learning generalization in visual classification. We derive a logical regularization, termed L-Reg, that bridges a logical analysis framework to image classification. Our work reveals that L-Reg reduces the complexity of the model in terms of the feature distribution and classifier weights. Specifically, we unveil the interpretability brought by L-Reg, as it enables the model to extract salient features, such as faces for recognizing persons, for classification. Theoretical analysis and experiments demonstrate that L-Reg enhances generalization across various scenarios, including multi-domain generalization and generalized category discovery. In complex real-world scenarios where images span unknown classes and unseen domains, L-Reg consistently improves generalization, highlighting its practical efficacy.

NeurIPS Conference 2024 Conference Paper

Linearly Decomposing and Recomposing Vision Transformers for Diverse-Scale Models

  • Shuxia Lin
  • Miaosen Zhang
  • Ruiming Chen
  • Qiufeng Wang
  • Xu Yang
  • Xin Geng

Vision Transformers (ViTs) are widely used in a variety of applications, yet they usually have a fixed architecture that may not match the varying computational resources of different deployment environments. It is therefore necessary to adapt ViT architectures to devices with diverse computational overheads to achieve an accuracy-efficiency trade-off. This concept is consistent with the motivation behind Learngene. Inspired by polynomial decomposition in calculus, where a function can be approximated by linearly combining several basic components, we propose to linearly decompose the ViT model into a set of components called learngenes during element-wise training. These learngenes can then be recomposed into differently scaled, pre-initialized models to satisfy different computational resource constraints. Such a decomposition-recomposition strategy provides an economical and flexible approach to generating ViT models of different scales for different deployment scenarios. Compared to model compression or training from scratch, which require repeated training on large datasets for each model scale, this strategy reduces computational costs since it requires training on the large dataset only once. Extensive experiments validate the effectiveness of our method: ViTs can be decomposed, and the decomposed learngenes can be recomposed into diverse-scale ViTs that achieve comparable or better performance than traditional model compression and pre-training methods. The code for our experiments is available in the supplemental material.

AAAI Conference 2024 Conference Paper

MathAttack: Attacking Large Language Models towards Math Solving Ability

  • Zihao Zhou
  • Qiufeng Wang
  • Mingyu Jin
  • Jie Yao
  • Jianan Ye
  • Wei Liu
  • Wei Wang
  • Xiaowei Huang

With the boom of Large Language Models (LLMs), research on solving Math Word Problems (MWPs) has recently made great progress. However, few studies examine the robustness of LLMs' math solving ability. Instead of attacking the prompts used with LLMs, we propose MathAttack, a model that attacks MWP samples, which is closer to the essence of robustness in solving math problems. Compared to traditional text adversarial attacks, it is essential to preserve the mathematical logic of the original MWPs during the attack. To this end, we propose logical entity recognition to identify logical entities, which are then frozen. Subsequently, the remaining text is attacked by a word-level attacker. Furthermore, we propose a new dataset, RobustMath, to evaluate the robustness of LLMs' math solving ability. Extensive experiments on RobustMath and two other math benchmark datasets, GSM8K and MultiArith, show that MathAttack can effectively attack the math solving ability of LLMs. In the experiments, we observe that (1) adversarial samples from higher-accuracy LLMs are also effective for attacking LLMs with lower accuracy (e.g., transferring from larger to smaller LLMs, or from few-shot to zero-shot prompts); (2) complex MWPs (with more solving steps, longer text, or more numbers) are more vulnerable to attack; and (3) we can improve the robustness of LLMs by using our adversarial samples in few-shot prompts. Finally, we hope our practice and observations can serve as an important step towards enhancing the robustness of LLMs in math solving ability. The code and dataset are available at: https://github.com/zhouzihao501/MathAttack.

AAAI Conference 2024 Conference Paper

Unraveling Batch Normalization for Realistic Test-Time Adaptation

  • Zixian Su
  • Jingwei Guo
  • Kai Yao
  • Xi Yang
  • Qiufeng Wang
  • Kaizhu Huang

While recent test-time adaptations exhibit efficacy by adjusting batch normalization to narrow domain disparities, their effectiveness diminishes with realistic mini-batches due to inaccurate target estimation. As previous attempts merely introduce source statistics to mitigate this issue, the fundamental problem of inaccurate target estimation persists, leaving the intrinsic test-time domain shifts unresolved. This paper delves into the problem of mini-batch degradation. By unraveling batch normalization, we discover that the inexact target statistics largely stem from the substantially reduced class diversity within a batch. Drawing upon this insight, we introduce a straightforward tool, the Test-time Exponential Moving Average (TEMA), to bridge the class diversity gap between training and testing batches. Importantly, TEMA adaptively extends the scope of typical methods beyond the current batch to incorporate a diverse set of class information, which in turn yields accurate target estimation. Built upon this foundation, we further design a novel layer-wise rectification strategy to consistently promote test-time performance. Our proposed method enjoys a unique advantage as it requires neither training nor tuning of parameters, offering a truly hassle-free solution. It significantly enhances model robustness against shifted domains, maintains resilience in diverse real-world scenarios with various batch sizes, and achieves state-of-the-art performance on several major benchmarks. Code is available at https://github.com/kiwi12138/RealisticTTA.
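
A minimal sketch of the EMA idea behind TEMA: accumulate normalization statistics across test batches instead of trusting a single small, class-poor batch. The class name, momentum value, and per-feature layout are illustrative; this is not the authors' layer-wise rectification strategy.

```python
import numpy as np

class TestTimeEMA:
    """Illustrative test-time exponential moving average over
    batch-normalization statistics."""
    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.mean = None
        self.var = None

    def update(self, batch):
        """batch: (B, C) features; fold its statistics into the EMA,
        effectively widening the pool of class information seen."""
        m, v = batch.mean(axis=0), batch.var(axis=0)
        if self.mean is None:
            self.mean, self.var = m, v
        else:
            self.mean = self.momentum * self.mean + (1 - self.momentum) * m
            self.var = self.momentum * self.var + (1 - self.momentum) * v
        return self.mean, self.var

    def normalize(self, batch, eps=1e-5):
        # normalize with the accumulated (rather than current-batch) statistics
        return (batch - self.mean) / np.sqrt(self.var + eps)
```

Because the EMA spans many batches, a single small test batch with few classes no longer dictates the normalization statistics, which is the failure mode the paper targets.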

JBHI Journal 2023 Journal Article

Mind the Gap: Alleviating Local Imbalance for Unsupervised Cross-Modality Medical Image Segmentation

  • Zixian Su
  • Kai Yao
  • Xi Yang
  • Qiufeng Wang
  • Yuyao Yan
  • Jie Sun
  • Kaizhu Huang

Unsupervised cross-modality medical image adaptation aims to alleviate the severe domain gap between different imaging modalities without using target domain labels. A key step in this campaign is aligning the distributions of the source and target domains. One common attempt is to enforce global alignment between the two domains, which, however, ignores the fatal problem of local imbalance in the domain gap, i.e., some local features with larger domain gaps are harder to transfer. Recently, some methods conduct alignment focused on local regions to improve the efficiency of model learning, but this operation may cause a deficiency of critical contextual information. To tackle this limitation, we propose a novel strategy to alleviate the domain gap imbalance considering the characteristics of medical images, namely Global-Local Union Alignment. Specifically, a feature-disentanglement style-transfer module first synthesizes target-like source images to reduce the global domain gap. Then, a local feature mask is integrated to reduce the ‘inter-gap’ for local features by prioritizing those discriminative features with larger domain gaps. This combination of global and local alignment precisely localizes the crucial regions in the segmentation target while preserving overall semantic consistency. We conduct a series of experiments on two cross-modality adaptation tasks, i.e., cardiac substructure and abdominal multi-organ segmentation. Experimental results indicate that our method achieves state-of-the-art performance in both tasks.

AAAI Conference 2023 Conference Paper

Rethinking Data Augmentation for Single-Source Domain Generalization in Medical Image Segmentation

  • Zixian Su
  • Kai Yao
  • Xi Yang
  • Kaizhu Huang
  • Qiufeng Wang
  • Jie Sun

Single-source domain generalization (SDG) in medical image segmentation is a challenging yet essential task, as domain shifts are quite common among clinical image datasets. Most previous attempts conduct global-only/random augmentation, and their augmented samples are usually insufficient in diversity and informativeness, thus failing to cover the possible target domain distribution. In this paper, we rethink the data augmentation strategy for SDG in medical image segmentation. Motivated by the class-level representation invariance and style mutability of medical images, we hypothesize that unseen target data can be sampled from a linear combination of C (the class number) random variables, where each variable follows a location-scale distribution at the class level. Accordingly, augmented data can be readily generated by sampling these random variables through a general form. On the empirical front, we implement this strategy with constrained Bezier transformation on both global and local (i.e., class-level) regions, which can largely increase augmentation diversity. A saliency-balancing fusion mechanism is further proposed to enrich informativeness by engaging gradient information, guiding augmentation with proper orientation and magnitude. As an important contribution, we prove theoretically that the proposed augmentation can lead to an upper bound on the generalization risk on the unseen target domain, thus confirming our hypothesis. Combining the two strategies, our Saliency-balancing Location-scale Augmentation (SLAug) exceeds the state-of-the-art works by a large margin on two challenging SDG tasks. Code is available at https://github.com/Kaiseem/SLAug.
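
A simplified sketch of the class-level location-scale idea, assuming a segmentation mask is available: each class region receives its own random scale and shift, and the result is a linear combination over the C class regions. The function, sampling ranges, and use of uniform noise are illustrative assumptions; the paper's implementation uses constrained Bezier transformations.

```python
import numpy as np

def location_scale_augment(image, mask, num_classes, rng):
    """Illustrative class-level location-scale augmentation: combine
    per-class regions, each with a random scale ("style") and shift
    ("location"), into one augmented image."""
    out = np.zeros_like(image, dtype=float)
    for c in range(num_classes):
        region = (mask == c)                      # pixels belonging to class c
        scale = rng.uniform(0.8, 1.2)             # random scale per class
        shift = rng.uniform(-0.1, 0.1)            # random location per class
        out += region * (scale * image + shift)   # linear combination over classes
    return np.clip(out, 0.0, 1.0)                 # keep valid intensity range
```

Applying the transformation per class rather than globally is what increases diversity while preserving class-level representation invariance.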