Arrow Research search

Author name cluster

Sai Wu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

19 papers
2 author rows

Possible papers (19)

AAAI Conference 2026 Conference Paper

An Invariant Latent Space Perspective on Language Model Inversion

  • Wentao Ye
  • Jiaqi Hu
  • Haobo Wang
  • Xinpeng Ti
  • Zhiqing Xiao
  • Hao Chen
  • Liyao Li
  • Lei Feng

Language model inversion (LMI), i.e., recovering hidden prompts from outputs, emerges as a concrete threat to user privacy and system security. We recast LMI as reusing the LLM's own latent space and propose the Invariant Latent Space Hypothesis (ILSH): (1) diverse outputs from the same source prompt should preserve consistent semantics (source invariance), and (2) input-output cyclic mappings should be self-consistent within a shared latent space (cyclic invariance). Accordingly, we present Inv2A, which treats the LLM as an invariant decoder and learns only a lightweight inverse encoder that maps outputs to a denoised pseudo-representation. When multiple outputs are available, they are sparsely concatenated at the representation layer to increase information density. Training proceeds in two stages: contrastive alignment (source invariance) and supervised reinforcement (cyclic invariance). An optional training-free neighborhood search can refine local performance. Across 9 datasets covering user and system prompt scenarios, Inv2A outperforms baselines by an average of 4.77% BLEU score while reducing dependence on large inverse corpora. Our analysis further shows that prevalent defenses provide limited protection, underscoring the need for stronger strategies.
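
As a rough illustration of the source-invariance objective described above, the sketch below implements an InfoNCE-style contrastive loss that pulls together pseudo-representations of outputs generated from the same prompt; the function, tensor names, and temperature are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def source_invariance_loss(reps, prompt_ids, temperature=0.07):
    """reps: (N, d) pseudo-representations of N sampled outputs.
    prompt_ids: (N,) index of the source prompt for each output."""
    reps = F.normalize(reps, dim=-1)
    sim = reps @ reps.t() / temperature                    # pairwise similarities
    same_prompt = prompt_ids.unsqueeze(0) == prompt_ids.unsqueeze(1)
    not_self = ~torch.eye(len(reps), dtype=torch.bool, device=reps.device)
    pos = same_prompt & not_self                           # outputs sharing a prompt
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(~not_self, float("-inf")), dim=1, keepdim=True)
    return -log_prob[pos].mean()                           # pull positives together
```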

AAAI Conference 2026 Conference Paper

Learner-Tailored Program Repair: A Solution Generator with Iterative Edit-Driven Retrieval Enhancement

  • Zhenlong Dai
  • Zhuoluo Zhao
  • Hengning Wang
  • Xiu Tang
  • Sai Wu
  • Chang Yao
  • Zhipeng Gao
  • Jingyuan Chen

With the development of large language models (LLMs) in the field of programming, intelligent programming coaching systems have gained widespread attention. However, most research focuses on repairing the buggy code of programming learners without providing the underlying causes of the bugs. To address this gap, we introduce a novel task, namely LPR (Learner-Tailored Program Repair). We then propose a novel and effective framework, LSGen (Learner-Tailored Solution Generator), to enhance program repair while offering bug descriptions for the buggy code. In the first stage, we utilize a repair solution retrieval framework to construct a solution retrieval database and then employ an edit-driven code retrieval approach to retrieve valuable solutions, guiding LLMs in identifying and fixing the bugs in buggy code. In the second stage, we propose a solution-guided program repair method, which fixes the code and provides explanations under the guidance of the retrieved solutions. Moreover, we propose an Iterative Retrieval Enhancement method that utilizes evaluation results of the generated code to iteratively optimize the retrieval direction and explore more suitable repair strategies, improving performance in practical programming coaching scenarios. The experimental results show that our approach outperforms a set of baselines by a large margin, validating the effectiveness of our framework for the newly proposed LPR task.
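
To make the retrieval stage concrete, here is a minimal, hypothetical sketch of edit-driven retrieval over a solution database: stored (buggy code, solution) pairs are ranked by edit similarity to the new buggy snippet. The database schema and `top_k` are assumptions for illustration, not LSGen's retriever.

```python
import difflib

def retrieve_solutions(buggy_code, solution_db, top_k=3):
    """solution_db: list of dicts with 'buggy' and 'solution' keys (assumed schema)."""
    def edit_similarity(entry):
        return difflib.SequenceMatcher(None, buggy_code, entry["buggy"]).ratio()
    ranked = sorted(solution_db, key=edit_similarity, reverse=True)
    return [entry["solution"] for entry in ranked[:top_k]]   # guidance for the LLM
```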

ICLR Conference 2025 Conference Paper

Bridging the Semantic Gap Between Text and Table: A Case Study on NL2SQL

  • Lin Long
  • Xijun Gu
  • Xinjie Sun
  • Wentao Ye
  • Haobo Wang 0001
  • Sai Wu
  • Gang Chen 0001
  • Junbo Zhao 0002

The rise of Large Language Models (LLMs) has revolutionized numerous domains, yet these models still exhibit weakness in understanding structured tabular data. Although the growing context window promises to accommodate a larger volume of table contents, it does not inherently improve the model's ability to understand the underlying structure and semantics of tabular data. To bridge the semantic gap between Text and Table, we propose TnT, a table-language model that features multimodal table representations to empower LLMs to effectively and efficiently abstract structure-enriched semantics from tabular data. TnT also introduces a scalable and efficient training pipeline, featuring novel self-supervised tasks, to integrate abstract tabular knowledge into the language modality. Extensive experimental results on NL2SQL demonstrate a much better table understanding of TnT, which achieves up to 14.4 higher execution accuracy compared with traditional text-based table representations.
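
For context on the reported metric, the snippet below sketches one common way to compute execution accuracy for NL2SQL: a prediction counts as correct when it returns the same multiset of rows as the gold query on the target database. The SQLite setup and function name are illustrative assumptions, not TnT's evaluation code.

```python
import sqlite3
from collections import Counter

def execution_match(db_path, predicted_sql, gold_sql):
    """Execution accuracy: the prediction must return the same rows as the gold query."""
    with sqlite3.connect(db_path) as conn:
        try:
            pred_rows = conn.execute(predicted_sql).fetchall()
        except sqlite3.Error:
            return False                       # invalid predictions never match
        gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(pred_rows) == Counter(gold_rows)
```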

IJCAI Conference 2025 Conference Paper

CycSeq: Leveraging Cyclic Data Generation for Accurate Perturbation Prediction in Single-Cell RNA-Seq

  • Yicheng Liu
  • Sai Wu
  • Tianyun Zhang
  • Chang Yao
  • Ning Shen

Understanding and predicting the effects of cellular perturbations using single-cell sequencing technology remains a critical and challenging problem in biotechnology. In this work, we introduce CycSeq, a deep learning framework that leverages cyclic data generation and recent advances in neural architectures to predict single-cell responses under specified perturbations across multiple cell lines, while also generating the corresponding single-cell expression profiles. Specifically, CycSeq addresses the challenge of learning heterogeneous perturbation responses from unpaired single-cell gene expression data by generating pseudo-pairs through cyclic data generation. Experimental results demonstrate that CycSeq outperforms existing methods in perturbation prediction tasks, as evaluated using computational metrics such as R-squared and MAE. Furthermore, CycSeq employs a unified architecture that integrates information from multiple cell lines, enabling robust predictions even for long-tail cell lines with limited training data. The source code is publicly available at https://github.com/yczju/cycseq.
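
A hedged sketch of the cyclic data-generation idea: two mapping networks translate between control and perturbed expression profiles, and a cycle-consistency term penalizes the round-trip reconstruction error, which is how pseudo-pairs can be formed from unpaired data. The network names and L1 penalty are assumptions, not CycSeq's exact objective.

```python
import torch.nn.functional as F

def cycle_consistency_loss(ctrl_to_pert, pert_to_ctrl, x_ctrl, x_pert):
    """x_ctrl, x_pert: unpaired (batch, genes) expression tensors."""
    pseudo_pert = ctrl_to_pert(x_ctrl)        # pseudo-pair for each control cell
    recon_ctrl = pert_to_ctrl(pseudo_pert)    # round trip back to control
    pseudo_ctrl = pert_to_ctrl(x_pert)
    recon_pert = ctrl_to_pert(pseudo_ctrl)
    return F.l1_loss(recon_ctrl, x_ctrl) + F.l1_loss(recon_pert, x_pert)
```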

AAAI Conference 2025 Conference Paper

Less Is More: Adaptive Program Repair with Bug Localization and Preference Learning

  • Zhenlong Dai
  • Bingrui Chen
  • Zhuoluo Zhao
  • Xiu Tang
  • Sai Wu
  • Chang Yao
  • Zhipeng Gao
  • Jingyuan Chen

Automated Program Repair (APR) is the task of automatically generating patches for buggy code. However, most research focuses on generating correct patches while ignoring the consistency between the fixed code and the original buggy code. How to conduct adaptive bug fixing and generate patches with minimal modifications has seldom been investigated. To bridge this gap, we first introduce a novel task, namely AdaPR (Adaptive Program Repair). We then propose a two-stage approach, AdaPatcher (Adaptive Patch Generator), to enhance program repair while maintaining consistency. In the first stage, we utilize a Bug Locator with self-debug learning to accurately pinpoint bug locations. In the second stage, we train a Program Modifier to ensure consistency between the post-modified fixed code and the pre-modified buggy code. The Program Modifier is enhanced with a location-aware repair learning strategy to generate patches based on identified buggy lines, a hybrid training strategy for selective reference, and adaptive preference learning to prioritize fewer changes. The experimental results show that our approach outperforms a set of baselines by a large margin, validating the effectiveness of our two-stage framework for the newly proposed AdaPR task.
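
As a toy illustration of the consistency goal (not the paper's metric), the sketch below scores how much of the original buggy code a candidate patch preserves and prefers the least-invasive fix among otherwise acceptable candidates.

```python
import difflib

def preservation_ratio(buggy_code: str, fixed_code: str) -> float:
    matcher = difflib.SequenceMatcher(None, buggy_code.splitlines(),
                                      fixed_code.splitlines())
    return matcher.ratio()        # 1.0 = identical, lower = heavier rewrite

def pick_minimal_patch(buggy_code, candidate_patches):
    # among candidates assumed to be functionally correct, keep the smallest edit
    return max(candidate_patches, key=lambda p: preservation_ratio(buggy_code, p))
```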

IJCAI Conference 2025 Conference Paper

POLO: An LLM-Powered Project-Level Code Performance Optimization Framework

  • Jiameng Bai
  • Ruoyi Xu
  • Sai Wu
  • Dingyu Yang
  • Junbo Zhao
  • Gang Chen

Program performance optimization is essential for achieving high execution efficiency, yet it remains a challenging task that requires expertise in both software and hardware. Large Language Models (LLMs), trained on high-quality code from platforms like GitHub and other open-source sources, have shown promise in generating optimized code for simple snippets. However, current LLM-based solutions often fall short when tackling project-level programs due to the complexity of call graphs and the intricate interactions among functions. In this paper, we emulate the process a human expert might follow when optimizing project-level programs and introduce a three-phase framework POLO (PrOject-Level Optimizer) to address this limitation. First, we profile the program to identify performance bottlenecks using an iterative weighting algorithm. Next, we conduct structural analysis by scanning the project and generating a graph that represents the program's structure. Finally, two LLM agents collaborate in iterative cycles to rewrite and optimize the code at these hotspots, gradually improving performance. We conduct experiments on open-source and proprietary projects. The results demonstrate that POLO accurately identifies performance bottlenecks and successfully applies optimizations. Under the O3 compilation flag, the optimized programs achieved speedups ranging from 1.34x to 21.5x.
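
One hypothetical reading of the bottleneck-profiling step, shown as a toy fixed-point computation: each function's weight mixes its measured self-time with a damped share of its callees' weights, and the highest-weighted functions are treated as hotspots. The damping factor, iteration count, and data layout are assumptions, not POLO's algorithm.

```python
def hotspot_weights(self_time, callees, alpha=0.5, iters=20):
    """self_time: {func: seconds}; callees: {func: [functions it calls]}."""
    weights = dict(self_time)
    for _ in range(iters):
        weights = {
            f: self_time[f] + alpha * sum(weights.get(c, 0.0) for c in callees.get(f, []))
            for f in self_time
        }
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)  # hotspots first
```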

NeurIPS Conference 2025 Conference Paper

SALoM: Structure Aware Temporal Graph Networks with Long-Short Memory Updater

  • Hanwen Liu
  • Longjiao Zhang
  • Rui Wang
  • Tongya Zheng
  • Sai Wu
  • Chang Yao
  • Mingli Song

Dynamic graph learning is crucial for accurately modeling complex systems by integrating topological structure and temporal information within graphs. While memory-based methods are commonly used and excel at capturing short-range temporal correlations, they struggle with modeling long-range dependencies, harmonizing long-range and short-range correlations, and integrating structural information effectively. To address these challenges, we present SALoM: Structure Aware Temporal Graph Networks with Long-Short Memory Updater. SALoM features a memory module that addresses gradient vanishing and information forgetting, enabling the capture of long-term dependencies across various time scales. Additionally, SALoM utilizes a long-short memory updater (LSMU) to dynamically balance long-range and short-range temporal correlations, preventing over-generalization. By integrating co-occurrence encoding and LSMU through information bottleneck-based fusion, SALoM effectively captures both the structural and temporal information within graphs. Experimental results across various graph datasets demonstrate SALoM's superior performance, achieving state-of-the-art results in dynamic graph link prediction. Our code is openly accessible at https://github.com/wave5418/SALoM.
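
To give a flavor of the long-short balancing, here is a minimal gate that blends a slowly updated long-range memory with the freshly computed short-range message; the module layout is an illustrative assumption, not SALoM's LSMU implementation.

```python
import torch
import torch.nn as nn

class LongShortUpdater(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, long_mem, short_msg):
        # per-dimension gate deciding how much long-range memory to retain
        g = torch.sigmoid(self.gate(torch.cat([long_mem, short_msg], dim=-1)))
        return g * long_mem + (1 - g) * short_msg
```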

AAAI Conference 2025 Conference Paper

Semantic-guided Masked Mutual Learning for Multi-modal Brain Tumor Segmentation with Arbitrary Missing Modalities

  • Guoyan Liang
  • Qin Zhou
  • Zhe Wang
  • Jingyuan Chen
  • Lin Gu
  • Chang Yao
  • Sai Wu
  • Bingcang Huang

Malignant brain tumors are an aggressive and dangerous disease that leads to death worldwide. Multi-modal MRI data is crucial for accurate brain tumor segmentation, but missing modalities, common in clinical practice, can severely degrade the segmentation performance. While incomplete multi-modal learning methods attempt to address this, learning robust and discriminative features from arbitrary missing modalities remains challenging. To address this challenge, we propose a novel Semantic-guided Masked Mutual Learning (SMML) approach to distill robust and discriminative knowledge across diverse missing modality scenarios. Specifically, we propose a novel dual-branch masked mutual learning scheme guided by Hierarchical Consistency Constraints (HCC) to ensure multi-level consistency, thereby enhancing mutual learning in incomplete multi-modal scenarios. The HCC framework comprises a pixel-level constraint that selects and exchanges reliable knowledge to guide the mutual learning process. Additionally, it includes a feature-level constraint that uncovers robust inter-sample and inter-class relational knowledge within the latent feature space. To further enhance multi-modal learning from missing modality data, we integrate a refinement network into each student branch. This network leverages semantic priors from the Segment Anything Model (SAM) to provide supplementary information, effectively complementing the masked mutual learning strategy in capturing auxiliary discriminative knowledge. Extensive experiments on three challenging brain tumor segmentation datasets demonstrate that our method significantly improves performance over state-of-the-art methods in diverse missing modality settings.
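
A hedged sketch of a pixel-level consistency constraint in the spirit of the dual-branch scheme: knowledge is exchanged only at pixels where the peer branch is confident. The confidence threshold and tensor shapes are assumptions for illustration, not the paper's HCC formulation.

```python
import torch.nn.functional as F

def reliable_pixel_loss(student_logits, peer_logits, conf_thresh=0.9):
    """student_logits, peer_logits: (B, C, H, W) segmentation logits."""
    peer_prob = peer_logits.softmax(dim=1)
    conf, pseudo = peer_prob.max(dim=1)                  # per-pixel confidence / label
    mask = conf > conf_thresh                            # keep only reliable pixels
    loss = F.cross_entropy(student_logits, pseudo, reduction="none")
    return (loss * mask).sum() / mask.sum().clamp(min=1)
```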

JBHI Journal 2025 Journal Article

Unsupervised Brain Anomaly Detection Using Structure-Preserving Noise Generation and Multi-Scale Dual-Expert Ensembles

  • Qianyi Yang
  • Bingcang Huang
  • Qin Zhou
  • Zhe Wang
  • Kai Chen
  • Xiu Tang
  • Chang Yao
  • Sai Wu

Detecting early brain anomalies is crucial for patient prognosis and recovery, but obtaining expert-annotated data is challenging, especially for clinically silent early brain anomalies. Unsupervised brain anomaly detection, which identifies anomalous regions by modeling normal brain patterns, has gained interest for its label efficiency. However, the inherent variability in normal brains and subtle anomalies that closely resemble normal tissue pose challenges for traditional autoencoders in distinguishing anomalies. Denoising AutoEncoder (DAE) methods have been explored to enhance the model's ability, but their success hinges on effective noise generation strategies. In this paper, we introduce a novel, structure-preserving noise generation scheme based on cross-modal CutMix, aiming to enhance the diversity of noise patterns while preserving the anatomical structure of the brain. To enhance the robustness of DAE learning, we propose an ensemble approach featuring dual experts, each incorporating a distinct scale of noise. This dual-expert scheme effectively amplifies reconstruction errors in anomalous regions and suppresses false alarms in healthy areas. Additionally, we propose an anatomically-aware bidirectional consistency loss to ensure high-fidelity reconstruction at the regional level, using superpixels for anatomy perception and bidirectional distillation for reliable knowledge transfer. Extensive experiments across two different settings demonstrate the effectiveness and generalization ability of our proposed method.
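
The cross-modal CutMix idea can be pictured with a short sketch: a random patch from a co-registered second modality of the same brain replaces the corresponding region of the input, so the injected "noise" still looks anatomically plausible. The patch size and the 2D setting are simplifying assumptions, not the paper's exact scheme.

```python
import numpy as np

def cross_modal_cutmix(img_a, img_b, patch_frac=0.25, rng=None):
    """img_a, img_b: co-registered 2D slices (H, W) from two MRI modalities."""
    rng = rng or np.random.default_rng()
    h, w = img_a.shape
    ph, pw = int(h * patch_frac), int(w * patch_frac)
    y, x = rng.integers(0, h - ph), rng.integers(0, w - pw)
    noisy = img_a.copy()
    noisy[y:y + ph, x:x + pw] = img_b[y:y + ph, x:x + pw]   # structure-preserving patch
    return noisy
```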

NeurIPS Conference 2024 Conference Paper

Enhancing LLM Reasoning via Vision-Augmented Prompting

  • Ziyang Xiao
  • Dongxiang Zhang
  • Xiongwei Han
  • Xiaojin Fu
  • Yin Yu
  • Tao Zhong
  • Sai Wu
  • Yuan Wang

Verbal and visual-spatial information processing are two critical subsystems that activate different brain regions and often collaborate for cognitive reasoning. Despite the rapid advancement of LLM-based reasoning, the mainstream frameworks, such as Chain-of-Thought (CoT) and its variants, primarily focus on the verbal dimension, resulting in limitations in tackling reasoning problems with visual and spatial clues. To bridge the gap, we propose a novel dual-modality reasoning framework called Vision-Augmented Prompting (VAP). Upon receiving a textual problem description, VAP automatically synthesizes an image from the visual and spatial clues by utilizing external drawing tools. Subsequently, VAP formulates a chain of thought in both modalities and iteratively refines the synthesized image. Finally, a conclusive reasoning scheme based on self-alignment is proposed for final result generation. Extensive experiments are conducted across four versatile tasks, including solving geometry problems, Sudoku, time series prediction, and the travelling salesman problem. The results validate the superiority of VAP over existing LLM-based reasoning frameworks.

NeurIPS Conference 2024 Conference Paper

Locating What You Need: Towards Adapting Diffusion Models to OOD Concepts In-the-Wild

  • Jianan Yang
  • Chenchao Gao
  • Zhiqing Xiao
  • Junbo Zhao
  • Sai Wu
  • Gang Chen
  • Haobo Wang

The recent large-scale text-to-image generative models have attained unprecedented performance, and adaptor modules like LoRA and DreamBooth have been established to extend this performance to even more unseen concept tokens. However, we empirically find that this workflow often fails to accurately depict out-of-distribution concepts. This failure is highly related to the low quality of training data. To resolve this, we present a framework called Controllable Adaptor Towards Out-of-Distribution Concepts (CATOD). Our framework follows the active learning paradigm, which includes high-quality data accumulation and adaptor training, enabling a finer-grained enhancement of generative results. The aesthetics score and concept-matching score are two major factors that impact the quality of synthetic results. One key component of CATOD is the weighted scoring system that automatically balances these two scores, for which we also offer a comprehensive theoretical analysis. CATOD then determines how to select data and schedule the adaptor training based on this scoring system. The extensive results show that CATOD significantly outperforms the prior approaches with an 11.10 boost on the CLIP score and a 33.08% decrease on the CMMD metric.
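
As a rough sketch of the weighted scoring idea (the weight, score functions, and keep fraction are placeholders, not CATOD's learned balance), candidate training images are ranked by a convex combination of an aesthetics score and a concept-matching score, and only the top fraction is accumulated for adaptor training.

```python
def select_training_data(candidates, aesthetic_score, concept_score,
                         weight=0.5, keep_frac=0.2):
    scored = [(weight * aesthetic_score(c) + (1 - weight) * concept_score(c), c)
              for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    keep = max(1, int(len(scored) * keep_frac))
    return [c for _, c in scored[:keep]]      # highest-quality samples for the adaptor
```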

AAAI Conference 2024 Conference Paper

Sampling-Resilient Multi-Object Tracking

  • Zepeng Li
  • Dongxiang Zhang
  • Sai Wu
  • Mingli Song
  • Gang Chen

Multi-Object Tracking (MOT) is a cornerstone operator for video surveillance applications. To enable real-time processing of large-scale live video streams, we study an interesting scenario called down-sampled MOT, which performs object tracking only on a small subset of video frames. The problem is challenging for state-of-the-art MOT methods, which exhibit significant performance degradation under high frame reduction ratios. In this paper, we devise a sampling-resilient tracker with a novel sparse-observation Kalman filter (SOKF). It integrates an LSTM network to capture non-linear and dynamic motion patterns caused by sparse observations. Since the LSTM-based state transition is not compatible with the original noise estimation mechanism, we propose new estimation strategies based on Bayesian neural networks and derive the optimal Kalman gain for SOKF. To associate the detected bounding boxes robustly, we also propose a comprehensive similarity metric that systematically integrates multiple spatial matching signals. Experiments on three benchmark datasets show that our proposed tracker achieves the best trade-off between efficiency and accuracy. With the same tracking accuracy, we reduce the total processing time of ByteTrack by 2× in MOT17 and 3× in DanceTrack.
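
A schematic sketch of one SOKF-style update, assuming the learned motion model is available as a callable: the prediction comes from the learned (non-linear) transition, while the correction is the familiar Kalman update. The matrices and the linearized covariance propagation are simplifying assumptions rather than the paper's derived estimator.

```python
import numpy as np

def sokf_step(x, P, z, predict_fn, F, Q, H, R):
    """x: state, P: covariance, z: observed box, predict_fn: learned motion model."""
    x_pred = predict_fn(x)                  # non-linear, learned state transition
    P_pred = F @ P @ F.T + Q                # covariance propagated with a linear proxy
    innovation = z - H @ x_pred
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```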

ICML Conference 2023 Conference Paper

Byzantine-Robust Learning on Heterogeneous Data via Gradient Splitting

  • Yuchen Liu
  • Chen Chen 0043
  • Lingjuan Lyu
  • Fangzhao Wu
  • Sai Wu
  • Gang Chen 0001

Federated learning has exhibited vulnerabilities to Byzantine attacks, where the Byzantine attackers can send arbitrary gradients to a central server to destroy the convergence and performance of the global model. A wealth of robust AGgregation Rules (AGRs) has been proposed to defend against Byzantine attacks. However, Byzantine clients can still circumvent robust AGRs when data is non-Identically and Independently Distributed (non-IID). In this paper, we first reveal the root causes of performance degradation of current robust AGRs in non-IID settings: the curse of dimensionality and gradient heterogeneity. In order to address this issue, we propose GAS, a GrAdient Splitting approach that can successfully adapt existing robust AGRs to non-IID settings. We also provide a detailed convergence analysis when the existing robust AGRs are combined with GAS. Experiments on various real-world datasets verify the efficacy of our proposed GAS. The implementation code is provided at https://github.com/YuchenLiu-a/byzantine-gas.
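
A simplified sketch of the gradient-splitting idea (the robust aggregator here is a plain coordinate-wise median stand-in, and GAS's identification step is omitted): each client gradient is split into equal chunks, each chunk is aggregated robustly on its own, and the pieces are stitched back together.

```python
import torch

def split_and_aggregate(client_grads, num_splits=4):
    """client_grads: (num_clients, dim) stacked flattened gradients."""
    chunks = client_grads.chunk(num_splits, dim=1)
    robust_chunks = [c.median(dim=0).values for c in chunks]   # robust AGR per split
    return torch.cat(robust_chunks)                            # aggregated update
```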

ICLR Conference 2023 Conference Paper

Learning a Data-Driven Policy Network for Pre-Training Automated Feature Engineering

  • Liyao Li
  • Haobo Wang 0001
  • Liangyu Zha
  • Qingyi Huang
  • Sai Wu
  • Gang Chen 0001
  • Junbo Zhao 0002

Feature engineering is widely acknowledged to be pivotal in tabular data analysis and prediction. Automated feature engineering (AutoFE) emerged to automate this process, conventionally managed by experienced data scientists and engineers. In this area, most — if not all — prior work adopted an identical framework from the neural architecture search (NAS) method. While feasible, we posit that the NAS framework very much contradicts the way human experts cope with the data since the inherent Markov decision process (MDP) setup differs. We point out that its data-unobserved setup consequently results in an inability to generalize across different datasets as well as high computational cost. This paper proposes a novel AutoFE framework, Feature Set Data-Driven Search (FETCH), a pipeline mainly for feature generation and selection. Notably, FETCH is built on a brand-new data-driven MDP setup using the tabular dataset as the state fed into the policy network. Further, we posit that the crucial merit of FETCH is its transferability, where the yielded policy network trained on a variety of datasets is indeed capable of enacting feature engineering on unseen data, without requiring additional exploration. To the best of our knowledge, this is a pioneering attempt to build a tabular data pre-training paradigm via AutoFE. Extensive experiments show that FETCH systematically surpasses the current state-of-the-art AutoFE methods and validates the transferability of AutoFE pre-training.
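
To ground the MDP framing, here is a toy greedy search (not FETCH's policy network) in which the state is the current feature table, an action appends a candidate transformation, and the reward is the change in cross-validated score. The model, transformation list, and step budget are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def greedy_feature_search(X, y, transforms, steps=3):
    """transforms: list of (name, fn), each fn mapping an (n, d) array to a new column."""
    def cv_score(features):
        return cross_val_score(LogisticRegression(max_iter=1000), features, y, cv=3).mean()
    score = cv_score(X)
    for _ in range(steps):
        best = None
        for name, fn in transforms:
            X_new = np.column_stack([X, fn(X)])
            s = cv_score(X_new)
            if s > score and (best is None or s > best[0]):
                best = (s, X_new, name)       # keep the most rewarding action
        if best is None:
            break
        score, X, _ = best
    return X, score
```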

ICML Conference 2023 Conference Paper

Towards Controlled Data Augmentations for Active Learning

  • Jianan Yang
  • Haobo Wang 0001
  • Sai Wu
  • Gang Chen 0001
  • Junbo Zhao 0002

The mission of active learning is to identify the most valuable data samples, thus attaining decent performance with much fewer samples. The data augmentation techniques seem straightforward yet promising to enhance active learning by extending the exploration of the input space, which helps locate more valuable samples. In this work, we thoroughly study the coupling of data augmentation and active learning, thereby proposing the Controllable Augmentation ManiPulator for Active Learning (CAMPAL). In contrast to the few prior works that touched on this line, CAMPAL emphasizes a purposeful, tighter, and better-controlled integration of data augmentation into active learning in three folds: (i) carefully designed augmentation policies applied separately on labeled and unlabeled data pools; (ii) controlled and quantifiably optimizable augmentation strengths; (iii) full and flexible coverage for most (if not all) active learning schemes. Theories are proposed and associated with the development of key components in CAMPAL. Through extensive empirical experiments, we bring the performance of active learning methods to a new level: an absolute performance boost of 16.99% on CIFAR-10 and 12.25 on SVHN with 1,000 annotated samples. Codes are available at https://github.com/jnzju/CAMPAL.
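
A hedged toy of coupling augmentation with acquisition (a stand-in for CAMPAL's controlled policies, with the view count and budget as assumptions): each unlabeled sample is scored by predictive entropy averaged over several augmented views, and the highest-scoring samples are queried for annotation.

```python
import torch

def augmented_entropy_query(model, unlabeled, augment, num_views=4, budget=100):
    """unlabeled: iterable of (C, H, W) tensors; augment: stochastic transform."""
    model.eval()
    scores = []
    with torch.no_grad():
        for x in unlabeled:
            views = torch.stack([augment(x) for _ in range(num_views)])
            probs = model(views).softmax(dim=-1).mean(dim=0)       # averaged prediction
            scores.append(-(probs * probs.clamp_min(1e-12).log()).sum())
    order = torch.argsort(torch.stack(scores), descending=True)
    return order[:budget]                                          # indices to annotate
```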

IJCAI Conference 2022 Conference Paper

Comparison Knowledge Translation for Generalizable Image Classification

  • Zunlei Feng
  • Tian Qiu
  • Sai Wu
  • Xiaotuan Jin
  • Zengliang He
  • Mingli Song
  • Huiqiong Wang

Deep learning has recently achieved remarkable performance in image classification tasks, which depends heavily on massive annotation. However, the classification mechanism of existing deep learning models seems to contrast with humans' recognition mechanism. With only a glance at an image of an object, even of an unknown type, humans can quickly and precisely find other objects of the same category among massive images, an ability that benefits from daily recognition of various objects. In this paper, we attempt to build a generalizable framework that emulates humans' recognition mechanism in the image classification task, hoping to improve the classification performance on unseen categories with the support of annotations of other categories. Specifically, we investigate a new task termed Comparison Knowledge Translation (CKT). Given a set of fully labeled categories, CKT aims to translate the comparison knowledge learned from the labeled categories to a set of novel categories. To this end, we put forward a Comparison Classification Translation Network (CCT-Net), which comprises a comparison classifier and a matching discriminator. The comparison classifier is devised to classify whether two images belong to the same category or not, while the matching discriminator works in an adversarial manner to check whether the classified results match the truth. Exhaustive experiments show that CCT-Net achieves surprising generalization ability on unseen categories and SOTA performance on target categories.
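
The comparison-based inference can be pictured with a short, hypothetical sketch: given a pairwise model that scores whether two images share a category, a query image is assigned the label of the exemplar set it matches best. The pairwise model and exemplar layout are assumptions, not CCT-Net's inference procedure.

```python
import torch

def classify_by_comparison(pair_model, query, exemplars_by_class):
    """exemplars_by_class: {label: (k, C, H, W) tensor of labeled exemplars}."""
    best_label, best_score = None, float("-inf")
    with torch.no_grad():
        for label, exemplars in exemplars_by_class.items():
            queries = query.unsqueeze(0).expand(len(exemplars), *query.shape)
            score = pair_model(queries, exemplars).mean().item()   # avg same-class score
            if score > best_score:
                best_label, best_score = label, score
    return best_label
```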

AAAI Conference 2022 Conference Paper

Model Doctor: A Simple Gradient Aggregation Strategy for Diagnosing and Treating CNN Classifiers

  • Zunlei Feng
  • Jiacong Hu
  • Sai Wu
  • XiaoTian Yu
  • Jie Song
  • Mingli Song

Recently, the Convolutional Neural Network (CNN) has achieved excellent performance in classification tasks. It is widely known that a CNN is deemed a ‘black box’, which makes it hard to understand the prediction mechanism and to debug wrong predictions. Some model debugging and explanation works have been developed to address the above drawbacks. However, those methods focus on explaining and diagnosing possible causes of model predictions, based on which researchers handle the subsequent optimization of models manually. In this paper, we propose the first completely automatic model diagnosing and treating tool, termed Model Doctor. Based on two discoveries that 1) each category is only correlated with sparse and specific convolution kernels, and 2) adversarial samples are isolated while normal samples are successive in the feature space, a simple aggregate gradient constraint is devised for effectively diagnosing and optimizing CNN classifiers. The aggregate gradient strategy is a versatile module for mainstream CNN classifiers. Extensive experiments demonstrate that the proposed Model Doctor applies to all existing CNN classifiers, and improves the accuracy of 16 mainstream CNN classifiers by 1% ∼ 5%.
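
A sketch of the kind of diagnosis signal the paper builds on (the hook-based implementation here is illustrative, not the Model Doctor code): for one class, aggregate the absolute gradient of that class logit with respect to each convolutional channel, which indicates how strongly each kernel correlates with the category.

```python
import torch

def channel_relevance(model, feature_layer, images, class_idx):
    """Returns one relevance score per channel of feature_layer's (B, C, H, W) output."""
    captured = {}
    handle = feature_layer.register_forward_hook(
        lambda module, inputs, output: captured.__setitem__("out", output))
    logits = model(images)
    handle.remove()
    grads = torch.autograd.grad(logits[:, class_idx].sum(), captured["out"])[0]
    return grads.abs().mean(dim=(0, 2, 3))   # aggregate gradient per convolution kernel
```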

AAAI Conference 2021 Conference Paper

Effective Slot Filling via Weakly-Supervised Dual-Model Learning

  • Jue Wang
  • Ke Chen
  • Lidan Shou
  • Sai Wu
  • Gang Chen

Slot filling is a challenging task in Spoken Language Understanding (SLU). Supervised methods usually require large amounts of annotation to maintain desirable performance. A solution to relieve the heavy dependency on labeled data is to employ bootstrapping, which leverages unlabeled data. However, bootstrapping is known to suffer from semantic drift. We argue that semantic drift can be tackled by exploiting the correlation between slot values (phrases) and their respective types. By using some particular weakly-labeled data, namely the plain phrases included in sentences, we propose a weakly-supervised slot filling approach. Our approach trains two models, namely a classifier and a tagger, which can effectively learn from each other on the weakly-labeled data. The experimental results demonstrate that our approach achieves better results than standard baselines on multiple datasets, especially in the low-resource setting.

ICML Conference 2021 Conference Paper

Joining datasets via data augmentation in the label space for neural networks

  • Junbo Zhao 0002
  • Mingfeng Ou
  • Linji Xue
  • Yunkai Cui
  • Sai Wu
  • Gang Chen 0001

Most, if not all, modern deep learning systems restrict themselves to a single dataset for neural network training and inference. In this article, we are interested in systematic ways to join datasets that are made for similar purposes. Unlike previously published works that ubiquitously conduct the dataset joining in the uninterpretable latent vectorial space, the core of our method is an augmentation procedure in the label space. The primary challenge in addressing the label space for dataset joining is the discrepancy between labels: non-overlapping label annotation sets, different labeling granularity or hierarchy, etc. Notably, we propose a new technique leveraging an artificially created knowledge graph, recurrent neural networks, and policy gradients that successfully achieves dataset joining in the label space. Empirical results on both image and text classification justify the validity of our approach.