Author name cluster

Han Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers

1 author row

TMLR Journal 2026 Journal Article

SpikingBrain: Spiking Brain-inspired Large Models

Yuqi Pan
Yupeng Feng
JingHao Zhuang
Siyu Ding
Han Xu
Zehao Liu
Bohan Sun
Yuhong Chou

Mainstream Transformer-based large language models (LLMs) face significant efficiency bottlenecks: training computation scales quadratically with sequence length, and inference memory grows linearly. These constraints limit their ability to process long sequences effectively. In addition, building large models on non-NVIDIA computing platforms poses major challenges in achieving stable and efficient training and deployment. To address these issues, we introduce SpikingBrain, a new family of brain-inspired models designed for efficient long-context training and inference. SpikingBrain leverages the MetaX GPU cluster and focuses on three core aspects: (1) Model Architecture: linear and hybrid-linear attention architectures with adaptive spiking neurons; (2) Algorithmic Optimizations: an efficient, conversion-based training pipeline compatible with existing LLMs, along with a dedicated spike coding framework; (3) System Engineering: customized training frameworks, operator libraries, and parallelism strategies tailored to the MetaX hardware. Using these techniques, we develop two models: SpikingBrain-7B, a linear LLM, and SpikingBrain-76B, a hybrid-linear MoE LLM. These models demonstrate the feasibility of large-scale LLM development on non-NVIDIA platforms, and our training framework supports weeks of stable training on hundreds of MetaX GPUs with Model FLOPs Utilization (MFU) at expected levels. SpikingBrain achieves performance comparable to open-source Transformer baselines while using exceptionally low data resources (continual pre-training of approximately 150B tokens). Our models also significantly improve long-context efficiency and deliver inference with (partially) constant memory and event-driven spiking behavior. For example, SpikingBrain-7B achieves more than 100× speedup in Time to First Token (TTFT) for 4M-token sequences. Furthermore, the proposed spiking scheme achieves 69.15% sparsity, enabling low-power operation. 
Overall, this work demonstrates the potential of brain-inspired mechanisms to drive the next generation of efficient and scalable large model design.
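The constant-memory property mentioned in the abstract can be illustrated with generic causal linear attention. The sketch below is not SpikingBrain's architecture: it assumes an unnormalized form with no feature map, and the function name is hypothetical. It only shows why the running state has a fixed size no matter how many tokens have been processed.

```python
from typing import List

Matrix = List[List[float]]


def linear_attention_stream(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Causal, unnormalized linear attention computed one token at a time.

    Assumption for illustration: no feature map and no normalization.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    # state[i][j] holds the running sum of k[i] * v[j] over all tokens seen so far.
    # Its size is d_k x d_v and does not grow with sequence length.
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs: Matrix = []
    for q, k, v in zip(queries, keys, values):
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k[i] * v[j]
        # The output for this token is q multiplied by the accumulated state.
        outputs.append(
            [sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
        )
    return outputs
```

Softmax attention, by contrast, has to keep every past key and value, which is the linear inference-memory growth the abstract refers to.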

PDF Details

AAAI Conference 2026 Conference Paper

WaveC2R: Wavelet-Driven Coarse-to-Refined Hierarchical Learning for Radar Retrieval

Chunlei Shi
Han Xu
Yinghao Li
Yi-Lin Wei
Yongchao Feng
Yecheng Zhang
Dan Niu

Satellite-based radar retrieval methods are widely employed to fill coverage gaps in ground-based radar systems, especially in remote areas affected by terrain blockage and limited detection range. Existing methods predominantly rely on overly simplistic spatial-domain architectures constructed from a single data source, limiting their ability to accurately capture complex precipitation patterns and sharply defined meteorological boundaries. To address these limitations, we propose WaveC2R, a novel wavelet-driven coarse-to-refined framework for radar retrieval. WaveC2R integrates complementary multi-source data and leverages frequency-domain decomposition to separately model low-frequency components for capturing precipitation patterns and high-frequency components for delineating sharply defined meteorological boundaries. Specifically, WaveC2R consists of two stages: (i) Intensity-Boundary Decoupled Learning, which leverages wavelet decomposition and frequency-specific loss functions to separately optimize low-frequency intensity and high-frequency boundaries; and (ii) Detail-Enhanced Diffusion Refinement, which employs frequency-aware conditional priors and multi-source data to progressively enhance fine-scale precipitation structures while preserving coarse-scale meteorological consistency. Experimental results on the publicly available SEVIR dataset demonstrate that WaveC2R achieves state-of-the-art performance in satellite-based radar retrieval, particularly excelling at preserving high-intensity precipitation features and sharply defined meteorological boundaries.
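The split into low-frequency and high-frequency components can be illustrated with a one-level Haar transform on a 1-D signal. This is a minimal sketch only: the abstract does not say which wavelet WaveC2R uses, so the Haar basis, the 1-D setting, and the function names are assumptions.

```python
import math
from typing import List, Tuple


def haar_decompose(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One-level orthonormal Haar transform of an even-length sequence."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    # Low-frequency band: scaled pairwise sums (smooth intensity trend).
    low = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    # High-frequency band: scaled pairwise differences (sharp transitions).
    high = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return low, high


def haar_reconstruct(low: List[float], high: List[float]) -> List[float]:
    """Invert haar_decompose, recovering the original samples."""
    scale = math.sqrt(2.0)
    signal: List[float] = []
    for approx, detail in zip(low, high):
        signal.extend([(approx + detail) / scale, (approx - detail) / scale])
    return signal
```

A flat pair of samples gives a zero high-frequency coefficient, while a jump between neighbouring samples gives a large one, which is why boundary information concentrates in the high-frequency band.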

PDF Details DOI

AAAI Conference 2025 Conference Paper

Cross-Modal Stealth: A Coarse-to-Fine Attack Framework for RGB-T Tracker

Xinyu Xiang
Qinglong Yan
Hao Zhang
Jianfeng Ding
Han Xu
Zhongyuan Wang
Jiayi Ma

Current research on adversarial attacks mainly focuses on RGB trackers, with no existing methods for attacking RGB-T cross-modal trackers. To fill this gap and overcome its challenges, we propose a progressive adversarial patch generation framework and achieve cross-modal stealth. On the one hand, we design a coarse-to-fine architecture grounded in the latent space to progressively and precisely uncover the vulnerabilities of RGB-T trackers. On the other hand, we introduce a correlation-breaking loss that disrupts the modal coupling within trackers, spanning from the pixel to the semantic level. These two design elements ensure that the proposed method can overcome the obstacles posed by cross-modal information complementarity in implementing attacks. Furthermore, to enhance the reliable application of the adversarial patches in the real world, we develop a point tracking-based reprojection strategy that effectively mitigates performance degradation caused by multi-angle distortion during imaging. Extensive experiments demonstrate the superiority of our method.

PDF Details DOI

NeurIPS Conference 2025 Conference Paper

Deno-IF: Unsupervised Noisy Visible and Infrared Image Fusion Method

Han Xu
Yuyang Li
Yunfei Deng
Jiayi Ma
Guangcan Liu

Most image fusion methods are designed for ideal scenarios and struggle to handle noise. Existing noise-aware fusion methods are supervised and heavily rely on constructed paired data, limiting performance and generalization. This paper proposes a novel unsupervised noisy visible and infrared image fusion method, comprising two key modules. First, when only noisy source images are available, a convolutional low-rank optimization module decomposes clean components based on convolutional low-rank priors, guiding subsequent optimization. The unsupervised approach eliminates data dependency and enhances generalization across various and variable noise. Second, a unified network jointly realizes denoising and fusion. It consists of both intra-modal recovery and inter-modal recovery and fusion, also with a convolutional low-rankness loss for regularization. By exploiting the commonalities of denoising and fusion, the joint framework significantly reduces network complexity while expanding functionality. Extensive experiments validate the effectiveness and generalization of the proposed method for image fusion under various and variable noise conditions. The code is publicly available at https://github.com/hanna-xu/Deno-IF.

PDF Details

NeurIPS Conference 2024 Conference Paper

Certified Robustness for Deep Equilibrium Models via Serialized Random Smoothing

Weizhi Gao
Zhichao Hou
Han Xu
Xiaorui Liu

Implicit models such as Deep Equilibrium Models (DEQs) have emerged as promising alternative approaches for building deep neural networks. Their certified robustness has gained increasing research attention due to security concerns. Existing certified defenses for DEQs employing interval bound propagation and Lipschitz bounds not only offer conservative certification bounds but also are restricted to specific forms of DEQs. In this paper, we provide the first randomized smoothing certified defense for DEQs to solve these limitations. Our study reveals that simply applying randomized smoothing to certify DEQs provides certified robustness generalized to large-scale datasets but incurs extremely expensive computation costs. To reduce computational redundancy, we propose a novel Serialized Randomized Smoothing (SRS) approach that leverages historical information. Additionally, we derive a new certified radius estimation for SRS to theoretically ensure the correctness of our algorithm. Extensive experiments and ablation studies on image recognition demonstrate that our algorithm can significantly accelerate the certification of DEQs by up to 7x almost without sacrificing the certified accuracy. The implementation will be publicly available upon the acceptance of this work. Our code is available at https://github.com/WeizhiGao/Serialized-Randomized-Smoothing.
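For context, the certified radius used in standard randomized smoothing with Gaussian noise can be sketched as follows. This is the commonly used L2 radius for plain randomized smoothing, not the new SRS radius estimation the abstract refers to, and the function and parameter names are hypothetical.

```python
from statistics import NormalDist


def certified_radius(p_a_lower: float, sigma: float) -> float:
    """L2 radius certified by standard Gaussian randomized smoothing.

    p_a_lower: lower confidence bound on the probability that the base
               classifier returns the top class under noise N(0, sigma^2 I).
    sigma:     standard deviation of the smoothing noise.
    """
    if not 0.0 < p_a_lower < 1.0:
        raise ValueError("p_a_lower must lie strictly between 0 and 1")
    if p_a_lower <= 0.5:
        # No robustness can be certified; a certifier would abstain here.
        return 0.0
    # Radius = sigma * inverse standard normal CDF of the lower bound.
    return sigma * NormalDist().inv_cdf(p_a_lower)
```

Estimating `p_a_lower` requires many noisy forward passes per input, which is the computation cost the abstract describes as extremely expensive for DEQs.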

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Self-playing Adversarial Language Game Enhances LLM Reasoning

Pengyu Cheng
Yong Dai
Tianhao Hu
Han Xu
Zhisong Zhang
Lei Han
Nan Du
Xiaolong Li

We explore the potential of self-play training for large language models (LLMs) in a two-player adversarial language game called Adversarial Taboo. In this game, an attacker and a defender communicate around a target word only visible to the attacker. The attacker aims to induce the defender to speak the target word unconsciously, while the defender tries to infer the target word from the attacker's utterances. To win the game, both players must have sufficient knowledge about the target word and high-level reasoning ability to infer and express in this information-reserved conversation. Hence, we are curious about whether LLMs' reasoning ability can be further enhanced by Self-Playing this Adversarial language Game (SPAG). With this goal, we select several open-source LLMs and let each act as the attacker and play with a copy of itself as the defender on an extensive range of target words. Through reinforcement learning on the game outcomes, we observe that the LLMs' performances uniformly improve on a broad range of reasoning benchmarks. Furthermore, iteratively adopting this self-play process can continuously promote LLMs' reasoning abilities. The code is available at https://github.com/Linear95/SPAG.

PDF Details DOI

TMLR Journal 2024 Journal Article

Stealthy Backdoor Attack via Confidence-driven Sampling

Pengfei He
Yue Xing
Han Xu
Jie Ren
Yingqian Cui
Shenglai Zeng
Jiliang Tang
Makoto Yamada

Backdoor attacks facilitate unauthorized control in the testing stage by carefully injecting harmful triggers during the training phase of deep neural networks. Previous works have focused on improving the stealthiness of the trigger while randomly selecting samples to attack. However, we find that random selection harms the stealthiness of the model. In this paper, we identify significant pitfalls of random sampling, which make the attacks more detectable and easier to defend against. To improve the stealthiness of existing attacks, we introduce a method of strategically poisoning samples near the model's decision boundary, aiming to minimally alter the model's behavior (decision boundary) before and after backdooring. Our main insight for detecting boundary samples is exploiting the confidence scores as a metric for being near the decision boundary and selecting those to poison (inject) the attack. The proposed approach makes it significantly harder for defenders to identify the attacks. Our method is versatile and independent of any specific trigger design. We provide theoretical insights and conduct extensive experiments to demonstrate the effectiveness of the proposed method.
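The idea of using confidence scores as a proxy for closeness to the decision boundary can be sketched as a simple ranking step. This is an illustration under assumptions, not the paper's procedure: it takes the maximum class probability as the confidence score, and the function name and `budget` parameter are hypothetical.

```python
from typing import List, Sequence


def select_low_confidence(probabilities: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Return indices of the `budget` samples whose top-class probability is lowest.

    Assumption: low maximum probability is treated as a sign that a sample
    lies near the model's decision boundary.
    """
    confidences = [max(row) for row in probabilities]
    # Sort sample indices from least to most confident (stable for ties).
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return ranked[:budget]
```

Random selection would pick indices uniformly instead, which is the baseline the abstract argues against.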

PDF Details

AAAI Conference 2023 Conference Paper

Unsupervised Multi-Exposure Image Fusion Breaking Exposure Limits via Contrastive Learning

Han Xu
Liang Haochen
Jiayi Ma

This paper proposes an unsupervised multi-exposure image fusion (MEF) method via contrastive learning, termed MEF-CL. It breaks the exposure limits and performance bottlenecks faced by existing methods. MEF-CL first designs similarity constraints to preserve contents in source images. It eliminates the need for ground truth (which does not actually exist and is created artificially) and thus avoids negative impacts of inappropriate ground truth on performance and generalization. Moreover, we explore a latent feature space and apply contrastive learning in this space to guide the fused image to approximate normal-light samples and stay away from inappropriately exposed ones. In this way, characteristics of fused images (e.g., illumination, colors) can be further improved without being subject to source images. Therefore, MEF-CL is applicable to image pairs of any multiple exposures rather than a pair of under-exposed and over-exposed images mandated by existing methods. By alleviating dependence on source images, MEF-CL shows better generalization for various scenes. Consequently, our results exhibit appropriate illumination, detailed textures, and saturated colors. Qualitative, quantitative, and ablation experiments validate the superiority and generalization of MEF-CL. Our code is publicly available at https://github.com/hanna-xu/MEF-CL.

PDF Details DOI

AAAI Conference 2021 System Paper

DeepRobust: a Platform for Adversarial Attacks and Defenses

Yaxin Li
Wei Jin
Han Xu
Jiliang Tang

DeepRobust is a PyTorch platform for generating adversarial examples and building robust machine learning models for different data domains. Users can easily evaluate the attack performance against different defense methods with DeepRobust. In this paper, we introduce the functions of DeepRobust with detailed instructions. We will demonstrate that DeepRobust is a useful tool to measure deep learning model robustness and to identify the suitable countermeasures against adversarial attacks. The platform is kept updated and can be found at https://github.com/DSE-MSU/DeepRobust. More details of instructions can be found in the documentation at https://deeprobust.readthedocs.io/en/latest/.

PDF Details

NeurIPS Conference 2021 Conference Paper

Graph Neural Networks with Adaptive Residual

Xiaorui Liu
Jiayuan Ding
Wei Jin
Han Xu
Yao Ma
Zitao Liu
Jiliang Tang

Graph neural networks (GNNs) have shown their power in graph representation learning for numerous tasks. In this work, we discover an interesting phenomenon: although residual connections in the message passing of GNNs help improve the performance, they immensely amplify GNNs' vulnerability against abnormal node features. This is undesirable because in real-world applications, node features in graphs could often be abnormal, for example naturally noisy or adversarially manipulated. We analyze possible reasons to understand this phenomenon and aim to design GNNs with stronger resilience to abnormal features. Our understanding motivates us to propose and derive a simple, efficient, interpretable, and adaptive message passing scheme, leading to a novel GNN with Adaptive Residual, AirGNN. Extensive experiments under various abnormal feature scenarios demonstrate the effectiveness of the proposed algorithm.

PDF Details

AAAI Conference 2020 Conference Paper

FusionDN: A Unified Densely Connected Network for Image Fusion

Han Xu
Jiayi Ma
Zhuliang Le
Junjun Jiang
Xiaojie Guo

In this paper, we present a new unsupervised and unified densely connected network for different types of image fusion tasks, termed FusionDN. In our method, the densely connected network is trained to generate the fused image conditioned on source images. Meanwhile, a weight block is applied to obtain two data-driven weights as the retention degrees of features in different source images, which measure the quality and the amount of information in them. Similarity losses based on these weights are applied for unsupervised learning. In addition, we obtain a single model applicable to multiple fusion tasks by applying elastic weight consolidation to avoid forgetting what has been learned from previous tasks when training multiple tasks sequentially, rather than training individual models for every fusion task or jointly training tasks roughly. Qualitative and quantitative results demonstrate the advantages of FusionDN compared with state-of-the-art methods in different fusion tasks.
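Elastic weight consolidation adds a quadratic penalty that keeps parameters important to earlier tasks close to their previous values. The sketch below shows the standard form of that penalty on flat parameter lists; how FusionDN estimates importance or sets the penalty strength is not stated in the abstract, so the names and the diagonal-Fisher input are assumptions.

```python
from typing import Sequence


def ewc_penalty(
    params: Sequence[float],
    anchor_params: Sequence[float],
    fisher: Sequence[float],
    strength: float,
) -> float:
    """Standard EWC regularizer: (strength / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    params:        current parameter values while training the new task.
    anchor_params: parameter values learned on the previous task.
    fisher:        per-parameter importance (diagonal Fisher information).
    strength:      weight of the penalty relative to the new task's loss.
    """
    return 0.5 * strength * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )
```

The total objective for a new fusion task would then be that task's own loss plus this penalty, so parameters with high importance move little while the rest remain free to adapt.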

PDF Details

AAAI Conference 2020 Conference Paper

Rethinking the Image Fusion: A Fast Unified Image Fusion Network based on Proportional Maintenance of Gradient and Intensity

Hao Zhang
Han Xu
Yang Xiao
Xiaojie Guo
Jiayi Ma

In this paper, we propose a fast unified image fusion network based on proportional maintenance of gradient and intensity (PMGI), which can end-to-end realize a variety of image fusion tasks, including infrared and visible image fusion, multi-exposure image fusion, medical image fusion, multi-focus image fusion and pan-sharpening. We unify the image fusion problem into the texture and intensity proportional maintenance problem of the source images. On the one hand, the network is divided into gradient path and intensity path for information extraction. We perform feature reuse in the same path to avoid loss of information due to convolution. At the same time, we introduce the pathwise transfer block to exchange information between different paths, which can not only pre-fuse the gradient information and intensity information, but also enhance the information to be processed later. On the other hand, we define a uniform form of loss function based on these two kinds of information, which can adapt to different fusion tasks. Experiments on publicly available datasets demonstrate the superiority of our PMGI over the state-of-the-art in terms of both visual effect and quantitative metric in a variety of fusion tasks. In addition, our method is faster compared with the state-of-the-art.

PDF Details

IJCAI Conference 2019 Conference Paper

Learning a Generative Model for Fusing Infrared and Visible Images via Conditional Generative Adversarial Network with Dual Discriminators

Han Xu
Pengwei Liang
Wei Yu
Junjun Jiang
Jiayi Ma

In this paper, we propose a new end-to-end model, called dual-discriminator conditional generative adversarial network (DDcGAN), for fusing infrared and visible images of different resolutions. Unlike the pixel-level methods and existing deep learning-based methods, the fusion task is accomplished through the adversarial process between a generator and two discriminators, in addition to the specially designed content loss. The generator is trained to generate real-like fused images to fool discriminators. The two discriminators are trained to calculate the JS divergence between the probability distribution of downsampled fused images and infrared images, and the JS divergence between the probability distribution of gradients of fused images and gradients of visible images, respectively. Thus, the fused images can compensate for the features that are not constrained by the single content loss. Consequently, the prominence of thermal targets in the infrared image and the texture details in the visible image can be preserved or even enhanced in the fused image simultaneously. Moreover, by constraining and distinguishing between the downsampled fused image and the low-resolution infrared image, DDcGAN can be preferably applied to the fusion of different resolution images. Qualitative and quantitative experiments on publicly available datasets demonstrate the superiority of our method over the state-of-the-art.
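The JS divergence mentioned in the abstract has a simple closed form for discrete distributions, sketched below. This only illustrates the quantity itself. In a GAN, the discriminator estimates it implicitly through adversarial training rather than computing it this way, and the function names here are hypothetical.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats, skipping terms where p is zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: average KL of p and q to their mixture."""
    mixture = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)
```

The value is zero for identical distributions and reaches its maximum of ln 2 when the two distributions have disjoint support.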

PDF Details

AAAI Conference 2017 System Paper

SenseRun: Real-Time Running Routes Recommendation towards Providing Pleasant Running Experiences

Jiayu Long
Jia Jia
Han Xu

In this demo, we develop a mobile running application, SenseRun, that incorporates landscape experiences into route recommendation. We first define landscape experiences, the perceived enjoyment from landscape that motivates running, in terms of public natural area and traffic density. Based on landscape experiences, we categorize locations into 3 types (natural, leisure, traffic space) and assign each a different base weight. Real-time context factors (weather, season and hour of the day) are used to adjust the weights. We propose a multi-attribute method that recommends routes using these weights, based on an MVT (Marginal Value Theorem) k-shortest-paths algorithm. We also use a landscape-awareness sound algorithm to supplement landscape experiences. Experimental results show that SenseRun can enhance running experiences and helps promote regular physical activity.

PDF Details