Arrow Research search

Author name cluster

Haifeng Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

31 papers
2 author rows

Possible papers

31

AAAI Conference 2026 Conference Paper

BEE-RAG: Balanced Entropy Engineering for Retrieval-Augmented Generation

  • Yuhao Wang
  • Ruiyang Ren
  • Yucheng Wang
  • Jing Liu
  • Xin Zhao
  • Hua Wu
  • Haifeng Wang

With the rapid advancement of large language models (LLMs), retrieval-augmented generation (RAG) has emerged as a critical approach to supplement the inherent knowledge limitations of LLMs. However, due to the typically large volume of retrieved information, RAG tends to operate with long context lengths. From the perspective of entropy engineering, we identify unconstrained entropy growth and attention dilution due to long retrieval context as significant factors affecting RAG performance. In this paper, we propose the balanced entropy-engineered RAG (BEE-RAG) framework, which improves the adaptability of RAG systems to varying context lengths through the principle of entropy invariance. By leveraging balanced context entropy to reformulate attention dynamics, BEE-RAG separates attention sensitivity from context length, ensuring a stable entropy level. Building upon this, we introduce a zero-shot inference strategy for multi-importance estimation and a parameter-efficient adaptive fine-tuning mechanism to obtain the optimal balancing factor for different settings. Extensive experiments across multiple RAG tasks demonstrate the effectiveness of BEE-RAG.

JBHI Journal 2026 Journal Article

GPFD-Net: A Geometry-Pose Frequency Decoupling Network for Privacy-Preserving Human Action Recognition in Healthcare

  • Xing Li
  • Jingfan Liang
  • Ge Gao
  • Li Wang
  • Haifeng Wang
  • Shihao Han

Human Action Recognition (HAR) holds significant application value in healthcare informatics, facilitating tasks such as clinical diagnosis and rehabilitation monitoring. Point cloud sequences have emerged as a pivotal modality for balancing privacy preservation with high-fidelity geometric structural representation, ensuring anonymity while retaining critical 3D behavioral information. However, existing point cloud sequence encoding methods struggle to precisely encode micro-geometric details and macro-pose contours within the spatial dimension, as well as the dynamic heterogeneity of actions within the temporal dimension. These limitations impede the realization of high-precision clinical motion analysis. To address these challenges, we propose a Geometry-Pose Frequency Decoupling Network (GPFD-Net) for human action recognition. First, we design a Geometry-Pose Parallel-Collaborative Spatial Encoder (GPCSE). This module employs a parallel dual-stream architecture to explicitly capture and fuse complementary micro-geometric details and macro-pose contours, generating an informative geometry-enhanced pose feature sequence. Second, we introduce a Frequency-Decoupled Temporal Capturer (FDTC). This module adaptively decomposes the geometry-enhanced pose feature sequence into a smooth trend sequence and a transient detail sequence, which are subsequently processed by two parallel expert encoders via differentiated encoding to achieve robust human action recognition. Extensive experiments on four public benchmark datasets demonstrate that GPFD-Net achieves superior performance. The proposed method provides a novel paradigm for high-precision and privacy-preserving motion analysis in healthcare applications.

ICLR Conference 2025 Conference Paper

FlashMask: Efficient and Rich Mask Extension of FlashAttention

  • Guoxia Wang
  • Jinle Zeng
  • Xiyuan Xiao
  • Siming Wu
  • Jiabin Yang
  • Lujing Zheng
  • Zeyu Chen
  • Jiang Bian

The computational and memory demands of vanilla attention scale quadratically with the sequence length $N$, posing significant challenges for processing long sequences in Transformer models. FlashAttention alleviates these challenges by eliminating the $\mathcal{O}(N^2)$ memory dependency and reducing attention latency through IO-aware memory optimizations. However, its native support for certain attention mask types is limited, and it does not inherently accommodate more complex masking requirements. Previous approaches resort to using dense masks with $\mathcal{O}(N^2)$ memory complexity, leading to inefficiencies. In this paper, we propose FlashMask, an extension of FlashAttention that introduces a column-wise sparse representation of attention masks. This approach efficiently represents a wide range of mask types and facilitates the development of optimized kernel implementations. By adopting this novel representation, FlashMask achieves linear memory complexity $\mathcal{O}(N)$, making it suitable for modeling long-context sequences. Moreover, this representation enables kernel optimizations that eliminate unnecessary computations by leveraging sparsity in the attention mask, without sacrificing computational accuracy, resulting in higher computational efficiency. We evaluate FlashMask's performance in fine-tuning and alignment training of LLMs such as SFT, LoRA, DPO, and RM. FlashMask achieves significant throughput improvements, with end-to-end speedups ranging from 1.65x to 3.22x compared to the existing FlashAttention dense method. Additionally, our kernel-level comparisons demonstrate that FlashMask surpasses the latest counterpart, FlexAttention, by 12.1% to 60.7% in terms of kernel TFLOPs/s, achieving 37.8% to 62.3% of the theoretical maximum FLOPs/s on the A100 GPU.
The code is open-sourced on PaddlePaddle (https://github.com/PaddlePaddle/Paddle) and integrated into PaddleNLP (https://github.com/PaddlePaddle/PaddleNLP), supporting models with over 100 billion parameters for contexts extending up to 128K tokens.

JBHI Journal 2025 Journal Article

PEARL: Cascaded Self-Supervised Cross-Fusion Learning for Parallel MRI Acceleration

  • Qingyong Zhu
  • Bei Liu
  • Zhuo-Xu Cui
  • Chentao Cao
  • Xiaomeng Yan
  • Yuanyuan Liu
  • Jing Cheng
  • Yihang Zhou

Supervised deep learning (SDL) methodology holds promise for accelerated magnetic resonance imaging (AMRI) but is hampered by the reliance on extensive training data. Some self-supervised frameworks, such as deep image prior (DIP), have emerged, eliminating the explicit training procedure but often struggling to remove noise and artifacts under significant degradation. This work introduces a novel self-supervised accelerated parallel MRI approach called PEARL, leveraging a multiple-stream joint deep decoder with two cross-fusion schemes to accurately reconstruct one or more target images from compressively sampled k-space. Each stream comprises cascaded cross-fusion sub-block networks (SBNs) that sequentially perform combined upsampling, 2D convolution, joint attention, ReLU activation and batch normalization (BN). Among them, combined upsampling and joint attention facilitate mutual learning between multiple-stream networks by integrating multi-parameter priors in both additive and multiplicative manners. Long-range unified skip connections within SBNs ensure effective information propagation between distant cross-fusion layers. Additionally, incorporating dual-normalized edge-orientation similarity regularization into the training loss enhances detail reconstruction and prevents overfitting. Experimental results consistently demonstrate that PEARL outperforms the existing state-of-the-art (SOTA) self-supervised AMRI technologies in various MRI cases. Notably, 5-fold to 6-fold accelerated acquisition yields a 1%-2% improvement in SSIM_ROI and a 3%-6% improvement in PSNR_ROI, along with a significant 15%-20% reduction in RLNE_ROI.

NeurIPS Conference 2025 Conference Paper

Towards Large-Scale In-Context Reinforcement Learning by Meta-Training in Randomized Worlds

  • Fan Wang
  • Pengtao Shao
  • Yiming Zhang
  • Bo Yu
  • Shaoshan Liu
  • Ning Ding
  • Yang Cao
  • Yu Kang

In-Context Reinforcement Learning (ICRL) enables agents to learn automatically and on-the-fly from their interactive experiences. However, a major challenge in scaling up ICRL is the lack of scalable task collections. To address this, we propose procedurally generated tabular Markov Decision Processes, named AnyMDP. Through a carefully designed randomization process, AnyMDP is capable of generating high-quality tasks on a large scale while maintaining relatively low structural biases. To facilitate efficient meta-training at scale, we further introduce decoupled policy distillation and induce prior information in the ICRL framework. Our results demonstrate that, with a sufficiently large scale of AnyMDP tasks, the proposed model can generalize to tasks that were not considered in the training set through versatile in-context learning paradigms. The scalable task set provided by AnyMDP also enables a more thorough empirical investigation of the relationship between data distribution and ICRL performance. We further show that the generalization of ICRL potentially comes at the cost of increased task diversity and longer adaptation periods. This finding carries critical implications for scaling robust ICRL capabilities, highlighting the necessity of diverse and extensive task design, and prioritizing asymptotic performance over few-shot adaptation.

JBHI Journal 2024 Journal Article

A Two-Stage Generative Model with CycleGAN and Joint Diffusion for MRI-based Brain Tumor Detection

  • Wenxin Wang
  • Zhuo-Xu Cui
  • Guanxun Cheng
  • Chentao Cao
  • Xi Xu
  • Ziwei Liu
  • Haifeng Wang
  • Yulong Qi

Accurate detection and segmentation of brain tumors is critical for medical diagnosis. However, current supervised learning methods require extensively annotated images, and the state-of-the-art generative models used in unsupervised methods often have limitations in covering the whole data distribution. In this paper, we propose a novel framework, the Two-Stage Generative Model (TSGM), that combines Cycle Generative Adversarial Network (CycleGAN) and Variance Exploding stochastic differential equation using joint probability (VE-JP) to improve brain tumor detection and segmentation. The CycleGAN is trained on unpaired data to generate abnormal images from healthy images as a data prior. Then VE-JP is implemented to reconstruct healthy images using synthetic paired abnormal images as a guide, which alters only pathological regions but not healthy regions. Notably, our method directly learns the joint probability distribution for conditional generation. The residual between input and reconstructed images suggests the abnormalities, and a thresholding method is subsequently applied to obtain segmentation results. Furthermore, the multimodal results are weighted with different weights to further improve the segmentation accuracy. We validated our method on three datasets and compared it with other unsupervised methods for anomaly detection and segmentation. DSC scores of 0.8590 on the BraTS2020 dataset, 0.6226 on the ITCS dataset and 0.7403 on the in-house dataset show that our method achieves better segmentation performance and has better generalization.

JMLR Journal 2023 Journal Article

Implicit Regularization and Entrywise Convergence of Riemannian Optimization for Low Tucker-Rank Tensor Completion

  • Haifeng Wang
  • Jinchi Chen
  • Ke Wei

This paper is concerned with the low Tucker-rank tensor completion problem, which is about reconstructing a tensor $\mathcal{T}\in\mathbb{R}^{n\times n\times n}$ of low multilinear rank from partially observed entries. Riemannian optimization algorithms are a class of efficient methods for this problem, but the theoretical convergence analysis is still lacking. In this manuscript, we establish the entrywise convergence of the vanilla Riemannian gradient method for low Tucker-rank tensor completion under the nearly optimal sampling complexity $O(n^{3/2})$. Meanwhile, the implicit regularization phenomenon of the algorithm has also been revealed. As far as we know, this is the first work that has shown the entrywise convergence and implicit regularization property of a non-convex method for low Tucker-rank tensor completion. The analysis relies on the leave-one-out technique, and some of the technical results developed in the paper might be of broader interest in investigating the properties of other non-convex methods for this problem.

IJCAI Conference 2023 Conference Paper

Less Learn Shortcut: Analyzing and Mitigating Learning of Spurious Feature-Label Correlation

  • Yanrui Du
  • Jing Yan
  • Yan Chen
  • Jing Liu
  • Sendong Zhao
  • Qiaoqiao She
  • Hua Wu
  • Haifeng Wang

Recent research has revealed that deep neural networks often take dataset biases as a shortcut to make decisions rather than understand tasks, leading to failures in real-world applications. In this study, we focus on the spurious correlation between word features and labels that models learn from the biased data distribution of training data. In particular, we define a word highly co-occurring with a specific label as a biased word, and an example containing a biased word as a biased example. Our analysis shows that biased examples are easier for models to learn, while at prediction time, biased words make a significantly higher contribution to the models' predictions, and models tend to assign predicted labels over-relying on the spurious correlation between words and labels. To mitigate models' over-reliance on the shortcut (i.e., the spurious correlation), we propose a training strategy, Less-Learn-Shortcut (LLS): our strategy quantifies the biased degree of the biased examples and down-weights them accordingly. Experimental results on Question Matching, Natural Language Inference and Sentiment Analysis tasks show that LLS is a task-agnostic strategy and can improve model performance on adversarial data while maintaining good performance on in-domain data.

TMLR Journal 2022 Journal Article

Evolving Decomposed Plasticity Rules for Information-Bottlenecked Meta-Learning

  • Fan Wang
  • Hao Tian
  • Haoyi Xiong
  • Hua Wu
  • Jie Fu
  • Yang Cao
  • Yu Kang
  • Haifeng Wang

Artificial neural networks (ANNs) are typically confined to accomplishing pre-defined tasks by learning a set of static parameters. In contrast, biological neural networks (BNNs) can adapt to various new tasks by continually updating the neural connections based on the inputs, which is aligned with the paradigm of learning effective learning rules in addition to static parameters, \textit{e.g.}, meta-learning. Among various biologically inspired learning rules, Hebbian plasticity updates the neural network weights using local signals without the guide of an explicit target function, thus enabling an agent to learn automatically without human efforts. However, typical plastic ANNs using a large amount of meta-parameters violate the nature of the genomics bottleneck and potentially deteriorate the generalization capacity. This work proposes a new learning paradigm decomposing those connection-dependent plasticity rules into neuron-dependent rules thus accommodating $\Theta(n^2)$ learnable parameters with only $\Theta(n)$ meta-parameters. We also thoroughly study the effect of different neural modulation on plasticity. Our algorithms are tested in challenging random 2D maze environments, where the agents have to use their past experiences to shape the neural connections and improve their performances for the future. The results of our experiment validate the following: 1. Plasticity can be adopted to continually update a randomly initialized RNN to surpass pre-trained, more sophisticated recurrent models, especially when it comes to long-term memorization. 2. Following the genomics bottleneck, the proposed decomposed plasticity can be comparable to or even more effective than canonical plasticity rules in some instances.

AAAI Conference 2022 Conference Paper

Is Discourse Role Important for Emotion Recognition in Conversation?

  • Donovan Ong
  • Jian Su
  • Bin Chen
  • Anh Tuan Luu
  • Ashok Narendranath
  • Yue Li
  • Shuqi Sun
  • Yingzhan Lin

A conversation is a sequence of utterances, where each utterance plays a specific discourse role while expressing a particular emotion. This paper proposes a novel method to exploit latent discourse role information of an utterance to determine the emotion it conveys in a conversation. Specifically, we use a variant of the Variational-Autoencoder (VAE) to model the context-aware latent discourse roles of each utterance in an unsupervised way. The latent discourse role representation further equips the utterance representation with a salient clue for more accurate emotion recognition. Our experiments show that our proposed method beats the best-reported performances on three public Emotion Recognition in Conversation datasets. This proves that the discourse role information of an utterance plays an important role in the emotion recognition task, which no previous work has studied.

AAAI Conference 2021 Conference Paper

ERNIE-ViL: Knowledge Enhanced Vision-Language Representations through Scene Graphs

  • Fei Yu
  • Jiji Tang
  • Weichong Yin
  • Yu Sun
  • Hao Tian
  • Hua Wu
  • Haifeng Wang

We propose a knowledge-enhanced approach, ERNIE-ViL, which incorporates structured knowledge obtained from scene graphs to learn joint representations of vision-language. ERNIE-ViL tries to build the detailed semantic connections (objects, attributes of objects and relationships between objects) across vision and language, which are essential to vision-language cross-modal tasks. Utilizing scene graphs of visual scenes, ERNIE-ViL constructs Scene Graph Prediction tasks, i.e., Object Prediction, Attribute Prediction and Relationship Prediction tasks in the pre-training phase. Specifically, these prediction tasks are implemented by predicting nodes of different types in the scene graph parsed from the sentence. Thus, ERNIE-ViL can learn the joint representations characterizing the alignments of the detailed semantics across vision and language. After pre-training on large-scale image-text aligned datasets, we validate the effectiveness of ERNIE-ViL on 5 cross-modal downstream tasks. ERNIE-ViL achieves state-of-the-art performances on all these tasks and ranks first on the VCR leaderboard with an absolute improvement of 3.7%.

IJCAI Conference 2020 Conference Paper

Enhancing Dialog Coherence with Event Graph Grounded Content Planning

  • Jun Xu
  • Zeyang Lei
  • Haifeng Wang
  • Zheng-Yu Niu
  • Hua Wu
  • Wanxiang Che

How to generate informative, coherent and sustainable open-domain conversations is a non-trivial task. Previous work on knowledge-grounded conversation generation focuses on improving dialog informativeness with little attention to dialog coherence. In this paper, to enhance multi-turn dialog coherence, we propose to leverage event chains to help determine a sketch of a multi-turn dialog. We first extract event chains from narrative texts and connect them as a graph. We then present a novel event graph grounded Reinforcement Learning (RL) framework. It conducts high-level response content (simply an event) planning by learning to walk over the graph, and then produces a response conditioned on the planned content. In particular, we devise a novel multi-policy decision making mechanism to foster a coherent dialog with both appropriate content ordering and high contextual relevance. Experimental results indicate the effectiveness of this framework in terms of dialog coherence and informativeness.

AAAI Conference 2020 Conference Paper

ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding

  • Yu Sun
  • Shuohuan Wang
  • Yukun Li
  • Shikun Feng
  • Hao Tian
  • Hua Wu
  • Haifeng Wang

Recently pre-trained models have achieved state-of-the-art results in various language understanding tasks. Current pre-training procedures usually focus on training the model with several simple tasks to grasp the co-occurrence of words or sentences. However, besides co-occurring information, there exists other valuable lexical, syntactic and semantic information in training corpora, such as named entities, semantic closeness and discourse relations. In order to extract the lexical, syntactic and semantic information from training corpora, we propose a continual pre-training framework named ERNIE 2.0 which incrementally builds pre-training tasks and then learns pre-trained models on these constructed tasks via continual multi-task learning. Based on this framework, we construct several tasks and train the ERNIE 2.0 model to capture lexical, syntactic and semantic aspects of information in the training data. Experimental results demonstrate that the ERNIE 2.0 model outperforms BERT and XLNet on 16 tasks including English tasks on GLUE benchmarks and several similar tasks in Chinese. The source codes and pre-trained models have been released at https://github.com/PaddlePaddle/ERNIE.

IJCAI Conference 2020 Conference Paper

ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation

  • Dongling Xiao
  • Han Zhang
  • Yukun Li
  • Yu Sun
  • Hao Tian
  • Hua Wu
  • Haifeng Wang

Current pre-training works in natural language generation pay little attention to the problem of exposure bias on downstream tasks. To address this issue, we propose an enhanced multi-flow sequence-to-sequence pre-training and fine-tuning framework named ERNIE-GEN, which bridges the discrepancy between training and inference with an infilling generation mechanism and a noise-aware generation method. To make generation closer to human writing patterns, this framework introduces a span-by-span generation flow that trains the model to predict semantically-complete spans consecutively rather than predicting word by word. Unlike existing pre-training methods, ERNIE-GEN incorporates multi-granularity target sampling to construct pre-training data, which enhances the correlation between encoder and decoder. Experimental results demonstrate that ERNIE-GEN achieves state-of-the-art results with a much smaller amount of pre-training data and parameters on a range of language generation tasks, including abstractive summarization (Gigaword and CNN/DailyMail), question generation (SQuAD), dialogue generation (Persona-Chat) and generative question answering (CoQA). The source codes and pre-trained models have been released at https://github.com/PaddlePaddle/ERNIE/ernie-gen.

AAAI Conference 2020 Conference Paper

Knowledge Graph Grounded Goal Planning for Open-Domain Conversation Generation

  • Jun Xu
  • Haifeng Wang
  • Zhengyu Niu
  • Hua Wu
  • Wanxiang Che

Previous neural models on open-domain conversation generation have no effective mechanisms to manage chatting topics, and tend to produce less coherent dialogs. Inspired by the strategies in human-human dialogs, we divide the task of multi-turn open-domain conversation generation into two sub-tasks: explicit goal (chatting about a topic) sequence planning and goal completion by topic elaboration. To this end, we propose a three-layer Knowledge aware Hierarchical Reinforcement Learning based Model (KnowHRL). Specifically, for the first sub-task, the upper-layer policy learns to traverse a knowledge graph (KG) in order to plan a high-level goal sequence towards a good balance between dialog coherence and topic consistency with user interests. For the second sub-task, the middle-layer policy and the lower-layer one work together to produce an in-depth multi-turn conversation about a single topic with a goal-driven generation mechanism. The capability of goal-sequence planning enables chatbots to conduct proactive open-domain conversations towards recommended topics, which has many practical applications. Experiments demonstrate that our model outperforms state-of-the-art baselines in terms of user-interest consistency, dialog coherence, and knowledge accuracy.

TIST Journal 2020 Journal Article

Multi-Task Learning for Entity Recommendation and Document Ranking in Web Search

  • Jizhou Huang
  • Haifeng Wang
  • Wei Zhang
  • Ting Liu

Entity recommendation, providing users with an improved search experience by proactively recommending related entities to a given query, has become an indispensable feature of today’s Web search engine. Existing studies typically only consider the query issued at the current timestep while ignoring the in-session user search behavior (short-term search history) or historical user search behavior across all sessions (long-term search history) when generating entity recommendations. As a consequence, they may fail to recommend entities of interest relevant to a user’s actual information need. In this work, we believe that both short-term and long-term search history convey valuable evidence that could help understand the user’s search intent behind a query, and take both of them into consideration for entity recommendation. Furthermore, there has been little work on exploring whether the use of other companion tasks in Web search such as document ranking as auxiliary tasks could improve the performance of entity recommendation. To this end, we propose a multi-task learning framework with deep neural networks (DNNs) to jointly learn and optimize two companion tasks in Web search engines: entity recommendation and document ranking, which can be easily trained in an end-to-end manner. Specifically, we regard document ranking as an auxiliary task to improve the main task of entity recommendation, where the representations of queries, sessions, and users are shared across all tasks and optimized by the multi-task objective during training. We evaluate our approach using large-scale, real-world search logs of a widely-used commercial Web search engine. We also performed extensive ablation experiments over a number of facets of the proposed multi-task DNN model to figure out their relative importance. 
The experimental results show that both short-term and long-term search history can bring significant improvements in recommendation effectiveness, and the combination of both outperforms using either of them individually. In addition, the experiments show that the performance of both entity recommendation and document ranking can be significantly improved, which demonstrates the effectiveness of using multi-task learning to jointly optimize the two companion tasks in Web search.

AAAI Conference 2020 Conference Paper

Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding

  • Yuchen Liu
  • Jiajun Zhang
  • Hao Xiong
  • Long Zhou
  • Zhongjun He
  • Hua Wu
  • Haifeng Wang
  • Chengqing Zong

Speech-to-text translation (ST), which translates source language speech into target language text, has attracted intensive attention in recent years. Compared to the traditional pipeline system, the end-to-end ST model has potential benefits of lower latency, smaller model size, and less error propagation. However, it is notoriously difficult to implement such a model without transcriptions as an intermediate. Existing works generally apply multi-task learning to improve translation quality by jointly training end-to-end ST along with automatic speech recognition (ASR). However, different tasks in this method cannot utilize information from each other, which limits the improvement. Other works propose a two-stage model where the second model can use the hidden state from the first one, but its cascade manner greatly affects the efficiency of the training and inference process. In this paper, we propose a novel interactive attention mechanism which enables ASR and ST to perform synchronously and interactively in a single model. Specifically, the generation of transcriptions and translations not only relies on its previous outputs but also the outputs predicted in the other task. Experiments on TED speech translation corpora have shown that our proposed model can outperform strong baselines on the quality of speech translation and achieve better speech recognition performances as well.

AAAI Conference 2019 Conference Paper

Joint Extraction of Entities and Overlapping Relations Using Position-Attentive Sequence Labeling

  • Dai Dai
  • Xinyan Xiao
  • Yajuan Lyu
  • Shan Dou
  • Qiaoqiao She
  • Haifeng Wang

Joint entity and relation extraction detects entities and relations using a single model. In this paper, we present a novel unified joint extraction model which directly tags entity and relation labels according to a query word position p, i.e., detecting the entity at p, and identifying entities at other positions that have a relationship with the former. To this end, we first design a tagging scheme to generate n tag sequences for an n-word sentence. Then a position-attention mechanism is introduced to produce different sentence representations for every query position to model these n tag sequences. In this way, our method can simultaneously extract all entities and their types, as well as all overlapping relations. Experiment results show that our framework performs significantly better on extracting overlapping relations as well as detecting long-range relations, and thus we achieve state-of-the-art performance on two public datasets.

AAAI Conference 2019 Conference Paper

Modeling Coherence for Discourse Neural Machine Translation

  • Hao Xiong
  • Zhongjun He
  • Hua Wu
  • Haifeng Wang

Discourse coherence plays an important role in the translation of a text. However, most previously reported models focus on improving performance over individual sentences while ignoring cross-sentence links and dependencies, which affects the coherence of the text. In this paper, we propose to use discourse context and reward to refine the translation quality from the discourse perspective. In particular, we generate the translation of individual sentences at first. Next, we deliberate the preliminarily produced translations, and train the model to learn the policy that produces discourse coherent text by a reward teacher. Practical results on multiple discourse test datasets indicate that our model significantly improves the translation quality over the state-of-the-art baseline system by +1.23 BLEU score. Moreover, our model generates more discourse coherent text and obtains +2.2 BLEU improvements when evaluated by discourse metrics.

IJCAI Conference 2018 Conference Paper

Improving Entity Recommendation with Search Log and Multi-Task Learning

  • Jizhou Huang
  • Wei Zhang
  • Yaming Sun
  • Haifeng Wang
  • Ting Liu

Entity recommendation, providing search users with an improved experience by assisting them in finding related entities for a given query, has become an indispensable feature of today's Web search engine. Existing studies typically only consider the query issued at the current time step while ignoring the in-session preceding queries. Thus, they typically fail to handle ambiguous queries such as "apple" because the model could not understand which apple (the company or the fruit) is being talked about. In this work, we believe that the in-session contexts convey valuable evidence that could facilitate the semantic modeling of queries, and take that into consideration for entity recommendation. Furthermore, in order to better model the semantics of queries, we learn the model in a multi-task learning setting where the query representation is shared across entity recommendation and context-aware ranking. We evaluate our approach using large-scale, real-world search logs of a widely used commercial Web search engine. The experimental results show that incorporating context information significantly improves entity recommendation, and learning the model in a multi-task learning setting could bring further improvements.

IJCAI Conference 2017 Conference Paper

Learning to Explain Entity Relationships by Pairwise Ranking with Convolutional Neural Networks

  • Jizhou Huang
  • Wei Zhang
  • Shiqi Zhao
  • Shiqiang Ding
  • Haifeng Wang

Providing a plausible explanation for the relationship between two related entities is an important task in some applications of knowledge graphs, such as search engines. However, most existing methods require a large amount of manually labeled training data, which makes them impractical for large-scale knowledge graphs due to expensive data annotation. In addition, these methods typically rely on costly handcrafted features. In this paper, we propose an effective pairwise ranking model that leverages the clickthrough data of a Web search engine to address these two problems. We first construct large-scale training data from the query-title pairs derived from the clickthrough data. Then, we build a pairwise ranking model that employs a convolutional neural network to automatically learn relevant features. The proposed model can be easily trained with backpropagation to perform the ranking task. The experiments show that our method significantly outperforms several strong baselines.
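
The pairwise ranking objective can be illustrated independently of the CNN feature extractor. A hinge-style pairwise loss is one common choice for this kind of model; the paper's exact loss may differ, and the scores below are placeholders for what the network would assign to (query, explanation) candidates.

```python
import numpy as np

def pairwise_hinge_loss(score_pos, score_neg, margin=1.0):
    """Pairwise ranking loss: penalize the model whenever the relevant
    (clicked) candidate fails to outscore the irrelevant one by `margin`."""
    return np.maximum(0.0, margin - (score_pos - score_neg))

# Hypothetical CNN scores for a clicked vs. non-clicked candidate.
print(pairwise_hinge_loss(2.5, 0.7))  # well separated -> 0.0
print(pairwise_hinge_loss(1.0, 0.8))  # inside the margin -> 0.8
```

Training on such pairs only requires knowing which of two candidates is better (derivable from clicks), not absolute relevance labels, which is what makes clickthrough data usable as weak supervision.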

JAIR Journal 2016 Journal Article

A Distributed Representation-Based Framework for Cross-Lingual Transfer Parsing

  • Jiang Guo
  • Wanxiang Che
  • David Yarowsky
  • Haifeng Wang
  • Ting Liu

This paper investigates the problem of cross-lingual transfer parsing, aiming at inducing dependency parsers for low-resource languages while using only training data from a resource-rich language (e.g., English). Existing model transfer approaches typically exclude lexical features, since they are not directly transferable across languages. In this paper, we bridge the lexical feature gap by using distributed feature representations and their composition. We provide two algorithms for inducing cross-lingual distributed representations of words, which map vocabularies from two different languages into a common vector space. Consequently, both lexical features and non-lexical features can be used in our model for cross-lingual transfer. Furthermore, our framework is flexible enough to incorporate additional useful features such as cross-lingual word clusters. Our combined contributions achieve an average relative error reduction of 10.9% in labeled attachment score as compared with the delexicalized parser, trained on English universal treebank and transferred to three other languages. It also significantly outperforms state-of-the-art delexicalized models augmented with projected cluster features on identical data. Finally, we demonstrate that our models can be further boosted with minimal supervision (e.g., 100 annotated sentences) from target languages, which is of great significance for practical usage.
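
One standard way to map two monolingual embedding spaces into a common vector space is to fit a linear projection on a seed translation dictionary by least squares. The paper's two induction algorithms differ from this, so the snippet below is only an illustrative baseline on synthetic data, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy monolingual embeddings for 50 "words" in each language, dim 4.
src = rng.normal(size=(50, 4))
# Pretend the target space is a linear transform of the source plus noise.
true_map = rng.normal(size=(4, 4))
tgt = src @ true_map + 0.01 * rng.normal(size=(50, 4))

# Seed dictionary: the first 30 word pairs are known translations.
# Solve src[:30] @ W ~= tgt[:30] in the least-squares sense.
W, *_ = np.linalg.lstsq(src[:30], tgt[:30], rcond=None)

# Project the remaining source words into the target space.
projected = src[30:] @ W
err = np.linalg.norm(projected - tgt[30:]) / np.linalg.norm(tgt[30:])
print(round(err, 3))  # small relative error
```

Once both vocabularies live in one space, a parser trained with lexical features in the source language can score target-language words directly, which is exactly the lexical gap the paper addresses.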

AAAI Conference 2016 Conference Paper

A Representation Learning Framework for Multi-Source Transfer Parsing

  • Jiang Guo
  • Wanxiang Che
  • David Yarowsky
  • Haifeng Wang
  • Ting Liu

Cross-lingual model transfer has been a promising approach for inducing dependency parsers for low-resource languages where annotated treebanks are not available. The major obstacles for the model transfer approach are two-fold: (1) lexical features are not directly transferable across languages; (2) target language-specific syntactic structures are difficult to recover. To address these two challenges, we present a novel representation learning framework for multi-source transfer parsing. Our framework allows multi-source transfer parsing to use full lexical features straightforwardly. By evaluating on the Google universal dependency treebanks (v2.0), our best models yield an absolute improvement of 6.53% in averaged labeled attachment score, as compared with delexicalized multi-source transfer models. We also significantly outperform the most recently proposed state-of-the-art transfer system.

IJCAI Conference 2016 Conference Paper

Generating Recommendation Evidence Using Translation Model

  • Jizhou Huang
  • Shiqi Zhao
  • Shiqiang Ding
  • Haiyang Wu
  • Mingming Sun
  • Haifeng Wang

Entity recommendation, providing entity suggestions relevant to the query that a user is searching for, has become a key feature of today's web search engine. Although related entities are relevant to users' search queries, users sometimes cannot easily understand the recommended entities without supporting evidence. This paper proposes a statistical model consisting of four sub-models to generate evidence for entities, which can help users better understand each recommended entity and figure out the connections between the recommended entities and a given query. The experiments show that our method is domain independent and can generate catchy and interesting evidence in the application of entity recommendation.

AAAI Conference 2016 Conference Paper

Improved Neural Machine Translation with SMT Features

  • Wei He
  • Zhongjun He
  • Hua Wu
  • Haifeng Wang

Neural machine translation (NMT) conducts end-to-end translation with a source language encoder and a target language decoder, achieving promising translation performance. However, as a newly emerged approach, the method has some limitations. An NMT system usually has to restrict its vocabulary to a certain size to avoid time-consuming training and decoding, which causes a serious out-of-vocabulary problem. Furthermore, the decoder lacks a mechanism to guarantee that all source words are translated, and it usually favors short translations, resulting in fluent but inadequate output. To solve these problems, we incorporate statistical machine translation (SMT) features, such as a translation model and an n-gram language model, into the NMT model under the log-linear framework. Our experiments show that the proposed method significantly improves the translation quality of a state-of-the-art NMT system on Chinese-to-English translation tasks. Our method produces a gain of up to 2.33 BLEU score on NIST open test sets.
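
A log-linear framework scores each candidate translation as a weighted sum of log feature values. The feature names, probabilities, and weights below are illustrative, not taken from the paper; they only show how NMT and SMT signals can be combined into one ranking score.

```python
import math

def loglinear_score(features, weights):
    """Log-linear model: score = sum_i lambda_i * log f_i(candidate)."""
    return sum(weights[name] * math.log(features[name]) for name in weights)

# Hypothetical feature values for two candidate translations:
# NMT probability, SMT translation-model probability, n-gram LM probability.
candidates = {
    "cand_a": {"nmt": 0.30, "tm": 0.20, "lm": 0.10},
    "cand_b": {"nmt": 0.25, "tm": 0.35, "lm": 0.30},
}
weights = {"nmt": 1.0, "tm": 0.5, "lm": 0.5}

best = max(candidates, key=lambda c: loglinear_score(candidates[c], weights))
print(best)
```

Here the SMT features outweigh a small NMT advantage, so the second candidate wins; tuning the lambda weights is what balances fluency (NMT, LM) against adequacy (translation model).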

TIST Journal 2011 Journal Article

Two-Word Collocation Extraction Using Monolingual Word Alignment Method

  • Zhanyi Liu
  • Haifeng Wang
  • Hua Wu
  • Sheng Li

Statistical bilingual word alignment has been well studied in the field of machine translation. This article adapts the bilingual word alignment algorithm to a monolingual scenario to extract collocations from a monolingual corpus, based on the observation that the words in a collocation tend to co-occur in similar contexts, as in bilingual word alignment. First, the monolingual corpus is replicated to generate a parallel corpus, in which each sentence pair consists of two identical sentences. Next, the monolingual word alignment algorithm is employed to align potentially collocated words. Finally, the aligned word pairs are ranked according to their alignment scores, and candidates with higher scores are extracted as collocations. We conducted experiments on Chinese and English corpora respectively. Compared with previous approaches that use association measures to extract collocations from co-occurring word pairs within a given window, our method achieves higher precision and recall. According to human evaluation, our method achieves precisions of 62% on a Chinese corpus and 64% on an English corpus. In particular, we can extract collocations with longer spans, achieving an even higher precision of 83% on long-span (>6 words) Chinese collocations.
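
The three-stage pipeline (replicate, align, rank) can be sketched on a toy corpus. For brevity, a plain co-occurrence count stands in for the monolingual word alignment model in stage 2, even though that alignment model, rather than a simple association score, is the paper's actual contribution.

```python
from collections import Counter
from itertools import combinations

corpus = [
    "strong tea tastes good",
    "he drinks strong tea",
    "strong tea with milk",
    "good milk",
]

# Stage 1: replicate the corpus into "parallel" identical sentence pairs.
parallel = [(s, s) for s in corpus]

# Stage 2: score candidate word pairs. A real implementation would run
# the monolingual word aligner over `parallel`; here a co-occurrence
# count within each sentence is a stand-in for the alignment score.
scores = Counter()
for left, _right in parallel:
    words = sorted(set(left.split()))
    for a, b in combinations(words, 2):
        scores[(a, b)] += 1

# Stage 3: rank pairs by score and extract the top ones as collocations.
top = scores.most_common(2)
print(top)
```

On this corpus the pair ("strong", "tea") co-occurs most often and is extracted first; the alignment-based score in the paper additionally uses context similarity rather than raw counts, which is what lets it handle long-span collocations.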

ICRA Conference 2007 Conference Paper

Optimal Multiperiod Inventory Decisions with Partially Observed Markovian Supply Information

  • Haifeng Wang
  • Houmin Yan

This paper considers a multiperiod newsvendor problem with partially observed supply-capacity information that evolves as a Markovian process. The supply capacity is fully observed by the buyer when the capacity is smaller than the buyer's order quantity. Otherwise, the buyer only knows that the current-period supply capacity is no smaller than its order quantity. Based on these two types of observations, the buyer updates its forecast of future supply capacity accordingly. With a dynamic programming formulation, we prove the existence of a unique optimal ordering policy.
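
The flavor of such a model can be illustrated with a minimal finite-horizon dynamic program. For simplicity, the sketch below treats the supply capacity as fully observed and i.i.d., whereas the paper handles partially observed Markovian capacity and belief updating; all parameter values are made up for illustration.

```python
import itertools

# Toy parameters (illustrative, not from the paper).
T = 3                      # planning horizon
c, h, p = 1.0, 0.5, 4.0    # unit purchase, holding, shortage costs
demand = {0: 0.3, 1: 0.4, 2: 0.3}     # demand distribution
capacity = {1: 0.5, 3: 0.5}           # i.i.d. supply-capacity distribution
states = range(0, 6)                  # discretized inventory levels
orders = range(0, 4)                  # candidate order quantities

V = {t: {} for t in range(T + 1)}
V[T] = {x: 0.0 for x in states}       # terminal value: no salvage/penalty

for t in reversed(range(T)):
    for x in states:
        best = float("inf")
        for q in orders:
            exp_cost = 0.0
            for (d, pd), (cap, pc) in itertools.product(demand.items(),
                                                        capacity.items()):
                recv = min(q, cap)            # capacity truncates the order
                nxt = max(0, min(x + recv - d, max(states)))
                stage = (c * recv + h * max(0, x + recv - d)
                         + p * max(0, d - x - recv))
                exp_cost += pd * pc * (stage + V[t + 1][nxt])
            best = min(best, exp_cost)
        V[t][x] = best

print(round(V[0][0], 3))  # expected optimal cost starting with no inventory
```

In the paper's partially observed setting, the state would additionally include a belief over the Markovian capacity process, updated differently depending on whether the order was cut short (capacity observed exactly) or filled in full (capacity only bounded below).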