Arrow Research search

Author name cluster

Dewen Hu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

23 papers
2 author rows

Possible papers (23)

JBHI Journal 2025 Journal Article

Cognitive Load Prediction From Multimodal Physiological Signals Using Multiview Learning

  • Yingxin Liu
  • Yang Yu
  • Hong Tao
  • Zeqi Ye
  • Si Wang
  • Hao Li
  • Dewen Hu
  • Zongtan Zhou

Predicting cognitive load is a crucial issue in the emerging field of human-computer interaction and holds significant practical value, particularly in flight scenarios. Although previous studies have achieved efficient cognitive load classification, new research is still needed to adapt current state-of-the-art multimodal fusion methods. Here, we propose a feature selection framework based on multiview learning to address the challenge of information redundancy and reveal the common physiological mechanisms underlying cognitive load. Specifically, multimodal signal features (electroencephalogram (EEG), electrodermal activity (EDA), electrocardiogram (ECG), electrooculogram (EOG), and eye movements) at three cognitive load levels were estimated during multiattribute task battery (MATB) tasks performed by 22 healthy participants and fed into a feature selection-multiview classification with cohesion and diversity (FS-MCCD) framework. The optimized feature set was extracted from the original feature set by combining the weight of each view with the individual feature weights to formulate the ranking criterion. The cognitive load prediction model, evaluated on real-time classification results, achieved an average accuracy of 81.08% and an average F1-score of 80.94% for three-class classification across the 22 participants. Furthermore, the weights of the physiological signal features revealed the physiological mechanisms related to cognitive load: heightened cognitive load was linked to amplified δ and θ power in the frontal lobe, reduced α power in the parietal lobe, and an increase in pupil diameter. Thus, the proposed multimodal feature fusion framework demonstrates the effectiveness and efficiency of using these features to predict cognitive load.
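The abstract's ranking criterion (combining each view's weight with the weights of its features) can be illustrated with a minimal sketch. This is not the paper's FS-MCCD code; the two views, their weights, and the product-scoring rule are invented for illustration only:

```python
def rank_features(view_weights, feature_weights, top_k):
    """Score each feature by the product of its view's weight and its
    own weight, then keep the top_k highest-scoring (view, feature) pairs."""
    scores = []
    for v, (wv, fw) in enumerate(zip(view_weights, feature_weights)):
        for j, wj in enumerate(fw):
            scores.append((wv * wj, v, j))
    scores.sort(reverse=True)  # highest combined weight first
    return [(v, j) for _, v, j in scores[:top_k]]

# Two toy views (say, EEG and EDA) with two features each.
selected = rank_features([0.7, 0.3], [[0.9, 0.1], [0.8, 0.5]], top_k=2)
# selected == [(0, 0), (1, 0)]: the strongest feature from each view
```

The point of weighting by view as well as by feature is that a moderately weighted feature in a highly informative view can outrank a strong feature in a weak view.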

IROS Conference 2025 Conference Paper

ETA: Learning Optical Flow with Efficient Temporal Attention

  • Bo Wang 0144
  • Zhenping Sun
  • Yang Yu 0014
  • Li Liu 0002
  • Jian Li 0003
  • Dewen Hu

Considering the potential of using multi-frame information to solve the occlusion problem, we introduce a novel idea of multi-frame information integration, which uses the attention mechanism to fuse temporal information from the previous frame. This idea effectively improves estimation accuracy in occluded regions and optimizes inference speed under multi-frame settings. Meanwhile, we introduce the concept of attention confidence, an explicit criterion that lets the model exploit useful attention information more efficiently. Furthermore, we propose an Efficient Temporal Attention network (ETA), which achieves promising results on the Sintel and KITTI benchmarks, notably a 9.4% error reduction compared to the baseline method GMA on Sintel (test) Clean.
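The idea of gating temporal fusion on an attention-confidence criterion can be sketched in a few lines. This is a toy stand-in, not ETA's architecture: treating the peak softmax weight as "confidence" and the threshold value are assumptions of this sketch:

```python
import math

def softmax(xs):
    m = max(xs)                         # shift for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def fuse_with_confidence(query_scores, prev_features, conf_threshold=0.5):
    """Fuse previous-frame features weighted by attention, but only when
    the peak attention weight (a toy 'confidence') is high enough."""
    attn = softmax(query_scores)
    confidence = max(attn)              # assumed confidence criterion
    if confidence < conf_threshold:
        return None, confidence         # fall back to the current frame
    fused = sum(a * f for a, f in zip(attn, prev_features))
    return fused, confidence
```

A peaked attention distribution clears the threshold and contributes temporal information; a flat (uninformative) one is ignored, which is one way an explicit confidence value can save computation.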

JBHI Journal 2025 Journal Article

Few-Shot Class-Incremental Learning for Retinal Disease Recognition

  • Jinghua Zhang
  • Peng Zhao
  • Yongkun Zhao
  • Chen Li
  • Dewen Hu

Few-Shot Class-Incremental Learning (FSCIL) techniques are essential for developing Deep Learning (DL) models that can continuously learn new classes with limited samples while retaining existing knowledge. This capability is particularly crucial for DL-based retinal disease diagnosis systems, where acquiring large annotated datasets is challenging and disease phenotypes evolve over time. This paper introduces Re-FSCIL, a novel framework for Few-Shot Class-Incremental Retinal Disease Recognition (FSCIRDR). Re-FSCIL integrates the RETFound model with a fine-grained module, employing a forward-compatible training strategy to improve adaptability, supervised contrastive learning to enhance feature discrimination, and feature fusion for robust representation quality. We convert existing datasets into the FSCIL format and reproduce numerous representative FSCIL methods to create two new benchmarks, RFMiD38 and JSIEC39, specifically for FSCIRDR. Our experimental results demonstrate that Re-FSCIL achieves state-of-the-art (SOTA) performance, significantly surpassing existing FSCIL methods on these benchmarks.
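The supervised contrastive learning the abstract mentions can be illustrated with a minimal, dependency-free sketch of a generic SupCon-style loss (not Re-FSCIL's actual implementation; the toy embeddings and temperature are made up):

```python
import math

def supcon_loss(features, labels, temp=0.1):
    """Generic supervised contrastive loss: pull together normalised
    features that share a label, push apart all others."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    def norm(a):
        n = math.sqrt(dot(a, a))
        return [x / n for x in a]
    z = [norm(f) for f in features]
    total, count = 0.0, 0
    for i in range(len(z)):
        pos = [j for j in range(len(z)) if j != i and labels[j] == labels[i]]
        if not pos:
            continue  # anchors without positives contribute nothing
        denom = sum(math.exp(dot(z[i], z[k]) / temp)
                    for k in range(len(z)) if k != i)
        for j in pos:
            total += -math.log(math.exp(dot(z[i], z[j]) / temp) / denom)
            count += 1
    return total / count

# Toy 2-D embeddings: two classes, same-class vectors nearly aligned.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
```

With the true labels `[0, 0, 1, 1]` the loss is low (positives are aligned); shuffling the labels so positives are near-orthogonal drives it up, which is exactly the discrimination pressure the framework exploits.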

NeurIPS Conference 2025 Conference Paper

Fully Spiking Neural Networks for Unified Frame-Event Object Tracking

  • Jingjun Yang
  • Liangwei Fan
  • Jinpu Zhang
  • Xiangkai Lian
  • Hui Shen
  • Dewen Hu

The integration of image and event streams offers a promising approach for achieving robust visual object tracking in complex environments. However, current fusion methods achieve high performance at the cost of significant computational overhead and struggle to efficiently extract the sparse, asynchronous information from event streams, failing to leverage the energy-efficient advantages of event-driven spiking paradigms. To address this challenge, we propose the first fully Spiking Frame-Event Tracking framework called SpikeFET. This network achieves synergistic integration of convolutional local feature extraction and Transformer-based global modeling within the spiking paradigm, effectively fusing frame and event data. To overcome the degradation of translation invariance caused by convolutional padding, we introduce a Random Patchwork Module (RPM) that eliminates positional bias through randomized spatial reorganization and learnable type encoding while preserving residual structures. Furthermore, we propose a Spatial-Temporal Regularization (STR) strategy that overcomes similarity metric degradation from asymmetric features by enforcing spatio-temporal consistency among temporal template features in latent space. Extensive experiments across multiple benchmarks demonstrate that the proposed framework achieves superior tracking accuracy over existing methods while significantly reducing power consumption, attaining an optimal balance between performance and efficiency.
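The energy efficiency of the spiking paradigm comes from neurons that emit sparse binary spikes instead of dense activations. A minimal leaky integrate-and-fire (LIF) step illustrates this; it is a textbook neuron model, not SpikeFET's actual unit, and the decay constant and threshold here are arbitrary:

```python
def lif_step(v, current, tau=2.0, v_th=1.0):
    """One step of a leaky integrate-and-fire neuron: decay the membrane
    potential, add input current, and emit a binary spike on threshold."""
    v = v / tau + current
    spike = 1 if v >= v_th else 0
    if spike:
        v = 0.0                 # hard reset after firing
    return v, spike

# Sparse inputs: the neuron stays silent until accumulated drive
# crosses the threshold, so most steps cost no downstream compute.
v, spikes = 0.0, []
for i in [0.6, 0.6, 0.0, 0.9]:
    v, s = lif_step(v, i)
    spikes.append(s)
print(spikes)  # → [0, 0, 0, 1]
```

Because activity is binary and mostly zero, multiply-accumulate work downstream collapses to occasional additions, which is the efficiency argument behind event-driven fusion.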

ICRA Conference 2025 Conference Paper

HGSLoc: 3DGS-Based Heuristic Camera Pose Refinement

  • Zhongyan Niu
  • Zhen Tan 0002
  • Jinpu Zhang
  • Xueliang Yang
  • Dewen Hu

Visual localization refers to the process of determining camera position and orientation within a known scene representation. This task is often complicated by factors such as changes in illumination and variations in viewing angle. In this paper, we propose HGSLoc, a novel lightweight plug-and-play pose optimization framework, which integrates 3D reconstruction with a heuristic refinement strategy to achieve higher pose estimation accuracy. Specifically, we introduce an explicit geometric map for 3D representation and high-fidelity rendering, allowing the generation of high-quality synthesized views to support accurate visual localization. Our method demonstrates higher localization accuracy than NeRF-based neural rendering localization approaches. Our heuristic refinement strategy can quickly locate the target node, and a step-level optimization scheme further enhances pose accuracy in scenarios with small errors. With carefully designed heuristic functions, it offers efficient optimization, enabling rapid error reduction in rough localization estimates. Our method mitigates the dependence on complex neural network models while demonstrating improved robustness against noise and higher localization accuracy in challenging environments, compared to neural network joint optimization strategies. The optimization framework proposed in this paper introduces a novel approach to visual localization by integrating the advantages of 3D reconstruction and the heuristic refinement strategy, and demonstrates strong performance across multiple benchmark datasets, including 7Scenes and the Deep Blending dataset. The implementation of our method has been released at https://github.com/anchang699/HGSLoc.
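The general shape of heuristic, step-level pose refinement can be sketched with a toy coordinate search: perturb the pose along each axis, keep improvements as scored by a rendering-error function, and shrink the step once no move helps. This is an illustrative stand-in, not HGSLoc's heuristic; the quadratic error and 2-D "pose" are invented:

```python
def refine_pose(pose, error_fn, step=0.1, min_step=0.01):
    """Greedy heuristic refinement: try +/-step perturbations per axis,
    keep any that lower the error, halve the step when stuck
    (a toy stand-in for the step-level strategy described above)."""
    best_err = error_fn(pose)
    while step >= min_step:
        improved = False
        for axis in range(len(pose)):
            for delta in (step, -step):
                cand = list(pose)
                cand[axis] += delta
                err = error_fn(cand)
                if err < best_err:
                    pose, best_err, improved = cand, err, True
        if not improved:
            step /= 2            # finer steps once near the optimum
    return pose, best_err

# Stand-in "rendering error": quadratic bowl with minimum at (1.0, -0.5).
target = (1.0, -0.5)
err = lambda p: sum((a - b) ** 2 for a, b in zip(p, target))
pose, e = refine_pose([0.0, 0.0], err)
```

Large steps cover coarse localization error quickly; the step-halving schedule then polishes the estimate, mirroring the coarse-then-fine behaviour the abstract describes.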

IROS Conference 2025 Conference Paper

Tracking Any Point with Frame-Event Fusion Network at High Frame Rate

  • Jiaxiong Liu
  • Bo Wang 0144
  • Zhen Tan 0002
  • Jinpu Zhang
  • Hui Shen 0004
  • Dewen Hu

Tracking any point based on image frames is constrained by frame rates, leading to instability in high-speed scenarios and limited generalization in real-world applications. To overcome these limitations, we propose an image-event fusion point tracker, FE-TAP, which combines the contextual information of image frames with the high temporal resolution of events, achieving high-frame-rate and robust point tracking under various challenging conditions. Specifically, we designed an Evolution Fusion module (EvoFusion) to model the image generation process guided by events. This module can effectively integrate valuable information from both modalities, which operate at different frequencies. To achieve smoother point trajectories, we employed a transformer-based refinement strategy that iteratively updates the points' trajectories and features. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches, improving expected feature age by 24% on the EDS dataset. Finally, we qualitatively validated the robustness of our algorithm in real driving scenarios using our custom-designed image-event synchronization device.

IROS Conference 2024 Conference Paper

TD-NeRF: Novel Truncated Depth Prior for Joint Camera Pose and Neural Radiance Field Optimization

  • Zhen Tan 0002
  • Zongtan Zhou
  • Yangbing Ge
  • Zi Wang
  • Xieyuanli Chen
  • Dewen Hu

The reliance on accurate camera poses is a significant barrier to the widespread deployment of Neural Radiance Fields (NeRF) models for 3D reconstruction and SLAM tasks. Existing methods introduce monocular depth priors to jointly optimize the camera poses and NeRF, but they fail to fully exploit the depth priors and neglect the impact of their inherent noise. In this paper, we propose Truncated Depth NeRF (TD-NeRF), a novel approach that enables training NeRF from unknown camera poses by jointly optimizing learnable parameters of the radiance field and camera poses. Our approach explicitly utilizes monocular depth priors through three key advancements: 1) we propose a novel depth-based ray sampling strategy based on the truncated normal distribution, which improves the convergence speed and accuracy of pose estimation; 2) to circumvent local minima and refine depth geometry, we introduce a coarse-to-fine training strategy that progressively improves depth precision; 3) we propose a more robust inter-frame point constraint that enhances robustness against depth noise during training. Experimental results on three datasets demonstrate that TD-NeRF achieves superior performance in the joint optimization of camera pose and NeRF, surpassing prior works, and generates more accurate depth geometry. The implementation of our method has been released at https://github.com/nubot-nudt/TD-NeRF.
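The first advancement, depth-based ray sampling from a truncated normal, can be sketched as follows. This is a simplified stand-in for TD-NeRF's strategy: it concentrates ray samples around the monocular depth prior via rejection sampling, and the prior value, spread, and near/far bounds below are invented:

```python
import random

def sample_depths(depth_prior, sigma, n, near, far, rng=random):
    """Draw n ray-sample depths from a normal centred on the monocular
    depth prior, truncated to [near, far] by rejection sampling."""
    depths = []
    while len(depths) < n:
        d = rng.gauss(depth_prior, sigma)
        if near <= d <= far:          # reject samples outside the frustum
            depths.append(d)
    return sorted(depths)             # NeRF integrates samples front-to-back

random.seed(0)
ds = sample_depths(depth_prior=2.0, sigma=0.3, n=16, near=0.1, far=6.0)
```

Compared with uniform sampling over [near, far], most samples land near the likely surface, which is the intuition behind the faster pose convergence the abstract reports; widening sigma recovers more exploratory sampling when the prior is noisy.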

JMLR Journal 2023 Journal Article

Label Distribution Changing Learning with Sample Space Expanding

  • Chao Xu
  • Hong Tao
  • Jing Zhang
  • Dewen Hu
  • Chenping Hou

With the evolution of data collection methods, label ambiguity has arisen in various applications. How to reduce its uncertainty and leverage its effectiveness remains a challenging task. As two representative types of label ambiguity, Label Distribution Learning (LDL), which annotates each instance with a label distribution, and Emerging New Class (ENC), which focuses on model reuse with new classes, have attracted extensive attention. Nevertheless, in many applications, such as emotion distribution recognition and facial age estimation, we may face a more complicated label ambiguity scenario, i.e., the label distribution changing as the sample space expands owing to a new class. To solve this crucial but rarely studied problem, we propose a new framework named Label Distribution Changing Learning (LDCL), together with a theoretical guarantee in the form of a generalization error bound. Our approach expands the sample space by re-scaling the previous distribution and then estimates the emerging label value via a scaling constraint factor. For demonstration, we present two special cases within the framework, together with their optimizations and convergence analyses. Besides evaluating LDCL on 13 existing data sets, we also apply it to emotion distribution recognition. Experimental results demonstrate the effectiveness of our approach in both tackling the label ambiguity problem and estimating facial emotion.
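One plausible reading of the re-scaling step described above can be shown in a few lines: shrink the existing label degrees so the distribution still sums to one after the emerging class receives its estimated mass. This is an illustrative sketch of the idea, not the paper's LDCL formulation, and the example distribution and mass are invented:

```python
def expand_distribution(dist, new_mass):
    """Re-scale an existing label distribution to make room for an
    emerging class: shrink old degrees by (1 - new_mass) and assign
    new_mass to the new label, keeping the total at 1."""
    assert 0.0 <= new_mass < 1.0
    scaled = [p * (1.0 - new_mass) for p in dist]
    return scaled + [new_mass]

# Three known labels; the emerging class is estimated to carry 10% mass.
new = expand_distribution([0.5, 0.3, 0.2], new_mass=0.1)
# new == [0.45, 0.27, 0.18, 0.1], still a valid distribution
```

The relative ordering of the original labels is preserved; only their absolute degrees shrink, which is what lets a model trained on the old sample space be reused after the expansion.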