Arrow Research search

Author name cluster

Dingwen Zhang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

19 papers
2 author rows

Possible papers

19

JBHI Journal 2026 Journal Article

MuST: Multi-Scale Transformer Incorporating Hierarchical Attention and TCN for EEG Decoding

  • Kui Zhao
  • Enze Shi
  • Di Zhu
  • Sigang Yu
  • Geng Chen
  • Shijie Zhao
  • Dingwen Zhang
  • Shu Zhang

Electroencephalography (EEG) signals exhibit significant and inherent time-scale differences across individuals and tasks. Despite notable successes in decoding EEG signals for single tasks (e.g., detection of epilepsy), where the time scales are relatively consistent, substantial differences in temporal characteristics among various tasks pose a significant challenge. To address these limitations, we propose MuST, the Multi-Scale Transformer, which aims to dynamically learn the characteristics of EEG signals at different time scales. Building on the conventional Convolutional Neural Network (CNN)-Transformer model, MuST introduces two innovations: (1) a hierarchical Transformer structure to dynamically capture global dependencies and long-range information from EEG signals at different scales, and (2) a novel temporal convolutional network (TCN) module that replaces the original feed-forward network (FFN) module in the Transformer, effectively capturing local temporal patterns and short-term dependencies from EEG signals. To validate the performance of MuST, we conducted experiments on five public EEG datasets with extreme time-scale differences. The experimental results on these datasets demonstrate an average classification accuracy of 91.69% under identical parameter settings, surpassing the baseline EEGNet by 5.65% and highlighting MuST's superior capability in handling multi-scale EEG signals for diverse tasks. More critically, MuST demonstrates successful unified modeling of EEG temporal heterogeneity through mixed-dataset training (epilepsy detection and sleep staging classification). This validates our multi-scale architecture's capability to dynamically reconcile divergent neurophysiological timescales within a single model. Our code can be found at https://github.com/wisercc/MuST.
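
The structural change the abstract describes — keeping self-attention for global dependencies while replacing the Transformer's feed-forward network with a TCN for local temporal patterns — is easy to sketch. Below is a minimal PyTorch illustration, not the released MuST code; the layer sizes, pre-norm layout, and dilation schedule are assumptions:

```python
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    """Dilated 1-D convolutions standing in for the Transformer FFN,
    so each token mixes with its local temporal neighbourhood."""
    def __init__(self, dim, kernel_size=3, dilations=(1, 2)):
        super().__init__()
        layers = []
        for d in dilations:
            layers += [
                nn.Conv1d(dim, dim, kernel_size,
                          padding=d * (kernel_size - 1) // 2, dilation=d),
                nn.GELU(),
            ]
        self.net = nn.Sequential(*layers)

    def forward(self, x):                     # x: (batch, seq_len, dim)
        y = self.net(x.transpose(1, 2))       # convolve over the time axis
        return y.transpose(1, 2)

class TransformerTCNLayer(nn.Module):
    """Standard self-attention, but the position-wise FFN is a TCN."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.tcn = TCNBlock(dim)

    def forward(self, x):
        h = self.norm1(x)
        a, _ = self.attn(h, h, h)             # global dependencies
        x = x + a
        return x + self.tcn(self.norm2(x))    # local temporal patterns

x = torch.randn(2, 256, 64)                   # (batch, EEG time steps, channels)
print(TransformerTCNLayer(64)(x).shape)       # torch.Size([2, 256, 64])
```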

ICRA Conference 2025 Conference Paper

DGTR: Distributed Gaussian Turbo-Reconstruction for Sparse-View Vast Scenes

  • Hao Li 0075
  • Yuanyuan Gao
  • Haosong Peng
  • Chenming Wu
  • Weicai Ye
  • Yufeng Zhan
  • Chen Zhao 0011
  • Dingwen Zhang

Novel-view synthesis approaches play a critical role in vast scene reconstruction. However, these methods rely heavily on dense image inputs and prolonged training times, making them unsuitable where computational resources are limited. Additionally, few-shot methods often struggle with poor reconstruction quality in vast environments. This paper presents DGTR, a novel distributed framework for efficient Gaussian reconstruction for sparse-view vast scenes. Our approach divides the scene into regions, processed independently by drones with sparse image inputs. Using a feed-forward Gaussian model, we predict high-quality Gaussian primitives, followed by a global alignment algorithm to ensure geometric consistency. Depth priors are incorporated to further enhance training, while a distillation-based model aggregation mechanism enables efficient reconstruction. Our method achieves high-quality large-scale scene reconstruction and novel-view synthesis in significantly reduced training times, outperforming existing approaches in both speed and scalability. We demonstrate the effectiveness of our framework on vast aerial scenes, achieving high-quality results within minutes. Code will be released on our project page https://3d-aigc.github.io/DGTR.

JBHI Journal 2025 Journal Article

Frequency-Aware B-Line and Pleural Line Analysis in Lung Ultrasound Videos

  • Kaihui Yang
  • Guangyu Guo
  • Ying Zhang
  • Linxuan Pang
  • Zhaohui Zheng
  • Ruyu Liu
  • Jin Ding
  • Dingwen Zhang

Accurately identifying B-lines and the pleural line (P-line) in lung ultrasound (LUS) videos is valuable for evaluating certain lung conditions. However, manual interpretation remains subjective and highly dependent on operator expertise. Existing deep learning methods often suffer from performance degradation due to speckle noise and motion artifacts. Moreover, the limited availability of LUS video data annotated for multiple diagnostic features such as B-lines and the P-line hampers model development. Therefore, this paper introduces ILD-LUS, a new clinical LUS database with category labeling designed for interstitial lung disease (ILD) analysis, comprising 2,149 ultrasound videos (193,410 frames). We also construct an external test set based on the public Covid-BLUES dataset to evaluate B-line and P-line recognition across different pulmonary pathologies. We then propose a novel video analysis framework that integrates wavelet enhancement with temporal attention modeling. Specifically, we employ a dual-component frequency feature enhancement method using the Discrete Wavelet Transform (DWT), which effectively suppresses noise while preserving important landmarks. Subsequently, an adaptive attention module is introduced to model long-range temporal dependencies and improve dynamic feature representation across consecutive frames. Experimental results show that the proposed method achieves over 94% AUC and 82% ACC for both B-line and P-line classification on both the ILD-LUS and Covid-BLUES datasets, outperforming existing methods. These findings demonstrate the robustness and generalizability of our approach across different pathological conditions. Overall, the proposed framework shows strong potential for supporting clinical decision-making in LUS analysis. The code is available at https://github.com/KaIi-github/WaveLUS.
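
The wavelet-enhancement component can be illustrated with generic DWT soft-threshold denoising: decompose a frame, shrink the detail bands where speckle noise concentrates, and reconstruct. A minimal PyWavelets sketch, not the paper's exact dual-component method — the wavelet choice, decomposition level, and threshold rule are all assumptions:

```python
import numpy as np
import pywt

def dwt_enhance(frame, wavelet="db4", level=2, k=1.0):
    """Suppress speckle-like noise in one frame by soft-thresholding
    the DWT detail bands while keeping the approximation band, where
    bright landmarks such as the pleural line survive."""
    coeffs = pywt.wavedec2(frame, wavelet, level=level)
    out = [coeffs[0]]                              # approximation band
    for (cH, cV, cD) in coeffs[1:]:
        # noise level estimated from this level's diagonal band
        sigma = np.median(np.abs(cD)) / 0.6745
        t = k * sigma * np.sqrt(2 * np.log(frame.size))
        out.append(tuple(pywt.threshold(c, t, mode="soft")
                         for c in (cH, cV, cD)))
    return pywt.waverec2(out, wavelet)

frame = np.random.rand(256, 256).astype(np.float32)  # stand-in for a LUS frame
print(dwt_enhance(frame).shape)                      # (256, 256)
```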

JBHI Journal 2025 Journal Article

MHKD: Multi-Step Hybrid Knowledge Distillation for Low-Resolution Whole Slide Images Glomerulus Detection

  • Xiangsen Zhang
  • Longfei Han
  • Chenchu Xu
  • Zhaohui Zheng
  • Jin Ding
  • Xianghui Fu
  • Dingwen Zhang
  • Junwei Han

Glomerulus detection is a critical component of renal histopathology assessment, essential for diagnosing glomerulonephritis. To mitigate the increasing workload on pathologists, AI-assisted diagnostic methods based on high-resolution digital pathology whole slide images have been developed. However, these current AI-assisted approaches are limited to high-resolution whole slide images, necessitating expensive digital scanner equipment, high image storage costs, and significant computational complexity. To address this limitation, this paper pioneers a method for facilitating glomerulus detection in low-resolution human kidney pathology images. Specifically, we propose a novel multi-step hybrid knowledge distillation method. Our method distills both global features and semantic information through a hybrid knowledge distillation strategy that integrates offline and online knowledge distillation, where the information from high-resolution pathological images is successively transferred to the student model, from the global features in the shallow network layers to the semantic information of the back-end, through a multi-step training strategy. Experimental results on two datasets show that the proposed method achieves effective detection outcomes for low-resolution kidney pathology images. Compared to other state-of-the-art detection techniques, our method achieves an $AP_{0.5:0.95}$ improvement of 23.1% on the private LN dataset and 15.9% on the public HUBMAP dataset.
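
A hybrid distillation objective of the kind the abstract outlines — transferring shallow global features with one term and back-end semantic (logit) information with another — can be sketched as follows. This is an illustrative loss under assumed tensor shapes, not the authors' multi-step training schedule:

```python
import torch
import torch.nn.functional as F

def hybrid_kd_loss(s_feat, t_feat, s_logits, t_logits, T=4.0, alpha=0.5):
    """One step of a hybrid distillation objective: an L2 term aligns
    the student's shallow (global) feature maps with the teacher's,
    while a temperature-scaled KL term transfers back-end semantic
    (logit) information."""
    feat_loss = F.mse_loss(s_feat, t_feat)
    kd_loss = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                       F.softmax(t_logits / T, dim=1),
                       reduction="batchmean") * T * T
    return alpha * feat_loss + (1 - alpha) * kd_loss

# toy tensors standing in for high-res teacher / low-res student outputs
s_feat, t_feat = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
s_logits, t_logits = torch.randn(2, 2), torch.randn(2, 2)
print(hybrid_kd_loss(s_feat, t_feat, s_logits, t_logits))
```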

ICML Conference 2025 Conference Paper

Navigating Semantic Drift in Task-Agnostic Class-Incremental Learning

  • Fangwen Wu
  • Lechao Cheng
  • Shengeng Tang
  • Xiaofeng Zhu
  • Chaowei Fang
  • Dingwen Zhang
  • Meng Wang 0001

Class-incremental learning (CIL) seeks to enable a model to sequentially learn new classes while retaining knowledge of previously learned ones. Balancing flexibility and stability remains a significant challenge, particularly when the task ID is unknown. To address this, our study reveals that the gap in feature distribution between novel and existing tasks is primarily driven by differences in mean and covariance moments. Building on this insight, we propose a novel semantic drift calibration method that incorporates mean shift compensation and covariance calibration. Specifically, we calculate each class's mean by averaging its sample embeddings and estimate task shifts using weighted embedding changes based on their proximity to the previous mean, effectively capturing mean shifts for all learned classes with each new task. We also apply a Mahalanobis distance constraint for covariance calibration, aligning class-specific embedding covariances between the old and current networks to mitigate the covariance shift. Additionally, we integrate a feature-level self-distillation approach to enhance generalization. Comprehensive experiments on commonly used datasets demonstrate the effectiveness of our approach. The source code is available at https://github.com/fwu11/MACIL.git.
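
The mean shift compensation step is concrete enough to sketch: after training on a new task, each stored class mean is displaced by a weighted average of how current-task embeddings moved, weighted by proximity to that mean. A minimal NumPy version, with the Gaussian proximity kernel as an assumption:

```python
import numpy as np

def compensate_means(old_means, emb_before, emb_after, sigma=1.0):
    """Estimate how stored class means drift after training on a new
    task: each old mean is shifted by a weighted average of the
    embedding changes of current-task samples, weighted by how close
    each sample (before training) was to that mean."""
    delta = emb_after - emb_before                 # per-sample drift
    new_means = {}
    for c, mu in old_means.items():
        d2 = np.sum((emb_before - mu) ** 2, axis=1)
        w = np.exp(-d2 / (2 * sigma ** 2))         # proximity weights
        new_means[c] = mu + (w[:, None] * delta).sum(0) / (w.sum() + 1e-8)
    return new_means

old_means = {0: np.zeros(8), 1: np.ones(8)}
emb_before = np.random.randn(100, 8)               # new-task embeddings, old net
emb_after = emb_before + 0.1                       # uniform drift for the demo
print(compensate_means(old_means, emb_before, emb_after)[0][:3])
```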

NeurIPS Conference 2025 Conference Paper

STRIDER: Navigation via Instruction-Aligned Structural Decision Space Optimization

  • Diqi He
  • Xuehao Gao
  • Hao Li
  • Junwei Han
  • Dingwen Zhang

The Zero-shot Vision-and-Language Navigation in Continuous Environments (VLN-CE) task requires agents to navigate previously unseen 3D environments using natural language instructions, without any scene-specific training. A critical challenge in this setting lies in ensuring agents' actions align with both spatial structure and task intent over long-horizon execution. Existing methods often fail to achieve robust navigation due to a lack of structured decision-making and insufficient integration of feedback from previous actions. To address these challenges, we propose STRIDER (Instruction-Aligned Structural Decision Space Optimization), a novel framework that systematically optimizes the agent's decision space by integrating spatial layout priors and dynamic task feedback. Our approach introduces two key innovations: 1) a Structured Waypoint Generator that constrains the action space through spatial structure, and 2) a Task-Alignment Regulator that adjusts behavior based on task progress, ensuring semantic alignment throughout navigation. Extensive experiments on the R2R-CE and RxR-CE benchmarks demonstrate that STRIDER significantly outperforms strong state-of-the-art methods across key metrics; in particular, it improves Success Rate (SR) from 29% to 35%, a relative gain of 20.7%. Such results highlight the importance of spatially constrained decision-making and feedback-guided execution in improving navigation fidelity for zero-shot VLN-CE.

ICML Conference 2024 Conference Paper

Revisiting the Power of Prompt for Visual Tuning

  • Yuzhu Wang
  • Lechao Cheng
  • Chaowei Fang
  • Dingwen Zhang
  • Manni Duan
  • Meng Wang 0001

Visual prompt tuning (VPT) is a promising solution incorporating learnable prompt tokens to customize pre-trained models for downstream tasks. However, VPT and its variants often encounter challenges like prompt initialization, prompt length, and subpar performance in self-supervised pretraining, hindering successful contextual adaptation. This study begins by exploring how the correlation between prompts and patch tokens evolves during proficient training. Inspired by the observation that the prompt tokens tend to share high mutual information with patch tokens, we propose initializing prompts with downstream token prototypes. This strategic initialization, a stand-in for the previous initialization, substantially improves performance. To refine further, we optimize token construction with a streamlined pipeline that maintains excellent performance with almost no increase in computational expense compared to VPT. Exhaustive experiments show our proposed approach outperforms existing methods by a remarkable margin. For instance, after MAE pre-training, our method improves accuracy by 10% to 30% compared to VPT, and outperforms full fine-tuning in 19 of 24 cases while using less than 0.4% of learnable parameters. Besides, the experimental results demonstrate the proposed SPT (Self-Prompt Tuning) is robust to prompt lengths and scales well with model capacity and training data size. We finally provide an insightful exploration into the amount of target data facilitating the adaptation of pre-trained models to downstream tasks. The code is available at https://github.com/WangYZ1608/Self-Prompt-Tuning.
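
The proposed initialization is straightforward to illustrate: instead of random prompt tokens, take prototypes of downstream patch-token embeddings. A sketch using k-means centroids as the prototypes (the clustering choice is an assumption; the paper's exact prototype construction may differ):

```python
import torch
from sklearn.cluster import KMeans

def prototype_prompt_init(patch_tokens, num_prompts=8):
    """Initialize VPT-style prompt tokens as prototypes (k-means
    centroids) of downstream patch-token embeddings rather than
    random vectors — motivated by the high prompt/patch mutual
    information the abstract reports."""
    flat = patch_tokens.reshape(-1, patch_tokens.shape[-1]).numpy()
    km = KMeans(n_clusters=num_prompts, n_init=10).fit(flat)
    return torch.nn.Parameter(torch.tensor(km.cluster_centers_,
                                           dtype=torch.float32))

# toy patch tokens from a frozen backbone: (images, patches, dim)
tokens = torch.randn(16, 196, 768)
prompts = prototype_prompt_init(tokens)
print(prompts.shape)  # torch.Size([8, 768])
```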

AAAI Conference 2024 Conference Paper

SpFormer: Spatio-Temporal Modeling for Scanpaths with Transformer

  • Wenqi Zhong
  • Linzhi Yu
  • Chen Xia
  • Junwei Han
  • Dingwen Zhang

Saccadic scanpath, a data representation of human visual behavior, has received broad interest in multiple domains. Scanpath is a complex eye-tracking data modality that includes sequences of fixation positions and fixation durations, coupled with image information. However, previous methods usually face the spatial misalignment problem of fixation features and the loss of critical temporal data (including temporal correlation and fixation duration). In this study, we propose a Transformer-based scanpath model, SpFormer, to alleviate these problems. First, we propose a fixation-centric paradigm to extract aligned spatial fixation features and tokenize the scanpaths. Then, following the visual working memory mechanism, we design a local meta attention to reduce the semantic redundancy of fixations and guide the model to focus on the meta scanpath. Finally, we progressively integrate the duration information and fuse it with the fixation features to address the location ambiguity that arises as the number of Transformer blocks increases. We conduct extensive experiments on four databases under three tasks. SpFormer establishes new state-of-the-art results in distinct settings, verifying its flexibility and versatility in practical applications. The code can be obtained from https://github.com/wenqizhong/SpFormer.
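
The fixation-centric tokenization can be sketched as follows: each scanpath token combines a feature sampled at the fixation location with an embedding of the fixation duration, so neither spatial alignment nor timing is lost. A minimal sketch with assumed dimensions, not the released SpFormer code:

```python
import torch
import torch.nn as nn

class FixationTokenizer(nn.Module):
    """Turn a scanpath into Transformer tokens: project an image
    feature sampled at each fixation (spatial alignment), then add an
    embedding of the fixation duration, keeping the temporal signal
    that grid-based tokenizations discard."""
    def __init__(self, feat_dim, token_dim):
        super().__init__()
        self.spatial = nn.Linear(feat_dim, token_dim)
        self.duration = nn.Linear(1, token_dim)

    def forward(self, fix_feats, durations):
        # fix_feats: (B, n_fix, feat_dim) features sampled at fixations
        # durations: (B, n_fix) fixation durations in seconds
        return self.spatial(fix_feats) + self.duration(durations.unsqueeze(-1))

tok = FixationTokenizer(feat_dim=256, token_dim=128)
print(tok(torch.randn(2, 12, 256), torch.rand(2, 12)).shape)  # (2, 12, 128)
```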

ICML Conference 2024 Conference Paper

Task-aware Orthogonal Sparse Network for Exploring Shared Knowledge in Continual Learning

  • Yusong Hu
  • De Cheng
  • Dingwen Zhang
  • Nannan Wang 0001
  • Tongliang Liu
  • Xinbo Gao 0001

Continual learning (CL) aims to learn from sequentially arriving tasks without catastrophic forgetting (CF). By partitioning the network into two parts based on the Lottery Ticket Hypothesis — one for holding the knowledge of the old tasks while the other learns the knowledge of the new task — recent progress has achieved forget-free CL. Although addressing the CF issue well, such methods encounter serious under-fitting in long-term CL, where the learning process continues for a long time and the number of new tasks involved is much higher. To solve this problem, this paper partitions the network into three parts — with a new part for exploring the knowledge sharing between the old and new tasks. With the shared knowledge, this part of the network can be learned to simultaneously consolidate the old tasks and fit the new task. To achieve this goal, we propose a task-aware Orthogonal Sparse Network (OSN), which contains shared-knowledge-induced network partition and sharpness-aware orthogonal sparse network learning. The former partitions the network to select shared parameters, while the latter guides the exploration of shared knowledge through the shared parameters. Qualitative and quantitative analyses show that the proposed OSN induces minimal to no interference with past tasks, i.e., approximately no forgetting, while greatly improving model plasticity and capacity, finally achieving state-of-the-art performance.
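
A rough sketch of the three-way split: each weight tensor is divided into disjoint frozen, shared, and new-task masks. OSN selects shared parameters via its shared-knowledge criterion; the magnitude heuristic below is only an assumed stand-in for that criterion:

```python
import torch

def partition_params(weight, old_mask, share_frac=0.1, new_frac=0.1):
    """Split one weight tensor into three disjoint masks: frozen
    old-task weights, shared weights (re-trainable for knowledge
    sharing), and free weights reserved for the new task."""
    n = weight.numel()
    # shared part: here, simply the largest-magnitude old-task weights
    share_idx = torch.topk((weight.abs() * old_mask).view(-1),
                           int(share_frac * n)).indices
    shared = torch.zeros_like(old_mask)
    shared.view(-1)[share_idx] = 1.0
    # new-task part: drawn from the currently unused capacity
    free = 1.0 - old_mask
    new_idx = torch.topk((weight.abs() * free).view(-1),
                         int(new_frac * n)).indices
    new = torch.zeros_like(old_mask)
    new.view(-1)[new_idx] = 1.0
    frozen = old_mask * (1.0 - shared)        # old weights minus shared
    return frozen, shared, new

w = torch.randn(64, 64)
old = (torch.rand_like(w) < 0.5).float()      # toy mask from past tasks
frozen, shared, new = partition_params(w, old)
print(frozen.sum().item(), shared.sum().item(), new.sum().item())
```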

IJCAI Conference 2022 Conference Paper

Robust Single Image Dehazing Based on Consistent and Contrast-Assisted Reconstruction

  • De Cheng
  • Yan Li
  • Dingwen Zhang
  • Nannan Wang
  • Xinbo Gao
  • Jiande Sun

Single image dehazing, as a fundamental low-level vision task, is essential for the development of robust intelligent surveillance systems. In this paper, we make an early effort to consider dehazing robustness under variational haze density, a realistic yet under-studied problem in the research field of single image dehazing. To properly address this problem, we propose a novel density-variational learning framework to improve the robustness of the image dehazing model, assisted by a variety of negative hazy images, to better handle various complex hazy scenarios. Specifically, the dehazing network is optimized under a consistency-regularized framework with the proposed Contrast-Assisted Reconstruction Loss (CARL). CARL can fully exploit the negative information to facilitate the traditional positive-oriented dehazing objective function by squeezing the dehazed image toward its clean target from different directions. Meanwhile, the consistency regularization keeps outputs consistent across multi-level hazy images, thus improving model robustness. Extensive experimental results on two synthetic and three real-world datasets demonstrate that our method significantly surpasses state-of-the-art approaches.
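
The shape of CARL can be sketched as a reconstruction loss with a contrastive ratio: pull the dehazed output toward the clean target while pushing it away from negative hazy images. A pixel-space toy version — the actual loss likely operates in a learned feature space, so treat this only as the idea:

```python
import torch
import torch.nn.functional as F

def contrast_assisted_loss(dehazed, clean, negatives, beta=0.25):
    """Reconstruction loss with a contrastive assist: the usual L1
    pull toward the clean target, plus a ratio term that shrinks as
    the output moves away from the negative (hazy) images."""
    pos = F.l1_loss(dehazed, clean)
    neg = torch.stack([F.l1_loss(dehazed, n) for n in negatives]).mean()
    return pos + beta * pos / (neg + 1e-8)

dehazed = torch.rand(1, 3, 64, 64)
clean = torch.rand(1, 3, 64, 64)
negs = [torch.rand(1, 3, 64, 64) for _ in range(3)]   # re-hazed variants
print(contrast_assisted_loss(dehazed, clean, negs))
```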

AAAI Conference 2020 Conference Paper

Deep Embedded Complementary and Interactive Information for Multi-View Classification

  • Jinglin Xu
  • Wenbin Li
  • Xinwang Liu
  • Dingwen Zhang
  • Ji Liu
  • Junwei Han

Multi-view classification optimally integrates various features from different views to improve classification tasks. Though most existing works demonstrate promising performance in various computer vision applications, we observe that they can be further improved by sufficiently utilizing complementary view-specific information, deep interactive information between different views, and the strategy of fusing the various views. In this work, we propose a novel multi-view learning framework that seamlessly embeds various view-specific information and deep interactive information, and introduces a novel multi-view fusion strategy to make a joint decision during optimization for classification. Specifically, we utilize different deep neural networks to learn multiple view-specific representations and model deep interactive information through a shared interactive network using the cross-correlations between attributes of these representations. After that, we adaptively integrate the multiple neural networks by flexibly tuning the power exponent of the weight, which not only avoids the trivial weight solution but also provides a new approach to fusing outputs from different deterministic neural networks. Extensive experiments on several public datasets demonstrate the rationality and effectiveness of our method.
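
The power-exponent fusion trick can be shown in a few lines: raising view weights to a power p > 1 prevents the trivial solution in which the single best view absorbs all the weight. A minimal sketch with assumed shapes:

```python
import torch
import torch.nn as nn

class PowerExponentFusion(nn.Module):
    """Fuse per-view predictions with weights raised to a power p:
    with p = 1 the optimum degenerates to putting all weight on the
    single best view, while p > 1 smooths the solution so every
    informative view keeps a nonzero contribution."""
    def __init__(self, num_views, p=2.0):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_views))
        self.p = p

    def forward(self, view_outputs):              # (num_views, B, classes)
        w = torch.softmax(self.logits, dim=0) ** self.p
        w = w / w.sum()                           # renormalize after the power
        return (w[:, None, None] * view_outputs).sum(0)

views = torch.randn(3, 4, 10)                     # 3 views, batch 4, 10 classes
print(PowerExponentFusion(3)(views).shape)        # torch.Size([4, 10])
```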

NeurIPS Conference 2020 Conference Paper

Few-Cost Salient Object Detection with Adversarial-Paced Learning

  • Dingwen Zhang
  • HaiBin Tian
  • Jungong Han

Detecting and segmenting salient objects from given image scenes has received great attention in recent years. A fundamental challenge in training the existing deep saliency detection models is the requirement of large amounts of annotated data. While gathering large quantities of training data has become cheap and easy, annotating the data is an expensive process in terms of time, labor, and human expertise. To address this problem, this paper proposes to learn an effective salient object detection model based on manual annotation of only a few training images, thus dramatically alleviating human labor in training models. To this end, we name this new task few-cost salient object detection and propose an adversarial-paced learning (APL)-based framework to facilitate the few-cost learning scenario. Essentially, APL is derived from the self-paced learning (SPL) regime, but it infers the robust learning pace through a data-driven adversarial learning mechanism rather than the heuristic design of the learning regularizer. Comprehensive experiments on four widely-used benchmark datasets demonstrate that the proposed approach can effectively approach the performance of existing supervised deep salient object detection models with only 1k human-annotated training images.
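
Loosely, the adversarial pace inference can be pictured like this: a discriminator scores each predicted saliency map, and that score — rather than a hand-designed regularizer — weights the corresponding sample's loss. A toy sketch with a hypothetical discriminator architecture, not the paper's actual networks:

```python
import torch
import torch.nn as nn

# Hypothetical discriminator: maps a saliency map to a score in (0, 1)
disc = nn.Sequential(
    nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(16, 1, 4, stride=2, padding=1),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Sigmoid())

pred_maps = torch.rand(4, 1, 64, 64)      # saliency maps on unlabeled images
pixel_loss = torch.rand(4)                # per-image pseudo-label loss

with torch.no_grad():
    pace = disc(pred_maps).squeeze(1)     # data-driven pace weights
weighted_loss = (pace * pixel_loss).mean()
print(pace, weighted_loss)
```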

TIST Journal 2018 Journal Article

A Review of Co-Saliency Detection Algorithms

  • Dingwen Zhang
  • Huazhu Fu
  • Junwei Han
  • Ali Borji
  • Xuelong Li

Co-saliency detection is a newly emerging and rapidly growing research area in the computer vision community. As a novel branch of visual saliency, co-saliency detection refers to the discovery of common and salient foregrounds from two or more relevant images, and it can be widely used in many computer vision tasks. The existing co-saliency detection algorithms mainly consist of three components: extracting effective features to represent the image regions, exploring the informative cues or factors to characterize co-saliency, and designing effective computational frameworks to formulate co-saliency. Although numerous methods have been developed, the literature is still lacking a deep review and evaluation of co-saliency detection techniques. In this article, we aim at providing a comprehensive review of the fundamentals, challenges, and applications of co-saliency detection. Specifically, we provide an overview of some related computer vision works, review the history of co-saliency detection, summarize and categorize the major algorithms in this research area, discuss some open issues in this area, present the potential applications of co-saliency detection, and finally point out some unsolved challenges and promising future works. We expect this review to be beneficial to both fresh and senior researchers in this field and to give insights to researchers in other related areas regarding the utility of co-saliency detection algorithms.

AAAI Conference 2018 Conference Paper

Multi-Rate Gated Recurrent Convolutional Networks for Video-Based Pedestrian Re-Identification

  • Zhihui Li
  • Lina Yao
  • Feiping Nie
  • Dingwen Zhang
  • Min Xu

Matching pedestrians across multiple camera views has attracted much recent research attention due to its apparent importance in surveillance and security applications. While most existing works address this problem in a still-image setting, we consider the more informative and challenging video-based person re-identification problem, where a video of a pedestrian as seen in one camera needs to be matched to a gallery of videos captured by other non-overlapping cameras. We employ a convolutional network to extract appearance and motion features from raw video sequences, and then feed them into a multi-rate recurrent network to exploit the temporal correlations, and, more importantly, to take into account the fact that pedestrians, sometimes even the same pedestrian, move at different speeds across different camera views. The combined network is trained in an end-to-end fashion, and we further propose an initialization strategy via context reconstruction to largely improve performance. We conduct extensive experiments on the iLIDS-VID and PRID-2011 datasets, and our experimental results confirm the effectiveness and generalization ability of our model.
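
The multi-rate idea is simple to sketch: several recurrent branches read the same feature sequence subsampled at different rates, so pedestrians moving at different speeds excite different branches. A minimal PyTorch version with assumed rates and sizes:

```python
import torch
import torch.nn as nn

class MultiRateGRU(nn.Module):
    """Several GRUs read the same feature sequence at different
    temporal rates (every frame, every 2nd, every 4th ...), keeping
    the model robust to pedestrians moving at different speeds."""
    def __init__(self, in_dim, hid_dim, rates=(1, 2, 4)):
        super().__init__()
        self.rates = rates
        self.grus = nn.ModuleList(
            [nn.GRU(in_dim, hid_dim, batch_first=True) for _ in rates])

    def forward(self, x):                     # x: (B, T, in_dim)
        finals = []
        for rate, gru in zip(self.rates, self.grus):
            _, h = gru(x[:, ::rate])          # subsample the time axis
            finals.append(h[-1])              # final hidden state
        return torch.cat(finals, dim=1)       # (B, hid_dim * len(rates))

seq = torch.randn(2, 16, 128)                 # per-frame CNN features
print(MultiRateGRU(128, 64)(seq).shape)       # torch.Size([2, 192])
```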

IJCAI Conference 2017 Conference Paper

How Unlabeled Web Videos Help Complex Event Detection?

  • Huan Liu
  • Qinghua Zheng
  • Minnan Luo
  • Dingwen Zhang
  • Xiaojun Chang
  • Cheng Deng

The lack of labeled exemplars is an important factor that makes the task of multimedia event detection (MED) complicated and challenging. Utilizing artificially picked and labeled external sources is an effective way to enhance the performance of MED. However, building such data usually requires professional human annotators, and the procedure is too time-consuming and costly to scale. In this paper, we propose a new robust dictionary learning framework for complex event detection, which is able to handle both labeled and easy-to-get unlabeled web videos by sharing the same dictionary. By employing an ℓq-norm-based loss jointly with structured-sparsity-based regularization, our model shows strong robustness against the substantial number of noisy and outlier videos from open sources. We exploit an effective optimization algorithm to solve the proposed highly non-smooth and non-convex problem. Extensive experimental results on the standard TRECVID MEDTest 2013 and TRECVID MEDTest 2014 datasets demonstrate the effectiveness and superiority of the proposed framework for complex event detection.

IJCAI Conference 2017 Conference Paper

Self-paced Mixture of Regressions

  • Longfei Han
  • Dingwen Zhang
  • Dong Huang
  • Xiaojun Chang
  • Jun Ren
  • Senlin Luo
  • Junwei Han

Mixture of regressions (MoR) is a well-established and effective approach for modeling discontinuous and heterogeneous data in regression problems. Existing MoR approaches assume a smooth joint distribution for its good analytic properties. However, such an assumption makes existing MoR very sensitive to intra-component outliers (noisy training data residing in certain components) and inter-component imbalance (different amounts of training data in different components). In this paper, we make the earliest effort at Self-paced Learning (SPL) in MoR, i.e., the Self-paced Mixture of Regressions (SPMoR) model. We propose a novel self-paced regularizer based on the Exclusive LASSO, which improves the inter-component balance of the training data. As a robust learning regime, SPL prioritizes reasoning over confident samples. To demonstrate the effectiveness of SPMoR, we conducted experiments on both synthetic examples and real-world applications to age estimation and glucose estimation. The results show that SPMoR outperforms state-of-the-art methods.
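
The balance-aware self-paced weighting can be approximated greedily: admit easy samples first, but charge each candidate a penalty that grows with how many samples its mixture component has already contributed, in the spirit of the exclusive-LASSO regularizer. A sketch of that intuition, not the paper's closed-form solution:

```python
import numpy as np

def spl_weights(losses, groups, lam, gamma=0.05):
    """Greedy balance-aware self-paced selection: a sample is admitted
    while its loss plus a per-component congestion charge stays below
    the pace threshold lam, so no single mixture component can flood
    the curriculum."""
    v = np.zeros_like(losses)
    admitted = np.zeros(groups.max() + 1)      # per-component counts
    for i in np.argsort(losses):               # easiest samples first
        if losses[i] + gamma * admitted[groups[i]] < lam:
            v[i] = 1.0
            admitted[groups[i]] += 1
    return v

rng = np.random.default_rng(0)
losses = rng.random(10)
groups = rng.integers(0, 2, size=10)           # component assignments
print(spl_weights(losses, groups, lam=0.8))
```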

IJCAI Conference 2016 Conference Paper

Bridging Saliency Detection to Weakly Supervised Object Detection Based on Self-Paced Curriculum Learning

  • Dingwen Zhang
  • Deyu Meng
  • Long Zhao
  • Junwei Han

Weakly-supervised object detection (WOD) is a challenging problem in computer vision. The key problem is to simultaneously infer the exact object locations in the training images and train the object detectors, given only training images with weak image-level labels. Intuitively, by simulating the selective attention mechanism of the human visual system, saliency detection techniques can select attractive objects in scenes and are thus a potential way to provide useful priors for WOD. However, adopting saliency detection in WOD is not trivial, since the detected saliency region can be highly ambiguous in complex cases. To this end, this paper first comprehensively analyzes the challenges in applying saliency detection to WOD. Then, we make one of the earliest efforts to bridge saliency detection to WOD via self-paced curriculum learning, which can guide the learning procedure to gradually achieve faithful knowledge of multi-class objects from easy to hard. The experimental results demonstrate that the proposed approach successfully bridges the saliency detection and WOD tasks and achieves state-of-the-art object detection results under weak supervision.

IJCAI Conference 2016 Conference Paper

Towards Intelligent Visual Understanding under Minimal Supervision

  • Dingwen Zhang

Because it plays one of the most important roles in artificial intelligence systems such as robots, visual understanding has gained vast interest over the past few decades. Most existing approaches need human-labeled training data to train their learning models, and in the most recent years, significant performance gains have been obtained by relying on unparalleled, tremendous amounts of human-labeled training data. Under this circumstance, people bear a great burden of energy and time spent on tedious data annotation for traditional visual understanding approaches. To alleviate this problem, we propose to develop novel visual understanding algorithms that can learn informative visual patterns under minimal (no or very weak) supervision and thus facilitate higher-level intelligence in visual understanding systems. Specifically, we focus on three subtopics, i.e., saliency detection, co-saliency detection, and weakly supervised learning based object detection, which can be used in both image and video understanding systems. The experimental results have demonstrated the effectiveness of the proposed algorithms.