Arrow Research search

Author name cluster

Dawei Zhang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
1 author row

Possible papers (8)

AAAI 2026 Conference Paper

Exploiting All Mamba Fusion for Efficient RGB-D Tracking

  • Ge Ying
  • Dawei Zhang
  • Chengzhuan Yang
  • Wei Liu
  • Sang-Woon Jeon
  • Hua Wang
  • Changqin Huang
  • Zhonglong Zheng

Despite the progress made through deep learning, existing Visual Object Tracking (VOT) frameworks still struggle with real-world challenges. Recent approaches incorporate additional modalities such as depth, thermal infrared, and language to enhance the robustness of VOT; in particular, improvements in depth sensor precision have facilitated RGB-D tracking. However, current RGB-D trackers often copy RGB tracking paradigms, leading to inefficiency: two-stream architectures fail to exploit heterogeneous features, and fusion relies on simplistic or parameter-heavy methods. To address these challenges, we propose AMTrack, a one-stream RGB-D tracker that leverages Mamba's linear complexity for simultaneous feature extraction and two-stage cross-modal feature fusion. Our contribution also includes a low-parameter Multimodal Mix Mamba (3M) module, which optimizes deep feature fusion and reduces computational overhead. The advantage of the 3M module stems from our Multimodal State Space Model (MSSM), a multimodal feature interaction component reconstructed from the SSM. Experiments across multiple RGB-D tracking datasets show that AMTrack achieves superior performance with fewer parameters and lower memory demands than state-of-the-art methods.

AAAI 2026 Conference Paper

IGIANet: Illumination Guided Implicit Alignment Network for Infrared–Visible UAV Detection

  • Xiangqi Chen
  • Dawei Zhang
  • Li Zhao
  • Chengzhuan Yang
  • Zhongyu Chen
  • Jungang Lou
  • Zhonglong Zheng
  • Sang-Woon Jeon

Visible-Infrared (RGB-IR) Unmanned Aerial Vehicle (UAV) object detection integrates complementary cues from visible and infrared sensors, offering broad application potential. However, due to sensor parallax, it still suffers from weak spatial misalignment between modalities, which significantly limits its performance in UAV-based object detection. Existing methods emphasize strict alignment while overlooking spectral heterogeneity under varying illumination. To address these issues, we propose the Illumination Guided Implicit Alignment Network (IGIANet), which mitigates modality heterogeneity without explicit alignment. Specifically, we integrate three novel modules. First, we propose an illumination-guided frequency modulation module that adaptively allocates fusion weights to visible and infrared features based on global illumination estimation, effectively alleviating modality imbalance under varying lighting conditions. Second, we introduce a frequency-guided cross-modality differential enhancement module, which computes differential cues across frequency domains to enhance complementary information and highlight weakly aligned, low-contrast regions. Finally, we introduce an implicit alignment-driven dynamic fusion module that actively estimates offsets and generates dynamic, position-adaptive fusion kernels to align and fuse the modalities. Extensive experiments demonstrate that IGIANet outperforms state-of-the-art models on various benchmarks, achieving 80.9% mAP on DroneVehicle, 57.1% mAP on VEDAI, and 49.4% mAP on FLIR.
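The illumination-guided weighting idea in the first module can be sketched roughly as follows. This is a minimal illustration, not the paper's actual architecture: the scalar `illumination` estimate, the sigmoid gate, and the feature shapes are all assumptions for the sake of the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def illumination_guided_fuse(rgb_feat, ir_feat, illumination):
    """Blend visible and infrared feature maps with a weight derived from a
    global illumination estimate: bright scenes favor RGB features, while
    dark scenes favor infrared features."""
    w = sigmoid(illumination)            # scalar gate in (0, 1)
    return w * rgb_feat + (1.0 - w) * ir_feat

# Toy feature maps: all-ones for RGB, all-zeros for IR.
rgb = np.ones((4, 4))
ir = np.zeros((4, 4))
dark = illumination_guided_fuse(rgb, ir, -5.0)    # mostly infrared
bright = illumination_guided_fuse(rgb, ir, 5.0)   # mostly RGB
```

In the real network the gate would be predicted per image (or per region) by a learned illumination estimator rather than passed in as a scalar.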

IJCAI 2021 Conference Paper

Knowledge-Aware Dialogue Generation via Hierarchical Infobox Accessing and Infobox-Dialogue Interaction Graph Network

  • Sixing Wu
  • Minghui Wang
  • Dawei Zhang
  • Yang Zhou
  • Ying Li
  • Zhonghai Wu

Due to the limited knowledge carried by queries, traditional dialogue systems often face the dilemma of generating boring responses, leading to poor user experience. To alleviate this issue, this paper proposes a novel infobox knowledge-aware dialogue generation approach, HITA-Graph, with three unique features. First, open-domain infobox tables that describe entities with relevant attributes are adopted as the knowledge source. An order-irrelevant Hierarchical Infobox Table Encoder is proposed to represent an infobox table at three levels of granularity. In addition, an Infobox-Dialogue Interaction Graph Network is built to effectively integrate the infobox context and the dialogue context into a unified infobox representation. Second, a Hierarchical Infobox Attribute Attention mechanism is developed to access the encoded infobox knowledge at different levels of granularity. Last but not least, a Dynamic Mode Fusion strategy is designed to allow the decoder to select a vocabulary word or copy a word from the given infobox/query. We extract infobox tables from Chinese Wikipedia and construct an infobox knowledge base. Extensive evaluation on an openly released Chinese corpus demonstrates the superior performance of our approach against several representative methods.

AAAI 2021 Conference Paper

Visual Tracking via Hierarchical Deep Reinforcement Learning

  • Dawei Zhang
  • Zhonglong Zheng
  • Riheng Jia
  • Minglu Li

Visual tracking has achieved great progress thanks to numerous different algorithms. However, deep trackers based on classification or Siamese networks still have their specific limitations. In this work, we show how to teach machines to track a generic object in videos the way humans do, using a few search steps to perform tracking. By constructing a Markov decision process in Deep Reinforcement Learning (DRL), our agents learn to make hierarchical decisions on tracking mode and motion estimation. Specifically, our hierarchical DRL framework is composed of a Siamese-based observation network that models the motion information of an arbitrary target, a policy network for mode switching, and an actor-critic network for box regression. This tracking strategy is more in line with the human behavioral paradigm, and it copes effectively and efficiently with fast motion, background clutter, and large deformations. Extensive experiments on the GOT-10k, OTB-100, UAV-123, VOT, and LaSOT tracking benchmarks demonstrate that the proposed tracker achieves state-of-the-art performance while running in real time.

AAAI 2020 Conference Paper

ParamE: Regarding Neural Network Parameters as Relation Embeddings for Knowledge Graph Completion

  • Feihu Che
  • Dawei Zhang
  • Jianhua Tao
  • Mingyue Niu
  • Bocheng Zhao

We study the task of learning entity and relation embeddings in knowledge graphs for predicting missing links. Previous translational models for link prediction make use of translational properties but lack expressiveness, while the convolutional neural network-based model ConvE takes advantage of the strong nonlinear fitting ability of neural networks but overlooks translational properties. In this paper, we propose a new knowledge graph embedding model called ParamE that combines the two advantages. In ParamE, head entity embeddings, relation embeddings, and tail entity embeddings are regarded as the input, parameters, and output of a neural network, respectively. Since the parameters of a network are effective in converting input to output, taking neural network parameters as relation embeddings makes ParamE both highly expressive and translational. In addition, the entity and relation embeddings in ParamE come from feature space and parameter space respectively, in line with the view that entities and relations should be mapped into two different spaces. We evaluate the performance of ParamE on the standard FB15k-237 and WN18RR datasets, and experiments show that ParamE significantly outperforms existing state-of-the-art models such as ConvE, SACN, RotatE, and D4-STE/Gumbel.
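The core mapping described in the abstract (head embedding as input, relation embedding as the network's parameters, tail embedding compared against the output) can be sketched in a few lines. The dimensions, the single tanh layer, and the dot-product scoring below are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Hypothetical embeddings; in practice these are learned during training.
head = rng.normal(size=dim)               # head entity: the network *input*
rel_params = rng.normal(size=(dim, dim))  # relation: the network *parameters*
tail = rng.normal(size=dim)               # tail entity: compared to the *output*

def parame_score(h, W_r, t):
    """Score a triple (h, r, t): pass h through a one-layer network whose
    weight matrix is the relation embedding, then compare the output to t.
    A higher score suggests a more plausible link."""
    out = np.tanh(W_r @ h)                # relation-parameterized transform of h
    return float(out @ t)                 # similarity between output and tail

score = parame_score(head, rel_params, tail)
```

The point of the construction is that the relation is no longer a vector added to or multiplied with the entity embedding; it is the function that maps heads to tails, which is what gives the model its extra expressiveness.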

IJCAI 2020 Conference Paper

TopicKA: Generating Commonsense Knowledge-Aware Dialogue Responses Towards the Recommended Topic Fact

  • Sixing Wu
  • Ying Li
  • Dawei Zhang
  • Yang Zhou
  • Zhonghai Wu

In generative dialogue systems, insufficient semantic understanding of the dialogue often leads to generic responses. Recently, high-quality knowledge bases have been introduced to enhance dialogue understanding, as well as to reduce the prevalence of boring responses. Although such knowledge-aware approaches have shown tremendous potential, they typically use the knowledge in a black-box fashion. As a result, the generation process is somewhat uncontrollable and not interpretable. In this paper, we introduce a topic-fact-based commonsense knowledge-aware approach, TopicKA. Different from previous works, TopicKA generates responses conditioned not only on the query message but also on a topic fact with an explicit semantic meaning, which also controls the direction of generation. Topic facts are recommended by a recommendation network trained under the Teacher-Student framework. To integrate the recommendation network and the generation network, this paper designs four schemes: two non-sampling schemes and two sampling schemes. We collected and constructed a large-scale Chinese commonsense knowledge graph. Experimental results on an open Chinese benchmark dataset indicate that our model outperforms baselines on both objective and subjective metrics.

IJCAI 2017 Conference Paper

Combining Knowledge with Deep Convolutional Neural Networks for Short Text Classification

  • Jin Wang
  • Zhongyuan Wang
  • Dawei Zhang
  • Jun Yan

Text classification is a fundamental task in NLP applications. Most existing work relies on either explicit or implicit text representations to address this problem. While these techniques work well for sentences, they cannot easily be applied to short text because of its shortness and sparsity. In this paper, we propose a framework based on convolutional neural networks that combines explicit and implicit representations of short text for classification. We first conceptualize a short text as a set of relevant concepts using a large taxonomy knowledge base. We then obtain the embedding of the short text by coalescing the words and their relevant concepts on top of pre-trained word vectors. We further incorporate character-level features into our model to capture fine-grained subword information. Experimental results on five commonly used datasets show that our proposed method significantly outperforms state-of-the-art methods.
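The "coalescing words and relevant concepts" step can be sketched as follows. The toy vocabulary, concept lookup, and mean-pooling are illustrative assumptions; the paper uses real pre-trained vectors and a large taxonomy knowledge base, and the resulting matrix would feed a CNN:

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 4

# Toy pre-trained word vectors and a toy taxonomy-style concept lookup.
word_vecs = {w: rng.normal(size=dim) for w in ["apple", "watch", "price"]}
concepts_of = {"apple": ["company", "fruit"], "watch": ["device"]}
concept_vecs = {c: rng.normal(size=dim) for c in ["company", "fruit", "device"]}

def embed_short_text(tokens):
    """Represent a short text by combining each word's vector with the
    vectors of its relevant concepts (mean-pooled here for brevity)."""
    rows = []
    for w in tokens:
        vecs = [word_vecs[w]]
        vecs += [concept_vecs[c] for c in concepts_of.get(w, [])]
        rows.append(np.mean(vecs, axis=0))
    return np.stack(rows)   # (len(tokens), dim) matrix, input to the CNN

emb = embed_short_text(["apple", "watch", "price"])
```

Enriching each word with its concepts is what lets the model disambiguate a sparse short text like "apple watch price", where the words alone carry little context.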