Arrow Research search

Author name cluster

Mingjin Zhang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
1 author row

Possible papers

13

AAAI Conference 2026 Conference Paper

CO²IF: Language-Bridging Hyperspectral-Multispectral Image Fusion with Coordinated and Cross-modal Optimal Transport

  • Mingjin Zhang
  • Zhongkai Yang
  • Fei Gao

Due to the difficulties of directly obtaining high-resolution hyperspectral images (HR-HSI), the fusion of low-resolution hyperspectral images (LR-HSI) and high-resolution multispectral images (HR-MSI) has emerged as an effective approach. While existing methods leverage image-level priors from HR-MSI, they often lack explicit semantic guidance for precise detail reconstruction. Recognizing that textual scene descriptions encapsulate valuable object attributes and contextual information, we introduce the first Language-Bridging framework for Hyperspectral and Multispectral image fusion (CO²IF). CO²IF leverages language semantics as prior knowledge to explicitly guide the reconstruction process. To bridge the modality gap between textual descriptions and high-dimensional hyperspectral data, we design a Cross-modal Optimal Transport (COT) module. COT establishes precise semantic correspondences between language features and the visual cues of individual spectral bands. Building upon this semantic alignment, we develop a Multimodal Coordinated State Space Model (CoMamba). CoMamba effectively integrates the language-derived priors with spatial information from HR-MSI and spectral information from LR-HSI. This language-guided reconstruction significantly enhances the extraction of crucial spatial-spectral details, leading to superior fidelity in the generated HR-HSI. In addition, this paper contributes text descriptions for three widely used datasets. Both qualitative and quantitative experimental results on the public datasets confirm the superiority of the proposed method over SOTA methods.
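
The abstract does not say how the COT module solves its transport problem; a common choice for this kind of cross-modal alignment is entropy-regularized optimal transport computed with Sinkhorn iterations. The NumPy sketch below is an illustration of that generic technique only, not the paper's implementation; the cosine-distance cost, feature shapes, and all names are assumptions:

```python
import numpy as np

def sinkhorn(cost, reg=0.1, n_iters=200):
    """Entropic optimal transport (Sinkhorn iterations) with uniform marginals.

    cost: (n, m) pairwise cost matrix, e.g. cosine distance between language
    token features and per-band visual features. Returns a transport plan
    whose rows sum to 1/n and whose columns approach 1/m.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform mass over language tokens
    b = np.full(m, 1.0 / m)          # uniform mass over spectral bands
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # alternately rescale columns...
        u = a / (K @ v)              # ...and rows to match the marginals
    return u[:, None] * K * v[None, :]

# toy example: 4 language tokens vs 6 spectral bands (random features)
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(4, 16))
band_feats = rng.normal(size=(6, 16))
tn = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
bn = band_feats / np.linalg.norm(band_feats, axis=1, keepdims=True)
plan = sinkhorn(1.0 - tn @ bn.T)     # cost = 1 - cosine similarity
```

The resulting `plan[i, j]` can be read as a soft correspondence weight between token `i` and band `j`.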

AAAI Conference 2026 Conference Paper

S-DAG: A Subject-Based Directed Acyclic Graph for Multi-Agent Heterogeneous Reasoning

  • Jiangwen Dong
  • Zehui Lin
  • Wanyu Lin
  • Mingjin Zhang

Large Language Models (LLMs) have achieved impressive performance on complex reasoning problems. Their effectiveness highly depends on the specific nature of the task, especially the required domain knowledge. Existing approaches, such as mixture-of-experts, typically operate at the task level; they are too coarse to effectively solve heterogeneous problems involving multiple subjects. This work proposes a novel framework that performs fine-grained analysis at the subject level, equipped with a designated multi-agent collaboration strategy, for heterogeneous problem reasoning. Specifically, given an input query, we first employ a Graph Neural Network to identify the relevant subjects and infer their interdependencies, generating a Subject-based Directed Acyclic Graph (S-DAG), where nodes represent subjects and edges encode information flow. We then profile the candidate LLMs by assigning each model a subject-specific expertise score and select the top-performing model for the corresponding subject in the S-DAG. Such subject-model matching enables graph-structured multi-agent collaboration, where information flows from the starting model to the ending model over the S-DAG. We curate and release multi-subject subsets of standard benchmarks (MMLU-Pro, GPQA, MedMCQA) to better reflect complex, real-world reasoning tasks. Extensive experiments show that our approach significantly outperforms existing task-level model selection and multi-agent collaboration baselines in both accuracy and efficiency. These results highlight the effectiveness of subject-aware reasoning and structured collaboration in addressing complex, multi-subject problems.
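
The routing step the abstract describes, ordering subjects topologically and feeding each matched model the outputs of its predecessor subjects, can be sketched in a few lines. This is a minimal illustration under assumed interfaces (the `experts` callables stand in for subject-matched LLMs), not the paper's code:

```python
from collections import deque

def topological_order(nodes, edges):
    """Kahn's algorithm: order subjects so every edge u -> v has u before v."""
    indeg = {n: 0 for n in nodes}
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    q = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while q:
        u = q.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle; not a DAG")
    return order

def route(query, nodes, edges, experts):
    """Run the subject-matched model for each node in topological order,
    passing each model the query plus the outputs of its predecessors."""
    preds = {n: [u for u, v in edges if v == n] for n in nodes}
    outputs = {}
    for subject in topological_order(nodes, edges):
        context = [outputs[p] for p in preds[subject]]
        outputs[subject] = experts[subject](query, context)
    return outputs

# toy run with stub "models": physics output feeds the medicine model
experts = {
    "physics": lambda q, ctx: f"physics({q})",
    "medicine": lambda q, ctx: f"medicine({q}|{','.join(ctx)})",
}
out = route("dose of radiation?", ["physics", "medicine"],
            [("physics", "medicine")], experts)
```

Replacing the stub lambdas with real model calls keeps the same control flow: information only ever moves forward along S-DAG edges.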

AAAI Conference 2025 Conference Paper

IRMamba: Pixel Difference Mamba with Layer Restoration for Infrared Small Target Detection

  • Mingjin Zhang
  • Xiaolong Li
  • Fei Gao
  • Jie Guo

Infrared small target detection (IRSTD) focuses on identifying small targets in infrared images. Despite advancements with deep learning, challenges persist due to the IR long-range imaging mechanism, where targets are small, dim, and easily lost in noise and background clutter. Current deep learning methods struggle to suppress noise and background interference while preserving fine details, leading to missed detections and false alarms. To address these issues, we propose IRMamba, an encoder-decoder architecture featuring Pixel Difference Mamba (PDMamba) and a Layer Restoration Module (LRM). Specifically, PDMamba integrates the intensity and directional information of pixel differences between scanning positions and their central neighborhoods into the state equation of the state space model (SSM). This enhances target detail representation and suppresses background interference by capturing local 2D dependencies from a global perspective. In addition, LRM incorporates the double-depth image prior into the iterative convergence algorithm and exploits inter-layer relationships to progressively separate the target layer, achieving noise suppression and refined reconstruction of the image mask. Experiments conducted on multiple public datasets, including NUAA-SIRST, NUDT-SIRST, and IRSTD-1K, demonstrate the significant advantages of IRMamba over SOTA methods.

AAAI Conference 2025 Conference Paper

MOCID: Motion Context and Displacement Information Learning for Moving Infrared Small Target Detection

  • Mingjin Zhang
  • Yuanjun Ouyang
  • Fei Gao
  • Jie Guo
  • Qiming Zhang
  • Jing Zhang

In the field of Moving Infrared Small Target Detection (MIRSTD), current methods typically use sequential modeling with two individual modules for spatial and temporal processing. However, such a modeling strategy lacks clear guidance on the motion and displacement differences between moving targets and background noise, thereby limiting feature discriminability and resulting in error-prone target localization. This paper addresses this issue at the clip and frame levels and proposes MOCID, a novel architecture for MIRSTD. For clip-level feature fusion, we design a spatio-temporal backbone consisting of several proposed Fourier-inspired Spatio-temporal Attention (FISTA) layers. Each FISTA layer sequentially processes the features from spatial and temporal views to capture clip-level temporal motion context, where the Fourier transform and its inverse are employed for each view. This context is then embedded into dynamic convolutional kernels for subsequent spatial feature extraction, thereby enabling clear motion-difference guidance and generating comprehensive features. For frame-level feature fusion, we design a Displacement-aware Mamba Module (DAM) to capture detailed frame-to-frame displacement information. DAM utilizes an innovative Temporal Interpolation and Displacement-aware Scan technique to perform spatio-temporal difference-aware displacement modeling, introducing elaborate temporal indicators into feature extraction. Combining the above improvements, our model captures comprehensive motion and displacement contexts, significantly improving small-target detection. Extensive experiments demonstrate that MOCID achieves state-of-the-art detection accuracy on the popular IRDST and DAUB datasets. Furthermore, MOCID offers a superior balance between throughput and performance compared to other methods. The code for this work will be made publicly available.

IJCAI Conference 2025 Conference Paper

Multimodal Prior Learning with Double Constraint Alignment for Snapshot Spectral Compressive Imaging

  • Mingjin Zhang
  • Longyi Li
  • Fei Gao
  • Qiming Zhang
  • Jie Guo

The objective of snapshot spectral compressive imaging reconstruction is to recover a 3D hyperspectral image (HSI) from a 2D measurement. Existing methods either focus on network architecture design or simply introduce image-level priors into the model. However, these methods lack guiding information for accurate reconstruction. Recognizing that textual descriptions contain rich semantic information that can significantly enhance details, this paper introduces a novel framework, CAMM, which integrates text information into the model to improve performance. The framework comprises two key components: a Fine-grained Alignment Module (FAM) and a Multimodal Fusion Mamba (MFM). Specifically, FAM is used to reduce the knowledge gap between the RGB domain obtained by the pre-trained vision-language model and the HSI domain. Through the double constraints of distribution similarity and entropy, adaptive alignment of features of different complexity is realized, making the encoded features more accurate. MFM aims to identify the guiding effect of RGB features and text features on the HSI in the spatial and channel dimensions. Instead of fusing features directly, it integrates image-level and text-level priors into Mamba's state-space equation, so that each scanning step can be accurately guided. This kind of positive feedback adjustment ensures the authenticity of the guiding information. To our knowledge, this is the first text-guided model for compressive spectral imaging. Extensive experimental results on the public datasets demonstrate the superior performance of CAMM, validating the effectiveness of our proposed method.

AAAI Conference 2025 Conference Paper

Semi-supervised Infrared Small Target Detection with Thermodynamic-Inspired Uneven Perturbation and Confidence Adaptation

  • Mingjin Zhang
  • Wenteng Shang
  • Fei Gao
  • Qiming Zhang
  • FengQin Lu
  • Jing Zhang

Single-frame Infrared Small Target (SIRST) detection has made significant advancements, but it still faces challenges due to limited labeled data and the foreground-background class imbalance. To address these issues, we introduce a novel Semi-Supervised SIRST Detection (S^3D) pipeline in this paper. First, drawing inspiration from thermodynamics, we propose augmenting infrared images using both chromatically and spatially uneven perturbations. This dual-stream perturbation enhances the diversity and balance of infrared samples, contributing to the robustness of detection models. Additionally, we develop a confidence-adaptive matching method to maintain weighted consistency among perturbed unlabeled samples. Second, to tackle class imbalance in labeled data, we compel the model to generate discriminative predictions for challenging, misclassified examples while down-weighting well-classified examples. We achieve this by modifying the standard cross-entropy loss to squeeze the detector and by truncating the loss on well-classified examples. Our innovative Truncated Squeeze (TS) loss focuses on learning discriminative representations for difficult cases and prevents over-optimization for simpler ones. To assess the effectiveness of the perturbation techniques and loss functions, we apply them to various SIRST detectors and conduct comprehensive experiments on two benchmark datasets. Notably, our proposed methods consistently and significantly improve accuracy. Remarkably, our approach achieves over 98% of the performance of the state-of-the-art fully-supervised method using only 1/8 of the labeled samples.
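
The abstract gives the idea behind the Truncated Squeeze loss (down-weight easy examples, truncate well-classified ones) but not its exact form. One consistent reading is a focal-style modulation of cross-entropy combined with a truncation threshold; the sketch below is that interpretation only, with `gamma` and `p_trunc` as assumed hyperparameters, not the paper's formula:

```python
import numpy as np

def ts_loss(p, y, gamma=2.0, p_trunc=0.9):
    """Illustrative truncated, down-weighted cross-entropy (per pixel).

    p: predicted foreground probability, y: binary ground-truth label.
    Well-classified pixels (true-class probability above p_trunc) contribute
    zero loss; the rest are down-weighted by (1 - p_t)**gamma, focal-loss
    style, so hard misclassified pixels dominate the gradient.
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)
    p_t = np.where(y == 1, p, 1 - p)             # prob. assigned to true class
    ce = -np.log(p_t)                            # standard cross-entropy
    focal = (1 - p_t) ** gamma * ce              # "squeeze": down-weight easy pixels
    return np.where(p_t > p_trunc, 0.0, focal)   # truncate well-classified pixels

# toy check: one easy positive, one hard positive, one moderate negative
probs = np.array([0.95, 0.60, 0.30])
labels = np.array([1, 1, 0])
loss = ts_loss(probs, labels)
```

Under this reading, the easy positive (0.95) is truncated to zero loss while the hard positive (0.60) receives the largest penalty.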

AAAI Conference 2024 Conference Paper

IRPruneDet: Efficient Infrared Small Target Detection via Wavelet Structure-Regularized Soft Channel Pruning

  • Mingjin Zhang
  • Handi Yang
  • Jie Guo
  • Yunsong Li
  • Xinbo Gao
  • Jing Zhang

Infrared Small Target Detection (IRSTD) refers to detecting faint targets in infrared images, which has achieved notable progress with the advent of deep learning. However, the drive for improved detection accuracy has led to larger, intricate models with redundant parameters, causing storage and computation inefficiencies. In this pioneering study, we introduce the concept of utilizing network pruning to enhance the efficiency of IRSTD. Due to the challenge posed by low signal-to-noise ratios and the absence of detailed semantic information in infrared images, directly applying existing pruning techniques yields suboptimal performance. To address this, we propose a novel wavelet structure-regularized soft channel pruning method, giving rise to the efficient IRPruneDet model. Our approach involves representing the weight matrix in the wavelet domain and formulating a wavelet channel pruning strategy. We incorporate wavelet regularization to induce structural sparsity without incurring extra memory usage. Moreover, we design a soft channel reconstruction method that preserves important target information against premature pruning, thereby ensuring an optimal sparse structure while maintaining overall sparsity. Through extensive experiments on two widely-used benchmarks, our IRPruneDet method surpasses established techniques in both model complexity and accuracy. Specifically, when employing U-net as the baseline network, IRPruneDet achieves a 64.13% reduction in parameters and a 51.19% decrease in FLOPs, while improving IoU from 73.31% to 75.12% and nIoU from 70.92% to 74.30%. The code is available at https://github.com/hd0013/IRPruneDet.
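
As a rough illustration of wavelet-domain channel scoring with soft (recoverable) pruning, the sketch below scores each output channel by the L1 norm of its Haar coefficients and scales down, rather than deletes, low-scoring channels. This is a guess at the general mechanism under assumed details (single-level Haar, L1 scoring, a fixed 0.1 soft-mask factor), not the released IRPruneDet code; see the linked repository for the real implementation:

```python
import numpy as np

def haar1d(x):
    """Single-level 1-D Haar transform along the last axis (even length)."""
    a = (x[..., 0::2] + x[..., 1::2]) / np.sqrt(2)   # approximation coefficients
    d = (x[..., 0::2] - x[..., 1::2]) / np.sqrt(2)   # detail coefficients
    return np.concatenate([a, d], axis=-1)

def soft_channel_prune(weight, keep_ratio=0.5):
    """Score each output channel in the wavelet domain and softly prune.

    weight: (out_channels, in_channels * k * k) flattened conv weights.
    Channels below the keep threshold are scaled by 0.1 instead of being
    zeroed, so prematurely pruned channels can recover during training.
    """
    scores = np.abs(haar1d(weight)).sum(axis=1)      # wavelet-domain L1 importance
    k = max(1, int(keep_ratio * len(scores)))
    thresh = np.sort(scores)[::-1][k - 1]            # k-th largest score
    mask = np.where(scores >= thresh, 1.0, 0.1)      # soft, recoverable mask
    return weight * mask[:, None], mask

# toy run: 8 output channels, 16 flattened weights each
rng = np.random.default_rng(0)
weight = rng.normal(size=(8, 16))
pruned, mask = soft_channel_prune(weight, keep_ratio=0.5)
```

Adding an L1 penalty on the wavelet coefficients to the training loss would play the role of the paper's wavelet structural regularization.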

IJCAI Conference 2022 Conference Paper

SAR-to-Optical Image Translation via Neural Partial Differential Equations

  • Mingjin Zhang
  • Chengyu He
  • Jing Zhang
  • Yuxiang Yang
  • Xiaoqi Peng
  • Jie Guo

Synthetic Aperture Radar (SAR) is becoming prevalent in remote sensing, yet SAR images are challenging for human visual perception to interpret due to the active imaging mechanism and speckle noise. Recent research on SAR-to-optical image translation provides a promising solution and has attracted increasing attention, though it still suffers from low optical image quality with geometric distortion due to the large domain gap. In this paper, we mitigate this issue from a novel perspective, i.e., neural partial differential equations (PDE). First, based on an efficient numerical scheme for solving PDEs, i.e., Taylor Central Difference (TCD), we devise a basic TCD residual block to build the backbone network, which promotes the extraction of useful information in SAR images by aggregating and enhancing features from different levels. Furthermore, inspired by Perona-Malik Diffusion (PMD), we devise a PMD neural module to implement feature diffusion through layers, aiming at removing noise in smooth regions while preserving geometric structures. Assembling them together, we propose a novel SAR-to-optical image translation network named S2O-NPDE, which delivers optical images with finer structures and less noise while enjoying an explainability advantage from explicit mathematical derivation. Experiments on the popular SEN1-2 dataset show that our model outperforms state-of-the-art methods in terms of both objective metrics and visual quality.
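
Perona-Malik Diffusion itself is a classic anisotropic-diffusion PDE whose edge-preserving smoothing the PMD module emulates in feature space. A standard image-space implementation (plain NumPy; periodic borders via `np.roll` for brevity, exponential edge-stopping function assumed) looks like this:

```python
import numpy as np

def perona_malik(img, n_iters=20, kappa=1.0, step=0.2):
    """Classic Perona-Malik diffusion on a 2-D image.

    Smooths homogeneous (low-gradient) regions while preserving edges:
    the conductance g(|grad|) = exp(-(|grad|/kappa)^2) shuts diffusion
    off where gradients are large. step <= 0.25 keeps the explicit
    4-neighbour scheme stable.
    """
    u = img.astype(float).copy()
    for _ in range(n_iters):
        # finite differences toward the four neighbours (periodic border)
        dn = np.roll(u, -1, axis=0) - u
        ds = np.roll(u, 1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        # edge-stopping conductance per direction
        cn = np.exp(-(dn / kappa) ** 2)
        cs = np.exp(-(ds / kappa) ** 2)
        ce = np.exp(-(de / kappa) ** 2)
        cw = np.exp(-(dw / kappa) ** 2)
        u += step * (cn * dn + cs * ds + ce * de + cw * dw)
    return u

# toy run: diffusing a noisy flat patch reduces its variance
rng = np.random.default_rng(1)
noisy = 0.5 + 0.1 * rng.normal(size=(32, 32))
smooth = perona_malik(noisy, n_iters=30)
```

The neural version replaces the fixed conductance with learned per-feature weights, but the layer-to-layer update follows the same diffusion form.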

AAAI Conference 2018 Conference Paper

Face Sketch Synthesis From Coarse to Fine

  • Mingjin Zhang
  • Nannan Wang
  • Yunsong Li
  • Ruxin Wang
  • Xinbo Gao

Synthesizing fine face sketches from photos is a valuable yet challenging problem in digital entertainment. Face sketches synthesized by conventional methods usually exhibit coarse structures of faces, whereas fine details are lost, especially on some critical facial components. In this paper, by imitating the coarse-to-fine drawing process of artists, we propose a novel face sketch synthesis framework consisting of a coarse stage and a fine stage. In the coarse stage, a mapping relationship between face photos and sketches is learned via a convolutional neural network. It ensures that the synthesized sketches keep the coarse structures of faces. Given the test photo and the coarse synthesized sketch, a probabilistic graphical model is designed to synthesize the delicate face sketch with fine and critical details. Experimental results on public face sketch databases illustrate that our proposed framework outperforms the state-of-the-art methods in both quantitative and visual comparisons.

IJCAI Conference 2018 Conference Paper

Markov Random Neural Fields for Face Sketch Synthesis

  • Mingjin Zhang
  • Nannan Wang
  • Xinbo Gao
  • Yunsong Li

Synthesizing face sketches with both common and specific information from photos has recently attracted considerable attention in digital entertainment. However, existing approaches either make a strict similarity assumption on face sketches and photos, losing some identity-specific information, or learn a direct mapping from face photos to sketches with a simple neural network, lacking some common information. In this paper, we propose a novel face sketch synthesis method based on Markov random neural fields comprising two structures. In the first structure, we utilize a neural network to learn the non-linear photo-sketch relationship and obtain the identity-specific information of the test photo, such as glasses, hairpins, and hairstyles. In the second structure, we choose the nearest neighbors of the test photo patch and the sketch pixel synthesized in the first structure from the training data, which ensures the common information of Miss or Mr Average. Experimental results on the Chinese University of Hong Kong face sketch database illustrate that our proposed framework can preserve the common structure and capture the characteristic features. Compared with the state-of-the-art methods, our method achieves better results in terms of both quantitative and qualitative experimental evaluations.

TIST Journal 2015 Journal Article

TerraFly GeoCloud

  • Mingjin Zhang
  • Huibo Wang
  • Yun Lu
  • Tao Li
  • Yudong Guang
  • Chang Liu
  • Erik Edrosa
  • Hongtai Li

With the exponential growth in the usage of web map services, geo-data analysis has become more and more popular. This article develops an online spatial data analysis and visualization system, TerraFly GeoCloud, which helps end-users visualize and analyze spatial data and share the analysis results. Built on the TerraFly geospatial database, TerraFly GeoCloud is an extra layer running upon the TerraFly map and can efficiently support many different visualization functions and spatial data analysis models. Furthermore, users can create unique URLs to visualize and share the analysis results. TerraFly GeoCloud also enables the MapQL technology to customize map visualization using SQL-like statements. The system is available at http://terrafly.fiu.edu/GeoCloud/.