Arrow Research

Author name cluster

Xinghao Ding

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

25 papers
2 author rows

Possible papers (25)

AAAI Conference 2026 Conference Paper

Decompose and Attribute: Boosting Generalizable Open-Set Object Detection via Objectness Score

  • Yuxuan Yuan
  • Lichen Wei
  • Luyao Tang
  • Chaoqi Chen
  • Zheyuan Cai
  • Yue Huang
  • Xinghao Ding

Open-set object detection (OSOD) aims to recognize known object categories while localizing previously unseen instances. However, real-world scenarios often involve co-occurring domain shifts and novel object categories. Existing OSOD methods typically overlook domain shifts, relying on source-trained representations that entangle domain-specific style with semantic content, thereby hindering generalization to both unseen domains and novel categories. To address this challenge, we propose a unified framework, termed DecOmpose and ATtribute (DOAT), which disentangles domain-specific style from semantic structure, thereby facilitating generalizable object detection. DOAT employs wavelet-based feature decomposition to separate style information from high-frequency structural details, thus enabling an explicit separation of domain and category shifts. To account for domain shift, the low-frequency components are perturbed within a style subspace to simulate diverse domain appearances. For unknown object discovery, the high-frequency components are utilized to estimate objectness scores via an attribution mechanism that fuses wavelet energy with semantic distance to known-category prototypes. Extensive experiments on standard open-set benchmarks have demonstrated the superior generalization performance of DOAT.
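
As a rough illustration of the decomposition-and-attribution idea, the sketch below splits a feature map with a Haar wavelet and fuses high-frequency energy with the distance to known-category prototypes into a single toy objectness score. The pooling, the fusion weight alpha, and all names are illustrative assumptions, not the paper's implementation.

    import numpy as np
    import pywt

    def objectness_score(feat, prototypes, alpha=0.5):
        # feat: (C, H, W) backbone feature map; prototypes: (K, C) known-class prototypes.
        LL, (LH, HL, HH) = pywt.dwt2(feat, 'haar', axes=(-2, -1))
        # High-frequency (structural) energy, averaged over channels and space.
        hf_energy = np.mean(LH**2 + HL**2 + HH**2)
        # Semantic distance: gap to the nearest known-category prototype.
        v = feat.mean(axis=(1, 2))
        d = np.linalg.norm(prototypes - v, axis=1).min()
        # Structured regions that sit far from every known prototype score high.
        return alpha * hf_energy + (1 - alpha) * d

    rng = np.random.default_rng(0)
    score = objectness_score(rng.normal(size=(8, 32, 32)), rng.normal(size=(5, 8)))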

AAAI Conference 2026 Conference Paper

MMMamba: A Versatile Cross-Modal In-Context Fusion Framework for Pan-Sharpening and Zero-Shot Image Enhancement

  • Yingying Wang
  • Xuanhua He
  • Chen Wu
  • Jialing Huang
  • Suiyun Zhang
  • Rui Liu
  • Xinghao Ding
  • Haoxuan Che

Pan-sharpening aims to generate high-resolution multispectral (HRMS) images by integrating a high-resolution panchromatic (PAN) image with its corresponding low-resolution multispectral (MS) image. To achieve effective fusion, it is crucial to fully exploit the complementary information between the two modalities. Traditional CNN-based methods typically rely on channel-wise concatenation with fixed convolutional operators, which limits their adaptability to diverse spatial and spectral variations. While cross-attention mechanisms enable global interactions, they are computationally inefficient and may dilute fine-grained correspondences, making it difficult to capture complex semantic relationships. Recent advances in the Multimodal Diffusion Transformer (MMDiT) architecture have demonstrated impressive success in image generation and editing tasks. Unlike cross-attention, MMDiT employs in-context conditioning to facilitate more direct and efficient cross-modal information exchange. In this paper, we propose MMMamba, a cross-modal in-context fusion framework for pan-sharpening, with the flexibility to support image super-resolution in a zero-shot manner. Built upon the Mamba architecture, our design ensures linear computational complexity while maintaining strong cross-modal interaction capacity. Furthermore, we introduce a novel multimodal interleaved (MI) scanning mechanism that facilitates effective information exchange between the PAN and MS modalities. Extensive experiments demonstrate the superior performance of our method compared to existing state-of-the-art (SOTA) techniques across multiple tasks and benchmarks.
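
The in-context fusion idea can be pictured as placing PAN and MS tokens in one interleaved sequence before it enters an SSM block, so the two modalities exchange information directly rather than through cross-attention. A minimal sketch of such an interleaved scan (shapes and names are assumptions; the paper's MI mechanism may differ):

    import torch

    def interleave_tokens(pan, ms):
        # pan, ms: (B, L, D) token sequences from the two modalities.
        # Returns (B, 2L, D) alternating PAN, MS, PAN, MS, ...
        B, L, D = pan.shape
        return torch.stack((pan, ms), dim=2).reshape(B, 2 * L, D)

    seq = interleave_tokens(torch.randn(1, 4, 8), torch.randn(1, 4, 8))
    # `seq` would then be scanned by a linear-complexity Mamba/SSM layer.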

AAAI Conference 2026 Conference Paper

Self-supervised Multiplex Consensus Mamba for General Image Fusion

  • Yingying Wang
  • Rongjin Zhuang
  • Hui Zheng
  • Xuanhua He
  • Ke Cao
  • Xiaotong Tu
  • Xinghao Ding

Image fusion integrates complementary information from different modalities to generate high-quality fused images, thereby enhancing downstream tasks such as object detection and semantic segmentation. Unlike task-specific techniques that primarily focus on consolidating inter-modal information, general image fusion needs to address a wide range of tasks while improving performance without increasing complexity. To achieve this, we propose SMC-Mamba, a Self-supervised Multiplex Consensus Mamba framework for general image fusion. Specifically, the Modality-Agnostic Feature Enhancement (MAFE) module preserves fine details through adaptive gating and enhances global representations via spatial-channel and frequency rotational scanning. The Multiplex Consensus Cross-modal Mamba (MCCM) module enables dynamic collaboration among experts, reaching a consensus to efficiently integrate complementary information from multiple modalities. The cross-modal scanning within MCCM further strengthens feature interactions across modalities, facilitating seamless integration of critical information from both sources. Additionally, we introduce a Bi-level Self-supervised Contrastive Learning Loss (BSCL), which preserves high-frequency information without increasing computational overhead while simultaneously boosting performance in downstream tasks. Extensive experiments demonstrate that our approach outperforms state-of-the-art (SOTA) image fusion algorithms in tasks such as infrared-visible, medical, multi-focus, and multi-exposure fusion, as well as downstream visual tasks.
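
One way to picture the multiplex-consensus step is a gated mixture of experts over concatenated modality features, with the gated sum acting as the consensus. Everything below (shapes, the softmax gate, the linear experts) is a generic assumption, not the MCCM module itself:

    import torch

    def consensus_fusion(feats, experts, gate):
        # feats: list of per-modality features, each (B, D); gate: (B, E) weights
        # summing to 1. Each expert sees all modalities; the gated sum is the
        # "consensus" fused representation.
        x = torch.cat(feats, dim=1)                          # (B, M*D)
        outs = torch.stack([e(x) for e in experts], dim=1)   # (B, E, D)
        return (gate.unsqueeze(-1) * outs).sum(dim=1)        # (B, D)

    experts = [torch.nn.Linear(16, 8) for _ in range(3)]
    gate = torch.softmax(torch.randn(2, 3), dim=1)
    fused = consensus_fusion([torch.randn(2, 8), torch.randn(2, 8)], experts, gate)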

AAAI Conference 2025 Conference Paper

Accelerated Diffusion via High-Low Frequency Decomposition for Pan-Sharpening

  • Ge Meng
  • Jingjia Huang
  • Jingyan Tu
  • Yingying Wang
  • Yunlong Lin
  • Xiaotong Tu
  • Yue Huang
  • Xinghao Ding

Pan-sharpening aims to preserve the spectral information of the multi-spectral (MS) image while leveraging the high-frequency details from the guided high-resolution panchromatic (PAN) image to enhance its spatial resolution. The key challenge is how to preserve the spectral information from the MS image and the spatial details from the PAN image as much as possible. Diffusion models have achieved favorable results in image restoration and synthesis tasks but consume excessive computational resources and time. In this paper, we design a novel and computationally efficient diffusion-based pan-sharpening network that achieves accelerated diffusion while reducing task complexity by decoupling the high- and low-frequency components of the fused image. Specifically, leveraging the information-preserving characteristic of the wavelet transformation, we introduce a Wavelet-based Low-frequency Diffusion Model (WLDM). WLDM generates the low-frequency coefficient of the high-resolution MS (HRMS) image from the low-resolution MS (LRMS) image. This approach significantly reduces computational resources and complexity compared to the direct restoration of the HRMS image. Furthermore, we devise a High-frequency Information Restoration Module (HIRM) to restore the high-frequency information in the HRMS image through the interaction of high-frequency coefficients from the PAN image in three directions. Extensive experiments on three different datasets demonstrate that our method outperforms existing approaches in quantitative metrics, qualitative results, and inference efficiency.
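
The decoupling can be pictured on a single band: run the expensive diffusion only on the small LL subband and recompose with high-frequency coefficients restored from the PAN image. Here diffuse_ll and restore_hf are stand-ins for the learned WLDM and HIRM modules:

    import numpy as np
    import pywt

    def fuse_band(lrms_band, pan, diffuse_ll, restore_hf):
        # lrms_band, pan: 2-D arrays at the same (target) resolution.
        LL, (LH, HL, HH) = pywt.dwt2(lrms_band, 'haar')
        LL_hr = diffuse_ll(LL)                         # diffusion on the quarter-size LL band
        _, pan_hf = pywt.dwt2(pan, 'haar')             # PAN details in three directions
        LH, HL, HH = restore_hf((LH, HL, HH), pan_hf)  # HIRM-style interaction
        return pywt.idwt2((LL_hr, (LH, HL, HH)), 'haar')

    out = fuse_band(np.zeros((64, 64)), np.ones((64, 64)),
                    diffuse_ll=lambda x: x, restore_hf=lambda ms_hf, pan_hf: pan_hf)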

AAAI Conference 2025 Conference Paper

AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement

  • Yunlong Lin
  • Tian Ye
  • Sixiang Chen
  • Zhenqi Fu
  • Yingying Wang
  • Wenhao Chai
  • Zhaohu Xing
  • Wenxue Li

Existing low-light image enhancement (LIE) methods have achieved noteworthy success in solving synthetic distortions, yet they often fall short in practical applications. The limitations arise from two inherent challenges in real-world LIE: 1) the collection of distorted/clean image pairs is often impractical and sometimes even unavailable, and 2) accurately modeling complex degradations presents a non-trivial problem. To overcome these challenges, we propose the Attribute Guidance Diffusion framework (AGLLDiff), a training-free method for effective real-world LIE. Instead of specifically defining the degradation process, AGLLDiff shifts the paradigm and models the desired attributes of normal-light images, such as exposure, structure, and color. These attributes are readily available and impose no assumptions about the degradation process, which guides the diffusion sampling process toward a reliable, high-quality solution space. Extensive experiments demonstrate that our approach outperforms the current leading unsupervised LIE methods across benchmarks in terms of distortion-based and perceptual-based metrics, and it performs well even under sophisticated real-world degradations.
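
Attribute guidance is essentially classifier-guidance-style sampling: at each denoising step the sample is nudged down the gradient of a differentiable attribute loss (exposure, structure, color) with respect to the current sample. A minimal sketch with assumed callables denoiser and attr_loss:

    import torch

    def guided_step(x, denoiser, attr_loss, t, step_size=0.1):
        # Nudge the current diffusion sample toward the desired attributes.
        x = x.detach().requires_grad_(True)
        x0_hat = denoiser(x, t)               # model's estimate of the clean image
        grad = torch.autograd.grad(attr_loss(x0_hat), x)[0]
        return (x - step_size * grad).detach()

    x = guided_step(torch.randn(1, 3, 32, 32),
                    denoiser=lambda x, t: x,                        # stand-in network
                    attr_loss=lambda im: (im.mean() - 0.5).pow(2),  # toy exposure target
                    t=0)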

ICML Conference 2025 Conference Paper

Demeaned Sparse: Efficient Anomaly Detection by Residual Estimate

  • Yifan Fang
  • Yifei Fang
  • Ruizhe Chen
  • Haote Xu
  • Xinghao Ding
  • Yue Huang 0001

Frequency-domain image anomaly detection methods can substantially enhance anomaly detection performance; however, they still lack an interpretable theoretical framework to guarantee the effectiveness of the detection process. We propose a novel test to detect anomalies in structural images via a Demeaned Fourier Transform (DFT) under a factor-model framework, and we prove its effectiveness. We also briefly present the asymptotic theory of our test, which explains why the test can detect anomalies at both the image and pixel levels within the theoretical lower bound. Based on this test, we derive a module called Demeaned Fourier Sparse (DFS) that effectively enhances detection performance in unsupervised anomaly detection tasks; it constructs masks in the Fourier domain and utilizes a distribution-free sampling method similar to the bootstrap. The experimental results indicate that this module can accurately and efficiently generate effective masks for reconstruction-based anomaly detection tasks, thereby enhancing the performance of anomaly detection methods and validating the effectiveness of the theoretical framework.
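
A toy version of the Fourier-domain masking idea: demean the test image's spectrum against a stack of nominal samples and keep only the sparse high-energy residual. The simple quantile threshold below stands in for the paper's distribution-free, bootstrap-like sampling:

    import numpy as np

    def dfs_anomaly_map(images, idx, q=0.99):
        # images: (N, H, W) nominal samples with the test image at position idx.
        spec = np.fft.fft2(images)                   # per-image 2-D spectra
        resid = spec[idx] - spec.mean(axis=0)        # demeaned (factor-removed) spectrum
        mask = np.abs(resid) > np.quantile(np.abs(resid), q)
        return np.abs(np.fft.ifft2(resid * mask))    # pixel-level anomaly energy

    amap = dfs_anomaly_map(np.random.rand(8, 32, 32), idx=0)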

AAAI Conference 2025 Conference Paper

DPLUT: Unsupervised Low-light Image Enhancement with Lookup Tables and Diffusion Priors

  • Yunlong Lin
  • Zhenqi Fu
  • Kairun Wen
  • Tian Ye
  • Sixiang Chen
  • Ge Meng
  • Yingying Wang
  • Chui Kong

Low-light image enhancement (LIE) aims at precisely and efficiently recovering an image degraded in poor illumination environments. Recent LIE techniques rely on deep neural networks, which require large numbers of low-/normal-light image pairs, network parameters, and computational resources; as a result, their practicality is limited. In this work, we devise a novel unsupervised LIE framework based on diffusion priors and lookup tables (DPLUT) to achieve efficient low-light image recovery. The proposed approach comprises two critical components: a light adjustment lookup table (LLUT) and a noise suppression lookup table (NLUT). LLUT is optimized with a set of unsupervised losses and predicts pixel-wise curve parameters for the dynamic-range adjustment of a specific image. NLUT is designed to remove the noise amplified after brightening. As diffusion models are sensitive to noise, diffusion priors are introduced to achieve high-performance noise suppression. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods in terms of visual quality and efficiency.
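
The light-adjustment table predicts pixel-wise curve parameters; a common curve form in unsupervised LIE, assumed here for illustration (LLUT's exact parameterization may differ), is the iterated quadratic:

    import torch

    def apply_light_curve(x, a, iters=4):
        # x: (B, 3, H, W) image in [0, 1]; a: pixel-wise curve parameters,
        # normally predicted per image, here fixed for illustration.
        for _ in range(iters):
            x = x + a * x * (1.0 - x)   # monotone curve that lifts dark pixels
        return x.clamp(0, 1)

    x = torch.rand(1, 3, 64, 64)
    bright = apply_light_curve(x, a=torch.full_like(x, 0.3))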

NeurIPS Conference 2025 Conference Paper

DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling

  • Kairun Wen
  • Runyu Chen
  • Hui Zheng
  • Yunlong Lin
  • Panwang Pan
  • Chenxin Li
  • Wenyan Cong
  • Jian Zhang

Understanding the dynamic physical world, characterized by its evolving 3D structure, real-world motion, and semantic content with textual descriptions, is crucial for human-agent interaction and enables embodied agents to perceive and act within real environments with human-like capabilities. However, existing datasets are often derived from limited simulators or utilize traditional Structure-from-Motion for up-to-scale annotation and offer limited descriptive captioning, which restricts the capacity of foundation models to accurately interpret real-world dynamics from monocular videos, commonly sourced from the internet. To bridge these gaps, we introduce DynamicVerse, a physical-scale, multimodal 4D world modeling framework for dynamic real-world video. We employ large vision, geometric, and multimodal models to interpret metric-scale static geometry, real-world dynamic motion, instance-level masks, and holistic descriptive captions. By integrating window-based Bundle Adjustment with global optimization, our method converts long real-world video sequences into a comprehensive 4D multimodal format. DynamicVerse delivers a large-scale dataset consisting of 100K+ videos with 800K+ annotated masks and 10M+ frames from internet videos. Experimental evaluations on three benchmark tasks, namely video depth estimation, camera pose estimation, and camera intrinsics estimation, demonstrate that our 4D modeling achieves superior performance in capturing physical-scale measurements with greater global accuracy than existing methods.

NeurIPS Conference 2025 Conference Paper

FRN: Fractal-Based Recursive Spectral Reconstruction Network

  • Ge Meng
  • Zhongnan Cai
  • Ruizhe Chen
  • Jingyan Tu
  • Yingying Wang
  • Yue Huang
  • Xinghao Ding

Generating hyperspectral images (HSIs) from RGB images through spectral reconstruction can significantly reduce the cost of HSI acquisition. In this paper, we propose a Fractal-Based Recursive Spectral Reconstruction Network (FRN), which differs from existing paradigms that attempt to directly integrate the full-spectrum information from the R, G, and B channels in a one-shot manner. Instead, FRN treats spectral reconstruction as a progressive process, predicting each next wavelength in a coarse-to-fine manner from broad to narrow bands. Inspired by fractals in mathematics, FRN establishes a novel spectral reconstruction paradigm by recursively invoking an atomic reconstruction module. In each invocation, only the spectral information from neighboring bands is used to provide clues for generating the image at the next wavelength, which follows the low-rank property of spectral data. Moreover, we design a band-aware state space model that employs a pixel-differentiated scanning strategy at different stages of the generation process, further suppressing interference from low-correlation regions caused by reflectance differences. Through extensive experimentation across different datasets, FRN achieves superior reconstruction performance compared to state-of-the-art methods. Code is available at https://github.com/mongko007/frn.
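
The recursive paradigm reduces to one loop: keep calling the same atomic module on the few most recent bands to emit the image at the next wavelength. A sketch with an assumed predict_next callable:

    import numpy as np

    def reconstruct_spectrum(seed_bands, predict_next, n_bands=31, k=3):
        # seed_bands: list of 2-D arrays seeding the recursion (e.g. R, G, B).
        bands = list(seed_bands)
        while len(bands) < n_bands:
            context = np.stack(bands[-k:])        # only neighboring bands as clues
            bands.append(predict_next(context))   # recursive atomic call
        return np.stack(bands)

    hsi = reconstruct_spectrum([np.zeros((8, 8))] * 3,
                               predict_next=lambda c: c.mean(axis=0))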

NeurIPS Conference 2025 Conference Paper

JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent

  • Yunlong Lin
  • Zixu Lin
  • Kunjie Lin
  • Jinbin Bai
  • Panwang Pan
  • Chenxin Li
  • Haoyu Chen
  • Zhongdao Wang

Photo retouching has become integral to contemporary visual storytelling, enabling users to capture aesthetics and express creativity. While professional tools such as Adobe Lightroom offer powerful capabilities, they demand substantial expertise and manual effort. In contrast, existing AI-based solutions provide automation but often suffer from limited adjustability and poor generalization, failing to meet diverse and personalized editing needs. To bridge this gap, we introduce JarvisArt, a multi-modal large language model (MLLM)-driven agent that understands user intent, mimics the reasoning process of professional artists, and intelligently coordinates over 200 retouching tools within Lightroom. JarvisArt undergoes a two-stage training process: an initial Chain-of-Thought supervised fine-tuning to establish basic reasoning and tool-use skills, followed by Group Relative Policy Optimization for Retouching (GRPO-R) to further enhance its decision-making and tool proficiency. We also propose the Agent-to-Lightroom Protocol to facilitate seamless integration with Lightroom. To evaluate performance, we develop MMArt-Bench, a novel benchmark constructed from real-world user edits. JarvisArt demonstrates user-friendly interaction, superior generalization, and fine-grained control over both global and local adjustments, paving a new avenue for intelligent photo retouching. Notably, it outperforms GPT-4o with a 60% improvement in average pixel-level metrics on MMArt-Bench for content fidelity, while maintaining comparable instruction-following capabilities.

NeurIPS Conference 2025 Conference Paper

Pan-LUT: Efficient Pan-sharpening via Learnable Look-Up Tables

  • Zhongnan Cai
  • Yingying Wang
  • Hui Zheng
  • Panwang Pan
  • Zixu Lin
  • Ge Meng
  • Chenxin Li
  • Chunming He

Recently, deep learning-based pan-sharpening algorithms have achieved notable advancements over traditional methods. However, deep learning-based methods incur substantial computational overhead during inference, especially with large images. This excessive computational demand limits the applicability of these methods in real-world scenarios, particularly in the absence of dedicated computing devices such as GPUs and TPUs. To address these challenges, we propose Pan-LUT, a novel learnable look-up table (LUT) framework for pan-sharpening that strikes a balance between performance and computational efficiency for large remote sensing images. Our method makes it possible to process 15K×15K remote sensing images on a 24GB GPU. To finely control the spectral transformation, we devise the PAN-guided look-up table (PGLUT) for channel-wise spectral mapping. To effectively capture fine-grained spatial details, we introduce the spatial details look-up table (SDLUT). Furthermore, to adaptively aggregate channel information for generating high-resolution multispectral images, we design an adaptive output look-up table (AOLUT). Our model contains fewer than 700K parameters and processes a 9K×9K image in under 1 ms using one RTX 2080 Ti GPU, demonstrating significantly faster performance compared to other methods. Experiments reveal that Pan-LUT efficiently processes large remote sensing images in a lightweight manner, bridging the gap to real-world applications. Furthermore, our model surpasses SOTA methods in full-resolution scenes under real-world conditions, highlighting its effectiveness and efficiency. We also extend our method to general image fusion tasks.
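
The primitive behind all three tables is a differentiable look-up: index a small learnable table by intensity and linearly interpolate, so inference costs a few gathers instead of deep convolutions. A sketch of a 1-D case (the actual PGLUT/SDLUT/AOLUT designs are more elaborate):

    import torch

    def apply_lut(x, table):
        # x: intensities in [0, 1]; table: (S,) learnable LUT entries.
        s = table.numel() - 1
        idx = x.clamp(0, 1) * s
        lo = idx.floor().clamp(max=s - 1).long()
        w = idx - lo.float()
        # Linear interpolation keeps the table differentiable and trainable.
        return (1 - w) * table[lo] + w * table[lo + 1]

    table = torch.nn.Parameter(torch.linspace(0, 1, 33))   # near-identity init
    out = apply_lut(torch.rand(1, 4, 64, 64), table)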

AAAI Conference 2025 Conference Paper

Sp3ctralMamba: Physics-Driven Joint State Space Model for Hyperspectral Image Reconstruction

  • Ge Meng
  • Jingyan Tu
  • Jingjia Huang
  • Yunlong Lin
  • Yingying Wang
  • Xiaotong Tu
  • Yue Huang
  • Xinghao Ding

Hyperspectral image (HSI) reconstruction aims to restore the original 3D HSIs from the 2D hyperspectral snapshot compressive images (SCIs). The key to high-fidelity HSI reconstruction lies in designing refined spatial and spectral attention mechanisms, which are crucial for generating fine-grained representations of HSI based on the limited spatial and spectral information available in SCI. Recently, Mamba has demonstrated remarkable performance and efficiency in modeling spatial correlations. Its implicit attention mechanism generates three orders of magnitude more attention matrices than transformers, significantly raising the performance ceiling for HSI reconstruction. In this paper, we propose a novel joint SSM network named Sp3ctralMamba for HSI reconstruction. Sp3ctralMamba integrates frequency domain knowledge and physical priors to enhance reconstruction quality. Specifically, we first perform hierarchical decomposition of the 3D HSI embedding to mitigate the negative impact of distant bands on reconstruction. Next, we design a joint SSM block S3Mamba (S3MAB) to perform parallel scans of the embeddings from different bands. In addition to the conventional vanilla scan, S3MAB introduces a local scanning scheme to address the reconstruction challenges posed by the spatial sparsity of spectral information. Furthermore, a spiral scanning scheme in the frequency domain is incorporated to enhance the order correlation between different frequency signals. Finally, we introduce energy priors and structural priors to constrain the generation of spectral and spatial representations during the training process. Extensive experiments on both simulated and real datasets demonstrate that Sp3ctralMamba significantly elevates HSI reconstruction performance to a new level, surpassing SOTA methods in both quantitative and qualitative metrics.
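
The spiral frequency scan can be pictured as ordering a centered 2-D spectrum from the outer ring inward, i.e. roughly from high to low frequency, before the coefficients are scanned by the SSM. The traversal below is an illustrative guess at such a scheme, not the paper's exact scan:

    import numpy as np

    def spiral_order(h, w):
        # Visit an h x w grid from the border inward (outer ring first).
        top, bottom, left, right, order = 0, h - 1, 0, w - 1, []
        while top <= bottom and left <= right:
            order += [(top, c) for c in range(left, right + 1)]
            order += [(r, right) for r in range(top + 1, bottom + 1)]
            if top < bottom:
                order += [(bottom, c) for c in range(right - 1, left - 1, -1)]
            if left < right:
                order += [(r, left) for r in range(bottom - 1, top, -1)]
            top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
        return order

    coeffs = np.fft.fftshift(np.fft.fft2(np.random.rand(8, 8)))
    seq = [coeffs[r, c] for r, c in spiral_order(8, 8)]   # 1-D sequence for the SSM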

AAAI Conference 2024 Conference Paper

Progressive High-Frequency Reconstruction for Pan-Sharpening with Implicit Neural Representation

  • Ge Meng
  • Jingjia Huang
  • Yingying Wang
  • Zhenqi Fu
  • Xinghao Ding
  • Yue Huang

Pan-sharpening aims to leverage the high-frequency signal of the panchromatic (PAN) image to enhance the resolution of its corresponding multi-spectral (MS) image. However, deep neural networks (DNNs) tend to prioritize learning the low-frequency components during the training process, which limits the restoration of high-frequency edge details in MS images. To overcome this limitation, we treat pan-sharpening as a coarse-to-fine high-frequency restoration problem and propose a novel method for achieving high-quality restoration of edge information in MS images. Specifically, to effectively obtain fine-grained multi-scale contextual features, we design a Band-limited Multi-scale High-frequency Generator (BMHG) that generates high-frequency signals from the PAN image within different bandwidths. During training, higher-frequency signals are progressively injected into the MS image, and corresponding residual blocks are introduced into the network simultaneously. This design enables gradients to flow smoothly from later to earlier blocks, encouraging intermediate blocks to concentrate on missing details. Furthermore, to address the issue of pixel position misalignment arising from multi-scale feature fusion, we propose a Spatial-spectral Implicit Image Function (SIIF) that employs implicit neural representation to effectively represent and fuse spatial and spectral features in the continuous domain. Extensive experiments on different datasets demonstrate that our method outperforms existing approaches in terms of quantitative and visual measurements for high-frequency detail recovery.
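
Band-limited high-frequency signals can be sketched as a difference-of-blurs pyramid over the PAN image; the box blur below is a stand-in for whatever band-limiting filters BMHG actually learns:

    import torch
    import torch.nn.functional as F

    def bandlimited_highfreq(pan, kernels=(3, 7, 15)):
        # pan: (B, 1, H, W). Differences of successively wider blurs isolate
        # detail within shrinking bandwidths (finest band first).
        bands, prev = [], pan
        for k in kernels:
            cur = F.avg_pool2d(pan, k, stride=1, padding=k // 2)
            bands.append(prev - cur)
            prev = cur
        return bands   # training would inject them in reverse, coarse to fine

    bands = bandlimited_highfreq(torch.rand(1, 1, 64, 64))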

AAAI Conference 2024 Conference Paper

Unsupervised Pan-Sharpening via Mutually Guided Detail Restoration

  • Huangxing Lin
  • Yuhang Dong
  • Xinghao Ding
  • Tianpeng Liu
  • Yongxiang Liu

Pan-sharpening is a task that aims to super-resolve the low-resolution multispectral (LRMS) image with the guidance of a corresponding high-resolution panchromatic (PAN) image. The key challenge in pan-sharpening is to accurately model the relationship between the MS and PAN images. While supervised deep learning methods are commonly employed to address this task, the unavailability of ground truth severely limits their effectiveness. In this paper, we propose a mutually guided detail restoration method for unsupervised pan-sharpening. Specifically, we treat pan-sharpening as a blind image deblurring task, in which the blur kernel can be estimated by a CNN. Constrained by the blur kernel, the pan-sharpened image retains spectral information consistent with the LRMS image. Once the pan-sharpened image is obtained, the PAN image is blurred using a pre-defined blur operator. The pan-sharpened image, in turn, is used to guide the detail restoration of the blurred PAN image. By leveraging the mutual guidance between MS and PAN images, the pan-sharpening network can implicitly learn the spatial relationship between the two modalities. Extensive experiments show that the proposed method significantly outperforms existing unsupervised pan-sharpening methods.
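
The mutual guidance can be sketched as two losses: a spectral term (blur and downsample the output with the CNN-estimated kernel, compare with the LRMS input) and a spatial term (match high-pass details against the PAN image). The box blur, luminance proxy, and L1 distances are assumptions, not the paper's exact objective:

    import torch
    import torch.nn.functional as F

    def mutual_guidance_losses(hrms, lrms, pan, kernel, scale=4):
        # hrms: (B, C, H, W) output; lrms: (B, C, H/scale, W/scale); pan: (B, 1, H, W);
        # kernel: (1, 1, k, k) blur kernel estimated by the network.
        k = kernel.expand(hrms.size(1), 1, -1, -1)
        blurred = F.conv2d(hrms, k, padding=kernel.size(-1) // 2, groups=hrms.size(1))
        spectral = F.l1_loss(blurred[..., ::scale, ::scale], lrms)  # spectral consistency
        high = lambda x: x - F.avg_pool2d(x, 5, 1, 2)               # pre-defined blur
        lum = hrms.mean(dim=1, keepdim=True)                        # crude luminance proxy
        spatial = F.l1_loss(high(lum), high(pan))                   # restore PAN details
        return spectral + spatial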

AAAI Conference 2023 Conference Paper

Self-Supervised Image Denoising Using Implicit Deep Denoiser Prior

  • Huangxing Lin
  • Yihong Zhuang
  • Xinghao Ding
  • Delu Zeng
  • Yue Huang
  • Xiaotong Tu
  • John Paisley

We devise a new regularization for denoising with self-supervised learning. The regularization uses a deep image prior learned by the network, rather than a traditional predefined prior. Specifically, we treat the output of the network as a "prior" that we again denoise after "re-noising." The network is updated to minimize the discrepancy between the twice-denoised image and its prior. We demonstrate that this regularization enables the network to learn to denoise even if it has not seen any clean images. The effectiveness of our method is based on the fact that CNNs naturally tend to capture low-level image statistics. Since our method utilizes the image prior implicitly captured by the deep denoising CNN to guide denoising, we refer to this training strategy as an Implicit Deep Denoiser Prior (IDDP). IDDP can be seen as a mixture of learning-based methods and traditional model-based denoising methods, in which regularization is adaptively formulated using the output of the network. We apply IDDP to various denoising tasks using only observed corrupted data and show that it achieves better denoising results than other self-supervised denoising methods.
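
The training objective is compact enough to sketch directly: denoise, re-noise the output, denoise again, and penalize the gap to the first output, which plays the role of the prior. The Gaussian re-noising level sigma is an assumed hyperparameter:

    import torch
    import torch.nn.functional as F

    def iddp_loss(net, noisy, sigma=25.0 / 255.0):
        prior = net(noisy)                              # first pass acts as the prior
        renoised = prior + sigma * torch.randn_like(prior)
        twice = net(renoised)                           # denoise the re-noised output
        return F.mse_loss(twice, prior.detach())        # pull the two passes together

    loss = iddp_loss(torch.nn.Conv2d(3, 3, 3, padding=1), torch.rand(1, 3, 32, 32))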

AAAI Conference 2022 Conference Paper

Unsupervised Underwater Image Restoration: From a Homology Perspective

  • Zhenqi Fu
  • Huangxing Lin
  • Yan Yang
  • Shu Chai
  • Liyan Sun
  • Yue Huang
  • Xinghao Ding

Underwater images suffer from degradation due to light scattering and absorption. It remains challenging to restore such degraded images using deep neural networks since real-world paired data is scarcely available, while synthetic paired data cannot approximate real-world data perfectly. In this paper, we propose an UnSupervised Underwater Image Restoration method (USUIR) by leveraging the homology property between a raw underwater image and a re-degraded image. Specifically, USUIR first estimates three latent components of the raw underwater image, i.e., the global background light, the transmission map, and the scene radiance (the clean image). Then, a re-degraded image is generated by randomly mixing up the estimated scene radiance and the raw underwater image. We demonstrate that imposing a homology constraint between the raw underwater image and the re-degraded image is equivalent to minimizing the restoration error and hence can be used for unsupervised restoration. Extensive experiments show that USUIR achieves promising performance in both inference time and restoration quality.
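
A sketch of the re-degradation step under the usual underwater imaging model raw ≈ J·T + A·(1 - T); the random mixing coefficient and the three-headed net are interface assumptions rather than USUIR's exact design:

    import torch

    def redegrade(raw, net):
        # net(raw) -> scene radiance J, transmission T, global background light A.
        J, T, A = net(raw)
        recon = J * T + A * (1 - T)                    # physical re-composition
        alpha = torch.rand(())                         # random mix-up coefficient
        redegraded = alpha * J + (1 - alpha) * raw     # homologous re-degraded image
        return recon, redegraded   # homology losses tie both back to the raw input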

JBHI Journal 2021 Journal Article

Curriculum Feature Alignment Domain Adaptation for Epithelium-Stroma Classification in Histopathological Images

  • Qi Qi
  • Xin Lin
  • Chaoqi Chen
  • Weiping Xie
  • Yue Huang
  • Xinghao Ding
  • Xiaoqing Liu
  • Yizhou Yu

In recent years, deep learning methods have received more attention in epithelial-stroma (ES) classification tasks. Traditional deep learning methods assume that the training and test data have the same distribution, an assumption that is seldom satisfied in complex imaging procedures. Unsupervised domain adaptation (UDA) transfers knowledge from a labeled source domain to a completely unlabeled target domain, and is more suitable for ES classification tasks to avoid tedious annotation. However, existing UDA methods for this task ignore the semantic alignment across domains. In this paper, we propose a Curriculum Feature Alignment Network (CFAN) to gradually align discriminative features across domains by selecting effective samples from the target domain and minimizing intra-class differences. Specifically, we develop the Curriculum Transfer Strategy (CTS) and Adaptive Centroid Alignment (ACA) steps to train our model iteratively. We validated the method using three independent public ES datasets, and experimental results demonstrate that our method achieves better performance in ES classification compared with commonly used deep learning methods and existing deep domain adaptation methods.
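
The Adaptive Centroid Alignment step can be sketched as pulling per-class source centroids toward pseudo-labeled target centroids; the curriculum's confidence-based sample selection is omitted and all interfaces are assumed:

    import torch

    def centroid_alignment_loss(src_feat, src_y, tgt_feat, tgt_logits, n_cls):
        # src_feat/tgt_feat: (N, D) features; src_y: (N,) labels; tgt_logits: (M, C).
        tgt_y = tgt_logits.argmax(dim=1)               # pseudo-labels for the target
        loss = src_feat.new_zeros(())
        for c in range(n_cls):
            s, t = src_feat[src_y == c], tgt_feat[tgt_y == c]
            if len(s) and len(t):                      # align class-wise centroids
                loss = loss + (s.mean(0) - t.mean(0)).pow(2).sum()
        return loss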

IJCAI Conference 2021 Conference Paper

Noise2Grad: Extract Image Noise to Denoise

  • Huangxing Lin
  • Yihong Zhuang
  • Yue Huang
  • Xinghao Ding
  • Xiaoqing Liu
  • Yizhou Yu

In many image denoising tasks, the difficulty of collecting noisy/clean image pairs limits the application of supervised CNNs. We consider such a case in which paired data and noise statistics are not accessible, but unpaired noisy and clean images are easy to collect. To form the necessary supervision, our strategy is to extract the noise from the noisy image to synthesize new data. To ease the interference of the image background, we use a noise removal module to aid noise extraction. The noise removal module first roughly removes noise from the noisy image, which is equivalent to excluding much background information. A noise approximation module can therefore easily extract a new noise map from the removed noise to match the gradient of the noisy input. This noise map is added to a random clean image to synthesize a new data pair, which is then fed back to the noise removal module to correct the noise removal process. These two modules cooperate to extract noise finely. After convergence, the noise removal module can remove noise without damaging other background details, so we use it as our final denoising network. Experiments show that the denoising performance of the proposed method is competitive with other supervised CNNs.
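
The data-synthesis loop is easy to sketch: roughly denoise, let a second module refine the removed component into a noise map, and paste that map onto an unpaired clean image to form supervision. Both nets are assumed callables, and the gradient-matching objective that names the method is left out:

    import torch

    def make_synthetic_pair(noisy, removal_net, approx_net, clean):
        denoised = removal_net(noisy)                # rough background-free denoising
        noise_map = approx_net(noisy - denoised)     # refine the extracted noise
        return clean + noise_map, clean              # new paired training sample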

AAAI Conference 2021 Conference Paper

Rain Streak Removal via Dual Graph Convolutional Network

  • Xueyang Fu
  • Qi Qi
  • Zheng-Jun Zha
  • Yurui Zhu
  • Xinghao Ding

Deep convolutional neural networks (CNNs) have become dominant in the single image de-raining area. However, most deep CNN-based de-raining methods are designed by stacking vanilla convolutional layers, which can only model local relations. Therefore, long-range contextual information is rarely considered for this specific task. To address this problem, we propose a simple yet effective dual graph convolutional network (GCN) for single image rain removal. Specifically, we design two graphs to perform global relational modeling and reasoning. The first GCN is used to explore global spatial relations among pixels in feature maps, while the second GCN models the global relations across channels. Compared to standard convolutional operations, the proposed two graphs enable the network to extract representations from new dimensions. To achieve image rain removal, we further embed these two graphs and multi-scale dilated convolution into a symmetrically skip-connected network architecture. Therefore, our dual graph convolutional network can effectively handle complex and spatially long rain streaks by exploring multiple representations, e.g., multi-scale local features, global spatial coherence, and cross-channel correlation. Meanwhile, our model is easy to implement, end-to-end trainable, and computationally efficient. Extensive experiments on synthetic and real data demonstrate that our method achieves significant improvements over recent state-of-the-art methods.
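
The channel graph can be sketched as a C×C affinity matrix built from flattened feature maps followed by one round of message passing; the softmax normalization and single linear transform are assumptions, not the paper's exact design:

    import torch

    def channel_graph_reasoning(x, weight):
        # x: (B, C, H, W); weight: (C, C) node-wise transform.
        B, C, H, W = x.shape
        nodes = x.reshape(B, C, H * W)                    # one node per channel
        adj = torch.softmax(nodes @ nodes.transpose(1, 2) / (H * W) ** 0.5, dim=-1)
        out = adj @ nodes                                 # cross-channel message passing
        return torch.einsum('bcn,cd->bdn', out, weight).reshape(B, C, H, W)

    y = channel_graph_reasoning(torch.randn(2, 16, 8, 8), torch.eye(16))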

JBHI Journal 2020 Journal Article

An Adversarial Learning Approach to Medical Image Synthesis for Lesion Detection

  • Liyan Sun
  • Jiexiang Wang
  • Yue Huang
  • Xinghao Ding
  • Hayit Greenspan
  • John Paisley

The identification of lesions within medical image data is necessary for diagnosis, treatment, and prognosis. Segmentation and classification approaches are mainly based on supervised learning with well-paired image-level or voxel-level labels. However, labeling lesions in medical images is laborious, requiring highly specialized knowledge. We propose a medical image synthesis model named abnormal-to-normal translation generative adversarial network (ANT-GAN) to generate a normal-looking medical image based on its abnormal-looking counterpart, without the need for paired training data. Unlike typical GANs, whose aim is to generate realistic samples with variations, our more restrictive model aims at producing a normal-looking image corresponding to one containing lesions, and thus requires a special design. Being able to provide a “normal” counterpart to a medical image can provide useful side information for medical imaging tasks like lesion segmentation or classification, as validated by our experiments. Conversely, the ANT-GAN model is also capable of producing highly realistic lesion-containing images corresponding to healthy ones, which shows its potential for data augmentation, as verified in our experiments.

JBHI Journal 2019 Journal Article

Label-Efficient Breast Cancer Histopathological Image Classification

  • Qi Qi
  • Yanlong Li
  • Jitian Wang
  • Han Zheng
  • Yue Huang
  • Xinghao Ding
  • Gustavo Kunde Rohde

The automatic classification of breast cancer histopathological images has great significance in computer-aided diagnosis. Recently, deep learning via neural networks has enabled pattern detection and prediction using large, labeled datasets, whereas collecting and annotating sufficient histological data using professional pathologists is time-consuming, tedious, and extremely expensive. In this paper, a deep active learning framework is designed and implemented for the classification of breast cancer histopathological images, with the goal of maximizing learning accuracy from very limited labeling. This method involves manual annotation of the most valuable unlabeled samples, which are then integrated into the training set; the model is then iteratively updated with the growing training set. Two selection strategies are discussed for the proposed deep active learning framework: an entropy-based strategy and a confidence-boosting strategy. The proposed method has been validated using a publicly available breast cancer histopathological image dataset, wherein each image patch is binarily classified as benign or malignant. The experimental results demonstrate that, compared with random selection, our proposed framework can reduce annotation costs by up to 66.67%, with higher accuracy and less expensive annotation than standard query strategies.
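
The entropy-based strategy is a one-liner: rank unlabeled patches by predictive entropy and send the top k to the pathologist. A minimal sketch:

    import numpy as np

    def entropy_query(probs, k):
        # probs: (N, num_classes) softmax outputs on the unlabeled pool.
        ent = -np.sum(probs * np.log(probs + 1e-12), axis=1)
        return np.argsort(-ent)[:k]        # indices of the k most uncertain patches

    probs = np.random.dirichlet(np.ones(2), size=100)   # toy benign/malignant scores
    to_label = entropy_query(probs, k=10)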

AAAI Conference 2018 Conference Paper

Compressed Sensing MRI Using a Recursive Dilated Network

  • Liyan Sun
  • Zhiwen Fan
  • Yue Huang
  • Xinghao Ding
  • John Paisley

Compressed sensing magnetic resonance imaging (CS-MRI) is an active research topic in the field of inverse problems. Conventional CS-MRI algorithms usually exploit the sparse nature of MRI in an iterative manner. These optimization-based CS-MRI methods are often time-consuming at test time, and are based on fixed transform bases or shallow dictionaries, which limits modeling capacity. Recently, deep models have been introduced to the CS-MRI problem. One main challenge for CS-MRI methods based on deep learning is the trade-off between model performance and network size. We propose a recursive dilated network (RDN) for CS-MRI that achieves good performance while reducing the number of network parameters. We adopt dilated convolutions in each recursive block to aggregate multi-scale information within the MRI. We also adopt a modified shortcut strategy to help features flow into deeper layers. Experimental results show that the proposed RDN model achieves state-of-the-art performance in CS-MRI while using far fewer parameters than previously required.
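
The core block can be sketched as weight-shared dilated convolutions applied recursively, with a shortcut back to the input so features flow into deeper passes; channel width, dilation rates, and recursion depth here are assumptions:

    import torch
    import torch.nn as nn

    class RecursiveDilatedBlock(nn.Module):
        def __init__(self, ch=32):
            super().__init__()
            self.convs = nn.ModuleList(
                [nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in (1, 2, 4)])

        def forward(self, x, recursions=3):
            out = x
            for _ in range(recursions):        # the same parameters are reused
                h = out
                for conv in self.convs:        # dilations aggregate multi-scale context
                    h = torch.relu(conv(h))
                out = h + x                    # shortcut keeps gradients flowing
            return out

    y = RecursiveDilatedBlock()(torch.randn(1, 32, 16, 16))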

IJCAI Conference 2018 Conference Paper

MEnet: A Metric Expression Network for Salient Object Segmentation

  • Shulian Cai
  • Jiabin Huang
  • Delu Zeng
  • Xinghao Ding
  • John Paisley

Recent CNN-based saliency models have achieved excellent performance on public datasets, but most are sensitive to distortions from noise or compression. In this paper, we propose an end-to-end generic salient object segmentation model called Metric Expression Network (MEnet) to overcome this drawback. We construct a topological metric space where the implicit metric is determined by a deep network. In this latent space, we can semantically group the pixels of an observed image into two regions, according to whether they lie in a salient or a non-salient region of the image. We carry out all feature extractions at the pixel level, which makes the output boundaries of the salient object finely grained. Experimental results show that the proposed metric can generate robust saliency maps that allow for object segmentation. By testing the method on several public benchmarks, we show that MEnet achieves excellent performance. We also demonstrate that the proposed method outperforms previous CNN-based methods on distorted images.

JBHI Journal 2017 Journal Article

Epithelium-Stroma Classification via Convolutional Neural Networks and Unsupervised Domain Adaptation in Histopathological Images

  • Yue Huang
  • Han Zheng
  • Chi Liu
  • Xinghao Ding
  • Gustavo K. Rohde

Epithelium-stroma classification is a necessary preprocessing step in histopathological image analysis. Current deep learning-based recognition methods for histology data require the collection of large volumes of labeled data in order to train a new neural network when there are changes to the image acquisition procedure. However, it is extremely expensive for pathologists to manually label sufficient volumes of data for each pathology study in a professional manner, which limits real-world applications. This paper proposes a very simple but effective deep learning method that introduces the concept of unsupervised domain adaptation to a simple convolutional neural network (CNN). Inspired by transfer learning, we assume that the training and testing data follow different distributions, and apply an adaptation operation to more accurately estimate the CNN's feature-extraction kernels, enhancing performance by transferring knowledge from labeled data in the source domain to unlabeled data in the target domain. The model has been evaluated using three independent public epithelium-stroma datasets by cross-dataset validation. The experimental results demonstrate that, for epithelium-stroma classification, the proposed framework outperforms the state-of-the-art deep neural network model and also achieves better performance than other existing deep domain adaptation methods. The proposed model is a strong option for real-world applications in histopathological image analysis, since it no longer requires large-scale labeled data in each specified domain.