Arrow Research search

Author name cluster

Kui Jiang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

23 papers
2 author rows

Possible papers

23

AAAI Conference 2026 Conference Paper

ICLR: Inter-Chrominance and Luminance Interaction for Natural Color Restoration in Low-Light Image Enhancement

  • Xin Xu
  • Hao Liu
  • Wei Liu
  • Wei Wang
  • Jiayi Wu
  • Kui Jiang

The Low-Light Image Enhancement (LLIE) task aims to improve contrast while restoring details and textures for images captured in low-light conditions. The HVI color space has enabled significant progress on this task by precisely decoupling chrominance and luminance. However, in the interaction between the chrominance and luminance branches, the substantial distributional differences between the two branches that are prevalent in natural images limit complementary feature extraction, and luminance errors are propagated to the chrominance channels through the nonlinear parameter. Furthermore, in the interaction between different chrominance branches, images with large homogeneous-color regions usually exhibit weak correlation between the branches due to their concentrated distributions. Traditional pixel-wise losses exploit strong inter-branch correlations for co-optimization, causing gradient conflicts in such weakly correlated regions. We therefore propose an Inter-Chrominance and Luminance Interaction (ICLR) framework comprising a Dual-stream Interaction Enhancement Module (DIEM) and a Covariance Correction Loss (CCL). The DIEM improves the extraction of complementary information along two dimensions, fusion and enhancement. The CCL utilizes luminance residual statistics to penalize chrominance errors and balances gradient conflicts by constraining the covariance between chrominance branches. Experimental results on multiple datasets show that the proposed ICLR framework outperforms state-of-the-art methods.
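The abstract does not give the loss formula, so the following is only a rough PyTorch sketch of what a CCL-style objective could look like: chrominance errors weighted by luminance residual statistics plus a constraint matching the inter-branch covariance of the prediction to that of the ground truth. The function name, weighting scheme, and covariance form are all assumptions, not the paper's implementation.

```python
import torch

def ccl_sketch(pred_h, pred_v, gt_h, gt_v, pred_lum, gt_lum, eps=1e-6):
    """Hypothetical Covariance Correction Loss sketch (not the paper's formula).

    pred_h, pred_v:   predicted chrominance branches, shape (B, 1, H, W)
    gt_h, gt_v:       ground-truth chrominance branches
    pred_lum, gt_lum: predicted / ground-truth luminance
    """
    # Luminance residual statistics: larger luminance error -> stronger
    # penalty on chrominance errors in those regions (assumption).
    lum_res = (pred_lum - gt_lum).abs()
    weight = 1.0 + lum_res / (lum_res.mean(dim=(1, 2, 3), keepdim=True) + eps)
    chroma_err = weight * ((pred_h - gt_h).abs() + (pred_v - gt_v).abs())

    # Covariance constraint: match the inter-branch covariance of the
    # prediction to that of the ground truth, softening gradient conflicts
    # in weakly correlated (homogeneous-color) regions.
    def cov(a, b):
        a, b = a.flatten(1), b.flatten(1)
        return ((a - a.mean(1, keepdim=True)) * (b - b.mean(1, keepdim=True))).mean(1)

    cov_err = (cov(pred_h, pred_v) - cov(gt_h, gt_v)).abs()
    return chroma_err.mean() + cov_err.mean()
```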

AAAI Conference 2026 Conference Paper

Learning Depth from Past Selves: Self-Evolution Contrast for Robust Depth Estimation

  • Jing Cao
  • Kui Jiang
  • Shenyi Li
  • Xiaocheng Feng
  • Yong Huang

Self-supervised depth estimation has gained significant attention in autonomous driving and robotics. However, existing methods exhibit substantial performance degradation under adverse weather conditions such as rain and fog, where reduced visibility critically impairs depth prediction. To address this issue, we propose a novel self-evolution contrastive learning framework called SEC-Depth for self-supervised robust depth estimation. Our approach leverages intermediate parameters generated during training to construct temporally evolving latency models, from which we devise a self-evolution contrastive scheme to mitigate performance loss under challenging conditions. Concretely, we first design a dynamic update strategy for the latency models that captures optimization states across training stages. To effectively leverage the latency models, we introduce a Self-Evolution Contrastive Loss (SECL) that treats outputs from historical latency models as negative samples. This mechanism adaptively adjusts learning objectives while implicitly sensing weather degradation severity, reducing the need for manual intervention. Experiments show that our method integrates seamlessly into diverse baseline models and significantly enhances robustness in zero-shot evaluations.
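As a hypothetical illustration of the idea of using frozen historical snapshots ("latency models") as negatives, here is a minimal PyTorch sketch. The hinge form of the loss, the snapshot schedule, and all names (secl_sketch, update_latency_models) are assumptions; the paper's actual SECL and update strategy may differ.

```python
import copy
import torch
import torch.nn.functional as F

def secl_sketch(student_depth, target_depth, negative_depths, margin=0.05):
    """Hypothetical Self-Evolution Contrastive Loss sketch.

    student_depth:   current prediction, shape (B, 1, H, W)
    target_depth:    supervision signal for the positive term
    negative_depths: predictions from frozen historical snapshots
                     of the student, used as negative samples
    """
    pos = F.l1_loss(student_depth, target_depth)
    # Push the current prediction away from the model's own past outputs,
    # up to a margin (hinge form is an assumption).
    neg = sum(F.l1_loss(student_depth, n.detach()) for n in negative_depths)
    neg = neg / max(len(negative_depths), 1)
    return pos + torch.clamp(margin - neg, min=0.0)

def update_latency_models(student, snapshots, step, every=1000, keep=3):
    """Sketch of latency-model bookkeeping: periodically freeze a copy of
    the student; the real dynamic update strategy is the paper's design."""
    if step % every == 0:
        snapshots.append(copy.deepcopy(student).eval().requires_grad_(False))
        del snapshots[:-keep]  # keep only the most recent snapshots
    return snapshots
```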

AAAI Conference 2026 Conference Paper

MambaOVSR: Multiscale Fusion with Global Motion Modeling for Chinese Opera Video Super-Resolution

  • Hua Chang
  • Xin Xu
  • Wei Liu
  • Wei Wang
  • Xin Yuan
  • Kui Jiang

Chinese opera is celebrated for preserving classical art. However, limitations of early filming equipment (e.g., low frame rates and resolution) have degraded videos of last-century performances by renowned artists, hindering archival efforts. Although space-time video super-resolution (STVSR) has advanced significantly, applying it directly to opera videos remains challenging. The scarcity of datasets impedes the recovery of high-frequency details, and existing STVSR methods lack global modeling capabilities, compromising visual quality when handling opera's characteristic large motions. To address these challenges, we pioneer a large-scale Chinese Opera Video Clip (COVC) dataset and propose a Mamba-based multiscale fusion network for space-time Opera Video Super-Resolution (MambaOVSR). Specifically, MambaOVSR involves three novel components: a Global Fusion Module (GFM) for motion modeling through a multiscale alternating scanning mechanism; a Multiscale Synergistic Mamba Module (MSMM) for alignment across different sequence lengths; and a MambaVR block that resolves feature artifacts and positional information loss during alignment. Experimental results on the COVC dataset show that MambaOVSR significantly outperforms the SOTA STVSR method by an average of 1.86 dB in terms of PSNR.
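To make "multiscale alternating scanning" concrete, here is a speculative sketch of one plausible reading: spatiotemporal features are flattened into 1-D token sequences at several scales, with the scan direction alternating between scales before a sequence model (e.g., a Mamba block, omitted here) processes them. This is an assumption about the mechanism, not the paper's GFM.

```python
import torch
import torch.nn.functional as F

def multiscale_alternating_scan(feat, scales=(1, 2, 4)):
    """Speculative sketch of a multiscale alternating scanning mechanism.

    feat: (B, C, T, H, W) spatiotemporal features. Each scale yields a
    flattened token sequence; the scan direction flips between scales so
    a downstream sequence model sees both orderings.
    """
    sequences = []
    for i, s in enumerate(scales):
        x = F.avg_pool3d(feat, kernel_size=(1, s, s)) if s > 1 else feat
        tokens = x.flatten(2).transpose(1, 2)  # (B, T*H'*W', C)
        if i % 2 == 1:                         # alternate the scan direction
            tokens = tokens.flip(1)
        sequences.append(tokens)
    return sequences
```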

AAAI Conference 2026 Conference Paper

Semantics and Content Matter: Towards Multi-Prior Hierarchical Mamba for Image Deraining

  • Zhaocheng Yu
  • Kui Jiang
  • Junjun Jiang
  • Xianming Liu
  • Guanglu Sun
  • Yi Xiao

Rain significantly degrades the performance of computer vision systems, particularly in applications like autonomous driving and video surveillance. While existing deraining methods have made considerable progress, they often struggle to preserve the fidelity of semantic and spatial details. To address these limitations, we propose the Multi-Prior Hierarchical Mamba (MPHM) network for image deraining. This novel architecture synergistically integrates macro-semantic textual priors (CLIP) for task-level semantic guidance and micro-structural visual priors (DINOv2) for scene-aware structural information. To alleviate potential conflicts between heterogeneous priors, we devise a progressive Priors Fusion Injection (PFI) scheme that strategically injects complementary cues at different decoder levels. Meanwhile, we equip the backbone network with an elaborate Hierarchical Mamba Module (HMM) to facilitate robust feature representation, featuring a Fourier-enhanced dual-path design that concurrently addresses global context modeling and local detail recovery. Comprehensive experiments demonstrate MPHM's state-of-the-art performance, achieving a 0.57 dB PSNR gain on the Rain200H dataset while delivering superior generalization to real-world rainy scenarios.
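As a rough illustration of injecting two heterogeneous priors into a decoder feature, here is a hypothetical cross-attention sketch. The module name, the shared projection, and the fixed semantic/structural mixing weight are all assumptions; the paper's PFI is progressive across decoder levels and may be structured quite differently.

```python
import torch
import torch.nn as nn

class PriorInjectionSketch(nn.Module):
    """Hypothetical sketch of dual-prior injection (not the paper's PFI).

    Injects a CLIP-style text prior and a DINOv2-style visual prior into
    decoder tokens via cross-attention; one could weight the semantic
    prior more at deeper levels and the structural prior at shallower
    ones (assumption).
    """
    def __init__(self, dim, prior_dim, heads=4, semantic_weight=0.5):
        super().__init__()
        self.proj = nn.Linear(prior_dim, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.w = semantic_weight

    def forward(self, feat, text_prior, vis_prior):
        # feat: (B, N, dim) decoder tokens; priors: (B, M, prior_dim)
        text, vis = self.proj(text_prior), self.proj(vis_prior)
        sem, _ = self.attn(feat, text, text)      # task-level semantic cue
        struct, _ = self.attn(feat, vis, vis)     # scene-aware structural cue
        return feat + self.w * sem + (1.0 - self.w) * struct
```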

IJCAI Conference 2025 Conference Paper

Always Clear Depth: Robust Monocular Depth Estimation Under Adverse Weather

  • Kui Jiang
  • Jing Cao
  • Zhaocheng Yu
  • Junjun Jiang
  • Jingchun Zhou

Monocular depth estimation is critical for applications such as autonomous driving and scene reconstruction. While existing methods perform well under normal conditions, their performance declines in adverse weather due to challenging domain shifts and difficulties in extracting scene information. To address this issue, we present a robust monocular depth estimation method called ACDepth, approached from the perspective of high-quality training data generation and domain adaptation. Specifically, we introduce a one-step diffusion model for generating samples that simulate adverse weather conditions, constructing a multi-tuple degradation dataset during training. To ensure the quality of the generated degradation samples, we employ LoRA adapters to fine-tune the generation weights of the diffusion model. Additionally, we integrate a circular consistency loss and adversarial training to guarantee the fidelity and naturalness of the scene contents. Furthermore, we elaborate a multi-granularity knowledge distillation strategy (MKD) that encourages the student network to absorb knowledge from both the teacher model and pretrained Depth Anything V2, guiding the student model to learn degradation-agnostic scene information from various degraded inputs. In particular, we introduce an ordinal guidance distillation mechanism (OGD) that encourages the network to focus on uncertain regions through differential ranking, leading to more precise depth estimation. Experimental results demonstrate that our ACDepth surpasses md4all-DD by 2.50% for night scenes and 2.61% for rainy scenes on the nuScenes dataset in terms of the absRel metric.
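One plausible reading of "differential ranking" distillation is a pairwise ordinal loss: the teacher defines depth orderings over sampled pixel pairs, and the student is penalized for violating them. The sketch below, including the uncertainty-weighted sampling and all constants, is an assumption rather than the paper's OGD.

```python
import torch
import torch.nn.functional as F

def ogd_sketch(student, teacher, uncertainty, num_pairs=4096, tau=0.02):
    """Hypothetical ordinal-guidance distillation sketch (pairwise ranking).

    student, teacher: depth maps (B, 1, H, W); uncertainty: same shape,
    larger where predictions are less reliable (its source is an
    assumption here).
    """
    s = student.flatten(1)
    t = teacher.flatten(1)
    u = uncertainty.flatten(1)
    # Importance-sample pixels proportional to uncertainty (assumption),
    # so uncertain regions receive more ranking supervision.
    idx = torch.multinomial(u + 1e-6, 2 * num_pairs, replacement=True)
    i, j = idx[:, :num_pairs], idx[:, num_pairs:]
    si, sj = s.gather(1, i), s.gather(1, j)
    ti, tj = t.gather(1, i), t.gather(1, j)
    sign = torch.sign(ti - tj)             # teacher's ordinal label
    close = (ti - tj).abs() < tau          # skip ambiguous (near-equal) pairs
    rank = F.softplus(-sign * (si - sj))   # logistic ranking penalty
    return rank[~close].mean()
```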

AAAI Conference 2025 Conference Paper

Debiased All-in-one Image Restoration with Task Uncertainty Regularization

  • Gang Wu
  • Junjun Jiang
  • Yijun Wang
  • Kui Jiang
  • Xianming Liu

All-in-one image restoration is a fundamental low-level vision task with significant real-world applications. The primary challenge lies in addressing diverse degradations within a single model. While current methods primarily exploit task prior information to guide the restoration models, they typically employ uniform multi-task learning, overlooking the heterogeneity in model optimization across different degradation tasks. To eliminate this bias, we propose a task-aware optimization strategy, Task Uncertainty Regularization (TUR), which introduces adaptive task-specific regularization for multi-task image restoration learning. Specifically, our method dynamically weights and balances the losses of different restoration tasks during training, encouraging the model to follow the most reasonable optimization route and thereby achieving more robust and effective training. Notably, our approach can serve as a plug-and-play strategy to enhance existing models without requiring modifications during inference. Extensive experiments in diverse all-in-one restoration settings demonstrate the superiority and generalization of our approach. For example, AirNet retrained with TUR achieves average improvements of 1.16 dB on three distinct tasks and 1.81 dB on five distinct all-in-one tasks. These results underscore TUR's effectiveness in advancing the state of the art in all-in-one image restoration, paving the way for more robust and versatile image restoration.
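For intuition, uncertainty-based multi-task weighting is classically done with learnable per-task log-variances (Kendall et al., 2018). The sketch below follows that standard formulation; TUR's exact regularizer may differ, so treat this as background rather than the paper's method.

```python
import torch
import torch.nn as nn

class TaskUncertaintyWeighting(nn.Module):
    """Classic homoscedastic-uncertainty loss weighting, shown here as a
    sketch of the general idea behind task-uncertainty regularization."""

    def __init__(self, num_tasks):
        super().__init__()
        # s_k = log(sigma_k^2): one learnable log-variance per task.
        self.log_var = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        # task_losses: per-task losses stacked into a tensor of shape
        # (num_tasks,). Tasks with high estimated uncertainty are
        # down-weighted; the +s_k term keeps sigma from growing unboundedly.
        return (torch.exp(-self.log_var) * task_losses + self.log_var).sum()

# Usage sketch (task names hypothetical): the weights are learned jointly
# with the restoration model and discarded at inference.
# tur = TaskUncertaintyWeighting(num_tasks=5)
# loss = tur(torch.stack([l_rain, l_haze, l_noise, l_blur, l_lowlight]))
```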

AAAI Conference 2025 Conference Paper

Disentangle Nighttime Lens Flares: Self-supervised Generation-based Lens Flare Removal

  • Yuwen He
  • Wei Wang
  • Wanyu Wu
  • Kui Jiang

Lens flares arise from light reflection and refraction within sensor arrays; their diverse types include glow, veiling glare, and reflective flare. Existing methods are specialized for one specific type only and overlook the simultaneous occurrence of multiple flare types, which is common in the real world, e.g., the coexistence of glow and displaced reflective flares from the same light source. Such co-occurring lens flares cannot be effectively resolved by simply combining individual flare removal methods: because the coexisting flares originate from the same light source and are generated simultaneously within the same sensor array, they exhibit a complex interdependence rather than a simple additive relation. To model this interdependence, our Nighttime Lens Flare Formation model is the first attempt to learn the intrinsic physical relationship between flares on the imaging plane. Building on this physical model, we introduce a solution to the joint flare removal task named the Self-supervised Generation-based Lens Flare Removal Network (SGLFR-Net), which is self-supervised and requires no pre-training. Specifically, the nighttime glow is disentangled in a PSF Rendering Network (PSFR-Net) based on a PSF rendering prior, while the reflective flare is modelled in a Texture Prior Based Reflection Flare Removal Network (TPRR-Net). Empirical evaluations demonstrate the effectiveness of the proposed method in both joint and individual glare removal tasks.

IROS Conference 2025 Conference Paper

FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion

  • Pihai Sun
  • Junjun Jiang
  • Yuanqi Yao
  • Youyu Chen
  • Wenbo Zhao 0004
  • Kui Jiang
  • Xianming Liu 0005

Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability stemming from two factors: 1) limited annotated image-event-depth datasets, causing insufficient cross-modal supervision, and 2) inherent frequency mismatches between static images and dynamic event streams with distinct spatiotemporal patterns, leading to ineffective feature fusion. To address this dual challenge, we propose the Frequency-decoupled Unified Self-supervised Encoder (FUSE) with two synergistic components. The Parameter-efficient Self-supervised Transfer (PST) leverages image foundation models for cross-modal knowledge transfer, effectively mitigating data scarcity by enabling joint encoding without depth ground truth. Complementing this, the Frequency-Decoupled Fusion module (FreDFuse) resolves modality-specific frequency mismatches by decoupling features into high- and low-frequency bands and then performing a guided cross-attention fusion, where the modality dominant in each band steers the integration. This combined approach enables FUSE to construct a universal image-event encoder that requires only lightweight decoder adaptation for target datasets. Extensive experiments demonstrate state-of-the-art performance, with 14% and 24.9% improvements in Abs.Rel on the MVSEC and DENSE datasets. The framework exhibits remarkable robustness and generalization in challenging scenarios, including extreme lighting and motion blur, significantly advancing its real-world deployment capabilities. The source code for our method is publicly available at https://github.com/sunpihai-up/FUSE.
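The frequency-decoupling step can be illustrated with a standard Fourier-domain band split; the radial low-pass mask and cutoff below are assumptions, and the guided cross-attention that FreDFuse applies afterwards is omitted.

```python
import torch

def frequency_decouple(feat, cutoff=0.25):
    """Sketch of a frequency-decoupling step: split a feature map into
    low- and high-frequency bands with a radial mask in the Fourier
    domain (the paper's exact split may differ).

    feat: (B, C, H, W) real-valued feature map.
    """
    b, c, h, w = feat.shape
    spec = torch.fft.fftshift(torch.fft.fft2(feat), dim=(-2, -1))
    yy, xx = torch.meshgrid(
        torch.linspace(-0.5, 0.5, h, device=feat.device),
        torch.linspace(-0.5, 0.5, w, device=feat.device),
        indexing="ij",
    )
    low_mask = ((xx ** 2 + yy ** 2).sqrt() <= cutoff).to(feat.dtype)
    low = torch.fft.ifft2(
        torch.fft.ifftshift(spec * low_mask, dim=(-2, -1))
    ).real
    high = feat - low
    # Per the abstract, a guided cross-attention (omitted) would then let
    # the modality dominant in each band steer the fusion.
    return low, high
```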

AAAI Conference 2025 Conference Paper

OODML: Whole Slide Image Classification Meets Online Pseudo-Supervision and Dynamic Mutual Learning

  • Tingting Zheng
  • Kui Jiang
  • Hongxun Yao
  • Yi Xiao
  • Zhongyuan Wang

Bag-label-based multi-instance learning (MIL) has demonstrated significant performance in whole slide image (WSI) analysis, particularly in pseudo-label-based learning schemes. However, due to inaccurate feature representation and interference, existing MIL methods often yield unreliable pseudo-labels, which lead to undesired predictions. To address these issues, we propose an Online Pseudo-Supervision and Dynamic Mutual Learning (OODML) framework that enhances pseudo-label generation and feature representation while exploiting their mutual learning to improve bag-level prediction. Specifically, we design an Adaptive Memory Bank (AMB) to collect the most informative components of the current WSI. We also introduce a Self-Progressive Feature Fusion (SPFF) module that integrates label-related historical information from the AMB with current semantic variations, thereby enhancing the representation of pseudo-bag tokens. Furthermore, we propose a Decision Revision Pseudo-Label (DRPL) generation scheme to explore the intrinsic connections between pseudo-bag representations and bag-label predictions, yielding more reliable pseudo-labels. To alleviate redundant and ambiguous representations, the class-wise prior of the pseudo-label prediction is borrowed to facilitate label-related feature learning and to update the AMB, forming a mutual refinement loop between feature representation and pseudo-label generation. Additionally, a Dynamic Decision-Making (DDM) module is developed to harmonize explicit and implicit representations of bag information for more robust decision-making. Extensive experiments on four datasets demonstrate that our OODML surpasses the state of the art by 3.3% and 6.9% on the CAMELYON16 and TCGA Lung datasets, respectively.

NeurIPS Conference 2025 Conference Paper

Reframing Gaussian Splatting Densification with Complexity-Density Consistency of Primitives

  • Zhemeng Dong
  • Junjun Jiang
  • Youyu Chen
  • Jiaxin Zhang
  • Kui Jiang
  • Xianming Liu

The essence of 3D Gaussian Splatting (3DGS) training is to smartly allocate Gaussian primitives, expressing complex regions with more primitives and vice versa. Prior research typically marks out under-reconstructed regions in a rendering-loss-driven manner. However, such a loss-driven strategy is often dominated by low-frequency regions, which leads to insufficient modeling of high-frequency details in texture-rich regions and, as a result, a suboptimal spatial allocation of Gaussian primitives. This inspires us to excavate the loss-agnostic visual prior in the training views to identify complex regions that need more primitives. Based on this insight, we propose Complexity-Density Consistent Gaussian Splatting (CDC-GS), which allocates primitives based on the consistency between the visual complexity of training views and the density of primitives. Specifically, primitives involved in rendering areas of high visual complexity are categorized as modeling high-complexity regions, where we leverage the high-frequency wavelet components of the training views to measure visual complexity. The density of a primitive is computed as the inverse of the geometric mean of its distances to neighboring primitives. Guided by the positive correlation between primitive complexity and density, we determine which primitives to densify and which to prune. Extensive experiments demonstrate that our CDC-GS surpasses baseline methods in rendering quality by a large margin using the same number of Gaussians. We also provide analysis revealing that our method acts orthogonally to the rendering loss in guiding Gaussian primitive allocation.
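The density term as stated in the abstract (inverse of the geometric mean of distances to neighboring primitives) can be written down directly. In the sketch below, the neighbor count k and the brute-force pairwise distance computation are assumptions made for brevity; a real implementation over millions of primitives would use a spatial index.

```python
import torch

def primitive_density(xyz, k=3, eps=1e-8):
    """Density of each Gaussian primitive per the abstract's description:
    inverse of the geometric mean of distances to its k nearest neighbors
    (k and the brute-force O(N^2) distance matrix are assumptions).

    xyz: (N, 3) Gaussian primitive centers.
    """
    d = torch.cdist(xyz, xyz)             # (N, N) pairwise distances
    d.fill_diagonal_(float("inf"))        # exclude self-distance
    knn, _ = d.topk(k, largest=False)     # (N, k) nearest-neighbor distances
    geo_mean = knn.clamp_min(eps).log().mean(1).exp()
    return 1.0 / geo_mean                 # close neighbors -> high density
```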

AAAI Conference 2025 Conference Paper

Spatial Annealing for Efficient Few-shot Neural Rendering

  • Yuru Xiao
  • Deming Zhai
  • Wenbo Zhao
  • Kui Jiang
  • Junjun Jiang
  • Xianming Liu

Neural Radiance Fields (NeRF) with hybrid representations have shown impressive capabilities for novel view synthesis, delivering high efficiency. Nonetheless, their performance drops significantly with sparse input views. Various regularization strategies have been devised to address this challenge, but they either require additional rendering costs or involve complex pipeline designs, sacrificing training efficiency. Although FreeNeRF has introduced an efficient frequency annealing strategy, its operation on frequency positional encoding is incompatible with efficient hybrid representations. In this paper, we introduce an accurate and efficient few-shot neural rendering method named Spatial Annealing regularized NeRF (SANeRF), which adopts the pre-filtering design of a hybrid representation. We first establish the analytical formulation of the frequency band limit of a hybrid architecture by deducing its filtering process. Based on this analysis, we propose a universal form of frequency annealing in the spatial domain, implemented by modulating the sampling kernel to shrink exponentially from an initial kernel with a narrow grid tangent kernel spectrum. This methodology is crucial for stabilizing the early stages of training and significantly contributes to the subsequent detail refinement. Our extensive experiments reveal that, by adding merely one line of code, SANeRF delivers superior rendering quality and much faster reconstruction speed compared to current few-shot neural rendering methods. Notably, SANeRF outperforms FreeNeRF on the Blender dataset, achieving 700X faster reconstruction speed.
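The "one line" in question is the paper's; as a generic illustration of an exponentially shrinking spatial kernel schedule, something like the following could be dropped into a training loop. Every constant here is an assumption, not SANeRF's actual value.

```python
import math

def annealed_kernel_size(step, k_init=1.0, k_final=0.0, anneal_steps=2000):
    """Sketch of a spatial-annealing schedule: the sampling kernel shrinks
    exponentially from a wide initial size toward its final value during
    early training (all constants are assumptions)."""
    t = min(step / anneal_steps, 1.0)
    return k_final + (k_init - k_final) * math.exp(-5.0 * t)

# Usage sketch: inside the training loop, the returned size would modulate
# the pre-filtering kernel of the hybrid (grid-based) representation.
```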

NeurIPS Conference 2025 Conference Paper

Spiking Meets Attention: Efficient Remote Sensing Image Super-Resolution with Attention Spiking Neural Networks

  • Yi Xiao
  • Qiangqiang Yuan
  • Kui Jiang
  • Wenke Huang
  • Qiang Zhang
  • Tingting Zheng
  • Chia-Wen Lin
  • Liangpei Zhang

Spiking neural networks (SNNs) are emerging as a promising alternative to traditional artificial neural networks (ANNs), offering biological plausibility and energy efficiency. Despite these merits, SNNs are frequently hampered by limited capacity and insufficient representation power, and they remain underexplored for remote sensing image (RSI) super-resolution (SR). In this paper, we first observe that spiking signals exhibit drastic intensity variations across diverse textures, highlighting an active learning state of the neurons. This observation motivates us to apply SNNs to efficient SR of RSIs. Inspired by the success of attention mechanisms in representing salient information, we devise the spiking attention block (SAB), a concise yet effective component that optimizes membrane potentials through inferred attention weights, which, in turn, regulate spiking activity for superior feature representation. Our key contributions are: 1) we bridge the independent modulation between temporal and channel dimensions, facilitating joint feature correlation learning, and 2) we exploit the global self-similar patterns in large-scale remote sensing imagery to infer spatial attention weights, incorporating effective priors for realistic and faithful reconstruction. Building upon SAB, we propose SpikeSR, which achieves state-of-the-art performance across various remote sensing benchmarks such as AID, DOTA, and DIOR, while maintaining high computational efficiency. Code of SpikeSR will be available at https://github.com/XY-boy/SpikeSR.

AAAI Conference 2025 Conference Paper

The Parables of the Mustard Seed and the Yeast: Extremely Low-Budget, High-Performance Nighttime Semantic Segmentation

  • Shiqin Wang
  • Xin Xu
  • Haoyang Chen
  • Kui Jiang
  • Zheng Wang

Nighttime Semantic Segmentation (NSS) is essential to many cutting-edge vision applications. However, existing technologies rely heavily on massive labeled data, whose annotation is time-consuming and laborious. In this paper, we pioneer a new task that explores the potential of training strategy and framework design to achieve high-performance NSS with limited annotation, where the insufficient information available at very low labeling budgets can easily lead to under-optimization or overfitting. Our solution comprises two main components: i) a novel region-based active sampling strategy called Contextual-Aware Region Query (CARQ), which identifies highly informative target nighttime regions for labeling; and ii) an innovative Fragmentation Synergy Active Domain Adaptation framework (FS-ADA), which progressively broadcasts the limited annotation to the unlabeled regions, achieving high performance with a minimal annotation budget. Extensive experiments demonstrate that our method outperforms state-of-the-art UDA-NSS and ADA-SS methods across four day-to-nighttime benchmarks and generalizes well to foggy, rainy, and snowy scenes. In particular, with only 1% of the target nighttime data annotated, our method is on par with mainstream fully-supervised methods on the BDD100K-Night val set.
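To make region-based active sampling concrete, here is a minimal sketch that scores fixed-size regions by mean predictive entropy and selects the top fraction for annotation. CARQ's actual criterion additionally uses contextual cues, which this sketch omits; the region size and budget are assumptions.

```python
import torch
import torch.nn.functional as F

def region_query_sketch(logits, region=32, budget=0.01):
    """Hypothetical region-based active sampling sketch: rank regions by
    mean predictive entropy and return the indices of the top-scoring
    ones, up to the labeling budget (context modeling omitted).

    logits: (B, num_classes, H, W) segmentation logits on target images.
    """
    prob = logits.softmax(1)
    entropy = -(prob * prob.clamp_min(1e-8).log()).sum(1, keepdim=True)
    region_score = F.avg_pool2d(entropy, region)   # (B, 1, H/region, W/region)
    flat = region_score.flatten(1)
    k = max(1, int(budget * flat.shape[1]))        # regions within budget
    return flat.topk(k, dim=1).indices             # region indices to annotate
```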

AAAI Conference 2024 Conference Paper

FMRNet: Image Deraining via Frequency Mutual Revision

  • Kui Jiang
  • Junjun Jiang
  • Xianming Liu
  • Xin Xu
  • Xianzheng Ma

The wavelet transform has emerged as a powerful tool for deciphering structural information within images, and combining it with neural networks, harnessing the strengths of both the spatial domain and frequency space, has proven effective for image deraining. However, a comprehensive framework that accounts for the intrinsic frequency property and the correlation between rain residue and background remains underexplored. In this work, we investigate the potential relationships between the rain-free and residue components in the frequency domain, forming a frequency mutual revision network (FMRNet) for image deraining. Specifically, we explore the mutual representation of the rain residue and background components in the frequency domain, so as to better separate the rain layer from the clean background while preserving the structural textures of the degraded images. Meanwhile, the rain distribution predicted from the low-frequency coefficients, which can be seen as a degradation prior, is used to refine the separation of the rain residue and background components. Inversely, the updated rain residue is used to improve the low-frequency rain distribution prediction, forming multi-layer mutual learning. Extensive experiments demonstrate that our proposed FMRNet delivers significant performance gains on seven image deraining datasets, surpassing the state-of-the-art method ELFormer by 1.14 dB in PSNR on the Rain100L dataset at a similar computational cost. Code and retrained models are available at https://github.com/kuijiang94/FMRNet.
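The frequency decomposition underlying this kind of pipeline can be illustrated with a standard single-level 2-D DWT, which separates a low-frequency approximation (where a rain-distribution prior could be estimated, per the abstract) from high-frequency detail bands (where rain streaks mostly live). The Haar basis and single level below are assumptions, not FMRNet's configuration.

```python
import numpy as np
import pywt

def wavelet_split(image):
    """Single-level 2-D DWT split into low- and high-frequency bands.

    image: (H, W) float array. Returns the approximation band and the
    stacked (LH, HL, HH) detail bands.
    """
    low, (lh, hl, hh) = pywt.dwt2(image, "haar")
    return low, np.stack([lh, hl, hh])

# Inverse reconstruction after a network refines each band:
# restored = pywt.idwt2((low, (lh, hl, hh)), "haar")
```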

IJCAI Conference 2024 Conference Paper

Learning a Spiking Neural Network for Efficient Image Deraining

  • Tianyu Song
  • Guiyue Jin
  • Pengpeng Li
  • Kui Jiang
  • Xiang Chen
  • Jiyu Jin

Recently, spiking neural networks (SNNs) have demonstrated substantial potential in computer vision tasks. In this paper, we present an Efficient Spiking Deraining Network, called ESDNet. Our work is motivated by the observation that rain pixel values lead to more pronounced spike signal intensities in SNNs. However, directly applying deep SNNs to the image deraining task remains a significant challenge, owing to the information loss and training difficulties that arise from discrete binary activation and complex spatiotemporal dynamics. To this end, we develop a spiking residual block that converts the input into spike signals and then adaptively optimizes the membrane potential by introducing attention weights to adjust spike responses in a data-driven manner, alleviating the information loss caused by discrete binary activation. In this way, ESDNet can effectively detect and analyze the characteristics of rain streaks by learning their fluctuations, which also enables better guidance for the deraining process and facilitates high-quality image reconstruction. Instead of relying on an ANN-SNN conversion strategy, we introduce a gradient proxy strategy to train the model directly, overcoming the training challenge. Experimental results show that our approach achieves performance comparable to ANN-based methods while reducing energy consumption by 54%. The source code is available at https://github.com/MingTian99/ESDNet.
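Gradient-proxy (surrogate-gradient) training is a standard way to train SNNs directly: the forward pass emits a binary spike through a hard threshold, while the backward pass substitutes a smooth surrogate derivative. The rectangular surrogate and its width below are common textbook choices, not necessarily ESDNet's.

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Hard-threshold spike with a rectangular surrogate gradient."""

    @staticmethod
    def forward(ctx, membrane, threshold, width):
        ctx.save_for_backward(membrane)
        ctx.threshold, ctx.width = threshold, width
        return (membrane >= threshold).float()   # non-differentiable spike

    @staticmethod
    def backward(ctx, grad_out):
        membrane, = ctx.saved_tensors
        # Pass gradients only for membrane potentials near the threshold.
        near = ((membrane - ctx.threshold).abs() < ctx.width).float()
        return grad_out * near / (2 * ctx.width), None, None

def spike(membrane, threshold=1.0, width=0.5):
    # Usage sketch: s = spike(v) inside a spiking residual block.
    return SurrogateSpike.apply(membrane, threshold, width)
```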

AAAI Conference 2024 Conference Paper

Learning from History: Task-agnostic Model Contrastive Learning for Image Restoration

  • Gang Wu
  • Junjun Jiang
  • Kui Jiang
  • Xianming Liu

Contrastive learning has emerged as a prevailing paradigm for high-level vision tasks and, by introducing properly chosen negative samples, has also been exploited for low-level vision tasks to obtain a compact optimization space that accounts for their ill-posed nature. However, existing methods rely on manually predefined, task-oriented negatives, which often exhibit pronounced task-specific biases. To address this challenge, we introduce an innovative method termed 'learning from history', which dynamically generates negative samples from the target model itself. Our approach, named Model Contrastive Learning for Image Restoration (MCLIR), rejuvenates latency models as negative models, making it compatible with diverse image restoration tasks. To enable this, we propose the Self-Prior guided Negative loss (SPN). Existing models improve significantly when retrained with the proposed model contrastive paradigm, with gains across various image restoration tasks and architectures. For example, models retrained with SPN outperform the original FFANet and DehazeFormer by 3.41 dB and 0.57 dB on the RESIDE indoor dataset for image dehazing. Similarly, they achieve notable improvements of 0.47 dB on SPA-Data over IDT for image deraining and 0.12 dB on Manga109 over lightweight SwinIR for 4x super-resolution. Code and retrained models are available at https://github.com/Aitical/MCLIR.
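As a rough illustration of a self-prior negative, here is a sketch in the contrastive ratio form common in restoration losses: pull the output toward the ground truth while pushing it away from the output of a frozen earlier snapshot of the same model. The ratio form, alpha, and names are assumptions; the paper's SPN may be formulated differently.

```python
import torch.nn.functional as F

def spn_sketch(output, target, negative, alpha=0.1, eps=1e-7):
    """Hypothetical Self-Prior guided Negative loss sketch.

    output:   restored image from the current model
    target:   ground-truth clean image (positive)
    negative: output of a frozen historical ("latency") snapshot of the
              same model on the same degraded input
    """
    pos = F.l1_loss(output, target)
    neg = F.l1_loss(output, negative.detach())
    # Small when the output is close to the target and far from its own
    # historical prediction (contrastive ratio form, an assumption).
    return pos + alpha * pos / (neg + eps)

# Usage sketch: negative = latency_model(degraded_input), where
# latency_model is a periodically refreshed frozen copy of the network.
```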

AAAI Conference 2024 Conference Paper

Low-Light Face Super-resolution via Illumination, Structure, and Texture Associated Representation

  • Chenyang Wang
  • Junjun Jiang
  • Kui Jiang
  • Xianming Liu

Capturing human faces at night or in dimly lit environments has become common practice, and such images suffer from complex low-light and low-resolution degradations. However, existing face super-resolution (FSR) technologies and derived cascaded schemes are inadequate for recovering credible textures. In this paper, we propose a novel approach that decomposes the restoration task into face structural fidelity maintenance and texture consistency learning. The former aims to enhance the quality of face images while improving structural fidelity, whereas the latter focuses on eliminating the perturbations and artifacts caused by low-light degradation and reconstruction. Based on this decomposition, we develop a novel low-light low-resolution face super-resolution framework consisting of two steps: an illumination correction face super-resolution network (IC-FSRNet) that relights the face and recovers structural information, and a detail enhancement model (DENet) that improves facial details, making them more visually appealing and easier to analyze. As the relighted regions can provide complementary information to boost face super-resolution and vice versa, we introduce mutual learning to harness the informative components of relighting and reconstruction and achieve iterative refinement. In addition, DENet is equipped with a diffusion probabilistic model to further improve the visual quality of face images. Experiments demonstrate that the proposed joint optimization framework achieves significant improvements in reconstruction quality and perceptual quality over existing two-stage sequential solutions. Code is available at https://github.com/wcy-cs/IC-FSRDENet.

IJCAI Conference 2023 Conference Paper

From Generation to Suppression: Towards Effective Irregular Glow Removal for Nighttime Visibility Enhancement

  • Wanyu Wu
  • Wei Wang
  • Zheng Wang
  • Kui Jiang
  • Xin Xu

Most existing Low-Light Image Enhancement (LLIE) methods are primarily designed to improve brightness in dark regions, which suffer severe degradation in nighttime images. However, these methods have left another major source of visibility damage largely unexplored: the glow effects in real night scenes. Glow effects are inevitable in the presence of artificial light sources and cause further diffused blurring when directly enhanced. To settle this issue, we innovatively cast the glow suppression task as learning physical glow generation via multiple scattering estimation according to the Atmospheric Point Spread Function (APSF). In response to the challenges posed by uneven glow intensity and varying source shapes, an APSF-based Nighttime Imaging Model with Near-field Light Sources (NIM-NLS) is derived to design a scalable Light-aware Blind Deconvolution Network (LBDN). The glow-suppressed result is then brightened via a Retinex-based Enhancement Module (REM). Remarkably, the proposed glow suppression method is based on zero-shot learning and does not rely on any paired or unpaired training data. Empirical evaluations demonstrate the effectiveness of the proposed method in both glow suppression and low-light enhancement tasks.

AAAI Conference 2023 Conference Paper

Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning

  • Xian Zhong
  • Zipeng Li
  • Shuqin Chen
  • Kui Jiang
  • Chen Chen
  • Mang Ye

Video captioning aims to generate natural language sentences that accurately describe a given video. Existing methods achieve favorable generation by exploring richer visual representations in the encoding phase or by improving decoding ability. However, the long-tailed problem hinders these attempts at low-frequency tokens, which rarely occur but carry critical semantics and play a vital role in detailed generation. In this paper, we introduce a novel Refined Semantic enhancement method towards Frequency Diffusion (RSFD), a captioning model that constantly perceives the linguistic representation of infrequent tokens. Concretely, a Frequency-Aware Diffusion (FAD) module is proposed to comprehend the semantics of low-frequency tokens and break through generation limitations; the caption is thereby refined by promoting the absorption of tokens with insufficient occurrence. Based on FAD, we design a Divergent Semantic Supervisor (DSS) module to compensate for the information loss of high-frequency tokens brought by the diffusion process, further emphasizing the semantics of low-frequency tokens to alleviate the long-tailed problem. Extensive experiments indicate that RSFD outperforms state-of-the-art methods on two benchmark datasets, MSR-VTT and MSVD, demonstrating that enhancing the semantics of low-frequency tokens yields competitive generation. Code is available at https://github.com/lzp870/RSFD.

AAAI Conference 2023 Conference Paper

Store and Fetch Immediately: Everything Is All You Need for Space-Time Video Super-resolution

  • Mengshun Hu
  • Kui Jiang
  • Zhixiang Nie
  • Jiahuan Zhou
  • Zheng Wang

Existing space-time video super-resolution (ST-VSR) methods fail to achieve high-quality reconstruction because they do not fully explore spatial-temporal correlations, long-range components in particular. Although recurrent ST-VSR structures adopt bidirectional propagation to aggregate information from the entire video, collecting the temporal information between the past and future via one-stage representations inevitably loses long-range relations. To alleviate this limitation, this paper proposes an immediate store-and-fetch network to promote long-range correlation learning, where stored information from the past and future can be re-fetched to help represent the current frame. Specifically, the proposed network consists of two modules: a backward recurrent module (BRM) and a forward recurrent module (FRM). The former first performs backward inference from future to past while storing future super-resolution (SR) information for each frame. The latter then performs forward inference from past to future to super-resolve all frames while storing past SR information for each frame. Since FRM inherits SR information from BRM, spatial and temporal information from the entire video sequence is immediately stored and fetched, enabling drastic improvements for ST-VSR. Extensive experiments on ST-VSR as well as space video super-resolution (S-VSR) and time video super-resolution (T-VSR) demonstrate the effectiveness of our proposed method over other state-of-the-art methods on public datasets. Code is available at https://github.com/hhhhhumengshun/SFI-STVR.

IJCAI Conference 2022 Conference Paper

DANet: Image Deraining via Dynamic Association Learning

  • Kui Jiang
  • Zhongyuan Wang
  • Zheng Wang
  • Peng Yi
  • Junjun Jiang
  • Jinsheng Xiao
  • Chia-Wen Lin

Rain streaks and background components in a rainy input are highly correlated, making the deraining task a composition of rain streak removal and background restoration. However, the correlation of these two components is barely considered, leading to unsatisfactory deraining results. To this end, we propose a dynamic association network (DANet) to achieve association learning between rain streak removal and background recovery. Two key aspects fulfill the association learning: 1) DANet unveils the latent association knowledge between rain streak prediction and background texture recovery, and leverages it as an extra prior, via an association learning module (ALM), to promote texture recovery. 2) DANet introduces a parametric association constraint to enhance the compatibility of the deraining model with background reconstruction, enabling it to be learned automatically from the training data. Moreover, we observe that the downsampled rainy image shares a similar rain distribution with the original one. We therefore propose to learn the rain distribution in the sampling space and exploit super-resolution to reconstruct high-frequency background details, reducing computation and memory. Our proposed DANet achieves deraining performance close to that of the state-of-the-art MPRNet while requiring only 52.6% of its inference time and 23% of its computational cost.

AAAI Conference 2022 Conference Paper

Degrade Is Upgrade: Learning Degradation for Low-Light Image Enhancement

  • Kui Jiang
  • Zhongyuan Wang
  • Zheng Wang
  • Chen Chen
  • Peng Yi
  • Tao Lu
  • Chia-Wen Lin

Low-light image enhancement aims to improve an image's visibility while keeping its visual naturalness. Unlike existing methods, which tend to accomplish the relighting task directly while ignoring fidelity and naturalness recovery, we investigate the intrinsic degradation and relight the low-light image while refining the details and color in two steps. Inspired by the color image formulation (diffuse illumination color plus environment illumination color), we first estimate the degradation from low-light inputs to simulate the distortion of the environment illumination color, and then refine the content to recover the loss of diffuse illumination color. To this end, we propose a novel Degradation-to-Refinement Generation Network (DRGN). Its distinctive features are: 1) a novel two-step generation network for degradation learning and content refinement, which is not only superior to one-step methods but also capable of synthesizing sufficient paired samples to benefit model training; and 2) a multi-resolution fusion network that represents the target information (degradation or contents) in a multi-scale cooperative manner, which is more effective for the complex unmixing problem. Extensive experiments on both the enhancement task and the joint detection task verify the effectiveness and efficiency of our method, surpassing the SOTA by 0.70 dB on average and 3.18% in mAP, respectively. The code will be available soon.

IJCAI Conference 2022 Conference Paper

Rainy WCity: A Real Rainfall Dataset with Diverse Conditions for Semantic Driving Scene Understanding

  • Xian Zhong
  • Shidong Tu
  • Xianzheng Ma
  • Kui Jiang
  • Wenxin Huang
  • Zheng Wang

Scene understanding in adverse weather conditions (e.g., rainy and foggy days) has drawn increasing attention, giving rise to specific benchmarks and algorithms. However, scene segmentation under rainy weather remains challenging and under-explored due to the following limitations of existing datasets and methods: 1) manually synthesized rainy samples built on empirical settings and subjective human assumptions; 2) limited rainy conditions in terms of rain patterns, intensity, and degradation factors; and 3) separate training schemes for image deraining and semantic segmentation. To break these limitations, we pioneer a real, comprehensive, and well-annotated scene understanding dataset under rainy weather, named Rainy WCity. It covers various rain patterns and the negative visual effects they introduce, including wiper occlusion, droplets, reflection, refraction, shadow, and windshield blurring. In addition, to alleviate the dependence on paired training samples, we design an unsupervised contrastive learning network for real image deraining, followed by rainy scene semantic segmentation via multi-task joint optimization. A comprehensive comparative analysis shows that scene understanding in rainy weather remains a largely open problem. Finally, we summarize our general observations, identify open research challenges, and point out future directions.