Arrow Research search

Author name cluster

Junjun Jiang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

35 papers
2 author rows

Possible papers

35

AAAI Conference 2026 Conference Paper

Semantics and Content Matter: Towards Multi-Prior Hierarchical Mamba for Image Deraining

  • Zhaocheng Yu
  • Kui Jiang
  • Junjun Jiang
  • Xianming Liu
  • Guanglu Sun
  • Yi Xiao

Rain significantly degrades the performance of computer vision systems, particularly in applications like autonomous driving and video surveillance. While existing deraining methods have made considerable progress, they often struggle to preserve semantic fidelity and spatial detail. To address these limitations, we propose the Multi-Prior Hierarchical Mamba (MPHM) network for image deraining. This novel architecture synergistically integrates macro-semantic textual priors (CLIP) for task-level semantic guidance and micro-structural visual priors (DINOv2) for scene-aware structural information. To alleviate potential conflicts between heterogeneous priors, we devise a progressive Priors Fusion Injection (PFI) that strategically injects complementary cues at different decoder levels. Meanwhile, we equip the backbone network with an elaborate Hierarchical Mamba Module (HMM) to facilitate robust feature representation, featuring a Fourier-enhanced dual-path design that concurrently addresses global context modeling and local detail recovery. Comprehensive experiments demonstrate MPHM's state-of-the-art performance, achieving a 0.57 dB PSNR gain on the Rain200H dataset while delivering superior generalization to real-world rainy scenarios.
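
The abstract's key architectural move is injecting heterogeneous priors (CLIP text features, DINOv2 visual features) at different decoder levels. As a hedged illustration of what such an injection could look like, here is a minimal cross-attention module; the class name `PriorInjection`, the residual placement, and the attention shape are assumptions for illustration, not the paper's released code.

```python
import torch
import torch.nn as nn

class PriorInjection(nn.Module):
    """Hedged sketch of prior fusion injection: decoder tokens attend to a
    prior token sequence (e.g. CLIP text / DINOv2 patch embeddings projected
    to the decoder width). Which prior feeds which decoder level is assumed."""
    def __init__(self, dim: int, prior_dim: int, heads: int = 4):
        super().__init__()
        # dim must be divisible by heads for MultiheadAttention
        self.proj = nn.Linear(prior_dim, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feat: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
        # feat: (B, N, dim) flattened decoder tokens; prior: (B, M, prior_dim)
        kv = self.proj(prior)
        ctx, _ = self.attn(feat, kv, kv)
        return feat + ctx  # residual injection of complementary cues
```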

AAAI Conference 2026 Conference Paper

Variation-Bounded Loss for Noise-Tolerant Learning

  • Jialiang Wang
  • Xiong Zhou
  • Xianming Liu
  • Gangfeng Hu
  • Deming Zhai
  • Junjun Jiang
  • Haoliang Li

Mitigating the negative impact of noisy labels has been a perennial issue in supervised learning. Robust loss functions have emerged as a prevalent solution to this problem. In this work, we introduce the Variation Ratio as a novel property related to the robustness of loss functions, and propose a new family of robust loss functions, termed Variation-Bounded Loss (VBL), which is characterized by a bounded variation ratio. We provide theoretical analyses of the variation ratio, proving that a smaller variation ratio leads to better robustness. Furthermore, we reveal that the variation ratio provides a feasible method to relax the symmetric condition and offers a more concise path to achieve the asymmetric condition. Based on the variation ratio, we reformulate several commonly used loss functions into a variation-bounded form for practical applications. Experiments on various datasets demonstrate the effectiveness and flexibility of our approach.

IJCAI Conference 2025 Conference Paper

Always Clear Depth: Robust Monocular Depth Estimation Under Adverse Weather

  • Kui Jiang
  • Jing Cao
  • Zhaocheng Yu
  • Junjun Jiang
  • Jingchun Zhou

Monocular depth estimation is critical for applications such as autonomous driving and scene reconstruction. While existing methods perform well under normal scenarios, their performance declines in adverse weather, due to challenging domain shifts and difficulties in extracting scene information. To address this issue, we present a robust monocular depth estimation method called ACDepth from the perspective of high-quality training data generation and domain adaptation. Specifically, we introduce a one-step diffusion model for generating samples that simulate adverse weather conditions, constructing a multi-tuple degradation dataset during training. To ensure the quality of the generated degradation samples, we employ LoRA adapters to fine-tune the generation weights of the diffusion model. Additionally, we integrate circular consistency loss and adversarial training to guarantee the fidelity and naturalness of the scene contents. Furthermore, we develop a multi-granularity knowledge distillation strategy (MKD) that encourages the student network to absorb knowledge from both the teacher model and pretrained Depth Anything V2. This strategy guides the student model in learning degradation-agnostic scene information from various degradation inputs. In particular, we introduce an ordinal guidance distillation mechanism (OGD) that encourages the network to focus on uncertain regions through differential ranking, leading to more precise depth estimation. Experimental results demonstrate that our ACDepth surpasses md4all-DD by 2.50% for night scenes and 2.61% for rainy scenes on the nuScenes dataset in terms of the absRel metric.

AAAI Conference 2025 Conference Paper

CALLIC: Content Adaptive Learning for Lossless Image Compression

  • Daxin Li
  • Yuanchao Bai
  • Kai Wang
  • Junjun Jiang
  • Xianming Liu
  • Wen Gao

Learned lossless image compression has achieved significant advancements in recent years. However, existing methods often rely on training amortized generative models on massive datasets, resulting in sub-optimal probability distribution estimation for specific testing images during the encoding process. To address this challenge, we explore the connection between the Minimum Description Length (MDL) principle and Parameter-Efficient Transfer Learning (PETL), leading to the development of a novel content-adaptive approach for learned lossless image compression, dubbed CALLIC. Specifically, we first propose a content-aware autoregressive self-attention mechanism leveraging convolutional gating operations, termed Masked Gated ConvFormer (MGCF), and pretrain MGCF on the training dataset. Cache then Crop Inference (CCI) is proposed to accelerate the coding process. During encoding, we decompose pretrained layers, including depth-wise convolutions, using low-rank matrices and then adapt the incremental weights to the testing image by Rate-guided Progressive Fine-Tuning (RPFT). RPFT fine-tunes with gradually increasing patches that are sorted in descending order by estimated entropy, optimizing the learning process and reducing adaptation time. Extensive experiments across diverse datasets demonstrate that CALLIC sets a new state-of-the-art (SOTA) for learned lossless image compression.

NeurIPS Conference 2025 Conference Paper

ControlFusion: A Controllable Image Fusion Network with Language-Vision Degradation Prompts

  • Linfeng Tang
  • Yeda Wang
  • Zhanchuan Cai
  • Junjun Jiang
  • Jiayi Ma

Current image fusion methods struggle with real-world composite degradations and lack the flexibility to accommodate user-specific needs. To address this, we propose ControlFusion, a controllable fusion network guided by language-vision prompts that adaptively mitigates composite degradations. On the one hand, we construct a degraded imaging model based on physical mechanisms, such as the Retinex theory and atmospheric scattering principle, to simulate composite degradations and provide a data foundation for addressing realistic degradations. On the other hand, we devise a prompt-modulated restoration and fusion network that dynamically enhances features according to degradation prompts, enabling adaptability to varying degradation levels. To support user-specific preferences in visual quality, a text encoder is incorporated to embed user-defined degradation types and levels as degradation prompts. Moreover, a spatial-frequency collaborative visual adapter is designed to autonomously perceive degradations from source images, thereby reducing complete reliance on user instructions. Extensive experiments demonstrate that ControlFusion outperforms SOTA fusion methods in fusion quality and degradation handling, particularly under real-world and compound degradations.

AAAI Conference 2025 Conference Paper

Debiased All-in-one Image Restoration with Task Uncertainty Regularization

  • Gang Wu
  • Junjun Jiang
  • Yijun Wang
  • Kui Jiang
  • Xianming Liu

All-in-one image restoration is a fundamental low-level vision task with significant real-world applications. The primary challenge lies in addressing diverse degradations within a single model. While current methods primarily exploit task prior information to guide the restoration models, they typically employ uniform multi-task learning, overlooking the heterogeneity in model optimization across different degradation tasks. To eliminate this bias, we propose a task-aware optimization strategy that introduces adaptive task-specific regularization for multi-task image restoration learning. Specifically, our method dynamically weights and balances the losses of different restoration tasks during training, encouraging the most reasonable optimization route. In this way, we achieve more robust and effective model training. Notably, our approach can serve as a plug-and-play strategy to enhance existing models without requiring modifications during inference. Extensive experiments in diverse all-in-one restoration settings demonstrate the superiority and generalization of our approach. For example, AirNet retrained with TUR achieves average improvements of 1.16 dB on three distinct tasks and 1.81 dB on five distinct all-in-one tasks. These results underscore TUR's effectiveness in advancing the state of the art in all-in-one image restoration, paving the way for more robust and versatile image restoration.
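
The paper's exact task uncertainty regularization is not spelled out in this abstract; a common way to realize adaptive task-specific loss weighting is Kendall-style homoscedastic uncertainty weighting, sketched below under that assumption. The learnable `log_sigma` parameters and the exact weighting form are illustrative, not necessarily TUR itself.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Hedged sketch of adaptive task weighting: each restoration task i gets a
    learnable log-variance s_i, and the total loss sum_i exp(-s_i) * L_i + s_i
    automatically down-weights tasks whose losses are noisy or hard to fit."""
    def __init__(self, num_tasks: int):
        super().__init__()
        self.log_sigma = nn.Parameter(torch.zeros(num_tasks))  # one s_i per task

    def forward(self, task_losses: torch.Tensor) -> torch.Tensor:
        # task_losses: tensor of shape (num_tasks,) with per-task restoration losses
        return (torch.exp(-self.log_sigma) * task_losses + self.log_sigma).sum()

# usage: total = tur(torch.stack([loss_derain, loss_dehaze, loss_denoise]))
```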

IROS Conference 2025 Conference Paper

FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion

  • Pihai Sun
  • Junjun Jiang
  • Yuanqi Yao
  • Youyu Chen
  • Wenbo Zhao 0004
  • Kui Jiang
  • Xianming Liu 0005

Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability stemming from two factors: 1) limited annotated image-event-depth datasets causing insufficient cross-modal supervision, and 2) inherent frequency mismatches between static images and dynamic event streams with distinct spatiotemporal patterns, leading to ineffective feature fusion. To address this dual challenge, we propose Frequency-decoupled Unified Self-supervised Encoder (FUSE) with two synergistic components: The Parameter-efficient Self-supervised Transfer (PST) leverages image foundation models for cross-modal knowledge transfer, effectively mitigating data scarcity by enabling joint encoding without depth ground truth. Complementing this, the Frequency-Decoupled Fusion module (FreDFuse) resolves modality-specific frequency mismatches by decoupling features into high- and low-frequency bands and then performing a guided cross-attention fusion, where the modality dominant in each band steers the integration. This combined approach enables FUSE to construct a universal image-event encoder that only requires lightweight decoder adaptation for target datasets. Extensive experiments demonstrate state-of-the-art performance with 14% and 24.9% improvements in Abs.Rel on the MVSEC and DENSE datasets. The framework exhibits remarkable robustness and generalization in challenging scenarios, including extreme lighting and motion blur, significantly advancing its real-world deployment capabilities. The source code for our method is publicly available at: https://github.com/sunpihai-up/FUSE.
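
The frequency-decoupling step can be sketched concretely: split feature maps into low- and high-frequency bands with a Fourier-domain mask, after which FreDFuse would fuse the bands with guided cross-attention. The radial mask and `cutoff` value below are illustrative assumptions, not the paper's implementation.

```python
import torch

def frequency_split(feat: torch.Tensor, cutoff: float = 0.25):
    """Hedged sketch of frequency decoupling: split a feature map (B, C, H, W)
    into low- and high-frequency bands with a radial low-pass mask in the
    Fourier domain. cutoff is a fraction of the normalized frequency range."""
    B, C, H, W = feat.shape
    spec = torch.fft.fftshift(torch.fft.fft2(feat), dim=(-2, -1))
    yy, xx = torch.meshgrid(
        torch.linspace(-0.5, 0.5, H), torch.linspace(-0.5, 0.5, W), indexing="ij")
    mask = ((xx ** 2 + yy ** 2).sqrt() <= cutoff).float().to(feat.device)  # low-pass disk
    low = torch.fft.ifft2(torch.fft.ifftshift(spec * mask, dim=(-2, -1))).real
    return low, feat - low  # low band, high band (residual)
```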

NeurIPS Conference 2025 Conference Paper

Reframing Gaussian Splatting Densification with Complexity-Density Consistency of Primitives

  • Zhemeng Dong
  • Junjun Jiang
  • Youyu Chen
  • Jiaxin Zhang
  • Kui Jiang
  • Xianming Liu

The essence of 3D Gaussian Splatting (3DGS) training is to smartly allocate Gaussian primitives, expressing complex regions with more primitives and vice versa. Prior research typically marks out under-reconstructed regions in a rendering-loss-driven manner. However, such a loss-driven strategy is often dominated by low-frequency regions, which leads to insufficient modeling of high-frequency details in texture-rich regions. As a result, it yields a suboptimal spatial allocation of Gaussian primitives. This inspires us to excavate the loss-agnostic visual prior in training views to identify complex regions that need more primitives to model. Based on this insight, we propose Complexity-Density Consistent Gaussian Splatting (CDC-GS), which allocates primitives based on the consistency between the visual complexity of training views and the density of primitives. Specifically, primitives involved in rendering areas of high visual complexity are categorized as modeling high-complexity regions, where we leverage the high-frequency wavelet components of training views to measure visual complexity. The density of a primitive is computed as the inverse of the geometric mean of its distances to neighboring primitives. Guided by the positive correlation between primitive complexity and density, we determine which primitives to densify as well as which to prune. Extensive experiments demonstrate that our CDC-GS surpasses the baseline methods in rendering quality by a large margin using the same number of Gaussians. We also provide analysis showing that our method acts orthogonally to the rendering loss in guiding Gaussian primitive allocation.
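
The density term is concrete enough to sketch directly: the abstract defines a primitive's density as the inverse of the geometric mean of its distances to neighboring primitives. A minimal NumPy/SciPy version, with the neighbor count `k` as an illustrative choice:

```python
import numpy as np
from scipy.spatial import cKDTree

def primitive_density(centers: np.ndarray, k: int = 5) -> np.ndarray:
    """Density of each Gaussian primitive as the inverse of the geometric mean
    of its distances to the k nearest neighboring primitives. Under the
    complexity-density consistency idea, a primitive with high measured visual
    complexity but low density would be a densification candidate."""
    tree = cKDTree(centers)
    dists, _ = tree.query(centers, k=k + 1)        # first column is the point itself
    dists = np.clip(dists[:, 1:], 1e-8, None)      # drop self, avoid log(0)
    return 1.0 / np.exp(np.log(dists).mean(axis=1))  # 1 / geometric mean
```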

AAAI Conference 2025 Conference Paper

Spatial Annealing for Efficient Few-shot Neural Rendering

  • Yuru Xiao
  • Deming Zhai
  • Wenbo Zhao
  • Kui Jiang
  • Junjun Jiang
  • Xianming Liu

Neural Radiance Fields (NeRF) with hybrid representations have shown impressive capabilities for novel view synthesis, delivering high efficiency. Nonetheless, their performance significantly drops with sparse input views. Various regularization strategies have been devised to address these challenges. However, these strategies either require additional rendering costs or involve complex pipeline designs, leading to a loss of training efficiency. Although FreeNeRF has introduced an efficient frequency annealing strategy, its operation on frequency positional encoding is incompatible with the efficient hybrid representations. In this paper, we introduce an accurate and efficient few-shot neural rendering method named Spatial Annealing regularized NeRF (SANeRF), which adopts the pre-filtering design of a hybrid representation. We initially establish the analytical formulation of the frequency band limit for a hybrid architecture by deducing its filtering process. Based on this analysis, we propose a universal form of frequency annealing in the spatial domain, which can be implemented by modulating the sampling kernel to exponentially shrink from an initial one with a narrow grid tangent kernel spectrum. This methodology is crucial for stabilizing the early stages of the training phase and significantly contributes to enhancing the subsequent process of detail refinement. Our extensive experiments reveal that, by adding merely one line of code, SANeRF delivers superior rendering quality and much faster reconstruction speed compared to current few-shot neural rendering methods. Notably, SANeRF outperforms FreeNeRF on the Blender dataset, achieving 700X faster reconstruction speed.
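
The abstract describes modulating the sampling kernel so it shrinks exponentially from a wide initial one, implementable in roughly one line. A hedged sketch of such an annealing schedule, with all constants as illustrative defaults rather than the paper's settings:

```python
import math

def annealed_kernel_scale(step: int, s0: float = 8.0, s_min: float = 1.0,
                          decay: float = 5e-4) -> float:
    """Hedged sketch of spatial annealing: the pre-filtering kernel starts wide
    (s0, suppressing high frequencies early in training) and shrinks
    exponentially toward s_min as training progresses."""
    return max(s_min, s0 * math.exp(-decay * step))
```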

ICRA Conference 2025 Conference Paper

Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios

  • Jialei Xu
  • Rui Li 0013
  • Kai Cheng
  • Junjun Jiang
  • Xianming Liu 0005

Monocular depth estimation from RGB images plays a pivotal role in 3D vision. However, its accuracy can deteriorate in challenging environments such as nighttime or adverse weather conditions. While long-wave infrared cameras offer stable imaging in such challenging conditions, they are inherently low-resolution, lacking rich texture and semantics as delivered by the RGB image. Current methods focus solely on a single modality due to the difficulties to identify and integrate faithful depth cues from both sources. To address these issues, this paper presents a novel approach that identifies and integrates dominant cross-modality depth features with a learning-based framework. Concretely, we independently compute the coarse depth maps with separate networks by fully utilizing the individual depth cues from each modality. As the advantageous depth spreads across both modalities, we propose a novel confidence loss steering a confidence predictor network to yield a confidence map specifying latent potential depth areas. With the resulting confidence map, we propose a multi-modal fusion network that fuses the final depth in an end-to-end manner. Harnessing the proposed pipeline, our method demonstrates the ability of robust depth estimation in a variety of difficult scenarios. Experimental results on the challenging $\text{MS}^{2}$ and ViViD++ datasets demonstrate the effectiveness and robustness of our method.

NeurIPS Conference 2024 Conference Paper

$\epsilon$-Softmax: Approximating One-Hot Vectors for Mitigating Label Noise

  • Jialiang Wang
  • Xiong Zhou
  • Deming Zhai
  • Junjun Jiang
  • Xiangyang Ji
  • Xianming Liu

Noisy labels pose a common challenge for training accurate deep neural networks. To mitigate label noise, prior studies have proposed various robust loss functions to achieve noise tolerance in the presence of label noise, particularly symmetric losses. However, they usually suffer from the underfitting issue due to the overly strict symmetric condition. In this work, we propose a simple yet effective approach for relaxing the symmetric condition, namely **$\epsilon$-softmax**, which simply modifies the outputs of the softmax layer to approximate one-hot vectors with a controllable error $\epsilon$. Essentially, **$\epsilon$-softmax** not only acts as an alternative to the softmax layer, but also implicitly plays a crucial role in modifying the loss function. We prove theoretically that **$\epsilon$-softmax** can achieve noise-tolerant learning with a controllable excess risk bound for almost any loss function. Recognizing that **$\epsilon$-softmax**-enhanced losses may slightly reduce fitting ability on clean datasets, we further incorporate them with one symmetric loss, thereby achieving a better trade-off between robustness and effective learning. Extensive experiments demonstrate the superiority of our method in mitigating synthetic and real-world label noise.
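
One plausible reading of the mechanism, modifying the softmax output so it sits within a controllable distance of a one-hot vector, is the convex combination below. The mixing form and weight `m` are assumptions for illustration, not necessarily the paper's exact construction.

```python
import torch
import torch.nn.functional as F

def eps_softmax(logits: torch.Tensor, m: float = 0.9) -> torch.Tensor:
    """Hedged sketch: push the softmax output toward a one-hot vector so the
    prediction differs from one-hot by a controllable error. Gradients still
    flow through the softmax branch; the argmax/one-hot branch is constant."""
    p = F.softmax(logits, dim=-1)
    one_hot = F.one_hot(p.argmax(dim=-1), num_classes=p.size(-1)).to(p.dtype)
    return m * one_hot + (1.0 - m) * p  # within a (1 - m)-sized error of one-hot
```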

NeurIPS Conference 2024 Conference Paper

AFBench: A Large-scale Benchmark for Airfoil Design

  • Jian Liu
  • Jianyu Wu
  • Hairun Xie
  • Guoqing Zhang
  • Jing Wang
  • Wei Liu
  • Wanli Ouyang
  • Junjun Jiang

Data-driven generative models have emerged as promising approaches towards achieving efficient mechanical inverse design. However, due to prohibitively high cost in time and money, there is still lack of open-source and large-scale benchmarks in this field. It is mainly the case for airfoil inverse design, which requires to generate and edit diverse geometric-qualified and aerodynamic-qualified airfoils following the multimodal instructions, \emph{i. e. ,} dragging points and physical parameters. This paper presents the open-source endeavors in airfoil inverse design, \emph{AFBench}, including a large-scale dataset with 200 thousand airfoils and high-quality aerodynamic and geometric labels, two novel and practical airfoil inverse design tasks, \emph{i. e. ,} conditional generation on multimodal physical parameters, controllable editing, and comprehensive metrics to evaluate various existing airfoil inverse design methods. Our aim is to establish \emph{AFBench} as an ecosystem for training and evaluating airfoil inverse design methods, with a specific focus on data-driven controllable inverse design models by multimodal instructions capable of bridging the gap between ideas and execution, the academic research and industrial applications. We have provided baseline models, comprehensive experimental observations, and analysis to accelerate future research. Our baseline model is trained on an RTX 3090 GPU within 16 hours. The codebase, datasets and benchmarks will be available at \url{https: //hitcslj. github. io/afbench/}.

AAAI Conference 2024 Conference Paper

FMRNet: Image Deraining via Frequency Mutual Revision

  • Kui Jiang
  • Junjun Jiang
  • Xianming Liu
  • Xin Xu
  • Xianzheng Ma

The wavelet transform has emerged as a powerful tool in deciphering structural information within images. And now, the latest research suggests that combining the prowess of wavelet transform with neural networks can lead to unparalleled image deraining results. By harnessing the strengths of both the spatial domain and frequency space, this innovative approach is poised to revolutionize the field of image processing. The fascinating challenge of developing a comprehensive framework that takes into account the intrinsic frequency property and the correlation between rain residue and background is yet to be fully explored. In this work, we propose to investigate the potential relationships among rain-free and residue components at the frequency domain, forming a frequency mutual revision network (FMRNet) for image deraining. Specifically, we explore the mutual representation of rain residue and background components at frequency domain, so as to better separate the rain layer from clean background while preserving structural textures of the degraded images. Meanwhile, the rain distribution prediction from the low-frequency coefficient, which can be seen as the degradation prior is used to refine the separation of rain residue and background components. Inversely, the updated rain residue is used to benefit the low-frequency rain distribution prediction, forming the multi-layer mutual learning. Extensive experiments demonstrate that our proposed FMRNet delivers significant performance gains for seven datasets on image deraining task, surpassing the state-of-the-art method ELFormer by 1.14 dB in PSNR on the Rain100L dataset, while with similar computation cost. Code and retrained models are available at https://github.com/kuijiang94/FMRNet.

AAAI Conference 2024 Conference Paper

Learning from History: Task-agnostic Model Contrastive Learning for Image Restoration

  • Gang Wu
  • Junjun Jiang
  • Kui Jiang
  • Xianming Liu

Contrastive learning has emerged as a prevailing paradigm for high-level vision tasks, which, by introducing properly negative samples, has also been exploited for low-level vision tasks to achieve a compact optimization space to account for their ill-posed nature. However, existing methods rely on manually predefined and task-oriented negatives, which often exhibit pronounced task-specific biases. To address this challenge, our paper introduces an innovative method termed 'learning from history', which dynamically generates negative samples from the target model itself. Our approach, named Model Contrastive Learning for Image Restoration (MCLIR), rejuvenates latency models as negative models, making it compatible with diverse image restoration tasks. We propose the Self-Prior guided Negative loss (SPN) to enable it. This approach significantly enhances existing models when retrained with the proposed model contrastive paradigm. The results show significant improvements in image restoration across various tasks and architectures. For example, models retrained with SPN outperform the original FFANet and DehazeFormer by 3.41 and 0.57 dB on the RESIDE indoor dataset for image dehazing. Similarly, they achieve notable improvements of 0.47 dB on SPA-Data over IDT for image deraining and 0.12 dB on Manga109 for a 4x scale super-resolution over lightweight SwinIR, respectively. Code and retrained models are available at https://github.com/Aitical/MCLIR.
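
A hedged sketch of the "learning from history" idea: a frozen earlier checkpoint of the model supplies the negative sample, and a contrastive-style ratio pulls the output toward the ground truth and away from the stale output. The ratio form is one common shape for such contrastive restoration losses, not necessarily the paper's exact SPN.

```python
import torch
import torch.nn.functional as F

def self_prior_negative_loss(model, negative_model, lq, gt, eps: float = 1e-6):
    """Hedged sketch: the negative model is a frozen earlier checkpoint of the
    restoration model, and its output on the same degraded input lq is the
    self-generated negative sample."""
    out = model(lq)
    with torch.no_grad():
        neg = negative_model(lq)  # negative sample from the model's own history
    return F.l1_loss(out, gt) / (F.l1_loss(out, neg) + eps)

# the negative model is refreshed periodically from the live model, e.g.
# negative_model.load_state_dict(model.state_dict())
```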

AAAI Conference 2024 Conference Paper

Low-Light Face Super-resolution via Illumination, Structure, and Texture Associated Representation

  • Chenyang Wang
  • Junjun Jiang
  • Kui Jiang
  • Xianming Liu

Human face captured at night or in dimly lit environments has become a common practice, accompanied by complex low-light and low-resolution degradations. However, the existing face super-resolution (FSR) technologies and derived cascaded schemes are inadequate to recover credible textures. In this paper, we propose a novel approach that decomposes the restoration task into face structural fidelity maintaining and texture consistency learning. The former aims to enhance the quality of face images while improving the structural fidelity, while the latter focuses on eliminating perturbations and artifacts caused by low-light degradation and reconstruction. Based on this, we develop a novel low-light low-resolution face super-resolution framework. Our method consists of two steps: an illumination correction face super-resolution network (IC-FSRNet) for lighting the face and recovering the structural information, and a detail enhancement model (DENet) for improving facial details, thus making them more visually appealing and easier to analyze. As the relighted regions could provide complementary information to boost face super-resolution and vice versa, we introduce the mutual learning to harness the informative components from relighted regions and reconstruction, and achieve the iterative refinement. In addition, DENet equipped with diffusion probabilistic model is built to further improve face image visual quality. Experiments demonstrate that the proposed joint optimization framework achieves significant improvements in reconstruction quality and perceptual quality over existing two-stage sequential solutions. Code is available at https://github.com/wcy-cs/IC-FSRDENet.

IROS Conference 2024 Conference Paper

SDGE: Stereo Guided Depth Estimation for 360° Camera Sets

  • Jialei Xu
  • Wei Yin
  • Dong Gong
  • Junjun Jiang
  • Xianming Liu 0005

Depth estimation is a critical technology in autonomous driving, and multi-camera systems are often used to achieve 360° perception. These 360° camera sets often have limited or low-quality overlap regions, making multi-view stereo methods infeasible for the entire image. Alternatively, monocular methods may not produce consistent cross-view predictions. To address these issues, we propose the Stereo Guided Depth Estimation (SGDE) method, which enhances depth estimation of the full image by explicitly utilizing multi-view stereo results on the overlap. We suggest building virtual pinhole cameras to resolve the distortion problem of fisheye cameras and unify the processing for the two types of 360° cameras. For handling the varying noise on camera poses caused by unstable movement, the approach employs a self-calibration method to obtain highly accurate relative poses of the adjacent cameras with minor overlap. These enable the use of robust stereo methods to obtain a high-quality depth prior in the overlap region. This prior serves not only as an additional input but also as pseudo-labels that enhance the accuracy of depth estimation methods and improve cross-view prediction consistency. The effectiveness of SGDE is evaluated on one fisheye camera dataset, Synthetic Urban, and two pinhole camera datasets, DDAD and nuScenes. Our experiments demonstrate that SGDE is effective for both supervised and self-supervised depth estimation, and highlight the potential of our method for advancing autonomous driving technology. Our project page is at https://github.com/JialeiXu/SGDE.

ICLR Conference 2024 Conference Paper

Variance-enlarged Poisson Learning for Graph-based Semi-Supervised Learning with Extremely Sparse Labeled Data

  • Xiong Zhou
  • Xianming Liu 0005
  • Hao Yu
  • Jialiang Wang 0003
  • Zeke Xie
  • Junjun Jiang
  • Xiangyang Ji

Graph-based semi-supervised learning, particularly in the context of extremely sparse labeled data, often suffers from degenerate solutions where label functions tend to be nearly constant across unlabeled data. In this paper, we introduce Variance-enlarged Poisson Learning (VPL), a simple yet powerful framework tailored to alleviate the issues arising from the presence of degenerate solutions. VPL incorporates a variance-enlarged regularization term, which induces a Poisson equation specifically for unlabeled data. This intuitive approach increases the dispersion of labels around their mean, effectively reducing the likelihood of degenerate solutions characterized by nearly constant label functions. We subsequently introduce two streamlined algorithms, V-Laplace and V-Poisson, each intricately designed to enhance Laplace and Poisson learning, respectively. Furthermore, we broaden the scope of VPL to encompass graph neural networks, introducing Variance-enlarged Graph Poisson Networks (V-GPN) to facilitate improved label propagation. To achieve a deeper understanding of VPL's behavior, we conduct a comprehensive theoretical exploration in both discrete and variational cases. Our findings elucidate that VPL inherently amplifies the importance of connections within the same class while concurrently tempering those between different classes. We support our claims with extensive experiments, demonstrating the effectiveness of VPL and showcasing its superiority over existing methods. The code is available at https://github.com/hitcszx/VPL.
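
The variance-enlarged idea is easy to sketch for a scalar label function on a graph: keep the usual Laplacian (Dirichlet) energy, but subtract a multiple of the label variance over unlabeled nodes, which discourages near-constant degenerate solutions. `lam` and the normalization below are illustrative choices, not the paper's exact objective.

```python
import numpy as np

def vpl_objective(W: np.ndarray, u: np.ndarray, unlabeled: np.ndarray,
                  lam: float = 0.1) -> float:
    """Hedged sketch of variance-enlarged regularization: penalize the graph
    Dirichlet energy while rewarding the variance of the label function u over
    unlabeled nodes, pushing away from the nearly-constant regime."""
    L = np.diag(W.sum(axis=1)) - W       # combinatorial graph Laplacian
    dirichlet = float(u @ L @ u)
    var = float(np.var(u[unlabeled]))
    return dirichlet - lam * var         # enlarging variance = subtracting it
```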

ICLR Conference 2024 Conference Paper

Zero-Mean Regularized Spectral Contrastive Learning: Implicitly Mitigating Wrong Connections in Positive-Pair Graphs

  • Xiong Zhou
  • Xianming Liu 0005
  • Feilong Zhang 0002
  • Gang Wu 0010
  • Deming Zhai
  • Junjun Jiang
  • Xiangyang Ji

Contrastive learning has emerged as a popular paradigm of self-supervised learning that learns representations by encouraging representations of positive pairs to be similar while pushing representations of negative pairs far apart. The spectral contrastive loss, in synergy with the notion of positive-pair graphs, offers valuable theoretical insights into the empirical successes of contrastive learning. In this paper, we propose incorporating an additive factor into the term of the spectral contrastive loss involving negative pairs. This simple modification can be equivalently viewed as introducing a regularization term that enforces the mean of representations to be zero, and is thus referred to as *zero-mean regularization*. It intuitively relaxes the orthogonality of representations between negative pairs and implicitly alleviates the adverse effect of wrong connections in the positive-pair graph, leading to better performance and robustness. To clarify this, we thoroughly investigate the role of the zero-mean regularized spectral contrastive loss in both unsupervised and supervised scenarios with respect to theoretical analysis and quantitative evaluation. These results highlight the potential of zero-mean regularized spectral contrastive learning to be a promising approach in various tasks.
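
The proposed change is small enough to show directly: add an additive factor `c` inside the negative-pair term of the spectral contrastive loss, which the paper shows is equivalent to regularizing the representation mean toward zero. A batch-wise sketch, with the estimator details and the value of `c` as illustrative assumptions:

```python
import torch

def zm_spectral_contrastive(z1: torch.Tensor, z2: torch.Tensor, c: float = 1.0):
    """Hedged sketch: spectral contrastive loss where negative-pair
    similarities get an additive factor c before squaring. z1, z2 are (n, d)
    embeddings of two augmented views; off-diagonal pairs act as negatives."""
    pos = -2.0 * (z1 * z2).sum(dim=1).mean()           # positive pairs: aligned views
    sim = z1 @ z2.t()                                   # all cross similarities
    n = z1.size(0)
    off = ~torch.eye(n, dtype=torch.bool, device=z1.device)
    neg = ((sim[off] + c) ** 2).mean()                  # negatives with additive factor
    return pos + neg
```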

ICML Conference 2023 Conference Paper

No One Idles: Efficient Heterogeneous Federated Learning with Parallel Edge and Server Computation

  • Feilong Zhang 0002
  • Xianming Liu 0005
  • Shiyi Lin
  • Gang Wu 0010
  • Xiong Zhou
  • Junjun Jiang
  • Xiangyang Ji

Federated learning suffers from a latency bottleneck induced by network stragglers, which hampers the training efficiency significantly. In addition, due to the heterogeneous data distribution and security requirements, simple and fast averaging aggregation is not feasible anymore. Instead, complicated aggregation operations, such as knowledge distillation, are required. The time cost for complicated aggregation becomes a new bottleneck that limits the computational efficiency of FL. In this work, we claim that the root cause of training latency actually lies in the aggregation-then-broadcasting workflow of the server. By swapping the computational order of aggregation and broadcasting, we propose a novel and efficient parallel federated learning (PFL) framework that unlocks the edge nodes during global computation and the central server during local computation. This fully asynchronous and parallel pipeline enables handling complex aggregation and network stragglers, allowing flexible device participation as well as achieving scalability in computation. We theoretically prove that synchronous and asynchronous PFL can achieve a similar convergence rate as vanilla FL. Extensive experiments empirically show that our framework brings up to $5.56\times$ speedup compared with traditional FL. Code is available at: https://github.com/Hypervoyager/PFL.
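
A hedged sketch of the swapped workflow: broadcast first so edge devices start local training immediately, and run the (possibly expensive, e.g. distillation-based) aggregation of last round's updates in parallel with that local computation. The object names and methods below are illustrative placeholders, not the released code; a real system would overlap the two stages with threads or async RPC.

```python
def pfl_round(server, clients, prev_global):
    """Hedged sketch of one PFL round with the broadcast-then-aggregate order.
    Compare vanilla FL, which aggregates first and leaves clients idle."""
    for c in clients:
        c.start_local_training(prev_global)      # edge compute begins at once
    new_global = server.aggregate(server.pending_updates)  # overlaps with the above
    server.pending_updates = [c.finish_and_upload() for c in clients]
    return new_global
```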

JMLR Journal 2023 Journal Article

On the Dynamics Under the Unhinged Loss and Beyond

  • Xiong Zhou
  • Xianming Liu
  • Hanzhang Wang
  • Deming Zhai
  • Junjun Jiang
  • Xiangyang Ji

Recent works have studied implicit biases in deep learning, especially the behavior of last-layer features and classifier weights. However, they usually need to simplify the intermediate dynamics under gradient flow or gradient descent due to the intractability of loss functions and model architectures. In this paper, we introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze the closed-form dynamics while requiring as few simplifications or assumptions as possible. The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization. Based on the layer-peeled model that views last-layer features as free optimization variables, we conduct a thorough analysis in the unconstrained, regularized, and spherical constrained cases, as well as the case where the neural tangent kernel remains invariant. To bridge the performance of the unhinged loss to that of Cross-Entropy (CE), we investigate the scenario of fixing classifier weights with a specific structure (e.g., a simplex equiangular tight frame). Our analysis shows that these dynamics converge exponentially fast to a solution depending on the initialization of features and classifier weights. These theoretical results not only offer valuable insights, including explicit feature regularization and rescaled learning rates for enhancing practical training with the unhinged loss, but also extend their applicability to other loss functions. Finally, we empirically demonstrate these theoretical results and insights through extensive experiments.
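
For reference, the binary unhinged loss (van Rooyen et al., 2015) is simply linear in the score, which is what makes closed-form dynamics tractable; the multiclass variant analyzed in the paper is more involved, so treat this as the simplest instance.

```python
import torch

def unhinged_loss(score: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Binary unhinged loss L(f(x), y) = 1 - y * f(x), with labels y in {-1, +1}.
    Linearity in the score means the gradient is independent of the prediction,
    which is what permits closed-form training dynamics."""
    return (1.0 - y * score).mean()
```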

IJCAI Conference 2022 Conference Paper

DANet: Image Deraining via Dynamic Association Learning

  • Kui Jiang
  • Zhongyuan Wang
  • Zheng Wang
  • Peng Yi
  • Junjun Jiang
  • Jinsheng Xiao
  • Chia-Wen Lin

Rain streaks and background components in a rainy input are highly correlated, making the deraining task a composition of rain streak removal and background restoration. However, the correlation of these two components is barely considered, leading to unsatisfactory deraining results. To this end, we propose a dynamic association network (DANet) to achieve association learning between rain streak removal and background recovery. There are two key aspects to fulfill the association learning: 1) DANet unveils the latent association knowledge between rain streak prediction and background texture recovery, and leverages it as an extra prior via an associated learning module (ALM) to promote texture recovery. 2) DANet introduces a parametric association constraint to enhance the compatibility of the deraining model with background reconstruction, enabling it to be automatically learned from the training data. Moreover, we observe that the sampled rainy image follows a distribution similar to the original one. We thus propose to learn the rain distribution in the sampling space, and exploit super-resolution to reconstruct high-frequency background details, reducing computation and memory. Our proposed DANet achieves deraining performance approximate to that of the state-of-the-art MPRNet while requiring only 52.6% and 23% of its inference time and computational cost, respectively.

ICLR Conference 2022 Conference Paper

Learning Towards The Largest Margins

  • Xiong Zhou
  • Xianming Liu 0005
  • Deming Zhai
  • Junjun Jiang
  • Xin Gao
  • Xiangyang Ji

One of the main challenges for feature representation in deep learning-based classification is the design of appropriate loss functions that exhibit strong discriminative power. The classical softmax loss does not explicitly encourage discriminative learning of features. A popular direction of research is to incorporate margins in well-established losses in order to enforce extra intra-class compactness and inter-class separability, which, however, were developed through heuristic means, as opposed to rigorous mathematical principles. In this work, we attempt to address this limitation by formulating the principled optimization objective as learning towards the largest margins. Specifically, we first propose to employ the class margin as the measure of inter-class separability, and the sample margin as the measure of intra-class compactness. Accordingly, to encourage discriminative representation of features, the loss function should promote the largest possible margins for both classes and samples. Furthermore, we derive a generalized margin softmax loss to draw general conclusions for the existing margin-based losses. Not only does this principled framework offer new perspectives to understand and interpret existing margin-based losses, but it also provides new insights that can guide the design of new tools, including *sample margin regularization* and *largest margin softmax loss* for the class-balanced case, and *zero centroid regularization* for the class-imbalanced case. Experimental results demonstrate the effectiveness of our strategy for multiple tasks including visual classification, imbalanced classification, person re-identification, and face verification.
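
The sample margin is straightforward to compute; the sketch below does so on logits for simplicity. The paper defines margins on the feature/prototype geometry, so treat this as an illustration of the quantity rather than the paper's implementation.

```python
import torch

def sample_margin(logits: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Per-sample margin: gap between the target-class score and the best
    non-target score. Positive means correctly classified with room to spare;
    a largest-margin objective would push these values up."""
    target = logits.gather(1, y.unsqueeze(1)).squeeze(1)
    masked = logits.scatter(1, y.unsqueeze(1), float("-inf"))  # hide target class
    return target - masked.max(dim=1).values
```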

AAAI Conference 2022 Conference Paper

Local Surface Descriptor for Geometry and Feature Preserved Mesh Denoising

  • Wenbo Zhao
  • Xianming Liu
  • Junjun Jiang
  • Debin Zhao
  • Ge Li
  • Xiangyang Ji

3D meshes are widely employed to represent the geometry of 3D shapes. Due to limitations in scanning sensor precision and other issues, meshes are inevitably affected by noise, which hampers subsequent applications. Convolutional neural networks (CNNs) have achieved great success in image processing tasks, including 2D image denoising, and have been proven to have the capacity to model complex features at different scales, which is also particularly useful for mesh denoising. However, due to their irregular structure, CNN-based denoising strategies cannot be trivially applied to meshes. To circumvent this limitation, in this paper we propose the local surface descriptor (LSD), which transforms the local deformable surface around a face into a 2D grid representation and thus facilitates the deployment of CNNs to generate denoised face normals. To verify the superiority of LSD, we directly feed LSD into a classical ResNet without any complicated network design. Extensive experimental results show that, compared to the state-of-the-art, our method achieves encouraging performance with respect to both objective and subjective evaluations.

ICML Conference 2022 Conference Paper

Prototype-Anchored Learning for Learning with Imperfect Annotations

  • Xiong Zhou
  • Xianming Liu 0005
  • Deming Zhai
  • Junjun Jiang
  • Xin Gao
  • Xiangyang Ji

The success of deep neural networks greatly relies on the availability of large amounts of high-quality annotated data, which however are difficult or expensive to obtain. The resulting labels may be class imbalanced, noisy or human biased. It is challenging to learn unbiased classification models from imperfectly annotated datasets, on which we usually suffer from overfitting or underfitting. In this work, we thoroughly investigate the popular softmax loss and margin-based loss, and offer a feasible approach to tighten the generalization error bound by maximizing the minimal sample margin. We further derive the optimality condition for this purpose, which indicates how the class prototypes should be anchored. Motivated by theoretical analysis, we propose a simple yet effective method, namely prototype-anchored learning (PAL), which can be easily incorporated into various learning-based classification schemes to handle imperfect annotation. We verify the effectiveness of PAL on class-imbalanced learning and noise-tolerant learning by extensive experiments on synthetic and real-world datasets.

AAAI Conference 2022 Conference Paper

SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-training for Spatial-Aware Visual Representations

  • Zhenyu Li
  • Zehui Chen
  • Ang Li
  • Liangji Fang
  • Qinhong Jiang
  • Xianming Liu
  • Junjun Jiang
  • Bolei Zhou

Pre-training has become a standard paradigm in many computer vision tasks. However, most of the methods are generally designed for the RGB image domain. Due to the discrepancy between the two-dimensional image plane and three-dimensional space, such pre-trained models fail to perceive spatial information and serve as sub-optimal solutions for 3D-related tasks. To bridge this gap, we aim to learn a spatial-aware visual representation that can describe three-dimensional space and is more suitable and effective for these tasks. To leverage point clouds, which are far superior to images in providing spatial information, we propose a simple yet effective 2D Image and 3D Point cloud Unsupervised pre-training strategy, called SimIPU. Specifically, we develop a multi-modal contrastive learning framework that consists of an intra-modal spatial perception module to learn a spatial-aware representation from point clouds and an inter-modal feature interaction module to transfer the capability of perceiving spatial information from the point cloud encoder to the image encoder. Positive pairs for the contrastive losses are established by the matching algorithm and the projection matrix. The whole framework is trained in an unsupervised end-to-end fashion. To the best of our knowledge, this is the first study to explore contrastive learning pre-training strategies for outdoor multi-modal datasets containing paired camera images and LIDAR point clouds.
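
The inter-modal objective pairs image and point-cloud features via the matching algorithm and projection matrix; once features are paired row-wise, a standard InfoNCE term captures the idea. A sketch under that pairing assumption, with an illustrative temperature:

```python
import torch
import torch.nn.functional as F

def infonce(img_feat: torch.Tensor, pc_feat: torch.Tensor, temperature: float = 0.07):
    """Hedged sketch of the inter-modal contrastive term: matched image/point
    features (paired row-wise) are positives; all other pairs in the batch are
    negatives."""
    img = F.normalize(img_feat, dim=1)
    pc = F.normalize(pc_feat, dim=1)
    logits = img @ pc.t() / temperature
    labels = torch.arange(img.size(0), device=img.device)  # i-th row matches i-th column
    return F.cross_entropy(logits, labels)
```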

AAAI Conference 2022 Conference Paper

Towards End-to-End Image Compression and Analysis with Transformers

  • Yuanchao Bai
  • Xu Yang
  • Xianming Liu
  • Junjun Jiang
  • Yaowei Wang
  • Xiangyang Ji
  • Wen Gao

We propose an end-to-end image compression and analysis model with Transformers, targeting the cloud-based image classification application. Instead of placing an existing Transformer-based image classification model directly after an image codec, we redesign the Vision Transformer (ViT) model to perform image classification from the compressed features and facilitate image compression with long-term information from the Transformer. Specifically, we first replace the patchify stem (i.e., image splitting and embedding) of the ViT model with a lightweight image encoder modelled by a convolutional neural network. The compressed features generated by the image encoder are injected with convolutional inductive bias and fed to the Transformer for image classification, bypassing image reconstruction. Meanwhile, we propose a feature aggregation module to fuse the compressed features with selected intermediate features of the Transformer, and feed the aggregated features to a deconvolutional neural network for image reconstruction. The aggregated features can obtain long-term information from the self-attention mechanism of the Transformer and improve the compression performance. The rate-distortion-accuracy optimization problem is finally solved by a two-step training strategy. Experimental results demonstrate the effectiveness of the proposed model in both the image compression and the classification tasks.

ICML Conference 2021 Conference Paper

Asymmetric Loss Functions for Learning with Noisy Labels

  • Xiong Zhou
  • Xianming Liu 0005
  • Junjun Jiang
  • Xin Gao
  • Xiangyang Ji

Robust loss functions are essential for training deep neural networks with better generalization power in the presence of noisy labels. Symmetric loss functions are confirmed to be robust to label noise. However, the symmetric condition is overly restrictive. In this work, we propose a new class of loss functions, namely asymmetric loss functions, which are robust to learning from noisy labels for arbitrary noise types. Subsequently, we investigate general theoretical properties of asymmetric loss functions, including classification-calibration, excess risk bound, and noise-tolerance. Meanwhile, we introduce the asymmetry ratio to measure the asymmetry of a loss function; empirical results show that a higher ratio provides better robustness. Moreover, we modify several common loss functions, and establish the necessary and sufficient conditions for them to be asymmetric. Experiments on benchmark datasets demonstrate that asymmetric loss functions can outperform state-of-the-art methods.

AAAI Conference 2020 Conference Paper

FusionDN: A Unified Densely Connected Network for Image Fusion

  • Han Xu
  • Jiayi Ma
  • Zhuliang Le
  • Junjun Jiang
  • Xiaojie Guo

In this paper, we present a new unsupervised and unified densely connected network for different types of image fusion tasks, termed FusionDN. In our method, the densely connected network is trained to generate the fused image conditioned on source images. Meanwhile, a weight block is applied to obtain two data-driven weights as the retention degrees of features in different source images, which measure the quality and the amount of information in them. Losses of similarities based on these weights are applied for unsupervised learning. In addition, we obtain a single model applicable to multiple fusion tasks by applying elastic weight consolidation to avoid forgetting what has been learned from previous tasks when training multiple tasks sequentially, rather than training individual models for every fusion task or crudely training all tasks jointly. Qualitative and quantitative results demonstrate the advantages of FusionDN compared with state-of-the-art methods in different fusion tasks.

IJCAI Conference 2019 Conference Paper

Learning a Generative Model for Fusing Infrared and Visible Images via Conditional Generative Adversarial Network with Dual Discriminators

  • Han Xu
  • Pengwei Liang
  • Wei Yu
  • Junjun Jiang
  • Jiayi Ma

In this paper, we propose a new end-to-end model, called dual-discriminator conditional generative adversarial network (DDcGAN), for fusing infrared and visible images of different resolutions. Unlike the pixel-level methods and existing deep learning-based methods, the fusion task is accomplished through the adversarial process between a generator and two discriminators, in addition to the specially designed content loss. The generator is trained to generate real-like fused images to fool discriminators. The two discriminators are trained to calculate the JS divergence between the probability distribution of downsampled fused images and infrared images, and the JS divergence between the probability distribution of gradients of fused images and gradients of visible images, respectively. Thus, the fused images can compensate for the features that are not constrained by the single content loss. Consequently, the prominence of thermal targets in the infrared image and the texture details in the visible image can be preserved or even enhanced in the fused image simultaneously. Moreover, by constraining and distinguishing between the downsampled fused image and the low-resolution infrared image, DDcGAN can be preferably applied to the fusion of different resolution images. Qualitative and quantitative experiments on publicly available datasets demonstrate the superiority of our method over the state-of-the-art.

IJCAI Conference 2018 Conference Paper

Deep CNN Denoiser and Multi-layer Neighbor Component Embedding for Face Hallucination

  • Junjun Jiang
  • Yi Yu
  • Jinhui Hu
  • Suhua Tang
  • Jiayi Ma

Most current face hallucination methods, whether shallow learning-based or deep learning-based, try to learn a relationship model between Low-Resolution (LR) and High-Resolution (HR) spaces with the help of a training set. They mainly focus on modeling the image prior through either model-based optimization or discriminative inference learning. However, when the input LR face is tiny, the learned prior knowledge is no longer effective and their performance drops sharply. To solve this problem, in this paper we propose a general face hallucination method that integrates model-based optimization and discriminative inference. In particular, to exploit the model-based prior, a Deep Convolutional Neural Network (CNN) denoiser prior is plugged into the super-resolution optimization model with the aid of image-adaptive Laplacian regularization. Additionally, we develop a high-frequency details compensation method by dividing the face image into facial components and performing face hallucination in a multi-layer neighbor embedding manner. Experiments demonstrate that the proposed method achieves promising super-resolution results for tiny input LR faces.

ICRA Conference 2018 Conference Paper

Visual Homing via Guided Locality Preserving Matching

  • Jiayi Ma 0001
  • Ji Zhao 0001
  • Junjun Jiang
  • Huabing Zhou
  • Yu Zhou 0016
  • Zheng Wang 0007
  • Xiaojie Guo 0001

This study proposes a simple yet surprisingly effective feature matching approach, termed guided locality preserving matching (GLPM), for visual homing of panoramic images. The key idea of our approach is merely to preserve the neighborhood structures of potential true matches between two panoramic images. We formulate it into a mathematical model, and derive a simple closed-form solution with linearithmic time and linear space complexities. This enables our method to accomplish mismatch removal from hundreds of putative correspondences in only a few milliseconds. To handle extremely large proportions of outliers, we further design a guided matching strategy based on the proposed method, using the matching result on a small putative set with a high inlier ratio to guide the matching on a large putative set. This strategy can also significantly boost true matches without sacrificing accuracy. To apply our GLPM to the visual homing problem, we develop a method for dense motion flow estimation from sparse feature matches based on Tikhonov regularization. Moreover, the focus-of-contraction/focus-of-expansion is derived to determine homing directions. The effectiveness of our method is demonstrated on a panoramic database in both feature matching and visual homing.

IJCAI Conference 2017 Conference Paper

Locality Preserving Matching

  • Jiayi Ma
  • Ji Zhao
  • Hanqi Guo
  • Junjun Jiang
  • Huabing Zhou
  • Yuan Gao

Seeking reliable correspondences between two feature sets is a fundamental and important task in computer vision. This paper attempts to remove mismatches from given putative image feature correspondences. To achieve this goal, an efficient approach, termed locality preserving matching (LPM), is designed, the principle of which is to maintain the local neighborhood structures of potential true matches. We formulate the problem into a mathematical model, and derive a closed-form solution with linearithmic time and linear space complexities. More specifically, our method can accomplish mismatch removal from thousands of putative correspondences in only a few milliseconds. Experiments on various real image pairs for general feature matching, as well as for visual homing and image retrieval, demonstrate the generality of our method for handling different types of image deformations; it is more than two orders of magnitude faster than state-of-the-art methods with accuracy in the same range or better.
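
The principle, keep the matches whose local neighborhoods are preserved across the two images, can be sketched with a simple k-nearest-neighbor consensus vote; `k`, `tau`, and the hard-voting rule are illustrative simplifications of the paper's closed-form cost.

```python
import numpy as np
from scipy.spatial import cKDTree

def lpm_filter(x1: np.ndarray, x2: np.ndarray, k: int = 8, tau: int = 4):
    """Hedged sketch of locality preserving matching: a putative match
    (x1[i], x2[i]) is kept if enough of its k nearest neighbors among the
    keypoints in image 1 are also among its k nearest neighbors in image 2,
    i.e. the local neighborhood structure is preserved."""
    _, n1 = cKDTree(x1).query(x1, k=k + 1)   # neighbor indices in image 1 (incl. self)
    _, n2 = cKDTree(x2).query(x2, k=k + 1)   # neighbor indices in image 2 (incl. self)
    keep = np.array([len(set(a[1:]) & set(b[1:])) >= tau
                     for a, b in zip(n1, n2)])
    return keep  # boolean inlier mask over putative correspondences
```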

AAAI Conference 2017 Conference Paper

Non-Rigid Point Set Registration with Robust Transformation Estimation under Manifold Regularization

  • Jiayi Ma
  • Ji Zhao
  • Junjun Jiang
  • Huabing Zhou

In this paper, we propose a robust transformation estimation method based on manifold regularization for non-rigid point set registration. The method iteratively recovers the point correspondence and estimates the spatial transformation between two point sets. The correspondence is established based on existing local feature descriptors, which typically results in a number of outliers. To achieve an accurate estimate of the transformation from such putative point correspondences, we formulate the registration problem with a mixture model, introducing a set of latent variables to identify outliers, and impose a prior involving manifold regularization on the transformation to capture the underlying intrinsic geometry of the input data. The non-rigid transformation is specified in a reproducing kernel Hilbert space and a sparse approximation is adopted for a fast implementation. Extensive experiments on both 2D and 3D data demonstrate that our method yields superior results compared to other state-of-the-art methods, especially in the case of badly degraded data.

IJCAI Conference 2016 Conference Paper

3D Action Recognition Using Multi-Temporal Depth Motion Maps and Fisher Vector

  • Chen Chen
  • Mengyuan Liu
  • Baochang Zhang
  • Jungong Han
  • Junjun Jiang
  • Hong Liu

This paper presents an effective local spatio-temporal descriptor for action recognition from depth video sequences. The unique property of our descriptor is that it takes the shape discrimination and action speed variations into account, intending to solve the problems of distinguishing different pose shapes and identifying actions with different speeds in one go. The entire algorithm is carried out in three stages. In the first stage, a depth sequence is divided into temporally overlapping depth segments which are used to generate three depth motion maps (DMMs), capturing the shape and motion cues. To cope with speed variations in actions, multiple frame lengths of depth segments are utilized, leading to a multi-temporal DMMs representation. In the second stage, all the DMMs are first partitioned into dense patches. Then, the local binary patterns (LBP) descriptor is exploited to characterize local rotation-invariant texture information in those patches. In the third stage, the Fisher kernel is employed to encode the patch descriptors for a compact feature representation, which is fed into a kernel-based extreme learning machine classifier. Extensive experiments on the public MSRAction3D, MSRGesture3D and DHA datasets show that our proposed method outperforms state-of-the-art approaches for depth-based action recognition.

IJCAI Conference 2016 Conference Paper

Scale-Adaptive Low-Resolution Person Re-Identification via Learning a Discriminating Surface

  • Zheng Wang
  • Ruimin Hu
  • Yi Yu
  • Junjun Jiang
  • Chao Liang
  • Jinqiao Wang

Person re-identification, as an important task in video surveillance and forensics applications, has been widely studied. But most previous approaches are based on the key assumption that images for comparison have the same resolution and a uniform scale. Some recent works investigate how to match low-resolution query images against high-resolution gallery images, but still assume that the low-resolution query images have the same scale. In real scenarios, person images may not only have low resolution but also different scales. By investigating the distance variation behavior as image scales change, we observe that scale-distance functions, generated by image pairs under different scales from the same person or from different persons, are distinguishable and can be classified as feasible (for a pair of images from the same person) or infeasible (for a pair of images from different persons). The scale-distance functions are further represented by parameter vectors in the scale-distance function space. On this basis, we propose to learn a discriminating surface separating these feasible and infeasible functions in the scale-distance function space, and use it for re-identifying persons. Experimental results on two simulated datasets and one public dataset demonstrate the effectiveness of the proposed framework.