EAAI Journal 2026 Journal Article
Learning a joint mutual-guidance enhancement network for degraded low-light color image and low-resolution depth map
- Bintao Chen
- Lijun Zhao
- Jinjing Zhang
- Anhong Wang
- Huihui Bai
Author name cluster
Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
AAAI Conference 2025 Conference Paper
Zero-shot learning (ZSL) endeavors to transfer knowledge from seen categories to recognize unseen categories, relying mostly on the semantic-visual interactions between image and attribute tokens. Recently, prompt learning has emerged in ZSL and demonstrated significant potential, as it allows the zero-shot transfer of diverse visual concepts to downstream tasks. However, current methods explore a fixed adaptation of the learnable prompt on the seen domains, which makes them over-emphasize the primary visual features observed during training and limits their generalization to unseen domains. In this work, we propose AENet, which endows the visual prompt with semantic information to distill a semantic-enhanced prompt for visual representation enrichment, enabling effective knowledge transfer for ZSL. AENet comprises two key steps: 1) exploring concept-harmonized tokens for the visual and attribute modalities, grounded on a modal-sharing token that represents consistent visual-semantic concepts; and 2) yielding the semantic-enhanced prompt via a visual residual refinement unit with attribute consistency supervision. The enhanced prompt is further integrated with primary visual features to attend to semantic-related information for visual enhancement, thus strengthening transferability. Experimental results on three benchmarks show that AENet outperforms existing state-of-the-art ZSL methods.
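The two steps in the abstract above can be sketched in a few lines of numpy. This is only a toy illustration of the idea of distilling a semantic-enhanced prompt and fusing it with visual features; all names, shapes, and the mean-pooling/residual choices are assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def semantic_enhanced_prompt(visual_prompt, attribute_tokens, W):
    """Toy sketch: project attribute semantics into the visual prompt
    space and refine the prompt with a pooled semantic residual."""
    semantic = attribute_tokens @ W      # (n_attr, d) projected semantics
    residual = semantic.mean(axis=0)     # (d,) pooled semantic residual
    return visual_prompt + residual      # semantic-enhanced prompt

d, n_attr = 8, 5
visual_prompt = rng.normal(size=d)
attribute_tokens = rng.normal(size=(n_attr, d))
W = np.eye(d)                            # identity projection, for the sketch only

prompt = semantic_enhanced_prompt(visual_prompt, attribute_tokens, W)

# The enhanced prompt is then integrated with primary visual features
# (here: 3 image tokens) for visual representation enrichment.
visual_features = rng.normal(size=(3, d))
enriched = visual_features + prompt
```

In the real method the projection and residual refinement would be learned under attribute consistency supervision; here they are fixed so the sketch stays self-contained.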
AAAI Conference 2025 Conference Paper
Existing unsupervised distillation-based methods rely on the differences between encoded and decoded features to locate abnormal regions in test images. However, a decoder trained only on normal samples can still reconstruct abnormal patch features well, degrading performance. This issue is particularly pronounced in unsupervised multi-class anomaly detection tasks. We attribute this behavior to ‘over-generalization’ (OG) of the decoder: the significantly increased diversity of patch patterns in multi-class training enhances the model's generalization on normal patches, but also inadvertently broadens its generalization to abnormal patches. To mitigate OG, we propose a novel approach that leverages class-agnostic learnable prompts to capture common textual normality across various visual patterns, and then applies them to guide the decoded features toward a ‘normal’ textual representation, suppressing the decoder's over-generalization on abnormal patterns. To further improve performance, we also introduce a gated mixture-of-experts module to specialize in handling diverse patch patterns and reduce mutual interference among them in multi-class training. Our method achieves competitive performance on the MVTec AD and VisA datasets, demonstrating its effectiveness.
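The scoring mechanism this abstract relies on, and the failure mode it fixes, can be illustrated with a toy numpy sketch. The cosine-distance scoring rule and all shapes below are illustrative assumptions, not the paper's implementation; the "guided" decoder simply stands in for decoded features pulled toward a normal representation:

```python
import numpy as np

def cosine_anomaly_map(enc_feats, dec_feats, eps=1e-8):
    """Per-patch anomaly score: 1 - cosine similarity between the
    encoder (teacher) and decoder (student) patch features."""
    num = (enc_feats * dec_feats).sum(axis=-1)
    den = np.linalg.norm(enc_feats, axis=-1) * np.linalg.norm(dec_feats, axis=-1) + eps
    return 1.0 - num / den                                   # (H, W) anomaly map

rng = np.random.default_rng(1)
H, W, C = 4, 4, 16
enc = rng.normal(size=(H, W, C))                             # encoded patch features

# An 'over-generalizing' decoder reconstructs even abnormal patches well,
# so the discrepancy (and thus the anomaly score) stays uselessly small.
dec_overgeneral = enc + 0.01 * rng.normal(size=(H, W, C))

# A decoder guided toward a 'normal' representation diverges from an
# abnormal patch, so the score at that patch becomes large.
dec_guided = enc.copy()
dec_guided[0, 0] = -enc[0, 0]                                # abnormal patch at (0, 0)

overgeneral_map = cosine_anomaly_map(enc, dec_overgeneral)
guided_map = cosine_anomaly_map(enc, dec_guided)
```

The sketch shows why suppressing over-generalization matters: the anomaly map is only informative where the decoder fails to reproduce the encoder's features.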
NeurIPS Conference 2025 Conference Paper
Neural video compression (NVC) has made significant progress in recent years, while neural B-frame video compression (NBVC) remains underexplored compared to P-frame compression. NBVC can adopt bi-directional reference frames for better compression performance. However, NBVC's hierarchical coding may complicate continuous temporal prediction, especially at hierarchical levels with a large frame span, which can cause the contributions of the two reference frames to become unbalanced. To optimize reference-information utilization, we propose a novel NBVC method, termed Bi-directional Reference Harmonization Video Compression (BRHVC), built on the proposed Bi-directional Motion Converge (BMC) and Bi-directional Contextual Fusion (BCF) modules. BMC converges multiple optical flows in motion compression, leading to more accurate motion compensation at a larger scale. BCF then explicitly models the weights of the reference contexts under the guidance of motion compensation accuracy. With more efficient motions and contexts, BRHVC can effectively harmonize bi-directional references. Experimental results indicate that BRHVC outperforms previous state-of-the-art NVC methods, even surpassing the traditional codec VTM-RA (under the random access configuration) on the HEVC datasets. The source code is released at https://github.com/kwai/NVC.
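The accuracy-guided weighting that BCF describes can be sketched as an inverse-error fusion of the two warped reference contexts. This is a hypothetical stand-in for the learned module (the weighting rule, error inputs, and shapes are all assumptions), meant only to show how a less accurate flow gets down-weighted:

```python
import numpy as np

def bidirectional_context_fusion(ctx_fwd, ctx_bwd, err_fwd, err_bwd, eps=1e-8):
    """Toy fusion of two reference contexts, weighted by motion
    compensation accuracy: lower warping error -> larger weight."""
    w_fwd = 1.0 / (err_fwd + eps)
    w_bwd = 1.0 / (err_bwd + eps)
    total = w_fwd + w_bwd
    return (w_fwd / total) * ctx_fwd + (w_bwd / total) * ctx_bwd

rng = np.random.default_rng(2)
ctx_fwd = rng.normal(size=(8, 8))   # context warped from the past reference
ctx_bwd = rng.normal(size=(8, 8))   # context warped from the future reference

# At a hierarchical level with a large frame span, one flow may be much
# less accurate; its context is then down-weighted in the fusion.
fused = bidirectional_context_fusion(ctx_fwd, ctx_bwd, err_fwd=0.9, err_bwd=0.1)
```

In BRHVC the weights would be predicted by a network rather than computed from scalar errors; the inverse-error rule here just makes the harmonization effect explicit.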
NeurIPS Conference 2025 Conference Paper
Due to the difficulty of collecting all unexpected abnormal patterns, One-Class Classification (OCC) has become the most popular approach to anomaly detection (AD). Reconstruction-based AD methods rely on the discrepancy between inputs and reconstructed results to identify unobserved anomalies. However, recent methods trained only on normal samples may generalize to certain abnormal inputs, leading to well-reconstructed anomalies and degraded performance. To address this, we constrain reconstructions to remain on the normal manifold using a novel AD framework based on contraction mapping, which guarantees that any input converges to a fixed point through iteration. Consequently, training the contraction mapping using only normal data ensures that its fixed point lies within the normal manifold, so abnormal inputs are iteratively transformed toward the normal manifold, increasing the reconstruction error. In addition, the inherent invertibility of the contraction mapping enables flow-based density estimation, where a prior distribution learned from the previous reconstruction is used to estimate the input likelihood for anomaly detection, further improving performance. Combining both mechanisms, we propose a bidirectional structure with forward reconstruction and backward density estimation. Extensive experiments on tabular data, natural images, and industrial images demonstrate the effectiveness of our method. The code is available at URD.
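The fixed-point argument above can be demonstrated with a one-line contraction mapping. Here the fixed point is a single hand-picked vector standing in for the normal manifold, and the contraction rate is fixed rather than learned; everything else follows from the Banach fixed-point theorem:

```python
import numpy as np

def make_contraction(center, rate=0.5):
    """A contraction mapping with Lipschitz constant `rate` < 1 whose
    unique fixed point is `center` (a stand-in for the normal manifold)."""
    return lambda x: center + rate * (x - center)

normal_point = np.array([1.0, 2.0])
f = make_contraction(normal_point)

def reconstruct(x, n_iter=20):
    """Iterate the contraction; iterates converge to the fixed point,
    so the output is pulled onto the normal manifold."""
    for _ in range(n_iter):
        x = f(x)
    return x

normal_input = normal_point + 0.01      # near the manifold
abnormal_input = normal_point + 5.0     # far from the manifold

# Anomaly score = reconstruction error; abnormal inputs are transported
# farther, so their error is much larger.
score_normal = np.linalg.norm(normal_input - reconstruct(normal_input))
score_abnormal = np.linalg.norm(abnormal_input - reconstruct(abnormal_input))
```

The learned version would also be invertible, enabling the flow-based density estimation the abstract mentions; this sketch only covers the forward-reconstruction half.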
IJCAI Conference 2020 Conference Paper
Recently, Convolutional Neural Network (CNN) based image super-resolution (SR) methods have shown significant success in the literature. However, these methods are implemented as a single-path stream that enriches feature maps from the input for the final prediction, which fails to fully incorporate earlier low-level features into later high-level features. In this paper, to tackle this problem, we propose a deep interleaved network (DIN) that learns how information at different states should be combined for image SR, where shallow information guides the prediction of deep representative features. Our DIN follows a multi-branch pattern, allowing multiple interconnected branches to interleave and fuse at different states. Besides, the asymmetric co-attention (AsyCA) is proposed and attached to the interleaved nodes to adaptively emphasize informative features from different states and improve the discriminative ability of the network. Extensive experiments demonstrate the superiority of the proposed DIN over state-of-the-art SR methods.
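The fusion idea behind AsyCA can be reduced to a softmax-weighted combination of features from different states. The logits, shapes, and two-state setup below are hypothetical (the actual module is a learned attention block inside the network), but the sketch shows how informative states can be adaptively emphasized at an interleaved node:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def asymmetric_co_attention(states, logits):
    """Toy stand-in for AsyCA: fuse feature maps from different branch
    states with softmax attention weights, one weight per state."""
    scores = softmax(logits)
    return sum(s * feat for s, feat in zip(scores, states))

rng = np.random.default_rng(3)
shallow = rng.normal(size=(8, 8))    # low-level features from an early state
deep = rng.normal(size=(8, 8))       # high-level features from a later state

# Hypothetical learned logits favoring the deeper state at this node.
logits = np.array([0.0, 1.0])
fused = asymmetric_co_attention([shallow, deep], logits)
```

In DIN these weights would be predicted from the features themselves, so each interleaved node can weight its incoming states differently.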