Author name cluster

Junyang Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

20 papers

2 author rows

AAAI Conference 2026 Conference Paper

LoGoSeg: Integrating Local and Global Features for Open-Vocabulary Semantic Segmentation

Junyang Chen
Xiangbo Lv
Zhiqiang Kou
Xingdong Sheng
Ning Xu
Yiguo Qiao

Open-vocabulary semantic segmentation (OVSS) extends traditional closed-set segmentation by enabling pixel-wise annotation for both seen and unseen categories using arbitrary textual descriptions. While existing methods leverage vision-language models (VLMs) like CLIP, their reliance on image-level pretraining often results in imprecise spatial alignment, leading to mismatched segmentations in ambiguous or cluttered scenes. Moreover, most existing approaches lack strong object priors and region-level constraints, which can lead to object hallucination or missed detections, further degrading performance. To address these challenges, we propose LoGoSeg, an efficient single-stage framework that integrates three key innovations: (i) an object existence prior that dynamically weights relevant categories through global image-text similarity, effectively reducing hallucinations; (ii) a region-aware alignment module that establishes precise region-level visual-textual correspondences; and (iii) a dual-stream fusion mechanism that optimally combines local structural information with global semantic context. Unlike prior works, LoGoSeg eliminates the need for external mask proposals, additional backbones, or extra datasets, ensuring efficiency. Extensive experiments on six benchmarks (A-847, PC-459, A-150, PC-59, PAS-20, and PAS-20b) demonstrate its competitive performance and strong generalization in open-vocabulary settings.
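The object existence prior described in this abstract can be illustrated with a small sketch. This is not the LoGoSeg implementation: the function names, the softmax normalization, and the temperature value are all assumptions made for illustration. It only shows the general idea of weighting per-pixel category scores by a global image-text similarity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def softmax(xs, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def existence_prior(image_embedding, text_embeddings, temperature=0.1):
    """Hypothetical prior: one weight per category, derived from the
    similarity between a global image embedding and each category's
    text embedding. Softmax normalization is an assumption."""
    sims = [cosine(image_embedding, t) for t in text_embeddings]
    return softmax(sims, temperature)

def reweight_pixel_scores(pixel_scores, prior):
    """Scale each pixel's per-category score by the category weight."""
    return [[s * w for s, w in zip(row, prior)] for row in pixel_scores]
```

With a prior like this, a category whose text embedding is poorly aligned with the whole image receives a small weight, so a pixel that weakly matches it is less likely to be assigned to it.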

AAAI Conference 2026 Conference Paper

Toward Multimodal Fake News Detection by Multi-perspective Rationale Generation and Verification

Junyang Chen
Yueqian Li
Ka Chung Ng
Huan Wang
Liang-Jie Zhang

The rapid proliferation of social media platforms has led to a surge in multimodal fake news, where deceptive content often combines text and images to mislead audiences. Traditional unimodal detection methods struggle to address the complexity of such content, necessitating holistic multimodal approaches. While the latest advancements in Multimodal Large Language Models (MLLMs) offer new opportunities for enhancing detection performance by analyzing multi-dimensional features, including source credibility, cross-modal contradictions, emotional bias, and manipulative writing patterns, these methods suffer from a key flaw: a susceptibility to hallucinations or erroneous reasoning, which can lead to flawed conclusions and ultimately biased detection results. We propose the Multimodal Fake News Detection via Multi-perspective Rationale Generation and Verification (MMRGV) model to mitigate this challenge. Our method employs a cross-verification mechanism to screen and reconcile contradictions among different rationales, thereby preserving the LLM's analytical advantages while mitigating the impact of erroneous reasoning or hallucinations on the final detection. Subsequently, these optimized rationales are fused via an adaptive weighting strategy to output a robust final prediction. Extensive experiments on three benchmark datasets (Twitter, Weibo, and GossipCop) demonstrate the superiority of our method, achieving state-of-the-art accuracy of 0.9972, 0.9663, and 0.8772, respectively, and significantly outperforming existing baselines. These results validate the effectiveness of multi-perspective rationale generation and cross-verification in enhancing multimodal fake news detection, offering a resilient solution to combat misinformation in the era of generative AI.

AAAI Conference 2026 Conference Paper

TTA-Bench: A Comprehensive Benchmark for Evaluating Text-to-Audio Models

Hui Wang
Cheng Liu
Junyang Chen
Haoze Liu
Yuhang Jia
Shiwan Zhao
Jiaming Zhou
Haoqin Sun

Text-to-Audio (TTA) generation has made rapid progress, but current evaluation methods remain narrow, focusing mainly on perceptual quality while overlooking robustness, generalization, and ethical concerns. We present TTA-Bench, a comprehensive benchmark for evaluating TTA models across functional performance, reliability, and social responsibility. It covers seven dimensions including accuracy, robustness, fairness, and toxicity, and includes 2,999 diverse prompts generated through automated and manual methods. We introduce a unified evaluation protocol that combines objective metrics with over 118,000 human annotations from both experts and general users. Ten state-of-the-art models are benchmarked under this framework, offering detailed insights into their strengths and limitations. TTA-Bench establishes a new standard for holistic evaluation of TTA systems.

AAAI Conference 2026 Conference Paper

Zero-shot Recommendation: Towards Class Semantic Relation Learning for Inferring Labels of Unseen Micro-videos

Junyang Chen
Huan Wang
Yirui Wu
Qiuzhen Lin
Yunfeng Diao
Junkai Ji

Micro-video label prediction plays a pivotal role on contemporary video-sharing platforms, such as Kwai and TikTok. The emergence of video content lacking labels presents a formidable challenge for conventional user interest prediction methods. This paper addresses the challenge of micro-video label prediction, particularly for unseen videos, by proposing a zero-shot method called Class Semantic Relation Learning (CSRL). Unlike traditional user interest prediction models, CSRL leverages the pre-trained Large Language Model (LLM) to enhance prediction accuracy for unlabeled videos. The novelty of CSRL lies in its integration of three key components: a raw feature autoencoder, LLM-enhanced features, and a decomposed graph network. The decomposed graph network is specifically designed to disentangle the relationships between labeled and unlabeled videos, offering a significant improvement over previous methods. By fusing hidden topics with LLM-enhanced text, CSRL effectively handles sparse video features. Experiments on large-scale datasets from the Kwai platform show that CSRL achieves state-of-the-art results, with up to 44.64% improvement in Hit Ratio (HR), highlighting its superiority over existing zero-shot recommendation models in predicting user interests within the user-video network.
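For readers unfamiliar with the Hit Ratio metric mentioned above, the following is a minimal sketch of the common HR@K definition: the fraction of test cases whose target item appears among the top K ranked candidates. The function name and data layout are illustrative assumptions, and the abstract does not state which K the paper uses.

```python
def hit_ratio_at_k(ranked_lists, targets, k):
    """Fraction of cases whose target appears in the top-k of its ranked list.

    ranked_lists: one ranked candidate list per test case (best first).
    targets: the ground-truth item for each test case.
    """
    hits = sum(
        1 for ranked, target in zip(ranked_lists, targets)
        if target in ranked[:k]
    )
    return hits / len(targets)

# Hypothetical example data: two test cases, both with target "b".
ranked = [["a", "b", "c"], ["c", "a", "b"]]
print(hit_ratio_at_k(ranked, ["b", "b"], k=2))
```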

IJCAI Conference 2025 Conference Paper

ABNet: Mitigating Sample Imbalance in Anomaly Detection Within Dynamic Graphs

Yifan Hong
Muhammad Asif Ali
Huan Wang
Junyang Chen
Di Wang

In dynamic graphs, detecting anomalous nodes faces challenges due to sample imbalance, stemming from the scarcity of anomalous samples and feature representation bias. Existing methods often use unsupervised or semi-supervised learning to extract anomalous samples from unlabeled data, but struggle to obtain enough anomalous instances due to their low occurrence. Moreover, GNN-based approaches often prioritize normal samples, neglecting rare anomalies. To address these issues, we propose the Anomaly Balance Network (ABNet), designed to alleviate sample imbalance and enhance anomaly detection. ABNet includes three key components: a feature extractor that compares node features across time points to avoid bias, an anomaly augmenter that amplifies anomaly details and generates diverse anomalous samples, and an anomaly detector using meta-learning to adapt to graph evolution. Experimental results show that ABNet outperforms existing methods on three real-world datasets, effectively addressing sample imbalance.

ICML Conference 2025 Conference Paper

Can DBNNs Robust to Environmental Noise for Resource-constrained Scenarios?

Wendong Zheng
Junyang Chen
Husheng Guo
Wenjian Wang

Recently, the potential of lightweight models for resource-constrained scenarios has garnered significant attention, particularly in safety-critical tasks such as bio-electrical signal classification and B-ultrasound-assisted diagnosis. These tasks are frequently affected by environmental noise due to patient movement artifacts and inherent device noise, which pose significant challenges for lightweight models (e.g., deep binary neural networks (DBNNs)) to perform robust inference. A pertinent question arises: can a well-trained DBNN effectively resist environmental noise during inference? In this study, we find that the DBNN’s robustness vulnerability comes from the binary weights and scaling factors. Drawing upon theoretical insights, we propose L1-infinite norm constraints for binary weights and scaling factors, which yield a tighter upper bound compared to existing state-of-the-art (SOTA) methods. Finally, visualization studies show that our approach introduces minimal noise perturbations at the periphery of the feature maps. Our approach outperforms the SOTA method, as validated by several experiments conducted on the bio-electrical and image classification datasets. We hope our findings can raise awareness among researchers about the environmental noise robustness of DBNNs.
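To make the roles of binary weights and scaling factors concrete, here is a minimal sketch of one widely used binarization scheme, in which a real-valued weight vector is approximated by a scaling factor times its sign vector. This scheme and the helper names are assumptions for illustration; the abstract does not specify the paper's exact binarization or its norm constraint.

```python
def binarize(weights):
    """Approximate real weights as alpha * sign(w).

    alpha is the mean absolute weight (an assumed, commonly used choice).
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1.0 if w >= 0 else -1.0 for w in weights]
    return alpha, signs

def binary_dot(alpha, signs, inputs):
    """Output of a single binarized neuron (no bias, no activation)."""
    return alpha * sum(s * x for s, x in zip(signs, inputs))

alpha, signs = binarize([0.5, -1.5, 1.0])
clean = binary_dot(alpha, signs, [1.0, 2.0, 3.0])
# Hypothetical input noise added to each feature.
noisy = binary_dot(alpha, signs, [1.1, 1.9, 3.2])
print(clean, noisy)
```

Because every sign has magnitude one, the output change under an input perturbation is at most alpha times the sum of the absolute perturbations, which suggests why the scaling factor matters for noise robustness in this simplified setting.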

AAAI Conference 2025 Conference Paper

Deconfound Semantic Shift and Incompleteness in Incremental Few-shot Semantic Segmentation

Yirui Wu
Yuhang Xia
Hao Li
Lixin Yuan
Junyang Chen
Jun Liu
Tong Lu
Shaohua Wan

Incremental few-shot semantic segmentation (IFSS) expands segmentation capacity of the trained model to segment new-class images with few samples. However, semantic meanings may shift from background to object class or vice versa during incremental learning. Moreover, new-class samples often lack representative attribute features when the new class greatly differs from the pre-learned old class. In this paper, we propose a causal framework to discuss the cause of semantic shift and incompleteness in IFSS, and we deconfound the revealed causal effects from two aspects. First, we propose a Causal Intervention Module (CIM) to resist semantic shift. CIM progressively and adaptively updates prototypes of old class, and removes the confounder in an intervention manner. Second, a Prototype Refinement Module (PRM) is proposed to complete the missing semantics. In PRM, knowledge gained from the episode learning scheme assists in fusing features of new-class and old-class prototypes. Experiments on both PASCAL-VOC 2012 and ADE20k benchmarks demonstrate the outstanding performance of our method.

IJCAI Conference 2025 Conference Paper

Deduction with Induction: Combining Knowledge Discovery and Reasoning for Interpretable Deep Reinforcement Learning

Haodi Zhang
Xiangyu Zeng
Junyang Chen
Yuanfeng Song
Rui Mao
Fangzhen Lin

Deep reinforcement learning (DRL) has achieved remarkable success in dynamic decision-making tasks. However, its inherent opacity and cold start problem hinder transparency and training efficiency. To address these challenges, we propose HRL-ID, a neural-symbolic framework that combines automated rule discovery with logical reasoning within a hierarchical DRL structure. HRL-ID dynamically extracts first-order logic rules from environmental interactions, iteratively refines them through success-based updates, and leverages these rules to guide action execution during training. Extensive experiments on Atari benchmarks demonstrate that HRL-ID outperforms state-of-the-art methods in training efficiency and interpretability, achieving higher reward rates and successful knowledge transfer between domains.

IJCAI Conference 2025 Conference Paper

Diffuse& Refine: Intrinsic Knowledge Generation and Aggregation for Incremental Object Detection

Jianzhou Wang
Yirui Wu
Lixin Yuan
Wenxiao Zhang
Jun Liu
Junyang Chen
Huan Wang
Wenhai Wang

Incremental Object Detection (IOD) aims to progressively extend the capability of object detectors to recognize new classes. However, representation confusion between old and new classes leads to catastrophic forgetting. To alleviate this problem, we propose DiffKA, in which intrinsic knowledge is generated and aggregated by forward and backward diffusion, gradually establishing rigid class boundaries. With incremental streaming data, forward diffusion spreads information to generate potential inter-class associations among new- and old-class prototypes within a hierarchical tree, named the Intrinsic Correlation Tree (ICTree), to store intrinsic knowledge. Afterwards, backward diffusion refines and aggregates the generated knowledge in the ICTree, explicitly establishing rigid class boundaries to mitigate representation confusion. To keep semantic consistency with extreme IOD settings, we reorganize semantic relevance of old- and new-class prototypes in paradigms to adaptively and effectively update DiffKA. Experiments on the MS COCO dataset show that DiffKA achieves state-of-the-art performance on IOD tasks with significant advantages.

NeurIPS Conference 2025 Conference Paper

RankMatch: A Novel Approach to Semi-Supervised Label Distribution Learning Leveraging Rank Correlation between Labels

Zhiqiang Kou
Yucheng Xie
Hailin Wang
Junyang Chen
Jingq Wang
Ming-Kun Xie
Shuo Chen
Yuheng Jia

Pseudo label based semi-supervised learning (SSL) for single-label and multi-label classification tasks has been extensively studied; however, semi-supervised label distribution learning (SSLDL) remains a largely unexplored area. Existing SSL methods fail in SSLDL because the pseudo-labels they generate only ensure overall similarity to the ground truth but do not preserve the ranking relationships between true labels, as they rely solely on KL divergence as the loss function during training. These skewed pseudo-labels lead the model to learn incorrect semantic relationships, resulting in reduced performance accuracy. To address these issues, we propose a novel SSLDL method called RankMatch. RankMatch fully considers the ranking relationships between different labels during the training phase with labeled data to generate higher-quality pseudo-labels. Furthermore, our key observation is that a flexible utilization of pseudo-labels can enhance SSLDL performance. Specifically, focusing solely on the ranking relationships between labels while disregarding their margins helps prevent model overfitting. Theoretically, we prove that incorporating ranking correlations enhances SSLDL performance and establish generalization error bounds for RankMatch. Finally, extensive real-world experiments validate its effectiveness.
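The contrast drawn above between KL divergence and ranking relationships can be sketched as follows. The pairwise loss here is a generic zero-margin ranking penalty chosen for illustration, not the RankMatch loss itself; the function names and example distributions are assumptions.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two label distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def pairwise_rank_loss(target, pred):
    """Zero-margin pairwise ranking penalty.

    For every label pair that the target orders as i above j, the loss
    is positive only if the prediction reverses that order. The size of
    the gap in the target distribution is deliberately ignored.
    """
    loss, pairs = 0.0, 0
    for i in range(len(target)):
        for j in range(len(target)):
            if target[i] > target[j]:
                pairs += 1
                loss += max(0.0, pred[j] - pred[i])
    return loss / pairs if pairs else 0.0

# Hypothetical label distributions over three labels.
target = [0.5, 0.3, 0.2]
same_order = [0.4, 0.35, 0.25]   # differs in values, same ranking
swapped = [0.3, 0.5, 0.2]        # top two labels reversed
print(kl_divergence(target, same_order), pairwise_rank_loss(target, same_order))
print(kl_divergence(target, swapped), pairwise_rank_loss(target, swapped))
```

The ranking penalty is zero whenever the predicted order agrees with the target order, regardless of how far apart the values are, which is the sense in which margins are disregarded in this sketch.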

NeurIPS Conference 2025 Conference Paper

SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors

Chen Yang
Hui Wang
Shiyao Wang
Junyang Chen
Jiabei He
Jiaming Zhou
Xi Yang
Yequan Wang

While voice technologies increasingly serve aging populations, current systems exhibit significant performance gaps due to inadequate training data capturing elderly-specific vocal characteristics like presbyphonia and dialectal variations. The limited data available on super-aged individuals in existing elderly speech datasets, coupled with overly simple recording styles and annotation dimensions, exacerbates this issue. To address the critical scarcity of speech data from individuals aged 75 and above, we introduce SeniorTalk, a carefully annotated Chinese spoken dialogue dataset. This dataset contains 55.53 hours of speech from 101 natural conversations involving 202 participants, ensuring a strategic balance across gender, region, and age. Through detailed annotation across multiple dimensions, it can support a wide range of speech tasks. We perform extensive experiments on speaker verification, speaker diarization, speech recognition, and speech editing tasks, offering crucial insights for the development of speech technologies targeting this age group. Code is available at https://github.com/flageval-baai/SeniorTalk and data at https://huggingface.co/datasets/evan0617/seniortalk.

IJCAI Conference 2024 Conference Paper

CONC: Complex-noise-resistant Open-set Node Classification with Adaptive Noise Detection

Qin Zhang
Jiexin Lu
Xiaowei Li
Huisi Wu
Shirui Pan
Junyang Chen

As a popular task in graph learning, node classification seeks to assign labels to nodes, taking into account both their features and connections. However, an important challenge for its application in real-world scenarios is the presence of newly-emerged out-of-distribution samples and noisy samples, which affect the quality and robustness of learned classifiers. Out-of-distribution (OOD) samples are often found in both the training and testing phases. Such samples don’t belong to any known categories. These OOD samples are considered as outliers (OOD noise) when they appear during training, and are recognized as open-set samples during the testing. Meanwhile, in-distribution (IND) noisy data, i.e., known class samples with wrong labels, are also prevalent and inevitably degrade a model’s performance. The challenge of open-set learning with complex IND and OOD noise remains largely unexplored, particularly when dealing with non-IID graph data. To address these challenges, this paper introduces a novel complex-noise-resistant open-set node classification approach, designed for open-set graph data containing both IND and OOD noisy nodes. Specifically, a trustworthiness learner is adopted to learn the trustworthiness rates of the feature and label for each node while a decoder and an open-set classifier are trained to reconstruct the structure of a node and to predict its category simultaneously with the guidance of node trustworthiness. The experimental results demonstrate the superiority of our method.

NeurIPS Conference 2024 Conference Paper

EGonc : Energy-based Open-Set Node Classification with substitute Unknowns

Qin Zhang
Zelin Shi
Shirui Pan
Junyang Chen
Huisi Wu
Xiaojun Chen

Open-set Classification (OSC) is a critical requirement for safely deploying machine learning models in the open world, which aims to classify samples from known classes and reject samples from out-of-distribution (OOD). Existing methods exploit the feature space of a trained network and attempt to estimate the uncertainty in the predictions. However, softmax-based neural networks are found to be overly confident in their predictions even on data they have never seen before, and the immense diversity of OOD examples also makes such methods fragile. To this end, we follow the idea of estimating the underlying density of the training data to decide whether a given input is close to the in-distribution (IND) data and adopt Energy-based models (EBMs) as density estimators. A novel energy-based generative open-set node classification method, EGonc, is proposed to achieve open-set graph learning. Specifically, we first generate substitute unknowns to mimic the distribution of real open-set samples, based on the information of graph structures. Then, an additional energy logit representing the virtual OOD class is learned from the residual of the feature against the principal space, and matched with the original logits by a constant scaling. This virtual logit serves as the indicator of OOD-ness. EGonc has nice theoretical properties that guarantee an overall distinguishable margin between the detection scores for IND and OOD samples. Comprehensive experimental evaluations of EGonc also demonstrate its superiority.
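As background for the energy-based view in this abstract, the sketch below computes the standard free-energy score of a logit vector (negative log-sum-exp) and shows how an extra logit for a virtual OOD class turns into an OOD probability under softmax. It is a generic illustration, not the EGonc method: how the virtual logit is learned and scaled is not reproduced, and `virtual_logit` is simply a hypothetical input.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def energy_score(logits, temperature=1.0):
    """Free energy of a logit vector; lower values look more in-distribution."""
    return -temperature * logsumexp([z / temperature for z in logits])

def ood_probability(logits, virtual_logit):
    """Softmax mass on an appended virtual OOD class.

    virtual_logit is a hypothetical value supplied by the caller.
    """
    all_logits = list(logits) + [virtual_logit]
    return math.exp(virtual_logit - logsumexp(all_logits))

confident = [10.0, 0.0, 0.0]   # one class clearly dominates
uncertain = [0.1, 0.0, 0.0]    # nearly flat logits
print(energy_score(confident), energy_score(uncertain))
print(ood_probability(uncertain, virtual_logit=2.0))
```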

AAAI Conference 2024 Conference Paper

Fewer Steps, Better Performance: Efficient Cross-Modal Clip Trimming for Video Moment Retrieval Using Language

Xiang Fang
Daizong Liu
Wanlong Fang
Pan Zhou
Zichuan Xu
Wenzheng Xu
Junyang Chen
Renfu Li

Given an untrimmed video and a sentence query, video moment retrieval using language (VMR) aims to locate a target query-relevant moment. Since the untrimmed video is overlong, almost all existing VMR methods first sparsely down-sample each untrimmed video into multiple fixed-length video clips and then conduct multi-modal interactions with the query feature and expensive clip features for reasoning, which is infeasible for long real-world videos that span hours. Since the video is downsampled into fixed-length clips, some query-related frames may be filtered out, which will blur the specific boundary of the target moment and take adjacent irrelevant frames as new boundaries, easily leading to cross-modal misalignment and introducing both boundary-bias and reasoning-bias. To this end, in this paper, we propose an efficient approach, SpotVMR, to trim the query-relevant clip. Besides, our proposed SpotVMR can serve as a plug-and-play module, which achieves efficiency for state-of-the-art VMR methods while maintaining good retrieval performance. Specifically, we first design a novel clip search model that learns to identify promising video regions to search conditioned on the language query. Then, we introduce a set of low-cost semantic indexing features to capture the context of objects and interactions that suggest where to search the query-relevant moment. Also, the distillation loss is utilized to address the optimization issues arising from end-to-end joint training of the clip selector and VMR model. Extensive experiments on three challenging datasets demonstrate its effectiveness.

IJCAI Conference 2024 Conference Paper

ParsNets: A Parsimonious Composition of Orthogonal and Low-Rank Linear Networks for Zero-Shot Learning

Jingcai Guo
Qihua Zhou
Xiaocheng Lu
Ruibin Li
Ziming Liu
Jie Zhang
Bo Han
Junyang Chen

This paper provides a novel parsimonious yet efficient design for zero-shot learning (ZSL), dubbed ParsNets, in which we are interested in learning a composition of on-device friendly linear networks, each with orthogonality and low-rankness properties, to achieve equivalent or better performance against deep models. Concretely, we first refactor the core module of ZSL, i.e., the visual-semantics mapping function, into several base linear networks that correspond to diverse components of the semantic space, wherein the complex nonlinearity can be collapsed into simple local linearities. Then, to facilitate the generalization of local linearities, we construct a maximal margin geometry on the learned features by enforcing low-rank constraints on intra-class samples and high-rank constraints on inter-class samples, resulting in orthogonal subspaces for different classes. To enhance the model's adaptability and counterbalance the over-/under-fittings, a set of sample-wise indicators is employed to select a sparse subset from these base linear networks to form a composite semantic predictor for each sample. Notably, maximal margin geometry can guarantee the diversity of features and, meanwhile, local linearities guarantee efficiency. Thus, our ParsNets can generalize better to unseen classes and can be deployed flexibly on resource-constrained devices.

JBHI Journal 2024 Journal Article

Quaternion Cross-Modality Spatial Learning for Multi-Modal Medical Image Segmentation

Junyang Chen
Guoheng Huang
Xiaochen Yuan
Guo Zhong
Zewen Zheng
Chi-Man Pun
Jian Zhu
Zhixin Huang

Recently, Deep Neural Networks (DNNs) have had a large impact on image processing, including medical image segmentation, and the real-valued convolution of DNNs has been extensively utilized in multi-modal medical image segmentation to accurately segment lesions via learning data information. However, the weighted summation operation in such convolution limits the ability to maintain spatial dependence that is crucial for identifying different lesion distributions. In this paper, we propose a novel Quaternion Cross-modality Spatial Learning (Q-CSL) which explores the spatial information while considering the linkage between multi-modal images. Specifically, we introduce quaternions to represent data and coordinates that contain spatial information. Additionally, we propose Quaternion Spatial-association Convolution to learn the spatial information. Subsequently, the proposed De-level Quaternion Cross-modality Fusion (De-QCF) module excavates inner space features and fuses cross-modality spatial dependency. Our experimental results demonstrate that our approach performs well compared to competitive methods, with only 0.01061 M parameters and 9.95G FLOPs.
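Quaternion-valued layers generally replace real multiplication with the Hamilton product, which mixes all four components of each operand instead of summing them independently. The sketch below implements that product on plain tuples. How Q-CSL packs image data and coordinates into quaternion components is not stated in the abstract, so the example values are purely hypothetical.

```python
def hamilton_product(p, q):
    """Hamilton product of quaternions given as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

# Hypothetical packing: one four-component feature and one quaternion weight.
feature = (0.2, 0.5, 0.1, 0.7)
weight = (1.0, 0.0, 1.0, 0.0)
print(hamilton_product(weight, feature))
```

The product is not commutative, so the order of weight and feature matters; this is one way such layers encode relationships between components that a real-valued weighted sum does not.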

AAAI Conference 2024 Conference Paper

ROG_PL: Robust Open-Set Graph Learning via Region-Based Prototype Learning

Qin Zhang
Xiaowei Li
Jiexin Lu
Liping Qiu
Shirui Pan
Xiaojun Chen
Junyang Chen

Open-set graph learning is a practical task that aims to classify the known class nodes and to identify unknown class samples as unknowns. Conventional node classification methods usually perform unsatisfactorily in open-set scenarios due to the complex data they encounter, such as out-of-distribution (OOD) data and in-distribution (IND) noise. OOD data are samples that do not belong to any known classes. They are outliers if they occur in training (OOD noise), and open-set samples if they occur in testing. IND noise consists of training samples that are assigned incorrect labels. IND noise and OOD noise are prevalent and usually cause the ambiguity problem, including the intra-class variety problem and the inter-class confusion problem. Thus, exploring robust open-set learning methods is necessary and difficult, and it becomes even more difficult for non-IID graph data. To this end, we propose a unified framework named ROG_PL to achieve robust open-set learning on complex noisy graph data, by introducing prototype learning. Specifically, ROG_PL consists of two modules, i.e., denoising via label propagation and open-set prototype learning via regions. The first module corrects noisy labels through similarity-based label propagation and removes low-confidence samples, to solve the intra-class variety problem caused by noise. The second module learns open-set prototypes for each known class via non-overlapped regions and retains both interior and border prototypes to remedy the inter-class confusion problem. The two modules are iteratively updated under the constraints of classification loss and prototype diversity loss. To the best of our knowledge, the proposed ROG_PL is the first robust open-set node classification method for graph data with complex noise. Experimental evaluations of ROG_PL on several benchmark graph datasets demonstrate that it has good performance.

AAAI Conference 2024 Conference Paper

Sharpness-Aware Model-Agnostic Long-Tailed Domain Generalization

Houcheng Su
Weihao Luo
Daixian Liu
Mengzhu Wang
Jing Tang
Junyang Chen
Cong Wang
Zhenghan Chen

Domain Generalization (DG) aims to improve the generalization ability of models trained on a specific group of source domains, enabling them to perform well on new, unseen target domains. Recent studies have shown that methods that converge to smooth optima can enhance the generalization performance of supervised learning tasks such as classification. In this study, we examine the impact of smoothness-enhancing formulations on domain adversarial training, which combines task loss and adversarial loss objectives. Our approach leverages the fact that converging to a smooth minimum with respect to task loss can stabilize the task loss and lead to better performance on unseen domains. Furthermore, we recognize that the distribution of objects in the real world often follows a long-tailed class distribution, resulting in a mismatch between machine learning models and our expectations of their performance on all classes of datasets with long-tailed class distributions. To address this issue, we consider the domain generalization problem from the perspective of the long-tail distribution and propose using the maximum square loss to balance different classes which can improve model generalizability. Our method's effectiveness is demonstrated through comparisons with state-of-the-art methods on various domain generalization datasets. Code: https://github.com/bamboosir920/SAMALTDG.
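The maximum square loss mentioned above is commonly defined as the negative half of the mean squared class probability. The sketch below follows that common definition as an assumption; the abstract does not give the exact form or weighting used in this paper, and the function name is illustrative.

```python
def maximum_square_loss(probabilities):
    """Negative half of the summed squared probabilities, averaged over samples.

    probabilities: list of per-sample class-probability lists (each sums to 1).
    """
    n = len(probabilities)
    return -sum(p * p for row in probabilities for p in row) / (2 * n)

# Hypothetical predictions: one confident sample and one uncertain sample.
print(maximum_square_loss([[1.0, 0.0]]))
print(maximum_square_loss([[0.5, 0.5]]))
```

Under this definition the gradient with respect to each probability is linear in that probability, which is usually cited as the reason the loss is less dominated by already-confident classes than entropy minimization.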

AAAI Conference 2024 Conference Paper

Sparse Enhanced Network: An Adversarial Generation Method for Robust Augmentation in Sequential Recommendation

Junyang Chen
Guoxuan Zou
Pan Zhou
Wu Yirui
Zhenghan Chen
Houcheng Su
Huan Wang
Zhiguo Gong

Sequential Recommendation plays a significant role in daily recommendation systems, such as e-commerce platforms like Amazon and Taobao. However, even with the advent of large models, these platforms often face sparsity issues in the historical browsing records of individual users due to new users joining or the introduction of new products. As a result, existing sequence recommendation algorithms may not perform well. To address this, sequence-based data augmentation methods have garnered attention. Existing sequence enhancement methods typically rely on augmenting existing data, employing techniques like cropping, masking prediction, random reordering, and random replacement of the original sequence. While these methods have shown improvements, they often overlook the exploration of the deep embedding space of the sequence. To tackle these challenges, we propose a Sparse Enhanced Network (SparseEnNet), which is a robust adversarial generation method. SparseEnNet aims to fully explore the hidden space in sequence recommendation, generating more robust enhanced items. Additionally, we adopt an adversarial generation method, allowing the model to differentiate between data augmentation categories and achieve better prediction performance for the next item in the sequence. Experiments have demonstrated that our method achieves a remarkable 4-14% improvement over existing methods when evaluated on real-world datasets. (https://github.com/junyachen/SparseEnNet)
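The conventional sequence augmentations that this abstract contrasts itself with (cropping, masking, and random reordering) can be sketched in a few lines. These are generic baseline operations, not SparseEnNet; the function names, the ratio parameters, and the mask token are assumptions for illustration.

```python
import random

def crop(seq, ratio, rng):
    """Keep one contiguous sub-sequence covering about `ratio` of the items."""
    length = max(1, int(len(seq) * ratio))
    start = rng.randint(0, len(seq) - length)
    return seq[start:start + length]

def mask(seq, ratio, rng, mask_token="[MASK]"):
    """Replace about `ratio` of the items with a placeholder token."""
    k = int(len(seq) * ratio)
    idx = set(rng.sample(range(len(seq)), k))
    return [mask_token if i in idx else item for i, item in enumerate(seq)]

def reorder(seq, ratio, rng):
    """Shuffle one contiguous window covering about `ratio` of the items."""
    length = max(1, int(len(seq) * ratio))
    start = rng.randint(0, len(seq) - length)
    window = seq[start:start + length]
    rng.shuffle(window)
    return seq[:start] + window + seq[start + length:]

rng = random.Random(0)            # fixed seed for repeatable output
history = list(range(10))         # hypothetical item IDs in browsing order
print(crop(history, 0.5, rng))
print(mask(history, 0.3, rng))
print(reorder(history, 0.4, rng))
```

All three operate only on the observed item sequence, which is the limitation the abstract points to when it argues for exploring the embedding space instead.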

NeurIPS Conference 2023 Conference Paper

PromptRestorer: A Prompting Image Restoration Method with Degradation Perception

Cong Wang
Jinshan Pan
Wei Wang
Jiangxin Dong
Mengzhu Wang
Yakun Ju
Junyang Chen

We show that raw degradation features can effectively guide deep restoration models, providing accurate degradation priors to facilitate better restoration. Networks that do not consider these features gradually forget the degradation during the learning process, which severely hinders model capacity. To address this, we propose a Prompting image Restorer, termed PromptRestorer. Specifically, PromptRestorer contains two branches: a restoration branch and a prompting branch. The former is used to restore images, while the latter perceives degradation priors to prompt the restoration branch with reliable perceived content to guide the restoration process for better recovery. To better perceive the degradation which is extracted by a pre-trained model from given degradation observations, we propose a prompting degradation perception modulator, which adequately considers the characteristics of the self-attention mechanism and pixel-wise modulation, to better perceive the degradation priors from global and local perspectives. To control the propagation of the perceived content for the restoration branch, we propose gated degradation perception propagation, enabling the restoration branch to adaptively learn more useful features for better recovery. Extensive experimental results show that our PromptRestorer achieves state-of-the-art results on 4 image restoration tasks, including image deraining, deblurring, dehazing, and desnowing.