Author name cluster

Quan Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers

1 author row

EAAI Journal 2025 Journal Article

A lightweight citrus detection and counting method based on deep learning model

Jiqing Chen
Mingchang Zhang
Bin Lu
Quan Chen
Zhiwu Jiang
Peilin Li
Jingyao Gai

Details DOI

AAAI Conference 2025 Conference Paper

D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching

Jingyu Liu
Minquan Wang
Ye Ma
Bo Wang
Aozhu Chen
Quan Chen
Peng Jiang
Xirong Li

Videos showcasing specific products are increasingly important for E-commerce. Key moments naturally exist as the first appearance of a specific product, presentation of its distinctive features, the presence of a buying link, etc. Adding proper sound effects (SFX) to such moments, or video decoration with SFX (VDSFX), is crucial for enhancing user engaging experience. Previous work adds SFX to videos by video-to-SFX matching at a holistic level, lacking the ability of adding SFX to a specific moment. Meanwhile, previous studies on video highlight detection or video moment retrieval consider only moment localization, leaving moment to SFX matching untouched. By contrast, we propose in this paper D&M, a unified method that accomplishes key moment detection and moment-to-SFX matching simultaneously. Moreover, for the new VDSFX task we build a large-scale dataset SFX-Moment from an E-commerce video creation platform. For a fair comparison, we build competitive baselines by extending a number of current video moment detection methods to the new task. Extensive experiments on SFX-Moment show the superior performance of the proposed method over the baselines.

PDF Details DOI

AAAI Conference 2025 Conference Paper

LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application

Jian Jia
Yipei Wang
Yan Li
Honggang Chen
Xuehan Bai
Zhaocheng Liu
Jian Liang
Quan Chen

Contemporary recommendation systems predominantly rely on ID embedding to capture latent associations among users and items. However, this approach overlooks the wealth of semantic information embedded within textual descriptions of items, leading to suboptimal performance and poor generalizations. Leveraging the capability of large language models to comprehend and reason about textual content presents a promising avenue for advancing recommendation systems. To achieve this, we propose an Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework that synergizes open-world knowledge with collaborative knowledge. We address computational complexity concerns by utilizing pretrained LLMs as item encoders and freezing LLM parameters to avoid catastrophic forgetting and preserve open-world knowledge. To bridge the gap between the open-world and collaborative domains, we design a twin-tower structure supervised by the recommendation task and tailored for practical industrial application. Through experiments on the real large-scale industrial dataset and online A/B tests, we demonstrate the efficacy of our approach in industry application. We also achieve state-of-the-art performance on six Amazon Review datasets to verify the superiority of our method.

PDF Details DOI

EAAI Journal 2025 Journal Article

Research on efficient and fast extraction of vineyard navigation path based on key point detection

Quan Chen
Jiqing Chen
Zhiwu Jiang
Lixiang Huang
Jingyao Gai
Peilin Li
Mingchang Zhang

Details DOI

AAAI Conference 2024 Conference Paper

Quad Bayer Joint Demosaicing and Denoising Based on Dual Encoder Network with Joint Residual Learning

Bolun Zheng
Haoran Li
Quan Chen
Tingyu Wang
Xiaofei Zhou
Zhenghui Hu
Chenggang Yan

The recent imaging technology Quad Bayer CFA brings better imaging PSNR and higher visual quality compared to traditional Bayer CFA, but also serious challenges for demosaicing and denoising during the ISP pipeline. In this paper, we propose a novel dual encoder network, namely DRNet, to achieve joint demosaicing and denoising for Quad Bayer CFA. The dual encoders are carefully designed in that one is mainly constructed by a joint residual block to jointly estimate the residuals for demosaicing and denoising separately. In contrast, the other one is started with a pixel modulation block which is specially designed to match the characteristics of Quad Bayer pattern for better feature extraction. We demonstrate the effectiveness of each proposed component through detailed ablation investigations. The comparison results on public benchmarks illustrate that our DRNet achieves an apparent performance gain~(0.38dB to the 2nd best) from the state-of-the-art method and balances performance and efficiency well. The experiments on real-world images show that the proposed method could enhance the reconstruction quality from the native ISP algorithm.

PDF Details DOI

AAAI Conference 2024 Conference Paper

Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning

Kaibin Tian
Yanhua Cheng
Yi Liu
Xinglin Hou
Quan Chen
Han Li

In recent years, text-to-video retrieval methods based on CLIP have experienced rapid development. The primary direction of evolution is to exploit the much wider gamut of visual and textual cues to achieve alignment. Concretely, those methods with impressive performance often design a heavy fusion block for sentence (words)-video (frames) interaction, regardless of the prohibitive computation complexity. Nevertheless, these approaches are not optimal in terms of feature utilization and retrieval efficiency. To address this issue, we adopt multi-granularity visual feature learning, ensuring the model's comprehensiveness in capturing visual content features spanning from abstract to detailed levels during the training phase. To better leverage the multi-granularity features, we devise a two-stage retrieval architecture in the retrieval phase. This solution ingeniously balances the coarse and fine granularity of retrieval content. Moreover, it also strikes a harmonious equilibrium between retrieval effectiveness and efficiency. Specifically, in training phase, we design a parameter-free text-gated interaction block (TIB) for fine-grained video representation learning and embed an extra Pearson Constraint to optimize cross-modal representation learning. In retrieval phase, we use coarse-grained video representations for fast recall of top-k candidates, which are then reranked by fine-grained video representations. Extensive experiments on four benchmarks demonstrate the efficiency and effectiveness. Notably, our method achieves comparable performance with the current state-of-the-art methods while being nearly 50 times faster.

PDF Details DOI

YNIMG Journal 2024 Journal Article

Whole brain multiparametric mapping in two minutes using a dual-flip-angle stack-of-stars blipped multi-gradient-echo acquisition

Wenlong Feng
Zekang Ding
Quan Chen
Huajun She
Yiping P. Du

Details DOI

AAAI Conference 2023 Conference Paper

Improving Dynamic HDR Imaging with Fusion Transformer

Rufeng Chen
Bolun Zheng
Hua Zhang
Quan Chen
Chenggang Yan
Gregory Slabaugh
Shanxin Yuan

Reconstructing a High Dynamic Range (HDR) image from several Low Dynamic Range (LDR) images with different exposures is a challenging task, especially in the presence of camera and object motion. Though existing models using convolutional neural networks (CNNs) have made great progress, challenges still exist, e.g., ghosting artifacts. Transformers, originating from the field of natural language processing, have shown success in computer vision tasks, due to their ability to address a large receptive field even within a single layer. In this paper, we propose a transformer model for HDR imaging. Our pipeline includes three steps: alignment, fusion, and reconstruction. The key component is the HDR transformer module. Through experiments and ablation studies, we demonstrate that our model outperforms the state-of-the-art by large margins on several popular public datasets.

PDF Details DOI

AAAI Conference 2020 Conference Paper

Progressive Feature Polishing Network for Salient Object Detection

Bo Wang
Quan Chen
Min Zhou
Zhiqiang Zhang
Xiaogang Jin
Kun Gai

Feature matters for salient object detection. Existing methods mainly focus on designing a sophisticated structure to incorporate multi-level features and ﬁlter out cluttered features. We present Progressive Feature Polishing Network (PFPN), a simple yet effective framework to progressively polish the multi-level features to be more accurate and representative. By employing multiple Feature Polishing Modules (FPMs) in a recurrent manner, our approach is able to detect salient objects with ﬁne details without any post-processing. A FPM parallelly updates the features of each level by directly incorporating all higher level context information. Moreover, it can keep the dimensions and hierarchical structures of the feature maps, which makes it ﬂexible to be integrated with any CNN-based models. Empirical experiments show that our results are monotonically getting better with increasing number of FPMs. Without bells and whistles, PFPN outperforms the state-of-the-art methods signiﬁcantly on ﬁve benchmark datasets under various evaluation metrics. Our code is available at: https: //github. com/chenquan-cq/PFPN.

PDF Details