Author name cluster

Cong Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers

2 author rows

EAAI Journal 2026 Journal Article

Multi-view simulation for robust polyp segmentation via cross-gated decoding and soft-attention fusion

Linbo Wang
Cong Chen
Jinxian Qiu
Zhengyi Liu
Xianyong Fang
Shaohua Wan

Polyp segmentation is crucial for early colorectal cancer detection but remains challenging due to significant shape variations and ambiguous boundaries. Existing methods often rely on single-view analysis, overlooking the potential of multi-view representations to provide complementary segmentation cues. To address this, we propose a novel multi-view simulation-based polyp segmentation network (MVSNet) that generates diverse views of an input image through directional flipping and extracts robust features using a shared Pyramid Vision Transformer (PVT). Two additional tactics are proposed to effectively utilize the rich features from each view and from the whole view set, respectively: (1) a cross-gating-based view-aware multi-stage decoding method, which applies element-wise cross gating to both the coarse and fine features in each stage and thus boosts the multi-stage decoded features with high discrimination specific to each individual view for its initial segmentation mask; and (2) a soft-attention-based cross-view prediction method, which applies soft attention among different views to adaptively weight the contributions from each view for the final prediction. Extensive experiments on five benchmark datasets (Kvasir-SEG, ClinicDB, ColonDB, ETIS, and Endoscene) validate the effectiveness of our approach, achieving the highest mean Dice scores of 0.923, 0.946, 0.825, 0.821, and 0.902, respectively, demonstrating consistent superiority over existing state-of-the-art competitors. Code is available at https://github.com/linbowang/MVSNet.
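
The multi-view idea in this abstract can be illustrated with a minimal sketch. It is not the MVSNet implementation: it assumes three views (identity, horizontal flip, vertical flip), a hypothetical `predict` callable standing in for the shared network, and a hypothetical `score` callable that yields one scalar per view, whereas the paper's soft attention is learned. Per-view predictions are flipped back into the original frame and blended with softmax weights.

```python
import math

def identity(mask):
    """Return the 2D list unchanged (the original view)."""
    return mask

def hflip(mask):
    """Flip a 2D list left-to-right."""
    return [row[::-1] for row in mask]

def vflip(mask):
    """Flip a 2D list top-to-bottom."""
    return mask[::-1]

# Assumption: three views. Each flip is its own inverse, so the same
# function maps a view's prediction back to the original frame.
VIEW_FLIPS = [identity, hflip, vflip]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_views(image, predict, score):
    """Blend per-view predictions with softmax weights.

    predict: hypothetical callable, 2D list -> 2D list of probabilities.
    score:   hypothetical callable, 2D list -> float (one scalar per view).
    """
    aligned, scores = [], []
    for flip in VIEW_FLIPS:
        prediction = predict(flip(image))   # predict in the flipped frame
        scores.append(score(prediction))
        aligned.append(flip(prediction))    # undo the flip
    weights = softmax(scores)
    rows, cols = len(image), len(image[0])
    return [
        [sum(w * m[r][c] for w, m in zip(weights, aligned)) for c in range(cols)]
        for r in range(rows)
    ]
```

With equal scores the weights are uniform, so the result is a plain average of the re-aligned predictions.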

EAAI Journal 2025 Journal Article

An efficient multi-task forest fire and smoke detection model

Cong Chen
Yunfei Liu
Chenyu Zhang
Junhui Li
Xingliang Chen

The complexity and dynamism of forest ecosystems pose significant challenges to early forest fire detection. Multi-task learning improves detection accuracy by employing a shared feature extraction network that enables tasks to interact and complement each other within a unified framework. In light of this, the paper presents a forest fire detection model employing multi-task learning, referred to as EMFFS-DET. Within the backbone network, a feature extraction module, C2f-KAN (Kolmogorov-Arnold Network), is incorporated to efficiently model complex nonlinear mappings with improved representation capacity. The Hard-Swish (H-Swish) function is employed as a substitute for the Sigmoid Linear Unit (SiLU) to improve performance in contexts involving complex or multi-target detection. A median-enhanced spatial and channel attention module (MECS) is introduced within the neck network, integrating median pooling with global average pooling and max pooling to enhance feature extraction in complex scenes. Furthermore, a content-aware feature upsampling module (CARAFE) is integrated to enhance the receptive field and optimize computational efficiency. In the head section, a segmentation head is added to assist detection by leveraging pixel-level semantics to refine object localization and boundary precision. The proposed model enhances the detection of small targets by introducing the Normalized Wasserstein Distance (NWD), which utilizes probability distributions to achieve precise localization. In comparison to You Only Look Once version 8 (YOLOv8), this approach enhances the Mean Average Precision (mAP) by 5.4% and the Average Precision for Small objects (APS) by 7.8%, thereby offering AI solutions for the early detection of forest fires.
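
For readers unfamiliar with the activation swap mentioned in this abstract, the sketch below writes out the standard scalar definitions of SiLU and Hard-Swish. It shows only the two functions, not how EMFFS-DET wires them into its network; the helper names are chosen for this example.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def silu(x):
    """SiLU: x * sigmoid(x)."""
    return x * sigmoid(x)

def relu6(x):
    """ReLU clipped to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)

def hard_swish(x):
    """Hard-Swish: x * ReLU6(x + 3) / 6, a piecewise approximation of SiLU."""
    return x * relu6(x + 3.0) / 6.0

if __name__ == "__main__":
    # Compare the two activations on a few sample inputs.
    for value in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(value, silu(value), hard_swish(value))
```

Hard-Swish avoids the exponential, is exactly zero for inputs at or below -3, and is the identity for inputs at or above 3.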

NeurIPS Conference 2025 Conference Paper

Latency NMS Attacks: Is It Real Life or Is It Just Fantasy?

Jean-Philippe Monteuuis
Cong Chen
Jonathan Petit

"Caught in a landslide, no escape from reality" summarizes the state of the research in AI offense: an attack might work on paper but not necessarily in practice. In the last 5 years, we have seen the rise of latency attacks against computer vision systems. Most of them targeted 2D object detection, especially its Non-Max-Suppression (NMS) block, via adversarial images. However, we uncovered that, when tested in realistic deployment settings, the NMS latency attacks, accepted to top conferences, have very limited negative effects. In this paper, we define an evaluation framework (EVADE) to assess the practicality of attacks, and apply it to state-of-the-art NMS latency attacks. Attacks were tested on different hardware platforms, and with different model formats and quantization. Results show that these attacks are not able to generate the claimed latency increase, nor transfer to other models (from the same family or not). Moreover, the latency increases remain within the latency requirements of downstream tasks in our evaluation, suggesting limited practical impact under these conditions. We also tested three defenses, which were successful in mitigating the NMS latency attacks. Therefore, in their current form, NMS latency attacks are just fantasy.

ICLR Conference 2025 Conference Paper

PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training

Cong Chen
Mingyu Liu
Chenchen Jing
Yizhou Zhou
Fengyun Rao
Hao Chen 0041
Bo Zhang 0046
Chunhua Shen

This paper aims to address the challenge of hallucinations in Multimodal Large Language Models (MLLMs), particularly for dense image captioning tasks. To tackle the challenge, we identify the current lack of a metric that finely measures caption quality at the concept level. We hereby introduce HalFscore, a novel metric that is built upon the language graph and designed to evaluate both the accuracy and completeness of dense captions at a granular level. Additionally, we identify the root cause of hallucination as the model's over-reliance on its language prior. To address this, we propose PerturboLLaVA, which reduces the model's reliance on the language prior by incorporating adversarially perturbed text during training. This method enhances the model's focus on visual inputs, effectively reducing hallucinations and producing accurate, image-grounded descriptions without incurring additional computational overhead. PerturboLLaVA significantly improves the fidelity of generated captions, outperforming existing approaches in handling multimodal hallucinations and achieving improved performance across general multimodal benchmarks.

EAAI Journal 2023 Journal Article

A K-Net-based hybrid semantic segmentation method for extracting lake water bodies

Cong Chen
Yuzhu Wang
Shuang Yang
Xiaohui Ji
Gongwen Wang

Lakes have a crucial impact on natural disaster prevention, resource recycling, maintenance of agricultural production and daily life. The traditional way of acquiring lake water body information lacks efficiency, is dangerous, and is not suitable for lake water body information acquisition and real-time monitoring. For this reason, the automated lake water body extraction method based on deep learning semantic segmentation model is gradually becoming a mainstream method. However, most of the semantic segmentation models used for lake extraction today express features through static semantics, while ignoring the extraction relationships of different convolutional kernels for these features. In order to better extract lake water bodies from remote sensing images, this paper proposes a hybrid semantic segmentation method based on K-Net, which achieves high accuracy extraction of lake water bodies by introducing dynamic semantic kernels to iteratively refine the feature information. The superiority of the K-Net-based hybrid model on a Google remote sensing image dataset of lakes is validated. The experimental results show that (1) the hybrid model is able to achieve accurate extraction of lake water bodies, with the UperNet + K-Net model using Swin-l performing the best among all six evaluation metrics, with mean intersection over union (mIoU) reaching 97.77%; and that (2) after incorporating the K-Net module, all tested models obtain a larger mIoU than before.

JBHI Journal 2023 Journal Article

CUSS-Net: A Cascaded Unsupervised-Based Strategy and Supervised Network for Biomedical Image Diagnosis and Segmentation

Xiaogen Zhou
Zhiqiang Li
Yuyang Xue
Shun Chen
Meijuan Zheng
Cong Chen
Yue Yu
Xingqing Nie

Biomedical image segmentation and classification are critical components in a computer-aided diagnosis system. However, various deep convolutional neural networks are trained on a single task, ignoring the potential contribution of mutually performing multiple tasks. In this paper, we propose a cascaded unsupervised-based strategy to boost the supervised CNN framework for automated white blood cell (WBC) and skin lesion segmentation and classification, called CUSS-Net. Our proposed CUSS-Net consists of an unsupervised-based strategy (US) module, an enhanced segmentation network named E-SegNet, and a mask-guided classification network called MG-ClsNet. On the one hand, the proposed US module produces coarse masks that provide a prior localization map for the proposed E-SegNet to enhance it in locating and segmenting a target object accurately. On the other hand, the enhanced coarse masks predicted by the proposed E-SegNet are then fed into the proposed MG-ClsNet for accurate classification. Moreover, a novel cascaded dense inception module is presented to capture more high-level information. Meanwhile, we adopt a hybrid loss by combining a dice loss and a cross-entropy loss to alleviate the imbalanced training problem. We evaluate our proposed CUSS-Net on three public medical image datasets. Experiments show that our proposed CUSS-Net outperforms representative state-of-the-art approaches.
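
The hybrid loss mentioned in this abstract combines a Dice term with a cross-entropy term. The sketch below is one common binary formulation over flattened per-pixel probabilities. The abstract does not state how the two terms are weighted, so the `dice_weight` parameter and its default of 0.5 are assumptions made for illustration.

```python
import math

def dice_loss(probs, targets, eps=1e-6):
    """Soft Dice loss: 1 - 2*|P.T| / (|P| + |T|), smoothed by eps."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)

def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(probs)

def hybrid_loss(probs, targets, dice_weight=0.5):
    """Weighted sum of the two terms; dice_weight is an assumed hyperparameter."""
    return (dice_weight * dice_loss(probs, targets)
            + (1.0 - dice_weight) * bce_loss(probs, targets))
```

The Dice term is normalized by the foreground size, so a small foreground region still contributes a meaningful signal, while the cross-entropy term supplies per-pixel gradients.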

AAAI Conference 2019 Conference Paper

Learning Diverse Bayesian Networks

Cong Chen
Changhe Yuan

Much effort has been directed at developing algorithms for learning optimal Bayesian network structures from data. When given limited or noisy data, however, the optimal Bayesian network often fails to capture the true underlying network structure. One can potentially address the problem by finding multiple most likely Bayesian networks (K-Best) in the hope that one of them recovers the true model. However, it is often the case that some of the best models come from the same peak(s) and are very similar to each other; so they tend to fail together. Moreover, many of these models are not even optimal with respect to any causal ordering, and are thus unlikely to be useful. This paper proposes a novel method for finding a set of diverse top Bayesian networks, called modes, such that each network is guaranteed to be optimal in a local neighborhood. Such mode networks are expected to provide a much better coverage of the true model. Based on a global-local theorem showing that a mode Bayesian network must be optimal in all local scopes, we introduce an A* search algorithm to efficiently find top M Bayesian networks which are highly probable and naturally diverse. Empirical evaluations show that our top mode models have much better diversity as well as accuracy in discovering true underlying models than those found by K-Best.

IJCAI Conference 2016 Conference Paper

Solving M-Modes Using Heuristic Search

Cong Chen
Changhe Yuan
Chao Chen

M-Modes for graphical models is the problem of finding top M label configurations of highest probability in their local neighborhoods. The state-of-the-art method for solving M-Modes is a dynamic programming algorithm which computes global modes by first computing local modes of each subgraph and then searching through all their consistent combinations. A drawback of the algorithm is that most of its time is wasted on computing local modes that are never used in global modes. This paper introduces new algorithms that directly search the space of consistent local modes in finding the global modes, which is enabled by a novel search operator designed to search a subgraph of variables at each time. As a result, the search algorithms only need to generate and verify a small number of local modes and can hence lead to significant improvement in efficiency and scalability.
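
To make the problem statement concrete, the sketch below enumerates modes by brute force on a tiny model. It is neither the dynamic programming baseline nor the heuristic search described in this abstract. It assumes that a neighborhood is a Hamming ball of a given radius and that a mode must score strictly higher than every other configuration in that ball. The `score` callable is a hypothetical stand-in for the model's (unnormalized) probability.

```python
from itertools import product

def hamming(a, b):
    """Number of positions at which two configurations differ."""
    return sum(x != y for x, y in zip(a, b))

def top_m_modes(score, num_vars, num_labels, m, radius=1):
    """Return up to m (score, configuration) pairs that are local maxima.

    Exhaustive enumeration: exponential in num_vars, for illustration only.
    """
    configs = list(product(range(num_labels), repeat=num_vars))
    modes = []
    for config in configs:
        value = score(config)
        neighbours = (c for c in configs if 0 < hamming(config, c) <= radius)
        # Assumption: strict dominance over every neighbour in the ball.
        if all(value > score(other) for other in neighbours):
            modes.append((value, config))
    modes.sort(reverse=True)
    return modes[:m]

if __name__ == "__main__":
    # Toy score over three binary variables with two separated peaks.
    table = {(0, 0, 0): 3.0, (1, 1, 1): 2.0}
    print(top_m_modes(lambda c: table.get(c, 0.0), num_vars=3, num_labels=2, m=2))
```

Widening the radius makes the neighborhood constraint stricter, so fewer configurations qualify as modes.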

AAMAS Conference 2003 Conference Paper

Formal Semantics and Communication Strategies for Proactive Information Delivery Among Team-based Agents

John Yen
Xiaocong Fan
Shuang Sun
Cong Chen
et al.