Arrow Research search

Author name cluster

Cong Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers
2 author rows

Possible papers

EAAI Journal 2026 Journal Article

Multi-view simulation for robust polyp segmentation via cross-gated decoding and soft-attention fusion

  • Linbo Wang
  • Cong Chen
  • Jinxian Qiu
  • Zhengyi Liu
  • Xianyong Fang
  • Shaohua Wan

Polyp segmentation is crucial for early colorectal cancer detection but remains challenging due to significant shape variations and ambiguous boundaries. Existing methods often rely on single-view analysis, overlooking the potential of multi-view representations to provide complementary segmentation cues. To address this, we propose a novel multi-view simulation-based polyp segmentation network (MVSNet) that generates diverse views of an input image through directional flipping and extracts robust features using a shared Pyramid Vision Transformer (PVT). Two additional tactics are proposed to effectively utilize the rich features from each view and from the whole view set, respectively: (1) a cross-gating-based view-aware multi-stage decoding method, which applies element-wise cross gating to both the coarse and fine features in each stage and thus boosts the multi-stage decoded features with high discrimination specific to each individual view for its initial segmentation mask; and (2) a soft-attention-based cross-view prediction method, which applies soft attention among different views to adaptively weight the contributions from each view for the final prediction. Extensive experiments on five benchmark datasets (Kvasir-SEG, ClinicDB, ColonDB, ETIS, and Endoscene) validate the effectiveness of our approach, achieving the highest mean Dice scores of 0.923, 0.946, 0.825, 0.821, and 0.902, respectively, demonstrating consistent superiority over existing state-of-the-art competitors. Code is available at https://github.com/linbowang/MVSNet.
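
The two ideas in this abstract, view simulation by flipping and soft-attention fusion across views, can be illustrated with a small pure-Python sketch. It is not MVSNet: `predict_mask` is a hypothetical stand-in for the segmentation network, and the per-pixel scores passed to `fuse_views` are supplied by hand here, whereas the paper describes weights that the network produces adaptively.

```python
import math

def hflip(img):
    """Flip a 2D grid (list of rows) left to right."""
    return [row[::-1] for row in img]

def vflip(img):
    """Flip a 2D grid top to bottom."""
    return [row[:] for row in img[::-1]]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_views(masks, scores):
    """Fuse aligned per-view masks with per-pixel softmax weights.

    masks and scores are lists of 2D grids, one grid per view.
    """
    height, width = len(masks[0]), len(masks[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            weights = softmax([s[y][x] for s in scores])
            fused[y][x] = sum(w * m[y][x] for w, m in zip(weights, masks))
    return fused

def predict_mask(img):
    """Hypothetical stand-in for a segmentation network: threshold at 0.5."""
    return [[1.0 if v > 0.5 else 0.0 for v in row] for row in img]

image = [[0.9, 0.2], [0.4, 0.8]]
# Simulate views, predict on each, then undo the flip so the masks align.
views = [(image, lambda m: m), (hflip(image), hflip), (vflip(image), vflip)]
aligned = [undo(predict_mask(view)) for view, undo in views]
# Uniform scores (an assumption) reduce the fusion to a plain average.
uniform = [[[0.0, 0.0], [0.0, 0.0]] for _ in aligned]
fused = fuse_views(aligned, uniform)
```

With non-uniform scores, views that score higher at a pixel contribute more to the fused value at that pixel.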

EAAI Journal 2025 Journal Article

An efficient multi-task forest fire and smoke detection model

  • Cong Chen
  • Yunfei Liu
  • Chenyu Zhang
  • Junhui Li
  • Xingliang Chen

The complexity and dynamism of forest ecosystems pose significant challenges to early forest fire detection. Multi-task learning improves detection accuracy by employing a shared feature extraction network that enables tasks to interact and complement each other within a unified framework. In light of this, the paper presents a forest fire detection model employing multi-task learning, referred to as EMFFS-DET. Within the backbone network, a feature extraction module, C2f-KAN (Kolmogorov-Arnold Network), is incorporated to efficiently model complex nonlinear mappings with improved representation capacity. The Hard-Swish (H-Swish) function is employed as a substitute for the Sigmoid Linear Unit (SiLU) to improve performance in contexts involving complex or multi-target detection. A median-enhanced spatial and channel attention module (MECS) is introduced within the neck network, integrating median pooling with global average pooling and max pooling to enhance feature extraction in complex scenes. Furthermore, a content-aware feature upsampling module (CARAFE) is integrated to enhance the receptive field and optimize computational efficiency. In the head section, a segmentation head is added to assist detection by leveraging pixel-level semantics to refine object localization and boundary precision. The proposed model enhances the detection of small targets by introducing the Normalized Wasserstein Distance (NWD), which utilizes probability distributions to achieve precise localization. In comparison to You Only Look Once version 8 (YOLOv8), this approach enhances the Mean Average Precision (mAP) by 5.4% and the Average Precision for Small objects (APS) by 7.8%, thereby offering AI solutions for the early detection of forest fires.
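
Two of the components named in this abstract have compact, commonly used definitions, sketched below in plain Python. The sketch assumes the usual formulations: Hard-Swish as `x * ReLU6(x + 3) / 6`, and NWD as an exponential of the 2-Wasserstein distance between boxes modeled as 2D Gaussians. The normalizing constant `c` is dataset-dependent, and the default used here is only a placeholder, not a value taken from the paper.

```python
import math

def silu(x):
    """Sigmoid Linear Unit: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def hard_swish(x):
    """Piecewise-linear approximation of SiLU: x * ReLU6(x + 3) / 6."""
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between two (cx, cy, w, h) boxes.

    Each box is treated as a 2D Gaussian centered at (cx, cy) with
    standard deviations (w / 2, h / 2). c is an assumed placeholder.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2.0) ** 2
        + ((ha - hb) / 2.0) ** 2
    )
    return math.exp(-math.sqrt(w2_squared) / c)

small = (10.0, 10.0, 4.0, 4.0)
near = (12.0, 10.0, 4.0, 4.0)
far = (30.0, 10.0, 4.0, 4.0)
similarity_near = nwd(small, near)
similarity_far = nwd(small, far)
```

Unlike IoU, this similarity stays positive and keeps decreasing smoothly once two small boxes no longer overlap, which is why it is often preferred for small targets.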

NeurIPS Conference 2025 Conference Paper

Latency NMS Attacks: Is It Real Life or Is It Just Fantasy?

  • Jean-Philippe Monteuuis
  • Cong Chen
  • Jonathan Petit

"Caught in a landslide, no escape from reality" summarizes the state of the research in AI offense: an attack might work on paper but does not necessarily in practice. In the last 5 years, we have seen the rise of latency attacks against computer vision systems. Most of them targeted 2D object detection, especially its Non-Max-Suppression (NMS) block, via adversarial images. However, we uncovered that, when tested in realistic deployment settings, the NMS latency attacks, accepted to top conferences, have very limited negative effects. In this paper, we define an evaluation framework (EVADE) to assess the practicality of attacks, and apply it to state-of-the-art NMS latency attacks. Attacks were tested on different hardware platforms, and different model formats and quantization. Results show that these attacks are not able to generate the claimed latency increase, nor transfer to other models (from the same family or not). Moreover, the latency increases remain within the latency requirements of downstream tasks in our evaluation, suggesting limited practical impact under these conditions. We also tested three defenses, which were successful in mitigating the NMS latency attacks. Therefore, in their current form, NMS latency attacks are just fantasy.
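
For readers unfamiliar with the NMS block mentioned above, here is a minimal sketch of greedy non-maximum suppression that also counts pairwise IoU comparisons. It is a generic textbook-style version, not the implementation evaluated in the paper. The comparison counter is an illustrative addition that shows how the work grows with the number of surviving candidate boxes, which is the quantity a latency attack would presumably try to inflate.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep boxes in score order unless they overlap a kept box.

    Returns the kept indices and the number of IoU comparisons performed.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    comparisons = 0
    for i in order:
        suppressed = False
        for j in keep:
            comparisons += 1
            if iou(boxes[i], boxes[j]) > iou_threshold:
                suppressed = True
                break
        if not suppressed:
            keep.append(i)
    return keep, comparisons

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept, _ = nms(boxes, scores)

# Disjoint boxes are the worst case: nothing is suppressed, so every
# pair of boxes is compared.
disjoint = [(i * 20, 0, i * 20 + 10, 10) for i in range(4)]
_, worst_case = nms(disjoint, [0.9, 0.8, 0.7, 0.6])
```

In this worst case the number of comparisons is n(n-1)/2 for n boxes, so many non-overlapping candidates cost more than the same number of heavily overlapping ones.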

ICLR Conference 2025 Conference Paper

PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training

  • Cong Chen
  • Mingyu Liu
  • Chenchen Jing
  • Yizhou Zhou
  • Fengyun Rao
  • Hao Chen 0041
  • Bo Zhang 0046
  • Chunhua Shen

This paper aims to address the challenge of hallucinations in Multimodal Large Language Models (MLLMs), particularly for dense image captioning tasks. To tackle the challenge, we identify the current lack of a metric that finely measures caption quality at the concept level. We hereby introduce HalFscore, a novel metric that is built upon the language graph and designed to evaluate both the accuracy and completeness of dense captions at a granular level. Additionally, we identify the root cause of hallucination as the model's over-reliance on its language prior. To address this, we propose PerturboLLaVA, which reduces the model's reliance on the language prior by incorporating adversarially perturbed text during training. This method enhances the model's focus on visual inputs, effectively reducing hallucinations and producing accurate, image-grounded descriptions without incurring additional computational overhead. PerturboLLaVA significantly improves the fidelity of generated captions, outperforming existing approaches in handling multimodal hallucinations and achieving improved performance across general multimodal benchmarks.

EAAI Journal 2023 Journal Article

A K-Net-based hybrid semantic segmentation method for extracting lake water bodies

  • Cong Chen
  • Yuzhu Wang
  • Shuang Yang
  • Xiaohui Ji
  • Gongwen Wang

Lakes have a crucial impact on natural disaster prevention, resource recycling, maintenance of agricultural production and daily life. The traditional way of acquiring lake water body information lacks efficiency, is dangerous, and is not suitable for lake water body information acquisition and real-time monitoring. For this reason, the automated lake water body extraction method based on deep learning semantic segmentation model is gradually becoming a mainstream method. However, most of the semantic segmentation models used for lake extraction today express features through static semantics, while ignoring the extraction relationships of different convolutional kernels for these features. In order to better extract lake water bodies from remote sensing images, this paper proposes a hybrid semantic segmentation method based on K-Net, which achieves high accuracy extraction of lake water bodies by introducing dynamic semantic kernels to iteratively refine the feature information. The superiority of the K-Net-based hybrid model on a Google remote sensing image dataset of lakes is validated. The experimental results show that (1) the hybrid model is able to achieve accurate extraction of lake water bodies, with the UperNet + K-Net model using Swin-l performing the best among all six evaluation metrics, with mean intersection over union (mIoU) reaching 97.77%; and that (2) after incorporating the K-Net module, all tested models obtain a larger mIoU than before.
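
The headline metric in this abstract, mean intersection over union (mIoU), can be computed as in the sketch below. It assumes flattened per-pixel class labels and skips classes that appear in neither the prediction nor the ground truth. The paper's exact evaluation protocol is not given in the abstract, so those details are assumptions.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes for two equal-length label sequences."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Toy example with two classes: 0 = background, 1 = water.
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
score = mean_iou(pred, target, num_classes=2)
```

In this toy case the background IoU is 1/2 and the water IoU is 2/3, so the mean is 7/12.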

JBHI Journal 2023 Journal Article

CUSS-Net: A Cascaded Unsupervised-Based Strategy and Supervised Network for Biomedical Image Diagnosis and Segmentation

  • Xiaogen Zhou
  • Zhiqiang Li
  • Yuyang Xue
  • Shun Chen
  • Meijuan Zheng
  • Cong Chen
  • Yue Yu
  • Xingqing Nie

Biomedical image segmentation and classification are critical components in a computer-aided diagnosis system. However, various deep convolutional neural networks are trained by a single task, ignoring the potential contribution of mutually performing multiple tasks. In this paper, we propose a cascaded unsupervised-based strategy to boost the supervised CNN framework for automated white blood cell (WBC) and skin lesion segmentation and classification, called CUSS-Net. Our proposed CUSS-Net consists of an unsupervised-based strategy (US) module, an enhanced segmentation network named E-SegNet, and a mask-guided classification network called MG-ClsNet. On the one hand, the proposed US module produces coarse masks that provide a prior localization map for the proposed E-SegNet to enhance it in locating and segmenting a target object accurately. On the other hand, the enhanced coarse masks predicted by the proposed E-SegNet are then fed into the proposed MG-ClsNet for accurate classification. Moreover, a novel cascaded dense inception module is presented to capture more high-level information. Meanwhile, we adopt a hybrid loss by combining a dice loss and a cross-entropy loss to alleviate the imbalance training problem. We evaluate our proposed CUSS-Net on three public medical image datasets. Experiments show that our proposed CUSS-Net outperforms representative state-of-the-art approaches.
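
The hybrid loss mentioned in this abstract combines a Dice term with a cross-entropy term. A minimal binary version is sketched below. The equal weighting (`dice_weight=0.5`) and the smoothing constants are assumptions for illustration, since the abstract does not state how the two terms are balanced.

```python
import math

def dice_loss(probs, targets, eps=1e-6):
    """1 - Dice coefficient for flattened foreground probabilities."""
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy, with probabilities clamped away from 0 and 1."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(probs)

def hybrid_loss(probs, targets, dice_weight=0.5):
    """Weighted sum of Dice and cross-entropy losses (weighting is assumed)."""
    dice = dice_loss(probs, targets)
    bce = bce_loss(probs, targets)
    return dice_weight * dice + (1.0 - dice_weight) * bce

targets = [1.0, 0.0, 1.0]
good = hybrid_loss([0.9, 0.1, 0.8], targets)
bad = hybrid_loss([0.1, 0.9, 0.2], targets)
```

The Dice term depends on region overlap rather than per-pixel counts, which is why it is commonly paired with cross-entropy when foreground pixels are rare.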

AAAI Conference 2019 Conference Paper

Learning Diverse Bayesian Networks

  • Cong Chen
  • Changhe Yuan

Much effort has been directed at developing algorithms for learning optimal Bayesian network structures from data. When given limited or noisy data, however, the optimal Bayesian network often fails to capture the true underlying network structure. One can potentially address the problem by finding multiple most likely Bayesian networks (K-Best) in the hope that one of them recovers the true model. However, it is often the case that some of the best models come from the same peak(s) and are very similar to each other; so they tend to fail together. Moreover, many of these models are not even optimal respective to any causal ordering, thus unlikely to be useful. This paper proposes a novel method for finding a set of diverse top Bayesian networks, called modes, such that each network is guaranteed to be optimal in a local neighborhood. Such mode networks are expected to provide a much better coverage of the true model. Based on a global-local theorem showing that a mode Bayesian network must be optimal in all local scopes, we introduce an A* search algorithm to efficiently find top M Bayesian networks which are highly probable and naturally diverse. Empirical evaluations show that our top mode models have much better diversity as well as accuracy in discovering true underlying models than those found by K-Best.

IJCAI Conference 2016 Conference Paper

Solving M-Modes Using Heuristic Search

  • Cong Chen
  • Changhe Yuan
  • Chao Chen

M-Modes for graphical models is the problem of finding the top M label configurations of highest probability in their local neighborhoods. The state-of-the-art method for solving M-Modes is a dynamic programming algorithm which computes global modes by first computing local modes of each subgraph and then searching through all their consistent combinations. A drawback of the algorithm is that most of its time is wasted on computing local modes that are never used in global modes. This paper introduces new algorithms that directly search the space of consistent local modes in finding the global modes, which is enabled by a novel search operator designed to search a subgraph of variables at each time. As a result, the search algorithms only need to generate and verify a small number of local modes and can hence lead to significant improvement in efficiency and scalability.
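
To make the M-Modes problem statement concrete, the sketch below finds modes by brute-force enumeration on a tiny model. It is neither the dynamic programming baseline nor the heuristic search proposed in the paper. It assumes that a "local neighborhood" means all configurations within a given Hamming distance and that a mode must score strictly higher than every neighbor. The scoring function `toy_score` is made up for illustration.

```python
from itertools import product

def hamming(a, b):
    """Number of positions at which two configurations differ."""
    return sum(x != y for x, y in zip(a, b))

def top_m_modes(score, num_vars, num_labels, m, radius=1):
    """Return the M highest-scoring configurations that beat all neighbors
    within the given Hamming radius (exhaustive search, tiny models only)."""
    configs = list(product(range(num_labels), repeat=num_vars))
    modes = [
        x for x in configs
        if all(score(x) > score(y)
               for y in configs if 0 < hamming(x, y) <= radius)
    ]
    modes.sort(key=score, reverse=True)
    return modes[:m]

def toy_score(x):
    """Hypothetical unnormalized score on a 3-variable chain: reward
    agreement between adjacent variables, with a small bias toward label 1."""
    return 2.0 * (x[0] == x[1]) + 2.0 * (x[1] == x[2]) + 0.5 * sum(x)

local_modes = top_m_modes(toy_score, num_vars=3, num_labels=2, m=2, radius=1)
global_only = top_m_modes(toy_score, num_vars=3, num_labels=2, m=2, radius=3)
```

With radius 1 the two modes are (1, 1, 1) and (0, 0, 0), and with radius 3 only the global maximum (1, 1, 1) remains. The enumeration is exponential in the number of variables, which is the cost that search-based methods aim to avoid.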