Arrow Research

Author name cluster

Shenghao Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers
2 author rows

Possible papers (2)

AAAI 2026 · Conference Paper

Amplifying Discrepancies: Exploiting Macro and Micro Inconsistencies for Image Manipulation Localization

  • Shenghao Chen
  • Yibo Zhao
  • Tianyi Wang
  • Chunjie Ma
  • Weili Guan
  • Ming Li
  • Zan Gao

The rapid development of image manipulation technologies poses significant challenges to multimedia forensics, especially in the accurate localization of manipulated regions. Existing methods often fail to fully explore the intrinsic discrepancies between manipulated and authentic regions, resulting in sub-optimal performance. To address this limitation, we propose the Focus Region Discrepancy Network (FRD-Net), a novel and efficient framework that enhances manipulation localization by amplifying discrepancies at both the macro and micro levels. Specifically, our Iterative Clustering Module (ICM) groups features into two discriminative clusters and refines representations via backward propagation from cluster centers, improving the distinction between tampered and authentic regions at the macro level. Our Differential Progressive Module (DPM) then captures fine-grained structural inconsistencies within local neighborhoods and integrates them into a Central Difference Convolution, increasing sensitivity to subtle manipulation details at the micro level. Finally, these complementary modules are integrated into a compact architecture that achieves a favorable balance between accuracy and efficiency. Extensive experiments on multiple benchmarks demonstrate that FRD-Net consistently surpasses state-of-the-art methods in manipulation localization while maintaining a lower computational cost.
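The micro-level branch described in the abstract builds on Central Difference Convolution, a published operator from prior work. The paper's own DPM is not reproduced here, but as a reference point, a minimal PyTorch sketch of the standard CDC operator might look like the following; the class name, the default `theta`, and the layer sizes are illustrative assumptions, not values taken from the paper.

```python
import torch.nn as nn
import torch.nn.functional as F

class CentralDifferenceConv2d(nn.Module):
    """Standard Central Difference Convolution (CDC): a weighted mix of a
    vanilla convolution and a central-difference term that emphasizes
    local gradient cues. theta = 0 recovers plain convolution."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1, theta=0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              padding=padding, bias=False)
        self.theta = theta

    def forward(self, x):
        out_vanilla = self.conv(x)
        # The central-difference term equals the kernel's aggregate weight
        # applied to the center pixel, so it reduces to a 1x1 convolution
        # with the spatially summed kernel weights.
        kernel_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        out_center = F.conv2d(x, kernel_sum)
        return out_vanilla - self.theta * out_center
```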

ICRA 2021 · Conference Paper

Two-stream 2D/3D Residual Networks for Learning Robot Manipulations from Human Demonstration Videos

  • Xin Xu
  • Kun Qian 0005
  • Bo Zhou 0017
  • Shenghao Chen
  • Yitong Li

Learning manipulation skills by observing human demonstration videos is a promising direction for intelligent robotic systems. Recent advances in video-to-command translation provide an end-to-end approach for translating a video into robot plans. However, general video captioning methods focus on understanding the full frame and do not adequately consider the spatio-temporal features in videos. In this paper, we propose two-stream 2D/3D residual networks for robots to learn manipulation tasks from human demonstration videos. We extract spatial features with a 2D residual network and temporal features with a 3D residual network, and feed both into RNN layers. An encoder-decoder architecture then encodes the spatio-temporal features and sequentially generates the command words. Experimental results on an extended manipulation dataset show that our approach outperforms state-of-the-art methods. Real-world experiments on a Baxter robotic arm indicate that our method produces more accurate commands from video demonstrations.
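As a rough illustration of the two-stream encoder the abstract describes, here is a minimal PyTorch/torchvision sketch that pairs a 2D ResNet on center frames with a 3D ResNet on clips and feeds the fused features to a GRU. The backbone choices (resnet18, r3d_18), fusion by concatenation, and hidden size are assumptions for illustration, not the authors' exact configuration, and the command-word decoder is omitted.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18
from torchvision.models.video import r3d_18

class TwoStreamEncoder(nn.Module):
    """Illustrative two-stream encoder: a 2D ResNet extracts spatial
    features from each clip's center frame, a 3D ResNet extracts temporal
    features from the full clip, and the concatenated features are
    encoded by a GRU (a decoder RNN would consume its final state)."""
    def __init__(self, hidden=512):
        super().__init__()
        spatial = resnet18(weights=None)
        self.spatial = nn.Sequential(*list(spatial.children())[:-1])    # -> (B, 512, 1, 1)
        temporal = r3d_18(weights=None)
        self.temporal = nn.Sequential(*list(temporal.children())[:-1])  # -> (B, 512, 1, 1, 1)
        self.rnn = nn.GRU(512 + 512, hidden, batch_first=True)

    def forward(self, clips):
        # clips: (B, T, C, D, H, W) -- T clips of D frames each
        B, T = clips.shape[:2]
        feats = []
        for t in range(T):
            clip = clips[:, t]                    # (B, C, D, H, W)
            mid = clip[:, :, clip.shape[2] // 2]  # center frame: (B, C, H, W)
            f2d = self.spatial(mid).flatten(1)    # spatial stream: (B, 512)
            f3d = self.temporal(clip).flatten(1)  # temporal stream: (B, 512)
            feats.append(torch.cat([f2d, f3d], dim=1))
        seq = torch.stack(feats, dim=1)           # (B, T, 1024)
        out, h = self.rnn(seq)
        return out, h
```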