Arrow Research search

Author name cluster

Pan Gao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
1 author row

Possible papers

10

AAAI Conference 2026 Conference Paper

CloudMamba: Grouped Selective State Spaces for Point Cloud Analysis

  • Kanglin Qu
  • Pan Gao
  • Qun Dai
  • Zhanzhi Ye
  • Rui Ye
  • Yuanhao Sun

Due to its long-range modeling ability and linear complexity, Mamba has attracted considerable attention in point cloud analysis. Despite some interesting progress, related work still suffers from imperfect point cloud serialization, insufficient high-level geometric perception, and overfitting of the selective state space model (S6) at the core of Mamba. To this end, we propose an SSM-based point cloud network, termed CloudMamba, to address the above challenges. Specifically, we propose sequence expanding and sequence merging, where the former serializes points along each axis separately and the latter fuses the corresponding higher-order features causally inferred from the different sequences, enabling unordered point sets to adapt more stably to the causal nature of Mamba without additional parameters. Meanwhile, we design chainedMamba, which chains the forward and backward processes in parallel bidirectional Mamba to capture high-level geometric information during scanning. In addition, we propose a grouped selective state space model (GS6) via parameter sharing on S6, alleviating the overfitting caused by the computational mode of S6. Experiments on various point cloud tasks validate CloudMamba's ability to achieve state-of-the-art results with significantly less complexity.
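The serialization idea in the abstract can be sketched in a few lines: sort the unordered points along each axis independently ("sequence expanding"), process each resulting sequence, then scatter the per-sequence features back to the original point order and fuse them ("sequence merging"). The function names and the simple mean fusion below are placeholders for illustration, not the paper's actual design:

```python
import numpy as np

def sequence_expand(points):
    """Serialize an unordered point set along each axis separately.
    Returns, for each of the 3 axes, the indices that sort the points
    by that coordinate (illustrative stand-in for sequence expanding)."""
    return [np.argsort(points[:, axis], kind="stable") for axis in range(3)]

def sequence_merge(per_axis_features, orders):
    """Scatter per-sequence features back to the original point order
    and fuse them (here a simple mean, as a placeholder for the
    paper's causal fusion)."""
    n, c = per_axis_features[0].shape
    fused = np.zeros((n, c))
    for feats, order in zip(per_axis_features, orders):
        restored = np.empty_like(feats)
        restored[order] = feats          # undo the axis-wise sort
        fused += restored
    return fused / len(orders)

# Toy usage: 5 random points, with the coordinates themselves as "features".
rng = np.random.default_rng(0)
pts = rng.random((5, 3))
orders = sequence_expand(pts)
feats = [pts[o] for o in orders]         # features in each serialized order
out = sequence_merge(feats, orders)
assert np.allclose(out, pts)             # merging exactly undoes the serialization
```

The point of the toy check is only that expanding and merging are mutually consistent: features computed in any axis order can be mapped back to the original point indexing before fusion.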

EAAI Journal 2026 Journal Article

Physics-guided adaptive confidence network for real-time underwater image restoration

  • Pan Gao
  • Qiang Qu
  • Dan Xiang
  • Jing Ling
  • Naiyao Liang

Underwater image restoration remains a persistent challenge due to the spatially heterogeneous nature of light attenuation and scattering. While physics-based methods offer interpretability, they rely on rigid assumptions that often fail in complex turbid regions. Conversely, deep learning approaches offer flexibility but lack the structural constraints necessary for consistent generalization. To resolve this conflict, we propose the Physics-Guided Attention Confidence Network (PGAC-Net), a lightweight framework that unifies physical modeling with data-driven refinement through a novel reliability-aware fusion mechanism. An efficient shared encoder extracts multi-scale features to estimate transmission maps, spatially varying background light, and a pixel-wise confidence map. This confidence map dynamically arbitrates between a physics-based inversion branch and a residual refinement branch, which is specifically designed to correct color casts and restore fine texture details lost in the physical model. Extensive experiments on benchmark datasets demonstrate state-of-the-art performance, achieving a 0.953 structural similarity index (SSIM) and a 26.686 peak signal-to-noise ratio (PSNR) on the Underwater Image Enhancement Benchmark (UIEB) while maintaining high efficiency with 0.34 million (M) parameters and 10 ms inference time. The method also exhibits improved color fidelity and structural consistency. The proposed framework is well suited for real-time deployment on resource-constrained autonomous underwater vehicles. Code: https://github.com/pan-gao0904/PGAC-Net.
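A minimal sketch of the two ingredients the abstract names, assuming the standard underwater image formation model I = J·t + B·(1−t) (the paper's exact model and branch designs may differ): a physics-based inversion from estimated transmission and background light, and a pixel-wise confidence map that arbitrates between the physics output and a refinement output.

```python
import numpy as np

def physics_inversion(image, transmission, background_light, eps=0.1):
    """Invert the assumed formation model I = J*t + B*(1-t) to recover
    scene radiance J; transmission is clipped to avoid amplifying noise."""
    t = np.clip(transmission, eps, 1.0)
    return (image - background_light * (1.0 - t)) / t

def confidence_fusion(physics_out, refined_out, confidence):
    """Pixel-wise arbitration between the physics branch and the
    refinement branch via a confidence map in [0, 1]."""
    c = np.clip(confidence, 0.0, 1.0)[..., None]
    return c * physics_out + (1.0 - c) * refined_out

# Toy usage on a 4x4 RGB image degraded with the same model.
rng = np.random.default_rng(1)
J = rng.random((4, 4, 3))                # clean scene radiance
t = np.full((4, 4, 1), 0.6)              # uniform transmission
B = np.array([0.1, 0.3, 0.5])            # spatially constant background light
I = J * t + B * (1.0 - t)                # degraded observation
restored = physics_inversion(I, t, B)
assert np.allclose(restored, J)          # exact inversion when the model holds
fused = confidence_fusion(restored, I, np.ones((4, 4)))
assert np.allclose(fused, restored)      # full confidence -> pure physics branch
```

In the degraded regions where the physical model breaks down, a low confidence value would instead weight the data-driven refinement branch.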

AAAI Conference 2026 Conference Paper

Simba: Towards High-Fidelity and Geometrically-Consistent Point Cloud Completion via Transformation Diffusion

  • Lirui Zhang
  • Zhengkai Zhao
  • Zhi Zuo
  • Pan Gao
  • Jie Qin

Point cloud completion is a fundamental task in 3D vision. A persistent challenge in this field is simultaneously preserving fine-grained details present in the input while ensuring the global structural integrity of the completed shape. While recent works leveraging local symmetry transformations via direct regression have significantly improved the preservation of geometric structure details, these methods suffer from two major limitations: (1) These regression-based methods are prone to overfitting, tending to memorize instance-specific transformations instead of learning a generalizable geometric prior. (2) Their reliance on point-wise transformation regression leads to high sensitivity to input noise, severely degrading their robustness and generalization. To address these challenges, we introduce Simba, a novel framework that reformulates point-wise transformation regression as a distribution learning problem. Our approach integrates symmetry priors with the powerful generative capabilities of diffusion models, avoiding instance-specific memorization while capturing robust geometric structures. Additionally, we introduce a hierarchical Mamba-based architecture to achieve high-fidelity upsampling. Extensive experiments across the PCN, ShapeNet, and KITTI benchmarks validate our method's state-of-the-art (SOTA) performance.

YNIMG Journal 2025 Journal Article

RegBoost: Enhancing mouse brain image registration using geometric priors and Laplacian interpolation

  • Atchuth Naveen Chilaparasetti
  • Andy Thai
  • Pan Gao
  • Xiangmin Xu
  • M. Gopi

We show in this work that incorporating geometric features and geometry processing algorithms into mouse brain image registration broadens the applicability of registration algorithms and improves the registration accuracy of existing methods. We introduce the preprocessing and postprocessing steps of our proposed framework, RegBoost. We develop a method to align the axes of 3D image stacks by detecting the central planes that pass symmetrically through the image volumes. We then find geometric contours by defining external and internal structures to facilitate image correspondences. We establish Dirichlet boundary conditions at these correspondences and find the displacement map throughout the volume using Laplacian interpolation. We discuss the challenges in our standalone framework and demonstrate how our new approaches can improve the results of existing image registration methods. We expect our new approach and algorithms will have critical applications in brain mapping projects.
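The Laplacian interpolation step the abstract describes can be sketched on a small 2D grid (the paper works on 3D volumes): displacements known at correspondence pixels act as Dirichlet boundary conditions, and all other values are relaxed toward the average of their neighbors until the field is smooth. This Jacobi-style relaxation is a simplified illustration, not the paper's solver:

```python
import numpy as np

def laplacian_interpolate(shape, known, n_iter=2000):
    """Fill a scalar displacement field over a 2D grid by Laplacian
    interpolation. `known` maps (row, col) -> fixed displacement value
    (Dirichlet condition); interior pixels repeatedly take the mean of
    their 4 neighbors (simplified 2D Jacobi relaxation)."""
    field = np.zeros(shape)
    mask = np.zeros(shape, dtype=bool)
    for (r, c), v in known.items():
        field[r, c] = v
        mask[r, c] = True
    for _ in range(n_iter):
        avg = 0.25 * (np.roll(field, 1, 0) + np.roll(field, -1, 0)
                      + np.roll(field, 1, 1) + np.roll(field, -1, 1))
        field = np.where(mask, field, avg)   # keep boundary values fixed
    return field

# Toy usage: pin two corners; the interior interpolates smoothly between them.
f = laplacian_interpolate((8, 8), {(0, 0): 0.0, (7, 7): 1.0})
assert f[0, 0] == 0.0 and f[7, 7] == 1.0
assert 0.0 <= f[3, 3] <= 1.0                 # maximum principle: no overshoot
```

In the actual pipeline, one such harmonic field per displacement component would give a dense, smooth deformation map constrained by the detected correspondences.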

NeurIPS Conference 2024 Conference Paper

Magnet: We Never Know How Text-to-Image Diffusion Models Work, Until We Learn How Vision-Language Models Function

  • Chenyi Zhuang
  • Ying Hu
  • Pan Gao

Text-to-image diffusion models, particularly Stable Diffusion, have revolutionized the field of computer vision. However, the synthesis quality often deteriorates when asked to generate images that faithfully represent complex prompts involving multiple attributes and objects. While previous studies suggest that blended text embeddings lead to improper attribute binding, few have explored this in depth. In this work, we critically examine the limitations of the CLIP text encoder in understanding attributes and investigate how this affects diffusion models. We discern a phenomenon of attribute bias in the text space and highlight a contextual issue in padding embeddings that entangle different concepts. We propose Magnet, a novel training-free approach to tackle the attribute binding problem. We introduce positive and negative binding vectors to enhance disentanglement, together with a neighbor strategy to increase accuracy. Extensive experiments show that Magnet significantly improves synthesis quality and binding accuracy with negligible computational cost, enabling the generation of unconventional and unnatural concepts.
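Since Magnet is training-free and operates on text embeddings, the core edit can be caricatured as shifting an object token's embedding toward its intended attribute and away from competing ones. The formula and names below are an illustrative guess at the shape of such an edit, not the paper's exact method:

```python
import numpy as np

def apply_binding_vectors(obj_embed, pos_vec, neg_vec, alpha=1.0, beta=1.0):
    """Shift an object token embedding toward its intended attribute
    (positive binding vector) and away from a competing attribute
    (negative binding vector). Hypothetical illustration of a
    training-free embedding edit."""
    return obj_embed + alpha * pos_vec - beta * neg_vec

# Toy usage in a 4-d embedding space.
e_obj = np.array([1.0, 0.0, 0.0, 0.0])   # object token embedding
v_pos = np.array([0.0, 0.5, 0.0, 0.0])   # direction for the wanted attribute
v_neg = np.array([0.0, 0.0, 0.5, 0.0])   # direction for a competing attribute
e_new = apply_binding_vectors(e_obj, v_pos, v_neg)
assert np.allclose(e_new, [1.0, 0.5, -0.5, 0.0])
```

The edited embedding then replaces the original token embedding before it is fed to the diffusion model's cross-attention, which is what makes the approach training-free.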

IJCAI Conference 2024 Conference Paper

Pointsoup: High-Performance and Extremely Low-Decoding-Latency Learned Geometry Codec for Large-Scale Point Cloud Scenes

  • Kang You
  • Kai Liu
  • Li Yu
  • Pan Gao
  • Dandan Ding

Despite considerable progress being achieved in point cloud geometry compression, there still remains a challenge in effectively compressing large-scale scenes with sparse surfaces. Another key challenge lies in reducing decoding latency, a crucial requirement in real-world applications. In this paper, we propose Pointsoup, an efficient learning-based geometry codec that attains high performance and extremely low decoding latency simultaneously. Inspired by the conventional Trisoup codec, a point model-based strategy is devised to characterize local surfaces. Specifically, skin features are embedded from local windows via an attention-based encoder, and dilated windows are introduced as cross-scale priors to infer the distribution of quantized features in parallel. During decoding, features undergo fast refinement, followed by a folding-based point generator that reconstructs point coordinates at fairly fast speed. Experiments show that Pointsoup achieves state-of-the-art performance on multiple benchmarks with significantly lower decoding complexity, i.e., up to 90~160× faster than the G-PCCv23 Trisoup decoder on a comparatively low-end platform (e.g., one RTX 2080Ti). Furthermore, it offers variable-rate control with a single neural model (2.9 MB), which is attractive for industrial practitioners.

AAAI Conference 2024 Conference Paper

Transformer-Based No-Reference Image Quality Assessment via Supervised Contrastive Learning

  • Jinsong Shi
  • Pan Gao
  • Jie Qin

Image Quality Assessment (IQA) has long been a research hotspot in the field of image processing, especially No-Reference Image Quality Assessment (NR-IQA). Due to their powerful feature extraction ability, existing Convolutional Neural Network (CNN)- and Transformer-based NR-IQA methods have achieved considerable progress. However, they still exhibit limited capability when facing unknown authentic distortion datasets. To further improve NR-IQA performance, in this paper, a novel supervised contrastive learning (SCL) and Transformer-based NR-IQA model, SaTQA, is proposed. We first train a model on a large-scale synthetic dataset by SCL (no subjective image scores are required) to extract degradation features of images with various distortion types and levels. To further extract distortion information from images, we propose a backbone network incorporating the Multi-Stream Block (MSB) by combining the CNN inductive bias and Transformer long-term dependence modeling capability. Finally, we propose the Patch Attention Block (PAB) to obtain the final distorted image quality score by fusing the degradation features learned from contrastive learning with the perceptual distortion information extracted by the backbone network. Experimental results on six standard IQA datasets show that SaTQA outperforms the state-of-the-art methods for both synthetic and authentic datasets. Code is available at https://github.com/I2-Multimedia-Lab/SaTQA.
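The pre-training signal described here needs no subjective scores because supervised contrastive learning only needs labels for distortion type/level: samples sharing a label are pulled together, all others are pushed apart. A minimal numpy sketch of a SupCon-style objective (the paper's exact formulation may differ):

```python
import numpy as np

def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive objective over L2-normalized features:
    for each anchor, positives are samples sharing its label (e.g. a
    distortion type); all other samples act as negatives."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / temperature
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    exp_sim = np.exp(sim)
    exp_sim[self_mask] = 0.0                       # exclude self-pairs
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))
    labels = np.asarray(labels)
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    return float((-(log_prob * pos).sum(axis=1) / pos.sum(axis=1)).mean())

# Toy usage: two well-separated feature clusters. Labels that match the
# clusters give a much lower loss than labels that mix them.
rng = np.random.default_rng(0)
mu0, mu1 = np.zeros(8), np.zeros(8)
mu0[0], mu1[1] = 5.0, 5.0
feats = np.vstack([rng.normal(mu0, 0.1, (4, 8)),
                   rng.normal(mu1, 0.1, (4, 8))])
good = supcon_loss(feats, [0, 0, 0, 0, 1, 1, 1, 1])
bad = supcon_loss(feats, [0, 1, 0, 1, 0, 1, 0, 1])
assert good < bad
```

Minimizing this loss shapes the encoder so that images with the same distortion type and level cluster in feature space, which is the degradation representation the quality head then consumes.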

IJCAI Conference 2024 Conference Paper

Unified Unsupervised Salient Object Detection via Knowledge Transfer

  • Yao Yuan
  • Wutao Liu
  • Pan Gao
  • Qun Dai
  • Jie Qin

Recently, unsupervised salient object detection (USOD) has gained increasing attention due to its annotation-free nature. However, current methods mainly focus on specific tasks such as RGB and RGB-D, neglecting the potential for task migration. In this paper, we propose a unified USOD framework for generic USOD tasks. Firstly, we propose a Progressive Curriculum Learning-based Saliency Distilling (PCL-SD) mechanism to extract saliency cues from a pre-trained deep network. This mechanism starts with easy samples and progressively moves towards harder ones, to avoid initial interference caused by hard samples. Afterwards, the obtained saliency cues are utilized to train a saliency detector, and we employ a Self-rectify Pseudo-label Refinement (SPR) mechanism to improve the quality of pseudo-labels. Finally, an adapter-tuning method is devised to transfer the acquired saliency knowledge, leveraging shared knowledge to attain superior transferring performance on the target tasks. Extensive experiments on five representative SOD tasks confirm the effectiveness and feasibility of our proposed method. Code and supplementary materials are available at https://github.com/I2-Multimedia-Lab/A2S-v3.
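The easy-to-hard progression in PCL-SD is, at its core, a curriculum schedule over training samples. The linear schedule below is an illustrative stand-in for whatever schedule the paper actually uses:

```python
def curriculum_subset(difficulties, epoch, total_epochs, start_frac=0.3):
    """Progressive curriculum: train on the easiest `start_frac` of the
    samples at epoch 0 and linearly grow to the full set by the last
    epoch. `difficulties` holds per-sample scores (lower = easier);
    returns the indices used at this epoch, easiest first."""
    order = sorted(range(len(difficulties)), key=lambda i: difficulties[i])
    frac = start_frac + (1.0 - start_frac) * epoch / max(total_epochs - 1, 1)
    k = max(1, round(frac * len(order)))
    return order[:k]

# Toy usage: 10 samples over 5 epochs.
diffs = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 0.0]
assert curriculum_subset(diffs, 0, 5) == [9, 1, 5]   # easiest 30% first
assert len(curriculum_subset(diffs, 4, 5)) == 10     # full set by the end
```

Deferring hard samples this way keeps noisy or ambiguous saliency cues from dominating the early distillation steps, which is the "initial interference" the abstract refers to.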

IJCAI Conference 2022 Conference Paper

Video Frame Interpolation Based on Deformable Kernel Region

  • Haoyue Tian
  • Pan Gao
  • Xiaojiang Peng

The video frame interpolation task has recently become increasingly prevalent in the computer vision field. A number of deep learning-based studies have achieved great success; most are based either on optical flow information, on interpolation kernels, or on a combination of the two. However, these methods ignore the grid restrictions on the position of the kernel region when synthesizing each target pixel. These restrictions mean they cannot adapt well to the irregularity of object shapes and the uncertainty of motion, which may lead to irrelevant reference pixels being used for interpolation. To solve this problem, we revisit deformable convolution for video interpolation, which can break the fixed grid restrictions on the kernel region, making the distribution of reference points more suitable to the shape of the object and thus warping a more accurate interpolated frame. Experiments are conducted on four datasets to demonstrate the superior performance of the proposed model in comparison to state-of-the-art alternatives.
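The key mechanism, breaking the fixed sampling grid, can be sketched for a single output pixel: each tap of a 3x3 kernel samples not at its regular grid position but at that position plus a learned fractional offset, resolved by bilinear interpolation. This is a generic deformable-sampling illustration, not the paper's network:

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Bilinearly sample a 2D image at a fractional (y, x) location."""
    h, w = img.shape
    y = np.clip(y, 0, h - 1.001)
    x = np.clip(x, 0, w - 1.001)
    y0, x0 = int(y), int(x)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * img[y0, x0] + (1 - dy) * dx * img[y0, x0 + 1]
            + dy * (1 - dx) * img[y0 + 1, x0] + dy * dx * img[y0 + 1, x0 + 1])

def deformable_kernel_value(img, center, offsets, weights):
    """One output pixel of a deformable 3x3 kernel: each tap samples at
    its regular grid position plus a fractional offset, so the kernel
    region can follow object shape instead of a fixed grid."""
    cy, cx = center
    taps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = 0.0
    for (gy, gx), (oy, ox), w in zip(taps, offsets, weights):
        out += w * bilinear_sample(img, cy + gy + oy, cx + gx + ox)
    return out

# Toy usage: with zero offsets this reduces to an ordinary 3x3 mean filter tap.
img = np.arange(25, dtype=float).reshape(5, 5)
zero_off = [(0.0, 0.0)] * 9
mean_w = [1.0 / 9] * 9
assert np.isclose(deformable_kernel_value(img, (2, 2), zero_off, mean_w), 12.0)
```

In the interpolation network, the offsets (and often the weights) are predicted per pixel from the two input frames, letting the reference points drift toward the moving object rather than staying on the rigid grid.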