Arrow Research search

Author name cluster

Gan Sun

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers
2 author rows

Possible papers (17)

AAAI Conference 2026 Conference Paper

Towards Efficient and Effective Interactive 3D Segmentation

  • Wei Cong
  • Yang Cong
  • Jiahua Dong
  • Gan Sun

Interactive 3D segmentation embodies an advanced human-in-the-loop paradigm, in which a model iteratively refines the segmentation of objects of interest within a 3D point cloud through user feedback. Existing methods have achieved notable advances, but at the expense of substantial resource consumption. To address this challenge, we introduce E2I3D, an efficient and effective model for interactive 3D segmentation. Specifically, we propose a two-stage efficiency-to-effectiveness framework that decouples efficiency and effectiveness, avoiding the high training cost of joint optimization. For efficiency, in the first stage we present heterogeneous pruning, which reliably compresses the model by ranking and pruning the constructed heterogeneous groups separately based on gradient compensation. For effectiveness, in the second stage we design hierarchical click-aware attention, which integrates geometric details from high-resolution features with global context from low-resolution features to enhance click-guided interaction. Extensive experiments on public datasets demonstrate that E2I3D exceeds state-of-the-art methods in both efficiency and effectiveness. For instance, on the KITTI-360 dataset, E2I3D boosts the IoU for interactive single-object segmentation from 44.4% to 49.0% with 5 user clicks, while simultaneously reducing the parameter count from 39.3M to 5.7M.
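The group-pruning step described in this abstract can be illustrated with a minimal sketch. The Taylor-style importance score sum(|w * g|) below is a hedged stand-in for the paper's gradient-compensation criterion, and all function and parameter names are hypothetical:

```python
import numpy as np

def prune_groups(group_weights, group_grads, keep_ratio=0.5):
    """Rank parameter groups by a first-order importance score
    sum(|w * g|) -- a Taylor-style stand-in for the paper's
    gradient-compensation ranking -- and keep the top fraction."""
    scores = np.array([np.abs(w * g).sum()
                       for w, g in zip(group_weights, group_grads)])
    k = max(1, int(len(scores) * keep_ratio))
    kept = np.argsort(scores)[::-1][:k]  # indices of highest-importance groups
    return sorted(kept.tolist())
```

Groups whose weights and gradients are jointly small contribute little to the loss and are pruned first; E2I3D's actual criterion and grouping are more involved.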

AAAI Conference 2025 Conference Paper

GLAM: Global-Local Variation Awareness in Mamba-based World Model

  • Qian He
  • Wenqi Liang
  • Chunhui Hao
  • Gan Sun
  • Jiandong Tian

Mimicking the real interaction trajectory during inference of a world model has been shown to improve the sample efficiency of model-based reinforcement learning (MBRL) algorithms. Many methods reason directly over known state sequences; however, this approach fails to capture the subtle variation between states that could improve reasoning quality. Much like humans infer trends in how events will develop from such variation, in this work we introduce the Global-Local variation Awareness Mamba-based world model (GLAM), which improves reasoning quality by perceiving and predicting variation between states. GLAM comprises two Mamba-based parallel reasoning modules, GMamba and LMamba, which perceive variation from global and local perspectives, respectively, during the reasoning process. GMamba identifies patterns of variation between states in the input sequence and leverages these patterns to enhance the prediction of future state variation. LMamba reasons about unknown information, such as rewards, termination signals, and visual representations, by perceiving variation in adjacent states. By integrating the strengths of the two modules, GLAM accounts for higher-value variation in environmental changes, providing the agent with more efficient imagination-based training. We demonstrate that our method outperforms existing methods in normalized human score on the Atari 100k benchmark.

IROS Conference 2024 Conference Paper

GroupTrack: Multi-Object Tracking by Using Group Motion Patterns

  • Xinglong Xu
  • Weihong Ren
  • Gan Sun
  • Haoyu Ji 0001
  • Yu Gao 0010
  • Honghai Liu 0001

The main challenge of Multi-Object Tracking (MOT) lies in maintaining a distinctive identity for each target in dense crowds or occluded scenarios. Although existing methods have achieved significant progress by using robust object detectors or complex association strategies, they cannot effectively handle long-term tracking because they model motion or appearance individually for each single target. In this paper, we propose GroupTrack, a novel 2D MOT tracker that learns a reliable motion state for each target from group motion patterns. Specifically, for each tracklet we first choose its neighboring tracklets to form a group of motion patterns, which provide informative clues for estimating the current tracklet's motion. We then apply the group motion patterns to tracklet prediction and data association. By integrating priors from neighboring motion patterns into the data association process, GroupTrack provides a new paradigm for target motion modeling in extremely crowded and occluded scenarios. Extensive experiments on the public MOT17 and MOT20 datasets demonstrate the effectiveness of our approach in challenging scenarios and show state-of-the-art performance on various MOT metrics.
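The core idea of borrowing motion cues from neighboring tracklets can be sketched as follows. This is a toy illustration only: the blending rule, the weight `alpha`, and all names are assumptions, not GroupTrack's actual formulation:

```python
import numpy as np

def group_motion_predict(pos, vel, neighbor_vels, alpha=0.5):
    """Predict a tracklet's next 2D position by blending its own
    velocity with the mean velocity of its neighboring group
    (a minimal stand-in for group motion patterns)."""
    pos, vel = np.asarray(pos, float), np.asarray(vel, float)
    if len(neighbor_vels) == 0:
        group_vel = vel                       # no neighbors: own motion only
    else:
        group_vel = np.mean(neighbor_vels, axis=0)
    fused_vel = (1 - alpha) * vel + alpha * group_vel
    return pos + fused_vel
```

When a target's own velocity estimate is unreliable (e.g., under occlusion), the group prior keeps the predicted position consistent with the surrounding crowd's motion.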

NeurIPS Conference 2024 Conference Paper

Novel Object Synthesis via Adaptive Text-Image Harmony

  • Zeren Xiong
  • Zedong Zhang
  • Zikun Chen
  • Shuo Chen
  • Xiang Li
  • Gan Sun
  • Jian Yang
  • Jun Li

In this paper, we study an object synthesis task that combines an object text with an object image to create a new object image. Most diffusion models struggle with this task, i.e., they often generate an object that predominantly reflects either the text or the image, due to an imbalance between the two inputs. To address this issue, we propose a simple yet effective method called Adaptive Text-Image Harmony (ATIH) to generate novel and surprising objects. First, we introduce a scale factor and an injection step to balance text and image features in cross-attention and to preserve image information in self-attention, respectively, during the text-image inversion diffusion process. Second, to better integrate object text and image, we design a balanced loss function with a noise parameter, ensuring both optimal editability and fidelity of the object image. Third, to adaptively adjust these parameters, we present a novel similarity score function that not only maximizes the similarities between the generated object image and the input text/image but also balances these similarities to harmonize text and image integration. Extensive experiments demonstrate the effectiveness of our approach, showcasing remarkable object creations such as a colobus-glass jar. https://xzr52.github.io/ATIH/

IROS Conference 2023 Conference Paper

I3DOD: Towards Incremental 3D Object Detection via Prompting

  • Wenqi Liang
  • Gan Sun
  • Chenxi Liu
  • Jiahua Dong 0001
  • Kangru Wang

3D object detection has achieved significant performance in many fields, e.g., robotic systems, autonomous driving, and augmented reality. However, most existing methods suffer catastrophic forgetting of old classes in class-incremental scenarios. Meanwhile, current class-incremental 3D object detection methods neglect the relationship between object localization information and category semantic information, and assume that all knowledge from the old model is reliable. To address these challenges, we present a novel prompt-guided Incremental 3D Object Detection framework, I3DOD. Specifically, we propose a task-shared prompt mechanism to learn the matching relationships between object localization information and category semantic information. After training on the current task, these prompts are stored in our prompt pool and carry the relationships of old classes into the next task. Moreover, we design a reliable distillation strategy that transfers knowledge from two aspects: a reliable dynamic distillation filters out negative knowledge and transfers reliable 3D knowledge to the new detection model, while a relation feature captures response relations in feature space and protects the model's plasticity when learning novel 3D classes. Finally, we conduct comprehensive experiments on two benchmark datasets, and our method outperforms state-of-the-art object detection methods by 0.6%-2.7% in terms of mAP@0.25.

IROS Conference 2022 Conference Paper

Class-Incremental Gesture Recognition Learning with Out-of-Distribution Detection

  • Mingxue Li
  • Yang Cong
  • Yuyang Liu
  • Gan Sun

Gesture recognition is a popular human-computer interaction technology that has been widely applied in many fields (e.g., autonomous driving, medical care, VR, and AR). However, 1) most existing gesture recognition methods focus on fixed recognition scenarios with a few gestures, which leads to excessive memory consumption and computational effort when continuously learning new gestures; 2) meanwhile, the performance of popular class-incremental methods degrades significantly on previously learned classes (i.e., catastrophic forgetting) due to the ambiguity and variability of gestures. To tackle these challenges, we propose a novel class-incremental gesture recognition method with out-of-distribution (OOD) detection, which can continuously adapt to new gesture classes and achieve high performance on both learned and new gestures. Specifically, we construct an episodic memory with a subset of learned training samples to preserve previous knowledge from forgetting. Moreover, OOD-detection-based memory management is developed to explore the most representative and informative core set from the learned datasets. When a new gesture recognition task with unseen classes arrives, rehearsal enhancement is adopted to increase the diversity of memory exemplars so that they better fit the real characteristics of gesture recognition. After deriving an effective class-incremental gesture recognition strategy, we perform experiments on two representative datasets to validate the superiority of our method. Evaluation experiments demonstrate that our proposed method substantially outperforms state-of-the-art methods, with about 2.17%-3.81% improvement under different class-incremental learning scenarios.
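The episodic-memory management step can be sketched in a few lines. The herding-style rule below (keep the samples closest to the class mean) is a deliberate simplification standing in for the paper's OOD-score-based core-set selection; all names are hypothetical:

```python
import numpy as np

def select_exemplars(features, m):
    """Pick the m samples closest to the class mean as memory
    exemplars -- a herding-style simplification of the paper's
    OOD-detection-based memory management."""
    mean = features.mean(axis=0)
    dists = np.linalg.norm(features - mean, axis=1)  # distance to class mean
    return np.argsort(dists)[:m].tolist()            # m most central samples
```

In a rehearsal setup, the returned indices determine which old-class samples survive into the episodic memory when a new task arrives.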

NeurIPS Conference 2021 Conference Paper

Confident Anchor-Induced Multi-Source Free Domain Adaptation

  • Jiahua Dong
  • Zhen Fang
  • Anjin Liu
  • Gan Sun
  • Tongliang Liu

Unsupervised domain adaptation has attracted considerable academic attention by transferring knowledge from a labeled source domain to an unlabeled target domain. However, most existing methods assume the source data are drawn from a single domain, and thus cannot exploit complementarily transferable knowledge from multiple source domains with large distribution discrepancies. Moreover, they require access to source data during training, which is inefficient and impractical given privacy preservation and memory storage concerns. To address these challenges, we develop a novel Confident-Anchor-induced multi-source-free Domain Adaptation (CAiDA) model, a pioneering exploration of knowledge adaptation from multiple source domains to an unlabeled target domain without any source data, using only pre-trained source models. Specifically, a source-specific transferable perception module is proposed to automatically quantify the contributions of the complementary knowledge transferred from the multi-source domains to the target domain. To generate pseudo labels for the target domain without access to the source data, we develop a confident-anchor-induced pseudo-label generator, which constructs a confident anchor group and assigns each unconfident target sample its semantically nearest confident anchor. Furthermore, a class-relationship-aware consistency loss is proposed to preserve consistent inter-class relationships by aligning soft confusion matrices across domains. Theoretical analysis answers why multiple source domains are better than a single source domain, and establishes a novel learning bound to show the effectiveness of exploiting multi-source domains. Experiments on several representative datasets illustrate the superiority of our proposed CAiDA model. The code is available at https://github.com/Learning-group123/CAiDA.
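The confident-anchor pseudo-labeling idea can be sketched as follows; this is a bare-bones illustration under assumed names and a simple cosine-similarity rule, not CAiDA's full generator:

```python
import numpy as np

def anchor_pseudo_labels(feats, probs, conf_thresh=0.9):
    """Assign each unconfident target sample the label of its
    semantically nearest confident anchor (cosine similarity) --
    a simplified sketch of a confident-anchor pseudo-label generator."""
    conf = probs.max(axis=1)
    anchors = np.where(conf >= conf_thresh)[0]     # confident anchor group
    labels = probs.argmax(axis=1)
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sims = normed @ normed[anchors].T              # similarity to each anchor
    nearest = anchors[sims.argmax(axis=1)]         # nearest confident anchor
    out = labels.copy()
    unconf = conf < conf_thresh
    out[unconf] = labels[nearest[unconf]]          # borrow the anchor's label
    return out
```

Confident samples keep their own predictions; only the uncertain ones inherit a label from the anchor group, which is what makes the pseudo labels usable without any source data.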

AAAI Conference 2021 Conference Paper

Generative Partial Visual-Tactile Fused Object Clustering

  • Tao Zhang
  • Yang Cong
  • Gan Sun
  • Jiahua Dong
  • Yuyang Liu
  • Zhengming Ding

Visual-tactile fused sensing for object clustering has achieved significant progress recently, since involving the tactile modality can effectively improve clustering performance. However, missing-data (i.e., partial-data) issues often arise due to occlusion and noise during data collection. This issue is not well solved by most existing partial multi-view clustering methods because of the heterogeneous-modality challenge; naively employing these methods would inevitably induce a negative effect and further hurt performance. To solve these challenges, we propose a Generative Partial Visual-Tactile Fused (GPVTF) framework for object clustering. More specifically, we first extract partial visual and tactile features from the partial visual and tactile data, respectively, and encode the extracted features in modality-specific feature subspaces. A conditional cross-modal clustering generative adversarial network is then developed to synthesize one modality conditioned on the other, which can compensate for missing samples and naturally align the visual and tactile modalities through adversarial learning. Finally, two pseudo-label-based KL-divergence losses are employed to update the corresponding modality-specific encoders. Extensive comparative experiments on three public visual-tactile datasets prove the effectiveness of our method.

AAAI Conference 2021 Conference Paper

I3DOL: Incremental 3D Object Learning without Catastrophic Forgetting

  • Jiahua Dong
  • Yang Cong
  • Gan Sun
  • Bingtao Ma
  • Lichen Wang

3D object classification has attracted considerable attention in academic research and industrial applications. However, most existing methods need to access the training data of past 3D object classes when facing the common real-world scenario in which new classes of 3D objects arrive in a sequence. Moreover, the performance of advanced approaches degrades dramatically on past learned classes (i.e., catastrophic forgetting) due to the irregular and redundant geometric structures of 3D point cloud data. To address these challenges, we propose a new Incremental 3D Object Learning (I3DOL) model, the first exploration of learning new classes of 3D objects continually. Specifically, an adaptive-geometric centroid module is designed to construct discriminative local geometric structures, which better characterize the irregular point cloud representation of 3D objects. Afterwards, to prevent the catastrophic forgetting brought by redundant geometric information, a geometric-aware attention mechanism is developed to quantify the contributions of local geometric structures and explore unique 3D geometric characteristics with high contributions for class-incremental learning. Meanwhile, a score fairness compensation strategy is proposed to further alleviate the catastrophic forgetting caused by unbalanced data between past and new classes of 3D objects, by compensating the biased prediction for new classes in the validation phase. Experiments on representative 3D datasets validate the superiority of our I3DOL framework.

AAAI Conference 2020 Conference Paper

Cross-Modal Subspace Clustering via Deep Canonical Correlation Analysis

  • Quanxue Gao
  • Huanhuan Lian
  • Qianqian Wang
  • Gan Sun

For cross-modal subspace clustering, the key is how to exploit the correlation information between cross-modal data. However, most hierarchical and structural correlation information among cross-modal data cannot be well exploited due to its high-dimensional non-linear nature. To tackle this problem, we propose an unsupervised framework named Cross-Modal Subspace Clustering via Deep Canonical Correlation Analysis (CMSC-DCCA), which incorporates a correlation constraint with a self-expressive layer to make full use of both inter-modal and intra-modal information. More specifically, the proposed model consists of three components: 1) a deep canonical correlation analysis (Deep CCA) model; 2) a self-expressive layer; and 3) Deep CCA decoders. The Deep CCA model consists of convolutional encoders and a correlation constraint: the convolutional encoders obtain latent representations of the cross-modal data, while adding the correlation constraint on the latent representations makes full use of the inter-modal information. Furthermore, the self-expressive layer operates on the latent representations and constrains them to exhibit the self-expression property, so that the shared coefficient matrix can capture the hierarchical intra-modal correlations of each modality. The Deep CCA decoders then reconstruct the data to ensure that the encoded features preserve the structure of the original data. Experimental results on several real-world datasets demonstrate that the proposed method outperforms state-of-the-art methods.

AAAI Conference 2020 Conference Paper

Dual Relation Semi-Supervised Multi-Label Learning

  • Lichen Wang
  • Yunyu Liu
  • Can Qin
  • Gan Sun
  • Yun Fu

Multi-label learning (MLL) addresses the problem in which one sample corresponds to multiple labels. It is a challenging task due to the long-tail label distribution and sophisticated label relations. Semi-supervised MLL methods utilize small-scale labeled samples and large-scale unlabeled samples to enhance performance. However, these approaches mainly focus on exploring the data distribution in feature space while ignoring the label relations inside each instance. To this end, we propose a Dual Relation Semi-supervised Multi-label Learning (DRML) approach that jointly explores the feature distribution and the label relations. A dual-classifier domain adaptation strategy is proposed to align features while generating pseudo labels to improve learning performance, and a relation network is proposed to explore relational knowledge. As a result, DRML effectively explores the feature-label and label-label relations in both labeled and unlabeled samples. It is an end-to-end model that requires no extra knowledge. Extensive experiments illustrate the effectiveness and efficiency of our method.
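One ingredient of a dual-classifier pseudo-labeling strategy can be illustrated with a single-label simplification (DRML itself is multi-label and more elaborate; the agreement rule, threshold, and names below are all assumptions):

```python
import numpy as np

def agree_pseudo_labels(probs_a, probs_b, thresh=0.8):
    """Keep a pseudo label only where two classifiers agree and are
    both confident; return -1 for rejected samples. A single-label
    toy version of a dual-classifier pseudo-labeling strategy."""
    la, lb = probs_a.argmax(axis=1), probs_b.argmax(axis=1)
    conf = np.minimum(probs_a.max(axis=1), probs_b.max(axis=1))
    return np.where((la == lb) & (conf >= thresh), la, -1)
```

Requiring agreement between two classifiers filters out the noisiest pseudo labels before they are used to train on unlabeled data.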

AAAI Conference 2020 Conference Paper

Lifelong Spectral Clustering

  • Gan Sun
  • Yang Cong
  • Qianqian Wang
  • Jun Li
  • Yun Fu

Over the past decades, spectral clustering (SC) has become one of the most effective clustering algorithms. However, most previous studies focus on spectral clustering with a fixed task set, and cannot incorporate a new spectral clustering task without access to previously learned tasks. In this paper, we explore spectral clustering in a lifelong machine learning framework, i.e., Lifelong Spectral Clustering (L2SC). Its goal is to efficiently learn a model for a new spectral clustering task by selectively transferring previously accumulated experience from a knowledge library. Specifically, the knowledge library of L2SC contains two components: 1) an orthogonal basis library, capturing latent cluster centers among the clusters in each pair of tasks; and 2) a feature embedding library, embedding the feature manifold information shared among multiple related tasks. As a new spectral clustering task arrives, L2SC first transfers knowledge from both the basis library and the feature library to obtain an encoding matrix, and further redefines the library bases over time to maximize performance across all clustering tasks. Meanwhile, a general online update formulation is derived to alternately update the basis library and the feature library. Finally, empirical experiments on several real-world benchmark datasets demonstrate that our L2SC model effectively improves clustering performance compared with other state-of-the-art spectral clustering algorithms.

AAAI Conference 2020 Conference Paper

Robust Low-Rank Discovery of Data-Driven Partial Differential Equations

  • Jun Li
  • Gan Sun
  • Guoshuai Zhao
  • Li-wei H. Lehman

Partial differential equations (PDEs) are essential foundations for modeling dynamic processes in the natural sciences. Discovering the underlying PDEs of complex data collected from the real world is key to understanding the dynamic processes of natural laws and behaviors. However, both the collected data and their partial derivatives are often corrupted by noise, especially sparse outlying entries, due to measurement/process noise in real-world applications. Our work is motivated by the observation that the underlying data modeled by PDEs are in fact often low rank. We thus develop a robust low-rank discovery framework that recovers both the low-rank data and the sparse outlying entries by integrating double low-rank and sparse recoveries with a (group) sparse regression method, implemented as a minimization problem using mixed nuclear norms with ℓ1 and ℓ0 norms. We propose a low-rank sequential (grouped) threshold ridge regression algorithm to solve the minimization problem. Results from experiments on seven canonical models (i.e., four PDEs and three parametric PDEs) verify that our framework outperforms state-of-the-art sparse and group sparse regression methods. Code is available at https://github.com/junli2019/Robust-Discovery-of-PDEs
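The sparse-regression core of this line of work, sequential threshold ridge regression (STRidge), fits nicely in a short sketch. This is the plain version, without the paper's additional low-rank recovery; `Theta` is a candidate-term library evaluated on the data and `u_t` the time derivative:

```python
import numpy as np

def stridge(Theta, u_t, lam=1e-3, tol=0.1, iters=10):
    """Sequential threshold ridge regression: solve a ridge problem,
    zero out small coefficients, and refit on the surviving terms.
    A minimal version of the sparse-regression step in data-driven
    PDE discovery (not the paper's full robust low-rank model)."""
    n = Theta.shape[1]
    w = np.linalg.lstsq(Theta.T @ Theta + lam * np.eye(n),
                        Theta.T @ u_t, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(w) < tol
        w[small] = 0.0                      # hard-threshold weak terms
        big = ~small
        if not big.any():
            break
        A = Theta[:, big]                   # refit on surviving terms only
        w[big] = np.linalg.lstsq(A.T @ A + lam * np.eye(big.sum()),
                                 A.T @ u_t, rcond=None)[0]
    return w
```

On data generated by a single active term, the thresholding drives every spurious coefficient exactly to zero while the refit recovers the true coefficient.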

AAAI Conference 2020 Conference Paper

Visual Tactile Fusion Object Clustering

  • Tao Zhang
  • Yang Cong
  • Gan Sun
  • Qianqian Wang
  • Zhenming Ding

Object clustering, which aims to group similar objects into one cluster with an unsupervised strategy, has been extensively studied across various data-driven applications. However, most existing state-of-the-art object clustering methods (e.g., single-view or multi-view clustering methods) only explore visual information while ignoring one of the most important sensing modalities, i.e., tactile information, which can help capture different object properties and further boost the performance of the object clustering task. To effectively exploit both visual and tactile modalities for object clustering, in this paper we propose a deep Auto-Encoder-like Non-negative Matrix Factorization framework for visual-tactile fusion clustering. Specifically, deep matrix factorization constrained by an under-complete Auto-Encoder-like architecture is employed to jointly learn a hierarchical expression of the visual-tactile fusion data and preserve the local structure of the data-generating distributions of the visual and tactile modalities. Meanwhile, a graph regularizer is introduced to capture the intrinsic relations of data samples within each modality. Furthermore, we propose a modality-level consensus regularizer to align the visual and tactile data in a common subspace, mitigating the gap between the two modalities. For model optimization, we present an efficient alternating minimization strategy. Finally, we conduct extensive experiments on public datasets to verify the effectiveness of our framework.

ICRA Conference 2019 Conference Paper

Environment Driven Underwater Camera-IMU Calibration for Monocular Visual-Inertial SLAM

  • Changjun Gu
  • Yang Cong
  • Gan Sun

Most state-of-the-art underwater vision systems are calibrated manually in shallow water and then used in the open sea without re-calibration. However, the refractive index of water changes with salinity, temperature, depth, and other underwater environmental indexes, which inevitably generates calibration errors and induces inaccuracies, e.g., in underwater Simultaneous Localization and Mapping (SLAM). To address this issue, we propose a new underwater Camera-Inertial Measurement Unit (IMU) calibration model that needs to be calibrated only once in air; both the intrinsic parameters and the extrinsic parameters between the camera and the IMU can then be automatically computed from the environmental indexes. To the best of our knowledge, this is the first work to consider underwater Camera-IMU calibration via environmental indexes. We also build a verification platform to validate the effectiveness of our proposed method in real experiments, and use it for underwater monocular Visual-Inertial SLAM.
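To see why in-air intrinsics drift underwater, consider the standard small-angle flat-port approximation: the effective focal length scales with the water's refractive index. This is a toy stand-in for the paper's environment-driven model, in which the refractive index itself would be derived from salinity, temperature, and depth:

```python
def underwater_focal(f_air, n_water=1.333):
    """Scale an in-air focal length by the refractive index of water:
    the small-angle flat-port approximation. The default n_water is
    the nominal value for fresh water; in the paper's setting it
    would vary with salinity, temperature, and depth."""
    return f_air * n_water
```

Even the ~1% swing of the refractive index across ocean conditions shifts the effective focal length by several pixels, which is enough to corrupt visual-inertial SLAM if calibration is treated as fixed.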

IJCAI Conference 2019 Conference Paper

LRDNN: Local-refining based Deep Neural Network for Person Re-Identification with Attribute Discerning

  • Qinqin Zhou
  • Bineng Zhong
  • Xiangyuan Lan
  • Gan Sun
  • Yulun Zhang
  • Mengran Gou

Recently, pose and attribute information has been widely used to solve the person re-identification (re-ID) problem. However, inaccurate output from pose or attribute modules can impair final re-ID performance. Since re-ID, pose estimation, and attribute recognition are all based on person appearance information, we propose a Local-refining based Deep Neural Network (LRDNN) that aggregates pose estimation and attribute recognition to improve re-ID performance. To this end, we add a pose branch to extract local spatial information and optimize the whole network on both person identity and attribute objectives. To diminish the negative effect of unstable pose estimation, a novel structure called the channel parse block (CPB) is introduced to learn weights over different feature channels in the pose branch. The two branches are then combined with compact bilinear pooling. Experimental results on the Market1501 and DukeMTMC-reID datasets illustrate the effectiveness of the proposed method.

AAAI Conference 2018 Conference Paper

Active Lifelong Learning With “Watchdog”

  • Gan Sun
  • Yang Cong
  • Xiaowei Xu

Lifelong learning aims to learn consecutive new tasks based on previously accumulated experience, i.e., a knowledge library. However, the knowledge among different incoming tasks is imbalanced. Therefore, in this paper, we mimic an effective "human cognition" strategy by actively sorting the importance of incoming tasks in an unknown-to-known process and preferentially selecting the more informative, important tasks to learn. To achieve this, we cast assessing the importance of an incoming task, i.e., whether it is unknown or not, as an outlier detection problem, and design a hierarchical dictionary learning model consisting of two-level task descriptors to sparsely reconstruct each task under an ℓ0-norm constraint. Incoming tasks are sorted by their sparse reconstruction scores in descending order, and tasks with high reconstruction scores are permitted to pass; this mechanism is called the "watchdog". Next, the knowledge library of the lifelong learning framework encodes the selected task by transferring previous knowledge, and then automatically updates itself with knowledge from both previously learned tasks and the current task. For model optimization, the alternating direction method is employed to solve our model, and it converges to a fixed point. Extensive experiments on both benchmark datasets and our own dataset demonstrate the effectiveness of our proposed model, especially in task selection and dictionary learning.