Arrow Research search

Author name cluster

Gan Sun

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers
2 author rows

Possible papers (17)

AAAI Conference 2026 Conference Paper

Towards Efficient and Effective Interactive 3D Segmentation

  • Wei Cong
  • Yang Cong
  • Jiahua Dong
  • Gan Sun

Interactive 3D segmentation embodies an advanced human-in-the-loop paradigm, in which a model iteratively refines the segmentation of objects of interest within a 3D point cloud through user feedback. Existing methods have achieved notable advances, but at the expense of substantial resource consumption. To address this challenge, we introduce E2I3D, an efficient and effective model for interactive 3D segmentation. Specifically, we propose a two-stage efficiency-to-effectiveness framework that decouples efficiency and effectiveness, avoiding the high training cost of joint optimization. For efficiency, in the first stage we present heterogeneous pruning, which reliably compresses the model by ranking and pruning the constructed heterogeneous groups separately based on gradient compensation. For effectiveness, in the second stage we design hierarchical click-aware attention, which integrates geometric details from high-resolution features with global context from low-resolution features to enhance click-guided interaction. Extensive experiments on public datasets demonstrate that E2I3D exceeds state-of-the-art methods in both efficiency and effectiveness. For instance, on the KITTI-360 dataset, E2I3D boosts the IoU for interactive single-object segmentation from 44.4% to 49.0% with 5 user clicks, while simultaneously reducing the parameter count from 39.3M to 5.7M.
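The group-pruning step described in this abstract can be illustrated with a minimal sketch. The Taylor-style importance score sum(|w * g|) below is a hedged stand-in for the paper's gradient-compensation criterion, and all function and parameter names are hypothetical:

```python
import numpy as np

def prune_groups(group_weights, group_grads, keep_ratio=0.5):
    """Rank parameter groups by a first-order importance score
    sum(|w * g|) -- a Taylor-style stand-in for the paper's
    gradient-compensation ranking -- and keep the top fraction."""
    scores = np.array([np.abs(w * g).sum()
                       for w, g in zip(group_weights, group_grads)])
    k = max(1, int(len(scores) * keep_ratio))
    kept = np.argsort(scores)[::-1][:k]  # indices of highest-importance groups
    return sorted(kept.tolist())
```

Groups whose weights and gradients are jointly small contribute little to the loss and are pruned first; E2I3D's actual criterion and grouping are more involved.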

AAAI Conference 2025 Conference Paper

GLAM: Global-Local Variation Awareness in Mamba-based World Model

  • Qian He
  • Wenqi Liang
  • Chunhui Hao
  • Gan Sun
  • Jiandong Tian

Mimicking the real interaction trajectory during inference of a world model has been shown to improve the sample efficiency of model-based reinforcement learning (MBRL) algorithms. Many methods reason directly over known state sequences; however, this approach fails to capture the subtle variation between states that could improve reasoning quality. Much like humans infer trends in how events will develop from such variation, in this work we introduce the Global-Local variation Awareness Mamba-based world model (GLAM), which improves reasoning quality by perceiving and predicting variation between states. GLAM comprises two Mamba-based parallel reasoning modules, GMamba and LMamba, which perceive variation from global and local perspectives, respectively, during the reasoning process. GMamba identifies patterns of variation between states in the input sequence and leverages these patterns to enhance the prediction of future state variation. LMamba reasons about unknown information, such as rewards, termination signals, and visual representations, by perceiving variation in adjacent states. By integrating the strengths of the two modules, GLAM accounts for higher-value variation in environmental changes, providing the agent with more efficient imagination-based training. We demonstrate that our method outperforms existing methods in normalized human score on the Atari 100k benchmark.

IROS Conference 2024 Conference Paper

GroupTrack: Multi-Object Tracking by Using Group Motion Patterns

  • Xinglong Xu
  • Weihong Ren
  • Gan Sun
  • Haoyu Ji 0001
  • Yu Gao 0010
  • Honghai Liu 0001

The main challenge of Multi-Object Tracking (MOT) lies in maintaining a distinctive identity for each target in dense crowds or occluded scenarios. Although existing methods have achieved significant progress by using robust object detectors or complex association strategies, they cannot effectively handle long-term tracking because they model motion or appearance individually for each single target. In this paper, we propose GroupTrack, a novel 2D MOT tracker that learns a reliable motion state for each target from group motion patterns. Specifically, for each tracklet we first choose its neighboring tracklets to form a group of motion patterns, which provide informative clues for estimating the current tracklet's motion. We then apply the group motion patterns to tracklet prediction and data association. By integrating priors from neighboring motion patterns into the data association process, GroupTrack provides a new paradigm for target motion modeling in extremely crowded and occluded scenarios. Extensive experiments on the public MOT17 and MOT20 datasets demonstrate the effectiveness of our approach in challenging scenarios and show state-of-the-art performance on various MOT metrics.
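The core idea of borrowing motion cues from neighboring tracklets can be sketched as follows. This is a toy illustration only: the blending rule, the weight `alpha`, and all names are assumptions, not GroupTrack's actual formulation:

```python
import numpy as np

def group_motion_predict(pos, vel, neighbor_vels, alpha=0.5):
    """Predict a tracklet's next 2D position by blending its own
    velocity with the mean velocity of its neighboring group
    (a minimal stand-in for group motion patterns)."""
    pos, vel = np.asarray(pos, float), np.asarray(vel, float)
    if len(neighbor_vels) == 0:
        group_vel = vel                       # no neighbors: own motion only
    else:
        group_vel = np.mean(neighbor_vels, axis=0)
    fused_vel = (1 - alpha) * vel + alpha * group_vel
    return pos + fused_vel
```

When a target's own velocity estimate is unreliable (e.g., under occlusion), the group prior keeps the predicted position consistent with the surrounding crowd's motion.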

NeurIPS Conference 2024 Conference Paper

Novel Object Synthesis via Adaptive Text-Image Harmony

  • Zeren Xiong
  • Zedong Zhang
  • Zikun Chen
  • Shuo Chen
  • Xiang Li
  • Gan Sun
  • Jian Yang
  • Jun Li

In this paper, we study an object synthesis task that combines an object text with an object image to create a new object image. Most diffusion models struggle with this task, i.e., they often generate an object that predominantly reflects either the text or the image, due to an imbalance between the two inputs. To address this issue, we propose a simple yet effective method called Adaptive Text-Image Harmony (ATIH) to generate novel and surprising objects. First, we introduce a scale factor and an injection step to balance text and image features in cross-attention and to preserve image information in self-attention, respectively, during the text-image inversion diffusion process. Second, to better integrate object text and image, we design a balanced loss function with a noise parameter, ensuring both optimal editability and fidelity of the object image. Third, to adaptively adjust these parameters, we present a novel similarity score function that not only maximizes the similarities between the generated object image and the input text/image but also balances these similarities to harmonize text and image integration. Extensive experiments demonstrate the effectiveness of our approach, showcasing remarkable object creations such as a colobus-glass jar. https://xzr52.github.io/ATIH/

IROS Conference 2023 Conference Paper

I3DOD: Towards Incremental 3D Object Detection via Prompting

  • Wenqi Liang
  • Gan Sun
  • Chenxi Liu
  • Jiahua Dong 0001
  • Kangru Wang

3D object detection has achieved significant performance in many fields, e.g., robotic systems, autonomous driving, and augmented reality. However, most existing methods suffer catastrophic forgetting of old classes in class-incremental scenarios. Meanwhile, current class-incremental 3D object detection methods neglect the relationship between object localization information and category semantic information, and assume that all knowledge from the old model is reliable. To address these challenges, we present a novel prompt-guided Incremental 3D Object Detection framework, I3DOD. Specifically, we propose a task-shared prompt mechanism to learn the matching relationships between object localization information and category semantic information. After training on the current task, these prompts are stored in our prompt pool and carry the relationships of old classes into the next task. Moreover, we design a reliable distillation strategy that transfers knowledge from two aspects: a reliable dynamic distillation filters out negative knowledge and transfers reliable 3D knowledge to the new detection model, while a relation feature captures response relations in feature space and protects the model's plasticity when learning novel 3D classes. Finally, we conduct comprehensive experiments on two benchmark datasets, and our method outperforms state-of-the-art object detection methods by 0.6%-2.7% in terms of mAP@0.25.

IROS Conference 2022 Conference Paper

Class-Incremental Gesture Recognition Learning with Out-of-Distribution Detection

  • Mingxue Li
  • Yang Cong
  • Yuyang Liu
  • Gan Sun

Gesture recognition is a popular human-computer interaction technology that has been widely applied in many fields (e.g., autonomous driving, medical care, VR, and AR). However, 1) most existing gesture recognition methods focus on fixed recognition scenarios with a few gestures, which leads to excessive memory consumption and computational effort when continuously learning new gestures; 2) meanwhile, the performance of popular class-incremental methods degrades significantly on previously learned classes (i.e., catastrophic forgetting) due to the ambiguity and variability of gestures. To tackle these challenges, we propose a novel class-incremental gesture recognition method with out-of-distribution (OOD) detection, which can continuously adapt to new gesture classes and achieve high performance on both learned and new gestures. Specifically, we construct an episodic memory with a subset of learned training samples to preserve previous knowledge from forgetting. Moreover, OOD-detection-based memory management is developed to explore the most representative and informative core set from the learned datasets. When a new gesture recognition task with unseen classes arrives, rehearsal enhancement is adopted to increase the diversity of memory exemplars so that they better fit the real characteristics of gesture recognition. After deriving an effective class-incremental gesture recognition strategy, we perform experiments on two representative datasets to validate the superiority of our method. Evaluation experiments demonstrate that our proposed method substantially outperforms state-of-the-art methods, with about 2.17%-3.81% improvement under different class-incremental learning scenarios.
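The episodic-memory management step can be sketched in a few lines. The herding-style rule below (keep the samples closest to the class mean) is a deliberate simplification standing in for the paper's OOD-score-based core-set selection; all names are hypothetical:

```python
import numpy as np

def select_exemplars(features, m):
    """Pick the m samples closest to the class mean as memory
    exemplars -- a herding-style simplification of the paper's
    OOD-detection-based memory management."""
    mean = features.mean(axis=0)
    dists = np.linalg.norm(features - mean, axis=1)  # distance to class mean
    return np.argsort(dists)[:m].tolist()            # m most central samples
```

In a rehearsal setup, the returned indices determine which old-class samples survive into the episodic memory when a new task arrives.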

NeurIPS Conference 2021 Conference Paper

Confident Anchor-Induced Multi-Source Free Domain Adaptation

  • Jiahua Dong
  • Zhen Fang
  • Anjin Liu
  • Gan Sun
  • Tongliang Liu

Unsupervised domain adaptation has attracted considerable academic attention by transferring knowledge from a labeled source domain to an unlabeled target domain. However, most existing methods assume the source data are drawn from a single domain, and thus cannot exploit complementarily transferable knowledge from multiple source domains with large distribution discrepancies. Moreover, they require access to source data during training, which is inefficient and impractical given privacy preservation and memory storage concerns. To address these challenges, we develop a novel Confident-Anchor-induced multi-source-free Domain Adaptation (CAiDA) model, a pioneering exploration of knowledge adaptation from multiple source domains to an unlabeled target domain without any source data, using only pre-trained source models. Specifically, a source-specific transferable perception module is proposed to automatically quantify the contributions of the complementary knowledge transferred from the multi-source domains to the target domain. To generate pseudo labels for the target domain without access to the source data, we develop a confident-anchor-induced pseudo-label generator, which constructs a confident anchor group and assigns each unconfident target sample its semantically nearest confident anchor. Furthermore, a class-relationship-aware consistency loss is proposed to preserve consistent inter-class relationships by aligning soft confusion matrices across domains. Theoretical analysis answers why multiple source domains are better than a single source domain, and establishes a novel learning bound to show the effectiveness of exploiting multi-source domains. Experiments on several representative datasets illustrate the superiority of our proposed CAiDA model. The code is available at https://github.com/Learning-group123/CAiDA.
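The confident-anchor pseudo-labeling idea can be sketched as follows; this is a bare-bones illustration under assumed names and a simple cosine-similarity rule, not CAiDA's full generator:

```python
import numpy as np

def anchor_pseudo_labels(feats, probs, conf_thresh=0.9):
    """Assign each unconfident target sample the label of its
    semantically nearest confident anchor (cosine similarity) --
    a simplified sketch of a confident-anchor pseudo-label generator."""
    conf = probs.max(axis=1)
    anchors = np.where(conf >= conf_thresh)[0]     # confident anchor group
    labels = probs.argmax(axis=1)
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sims = normed @ normed[anchors].T              # similarity to each anchor
    nearest = anchors[sims.argmax(axis=1)]         # nearest confident anchor
    out = labels.copy()
    unconf = conf < conf_thresh
    out[unconf] = labels[nearest[unconf]]          # borrow the anchor's label
    return out
```

Confident samples keep their own predictions; only the uncertain ones inherit a label from the anchor group, which is what makes the pseudo labels usable without any source data.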

AAAI Conference 2021 Conference Paper

Generative Partial Visual-Tactile Fused Object Clustering

  • Tao Zhang
  • Yang Cong
  • Gan Sun
  • Jiahua Dong
  • Yuyang Liu
  • Zhengming Ding

Visual-tactile fused sensing for object clustering has achieved significant progress recently, since involving the tactile modality can effectively improve clustering performance. However, missing-data (i.e., partial-data) issues often arise due to occlusion and noise during data collection. This issue is not well solved by most existing partial multi-view clustering methods because of the heterogeneous-modality challenge; naively employing these methods would inevitably induce a negative effect and further hurt performance. To solve these challenges, we propose a Generative Partial Visual-Tactile Fused (GPVTF) framework for object clustering. More specifically, we first extract partial visual and tactile features from the partial visual and tactile data, respectively, and encode the extracted features in modality-specific feature subspaces. A conditional cross-modal clustering generative adversarial network is then developed to synthesize one modality conditioned on the other, which can compensate for missing samples and naturally align the visual and tactile modalities through adversarial learning. Finally, two pseudo-label-based KL-divergence losses are employed to update the corresponding modality-specific encoders. Extensive comparative experiments on three public visual-tactile datasets prove the effectiveness of our method.

AAAI Conference 2021 Conference Paper

I3DOL: Incremental 3D Object Learning without Catastrophic Forgetting

  • Jiahua Dong
  • Yang Cong
  • Gan Sun
  • Bingtao Ma
  • Lichen Wang

3D object classification has attracted considerable attention in academic research and industrial applications. However, most existing methods need to access the training data of past 3D object classes when facing the common real-world scenario in which new classes of 3D objects arrive in a sequence. Moreover, the performance of advanced approaches degrades dramatically on past learned classes (i.e., catastrophic forgetting) due to the irregular and redundant geometric structures of 3D point cloud data. To address these challenges, we propose a new Incremental 3D Object Learning (I3DOL) model, the first exploration of learning new classes of 3D objects continually. Specifically, an adaptive-geometric centroid module is designed to construct discriminative local geometric structures, which better characterize the irregular point cloud representation of 3D objects. Afterwards, to prevent the catastrophic forgetting brought by redundant geometric information, a geometric-aware attention mechanism is developed to quantify the contributions of local geometric structures and explore unique 3D geometric characteristics with high contributions for class-incremental learning. Meanwhile, a score fairness compensation strategy is proposed to further alleviate the catastrophic forgetting caused by unbalanced data between past and new classes of 3D objects, by compensating the biased prediction for new classes in the validation phase. Experiments on representative 3D datasets validate the superiority of our I3DOL framework.

AAAI Conference 2020 Conference Paper

Cross-Modal Subspace Clustering via Deep Canonical Correlation Analysis

  • Quanxue Gao
  • Huanhuan Lian
  • Qianqian Wang
  • Gan Sun

For cross-modal subspace clustering, the key is how to exploit the correlation information between cross-modal data. However, most hierarchical and structural correlation information among cross-modal data cannot be well exploited due to its high-dimensional non-linear nature. To tackle this problem, we propose an unsupervised framework named Cross-Modal Subspace Clustering via Deep Canonical Correlation Analysis (CMSC-DCCA), which incorporates a correlation constraint with a self-expressive layer to make full use of both inter-modal and intra-modal information. More specifically, the proposed model consists of three components: 1) a deep canonical correlation analysis (Deep CCA) model; 2) a self-expressive layer; and 3) Deep CCA decoders. The Deep CCA model consists of convolutional encoders and a correlation constraint: the convolutional encoders obtain latent representations of the cross-modal data, while adding the correlation constraint on the latent representations makes full use of the inter-modal information. Furthermore, the self-expressive layer operates on the latent representations and constrains them to exhibit the self-expression property, so that the shared coefficient matrix can capture the hierarchical intra-modal correlations of each modality. The Deep CCA decoders then reconstruct the data to ensure that the encoded features preserve the structure of the original data. Experimental results on several real-world datasets demonstrate that the proposed method outperforms state-of-the-art methods.

AAAI Conference 2020 Conference Paper

Dual Relation Semi-Supervised Multi-Label Learning

  • Lichen Wang
  • Yunyu Liu
  • Can Qin
  • Gan Sun
  • Yun Fu

Multi-label learning (MLL) addresses the problem in which one sample corresponds to multiple labels. It is a challenging task due to the long-tail label distribution and sophisticated label relations. Semi-supervised MLL methods utilize small-scale labeled samples and large-scale unlabeled samples to enhance performance. However, these approaches mainly focus on exploring the data distribution in feature space while ignoring the label relations inside each instance. To this end, we propose a Dual Relation Semi-supervised Multi-label Learning (DRML) approach that jointly explores the feature distribution and the label relations. A dual-classifier domain adaptation strategy is proposed to align features while generating pseudo labels to improve learning performance, and a relation network is proposed to explore relational knowledge. As a result, DRML effectively explores the feature-label and label-label relations in both labeled and unlabeled samples. It is an end-to-end model that requires no extra knowledge. Extensive experiments illustrate the effectiveness and efficiency of our method.
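One ingredient of a dual-classifier pseudo-labeling strategy can be illustrated with a single-label simplification (DRML itself is multi-label and more elaborate; the agreement rule, threshold, and names below are all assumptions):

```python
import numpy as np

def agree_pseudo_labels(probs_a, probs_b, thresh=0.8):
    """Keep a pseudo label only where two classifiers agree and are
    both confident; return -1 for rejected samples. A single-label
    toy version of a dual-classifier pseudo-labeling strategy."""
    la, lb = probs_a.argmax(axis=1), probs_b.argmax(axis=1)
    conf = np.minimum(probs_a.max(axis=1), probs_b.max(axis=1))
    return np.where((la == lb) & (conf >= thresh), la, -1)
```

Requiring agreement between two classifiers filters out the noisiest pseudo labels before they are used to train on unlabeled data.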

AAAI Conference 2020 Conference Paper

Lifelong Spectral Clustering

  • Gan Sun
  • Yang Cong
  • Qianqian Wang
  • Jun Li
  • Yun Fu

Over the past decades, spectral clustering (SC) has become one of the most effective clustering algorithms. However, most previous studies focus on spectral clustering with a fixed task set, and cannot incorporate a new spectral clustering task without access to previously learned tasks. In this paper, we explore spectral clustering in a lifelong machine learning framework, i.e., Lifelong Spectral Clustering (L2SC). Its goal is to efficiently learn a model for a new spectral clustering task by selectively transferring previously accumulated experience from a knowledge library. Specifically, the knowledge library of L2SC contains two components: 1) an orthogonal basis library, capturing latent cluster centers among the clusters in each pair of tasks; and 2) a feature embedding library, embedding the feature manifold information shared among multiple related tasks. As a new spectral clustering task arrives, L2SC first transfers knowledge from both the basis library and the feature library to obtain an encoding matrix, and further redefines the library bases over time to maximize performance across all clustering tasks. Meanwhile, a general online update formulation is derived to alternately update the basis library and the feature library. Finally, empirical experiments on several real-world benchmark datasets demonstrate that our L2SC model effectively improves clustering performance compared with other state-of-the-art spectral clustering algorithms.

AAAI Conference 2020 Conference Paper

Robust Low-Rank Discovery of Data-Driven Partial Differential Equations

  • Jun Li
  • Gan Sun
  • Guoshuai Zhao
  • Li-wei H. Lehman

Partial differential equations (PDEs) are essential foundations for modeling dynamic processes in the natural sciences. Discovering the underlying PDEs of complex data collected from the real world is key to understanding the dynamic processes of natural laws and behaviors. However, both the collected data and their partial derivatives are often corrupted by noise, especially sparse outlying entries, due to measurement/process noise in real-world applications. Our work is motivated by the observation that the underlying data modeled by PDEs are in fact often low rank. We thus develop a robust low-rank discovery framework that recovers both the low-rank data and the sparse outlying entries by integrating double low-rank and sparse recoveries with a (group) sparse regression method, implemented as a minimization problem using mixed nuclear norms with ℓ1 and ℓ0 norms. We propose a low-rank sequential (grouped) threshold ridge regression algorithm to solve the minimization problem. Results from experiments on seven canonical models (i.e., four PDEs and three parametric PDEs) verify that our framework outperforms state-of-the-art sparse and group sparse regression methods. Code is available at https://github.com/junli2019/Robust-Discovery-of-PDEs
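The sparse-regression core of this line of work, sequential threshold ridge regression (STRidge), fits nicely in a short sketch. This is the plain version, without the paper's additional low-rank recovery; `Theta` is a candidate-term library evaluated on the data and `u_t` the time derivative:

```python
import numpy as np

def stridge(Theta, u_t, lam=1e-3, tol=0.1, iters=10):
    """Sequential threshold ridge regression: solve a ridge problem,
    zero out small coefficients, and refit on the surviving terms.
    A minimal version of the sparse-regression step in data-driven
    PDE discovery (not the paper's full robust low-rank model)."""
    n = Theta.shape[1]
    w = np.linalg.lstsq(Theta.T @ Theta + lam * np.eye(n),
                        Theta.T @ u_t, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(w) < tol
        w[small] = 0.0                      # hard-threshold weak terms
        big = ~small
        if not big.any():
            break
        A = Theta[:, big]                   # refit on surviving terms only
        w[big] = np.linalg.lstsq(A.T @ A + lam * np.eye(big.sum()),
                                 A.T @ u_t, rcond=None)[0]
    return w
```

On data generated by a single active term, the thresholding drives every spurious coefficient exactly to zero while the refit recovers the true coefficient.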

AAAI Conference 2020 Conference Paper

Visual Tactile Fusion Object Clustering

  • Tao Zhang
  • Yang Cong
  • Gan Sun
  • Qianqian Wang
  • Zhenming Ding

Object clustering, which aims to group similar objects into one cluster with an unsupervised strategy, has been extensively studied across various data-driven applications. However, most existing state-of-the-art object clustering methods (e.g., single-view or multi-view clustering methods) only explore visual information while ignoring one of the most important sensing modalities, i.e., tactile information, which can help capture different object properties and further boost the performance of the object clustering task. To effectively exploit both visual and tactile modalities for object clustering, in this paper we propose a deep Auto-Encoder-like Non-negative Matrix Factorization framework for visual-tactile fusion clustering. Specifically, deep matrix factorization constrained by an under-complete Auto-Encoder-like architecture is employed to jointly learn a hierarchical expression of the visual-tactile fusion data and preserve the local structure of the data-generating distributions of the visual and tactile modalities. Meanwhile, a graph regularizer is introduced to capture the intrinsic relations of data samples within each modality. Furthermore, we propose a modality-level consensus regularizer to align the visual and tactile data in a common subspace, mitigating the gap between the two modalities. For model optimization, we present an efficient alternating minimization strategy. Finally, we conduct extensive experiments on public datasets to verify the effectiveness of our framework.

ICRA Conference 2019 Conference Paper

Environment Driven Underwater Camera-IMU Calibration for Monocular Visual-Inertial SLAM

  • Changjun Gu
  • Yang Cong
  • Gan Sun

Most state-of-the-art underwater vision systems are calibrated manually in shallow water and then used in the open sea without re-calibration. However, the refractive index of water changes with salinity, temperature, depth, and other underwater environmental indexes, which inevitably generates calibration errors and induces inaccuracies, e.g., in underwater Simultaneous Localization and Mapping (SLAM). To address this issue, we propose a new underwater Camera-Inertial Measurement Unit (IMU) calibration model that needs to be calibrated only once in air; both the intrinsic parameters and the extrinsic parameters between the camera and the IMU can then be automatically computed from the environmental indexes. To the best of our knowledge, this is the first work to consider underwater Camera-IMU calibration via environmental indexes. We also build a verification platform to validate the effectiveness of our proposed method in real experiments, and use it for underwater monocular Visual-Inertial SLAM.
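To see why in-air intrinsics drift underwater, consider the standard small-angle flat-port approximation: the effective focal length scales with the water's refractive index. This is a toy stand-in for the paper's environment-driven model, in which the refractive index itself would be derived from salinity, temperature, and depth:

```python
def underwater_focal(f_air, n_water=1.333):
    """Scale an in-air focal length by the refractive index of water:
    the small-angle flat-port approximation. The default n_water is
    the nominal value for fresh water; in the paper's setting it
    would vary with salinity, temperature, and depth."""
    return f_air * n_water
```

Even the ~1% swing of the refractive index across ocean conditions shifts the effective focal length by several pixels, which is enough to corrupt visual-inertial SLAM if calibration is treated as fixed.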

IJCAI Conference 2019 Conference Paper

LRDNN: Local-refining based Deep Neural Network for Person Re-Identification with Attribute Discerning

  • Qinqin Zhou
  • Bineng Zhong
  • Xiangyuan Lan
  • Gan Sun
  • Yulun Zhang
  • Mengran Gou

Recently, pose and attribute information has been widely used to solve the person re-identification (re-ID) problem. However, inaccurate output from pose or attribute modules can impair final re-ID performance. Since re-ID, pose estimation, and attribute recognition are all based on person appearance information, we propose a Local-refining based Deep Neural Network (LRDNN) that aggregates pose estimation and attribute recognition to improve re-ID performance. To this end, we add a pose branch to extract local spatial information and optimize the whole network on both person identity and attribute objectives. To diminish the negative effect of unstable pose estimation, a novel structure called the channel parse block (CPB) is introduced to learn weights over different feature channels in the pose branch. The two branches are then combined with compact bilinear pooling. Experimental results on the Market1501 and DukeMTMC-reID datasets illustrate the effectiveness of the proposed method.

AAAI Conference 2018 Conference Paper

Active Lifelong Learning With “Watchdog”

  • Gan Sun
  • Yang Cong
  • Xiaowei Xu

Lifelong learning aims to learn consecutive new tasks based on previously accumulated experience, i.e., a knowledge library. However, the knowledge among different incoming tasks is imbalanced. Therefore, in this paper, we mimic an effective "human cognition" strategy by actively sorting the importance of incoming tasks in an unknown-to-known process and preferentially selecting the more informative, important tasks to learn. To achieve this, we cast assessing the importance of an incoming task, i.e., whether it is unknown or not, as an outlier detection problem, and design a hierarchical dictionary learning model consisting of two-level task descriptors to sparsely reconstruct each task under an ℓ0-norm constraint. Incoming tasks are sorted by their sparse reconstruction scores in descending order, and tasks with high reconstruction scores are permitted to pass; this mechanism is called the "watchdog". Next, the knowledge library of the lifelong learning framework encodes the selected task by transferring previous knowledge, and then automatically updates itself with knowledge from both previously learned tasks and the current task. For model optimization, the alternating direction method is employed to solve our model, and it converges to a fixed point. Extensive experiments on both benchmark datasets and our own dataset demonstrate the effectiveness of our proposed model, especially in task selection and dictionary learning.