Arrow Research

Author name cluster

Xiaotang Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
2 author rows

Possible papers (5)

ICML Conference 2025 Conference Paper

CSTrack: Enhancing RGB-X Tracking via Compact Spatiotemporal Features

  • Xiaokun Feng
  • Dailing Zhang
  • Shiyu Hu
  • Xuchen Li 0001
  • Meiqi Wu
  • Jing Zhang 0110
  • Xiaotang Chen
  • Kaiqi Huang

Effectively modeling and utilizing spatiotemporal features from RGB and other modalities (e.g., depth, thermal, and event data, denoted as X) is the core of RGB-X tracker design. Existing methods often employ two parallel branches to separately process the RGB and X input streams, requiring the model to simultaneously handle two dispersed feature spaces, which complicates both the model structure and computation process. More critically, intra-modality spatial modeling within each dispersed space incurs substantial computational overhead, limiting resources for inter-modality spatial modeling and temporal modeling. To address this, we propose a novel tracker, CSTrack, which focuses on modeling Compact Spatiotemporal features to achieve simple yet effective tracking. Specifically, we first introduce an innovative Spatial Compact Module that integrates the RGB-X dual input streams into a compact spatial feature, enabling thorough intra- and inter-modality spatial modeling. Additionally, we design an efficient Temporal Compact Module that compactly represents temporal features by constructing the refined target distribution heatmap. Extensive experiments validate the effectiveness of our compact spatiotemporal modeling method, with CSTrack achieving new SOTA results on mainstream RGB-X benchmarks. The code and models will be released at: https://github.com/XiaokunFeng/CSTrack.
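
As a rough sketch of this compact-modeling idea, the fragment below fuses paired RGB/X tokens into a single feature sequence before one joint attention pass, and carries a running target-distribution heatmap between frames. The module names, shapes, and refinement rule are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Illustrative PyTorch sketch only; names and shapes are assumptions.
import torch
import torch.nn as nn

class SpatialCompactModule(nn.Module):
    """Fuses RGB and X token streams into one compact sequence so a single
    self-attention pass covers intra- and inter-modality spatial cues."""
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.fuse = nn.Linear(2 * dim, dim)  # merge paired RGB/X tokens
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb_tokens, x_tokens):  # both: (B, N, dim)
        compact = self.fuse(torch.cat([rgb_tokens, x_tokens], dim=-1))
        out, _ = self.attn(compact, compact, compact)
        return self.norm(compact + out)  # compact spatial feature (B, N, dim)

class TemporalCompactModule(nn.Module):
    """Keeps temporal context as a refined target-distribution heatmap
    instead of a second feature stream."""
    def __init__(self, dim=256):
        super().__init__()
        self.to_heatmap = nn.Conv2d(dim, 1, kernel_size=1)

    def forward(self, feat_map, prev_heatmap=None):  # feat_map: (B, dim, H, W)
        heatmap = torch.sigmoid(self.to_heatmap(feat_map))
        if prev_heatmap is not None:  # assumed simple running refinement
            heatmap = 0.5 * (heatmap + prev_heatmap)
        return heatmap  # (B, 1, H, W), carried to the next frame
```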

AAMAS Conference 2025 Conference Paper

Uncertainty-Aware Opponent Modeling for Deep Reinforcement Learning

  • Likun Yang
  • Pei Xu
  • Shiyue Cao
  • Yongjian Ren
  • Xiaotang Chen
  • Kaiqi Huang

The ability to model opponent behavior is essential for autonomous decision-making in multi-agent games. Although stochastic behavior is universal in real-world situations, previous works have struggled to model opponents with high stochasticity, such as humans. The issue arises because stochasticity in opponent behavior introduces significant uncertainty into the opponent modeling process, which existing methods have not adequately addressed. We introduce a novel Uncertainty-Aware Opponent Modeling (UAOM) method that addresses two key sources of uncertainty stemming from the inherent randomness of the opponent's actions. The first pertains to the uncertainty in constructing the opponent model, while the second concerns the uncertainty in applying the model during decision-making. For the first uncertainty, UAOM uses a hybrid behavior modeling module to learn a more powerful opponent-aware representation by ensembling the deterministic and probabilistic models to address both aleatoric and epistemic uncertainties in opponent modeling. For the second uncertainty, UAOM uses an opponent-aware dynamic modeling module to learn a dynamic-aware representation. We further provide a theoretical analysis showing that jointly optimizing our two modules can enhance downstream reinforcement learning performance while ensuring system convergence. We evaluate UAOM in both simulated settings and human-agent interaction scenarios. Our experimental results show that the proposed method significantly enhances performance when facing opponents with varying degrees of stochastic behavior, while efficiently managing the uncertainties introduced by such opponents.
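
A minimal reading of the hybrid behavior modeling idea, under the assumption that each ensemble member predicts a Gaussian over opponent actions: the averaged predicted variance serves as an aleatoric estimate and the disagreement between members as an epistemic estimate. Class names, shapes, and the ensemble size here are hypothetical, not UAOM's actual architecture.

```python
# Hedged sketch; all names, shapes, and the ensemble size are hypothetical.
import torch
import torch.nn as nn

class HybridOpponentModel(nn.Module):
    """Small ensemble of Gaussian action predictors over opponent behavior."""
    def __init__(self, obs_dim, act_dim, hidden=128, n_members=3):
        super().__init__()
        self.members = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, 2 * act_dim))  # mean + log-variance
            for _ in range(n_members)
        ])

    def forward(self, obs):  # obs: (B, obs_dim)
        means, logvars = [], []
        for member in self.members:
            mu, logvar = member(obs).chunk(2, dim=-1)
            means.append(mu)
            logvars.append(logvar)
        means = torch.stack(means)      # (K, B, act_dim)
        logvars = torch.stack(logvars)  # (K, B, act_dim)
        mean = means.mean(0)               # ensemble action prediction
        aleatoric = logvars.exp().mean(0)  # avg predicted (data) variance
        epistemic = means.var(0)           # disagreement across members
        return mean, aleatoric, epistemic
```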

NeurIPS Conference 2024 Conference Paper

MemVLT: Vision-Language Tracking with Adaptive Memory-based Prompts

  • Xiaokun Feng
  • Xuchen Li
  • Shiyu Hu
  • Dailing Zhang
  • Meiqi Wu
  • Jing Zhang
  • Xiaotang Chen
  • Kaiqi Huang

Vision-language tracking (VLT) enhances traditional visual object tracking by integrating language descriptions, requiring the tracker to flexibly understand complex and diverse text in addition to visual information. However, most existing vision-language trackers still overly rely on initial fixed multimodal prompts, which struggle to provide effective guidance for dynamically changing targets. Fortunately, the Complementary Learning Systems (CLS) theory suggests that the human memory system can dynamically store and utilize multimodal perceptual information, thereby adapting to new scenarios. Inspired by this, (i) we propose a Memory-based Vision-Language Tracker (MemVLT). By incorporating memory modeling to adjust static prompts, our approach can provide adaptive prompts for tracking guidance. (ii) Specifically, the memory storage and memory interaction modules are designed in accordance with CLS theory. These modules facilitate the storage and flexible interaction between short-term and long-term memories, generating prompts that adapt to target variations. (iii) Finally, we conduct extensive experiments on mainstream VLT datasets (e.g., MGIT, TNL2K, LaSOT and LaSOT$_{ext}$). Experimental results show that MemVLT achieves new state-of-the-art performance. Impressively, it achieves 69.4% AUC on the MGIT and 63.3% AUC on the TNL2K, improving the existing best result by 8.4% and 4.7%, respectively.
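
One way to picture the adaptive prompting described above: a short-term FIFO buffer of recent frame features, a long-term store consolidated from evicted entries, and cross-attention that lets the static prompt read from both. This is a hedged sketch under those assumptions; the buffer sizes, consolidation rule, and module names are illustrative, not MemVLT's actual memory design.

```python
# Hedged sketch; buffer sizes, consolidation rule, and names are assumptions.
import torch
import torch.nn as nn

class MemoryPrompt(nn.Module):
    """Short-term FIFO memory plus a long-term store, read via cross-attention
    to turn a static prompt into an adaptive one."""
    def __init__(self, dim=256, short_len=8, long_len=32, num_heads=8):
        super().__init__()
        self.register_buffer("short_mem", torch.zeros(short_len, dim))
        self.register_buffer("long_mem", torch.zeros(long_len, dim))
        self.interact = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    @torch.no_grad()
    def store(self, frame_feat):  # frame_feat: (dim,)
        evicted = self.short_mem[0].clone()  # oldest short-term entry
        self.short_mem = torch.roll(self.short_mem, -1, dims=0)
        self.short_mem[-1] = frame_feat      # newest perception
        self.long_mem = torch.roll(self.long_mem, -1, dims=0)
        self.long_mem[-1] = evicted          # crude long-term consolidation

    def forward(self, static_prompt):  # static_prompt: (B, P, dim)
        memory = torch.cat([self.short_mem, self.long_mem], dim=0)
        memory = memory.unsqueeze(0).expand(static_prompt.size(0), -1, -1)
        adapted, _ = self.interact(static_prompt, memory, memory)
        return static_prompt + adapted  # adaptive prompt for the tracker
```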

AAAI Conference 2022 Conference Paper

Learning Disentangled Attribute Representations for Robust Pedestrian Attribute Recognition

  • Jian Jia
  • Naiyu Gao
  • Fei He
  • Xiaotang Chen
  • Kaiqi Huang

Although various methods have been proposed for pedestrian attribute recognition, most studies follow the same feature learning mechanism, i.e., learning a shared pedestrian image feature to classify multiple attributes. However, this mechanism leads to low-confidence predictions and non-robustness of the model in the inference stage. In this paper, we investigate why this is the case. We mathematically discover that the central cause is that the optimal shared feature cannot maintain high similarities with multiple classifiers simultaneously in the context of minimizing classification loss. In addition, this feature learning mechanism ignores the spatial and semantic distinctions between different attributes. To address these limitations, we propose a novel disentangled attribute feature learning (DAFL) framework to learn a disentangled feature for each attribute, which exploits the semantic and spatial characteristics of attributes. The framework mainly consists of learnable semantic queries, a cascaded semantic-spatial cross-attention (SSCA) module, and a group attention merging (GAM) module. Specifically, based on learnable semantic queries, the cascaded SSCA module iteratively enhances the spatial localization of attribute-related regions and aggregates region features into multiple disentangled attribute features, used for classification and updating learnable semantic queries. The GAM module splits attributes into groups based on spatial distribution and utilizes reliable group attention to supervise query attention maps. Experiments on PETA, RAPv1, PA100k, and RAPv2 show that the proposed method performs favorably against state-of-the-art methods.
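
The query-based disentangling can be sketched as learnable per-attribute queries that repeatedly cross-attend to image tokens, yielding one feature and one logit per attribute. The layer count, dimensions, and the omission of the group-attention supervision are simplifying assumptions rather than the paper's exact SSCA/GAM design.

```python
# Hedged sketch; layer count and dimensions are assumptions, and the
# group-attention (GAM) supervision is omitted for brevity.
import torch
import torch.nn as nn

class DisentangledAttributeHead(nn.Module):
    """Learnable per-attribute queries cross-attend to image tokens,
    producing one disentangled feature and one logit per attribute."""
    def __init__(self, num_attrs=35, dim=256, num_heads=8, num_layers=2):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_attrs, dim))
        self.cross_attn = nn.ModuleList([
            nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for _ in range(num_layers)
        ])
        self.classifier = nn.Linear(dim, 1)  # one logit per attribute feature

    def forward(self, img_tokens):  # img_tokens: (B, HW, dim)
        q = self.queries.unsqueeze(0).expand(img_tokens.size(0), -1, -1)
        for layer in self.cross_attn:  # cascaded refinement of localization
            attended, _ = layer(q, img_tokens, img_tokens)
            q = q + attended
        logits = self.classifier(q).squeeze(-1)  # (B, num_attrs)
        return logits, q  # per-attribute logits and disentangled features
```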

AAAI Conference 2017 Conference Paper

A Multi-Task Deep Network for Person Re-Identification

  • Weihua Chen
  • Xiaotang Chen
  • Jianguo Zhang
  • Kaiqi Huang

Person re-identification (ReID) focuses on identifying people across different scenes in video surveillance, which is usually formulated as a binary classification task or a ranking task in current person ReID approaches. In this paper, we take both tasks into account and propose a multi-task deep network (MTDnet) that exploits their respective advantages and jointly optimizes the two tasks for person ReID. To the best of our knowledge, we are the first to integrate both tasks in one network to solve person ReID. We show that our proposed architecture significantly boosts the performance. Furthermore, deep architectures in general require sufficient training data, which is usually not available in person ReID. To cope with this situation, we further extend the MTDnet and propose a cross-domain architecture that is capable of using an auxiliary set to assist training on small target sets. In the experiments, our approach outperforms most existing person ReID algorithms on representative datasets including CUHK03, CUHK01, VIPeR, iLIDS and PRID2011, which clearly demonstrates the effectiveness of the proposed approach.
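
A hedged sketch of the joint formulation: a shared embedding backbone, a binary same/different classifier on feature pairs for the classification task, and a triplet margin loss on the embeddings for the ranking task, combined with a placeholder weight. The backbone and loss weighting below are stand-ins, not MTDnet's architecture.

```python
# Hedged sketch; the backbone and the loss weighting are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskReID(nn.Module):
    """Shared embedding backbone with a pairwise same/different classifier
    (classification task) alongside triplet embeddings (ranking task)."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.backbone = nn.Sequential(  # stand-in for a CNN feature extractor
            nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.pair_cls = nn.Linear(2 * feat_dim, 2)  # same vs. different person

    def forward(self, anchor, positive, negative):
        fa, fp, fn_ = map(self.backbone, (anchor, positive, negative))
        pair_logits = self.pair_cls(torch.cat([fa, fp], dim=-1))
        return fa, fp, fn_, pair_logits

def multi_task_loss(fa, fp, fn_, pair_logits, margin=0.3, w=0.5):
    rank = F.triplet_margin_loss(fa, fp, fn_, margin=margin)  # ranking task
    same = torch.ones(fa.size(0), dtype=torch.long, device=fa.device)
    cls = F.cross_entropy(pair_logits, same)  # (fa, fp) is a positive pair
    return w * rank + (1 - w) * cls  # joint optimization of both tasks
```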