Arrow Research

Author name cluster

Xinglong Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers
1 author row

Possible papers (2)

IROS 2024 · Conference Paper

GroupTrack: Multi-Object Tracking by Using Group Motion Patterns

  • Xinglong Xu
  • Weihong Ren
  • Gan Sun
  • Haoyu Ji 0001
  • Yu Gao 0010
  • Honghai Liu 0001

The main challenge of Multi-Object Tracking (MOT) lies in maintaining a distinctive identity for each target in dense crowds or occluded scenarios. Although existing methods have achieved significant progress by using robust object detectors or complex association strategies, they cannot effectively handle long-term tracking because they model motion or appearance individually for each single target. In this paper, we propose GroupTrack, a novel 2D MOT tracker that learns a reliable motion state for each target using group motion patterns. Specifically, for each tracklet, we first choose its neighboring tracklets to form a group of motion patterns, which provides informative clues for the motion estimation of the current tracklet. Then, we apply the group motion patterns to perform tracklet prediction and data association. By integrating priors from neighboring motion patterns into the data association process, GroupTrack provides a new paradigm for target motion modeling in extremely crowded and occluded scenarios. Through extensive experiments on the public MOT17 and MOT20 datasets, we demonstrate the effectiveness of our approach in challenging scenarios and show state-of-the-art performance on various MOT metrics.
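The core idea of the abstract, using neighboring tracklets' motion as a prior when a target's own motion cue is unreliable, can be illustrated with a minimal sketch. This is not the authors' implementation: the functions `group_predict` and `associate`, the neighbor count `k`, and the blending weight `alpha` are all hypothetical stand-ins for whatever the paper actually uses.

```python
# Minimal sketch (assumptions, not GroupTrack's code): predict each tracklet's
# next position by blending its own velocity with the mean velocity of its
# k nearest neighbors, then associate predictions with detections.
import numpy as np
from scipy.optimize import linear_sum_assignment

def group_predict(positions, velocities, k=3, alpha=0.5):
    """positions, velocities: (N, 2) arrays, last known state per tracklet.
    Each tracklet forms a group from its k nearest neighbors and uses the
    group's mean velocity as a motion prior (alpha blends own vs. group)."""
    n = len(positions)
    preds = np.empty_like(positions)
    dists = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    for i in range(n):
        neighbors = np.argsort(dists[i])[1:k + 1]  # nearest others, self excluded
        group_vel = velocities[neighbors].mean(axis=0) if len(neighbors) else velocities[i]
        blended = alpha * velocities[i] + (1 - alpha) * group_vel
        preds[i] = positions[i] + blended
    return preds

def associate(predictions, detections):
    """Match predicted tracklet positions to new detections (Hungarian)."""
    cost = np.linalg.norm(predictions[:, None] - detections[None, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))

# Toy example: the third target has a stale (occluded) velocity estimate,
# but its neighbors' shared motion still carries its prediction forward.
pos = np.array([[0., 0.], [1., 0.], [2., 0.]])
vel = np.array([[1., 0.], [1., 0.], [0., 0.]])
dets = np.array([[1.1, 0.], [2.0, 0.1], [2.9, 0.]])
print(associate(group_predict(pos, vel), dets))
```

In this toy setup the occluded target is still matched to the correct detection because the group prior compensates for its stale individual velocity, which is the failure mode of per-target motion modeling the abstract points at.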

IROS 2024 · Conference Paper

MLPER: Multi-Level Prompts for Adaptively Enhancing Vision-Language Emotion Recognition

  • Yu Gao 0010
  • Weihong Ren
  • Xinglong Xu
  • Yan Wang
  • Zhiyong Wang 0009
  • Honghai Liu 0001

In the field of robotics, vision-based Emotion Recognition (ER) has achieved significant progress, but it still faces the challenge of poor generalization under unconstrained conditions (e.g., occlusions and pose variations). In this work, we propose the MLPER model, which introduces a Vision-Language Model for Emotion Recognition to learn discriminative representations adaptively. Specifically, rather than relying on a typical hand-crafted prompt (e.g., "a photo of a [class] person"), we first establish Multi-Level Prompts covering three aspects: facial expression, human posture, and situational condition, generated with large language models such as ChatGPT. Correspondingly, we extract visual tokens at three levels: the face, the body, and the context. Further, to achieve fine-grained alignment at each level, we use textual tokens from the positive prompt and a hard negative prompt to query the visual tokens, predicting whether an image-text pair is matched. Experimental results demonstrate that our MLPER model outperforms state-of-the-art methods on several ER benchmarks, especially under occlusions and pose variations.
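The abstract's matching scheme, scoring each of the three levels against a positive and a hard-negative prompt and fusing the results, can be sketched as below. This is an illustrative assumption, not the paper's method: the embeddings are random stand-ins for encoder outputs, and `level_score`, `mlper_score`, and the fusion weights are hypothetical names and values.

```python
# Minimal sketch (assumptions, not MLPER's implementation): score an image
# against positive and hard-negative prompts at three levels (face, body,
# context) with CLIP-style cosine similarity, then fuse the level scores.
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def level_score(visual_token, pos_text, neg_text):
    """Fine-grained matching at one level: the margin between the positive
    prompt and the hard negative decides how well image and text match."""
    return cosine(visual_token, pos_text) - cosine(visual_token, neg_text)

def mlper_score(visual_tokens, pos_texts, neg_texts, weights=(0.5, 0.3, 0.2)):
    """All three arguments are dicts keyed by 'face'/'body'/'context',
    holding embedding vectors; weights fuse the per-level margins."""
    levels = ("face", "body", "context")
    return sum(w * level_score(visual_tokens[l], pos_texts[l], neg_texts[l])
               for w, l in zip(weights, levels))

# Random embeddings stand in for the image encoder and the LLM-generated
# prompts (e.g., a "happy" description per level vs. a confusable emotion).
rng = np.random.default_rng(0)
levels = ("face", "body", "context")
vis = {l: rng.normal(size=512) for l in levels}
pos = {l: rng.normal(size=512) for l in levels}
neg = {l: rng.normal(size=512) for l in levels}
print(f"fused match score: {mlper_score(vis, pos, neg):+.3f}")
```

The positive-minus-negative margin is one plausible reading of "predicting whether a pair of image and text is matched"; the weighted sum simply reflects that the face level is usually the strongest emotion cue, with body and context compensating when the face is occluded.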