Author name cluster

Su Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers

1 author row

EAAI Journal 2026 Journal Article

Text-based three-dimensional geometric person retrieval

Fanzhi Jiang
Kexin Wang
Hanchi Ren
Yiming Li
Liumei Zhang
Yuanjiao Hu
Xianghua Xie
Su Yang

Person re-identification (Re-ID) is crucial in computer vision and is widely applied in forensic investigation, intelligent surveillance, and video retrieval. Recent text-based Re-ID methods leverage eyewitness descriptions to enhance retrieval flexibility but still face challenges in accurately characterizing individuals under complex conditions. To address issues like low resolution, viewpoint variations, and occlusions, this paper proposes a novel text-based person Re-ID approach that integrates textual descriptions with synthesized three-dimensional (3D) geometric pedestrian data derived from existing two-dimensional (2D) images. Specifically, the semantic richness of text compensates for the lack of color and texture details in 3D data, while the robustness of geometric and pose information significantly enhances retrieval performance. Although the current 3D pedestrian data are generated by reconstruction algorithms, this work serves as a pioneering exploration of text-to-3D pedestrian retrieval, offering substantial potential for real-world applications in multimodal biometrics, forensic investigations, and privacy protection. Experiments on three public datasets demonstrate that our method achieves competitive performance, confirming its practical applicability and significance.


TIST Journal 2023 Journal Article

Customer Volume Prediction Using Fusion of Shared-private Dynamic Weighting over Multiple Modalities

Wenshan Wang
Su Yang
Weishan Zhang

Customer volume prediction is crucial for a variety of urban applications, such as store location selection. Given the massive amount of data now accessible, for example spatio-temporal data and satellite images, the key challenge lies in how to fuse multiple modalities from different data sources. In this article, we investigate three dynamic weighting ensemble learning models that fuse spatio-temporal features and visual features to predict customer volume in the urban commercial district of interest. Specifically, we propose a shared-private dynamic weighting model that incorporates graph neural networks to capture geographic relationships (i.e., competitiveness or dependencies) between urban commercial districts in an end-to-end manner. To the best of our knowledge, it is the first work to utilize graph neural networks to model such geographic relationships. We conduct a series of experiments on two real datasets to demonstrate the effectiveness of the proposed models. Furthermore, an elaborate visualization method is applied for knowledge discovery.


AAAI Conference 2023 Conference Paper

Mining and Applying Composition Knowledge of Dance Moves for Style-Concentrated Dance Generation

Xinjian Zhang
Su Yang
Yi Xu
Weishan Zhang
Longwen Gao

Choreography refers to the creation of dance motions according to both music and dance knowledge, where the created dances should be style-specific and consistent. However, most existing methods generate dances using the given music as the only reference, lacking the stylized dancing knowledge, namely, the flag motion patterns contained in different styles. Without this stylized prior knowledge, these approaches are unlikely to generate a controllable style or diverse moves for each dance style, or new dances that comply with stylized knowledge. To address this issue, we propose a novel music-to-dance generation framework guided by style embedding, considering both input music and stylized dancing knowledge. These style embeddings are learnt representations of style-consistent kinematic abstraction of reference dance videos, which can act as controllable factors to impose style constraints on dance generation in a latent manner. Hence, we can make the style embedding fit into any given style while allowing the flexibility to generate new compatible dance moves by modifying the style embedding according to the learnt representations of a certain style. We are the first to achieve knowledge-driven style control in dance generation tasks. To support this study, we build a large multi-style music-to-dance dataset referred to as I-Dance. The qualitative and quantitative evaluations demonstrate the advantage of the proposed framework, as well as the ability to synthesize diverse moves under a dance style directed by style embedding.


AAAI Conference 2023 Conference Paper

Video Compression Artifact Reduction by Fusing Motion Compensation and Global Context in a Swin-CNN Based Parallel Architecture

Xinjian Zhang
Su Yang
Wuyang Luo
Longwen Gao
Weishan Zhang

Video Compression Artifact Reduction aims to reduce the artifacts caused by video compression algorithms and improve the quality of compressed video frames. The critical challenge in this task is to make use of the redundant high-quality information in compressed frames for compensation as much as possible. Two important sources of compensation, motion compensation and global context, are not comprehensively considered in previous works, leading to inferior results. The key idea of this paper is to fuse motion compensation and global context to gain more compensation information and improve the quality of compressed videos. Here, we propose a novel Spatio-Temporal Compensation Fusion (STCF) framework with the Parallel Swin-CNN Fusion (PSCF) block, which can simultaneously learn and merge the motion compensation and global context to reduce video compression artifacts. Specifically, a temporal self-attention strategy based on shifted windows is developed to capture the global context efficiently, for which we use the Swin transformer layer in the PSCF block. Moreover, an additional Ada-CNN layer is applied in the PSCF block to extract the motion compensation. Experimental results demonstrate that our proposed STCF framework outperforms the state-of-the-art methods by up to 0.23 dB (27% improvement) on the MFQEv2 dataset.


EAAI Journal 2022 Journal Article

Federated learning-based vertebral body segmentation

Junxiu Liu
Xiuhao Liang
Rixing Yang
Yuling Luo
Hao Lu
Liangjia Li
Shunsheng Zhang
Su Yang

To ensure the safety of spinal surgery, sufficient labeled Magnetic Resonance Imaging (MRI) images are essential for training an accurate vertebral segmentation model, but the number of labeled MRI images owned by an independent medical institution such as a hospital is generally limited. Besides, in consideration of patients’ privacy, annotated images are difficult to share directly as medical data for training vertebral body segmentation models. To address these challenges, a Federated Learning-based Vertebral Body Segment Framework (FLVBSF) is proposed in this work, which includes a novel local Dual Attention Gates (DAGs)-based attention mechanism and a global federated learning framework. The DAGs improve the model's sensitivity to vertebral body pixels and its segmentation accuracy. The global federated learning framework boosts the performance of vertebral body segmentation models by collaboratively exploiting the labeled spine image data from different institutions. The centralized training-based experimental results show that the U-Net with DAGs achieves 98.29% pixel-level accuracy, 88.04% Dice similarity coefficient, 88.25% sensitivity, 99.16% specificity, and 79.09% Jaccard similarity coefficient, with a mean segmentation time per case of 0.14 s. Meanwhile, the federated learning-based experimental results show that the proposed FLVBSF can enhance the performance of the vertebral segmentation model by a statistically significant margin.
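
The five metrics reported in the abstract above have standard definitions in terms of per-pixel true and false positives and negatives. The following is a minimal sketch of those textbook definitions, not the authors' evaluation code; the function name `segmentation_metrics` and the flat 0/1 mask representation are assumptions made for illustration.

```python
from typing import Dict, Sequence


def segmentation_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Compute standard binary segmentation metrics from flat 0/1 masks.

    Hypothetical helper: `pred` and `truth` hold one label per pixel,
    where 1 marks a vertebral-body pixel and 0 marks background.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")

    # Confusion-matrix counts over all pixels.
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 for an empty denominator instead of raising.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "jaccard": ratio(tp, tp + fp + fn),
    }


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1, 0]
    reference = [1, 0, 0, 0, 1, 1]
    for name, value in segmentation_metrics(predicted, reference).items():
        print(f"{name}: {value:.4f}")
```

Because background pixels usually dominate an MRI slice, pixel accuracy and specificity tend to sit well above the Dice and Jaccard scores, which ignore true negatives.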


IJCAI Conference 2021 Conference Paper

Reinforcement Learning Based Sparse Black-box Adversarial Attack on Video Recognition Models

Zeyuan Wang
Chaofeng Sha
Su Yang

We explore the black-box adversarial attack on video recognition models. Attacks are only performed on selected key regions and key frames to reduce the high computation cost of searching adversarial perturbations on a video due to its high dimensionality. To select key frames, one way is to use heuristic algorithms to evaluate the importance of each frame and choose the essential ones. However, this is time-inefficient because of the sorting and searching involved. In order to speed up the attack process, we propose a reinforcement learning based frame selection strategy. Specifically, the agent explores the difference between the original class and the target class of videos to make selection decisions. It receives rewards from threat models which indicate the quality of the decisions. Besides, we also use saliency detection to select key regions and only estimate the sign of the gradient instead of the gradient itself in zeroth-order optimization to further speed up the attack process. We can use the trained model directly in the untargeted attack or with a little fine-tuning in the targeted attack, which saves computation time. A range of empirical results on real datasets demonstrate the effectiveness and efficiency of the proposed method.
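
The idea of estimating only the sign of the gradient from black-box queries, restricted to selected positions, can be illustrated with a small sketch. This is a generic coordinate-wise finite-difference scheme written under stated assumptions, not the paper's algorithm: the names `estimate_gradient_sign` and `sign_step`, the boolean `mask` standing in for the selected key frames and regions, and the toy quadratic loss are all hypothetical.

```python
from typing import Callable, List, Sequence


def estimate_gradient_sign(
    loss: Callable[[Sequence[float]], float],
    x: Sequence[float],
    mask: Sequence[bool],
    h: float = 1e-3,
) -> List[int]:
    """Estimate sign(d loss / d x_i) using only loss queries (no gradients).

    Positions where `mask` is False are skipped and get sign 0, which
    mimics perturbing only selected key positions of the input.
    """
    signs = [0] * len(x)
    for i, selected in enumerate(mask):
        if not selected:
            continue
        plus = list(x)
        minus = list(x)
        plus[i] += h
        minus[i] -= h
        # Symmetric difference: only its sign is kept, not its magnitude.
        diff = loss(plus) - loss(minus)
        signs[i] = (diff > 0) - (diff < 0)
    return signs


def sign_step(x: Sequence[float], signs: Sequence[int], step: float) -> List[float]:
    """Move each coordinate by a fixed step against the estimated sign."""
    return [xi - step * s for xi, s in zip(x, signs)]


if __name__ == "__main__":
    # Toy black-box loss with its minimum at x_i = 1 for every coordinate.
    def toy_loss(v: Sequence[float]) -> float:
        return sum((vi - 1.0) ** 2 for vi in v)

    x0 = [0.0, 2.0, 5.0]
    key_positions = [True, True, False]
    s = estimate_gradient_sign(toy_loss, x0, key_positions)
    print(s, sign_step(x0, s, 0.5))
```

Each selected coordinate costs two queries here; a practical attack on video would need a far cheaper estimator, which is why restricting the search to key frames and regions matters.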


EAAI Journal 2021 Journal Article

Spiking neural network-based multi-task autonomous learning for mobile robots

Junxiu Liu
Hao Lu
Yuling Luo
Su Yang

Spiking Neural Networks (SNNs) are the new generation of artificial neural networks that closely mimic the time encoding and information processing aspects of the human brain. In this work, a multi-task autonomous learning paradigm is proposed for the mobile robot application, which employs an SNN to construct the controlling system of the mobile robot. The Reward-modulated Spiking-time-dependent Plasticity learning rule is developed for the SNN-based controller, which aims to achieve the capability of autonomous learning under multiple tasks. Reward signals are generated based on the instantaneous frequencies of pre- and post-synaptic spikes, which adapt to the sensory stimuli and environmental feedback. Meanwhile, inspired by lateral inhibition connections, a task switch mechanism is designed to enable the controller to switch operations between multiple tasks. Two tasks, obstacle avoidance and target tracking, are used for performance evaluation, and the results demonstrate that the mobile robot with the proposed paradigm is able to autonomously learn, switch and complete the tasks.


AAAI Conference 2015 Conference Paper

Crowd Motion Monitoring with Thermodynamics-Inspired Feature

Xinfeng Zhang
Su Yang
Yuan Yan Tang
Weishan Zhang

Crowd motion in surveillance videos is comparable to the heat motion of basic particles. Inspired by that, we introduce Boltzmann Entropy to measure crowd motion in the optical flow field so as to detect abnormal collective behaviors. As a result, the collective crowd moving pattern can be represented as a time series. We found that when most people behave anomalously, the entropy value increases drastically. Thus, a threshold can be applied to the time series to identify abnormal crowd commotion in a simple and efficient manner without machine learning. The experimental results show promising performance compared with the state-of-the-art methods. The system works in real time with high precision.
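
The pipeline described in the abstract above (an entropy value per frame, then a threshold on the resulting time series) can be sketched as follows. This is an illustration under assumptions, not the paper's feature: it uses the textbook Boltzmann form S = ln W with W = N! / (n_1! ... n_m!) and k = 1, and it treats the flow vectors in each direction bin as the particle counts n_i. The binning by direction, the bin count, and the names `boltzmann_entropy`, `direction_histogram`, and `flag_abnormal` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def boltzmann_entropy(counts: Sequence[int]) -> float:
    """Return ln W for W = N! / prod(n_i!), with Boltzmann's constant set to 1."""
    total = sum(counts)
    # lgamma(n + 1) equals ln(n!), which avoids overflow for large counts.
    return math.lgamma(total + 1) - sum(math.lgamma(n + 1) for n in counts)


def direction_histogram(flow: Sequence[Tuple[float, float]], bins: int = 8) -> List[int]:
    """Count optical-flow vectors (dx, dy) per direction bin, skipping still pixels."""
    counts = [0] * bins
    for dx, dy in flow:
        if math.hypot(dx, dy) < 1e-6:
            continue
        angle = math.atan2(dy, dx) % (2 * math.pi)
        index = min(int(angle / (2 * math.pi) * bins), bins - 1)
        counts[index] += 1
    return counts


def flag_abnormal(entropies: Sequence[float], threshold: float) -> List[int]:
    """Return the frame indices whose entropy exceeds the threshold."""
    return [i for i, s in enumerate(entropies) if s > threshold]


if __name__ == "__main__":
    # Orderly frame: everyone moves right. Chaotic frame: eight different directions.
    orderly = [(1.0, 0.0)] * 8
    chaotic = [
        (math.cos(k * math.pi / 4 + math.pi / 8), math.sin(k * math.pi / 4 + math.pi / 8))
        for k in range(8)
    ]
    series = [boltzmann_entropy(direction_histogram(f)) for f in (orderly, orderly, chaotic)]
    print(series, flag_abnormal(series, threshold=5.0))
```

In this toy setup, uniform motion places every vector in one bin and yields zero entropy, while dispersed motion spreads the counts and raises it, so a fixed threshold separates the two without any trained model.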
