Arrow Research search

Author name cluster

Xueming Qian

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
1 author row

Possible papers

8

AAAI Conference 2026 Conference Paper

Whole-Field Action Sensing via Wearable Single-Channel EMG Sensors and Resource-Efficient Motion Network

  • Xuanming Jiang
  • Dingyu Nie
  • Baoyi An
  • Yuzhe Zheng
  • Yichuan Mao
  • Jialie Shen
  • Xueming Qian
  • Zhiwen Jin

The proliferation of collaborative training and multi-person sports has underscored the necessity for concurrent whole-field action sensing. However, Electromyography (EMG) recognition, which plays a pivotal role in Wearable Human Activity Recognition (WHAR) for analyzing muscle activity and decoding action intent, still faces challenges in achieving a balance between performance, cost, and efficiency in multi-person scenarios. Unlike current channel-expansion solutions, we propose a wireless wearable Single-Dimensional Sparse EMG (2SEMG) Sensor for efficient personal sampling. These action-unaffected sensors leverage the proposed lightweight One-Dimensional Motion Network (OMONet) to facilitate concurrent action sensing. Experiments demonstrate that OMONet achieves leading performance and efficiency in action signal recognition, and two real-world badminton matches further confirm the performance, robustness, and real-time efficiency of the whole-field action sensing network constructed via 2SEMG Sensors and OMONet.
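
The paper's OMONet architecture is not described here in enough detail to reproduce; as a rough illustration of the kind of resource-efficient one-dimensional network the abstract refers to, the following Python/PyTorch sketch classifies fixed-length windows of single-channel EMG with depthwise-separable 1D convolutions. The class names, layer sizes, and the 0.5 s / 1 kHz window in the usage example are illustrative assumptions, not values from the paper.

# Minimal sketch (not the paper's OMONet): a lightweight 1D CNN that classifies
# fixed-length windows of single-channel EMG. All names and sizes are illustrative.
import torch
import torch.nn as nn

class DepthwiseSeparableConv1d(nn.Module):
    """Depthwise + pointwise 1D convolution, a common trick for cutting parameters."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1):
        super().__init__()
        self.depthwise = nn.Conv1d(in_ch, in_ch, kernel_size, stride=stride,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv1d(in_ch, out_ch, kernel_size=1)
        self.bn = nn.BatchNorm1d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class TinyEMGNet(nn.Module):
    """Hypothetical resource-efficient classifier for single-channel EMG windows."""
    def __init__(self, num_actions=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, stride=2, padding=3),  # raw EMG -> features
            nn.ReLU(inplace=True),
            DepthwiseSeparableConv1d(16, 32, kernel_size=5, stride=2),
            DepthwiseSeparableConv1d(32, 64, kernel_size=5, stride=2),
            nn.AdaptiveAvgPool1d(1),                                # global temporal pooling
        )
        self.classifier = nn.Linear(64, num_actions)

    def forward(self, x):          # x: (batch, 1, samples)
        h = self.features(x).squeeze(-1)
        return self.classifier(h)  # per-window action logits

# Example: classify a batch of eight 0.5 s windows sampled at 1 kHz.
logits = TinyEMGNet()(torch.randn(8, 1, 500))
print(logits.shape)  # torch.Size([8, 10])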

AAAI Conference 2025 Conference Paper

M3Net: Efficient Time-Frequency Integration Network with Mirror Attention for Audio Classification on Edge

  • Xuanming Jiang
  • Baoyi An
  • Guoshuai Zhao
  • Xueming Qian

Audio classification plays a crucial role in fields such as human-machine interaction and intelligent robotics. However, high-performance audio classification systems typically demand significant computational and storage resources, posing substantial challenges when deploying to resource-constrained edge devices that urgently need such capabilities. To achieve a new balance between model complexity and performance, we introduce a novel multi-view method for extracting and utilizing separated time-frequency features, realized within the proposed Mini Mirror Multi-View Network (M3Net) as the Mirror Attention mechanism. M3Net enables reversible spatial transformation of spectral features and efficiently leverages robust local and global features in both the time and frequency domains with a low parameter budget. Experiments based on Mel-spectrograms, without data augmentation or pre-training, indicate that M3Net can achieve classification accuracy over 97% on the UrbanSound8K and SpeechCommandsV2 datasets with only 0.03 million parameters. The contribution of each functional segment in M3Net is fully verified and explained in the ablation experiments.
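
The published Mirror Attention mechanism is not reproduced here; the sketch below only illustrates the general idea of treating the time and frequency axes of a spectrogram as two separate views, attending over each, and fusing the results. Module names, channel counts, and the residual fusion are illustrative assumptions.

# Minimal sketch (not the published Mirror Attention): attend over the time axis and
# the frequency axis of a spectrogram separately, then fuse. Names are illustrative.
import torch
import torch.nn as nn

class TimeFreqAttention(nn.Module):
    def __init__(self, channels, heads=2):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.freq_attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, spec):                     # spec: (batch, channels, freq, time)
        b, c, f, t = spec.shape
        # Attend across time: every (batch, freq) row becomes a length-t sequence.
        xt = spec.permute(0, 2, 3, 1).reshape(b * f, t, c)
        xt, _ = self.time_attn(xt, xt, xt)
        xt = xt.reshape(b, f, t, c).permute(0, 3, 1, 2)
        # Attend across frequency: every (batch, time) column becomes a length-f sequence.
        xf = spec.permute(0, 3, 2, 1).reshape(b * t, f, c)
        xf, _ = self.freq_attn(xf, xf, xf)
        xf = xf.reshape(b, t, f, c).permute(0, 3, 2, 1)
        return spec + xt + xf                    # residual fusion of both views

# Example: a batch of 8 spectrogram feature maps (16 channels, 64 mel bins, 100 frames).
out = TimeFreqAttention(channels=16)(torch.randn(8, 16, 64, 100))
print(out.shape)  # torch.Size([8, 16, 64, 100])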

AAAI Conference 2024 Conference Paper

Decoupling Degradations with Recurrent Network for Video Restoration in Under-Display Camera

  • Chengxu Liu
  • Xuan Wang
  • Yuanting Fan
  • Shuai Li
  • Xueming Qian

Under-display camera (UDC) systems are the foundation of full-screen display devices, in which the lens is mounted under the display. The pixel array of light-emitting diodes used for display diffracts and attenuates incident light, causing various degradations as the light intensity changes. Unlike general video restoration, which recovers video by treating different degradation factors equally, video restoration for UDC systems is more challenging in that it must remove diverse degradations over time while preserving temporal consistency. In this paper, we introduce a novel video restoration network, called D2RNet, specifically designed for UDC systems. It employs a set of Decoupling Attention Modules (DAM) that effectively separate the various video degradation factors. More specifically, a soft mask generation function is proposed to decompose each frame into flare and haze based on the diffraction arising from incident light of different intensities, followed by the proposed flare and haze removal components that leverage long- and short-term feature learning to handle the respective degradations. Such a design offers a targeted and effective solution to eliminating the various types of degradation in UDC systems. We further extend our design to multiple scales to handle the changes in degradation scale that often occur in long videos. To demonstrate the superiority of D2RNet, we propose a large-scale UDC video benchmark by gathering HDR videos and generating realistically degraded videos using the point spread function measured by a commercial UDC system. Extensive quantitative and qualitative evaluations demonstrate the superiority of D2RNet compared to other state-of-the-art video restoration and UDC image restoration methods.
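
As a loose illustration of the soft-mask decoupling idea (not the paper's DAM), the sketch below splits a UDC frame into a bright, flare-dominated part and a remaining haze-dominated part using an intensity-based soft mask; the threshold and sharpness values are arbitrary placeholders.

# Minimal sketch (not the paper's DAM): split a UDC frame into "flare-like" and
# "haze-like" components with a soft, intensity-based mask. The sigmoid threshold
# and sharpness below are illustrative, not values from the paper.
import torch

def soft_decouple(frame, threshold=0.8, sharpness=20.0):
    """frame: (batch, 3, H, W) in [0, 1]. Returns (flare_part, haze_part)."""
    luminance = frame.mean(dim=1, keepdim=True)                # rough per-pixel intensity
    mask = torch.sigmoid(sharpness * (luminance - threshold))  # ~1 where very bright
    flare_part = frame * mask          # bright, saturated regions dominated by flare
    haze_part = frame * (1.0 - mask)   # remaining regions dominated by haze/blur
    return flare_part, haze_part

flare, haze = soft_decouple(torch.rand(2, 3, 128, 128))
print(flare.shape, haze.shape)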

IJCAI Conference 2023 Conference Paper

A Diffusion Model with Contrastive Learning for ICU False Arrhythmia Alarm Reduction

  • Feng Wu
  • Guoshuai Zhao
  • Xueming Qian
  • Li-wei H. Lehman

The high rate of false arrhythmia alarms in intensive care units (ICUs) can negatively impact patient care and lead to slow staff response time due to alarm fatigue. To reduce false alarms in ICUs, previous works proposed conventional supervised learning methods, which have inherent limitations in dealing with high-dimensional, sparse, unbalanced, and limited data. We propose a deep generative approach based on a conditional denoising diffusion model to detect false arrhythmia alarms in ICUs. Conditioning on a patient's past waveform data, our approach generates predictions of the patient's waveform during an actual arrhythmia event and uses the distance between the generated and the observed samples to classify the alarm. We design a network with residual links and a self-attention mechanism to capture long-term dependencies in signal sequences, and leverage a contrastive learning mechanism to maximize the distance between true and false arrhythmia alarms. We demonstrate the effectiveness of our approach on the MIMIC II arrhythmia dataset for detecting false alarms in both retrospective and real-time settings.
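
The decision rule described above can be sketched independently of the diffusion model itself. The Python snippet below assumes a conditional generator generate(past_waveform, n_samples) already exists and only shows the distance-based classification step; the L2 distance and the threshold value are illustrative choices, not the paper's.

# Minimal sketch of the distance-based decision rule, assuming a conditional
# generator `generate(past_waveform, n_samples)` is available (the paper uses a
# conditional denoising diffusion model; that part is not shown here).
import numpy as np

def classify_alarm(observed, past_waveform, generate, n_samples=16, threshold=1.0):
    """Return True if the alarm is judged a true arrhythmia, False otherwise.

    observed:       waveform recorded during the alarm, shape (T,)
    past_waveform:  the patient's waveform before the alarm, used for conditioning
    generate:       callable producing n_samples predicted waveforms of shape (n, T)
    threshold:      illustrative cut-off on the average L2 distance
    """
    predictions = generate(past_waveform, n_samples)            # (n_samples, T)
    distances = np.linalg.norm(predictions - observed[None, :], axis=1)
    # If the observed waveform is close to the generated "arrhythmia-like"
    # predictions, treat the alarm as true; otherwise flag it as false.
    return distances.mean() <= threshold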

IS Journal 2023 Journal Article

A Multiview Text Imagination Network Based on Latent Alignment for Image-Text Matching

  • Heng Shang
  • Guoshuai Zhao
  • Jing Shi
  • Xueming Qian

In image-text matching, one of the keys to improving performance is to extract features with more semantic information. Existing works demonstrate that semantic enrichment through knowledge expansion can improve performance. Most of them expand image features; however, the shortage of semantic information in the text modality and the unilateral character of the view often bottleneck the performance of image-text matching models. To address these two problems, we aggregate knowledge from multiple views and propose a word imagination graph (WIG). A WIG expands textual semantic information by imagination based on the input images. Building on the WIG, we construct a novel multiview text imagination network (MTIN). MTIN enables latent alignment of images and texts on tags, which assists matching at the semantic level. Results on the Flickr30K and MS-COCO datasets demonstrate the effectiveness of our method. The source code has been released on GitHub: https://github.com/smileslabsh/Multiview-Text-Imagination-Network.
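
The abstract's "latent alignment of images and texts on tags" can be illustrated with a very small sketch (not the published MTIN): both modalities are projected onto a shared tag vocabulary, and a simple loss pulls the two tag activations together for matched pairs. The feature dimensions, vocabulary size, and loss choice are illustrative assumptions.

# Minimal sketch of tag-level latent alignment (not the published MTIN).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TagAlignment(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=1024, num_tags=1000):
        super().__init__()
        self.img_head = nn.Linear(img_dim, num_tags)   # image feature -> tag activations
        self.txt_head = nn.Linear(txt_dim, num_tags)   # text feature  -> tag activations

    def forward(self, img_feat, txt_feat):
        img_tags = torch.sigmoid(self.img_head(img_feat))   # (B, num_tags) in [0, 1]
        txt_tags = torch.sigmoid(self.txt_head(txt_feat))
        # Alignment loss: matched image-text pairs should activate the same tags.
        return F.mse_loss(img_tags, txt_tags)

loss = TagAlignment()(torch.randn(4, 2048), torch.randn(4, 1024))
print(loss.item())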

NeurIPS Conference 2022 Conference Paper

Learning a Condensed Frame for Memory-Efficient Video Class-Incremental Learning

  • Yixuan Pei
  • Zhiwu Qing
  • Jun CEN
  • Xiang Wang
  • Shiwei Zhang
  • Yaxiong Wang
  • Mingqian Tang
  • Nong Sang

Recent incremental learning for action recognition usually stores representative videos to mitigate catastrophic forgetting. However, only a few bulky videos can be stored due to the limited memory. To address this problem, we propose FrameMaker, a memory-efficient video class-incremental learning approach that learns to produce a condensed frame for each selected video. Specifically, FrameMaker is mainly composed of two crucial components: Frame Condensing and Instance-Specific Prompt. The former reduces the memory cost by preserving only one condensed frame instead of the whole video, while the latter aims to compensate for the spatio-temporal details lost in the Frame Condensing stage. By this means, FrameMaker enables a remarkable reduction in memory while keeping enough information to be applied to the following incremental tasks. Experimental results on multiple challenging benchmarks, i.e., HMDB51, UCF101, and Something-Something V2, demonstrate that FrameMaker achieves better performance than recent advanced methods while consuming only 20% of the memory. Additionally, under the same memory consumption conditions, FrameMaker significantly outperforms existing state-of-the-art methods by a convincing margin.
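
As a rough sketch of what "learning a condensed frame" could look like (this is not the published FrameMaker, and the Instance-Specific Prompt component is omitted), the snippet below optimizes a single frame so that a frozen recognizer responds to it roughly as it does to the full clip. The recognizer interface, initialization, loss, and hyperparameters are illustrative assumptions.

# Minimal sketch (not the published FrameMaker): learn one "condensed frame" per video
# by matching a frozen recognizer's clip-level logits. The recognizer is assumed to
# accept clips of shape (batch, frames, 3, H, W) and return logits.
import torch
import torch.nn.functional as F

def condense_frame(video, recognizer, steps=200, lr=0.05):
    """video: (T, 3, H, W). Returns a single (3, H, W) frame to keep in memory."""
    with torch.no_grad():
        target = recognizer(video.unsqueeze(0))              # logits for the whole clip
    frame = video.mean(dim=0).clone().requires_grad_(True)   # init with the mean frame
    opt = torch.optim.Adam([frame], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Repeat the single frame into a one-"video" clip and match the clip-level logits.
        clip = frame.unsqueeze(0).unsqueeze(0).expand(1, video.shape[0], -1, -1, -1)
        loss = F.mse_loss(recognizer(clip), target)
        loss.backward()
        opt.step()
    return frame.detach()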

TIST Journal 2019 Journal Article

Personalized Reason Generation for Explainable Song Recommendation

  • Guoshuai Zhao
  • Hao Fu
  • Ruihua Song
  • Tetsuya Sakai
  • Zhongxia Chen
  • Xing Xie
  • Xueming Qian

Personalized recommendation has received a lot of attention as a highly practical research topic. However, existing recommender systems provide recommendations with a generic statement such as “Customers who bought this item also bought…”. Explainable recommendation, which makes a user aware of why such items are recommended, is in demand. The goal of our research is to make users feel as if they are receiving recommendations from their friends. To this end, we formulate a new and challenging problem called personalized reason generation for explainable recommendation of songs in conversation applications, and propose a solution that generates a natural language explanation of the reason for recommending a song to a particular user. For example, if the user is a student, our method can generate an output such as “Campus radio plays this song at noon every day, and I think it sounds wonderful,” which the student may find easy to relate to. In offline experiments with manual assessments, the gains of our method in relevance to songs and personalization to users are statistically significant compared with baselines. Large-scale online experiments show that our method outperforms manually selected reasons by 8.2% in terms of click-through rate. Evaluation results indicate that our generated reasons are relevant to songs and personalized to users, and that they attract users to click the recommendations.

IJCAI Conference 2019 Conference Paper

Position Focused Attention Network for Image-Text Matching

  • Yaxiong Wang
  • Hao Yang
  • Xueming Qian
  • Lin Ma
  • Jing Lu
  • Biao Li
  • Xin Fan

Image-text matching tasks have recently attracted a lot of attention in the computer vision field. The key point of this cross-domain problem is how to accurately measure the similarity between the visual and the textual contents, which demands a fine understanding of both modalities. In this paper, we propose a novel position focused attention network (PFAN) to investigate the relation between the visual and the textual views. In this work, we integrate the object position clue to enhance the visual-text joint-embedding learning. We first split the images into blocks, from which we infer the relative position of each region in the image. Then, an attention mechanism is proposed to model the relations between the image regions and blocks and to generate a valuable position feature, which is further utilized to enhance the region expression and model a more reliable relationship between the visual image and the textual sentence. Experiments on the popular Flickr30K and MS-COCO datasets show the effectiveness of the proposed method. Besides the public datasets, we also conduct experiments on our collected practical news dataset (Tencent-News) to validate the practical application value of the proposed method. As far as we know, this is the first attempt to test the performance on a practical application. Our method achieves state-of-the-art performance on all three datasets.
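
A minimal sketch of the block-position idea (not the published PFAN): split the image into a K×K grid, give each block a learnable embedding, and weight the block embeddings by how much a detected region overlaps each block to obtain a position feature for that region. The grid size, feature dimension, and overlap-based weighting are illustrative assumptions.

# Minimal sketch (not the published PFAN): overlap-weighted block embeddings as a
# position feature for each detected region. All sizes are illustrative.
import torch
import torch.nn as nn

class BlockPositionFeature(nn.Module):
    def __init__(self, grid=4, dim=256):
        super().__init__()
        self.grid = grid
        self.block_embed = nn.Embedding(grid * grid, dim)  # one embedding per block

    def forward(self, region_boxes, image_size):
        """region_boxes: (N, 4) as (x1, y1, x2, y2); image_size: (W, H).
        Returns an (N, dim) position feature per region."""
        W, H = image_size
        xs = torch.linspace(0, W, self.grid + 1)
        ys = torch.linspace(0, H, self.grid + 1)
        feats = []
        for x1, y1, x2, y2 in region_boxes:
            # Overlap of the region with every grid block, used as attention weights.
            ox = (torch.minimum(xs[1:], x2) - torch.maximum(xs[:-1], x1)).clamp(min=0)
            oy = (torch.minimum(ys[1:], y2) - torch.maximum(ys[:-1], y1)).clamp(min=0)
            overlap = (oy[:, None] * ox[None, :]).reshape(-1)           # (grid*grid,)
            weights = overlap / overlap.sum().clamp(min=1e-6)
            feats.append(weights @ self.block_embed.weight)             # weighted block mix
        return torch.stack(feats)

pos = BlockPositionFeature()(torch.tensor([[10., 20., 120., 200.]]), (640, 480))
print(pos.shape)  # torch.Size([1, 256])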