Arrow Research search

Author name cluster

Shuheng Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers
1 author row

Possible papers (3)

NeurIPS 2025 Conference Paper

LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding

  • Shen Zhang
  • Siyuan Liang
  • Yaning Tan
  • Zhaowei Chen
  • Linze Li
  • Ge Wu
  • Yuhao Chen
  • Shuheng Li

Diffusion transformers (DiTs) struggle to generate images at resolutions higher than their training resolution. The primary obstacle is that explicit positional encodings (PEs), such as RoPE, must be extrapolated to unseen positions, which degrades performance when the inference resolution differs from the training resolution. In this paper, we propose the Length-Extrapolatable Diffusion Transformer (LEDiT) to overcome this limitation. LEDiT needs no explicit PEs, thereby avoiding PE extrapolation. The key innovation of LEDiT lies in its use of causal attention. We demonstrate that causal attention can implicitly encode global positional information and show that such information facilitates extrapolation. We further introduce a locality enhancement module, which captures fine-grained local information to complement the coarse-grained global positional information encoded by causal attention. Experimental results on both conditional and text-to-image generation tasks demonstrate that LEDiT supports up to 4× resolution scaling (e.g., from 256×256 to 512×512), achieving better image quality than state-of-the-art length extrapolation methods. We believe that LEDiT marks a departure from standard RoPE-based methods and offers promising insight into length extrapolation. Project page: https://shenzhang2145.github.io/ledit/
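
The abstract's central mechanism, that a causal attention mask alone can inject global position information without any PE, can be illustrated in a few lines. The sketch below is an illustrative assumption rather than the authors' implementation: `CausalAttentionBlock` applies PE-free causal self-attention to a flattened patch sequence, with a depthwise convolution standing in for the locality enhancement module.

```python
import torch
import torch.nn as nn

class CausalAttentionBlock(nn.Module):
    """Hypothetical sketch: PE-free causal self-attention over flattened
    image patch tokens. The causal mask alone imposes an ordering (and thus
    coarse global position); a depthwise conv adds fine-grained locality."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # stand-in for the paper's locality enhancement module
        self.local = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (B, N, C) with N = h * w patch tokens; no positional encoding added
        n = x.size(1)
        causal = torch.triu(torch.ones(n, n, dtype=torch.bool, device=x.device),
                            diagonal=1)  # True = blocked, so token i sees j <= i
        q = self.norm(x)
        y, _ = self.attn(q, q, q, attn_mask=causal)
        grid = x.transpose(1, 2).reshape(x.size(0), -1, h, w)  # back to 2D grid
        y = y + self.local(grid).flatten(2).transpose(1, 2)    # local branch
        return x + y
```

Because no PE table or rotary frequency has to be extrapolated, nothing in this block is tied to the training grid size, which is what makes inference at larger h and w possible in principle.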

NeurIPS 2024 Conference Paper

UniMTS: Unified Pre-training for Motion Time Series

  • Xiyuan Zhang
  • Diyan Teng
  • Ranak R. Chowdhury
  • Shuheng Li
  • Dezhi Hong
  • Rajesh K. Gupta
  • Jingbo Shang

Motion time series collected from low-power, always-on mobile and wearable devices such as smartphones and smartwatches offer significant insight into human behavioral patterns, with wide applications in healthcare, automation, IoT, and AR/XR. However, given security and privacy concerns, building large-scale motion time series datasets remains difficult, hindering the development of pre-trained models for human activity analysis. Existing models are typically trained and tested on the same dataset, leading to poor generalizability across variations in device location, device mounting orientation, and human activity type. In this paper, we introduce UniMTS, the first unified pre-training procedure for motion time series that generalizes across diverse device latent factors and activities. Specifically, we employ a contrastive learning framework that aligns motion time series with text descriptions enriched by large language models. This helps the model learn the semantics of time series and generalize across activities. Given the absence of large-scale motion time series data, we derive and synthesize time series from existing motion skeleton data with all-joint coverage. We use spatio-temporal graph networks to capture the relationships across joints for generalization across different device locations, and we further design a rotation-invariant augmentation to make the model agnostic to changes in device mounting orientation. Our model shows exceptional generalizability across 18 motion time series classification benchmark datasets, outperforming the best baselines by 340% in the zero-shot setting, 16.3% in the few-shot setting, and 9.2% in the full-shot setting.
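
As a rough illustration of the contrastive alignment step the abstract describes, the following is a minimal CLIP-style symmetric InfoNCE loss between a batch of motion time-series embeddings and their paired text embeddings. The function name and temperature are assumptions for illustration; the paper's encoders and training setup are not reproduced here.

```python
import torch
import torch.nn.functional as F

def motion_text_contrastive_loss(ts_emb: torch.Tensor,
                                 text_emb: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: row i of ts_emb is paired with row i of text_emb."""
    ts_emb = F.normalize(ts_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = ts_emb @ text_emb.t() / temperature   # (B, B) cosine similarities
    targets = torch.arange(ts_emb.size(0), device=ts_emb.device)
    # matched pairs lie on the diagonal; other batch entries act as negatives
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

Pulling each time series toward its own text description while pushing it away from the rest of the batch is what lets the text side carry activity semantics into the time-series encoder, enabling the zero-shot transfer the abstract reports.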

AAAI 2020 Conference Paper

Relation Inference among Sensor Time Series in Smart Buildings with Metric Learning

  • Shuheng Li
  • Dezhi Hong
  • Hongning Wang

Smart building technologies hold promise for better livability for residents and lower energy footprints. Yet the rollout of these technologies, from demand-response controls to fault detection and diagnosis, significantly lags behind, impeded by the current practice of manually identifying sensing-point relationships, e.g., how equipment is connected or which sensors are co-located in the same space. This manual process is costly and laborious, and still error-prone. We study relation inference among sensor time series. Our key insight is that, as equipment is connected or sensors are co-located in the same physical environment, they are affected by the same real-world events, e.g., a fan turning on or a person entering the room, and thus exhibit correlated changes in their time series data. To this end, we develop a deep metric learning solution that first converts the raw sensor time series to the frequency domain, and then optimizes a representation of sensors that encodes their relations. Built upon the learned representation, our solution pinpoints the relationships among sensors by solving a combinatorial optimization problem. Extensive experiments on real-world buildings demonstrate the effectiveness of our solution.
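
A hedged sketch of the three-stage pipeline the abstract outlines: frequency-domain features, a metric-learning objective, and a combinatorial matching step. The specific spectral transform, triplet loss, and Hungarian-matching formulation below are generic stand-ins under those assumptions, not the paper's exact method.

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment

def spectral_features(x: torch.Tensor) -> torch.Tensor:
    """Log-magnitude spectrum of raw sensor time series: (B, T) -> (B, T//2+1)."""
    return torch.log1p(torch.fft.rfft(x, dim=-1).abs())

def relation_triplet_loss(anchor: torch.Tensor, positive: torch.Tensor,
                          negative: torch.Tensor, margin: float = 0.5) -> torch.Tensor:
    # pull embeddings of related sensors together, push unrelated ones apart
    return F.triplet_margin_loss(anchor, positive, negative, margin=margin)

def assign_sensors(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """One-to-one matching between two sensor groups by minimizing total
    embedding distance -- a linear assignment (combinatorial) problem."""
    cost = np.linalg.norm(emb_a[:, None, :] - emb_b[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return cols  # cols[i] is the group-B sensor matched to group-A sensor i
```

Solving the matching jointly over all sensors, rather than greedily pairing nearest neighbors, is what turns the learned pairwise distances into a consistent set of relationships.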