Author name cluster

Ting Jiang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers
2 author rows

Possible papers (9)

AAAI Conference 2026 Conference Paper

FlashSVD: Memory-Efficient Inference with Streaming for Low-Rank Models

  • Zishan Shao
  • Yixiao Wang
  • Qinsi Wang
  • Ting Jiang
  • Zhixu Du
  • Hancheng Ye
  • Danyang Zhuo
  • Yiran Chen

Singular Value Decomposition (SVD) has recently gained traction as an effective compression technique for large language models (LLMs), with many studies reporting 20-80% parameter reduction at minimal accuracy cost. However, despite reducing weight memory, existing SVD-based approaches still rely on standard dense CUDA kernels during inference, which incur substantial, and ultimately unnecessary, activation memory overhead. Our analysis reveals that this kernel-induced cost, which grows with sequence length and hidden size, in the worst case prevents any real reduction in peak inference memory, limiting the practical impact of SVD compression for on-device deployment. To address this bottleneck, we propose FlashSVD, an end-to-end, rank-aware streaming inference framework for SVD-compressed LLMs. FlashSVD integrates seamlessly with any SVD-based model and directly fuses low-rank projection kernels into self-attention and feed-forward pipelines. This design avoids materializing large activation buffers by streaming small tiles of truncated factors through on-chip SRAM, performing on-the-fly multiplication and reduction, and immediately evicting results, thus preserving high GPU occupancy without introducing latency. On standard benchmarks (e.g., BERT-Base), FlashSVD reduces peak activation memory by up to 70.2% and transient memory by 75%, with zero accuracy loss against low-rank baselines, enabling truly memory-efficient deployment of low-rank LLMs.
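The core trick is easiest to see in plain array code. Below is a minimal NumPy sketch of the streaming idea (ours, not the authors' fused CUDA kernel): the low-rank product Y = (X U) V is computed over row tiles so that only a small slice of the intermediate activation X U ever exists at once. Tile size, shapes, and function names are illustrative assumptions.

```python
import numpy as np

def streaming_lowrank_matmul(X, U, V, tile_rows=128):
    """Y = (X @ U) @ V computed `tile_rows` rows of X at a time.

    Only a (tile_rows, rank) slice of the intermediate exists per step,
    mimicking how a fused kernel stages tiles through on-chip SRAM.
    """
    n = X.shape[0]
    Y = np.empty((n, V.shape[1]), dtype=X.dtype)
    for start in range(0, n, tile_rows):
        stop = min(start + tile_rows, n)
        tile = X[start:stop] @ U      # small low-rank activation tile
        Y[start:stop] = tile @ V      # reduce, write out, evict the tile
    return Y

# Hypothetical shapes: rank-64 factors of a 768x768 weight, 4096 tokens.
rng = np.random.default_rng(0)
X = rng.standard_normal((4096, 768)).astype(np.float32)
U = rng.standard_normal((768, 64)).astype(np.float32)
V = rng.standard_normal((64, 768)).astype(np.float32)
assert np.allclose(streaming_lowrank_matmul(X, U, V), (X @ U) @ V,
                   rtol=1e-4, atol=1e-2)
```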

JBHI Journal 2026 Journal Article

ULER: An Ultra-Lightweight Joint Attention Model for Emotion Recognition from Multimodal Physiological Signals

  • Haotian Liang
  • Geng-Xin Xu
  • Ting Jiang
  • Xichao Zhang
  • Dan Wu
  • Ye Li

Emotion recognition using multimodal physiological signals has gained widespread attention for providing objective insight into emotional states. However, existing models are often computationally complex and parameter-heavy, hindering deployment on resource-constrained wearable devices. To overcome this, we propose ULER (Ultra-Lightweight Emotion Recognition), an efficient framework that integrates lightweight multi-scale convolution (LMSC) with a novel joint attention mechanism (LJA) based on a dynamic feature fusion network (DFFN) for effective multimodal fusion. Using subject-dependent and total-dataset training strategies, ULER is evaluated on three benchmarks (DEAP, DREAMER, WESAD) and outperforms recent state-of-the-art (SOTA) methods. It achieves accuracies of 99.34%, 99.46%, and 99.23% on DEAP binary valence, binary arousal, and four-class tasks, respectively, with only 0.60 M parameters, 9.33 M FLOPs, and 31.10 ms inference latency. In a wearable-oriented reduced-channel setup (11 EEG channels), ULER also surpasses most SOTA models that use the standard 32 channels. Principal contributions include: (1) a systematic comparison against recent SOTA methods on multiple datasets; (2) a wearable-oriented design with a reduced-channel configuration; and (3) the DFFN as a core component for efficient multimodal fusion. This work demonstrates the potential for practical, real-time emotion monitoring in personal healthcare and other scenarios.
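As a rough illustration of what a lightweight multi-scale convolution block can look like, here is a hedged PyTorch sketch; the kernel sizes, channel counts, and use of depthwise convolutions are our assumptions, not the paper's exact LMSC specification.

```python
import torch
import torch.nn as nn

class LightMultiScaleConv(nn.Module):
    def __init__(self, channels=32, kernel_sizes=(3, 7, 15)):
        super().__init__()
        # One depthwise 1-D conv per temporal scale keeps parameters low.
        self.branches = nn.ModuleList(
            nn.Conv1d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )
        # Pointwise conv fuses the concatenated scales back to `channels`.
        self.fuse = nn.Conv1d(channels * len(kernel_sizes), channels, 1)

    def forward(self, x):                    # x: (batch, channels, time)
        multi = torch.cat([b(x) for b in self.branches], dim=1)
        return self.fuse(multi)

x = torch.randn(8, 32, 256)                  # e.g. 32-channel signal windows
print(LightMultiScaleConv()(x).shape)        # torch.Size([8, 32, 256])
```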

AAAI Conference 2025 Conference Paper

Diff-Shadow: Global-guided Diffusion Model for Shadow Removal

  • Jinting Luo
  • Ru Li
  • Chengzhi Jiang
  • Xiaoming Zhang
  • Mingyan Han
  • Ting Jiang
  • Haoqiang Fan
  • Shuaicheng Liu

We propose Diff-Shadow, a global-guided diffusion model for high-quality shadow removal. Previous transformer-based approaches can utilize global information to relate shadow and non-shadow regions, but they are limited in their synthesis ability and recover images with obvious boundaries. In contrast, diffusion-based methods can generate better content, but they are not exempt from inconsistent illumination. In this work, we combine the advantages of diffusion models and global guidance to realize shadow-free restoration. Specifically, we propose a parallel UNets architecture: 1) the local branch performs patch-based noise estimation in the diffusion process, and 2) the global branch recovers the low-resolution shadow-free images. A Reweight Cross Attention (RCA) module is designed to integrate global contextual information of non-shadow regions into the local branch. We further design a Global-guided Sampling Strategy (GSS) that mitigates patch boundary issues and ensures consistent illumination across shaded and unshaded regions in the recovered image. Comprehensive experiments on three standard public datasets, ISTD, ISTD+, and SRD, demonstrate the effectiveness of Diff-Shadow. Compared to state-of-the-art methods, our method achieves a significant PSNR improvement on the ISTD dataset, from 32.33 dB to 33.69 dB.
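For intuition on the patch-boundary problem that GSS addresses, here is a generic sketch (not the paper's method) of the standard remedy: process overlapping patches and blend them with a smooth weight window so no hard seams appear where patches meet. `patch_fn`, the window choice, and the divisibility of the image size are illustrative assumptions.

```python
import numpy as np

def blend_patches(patch_fn, image, patch=64, stride=32):
    """Run `patch_fn` on overlapping patches; blend with a Hann window.

    Assumes image height/width line up with `patch` and `stride`
    so that every pixel is covered by at least one patch.
    """
    h, w = image.shape[:2]
    out = np.zeros_like(image, dtype=np.float64)
    weight = np.zeros((h, w, 1))
    win1d = np.hanning(patch) + 1e-3          # keep edge weights nonzero
    win = np.outer(win1d, win1d)[..., None]
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            out[y:y+patch, x:x+patch] += win * patch_fn(
                image[y:y+patch, x:x+patch])
            weight[y:y+patch, x:x+patch] += win
    return out / weight

img = np.random.rand(128, 128, 3)
restored = blend_patches(lambda p: p, img)    # identity stands in for a denoiser
print(np.abs(restored - img).max() < 1e-6)    # True: seamless reconstruction
```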

NeurIPS Conference 2025 Conference Paper

Enhanced Cyclic Coordinate Descent Methods for Elastic Net Penalized Linear Models

  • Yixiao Wang
  • Zishan Shao
  • Ting Jiang
  • Aditya Devarakonda

We present a novel enhanced cyclic coordinate descent (ECCD) framework for solving generalized linear models with elastic net constraints that reduces training time in comparison to existing state-of-the-art methods. We redesign the CD method by performing a Taylor expansion around the current iterate to avoid nonlinear operations arising in the gradient computation. By introducing this approximation we are able to unroll the vector recurrences occurring in the CD method and reformulate the resulting computations into more efficient batched computations. We show empirically that the recurrence can be unrolled by a tunable integer parameter, $s$, such that $s > 1$ yields performance improvements without affecting convergence, whereas $s = 1$ yields the original CD method. A key advantage of ECCD is that it avoids the convergence delay and numerical instability exhibited by block coordinate descent. Finally, we implement our proposed method in C++ using Eigen to accelerate linear algebra computations. Comparison of our method against existing state-of-the-art solvers shows consistent performance improvements of $3\times$ on average for the regularization path variant on diverse benchmark datasets. Our implementation is available at https://github.com/Yixiao-Wang-Stats/ECCD.
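For reference, the $s = 1$ baseline that ECCD generalizes is ordinary cyclic coordinate descent with soft-thresholding. The NumPy sketch below shows that baseline under our own notation and default hyperparameters; the paper's batched $s > 1$ unrolling is not reproduced here.

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def cd_elastic_net(X, y, lam=0.1, alpha=0.5, sweeps=100):
    """Cyclic CD for (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1
    + (1 - alpha)/2 * ||b||_2^2), i.e. the s = 1 case."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()                          # residual y - X @ b
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(sweeps):
        for j in range(p):
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam * alpha) / (
                col_sq[j] + lam * (1 - alpha))
            r += X[:, j] * (b[j] - new_bj)   # keep residual in sync
            b[j] = new_bj
    return b

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50))
y = X[:, :5] @ np.ones(5) + 0.1 * rng.standard_normal(200)
print(np.round(cd_elastic_net(X, y)[:8], 2))  # true support: first 5 features
```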

AAAI Conference 2025 Conference Paper

Realistic Noise Synthesis with Diffusion Models

  • Qi Wu
  • Mingyan Han
  • Ting Jiang
  • Chengzhi Jiang
  • Jinting Luo
  • Man Jiang
  • Haoqiang Fan
  • Shuaicheng Liu

Deep denoising models require extensive real-world training data, which is challenging to acquire. Current noise synthesis techniques struggle to accurately model complex noise distributions. We propose a novel Realistic Noise Synthesis Diffusor (RNSD) method using diffusion models to address these challenges. By encoding camera settings into a time-aware camera-conditioned affine modulation (TCCAM), RNSD generates more realistic noise distributions under various camera conditions. Additionally, RNSD integrates a multi-scale content-aware module (MCAM), enabling the generation of structured noise with spatial correlations across multiple frequencies. We also introduce Deep Image Prior Sampling (DIPS), a learnable sampling sequence based on the deep image prior, which significantly accelerates the sampling process while maintaining the high quality of the synthesized noise. Extensive experiments demonstrate that our RNSD method significantly outperforms existing techniques in synthesizing realistic noise under multiple metrics and improving image denoising performance.
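As a rough sketch of what camera-conditioned affine modulation can look like, here is a FiLM-style PyTorch module; the conditioning inputs (ISO, shutter, aperture, color temperature), layer sizes, and timestep embedding are our illustrative assumptions, not the paper's exact TCCAM.

```python
import torch
import torch.nn as nn

class CameraConditionedAffine(nn.Module):
    def __init__(self, channels=64, cond_dim=4, t_dim=16):
        super().__init__()
        self.t_embed = nn.Embedding(1000, t_dim)   # diffusion timestep table
        self.mlp = nn.Sequential(
            nn.Linear(cond_dim + t_dim, 128), nn.SiLU(),
            nn.Linear(128, 2 * channels),          # per-channel (scale, shift)
        )

    def forward(self, feat, cam, t):               # feat: (B, C, H, W)
        cond = torch.cat([cam, self.t_embed(t)], dim=-1)
        scale, shift = self.mlp(cond).chunk(2, dim=-1)
        return feat * (1 + scale[..., None, None]) + shift[..., None, None]

feat = torch.randn(2, 64, 32, 32)
cam = torch.tensor([[800., 1 / 60, 2.8, 5500.],    # hypothetical settings
                    [3200., 1 / 30, 1.8, 3200.]])
t = torch.tensor([10, 500])
print(CameraConditionedAffine()(feat, cam, t).shape)  # (2, 64, 32, 32)
```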

ICML Conference 2025 Conference Paper

SADA: Stability-guided Adaptive Diffusion Acceleration

  • Ting Jiang
  • Yixiao Wang
  • Hancheng Ye
  • Zishan Shao
  • Jingwei Sun 0002
  • Jingyang Zhang
  • Zekai Chen
  • Jianyi Zhang

Diffusion models have achieved remarkable success in generative tasks but suffer from high computational costs due to their iterative sampling process and quadratic attention costs. Existing training-free acceleration strategies that reduce per-step computation cost, while effectively reducing sampling time, exhibit low faithfulness relative to the original baseline. We hypothesize that this fidelity gap arises because (a) different prompts correspond to varying denoising trajectories, and (b) such methods do not consider the underlying ODE formulation and its numerical solution. In this paper, we propose Stability-guided Adaptive Diffusion Acceleration (SADA), a novel paradigm that unifies step-wise and token-wise sparsity decisions via a single stability criterion to accelerate sampling of ODE-based generative models (diffusion and flow matching). For (a), SADA adaptively allocates sparsity based on the sampling trajectory. For (b), SADA introduces principled approximation schemes that leverage the precise gradient information from the numerical ODE solver. Comprehensive evaluations on SD-2, SDXL, and Flux using both EDM and DPM++ solvers reveal consistent $\ge 1.8\times$ speedups with minimal fidelity degradation (LPIPS $\leq 0.10$ and FID $\leq 4.5$) compared to unmodified baselines, significantly outperforming prior methods. Moreover, SADA adapts seamlessly to other pipelines and modalities: it accelerates ControlNet without any modifications and speeds up MusicLDM by $1.8\times$ with $\sim 0.01$ spectrogram LPIPS. Our code is available at: https://github.com/Ting-Justin-Jiang/sada-icml.
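To make the step-wise side of such sparsity decisions concrete, here is a toy NumPy sampler (ours, not SADA's actual criterion) that reuses a cached prediction whenever the two preceding denoiser outputs changed by less than a tolerance; the update rule, threshold, and stand-in denoiser are illustrative.

```python
import numpy as np

def sample_with_caching(denoiser, x, timesteps, tol=0.1):
    """Toy Euler-style sampler: reuse the cached prediction whenever the
    two preceding predictions changed by less than `tol` (a real method
    would also re-evaluate periodically rather than trust the cache)."""
    prev, prev2, reused = None, None, 0
    for t in timesteps:
        stable = (prev is not None and prev2 is not None and
                  np.linalg.norm(prev - prev2) / np.linalg.norm(prev2) < tol)
        if stable:
            eps, reused = prev, reused + 1    # skip the network call
        else:
            eps = denoiser(x, t)
        x = x - 0.1 * eps                     # illustrative update rule
        prev2, prev = prev, eps
    return x, reused

toy_denoiser = lambda x, t: 0.5 * x           # stand-in for a diffusion UNet
x0 = np.random.default_rng(2).standard_normal(16)
_, reused = sample_with_caching(toy_denoiser, x0, range(20))
print(reused)                                 # most steps reuse the cache
```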

IS Journal 2023 Journal Article

Irregularly Sampled Multivariate Time Series Classification: A Graph Learning Approach

  • Zhen Wang
  • Ting Jiang
  • Zenghui Xu
  • Ji Zhang
  • Jianliang Gao

To date, graph-based learning methods have proven effective for modeling spatial and structural dependencies. However, when applied to irregularly sampled multivariate time series (IS-MTS), they encounter three major challenges arising from the complex characteristics of such data: 1) variable time intervals between observations; 2) asynchronous time points across dimensions; and 3) a lack of prior knowledge of the connectivity structure for message propagation. To fill these gaps, we propose a multivariate temporal graph network to coherently capture structural interactions, learn temporal dependencies, and handle the challenging characteristics of IS-MTS data. Specifically, we first build a multivariate interaction module to handle frequent missing values and extract the graph structure automatically. Second, we design a novel adjacent graph propagation mechanism to aggregate neighbor information from multistep snapshots. Third, we construct a masked temporal-aware attention module to explicitly consider the timestamp context and interval irregularity. Through an extensive experimental evaluation, we demonstrate the superior performance of the proposed method.
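As a hedged illustration of masked, temporal-aware attention over irregular samples, the NumPy sketch below biases attention scores by the time gap between observations and masks out missing ones; the linear decay form and all shapes are our assumptions, not the paper's exact module.

```python
import numpy as np

def masked_time_attention(values, times, observed, scale=1.0):
    """values: (T, d); times: (T,) timestamps; observed: (T,) bool mask."""
    d = values.shape[1]
    scores = values @ values.T / np.sqrt(d)            # content similarity
    gaps = np.abs(times[:, None] - times[None, :])
    scores = scores - scale * gaps                      # penalize large gaps
    scores = np.where(observed[None, :], scores, -np.inf)  # mask missing keys
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values

vals = np.random.default_rng(3).standard_normal((6, 8))
t = np.array([0.0, 0.4, 1.1, 2.9, 3.0, 4.2])            # irregular timestamps
obs = np.array([True, True, False, True, True, True])   # one missing point
print(masked_time_attention(vals, t, obs).shape)        # (6, 8)
```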

AAAI Conference 2021 Conference Paper

LightXML: Transformer with Dynamic Negative Sampling for High-Performance Extreme Multi-label Text Classification

  • Ting Jiang
  • Deqing Wang
  • Leilei Sun
  • Huayi Yang
  • Zhengyang Zhao
  • Fuzhen Zhuang

Extreme Multi-label text Classification (XMC) is the task of finding the most relevant labels from a large label set. Deep learning-based methods have recently shown significant success in XMC. However, existing methods (e.g., AttentionXML and X-Transformer) still suffer from 1) combining several models to train and predict on one dataset, and 2) sampling negative labels statically while training the label ranking model, both of which reduce the efficiency and accuracy of the model. To address these problems, we propose LightXML, which adopts end-to-end training and dynamic negative label sampling. In LightXML, we use generative cooperative networks to recall and rank labels: the label recalling part generates negative and positive labels, and the label ranking part distinguishes the positive labels among them. Through these networks, negative labels are sampled dynamically during label ranking training by feeding both parts the same text representation. Extensive experiments show that LightXML outperforms state-of-the-art methods on five extreme multi-label datasets with much smaller model size and lower computational complexity. In particular, on the Amazon dataset with 670K labels, LightXML reduces model size by up to 72% compared with AttentionXML. Our code is available at http://github.com/kongds/LightXML.
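The dynamic sampling idea can be sketched in a few lines of PyTorch (an illustration under our assumptions, not LightXML's exact implementation): the recall part's current scores pick hard negative candidates per sample, and the ranking part is trained only on those candidates plus the true positives.

```python
import torch

def dynamic_label_candidates(recall_scores, positive_labels, k=5):
    """recall_scores: (num_labels,) logits from the recall part for one
    sample; positive_labels: LongTensor of ground-truth label ids."""
    hard_negatives = torch.topk(recall_scores, k).indices   # current top-k
    return torch.unique(torch.cat([hard_negatives, positive_labels]))

num_labels = 1000
scores = torch.randn(num_labels)              # recall-part logits (stand-in)
positives = torch.tensor([42, 7])
candidates = dynamic_label_candidates(scores, positives)
print(candidates)      # the ranking part is trained on just these labels
```

Because the candidate set changes as the recall part learns, the negatives stay hard throughout training, which is what makes the sampling "dynamic" rather than fixed in advance.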