Arrow Research search

Author name cluster

Kang Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers
2 author rows

Possible papers

17

EAAI Journal 2026 Journal Article

Physics-guided data quality control and imputation for temperature-coupled bridge crack-width time series via a constrained temporal convolutional network

  • Congcong Fan
  • Youliang Ding
  • Kang Yang

Bridge structural health monitoring (SHM) generates monitoring time series with missing segments and anomalies, undermining damage diagnosis and safety warnings. Although thermo-mechanical coupling between temperature and crack width is widely recognized, it is rarely used as an explicit constraint in data quality control and imputation. This paper proposes a physics-guided constrained temporal convolutional network (PGC-TCN) for temperature-coupled bridge crack-width time series, integrating engineering knowledge into data screening and model training through anomaly screening, mask-guided reconstruction, and a negative-correlation constraint. Specifically, the framework combines anomaly-to-missing conversion, reliability weighting, and a negative-correlation constraint within a causal dilated temporal convolutional network (TCN). Using crack-width observations and synchronous temperature measurements as inputs, it first identifies abnormal observations and converts them into missing entries, then reconstructs missing or abnormal crack-width values and outputs uncertainty-aware prediction intervals. For in-service bridge records, an adaptive screening module generates masks and screening labels from daily/seasonal pattern profiling and transient-noise suppression. The TCN captures long-range dependencies and lagged temperature–crack interactions, while the negative-correlation constraint discourages physically implausible reconstructions and improves robustness under domain shifts. Experiments under random and contiguous missingness show that PGC-TCN outperforms a multi-layer perceptron, a long short-term memory network, a Transformer, and a baseline temporal convolutional network. Under 50% random missingness across 20 trials, the coefficient of determination improves from 0.939 to 0.967, and the relative error decreases from 2.399% to 1.978%. Cross-project validation demonstrates transferability across bridges and environmental conditions, suggesting that physics-guided and reliability-aware deep learning can improve trustworthy time-series reconstruction for safety-critical infrastructure monitoring.
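The negative-correlation constraint in the abstract can be read as a hinge penalty on the Pearson correlation between reconstructed crack width and synchronous temperature: a positive correlation is physically implausible and gets penalized, a negative one incurs no loss. The sketch below is an illustrative reading only, not the paper's exact loss; the function name is ours.

```python
import math

def negative_correlation_penalty(crack_pred, temperature):
    """Hinge penalty discouraging physically implausible reconstructions:
    crack width should correlate negatively with temperature, so a positive
    Pearson correlation is penalized and a negative one costs nothing."""
    n = len(crack_pred)
    mc = sum(crack_pred) / n
    mt = sum(temperature) / n
    cov = sum((c - mc) * (t - mt) for c, t in zip(crack_pred, temperature))
    sc = math.sqrt(sum((c - mc) ** 2 for c in crack_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in temperature))
    corr = cov / (sc * st + 1e-12)  # Pearson correlation in [-1, 1]
    return max(0.0, corr)           # hinge: only positive correlation is penalized
```

In a training loop this term would be added, suitably weighted, to the reconstruction loss.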

AAAI Conference 2026 Conference Paper

SLD-L2S: Hierarchical Subspace Latent Diffusion for High-Fidelity Lip to Speech Synthesis

  • Yifan Liang
  • Andong Li
  • Kang Yang
  • Guochen Yu
  • Fangkun Liu
  • Lingling Dai
  • Xiaodong Li
  • Chengshi Zheng

Although lip-to-speech synthesis (L2S) has achieved significant progress in recent years, current state-of-the-art methods typically rely on intermediate representations such as mel-spectrograms or discrete self-supervised learning (SSL) tokens. The potential of latent diffusion models (LDMs) in this task remains largely unexplored. In this paper, we introduce SLD-L2S, a novel L2S framework built upon a hierarchical subspace latent diffusion model. Our method aims to directly map visual lip movements to the continuous latent space of a pre-trained neural audio codec, thereby avoiding the information loss inherent in traditional intermediate representations. The core of our method is a hierarchical architecture that processes visual representations through multiple parallel subspaces, initiated by a subspace decomposition module. To efficiently enhance interactions within and between these subspaces, we design the diffusion convolution block (DiCB) as our network backbone. Furthermore, we employ a reparameterized flow matching technique to directly generate the target latent vectors. This enables a principled inclusion of speech language model (SLM) and semantic losses during training, moving beyond conventional flow matching objectives and improving synthesized speech quality. Our experiments show that SLD-L2S achieves state-of-the-art generation quality on multiple benchmark datasets, surpassing existing methods in both objective and subjective evaluations.

NeurIPS Conference 2025 Conference Paper

ADMN: A Layer-Wise Adaptive Multimodal Network for Dynamic Input Noise and Compute Resources

  • Jason Wu
  • Yuyang Yuan
  • Kang Yang
  • Lance Kaplan
  • Mani Srivastava

Multimodal deep learning systems are deployed in dynamic scenarios due to the robustness afforded by multiple sensing modalities. Nevertheless, they struggle with varying compute resource availability (due to multi-tenancy, device heterogeneity, etc.) and fluctuating quality of inputs (from sensor feed corruption, environmental noise, etc.). Statically provisioned multimodal systems cannot adapt when compute resources change over time, while existing dynamic networks struggle with strict compute budgets. Additionally, both systems often neglect the impact of variations in modality quality. Consequently, modalities suffering substantial corruption may needlessly consume resources better allocated towards other modalities. We propose ADMN, a layer-wise Adaptive Depth Multimodal Network capable of tackling both challenges: it adjusts the total number of active layers across all modalities to meet compute resource constraints, and continually reallocates layers across input modalities according to their quality. Our evaluations show that ADMN can match the accuracy of state-of-the-art networks while reducing up to 75% of their floating-point operations.
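As a rough illustration of quality-aware layer reallocation under a fixed budget, the toy sketch below splits active layers across modalities in proportion to a per-modality quality score. This proportional rule and all names are ours; ADMN itself learns the allocation end to end rather than using a fixed heuristic.

```python
def allocate_layers(quality, budget):
    """Toy sketch: split a fixed budget of active layers across modalities in
    proportion to a quality score, so heavily corrupted modalities get fewer
    layers. (Illustrative heuristic only; not ADMN's learned controller.)"""
    total = sum(quality.values())
    alloc = {m: int(budget * q / total) for m, q in quality.items()}
    # distribute the rounding remainder to the largest fractional shares
    remainder = budget - sum(alloc.values())
    by_frac = sorted(quality,
                     key=lambda m: budget * quality[m] / total - alloc[m],
                     reverse=True)
    for m in by_frac[:remainder]:
        alloc[m] += 1
    return alloc
```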

EAAI Journal 2025 Journal Article

An optimisation approach guided by crack variation mechanism in the informer prediction model

  • Xujia Liu
  • Youliang Ding
  • Fei Xu
  • Yichao Xu
  • Kang Yang

Structural health monitoring (SHM) faces a fundamental challenge in reconciling predictive performance with physical interpretability for infrastructure diagnostics. Conventional deep learning (DL) approaches neglect essential mechanisms governing crack width variation—including thermal gradients, hysteretic responses, and phase-shifted correlations—limiting their reliability in real-world applications. To bridge this gap, we propose a mechanism-guided optimization (MGO) framework that integrates domain knowledge into the Informer architecture through physics-informed enhancements: auto-correlation modeling for capturing temperature–crack hysteresis, static gated fusion for multi-feature integration, and adaptive elastic net regularization for feature selection. Validated on cable-stayed bridge monitoring data, our framework achieves significant reductions in mean absolute error (MAE) (5%–60%) and root mean square error (RMSE) (10%–55%) versus the baseline Informer across all cracks and prediction horizons, with Diebold–Mariano (DM) tests confirming statistical superiority in most cases. Crucially, it demonstrates superior precision relative to six state-of-the-art benchmarks across all evaluation scenarios. The ordinary least squares (OLS)-enhanced variant further delivers volatility reduction, while sensor failure tests establish quantifiable robustness benchmarks through MAE progression from 0.013 mm to 0.391 mm. This work establishes an interpretable, physics-grounded paradigm that explicitly links environmental drivers to structural degradation.

NeurIPS Conference 2025 Conference Paper

Benchmarking Spatiotemporal Reasoning in LLMs and Reasoning Models: Capabilities and Challenges

  • Pengrui Quan
  • Brian Wang
  • Kang Yang
  • Liying Han
  • Mani Srivastava

Spatiotemporal reasoning plays a key role in Cyber-Physical Systems (CPS). Despite advances in Large Language Models (LLMs) and Large Reasoning Models (LRMs), their capacity to reason about complex spatiotemporal signals remains underexplored. This paper proposes a hierarchical SpatioTemporal reAsoning benchmaRK, STARK, to systematically evaluate LLMs across three levels of reasoning complexity: state estimation (e.g., predicting field variables, localizing and tracking events in space and time), spatiotemporal reasoning over states (e.g., inferring spatial-temporal relationships), and world-knowledge-aware reasoning that integrates contextual and domain knowledge (e.g., intent prediction, landmark-aware navigation). We curate 26 distinct spatiotemporal tasks with diverse sensor modalities, comprising 14,552 challenges where models answer directly or via a Python code interpreter. Evaluating 3 LRMs and 8 LLMs, we find LLMs achieve limited success in tasks requiring geometric reasoning (e.g., multilateration or triangulation), particularly as complexity increases. Surprisingly, LRMs show robust performance across tasks of varying difficulty, often matching or surpassing traditional first-principle-based methods. Our results show that in reasoning tasks requiring world knowledge, the performance gap between LLMs and LRMs narrows, with some LLMs even surpassing LRMs. However, the LRM o3 continues to achieve leading performance across all evaluated tasks, a result attributed primarily to the larger size of the reasoning models. STARK motivates future innovations in model architectures and reasoning paradigms for intelligent CPS by providing a structured framework to identify limitations in the spatiotemporal reasoning of LLMs and LRMs.

NeurIPS Conference 2025 Conference Paper

GSRF: Complex-Valued 3D Gaussian Splatting for Efficient Radio-Frequency Data Synthesis

  • Kang Yang
  • Gaofeng Dong
  • Sijie Ji
  • Wan Du
  • Mani Srivastava

Synthesizing radio-frequency (RF) data given the transmitter and receiver positions, e.g., received signal strength indicator (RSSI), is critical for wireless networking and sensing applications, such as indoor localization. However, it remains challenging due to complex propagation interactions, including reflection, diffraction, and scattering. State-of-the-art neural radiance field (NeRF)-based methods achieve high-fidelity RF data synthesis but are limited by long training times and high inference latency. We introduce GSRF, a framework that extends 3D Gaussian Splatting (3DGS) from the optical domain to the RF domain, enabling efficient RF data synthesis. GSRF realizes this adaptation through three key innovations: First, it introduces complex-valued 3D Gaussians with a hybrid Fourier–Legendre basis to model directional and phase-dependent radiance. Second, it employs orthographic splatting for efficient ray–Gaussian intersection identification. Third, it incorporates a complex-valued ray tracing algorithm, executed on RF-customized CUDA kernels and grounded in wavefront propagation principles, to synthesize RF data in real time. Evaluated across various RF technologies, GSRF preserves high-fidelity RF data synthesis while significantly reducing both training time and inference latency.

ICRA Conference 2025 Conference Paper

Is Discretization Fusion All You Need for Collaborative Perception?

  • Kang Yang
  • Tianci Bu
  • Lantao Li
  • Chunxu Li
  • Yongcai Wang
  • Deying Li 0001

Collaborative perception in multi-agent systems enhances overall perceptual capabilities by facilitating the exchange of complementary information among agents. Current mainstream collaborative perception methods rely on discretized feature maps to conduct fusion, which, however, lacks flexibility in extracting and transmitting informative features and can hardly focus on them during fusion. To address these problems, this paper proposes a novel Anchor-Centric paradigm for Collaborative Object detection (ACCO). It avoids grid precision issues and allows more flexible and efficient anchor-centric communication and fusion. ACCO is composed of three main components: (1) an anchor featuring block (AFB) that generates anchor proposals and projects prepared anchor queries onto image features; (2) an anchor confidence generator (ACG) designed to minimize communication by transmitting only the features of confident anchors; and (3) a local-global fusion module, in which local fusion is anchor alignment-based fusion (LAAF) and global fusion is conducted by spatial-aware cross-attention (SACA). LAAF and SACA run in multiple layers, so agents conduct anchor-centric fusion iteratively to refine the anchor proposals. Comprehensive experiments evaluate ACCO on the OPV2V and DAIR-V2X datasets, demonstrating ACCO's superiority in reducing communication volume and in improving perception range and detection performance. Code can be found at: https://github.com/sidiangongyuan/ACCO.

ICML Conference 2025 Conference Paper

ProDiff: Prototype-Guided Diffusion for Minimal Information Trajectory Imputation

  • Tianci Bu
  • Le Zhou
  • Wenchuan Yang
  • Jianhong Mou
  • Kang Yang
  • Suoyi Tan
  • Feng Yao
  • Jingyuan Wang

Trajectory data is crucial for various applications but often suffers from incompleteness due to device limitations and diverse collection scenarios. Existing imputation methods rely on sparse trajectory or travel information, such as velocity, to infer missing points. However, these approaches assume that sparse trajectories retain essential behavioral patterns, which places significant demands on data acquisition and overlooks the potential of large-scale human trajectory embeddings. To address this, we propose ProDiff, a trajectory imputation framework that uses only two endpoints as minimal information. It integrates prototype learning to embed human movement patterns and a denoising diffusion probabilistic model for robust spatiotemporal reconstruction. Joint training with a tailored loss function ensures effective imputation. ProDiff outperforms state-of-the-art methods, improving accuracy by 6.28% on FourSquare and 2.52% on WuXi. Further analysis shows a 0.927 correlation between generated and real trajectories, demonstrating the effectiveness of our approach.

EAAI Journal 2025 Journal Article

TSD-DETR: A lightweight real-time detection transformer of traffic sign detection for long-range perception of autonomous driving

  • Lili Zhang
  • Kang Yang
  • Yucheng Han
  • Jing Li
  • Wei Wei
  • Hongxin Tan
  • Pei Yu
  • Ke Zhang

The key to accurate perception and efficient decision-making in autonomous driving is long-range detection of traffic signs, which is hampered by small sign size and complex backgrounds. To solve these problems, this paper proposes a lightweight traffic sign detection model based on the real-time detection transformer (TSD-DETR). Firstly, the feature extraction module is constructed using multiple types of convolutional modules; the model extracts multi-scale features at different levels to enhance feature extraction ability. Then, a small object detection module and detection head are designed to extract and detect shallow features, improving the detection of small traffic signs. Finally, Efficient Multi-Scale Attention is introduced to adjust the channel weights; it aggregates the output features of three parallel branches interactively. TSD-DETR achieves a mean average precision (mAP) of 96.8% on the Tsinghua-Tencent 100K dataset, an improvement of 2.5% over the real-time detection transformer; in small object detection, mAP improves by 9%. TSD-DETR achieves 99.4% mAP on the Changsha University of Science and Technology Chinese Traffic Sign Detection Benchmark dataset, an improvement of 0.6%. The experimental results show that TSD-DETR reduces the number of parameters by 9.06M by optimizing the model structure. While ensuring real-time performance, the model's detection accuracy is greatly improved. Ablation experiments show that the proposed feature extraction module and small object detection module are conducive to improving detection accuracy.

IROS Conference 2025 Conference Paper

Unsupervised Anomaly Detection Improves Imitation Learning for Autonomous Racing

  • Yuang Geng
  • Yang Zhou
  • Yuyang Zhang
  • Zhongzheng Ren Zhang
  • Kang Yang
  • Tyler Ruble
  • Giancarlo Vidal
  • Ivan Ruchkin

Imitation Learning (IL) has shown significant promise in autonomous driving, but its performance heavily depends on the quality of training data. Noisy or corrupted sensor inputs can degrade learned policies, leading to unsafe behavior. This paper presents an unsupervised anomaly detection approach to automatically filter out abnormal images from driving datasets, thereby enhancing IL performance. Our method leverages a Convolutional Autoencoder with a novel latent reference loss, which forces abnormal images to reconstruct with higher errors than normal images. This enables effective anomaly detection without requiring manually labeled data. We validate our approach on the realistic DonkeyCar autonomous racing platform, demonstrating that filtering videos significantly improves IL policies, as measured by a 25-40% reduction in cross-track error. Compared to baseline and ablation models, our method achieves superior anomaly detection across three real-world video corruptions: collision-based occlusions, transparent obstructions, and raindrop interference. The results highlight the effectiveness of unsupervised video anomaly detection in improving the robustness and performance of IL-based autonomous control. Video: https://youtu.be/RjJ3nZR6RQ

EAAI Journal 2024 Journal Article

Multi-head sequence tagging model for Grammatical Error Correction

  • Kamal Al-Sabahi
  • Kang Yang
  • Wangwang Liu
  • Guanyu Jiang
  • Xian Li
  • Ming Yang

To solve the Grammatical Error Correction (GEC) problem, a mapping between a source sequence and a target one is needed, where the two differ only on a few spans. For this reason, attention has shifted to non-autoregressive or sequence tagging models, in which GEC is simplified from Seq2Seq to labeling the input tokens with edit commands chosen from a large edit space. Due to this large number of classes and the limitation of the available datasets, current sequence tagging approaches still have issues handling a broad range of grammatical errors by being laser-focused on one single task. To this end, we simplify GEC further by dividing it into seven related subtasks: Insertion, Deletion, Merge, Substitution, Transformation, Detection, and Correction, with Correction being our primary focus. A distinct classification head is dedicated to each of these subtasks. A novel multi-head and multi-task learning model is proposed to effectively utilize training data and harness the information from related task training signals. To mitigate the limited number of available training samples, a new denoising autoencoder is used to generate a synthetic dataset for pretraining. Additionally, a new character-level transformation is proposed to enhance the sequence-to-edit function and improve the model's vocabulary coverage. Our single/ensemble model achieves an F0.5 of 74.4/77.0 and 68.6/69.1 on BEA-19 (test) and CoNLL-14 (test), respectively. Moreover, evaluated on the JFLEG test set, the GLEU scores are 61.6 and 61.7 for the single and ensemble models, respectively. It mostly outperforms recently published state-of-the-art results by a considerable margin.

NeurIPS Conference 2024 Conference Paper

Nimbus: Secure and Efficient Two-Party Inference for Transformers

  • Zhengyi Li
  • Kang Yang
  • Jin Tan
  • Wen-jie Lu
  • Haoqi Wu
  • Xiao Wang
  • Yu Yu
  • Derun Zhao

Transformer models have gained significant attention due to their power in machine learning tasks. Their extensive deployment has raised concerns about the potential leakage of sensitive information during inference. However, when applied to Transformers, existing approaches based on secure two-party computation (2PC) bring efficiency limitations of two kinds: (1) resource-intensive matrix multiplications in linear layers, and (2) complex non-linear activation functions like $\mathsf{GELU}$ and $\mathsf{Softmax}$. This work presents a new two-party inference framework $\mathsf{Nimbus}$ for Transformer models. Specifically, we propose a new 2PC paradigm to securely compute matrix multiplications based on an outer-product insight, which achieves $2.9\times \sim 12.5\times$ performance improvements compared to the state-of-the-art (SOTA) protocol. Furthermore, through a new observation of utilizing the input distribution, we propose an approach of low-degree polynomial approximation for $\mathsf{GELU}$ and $\mathsf{Softmax}$, which improves the performance of the SOTA polynomial approximation by $2.9\times \sim 4.0\times$, where the average accuracy loss of our approach is 0.08\% compared to non-2PC inference without privacy. Compared with the SOTA two-party inference, $\mathsf{Nimbus}$ improves the end-to-end performance of $BERT_{base}$ inference by $2.7\times \sim 4.7\times$ across different network settings.
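The "outer-product insight" rests on a standard algebraic identity: X·W equals the sum over the inner dimension k of rank-1 outer products of X's k-th column with W's k-th row. The plaintext sketch below shows only this identity (function name ours); the actual Nimbus protocol secret-shares these terms between the two parties rather than computing them in the clear.

```python
def matmul_outer(X, W):
    """Compute X @ W as a sum of rank-1 outer products:
    sum_k outer(X[:, k], W[k, :]). Plaintext illustration of the identity
    only; no secret sharing is performed here."""
    n, k, m = len(X), len(W), len(W[0])
    out = [[0.0] * m for _ in range(n)]
    for t in range(k):            # one rank-1 outer-product term per inner index
        for i in range(n):
            for j in range(m):
                out[i][j] += X[i][t] * W[t][j]
    return out
```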

NeurIPS Conference 2024 Conference Paper

Point-PRC: A Prompt Learning Based Regulation Framework for Generalizable Point Cloud Analysis

  • Hongyu Sun
  • Qiuhong Ke
  • Yongcai Wang
  • Wang Chen
  • Kang Yang
  • Deying Li
  • Jianfei Cai

This paper investigates the 3D domain generalization (3DDG) ability of large 3D models based on prevalent prompt learning. Recent works demonstrate the performances of 3D point cloud recognition can be boosted remarkably by parameter-efficient prompt tuning. However, we observe that the improvement on downstream tasks comes at the expense of a severe drop in 3D domain generalization. To resolve this challenge, we present a comprehensive regulation framework that allows the learnable prompts to actively interact with the well-learned general knowledge in large 3D models to maintain good generalization. Specifically, the proposed framework imposes multiple explicit constraints on the prompt learning trajectory by maximizing the mutual agreement between task-specific predictions and task-agnostic knowledge. We design the regulation framework as a plug-and-play module to embed into existing representative large 3D models. Surprisingly, our method not only realizes consistently increasing generalization ability but also enhances task-specific 3D recognition performances across various 3DDG benchmarks by a clear margin. Considering the lack of study and evaluation on 3DDG, we also create three new benchmarks, namely base-to-new, cross-dataset and few-shot generalization benchmarks, to enrich the field and inspire future research. Code and benchmarks are available at \url{https://github.com/auniquesun/Point-PRC}.

IJCAI Conference 2022 Conference Paper

A Universal PINNs Method for Solving Partial Differential Equations with a Point Source

  • Xiang Huang
  • Hongsheng Liu
  • Beiji Shi
  • Zidong Wang
  • Kang Yang
  • Yang Li
  • Min Wang
  • Haotian Chu

In recent years, deep learning technology has been used to solve partial differential equations (PDEs), among which the physics-informed neural networks (PINNs) method emerges as a promising approach for solving both forward and inverse PDE problems. PDEs with a point source, expressed as a Dirac delta function in the governing equations, are mathematical models of many physical processes. However, they cannot be solved directly by the conventional PINNs method due to the singularity brought by the Dirac delta function. In this paper, we propose a universal solution to this problem through three novel techniques. Firstly, the Dirac delta function is modeled as a continuous probability density function to eliminate the singularity at the point source; secondly, a lower bound constrained uncertainty weighting algorithm is proposed to balance the physics-informed loss terms of the point source area and the remaining areas; and thirdly, a multi-scale deep neural network with a periodic activation function is used to improve accuracy and convergence speed. We evaluate the proposed method on three representative PDEs, and the experimental results show that our method outperforms existing deep learning based methods in accuracy, efficiency, and versatility.
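The first technique, modeling the Dirac delta as a continuous probability density, can be sketched in one dimension with a narrow Gaussian kernel: it integrates to 1 like the delta but is finite everywhere, so a PINN residual loss can sample it pointwise. The Gaussian is one possible kernel choice for illustration; the paper's technique admits continuous densities generally, and the function name is ours.

```python
import math

def smoothed_delta(x, x0, eps):
    """Approximate the 1-D Dirac delta at x0 with a narrow Gaussian pdf of
    width eps: finite everywhere, integrates to 1, and recovers delta-like
    behavior as eps -> 0."""
    return (math.exp(-((x - x0) ** 2) / (2.0 * eps ** 2))
            / (eps * math.sqrt(2.0 * math.pi)))
```

In a PINN loss, this smoothed source term replaces the singular delta on collocation points near the point source.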

NeurIPS Conference 2022 Conference Paper

Meta-Auto-Decoder for Solving Parametric Partial Differential Equations

  • Xiang Huang
  • Zhanhong Ye
  • Hongsheng Liu
  • Shi Ji
  • Zidong Wang
  • Kang Yang
  • Yang Li
  • Min Wang

Many important problems in science and engineering require solving the so-called parametric partial differential equations (PDEs), i.e., PDEs with different physical parameters, boundary conditions, shapes of computation domains, etc. Recently, building learning-based numerical solvers for parametric PDEs has become an emerging new field. One category of methods, such as the Deep Galerkin Method (DGM) and Physics-Informed Neural Networks (PINNs), aims to approximate the solution of the PDEs. They are typically unsupervised and mesh-free, but require going through the time-consuming network training process from scratch for each set of PDE parameters. Another category of methods, such as the Fourier Neural Operator (FNO) and Deep Operator Network (DeepONet), tries to approximate the solution mapping directly. Being fast, with only one forward inference per PDE parameter and no retraining, they often require a large corpus of paired input-output observations drawn from numerical simulations, and most of them need a predefined mesh as well. In this paper, we propose Meta-Auto-Decoder (MAD), a mesh-free and unsupervised deep learning method that enables the pre-trained model to be quickly adapted to equation instances by implicitly encoding (possibly heterogeneous) PDE parameters as latent vectors. The proposed MAD method can be interpreted through manifold learning in infinite-dimensional spaces, granting it a geometric insight. Extensive numerical experiments show that the MAD method converges faster than other deep learning-based methods without losing accuracy.

IJCAI Conference 2022 Conference Paper

MetaFinger: Fingerprinting the Deep Neural Networks with Meta-training

  • Kang Yang
  • Run Wang
  • Lina Wang

As deep neural networks (DNNs) play a critical role in various fields, the models themselves are becoming an important asset that needs to be protected. To achieve this, various neural network fingerprint methods have been proposed. However, existing methods fingerprint the decision boundary via adversarial examples, which is not robust to model modification and adversarial defenses. To fill this gap, we propose a robust fingerprint method, MetaFinger, which fingerprints the inner decision area of the model by meta-training, rather than the decision boundary. Specifically, we first generate many shadow models with DNN augmentation as meta-data. Then we optimize some images by meta-training to ensure that only models derived from the protected model can recognize them. To demonstrate the robustness of our fingerprint approach, we evaluate our method against two types of attacks: input modification and model modification. Experiments show that our method achieves 99.34% and 97.69% query accuracy on average, surpassing existing methods by over 30% and 25% on CIFAR-10 and Tiny-ImageNet, respectively. Our code is available at https://github.com/kangyangWHU/MetaFinger.

AAAI Conference 2019 Conference Paper

Multi-Precision Quantized Neural Networks via Encoding Decomposition of {-1,+1}

  • Qigong Sun
  • Fanhua Shang
  • Kang Yang
  • Xiufang Li
  • Yan Ren
  • Licheng Jiao

The training of deep neural networks (DNNs) requires intensive computation and storage resources. Thus, DNNs cannot be efficiently applied to mobile phones and embedded devices, which seriously limits their applicability in industry applications. To address this issue, we propose a novel encoding scheme that uses {−1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks, which can be efficiently implemented by bitwise operations (xnor and bitcount) to achieve model compression, computational acceleration, and resource saving. Based on our method, users can easily achieve different encoding precisions arbitrarily according to their requirements and hardware resources. The proposed mechanism is very suitable for FPGAs and ASICs in terms of data storage and computation, which provides a feasible idea for smart chips. We validate the effectiveness of our method on both large-scale image classification tasks (e.g., ImageNet) and object detection tasks. In particular, our method with low-bit encoding can still achieve almost the same performance as its full-precision counterparts.
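The core decomposition can be sketched on a single quantized value: each binary digit b of an unsigned M-bit integer maps to 2b − 1 ∈ {−1, +1}, giving M binary "planes" that bitwise xnor/bitcount kernels can process branch by branch. This is an illustrative reading of the encoding idea on scalars (function names ours), not the paper's full multi-branch network construction.

```python
def encode_pm1(value, bits):
    """Decompose an unsigned `bits`-bit quantized value into {-1, +1} planes,
    least-significant bit first, by mapping each binary digit b to 2b - 1."""
    return [2 * ((value >> i) & 1) - 1 for i in range(bits)]

def decode_pm1(planes):
    """Invert the encoding: map each {-1, +1} entry back to a bit and
    reassemble the integer."""
    return sum(((s + 1) // 2) << i for i, s in enumerate(planes))
```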