
Author name cluster

Yu Sun

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

54 papers
2 author rows

Possible papers


EAAI Journal 2026 Journal Article

A semantic segmentation model for early-stage fire detection from aerial remote sensing

  • Zhe Liu
  • Yu Sun
  • Xiangyuan Jiang
  • Pei Duan
  • Ming Li

Forest fires threaten the ecological environment and human life, yet current research focuses solely on detecting either flame or smoke, which often leads to missed or false detections. In this paper, we propose a semantic segmentation model that segments flame and smoke simultaneously. A Compact Atrous Spatial Pyramid Pooling module is developed to capture multi-scale contextual information efficiently, addressing the large scale disparity between flame and smoke. Additionally, a Bottom-up Detail-informed Feature Fusion Module is proposed, which uses shallow features to guide cross-layer feature fusion and thereby improves the detection accuracy of small targets. Lastly, a Foreground Emphasis Module is proposed to mitigate the foreground sparsity common in remote sensing images of early forest fires; it uses foreground classification results to guide segmentation, making the model focus on identifying the foreground. Experimental results suggest that our method markedly surpasses other methods in early-stage fire scenarios and achieves accurate disaster-area segmentation in various scenarios such as urban fires. In addition, the model runs at 41.83 frames per second on TITAN Xp devices, demonstrating both strong segmentation performance and real-time processing capability.
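
The abstract does not detail the "Compact" variant, but the classic Atrous Spatial Pyramid Pooling idea it builds on can be sketched as follows, assuming single-channel NumPy feature maps. `dilated_conv2d` and `aspp` are hypothetical helper names; each branch sees a different receptive field (rate 1 suits small flames, larger rates suit wide smoke plumes), and an image-level branch adds global context.

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """Single-channel 2-D atrous (dilated) convolution with zero 'same' padding.

    x: (H, W) feature map; kernel: (k, k) with odd k; rate: dilation factor.
    Implemented as cross-correlation, as in deep-learning frameworks.
    """
    k = kernel.shape[0]
    pad = rate * (k // 2)              # half-width of the dilated kernel
    xp = np.pad(x.astype(float), pad)
    H, W = x.shape
    out = np.zeros((H, W))
    for i in range(k):
        for j in range(k):
            # each kernel tap reads input pixels `rate` positions apart
            out += kernel[i, j] * xp[i * rate:i * rate + H, j * rate:j * rate + W]
    return out

def aspp(x, kernels, rates):
    """ASPP-style block: one dilated branch per rate plus an image-level branch.

    Returns (num_branches, H, W); a 1x1 convolution would normally fuse them.
    """
    branches = [dilated_conv2d(x, kern, r) for kern, r in zip(kernels, rates)]
    branches.append(np.full(x.shape, float(x.mean())))  # global-context branch
    return np.stack(branches)
```

A real module would use multi-channel learned kernels and a framework convolution with a `dilation` argument; the loop form here only makes the sampling pattern explicit.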

EAAI Journal 2026 Journal Article

An interpretable civil case judgment prediction method based on logical reasoning and knowledge enhancement

  • Shibo Cui
  • Yu Sun
  • Ning Wang
  • Wenguang Yan

Civil case judgment prediction uses artificial intelligence to support judicial decision-making, and its accuracy directly affects judicial efficiency and perceived social fairness. Current mainstream methods rely heavily on a case's surface features while ignoring the underlying legal reasoning, and they are limited by the lack of interpretability of black-box models, leaving them ill-equipped to handle multi-category disputes and easily confused cases. To address these issues, we propose an interpretable judgment prediction method that combines logical reasoning and knowledge enhancement to perform multi-task prediction in civil cases. First, a self-critique chain-of-thought reasoning module uses a large language model to extract the logical relationships implied by the case facts, and discrete prompt alignment is used to improve the logical interpretability of judgment prediction. Second, a knowledge enhancement module injects external legal knowledge while adaptively filtering noise. Furthermore, a multi-task expert recommendation process selects the appropriate expert sub-model via a dynamic gating network, improving discrimination among multi-category disputes. Experiments on real-world datasets demonstrate that the method outperforms existing methods in accuracy and interpretability, and experiments on a dataset of confusing cases demonstrate its superior generalization in complex scenarios.
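
The expert-selection step can be illustrated with a generic softmax gate that routes a case embedding to its top-k expert sub-models. This is a minimal sketch assuming linear gating weights and experts given as plain callables; `gated_experts` is a hypothetical name, and the paper's gating network may differ.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def gated_experts(h, gate_w, experts, top_k=1):
    """Route a case representation h to its top-k experts via a softmax gate.

    h: (d,) case embedding; gate_w: (num_experts, d) gating weights;
    experts: list of callables mapping (d,) -> (num_classes,) logits.
    """
    gate = softmax(gate_w @ h)                    # routing probabilities
    chosen = np.argsort(gate)[::-1][:top_k]       # highest-scoring experts
    weights = gate[chosen] / gate[chosen].sum()   # renormalise over chosen experts
    logits = sum(w * experts[i](h) for w, i in zip(weights, chosen))
    return logits, gate, chosen
```

Because the gate depends on the input, cases from different dispute categories can be sent to different specialised sub-models.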

JBHI Journal 2026 Journal Article

Delay-Aware Cross-Modal Knowledge Distillation for Driver Vigilance Estimation: Toward Practical Edge Deployment

  • Yu Sun
  • Shiwu Li
  • Tongtong Jin
  • Yiming Bie
  • Mengzhu Guo
  • Minghao Fu
  • Xin Huang

Efficient vigilance estimation in driving scenarios requires a balance between model performance and practicality. Electroencephalography (EEG), which directly reflects brain activity, is widely used for vigilance estimation, but its acquisition is complicated and difficult to apply in real-world driving. In contrast, physiological signals such as the electrooculogram, electrodermal activity, and photoplethysmography are more practical to deploy, but the information they provide is relatively limited. To address these issues, we propose a delay-aware cross-modal knowledge distillation method in which EEG signals are used only to train the teacher model. An information-theoretic criterion based on mutual information and response delay then determines which physiological signals are suitable as student modalities for distillation from the EEG-based teacher. Because physiological signals differ in their sensitivity to cognitive responses and therefore exhibit inherent temporal offsets, we propose a delay-aware soft alignment mechanism (DASA). DASA introduces learnable delay and spread parameters at the patch level to handle the temporal misalignment of different physiological signals and to capture the asynchronous dynamics between EEG and the other signals, providing soft, temporally aligned supervision from the teacher to the student model. Finally, an objective function combining cross-modal consistency, patch-level alignment, and smoothness regularization supports effective training of the proposed method. Extensive experiments on the MMV and SEED-VIG datasets validate that the proposed method outperforms existing methods in estimation accuracy and temporal alignment while maintaining the real-time performance required for edge deployment.
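
One plausible reading of "learnable delay and spread parameters at the patch level" is a Gaussian soft-alignment kernel: each student patch is supervised by a blend of teacher patches centred `delay` patches earlier, with width `spread`. The sketch below is an assumption, not DASA itself; delay and spread are plain floats here, whereas in training they would be trainable parameters, and all function names are hypothetical.

```python
import numpy as np

def soft_alignment_matrix(num_patches, delay, spread):
    """Gaussian weights mapping student patch i to teacher patches j.

    Row i is centred at j = i - delay (student lags teacher) with width `spread`.
    """
    i = np.arange(num_patches)[:, None]
    j = np.arange(num_patches)[None, :]
    logits = -((j - (i - delay)) ** 2) / (2.0 * spread ** 2)
    w = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable normalisation
    return w / w.sum(axis=1, keepdims=True)

def aligned_teacher_targets(teacher_feats, delay, spread):
    """Blend teacher patch features into one target per student patch."""
    A = soft_alignment_matrix(teacher_feats.shape[0], delay, spread)
    return A @ teacher_feats  # (num_patches, feat_dim)

def alignment_loss(student_feats, teacher_feats, delay, spread):
    """Mean squared error between student patches and soft-aligned teacher targets."""
    target = aligned_teacher_targets(teacher_feats, delay, spread)
    return float(np.mean((student_feats - target) ** 2))
```

With a student signal that lags the teacher by one patch, the loss is near zero at `delay=1` and larger at `delay=0`, which is the signal a gradient on `delay` would follow.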

JBHI Journal 2026 Journal Article

UltraUNet: Real-Time Ultrasound Tongue Segmentation for Diverse Linguistic and Imaging Conditions

  • Alisher Myrgyyassov
  • Zhen Song
  • Yu Sun
  • Bruce Xiao Wang
  • Min Ney Wong
  • Yongping Zheng

Ultrasound tongue imaging (UTI) provides a non-invasive, cost-effective modality for investigating speech articulation, speech motor control, and speech-related disorders. However, real-time tongue contour segmentation remains challenging due to the inherently low signal-to-noise ratio, variability in imaging conditions, and the computational demands of real-time performance. In this study, we propose UltraUNet, a lightweight and efficient encoder-decoder architecture optimized for real-time segmentation of tongue contours in ultrasound images. UltraUNet introduces several domain-informed innovations, including lightweight Squeeze-and-Excitation blocks for channel-wise feature recalibration in deeper layers, Group Normalization for stable small-batch training, and summation-based skip connections that minimize memory and computational overhead. These refinements enable UltraUNet to achieve high segmentation accuracy at 250 frames per second, making it suitable for real-time clinical workflows. UltraUNet also integrates ultrasound-specific augmentation techniques, including denoising and blur simulation using a point spread function. Additionally, we annotated UTI images from 8 datasets covering various imaging conditions. Comprehensive evaluations demonstrated the model's robustness and precision, with superior segmentation metrics on single-dataset testing (Dice = 0.855, MSD = 0.993 px) compared to established architectures. Furthermore, cross-dataset testing, training on 1 dataset and testing on 7 unseen datasets, revealed UltraUNet's generalization capability, with average Dice scores of 0.734 and 0.761 in Experiments 1 and 2, respectively.
The proposed framework offers a competitive solution for time-critical applications in speech research, speech motor disorder analysis, and clinical diagnostics, enabling real-time tongue functional analysis in diverse medical and research settings.
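
Two of the named building blocks can be sketched in NumPy for a single (C, H, W) feature map: a standard Squeeze-and-Excitation recalibration and a summation-based skip connection. This is a generic illustration under assumed weight shapes (reduction ratio r), not UltraUNet's code; the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def squeeze_excitation(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation recalibration for one feature map.

    x: (C, H, W); w1: (C//r, C); b1: (C//r,); w2: (C, C//r); b2: (C,).
    """
    s = x.mean(axis=(1, 2))             # squeeze: global average pool -> (C,)
    z = np.maximum(w1 @ s + b1, 0.0)    # bottleneck FC + ReLU -> (C//r,)
    gates = sigmoid(w2 @ z + b2)        # excitation: per-channel weights in (0, 1)
    return x * gates[:, None, None]     # rescale each channel

def summation_skip(decoder_feat, encoder_feat):
    """Summation-based skip connection: add features instead of concatenating."""
    return decoder_feat + encoder_feat
```

Summation keeps the channel count at C, whereas concatenation would double it to 2C and enlarge the following convolution, which is where the memory and compute savings come from.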

AAAI Conference 2026 Conference Paper

Zo3T: Zero-Shot 3D-Aware Trajectory-Guided Image-to-Video Generation via Test-Time Training

  • Ruicheng Zhang
  • Jun Zhou
  • Zunnan Xu
  • Zihao Liu
  • Jiehui Huang
  • Mingyang Zhang
  • Yu Sun
  • Xiu Li

Trajectory-guided image-to-video (I2V) generation aims to synthesize videos that follow user-specified motion instructions. Existing methods typically rely on computationally expensive fine-tuning on scarce annotated datasets. Although some zero-shot methods attempt trajectory control in the latent space, they may yield unrealistic motion by neglecting 3D perspective and by creating a misalignment between the manipulated latents and the network's noise predictions. To address these challenges, we introduce Zo3T, a novel zero-shot test-time-training framework for trajectory-guided generation with three core innovations. First, a 3D-Aware Kinematic Projection leverages inferred scene depth to derive perspective-correct affine transformations for target regions. Second, Trajectory-Guided Test-Time LoRA dynamically injects and optimizes ephemeral LoRA adapters in the denoising network alongside the latent state. Driven by a regional feature consistency loss, this co-adaptation enforces motion constraints while allowing the pre-trained model to locally adapt its internal representations to the manipulated latent, thereby ensuring generative fidelity and on-manifold adherence. Finally, Guidance Field Rectification refines the denoising path by optimizing the conditional guidance field through a one-step lookahead strategy, ensuring efficient progression toward the target trajectory. Zo3T significantly enhances 3D realism and motion accuracy in trajectory-controlled I2V generation, outperforming existing training-based and zero-shot approaches.
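
Why depth matters for a "perspective-correct" affine transform can be shown with the pinhole camera model, where apparent size is proportional to 1/depth: moving a region from depth z0 to depth z1 should scale it by z0/z1. The sketch assumes depths are already known (Zo3T infers them) and uses isotropic scaling plus translation; `perspective_affine` and `apply_affine` are hypothetical helpers.

```python
import numpy as np

def perspective_affine(src_center, dst_center, src_depth, dst_depth):
    """2x3 affine map moving a region from src_center to dst_center.

    Under a pinhole camera, apparent size is proportional to 1/depth, so the
    region is scaled by src_depth / dst_depth about its own centre.
    """
    s = src_depth / dst_depth
    cx, cy = src_center
    tx, ty = dst_center
    # p' = s * (p - src_center) + dst_center
    return np.array([[s, 0.0, tx - s * cx],
                     [0.0, s, ty - s * cy]])

def apply_affine(A, pts):
    """Apply a 2x3 affine matrix to an (N, 2) array of pixel coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    return pts_h @ A.T
```

For example, an object moving from depth 2 to depth 4 is shrunk to half size while its centre follows the user trajectory.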

JBHI Journal 2025 Journal Article

A Real-Time Contact-Free Atrial Fibrillation Detection System for Mobile Devices

  • Chih-Wei Tseng
  • Bing-Fei Wu
  • Yu Sun

As the global population ages, the mortality and prevalence of atrial fibrillation (AF) continue to rise, a significant concern given AF's strong association with stroke-related disability. Detecting AF before a stroke occurs has therefore become paramount. However, existing methods struggle to provide quick, easy, and affordable detection in complex environments with motion interference and varying lighting. To address these challenges, we propose a system deployable on edge computing devices such as smartphones, tablets, and laptops. To ensure that the dataset reflects real-world scenarios, we collected 7,216 30-second segments from 452 subjects, categorized into Atrial Fibrillation (AF), Normal Sinus Rhythm (NSR), and Other Arrhythmias (Others), with a subject ratio of 105:116:231. Our lightweight non-contact facial rPPG AF detection system uses a Convolutional Neural Network (CNN) with a large receptive field and a bidirectional spatial mapping augmented attention module (BiSME-ATT) coupled with a bidirectional feature pyramid network layer (BiFPN), optimized for mobile deployment by reducing model parameters and floating-point operations (FLOPs). In AF vs. non-AF classification, our approach improves accuracy, sensitivity, specificity, positive predictive value, and negative predictive value to 94.39%, 91.57%, 95.44%, 88.06%, and 96.93%, respectively. Furthermore, the results demonstrate notable improvements in AF detection across various motion and light intensity levels.
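
The five reported figures are standard confusion-matrix metrics with AF as the positive class. The sketch below shows their definitions; the counts used in any example are hypothetical and not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Screening metrics for AF (positive) vs. non-AF (negative)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # recall on AF segments
        "specificity": tn / (tn + fp),   # recall on non-AF segments
        "ppv": tp / (tp + fp),           # positive predictive value (precision)
        "npv": tn / (tn + fn),           # negative predictive value
    }
```

With hypothetical counts tp=90, fn=10, tn=180, fp=20, accuracy, sensitivity, and specificity are all 0.9, while PPV is only about 0.82 because non-AF segments outnumber AF segments.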

YNICL Journal 2025 Journal Article

Abnormal structural covariance network in major depressive disorder: Evidence from the REST-meta-MDD project

  • Changmin Chen
  • Yuhan Liu
  • Yu Sun
  • Wenhao Jiang
  • Yonggui Yuan
  • Zhao Qing

BACKGROUND: Major depressive disorder (MDD) is a common mental illness associated with brain morphological abnormalities. Although extensive studies have examined gray matter volume (GMV) changes in MDD, reported findings remain inconsistent. In the current study, we applied source-based morphometry (SBM) and structural covariance network (SCN) analyses to a large multi-center sample from the REST-meta-MDD database, aiming to characterize robust structural abnormalities in MDD. METHODS: We analyzed 798 MDD patients and 974 healthy controls (HCs) from the REST-meta-MDD consortium. Voxel-based morphometry was applied to generate GMV maps. SBM was used to adaptively parcellate the brain into components, and SCNs were constructed from the SBM components. Volume scores in each component and SCNs between components were compared between the MDD and HC groups, as well as between first-episode drug-naive (FEDN) and recurrent MDD subgroups. RESULTS: SBM identified 20 stable components. Three components encompassing the middle temporal gyrus, middle orbitofrontal gyrus, and superior frontal gyrus exhibited volumetric differences between the MDD and HC groups. Volume differences were observed in the cingulate cortex and medial frontal gyrus between the FEDN and recurrent groups. SCN analysis revealed 9 aberrant component pairs in MDD vs. HCs and 7 pairs in FEDN vs. recurrent groups. All aberrant component pairs in the SCN implicated the prefrontal cortex. CONCLUSIONS: These findings demonstrate brain structural deficits in MDD and highlight the prefrontal cortex as a central hub of SCN alterations. Our findings advance the understanding of MDD's neural mechanisms and suggest directions for diagnostic research.
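
A structural covariance network can be built by correlating component volume scores across subjects within a group, and one common way to compare a component pair between two groups is Fisher's z test for independent correlations. This is a sketch under that assumption; the study's actual statistic (for example, a permutation test) may differ, and the function names are hypothetical.

```python
import numpy as np

def structural_covariance(scores):
    """Component-by-component correlation of volume scores across subjects.

    scores: (n_subjects, n_components) -> (n_components, n_components)
    """
    return np.corrcoef(scores, rowvar=False)

def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations.

    Apply to off-diagonal entries only; arctanh(1) on the diagonal is infinite.
    """
    z1, z2 = np.arctanh(r1), np.arctanh(r2)
    return (z1 - z2) / np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
```

For a given pair (i, j), a call such as `fisher_z_difference(C_mdd[i, j], 798, C_hc[i, j], 974)` would use the group sizes reported above.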

TMLR Journal 2025 Journal Article

Batch Training for Streaming Time Series: A Transferable Augmentation Framework to Combat Distribution Shifts

  • Weiyang Zhang
  • Xinyang Chen
  • Yu Sun
  • Weili Guan
  • Liqiang Nie

Multivariate time series forecasting, which predicts future dynamics from historical data, has become an essential tool in modern data analysis. With the development of deep models, batch-training-based time series forecasting has made significant progress. However, in real-world applications, time series data are often collected incrementally in a streaming manner, with only a portion of the data available at each time step. Over time, distribution shifts can occur, causing a drastic decline in model performance. Online test-time adaptation and online time series forecasting have emerged as promising solutions to this challenge. However, most online test-time adaptation methods are designed for images and do not consider the specific characteristics of time series, while online time series forecasting typically updates the model with each newly collected sample individually, which is problematic when a sample deviates significantly from the historical distribution or contains noise, and can degrade generalization. In this paper, we propose Batch Training with Transferable Online Augmentation (BTOA), which enables batch training and improves model performance through three key ideas. First, to fully leverage historical information, Transferable Historical Sample Selection (THSS), which comes with theoretical guarantees, selects the historical samples most similar to the test-time distribution. Then, to mitigate the negative impact of distribution shifts through batch training while exploiting the unique characteristics of time series, Transferable Online Augmentation (TOA) augments the selected historical samples in terms of amplitude and phase in the frequency domain, in a two-stream manner.
Finally, a prediction module combining a series decomposition module and a two-stream forecaster extracts the complex patterns in time series, boosting prediction performance. Moreover, BTOA is a general approach that can be plugged into any existing batch-training-based deep model. Comprehensive experiments under both ideal and practical settings demonstrate that the proposed method achieves superior performance on all seven benchmark datasets, reducing the Mean Squared Error (MSE) by up to 13.7% compared with state-of-the-art approaches.
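
The two ingredients can be sketched loosely as (1) picking historical windows close to the recent data and (2) producing an amplitude view and a phase view of a series via the FFT. Euclidean nearest neighbours stand in for THSS, which has its own criterion and guarantees, and the fixed scale and shift values are assumptions, not TOA's learned or sampled perturbations. All function names are hypothetical.

```python
import numpy as np

def select_similar_history(history_windows, recent_window, k):
    """Pick the k historical windows closest (Euclidean) to the recent window."""
    d = np.linalg.norm(history_windows - recent_window, axis=1)
    return history_windows[np.argsort(d)[:k]]

def amplitude_phase_augment(x, amp_scale, phase_shift):
    """Perturb a 1-D series in the frequency domain.

    amp_scale multiplies the magnitude spectrum; phase_shift (radians) is added
    to the phase spectrum. Both may be scalars or per-frequency arrays.
    """
    spec = np.fft.rfft(x)
    amp, phase = np.abs(spec), np.angle(spec)
    return np.fft.irfft(amp * amp_scale * np.exp(1j * (phase + phase_shift)), n=len(x))

def two_stream_views(x, amp_scale, phase_shift):
    """One view perturbs only amplitude, the other only phase."""
    amp_view = amplitude_phase_augment(x, amp_scale, 0.0)
    phase_view = amplitude_phase_augment(x, 1.0, phase_shift)
    return amp_view, phase_view
```

With `amp_scale=1` and `phase_shift=0` the series is reconstructed unchanged, which is a useful sanity check before adding perturbations.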

JBHI Journal 2025 Journal Article

Explaining E/MEG Source Imaging and Beyond: An Updated Review

  • Zhao Feng
  • Ioannis Kakkos
  • George K. Matsopoulos
  • Cuntai Guan
  • Yu Sun

E/MEG source imaging (ESI) provides non-invasive measurements of brain activity with high spatial and temporal resolution. In particular, the wearability and portability of EEG make it an attractive research area beyond the biomedical community, given its broad application prospects in brain-computer interfaces (BCI), neuromarketing, and neuroergonomics. Although existing reviews offer valuable insights, they often present ESI models in relative isolation and may not cover the most recent advances in the field. In this work, we aim to: 1) provide a timely, in-depth review of widely explored and state-of-the-art ESI models, including their underlying neurophysiological assumptions and mathematical derivations; 2) list the primary applications of ESI and highlight crucial implementation steps; 3) discuss current challenges in ESI and propose future research directions; and 4) demonstrate practical usage and implementation details of various representative ESI models. As a rapidly expanding field, ESI continues to evolve and integrate new technologies. We believe that ESI is moving toward widespread application and will dramatically expand our understanding of brain dynamics.
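
A classic, widely used ESI model is the L2 minimum-norm estimate (MNE), which solves the underdetermined inverse problem y = L J by choosing the smallest-norm source vector J consistent with the sensor data. The sketch assumes a known lead-field matrix L and a hand-picked regularisation `lam`; it is an illustration of MNE, not the review's own code.

```python
import numpy as np

def minimum_norm_estimate(L, y, lam):
    """L2 minimum-norm source estimate J = L^T (L L^T + lam I)^{-1} y.

    L: (n_sensors, n_sources) lead field; y: (n_sensors,) or (n_sensors, n_times).
    """
    n_sensors = L.shape[0]
    gram = L @ L.T + lam * np.eye(n_sensors)
    return L.T @ np.linalg.solve(gram, y)
```

As `lam` tends to zero the estimate approaches the pseudo-inverse solution; in practice `lam` is set from the sensor noise level to trade data fit against source smoothness.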

JBHI Journal 2025 Journal Article

FBCPM: A Filter Bank Connectome-Based Predictive Modeling Framework for EEG Signals

  • Linze Qian
  • Sujie Wang
  • Ioannis Kakkos
  • Xiaoyu Li
  • Xinyi Xu
  • Mengru Xu
  • George K. Matsopoulos
  • Yi Sun

The human brain connectome has long been recognized as crucial to various cognitive functions. While connectome-based predictive modeling (CPM) has been extensively explored for predicting behavioral outcomes at the individual level, its application to electroencephalography (EEG) remains limited by the inherent diversity and complexity of EEG frequency information. In the present work, we address this issue by developing a filter bank CPM (FBCPM) framework that leverages narrowband EEG functional connectivity (FC) for individual prediction. Four independent datasets comprising 392 EEG recordings from 280 healthy subjects performing the psychomotor vigilance test (PVT) were used. Using the discovery dataset (i.e., Dataset 1) with 137 recordings, the feasibility of FBCPM was evaluated by predicting mean reaction time (RT) within a 15-min PVT task. FBCPM achieved notable prediction accuracy and outperformed four benchmark approaches. Subsequent comprehensive internal and external validation analyses further confirmed its robustness across hyper-parameters and its generalizability to three further independent datasets (i.e., Dataset 2 to Dataset 4) with different recording or preprocessing settings. Moreover, FBCPM performed satisfactorily when generalized to time-on-task (TOT) effect measures (i.e., $\Delta\mathit{RT}$ and $\mathit{TOT}_{\mathit{slope}}$). Further investigation of the features contributing to mean RT prediction revealed the strong predictive ability of negative features, manifesting as low-frequency (below 8 Hz) predominance and complex topological distributions. Overall, these findings indicate that FBCPM is a significant methodological advance in EEG-based individual prediction, moving a step closer to practical application in cognitive neuroscience.
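
The core CPM loop (select edges correlated with behaviour, summarise each subject by network strength, fit a linear model, evaluate with leave-one-out) can be sketched per frequency band. Two simplifications are assumptions: a fixed correlation threshold replaces a p-value cut-off, and band predictions are simply averaged, which may not be how FBCPM combines bands. Function names are hypothetical.

```python
import numpy as np

def cpm_fit_predict(train_fc, train_y, test_fc, r_thresh=0.3):
    """Connectome-based predictive modeling on vectorised connectivity.

    train_fc: (n_train, n_edges); train_y: (n_train,); test_fc: (n_test, n_edges).
    Edges whose correlation with behaviour exceeds r_thresh are kept; each
    subject is summarised by positive-minus-negative network strength.
    """
    fc_c = train_fc - train_fc.mean(axis=0)
    y_c = train_y - train_y.mean()
    r = (fc_c * y_c[:, None]).sum(axis=0) / (
        np.sqrt((fc_c ** 2).sum(axis=0) * (y_c ** 2).sum()) + 1e-12)
    pos, neg = r > r_thresh, r < -r_thresh

    def strength(fc):
        return fc[:, pos].sum(axis=1) - fc[:, neg].sum(axis=1)

    slope, intercept = np.polyfit(strength(train_fc), train_y, 1)
    return slope * strength(test_fc) + intercept

def filter_bank_cpm(band_fc, y, r_thresh=0.3):
    """Leave-one-out CPM per frequency band, averaging the band predictions.

    band_fc: dict band_name -> (n_subjects, n_edges) connectivity features.
    """
    n = len(y)
    preds = np.zeros(n)
    for i in range(n):
        train = np.arange(n) != i
        band_preds = [cpm_fit_predict(fc[train], y[train], fc[i:i + 1], r_thresh)[0]
                      for fc in band_fc.values()]
        preds[i] = np.mean(band_preds)
    return preds
```

Edge selection happens inside each training fold, so the held-out subject never influences which edges are kept.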

AAAI Conference 2025 Conference Paper

Graph Structure Learning for Spatial-Temporal Imputation: Adapting to Node and Feature Scales

  • Xinyu Yang
  • Yu Sun
  • Xinyang Chen
  • Ying Zhang
  • Xiaojie Yuan

Spatial-temporal data collected across geographic locations often suffer from missing values, posing challenges to data analysis. Existing methods primarily leverage fixed spatial graphs to impute missing values, implicitly assuming that the spatial relationship is roughly the same for all features across locations. However, they may overlook that diverse features recorded by sensors at different locations can have different spatial relationships. To address this, we introduce the multi-scale Graph Structure Learning framework for spatial-temporal Imputation (GSLI), which dynamically adapts to heterogeneous spatial correlations. Our framework combines node-scale graph structure learning, which captures the distinct global spatial correlations of different features, with feature-scale graph structure learning, which uncovers spatial correlations common across features at all stations. Integrated with prominence modeling, the framework emphasizes the nodes and features of greater significance in the imputation process. Furthermore, GSLI incorporates cross-feature and cross-temporal representation learning to capture spatial-temporal dependencies. Evaluated on six real-world incomplete spatial-temporal datasets, GSLI improves both data imputation and downstream applications.
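
A common way to learn a graph rather than fix it is to derive a row-normalised adjacency from learnable node embeddings, softmax(ReLU(E E^T)). The sketch below assumes that construction and gives each feature its own embedding table (node scale); a single shared table would give a feature-scale graph. It is not GSLI's exact formulation, and all names are hypothetical.

```python
import numpy as np

def row_softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def learned_adjacency(node_emb):
    """Adjacency from learnable node embeddings: row-wise softmax(ReLU(E E^T))."""
    return row_softmax(np.maximum(node_emb @ node_emb.T, 0.0))

def node_scale_graphs(feature_node_emb):
    """One learned graph per feature.

    feature_node_emb: (n_features, n_nodes, d) -> (n_features, n_nodes, n_nodes)
    """
    return np.stack([learned_adjacency(E) for E in feature_node_emb])

def propagate(x, adj):
    """One round of message passing per feature.

    x: (n_features, n_nodes) observations at one timestep; adj: per-feature graphs.
    """
    return np.einsum("fij,fj->fi", adj, x)
```

Because each feature has its own graph, temperature and humidity readings, for instance, can draw on different neighbouring stations.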

YNIMG Journal 2025 Journal Article

Identifying individuals with high susceptibility to mental fatigue: A functional connectivity study

  • Lingyun Gao
  • Mengru Xu
  • Linze Qian
  • Rui Zhang
  • Mingming Chen
  • Yeting Hu
  • Chuantao Li
  • Yu Sun

Substantial inter-individual differences in behavioral performance have repeatedly been observed during prolonged time-on-task (TOT), indicating complex neural mechanisms underlying mental fatigue. In this work, we provide a comprehensive investigation to identify individuals with high susceptibility to mental fatigue and to reveal its influence on brain network reorganization. Specifically, behavioral data and EEG signals were collected from 95 participants while they performed a 20-min psychomotor vigilance task (PVT). A composite index ($F_{index}$) was introduced, on the basis of which participants were categorized into fatigue-susceptible (FS, top third of $F_{index}$ values) and fatigue-resistant (FR, bottom third of $F_{index}$ values) groups ($N_{FS}/N_{FR}$ = 30/30). Functional connectivity was then estimated and used as input for the subsequent analyses. As expected, the FS group showed significant impairment of behavioral performance, while the performance of the FR group remained relatively stable. Subsequent brain network analyses showed frequency-dependent reorganization in both groups, with the FR group exhibiting greater stability and integrity than the FS group. Further classification analyses achieved satisfactory accuracy for FS identification (95.61%) and revealed a prominent centro-parietal distribution of contributing nodal features. In sum, this study provides further evidence of substantial individual differences in fatigue susceptibility and a practical approach to identifying individuals who are particularly prone to performance decline.

TMLR Journal 2025 Journal Article

Long Short-Term Imputer: Handling Consecutive Missing Values in Time Series

  • Jiacheng You
  • Xinyang Chen
  • Yu Sun
  • Weili Guan
  • Liqiang Nie

Missing values, frequently encountered in time series data, can significantly impede time-series analysis. With the progress of deep learning, advanced imputation models exploit the temporal dependencies inherent in time series data and show remarkable performance, making them natural choices for imputation tasks that assume data are Missing Completely at Random. Nonetheless, long-interval consecutive missing values may obstruct a model's ability to capture long-term temporal dependencies, degrading imputation performance. To tackle this challenge, we propose the Long Short-term Imputer (LSTI) to impute consecutive missing values over intervals of different lengths. The Long-term Imputer is based on bi-directional autoregression: a forward prediction model and a backward prediction model are trained with a consistency regularization designed to capture long-term dependencies and adapt to long-interval consecutive missing values. The Short-term Imputer is designed to capture short-term dependencies and effectively impute short-interval consecutive missing values. A meta-weighting network then combines the strengths of the two imputers. As a result, LSTI can effectively impute consecutive missing values of different interval lengths. Experiments demonstrate that our approach reduces error by 57.4% on average compared with state-of-the-art deep models across five datasets.
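
The bi-directional idea, filling a gap from both ends, can be illustrated with a deliberately naive stand-in: persistence forecasts from the left and right neighbours, blended with position-dependent weights. The fixed linear weights are an assumption replacing LSTI's learned forward/backward models and meta-weighting network; the function names are hypothetical.

```python
import numpy as np

def blend_bidirectional(forward_pred, backward_pred):
    """Combine forward and backward predictions over a gap of length L.

    Weights favour the forward model near the gap's start and the backward
    model near its end.
    """
    L = len(forward_pred)
    w = np.arange(1, L + 1) / (L + 1)   # weight on the backward prediction
    return (1 - w) * forward_pred + w * backward_pred

def impute_gap_naive(series):
    """Fill each run of NaNs using persistence forecasts from both sides.

    Assumes at least one value is observed.
    """
    x = np.asarray(series, dtype=float).copy()
    n, i = len(x), 0
    while i < n:
        if not np.isnan(x[i]):
            i += 1
            continue
        j = i
        while j < n and np.isnan(x[j]):
            j += 1
        left = x[i - 1] if i > 0 else x[j]     # forward model's anchor
        right = x[j] if j < n else left        # backward model's anchor
        gap = j - i
        x[i:j] = blend_bidirectional(np.full(gap, left), np.full(gap, right))
        i = j
    return x
```

With learned autoregressive models in place of persistence, the same blending lets each direction contribute most where it has the shortest extrapolation distance.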

NeurIPS Conference 2025 Conference Paper

LoRO: Real-Time on-Device Secure Inference for LLMs via TEE-Based Low Rank Obfuscation

  • Gaojian Xiong
  • Yu Sun
  • Jianhua Liu
  • Jian Cui
  • Jianwei Liu

While Large Language Models (LLMs) have achieved remarkable success, they are at constant risk of theft when deployed on untrusted edge devices. TEE-based secure inference has been proposed to protect valuable model property. However, we identify a statistical vulnerability in existing protection methods and further compromise their security guarantees with our proposed Model Stealing Attack with Prior. To eliminate this vulnerability, we present LoRO, which uses dense masks to completely obfuscate parameters. LoRO includes two innovations: (1) Low Rank Mask, which uses low-rank factors to generate dense masks efficiently, exponentially reducing the computational complexity inside the TEE to speed up inference while providing robust model confidentiality; and (2) Factors Multiplexing, which reuses several cornerstone factors to generate masks for all layers. Compared with one mask per layer, this reduces the secure memory requirement from GB-level to tens of MB, avoiding the hundred-fold latency introduced by secure memory paging. Experimental results indicate that LoRO achieves a $0.94\times$ Model Stealing (MS) accuracy, whereas SOTA methods yield at least $3.37\times$. The average inference latency of LoRO is only $1.49\times$, compared with $112\times$ for TEE-shielded inference. Moreover, LoRO incurs no accuracy loss and requires neither re-training nor structural modification. LoRO addresses concerns about model theft on edge devices efficiently and securely, facilitating wide edge deployment of LLMs.
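
Why a low-rank mask makes dense obfuscation cheap can be seen with an additive scheme: the untrusted device holds W + U V, and the TEE removes the mask's contribution as (x U) V, which costs far less than a dense d_in x d_out product when the rank is small. This additive protocol is an assumption for illustration only (it covers a single linear layer and does not model how LoRO handles activations); function names are hypothetical. Reusing the same U and V across layers would correspond loosely to Factors Multiplexing.

```python
import numpy as np

def make_low_rank_mask(rng, d_in, d_out, rank):
    """Thin factors U (d_in, rank) and V (rank, d_out), kept inside the TEE."""
    return rng.normal(size=(d_in, rank)), rng.normal(size=(rank, d_out))

def obfuscate(W, U, V):
    """Weights shipped to the untrusted device: every entry is perturbed."""
    return W + U @ V

def secure_linear(x, W_obf, U, V):
    """Untrusted device computes x @ W_obf; the TEE subtracts (x @ U) @ V.

    The correction costs O(batch * rank * (d_in + d_out)) instead of
    O(batch * d_in * d_out) for an unstructured dense mask.
    """
    y_untrusted = x @ W_obf        # heavy matmul outside the TEE
    correction = (x @ U) @ V       # cheap low-rank product inside the TEE
    return y_untrusted - correction
```

Storing U and V needs rank * (d_in + d_out) values instead of d_in * d_out, which is the source of the secure-memory savings described above.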

ICLR Conference 2025 Conference Paper

MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions

  • Yekun Chai
  • Haoran Sun
  • Huang Fang
  • Shuohuan Wang
  • Yu Sun
  • Hua Wu 0003

Reinforcement learning from human feedback (RLHF) has demonstrated effectiveness in aligning large language models (LLMs) with human preferences. However, token-level RLHF suffers from the credit assignment problem over long sequences, where delayed rewards make it challenging for the model to discern which actions contributed to preferred outcomes. This hinders learning efficiency and slows convergence. In this paper, we propose MA-RLHF, a simple yet effective RLHF framework that incorporates macro actions (sequences of tokens or higher-level language constructs) into the learning process. By operating at a higher level of abstraction, our approach reduces the temporal distance between actions and rewards, facilitating faster and more accurate credit assignment. This yields more stable policy gradient estimates and improves learning efficiency within each episode, all without increasing computational complexity during training or inference. We validate our approach through extensive experiments across various model sizes and tasks, including text summarization, dialogue generation, question answering, and program synthesis. Our method achieves substantial improvements over standard RLHF, with performance gains of up to 30% in text summarization and code generation, 18% in dialogue, and 8% in question answering. Notably, our approach reaches parity with vanilla RLHF 1.7 to 2 times faster in training time and continues to outperform it with further training. We make our code and data publicly available at https://github.com/ernie-research/MA-RLHF.
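
The "temporal distance" argument can be made concrete with discounted returns. With a single reward at the end of an 8-token response and discount 0.9, the first token's return is 0.9^7 (about 0.48), but if tokens are grouped into fixed 4-token macro actions the first macro action's return is 0.9. Fixed n-gram grouping is one simple termination rule assumed here for illustration; the helper names are hypothetical and this omits PPO, value baselines, and GAE.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for each step of an episode."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return out[::-1]

def to_macro_rewards(token_rewards, n):
    """Group tokens into fixed n-gram macro actions, summing rewards within each."""
    return [sum(token_rewards[i:i + n]) for i in range(0, len(token_rewards), n)]
```

Fewer decision steps between action and reward means the learning signal reaching early actions is less attenuated, which is the intuition behind faster credit assignment.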

NeurIPS Conference 2025 Conference Paper

Meta Guidance: Incorporating Inductive Biases into Deep Time Series Imputers

  • Jiacheng You
  • Xinyang Chen
  • Yu Sun
  • Weili Guan
  • Liqiang Nie

Missing values, frequently encountered in time series data, can significantly impair the effectiveness of analytical methods. While deep imputation models have become the predominant approach due to their superior performance, explicitly incorporating inductive biases aligned with time-series characteristics offers substantial room for improvement. Exploiting non-stationarity and periodicity in time series, we design two domain-specific inductive biases: (1) Non-Stationary Guidance, which operationalizes the proximity principle to handle highly non-stationary series by emphasizing temporal neighbors, and (2) Periodic Guidance, which exploits periodic patterns through learnable weight allocation across historical periods. Building on these complementary mechanisms, the overall module, named Meta Guidance, dynamically fuses both guidances through data-adaptive weights learned from each input sample. Experiments on nine benchmark datasets demonstrate that integrating Meta Guidance into existing deep imputation architectures reduces imputation error by 27.39% on average compared with state-of-the-art baselines.
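
The two guidances can be sketched as simple estimators for a missing point at time t: an exponentially decaying proximity average, and a weighted average of values exactly k periods earlier, fused by a weight alpha. The decay constant `tau`, fixed `period_weights`, and scalar `alpha` are assumptions standing in for the learned, sample-adaptive weights described in the abstract; function names are hypothetical.

```python
import numpy as np

def nonstationary_guidance(x, t, observed, tau=2.0):
    """Proximity-weighted estimate: nearer observed neighbours count more."""
    idx = np.flatnonzero(observed)
    w = np.exp(-np.abs(idx - t) / tau)
    return float(np.sum(w * x[idx]) / np.sum(w))

def periodic_guidance(x, t, observed, period, period_weights):
    """Weighted average of observed values exactly k periods before t (k = 1, 2, ...)."""
    vals, ws = [], []
    for k, wk in enumerate(period_weights, start=1):
        s = t - k * period
        if s >= 0 and observed[s]:
            vals.append(x[s])
            ws.append(wk)
    if not ws:
        return np.nan
    ws = np.asarray(ws)
    return float(np.dot(ws, vals) / ws.sum())

def meta_guidance(ng, pg, alpha):
    """Fuse the two guidances with a weight alpha in [0, 1] on the periodic one."""
    if np.isnan(pg):
        return ng
    return alpha * pg + (1 - alpha) * ng
```

For a strongly periodic series the periodic estimate recovers the missing value exactly, while for a drifting series a smaller alpha would lean on nearby observations.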

TMLR Journal 2025 Journal Article

Metamorphic Forward Adaptation Network: Dynamically Adaptive and Modular Multi-layer Learning

  • Yu Sun
  • Vijja Wichitwechkarn
  • Ronald Clark
  • Mirko Kovac
  • Basaran Bahadir Kocer

Back-propagation is a widely used algorithm for training neural networks by adjusting weights based on error gradients. However, back-propagation is biologically implausible because it relies on global derivative computation, and it lacks robustness in long-term dynamic learning. A previously proposed alternative is the Forward-Forward algorithm, which bypasses global gradient dependency and localises computations, making it more biologically plausible. However, Forward-Forward has been evaluated only in limited environments, does not yet match back-propagation's performance, and supports only classification, not regression. This research introduces the Metamorphic Forward Adaptation Network (MFAN), which is built around a contrastive learning property and retains the layer-wise architecture of the Forward-Forward algorithm. Whereas the Forward-Forward model is limited to discrete classification, MFAN can process both discrete and continuous data, showing stability, adaptability, and the ability to handle evolving data. MFAN performs well in continuous data-stream scenarios, demonstrating greater adaptability and robustness than back-propagation, particularly in tasks requiring dynamic, long-term learning.
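
The layer-local training MFAN inherits from Forward-Forward can be sketched with the original goodness objective: goodness is the sum of squared ReLU activations, positive data should exceed a threshold theta, negative data should fall below it, and each layer updates its own weights from its own input. This illustrates Forward-Forward, not MFAN's contrastive extension; `FFLayer` is a hypothetical class.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class FFLayer:
    """One layer trained with a local Forward-Forward objective."""

    def __init__(self, W, theta):
        self.W = np.array(W, dtype=float)
        self.theta = theta                       # goodness threshold

    def forward(self, x):
        return np.maximum(self.W @ x, 0.0)       # ReLU activations

    def goodness(self, x):
        return float(np.sum(self.forward(x) ** 2))

    def loss(self, x, positive):
        g = self.goodness(x)
        # positive data should have g > theta, negative data g < theta
        return float(softplus(self.theta - g) if positive else softplus(g - self.theta))

    def local_step(self, x, positive, lr=0.01):
        h = self.forward(x)
        g = float(np.sum(h ** 2))
        # d(loss)/dg: -sigmoid(theta - g) for positive, +sigmoid(g - theta) for negative
        dl_dg = -sigmoid(self.theta - g) if positive else sigmoid(g - self.theta)
        dl_dpre = dl_dg * 2.0 * h * (h > 0)      # chain rule through g = sum(h^2) and ReLU
        self.W -= lr * np.outer(dl_dpre, x)      # update uses only this layer's input
```

No gradient flows between layers: a stacked network would pass each layer's (normalised) output forward and call `local_step` on every layer independently.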

NeurIPS Conference 2025 Conference Paper

PolyGuard: Massive Multi-Domain Safety Policy-Grounded Guardrail Dataset

  • Mintong Kang
  • Zhaorun Chen
  • Chejian Xu
  • Jiawei Zhang
  • Chengquan Guo
  • Minzhou Pan
  • Ivan Revilla
  • Yu Sun

As large language models (LLMs) become widespread across diverse applications, concerns about the security and safety of LLM interactions have intensified. Numerous guardrail models and benchmarks have been developed to ensure LLM content safety. However, existing guardrail benchmarks are often built upon ad hoc risk taxonomies that lack a principled grounding in standardized safety policies, limiting their alignment with real-world operational requirements. Moreover, they tend to overlook domain-specific risks, while the same risk category can carry different implications across different domains. To bridge these gaps, we introduce PolyGuard, the first massive multi-domain safety policy-grounded guardrail dataset. PolyGuard offers: (1) broad domain coverage across eight safety-critical domains, such as finance, law, and codeGen; (2) policy-grounded risk construction based on authentic, domain-specific safety guidelines; (3) diverse interaction formats, encompassing declarative statements, questions, instructions, and multi-turn conversations; (4) advanced benign data curation via detoxification prompting to challenge over-refusal behaviors; and (5) attack-enhanced instances that simulate adversarial inputs designed to bypass guardrails. Based on PolyGuard, we benchmark 19 advanced guardrail models and uncover a series of findings, such as: (1) All models achieve varied F1 scores, with many demonstrating high variance across risk categories, highlighting their limited domain coverage and insufficient handling of domain-specific safety concerns; (2) As models evolve, their coverage of safety risks broadens, but performance on common risk categories may decrease; (3) All models remain vulnerable to optimized adversarial attacks. The policy-grounded PolyGuard dataset establishes the first principled and comprehensive guardrail benchmark.
We believe that PolyGuard and the unique insights derived from our evaluations will advance the development of policy-aligned and resilient guardrail systems.

JMLR Journal 2025 Journal Article

Test-Time Training on Video Streams

  • Renhao Wang
  • Yu Sun
  • Arnuv Tandon
  • Yossi Gandelsman
  • Xinlei Chen
  • Alexei A. Efros
  • Xiaolong Wang

Prior work has established Test-Time Training (TTT) as a general framework to further improve a trained model at test time. Before making a prediction on each test instance, the model is first trained on the same instance using a self-supervised task such as reconstruction. We extend TTT to the streaming setting, where multiple test instances (video frames in our case) arrive in temporal order. Our extension is online TTT: the current model is initialized from the previous model, then trained on the current frame and a small window of frames immediately before it. Online TTT significantly outperforms the fixed-model baseline on four tasks across three real-world datasets. The improvements are more than 2.2x and 1.5x for instance and panoptic segmentation, respectively. Surprisingly, online TTT also outperforms its offline variant, which accesses strictly more information by training on all frames from the entire test video regardless of temporal order. This finding challenges results from prior work that used synthetic videos. We formalize a notion of locality as the advantage of online over offline TTT, and analyze its role with ablations and a theory based on the bias-variance trade-off.
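
The online TTT loop described above can be sketched as follows. This is a toy illustration under heavy assumptions: the "model" is a single scalar fitted by a few gradient steps on a pixel-reconstruction loss, standing in for the self-supervised task; `window`, `lr`, and `steps` are hypothetical values; and the function names are invented for illustration. What it does show is the warm start from the previous frame's model and the sliding window of recent frames.

```python
from collections import deque

def fit_reconstruction(param, frames, lr=0.1, steps=5):
    """Toy self-supervised step: gradient descent on the mean squared error
    between a scalar 'model' and every pixel in the given frames."""
    pixels = [p for frame in frames for p in frame]
    for _ in range(steps):
        grad = sum(2 * (param - p) for p in pixels) / len(pixels)
        param -= lr * grad
    return param

def online_ttt(stream, init_param, window=2):
    """Online TTT: each frame's model starts from the previous frame's model and
    is trained on the current frame plus up to `window` preceding frames."""
    param = init_param
    recent = deque(maxlen=window + 1)  # current frame + `window` previous frames
    outputs = []
    for frame in stream:
        recent.append(frame)
        param = fit_reconstruction(param, list(recent))  # warm start, not reset
        outputs.append(param)  # stand-in for the prediction on this frame
    return outputs

# Frames whose pixel values drift upward over time.
print(online_ttt([[0, 0], [1, 1], [2, 2], [3, 3]], init_param=0.0))
```

The offline variant mentioned in the abstract would instead train on all frames regardless of order; in this toy, that would correspond to fitting one parameter to every pixel of the whole video.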

YNIMG Journal 2025 Journal Article

Transcranial photobiomodulation improves functional brain networks and working memory in healthy older adults: An fNIRS study

  • Qin Yang
  • Xiujuan Qu
  • Can Sheng
  • Xing Zhao
  • Guanqun Chen
  • Xiaoni Wang
  • Yuxia Li
  • Wenying Du

BACKGROUND: Transcranial photobiomodulation (tPBM), a novel non-invasive neurostimulation technique, has shown compelling potential for improving cognitive function in the aging population. However, the underlying mechanism remains unclear. Neuroimaging studies have found that tPBM-induced physiological changes occur in both targeted and non-targeted brain areas, suggesting the need to understand the modulation mechanism at the whole-brain level. OBJECTIVE: This randomized, single-blind, sham-controlled crossover study aimed to test the hypothesis that tPBM improves working memory in healthy older adults by optimizing the properties of resting-state functional brain networks. METHODS: A total of 55 right-handed healthy older adults were randomly assigned to a sham tPBM session group or an active tPBM session group. After a washout interval, they were assigned to the opposite intervention session. Each session included the following: active or sham tPBM application with a 1064-nm laser to the left forehead; resting-state functional near-infrared spectroscopy (fNIRS) measurements before and after stimulation; and the digital n-back task. Differences in accuracy and reaction time on the n-back task, and changes in functional connectivity and graph metrics of the brain networks, were investigated and compared between the active and sham tPBM sessions. In addition, correlations between tPBM-induced changes in functional brain networks and n-back performance were examined. RESULTS: Compared with the sham tPBM session, accuracy and reaction time on the 3-back task significantly improved in the active tPBM session. In addition, global efficiency, local efficiency, nodal efficiency, and functional connectivity significantly increased in the active tPBM session, particularly in the frontoparietal areas.
Importantly, the change in 3-back accuracy was positively correlated with changes in functional connectivity and nodal efficiency, mainly in the left prefrontal cortex, among participants whose 3-back accuracy increased in the active tPBM session. CONCLUSION: This study suggests that tPBM may serve as an effective tool to improve working memory in older adults through the modulation of resting-state functional brain network properties. Investigations in larger samples are needed to further validate these findings.

JBHI Journal 2025 Journal Article

Video Object Segmentation with Optimal Frame Auto-selection Based on Prior Knowledge for Midbrain Assessment in Transcranial Ultrasound

  • Xinyi Wang
  • Sai Kit Lam
  • Hongyu Kang
  • Yu Sun
  • Chao Hou
  • Shuai Li
  • Xin Sun
  • Fangxian Li

Transcranial sonography (TCS) provides a non-invasive means of assessing movement disorders such as Parkinson's disease (PD). However, current TCS-based evaluations rely heavily on manual operation by experienced physicians, making the process time-consuming and physician-dependent. For the first time, we aimed to develop a hybrid pipeline for real-time video object segmentation (VOS) and automatic optimal frame selection. Eighty-three standardized real-time TCS recordings comprising 1,992 midbrain frames were collected from Beijing Tiantan Hospital. We adopted three state-of-the-art VOS models (STCN, RDE-VOS, and XMEM) and incorporated anatomical priors to guide optimal frame selection. Specifically, we leveraged the anatomical trend of midbrain morphology to estimate the midbrain radius at the optimal frame and selected the frame where the VOS-segmented midbrain best matched this estimate. The XMEM-based pipeline achieved high segmentation performance (Jaccard: 0.85, Boundary Accuracy: 0.95, Dice: 0.92) and optimal frame selection performance (Distance: 4.87; Jaccard: 0.92), with high efficiency (51.05 FPS, 0.56 s/patient, 661.55 MB). Subgroup analyses confirmed robustness across image quality levels and PD conditions. An assessment of a junior physician's selections suggests the pipeline's potential to reduce the expertise gap in optimal frame selection. The proposed hybrid pipeline offers an automated tool for midbrain assessment using TCS, which may help reduce physicians' workload and minimize subjectivity, particularly supporting junior physicians in mitigating the expertise-demanding nature of TCS. This approach may serve as a foundation for more promising TCS-based assessments in the future, contributing to broader adoption of non-invasive ultrasound techniques in PD evaluation.

NeurIPS Conference 2025 Conference Paper

Whitened Score Diffusion: A Structured Prior for Imaging Inverse Problems

  • Jeffrey Alido
  • Tongyu Li
  • Yu Sun
  • Lei Tian

Conventional score-based diffusion models (DMs) may struggle with anisotropic Gaussian diffusion processes due to the required inversion of covariance matrices in the denoising score matching training objective (Vincent, 2011). We propose Whitened Score (WS) diffusion models, a novel framework based on stochastic differential equations that learns the Whitened Score function instead of the standard score. This approach circumvents covariance inversion, extending score-based DMs by enabling stable training of DMs on arbitrary Gaussian forward noising processes. WS DMs establish equivalence with flow matching for arbitrary Gaussian noise, allow for tailored spectral inductive biases, and provide strong Bayesian priors for imaging inverse problems with structured noise. We experiment with a variety of computational imaging tasks using the CIFAR, CelebA (64×64), and CelebA-HQ (256×256) datasets and demonstrate that WS diffusion priors trained on anisotropic Gaussian noising processes consistently outperform conventional diffusion priors based on isotropic Gaussian noise.

EAAI Journal 2024 Journal Article

A lane-changing trajectory re-planning method considering conflicting traffic scenarios

  • Haifeng Du
  • Yu Sun
  • Yongjun Pan
  • Zhixiong Li
  • Patrick Siarry

An essential aspect of intelligent driving systems is the automatic lane-changing function. However, in real-world traffic, an initially planned lane-changing trajectory can become hazardous due to the intricate and unpredictable nature of human driving behavior. Assuming that vehicles face risks during lane changes, an integrated methodology is proposed to assess road-condition hazards in real time and, if deemed necessary, to quickly adjust the predetermined vehicle trajectory to mitigate the risks of conflicting lane changes. To improve traffic efficiency, vehicles are encouraged to continue the lane change by adjusting their trajectory: instead of immediately abandoning the lane change, they should strategically assess the situation before making decisions. First, the variables influencing re-planning are analyzed to determine the circumstances under which the lane change can be maintained. Next, a trajectory re-planning module is introduced, supported by two neural-network data-fitting models that enable real-time performance. Finally, a series of numerical experiments confirms that the method effectively guides autonomous vehicles through quick and safe lane-change re-planning in high-risk traffic environments. The proposed approach extends the capacity to target traffic-flow gaps and dynamically re-plan lane-changing motivations, ensuring that the vehicle can persist in changing lanes rather than reverting to the original lane.

JBHI Journal 2024 Journal Article

Contact-Free Atrial Fibrillation Screening With Attention Network

  • Yi-Chiao Wu
  • Chun-Hsien Lin
  • Li-Wen Chiu
  • Bing-Fei Wu
  • Meng-Liang Chung
  • Sung-Chun Tang
  • Yu Sun

Atrial Fibrillation (AF) screening from face videos has become popular in recent years with the growth of telemedicine and telehealth. In this study, the largest facial image database for camera-based AF detection is proposed. It includes 657 participants from two clinical sites, each recorded for about 10 minutes; the videos can be further divided into over 10,000 segments of around 30 seconds, a duration chosen according to the AF diagnosis guideline. Notably, 2,979 segments are labeled segment-wise, that is, every rhythm is independently labeled as AF or not, and all labels were manually confirmed by a cardiologist. Various environments, talking, facial expressions, and head movements are included in the data collection, reflecting practical usage conditions. Specific to camera-based AF screening, a novel CNN-based architecture equipped with an attention mechanism is proposed. It fuses heartbeat consistency, heart rate variability derived from remote photoplethysmography, and motion features simultaneously to produce reliable outputs. With the proposed model, intra-database evaluation reaches 96.62% sensitivity, 90.61% specificity, and an AUC of 0.96. Furthermore, to thoroughly check the adaptability of the proposed method, a cross-database evaluation is also conducted, in which performance reaches about 90% on average, with AUCs above 0.94 at both clinical sites.

NeurIPS Conference 2024 Conference Paper

DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion

  • Yilong Chen
  • Linhao Zhang
  • Junyuan Shang
  • Zhenyu Zhang
  • Tingwen Liu
  • Shuohuan Wang
  • Yu Sun

Large language models (LLMs) with billions of parameters demonstrate impressive performance. However, the widely used Multi-Head Attention (MHA) in LLMs incurs substantial computational and memory costs during inference. While some efforts have optimized attention mechanisms by pruning heads or sharing parameters among heads, these methods often lead to performance degradation or necessitate substantial continued pre-training costs to restore performance. Based on an analysis of attention redundancy, we design a Decoupled-Head Attention (DHA) mechanism. DHA adaptively configures group sharing for key heads and value heads across various layers, achieving a better balance between performance and efficiency. Inspired by the observation that similar heads cluster together, we propose to progressively transform the MHA checkpoint into the DHA model through step-by-step linear fusion of similar head parameters, retaining the parametric knowledge of the MHA checkpoint. We construct DHA models by transforming MHA checkpoints of various scales given target head budgets. Our experiments show that DHA requires a mere 0.25% of the original model's pre-training budget to achieve 96.1% of its performance while saving 75% of the KV cache. Compared to Group-Query Attention (GQA), DHA achieves a 5× training acceleration, a maximum of 13.93% performance improvement under a 0.01% pre-training budget, and a 5% relative improvement under a 0.05% pre-training budget.
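
Two ingredients of the abstract, fusing similar heads by linear combination and the resulting KV-cache saving, can be sketched in plain Python. This is an illustration under stated assumptions rather than DHA itself: heads are represented as flattened parameter vectors, fusion uses uniform averaging within hand-picked groups, and a single grouping is applied to keys and values alike (DHA configures key and value grouping separately per layer). The model configuration is hypothetical.

```python
def fuse_heads(head_params, groups):
    """Average the flattened parameter vectors of the heads in each group.
    Uniform weights are an assumption; DHA fuses similar heads progressively."""
    fused = []
    for group in groups:
        dim = len(head_params[group[0]])
        fused.append([sum(head_params[h][i] for h in group) / len(group)
                      for i in range(dim)])
    return fused

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Cache size: keys and values (factor 2) per layer, KV head, and token."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

# Hypothetical config: 32 layers, head_dim 128, 4096 tokens, fp16 values.
full = kv_cache_bytes(32, 32, 128, 4096)    # 32 KV heads per layer (MHA)
shared = kv_cache_bytes(32, 8, 128, 4096)   # 8 shared KV heads per layer
print(1 - shared / full)
```

Reducing 32 KV heads to 8 in every layer prints 0.75, matching the 75% cache saving quoted above; the specific head counts are illustrative only.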

NeurIPS Conference 2024 Conference Paper

Frequency-aware Generative Models for Multivariate Time Series Imputation

  • Xinyu Yang
  • Yu Sun
  • Xiaojie Yuan
  • Xinyang Chen

Missing data in multivariate time series are a common issue that can affect analysis and downstream applications. Although multivariate time series generally consist of trend, seasonal, and residual terms, existing works mainly focus on optimizing the modeling of the first two terms. However, we find that the residual term is more crucial for accurate imputation, since it is more closely related to the diverse changes in the data and is the largest component of imputation error. Therefore, in this study, we introduce frequency-domain information and design Frequency-aware Generative Models for Multivariate Time Series Imputation (FGTI). Specifically, FGTI employs a high-frequency filter to boost the imputation of the residual term, supplemented by a dominant-frequency filter for trend and seasonal imputation. A cross-domain representation learning module then fuses frequency-domain insights with deep representations. Experiments on various datasets with real-world missing values show that FGTI outperforms existing methods in both data imputation and downstream applications.
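
The split between a high-frequency part and a dominant-frequency part can be illustrated with a naive discrete Fourier transform in plain Python. This sketch only shows the filtering idea; it is not FGTI's filter design, which operates inside a generative imputation model, and the `cutoff` and `n_freqs` values are arbitrary choices for the toy series.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT, keeping the real part (inputs here are real-valued series)."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def split_by_frequency(x, cutoff):
    """High-pass/low-pass split: bins k and n - k share frequency min(k, n - k)."""
    X, n = dft(x), len(x)
    low = [X[k] if min(k, n - k) <= cutoff else 0 for k in range(n)]
    high = [X[k] - low[k] for k in range(n)]
    return idft(low), idft(high)

def dominant_frequency_part(x, n_freqs):
    """Keep only the n_freqs frequencies with the largest total magnitude."""
    X, n = dft(x), len(x)
    mags = {}
    for k in range(n):
        f = min(k, n - k)
        mags[f] = mags.get(f, 0.0) + abs(X[k])
    keep = set(sorted(mags, key=mags.get, reverse=True)[:n_freqs])
    return idft([X[k] if min(k, n - k) in keep else 0 for k in range(n)])

# Toy series: a slow sine (frequency 1) plus a small alternating term (frequency 4).
x = [math.sin(2 * math.pi * t / 8) + 0.1 * (-1) ** t for t in range(8)]
low, high = split_by_frequency(x, cutoff=2)
dominant = dominant_frequency_part(x, n_freqs=1)
```

On this toy series, `high` isolates the alternating component and `dominant` recovers the sine.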

ICML Conference 2024 Conference Paper

High-Order Contrastive Learning with Fine-grained Comparative Levels for Sparse Ordinal Tensor Completion

  • Yu Dai
  • Junchen Shen
  • Zijie Zhai
  • Danlin Liu
  • Jingyang Chen
  • Yu Sun
  • Ping Li
  • Jie Zhang

Contrastive learning is a powerful paradigm for representation learning with prominent success in computer vision and NLP, but how to extend its success to high-dimensional tensors remains a challenge. This is because tensor data often exhibit high-order mode interactions that are hard to profile, with the number of negative samples growing combinatorially faster than in second-order contrastive learning; furthermore, many real-world tensors have ordinal entries that necessitate more delicate comparative levels. To address this challenge, we propose High-Order Contrastive Tensor Completion (HOCTC), an innovative network that extends contrastive learning to sparse ordinal tensor data. HOCTC employs a novel attention-based strategy with query expansion to capture high-order mode interactions even with very limited tokens, going beyond second-order learning scenarios. In addition, it extends two-level comparisons (positive vs. negative) to fine-grained contrast levels, using ordinal tensor entries as natural guidance. An efficient sampling scheme is proposed to enforce such delicate comparative structures, generating comprehensive self-supervised signals for high-order representation learning. Extensive experiments show that HOCTC achieves promising results for sparse tensor completion in traffic and recommender applications.

EAAI Journal 2024 Journal Article

Identification of product definition patterns in mass customization by multi-information fusion weighted support vector machine

  • Ruoda Wang
  • Yu Sun
  • Jun Ni
  • Han Zheng

In mass customization, companies have built product families to enhance design efficiency and meet customer requirements. However, the complex and diverse customer requirements make the traditional process of mapping customer needs to product families challenging and heavily reliant on prior knowledge. To address this challenge, the mapping task is treated as a classification problem, with customer requirements as classification features and product families as category labels. Based on information theory, this study considers the information gain (IG) and mutual information (MI) between the classification features and the labels. The uncertainty relationship between the two is explored using grey relational analysis (GRA). A hybrid weighting matrix is constructed by combining the effects of these three aspects, which is then used to improve the calculation of the classical support vector machine (CSVM) kernel function, forming a multi-information fusion weighted support vector machine (MIFWSVM) model. This model can take new requirements as input and output product variants that may satisfy the customer. To demonstrate the effectiveness of the proposed method, a case study of a mechanical press company was reported, comparing the MIFWSVM model with classical classifiers and exploring the impact of different weighting methods on the performance of CSVM. The MIFWSVM model achieved an average accuracy of 0.9205 with a standard deviation of 0.0506 and a macro F1 score of 0.9032 with a standard deviation of 0.0589, outperforming other methods. These results indicate that the MIFWSVM model significantly improves the accuracy and stability of customer demand mapping.
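
One common way to inject feature weights into an SVM is to weight each feature's contribution inside the kernel. The sketch below assumes the hybrid weighting matrix is diagonal and applies it to an RBF kernel; the abstract does not specify the kernel form or how IG, MI, and GRA are combined, so both choices here (and the equal mixing coefficients) are hypothetical rather than the MIFWSVM formulation.

```python
import math

def hybrid_weights(ig, mi, gra, mix=(1 / 3, 1 / 3, 1 / 3)):
    """Combine per-feature IG, MI, and GRA scores into weights that sum to 1.
    The equal mixing coefficients are an assumption."""
    raw = [mix[0] * a + mix[1] * b + mix[2] * c for a, b, c in zip(ig, mi, gra)]
    total = sum(raw)
    return [r / total for r in raw]

def weighted_rbf(x, y, w, gamma=1.0):
    """Feature-weighted RBF kernel exp(-gamma * sum_i w_i * (x_i - y_i)^2),
    i.e. a standard RBF kernel on features rescaled by sqrt(w_i)."""
    return math.exp(-gamma * sum(wi * (a - b) ** 2 for wi, a, b in zip(w, x, y)))

def gram_matrix(rows_a, rows_b, w, gamma=1.0):
    """Kernel matrix between two sets of samples."""
    return [[weighted_rbf(a, b, w, gamma) for b in rows_b] for a in rows_a]
```

A Gram matrix computed this way can be passed to scikit-learn's `SVC(kernel="precomputed")` for training and prediction.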

NeurIPS Conference 2024 Conference Paper

Principled Probabilistic Imaging using Diffusion Models as Plug-and-Play Priors

  • Zihui Wu
  • Yu Sun
  • Yifan Chen
  • Bingliang Zhang
  • Yisong Yue
  • Katherine L. Bouman

Diffusion models (DMs) have recently shown outstanding capabilities in modeling complex image distributions, making them expressive image priors for solving Bayesian inverse problems. However, most existing DM-based methods rely on approximations in the generative process to be generic to different inverse problems, leading to inaccurate sample distributions that deviate from the target posterior defined within the Bayesian framework. To harness the generative power of DMs while avoiding such approximations, we propose a Markov chain Monte Carlo algorithm that performs posterior sampling for general inverse problems by reducing it to sampling the posterior of a Gaussian denoising problem. Crucially, we leverage a general DM formulation as a unified interface that allows for rigorously solving the denoising problem with a range of state-of-the-art DMs. We demonstrate the effectiveness of the proposed method on six inverse problems (three linear and three nonlinear), including a real-world black hole imaging problem. Experimental results indicate that our proposed method offers more accurate reconstructions and posterior estimation compared to existing DM-based imaging inverse methods.

JBHI Journal 2023 Journal Article

Individualized Prediction of Task Performance Decline Using Pre-Task Resting-State Functional Connectivity

  • Peng Qi
  • Xiaobing Zhang
  • Ioannis Kakkos
  • Kuijun Wu
  • Sujie Wang
  • Jingjia Yuan
  • Lingyun Gao
  • George K. Matsopoulos

As a common complaint in contemporary society, mental fatigue is a key element in the deterioration of daily activities known as the time-on-task (TOT) effect, making the prediction of fatigue-related performance decline exceedingly important. However, conventional group-level brain-behavior correlation analysis has limited generalizability to unseen individuals, and fatigue prediction at the individual level is challenging due to the significant differences between individuals in both task performance efficiency and brain activity. Here, we introduced a cross-validated, data-driven analysis framework to explore, for the first time, the feasibility of using pre-task idiosyncratic resting-state functional connectivity (FC) to predict fatigue-related task performance degradation at the individual level. Specifically, two behavioral metrics, namely ΔRT (between the most vigilant and most fatigued states) and TOT_slope over the course of the 15-min sustained attention task, were estimated across three sessions from 37 healthy subjects to represent fatigue-related individual behavioral impairment. Then, a connectome-based prediction model was applied to pre-task resting-state FC features, identifying the network-related differences that contributed to the prediction of performance deterioration. As expected, prominent population-level TOT-related performance declines were revealed across the three sessions, accompanied by substantial inter-individual differences. More importantly, we achieved significantly high accuracies for individualized prediction of both TOT-related behavioral impairment metrics using pre-task neuroimaging features. Despite the distinct patterns between the two behavioral metrics, the top FC features contributing to the individualized predictions mainly resided within and between frontal, temporal, and parietal areas.
Overall, our individualized prediction framework extends conventional correlation/classification analysis and may represent a promising avenue for developing applicable techniques that help guard against TOT-related performance declines in real-world scenarios.

EAAI Journal 2023 Journal Article

Joint task offloading and resource allocation for multi-user and multi-server MEC networks: A deep reinforcement learning approach with multi-branch architecture

  • Yu Sun
  • Qijie He

Mobile Edge Computing (MEC) is a promising computing paradigm in the context of 5G networks, as it enables the migration of workloads from User Equipments (UEs) to nearby MEC servers, thereby providing additional computing resources to UEs. In this paper, we propose a joint optimization approach to offloading decisions and resource allocation in a multi-user and multi-server MEC system, which operates in a time-varying environment. Our objective is to minimize the average task latency and discard rate under the constraints of latency and limited computing resources of the server. While traditional optimization methods have been used to solve computational offloading problems in static environments, these methods are not suitable for time-varying systems. Deep reinforcement learning can be an effective method for solving optimization problems in time-varying environments, as it can adjust strategies in real time in response to changes in the environment. However, increasing the number of UEs in the system leads to a combinatorial increase in the number of possible actions, making it difficult for the algorithm to learn. To address this issue, we propose a multi-branch Deep Q Network (DQN) algorithm called Branch Deep Q Network (BDQN), which restructures the action generation network into multiple branches, each of which generates one dimension of the action. With this modification, the number of network outputs grows only linearly with the number of UEs. Numerical results show that the BDQN algorithm outperforms other baseline algorithms in terms of average task latency and discard rate.
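
The scaling argument can be checked with simple counting. In this plain-Python sketch, each UE picks one of a fixed number of offloading options (local execution or one of several servers, an assumed action design), and per-branch greedy selection stands in for BDQN's action generation; network training is omitted and the function names are hypothetical.

```python
def joint_action_count(num_ues, choices_per_ue):
    """A single Q-head needs one output per joint action: combinatorial growth."""
    return choices_per_ue ** num_ues

def branched_output_count(num_ues, choices_per_ue):
    """One branch per UE, each with its own Q-values: linear growth."""
    return num_ues * choices_per_ue

def greedy_branch_actions(branch_q_values):
    """Each branch independently takes the argmax of its own Q-values,
    yielding one action dimension per UE."""
    return [max(range(len(q)), key=q.__getitem__) for q in branch_q_values]

# Hypothetical setting: 10 UEs, each choosing local execution or one of 3 servers.
print(joint_action_count(10, 4), branched_output_count(10, 4))
```

With 10 UEs and 4 options each, a single head would need 1,048,576 outputs, while the branched head needs 40.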

JBHI Journal 2023 Journal Article

Motion-Robust Atrial Fibrillation Detection Based on Remote-Photoplethysmography

  • Bing-Fei Wu
  • Bing-Jhang Wu
  • Shao-En Cheng
  • Yu Sun
  • Meng-Liang Chung

Atrial fibrillation (AF) has been shown to be highly correlated with stroke, and more than 43 million people suffer from AF worldwide. However, most of these patients are unaware of their disease, and there is no convenient tool for comprehensive screening to identify asymptomatic AF patients. Hence, we provide a non-contact AF detection approach based on remote photoplethysmography (rPPG). We address motion disturbance, the most challenging issue in rPPG technology, with the NR-Net, ATT-Net, and SQ-Mask modules. NR-Net is designed to eliminate motion noise with a CNN model, while ATT-Net and SQ-Mask use channel-wise and temporal attention to reduce the influence of poor signal segments. Moreover, to verify the proposed algorithm, we present an AF dataset collected from hospital wards that contains 452 subjects (mean age, 69.3 ± 13.0 years; women, 46%) and 7,306 30-second segments. To the best of our knowledge, this dataset has the most participants and covers the full age range of possible AF patients. The proposed method yields accuracy, sensitivity, and specificity of 95.69%, 96.76%, and 94.33%, respectively, when discriminating AF from normal sinus rhythm. Going beyond previous studies, other arrhythmias are also taken into consideration, leading to further investigation of the AF vs. Non-AF and AF vs. Other scenarios. In all three scenarios, the proposed approach outperforms the benchmark algorithms. Additionally, accuracy on data with slight motion improves to 95.82%, 92.39%, and 89.18% for the three scenarios, respectively, while accuracy on data with full motion increases by over 3%.

AAAI Conference 2023 Conference Paper

Pose-Oriented Transformer with Uncertainty-Guided Refinement for 2D-to-3D Human Pose Estimation

  • Han Li
  • Bowen Shi
  • Wenrui Dai
  • Hongwei Zheng
  • Botao Wang
  • Yu Sun
  • Min Guo
  • Chenglin Li

There has been a recent surge of interest in introducing transformers to 3D human pose estimation (HPE) due to their powerful capabilities in modeling long-term dependencies. However, existing transformer-based methods treat body joints as equally important inputs and ignore the prior knowledge of human skeleton topology in the self-attention mechanism. To tackle this issue, in this paper, we propose a Pose-Oriented Transformer (POT) with uncertainty-guided refinement for 3D HPE. Specifically, we first develop a novel pose-oriented self-attention mechanism and a distance-related position embedding for POT to explicitly exploit the human skeleton topology. The pose-oriented self-attention mechanism explicitly models the topological interactions between body joints, whereas the distance-related position embedding encodes the distance of joints to the root joint to distinguish groups of joints with different difficulties in regression. Furthermore, we present an Uncertainty-Guided Refinement Network (UGRN) to refine pose predictions from POT, especially for the difficult joints, by considering the estimated uncertainty of each joint with an uncertainty-guided sampling strategy and self-attention mechanism. Extensive experiments demonstrate that our method significantly outperforms the state-of-the-art methods with reduced model parameters on 3D HPE benchmarks such as Human3.6M and MPI-INF-3DHP.

YNIMG Journal 2023 Journal Article

μ-STAR: A novel framework for spatio-temporal M/EEG source imaging optimized by microstates

  • Zhao Feng
  • Sujie Wang
  • Linze Qian
  • Mengru Xu
  • Kuijun Wu
  • Ioannis Kakkos
  • Cuntai Guan
  • Yu Sun

Source imaging of Electroencephalography (EEG) and Magnetoencephalography (MEG) provides a noninvasive way of monitoring brain activities with high spatial and temporal resolution. In order to address this highly ill-posed problem, conventional source imaging models adopted spatio-temporal constraints that assume spatial stability of the source activities, neglecting the transient characteristics of M/EEG. In this work, a novel source imaging method μ-STAR that includes a microstate analysis and a spatio-temporal Bayesian model was introduced to address this problem. Specifically, the microstate analysis was applied to achieve automatic determination of time window length with quasi-stable source activity pattern for optimal reconstruction of source dynamics. Then a user-specific spatial prior and data-driven temporal basis functions were utilized to characterize the spatio-temporal information of sources within each state. The solution of the source reconstruction was obtained through a computationally efficient algorithm based upon variational Bayesian and convex analysis. The performance of the μ-STAR was first assessed through numerical simulations, where we found that the determination and inclusion of optimal temporal length in the spatio-temporal prior significantly improved the performance of source reconstruction. More importantly, the μ-STAR model achieved robust performance under various settings (i.e., source numbers/areas, SNR levels, and source depth) with fast convergence speed compared with five widely-used benchmark models (including wMNE, STV, SBL, BESTIES, & SI-STBF). Additional validations on real data were then performed on two publicly-available datasets (including block-design face-processing ERP and continuous resting-state EEG). 
The reconstructed source activities were spatially and temporally neurophysiologically plausible and consistent with previously revealed neural substrates, further supporting the feasibility of the μ-STAR model for source imaging in various applications.

JBHI Journal 2022 Journal Article

Inferring the Individual Psychopathologic Deficits With Structural Connectivity in a Longitudinal Cohort of Schizophrenia

  • Yi Sun
  • Zhe Zhang
  • Ioannis Kakkos
  • George K. Matsopoulos
  • Jingjia Yuan
  • John Suckling
  • Luoyi Xu
  • Shuxia Cao

The prediction of schizophrenia-related psychopathologic deficits is exceedingly important in the fields of psychiatry and clinical practice. However, objectively associating brain structure alterations with the illness's clinical symptoms is challenging. Although schizophrenia has been characterized as a brain dysconnectivity syndrome, evidence accounting for neuroanatomical network alterations remains scarce. Moreover, the absence of generalized connectome biomarkers for assessing illness progression further complicates the prediction of long-term symptom severity. In this paper, a combination of individualized prediction models with quantitative graph theoretical analysis was adopted, providing a comprehensive appreciation of the extent to which brain network properties are affected over time in schizophrenia. Specifically, Connectome-based Prediction Models were employed on Structural Connectivity (SC) features, efficiently capturing individual network-related differences while identifying the anatomical connectivity disturbances contributing to the prediction of psychopathological deficits. Our results demonstrated distinctions among widespread cortical circuits responsible for different domains of symptoms, indicating the complex neural mechanisms underlying schizophrenia. Furthermore, the generated models were able to significantly predict changes in symptoms using SC features at follow-up, while the preserved SC features suggested an association with improved positive and overall symptoms. Moreover, significant cross-sectional deficits were observed in network efficiency, along with a progressive aberration of global integration in patients compared to healthy controls, representing a group-consensus pathological map and supporting the dysconnectivity hypothesis.

IJCAI Conference 2022 Conference Paper

Simple and Effective Relation-based Embedding Propagation for Knowledge Representation Learning

  • Huijuan Wang
  • Siming Dai
  • Weiyue Su
  • Hui Zhong
  • Zeyang Fang
  • Zhengjie Huang
  • Shikun Feng
  • Zeyu Chen

Relational graph neural networks have garnered particular attention for encoding graph context in knowledge graphs (KGs). Although they achieve competitive performance on small KGs, how to efficiently and effectively utilize graph context for large KGs remains an open problem. To this end, we propose the Relation-based Embedding Propagation (REP) method, a post-processing technique that adapts pre-trained KG embeddings with graph context. As relations in KGs are directional, we model the incoming head context and the outgoing tail context separately. Accordingly, we design relational context functions with no external parameters. In addition, we use averaging to aggregate context information, making REP more computationally efficient. We theoretically prove that such designs can avoid information distortion during propagation. Extensive experiments also demonstrate that REP scales well while improving or maintaining prediction quality. In particular, it brings an average relative improvement of about 10% to triplet-based embedding methods on OGBL-WikiKG2 and takes 5%-83% of the time to achieve results comparable to the state-of-the-art GC-OTE.
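The abstract describes propagation built from parameter-free relational context functions and simple averaging. As a minimal sketch, assuming TransE-style embeddings (a triple (h, r, t) roughly satisfies h + r ≈ t), the Python below computes each entity's context from its outgoing edges (t - r) and from its incoming edges (h + r) separately, averages each, and mixes the result into the pre-trained embedding. The function name, the mixing weight `alpha`, and the TransE assumption are illustrative choices, not REP's exact context functions.

```python
def rep_propagate(ent, rel, triples, alpha=0.5, iters=1):
    """Hypothetical REP-style propagation over TransE-like embeddings.

    ent: dict entity -> list[float]; rel: dict relation -> list[float]
    triples: list of (head, relation, tail)
    """
    dim = len(next(iter(ent.values())))
    for _ in range(iters):
        out_sum = {e: [0.0] * dim for e in ent}  # context from edges where e is the head
        in_sum = {e: [0.0] * dim for e in ent}   # context from edges where e is the tail
        out_cnt = dict.fromkeys(ent, 0)
        in_cnt = dict.fromkeys(ent, 0)
        for h, r, t in triples:
            for i in range(dim):
                out_sum[h][i] += ent[t][i] - rel[r][i]  # TransE assumption: h ≈ t - r
                in_sum[t][i] += ent[h][i] + rel[r][i]   # TransE assumption: t ≈ h + r
            out_cnt[h] += 1
            in_cnt[t] += 1
        new_ent = {}
        for e, vec in ent.items():
            # Average each directional context separately, then average the available ones.
            contexts = []
            if out_cnt[e]:
                contexts.append([s / out_cnt[e] for s in out_sum[e]])
            if in_cnt[e]:
                contexts.append([s / in_cnt[e] for s in in_sum[e]])
            if not contexts:
                new_ent[e] = list(vec)  # isolated entity keeps its embedding
                continue
            ctx = [sum(c[i] for c in contexts) / len(contexts) for i in range(dim)]
            # No trainable parameters: just a fixed mix of old embedding and context.
            new_ent[e] = [alpha * v + (1 - alpha) * c for v, c in zip(vec, ctx)]
        ent = new_ent
    return ent
```

Because the update uses only averaging over existing embeddings, it adds no trainable parameters, which is the efficiency property the abstract emphasizes.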

NeurIPS Conference 2022 Conference Paper

Test-Time Training with Masked Autoencoders

  • Yossi Gandelsman
  • Yu Sun
  • Xinlei Chen
  • Alexei Efros

Test-time training adapts to a new test distribution on the fly by optimizing a model for each test input using self-supervision. In this paper, we use masked autoencoders for this one-sample learning problem. Empirically, our simple method improves generalization on many visual benchmarks for distribution shifts. Theoretically, we characterize this improvement in terms of the bias-variance trade-off.

JBHI Journal 2021 Journal Article

EEG Fingerprints of Task-Independent Mental Workload Discrimination

  • Ioannis Kakkos
  • Georgios N. Dimitrakopoulos
  • Yi Sun
  • Jingjia Yuan
  • George K. Matsopoulos
  • Anastasios Bezerianos
  • Yu Sun

In the nascent field of neuroergonomics, mental workload assessment is one of the most important issues and has clear significance in real-world applications. Although prior research has achieved efficient single-task classification, scattered studies on cross-task mental workload assessment usually report unsatisfactory performance. Here, we introduce a data-driven analysis framework that overcomes the challenges of task-independent workload assessment using a fusion of EEG spectral characteristics, and we unveil the common neural mechanisms underlying mental workload. Specifically, multi-frequency power spectra and functional connectivity (FC) were estimated for two workload levels in two working-memory tasks performed by 40 healthy participants and then fed into a machine learning approach to obtain the importance of each feature vector and evaluate classification performance in a cross-task fashion. Our framework achieved a classification accuracy of 0.94 for task-independent mental workload discrimination. Further investigation of the selected features in terms of their spectral and localization properties revealed task-independent common patterns in the neural mechanisms governing workload. In particular, increased workload was associated with elevated frontal delta and theta power but reduced parietal alpha power, whereas FC exhibited complex frequency- and region-dependent alterations. These findings suggest that fused EEG features are promising indicators of workload conditions across different applications.

AAAI Conference 2021 Conference Paper

ERNIE-ViL: Knowledge Enhanced Vision-Language Representations through Scene Graphs

  • Fei Yu
  • Jiji Tang
  • Weichong Yin
  • Yu Sun
  • Hao Tian
  • Hua Wu
  • Haifeng Wang

We propose a knowledge-enhanced approach, ERNIE-ViL, which incorporates structured knowledge obtained from scene graphs to learn joint vision-language representations. ERNIE-ViL tries to build detailed semantic connections (objects, attributes of objects and relationships between objects) across vision and language, which are essential to vision-language cross-modal tasks. Utilizing scene graphs of visual scenes, ERNIE-ViL constructs Scene Graph Prediction tasks, i.e., Object Prediction, Attribute Prediction and Relationship Prediction tasks, in the pre-training phase. Specifically, these prediction tasks are implemented by predicting nodes of different types in the scene graph parsed from the sentence. Thus, ERNIE-ViL can learn joint representations characterizing the alignments of detailed semantics across vision and language. After pre-training on large-scale image-text aligned datasets, we validate the effectiveness of ERNIE-ViL on 5 cross-modal downstream tasks. ERNIE-ViL achieves state-of-the-art performance on all these tasks and ranks first on the VCR leaderboard with an absolute improvement of 3.7%.

IJCAI Conference 2021 Conference Paper

Masked Label Prediction: Unified Message Passing Model for Semi-Supervised Classification

  • Yunsheng Shi
  • Zhengjie Huang
  • Shikun Feng
  • Hui Zhong
  • Wenjing Wang
  • Yu Sun

Graph neural networks (GNNs) and the label propagation algorithm (LPA) are both message passing algorithms that have achieved superior performance in semi-supervised classification. A GNN performs feature propagation with a neural network to make predictions, while LPA propagates labels across the graph adjacency matrix to obtain results. However, there is still no effective way to directly combine these two kinds of algorithms. To address this issue, we propose a novel Unified Message Passing Model (UniMP) that incorporates feature and label propagation at both training and inference time. First, UniMP adopts a Graph Transformer network that takes feature embeddings and label embeddings as input information for propagation. Second, to train the network without overfitting to self-loop input label information, UniMP introduces a masked label prediction strategy, in which some percentage of the input label information is masked at random and then predicted. UniMP conceptually unifies feature propagation and label propagation and is empirically powerful. It obtains new state-of-the-art semi-supervised classification results on the Open Graph Benchmark (OGB).
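The masking step of masked label prediction can be sketched independently of the Graph Transformer. In the minimal Python below, a random subset of training labels is hidden; the remaining labels would be fed to the model as inputs alongside node features, and the loss would be computed only on the hidden nodes. The placeholder id `UNKNOWN`, the function name `mask_labels`, and the default mask rate are assumptions for illustration, and class ids are assumed to be non-negative integers.

```python
import random

UNKNOWN = -1  # hypothetical placeholder id meaning "label hidden from the input"

def mask_labels(train_labels, mask_rate=0.5, rng=None):
    """Split training labels into visible label inputs and prediction targets.

    train_labels: dict node_id -> class id (non-negative int) for labeled nodes.
    Returns (input_labels, target_nodes): input_labels maps every training node
    to its class id or UNKNOWN; target_nodes lists the masked nodes whose labels
    the model must predict (and on which the loss is computed).
    """
    rng = rng or random.Random()
    nodes = sorted(train_labels)
    if not nodes:
        return {}, []
    # Mask at least one node and never more than all of them.
    n_mask = min(len(nodes), max(1, round(len(nodes) * mask_rate)))
    masked = set(rng.sample(nodes, n_mask))
    input_labels = {n: (UNKNOWN if n in masked else train_labels[n]) for n in nodes}
    return input_labels, sorted(masked)
```

Drawing a fresh mask at every training step means a node never sees its own label as input when that label is the prediction target, which is how the strategy avoids the self-loop overfitting described in the abstract.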

YNICL Journal 2020 Journal Article

Altered dynamic effective connectivity of the default mode network in newly diagnosed drug-naïve juvenile myoclonic epilepsy

  • Zhe Zhang
  • Guangyao Liu
  • Weihao Zheng
  • Jie Shi
  • Hong Liu
  • Yu Sun

Juvenile myoclonic epilepsy (JME) has been repeatedly shown to be associated with brain dysconnectivity in the default mode network (DMN). However, the implicit assumption of stationary and nondirectional functional connectivity (FC) in most previous resting-state fMRI studies leaves open the question of JME-related aberrations in the dynamic causal properties of FC. Here, we introduce an empirical method incorporating a sliding-window approach and multivariate Granger causality analysis to investigate, for the first time, the reorganization of dynamic effective connectivity (DEC) in the DMN of patients with JME. DEC was obtained from resting-state fMRI of 34 patients with newly diagnosed, drug-naïve JME and 34 matched controls. Through clustering analysis, we found two distinct states that characterize the DEC patterns (i.e., a less frequent, strongly connected state (State 1) and a more frequent, weakly connected state (State 2)). Patients showed altered ECs within DMN subnetworks in State 2, whereas abnormal ECs between DMN subnetworks were found in State 1. Furthermore, we observed that the causal influence flows of the medial prefrontal cortex and angular gyrus were altered in a state-specific manner and were associated with the disease severity of patients. Overall, our findings extend the dysconnectivity hypothesis in JME from static to dynamic causal FC and demonstrate that aberrant DEC may underlie abnormal brain function in JME at an early phase of the illness.

AAAI Conference 2020 Conference Paper

ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding

  • Yu Sun
  • Shuohuan Wang
  • Yukun Li
  • Shikun Feng
  • Hao Tian
  • Hua Wu
  • Haifeng Wang

Recently, pre-trained models have achieved state-of-the-art results in various language understanding tasks. Current pre-training procedures usually focus on training the model with several simple tasks to grasp the co-occurrence of words or sentences. However, besides co-occurrence information, training corpora contain other valuable lexical, syntactic and semantic information, such as named entities, semantic closeness and discourse relations. To extract this lexical, syntactic and semantic information from training corpora, we propose a continual pre-training framework named ERNIE 2.0, which incrementally builds pre-training tasks and then learns pre-trained models on these constructed tasks via continual multi-task learning. Based on this framework, we construct several tasks and train the ERNIE 2.0 model to capture lexical, syntactic and semantic aspects of information in the training data. Experimental results demonstrate that the ERNIE 2.0 model outperforms BERT and XLNet on 16 tasks, including English tasks on the GLUE benchmarks and several similar tasks in Chinese. The source code and pre-trained models have been released at https://github.com/PaddlePaddle/ERNIE.

IJCAI Conference 2020 Conference Paper

ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation

  • Dongling Xiao
  • Han Zhang
  • Yukun Li
  • Yu Sun
  • Hao Tian
  • Hua Wu
  • Haifeng Wang

Current pre-training work in natural language generation pays little attention to the problem of exposure bias on downstream tasks. To address this issue, we propose an enhanced multi-flow sequence-to-sequence pre-training and fine-tuning framework named ERNIE-GEN, which bridges the discrepancy between training and inference with an infilling generation mechanism and a noise-aware generation method. To make generation closer to human writing patterns, this framework introduces a span-by-span generation flow that trains the model to predict semantically complete spans consecutively rather than word by word. Unlike existing pre-training methods, ERNIE-GEN incorporates multi-granularity target sampling to construct pre-training data, which enhances the correlation between encoder and decoder. Experimental results demonstrate that ERNIE-GEN achieves state-of-the-art results with much less pre-training data and fewer parameters on a range of language generation tasks, including abstractive summarization (Gigaword and CNN/DailyMail), question generation (SQuAD), dialogue generation (Persona-Chat) and generative question answering (CoQA). The source code and pre-trained models have been released at https://github.com/PaddlePaddle/ERNIE/ernie-gen.

AAAI Conference 2020 Conference Paper

MALA: Cross-Domain Dialogue Generation with Action Learning

  • Xinting Huang
  • Jianzhong Qi
  • Yu Sun
  • Rui Zhang

Response generation for task-oriented dialogues involves two basic components: dialogue planning and surface realization. These two components, however, have discrepant objectives, i.e., task completion and language quality. To deal with this discrepancy, conditioned response generation has been introduced, in which the generation process is factorized into action decision and language generation via explicit action representations. To obtain action representations, recent studies learn latent actions in an unsupervised manner based on the lexical similarity of utterances. Such an action learning approach is sensitive to the diversity of language surface forms, which may impair task completion and language quality. To address this issue, we propose multi-stage adaptive latent action learning (MALA), which learns semantic latent actions by distinguishing the effects of utterances on dialogue progress. We model the utterance effect using the transition of dialogue states caused by the utterance and develop a semantic similarity measurement that estimates whether utterances have similar effects. For learning semantic actions on domains without dialogue states, MALA extends the semantic similarity measurement across domains progressively, i.e., from aligning shared actions to learning domain-specific actions. Experiments using the multi-domain datasets SMD and MultiWOZ show that our proposed model achieves consistent improvements over the baseline models in terms of both task completion and language quality.

AIIM Journal 2020 Journal Article

State recognition of decompressive laminectomy with multiple information in robot-assisted surgery

  • Yu Sun
  • Li Wang
  • Zhongliang Jiang
  • Bing Li
  • Ying Hu
  • Wei Tian

Decompressive laminectomy is a common operation for the treatment of lumbar spinal stenosis. Grinding and drilling tools are used for fenestration and internal fixation, respectively. State recognition is one of the main technologies in robot-assisted surgery, especially in tele-surgery, because surgeons have limited perception during remote-controlled robot-assisted surgery. The novelty of this paper is a state recognition system proposed for robot-assisted tele-surgery. By combining learning-based and traditional methods, the slave-end robot can assess the current operation state as a surgeon would and provide more information and decision suggestions to the master-end surgeon, which helps surgeons work more safely in tele-surgery. For fenestration, we propose an image-based state recognition method consisting of a U-Net-derived network, grayscale redistribution and a dynamic receptive field, which assists in controlling the grinding process to prevent the grinding bit from crossing the inner edge of the lamina and damaging the spinal nerves. For internal fixation, we propose an audio- and force-based state recognition method consisting of signal feature extraction methods, LSTM-based prediction and information fusion, which assists in monitoring the drilling process to prevent the drilling bit from crossing the outer edge of the vertebral pedicle and damaging the spinal nerves. Several experiments are conducted to demonstrate the reliability of the proposed system in robot-assisted surgery.

NeurIPS Conference 2019 Conference Paper

Block Coordinate Regularization by Denoising

  • Yu Sun
  • Jiaming Liu
  • Ulugbek Kamilov

We consider the problem of estimating a vector from its noisy measurements using a prior specified only through a denoising function. Recent work on plug-and-play priors (PnP) and regularization-by-denoising (RED) has shown state-of-the-art performance of estimators under such priors in a range of imaging tasks. In this work, we develop a new block coordinate RED algorithm that decomposes a large-scale estimation problem into a sequence of updates over small subsets of the unknown variables. We theoretically analyze the convergence of the algorithm and discuss its relationship to traditional proximal optimization. Our analysis complements and extends recent theoretical results for RED-based estimation methods. We numerically validate our method using several denoiser priors, including those based on convolutional neural network (CNN) denoisers.
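A minimal sketch of the block coordinate idea, under strong simplifying assumptions: the forward model is the identity (pure denoising), the RED update direction is G(x) = (x - y) + tau * (x - D(x)), and the denoiser D is a toy circular moving average standing in for the CNN denoisers mentioned in the abstract. Blocks are visited cyclically, and each update changes only one block of coordinates. The names `smooth`, `red_gradient`, and `block_coordinate_red`, and the values of `tau`, `step`, `block_size`, and `sweeps`, are hypothetical.

```python
def smooth(x):
    """Toy denoiser D(x): circular 3-tap moving average (an assumption)."""
    n = len(x)
    return [(x[i - 1] + x[i] + x[(i + 1) % n]) / 3.0 for i in range(n)]

def red_gradient(x, y, tau):
    """RED direction G(x) = (x - y) + tau * (x - D(x)) for forward model A = I."""
    d = smooth(x)
    return [(xi - yi) + tau * (xi - di) for xi, yi, di in zip(x, y, d)]

def block_coordinate_red(y, tau=0.5, step=0.5, block_size=2, sweeps=300):
    """Cyclic block coordinate RED sketch: update one block of x at a time."""
    x = list(y)
    n = len(x)
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    for _ in range(sweeps):
        for b in blocks:
            # For clarity the full direction is recomputed; a practical version
            # would evaluate only the components in block b.
            g = red_gradient(x, y, tau)
            for i in b:
                x[i] -= step * g[i]
    return x
```

Because this toy denoiser is linear and symmetric, G is the gradient of a convex quadratic and the iteration converges for this step size; with a learned CNN denoiser that property no longer holds automatically, which is what a convergence analysis like the one described in the abstract has to address.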

YNICL Journal 2019 Journal Article

Characterization of white matter changes along fibers by automated fiber quantification in the early stages of Alzheimer's disease

  • Xin Zhang
  • Yu Sun
  • Weiping Li
  • Bing Liu
  • Wenbo Wu
  • Hui Zhao
  • Renyuan Liu
  • Yue Zhang

Brain white matter fiber bundles in patients with mild cognitive impairment (MCI) and Alzheimer's disease (AD) have abnormalities not usually seen in unaffected subjects. A method sensitive to localization-specific properties of white matter integrity might reveal how tissue properties vary along each tract, whereas previous studies only detected the mean DTI parameters of each fiber. The aim of this study was to investigate whether these abnormalities of nerve fiber tracts are localized to specific regions of the tracts or spread throughout, and to determine which of the examined fiber tracts are involved in the early stages of Alzheimer's disease. In this study, we used VBA, TBSS and AFQ together to comprehensively investigate white matter fiber impairment in 25 CE patients, 29 MCI patients and 34 normal control (NC) subjects. Two tract profiles, fractional anisotropy (FA) and mean diffusivity (MD), were extracted to evaluate white matter integrity at 100 locations along each of 20 fiber tracts, and we then validated the results with 27 CE patients, 21 MCI patients and 22 NC subjects from the ADNI cohort. We also compared AFQ with VBA and TBSS in our cohort. In comparison with NC, AD patients showed widespread FA reduction in 25% (5/20) and MD increase in 65% (13/20) of the examined fiber tracts. The MCI patients showed a regional FA reduction in 5% (1/20) of the examined fiber tracts (right cingulum cingulate) and MD increase in 5% (1/20) of the examined fiber tracts (left arcuate fasciculus). Among these changed tracts, only the right cingulum cingulate showed widespread disruption of myelin and/or fiber axons in MCI and aggravated deterioration in AD, findings supported by both mean FA/MD changes and pointwise FA changes, as well as by TBSS. The AFQ findings from the ADNI cohort showed some similarity with our cohort, especially in the pointwise comparison of MD profiles between AD and NC.
Furthermore, the pattern of white matter abnormalities differed across neuronal fiber tracts; for example, MCI and AD patients showed similar FA reduction in the middle part of the right cingulum cingulate, while the anterior part was not damaged. However, the left arcuate fasciculus showed MD elevation located in the temporal part of the fibers in MCI patients, expanding to the temporal and middle parts of the fibers in AD patients. Thus, AFQ may be a complementary alternative to VBA and TBSS and may provide new insights into white matter degeneration in MCI and its association with AD.

IJCAI Conference 2019 Conference Paper

RLTM: An Efficient Neural IR Framework for Long Documents

  • Chen Zheng
  • Yu Sun
  • Shengxian Wan
  • Dianhai Yu

Deep neural networks have achieved significant improvements in information retrieval (IR). However, most existing models are computationally costly and cannot efficiently scale to long documents. This paper proposes a novel end-to-end neural ranking framework called Reinforced Long Text Matching (RLTM), which matches a query with long documents efficiently and effectively. The core idea behind the framework is analogous to the human judgment process, which first quickly locates the relevant parts of the whole document and then carefully matches these parts with the query to obtain the final label. First, we select relevant sentences from the long documents with a coarse and efficient matching model. Second, we generate a relevance score with a more sophisticated matching model based on the selected sentences. The whole model is trained jointly with reinforcement learning in a pairwise manner by maximizing the expected score gaps between positive and negative examples. Experimental results demonstrate that RLTM greatly improves on the efficiency and effectiveness of state-of-the-art models.

NeurIPS Conference 2018 Conference Paper

KDGAN: Knowledge Distillation with Generative Adversarial Networks

  • Xiaojie Wang
  • Rui Zhang
  • Yu Sun
  • Jianzhong Qi

Knowledge distillation (KD) aims to train a lightweight classifier that provides accurate inference under constrained resources in multi-label learning. Instead of directly consuming feature-label pairs, the classifier is trained by a teacher, i.e., a high-capacity model whose training may be resource-hungry. The accuracy of a classifier trained this way is usually suboptimal because it is difficult to learn the true data distribution from the teacher. An alternative is to adversarially train the classifier against a discriminator in a two-player game akin to generative adversarial networks (GANs), which can ensure that the classifier learns the true data distribution at the equilibrium of this game. However, such a two-player game may take an excessively long time to reach equilibrium due to high-variance gradient updates. To address these limitations, we propose a three-player game named KDGAN consisting of a classifier, a teacher, and a discriminator. The classifier and the teacher learn from each other via distillation losses and are adversarially trained against the discriminator via adversarial losses. By simultaneously optimizing the distillation and adversarial losses, the classifier will learn the true data distribution at the equilibrium. We approximate the discrete distribution learned by the classifier (or the teacher) with a concrete distribution. From the concrete distribution, we generate continuous samples to obtain low-variance gradient updates, which speed up training. Extensive experiments using real datasets confirm the superiority of KDGAN in both accuracy and training speed.
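The concrete distribution mentioned in the abstract is the Gumbel-softmax relaxation of a categorical distribution. The minimal Python below draws one relaxed one-hot sample from a vector of class logits. The function name and the temperature value are assumptions; in KDGAN-style training the logits would come from the classifier or the teacher, and gradients would flow through the sample via an autodiff framework, which plain Python does not show.

```python
import math
import random

def concrete_sample(logits, temperature=0.5, rng=None):
    """Draw one relaxed one-hot sample from a Concrete (Gumbel-softmax) distribution."""
    rng = rng or random.Random()
    gumbels = []
    for _ in logits:
        # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1);
        # clamp U away from 0 and 1 so both logs stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbels.append(-math.log(-math.log(u)))
    # Perturb the logits, scale by the temperature, then apply a stable softmax.
    scores = [(l + g) / temperature for l, g in zip(logits, gumbels)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Lower temperatures push samples toward one-hot vectors, closer to the discrete distribution, while higher temperatures give smoother samples; the sample is a continuous function of the logits, which is what makes low-variance reparameterized gradients possible.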

IJCAI Conference 2017 Conference Paper

App Download Forecasting: An Evolutionary Hierarchical Competition Approach

  • Yingzi Wang
  • Nicholas Jing Yuan
  • Yu Sun
  • Chuan Qin
  • Xing Xie

Product sales forecasting enables a comprehensive understanding of products' future development, making it of particular interest for companies seeking to improve their business, for investors measuring the value of firms, and for users capturing market trends. Recent studies show that complex competitive interactions among products directly influence products' future development. However, most existing approaches fail to model the evolutionary competition among products and lack the capability to organically reflect multi-level competition analysis in sales forecasting. To address these problems, we propose the Evolutionary Hierarchical Competition Model (EHCM), which effectively considers the time-evolving multi-level competition among products. The EHCM model systematically integrates hierarchical competition analysis with multi-scale time series forecasting. Extensive experiments using a real-world app download dataset show that EHCM outperforms state-of-the-art methods at various forecasting granularities.

YNIMG Journal 2017 Journal Article

The effects of a mid-task break on the brain connectome in healthy participants: A resting-state functional MRI study

  • Yu Sun
  • Julian Lim
  • Zhongxiang Dai
  • KianFoong Wong
  • Fumihiko Taya
  • Yu Chen
  • Junhua Li
  • Nitish Thakor

Although rest breaks are commonly administered as a countermeasure to reduce mental fatigue and boost cognitive performance, their effects on behavior are not consistent. Moreover, our understanding of the neural mechanisms underlying rest breaks and how they modulate mental fatigue is still rudimentary. In this study, we investigated the effects of a rest break on the topological properties of brain connectivity networks via a two-session experimental paradigm, in which one session comprised four successive blocks of a mentally demanding visual selective attention task (No-rest session), whereas the other contained a rest break between the second and third task blocks (Rest session). Functional brain networks were constructed using resting-state functional MRI data recorded from 20 healthy adults before and after the task blocks. Behaviorally, subjects displayed robust time-on-task (TOT) declines, reflected in increasingly slow reaction times as the test progressed and lower post-task self-reported ratings of engagement. However, we did not find a significant effect of a mid-task break on task performance. Compared to pre-task measurements, post-task functional brain networks demonstrated an overall decrease in optimal small-world properties together with lower global efficiency. Specifically, we found TOT-related reductions in nodal efficiency in brain regions mainly residing in subcortical areas. More interestingly, a significant block-by-session interaction was revealed in local efficiency, attributable to a significant post-task decline in the No-rest session and preserved local efficiency when a mid-task break opportunity was introduced in the Rest session.
Taken together, these findings augment our understanding of how the resting brain reorganizes following prolonged task performance, suggest dissociable processes between the neural mechanisms of fatigue and recovery, and provide some of the first quantitative insights into the cognitive neuroscience of work and rest.

NeurIPS Conference 2016 Conference Paper

Supervised Word Mover's Distance

  • Gao Huang
  • Chuan Guo
  • Matt Kusner
  • Yu Sun
  • Fei Sha
  • Kilian Weinberger

Accurately measuring the similarity between text documents lies at the core of many real-world applications of machine learning. These include web-search ranking, document recommendation, multi-lingual document matching, and article categorization. Recently, a new document metric, the word mover's distance (WMD), has been proposed with unprecedented results on kNN-based document classification. The WMD elevates high-quality word embeddings to document metrics by formulating the distance between two documents as an optimal transport problem between the embedded words. However, the document distances are entirely unsupervised and lack a mechanism to incorporate supervision when available. In this paper, we propose an efficient technique to learn a supervised metric, which we call the Supervised WMD (S-WMD) metric. Our algorithm learns document distances that measure the underlying semantic differences between documents by leveraging semantic differences between individual words discovered during supervised training. This is achieved with a linear transformation of the underlying word embedding space and tailored word-specific weights, learned to minimize the stochastic leave-one-out nearest neighbor classification error on a per-document level. We evaluate our metric on eight real-world text classification tasks, on which S-WMD consistently outperforms almost all of our 26 competitive baselines.
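The two learned ingredients the abstract names, a linear transformation A of the embedding space and word-specific weights, can be illustrated without the learning step. Exact WMD requires an optimal transport solver, so this minimal Python sketch swaps in a simpler relaxed lower bound in which every word sends all of its mass to its nearest word in the other document; it is not the S-WMD objective or its training procedure. The function names, the example embeddings, and the default weight of 1.0 for unweighted words are assumptions for illustration.

```python
import math

def transformed_distance(A, u, v):
    """Euclidean distance after a linear map A: ||A (u - v)||."""
    diff = [a - b for a, b in zip(u, v)]
    return math.sqrt(sum(sum(a_ij * d for a_ij, d in zip(row, diff)) ** 2 for row in A))

def weighted_nbow(doc, weights):
    """Bag-of-words histogram scaled by word-specific weights, normalized to sum to 1."""
    mass = {}
    for w in doc:
        mass[w] = mass.get(w, 0.0) + weights.get(w, 1.0)  # unlisted words get weight 1.0
    total = sum(mass.values())
    return {w: m / total for w, m in mass.items()}

def relaxed_wmd(doc1, doc2, emb, A, weights):
    """Relaxed lower bound on WMD under the transformed metric (not exact OT)."""
    d1, d2 = weighted_nbow(doc1, weights), weighted_nbow(doc2, weights)

    def one_side(src, dst):
        # Each source word moves all of its mass to its closest destination word.
        return sum(m * min(transformed_distance(A, emb[w], emb[v]) for v in dst)
                   for w, m in src.items())

    # Taking the larger one-sided cost gives the tighter of the two bounds.
    return max(one_side(d1, d2), one_side(d2, d1))
```

In this sketch, scaling A stretches every word-to-word distance, and setting a word's weight to zero removes its influence entirely; supervised training would tune both so that documents with the same label end up close together.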

KER Journal 2011 Journal Article

The development of a graphical user interface, functional elements and classifiers for the non-invasive characterization of childhood brain tumours using magnetic resonance spectroscopy

  • Alexander Gibb
  • John Easton
  • Nigel Davies
  • Yu Sun
  • Lesley MacPherson
  • Kal Natarajan
  • Theodoros Arvanitis
  • Andrew Peet

Magnetic resonance spectroscopy (MRS) is a non-invasive method that can provide diagnostic information on children with brain tumours. The technique has not been widely used in clinical practice, partly because of the difficulty of developing robust classifiers from small patient numbers and the challenge of providing decision support systems (DSSs) acceptable to clinicians. This paper describes a participatory design approach in the development of an interactive clinical user interface, as part of a distributed DSS for the diagnosis and prognosis of brain tumours. In particular, we consider the clinical need and context of developing interactive elements for an interface that facilitates the classification of childhood brain tumours, for diagnostic purposes, as part of the HealthAgents European Union project. Previous MRS-based DSS tools have required little input from the clinician user: a raw spectrum is essentially processed to provide a diagnosis, sometimes with an estimate of error. In childhood brain tumour diagnosis, where there are small numbers of cases and a large number of potential diagnoses, this approach becomes intractable. Involving clinicians directly in the design of the DSS for brain tumour diagnosis from MRS led to an alternative approach: a flexible DSS that allows the clinician to input prior information to create the most relevant differential diagnosis for the DSS. This approach mirrors the one currently taken by clinicians and removes many sources of potential error. The validity of this strategy was confirmed for a small cohort of children with cerebellar tumours by combining two diagnostic types, pilocytic astrocytomas (11 cases) and ependymomas (four cases), into a class of glial tumours, which then had similar numbers to the other diagnostic type, medulloblastomas (18 cases).
Principal component analysis followed by linear discriminant analysis on magnetic resonance spectral data gave a classification accuracy of 91% for a three-class classifier and 94% for a two-class classifier using a leave-one-out analysis. This DSS provides a flexible method for the clinician to use MRS for brain tumour diagnosis in children.

IJCAI Conference 2005 Conference Paper

The Ontology Revision

  • Yu Sun
  • Yuefei

An ontology consists of a set of concepts, a set of constraints imposed on instances of concepts, and the subsumption relation. It is assumed that an ontology is a tree under the subsumption relation between concepts. To preserve the structural properties of ontologies, ontology revision not only contracts ontologies by discarding statements inconsistent with a revising statement, but also extracts statements consistent with the revising statement and adds some other statements. For ontology revision, the consistency of a revising statement with the theory of the logical closure of the ontology under the closed world assumption is discussed. The basic postulates of ontology revision are proposed, and a concrete ontology revision is given based on the consistency or inconsistency of an ontology and a revising statement.