Arrow Research search

Author name cluster

Xin Ma

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

25 papers
2 author rows

Possible papers

25

JBHI Journal 2026 Journal Article

GCL-MSE: Graph Contrastive Learning with Mutual Similarity Enhancement for Drug Repositioning

  • Shasha Tao
  • Jin Liu
  • Min Xiang
  • Xin Ma
  • Tongtong Huo
  • Xiaolin Ning

Amidst the shift to data-driven drug repositioning, existing models struggle to capture complex semantic and topological relationships in biomedical knowledge graphs for drug-disease association (DDA) mining. We propose Graph Contrastive Learning with Mutual Similarity Enhancement (GCL-MSE). The core innovation lies in defining a concept of mutual similarity. This concept comprises two aspects: drug therapeutic domain similarity, which captures the functional associations between drugs based on their therapeutic spectra, and disease pharmacological response similarity, which reflects the pathological associations between diseases based on drug response patterns. Based on this concept, a Mutual Similarity Enhancement (MSE) mechanism is constructed to fuse four similarities and build a semantic relationship topology that captures the complex semantic dependencies in DDA. Further, an Adaptive Orthogonal Noise Contrastive Estimation Loss (AdaOrthoNCE) is proposed to disentangle biological relationships in the latent space while optimizing discriminative representations. GCL-MSE integrates the semantic topology via MSE, employs a three-channel graph convolutional model to generate topology-semantic co-representations, and utilizes AdaOrthoNCE to learn optimized embeddings, ultimately enabling cross-scale DDA prediction. Experimental results demonstrate that GCL-MSE significantly outperforms state-of-the-art models in both AUROC and AUPRC metrics, with improvements of over 4.8% in AUROC and 25.5% in AUPRC, thereby validating the effectiveness of collaborative modeling that integrates features from pharmacological, therapeutic, and topological perspectives. Additionally, GCL-MSE predicts the therapeutic roles of drugs such as lamotrigine for Alzheimer's disease and hydroxyurea for breast cancer. Molecular docking experiments and related studies further confirm its validity.

EAAI Journal 2026 Journal Article

Research on a residual learning based neural-kernel framework with applications in short-term load forecasting

  • Wangyi Xu
  • Yushu Xiang
  • Xin Ma
  • Wangpeng Li

Short-term load forecasting is essential for power system operation, yet it remains challenging due to the non-stationary nature of load data and the difficulty of capturing complex nonlinear relationships. To address this issue, a residual learning-based neural kernel framework is proposed for short-term load forecasting. The framework integrates a Fourier kernel-based neural kernel module into a deep residual network as a residual function. The Fourier kernel enables automatic identification and separation of periodic components and long-term trends in load data, while the non-parametric property of the kernel model helps reduce model complexity. Meanwhile, the shortcut connections in the residual network effectively alleviate the vanishing gradient problem, ensuring stable and efficient model training. To further improve model performance, the Artificial Bee Colony (ABC) algorithm is employed for hyperparameter optimization, allowing efficient approximation of the global optimum. In addition, a novel Theil UII-S loss function is introduced to enhance the model’s sensitivity to abnormal load fluctuations through adaptive gradient regulation. Experimental results on four real-world power datasets demonstrate that the proposed model outperforms 23 benchmark methods in terms of prediction accuracy. Ablation studies further verify the individual contributions of the Fourier kernel, the loss function, and the ABC algorithm, providing useful insights for future research.

EAAI Journal 2025 Journal Article

A residual learning-based grey system model and its applications in Electricity Transformer’s Seasonal oil temperature forecasting

  • Yiwu Hao
  • Xin Ma
  • Lili Song
  • Yushu Xiang

Accurately predicting cross-regional electricity demand is crucial for efficient distribution management, but it remains challenging due to its complexity. Transformer oil temperature is a key indicator of operational status, and analyzing its seasonal variation is vital for addressing distribution issues. Grey models based on neural networks are effective for predicting nonlinear and small-scale datasets but are prone to overfitting. While residual networks help mitigate overfitting, their application to small-scale time series forecasting is still limited. To improve prediction accuracy for nonlinear and small-scale data, this study introduces residual learning into grey models, proposing a hybrid model. This model combines the feature-capturing ability of residual learning networks with the robustness of grey models, helping to reduce overfitting. The model is trained using the Adam algorithm, with parameters optimized by the Gridsearch algorithm. Performance is demonstrated using four seasonal datasets of transformer oil temperature. A comparison with 13 grey system models and 9 machine learning models shows that the proposed method outperforms the others. By calculating the percentage improvements of various metrics, the model demonstrates consistent performance gains. Sensitivity analysis reveals that the model’s performance is sensitive to the number of neurons and network depth, with higher values significantly improving accuracy and robustness. The results confirm the model’s effectiveness. This study fills the gap between neural grey models and residual networks, successfully applying the model to forecast the seasonal temperature trends of power transformers and providing a theoretical basis for addressing power distribution challenges.

ICRA Conference 2025 Conference Paper

A Variable Stiffness and Transformable Entanglement Soft Robotic Gripper

  • Huayu Zhang
  • Tianle Pan
  • Jianshu Zhou
  • Boyuan Liang
  • Jing Shu
  • Puchen Zhu
  • Jiajun An
  • Yun-Hui Liu

For objects with complex topological and geometrical features, stochastic topological grasping can be executed without the necessity for feedback or precise planning. However, this grasping method has two significant limitations. First, the technique's effectiveness is reduced when interacting with topologically and geometrically simple objects like spheres, cubes, and cylinders, due to the inherent variability in grasping patterns. Additionally, the method's low stiffness restricts its ability to securely handle heavier objects. To address these challenges, this paper proposes an entanglement soft robotic gripper with variable stiffness and two transformable grasping modes (entanglement and clamping). The gripper contains three filaments, whose stiffness can be increased through layer jamming. Furthermore, the gripper can be switched between the entanglement mode and the clamping mode by adjusting the working length of the filaments. Grasping performance was compared with and without variable stiffness, and the results indicated that variable stiffness led to a 149% increase in payload weight. Through experimental validation, we successfully employed the gripper in variable stiffness and transformed modes to grasp items with various shapes and weights. Demonstrations of grasping heavier objects and transforming between the two grasping modes were also conducted to showcase the adaptability and versatility of the gripper.

NeurIPS Conference 2025 Conference Paper

Dual Prototype-Enhanced Contrastive Framework for Class-Imbalanced Graph Domain Adaptation

  • Xin Ma
  • Yifan Wang
  • Siyu Yi
  • Wei Ju
  • Junyu Luo
  • Yusheng Zhao
  • Xiao Luo
  • Jiancheng Lv

Graph transfer learning, especially in unsupervised domain adaptation, aims to transfer knowledge from a label-abundant source graph to an unlabeled target graph. However, most existing approaches overlook the common issue of label imbalance in the source domain, typically assuming a balanced label distribution that rarely holds in practice. Moreover, they face challenges arising from biased knowledge in the source graph and substantial domain distribution shifts. To remedy the above challenges, we propose a dual-branch prototype-enhanced contrastive framework for class-imbalanced graph domain adaptation in this paper. Specifically, we introduce a dual-branch graph encoder to capture both local and global information, generating class-specific prototypes from a distilled anchor set. Then, a prototype-enhanced contrastive learning framework is introduced. On the one hand, we encourage class alignment between the two branches based on constructed prototypes to alleviate the bias introduced by class imbalance. On the other hand, we infer the pseudo-labels for the target domain and align sample pairs across domains that share similar semantics to reduce domain discrepancies. Experimental results show that our ImGDA outperforms the state-of-the-art methods across multiple datasets and settings. The code is available at: https://github.com/maxin88scu/ImGDA.

YNICL Journal 2025 Journal Article

Effects of parietal iTBS on resting-state effective connectivity within the frontoparietal network in patients with schizophrenia: An fMRI study

  • Li Li
  • Lina Wang
  • Han Wu
  • Bing Li
  • Weigang Pan
  • Wenqing Jin
  • Wen Wang
  • Yanping Ren

BACKGROUND: Although intermittent theta burst stimulation (iTBS) has shown effectiveness in addressing working memory (WM) deficits in individuals with schizophrenia (SZ), the current body of evidence is limited and the specific mechanisms involved remain unclear. Therefore, this pilot fMRI study aimed to examine the efficacy of parietal iTBS in ameliorating WM impairments and explore its influence on the resting-state effective connectivity within the frontoparietal network in patients with SZ. METHOD: A total of 48 patients diagnosed with SZ were randomly assigned to an active or sham iTBS group and underwent 20 sessions of active or sham iTBS over 4 weeks. Subsequently, all patients underwent cognitive tests, clinical symptom assessments, and resting-state functional MRI (rs-fMRI) scans. The effective connectivity between the frontal and parietal brain regions during the rs-fMRI scans was analyzed using a spectral dynamic causal modeling approach. Additionally, this trial was registered at the Chinese Clinical Trial Registry in November 2022 (registry number: ChiCTR2200057286). RESULTS: iTBS treatment improved the positive symptoms, negative symptoms, general psychopathology, and WM deficits. Following the iTBS intervention, the active group demonstrated a significant increase in connectivity strengths from the right MFG to the right SPL (p = 0.031) and from the left SPL to the left MFG (p = 0.010) compared to the pre-treatment levels. Additionally, compared to the sham group, the active group displayed a significantly higher connectivity strength from the right MFG to the right SPL (p = 0.042) after iTBS treatment. CONCLUSION: All these findings suggest that iTBS targeting the parietal region may influence the resting-state effective connectivity within the frontoparietal network, thereby offering promising therapeutic implications for alleviating the cognitive deficits in SZ.

EAAI Journal 2025 Journal Article

Enhancing autism spectrum disorder early detection with parent-child dyads block-play protocol and attention-enhanced hybrid deep learning framework

  • Xiang Li
  • Lizhou Fan
  • Hanbo Wu
  • Kunping Chen
  • Xiaoxiao Yu
  • Chao Che
  • Zhifeng Cai
  • Xiuhong Niu

Autism Spectrum Disorder (ASD) is a rapidly growing neurodevelopmental disorder. Early intervention is crucial for the development of young children with ASD, yet traditional clinical screening methods often lack objectivity. We introduce a novel Parent-Child Dyads Block-Play (PCB) protocol that captures distinct behavioral patterns in ASD toddlers during naturalistic interactions with their parents. This protocol systematically captures and quantifies parent–child interactions during the block-play task, providing a structured and naturalistic environment to observe ASD-relevant behaviors. Drawing on kinesiological and neuroscientific insights, our approach analyzes movement dynamics to reliably differentiate ASD from typically developing (TD) toddlers. In a dataset of 129 toddlers (40 ASD, 89 TD), we analyze the videos using a hybrid deep learning framework that integrates a two-stream graph convolution network (2sGCN) with an attention-enhanced extended long short-term memory (AxLSTM), enabling the capture of both spatial and temporal aspects of movement. Our 2sGCN-AxLSTM framework efficiently analyzes human dynamic behavioral patterns and is able to distinguish between ASD and TD toddlers with an unprecedented 89.6% accuracy. This high level of accuracy holds promise for practical clinical use, as it could facilitate timely interventions and potentially improve developmental outcomes. By focusing on real-life parent–child interactions, the proposed PCB protocol provides a valuable tool that can complement traditional assessments, facilitating timely interventions, and potentially improving developmental outcomes.

JBHI Journal 2025 Journal Article

Exploring the Potential of SSVER-BCI Based on Contactless Measurement Using Optically Pumped Magnetometers

  • Fulong Wang
  • Fuzhi Cao
  • Jiawei Gao
  • Nan An
  • Jianzhi Yang
  • Yaxiang Wang
  • Dexin Yu
  • Xin Ma

Brain-computer interfaces (BCIs) based on electroencephalogram (EEG) have been widely applied in health monitoring and neurorehabilitation. However, EEG signals are often attenuated and distorted by tissues like the scalp and skull, limiting EEG-based BCI performance. In contrast, magnetoencephalography (MEG) with contactless measurement offers higher spatial resolution and immunity to volume conduction effects. Traditional MEG systems, based on superconducting quantum interference devices (SQUIDs), are hindered by their size and cost, while optically pumped magnetometers (OPMs) have made OPM-MEG-based BCIs more practical and accessible. Nevertheless, the performance potential of OPM-MEG in BCI applications remains underexplored. To address this, we developed an OPM-MEG BCI system based on steady-state visual evoked response (SSVER) and conducted a systematic evaluation of its performance, highlighting the practical advantages of OPM-MEG in this context. Furthermore, we proposed a fusion framework for OPM-MEG and EEG to further enhance system performance. Offline experiments conducted with 13 participants showed that the developed EEG-BCI achieved an average accuracy of 94.30% and an information transfer rate (ITR) of 122.76 bits/min, the developed OPM-MEG BCI achieved an average accuracy of 98.68% and an ITR of 138.20 bits/min, while the hybrid BCI achieved an average accuracy of 99.72% and an ITR of 159.4 bits/min. The findings highlight the advantages of OPM-MEG for BCI applications and validate the proposed fusion framework as a viable means to enhance decoding performance, thereby extending the potential use cases of OPM-MEG-based systems.

TMLR Journal 2025 Journal Article

Latte: Latent Diffusion Transformer for Video Generation

  • Xin Ma
  • Yaohui Wang
  • Xinyuan Chen
  • Gengyun Jia
  • Ziwei Liu
  • Yuan-Fang Li
  • Cunjian Chen
  • Yu Qiao

We propose Latte, a novel Latent Diffusion Transformer for video generation. Latte first extracts spatio-temporal tokens from input videos and then adopts a series of Transformer blocks to model video distribution in the latent space. In order to model a substantial number of tokens extracted from videos, four efficient variants are introduced from the perspective of decomposing the spatial and temporal dimensions of input videos. To improve the quality of generated videos, we determine the best practices of Latte through rigorous experimental analysis, including video clip patch embedding, model variants, timestep-class information injection, temporal positional embedding, and learning strategies. Our comprehensive evaluation demonstrates that Latte achieves state-of-the-art performance across four standard video generation datasets, i.e., FaceForensics, SkyTimelapse, UCF101, and Taichi-HD. In addition, we extend Latte to the text-to-video generation (T2V) task, where Latte achieves results that are competitive with recent T2V models. We strongly believe that Latte provides valuable insights for future research on incorporating Transformers into diffusion models for video generation.

JBHI Journal 2025 Journal Article

Multi-task Learning for Gait Phase and Gait Cycle Percentage Prediction with Wearable Sensors in Frail Older Adults

  • Jiachen Wang
  • Zeyang Guan
  • Tian Liang
  • Ziyun Ding
  • Xin Ma
  • Yibin Li
  • Rui Song
  • Huanghe Zhang

Deep learning has been widely used in wearable sensors to improve accuracy in gait analysis. However, these deep learning models typically focus on single tasks, either in gait parameter estimation or gait phase detection. This study presents a novel multi-task learning framework for regression (i.e., gait cycle percentage prediction) and classification (i.e., gait phase prediction) tasks in pathological gait analysis using wearable sensors. Our framework employs a Multi-gate Mixture-of-Experts (MMoE) architecture to achieve soft parameter-sharing, integrating expert networks, cross-expert attention mechanisms, and dynamic routing to balance shared and task-specific representations. To reduce computational burden in wearable applications, we compare lightweight model configurations that optimize expert count and feature dimensionality. Model performance has been validated on a public dataset consisting of 158 frail older adults, demonstrating that our framework significantly outperforms single-task learning and hard parameter-sharing baselines, achieving an accuracy of 97.56% and a Mean Absolute Error (MAE) of 0.0397. Notably, the most compact lightweight configuration reduces the parameter count by nearly 98% (from 2.118 million to 0.0469 million), achieving an accuracy of 96.47% and a MAE of 0.0549. Attention mechanisms significantly enhance performance across all configurations, with improvements ranging from 17.9% to 30.4%. These findings validate the potential of lightweight multi-task approaches for real-time gait assessment, offering promising applications for clinical evaluation and rehabilitation monitoring in geriatric populations.
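
The "nearly 98%" parameter reduction quoted in the abstract above can be checked with simple arithmetic. The following is a minimal sketch in Python that uses only the two parameter counts stated in the abstract; the function and variable names are illustrative and do not come from the paper.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the fractional reduction when going from `before` to `after`."""
    if before <= 0:
        raise ValueError("`before` must be positive")
    return (before - after) / before


# Parameter counts in millions, as quoted in the abstract.
full_params = 2.118
compact_params = 0.0469

reduction = relative_reduction(full_params, compact_params)
print(f"Parameter reduction: {reduction:.2%}")
```

The result is a little under 98%, which agrees with the abstract's wording.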

AAAI Conference 2025 Conference Paper

Novel View Synthesis Under Large-Deviation Viewpoint for Autonomous Driving

  • Xin Ma
  • Jiguang Zhang
  • Peng Lu
  • Shibiao Xu
  • Chengwei Pan

Novel view synthesis is a critical task in autonomous driving. Although 3D Gaussian Splatting (3D-GS) has shown success in generating novel views, it faces challenges in maintaining high-quality rendering when viewpoints deviate significantly from the training set. This difficulty primarily stems from complex lighting conditions and geometric inconsistencies in texture-less regions. To address these issues, we propose an attention-based illumination model that leverages light fields from neighboring views, enhancing the realism of synthesized images. Additionally, we propose a geometry optimization method using planar homography to improve geometric consistency in texture-less regions. Our experiments demonstrate substantial improvements in synthesis quality for large-deviation viewpoints, validating the effectiveness of our approach.

IJCAI Conference 2025 Conference Paper

PALA: Class-imbalanced Graph Domain Adaptation via Prototype-anchored Learning and Alignment

  • Xin Ma
  • Yifan Wang
  • Siyu Yi
  • Wei Ju
  • Bei Wu
  • Ziyue Qiao
  • Chenwei Tang
  • Jiancheng Lv

Graph domain adaptation is a key subfield of graph transfer learning that aims to bridge domain gaps by transferring knowledge from a label-rich source graph to an unlabeled target graph. However, most existing methods assume balanced labels in the source graph, which often fails in practice and leads to biased knowledge transfer. To address this, in this paper, we propose a prototype-anchored learning and alignment framework for class-imbalanced graph domain adaptation. Specifically, we incorporate pointwise node mutual information into the graph encoder to capture high-order topological proximity and learn generalized node representations. Leveraging this, we then introduce categorical prototypes with adversarial proto-instances for prototype-anchored learning and recalibration to represent the source graph under an imbalanced class distribution. Finally, we introduce a weighted prototype contrastive adaptation strategy that aligns target pseudo-labels with source prototypes to handle class imbalance during adaptation. Extensive experiments show that our PALA outperforms the state-of-the-art methods. Our code is available at https://github.com/maxin88scu/PALA.

JBHI Journal 2025 Journal Article

SDPR: Prescription Recommendation With Syndrome Differentiation in Traditional Chinese Medicine

  • Wenjing Yue
  • Wendi Ji
  • Xinyu Wang
  • Xin Ma
  • Pengfei Wang
  • Xiaoling Wang

Prescription recommendation is critical for clinical decision support in Traditional Chinese Medicine (TCM), aiming to recommend a herb set based on a patient's symptoms. The core principle of TCM clinical practice, treatment based on syndrome differentiation (SD), follows a four-step progressive process: symptoms to syndromes, therapeutic methods, and herbs. However, existing models oversimplify this process by overlooking therapeutic methods, directly mapping symptoms to herbs or syndromes to herbs, resulting in information loss and reducing the effectiveness of recommended prescriptions. Furthermore, the implicit, sparse, and many-to-many relationships between syndromes and therapeutic methods, coupled with the nonlinear interactions between therapeutic methods and herbs, further hinder the modeling of the complete SD process. To address these challenges, we propose a novel four-partite graph paradigm that explicitly models the four key components of SD and their interactions, preserving critical information at each step and aligning more closely with clinicians' decision-making logic. Building on this, we develop SDPR, an SD-based prescription recommendation model comprising four modules aligned with all SD steps. Then, we integrated them into a multi-task learning framework to fully capture the progressive prescription process. To handle the implicit and complex relationships among syndromes, therapeutic methods, and herbs, we introduce a syndrome-induced pre-training strategy and a therapeutic method-aware contrastive learning framework. Extensive experiments on public and real-world datasets validate SDPR's effectiveness in herb recommendation and prescription retrieval, confirming the strength of the four-partite graph paradigm. Our broader goal is to advance the intelligent development of TCM in healthcare.

YNIMG Journal 2025 Journal Article

Source-level performance of triaxial and uniaxial-radial OPM-MEG

  • Wen Li
  • Nan An
  • Zhenfeng Gao
  • Junjian Tang
  • Jianzhi Yang
  • Xin Ma
  • Min Xiang
  • Fuzhi Cao

Magnetoencephalography (MEG) based on optically pumped magnetometers (OPM) provides a new means for detecting human brain activities. Compared to conventional SQUID-MEG, OPM-MEG offers significant advantages, including wearability, flexibility in sensor arrays, the ability to operate at room temperature, and closer proximity to the scalp for stronger signals. Initially, OPMs with uniaxial sensitivity were widely used for measuring the radial magnetic field components relative to the scalp. Recently, triaxial OPM has become a mainstream development trend, enabling OPM-MEG to measure the full neuromagnetic field vector. Studies have shown that triaxial OPM-MEG provides sensitivity to the tangential components of the magnetic field, offers greater brain coverage, and exhibits enhanced motion-artifact resistance compared to uniaxial OPM-MEG. However, source-level performance disparities between triaxial and uniaxial OPM-MEG remain underexplored. Here, we comprehensively evaluated the source-level performance of triaxial and uniaxial OPM-MEG using five source reconstruction methods. The analysis is performed under various simulation conditions, including different source extents, numbers, correlation coefficients, and depths, as well as different SNRs and co-registration errors. The results show that triaxial OPM-MEG significantly improves source imaging accuracy in most cases over uniaxial systems. Importantly, this advantage exhibits strong method dependence. In addition, we validated the conclusions obtained from the simulation on publicly available somatosensory data recorded with 64 triaxial OPM-MEG sensors (192 channels of data). A guideline for the selection of imaging methods in various real-world scenarios is provided based on our findings.

EAAI Journal 2025 Journal Article

Time-delayed fractional grey Bernoulli model with independent fractional orders for fossil energy consumption forecasting

  • Xin Ma
  • Qingping He
  • Wanpeng Li
  • Wenqing Wu

Fossil fuels serve as the primary energy source in the global energy landscape. Through an in-depth understanding and forecasting of fossil fuel consumption, it becomes possible to address energy and environmental challenges more effectively. This study contributes to artificial intelligence by proposing a novel time-delayed fractional grey Bernoulli model with independent fractional orders, which leverages the structural properties of the Bernoulli equation, the cumulative nature of fractional orders, and the driving influence of the time-delay term. The model parameters are optimized using the particle swarm optimization algorithm, enhancing its adaptability and accuracy. The engineering application of this model focuses on forecasting fossil fuel consumption, a critical challenge in energy and environmental engineering. Using real datasets from 2000 to 2022, the proposed model is applied to predict the consumption of natural gas, coal and oil in the Middle East and North America, alongside ten benchmark models. The results demonstrate the superior performance of the proposed model, achieving mean absolute percentage errors of 2.632991%, 5.793513%, 5.432220% and 2.816756% across four case studies, significantly outperforming other models. These findings highlight the potential of the proposed model as a robust and reliable decision-support tool in energy engineering.
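
For readers unfamiliar with the metric reported in the abstract above, the following is a minimal sketch of the standard mean absolute percentage error (MAPE) in Python. The function name and the sample numbers are made up for illustration; they are not the paper's data, and the paper's exact evaluation protocol is not given in the abstract.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Assumes every value in `actual` is nonzero, since each error is
    divided by the corresponding actual value.
    """
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical observed and forecast values, for illustration only.
observed = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]

error = mape(observed, forecast)
print(f"MAPE: {error:.2f}%")
```

Lower values indicate forecasts closer to the observed series; a MAPE of 0% means every forecast matched exactly.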

EAAI Journal 2025 Journal Article

Unsupervised domain adaptation framework with global-local adversarial learning and masked image consistency for fish counting in deep-sea aquaculture

  • Hanchi Liu
  • Xin Ma

Accurate fish counting is crucial for effective management in deep-sea aquaculture. However, the diverse deep-sea aquaculture environments pose significant challenges to the generalizability of fish counting models. Current methods rely heavily on extensive labeled datasets and struggle to adapt to unseen scenarios. To address these limitations, this study proposes an unsupervised domain adaptation framework that leverages global-local adversarial learning and masked image consistency for cross-domain fish counting. A global-local discriminator is designed to extract the domain-invariant features by aligning the density map prediction across domains at the image and patch levels. Additionally, a masked image consistency module is designed to enhance the utilization of spatial context in the target image by enforcing prediction consistency between masked and complete target images. To validate our approach, we established four fish counting datasets using cameras with varying lighting conditions and viewpoints from two actual deep-sea cages. The framework was evaluated on three domain adaptation tasks: (1) natural to artificial lighting, (2) fixed to free viewpoints, and (3) “Shenlan 1” to “Genghai 1” cages. Experimental results demonstrate that the proposed method significantly improves generalization to diverse aquaculture scenarios without requiring additional data annotations. It outperformed state-of-the-art methods, reducing the mean absolute percentage error by 17.34% for cross-view counting and 6.77% for cross-cage counting.

ECAI Conference 2025 Conference Paper

WDFusion: Multi-Modal Image Fusion via Wavelet-Decoupled Inter-Modal Interaction Learning

  • Yuansheng Song
  • Xin Ma
  • Xuqi Cai
  • Xincao Xu

The purpose of multi-modal image fusion is to integrate the advantages of different modal images to obtain fused images that satisfy human or machine perception tasks. However, most methods rely on hard fusion rules, which tend to perform poorly in the face of new data, generalize poorly, and fail to learn the needed more general and robust feature representations. Therefore, we designed a two-stage multi-modal image fusion method based on wavelet transform decoupled cross-modal translation and contrastive regularization. Specifically, in the first stage, different modalities are decoupled into high-frequency and low-frequency parts by wavelet transform, respectively, and the enhanced features are obtained by the corresponding enhancement blocks, and then their low-frequency parts are exchanged to obtain the corresponding pseudo-modal images. Then the positive and negative samples are constructed, and the cross-modal translation constraints are carried out using the idea of contrastive learning to realize the inter-modal interactive learning. In the second stage, the wavelet decoupled high and low frequency features of different modal images are modulated and fused respectively. Through extensive experimental validation, our method achieves state-of-the-art fusion performance with strong generalization on multiple datasets, and achieves state-of-the-art performance on downstream tasks including object detection and semantic segmentation. The code is available at https://github.com/Song-YS/WDFusion.

NeurIPS Conference 2024 Conference Paper

Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs

  • Xin Ma
  • Yang Liu
  • Jingjing Liu
  • Xiaoxu Ma

Large language models (LLMs), although having revolutionized many fields, still suffer from the challenging extrapolation problem, where the inference ability of LLMs sharply declines beyond their max training lengths. In this work, we conduct a theoretical analysis to better understand why No Position Encoding (NoPE) fails outside its effective range, as well as examining the power of Position Encoding (PE) in this context. Our findings reveal that with meticulous weave position, PE can indeed be extended beyond effective range. Our theorems establish that LLMs equipped with weave PE can achieve improved extrapolation performance without additional cost. Furthermore, we introduce a novel weave PE method, Mesa-Extrapolation, which utilizes a chunk-based triangular attention matrix and applies Stair PE to manage the final chunk. This method not only retains competitive performance but also offers substantial benefits such as significantly reduced memory demand and faster inference speed. Extensive experiments validate the effectiveness of Mesa-Extrapolation, demonstrating its potential as a scalable solution to enhancing LLMs’ applicative reach.

YNIMG Journal 2023 Journal Article

Leading and following: Noise differently affects semantic and acoustic processing during naturalistic speech comprehension

  • Xinmiao Zhang
  • Jiawei Li
  • Zhuoran Li
  • Bo Hong
  • Tongxiang Diao
  • Xin Ma
  • Guido Nolte
  • Andreas K. Engel

Despite the distortion of speech signals caused by unavoidable noise in daily life, our ability to comprehend speech in noisy environments is relatively stable. However, the neural mechanisms underlying reliable speech-in-noise comprehension remain to be elucidated. The present study investigated the neural tracking of acoustic and semantic speech information during noisy naturalistic speech comprehension. Participants listened to narrative audio recordings mixed with spectrally matched stationary noise at three signal-to-noise ratio (SNR) levels (no noise, 3 dB, -3 dB), and 60-channel electroencephalography (EEG) signals were recorded. A temporal response function (TRF) method was employed to derive event-related-like responses to the continuous speech stream at both the acoustic and the semantic levels. Whereas the amplitude envelope of the naturalistic speech was taken as the acoustic feature, word entropy and word surprisal were extracted via a natural language processing method as two semantic features. Theta-band frontocentral TRF responses to the acoustic feature were observed at around 400 ms following speech fluctuation onset over all three SNR levels, and the response latencies were more delayed with increasing noise. Delta-band frontal TRF responses to the semantic feature of word entropy were observed at around 200 to 600 ms leading to speech fluctuation onset over all three SNR levels. The response latencies became more leading with increasing noise and decreasing speech comprehension and intelligibility. While the following responses to speech acoustics were consistent with previous studies, our study revealed the robustness of leading responses to speech semantics, which suggests a possible predictive mechanism at the semantic level for maintaining reliable speech comprehension in noisy environments.

EAAI Journal 2023 Journal Article

Urban natural gas consumption forecasting by novel wavelet-kernelized grey system model

  • Xin Ma
  • Hongfang Lu
  • Minda Ma
  • Lifeng Wu
  • Yubin Cai

Natural gas, which is clean and abundant, plays a key role in the path to carbon neutrality. However, it is difficult to collect sufficient data on urban natural gas consumption in China, and such data sets often present high nonlinearity and complex features, making it difficult to produce accurate forecasts for small and mid-sized cities based on small samples. In this work, a novel wavelet kernel-based grey system model is proposed by combining wavelet kernel-based machine learning with grey system modelling, taking advantage of the nonlinearity and periodicity of the wavelet kernel. A complete computational algorithm is presented that uses a hold-out cross validation-based grid-search scheme to select the optimal hyperparameters. Three case studies are carried out on real-world data sets of urban natural gas consumption in Kunming, China, in which the proposed model outperforms 15 other time series forecasting models (including kernel-based models, grey system models and deep learning models), illustrating its superiority in such forecasting tasks and its high potential in similar applications.
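
The hold-out grid search over kernel hyperparameters can be sketched as follows. The sketch rests on several assumptions: it uses a commonly cited Morlet-type wavelet kernel, and it swaps in plain kernel ridge regression instead of the grey system model, which is not reproduced here. The function names and the parameters `a` (kernel dilation) and `lam` (ridge regularization) are hypothetical.

```python
import numpy as np

def wavelet_kernel(A, B, a):
    """Morlet-type wavelet kernel between rows of A and rows of B.

    Assumed form: prod_i cos(1.75 * u_i) * exp(-u_i**2 / 2), u = (x - y) / a.
    """
    u = (A[:, None, :] - B[None, :, :]) / a
    return np.prod(np.cos(1.75 * u) * np.exp(-0.5 * u ** 2), axis=2)

def fit_predict(X_train, y_train, X_test, a, lam):
    """Kernel ridge regression with the wavelet kernel (stand-in model)."""
    K = wavelet_kernel(X_train, X_train, a)
    alpha = np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)
    return wavelet_kernel(X_test, X_train, a) @ alpha

def holdout_grid_search(X, y, a_grid, lam_grid, n_val):
    """Pick (a, lam) minimizing mean squared error on the last n_val samples."""
    X_tr, y_tr = X[:-n_val], y[:-n_val]
    X_va, y_va = X[-n_val:], y[-n_val:]
    best = None
    for a in a_grid:
        for lam in lam_grid:
            pred = fit_predict(X_tr, y_tr, X_va, a, lam)
            err = float(np.mean((pred - y_va) ** 2))
            if best is None or err < best[0]:
                best = (err, a, lam)
    return best  # (validation error, chosen a, chosen lam)
```

Holding out the most recent samples, rather than a random subset, keeps the validation split consistent with the time ordering of a consumption series. Whether the paper splits its data this way is not stated in the abstract.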

EAAI Journal 2022 Journal Article

A novel fractional time-delayed grey Bernoulli forecasting model and its application for the energy production and consumption prediction

  • Yong Wang
  • Xinbo He
  • Lei Zhang
  • Xin Ma
  • Wenqing Wu
  • Rui Nie
  • Pei Chi
  • Yuyang Zhang

Energy affects the stable and sustainable development of the social economy. Energy prediction plays an important role in the transformation of China’s energy market. A scientific and reasonable energy prediction method can help governments make decisions effectively and then adjust the energy structure and industrial layout. The energy field is full of fractional-order phenomena and nonlinear disturbances. For energy data sets characterized by scarcity, complexity and nonlinearity, a mathematical model including a time delay term and the Bernoulli equation can be used to fit the trend. A new fractional time-delayed grey Bernoulli model is proposed, and the new model has wider application in the nonlinear field. The model is discretized by integration, and the least-squares estimate of the linear parameters and the approximate time response equation are obtained. The Grey Wolf Optimizer (GWO) is used to search for the optimal parameters of the model. In addition, the energy prediction model is established from the perspective of renewable energy and fossil energy, and the effectiveness of the model is verified by three actual cases of renewable energy, crude oil and fossil fuel. Compared with seven other grey models, the results show that the new model has higher prediction performance. Finally, the energy development trend in the next few years is predicted using the proposed model, and relevant conclusions are drawn from the prediction results.

EAAI Journal 2022 Journal Article

A novel self-adaptive fractional multivariable grey model and its application in forecasting energy production and conversion of China

  • Yong Wang
  • Li Wang
  • Lingling Ye
  • Xin Ma
  • Wenqing Wu
  • Zhongsen Yang
  • Xinbo He
  • Lei Zhang

Energy production and conversion have a significant impact on the economic development of all countries in the world. China’s energy production and conversion are large. Therefore, accurate mid-to-long-term forecasting of China’s energy production and conversion is becoming more and more important for integrating energy systems and energy strategic planning. For this purpose, a novel fractional grey sequence is proposed based on Grünwald–Letnikov fractional calculus. Furthermore, a novel self-adaptive fractional multivariable grey model is proposed based on the novel sequence. In this article, we compare several classical optimization algorithms and finally choose Particle Swarm Optimization (PSO) to compute the parameters. In addition, Monte-Carlo simulation and probability density analysis (PDA) are presented to verify the model’s performance. Monte-Carlo simulation reduces the randomness of the results of the model runs to a certain extent. Probability density analysis visualizes this randomness through kernel density estimation (KDE). This paper compares the new model with seven existing grey models and predicts the total energy consumption per capita, energy conversion efficiency and total renewable energy in China, respectively. The experimental results show that the new model is superior to the other seven models in terms of stability and prediction accuracy.

IJCAI Conference 2022 Conference Paper

Locally Normalized Soft Contrastive Clustering for Compact Clusters

  • Xin Ma
  • Won Hwa Kim

Recent deep clustering algorithms take advantage of self-supervised learning and self-training techniques to map the original data into a latent space, where the data embedding and clustering assignment can be jointly optimized. However, as many recent datasets are enormous and noisy, getting a clear boundary between different clusters is challenging with existing methods, which mainly focus on contracting similar samples together and overlook samples near cluster boundaries in the latent space. In this regard, we propose an end-to-end deep clustering algorithm, i.e., Locally Normalized Soft Contrastive Clustering (LNSCC). It takes advantage of similarities among each sample's local neighborhood and globally disconnected samples to leverage positiveness and negativeness of sample pairs in a contrastive way to separate different clusters. Experimental results on various datasets illustrate that our proposed approach achieves outstanding clustering performance over most of the state-of-the-art clustering methods for both image and non-image data even without convolution.

IROS Conference 2019 Conference Paper

Fast and Incremental Loop Closure Detection Using Proximity Graphs

  • Shan An
  • Guangfu Che
  • Fangru Zhou
  • Xianglong Liu
  • Xin Ma
  • Yu Chen

Visual loop closure detection, which can be considered as an image retrieval task, is an important problem in SLAM (Simultaneous Localization and Mapping) systems. The frequently used bag-of-words (BoW) models can achieve high precision and moderate recall. However, the requirement for lower time and memory costs in mobile robot applications is not well satisfied. In this paper, we propose a novel loop closure detection framework titled FILD (Fast and Incremental Loop closure Detection), which focuses on an on-line and incremental graph vocabulary construction for fast loop closure detection. The global and local features of frames are extracted using Convolutional Neural Networks (CNN) and SURF on the GPU, which guarantees extremely fast extraction speeds. The graph vocabulary construction is based on one type of proximity graph, named Hierarchical Navigable Small World (HNSW) graphs, which is modified to adapt to this specific application. In addition, this process is coupled with a novel strategy for real-time geometrical verification, which only keeps binary hash codes and significantly saves on memory usage. Extensive experiments on several publicly available datasets show that the proposed approach can achieve fairly good recall at 100% precision compared to other state-of-the-art methods. The source code can be downloaded at https://github.com/AnshanTJU/FILD for further studies.

JBHI Journal 2014 Journal Article

Depth-Based Human Fall Detection via Shape Features and Improved Extreme Learning Machine

  • Xin Ma
  • Haibo Wang
  • Bingxia Xue
  • Mingang Zhou
  • Bing Ji
  • Yibin Li

Falls are one of the major causes of injury among elderly people. Using wearable devices for fall detection has a high cost and may cause inconvenience to the daily lives of the elderly. In this paper, we present an automated fall detection approach that requires only a low-cost depth camera. Our approach combines two computer vision techniques, shape-based fall characterization and a learning-based classifier, to distinguish falls from other daily actions. Given a fall video clip, we extract curvature scale space (CSS) features of human silhouettes at each frame and represent the action by a bag of CSS words (BoCSS). Then, we utilize the extreme learning machine (ELM) classifier to identify the BoCSS representation of a fall from those of other actions. In order to eliminate the sensitivity of ELM to its hyperparameters, we present a variable-length particle swarm optimization algorithm to optimize the number of hidden neurons, corresponding input weights, and biases of ELM. Using a low-cost Kinect depth camera, we build an action dataset that consists of six types of actions (falling, bending, sitting, squatting, walking, and lying) from ten subjects. Experimenting with the dataset shows that our approach can achieve up to 91.15% sensitivity, 77.14% specificity, and 86.83% accuracy. On a public dataset, our approach performs comparably to state-of-the-art fall detection methods that need multiple cameras.
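
For readers unfamiliar with ELM, the classifier and the three reported metrics can be sketched in a few lines. This is a basic ELM with randomly drawn hidden weights, a `tanh` activation, and a pseudoinverse solve for the output weights. It does not include the variable-length particle swarm optimization or the BoCSS features, and the function names, activation, and 0.5 decision threshold are assumptions for illustration.

```python
import numpy as np

def train_elm(X, y, n_hidden, rng):
    """Basic ELM: random hidden layer, least-squares output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (not optimized)
    b = rng.normal(size=n_hidden)                # random hidden biases (not optimized)
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    beta = np.linalg.pinv(H) @ y                 # output weights via pseudoinverse
    return W, b, beta

def predict_elm(model, X):
    """Binary prediction: 1 (fall) if the output score exceeds 0.5, else 0."""
    W, b, beta = model
    scores = np.tanh(X @ W + b) @ beta
    return (scores > 0.5).astype(int)

def sensitivity_specificity_accuracy(y_true, y_pred):
    """Metrics for binary labels where 1 = fall and 0 = other action."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    sensitivity = tp / (tp + fn)   # fraction of falls detected
    specificity = tn / (tn + fp)   # fraction of non-falls correctly rejected
    accuracy = (tp + tn) / len(y_true)
    return float(sensitivity), float(specificity), float(accuracy)
```

Because the hidden weights are random, results vary with the seed and with `n_hidden`. That sensitivity is the motivation the abstract gives for optimizing these quantities.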