Arrow Research search

Author name cluster

Xin Sun

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

25 papers
1 author row

Possible papers


AAAI Conference 2026 Conference Paper

Edge Self-Adversarial Augmentation Enhances Graph Contrastive Learning Against Neighborhood Inconsistency

  • Chunchun Chen
  • Xing Wei
  • Jiayi Yang
  • Chenrun Wang
  • Yiwei Fu
  • Yuxing Zhang
  • Xin Sun
  • Rui Fan

Recent studies have shown that unsupervised graph contrastive learning (GCL) is vulnerable to adversarial attacks. Automatic adversarial augmentation techniques have been proposed to improve both the effectiveness and robustness of GCL. Existing methods typically treat the unsupervised contrastive loss as the adversarial goal, essentially aiming to maximize inter-view instance-wise discrepancies between adversarial and original views. However, such attacks overlook intra-view neighborhood inconsistency, which hinders the robustness of GCL models against local neighborhood noise and degrades performance on low-homophily graphs. To tackle this issue, we propose a novel adversarial contrastive paradigm, named Edge self-aDversarial Augmentation for Graph Contrastive Learning (EDA-GCL). We theoretically establish that the adversarial objective of the intra-view neighborhood is equivalent to maximizing the discrepancy between bidirectional edge features. Hence, we build our adversarial framework on edge self-adversarial learning. It generates pairwise adversarial augmentations from the original view by learning distinct neighborhood connectivity structures. The learned pairwise adversarial views are used for GCL model training in the minimization stage. Notably, this edge-level adversarial approach reduces the computational complexity to the order of the number of edges. Experiments on various graph tasks and complex noise scenarios demonstrate the superiority and robustness of our EDA-GCL.
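For orientation, here is a minimal sketch of an InfoNCE-style two-view contrastive loss, a common choice in GCL. The abstract does not say which contrastive loss EDA-GCL builds on, and its edge-level adversary is not reproduced; the function name `info_nce` and the temperature default are illustrative assumptions. Row i of each view embeds the same node (the positive pair) and all other rows act as negatives.

```python
import numpy as np


def info_nce(z1: np.ndarray, z2: np.ndarray, tau: float = 0.5) -> float:
    """InfoNCE loss between two views of the same n nodes.

    Row i of z1 and row i of z2 form a positive pair; the other rows of z2
    act as negatives for row i of z1. Lower means better agreement.
    """
    # L2-normalise so that dot products are cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                     # (n, n) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the matching node as the correct "class"
    return float(-np.mean(np.diag(log_probs)))


views = np.eye(4)
aligned = info_nce(views, views)
shuffled = info_nce(views, np.roll(views, 1, axis=0))
print(aligned < shuffled)  # True: matched views give a lower loss
```

In an adversarial GCL loop, the augmenter would be updated to increase such a loss and the encoder to decrease it.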

AAAI Conference 2026 Conference Paper

State-Derivative-Aware Neural Controlled Differential Equations for Multivariate Time Series Anomaly Detection and Diagnosis

  • Xin Sun
  • Heng Zhou
  • Yuhao Wu
  • Chao Li

Multivariate time series anomaly detection is crucial in real-world applications but challenging due to complex temporal dependencies and system dynamics. Reconstruction-based methods have made great improvements in recent years. However, we observe that these methods primarily measure deviations in the time points themselves when performing anomaly detection and ignore changes in the dynamic properties of the system. In such cases, they cannot produce reconstruction errors large enough to detect anomalies, so some abnormal time points caused by the dynamic evolution of the system are missed. To address this problem, we propose a novel method, SDA2D, which models system dynamics through the time derivative of the NCDE-derived state vector, enabling reconstruction deviation and system evolution to be learned jointly. Our experimental results show that SDA2D achieves noticeable improvements on four benchmark datasets, and the visualization also provides further guidance for anomaly diagnosis, which helps locate the sources of these anomalies.

EAAI Journal 2025 Journal Article

A comb concatenation diffusion model for hyperspectral image super-resolution

  • Yinghao Xu
  • Hao Wang
  • Xin Sun
  • Qianlong Xie
  • Wenwen Zhang
  • Peng Ren
  • Fei Zhou
  • Susanto Rahardja

One primary challenge of hyperspectral image super-resolution is addressing the high dimensionality and complexity of both the spectral and spatial domains. The diffusion model incorporates conditional knowledge and uses a step-by-step denoising process, thereby coping well with the high dimensionality and complexity of hyperspectral images. However, a major difficulty for diffusion-based hyperspectral image super-resolution techniques is using conditional knowledge accurately to guide and constrain the denoising process. Additionally, insufficient conditional knowledge leaves too little information for generating detailed high-resolution images. To address these issues, we develop a comb concatenation strategy specifically designed for diffusion models. This strategy improves how accurately conditional knowledge guides and constrains the noisy image, through the comb concatenation of conditional knowledge and noisy images. Additionally, we propose a comb concatenation diffusion model for hyperspectral image super-resolution, consisting of three components. The first component is the comb concatenation block, which accurately uses conditional knowledge to guide the denoising process. The second component is the conditional encoding block, responsible for generating rich conditional knowledge. Finally, we develop a noise prediction block tailored to the hyperspectral image super-resolution diffusion framework. This block effectively manages accurate and comprehensive conditional knowledge representations. The three complementary components work together to maintain both spatial resolution and spectral fidelity. Extensive experimental results on the Houston, Chikusei, and Qingdao University of Science and Technology-1 datasets demonstrate that our framework outperforms state-of-the-art methods in both quantitative evaluations and visual quality across various scenarios. We release our source code at https://gitee.com/YinghaoXU/ISEDM for public evaluation.

EAAI Journal 2025 Journal Article

Effective fuzzing testcase generation based on variational auto-encoder generative adversarial network

  • Zhongyuan Qin
  • Jiarong Fan
  • Xujian Liu
  • Zeru Li
  • Xin Sun

Fuzzing is an effective way to detect vulnerabilities, yet its effectiveness depends greatly on the quality of the initial testcases. Recently, generative adversarial networks (GANs) have been applied to the automatic generation of testcases, which are expected to fall between testcases that fully conform to the syntax format and random testcases that are likely to be rejected. Such testcases can cover new edges of programs and thereby find bugs. However, current GAN-based approaches still have problems such as an unstable training process, a single generation method, and generated testcases that rarely reach new edges. In this paper, a novel fuzzing testcase generation technique based on a variational auto-encoder generative adversarial network (VAE-GAN) is proposed, which introduces the representation learning process of the variational auto-encoder (VAE) into traditional GANs. We devise two generation methods with which the generator learns and generates testcases according to the feature information of the training set, which improves training stability and produces more diverse testcases. We use the generated testcases as initial testcases for fuzzing tools such as AFL++ to execute common Linux programs. The experiments show that the testcases generated by VAE-GAN outperform those of other GANs on 7 of 9 target programs in terms of unique tuples triggered; compared with the training set generated by AFL++ mutation, testcases generated by VAE-GAN increase edge discovery by up to 57.11%, unique crash discovery by up to 57.14%, and unique hang discovery by up to 85.26% for the target programs; the initial testcases generated by VAE-GAN also improve the performance of FairFuzz and Angora, reflecting the scalability of the VAE-GAN model.

TIST Journal 2025 Journal Article

Enhancing Aspect Sentiment Classification with Dual-Channel Graph Convolutional Network

  • Xin Sun
  • Yongqing Mi
  • Hongao Li

Aspect sentiment classification (ASC) is a crucial research area within sentiment analysis, aiming to predict the sentiment polarity toward different aspects in given contexts. Identifying the relations between aspects and sentiments can be challenging, as aspects and sentiments are not always predefined. Most existing studies have demonstrated the effectiveness of dependency parse trees and graph convolutional networks (GCNs), achieving good experimental results. However, existing methods have mainly focused on either semantic or syntactic information individually and may introduce errors when the input sentence lacks clear syntactic information. To address these issues, we propose a novel approach based on a Dual-Channel Graph Convolutional Network (DC-GCN), which integrates feature fusion within a dual-channel architecture. Our model effectively captures semantic information and enhances the feature representation of syntactic structures by introducing a multi-head self-attention graph convolution guided by the TopK strategy, together with a directional densely connected graph convolutional network. We further employ a bi-affine strategy and a multi-layer perceptron to integrate semantic and syntactic information. Experimental results on publicly available datasets demonstrate the superior performance of our model over state-of-the-art methods. Specifically, our model improves upon baseline models on the Twitter, Lap14, Rest14, Rest15, and Rest16 datasets, with increases in Accuracy/Macro-F1 scores of 0.06/0.58, 0.58/0.47, 0.25/1.19, 0.23/1.05, and 0.36/1.32, respectively.
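One plausible reading of a "TopK strategy" for attention-guided graph convolution is to keep only each word's k strongest attention scores as edges before propagating features. The sketch below illustrates that reading under stated assumptions: `topk_adjacency` and `gcn_layer` are hypothetical helpers, scores are assumed non-negative (softmax-style), and nothing here is taken from the DC-GCN implementation.

```python
import numpy as np


def topk_adjacency(scores: np.ndarray, k: int) -> np.ndarray:
    """Keep each row's k largest (non-negative) attention scores, zero the rest,
    and row-normalise the result into a propagation matrix."""
    n = scores.shape[0]
    # Column indices of the k largest scores in every row (unordered)
    idx = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    rows = np.arange(n)[:, None]
    adj = np.zeros_like(scores, dtype=float)
    adj[rows, idx] = scores[rows, idx]
    return adj / adj.sum(axis=1, keepdims=True)


def gcn_layer(adj: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph convolution: aggregate neighbours, project, apply ReLU."""
    return np.maximum(adj @ h @ w, 0.0)


scores = np.array([[0.5, 0.3, 0.2],
                   [0.1, 0.6, 0.3],
                   [0.2, 0.2, 0.6]])
adj = topk_adjacency(scores, k=2)
# Row 0 keeps only its two largest scores (0.5 and 0.3), renormalised.
print(adj[0])
```

Sparsifying the attention matrix this way keeps message passing focused on the most relevant words instead of spreading weight over the whole sentence.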

AAAI Conference 2025 Conference Paper

Multi-view Consistent 3D Panoptic Scene Understanding

  • Xianzhu Liu
  • Xin Sun
  • Haozhe Xie
  • Zonglin Li
  • Ru Li
  • Shengping Zhang

3D panoptic scene understanding seeks to create novel view images with 3D-consistent panoptic segmentation, which is crucial for many vision and robotics applications. Mainstream methods (e.g., Panoptic Lifting) directly use machine-generated 2D panoptic segmentation masks as training labels. However, these generated masks often exhibit multi-view inconsistencies, leading to ambiguities during the optimization process. To address this, we present Multi-view Consistent 3D Panoptic Scene Understanding (MVC-PSU), featuring two key components: 1) Probabilistic Semantic Aligner, which associates semantic information of corresponding pixels across multiple views by probabilistic alignment to ensure that predicted panoptic segmentation masks are consistent across different views. 2) Geometric Consistency Enforcer, which uses multi-view projection and monocular depth consistency to ensure that the geometry of the reconstructed scene is accurate and consistent across different views. Experimental results demonstrate that the proposed MVC-PSU surpasses state-of-the-art methods on the ScanNet, Replica, and HyperSim datasets.

NeurIPS Conference 2025 Conference Paper

Multivariate Time Series Anomaly Detection with Idempotent Reconstruction

  • Xin Sun
  • Heng Zhou
  • Chao Li

Reconstruction-based methods are competitive choices for multivariate time series anomaly detection (MTS AD). However, one challenge these methods may face is over-generalization, where abnormal inputs are also well reconstructed. In addition, balancing robustness and sensitivity is important for final performance, as robustness ensures accurate detection in potentially noisy data, while sensitivity enables early detection of subtle anomalies. To address these problems, inspired by the idempotent generative network, we take a manifold perspective and propose a novel module named Idempotent Generation for Anomaly Detection (IGAD), which can be flexibly combined with a reconstruction-based method without introducing additional trainable parameters. We modify the manifold so that normal time points are mapped onto it while simultaneously tightening it to exclude abnormal time points. Following the latest findings on AD metrics, we evaluated IGAD on various methods with four real-world datasets, and the IGAD-equipped methods achieve visible improvements in VUS-PR over their predecessors, demonstrating the potential of IGAD for further improvements in MTS AD tasks. Our instructions on integrating IGAD into customized models and example code are available at https://github.com/ProEcho1/Idempotent-Generation-for-Anomaly-Detection-IGAD.
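Idempotent generative networks train a mapping f so that f(f(x)) ≈ f(x): outputs land on a set that f leaves unchanged. A minimal sketch of that idempotence gap follows, using two toy linear maps as stand-in reconstructors. The function name `idempotence_gap` is hypothetical, and IGAD's full objective (including how it tightens the manifold) is not reproduced here.

```python
import numpy as np


def idempotence_gap(f, x: np.ndarray) -> float:
    """Mean squared distance between f(f(x)) and f(x).

    The gap is zero exactly when f leaves its own outputs unchanged on x,
    i.e. f(x) already lies on the set that f maps onto itself.
    """
    fx = f(x)
    return float(np.mean((f(fx) - fx) ** 2))


P = np.array([[1.0, 0.0], [0.0, 0.0]])  # projection onto axis 0: P @ P == P


def project(z: np.ndarray) -> np.ndarray:
    """Idempotent toy reconstructor."""
    return z @ P.T


def shrink(z: np.ndarray) -> np.ndarray:
    """Non-idempotent toy reconstructor: shrink(shrink(z)) != shrink(z)."""
    return 0.5 * z


x = np.array([[3.0, 4.0], [1.0, -2.0]])  # rows are 2-D time points
print(idempotence_gap(project, x))  # 0.0
print(idempotence_gap(shrink, x))   # 0.46875
```

A reconstructor trained to drive this gap to zero on normal data maps normal points onto a stable set, which is the intuition behind using idempotence against over-generalization.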

AAAI Conference 2025 Conference Paper

OTPNet: ODE-inspired Tuning-free Proximal Network for Remote Sensing Image Fusion

  • Wei Yu
  • Zonglin Li
  • Qinglin Liu
  • Xin Sun

Remote sensing image fusion aims to reconstruct an image with high spatial and spectral resolution by integrating the spatial and spectral information from multiple remote sensing sensors. Despite the remarkable progress of deep learning-based fusion methods, most existing methods rely on manual network architecture design and hyperparameter tuning, lacking sufficient interpretability and adaptability. To address this limitation, we propose a novel neural Ordinary Differential Equation (ODE)-inspired tuning-free proximal splitting algorithm, which splits remote sensing image fusion into two optimization problems regularized by deep priors to model the fusion of spatial and spectral information. First, based on the physical properties of spatial and spectral information, the two problems are optimized by two proximal splitting operators that iteratively integrate complementary spatial-spectral information while eliminating or suppressing redundant information to reduce fusion errors. Second, considering the efficiency of neural ODEs in reducing optimization error, we use a high-order numerical scheme to customize the proximal operator theoretically, without additional handcrafted design or parameter tuning. Finally, by incorporating the numerical scheme as a solver into the proximal optimization algorithm, we derive an ODE-inspired Tuning-free Proximal Network, dubbed OTPNet, which achieves efficient and robust fusion reconstruction. Extensive experiments on nine datasets across three different remote sensing image fusion tasks show that our OTPNet outperforms existing state-of-the-art approaches, which validates the effectiveness of our method.

AAAI Conference 2025 Conference Paper

Path-Adaptive Matting for Efficient Inference Under Various Computational Cost Constraints

  • Qinglin Liu
  • Zonglin Li
  • Xiaoqian Lv
  • Xin Sun
  • Ru Li
  • Shengping Zhang

In this paper, we explore a novel image matting task aimed at achieving efficient inference under various computational cost constraints, specifically FLOP limitations, using a single matting network. Existing matting methods, which have not explored scalable architectures or path-learning strategies, fail to tackle this challenge. To overcome these limitations, we introduce Path-Adaptive Matting (PAM), a framework that dynamically adjusts network paths based on image contexts and computational cost constraints. We formulate the training of the computational cost-constrained matting network as a bilevel optimization problem, jointly optimizing the matting network and the path estimator. Building on this formalization, we design a path-adaptive matting architecture by incorporating path selection layers and learnable connect layers to estimate optimal paths and perform efficient inference within a unified network. Furthermore, we propose a performance-aware path-learning strategy that generates path labels online by evaluating a few paths sampled from the prior distribution of optimal paths and network estimations, enabling robust and efficient online path learning. Experiments on five image matting datasets demonstrate that the proposed PAM framework achieves competitive performance across a range of computational cost constraints.

AAAI Conference 2025 Conference Paper

Procedure Knowledge Decoupled Distillation Strategy for Procedure Planning in Instructional Videos

  • Xiaotian Pan
  • Zhaobo Qi
  • Xin Sun
  • Yuanrong Xu
  • Weigang Zhang

Procedure planning in instructional videos, which produces a structured and plannable action sequence that carries the start state to the goal state, has achieved significant progress. The dominant single-branch non-autoregressive planning paradigm guides action sequence generation through action labels alone and overlooks the lack of intermediate visual information. Hence, we introduce the procedure knowledge decoupled distillation strategy to address this issue. This strategy deliberately lets the teacher model see the real visual information between the start and goal states to enhance its action semantic understanding and relationship modeling ability, producing a probability distribution that covers the real action class and other action classes that may occur. Accordingly, we introduce a decoupled intermediate information knowledge distillation loss, which comprises single action knowledge distillation and sequence distribution knowledge distillation for the student model. The former improves the student model's precise inference ability for individual actions by transferring knowledge of a single action target category using a binary classification loss. The latter uses an MSE loss to constrain the student model to learn the action sequence probability distribution from the teacher model, thereby enhancing the student model's global planning capability. Extensive experiments on three datasets demonstrate that our strategy can improve the performance of multiple weakly supervised models, achieving promising procedure knowledge modeling ability and plug-and-play flexibility.
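The two loss terms described above can be sketched as follows. This is a hypothetical form, not the paper's code: the single-action term is written as binary cross-entropy on the target class with the teacher's probability as a soft label (an assumption; the abstract only says "binary classification loss"), and the sequence term is a plain MSE between the full per-step distributions. The function name `decoupled_kd_terms` is illustrative.

```python
import numpy as np


def decoupled_kd_terms(student, teacher, targets, eps=1e-12):
    """Two distillation terms in the spirit of the abstract (hypothetical form).

    student, teacher: (T, C) arrays of per-step action probabilities.
    targets: length-T integer array of ground-truth action classes.
    Returns (single_action_term, sequence_term).
    """
    steps = np.arange(len(targets))
    # Single-action term: binary cross-entropy on the target class, using the
    # teacher's probability for that class as a soft label.
    p = np.clip(student[steps, targets], eps, 1.0 - eps)
    q = teacher[steps, targets]
    single = -np.mean(q * np.log(p) + (1.0 - q) * np.log(1.0 - p))
    # Sequence term: MSE between full student and teacher distributions.
    sequence = np.mean((student - teacher) ** 2)
    return float(single), float(sequence)


teacher = np.array([[0.7, 0.3], [0.2, 0.8]])
targets = np.array([0, 1])
uniform = np.full((2, 2), 0.5)
matched_terms = decoupled_kd_terms(teacher, teacher, targets)
uniform_terms = decoupled_kd_terms(uniform, teacher, targets)
print(matched_terms, uniform_terms)
```

A student that copies the teacher has a zero sequence term and the smallest possible single-action term; an uninformative uniform student is penalized by both.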

AAAI Conference 2025 Conference Paper

ProsodyTalker: 3D Visual Speech Animation via Prosody Decomposition

  • Zonglin Li
  • Xiaoqian Lv
  • Qinglin Liu
  • Quanling Meng
  • Xin Sun
  • Shengping Zhang

Most existing 3D visual speech animation methods synthesize lip movements synchronized with speech but neglect head poses, which degrades animation realism. Animating head poses presents two primary challenges: (1) the intricate mapping between speech and head poses remains poorly understood, and (2) 4D face datasets featuring realistic head poses are lacking. Inspired by prosody decomposition in speech processing, we observe that head movements correlate with the fundamental frequency (F0) of speech prosody, while lip movements align with the language content. These observations motivate us to propose a novel framework, dubbed ProsodyTalker, that concurrently synthesizes lip and head movements, grounded in the principles of prosody decomposition. The core idea is first to adopt information perturbation to explicitly decompose the speech prosody into pose-related F0 and lip-related language content. Then, an autoregressive content-oriented fusion decoder is employed to enhance lip synchronization in the synthesized facial sequences. To synthesize head poses, we design a transformer-based variational autoencoder to learn a latent distribution of facial sequences and propose an F0-conditioned latent diffusion model to establish a probabilistic mapping from F0 to pose-related latent codes. Furthermore, we contribute a large-scale 4D face dataset containing wide variations in identities, head poses and facial motions. Extensive experiments show that our method achieves more realistic animation than state-of-the-art methods.

JBHI Journal 2025 Journal Article

Video Object Segmentation with Optimal Frame Auto-selection Based on Prior Knowledge for Midbrain Assessment in Transcranial Ultrasound

  • Xinyi Wang
  • Sai Kit LAM
  • Hongyu KANG
  • Yu Sun
  • Chao HOU
  • Shuai Li
  • Xin Sun
  • Fangxian LI

Transcranial sonography (TCS) provides a non-invasive means of assessing movement disorders such as Parkinson's disease (PD). However, current TCS-based evaluations rely heavily on manual operation by experienced physicians, making the process time-consuming and physician-dependent. For the first time, we aimed to develop a hybrid pipeline for real-time video object segmentation (VOS) and automatic optimal frame selection. Eighty-three standardized real-time TCS recordings comprising 1,992 midbrain frames were collected from Beijing Tiantan Hospital. We adopted three state-of-the-art VOS models (STCN, RDE-VOS, and XMEM) and incorporated anatomical priors to guide optimal frame selection. Specifically, we leveraged the anatomical trend of midbrain morphology to estimate the midbrain radius at the optimal frame and selected the frame where the VOS-segmented midbrain best matched this estimate. The XMEM-based pipeline achieved high segmentation performance (Jaccard: 0.85, Boundary Accuracy: 0.95, Dice: 0.92) and optimal frame selection performance (Distance: 4.87; Jaccard: 0.92), with high efficiency (51.05 FPS, 0.56 s/patient, 661.55 MB). Subgroup analyses confirmed robustness across image quality levels and PD conditions. Assessment of a junior physician's selections suggests potential to reduce the expertise gap in optimal frame selection. The proposed hybrid pipeline offers an automated tool for midbrain assessment using TCS, which may help reduce physicians' workload and minimize subjectivity, particularly by supporting junior physicians in the expertise-demanding task of TCS assessment. This approach may serve as a foundation for more promising TCS-based assessments in the future, contributing to broader adoption of non-invasive ultrasound techniques in PD evaluation.
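The selection rule described above (pick the frame whose segmented midbrain best matches an estimated radius) can be sketched as follows. Assumptions: the radius of a mask is taken as its equivalent-area radius, masks are plain lists of 0/1 rows, and the target radius is simply given; how the paper estimates it from the anatomical trend is not reproduced. All function names are hypothetical.

```python
import math


def equivalent_radius(mask):
    """Radius (in pixels) of the circle whose area equals the mask's area."""
    area = sum(sum(row) for row in mask)
    return math.sqrt(area / math.pi)


def select_frame(masks, target_radius):
    """Index of the frame whose segmented radius is closest to target_radius."""
    return min(range(len(masks)),
               key=lambda i: abs(equivalent_radius(masks[i]) - target_radius))


def dice_from_jaccard(j):
    """For a single pair of masks, Dice = 2J / (1 + J)."""
    return 2.0 * j / (1.0 + j)


# Three toy square "midbrain" masks with areas 4, 9 and 16 pixels.
masks = [[[1] * s for _ in range(s)] for s in (2, 3, 4)]
print(select_frame(masks, target_radius=1.7))   # 1 (radius ~1.69)
print(round(dice_from_jaccard(0.85), 2))        # 0.92
```

The Dice/Jaccard identity holds per image; because reported scores are averages over images, it links them only approximately, although Jaccard 0.85 and Dice 0.92 agree with it here.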

NeurIPS Conference 2024 Conference Paper

LRM-Zero: Training Large Reconstruction Models with Synthesized Data

  • Desai Xie
  • Sai Bi
  • Zhixin Shu
  • Kai Zhang
  • Zexiang Xu
  • Yi Zhou
  • Sören Pirk
  • Arie Kaufman

We present LRM-Zero, a Large Reconstruction Model (LRM) trained entirely on synthesized 3D data, achieving high-quality sparse-view 3D reconstruction. The core of LRM-Zero is our procedural 3D dataset, Zeroverse, which is automatically synthesized from simple primitive shapes with random texturing and augmentations (e.g., height fields, boolean differences, and wireframes). Unlike previous 3D datasets (e.g., Objaverse), which are often captured or crafted by humans to approximate real 3D data, Zeroverse completely ignores realistic global semantics but is rich in complex geometric and texture details that are locally similar to or even more intricate than real objects. We demonstrate that our LRM-Zero, trained with our fully synthesized Zeroverse, can achieve high visual quality in the reconstruction of real-world objects, competitive with models trained on Objaverse. We also analyze several critical design choices of Zeroverse that contribute to LRM-Zero's capability and training stability. Our work demonstrates that 3D reconstruction, one of the core tasks in 3D vision, can potentially be addressed without the semantics of real-world objects. The Zeroverse's procedural synthesis code and interactive visualization are available at: https://desaixie.github.io/lrm-zero/.

NeurIPS Conference 2024 Conference Paper

Pin-Tuning: Parameter-Efficient In-Context Tuning for Few-Shot Molecular Property Prediction

  • Qiang Liu
  • Shaozhen Liu
  • Xin Sun
  • Shu Wu
  • Liang Wang

Molecular property prediction (MPP) is integral to drug discovery and material science, but often faces the challenge of data scarcity in real-world scenarios. Addressing this, few-shot molecular property prediction (FSMPP) has been developed. Unlike other few-shot tasks, FSMPP typically employs a pre-trained molecular encoder and a context-aware classifier, benefiting from molecular pre-training and molecular context information. Despite these advancements, existing methods struggle to fine-tune pre-trained encoders effectively. We attribute this issue to the imbalance between the abundance of tunable parameters and the scarcity of labeled molecules, and to the lack of contextual perceptiveness in the encoders. To overcome this hurdle, we propose a parameter-efficient in-context tuning method, named Pin-Tuning. Specifically, we propose a lightweight adapter for pre-trained message passing layers (MP-Adapter) and Bayesian weight consolidation for pre-trained atom/bond embedding layers (Emb-BWC), to achieve parameter-efficient tuning while preventing over-fitting and catastrophic forgetting. Additionally, we enhance the MP-Adapters with contextual perceptiveness. This innovation allows for in-context tuning of the pre-trained encoder, thereby improving its adaptability for specific FSMPP tasks. When evaluated on public datasets, our method demonstrates superior tuning with fewer trainable parameters, improving few-shot predictive performance.
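Weight consolidation of the kind named above is commonly implemented as a quadratic penalty that keeps parameters near their pre-trained values, weighted by per-parameter importance (as in elastic weight consolidation, where it comes from a Gaussian approximation to the posterior). The sketch assumes that form; whether Emb-BWC uses exactly this penalty, and how it estimates the precision, are assumptions, and `consolidation_penalty` is a hypothetical name.

```python
import numpy as np


def consolidation_penalty(theta, theta_pre, precision, lam=1.0):
    """Quadratic penalty 0.5 * lam * sum_i precision_i * (theta_i - theta_pre_i)^2.

    Up to a constant, this is the negative log of a Gaussian prior centred on
    the pre-trained weights; `precision` is a per-parameter importance,
    e.g. a diagonal Fisher-information estimate.
    """
    theta, theta_pre, precision = map(np.asarray, (theta, theta_pre, precision))
    return 0.5 * lam * float(np.sum(precision * (theta - theta_pre) ** 2))


# Moving an important weight (precision 4) costs more than an unimportant one.
print(consolidation_penalty([1.0, 0.0], [0.0, 0.0], [4.0, 0.1]))  # 2.0
print(consolidation_penalty([0.0, 1.0], [0.0, 0.0], [4.0, 0.1]))
```

Added to the few-shot task loss, such a term lets low-importance embedding weights adapt while anchoring the ones pre-training found important, which is how it guards against catastrophic forgetting.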

NeurIPS Conference 2023 Conference Paper

GSLB: The Graph Structure Learning Benchmark

  • Zhixun Li
  • Xin Sun
  • Yifan Luo
  • Yanqiao Zhu
  • Dingshuo Chen
  • Yingtao Luo
  • Xiangxin Zhou
  • Qiang Liu

Graph Structure Learning (GSL) has recently garnered considerable attention due to its ability to optimize both the parameters of Graph Neural Networks (GNNs) and the computation graph structure simultaneously. Despite the proliferation of GSL methods developed in recent years, there is no standard experimental setting or fair comparison for performance evaluation, which creates a great obstacle to understanding the progress in this field. To fill this gap, we systematically analyze the performance of GSL in different scenarios and develop a comprehensive Graph Structure Learning Benchmark (GSLB) curated from 20 diverse graph datasets and 16 distinct GSL algorithms. Specifically, GSLB systematically investigates the characteristics of GSL in terms of three dimensions: effectiveness, robustness, and complexity. We comprehensively evaluate state-of-the-art GSL algorithms in node- and graph-level tasks, and analyze their performance in robust learning and model complexity. Further, to facilitate reproducible research, we have developed an easy-to-use library for training, evaluating, and visualizing different GSL methods. Empirical results of our extensive experiments demonstrate the ability of GSL and reveal its potential benefits on various downstream tasks, offering insights and opportunities for future research. The code of GSLB is available at: https://github.com/GSL-Benchmark/GSLB.

IJCAI Conference 2022 Conference Paper

A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-Lingual Language Model

  • Xin Sun
  • Tao Ge
  • Shuming Ma
  • Jingjing Li
  • Furu Wei
  • Houfeng Wang

Synthetic data construction for Grammatical Error Correction (GEC) in non-English languages relies heavily on human-designed, language-specific rules, which produce limited error-correction patterns. In this paper, we propose a generic and language-independent strategy for multilingual GEC, which can train a GEC system effectively for a new non-English language with only two easy-to-access resources: 1) a pre-trained cross-lingual language model (PXLM) and 2) parallel translation data between English and the language. Our approach creates diverse parallel GEC data without any language-specific operations by taking the non-autoregressive translation generated by PXLM and the gold translation as error-corrected sentence pairs. Then, we reuse PXLM to initialize the GEC model and pre-train it with the synthetic data generated by itself, which yields further improvement. We evaluate our approach on three public GEC benchmarks in different languages. It achieves state-of-the-art results on the NLPCC 2018 Task 2 dataset (Chinese) and obtains competitive performance on Falko-Merlin (German) and RULEC-GEC (Russian). Further analysis demonstrates that our data construction method is complementary to rule-based approaches.

EAAI Journal 2022 Journal Article

Asian stock markets closing index forecast based on secondary decomposition, multi-factor analysis and attention-based LSTM model

  • Jujie Wang
  • Quan Cui
  • Xin Sun
  • Maolin He

The analysis and prediction of Asian stock markets is an important issue that can help promote the integration and globalization of financial cooperation. However, owing to the non-stationarity and complexity of stock market fluctuations, it is challenging to predict stock prices accurately. In particular, after the original series is decomposed, handling pseudo information and filtering the exogenous variables often remain challenging. This paper presents a hybrid model based on secondary decomposition (SD), multi-factor analysis (MFA) and attention-based long short-term memory (ALSTM) to predict the stock market price trends of four major Asian countries. The original stock price series is preprocessed by two decomposition algorithms to capture further non-linear features and better filter the noise. Multi-factor analysis is introduced to supplement the original data. In the prediction stage, an attention layer is added to the long short-term memory model to increase the weights of effective information. Finally, four Asian stock market datasets and nine comparison models were used to verify the performance of the proposed model. The empirical results show that, compared with a plain long short-term memory model, our proposed model achieves at least 30% higher accuracy. Its mean absolute percentage errors were also the lowest among all models considered in this paper (0.612%, 0.903%, 0.606% and 0.402%, respectively), which demonstrates the effectiveness of the hybrid model.
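Two pieces of the pipeline above are easy to make concrete: attention pooling over LSTM hidden states (softmax the per-step scores, then take the weighted sum) and the mean absolute percentage error (MAPE) used to report results. This is a generic sketch; the function names are hypothetical, the attention scores are supplied directly rather than learned, and the decomposition and multi-factor stages are not reproduced.

```python
import math


def attention_pool(hidden, scores):
    """Softmax the per-time-step scores and return (weighted sum, weights)."""
    m = max(scores)                                # stabilise the exponentials
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return pooled, weights


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, math.log(3.0)])
print(weights)                                     # approximately [0.25, 0.75]
print(mape([100.0, 200.0], [99.0, 202.0]))         # approximately 1.0
```

A higher score on a time step raises its weight, which is the sense in which the attention layer "increases the weights of effective information".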

JBHI Journal 2022 Journal Article

Dynamic Sepsis Prediction for Intensive Care Unit Patients Using XGBoost-Based Model With Novel Time-Dependent Features

  • Shuhui Liu
  • Bo Fu
  • Wen Wang
  • Mei Liu
  • Xin Sun

Sepsis is a systemic inflammatory response caused by pathogens such as bacteria. Because its pathogenesis is unclear, clinical manifestations vary greatly among patients, and its alarming incidence and mortality pose a great threat to patients and medical systems, especially in the ICU (Intensive Care Unit). Traditional diagnostic criteria suffer from low specificity. Artificial intelligence models could greatly improve the accuracy of sepsis prediction and judgment. Based on the XGBoost machine learning framework, taking demographic, vital sign, laboratory test and medical intervention data as input, this paper proposes a novel model for dynamically predicting sepsis and assessing risk. To realize the model, two feature construction methods are introduced. For the observed time-series data of vital signs and laboratory tests, the time-dependent method constructs time-dependent features after statistical screening. For the clinical intervention data, the statistical counting method is applied to construct count-dependent features. Moreover, a new objective function is proposed for the XGBoost framework, and its first-order and second-order gradients are derived for model training. Compared with current state-of-the-art methods, the proposed model performs best, improving AUROC by 5.4% on the MIMIC-III dataset and by 2.1% on the PhysioNet Challenge 2019 dataset. The data processing and training methods of this model can be conveniently applied to different electronic health record systems and have broad application prospects.
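XGBoost fits each tree from per-sample first- and second-order derivatives of the objective with respect to the raw margin, which is why a new objective must come with both. The paper's objective is not given in the abstract, so the sketch below uses a class-weighted logistic loss as a stand-in; `weighted_logistic_grad_hess` and `pos_weight` are hypothetical names.

```python
import numpy as np


def weighted_logistic_grad_hess(margins, labels, pos_weight=2.0):
    """First- and second-order derivatives of a class-weighted logistic loss.

    Per-sample loss: -[w*y*log(p) + (1-y)*log(1-p)] with p = sigmoid(margin)
    and w = pos_weight (> 1 up-weights the rarer positive class).
    Derivatives are taken with respect to the raw margin, as XGBoost expects.
    """
    margins = np.asarray(margins, dtype=float)
    labels = np.asarray(labels, dtype=float)
    p = 1.0 / (1.0 + np.exp(-margins))
    grad = (1.0 - labels) * p - pos_weight * labels * (1.0 - p)
    hess = p * (1.0 - p) * (pos_weight * labels + (1.0 - labels))
    return grad, hess


m = np.array([-1.0, 0.0, 2.0])
y = np.array([1.0, 0.0, 1.0])
g, h = weighted_logistic_grad_hess(m, y, pos_weight=3.0)
print(g, h)
```

With `pos_weight=1.0` this reduces to the standard logistic gradient `p - y` and Hessian `p * (1 - p)`. To train with it, one would wrap it as a function of `(preds, dtrain)` that passes `dtrain.get_label()` as the labels, and hand that function to `xgboost.train` through its `obj` argument.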

IJCAI Conference 2022 Conference Paper

Relational Triple Extraction: One Step is Enough

  • Yu-Ming Shang
  • Heyan Huang
  • Xin Sun
  • Wei Wei
  • Xian-Ling Mao

Extracting relational triples from unstructured text is an essential task in natural language processing and knowledge graph construction. Existing approaches usually contain two fundamental steps: (1) finding the boundary positions of head and tail entities; (2) concatenating specific tokens to form triples. However, nearly all previous methods suffer from error accumulation, i.e., the boundary recognition error of each entity in step (1) accumulates into the final combined triples. To solve this problem, we introduce a fresh perspective on the triple extraction task and propose a simple but effective model, named DirectRel. Specifically, the proposed model first generates candidate entities by enumerating token sequences in a sentence, and then transforms the triple extraction task into a linking problem on a "head → tail" bipartite graph. By doing so, all triples can be extracted directly in only one step. Extensive experimental results on two widely used datasets demonstrate that the proposed model performs better than the state-of-the-art baselines.
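The one-step idea above (enumerate candidate spans, then decide every head → tail link per relation) can be sketched as follows. The scorer is a stand-in for the learned bipartite-graph linking model, and `enumerate_spans`, `link_triples`, `max_len` and `threshold` are hypothetical names chosen for illustration.

```python
from itertools import product


def enumerate_spans(tokens, max_len=3):
    """All contiguous token spans up to max_len tokens, as (start, end_exclusive)."""
    n = len(tokens)
    return [(i, j) for i in range(n) for j in range(i + 1, min(i + max_len, n) + 1)]


def link_triples(spans, relations, score, threshold=0.5):
    """Keep every (head, relation, tail) whose link score passes the threshold.

    score(head, rel, tail) stands in for a learned bipartite-graph scorer.
    """
    return [(h, r, t)
            for h, t in product(spans, spans) if h != t
            for r in relations if score(h, r, t) >= threshold]


tokens = ["Paris", "is", "in", "France"]
spans = enumerate_spans(tokens, max_len=1)
gold = ((0, 1), "located_in", (3, 4))


def toy_score(h, r, t):
    """Hypothetical scorer that only fires on the gold link."""
    return 1.0 if (h, r, t) == gold else 0.0


print(link_triples(spans, ["located_in"], toy_score))
```

Because each triple is a single head → tail decision over already-enumerated spans, there is no separate boundary-then-combine step for errors to accumulate across.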

TCS Journal 2021 Journal Article

Deterministic approximation algorithm for submodular maximization subject to a matroid constraint

  • Xin Sun
  • Dachuan Xu
  • Longkun Guo
  • Min Li

In this paper, we study the generalized submodular maximization problem with a non-negative monotone submodular set function as the objective function, subject to a matroid constraint. The problem is generalized through the curvature parameter $\alpha \in [0, 1]$, which measures how far a set function deviates from linearity toward submodularity. We propose a deterministic approximation algorithm that uses the approximation algorithm proposed by Buchbinder et al. [2] as a building block and inherits its approximation guarantee for $\alpha = 1$. For general values of the curvature parameter $\alpha \in [0, 1]$, we present an approximation algorithm with a factor of $\frac{1 + h_\alpha(y) + \Delta \cdot [3 + \alpha - (2 + \alpha) y - (1 + \alpha) h_\alpha(y)]}{2 + \alpha + (1 + \alpha)(1 - y)}$, where $y \in [0, 1]$ is a predefined parameter for tuning the ratio. In particular, when $\alpha = 1$ we obtain a ratio of 0.5008 by setting $y = 0.9$, matching the renowned state-of-the-art approximation ratio; when $\alpha = 0$, i.e., the objective is a linear function, the approximation factor equals one and our algorithm is an exact algorithm that always produces optimal solutions.
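
The paper's deterministic algorithm builds on Buchbinder et al.'s and is not reproduced here. For orientation only, the sketch below shows the textbook greedy algorithm for monotone submodular maximization under a matroid constraint (known to be a 1/2-approximation for general matroids), with the matroid given as an independence oracle. The coverage function and partition matroid are hypothetical example data.

```python
def greedy_matroid(ground, f, is_independent):
    """Classic greedy for a monotone submodular f under a matroid constraint.

    NOTE: this is the standard baseline greedy, not the paper's algorithm.
    It repeatedly adds the feasible element with the largest marginal gain
    f(S + e) - f(S) until no element can be added with positive gain.
    """
    chosen = []
    while True:
        best, best_gain = None, 0.0
        for e in ground:
            if e in chosen or not is_independent(chosen + [e]):
                continue
            gain = f(chosen + [e]) - f(chosen)
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:
            return chosen
        chosen.append(best)


# Hypothetical instance: a coverage objective over a partition matroid.
SETS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}, "d": {1}}
GROUP = {"a": 0, "d": 0, "b": 1, "c": 1}   # element -> partition block
CAPACITY = {0: 1, 1: 1}                    # at most one element per block


def coverage(S):
    # Number of distinct items covered by the chosen sets (monotone submodular).
    return len(set().union(*(SETS[e] for e in S)))


def partition_independent(S):
    # Independence oracle: no block exceeds its capacity.
    counts = {}
    for e in S:
        counts[GROUP[e]] = counts.get(GROUP[e], 0) + 1
    return all(counts[g] <= CAPACITY[g] for g in counts)
```

On this instance, `greedy_matroid(["a", "b", "c", "d"], coverage, partition_independent)` picks `"a"` and then `"c"`, covering five items.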

EAAI Journal 2020 Journal Article

A scale-adaptive positive selection algorithm based on B-cell immune mechanisms for anomaly detection

  • Hongli Zhang
  • Zhongyuan Ren
  • Shaojie Xin
  • Shulin Liu
  • Chao Lan
  • Xin Sun

In current anomaly detection immune algorithms, the methods for setting the detection radius of detectors do not take into account the concentration characteristic of self samples, which weakens their effectiveness in application. To address this deficiency, we propose a new type of detector named the scale-adaptive B-cells (SAB-cells) detector, and a novel algorithm named the scale-adaptive positive selection algorithm (SA-PSA). The algorithm is mainly based on the B-cell immune mechanisms of clonal variation and network suppression. In SA-PSA, the detection radius of SAB-cells is adaptively adjusted by clonal variation, and the number of redundant SAB-cells is effectively reduced by fusion variation, ultimately yielding efficient detectors. On the Iris data set, we first analyzed the effects of three main control parameters on SA-PSA; second, we compared SA-PSA with other mainstream anomaly detection immune algorithms on three performance indicators; third, we performed a receiver operating characteristic (ROC) curve analysis that verified the effectiveness of SA-PSA. Finally, we applied SA-PSA to bearing anomaly detection, further verifying its effectiveness in more complicated engineering applications.
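
The abstract does not give the clonal-variation or fusion-variation rules, so the sketch below only illustrates the underlying idea of positive selection with radii that adapt to how concentrated the self samples are. As an assumption (not the paper's mechanism), each self sample acts as a detector whose radius is its distance to its k-th nearest self neighbour: dense regions get small radii, sparse regions get larger ones. A query point is flagged as anomalous when no detector covers it.

```python
import math


def knn_radius(points, k):
    """Radius per self sample = distance to its k-th nearest other self sample.

    ASSUMPTION: a simple density proxy standing in for SA-PSA's clonal variation.
    """
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[min(k, len(dists)) - 1])
    return radii


def is_anomaly(x, detectors, radii):
    """Positive selection: x is normal if it falls inside any self detector."""
    return all(math.dist(x, c) > r for c, r in zip(detectors, radii))
```

With the four corners of the unit square as self samples and `k=1`, every radius is 1.0, so the centre `(0.5, 0.5)` is treated as normal and `(5.0, 5.0)` as anomalous.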

AAAI Conference 2020 Conference Paper

Are Noisy Sentences Useless for Distant Supervised Relation Extraction?

  • Yuming Shang
  • He-Yan Huang
  • Xian-Ling Mao
  • Xin Sun
  • Wei Wei

The noisy labeling problem has been one of the major obstacles for distantly supervised relation extraction. Existing approaches usually assume that noisy sentences are useless and will harm the model’s performance. Therefore, they mainly alleviate this problem by reducing the influence of noisy sentences, such as applying bag-level selective attention or removing noisy sentences from sentence-bags. However, the underlying cause of the noisy labeling problem is not a lack of useful information but missing relation labels. Intuitively, if we can assign credible labels to noisy sentences, they will be transformed into useful training data and benefit the model’s performance. Thus, in this paper, we propose a novel method for distantly supervised relation extraction, which employs unsupervised deep clustering to generate reliable labels for noisy sentences. Specifically, our model contains three modules: a sentence encoder, a noise detector, and a label generator. The sentence encoder obtains feature representations, the noise detector detects noisy sentences in sentence-bags, and the label generator produces high-confidence relation labels for noisy sentences. Extensive experiments demonstrate that our model outperforms state-of-the-art baselines on a popular benchmark dataset and can indeed alleviate the noisy labeling problem.
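
The label generator's core idea, giving a noisy sentence a new label only when the assignment is confident, can be illustrated with a deliberately simpler rule than the paper's unsupervised deep clustering. In this hedged sketch, centroids are computed from embeddings of sentences whose labels are trusted, each noisy embedding is soft-assigned with a softmax over negative Euclidean distances, and a label is kept only if its probability reaches `threshold`. The embeddings, labels, and threshold are all hypothetical.

```python
import math


def class_centroids(embeddings, labels):
    """Mean embedding per relation label (from sentences assumed to be clean)."""
    sums, counts = {}, {}
    for x, y in zip(embeddings, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def relabel(noisy, centroids, threshold=0.9):
    """Soft-assign noisy embeddings to centroids; keep only confident labels.

    ASSUMPTION: nearest-centroid softmax is a stand-in for deep clustering.
    Returns a label per sentence, or None when no label is confident enough.
    """
    names = sorted(centroids)
    out = []
    for x in noisy:
        logits = [-math.dist(x, centroids[n]) for n in names]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(names)), key=probs.__getitem__)
        out.append(names[best] if probs[best] >= threshold else None)
    return out
```

A point much closer to one centroid receives that label, while a point equidistant from two centroids (probability 0.5 each) is left unlabeled.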

AAAI Conference 2020 Conference Paper

Learning Deep Relations to Promote Saliency Detection

  • Changrui Chen
  • Xin Sun
  • Yang Hua
  • Junyu Dong
  • Hongwei Xv

Although saliency detectors have made stunning progress recently, the performance of state-of-the-art saliency detectors is still unacceptable in some confusing areas, e.g., object boundaries. We argue that feature spatial independence is one of the root causes. This paper explores the ubiquitous relations among deep features to efficiently improve existing saliency detectors. To break this independence, we establish these relations by using deep neural networks to maximize the mutual information between deep features of the same category. We introduce a threshold-constrained training pair construction strategy to ensure that the relations between different image parts can be accurately estimated in a self-supervised way. The relations can then be used to further uncover salient areas and suppress confusing backgrounds. Experiments demonstrate that our method significantly boosts the performance of state-of-the-art saliency detectors on various benchmark datasets. Moreover, our model is label-free and extremely efficient, with an inference speed of 140 FPS on a single GTX1080 GPU.
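
The abstract does not spell out how the thresholds are applied when building training pairs. One plausible reading, stated here purely as an assumption, is that a prior saliency map from an existing detector is split into confident foreground (score at or above `high`) and confident background (score at or below `low`), ambiguous pixels are discarded, same-category pairs become positives, and cross-category pairs become negatives. The thresholds are hypothetical, and the mutual-information objective trained on these pairs is not shown.

```python
from itertools import combinations


def build_pairs(scores, high=0.8, low=0.2):
    """Threshold-constrained training pairs from a prior saliency map.

    ASSUMPTION: an illustrative interpretation, not the paper's exact rule.
    Pixels with score >= high are treated as salient and <= low as background;
    ambiguous pixels in between are excluded from training.
    """
    fg = [i for i, s in enumerate(scores) if s >= high]
    bg = [i for i, s in enumerate(scores) if s <= low]
    positives = list(combinations(fg, 2)) + list(combinations(bg, 2))  # same category
    negatives = [(i, j) for i in fg for j in bg]                        # different category
    return positives, negatives
```

For scores `[0.9, 0.95, 0.1, 0.5]`, pixels 0 and 1 form the only positive pair, pixel 3 is ignored as ambiguous, and `(0, 2)` and `(1, 2)` become negatives.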

AAAI Conference 2019 Conference Paper

Network Structure and Transfer Behaviors Embedding via Deep Prediction Model

  • Xin Sun
  • Zenghui Song
  • Junyu Dong
  • Yongbo Yu
  • Claudia Plant
  • Christian Böhm

Network-structured data is becoming increasingly popular in many applications. However, such data present great challenges to feature engineering due to their high non-linearity and sparsity. How to transform the link-connected nodes of a huge network into feature representations is a critical issue. As basic properties of real-world networks, the local and global structure can be reflected by dynamic transfer behaviors from node to node. In this work, we propose a deep embedding framework to preserve the transfer probabilities among network nodes. We first suggest a degree-weight biased random walk model to capture the transfer behaviors of the network. Then a deep embedding framework is introduced to preserve the transfer probabilities among the nodes: a network structure embedding layer is added to the conventional Long Short-Term Memory (LSTM) network to exploit its sequence prediction ability. To preserve local network neighborhoods, we further perform a Laplacian-supervised space optimization on the embedding feature representations. Experimental studies are conducted on various real-world datasets, including social networks and citation networks. The results show that the learned representations can be effectively used as features in a variety of tasks, such as clustering, visualization, and classification, and achieve promising performance compared with state-of-the-art models.
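
The abstract names a degree-weight biased random walk but does not define the bias. As an assumption, the sketch below makes the probability of stepping from a node to a neighbour proportional to that neighbour's degree, so walks drift toward hubs. The resulting node sequences are the kind of input a sequence model such as the paper's LSTM-based embedding could be trained on; that model is not shown. The example graph is hypothetical.

```python
import random


def transition_probs(graph, u):
    """Step probabilities from u, proportional to each neighbour's degree.

    ASSUMPTION: degree-proportional weighting is an illustrative choice.
    """
    weights = {v: len(graph[v]) for v in graph[u]}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


def biased_walk(graph, start, length, rng):
    """Generate one degree-biased walk of up to `length` nodes."""
    walk = [start]
    while len(walk) < length:
        probs = transition_probs(graph, walk[-1])
        if not probs:  # isolated node: stop early
            break
        nodes = list(probs)
        walk.append(rng.choices(nodes, weights=[probs[v] for v in nodes])[0])
    return walk
```

In the graph `{0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}`, node 0 moves to node 2 (degree 2) with probability 2/3 and to node 1 (degree 1) with probability 1/3. Passing a seeded `random.Random` makes walks reproducible.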

TIST Journal 2012 Journal Article

Robust Visual Tracking Using an Effective Appearance Model Based on Sparse Coding

  • Shengping Zhang
  • Hongxun Yao
  • Xin Sun
  • Shaohui Liu

Intelligent video surveillance is currently one of the most active research topics in computer vision, especially given the explosion of video data captured by large numbers of surveillance cameras. As a key step of an intelligent surveillance system, robust visual tracking is very challenging for computer vision, yet it is a basic functionality of the human visual system (HVS). Psychophysical findings have shown that the receptive fields of simple cells in the visual cortex can be characterized as spatially localized, oriented, and bandpass, and that they form a sparse, distributed representation of natural images. Motivated by these findings, in this article we propose an effective appearance model based on sparse coding and apply it to visual tracking. Specifically, we take as features the responses of general basis functions extracted by independent component analysis on a large set of natural image patches, and model the appearance of the tracked target as the probability distribution of these features. To make the tracker more robust to partial occlusion, camouflage environments, pose changes, and illumination changes, we further select features related to the target based on an entropy-gain criterion and ignore the rest. The target is finally represented by the probability distribution of those related features. The target search is performed by minimizing the Matusita distance between the distributions of the target model and a candidate using Newton-style iterations. Experimental results validate that the proposed method is more robust and effective than three state-of-the-art methods.
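
The matching criterion named in the abstract, the Matusita distance between two discrete distributions, is $\sqrt{\sum_i (\sqrt{p_i} - \sqrt{q_i})^2}$. The sketch below computes it for normalized feature histograms (the histograms themselves are hypothetical) and shows its relation to the Bhattacharyya coefficient, $d^2 = 2 - 2\,\mathrm{BC}(p, q)$. The Newton-style candidate search is not shown.

```python
import math


def normalize(hist):
    """Turn non-negative histogram counts into a probability distribution."""
    total = sum(hist)
    return [h / total for h in hist]


def matusita(p, q):
    """Matusita distance: sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def bhattacharyya(p, q):
    """Bhattacharyya coefficient: sum_i sqrt(p_i * q_i)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))
```

Identical distributions are at distance 0, and distributions with disjoint support reach the maximum of $\sqrt{2}$. Because minimizing the Matusita distance is equivalent to maximizing the Bhattacharyya coefficient, a tracker can optimize either quantity.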