Arrow Research search

Author name cluster

Xin Hu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers

EAAI Journal 2026 Journal Article

Attribute reduction for concept cognition over knowledge graphs

  • Xin Hu
  • Denan Huang
  • Jiangli Duan
  • Zhongying Zhao
  • Sulan Zhang

Concept cognition over knowledge graphs offers prior knowledge, enhancing machine understanding and thinking. However, redundant attributes can seriously slow the acquisition of this prior knowledge. Attribute reduction is the process of simplifying a dataset by identifying and removing redundant attributes while maintaining its classification or decision-making capabilities, and attribute reduction for concept cognition over knowledge graphs presents two unique characteristics. It is therefore imperative to put forward a new measurement method, along with an attribute reduction approach, that adapts to these characteristics. First, partition closeness with high discrimination is proposed to avoid over-refinement and reduce crossing; unlike existing measurement methods, it can distinguish attribute sets whose partitions differ only in being coarser or finer. A measurement method and a reduction method are then proposed to achieve attribute reduction in the context of concept cognition over knowledge graphs. A deterministic algorithm and a heuristic algorithm are introduced for generating attribute reductions, and increasing the number of executions can ensure that the heuristic algorithm has both accuracy and speed advantages. The experiments show that attribute reduction can preserve the original characteristics of the data and enhance the efficiency of data analysis.
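As a rough illustration of the general idea of attribute reduction described above, the sketch below finds attribute subsets that induce the same partition of the objects as the full attribute set. It is a classical partition-preserving reduction, not the paper's partition closeness measure, and the toy rows, attribute names, and function names are all hypothetical. It shows an exhaustive (deterministic) search next to a simple greedy heuristic.

```python
from itertools import combinations

def partition(rows, attrs):
    """Group row indices by their values on the given attributes."""
    blocks = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, set()).add(i)
    return frozenset(frozenset(b) for b in blocks.values())

def exhaustive_reduct(rows, attrs):
    """Smallest subset (in search order) inducing the same partition as all attrs."""
    target = partition(rows, attrs)
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            if partition(rows, subset) == target:
                return list(subset)
    return list(attrs)

def greedy_reduct(rows, attrs):
    """Heuristic: drop each attribute in turn if the partition stays unchanged."""
    target = partition(rows, attrs)
    kept = list(attrs)
    for a in list(attrs):
        trial = [x for x in kept if x != a]
        if trial and partition(rows, trial) == target:
            kept = trial
    return kept

# Hypothetical toy data: "continent" and "region" carry the same information.
rows = [
    {"type": "city", "continent": "EU", "region": "EU-W"},
    {"type": "city", "continent": "EU", "region": "EU-W"},
    {"type": "city", "continent": "AS", "region": "AS-E"},
    {"type": "river", "continent": "AS", "region": "AS-E"},
]
attrs = ["type", "continent", "region"]

print(exhaustive_reduct(rows, attrs))
print(greedy_reduct(rows, attrs))
```

The two searches can return different subsets of the same size here, since either of the two redundant attributes can be dropped; the greedy result depends on the order in which attributes are tried.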

AAAI Conference 2026 Conference Paper

TiCAL: Typicality-Based Consistency-Aware Learning for Multimodal Emotion Recognition

  • Wen Yin
  • Siyu Zhan
  • Cencen Liu
  • Xin Hu
  • Guiduo Duan
  • Xiurui Xie
  • Yuan-Fang Li
  • Tao He

Multimodal Emotion Recognition (MER) aims to accurately identify human emotional states by integrating heterogeneous modalities such as visual, auditory, and textual data. Existing approaches predominantly rely on unified emotion labels to supervise model training, often overlooking a critical challenge: inter-modal emotion conflicts, wherein different modalities within the same sample may express divergent emotional tendencies. In this work, we address this overlooked issue by proposing a novel framework, Typicality-based Consistent-aware Multimodal Emotion Recognition (TiCAL), inspired by the stage-wise nature of human emotion perception. TiCAL dynamically assesses the consistency of each training sample by leveraging pseudo unimodal emotion labels alongside a typicality estimation. To further enhance emotion representation, we embed features in a hyperbolic space, enabling the capture of fine-grained distinctions among emotional categories. By incorporating consistency estimates into the learning process, our method improves model performance, particularly on samples exhibiting high modality inconsistency. Extensive experiments on benchmark datasets, e.g., MOSEI and MER2023, validate the effectiveness of TiCAL in mitigating inter-modal emotional conflicts and enhancing overall recognition accuracy, e.g., with an improvement of about 2.6% over the state-of-the-art DMD.

EAAI Journal 2025 Journal Article

Forecasting train travel times of China–Europe Railway Express through a hybrid deep learning model optimized with a bandit-based approach

  • Yongxiang Zhang
  • Liting Gu
  • Jingwei Guo
  • Xu Yan
  • Xin Hu
  • Zhen-Song Chen

With the globalization of economic trade, the China–Europe Railway Express (CRE) has emerged as a crucial means of international freight transportation. However, since the travel process of CRE trains is subject to various factors (e.g., customs clearance efficiency and weather changes), existing models struggle to handle the complex nonlinear characteristics of the travel time data, failing to achieve accurate train travel time predictions. This significantly affects the scheduling and utilization of capacity resources along the CRE routes. To address this issue, this study proposes a novel hybrid deep learning model, i.e., Discrete Wavelet Transform (DWT)-Convolutional Neural Networks (CNN)-Bidirectional Gated Recurrent Unit (BiGRU) (DWT-CNN-BiGRU). Specifically, the DWT technique is first used to preprocess historical train travel time data to reduce noise interference and improve data quality. Then, the CNN module focuses on extracting local spatial features from the data, whereas the BiGRU module emphasizes its long-term temporal dependencies. Furthermore, a bandit-based approach is applied to hyperparameter optimization to further exploit the model's potential. By testing on a real-life CRE dataset, the DWT-CNN-BiGRU model demonstrates superior prediction accuracy, with root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) values of 10.7347 h, 7.5482 h, and 2.2034%, respectively, and it outperforms the other ten popular baseline models. In conclusion, the proposed DWT-CNN-BiGRU model features a lightweight structure and strong robustness, offering reliable technical support to alleviate capacity resource shortages and improve the service quality of CRE.
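For readers unfamiliar with the three error metrics reported above, a minimal sketch of how RMSE, MAE, and MAPE are conventionally computed follows. The travel times in the example are made-up values in hours, not data from the paper, and the function names are illustrative.

```python
import math

def rmse(actual, predicted):
    """Root mean square error, in the same unit as the inputs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error, in the same unit as the inputs."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical travel times in hours (not from the paper).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]

print(f"RMSE = {rmse(actual, predicted):.4f} h")
print(f"MAE  = {mae(actual, predicted):.4f} h")
print(f"MAPE = {mape(actual, predicted):.4f} %")
```

RMSE penalizes large errors more heavily than MAE, while MAPE is scale-free, which is why the three are often reported together.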

ECAI Conference 2025 Conference Paper

Learning to Suppress Backgrounds and Bidirectionally Fuse Modalities for RGB-D Gesture Recognition

  • Xin Hu
  • Yunan Li 0001
  • Yulang Xu
  • Yilin Zhang 0007
  • Zixiang Lu
  • Qiguang Miao

RGB-D video-based gesture recognition is a fundamental task in computer vision, yet it remains challenging due to the small size of hand regions and the presence of redundant background noise. Existing methods often fail to effectively suppress irrelevant background features and inadequately exploit the complementary nature of RGB and depth modalities, leading to suboptimal semantic alignment in feature fusion. To address these issues, we propose a novel end-to-end RGB-D gesture recognition framework that incorporates Spatiotemporal Background Suppression (STBS) and a Bidirectional Modality Fusion Adapter (BMFA). STBS leverages Vision Transformers to construct region-wise tokens and adaptively merges them based on response scores, suppressing irrelevant background content while preserving fine-grained gesture-related spatiotemporal features. Meanwhile, BMFA enables deep, bidirectional interaction between RGB and depth features across encoder layers, enhancing cross-modal semantic consistency. Extensive experiments on three public RGB-D gesture datasets validate the effectiveness of our approach and demonstrate significant improvements in recognition performance. Our code is available at https://github.com/caicai211/SB-BFM

YNIMG Journal 2025 Journal Article

Right inferior frontal cortex and preSMA in response inhibition: An investigation based on PTC model

  • Lili Wu
  • Mengjie Jiang
  • Min Zhao
  • Xin Hu
  • Jing Wang
  • Kaihua Zhang
  • Ke Jia
  • Fuxin Ren

Response inhibition is an essential component of cognitive function. A large body of literature has used neuroimaging data to uncover the neural architecture that regulates inhibitory control in general and movement cancelation. The presupplementary motor area (preSMA) and the right inferior frontal cortex (rIFC) are the key nodes in the inhibitory control network. However, how these two regions contribute to response inhibition remains controversial. Based on the Pause-then-Cancel Model (PTC), this study employed functional magnetic resonance imaging (fMRI) to investigate the functional specificity of two regions in the stopping process. The Go/No-Go task (GNGT) and the Stop Signal Task (SST) were administered to the same group of participants. We used the GNGT to dissociate the pause process and both the GNGT and the SST to investigate the inhibition mechanism. Imaging data revealed that response inhibition produced by both tasks activated the preSMA and rIFC. Furthermore, an across-participants analysis showed that increased activation in the rIFC was associated with a delay in the go response in the GNGT. In contrast, increased activation in the preSMA was associated with good inhibition efficiency via the striatum in both GNGT and SST. These behavioral and imaging findings support the PTC model of the role of rIFC and preSMA, that the former is involved in a pause process to delay motor responses, whereas the preSMA is involved in the stopping of motor responses.

YNIMG Journal 2024 Journal Article

Brain extended and closed forms glutathione levels decrease with age and extended glutathione is associated with visuospatial memory

  • Xin Hu
  • Keyu Pan
  • Min Zhao
  • Jiali Lv
  • Jing Wang
  • Xiaofeng Zhang
  • Yuxi Liu
  • Yulu Song

During aging, the brain is subject to greater oxidative stress (OS), which is thought to play a critical role in cognitive impairment. Glutathione (GSH), as a major antioxidant in the brain, can be used to combat OS. However, how brain GSH levels vary with age and how they are associated with cognitive function are unclear. In this study, we combined point-resolved spectroscopy and edited spectroscopy sequences to investigate extended and closed form GSH levels in the anterior cingulate cortex (ACC), posterior cingulate cortex (PCC), and occipital cortex (OC) of 276 healthy participants (extended form, 166 females, age range 20-70 years) and 15 healthy participants (closed form, 7 females, age range 26-56 years), and examined their relationships with age and cognitive function. The results revealed decreased extended form GSH levels with age in the PCC among the 276 participants. Notably, the time course of extended form GSH level changes in the PCC and ACC differed between males and females. Additionally, positive correlations were observed between extended form GSH levels in the PCC and OC and visuospatial memory. A decreasing trend of closed form GSH levels with age was also observed in the PCC among the 15 participants. Taken together, these findings enhance our understanding of the time course of both closed and extended form GSH in the brain during normal aging and of its associations with sex and memory, which is an essential first step for understanding the neurochemical underpinnings of healthy aging.

NeurIPS Conference 2024 Conference Paper

Learning 3D Equivariant Implicit Function with Patch-Level Pose-Invariant Representation

  • Xin Hu
  • Xiaole Tang
  • Ruixuan Yu
  • Jian Sun

Implicit neural representation has gained popularity in modeling continuous 3D surfaces for 3D representation and reconstruction. In this work, we are motivated by the fact that local 3D patches repeatedly appear on 3D shapes/surfaces if the factor of pose is removed. Based on this observation, we propose the 3D patch-level equivariant implicit function (PEIF) based on the 3D patch-level pose-invariant representation, allowing us to reconstruct 3D surfaces by estimating equivariant displacement vector fields for query points. Specifically, our model is based on the pose-normalized query/patch pairs and enhanced by the proposed intrinsic patch geometry representation, modeling the intrinsic 3D patch geometry feature by learnable multi-head memory banks. Extensive experiments show that our model achieves state-of-the-art performance on multiple surface reconstruction datasets, and also exhibits better generalization to cross-dataset shapes and robustness to arbitrary rotations. Our code will be available at https://github.com/mathXin112/PEIF.git.

ICML Conference 2024 Conference Paper

Residual-Conditioned Optimal Transport: Towards Structure-Preserving Unpaired and Paired Image Restoration

  • Xiaole Tang
  • Xin Hu
  • Xiang Gu 0005
  • Jian Sun 0009

Deep learning-based image restoration methods generally struggle with faithfully preserving the structures of the original image. In this work, we propose a novel Residual-Conditioned Optimal Transport (RCOT) approach, which models image restoration as an optimal transport (OT) problem for both unpaired and paired settings, introducing the transport residual as a unique degradation-specific cue for both the transport cost and the transport map. Specifically, we first formalize a Fourier residual-guided OT objective by incorporating the degradation-specific information of the residual into the transport cost. We further design the transport map as a two-pass RCOT map that comprises a base model and a refinement process, in which the transport residual is computed by the base model in the first pass and then encoded as a degradation-specific embedding to condition the second-pass restoration. By duality, the RCOT problem is transformed into a minimax optimization problem, which can be solved by adversarially training neural networks. Extensive experiments on multiple restoration tasks show that RCOT achieves competitive performance in terms of both distortion measures and perceptual quality, restoring images with more faithful structures as compared with state-of-the-art methods.

IJCAI Conference 2023 Conference Paper

Diagram Visual Grounding: Learning to See with Gestalt-Perceptual Attention

  • Xin Hu
  • Lingling Zhang
  • Jun Liu
  • Xinyu Zhang
  • Wenjun Wu
  • Qianying Wang

Diagram visual grounding aims to capture the correlation between language expressions and local objects in a diagram, and plays an important role in applications such as textbook question answering and cross-modal retrieval. Most diagrams consist of several colors and simple geometries. This results in sparse low-level visual features, which further widens the gap between the low-level visual and high-level semantic features of diagrams. This phenomenon poses challenges for diagram visual grounding. To solve the above issues, we propose a gestalt-perceptual attention model to align the diagram objects and language expressions. For low-level visual features, inspired by gestalt principles that model the human visual system, we build a gestalt-perception graph network to supplement the features learned by the traditional backbone network. For high-level semantic features, we design a multi-modal context attention mechanism to facilitate the interaction between diagrams and language expressions, so as to enhance the semantics of diagrams. Finally, guided by diagram features and linguistic embedding, the target query is gradually decoded to generate the coordinates of the referred object. By conducting comprehensive experiments on diagrams and natural images, we demonstrate that the proposed model achieves superior performance over the competitors. Our code will be released at https://github.com/AIProCode/GPA.

AAAI Conference 2023 Conference Paper

GPTR: Gestalt-Perception Transformer for Diagram Object Detection

  • Xin Hu
  • Lingling Zhang
  • Jun Liu
  • Jinfu Fan
  • Yang You
  • Yaqiang Wu

Diagram object detection is the key basis of practical applications such as textbook question answering. Because a diagram mainly consists of simple lines and color blocks, its visual features are sparser than those of natural images. In addition, diagrams usually express diverse knowledge, so they contain many low-frequency object categories. As a result, traditional data-driven detection models are not suitable for diagrams. In this work, we propose a gestalt-perception transformer model for diagram object detection, which is based on an encoder-decoder architecture. Gestalt perception contains a series of laws to explain human perception: the human visual system tends to perceive patches in an image that are similar, close, or connected without abrupt directional changes as a perceptual whole object. Inspired by these ideas, we build a gestalt-perception graph in the transformer encoder, which is composed of diagram patches as nodes and the relationships between patches as edges. This graph aims to group these patches into objects via the laws of similarity, proximity, and smoothness implied in these edges, so that meaningful objects can be effectively detected. The experimental results demonstrate that the proposed GPTR achieves the best results in the diagram object detection task. Our model also obtains results comparable to those of the competitors in natural image object detection.

JBHI Journal 2023 Journal Article

Personality in Daily Life: Multi-Situational Physiological Signals Reflect Big-Five Personality Traits

  • Xinyu Shui
  • Yiling Chen
  • Xin Hu
  • Fei Wang
  • Dan Zhang

The popularity of wearable physiological recording devices has opened up new possibilities for the assessment of personality traits in everyday life. Compared with traditional questionnaires or laboratory assessments, wearable device-based measurements can collect rich data about individual physiological activities in real-life situations without interfering with normal life, enabling a more comprehensive description of individual differences. The present study aimed to explore the assessment of individuals’ Big-Five personality traits by physiological signals in daily life situations. A commercial bracelet was used to track the heart rate (HR) data from eighty college students (all male) enrolled in a special training program with a strictly-controlled daily schedule for ten consecutive working days. Their HR activities were divided into five daily situations (morning exercise, morning classes, afternoon classes, free time in the evening, and self-study situations) according to their daily schedule. Regression analyses with HR-based features in these five situations averaged across the ten days revealed significant cross-validated quantitative prediction correlations of 0.32 and 0.26 for the dimensions of Openness and Extraversion, with the prediction correlation trending toward significance for Conscientiousness and Neuroticism. Moreover, the multi-situation HR-based results were in general superior to those based on single-situation HR-based features, as well as those based on the multi-situation self-reported emotion ratings. Together, our findings demonstrate the link between personality and daily HR measures using state-of-the-art commercial devices and could shed light on the development of Big-Five personality assessment based on daily multi-situation physiological measures.

YNIMG Journal 2023 Journal Article

The domain-separation language network dynamics in resting state support its flexible functional segregation and integration during language and speech processing

  • Binke Yuan
  • Hui Xie
  • Zhihao Wang
  • Yangwen Xu
  • Hanqing Zhang
  • Jiaxuan Liu
  • Lifeng Chen
  • Chaoqun Li

Modern linguistic theories and network science propose that language and speech processing are organized into hierarchical, segregated large-scale subnetworks, with a core of dorsal (phonological) stream and ventral (semantic) stream. The two streams are asymmetrically recruited in receptive and expressive language or speech tasks, which showed flexible functional segregation and integration. We hypothesized that the functional segregation of the two streams was supported by the underlying network segregation. A dynamic conditional correlation approach was employed to construct framewise time-varying language networks and k-means clustering was employed to investigate the temporal-reoccurring patterns. We found that the framewise language network dynamics in resting state were robustly clustered into four states, which dynamically reconfigured following a domain-separation manner. Spatially, the hub distributions of the first three states highly resembled the neurobiology of speech perception and lexical-phonological processing, speech production, and semantic processing, respectively. The fourth state was characterized by the weakest functional connectivity and was regarded as a baseline state. Temporally, the first three states appeared exclusively in limited time bins (∼15%), and most of the time (> 55%), state 4 was dominant. Machine learning-based dFC-linguistics prediction analyses showed that dFCs of the four states significantly predicted individual linguistic performance. These findings suggest a domain-separation manner of language network dynamics in resting state, which forms a dynamic "meta-network" framework to support flexible functional segregation and integration during language and speech processing.

YNIMG Journal 2022 Journal Article

Similar brains blend emotion in similar ways: Neural representations of individual difference in emotion profiles

  • Xin Hu
  • Fei Wang
  • Dan Zhang

Our daily emotional experience is a complex construct that usually involves multiple emotions blended in a context-dependent manner. However, the co-occurring and context-dependent nature of human emotions was understated in previous studies when addressing the individual difference in emotional experiences. The present study proposed a situated and blended 'profile' perspective to characterize individualized emotional experiences. Eighty participants watched a series of emotional videos with their EEG recorded, and the individual differences in their emotion profiles were measured as the vector distances between their multidimensional emotion ratings for these video stimuli. This measure was found to be a reliable descriptor of individualized emotional experiences and could efficiently predict classical emotional complexity indices. More importantly, inter-subject representational analyses revealed that similar emotion profiles were associated with similar delta-band activities over the prefrontal and temporo-parietal regions and similar theta-band activities over the frontal regions. Furthermore, left- and right-lateralized temporo-parietal representations were observed for positive and negative emotion profiles, respectively. Our findings demonstrate the potential of taking a 'profile' perspective for understanding individual differences in human emotions.
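The profile measure described above, vector distances between participants' multidimensional emotion ratings, can be sketched as a pairwise distance matrix. The abstract does not state which distance was used, so Euclidean distance is an assumption here, and the participant labels and ratings are invented for illustration.

```python
import math

def profile_distance(ratings_a, ratings_b):
    """Euclidean distance between two flattened rating profiles (assumed metric)."""
    return math.dist(ratings_a, ratings_b)

def distance_matrix(profiles):
    """Pairwise distances between all participants' profiles."""
    names = list(profiles)
    return {(p, q): profile_distance(profiles[p], profiles[q])
            for p in names for q in names}

# Hypothetical profiles: ratings on two emotion dimensions for two videos,
# flattened into one vector per participant.
profiles = {
    "P1": [1.0, 0.0, 3.0, 4.0],
    "P2": [1.0, 0.0, 0.0, 0.0],
    "P3": [1.0, 0.0, 3.0, 4.0],
}

matrix = distance_matrix(profiles)
print(matrix[("P1", "P2")])
print(matrix[("P1", "P3")])
```

A matrix of this kind is the behavioral side of an inter-subject representational analysis, where it would be compared against a matching matrix of neural similarity.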