Author name cluster

Xu Shen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers

1 author row

AAAI Conference 2026 Conference Paper

Bridging the Language Gap: Uncovering and Aligning Shared Circuits for Multi-Hop Reasoning in Multilingual LLMs

Chenghao Sun
Zhen Huang
Yonggang Zhang
Xinmei Tian
Xu Shen
Jieping Ye

Large language models (LLMs) present a paradox: they can correctly answer a multi-hop factual query in a high-resource language like English, yet fail on the identical query in another language. This raises a fundamental question about the nature of multilingual knowledge: are facts missing, or merely inaccessible? The underlying mechanisms for this knowledge gap have remained largely unexplored. In this work, we resolve this question by introducing a mechanistic interpretability framework that traces the causal pathways of multi-hop knowledge reasoning. Our analysis reveals a core, non-obvious finding: cross-lingual inconsistencies do not stem from a knowledge deficit. Instead, factual knowledge is robustly stored in a set of **shared, language-agnostic semantic neurons**. The failure originates from **misaligned attention pathways**, where a common set of critical attention heads fails to correctly route information along the reasoning chain to the appropriate knowledge neurons in lower-resource languages. This mechanistic diagnosis motivates a targeted alignment strategy: a surgical fine-tuning of only these critical heads. Experiments demonstrate that our method achieves significant improvements in multilingual multi-hop factuality—with positive cross-lingual transfer—while uniquely preserving general model capabilities, offering a scalable and mechanistically-grounded approach to building more reliable multilingual models.

PDF Details DOI

AAAI Conference 2026 Conference Paper

HyperD: Hybrid Periodicity Decoupling Framework for Traffic Forecasting

Minlan Shao
Zijian Zhang
Yili Wang
Yiwei Dai
Xu Shen
Xin Wang

Accurate traffic forecasting plays a vital role in intelligent transportation systems, enabling applications such as congestion control, route planning, and urban mobility optimization. However, traffic forecasting remains challenging due to two key factors: (1) complex spatial dependencies arising from dynamic interactions between road segments and traffic sensors across the network, and (2) the coexistence of multi-scale periodic patterns (e.g., daily and weekly periodic patterns driven by human routines) with irregular fluctuations caused by unpredictable events (e.g., accidents, weather, or construction). To tackle these challenges, we propose HyperD (Hybrid Periodic Decoupling), a novel framework that decouples traffic data into periodic and residual components. The periodic component is handled by the Hybrid Periodic Representation Module, which extracts fine-grained daily and weekly patterns using learnable periodic embeddings and spatial-temporal attention. The residual component, which captures non-periodic, high-frequency fluctuations, is modeled by the Frequency-Aware Residual Representation Module, leveraging complex-valued MLP in frequency domain. To enforce semantic separation between the two components, we further introduce a Dual-View Alignment Loss, which aligns low-frequency information with the periodic branch and high-frequency information with the residual branch. Extensive experiments on four real-world traffic datasets demonstrate that HyperD achieves state-of-the-art prediction accuracy, while offering superior robustness under disturbances and improved computational efficiency compared to existing methods.

PDF Details DOI

IJCAI Conference 2025 Conference Paper

Latte: Transfering LLMs' Latent-level Knowledge for Few-shot Tabular Learning

Ruxue Shi
Hengrui Gu
Hangting Ye
Yiwei Dai
Xu Shen
Xin Wang

Few-shot tabular learning, in which machine learning models are trained with a limited amount of labeled data, provides a cost-effective approach to addressing real-world challenges. The advent of Large Language Models (LLMs) has sparked interest in leveraging their pre-trained knowledge for few-shot tabular learning. Despite promising results, existing approaches either rely on test-time knowledge extraction, which introduces undesirable latency, or text-level knowledge, which leads to unreliable feature engineering. To overcome these limitations, we propose Latte, a training-time knowledge extraction framework that transfers the latent prior knowledge within LLMs to optimize a more generalized downstream model. Latte enables general knowledge-guided downstream tabular learning, facilitating the weighted fusion of information across different feature values while reducing the risk of overfitting to limited labeled data. Furthermore, Latte is compatible with existing unsupervised pre-training paradigms and effectively utilizes available unlabeled samples to overcome the performance limitations imposed by an extremely small labeled dataset. Extensive experiments on various few-shot tabular learning benchmarks demonstrate the superior performance of Latte, establishing it as a state-of-the-art approach in this domain. Our code is available at https: //github. com/ruxueshi/Latte. git.

PDF Details DOI

IJCAI Conference 2025 Conference Paper

Mamba-Based Graph Convolutional Networks: Tackling Over-smoothing with Selective State Space

Xin He
Yili Wang
Wenqi Fan
Xu Shen
Xin Juan
Rui Miao
Xin Wang

Graph Neural Networks (GNNs) have shown great success in various graph-based learning tasks. However, it often faces the issue of over-smoothing as the model depth increases, which causes all node representations to converge to a single value and become indistinguishable. This issue stems from the inherent limitations of GNNs, which struggle to distinguish the importance of information from different neighborhoods. In this paper, we introduce MbaGCN, a novel graph convolutional architecture that draws inspiration from the Mamba paradigm—originally designed for sequence modeling. MbaGCN presents a new backbone for GNNs, consisting of three key components: the Message Aggregation Layer, the Selective State Space Transition Layer, and the Node State Prediction Layer. These components work in tandem to adaptively aggregate neighborhood information, providing greater flexibility and scalability for deep GNN models. While MbaGCN may not consistently outperform all existing methods on each dataset, it provides a foundational framework that demonstrates the effective integration of the Mamba paradigm into graph representation learning. Through extensive experiments on benchmark datasets, we demonstrate that MbaGCN paves the way for future advancements in graph neural network research. Our code is in https: //github. com/hexin5515/MbaGCN.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control

Yuxin Xiao
Chaoqun Wan
Yonggang Zhang
Wenxiao Wang
Binbin Lin
Xiaofei He
Xu Shen
Jieping Ye

As the development and application of Large Language Models (LLMs) continue to advance rapidly, enhancing their trustworthiness and aligning them with human preferences has become a critical area of research. Traditional methods rely heavily on extensive data for Reinforcement Learning from Human Feedback (RLHF), but representation engineering offers a new, training-free approach. This technique leverages semantic features to control the representation of LLM's intermediate hidden states, enabling the model to meet specific requirements such as increased honesty or heightened safety awareness. However, a significant challenge arises when attempting to fulfill multiple requirements simultaneously. It proves difficult to encode various semantic contents, like honesty and safety, into a singular semantic feature, restricting its practicality. In this work, we address this challenge through Sparse Activation Control. By delving into the intrinsic mechanisms of LLMs, we manage to identify and pinpoint modules that are closely related to specific tasks within the model, i. e. attention heads. These heads display sparse characteristics that allow for near-independent control over different tasks. Our experiments, conducted on the open-source Llama series models, have yielded encouraging results. The models were able to align with human preferences on issues of safety, factualness, and bias concurrently.

PDF Details DOI

NeurIPS Conference 2023 Conference Paper

CS-Isolate: Extracting Hard Confident Examples by Content and Style Isolation

Yexiong Lin
Yu Yao
Xiaolong Shi
Mingming Gong
Xu Shen
Dong Xu
Tongliang Liu

Label noise widely exists in large-scale image datasets. To mitigate the side effects of label noise, state-of-the-art methods focus on selecting confident examples by leveraging semi-supervised learning. Existing research shows that the ability to extract hard confident examples, which are close to the decision boundary, significantly influences the generalization ability of the learned classifier. In this paper, we find that a key reason for some hard examples being close to the decision boundary is due to the entanglement of style factors with content factors. The hard examples become more discriminative when we focus solely on content factors, such as semantic information, while ignoring style factors. Nonetheless, given only noisy data, content factors are not directly observed and have to be inferred. To tackle the problem of inferring content factors for classification when learning with noisy labels, our objective is to ensure that the content factors of all examples in the same underlying clean class remain unchanged as their style information changes. To achieve this, we utilize different data augmentation techniques to alter the styles while regularizing content factors based on some confident examples. By training existing methods with our inferred content factors, CS-Isolate proves their effectiveness in learning hard examples on benchmark datasets. The implementation is available at https: //github. com/tmllab/2023 NeurIPS CS-isolate.

PDF Details

AAAI Conference 2018 Conference Paper

Sequence-to-Sequence Learning via Shared Latent Representation

Xu Shen
Xinmei Tian
Jun Xing
Yong Rui
Dacheng Tao

Sequence-to-sequence learning is a popular research area in deep learning, such as video captioning and speech recognition. Existing methods model this learning as a mapping process by ﬁrst encoding the input sequence to a ﬁxed-sized vector, followed by decoding the target sequence from the vector. Although simple and intuitive, such mapping model is task-speciﬁc, unable to be directly used for different tasks. In this paper, we propose a star-like framework for general and ﬂexible sequence-to-sequence learning, where different types of media contents (the peripheral nodes) could be encoded to and decoded from a shared latent representation (SLR) (the central node). This is inspired by the fact that human brain could learn and express an abstract concept in different ways. The media-invariant property of SLR could be seen as a high-level regularization on the intermediate vector, enforcing it to not only capture the latent representation intra each individual media like the auto-encoders, but also their transitions like the mapping models. Moreover, the SLR model is content-speciﬁc, which means it only needs to be trained once for a dataset, while used for different tasks. We show how to train a SLR model via dropout and use it for different sequence-to-sequence tasks. Our SLR model is validated on the Youtube2Text and MSR-VTT datasets, achieving superior performance on video-to-sentence task, and the ﬁrst sentence-to-video results.

PDF Details

IJCAI Conference 2017 Conference Paper

Classification and Representation Joint Learning via Deep Networks

Ya Li
Xinmei Tian
Xu Shen
Dacheng Tao

Deep learning has been proven to be effective for classification problems. However, the majority of previous works trained classifiers by considering only class label information and ignoring the local information from the spatial distribution of training samples. In this paper, we propose a deep learning framework that considers both class label information and local spatial distribution information between training samples. A two-channel network with shared weights is used to measure the local distribution. The classification performance can be improved with more detailed information provided by the local distribution, particularly when the training samples are insufficient. Additionally, the class label information can help to learn better feature representations compared with other feature learning methods that use only local distribution information between samples. The local distribution constraint between sample pairs can also be viewed as a regularization of the network, which can efficiently prevent the overfitting problem. Extensive experiments are conducted on several benchmark image classification datasets, and the results demonstrate the effectiveness of our proposed method.

PDF Details

AAAI Conference 2017 Conference Paper

Patch Reordering: A NovelWay to Achieve Rotation and Translation Invariance in Convolutional Neural Networks

Xu Shen
Xinmei Tian
Shaoyan Sun
Dacheng Tao

Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance on many visual recognition tasks. However, the combination of convolution and pooling operations only shows invariance to small local location changes in meaningful objects in input. Sometimes, such networks are trained using data augmentation to encode this invariance into the parameters, which restricts the capacity of the model to learn the content of these objects. A more ef- ﬁcient use of the parameter budget is to encode rotation or translation invariance into the model architecture, which relieves the model from the need to learn them. To enable the model to focus on learning the content of objects other than their locations, we propose to conduct patch ranking of the feature maps before feeding them into the next layer. When patch ranking is combined with convolution and pooling operations, we obtain consistent representations despite the location of meaningful objects in input. We show that the patch ranking module improves the performance of the CNN on many benchmark tasks, including MNIST digit recognition, large-scale image recognition, and image retrieval.

PDF Details