Arrow Research search

Author name cluster

Xiaowen Dong

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
1 author row

Possible papers (12)

TMLR Journal 2026 Journal Article

A Survey on Over-smoothing and Over-squashing: Unified Propagation Perspectives on Graph Neural Networks and Transformers

  • Alvaro Arroyo
  • Federico Barbero
  • Hugh Blayney
  • Michael M. Bronstein
  • Xiaowen Dong
  • Pietro Lio
  • Razvan Pascanu
  • Pierre Vandergheynst

Decoder-Transformers have achieved remarkable success and have laid the groundwork for the development of Large Language Models (LLMs). At the core of these models is the self-attention matrix, which allows different tokens to interact with each other. This process is remarkably similar to the message-passing mechanism used in Graph Neural Networks (GNNs), and, as such, decoder-Transformers suffer from many of the optimization difficulties studied extensively in the GNN literature. In this paper, we present a unified graph perspective that bridges the theoretical understanding of decoder-Transformers and GNNs. We systematically examine how well-known phenomena in GNNs, such as over-smoothing and over-squashing, directly manifest as analogous issues like rank collapse and representational collapse in deep Transformer architectures. By interpreting Transformers' self-attention as a learned adjacency operator, we reveal shared underlying principles governing signal propagation and demonstrate how insights from one field can illuminate challenges and solutions in the other. We analyze the role of architectural components like residual connections, normalization, and causal masking in these issues. We aim to provide a framework for understanding how information flows through deep learning models that perform sequence mixing through an adjacency operator, to highlight areas for cross-pollination of research, and to provide a comprehensive reference for researchers interested in the underpinnings of these architectures.
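
For intuition about the analogy drawn here, a minimal NumPy sketch (not code from the paper) of how both layers reduce to propagating features with an adjacency-like operator, X' = A X W: a GNN uses a fixed normalized adjacency, while self-attention uses a learned, row-stochastic matrix, with the causal mask acting as a directed graph structure.

    import numpy as np

    def gnn_layer(X, A, W):
        """One message-passing step: propagate features with a fixed, normalized adjacency."""
        A = A + np.eye(A.shape[0])                       # add self-loops
        deg = A.sum(axis=1)
        A_hat = A / np.sqrt(np.outer(deg, deg))          # symmetric normalization
        return A_hat @ X @ W

    def attention_layer(X, Wq, Wk, Wv, causal=True):
        """One self-attention step: propagate features with a learned, row-stochastic 'adjacency'."""
        scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(X.shape[1])
        if causal:                                       # causal mask = edges only to past tokens
            scores = np.where(np.tril(np.ones_like(scores)) > 0, scores, -np.inf)
        A_learned = np.exp(scores - scores.max(axis=1, keepdims=True))
        A_learned = A_learned / A_learned.sum(axis=1, keepdims=True)
        return A_learned @ (X @ Wv)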

NeurIPS Conference 2025 Conference Paper

On the Stability of Graph Convolutional Neural Networks: A Probabilistic Perspective

  • Ning Zhang
  • Henry Kenlay
  • Li Zhang
  • Mihai Cucuringu
  • Xiaowen Dong

Graph convolutional neural networks (GCNNs) have emerged as powerful tools for analyzing graph-structured data, achieving remarkable success across diverse applications. However, the theoretical understanding of the stability of these models, i.e., their sensitivity to small changes in the graph structure, remains limited to rather restricted settings, hampering the development and deployment of robust and trustworthy models in practice. To fill this gap, we study how small perturbations in the graph topology affect GCNN outputs and propose a novel formulation for analyzing model stability. Unlike prior studies that focus only on worst-case perturbations, our distribution-aware formulation characterizes output perturbations across a broad range of input data. This way, our framework enables, for the first time, a probabilistic perspective on the interplay between the statistical properties of the node data and perturbations in the graph topology. We conduct extensive experiments to validate our theoretical findings and demonstrate their benefits over existing baselines, in terms of both representation stability and adversarial attacks on downstream tasks. Our results demonstrate the practical significance of the proposed formulation and highlight the importance of incorporating data distribution into stability analysis.
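
As a toy illustration of the quantity at stake (an illustrative Monte Carlo check, not the paper's analytical formulation), one can estimate how far a one-layer GCN output moves, on average over randomly drawn node features, when a few edges are flipped; the function names and the Gaussian feature model below are assumptions.

    import numpy as np

    def gcn_out(X, A, W):
        """Single graph convolution with self-loops and symmetric normalization."""
        A = A + np.eye(A.shape[0])
        d = A.sum(axis=1)
        return (A / np.sqrt(np.outer(d, d))) @ X @ W

    def expected_perturbation(A, W, flips=2, trials=500, seed=0):
        """Average ||f(X, A) - f(X, A')||_F over random features X ~ N(0, I)
        and random perturbations A' that flip a few off-diagonal entries of A."""
        rng = np.random.default_rng(seed)
        n, gaps = A.shape[0], []
        for _ in range(trials):
            X = rng.standard_normal((n, W.shape[0]))
            A2 = A.copy()
            for _ in range(flips):
                i, j = rng.choice(n, size=2, replace=False)
                A2[i, j] = A2[j, i] = 1 - A2[i, j]
            gaps.append(np.linalg.norm(gcn_out(X, A, W) - gcn_out(X, A2, W)))
        return float(np.mean(gaps))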

NeurIPS Conference 2025 Conference Paper

On Vanishing Gradients, Over-Smoothing, and Over-Squashing in GNNs: Bridging Recurrent and Graph Learning

  • Alvaro Arroyo
  • Alessio Gravina
  • Benjamin Gutteridge
  • Federico Barbero
  • Claudio Gallicchio
  • Xiaowen Dong
  • Michael Bronstein
  • Pierre Vandergheynst

Graph Neural Networks (GNNs) are models that leverage the graph structure to transmit information between nodes, typically through the message-passing operation. While widely successful, this approach is well known to suffer from representational collapse as the number of layers increases and from insensitivity to the information contained at distant and poorly connected nodes. In this paper, we present a unified view of the appearance of these issues through the lens of vanishing gradients, using ideas from linear control theory for our analysis. We propose an interpretation of GNNs as recurrent models and empirically demonstrate that a simple state-space formulation of a GNN effectively alleviates these issues at no extra trainable parameter cost. Further, we show theoretically and empirically that (i) traditional GNNs are by design prone to extreme gradient vanishing even after a few layers; (ii) feature collapse is directly related to the mechanism causing vanishing gradients; (iii) long-range modeling is most easily achieved by a combination of graph rewiring and vanishing gradient mitigation. We believe our work will help bridge the gap between the recurrent and graph neural network literature and will unlock the design of new deep and performant GNNs.
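
The recurrent reading of message passing can be sketched as follows (a toy NumPy illustration under simplifying assumptions: weight-tied layers viewed as a recurrence, plus a linear state-space-style variant that re-injects the input at every step; this is not the paper's exact model).

    import numpy as np

    def normalize(A):
        A = A + np.eye(A.shape[0])
        d = A.sum(axis=1)
        return A / np.sqrt(np.outer(d, d))

    def gnn_as_rnn(X, A, W, layers=16):
        """Weight-tied message passing reads as a recurrence H_{t+1} = tanh(A_hat H_t W);
        deep stacks therefore inherit the vanishing-gradient behaviour of RNNs. W is (F, F)."""
        A_hat, H = normalize(A), X
        for _ in range(layers):
            H = np.tanh(A_hat @ H @ W)
        return H

    def gnn_state_space(X, A, W, B, layers=16):
        """Illustrative linear state-space variant: the input X is re-injected at every
        step, giving gradients a direct path back to the data. W and B are (F, F)."""
        A_hat, H = normalize(A), np.zeros_like(X)
        for _ in range(layers):
            H = A_hat @ H @ W + X @ B
        return H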

NeurIPS Conference 2025 Conference Paper

Return of ChebNet: Understanding and Improving an Overlooked GNN on Long Range Tasks

  • Ali Hariri
  • Alvaro Arroyo
  • Alessio Gravina
  • Moshe Eliasof
  • Carola-Bibiane Schönlieb
  • Davide Bacciu
  • Xiaowen Dong
  • Kamyar Azizzadenesheli

ChebNet, one of the earliest spectral GNNs, has largely been overshadowed by Message Passing Neural Networks (MPNNs), which gained popularity for their simplicity and effectiveness in capturing local graph structure. Despite their success, MPNNs are limited in their ability to capture long-range dependencies between nodes. This has led researchers to adapt MPNNs through rewiring or to make use of Graph Transformers (GTs), which compromise the computational efficiency that characterized early spatial message-passing architectures and typically disregard the graph structure. Almost a decade after its original introduction, we revisit ChebNet to shed light on its ability to model distant node interactions. We find that, out of the box, ChebNet already shows competitive advantages relative to classical MPNNs and GTs on long-range benchmarks, while maintaining good scalability properties for high-order polynomials. However, we uncover that this polynomial expansion leads ChebNet to an unstable regime during training. To address this limitation, we cast ChebNet as a stable and non-dissipative dynamical system, which we coin Stable-ChebNet. Our Stable-ChebNet model allows for stable information propagation and has controllable dynamics which do not require the use of eigendecompositions, positional encodings, or graph rewiring. Across several benchmarks, Stable-ChebNet achieves near state-of-the-art performance.
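
For reference, the Chebyshev filtering that ChebNet builds on can be sketched as below (assuming the symmetric normalized Laplacian crudely rescaled with lambda_max ~ 2; this is the classical layer, not the Stable-ChebNet variant proposed in the paper).

    import numpy as np

    def chebnet_layer(X, A, Ws):
        """y = sum_k T_k(L~) X W_k, with T_0 = I, T_1 = L~ and the recursion
        T_k = 2 L~ T_{k-1} - T_{k-2}. Ws is a list of K+1 weight matrices (K >= 1)."""
        n = A.shape[0]
        d = A.sum(axis=1)
        d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
        L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # normalized Laplacian
        L_tilde = L - np.eye(n)          # crude rescaling, since eigenvalues of L lie in [0, 2]
        T_prev, T_curr = X, L_tilde @ X
        out = T_prev @ Ws[0] + T_curr @ Ws[1]
        for k in range(2, len(Ws)):
            T_prev, T_curr = T_curr, 2 * L_tilde @ T_curr - T_prev
            out = out + T_curr @ Ws[k]
        return out

High-order polynomials (many terms in Ws) are exactly where the abstract reports training instability, which motivates the stable dynamical-system reformulation.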

NeurIPS Conference 2024 Conference Paper

Bayesian Optimization of Functions over Node Subsets in Graphs

  • Huidong Liang
  • Xingchen Wan
  • Xiaowen Dong

We address the problem of optimizing over functions defined on node subsets in a graph. The optimization of such functions is often a non-trivial task given their combinatorial, black-box and expensive-to-evaluate nature. Although various algorithms have been introduced in the literature, most are either task-specific or computationally inefficient and only utilize information about the graph structure without considering the characteristics of the function. To address these limitations, we utilize Bayesian Optimization (BO), a sample-efficient black-box solver, and propose a novel framework for combinatorial optimization on graphs. More specifically, we map each $k$-node subset in the original graph to a node in a new combinatorial graph and adopt a local modeling approach to efficiently traverse the latter graph by progressively sampling its subgraphs using a recursive algorithm. Extensive experiments under both synthetic and real-world setups demonstrate the effectiveness of the proposed BO framework on various types of graphs and optimization tasks, where its behavior is analyzed in detail with ablation studies.
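
One plausible reading of the subset-to-node mapping (an illustrative construction with networkx, not the authors' implementation): every $k$-node subset becomes a vertex of a new combinatorial graph, and two subsets are adjacent when they differ by a single swapped node, giving the neighbourhood structure that a local, sample-efficient search can traverse.

    import itertools
    import networkx as nx

    def subset_graph(G, k):
        """Combinatorial graph whose vertices are the k-node subsets of G; two subsets
        are neighbours when their intersection has size k - 1 (one node swapped)."""
        C = nx.Graph()
        subsets = [frozenset(s) for s in itertools.combinations(G.nodes, k)]
        C.add_nodes_from(subsets)
        for s, t in itertools.combinations(subsets, 2):
            if len(s & t) == k - 1:
                C.add_edge(s, t)
        return C

    # Tiny example only: the number of subsets grows combinatorially, which is why the
    # paper samples subgraphs of this graph recursively instead of materializing it.
    # C = subset_graph(nx.cycle_graph(6), k=2)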

NeurIPS Conference 2024 Conference Paper

Rough Transformers: Lightweight and Continuous Time Series Modelling through Signature Patching

  • Fernando Moreno-Pino
  • Álvaro Arroyo
  • Harrison Waldon
  • Xiaowen Dong
  • Álvaro Cartea

Time-series data in real-world settings typically exhibit long-range dependencies and are observed at non-uniform intervals. In these settings, traditional sequence-based recurrent models struggle. To overcome this, researchers often replace recurrent models with Neural ODE-based architectures to account for irregularly sampled data and use Transformer-based architectures to account for long-range dependencies. Despite the success of these two approaches, both incur very high computational costs for input sequences of even moderate length. To address this challenge, we introduce the Rough Transformer, a variation of the Transformer model that operates on continuous-time representations of input sequences and incurs significantly lower computational costs. In particular, we propose multi-view signature attention, which uses path signatures to augment vanilla attention and to capture both local and global (multi-scale) dependencies in the input data, while remaining robust to changes in the sequence length and sampling frequency and yielding improved spatial processing. We find that, on a variety of time-series-related tasks, Rough Transformers consistently outperform their vanilla attention counterparts while obtaining the representational benefits of Neural ODE-based models, all at a fraction of the computational time and memory resources.
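
A toy version of the per-patch summaries that the signature patching relies on (a depth-2 discrete approximation in NumPy; the function names and patching scheme are illustrative assumptions, not the paper's multi-view attention code).

    import numpy as np

    def depth2_signature(path):
        """Discrete approximation of the depth-2 signature of a (T, d) path:
        level 1 is the total increment, level 2 approximates the iterated integrals."""
        dX = np.diff(path, axis=0)                  # (T-1, d) increments
        level1 = dX.sum(axis=0)                     # (d,)
        prev = np.cumsum(dX, axis=0) - dX           # increments strictly before each step
        level2 = prev.T @ dX                        # (d, d): sum over s < t of dX_s^i dX_t^j
        return np.concatenate([level1, level2.ravel()])

    def signature_patches(series, patch_len):
        """Summarize a long (T, d) series as a short sequence of signature features,
        one vector per non-overlapping patch."""
        T = series.shape[0]
        return np.stack([depth2_signature(series[i:i + patch_len])
                         for i in range(0, T - patch_len + 1, patch_len)])

Because signatures are defined on the underlying continuous path, these per-patch features change little when the patch is re-sampled at a different rate, which is the robustness to sequence length and sampling frequency the abstract refers to.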

NeurIPS Conference 2023 Conference Paper

Bayesian Optimisation of Functions on Graphs

  • Xingchen Wan
  • Pierre Osselin
  • Henry Kenlay
  • Binxin Ru
  • Michael A Osborne
  • Xiaowen Dong

The increasing availability of graph-structured data motivates the task of optimising over functions defined on the node set of graphs. Traditional graph search algorithms can be applied in this case, but they may be sample-inefficient and do not make use of information about the function values; on the other hand, Bayesian optimisation is a class of promising black-box solvers with superior sample efficiency, but it has scarcely been applied to such novel setups. To fill this gap, we propose a novel Bayesian optimisation framework that optimises over functions defined on generic, large-scale and potentially unknown graphs. Through the learning of suitable kernels on graphs, our framework has the advantage of adapting to the behaviour of the target function. The local modelling approach further guarantees the efficiency of our method. Extensive experiments on both synthetic and real-world graphs demonstrate the effectiveness of the proposed optimisation framework.
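
The key ingredient the abstract highlights, a kernel defined directly on the node set, can be sketched with a fixed heat kernel plugged into a Gaussian-process posterior (an assumption for illustration; the paper learns suitable kernels on graphs rather than fixing one).

    import numpy as np
    from scipy.linalg import expm

    def diffusion_kernel(A, beta=1.0):
        """Heat-kernel Gram matrix K = exp(-beta * L) over the nodes of a graph."""
        L = np.diag(A.sum(axis=1)) - A
        return expm(-beta * L)

    def gp_posterior_mean(K, idx_train, y_train, noise=1e-3):
        """GP posterior mean of the function value at every node, given noisy
        observations y_train at the nodes listed in idx_train."""
        K_tt = K[np.ix_(idx_train, idx_train)] + noise * np.eye(len(idx_train))
        return K[:, idx_train] @ np.linalg.solve(K_tt, y_train)

In a BO loop, the posterior mean and variance over unobserved nodes would feed an acquisition function that selects the next node to query.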

NeurIPS Conference 2023 Conference Paper

Neural Latent Geometry Search: Product Manifold Inference via Gromov-Hausdorff-Informed Bayesian Optimization

  • Haitz Sáez de Ocáriz Borde
  • Alvaro Arroyo
  • Ismael Morales
  • Ingmar Posner
  • Xiaowen Dong

Recent research indicates that the performance of machine learning models can be improved by aligning the geometry of the latent space with the underlying data structure. Rather than relying solely on Euclidean space, researchers have proposed using hyperbolic and spherical spaces with constant curvature, or combinations thereof, to better model the latent space and enhance model performance. However, little attention has been given to the problem of automatically identifying the optimal latent geometry for the downstream task. We mathematically define this novel formulation and coin it as neural latent geometry search (NLGS). More specifically, we introduce an initial attempt to search for a latent geometry composed of a product of constant curvature model spaces with a small number of query evaluations, under some simplifying assumptions. To accomplish this, we propose a novel notion of distance between candidate latent geometries based on the Gromov-Hausdorff distance from metric geometry. In order to compute the Gromov-Hausdorff distance, we introduce a mapping function that enables the comparison of different manifolds by embedding them in a common high-dimensional ambient space. We then design a graph search space based on the notion of smoothness between latent geometries and employ the calculated distances as an additional inductive bias. Finally, we use Bayesian optimization to search for the optimal latent geometry in a query-efficient manner. This is a general method which can be applied to search for the optimal latent geometry for a variety of models and downstream tasks. We perform experiments on synthetic and real-world datasets to identify the optimal latent geometry for multiple machine learning problems.
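
A much-simplified stand-in for the manifold-comparison step (raw Hausdorff distance between point clouds embedded in a common ambient space, using SciPy; the paper works with a Gromov-Hausdorff-informed distance, so treat this purely as an illustration of the embed-then-compare idea).

    import numpy as np
    from scipy.spatial.distance import directed_hausdorff

    def hausdorff(X, Y):
        """Symmetric Hausdorff distance between two point clouds sampled from candidate
        latent geometries after mapping them into a common ambient space."""
        return max(directed_hausdorff(X, Y)[0], directed_hausdorff(Y, X)[0])

    # Toy comparison of a flat patch and a unit sphere, both sampled in R^3:
    rng = np.random.default_rng(0)
    plane = np.c_[rng.uniform(-1, 1, (500, 2)), np.zeros(500)]
    theta, phi = rng.uniform(0, np.pi, 500), rng.uniform(0, 2 * np.pi, 500)
    sphere = np.c_[np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi), np.cos(theta)]
    print(hausdorff(plane, sphere))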

NeurIPS Conference 2021 Conference Paper

Adversarial Attacks on Graph Classifiers via Bayesian Optimisation

  • Xingchen Wan
  • Henry Kenlay
  • Robin Ru
  • Arno Blaas
  • Michael A Osborne
  • Xiaowen Dong

Graph neural networks, a popular class of models effective in a wide range of graph-based learning tasks, have been shown to be vulnerable to adversarial attacks. While the majority of the literature focuses on such vulnerability in node-level classification tasks, little effort has been dedicated to analysing adversarial attacks on graph-level classification, an important problem with numerous real-life applications such as biochemistry and social network analysis. The few existing methods often require unrealistic setups, such as access to internal information of the victim models, or an impractically large number of queries. We present a novel Bayesian optimisation-based attack method for graph classification models. Our method is black-box, query-efficient and parsimonious with respect to the perturbation applied. We empirically validate the effectiveness and flexibility of the proposed method on a wide range of graph classification tasks involving varying graph properties, constraints and modes of attack. Finally, we analyse common interpretable patterns behind the adversarial samples produced, which may shed further light on the adversarial robustness of graph classification models.
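
A skeleton of the attack setting (illustrative only: candidate perturbations are drawn at random below, whereas the paper selects them with a Bayesian-optimisation surrogate and acquisition function; predict_margin is a hypothetical black-box handle to the victim classifier).

    import numpy as np

    def blackbox_edge_flip_attack(A, predict_margin, budget=5, queries=100, seed=0):
        """Query-limited black-box attack on a graph classifier: look for at most
        `budget` edge flips of the adjacency A that minimise the victim's margin,
        spending one model query per candidate perturbation."""
        rng = np.random.default_rng(seed)
        n = A.shape[0]
        best_A, best_margin = A, predict_margin(A)
        for _ in range(queries):
            cand = A.copy()
            for _ in range(budget):
                i, j = rng.choice(n, size=2, replace=False)
                cand[i, j] = cand[j, i] = 1 - cand[i, j]
            margin = predict_margin(cand)            # one query to the victim model
            if margin < best_margin:
                best_A, best_margin = cand, margin
        return best_A, best_margin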

NeurIPS Conference 2021 Conference Paper

Beltrami Flow and Neural Diffusion on Graphs

  • Benjamin Chamberlain
  • James Rowbottom
  • Davide Eynard
  • Francesco Di Giovanni
  • Xiaowen Dong
  • Michael Bronstein

We propose a novel class of graph neural networks based on the discretized Beltrami flow, a non-Euclidean diffusion PDE. In our model, node features are supplemented with positional encodings derived from the graph topology and jointly evolved by the Beltrami flow, simultaneously producing continuous feature learning and topology evolution. The resulting model generalizes many popular graph neural networks and achieves state-of-the-art results on several benchmarks.
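
An illustrative explicit-Euler step of attention-weighted diffusion on the joint state Z = [features | positional encodings] (a simplification of the idea, not the exact Beltrami discretization used in the paper).

    import numpy as np

    def diffusion_step(Z, edges, tau=0.1):
        """One explicit-Euler step of dZ/dt = (A(Z) - I) Z. `edges` is an iterable of
        (i, j) index pairs; the edge weights A(Z) are a normalized similarity computed
        on the current state and restricted to the graph's edges."""
        n, d = Z.shape
        A = np.zeros((n, n))
        for i, j in edges:
            A[i, j] = A[j, i] = np.exp(Z[i] @ Z[j] / np.sqrt(d))
        A = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)
        return Z + tau * (A @ Z - Z)

Because the positional part of Z enters the similarity and is updated by the same step, features and encoded topology evolve together, which is the joint evolution the abstract describes.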

NeurIPS Conference 2021 Conference Paper

Learning to Learn Graph Topologies

  • Xingyue Pu
  • Tianyue Cao
  • Xiaoyun Zhang
  • Xiaowen Dong
  • Siheng Chen

Learning a graph topology to reveal the underlying relationship between data entities plays an important role in various machine learning and data analysis tasks. Under the assumption that structured data vary smoothly over a graph, the problem can be formulated as a regularised convex optimisation over a positive semidefinite cone and solved by iterative algorithms. Classic methods require an explicit convex function to reflect generic topological priors, e.g., the $\ell_1$ penalty for enforcing sparsity, which limits the flexibility and expressiveness in learning rich topological structures. We propose to learn a mapping from node data to the graph structure based on the idea of learning to optimise (L2O). Specifically, our model first unrolls an iterative primal-dual splitting algorithm into a neural network. The key structural proximal projection is replaced with a variational autoencoder that refines the estimated graph with enhanced topological properties. The model is trained in an end-to-end fashion with pairs of node data and graph samples. Experiments on both synthetic and real-world data demonstrate that our model is more efficient than classic iterative algorithms in learning a graph with specific topological properties.
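
The learning-to-optimise idea can be sketched as an unrolled proximal-gradient loop in which the hand-crafted proximal step is exactly the piece the paper replaces with a learned (variational autoencoder) module; the fit term and all names below are illustrative assumptions rather than the paper's formulation.

    import numpy as np

    def soft_threshold(W, lam):
        """Classic l1 proximal step: the fixed structural prior that a learned module replaces."""
        return np.sign(W) * np.maximum(np.abs(W) - lam, 0.0)

    def unrolled_graph_learning(X, steps=20, step_size=0.5, lam=0.1, prox=soft_threshold):
        """Toy unrolled solver: fit edge weights W to a similarity of the node data X,
        apply a proximal step encoding the topological prior, and keep W a valid
        (symmetric, non-negative, hollow) weighted adjacency matrix."""
        Z = np.square(X[:, None, :] - X[None, :, :]).sum(axis=-1)   # pairwise squared distances
        S = np.exp(-Z / np.median(Z[Z > 0]))                        # similarity target
        W = np.zeros_like(S)
        for _ in range(steps):
            W = W - step_size * (W - S)        # gradient step on the fit term ||W - S||_F^2 / 2
            W = prox(W, lam)                   # structural prior (sparsity here)
            W = np.maximum((W + W.T) / 2, 0.0) # symmetric, non-negative
            np.fill_diagonal(W, 0.0)           # no self-loops
        return W

Training end to end on pairs of node data and graph samples then amounts to learning the proximal module (and step sizes) rather than hand-picking penalties such as the $\ell_1$ term above.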

TIST Journal 2017 Journal Article

Social Bridges in Urban Purchase Behavior

  • Xiaowen Dong
  • Yoshihiko Suhara
  • Burçin Bozkaya
  • Vivek K. Singh
  • Bruno Lepri
  • Alex ‘Sandy’ Pentland

The understanding and modeling of human purchase behavior in a city environment can have important implications in the study of urban economy and in the design and organization of cities. In this article, we study human purchase behavior at the community level and argue that people who live in different communities but work at close-by locations could act as “social bridges” between the respective communities and that they are correlated with similarity in community purchase behavior. We provide empirical evidence by studying millions of credit card transaction records for tens of thousands of individuals in a city environment during a period of three months. More specifically, we show that the number of social bridges between communities is a much stronger indicator of similarity in their purchase behavior than traditionally considered factors such as income and sociodemographic variables. Our findings also suggest that such an effect varies across different merchant categories, that the presence of female customers in social bridges is a stronger indicator compared to that of their male counterparts, and that there seems to be a geographical constraint for this effect, all of which may have implications in the studies of urban economy and data-driven urban planning.
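
A minimal sketch of the kind of correlation analysis described (hypothetical inputs: a communities-by-categories spending matrix and a symmetric matrix of bridge counts; this is not the authors' pipeline and omits the income and sociodemographic controls discussed in the paper).

    import numpy as np
    from scipy.stats import spearmanr

    def purchase_similarity(spend):
        """Cosine similarity between community spending profiles; `spend` is a
        (communities x merchant categories) matrix of transaction volumes."""
        unit = spend / np.linalg.norm(spend, axis=1, keepdims=True)
        return unit @ unit.T

    def bridge_effect(bridge_counts, spend):
        """Rank correlation between the number of social bridges linking each pair of
        communities and the similarity of their purchase behaviour."""
        sim = purchase_similarity(spend)
        upper = np.triu_indices_from(sim, k=1)
        return spearmanr(bridge_counts[upper], sim[upper])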