Arrow Research · Search

Author name cluster

Xi Sun

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers · 2 author rows

Possible papers (4)

AAAI 2026 · Conference Paper

HC2-GNN: Hierarchical Graph Representation Learning for Efficient Text Classification

  • Jiejie Fan
  • Xiaojuan Ban
  • Zhiyan Zhang
  • Xi Sun

Graph Neural Networks (GNNs) offer superior modeling capabilities for text classification by capturing complex spatial features within semantic representations. However, existing graph-based approaches often suffer from computational inefficiency and a limited ability to model both fine-grained local structures and the sequential nature of text. To address these challenges, we propose HC2-GNN, a Hierarchical Clustering and Coarsening Graph Neural Network, which introduces a novel lightweight graph clustering algorithm called Compromise Conductance Graph Clustering (C2GC). C2GC enables efficient graph clustering while preserving both the textual order and the topological coherence of subgraphs. Furthermore, it incorporates a virtual-cluster mechanism that expands each subgraph with semantically relevant neighbors, explicitly enabling cross-cluster information propagation without compromising local structural integrity. HC2-GNN aggregates local and global features by combining subgraph-level and full-graph representations, enhancing semantic discriminability for classification. Extensive experiments on benchmark datasets demonstrate that HC2-GNN consistently outperforms existing state-of-the-art text classification methods.
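
The abstract does not spell out C2GC itself, but the conductance measure it trades off is standard. Below is a minimal sketch in plain Python of greedy, conductance-driven cluster growth on a toy word graph; the greedy rule, function names, and toy graph are illustrative assumptions, not the paper's algorithm.

```python
# A minimal sketch, assuming an undirected graph stored as an adjacency
# dict. This is NOT the paper's C2GC method; it only illustrates the
# conductance measure that such clustering objectives balance against
# cluster size and textual order.
from collections import defaultdict

def conductance(adj, cluster):
    """cut(S, S-bar) / min(vol(S), vol(S-bar)) for node set `cluster`."""
    cut = sum(1 for u in cluster for v in adj[u] if v not in cluster)
    vol_s = sum(len(adj[u]) for u in cluster)
    vol_rest = sum(len(adj[u]) for u in adj) - vol_s
    denom = min(vol_s, vol_rest)
    return cut / denom if denom else 1.0

def grow_cluster(adj, seed, max_size=8):
    """Greedily add the frontier node that lowers conductance the most."""
    cluster = {seed}
    while len(cluster) < max_size:
        frontier = {v for u in cluster for v in adj[u]} - cluster
        if not frontier:
            break
        best = min(frontier, key=lambda v: conductance(adj, cluster | {v}))
        if conductance(adj, cluster | {best}) >= conductance(adj, cluster):
            break  # no candidate improves the cut quality; stop growing
        cluster.add(best)
    return cluster

# Toy word graph: two triangles joined by a single bridge edge.
adj = defaultdict(set)
for u, v in [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]:
    adj[u].add(v); adj[v].add(u)
print(grow_cluster(adj, seed=0))  # recovers the first triangle {0, 1, 2}
```

Growth stops exactly at the bridge edge, since crossing it would raise the cut while shrinking the smaller side's volume; a lightweight clusterer in the HC2-GNN spirit would additionally weigh textual order, which this sketch omits.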

AAAI 2026 · Conference Paper

ST-TPP: Learning Semi-Transductive Temporal Point Processes with Gromov-Wasserstein Barycentric Regularization

  • Qingmei Wang
  • Tianyu Huang
  • Yujie Long
  • Yuxin Wu
  • Fanmeng Wang
  • Xi Sun
  • Junchi Yan
  • Hongteng Xu

The generative mechanisms behind real-world event sequences are often heterogeneous, leading to data with inherent clustering structure. However, most existing temporal point processes (TPPs) treat different event sequences independently, without leveraging the clustering structure when predicting events. In this study, we design and learn a novel semi-transductive temporal point process (ST-TPP), which explicitly improves prediction performance by co-training sequence clusters. In particular, given a set of event sequences, our method learns a neural TPP together with cluster centers of the sequences. In addition to maximizing the likelihood of the event sequences, we leverage a data-based kernel matrix and prior knowledge to regularize the sequence embeddings, leading to a Gromov-Wasserstein barycentric (GWB) regularizer. Based on the optimal transport plans associated with the GWB regularizer, we derive the cluster centers as the push-forward of the sequence embeddings. When a new sequence arrives, the learned model first assigns a cluster center to the sequence and then jointly encodes the sequence and the cluster center to predict future events, yielding a semi-transductive prediction scheme. Experiments demonstrate that ST-TPP achieves competitive sequence clustering results and strong prediction performance.
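
The GWB regularizer itself requires optimal-transport machinery, but the semi-transductive inference step the abstract describes is easy to illustrate. The numpy sketch below covers only that step, under stated assumptions: the cluster centers and sequence embedding are random placeholders, and the joint encoder/decoder that would predict the next event is reduced to a concatenation. None of these names come from the paper's implementation.

```python
# A minimal sketch of the semi-transductive step, assuming cluster
# centers were already learned (here: random placeholders, not the
# GWB-derived centers from the paper).
import numpy as np

rng = np.random.default_rng(0)
d = 16
centers = rng.normal(size=(3, d))      # stand-in for learned cluster centers
seq_embedding = rng.normal(size=d)     # stand-in for a new sequence's embedding

# 1) Assign the new sequence to its nearest cluster center.
dists = np.linalg.norm(centers - seq_embedding, axis=1)
center = centers[np.argmin(dists)]

# 2) Jointly encode sequence and center (here, by concatenation) before
#    decoding the next event's time and type; the decoder is omitted.
joint = np.concatenate([seq_embedding, center])
print(joint.shape)  # (32,)
```

The point of the scheme is that step 2 conditions the prediction on cluster-level information that a purely inductive TPP would ignore.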

IROS 2020 · Conference Paper

When We First Met: Visual-Inertial Person Localization for Co-Robot Rendezvous

  • Xi Sun
  • Xinshuo Weng
  • Kris Kitani

We aim to enable robots to visually localize a target person with the aid of an additional sensing modality: the target person's 3D inertial measurements. The need for such technology may arise when a robot is to meet a person in a crowd for the first time, or when an autonomous vehicle must rendezvous with a rider amongst a crowd without knowing the person's appearance in advance. A person's inertial information can be measured with a wearable device such as a smartphone and can be shared selectively with an autonomous system during the rendezvous. We propose a method to learn a visual-inertial feature space in which the motion of a person in video can be easily matched to the motion measured by a wearable inertial measurement unit (IMU). The transformation of the two modalities into the joint feature space is learned with a triplet loss, which forces inertial motion features and video motion features generated by the same person to lie close together in the joint feature space. To validate our approach, we compose a dataset of over 3,000 video segments of moving people along with wearable IMU data. We show that our method localizes a target person with 80.7% accuracy, averaged over test data with varying numbers of candidates, using only 5 seconds of IMU data and video.
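
The triplet construction described here maps directly onto PyTorch's built-in triplet margin loss. The sketch below is a minimal illustration under assumed shapes: the two linear layers stand in for the paper's video and IMU encoders, and the input dimensions (256-d video motion features, 60-d IMU windows) are placeholders, not values from the paper.

```python
# A minimal sketch of the cross-modal triplet objective: pull video and
# IMU features of the same person together, push features of different
# people apart. Encoders and dimensions are hypothetical stand-ins.
import torch
import torch.nn as nn

video_enc = nn.Sequential(nn.Linear(256, 64))  # placeholder video branch
imu_enc = nn.Sequential(nn.Linear(60, 64))     # placeholder IMU branch
triplet = nn.TripletMarginLoss(margin=0.5)

video_feat = video_enc(torch.randn(8, 256))    # anchors: video motion
imu_pos = imu_enc(torch.randn(8, 60))          # same person's IMU motion
imu_neg = imu_enc(torch.randn(8, 60))          # a different person's IMU

loss = triplet(video_feat, imu_pos, imu_neg)
loss.backward()
print(loss.item())
```

At inference time, the shared space reduces localization to a nearest-neighbor search: the robot embeds each candidate's video motion and ranks candidates by distance to the embedding of the shared IMU stream.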