Author name cluster

Ken Gu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers

1 author row

NeurIPS Conference 2025 Conference Paper

RADAR: Benchmarking Language Models on Imperfect Tabular Data

Ken Gu
Zhihan Zhang
Kate Lin
Yuwei Zhang
Akshay Paruchuri
Hong Yu
Mehran Kazemi
Kumar Ayush

Language models (LMs) are increasingly being deployed to perform autonomous data analyses. However, their data awareness (the ability to recognize, reason over, and appropriately handle data artifacts such as missing values, outliers, and logical inconsistencies) remains underexplored. These artifacts are especially common in real-world tabular data and, if mishandled, can significantly compromise the validity of analytical conclusions. To address this gap, we present RADAR, a benchmark for systematically evaluating data-aware reasoning on tabular data. We develop a framework that simulates data artifacts through programmatic perturbations, enabling targeted evaluation of model behavior. RADAR comprises 2,980 table-query pairs, grounded in real-world data spanning 9 domains and 5 data artifact types. In addition to evaluating artifact handling, RADAR systematically varies table size to study how well reasoning performance holds up as tables grow larger. Our evaluation reveals that, despite decent performance on tables without data artifacts, frontier models degrade significantly when data artifacts are introduced, exposing critical gaps in their capacity for robust, data-aware analysis. Designed to be flexible and extensible, RADAR supports diverse perturbation types and controllable table sizes, offering a valuable resource for advancing tabular reasoning.

AAAI Conference 2020 Conference Paper

Learning-Based Efficient Graph Similarity Computation via Multi-Scale Convolutional Set Matching

Yunsheng Bai
Hao Ding
Ken Gu
Yizhou Sun
Wei Wang

Graph similarity computation is one of the core operations in many graph-based applications, such as graph similarity search, graph database analysis, and graph clustering. Since computing the exact distance or similarity between two graphs is typically NP-hard, a series of approximate methods have been proposed with a trade-off between accuracy and speed. Recently, several data-driven approaches based on neural networks have been proposed, most of which model graph-graph similarity as the inner product of the graphs' graph-level representations, with different techniques proposed for generating one embedding per graph. However, using one fixed-dimensional embedding per graph may fail to fully capture graphs of varying sizes and link structures, a limitation that is especially problematic for graph similarity computation, where the goal is to find the fine-grained differences between two graphs. In this paper, we address graph similarity computation from another perspective, by directly matching two sets of node embeddings without needing fixed-dimensional vectors to represent whole graphs for their similarity computation. The model, GraphSim, achieves state-of-the-art performance on four real-world graph datasets under six out of eight settings (here we count a specific dataset and metric combination as one setting), compared to existing popular methods for approximate Graph Edit Distance (GED) and Maximum Common Subgraph (MCS) computation.

IJCAI Conference 2019 Conference Paper

Unsupervised Inductive Graph-Level Representation Learning via Graph-Graph Proximity

Yunsheng Bai
Hao Ding
Yang Qiao
Agustin Marinovic
Ken Gu
Ting Chen
Yizhou Sun
Wei Wang

We introduce a novel approach to graph-level representation learning, which is to embed an entire graph into a vector space where the embeddings of two graphs preserve their graph-graph proximity. Our approach, UGraphEmb, is a general framework that provides a novel means of performing graph-level embedding in a completely unsupervised and inductive manner. The learned neural network can be considered a function that receives any graph as input, whether seen or unseen in the training set, and transforms it into an embedding. A novel graph-level embedding generation mechanism, called Multi-Scale Node Attention (MSNA), is proposed. Experiments on five real graph datasets show that UGraphEmb achieves competitive accuracy in the tasks of graph classification, similarity ranking, and graph visualization.