Contextual Tokenization for Graph Inverted Indices

Pritish Chakraborty; Indradyumna Roy; Soumen Chakrabarti; Abir De

NeurIPS 2025

Conference Paper · Main Conference Track · Artificial Intelligence · Machine Learning

Abstract

Retrieving the graphs in a large corpus that contain a subgraph isomorphic to a given query graph is a core operation in many real-world applications. While recent multi-vector graph representations and scores based on set alignment and containment can provide accurate subgraph isomorphism tests, their use in retrieval remains limited by their need to score corpus graphs exhaustively. We introduce CoRGII (COntextual Representation of Graphs for Inverted Indexing), a graph indexing framework in which, starting with a contextual dense graph representation, a differentiable discretization module computes sparse binary codes over a learned latent vocabulary. This text-document-like representation allows us to leverage classic, highly optimized inverted indexes while supporting soft (vector) set containment scores. Improving on this paradigm further, we replace the classical impact score of a "word" on a graph (such as that defined by TF-IDF or BM25) with a data-driven, trainable impact score. Crucially, CoRGII is trained end-to-end using only binary relevance labels, without fine-grained supervision of query-to-document set alignments. Extensive experiments show that CoRGII provides better trade-offs between efficiency and accuracy than several baselines.
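
To make the data flow described above concrete, the following is a minimal sketch of retrieval with an inverted index over discrete graph tokens. It is not the CoRGII implementation. The function names (`tokenize`, `build_index`, `retrieve`), the fixed sign threshold standing in for the learned differentiable discretizer, the toy embeddings, and the uniform default impact weight are all assumptions made for illustration.

```python
from collections import defaultdict


def tokenize(node_embeddings, threshold=0.0):
    """Map each node embedding to a binary code, used as a discrete token.

    Assumption: a fixed threshold replaces the learned discretization
    module; bit j is 1 when coordinate j exceeds `threshold`.
    """
    return {tuple(int(x > threshold) for x in row) for row in node_embeddings}


def build_index(corpus):
    """Build posting lists: token -> set of ids of graphs containing it."""
    index = defaultdict(set)
    for graph_id, embeddings in corpus.items():
        for token in tokenize(embeddings):
            index[token].add(graph_id)
    return index


def retrieve(index, query_embeddings, impact=None):
    """Rank corpus graphs by the summed impact of the query tokens they share.

    `impact` is a hypothetical token -> weight mapping; when it is absent,
    every token counts 1.0. Only graphs sharing at least one token with the
    query are touched, so the corpus is never scored exhaustively.
    """
    scores = defaultdict(float)
    for token in tokenize(query_embeddings):
        weight = 1.0 if impact is None else impact.get(token, 1.0)
        for graph_id in index.get(token, ()):
            scores[graph_id] += weight
    # Highest score first; ties broken by graph id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy corpus: each graph is a list of (made-up) per-node dense embeddings.
corpus = {
    "g1": [[0.9, -0.2, 0.4], [-0.5, 0.7, 0.1]],
    "g2": [[-0.3, -0.8, 0.6]],
}
index = build_index(corpus)
query = [[0.5, -0.1, 0.2]]
ranked = retrieve(index, query)
```

In this toy run the single query node shares its code with one node of `g1` and with none of `g2`, so only `g1` appears in `ranked`. Passing a dictionary as `impact` shows where a trainable, per-token impact score could replace a hand-crafted weight such as TF-IDF or BM25.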
