Spanner Evaluation over SLP-Compressed Documents

Markus L. Schmid

Highlights 2021

Conference Abstract · Session 4B: Database theory · Logic in Computer Science · Theoretical Computer Science

Abstract

We consider the problem of evaluating regular spanners over compressed documents, i.e., we wish to solve evaluation tasks directly on the compressed data, without decompression. As compressed forms of the documents we use straight-line programs (SLPs), a lossless compression scheme for textual data that is widely used in different areas of theoretical computer science and particularly well-suited for algorithmics on compressed data. In data complexity, our results are as follows. For a regular spanner M and an SLP S that represents a document D, we can solve the tasks of model checking and of checking non-emptiness in time O(size(S)). Computing the set [[M]](D) of all span-tuples extracted from D can be done in time O(size(S) · |[[M]](D)|), and enumeration of [[M]](D) can be done with linear preprocessing O(size(S)) and a delay of O(depth(S)), where depth(S) is the depth of S’s derivation tree. Note that size(S) can be exponentially smaller than the document’s size |D|, and, due to known balancing results for SLPs, we can always assume that depth(S) = O(log(|D|)), independent of D’s compressibility. Hence, our enumeration algorithm has a delay logarithmic in the size of the non-compressed data and a preprocessing time that is at best (i.e., in the case of highly compressible documents) also logarithmic, but at worst still linear. Therefore, from a big-data perspective, our enumeration algorithm for SLP-compressed documents may nevertheless beat the known linear-preprocessing and constant-delay algorithms for non-compressed documents.
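To make the idea of working directly on an SLP concrete, the following is a minimal sketch, not the algorithm of the abstract. It shows a standard technique for a simpler task: deciding whether the document derived by an SLP is accepted by a DFA, by computing one state-transition table per nonterminal and composing the tables bottom-up. The representation (rules in Chomsky normal form stored in a dict) and all names, including `doubling_slp`, `slp_length` and `dfa_accepts_slp`, are assumptions chosen for illustration.

```python
from typing import Dict, Set, Tuple, Union

# Assumed representation: each nonterminal maps either to a single terminal
# character or to a pair (left, right) of nonterminals.
Rule = Union[str, Tuple[str, str]]


def doubling_slp(char: str, k: int) -> Tuple[Dict[str, Rule], str]:
    """Hypothetical helper: an SLP with k + 1 rules deriving char repeated 2**k times."""
    rules: Dict[str, Rule] = {"A0": char}
    for i in range(1, k + 1):
        rules[f"A{i}"] = (f"A{i - 1}", f"A{i - 1}")
    return rules, f"A{k}"


def slp_length(rules: Dict[str, Rule], start: str) -> int:
    """Length of the derived document, computed without expanding it."""
    memo: Dict[str, int] = {}

    def length(nt: str) -> int:
        if nt not in memo:
            rule = rules[nt]
            memo[nt] = 1 if isinstance(rule, str) else length(rule[0]) + length(rule[1])
        return memo[nt]

    return length(start)


def dfa_accepts_slp(
    rules: Dict[str, Rule],
    start: str,
    n_states: int,
    delta: Dict[Tuple[int, str], int],
    initial: int,
    accepting: Set[int],
) -> bool:
    """Decide DFA acceptance of the derived document in O(size(S) * n_states) time."""
    memo: Dict[str, Tuple[int, ...]] = {}

    def trans(nt: str) -> Tuple[int, ...]:
        # trans(nt)[q] is the state reached from q after reading nt's expansion.
        if nt not in memo:
            rule = rules[nt]
            if isinstance(rule, str):
                memo[nt] = tuple(delta[(q, rule)] for q in range(n_states))
            else:
                left, right = trans(rule[0]), trans(rule[1])
                memo[nt] = tuple(right[left[q]] for q in range(n_states))
        return memo[nt]

    return trans(start)[initial] in accepting


# DFA over {a, b} accepting documents with an even number of 'a':
# state 0 = even so far, state 1 = odd so far.
EVEN_A = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}

big_rules, big_start = doubling_slp("a", 40)   # 41 rules, 2**40 characters
ab_rules: Dict[str, Rule] = {"X": "a", "Y": "b", "Z": ("X", "Y")}  # derives "ab"
```

Here the SLP returned by `doubling_slp("a", 40)` has 41 rules but derives a document of 2**40 characters, which illustrates how size(S) can be exponentially smaller than |D|. Each rule is processed once, so for a fixed automaton the running time is linear in size(S), in the same spirit as the O(size(S)) bounds stated for model checking and non-emptiness.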

Authors

Markus L. Schmid

Context

Venue: Highlights of Logic, Games and Automata