Highlights 2021
Spanner Evaluation over SLP-Compressed Documents
Abstract
We consider the problem of evaluating regular spanners over compressed documents, i.e., we wish to solve evaluation tasks directly on the compressed data, without decompression. As the compressed form of the documents we use straight-line programs (SLPs), a lossless compression scheme for textual data that is widely used in different areas of theoretical computer science and is particularly well suited for algorithmics on compressed data. In data complexity, our results are as follows. For a regular spanner M and an SLP S that represents a document D, we can solve the tasks of model checking and of checking non-emptiness in time O(size(S)). Computing the set [[M]](D) of all span-tuples extracted from D can be done in time O(size(S) · |[[M]](D)|), and enumeration of [[M]](D) can be done with linear preprocessing O(size(S)) and a delay of O(depth(S)), where depth(S) is the depth of S's derivation tree. Note that size(S) can be exponentially smaller than the document's size |D|, and, due to known balancing results for SLPs, we can always assume that depth(S) = O(log(|D|)), independent of D's compressibility. Hence, our enumeration algorithm has a delay that is logarithmic in the size of the non-compressed data, and a preprocessing time that is at best (i.e., in the case of highly compressible documents) also logarithmic in |D|, but at worst still linear. Therefore, from a big-data perspective, our enumeration algorithm for SLP-compressed documents may nevertheless beat the known algorithms for non-compressed documents, which have linear preprocessing and constant delay.
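To make the quantities size(S) and depth(S) concrete, the following is a minimal sketch of an SLP in Chomsky normal form, together with the standard bottom-up technique for deciding membership in a regular language without decompressing. It is not the spanner algorithm of the paper: it only handles a plain DFA instead of a spanner, and all names (`bottom_up`, `doc_length`, `depth`, `dfa_accepts`, `power_slp`, `DELTA`) as well as the convention that a terminal rule has depth 0 are assumptions made for illustration.

```python
from typing import Dict, Tuple, Union

# Hypothetical encoding: a rule is either a terminal 'a' or a pair (B, C)
# standing for the rule A -> B C.
Rule = Union[str, Tuple[str, str]]
SLP = Dict[str, Rule]


def bottom_up(rules: SLP, start: str, leaf, combine):
    """Evaluate each nonterminal reachable from `start` exactly once."""
    memo = {}

    def visit(a):
        if a not in memo:
            rhs = rules[a]
            if isinstance(rhs, str):
                memo[a] = leaf(rhs)
            else:
                memo[a] = combine(visit(rhs[0]), visit(rhs[1]))
        return memo[a]

    return visit(start)


def doc_length(rules: SLP, start: str) -> int:
    """|D|, computed without building D."""
    return bottom_up(rules, start, lambda c: 1, lambda l, r: l + r)


def depth(rules: SLP, start: str) -> int:
    """Depth of the derivation tree (assumed convention: terminal rules have depth 0)."""
    return bottom_up(rules, start, lambda c: 0, lambda l, r: 1 + max(l, r))


def dfa_accepts(rules, start, n_states, delta, q0, accepting) -> bool:
    """Decide membership of D in the DFA's language with one pass over the rules.

    For every nonterminal we store a tuple t with t[q] = state reached from q
    after reading the string that the nonterminal derives.
    """
    leaf = lambda c: tuple(delta[(q, c)] for q in range(n_states))
    # Reading "B C" means: first follow B's map, then C's map.
    combine = lambda left, right: tuple(right[left[q]] for q in range(n_states))
    reach = bottom_up(rules, start, leaf, combine)
    return reach[q0] in accepting


def power_slp(k: int):
    """SLP with k + 3 rules deriving (ab)^(2^k), a document of length 2^(k+1)."""
    rules: SLP = {"A": "a", "B": "b", "X0": ("A", "B")}
    for i in range(1, k + 1):
        rules[f"X{i}"] = (f"X{i-1}", f"X{i-1}")
    return rules, f"X{k}"


# DFA for the language (ab)*: state 0 = start/accepting, 1 = after 'a', 2 = dead.
DELTA = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 2, (2, "b"): 2,
}

if __name__ == "__main__":
    rules, start = power_slp(20)
    print(len(rules), doc_length(rules, start), depth(rules, start))
    print(dfa_accepts(rules, start, 3, DELTA, 0, {0}))
```

In this sketch, 23 rules describe a document of more than two million characters, and the membership test touches each rule once, so for a fixed DFA its running time is linear in the number of rules rather than in |D|. This mirrors, in a much simpler setting, the O(size(S)) bound stated above for model checking and non-emptiness.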
Context
- Venue: Highlights of Logic, Games and Automata
- Archive span: 2013-2025
- Indexed papers: 1236
- Paper id: 1076070505950524798