Author name cluster

Jakub Radoszewski

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers

2 author rows

TCS Journal 2026 Journal Article

Internal quasiperiod queries

Maxime Crochemore
Costas S. Iliopoulos
Jakub Radoszewski
Wojciech Rytter
Juliusz Straszyński
Tomasz Waleń
Wiktor Zuba

Internal pattern matching requires one to answer queries about factors of a given string. Many results are known on answering internal period queries, asking for the periods of a given factor. In this paper we investigate internal queries asking for covers (also known as quasiperiods) of a given factor. Let n denote the length of the string and m denote the length of the factor in question. We propose a data structure that answers such queries in O ( log m ) time for the shortest cover and in O ( log m log log m ) time for a representation of all the covers, after O ( n log n ) time and space preprocessing. This is a full version of a conference paper at SPIRE 2020 with query complexities improved by a log log n-factor and additional applications.

TCS Journal 2026 Journal Article

Quasi-linear-time algorithm for a longest common circular factor

Mai Alzamel
Maxime Crochemore
Costas S. Iliopoulos
Tomasz Kociumaka
Jakub Radoszewski
Wojciech Rytter
Juliusz Straszyński
Tomasz Waleń

We consider the Longest Common Circular Factor (LCCF) problem in which, given strings S and T of length at most n, we are to compute the longest factor of S whose cyclic shift is a factor of T. This new similarity measure is an extension of the classic Longest Common Factor. We show an algorithm solving the LCCF problem in O ( n log 3 n log ( r + 2 ) ) time, where r ≤ n is the length of the output, using O ( n ) space. A naive algorithm works in Ω(n 2) time, and no O ( n polylog n ) -time solution was known prior to our work. Our result constitutes yet another application of string synchronizing sets of Kempa and Kociumaka (STOC 2019). Compared to the preliminary version published at CPM 2019, we significantly simplified the algorithm and improved the space complexity from O ( n log 2 n ) to O ( n ). We achieved that with new algorithmic insights into a certain geometric intersection problem, solved by suitably decomposing intervals.

MFCS Conference 2025 Conference Paper

Counting Distinct Square Substrings in Sublinear Time

Panagiotis Charalampopoulos
Manal Mohamed 0001
Jakub Radoszewski
Wojciech Rytter
Tomasz Walen
Wiktor Zuba

We show that the number of distinct squares in a packed string of length n over an alphabet of size σ can be computed in 𝒪(n/log_{σ}n) time in the word-RAM model of computation. This paper is the first to introduce a sublinear time algorithm for the packed version of squares counting. The packed representation of a string of length n over an alphabet of size σ is given as a sequence of 𝒪(n/ log_{σ} n) machine words in the word-RAM model (a machine word consists of ω ≥ log₂ n bits). Previously it was known how to count distinct squares in 𝒪(n) time [Gusfield and Stoye, JCSS 2004], even for a string over an integer alphabet, see [Crochemore et al. , TCS 2014; Bannai et al. , CPM 2017; Charalampopoulos et al. , SPIRE 2020]. We use techniques of squares extraction from runs described by Crochemore et al. [TCS 2014]. However, the packed model requires novel approaches. In particular, we need an 𝒪(n/log_{σ}n) sized representation of all long-period runs (runs with periods that are Ω(log_{σ}n)) which guarantees sublinear time counting of potentially linearly-many implied squares. The long-period runs with a string period that is periodic itself (called layer runs) are an obstacle, since their number can be Ω(n). Fortunately, the number of all other long-period runs is 𝒪(n/log_{σ}n) and we can construct an implicit representation of all long-period runs in 𝒪(n/log_{σ}n) time by adopting the insights of Amir et al. [ESA 2019], combined with sublinear time tools provided by the PILLAR model of computations in case of packed strings. We count squares in layer runs in sublinear time by exploiting combinatorial properties of types of pyramidally-shaped groups of layer runs. As a by-product, we discover several new structural properties of runs. Another difficulty is to compute, in sublinear time, locations of Lyndon roots of runs in packed strings, which is needed for grouping of runs that can generate equal squares. To overcome this difficulty, we introduce sparse-Lyndon roots which are based on the notion of string synchronizers proposed by Kempa and Kociumaka [STOC 2019].

TCS Journal 2025 Journal Article

Subsequence covers of words

Panagiotis Charalampopoulos
Solon P. Pissis
Jakub Radoszewski
Wojciech Rytter
Tomasz Waleń
Wiktor Zuba

We introduce subsequence covers (s-covers, in short), a new type of covers of a word. A word C is an s-cover of a word S if the occurrences of C in S as subsequences cover all the positions in S. The s-covers seem to be computationally much harder than standard covers of words (cf. Apostolico et al. (1991) [1]), but, on the other hand, much easier than the related shuffle powers (Warmuth and Haussler (1984) [6]). We give a linear-time algorithm for testing if a candidate word C is an s-cover of a word S over a polynomially-bounded integer alphabet. We also give an algorithm for finding a shortest s-cover of a word S, which in the case of a constant-sized alphabet, also runs in linear time. The words without proper s-cover are called s-primitive. We complement our algorithmic results with explicit lower and an upper bound on the length of a longest s-primitive word. Both bounds are exponential in the size of the alphabet. The upper bound presented here improves the bound given in the conference version of this paper [SPIRE 2022].

TCS Journal 2021 Journal Article

Experimental evaluation of algorithms for computing quasiperiods

Patryk Czajka
Jakub Radoszewski

Quasiperiodicity is a generalization of periodicity that was introduced in the early 1990s. Since then, dozens of algorithms for computing various types of quasiperiodicity were proposed. Our work is a step towards answering the general question: “Which algorithm for computing quasiperiods to choose? ”. The central notions of quasiperiodicity are covers and seeds. We implement algorithms for computing covers and seeds in the original and in new simplified versions and compare their efficiency on various types of data. We also discuss other known types of quasiperiodicity, distinguish partial covers as currently the most promising for large real-world data, and check their effectiveness using real-world data.

TCS Journal 2021 Journal Article

Optimal skeleton and reduced Huffman trees

Shmuel T. Klein
Jakub Radoszewski
Tamar C. Serebro
Dana Shapira

A skeleton Huffman tree is a Huffman tree from which all full subtrees of depth h ≥ 1 have been pruned. Skeleton Huffman trees are used to save storage and enhance processing time in several applications such as decoding, compressed pattern matching and wavelet trees for random access. A reduced skeleton tree prunes the skeleton Huffman tree further to an even smaller tree. The resulting more compact trees can be used to further enhance the time and space complexities of the corresponding algorithms. However, it is shown that the straightforward ways of basing the constructions of a skeleton tree as well as that of a reduced skeleton tree on a canonical Huffman tree do not necessarily yield the least number of nodes. New algorithms for achieving such trees are given.

TCS Journal 2021 Journal Article

Shortest covers of all cyclic shifts of a string

Maxime Crochemore
Costas S. Iliopoulos
Jakub Radoszewski
Wojciech Rytter
Juliusz Straszyński
Tomasz Waleń
Wiktor Zuba

A factor C of a string S is called a cover of S, if each position of S is contained in an occurrence of C. Breslauer (1992) [3] proposed a well-known O ( n ) -time algorithm that computes the shortest cover of every prefix of a string of length n. We show an O ( n log ⁡ n ) -time and O ( n ) -space algorithm that computes the shortest cover of every cyclic shift of a string of length n and an O ( n ) -time algorithm that computes the shortest among these covers. We also provide a combinatorial characterization of shortest covers of cyclic shifts of Fibonacci strings that leads to efficient algorithms for computing these covers. We further consider the bound on the number of different lengths of shortest covers of cyclic shifts of the same string of length n. We show that this number is Θ ( log ⁡ n ) for Fibonacci strings.

TCS Journal 2020 Journal Article

Faster algorithms for 1-mappability of a sequence

Mai Alzamel
Panagiotis Charalampopoulos
Costas S. Iliopoulos
Solon P. Pissis
Jakub Radoszewski
Wing-Kin Sung

In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are at Hamming distance at most k from y. We focus here on the version of the problem where k = 1. There exists an algorithm to solve this problem for k = 1 requiring time O ( m n log ⁡ n / log ⁡ log ⁡ n ) using space O ( n ). Here we present two new algorithms that require worst-case time O ( m n ) and O ( n log ⁡ n log ⁡ log ⁡ n ), respectively, and space O ( n ), thus greatly improving the previous result. Moreover, we present another algorithm that requires average-case time and space O ( n ) for integer alphabets of size σ if m = Ω ( log σ ⁡ n ). Notably, we show that this algorithm is generalizable for arbitrary k, requiring average-case time O ( k n ) and space O ( n ) if m = Ω ( k log σ ⁡ n ), assuming that the letters are independent and uniformly distributed random variables. Finally, we provide an experimental evaluation of our average-case algorithm demonstrating its competitiveness to the state-of-the-art implementation.

TCS Journal 2020 Journal Article

Universal reconstruction of a string

Paweł Gawrychowski
Tomasz Kociumaka
Jakub Radoszewski
Wojciech Rytter
Tomasz Waleń

Many properties of a string can be viewed as sets of dependencies between substrings of the string expressed in terms of substring equality. We design a linear-time algorithm which finds a solution to an arbitrary system of such constraints: a generic string satisfying a system of substring equations. This provides a general tool for reconstructing a string from different kinds of repetitions or symmetries present in the string, in particular, from runs or from maximal palindromes. The recursive structure of our algorithm in some aspects resembles the suffix array construction by Kärkkäinen et al. (2006) [23]. This is a full version of a paper presented at WADS 2015 [18].

TCS Journal 2019 Journal Article

Special issue in honor of the 70th birthday of Prof. Wojciech Rytter

Maxime Crochemore
Jakub Radoszewski

TCS Journal 2018 Journal Article

Efficient algorithms for shortest partial seeds in words

Tomasz Kociumaka
Solon P. Pissis
Jakub Radoszewski
Wojciech Rytter
Tomasz Waleń

A factor u of a word w is a cover of w if every position in w lies within some occurrence of u in w. A factor u is a seed of w if it is a cover of a superstring of w. Covers and seeds extend the classical notions of periodicity. We introduce a new notion of α-partial seed, that is, a factor covering as a seed at least α positions in a given word. We use the Cover Suffix Tree, recently introduced in the context of α-partial covers (Kociumaka et al. , 2015, [13]); an O ( n log ⁡ n ) -time algorithm constructing such a tree is known. However, it appears that partial seeds are more complicated than partial covers—our algorithms require algebraic manipulations of special functions related to edges of the modified Cover Suffix Tree and the border array. We present a procedure for computing shortest α-partial seeds that works in O ( n ) time if the Cover Suffix Tree is already given. This is a full version, which includes all the proofs, of a paper that appeared at CPM 2014 [1].

TCS Journal 2018 Journal Article

On the string consensus problem and the Manhattan sequence consensus problem

Tomasz Kociumaka
Jakub W. Pachocki
Jakub Radoszewski
Wojciech Rytter
Tomasz Waleń

We study the Manhattan Sequence Consensus problem (MSC problem) in which we are given k integer sequences, each of length ℓ, and we are to find an integer sequence x of length ℓ (called a consensus sequence) such that the maximum Manhattan distance of x from each of the input sequences is minimized. A related problem, with Hamming distance instead of Manhattan distance, is called Hamming String Consensus (HSC), also known under the names of string center problem or closest string problem. For binary sequences Manhattan distance coincides with Hamming distance, hence in this case HSC is a special case of MSC. We design a practically efficient O ( ℓ ) -time algorithm solving MSC for k ≤ 5 sequences. It improves upon the quadratic algorithm by Amir et al. (2012) [1] for HSC for k = 5 binary strings. Similarly as in the algorithm of Amir et al. , we use a column-based framework. We replace the implied general integer linear programming by its easy special cases due to combinatorial properties of MSC for k ≤ 5. Practicality of our algorithms has been verified experimentally. We also show that for a general parameter k any instance can be reduced in linear time to a kernel of size k! , so the problem is fixed-parameter tractable. Nevertheless, for k ≥ 4 this is still too large for any naive solution to be feasible in practice. This is a full version of an article published at SPIRE 2014 [15].

TCS Journal 2017 Journal Article

Covering problems for partial words and for indeterminate strings

Maxime Crochemore
Costas S. Iliopoulos
Tomasz Kociumaka
Jakub Radoszewski
Wojciech Rytter
Tomasz Waleń

Indeterminate strings are a subclass of non-standard words having non-deterministic nature. In a classic string every position contains exactly one symbol—we say it is a solid symbol—while in an indeterminate string a position may contain a set of symbols (possible at this position); such sets are called non-solid symbols. The most important subclass of indeterminate strings are partial words, where each non-solid symbol is the whole alphabet; in this case non-solid symbols are also called don't care symbols. We consider the problem of finding a shortest cover of an indeterminate string, i. e. , finding a shortest solid string whose occurrences cover the whole indeterminate string. We show that this classical problem becomes NP-complete for indeterminate strings and even for partial words. The proof of this fact is one of the main results of this paper. Our other main results focus on design of algorithms efficient with respect to certain parameters of the input (so called FPT algorithms) for the shortest cover problem. For the indeterminate string covering problem we obtain an O ( n k 2 + 2 k k 3 ) -time algorithm, where k is the number of non-solid symbols, while for the partial word covering problem we obtain a running time of O ( n k 2 + 2 O ( k log ⁡ k ) ). Additionally, we prove that, unless the Exponential Time Hypothesis is false, no 2 o ( k ) n O ( 1 ) -time solution exists for either problem, which shows that our algorithm for partial words is close to optimal. We also present an algorithm for both problems parameterized both by k and the alphabet size with a simple implementation. A preliminary version of this article was presented at the 25th International Symposium on Algorithms and Computation (ISAAC 2014), LNCS, vol. 8889, pp. 220–232, Springer (2014) [12].

TCS Journal 2016 Journal Article

Maximum number of distinct and nonequivalent nonstandard squares in a word

Tomasz Kociumaka
Jakub Radoszewski
Wojciech Rytter
Tomasz Waleń

The combinatorics of nonstandard squares in a word depends on how the equivalence of halves of the square is defined. We consider Abelian squares, parameterized squares, and order-preserving squares. The word uv is an Abelian (parameterized, order-preserving) square if u and v are equivalent in the Abelian (parameterized, order-preserving) sense. The maximum number of ordinary squares in a word is known to be asymptotically linear, but the exact bound is still investigated. We present several results on the maximum number of distinct squares for nonstandard factor equivalence relations. Let SQ Abel ( n, σ ) and SQ Abel ′ ( n, σ ) denote the maximum number of Abelian squares in a word of length n over an alphabet of size σ which are distinct as words and which are nonequivalent in the Abelian sense, respectively. For σ ≥ 2 we prove that SQ Abel ( n, σ ) = Θ ( n 2 ), SQ Abel ′ ( n, σ ) = Ω ( n 3 / 2 ) and SQ Abel ′ ( n, σ ) = O ( n 11 / 6 ). We also give linear bounds for parameterized and order-preserving squares for alphabets of constant size: SQ param ( n, O ( 1 ) ) = Θ ( n ), SQ op ( n, O ( 1 ) ) = Θ ( n ). The upper bounds have quadratic dependence on the alphabet size for order-preserving squares and exponential dependence for parameterized squares. As a side result we construct infinite words over the smallest alphabet which avoid nontrivial order-preserving squares and nontrivial parameterized cubes (nontrivial parameterized squares cannot be avoided in an infinite word). A preliminary version of this paper was published at DLT 2014 [24]. In this full version we improve or extend the bounds on all three kinds of squares.

TCS Journal 2016 Journal Article

Order-preserving indexing

Maxime Crochemore
Costas S. Iliopoulos
Tomasz Kociumaka
Marcin Kubica
Alessio Langiu
Solon P. Pissis
Jakub Radoszewski
Wojciech Rytter

Kubica et al. [33] and Kim et al. [29] introduced order-preserving pattern matching: for a given text the goal is to find its factors having the same ‘shape’ as a given pattern. Known results include a linear-time algorithm for this problem (in case of polynomially-bounded alphabet) and a generalization to multiple patterns. We propose an index that enables order-preserving pattern matching queries in time proportional to pattern length. The index can be constructed in O ( n log ⁡ log ⁡ n ) expected time or in O ( n log 2 ⁡ log ⁡ n / log ⁡ log ⁡ log ⁡ n ) worst-case time. It is an incomplete order-preserving suffix tree which may miss a single edge label at each branching node. For most applications such incomplete suffix trees provide the same functional power as the complete ones. We show a number of their applications, including computation of longest common factors, longest previously occurring factors and squares in a string in the order-preserving setting. We also give an O ( n log ⁡ n ) -time algorithm constructing complete order-preserving suffix trees.

SODA Conference 2015 Conference Paper

Internal Pattern Matching Queries in a Text and Applications

Tomasz Kociumaka
Jakub Radoszewski
Wojciech Rytter
Tomasz Walen

We consider several types of internal queries: questions about subwords of a text. As the main tool we develop an optimal data structure for the problem called here internal pattern matching. This data structure provides constant-time answers to queries about occurrences of one subword x in another subword y given text, assuming that, which allows for a constant-space representation of all occurrences. This problem can be viewed as a natural extension of the well-studied pattern matching problem. The data structure has linear size and admits a linear-time construction algorithm. Using the solution to the internal pattern matching problem, we obtain very efficient data structures answering queries about: primitivity of subwords, periods of subwords, general substring compression, and cyclic equivalence of two subwords. All these results improve upon the best previously known counterparts. The linear construction time of our data structure also allows to improve the algorithm for finding δ-subrepetitions in a text (a more general version of maximal repetitions, also called runs). For any fixed δ we obtain the first linear-time algorithm, which matches the linear time complexity of the algorithm computing runs. Our data structure has already been used as a part of the efficient solutions for subword suffix rank & selection, as well as substring compression using Burrows-Wheeler transform composed with run-length encoding. The model of internal queries in texts is connected to the well-studied problem of text indexing. Both models have their origins in the introduction of suffix trees. However, there is an important difference: in our model the size of the representation of a query is constant and therefore enables faster query time. Our results can be viewed as efficient solutions to “internal” equivalents of several basic problems of regular pattern matching and make an improvement in a majority of already published results related to internal queries.

TCS Journal 2014 Journal Article

Efficient counting of square substrings in a tree

Tomasz Kociumaka
Jakub Pachocki
Jakub Radoszewski
Wojciech Rytter
Tomasz Waleń

We give an algorithm which in O ( n log 2 n ) time counts all distinct squares in a labeled tree. There are two main obstacles to overcome. The first one is that the number of distinct squares in a tree is Ω ( n 4 / 3 ) (see Crochemore et al. , 2012 [7]), which differs substantially from the case of classical strings for which there are only linearly many distinct squares. We overcome this obstacle by using a compact representation of all squares (based on maximal cyclic shifts) which requires only O ( n log n ) space. The second obstacle is lack of adequate algorithmic tools for labeled trees, consequently we design several novel tools, this is the most complex part of the paper. In particular we extend to trees Imre Simon's compact representations of the failure table in pattern matching machines.

SODA Conference 2012 Conference Paper

A linear time algorithm for seeds computation

Tomasz Kociumaka
Marcin Kubica 0001
Jakub Radoszewski
Wojciech Rytter
Tomasz Walen

A seed in a word is a relaxed version of a period. We show a linear time algorithm computing a compact representation of all the seeds of a word, in particular, the shortest seed. Thus, we solve an open problem stated in the survey by Smyth (2000) and improve upon a previous over 15-year old O ( n log n ) algorithm by Iliopoulos, Moore and Park (1996). Our approach is based on combinatorial relations between seeds and a variant of the LZ-factorization (used here for the first time in context of seeds).