Author name cluster

Diptarama Hendrian

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers

1 author row

TCS Journal 2024 Journal Article

Linear time online algorithms for constructing linear-size suffix trie

Diptarama Hendrian
Takuya Takagi
Shunsuke Inenaga
Keisuke Goto
Mitsuru Funakoshi

The suffix trees are fundamental data structures for various kinds of string processing. The suffix tree of a text string T of length n has O ( n ) nodes and edges, and the string label of each edge is encoded by a pair of positions in T. Thus, even after the tree is built, the input string T needs to be kept stored and random access to T is still needed. The linear-size suffix tries (LSTs), proposed by Crochemore et al. [Linear-size suffix tries, TCS 638: 171-178, 2016], are a “stand-alone” alternative to the suffix trees. Namely, the LST of an input text string T of length n occupies O ( n ) total space, and supports pattern matching and other tasks with the same efficiency as the suffix tree without the need to store the input text string T. Crochemore et al. proposed an offline algorithm which transforms the suffix tree of T into the LST of T in O ( n log ⁡ σ ) time and O ( n ) space, where σ is the alphabet size. In this paper, we present two types of online algorithms which “directly” construct the LST, from right to left, and from left to right, without constructing the suffix tree as an intermediate structure. Both algorithms construct the LST incrementally when a new symbol is read, and do not access the previously read symbols. Both of the right-to-left construction algorithm and the left-to-right construction algorithm work in O ( n log ⁡ σ ) time and O ( n ) space. The main feature of our algorithms is that the input text string does not need to be stored.

Details DOI

TCS Journal 2022 Journal Article

Parameterized DAWGs: Efficient constructions and bidirectional pattern searches

Katsuhito Nakashima
Noriki Fujisato
Diptarama Hendrian
Yuto Nakashima
Ryo Yoshinaka
Shunsuke Inenaga
Hideo Bannai
Ayumi Shinohara

Two strings x and y over Σ ∪ Π of equal length are said to parameterized match (p-match) if there is a renaming bijection f: Σ ∪ Π → Σ ∪ Π that is identity on Σ and transforms x to y (or vice versa). The p-matching problem is to look for substrings in a text that p-match a given pattern. In this paper, we propose parameterized suffix automata (p-suffix automata) and parameterized directed acyclic word graphs (PDAWGs) which are the p-matching versions of suffix automata and DAWGs. While suffix automata and DAWGs are equivalent for standard strings, we show that p-suffix automata can have Θ ( n 2 ) nodes and edges but PDAWGs have only O ( n ) nodes and edges, where n is the length of an input string. We also give an O ( n | Π | log ⁡ ( | Π | + | Σ | ) ) -time O ( n ) -space algorithm that builds the PDAWG in a left-to-right online manner. As a byproduct, it is shown that the parameterized suffix tree for the reversed string can also be built in the same time and space, in a right-to-left online manner. This duality also leads us to two further efficient algorithms for p-matching: Given the parameterized suffix tree for the reversal T ‾ of the input string T, one can build the PDAWG of T in O ( n ) time in an offline manner; One can perform bidirectional p-matching in O ( m log ⁡ ( | Π | + | Σ | ) + occ ) time using O ( n ) space, where m denotes the pattern length and occ is the number of pattern occurrences in the text T.

Details DOI

TCS Journal 2020 Journal Article

Efficient computation of longest single-arm-gapped palindromes in a string

Shintaro Narisada
Diptarama Hendrian
Kazuyuki Narisawa
Shunsuke Inenaga
Ayumi Shinohara

In this paper, we introduce new types of approximate palindromes called single-arm-gapped palindromes (shortly SAGPs). A SAGP contains a gap in either its left or right arm, which is in the form of either w g u c u R w R or w u c u R g w R, where w and u are non-empty strings, w R and u R are respectively the reversed strings of w and u, g is a string called a gap, and c is either a single character or the empty string. Here we call wu and u R w R the arm of the SAGP, and | u v | the length of the arm. We classify SAGPs into two groups: those which have u c u R as a maximal palindrome (type-1), and the others (type-2). We propose several algorithms to compute type-1 SAGPs with longest arms occurring in a given string, based on suffix arrays. Then, we propose a linear-time algorithm to compute all type-1 SAGPs with longest arms, based on suffix trees. Also, we show how to compute type-2 SAGPs with longest arms in linear time. We also perform some preliminary experiments to show practical performances of the proposed methods.

Details DOI

TCS Journal 2020 Journal Article

Linear-time online algorithm for inferring the shortest path graph from a walk label

Shintaro Narisada
Diptarama Hendrian
Ryo Yoshinaka
Ayumi Shinohara

We consider the problem of inferring an edge-labeled graph from the sequence of edge labels seen in a walk on that graph. It has been known that this problem is solvable in O ( n log ⁡ n ) time when the targets are path or cycle graphs. This paper presents an online algorithm for the problem of this restricted case that runs in O ( n ) time, based on Manacher's algorithm for computing all the maximal palindromes in a string.

Details DOI

TCS Journal 2019 Journal Article

Efficient dynamic dictionary matching with DAWGs and AC-automata

Diptarama Hendrian
Shunsuke Inenaga
Ryo Yoshinaka
Ayumi Shinohara

The dictionary matching is a task to find all occurrences of pattern strings in a set D (called a dictionary) on a text string T. The Aho–Corasick-automaton (AC-automaton) which is built on D is a fundamental data structure which enables us to solve the dictionary matching problem in O ( d log ⁡ σ ) preprocessing time and O ( n log ⁡ σ + occ ) matching time, where d is the total length of the patterns in the dictionary D, n is the length of the text, σ is the alphabet size, and occ is the total number of occurrences of all the patterns in the text. The dynamic dictionary matching is a variant where patterns may dynamically be inserted into and deleted from the dictionary D. This problem is called semi-dynamic dictionary matching if only insertions are allowed. In this paper, we propose two efficient algorithms that can solve both problems with some modifications. For a pattern of length m, our first algorithm supports insertions in O ( m log ⁡ σ + log ⁡ d / log ⁡ log ⁡ d ) time and pattern matching in O ( n log ⁡ σ + occ ) for the semi-dynamic setting. This algorithm also supports both insertions and deletions in O ( σ m + log ⁡ d / log ⁡ log ⁡ d ) time and pattern matching in O ( n ( log ⁡ d / log ⁡ log ⁡ d + log ⁡ σ ) + occ ( log ⁡ d / log ⁡ log ⁡ d ) ) time for the dynamic dictionary matching problem by some modifications. This algorithm is based on the directed acyclic word graph (DAWG) of Blumer et al. (JACM 1987). Our second algorithm, which is based on the AC-automaton, supports insertions in O ( m log ⁡ σ + u f + u o ) time for the semi-dynamic setting and supports both insertions and deletions in O ( σ m + u f + u o ) time for the dynamic setting, where u f and u o respectively denote the numbers of states in which the failure function and the output function need to be updated. This algorithm performs pattern matching in O ( n log ⁡ σ + occ ) time for both settings. Our algorithm achieves optimal update time for AC-automaton based methods over constant-size alphabets, since any algorithm which explicitly maintains the AC-automaton requires Ω ( m + u f + u o ) update time.

Details DOI

GandALF Workshop 2019 Workshop Paper

Query Learning Algorithm for Residual Symbolic Finite Automata

Kaizaburo Chubachi
Diptarama Hendrian
Ryo Yoshinaka
Ayumi Shinohara

We propose a query learning algorithm for residual symbolic finite automata (RSFAs). Symbolic finite automata (SFAs) are finite automata whose transitions are labeled by predicates over a Boolean algebra, in which a big collection of characters leading the same transition may be represented by a single predicate. Residual finite automata (RFAs) are a special type of non-deterministic finite automata which can be exponentially smaller than the minimum deterministic finite automata and have a favorable property for learning algorithms. RSFAs have both properties of SFAs and RFAs and can have more succinct representation of transitions and fewer states than RFAs and deterministic SFAs accepting the same language. The implementation of our algorithm efficiently learns RSFAs over a huge alphabet and outperforms an existing learning algorithm for deterministic SFAs. The result also shows that the benefit of non-determinism in efficiency is even larger in learning SFAs than non-symbolic automata.

PDF Details DOI