Arrow Research search

Author name cluster

Moshe Lewenstein

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

23 papers
2 author rows

Possible papers

23

TCS Journal 2016 Journal Article

Document retrieval with one wildcard

  • Moshe Lewenstein
  • J. Ian Munro
  • Yakov Nekrich
  • Sharma V. Thankachan

In this paper we extend several well-known document listing problems to the case when documents contain a substring that approximately matches the query pattern. We study the scenario when the query string can contain a wildcard symbol that matches any alphabet symbol; all documents that match a query pattern with one wildcard must be enumerated. We describe a linear space data structure that reports all documents containing a substring P in O ( | P | + σ log ⁡ log ⁡ log ⁡ n + docc ) time, where σ is the alphabet size and docc is the number of listed documents. We also describe a succinct solution for this problem, as well as a solution for an extension of this problem. Furthermore our approach enables us to obtain an O ( n σ ) -space data structure that enumerates all documents containing both a pattern P 1 and a pattern P 2 in the special case when P 1 and P 2 differ in one symbol.

TCS Journal 2016 Journal Article

Two dimensional range minimum queries and Fibonacci lattices

  • Gerth Stølting Brodal
  • Pooya Davoodi
  • Moshe Lewenstein
  • Rajeev Raman
  • Srinivasa Rao Satti

Given a matrix of size N, two dimensional range minimum queries (2D-RMQs) ask for the position of the minimum element in a rectangular range within the matrix. We study trade-offs between the query time and the additional space used by indexing data structures that support 2D-RMQs. Using a novel technique—the discrepancy properties of Fibonacci lattices—we give an indexing data structure for 2D-RMQs that uses O ( N / c ) bits additional space with O ( c log ⁡ c ( log ⁡ log ⁡ c ) 2 ) query time, for any parameter c, 4 ≤ c ≤ N. Also, when the entries of the input matrix are from { 0, 1 }, we show that the query time can be improved to O ( c log ⁡ c ) with the same space usage.

TCS Journal 2014 Journal Article

Generalized substring compression

  • Orgad Keller
  • Tsvi Kopelowitz
  • Shir Landau Feibish
  • Moshe Lewenstein

In substring compression one is given a text to preprocess so that, upon request, a compressed substring is returned. Generalized substring compression is the same with the following twist. The queries contain an additional context substring (or a collection of context substrings) and the answers are the substring in compressed format, where the context substring is used to make the compression more efficient. We focus our attention on generalized substring compression and present the first non-trivial correct algorithm for this problem. Inherent to our algorithm is a new method for finding the bounded longest common prefix of substrings, which may be of independent interest. In addition, we propose an efficient algorithm for substring compression which makes use of range successor queries. We present several tradeoffs for both problems. For compressing the substring S [ i. . j ] (possibly with the substring S [ α. . β ] as a context), the best query times we achieve are O ( C ) and O ( C log ( j − i C ) ) for substring compression query and generalized substring compression query, respectively, where C is the number of phrases encoded. A preliminary version of this paper has been presented in [21].

TCS Journal 2014 Journal Article

Less space: Indexing for queries with wildcards

  • Moshe Lewenstein
  • J. Ian Munro
  • Venkatesh Raman
  • Sharma V. Thankachan

Text indexing is a fundamental problem in computer science, where the task is to index a given text (string) T [ 1. . n ], such that whenever a pattern P [ 1. . p ] comes as a query, we can efficiently report all those locations where P occurs as a substring of T. In this paper, we consider the case when P contains wildcard characters (which can match with any other character). The first non-trivial solution for the problem was given by Cole et al. [11], where the index space is O ( n log k ⁡ n ) words or O ( n log k + 1 ⁡ n ) bits and the query time is O ( p + 2 h log ⁡ log ⁡ n + occ ), where k is the maximum number of wildcard characters allowed in P, h ≤ k is the number of wildcard characters in P and occ represents the number of occurrences of P in T. Even though many indexes offering different space-time trade-offs were later proposed, a clear improvement on this result is still not known. In this paper, we first propose an O ( n log k + ϵ ⁡ n ) bits index achieving the same query time as the of Cole et al. 's index, where 0 < ϵ < 1 is an arbitrary small constant. Then we propose another index of size O ( n log k ⁡ n log ⁡ σ ) bits, but with a slightly higher query time of O ( p + 2 h log ⁡ n + occ ), where σ denotes the alphabet set size. We also study a related problem, where the task is to index a collection of documents (of n characters in total) so as to find the number of distinct documents containing a query pattern P. For the case where P contains at most a single wildcard character, we propose an O ( n log ⁡ n ) -word index with optimal O ( p ) query time.

TCS Journal 2014 Journal Article

Quick greedy computation for minimum common string partition

  • Isaac Goldstein
  • Moshe Lewenstein

In the minimum common string partition problem one is given two strings S and T with the same character statistics and one seeks for the smallest partition of S into substrings so that T can also be partitioned into the same substring multiset. The problem is fundamental in several variants of edit distance with block operations, e. g. signed reversal distance with duplicates and edit distance with moves. The minimum common string partition problem is known to be NP-complete and the best approximation algorithm known has an approximation factor of O ( log n log ⁎ n ). Since the minimum common string partition problem is of utmost practical importance one seeks a heuristic that will (1) usually have a low approximation factor and (2) will run fast. A simple greedy algorithm is known, which iteratively choose non-overlapping longest common substrings of the input strings. This algorithm has been well-studied from an approximation point of view and it has been shown to have a bad worst case approximation factor. However, all the bad approximation factors presented so far stem from complicated recursive construction. In practice the greedy algorithm seems to have small approximation factors. However, the best current implementation of greedy runs in quadratic time. We propose a novel method to implement greedy in linear time.

TCS Journal 2012 Journal Article

On demand string sorting over unbounded alphabets

  • Carmel Kent
  • Moshe Lewenstein
  • Dafna Sheinwald

On-demand string sorting is the problem of preprocessing a set of strings to allow subsequent queries for finding the k lexicographically smallest strings (and afterward the next k etc.) This on-demand variant strongly resembles the search engine queries which give you the best k -ranked pages recurringly. We present a data structure that supports this in O ( n ) preprocessing time, where n is the number of strings, and answer queries in O ( log n ) time. There is also a cost of O ( N ) time amortized over all operations, where N is the total length of the strings. Our data structure is a heap of strings, which supports heapify and delete-mins. As it turns out, implementing a full heap with all operations is not that simple. For the sake of completeness, we propose a heap with full operations based on balanced indexing trees that supports the heap operations in optimal times.

SODA Conference 2011 Conference Paper

Fast, precise and dynamic distance queries

  • Yair Bartal
  • Lee-Ad Gottlieb
  • Tsvi Kopelowitz
  • Moshe Lewenstein
  • Liam Roditty

We present an approximate distance oracle for a point set S with n points and doubling dimension Λ. For every ε > 0, the oracle supports (1 + ε)-approximate distance queries in (universal) constant time, occupies space [ε − O (Λ) + 2 O (Λ log Λ) ] n, and can be constructed in [2 O (Λ) log 3 n + ε − O (Λ) + 2 O (Λ log Λ) ] n expected time. This improves upon the best previously known constructions, presented by Har-Peled and Mendel [13]. Furthermore, the oracle can be made fully dynamic with expected O (1) query time and only 2 O (Λ) log n + ε − O (Λ) + 2 O (Λ log Λ) update time. This is the first fully dynamic (1 + ε)-distance oracle.

TCS Journal 2009 Journal Article

On the longest common parameterized subsequence

  • Orgad Keller
  • Tsvi Kopelowitz
  • Moshe Lewenstein

The well-known problem of the longest common subsequence (LCS), of two strings of lengths n and m respectively, is O ( n m ) -time solvable and is a classical distance measure for strings. Another well-studied string comparison measure is that of parameterized matching, where two equal-length strings are a parameterized match if there exists a bijection on the alphabets such that one string matches the other under the bijection. All works associated with parameterized pattern matching present polynomial time algorithms. There have been several attempts to accommodate parameterized matching along with other distance measures, as these turn out to be natural problems, e. g. , Hamming distance, and a bounded version of edit-distance. Several algorithms have been proposed for these problems. In this paper we consider the longest common parameterized subsequence problem which combines the LCS measure with parameterized matching. We prove that the problem is NP -hard, and then show a couple of approximation algorithms for the problem.

FOCS Conference 2003 Conference Paper

Approximation Algorithms for Asymmetric TSP by Decomposing Directed Regular Multigraphs

  • Haim Kaplan
  • Moshe Lewenstein
  • Nira Shafrir
  • Maxim Sviridenko

A directed multigraph is said to be d-regular if the indegree and outdegree of every vertex is exactly d. By Hall's theorem one can represent such a multigraph as a combination of at most n/sup 2/ cycle covers each taken with an appropriate multiplicity. We prove that if the d-regular multigraph does not contain more than /spl lfloor/d/2/spl rfloor/ copies of any 2-cycle then we can find a similar decomposition into 0(n/sup 2/) pairs of cycle covers where each 2-cycle occurs in at most one component of each pair. Our proof is constructive and gives a polynomial algorithm to find such decomposition. Since our applications only need one such a pair of cycle covers whose weight is at least the average weight of all pairs, we also give a simpler algorithm to extract a single such pair. This combinatorial theorem then comes handy in rounding a fractional solution of an LP relaxation of the maximum and minimum TSP problems. For maximum TSP, we obtain a tour whose weight is at least 2/3 of the weight of the longest tour, improving a previous 5/8 approximation. For minimum TSP we obtain a tour whose weight is at most 0. 842log/sub 2/ n times the optimal, improving a previous 0. 999log/sub 2/ n approximation. Utilizing a reduction from maximum TSP to the shortest superstring problem we obtain a 2. 5-approximation algorithm for the latter problem which is again much simpler than the previous one. Other applications of the rounding procedure are approximation algorithms for maximum 3-cycle cover (factor 2/3, previously 3/5) and maximum asymmetric TSP with triangle inequality (factor 10/13, previously 3/4 ).

FOCS Conference 1997 Conference Paper

Pattern Matching with Swaps

  • Amihood Amir
  • Yonatan Aumann
  • Gad M. Landau
  • Moshe Lewenstein
  • Noa Lewenstein

Let a text string T of n symbols and a pattern string P of m symbols from alphabet /spl Sigma/ be given. A swapped version T' of T is a length n string derived from T by a series of local swaps, (i. e. t/sup '//sub l//spl larr/t/sub l+1/ and t'/sub l+1//spl larr/t/sub l/) where each element can participate in no more than one swap. The Pattern Matching with Swaps problem is that of finding all locations i for which there exists a swapped version T' of T where there is an exact matching of P in location i of T'. It has been an open problem whether swapped matching can be done in less than O(mn) time. In this paper we show the first algorithm that solves the pattern matching with swaps problem in time O(mn). We present an algorithm whose time complexity is O(nm/sup 1/3/ log m log/sup 2/ /spl sigma/) for a general alphabet /spl Sigma/, where /spl sigma/=min(m, |/spl Sigma/|).