Arrow Research search

Author name cluster

Raphaël Clifford

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
2 author rows

Possible papers

12

MFCS Conference 2019 Conference Paper

RLE Edit Distance in Near Optimal Time

  • Raphaël Clifford
  • Pawel Gawrychowski
  • Tomasz Kociumaka
  • Daniel P. Martin 0001
  • Przemyslaw Uznanski

We show that the edit distance between two run-length encoded strings of compressed lengths m and n respectively, can be computed in O(mn log(mn)) time. This improves the previous record by a factor of O(n/log(mn)). The running time of our algorithm is within subpolynomial factors of being optimal, subject to the standard SETH-hardness assumption. This effectively closes a line of algorithmic research first started in 1993.

SODA Conference 2019 Conference Paper

The streaming k-mismatch problem

  • Raphaël Clifford
  • Tomasz Kociumaka
  • Ely Porat

We consider the streaming complexity of a fundamental task in approximate pattern matching: the k -mismatch problem. In this problem, we must compute Hamming distances between a pattern of length n and all length- n substrings of a text for which the Hamming distance does not exceed a given threshold k. In our problem formulation, we report not only the Hamming distance but also, on demand, the full mismatch information, that is the list of mismatched pairs of symbols and their indices. The twin challenges of streaming pattern matching derive from the need both to achieve small working space and also to guarantee that every arriving input symbol is processed quickly. We present a streaming algorithm for the k -mismatch problem which uses bits of space and spends time on each symbol of the input stream. In our formulation, the pattern is also in the stream, arriving directly before the text. The running time almost matches the classic offline solution [5] and the space usage is within a logarithmic factor of optimal. Our new algorithm therefore effectively resolves and also extends a problem first introduced in FOCS’09 [38]. En route to this solution, we also give a deterministic -bit encoding of all the alignments with Hamming distance at most k of a length- n pattern within a text of length O ( n ). This secondary result provides an optimal solution to a natural encoding problem which may be of independent interest.

SODA Conference 2016 Conference Paper

The k -mismatch problem revisited

  • Raphaël Clifford
  • Allyx Fontaine
  • Ely Porat
  • Benjamin Sach
  • Tatiana Starikovskaya

We revisit the complexity of one of the most basic problems in pattern matching. In the k -mismatch problem we must compute the Hamming distance between a pattern of length m and every m -length substring of a text of length n, as long as that Hamming distance is at most k. Where the Hamming distance is greater than k at some alignment of the pattern and text, we simply output “No”. We study this problem in both the standard offline setting and also as a streaming problem. In the streaming k -mismatch problem the text arrives one symbol at a time and we must give an output before processing any future symbols. Our main results are as follows: Our first result is a deterministic O ( nk 2 log k / m + n polylog m ) time offline algorithm for k -mismatch on a text of length n. This is a factor of k improvement over the fastest previous result of this form from SODA 2000 [9, 10]. We then give a randomised and online algorithm which runs in the same time complexity but requires only O ( k 2 polylog m ) space in total. Next we give a randomised (1 + ∊)-approximation algorithm for the streaming k -mismatch problem which uses O ( k 2 polylog m /∊ 2 ) space and runs in O (polylog m /∊ 2 ) worst-case time per arriving symbol. Finally we combine our new results to derive a randomised O ( k 2 polylog m ) space algorithm for the streaming k -mismatch problem which runs in worst-case time per arriving symbol. This improves the best previous space complexity for streaming k -mismatch from FOCS 2009 [26] by a factor of k. We also improve the time complexity of this previous result by an even greater factor to match the fastest known offline algorithm (up to logarithmic factors).

FOCS Conference 2015 Conference Paper

New Unconditional Hardness Results for Dynamic and Online Problems

  • Raphaël Clifford
  • Allan Grønlund Jørgensen
  • Kasper Green Larsen

There has been a resurgence of interest in lower bounds whose truth rests on the conjectured hardness of well known computational problems. These conditional lower bounds have become important and popular due to the painfully slow progress on proving strong unconditional lower bounds. Nevertheless, the long term goal is to replace these conditional bounds with unconditional ones. In this paper we make progress in this direction by studying the cell probe complexity of two conjectured to be hard problems of particular importance: matrix-vector multiplication and a version of dynamic set disjointness known as Patrascu's Multiphase Problem. We give improved unconditional lower bounds for these problems as well as introducing new proof techniques of independent interest. These include a technique capable of proving strong threshold lower bounds of the following form: If we insist on having a very fast query time, then the update time has to be slow enough to compute a lookup table with the answer to every possible query. This is the first time a lower bound of this type has been proven.

FOCS Conference 2013 Conference Paper

Element Distinctness, Frequency Moments, and Sliding Windows

  • Paul Beame
  • Raphaël Clifford
  • Widad Machmouchi

We derive new time-space tradeoff lower bounds and algorithms for exactly computing statistics of input data, including frequency moments, element distinctness, and order statistics, that are simple to calculate for sorted data. We develop a randomized algorithm for the element distinctness problem whose time T and space S satisfy T ∈ Õ (n 3/2 /S 1/2 ), smaller than previous lower bounds for comparison-based algorithms, showing that element distinctness is strictly easier than sorting for randomized branching programs. This algorithm is based on a new time and space efficient algorithm for finding all collisions of a function f from a finite set to itself that are reachable by iterating f from a given set of starting points. We further show that our element distinctness algorithm can be extended at only a polylogarithmic factor cost to solve the element distinctness problem over sliding windows, where the task is to take an input of length 2n-1 and produce an output for each window of length n, giving n outputs in total. In contrast, we show a time-space tradeoff lower bound of T ∈ Ω(n 2 /S) for randomized branching programs to compute the number of distinct elements over sliding windows. The same lower bound holds for computing the low-order bit of F 0 and computing any frequency moment F k, ≠ 1. This shows that those frequency moments and the decision problem F 0 mod 2 are strictly harder than element distinctness. We complement this lower bound with a T ∈ Õ(n 2 /S) comparison-based deterministic RAM algorithm for exactly computing F k over sliding windows, nearly matching both our lower bound for the sliding-window version and the comparison-based lower bounds for the single-window version. We further exhibit a quantum algorithm for F 0 over sliding windows with T ∈ O(n 3/2 /S 1/2 ). Finally, we consider the computations of order statistics over sliding windows.

TCS Journal 2013 Journal Article

Space lower bounds for online pattern matching

  • Raphaël Clifford
  • Markus Jalsenius
  • Ely Porat
  • Benjamin Sach

We present space lower bounds for online pattern matching under a number of different distance measures. Given a pattern of length m and a text that arrives one character at a time, the online pattern matching problem is to report the distance between the pattern and a sliding window of the text as soon as the new character arrives. We require that the correct answer is given at each position with constant probability. We give Ω ( m ) bit space lower bounds for L 1, L 2, L ∞, Hamming, edit and swap distances as well as for any algorithm that computes the cross-correlation/convolution. We then show a dichotomy between distance functions that have wildcard-like properties and those that do not. In the former case which includes, as an example, pattern matching with character classes, we give Ω ( m ) bit space lower bounds. For other distance functions, we show that there exist space bounds of Ω ( log m ) and O ( log 2 m ) bits. Finally we discuss space lower bounds for non-binary inputs and show how in some cases they can be improved.

SODA Conference 2013 Conference Paper

Tight Cell-Probe Bounds for Online Hamming Distance Computation

  • Raphaël Clifford
  • Markus Jalsenius
  • Benjamin Sach

We show tight bounds for online Hamming distance computation in the cell-probe model with word size w. The task is to output the Hamming distance between a fixed string of length n and the last n symbols of a stream. We give a lower bound of time on average per output, where δ is the number of bits needed to represent an input symbol. We argue that this bound is tight within the model. The lower bound holds under randomisation and amortisation.

SODA Conference 2009 Conference Paper

From coding theory to efficient pattern matching

  • Raphaël Clifford
  • Klim Efremenko
  • Ely Porat
  • Amir Rothschild

We consider the classic problem of pattern matching with few mismatches in the presence of promiscuously matching wildcard symbols. Given a text t of length n and a pattern p of length m with optional wildcard symbols and a bound k, our algorithm finds all the alignments for which the pattern matches the text with Hamming distance at most k and also returns the location and identity of each mismatch. The algorithm we present is deterministic and runs in Õ( kn ) time, matching the best known randomised time complexity to within logarithmic factors. The solutions we develop borrow from the tool set of algebraic coding theory and provide a new framework in which to tackle approximate pattern matching problems.