TCS 2014

Quick greedy computation for minimum common string partition

Journal Article · Computer Science · Theoretical Computer Science

Abstract

In the minimum common string partition problem, one is given two strings S and T with the same character statistics, and one seeks the smallest partition of S into substrings such that T can also be partitioned into the same multiset of substrings. The problem is fundamental in several variants of edit distance with block operations, e.g. signed reversal distance with duplicates and edit distance with moves. The minimum common string partition problem is known to be NP-complete, and the best known approximation algorithm has an approximation factor of O(log n log* n). Since the problem is of utmost practical importance, one seeks a heuristic that (1) usually has a low approximation factor and (2) runs fast. A simple greedy algorithm is known, which iteratively chooses non-overlapping longest common substrings of the input strings. This algorithm has been well studied from an approximation point of view, and it has been shown to have a bad worst-case approximation factor. However, all the bad approximation factors presented so far stem from complicated recursive constructions. In practice, the greedy algorithm seems to have small approximation factors. However, the best current implementation of greedy runs in quadratic time. We propose a novel method to implement greedy in linear time.
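
To make the greedy heuristic concrete, here is a minimal Python sketch of the rule described in the abstract: repeatedly pick a longest common substring of the two inputs that does not overlap any previously chosen block. This is a naive illustration only, not the linear-time method the paper proposes and not the quadratic implementation it refers to; it recomputes a dynamic-programming table in every round and so takes roughly cubic time in the worst case. The function name `greedy_mcsp`, the tie-breaking rule (first longest match found), and the output format are assumptions made for this example.

```python
from collections import Counter


def greedy_mcsp(s, t):
    """Naive greedy common string partition (illustrative sketch).

    Returns a list of (start_in_s, start_in_t, length) blocks, in the
    order in which the greedy rule selected them.
    """
    if Counter(s) != Counter(t):
        raise ValueError("inputs must have the same character counts")

    n = len(s)
    used_s = [False] * n  # positions of s already covered by a block
    used_t = [False] * n  # positions of t already covered by a block
    blocks = []
    covered = 0

    while covered < n:
        best_len, best_i, best_j = 0, 0, 0
        # prev[j + 1] = length of the longest common run of uncovered
        # positions ending at s[i - 1] and t[j].
        prev = [0] * (n + 1)
        for i in range(n):
            cur = [0] * (n + 1)
            if not used_s[i]:
                for j in range(n):
                    if not used_t[j] and s[i] == t[j]:
                        cur[j + 1] = prev[j] + 1
                        if cur[j + 1] > best_len:  # ties: keep the first found
                            best_len = cur[j + 1]
                            best_i = i - best_len + 1
                            best_j = j - best_len + 1
            prev = cur

        # Equal character counts guarantee best_len >= 1 here.
        for k in range(best_len):
            used_s[best_i + k] = True
            used_t[best_j + k] = True
        blocks.append((best_i, best_j, best_len))
        covered += best_len

    return blocks


if __name__ == "__main__":
    s, t = "ababcab", "abcabab"
    blocks = greedy_mcsp(s, t)
    print(len(blocks), [s[i:i + length] for i, _, length in blocks])
```

For `S = "ababcab"` and `T = "abcabab"`, the sketch first selects the common substring `abcab` and then the remaining `ab`, giving a common partition of size 2. Covered positions reset the run length to zero, so a chosen block can never extend across a previously selected one.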

Authors

Keywords

  • Strings
  • Approximation algorithm
  • Pattern matching

Context

Venue
Theoretical Computer Science