Arrow Research search
Back to FOCS

FOCS 2004

Approximating Edit Distance Efficiently

Conference Paper Session 13 Algorithms and Complexity ยท Theoretical Computer Science

Abstract

Edit distance has been extensively studied for the past several years. Nevertheless, no linear-time algorithm is known to compute the edit distance between two strings, or even to approximate it to within a modest factor. Furthermore, for various natural algorithmic problems such as low-distortion embeddings into normed spaces, approximate nearest-neighbor schemes, and sketching algorithms, known results for the edit distance are rather weak. We develop algorithms that solve gap versions of the edit distance problem: given two strings of length n with the promise that their edit distance is either at most k or greater than /spl lscr/, decide which of the two holds. We present two sketching algorithms for gap versions of edit distance. Our first algorithm solves the k vs. (kn)/sup 2/3/ gap problem, using a constant size sketch. A more involved algorithm solves the stronger k vs. /spl lscr/ gap problem, where /spl lscr/ can be as small as O(k/sup 2/) - still with a constant sketch - but works only for strings that are mildly "nonrepetitive". Finally, we develop an n/sup 3/7/-approximation quasilinear time algorithm for edit distance, improving the previous best factor of n/sup 3/4/ (Cole and Hariharan, 2002); if the input strings are assumed to be nonrepetitive, then the approximation factor can be strengthened to n/sup 1/3/.

Authors

Keywords

  • Approximation algorithms
  • Computational biology
  • Text processing
  • Algorithm design and analysis
  • Genomics
  • Bioinformatics
  • Web search
  • Books
  • Dynamic programming
  • Heuristic algorithms
  • Edit Distance
  • Linear Time
  • String Length
  • Normed Space
  • Input String
  • Linear-time Algorithm
  • Lower Bound
  • Estimation Algorithm
  • Shortest Path
  • Time Distance
  • Problem In Time
  • Hamming Distance
  • Reconstruction Procedure
  • Number Of Insertions
  • Dynamic Programming Algorithm
  • Optimal Alignment
  • Nearest Neighbor Algorithm
  • Positional Encoding
  • Edit Operations
  • Hamming Weight
  • Nearby Positions

Context

Venue
IEEE Symposium on Foundations of Computer Science
Archive span
1975-2025
Indexed papers
3809
Paper id
345070506929384855