Edit Distance#
- The edit distance between two strings is the minimum total cost of the
operations required to convert
str1tostr2? - The allowed operations and their corresponding costs are as follows:
- Inserting a letter costs \(\alpha\).
- Deleting a letter costs \(\beta\).
- Replacing a letter with a different letter costs \(\gamma\).
- Question: Given two strings
str1andstr2, what is the edit distance betweenstr1andstr2?
Dynamic Programming Solution#
A fully executable example can be found on our GitHub.
Let \(OPT[i, j]\) denote the edit distance from the first \(i\) characters of
str1 to the first \(j\) characters of str2.
BASE CASE: \(i\) is \(0\). str1 is empty, so we should just pay the cost of
adding the remaining letters in str2. So we should record that \(OPT[0, j]
= \alpha \cdot j\).
BASE CASE: \(j\) is \(0\). str2 is empty, so we should just pay the cost to
removing the remaining letters in str1. So we should record that \(OPT[i, 0]
= \beta \cdot i\).
CASE 1: str1[i] == str2[j]. The last letter of str1 is the same as the
last letter of str2. So we should leave the last letters alone and convert
str1[i-1] to str2[j-1] as cheaply as possible. So, \(OPT[i, j] = OPT[i-1,
j-1]\).
CASE 2: str1[i] != str2[j]. The last letter of str1 is not the same as
the last letter of str2. At this point, we have to either add the last letter
of str2 to str1, delete the last letter of str1, or replace the last
letter of str1 with the last letter of str2. Each option is an edit, so we
must record \(OPT[i, j] = \min(\alpha + OPT[i, j - 1], \beta + OPT[i - 1, j],
\gamma + OPT[i - 1, j - 1])\).
Visualization with dpvis#
We can visualize this with dpvis as follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 | |