# Essays/Levenshtein Distance

## Introduction

The Levenshtein distance between two strings is the minimum number of edits (insertions, deletions, or single-character substitutions) needed to transform one string to the other. We derive a non-looping calculation of this distance.

## A Looping Solution

First, a translation of the pseudo-code in the Wikipedia article into a looping function in J:

```LevenshteinMatrix=: 4 : 0
d=. (1+(#x),#y)\$0
for_i. i.1+#x do. d=. i (<i,0)}d end.  NB. deletion
for_j. i.1+#y do. d=. j (<0,j)}d end.  NB. insertion
for_j. i.#y do.
for_i. i.#x do.
if. (i{x) = j{y do.
d=. d (<1+i,j)}~ (<i,j){d
else.
m=.      1 + (<i,1+j){d             NB. deletion
m=. m <. 1 + (<(1+i),j){d           NB. insertion
m=. m <. 1 + (<i,j){d               NB. substitution
d=. m (<1+i,j)}d
end. end. end.
)
```

The function returns the entire matrix rather than just the Levenshtein distance; the actual distance is the number in the lower right corner. It reproduces the two test cases in the Wikipedia article:

```   'sitting' LevenshteinMatrix 'kitten'
0 1 2 3 4 5 6
1 1 2 3 4 5 6
2 2 1 2 3 4 5
3 3 2 1 2 3 4
4 4 3 2 1 2 3
5 5 4 3 2 2 3
6 6 5 4 3 3 2
7 7 6 5 4 4 3
'Sunday' LevenshteinMatrix 'Saturday'
0 1 2 3 4 5 6 7 8
1 0 1 2 3 4 5 6 7
2 1 1 2 2 3 4 5 6
3 2 2 2 3 3 4 5 6
4 3 3 3 3 4 3 4 5
5 4 3 4 4 4 4 3 4
6 5 4 4 5 5 5 4 3
```

## A Non-Looping Solution

A non-looping version obtains through a series of algebraic manipulations, replacing the inner loop with an insert (f/) and the outer loop with an application of the power operator (^:). The result of this function is the transpose of the matrix produced by LevenshteinMatrix.

```lmy=: 4 : 'y,(0{x){(1+(1{x)<.{:y),2{x'

lmx=: 4 : 0
b=. (_1+#y){x
d=. {:y
m=. (}.<.}:) d
y , > lmy&.>/ (|.<"1 b,.m,.}:d),<#y
)

LM=: =/~ lmx^:(#@[) ,:@i.@>:@#@[
```

Testing against the looping version:

```   'sitting' (|:@LM -: LevenshteinMatrix) 'kitten'
1
'Sunday'  (|:@LM -: LevenshteinMatrix) 'Saturday'
1
```

## Version Producing Only The Scalar Distance

By Henry Rich.

```levdist=: 4 : 0 " 1
'a b'=. (/: #&>)x;y
z=. >: iz =. i.#b
for_j. a do.
z=. <./\&.(-&iz) (>: <. (j ~: b) + |.!.j_index) z
end.
{:z
)
```

## Collected Definitions

```LevenshteinMatrix=: 4 : 0
d=. (1+(#x),#y)\$0
for_i. i.1+#x do. d=. i (<i,0)}d end.  NB. deletion
for_j. i.1+#y do. d=. j (<0,j)}d end.  NB. insertion
for_j. i.#y do.
for_i. i.#x do.
if. (i{x) = j{y do.
d=. d (<1+i,j)}~ (<i,j){d
else.
m=.      1 + (<i,1+j){d             NB. deletion
m=. m <. 1 + (<(1+i),j){d           NB. insertion
m=. m <. 1 + (<i,j){d               NB. substitution
d=. m (<1+i,j)}d
end. end. end.
)

lmy=: 4 : 'y,(0{x){(1+(1{x)<.{:y),2{x'

lmx=: 4 : 0
b=. (_1+#y){x
d=. {:y
m=. (}.<.}:) d
y , > lmy&.>/ (|.<"1 b,.m,.}:d),<#y
)

LM=: =/~ lmx^:(#@[) ,:@i.@>:@#@[

levdist=: 4 : 0 " 1
'a b'=. (/: #&>)x;y
z=. >: iz =. i.#b
for_j. a do.
z=. <./\&.(-&iz) (>: <. (j ~: b) + |.!.j_index) z
end.
{:z
)
```

Contributed by Roger Hui.