====== Levenshtein Distance ======
Levenshtein Distance is a method to check the similarity of wo string. If the string is very similar, we will get a small value for the distance. If the two strings are completely off, we get the value of max string lenth as the distance.
Here is my implementation in java:
/**
* Calculate the Levenshtein Distance for the given two string.
*
* @param in1
* @param in2
* @return The Levenshtein Distance
*/
public static int levenshteinDistance(String in1, String in2) {
in1 = " " + in1;
in2 = " " + in2;
int iWidth = in1.length();
int iHeight = in2.length();
int[][] matrix = new int[iWidth][iHeight];
for (int i = 0; i < iWidth; i++) {
matrix[i][0] = i;
}
for (int i = 0; i < iHeight; i++) {
matrix[0][i] = i;
}
for (int i = 1; i < in1.length(); i++) {
for (int j = 1; j < in2.length(); j++) {
int substitutionCost = 0;
if (in1.charAt(i) != in2.charAt(j)) {
substitutionCost = 1;
}
int deletion = matrix[i - 1][j] + 1;
int insertion = matrix[i][j - 1] + 1;
int substitution = matrix[i - 1][j - 1] + substitutionCost;
matrix[i][j] = Math.min(Math.min(deletion, insertion), substitution);
}
}
return matrix[iWidth - 1][iHeight - 1];
}
To make use of it, I create a function call ogosokaStringSimilarity for it:
/**
* Get the Ogosoka string similarity for the given 2 strings.
*
* @param in1
* @param in2
* @return 1 to 0. If the two strings are exactly the same, if will return 1. The more similar the two strings are, the value closer ot 1 will be return.
*/
public static double ogosokaStringSimilarity(String in1, String in2) {
double maxLength = Math.max(in1.length(), in2.length());
return (maxLength - levenshteinDistance(in1, in2)) / maxLength;
}