Levenshtein Distance
Levenshtein Distance is a method to check the similarity of wo string. If the string is very similar, we will get a small value for the distance. If the two strings are completely off, we get the value of max string lenth as the distance.
Here is my implementation in java:
/** * Calculate the Levenshtein Distance for the given two string. * * @param in1 * @param in2 * @return The Levenshtein Distance */ public static int levenshteinDistance(String in1, String in2) { in1 = " " + in1; in2 = " " + in2; int iWidth = in1.length(); int iHeight = in2.length(); int[][] matrix = new int[iWidth][iHeight]; for (int i = 0; i < iWidth; i++) { matrix[i][0] = i; } for (int i = 0; i < iHeight; i++) { matrix[0][i] = i; } for (int i = 1; i < in1.length(); i++) { for (int j = 1; j < in2.length(); j++) { int substitutionCost = 0; if (in1.charAt(i) != in2.charAt(j)) { substitutionCost = 1; } int deletion = matrix[i - 1][j] + 1; int insertion = matrix[i][j - 1] + 1; int substitution = matrix[i - 1][j - 1] + substitutionCost; matrix[i][j] = Math.min(Math.min(deletion, insertion), substitution); } } return matrix[iWidth - 1][iHeight - 1]; }
To make use of it, I create a function call ogosokaStringSimilarity for it:
/** * Get the Ogosoka string similarity for the given 2 strings. * * @param in1 * @param in2 * @return 1 to 0. If the two strings are exactly the same, if will return 1. The more similar the two strings are, the value closer ot 1 will be return. */ public static double ogosokaStringSimilarity(String in1, String in2) { double maxLength = Math.max(in1.length(), in2.length()); return (maxLength - levenshteinDistance(in1, in2)) / maxLength; }