java:levenshtein_distance

Levenshtein Distance

Levenshtein Distance is a method to check the similarity of wo string. If the string is very similar, we will get a small value for the distance. If the two strings are completely off, we get the value of max string lenth as the distance.

Here is my implementation in java:

/**
 * Calculate the Levenshtein Distance for the given two string.
 *
 * @param in1
 * @param in2
 * @return The Levenshtein Distance
 */
public static int levenshteinDistance(String in1, String in2) {
    in1 = " " + in1;
    in2 = " " + in2;
    int iWidth = in1.length();
    int iHeight = in2.length();
    int[][] matrix = new int[iWidth][iHeight];

    for (int i = 0; i < iWidth; i++) {
        matrix[i][0] = i;
    }
    for (int i = 0; i < iHeight; i++) {
        matrix[0][i] = i;
    }

    for (int i = 1; i < in1.length(); i++) {
        for (int j = 1; j < in2.length(); j++) {
            int substitutionCost = 0;
            if (in1.charAt(i) != in2.charAt(j)) {
                substitutionCost = 1;
            }
            int deletion = matrix[i - 1][j] + 1;
            int insertion = matrix[i][j - 1] + 1;
            int substitution = matrix[i - 1][j - 1] + substitutionCost;
            matrix[i][j] = Math.min(Math.min(deletion, insertion), substitution);
        }
    }
    return matrix[iWidth - 1][iHeight - 1];
}

To make use of it, I create a function call ogosokaStringSimilarity for it:

/**
 * Get the Ogosoka string similarity for the given 2 strings.
 *
 * @param in1
 * @param in2
 * @return 1 to 0. If the two strings are exactly the same, if will return 1. The more similar the two strings are, the value closer ot 1 will be return.
 */
public static double ogosokaStringSimilarity(String in1, String in2) {
    double maxLength = Math.max(in1.length(), in2.length());
    return (maxLength - levenshteinDistance(in1, in2)) / maxLength;
}
  • java/levenshtein_distance.txt
  • Last modified: 2021/07/22 16:30
  • by chongtin