Edit Distance

Difficulty Level Hard
Frequently asked in Amazon ByteDance Facebook Google Microsoft Palantir Technologies Square
Dynamic Programming StringViews 1308

In the edit distance problem we have to find the minimum number of operations required to convert a string X of length n to another string Y of length m.

Operations allowed:

  1. Insertion
  2. Deletion
  3. Substitution

Edit Distance

Example

Input:

String1 = “abcd”

String2 = “abe”

Output:

Minimum operations required is 2 ( one deletion and one substitution )

We can solve this question by solving it’s sub-problem i.e., find the minimum number of operations required to convert a string X[0,…,i] to Y[0,…j].

Steps

  1. If the last character of X and Y are the same, then find minimum operations required to convert string X[0,…,n-1] to Y[0,…,m-1].
  2. If the last character of X and Y are not the same, then we can do one of the three operations:
    • Insertion at last index of X.
    • Deletion of the last index of X.
    • Substitution of the last index of X.

Using the above steps, let’s try to draw the recursion tree for this example:

string X= “ab” and Y = “xy” Therefore n=2 and m=2 ed(i,j) denotes minimum operations required to obtained a string Y[1,…,j] from X[1,…,i]. ( 1-based index).

As none of the characters in X is same as Y, hence at every iteration, we will call recursion three times by applying one of the three allowed operations.

Edit Distance

As we can see there are many overlapping sub-problems in the above recursion due to which the worst case complexity of this procedure is O(3^n).

This can be easily reduced using Dynamic Programming which will resolve the issue of re-computation of overlapping sub-problems.

Dynamic Programming Approach

Base conditions

  1. To convert a string of length n to the given string of length 0, we require n deletion operations.
  2. To convert a string of length 0 to the given string of length n, we require n insertion operations.

At every position we may come across any of these two situations

  1. If the last character of both strings is the same. In this case, we don’t need to do any operation, we can simply say that the answer to this problem will be the same as for the sub-problem editDistance(X[0,…,i-1] , Y[0,….,j-1]).
  2. If the last character is not the same, we have to choose one of these operations:
    • Insertion: In this case, we will insert Y[j] character at ith position and add 1 to the answer of editDistance(X[0,….i], Y[0,….,j-1]). By doing this we ensure that now the last character of both strings is the same.
    • Deletion: In this case, we will delete the ith character of X and add 1 to the answer of editDistance(X[0,….,i-1] , Y[0,…,j]). By doing this we are leaving the last character of X string i.e., deletion is performed.
    • Substitution: In this case we will substitute X[i] by Y[j] and add 1 to the answer of editDistance(X[0,…,i-1] , Y[0,…,j-1]). This becomes the same case as having equal last character in both strings.

As we require the minimum number of operations hence, we will take a minimum of the three possible operations.

C++ Program

#include<bits/stdc++.h>
using namespace std;
int min(int a,int b,int c){
  return min(a,min(b,c));
}
int editDistance(string X,string Y){
  int n=X.length();
  int m=Y.length();
  int ed[n+1][m+1];

  // to convert a string into null string we need to perform 
  // deletion operation n number of times where n is length of the string
  for(int i=0;i<=n;i++){
    ed[i][0]=i;
  }

  // to convert a null string into the given string we need to perform 
  // insertion operation n number of times where n is length of the string
  for(int i=0;i<=m;i++){
    ed[0][i]=i;
  }

  for(int i=1;i<=n;i++){
    for(int j=1;j<=m;j++){
      if(X[i-1]==Y[j-1]){
        // no operation required
        ed[i][j] = ed[i-1][j-1];
      }
      else{
        // one of the three operation required
        ed[i][j] = 1+ min( ed[i-1][j],   //deletion
                                   ed[i][j-1],   //insertion
                                   ed[i-1][j-1] //substitution
                       );
      }
    }
  }
  return ed[n][m];
}

int main()
{
  string X,Y;
    cin>>X>>Y;
  int result = editDistance(X,Y);
    cout<<"Minimum operations required to convert string "<<X<<" into string "<<Y<<" is: ";
  cout<<result<<"\n";
  return 0;
}
programming
problem
Minimum operations required to convert string programming into string problem is: 7

JAVA Program

import java.util.Scanner;
class Main { 
    static int min(int x, int y, int z) 
    { 
        return Math.min(x,Math.min(y,z));
    }
    static int editDistance(String X, String Y) 
    { 
        int n=X.length();
    	int m=Y.length();
    	int ed[][] = new int[n + 1][m + 1]; 
    
    	// to convert a string into null string we need to perform 
    	// deletion operation n number of times where n is length of the string
    	for(int i=0;i<=n;i++){
    		ed[i][0]=i;
    	}
    
    	// to convert a null string into the given string we need to perform 
    	// insertion operation n number of times where n is length of the string
    	for(int i=0;i<=m;i++){
    		ed[0][i]=i;
    	}
    
    	for(int i=1;i<=n;i++){
    		for(int j=1;j<=m;j++){
    			if(X.charAt(i - 1)==Y.charAt(j - 1)){
    				// no operation required
    				ed[i][j] = ed[i-1][j-1];
    			}
    			else{
    				// one of the three operation required
    				ed[i][j] = 1+ min( ed[i-1][j],     //deletion
                                       ed[i][j-1],    //insertion
                                       ed[i-1][j-1]  //substitution
    					             );
    			}
    		}
    	}
    	return ed[n][m];
    } 
  
    public static void main(String args[]) 
    { 
        String X ,Y;
        Scanner sc = new Scanner(System.in);
        X = sc.nextLine();
        Y = sc.nextLine();
        int result = editDistance(X,Y);
        System.out.println("Minimum operations required to convert string "+X+" into string "+Y+" is: "+result); 
    } 
}
codingAndChill
codeAndGrow
Minimum operations required to convert string codingAndChill into string codeAndGrow is: 8

Complexity Analysis

Time complexity: O(n*m)

where n = length of 1st string

m = length of 2nd string

as we are filling a 2D matrix( ed ) using two nested loops.

References

Translate »