Distinct Subsequences  

Difficulty Level Hard
Frequently asked in Amazon Bloomberg Mathworks
Dynamic Programming String

Given two strings S and P1, we have to count all the number of distinct subsequences of S which equals P1.

Note: A subsequence of a given string is a string that we archive by deleting some characters or possible zero characters also from the original string. We can’t change the order of the elements present in the original string.

For example:

“EGY” is a subsequence of “AEDGYS” while “DAY” is not.

Note: Every substring is a subsequence but every subsequence is not a substring.

Brute force approach  

We will check all the subsequences of string S with length the same as string P1.

Algorithm:

  1. Take an empty string “ans” which will store your string for the current state.
  2. Run a recursion in which arguments pointers pointing to the index of string S and string P on which we will operate.
  3. Put base cases.
    • If the pointer of string T becomes equal to the length of string P1, then return 1.
    • If the pointer of string S becomes equal to the length of string S, then return 0.

Optimal Sub-structure:

To consider all subsequence of string S, there can be two cases for every index.

  • Case 1: The item is included in the optimal subset.
  • Case 2: The item is not included in the optimal set.
See also
The Knapsack Problem

If the values of pointers of both strings are not equal then only case 2 is considered.

Implementation

C++ Program For Distinct Subsequences

#include <bits/stdc++.h>
using namespace std;
int ds(string &s, string &t, string &ans, int pointer_S, int pointer_T)
{
    if (pointer_T == t.length())
    {
        return 1;
    }
    if (pointer_S >= s.length())
    {
        return 0;
    }
    int count = 0;
    count += ds(s, t, ans, pointer_S + 1, pointer_T);
    if (s[pointer_S] == t[pointer_T])
    {
        ans += s[pointer_S];
        count += ds(s, t, ans, pointer_S + 1, pointer_T + 1);
        ans.pop_back();
    }
    return count;
}
int main()
{
    string S, P;
    cin >> S >> P;
    string ans;
    cout << ds(S, P, ans, 0, 0) << endl;
    return 0;
}

 

If we observe the above function carefully, we can see that it computes the same sub-problems again and again. See the following recursion tree. The time complexity of this naive recursive solution is exponential.

Distinct SubsequencesPin

 

Please click Like if you loved this article?

Since sub problems are evaluated again, this problem has Overlapping Sub-problems property.

So this problem has both properties for a dynamic programming problem:

  1. Overlapping sub problems
  2. Optimal substructure

Like in other dynamic programming problems, in this also we will store the answers of every state so that we will not compute the same sub problem again and again.

Dynamic programming approach  

Algorithm

  1. Initialize an array dp[][] of size (length of string S+1)*(length of string P1+1) with zeros, where dp[i][j] represents the number of distinct subsequences of string S[0….i-1] which are equal to P1[0….j-1].
  2. Base case:

for I in range 0 to n

dp[i][0]=1, because to 1 subsequence (empty subsequence) which is equal to empty string.

  1. Now there are two cases
See also
Find whether a subarray is in form of a mountain or not

CASE 1: if S[i-1]==P1[j-1]

In this case we can choose either ith element or i-1th element for jth element, so

Please click Like if you loved this article?

dp[i][j]=dp[i-1][j-1]+dp[i-1][j].

CASE 2: if S[i-1]!=P1[j-1]

In this case we can only choose  i-1th element for jth element, so

dp[i][j]= dp[i-1][j].

  1. Return dp[n][m].

Implementation

C++ Program For Distinct Subsequences

#include <bits/stdc++.h>
using namespace std;
int distinctSubsequence(string s, string t)
{
    long long n, m;
    n = s.length();
    m = t.length();
    vector<vector<long long>> dp(n + 1, vector<long long>(m + 1, 0));
    for (long long i = 0; i <= n; i++)
    {
        dp[i][0] = 1;
    }
    for (long long i = 1; i <= n; i++)
    {
        for (long long j = 1; j <= m; j++)
        {
            if (s[i - 1] == t[j - 1])
            {
                dp[i][j] = dp[i - 1][j] + dp[i - 1][j - 1];
            }
            else
            {
                dp[i][j] = dp[i - 1][j];
            }
        }
    }
    return dp[n][m];
}
int main()
{
    string s="abbbcdd", p="abc";
    cout << distinctSubsequence(s, p) << endl;
    return 0;
}

Output

3

JAVA Program For Distinct Subsequences

public class Main
{
    public static int distinctSubsequence(String s, String t)
    {
        int n = s.length(), m = t.length();
        int[][] dp = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            dp[i][0] = 1;
        }
        
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                if (s.charAt(i) == t.charAt(j)) dp[i + 1][j + 1] = dp[i][j] + dp[i][j + 1];
                else dp[i + 1][j + 1] = dp[i][j + 1];
            }
        }
        return dp[n][m];
    }
  public static void main(String[] args) {
    String s="abccccdee", p ="abe";
    System.out.println(distinctSubsequence(s,p));
  }
}

Output

2

Time Complexity  

T(n) = O(n*m) as we are filling the 2D matrix( dp ) of size n*m using two nested loops, hence the time complexity will be O(n*m)

where n is the length of 1st string and m is the length of the 2nd string.

References

Please click Like if you loved this article?

Please click Like if you loved this article?

Ads Blocker Image Powered by Code Help Pro
Ads Blocker Detected!!!

This website does not work properly with AdBlock. We have detected that you are using extensions to block ads. Please disable Adblocker to view the content.

Refresh