Table of Contents

## Problem Statement

The problem “Minimum number of distinct elements after removing m items” states that you have an array and an integer m. Each element of the array indicates an item id’s. The problem statement asks to remove m elements in such a way that there should be a minimum number of distinct id’s left.

## Example

arr[] = {1, 3, 4, 2, 4, 1, 3 } m=2

3

arr[] = {1, 4, 4, 1} m=2

2

## Algorithm to find Minimum number of distinct elements after removing m items

1. Declare a map. 2. Set temp to 0. 3. Count and store the frequencies of each element of the array into the map and check how many unique keys are present in it and set it as a cap. 4. Traverse the map and check if the key element is less than or equal to m. 1. If true then decrease the value of m up to that key and also increase the temp by 1. 2. Else return cap-temp. 5. Return cap-temp

**Explanation**

We are given an integer array. We have a problem statement, that removes m number of elements in such a way that after the removal. So, we should have the minimum number of distinct element ids. Suppose that we are having 4 numbers which are distinct elements and if we remove two of them. We still have 2 distinct elements, but the case here is different. If we have 4 different numbers, the occurrence of them can be more than one for each element. Then we can remove the same element twice, or maybe two different elements, but we have to minimize the distinct elements count after the removal of m elements.

We are going to use hashing. We will declare a Hashmap. Setting the values of cap and temp to 0, we will start traversing the array and count and store the frequencies of each array element into the map. This is because we need to know the frequency of each element. But, while traversing if we get the value for the first time, we will increase the value of cap by 1, basically, it works for us as a capacity or a size.

We will visit every key and values of a map again, this time we just need to find if any key of a map is smaller than or equal to m, then decrease the value of m by key times, we need to update m as **m – key** found and increase the value of temp. If this condition doesn’t satisfy, we just need to return the difference between the cap and the temp. At last, we have to return the **cap-temp**, because that is the required output if every condition satisfies in if part, then we have to put an extra statement of return.

## Code

### C++ code to find Minimum number of distinct elements after removing m items

#include<iostream> #include<unordered_map> #include<vector> #include<algorithm> using namespace std; int getMinimumDistinctIds(int arr[], int n, int m) { unordered_map<int, int> MAP; vector<pair<int, int> > vec; int temp = 0; for (int i = 0; i < n; i++) MAP[arr[i]]++; for (auto it = MAP.begin(); it != MAP.end(); it++) vec.push_back(make_pair(it->second, it->first)); sort(vec.begin(), vec.end()); int cap = vec.size(); for (int i = 0; i < cap; i++) { if (vec[i].first <= m) { m = m - vec[i].first; temp++; } else return cap - temp; } return cap - temp; } int main() { int arr[] = { 1, 3, 4, 2, 4, 1, 3 }; int n = sizeof(arr) / sizeof(arr[0]); int m = 2; cout << getMinimumDistinctIds(arr, n, m); return 0; }

3

### Java code to find Minimum number of distinct elements after removing m items

import java.util.Map.Entry; import java.util.Map; import java.util.HashMap; class MinimumDistinctId { public static int getMinimumDistinctIds(int arr[], int n, int m) { Map<Integer, Integer> MAP = new HashMap<Integer, Integer>(); int temp = 0; int cap = 0; for (int i = 0; i < n; i++) { if (MAP.containsKey(arr[i]) == false) { MAP.put(arr[i], 1); cap++; } else MAP.put(arr[i], MAP.get(arr[i]) + 1); } for (Entry<Integer, Integer> entry:MAP.entrySet()) { if (entry.getKey() <= m) { m = m - entry.getKey(); temp++; } else return cap - temp; } return cap - temp; } public static void main(String[] args) { int arr[] = {1, 3, 4, 2, 4, 1, 3}; int m = 2; System.out.println(getMinimumDistinctIds(arr, arr.length, m)); } }

3

## Complexity Analysis

### Time Complexity

**O(N log N), **because we have used sorting which marks the upper bound for time complexity.

### Space Complexity

**O(N), **because we used each element as key. We can have almost N elements.