1. What is the use of DISTINCT in SELECT Statement?
DISTINCT statement is used for eliminating the duplicate rows from the result. When we fire a SELECT statement, it will select all the matching rows from the table. These matching rows may have duplicate entries. When we use DISTINCT it will eliminate all the duplicate rows and retain single entry for such entries.
Consider the DEPARTMENTS table. It has LOCATION_ID which indicates the location of each department.
If we write a simple query to find the locations where departments exist, then it will list all the location ids irrespective of its row and duplicate values.
SELECT LOCATION_ID FROM DEPARTMENTS;
We can notice the duplicate entries of location ids in above result set. But we need not have duplicate values in the result set to know various locations of the departments. We have to eliminate the duplicate values here. Therefore, we use DISTINCT in above query to get different locations.
SELECT DISTINCT LOCATION_ID FROM DEPARTMENTS;
Now we do not have any duplicate values in the result. We can even apply DISTINCT on combination of columns. It need not be on single column like above.
Consider below query to list various MANAGER_ID and DEPARTMENT_ID from the EMPLOYEES table like below.
SELECT MANAGER_ID, DEPARTMENT_ID FROM EMPLOYEES;
We can see duplicate result values in above query. If we want to eliminate the duplicate entries, we can use DISTINCT clause. It will remove the duplicate values on the combination of columns. We can see duplicate values for MANAGER_ID but combination of MANAGER_ID and DEPARTMENT_ID does not have any duplicate values.
SELECT DISTINCT MANAGER_ID, DEPARTMENT_ID FROM EMPLOYEES;
2. Write a Query to find Maximum and Minimum values from the table without using any MAX () or MIN () functions.
Since we cannot use MAX or MIN aggregate functions, we need to think of logic for getting the maximum and minimum values. The logic for getting these values can be anything but it should return right value. Let us first discuss the simplest way of getting minimum and maximum values by sorting the data.
Consider the DEPARTMENTS table from which we need to find the LOCATION_ID with minimum value and maximum value.
SELECT LOCATION_ID FROM DEPARTMENTS;
It has 27 records with different LOCATION_IDs and is displayed in the order they appear in the table. How can we get minimum value of the LOCATION_ID here without using MIN () function? If we sort the LOCATION_ID column in ascending order, then we will get the smallest number at the first row and the biggest number at the last row. Hence if we sort above query in ascending order, then we will get the result starting from smallest number.
SELECT LOCATION_ID FROM DEPARTMENTS ORDER BY LOCATION_ID;
But we do not want all the LOCATION_IDs. We need only the smallest ID alone. Hence we need to restrict the number of rows in the SELECT Statement above to one row by using ROWNUM. But ROWNUM condition will be given in the WHERE clause. If we restrict the number of rows in WHERE clause, then first row is selected first and then it will be sorted. This will not give us the required smallest number.
SELECT LOCATION_ID FROM DEPARTMENTS WHERE ROWNUM
We need to get 1400 as smallest number. Therefore, we need to sort the records first and then we need to select the smallest number by limiting the rows like below. This will give us correct result.
SELECT LOCATION_ID FROM (SELECT LOCATION_ID FROM DEPARTMENTS ORDER BY LOCATION_ID) WHERE ROWNUM <2;
This is how we got smallest number. What should we do to get largest number? We need to sort the column in descending order. Then largest number will come at the top and we can select it by restricting the number of rows to one.
SELECT LOCATION_ID FROM (SELECT LOCATION_ID FROM DEPARTMENTS ORDER BY LOCATION_ID DESC) WHERE ROWNUM <2;
Suppose we need to find minimum and maximum values without using any min, max function or order by clause. Then we have to think of complex logic to get the minimum and maximum values. In this case we need to think little bit more about the logic and query. We need to consider the same table twice and compare each value with rest of them to get max and min values. Let us start step by step to get the values.
As a first step, how to compare the values from same table? We need to have DEPARTMENTS table twice in our query with self-join. But self-join is not equijoin here (values are not compared with equal to, ‘=’). We have to use greater than or lesser than operator to get smaller and larger values. i.e.;
SELECT smaller.LOCATION_ID smaller_loc_id, larger.LOCATION_ID large_loc_id FROM DEPARTMENTS smaller, DEPARTMENTS larger WHERE smaller.LOCATION_ID < larger.LOCATION_ID;
Here each LOCATION_ID is compared against rest of the LOCATION_IDs by using ‘<’ comparison. Hence in the result set we will get all the smaller LOCATION_IDs and corresponding larger location ID combinations like below.
This is not the final. We need to extract the smaller and larger values from this list. But how? What could be the logic to get the values from this list? In above result set, the first row first column will have smallest number. We need to extract it by limiting the number of rows to one. In addition, we do not want combination of smaller and larger values. We need only smaller number. Hence we can eliminate the second column from above list.
SELECT smaller.LOCATION_ID smaller_loc_id FROM DEPARTMENTS smaller, DEPARTMENTS larger WHERE smaller.LOCATION_ID < larger.LOCATION_ID AND ROWNUM <2;
Now how we will get largest number? One method is to extract the last row second column. But we cannot use ORDER BY to sort and get the value. Another method is to use the same logic as above in little different way. In the first column above we will have all the smaller value compared against larger value. The largest value will not have any value to be compared and it will not be listed in the first column. Hence if we extract that number by writing a query on DEPARTMENTS to find which number is not listed in above query, we will get the largest number.
SELECT LOCATION_ID FROM DEPARTMENTS WHERE LOCATION_ID NOT IN (SELECT smaller.LOCATION_ID smaller_loc_id FROM DEPARTMENTS smaller, DEPARTMENTS larger WHERE smaller.LOCATION_ID < larger.LOCATION_ID);
These are the two ways of getting smaller and larger numbers from the table. There will be other methods too depending on your thinking about logic.
3. What is ternary (also known as three-valued) logic in SQL?
In SQL, any column in the table can take some valid value or UNKNOWN value. When the value of the column is UNKNOWN, we represent it by using NULL or some blank column which is not actually blank.
When we have to compare the column values we usually think that either it has some value or it does not have any value. But NULL does not indicate not having any value. It represents that the value in the column is not still known. Hence in above table, when we write query like below it returns nothing.
That means, we cannot compare NULL like above with ‘=’ operator. Since NULL (a UNKNOWN Value) is not equal to another NULL (UNKNOWN value), comparing NULL using ‘equal to’ or ‘not equal to’ or ‘greater than’ or ‘less than’ etc. are meaningless. Hence it does not return any value.
Suppose we want to list all the STATE_PROVINCE which are not NULL.
It does not return any rows that have STATE_PROVICE value. That means, it cannot compare this UNKNOWN value. Therefore, SQL provides special syntax to compare NULL and Non-NULL value using ‘IS NULL’ or IS NOT NULL’. Hence any valid values (which are not NULL) can be compared using ‘=’ or ‘<>’ values. When we have to compare NULL or NOT NULL values (a third value of the column), we have to use third operator ‘IS NULL’ or ‘IS NOT NULL’ operator.
SELECT * FROM LOCATIONS WHERE STATE_PROVINCE IS NULL;
SELECT * FROM LOCATIONS WHERE STATE_PROVINCE IS NOT NULL;
When we have to compare any valid values like whether it is equal or not (TRUE or FALSE – the two common values of the column), we can use any comparison operator.
Note below that, the ‘<>’ operator has not returned any records with NULL STATE_PROVICE. This is because it could not compare the third value ‘NULL’ using ‘<>’ operator.
Hence, any column in the table has three values to be compared – TRUE, FALSE or NULL. Therefore, it is known as three valued logic or ternary logic.
4. In SQL, what is the default sort order of the Order By clause?
A query result can be sorted either in ascending or descending order. We need to specify whether we need to sort the table or query result in ascending or descending order by specifying ASC or DESC in the query. When ASC or DESC is not specified in the query, the database considers the default sort order. In SQL, default sort order is ascending. Hence if we do not specify the sorting order, then it will automatically consider sorting it in ascending order.
Consider the JOBS table below. When no ORDER BY clause is used, then it gives the result in the order the data are entered.
Suppose we need to sort them in ascending order of JOB_ID. Note below that we have not used any sorting order. In the result, we can see that it is sorted in ascending order of JOB_ID. There is no harm in specifying ASC below query. Then also it will sort in ascending order.
SELECT * FROM JOBS ORDER BY JOB_ID;
In case we need to sort it in descending order, then we need to explicitly mention DESC in the query.
SELECT * FROM JOBS ORDER BY JOB_ID DESC;
5. In SQL, how do distinct and order by work together?
We know that DISTINCT keyword is used to get the distinct values out of a query (i.e.; without any duplicate values). The ORDER BY clause is used to sort the result in ascending or descending order.
Now consider a query where we find all distinct departments in the descending order of their ID. That means, we do not want to see any duplicate values in the result set. Also, we need them in descending order. This is straight forward. We need to select DISTINCT DEPARTMENT_IDs with ORDER BY DEPARTMENT_ID DESC.
SELECT DISTINCT DEPARTMENT_ID FROM EMPLOYEES ORDER BY DEPARTMENT_ID DESC;
The results are obvious and there is not any complex logic. Now let us twist the requirement little bit. Before twisting the query, first check how data appears in the EMPLOYEES table. That means each department and employee’s salary are listed in the order they are inserted into the table. They are not in the sorted order. When we say DISTINCT on these two columns, the query pulls the result in the same order as they are inserted. We can see the sample data for the same below.
SELECT DISTINCT DEPARTMENT_ID, SALARY FROM EMPLOYEES;
Now let us find all distinct departments and sort the result in the descending order of the SALARY of the employee. That means we need to list all the distinct departments. But listing the departments should be in such a way that the department of the employee with highest salary should be listed at the beginning, then department second highest salary of employee and so on. Before writing the query let us see whole EMPLOYEES data in descending order of their SALARY.
From above sample data of EMPLOYEES table, what we are expecting in our result set is that DEPARTMENT_ID should be in this order: 90, 80, 20, 100, 110, 30, 70, 60 and so on. Let us write the query now.
SELECT DISTINCT DEPARTMENT_ID FROM EMPLOYEES ORDER BY SALARY DESC;
But the result is:
Error! Why? What is wrong with our query? We have written the query for what is asked. But why we cannot execute this query? We could find the distinct departments when sorted based on DEPARTMENT_ID, but why can’t we sort using SALARY?
To answer this, we need to recall all those points discussed above and observe the query written by us carefully. How this query will be evaluated? Firstly, SELECT statement is executed to get the distinct DEPARTMENT_IDs. That means, it will select distinct DEPARTMENT_IDs in the order they appear or inserted into the table. This selection is not dependent on SALARY. Once the various department list is obtained, then the query is trying to sort the list based on SALARY which is not selected at all. Though we will have SALARY in EMPLOYEES table of the query, it will not again fire query to get the salaries. All the selections are over by the time sorting starts. Hence it does not get SALARY to sort. Therefore, it throws an error: “SALARY is NOT in the SELECT Expression”.
Therefore, if we need to select the DEPARTMENT_IDs based on the descending order of the SALARY, we need to select SALARY too.
SELECT DISTINCT DEPARTMENT_ID, SALARY FROM EMPLOYEES ORDER BY SALARY DESC;
But this will not give us distinct DEPARTMENT_IDs alone. It will give combination of distinct DEPARTMENT_ID and SALARY. Hence we have to modify this query even more.
There is another method to get distinct values by using GROUP BY clause. In this method it first selects all the distinct values from the table to create different groups. Hence if we group the records in EMPLOYEES table based on the DEPARTMENT_ID, we will get all the distinct values in the table. Next task is to sort the DEPARTMENT_IDs based on the descending order of the SALARY. In order to get the salary in descending order, we need to select the departments in such a way that highest salaried employee’s department should be listed first. Highest salary of the employee in the department is found by using MAX function on the same group by on DEPARTMENT_ID. That means each distinct department selected will have highest salary. i.e.;
SELECT DEPARTMENT_ID, MAX (SALARY) AS max_salary FROM EMPLOYEES GROUP BY DEPARTMENT_ID;
Now if we sort this result in descending order of salary we will get all the departments – salary in descending order of salary.
SELECT DEPARTMENT_ID, MAX (SALARY) AS max_salary FROM EMPLOYEES GROUP BY DEPARTMENT_ID ORDER BY max_salary DESC;
From this result we have to select DEPARTMENT_ID alone to get the requested result.
SELECT DEPARTMENT_ID FROM (SELECT DEPARTMENT_ID, MAX (SALARY) AS max_salary FROM EMPLOYEES GROUP BY DEPARTMENT_ID ORDER BY max_salary DESC);
This is how we select DISTINCT values from the table by sorting it on some other column of the table.