Basic SQL Interview Questions
1. What is the use of DISTINCT in SELECT Statement?
DISTINCT statement is used for eliminating the duplicate rows from the result. When we fire a SELECT statement, it will select all the matching rows from the table. These matching rows may have duplicate entries. When we use DISTINCT it will eliminate all the duplicate rows and retain single entry for such entries.
Consider the DEPARTMENTS table. It has LOCATION_ID which indicates the location of each department.
If we write a simple query to find the locations where departments exist, then it will list all the location ids irrespective of its row and duplicate values.
SELECT LOCATION_ID FROM DEPARTMENTS;
We can notice the duplicate entries of location ids in above result set. But we need not have duplicate values in the result set to know various locations of the departments. We have to eliminate the duplicate values here. Therefore, we use DISTINCT in above query to get different locations.
SELECT DISTINCT LOCATION_ID FROM DEPARTMENTS;
Now we do not have any duplicate values in the result. We can even apply DISTINCT on combination of columns. It need not be on single column like above.
Consider below query to list various MANAGER_ID and DEPARTMENT_ID from the EMPLOYEES table like below.
SELECT MANAGER_ID, DEPARTMENT_ID FROM EMPLOYEES;
We can see duplicate result values in above query. If we want to eliminate the duplicate entries, we can use DISTINCT clause. It will remove the duplicate values on the combination of columns. We can see duplicate values for MANAGER_ID but combination of MANAGER_ID and DEPARTMENT_ID does not have any duplicate values.
SELECT DISTINCT MANAGER_ID, DEPARTMENT_ID FROM EMPLOYEES;
2. Write a Query to find Maximum and Minimum values from the table without using any MAX () or MIN () functions.
Since we cannot use MAX or MIN aggregate functions, we need to think of logic for getting the maximum and minimum values. The logic for getting these values can be anything but it should return right value. Let us first discuss the simplest way of getting minimum and maximum values by sorting the data.
Consider the DEPARTMENTS table from which we need to find the LOCATION_ID with minimum value and maximum value.
SELECT LOCATION_ID FROM DEPARTMENTS;
It has 27 records with different LOCATION_IDs and is displayed in the order they appear in the table. How can we get minimum value of the LOCATION_ID here without using MIN () function? If we sort the LOCATION_ID column in ascending order, then we will get the smallest number at the first row and the biggest number at the last row. Hence if we sort above query in ascending order, then we will get the result starting from smallest number.
SELECT LOCATION_ID FROM DEPARTMENTS ORDER BY LOCATION_ID;
But we do not want all the LOCATION_IDs. We need only the smallest ID alone. Hence we need to restrict the number of rows in the SELECT Statement above to one row by using ROWNUM. But ROWNUM condition will be given in the WHERE clause. If we restrict the number of rows in WHERE clause, then first row is selected first and then it will be sorted. This will not give us the required smallest number.
SELECT LOCATION_ID FROM DEPARTMENTS WHERE ROWNUM
We need to get 1400 as smallest number. Therefore, we need to sort the records first and then we need to select the smallest number by limiting the rows like below. This will give us correct result.
SELECT LOCATION_ID FROM (SELECT LOCATION_ID FROM DEPARTMENTS ORDER BY LOCATION_ID) WHERE ROWNUM <2;
This is how we got smallest number. What should we do to get largest number? We need to sort the column in descending order. Then largest number will come at the top and we can select it by restricting the number of rows to one.
SELECT LOCATION_ID FROM (SELECT LOCATION_ID FROM DEPARTMENTS ORDER BY LOCATION_ID DESC) WHERE ROWNUM <2;
Suppose we need to find minimum and maximum values without using any min, max function or order by clause. Then we have to think of complex logic to get the minimum and maximum values. In this case we need to think little bit more about the logic and query. We need to consider the same table twice and compare each value with rest of them to get max and min values. Let us start step by step to get the values.
As a first step, how to compare the values from same table? We need to have DEPARTMENTS table twice in our query with self-join. But self-join is not equijoin here (values are not compared with equal to, ‘=’). We have to use greater than or lesser than operator to get smaller and larger values. i.e.;
SELECT smaller.LOCATION_ID smaller_loc_id, larger.LOCATION_ID large_loc_id FROM DEPARTMENTS smaller, DEPARTMENTS larger WHERE smaller.LOCATION_ID < larger.LOCATION_ID;
Here each LOCATION_ID is compared against rest of the LOCATION_IDs by using ‘<’ comparison. Hence in the result set we will get all the smaller LOCATION_IDs and corresponding larger location ID combinations like below.
This is not the final. We need to extract the smaller and larger values from this list. But how? What could be the logic to get the values from this list? In above result set, the first row first column will have smallest number. We need to extract it by limiting the number of rows to one. In addition, we do not want combination of smaller and larger values. We need only smaller number. Hence we can eliminate the second column from above list.
SELECT smaller.LOCATION_ID smaller_loc_id FROM DEPARTMENTS smaller, DEPARTMENTS larger WHERE smaller.LOCATION_ID < larger.LOCATION_ID AND ROWNUM <2;
Now how we will get largest number? One method is to extract the last row second column. But we cannot use ORDER BY to sort and get the value. Another method is to use the same logic as above in little different way. In the first column above we will have all the smaller value compared against larger value. The largest value will not have any value to be compared and it will not be listed in the first column. Hence if we extract that number by writing a query on DEPARTMENTS to find which number is not listed in above query, we will get the largest number.
SELECT LOCATION_ID FROM DEPARTMENTS WHERE LOCATION_ID NOT IN (SELECT smaller.LOCATION_ID smaller_loc_id FROM DEPARTMENTS smaller, DEPARTMENTS larger WHERE smaller.LOCATION_ID < larger.LOCATION_ID);
These are the two ways of getting smaller and larger numbers from the table. There will be other methods too depending on your thinking about logic.
3. What is ternary (also known as three-valued) logic in SQL?
In SQL, any column in the table can take some valid value or UNKNOWN value. When the value of the column is UNKNOWN, we represent it by using NULL or some blank column which is not actually blank.
When we have to compare the column values we usually think that either it has some value or it does not have any value. But NULL does not indicate not having any value. It represents that the value in the column is not still known. Hence in above table, when we write query like below it returns nothing.
That means, we cannot compare NULL like above with ‘=’ operator. Since NULL (a UNKNOWN Value) is not equal to another NULL (UNKNOWN value), comparing NULL using ‘equal to’ or ‘not equal to’ or ‘greater than’ or ‘less than’ etc. are meaningless. Hence it does not return any value.
Suppose we want to list all the STATE_PROVINCE which are not NULL.
It does not return any rows that have STATE_PROVICE value. That means, it cannot compare this UNKNOWN value. Therefore, SQL provides special syntax to compare NULL and Non-NULL value using ‘IS NULL’ or IS NOT NULL’. Hence any valid values (which are not NULL) can be compared using ‘=’ or ‘<>’ values. When we have to compare NULL or NOT NULL values (a third value of the column), we have to use third operator ‘IS NULL’ or ‘IS NOT NULL’ operator.
SELECT * FROM LOCATIONS WHERE STATE_PROVINCE IS NULL;
SELECT * FROM LOCATIONS WHERE STATE_PROVINCE IS NOT NULL;
When we have to compare any valid values like whether it is equal or not (TRUE or FALSE – the two common values of the column), we can use any comparison operator.
Note below that, the ‘<>’ operator has not returned any records with NULL STATE_PROVICE. This is because it could not compare the third value ‘NULL’ using ‘<>’ operator.
Hence, any column in the table has three values to be compared – TRUE, FALSE or NULL. Therefore, it is known as three valued logic or ternary logic.
4. In SQL, what is the default sort order of the Order By clause?
A query result can be sorted either in ascending or descending order. We need to specify whether we need to sort the table or query result in ascending or descending order by specifying ASC or DESC in the query. When ASC or DESC is not specified in the query, the database considers the default sort order. In SQL, default sort order is ascending. Hence if we do not specify the sorting order, then it will automatically consider sorting it in ascending order.
Consider the JOBS table below. When no ORDER BY clause is used, then it gives the result in the order the data are entered.
Suppose we need to sort them in ascending order of JOB_ID. Note below that we have not used any sorting order. In the result, we can see that it is sorted in ascending order of JOB_ID. There is no harm in specifying ASC below query. Then also it will sort in ascending order.
SELECT * FROM JOBS ORDER BY JOB_ID;
In case we need to sort it in descending order, then we need to explicitly mention DESC in the query.
SELECT * FROM JOBS ORDER BY JOB_ID DESC;
5. In SQL, how do distinct and order by work together?
We know that DISTINCT keyword is used to get the distinct values out of a query (i.e.; without any duplicate values). The ORDER BY clause is used to sort the result in ascending or descending order.
Now consider a query where we find all distinct departments in the descending order of their ID. That means, we do not want to see any duplicate values in the result set. Also, we need them in descending order. This is straight forward. We need to select DISTINCT DEPARTMENT_IDs with ORDER BY DEPARTMENT_ID DESC.
SELECT DISTINCT DEPARTMENT_ID FROM EMPLOYEES ORDER BY DEPARTMENT_ID DESC;
The results are obvious and there is not any complex logic. Now let us twist the requirement little bit. Before twisting the query, first check how data appears in the EMPLOYEES table. That means each department and employee’s salary are listed in the order they are inserted into the table. They are not in the sorted order. When we say DISTINCT on these two columns, the query pulls the result in the same order as they are inserted. We can see the sample data for the same below.
SELECT DISTINCT DEPARTMENT_ID, SALARY FROM EMPLOYEES;
Now let us find all distinct departments and sort the result in the descending order of the SALARY of the employee. That means we need to list all the distinct departments. But listing the departments should be in such a way that the department of the employee with highest salary should be listed at the beginning, then department second highest salary of employee and so on. Before writing the query let us see whole EMPLOYEES data in descending order of their SALARY.
From above sample data of EMPLOYEES table, what we are expecting in our result set is that DEPARTMENT_ID should be in this order: 90, 80, 20, 100, 110, 30, 70, 60 and so on. Let us write the query now.
SELECT DISTINCT DEPARTMENT_ID FROM EMPLOYEES ORDER BY SALARY DESC;
But the result is:
Error! Why? What is wrong with our query? We have written the query for what is asked. But why we cannot execute this query? We could find the distinct departments when sorted based on DEPARTMENT_ID, but why can’t we sort using SALARY?
To answer this, we need to recall all those points discussed above and observe the query written by us carefully. How this query will be evaluated? Firstly, SELECT statement is executed to get the distinct DEPARTMENT_IDs. That means, it will select distinct DEPARTMENT_IDs in the order they appear or inserted into the table. This selection is not dependent on SALARY. Once the various department list is obtained, then the query is trying to sort the list based on SALARY which is not selected at all. Though we will have SALARY in EMPLOYEES table of the query, it will not again fire query to get the salaries. All the selections are over by the time sorting starts. Hence it does not get SALARY to sort. Therefore, it throws an error: “SALARY is NOT in the SELECT Expression”.
Therefore, if we need to select the DEPARTMENT_IDs based on the descending order of the SALARY, we need to select SALARY too.
SELECT DISTINCT DEPARTMENT_ID, SALARY FROM EMPLOYEES ORDER BY SALARY DESC;
But this will not give us distinct DEPARTMENT_IDs alone. It will give combination of distinct DEPARTMENT_ID and SALARY. Hence we have to modify this query even more.
There is another method to get distinct values by using GROUP BY clause. In this method it first selects all the distinct values from the table to create different groups. Hence if we group the records in EMPLOYEES table based on the DEPARTMENT_ID, we will get all the distinct values in the table. Next task is to sort the DEPARTMENT_IDs based on the descending order of the SALARY. In order to get the salary in descending order, we need to select the departments in such a way that highest salaried employee’s department should be listed first. Highest salary of the employee in the department is found by using MAX function on the same group by on DEPARTMENT_ID. That means each distinct department selected will have highest salary. i.e.;
SELECT DEPARTMENT_ID, MAX (SALARY) AS max_salary FROM EMPLOYEES GROUP BY DEPARTMENT_ID;
Now if we sort this result in descending order of salary we will get all the departments – salary in descending order of salary.
SELECT DEPARTMENT_ID, MAX (SALARY) AS max_salary FROM EMPLOYEES GROUP BY DEPARTMENT_ID ORDER BY max_salary DESC;
From this result we have to select DEPARTMENT_ID alone to get the requested result.
SELECT DEPARTMENT_ID FROM (SELECT DEPARTMENT_ID, MAX (SALARY) AS max_salary FROM EMPLOYEES GROUP BY DEPARTMENT_ID ORDER BY max_salary DESC);
This is how we select DISTINCT values from the table by sorting it on some other column of the table.
6. In SQL, how and when would you do a group by with multiple columns? Also provide an example.
GROUP BY is the clause in SQL which is used to group the result based on one or more columns. Let us explain this question using example of EMPLOYEES table. Suppose we need to find the total salary of employees. Here the request is to find the total salary of all employees. No individual salary or no other condition is mentioned. That means we need to find the sum of SALARY column. It is not based on any column or condition. Hence it would be:
SELECT SUM (SALARY) FROM EMPLOYEES;
Here result has given sum of salary of all the employees. It does not give us much information about employee or department or their job ids.
Now, consider we have to see the total salary of employees in each department. This request is little different from above. We need to find total salary based on the departments. That means, first we need to list distinct departments for which employees exist in EMPLOYEES table. Then we need to find the SUM of their salaries. In other words, we have group the table data based on DEPARTMENT_ID and then find the sum of salaries.
SELECT DEPARTMENT_ID, SUM (SALARY) TOTAL_SALARY FROM EMPLOYEES GROUP BY DEPARTMENT_ID;
Here the sum of salary is obtained based one column –DEPARTMENT_ID. Let us change the requirement even more. Consider that we need to find the total salary of employees based on their department and job ids. That means, we need to show the DEPARTMENT_ID and then various JOB_IDs in its department. For each of this combination, we need to find the total salary.
SELECT DEPARTMENT_ID, JOB_ID, SUM (SALARY) TOTAL_SALARY FROM EMPLOYEES GROUP BY DEPARTMENT_ID, JOB_ID;
Here two columns are used to find the total salary. That means more than one column is used to get the aggregate value. This is how and why we use multiple columns in group by clause.
7. In the table below, how would you retrieve the unique values for the EMPLOYEE_LOCATION without using the DISTINCT keyword ?
Usually DISTINCT keyword is used to eliminate all the duplicate values from the result set. But there are other techniques also to get the results without any Duplicate values. One of the methods is to use ‘GROUP BY’ clause. The GROUP BY clause is used to group the records based on one or more columns. Usually we use this to get the aggregate results like COUNT, SUM, MAX, MIN and AVG. But this is the common use of GROUP BY clause.
Apart from finding aggregate, if used on SELECT statement, it gives only distinct records. When we use GROUP BY in the query, we need to make sure that the columns that are selected in the SELECT statement should be included in the GROUP BY clause too. Hence if we are going to select all the columns of the table, then all those columns have to be included in group by too. If we are selecting only one or few columns of the table, then only that column/s have to be included in the group by clause.
Consider the EMPLOYEES table, where JOB_ID is the column which contains job ids of different employees.
SELECT JOB_ID FROM EMPLOYEES;
But result has duplicate values. If we write the same query using GROUP BY clause, it first find the distinct values of JOB_IDs from the table and list them. Then it will check what to be done with this grouped value. But we need only first step to get the distinct values from the table
SELECT JOB_ID FROM EMPLOYEES GROUP BY JOB_ID;
Now we do not have any duplicate values in the result.
8. Write a SQL query to get the second highest salary from EMPLOYEES table. Also write a query to find the nth highest salary in SQL, where n can be any number.
Finding the highest salary in the table or finding any highest value is simple and can be achieved by writing GROUP BY clause and MAX function. But here we have to find second highest salary.