Joins in SQL Interview Questions

What is Join and Why it is required?

Database is mainly composed of many tables. If we access any one table at a time, then we will not be using the real use of database. For example, consider a employee database. When we say employee database, it has many tables like EMPLOYEE, DEPARTMENT, and PROJECT etc. If we access EMPLOYEE table alone, then we will get only informations about employees. It will not give us the clear details about his department or projects.
Below is an EMPLOYEE table it has very basic information about employees. If we want to know his department or manager, it gives only their IDs. We will not get any details about his department or manager.

Suppose we want to know department details about an employee then we have to first get the department id of that employee from the EMPLOYEE table and then we have to go to DEPARTMENT table to get the details for that ID. But this kind of fetching is tedious. It will not give us the result in one shot, it will need some intermediary/ temporary memory – variables to store the Department ID of the employee, and moreover it will not be performed in single query.

This is the case of fetching department details for one employee. What if we have to search department details for 10, 100, 1000 etc? What will be the effort involved? How many queries need to be fired? How much intermediary space needs to be allocated to hold the intermediary department id? What is the guarantee that department details fetched by second queries mapped with the correct employee in the first query?

To avoid all these efforts, confusion and memory wastage, SQL uses a technique called joins. In this technique, it combines two or more tables by means of common column to get the requested result. In other words, two or more tables are combined to get all the rows and columns from all the participating tables and only those rows are retained in the final result whose columns in common are matching.

In our example above EMPLOYEE and DEPARTMENT tables are joined together to get the department details of the employee. Here what is done is both the tables are joined to get all the rows and columns from both the tables and then only rows that are matching to columns in common are retained. That means, DEPARTMENT_ID is the common column in both EMPLOYEE and DEPARTMENT table which is used link these two tables. In other words, using the DEPARTMENT_ID in EMPLOYEE table we can connect to DEPARTMENT table to get more details about that department.

Below is the snapshot of data in both the tables.

We can notice here that EMPLOYEE_ID = 100, Steven has DEPARTMENT_ID = 90. If we need to get more details about DEPARTMENT_ID = 90, we will get it in the DEPARTMENT table against DEPARTMENT_ID = 90. Hence the query using join to get the Steven’s department would be like below:

SELECT dept.*
FROM DEPARTMENTS dept, EMPLOYEES emp
WHERE dept.DEPARTMENT_ID = emp.DEPARTMENT_ID
AND emp.FIRST_NAME = 'Steven';

We notice here that we got two department details for Steven. This is because, we have two different employees named Steven in our EMPLOYEE table. Both of them are working for different departments – 50 and 90. Hence we got two department details.

To be more specific on why we got two department details, we can add employee details also in our SELECT query above. It will make the result clearer and accurate. However selecting the columns depends on user’s request.

SELECT emp.EMPLOYEE_ID, emp.first_name, emp.LAST_NAME, dept.*
FROM DEPARTMENTS dept, EMPLOYEES emp
WHERE dept.DEPARTMENT_ID = emp.DEPARTMENT_ID – Join Condition
AND emp.FIRST_NAME = 'Steven'; – Filtering Condition

Now it is very clear why we got two department details.
In above queries we can note that ‘WHERE’ clause is used to combine two tables using common columns in the table. This is how and why we use joins in the query.

Suppose both the Steven in the EMPLOYEE table work for the same Department. What will be the result of query and why?

SELECT emp.EMPLOYEE_ID, emp.first_name, emp.LAST_NAME, dept.*
FROM DEPARTMENTS dept, EMPLOYEES emp
WHERE dept.DEPARTMENT_ID = emp.DEPARTMENT_ID
AND emp.FIRST_NAME = 'Steven';

This is because, we are not filtering the result based on the DEPARTMENT_ID. Here query is written to fetch the DEPARTMENT_ID of Steven and then the details about that Department. Hence the query will first fetch all the Steven in the EMPLOYEE table, and then retrieve all the departments of Steven. Here it will not consider the department of both the Steven as same. Because, here both the Stevens are different and hence their department is also considered as different. To be more clear, above query will first retrieve all the DEPARTMENT_ID of the employees whose name is ‘Steven’. Hence it will give all the department ids without filtering the duplicates. Hence we get two rows from the EMPLOYEE table with DEPARTMENT_ID. Now this DEPARTMET_ID is fetched in the DEPARTMENT table to get its details. Since we have two department ids (even though they are same), DEPARTMENT table is queried twice to get the details. Hence we get two rows.

Suppose we have removed employee details from the SELECT list and retained only DEPARTMENT details. Do you think it will show single row now?

SELECT dept.*
FROM DEPARTMENTS dept, EMPLOYEES emp
WHERE dept.DEPARTMENT_ID = emp.DEPARTMENT_ID
AND emp.FIRST_NAME = 'Steven';

The answer is NO. Still it will show duplicate rows. This is because of the same reason above. That means, it is not the SELECT list data that are determining the duplicity of the rows. This is all because of the filter condition ‘emp.FIRST_NAME = ‘Steven’’.

How can we avoid these duplicate rows in the result? There are different ways to do this.

1.    Suppose we are selecting only department details like below query.

SELECT dept.*
FROM DEPARTMENTS dept, EMPLOYEES emp
WHERE dept.DEPARTMENT_ID = emp.DEPARTMENT_ID
AND emp.FIRST_NAME = 'Steven';

Since here both the Steven in DEPARTMENT table works for the same department, we are getting same department twice. So, we can use DISTINCT clause to remove any duplicate rows here. This DISTINCT clause will remove duplicate only if their departments are same.

2. Another method to get the accurate result is by writing very accurate and specific query. In above case, we had filter condition using employee name. But employee name need not be unique. There can be many employees with same name. If we need to get the result for specific employee, then we need to be very specific while filtering his details from EMPLOYEE table. Like we said above, we cannot filer correct employee by using his name. His details will be filtered correctly by using his ID which is very unique for each employee. Hence in above query rather than specifying the name of the employee, if we specify the ID of the employee, then we will get the more accurate result.

To specify more clearly about the result, we can include employee details also in the select list. However it will not change our result set.

Modifying the results is always depends on the requirement. In above cases, if the request is to query the tables based on employee name, then we can use only DISTINCT to avoid any duplicate rows. But if they work for different department, then above two solutions will not give any affect.

What is Inner Join?

This is the simplest form of joins. Here two or more tables are joined based on the common column/s in the respective tables to get the requested result. This type of join is also known as equijoin. Here two or more tables are joined using ‘=’ in the WHERE clause. It will return the records only if there is matching records in both the tables.

Consider the same example of EMPLOYEE and DEPARTMENT table. Suppose we want to see the department name for which each employee works. As we have learnt above, this is done by using the join condition on DEPARTMENT_ID on both the tables. Here we need to list all the employees from the EMPLOYEE table and for which we need to show the department name depending on their DEPARTMENT_ID. i.e.;

SELECT emp.EMPLOYEE_ID, emp.first_name, emp.LAST_NAME, 
	dept.DEPARTMENT_ID, dept.DEPARTMENT_NAME
FROM DEPARTMENTS dept, EMPLOYEES emp
WHERE dept.DEPARTMENT_ID = emp.DEPARTMENT_ID;

Here we can see that for each employee in the EMPLOYEE table, we have one department Id and have matching details in the DEPARTMENT table. We have only those departments in the EMPLOYEE table that has details in the DEPARTMENT table. Hence when we perform inner join on the table we get matching rows only.

Suppose we have some departments in DEPARTMENT table for which still no employees are assigned. These departments are not at all obtained by the inner query. That means when we equijoin the tables, it will fetch for below department Id in both the table. If it finds them in both the table, then it will display the results based on the SELECT list items.

What is Outer Join?

This is another powerful joining technique in SQL. Here it works same as inner join with little difference. When outer join is used it lists all the records that are matching the joining condition like in inner joins and it also lists the records from either of the table which do not have matching records. Here displaying non-matching records from which table is depend on the type of outer join.

If we consider the same example as above with EMPLOYEE and DEPARTEMENT tables, to pull all the employees who are working for the departments in DEPARTMENT table, then we can see the result like below.

SELECT dept.*,emp.*
FROM DEPARTMENTS dept,EMPLOYEES emp
WHERE dept.DEPARTMENT_ID = emp.DEPARTMENT_ID (+);

OR

SELECT dept.*,emp.*
FROM DEPARTMENTS dept LEFT OUTER JOIN EMPLOYEES emp
ON dept.DEPARTMENT_ID = emp.DEPARTMENT_ID ;

This query will pull all the records from DEPARTMENTS table first. Then it fetches all the employees in EMPLOYEES table who are working for the departments in it. It lists all the matching records from the EMPLOYEES table. There might be few departments in DEPARTMENTS table for which there will not be any employees. For such case, outer join will display the department details and its corresponding employees as NULL.

This is how outer join works. In this method of joining we can see that all the matching records from the tables participating in joining condition will be displayed first. Then all the records for which match is not found in the second table is marked as null. This is one of the techniques in outer joins.

What are different types of Outer Join?

Outer join is the technique to get the matching records from the tables that are participating in join and the records that do not have match. But selecting the records that do not have matching record is either from one of the table or from both the tables. According to the method select to pull the records from the table, there are 3 types of outer joins –left outer join, right outer join and full outer join.

  • Left Outer Join : This is the outer join method where all the records from the table on the left of the join condition are listed and the records for which match is not found in the right table NULL is displayed. That means, if we write ‘A LEFT OUTER JOIN B’, then all the records from A is displayed and their corresponding records from table B is displayed against them. For the records in A for which match is not found in table B, NULLs are displayed.

Consider the example where we need to display all the department and their employees. Here, we need to show all the departments. If we have employees working in that department, then we need to show those employees. If there are no employees for the department then we need to list those departments but their employee as NULL.

SELECT dept.*,emp.*
FROM DEPARTMENTS dept,EMPLOYEES emp
WHERE dept.DEPARTMENT_ID = emp.DEPARTMENT_ID (+);

OR

SELECT dept.*,emp.*
FROM DEPARTMENTS dept LEFT OUTER JOIN EMPLOYEES emp
ON dept.DEPARTMENT_ID = emp.DEPARTMENT_ID ;

Here DEPARTMENTS is the LEFT table and EMPLOYEES is the right table. When we write the clause ‘LEFT OUTER JOIN’ it pulls all the records from left table with their matching records from the right table. If there are no matching records in the right table, it displays NULLs for their columns.

There is another method of writing left outer join using ‘(+)’ sign. When we perform left outer join, we write ‘(+)’ on the right side of the join condition to indicate that pull all the records from the left side table.

  • Right Outer Join : This is the outer join method where all the records from the table on the right of the join condition are listed and the records for which match is not found in the right table NULL is displayed. That means, if we write ‘A RIGHT OUTER JOIN B’, then all the records from B is displayed and their corresponding records from table A is displayed against them. For the records in B for which match is not found in table A, NULLs are displayed.

Consider the same example where we need to display all the department and their employees. Here, we need to show all the departments. If we have employees working in that department, then we need to show those employees. If there are no employees for the department then we need to list that department but their employee as NULL. But here we are going to use RIGHT outer join. Hence we will use ‘RIGHT OUTER JOIN’ or ‘(+)’. While using ‘(+)’ sign, we will change the position of the sign to indicate that it is right outer join like below:

SELECT dept.*,emp.*
FROM DEPARTMENTS dept,EMPLOYEES emp
WHERE dept.DEPARTMENT_ID(+) = emp.DEPARTMENT_ID;

OR

SELECT dept.*,emp.*
FROM DEPARTMENTS dept RIGHT OUTER JOIN EMPLOYEES emp
ON dept.DEPARTMENT_ID = emp.DEPARTMENT_ID ;

  • Full Outer Join : This is another type of outer join where both left and right outer joins are combined. That means, at first all the matching records from both the tables are listed, then all the records from both the tables are listed with NULLs to columns from other table for which match is not found.

Here we cannot use ‘(+)’ symbol to write full outer join query. We need to write ‘FULL OUTER JOIN’ to get the result. In our example of EMPLOYEES and DEPARTMENT, when we perform full outer join, it will list all the matching records from DEPARTMENTS and EMPLOYEES and then it will list rest of the departments for which no employees are present. Then it will list all the employees who are not yet assigned to any department. The columns of the table for which match is not found is shown as NULL. We can see NULLs in DEPARTMENTS and EMPLOYEES columns below for which match is not found.

SELECT dept.*,emp.*
FROM DEPARTMENTS dept FULL OUTER JOIN EMPLOYEES emp
ON dept.DEPARTMENT_ID = emp.DEPARTMENT_ID

What is Difference between Inner Join and Outer Join?

Both inner join and outer join are the joining technique where matching records from the participating tables are displayed. That is when two tables are joined, based on the common column in both the tables, records are pulled.
In inner join method, when two tables are joined, only those records from both the tables are retrieved for which there exact match is based on the common column. This common column is based on the primary key of one table and is foreign key in other table. When there is exact match for these two column values records are retrieved.

SELECT dept.*,emp.*
FROM DEPARTMENTS dept INNER JOIN EMPLOYEES emp
ON dept.DEPARTMENT_ID = emp.DEPARTMENT_ID;


That means, only DEPARTMENT_ID in the DEPARTMENTS matches with the DEPARTMENT_ID of the EMPLOYEES, then records are displayed.

In the outer join method, in addition to the exact match between the primary and foreign key of the tables, extra records from either of the table is retrieved based on the type of join. If left outer join is used all the records from the left table is listed and their corresponding matching record from the other table is shown against them. If there is no match in the other table then NULLs are displayed against the left table record. If the right outer join is used, then all the records from the right table is listed, then corresponding matching records from the left table is displayed against the right record. If there is no match against the right record, NULLs are displayed against them. If the full outer join is used, then all the records from both the tables are displayed: matching records are displayed in the same row. If there is no match for any record in either of the table, NULLs are displayed.

What is Self Join?

This is another joining technique where same table is joined with itself. Usually in a join, two different tables with parent – child relation are joined to get the matching data. In a self join same table is joined to get data. Usually this kind of join is useful to get the hierarchical data.

Suppose we need to find the details about the manager of EMPLOYEE_ID = 115. Here manager is also an employee. Hence we will have his details in EMPLOYEES table. But we do not know which manager. We want to get manager details of another employee whose id is 115. That means, first we need to get the manager id from the EMPLOYEES table for the employee 115. Then we again have to query EMPLOYEES table to get the details of the manager. That means we have to query twice EMPLOYEES table: first time to get manager Id of the employee, and next to get the details of the manager. Hence we need to use EMPLOYEES twice in our query as employee and manager. But when we write table name twice, the compiler will not understand for what purpose we have used twice or which table is fetching manager id and which table is giving manager details. Hence we have to use alias name for the table to differentiate both the tables. When we use alias to the tables, here even though both of them are referring to the same table, compiler considers them as two different tables – employee and manager. But we are going to join the same table to get the result. Hence it is a self join.

SELECT emp.EMPLOYEE_ID, emp.FIRST_NAME, mgr.*
FROM EMPLOYEES emp, EMPLOYEES mgr
WHERE emp.MANAGER_ID = mgr.EMPLOYEE_ID
AND emp.EMPLOYEE_ID = 115;

Here emp is used to refer employee 115, and mgr is used to retrieve his manager details. When we join these two EMPLOYEES table, we join the manager Id of the employee 115 with the employee id of the mgr.  Thus we get the details about his manager.

Suppose we want to find managers of the all the employees in the EMPLOYEES table. Then we have to remove the filter ‘AND emp.EMPLOYEE_ID = 115’ from above query. Now we will get all the heirachy of the employees and managers.

SELECT emp.EMPLOYEE_ID, emp.FIRST_NAME, 
	mgr.EMPLOYEE_ID mgr_id, mgr.FIRST_NAME, 
	mgr.LAST_NAME, mgr.MANAGER_ID, mgr.DEPARTMENT_ID
FROM EMPLOYEES emp, EMPLOYEES mgr
WHERE emp.MANAGER_ID = mgr.EMPLOYEE_ID;

Now we can see that Steven with employee id is the top most managers in the company. He does not have any manager. He has his manager Id as NULL.

Then it starts listing all those employees who are manager and their manager is Steven .

 

This is how self join works. We can use self join to get the employees who are working in same department or location of another employee, or to get the list of employee whose salary is greater than any other particular employee etc.

Is Self Join same as Inner Join?

At first glance both self join and inner join may look like same. But both of them are different concepts even though they are related to joins. We can say self join uses concepts of inner join or outer join to implement it. That means, inner join and outer join are method of joining any two same or different table. When same tables are used for joining (may be inner or outer join), then it is called as self join. That means, self join can use either inner join on same tables or outer join (any outer join – left, right or full) to get the requested result. The join method used in the self join depends on the query request.

In the example above to find the manager details of the employee, we have used self join on EMPLOYEES table. In this self join we have used inner join method to join two EMPLOYEES table to get the result. Here the request is simple one – to find the managers of all the employees. In other words, in EMPLOYEES table we have EMPLOYEE_ID and his MANAGER_ID. Hence if we try to match the MANAGER_ID of any employee with any other EMPLOYEE_ID in the table, we will get the manager details (i.e.; querying like below).

SELECT emp.EMPLOYEE_ID, emp.FIRST_NAME, 
	mgr.EMPLOYEE_ID mgr_id, mgr.FIRST_NAME, 
	mgr.LAST_NAME, mgr.MANAGER_ID, mgr.DEPARTMENT_ID
FROM EMPLOYEES emp, EMPLOYEES mgr
WHERE emp.MANAGER_ID = mgr.EMPLOYEE_ID;

But this will not show the top most managers (here, Steven) in the list as he does not have any manager.

Suppose we want to list Steven’s details also using above query. That means, we want to list all the employees who has or has not managers (please note that the first request was to list all the employees who has manager, now we want to see all the employees and their managers, if there is).  From this it is very clear that we will get employee and his manager details from EMPLOYEES table only. But we have to differentiate the EMPLOYEES table to get employee detail and manager detail. Hence we need two EMPLOYEES table with their alias name to differentiate them. Now if we use the self join like above using inner join method, we will end up in same result as above. When we perform inner join on EMPLOYEE_ID and MANAGER_ID, it will check for the exact match between the EMPLOYEE_ID and MANAGER_ID. If it gets any employees who does not have any manager (where MANAGER_ID = NULL), it cannot compare and is not an exact match too. Hence it will not list such employees. In order to list all the employees irrespective of them having their managers, we need to use outer join. i.e.; list all the employees from emp and their matching manager details from mgr. In addition list all the employees from emp who does not have manger.

SELECT emp.EMPLOYEE_ID, emp.FIRST_NAME, 
	mgr.EMPLOYEE_ID mgr_id, mgr.FIRST_NAME, 
	mgr.LAST_NAME, mgr.MANAGER_ID, mgr.DEPARTMENT_ID
FROM EMPLOYEES emp, EMPLOYEES mgr
WHERE emp.MANAGER_ID = mgr.EMPLOYEE_ID (+);

OR

SELECT emp.EMPLOYEE_ID, emp.FIRST_NAME, 
	mgr.EMPLOYEE_ID mgr_id, mgr.FIRST_NAME, 
	mgr.LAST_NAME, mgr.MANAGER_ID, mgr.DEPARTMENT_ID
FROM EMPLOYEES emp LEFT OUTER JOIN EMPLOYEES mgr
ON emp.MANAGER_ID = mgr.EMPLOYEE_ID;

Now we can see all the employees have been listed along with their manager details. We can even Steven in this new result set who does not have any manager.  This is how we use self join and outer join together.

What is the difference between a left outer join and a right outer join?

Left outer join is the one which keeps all the rows of the left table and displays the matching rows from the right table. If there is no match in the right table, then it displays null for them. Here left table is the one which appears first in the SELECT statement.

For example, consider the EMPLOYEES and DEPARTMENT example, where all the departments and its employees are displayed.  Here DEPARTMENTS is the left table and EMPLOYEES is the right table. When we perform left outer join like below, all the records from DEPARTMENTS are listed, then its corresponding matching records are displayed. In case there are no employees for any department, then NULL is shown against them.

SELECT dept.*,emp.*
FROM DEPARTMENTS dept LEFT OUTER JOIN EMPLOYEES emp
ON dept.DEPARTMENT_ID = emp.DEPARTMENT_ID ;

In the right outer join all the records from the right table is shown and its corresponding matching records from the left table is displayed. If there is no matching record then NULL is displayed against them.

SELECT dept.*,emp.*
FROM DEPARTMENTS dept RIGHT OUTER JOIN EMPLOYEES emp
ON dept.DEPARTMENT_ID = emp.DEPARTMENT_ID ;

Here DEPARTMENTS is the left table and EMPLOYEES is the right table. When we perform right outer join, all the records from right table is displayed. i.e.; all the employees are listed. If they have matching DEPARTMENT_ID in the DEPARTMENTS table, then it is displayed. If not, then department details are shown as NULL.

Should I use a Right Outer Join or A Left Outer Join?

In left outer join and right outer join, as the name itself says, it will show all records from the left and right table respectively. Apart from the records displayed from the tables, there is no difference in the functionalities of these two joins. Using left outer join or right outer join depends on the position of the table being used in the query.

Consider the same example of EMPLOYEES and DEPARTMENTS, where we need to show all the EMPLOYEES and their respective departments. If we use EMPLOYEES first in the query then we will use LEFT outer join to achieve our requirement. If DEPARTMENTS is used first in the query, then we will use RIGHT outer join in the query. In both the cases, we will get same result. Only difference is position of the EMPLOYEES table in the query.

SELECT emp.*, dept.*
FROM EMPLOYEES emp LEFT OUTER JOIN DEPARTMENTS dept
ON dept.DEPARTMENT_ID = emp.DEPARTMENT_ID ;
SELECT emp.*, dept.*
FROM DEPARTMENTS dept RIGHT OUTER JOIN EMPLOYEES emp
ON dept.DEPARTMENT_ID = emp.DEPARTMENT_ID ;

Thus, depending on our requirement and based on the position of the tables in the query we can use LEFT or RIGHT outer join.

What is the difference between a left join and a left outer join?

There is no difference between Left Outer Join and Left Join. Both are same and yield same result when used in the query.

SELECT emp.*, dept.*
FROM EMPLOYEES emp LEFT OUTER JOIN DEPARTMENTS dept
ON dept.DEPARTMENT_ID = emp.DEPARTMENT_ID ;

 

SELECT emp.*, dept.*
FROM EMPLOYEES emp LEFT JOIN DEPARTMENTS dept
ON dept.DEPARTMENT_ID = emp.DEPARTMENT_ID ;

What is the difference between a right outer join and a right join?

There is no difference between Right Outer Join and Right Join. Both are same and yield same result when used in the query.

SELECT emp.*, dept.*
FROM DEPARTMENTS dept RIGHT OUTER JOIN EMPLOYEES emp
ON dept.DEPARTMENT_ID = emp.DEPARTMENT_ID ;
SELECT emp.*, dept.*
FROM DEPARTMENTS dept RIGHT OUTER JOIN EMPLOYEES emp
ON dept.DEPARTMENT_ID = emp.DEPARTMENT_ID ;
Translate »