When a query is submitted to DB, it is parsed and translated to relational algebra. It is verified for its validity and correctness. Once it passes this stage, different ways of evaluating the query is generated. It is checked for various factors and its execution plan is generated. It may be based on cost of the query or based on the equivalence rules. Once cost based execution and rule based execution plans are generated, optimizer has to decide, which plan to be selected for evaluation. This is the most important step in processing a query.
As we have seen in other articles, the cost or the heuristic execution plan may not be always effective in all the tables with same type of query. They are all general guidelines to evaluate a query. There are lot many factors affecting the performance of a query. The evaluation plans exactly defines the system which plan / algorithm is to be used to evaluate, which index to be used etc.
Consider below example on EMP and DEPT tables.
SELECT EMP_ID, DEPT_NAME FROM EMP, DEPT WHERE EMP.DEPT_ID = DEPT.DEPT_ID AND EMP.DEPT_ID = 10 AND EMP.EMP_LAST_NAME = ‘Joseph’ ;
Above query selects the EMP_ID and DEPT_NAME from EMP and DEPT table for DEPT_ID = 10 with employee’s last name as ‘Joseph’. But when it is given to the DBMS, it divides the query into tokens and sees how it can be put together so that performance will be better. This is the duty of query optimizer. But altering the order of tokens in the query should not change the result. In either way it should give same result. Order of records can change and are least important. This is called equivalent query. There is set of rules to put tokens in the query. This is called equivalence rule.
Above query can be broken down by the DBMS in either ways below :
- Select the records of EMP with DEPT_ID = 10 and EMP_LAST_NAME = ‘Joseph’ first then join them with DEPT table to get all matching records of EMP and DEPT. Then select only the columns EMP_ID and DEPT_NAME to display the result.
- Select all matching records from EMP and DEPT, from which filter on DEPT_ID = 10 and EMP_LAST_NAME = ‘Joseph’ and then select only EMP_ID and DEPT_NAME to display.
- Select all matching records from EMP and DEPT, from which select only EMP_ID and DEPT_NAME and then filter on DEPT_ID = 10 and EMP_LAST_NAME = ‘Joseph’ and then.
Both the steps above are same irrespective of how it is performed. Hence both are called equivalent query. These are not written in SQL, but using relational algebra, or tree.
∏ EMP_ID, DEPT_NAME (σ DEPT_ID = 10 AND EMP_LAST_NAME = ‘Joseph’ (EMP) ∞DEPT)
∏ EMP_ID, DEPT_NAME (σ DEPT_ID = 10 AND EMP_LAST_NAME = ‘Joseph’ (EMP ∞DEPT))
σ DEPT_ID = 10 AND EMP_LAST_NAME = ‘Joseph’ (∏ EMP_ID, DEPT_NAME, DEPT_ID (EMP ∞DEPT))
The optimizer can produce relational expression and tree in above three formats. According to evaluation rule, the first query seems to be the best one. But considering other factors of tables, other plans may also be efficient. Let us see what are other factors affecting the performance and how a ideal execution plan is determined.