Optimization is part of RDBMS. It analyzes the query and tells the parser which path is the best path to execute the query. This is very much essential because when best path is selected to execute the query, it saves lots of time and resource.
Consider a query to find the name of the employee whose phone number is 456-234-876. Here, the query optimizer will analyze the query first.
SELECT FIRST_NAME, LAST_NAME FROM EMPLOYEES WHERE PHONE_NUMBER = '456.234.876';
The optimizer will see that there is a filter condition on phone number and for that phone number first and last name needs to be selected. If the optimizer chooses full table scan, it will have to traverse each and every record in the EMPLOYEES table till it get above phone number. It may get this number at the beginning records or may be at the last records. So time consumed to get the last record in a very huge table like EMPLOYEES is more. At the same time optimizer sees that there is an index on PHONE_NUMBER column. This index will directly points to the address where the record with this phone number is saved. Therefore it understands that selecting the record using index scan is more efficient than full table scan. Hence query optimizer suggests using index scan on PHONE_NUMBER than full table scan. This is how an optimization works.
There are two types of optimization – Cost based and Rule based.
Like the name suggests, cos t based optimizer optimizes the query based on the cost of the query. It collects various statistics about the database and the table, and then decides which path to be taken to execute the query. The statistics include database type, type of transaction mode, number of records in a table, number of indexes, columns on which indexes are created, uniqueness of the column values, selectivity of the column etc. Based on all these data, query optimizer decides the best path to be executed.
In the above example, the employee’s first name and last name is retrieved based on the cost of the EMPLOYEES table. It is decided by the optimizer that index scan is better to execute the query. That means cost of the index scan is less than the full table scan.
In Rule based optimization, it applies set of rules on the query to check best suited path to execute the query. It does not bother about the cost of the query. This is based on the equivalence rule on relational expressions; hence the number of combination of queries get reduces here. Hence the cost of the query too reduces.
This method creates relational tree for the given query based on the equivalence rules. These equivalence rules by providing an alternative way of writing and evaluating the query, gives the better path to evaluate the query. This rule need not be true in all cases. It needs to be examined after applying those rules.
Some of the rules applied while rule based optimization is carried out is:
For example, consider below query:
SELECT FIRST_NAME, LAST_NAME, PHONE_NUMBER, EMAIL, JOB_ID FROM EMPLOYEES emp WHERE DEPARTMENT_ID = 20;
Here in this query, we are going to select first name, last name, phone number, email and job id of the employees who are working in department with id 20. Here we can select all required 5 columns first from the EMPLOYEES table and then we can apply filter to department id = 20. But this will require lot of space to hold the columns for all the records. Instead, if we apply the filter it will reduce the number of records to two and then we can extract the columns from it. Here space required to hold all the columns for two records is much lesser than space required to hold 5 columns of whole table records.
For example, consider below query :
SELECT emp.FIRST_NAME, emp.LAST_NAME, emp.PHONE_NUMBER, emp. EMAIL, emp.JOB_ID, dept.DEPARTMENT_NAME FROM EMPLOYEES emp, DEPARTMENTS dept WHERE emp.DEPARTMENT_ID = dept.DEPARTMENT_ID AND emp.DEPARTMENT_ID = 20;
Here two tables are joined and columns are selected from both the tables. Now, we can see that there is a filter condition on the DEPARTMENT_ID of EMPLOYEES table. As the first rule, we need to filter the records for department id = 20. It will reduce the number of records. Since there is two tables involved, we need to join them. But if we join the tables at this stage, we will have to store all the columns of both the tables. Rather than doing this, if we first extract required columns from both the tables and then join them, then it will reduce the space required to store the data. That means, in total it requires space to store the 5 columns of 2 records from EMPLOYEES table and 1 column for whole department names. If we first joined then selected, then we need space to store whole columns for 2 records of EMPLOYEES and whole columns for all the records of DEPARTMENTS.
SELECT FIRST_NAME, LAST_NAME, PHONE_NUMBER, EMAIL, JOB_ID, dept.DEPARTMENT_NAME, loc.LOCATION_NAME FROM EMPLOYEES emp, DEPARTMENTS dept, LOCATIONS loc WHERE emp.DEPARTMENT_ID = dept.DEPARTMENT_ID AND dept.LOCATION_ID = loc.LOCATION_ID AND dept.DEPARTMENT_ID = 20;
Here three tables are joined to get the result. Here we can evaluate the query in two ways.
If we observe both these methods, in the first method we have to join one huge table with another comparatively bigger table to get huge result set which then joined with small table to get the result. In the second method, two comparatively small tables are joined to get smaller result set, which is then joined with the huge table to get smaller set of result. That means the space consumed by second method is very less.
There are various methods to tune SQL queries. While tuning the query, there are various factors about query and table affect the performance. First and foremost step is to check its explain plan to understand its current execution path and the cost at each step.
Next step is to check, if any sub queries or filters can reduce the number of rows returned by it. Third step is to reduce the number of columns in the SELECT query, i.e.; keep only those columns that are requested in the query. Once these steps are performed, next step is to check what type of optimizer is being used. If it is cost based optimizer, then it will collect the statistics to evaluate the query. We need to see if the statistics are updated properly or not. Next, we need to check if the order of conditions used in the WHERE clause affects the performance. If yes, then re-arrange the conditions (third rule of rule based optimizer). Next, check if the order of table names makes any difference in the query and rearrange them too. It will sometimes affect the performance based on the selectivity of columns. Make sure indexes are used properly so that query is executed effectively.
When a SELECT statement is fired, if the query scans all the rows of a table, then it is called full table scan. It will traverse each row of the table to get the requested table. It will affect the performance of the query if the table has lots of data. For example, consider our EMPLOYEES table where we need to find the FIRST_NAME and LAST name of each employee. Then the query has to traverse all the records in the table to get the requested result. That means, it performs full table scan. Imagine that there is lakhs of employees in a table. Then its performance to get all the employee details will slow down.
SELECT EMPLOYEE_ID, FIRST_NAME, LAST_NAME FROM EMPLOYEES;
There are different scenarios where FULL table scan occur:
If we use NOT EQUAL TO (<>) or any other comparison operator (<, >, <=, >=) or any wild card comparison using LIKE and %, then it will not use index scan, rather it will use full table scan.
In order to run the database perfectly, we need to make sure everything in the DB is perfect and intact. For example, in order to run a query on the table, it should have proper data, constraints and indexes. If any one of them is incorrect or disabled, then its performance will degrade.
In a database, tuning the performance is a continuous task. There will be some of the DB objects that will perform better when tuned. There are many factors that affect the performance of the DB. Some of them are: