1. What is an index?
Index is a data structure which is created on one or more columns of the table. In most of the cases, the indexes are structured as B tree. When index is created on a column of a table, it actually creates a lookup table with column and a pointer to the memory address where a row with this column is actually stored. Therefore when we query a table with column in WHERE clause, the parser first checks if this column is part of index, and if there is a index lookup table for it. If yes, then it checks for the memory address where the record is stored. It then directly jumps to that memory address and returns the result to the user. That is why index scan or if a column has index, then it retrieves the data faster.
2. What kind of data structure is an index?
In most of the cases, B-Tree data structure is used to store the indexes. This is because of the time efficiency of B-Trees. In other words, B-trees are traversed, searched, inserted, deleted and updated in logarithmic time. In addition, B-Tree data are always sorted and stored. Hence it makes searching and inserting the data in known fraction of time. The data values stored in B-trees are balanced – all the values smaller than a particular node can be found at the left of the node and the values greater than the node value are found at the right of the node. Hence it is easy to search any value or record in B-tree indexes.
However, RDBMS actually determines which data structure needs to be used for index. In certain RDBM, we can tell which data structure needs to be used for the index.
3. How does a hash table index work?
In hash table index, index will be created on the columns based on the hash function used on the column. That means, hash function will be applied on the column on which index has to be created and that result will be the location of the record stored. Hence, here in this method all the records will be scattered in the memory.
For example, consider that we have created hash index on PHONE_NUMBER column of the EMPLOYEES table. Let this hash function be any simple function to any complex function. Let us assume, in this case hash function be sum of all digits in the PHONE_NUMBER multiplied by 1000. Then if we need to search any particular phone number say, 546.897.231, then we will get it at the location 45000. Hence the pointer will go that location and retrieve required record details. If we need to see immediate next phone number details, then it will be at location 46000 which far away from previous phone number.
4. What are the disadvantages of hash index?
Hash indexes are created by creating a hash function on the column on which index needs to be created. Hence each column will have different non-contiguous memory locations allocated to it. Therefore, if we need to search any contiguous records with conditions like ‘less than’ or ‘greater than’, then hash index will not be able to get all the records in one shot. It has to search for the records at various locations to get all the records. Hence it is not efficient for such searches. It is good only if we have to search key-value pairs. That means query with WHERE clause with ‘=’ condition will have better performance.
5. What are some other types of indexes?
6. What exactly is inside a database index?
When a index is created on a column or combination of columns in a table, another index lookup table is created with columns with which index is created and a pointer address to the memory location where the whole record of the table is stored. It will not have entire record information in the lookup table.
7. How does a database know when to use an index?
When we run a query with condition ‘WHERE COLUMN_NAME = ‘XYZ’, then database will check if there is any index on this COLUMN_NAME. If there is index on this particular column, it will check for the selectivity of the column and decides if index has to be used or not. If the selectivity of the column is more than 0.33 then, index will be used to retrieve the data.
8. Can you force the database to use an index on a query?
Yes, in Oracle we can force the database to use the index by using HINTS. These HINTS will redirect the path of execution to use the index.
9. How to create an index in SQL?
The general syntax for creating index is :
CREATE INDEX index_name ON TABLE_NAME (COLUMN_NAME); CREATE INDEX idx_phone ON EMPLOYEES (PHONE_NUMBER);
10. How to create a multi-column index in SQL:
We can create index on combination of columns. The general syntax for creating multi-column index is :
CREATE INDEX index_name ON TABLE_NAME (COLUMN_NAME1, COLUMN_NAME2,.. COLUMN_NAMEN); CREATE INDEX idx_emp_name ON EMPLOYEES (FIRST_NAME, LAST_NAME);
11. What is a good analogy for a database index?
A very good real time example of index is index in a book and catalogues in a library. When we want to search / read on particular topic, we look at the index on the book and flip to that particular page rather than scanning entire book for that topic. Searching in a index and then flipping to that page is more time efficient.
12. What is a Bitmap Index?
Bitmap indices are very powerful in retrieving the records with limited number of values for the column. In this method, indexes for columns with less unique values are used in the form of bits. Let us try to understand one by one about this method using example.
In the above example, if we notice the column GENDER, it can have only two values – Male or Female. Compared to the whole STUDENT table, they are not unique value. Similarly, say we have only four semesters for the course, and then we can have only four values – sem1, sem2, sem3 and sem4. These types of columns are called less unique value columns or columns with less cardinality. Even though these columns have less frequent values, they are queried most often.
Bits – as everyone knows it is the smallest unit of data representation. It can have either 0 or 1 as value. Now what if we use these bits to represent these less unique value columns? But how? This technique of storing less frequently used columns in the form of bits is called bitmap indices.
This method is used for very large tables with less unique value columns and is accessed number of time with various retrieval queries. In this method we will have
- As many bits as the number of rows in the table for each of less unique value columns. For example, if the STUDENT table has 10K records, then we will have 10K bits – one bit for each row.
- Number of bitmap indices created on the column will be equal to number of distinct values in the column. For example, for GENDER column we will have two bitmap indices created – one for male and one for female, and for semester column we will have four bitmap indices created – 1, 2, 3, and 4.
- If we have any matching value in the column for the row, then that row bit will have ‘1’, else ‘0’. That means, for GENDER column we will have 2 bitmap indices – one for male and one for female. The bit value for the ‘male’ bit map index will be 1, if that row has GENDER as ‘M’, else ‘0’.
Imagine in our example of STUDENT table has only four records and has values as below.
- As per rule 1, we will have four rows in the table and hence we will have bits – one bit for each row
- The GENDER column has only two values – ‘M’ and ‘F’. Hence we will have two bitmap indices – one for ‘M’ and one for ‘F’.
- Now the bitmap index for GENDER column is as below. Here Bitmap index ‘M’ has value ‘1000’ indicating first row has gender as ‘M’ and rest of the rows do not have gender as ‘M’ Similarly, bitmap index ‘F’ indicates first row is not ‘F’ and rest of all rows are having gender as ‘F’.
Similarly bitmap index for Semester can be as follows :
Suppose that we have to find the female students who are in the second semester. Here this query uses two columns to filter the records, where both of them have less unique value columns.
SELECT * FROM STUDENT WHERE GENDER = ‘F’ AND SEMESTER =2;
The query will search the bitmap index for both of this columns and perform logical ‘AND’ operation on those indexes to get the actual address of the result.
Look at the table to check if it is correct. Yes, it has resulted in correct row. So the DBMS goes to the third row in the file and displays the result for the user.
Here fetching the bitmap index and performing the logical ‘AND’ operation to get the result is comparatively faster. Hence this method of storage is useful for this kind of data.
If we have to delete the record from the table, it will generate a temporary delete index will be generated. Then it will perform logical ‘AND’ operation on filter columns and temporary index to remove the data from the table.
Suppose we have to delete the female student from the 4th semester. Then steps involved in deleting this record is as follows.
SELECT * FROM STUDENT WHERE GENDER = ‘F’ AND SEMESTER =4;
13. How to create a bit map index?
Syntax for creating bit map index is as below:
CREATE BITMAP INDEX index_name ON table_name (column_name);
CREATE BITMAP INDEX idx_gender ON STUDENT (GENDER);
14. What are the Advantages and Disadvantages of Bitmap Index?
Advantages of Bitmap Indices :
As we have seen already, this method helps in faster retrieval of the records when there are less cardinality columns and those columns are most frequently used in the query. This method is efficient even if we have very big table.
- This method is more efficient when the columns have least involved in insert/update/delete operations.
- It allows to combine multiple bitmap indices together to fire the query as we have seen in above examples
Disadvantages of Bitmap Indices :
- They are not suitable for small tables. In small tables, DBMS will force to use full table scan instead of using bitmap index.
- When there is multiple insert/update/delete on the table from different users, it may cause deadlock on the tables. It will take time to perform the DML transaction and then to update the bitmap index. Hence when there is multiple DML transaction from different users, it will not be able perform transaction quickly, and causing the deadlock.
- When there is large number of records, there is an overhead to maintain this bitmap indexes. Each time a new record is entered, we have to modify the bitmap index throughout, which is a tedious and time consuming.
15. What is the difference between B-Tree and Bitmap Index?
In B Tree method, each root will branch to only two nodes and each intermediary node will also have the data. And leaf node will have lowest level of data. However, in this method also, records will be sorted. Since all intermediary nodes also have records, it reduces the traversing till leaf node for the data. A simple B tree can be represented as below:
Bitmap indices are very powerful in retrieving the records with limited number of values for the column. In this method, indexes for columns with less unique values are used in the form of bits.
16. What is Function based index?
Consider a query to find the employee details whose first name is ‘steven’. Here the name is given in small letter. In the EMPLOYEES table name is stored in initcap format. Hence we need to convert FIRST_NAME in EMPLOYEES to lower case.
SELECT * FROM EMPLOYEES WHERE LOWER (FIRST_NAME) = ‘steven’;
But imagine that we have index created on FIRST_NAME column. Then above query will not use index created on FIRST_NAME to search the record. If we need it to use index, then we have to create index along with the function LOWER (). Such indexes on columns are called as function based indexes.
CREATE INDEX idx_lwr_firstname ON EMPLOYEES (LOWER (FIRST_NAME));
17. What is Index Organized table?
When we create an index on table columns, what actually happens in the background is another table is created with that column data (for all the records) and a pointer to the address location is maintained. So, when we fire a query using this column in the WHERE clause, this index table will be looked up for address location and pointer will directly jump to this address location and return the result.
In general, there will be an index created on primary key of the table. But imagine if there is a small table and we create index on rest of the columns too. What will happen? Unnecessarily, these column records are also stored in the index table. This is waste of memory. So, what oracle does is, it stores the entire table itself as index and sorts based on its primary key. In other words, instead of storing column values and its address location alone in the index table, whole table records are stored in the index table based on its primary key. Such tables are known as Index Organized table (IOT).
Below is an example on how to create index organized table.
CREATE TABLE iot_example ( NAME VARCHAR (50), ADDRESS VARCHAR (70), CONSTRAINT PK_NAME PRIMARY KEY (NAME)) ORGANIZATION INDEX;
18. What are some tips on tuning SQL Indexes for better performance?
Indexes are created on the tables to improve the performance of the query on the table. Indexes are usually created by analyzing the queries. That means, it checks most frequently used columns in the WHERE clause, number of records in the table, uniqueness of the column values, selectivity of the column etc. But there is few guidelines while creating indexes on a table so that query will perform better.
- Don’t use too many indexes: -When we create index, another lookup table will be created by the DB to store the column values and the pointer to the address location. Hence if we create lots of indexes, it will take lots of space to store these informations. In addition, it is duplicity of data. Apart from this, if we try to UPDATE or DELETE the record, we have to update all these index tables too. As the number of index increases, updating these index tables also increases which takes time. Hence the performance of the query degrades. That’s why create index which are actually required. Delete all unwanted indexes on the table.
- Try not to include columns that are repeatedly updated in an index: – If we create index on the column, which is frequently updated, then we have to update the index table too. This will be an additional task and will add up to the time taken by the query update. That means, it will increase the actual time required to update the column. Hence it is not appreciated to have index on frequently updated columns.
- Creating indexes on foreign key column(s) can improve performance: – Whenever we write query with multiple tables, we join the primary key of one table with the foreign key from another table (parent-child mapping). However we will have default index on primary key column. But having index on foreign key column will improve the performance of the query. In other words, say DEPARTMENT_ID in EMPLOYEES table is the foreign key. This EMPLOYEES table is a very big table and joining DEPARTMENT_ID of this table with DEPARTMENT_ID of DEPARTMENTS will take lots of time. If we have index on DEPARTMENT_ID of EMPLOYEES table, then it will increase the performance.
- Create indexes for columns that are repeatedly used in predicates of your SQL queries: – If some columns are repeatedly used in the WHERE clause, then it is better to create index on such columns. It will increase the performance of the query.
- Get rid of overlapping indexes: – Indexes can be created on combination of columns – multiple columns. If we have index on multiple columns, then in a query it first checks for the first column and uses that index. If the WHERE clause all the columns, then by default that index will be used. If it has only first column, then also index will be used. Suppose we have multiple indexes using the same column as its first column, then query will be confused to select the index. It will create problem in the performance. Hence do not create overlapping index.
- Consider deleting an index when loading huge amounts of data into a table: – While loading huge amount of data, the database will automatically update the index table/s (depending on how many indexes are created on the table) also. So, if we are loading huge amount of data, the time taken to load will be equal to time taken to load the actual table data plus time taken to update all the index tables. For huge data load, this will be considerably more time. Hence disable all the indexes on the table first and then load the data. After all the data is loaded, enable the indexes.
- Ensure that the indexes you create have high selectivity: -Indexes with high selectivity will give better performance. Hence create indexes on column that has more selectivity. In general, any index with more than .33 selectivity, will give better performance. If a column has unique values then it will have selectivity of 1 – usually primary key column. Therefore, while creating index on column, check for its selectivity and then create the index.
19. What is Clustered index?
In a typical table, records are sorted based on the Primary key of the table and stored in the disk. But when clustered index are created on the column of the table, then records are sorted based on clustered index column and stored in the disk. In other words, clustered index determines the order of the records stored in the disk.
Suppose we have created clustered index on the PHONE_NUMBER column in the EMPLOYEES table. Then all EMPLOYEES records are sorted based on the PHONE_NUMBER in the disk and then index mapping is created. So, now if we write any query to search some sequence of phone numbers of employee – say, less than or equal to, greater than or equal to, less than, greater than etc, then it will be easy to retrieve all the records. Because, once first record is fetched, rests of the records are present immediate next to it. We need not fetch again for rest of the records.
20. When would using a clustered index make sense?
These types of indexes are useful when index column has multiple related values. For example, for a student in a class have marks for various different subjects, say, 5 subjects. That means each student will have marks for 5 subjects. So, if we have to find total marks of each student, and imagine all of 5 subject marks are scattered in the DB, then it will take lot of time to calculate his marks.
Instead, if we have clustered index on student name or id, then whole data will be sorted based on name or id in the disk. That means, when we fetch marks for one student, all his marks will be available one after the other. We need not fetch entire table for all 5 marks five times for each student.
21. What is a disadvantage to using a clustered index?
When there is a new rows entered, for which indexed column has some in-between value, then in order to maintain the sorted order, entire records needs to be shifted. Similarly, when there is an update on the column on which clustered index is created, we need to move the records so that it maintains sorted order. Therefore, clustered indexes are created on primary key or foreign key columns, for which there very less like hood of update.
22. A table can have multiple non-clustered indexes, but only one clustered index. Why?
If a table has clustered index, then its records are sorted based on the column on which clustered index is created. Therefore, if we create multiple clustered index, we cannot sort the table record for all those columns. Only one time sorting based one column for which clustering are done is valid.
23. What is the difference between a Clustered and Non Clustered Index?