Index is a data structure which is created on one or more columns of the table. In most of the cases, the indexes are structured as B tree. When index is created on a column of a table, it actually creates a lookup table with column and a pointer to the memory address where a row with this column is actually stored. Therefore when we query a table with column in WHERE clause, the parser first checks if this column is part of index, and if there is a index lookup table for it. If yes, then it checks for the memory address where the record is stored. It then directly jumps to that memory address and returns the result to the user. That is why index scan or if a column has index, then it retrieves the data faster.
In most of the cases, B-Tree data structure is used to store the indexes. This is because of the time efficiency of B-Trees. In other words, B-trees are traversed, searched, inserted, deleted and updated in logarithmic time. In addition, B-Tree data are always sorted and stored. Hence it makes searching and inserting the data in known fraction of time. The data values stored in B-trees are balanced – all the values smaller than a particular node can be found at the left of the node and the values greater than the node value are found at the right of the node. Hence it is easy to search any value or record in B-tree indexes.
However, RDBMS actually determines which data structure needs to be used for index. In certain RDBM, we can tell which data structure needs to be used for the index.
In hash table index, index will be created on the columns based on the hash function used on the column. That means, hash function will be applied on the column on which index has to be created and that result will be the location of the record stored. Hence, here in this method all the records will be scattered in the memory.
For example, consider that we have created hash index on PHONE_NUMBER column of the EMPLOYEES table. Let this hash function be any simple function to any complex function. Let us assume, in this case hash function be sum of all digits in the PHONE_NUMBER multiplied by 1000. Then if we need to search any particular phone number say, 546.897.231, then we will get it at the location 45000. Hence the pointer will go that location and retrieve required record details. If we need to see immediate next phone number details, then it will be at location 46000 which far away from previous phone number.
Hash indexes are created by creating a hash function on the column on which index needs to be created. Hence each column will have different non-contiguous memory locations allocated to it. Therefore, if we need to search any contiguous records with conditions like ‘less than’ or ‘greater than’, then hash index will not be able to get all the records in one shot. It has to search for the records at various locations to get all the records. Hence it is not efficient for such searches. It is good only if we have to search key-value pairs. That means query with WHERE clause with ‘=’ condition will have better performance.
When a index is created on a column or combination of columns in a table, another index lookup table is created with columns with which index is created and a pointer address to the memory location where the whole record of the table is stored. It will not have entire record information in the lookup table.
When we run a query with condition ‘WHERE COLUMN_NAME = ‘XYZ’, then database will check if there is any index on this COLUMN_NAME. If there is index on this particular column, it will check for the selectivity of the column and decides if index has to be used or not. If the selectivity of the column is more than 0.33 then, index will be used to retrieve the data.
Yes, in Oracle we can force the database to use the index by using HINTS. These HINTS will redirect the path of execution to use the index.
The general syntax for creating index is :
CREATE INDEX index_name ON TABLE_NAME (COLUMN_NAME); CREATE INDEX idx_phone ON EMPLOYEES (PHONE_NUMBER);
We can create index on combination of columns. The general syntax for creating multi-column index is :
CREATE INDEX index_name ON TABLE_NAME (COLUMN_NAME1, COLUMN_NAME2,.. COLUMN_NAMEN); CREATE INDEX idx_emp_name ON EMPLOYEES (FIRST_NAME, LAST_NAME);
A very good real time example of index is index in a book and catalogues in a library. When we want to search / read on particular topic, we look at the index on the book and flip to that particular page rather than scanning entire book for that topic. Searching in a index and then flipping to that page is more time efficient.
Bitmap indices are very powerful in retrieving the records with limited number of values for the column. In this method, indexes for columns with less unique values are used in the form of bits. Let us try to understand one by one about this method using example.
In the above example, if we notice the column GENDER, it can have only two values – Male or Female. Compared to the whole STUDENT table, they are not unique value. Similarly, say we have only four semesters for the course, and then we can have only four values – sem1, sem2, sem3 and sem4. These types of columns are called less unique value columns or columns with less cardinality. Even though these columns have less frequent values, they are queried most often.
Bits – as everyone knows it is the smallest unit of data representation. It can have either 0 or 1 as value. Now what if we use these bits to represent these less unique value columns? But how? This technique of storing less frequently used columns in the form of bits is called bitmap indices.
This method is used for very large tables with less unique value columns and is accessed number of time with various retrieval queries. In this method we will have
Imagine in our example of STUDENT table has only four records and has values as below.
Similarly bitmap index for Semester can be as follows :
Suppose that we have to find the female students who are in the second semester. Here this query uses two columns to filter the records, where both of them have less unique value columns.
SELECT * FROM STUDENT WHERE GENDER = ‘F’ AND SEMESTER =2;
The query will search the bitmap index for both of this columns and perform logical ‘AND’ operation on those indexes to get the actual address of the result.
Look at the table to check if it is correct. Yes, it has resulted in correct row. So the DBMS goes to the third row in the file and displays the result for the user.
Here fetching the bitmap index and performing the logical ‘AND’ operation to get the result is comparatively faster. Hence this method of storage is useful for this kind of data.
If we have to delete the record from the table, it will generate a temporary delete index will be generated. Then it will perform logical ‘AND’ operation on filter columns and temporary index to remove the data from the table.
Suppose we have to delete the female student from the 4th semester. Then steps involved in deleting this record is as follows.
SELECT * FROM STUDENT WHERE GENDER = ‘F’ AND SEMESTER =4;
Syntax for creating bit map index is as below:
CREATE BITMAP INDEX index_name ON table_name (column_name);
CREATE BITMAP INDEX idx_gender ON STUDENT (GENDER);
Advantages of Bitmap Indices :
As we have seen already, this method helps in faster retrieval of the records when there are less cardinality columns and those columns are most frequently used in the query. This method is efficient even if we have very big table.
Disadvantages of Bitmap Indices :
In B Tree method, each root will branch to only two nodes and each intermediary node will also have the data. And leaf node will have lowest level of data. However, in this method also, records will be sorted. Since all intermediary nodes also have records, it reduces the traversing till leaf node for the data. A simple B tree can be represented as below:
Bitmap indices are very powerful in retrieving the records with limited number of values for the column. In this method, indexes for columns with less unique values are used in the form of bits.
Consider a query to find the employee details whose first name is ‘steven’. Here the name is given in small letter. In the EMPLOYEES table name is stored in initcap format. Hence we need to convert FIRST_NAME in EMPLOYEES to lower case.
SELECT * FROM EMPLOYEES WHERE LOWER (FIRST_NAME) = ‘steven’;
But imagine that we have index created on FIRST_NAME column. Then above query will not use index created on FIRST_NAME to search the record. If we need it to use index, then we have to create index along with the function LOWER (). Such indexes on columns are called as function based indexes.
CREATE INDEX idx_lwr_firstname ON EMPLOYEES (LOWER (FIRST_NAME));
When we create an index on table columns, what actually happens in the background is another table is created with that column data (for all the records) and a pointer to the address location is maintained. So, when we fire a query using this column in the WHERE clause, this index table will be looked up for address location and pointer will directly jump to this address location and return the result.
In general, there will be an index created on primary key of the table. But imagine if there is a small table and we create index on rest of the columns too. What will happen? Unnecessarily, these column records are also stored in the index table. This is waste of memory. So, what oracle does is, it stores the entire table itself as index and sorts based on its primary key. In other words, instead of storing column values and its address location alone in the index table, whole table records are stored in the index table based on its primary key. Such tables are known as Index Organized table (IOT).
Below is an example on how to create index organized table.
CREATE TABLE iot_example ( NAME VARCHAR (50), ADDRESS VARCHAR (70), CONSTRAINT PK_NAME PRIMARY KEY (NAME)) ORGANIZATION INDEX;
Indexes are created on the tables to improve the performance of the query on the table. Indexes are usually created by analyzing the queries. That means, it checks most frequently used columns in the WHERE clause, number of records in the table, uniqueness of the column values, selectivity of the column etc. But there is few guidelines while creating indexes on a table so that query will perform better.
In a typical table, records are sorted based on the Primary key of the table and stored in the disk. But when clustered index are created on the column of the table, then records are sorted based on clustered index column and stored in the disk. In other words, clustered index determines the order of the records stored in the disk.
Suppose we have created clustered index on the PHONE_NUMBER column in the EMPLOYEES table. Then all EMPLOYEES records are sorted based on the PHONE_NUMBER in the disk and then index mapping is created. So, now if we write any query to search some sequence of phone numbers of employee – say, less than or equal to, greater than or equal to, less than, greater than etc, then it will be easy to retrieve all the records. Because, once first record is fetched, rests of the records are present immediate next to it. We need not fetch again for rest of the records.
These types of indexes are useful when index column has multiple related values. For example, for a student in a class have marks for various different subjects, say, 5 subjects. That means each student will have marks for 5 subjects. So, if we have to find total marks of each student, and imagine all of 5 subject marks are scattered in the DB, then it will take lot of time to calculate his marks.
Instead, if we have clustered index on student name or id, then whole data will be sorted based on name or id in the disk. That means, when we fetch marks for one student, all his marks will be available one after the other. We need not fetch entire table for all 5 marks five times for each student.
When there is a new rows entered, for which indexed column has some in-between value, then in order to maintain the sorted order, entire records needs to be shifted. Similarly, when there is an update on the column on which clustered index is created, we need to move the records so that it maintains sorted order. Therefore, clustered indexes are created on primary key or foreign key columns, for which there very less like hood of update.
If a table has clustered index, then its records are sorted based on the column on which clustered index is created. Therefore, if we create multiple clustered index, we cannot sort the table record for all those columns. Only one time sorting based one column for which clustering are done is valid.