Suppose we are executing a transaction in a DB. It can result in different results for the same inputs with same test scenarios. The result can vary depending on the isolation levels of the transaction. These isolation levels describe to what extent the transactions are isolated from other transactions, and whether they can see the data inserted / updated by other transactions.
There are four levels of transaction isolation.
Suppose user 1 has placed a transaction to retrieve the data from STUDENT table, while user 2 has placed an update transaction on STUDENT table to update Address of Student – ‘Patty. What will happen to User1’s request? Will he see the update made by User2? Ideally speaking, according to isolation property of transaction, requests placed by user1 and users2 are independent and should not affect each other’s request.
But here in read uncommitted type of isolation, user1 will see the data updated by user2. User1’s request is to retrieve whole STUDENT table, which will take little time as this table has huge number of records. Meanwhile user2 has updated some record, but not completed yet (i.e.; it’s not still committed). Here user1 is allowed read such uncommitted data leading to dirty data read. That means, these uncommitted data may lead to consistent reading of the data from tables leading to wrong information to the user. But this type of isolation gives the quickest performance as it is not validating the data nor waiting for the locks. Here constraints and integrity of the data are ignored.
Above scenario is depicted in the diagram below. When user1 reads, he sees user2 has already changed address of Patty and it will read new address ‘Holland’ (although it is not committed) to maintain consistency.
But this type of read will not create consistency always. In above case, it might have read the correct address and would have given the correct information to the user.
But imagine if T2 (user2’s transaction) has failed while executing; it has to be rolled back. But it is yet to be rolled back. The data in STUDENT table still shows it as ‘Holland’. Transaction T1 sees that T2 has executed and thinks that it is the correct data updated by T2 and reads it as ‘Holland’. After T1 has read this record, T2 rolls back! Now the address is changed to ‘Troy’ which is the right data. But what T1 has read now is wrong data and lead to dirty read! That means, until any transaction is committed or rollback the data modified by it will not be correct data. It is supposed to be dirty and reading such data will lead to dirty read. But this dirty read may be correct if transaction is committed, else it will lead to inconsistent / wrong read.
Some RDBMS supports READ UNCOMMITTED isolation level, while some does not. They will have other strong features in them which will help them to read the correct data in above situations. E.g.: - Oracle does not support this isolation level.
The main purpose of this type of isolation is read the data without waiting for other transactions which have blocked single or multiple records. It assumes transactions that have locked the records for modification will commit it and the read transaction will give the correct results. Hence it assumes reading even if the transactions have locked those data will reduce the time. But all issues arise when locked transactions rollback.
In this type of isolation, the transaction reads the data that are committed at the beginning of transaction. It will not read any data that are modified after the transaction is started like we saw above. This type of isolation looks straight forward and seems like reading correct data. But this also has various problems. If it is implemented correctly, then only it reads the right data. Lets us see below how it leads to reading wrong records.
Firstly, this type of isolation may lead to non-repeatable reads. This means, suppose transaction T1 has read the data at time t1 and got one resultset. Suppose T1 has re-executed at time t2. Now the resultset that we got will have different values. The number of records in first and second resultset will be same but the data/value will be different. This is because of read committed isolation. i.e.; when T1 is executed at time t1, it has read the data which were committed at the time t1. While T1 was executing there would be some other transaction modified these data and committed. Hence when T1 re-reads the data at time t2, the committed set of data has been changed and it reads different values. This type of read is known as non-repeatable read.
Another issue with read committed read is phantom read. This looks similar to non-repeatable read, but here number of records read in first and second transaction with same query will differ. The records read in the first transaction will present in second transaction too with same data value, but in second transaction number of records will be more. This is because some other transaction would have inserted some records which satisfy the transaction query which we are executing. This has resulted in more number of records second time – phantom read.
In most of the databases we see READ COMMITTED isolation. The above two issues of isolations are accepted in most of the time. But if this isolation is not properly implemented, it will lead to inconsistency of data while reading. Let us see how it leads to inconsistency.
Consider the example of balance transfer from one account to another account. Suppose transaction T1 is calculating the total balance in ACCOUNT table while transaction T2 is transferring 5000 from Rose to Stewart’s account. Assume t1<t2.
Below table shows different steps of each transaction as it is processed in DB. Suppose transactions are processed in the same order as given below.
At the end of the transaction, the total balance in Rose and Stewart’s account should be 45000 and 65000 respectively. Above transactions has calculated it properly. The total balance in ACCOUNTS table should not be changes by this transaction –since balances are transferred from one account to another, hence it should show total amount as 170000/-. But the total balance shown above is 175000/- This is because step2 for T1 started before T2 has started, and T1 has read the amount before 5000/- has transferred to Stewart and T1 completed after T2 is complete. Hence there is difference in total amount and calculated extra 5000 in total. In addition, T1 processing had to wait for T2 to complete; hence its cost has waiting period too.
What we can learn from above example is that, even if we read records after commit, if it is not correctly implemented, it will read incorrect data values. If T2 had started after T1 has read Stewart’s data, then it would have lead to correct result.
Alternatively, we would have got correct result if we had followed READ UNCOMMITTED isolation – T1 wouldn’t have waited for T2 to complete, and would have read the balance before T2 has updated it leading to correct total. In addition we would have had correct balance transfer too. But again this method would lead to wrong result, if there is variation in executing time of each step.
Hence implementing these isolation levels in correct form is very much important.
This level of isolation allows the transaction to result in same resultset whenever it is executed. It does not result in different data values when same transaction is being executed at different time intervals. (Please note that when we say time interval t1, t2 while executing transactions, it is very small fraction of time intervals – say in milli/micro seconds. However executing same transaction at two different times – comparatively different time intervals – like minutes / hours, there may or may not have value difference. This will not lead to any issues. Issues arise only when there is time difference in millis / micros).
This is achieved in Oracle like software by multi-versioning of the database – the data in DB are versioned based on the time at which they are committed and transactions have begun. That means, each transaction will hold a set of data that are committed at the time transaction has started and it will hold those sets till transaction is complete. While if there is any other transactions executed after above transaction, these new transactions will have different set of data that are committed at the time these transactions have started. Hence the dataset are versioned based on commit and start of respective transaction. This makes each transaction independent of other transactions and maintains consistency in result.
Let us consider the Balance transfer example with t1<t2. Hence T1 will have data set independent of changes done by T2. i.e.; T1 will have data set as shown in the first table below. T2 will also have dataset same dataset, but it will change the data as shown in second table. But the modification done by T2 will not change the dataset held by T1. Hence dataset held by T1 and T2 are totally independent. Now if we calculate the total balance, it will show it as 170000/- which is correct result.
In some other RDBMS other than oracle, locking itself is used to achieve repeatable reads. In those systems, if a transaction has locked a record for reading, such records are not read /updated until it is released. Ideally shared locks are share or exclusive locked by other transactions; but here in order to achieve this isolation level, shared locks are not allowed to lock by other transaction. Hence each transaction will get consistent data to execute; leading to consistent result. Hence each transaction has to wait for the transaction which has locked the record (irrespective of type of lock), making each transaction to execute sequentially. Here it causes waiting period also to be added to the cost of transactions.
In our example of balance transfer, the transaction will have following steps to get correct results.
Hence T1 calculates correct total and T2 transfers the balance correctly and independently.
Though it works independently leading to correct results, it also has drawbacks. If the locks are not correctly placed, it will lead to deadlock condition. In above transaction, suppose T2 has started already updates Stewart’s record first. Hence it has placed exclusive lock on record 4. T1 processes as usual and its processing reaches record4. It finds that this record has exclusive lock on it and waits for T2 to complete. At the same time T2 starts updating record2, and finds it has shared lock by T1 and waits for T1 to release the lock. What has happened now – a Deadlock!
Here we have to fail / abort any one of the transaction so that deadlock is broken and transaction will be executed smoothly. Another drawback is it has consumed lot of resource and time executing the transaction but ended in aborting one of them and restarting it consuming same amount of time again. Hence using shared / exclusive locks will lead to resource consumptions and deadlocks most of the time. Hence it is always better to use multi-versioning or statement level locks to achieve repeatable read isolation.
This is most strict isolation level. It looks similar to repeatable read isolation, but it provides isolation at transaction level. It executes the transaction in such a way that there are no other transactions being executed to modify the data that it is using.
Suppose there are two transactions running different sessions S1 and S2, and each transaction is trying to access the data from each others’ table.
Here both the transactions are executed in serially as shown above, and trying to read data from each other’s table. But both will end up in inserting zero records. This is because, each transaction is executed as if other transaction is not executing. It will not read any uncommitted or committed data while executing its transaction. Hence T1 will not read the data inserted by T2 and T2 will not read the data inserted by T1. It reads the committed data when each transaction has started.
A transaction is said to be Serializable if the transactions within it can be executed one after the other without having any overlap on time. Also, the execution order of the transaction should be equivalent. The equivalent is irrespective of the data or value. If A+B = B+A then both these expression is said to be equivalent; same is applied while executing the transaction too. In above example, if S1 starts first or S2 starts first, both the transaction is the same.
Simple SQL TutorialSQL Interview Questions