As we have seen above, when there is multiple transactions executing at the same time on same data, it may affect the result of the transaction. Hence it is necessary to maintain the order of execution of those transactions. In addition, it should not alter the ACID property of a transaction.
In order to maintain the concurrent access of transactions, different protocols are introduced.
- Lock Based Protocol
- Time-Stamp Based Protocol
- Validation Based Protocol
Lock is in other words called as access. In this type of protocol any transaction will not be processed until the transaction gets the lock on the record. That means any transaction will not retrieve or insert or update or delete the data unless it gets the access to that particular data.
These locks are broadly classified as Binary locks and shared / exclusive locks.
In binary lock data can either be locked or unlocked. It will have only these two states. It can be locked for retrieve or insert or update or delete the data or unlocked for not using the data.
In shared / exclusive lock technique the data is said to be exclusively locked if for insert / update /delete. When it is exclusively locked no other transaction can read or write the data. When a data is read from the database, then its lock is shared i.e.; the data can be read by other transaction too but it cannot be changed while retrieving the data. The lock compatibility matrix for shared / exclusive locks is given below.
From the matrix, it is clear that when there is shared lock, then any number of transactions can have shared locks on same data. i.e.; all the transactions will be reading the data. But when any one of the transaction has exclusive lock, then none of the other transaction can have any of the locks. This is because; exclusive locks will be used while modifying the data in DB. Hence any shared or exclusive lock by other transaction will lead to incorrect data or deadlock situation. If there is exclusive lock by one transaction and other transaction is trying to access the same data, then lock will not be granted to second transaction. It will be made to wait till the first transaction is executed and locks are released. Then the second transaction is given the lock and is executed.
Simply giving and releasing the locks for the transaction will not be sufficient. For example, see the locks and transactions given below :
Suppose if there is another transaction in between above transaction which updates T. Then the result of T+S will be incorrect. The read (T) will give incorrect data. Hence simply locking the data is not sufficient. A well defined rules for locking and unlocking the data needs to be defined so that incorrect data is not introduced into DB. They are defined in terms of protocols. Lock based protocols are of 4 types :
Simplistic Lock Protocol
As the name suggests it is the simplest way of locking the data during the transaction. This protocol allows all the transaction to get the lock on the data before insert / update /delete on it. After completing the transaction it will unlock the data.
In this protocol, it evaluates the transaction to list all the data items on which transaction needs lock. It then requests DBMS for the lock on all those data items before the transaction begins. If DBMS gives the lock on all the data, then this protocol allows the transaction to begin. Once the transaction is complete, it releases all the locks. If all locks are given by DBMS, then it revert the transactions and waits for the lock.
For example, if we have to calculate total marks of 3 subjects, then this protocol will evaluate the transaction and list the locks on subject1 marks, subject2 marks and then subject3 marks. Once it gets all the locks, it will start the transaction.
Two Phase Locking Protocol (2PL)
In this type of protocol, as the transaction begins to execute, it starts requesting for the locks that it needs. It goes on requesting for the locks as and when it is needed. Hence it has a growing phase of locks. At one stage it will have all the locks. Once the transaction is complete it goes on releasing the locks. Hence it will have descending phase of locks. Thus this protocol has two phases – growing phase of locks and shrinking phase of locks.
For example, if we have to calculate total marks of 3 subjects, then this protocol will go on asking for the locks on subject1 marks, subject2 marks and then subject3 marks. As and when it gets the locks on the subject marks it reads the marks. It does not wait till all the locks are received. Then it will have total calculation. Once it is complete it release the lock on subject3 marks, subject2 marks and subject1 marks.
This protocol ensures that the transactions are executed in a sequence. That is the transactions are executed in the order of locks achieved, and hence the transaction are serialized. But there can be deadlock situation by this protocol (explained below in drawbacks). Hence we need a better protocol to handle locks.
Strict Two Phase Locking (Strict 2PL)
This protocol is similar to 2PL in the first phase. Once it receives the lock on the data, it completes the transaction. Here it does not release the locks as it is used and no more required. It waits till whole transaction to complete and commit, then it releases all the locks at a time. This protocol hence does not have shrinking phase of lock release.
In the example of calculating total marks of 3 subjects, locks are achieved at growing phase of the transaction and once it receives all the locks, it executes the transaction. Once the transaction is fully complete, it releases all the locks together.
Rigorous 2 Phase Locking
In this method, all the locks on transactions are kept till the transaction is committed or aborted. This more strict protocol than others above. The transactions can be serialized by the order they are committed.
In this 2PL protocols, if we need to have exclusive lock on any data for writing, then we have to first get the shared lock for reading. Then we have to request / modify the lock to exclusive lock. Similarly, exclusive locks can be converted into shared locks. It does this process of acquiring locks in two phases as below :
Here we can see that acquiring locks on data happens in first phase and releasing locks on data happens in second phase. Converting locks from shared to exclusive (upgrading) can be done in first phase alone while converting from exclusive to shared (downgrading) is done in release phase.
Automatic Lock acquisition
Locks can be automatically acquired by the system without explicitly calling the locking protocol. i.e.; whenever a transaction is executed, system automatically checks for the locks and assigns the lock. If the respective lock is held by other transaction, then current transaction is asked to wait.
When a read transaction is executed, automatic lock is got by following checks :
When a write transaction is executed, automatic lock is got by following checks :
All these acquired locks are released once the transaction is committed or aborted.
We saw how a lock can be acquired on data and how it can be released. Now let us see how actually locks are implemented internally and are managed by lock manager. Lock manager is a process which gets the requests from different transactions to lock or unlock particular data items. The lock manager then checks for the lock on the data, and provides locks if it is unlocked. It sends reply by giving lock grant. If there is deadlock, then it sends the reply with rollback message. If the data is locked, then it keeps the transaction in waiting without sending the reply till the lock is released.
All the lock and unlock information about the data is stored in the lock table. The lock manager looks into this table for the requested data and decides whether to grant lock or not. Similarly, unlocking requests are updated in the lock table. These lock tables are stored as in-memory hash table with index on the data item to be locked/unlocked.
Below diagram represents a typical lock table. The black blocks represent granted locks and white blocks are waiting locks. The Blue blocks are the data items. Any new request for the same data item is appended at the end of each granted locks; if the lock request is compatible with previous lock, then lock is granted for new request. For example below, data item D is requested by transaction T8 and T7. Both are granted locks because both are shared locks. But lock request for data item A by T1 is in waiting state as the lock held by T3 is exclusive lock or it is shared lock and T1 is requesting exclusive lock.
When an unlock request is sent, the locked blocks are deleted from the table and next waiting lock/ new lock request is being checked. If a particular transaction is aborted, then the lock held by it is deleted. Similarly, if it has any waiting locks, then they are also deleted.
Lock manager will have list of all locks (including shared or exclusive) held by each transaction. This helps lock manager to execute the transaction efficiently.
Drawbacks of Lock based Protocol
If concurrency control manager is not properly designed, then the lock based protocols can lead to dangerous situations in DB.
- Deadlock : It is a situation where two or more transactions are waiting for each other to release the lock, which will end in waiting for each other forever. This will never end and the system will hang forever. This is not a desirable situation and has happened because of poor locking management.
Suppose transaction T1 is updating the value X. At the same time transaction T2 is tries to read X by locking it. In addition T1 will try to lock Y which is held by lock of T2. It can be represented more clearly like below :
Here T2 is waiting for T1 to release lock on X, while T1 is waiting for T2 to release lock on Y. T1 will be complete only when it gets lock on Y and then it will release lock on X and Y. But the lock on Y is held by T2 and is waiting for T1 to get completed, so that it can release Y. Here both the transactions are waiting for each others to get completed. This type of situation is called deadlock. This waiting will never end unless one of them are killed / aborted, so that all the locks held by them are released and other transaction can proceed and complete.
Hence while assigning the locks, utmost care has to be taken to avoid deadlock. In above case if T1 gets lock on Y much before T2 is started, then there will not be a deadlock situation, i.e.; if T1 gets lock on Y, then it need not wait for T2 to release any locks, and hence it will continue with its transaction and release lock on X and Y. By the time T2 starts, there will not be any locks on T1.
- Starvation : This similar to deadlock. Here one transaction would be waiting for exclusive lock on particular data, while shared locks are repeatedly requested on the same data by other set of transactions. Then shared lock will be assigned to those transactions and the first transaction will be kept waiting for its exclusive lock. This is a deadlock situation, where first transaction never gets the lock on the data. Hence it will be repeatedly aborted to avoid deadlock. This type of situation is called starvation. i.e.; first transaction is starved for exclusive lock.
As we have seen above in lock based protocol, it acquires locks at the time of execution. But in this method, as soon as a transaction is created it assigns the order of the transaction. The order of the transaction is nothing but the ascending order of the transaction creation. The priority for older transaction is given to execute first. This protocol uses system time or logical counter to determine the timestamp of the transaction.
Suppose there are two transactions T1 and T2. Suppose T1 has entered the system at time 0005 and T2 has entered the system at 0008 clock time. Priority will be given to T1 to execute first as it is entered the system first.
In addition to the timestamp of a transaction, this protocol also maintains the timestamp of last ‘read’ and ‘write’ operation on a data. Based on the timestamp of transaction and the data which it is accessing a timestamp ordering protocol is defined.
According to this protocol :
- If a transaction T is a read transaction on data X then
IF TS (T) < W_TS(X) THEN Reject T ELSE If TS (T)>= W_TS(X) Then Execute T Update W_TS(X) and R_TS(X) END
Here TS (T) → Timestamp of transaction T
W_TS (X) → Timestamp of write operation on data X
R_TS (X) → Timestamp of read operation on data X
This algorithm states that if there is an active write operation on data X when a transaction T is requesting for X, then reject the transaction T. If the transaction T is started as soon as write is complete or no going write operation on X, then execute T.
For example if there is an update on marks1 on MARKS table and meanwhile there is a request to read marks1, then do not perform read marks1. This is because there is an update being happening on marks1. If there was an update on marks1 which is executed long back or it is complete just now and there is a request to read marks1, then system will allow reading marks1.
- If a transaction T is a write transaction on data X then
This algorithm describes about write operation. If there is an active read or write on data X, and at the same time if the transaction T is requesting for X, then the transaction is rejected. If there is no active read / write operation on X, then execute the transaction.
Suppose T1 is reading marks1 from MARKS table. Meanwhile transaction T2 begins and tries to update marks1 in MARKS. Then the transaction T2 is rejected and rolled back.
Advantage of time-Stamp protocol is it ensures the serializability of the transactions. i.e.; it executes the transactions only after one transaction is complete and hence it guarantees the serializability. In addition it makes sure the transactions with smaller timestamp is executed first and then the ones with larger time stamps. That means transactions are prioritized based on their time stamp, and hence there will not be any cycles. This also makes sure that no deadlocks are created while executing the transactions.
But this protocol has drawbacks like cascading rollback and recoverability issues. Suppose transaction T1 has been aborted. But transaction T2 has read the data that T1 has updated. Then the transaction T2 has also to be rolled back. In addition, any other transaction that has read the data updated by T2 has also to be rolled back. This means, any transactions that are dependent on T1 and transactions that are dependent on these dependent transactions should be rolled back. Hence there will be chain of transaction rollbacks. This is called cascading rollback. This is not expected in ideal database. This will be an overhead to the system to keep track of all dependent transactions for all levels. In order to avoid this cascading rollback, three solutions are suggested.
- A transaction should be formed in such a way that its changes are committed only at the end of its process. In addition, no other transaction should be allowed to read the data while write transaction is being processed. i.e.; in above example, T2 should not be allowed to read the data updated by T1, and T1 should commit its changes only at the end of its transaction / processing. It should not commit the transaction in between and proceed with other changes in it. If T1 is aborted, then it should be restarted with new timestamp. Hence the precedence of T1 will change and T2 can continue with its transaction.
To be more accurate about this solution, consider T1 as updating STUDENT table for AGE and Address of one particular student. Let T2 be reading the AGE of same student. Suppose T1 started updating the data, and at the same time T2 also started reading the data. T1 gets aborted as soon as AGE is updated. Then whole of T1 should be aborted – even the changes made to AGE should reflect old AGE now. But T2 has already read new AGE, which is wrong. Hence we have to roll back T2 too. In order to avoid this type of rollback, this solution suggests that, AGE in T1 should not be committed until ADDRESS is also committed. In addition T2 should not be allowed to read AGE until T1 is complete. If T1 is aborted then it should be restarted with new timestamp. Then T2 will continue reading the old AGE.
- Transaction T2 should not be allowed to read the data if the data is held by other transaction for updating. T2 should wait till it is committed or aborted for it to read the data.
- T2 is made commit dependent so that it will read the correct data.
In this protocol, the transaction is executed in three phases – Read & execution, validation and update / write. In the first phase, the transaction T is read and executed. The results of it are written to the temporary variables. These values will be validated against the actual data to see if it violates the serializability. If it is validated, the temporary results are written to the database; otherwise transaction is rolled back. Here we can combine the phases into one like combine validating and writing the data into DB to one process, but all the three phases have to be executed in the same order.
Here each phase has different time stamps: start (T), Validation (T) and Finish (T). In this protocol time stamp for the transaction for serialization is determined by the time stamp of the validation phase, as it is the actual phase which determines, if the transaction will commit or rollback. Hence TS (T) = Validation (T). Hence the serializability is determined at the validation process and cannot be decided in advance. Hence it ensures greater degree of concurrency while executing the transaction and also less number of conflicts. Thus it has less number of rollbacks on transactions.