Suppose we have two concurrent transactions T1 and T2, where both are updating data d. Suppose T1 started first and read d for update. As soon as T1 read d, T2 started and read d for its update. As soon as T2 reads d, T1 updates d to d’. Once T1 is complete, T2 updates d to d”. Here T2 is unaware of T1’s update as it has read the data before T1 has updated it. Similarly, T1 is unaware of T2’s updates. What happens to final result and T1’s update here? Which value of d will be final here – d’ or d” ?
Since T2 is unaware of T1’s update and is processed at the last, the updates done by T1 is lost. The updates done by T2 will only be retained. T1’s update is totally lost and nowhere its symptom of update is kept. This type of update is known as lost update.
But T1’s transaction is valid one and cannot be ignored. Its update is also as important as T2’s. Probably if T1’s update might have changed the result of T2’s update (cases like update is dependent on the value of the column that we are updating – d=d*10). Hence we cannot lose the data that are being updated by any transactions. This type of lost update can be prevented if these transactions are grouped and executed serially. Suppose T1 is allowed to read and write d, once it completes write then T2 is allowed to read d, then we will have updates done by T1 as well as T2. The first update will however changed by T2, the update of T1 will be stored in undo log or rollback segment. Hence we will know at least there is some value in between transaction begin (here transaction means group of T1 and T2 together) and end of it (end of T2). Such a grouping of transactions and defining the order of execution is known as scheduling or serialization. This type of execution guarantees isolation of transaction. It will not have any dirty reads, non-repeatable reads, deadlocks or lost update issues.
A schedule is a process of grouping the transactions into one and executing them in a predefined order. A schedule is required in a database because when some transactions execute in parallel, they may affect the result of the transaction – means if one transaction is updating the values which the other transaction is accessing, then the order of these two transactions will change the result of second transaction. Hence a schedule is created to execute the transactions. A schedule is called serial schedule, if the transactions in the schedule are defined to execute one after the other. (Please see article: Transactions for more details on schedules)
A transaction is said to be Serializable if it is equivalent to serial schedule. Consider the two cases below. We can see that irrespective of the order of the execution, the transaction results in same result. Hence they can be scheduled serially and hence these transactions are Serializable. But executing them with overlapping time will result in inconsistent data. But again serializability of transaction depends on the code too.
Depending on the type of schedules, we have two types of serializability: Conflict and View serializability.
Suppose T1 and t2 are two transactions and I1 and I2 are the instructions in T1 and T2 respectively. Then these two transactions are said to be conflict Serializable, if both the instruction access the data item d, and at least one of the instruction is write operation.
In below case, we can see that T1 has set of instructions which modifies X and Y, whereas T2 has instructions to modify X and Z. These two transactions is said to be conflict Serializable since instructions of both transaction modify the value of X (write operation). But instruction to modify Y and Z are not conflicting as they are accessing different data item.
A transaction is said to be non-conflicting, if interchanging the non-conflicting instruction in a transaction does not change the result. In below example WRITE (X) of T1 and READ (X) of T2 are conflicting. But it is modified to non-conflicting Serializable by making series of movement of non conflicting instructions in them as shown below.
Suppose we have transactions as shown below. Here we do not have any non-conflicting instructions which can be moved and made this transaction as non-conflicting.
This is similar to view equivalence schedule. Here two sessions will have same transactions. E.g.: – Session 1 is inserting STUDENT data and Session 2 is inserting student information into OLD_STUDENT table.
If a transaction is view equivalent then
- If transaction T1 in Session S1 reads the data d, then T2 in S2 should also read the data d.
- If T1 in S1 reads the data d produced by another transaction T’, then T2 in S2 should also read the data d produced by T’.
- If T1 in S1 writes data d, then T2 in S2 should also write data d.
If a transaction is conflict Serializable, then it is view Serializable too. But other way is not correct. Let us discuss this in detail in next topic below.