Write-Ahead Log

We have learnt that logs have to kept in the memory, so that when there is any failure, DB can be recovered using the log files. Whenever we are executing a transaction, there are two tasks – one to perform the transaction and update DB, another one is to update the log files. But when these log files are created – Before executing the transaction, or during the transaction or after the transaction? Which will be helpful during the crash ?

When a log is created after executing a transaction, there will not be any log information about the data before to the transaction. In addition, if a transaction fails, then there is no question of creating the log itself. Suppose there is a media failure, then how a log file can be created? We will lose all the data if we create a log file after the transaction. Hence it is of no use while recovering the data.

Suppose we created a log file first with before value of the data. Then if the system crashes while executing the transaction, then we know what its previous state / value was and we can easily revert the changes. Hence it is always a better idea to log the details into log file before the transaction is executed. In addition, it should be forced to update the log files first and then have to write the data into DB. i.e.; in ATM withdrawal, each stages of transactions should be logged into log files, and stored somewhere in the memory. Then the actual balance has to be updated in DB. This will guarantee the atomicity of the transaction even if the system fails. This is known as Write-Ahead Logging Protocol.

But in this protocol, we have I/O access twice – one for writing the log and another for writing the actual data. This is reduced by keeping the log buffer in the main memory – log files are kept in the main memory for certain pre-defined time period and then flushed into the disk. The log files are appended with data for certain period, once the buffer is full or it reaches the time limit, then it is written into the disk. This reduces the I/O time for writing the log files into the disk.

Similarly retrieving the data from the disk is also needs I/O. This can also be reduced by maintaining the data in the page cache of the main memory. That is whenever a data has to be retrieved; it will be retrieved from the disk for the first time. Then it will be kept in the page cache for the future reference. If the same data is requested again, then it will be retrieved from this page cache rather than retrieving from the disk. This reduces the time for retrieval of data. When the usage / access to this data reduce to some threshold, then it will be removed from page cache and space is made available for other data.