Before understanding what is data backup and how is it recovered, let us see a brief about storage structure in the computer system.
Storage structure is the memory structure in the system. It is mainly divided into two categories:
- Volatile Memory: – These are the primary memory devices in the system, and are placed along with the CPU. These memories can store only small amount of data, but they are very fast. E.g.: – main memory, cache memory etc. these memories cannot endure system crashes- data in these memories will be lost on failure.
- Non-Volatile memory: – These are secondary memories and are huge in size, but slow in processing. E.g.: – Flash memory, hard disk, magnetic tapes etc. these memories are designed to withstand system crashes.
- Stable Memory: – This is said to be third form of memory structure but it is same as non volatile memory. In this case, copies of same non volatile memories are stored at different places. This is because, in case of any crash and data loss, data can be recovered from other copies.
In a computer system we have primary and secondary memory storage. Primary memory storage devices – RAM is a volatile memory which stores disk buffer, active logs, and other related data of a database. It stores all the recent transactions and the results too. When a query is fired, the database first fetches in the primary memory for the data, if it does not exist there, then it moves to the secondary memory to fetch the record. Fetching the record from primary memory is always faster than secondary memory. What happens if the primary memory crashes? All the data in the primary memory is lost and we cannot recover the database.
In such cases, we can follow any one the following steps so that data in the primary memory are not lost.
- We can create a copy of primary memory in the database with all the logs and buffers, and are copied periodically into database. So in case of any failure, we will not lose all the data. We can recover the data till the point it is last copied to the database.
- We can have checkpoints created at several places so that data is copied to the database.
Suppose the secondary memory itself crashes. What happens to the data stored in it? All the data are lost and we cannot recover. We have to think of some alternative solution for this because we cannot afford for loss of data in huge database.
There are three methods used to back up the data in the secondary memory, so that it can be recovered if there is any failure.
- Remote Backup: – Database copy is created and stored in the remote network. This database is periodically updated with the current database so that it will be in sync with data and other details. This remote database can be updated manually called offline backup. It can be backed up online where the data is updated at current and remote database simultaneously. In this case, as soon as there is a failure of current database, system automatically switches to the remote database and starts functioning. The user will not know that there was a failure.
- In the second method, database is copied to memory devices like magnetic tapes and kept at secured place. If there is any failure, the data would be copied from these tapes to bring the database up.
- As the database grows, it is an overhead to backup whole database. Hence only the log files are backed up at regular intervals. These log files will have all the information about the transaction being made. So seeing these log files, database can be recovered. In this method log files are backed up at regular intervals, and database is backed up once in a week.
There are two types of data backup – physical data backup and Logical data backup. The physical data backup includes physical files like data files, log files, control files, redo- undo logs etc. They are the foundation of the recovery mechanism in the database as they provide the minute details about the transactions and modification to the database
Logical backup includes backup of logical data like tables, views, procedures, functions etc. Logical data backup alone is not sufficient to recover the database as they provide only the structural information. The physical data back actually provides the minute details about the database and is very much important for recovery.
A database is a very huge system with lots of data and transaction. The transaction in the database is executed at each seconds of time and is very critical to the database. If there is any failure or crash while executing the transaction, then it expected that no data is lost. It is necessary to revert the changes of transaction to previously committed point. There are various techniques to recover the data depending on the type of failure or crash.
- Transaction Failure: – This is the condition in the transaction where a transaction cannot execute it further. This type of failure affects only few tables or processes. The failure can be because of logical errors in the code or because of system errors like deadlock or unavailability of system resources to execute the transactions.
- System Crash: – this can be because of hardware or software failure or because of external factors like power failure. In most of the cases data in the secondary memory are not affected because of this crash. This is because; the database has lots of integrity checkpoints to prevent the data loss from secondary memory.
- Disk Failure: – these are the issues with hard disks like formation of bad sectors, disk head crash, unavailability of disk etc.
As we have seen already, each transaction has ACID property. In case of transaction failure or system crash, it should maintain its ACID property. Failing to maintain ACID is the failure of database system. That means, any transaction in the system cannot be left at the stage of its failure. It should either be completed fully or rolled back to the previous consistent state.
Suppose there was a transaction on the Student database to enter the marks of a student in 3 subjects and then to calculate his total. Suppose there is a transaction failure, when 3rd mark has been entered into the table. This transaction cannot be left at this stage because student has marks in two subjects already entered. When the system is recovered and total is calculated, it is calculated based on two subject marks, which is not correct. In this case, either the transaction has to be completed fully to enter the 3rd mark and calculate the total, or remove the marks that have entered already. Either completing the transaction fully or revert the transaction fully brings the database into a consistent state and data will not lead to any miscalculation.
We can conclude from above example that
- States of all the executed transaction should be verified. This is necessary to know because if there is any incomplete transaction.
- Any incomplete transaction should either be completed fully or reverted fully to the consistent state of DBMS before the transaction.
- ACID property of transaction should be met in case of failure too.
We can recover the database in two methods:
- Log Based Recovery: – In this method, log of each transaction is maintained in some stable storage, so that in case of any failure, it can be recovered from there to recover the database. But storing the logs should be done before applying the actual transaction on the database.
Every log in this case will have informations like what transaction is being executed, which values have been modified to which value, and state of the transaction. All these log information will be stored in the order of execution.
Suppose there is a transaction to modify the address of a student. Let us see what logs are written for this transaction.
- As soon as transaction is initiated, it writes ‘start’ log.
- When the transaction modifies the address from ‘Troy’ to ‘Fraser Town’, another log is written to the file.
<Tn, ADDRESS, ‘Troy’, ‘Fraser Town’>
- When the transaction is completed, it writes another log to indicate end of the transaction.
There are two methods of creating this log files and updating the database
- Deferred database modification: – In this method, all the logs for the transaction is created and stored into stable storage system first. Once it is stored, the database is updated with changes. In the above example, after all the three log records are created and stored in some storage system, database will be updated with those steps.
- Immediate database modification: – After creating each log record, database is modified for each step of log entry immediately. In the above example, database is modified at each step of log entry i.e.; after first log entry, transaction will hit the database to fetch the record, then second log will be entered followed by updating the address, then the third log followed by committing the database changes.
- Shadow paging: – This is the method where all the transactions are executed in the primary memory. Once all the transactions completely executed, it will be updated to the database. Hence, if there is any failure in the middle of transaction, it will not be reflected in the database. Database will be updated after all the transaction.
Above two methods hold good if there is single transaction like updating the address or so. But what will happen when there are multiple transactions which occur concurrently? Same method of logging the logs can be followed. But since there are concurrent transactions, order and time of each transaction makes a great difference. Failing to maintain the order of transaction will lead to wrong data while recovering. Also, transactions may have number of steps. Maintaining the log for each step will increase the log file size. Again it will become an overhead to maintain a log file along with these transactions. So our goal here should be small log file with easy recovery of data in case of failure. To handle this situation Checkpoints are introduced during the transaction.
Checkpoint acts like a bookmark. During the execution of transaction, such checkpoints are marked and transaction is executed. The log files will be created as usual with the steps of transactions. When it reaches the checkpoint, the transaction will be updated into database and all the logs till that point will be removed from file. Log files then are updated with new steps of transaction till next checkpoint and so on. Here care should be taken to create a checkpoint because, if any checkpoints are created before any transaction is complete fully, and data is updated to database, it will not meet the purpose of the log file and checkpoint. If checkpoints are created when each transaction is complete or where the database is at consistent state, then it will be useful.
Suppose there are 4 concurrent transactions – T1, T2, T3 and T4. A checkpoint is added at the middle of T1 and there is failure while executing T4. Let us see how a recovery system recovers the database from this failure.
- It starts reading the log files from the end to start, so that it can reverse the transactions. i.e.; it reads log files from transaction T4 to T1.
- Recovery system always maintains undo log and redo log. The log entries in the undo log will be used to undo the transactions where as entries in the redo list will be re executed. The transactions are put into redo list if it reads the log files with (<Tn, Start>, <Tn, Commit>) or <Tn , Commit>. That means, it lists all the transactions that are fully complete into redo list to re execute after the recovery. In above example, transactions T2 andT3 will have (<Tn, Start>, <Tn, Commit>) in the log file. The transaction T1 will have only <Tn, Commit> in the log file. This because, the transaction is committed after the checkpoint is crossed. Hence all the logs with<Tn, Start>, are already written to the database and log file is removed for those steps. Henceit puts T1, T2 and T3 into redo list.
- The logs with only <Tn, Start> are put into undo list because they are not complete and can lead to inconsistent state of DB. In above example T4 will be put into undo list since this transaction is not yet complete and failed amid.
This is how a DBMS recovers the data incase concurrent transaction failure.