File Organization in Database - Types of File Organization in DBMS

Introduction

As we have seen already, database consists of tables, views, index, procedures, functions etc. The tables and views are logical form of viewing the data. But the actual data are stored in the physical memory. Database is a very huge storage mechanism and it will have lots of data and hence it will be in physical storage devices. In the physical memory devices, these datas cannot be stored as it is. They are converted to binary format. Each memory devices will have many data blocks, each of which will be capable of storing certain amount of data. The data and these blocks will be mapped to store the data in the memory.

Any user who wants to view these data or modify these data, simply fires SQL query and gets the result on the screen. But any of these queries should give results as fast as possible. But how these data are fetched from the physical memory? Do you think simply storing the data in memory devices give us the better results when we fire queries? Certainly not. How is it stored in the memory, Accessing method, query type etc makes great affect on getting the results. Hence organizing the data in the database and hence in the memory is one of important topic to think about.

Types of File Organization

In a database we have lots of data. Each data is grouped into related groups called tables. Each table will have lots of related records. Any user will see these records in the form of tables in the screen. But these records are stored as files in the memory. Usually one file will contain all the records of a table.

As we saw above, in order to access the contents of the files – records in the physical memory, it is not that easy. They are not stored as tables there and our SQL queries will not work. We need some accessing methods. To access these files, we need to store them in certain order so that it will be easy to fetch the records. It is same as indexes in the books, or catalogues in the library, which helps us to find required topics or books respectively.

Storing the files in certain order is called file organization. The main objective of file organization is

Optimal selection of records i.e.; records should be accessed as fast as possible.
Any insert, update or delete transaction on records should be easy, quick and should not harm other records.
No duplicate records should be induced as a result of insert, update or delete
Records should be stored efficiently so that cost of storage is minimal.

There are various methods of file organizations. These methods may be efficient for certain types of access/selection meanwhile it will turn inefficient for other selections. Hence it is up to the programmer to decide the best suited file organization method depending on his requirement.

Some of the file organizations are

Sequential File Organization
Heap File Organization
Hash/Direct File Organization
Indexed Sequential Access Method
B+ Tree File Organization
Cluster File Organization

Let us see one by one on clicking the above links

Difference between Sequential, heap/Direct, Hash, ISAM, B+ Tree, Cluster file organization in database management system (DBMS) as shown below:

	Sequential	Heap/Direct	Hash	ISAM	B+ tree	Cluster
Method of storing	Stored as they come or sorted as they come	Stored at the end of the file. But the address in the memory is random.	Stored at the hash address generated	Address index is appended to the record	Stored in a tree like structure	Frequently joined tables are clubbed into one file based on cluster key
Types	Pile file and sorted file Method		Static and dynamic hashing	Dense, Sparse, multilevel indexing		Indexed and Hash
Design	Simple Design	Simplest	Medium	Complex	Complex	Simple
Storage Cost	Cheap (magnetic tapes)	Cheap	Medium	Costlier	Costlier	Medium
Advantage	Fast and efficient when there is large volumes of data, Report generation, statistical calculations etc	Best suited for bulk insertion, and small files/tables	Faster Access No Need to Sort Handles multiple transactions Suitable for Online transactions	Searching records is faster. Suitable for large database. Any of the columns can be used as key column.Searching range of data & partial data are efficient.	Searching range of data & partial data are efficient. No performance degrades when there is insert / delete / update. Grows and shrinks with data. Works well in secondary storage devices and hence reducing disk I/O. Since all datas are at the leaf node, searching is easy. All data at leaf node are sorted sequential linked list.	Best suited for frequently joined tables. Suitable for 1:M mappings
Disadvantage	Sorting of data each time for insert/delete/ update takes time and makes system slow.	Records are scattered in the memory and they are inefficiently used. Hence increases the memory size. Proper memory management is needed. Not suitable for large tables.	Accidental Deletion or updation of Data Use of Memory is inefficient Searching range of data, partial data, non-hash key column, searching single hash column when multiple hash keys present or frequently updated column as hash key are inefficient.	Extra cost to maintain index. File reconstruction is needed as insert/update/delete. Does not grow with data.	Not suitable for static tables	Not suitable for large database. Suitable only for the joins on which clustering is done. Less frequently used joins and 1: 1 Mapping are inefficient.