What is a key? Why do we need them in SQL?
A database is composed of tables. Each table has lots of records / rows of data. Each row will contain unique information about any object. That means each table in the database is designed to represent any real world object and it contains information about that particular object. Hence each row in the table is independent of other row in the same table and we need some unique identifier for each row to differentiate them from each other. This unique identifier in each row which uniquely identifies each row is known as key of the table. This key can be single column or combination of columns.
Consider EMPLOYEES table with following columns.
In EMPLOYEES table each row is composed of all these column values. Here EMPLOYEE_ID is the ID of the employee. Then is his name, email, phone number, date on which he is hired to the department, his job id, salary, his commission, manager and his department. When we observe all these columns, we can see that name of each employee is the one which is differentiating them from other. But there can be two employees with same names. Hence we cannot use name as unique column to identify each row. But EMPLOYEE_ID is unique for each employee. No two employees can have same employee id. Hence it can be a key column. Similarly, email id of each employee is unique and we can use it for determining each record in this table. Same is the case with his phone number.
JOB_ID is another ID column, but there can be many employees with same job ids. Hence we cannot use it as unique column to determine each row. Other columns like HIRE_DATE, SALARY, COMMISSION_PCT, MANAGER_ID and DEPARTMENT_ID is also has same problem and cannot be used as key column.
Hence we can use columns EMPLOYEE_ID, EMAIL and PHONE_NUMBER to determine each row in EMPLOYEES table. We need not use combination of these columns as key. Any one of the column is sufficient to determine the rows. Hence all these columns, individually behave like a key column.
Suppose each employee can work in more than one department. Then in EMPLOYEES table we will have more than one entry for him for each of his department. In such case we cannot use EMPLOYEE_ID alone or his EMAIL alone or his PHONE_NUMBER alone to determine his record. It is his department along with any one of these three columns that uniquely determines each record in EMPLOYEES table.
Here Alexander is working for department 30 and 60 as Clerk and IT Programmer. Hence if we say EMPLOYEE_ID = 103, we cannot unique determine his record. It will return both the records. It will not make the search unique. Instead if we make EMPLOYEE_ID and DEPARTMENT_ID as key, then we can uniquely identify each row. Similarly combination of (EMAIL and DEPARTMENT_ID) and (PHONE_NUMBER and DEPARTMENT_ID) uniquely determines each row. That means, in this type of scenario, key is a combination of columns. Hence determining the key in a table depends on the conditions of the table. i.e.; ‘Each employee works for one department’ and ‘each employee can have multiple departments’ makes the difference in selecting the keys. Hence we need to carefully understand the requirement first and then decide the key to get each row uniquely.
What is Simple Key?
Simple key is a key column which is composed of one column of the table. That means only single column of the table uniquely determines each record in the table.
For example, in a EMPLOYEES table, either EMPLOYEE_ID, EMAIL or PHONE_NUMBER can determine each records in EMPLOYEES table (consider that each employee works for only one department). Hence key column for EMPLOYEES table is any one of these columns and these key columns are called as simple key columns.
Similarly, in DEPARTMENTS table, DEPARTMENT_ID alone is enough to determine each record in the table. Hence DEPARTMENT_ID is a key column and is a simple key column.
Can a key have NULL values?
A key is a column or set of columns, that can uniquely identify each record in a table. A NULL value refers to ‘UNKNOWN’ or ‘Nothing’. In SQL, any two NULL values are totally different even though they give the same meaning. But using the same NULL value, we cannot identify any row in a table. This is because, SQL cannot compare NULL = NULL (note here these two nulls are referring to two different values, hence they can never be equal) to get the unique record.
Also, if any two rows having NULLs in their key column, we cannot differentiate them nor compare them. This contradicts with the definition of the key. Hence any key columns cannot have any NULL values in it. In other words, we cannot select any column that can have NULL value as key columns or candidate keys.
However ‘key cannot be NULL’ is the thumb rule, SQL allows foreign key and unique key to be NULL. But making these two key as NULL depends on the situation and requirement.
MANAGER_ID in the EMPLOYEES table is the foreign key from EMPLOYEES table itself. Here the top most manager will not have any manager and his MANAGER_ID value will be NULL.
A primary key, unique key and a foreign key are keys which are composed of one or more columns of the table. They all together give strength to the constraints of the table and make it stronger. But each of them is created on a table to address their own functionalities.
- Primary Key : It is a key column in a table, composed of one or more columns of the same table. This primary key is the key which is used to uniquely identify each record in the table. This is the primary column with which we identify the records.
For example, in EMPLOYEES table, we use EMPLOYEE_ID to identify each and every employee in the department. Hence EMPLOYEE_ID is the unique column in the table which differentiates each employee from other. No two employees can have same ID. Hence it is considered as primary key column.
Similarly, if we consider the DEPARTMENTS table, we can identify each department by its name. But there can be two departments with same name. In order to avoid confusion, we give IDs to each department which is unique. Hence DEPARTMENT_ID in DEPARTMENTS table is considered as primary key. Using this column value we can differentiate each department from other.
- Unique Key : This is also similar to primary key. These are the column or set of columns in the table which can be used to uniquely identify each record from the other. In other words, every unique key CAN BE a primary key (not necessary that all of them have to be primary key; any one of them can be primary key); and a primary key IS A unique key.
For example, in a EMPLOYEES table, we have other columns like email and phone number. For any employee or a person these two values are unique. That means no two employees can have same email or phone number. i.e.; these two fields are unique for each employee. Thus EMAIL and PHONE_NUMBER are unique key columns in EMPLOYEES table. Even EMPLOYEE_ID is unique key column and we have selected this column to identify each record by making it as primary key.
- Foreign Key : In simple words, this is the column from another table present in current table. SQL is a relational database and each table in the DB is related to each other by means of parent-child relationship. Hence in order to represent or identify parent and child tables, we add a primary key column of the parent table into the child table – hence establishing the relationship. This parent key column in the child table is known as foreign key column. Using this column we can identify, to which category or parent, the records in the child table belongs to.
Consider the same example of EMPLOYEES and DEPARTEMNTS. Each employee will be working for one particular department. That means each employee has a parent key column that represents his department. Hence DEPARTMENT_ID in the DEPARTMENTS table is its primary key and in the EMPLOYEES table is its foreign key.
Why do we need Foreign Key? How it helps in achieving referential Integrity?
Foreign Keys helps in establishing the parent-child relationship in a database. This is also known as referential integrity of data. This is very much essential in a database. When we design a database or tables in a DB, we need to map or related the tables with each other. If there is no relationship or mapping between any two tables, they can have any values and there will not be any integrity among any data. There can be redundant data too.
Consider that EMPLOYEES and DEPARTMENTS table are independent and there is no mapping between them. In DEPARTMENTS table we have all the list of departments. In employees table will have all the employees and their departments. Since there is no mapping between DEPARTMENTS and EMPLOYEES, when we insert the data into EMPLOYEES table, there will not be internal check on DEPARTMENTS table for its existence. That means it will allow us to add an employee into the table who is working for non-existing department! This is not acceptable and it will create all wrong data in the system.
Similarly, it will allow us to enter duplicate employees who work for more than one department. This is also not acceptable. Thus creating a mapping between DEPARTMENTS and EMPLOYEES by means of foreign key will stop all above mistakes.
Note that just having DEPARTMENT_ID in both EMPLOYEES and DEPARTMENTS will not create any mapping between them. This is because; compiler will not understand DEPARTMENT_ID in both these tables are same. We need to explicitly mention the relationship between them to have the mapping.
CREATE TABLE EMPLOYEES (EMPLOYEE_ID NUMBER, .... CONSTRAINT "EMP_DEPT_FK" FOREIGN KEY ("DEPARTMENT_ID") REFERENCES "HR"."DEPARTMENTS" ("DEPARTMENT_ID") ENABLE;
This is how we create mapping while creating the table itself. We can also create mapping once all the tables are created like below. Please note that in order to establish a relationship with a parent table, we must create parent table first.
ALTER TABLE EMPLOYEES CONSTRAINT "EMP_DEPT_FK" FOREIGN KEY ("DEPARTMENT_ID") REFERENCES "HR"."DEPARTMENTS" ("DEPARTMENT_ID") ENABLE;
Can a table have multiple Primary Keys?
Primary keys are the primary point of column which is used to identify each record in the table uniquely. If we have multiple primary keys, then there is no point in telling them as primary point of source to select individual record. There can be multiple individual columns in a table which can be used to identify each record uniquely. But we have to select one of them as primary key to identify each record. In a table we can have only one primary key. A primary key can have one column or more than one column.
For example, EMPLOYEE_ID, EMAIL and PHONE_NUMBER are eligible to identify each record uniquely. But we cannot make all of them as primary keys. We have to select either of them as Primary key column. Thus we have selected EMPLOYEE_ID as primary key of EMPLOYEES table.
Can a table have multiple Unique Keys?
Yes, a table can have more than one unique key column. That means, in a table there can be more than one column that can uniquely identify each record. For example, EMPLOYEE_ID, EMAIL and PHONE_NUMBER are unique key columns of EMPLOYEES table.
Can a table have multiple Foreign Keys?
Yes, we can have more than one foreign key in a table. These foreign keys can be from same table or from different tables. In other words, a table can have one or more parent.
Consider the EMPLOYEES table. Like we discussed above, each employee belong to some department. Hence he has one parent as his department. Each employee cannot work individually in any department (it’s very rare). He will have at least one manager. But manager is also an employee. Hence we have to include manager’s EMPLOYEE_ID as MANAGER_ID in the EMPLOYEES table. That means, EMPLOYEES table has foreign key from its own table. In other words, EMPLOYEES table has more than one foreign key.
Can a Foreign Key reference a Non-Primary Key?
Yes, a foreign key can reference to a non-primary key of its parent table provided the column or columns that it is referencing should be a unique key in parent table.
Suppose the column referenced by foreign key is not a unique key. Then there will be duplicate values for that column in the parent table. So when we map it with the child table we will not able to map the correct relationship between them. Suppose EMPLOYEES table is referencing to NUMBER_OF_EMP in the DEPARTMENTS table which is not a unique key column. Then each employee referring to this foreign key cannot predict which department it is belonging to. Imagine DEPARTMENT_NAME is unique (but not primary key) in DEPARTMENTS table. If we have foreign key on this column, then we will not issue in identifying its department.
What is Natural Key?
A Natural key column is the combination of columns in the table that has logical relationship between these columns as well as other columns in the table. In addition, these columns are real columns of the table and are not created for the purpose of making it as a key.
Consider the EMPLOYEES table. Assume that we have considered combination of FIRST_NAME, LAST_NAME and his EMAIL as key. That means, these three columns together used to determine each row. Here all these columns are real – that means it has real and existing data and each of them are related to each other. i.e.; FIRST_NAME is related with LAST_NAME and both of them are related to his EMAIL id. Hence these three columns are natural columns and are logically related. When such combination of columns is used as key, they are called as Natural key.
Consider EMPLOYEE_ID in the EMPLOYEES table is not given by his department or company and we have created this column to determine each record of the table. We enter the values from 100 and is incremented by 1 for each new employee. Even though this is used as key column to determine each row, they are not existing ones in real life. Hence it is not a natural key column.
What is Business Key?
Business key is same as Natural key. It is the single or combination of columns in the table which is used as key. But this column should have real life data. It cannot have any user created data like natural numbers or some dummy strings as key values. It is also called as Domain Key.
For example, DEPARTMENT_NAME contains natural value and can be used to determine each department in the table. It does not contain any dummy numbers.
What is Surrogate Key? Why do we use it?
These are special types of keys which are generated automatically by the system for each row in a table. It is internal to the system but behave like a primary key. It is a number assigned to each row by the system and are not visible to the user.
This key will be present in the system till the row exists. It is helpful when there is continuous changes occurring on the primary or unique key columns, or when there is a change request on the table and applications using it should not get dereferenced.
What are Candidate and Secondary Key?
In a database tables, any column that can be a primary key but not selected it as primary key are called as secondary key. That means, when a table is created, there will be more than one column using which each record can be uniquely determined. All these columns which are eligible for primary key are known as candidate keys. That means, these columns are candidate for become primary key. But in a table there can be only one primary key. Rest of the columns which can be primary key are called as Secondary Key.
Let us consider an example to understand the difference between primary, candidate and secondary keys. Consider the EMPLOYEE table below.
It has EMPLOYEE_ID, EMAIL and PHONE_NUMBER columns. These columns are unique columns for each employee. In other words, no two employees or persons can have same employee id or phone number or email id. Every individual has their own id, email and phones. Hence we can consider any one of these columns as primary key of the table. That means all these three columns are candidate for becoming primary key. Hence these three columns are called as candidate key columns. In daily practice, we consider ID as primary key, because it is easy to transact and query columns with IDs than strings or lengthy columns. Hence here in EMPLOYEES table, we consider EMPLOYEE_ID as primary key. Rest of the candidate key columns – EMAIL and PHONE_NUMBERS are called secondary keys. Thus:
- All the columns that can be a primary key of the table are called as Candidate Key – EMPLOYEE_ID, EMAIL and PHONE_NUMBER.
- The column that uniquely determines each row/record of the table is called as Primary key – EMPLOYEE_ID.
- The candidate key columns that are not marked as primary key are called as Secondary Key – EMAIL and PHONE_NUMBER.
- All primary key and secondary key columns are Candidate Key columns. But any one of the candidate key column is primary key and rest of the candidate key columns are secondary key columns.
What is referential Integrity?
Referential Integrity is a relational database concept. A relational database is made up of relationships / mappings among the tables within it. This relationship among the table is established by using referential integrity. This will help us in inserting the right data and stop us from deleting wrong data from the tables.
Consider the EMPLOYEES and DEPARTEMENTS table. Here DEPARTMENTS table contains all the departments in the company. EMPLOYEES table has all the employees working in different departments of the company. That means each employees should be mapped to any one of the department. Imagine we have not created any mapping between these two tables. Now if we try to insert an employee record into EMPLOYEES table whose department does not exist, then the DB will allow us to enter the employee record.
In another scenario, we want to delete some departments for which we have still employees. If we delete department for which employee exists, then we will have employees belonging to some department which is not valid.
Both of these scenarios will lead to inconsistent data in the database. In order to avoid such inconsistencies in the system, we introduce referential integrity between these two tables by creating the foreign key constraint in the EMPLOYEES table. Now that we have created referential integrity, if we try to enter an employee whose department does not exists in DEPARTMENT table, it will not allow us to enter the record. It will show error message like below.
Suppose we are going to delete one of the department for which there are employees working. Then also it will not allow us to delete department data and show error message like below.
In this way, referential integrity allow us to retain the consistency and integrity in the database.