So far we have seen what a database is, what is the features of database, how to gather requirements and how to put them in ER diagrams, how to convert them into tables and their columns, set their constraints etc. Once we have database ready users will start using them. But how will they access the database? Most of the time they access the data by using some applications. These applications will communicate to database by SQL and DBMS is responsible for managing the application and SQL intact. SQL has its own querying methods to interact with database. But how these queries work in the database? These queries work similar to relational algebra that we have in mathematics. In database we have tables participating in relational algebra.
Relational Query Languages
Relational query languages use relational algebra to break the user requests and instruct the DBMS to execute the requests. It is the language by which user communicates with the database. These relational query languages can be procedural or non-procedural.
Procedural Query Language
A procedural query language will have set of queries instructing the DBMS to perform various transactions in the sequence to meet the user request. For example, get_CGPA procedure will have various queries to get the marks of student in each subject, calculate the total marks, and then decide the CGPA based on his total marks. This procedural query language tells the database what is required from the database and how to get them from the database. Relational algebra is a procedural query language.
Non-Procedural Query Language
Non-procedural queries will have single query on one or more tables to get result from the database. For example, get the name and address of the student with particular ID will have single query on STUDENT table. Relational Calculus is a non procedural language which informs what to do with the tables, but doesn’t inform how to accomplish this.
These query languages basically will have queries on tables in the database. In the relational database, a table is known as relation. Records / rows of the table are referred as tuples. Columns of the table are also known as attributes. All these names are used interchangeably in relational database.
Relational algebra is a procedural query language. It takes one or more relations / tables and performs the operation and produce the result. This result is also considered as a new table or relation. Suppose we have to retrieve student name, address and class for the given ID. What a relational algebra will do in this case is, it filters the name, address and class from the STUDENT table for the input ID. In mathematical terms, relational algebra has produced a subset of STUDENT table for the given ID.
Relational algebra will have operators to indicate the operations. This algebra can be applied on single relation – called unary or can be applied on two tables – called binary. While applying the operations on the relation, the resulting subset of relation is also known as new relation. There can be multiple steps involved in some of the operations. The subsets of relations at the intermediary level are also known as relation. We will understand it better when we see different operations below.
Relational Algebra in DBMS has 6 fundamental operations. There are several other operations defined upon these fundamental operations.
Select (σ) - This is a unary relational operation. This operation pulls the horizontal subset (subset of rows) of the relation that satisfies the conditions. This can use operators like <, >, <=, >=, = and != to filter the data from the relation. It can also use logical AND, OR and NOT operators to combine the various filtering conditions. This operation can be represented as below:
σ p (r)
Where σ is the symbol for select operation, r represents the relation/table, and p is the logical formula or the filtering conditions to get the subset. Let us see an example as below:
σSTD_NAME = “James” (STUDENT)
What does above relation algebra do? It selects the record/tuple from the STUDENT table with Student name as ‘James’
σdept_id = 20 AND salary>=10000 (EMPLOYEE) - Selects the records from EMPLOYEE table with department ID = 20 and employees whose salary is more than 10000.
Project (∏) - This is a unary operator and is similar to select operation above. It creates the subset of relation based on the conditions specified. Here, it selects only selected columns/attributes from the relation- vertical subset of relation. The select operation above creates subset of relation but for all the attributes in the relation. It is denoted as below:
∏a1, a2, a3 (r)
Where ∏ is the operator for projection, r is the relation and a1, a2, a3 are the attributes of the relations which will be shown in the resultant subset.
∏std_name, address, course (STUDENT) - This will select all the records from STUDENT table but only selected columns – std_name, address and course. Suppose we have to select only these 3 columns for particular student then we have to combine both project and select operations.
∏STD_ID, address, course (σ STD_NAME = “James”(STUDENT)) - this selects the record for ‘James’ and displays only std_ID, address and his course columns. Here we can see two unary operators are combined, and it has two operations performing. First it selects the tuple from STUDENT table for ‘James’. The resultant subset of STUDENT is also considered as intermediary relation. But it is temporary and exists till the end of this operation. It then filters the 3 columns from this temporary relation.
Rename (ρ) - This is a unary operator used to rename the tables and columns of a relation. When we perform self join operation, we have to differentiate two same tables. In such case rename operator on tables comes into picture. When we join two or more tables and if those tables have same column names, then it is always better to rename the columns to differentiate them. This occurs when we perform Cartesian product operation.
Where ρ is the rename operator, E is the existing relation name, and R is the new relation name.
ρ STUDENT (STD_TABLE) – Renames STD_TABLE table to STUDENT
Let us see another example to rename the columns of the table. If the STUDENT table has ID, NAME and ADDRESS columns and if they have to be renamed to STD_ID, STD_NAME, STD_ADDRESS, then we have to write as follows.
ρ STD_ID, STD_NAME, STD_ADDRESS(STUDENT) – It will rename the columns in the order the names appear in the table
Cartesian product (X)
Cartesian product (X): - This is a binary operator. It combines the tuples of two relations into one relation.
Where R and S are two relations and X is the operator. If relation R has m tuples and relation S has n tuples, then the resultant relation will have mn tuples. For example, if we perform cartesian product on EMPLOYEE (5 tuples) and DEPT relations (3 tuples), then we will have new tuple with 15 tuples.
EMPLOYEE X DEPT
This operator will simply create a pair between the tuples of each table. i.e.; each employee in the EMPLOYEE table will be mapped with each department in DEPT table. Below diagram depicts the result of cartesian product.
Union (U) - It is a binary operator, which combines the tuples of two relations. It is denoted by
R U S
Where R and S are the relations and U is the operator.
DESIGN_EMPLOYEE U TESTING_EMPLOYEE
Where DESIGN_EMPLOYEE and TESTING_EMPLOYEE are two relations.
It is different from cartesian product in:
- Cartesian product combines the attributes of two relations into one relation whereas Union combines the tuples of two relations into one relation.
- In Union, both relations should have same number of columns. Suppose we have to list the employees who are working for design and testing department. Then we will do the union on employee table. Since it is union on same table it has same number of attributes. Cartesian product does not concentrate on number of attribute or rows. It blindly combines the attributes.
- In Union, both relations should have same types of attributes in same order. In the above example, since union is on employee relation, it has same type of attribute in the same order.
It need not have same number of tuples in both the relation. If there is a duplicate tuples as a result of union, then it keeps only one tuple. If a tuple is present in any one relation, then it keeps that tuple in the new relation. In the above example, number of employees in design department need not be same as employees in testing department. Below diagram shows the same. We can observe that it combines the table data in the order they appear in the table.
We would not able to join both these tables if the order of columns or the number of columns were different.
Set-difference (-) - This is a binary operator. This operator creates a new relation with tuples that are in one relation but not in other relation. It is denoted by ‘-‘symbol.
R – S
Where R and S are the relations.
Suppose we want to retrieve the employees who are working in Design department but not in testing.
There are additional relational operations based on the above fundamental operations. Some of them are:
Set Intersection - This operation is a binary operation. It results in a relation with tuples that are in both the relations. It is denoted by ‘∩ ‘.
Where R and S are the relations. It picks all the tuples that are present in both R and S, and results it in a new relation.
Suppose we have to find the employees who are working in both design and testing department. If we have tuples as in above example, the new result relation will not have any tuples. Suppose we have tuples like below and see the new relation after set difference.
This set intersection can also be written as a combination of set difference operations.
R ∩ S R-(R-S)
i.e.; it evaluates R-S to get the tuples which are present only in R and then it gets the record which are present only in R but not in new resultant relation of R-S.
In above example of employees,
DESIGN_EMPLOYEE – (DESIGN_EMPLOYEE – TESTING_EMPLOYEE)
It first filters only those employees who are only design employees – (104, Kathy). This result is then used to find the difference with design employee. This will find those employees who are design employees but not in new result – (100, James). Thus it gives the result tuple which is both designer and tester. We can see here fundamental relational operator is used twice to get set intersection. Hence this operation is not fundamental operation.
Assignment - As the name indicates, the assignment operator ‘’ is used to assign the result of a relational operation to temporary relational variable. This is useful when there is multiple steps in relational operation and handling everything in one single expression is difficult. Assigning the results into temporary relation and using this temporary relation in next operation makes task simple and easy.
TS – denotes relation S is assigned to temporary relation T
A relational operation ∏a1, a2 (σ p (E)) with selection and projection can be divided as below.
T σ p (E)
S ∏a1, a2 (T)
Our example above in projection for getting STD_ID, ADDRESS and COURSE for the Student ‘James’ can be re-written as below.
∏STD_ID, address, course (σ STD_NAME = “James”(STUDENT))
T σ STD_NAME = “James”(STUDENT)
S ∏STD_ID, address, course (T)
Natural join - As we have seen above, cartesian product simply combines the attributes of two relations into one. But the new relation will not have correct tuples. It has only combinations of tuples. In order to get the correct tuples, we have to use selection operation on the cartesian product result. This set of operations – cartesian product followed by selection – is combined into one relation called natural join. It is denoted by ∞
Suppose we want to select the employees who are working for department 10. Then we will perform the cartesian product on the EMPLOYEES and DEPT and find the DEPT_ID in both relations matching to 10. The same is done with natural join as
σ EMPLOYEE.DEPT_ID = DEPT>DEPT_ID AND EMPLOYEE.DEPT_ID = 10(EMPLOYEE X DEPT)
Same can be written using natural join as EMPLOYEE ∞ DEPT
From the above example, we see that only the matching data from both the relations are retained in the final relation. Suppose we want to retain all the information from first relation and the corresponding information from the second relation irrespective of if it exists or not. In such case we use outer join. This join makes sure all the combinations of tuples are shown in correct way. Unlike cartesian product, this join make sure that to create a tuple from both the table if there exists right match for them, and if there is no match null is added to those attribute. Let see them in below types of outer join.
There are three types of outer joins
Left Outer Join
Left outer join - In this operation, all the tuples in the left hand side relation is retained. All matching attribute in the right hand relation is displayed with values and the ones which do not have value are shown as NULL.
Below example of left outer join on DEPT and EMPLOYEE table combines the matching combination of DEPT_ID = 10 with values. But DEPT_ID = 30 does not have any employees yet. Hence it displays NULL for those employees. Thus this outer join makes more meaningful to combining two relations than a cartesian product.
Right outer join
Right outer join - This is opposite of left outer join. Here all the attributes of right hand side is retained and it matching attribute in left hand relation is found and displayed. If no matching is found then null is displayed. Same above example is re-written to understand this as below:
Notice the order and column difference in both the cases.
Full Outer Join
Full outer join - This is the combination of both left and right outer join. It displays all the attributes from both the relation. If the matching attribute exists in other relation, then that will be displayed, else those attributes are shown as null.
Hope above diagram is self explanatory.
Division - This operation is used to find the tuples with phrase ‘for all’. It is denoted by ‘÷’. Suppose we want to see all the employees who work in all of departments. What are the steps involved to find this?
- First we find all the department ID - T1 ∏DEPT_ID (DEPARTMENT)
- Next step is list all the employees and their departments – T2 ∏ EMP_ID, DEPT_ID (EMPLOYEE)
In third step we will find the employees in T2 with the entire department ID in T1. This is obtained by using division operation – T2 ÷ T1