In simple words, SQL injection is the process of injecting malicious code into the SQL code by the hackers. Usually this kind of hacking is usually seen in websites. For example, in our daily life we use emails, face book, online banking or any other kind of social medias. In all these sites, we might have to enter email address, password or account number etc(these are just an example; information entered can be anything and using which hackers can hack anything in DB). When we enter this information, websites cannot actually process them. It needs to fire SQL query to the background database to validate the identity. Suppose we have entered user name of some social media. What the underlying code of website does is it fires query to background DB to validate if the user exists in DB. It fires query like below:
SELECT USER_NAME, PASSWORD FROM USERS WHERE USER_NAME = ‘XYZ’; --where XYZ is the name entered through website.
Actually website code has parameterized query to fire above query. For simplicity, it is shown as simple DB query. Observe above query carefully. Where can a hacker enter malicious code above so that they can ruin DB or get useful information from DB to hack? Above query is simple SELECT statement retrieving user name and password from a fixed table for entered user name. That means user name is entered by the user and hackers can use this to enter any value and hack information. But how do they know what information to be entered? It is experimental! i.e.; they try entering various values and see the output and error messages from DB to get more information about the DB.
Suppose, in above query they enter one additional single quote at the end of the name. What happens when query is fired to the DB? For any SQL with user input value, when they encounter single quote, it is either beginning or end of the query. However, they have added as user input, so the query would have already started the single quote and would consider hacker’s single quote as end of user name. But the query itself has end quote. That means, in a query we will have three single quotes – last one is extra quote now.
SELECT USER_NAME, PASSWORD FROM USERS WHERE USER_NAME = ‘XYZ’’;
Now, SQL parser will consider this as error. Obviously it is an error and it needs to be handled. Usually, when websites are created, DB errors are not handled with proper error message. It is left to display DB related error messages which is more useful for the hackers than users! That means, in above case, if we had decoded error message into user understandable form of error like “Cannot input single quote as user name”, hackers will not get any information about DB. Here there is not single information about DB table or columns.
Suppose we have not handled this DB error in the webpage. We have left it to display DB error message like below.
Now hacker will know that input values are not cross verified or validated. Hence they assume they can enter any value as input value. They might even check for number of characters that are allowed in the user name field. Once they get an idea about the validation while entering the data, they will start sending the malicious code or data via user name field like below.
SELECT USER_NAME, PASSWORD FROM USERS WHERE USER_NAME = ‘XYZ’;’UPDATE USERS SET PASSWORD = ‘xyz’ WHERE USER_NAME = ‘XYZ’;
What did hackers do here? They have sent long user name via USER_NAME field of web page. What does it contain? A user name and one update statement! Do you think it will execute successfully? Of course yes!!! Why? Hackers have used the trick of single quote here. They have used single quote as user name to end the actual user name and used semicolon to terminate the SELECT statement. After that they have written a UPDATE statement using all the hardcoded values to update the password of same user. What is next then? The user XYZ cannot login with his password; hackers have changed it to new password. Now they can login to his user id with their created password and corrupt the database.
This is how hackers inject the malicious code into the SQL database. Above one is one of the examples of SQL injection.
The first and foremost way of preventing way of preventing SQL injection is by validating each and every input field. In above case, we had not validated the input values, and hence data was corrupted. Each field will have different validations and it may increase the length of the code. But it is not worth neglecting.
In above case, our validation would be checking for the characters entered by the user. Since it is a user name it will allow us to enter any alphanumeric characters including single quote. But when program should be written in such a way that whenever user enters single quote, it should replace them with escape character – a special character used along with single quote to indicate that it is part of the value entered by the user. When single quote is used as value, we can use one more single quote or backslash as escape character to indicate its part of value.
So, when hackers try to enter single quote along with user name entered, it will not throw any error or execute any unwanted result.
SELECT USER_NAME, PASSWORD FROM USERS WHERE USER_NAME = ‘XYZ’’’; -- here error will not be thrown as double single quote is used
Therefore hackers will not get any error when they try to execute above code. So, if they try to send UPDATE statement along with user name, then it will be replaced as below:
SELECT USER_NAME, PASSWORD FROM USERS WHERE USER_NAME = ‘XYZ’’;’’UPDATE USERS SET PASSWORD = ‘’xyz’’ WHERE USER_NAME =’‘XYZ’’;
Since, we have replaced all single quote with double single quote; above query will not consider two statements above. It will consider whole thing as single user name value. It will not get such user name in the database and hence it will not return anything. Therefore, it will not execute the UPDATE statement there and hackers cannot change password.
Above method works well while preventing SQL injection. But do you think hackers will stop hacking once they cannot proceed with single quote? They will still check for the way to hack the data. They might try with double single quote and see if it works. If they are successful, then they know what to do. They will start hacking like below.
SELECT USER_NAME, PASSWORD FROM USERS WHERE USER_NAME = ‘XYZ’’’;’’UPDATE USERS SET PASSWORD = ‘’xyz’’ WHERE USER_NAME =’‘XYZ’’;
They have used one extra single quote soon after the user name in the SELECT query. It will make first. SELECT statement to get executed without any error and consider UPDATE statement as another query. However using escaping character to handle this scenario will solve the problem at this stage, but this process will continuously as the hackers will not stop it.
Escaping characters are not strong enough to handle SQL injection. Therefore, DBs itself provide some library interfaces to handle escaping characters. These library interfaces are strong enough for parsing the string and quote saving the values.
In above methods, hackers were executing the same query multiple times to get the structure and values of the SQL table. They rely on the error messages sent by the DB or the validation process done by the developer. But in blind SQL injection, it does not rely on error messages. It directly injects some malicious codes. That’s why it is called as blind SQL injection.
Suppose a user is suing a website, say SQL.com with his user id 1100. The URL for logging for this website with his user id would be ‘http://www.SQL.com?id=1100’. Here this URL will run the query like below to get logged in.
SELECT * FROM USERS WHERE USER_ID = 1100;
That means it will check the database for the existence of the user id and if exists, then it will load the page for him. Now hackers will use simple boolean expression to get the details. Hacker will add one more condition to the URL like below.
‘http://www.SQL.com?id=1100 AND 1=1’
Hacker will see if the page loads now. If it loads without any issue, he will add below condition to the URL.
‘http://www.SQL.com?id=1100 AND substring (@@version, 1, 1)= 2’
It checks for the version of DB. If it is equal to 2, then it will load the page, else it will not load the page. Hackers will change the version numbers till page loads. This is how they get the version of the DB is being used. They will now try different set of values in boolean expression above like DB name, table name etc and see if page loads. If page loads, they will get the information about the DB. They keep doing this till they get more details to hack.
Since it involves trial and error method, Blind SQL Injection is slower than normal SQL injection method. But this type of SQL injection is also not acceptable and needs to be prevented. It can be prevented by using Prepared Statements as query.
These prepared statements are compiled before actual values are added to the SQL. That means, in the prepared statements, it will compile the query till http://www.SQL.com?id=1100 and it will never consider any values that are added latter to the URL. Therefore there is high impossibility of blind SQL injection.
Both Parameterized query and Prepared Statements are same. These types of queries are used as an embedded SQL in other high level languages. SQL is mainly used for querying database tables and views. But these queries cannot be used or fired by any common users. They need knowledge about SQL. In order to make each and everyone to use the database, there should be some UI or graphical interface to request different queries. These interfaces in turn will convert user request into SQL and query the database. The user interfaces cannot be developed by using SQL, but it needs some other high level languages (like C, Java, Perl etc). These high level languages cannot query databases, but supports some embedded SQLs within it to query the database. These embedded SQLs are known as prepared statements or parameterized queries.
When queries are embedded within other languages cannot be static – a constant query that always returns same result. It should be flexible enough to query the database for any input from the user through the interface. Hence when we write query, it should be written in such a way that it should be able to accept any parameters from the user. Hence these types of queries are called as parameterized query.
These queries are parsed, optimized and compiled within the high level language and then queried in the database. That means these queries are prepared to be executed much before they are fired in the database. They will not be compiled again in the database. Hence they are also known as Prepared Statements.
Below example shows how a prepared statement look like within a C program. ‘EXEC SQL’ indicates the C language compiler that it has to execute a SQL statement. Hence precompiler of C parses and compiles the SQL. But it does not know for which value of EMP_ID it has to execute the SQL query. Hence it waits till user enters some value for EMP_ID and when he enters the value executes the query to get that employee’s details.
EXEC SQL SELECT * FROM EMPLOYEES WHERE EMPLOYEE_ID =:EMP_ID;
One of the main advantages of prepared statement is better performance. Since query is already parsed, optimized and compiled, it can be executed for as many parameter values as users enter. It need not be re-compiled. Hence the time taken to get the result is only for retrieving / searching for the data in the database table. It does not involve the time required for parsing, optimizing and compiling for each value that is passed.
Another advantage of prepared statement is against SQL injection. It prevents the hackers from adding any unwanted / malicious data into the query. Since all the values are parameterized, it accepts only defined length of data and defined type of data. Hence hackers will not be able to add any extra malicious data into the parameter value of the prepared statement.