What is Selenium

Introduction to Selenium

Selenium is an open source suite of test automation tools for the GUI of web applications. It can automate web applications across multiple browsers and operating systems. It was created by Jason Huggins in 2004. Because it is open source, it is free to download and use. Selenium comprises several projects that together make it a versatile testing system.

History of Selenium evolution

Selenium originally consisted of four tools, listed below:

  • Selenium IDE
  • Selenium RC
  • Selenium WebDriver
  • Selenium Grid

Selenium Tools

Let us discuss the tools of the Selenium suite.

  • Selenium IDE: Selenium IDE is an integrated development environment available as a Firefox add-on and a Chrome extension. It was developed by Shinya Kasatani. It is primarily a record-and-playback tool for automating functional tests. The automatically generated test scripts can be edited to fit the testing needs. It is best suited to creating relatively simple test cases and test suites.
  • Selenium Remote Control (RC): Selenium RC was developed by Paul Hammant as a server that acts as an HTTP proxy, making the browser believe that Selenium Core and the web application under test belong to the same domain. This works around the limitation imposed by the same-origin policy. It later became an automation tool for writing automated UI tests in a range of programming languages. It consists of a server that receives test commands from test programs and drives the browser. Selenium RC sits between the browser and the AUT (application under test). The Selenium RC server “injects” a JavaScript program called Selenium Core into the browser to command it. Selenium RC is also referred to as Selenium 1.

Note: The same-origin policy is an important concept in the web application security model. Under the policy, a web browser permits scripts contained in a first web page to access data in a second web page, but only if both web pages have the same origin (the same scheme, host, and port).

So when Selenium Core is injected into the browser as JavaScript, it cannot access the web elements of the application under test, because the two belong to different domains. To overcome this, the Selenium server acts as an HTTP proxy so that, to the browser, Selenium Core and the application under test appear to belong to the same domain.
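The origin comparison described in the note can be sketched in a few lines of Java. This is only an illustration of the rule (scheme, host, and port must all match); the class name, the helper methods, and the example URLs are assumptions made for this sketch and are not part of Selenium.

```java
import java.net.URI;

// Illustrative helper, not part of Selenium: compares two absolute URLs the
// way the same-origin policy does (scheme, host, and port must all match).
class SameOriginCheck {

    // Falls back to the default port when the URL does not name one.
    // Only http and https are handled in this sketch.
    static int effectivePort(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    static boolean sameOrigin(String first, String second) {
        URI a = URI.create(first);
        URI b = URI.create(second);
        return a.getScheme().equalsIgnoreCase(b.getScheme())
                && a.getHost().equalsIgnoreCase(b.getHost())
                && effectivePort(a) == effectivePort(b);
    }

    public static void main(String[] args) {
        // Hypothetical URLs: Selenium Core served from a local server versus
        // the application under test on its own host.
        System.out.println(sameOrigin(
                "http://localhost:4444/selenium-server/core/RemoteRunner.html",
                "http://app.example.com/login")); // false

        // Through a proxy, both appear to come from the application's host.
        System.out.println(sameOrigin(
                "http://app.example.com/selenium-server/core/RemoteRunner.html",
                "http://app.example.com/login")); // true
    }
}
```

The second call mirrors what the RC proxy achieves: once both resources appear under one host, the browser treats them as the same origin.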

  • Selenium Grid: Selenium Grid is used to run tests on different machines and different browsers in parallel. It forms a hub-node model in which tests can be executed in a distributed environment, which helps reduce test execution time. The hub acts as the source of Selenium commands for each node connected to it.
  • Selenium WebDriver: Selenium WebDriver is an object-oriented API capable of driving the browser natively, as a browser user would. WebDriver was developed by Simon Stewart in 2006 to overcome the JavaScript injection limitation of Selenium RC. It allows test scripts to communicate directly with the browser. WebDriver is an interface in Java that is implemented by the ChromeDriver, FirefoxDriver, InternetExplorerDriver, SafariDriver, EventFiringWebDriver, HtmlUnitDriver, PhantomJSDriver, and RemoteWebDriver classes. It operates at the OS level and aims to provide real-user-like interaction with the application under test, making testing more realistic.
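The hub-node model described for Selenium Grid can be pictured with a toy model: nodes register the browser and platform they offer, and the hub routes each request to a matching node. The sketch below is a simplification under stated assumptions; `ToyGrid`, `Node`, `register`, and `route` are hypothetical names and do not reflect how Selenium Grid is implemented internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Toy model of the hub-node idea. All names here are hypothetical.
class ToyGrid {

    // A node advertises the browser and platform it can run tests on.
    static final class Node {
        final String name;
        final String browser;
        final String platform;

        Node(String name, String browser, String platform) {
            this.name = name;
            this.browser = browser;
            this.platform = platform;
        }
    }

    private final List<Node> nodes = new ArrayList<>();

    // Nodes register themselves with the hub.
    void register(Node node) {
        nodes.add(node);
    }

    // The hub picks the first registered node that matches the request.
    Optional<Node> route(String browser, String platform) {
        return nodes.stream()
                .filter(n -> n.browser.equalsIgnoreCase(browser)
                        && n.platform.equalsIgnoreCase(platform))
                .findFirst();
    }

    public static void main(String[] args) {
        ToyGrid hub = new ToyGrid();
        hub.register(new Node("node-1", "chrome", "linux"));
        hub.register(new Node("node-2", "firefox", "windows"));

        hub.route("firefox", "windows")
                .ifPresent(n -> System.out.println("Routed to " + n.name));
    }
}
```

A real grid also has to track which nodes are busy and queue requests; the sketch leaves that out to keep the routing idea visible.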

Selenium 2

Selenium 2 is the most widely used version of Selenium. It combines Selenium RC and WebDriver, maintaining both APIs side by side. The WebDriver API is used to write automation scripts for any supported browser by using the driver suited to that browser.

Selenium 2 = Selenium 1(RC) + WebDriver

Some of the key updates in Selenium 2 were:

  • WebDriver was designed with a simpler, more concise programming interface and addressed some limitations of the Selenium RC API.
  • WebDriver is a more compact object-oriented API than Selenium 1.0.
  • It drives the browser much more effectively and overcomes limitations of Selenium 1 that affected functional test coverage, such as file upload and download, pop-ups, and dialogs.
  • WebDriver overcame the same-origin policy limitation of Selenium RC.

Selenium 3

Selenium 3 is the successor to Selenium 2 and aims to provide a one-stop solution for test suite automation of both web-based and mobile applications. Selenium WebDriver became a W3C standard with this version. Selenium RC has also been deprecated and moved to Selenium's legacy components. Selenium 3 uses the Selenium server, which has built-in Grid capabilities for distributed multi-browser, multi-platform execution of test suites.

This was a brief history of how Selenium grew into a leading automation tool. Now let us look at the reasons behind its wide acceptance and popularity.

Why Selenium is Popular

  • Open Source: Selenium is open source and free to download and use. It is well supported by its community and receives frequent updates.
  • Language Support: Selenium has libraries for many programming languages. Tests can be developed in languages such as Java, JavaScript, Python, Ruby, R, C#, PHP, and Perl, so developers can automate in the language of their choice.
  • Ease of Framework: Selenium does not force a testing framework, so the native testing frameworks of the chosen programming language can be used. This gives developers the flexibility to write custom tests to suit the project and to integrate Selenium tests with the project's development framework. Selenium can be integrated with tools such as TestNG and JUnit for managing test cases and generating reports, with build tools like Maven, and with CI/CD tools like Jenkins and Docker to achieve continuous testing.
  • Multi-browser support: Selenium tests, once written, can be executed on browsers such as Chrome, Firefox, IE, and Safari. So it is a multi-browser automation tool.
  • Multi-platform support: Selenium tests are valid across multiple platforms. The same tests can be executed on Windows, Mac, and Ubuntu operating systems.
  • Parallel execution: Parallel and distributed execution of tests is well supported.
  • Multi-device execution: Selenium tests can be executed on mobile devices such as Android, iPhone, and Blackberry as well as on Windows, Mac, and Linux.
  • Allows Multi-tasking: Selenium tests can be executed with the browser window minimized while the developer works on other tasks.

Limitations of Selenium WebDriver

  • Desktop applications: Selenium cannot automate desktop (Windows-based) applications.
  • Local system interaction: If the browser needs to interact with the local system, third-party tools have to be used.
  • Reporting: Producing a good report requires integration with a tool such as TestNG or Cucumber.
  • OS-based pop-ups: Selenium cannot handle OS-generated pop-ups. A third-party tool such as AutoIt is needed to handle Windows-based pop-ups.
  • Captcha handling: Captchas cannot be handled using Selenium.

Comparison of HP QTP and Selenium

HP QTP (QuickTest Professional) is another leading test automation tool in the software industry. It has advantages and disadvantages compared with Selenium. The table below compares the two.

QTP | Selenium
QTP is licensed software | Selenium is open source
It can automate desktop and web applications | It can automate only web applications
Dedicated user support is provided | Support comes from the developer community; there is no dedicated user support
Runs on IE, Chrome, and Firefox only | Runs on many browsers and is continuously enhanced for new ones
Runs only on Windows | Runs on Windows, Mac, and Linux
The browser cannot be minimized while tests are running | The browser can be minimized
Parallel execution requires QC | Tests can be executed in parallel using Selenium Grid
Automation starts faster because QTP is a full-fledged IDE | Selenium needs a complete environment setup, so automation is slower during initial setup
No programming knowledge required | Programming knowledge required

Why Selenium WebDriver is preferred over IDE

  • IDE is a record-and-playback tool available only for Firefox and Chrome.
  • It can automate simple functional tests, with limited customization possible.
  • Iterations and conditional operations are not supported by IDE.
  • WebDriver is an exhaustive automation API that allows writing anything from simple to highly complex test cases.

Why Selenium WebDriver is preferred over RC

  • Selenium RC's way of working is more complicated than WebDriver's. The Selenium RC server needs to be started before tests can run.
  • It injects JavaScript code called Selenium Core into the browser to execute test cases, which leads to same-origin policy issues. Selenium Core receives the commands sent by the Selenium server and runs them as JavaScript commands.
  • The Selenium server receives the response from Selenium Core and displays the results. This client-server model makes RC slower than WebDriver.
  • Selenium WebDriver executes test commands natively in the browser. Hence it is faster and more realistic. The same-origin policy issue does not arise in WebDriver.

What is the internal architecture of Selenium WebDriver

The key feature of the WebDriver architecture is that it operates browsers natively. Browser drivers are built in the language best suited to each browser, and developers use a wrapper around the drivers. It is a carefully designed object-oriented API that drives browsers while interacting directly with the application under test.

The main components of the Selenium architecture are:

  • Selenium Language Bindings: These are the client libraries built into the Selenium code base that allow tests to be written in multiple programming languages. This makes Selenium very flexible.
  • JSON Wire Protocol: It is a transport mechanism that delivers commands from client to server as JSON over HTTP.
  • Browser Drivers: There is a separate driver specific to each browser and each language binding. The browser driver receives a command from the Selenium script and conveys it to the browser for execution. It also receives the result of the execution and sends it back to the script as an HTTP response.
  • Real Browsers: The browsers supported by Selenium, such as Chrome, Firefox, Internet Explorer, and Safari.
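To make the client-to-driver step concrete, the sketch below builds (but does not send) the kind of HTTP request a language binding issues for a "navigate to URL" command: a POST to `/session/{sessionId}/url` with a small JSON body. The base URL `http://localhost:9515`, the session id `abc123`, and the class and method names are assumptions made for illustration; a real session id is returned by the driver when a session is created.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative only: shows the shape of a wire-protocol command without
// contacting a browser driver. Names and values are hypothetical.
class WireCommandSketch {

    // Minimal JSON string escaping, enough for the URLs used in this sketch.
    static String jsonString(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Body of the navigate command: {"url":"<target>"}
    static String navigateBody(String targetUrl) {
        return "{\"url\":" + jsonString(targetUrl) + "}";
    }

    // Builds the request a client binding would send to the browser driver.
    static HttpRequest navigateRequest(String driverBaseUrl, String sessionId, String targetUrl) {
        return HttpRequest.newBuilder()
                .uri(URI.create(driverBaseUrl + "/session/" + sessionId + "/url"))
                .header("Content-Type", "application/json; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(navigateBody(targetUrl)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request =
                navigateRequest("http://localhost:9515", "abc123", "https://example.com/");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(navigateBody("https://example.com/"));
    }
}
```

In practice the language bindings hide this exchange entirely; test code only calls methods on a driver object.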

Key features of WebDriver architecture

  • Selenium WebDriver has a layered design so that the most suitable language can be used for each browser.
  • Drivers for each browser, such as chromedriver and geckodriver, are built in the best-fitting programming language, and developers see only the wrapper around them.
  • WebDriver is designed as an object-oriented API that interacts directly with the application under test and does not create intermediate entities.
  • WebDriver uses the browser's native support for automation. It operates at the OS level and aims to behave like a real user of the application under test.
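The layering described above, where test code sees one interface and each browser has its own implementation behind it, can be sketched with a deliberately tiny interface. Everything here (`ToyDriver`, `BaseToyDriver`, the two implementations, and the recorded command strings) is hypothetical and far smaller than the real WebDriver interface; it only illustrates the design idea.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the layered design. All names are hypothetical.
class LayeredDriverSketch {

    // What test code programs against.
    interface ToyDriver {
        void get(String url);
        List<String> sentCommands();
    }

    // Shared wrapper logic; each subclass only names the native driver
    // it would delegate to. Commands are recorded instead of being sent.
    abstract static class BaseToyDriver implements ToyDriver {
        private final List<String> commands = new ArrayList<>();

        abstract String nativeDriverName();

        @Override
        public void get(String url) {
            commands.add(nativeDriverName() + " <- navigate " + url);
        }

        @Override
        public List<String> sentCommands() {
            return commands;
        }
    }

    static class ToyChromeDriver extends BaseToyDriver {
        @Override
        String nativeDriverName() {
            return "chromedriver";
        }
    }

    static class ToyFirefoxDriver extends BaseToyDriver {
        @Override
        String nativeDriverName() {
            return "geckodriver";
        }
    }

    // Test logic is written once, against the interface.
    static void openHomePage(ToyDriver driver) {
        driver.get("https://example.com/");
    }

    public static void main(String[] args) {
        List<ToyDriver> drivers = List.of(new ToyChromeDriver(), new ToyFirefoxDriver());
        for (ToyDriver driver : drivers) {
            openHomePage(driver);
            System.out.println(driver.sentCommands());
        }
    }
}
```

Swapping browsers changes only which implementation is constructed; `openHomePage` stays the same.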

Details of all the packages, interfaces, and classes in Selenium WebDriver are available in its Javadocs at the link below:

https://seleniumhq.github.io/selenium/docs/api/java/

Conclusion

  • Selenium is a set of test automation tools consisting of Selenium IDE, a record-and-playback tool, and Selenium WebDriver, which can create robust browser-based test suites and scale and distribute them across environments and devices.
  • Selenium is an open source tool for automating web applications. It aims to provide a friendly API for web application automation. Selenium WebDriver uses the Selenium server, which has built-in Grid capabilities.
  • WebDriver is a Java interface (it defines methods but needs classes to implement them). The Java classes that implement the WebDriver interface are AndroidDriver, AndroidWebDriver, ChromeDriver, EventFiringWebDriver, FirefoxDriver, HtmlUnitDriver, InternetExplorerDriver, IPhoneDriver, IPhoneSimulatorDriver, RemoteWebDriver, and SafariDriver. These are the individual browser drivers.
  • Selenium is used for automating GUI functional testing of web applications. It can also be used to automate other repetitive administrative tasks.