In this article by Enrique Amodeo, author of the book Learning Behavior-driven Development with JavaScript, we will look into an advanced concept: how to test a user interface.
There are two traditional approaches to the problem of UI testing: record-and-replay tools and end-to-end testing.
The first approach, record-and-replay, relies on tools capable of recording user activity in the UI and saving it into a script file. This script file can later be executed to perform exactly the same UI manipulation that the user performed and to check whether the results are exactly the same. This approach is not very compatible with BDD; among other reasons, the UI must already exist before anything can be recorded, so these tests cannot be written first to drive its design.
The other classic approach is end-to-end testing, where we test not only the UI layer but also most of the system, or even all of it. To set up the tests, the most common approach is to substitute the third-party systems with test doubles. Normally, the database is under the control of the development team, so some practitioners use a regular database for the setup. However, we could use an in-memory database or even mock the DAOs. In any case, this approach prompts us to create an integrated test suite where we test not only the correctness of the UI but the business logic as well.
In the context of this discussion, an integrated test is a test that checks several layers of abstraction, or subsystems, in combination. Do not confuse it with the act of testing several classes or functions together.
This approach is not inherently against BDD; for example, we could use Cucumber.js to capture the features of the system and implement the Gherkin steps using WebDriver to drive the UI and make assertions. In fact, when you say BDD, most people assume you are referring to this kind of test.
We will end up writing a lot of test cases, because we need to combine the scenarios from the business logic domain with the ones from the UI domain. Furthermore, in which language should we formulate the tests? If we use the UI language, maybe it will be too low-level to easily describe business concepts. If we use the business domain language, maybe we will not be able to test the important details of the UI because they are too low-level. Alternatively, we can even end up with tests that mix UI language with business terminology, so they will neither be focused nor very clear to anyone.
If we want to test whether the UI works, why should we test the business rules? After all, they are already tested in the BDD test suite of the business logic layer. To decide which tests to write, we should first determine the responsibilities of the UI layer and test only those.
We do not need to write tests about business rules, and we should not assume much about the business layer itself, apart from a loose contract.
How should we word our tests? We should use UI-related language when we talk about what the user sees and does. Words such as fields, buttons, forms, links, click, hover, highlight, enable/disable, or show and hide are relevant in this context. However, we should not go too far; otherwise, our tests will be too brittle. Saying, for example, that the name field should have a pink border is too low-level. The moment the designer decides to use red instead of pink, or changes their mind and decides to change the background color instead of the border, our test will break. We should aim for tests that express the real intention of the user interface; for example, the name field should be highlighted as incorrect.
At this point, we could write tests relevant to our UI using the following testing architecture:
A simple testing architecture for our UI
We can use WebDriver to issue user gestures to interact with the browser. The browser transforms these user gestures into DOM events, which are the inputs of our UI logic and trigger operations on it. We can use WebDriver again to read the resulting HTML in the assertions. We can simply use a test double to impersonate our server, so we can set up our tests easily.
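This architecture is very simple and sounds like a good plan, but it is not! Every single test would have to drive a real browser through WebDriver, and tests of that kind are slow, fragile, and expensive to write and maintain.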
As UI testing is very risky and expensive, we should write as few tests that interact with the UI as possible. We can achieve this without losing testing power by using the following testing architecture:
A smarter testing architecture
We have now split our UI layer into two components: the view and the UI logic.
This design aligns with the family of MV* design patterns. In the context of this article, the view corresponds to a passive view, and the UI logic corresponds to the controller or the presenter, in combination with the model. A passive view is usually very hard to test, so in this article we will focus mostly on how to do it. You will often be able to separate the passive view from the UI logic easily, especially if you are using an MV* pattern, such as MVC, MVP, or MVVM.
Most of our tests will be for the UI logic. This is the component that implements the client-side validation, orchestration of UI components, navigation, and so on. It is the UI logic component that has all the rules about how the user can interact with the UI, and hence it needs to maintain some kind of internal state.
The UI logic component can be tested completely in memory using standard techniques. We can simply mock the XMLHttpRequest object, or the corresponding object in the framework we are using, and test everything in memory using a single Node.js process. No interaction with the browser and the HTML is needed, so these tests will be blazingly fast and robust.
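To make the idea concrete, here is a minimal sketch of UI logic tested entirely in memory. Everything in it is hypothetical: OrderLogic, its addItem() method, and the two collaborators it receives. The server gateway stands in for whatever wraps XMLHttpRequest in your framework, and the view is reached only through a small interface, so a test can replace both with test doubles and run in a plain Node.js process.

```javascript
// Hypothetical UI logic for an order form: client-side validation plus
// orchestration. It never touches the DOM or XMLHttpRequest directly; both
// are reached through injected collaborators.
class OrderLogic {
  constructor(view, serverGateway) {
    this.view = view;
    this.server = serverGateway;
    this.items = []; // the UI logic is the component that owns the state
  }

  // Invoked (by the view) when the user asks to add a product to the order
  addItem(product, quantityText) {
    const quantity = Number(quantityText);
    if (!Number.isInteger(quantity) || quantity <= 0) {
      // Invalid input: ask the view to flag the field and skip the server
      this.view.highlightQuantityAsInvalid();
      return Promise.resolve(false);
    }
    return this.server.addToOrder(product, quantity).then(() => {
      this.items.push({ product, quantity });
      this.view.showItems(this.items.slice());
      return true;
    });
  }
}

// Test double for the view: records every call it receives
function createViewSpy() {
  const calls = [];
  return {
    calls,
    highlightQuantityAsInvalid() {
      calls.push({ method: 'highlightQuantityAsInvalid' });
    },
    showItems(items) {
      calls.push({ method: 'showItems', items });
    },
  };
}

// Test double for the server: replaces the XMLHttpRequest-based gateway
function createFakeServer() {
  const requests = [];
  return {
    requests,
    addToOrder(product, quantity) {
      requests.push({ product, quantity });
      return Promise.resolve();
    },
  };
}
```

A test creates the two doubles, calls logic.addItem('espresso', 'abc'), and asserts that the view was asked to highlight the quantity and that the server received no request. In a Mocha suite, each such check would naturally become its own it() block.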
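Then we need to test the view. This is a very thin component with only two responsibilities: updating the HTML when the UI logic asks it to, and listening to DOM events and transforming them into requests to the UI logic.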
The view should not have any other responsibilities, and it is a stateless component. It does not need to store any internal state, because it only transforms and transmits information between the HTML and the UI logic. Since it is the only component that interacts with the HTML, it is the only one that needs to be tested using WebDriver.
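A passive view along these lines could look like the following sketch. The function createOrderView and the element names in the dom object are hypothetical. The view keeps no state of its own: it forwards a click as a request to the UI logic and rewrites the HTML only when it is told to. In the architecture described here it would be tested with WebDriverJS against real HTML; the sketch only relies on addEventListener, classList, value, and textContent, so it can also run against simple fakes.

```javascript
// Hypothetical passive view for an order form. `dom` holds the elements it
// binds to; `uiLogic` is whatever object implements the UI logic.
function createOrderView(dom, uiLogic) {
  // Input: translate a DOM event into a request to the UI logic
  dom.addButton.addEventListener('click', () => {
    uiLogic.addItem(dom.productField.value, dom.quantityField.value);
  });

  // Output: update the HTML only when the UI logic asks for it
  return {
    highlightQuantityAsInvalid() {
      dom.quantityField.classList.add('invalid');
    },
    showItems(items) {
      dom.itemList.textContent = items
        .map((item) => item.quantity + ' x ' + item.product)
        .join('\n');
    },
  };
}
```

Note that the "highlighted as incorrect" intention is expressed as a CSS class; which color that class maps to is left to the stylesheet, which keeps tests on this view from depending on it.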
The point of all of this is that the view can be tested with only a bunch of tests that are conceptually simple. Hence, we minimize the number and complexity of the tests that need to interact with the UI.
Testing the passive view layer is a technical challenge. We not only need to find a way for our test to inject native events into the browser to simulate user interaction, but we also need to be able to inspect the DOM elements and inject and execute scripts. This was very challenging to do approximately 5 years ago. In fact, it was considered complex and expensive, and some practitioners recommended not testing the passive view at all. After all, this layer is very thin and mostly contains the bindings of the UI to the HTML DOM, so the risk of error is not supposed to be high, especially if we use modern cross-browser frameworks to implement this layer.
Nonetheless, nowadays the technology has evolved, and we can do this kind of testing without much fuss if we use the right tools. One of these tools is Selenium 2.0 (also known as WebDriver) and its library for JavaScript, which is WebDriverJS (https://code.google.com/p/selenium/wiki/WebDriverJs).
In this book, we will use WebDriverJS, but there are other bindings in JavaScript for Selenium 2.0, such as WebDriverIO (http://webdriver.io/). You can use the one you like most or even try both. The point is that the techniques I will show you here can be applied with any client of WebDriver or even with other tools that are not WebDriver.
Selenium 2.0 is a tool that allows us to make direct calls to a browser automation API. This way, we can simulate native events, we can access the DOM, and we can control the browser. Each browser provides a different API and has its own quirks, but Selenium 2.0 will offer us a unified API called the WebDriver API. This allows us to interact with different browsers without changing the code of our tests. As we are accessing the browser directly, we do not need a special server, unless we want to control browsers that are on a different machine.
Actually, due to some technical limitations, this is only true if we want to test against Google Chrome or Firefox using WebDriverJS.
So, basically, the testing architecture for our passive view looks like this:
Testing with WebDriverJS
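We can see that we use WebDriverJS to send native events to the browser to simulate user interaction, to inspect the DOM, and to inject and execute scripts.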
Apart from this, we need some extra infrastructure, such as a web server that serves our test HTML page and the components we want to test.
As is evident from the diagram, the commands of WebDriverJS require some network traffic to be able to send the appropriate request to the browser automation API, wait for the browser to execute it, and get the result back through the network. This forces the API of WebDriverJS to be asynchronous in order not to block unnecessarily. That is why WebDriverJS has an API designed around promises. Most of the methods return a promise or an object whose methods return promises. This plays perfectly well with Mocha and Chai.
There is a W3C specification for the WebDriver API. If you want to have a look, just visit https://dvcs.w3.org/hg/webdriver/raw-file/default/webdriver-spec.html.
The API of WebDriverJS is a bit complex, and you can find its official documentation at http://selenium.googlecode.com/git/docs/api/javascript/module_selenium-webdriver.html. However, to follow this article, you do not need to read it, since I will now show you the most important parts of the API that WebDriverJS offers us.
It is very easy to find an HTML element using WebDriverJS; we just need to use either the findElement or the findElements method. Both methods receive a locator object specifying which element or elements to find. The first method returns the first element it finds, or fails with an error if no element matches the locator. The findElements method returns a promise for an array with all the matching elements. If there are no matching elements, the promised array will be empty and no error will be thrown.
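The difference between the two methods is easy to miss, so the following sketch mimics just that part of the contract. createFakeFinder is a hypothetical in-memory stand-in, not WebDriverJS, and it only understands id locators: findElement rejects when nothing matches, while findElements resolves with an empty array.

```javascript
// Hypothetical in-memory stand-in that mimics only the contract described
// above. `elements` plays the role of the page; only { id: ... } locators
// are supported.
function createFakeFinder(elements) {
  const matches = (locator) => elements.filter((el) => el.id === locator.id);
  return {
    // Resolves with the first match, or rejects when nothing matches
    findElement(locator) {
      const found = matches(locator);
      return found.length > 0
        ? Promise.resolve(found[0])
        : Promise.reject(new Error('No element matches ' + JSON.stringify(locator)));
    },
    // Always resolves; the array is simply empty when nothing matches
    findElements(locator) {
      return Promise.resolve(matches(locator));
    },
  };
}
```

In practice, this means that a test checking that something is absent is better written with findElements and an assertion on the length of the array than by expecting findElement to fail.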
How do we specify which elements we want to find? To do so, we need to use a locator object as a parameter. For example, if we would like to find the element whose identifier is order_item1, then we could use the following code:
var By = require('selenium-webdriver').By;
driver.findElement(By.id('order_item1'));
We need to import the selenium-webdriver module and capture its locator factory object. By convention, we store this locator factory in a variable called By. Later, we will see how we can get a WebDriverJS instance.
This code is very expressive, but a bit verbose. There is a shorter version:
driver.findElement({ id: 'order_item1' });
Here, the locator criterion is passed in the form of a plain JavaScript object. There is no need to use the By object or any other factory. Which version is better? Neither. Just use the one you like most. In this article, the plain object locator will be used.
The following are the criteria for finding elements:
By tag name, for example, to find all the li elements:
driver.findElements(By.tagName('li'));
driver.findElements({ tagName: 'li' });
By the value of the name attribute:
driver.findElement(By.name('password'));
driver.findElement({ name: 'password' });
By CSS class name:
driver.findElement(By.className('item'));
driver.findElement({ className: 'item' });
By CSS selector:
driver.findElement(By.css('.order .item:nth-of-type(2)'));
driver.findElement({ css: '.order .item:nth-of-type(2)' });
The CSS selector alone is enough to locate practically any element, and it is the one I recommend. The other criteria can be very handy in specific situations.
There are more ways of locating elements, such as linkText, partialLinkText, or xpath, but I seldom use them. Locating elements by their text, such as in linkText or partialLinkText, is brittle because small changes in the wording of the text can break the tests. Also, locating by xpath is not as useful in HTML as using a CSS selector. Obviously, it can be used if the UI is defined as an XML document, but this is very rare nowadays.
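One way to see why CSS selectors are enough is to notice that the id, name, className, and tagName criteria can all be rewritten as CSS selectors. The locatorToCss helper below is hypothetical, not part of WebDriverJS; it sketches that translation using attribute selectors, quoting the values so that characters such as double quotes do not break the selector.

```javascript
// Hypothetical helper (not part of WebDriverJS): rewrites a plain-object
// locator as an equivalent CSS selector.
function quote(value) {
  // Escape backslashes and double quotes for use inside a CSS string
  return '"' + String(value).replace(/\\/g, '\\\\').replace(/"/g, '\\"') + '"';
}

function locatorToCss(locator) {
  const keys = Object.keys(locator);
  if (keys.length !== 1) {
    throw new Error('A locator must have exactly one criterion');
  }
  const strategy = keys[0];
  const value = locator[strategy];
  switch (strategy) {
    case 'id':
      return '*[id=' + quote(value) + ']';
    case 'name':
      return '*[name=' + quote(value) + ']';
    case 'className':
      // ~= matches one word in the space-separated class list, like .item
      return '*[class~=' + quote(value) + ']';
    case 'tagName':
    case 'css':
      return value;
    default:
      // linkText and partialLinkText, for example, have no CSS equivalent
      throw new Error('No CSS equivalent for criterion: ' + strategy);
  }
}
```

For example, locatorToCss({ id: 'order_item1' }) yields *[id="order_item1"], which selects the same element as By.id('order_item1').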
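In both methods, findElement and findElements, the resulting HTML elements are wrapped as WebElement objects. A WebElement allows us to send events to the element or inspect its contents. Its methods for sending events include click(), which clicks in the center of the element; sendKeys(), which types a sequence of keys into it; clear(), which clears the value of a text field; and submit(), which submits the form that contains the element. For example, the following code types a search term and presses Enter: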
var Key = require('selenium-webdriver').Key;
var searchField = driver.findElement({ name: 'searchTxt' });
searchField.sendKeys('BDD with JS', Key.ENTER);
The webdriver.Key object allows us to specify any key that does not represent a character, such as Enter, the up arrow, Command, Ctrl, Shift, and so on. We can also use its chord method to represent a combination of several keys pressed at the same time. For example, to simulate Alt + Command + J on the search field, use searchField.sendKeys(Key.chord(Key.ALT, Key.COMMAND, 'J'));. Note that sendKeys is a method of WebElement, not of the driver itself.
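Non-character keys travel to the browser as ordinary strings that contain special Unicode code points. The following sketch uses local stand-ins, Keys and chord, rather than importing selenium-webdriver; the code points are the ones Selenium's Key object uses for these keys and that the WebDriver specification lists in its key table (for example, U+E007 for Enter and U+E03D for Meta, which is the Command key). Selenium's chord method joins the keys and appends the NULL key, which the protocol uses as the signal to release the modifier keys that are still held down.

```javascript
// Local stand-ins for illustration only; not imported from selenium-webdriver.
const Keys = {
  NULL: '\uE000', // chord terminator: releases the held modifier keys
  ENTER: '\uE007',
  SHIFT: '\uE008',
  CONTROL: '\uE009',
  ALT: '\uE00A',
  COMMAND: '\uE03D', // also known as META
};

// Press all the given keys together, then release the modifiers with NULL
function chord(...keys) {
  return keys.join('') + Keys.NULL;
}
```

So the Alt + Command + J chord is just the string '\uE00A\uE03DJ\uE000', which is why it can be passed to sendKeys like any other text.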
Sometimes, the center of an element, which is where click() clicks, is not clickable, and an exception is thrown! This can happen, for example, with table rows, since the center of a table row may just be the padding between cells!
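Apart from sending events to an element, we can inspect its contents with methods such as getText(), getTagName(), getAttribute(), getCssValue(), isDisplayed(), isEnabled(), and isSelected(), all of which return promises. Note that getAttribute() may return the current value of the DOM property with the given name rather than the value of the HTML attribute.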
If you really need to be precise about getting an attribute or a property, it is much better to use an injected script to get it.
As you can see, the WebElement API is pretty simple and allows us to do most of our tests easily. However, what if we need to perform some complex interaction with the UI, such as drag-and-drop?
WebDriverJS allows us to define a complex action gesture in an easy way using the DSL defined in the webdriver.ActionSequence object. This DSL allows us to define any sequence of browser events using the builder pattern. For example, to simulate a drag-and-drop gesture, we can use the following code:
var beverageElement = driver.findElement({ id: 'expresso' });
var orderElement = driver.findElement({ id: 'order' });
driver.actions()
.mouseMove(beverageElement)
.mouseDown()
.mouseMove(orderElement)
.mouseUp()
.perform();
We want to drag an espresso to our order, so we move the mouse to the center of the espresso element and press the mouse button. Then, we drag the element by moving the mouse over the order. Finally, we release the mouse button to drop the espresso.
We can add as many actions as we want, but the sequence of events will not be executed until we call the perform method. The perform method returns a promise that will be fulfilled when the full sequence is finished.
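The two properties just described, chainable methods and deferred execution, are what the builder pattern gives us. The toy GestureBuilder below is a hypothetical illustration, not webdriver.ActionSequence itself: each method only records a step and returns this, and perform() runs the recorded steps in order through an executor function, returning a promise that is fulfilled when the whole sequence is done.

```javascript
// Toy builder (hypothetical, not the real webdriver.ActionSequence)
class GestureBuilder {
  constructor(execute) {
    this.execute = execute; // function that runs a single step, may return a promise
    this.steps = [];
  }

  mouseMove(target) {
    this.steps.push({ type: 'mouseMove', target });
    return this; // returning `this` is what makes the calls chainable
  }

  mouseDown() {
    this.steps.push({ type: 'mouseDown' });
    return this;
  }

  mouseUp() {
    this.steps.push({ type: 'mouseUp' });
    return this;
  }

  // Nothing has run so far; now execute the steps one after another
  perform() {
    return this.steps.reduce(
      (previous, step) => previous.then(() => this.execute(step)),
      Promise.resolve()
    );
  }
}
```

Building the drag-and-drop chain with this toy leaves the executor untouched until perform() is called, mirroring the behavior described for driver.actions().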
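Besides mouseMove(), mouseDown(), mouseUp(), and perform(), the webdriver.ActionSequence object offers methods such as click(), doubleClick(), dragAndDrop(), keyDown(), keyUp(), and sendKeys(). The mouse button methods accept an optional element, in whose center the event is emitted, and an optional mouse button, as in the following examples: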
var Button = require('selenium-webdriver').Button;
var expresso = driver.findElement({ id: 'expresso' });
// to emit the event in the center of the expresso element
driver.actions().mouseDown(expresso).perform();
// to make a right click in the current position
driver.actions().click(Button.RIGHT).perform();
// middle click in the expresso element
driver.actions().click(expresso, Button.MIDDLE).perform();
The webdriver.Button object defines the three possible buttons of a mouse: LEFT, RIGHT, and MIDDLE. However, note that mouseDown() and mouseUp() only support the LEFT button!
We can use WebDriver to execute scripts in the browser and then wait for their results. There are two methods for this: executeScript and executeAsyncScript.
Both methods receive a script and an optional list of parameters and send the script and the parameters to the browser to be executed. They return a promise that will be fulfilled with the result of the script, or rejected if the script fails.
An important detail is how the script and its parameters are sent to the browser. For this, they need to be serialized and sent through the network. Once there, they are deserialized, and the script is executed inside a self-executing function that receives the parameters as arguments. As a result, our scripts cannot access any variable in our tests unless it is explicitly sent as a parameter. The script is executed in the browser with the window object as its execution context (the value of this).
When passing parameters, we need to take into consideration the kind of data that WebDriver can serialize: booleans, numbers, strings, and WebElement objects, as well as arrays and objects whose items are of those types.
With this in mind, we could, for example, retrieve the identifier of an element as follows:
var elementSelector = ".order ul > li";
driver.executeScript(
  "return document.querySelector(arguments[0]).id;",
  elementSelector
).then(function(id) {
  expect(id).to.be.equal('order_item0');
});
Notice that the script is specified as a string with the code. This can be a bit awkward, so there is an alternative available:
var elementSelector = ".order ul > li";
driver.executeScript(function() {
  var selector = arguments[0];
  return document.querySelector(selector).id;
}, elementSelector).then(function(id) {
  expect(id).to.be.equal('order_item0');
});
WebDriver will just convert the function to a string and send it to the browser. Since the script is executed in the browser, we cannot access the elementSelector variable, and we need to access it through parameters. Unfortunately, we are forced to retrieve the parameters using the arguments pseudoarray, because WebDriver has no way of knowing the name of each argument.
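The following sketch shows why closures do not survive the trip. toScriptSource builds a wrapper similar in spirit to the one WebDriverJS sends for a function, and runInFreshScope is a hypothetical stand-in for the browser that compiles the source with the Function constructor, whose code can see global variables but never the caller's local ones.

```javascript
// Only the source text of the function is kept; the wrapper invokes it
// with whatever arguments arrive on the other side.
function toScriptSource(fn) {
  return 'return (' + fn.toString() + ').apply(null, arguments);';
}

// Hypothetical stand-in for the browser: new Function() compiles the code
// in the global scope, so it does not close over the caller's locals.
function runInFreshScope(source, args) {
  return new Function(source).apply(null, args);
}
```

A script that reads arguments[0] works as expected, whereas a script that refers to a local variable of the test, such as elementSelector, fails with a ReferenceError, which is exactly the situation described above.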
As its name suggests, executeAsyncScript allows us to execute an asynchronous script. In this case, the last argument provided to the script is always a callback that we need to call to signal that the script has finished. The result of the script will be the first argument provided to that callback. If no argument, or undefined, is explicitly provided, then the result will be null. Note that this is not directly compatible with the Node.js callback convention and that any extra parameters passed to the callback will be ignored. There is no way to explicitly signal an error asynchronously.
For example, to return the value of an asynchronous DAO, we can use the following code:
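driver.executeAsyncScript(function() {
  var userId = arguments[0];
  var cb = arguments[1]; // callback that signals the end of the script
  // both the user and any error are returned through the same callback
  window.userDAO.findById(userId).then(cb, cb);
}, 'user1').then(function(userOrError) {
  // deep equality: the result is a deserialized copy, never the same object
  expect(userOrError).to.deep.equal(expectedUser);
});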
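Because the callback has no error channel, the preceding example cannot distinguish a user from an error; the test only notices when the comparison fails. A sketch of a workaround is to wrap both outcomes in a plain object. In the code below, emulateExecuteAsyncScript is a hypothetical local emulation of the contract just described, so the sketch runs without a browser, and userDAO is the same hypothetical DAO as in the example, looked up on globalThis (which is window in the browser). With a real driver, findUserScript would be passed to driver.executeAsyncScript(findUserScript, 'user1') instead.

```javascript
// Hypothetical emulation of the executeAsyncScript contract: the script gets
// its arguments plus a trailing callback; the result is the callback's first
// argument, or null when that argument is undefined; extra ones are ignored.
function emulateExecuteAsyncScript(script, ...args) {
  return new Promise((resolve) => {
    script(...args, (result) => resolve(result === undefined ? null : result));
  });
}

// Script body written as it would be sent to the browser
function findUserScript() {
  var userId = arguments[0];
  var done = arguments[arguments.length - 1];
  globalThis.userDAO.findById(userId).then(
    function (user) { done({ value: user }); },
    // Convert the error to a string: it must survive serialization
    function (err) { done({ error: String(err) }); }
  );
}
```

The test can then assert on result.value or result.error explicitly instead of guessing which one it received.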
All the commands in WebDriverJS are asynchronous and return a promise or a WebElement. How do we execute an ordered sequence of commands? Using promises explicitly, it could look something like this:
return driver.findElement({name:'quantity'}).sendKeys('23')
  .then(function() {
    return driver.findElement({name:'add'}).click();
  })
  .then(function() {
    return driver.findElement({css:firstItemSel}).getText();
  })
  .then(function(quantity) {
    expect(quantity).to.be.equal('23');
  });
This works because we wait for each command to finish before issuing the next command. However, it is a bit verbose. Fortunately, with WebDriverJS (plus the chai-as-promised plugin, which provides Chai's eventually property) we can do the following:
driver.findElement({name:'quantity'}).sendKeys('23');
driver.findElement({name:'add'}).click();
return expect(driver.findElement({css:firstItemSel}).getText())
  .to.eventually.be.equal('23');
How can the preceding code work? Because whenever we tell WebDriverJS to do something, it simply schedules the requested command in a queue-like structure called the control flow. The point is that each command will not be executed until it reaches the top of the queue. This way, we do not need to explicitly wait for the sendKeys command to be completed before executing the click command. The sendKeys command is scheduled in the control flow before click, so the latter one will not be executed until sendKeys is done.
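The following toy ControlFlow is a hypothetical sketch of that idea, not the real webdriver.promise.ControlFlow, which does considerably more: schedule() returns a promise immediately, but each command only starts after every previously scheduled command has settled.

```javascript
// Toy command queue in the spirit of the WebDriverJS control flow
class ControlFlow {
  constructor() {
    this.tail = Promise.resolve(); // settles when the last command settles
  }

  // Returns at once; `command` runs only after all earlier commands
  schedule(command) {
    const result = this.tail.then(() => command());
    // Keep the queue moving even if this command fails
    this.tail = result.catch(() => {});
    return result;
  }
}
```

If a slow command (say, a sendKeys that takes 30 ms) is scheduled before a fast one (a click that takes no time at all), the click still runs second, because it is chained onto the completion of the sendKeys. This is why the test code can be written as a flat list of statements.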
All the commands are scheduled against the same control flow queue that is associated with the WebDriver object. However, we can optionally create several control flows if we want to execute commands in parallel:
var webdriver = require('selenium-webdriver');
var flow1 = webdriver.promise.createFlow(function() {
  var driver = new webdriver.Builder().build();
  // do something with driver here
});
var flow2 = webdriver.promise.createFlow(function() {
  var driver = new webdriver.Builder().build();
  // do something with driver here
});
webdriver.promise.fullyResolved([flow1, flow2]).then(function() {
  // flow1 and flow2 have both finished; do something else here
});
We need to create each control flow instance manually and, inside each flow, create a separate WebDriver instance. The commands in both flows will be executed in parallel, and we can use fullyResolved to wait for both of them to finish before doing something else. In fact, we can even nest flows if needed to create a custom parallel command-execution graph.
Sometimes, it is useful to take some screenshots of the current screen for debugging purposes. This can be done with the takeScreenshot() method, which returns a promise that will be fulfilled with a string containing a base64-encoded PNG. It is our responsibility to save this string as a PNG file. The following snippet of code will do the trick:
var fs = require('fs');
driver.takeScreenshot().then(function(shot) {
  fs.writeFileSync(fileFullPath, shot, 'base64');
});
Note that not all browsers support this capability. Read the documentation for the specific browser adapter to see if it is available.
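When the browser does return a screenshot, it can be worth checking that the decoded data really is a PNG before writing it to disk. The saveScreenshot helper below is a hypothetical sketch that decodes the base64 string with Buffer.from and compares the first eight bytes with the standard PNG file signature.

```javascript
const fs = require('fs');

// The eight bytes every PNG file starts with
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Decode the base64 string resolved by takeScreenshot() and save it,
// refusing to write anything that does not look like a PNG.
// Returns the number of bytes written.
function saveScreenshot(base64Png, fileFullPath) {
  const bytes = Buffer.from(base64Png, 'base64');
  const header = bytes.subarray(0, PNG_SIGNATURE.length);
  if (bytes.length < PNG_SIGNATURE.length || !header.equals(PNG_SIGNATURE)) {
    throw new Error('Screenshot data is not a PNG image');
  }
  fs.writeFileSync(fileFullPath, bytes);
  return bytes.length;
}
```

It would be used in the same place as the earlier snippet: driver.takeScreenshot().then(function(shot) { saveScreenshot(shot, fileFullPath); });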
WebDriver allows us to control several tabs, or windows, of the same browser. This can be useful if we want to test several pages in parallel or if our test needs to assert or manipulate things in several frames at the same time. This can be done with the switchTo() method, which returns a webdriver.WebDriver.TargetLocator object. This object allows us to change the target of our commands to a specific frame or window. It has three main methods: frame(), which switches to a given frame; window(), which switches to a given window or tab; and alert(), which gives access to the currently open alert dialog.
We can dismiss an alert with driver.switchTo().alert().dismiss();.
The promise returned by these methods will be rejected if the specified window, frame, or alert window is not found.
To run tests on several tabs at the same time, we must ensure that they do not share any kind of state or interfere with each other through cookies, local storage, or any other mechanism.
This article showed us that a good way to test the UI of an application is to split it into two parts and test them separately. One part is the core logic of the UI, which takes responsibility for control logic, models, calls to the server, validations, and so on. This part can be tested in a classic way, using BDD and mocking the server access. No new techniques are needed for this, and the tests will be fast. Here, we can involve nonengineer stakeholders, such as UX designers, users, and so on, to write some nice BDD features using Gherkin and Cucumber.js.
The other part is a thin view layer that follows a passive view design. It only updates the HTML when it is asked to, and listens to DOM events to transform them into requests to the core UI logic layer. This layer has no internal state or control rules; it simply transforms data and manipulates the DOM. We can use WebDriverJS to test the view.
This is a good approach because the most complex part of the UI can easily be fully test-driven, while the view, which is hard and slow to test, does not need many tests since it is very simple. For this to hold, the passive view must not have any state; it should only act as a proxy of the DOM.