Low-Maintenance Selenium RC Web App Test Code

In my rank procedural coding days, I might have written through-the-web-app-GUI test code that looks like this (we’re using TestNG here, though at first it may look like JUnit 4):

This code runs fine. And with a bit of probing (including reading comments), one can sort of tell what it does. It sets up some Selenium machinery, asks Selenium to enter some values in three fields identified via xpath, then verifies that another field (apparently a text field) contains an expected string. It then tears down the Selenium machinery.
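A minimal sketch of that procedural shape follows. It is an illustration only: FakeSelenium is a hypothetical in-memory stand-in that mimics just the type() and getText() calls of the Selenium RC client, the xpath locators and the triangle rules are invented, and a plain AssertionError replaces the TestNG assertion so that the sketch runs without any libraries.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the Selenium RC client. It mimics only
// type() and getText(), so the sketch runs without a browser or an RC server.
class FakeSelenium {
    private final Map<String, String> typed = new HashMap<>();

    void type(String locator, String value) {
        typed.put(locator, value);
    }

    // Ignores the locator and classifies whatever three sides were typed,
    // standing in for the page updating its triangle-type label.
    String getText(String locator) {
        int[] s = typed.values().stream().mapToInt(Integer::parseInt).sorted().toArray();
        if (s.length != 3) return "";
        if (s[0] == s[2]) return "Equilateral";
        if (s[0] * s[0] + s[1] * s[1] == s[2] * s[2]) return "Right";
        return "Other";
    }
}

class ProceduralTriangleTest {
    // With TestNG this method would carry @Test, and the set-up and tear-down
    // of a real Selenium session would surround the steps below.
    static void testRightTriangle() {
        FakeSelenium selenium = new FakeSelenium();

        // Enter the three sides (locators invented for illustration).
        selenium.type("//input[@id='side1']", "3");
        selenium.type("//input[@id='side2']", "4");
        selenium.type("//input[@id='side3']", "5");

        // Verify that the label shows the expected triangle type.
        String actual = selenium.getText("//span[@id='triangleType']");
        if (!"Right".equals(actual)) {
            throw new AssertionError("expected Right but was " + actual);
        }
    }
}
```

Even in this toy form, the test body speaks only of locators and values; nothing in it says "triangle page" or "sides".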

If one knows Refactoring in a procedural sort of way, which is to say, knowing enough to eliminate rank setup duplication within a TestCase class (to use the JUnit term), one might then extract the blocks of setup and teardown code to appropriate methods, plus a single private helper method, leaving us, in the test method, with this:

The Expressiveness Problem

Aside from the test method name, we still cannot tell much about what this test actually tests on the page. And in this case, the test method semantics do not help us understand. All we have are the Selenium semantics for marching around the DOM: “Put this value in this field; check that this other field’s value is such and such.”

“Lingual Design”

There are abstraction layers trying to emerge here — separate groups of classes and methods.

And you know what? I hate that term, “abstraction layer.” I’ve always hated it, and several others nearby it (API, command-query separation, package cohesion, blah, blah). I want a term that describes the lingual expressiveness of a cohesive set of classes and methods.

So I’ll invent (with a nod to Mike Hill and Dave LeBlanc, who helped me with this invention) a couple of new terms: Lingo, and Lingual Design (stay tuned, later, for a Lingual Design blog post). Think of a Lingo as a small, bounded vernacular language, expressed as the classes and methods (what we usually call the API) of a discrete bit of solution domain (see a bit more definition below). I know. I’m not being much clearer yet than your average IEEE paper on C++ idioms published in 1993. But bear with me. I strive for clarity here. I really do.

For now, I’ll call each of the abstraction layers trying to emerge here a Lingo. See if you can grasp what I mean by that, as I march through the code, pulling it apart.

The topmost Lingo trying to emerge here is the vernacular we use naturally when we describe to each other how to use the web app: “You go to the triangle testing page, then you enter the values for the sides of the triangle, and it immediately tells you what kind of triangle you have.” So far our test expresses very little of that lingo.

Highest-Level Lingo: Operating the Web App as a User

Here is a test method that accomplishes exactly what the one above does, but with very different semantics/lingo:

It’s clearer at first reading what we are testing. The lingo used by the test method is the highest-level slang people use when they talk about operating and testing the web application: “First check that we can enter the 3 sides for a Right triangle, then verify that the triangle label indeed reads ‘Right’.” The trianglePage object and the methods accessible to the test describe the behavior of the web page involved. We might quibble about how much or how little of the behavior on the page has been verified, but the semantics are now at the right level of abstraction. We are mostly sticking to an appropriate lingo. We aren’t yet talking about what kinds of HTML elements are being manipulated (its own lingo), much less the Selenium “selenese” lingo for manipulating them.
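In sketch form, a test method written in that lingo might read as follows. The class name TriangleTestHomePage and the method names canSpecifyTriangle() and triangleTypeLabelReads() come from this post; the signatures, the boolean return values, the "Other" label, and the in-memory page body are all assumptions made so that the sketch runs on its own. A real page class would delegate to page-control facades, and a real test would use TestNG's @Test and assertions.

```java
import java.util.Arrays;

// Placeholder page object: in-memory logic stands in for a real browser session.
class TriangleTestHomePage {
    private final int[] sides = new int[3];
    private boolean specified;

    // True when all three sides were accepted by the page (signature assumed).
    boolean canSpecifyTriangle(String side1, String side2, String side3) {
        try {
            sides[0] = Integer.parseInt(side1);
            sides[1] = Integer.parseInt(side2);
            sides[2] = Integer.parseInt(side3);
            specified = true;
        } catch (NumberFormatException e) {
            specified = false;
        }
        return specified;
    }

    // True when the triangle-type label shows the expected text (signature assumed).
    boolean triangleTypeLabelReads(String expected) {
        return specified && expected.equals(classify());
    }

    // Invented rules, just enough to give the label something to say.
    private String classify() {
        int[] s = sides.clone();
        Arrays.sort(s);
        if (s[0] == s[2]) return "Equilateral";
        if (s[0] * s[0] + s[1] * s[1] == s[2] * s[2]) return "Right";
        return "Other";
    }
}

class TrianglePageTest {
    // With TestNG this would carry @Test and call assertTrue instead of check().
    static void testRightTriangle() {
        TriangleTestHomePage trianglePage = new TriangleTestHomePage();
        check(trianglePage.canSpecifyTriangle("3", "4", "5"), "could not enter sides");
        check(trianglePage.triangleTypeLabelReads("Right"), "label did not read Right");
    }

    static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }
}
```

The test body contains no locators, no element types, and no selenese; everything it says is about the page as a user would describe it.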

Second Level Lingo: Operating the Controls on the Web Page

As with any OOD, this highest-level Lingo encapsulates and hides details that are not useful in the test method code. Here is a peek at the area of the code that provides this encapsulation. Under the test method’s covers, there is a TriangleTestHomePage class:

I’ve omitted a few private methods for brevity and clarity. Here is our second-level page-control Lingo, hiding inside methods “canSpecifyTriangle()” and “triangleTypeLabelReads()”: a TextField “canEnterText()”. A TextLabel “reads()” as expected or not. (Note: these are not distant third-party-library page control classes. They are our own facades; more on this below.)

Notice a few other things about this class. One is that our TriangleTestHomePage knows nothing about Selenium; it is mostly decoupled from it (we could easily sever the rest of the coupling with a PageControl interface).

Another point is that this page class is autonomous and self verifying, to at least some degree. It knows its own URL and navigates Selenium there. It knows its own expected page title, and the base BasePageContainer class contains a verifyTitleIsCorrect() method that asserts that the actual page title matches the expected one passed to it.
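The self-verifying behavior described above might be sketched like this. BasePageContainer, verifyTitleIsCorrect(), and TriangleTestHomePage are names from this post; everything else is assumed for illustration, including the URL, the title text, the choice to navigate and verify in the constructor, and FakeSelenium, a hypothetical stand-in that mimics only the open() and getTitle() calls of the Selenium RC client.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Selenium RC client (open and getTitle only).
class FakeSelenium {
    private final Map<String, String> titlesByUrl = new HashMap<>();
    private String currentUrl = "";

    // Test hook for this sketch; not a Selenium call.
    void addPage(String url, String title) {
        titlesByUrl.put(url, title);
    }

    void open(String url) {
        currentUrl = url;
    }

    String getTitle() {
        return titlesByUrl.getOrDefault(currentUrl, "");
    }
}

// Shared behavior for all page classes.
abstract class BasePageContainer {
    protected final FakeSelenium selenium;

    BasePageContainer(FakeSelenium selenium) {
        this.selenium = selenium;
    }

    // Fails when the browser's actual title differs from the expected one.
    void verifyTitleIsCorrect(String expectedTitle) {
        String actual = selenium.getTitle();
        if (!expectedTitle.equals(actual)) {
            throw new AssertionError(
                "expected title '" + expectedTitle + "' but was '" + actual + "'");
        }
    }
}

// The page knows its own URL and expected title (both values invented here).
class TriangleTestHomePage extends BasePageContainer {
    static final String URL = "/triangle";
    static final String EXPECTED_TITLE = "Triangle Test";

    TriangleTestHomePage(FakeSelenium selenium) {
        super(selenium);
        selenium.open(URL);                   // navigate to itself
        verifyTitleIsCorrect(EXPECTED_TITLE); // check it landed on the right page
    }
}
```

With this arrangement, a test that merely constructs the page object has already confirmed that navigation landed on the right page.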

Third Level Lingo: Selenium “Selenese” for Manipulating Page Controls

Here is the TextField class:

Within the canEnterText() method, we finally begin to speak in the lingo we inherited when we chose Selenium — the one we began with in the first procedural example up top: the down-and-dirty “selenese” of walking the DOM, changing its state, and verifying that state: selenium.type(id, entry). In the real world of web app testing, this selenese is frequently fraught with xpath and regex ugliness. However pretty or ugly, we want such low-level lingo as far from the test method lingo as possible.
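A minimal sketch of the two page-control facades, assuming a hypothetical FakeSelenium stand-in that mimics the type(), getValue(), and getText() calls of the Selenium RC client. The class and method names TextField, canEnterText(), TextLabel, and reads() come from this post; the constructors, the read-back check inside canEnterText(), and the element ids are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Selenium RC client.
class FakeSelenium {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, String> texts = new HashMap<>();

    void type(String locator, String value) {
        values.put(locator, value);
    }

    String getValue(String locator) {
        return values.getOrDefault(locator, "");
    }

    String getText(String locator) {
        return texts.getOrDefault(locator, "");
    }

    // Test hook for this sketch; not a Selenium call.
    void setText(String locator, String text) {
        texts.put(locator, text);
    }
}

// Facade for a text input: the only place that speaks selenese about fields.
class TextField {
    private final FakeSelenium selenium;
    private final String id;

    TextField(FakeSelenium selenium, String id) {
        this.selenium = selenium;
        this.id = id;
    }

    // Types the entry, then reads it back to confirm that the field took it.
    boolean canEnterText(String entry) {
        selenium.type(id, entry);
        return entry.equals(selenium.getValue(id));
    }
}

// Facade for a read-only label.
class TextLabel {
    private final FakeSelenium selenium;
    private final String id;

    TextLabel(FakeSelenium selenium, String id) {
        this.selenium = selenium;
        this.id = id;
    }

    boolean reads(String expected) {
        return expected.equals(selenium.getText(id));
    }
}
```

A page class can then hold a few such fields and labels and never mention a locator-level call itself.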

Summary: Three Lingual Tiers

We end up with a hierarchy of three little Lingos, three little vernaculars: a test method expresses itself in the lingo of web page operation from a user’s standpoint. It does this using facade-like classes that contain methods describing page-specific operations. Some of those page methods are on the page classes themselves, while others are on an underlying BasePageContainer class. (Note, OOD folks, that these Lingos tend naturally to be encapsulated in Common-Closure-Principle-compliant packages of classes. I want cohesive packages to be Lingos, and vice versa.)

The page classes contain methods that in turn express themselves in the lingo of page controls, such as typing in entries (and verifying that such operations work), and checking the values of labels.

Finally, the methods on those page control classes (TextLabel, TextField) express themselves in the nitty-gritty Selenium RC selenese lingo of traversing and manipulating a page DOM, and checking its state.

Each of these Lingos can and should be organized into its own package hierarchy. And while we might start with simple 2-tier object hierarchies per package/layer, they might easily grow new tiers. (Groups of web page facade classes might, for example, have some duplicate portlets that deserve an intermediate class.)

Why Bother, For Crying Out Loud?

The extensibility that all of this buys us can be the difference between a test suite that gets maintained, and one that gets abandoned. With these three tiers of vernaculars, we can have very expressive, succinct web page flow tests that contain the least duplicate code, and are readily extended. The web page classes and page control classes under the covers can be just as clear about their page-control-level operations, while they hide details and squeeze out duplication. This is fairly standard, healthy Object Oriented programming, but explained in a more accessible way (I hope!).

Our test methods can be concise even with very lengthy page flows. We need not duplicate any Selenium code, nor anything but the actual page flow operations (duplication that is unavoidable in such tests). So let’s now look at this in the context of some non-toy code.

Meanwhile, in the Real World

I am currently evaluating the following test code. Based on our above guidelines, what condition do you think it’s in?

So these private helpers are more expressive than selenese, granted. But where do all of these private helpers end up living? In a single web page TestCase “God class”? An unrelated set of such classes? In a bit of TestCase object tree? This particular app has several more pages, and more kinds of expected and actual values to be compared to each other. Will all of the underlying selenese be duplicated?

With this design paradigm, even with a bit of object tree here and there, we still have a more elaborate version of the refactored procedural example we began with up top. We’ve squeezed the selenese out of the test code, but we are still conflating Lingos. We still have duplication down in the details, and the objects in play are not nearly as expressive and extensible as they could be. We are still on a slippery slope.

I’ve been refactoring this test method, while stubbing out the supporting Lingos it needs to be expressive in the way we have been discussing. Here is what I have so far:

And for example, my first stub pass at the SearchResultDetailedView class looks like this:

So this could doubtless use another pass or two, and it lacks some actual implementation here and there. And our TextBlock class would likely need its equals() method overridden, or would require a matches() method of some sort. Its entire semantics might need more thought. How much expected state should be passed all the way from the test method? Good question.
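One possible answer to the equals()-or-matches() question, as a sketch only: the TextBlock name is from this post, but the constructor, the whitespace-insensitive comparison, and the sample text are assumptions picked for illustration. The real semantics would depend on what the page actually renders.

```java
// Sketch of a value object wrapping a block of text read from (or expected on) a page.
class TextBlock {
    private final String text;

    TextBlock(String text) {
        this.text = (text == null) ? "" : text;
    }

    // Assumed rule: leading and trailing whitespace is ignored, and any run of
    // whitespace inside the text counts as a single space.
    private String normalized() {
        return text.trim().replaceAll("\\s+", " ");
    }

    // True when both blocks carry the same text after normalization.
    boolean matches(TextBlock expected) {
        return expected != null && normalized().equals(expected.normalized());
    }

    // equals() and hashCode() follow the same rule, so TextBlocks also compare
    // sensibly inside collections and in assertEquals-style assertions.
    @Override
    public boolean equals(Object other) {
        return other instanceof TextBlock && matches((TextBlock) other);
    }

    @Override
    public int hashCode() {
        return normalized().hashCode();
    }

    @Override
    public String toString() {
        return normalized();
    }
}
```

A test could then compare an actual block scraped from the page against an expected one without caring how the markup happened to wrap the text.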

But we are at least making steps in the right design direction. Our test method now uses a user-centered lingo that reads like this: “From the advanced search page, search for this author name using this search type. On the resulting advanced results page, verify that all returned actuals match expected values.”

And as we push down from there, we evolve the underlying layers, keeping them SRP-compliant and DRY.

Web App Test Code is OO Code Too

Most quasi-procedural functional test code mixes up at least three Lingos. In many situations, you’ll have more than four layers in play. For example, for one recent team and app, the customer wanted a BDD-style FitNesse test page that ran green when a large tradeshow conference demo page flow had successfully been verified by Selenium RC. The FitNesse fixtures used “given/when/then” semantics that were their own lingo (in fact, a true Domain Specific Language, or DSL), and they had to be crafted to reuse two of the other Lingos shared by TestNG. It turned out that all of this was fairly straightforward to introduce in a way that was extensible, expressive, and DRY.

We want to minimize the Total Cost of Ownership (TCO) of our test code, just as we do for any other code asset. A couple of short examples may not show it, but mixing up Lingos eventually guarantees duplicate code, lack of test clarity, lack of problem domain clarity, and a general lack of expressiveness, with every new TestCase and test method. This inevitably increases our TCO for that code.

Whether you are an agile tester, or a programmer with agile testing responsibilities, and whether you are using Selenium or HtmlUnit or WATIR or RobotFramework or whatever, learn to recognize the boundaries between the little Lingos you are given by your tools, as well as the ones you are creating. Keep them separate to keep your code DRY, expressive, and extensible. And have fun. Outside the hellish realms of QTP and VBS, automated functional test code can be enormous fun.

Note: you can play with code very similar to the first examples above, by checking it out from its Google Code repository using any Subversion client, here. The codebase is an exercise we use to hire potential agile testers, as a first gate to measure their coding ability. You can read more about that here.