Low-Maintenance Selenium RC Web App Test Code

In my rank procedural coding days, I might have written through-the-web-app-GUI test code that looks like this (we’re using TestNG here, though at first it may look like JUnit 4):
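
Something like this sketch (the xpath locators, URL, port, and values are all illustrative):

    import com.thoughtworks.selenium.DefaultSelenium;
    import com.thoughtworks.selenium.Selenium;
    import org.testng.annotations.Test;
    import static org.testng.Assert.assertEquals;

    public class TriangleTest {

        @Test
        public void testRightTriangle() {
            // set up the Selenium machinery
            Selenium selenium = new DefaultSelenium(
                    "localhost", 4444, "*firefox", "http://localhost:8080/");
            selenium.start();
            selenium.open("/triangle");

            // enter the three sides, each field located via xpath
            selenium.type("//form/input[1]", "3");
            selenium.type("//form/input[2]", "4");
            selenium.type("//form/input[3]", "5");
            selenium.click("//form/input[@type='submit']");

            // verify that the triangle-type field contains the expected string
            assertEquals(selenium.getValue("//form/input[4]"), "Right");

            // tear down the Selenium machinery
            selenium.stop();
        }
    }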

This code runs fine. And with a bit of probing (including reading the comments), one can sort of tell what it does. It sets up some Selenium machinery, asks Selenium to enter some values in 3 fields identified via xpath, then verifies that another field (it looks like a text field) contains an expected string. It then tears down the Selenium machinery.

If one knows Refactoring in a procedural sort of way, which is to say, one knows enough to eliminate rank setup duplication within a TestCase class (to use the JUnit term), one might then extract the blocks of setup and teardown code into appropriate methods, plus a single private helper method, leaving us, in the test method, with this:
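
In sketch form (setUp() and tearDown() now hold the Selenium machinery, and enterSides() is the extracted private helper; names are illustrative):

    @Test
    public void testRightTriangle() {
        enterSides("3", "4", "5");
        assertEquals(selenium.getValue("//form/input[4]"), "Right");
    }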

The Expressiveness Problem

Aside from the test method name, we still cannot tell much about what this test actually tests on the page. And in this case, the test method semantics do not help us understand. All we have are the Selenium semantics for marching around the DOM: “Put this value in this field; check that this other field’s value is such and such.”

“Lingual Design”

There are abstraction layers trying to emerge here — separate groups of classes and methods.

And you know what? I hate that term, “abstraction layer.” I’ve always hated it, and several others nearby it (API, command-query separation, package cohesion, blah, blah). I want a term that describes the lingual expressiveness of a cohesive set of classes and methods.

So I’ll invent (with a nod to Mike Hill and Dave LeBlanc, who helped me with this invention) a couple of new terms: Lingo, and Lingual Design (stay tuned, later, for a Lingual Design blog post). Think of a Lingo as a small, bounded vernacular language, expressed as the classes and methods (what we usually call the API) of a discrete bit of solution domain (see a bit more definition below). I know. I’m not being much clearer yet than your average IEEE paper on C++ idioms published in 1993. But bear with me. I strive for clarity here. I really do.

For now, I’ll call each of the abstraction layers trying to emerge here a Lingo. See if you can grasp what I mean by that, as I march through the code, pulling it apart.

The topmost Lingo trying to emerge here is the vernacular we use naturally when we describe to each other how to use the web app: “You go to the triangle testing page, then you enter the values for the sides of the triangle, and it immediately tells you what kind of triangle you have.” So far our test expresses very little of that lingo.

Highest-Level Lingo: Operating the Web App as a User

Here is a test method that accomplishes exactly what the one above does, but with very different semantics/lingo:
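
In sketch form (trianglePage would be a field initialized in setUp(); assertTrue is TestNG’s):

    @Test
    public void testRightTriangle() {
        assertTrue(trianglePage.canSpecifyTriangle("3", "4", "5"));
        assertTrue(trianglePage.triangleTypeLabelReads("Right"));
    }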

It’s clearer at first reading what we are testing. The lingo used by the test method is the highest-level slang people use when they talk about operating and testing the web application: “First check that we can enter the 3 sides for a Right triangle, then verify that the triangle label indeed reads ‘Right’.” The trianglePage object and the methods accessible to the test describe the behavior of the web page involved. We might quibble about how much or how little of the behavior on the page has been verified, but the semantics are now at the right level of abstraction. We are mostly sticking to an appropriate lingo. We aren’t yet talking about what kinds of HTML elements are being manipulated (its own lingo), much less the Selenium “selenese” lingo for manipulating them.

Second Level Lingo: Operating the Controls on the Web Page

As with any OOD, this highest-level Lingo encapsulates and hides details that are not useful in the test method code. Here is a peek at the area of the code that provides this encapsulation. Under the test method’s covers, there is a TriangleTestHomePage class:
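
A sketch of it (the control ids, URL, and page title are illustrative; open() and verifyTitleIsCorrect() come from the base class discussed below):

    public class TriangleTestHomePage extends BasePageContainer {

        private final TextField side1 = new TextField("side1");
        private final TextField side2 = new TextField("side2");
        private final TextField side3 = new TextField("side3");
        private final TextLabel triangleTypeLabel = new TextLabel("triangleType");

        public TriangleTestHomePage() {
            open("/triangle");
            verifyTitleIsCorrect("Triangle Test");
        }

        public boolean canSpecifyTriangle(String side1Entry, String side2Entry,
                String side3Entry) {
            return side1.canEnterText(side1Entry)
                    && side2.canEnterText(side2Entry)
                    && side3.canEnterText(side3Entry);
        }

        public boolean triangleTypeLabelReads(String expectedLabel) {
            return triangleTypeLabel.reads(expectedLabel);
        }
    }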

I’ve omitted a few private methods for brevity and clarity. Here is our second-level page-control Lingo, hiding inside the methods “canSpecifyTriangle()” and “triangleTypeLabelReads()”: a TextField “canEnterText()”; a TextLabel “reads()” as expected or not. (Note: these are not distant third-party-library page control classes. They are our own facades; more on this below.)

Notice a few other things about this class. One is that our TriangleTestHomePage knows almost nothing about Selenium; it is mostly decoupled from it (we could easily sever the remaining coupling with a PageControl interface).

Another point is that this page class is autonomous and self-verifying, to at least some degree. It knows its own URL and navigates Selenium there. It knows its own expected page title, and the BasePageContainer base class contains a verifyTitleIsCorrect() method that asserts that the actual page title matches the expected one passed to it.
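
For illustration, a minimal sketch of that base class, assuming the Selenium instance lives in a hypothetical SeleniumSession holder (so that page classes never touch Selenium directly):

    import org.testng.Assert;

    public abstract class BasePageContainer {

        protected void open(String url) {
            // SeleniumSession is a hypothetical static holder for the one
            // Selenium instance started by the test fixture
            SeleniumSession.get().open(url);
        }

        protected void verifyTitleIsCorrect(String expectedTitle) {
            Assert.assertEquals(SeleniumSession.get().getTitle(), expectedTitle);
        }
    }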

Third Level Lingo: Selenium “Selenese” for Manipulating Page Controls

Here is the TextField class:
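
In sketch form (again leaning on the hypothetical SeleniumSession holder):

    import com.thoughtworks.selenium.Selenium;

    public class TextField {

        private final String id;

        public TextField(String id) {
            this.id = id;
        }

        // down in the selenese at last: type the entry into the field,
        // then read it back to confirm that the entry took
        public boolean canEnterText(String entry) {
            Selenium selenium = SeleniumSession.get();
            selenium.type(id, entry);
            return entry.equals(selenium.getValue(id));
        }
    }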

Within the canEnterText() method, we finally begin to speak in the lingo we inherited when we chose Selenium — the one we began with in the first procedural example up top: the down-and-dirty “selenese” of walking the DOM, changing its state, and verifying that state: selenium.type(id, entry). In the real world of web app testing, this selenese is frequently fraught with xpath and regex ugliness. However pretty or ugly, we want such low-level lingo as far from the test method lingo as possible.

Summary: Three Lingual Tiers

We end up with a hierarchy of three little Lingos, three little vernaculars: a test method expresses itself in the lingo of web page operation from a user’s standpoint. It does this using facade-like classes that contain methods describing page-specific operations. Some of those page methods are on the page classes themselves, while others are on an underlying BasePageContainer class. (Note, OOD folks, that these Lingos tend naturally to be encapsulated in Common-Closure-Principle-compliant packages of classes. I want cohesive packages to be Lingos, and vice versa.)

The page classes contain methods that in turn express themselves in the lingo of page controls, such as typing in entries (and verifying that such operations work), and checking the values of labels.

Finally, the methods on those page control classes (TextLabel, TextField) express themselves in the nitty-gritty Selenium RC selenese lingo of traversing and manipulating a page DOM, and checking its state.

Each of these Lingos can and should be organized into its own package hierarchy. And while we might start with simple 2-tier object hierarchies per package/layer, they might easily grow new tiers. (Groups of web page facade classes might, for example, have some duplicate portlets that deserve an intermediate class.)

Why Bother, For Crying Out Loud?

The extensibility that all of this buys us can be the difference between a test suite that gets maintained and one that gets abandoned. With these three tiers of vernaculars, we can have very expressive, succinct web page flow tests that contain a minimum of duplicate code and are readily extended. The web page classes and page control classes under the covers can be just as clear about their page-control-level operations, while they hide details and squeeze out duplication. This is fairly standard, healthy Object-Oriented programming, but explained in a more accessible way (I hope!).

Our test methods can be concise even with very lengthy page flows. We need not duplicate any Selenium code, nor anything but the actual page flow operations (which are unavoidable in such tests). So let’s now look at this in the context of some non-toy code.

Meanwhile, in the Real World

I am currently evaluating the following test code. Based on our above guidelines, what condition do you think it’s in?
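
It is shaped roughly like this (a disguised sketch; the page flow, helper names, and data are illustrative):

    @Test
    public void testAdvancedAuthorSearch() {
        openAdvancedSearchPage();
        enterSearchTerm("Fowler, Martin");
        selectSearchType("Author");
        clickSearchButton();
        verifyResultTitle("Refactoring");
        verifyResultAuthor("Fowler, Martin");
        verifyResultPublisher("Addison-Wesley");
    }

    // ...followed by a long parade of private helpers wrapping the selenese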

So these private helpers are more expressive than selenese, granted. But where do all of these private helpers end up living? In a single web page TestCase “God class”? An unrelated set of such classes? In a bit of TestCase object tree? This particular app has several more pages, and more kinds of expected and actual values to be compared to each other. Will all of the underlying selenese be duplicated?

With this design paradigm, even with a bit of object tree here and there, we still have a more elaborate version of the refactored procedural example we began with up top. We’ve squeezed the selenese out of the test code, but we are still conflating Lingos. We still have duplication down in the details, and the objects in play are not nearly as expressive and extensible as they could be. We are still on a slippery slope.

I’ve been refactoring this test method, while stubbing out the supporting Lingos it needs to be expressive in the way we have been discussing. Here is what I have so far:
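
A sketch of its current state (AdvancedSearchPage, AdvancedResultsPage, and SearchType are stubs in the spirit of TriangleTestHomePage; expectedView, holding the expected field values, would be built in the test fixture; all names and data are illustrative):

    @Test
    public void testAdvancedAuthorSearch() {
        AdvancedSearchPage searchPage = new AdvancedSearchPage();
        AdvancedResultsPage resultsPage =
                searchPage.searchForAuthor("Fowler, Martin", SearchType.AUTHOR);
        SearchResultDetailedView actualView = resultsPage.detailedViewOfFirstResult();
        assertTrue(actualView.matches(expectedView));
    }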

And for example, my first stub pass at the SearchResultDetailedView class looks like this:
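
Roughly (a stub; the TextBlock fields and their ids are illustrative placeholders):

    public class SearchResultDetailedView {

        private final TextBlock title = new TextBlock("title");
        private final TextBlock author = new TextBlock("author");
        private final TextBlock publisher = new TextBlock("publisher");

        public boolean matches(SearchResultDetailedView expected) {
            // TODO: TextBlock may need equals() overridden, or a
            // matches() of its own; see the discussion below
            return title.matches(expected.title)
                    && author.matches(expected.author)
                    && publisher.matches(expected.publisher);
        }
    }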

So this could doubtless use another pass or two, and it lacks some actual implementation here and there. And our TextBlock class would likely need its equals() method overridden, or would require a matches() method of some sort. Its entire semantics might need more thought. How much expected state should be passed all the way down from the test method? Good question.

But we are at least making steps in the right design direction. Our test method now uses a user-centered lingo that reads like this: “From the advanced search page, search for this author name using this search type. On the resulting advanced results page, verify that all returned actuals match expected values.”

And as we push down from there, we evolve the underlying layers, keeping them SRP-compliant and DRY.

Web App Test Code is OO Code Too

Most quasi-procedural functional test code mixes up at least three Lingos. In many situations, you’ll have four or more layers in play. For example, for one recent team and app, the customer wanted a BDD-style FitNesse test page that ran green when a large tradeshow conference demo page flow had successfully been verified by Selenium RC. The FitNesse fixtures used given/when/then semantics that were their own lingo (in fact, a true Domain Specific Language, or DSL), and had to be crafted to reuse two of the other Lingos shared with the TestNG tests. It turned out that all of this was fairly straightforward to introduce in a way that was extensible, expressive, and DRY.

We want to minimize the Total Cost of Ownership (TCO) of our test code, just as we do for any other code asset. A couple of short examples may not show it, but mixing up Lingos eventually guarantees duplicate code, lack of test clarity, lack of problem domain clarity, and lack of expressiveness generally, with every new TestCase and test method. This inevitably increases our TCO for that code.

Whether you are an agile tester, or a programmer with agile testing responsibilities, and whether you are using Selenium or HtmlUnit or WATIR or RobotFramework or whatever, learn to recognize the boundaries between the little Lingos you are given by your tools, as well as the ones you are creating. Keep them separate to keep your code DRY, expressive, and extensible. And have fun. Outside the hellish realms of QTP and VBS, automated functional test code can be enormous fun.

Note: you can play with code very similar to the first examples above by checking it out from its Google Code repository using any Subversion client, here. The codebase is an exercise we use to hire potential agile testers, as a first gate to measure their coding ability. You can read more about that here.

Wasteful vs. Necessary Types of Variation and Complexity

Premise: No, Software Dev Ain’t Like Manufacturing

So it’s pretty old news that software development is not manufacturing. The reason many of us have questioned manufacturing metaphors is that software development inherently involves much more variation. Compared to a factory line for a sports car, building the average corporate CRUD web application (not to mention much more interesting app types) is just a completely different animal.

Once a factory line for a sports car is designed, built, tuned, and humming along, it includes no equivalents of test-driving an object model, or storytest-driving requirements. In real manufacturing of sports cars, using approaches like Lean, you can reduce away nearly all of your variation in the actual manufacture. You want to eliminate muda. You can get to a point where the exact steps required to build any one of your sports cars are very nearly identical. This kind of perfectly repeatable, fully-automated process is just not possible for converting feature requirements into actual running, tested features. Sports cars and software system features are completely different animals. Fair enough.

Yet We Do Eliminate Wasteful Variation & Complexity in Software

Nevertheless, a lot of what we do in the world of agile development, and agile coaching and mentoring, is exactly this: reducing or eliminating wasteful, useless variation. Things that complicate our lives unnecessarily, that make the whole system less repeatable, less predictable, and more expensive. A fleet of best practices and their literature, I would claim, is about exactly this: squeezing the wasteful variation out of our development.

This doesn’t make us mechanizing, reductionist, draconian Taylorists. We are not being less humane, but more, when we squeeze out wasteful variation. The difference between humanizing and Taylorizing turns on being able to distinguish between useful and unavoidable variation/complexity (things that can be crafted) and useless and wasteful variation (things that really can, and should, be simplified or automated).

Indeed: the more of the latter, wasteful sorts of variation we eliminate, the more freedom we buy for ourselves to really shine in those areas where complexity and variation must be crafted.

So what are the kinds of variation that are unavoidable in software development, and what are the kinds of wasteful variation we always want to consider eliminating? Here are my initial lists. I am trying to launch a discussion here, not finish it, so please comment. As usual, I reserve the right to revise this blog post to include smart observations made by commenters and pals.

Necessary Variation Types

When people like Pete McBreen write books like Software Craftsmanship, and people like Eric Evans write books like Domain-Driven Design, much of what they are talking about is the variety of variation and complexity in software development that is unavoidable. The kinds of things that must be crafted, that unavoidably require skill, discipline, experience, and even trust-community, passion, autodidactic reflex, and courage. These kinds of inherent variation include (but are not limited to):

Requirements Variation

Yes, from a Portfolio Manager or Product Manager’s perspective, it often makes sense to master the art of saying No to requirements. Joel Spolsky, for example, has written wisely about the nature of the “Consultingware” that results from essentially never saying No.

Nevertheless, we have no real control over the kinds of requirements pressure the market will bring to us, and to which our Product Manager might sensibly say Yes. You cannot, in 2002, anticipate rich-client web app behavior that your hand-crafted Web 1.0 application might be “required” by the market to suddenly include. A great deal of requirements variation is not only unavoidable, but wonderfully necessary: this is how, as an industry, we innovate new value flows.

Object Model Complexity

Yes, we want to keep our designs as simple as possible. Yet, over the course of 10 releases and thousands of lines of code, even in the best factored, best test-driven systems, there will be plenty of unanticipated variation and complexity in the object model. This is the problem to which we apply as much Software Craftsmanship as we can. We try to keep the order of complexity of our solutions relatively in line with the order of complexity of the problems they model, release after release. It is very complex to learn to write very simple code. And there is no avoiding it, if the goal is lowest TCO codebase assets, highest-velocity value flow, and good clean fun.

“Given” Technology Stack Variety

I’m not talking about technology/framework selection, but instead about how, for any given set of such selections, some amount of variation comes along. If your team is required to keep working in Java 5/Spring MVC/Hibernate/Oracle 10g or whatever, then bang, you have a wide variety of syntactic and semantic variations that you cannot avoid, and which, to some extent, you must master.

Team Dynamics

People get hired and get ramped up. People go on vacation and get sick. People quit and get promoted and get fired. Many team membership changes are inherent and unavoidable, as is some variation in experience levels and skills. (But see below on the importance of optimizing that variation.)

Wasteful Variation Types

OK, here is where I start to enjoy myself. I get to list the kinds of problems I keep trying to solve, better and better. The ways we can improve. Again, this is a draft, partial list of main categories. Feel free to help me flesh it out with your comments.

Non-Deterministic, Imprecise Scope

Just because we cannot control how much requirements variation might arrive at our doorstep does not mean we cannot be precise about describing, planning, and testing a given new requirement’s completeness and robustness. Much of the storytest-driving and acceptance-test-driving that agilists push for is about exactly this: given any wacky new requirement, we ought to be able to click on a test and have it turn green when we are done-done-done with it. No, this is not easy or cheap. But it is cheaper than continuous scope-related creep, misunderstanding, and kindergarten blamestorming (“It’s a bug!” “Hell no, it’s a feature!”). So Yes, we can squeeze out lots of useless variation here, and we do.

Manual Build and Deployment Steps

My pal Mike Hill has blogged hilariously about the courage to automate away manual build voodoo, what he calls Jiggling Toilet Handles. The manual voodoo is classic useless variation. Yes, all the little manual steps might be difficult to completely automate away, end-to-end. Doing so might require courage, discipline, and after-the-fact forgiveness. Nevertheless, every time we automate away some dumb, non-repeatable, expensive manual handle jiggle, we start to reap immediate, repeatable rewards. Others who have been fighting this good fight for years include the Pragmatic guys, with books like the original Pragmatic Programmer and Mike Clark’s lovely Pragmatic Project Automation. Indeed, the entire practice of Continuous Integration attacks this category of wasteful variation.

Manual Regression Testing

Manual regression testing is a classic Sisyphean struggle, wrapped in a Faustian bargain. Your manual test-recorders or manual test plan executors are cheap per hour? An indescribably false economy! No manual QA team can ever really hope to catch up or keep up. The untested items and defects pile up. Trust erodes. This, truly, is the canonical thing we can automate away, with one or more tiers of automated testing, most critically programmer-level TDD. Squeezing out this variation yields gigantic benefits. Skillful automated testing may well be the most critical contribution to software development of the last 15 years.

BigBallOfMud Code

Every un-test-driven, unrefactored codebase eventually gets muddy, then really annoying, then unmanageable, then completely toxic. As a code asset deteriorates, the cost per new feature in that codebase goes up dramatically. Yes, as we mentioned above, some complexity in a codebase is inevitable. But most complexity in most codebases is entirely useless and wasteful, and about as avoidable as the lung cancer deaths traced to smoking. Courage and discipline and skill and several specific practices are required to produce Clean Code.

Conveyance Muda

As a team, we knock down our cubicle walls, we sit in the same open space, we pair, we have standups, and we write story tests (and programmer tests, for that matter). As a team, we continuously estimate, we continuously plan, we continuously retrospect. We do these things for several reasons, but they all have the partial effect of reducing the cost of getting some knowledge or work product from one person to another, from one state to another. These practices make all manner of conveyance vastly cheaper.

The alternative mass-creation of Word, Visio, Excel, and similar artifacts, then firing them in volcano-ash volleys over the walls between our silos (using emails with dozens of cc’s!), then discussing them vaguely at large meetings, must surely be orders of magnitude more wasteful. It has repeatedly seemed so to me and my colleagues.

Wasteful Skill-Level Variation

Another silo thing. The lottery number: if you were the Nuxeo/Liferay/Maven guy, and none of the rest of us knew that stuff, and you get “hit by the lottery” and quit, how much money is wasted recovering/reconstituting/reverse-engineering your knowledge? Ouch! We pair, and sit together, and collaborate closer and closer on a story’s definition of done, to squeeze out the useless variation in who knows what and who can do what.

Context-Switching Thrash

If you move programmers too frequently from one team to another, where they must struggle to learn whole new (to them) problem domains, solution domains, technology stacks, and local customs, it is much like moving all the cab drivers from Miami to New York, and all those from New York to Miami. How long will it take before anyone gets to the airport on time?

No, you should probably not keep anyone on any agile team forever. But teams should mostly persist. If you let them persist, self-organize, and continuously improve, and if you mostly bring work to teams, instead of people to work, then you save all kinds of thrash cost, and you gain hard-to-describe production economies from new levels of passion, craft, collective courage, and overall quality and throughput.

OK. Enough for one session. Below are categories I have not yet fleshed out. I can and will. Meanwhile, what other main categories have I missed? Where can I/we collapse categories together? How useful is this principle of wasteful vs. necessary variation to you, as a crude pattern language? Lemme know, peeps and tweeps.

Poor Value Flows

Administrivia and Bureaucracy

Over-Engineering

Unnecessary Framework Complexity/Waste

Unnecessary Optimization