Agile Programming: Lesson One (Part Two)
Two posts ago, I suggested a first pair of exercises for programmers who want to learn some Agile Programming off-line, which is to say, by themselves. The exercises in that first post ask you to focus on learning to refactor existing code to what I call a Clean Method Hierarchy. That post also asked you to make a leap of faith: to try the exercises without first reading several more paragraphs about why I think they make such a great starting point for learning Agile Programming. In this post, I’m backfilling that explanation and justification.
The Original Goals
In that first post, I suggested that the target level of cleanliness you should shoot for in the first two exercises should include these characteristics:
- Keep all existing tests, such as they are, running green as much as possible. Also try to purposefully break them sometimes, to see what kinds of coverage they give you in the code (more on that later, too).
- Rename each entity (project, package, class, method, variable) at least twice, as you learn more and more about what it should be doing
- By the time you are done, no method should contain more than 8 lines of code (this also applies to test methods), and most methods should be in the neighborhood of 4 or 5 lines of code, including declarations and return statements.
Now: Why Start with Just These Exercises?
Below, I explain why it matters so much that you learn to accomplish the above ends instinctively as an agile programmer, as a software craftsman.
A Clean Class is Much Easier to Learn than Clean Everything Else
Below we’ll dive into the handful of things you need to understand deeply in order to get really, really good at keeping one class clean. Compared to what is necessary to keep an entire object model clean, a multi-tiered architecture clean, a codebase build clean, web application framework implementations clean, continuous integration systems clean, and continuous delivery systems clean (not to mention thorny subjects like exception handling and concurrency), the short list of things required to keep one class clean is pleasantly tractable.
It’s Easier Partly Because There are Only a Few Things Involved
The specific list of practices, principles, and patterns, as mentioned in the first post, includes:
- Code Coverage
- Naming and Renaming
- Basic Refactoring
- Command/Query Separation
- The Extract Method refactoring
- The Introduce variable/field refactorings (and scope change refactorings generally)
- The Extract Class refactoring
- The Move Method refactoring
- The Single-Responsibility Principle (SRP)
- Keeping Modules Really, Really, Insanely Small
- Squeeze Class
In this post, let’s cover just the few of these that are most vital:
- Code Coverage
- Naming and Renaming
- Basic Refactoring
- The Extract Method refactoring
- Keeping Modules Small
- The SRP (Single-Responsibility Principle)
- Cyclomatic Complexity
- Squeeze Class
To the above list of original goals, let me now suggest that you add this item (you are going to keep working on the exercises over and over again, right?):
- Don’t let code coverage fall below 85%.
The definition of a Clean Class includes (contextually) something like 85% code coverage (the measured extent to which your code is protected by tests). (For some kinds of code, you’ll need higher coverage.) For Java code, you can measure code coverage (and observe the detailed results, line by line of code) using any number of commercial and open source tools, including IDE plugins. One useful, good-enough open-source pair of code-coverage tools for Java is Emma and EclEmma.
The reason good enough code coverage with tests is so important is that without those tests, you don’t know if you are introducing defects when you change the code.
Good enough code coverage is not just a matter of hitting the coverage number and other metrics. It mainly means that if you accidentally introduce a defect in existing code, some test somewhere that just ran green now runs red.
Does the code, as originally delivered to you, have 85% coverage? When you first went through the exercise, did you decrease or increase the coverage, or did it stay the same? How did that happen?
The Problem with Code Coverage: It’s a One-Way Metric
If you measure that a codebase has 2% code coverage, it is guaranteed to be in danger. Increasingly, the industry-accepted definition of Legacy Code is simply “insufficient tests.”
On the other hand, if you have 90% code coverage, what does that mean? Unfortunately, in the absence of other assessment and measurement, it means nothing. Smart people have learned how to “game” the coverage metric so that they can achieve the magic, imposed high numbers with very few, very bad tests that have no real value.
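To make the difference concrete, here is a minimal Java sketch of a "gamed" check next to a useful one. Everything in it (PriceCalculator, the discount rule, both checks) is hypothetical and invented for illustration, and plain boolean methods stand in for real unit tests so the example has no dependencies. Both checks execute every line of the method, so a line-coverage tool would score them the same, but only one of them notices the planted defect.

```java
// Hypothetical example: the class, the rule, and both checks are illustrative.
class PriceCalculator {
    // Intended rule: 10% off orders of 100 or more.
    // Planted defect: the comparison uses > where it should use >=.
    static int discountedPrice(int price) {
        if (price > 100) {
            return price - price / 10;
        }
        return price;
    }
}

class CoverageGamingDemo {
    // Runs both branches of discountedPrice but checks no results,
    // so it stays green whatever the method returns.
    static boolean coverageOnlyCheck() {
        PriceCalculator.discountedPrice(50);
        PriceCalculator.discountedPrice(150);
        return true;
    }

    // Pins down the boundary value, so the planted defect turns it red.
    static boolean boundaryCheck() {
        return PriceCalculator.discountedPrice(100) == 90;
    }

    public static void main(String[] args) {
        System.out.println("coverage-only check green? " + coverageOnlyCheck());
        System.out.println("boundary check green? " + boundaryCheck());
    }
}
```

Purposefully breaking the code, as suggested in the original goals, is one way to find out which of your existing tests are of the first kind.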
Only if code coverage numbers, median module size, and median cyclomatic complexity are all reasonably good for a codebase can you begin to breathe easy about it. That’s because:
If you have high enough code coverage, across a codebase that consists of very small classes and methods, you can almost always rescue it completely (if you need to).
Put another way:
If you have high code coverage and low median method-level cyclomatic complexity alone, your codebase is probably in pretty good shape.
Notice the word probably. It is possible to test-drive a codebase to 85% coverage and a median method-level complexity of 2, and still have a very bad object model, very hard-to-understand names, and a slew of other non-trivial problems. But you can probably do something about those other problems, if you truly have 85% coverage and median method-level complexity of 2.
That’s much of why I emphasize mastering this small list of things first, in these first few exercises: it can make the difference between a codebase that can be rescued, and one that truly must be thrown away and rewritten.
So what about these few other things?
Naming and Renaming
The most vital, and most subjective and difficult, Clean Code practice is the art of naming and renaming. The very best writing on the subject is by Tim Ottinger (@tottinge), who covered it succinctly and beautifully in Chapter Two of Bob Martin’s Clean Code book (this is one of the first books you should buy, BTW). Tim self-describes as more of an artist than a scientist/logician, and I think that may be why he is such a virtuosic namer. Naming and renaming are art, not science. I won’t recap Tim’s material here; like I said, you’ll need to own the book anyway in order to make sense of this series of blog posts.
I’ll add my own naming practice, FWIW: Attempt to Rename Each Entity Three Times, at Each Contextual Change
- When you first encounter an entity (class, field, constant, method, local variable) that you don’t completely understand at first glance, consider renaming it to what you suspect it actually means. Feel free, for especially pernicious names, to add modifiers like “perhaps,” “maybe,” or “WTF” to the name, in order to remind you that you know this name is not quite right.
- The next time you encounter that same entity in the context of code change, if you have now learned more about its responsibility, rename it again.
- Do this one more time (at least), the next time you find yourself working with the same entity, as the code evolves, and especially as your understanding of the underlying problem domain evolves.
- Anytime an entity’s relationship with its clients, its delegates, its home scope, or any of its context changes, it may indeed require a new name. Is some new client calling it? Did you change its access level or method signature? Did you extract behavior from it? Did you move it to another class? Did its class name change, or its class’s package name? These are reasons to consider renaming it.
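As a rough illustration of the renaming passes above, here is a hypothetical Java sketch. The method, its weighted-checksum rule, and every name in it are invented for this example. The three later names are kept side by side as delegating methods only so the progression is visible in one place; in real code each rename would simply replace the previous name.

```java
// Hypothetical example: one method, seen at three stages of understanding.
class RenamingPasses {
    // As found: neither the method name nor the variable names say
    // anything about the responsibility.
    static boolean proc(int[] a) {
        int s = 0;
        for (int i = 0; i < a.length; i++) {
            s += a[i] * (a.length - i);
        }
        return s % 11 == 0;
    }

    // Pass 1: a guess on first reading, clearly flagged as a guess.
    static boolean maybeChecksumWtf(int[] a) {
        return proc(a);
    }

    // Pass 2: after learning that the array holds the digits of a number.
    static boolean hasValidChecksum(int[] digits) {
        return proc(digits);
    }

    // Pass 3: after seeing that callers use it to accept or reject accounts.
    static boolean isValidAccountNumber(int[] accountDigits) {
        return proc(accountDigits);
    }
}
```

None of these names is "the" right one. Each is just the best name available at that level of understanding.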
The original Refactoring book by Martin Fowler does not attempt to sort refactorings in order of most common use, or in order of easiest to learn. That’s OK. Since the book was first published, the list of known, useful refactorings has continued to evolve. But learning refactorings in alphabetical order is not necessarily the most pragmatic or friendly approach.
There are a handful of refactorings that are really all you need for the scope of this series of blog posts on refactoring code to Clean Method Hierarchies. We’ll cover about a half-dozen refactorings that I believe are most useful to learn first, in coming blog posts.
For today, please consider, primarily, the Extract Method refactoring.
The Extract Method Refactoring
Extract Method, which most IDEs will let you attempt automatically these days, is simply about taking complex methods and extracting smaller bits of named responsibility out of them in the form of new methods. It is simply letting the pregnant cat give birth to her kittens, naturally. One of its best on-line write-ups is here.
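Here is a minimal before-and-after sketch of Extract Method in Java. The receipt example and all of its names are hypothetical, made up for illustration. The "after" version does exactly what the "before" version does, but each bit of responsibility now has its own name.

```java
// Hypothetical example: receipt rendering invented for illustration.
class ReceiptBefore {
    // One method doing three jobs: summing, formatting, and labeling.
    static String render(String customer, int[] pricesInCents) {
        int total = 0;
        for (int price : pricesInCents) {
            total += price;
        }
        String header = "Receipt for " + customer;
        String amount = (total / 100) + "." + String.format("%02d", total % 100);
        return header + ": " + amount;
    }
}

class ReceiptAfter {
    // The top-level method now reads like an outline of the work.
    static String render(String customer, int[] pricesInCents) {
        return header(customer) + ": " + formatCents(total(pricesInCents));
    }

    private static String header(String customer) {
        return "Receipt for " + customer;
    }

    private static int total(int[] pricesInCents) {
        int sum = 0;
        for (int price : pricesInCents) {
            sum += price;
        }
        return sum;
    }

    private static String formatCents(int cents) {
        return (cents / 100) + "." + String.format("%02d", cents % 100);
    }
}
```

Because behavior must not change, a test that compares the two versions on the same inputs is a cheap safety net while you extract.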
Keeping Modules Really, Really, Insanely Small
Really, really small means, in Java, median method size of around 5 lines; and median class size of around 5 methods. Insanely small means median method size of 3 or 4 lines, and median class size of 3 methods. Again, the premise is that this code has good enough code coverage.
Although there is more detail involved here (see below), and it can be frightfully hard to keep methods this small when you are not yet skillful at it, it can be astonishing what a positive difference it can make.
When modules are really, really small, it begins to become magically easier to read, understand, and modify code, especially for those who have not seen that code before, or have not seen it for a while.
Yes, this is my judgment and experience talking. It requires a leap of faith from you. Please see if it’s true for you, after you have worked enough code into this kind of shape. And please let me know what you find.
Now onto a few deeper items behind “small modules.”
The Single-Responsibility Principle (SRP)
The Single-Responsibility Principle states that every object “should have a single responsibility, and that responsibility should be entirely encapsulated by the class.” But the SRP applies equally to methods and modules at other levels of granularity and abstraction.
As I was taught the SRP, “every module should have a single responsibility” means that a module’s components all operate at the same level of abstraction. What the hell is a “level of abstraction”? Indeed. This is not always easily detected and policed.
Think back to the “building a garage” example in the first post. A Clean Method Hierarchy means that at each level of the “outline,” each method is SRP-compliant at that level. This doesn’t even really begin to make sense until you have used the Extract Method refactoring to make lots and lots of methods really, really, insanely small.
One rule of thumb, however: if a method has more than a single level of indentation, it is not likely SRP-compliant: it likely includes code from at least two different levels of abstraction. And it likely has a Cyclomatic Complexity that is unnecessarily high. Nested if statements within if statements within if statements are always asking for trouble.
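The indentation rule of thumb can be sketched like this in Java. The shipping rules and all names here are hypothetical, chosen only to give the nesting something to decide. The nested version mixes the "who qualifies" question with the "what does it cost" question; the flat version gives each question its own method and never indents more than one level.

```java
// Hypothetical example: shipping rules invented for illustration.
class ShippingNested {
    // Three levels of indentation: eligibility and pricing are tangled together.
    static int shippingCents(boolean isMember, int orderCents, boolean isExpress) {
        if (isMember) {
            if (orderCents >= 5000) {
                if (isExpress) {
                    return 500;
                }
                return 0;
            }
        }
        return isExpress ? 1500 : 700;
    }
}

class ShippingFlat {
    // One level of indentation: this method only chooses which rate applies.
    static int shippingCents(boolean isMember, int orderCents, boolean isExpress) {
        if (qualifiesForMemberRate(isMember, orderCents)) {
            return memberRate(isExpress);
        }
        return standardRate(isExpress);
    }

    private static boolean qualifiesForMemberRate(boolean isMember, int orderCents) {
        return isMember && orderCents >= 5000;
    }

    private static int memberRate(boolean isExpress) {
        return isExpress ? 500 : 0;
    }

    private static int standardRate(boolean isExpress) {
        return isExpress ? 1500 : 700;
    }
}
```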
Cyclomatic Complexity is simply the number of discrete flow-of-control paths a module has. In Java, for example, every conditional statement or clause, every loop, every continue statement, and every other return point or exception clause adds another point of complexity.
A method with low enough Cyclomatic Complexity is very likely to be SRP-compliant and small enough. Wherever you can, shoot for a method-level complexity of 1 or 2 (e.g., a 5-line method with a single if statement).
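To get a feel for the numbers, here is a small Java sketch with three hypothetical methods, annotated with how a typical counter would score them: one for the method itself, plus one for each decision point. Treat the totals as approximate, since tools differ on details such as whether boolean operators or extra return statements add to the count.

```java
// Hypothetical example: methods invented to show how complexity adds up.
class ComplexityCounting {
    // Complexity 1: a single straight-line path.
    static int square(int n) {
        return n * n;
    }

    // Complexity 2: the base path plus one if.
    static int absolute(int n) {
        if (n < 0) {                            // +1
            return -n;
        }
        return n;
    }

    // Complexity 4: the base path, plus the loop, the if, and the && clause.
    static int countEvenPositives(int[] values) {
        int count = 0;
        for (int value : values) {              // +1 for the loop
            if (value > 0 && value % 2 == 0) {  // +1 for the if, +1 for the &&
                count++;
            }
        }
        return count;
    }
}
```

The first two methods are in the range to shoot for. The third is a candidate for extracting the condition into a small, well-named method of its own.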
The reason that Cyclomatic Complexity matters so much in Agile Programming is that complex code attracts more complexity.
Code complexity is like crime in a neighborhood: it attracts more crime, which attracts more crime.
Complex code is hard to read, hard to test, and hard to modify safely. It contributes to, in Bob Martin’s pattern language, the “viscosity” of code: complexity in a module, alone, can make it substantially easier to “do the wrong thing” than to “do the right thing.”
As you can read at the bottom of this Wikipedia write-up, there is increasing evidence that code complexity correlates with high defect rates. This is like high crime and general urban decay correlating. And it makes intuitive sense to agile programmers. If I cannot easily understand, test, or modify complex code, of course it provides a breeding ground for defects. Defects will tend to lurk in places where they are hard to find, verify, and fix.
We need our object models, our lower level designs, our service layers, our architectures, our builds, and our frameworks to be as simple as possible. Unnecessary complexity is ever an enemy in code as in prose (“Omit needless words.”).
But first, we need you to become expert at keeping the small stuff, the easier stuff, insanely simple. Learn to excel at it the way you would learn, as a classical pianist, to play Chopin etudes at insanely high speeds with insanely smooth flow. Learn to excel at it as you would practice military techniques in a safe bootcamp before you are thrown before enemy fire.
Regardless of the level of complexity and code coverage your current or eventual jobs permit, learn to turn the dials to 11 in these exercises, so that when you need them in a pinch, they flow like a jazz solo.
Squeeze Class is the term invented by Mike (“@GeePaw”) Hill for the algorithm the exercises in the first post make you repeat, again and again:
- Try to extract small methods from larger ones, within a single class
- Run into variable scope problems and control flow problems and conditional logic problems that you have to fix first
- Try to extract methods again; this time you have more success
- Now extract those smaller methods into even smaller methods
- Fix, at a lower level, the same kinds of problems you encountered the first time
- Now finish extracting those smaller methods
- Keep going until you have extracted till you drop — you can go no further
- Now notice, from various clues, that your class full of small methods really “wants” to be two or three or four separate classes, each with its own responsibility.
Squeeze Class is one way to refactor to Clean Method Hierarchies, SRP-compliant methods, and SRP-compliant classes.
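The last step of that list, where a class full of small methods turns out to “want” to be several classes, might look something like the following Java sketch. The score-report example and all of its names are hypothetical. The clue in the squeezed class is that some methods touch only the scores field while others never touch it at all.

```java
// Hypothetical example: a squeezed class, then the two classes hiding inside it.
class ReportSqueezed {
    private final int[] scores;

    ReportSqueezed(int[] scores) {
        this.scores = scores;
    }

    String render() {
        return label() + " " + average();
    }

    // Cluster 1: arithmetic that only touches the scores.
    private int average() {
        return scores.length == 0 ? 0 : sum() / scores.length;
    }

    private int sum() {
        int sum = 0;
        for (int score : scores) {
            sum += score;
        }
        return sum;
    }

    // Cluster 2: presentation that never touches the scores.
    private String label() {
        return "Average:";
    }
}

// After Extract Class: the arithmetic gets its own home.
class ScoreStatistics {
    private final int[] scores;

    ScoreStatistics(int[] scores) {
        this.scores = scores;
    }

    int average() {
        return scores.length == 0 ? 0 : sum() / scores.length;
    }

    private int sum() {
        int sum = 0;
        for (int score : scores) {
            sum += score;
        }
        return sum;
    }
}

// The report keeps only presentation and delegates the numbers.
class ScoreReport {
    private final ScoreStatistics statistics;

    ScoreReport(ScoreStatistics statistics) {
        this.statistics = statistics;
    }

    String render() {
        return label() + " " + statistics.average();
    }

    private String label() {
        return "Average:";
    }
}
```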
Tracking Test Coverage and Median Complexity Together
Commercial static analysis tools like Clover will give you handy mechanisms for plotting module size against code coverage throughout a codebase. With Clover, it’s also trivial to measure and plot complexity and coverage together. This is a fabulous, actionable measure of how your codebase is deteriorating or improving as an asset as time goes by.
But you can measure code coverage with open source tools, and some IDEs will monitor complexity for you in real-time as well.
For the exercises from the first post, I’ve provided a fun, odd, useful mechanism for monitoring size and complexity called Whack-A-Method. For code coverage, I recommend tools like Emma and EclEmma.
With and without the help of tests and tools, you will need to learn to recognize when a module is violating the SRP. Learn to spot when it is conflating and combining responsibilities that belong in different levels of abstraction. Again, in our book outline metaphor: learn when a chapter really wants to be a section, or a section really wants to be a subsection, or when a subsection needs no title at all.
Summary: Keep Your Methods Really Small, and SRP Compliant, with Low Complexity
There is a roteness and mechanistic quality to keeping methods under 5 lines, and classes under 3 or 4 methods. Paint the fence; sand the floor. That’s OK. Practice like that for a while anyway. A lot of new learning has this rote quality while principles and deeper meaning are internalized.
If you are truly committed to learning this Agile Programming magic deeply, you will make this leap of faith like many of us have. And after a while, you will indeed get a feel for it. You will be able to spot SRP violations.
Eventually, you will go from keeping modules small in order to approximate SRP-compliance, to mastering SRP-compliance in a way that automatically results in small modules.
And then, magical things will begin to happen in your code. Magical things that you have difficulty expressing to people sometimes, much less justifying.
And if you are like me, Extract Method and Renaming will become a way of life: I use them all the time to learn new code, to learn new problem domains, to spike problems. I use them continually in production code, test code, and toy code.
Extract Method and Renaming are two refactorings that are especially useful for learning to keep software truly soft.
Next Up: We’ll dive more deeply into microtests, and test-driving our KataBankOCR problem domain.