patrickwilsonwelsh.com

Notes and rants on software development best practices, agility, management, and team building.

The Metric I Want for Christmas

Enterprise Software Blight

I get hired to help teams learn agile software development practices. Most of the practices in my tool bag (not all, but most) come from experience, books, articles, blogs, and conferences that focus mainly on greenfield development. And as an agile consultant pal of mine, Mike Hill, says, “First step when you are digging a hole: Stop Digging!” Turning around how we launch greenfield projects, and the standards of craft, quality, feedback, accountability, and ROI we establish for them: hey, that’s obviously all good. Most teams, and most enterprises, are in fact still digging every time they launch a new project, still making bad enterprise situations much, much worse with more stinky code.

But they are making things worse in more ways than my favorite tools reveal. And perhaps I have been, we have all been, focusing on the wrong kinds of damage. That’s what I want to explore here.

We have spent a number of years trying to help enterprises learn how to, at least, stop digging holes in the object model, in the architecture, in how the team works.

The thing is, these tools in our toolbags really do work best in greenfield situations. Meanwhile greenfield opportunities seem to be slowly drying up. Over the last 10+ years, the software best practices community has been acquiring agile experts, expertise, books, conferences, entire processes, that I think are slowly turning around greenfield project standards. If you work on a project where the issue is how to get everyone up to speed on iterating, velocity, OO, TDD, CI, build and deployment automation, and simple design on a new project, well, good for you. Count yourself extremely lucky. It’s still darned hard to do, but it can be done. It is deeply gratifying work, given enough skill, knowledge, courage, discipline, and management advocacy.

But as came up at Agile 2008, and as has been coming up with my clients a lot, most developers in the industry can work for years without the opportunity to start from scratch. For most of our careers, we are basically hamstrung by the legacy code issues that keep so many software development professionals living in worlds of constant emergency, constant production defect repair, and very slow progress.

Worse than this, our legacy code is accumulating faster than we can cope with it. Our release schedules and iteration schedules are more pressing, while we are increasingly dwarfed by these enormous, stinky, towering piles of crap. We really, really need a way out of this situation — and not just for one team, but for the entire enterprise. And not just in the object model, and not just in the architecture.

Legacy Complexity: It’s All Over the Place

When we do start talking about legacy codebase repair, we often start talking about how to get part of the object model under test. How to start repairing the Java, or the C++, or the C#, or whatever. As far as this goes, this too is 100% goodness. We certainly need characterization tests, opportunistic refactoring in high-traffic, high-business-value neighborhoods of the code. Again, all goodness.

But I suggest that that too might be the wrong thing for us to start with, or at least the wrong thing for us to focus most of our consulting energy on. I suggest that without a better measure of overall complexity from the top to the bottom and from back to front of the enterprise, we don’t really know the best place to start.

I have seen more and more teams engaging in agile software development followed immediately by waterfall integration and deployment.

The more I work at this, the more convinced I am that the legacy complexity that is hurting us all the most is all of this contextual enterprise complexity. Our biggest problem, and biggest potential point of leverage, is the massive legacy bureaucracy that makes inter-system integration, promotion between environments, environment configuration, version control, configuration management, and production deployment such stupendous, horrific nightmares, release after release.

“Total Enterprise Software Complexity”

The main problem is not within the individual systems (as crappy as most of them are, and as tempting as it is for us to start diving in and refactoring and test-protecting them). The main problem, as far as I can tell, is between all of these systems. I don’t care how many million lines of stinky legacy untested Java you have. I bet dollars to donuts most of your worst problems are actually between those piles of Java.

I read somewhere a great little discussion (I forget where) about how cyclomatic complexity, for OO code, captures or covers most of what is healthy or unhealthy about a codebase. All the other kinds of dysfunction you would likely find in stinky OO code, and might measure separately, can be covered by cyclomatic complexity. As readers of mine know, I would amend that only slightly, using Crap4J for example, to measure how well test-protected the most cyclomatically complex code is. Anyway, the point is that if you are smart, you end up with a single number. Cool. I love single numbers.
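For anyone who hasn’t met it, the number Crap4J reports per method comes from the CRAP (Change Risk Anti-Patterns) formula: CRAP(m) = comp(m)^2 × (1 − cov(m))^3 + comp(m), where comp is cyclomatic complexity and cov is the fraction of the method covered by automated tests. Here is a minimal Java sketch of just that formula; the class and method names are mine, for illustration, and are not Crap4J’s own API.

```java
// Minimal sketch of the per-method CRAP score; class and method names are illustrative.
public class CrapScore {

    // CRAP(m) = comp(m)^2 * (1 - cov(m))^3 + comp(m),
    // with coverage passed in here as a percentage (0..100).
    static double crap(int cyclomaticComplexity, double coveragePercent) {
        double uncoveredFraction = 1.0 - coveragePercent / 100.0;
        return Math.pow(cyclomaticComplexity, 2) * Math.pow(uncoveredFraction, 3)
                + cyclomaticComplexity;
    }
}
```

By this arithmetic, a method with complexity 5 and no tests scores exactly 30 (which, if I remember right, is Crap4J’s default threshold for “crappy”), a complexity-10 method at 50% coverage scores 22.5, and full coverage drops any method back to its bare complexity. That’s why I like it as a single number: complexity, discounted by test protection.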

So I want a new kind of number. For a given enterprise, before I start determining where to focus my consulting, the metric I want for Christmas would be a single number that blends or relates to at least the following objective and subjective categories of enterprise mess:

  • How many total development teams do we have?
  • How many total developers do we have?
  • How many total systems do we have that are interacting with each other?
  • How many distinct technology stacks do we have in play (e.g., .NET, J2EE, SOA, AS400, Tibco, Rails, etc.)?
  • How many distinct frameworks do we have in play (e.g., Struts, Spring, Hibernate, Toplink, CORBA, EJB)?
  • How many total languages are we using (including SQL, Perl, shell scripts, Groovy, Ruby, XML, etc.)?
  • How automated is the process of deploying from a dev machine or CI machine to a QA target? From a QA target to Production?
  • How many total lines of XML are there in play in the enterprise? How many total lines of build-related properties files? XML is so nasty it really does deserve its own measure. XML is a powerful carcinogenic force in organic enterprise systems.
  • What is the average ratio of the lines of code in each system’s build scripts to the total lines of code in the system itself? (Feel free to substitute a better measure of build complexity here.)
  • How much automated end-to-end functional test coverage do we have? Granted (as I advocate elsewhere) you don’t want to lean forever on huge suites of automated functional tests. But as we start healing a quagmire, how many have we got?
  • And yes, what is the complexity of the average object model? How bad off is the average Java or C# or C++ project? A Crap4J number is great here.

So. For the whole frigging enterprise, I want a metric, on a scale from zero (ideal) to 1000 (doomed), that describes how much of this mess is interfering with everyone’s ability to make production deadlines, much less transition to a continuously improving, manageable, agile approach.
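To make the wish a little more concrete, here is one way such a number could be blended: normalize each factor above onto 0.0 (ideal) through 1.0 (doomed), take a weighted average, and scale it to 0 through 1000. This is a hypothetical sketch only. The factor names, the saturating normalizer, and the weights are my inventions for illustration; choosing the right factors and weights is exactly the part I don’t know yet.

```java
import java.util.Map;

// Hypothetical "Total Enterprise Software Complexity" blend. Factor names,
// normalization, and weights are illustrative assumptions, not a validated model.
public class EnterpriseComplexityScore {

    // Maps a raw count (e.g. number of distinct tech stacks) onto 0.0..1.0,
    // reaching 1.0 ("doomed") at the caller-chosen doomedAt level.
    static double saturating(double raw, double doomedAt) {
        return Math.max(0.0, Math.min(1.0, raw / doomedAt));
    }

    // Weighted average of normalized factors, scaled to 0 (ideal) .. 1000 (doomed).
    // A factor missing from normalizedFactors is treated as 0.0 (an assumption).
    static int score(Map<String, Double> normalizedFactors, Map<String, Double> weights) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (Map.Entry<String, Double> entry : weights.entrySet()) {
            double value = normalizedFactors.getOrDefault(entry.getKey(), 0.0);
            double clamped = Math.max(0.0, Math.min(1.0, value));
            weightedSum += entry.getValue() * clamped;
            totalWeight += entry.getValue();
        }
        return totalWeight == 0.0 ? 0 : (int) Math.round(1000.0 * weightedSum / totalWeight);
    }
}
```

For example, saturating(distinctTechStacks, 10) would call ten stacks “doomed” for that one factor, and weighting XML three times as heavily as deployment automation would turn “all XML, no deployment pain” into a 750. Whether ten is the right ceiling, or three the right weight, is a judgment call I’d want your comments on.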

And I want to be able to tailor a consulting approach — somehow — for an enterprise with a “Total Enterprise Software Complexity” score of 200 very differently than I would for an enterprise with a score of 750.

That’s all I want for Christmas this year. Is that what you want too? Let’s talk about it. :)

Caveat Lector: This really is an early draft. I throw it out for feedback. If smart enough people review it, I’ll likely be able to refine it greatly. That is my hope. So smart people, please comment and email me.

Flipping the Automated Testing Triangle: the Upshot

This afternoon at Agile 2008 in Toronto, which has been a smashing good time, I led (with lots of help from Lisa Crispin, and then serendipitously from Brian Marick, J.B. Rainsberger, Dave LeBlanc, Matt VanVleet, and Declan Whelan) a presentation/workshop/micro-Open-Space session. I began with a premise from Mike Cohn: the idea of an ideal testing pyramid, or triangle, with three tiers:

[Figure: automated testing triangle]

What makes the triangle ideal is that you are spending most of your resources, and relying most, on the bottom-most tier of xUnit tests to do good things like protect you from defects and drive your design. That suite does the lion’s share of your regression protection, for example. And conversely, you are relying least, and spending the fewest resources, on very slow, brittle, large, black-box, through-the-GUI, recorded kinds of tests, such as those created with tools like Selenium IDE.

My notion is that this minimizes the Total Cost of Ownership (TCO) of your automated tests. Eventually.

It’s a long (3-hour!) conversation, but the gist, as presented, is that nearly no teams get to start with so well-formed a triangle, and in fact the triangle is most typically inverted for teams just starting down the agile path:

[Figure: inverted (bad) automated testing triangle]

Given such a challenging starting point, you need three automated testing initiatives, one for each of the three kinds of tests (which I refer to metaphorically as brick, stick, and straw), to ensure that you end up with the triangle above. And this is hard, in large part because it is so hard to learn to produce really, really effective suites of unit/isolation/programmer/micro tests (as defined by folks like Michael Feathers, J.B. Rainsberger, Mike Hill, Bob Martin, etc). And the triangle-flipping is hard also because we are talking about culture change, and resistance to it. Well, of course, there are lots of other reasons why it’s hard. It’s hard.

In the presentation, we explored, together, lots of subcategories of this automated-testing-triangle-flipping challenge. To my mind, there were lots of aha moments and unexpected conclusions and good ideas.

So, I’ll blog more on the details later. In the meantime, as promised to several people, here is a link to a PDF version of the presentation. I have had trouble uploading the Keynote version, and so abandoned that. PDF is likely best in any case.

Thanks again to all who attended, and all who helped me prepare and present. It was a blast.

My next goal for this material is to extend it into an experiential, inspirational, 2-day training course that allows cross-disciplinary teams to experience the nasty cruddy inverted testing triangle on day one, and the lovely, right-side-up triangle on day two. More later. Now, at 2:00 AM, off to bed with this agilista.

The “E” Word, Part 3: Uncle Bob’s Clean Code Book

Learning Happens

No matter what happens, learning happens. I say this a lot. As many of you know, I joined the agile community way behind the OO-developer-competence 8-ball years ago. I had done lots of procedural programming in the small, but I had had poor instruction overall, and I was not learning OO well on my own. A learning-style thing, I suspect. I had done a fair bit of low-craft Java code, under the loose instruction of folk who waved Bruce Eckel’s books at me and said “Polymorphism! Prefer composition to inheritance! Encapsulation!” I tried hard, but I was incapable so far, apparently, of really thinking in OO. To be fair, I had also been doing lots of other things: writing, managing, UI design, marketing, sales, training, QA, starting new businesses, rabble rousing.

And then, out of the blue, I had my agile epiphany, discovered the agile community of thoughtleaders and fanatics, and I sought out masters of agile programming craft. Oddly (for then), I began to learn OO through the lens of TDD and Refactoring. I discovered, of course, that the agile masters and thoughtleaders had long been masters of OO craft, project automation, programming craft generally. They also tended to be continuous learning masters, continuous improvement masters. They were also passionate about humanity, creativity, and fun in the workplace. They loved to talk and read and write. They were masters of software extensibility (the “E” word of this blog thread). They liked beer. True Masters, in other words. Folks who lived life fully and passionately, whether or not software was involved.

It kept seeming like a good fit for me, and it still does. I will be learning from these folks in my 80s, no doubt. And I just learned a bunch from a handful of them again these past couple of weeks.

Uncle Bob’s New Book

Uncle Bob Martin has a book coming out on Clean Code. It is a condensed version of the strictly code-related principles, guidelines, and standards that are covered in more detail in Agile Software Development: Principles, Patterns, and Practices. I am fortunate to have an advance copy of this book, since I and my odd little TicTacPente codebase are also fortunate to be part of the Agile 2008 Clean Code Clinic.

And I love this little book. Uncle Bob and his work need no additional praise, but I am going to praise them anyway, because you, as a programmer, need this book if you are serious about programming craft, and especially if you are serious about helping others with their programming craft.

Bob and his other Object Mentors (and to some extent, Kent Beck before them) together produced this condensed, unabashedly opinionated style guide in such a way that it succinctly covers all of the critical aspects of what it means to write clean code: naming, function size and responsibility, function parameter lists, function side effects, Command-Query separation, formatting, comments, exception handling, class structure, class APIs, emergent design, Simple Design, etc. It is a true style guide, not in the sense of coding style (formatting alone), but in the literary sense of guiding our semantic and syntactic decisions. It also refers back to former programming style guides and guides on programming craft, going back 30 years.

Each topic is covered cogently, with great little code samples, and with characteristic Uncle Bob passion and irreverent verve. I laugh out loud with agreement again and again. My guess is you will too.

Hmm. Time for Some Refactoring

So I am reading this awesome book, by this Bob guy, this agile thoughtleader who helped personally introduce me to XP years ago, who will be managing the Clean Code Clinic of which I will shortly be a part. And I realize, you know, my code is close to complying completely with the book’s guidelines.

Close. But you know? Not quite there.

So, with the book electronically in hand, I begin making refactoring passes through my codebase. Some modules are fine as is. Some, not so much. Hmmm.

Ben Franklin (apparently) said “Do not fear mistakes. You will know failure. Continue to reach out.” So I continue to reach out and refactor my little codebase. And I start to realize: Man! I am really improving some of these modules! The resulting 3-line, 5-line, and occasional 8-line methods are way better than their 10-line and 15-line parents. I watch myself improving my class APIs, hiding more implementation details, and replacing dumb getters here and there with more object-appropriate, intention-revealing, behavior-revealing methods. Cool. As I refactor, I am periodically refreshing the current version of the TicTacPente codebase on this site.
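Here is the flavor of that getter-replacing refactoring, as a small, hypothetical Java before/after. These classes are illustrations I made up for this post, not code from TicTacPente: the “before” board hands its raw array to every caller, and the “after” board answers questions about itself and keeps its representation private.

```java
// Hypothetical before/after (not the TicTacPente code): replacing a dumb getter
// that leaks the board's representation with intention-revealing methods.
public class BoardRefactoringSketch {

    static final char EMPTY = '\0'; // default value of a new char array element

    // Before: every caller reaches into the raw array and re-derives the rules.
    static class LeakyBoard {
        private final char[][] cells = new char[10][10];

        char[][] getCells() {
            return cells; // exposes (and lets callers mutate) the internals
        }
    }

    // After: the board answers questions about itself; the array stays private.
    static class Board {
        private final char[][] cells = new char[10][10];

        boolean isOpen(int row, int col) {
            return cells[row][col] == EMPTY;
        }

        void claim(int row, int col, char mark) {
            if (!isOpen(row, col)) {
                throw new IllegalStateException("Square " + row + "," + col + " is taken");
            }
            cells[row][col] = mark;
        }
    }
}
```

The rule “you can’t take a square that’s already taken” now lives in exactly one place, instead of in every caller that pokes at getCells().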

Mea Culpa: Permission to Learn, Sir?

So, man, is learning ever happening as I read, refactor, read, refactor! I have a few rituals I like to do around learning: I like to give myself permission to do it, which means that I must first admit that I don’t know everything already. I like to grant myself the time and safety to undertake the learning, partly for myself, and partly so that others may learn better. (I love to learn partly because I love to teach.) This requires taking a leap of faith that my production schedules and commitments will not fall to pieces if I take the time to learn. And finally I like to celebrate learning and its results.

I do these things because in my professional life, I have been surrounded by insecure know-it-alls, arrogant code cowboys, shy retiring cubicle-hermits, and every other kind of person who could not stand up and boldly say, “You know what? I don’t know the answer to that one. But I would sure like to know!” And in recent years, I have become the guy who, no matter what the circumstances, no matter what I am being paid, will loudly proclaim when I do not know something. Otherwise, I am not asking for, not giving myself, permission to keep learning, and I am also not modeling permission to learn to others. And I love to keep learning so, so much. And I love when others also do.

You are Using a Language to Make a Language

Here is my favorite lesson from this book. OO programming, at the highest level of craft, is about creating a rich DSL (Domain Specific Language). I am not simply implementing logic, or stringing together extensible algorithms and data structures. The packages, interfaces, classes, methods, variables, algorithms — even white-space — all form a language that is extensible according to how understandable it is. I am creating a DSL that reveals, as it implements, a way in which the problem would really like to be solved. A natural and terse way in which the problem can be solved.

I don’t want my DSL to turn out like English, which is really more of a collision than a blend. And I certainly don’t want it to turn out like Java, although I could do worse.

When I focus on the quality of the language I am creating, all of the other guidelines in the book appear as what they really are: means to an end, as opposed to ends in themselves. I really like that. It reminds me of Yoga. Yoga, despite its many Westernized versions, is not an end, not something to get good at as sport, or status, or distraction. Yoga is a means, and the ends include health, equanimity, gratitude, affection, peace of mind, humor. Uncle Bob’s guidelines and principles are means toward an end of a requisite little DSL for the problem at hand, whatever that problem happens to be. When I focus on that end, then the means seem to fall more naturally into mind and under my fingers. What a beautiful way to guide a refactoring session. Learning happens. You see why I love these folks? How many books and communities can you learn that kind of insight from?

Clean Code: Upshot

Once Uncle Bob’s book (well, really all of the Object Mentors’ book) is on the electronic shelves, I strongly recommend you buy it. It’s a quick, funny, useful, compelling read, unlike most of the books on agile craft, OO craft, and programming craft.

And you need to give yourself permission to take the time to read it, to take the time to try its lessons out on some of your code, and to take the time to celebrate how cool it is to have code that clean. This is how we all make progress toward the level of craft at which we always create code this clean. And this is important, because the world contains very, very little code this clean, expressed as a percentage. I have no idea what the actual percentage would be, so I am going to make one up: way less than 1% of the code in the world is this clean. And this matters, because the other 99% is much, much more expensive to maintain and Extend (there’s the E-Word).

Finally, consider coming to Agile 2008 and the Clean Code Clinic, where we are going to use pain and epiphany, courage, outrage, and truth to learn about the real differences between really ugly code and (in my case) at least somewhat clean code.

In the next post, I swear, I’ll return to the TicTacPente problem domain, and what I currently know about 10 x 10, first-to-5-wins Tic Tac Toe.

I keep getting side-tracked by cool, related things. If I am anything as a blogger, I am emergent. So sue me.

Great xUnit Test Suites: the Pre-TDD Conversation

A Burning Issue

We interrupt the current blog thread (the “E” word series) to bring you a burning issue. Well, burning for me, anyway.

I have been working with some other Pillar programmers on systems for helping not-yet-agile programmers learn some best practices. And while many of us in the industry are accustomed to coaching, mentoring, training, and otherwise cajoling people to attempt TDD specifically as a practice, I recently have begun to suspect that in fact, that’s a poor place to start the conversation.

TDD is all about how you get a good design, and good tests as specifications, and most critically to my mind, a great xUnit test suite for its regression protection. But what is a great xUnit test suite? What does that look like?

I have been finding (but not grokking until recently) that before I can have a TDD conversation with anyone, I really have to have a good conversation about the characteristics and value of a great xUnit suite.

Characteristics of a Great xUnit Test Suite

So when I come across a fresh codebase (I mean fresh to me — it might actually be quite rotten), these are the things I want to see in the xUnit tests. In future posts, I can give these more discussion, and perhaps include code snippets, but for today, it’s just a list:

  • Code coverage is no lower than 85%. (Note: As important as code coverage is — especially for teams new to xUnit best practices — it can be a dangerous narcotic. It can hide bigger problems. It is possible to have a test suite that provides 100% coverage that is about 100% crappy. People do things like comment out all assertions except assertNotNull(blah), and make other poor choices when under pressure to (A) keep the coverage rates up, and (B) get the features out the door.)
  • As much of the testing as possible is accomplished by “isolation tests”: small unit tests that run entirely in memory, with no dependencies on file systems, networks, databases, or other external resources. This is Mike Feathers’ definition of a unit test. This level of isolation (and the execution speed that goes with it) in turn depends on proper use of static and dynamic mocks. That in turn depends on dependency injection, which in turn depends on people knowing enough OO to code to interfaces.
  • Speaking of execution speed: isolation test suites should average no more than 0.5 seconds per test, on a crappy machine. If everything really is in memory, it’s pretty common to get speeds of more like 100 isolation tests per second.
  • The suite also includes end-to-end tests, “collaboration tests,” and other tests that are more real-world than isolation tests, include less or no mocking, and take longer to set up and run. These tests do talk to real databases, real networks, and perhaps completely external systems through various APIs.
  • The isolation tests and non-isolation tests are separate from each other (separate source folders, to my mind), so that they can easily be run separately by developers, and by a CI server. As projects grow, the speed of their non-isolation suites slows. Because we don’t want to discourage programmers from running isolation test suites frequently, we want to keep the isolation test execution speed fast. We also want to keep the build nice and fast. So we want to be able to run slower non-isolation suites separately, and perhaps less frequently. So if the slow tests run slowly enough, we may not make them part of each CI build, but instead run them every few hours, or overnight, in a separate CI target.
  • Each test method involves only one cycle of Arrange/Act/Assert (setup and instantiation, getting to the testable state, and verifying that state).
  • Each isolation test method isolates a thin slice of system behavior. One industry term for this (proposed by Industrial Logic) is “micro-tests.”
  • Average length of test methods is under 20 lines, ideally fewer than 10 lines.
  • Test methods and TestCase classes are written and organized in terms of system behavior, not system structure. Related to this: all the test methods in a TestCase use the code in the setUp() method in that class, with as little additional test-specific setup as possible. All of the “Arrange” part of “Arrange/Act/Assert” really should be handled in the setUp() method, whenever possible.
  • TestCases systematically cover unhappy paths: exception cases, edge cases and boundary conditions, etc. Mocks/fakes are used to simulate failure of external dependent resources.
  • TestCase object trees make effective use of base TestCase classes, and make good use of reusable, private or protected helper methods (a sort of local testing DSL). Or, as Ryan points out in the comment below, the TestCases all use a separate object tree that holds a well-thought-out, rich little local testing DSL, completely decoupled from the test code. The more of that DSL pattern you need, as Ryan might say, the less you want to use inheritance, and the more you want to use composition.
  • Test suites manage test data centrally (the repository of canonical test data might be a static class full of constants, or an in-memory database, or whatever). TestCases and test methods avoid primitive type literals wherever possible, and likewise avoid duplicate local variables and constants.
  • Test suites, TestCase classes, and test methods contain as little duplicate code as possible. This includes small details like recurring complex assertion patterns that can be extracted, repeating the name of the TestCase in a test method name, etc.
  • TestCase classes and Test methods have intention-revealing names, and use a consistent naming convention.
  • Test suites are designed to be as resistant as possible to production code design changes. They are robust, not brittle.
  • Test suites test the hard and harder things: XML configuration files, servlets, Swing GUIs, JSP files, etc.

I’ve gathered up this first-draft list of characteristics from multiple sources — books, others’ experience, and my own experience. I’m sure I’m missing a few things in there — I’ll add and prune according to my future thinking and your comments.
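Several of these characteristics (in-memory isolation, dependency injection against interfaces, fakes that simulate failing external resources, centrally managed test data) hinge on one design seam. Here is a minimal Java sketch of that seam. All of the names (ScoreRepository, Leaderboard, FakeScoreRepository) are hypothetical. I’ve kept it free of JUnit so it stands on its own; in a real suite, a TestCase’s setUp() would build the fake and the Leaderboard (the Arrange), and each test method would make one call (the Act) and one assertion (the Assert).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; illustrates the seam that makes isolation tests possible.
public class IsolationSeamSketch {

    // Production code depends on this interface, not on a concrete database class.
    interface ScoreRepository {
        Integer findScore(String playerName); // null when the player is unknown
    }

    // Class under test: its collaborator is injected through the constructor.
    static class Leaderboard {
        private final ScoreRepository repository;

        Leaderboard(ScoreRepository repository) {
            this.repository = repository;
        }

        String describe(String playerName) {
            try {
                Integer score = repository.findScore(playerName);
                return score == null ? playerName + ": no games yet" : playerName + ": " + score;
            } catch (IllegalStateException e) {
                return playerName + ": scores unavailable";
            }
        }
    }

    // In-memory fake: no files, network, or database, so tests using it stay fast.
    static class FakeScoreRepository implements ScoreRepository {
        private final Map<String, Integer> scores = new HashMap<>();
        private boolean simulateOutage = false;

        FakeScoreRepository withScore(String playerName, int score) {
            scores.put(playerName, score);
            return this;
        }

        FakeScoreRepository withOutage() {
            simulateOutage = true;
            return this;
        }

        @Override
        public Integer findScore(String playerName) {
            if (simulateOutage) {
                throw new IllegalStateException("simulated outage");
            }
            return scores.get(playerName);
        }
    }

    // Canonical test data in one place, instead of literals scattered across tests.
    static final String KNOWN_PLAYER = "alice";
    static final String UNKNOWN_PLAYER = "bob";
    static final int KNOWN_SCORE = 42;
}
```

Note that the “simulated outage” case is exactly the kind of unhappy path that is painful to provoke against a real database, and trivial to provoke against a fake.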

Paint the Fence; Sand the Floor

Before people can talk to me with authority about the value of TDD, they need to talk with authority about the value of a great xUnit test suite. And before they can do that, they need to have (as my late mother would have said) suffered enough. They need to have suffered at the hands of codebases without great xUnit suites. They also need to have had their bacon saved by great xUnit suites.

So before we get to the TDD conversation, I increasingly want to encourage programmers new to xUnit testing practices to shoot for an xUnit test suite with the above characteristics. I don’t especially care, at first, how or why they paint the fence (from The Karate Kid), as long as they do it. I would in fact prefer that life and code provide them with the painful, indelible lessons that go with good and bad xUnit test suites.

THEN, once they have felt how hard it is to get that great xUnit suite when they have to stop, go back, and retrofit tests to existing code, and once they have felt how hard it is to debug an “Eager Test” (from Gerard Meszaros’ great book on refactoring xUnit tests), THEN we can talk about how, hey, you know, if that great xUnit test suite is your goal, then my experience has been that TDD gets me there better and faster.

Now we are painting the fence in a specific way.

But along the way, it’s all good.

The “E” Word, Part Two

The TicTacPente Eclipse Project

In my first post on this topic, I set the stage. I had a need for two implementations of the same problem domain: one ugly, one not. As promised, by the way, you can anonymously download the entire codebase discussed in this series of blog posts from a Google Code project here. It’s an Eclipse project, all zipped up.

The project includes a GameGUI.java applet that you can run, to play the game (right click on source/ view.applet.GameGUI.java, and pull down “Run As > Java Applet”). There is a first-draft README file that describes the whole shebang, and suggests some exercises to try. See what you think of it all.

LegacyGame.java

The legacy version of the TicTacToe game is in legacy/ legacyGame.LegacyGame.java. Now, take into consideration that this version reflects lots of little refactorings on my part, dating back to when I had characterization tests for this “class” (I’ve since removed all of those tests — I didn’t want students and job candidates subjected to these exercises to benefit from them). I renamed a lot of methods that started out with names like “c24occx()”, assigning placeholder-quality names that I thought my characterization tests were revealing to me, like tryToFindPositionGivingSeriesOf4OnTwoOrMoreAxes(). In some cases my educated guesses were accurate, and in some other cases, I later learned that I was far off.

I extracted a few small methods from other, larger, stranger ones, naming them as meaningfully as I could at the time. I extracted lots of constants. I renamed variables. I killed a lot of dead code and inscrutable comments. I managed to extract the Java applet code (woven into the gameplay code’s DNA) into its own class. I just couldn’t stand not doing that. (Clue: what do you notice about that applet code?)

But eventually, I just gave up working with it. After person-days of JUnit poking and prodding, this codebase remained quite opaque to me. I’ve inferred a lot of its algorithmic meat from its external gameplayer behavior. But I’m still baffled by much of it.

So this is our first measure of inextensibility in a codebase we discover: what Uncle Bob Martin calls opacity, one of the characteristics I wrote about here. As we glance through it, as we write tests for it, we struggle to understand it.

But it seems that every month or so these days, there are new tools to help us grasp what we are up against. I ran Crap4J against LegacyGame.java, and of course it pegs the tool’s little meter at the far right, at 36.84, as if pressed forcefully against that right-hand fence, searching for a measure of even more non-test-protected cyclomatic complexity. Average Crap4J score (blue triangle) is just under 5, BTW. As you can see, that little yellow triangle is trying to leave the ballpark:

[Figure: Crap4J results for LegacyGame.java]

So, I did determine how the legacy game manages its board state and game state, and got lots of peeks into how it determines which move to make next. Enough so that I was able to run the game from a test harness, one move at a time. This is the TestCase that pits the two games, old and new, against each other, some number of times. Currently that number is 200. You can find this code in manualTests/ manualTests.OldGameAgainstNewGameTests.java.

The source folder and package names contain the word “manual” because at first, I was printing out a representation of the board after each move taken by each game. I was examining System.out.println() output manually, to learn.

It took a bit for it to dawn on me: I was doing exploratory testing.
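For anyone who wants to try the same kind of move-by-move exploration, here is a minimal, hypothetical Java board for 10 x 10, first-to-5-wins play: a toString() you can hand to System.out.println() after every move, plus a win check. This is not the TicTacPente code; the class name, the “.” for empty squares, and the method names are my assumptions.

```java
import java.util.Arrays;

// Hypothetical sketch (not the TicTacPente classes): 10 x 10 board, first to 5 in a row wins.
public class ExplorationBoard {
    static final int SIZE = 10;
    static final int WIN_LENGTH = 5;
    static final char EMPTY = '.';

    private final char[][] cells = new char[SIZE][SIZE];

    ExplorationBoard() {
        for (char[] row : cells) {
            Arrays.fill(row, EMPTY);
        }
    }

    void place(int row, int col, char mark) {
        if (cells[row][col] != EMPTY) {
            throw new IllegalArgumentException("Square " + row + "," + col + " is taken");
        }
        cells[row][col] = mark;
    }

    // True if mark has WIN_LENGTH in a row horizontally, vertically, or on either diagonal.
    boolean hasWon(char mark) {
        if (mark == EMPTY) {
            return false;
        }
        int[][] directions = {{0, 1}, {1, 0}, {1, 1}, {1, -1}};
        for (int row = 0; row < SIZE; row++) {
            for (int col = 0; col < SIZE; col++) {
                for (int[] d : directions) {
                    if (runLength(row, col, d[0], d[1], mark) >= WIN_LENGTH) {
                        return true;
                    }
                }
            }
        }
        return false;
    }

    // Counts consecutive cells holding mark, starting at (row, col) and stepping by (dRow, dCol).
    private int runLength(int row, int col, int dRow, int dCol, char mark) {
        int count = 0;
        while (row >= 0 && row < SIZE && col >= 0 && col < SIZE && cells[row][col] == mark) {
            count++;
            row += dRow;
            col += dCol;
        }
        return count;
    }

    // One line per row, e.g. "....X.....", suitable for printing after each move.
    @Override
    public String toString() {
        StringBuilder out = new StringBuilder();
        for (char[] row : cells) {
            out.append(row).append('\n');
        }
        return out.toString();
    }
}
```

Printing the board after every move is crude, but it is what let the setup patterns jump out at me; a richer harness could also tally wins, losses, and draws across many games.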

Old Game vs New Game: Exploratory Testing

So I started with lots of high hopes, deep fears, and ignorance about my prospects of test-driving a decent version of this problem domain. My goal was for my game, if it took the first move, to beat the old game or play it to a draw most of the time. (As it turns out, I did much better than that. After my second run at this code, I ended up with a game that beats the old LegacyGame about 50% of the time, and plays it to a draw about 40% of the time. When I go first, the old game wins no more than 7% of the time.)

The new game clobbers the old game, after much research and development.

In my first test-driven version, my first few defensive algorithms were, in addition to being completely ineffectual against the old game strategically and tactically, pretty badly conceived. My object model was in parts over-engineered, and in other parts procedural, sloppy, and under-engineered. I paired with my good friend Dave LeBlanc on it for an hour, and he made several forthright observations about what I had done well and what I had done poorly. My design had some real flaws. I had pretty good test coverage, but nothing like what I wanted. For the next few days I pushed this first codebase version as far as I could, and got it to the point where it edged out the old game if it went first, on average. It performed OK.

But I was deeply disappointed at the results. I knew I had to rewrite it. I can get an A+ in any course I’ve already taken, if I take it again enough times. Dave had encouraged me with suggested new design approaches. I wiped the slate clean. I started over with an empty Game class, and a much better sense of which strategic and tactical behaviors I wanted to test drive in what order.

That’s when I started turning my attention more rigorously to the move-by-move board printouts I was logging in my manual game-against-game test harness. I started combing through each loss I suffered as I test-drove my second version, looking at the strategic setup patterns, while I looked for a cleaner way to represent the basic defensive and offensive patterns. I watched carefully as the old game, ugly or not, set itself up cleverly to defeat me a couple of moves into a new game.

And as all kinds of interesting patterns emerged from this manual exploratory testing, I began to understand the problem domain much more deeply. And this, of course, made Simple Design easier, and refactoring easier, and test-driving easier. I noted specific patterns, wrapped test data and failing tests around them, and produced new behaviors of my own that played the game better.


Then suddenly one day, after I had finished one particular bit of strategy involving collecting all possible moves, ranked by tactical priority, and looking for any of the highest priority moves that also matched lower-priority moves (blocking the other player’s new series while simultaneously extending a series of our own, for example), I saw a huge new jump in my game’s performance. I added another bit of logic around responding to the other player’s first move, then another around making the first move on one of the center-most 4 squares on the board. With each of these well-thought-out bits of new behavior, I saw big jumps in my game’s performance against the old one. Meanwhile, there was not that much total strategic and tactical logic, and I was simplifying and consolidating as I went. I had a reasonably clean, reasonably well test-protected codebase that was kicking the other game’s keister. It was rewarding.
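The selection idea in that paragraph (collect every candidate move ranked by tactical priority, then prefer a highest-priority move that also serves lower-priority goals) might be sketched like this. It is not the code from the downloadable codebase: the `Priority` names, the `RankedMoveChooser` class, and the assumption that candidate squares are computed elsewhere are all hypothetical.

```java
import java.util.EnumMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class RankedMoveChooser {
    /** Hypothetical tactical priorities, highest first (declaration order matters). */
    enum Priority { WIN_NOW, BLOCK_WIN, BLOCK_NEW_SERIES, EXTEND_OWN_SERIES }

    private final Map<Priority, Set<Integer>> candidates = new EnumMap<>(Priority.class);

    /** Records that `square` would satisfy the given tactical goal. */
    void add(Priority priority, int square) {
        candidates.computeIfAbsent(priority, p -> new LinkedHashSet<>()).add(square);
    }

    /**
     * Picks from the highest non-empty priority level; among those squares, prefers the one
     * that also appears in the most lower-priority candidate sets.
     * Returns -1 if there are no candidates at all.
     */
    int chooseMove() {
        Priority[] levels = Priority.values();
        for (int i = 0; i < levels.length; i++) {
            Set<Integer> top = candidates.get(levels[i]);
            if (top == null || top.isEmpty()) continue;
            int best = -1, bestOverlap = -1;
            for (int square : top) {
                int overlap = 0;
                for (int j = i + 1; j < levels.length; j++) {
                    Set<Integer> lower = candidates.get(levels[j]);
                    if (lower != null && lower.contains(square)) overlap++;
                }
                if (overlap > bestOverlap) { best = square; bestOverlap = overlap; }
            }
            return best;
        }
        return -1;
    }
}
```

For example, if squares 44 and 55 would both block the other player's new series, but only 55 also extends a series of our own, this chooser returns 55.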

In my next post on this thread, I’ll dive a bit into this fun little problem domain: 10 x 10 TicTacToe where the first player to get 5 in a row wins. I’ll also talk about how close this is to a game called Pente, which leads to at least one interesting “requirements change” we can use to measure the relative extensibility of the legacy game vs. the new game. You can read about it yourself, if you like, in the README file.

In the meantime, please feel free to download, unzip, import, and play around with the codebase. I make no apologies for the ugly graphics, BTW. If you can make me better ones, please feel free. I am a Photoshop dunce.

Oh, one last note. I have challenged my good friend Dima, who is an algorithmic genius, to test-drive an entirely new version of the game, pitting it against the old game using my test harness design. He has accepted, so we shall all see some beautiful code indeed. And I challenge you too, reader, if you like these kinds of challenges, to test-drive your own version of the game in the same way. How much better a design than mine can you produce? How much better test-protected can it be? How much better can you make your game-playing strategy and algorithms? Along how many more “lines of extensibility” can your version of the game be open to extension? My version has flaws, some fairly obvious, some subtler. What are they?

No, this is the last note: Two other good friends of mine, Ryan Cooper and Dave LeBlanc (mentioned before), have also accepted the challenge. Ryan, in fact, improved my implementation by extracting the strategic logic in such a way that it is easier for him, you, or anyone to create new game strategy by implementing Ryan’s new IStrategy interface. Yay Ryan. All that is reflected in the version for download above.
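Ryan's actual IStrategy interface ships with the download; its exact signature isn't reproduced here. The sketch below only shows the general shape of the Strategy pattern being described, with hypothetical names throughout: a `MoveStrategy` interface, a toy `CenterFirstStrategy` that opens on one of the four center-most squares, and a `StrategicGame` that delegates every move decision to whatever strategy it is given.

```java
import java.util.Arrays;

/** Hypothetical strategy seam, in the spirit of (but not copied from) Ryan's IStrategy. */
interface MoveStrategy {
    /** Returns the index (0..99) of the square to play, given a snapshot of the board. */
    int nextMove(int[] squares, int myMark);
}

/** Toy strategy: take the empty square closest to the center of a 10 x 10 board. */
final class CenterFirstStrategy implements MoveStrategy {
    private static final int SIDE = 10;
    private static final double CENTER = (SIDE - 1) / 2.0;  // 4.5

    @Override
    public int nextMove(int[] squares, int myMark) {
        int best = -1;
        double bestDistance = Double.MAX_VALUE;
        for (int s = 0; s < squares.length; s++) {
            if (squares[s] != 0) continue;  // 0 means empty
            double dr = s / SIDE - CENTER, dc = s % SIDE - CENTER;
            double distance = dr * dr + dc * dc;
            if (distance < bestDistance) { best = s; bestDistance = distance; }
        }
        return best;
    }
}

/** The game owns the board; all decision-making is delegated to the strategy. */
final class StrategicGame {
    private final int[] squares = new int[10 * 10];
    private final MoveStrategy strategy;
    private final int myMark;

    StrategicGame(MoveStrategy strategy, int myMark) {
        this.strategy = strategy;
        this.myMark = myMark;
    }

    void opponentPlays(int square) { squares[square] = 3 - myMark; }  // marks are 1 and 2

    /** Asks the strategy for a move (on a defensive copy of the board) and records it. */
    int play() {
        int move = strategy.nextMove(Arrays.copyOf(squares, squares.length), myMark);
        if (move >= 0) squares[move] = myMark;
        return move;
    }
}
```

With this shape, adding a new strategy means adding a new class rather than editing the game, which is exactly the kind of extensibility the exercise is meant to measure.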

Till next time, patient reader.

Teaching Programmers about the “E” Word

A Tale of Two Codebases

So, pardon the long interlude, readers. I have been up to my eyeballs in alligators. But all my parts are still connected, and I return from the software jungles with something I find interesting, and I hope you do too.

I have been working for months on a “toy” codebase to use for three main purposes: evaluating the technical skills of programming candidates at Pillar Technology, making baseline assessments about the technical skills of new hires and client programmers, and conducting classes on agile/OO programming practices. In this and coming blogs, I am going to share with you the codebase itself (Java/Eclipse project), and my experiences and insights while developing and using it. I am also going to solicit your input on how to improve it as a teaching/mentoring tool, and as a set of exercises for evaluating programmers.

With a bit of luck, I’ll be conducting a hands-on session using this codebase at Agile 2008.

But first, two things: the Problem Domain, the Technical Practice Scope, and the “E” Word. OK, so that’s three things. I really need coffee this morning.

The Problem Domain

The codebase is two completely separate implementations (Legacy and “non-Legacy” implementations) of a 10 x 10 TicTacToe game where the first player to 5 in a row in any direction wins. You play against the computer, and it typically kicks your patootie, whichever version you are playing against. (Well, it kicks mine.)

So I happened upon the Legacy version of this codebase more than a year ago, when looking to design this pedagogical tool, and determined it to be perfect, in a kind of sick, pathological way. Let me explain. The original codebase is a Java applet. It is, including all the applet code, a single, 1200+ line “class” with dozens of methods that looks like this:

public int someWierdMethodName(int playerMark, int x, int type)
{
    int j, k, l;
    int position = 0, position2 = 0;
 
    for (l = 0; l < 6; l++) {
        for (j = 0; j < SQUARES_PER_SIDE; j++) /* horiz & vert */
        {
            resetAllMarksAlongAxesForFirstHalfOfBoard();
 
            position = checkFor5AlongHorizAxis(playerMark, x, j, l, position);
 
            if (marksByAxisByPlayerForChecking[0] == 3 && marksByAxisByPlayerForChecking[1] == 2) {
                if (type == SETFLAGS_MODE) {
                    tempTableForChecks[tempRowForChecks[0]] = OCCUPIED;
                    tempTableForChecks[tempRowForChecks[1]] = OCCUPIED;
                }
                if (type == CLEAN_MODE) return tempRowForChecks[0];
            }
 
            if (marksByAxisByPlayerForChecking[0] == 4 && marksByAxisByPlayerForChecking[1] == 1 && type == CHECK_MODE)
                                    return position;
 
            position = checkFor5AlongVertAxis(playerMark, x, j, l, position);
 
            if (marksByAxisByPlayerForChecking[2] == 3 && marksByAxisByPlayerForChecking[3] == 2) {
                if (type == SETFLAGS_MODE) {
                    tempTableForChecks[tempRowForChecks[0]] = OCCUPIED;
                    tempTableForChecks[tempRowForChecks[1]] = OCCUPIED;
                }
                if (type == CLEAN_MODE)
                    return tempRowForChecks[0];
            }
            if (marksByAxisByPlayerForChecking[2] == 4 && marksByAxisByPlayerForChecking[3] == 1 && type == CHECK_MODE)
                return position;
        }
 
        for (j = 0; j < 6; j++) {
            resetAllMarksAlongAxesForFirstHalfOfBoard();
 
            for (k = 0; k < 5; k++)
            {
                position  = checkFor5AlongDiagDownRightAxis(playerMark, x, j, k, l, position);
                position2 = checkFor5AlongDiagUpRightAxis(playerMark, x, j, k, l, position2);
            }
 
            if (marksByAxisByPlayerForChecking[0] == 3 && marksByAxisByPlayerForChecking[1] == 2) {
                if (type == SETFLAGS_MODE) {
                    tempTableForChecks[tempRowForChecks[0]] = OCCUPIED;
                    tempTableForChecks[tempRowForChecks[1]] = OCCUPIED;
                }
                if (type == CLEAN_MODE) return tempRowForChecks[0];
            }
            if (marksByAxisByPlayerForChecking[0] == 4 && marksByAxisByPlayerForChecking[1] == 1 && type == CHECK_MODE) return position;
 
            if (marksByAxisByPlayerForChecking[2] == 3 && marksByAxisByPlayerForChecking[3] == 2) {
                if (type == SETFLAGS_MODE) {
                    tempTableForChecks[tempRowForChecks[0]] = OCCUPIED;
                    tempTableForChecks[tempRowForChecks[1]] = OCCUPIED;
                }
                if (type == CLEAN_MODE) return tempRowForChecks[0];
            }
            if (marksByAxisByPlayerForChecking[2] == 4 && marksByAxisByPlayerForChecking[3] == 1 && type == CHECK_MODE) return position2;
        }
    }
    return (NONE);
}

Actually, that snippet includes method extractions and renames that I did. It was much worse before I got hold of it and started retrofitting Mike Feathers-style “characterization tests” and doing bits of opportunistic refactoring here and there.
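A characterization test, in Feathers's sense, pins down what code actually does today, not what it ought to do. Here is a minimal sketch, assuming a hypothetical `LegacyScorer` (not from the TicTacToe codebase) and plain Java checks in place of JUnit so it runs standalone. In practice you would obtain each expected value by running the legacy code and writing down whatever it returned, including the odd ones.

```java
/** Hypothetical legacy code we don't yet understand (not from the TicTacToe codebase). */
final class LegacyScorer {
    static int score(int playerMark, int[] line) {
        int run = 0, best = 0;
        for (int square : line) {
            if (square == playerMark) { run++; if (run > best) best = run; }
            else run = 0;
        }
        return best == 4 ? 100 : best * best;
    }
}

/**
 * Characterization tests: expected values are recorded from the legacy code's current
 * behavior, not derived from a spec, so refactoring can't change that behavior silently.
 */
final class LegacyScorerCharacterization {
    static void run() {
        check(4, LegacyScorer.score(1, new int[] {1, 1, 0, 1, 1}));
        check(100, LegacyScorer.score(1, new int[] {1, 1, 1, 1, 0}));
        check(0, LegacyScorer.score(2, new int[] {1, 1, 1, 1, 1}));
        // Five in a row scores LESS than four. Probably a bug, but it is today's behavior:
        // pin it now, and change it deliberately (with a failing test) later.
        check(25, LegacyScorer.score(1, new int[] {1, 1, 1, 1, 1}));
    }

    private static void check(int recorded, int actual) {
        if (actual != recorded) {
            throw new AssertionError("behavior changed: recorded " + recorded + ", got " + actual);
        }
    }
}
```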

So the whole thing has a fabulously high cyclomatic complexity: a huge number of independent paths through each method. As a result, though this little TicTacToe applet is quite clever algorithmically at kicking your keister, it is supremely inextensible code, along at least several axes of extension. That was exactly what I needed. BWAHAHAHAHA.
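For a single method, McCabe's cyclomatic complexity works out to the number of decision points plus one. Here is a rough, hypothetical sketch that approximates it by counting branching keywords and short-circuit boolean operators with a regular expression. Real metrics tools parse the code properly; this one will, for example, miscount keywords inside strings or comments, and it ignores the ternary operator. It is enough, though, to see why a method like the one above scores so high.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RoughComplexity {
    // Decision points: branching keywords (whole words only) plus && and ||.
    private static final Pattern DECISION =
        Pattern.compile("\\b(?:if|for|while|case|catch)\\b|&&|\\|\\|");

    /** Approximate cyclomatic complexity of one method's source: decision points + 1. */
    static int of(String methodSource) {
        Matcher m = DECISION.matcher(methodSource);
        int decisions = 0;
        while (m.find()) decisions++;
        return decisions + 1;
    }
}
```

A straight-line method scores 1; `if (a && b) { for (;;) { } }` scores 4 (one `if`, one `&&`, one `for`, plus one).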

Technical Practice Scope: the “E” Word

For clarity, the only technical practices I intended (and currently intend) to try to teach using this codebase all center around what I call the “E” word: extensibility. So they primarily include: xUnit testing practices, OO practices, Refactoring, Simple Design (or “Travel Light”), and TDD (test-driving).

The entire point of the codebase and its various uses is to highlight the differences between extensible and inextensible code, and to measure and teach the practices that are most central to the extensibility of a codebase. The point is to give people an experiential sense of what it is like NOT to use the best practices listed above, on the one hand, and to use them, on the other. It’s a sharp-edged little A/B comparison exercise.

What I Did, Crazy Man that I Am

I began with this Legacy game code, and with the tentative steps of poking and prodding it with tests (more on all of that later). I played against it (and lost!) a lot. I began to learn the algorithmic problem domain (entirely despite the maddeningly bad design).

I then constructed a JUnit test harness that would enable me to play a new, test-driven version of the game against the old one, and measure how much of the time I won. And I began to test-drive my first version of this game.

I ran my head-to-head test a lot. It was depressing. Despite months of research and learning, I was still getting my rear kicked 98% of the time, or so.

And that is roughly where we shall pick up in the next blog post, when I’ll share with you a zip file of the entire Eclipse project, including the Legacy and Less-Legacy versions of the code.

Until then, fair readers.

Dynamic Languages, Blimps, TDD, Alpha Geeks, and “Compiler as Nanny”

Nannies for Blimps!

In the old days, the compiler was your nanny, because computer resources were expensive and delicate and huge, like a giant hydrogen blimp. You want to fly the blimp, baby, you better be really good at flight plans.

So static typing was one of several ways to prevent us from blowing up the blimp at runtime. (This metaphor may not work for too many more paragraphs, but I am flying with it for now anyway.) Static typing is really a sort of BDUF (Big Design Up Front) enforced at the language level. It imposes design straitjackets that only become plain once a dynamic language has removed them from you. (Wow! You can do THAT in this language? Really? Look at how much less code that is.)

The Nanny is Not Groovy, Man

Extending designs in Java is genuinely hindered by static typing. It’s no longer a political issue, it’s just plain fact. And I don’t just mean because it takes 400 characters to print “Hello World.” No, I mean the kinds of shenanigans imposed on every type you inherit or use or create. Think: generics, and how long it took for Java to get them, and what a pain in the arse they are. The Nanny really is everywhere in Java, to my mind.

Bruce Tate, Stuart Halloway, Justin Gehtland, and others have made passionate, convincing arguments that (for example) convention-based Rails programming, as a means of getting an enterprise web app “off the ground” (so to speak), may be faster than the lightest-weight J2EE frameworks by a factor of 10. Maybe more than one factor of 10. Sheesh, that Nanny is EXPENSIVE! How would your stakeholders like it if you could produce 10 times as many web applications to solve enterprise problems as your teams can today?

The catch, of course, is Ruby and its still-relative dearth of supporting libraries, frameworks, and similar open-source support. And Ruby syntax looks wonky to us old Algol-based fuddy-duddies. This is why I am personally so attracted to Groovy and Grails. (Introduced to me by Andrew Glover of Stelligent and Chris Judd at CodeMash in Ohio last month.) All the convention-based goodness, plus leverage of my Spring and Hibernate experience, and I can still use the Java stuff that the mean Nanny pounded into my head over the years. (Just the good stuff. She’s not a completely mean, insane, dictator nanny, I can now see in retrospect.)

Look Nanny, No Unit Tests! (Uh Oh.)

So off we go to a dynamic language, full of our static-typing outrage. The catch, of course, is this: because you can send a “divideYourselfByZero” message or any odd message you like to the integer 42 in languages like Smalltalk, you can get runtime errors of the sort that would curl a Java programmer’s hair.

Does it suddenly make a bit of sense why the first member of the xUnit family was, in fact, SUnit? My question for the Smalltalk crew is this: before you had SUnit, how the heck did any of you keep your jobs? The production deployment problems must have been Hindenburg-spectacular. (I’m partly kidding. Actually, it turns out, the really good Smalltalkers had other tricks up their sleeves to avoid runtime disaster.)

So the bottom line is this: with dynamic languages, You Have No Choice but to develop exhaustive, requisite suites of true unit-level isolation tests. Oh, plus end-to-end tests, and several other automated test varieties. This is the equivalent of venting out all the hydrogen and replacing it with … helium! Yay! Doesn’t explode, a bit more expensive, for sure, and not quite as “lifty,” but way safer, and you can still fly.

With dynamic languages, you have the authority and responsibility to be an adult, not a child. No more Nanny, so no more stepping out into the street before you look both ways.

Note to Recruiters: Hire the Guys Who Know Dynamic Languages, No Matter What They Charge

Power Programmers, Alpha Geeks, the ones who some now claim in public (reasonably, I believe) can outproduce “vocational programmers” by a factor of 10 or more (there is that same math, hmm)? It turns out that one of the primary indicators of one of those guys or gals is that they just cannot keep their hands off of lots of different languages, and operating systems, and computers, and you name it. Some of them actually play banjo! They are really good at comparing entire language systems and development systems to one another. They really like dynamic languages, because they are so much faster and cleaner and better. And they really, really like unit tests, because suites of them save their behinds so frequently.

Oh, BTW, you know what true alpha geeks are all on about these days? A good old idea come full circle: functional programming. Oh, and also, BDD. Whee, here we go!

Alpha Geek example: my alpha geek pal Dimitri says “dynamic languages are so much more expressive it’s not even funny.” He says that when he noticed that Ruby does not require generics, “I almost started crying.” He says “I think the whole thing can be summarized as: static typing breeds incidental complexity”. Great quote. And he correctly points out that in the emerging world of Domain-Specific Languages (DSLs), dynamic languages are absolutely vital.

So save the really big programmer salaries for the ones who (A) know unit testing backward and forward, including TDD, (B) know multiple languages, and program avocationally and recreationally, and (C) can spout endlessly about the benefits of once-seeming exotica like dynamic languages and functional programming. That, at least, would be my agile alpha geek definition. There are other kinds of alpha geeks, certainly. In my big enterprise app world, I need the agile ones.

Hire guys like Dimitri. Make sure your team has a ratio of at least 1 alpha geek to every 3 or 4 non alpha geeks. And be the kind of boss and organization that alpha geeks love to work for and with (yet another blog topic, for another time).

And be the kind of boss who enables non-alpha geeks to find their way to alpha, if they want it. Again, another blog for another time.

The Whiteboard-Space to Wall-Space Ratio (WBS/WS)

Filed Under: Seriously Cheap Wins

Why this is true, I really do not completely understand. I want to understand it, and not judge it, but I admit I have difficulty there.

In the kinds of companies at which I have been doing agile software development consulting — coaching, mentoring, training, development — over the past few years, there is an odd trend: lots and lots of wall space, and too little whiteboard space.

I have been seeing lots and lots of conference rooms, team rooms, and miscellaneous rooms in which software development work gets done. And there are acres of wall space around. And there are tons of ideas that must be worked through collaboratively. Brainstorming that must happen, and design and architecture, and project tracking, and planning, and learning and mentoring, and training, and you name it.

Yet there is this incredible dearth of whiteboard space. As if whiteboards were made of platinum. My favorite example of this is the very large conference room with a 20′ table that seats 24, and at the end of it, a tiny, 4′x4′ whiteboard, folded away in a little closet of its own (as if to say, “Only to be used in dire imaginative emergencies!”). Oh, and best of all, those little round whiteboard erasers maybe 3″ in diameter. They don’t so much erase as they smear.

Closely related to this: the dry-erase marker to whiteboard ratio (DEM/WB), and the dry-eraser-size to whiteboard-size ratio (DES/WBS).

How in the world do people get any creative, collaborative work done in such environments? In high-function agile teams of yore, I have seen walls covered with whiteboard stuff, and we have blithely scribbled floor to ceiling and wall to wall on it, with genuinely useful information. When I walk into a high-function team room, this is one of the things I immediately look for: huge whiteboards slathered with passionate creation and communication and clarification.

At one past engagement, 7 or so of us on a client site shared a little room the size of a large walk-in closet, with no windows, and a single 5′ square whiteboard. We positively crammed that poor board with ideas, then took digital pix of it, then erased it and crammed it with ideas again.

Our ability to think and create and collaborate in software development can literally be constrained by the whiteboard space available to us.

Coming Soon: Whiteboards On Me

I haven’t begun doing this, but I suspect I shall shortly. When I am brought to one of those conference rooms with the tiny closeted whiteboard, I shall say “Hey, I’ll work for you tomorrow for free, if you’ll let me put up 80 square feet of whiteboard on that empty wall there, at my own expense.” I’m going to start building that into my bill rate. [My fallback position will be that suggested by my pal Mike Gantz in the comment below: I’ll bring in several whiteboards on wheels.]

Meanwhile, here is my contention around Whiteboard-Space to Wall-Space ratio (WBS/WS). The lower it is, the more time it takes to get things done, the more waste and rework you are likely to have, and the more, in particular, people end up communicating across one week and 50 emails what could have been handled elegantly in 5 minutes with a decent whiteboard diagramming session. Talk about muda.

Go forth, agilistas, and raise the WBS/WS. Increase the DEM/WB, and the DES/WBS. Every room should have at least one wall where at least half the wall space is covered with whiteboard. Every whiteboard should have at least 8 markers on its little ledge per 30 square feet. And you can get these awesome extra-large erasers that clean the boards faster and better. Every whiteboard should have one of those, regardless of size.
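Taken literally, those rules of thumb are simple arithmetic. Here is a tongue-in-cheek sketch with hypothetical method names; the only numbers taken from the text are the half-a-wall rule and 8 markers per 30 square feet of whiteboard.

```java
final class WhiteboardMath {
    /** Whiteboard-space to wall-space ratio (WBS/WS) for one wall. */
    static double wbsToWs(double whiteboardSqFt, double wallSqFt) {
        return whiteboardSqFt / wallSqFt;
    }

    /** The "at least half the wall is whiteboard" rule for a single wall. */
    static boolean meetsHalfWallRule(double whiteboardSqFt, double wallSqFt) {
        return wbsToWs(whiteboardSqFt, wallSqFt) >= 0.5;
    }

    /** Markers needed at 8 per 30 square feet of whiteboard, rounded up to whole markers. */
    static int markersNeeded(double whiteboardSqFt) {
        return (int) Math.ceil(whiteboardSqFt * 8.0 / 30.0);
    }
}
```

For example, an 8′ x 20′ wall (160 square feet) needs 80 square feet of whiteboard to meet the half-wall rule, and 80 square feet of whiteboard calls for 22 markers.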

Surely this falls under the “cheap win” and “low hanging fruit” category for agile coaches everywhere.

Maybe I should just become a whiteboard consultant. Then I could wear my leather toolbelt and tools everywhere. I love to wear that thing. It’s all pockets and loops.

Client Validity, Client Validation, Code Smells, TDD & BDD

Or, BDUF & Fast OO Karmic Resolution

I’ve been chewing this one over for a while, and it is finally ready for the world to attempt to digest it. Or something like that. (Ewww.)

Why do our most carefully conceived UML diagrams of object models of any size fall apart? Wait. More precisely, how and when do they fall apart? How can we tell whether a given class exhibits, say, the “Inappropriate Intimacy” code smell, or the “Message Chains” smell, or the “Middle Man” smell?

More to my immediate point, in the context of Software Karma and Justice: If you create a class or collection of classes that inflicts such suffering on programmers who later must maintain and extend your code, then who should be the first one to suffer as a consequence? Well, trick question, of course.

You should, baby. The karmic resolution should be this fast: you create a class with bad separation of concerns or a cruddy API or rampant duplication, and you are the first person who is made to suffer as a consequence. What goes around comes immediately around and smacks you in the head, like a tetherball. Ah, would that justice flowed thus swiftly in all realms! (spoken in Robin Hood voice from “Men in Tights.”)

Test-Driving: Delivering Smells to Your Nose First

So how is this possible in object design? Only through TDD and BDD, which require that in fact you be the first person to invoke the API of your new class. The unit test you are writing is the first client of the new behavior you are creating. In fact, you have to try to invoke the production code before it exists. You can’t pay karmic debt any faster than that. Pay in advance, baby! You try to instantiate the class, set up some state for it involving some new behavior, and make some assertions on that state/behavior. And you suddenly realize: Eeeew. That’s a horrible API (or at least this seems to happen to me a lot).

Most likely, for me, the class already exists, and I need some new behavior (fast, baby, fast!). What’s the first tool I reach for? The old procedural one from my Old Coder DNA: adding a method. Doh. And what happens? I test drive that new method and realize: Eeeew. This class is getting more bloated than MS Word. And I have some nasty duplication going on.

Oh, Man. Not only does this new method not belong there, but the last two methods I added elsewhere are just as bad as this one. In fact, I am going to have to split this thing into a whole new little object tree, and I think I am going to need a Template Method. I’ll be pulling stuff up and pushing stuff down my new little tree for the next several minutes, as soon as I get this ugly test to pass.
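In the Template Method pattern, a base class fixes an algorithm's skeleton in one method and defers the steps that vary to subclasses. Here is a minimal Java sketch with hypothetical names, not the design in either codebase: the algorithm (walk a line of a 10 x 10 board, tracking the longest run of one player's marks) is written once in a final method, and each subclass overrides only the hooks that say which way the line runs. Scanning for five in a row along each axis is a natural fit, since otherwise nearly the same loop tends to get repeated once per axis.

```java
/** Template Method: the scanning algorithm lives here once; subclasses supply only geometry. */
abstract class AxisScanner {
    static final int SIDE = 10;

    /** The template method: final, so subclasses can't change the algorithm's shape. */
    final int longestRun(int[] squares, int mark, int row, int col) {
        int best = 0, run = 0;
        for (int r = row, c = col; inBounds(r, c); r += rowStep(), c += colStep()) {
            run = squares[r * SIDE + c] == mark ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    /** Hook methods: the only parts that vary from axis to axis. */
    abstract int rowStep();
    abstract int colStep();

    private static boolean inBounds(int r, int c) {
        return r >= 0 && r < SIDE && c >= 0 && c < SIDE;
    }
}

final class HorizontalScanner extends AxisScanner {
    @Override int rowStep() { return 0; }
    @Override int colStep() { return 1; }
}

final class VerticalScanner extends AxisScanner {
    @Override int rowStep() { return 1; }
    @Override int colStep() { return 0; }
}

final class DiagonalDownRightScanner extends AxisScanner {
    @Override int rowStep() { return 1; }
    @Override int colStep() { return 1; }
}

final class DiagonalUpRightScanner extends AxisScanner {
    @Override int rowStep() { return -1; }
    @Override int colStep() { return 1; }
}
```

Supporting a new axis, or a new kind of line, means adding a tiny subclass rather than copying the loop again.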

Cause My Client Told Me So

So the tube through which this smell arises for me, I am calling “Client Validation.” My point is this: only from the perspective of the clients of a given class’s API can we really tell how bad they smell. (I’m trying not to mix two metaphors of human sense here, but dude, it’s hard.)

Only once my test shows a bit of “Primitive Obsession” or “Message Chains” do I realize, from the perspective of my client test, that other programmers will likely find this API as stinky as I do.
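To make "smelling from the client's side" concrete, here is a deliberately tiny Java sketch; all names are hypothetical and come from no codebase mentioned here. The first client method has to walk a chain of getters, which is exactly the Message Chains smell a test written first makes you feel. The second uses a method added to `Order` by Fowler's Hide Delegate refactoring.

```java
/** Hypothetical domain classes; a client must reach through all three to get a zip code. */
final class Address {
    private final String zipCode;
    Address(String zipCode) { this.zipCode = zipCode; }
    String getZipCode() { return zipCode; }
}

final class Customer {
    private final Address address;
    Customer(Address address) { this.address = address; }
    Address getAddress() { return address; }
}

final class Order {
    private final Customer customer;
    Order(Customer customer) { this.customer = customer; }
    Customer getCustomer() { return customer; }

    /** Added by Hide Delegate: clients ask the Order directly. */
    String shippingZipCode() { return customer.getAddress().getZipCode(); }
}

final class ClientValidationDemo {
    static String beforeRefactoring(Order order) {
        // Message Chains smell: this client must know Order -> Customer -> Address.
        return order.getCustomer().getAddress().getZipCode();
    }

    static String afterRefactoring(Order order) {
        // This client knows only Order; how Order finds the zip code is its own business.
        return order.shippingZipCode();
    }
}
```

Both methods return the same value; the difference is how much of the object graph each client is forced to know about.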

So, one test method at a time, one API call at a time, I find a particular class to be slightly (or horridly!) client-invalid. From the client’s perspective, that class or method or method signature stinks.

And one test method at a time, one production method at a time, one production class at a time, I repair my design, refactoring it into a state that feels, from a client-test-method’s perspective, to be valid. I end up with an API that smells valid to its client tests (and production clients).

This is what I can never smell in my UML diagrams. I can never feel (oh great! Now my metaphors include sight, smell, and touch!) whether each of these calls is “Client Valid.”

This may be old news to some brilliant old Smalltalk farts I could name, but not to me. It is a useful little heuristic for my on-going TDD journey. Maybe this will all be easier in Groovy. Hmmm.

Anyway, I am going forth now, nose held high, to sniff out Client Validity in my object models, one horrid little test method at a time. If comments are code-smell deodorant, then refactoring is code-smell Febreze. My cube will be the one that smells of Febreze. That stuff is great, you know. It can eliminate all of the cat-urine odor from an entire 6′x6′ Little Tikes play-structure (moved indoors one Winter) that apparently served as the olfactory bulletin board for every feline in Oak Park, Michigan. But that, of course, is another blog, for another time.

Continuous Refactoring and the Cost of Decay

Refactor Your Codebase as You Go, or Lose it to Early Death

Also, Scrub Your Teeth Twice a Day

Refactoring is badly misunderstood by many software professionals, and that misunderstanding causes software teams of all kinds - traditional and agile - to forgo refactoring, which in turn dooms them to waste millions of dollars. This is because failure to refactor software systems continuously as they evolve really is tantamount to a death-sentence for them.

To fail to refactor is to unwittingly allow a system to decay, and unchecked, nearly all non-trivial systems decay to the point where they are no longer extensible or maintainable. This has forced thousands of organizations over the decades to attempt to rewrite their business-critical software systems from scratch.

These rewrites, which have their own chronicles of enormous expense and grave peril, are completely avoidable. Using good automated testing and refactoring practices, it is possible to keep codebases extensible enough throughout their useful lifespans that such complete rewrites are never necessary. But such practices take discipline and skill. And acquiring that discipline and skill requires a strategy, commitment, and courage.

So, First of all: Refactoring - What is It?

The original meaning of the word has been polluted and diluted. Here are some of the “refactoring” definitions floating around:

  • Some view it as “gold-plating” - work that adds no business value, and merely serves to stroke the egos of perfectionists who are out of touch with business reality.
  • Some view it as “rework” - rewriting things that could, and should, have been written properly in the first place.
  • Others look at refactoring as miscellaneous code tidying of the kind that is “nice to have,” but should only happen when the team has some slack-time, and is a luxury we can do without, without any serious consequences. This view would compare refactoring to the kind of endless fire-truck-polishing and pushups that firemen do between fires. Busy work, in other words.
  • Still others look at refactoring as a vital, precise way of looking at the daily business of code cleanup, code maintenance, and code extension. They would say that refactoring is something that must be done continuously, to avoid disaster.

Of course, not all of these definitions can be right.

The original, and proper, definition of refactoring is that last one. Here I attempt to explain and justify that. But first let’s talk about where refactoring came from as a practice.

What problem does refactoring try to solve?

The Problem: “Code Debt” and the “Cost of Decay” Curve

What is Code Debt?

Warning: Mixed Metaphors Ahead

Veteran programmers will tell you that from day one, every system is trying to run off the rails, to become a monstrous, tangled behemoth that is increasingly difficult to maintain. Though it can be difficult to accept this unless you have seen it repeatedly firsthand, it is in fact true. No matter how thoughtfully we design up front and try to get it entirely right the first time, no matter how carefully we write tests to protect us as we go, no matter how carefully we try to embrace Simple Design, we inevitably create little messes at the end of each hour, or each day, or each week. There is simply no way to anticipate all the little changes, course corrections, and design experiments that complex systems will undergo in any period.

So enough of dental metaphors for a moment. Software decay is like the sawdust that accumulates in a cabinetmaker’s shop, or the dirty dishes and pots that pile up in a commercial kitchen - such accumulating mess is a kind of opportunity cost. It always happens, and it must be accounted for, planned for, and dealt with, in order to avoid disaster.

Programmers increasingly talk about these little software messes as “code debt” (also called “technical debt”) - debt that must be noted, entered into some kind of local ledger, and eventually paid down, because these little messes, if left unchecked, compound and grow out of control, much like real financial debt.

The Software “Cost of Decay” Curve

Years ago it was discovered that the cost of correcting a defect in software increases exponentially over time. Multiple articles, studies, and white papers have documented this “Cost of Change Curve” since the 1970s. This curve describes how the cost of change tends to increase as we proceed from one waterfall phase to another. In other words, correcting a problem is cheapest in requirements, more expensive in design, yet more expensive in “coding,” yet more costly in testing, yet more costly in integration and deployment. Scott Ambler discusses this from an agile perspective, talking about how some claim that agile methods generally flatten this curve. Ron Jeffries contends, by contrast, that healthy agile methods like XP don’t flatten this curve, but merely insist on correcting problems at the earliest, cheapest part of it. I agree with Ron, but I claim that’s only part of how agility (and refactoring in particular) helps us with software cost of change.

There is a different (but related) exponential curve I dub the “cost of decay curve.” This curve describes the increasing cost of making any sort of change to the code itself, in any development phase, as the codebase grows more complex and less healthy. As it decays, in other words.

Whether you are adding new functionality, or fixing bugs, or optimizing performance, or whatever, the cost of making changes to your system starts out cheap in release 1, and tends to grow along a scary curve during future releases, if decay goes unrepaired. In release 10, any change you plan to make to your BigBallofMud system is more expensive than it was in release 1. In the graph-like image below, the red line shows how the cost of adding a feature to a system grows from release to release as its decay grows.

Classic cost of decay curve.

The number of releases shown here is arbitrary and illustrative; your mileage will vary. Once more, I am not talking about how, within a project, the cost of detecting and fixing a problem inevitably increases over time, as the Cost of Change curve does. I am saying that we can use the cost of any sort of change (like adding a new feature) to measure how much our increasing decay is costing us.
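To make the shape of that curve concrete, here is a minimal Python sketch of a toy model. It assumes, purely for illustration, that unrepaired decay compounds by a fixed fraction each release; the function name `cost_of_change` and its parameters `base_cost` and `decay_rate` are hypothetical, and the numbers are invented rather than measured from any real codebase.

```python
def cost_of_change(release: int, base_cost: float = 1.0, decay_rate: float = 0.3) -> float:
    """Toy model: cost (arbitrary units) of adding one feature in a given release.

    Assumes (hypothetically) that unrepaired decay compounds by `decay_rate`
    per release, so cost grows exponentially from `base_cost` in release 1.
    """
    if release < 1:
        raise ValueError("releases are numbered from 1")
    return base_cost * (1 + decay_rate) ** (release - 1)


if __name__ == "__main__":
    # Print the illustrative curve for ten releases.
    for release in range(1, 11):
        print(f"release {release:2d}: {cost_of_change(release):6.2f}")
```

With these made-up defaults, the same feature costs roughly ten and a half times as much in release 10 as in release 1 (1.3 raised to the 9th power is about 10.6). Setting `decay_rate` to zero, the toy equivalent of repairing decay as it appears, keeps the line flat.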

Back to the dental metaphor. If, in the last few minutes of programming, I just created a tiny, inevitable mess by writing 20 lines of code to get a test to pass, and if that mess will ramify and compound if left uncorrected (as is usually true), then the cheapest time for the organization to pay me to clean it up is immediately: the moment after I created it. By removing the decay, I have reduced future change costs. I have scrubbed my teeth, removing the little vermin that tend to eat, multiply, defecate, and die there (I never promised a pleasant return to the metaphor; teeth are, let’s face it, gross).

The same logic applies if a day’s or a week’s worth of programming leaves uncorrected, unrefactored messes behind: the cost of decay curve dictates that the sooner we deal with them, the less the cleanup costs. It’s really no different from any other “pay a bit now or pay a lot later” practice in our work or personal lives. We really ought to scrub our teeth.
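The “pay a bit now or pay a lot later” arithmetic can be sketched the same way. This hypothetical Python model assumes that each day of programming leaves one small mess costing `mess_cost` to clean immediately, and that every mess left in place grows by a fixed fraction (`growth`) per day until someone cleans it. Both function names and all numbers are illustrative assumptions, not data.

```python
def clean_as_you_go(days: int, mess_cost: float = 1.0) -> float:
    """Total cleanup cost if each day's mess is scrubbed the moment it is made."""
    return days * mess_cost


def clean_at_the_end(days: int, growth: float, mess_cost: float = 1.0) -> float:
    """Total cleanup cost if every mess is left alone until the last day.

    The mess made on day d (0-based) sits for (days - 1 - d) days, compounding
    by `growth` per day, before it is finally cleaned up.
    """
    total = 0.0
    for day in range(days):
        days_deferred = days - 1 - day
        total += mess_cost * (1 + growth) ** days_deferred
    return total


if __name__ == "__main__":
    for days in (5, 20, 60):
        now = clean_as_you_go(days)
        later = clean_at_the_end(days, growth=0.05)
        print(f"{days:3d} days: clean now = {now:6.1f}, clean later = {later:6.1f}")
```

For any positive `growth`, the deferred total is never lower than the clean-as-you-go total, and the gap widens with every additional day the messes are left in place.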

From a programmer’s perspective, little software messes really are as inevitable as morning breath. And nearly all of them do ramify, compound, and grow out of control as the system continues to grow and change. Our need to clean up the mess never vanishes; it just grows larger the longer we put it off, continuously slowing us down and costing us money. But before we talk about how these little messes grow huge, giving the cost of decay curve its dramatic shape, let’s talk about the worst-case scenario: the BigBallOfMud, and the Complete System Rewrite.

Worst-Case Scenario: The BigBallOfMud, and the Complete Rewrite

Most veteran programmers, whether working in procedural or object-oriented languages, have encountered the so-called BigBallOfMud pattern. The characteristics of this pattern are what make the worst legacy code so difficult or impossible to work with. These are codebases in which decay has made the cost of any change very expensive. At one shop where I consulted, morale was very low. Everybody seemed to be in the debugger all the time, wrestling with the local legacy BigBallOfMud. When I asked one of them how low morale had sunk, he said something like “You would need to dig a trench to find it.”

With a bad enough BigBallOfMud, the cost of the decay can be so high that the cost of adding the next handful of features is roughly the same as the cost of rewriting the system from scratch. This is a dreadfully expensive and dangerous outcome for any codebase that still retains significant business value. Total system rewrites often blow budgets, teams, and careers; unplanned-for resources must be found somewhere for such huge and risky efforts. Below we revisit the cost of decay curve, adding a blue line that shows how we strive to increase our development capacity from release to release. At best, that capacity grows linearly, not exponentially.

BigBallOfMud! Busted!

At the point where the two lines cross, we have our BigBallOfMud. We are out of luck for this particular system: it is no longer possible to add enough resources to maintain or extend it, and it never will be again. Indeed, the cost of decay, and of making any sort of change, can only keep increasing from there until it becomes effectively infinite, and change can no longer be made safely or quickly enough at all.
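The crossing point can be located with the same kind of toy model. This Python sketch assumes, hypothetically, that the cost of each release’s planned feature work grows exponentially with unrepaired decay while team capacity grows linearly, and it returns the first release in which cost exceeds capacity. The function name, parameters, and numbers are all illustrative assumptions.

```python
from typing import Optional


def first_unsustainable_release(
    base_cost: float,
    decay_rate: float,
    base_capacity: float,
    capacity_growth: float,
    max_releases: int = 100,
) -> Optional[int]:
    """Return the first release where planned work costs more than capacity.

    Cost compounds by `decay_rate` per release (exponential, like the red line);
    capacity grows by `capacity_growth` per release (linear, like the blue line).
    Returns None if the lines do not cross within `max_releases`.
    """
    for release in range(1, max_releases + 1):
        cost = base_cost * (1 + decay_rate) ** (release - 1)
        capacity = base_capacity + capacity_growth * (release - 1)
        if cost > capacity:
            return release
    return None


if __name__ == "__main__":
    print(first_unsustainable_release(1.0, 0.3, 2.0, 0.5))
```

With these invented numbers the lines cross at release 8. Because any positive `decay_rate` eventually outgrows any linear growth in capacity, adding capacity in this model can only postpone the crossing, never prevent it.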

We are then faced with a total system rewrite, because we have lost all of our refactoring opportunity, along with our ability to make any other forms of change. How many expensive, perilous total system rewrites have you seen or taken part in, in your career? How many “legacy codebases” do you know of that just could not be maintained any longer, and which had to be replaced, at great expense, by a rewrite, perhaps in a new technology or with a new approach, perhaps by a completely new team? I have personally seen several over the years. They have not all gone well.

