CodeRetreat #2 Outdoes #1, by a lot
CodeRetreat #1 set a pretty high standard for fun, learning, engagement, connection, and community building. CodeRetreat #2 at LeanDog blew that standard away.
More people showed up. More passion and creativity and humor showed up. Corey Haines and J.B. Rainsberger showed up again (and J.B. brought his wife Sarah), and Cheezy showed up. Chris Judd and James Shingler (the Groovy/Grails guys) showed up. Jon Stahl and his wife Deb prepared awesome food all day. The sun and the seagulls were out; the boat was a lovely, expansive work/play space. Lots of room to extemporize, come together, break apart again, project our mob programming experiments.
We had our schtick down better. We did Java Conway’s Game of Life kata/turtles all the way down. We ping-pong-paired more and more throughout the day. The retrospectives were long, and rich with feedback, humor, and (in the afternoon) micro-brews.
J.B. and Corey performed a Game of Life test-drive that was thought-provoking, hilarious, and quite crowd-interactive. Jim and Chris followed with a rather crowd-hushing port of J.B. and Corey’s work to Groovy. Sighs and gulps all around. Nothing like seeing the same test cases in Groovy and Java to re-sensitize you to Java’s nasty noisy gunk. I’d love for us to repeat that at CodeRetreat #3.
CodeRetreat #3 will likely be in Columbus, OH sometime in May, co-sponsored by Pillar and LeanDog. We’ll take care not to collide with the several other Columbus and Cleveland tech events in May. #4 will likely be in Philly or Montreal, perhaps as early as June; maybe July. Watch the ning site for details.
Thanks so much Jon and Deb and Cheezy for hosting us. Yesterday will be really hard to top, and we’ll have fun trying.
Summary: CodeRetreat #1 Smashing Success
Yay Team
Man, did we have fun at the first CodeRetreat in Ann Arbor, MI, sponsored by Pillar Technology. We had 25 or so programmers show up, including 5 dedicated software craftsmen who assisted me as Master Rabble Rousers for the event. So special thanks indeed to Ron Jeffries, Chet Hendrickson, Corey Haines, Bill Wake, and the guy who (of course) stole the show, J.B. Rainsberger. J.B. flew down from approximately the North Pole on his own dime. He had, like, seven connecting flights. He was wearing a full beard and long hair, because where he lives, if you don’t do that, you die.
There was lots of pairing, lots of discussion, a very cool String templatizer Kata performance in Ruby by Corey Haines. There were beers and brainstorming and stupid geek tricks at Bar Louie’s afterwards till late. I’ve made a couple of animotos of the event here and here.
Learning Outcomes (as they say)
So, we learned that we want to do this on a regular basis: several times per year, per region that adopts it. That’s our immediate goal.
We learned not to use nasty little legacy Java applets as starting points. (What is it with me and nasty little legacy Java applets? Bad old Java karma, I expect.) We learned to spend each CodeRetreat event on a single language. Splitting the first one into Java and Ruby involved too much thrash and context switching, and did not focus us all on craft as much as we had wanted. Too many were not fully engaged in honing craft, enough of the time. I felt their pain. The next CodeRetreat event, at least, will be entirely in Java.
Kata All the Way Down
We also learned that we want to concentrate the next few CodeRetreats on repeating the same exercise again and again, Kata-style. We’ll likely adopt Conway’s Game of Life as our standard exercise. We can adopt more if we choose to later. Let’s see how well this one serves us as a craft-focused learning sandbox. We can still mix up the session design around the way we do that Kata. But we agreed we’ll dive deeply for a while together into that problem domain and potential implementations. We’ll get that one completely “under the fingers.”
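For anyone who has not met the exercise, here is a minimal sketch of a single Game of Life generation. It is written in Python purely for brevity (the retreats themselves used Java and Ruby), and the names `next_generation` and `blinker`, along with the choice to model the board as a set of (row, column) pairs on an unbounded grid, are assumptions made for illustration, not anything a pair produced at a retreat.

```python
from collections import Counter


def next_generation(live_cells):
    """Return the set of live cells after one tick of Conway's rules."""
    # Count how many live neighbors every cell next to a live cell has.
    neighbor_counts = Counter(
        (row + d_row, col + d_col)
        for (row, col) in live_cells
        for d_row in (-1, 0, 1)
        for d_col in (-1, 0, 1)
        if (d_row, d_col) != (0, 0)
    )
    # A cell lives next tick with exactly 3 live neighbors,
    # or with exactly 2 if it is already alive.
    return {
        cell
        for cell, count in neighbor_counts.items()
        if count == 3 or (count == 2 and cell in live_cells)
    }


# Three cells in a horizontal row (illustrative starting pattern).
blinker = {(1, 0), (1, 1), (1, 2)}
```

A blinker flips between a horizontal and a vertical bar of three cells, which makes it a handy first test case when repeating the kata.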
CodeRetreat #2: Saturday, March 14, 2009: LeanDog, Cleveland
Just got the go-ahead from my pal Jon Stahl that his consultancy, LeanDog Software, will be hosting our next CodeRetreat on his cool headquarters-boat on Lake Erie, some Saturday (TBD) in March of ’09. Again: this one will be wall-to-wall Game of Life in Java. Other details will be emergent. Our primary texts, where craft is concerned, are Bob Martin’s Clean Code, Fowler’s Refactoring, and Lasse Koskela’s Test Driven. We might switch them up a bit, but we’ll try to stick to three reference works at a time, for a good long while.
Stay Tuned
If you want to start a CodeRetreat in your region (we are concentrating for now on MI and OH), please contact me, and I will help you set it up if I can. I have had interest in the Philly suburbs in hosting one at Ternary Software, for example.
If you want to attend another of our MI/OH CodeRetreats, please watch the ning site. Well, watch the site anyway. I think CodeRetreat is a meme that can spread. If not, it will not be for lack of stubbornness on my part.
CodeRetreat #1
Corey Haines, Nayan Hajratwala, a couple of other coders, and I, catching up with each other at CodeMash last week, discovered that this single idea had been percolating independently for a while in our separate heads. The idea is this: you get some programmers together for one or more days, and with bare-bones structure, a few Kata and exercises, you have people code for most of the time, then talk for just a little time.
Let the Mechanics Evolve
We might intersperse coding and discussion with just a couple of Kata performances. We might have a bit of competition; we might ask teams to exchange codebases and whine about what they dislike in the code they inherit. We might have programmers disagree loudly about this or that practice. But a bunch of group-learning would occur, and it would all center around craft. So no PowerPoint or Keynote decks. No pitches of this technology or framework or that one. Leave your Master’s degree or your alpha-geek attitude at the door. We are all peers at CodeRetreat. We just test-drive and refactor the best code we all can, changing up pairs in some sensible fashion, discussing design decisions as we go. Some of it we do as a mob; some of it we do in teams; some of it is performed by one programmer; some is performed by a pair. At the end of the day, we all go off and have beers and keep debating, with or without our laptops.
First One: 1/24/09, Ann Arbor, MI
Well. Now we have a new little social network on-line that describes CodeRetreat, and Nayan and Corey and I have actually pulled the first of these events together: it will be Saturday, January 24th, 2009, at the Ann Arbor, MI Spark center. The CodeRetreat site provides all the details. And we hope to replicate this pattern, publishing our results as we go. There are so many questions this might ask and eventually answer. Might ongoing 1-day regional CodeRetreats become a way for programmers to hone their craft together on a regular basis? Might itinerant, genuine Journeymen like Corey spread this seed to other CodeRetreats in other regions? What will eventually emerge as the best format — most engaging, fun, thought-provoking? What sorts of Kata and exercises lend themselves to this kind of learning best? Might this be a better way for programmers to learn in general, on a steady basis, than we tend to have right now? Who knows. Let’s find out.
Join Us if You Can
Off we go to the first CodeRetreat, between 15 and 30 of us, to code and debate code together. If you live anywhere near Southeast Michigan, and have a Saturday free on 1/24, please consider signing up for the event, and joining us.
The Metric I Want for Christmas
Enterprise Software Blight
I get hired to help teams learn agile software development practices. Most of the practices in my tool bag (not all, but most) come from experience, books, articles, blogs, and conferences that focus mainly on greenfield development. And as an agile consultant pal of mine, Mike Hill, says, “First step when you are digging a hole: Stop Digging!” Turning around how we launch greenfield projects, and the standards of craft, quality, feedback, accountability, and ROI we establish for them: hey, that’s obviously all good. Most teams, most enterprises, are in fact still digging every time they launch a new project. Still making bad enterprise situations much, much worse with more stinky code.
But they are making things worse in more ways than my favorite tools reveal. And perhaps I have been, we have all been, focusing on the wrong kinds of damage. That’s what I want to explore here.
We have spent a number of years trying to help enterprises learn how to, at least, stop digging holes in the object model, in the architecture, in how the team works.
The thing is, these tools in our toolbags really do work best in greenfield situations. Meanwhile greenfield opportunities seem to be slowly drying up. Over the last 10+ years, the software best practices community has been acquiring agile experts, expertise, books, conferences, entire processes, that I think are slowly turning around greenfield project standards. If you work on a project where the issue is how to get everyone up to speed on iterating, velocity, OO, TDD, CI, build and deployment automation, and simple design on a new project, well, good for you. Count yourself extremely lucky. It’s still darned hard to do, but it can be done. It is deeply gratifying work, given enough skill, knowledge, courage, discipline, and management advocacy.
But as came up at Agile 2008, and as keeps coming up with my clients, most developers in the industry can work for years without the opportunity to start from scratch. For most of our careers, we are basically hamstrung by the legacy code issues that keep so many software development professionals living in worlds of constant emergency, constant production defect repair, and very slow progress.
Worse than this, our legacy code is accumulating faster than we can cope with it. Our release schedules and iteration schedules are more pressing, while we are increasingly dwarfed by these enormous, stinky, towering piles of crap. We really, really need a way out of this situation — and not just for one team, but for the entire enterprise. And not just in the object model, and not just in the architecture.
Legacy Complexity: It’s All Over the Place
When we do start talking about legacy codebase repair, we often start talking about how to get part of the object model under test. How to start repairing the Java, or the C++, or the C#, or whatever. As far as this goes, this too is 100% goodness. We certainly need characterization tests, opportunistic refactoring in high-traffic, high-business-value neighborhoods of the code. Again, all goodness.
But I suggest that that too might be the wrong thing for us to start with, or at least the wrong thing for us to focus most of our consulting energy on. I suggest that without a better measure of overall complexity from the top to the bottom and from back to front of the enterprise, we don’t really know the best place to start.
I have seen more and more teams engaging in agile software development followed immediately by waterfall integration and deployment.
The more I work at this, the more convinced I am that the legacy complexity that is hurting us all the most is all of this contextual enterprise complexity. Our biggest problem, and biggest potential point of leverage, is the massive legacy bureaucracy that makes inter-system integration, promotion between environments, environment configuration, version control, configuration management, and production deployment such stupendous, horrific nightmares, release after release.
“Total Enterprise Software Complexity”
The main problem is not within the individual systems (as crappy as most of them are, and as tempting as it is for us to start diving in and refactoring and test-protecting them). The main problem, as far as I can tell, is between all of these systems. I don’t care how many million lines of stinky legacy untested Java you have. I bet dollars to donuts most of your worst problems are actually between those piles of Java.
I read somewhere a great little discussion (I forget where) about how cyclomatic complexity, for OO code, captures or covers most of what is healthy or unhealthy about a codebase. All the other kinds of dysfunction you would likely find in stinky OO code, and might measure separately, can be covered by cyclomatic complexity. As readers of mine know, I would amend that only slightly, using Crap4J for example, to measure how well test-protected the most cyclomatically complex code is. Anyway, the point is that if you are smart, you end up with a single number. Cool. I love single numbers.
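For reference, the per-method number behind a tool like Crap4J can be sketched as below. The arithmetic follows the published CRAP metric as best it can be reconstructed here: complexity squared, times the cube of the uncovered fraction, plus complexity. The function and parameter names are invented for this sketch, it is Python rather than anything Crap4J ships, and it illustrates the arithmetic only.

```python
def crap_score(complexity, coverage_percent):
    """Blend cyclomatic complexity and test coverage into one number.

    complexity:        cyclomatic complexity of a single method
    coverage_percent:  automated test coverage of that method, 0 to 100
    """
    uncovered_fraction = 1.0 - coverage_percent / 100.0
    # Untested complexity is penalized heavily; fully covered code
    # scores no worse than its own complexity.
    return complexity ** 2 * uncovered_fraction ** 3 + complexity
```

The shape is the point: a complex method with no tests scores far worse than the same method under full coverage, so the single number rewards test-protecting the gnarliest code first.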
So I want a new kind of number. For a given enterprise, before I start determining where to focus my consulting, the metric I want for Christmas would be a single number that blends or relates to at least the following objective and subjective categories of enterprise mess:
- How many total development teams do we have?
- How many total developers do we have?
- How many total systems do we have that are interacting with each other?
- How many distinct technology stacks do we have in play (e.g., .NET, J2EE, SOA, AS/400, Tibco, Rails, etc.)?
- How many distinct frameworks do we have in play (e.g., Struts, Spring, Hibernate, TopLink, CORBA, EJB)?
- How many total languages are we using (including SQL, Perl, shell scripts, Groovy, Ruby, XML, etc.)?
- How automated is the process of deploying from a dev machine or CI machine to a QA target? From a QA target to Production?
- How many total lines of XML are there in play in the enterprise? How many total lines of build-related properties files? XML is so nasty it really does deserve its own measure. XML is a powerful carcinogenic force in organic enterprise systems.
- What is the average ratio of the lines of code in each system’s build scripts to the total lines of code in the system itself? (Feel free to substitute a better measure of build complexity here.)
- How much automated end-to-end functional test coverage do we have? Granted (as I advocate elsewhere) you don’t want to lean forever on huge suites of automated functional tests. But as we start healing a quagmire, how many have we got?
- And yes, what is the complexity of the average object model? How bad off is the average Java or C# or C++ project? A Crap4J number is great here.
So. For the whole frigging enterprise, I want a metric, on a scale from zero (ideal) to 1000 (doomed), that describes how much of this mess is interfering with everyone’s ability to make production deadlines, much less transition to a continuously improving, manageable, agile approach.
And I want to be able to tailor a consulting approach — somehow — for an enterprise with a “Total Enterprise Software Complexity” score of 200 very differently than I would for an enterprise with a score of 750.
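To make the wish concrete, here is one way such a blended number could be computed. This is purely a thought experiment: every factor name, every "doomed" ceiling, and every weight below is an invented placeholder, nothing here has been calibrated against a real enterprise, and only a handful of the categories from the list above are represented.

```python
# Hypothetical factors: name -> (value at which the factor counts as
# "doomed", weight in the blend). All numbers are made-up placeholders.
FACTORS = {
    "teams": (50, 0.10),
    "interacting_systems": (100, 0.25),
    "technology_stacks": (10, 0.15),
    "languages": (15, 0.10),
    "manual_deploy_steps": (40, 0.25),
    "average_crap_score": (60, 0.15),
}


def tesc_score(measurements):
    """Blend raw measurements into a 0 (ideal) to 1000 (doomed) score."""
    total = 0.0
    for name, (ceiling, weight) in FACTORS.items():
        value = measurements.get(name, 0)
        # Clamp each factor to the range 0..1 so that one runaway
        # number cannot contribute more than its weight.
        normalized = max(0.0, min(value / ceiling, 1.0))
        total += weight * normalized
    return round(1000 * total)
```

A weighted sum is only the simplest possible blend. The factors almost certainly interact (many systems plus manual deployment is worse than either alone), so a serious version would need something richer than addition.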
That’s all I want for Christmas this year. Is that what you want too? Let’s talk about it.
Caveat Lector: This really is an early draft. I throw it out for feedback. If smart enough people review it, I’ll likely be able to refine it greatly. That is my hope. So smart people, please comment and email me.
Flipping the Automated Testing Triangle: the Upshot
This afternoon at Agile 2008 in Toronto, which has been a smashing good time, I led (with lots of help from Lisa Crispin, and then serendipitously from Brian Marick, J.B. Rainsberger, Dave LeBlanc, Matt VanVleet, and Declan Whelan) a presentation/workshop/micro-Open-Space session. I began with a premise from Mike Cohn: the idea of an ideal testing pyramid, or triangle, with three tiers:
What makes the triangle ideal is that you are spending most of your resources, and relying most, on the bottom-most tier of xUnit tests to do good things like protect you from defects and drive your design. That suite does the lion’s share of your regression protection, for example. And conversely, you are relying least, and expending the fewest resources, on very slow, brittle, large, black-box, through-the-GUI, recorded kinds of tests as created by tools like Selenium IDE.
My notion is that this minimizes the Total Cost of Ownership (TCO) of your automated tests. Eventually.
It’s a long (3-hour!) conversation, but the gist, as presented, is that nearly no teams get to start with so well-formed a triangle, and in fact the triangle is most typically inverted for teams just starting down the agile path:
Given such a challenging starting point, you need three automated testing initiatives, one for each of the three kinds of tests (which I refer to metaphorically as brick, stick, and straw), to ensure that you end up with the triangle above. And this is hard, in large part because it is so hard to learn to produce really, really effective suites of unit/isolation/programmer/micro tests (as defined by folks like Michael Feathers, J.B. Rainsberger, Mike Hill, Bob Martin, etc). And the triangle-flipping is hard also because we are talking about culture change, and resistance to it. Well, of course, there are lots of other reasons why it’s hard. It’s hard.
In the presentation, we explored, together, lots of subcategories of this automated-testing-triangle-flipping challenge. To my mind, there were lots of aha moments and unexpected conclusions and good ideas.
So, I’ll blog more on the details later. In the meantime, as promised to several people, here is a link to a PDF version of the presentation. I have had trouble uploading the Keynote version, and so abandoned that. PDF is likely best in any case.
Thanks again to all who attended, and all who helped me prepare and present. It was a blast.
My next goal for this material is to extend it into an experiential, inspirational, 2-day training course that allows cross-disciplinary teams to experience the nasty cruddy inverted testing triangle on day one, and the lovely, right-side-up triangle on day two. More later. Now, at 2:00 AM, off to bed with this agilista.
The “E” Word, Part 3: Uncle Bob’s Clean Code Book
Learning Happens
No matter what happens, learning happens. I say this a lot. As many of you know, I joined the agile community way behind the OO-developer-competence 8-ball years ago. I had done lots of procedural programming in the small, but I had had poor instruction overall, and I was not learning OO well on my own. A learning-style thing, I suspect. I had done a fair bit of low-craft Java code, under the loose instruction of folk who waved Bruce Eckel’s books at me and said “Polymorphism! Prefer composition to inheritance! Encapsulation!” I tried hard, but I was incapable so far, apparently, of really thinking in OO. To be fair, I had also been doing lots of other things: writing, managing, UI design, marketing, sales, training, QA, starting new businesses, rabble rousing.
And then, out of the blue, I had my agile epiphany, discovered the agile community of thought leaders and fanatics, and I sought out masters of agile programming craft. Oddly (for then), I began to learn OO through the lens of TDD and Refactoring. I discovered, of course, that the agile masters and thought leaders had long been masters of OO craft, project automation, programming craft generally. They also tended to be continuous learning masters, continuous improvement masters. They were also passionate about humanity, creativity, and fun in the workplace. They loved to talk and read and write. They were masters of software extensibility (the “E” word of this blog thread). They liked beer. True Masters, in other words. Folks who lived life fully and passionately, whether or not software was involved.
It kept seeming like a good fit for me, and it still does. I will be learning from these folks in my 80s, no doubt. And I just learned a bunch from a handful of them again this past couple of weeks.
Uncle Bob’s New Book
Uncle Bob Martin has a book coming out on Clean Code. It is a condensed version of the strictly code-related principles, guidelines, and standards that are covered in more detail in Agile Software Development: Principles, Patterns, and Practices. I am fortunate to have an advance copy of this book, since I and my odd little TicTacPente codebase are also fortunate to be part of the Agile 2008 Clean Code Clinic.
And I love this little book. Uncle Bob and his work need no additional praise, but I am going to praise them anyway, because you, as a programmer, need this book if you are serious about programming craft, and especially if you are serious about helping others with their programming craft.
Bob and his other Object Mentors (and to some extent, Kent Beck) together produced this condensed, unabashedly opinionated style guide in such a way that it succinctly covers all of the critical aspects of what it means to write clean code: naming, function size and responsibility, function parameter lists, function side effects, Command-Query separation, formatting, comments, exception handling, class structure, class APIs, emergent design, Simple Design, etc. It is a true style guide, not in the sense of coding style (formatting alone), but in the literary sense of guiding our semantic and syntactic decisions. It also refers back to former programming style guides and guides on programming craft, going back 30 years.
Each topic is covered cogently, with great little code samples, and with characteristic Uncle Bob passion and irreverent verve. I laugh out loud with agreement again and again. My guess is you will too.
Hmm. Time for Some Refactoring
So I am reading this awesome book, by this Bob guy, this agile thought leader who helped personally introduce me to XP years ago, who will be managing the Clean Code Clinic of which I will shortly be a part. And I realize, you know, my code is close to complying completely with the book’s guidelines.
Close. But you know? Not quite there.
So, with the book electronically in hand, I begin making refactoring passes through my codebase. Some modules are fine as is. Some, not so much. Hmmm.
Ben Franklin (apparently) said “Do not fear mistakes. You will know failure. Continue to reach out.” So I continue to reach out and refactor my little codebase. And I start to realize. Man! I am really improving some of these modules! The resulting 3-line, 5-line, and occasional 8-line methods are way better than their 10-line and 15-line parents. I watch myself improving my class APIs, hiding more implementation details, and replacing dumb getters here and there with more object-appropriate, intention-revealing, behavior-revealing methods. Cool. As I refactor, I am periodically refreshing the current version of the TicTacPente codebase on this site.
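To illustrate the getter-replacement move in miniature, here is a hedged sketch. The `Board` class and its methods are hypothetical and are not taken from the TicTacPente codebase (which is Java); Python is used only to keep the example short. The only detail borrowed from the real game is the 10 x 10 board.

```python
class Board:
    """A hypothetical slice of a 10 x 10 game board."""

    SIZE = 10

    def __init__(self):
        # (row, column) -> "X" or "O". Callers never see this dict.
        self._marks = {}

    # Before the refactoring, callers reached in through a dumb getter:
    #     if board.get_marks().get((row, column)) is None: ...
    # After it, the board answers the question callers actually have.
    def is_open(self, row, column):
        """True when the square is on the board and nobody has played it."""
        on_board = 0 <= row < self.SIZE and 0 <= column < self.SIZE
        return on_board and (row, column) not in self._marks

    def place(self, player, row, column):
        """Record a move, refusing squares that are taken or off the board."""
        if not self.is_open(row, column):
            raise ValueError(f"square ({row}, {column}) is not open")
        self._marks[(row, column)] = player

    def is_full(self):
        return len(self._marks) == self.SIZE * self.SIZE
```

With the dict hidden, the storage could later become a list of lists or a bitmap without touching a single caller, which is the extensibility payoff of the shorter, intention-revealing methods.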
Mea Culpa: Permission to Learn, Sir?
So Man! is learning happening as I read, refactor, read, refactor. I have a few rituals I like to do around learning: I like to give myself permission to do it, which means that I must first admit that I don’t know everything already. I like to grant myself the time and safety to undertake the learning, partly for myself, and partly so that others may learn better. (I love to learn partly because I love to teach.) This requires taking a leap of faith that my production schedules and commitments will not fall to pieces if I take the time to learn. And finally I like to celebrate learning and its results.
I do these things because in my professional life, I have been surrounded by insecure know-it-alls, arrogant code cowboys, shy retiring cubicle-hermits, and every other kind of person who could not stand up and boldly say, “You know what? I don’t know the answer to that one. But I would sure like to know!” And in recent years, I have become the guy who, no matter what the circumstances, no matter what I am being paid, will loudly proclaim when I do not know something. Otherwise, I am not asking for, not giving myself, permission to keep learning, and I am also not modeling permission to learn to others. And I love to keep learning so, so much. And I love when others also do.
You are Using a Language to Make a Language
Here is my favorite lesson from this book. OO programming, at the highest level of craft, is about creating a rich DSL (Domain Specific Language). I am not simply implementing logic, or stringing together extensible algorithms and data structures. The packages, interfaces, classes, methods, variables, algorithms — even white-space — all form a language that is extensible according to how understandable it is. I am creating a DSL that reveals, as it implements, a way in which the problem would really like to be solved. A natural and terse way in which the problem can be solved.
I don’t want my DSL to turn out like English, which is really more of a collision than a blend. And I certainly don’t want it to turn out like Java, although I could do worse.
When I focus on the quality of the language I am creating, all of the other guidelines in the book appear as what they really are: means to an end, as opposed to ends in themselves. I really like that. It reminds me of Yoga. Yoga, despite its many Westernized versions, is not an end, not something to get good at as sport, or status, or distraction. Yoga is a means, and the ends include health, equanimity, gratitude, affection, peace of mind, humor. Uncle Bob’s guidelines and principles are means toward an end of a requisite little DSL for the problem at hand, whatever that problem happens to be. When I focus on that end, then the means seem to fall more naturally into mind and under my fingers. What a beautiful way to guide a refactoring session. Learning happens. You see why I love these folks? How many books and communities can you learn that kind of insight from?
Clean Code: Upshot
Buy Uncle Bob’s Book. It’s a quick, funny, useful, compelling read, unlike most of the books on agile craft, OO craft, and programming craft. I am organizing most of my software conversations these days around this book. Indeed, so is much of the industry.
And you need to give yourself permission to take the time to read it, to take the time to try its lessons out on some of your code, and to take the time to celebrate how cool it is to have code that clean. This is how we all make progress toward the level of craft at which we always create code this clean. And this is important, because the world contains very, very little code this clean, expressed as a percentage. I have no idea what the actual percentage would be, so I am going to make one up: way less than 1% of the code in the world is this clean. And this matters, because the other 99% is much, much more expensive to maintain and Extend (there’s the E-Word).
Finally, consider coming to Agile 2008 and the Clean Code Clinic, where we are going to use pain and epiphany, courage, outrage, and truth to learn about the real differences between really ugly code and (in my case) at least somewhat clean code.
In the next post, I swear, I’ll return to the TicTacPente problem domain, and what I currently know about 10 x 10, first-to-5-wins Tic Tac Toe.
I keep getting side-tracked by cool, related things. If I am anything about blogging, I am emergent. So sue me.
Great xUnit Test Suites: the Pre-TDD Conversation
A Burning Issue
We interrupt the current blog thread (the “E” word series) to bring you a burning issue. Well, burning for me, anyway.
I have been working with some other Pillar programmers on systems for helping not-yet-agile programmers learn some best practices. And while many of us in the industry are accustomed to coaching, mentoring, training, and otherwise cajoling people to attempt TDD specifically as a practice, I recently have begun to suspect that in fact, that’s a poor place to start the conversation.
TDD is all about how you get a good design, and good tests as specifications, and most critically to my mind, a great xUnit test suite for its regression protection. But what is a great xUnit test suite? What does that look like?
I have been finding (but not grokking until recently) that before I can have a TDD conversation with anyone, I really have to have a good conversation about the characteristics and value of a great xUnit suite.
Characteristics of a Great xUnit Test Suite
So when I come across a fresh codebase (I mean fresh to me — it might actually be quite rotten), these are the things I want to see in the xUnit tests. In future posts, I can give these more discussion, and perhaps include code snippets, but for today, it’s just a list:
- Code coverage is no lower than 85%. (Note: As important as code coverage is — especially for teams new to xUnit best practices — it can be a dangerous narcotic. It can hide bigger problems. It is possible to have a test suite that provides 100% coverage that is about 100% crappy. People do things like comment out all assertions except assertNotNull(blah), and make other poor choices when under pressure to (A) keep the coverage rates up, and (B) get the features out the door.)
- As much of the testing as possible is accomplished by “isolation tests”: small unit tests that run entirely in memory, with no dependencies on file systems, networks, databases, or other external resources. This is Mike Feathers’ definition of a unit test. This level of isolation (and the execution speed that goes with it) in turn depends on proper use of static and dynamic mocks. That in turn depends on dependency injection, which in turn depends on people knowing enough OO to code to interfaces.
- Speaking of execution speed: isolation test suites should average no more than 0.5 seconds per test, on a crappy machine. If everything really is in memory, it’s pretty common to get speeds of more like 100 isolation tests per second.
- The suite also includes end-to-end tests, “collaboration tests,” and other tests that are more real-world than isolation tests, include less or no mocking, and take longer to set up and run. These tests do talk to real databases, real networks, and perhaps completely external systems through various APIs.
- The isolation tests and non-isolation tests are separate from each other (separate source folders, to my mind), so that they can easily be run separately by developers, and by a CI server. As projects grow, the speed of their non-isolation suites slows. Because we don’t want to discourage programmers from running isolation test suites frequently, we want to keep the isolation test execution speed fast. We also want to keep the build nice and fast. So we want to be able to run slower non-isolation suites separately, and perhaps less frequently. So if the slow tests run slowly enough, we may not make them part of each CI build, but instead run them every few hours, or overnight, in a separate CI target.
- Each test method involves only one cycle of Arrange/Act/Assert (setup and instantiation, getting to the testable state, and verifying that state).
- Each isolation test method isolates a thin slice of system behavior. One industry term for this (proposed by Industrial Logic) is “micro-tests.”
- Average length of test methods is under 20 lines, ideally fewer than 10 lines.
- Test methods and TestCase classes are written and organized in terms of system behavior, not system structure. Related to this: all the test methods in a TestCase use the code in the setUp() method in that class, with as little additional test-specific setup as possible. All of the “Arrange” part of “Arrange/Act/Assert” really should be handled in the setUp() method, whenever possible.
- TestCases systematically cover unhappy paths: exception cases, edge cases and boundary conditions, etc. Mocks/fakes are used to simulate failure of external dependent resources.
- TestCase object trees make effective use of base TestCase classes, and make good use of reusable, private or protected helper methods (a sort of local testing DSL). Or, as Ryan points out in the comment below, the TestCases all use a separate object tree that holds a well-thought-out, rich little local testing DSL, completely decoupled from the test code. The more of that DSL pattern you need, as Ryan might say, the less you want to use inheritance, and the more you want to use composition.
- Test suites manage test data centrally (the repository of canonical test data might be a static class full of constants, or an in-memory database, or whatever). TestCases and test methods avoid primitive type literals wherever possible, and likewise avoid duplicate local variables and constants.
- Test suites, TestCase classes, and test methods contain as little duplicate code as possible. This includes small details like recurring complex assertion patterns that can be extracted, repeating the name of the TestCase in a test method name, etc.
- TestCase classes and Test methods have intention-revealing names, and use a consistent naming convention.
- Test suites are designed to be as resistant as possible to production code design changes. They are robust, not brittle.
- Test suites test the hard and harder things: XML configuration files, servlets, Swing GUIs, JSP files, etc.
I’ve gathered up this first-draft list of characteristics from multiple sources — books, others’ experience, and my own experience. I’m sure I’m missing a few things in there — I’ll add and prune according to my future thinking and your comments.
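A few of the characteristics in the list above are easier to see in code than in prose. Here is a minimal sketch, under loud assumptions: `RateSource`, `PriceConverter`, `FakeRateSource`, and `TestData` are names I made up for illustration, and the JUnit TestCase base class and runner are left out so the sketch needs only the JDK (`runAll()` is a crude stand-in for the runner). It shows one Arrange/Act/Assert cycle per test method, the shared "Arrange" living in setUp(), centrally managed test data, and a fake that simulates the failure of an external resource.

```java
import java.io.IOException;
import java.util.OptionalDouble;

// Hypothetical external dependency (imagine a remote rate service).
interface RateSource {
    double rateFor(String currency) throws IOException;
}

// Hypothetical production class under test.
class PriceConverter {
    private final RateSource rates;

    PriceConverter(RateSource rates) {
        this.rates = rates;
    }

    // Returns the converted amount, or empty when the rate source is unavailable.
    OptionalDouble convert(double amount, String currency) {
        try {
            return OptionalDouble.of(amount * rates.rateFor(currency));
        } catch (IOException e) {
            return OptionalDouble.empty();
        }
    }
}

// Canonical test data kept in one place, so test methods avoid raw literals.
final class TestData {
    static final String EURO = "EUR";
    static final double EURO_RATE = 2.0;
    static final double TEN_DOLLARS = 10.0;

    private TestData() {
    }
}

// Fake that can be switched into a failure mode to cover the unhappy path.
class FakeRateSource implements RateSource {
    boolean failing = false;

    public double rateFor(String currency) throws IOException {
        if (failing) {
            throw new IOException("simulated outage");
        }
        return TestData.EURO_RATE;
    }
}

class PriceConverterTest {
    private FakeRateSource rateSource;
    private PriceConverter converter;

    // Shared "Arrange" for every test method in this TestCase.
    void setUp() {
        rateSource = new FakeRateSource();
        converter = new PriceConverter(rateSource);
    }

    void testConvertsAmountUsingCurrentRate() {
        OptionalDouble result = converter.convert(TestData.TEN_DOLLARS, TestData.EURO);
        check(result.isPresent(), "expected a converted amount");
        check(result.getAsDouble() == TestData.TEN_DOLLARS * TestData.EURO_RATE, "wrong amount");
    }

    void testReportsNoResultWhenRateSourceIsDown() {
        rateSource.failing = true; // the only test-specific setup
        OptionalDouble result = converter.convert(TestData.TEN_DOLLARS, TestData.EURO);
        check(!result.isPresent(), "expected no result during an outage");
    }

    // Tiny local assertion helper, standing in for the xUnit assert methods.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    // Stand-in for the xUnit runner: fresh setUp() before each test method.
    static int runAll() {
        PriceConverterTest test = new PriceConverterTest();
        test.setUp();
        test.testConvertsAmountUsingCurrentRate();
        test.setUp();
        test.testReportsNoResultWhenRateSourceIsDown();
        return 2;
    }

    public static void main(String[] args) {
        System.out.println(runAll() + " tests passed");
    }
}
```

In a real suite the two test methods would extend the xUnit TestCase and the runner would call setUp() for you; the shape of the test methods is the part I care about here.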
Paint the Fence; Sand the Floor
Before people can talk to me with authority about the value of TDD, they need to talk with authority about the value of a great xUnit test suite. And before they can do that, they need to have (as my late mother would have said) suffered enough. They need to have suffered at the hands of codebases without great xUnit suites. They also need to have had their bacon saved by great xUnit suites.
So before we get to the TDD conversation, I increasingly want to encourage programmers new to xUnit testing practices to shoot for an xUnit test suite with the above characteristics. I don’t especially care, at first, how or why they paint the fence (from The Karate Kid), as long as they do it. I would in fact prefer that life and code provide them with the painful, indelible lessons that go with good and bad xUnit test suites.
THEN, once they have felt how hard it is to get that great xUnit suite when they have to stop, go back, and retrofit tests to existing code, and once they have felt how hard it is to debug an “Eager Test” (from Gerard Meszaros’s great book on refactoring xUnit tests), THEN we can talk about how, hey, you know, if that great xUnit test suite is your goal, then my experience has been that TDD gets me there better and faster.
Now we are painting the fence in a specific way.
But along the way, it’s all good.
The “E” Word; Part Two
The TicTacPente Eclipse Project
In my first post on this topic, I set the stage. I had a need for two implementations of the same problem domain: one ugly, one not. As promised, by the way, you can anonymously download the entire codebase discussed in this series of blog posts from a google code project here. It’s an Eclipse project, all zipped up.
The project includes a GameGUI.java applet that you can run, to play the game (right click on source/ view.applet.GameGUI.java, and pull down “Run As > Java Applet”). There is a first-draft README file that describes the whole shebang, and suggests some exercises to try. See what you think of it all.
LegacyGame.java
The legacy version of the TicTacToe game is in legacy/ legacyGame.LegacyGame.java. Now, take into consideration that this version reflects lots of little refactorings on my part, dating back to when I had characterization tests for this “class” (I’ve since removed all of those tests — I didn’t want students and job candidates subjected to these exercises to benefit from them). I renamed a lot of methods that started out with names like “c24occx()”, assigning placeholder-quality names that I thought my characterization tests were revealing to me, like tryToFindPositionGivingSeriesOf4OnTwoOrMoreAxes(). In some cases my educated guesses were accurate, and in some other cases, I later learned that I was far off.
I extracted a few small methods from other, larger, stranger ones, naming them as meaningfully as I could at the time. I extracted lots of constants. I renamed variables. I killed a lot of dead code and inscrutable comments. I managed to extract the Java applet code (woven into the gameplay code’s DNA) into its own class. I just couldn’t stand not doing that. (Clue: what do you notice about that applet code?)
But eventually, I just gave up working with it. After person-days of jUnit poking and prodding, this codebase remained quite opaque to me. I’ve inferred a lot of its algorithmic meat from its external gameplayer behavior. But I’m still baffled by much of it.
So this is our first measure of inextensibility in a codebase we discover: what Uncle Bob Martin calls opacity. One of the characteristics I wrote about here. As we glance through it, as we write tests for it, we struggle to understand it.
But it seems that every month or so these days, there are new tools to help us grasp what we are up against. I ran Crap4J against LegacyGame.java, and of course it pegs the tool’s little meter at the far right, at 36.84, as if pressed forcefully against that right-hand fence, searching for a measure of even more non-test-protected cyclomatic complexity. Average Crap4J score (blue triangle) is just under 5, BTW. As you can see, that little yellow triangle is trying to leave the ballpark:
So, I did determine how the legacy game manages its board state and game state, and got lots of peeks into how it determines which move to make next. Enough so that I was able to run the game from a test harness, one move at a time. This is the TestCase that pits the two games, old and new, against each other, some number of times. Currently that number is 200. You can find this code in manualTests/ manualTests.OldGameAgainstNewGameTests.java.
The source folder and package names contain the word “manual” because at first, I was printing out a representation of the board after each move taken by each game. I was examining System.out.println() output manually, to learn.
It took a bit for it to dawn on me: I was doing exploratory testing.
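For readers who want a feel for that kind of head-to-head harness before downloading anything, here is a rough sketch of the idea. It is not the code in OldGameAgainstNewGameTests.java: the `Player` interface, `HeadToHeadHarness`, and the two stand-in players are all hypothetical names of mine, and the stand-in "strategies" are deliberately dumb. The harness alternates moves on a 10 x 10 board, stops when someone gets 5 in a row on any axis, and tallies wins and draws over a series of games.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for a game implementation: look at the board, pick an empty square.
interface Player {
    int chooseMove(char[] board);
}

class HeadToHeadHarness {
    static final int SIDE = 10;
    static final int TO_WIN = 5;
    static final char EMPTY = '.';

    // Plays one game with x moving first; returns 'X', 'O', or EMPTY for a draw.
    static char playOneGame(Player x, Player o) {
        char[] board = new char[SIDE * SIDE];
        Arrays.fill(board, EMPTY);
        for (int turn = 0; turn < board.length; turn++) {
            char mark = (turn % 2 == 0) ? 'X' : 'O';
            Player mover = (mark == 'X') ? x : o;
            int square = mover.chooseMove(board.clone()); // players only see a copy
            if (square < 0 || square >= board.length || board[square] != EMPTY) {
                throw new IllegalStateException(mark + " made an illegal move: " + square);
            }
            board[square] = mark;
            if (completesRun(board, square, mark)) {
                return mark;
            }
        }
        return EMPTY;
    }

    // True if the mark just placed at 'square' sits in a run of TO_WIN or more on any axis.
    static boolean completesRun(char[] board, int square, char mark) {
        int row = square / SIDE;
        int col = square % SIDE;
        int[][] axes = {{0, 1}, {1, 0}, {1, 1}, {1, -1}};
        for (int[] axis : axes) {
            int run = 1
                    + countFrom(board, row, col, axis[0], axis[1], mark)
                    + countFrom(board, row, col, -axis[0], -axis[1], mark);
            if (run >= TO_WIN) {
                return true;
            }
        }
        return false;
    }

    // Counts consecutive matching marks stepping away from (row, col) in one direction.
    static int countFrom(char[] board, int row, int col, int dRow, int dCol, char mark) {
        int count = 0;
        int r = row + dRow;
        int c = col + dCol;
        while (r >= 0 && r < SIDE && c >= 0 && c < SIDE && board[r * SIDE + c] == mark) {
            count++;
            r += dRow;
            c += dCol;
        }
        return count;
    }

    // Returns a tally of {wins for x, wins for o, draws} over the given number of games.
    static int[] playSeries(Player x, Player o, int games) {
        int[] tally = new int[3];
        for (int game = 0; game < games; game++) {
            char winner = playOneGame(x, o);
            tally[winner == 'X' ? 0 : winner == 'O' ? 1 : 2]++;
        }
        return tally;
    }
}

class StandInPlayers {
    // Always takes the first empty square, scanning row by row.
    static Player firstEmptySquare() {
        return board -> {
            for (int i = 0; i < board.length; i++) {
                if (board[i] == HeadToHeadHarness.EMPTY) {
                    return i;
                }
            }
            return -1;
        };
    }

    // Takes a random empty square; seeded so that a series is repeatable.
    static Player randomSquare(long seed) {
        Random random = new Random(seed);
        return board -> {
            List<Integer> empties = new ArrayList<>();
            for (int i = 0; i < board.length; i++) {
                if (board[i] == HeadToHeadHarness.EMPTY) {
                    empties.add(i);
                }
            }
            return empties.get(random.nextInt(empties.size()));
        };
    }
}
```

A call like `HeadToHeadHarness.playSeries(StandInPlayers.randomSquare(42L), StandInPlayers.firstEmptySquare(), 200)` gives you the win/loss/draw tally for a 200-game series. Printing the board after each move inside `playOneGame` is the "manual" exploratory part.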
Old Game vs New Game: Exploratory Testing
So I started with lots of high hopes, deep fears, and ignorance about my prospects of test-driving a decent version of this problem domain. My goal was for my game, if it took the first move, to beat the old game or play it to a draw most of the time. (As it turns out, I did much better than that. After my second run at this code, I ended up with a game that beats the old LegacyGame about 50% of the time, and plays it to a draw about 40% of the time. When I go first, the old game wins no more than 7% of the time.)
In my first test-driven version, my first few defensive algorithms were, in addition to being completely ineffectual against the old game strategically and tactically, pretty badly conceived. My object model was in parts over-engineered, and in other parts procedural, sloppy, and under-engineered. I paired with my good friend Dave LeBlanc on it for an hour, and he made several forthright observations about what I had done well and what I had done poorly. My design had some real flaws. I had pretty good test coverage, but nothing like what I wanted. For the next few days I pushed this first codebase version as far as I could, and got it to the point where it edged out the old game if it went first, on average. It performed OK.
But I was deeply disappointed at the results. I knew I had to rewrite it. I can get an A+ in any course I’ve already taken, if I take it again enough times. Dave had encouraged me with suggested new design approaches. I wiped the slate clean. I started over with an empty Game class, and a much better sense of which strategic and tactical behaviors I wanted to test drive in what order.
That’s when I started turning my attention more rigorously to the move-by-move board printouts I was logging in my manual game-against-game test harness. I started combing through each loss I suffered as I test-drove my second version, looking at the strategic setup patterns, while I looked for a cleaner way to represent the basic defensive and offensive patterns. I watched carefully as the old game, ugly or not, set itself up cleverly to defeat me a couple of moves into a new game.
And as all kinds of interesting patterns emerged from this manual exploratory testing, I began to understand the problem domain much more deeply. And this, of course, made Simple Design easier, and refactoring easier, and test-driving easier. I noted specific patterns, wrapped test data and failing tests around them, and produced new behaviors of my own that played the game better.
Then suddenly one day, after I had finished one particular bit of strategy involving collecting all possible moves, ranked by tactical priority, and looking for any of the highest priority moves that also matched lower-priority moves (blocking the other player’s new series while simultaneously extending a series of our own, for example), I saw a huge new jump in my game’s performance. I added another bit of logic around responding to the other player’s first move, then another around making the first move on one of the center-most 4 squares on the board. With each of these well-thought-out bits of new behavior, I saw big jumps in my game’s performance against the old one. Meanwhile, there was not that much total strategic and tactical logic, and I was simplifying and consolidating as I went. I had a reasonably clean, reasonably well test-protected codebase that was kicking the other game’s keister. It was rewarding.
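The "highest-priority move that also matches lower-priority moves" idea can be sketched in a few lines. This is a simplified illustration, not the logic in the downloadable project: `MoveRanker` and `chooseMove` are hypothetical names, squares are plain board indexes, and I assume each tactic has already produced its list of candidate squares, ordered from most urgent tactic to least.

```java
import java.util.List;

class MoveRanker {
    static final int NO_MOVE = -1;

    // candidatesByPriority.get(0) holds the squares wanted by the most urgent tactic,
    // get(1) the next most urgent, and so on. Among the squares of the most urgent
    // non-empty tier, prefer the one that also appears in the most lower tiers.
    static int chooseMove(List<List<Integer>> candidatesByPriority) {
        for (int tier = 0; tier < candidatesByPriority.size(); tier++) {
            List<Integer> urgent = candidatesByPriority.get(tier);
            if (urgent.isEmpty()) {
                continue;
            }
            int best = urgent.get(0);
            int bestOverlap = -1;
            for (int square : urgent) {
                int overlap = 0;
                for (int lower = tier + 1; lower < candidatesByPriority.size(); lower++) {
                    if (candidatesByPriority.get(lower).contains(square)) {
                        overlap++;
                    }
                }
                if (overlap > bestOverlap) {
                    best = square;
                    bestOverlap = overlap;
                }
            }
            return best;
        }
        return NO_MOVE;
    }

    public static void main(String[] args) {
        // Tier 0: squares that block the opponent; tier 1: squares that extend our own series.
        List<List<Integer>> candidates = List.of(List.of(34, 38), List.of(38, 52));
        System.out.println(chooseMove(candidates));
    }
}
```

In the `main` example, squares 34 and 38 both block the other player, but only 38 also extends a series of our own, so 38 is the move chosen.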
In my next post on this thread, I’ll dive a bit into this fun little problem domain: 10 x 10 TicTacToe where first player to 5 wins. I’ll talk also about how close this is to a game called Pente, which leads to at least one interesting “requirements change” we can use to measure the relative extensibility of the legacy game vs. the new game. You can read about it yourself, if you like, in the README file.
In the meantime, please feel free to download, unzip, import, and play around with the codebase. I make no apologies for the ugly graphics, BTW. If you can make me better ones, please feel free. I am a PhotoShop dunce.
Oh, one last note. I have challenged my good friend Dima, who is an algorithmic genius, to test-drive an entirely new version of the game, pitting it against the old game using my test harness design. He has accepted, so we shall all see some beautiful code indeed. And I challenge you too, reader, if you like these kinds of challenges, to test-drive your own version of the game in the same way. How much better a design than mine can you produce? How much better test-protected can it be? How much better can you make your gameplay strategy and algorithms? Along how many more “lines of extensibility” can your version of the game be open to extension? My version has flaws, some fairly obvious, some subtler. What are they?
No, this is the last note: Two other good friends of mine, Ryan Cooper and Dave LeBlanc (mentioned before) have also accepted the challenge. Ryan, in fact, improved my implementation by extracting the strategic logic in such a way that it is easier for him, you, or anyone to create new game strategy by implementing Ryan’s new IStrategy interface. Yay Ryan. All that is reflected in the version for download above.
Till next time, patient reader.
Teaching Programmers about the “E” Word
A Tale of Two Codebases
So, pardon the long interlude, readers. I have been up to my eyeballs in alligators. But all my parts are still connected, and I return from the software jungles with something I find interesting, and I hope you do too.
I have been working for months on a “toy” codebase to use for three main purposes: evaluating the technical skills of programming candidates at Pillar Technology, making baseline assessments about the technical skills of new hires and client programmers, and conducting classes on agile/OO programming practices. In this and coming blogs, I am going to share with you the codebase itself (Java/Eclipse project), and my experiences and insights while developing and using it. I am also going to solicit your input on how to improve it as a teaching/mentoring tool, and as a set of exercises for evaluating programmers.
With a bit of luck, I’ll be conducting a hands-on session using this codebase at Agile 2008.
But first, two things: the Problem Domain, the Technical Practice Scope, and the “E” Word. OK, so that’s three things. I really need coffee this morning.
The Problem Domain
The codebase is two completely separate implementations (Legacy and “non-Legacy” implementations) of a 10 x 10 TicTacToe game where the first player to 5 in a row in any direction wins. You play against the computer, and it typically kicks your patootie, whichever version you are playing against. (Well, it kicks mine.)
So I happened upon the Legacy version of this codebase more than a year ago, when looking to design this pedagogical tool, and determined it to be perfect, in a kind of sick, pathological way. Let me explain. The original codebase is a Java applet. It is, including all the applet code, a single, 1200+ line “class” with dozens of methods that looks like this:
    public int someWierdMethodName(int playerMark, int x, int type) {
        int j, k, l;
        int position = 0, position2 = 0;

        for (l = 0; l < 6; l++) {
            for (j = 0; j < SQUARES_PER_SIDE; j++) /* horiz & vert */ {
                resetAllMarksAlongAxesForFirstHalfOfBoard();

                position = checkFor5AlongHorizAxis(playerMark, x, j, l, position);

                if (marksByAxisByPlayerForChecking[0] == 3 && marksByAxisByPlayerForChecking[1] == 2) {
                    if (type == SETFLAGS_MODE) {
                        tempTableForChecks[tempRowForChecks[0]] = OCCUPIED;
                        tempTableForChecks[tempRowForChecks[1]] = OCCUPIED;
                    }
                    if (type == CLEAN_MODE)
                        return tempRowForChecks[0];
                }
                if (marksByAxisByPlayerForChecking[0] == 4 && marksByAxisByPlayerForChecking[1] == 1 && type == CHECK_MODE)
                    return position;

                position = checkFor5AlongVertAxis(playerMark, x, j, l, position);

                if (marksByAxisByPlayerForChecking[2] == 3 && marksByAxisByPlayerForChecking[3] == 2) {
                    if (type == SETFLAGS_MODE) {
                        tempTableForChecks[tempRowForChecks[0]] = OCCUPIED;
                        tempTableForChecks[tempRowForChecks[1]] = OCCUPIED;
                    }
                    if (type == CLEAN_MODE)
                        return tempRowForChecks[0];
                }
                if (marksByAxisByPlayerForChecking[2] == 4 && marksByAxisByPlayerForChecking[3] == 1 && type == CHECK_MODE)
                    return position;
            }

            for (j = 0; j < 6; j++) {
                resetAllMarksAlongAxesForFirstHalfOfBoard();

                for (k = 0; k < 5; k++) {
                    position = checkFor5AlongDiagDownRightAxis(playerMark, x, j, k, l, position);
                    position2 = checkFor5AlongDiagUpRightAxis(playerMark, x, j, k, l, position2);
                }

                if (marksByAxisByPlayerForChecking[0] == 3 && marksByAxisByPlayerForChecking[1] == 2) {
                    if (type == SETFLAGS_MODE) {
                        tempTableForChecks[tempRowForChecks[0]] = OCCUPIED;
                        tempTableForChecks[tempRowForChecks[1]] = OCCUPIED;
                    }
                    if (type == CLEAN_MODE)
                        return tempRowForChecks[0];
                }
                if (marksByAxisByPlayerForChecking[0] == 4 && marksByAxisByPlayerForChecking[1] == 1 && type == CHECK_MODE)
                    return position;

                if (marksByAxisByPlayerForChecking[2] == 3 && marksByAxisByPlayerForChecking[3] == 2) {
                    if (type == SETFLAGS_MODE) {
                        tempTableForChecks[tempRowForChecks[0]] = OCCUPIED;
                        tempTableForChecks[tempRowForChecks[1]] = OCCUPIED;
                    }
                    if (type == CLEAN_MODE)
                        return tempRowForChecks[0];
                }
                if (marksByAxisByPlayerForChecking[2] == 4 && marksByAxisByPlayerForChecking[3] == 1 && type == CHECK_MODE)
                    return position2;
            }
        }
        return (NONE);
    }
Actually, that snippet includes method extractions and renames that I did. It was much worse before I got a hold of it and started retrofitting Mike Feathers-style “characterization tests” and doing bits of opportunistic refactoring here and there.
So the whole thing has a fabulously high cyclomatic complexity. In other words, though this little TicTacToe applet is quite clever algorithmically at kicking your keister, it is supremely inextensible code, along at least several axes of extension. That was exactly what I needed. BWAHAHAHAHA.
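If you have not met characterization tests before, here is the flavor of them in a minimal sketch. Everything in it is hypothetical: `legacyScore` is a made-up stand-in for an opaque legacy method (it is not from LegacyGame.java), and the plain `check` helper stands in for jUnit assertions so that the sketch runs with only the JDK. The point is the workflow: you do not assert what the code should do, you run it, observe what it actually does, and pin that behavior down so refactoring cannot change it silently.

```java
// Hypothetical stand-in for an opaque legacy method whose intent we do not yet understand.
class OpaqueLegacyCode {
    static int legacyScore(int a, int b) {
        int r = 0;
        for (int i = 0; i < a; i++) {
            if ((i + b) % 3 == 0) r += 2; else r--;
        }
        return r < 0 ? -1 : r;
    }
}

class LegacyScoreCharacterization {
    // Each expected value below was "discovered" by running the code, not derived from a spec.
    static void testNoIterationsYieldsZero() {
        check(OpaqueLegacyCode.legacyScore(0, 0) == 0, "legacyScore(0, 0)");
    }

    static void testFourIterationsFromZeroYieldsTwo() {
        check(OpaqueLegacyCode.legacyScore(4, 0) == 2, "legacyScore(4, 0)");
    }

    static void testNegativeRunningTotalCollapsesToMinusOne() {
        check(OpaqueLegacyCode.legacyScore(2, 1) == -1, "legacyScore(2, 1)");
    }

    static void check(boolean condition, String description) {
        if (!condition) {
            throw new AssertionError("behavior changed for " + description);
        }
    }

    // Crude stand-in for the jUnit runner.
    static int runAll() {
        testNoIterationsYieldsZero();
        testFourIterationsFromZeroYieldsTwo();
        testNegativeRunningTotalCollapsesToMinusOne();
        return 3;
    }

    public static void main(String[] args) {
        System.out.println(runAll() + " characterization tests passed");
    }
}
```

Notice that the test names describe observed behavior ("collapses to minus one"), which is often the first readable documentation such code ever gets.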
Technical Practice Scope: the “E” Word
For clarity, the only technical practices I intended (and currently intend) to try to teach using this codebase all center around what I call the “E” word: extensibility. So they primarily include: xUnit testing practices, OO practices, Refactoring, Simple Design (or “Travel Light”), and TDD (test-driving).
The entire point of the codebase and its various uses is to show the difference between extensible and inextensible code, and to measure and teach the practices that are most central to the extensibility of a codebase. The point is actually to give people an experiential sense of the sum effect of NOT using the best practices listed above, on the one hand, and of using them, on the other. It’s a sharp-edged little A/B comparison exercise.
What I Did, Crazy Man that I Am
I began with this Legacy game code, and with the tentative steps of poking and prodding it with tests (more on all of that later). I played against it (and lost!) a lot. I began to learn the algorithmic problem domain (entirely despite the maddeningly bad design).
I then constructed a jUnit test harness that would enable me to play a new, test-driven version of the game against the old one, and measure how much of the time I won. And I began to test-drive my first version of this game.
I ran my head-to-head test a lot. It was depressing. Despite months of research and learning, I was still getting my rear kicked 98% of the time, or so.
And that is roughly where we shall pick up in the next blog post, when I’ll share with you a zip file of the entire Eclipse project, including the Legacy and Less-Legacy versions of the code.
Until then, fair readers.
Dynamic Languages, Blimps, TDD, Alpha Geeks, and “Compiler as Nanny”
Nannies for Blimps!
In the old days, the compiler was your nanny, because computer resources were expensive and delicate and huge, like a giant hydrogen blimp. You want to fly the blimp, baby, you better be really good at flight plans.
So static-typing was one of several ways to prevent us from blowing up the blimp at runtime. (This metaphor may not work for too many more paragraphs, but I am flying with it for now anyway.) Static-typing is really a sort of BDUF (Big Design Up Front) enforced at the language level. It imposes design straitjackets that only become plain once a dynamic language has removed them from you. (Wow! You can do THAT in this language? Really? Look at how much less code that is.)
The Nanny is Not Groovy, Man
Extending designs in Java is genuinely hindered by static typing. It’s no longer a political issue, it’s just plain fact. And I don’t just mean because it takes 400 characters to print “Hello World.” No, I mean the kinds of shenanigans imposed on every type you inherit or use or create. Think: generics, and how long it took for Java to get them, and what a pain in the arse they are. The Nanny really is everywhere in Java, to my mind.
Bruce Tate, Stuart Halloway, Justin Gehtland, and others have made passionate, convincing arguments that (for example) convention-based Rails programming as a means of getting an enterprise web app “off the ground” (so to speak) may be faster than the lightest-weight J2EE frameworks by a factor of 10. Maybe more than one factor of 10. Sheesh, that Nanny is EXPENSIVE! How would your stakeholders like it if you could produce 10 times as many web applications to solve enterprise problems as your teams can today?
The catch, of course, is Ruby and its still relative dearth of supporting libraries, frameworks, and similar open-source support. And Ruby syntax looks wonky to us old Algol-based fuddy-duddies. This is why I am personally so attracted to Groovy and Grails. (Introduced to me by Andrew Glover of Stelligent and Chris Judd at CodeMash in Ohio last month.) All the convention-based goodness, plus leverage of my Spring and Hibernate experience, and I can still use the Java stuff that the mean Nanny pounded into my head over the years. (Just the good stuff. She’s not a completely mean, insane, dictator nanny, I can now see in retrospect.)
Look Nanny, No Unit Tests! (Uh Oh.)
So off we go to a dynamic language, full of our static-typing outrage. The catch, of course, is this: because you can send a “divideYourselfByZero” message or any odd message you like to the integer 42 in languages like Smalltalk, you can get runtime errors of the sort that would curl a Java programmer’s hair.
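Java programmers can get a small taste of this without leaving Java. The sketch below uses reflection to "send a message" by name, which is only a loose analogy for what Smalltalk or Ruby do natively; `MessageSendDemo` and `send` are hypothetical names of mine. The compiler has no opinion about the string you pass in, so a bad message name shows up only at runtime.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

class MessageSendDemo {
    // Sends a no-argument "message" by name and reports what happened.
    // Nothing here is checked at compile time; a typo surfaces only when this runs.
    static String send(Object receiver, String message) {
        try {
            Method method = receiver.getClass().getMethod(message);
            return String.valueOf(method.invoke(receiver));
        } catch (NoSuchMethodException e) {
            return "does not understand: " + message;
        } catch (IllegalAccessException | InvocationTargetException e) {
            return "failed: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(send(42, "toString"));             // prints 42
        System.out.println(send(42, "divideYourselfByZero")); // prints does not understand: divideYourselfByZero
    }
}
```

Both calls compile happily. Only a test (or a production user) ever finds out that the integer 42 has no idea how to divide itself by zero.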
Does it suddenly make a bit of sense why the first member of the xUnit family was, in fact, sUnit? My question for the Smalltalk crew is this: before you had sUnit, how the heck did any of you keep your jobs? The production deployment problems must have been Hindenburg-spectacular. (I’m partly kidding. Actually, it turns out, the really good Smalltalkers had other tricks up their sleeves to avoid runtime disaster.)
So the bottom line is this: with dynamic languages, You Have No Choice but to develop exhaustive, requisite suites of true unit-level isolation tests. Oh, plus end-to-end tests, and several other automated test varieties. This is the equivalent of venting out all the Hydrogen and replacing it with … Helium! Yay! Doesn’t explode, a bit more expensive, for sure, and not quite as “lifty,” but way safer, and you can still fly.
With dynamic languages, you have the authority and responsibility to be an adult, not a child. No more Nanny, so no more stepping out into the street before you look both ways.
Note to Recruiters: Hire the Guys Who Know Dynamic Languages, No Matter What They Charge
Power Programmers, Alpha Geeks, the ones who some now claim in public (reasonably, I believe) can outproduce “vocational programmers” by a factor of 10 or more (there is that same math, hmm)? It turns out that one of the primary indicators of one of those guys or gals is that they just cannot keep their hands off of lots of different languages, and operating systems, and computers, and you name it. Some of them actually play banjo! They are really good at comparing entire language systems and development systems to one another. They really like dynamic languages, because they are so much faster and cleaner and better. And they really, really like unit tests, because suites of them save their behinds so frequently.
Oh, BTW, you know what true alpha geeks are all on about these days? A good old idea come full circle: functional programming. Oh, and also, BDD. Whee, here we go!
Alpha Geek example: my alpha geek pal Dimitri says “dynamic languages are so much more expressive it’s not even funny.” He says that when he noticed that Ruby does not require generics, “I almost started crying.” He says “I think the whole thing can be summarized as: static typing breeds incidental complexity”. Great quote. And he correctly points out that in the emerging world of Domain-Specific Languages (DSLs), dynamic languages are absolutely vital.
So save the really big programmer salaries for the ones who (A) know unit testing backward and forward, including TDD, (B) know multiple languages, and program avocationally and recreationally, and (C) can spout endlessly about the benefits of once-seeming exotica like dynamic languages and functional programming. That, at least, would be my agile alpha geek definition. There are other kinds of alpha geeks, certainly. In my big enterprise app world, I need the agile ones.
Hire guys like Dimitri. Make sure your team has a ratio of at least 1 alpha geek to every 3 or 4 non alpha geeks. And be the kind of boss and organization that alpha geeks love to work for and with (yet another blog topic, for another time).
And be the kind of boss who enables non-alpha geeks to find their way to alpha, if they want it. Again, another blog for another time.


