Simple “Clean Code” Metrics for C-Level Execs
A recent Twitter thread I was involved in went something like this. Someone claimed that software managers and executives should not have to care whether their developers are test-driving Clean Code. They should be able to presume that their developers are always courageous, disciplined, and skillful enough to produce that level of quality. Ultimately that quality turns into the lowest Total Cost of Ownership (TCO) for any codebase asset.
Sigh. Well, would that that were true. To my mind, it’s like saying that all consumers should presume that all cars are built as well as Hondas. Sadly, they are not.
So, yes, of course, developers should take full ownership of the extent to which they create Clean Code. Developers should own their own levels of skill, discipline, knowledge, passion, and courage. Absolutely so. Developers should refuse to cave to pressure to hack junk out the door to meet deadlines. That’s not what I am debating. I am debating whether or not the industry has accountability systems and quality monitoring systems to ensure that developers are in fact doing all of that. My premise is that something like the opposite of that is going on.
If managers and developers are still being rewarded for hacking junk out the door, and executives and managers cannot and do not measure the TCO consequences, well then no wonder our culture of Software Craftsmanship is not spreading. We have a crappy incentive structure.
Are Hondas still better made than GM cars, all these years later, and despite quality gains on both sides? Of course. The car industry does, in fact, have accountability systems in place to measure asset quality, duty cycles, and TCO. Too much money is at stake.
As a responsible car buyer, I inform myself with exactly the least info necessary and sufficient to determine whether I am about to buy a great car or a lemon. I have data to help me predict the outcome.
Managers and executives in software cannot expect that every codebase is as good as a Chevy, nor even a Yugo. Most enterprise software these days, 10+ years into an agile movement and Software Craftsmanship movement, is still gunk that continuously deteriorates.
And managers and executives cannot see that. They are buying lemon after lemon, not knowing any better.
We want managers of developers to insist on Clean Code, so we want them to be able to tell the difference between Clean and Mud, and to hire programmers who code Clean. And we want executives to hire managers like that. These inevitably will be executives who can distinguish between managers who know Clean Code and those who do not. I posit that these executives will in turn need to know how, at a very basic level, to distinguish between Clean and Mud. Only then can they preserve their asset value, and hire delegates who can.
Two Metrics You Can Use to Measure Code Asset Deterioration
At my current client, each team self-governs several kinds of objective and subjective Clean Code quality measures, including test coverage, cyclomatic complexity per module, average module size, coupling, etc. There are all kinds of details here around issues like automated deployment, test quality and semantics, etc. They don’t publish it all; they use most of it tactically, within the team boundaries, to hold themselves and each other accountable for continuous code improvement. The teams can and should own that stuff, and they do.
But you know what? Each of these teams is also publishing at least two metrics to other teams and straight up the management chain for their codebase assets: test coverage and cyclomatic complexity per method. The Continuous Integration plugins publish these metrics for all to see. And all any team is held accountable for is this: do not let these numbers slip between iterations. Anyone can see historical trend graphs for these numbers for any of the projects/codebases currently covered (there are several so far, and more each month).
Yes, these two measures are imperfect and can be gamed. Yes, test coverage is a one-way metric. But let’s presume for a moment that we are not hacking the coverage config to exclude huge whacks of yucky code, and we have good-faith participation on developers’ part. If average complexity per method goes from 4 to 6 over a two-week iteration, and if test coverage slips from 80% to 60%, does that not often mean that the codebase, as an asset, probably deteriorated? My experience has been that it does. As an owner of such an asset for which numbers had slipped like that, would you not care, and would you not want some answers? I would, and I counsel others to care and dig in. I hereby counsel you, if you own such assets, to care if those two numbers are slipping from week to week. If they are, I bet you dollars to donuts your software asset is getting toxic.
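As a rough sketch of the "do not let these numbers slip between iterations" rule, here is how a check like that might look in Python. Everything here is an assumption for illustration: the `IterationMetrics` type, the `find_slips` function, and the iteration labels are hypothetical, and the numbers are just the 80% to 60% and 4 to 6 example from above. It is not the client's actual Continuous Integration plugin setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IterationMetrics:
    """The two published numbers for one codebase at the end of one iteration."""
    label: str                        # hypothetical iteration name
    test_coverage_pct: float          # 80.0 means 80% test coverage
    avg_complexity_per_method: float  # average cyclomatic complexity per method


def find_slips(previous: IterationMetrics, current: IterationMetrics) -> list[str]:
    """Return one readable line for each of the two metrics that got worse."""
    slips = []
    # Test coverage is only allowed to hold steady or go up.
    if current.test_coverage_pct < previous.test_coverage_pct:
        slips.append(
            f"test coverage fell from {previous.test_coverage_pct:g}% "
            f"to {current.test_coverage_pct:g}%"
        )
    # Average complexity per method is only allowed to hold steady or go down.
    if current.avg_complexity_per_method > previous.avg_complexity_per_method:
        slips.append(
            f"average complexity per method rose from "
            f"{previous.avg_complexity_per_method:g} "
            f"to {current.avg_complexity_per_method:g}"
        )
    return slips


if __name__ == "__main__":
    # The example figures from the paragraph above, two weeks apart.
    before = IterationMetrics("iteration 12", 80.0, 4.0)
    after = IterationMetrics("iteration 13", 60.0, 6.0)
    for line in find_slips(before, after):
        print(f"{after.label}: {line}")
```

A check like this only flags the direction of change. It says nothing about why the numbers moved, which is exactly the conversation the owner of the asset should then go and have with the team.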
A Culture of Accountability
So at this client, if those two metrics slip, teams hold each other accountable, and execs are learning to hold dev team managers accountable. Why not? Every car buyer understands MPG these days. Why not educate executives a little bit about how to monitor their code asset health?
Could there be a better 2 or 3 metrics to publish upstairs? You guys are smart; you tell me. So far, these 2 are working pretty well for me. The published metrics are not sufficient to protect against asset deterioration, but so far they sure seem necessary.
So guess how this is turning out? We are growing a top-to-bottom, side-to-side culture of Clean Code accountability in what was once a garden-variety, badly-outsource-eviscerated, procedural-hacking sort of culture. Partly by hiring lots of Been-There-Done-That agile coders, and partly with these metrics. Suddenly, managers who were only measuring cost per coding-hour (and slashing FTE jobs to get to LOW $/hour numbers) are measuring more meaningful things. Could we do better? Doubtless. Stop by, see what we are doing, and help improve it.
What metrics would you publish up the reporting chain, and between teams? How would you help executives detect when their code assets are slipping into that horrible BigBallofMud state?
Speak up, all you shy people.
Here is an example of some measurements I gave my senior management that convinced them to fund cleaning up of old code.
Other metrics that I have used to show the cost of buggy code are:
1) Ask all developers to record roughly how much time they spend working on old bugs each week. Total up this data after a month to get an estimate of the fraction of developer wages being spent on old bugs.
2) a) Find out the cost to your company of bugs found at each stage of development.
b) Go look in your bug tracker and source code manager (I assume these are cross-referenced to each other) and pull a decent sample of bugs found at each stage of development (say 30 for each stage).
c) Find when each bug was introduced (see link above if you don’t know how to do this).
d) Using the costs from a), compare what each bug would have cost to fix at the stage where it was introduced with what it cost at the stage where it was actually fixed. This gives an estimate of the TCO of leaving bugs in code.
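The arithmetic behind both measurements can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the function names, the stage names, and every cost and hour figure below are made up, and you would substitute your own company's numbers from the time records and the bug tracker.

```python
def old_bug_wage_fraction(hours_on_old_bugs: float, total_hours_worked: float) -> float:
    """Metric 1: fraction of developer time (and so wages) spent on old bugs."""
    if total_hours_worked <= 0:
        raise ValueError("total_hours_worked must be positive")
    return hours_on_old_bugs / total_hours_worked


def late_fix_premium(bug_sample, cost_by_stage) -> float:
    """Metric 2: extra cost of fixing sampled bugs later than they were introduced.

    bug_sample is a list of (stage_introduced, stage_fixed) pairs, and
    cost_by_stage maps a stage name to the cost of a bug found at that stage.
    """
    total = 0.0
    for stage_introduced, stage_fixed in bug_sample:
        # Cost actually paid minus what it would have cost if caught at once.
        total += cost_by_stage[stage_fixed] - cost_by_stage[stage_introduced]
    return total


if __name__ == "__main__":
    # Hypothetical month: 120 of 800 recorded developer hours went to old bugs.
    print(old_bug_wage_fraction(120, 800))

    # Hypothetical cost per bug at each stage, in whatever currency you use.
    cost_by_stage = {"coding": 100, "testing": 500, "production": 5000}
    # Hypothetical sample; a real one would be around 30 bugs per stage.
    bug_sample = [
        ("coding", "testing"),
        ("coding", "production"),
        ("testing", "testing"),
    ]
    print(late_fix_premium(bug_sample, cost_by_stage))
```

Dividing the premium by the number of sampled bugs gives an average extra cost per bug, which is usually the easier figure to put in front of management.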
So, how do you keep these metrics and so forth from becoming another form of administrivia? I’ve been thinking about the same things lately and I keep coming back to creating more ‘wasteful’ work. What can we do to keep things light or make them automatic?
I think it’s a great leap forward to start thinking about metrics that are useful to the team vs. metrics that are useful to outsiders. I think a lot of people fear metrics because they’ll be used as a club by outsiders who should be interacting with the team instead of a dashboard, so differentiating the two is important.
You described your approach very well. Are your executives/managers actually going and looking at these trend charts that are made available? How did you encourage them to do so and did you have to do anything special to make them interesting/useful to the execs?