Metrics: the good, the bad and the ugly


For the last few weeks I’ve been involved with helping a start-up get moving. You may have noticed that I’ve been a little fixated on tools over the last few posts, and that’s why. I’m currently choosing tools and procedures, which is a nice change from having them foisted on me, and I want to be sure to avoid the mistakes I’ve seen in the past.

The reason tool choice is important is because your tools define the metrics you can collect. For this post I will focus only on the kind of metrics you might use when managing an Agile/Scrum team or teams, but the concepts are more generally applicable.

The internet abounds with stories of bad metric choices; setting pay rises based on the number of lines of code developers checked in is just one example. I’ll leave that kind of metric for a later post, if I ever address it at all.

But that example of a bad metric does serve to introduce some general principles for metrics.

They are:

  1. When a metric is gamed, it should still produce an outcome you desire.
  2. Metrics should be generated automatically using published methods.
  3. Metrics should be generated and published frequently, if not on demand.
  4. Metrics should be visible to everyone on the team or in the group.

I’ll discuss each of these principles shortly, but in general if a metric does not satisfy all four conditions, it either needs to be replaced or abandoned.

The first principle is probably the hardest to stick to. It’s hard because unless you’re a fool you’re hiring clever developers, ideally cleverer than you. They like solving complex puzzles and playing games, and any metric you create will be gamed. The lazy manager puts rules and policies in place to prevent the obvious gaming. This not only creates an us-and-them dynamic, it also makes the metric more complex, more challenging, and thus more interesting to game.

A better approach is to anticipate the gaming, even embrace it, though this is not always possible. For example, years ago, during a bug-fix crunch period before we’d adopted Agile, we had a product with a lot of bugs to fix to hit a release-readiness milestone. We experimented with some gamification (that’s a topic for another post), and the practical upshot was that we offered prizes for the longest streak of consecutive days on which a dev marked a bug as fixed. They only had to fix one bug in a day for it to count.

Now, the senior devs could probably fix more than one bug a day, and we were concerned that they’d fix bugs and then sit on them, doling them out one per day to maximise their streaks. But over six weeks we’d still get 30 bugs out of each of them, so why care?

The second principle is a little easier to stick to. It’s one of the reasons I’ve been harping on about ensuring that the tools you use can be queried and automated. If a metric is generated by someone counting the sticky notes on a wall, human error will always be a factor. Worse, having a person generate a metric allows its veracity to be doubted: you can argue with a person and question their ability, but you can’t argue with the results of a SQL query.
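To make that concrete, here’s a minimal sketch of a metric generated by a published query rather than a person. The `issues` table and its columns are hypothetical stand-ins for whatever schema your tracker actually exposes.

```python
# Sketch: an open-bug-count metric produced by a SQL query, not a head count.
# The schema here is illustrative, not from any particular tool.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issues (id INTEGER, kind TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO issues VALUES (?, ?, ?)",
    [(1, "bug", "open"), (2, "bug", "fixed"),
     (3, "feature", "open"), (4, "bug", "open")],
)

# The published method: anyone on the team can run this and get the same number.
open_bugs = conn.execute(
    "SELECT COUNT(*) FROM issues WHERE kind = 'bug' AND status = 'open'"
).fetchone()[0]
print(open_bugs)  # 2
```

Because the query itself is the published method, there’s nothing to argue about except whether the data in the tracker is right, which is exactly where you want the argument to be.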

The third principle is tied closely to the second. Only automated metrics can be produced on demand. If a lot of human processing is required, a metric might only be produced weekly, or even worse, monthly. Imagine how useless a sprint burn-down chart would be if it were only produced at the end of the sprint. That metric is only useful because the team reviews it every day; the moment someone is having trouble it’s obvious, and the team pitches in to help.
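An on-demand burndown is just a fold over completion records. The sketch below assumes a simple day-to-points-completed mapping; real tools would pull this from task transitions, and the field names here are purely illustrative.

```python
# Sketch: computing a sprint burndown on demand from completion records,
# instead of maintaining it by hand in a spreadsheet.
from datetime import date

sprint_total = 40  # story points committed at sprint start (example figure)
completions = {    # day -> points marked done that day (example data)
    date(2024, 3, 4): 5,
    date(2024, 3, 5): 8,
    date(2024, 3, 6): 0,
    date(2024, 3, 7): 7,
}

remaining = sprint_total
burndown = []
for day in sorted(completions):
    remaining -= completions[day]
    burndown.append((day, remaining))

for day, points in burndown:
    print(day, points)  # last line shows 20 points remaining
```

Since the chart is recomputed from source data every time, a flat day (like the 6th above) shows up immediately rather than at the end-of-sprint retrospective.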

The fourth principle is probably the most contentious. Metrics influence behavior most strongly when the team can access their metrics on demand and they know that you, and their peers can also access the metric. Think of all the sales teams that have leaderboards, or call centres that show how many calls operators answer. It’s the visibility of the metric that makes this work.

But this is a double-edged sword. If metrics focus down to individuals, you will most definitely get developers locally optimising their own numbers, even at the expense of their peers. Software development is a team activity, so in general you only want metrics measured for a whole team. Not wanting to let your peers down is probably one of the strongest motivators you have, and you can leverage it by measuring the whole team, not individuals.

What, I hear you ask, are some examples of good and bad metrics specific to sprint team management?

  1. Good: burndowns, as long as they’re generated by a system and not by a scrum master with a spreadsheet. Putting them up on a dashboard or portal is good, but having every team’s burndown compiled into a single graph-heavy email and sent out is the best use I’ve seen for one.
  2. Good: bug counts. Using the sprint starting point as a baseline is good if you need to focus on quality and your teams have a tendency to focus on features and leave the bugs for later. This kind of thinking is common in teams who’ve been doing waterfall for years.
  3. Good: automated test counts and coverage. Daily build results are a good source for this. As teams add features they should be adding tests and not introducing regressions. Even a simple percentage of code covered, as generated by automated tools, would work. Of course, beware of teams gaming the system with empty tests.
  4. Bad: story point consumption. This one is easy to get from most tools, and as a manager you might want to know what your teams are capable of consuming for planning purposes, but do you really want them cherry-picking the easy backlog items to boost their averages? It also encourages shipping features over quality.
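The empty-test gaming mentioned in item 3 is itself easy to automate a check for. This sketch uses Python’s stdlib `ast` module and a deliberately crude heuristic (a test function containing no `assert` statement is suspicious); real suites using richer assertion styles would need a smarter check.

```python
# Sketch: flagging test functions that contain no assertions, as a guard
# against padding a test-count metric with empty tests.
import ast

source = '''
def test_real():
    assert 1 + 1 == 2

def test_empty():
    pass
'''

tree = ast.parse(source)
suspicious = [
    node.name
    for node in ast.walk(tree)
    if isinstance(node, ast.FunctionDef)
    and node.name.startswith("test_")
    and not any(isinstance(n, ast.Assert) for n in ast.walk(node))
]
print(suspicious)  # ['test_empty']
```

Running a check like this in the daily build keeps the test-count metric honest without a manager having to police it, which is principle two again: automate the method and publish it.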

To sum up, metrics are about communication. They not only tell you what your teams are doing, they tell your teams what you care about. Careful metric choice can also encourage teamwork: when a team’s metrics are published, they can’t help but compare themselves to other teams. So metrics need to be chosen with thought, because like it or not they influence behavior.