"This data is internal to the team, you should not share this or use it to compare different teams". I've often heard of sentences like this and agreed to it for most of my career as a Scrum Master. Somehow we're all about transparency, except when it comes to metrics. Eventually, this did not make sense to me anymore and I started to challenge this view.
In this post, I'd like to share a different perspective, show how teams can benefit from transparency around metrics, and explain what you need to get started.
What's usually being compared?
The metric I've encountered most often in such discussions was Velocity, defined as Story Points closed per unit of time (for example, per Sprint), usually in the context of "How much can we get done?"
When the debate about comparing teams came up, it was usually to answer questions like these:
Why is this team performing better (meaning more Story Points are getting closed) than others?
Why is this team planning better (meaning the team is "closer" to completing the planned amount of work) than other teams?
Other metrics that I've seen used for comparison were:
Number of defects open/found
"Code Quality" measures like test coverage, code smells, etc.
Often these numbers needed to be reported somewhere, accompanied by a traffic-light indicator showing whether we were "green", "amber", or "red" against some arbitrary target (80% code coverage, anyone?).
Why is this a problem?
Whenever someone suggested comparing the above metrics across teams, my knee-jerk reaction was to block the idea and explain why it was a bad one:
The context of my team is different, so it doesn't make sense to compare.
Whoever sees this metric doesn't know that the context is different. They will wonder why we are "worse" than other teams, and start to punish the team in some form.
The team is smart and knows that, so they will start to game the metric, making it useless not only for the people asking for it but also for the team itself.
Now, those problems are real, and I've seen them happen. So why would I suggest comparing teams and metrics anyway?
It's all about the Targets...
Let's start with the last point above: the fact that teams will game the metric. You might have heard of Goodhart's Law:
When a measure becomes a target, it ceases to be a good measure
If arbitrary targets are set across the company, it usually doesn't take long before teams figure out a way to reach that target.
"What if we exclude these parts from the test coverage, they are legacy and we can't test them, so it's not making sense to include them..."
While such targets probably come with good intentions, they undermine continuous improvement efforts. Instead of having transparency about where teams stand, you can no longer trust the data. So even for the team itself, assessing how it is doing might not be possible.
"Hey, this bug actually somewhat is like a feature request in a way, so let's track this differently...".
What I've seen work is to look at trends over time instead of "hard numbers". This brings several benefits:
It takes the context of the team into account. Whether you currently have 200 or 3 open defects, 85% or 29% code coverage, you can track whether you're improving over time.
Looking at data over time also lets you deal better with natural variability. Your defect count might spike once for some reason, but over time you should be able to keep it stable or reduce it, as the example below illustrates.
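Here's a minimal sketch of what this can look like in practice. The weekly defect counts are made up; in a real setup you would pull them from your issue tracker:

```python
import statistics

# Hypothetical weekly snapshots of open defects for one team.
# In a real setup you would pull these from your issue tracker.
weekly_open_defects = [212, 205, 198, 240, 187, 179, 181, 164, 158, 150]

def trend(values: list[int], window: int = 4) -> float:
    """Compare the average of the most recent `window` weeks
    against the window before it. Negative means improving."""
    recent = statistics.mean(values[-window:])
    previous = statistics.mean(values[-2 * window:-window])
    return recent - previous

delta = trend(weekly_open_defects)
print(f"Change in average open defects: {delta:+.1f}")
# The one-week spike to 240 doesn't flip the conclusion:
# the team is still clearly trending down.
```

The point of the windowed comparison is that a single bad week doesn't dominate the conclusion the way a point-in-time "hard number" would.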
These "relative improvement targets" should then keep the context of the team into account. For example, if you use a tool to track "code smells" (like SonarCloud), instead of saying you should have 0 smells (or less than 1000, or any other random number someone thinks of...), look at how you can prevent new smells from entering your code base, and continuously work to reduce the current issues. Sometimes you might still introduce a smell, and that's an opportunity to improve your process.
This will have a more sustainable impact than forcing the team to get to 0, which tends to lead to rules being disabled (no rule, no issue...) or code being excluded from the analysis. Some might say that teams can still game those numbers. I agree, but in my experience, it's a lot easier to explain "our goal is to eventually be at 0 code smells, but for now we're targeting not to introduce any new ones and to clean up existing ones as we go" than to simply say "make it to 0 or else".
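To illustrate the idea of a "ratchet" that blocks new issues while locking in every cleanup, here is a simplified sketch. The file name and smell counts are invented, and tools like SonarCloud provide this behavior out of the box via quality gates on new code:

```python
import json
import os

BASELINE_FILE = "smell_baseline.json"  # hypothetical file, committed to the repo

def check_no_new_smells(current_smells: int) -> bool:
    """Fail only when new smells appear; ratchet the baseline down
    whenever the team cleans some up, so improvements are locked in."""
    if os.path.exists(BASELINE_FILE):
        with open(BASELINE_FILE) as f:
            baseline = json.load(f)["smells"]
    else:
        baseline = current_smells  # first run: accept the status quo
    if current_smells > baseline:
        print(f"{current_smells - baseline} new smell(s) introduced - failing the check.")
        return False
    # Same or fewer smells: persist the (possibly lower) count as the new baseline.
    with open(BASELINE_FILE, "w") as f:
        json.dump({"smells": current_smells}, f)
    return True

# Usage, feeding in counts reported by your static-analysis tool:
assert check_no_new_smells(847)      # first run establishes the baseline
assert check_no_new_smells(840)      # cleanup: baseline ratchets down to 840
assert not check_no_new_smells(845)  # regression: 5 new smells, check fails
```

The key design choice is that the baseline only ever moves down: nobody is punished for the legacy backlog, but every improvement becomes the new normal.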
Now, if you compare various teams and see that one team manages to get its numbers down quicker, you might ask what they are doing differently (maybe they simply prioritize this work, or they have better tooling...). If you see a team where the issues are continuously rising, you should probably also ask what's going on, and how that team can be supported.
...And The Metrics
Next to the targets, you should also make sure that the metric itself makes sense. As mentioned, I've often seen estimates, or plans based on estimates, being compared. Estimates are guesses by nature; they are not meant to be accurate. If you start comparing teams based on their accuracy, estimates will most likely be inflated. Instead of a "5", an item now becomes an "8" or "13", just to be safe.
If you can play with a metric like this "at will", I would argue that it's not a good metric, and you should not track it. While I'm not a fan of using estimations like Story Points at all, you might still find them useful as a tool for discussion. Just don't use them for anything else.
Instead, use something comparable across teams. Code coverage and the number of defects work better for this because you can clearly define what they mean. Instead of estimates, I would propose using Flow Metrics, as they give you information about the flow of work in each team that you can compare.
For example, we can look at the 70th percentile of the Cycle Time over the last 3 months. If you see a team that has considerably lower Cycle Times, that's an opportunity to learn: How are they breaking down work? How do they collaborate to get to done as fast as possible?
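As a sketch of how such a percentile can be computed, assuming invented start and done dates exported from a board:

```python
import statistics
from datetime import date

# Hypothetical completed items: (started, done) dates exported from a board.
items = [
    (date(2024, 1, 2), date(2024, 1, 5)),
    (date(2024, 1, 3), date(2024, 1, 12)),
    (date(2024, 1, 8), date(2024, 1, 10)),
    (date(2024, 1, 9), date(2024, 1, 25)),
    (date(2024, 1, 15), date(2024, 1, 19)),
]

# Cycle Time in days, counting both the start and the done day
# (a common convention in Flow Metrics tooling).
cycle_times = [(done - started).days + 1 for started, done in items]

# statistics.quantiles with n=10 returns the nine deciles;
# index 6 is the 70th percentile.
p70 = statistics.quantiles(cycle_times, n=10)[6]
print(f"70% of items were done within {p70:.1f} days")
```

Computed per team over the same rolling window, this gives you numbers you can genuinely put side by side, which Story Points never will.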
Try to refrain from using metrics that are not comparable and are highly dependent on the team, like estimations, and instead look at more "neutral" data. That doesn't mean the numbers won't differ based on context, but it allows you to be curious about what other teams are doing differently, as the data is not merely subjective to one team.
Context Matters - But It's Not an Excuse for Not Improving
Yes, teams have different skill levels and work on different technology stacks. The culture differs from company to company. However, why would this stop you from looking at how others are doing, and thinking about what they are doing differently if they seem to perform better?
Why would we not want to learn what this other team in our company is doing differently? Maybe it doesn't apply to us, but maybe it does.
Instead of "protecting" the team, we should actively look for ways to improve. It's easy to say "Oh that team has more senior developers, that's why they have better numbers". Is it the seniority, or are they just doing something different that you could also try as an experiment? Are you investigating this? Are you having discussions on how the seniority in your team could be increased? Or are you just looking for excuses?
Every team, every product, every context is different. But that is no excuse for not learning from each other.
For example, I've seen a team that had a much higher Cycle Time than another one in the same company. When we checked what was different, it turned out that they relied on a separate team to do verification for them. So items were "stuck" for quite a while until that other team picked them up.
Once we learned that, there were two options:
Saying "Oh yeah, of course, you have lower cycle times if you don't depend on this team. But that won't work for us" and not doing anything.
Taking action and finding ways to improve the situation.
I guess it's needless to say which one should be the preferred option.
Prerequisites
As mentioned at the beginning of this post, I used to be highly "protective" when it came to metrics, especially making them transparent or even actively comparing them with other teams.
Nowadays, I see many benefits in comparing metrics across teams. My aim with any team I work with is not to be mediocre; we aim to become the best at what we do. This doesn't happen by hiding numbers, but by being brutally honest. You may not like what you see, but seeing it is the first step to improving anything.
"The truth will set you free, but first it will make you miserable" - Attributed to James A. Garfield
You should also look at what you are currently measuring. Are those metrics useful, or should you measure something else? Data can be incredibly useful, and I urge you to check out Flow Metrics if you haven't yet, as they are powerful, actionable, and very easy to measure.
You should educate your team, as well as your organization. In the past, I accepted that we had to fulfill random targets and was ok with finding any way to get to the number. In retrospect, I was not doing a good job; I was making my life easy. Thinking that "management doesn't get it anyway" is an excuse that leads to no improvement. In reality, people might simply not know better. If you approach them and explain why, for example, looking at the improvement trend over time might make more sense than a hard number, you might find that they are happy to learn from you.
Accepting "faking" metrics is not only unethical, but you also rob your company of the chance to improve. If something is not ideal, you should not be hiding it. You should make it transparent and ask for support to improve it.
These prerequisites are especially important if you are in some kind of Scrum Master or Agile Coaching role. This is your job: go improve the environment to create transparency. Without it, you won't ever establish any kind of continuous improvement.
You may not be able to change your complete organization at once, but start with what you can control. Create a dashboard, or even better, an Obeya with the data that is relevant for your team. Invite others to look at your metrics, and see what happens. Actively ask how you are doing compared to other teams, and be open to input.
Conclusion
Using metrics can be incredibly helpful for improving within a single team. If you collect useful metrics, you can base your retrospectives on data. And if you do it across many teams, you can supercharge your continuous improvement efforts and cross-pollinate knowledge between teams.
If one team seems to be "better" (whatever that means in your context), it does make a lot of sense to be curious and ask what they might be doing differently, instead of just assuming "their context is different, we don't want to hear anything about it".
Work with your teams and organization to:
Move from fixed targets towards continuous improvement over time, by looking at trends and making sure you are improving.
Move from metrics that are subjective to the context of a team towards metrics that can be compared (keeping the context in mind), for example, Flow Metrics.
Move from secrecy around metrics of the team, towards transparency to enable inspection and adaptation, for example, by creating an Obeya for your team.
If you are interested in learning more about Flow Metrics, Obeya, or Data-Driven Decision Making, check out our training or reach out for a call.