Published: August 20, 2022
Maximize Releases, Minimize Incidents and Iterate
3 minutes read
If you want to get a more reliable and valuable product to your customers more often, it helps to measure two things: release frequency, which you maximize, and incident frequency, which you minimize. Do this iteratively, thinking critically, making small adjustments and, of course, not giving up.
I have found that while there are many good situational measures for 'more reliable', minimizing incident frequency is both generally applicable and extremely simple.
Similarly, release frequency is trivial to measure and yet is empirically strongly and positively correlated with value.
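To show how little is needed to track both numbers, here is a minimal sketch. It assumes you can export the dates of your releases and incidents from whatever tooling you use; the function names and the sample dates are hypothetical and only for illustration.

```python
from collections import Counter
from datetime import date


def weekly_counts(event_dates):
    """Count events per ISO week, keyed by (iso_year, iso_week)."""
    counts = Counter()
    for d in event_dates:
        iso_year, iso_week, _ = d.isocalendar()
        counts[(iso_year, iso_week)] += 1
    return dict(counts)


def mean_per_week(event_dates, start, end):
    """Average number of events per week over the inclusive window [start, end]."""
    days = (end - start).days + 1
    if days <= 0:
        raise ValueError("end must not be before start")
    in_window = [d for d in event_dates if start <= d <= end]
    return len(in_window) / (days / 7)


# Hypothetical sample data (assumption): replace with dates from your own tooling.
releases = [
    date(2022, 8, 1), date(2022, 8, 3),
    date(2022, 8, 9), date(2022, 8, 11), date(2022, 8, 12),
]
incidents = [date(2022, 8, 10)]

window_start, window_end = date(2022, 8, 1), date(2022, 8, 14)
print("releases per ISO week:", weekly_counts(releases))
print("mean releases/week:", mean_per_week(releases, window_start, window_end))
print("mean incidents/week:", mean_per_week(incidents, window_start, window_end))
```

Counting per calendar week is just one reasonable choice; the point is that both metrics fall out of two lists of dates, so the trend is easy to review regularly.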
This may seem like a regurgitation of continuous delivery ideas but I wanted to distil what I consider essential.
For me, the principles and practices that help us achieve much more regular and smaller releases do not need prescribing. Rather, they can be discovered, learnt, created and used iteratively, cobbled together on the journey to the goals.
It will then quickly become evident to the intelligent people on our team that our code should probably be in a releasable state, that we should be able to know and view the changes in a release, that we need a release queue of some kind, that we need to ensure we do not break critical features, that testing and releasing (whether manual or automated) must not take very long, that release processes must become more and more reliable if we are to move quickly, and so on.
But these practices and tools are clearly not ends in themselves, and can be a distraction from the simplicity of the ideas. Worse still, they can easily become a crutch by offering a recipe to follow. The tendency to reach for the recipe then slows personal and team progress toward the real productivity wins.
The real target state is one in which developers are accountable and hyper-aware of the scarcity of our resources, in terms of choices and opportunity cost, but are also enabled risk takers and experimenters who build habits of shipping and testing ideas empirically. They create for themselves the feedback loops about their practical technical techniques, the resiliency of their architecture and tooling, and ultimately what customer impact looks like, to name just a few. The list is actually endless, as the point of the shift is to give developers more free agency. Pick the metrics that matter, set the direction, and otherwise get out of the way and let them do their job (which is not so much coding to specification as managing complex systems, and changes to complex systems, over time in collaboration with others). Of course, picking a couple of metrics is not going to create an XP super-team overnight, but it is important to keep the intent in mind, as it does need fostering.
Other than making sure you keep things simple and apply thought... what are some of the other dragons to look out for when pursuing a shift to more frequent releases?
Well, I have yet to find anyone who thinks minimizing incidents is a bad idea. Having no one's mind to change certainly makes the change easier, but do beware of weaponizing incidents if you are undergoing a move to continuous releases and not everything goes to plan.
If an increase in release frequency leads to a few more incidents, make sure the reaction is commensurate with the risk. Often the heightened activity and engagement of the team during the shift up in release cadence make us more acutely aware of the incidents that do occur. Even if there is a statistically significant uptick in incidents, keep perspective and iterate on the problems as they come. It is best to identify causes and respond proportionately as a team. Whether that means adding missing test automation or adding a control to the release process, keep your nerve and iterate.
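One way to keep perspective is to check whether an apparent uptick is more than noise before reacting to it. The sketch below is only an illustration under stated assumptions: it treats incidents as independent events arriving at a roughly constant rate within each period (a Poisson model, which real incidents may not follow), and it uses the exact conditional test for comparing two such rates, which reduces to a binomial tail probability. The function name and the example numbers are hypothetical.

```python
from math import comb


def uptick_p_value(incidents_before, weeks_before, incidents_after, weeks_after):
    """One-sided p-value for 'the incident rate increased after the change'.

    Assumption: incidents in each period follow a Poisson process. If the
    rate did not change, then given the total number of incidents, the count
    in the 'after' period is Binomial(total, share of time in 'after').
    """
    if weeks_before <= 0 or weeks_after <= 0:
        raise ValueError("both observation periods must be positive")
    total = incidents_before + incidents_after
    p_after = weeks_after / (weeks_before + weeks_after)
    # Probability of seeing at least this many incidents in the 'after'
    # period purely by chance, if the underlying rate were unchanged.
    return sum(
        comb(total, k) * p_after**k * (1 - p_after) ** (total - k)
        for k in range(incidents_after, total + 1)
    )


# Hypothetical numbers: 3 incidents in the 12 weeks before the change in
# cadence, 5 incidents in the 12 weeks after.
p = uptick_p_value(3, 12, 5, 12)
print(f"p-value: {p:.3f}")
```

A large p-value suggests the extra incidents are consistent with chance at the old rate; a small one is a prompt to look for causes, not a reason to abandon the change.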
More often than not, the previous status quo of infrequent, bigger releases that were sometimes delayed or cancelled was not actually less risky. Even if you did do expensive stop-the-world QA testing, in my experience it was rarely done thoroughly. The time delay between inception, delivery and release usually means a lack of engagement by business and implementer during the critical release and post-release verification phase.
Other than getting the risk management adjustments right, attention should also be paid to how developers adjust to the change. A good decentralized continuous release process inevitably puts more responsibility and tasks on the developers' plate.
They need to think about keeping master releasable and maintaining backwards compatibility. They need to communicate more frequently, and probably more precisely, about releases. They need to think deeply about and understand their infrastructure, their dependencies, deployment pipelines and rollbacks. In effect, you are asking them to do less traditional development (when considered narrowly as a specialization) and to have a more complete and holistic understanding of their feature and its risks. More work, more pressure, more accountability.
On the other side of this change, they are highly likely to value their new autonomy and capability, but during the journey, they can be more sensitive, irritable, tired, or stressed.
Awareness of this by leadership helps temper the use of incidents as a beating stick and keeps the focus on supporting the team to reach their new capability goals faster. Having said that, I would maintain a strict discipline about the agreed release process itself. We should be honest where the discipline does slip or when unacceptable risk is taken. But do not over-accentuate the issues; keep them in context, and make it about the results, not the personal mistake.
In Summary
I wrote this article to share my experience of trying to increase release cadence, and the various attempts I have made, after finding the recommendations in certain DORA/CI/CD literature a bit too complex to implement iteratively.
In my experience, picking a few key metrics, and the right ones, does require a lot of thought, but it can help improve value delivery.
But it's equally important to remember that the really big wins come from individual and team capability growth, confidence, and ownership. So while you need to be disciplined and fair, staying out of the micro, and away from blame, is crucial to keeping the right culture for resilient growth.
You might not get there quite as fast as planned, but you will be surprised how far you can get in a few months if you keep it simple.