“Features are taking far too long to ship. How can I fix it?” This is one of the most common problems I have encountered.
It is actually one of my favourites, because there is typically plenty that can be done. Although it can take months to solve, the first results can show up within a few short weeks.
It is rare that the problem is due to a single cause. It is usually a combination of factors.
However, we opened with a subjective feeling, so the first step is to understand whether it is justified.
Behavioural Patterns & Indicators
The ticketing system (e.g., Jira) is a goldmine of behavioural data that shows how the organisation operates.
We are primarily looking for repeating patterns of behaviour, not isolated incidents. These patterns are indicators of possible problems, and are not necessarily the problems themselves.
Here is a breakdown of common indicators that are not difficult to discover. At this stage, I'm trying to be non-judgemental, but as you'll see, some judgement inevitably leaks through.
- Bumped Stories — These are stories that got bumped to the next sprint, sometimes repeatedly. I sort these into three main categories:
  - Cliff Hangers — These are stories that were completed, but not deployed, and got bumped to the next sprint. This type of thing borders on the criminal, and almost always comes with an interesting (but sad) story.
  - Unfinished Masterpieces — These are stories that were started, but got bumped to the next sprint for seemingly trivial reasons, sometimes repeatedly. This is where most of the meat usually is.
    - Waiting for someone on another team, or in another department, to do something (e.g. design, translate, approve, register, pay, …)
    - Waiting for someone on the same team to do something (e.g., review and approve the code, implement code review feedback, …)
    - Waiting for QA validation. Depending on how the engineering department is structured, this could belong under either of the two previous items, but it really deserves its own entry.
    - Failed QA validation, but not repaired and/or resubmitted for validation.
    - Stuck because of a dependency on another story.
  - Writer’s Block — These are stories that were not even started, and got bumped to the next sprint, sometimes repeatedly.
- Stowaways — These are stories that somehow sneaked into the sprint after it started. There are 3 common types:
  - Production or support issues.
  - “The Boss wants it done immediately!”
  - No apparent reason.
- Overweight Baggage — These are stories that are just taking far longer than the original estimates, and were not finished in time.
  - Unanticipated complexities were uncovered during implementation.
  - Stories that should have been epics, i.e. too large in scope.
  - Effort is consistently underestimated.
  - Small stories have unexpectedly high estimates.
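The indicators above can be pulled out of a ticket export with a small script. The following is a minimal sketch in Python, assuming a hypothetical export with one record per story per sprint. The `SprintRecord` fields, the status values and the 1.5x overrun threshold are illustrative assumptions, not the schema of Jira or any other real tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SprintRecord:
    """One story as it stood at the end of one sprint (hypothetical export format)."""
    sprint: str
    key: str
    status: str              # assumed values: "todo", "in_progress", "done_not_deployed", "deployed"
    added_after_start: bool  # the story entered the sprint after it began
    estimate: float          # original estimate, e.g. story points
    spent: float             # effort actually spent so far, same unit


def classify(rec: SprintRecord, overrun_ratio: float = 1.5) -> List[str]:
    """Return the indicator labels that apply to one sprint-end record."""
    labels = []
    # Anything not deployed at sprint end was bumped; the status says which kind.
    if rec.status == "done_not_deployed":
        labels.append("Cliff Hanger")
    elif rec.status == "in_progress":
        labels.append("Unfinished Masterpiece")
    elif rec.status == "todo":
        labels.append("Writer's Block")
    if rec.added_after_start:
        labels.append("Stowaway")
    # Unfinished and well over the original estimate (the threshold is an assumption).
    unfinished = rec.status != "deployed"
    if unfinished and rec.estimate > 0 and rec.spent > overrun_ratio * rec.estimate:
        labels.append("Overweight Baggage")
    return labels


def repeating_patterns(records: List[SprintRecord], min_sprints: int = 2) -> Dict[str, int]:
    """Count the sprints in which each label occurs, keeping only labels that repeat."""
    sprints_by_label = defaultdict(set)
    for rec in records:
        for label in classify(rec):
            sprints_by_label[label].add(rec.sprint)
    return {
        label: len(sprints)
        for label, sprints in sprints_by_label.items()
        if len(sprints) >= min_sprints
    }


# Made-up sample data, for illustration only.
records = [
    SprintRecord("S1", "A-1", "done_not_deployed", False, 3, 3),
    SprintRecord("S2", "A-1", "done_not_deployed", False, 3, 3),
    SprintRecord("S1", "A-2", "todo", False, 2, 0),
    SprintRecord("S2", "A-3", "in_progress", True, 2, 5),
    SprintRecord("S2", "A-4", "deployed", False, 1, 1),
]
print(repeating_patterns(records))
```

With this sample data, only the Cliff Hanger label shows up in more than one sprint, so it is the only one reported. Filtering on `min_sprints` reflects the point that we are looking for repeating patterns rather than isolated incidents.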
Actionable Causes
Each of the above issues is typically caused by a combination of the following common actionable causes. This is where I'm being very judgemental!
- Overbooking — Too many stories are scheduled for the sprint. This can be because:
  - Engineering throughput is uncalculated, or does not reflect reality.
  - QA capacity does not match planned engineering throughput, which ignores the Theory of Constraints: the slowest stage limits the whole flow.
  - Too much negotiation and pressure on engineering regarding story size estimates.
  - "To make sure the engineers always have plenty of work to do."
- Stowaways — High-priority support incidents and/or other "urgent" work enters the sprint late and displaces scheduled tasks, because sprint planning did not allow for an average quantity of incidents.
- Process not respected by management — The engineering organisation's method of planning and executing has not been communicated to, and fully embraced by, upper management. Upper management is not aware of the importance of respecting the process, and routinely introduces noise in the form of Stowaways. Product management and engineering have not developed the ability to push back on such requests.
- Requirements are not ready — Product requirements are prepared in a vacuum, i.e. "thrown over the wall" without collaborating with engineering. This inevitably causes ping-pong while the engineers try to digest the requirements and ask questions that the product manager didn't think of.
- Technical debt — There is no deliberate plan to track and manage technical debt, and it keeps accumulating. The effort needed for every engineering task increases with the level of technical debt.
- Features are prioritised inconsistently — Work is prioritised based on who shouts the loudest rather than by business impact.
- Features or support issues are prioritised ineffectively — Too much work is classified as highest priority. I have seen one case where there were 3 levels of "Highest" priority: Highest-Highest, Highest-High, and Highest-Medium (even they thought that Highest-Low would have been too much of a joke).
- Stories are not small enough and/or are sliced incorrectly.
- Effort is underestimated — Analysis by engineers is not thorough enough when estimating the effort required, or (far worse) there is undue pressure to reduce the estimates.
- Suboptimal ordering of work — The order of execution does not attempt to reduce risk and maximise impact early, and to take advantage of QA throughput.
- Code reviews cause delays — Code waits too long for review by tech leads or senior engineers. In some cases it cannot be fixed up and tested in time for release.
- QA is a bottleneck — QA can cause delays for a number of reasons, especially:
  - QA are not part of the product/engineering collaboration that generates the stories.
  - Lack of automation, so that setting up complex testing environments takes a long time.
  - Understaffing.
- Cross-team dependencies — Too many cross-team dependencies, or even a few dependencies that are not aligned, are problematic. Different teams naturally have different plans, priorities and cadences. Waiting for someone on another team can cause delays, or at least unpredictable timing.
- External bureaucratic processes — Depending on a clumsy bureaucratic process that is managed outside the team and has a mind of its own can drive you crazy, unless you either get it changed or allow for it in your plans.
- Faulty implementations — Implementations can be imperfect for many reasons, especially ineffective definition of requirements by a product manager and a lack of engineering professionalism.
- No distinction between deployment and release — Deployment of some stories is artificially delayed because deploying a feature necessarily releases it; feature flags are not used.
- Migrations cause downtime — Deployment of some stories is delayed for operational reasons, because downtime is required to migrate the database structure.
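Several of the causes above (uncalculated throughput, QA capacity that does not match it, and no allowance for Stowaways) come down to simple arithmetic at sprint planning. The following is a rough sketch in Python of one way to do that arithmetic, assuming story points as the unit and a few past sprints as the sample. The function names, parameters and figures are hypothetical, and a plain average is only one of several reasonable choices.

```python
from statistics import mean
from typing import Sequence


def plannable_points(
    dev_completed: Sequence[float],
    qa_validated: Sequence[float],
    stowaways: Sequence[float],
) -> float:
    """Estimate how many points can realistically be scheduled for a sprint.

    dev_completed: points engineering finished coding in each past sprint
    qa_validated:  points QA validated in each past sprint
    stowaways:     points of unplanned work that entered each past sprint
    """
    engineering_throughput = mean(dev_completed)
    qa_capacity = mean(qa_validated)
    # The slower of the two stages bounds what can actually be finished.
    constraint = min(engineering_throughput, qa_capacity)
    # Reserve room for the average amount of late-arriving work.
    stowaway_reserve = mean(stowaways)
    return max(0.0, constraint - stowaway_reserve)


def is_overbooked(scheduled_points: float, budget: float) -> bool:
    """True when more work is scheduled than the budget allows."""
    return scheduled_points > budget


# Made-up history of three sprints, for illustration only.
budget = plannable_points(
    dev_completed=[30, 26, 34],
    qa_validated=[24, 20, 22],
    stowaways=[5, 3, 4],
)
print(budget, is_overbooked(28, budget))
```

In this made-up history engineering averages 30 points per sprint, but QA averages 22 and Stowaways take 4, which leaves a budget of 18. A sprint planned at 28 points against engineering throughput alone would be overbooked before it starts.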
Root Causes
The above "actionable causes" are easy targets when focusing on fixing the process. However, the real root causes are much more systemic. Some organisations have the culture, maturity, self-awareness and desire to recognise and treat these systemic issues. It is not trivial, but is quite doable with willing partners.
- Lack of alignment between departments, especially:
  - Ineffective communication.
  - Processes not aligned.
  - Insufficient collaboration at relevant stages.
- Management deficiencies, especially:
  - Ineffective planning & delivery management on macro and/or micro levels.
  - Unsuitable organisational structure.
  - Weak management, lacking backbone.
  - Following the bureaucratic steps of the process without understanding its essence.
- Execution deficiencies, e.g.:
  - Professional abilities are lacking, e.g. defining requirements, slicing stories.
  - Professionalism is lacking, i.e. lack of diligence and attention to detail.
  - Lack of appropriate tools, e.g. automated tests (or the time/resources to build them; see ineffective planning).
Solutions
Some of the solutions for the above actionable problems are obvious, and others are far less so. This is the first in a series of upcoming posts in which I will drill down into each set of problems and discuss the solutions in detail.
Thank you for reading this far! I will appreciate your comments, questions, suggestions, clarifications and especially corrections via this LinkedIn post.
"Can you help me?"
🤙 Yes, my mission is to help clients to transform their engineering outcomes. Contact me via my LinkedIn profile.