In my 20-plus years developing and delivering software, I have been part of many software delivery initiatives that, despite all efforts, ended in failure. Each failure had many different causes, but some of those causes were present in most of those failed initiatives.
My goal with this article is to list a few of the problems that, in my opinion and experience, will doom any software delivery initiative to failure. I’ll also offer solutions to those problems so the next time you are faced with such issues you’ll (hopefully) be able to identify and avoid them.
I must say that by no means is this meant to be an exhaustive list of reasons that can lead a project to failure. I’ve simply picked the eight most recurring and impactful scenarios that, in my experience, will throw any software delivery initiative off the rails.
Without further ado, here’s the list of problems (in no specific order) I like to call “The eight deadly sins of (failed) software delivery initiatives”.
Sin #1 – Don’t know what to build and/or how it should work
Have you ever been part of a software delivery initiative where things were a bit fuzzy? Everyone involved seems to have a high-level idea of what needs to be done, but when details are needed and people start asking each other questions, it quickly becomes clear that there are a lot of unknowns that still need clarification. Those unknowns can be about both the what and the how.
Not knowing what the software you’re spending millions to build is supposed to do may seem foolish, but it happens more often than you might think. Ideally, the need to build a piece of software should come from the users of that software or from people who know the users really, really well.
Unfortunately, that’s not always the case. It’s not uncommon to have people higher up the organization chart, on both the business and IT sides, deciding which initiatives get funded and what they’re supposed to achieve. The problem in this case is that the higher someone’s position in the organization, the more likely they are to be detached from the reality of the business and the people who actually run it.
Add to that the fact that the larger the organization is, the more communication issues it’s likely to have. This can lead to people making decisions without having a complete understanding of the problems that (they think) must be solved.
Finally, the people the software is being built for, be they employees or customers, are rarely involved in the software delivery process. In fact, I’ve seen organizations that intentionally isolate end users from delivery teams, only God knows why. The result is software delivery teams building solutions to problems that don’t exist or that they don’t fully understand. This problem is also known as “building the wrong thing”.
Now let’s say that somehow the delivery teams got the what right and they’re on track to build the right thing (or solve the right problems). Unfortunately, it’s still possible they’ll build the thing wrong. In this case what usually happens is that there’s a lack of subject matter experts to tell the teams how things should work. This is very common in complex domains and/or old organizations where the knowledge of the how is embedded in the code of legacy systems built decades ago, and whose developers are now long gone (hopefully retired and not dead).
The developers now maintaining those systems usually only know the small parts of them that they were forced to learn by painstakingly reverse engineering the code over many years of sweat, blood, and tears. If only they knew how valuable they are, they’d be making more money than the organization’s CEO.
So, if the delivery team gets the what wrong, even if they get the how right they’re doomed to failure by building the wrong thing. On the other hand, even if they get the what right but the how wrong, they’re doomed to failure by building the thing wrong. It’s also possible they eventually get the how right, but only after a long and arduous process of reverse engineering legacy code, as well as trial and error, where they release a solution that customers will complain doesn’t work as expected. Of course, it’s also possible that the perfect storm hits and the delivery teams end up getting both the what and the how wrong, leading to a spectacular failure.
Solution
So, what to do then? How to avoid such a catastrophe?
In order to get the what and the how right, a delivery team must collaborate with all the areas of the organization impacted by the software to be built, so they can ask questions and understand what problems they’re trying to solve and how those problems can be solved. This will usually require a workshop where representatives from the business (including SMEs, product owners, and end users) and from IT (including infrastructure, operations, security, and any other impacted systems) are present. Those workshops can take from a couple of days to a couple of weeks, depending on the complexity of the business, the system architecture, and the problems being solved. The organizations that already run those workshops usually only do so at the beginning of the initiative, which is a big mistake. Those workshops must happen multiple times throughout the life of the initiative in order to validate assumptions and course correct if necessary.
It’s also important that the end users (or their representatives, in the case of users external to the organization) are part of the delivery process and have access to (frequent) production releases of the solution being built, so they can give the delivery team feedback on the usefulness of what’s being built.
Finally, communication across the delivery teams and the organization silos must be streamlined as much as possible. The best way of achieving this is by having no silos at all. Ideally the delivery teams should contain not only developers, QAs, BAs and PMs but also infrastructure, security, operations, architects, end users, product owners and whatever other roles are required in order to deliver the right thing right.
Sin #2 – Trying to do too much at once
Human beings are amazing, but they lack the ability to hold lots of things in their heads at once. Someone once said that whatever problem you’re working on, “it must fit in your head”. This also applies to software initiatives.
Big organizations seem to be big fans of huge, long-lived software delivery initiatives, or the infamous “programs”. They usually spend months analyzing all existing problems and trying to fund one software delivery program to rule them all. Such programs are meant to solve all the organization’s problems, from both a business and a technical perspective. They can take from 2 to sometimes 10 years, with the average length being around 5 years (in my experience).
However, despite all the planning, most of the time those programs fail. I’d love to have some statistics to share with you here but unfortunately I don’t.
Failure has different modes. A software delivery initiative can fail because it built the wrong thing right, or the right thing wrong, or because it ran over budget and/or over time. Or, in some (not rare) cases, all of the above combined.
Startups might not always be successful but they usually succeed in delivering software. Whether the software is successful or not will depend on a viable business model, right timing, among other things. The startups that are successful in delivering software usually have something in common: focus. They’re trying to solve a very specific problem and need to prove they are capable of delivering working software as soon as possible in order to guarantee funding.
They’re also very flat organizations, hierarchically speaking, with very streamlined communication channels, simply because everyone sits next to each other no matter their role in the organization. Have a question? The person who has the answer is a few feet away.
However, the most important thing is that startups are constrained. Time constrained. Money constrained. They know they can’t come up with the perfect product, with all the possible bells and whistles, in a timely fashion. They need to iterate. They need to focus on the minimum viable product that’s good enough to showcase their ideas to the world. Then they can add the next feature. And the next. And so on. That’s exactly the opposite of what big organizations do.
Solution
Big enterprises need to break away from their current approach to software delivery, including how they plan, fund, and deliver software. They need to act like startups and think iteratively, focusing on specific problems that can be solved in a timely fashion. No more big software delivery programs. No more boiling the ocean. Whatever you’re trying to solve needs to fit in the heads of the people solving the problem.
Billions have been spent and different methodologies have tried (and failed) to make software delivery predictable. The uncomfortable truth people don’t want to face is that software will take however long it takes to build, given the organization’s current structure (both organizational and technical) and the resources available to the delivery teams. There’s no magic here. No amount of pressure will magically get a piece of software built in six months when it would actually take one year to build (given the same constraints). Something needs to give, and it’s usually quality and people’s morale. The longer a software initiative is, the more likely it is to go off track.
Organizations must stop pretending that estimates are not commitments, and stop burning people to the ground trying to deliver the impossible. Instead, focus on micro-initiatives. Whatever problems you’re trying to solve, break them down in such a way that each piece can be delivered in three months or less. Fail or succeed fast. Collect feedback, then iterate, course correcting as needed. If resources allow it, nothing prevents you from running multiple micro-projects in parallel, as long as they are not intrinsically dependent on each other (otherwise it’s just a huge software program in disguise).
Sin #3 – Organizational silos
The truth is, silos create bottlenecks and friction in communication. For a long time, organizations tried to approach software delivery with a factory (or production line) mindset, where each step of the process is performed by a specialized individual and then handed over to the next step, until you finally get the final product at the end of the line.
That just doesn’t work, period. Software cannot be built through a repeatable process where parts are assembled into a final working product. We as an industry tried it and failed. With the advent of Agile and Lean practices much has improved, but we still see self-proclaimed Agile organizations following a similar model, where business analysts (or similar) write requirements that are then analyzed by architects, who come up with the technical design, so developers can code, so QAs (or QEs, or testers) can test, so operations can deploy, each with their own skills and responsibilities. If something goes wrong? It’s not my problem, I did my part, it’s someone else’s fault.
Let’s imagine for a second that such an approach works. There’s still an impedance mismatch problem here. Developers might be able to code faster than BAs can write requirements and architects can come up with the technical design. Then QAs might not be able to keep up with the number of stories developers are delivering. Releasing to production requires a lot of work and coordination, so operations folks bundle as many changes as possible into a single release. Releasing every two weeks? Are you crazy?
A user then finds a bug in production. They must call through all the support levels until someone can create a bug in the backlog, which then has to be prioritized by the business before a developer can work on it. What if developers need a new piece of infrastructure provisioned or modified? They submit a ticket to the infrastructure team, which is already overwhelmed with tickets from many other teams, not to mention any work related to improving the infrastructure itself, which always ends up deprioritized.
I haven’t even mentioned all the meetings and long email threads where folks across silos try, without success, to communicate their problems and align on possible solutions. An example of this happens when developers and/or BAs have questions about what needs to be done and/or how to do it, but can only schedule a meeting with the business and/or the architects for the next week in order to get their questions answered. If someone is on vacation or really busy, the meeting gets pushed to the next month. Meanwhile, time flies.
[insert your preferred version of the “it’s fine” meme here].
Solution
Organizations must embrace cross-functional, autonomous, and self-sufficient software delivery teams. Even if a resource is scarce or not needed 100% of the time, they must have a percentage of their time assigned to the delivery team. They must be part of the team. That means participating in team ceremonies, discussing problems and solutions and, above all, feeling accountable (and being rewarded) for the success of the team as a whole.
Business people, SMEs, and end users should be available and easily reachable to answer any questions the delivery team might have. The same is true for infrastructure and QA resources needed by the delivery teams as we’ll see in a moment.
Sin #4 – No one can make a decision
If I never make a decision then I can’t be blamed for anything bad that might happen, right?
That’s the culture in most of the organizations that struggle to deliver software. People live in fear. Fear of being fired, fear of not being promoted, fear of missing their bonuses at the end of the year, i.e., fear of failure. In Westrum’s model of organizational culture, those organizations would be classified as either pathological or bureaucratic, or somewhere in between.
One of the many bad outcomes of living in fear is that it can paralyze you. It can prevent you from making any decisions or mislead you into making bad ones. When it comes to software delivery initiatives, it usually leads to analysis paralysis. People get stuck in a never-ending series of meetings waiting for a decision to be made. When it doesn’t happen, the responsibility bubbles up the organization’s hierarchy until it finds someone empowered enough to make the call. There are two obvious problems with this approach.
One is that it can take a long time until a decision is made, and this can happen for every decision that must be made during the life of the software delivery initiative, which can be a lot of decisions. The second problem is the telephone game. Usually, the higher someone is in the hierarchy, the less context they have about the problem being solved, which increases the chances of them making a bad decision. Add to that the noise introduced by the information having to pass through many layers of hierarchy, which can lead to people making decisions based on wrong or incomplete information. Assuming they’re able to make a decision at all.
Like many of the other problems mentioned here, it can lead to the wrong thing being built, or the right thing being built wrong, or, in the worst case, nothing worth building at all.
Solution
Once again, cross-functional, autonomous, and self-sufficient teams are our best chance at solving this problem. Also, the organizational culture must be one that empowers people to make decisions at the level where they need to be made, without fear of punishment in case something goes wrong. Delivery teams must have easy access to the information they need, at the time it’s needed, so they can make well-informed decisions. Important decisions must be documented and easily accessible, so that in the future people can have context on why certain decisions were made.
Sin #5 – Ineffective test strategy
When an organization tests too late in the delivery process, it increases developer cognitive load: by the time defects are found, prioritized, and ready for developers to work on, developers will have long moved on to another story and lost the context necessary to quickly figure out the cause of a given defect. The cognitive load gets even worse if a developer is assigned to work on a defect in a section of the code they’re not familiar with.
Reliance on manual tests, which are usually error-prone, slow to perform, and dependent on an individual’s knowledge to be executed correctly, results in tests that take a long time to run and will very likely miss important edge cases, which in turn leads to a large number of errors in production. Manual tests are also hard to scale and can lead to resource starvation, a scenario where organizations find themselves needing more and more QAs and never keeping up with the growing backlog of stories in need of testing.
Expensive and brittle end-to-end tests that require lots of setup and coordination and take too long to execute are also a source of delays and potential defects in production. Since those tests touch multiple systems, usually maintained by different teams, ownership of the tests and of the environments where they run is hard to establish, leading to complex coordination across those teams. Maintaining the right state (data) across all systems is a cumbersome and frustrating process, leading to the common “who touched my data?” blame game between teams. The need to have the right data and the right versions of the different systems in place in order to execute end-to-end tests makes them brittle, usually leading to a lot of false alarms. This, combined with the long time those tests take to execute, usually leads to teams abandoning them, since they’re slow and unreliable.
QAs (or QEs, or testers) who are not involved in the other phases of the software delivery process can lack context on what the solution being built is supposed to do and how it should behave. This usually leads to costly knowledge transfer between BAs, developers, and QAs, delaying tests and even leading to QAs testing the wrong thing (or testing things wrong), which in turn can lead to false alarms and wasted developer time when they have to investigate defects that turn out to be invalid.
The issues above are among the main mistakes I’ve seen delivery teams make when it comes to test strategy.
Solution
Testing should be moved earlier in the development process (shift-left), fully leverage test automation, and follow the test pyramid approach. Manual testing should be reserved for (a few) ad-hoc scenarios and sanity checks. Costly end-to-end tests should be replaced by unit, integration, and contract tests (all automated), which are more efficient and reliable. QAs should participate in story writing as well as any other team ceremonies. A story can only be considered done when it’s fully tested and ready to be deployed to production. Developers (and QAs) should feel comfortable writing (automated) tests of all kinds (including performance and UI tests) as well as pairing with each other, even during the coding phase of a story.
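To make this a bit more concrete, here’s a minimal sketch (in Python, using pytest) of the kind of fast, automated tests that should form the base of the pyramid. The DiscountCalculator, PricingService, and StubTaxClient classes are hypothetical and exist purely for illustration; the idea is a fast unit test of pure business logic, plus a narrow integration-style test that stubs out the external dependency instead of relying on a full end-to-end environment.

```python
"""A minimal sketch of test-pyramid-style automated tests, using pytest.
DiscountCalculator, PricingService, and StubTaxClient are hypothetical
classes used purely for illustration."""
import pytest


class DiscountCalculator:
    """Pure business logic: easy to unit test, no I/O involved."""

    def discount_for(self, order_total: float) -> float:
        return 0.10 if order_total >= 100 else 0.0


class PricingService:
    """Depends on an external tax service; the collaborator is injected so
    tests can replace it with a cheap in-memory stub."""

    def __init__(self, tax_client):
        self.tax_client = tax_client
        self.calculator = DiscountCalculator()

    def final_price(self, order_total: float) -> float:
        discounted = order_total * (1 - self.calculator.discount_for(order_total))
        return discounted + self.tax_client.tax_for(discounted)


# Unit test: the wide base of the pyramid, runs in milliseconds.
def test_orders_of_100_or_more_get_a_10_percent_discount():
    assert DiscountCalculator().discount_for(150.0) == 0.10


# Narrow integration-style test: the external dependency is stubbed out.
class StubTaxClient:
    def tax_for(self, amount: float) -> float:
        return amount * 0.05  # deterministic fake, no network calls


def test_final_price_applies_discount_then_tax():
    service = PricingService(tax_client=StubTaxClient())
    # 200 minus the 10% discount is 180, plus 5% tax is 189
    assert service.final_price(200.0) == pytest.approx(189.0)
```

The point isn’t the domain, it’s the shape: lots of cheap, deterministic tests like these at the bottom of the pyramid, and only a handful of expensive end-to-end tests at the top.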
Sin #6 – Cumbersome path to production
Many organizations are required to follow strict regulations and/or operate in risky industries where failures can be catastrophic, which leads them to introduce lots of checks and balances (aka red tape) before releasing any software changes to production.
The problem is that it’s also common for those governance processes to become unnecessarily complex and to not really prevent the issues they’re supposed to prevent. Human beings hate doing things that are repetitive, take too long, and/or whose value they cannot perceive.
This leads to people doing things on autopilot (the bad kind of automation) without really thinking about what they’re doing, rushing things so they can get the boring stuff done quickly, and/or finding workarounds to a process they don’t agree with or whose value they cannot see. In the cases where people are somehow forced to stick to the process, what usually happens is that releasing a chunk of work to production gets delayed by weeks, if not months, until all the checks and balances are met.
A natural response to doing something that’s hard and/or complex is to do it less frequently. The less frequently software is released to production, the more changes accumulate and the longer it takes for the delivery team to get feedback from users on the work that has been done. This can lead to a large number of defects and/or negative user feedback for a given release, which in turn can overwhelm delivery teams and throw their planning for the next release off.
Most of the time those processes are not there to prevent errors from reaching production but to cover people’s backs in case something goes wrong. After all, if the process has been followed, no one is to blame, right?
Solution
Reevaluating why a given step in the path to production is required, and how that requirement can be met effectively and efficiently, needs to be a constant in the life of software delivery teams. We need to avoid the monkeys-and-the-ladder behavior of doing things just because that’s the way they have always been done.
Another crucial solution to this problem is to automate everything. No more filling out soul-eating, life-crushing forms by copying and pasting data all over the place and trying to answer questions that won’t be reviewed by any intelligent life form in the next century or so. If some piece of data doesn’t add any value to the release process but is required due to some regulation, find a way to automate the collection of that data. If there’s no regulation requiring such data, don’t even bother collecting it.
For all valuable data that’s used by people during sign-offs, automate how it’s collected. All sign-offs must also be automated and recorded for future auditing.
On top of that, the whole path to production should be automated, and practices like blue/green deployments and smoke tests in production can help prevent defects from reaching end users.
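As an illustration, here’s a minimal sketch of what an automated smoke-test gate in a blue/green setup might look like. The URLs, endpoints, and the commented-out traffic-switching call are all hypothetical; the point is simply that traffic only moves to the newly deployed environment after a few automated checks pass against it.

```python
"""Minimal sketch of a blue/green smoke-test gate.
All URLs, endpoints, and the traffic-switching step are hypothetical."""
import sys
import urllib.request

GREEN_BASE_URL = "https://green.internal.example.com"  # newly deployed, not yet serving users

SMOKE_CHECKS = [
    "/health",          # is the app up at all?
    "/api/v1/status",   # can it reach its critical dependencies?
]


def check(path: str) -> bool:
    """Hit an endpoint on the candidate environment and expect HTTP 200."""
    try:
        with urllib.request.urlopen(GREEN_BASE_URL + path, timeout=10) as response:
            return response.status == 200
    except Exception as error:
        print(f"Smoke check {path} failed: {error}")
        return False


def main() -> int:
    if all(check(path) for path in SMOKE_CHECKS):
        print("All smoke checks passed; switching traffic to green.")
        # switch_traffic_to("green")  # hypothetical call to the load balancer / router
        return 0
    print("Smoke checks failed; traffic stays on blue and the release is rolled back.")
    return 1


if __name__ == "__main__":
    sys.exit(main())
```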
Releasing software to production should be a non-event, and people shouldn’t be kept awake at night and/or working over the weekend to guarantee a successful release.
Sin #7 – Obscure integration points
Whenever I start a new software delivery initiative, one of the first things I try to learn is how many existing integration points the new solution will have to integrate with, as well as how well documented those integration points are.
Integrations are risky and, most of the time, complex. Things get even worse if we’re integrating with existing (and possibly legacy) systems. What data must I send for this specific API call? Where can I get this data from? What does the data mean? In which order must I call the different endpoints? Is there even an API to integrate with, or must we implement some sort of CDC (change data capture) to extract the data we need? Should we go with SOAP, REST, gRPC, CORBA, or file transfer over SFTP? Is there any documentation available on how those existing systems behave, or is it all in the heads of now long-gone SMEs, architects, and developers?
I’ve seen developers (almost?) go crazy trying to reverse engineer decades-old codebases to answer the questions above, then schedule meetings with that one person who knows everything, only to be pointed to another person who’s supposed to know everything, and finally realize that no one really knows anything about anything.
This is not only frustrating and soul-crushing but also (in my experience) one of the leading causes of developer attrition, and it can throw a software delivery initiative off the rails completely by delaying delivery by months, if not years.
Solution
I’m going to be honest with you: this is a hard one. The right thing to do can be really costly and take a long time to implement, and not all organizations have the appetite to take on such challenges. It’s not uncommon for C-level executives to kick the can down the road and leave the mess for their successors to deal with.
Sometimes the solution includes hiring people to work exclusively on reverse engineering the codebases and documenting how things work. Sometimes the best solution is to throw everything away and start anew. This might require hiring SMEs who know how things should work, which can be difficult and expensive. That’s one of the reasons we still see huge (and rich) organizations relying on decades-old mainframe systems they’d rather get rid of.
In some cases it’s worth applying a domain-driven approach and creating clean business interfaces in front of those existing systems, so that at least the new software being built has an easier time integrating with other components. This approach also helps decouple the new components being built from the legacy ones, making it easier (or possible) to replace the old systems at some point.
However, this latter approach doesn’t change the fact that those new, well-defined interfaces still must be integrated with the existing (potentially old, legacy) systems. Again, there’s no easy way out here, I’m sorry.
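To make that idea a bit more concrete, here’s a minimal sketch of such a clean business interface (an anti-corruption layer, in domain-driven design terms). The LegacyCustomerAdapter, the legacy record’s cryptic field names, and the Customer domain model are all hypothetical; the point is that new components only ever talk to the clean interface, while the messy translation to and from the legacy system is confined to a single, replaceable adapter.

```python
"""Minimal sketch of a clean business interface in front of a legacy system.
The legacy client, its field names, and the domain model are hypothetical."""
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Customer:
    """Clean domain model that new components work with."""
    customer_id: str
    full_name: str
    is_active: bool


class CustomerRepository(Protocol):
    """The clean business interface new code depends on."""
    def find_customer(self, customer_id: str) -> Customer: ...


class LegacyCustomerAdapter:
    """Anti-corruption layer: translates the legacy system's cryptic
    records into the clean domain model, keeping the mess in one place."""

    def __init__(self, legacy_client):
        self.legacy_client = legacy_client  # e.g. a SOAP or mainframe gateway

    def find_customer(self, customer_id: str) -> Customer:
        record = self.legacy_client.fetch("CUST", customer_id)  # hypothetical call
        return Customer(
            customer_id=record["CUST_NO"].strip(),
            full_name=f'{record["FRST_NM"].strip()} {record["LST_NM"].strip()}',
            is_active=record["STAT_CD"] == "A",  # legacy status code -> boolean
        )
```

New components depend only on CustomerRepository, so if the legacy system is eventually replaced, only the adapter needs to change.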
If your organization finds itself in such a position, it might be worth hiring a consultancy that specializes in solving this kind of issue to help you find the best way out of the mess. (My apologies for the shameless plug, but one’s got to make a living.)
Sin #8 – Lack of infrastructure ownership
This one could be considered a subsection of organizational silos, since it has to do with the software delivery team not being empowered to provision new infrastructure (or make changes to existing infrastructure), which can happen when there are no infrastructure and/or operations resources readily available to the team. It can also include lack of access to debugging capabilities like certain types of logs, monitoring, and observability tools.
For example, a very common scenario is to have the infrastructure team provide access to application logs via a log aggregation tool. This might sound like enough, but if the application runs in containers and a container fails to start up for some reason, the application won’t have the chance to have any logs collected, since it never even started.
When something like that happens, the infrastructure team becomes a bottleneck to delivery and, similarly to the QA bottleneck, it slows down delivery and is difficult to scale, since the more delivery teams you have, the more infrastructure folks you’ll need.
A more recent gatekeeping strategy I’ve seen some organizations implement is to hide their cloud capabilities behind a custom portal so they can have better control over their cloud resources. In my opinion, besides being expensive and potentially delaying delivery (after all, it can take a long time for those portals to be implemented), this completely defeats the purpose and one of the main advantages of the cloud, which is enabling teams to self-serve and experiment with new technologies quickly.
Solution
The delivery teams must be able to provision and modify their own infrastructure. A well-designed and well-implemented platform is of great help here. The focus of the infrastructure team should be on improving the developer experience through automation and a self-service platform.
Every developer should be empowered (with knowledge and access) to perform any infrastructure and/or operations tasks within the scope of the software they’re delivering.
Infrastructure and operations teams should focus on building and maintaining the aforementioned self-service platform, automating all the tasks that are usually performed manually by the infra and ops teams. By doing so, they will empower developers to practice DevOps (yes, DevOps is not a role but a way of working) and to become autonomous, self-sufficient, and, above all, productive.
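As a (purely hypothetical) illustration of what self-service can look like in practice, the sketch below shows a delivery team provisioning a database through an internal platform API instead of opening a ticket. The portal URL, endpoint, and payload are invented for the example; the underlying idea is that the platform encodes the organization’s guardrails (approved services, tagging, cost limits) so teams can provision on demand without waiting on the infrastructure team.

```python
"""Hypothetical example of self-service provisioning through an internal
platform API. The URL, endpoint, and payload fields are all invented."""
import json
import urllib.request

PLATFORM_API = "https://platform.internal.example.com/api/v1/resources"


def provision_database(team: str, service: str, size: str = "small") -> dict:
    """Request a database instance; the platform applies the org's guardrails
    (approved engines, tagging, cost limits) before provisioning anything."""
    payload = json.dumps({
        "type": "postgres",
        "size": size,
        "owner_team": team,   # used for cost allocation and access control
        "service": service,
    }).encode("utf-8")
    request = urllib.request.Request(
        PLATFORM_API,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    result = provision_database(team="payments", service="invoicing")
    print(f"Provisioned: {result}")
```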
When it comes to cloud governance, one can always use the cloud provider’s tools to restrict and/or control access to any resources that need that kind of control, be it because they’re not approved for use within the organization or just to keep costs in check. Having said that, delivery teams should be trusted with (and held accountable for) deciding which technologies and tools work best for them; otherwise you’d be curbing innovation.
Conclusion
There are many reasons why a software delivery initiative might fail. The ones I’ve mentioned in this article are the ones that hit closest to home based on my professional experience. The solutions I present for each of those problems are also very opinionated and based on what I’ve seen work best in the initiatives I saw succeed.
It’s not a coincidence that those solutions come from the Agile and Lean communities, since for the past 9 years I’ve worked as a consultant at Thoughtworks, where I not only learned the why and how of those practices but also had the opportunity to apply them and see the real impact they have on the success of software delivery initiatives.
I hope this article has resonated with you. I’ll consider myself successful if at any point while reading it you caught yourself thinking “yeah, that already happened (or is happening) to me and my team and it sucked (or still sucks)!”.
If that’s the case, please share in the comments what resonates (or doesn’t) with you, and share this article with others, especially the people in your organization you think are empowered to make the required transformations happen, so our industry can become more productive and less of a soul-eating, life-crushing one.
Acknowledgements
I’d like to thank my Thoughtworks colleagues Brandon Byars and Premanand Chandrasekaran for reviewing the draft of this article. I appreciate all their comments and suggestions, even the ones that were not incorporated in the final article.