Tuesday, 3 August 2010

Drowning in the Technical Debt Mountain

Over the last few years I’ve worked on a number of Agile projects that have stumbled into problems. Looking back, all of these projects have had similar characteristics. Namely:
  1. They were all long-term strategic projects vital to the success of the company building them
  2. They were all long-running projects with at least 18 months of development behind them
The problem that each of these projects encountered was that over time it became increasingly difficult and time-consuming to add new features to the products. The number of bugs and issues slowly rose, and each one took longer to fix than the last. Early in the project, teams had velocities measured in tens of points, but these gradually reduced until single-digit velocities were the norm. Why does this happen?
Taking a broad look across these projects highlights a number of problems that they all had in common:
Build Up of Technical Debt
Probably the most critical problem was that each of the projects had allowed Technical Debt to build up. They had all started well with good intentions and with emergent architectures. However, in the rush to get new features in, they neglected the continuous architectural improvement needed. Releases went out with rushed development and the intention to ‘go back and fix that’, but the business pressure to add new functionality always won over technical improvement.
Over time the interest on the technical debt builds up and you spend so long servicing it that velocity drops. You then end up in a catch-22 death spiral: you can’t add new functionality without spending extensive time and introducing many new defects, but you can’t just stop adding functionality and rewrite major portions of a critical product that is already live.
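To make the 'interest' metaphor concrete, here is a toy model of how velocity can decay. It is only a sketch: the function name, the capacity of 30 points per sprint, the debt created per point, the interest rate and the 36 sprints (roughly 18 months of two-week sprints) are all made-up assumptions for illustration, not measurements from the projects described here.

```python
def simulate_velocity(sprints, capacity=30.0, debt_per_point=0.5,
                      interest_rate=0.1, paydown_fraction=0.0):
    """Return the feature points delivered in each sprint (toy model).

    All parameters are hypothetical:
      capacity         - total effort available per sprint, in points
      debt_per_point   - debt added for every feature point delivered
      interest_rate    - effort lost per sprint for each unit of debt
      paydown_fraction - share of remaining effort reserved for paying debt
    """
    debt = 0.0
    velocities = []
    for _ in range(sprints):
        # Servicing the existing debt comes off the top of the sprint.
        interest = min(capacity, debt * interest_rate)
        available = capacity - interest
        # Optionally spend part of what is left on reducing the debt itself.
        paydown = min(debt, available * paydown_fraction)
        features = available - paydown
        # New features add debt; paying down removes it.
        debt += features * debt_per_point - paydown
        velocities.append(round(features, 1))
    return velocities


neglected = simulate_velocity(36)                         # never pay debt down
maintained = simulate_velocity(36, paydown_fraction=0.4)  # reserve 40% for clean-up

print("neglected :", neglected[0], "->", neglected[-1])
print("maintained:", maintained[0], "->", maintained[-1])
```

With these invented numbers the neglected run starts in the tens of points and decays into single digits, while the run that reserves effort for clean-up settles at a lower but stable velocity. The exact figures mean nothing; the shape of the two curves is the point.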

Too Much Focus on Frameworks
Another thing that I noticed about all these projects is that they spent a significant amount of time early on developing ‘frameworks’ and aiming to ‘develop for reuse’. This just wouldn’t happen on a short-duration project.
Unfortunately, building smart solutions in this way just doesn’t work in an agile world. Firstly, you just can’t fully know all the features the framework will require in advance. You therefore try to preempt future requirements, which at best results in unnecessary work. At worst (and most commonly) you end up building a framework that forces a way of working that you constantly have to shoe-horn future work into.
Asking the business to fund a major rework of a core framework mid or late in the project usually results in some very hard business questions, so development teams limp on with an increasingly complex and not-fit-for-purpose solution - which is ultimately the source of much of the technical debt build up.

Building Depth Before Width
Another problem that all these projects encountered is that they aimed to build one component of the system to completion rather than focusing broadly across the whole architecture. Thus, they spent a lot of time building some really cool features into one part of the product but then had to rush other parts when that time started to run out.
The net result of this approach is that you get architectural complexity building up in one area and insufficient architectural development in others. Ultimately you hit a number of problems: scalability and reliability issues in the areas where not enough time was spent, and maintainability issues in the areas where too much time was spent and too much complexity was added. All of these issues add to the burden of technical debt.
So, how do we avoid these problems happening? There are a number of key things that you need to do:
  1. Build an Architectural Straw-Man: The first thing that a long-running agile project must do is spend a sprint (or three) building a broad architectural straw-man that addresses the full breadth of the architecture. Focus on the smallest possible number of key features that touch the entire product without adding too much depth.
  2. Prove the Straw-Man: Make sure that the architectural straw-man is proven in terms of scalability, reliability and performance. It should have plenty to spare in all areas. Ideally tests for these non-functional areas should be automated so that they can be repeated as each new feature is added.
  3. Focus the Business on Critical Features: Often the business gets focused on trying to build something that is feature complete in one area. Make sure they don’t do this. Instead get them to focus on what is most important across the entire product. Build breadth of functionality rather than depth in just one area. Avoid at all costs release plans that focus on just one component of the solution.
  4. Don’t Build Frameworks!: There’s absolutely no point in building any frameworks or coding specifically for re-use. It just doesn’t work. Instead, just work on the straw-man and then on adding new features. When a new feature overlaps with something already existing then refactor out the duplication - do nothing more than that! Let the frameworks evolve and emerge through this process of refactoring out duplication; don’t try to preempt what might be needed.
  5. Continuously Validate the Architecture: Each new feature added to a product might have required some architectural change, might have required refactoring out of duplication or might even have added a new component. The key is that the development of each new feature should include work to validate that the scalability, reliability and performance have not been compromised. There should also be time available to improve the architecture before moving onto the next feature.
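The automated non-functional tests mentioned in points 2 and 5 can start out very small. The sketch below, using only the Python standard library, times one thin slice through the system and compares it against a budget. The names `measure_seconds`, `check_budget` and `place_order_slice`, and the 50 ms budget, are hypothetical; a real project would call its own end-to-end slice and choose budgets that leave plenty to spare.

```python
import time


def measure_seconds(operation, repetitions=100):
    """Return the average wall-clock seconds per call of `operation`."""
    start = time.perf_counter()
    for _ in range(repetitions):
        operation()
    return (time.perf_counter() - start) / repetitions


def check_budget(operation, budget_seconds, repetitions=100):
    """Return (within_budget, average_seconds) for `operation`."""
    average = measure_seconds(operation, repetitions)
    return average <= budget_seconds, average


def place_order_slice():
    """Hypothetical stand-in for one thin slice through the straw-man."""
    return sum(range(1000))


# Re-run this check after every feature; a failure means the architecture
# needs attention before the next feature is started.
within_budget, average = check_budget(place_order_slice, budget_seconds=0.05)
print("within budget:", within_budget)
```

Wall-clock timings vary between machines, so a check like this is best run in a consistent environment and with generous budgets. The same pattern extends to reliability and scalability checks by swapping in a different measurement.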
By starting with an architecture that is sound and leaving it in a sound state at the end of every feature addition you avoid the build up of technical debt. Many would argue that this slows down delivery of feature releases and the addition of new functionality. This may be true, but for a long-term strategic project the benefits of continued future productivity outweigh this many times. Also, avoiding building frameworks and minimising initial complexity often saves so much time that initial releases come in quicker anyway.
One question remains: how do we save any of the projects that I looked at that are already drowning under their mountains of technical debt? There’s no easy solution to this question - often cancelling the project may be the best answer. However, where that’s not feasible then here are some ideas that I’d like to explore further:
  • Strip out existing complexity - find the main areas of the project that are too complex and thus have high defect rates and which are difficult to maintain. Remove features and complexity from them until they become simple enough to fix and refactor. While this might not be popular with the business, a product that works and that can be maintained and enhanced is much better than one packed with features but which is non-functional and costs a fortune to maintain.
  • Move beyond existing architecture - accept that what went before is perhaps beyond saving. Start out with a new architectural straw-man that avoids fancy frameworks and is proven scalable, reliable and performant. Build all new functionality against this architecture and gradually migrate existing functionality to it. Prove the architecture and refactor for EVERY feature.
  • Write off all existing technical debt. NEVER let technical debt build up again.
  • Re-prioritise - work with the Product Owner to re-prioritise the backlog to focus on building the broadest set of essential features first from now on rather than trying to add every bell and whistle from the start.
  • Shoot any developer on the team who still thinks that building re-usable frameworks was a good idea - seriously!
