Tuesday 8 February 2011

The Architecture Doesn't Work

I've been building web and other large-scale, enterprise type applications for many years, following the standard accepted approaches to application architecture. I've been increasingly feeling that these approaches are somewhat flawed and that we need something better. Over this series of three blog posts I'm going to talk about some new technologies and architectures that I have been having some great success with. In this first of the three posts I want to look at why...

The Architecture Doesn't Work

If like me you've been building enterprise style applications in Java (or pretty much most other similar languages) then you will be intimately familiar with this layered architectural pattern:

The Standard Architecture

For those, who need a quick refresher, here's a quick summary of each layer:

  • Presentation Layer - Deals with rendering the current state of the application to the user/client. Could be a web page, an GUI application or even some structured representation (e.g. XML, JSON). Usually renders the contents of the Domain Objects, often by some form of template system.
  • Controllers - Handles events generated by the user/client and orchestrates the processing and Presentation. Typical approach is to accept an incoming request, extract any supplied request values, call a Service, pass the resulting Domain Objects to the appropriate Presentation component and finally return a result.
  • Service Layer - Executes business logic and processes against the Domain Objects and returns the resulting objects. Typically stateless and transactional. Processing usually involves using the Persistence Layer to load Domain Objects from a Persistent Store, working with the Domain Objects to mutate state, using the Persistence Layer to save the changes and then returning the resulting Domain Objects for presentation.
  • Domain Objects - The main objects representing entities in the business, their relationships and the business rules associated with each. Entities are usually persisted to a Persistent Store. Some people prefer anaemic Domain Objects, but I have argued that rich models are significantly better.
  • Persistence Layer - Provides an interface between the Service Layer and the Persistent Store that talks the language of Domain Objects. Allows the Service Layer to undertake CRUD (Create, Read, Update Delete) and Find operations against the Persistent Store.
  • Persistent Store - The storage for the data in the application. Usually a Relational Database Management System (RDBMS). Domain Objects are typically mapped to tables. Supports transactional semantics.
  • Object-Relational Mapping (ORM) - A library that assists the mapping between Domain Objects and the Persistent Store. Tries to solve the impedance miss-match between object and relational structures. Used by the Persistence Layer. Modern solutions use annotations on the Domain Objects (e.g. JPA & Hibernate)
  • Container Services - General services provided by a container. Supports features including: dependency injection; transactions; configuration; concurrency; service location; aop etc.

So, why doesn't this work?

It has become apparent to me over many projects that the above architectural approach suffers from a number of technical weaknesses. Often these have come about through the aim of simplifying the programming model for developers, but as we will see later the result is often the opposite. First, let's look at what I believe to be the main general technical problems that we find in this standard architecture. Later we will look at the impacts of these problems on the application and developers.

Thread-per-request model

Typically, in the above architecture, each request is handled in its own thread. Threads are usually managed in a pool and are then assigned to a request when it arrives. The thread and request are then linked until a response is returned, at which time the thread is returned back to the pool. This approach aims to simplify the development model by reducing the need for developers to have to think about concurrency issues - effectively giving them a single-threaded application path for handling each request.

Unfortunately, assigning a separate thread to each request is a very inefficient way of utilising system resources. Threads are fairly heavy-weight entities, having to maintain their own stacks, address space and so forth. The work involved in scheduling them is also quite large and given today's i/o intensive applications, many of them will be in a blocked state at any point in time. Additionally, most systems can only support a finite number of threads before suffering degradation in performance. (Some languages have tried to overcome the thread problem by introducing lighter weight constructs, such as fibers, but these still share many of the same problems for concurrent programming.)

The technical limitation of this thread-per-request approach is to actually stifle system scalability and impact application performance. Additionally, when the need to share state occurs, then developers must step outside their comfort zone into the world of concurrent programming: something the thread-per-request model has isolated them from.

Constant reloading from the persistent storage

Given a thread-per-request model, there is a natural approach to load all objects required to service a request from the persistent store on each request. Often this is completely unnecessary as a large percentage of the required data is unlikely to have changed from one request to another, or to be different between two separate running requests. However, with no convenient way to store state safely between requests or share infrequently changing state across threads we are forced down the load everything each time direction. The alternative is to drop down into shared state and concurrent programming - and we've already explored why that's not the best option! Techniques such as lazy loading try to minimise the overhead, but we have to accept that much of the data an application uses is being reloaded unnecessarily on each request.

Using database transactions for optimistic locking & concurrency control

Given each request is working with its own copy of application data kept in its own thread, how does the architecture handle multiple threads trying to update the same data at the same time? The general approach is to use database transactions to implement optimistic locking. Two requests can each load a separate copy of some data in their own threads and update it as they wish. When a thread's current transaction is committed, the state is written back to the persistent store. The database's transaction support ensures that each thread's changes are committed in an atomic and correctly isolated manner. Optimistic concurrency can follow a last-commit-wins model, or many ORM solutions offer a row versioning approach so the second transaction to commit changes to an object will fail with a stale data exception.

However, relying on the lowest level in the application stack to prevent concurrent updates isn't always the best solution. For a start, it allows both transactions to pretty much complete all their processing before one discovers that all its work is no longer valid. Secondly it causes problems when concurrency in the application layers above is required as some state would be shared across threads, and thus transactions - not nice to deal with. Finally, it doesn't actually simplify the problem because instead of having to deal with concurrency, the developer must now deal with the possibility of transaction rollback due to stale data, and the need for replaying of transactions. You'd be surprised how many applications I've come across that either don't address the problem (just raise a 500 error) or try to do something in this area and fail miserably!

Tight coupling between domain and persistent store through ORM

Another problem inherent in this architectural model is the tight coupling between the domain objects and the persistent store. This is caused by the need for the domain objects to be closely coupled to the object-relational mapping (ORM) solution. Why would we want to couple our domain objects this tightly? Well, it turns out that quite often we have to do this for performance reasons related to needing to reload from the persistent store for each request. Given that we are loading data so frequently we find that we have to tune our domain model not to make it optimal to our business model but to allow us to work most efficiently with the persistent store. And, don't even get me started on annotation based configuration (aka JPA) which forces us to consider and include all the details of our persistence strategy inside our representation of the business domain!

Scaling through adding servers

Once we have built our application using our architecture, we then have to deploy and scale it. Given that our thread-per-request approach doesn't make the best use of resources, the typical approach to scaling is to add additional servers, each running a separate copy of the application. Unfortunately this is not a particularly good way to achieve scale because we are still dependent on a shared database - and we are making heavy use of the database by loading data on each request and also using it for concurrency control.

Additionally, this approach to scalability also causes session based data sharing problems. With a couple of servers we can perhaps use session clustering with some success. As the number of servers grows we end up either having to use sticky sessions (i.e. tie each session to a single server - which means sessions are lost if that server crashes) or use some other way of persisting session data. This usually involves either adding session state into our already stressed database or introducing a whole new technology (distributed cache) to handle this data.

Static configuration

While not as significant as many of the above items, our standard architecture, and the libraries it is built upon, tends to push us towards a model of static configuration held in XML/YAML/Properties or other similar files. This increases the difficulty in managing our application at scale and in trying to make dynamic changes to values to tune the running system. When we want to support dynamic configuration we are often forced down the route of exposing properties to a management console (i.e. JMX) or pushing configuration down into the database or distributed cache, which we already saw above is not the ideal direction.

Poor approach to fault tolerance

Finally, we have the problem of fault tolerance. Typically, the approach to exceptions at any stage of processing a request is one of: catch the exception at the top level, log it and then display a generic "Oops, something went wrong!" message to the user. Hardly a great user experience. Unfortunately our architecture makes doing anything more than this prohibitively hard. For example, we might want to put some context information in the message. However, when the exception was thrown the developer might not have added this context information to the exception or, more likely, the exception was thrown by a library (such as the ORM) which is unaware of the context in the first place. We might also try to implement a re-try strategy, but implementing retries in a thread-per-request model is actually very difficult - you have to remember to throw everything you currently have away (because memory state in the thread may not be consistent with the database), clear all caches and so on. Try it, it's pretty hard!

And what is the impact?

So, given all of the above technical weaknesses with our current architectural approach, what effects do these have on our application?

Achieving performant applications is more difficult than it should be

The first major impact is that the architecture makes it hard to achieve application performance: the thread-per-request model makes poor use of resources and, loading unchanged data from the database for each request is inefficient and overloads the database.

Given these problems we generally find that we have to consider performance far earlier and in much greater depth than we really should. I'm not saying that we should never develop without considering performance (that would be negligent), but that the need to optimise our well written code and architecture for performance should be something we do rarely and in response to specific issues identified during performance testing. Designing a domain model so that it has the most efficient mapping to a database in order to achieve basic application performance is neither desirable nor the best way of capturing an accurate representation of the business domain. Compromising simple code for more complex but performant alternatives just to meet basic performance criteria is never desirable.

Another impact of the poor performance characteristics of our architecture is that we often end up having to circumvent the simplified thread-per-requets model in order to get a performant and usable application. What do we mean by this? We have to add some form of shared state - exactly what the simplified request model was put in place to avoid. Typically, we do this in one of two ways: adding caching or doing concurrent programming.

When an application proves to be less performant than required, most developers will initially suggest adding caches for data and results that don't change regularly. There are plenty of places in an application where caching can be applied. However, implementing caching is SIGNIFICANTLY more complex than most developers like to assume. Let's consider some of the things that need to be addressed: how do we keep the cache in sync with the database? what's the expiry strategy? what's the update strategy? do we pre-warm? do we block reads during an update or just block concurrent updates? is a cache per server fine or do we need a distributed solution? how will we ensure we don't get cache deadlock? As you can see, implementing a cache is actually a very complex option which requires great skill to get right. Caches should be reserved for special cases and should not be a catch-all for badly performing code and architecture.

The alternative approach to a cache is to introduce some form of shared state with concurrent access. Rather than loading all objects on each request, we keep common objects in memory and share them across and between requests. However, this comes with many of the same problems as using caching, plus the requirement of transaction management for updating this shared state. If done right, shared state with concurrent access can be simpler, more elegant and more flexible than caching, but it's the doing it right which is the challenge.

Now, this might be an over-generalisation, but my experience has shown that most 'average' developers can't write concurrent code using shared mutable state, multiple threads and synchronisation blocks. They usually end up with deadlocks, inconsistent state and difficult to track down bugs. Why? Because it is HARD and also because the single-threaded thread-per-requet model that they are used to programming to has isolated them from any considerations of concurrency. It's true that modern concurrency libraries (e.g. java.util.concurrent) have helped the problem, but most developers are using these without understanding exactly the reasons why. Just ask an average java programmer to explain the 'volatile' keyword and you will see what I mean!

Additionally, our dominant L-mode brain activity is optimised for linear, single-threaded thought. This in turn makes thinking about concurrency and all its implications a very unnatural process. I don't have any proof, but I have some very strong suspicions that a large proportion of the developer community is just unable to comprehend the real impacts of concurrent threads reading and writing in a shared state model.

Scaling applications is difficult

The other major impact of our current architectural approach is that it doesn't scale very well as application load increases. Firstly, the thread-per-request model just doesn't make good use of resources. Application servers have a finite set of threads that they are able to handle at any one time. Going beyond this limit significantly slows down the server - even though at any time many of the threads will be blocked on i/o. Once a server has reached thread saturation, the only solutions are to speed up performance (which we have seen is very hard) to free threads up more quickly, or add more application servers.

However, adding more application servers doesn't actually solve the problem. All it does is enable more concurrent threads to run, which in turn puts more load on the underlying database. Given the approach of re-retrieving data on each request we just end up overloading the database. We then need to scale the database to handle the extra load - and scaling databases is a VERY specialist job and VERY expensive as well. There's no way we can follow a path of continued scalability following this model. Clearly having an architecture based around cheap commodity hardware running app servers doesn't work if you need to massively scale databases to get any performance - so we are back to caching and shared state again, and we already explored what a nightmare these can be.

The final difficulty of scaling the application comes in the way configuration and shared session state is managed. At low scale, configuration by static config files is fine. Sharing session state through session clustering or sticky sessions is also acceptable. However, as the number of servers grows and infrastructure becomes more complex you soon end up in configuration hell - different versions of the application with slightly different configurations, massive challenges tuning parameters on running servers and so forth. Additionally, sticky sessions become less feasible and any form of session state clustering becomes impossible. Session state therefore needs to be pushed down into the database, increasing its overload and scalability issues or we start having to introducing distributed shared state solutions, with all the problems this entails.

So, it really doesn't work. Where do we go now?

We've seen that the current standard layered architecture that we are using to build applications causes a number of performance and scalability problems. Mostly these have come about from the desire to shield 'average' developers from the complexity of concurrent programming with shared mutable state. Unfortunately this simplicity causes performance problems and impedes scalability, requiring that developers must work around the restrictions in order to build something that works. There must be a better approach that solves these problems without needing developers to become shared mutable state concurrent programming gurus!

In the next post in this series I will look at a solution to these problems using lightweight asynchronous 'actors' and immutable messages. The final post will then look at some specific technology solutions that I have employed to implement this new approach.

2 comments:

  1. Hi Chris,

    not sure you can call this architecture broken. Its worked for countless webapps over many years.

    Perhaps what you are trying to say is that there are certain apps where this architecture is not good enough. That I feel may be a minority.

    Please don't throw the baby out with the bathwater.

    ReplyDelete
  2. Thanks for your comment. I do agree with you that this architecture can be applied successfully. I've delivered many webapps using this model. Perhaps broken is a bit too sensationalist ;-)

    However I would have to say that on pretty much every non-trivial app I've built using this architecture, I've hit some form of performance or scalability issue. To solve them I've always had to drop back to a combination of shared state, caching, concurrent programming and compromises in my domain model. An approach that allows me to deliver performance and scalability using well structured code and without all these difficulties is long overdue.

    ReplyDelete