A CQRS+ES Retrospective

I’ve kind of neglected my blog recently as I’ve been too damn busy and tired to do anything else, but I’m trying to put that right, starting with this post. The culprit for occupying so much of my time and thoughts is a new system we’ve been developing for our Procurement department who, believe it or not, here in the 21st century, run it all off Excel. That needed to change, so we set off on a journey to understand the often complex world they inhabit, apply some DDD principles, and try to deliver a system that meets their needs. Despite some hiccups along the way we’re nearly there, and we should be going live shortly. This post is a look back at what we’ve done and how we did it.

The last six or seven months have been interesting, to say the least. From the start I felt CQRS and Event Sourcing would be a good fit for this project (I’ve been learning and prototyping the concepts for the last couple of years) and I set about selling the idea to the rest of the team. I read a tweet by Eric Evans recently pointing out that it’s been five years since Greg Young first started talking about CQRS and Event Sourcing. It’s been around quite a while now, but to most people, including those I needed to convince, it’s something alien and a little bit frightening. However, fair play to them, it was adopted with relatively few concerns, and after a while people started to see progress and became more comfortable with the ideas. We’ve had to change course a few times as our understanding developed, and one change in particular was quite drastic, but the chosen architecture meant we had very little friction when doing so.

There’s something beautiful to me about the CQRS + ES style. Everywhere I look throughout the codebase, I get a very strong picture of how the application is held together. It’s very obvious what the responsibilities of each class and namespace are, e.g. Commands, CommandHandlers, Events, EventHandlers, Projections, etc., and it’s easy to picture the flow of messages through the system. This is in stark contrast to the typical n-tier, multi-layered approach where, to be honest, I’ve never seen anything that hasn’t resembled a big ball of mud to some degree. And behind that elegant structure sits a database that stores events, an append-only log of all the actions that were performed in the system. This too has benefits. Apart from being an audit log you can rely on as to what actually happened in your system, it reduces cognitive load because you only need to know that your database contains events. You don’t need to think in terms of entities sitting in one table joined to entities sitting in another, the relationships between them, and what the data actually means. You don’t need to worry about efficient fetching strategies in an ORM to get the data for a query (because you don’t query it!). For aggregates, you simply load up their events and replay them to get back to current state, and you’re done. On the read side, view models are a different story. For us they’re stored as documents in RavenDB and, again, are simple to retrieve and bind to the screen.
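
To make that replay step concrete, here’s a minimal sketch of loading an aggregate from its events. The member names (LoadFromHistory, When) are illustrative, not lifted from our actual base class:

using System.Collections.Generic;

// Illustrative event; the real one appears later in this post.
public class ItemWasCreatedEvent
{
    public string PartNumber { get; set; }
}

public class Item
{
    public string PartNumber { get; private set; }

    // Replay the stored events, in order, to get back to current state.
    public static Item LoadFromHistory(IEnumerable<object> history)
    {
        var item = new Item();
        foreach (var e in history)
            item.When((dynamic)e); // dispatch to the matching handler at runtime
        return item;
    }

    private void When(ItemWasCreatedEvent e)
    {
        PartNumber = e.PartNumber;
    }
}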

Whilst most CQRS reading material tends to focus on asynchronous, highly scalable distributed systems where transactions are a no-no, you don’t have to do it that way. There is nothing prescriptive about CQRS. Trade-offs are everywhere depending on your needs. For us, going with transactions was one such trade-off. Our target audience for the Procurement application is fairly small, somewhere around eight users to begin with. Taking on board the complexity that comes with every command being asynchronous was too high a price to pay in our situation. We felt it best to keep things simple and familiar as we started down our path, so we went with the familiarity of transactions and synchronous requests and, of course, the sky hasn’t fallen on our heads; it works just fine. We lose the ability to scale (or suffer the DTC – ugh!) but I don’t think we’re going to have that problem with this application. And of course, the first law of distributed computing is don’t distribute ;). I know, too, that should the need ever arise to go async with our commands, it won’t be some massive refactoring of the application.
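
For illustration, here’s roughly what that synchronous trade-off looks like in spirit: the handler runs in-process inside a transaction and the caller finds out immediately whether it succeeded. The interface names are hypothetical, a sketch rather than our actual infrastructure:

using System.Transactions;

public interface ICommand { }

public interface IHandleCommand<TCommand> where TCommand : ICommand
{
    void Handle(TCommand command);
}

public class SynchronousCommandBus
{
    public void Send<TCommand>(TCommand command, IHandleCommand<TCommand> handler)
        where TCommand : ICommand
    {
        using (var scope = new TransactionScope())
        {
            handler.Handle(command); // load the aggregate, apply events, save
            scope.Complete();        // commit; an exception propagates straight back to the caller
        }
    }
}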

So, contrary to the name of this blog, our view models are immediately consistent, but other parts of the system do take advantage of eventual consistency. All our events are published on a bus using MSMQ. Subscribed to those events are a reporting service, an email service, and an SLA service. The email service is as simple as you’d expect: it sits and listens for particular events and sends an email on our application’s behalf. The SLA service is about the users acting upon particular events within a given timeframe; we use sagas to track the passage of time between events and, again, send emails out as necessary. The reporting service is a little more interesting. It outputs the event data into denormalised SQL Server tables to serve traditional business reporting needs. This is another benefit of the CQRS approach: instead of one RDBMS trying to be both an OLTP and an OLAP system, as you will often find, we are able to use the right tool for the job at hand. This allowed us to shape the reporting database as an OLAP-style star schema because its only responsibility is to serve reports. What really makes the Event Sourcing part of the project come alive is when you get to the point where you’re able to replay all your captured events from the beginning of time, push them through the event handlers, and see your reporting database rebuilt. Very cool.
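
The rebuild itself is conceptually tiny. A sketch, with illustrative interfaces rather than our actual code:

using System.Collections.Generic;

public interface IEventStore
{
    // All events ever stored, oldest first.
    IEnumerable<object> GetAllEventsInOrder();
}

public interface IProjectEvents
{
    // e.g. INSERT or UPDATE a row in a star-schema fact or dimension table.
    void Project(object @event);
}

public class ReportingDatabaseRebuilder
{
    private readonly IEventStore _store;
    private readonly IProjectEvents _projections;

    public ReportingDatabaseRebuilder(IEventStore store, IProjectEvents projections)
    {
        _store = store;
        _projections = projections;
    }

    public void Rebuild()
    {
        // Because the events are the system of record, replaying them from
        // the beginning of time reproduces the read model exactly.
        foreach (var e in _store.GetAllEventsInOrder())
            _projections.Project(e);
    }
}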

We’re using a single database, RavenDB, for both our events and view models, simply because we don’t need to scale our reads. RavenDB is a great choice and, when it comes to NoSQL databases, is one of only a few that support transactions. Working with it has shown me just how much more productive a person can be when they don’t have to fight the object-relational impedance mismatch. The more I use it, the more I discover just how useful it really is. For instance, whilst finishing up the last few remaining parts of the system, I began to think about how, going forward, we would migrate our events as and when they change to support new features of the application. Truth be told, this was one of those areas I had little experience in, even though I was comfortable with what was required. My first thought was that I would version my events and make the aggregate handle the new event as well as the old one. They’d look something like this:


public class ItemWasCreatedEvent
{
    public string PartNumber { get; private set; }
}

// v2 carries the new Description field; the aggregate handles both versions.
public class ItemWasCreatedEvent_v2
{
    public string PartNumber { get; private set; }
    public string Description { get; private set; }
}

This would mean that, during the loading of events into the aggregate, the old handler would be invoked for the old events in the database, but going forward the application would only ever raise the new event. The problem with this approach is that the aggregate can get quite bloated if you end up versioning your events frequently, as you have to keep the handlers around for all the different versions. An alternative would be some kind of in-place upgrade on the fly as and when the system encounters the old events, but with RavenDB there is yet another option: the Patching API. This allows you to make set-based changes to your JSON documents as a one-off operation, so we can simply modify the class of our old event to its new form without having to introduce any new event types or fill our aggregates with new handlers. Originally, I had thought I would need to write a utility that would take my changes and call that API, but now, with RavenDB 2.0, this patching can be done within the Raven Studio UI itself. There is a Patch tab that lets you write a patch to upgrade a document and even test it, so you can see the result without actually applying the change. You can change existing properties, add new ones, delete old ones, etc. When you’re ready, you can apply the patch to all the applicable documents in one operation. Having only discovered this functionality over the last couple of days, I’m excited by how simple the process now looks, and I’m almost certain that’s the approach we’ll take: change our events as necessary, patch the existing documents to match the new event, and then deploy. Simples.
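
And if you’d rather run the patch from code than from the Studio, the client exposes the same set-based patching. A hedged sketch follows; the index name, collection tag, and exact client calls here are assumptions based on my reading of the RavenDB 2.x client API, so treat the details as illustrative:

using Raven.Abstractions.Data;
using Raven.Client.Document;

public static class EventUpgrader
{
    public static void AddDescriptionToItemCreatedEvents(DocumentStore store)
    {
        store.DatabaseCommands.UpdateByIndex(
            "Raven/DocumentsByEntityName",                          // built-in index over all documents
            new IndexQuery { Query = "Tag:ItemWasCreatedEvents" },  // limit to this collection
            new ScriptedPatchRequest
            {
                // Patch scripts are JavaScript; 'this' is the document being
                // patched. Here we add the new property with a default value
                // (the empty string is an assumption).
                Script = "this.Description = '';"
            });
    }
}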

In retrospect, whilst I’m happy overall, there are some decisions we took that I’m not too enamoured with. One is that we chose to write this as an ASP.NET MVC application. As this app is mainly for internal use, I think we should have made it a traditional desktop application, whether WinForms or WPF. Just like the often-repeated message within the community to stop and consider your choice of database rather than blindly going with an RDBMS, I think the same consideration should be given to the style of application itself. Anyone who knows me knows I’m not really a fan of writing web front ends. I don’t get a lot of enjoyment out of writing reams of HTML and JavaScript to create a rich UI, with all the fudges involved for different browsers, etc. I think desktop applications still have a place in the world, and applications for internal use are one such scenario. Having said that, I don’t think it would be a big job to put a rich client UI on the application if we needed or wanted to, but it’s unlikely that will ever happen now. Lesson definitely learned, though.

The other decision is potentially more serious: coupling to another bounded context. At some point the user submits data to another application, an existing enterprise RDBMS that deals with orders for the company. In my opinion, this should have been done asynchronously through messaging. There is absolutely no need for our new application to have any knowledge of the other system whatsoever, and should the interface to that system ever change, we will have to update and redeploy our new system too. But, alas, we’ve done it, and we’ll have to live with that decision, at least for the time being.

Finally, just as in any application with a degree of domain complexity, discovering the true aggregates for this application was quite a lot of work and took some time to get right. Understanding the difference between an aggregate and an object graph is essential to ensuring your transactional boundaries are correct, and for that I have to thank the work done by Vaughn Vernon in his Effective Aggregate Design essays; they’re well worth reading (and re-reading).

All the architecture patterns in the world can’t help you if you don’t capture the things that are important to the people who will be using your application. This is what caused us to take such a drastic turn in the middle of the project. We had one of those “breakthrough” moments when everything became a lot clearer. As a result we threw away a lot of the code we’d already written, but the clear separation gained from going down the CQRS route meant we had relatively little trouble adapting and changing course. It also helped that we have a large suite of unit and integration tests to keep us on the straight and narrow.

Overall, the combination of CQRS, Event Sourcing, and RavenDB has made this probably the most enjoyable project I’ve ever worked on. Taking an idea from start to finish using these architectural principles and modern database technologies has been a fantastic ride, and I’ve learned so much. Combined with some of the messaging techniques I’ve picked up over the last two years or so, the way I write code has changed, becoming more functional in nature, and it’s allowed me to see ways of writing simpler, more composable, and more testable code. Some of those ideas I was able to apply to this project to good effect, and it’s something I plan to blog about in the near future. Would I do it again? Absolutely, given a domain with enough complexity. Yes, it requires effort, and you certainly have to think more compared to a traditional CRUD application, but the result, I feel, is worth it: you end up with a more maintainable and flexible application.
