For as long as we’ve been talking about microservices, we’ve been talking about data. In fact, before we even had the word microservices in our lexicon, back when it was just good old-fashioned service-oriented architecture, we were talking about data: how to access it, where it lives, who “owns” it. Data is all-important—vital for the continued success of our businesses—but has also been seen as a massive constraint in how we design and evolve our systems.
My own journey into microservices began with work I was doing to help organizations ship software more quickly. This meant a lot of time was spent on things like cycle time analysis, build pipeline design, test automation, and infrastructure automation. The advent of the cloud was a huge boon to the work we were doing, as the improved automation made us even more productive. But I kept hitting up against some fundamental issues. All too often, the software wasn’t designed in a way that made it easy to ship. And data was at the heart of the problem.
Back then, the most common pattern I saw for service-based systems was sharing a database among multiple services. The rationale was simple: the data I need is already in this other database, and accessing a database is easy, so I’ll just reach in and grab what I need. This may allow for fast development of a new service, but over time, it becomes a major constraint.
As I explained in my book, Building Microservices, a shared database creates a huge coupling point in your architecture. It becomes difficult to understand what changes can be made to a schema shared by multiple services. David Parnas showed us back in 1971 that the secret to creating software whose parts could be changed independently was to hide information between modules. But exposing a schema to multiple services destroys, in one fell swoop, our ability to evolve our codebases independently.
As the needs and expectations of software changed, IT organizations changed with them. The shift from siloed IT toward business- or product-aligned teams helped improve the customer focus of those teams. This shift often happened in concert with the move to improve the autonomy of those teams, allowing them to develop new ideas, implement them, and then ship them, all while reducing the need for coordination with other parts of the organization. But highly coupled architectures require heavy coordination between systems and the teams that maintain them—they are the enemy of any organization that wants to optimize autonomy.
I gradually came to the realization that how we store and share data is key to ensuring we develop loosely coupled architectures. Well-defined interfaces are key, as is hiding information. If we need to store data in a database, that database should be part of a service, and not accessed directly by other services. A well-defined interface should guide when and how that data is accessed and manipulated.
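To make that a little more concrete, here is a minimal sketch in Python of a service that owns its database and exposes only a narrow interface. Everything in it is an assumption made for illustration: the `OrderService` and `Order` names, the table layout, and the use of an in-memory SQLite database are hypothetical, and a real service would sit behind a network API rather than a class.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Order:
    """The shape of the data that callers are allowed to see."""
    order_id: int
    customer: str
    total_cents: int


class OrderService:
    """Hypothetical service that keeps its database entirely to itself."""

    def __init__(self) -> None:
        # The connection is private: no other service is handed this object,
        # so the schema below can change without anyone else noticing.
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE orders ("
            "id INTEGER PRIMARY KEY, customer TEXT, total_cents INTEGER)"
        )

    def place_order(self, customer: str, total_cents: int) -> Order:
        """Store a new order and return it through the public interface."""
        cur = self._db.execute(
            "INSERT INTO orders (customer, total_cents) VALUES (?, ?)",
            (customer, total_cents),
        )
        self._db.commit()
        return Order(cur.lastrowid, customer, total_cents)

    def get_order(self, order_id: int) -> Optional[Order]:
        """Look up an order by ID; return None if it does not exist."""
        row = self._db.execute(
            "SELECT id, customer, total_cents FROM orders WHERE id = ?",
            (order_id,),
        ).fetchone()
        return Order(*row) if row else None
```

Other services would call `place_order` and `get_order` and never issue SQL against the `orders` table themselves, which is the whole point: the schema stays hidden behind the interface.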
Much of my time over the past several years has been taken up with pushing this idea. But while people increasingly get it, challenges remain. The reality is that services do need to work together and do sometimes need to share data. How do you do that effectively? How do you ensure this is done in a way that is sympathetic to your application’s latency and load conditions? What happens when one service needs a lot of information from another?
Enter streams of events, specifically the kinds of streams that technology like Kafka makes possible. I have to say that, when I first looked at it, Kafka seemed pretty similar to the tooling I’d been using for decades, but with a few fancy scalability properties thrown in. But as I looked more closely, I realized that Kafka’s ability to persist events changes things: it’s not just a message broker, it’s also a message store. This opens up some interesting possibilities for how services can share data.
One of the most intriguing is the idea of “turning the database inside out,” where the internal subcomponents of a database (storage, log, cache, and view) are broken apart and deployed separately. The log can then be used as an event store, and those events are available for other services to consume and to build caches or views from, often using stream processing tooling. My old employer, Thoughtworks, called this pattern “Event streaming as the source of truth,” highlighting the benefit as “the potential to reduce duplication efforts between local persistence and integration.” So the pattern is a bit like a shared database: there is a shared data set that everyone can use. But it’s also a bit different, as each service’s view stays decoupled from the others.
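A toy sketch in Python may help show the shape of this idea. It is not Kafka and does not use any Kafka API: the `EventLog` class is a hypothetical in-memory stand-in for a persistent log, and the `Event` type and the two view functions are names invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A hypothetical 'order placed' fact: who ordered, and for how much."""
    customer: str
    amount_cents: int


class EventLog:
    """In-memory stand-in for a persistent, append-only log of events."""

    def __init__(self) -> None:
        self._events = []

    def append(self, event: Event) -> int:
        """Add an event to the end of the log and return its offset."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset: int = 0) -> list:
        """Events are retained, so any reader can replay from any offset."""
        return self._events[offset:]


def totals_by_customer(log: EventLog) -> dict:
    """View built by one hypothetical service: spend per customer."""
    totals = {}
    for event in log.read_from(0):
        totals[event.customer] = totals.get(event.customer, 0) + event.amount_cents
    return totals


def order_count(log: EventLog) -> int:
    """View built by a different hypothetical service: number of orders."""
    return len(log.read_from(0))
```

Both views are derived from the same shared events, yet neither knows how the other is structured. Either one could be thrown away and rebuilt by replaying the log, which is what keeps the views decoupled in a way that tables in a shared schema are not.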
I suspect this concept will get as many skeptical responses as I got back when I was suggesting moving away from giant shared databases. But there is something intuitively appealing about this whole idea: when I look at the microservice architectures I help companies build, data always flows from service to service in some capacity or other. Sometimes this is with messaging and sometimes with REST or RPCs. Whichever way it happens to be framed, data is in motion.
So none of my advice of old has changed: microservices should have databases of their own, but modeling data as events and storing them centrally provides an interesting alternative to the normal ways microservices share data. The concepts may well seem odd at first, but stick with them. They’ll take you on a very interesting journey.