Chapter 1. Introduction

Microservices certainly aren’t a panacea, but they’re a good solution if you have the right problem. And each solution also comes with its own set of problems. Most of the attention when approaching the microservice solution is focused on the architecture around the code artifacts, but no application lives without its data. And when distributing data between different microservices, we have the challenge of integrating them.

In the sections that follow, we’ll explore some of the reasons you might want to consider microservices for your application. If you understand why you need them, we’ll be able to help you figure out how to distribute and integrate your persistent data in relational databases.

The Feedback Loop

The feedback loop is one of the most important processes in human development. We need to constantly assess the way that we do things to ensure that we’re on the right track. Even the classic Plan-Do-Check-Act (PDCA) process is a variation of the feedback loop.

In software—as with everything we do in life—the longer the feedback loop, the worse the results are. And this happens because we have a limited amount of capacity for holding information in our brains, both in terms of volume and duration.

Remember the old days when all we had as a tool to code was a text editor with a black background and green fonts? We needed to compile our code to check if the syntax was correct. Sometimes the compilation took minutes, and when it was finished we had already lost the context of what we were doing before. The lead time¹ in this case was too long. We improved when our IDEs featured on-the-fly syntax highlighting and compilation.

We can say the same thing for testing. We used to have a dedicated team for manual testing, and the lead time between committing something and knowing if we broke anything was days or weeks. Today, we have automated testing tools for unit testing, integration testing, acceptance testing, and so on. We improved because now we can simply run a build on our own machines and check if we broke code somewhere else in the application.

These are some of the numerous examples of how reducing the lead time generated better results in the software development process. In fact, we might consider that all the major improvements we had with respect to process and tools over the past 40 years were targeting the improvement of the feedback loop in one way or another.

The current improvement areas that we’re discussing for the feedback loop are DevOps and microservices.

DevOps

You can find thousands of different definitions regarding DevOps. Most of them talk about culture, processes, and tools. And they’re not wrong. They’re all part of this bigger transformation that is DevOps.

The purpose of DevOps is to make software development teams reclaim the ownership of their work. As we all know, bad things happen when we separate people from the consequences of their jobs. The entire team, Dev and Ops, must be responsible for the outcomes of the application.

There’s no bigger frustration for developers than watching their code stay idle in a repository for months before entering into production. We need to regain that bright gleam in our eyes from delivering something and seeing the difference that it makes in people’s lives.

We need to deliver software faster—and safer. But what are the excuses that we lean on to prevent us from delivering it?

After visiting hundreds of different development teams, from small to big, and from financial institutions to ecommerce companies, I can testify that the number one excuse is bugs.

We don’t deliver software faster because each one of our software releases creates a lot of bugs in production.

The next question is: what causes bugs in production?

This one might be easy to answer. The cause of bugs in production in each one of our releases is change: both changes in code and in the environment. When we change things, they tend to fall apart. But we can’t use this as an excuse for not changing! Change is part of our lives. In the end, it’s the only certainty we have.

Let’s try to make a very simple correlation between changes and bugs. The more changes we have in each one of our releases, the more bugs we have in production. Doesn’t it make sense? The more we mix the things in our codebase, the more likely it is that something gets screwed up somewhere.

The traditional way of trying to solve this problem is to have more time for testing. If we delivered code every week, now we need two weeks—because we need to test more. If we delivered code every month, now we need two months, and so on. It isn’t difficult to imagine that sooner or later some teams are going to deploy software into production only on anniversaries.

This approach sounds anti-economical. The economic approach for delivering software in order to have fewer bugs in production is the opposite: we need to deliver more often. And when we deliver more often, we’re also reducing the amount of things that change between one release and the next. So the fewer things we change between releases, the less likely it is for the new version to cause bugs in production.

And even if we still have bugs in production, if we only changed a few dozen lines of code, where can the source of these bugs possibly be? The smaller the changes, the easier it is to spot the source of the bugs. And it’s easier to fix them, too.
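
A toy model makes this relationship concrete. The sketch below assumes, purely for illustration, that every change carries the same small, independent chance of introducing a bug. The function name and the 1% figure are hypothetical, not measurements.

```python
def probability_of_bug(changes: int, p_per_change: float) -> float:
    """Chance that at least one of `changes` independent changes introduces a bug.

    Assumes every change has the same probability `p_per_change` of being
    faulty, which is a simplification made only for illustration.
    """
    return 1.0 - (1.0 - p_per_change) ** changes


if __name__ == "__main__":
    # Compare a small, a medium, and a large release under the assumed 1% rate.
    for changes_in_release in (10, 100, 1000):
        risk = probability_of_bug(changes_in_release, 0.01)
        print(f"{changes_in_release:>5} changes -> {risk:.1%} chance of at least one bug")
```

Under these assumptions, the chance of a clean release drops quickly as the number of changes grows, which is the intuition behind releasing fewer changes more often.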

The technical term used in DevOps for the amount of change between one software release and the next is batch size. So, if we had to coin just one principle for DevOps success, it would be this:

Reduce your batch size to the minimum allowable size you can handle.

To achieve that, you need a fully automated software deployment pipeline. That’s where the processes and tools fit together in the big picture. But you’re doing all of that in order to reduce your batch size.

Bugs Caused by Environment Differences Are the Worst

When we’re dealing with bugs, we usually have log statements, a stacktrace, a debugger, and so on. But even with all of that, we still find ourselves shouting: “but it works on my machine!”

This horrible scenario—code that works on your machine but doesn’t in production—is caused by differences in your environments. You have different operating systems, different kernel versions, different dependency versions, different database drivers, and so forth. In fact, it’s a surprise things ever do work well in production.

You need to develop, test, and run your applications in development environments that are as close as possible in configuration to your production environment. Maybe you can’t have an Oracle RAC and multiple Xeon servers to run in your development environment. But you might be able to run the same Oracle version, the same kernel version, and the same application server version in a virtual machine (VM) on your own development machine.

Infrastructure-as-code tools such as Ansible, Puppet, and Chef really shine, automating the configuration of infrastructure in multiple environments. We strongly advocate that you use them, and you should commit their scripts in the same source repository as your application code.² There’s usually a match between the environment configuration and your application code. Why can’t they be versioned together?

Container technologies offer many advantages, but they are particularly useful for solving the problem of different environment configurations by packaging application and environment into a single containment unit: the container. More specifically, the result of packaging application and environment in a single unit is called a virtual appliance. You can set up virtual appliances through VMs, but they tend to be big and slow to start. Containers take virtual appliances one level further by minimizing the virtual appliance size and startup time, and by providing an easy way for distributing and consuming container images.

Another popular tool is Vagrant. Vagrant currently does much more, but it was created as a provisioning tool with which you can easily set up a development environment that closely mimics your production environment. You literally just need a Vagrantfile and some configuration scripts, and with a simple vagrant up command you can have a full-featured VM or container with your development dependencies ready to run.

Why Microservices?

Some might think that the discussion around microservices is about scalability. Most likely it’s not. Certainly we always read great things about the microservices architectures implemented by companies like Netflix or Amazon. So let me ask a question: how many companies in the world can be Netflix and Amazon? And following this question, another one: how many companies in the world need to deal with the same scalability requirements as Netflix or Amazon?

The answer is that the great majority of developers worldwide are dealing with enterprise application software. Now, I don’t want to underestimate Netflix’s or Amazon’s domain model, but an enterprise domain model is a completely wild beast to deal with.

So, for the majority of us developers, microservices is usually not about scalability; once again, it’s all about improving our lead time and reducing the batch size of our releases.

But we have DevOps that shares the same goals, so why are we even discussing microservices to achieve this? Maybe your development team is so big and your codebase is so huge that it’s just too difficult to change anything without messing up a dozen different points in your application. It’s difficult to coordinate work between people in a huge, tightly coupled, and entangled codebase.

With microservices, we’re trying to split a piece of this huge monolithic codebase into a smaller, well-defined, cohesive, and loosely coupled artifact. And we’ll call this piece a microservice. If we can identify some pieces of our codebase that naturally change together and apart from the rest, we can separate them into another artifact that can be released independently from the other artifacts. We’ll improve our lead time and batch size because we won’t need to wait for the other pieces to be “ready”; thus, we can deploy our microservice into production.

You Need to Be This Tall to Use Microservices

Microservices architectures encompass multiple artifacts, each of which must be deployed into production. If you still have issues deploying one single monolith into production, what makes you think that you’ll have fewer problems with multiple artifacts? A very mature software deployment pipeline is an absolute requirement for any microservices architecture. Some indicators that you can use to assess pipeline maturity are the amount of manual intervention required, the number of automated tests, the automatic provisioning of environments, and monitoring.

Distributed systems are difficult. So are people. When we’re dealing with microservices, we must be aware that we’ll need to face an entirely new set of problems that distributed systems bring to the table. Tracing, monitoring, log aggregation, and resilience are some of the problems that you don’t need to deal with when you work on a monolith.

Microservices architectures come with a high toll, which is worth paying if the problems with your monolithic approaches cost you more. Monoliths and microservices are different architectures, and architectures are all about trade-offs.

Strangler Pattern

Martin Fowler wrote a nice article regarding the monolith-first approach. Let me quote two interesting points of his article:

Almost all the successful microservice stories have started with a monolith that grew too big and was broken up.
Almost all the cases I’ve heard of a system that was built as a microservice system from scratch, it has ended up in serious trouble.

For all of us enterprise application software developers, maybe we’re lucky—we don’t need to throw everything away and start from scratch (if anybody even considered this approach). We would end up in serious trouble. But the real lucky part is that we already have a monolith to maintain in production.

The monolith-first approach is also called the strangler pattern because it resembles the development of a tree called the strangler fig. The strangler fig starts small at the top of a host tree. Its roots then start to grow toward the ground. Once its roots reach the ground, it grows stronger and stronger, and the fig tree begins to grow around the host tree. Eventually the fig tree becomes bigger than the host tree, and sometimes it even kills the host. Maybe it’s the perfect analogy, as we all have somewhere hidden in our hearts the deep desire of killing that monolith beast.

Having a stable monolith is a good starting point because one of the hardest things in software is the identification of boundaries within the domain model: things that change together, and things that change apart. Create wrong boundaries and you’ll be doomed with the consequences of cascading changes and bugs. And boundary identification is usually something that we mature over time. We refactor and restructure our system to accommodate the acquired boundary knowledge. And it’s much easier to do that when you have a single codebase to deal with, for which our modern IDEs will be able to refactor and move things automatically. Later you’ll be able to use these established boundaries for your microservices. That’s why we really enjoy the strangler pattern: you start small with microservices and grow around a monolith. It sounds like the wisest and safest approach for evolving enterprise application software.

The usual candidates for the first microservices in your new architecture are new features of your system or changing features that are peripheral to the application’s core. In time, your microservices architecture will grow just like a strangler fig tree, but we believe that the reality of most companies will still be one, two, or maybe even up to a half-dozen microservices coexisting around a monolith.
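
One way to picture a first microservice growing around a monolith is a routing facade that sits in front of both. What follows is a minimal sketch under that assumption. The names StranglerRouter, monolith, and shipping_service are hypothetical, and a real system would more likely do this routing in a proxy or gateway than in application code.

```python
from typing import Callable, Dict

# A handler takes a request path and returns a response body.
Handler = Callable[[str], str]


def monolith(path: str) -> str:
    """Stand-in for the existing monolith (hypothetical)."""
    return f"monolith handled {path}"


def shipping_service(path: str) -> str:
    """Stand-in for the first extracted microservice (hypothetical)."""
    return f"shipping microservice handled {path}"


class StranglerRouter:
    """Sends extracted paths to microservices and everything else to the monolith."""

    def __init__(self, fallback: Handler) -> None:
        self._fallback = fallback
        self._extracted: Dict[str, Handler] = {}

    def extract(self, prefix: str, handler: Handler) -> None:
        """Register a path prefix that has been moved out of the monolith."""
        self._extracted[prefix] = handler

    def handle(self, path: str) -> str:
        # Naive prefix match; the longest registered prefix wins.
        for prefix in sorted(self._extracted, key=len, reverse=True):
            if path.startswith(prefix):
                return self._extracted[prefix](path)
        return self._fallback(path)


if __name__ == "__main__":
    router = StranglerRouter(fallback=monolith)
    router.extract("/shipping", shipping_service)
    print(router.handle("/shipping/42"))
    print(router.handle("/orders/7"))
```

Each new extraction is one more registered prefix, so the monolith keeps serving everything that has not yet been moved.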

The challenge of choosing which piece of software is a good candidate for a microservice requires a bit of Domain-Driven Design knowledge, which we’ll cover in the next section.

Domain-Driven Design

It’s interesting how some methodologies and techniques take years to “mature” or to gain awareness among the general public. And Domain-Driven Design (DDD) is one of these very useful techniques that is becoming almost essential in any discussion about microservices. Why now? Historically we’ve always been trying to achieve two synergic properties in software design: high cohesion and low coupling. We aim for the ability to create boundaries between entities in our model so that they work well together and don’t propagate changes to other entities beyond the boundary. Unfortunately, we’re usually especially bad at that.

DDD is an approach to software development that tackles complex systems by mapping activities, tasks, events, and data from a business domain to software artifacts. One of the most important concepts of DDD is the bounded context, which is a cohesive and well-defined unit within the business model in which you define the boundaries of your software artifacts.

From a domain model perspective, microservices are all about boundaries: we’re splitting a specific piece of our domain model that can be turned into an independently releasable artifact. With a badly defined boundary, we will create an artifact that depends too much on information confined in another microservice. We will also create another operational pain: whenever we make modifications in one artifact, we will need to synchronize these changes with another artifact.
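
As a rough illustration of a well-placed boundary, the sketch below models two hypothetical bounded contexts, billing and shipping, that each keep their own view of a customer and share nothing but an identifier. All class, field, and function names here are assumptions made for the example.

```python
from dataclasses import dataclass


# --- Billing bounded context (hypothetical) ---
@dataclass(frozen=True)
class BillingCustomer:
    customer_id: int
    tax_id: str
    credit_limit_cents: int


def can_charge(customer: BillingCustomer, amount_cents: int) -> bool:
    """Billing only needs billing data to make its decision."""
    return amount_cents <= customer.credit_limit_cents


# --- Shipping bounded context (hypothetical) ---
@dataclass(frozen=True)
class ShippingRecipient:
    customer_id: int
    full_name: str
    street_address: str


def shipping_label(recipient: ShippingRecipient) -> str:
    """Shipping only needs shipping data to print a label."""
    return f"{recipient.full_name}\n{recipient.street_address}"


if __name__ == "__main__":
    billing_view = BillingCustomer(customer_id=1, tax_id="123-45", credit_limit_cents=50_000)
    shipping_view = ShippingRecipient(customer_id=1, full_name="Alice Doe", street_address="1 Main St")
    print(can_charge(billing_view, 20_000))
    print(shipping_label(shipping_view))
```

If can_charge needed the street address, or shipping_label needed the credit limit, that would be a hint that the boundary is in the wrong place.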

We advocate for the monolith-first approach because it allows you to mature your knowledge around your business domain model first. DDD is a useful technique for identifying the bounded contexts of your domain model: things that are grouped together and achieve high cohesion and low coupling. From the beginning, it’s very difficult to guess which parts of the system change together and which ones change separately. However, after months, or more likely years, developers and business analysts should have a better picture of the evolution cycle of each one of the bounded contexts. These are the ideal candidates for microservices extraction, and that will be the starting point for the strangling of our monolith.

Note

To learn more about DDD, check out Eric Evans’s book, Domain-Driven Design: Tackling Complexity in the Heart of Software, and Vaughn Vernon’s book, Implementing Domain-Driven Design.

Microservices Characteristics

James Lewis and Martin Fowler provided a reasonable common set of characteristics that fit most of the microservices architectures:

Componentization via services
Organized around business capabilities
Products not projects
Smart endpoints and dumb pipes
Decentralized governance
Decentralized data management
Infrastructure automation
Design for failure
Evolutionary design

All of the aforementioned characteristics certainly deserve their own careful attention. But after researching, coding, and talking about microservices architectures for a couple of years, I have to admit that the most common question that arises is this:

How do I evolve my monolithic legacy database?

This question provoked some thoughts with respect to how enterprise application developers could break their monoliths more effectively. So the main characteristic that we’ll be discussing throughout this book is Decentralized Data Management. Trying to simplify it to a single-sentence concept, we might be able to state that:

Each microservice should have its own separate database.

This statement comes with its own challenges. Even if we think about greenfield projects, there are many different scenarios in which we require information that will be provided by another service. Experience has taught us that relying on remote calls (either some kind of Remote Procedure Call [RPC] or REST over HTTP) usually is not performant enough for data-intensive use cases, both in terms of throughput and latency.
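
The challenge can be sketched in a few lines, assuming two hypothetical services that each own a separate in-memory SQLite database. Table, class, and method names are illustrative only, and the direct method call between the services stands in for what would be a remote call in a real deployment.

```python
import sqlite3
from typing import Optional


def make_customer_db() -> sqlite3.Connection:
    """Database owned exclusively by the customer service."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    db.execute("INSERT INTO customer (id, name) VALUES (1, 'Alice')")
    db.commit()
    return db


def make_order_db() -> sqlite3.Connection:
    """Database owned exclusively by the order service."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE customer_order ("
        "id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, total_cents INTEGER NOT NULL)"
    )
    db.execute("INSERT INTO customer_order (id, customer_id, total_cents) VALUES (10, 1, 2599)")
    db.commit()
    return db


class CustomerService:
    def __init__(self, db: sqlite3.Connection) -> None:
        self._db = db

    def name_of(self, customer_id: int) -> Optional[str]:
        row = self._db.execute(
            "SELECT name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


class OrderService:
    def __init__(self, db: sqlite3.Connection, customers: CustomerService) -> None:
        self._db = db
        self._customers = customers

    def describe(self, order_id: int) -> str:
        row = self._db.execute(
            "SELECT customer_id, total_cents FROM customer_order WHERE id = ?", (order_id,)
        ).fetchone()
        if row is None:
            raise KeyError(order_id)
        customer_id, total_cents = row
        # No SQL join is possible across databases: the name must come from the
        # other service. In production this would be a network round trip.
        name = self._customers.name_of(customer_id)
        return f"order {order_id} for {name}: {total_cents} cents"


if __name__ == "__main__":
    orders = OrderService(make_order_db(), CustomerService(make_customer_db()))
    print(orders.describe(10))
```

What used to be a single join in the monolithic database becomes one extra call per lookup, which is why data-intensive use cases need a different integration strategy.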

This book is all about strategies for dealing with your relational database. Chapter 2 addresses the architectures associated with deployment. The zero downtime migrations presented in Chapter 3 are not exclusive to microservices, but they’re even more important in the context of distributed systems. Because we’re dealing with distributed systems with information scattered through different artifacts interconnected via a network, we’ll also need to deal with how this information will converge. Chapter 4 describes the difference between consistency models: Create, Read, Update, and Delete (CRUD); and Command and Query Responsibility Segregation (CQRS). The final topic, which is covered in Chapter 5, looks at how we can integrate the information between the nodes of a microservices architecture.

What About NoSQL Databases?

Discussing microservices and database types different from relational ones seems natural. If each microservice must have its own separate database, what prevents you from choosing other types of technology? Perhaps some kinds of data will be better handled through key-value stores, or document stores, or even flat files and git repositories.

There are many different success stories about using NoSQL databases in different contexts, and some of these contexts might fit your current enterprise context, as well. But even if they do, we still recommend that you begin your microservices journey on the safe side: using a relational database. First, make it work using your existing relational database. Once you have successfully finished implementing and integrating your first microservice, you can decide whether you or your project will be better served by another type of database technology.

The microservices journey is difficult and, as with any change, you’ll have better chances if you struggle with one problem at a time. It doesn’t help having to simultaneously deal with a new thing such as microservices and new unexpected problems caused by a different database technology.

¹ The amount of time between the beginning of a task and its completion.

² Just make sure to follow the tool’s best practices and do not store sensitive information, such as passwords, in a way that unauthorized users might have access to it.


Migrating to Microservice Databases by Edson Yanaga