Editor's note: The following article is adapted from the Microservices, Micro Deployments and DevOps talk that Alois Reitbauer and Martin Gutenbrunner gave at the Velocity Conference in May 2015.
Follow along as they ask and answer the top five questions that are important when implementing and maintaining a microservices architecture with one single full-time resource.
- Do you understand your system?
- Is your system healthy?
- How stable are your deployments?
- How safe is it to modify your system?
- How resilient is your infrastructure to changed usage patterns?
Monolithic versus Microservices
Large monolithic web applications have a tendency to spiral in complexity, not only because of their size, but also due to the accumulation of newer technologies. As apps get bigger, more people are required to manage them, and soon you end up with more people looking at charts and data than you have people actually building the application.
Microservices architectures are a popular approach to developing resilient, web-scale distributed software systems. The problem with monolithic software is that the different components often aren't clearly identifiable, so a change to one component affects more of the system than necessary. With every change, the code grows further into a big ball of mud, which makes the system increasingly difficult to change. Microservices address this problem: you can update your application incrementally by deploying small changes to individual, isolated components without impacting the other components of your application.
Read along to discover the five questions you should ask yourself as you implement microservices for your applications.
1. Do you understand your system?
Can you get a complete overview of your production system in less than two hours? In order to evolve your system, you must understand it. This includes knowing all of the dependencies and how safe they are. It also involves being aware of all the deployments that have occurred over the last 24 hours, and their potential impacts.
To know your system with minimal effort, you need access to the following information:
- Application monitoring data, to see the application dependencies and how healthy your application is.
- System monitoring data, to understand and monitor the underlying infrastructure and network.
- Architecture diagram, to know how the parts of your system were supposed to fit together.
- Puppet, Chef, or Ansible scripts and Vagrant configuration, to understand how the parts of the system were actually deployed.
When you are aware of the plan for how the components of your system were meant to fit together and how the overall deployment was supposed to happen, you can compare this against what was actually deployed using data from system and application monitoring. If you have access to all of the above data, you can form a concise picture of your whole system environment.
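The comparison between plan and reality can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the function name `diff_deployment`, the service names, and the version strings are all hypothetical, and it assumes you can export both the planned state (from your configuration management scripts) and the observed state (from monitoring) as simple service-to-version mappings.

```python
def diff_deployment(planned, observed):
    """Compare the planned service -> version mapping with what monitoring observed."""
    planned_names = set(planned)
    observed_names = set(observed)
    return {
        # In the plan, but not seen running.
        "missing": sorted(planned_names - observed_names),
        # Running, but not part of the plan.
        "unexpected": sorted(observed_names - planned_names),
        # Running, but in a different version than planned.
        "version_mismatch": sorted(
            name
            for name in planned_names & observed_names
            if planned[name] != observed[name]
        ),
    }


# Hypothetical data: what the deployment scripts describe ...
planned = {"checkout": "1.4.0", "catalog": "2.1.0", "auth": "3.0.2"}
# ... versus what application and system monitoring actually report.
observed = {"checkout": "1.4.0", "catalog": "2.0.9", "legacy-cron": "0.9.1"}

drift = diff_deployment(planned, observed)
print(drift)
```

Any non-empty entry in the result marks a place where the system you think you have differs from the system that is actually running.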
2. Is your system healthy?
Once you understand your system, you can assess how healthy it is. There are three different levels of system health to consider:
- Underlying infrastructure
- Application services
- Everything user-facing (i.e., business services)
Many people start by looking at the infrastructure, because the logical assumption is that if the infrastructure is not healthy, then the services that are running on top of it might have problems too. If the machine is healthy, and the processes are stable, you may then look at the response time for the end user.
Instead of thinking infrastructure-first, however, it is better to start from the end-user experience. Consider using in-house analysis, and structure the process so that you identify the most critical user-facing problems first and then analyze the health of the key application components behind those user-facing services. From those application components, work your way down to the underlying infrastructure to locate the source of the user-facing issues and identify the changes needed to address them.
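As a rough sketch of this top-down approach, the following Python walks from an unhealthy user-facing service down through its dependencies and reports the deepest unhealthy nodes as likely sources of the problem. The function `find_root_causes`, the dependency map, and all service and host names are hypothetical; in practice this information would come from your monitoring data.

```python
def find_root_causes(entry, depends_on, unhealthy):
    """Starting at a user-facing service, follow unhealthy dependencies downward.

    Returns the unhealthy nodes that have no unhealthy dependencies of their
    own, which makes them the most likely sources of the user-facing issue.
    """
    roots = []
    seen = set()
    stack = [entry]
    while stack:
        node = stack.pop()
        if node in seen or node not in unhealthy:
            continue
        seen.add(node)
        bad_deps = [dep for dep in depends_on.get(node, []) if dep in unhealthy]
        if bad_deps:
            stack.extend(bad_deps)  # the problem lies further down
        else:
            roots.append(node)  # nothing unhealthy below this node
    return sorted(roots)


# Hypothetical topology: user-facing service -> application services -> hosts.
depends_on = {
    "web-checkout": ["cart-service", "payment-service"],
    "cart-service": ["host-a"],
    "payment-service": ["host-b"],
}
# Hypothetical set of components currently flagged as unhealthy.
unhealthy = {"web-checkout", "payment-service", "host-b"}

print(find_root_causes("web-checkout", depends_on, unhealthy))
```

Because the walk starts at the user-facing service, healthy branches such as `cart-service` are never explored, which keeps attention on the components that actually affect users.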
3. How stable are the deployments?
Microservices architectures make it possible to develop and deploy changes fast, for example, through a process of continuous integration and delivery. The key to successful continuous deployments is to know if and when you just broke something so you can roll back, or fix problems quickly, to minimize the impact on user-facing services.
If you really want to ship fast, you must be able to relate functional and performance problems to the specific deployment responsible. When you have very frequent deployments, it is also vital that you can find out about problems as soon as they occur so that you can deal with them rapidly. Anomaly detection and baselining can help you with this.
Anomaly detection automatically identifies problems with your system, ideally before your end-users notice them. Baselining is the basis on which anomaly detection is performed. The baseline defines, based upon previous data, the scope of acceptable behavior for a given metric, for example response time, and reports situations outside of that scope as an anomaly. Anomaly detection can reduce the time to detect any issues introduced by a given deployment.
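To make the idea concrete, here is a minimal sketch of baselining and anomaly detection in Python, followed by a lookup that relates an anomaly to the most recent deployment. It assumes a deliberately simple baseline (mean plus or minus three standard deviations of historical samples); real monitoring products use more sophisticated models. All function names, timestamps, and deployment labels are hypothetical.

```python
from statistics import mean, stdev


def build_baseline(history, tolerance=3.0):
    """Derive the range of acceptable values for a metric from past samples."""
    center = mean(history)
    spread = stdev(history)
    return center - tolerance * spread, center + tolerance * spread


def find_anomalies(samples, baseline):
    """Return the (timestamp, value) samples that fall outside the baseline."""
    low, high = baseline
    return [(ts, value) for ts, value in samples if not low <= value <= high]


def suspect_deployment(anomaly_ts, deployments):
    """Return the most recent deployment at or before the anomaly, or None."""
    earlier = [d for d in deployments if d[0] <= anomaly_ts]
    return max(earlier, key=lambda d: d[0], default=None)


# Hypothetical response times (ms) observed before the latest deployments.
history = [100, 104, 98, 101, 97, 102, 99, 103]
baseline = build_baseline(history)

# Hypothetical new samples as (minute, response time in ms).
samples = [(10, 101), (11, 99), (12, 140), (13, 150)]
anomalies = find_anomalies(samples, baseline)

# Hypothetical deployment log as (minute, label).
deployments = [(3, "catalog-2.1.0"), (11, "checkout-1.5.0")]
suspect = suspect_deployment(anomalies[0][0], deployments) if anomalies else None

print(anomalies)
print(suspect)
```

Picking the latest deployment before the first anomaly is only a heuristic for where to start looking; it narrows the search rather than proving the cause.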
4. How safe is it to modify your system?
Determining whether it is safe to modify your system is based on what your boundaries are and what the dependencies are within the system. Let's examine three different types of dependencies for each component (or service) within your system:
Incoming dependencies: the things that depend on an individual microservice or component, i.e., other services within your system that rely on it to perform their function.
Outgoing dependencies: the other components within the system that an individual component requires in order to get its job done. The higher the number of outgoing dependencies, the more likely it is that a small monolithic piece remains, which will make it more difficult to modify your system.
External dependencies: similar to outgoing dependencies, except that they are outside of your control. These are external systems, such as a payment provider or another third-party service, that your services need in order to work.
If you don’t have visibility into your system in terms of all of the above, you won’t be able to accurately assess the number and types of dependencies for each component, and thus you’ll be unable to determine how safe it is to modify any given part of your system.
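The three dependency types can be derived from observed service-to-service calls. The sketch below is one possible way to do that, assuming you can export call pairs from your monitoring data and that you know which services you own; the function `summarize_dependencies` and every service name are hypothetical.

```python
def summarize_dependencies(calls, internal):
    """Classify dependencies for each service you own.

    calls: iterable of (caller, callee) pairs observed in monitoring data.
    internal: set of service names that are under your control.
    """
    summary = {
        name: {"incoming": set(), "outgoing": set(), "external": set()}
        for name in internal
    }
    for caller, callee in calls:
        if caller not in internal:
            continue  # this sketch only tracks calls made by your own services
        if callee in internal:
            summary[caller]["outgoing"].add(callee)
            summary[callee]["incoming"].add(caller)
        else:
            summary[caller]["external"].add(callee)
    return summary


# Hypothetical services you own and the calls observed between them.
internal = {"checkout", "cart", "pricing"}
calls = [
    ("checkout", "cart"),
    ("checkout", "pricing"),
    ("checkout", "payment-provider"),  # third-party system
    ("cart", "pricing"),
]

summary = summarize_dependencies(calls, internal)
for name in sorted(summary):
    counts = {kind: len(deps) for kind, deps in summary[name].items()}
    print(name, counts)
```

In this example `pricing` has two incoming dependencies, so changing it affects two other services, while `checkout` has the most outgoing and external dependencies and is therefore the most exposed to changes elsewhere.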
5. How resilient is your infrastructure to changed usage patterns?
Microservices architectures are designed to be scalable. Data captured through monitoring is essential to inform and automate decisions about how to scale services and to increase or decrease resource usage in response to changing usage patterns. However, in a microservices architecture, metric granularity can be a problem: ideally, you would like finer-grained metrics than those typically captured, so that you can understand how each individual service is doing. The key here is to relate the operating system metrics to the actual service metrics. If you increase the number of service calls by a factor of two or three, how much infrastructure do you really need, or how much effect does this have on CPU time? If your system monitoring is not well connected to your application model, it is hard to figure this out, especially if you have many microservices running on a machine.
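One simple way to relate a service metric to an operating system metric is to fit a straight line through paired measurements and use it to estimate resource needs at higher load. The Python sketch below does this with an ordinary least-squares fit. It assumes that CPU time grows roughly linearly with call volume, which may not hold for your services, and the sample numbers and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(slope, intercept, x):
    """Estimate the dependent metric for a given load."""
    return slope * x + intercept


# Hypothetical paired samples for one service:
# service calls per minute and CPU seconds consumed in that minute.
calls_per_minute = [1000, 2000, 3000, 4000]
cpu_seconds = [12.0, 22.0, 32.0, 42.0]

slope, intercept = fit_line(calls_per_minute, cpu_seconds)

# Estimate CPU time if the peak call volume doubles.
doubled_load = 2 * max(calls_per_minute)
estimate = predict(slope, intercept, doubled_load)
print(f"CPU seconds per call: {slope:.4f}")
print(f"Estimated CPU seconds at {doubled_load} calls/min: {estimate:.1f}")
```

A fit like this is only possible when CPU time can be attributed to an individual service, which is exactly why connecting system monitoring to the application model matters when many microservices share a machine.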
Adopting a microservices architecture can be a complex process as you move from a monolithic application towards a system comprising many smaller, loosely coupled, asynchronous services that are worked on by smaller teams. However, despite the increasing complexity of your environment, you gain the ability to ship more rapidly and to improve the quality of your system by moving towards a more scalable, resilient, and failsafe infrastructure. What we propose is a monitoring-first approach to adopting microservices in a high-frequency deployment architecture. The more successful you are at breaking down your architecture into fine-grained components, the more quickly and effectively your teams can evolve and scale your application to meet changing requirements and usage patterns.