Chapter 4. Do More
“Here is my source code, run it on the cloud for me. I do not care how!”
Onsi Fakhouri, VP of Engineering of Pivotal Cloud Foundry
Platforms abstract the underlying infrastructure and middleware to offer a rich set of capabilities. These capabilities include providing runtime environments and services for running applications. Deploying and configuring middleware consumes time and prolongs release cycles; platforms remove the requirement for this individualized effort.
A primary goal of Cloud Foundry is to enable the development-to-deployment process to be as fast as possible. However, platform capabilities do not stop here. Platforms are intrinsically about doing more. Specifically, platforms take on more of the mandatory undifferentiated heavy lifting discussed in Chapter 1. This is beneficial; the less you are required to take on, the higher your velocity will be.
This chapter unpacks Cloud Foundry’s capabilities that remove the extraneous and undifferentiated heavy lifting from provisioning infrastructure, runtime environments, applications, and services. The application life cycle may be the same as in the client-server era, but developers can now iterate around that life cycle with velocity, using a self-service model every step of the way.
This chapter explores the following built-in platform capabilities:
- Resiliency and fault tolerance through self-healing and redundancy
- User management
- Security and auditing
- Application life-cycle management, including aggregated streaming of logs and metrics
- Release engineering, including provisioning VMs, containers, middleware, and databases
When there is a technological ecosystem that spends considerable time and effort building these capabilities, it is prudent to leverage them for high-velocity delivery as opposed to spending time and effort building a bespoke, hand-crafted solution.
Resiliency and Fault Tolerance
Cloud Foundry provides built-in resiliency based on control theory. Control theory is a branch of engineering and mathematics that uses feedback loops to control and modify the behavior of a dynamic system. Resiliency is about ensuring that the actual system state (the number of running applications, for example) matches the desired state at all times, even in the event of failures; it is an essential but often costly component of business continuity.
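The feedback loop described above can be illustrated with a small reconciliation function. This is a minimal sketch and not Cloud Foundry's implementation: the function names, the application names, and the dictionary-based state are all assumptions made for the example. It compares desired and observed instance counts and derives the corrective actions that close the gap.

```python
def reconcile(desired: dict[str, int], actual: dict[str, int]) -> list[tuple[str, str, int]]:
    """Compare desired and actual instance counts and return corrective actions."""
    actions = []
    for app in sorted(set(desired) | set(actual)):
        want = desired.get(app, 0)
        have = actual.get(app, 0)
        if have < want:
            actions.append(("start", app, want - have))
        elif have > want:
            actions.append(("stop", app, have - want))
    return actions


def apply_actions(actions, actual: dict[str, int]) -> dict[str, int]:
    """Apply corrective actions to a copy of the observed state."""
    result = dict(actual)
    for verb, app, count in actions:
        delta = count if verb == "start" else -count
        result[app] = result.get(app, 0) + delta
    return result


# Hypothetical state: two of the three "web" instances have crashed.
desired_state = {"web": 3, "worker": 2}
observed_state = {"web": 1, "worker": 2}

actions = reconcile(desired_state, observed_state)
converged = apply_actions(actions, observed_state)
```

Running the loop again on the converged state yields no further actions, which is the property a control loop aims for: it acts only while the actual state differs from the desired state.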
Cloud Foundry automates the recovery of failed applications, components, and processes. This self-healing removes the recovery burden from the operator, ensuring speed of recovery. Cloud Foundry achieves resiliency and self-healing through:
- Restarting failed system processes
- Recreating missing or unresponsive VMs
- Deployment of new application instances if an application crashes or becomes unresponsive
- Application striping across availability zones to enforce separation of the underlying infrastructure
- Dynamic routing and load balancing
Cloud Foundry deals with application orchestration and placement focused on even distribution across the infrastructure. The user should not have to care about how the underlying infrastructure runs the application beyond having an equal distribution across different resources (known as availability zones). The fact that multiple copies of the application are running with built-in resiliency is what matters.
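To make the idea of even distribution concrete, here is a minimal sketch of striping instances across availability zones in round-robin order. The zone names and the `stripe` function are hypothetical, and the platform's actual placement logic is considerably more involved; the sketch only shows the outcome the user cares about, an equal spread.

```python
from collections import Counter
from itertools import cycle


def stripe(instance_count: int, zones: list[str]) -> list[str]:
    """Assign each instance to a zone in round-robin order."""
    zone_cycle = cycle(zones)
    return [next(zone_cycle) for _ in range(instance_count)]


# Hypothetical zones; five instances spread over three zones.
placement = stripe(5, ["az1", "az2", "az3"])
per_zone = Counter(placement)
```

With this scheme, no zone ever holds more than one instance above any other zone, so the loss of a single zone removes only a proportional share of the running instances.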
Cloud Foundry provides dynamic load balancing. Application consumers use a route to access an application; each route is directly bound to one or more applications in Cloud Foundry. When running multiple instances, Cloud Foundry balances the load across the instances, dynamically updating its routing table. Dead application routes are automatically pruned from the routing table with new routes added when they become available.
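The behavior described here can be sketched as a small routing table that registers backends, prunes dead ones, and balances requests round-robin. This is an illustration under stated assumptions, not the platform's router: the `RoutingTable` class, the route name, and the backend addresses are all invented for the example.

```python
class RoutingTable:
    """Maps a route to its live backends and balances across them."""

    def __init__(self) -> None:
        self._backends: dict[str, list[str]] = {}
        self._next: dict[str, int] = {}

    def register(self, route: str, backend: str) -> None:
        """Add a backend for a route when an instance becomes available."""
        backends = self._backends.setdefault(route, [])
        if backend not in backends:
            backends.append(backend)

    def prune(self, route: str, backend: str) -> None:
        """Remove a dead backend so it no longer receives traffic."""
        backends = self._backends.get(route, [])
        if backend in backends:
            backends.remove(backend)

    def pick(self, route: str) -> str:
        """Return the next backend for the route in round-robin order."""
        backends = self._backends.get(route)
        if not backends:
            raise LookupError(f"no live backends for {route}")
        index = self._next.get(route, 0) % len(backends)
        self._next[route] = index + 1
        return backends[index]


# Hypothetical route and instance addresses.
route = "myapp.example.com"
table = RoutingTable()
table.register(route, "10.0.0.1:8080")
table.register(route, "10.0.0.2:8080")
before = [table.pick(route) for _ in range(3)]

table.prune(route, "10.0.0.1:8080")  # the first instance has died
after = [table.pick(route) for _ in range(2)]
```

After the prune, every request goes to the surviving instance, and no consumer of the route needs to know that anything changed.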
Without the preceding capabilities, the operations team is required to continually monitor and respond to pager alerts from failed apps and invalid routes. By replacing manual interaction with automated, self-healing software, applications and system components are restored quickly with less risk and downtime. The resiliency concern is satisfied once, for all applications running on the platform, as opposed to developing customized monitoring and restart scripts per application. The platform removes the ongoing cost and associated maintenance of bespoke resiliency solutions.
User Access and Authentication Management
Role-based access defines who can use the platform and how. Cloud Foundry uses role-based access control (RBAC), with each role granting permissions to a specific environment the user is targeting. All collaborators target an environment with their individual user accounts associated with a role that governs what level and type of access the user has within that environment.
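As a rough illustration of role-based access scoped to an environment, consider the following sketch. The role names, permissions, users, and environments are hypothetical and do not reflect Cloud Foundry's actual role definitions; the point is only that a user's role in the targeted environment determines what that user may do there.

```python
# Hypothetical roles and the permissions each one grants.
ROLE_PERMISSIONS = {
    "manager": {"view", "deploy", "scale", "manage_users"},
    "developer": {"view", "deploy", "scale"},
    "auditor": {"view"},
}

# Hypothetical assignments: (user, environment) -> role.
ASSIGNMENTS = {
    ("alice", "production"): "auditor",
    ("alice", "staging"): "developer",
    ("bob", "production"): "manager",
}


def is_allowed(user: str, environment: str, action: str) -> bool:
    """Check whether the user's role in this environment permits the action."""
    role = ASSIGNMENTS.get((user, environment))
    if role is None:
        return False  # no role in this environment means no access
    return action in ROLE_PERMISSIONS[role]
```

In this sketch the same user can deploy to staging but only view production, because the role is attached to the pairing of user and environment rather than to the user alone.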
Cloud Foundry’s User Account and Authentication (UAA) is the central identity-management service for both users and applications. In addition, the UAA’s user-identity store can be configured by connecting to external user stores through LDAP or SAML. UAA is based on current security standards such as OAuth, OpenID Connect, and SCIM.
Security
Cloud Foundry protects you from security threats by applying security controls and isolating applications and data in the following ways:
- It manages software-release vulnerability using new Cloud Foundry releases, created with timely updates to address code issues.
- It manages OS vulnerability using a new OS created with patches for the latest security fixes.
- It implements role-based access controls, applying and enforcing roles and permissions to ensure that users of the platform can only view and affect the resources they have been granted access to.
- It secures both the code and the configuration of an application within a multitenant environment.
- It deploys each application within its own self-contained and isolated containerized environment.
- It prevents possible denial-of-service attacks through resource starvation.
- It provides an operator audit trail showing all operator actions applied to the platform.
- It provides a user audit trail recording all relevant API invocations of an application.
- It implements network traffic rules (security groups) to prevent system access from and to external networks, production services, and between internal components.
Why is this important? Securing distributed systems is complex. For example, think about these issues:
- How much effort is required to automatically establish and apply network traffic rules to isolate components?
- What policies should be applied to automatically limit resources in order to defend against DoS attacks?
- How do you implement role-based access controls with inbuilt auditing of system access and actions?
- How do you know which components are potentially affected by a specific vulnerability and require patching?
- How do you safely patch the underlying OS without incurring application downtime?
These examples are standard requirements for most systems running in corporate datacenters. The more bespoke engineering you use, the more you need to take on securing and patching that system. Distributed systems increase the security burden because there are more moving parts. Additionally, when it comes to rolling out security patches to update the system, many organizations suffer from configuration drift.
The Challenge of Configuration Drift
Deployment environments (such as staging, QA, and production) are often complex and time-consuming to construct and administer, producing the ongoing challenge of trying to manage configuration drift to maintain consistency between environments and VMs. Reproducible consistency through release engineering tool chains, such as Cloud Foundry’s BOSH component, addresses this challenge.
Cloud Foundry manages OS and software-release vulnerability using a new OS and new software releases, created with the required patches for the latest security fixes and code remediation.
Cloud Foundry eases the burden of rolling out these OS and software-release updates. Every component within Cloud Foundry is created with the same OS image. To patch Cloud Foundry, you do not apply the patch to a running OS or component; instead, you redeploy Cloud Foundry with an updated OS or software release. Cloud Foundry’s BOSH component redeploys updates component by component to ensure zero-to-minimal downtime. This removes the patching and updating concerns from the operator and provides a safer and more resilient way to update Cloud Foundry while keeping applications running.
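The component-by-component approach can be sketched as follows. This is a simplified illustration rather than BOSH's actual update logic: the `rolling_update` function, the component names, and the image labels are assumptions. Each component is recreated from the new image in turn, and the rollout halts at the first component that fails its health check so the remaining components keep running on the old image.

```python
from typing import Callable, Optional


def rolling_update(
    deployment: dict[str, str],
    new_image: str,
    is_healthy: Callable[[str], bool],
) -> tuple[dict[str, str], Optional[str]]:
    """Recreate components one at a time; halt at the first unhealthy one.

    Returns the resulting component -> image mapping and the name of the
    component that failed, or None if the whole rollout succeeded.
    """
    result = dict(deployment)
    for component in deployment:
        # Recreate from the new image instead of patching in place.
        result[component] = new_image
        if not is_healthy(component):
            return result, component  # later components stay on the old image
    return result, None


# Hypothetical deployment: every component starts on the same OS image.
deployment = {"router": "os-image-v1", "api": "os-image-v1", "cell": "os-image-v1"}

updated, failed = rolling_update(deployment, "os-image-v2", lambda name: True)
partial, halted_at = rolling_update(deployment, "os-image-v2", lambda name: name != "api")
```

Updating one component at a time limits the blast radius of a bad release: in the second call, the rollout stops at `api` and `cell` is never touched.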
In addition to patching, if for any reason a component becomes compromised, it can instantly be recreated using a known and clean software release and OS image, and the compromised component can be removed into a quarantine area for further inspection.
This ability to redeploy Cloud Foundry components at will, from a known, healthy OS image and software release, and with zero-to-minimal downtime, provides an additional level of security and resiliency to the system. Your applications remain available for longer through a simple mechanism for applying updates.
The Application Life Cycle
Typically, in most traditional scenarios, the application developer:
- Develops an application
- Deploys application services
- Deploys an application and connects (binds) it to application services
- Scales an application, both up and down
- Monitors an application
- Upgrades an application
This application life cycle is in play until the application is decommissioned and taken offline. Cloud Foundry simplifies the application life cycle by offering self-service capabilities to the end user. Adopting a self-service approach removes handoffs and potentially lengthy delays between teams. For example, deploying an application, provisioning and binding services, scaling, monitoring, and upgrading are each available through a simple call to the platform.
Traditionally, deploying application code required provisioning and deploying VMs, operating systems, and middleware to create a development environment for the application to run in. Once that environment was provisioned, it required patching and ongoing maintenance. New environments were then created as the application moved through the deployment pipeline.
With Cloud Foundry, the application or task itself becomes the single unit of deployment. Developers no longer need to concern themselves with which application container to use, which version of Java, and which memory settings or garbage-collection (GC) policy to employ. They just push their application to Cloud Foundry, and it runs. Cloud Foundry removes the cost and complexity of configuring infrastructure and middleware per application. Using a self-service model, users can:
- Deploy applications
- Provision and bind additional services, such as messaging engines, caching solutions, and databases
- Scale applications
- Monitor application health and performance
- Update applications
- Delete applications
Zero-Downtime Upgrades
Applications running on the platform can be updated with zero downtime through a technique known as blue-green deployments.
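In a blue-green deployment, the new version (green) is started alongside the current version (blue), and traffic is switched by remapping the route once green is healthy. The following is a minimal sketch of that route switch; the function, the route, and the application names are hypothetical and the dictionary stands in for the platform's routing layer.

```python
from typing import Callable


def blue_green_switch(
    routes: dict[str, set[str]],
    route: str,
    blue: str,
    green: str,
    is_healthy: Callable[[str], bool],
) -> bool:
    """Move a route from the blue version to the green version.

    Returns False and leaves routing untouched if green is not healthy.
    """
    if not is_healthy(green):
        return False  # blue keeps serving all traffic
    routes[route].add(green)     # both versions briefly share the route
    routes[route].discard(blue)  # then blue stops receiving traffic
    return True


# Hypothetical route currently served by the blue version.
routes = {"shop.example.com": {"shop-blue"}}
switched = blue_green_switch(
    routes, "shop.example.com", "shop-blue", "shop-green", lambda app: True
)
```

Because the route always maps to at least one healthy version, consumers never see a gap in service, and blue remains available for a quick rollback until it is deleted.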
Removing the infrastructure, OS, and middleware configuration concerns from developers allows them to focus their whole effort on the application instead of deploying and configuring the supporting technologies. This keeps the development focus where it needs to be, on the business logic that generates revenue.
Aggregated Streaming of Logs and Metrics
Cloud Foundry provides insight into both the application and the underlying platform through aggregated logging and metrics. The logging and metrics system within Cloud Foundry is the inner voice of the system, telling the operator and developer what is happening. It is used to manage the performance, health, and scale of running applications and the platform itself.
Logs provide visibility into the behavior of running applications and system components, while metrics provide visibility into the health of components running the application. Operators can use metrics information to monitor an instance of Cloud Foundry.
Insights are obtained through storing and analyzing a continuous stream of aggregated, time-ordered events from the output streams of all running processes and backing services. The benefits of aggregated log, metric, and event streaming include the following:
- Logs are streamed to a single endpoint.
- Streamed logs provide timestamped outputs per application.
- Both application logs and system-component logs are aggregated, simplifying their digestion.
- A metrics collector gathers and streams metrics from the system components.
- Operators can use metrics information to monitor an instance of Cloud Foundry.
- Logs can be viewed from the command line or drained into a log management service such as an ELK stack or Splunk.
- Events show specific events like an application being started or stopped. Viewing events is useful when you are debugging problems by identifying crash information, such as a memory limit being exceeded.
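The idea of aggregating several output streams into one time-ordered stream can be sketched with Python's standard `heapq.merge`. The log entries, source names, and the `aggregate` function below are invented for illustration and say nothing about how the platform's logging system is built; each stream is assumed to be already ordered by its own timestamps.

```python
import heapq

# Hypothetical per-source streams of (timestamp, source, message) tuples,
# each already ordered by time.
app_logs = [
    (1.0, "app/0", "request started"),
    (3.5, "app/0", "request finished"),
]
router_logs = [
    (0.9, "router", "GET /"),
    (3.6, "router", "200 OK"),
]
platform_events = [
    (2.0, "cell", "app instance exceeded memory limit"),
]


def aggregate(*streams):
    """Merge time-ordered streams into one time-ordered list."""
    return list(heapq.merge(*streams, key=lambda entry: entry[0]))


combined = aggregate(app_logs, router_logs, platform_events)
timestamps = [entry[0] for entry in combined]
```

A single ordered stream makes it possible to read an application log line next to the platform event that explains it, which is much harder when each component writes to its own file.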
The cost of implementing an aggregated log and metrics-streaming solution involves bespoke engineering to orchestrate and aggregate the streaming of both syslog and application logs from every component within a distributed system into a central server. Using a platform removes the ongoing cost and associated maintenance of bespoke logging solutions.
Release Engineering through BOSH
IT operations are tasked with achieving operational stability. Historically, operational stability was achieved by reducing risk through limiting change. Limiting change is in direct conflict with shipping smaller features frequently. In order to manage the risks involved in frequent software releases, release-engineering tool chains are used. Release engineering is the part of the operations team typically concerned with turning source code into finished software components or products through the following steps:
- Compilation
- Versioning
- Assembly/packaging
- Deploying
Automating release-engineering concerns through a tool chain reduces risk, allowing for faster deployments with little to no human interaction.
Release-engineering tool chains are essential because they provide consistent repeatability. Source code, third-party components, data, and deployment environments of a software system are integrated and deployed in a repeatable and consistent fashion, with a historical view to track all changes made to the deployed system. This provides the ability to audit and identify all components that make up a particular release. Security teams can easily track contents of a particular release and recreate it at will if the need arises. Consistent repeatability de-risks software releases.
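One way to picture the audit property is a fingerprint derived from the exact set of components in a release. This is only an illustrative sketch, not how BOSH identifies releases: the `release_fingerprint` function and the component names and versions are assumptions. Identical inputs always produce the same fingerprint, and any change to a component produces a different one.

```python
import hashlib
import json


def release_fingerprint(components: dict[str, str]) -> str:
    """Return a stable SHA-256 digest for a component -> version mapping."""
    # Sorting keys makes the digest independent of insertion order.
    canonical = json.dumps(components, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical release contents.
release_a = {"router": "1.4.0", "api": "2.1.3", "os-image": "v7"}
release_b = {"os-image": "v7", "api": "2.1.3", "router": "1.4.0"}  # same contents, different order
release_c = {"router": "1.4.0", "api": "2.1.4", "os-image": "v7"}  # api patched

same = release_fingerprint(release_a) == release_fingerprint(release_b)
changed = release_fingerprint(release_a) != release_fingerprint(release_c)
```

A security team holding such a fingerprint could confirm that a recreated release contains exactly the components that were originally shipped.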
Cloud Foundry leverages a release-engineering tool chain known as BOSH. BOSH is a recursive acronym for BOSH Outer Shell. The outer shell refers to BOSH being a release tool chain that unifies release engineering, deployment, and life-cycle management of cloud-based software. BOSH is designed for large distributed systems such as Cloud Foundry but can equally be used to deploy smaller individual components such as etcd or Redis.
Rather than leveraging a bespoke integration of a variety of tools and techniques that provide solutions to individual parts of the release engineering goal, BOSH is designed to be a single tool covering the entire set of requirements of release engineering. BOSH enables software deployments to be:
- Automated
- Reproducible
- Scalable
- Monitored with self-healing failure recovery
- Updatable with zero-to-minimal downtime
BOSH translates intent into action via repeatability by always ensuring that every provisioned release is identical and repeatable. This removes the challenge of configuration drift.
BOSH configures infrastructure through code. By design, BOSH tries to abstract away the differences between infrastructure platforms (IaaS or physical servers) into a generalized, cross-platform description of your deployment. This provides the benefit of being infrastructure agnostic (as far as possible).
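The value of a generalized description can be sketched with a small interface that hides infrastructure differences. Everything here is hypothetical: the `Infrastructure` interface, the two fake providers, and the fields of the deployment description are invented for illustration and are not BOSH's actual abstractions. The same description is deployed unchanged to either provider.

```python
from abc import ABC, abstractmethod


class Infrastructure(ABC):
    """Hypothetical interface that every infrastructure provider implements."""

    @abstractmethod
    def create_vm(self, name: str, cpu: int, memory_mb: int) -> str:
        """Create a VM and return a provider-specific identifier."""


class FakeCloudA(Infrastructure):
    def create_vm(self, name: str, cpu: int, memory_mb: int) -> str:
        return f"cloud-a:{name}:{cpu}cpu:{memory_mb}mb"


class FakeCloudB(Infrastructure):
    def create_vm(self, name: str, cpu: int, memory_mb: int) -> str:
        return f"cloud-b/{name}?cpu={cpu}&mem={memory_mb}"


# One infrastructure-neutral description of the deployment.
DEPLOYMENT = [
    {"name": "router", "cpu": 1, "memory_mb": 1024},
    {"name": "api", "cpu": 2, "memory_mb": 4096},
]


def deploy(description: list[dict], infrastructure: Infrastructure) -> list[str]:
    """Create every VM in the description on the given infrastructure."""
    return [infrastructure.create_vm(**vm) for vm in description]


on_a = deploy(DEPLOYMENT, FakeCloudA())
on_b = deploy(DEPLOYMENT, FakeCloudB())
```

Only the provider classes know about infrastructure specifics, so moving the deployment means swapping the provider, not rewriting the description.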
BOSH performs monitoring, failure recovery, and software updates with zero-to-minimal downtime. Without such a release-engineering tool chain, all these concerns remain the responsibility of the operations team. A lack of automation exposes the developer to unnecessary risk.
Deploying and Scaling
Deploying and scaling applications are completely independent operations. This provides the flexibility to scale at will, without the cost of having to redeploy the application every time. Through commercial products such as Pivotal Cloud Foundry, auto-scaling policies can also be set up for dynamic scaling of applications when they hit certain configurable thresholds.
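A threshold-based scaling policy can be sketched as a pure function from the current instance count and a metric to a new instance count. The metric (CPU percentage), the thresholds, and the instance limits below are hypothetical defaults chosen for the example and do not describe any product's actual policy.

```python
def desired_instances(
    current: int,
    cpu_percent: float,
    *,
    scale_up_at: float = 80.0,
    scale_down_at: float = 20.0,
    minimum: int = 1,
    maximum: int = 10,
) -> int:
    """Return the instance count a simple threshold policy would ask for."""
    if cpu_percent > scale_up_at:
        return min(current + 1, maximum)  # add one instance, up to the cap
    if cpu_percent < scale_down_at:
        return max(current - 1, minimum)  # remove one instance, down to the floor
    return current  # within the thresholds: leave the count alone
```

Because scaling is independent of deployment, a policy like this only has to change the instance count; the application itself is never redeployed.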
A Marketplace of On-Demand Services
Applications often require additional services. For example, they may require a persistent datastore for storing information or a message broker for communicating with other applications. Each environment within Cloud Foundry has the concept of a marketplace. A marketplace lists the services that have been made available to that specific environment by the platform operator. A service can offer different service plans to provide varying levels of resources or features for the same service. An example of service plans is a database service offering small, medium, or large plans with differing levels of concurrent connections and storage sizes. Service instances can be provisioned on demand by a user. The provisioned service provides a unique set of credentials that can be used to bind and connect an application to the service.
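The relationship between a marketplace, service plans, provisioned instances, and bindings can be sketched as follows. This is a simplified model with invented names: the plan sizes, the credential fields, and the environment variable used for the binding are assumptions for illustration, not the platform's actual service API.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    max_connections: int
    storage_gb: int


# Hypothetical marketplace: one database service with three plans.
MARKETPLACE = {
    "database": {
        "small": Plan("small", 20, 1),
        "medium": Plan("medium", 100, 10),
        "large": Plan("large", 500, 100),
    }
}


@dataclass
class ServiceInstance:
    service: str
    plan: Plan
    credentials: dict


def provision(service: str, plan_name: str) -> ServiceInstance:
    """Create a service instance with its own unique credentials."""
    plan = MARKETPLACE[service][plan_name]
    credentials = {
        "username": secrets.token_hex(8),
        "password": secrets.token_urlsafe(16),
    }
    return ServiceInstance(service, plan, credentials)


def bind(app_env: dict, instance: ServiceInstance) -> dict:
    """Return a copy of the app environment with the credentials attached."""
    bound = dict(app_env)
    bound[f"{instance.service.upper()}_CREDENTIALS"] = instance.credentials
    return bound


first = provision("database", "small")
second = provision("database", "large")
app_env = bind({"APP_NAME": "shop"}, first)
```

Each provisioned instance receives its own credentials, so two applications bound to separate instances of the same service never share access.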
Chapter Summary
Cloud Foundry does more on your behalf through its built-in capabilities and the BOSH release-engineering tool chain.
Cloud Foundry provides:
- Built-in resiliency through automated recovery and self-healing of failed applications, components, and processes
- Built-in resiliency through striping applications across different resources
- Authentication and authorization for both users and applications, with the addition of role-based access control for users
- Inbuilt security
- The ability to update the platform with zero-downtime rolling upgrades across the system
- Applications with the ability to connect to a wide array of services via both platform-managed service brokers and services running in the existing IT infrastructure
- Built-in management and operation services for your application, such as metrics and log aggregation, monitoring, auto-scaling, and performance management