Chapter 4. Do More

“Here is my source code, run it on the cloud for me. I do not care how!”

Onsi Fakhouri, VP of Engineering of Pivotal Cloud Foundry

Platforms abstract the underlying infrastructure and middleware to offer a rich set of capabilities. These capabilities include providing runtime environments and services for running applications. Deploying and configuring middleware consumes time and prolongs release cycles; platforms remove the requirement for this individualized effort.

A primary goal of Cloud Foundry is to enable the development-to-deployment process to be as fast as possible. However, platform capabilities do not stop here. Platforms are intrinsically about doing more. Specifically, platforms take on more of the mandatory undifferentiated heavy lifting discussed in Chapter 1. This is beneficial; the less you are required to take on, the higher your velocity will be.

This chapter unpacks Cloud Foundry’s capabilities that remove the extraneous and undifferentiated heavy lifting from provisioning infrastructure, runtime environments, applications, and services. The application life cycle may be the same as in the client-server era, but developers can now iterate around that life cycle with velocity, using a self-service model every step of the way.

This chapter explores the following built-in platform capabilities:

Resiliency and fault tolerance through self-healing and redundancy
User management
Security and auditing
Application life-cycle management, including aggregated streaming of logs and metrics
Release engineering, including provisioning VMs, containers, middleware, and databases

When there is a technological ecosystem that spends considerable time and effort building these capabilities, it is prudent to leverage them for high-velocity delivery as opposed to spending time and effort building a bespoke, hand-crafted solution.

Resiliency and Fault Tolerance

Cloud Foundry provides built-in resiliency based on control theory. Control theory is a branch of engineering and mathematics that uses feedback loops to control and modify the behavior of a dynamic system. Resiliency is about ensuring that the actual system state (the number of running applications, for example) matches the desired state at all times, even in the event of failures; it is an essential but often costly component of business continuity.
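The feedback loop described above can be pictured as a reconciliation step that compares desired state with actual state and emits corrective actions. The following is a minimal sketch under stated assumptions, not Cloud Foundry's implementation: the function names `reconcile` and `apply_actions`, the app names, and the dictionary-based state are all hypothetical.

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Compare desired and actual instance counts and return corrective actions.

    Each action is a (verb, app_name, count) tuple, where verb is "start" or "stop".
    """
    actions = []
    for app in sorted(set(desired) | set(actual)):
        want = desired.get(app, 0)
        have = actual.get(app, 0)
        if have < want:
            actions.append(("start", app, want - have))
        elif have > want:
            actions.append(("stop", app, have - want))
    return actions


def apply_actions(actual: dict, actions: list) -> dict:
    """Apply corrective actions to a copy of the actual state."""
    state = dict(actual)
    for verb, app, count in actions:
        delta = count if verb == "start" else -count
        state[app] = state.get(app, 0) + delta
    # Apps with no remaining instances drop out of the state entirely.
    return {app: n for app, n in state.items() if n > 0}


# Hypothetical example: two "web" instances have crashed and one stray app is running.
desired_state = {"web": 3, "worker": 2}
actual_state = {"web": 1, "worker": 2, "orphan": 1}

actions = reconcile(desired_state, actual_state)
converged = apply_actions(actual_state, actions)
```

Running the loop again on the converged state yields no further actions, which is the property a self-healing system relies on.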

Cloud Foundry automates the recovery of failed applications, components, and processes. This self-healing removes the recovery burden from the operator, ensuring speed of recovery. Cloud Foundry achieves resiliency and self-healing through:

Restarting failed system processes
Recreating missing or unresponsive VMs
Deploying new application instances if an application crashes or becomes unresponsive
Striping applications across availability zones to enforce separation of the underlying infrastructure
Dynamic routing and load balancing

Cloud Foundry deals with application orchestration and placement focused on even distribution across the infrastructure. The user should not have to care about how the underlying infrastructure runs the application beyond having an equal distribution across different resources (known as availability zones). The fact that multiple copies of the application are running with built-in resiliency is what matters.
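To make the idea of even distribution concrete, here is a small sketch that spreads a number of application instances across availability zones so that no zone holds more than one instance beyond any other. It is an illustration only and does not represent Cloud Foundry's actual placement algorithm; the function name `stripe_instances` and the zone names are assumptions.

```python
def stripe_instances(instance_count: int, zones: list) -> dict:
    """Distribute instances across zones so per-zone counts differ by at most one."""
    if not zones:
        raise ValueError("at least one availability zone is required")
    base, extra = divmod(instance_count, len(zones))
    # The first `extra` zones each take one additional instance.
    return {zone: base + (1 if i < extra else 0) for i, zone in enumerate(zones)}


# Hypothetical example: five instances across three zones.
placement = stripe_instances(5, ["az1", "az2", "az3"])
```

With this distribution, the loss of any single zone removes at most two of the five instances in the example.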

Cloud Foundry provides dynamic load balancing. Application consumers use a route to access an application; each route is directly bound to one or more applications in Cloud Foundry. When running multiple instances, Cloud Foundry balances the load across the instances, dynamically updating its routing table. Dead application routes are automatically pruned from the routing table, and new routes are added as they become available.
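The routing behavior described above can be sketched as a table that maps each route to a set of backends, picks among them in turn, and drops backends that disappear. This is a simplified model under stated assumptions, not the platform's router: the `RoutingTable` class, its method names, the round-robin policy, and the addresses are all hypothetical.

```python
class RoutingTable:
    """Maps a route (hostname) to backend addresses and balances across them."""

    def __init__(self):
        self._backends = {}  # route -> list of backend addresses
        self._cursor = {}    # route -> index of the next backend to try

    def register(self, route: str, backend: str) -> None:
        """Add a backend for a route when a new instance becomes available."""
        backends = self._backends.setdefault(route, [])
        if backend not in backends:
            backends.append(backend)

    def prune(self, route: str, backend: str) -> None:
        """Remove a dead backend; drop the route when no backends remain."""
        backends = self._backends.get(route, [])
        if backend in backends:
            backends.remove(backend)
        if not backends:
            self._backends.pop(route, None)
            self._cursor.pop(route, None)

    def pick(self, route: str) -> str:
        """Return the next backend for a route, cycling round-robin."""
        backends = self._backends.get(route)
        if not backends:
            raise LookupError(f"no backends registered for route {route}")
        index = self._cursor.get(route, 0) % len(backends)
        self._cursor[route] = index + 1
        return backends[index]


table = RoutingTable()
route = "myapp.example.com"
table.register(route, "10.0.0.1:8080")
table.register(route, "10.0.0.2:8080")
before_failure = [table.pick(route) for _ in range(4)]

table.prune(route, "10.0.0.1:8080")  # the first instance has died
after_failure = [table.pick(route) for _ in range(2)]
```

After the prune, every request lands on the surviving instance without any change on the consumer's side, because consumers only ever address the route.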

Without the preceding capabilities, the operations team is required to continually monitor and respond to pager alerts from failed apps and invalid routes. By replacing manual interaction with automated, self-healing software, applications and system components are restored quickly with less risk and downtime. The resiliency concern is satisfied once, for all applications running on the platform, as opposed to developing customized monitoring and restart scripts per application. The platform removes the ongoing cost and associated maintenance of bespoke resiliency solutions.

User Access and Authentication Management

Role-based access defines who can use the platform and how. Cloud Foundry uses role-based access control (RBAC), with each role granting permissions to a specific environment the user is targeting. All collaborators target an environment with their individual user accounts associated with a role that governs what level and type of access the user has within that environment.
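As a rough illustration of role-based access scoped to an environment, the sketch below binds each user to a role per environment and checks a requested permission against that role. The role names, permission names, users, and environments are hypothetical and chosen for illustration; they are not Cloud Foundry's actual role definitions.

```python
# Hypothetical roles and the permissions each one grants.
ROLE_PERMISSIONS = {
    "manager": {"view", "manage_users"},
    "developer": {"view", "push_app", "bind_service"},
    "auditor": {"view"},
}

# Hypothetical bindings: (user, environment) -> role.
role_bindings = {
    ("alice", "production"): "auditor",
    ("alice", "staging"): "developer",
    ("bob", "production"): "manager",
}


def is_allowed(user: str, environment: str, permission: str) -> bool:
    """Return True if the user's role in this environment grants the permission."""
    role = role_bindings.get((user, environment))
    if role is None:
        return False  # no role in this environment means no access
    return permission in ROLE_PERMISSIONS.get(role, set())


can_push_to_staging = is_allowed("alice", "staging", "push_app")
can_push_to_production = is_allowed("alice", "production", "push_app")
```

The same user can hold different roles in different environments, which is why the lookup is keyed on both the user and the targeted environment.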

Cloud Foundry’s User Account and Authentication (UAA) is the central identity-management service for both users and applications. In addition, the UAA’s user-identity store can be configured to connect to external user stores through LDAP or SAML. UAA is based on current security standards such as OAuth, OpenID Connect, and SCIM.

Security

Cloud Foundry protects you from security threats by applying security controls and isolating applications and data in the following ways:

It manages software-release vulnerability using new Cloud Foundry releases, created with timely updates to address code issues.
It manages OS vulnerability using a new OS created with patches for the latest security fixes.
It implements role-based access controls, applying and enforcing roles and permissions to ensure that users of the platform can only view and affect the resources they have been granted access to.
It secures both the code and the configuration of an application within a multitenant environment.
It deploys each application within its own self-contained and isolated containerized environment.
It prevents possible denial-of-service attacks through resource starvation.
It provides an operator audit trail showing all operator actions applied to the platform.
It provides a user audit trail recording all relevant API invocations of an application.
It implements network traffic rules (security groups) to prevent unwanted system access to and from external networks and production services, and between internal components.

Why is this important? Securing distributed systems is complex. For example, think about these issues:

How much effort is required to automatically establish and apply network traffic rules to isolate components?
What policies should be applied to automatically limit resources in order to defend against DoS attacks?
How do you implement role-based access controls with inbuilt auditing of system access and actions?
How do you know which components are potentially affected by a specific vulnerability and require patching?
How do you safely patch the underlying OS without incurring application downtime?

These examples are standard requirements for most systems running in corporate datacenters. The more bespoke engineering you use, the more you need to take on securing and patching that system. Distributed systems increase the security burden because there are more moving parts. Additionally, when it comes to rolling out security patches to update the system, many organizations suffer from configuration drift.

The Challenge of Configuration Drift

Deployment environments (such as staging, QA, and production) are often complex and time-consuming to construct and administer, producing the ongoing challenge of trying to manage configuration drift to maintain consistency between environments and VMs. Reproducible consistency through release engineering tool chains, such as Cloud Foundry’s BOSH component, addresses this challenge.

Cloud Foundry manages OS and software-release vulnerability using a new OS and new software releases, created with the required patches for the latest security fixes and code remediation.

Cloud Foundry eases the burden of rolling out these OS and software-release updates. Every component within Cloud Foundry is created with the same OS image. To patch Cloud Foundry, you do not apply the patch to a running OS or component; instead, you redeploy Cloud Foundry with an updated OS or software release. Cloud Foundry’s BOSH component redeploys updates component by component to ensure zero-to-minimal downtime. This removes the patching and updating concerns from the operator and provides a safer and more resilient way to update Cloud Foundry while keeping applications running.
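The component-by-component redeployment described above can be modeled as a rolling update that replaces a bounded number of instances at a time, so the remaining instances keep serving. This is a simplified sketch and not how BOSH itself is implemented: the function `rolling_update`, the parameter name `max_in_flight` as used here, and the instance and version names are illustrative assumptions.

```python
def rolling_update(instances: dict, new_version: str, max_in_flight: int = 1):
    """Update instances in batches, yielding (batch, snapshot) after each batch.

    `instances` maps an instance name to its current version and is updated in place.
    """
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    pending = sorted(name for name, version in instances.items() if version != new_version)
    for start in range(0, len(pending), max_in_flight):
        batch = pending[start:start + max_in_flight]
        for name in batch:
            # Stand-in for recreating the component from the updated image.
            instances[name] = new_version
        yield batch, dict(instances)


# Hypothetical fleet of three components on an old image.
fleet = {"router/0": "v1", "router/1": "v1", "api/0": "v1"}
steps = list(rolling_update(fleet, "v2", max_in_flight=1))
```

Each yielded snapshot shows that only the current batch changes at any step, which is what keeps the rest of the system available during the update.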

In addition to patching, if for any reason a component becomes compromised, it can be recreated quickly from a known, clean software release and OS image, and the compromised component can be moved into a quarantine area for further inspection.

This ability to redeploy Cloud Foundry components at will, from a known, healthy OS image and software release, and with zero-to-minimal downtime, provides an additional level of security and resiliency to the system. Your applications remain available for longer through a simple mechanism for applying updates.

The Application Life Cycle

In most traditional scenarios, the application developer:

Develops an application
Deploys application services
Deploys an application and connects (binds) it to application services
Scales an application, both up and down
Monitors an application
Upgrades an application

This application life cycle is in play until the application is decommissioned and taken offline. Cloud Foundry simplifies the application life cycle by offering self-service capabilities to the end user. Adopting a self-service approach removes handoffs and potentially lengthy delays between teams. For example, deploying an application, provisioning and binding services, scaling, monitoring, and upgrading are each available through a simple call to the platform.

Traditionally, deploying application code required provisioning and deploying VMs, operating systems, and middleware to create a development environment for the application to run in. Once that environment was provisioned, it required patching and ongoing maintenance. New environments were then created as the application moved through the deployment pipeline.

With Cloud Foundry, the application or task itself becomes the single unit of deployment. Developers no longer need to concern themselves with which application container to use, which version of Java, and which memory settings or garbage-collection (GC) policy to employ. They just push their application to Cloud Foundry, and it runs. Cloud Foundry removes the cost and complexity of configuring infrastructure and middleware per application. Using a self-service model, users can:

Deploy applications
Provision and bind additional services, such as messaging engines, caching solutions, and databases
Scale applications
Monitor application health and performance
Update applications
Delete applications

Zero-Downtime Upgrades

Applications running on the platform can be updated with zero downtime through a technique known as blue-green deployments.
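In a blue-green deployment, the new (green) version is brought up alongside the current (blue) version, the route is mapped to green, and only then is blue unmapped. The sketch below models that ordering with a plain route table. It is an illustration under stated assumptions: the function `blue_green_switch`, the `healthy` callback, and the app and route names are hypothetical and do not correspond to platform APIs.

```python
def blue_green_switch(routes: dict, route: str, blue: str, green: str, healthy) -> list:
    """Shift a route from blue to green, returning the mapping after each step.

    `routes` maps a route to the set of apps receiving its traffic. If green is
    unhealthy, nothing changes and an empty list is returned.
    """
    if not healthy(green):
        return []  # keep serving from blue
    history = []
    routes.setdefault(route, set()).add(green)  # both versions now receive traffic
    history.append(set(routes[route]))
    routes[route].discard(blue)                 # blue stops receiving traffic
    history.append(set(routes[route]))
    return history


routes = {"shop.example.com": {"shop-blue"}}
history = blue_green_switch(
    routes, "shop.example.com", "shop-blue", "shop-green", healthy=lambda app: True
)
```

At no point in the recorded history is the route left without an app, which is what makes the upgrade zero-downtime from the consumer's point of view.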

Removing the infrastructure, OS, and middleware configuration concerns from developers allows them to focus their whole effort on the application instead of deploying and configuring the supporting technologies. This keeps the development focus where it needs to be, on the business logic that generates revenue.

Aggregated Streaming of Logs and Metrics

Cloud Foundry provides insight into both the application and the underlying platform through aggregated logging and metrics. The logging and metrics system within Cloud Foundry is the inner voice of the system, telling the operator and developer what is happening. It is used to manage the performance, health, and scale of running applications and the platform itself.

Logs provide visibility into the behavior of running applications and system components, while metrics provide visibility into the health of components running the application. Operators can use metrics information to monitor an instance of Cloud Foundry.

Insights are obtained through storing and analyzing a continuous stream of aggregated, time-ordered events from the output streams of all running processes and backing services. The benefits of aggregated log, metric, and event streaming include the following:

Logs are streamed to a single endpoint.
Streamed logs provide timestamped outputs per application.
Both application logs and system-component logs are aggregated, simplifying their digestion.
A metrics collector gathers and streams metrics from the system components.
Logs can be viewed from the command line or drained into a log management service such as an ELK stack or Splunk.
Events show specific events like an application being started or stopped. Viewing events is useful when you are debugging problems by identifying crash information, such as a memory limit being exceeded.
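Aggregating several time-ordered streams into one can be sketched with a merge over already-sorted inputs. The example below uses Python's standard `heapq.merge` for this; it is a minimal model, not the platform's logging system, and the `LogEvent` type, the `aggregate` function, and the sample sources and messages are assumptions made for illustration.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class LogEvent(NamedTuple):
    timestamp: float
    source: str
    message: str


def aggregate(*streams: Iterable[LogEvent]) -> Iterator[LogEvent]:
    """Merge per-source streams, each already time-ordered, into one time-ordered stream."""
    return heapq.merge(*streams, key=lambda event: event.timestamp)


# Hypothetical output from an application instance and a system component.
app_logs = [
    LogEvent(1.0, "app/0", "application started"),
    LogEvent(4.0, "app/0", "handled request"),
]
router_logs = [
    LogEvent(2.0, "router", "route registered"),
    LogEvent(3.0, "router", "request forwarded"),
]

merged = list(aggregate(app_logs, router_logs))
merged_sources = [event.source for event in merged]
```

Because the merge is lazy, the same approach works on unbounded streams as long as each individual stream is ordered by timestamp.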

The cost of implementing an aggregated log and metrics-streaming solution involves bespoke engineering to orchestrate and aggregate the streaming of both syslog and application logs from every component within a distributed system into a central server. Using a platform removes the ongoing cost and associated maintenance of bespoke logging solutions.

Release Engineering through BOSH

IT operations are tasked with achieving operational stability. Historically, operational stability was achieved by reducing risk through limiting change. Limiting change is in direct conflict with shipping smaller features frequently. To manage the risks involved in frequent software releases, release-engineering tool chains are used. Release engineering is the part of the operations team typically concerned with turning source code into finished software components or products through the following steps:

Compilation
Versioning
Assembly/packaging
Deploying

Automating release-engineering concerns through a tool chain reduces risk, allowing for faster deployments with little to no human interaction.

Release-engineering tool chains are essential because they provide consistent repeatability. Source code, third-party components, data, and deployment environments of a software system are integrated and deployed in a repeatable and consistent fashion, with a historical view to track all changes made to the deployed system. This provides the ability to audit and identify all components that make up a particular release. Security teams can easily track contents of a particular release and recreate it at will if the need arises. Consistent repeatability de-risks software releases.

Cloud Foundry leverages a release-engineering tool chain known as BOSH. BOSH is a recursive acronym for BOSH Outer Shell. The outer shell refers to BOSH being a release tool chain that unifies release engineering, deployment, and life-cycle management of cloud-based software. BOSH is designed for large distributed systems such as Cloud Foundry but can equally be used to deploy smaller individual components such as etcd or Redis.

Rather than leveraging a bespoke integration of a variety of tools and techniques that provide solutions to individual parts of the release engineering goal, BOSH is designed to be a single tool covering the entire set of requirements of release engineering. BOSH enables software deployments to be:

Automated
Reproducible
Scalable
Monitored with self-healing failure recovery
Updatable with zero-to-minimal downtime

BOSH translates intent into action by ensuring that every provisioned release is identical and reproducible. This removes the challenge of configuration drift.

BOSH configures infrastructure through code. By design, BOSH tries to abstract away the differences between infrastructure platforms (IaaS or physical servers) into a generalized, cross-platform description of your deployment. This provides the benefit of being infrastructure agnostic (as far as possible).

BOSH performs monitoring, failure recovery, and software updates with zero-to-minimal downtime. Without such a release-engineering tool chain, all these concerns remain the responsibility of the operations team. A lack of automation exposes the developer to unnecessary risk.

Deploying and Scaling

Deploying and scaling applications are completely independent operations. This provides the flexibility to scale at will, without the cost of having to redeploy the application every time. Through commercial products such as Pivotal Cloud Foundry, auto-scaling policies can also be set up for dynamic scaling of applications when they hit certain configurable thresholds.
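A threshold-based scaling policy of the kind mentioned above can be sketched as a function that maps a current instance count and a load measurement to a target instance count. This is a hypothetical policy for illustration only and does not describe any commercial product's auto-scaler: the function name, the CPU metric, the threshold values, and the instance limits are all assumptions.

```python
def desired_instances(
    current: int,
    cpu_percent: float,
    *,
    scale_up_at: float = 80.0,
    scale_down_at: float = 20.0,
    minimum: int = 1,
    maximum: int = 10,
) -> int:
    """Return the target instance count for the observed CPU load."""
    if cpu_percent >= scale_up_at:
        target = current + 1
    elif cpu_percent <= scale_down_at:
        target = current - 1
    else:
        target = current
    # Never scale outside the configured bounds.
    return max(minimum, min(maximum, target))


scaled_up = desired_instances(2, cpu_percent=90.0)
scaled_down = desired_instances(2, cpu_percent=10.0)
```

Because scaling only changes the instance count, the application itself is not redeployed when the target changes.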

A Marketplace of On-Demand Services

Applications often require additional services. For example, they may require a persistent datastore for storing information or a message broker for communicating with other applications. Each environment within Cloud Foundry has the concept of a marketplace. A marketplace lists the services that have been made available to that specific environment by the platform operator. A service can offer different service plans to provide varying levels of resources or features for the same service. An example of service plans is a database service offering small, medium, or large plans with differing levels of concurrent connections and storage sizes. Service instances can be provisioned on demand by a user. The provisioned service provides a unique set of credentials that can be used to bind and connect an application to the service.
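The marketplace flow described above (browse plans, provision an instance, bind an application using the instance's credentials) can be modeled in a few lines. The sketch below is an illustration under stated assumptions rather than the platform's service API: the `Marketplace` and `ServiceInstance` classes, the `bind` function, the plan limits, and the credential fields are hypothetical.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class ServiceInstance:
    service: str
    plan: str
    credentials: dict
    bound_apps: set = field(default_factory=set)


class Marketplace:
    """Lists the services and plans made available to one environment."""

    def __init__(self, catalog: dict):
        # catalog maps service name -> plan name -> plan limits
        self.catalog = catalog

    def provision(self, service: str, plan: str) -> ServiceInstance:
        """Create a service instance with its own unique credentials."""
        if plan not in self.catalog.get(service, {}):
            raise KeyError(f"{service}/{plan} is not offered in this marketplace")
        credentials = {
            "username": secrets.token_hex(8),
            "password": secrets.token_hex(16),
        }
        return ServiceInstance(service, plan, credentials)


def bind(app: str, instance: ServiceInstance) -> dict:
    """Record the binding and hand the instance's credentials to the app."""
    instance.bound_apps.add(app)
    return dict(instance.credentials)


# Hypothetical catalog with two plans for one database service.
market = Marketplace({
    "database": {
        "small": {"max_connections": 20, "storage_gb": 1},
        "large": {"max_connections": 200, "storage_gb": 50},
    }
})
db = market.provision("database", "small")
other_db = market.provision("database", "small")
app_credentials = bind("my-app", db)
```

Each provisioned instance carries its own credentials, so two applications bound to different instances of the same service never share access.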

Chapter Summary

Cloud Foundry does more on your behalf through its built-in platform capabilities and the BOSH release-engineering tool chain.

Cloud Foundry provides:

Built-in resiliency through automated recovery and self-healing of failed applications, components, and processes
Built-in resiliency through striping applications across different resources
Authentication and authorization for both users and applications, with the addition of role-based access control for users
Inbuilt security
The ability to update the platform with zero-downtime rolling upgrades across the system
Applications with the ability to connect to a wide array of services via both platform-managed service brokers and services running in the existing IT infrastructure
Built-in management and operation services for your application, such as metrics and log aggregation, monitoring, auto-scaling, and performance management

Cloud Foundry by Duncan C. E. Winn