Chapter 4. The Path to the Cloud

There is no question that, whether public or private, cloud computing reigns as the new industry standard. This, of course, does not mean everything shifts overnight, but rather that data architects must ensure that their decisions fit with this path forward.

In this chapter, we look at the major shifts moving cloud architectures forward and how you can best use them for data processing.

Cloud Is the New Datacenter

Today, cloud deployments have become the preferred method for new companies building data processing applications. The cloud has also become the dominant theme for traditional businesses as these organizations look to drive new applications and optimize the cost of existing ones.

Cloud computing has essentially become the shortcut to having your own datacenter, albeit now with on-demand resources and a variety of built-in services.

Though early implementations of cloud computing came with some inherent differences compared to traditional datacenter architectures, those gaps are closing quickly.

Architectural Considerations for Cloud Computing

Understandably, cloud computing has a few architectural underpinnings different from traditional on-premises deployments. In particular, server persistence, scalability, and security need a new lens (Figure 4-1).

Figure 4-1. Architectural considerations for cloud computing

Persistence

Perhaps one of the most noticeable differences between traditional on-premises and cloud architectures is server or machine persistence. In the on-premises world, individual servers ran specific applications and architects worked diligently to ensure that each individual server and corresponding application had a high availability plan, typically implemented with redundancy.

In the cloud world, servers are much more ephemeral, and persistence is more often maintained outside of the server itself. For example, with the popular AWS offerings, the server might rely on storage options such as Amazon S3 or Elastic Block Store (EBS) to maintain persistence. This approach understandably requires changes to conventional applications.
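
As a minimal sketch of this pattern, the following Python example keeps a worker's progress in storage that outlives the worker itself. Everything here is hypothetical and for illustration only: `ExternalStore` uses a local directory as a stand-in for a service such as S3 or EBS, and the `Worker` class and the `progress` key are invented names, not part of any AWS API.

```python
import json
import tempfile
from pathlib import Path


class ExternalStore:
    """Stand-in for durable storage that outlives any one server.

    Assumption: a local directory plays the role of an object store or
    attached volume so that the example runs anywhere.
    """

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, value: dict) -> None:
        (self.root / f"{key}.json").write_text(json.dumps(value))

    def get(self, key: str, default: dict) -> dict:
        path = self.root / f"{key}.json"
        return json.loads(path.read_text()) if path.exists() else default


class Worker:
    """An ephemeral server: it holds no state that it cannot reload."""

    def __init__(self, store: ExternalStore) -> None:
        self.store = store
        # On startup, resume from whatever the previous server saved.
        self.state = store.get("progress", {"records_processed": 0})

    def process(self, batch_size: int) -> None:
        self.state["records_processed"] += batch_size
        # Checkpoint after every batch so a replacement can resume.
        self.store.put("progress", self.state)


def demo() -> int:
    with tempfile.TemporaryDirectory() as tmp:
        store = ExternalStore(Path(tmp))
        first = Worker(store)
        first.process(100)
        del first  # simulate the server being terminated
        replacement = Worker(store)
        replacement.process(50)
        return replacement.state["records_processed"]
```

In this sketch the replacement worker picks up where the terminated one stopped, because the only durable copy of the state lives in the store rather than on the server.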

That said, from an application perspective, cloud servers should appear persistent, and this is increasingly the norm. For the cloud to be successful, enterprises need the same reliability and availability from application servers that they saw in on-premises deployments.

Scalability

Conventional approaches also focused on scale-up computing models with ever-larger servers, each having a substantial compute and memory footprint. The cloud, however, represents the perfect platform to adopt distributed computing architectures, and this might be one of the most transformative aspects of the cloud.

Whereas traditional applications were often designed with a single server in mind, and an active–passive or active–active paired server for availability, new applications make use of distributed processing and frequently span tens to hundreds of servers.
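
The scale-out idea can be illustrated with a small Python sketch that splits a job into shards, processes each shard independently, and combines the partial results. This is an illustration under stated assumptions, not a production design: threads stand in for separate servers, and the function names (`partition`, `process_shard`, `scale_out_sum`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items: list[int], n_workers: int) -> list[list[int]]:
    """Split items into n_workers shards using round-robin assignment."""
    return [items[i::n_workers] for i in range(n_workers)]


def process_shard(shard: list[int]) -> int:
    """Work done by one server; here, simply a partial sum."""
    return sum(shard)


def scale_out_sum(items: list[int], n_workers: int) -> int:
    """Fan the shards out to workers, then combine the partial results."""
    shards = partition(items, n_workers)
    # Assumption: each thread stands in for one server in the cluster.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_shard, shards))
    return sum(partials)
```

Calling `scale_out_sum(list(range(1, 101)), 8)` gives the same total as a single-server `sum`, but the work is structured so that adding workers, rather than buying a larger machine, is the way to handle more data.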

Security

Across all aspects of computing, but in particular data processing, security plays a pivotal role. Today cloud architectures provide robust security mechanisms, but often with specific implementations dedicated to specific clouds or services within a designated cloud.

This dedicated security model for a single cloud or service can be challenging for companies that want to maintain multicloud architectures (something we discuss in more detail in Chapter 9).

Moving to the Cloud

Given cloud ubiquity, it is only a matter of time before more and more applications are cloud-based. Although every company has its own reasons for going to the cloud, the dominant themes revolve around cost optimization and revenue creation, as illustrated in Figure 4-2.

Figure 4-2. Economic considerations for moving to the cloud

Cost Optimization

Building and operating a datacenter involves large capital expenditures that, when coupled with software and maintenance costs, force companies to explore cost-savings options.

Startup costs

Startup costs for cloud architectures can be low, given that you do not need to make an upfront investment, outside of scoping and planning.

Maintenance cost

Because many cloud offerings are “maintained” by the cloud providers, users simply consume the service without worrying about ongoing maintenance costs.

Perpetual billing costs

This area needs attention because the cloud bills continuously. In fact, an entire group of companies and services has emerged to help businesses mitigate and control cloud computing costs. Companies headed to the cloud must consider billing models and design the appropriate governance procedures in advance.
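
To make the billing point concrete, here is a rough Python sketch that estimates a monthly bill from hourly rates and checks it against a budget. All rates, resource names, and the budget figure are hypothetical and do not reflect any provider's real pricing. The 730 hours per month is a common approximation (8,760 hours in a year divided by 12).

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # approximation: 8,760 hours per year / 12 months


@dataclass(frozen=True)
class Resource:
    name: str
    hourly_rate_usd: float  # hypothetical rate, not a real price
    count: int
    hours_per_month: float = HOURS_PER_MONTH  # always-on unless overridden


def monthly_cost(resources: list[Resource]) -> float:
    """Sum rate x instance count x hours for every resource."""
    return sum(r.hourly_rate_usd * r.count * r.hours_per_month for r in resources)


def over_budget(resources: list[Resource], budget_usd: float) -> bool:
    """Simple governance check: does the estimate exceed the budget?"""
    return monthly_cost(resources) > budget_usd


# Hypothetical deployment: four always-on servers plus a short batch job.
deployment = [
    Resource("app-server", hourly_rate_usd=0.10, count=4),
    Resource("batch-worker", hourly_rate_usd=0.50, count=20, hours_per_month=36),
]
```

With these made-up numbers, the always-on servers cost about as much as the 20-machine batch job that runs for only 36 hours, which is why running hours matter as much as instance size when designing governance procedures.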

Revenue Creation

Businesses also pursue cloud deployments with the expectation of driving new revenue streams, and they benefit from the following approaches.

Rapid prototyping

With virtually unlimited resources available on demand, companies can rapidly set up new application prototypes without risking significant capital expenditures.

Temporary application deployments

For cases in which a large amount of computing power is needed temporarily, the cloud fills the gap. One early cloud success story showcased how The New York Times converted images of its archive using hundreds of machines in less than 36 hours. Without this on-demand capability, the solution would likely have been economically impractical. As Derek Gottfrid explains in a Times blog post:

This all adds up to terabytes of data, in a less-than-web-friendly format. So, reusing the EC2/S3/Hadoop method I discussed back in November, I got to work writing a few lines of code. Using Amazon Web Services, Hadoop and our own code, we ingested 405,000 very large TIFF images, 3.3 million articles in SGML and 405,000 XML files, mapping articles to rectangular regions in the TIFF’s. This data was converted to a more web-friendly 810,000 PNG images (thumbnails and full images) and 405,000 JavaScript files—all of it ready to be assembled into a TimesMachine. By leveraging the power of AWS and Hadoop, we were able to utilize hundreds of machines concurrently and process all the data in less than 36 hours.1

New applications

Nothing gets businesses moving on new applications quite like an opportunity for revenue. One emerging area is Customer 360: assembling a wide variety of data from disparate sources to create a unified picture of customer activity.

Because application data in these cases comes from a variety of web and mobile sources, the cloud is a perfect aggregation point to enable high-speed collection and analysis.

Choosing the Right Path to the Cloud

When choosing a path to the cloud, the data processing infrastructure remains a critical enabling decision.

Today, many cloud choices are centered on only one cloud provider, meaning that after you begin to consume the offerings of that provider, you remain relatively siloed in one cloud, as depicted in Figure 4-3.

Figure 4-3. A single cloud provider approach

However, most companies are looking toward a hybrid, multicloud approach that covers not only public cloud providers but also enterprise datacenters and managed services, as shown in Figure 4-4.

Figure 4-4. The multicloud approach

The multicloud approach for data and analytics focuses on solutions that can run anywhere; for example, in any public cloud, an enterprise datacenter, or a managed service. With this full spectrum of deployment options available, companies can take complete advantage of the cloud while retaining the flexibility and portability to move and adapt as needed.
