Chapter 4. Multicloud Security Use Cases

In Chapters 2 and 3, we covered multicloud security at the core and the edge, but there are other use cases for multicloud security that we need to explore in more depth. Not every use case described in this chapter applies to every multicloud installation, but it is important to be aware of them in case they become relevant.

Each use case described in this chapter was designed to improve availability, reduce the threat of targeted application attacks, or mitigate the threat of DDoS attacks.

DNS Resiliency and Traffic Steering

When we talk about securing the edge of your multicloud architecture, it’s important to begin with DNS. Surprisingly, using DNS to enhance the security of a multicloud environment is often an afterthought, or not considered at all. But there are a lot of security enhancements that high-quality DNS infrastructure can provide.

Selection of a DNS provider is important because without reliable DNS your application will be effectively unreachable. The right DNS provider can enhance the security of web applications by improving resilience and optimizing traffic management.

DNS Resiliency

There are a couple of ways that the appropriate DNS architecture can improve the resiliency of your multicloud architecture. The first is by hosting primary and secondary DNS name servers on separate networks and different provider platforms. When registering a new domain, the registrar asks for a list of name servers. Most people default to the servers the registrar provides to them. Or, if they have a separate DNS provider, they use the name servers given by that provider.

But the DNS protocol allows for primary and secondary name servers. A primary name server is the authoritative name server that hosts the zone file, which contains the relevant information for a domain name or subdomain. The secondary name server receives automatic updates from the primary and does not need to reside on the same network. In fact, hosting the secondary name service on a different network is highly recommended. This increases the resiliency of the DNS setup and means that even if there were a complete outage within the primary provider, the secondary DNS service would continue to serve at least some traffic.
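As a quick sanity check, you can list the name servers your domain actually delegates to and confirm that they are spread across more than one provider's network. The short Python sketch below assumes the third-party dnspython library is installed and uses example.com as a placeholder domain; it is illustrative, not a complete audit.

```python
# A minimal sketch: list the NS records published for a domain so you can
# confirm they span more than one provider network. Requires dnspython
# (pip install dnspython); "example.com" is a placeholder.
import dns.resolver

def list_name_servers(domain: str) -> list[str]:
    """Return the NS records published for a domain."""
    answer = dns.resolver.resolve(domain, "NS")
    return sorted(record.target.to_text() for record in answer)

if __name__ == "__main__":
    for ns in list_name_servers("example.com"):
        # If every entry here belongs to the same provider, a single outage
        # can take the whole domain offline; mixing providers adds resiliency.
        print(ns)
```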

The second way that DNS providers promote resiliency is through the use of anycast on their authoritative name servers. Anycast is a routing technique commonly (but not always, so check with your provider) implemented by DNS providers as a way to improve availability and speed up responses to DNS queries. Anycast allows multiple, geographically diverse servers to share the same IP address. For example, the “hints” file that sits on every recursive server points to 13 root servers, but those 13 addresses actually mask more than 600 servers dispersed around the world. The IP address for each root server is an anycast address shared by multiple servers.

When choosing or changing DNS providers, it is important to find out whether the authoritative DNS servers used for your domain sit behind anycast IP addresses. Anycast doesn’t just act as a force multiplier by putting multiple DNS servers behind a single address; it also speeds up responses by relying on existing routing protocols to ensure that each query is answered by the closest server on the network. In other words, anycast increases both resiliency and performance.

DNS Traffic Steering

The DNS resiliency features discussed in the previous section are built into DNS and help DNS work effectively in large-scale deployments. But there are features that some DNS providers include as add-ons that aren’t built into the DNS protocol. This doesn’t mean that these features are any less valuable or important to security. It just means that fewer DNS providers offer them.

Most of these enhanced features revolve around traffic steering, the process of redirecting requests between different sites in a multicloud architecture based on a previously defined set of rules. DNS sits in a unique position because it is the first step in the process of requesting content from a web application. As a result, DNS is an effective place to implement traffic steering rules.

For example, organizations in the process of adding a new site to a multicloud solution might want to ease it into the rotation to ensure that the new site is as resilient as the others. In a case like this, the organization might configure the traffic-steering-capable DNS service to send only 20% of the traffic to the new site.

A more common approach is to use a DNS provider to perform health checks on your services before responding to a DNS request with an answer. In this type of traffic-steering setup, the DNS provider continuously monitors the health of all web properties, regardless of where they are hosted. If a cloud provider stops responding to requests, the DNS server will no longer direct DNS requests to that cloud provider until it is responding again.
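Conceptually, a traffic-steering DNS service combines both ideas: it answers queries only with origins that pass a health check, and it weights the answers so that a newly added site receives a smaller share of traffic, as in the 20% example above. The following Python sketch is purely illustrative; the IP addresses, health-check URLs, and weights are hypothetical, and real providers expose this behavior through their own consoles and APIs rather than code you run yourself.

```python
# Conceptual sketch of health-checked, weighted DNS answers.
import random
import urllib.request

# Weights of 40/40/20: the new site gets roughly 20% of the answers.
ORIGINS = [
    {"ip": "203.0.113.10", "health_url": "https://east.example.com/healthz", "weight": 40},
    {"ip": "203.0.113.20", "health_url": "https://west.example.com/healthz", "weight": 40},
    {"ip": "203.0.113.30", "health_url": "https://new.example.com/healthz",  "weight": 20},
]

def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Treat any HTTP 2xx response within the timeout as healthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        return False

def pick_answer() -> str:
    """Return the IP address to hand back for the next DNS query."""
    healthy = [o for o in ORIGINS if is_healthy(o["health_url"])]
    pool = healthy or ORIGINS  # fail open if every health check fails
    chosen = random.choices(pool, weights=[o["weight"] for o in pool], k=1)[0]
    return chosen["ip"]

print(pick_answer())
```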

Using DNS in this manner is not quite as reliable as using a load balancer, but it can be much more cost effective. Customers or end users will be able to access the web applications without interruption, even during a major outage at a cloud provider’s datacenter.

Finally, some DNS providers are able to use traffic steering as a way to maximize the performance of web applications. Performance optimization can take many forms. A common type of performance optimization is redirection of traffic based on geolocation information. In a multicloud environment, organizations might have the same web application running in dozens or even hundreds of locations. DNS providers can optimize incoming DNS requests and send visitors to the closest cloud providers. For example, suppose that the multicloud architecture is set up in datacenters in Miami, San Francisco, London, and Tokyo, and a DNS request comes in from a location in Atlanta. A DNS provider that has traffic steering capabilities can direct that request to the datacenter in Miami.
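The selection logic behind geolocation steering can be illustrated with a simple nearest-datacenter calculation. The following sketch reuses the Miami, San Francisco, London, and Tokyo example; the coordinates are approximate, and great-circle distance is a stand-in for the richer network measurements commercial providers actually use.

```python
# Rough sketch of geolocation-based steering: return the closest datacenter
# to the coordinates the DNS provider derives for the querying resolver.
from math import radians, sin, cos, asin, sqrt

DATACENTERS = {
    "Miami": (25.76, -80.19),
    "San Francisco": (37.77, -122.42),
    "London": (51.51, -0.13),
    "Tokyo": (35.68, 139.69),
}

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) pairs in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def closest_datacenter(client_location):
    return min(DATACENTERS, key=lambda name: haversine_km(client_location, DATACENTERS[name]))

# A request originating near Atlanta lands on the Miami datacenter.
print(closest_datacenter((33.75, -84.39)))  # -> "Miami"
```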

You can use these same techniques to identify potential new locations in which to expand the cloud architecture. For example, if the DNS provider begins recording an increasing number of DNS requests from Morocco, and this is confirmed as a growing trend, it might be time to consider expanding to a cloud provider with a datacenter in Africa.

Again, these capabilities are not inherently part of DNS and not all DNS providers support them. If your DNS provider does support them, these features can help to enhance the security and availability of your multicloud architecture.

Bot Management

Organizations running public-facing web applications have undoubtedly run into problems with bots. Bots are programs that automate activity across the internet. Some bots are useful, such as Google’s search crawler, whereas other bots engage in malicious activity. Malicious bot traffic accounts for almost 44% of all website traffic, according to Distil Networks.

Malicious bots engage in a wide range of activity from information stealing and automated hacking attempts to DDoS attacks. We discuss DDoS prevention in a later section. For now, let’s focus on preventing bots from stealing information or hijacking services.

Companies often use bots to target competitors’ sites and gather pricing information to ensure that their prices are lower. Although these simple bots are mostly an annoyance, more advanced bots can be used to cause serious damage.

For example, bots often target airline and hotel reservation sites. These bots will make reservations for flights or hotels, keeping legitimate clients from being able to make reservations and forcing those clients to go to a competitor. The bots then cancel the reservations, or simply let their carts expire (depending on the site and service), costing the airline or hotel site millions in lost revenue.

Other bots engage in even more malicious activity. These bots scan hundreds of millions of sites looking for specific flaws to exploit. They might probe for cross-site scripting (XSS) vulnerabilities, or they might scan for sites running vulnerable versions of common applications, such as WordPress or Drupal. Upon finding a vulnerable site, they will exploit it and steal sensitive information or install malware or a form-jacking script to intercept credit card data.

Unfortunately, no matter how secure your site is, it is still possible to be attacked by these bots. With hundreds of thousands of bots scanning the internet around the clock, your security needs to be perfect every time, but the bad guys need to get lucky only once.

A multicloud installation of any size requires a cloud-based bot mitigation solution. Stopping malicious bots before they have a chance to interact with web applications keeps web services available for clients and end users. Outsourced bot protection services also offer several advantages over the do-it-yourself approach.

For starters, because these services have seen thousands of bots, they have the ability to detect bot traffic earlier than if you were trying to do it yourself. They effectively identify patterns because they are monitoring for signs of bot traffic across all of their customers, not just your own traffic. They can even detect bot traffic that is operating in “low and slow” mode, avoiding detection by accessing the target web application infrequently and from a range of IP addresses designed to look innocuous.

These services also have ways of challenging potentially suspicious traffic while not disrupting service if the traffic is legitimate. One way that sites manage this type of behavior is through the use of CAPTCHAs, small challenges designed to distinguish humans from bots. If you have ever seen the question, “How many of these pictures have traffic lights?” or “How many images contain cars?” you have experienced a CAPTCHA challenge.

Unfortunately, bots are getting very good at solving CAPTCHAs—some bots are better at it than a lot of people. Rather than relying on faulty CAPTCHAs to distinguish humans from bots, bot management services will try JavaScript challenges and other methods of querying the browser to make that distinction. Because bots don’t have full browsers behind them, they almost always fail these types of challenges.
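To make the idea concrete, here is a simplified sketch of the server side of a JavaScript challenge: the server returns a small script that, when executed by a real browser, stores a signed, short-lived token in a cookie, and only requests presenting a valid token are let through. The signing key, token format, and cookie name are illustrative assumptions; commercial bot management services combine this with many additional browser-fingerprinting signals.

```python
# Simplified, illustrative server side of a JavaScript challenge.
import hashlib
import hmac
import time

SECRET = b"rotate-me-regularly"      # hypothetical server-side signing key
TOKEN_TTL_SECONDS = 300

def issue_token() -> str:
    """Build the token the challenge script will store in a cookie."""
    issued_at = str(int(time.time()))
    signature = hmac.new(SECRET, issued_at.encode(), hashlib.sha256).hexdigest()
    return f"{issued_at}.{signature}"

def challenge_page() -> str:
    """HTML returned to unverified clients; only JS-capable clients set the cookie."""
    return (f"<script>document.cookie = 'js_ok={issue_token()}; path=/';"
            "location.reload();</script>")

def token_is_valid(token: str) -> bool:
    """Accept the request only if the token's signature and age check out."""
    try:
        issued_at, signature = token.split(".", 1)
        age = time.time() - int(issued_at)
    except ValueError:
        return False
    expected = hmac.new(SECRET, issued_at.encode(), hashlib.sha256).hexdigest()
    return age <= TOKEN_TTL_SECONDS and hmac.compare_digest(expected, signature)

print(token_is_valid(issue_token()))   # True: a browser that ran the script
print(token_is_valid("fake.token"))    # False: a bot replaying junk
```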

Bot management services can significantly reduce the amount of malicious bot traffic that reaches your web application. Cloud-based bot management services can be quickly deployed across a multicloud architecture, and you can easily add or remove them as you scale up or scale down services within the multicloud environment.

API Protection

In Chapter 3, we discussed the importance of APIs in a multicloud architecture. APIs are used to connect all of the disparate services running in a multicloud environment and are critical for getting information from one source to another and presenting it in a unified manner to an end user or client.

This is why API protection is so important. Attackers have become wise to the fact that APIs can provide them with a treasure trove of sensitive information. As a result, these bad actors are constantly looking for ways to exploit APIs, including the use of bots.

API protection technology encompasses a number of different areas:

  • Limiting who or what can access APIs

  • Limiting how much data can be retrieved at any one time

  • Ensuring that all data transmitted via API is properly encrypted

Many of these protections can be put in place using a combination of an API gateway and a WAF. This means that APIs don’t necessarily require an investment in new technology. Organizations simply need to take advantage of the right API gateway and WAF features. Let’s take a closer look at the process of ensuring API protection.

As a rule, humans should not access an API directly. An API is designed to programmatically share information from one system to another; the second system renders the information and presents it in a way that humans can understand and use. Clients and end users should automatically be blocked from making API calls, and those calls should come only from authorized systems that are part of the multicloud architecture or from trusted partners that might also be querying your backend systems.

Even when users are restricted from making API calls directly, they can still find ways to manipulate authorized systems into making malicious API calls. This is why it is not enough to restrict which systems can make calls. The calls themselves should also be limited in the amount and type of data they return.

For example, an API call from a public-facing system should never reveal multiple usernames and passwords. (In fact, it is probably a bad idea to have the ability to reveal a password at all.) Additionally, there shouldn’t be a way to pull all customer names and email addresses from a public-facing system. Using the API gateway to control access and creating signatures on the WAF to block these types of queries can prevent data leaks from occurring, even if an API is unintentionally left exposed.
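Two of those controls—stripping fields that should never leave a public-facing API and capping how many records a single call can return—can be sketched in a few lines. The field names and record limit below are hypothetical; in practice these rules are enforced in the API gateway or WAF rather than in application code.

```python
# Illustrative response-sanitization rules: field allow-list plus a record cap.
ALLOWED_FIELDS = {"id", "display_name", "city"}   # explicit allow-list
MAX_RECORDS_PER_CALL = 50

def sanitize_response(records: list[dict]) -> list[dict]:
    """Drop disallowed fields and enforce the per-call record cap."""
    if len(records) > MAX_RECORDS_PER_CALL:
        records = records[:MAX_RECORDS_PER_CALL]
    return [{k: v for k, v in record.items() if k in ALLOWED_FIELDS}
            for record in records]

raw = [{"id": 1, "display_name": "A. Customer", "email": "a@example.com",
        "password_hash": "<hash>", "city": "Miami"}]
print(sanitize_response(raw))
# [{'id': 1, 'display_name': 'A. Customer', 'city': 'Miami'}]
```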

You also can use API gateways and WAFs to enforce encryption policies across API communication. Most API communication should be conducted using Transport Layer Security (TLS) encryption. If for some reason an API is not encrypted natively, the call can be forced to run across a VPN to provide the communication with some level of encryption.
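At its simplest, the encryption policy can also be enforced in code by refusing to call any API endpoint that is not served over HTTPS, as in the minimal sketch below (the endpoint URL is a placeholder).

```python
# Minimal sketch: reject any API endpoint that would be called over plain HTTP.
from urllib.parse import urlparse

def require_tls(endpoint: str) -> str:
    """Raise if an API endpoint is not served over HTTPS."""
    if urlparse(endpoint).scheme != "https":
        raise ValueError(f"Refusing unencrypted API endpoint: {endpoint}")
    return endpoint

require_tls("https://api.example.com/v1/orders")    # allowed
# require_tls("http://api.example.com/v1/orders")   # would raise ValueError
```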

APIs are a critical component of multicloud architecture. It takes a great deal of planning to deploy APIs securely and ensure that the data shared between different systems via API calls remains protected.

Application-Layer DDoS Protection

DDoS attacks are a reality that any organization hosting a public-facing web application must face. DDoS attacks can be launched at any time against any organization for any reason, or no reason. Protecting against these attacks should be part of your design plan.

There are two types of DDoS attacks that we focus on in this book: application-layer DDoS attacks and network-layer DDoS attacks.

As the name suggests, application-layer DDoS attacks target a specific web application or API, consuming Layer 7 (L7) resources and preventing legitimate users from accessing the service. Network-layer DDoS attacks flood the entire network with traffic, making all resources at your cloud provider unavailable, not just a specific web application or service.

Application-layer DDoS attacks are often more difficult to detect and stop because the attackers send well-formed HTTP requests that look legitimate. As a result, sorting through the traffic and separating the attacker’s requests from legitimate requests can be a challenge.

There are a number of ways to implement effective application-layer DDoS protection. The most common method is to use a WAF to block malicious requests before they can reach the web application itself. This requires implementing a WAF that can process a high volume of traffic without slowing down legitimate requests.

It also often involves understanding the nature of each attack. The very nature of a DDoS attack means that the attack is distributed—originating from thousands or hundreds of thousands of IP addresses. The attacks can also blend in with legitimate traffic, at least at first glance. Fortunately, there are usually distinguishing features to application-layer DDoS attacks. These features allow security teams to build signatures that can be deployed to the WAF and stop that traffic without impeding the flow of legitimate traffic.
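One common distinguishing feature is request rate per client fingerprint. The sketch below shows a rate-based rule of the kind a WAF might apply: fingerprint each request by client IP, user agent, and path, and block a fingerprint once it exceeds a request budget within a sliding window. The threshold and window size are illustrative assumptions; production rules typically combine many more attributes.

```python
# Illustrative rate-based rule keyed on a simple request fingerprint.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 10
MAX_REQUESTS_PER_WINDOW = 100

_history: dict[tuple, deque] = defaultdict(deque)

def should_block(client_ip: str, user_agent: str, path: str) -> bool:
    """Return True when a fingerprint exceeds its budget for the window."""
    fingerprint = (client_ip, user_agent, path)
    now = time.monotonic()
    timestamps = _history[fingerprint]
    timestamps.append(now)
    while timestamps and now - timestamps[0] > WINDOW_SECONDS:
        timestamps.popleft()
    return len(timestamps) > MAX_REQUESTS_PER_WINDOW

# A burst of identical requests from one source trips the rule.
hits = [should_block("198.51.100.7", "bot/1.0", "/search") for _ in range(150)]
print(hits.count(True))   # requests beyond the budget would be blocked
```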

A cloud-based WAF enables organizations to quickly deploy these protections across the entire multicloud infrastructure. If your organization does not possess the skill sets required to identify these patterns and implement protections, it might be advantageous to take a look at managed WAF services, which are available from providers who can monitor and deploy protections on your behalf.

Another way to protect against some types of application-layer DDoS attacks is to use different types of challenges. We discussed this type of protection earlier when we took a look at bot protection. Some types of application-layer DDoS attacks behave similarly to bots. In fact, they often use the same underlying technology, which means that they can be stopped by using many of the same techniques.

Presenting suspicious traffic with a CAPTCHA or a JavaScript challenge before it can proceed to the targeted web application allows you to quickly distinguish between malicious and legitimate traffic. These checks don’t need to be applied universally. Instead, you can build rules that look for traffic patterns outside the norm and deploy the checks only when those patterns are identified. The advantage of this methodology is that you don’t need to identify suspicious traffic, only traffic that lies outside of normal behavior. A disadvantage is that you run the risk of slowing down legitimate requests, which can cause clients to abandon the web application entirely.
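A simple way to decide when to switch the challenges on is to compare current traffic against a rolling baseline, as in the sketch below. The multiplier and window length are assumptions chosen for illustration.

```python
# Enable challenges only while traffic deviates sharply from a rolling baseline.
from collections import deque

BASELINE_WINDOW = 60          # how many past intervals make up the baseline
TRIGGER_MULTIPLIER = 3.0      # challenge when traffic is 3x the baseline

recent_counts: deque = deque(maxlen=BASELINE_WINDOW)

def challenges_enabled(requests_this_interval: int) -> bool:
    """Decide whether to put challenges in front of traffic this interval."""
    baseline = (sum(recent_counts) / len(recent_counts)) if recent_counts else None
    recent_counts.append(requests_this_interval)
    if baseline is None:
        return False                      # not enough history yet
    return requests_this_interval > TRIGGER_MULTIPLIER * baseline

normal_traffic = [100, 110, 95, 105]
for count in normal_traffic + [400]:      # a sudden surge trips the check
    print(count, challenges_enabled(count))
```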

Network-Layer DDoS Protection

The second type of DDoS attack is one that occurs at the network layer. These attacks abuse network protocols, such as DNS, Network Time Protocol (NTP), or Memcached, to flood an entire network with so much traffic that all of the systems are overwhelmed. The largest network-layer DDoS attack ever reported generated 1.35 terabits per second of sustained traffic, which is more than all but the largest of networks can handle.

Network-layer DDoS protection requires a different set of security measures, implemented at the network layer. Most cloud providers will not be able to stop these attacks at your edge.

The trick is to put DDoS protections in place that will intercept malicious network traffic and stop it before it has a chance to even reach the cloud provider. DDoS protection services monitor for malicious traffic and stop it even before it can reach the edge of your architecture. Unlike application-layer DDoS attacks, network-layer attacks don’t “blend in” with existing traffic in your network, so there is little chance of disrupting legitimate traffic to your web application while stopping the DDoS attack.

This is another advantage of running a multicloud architecture. By building a geographically diverse environment that is hosted across multiple networks, you are building in redundancy that makes the web application less susceptible to network-layer DDoS attacks. Even if one site is temporarily taken offline, the other sites can manage the load.

As with other types of security measures, it is important to plan your DDoS mitigation strategy and test that plan repeatedly to verify that the sites function as planned during the different types of DDoS attacks.

Deep Internet Monitoring: Data Intelligence

We’ve touched on monitoring a few times throughout this book, but it’s worth a deeper examination. A multicloud infrastructure is complex by definition and requires a sophisticated monitoring solution. It is not simply a matter of monitoring the network infrastructure to ensure that it is up and running and monitoring the performance of the application. Organizations also need to understand how the web application performs from locations around the world.

Deep internet monitoring is about more than monitoring your architecture and how different sites connect to one another. It is about monitoring the performance of the internet itself. The internet is resilient, and a full outage is highly unlikely. But small, regional outages occur all the time. In 2018, there were 12,600 routing incidents worldwide. These types of disruptions can last for minutes, hours, or days and affect only part of the internet.

Organizations might be completely oblivious to these disruptions as they occur. But they can affect the people trying to reach a web application, no matter how much redundancy is in place. Your clients won’t care why there is a disruption; they’ll care only about the fact that they cannot reach your site. This is why it is so important to understand how your web application is performing from as many places as possible.

Collecting monitoring and performance data from a large number of sources and tracking performance over time allows organizations to better understand problems and react quickly to outages. These monitoring trendlines can help organizations pick the best cloud providers when they need to stand up additional infrastructure. They also help organizations determine which cloud providers are underperforming, ultimately saving money while improving the client experience.
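At its most basic, this kind of monitoring boils down to timing requests to each regional endpoint from many vantage points and keeping the measurements for trend analysis. The sketch below shows a single probe run; the endpoint URLs are placeholders, and a real deep internet monitoring service runs such probes from hundreds of locations worldwide.

```python
# Bare-bones synthetic probe: time an HTTPS request to each regional endpoint.
import time
import urllib.request

ENDPOINTS = {
    "miami": "https://miami.example.com/healthz",
    "london": "https://london.example.com/healthz",
    "tokyo": "https://tokyo.example.com/healthz",
}

def probe(url, timeout=5.0):
    """Return response time in seconds, or None if the endpoint is unreachable."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None

measurements = {region: probe(url) for region, url in ENDPOINTS.items()}
for region, latency in measurements.items():
    status = f"{latency:.3f}s" if latency is not None else "unreachable"
    print(f"{region}: {status}")
```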

Combined Policy, Management, and Visibility

Everything described in this book works only if your organization combines policy, management, and visibility across all of your cloud providers. This is absolutely necessary, even though each cloud provider in a multicloud architecture has different capabilities and different provisioning and management platforms.

The only way for a multicloud solution to be efficient and effectively managed is if everything is viewed through the same platform on your end. Using a unified platform that is cloud-provider aware can make it easier to deploy services, enforce policies, and manage and monitor all providers through the same “pane of glass.”

This also allows organizations to take advantage of efficiencies of scale. Organizations can quickly deploy to new datacenters, patch systems across all cloud providers simultaneously, and enforce new security policies, such as new DDoS protection rules. A centralized management and monitoring framework provides end-to-end visibility and gives organizations an early warning when systems are beginning to fail.

Complexity Requires Granularity of Policies

Although you can build out management and monitoring tools after the multicloud infrastructure is built, system policies should be decided prior to deployment. It is important to deploy compliant infrastructure from the start, whether those policies are related to security, information retention, or regulations such as the Payment Card Industry Data Security Standard (PCI DSS) or the Health Insurance Portability and Accountability Act (HIPAA).

Trying to retrofit compliance onto a noncompliant facility or system after deployment is painful and can leave gaps that violate the policy.

Policies also need to be granular to properly cover a multicloud deployment, especially if that deployment is spread across multiple countries. Different countries have different regulatory frameworks and might require different policies to remain compliant. For example, organizations using cloud providers in the European Union need to meet General Data Protection Regulation (GDPR) requirements. Understanding the policies for each location in which you are deployed, and then enforcing those policies, is a key part of the planning.
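One practical way to keep per-region requirements straight is to encode them as data that deployment tooling can check before anything is provisioned. The sketch below is illustrative only; the regions, frameworks, and retention values are assumptions, not legal guidance.

```python
# Illustrative per-region policy map checked before provisioning a deployment.
REGION_POLICIES = {
    "eu-west": {"framework": "GDPR",    "data_must_stay_in_region": True,  "retention_days": 30},
    "us-east": {"framework": "PCI DSS", "data_must_stay_in_region": False, "retention_days": 365},
}

def check_deployment(region: str, stores_data_in_region: bool) -> list[str]:
    """Return a list of policy violations for a planned deployment."""
    policy = REGION_POLICIES.get(region)
    if policy is None:
        return [f"No policy defined for region '{region}'"]
    violations = []
    if policy["data_must_stay_in_region"] and not stores_data_in_region:
        violations.append(f"{policy['framework']}: data must remain in {region}")
    return violations

print(check_deployment("eu-west", stores_data_in_region=False))
```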

Beyond regulatory policies, organizations need to ensure that security is enforced in the same manner across all cloud installations. This is easier to do with containers than with VMs, but you still need to worry about the hardware on which the containers run. Understanding the patch and enforcement policies of each cloud provider and, if necessary, compensating for any deficiencies in their process should be part of your planning.

The complexity inherent in any multicloud solution can be somewhat mitigated by standardizing a management solution across all cloud providers that enforces policies and provides full visibility.

The Edge Allows for Simpler Managed Services Offerings

In the end, focusing security solutions at the edge rather than the core makes it easier to incorporate managed service offerings as part of your multicloud architecture. The organization will benefit from being able to quickly deploy these solutions across all cloud providers, and this approach helps keep costs down because the organization pays only for the services it needs.

Edge deployment of managed services also makes it easier to deploy new managed services as needed. Organizations that are just starting the process of moving to the cloud might want to focus on deploying WAFs initially but hold off on other forms of protection. When you are ready to add additional services, it will be easier to find and deploy the right service across the entire multicloud architecture.

As web applications continue to grow, the managed services footprint can grow along with them. Organizations can add new providers and enable new services within existing providers. This type of deployment-on-demand also makes it easier to switch managed services offerings when a vendor isn’t performing properly or the organization simply outgrows its capabilities.

Conclusion

A multicloud architecture requires rethinking how security is deployed across all cloud providers. By securing the installation at the edge, organizations have the flexibility to deliver security across the entire architecture while increasing visibility and improving the performance of web applications.

By working closely with security partners, organizations can find solutions that fit their specific needs. These solutions can grow along with the web application and the organization itself.

To take full advantage of these solutions, organizations must first understand what the needs are and ask the right questions. Failure to do so can result in your being boxed into a solution that is not a good fit. With proper training and research, your organization can effectively secure multicloud architectures at the network edge.
