Chapter 4. Evaluating Cloud-Based Mitigation Vendors
We live in a world where cloud computing, essentially rented computing capacity, is commonplace. Vendors such as Amazon Web Services (AWS) and Microsoft Azure let you use their computing power without building your own. Within the broad umbrella of cloud computing services, there are subcategories such as Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS).
In this chapter, we will focus on DDoS mitigation vendors who fall under the SaaS model, where they offer their software as a service, often charging a license fee to start and a metered usage fee when you use their services. As active DDoS practitioners, we are familiar with current vendor brands and offerings in the marketplace today. However, we want to focus on the technologies and features instead of any particular vendor brands or their “secret sauce.”
In this chapter, we want to answer the question of whether to build your own on-premise DDoS solution, buy the service from a cloud-based provider, or do both. By understanding the pros and cons of using a cloud-based DDoS mitigation provider, you can start to reflect back to your own network and conclude with your own answer to the build-versus-buy question.
We will dig deeper into the operational model and techniques of a cloud DDoS scrubbing center. The chapter will conclude with an evaluation checklist derived from the topics we covered.
Why Use Cloud-Based DDoS Mitigation?
The advantages of using a cloud-based DDoS mitigation solution are very similar to the reasons you would use a cloud-based solution for your infrastructure. Some of the advantages, such as lowered cost and faster time-to-build, are self-evident. However, there are additional advantages that are specific to cloud-based DDoS mitigation solutions, such as real-time updated attack patterns. Let’s take a look at the main advantages of having a cloud-based DDoS mitigation solution.
Overall Cost Savings
Let’s face it: like most infrastructure components, building an effective DDoS mitigation solution can be complicated and require upfront investment in the form of time, money, and knowledge. After all, not only does the company need to buy hardware, but it also needs to train staff to operate the hardware, set up the protection parameters, and constantly update and adjust them to the state of its infrastructure.
Two often overlooked costs of setting up an on-premise DDoS mitigation solution are the average cost per mitigation and the cost of upkeep. Not all companies are targeted the same way by attackers. Generally speaking, the bigger the company, the more attacks and attack varieties it receives. But that is not always the case; for example, a small online radio station broadcasting opposition views about a dictatorship elsewhere can become the target of state-sponsored DDoS attacks. It stands to reason that the more attacks you experience, the lower the cost per attack of your investment. The savings of building versus renting should therefore be weighed against the volume and sophistication of the DDoS attacks you actually receive.
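To make the amortization argument concrete, here is a minimal Python sketch. The function name and every dollar figure are hypothetical placeholders, not vendor pricing; the model simply spreads an assumed upfront purchase plus yearly upkeep over the attacks mitigated during the equipment's service life.

```python
def onprem_cost_per_attack(upfront: float, annual_upkeep: float,
                           years: int, attacks_per_year: int) -> float:
    """Amortized on-premise cost per mitigated attack (hypothetical model).

    Total cost = upfront investment + upkeep for every year of service,
    divided by every attack mitigated over the same period.
    """
    if years <= 0 or attacks_per_year <= 0:
        raise ValueError("years and attacks_per_year must be positive")
    total_cost = upfront + annual_upkeep * years
    return total_cost / (attacks_per_year * years)


# Same hypothetical build, amortized for a rarely vs. frequently attacked company.
for attacks in (2, 50):
    cost = onprem_cost_per_attack(upfront=300_000, annual_upkeep=100_000,
                                  years=3, attacks_per_year=attacks)
    print(f"{attacks} attacks/year: ${cost:,.0f} per attack")
```

With these made-up inputs, the same $600,000 three-year outlay works out to $100,000 per attack at two attacks a year, but only $4,000 per attack at fifty.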
Upkeep is another hidden cost that is easily overlooked. For example, if your on-premise solution uses a blacklist of IP addresses, keeping that list up-to-date has an ongoing cost. And with attacks getting bigger by the month, hardware often needs to be refreshed and resized to absorb them.
Most of these costs exist regardless of whether the solution is on-premise or in the cloud. However, a cloud mitigation provider can aggregate demand from many customers, which lowers the cost per customer. The customer does not need to maintain its own always-on solution and is charged based on usage. This model may appeal to companies that prefer variable operating costs over upfront fixed costs.
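The fixed-versus-variable trade-off can be expressed as a break-even point. The following Python sketch assumes a deliberately simple pricing model (a flat yearly license plus a fee per mitigated attack) and an amortized yearly cost for your own build; the names and figures are hypothetical and ignore overage tiers or committed-use discounts a real contract may include.

```python
import math


def cloud_annual_cost(subscription: float, per_attack_fee: float,
                      attacks_per_year: int) -> float:
    """Usage-based model: a base license plus a metered fee per attack."""
    return subscription + per_attack_fee * attacks_per_year


def breakeven_attacks(onprem_annual: float, subscription: float,
                      per_attack_fee: float) -> int:
    """Smallest yearly attack count at which renting costs at least as much
    as owning (onprem_annual = amortized yearly cost of your own build)."""
    if per_attack_fee <= 0:
        raise ValueError("per_attack_fee must be positive")
    if onprem_annual <= subscription:
        return 0  # the base license alone already costs as much as owning
    return math.ceil((onprem_annual - subscription) / per_attack_fee)


# Hypothetical figures: $200k/year to own vs. $50k license + $7k per attack.
n = breakeven_attacks(onprem_annual=200_000, subscription=50_000,
                      per_attack_fee=7_000)
print(n, cloud_annual_cost(50_000, 7_000, n))
```

Below the break-even count (22 attacks a year with these invented numbers), the metered service is cheaper; above it, owning starts to pay off.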
Proven Operating Procedure and Knowledge
As discussed, not all companies are attacked with the same frequency or at the same scale. If the attacker is motivated by ransom money, it makes sense to target a more established company. Smaller companies, on the other hand, often do not experience DDoS attacks unless some event triggers one. A disgruntled former employee who spends a few dollars to attack his or her former employer is not unheard of, but because such attacks are rare, they will likely catch the on-call engineer off guard.
An IT department might still want to be prepared for so-called Black Swan events but cannot afford to invest in the ever-changing landscape of long-tailed events. The cloud provider, in this case, can give the customer a proven standard operating procedure (SOP) for each of these events and provide a guiding hand. This is especially true for enterprises that need to support a variety of technologies but on a smaller scale. They simply do not have the manpower or resources to go deep into a particular technology vertical.
More Network Visibility and Fewer Bottlenecks
The internet is an enormous web of connected computers and networks. The connections within and between networks vary widely in capacity, from tens to hundreds of gigabits per second in the core down to kilobits per second at the edge. As a network administrator, you control your own network, but the other networks on the internet are outside of your control.
DDoS attacks that exhaust network bandwidth take advantage of the fact that enterprises and small service providers must connect to other networks but often have only a limited number of exit points. How well you defend against these types of attacks depends on whether you can stop the attack close to its source and how well you can balance your incoming traffic.
For example, a common practice for large-scale networks is to have a presence at strategic locations called internet exchange points. Even though these network administrators have no control over other networks, they are able to have enough exposure at these exchange points to make better judgments about the source of the attacks and therefore keep the network resources available. The cloud-based mitigation providers are usually better positioned to be at such internet exchange points than typical enterprise and smaller network operators.
Network footprint is such an important factor in DDoS mitigation that cloud-based providers often list their presence at the various exchange points. Another way to think of it: the larger your public exposure surface, the more attack traffic it takes to overwhelm all of it.
Dedicated Staff and Better Reaction Time
Because DDoS mitigation is the cloud-based provider’s core competency, it can dedicate more resources to the effort. As many business owners can tell you, one of the most expensive resources is human capital. A well-trained, experienced engineer who is well versed in different DDoS attack patterns is worth their weight in gold, and a cloud-based mitigation provider can often afford to keep many such engineers in-house.
Providers can often field dedicated teams that monitor the internet’s DDoS “weather” by investigating darknet activity and closely tracking newly disclosed vulnerabilities and network anomalies. This can lead to faster reaction times when an attack happens and, ultimately, less time to restore services.
It cannot be stressed enough that the biggest value cloud providers bring to the table is the aggregation ability to spread the cost amongst different entities. This directly results in lower fixed and variable cost per attack, better operational maturity, quicker responses, and more experience in defending against DDoS attacks.
When Not to Use Cloud-Based DDoS Mitigation
There are many advantages to using a cloud-based DDoS mitigation provider, as we have seen in the previous section. In this section, we will look at the other side of the coin and examine some of the reasons you might not want to utilize a cloud-based solution.
Many of the differences boil down to a rent-versus-own comparison. When you rent something, typically you pay a fee for the right to use the item for a limited period of time, with limited control over the item. However, when you own the item, you are able to have full control and modification rights. You are able to modify and tweak the solution to be 100% compatible with the rest of your infrastructure.
Let’s look at the reasons you would not want to use a cloud-based DDoS solution in more detail.
The biggest reason to build your own on-premise DDoS mitigation solution may come down to control. If the solution is on premise, you retain total control instead of handing it over to the vendor. What you give up with a vendor is twofold: some control over your own network, and any control over the vendor’s mitigation strategy. Much like using a cloud provider for your compute and storage needs, using cloud-based DDoS mitigation extends your network to an outside vendor. You will need to be comfortable with how much control you give up, which depends on the mitigation technique.
Trusting Your Vendor
It is worth stressing one more time that if you choose a cloud provider, your service availability is in their hands. Many people miss this point because its consequences do not become visible until something goes wrong.
In reactive mode, cloud-based DDoS mitigation providers generally use one of two techniques to redirect traffic to themselves: DNS redirection and BGP network advertisement. Both require giving the provider the right to redirect traffic from your premises to theirs so they can scrub it by filtering out malicious traffic.
The DNS redirection scheme requires changing your DNS mapping from your original IP address to one owned by the cloud provider. This is less intrusive, but the change can take some time to propagate throughout the internet.
As many of you already know, DNS changes are slow to propagate and depend on the settings of your end users’ resolvers. Cloud-based DDoS providers that use DNS often lower the TTL for the protected domain to one second so that their changes take effect almost immediately.
However, many ISPs do not honor a one-second TTL on their recursive DNS servers, and most organizations rely on their ISPs’ recursive DNS servers. So even though protection is active and enabled, a large part of the internet may still be sending traffic to the old address.
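The effect of resolvers that ignore very low TTLs can be estimated with a small Python sketch. It assumes each resolver population clamps TTLs to some floor (the floors and user shares below are invented for illustration) and, as a worst case, that every resolver cached the old record just before the change, so it keeps serving the stale answer for max(published TTL, floor) seconds.

```python
from typing import Iterable, Tuple


def stale_share(published_ttl: int,
                resolvers: Iterable[Tuple[float, int]],
                elapsed: int) -> float:
    """Worst-case fraction of users still sent to the old address.

    resolvers: (share_of_users, ttl_floor_seconds) pairs. A resolver that
    clamps low TTLs honors max(published_ttl, floor) instead of the
    published value, so its users may see the old record that long.
    """
    return sum(share for share, floor in resolvers
               if elapsed < max(published_ttl, floor))


# Hypothetical population: half honor 1 s, others clamp to 30 s or 300 s.
population = [(0.5, 0), (0.3, 30), (0.2, 300)]
for t in (0, 10, 60, 300):
    print(f"{t:>3}s after the change: {stale_share(1, population, t):.0%} stale")
```

Even with a one-second published TTL, the slowest resolver population in this made-up mix keeps a fifth of users on the old address for five minutes.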
A BGP network advertisement change takes effect much faster than a DNS change and requires the cloud provider to advertise your IP block on your behalf. For example, if your company owns the 203.0.113.0/24 block and advertises it to its upstream provider, the cloud provider will now advertise the same block. Because of how internet routing works and common filtering practice, you need a registered public IP block of at least a /24 (256 addresses, 254 usable for hosts) to use this method; many networks filter announcements more specific than /24.
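Whether a block meets the /24 convention can be checked with Python’s standard `ipaddress` module. This is a minimal sketch: the /24 limit is encoded as a constant because it reflects common operator filtering practice rather than a protocol rule, the function name is hypothetical, and the addresses come from the reserved documentation ranges.

```python
import ipaddress

# Common operator practice (not a protocol rule): IPv4 announcements more
# specific than /24 are widely filtered, so the block must be /24 or shorter.
MAX_ACCEPTED_PREFIXLEN = 24


def can_be_announced(prefix: str) -> bool:
    """True if an IPv4 block is likely to propagate when announced.

    IPv4Network raises ValueError for malformed input, IPv6 strings, or
    host bits set (for example "203.0.113.5/24").
    """
    net = ipaddress.IPv4Network(prefix)
    return net.prefixlen <= MAX_ACCEPTED_PREFIXLEN


for block in ("203.0.113.0/24", "198.51.100.0/23", "203.0.113.128/25"):
    net = ipaddress.IPv4Network(block)
    print(block, net.num_addresses, can_be_announced(block))
```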
Other Network Restrictions
There are generally more restrictions on BGP redirection than on DNS changes. Some providers require separate public IP blocks for establishing tunnels, and most of them recommend modifying an upstream access list to allow only tunnel traffic inbound during mitigation. Check carefully during the evaluation stage to make sure you are comfortable with what they recommend.
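The logic of a tunnel-only upstream filter can be sketched in Python. The sketch assumes GRE tunnels (IP protocol 47), a common choice but not the only one; the provider and tunnel endpoint addresses are hypothetical documentation addresses, and in practice the rule would be written in your router’s ACL syntax rather than in code.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, Iterable

GRE_PROTOCOL = 47  # IANA protocol number for GRE


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    protocol: int


def mitigation_acl(provider_endpoints: Iterable[str],
                   local_endpoint: str) -> Callable[[Packet], bool]:
    """Build a permit function: only GRE from the provider's tunnel
    endpoints to our tunnel endpoint is allowed; everything else is denied."""
    allowed = {ipaddress.ip_address(a) for a in provider_endpoints}
    local = ipaddress.ip_address(local_endpoint)

    def permit(pkt: Packet) -> bool:
        return (pkt.protocol == GRE_PROTOCOL
                and ipaddress.ip_address(pkt.src) in allowed
                and ipaddress.ip_address(pkt.dst) == local)

    return permit


# Hypothetical addresses: provider tunnel endpoint and our tunnel endpoint.
acl = mitigation_acl(["198.51.100.10"], "203.0.113.1")
print(acl(Packet("198.51.100.10", "203.0.113.1", GRE_PROTOCOL)))  # tunnel traffic
print(acl(Packet("192.0.2.50", "203.0.113.80", 6)))               # direct TCP hit
```

A production filter would typically also need to permit BGP sessions and management traffic, which this sketch omits for brevity.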
Once traffic is redirected to the cloud provider, the inbound traffic is scrubbed and the clean traffic is passed back to you. To use a parenting analogy, having your network traffic traverse someone else’s network is a bit like having your kids sleep over at someone else’s house: there is always an uneasy feeling about it, no matter how much you trust the other party.
This lack of control can be somewhat easier to accept if you already use the third party for other services, such as a CDN, and DDoS mitigation is simply another service added on top. If you are in this camp, planning for vendor diversity would be a good idea.
Another drawback of cloud-based mitigation is the limited customization available for your company. Because your company’s business model differs from other businesses’, your traffic pattern often differs too. A gaming company’s network traffic is very different from, say, a web hosting company’s traffic. For example, if your network only ever carries small packets with at most 64 bytes of payload, an effective DDoS mitigation strategy might be to drop any traffic with a payload larger than 64 bytes.
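The 64-byte rule from the gaming example can be sketched as a packet filter in Python. This is a minimal sketch that assumes the traffic is IPv4/UDP; the payload limit is the hypothetical profile from the text, and `build_udp_packet` exists only to produce illustrative packets (checksums left at zero).

```python
import struct

MAX_PAYLOAD = 64      # hypothetical: largest payload this application sends
UDP_PROTOCOL = 17


def udp_payload_length(packet: bytes) -> int:
    """Payload size of an IPv4/UDP packet from the UDP length field."""
    ihl = (packet[0] & 0x0F) * 4              # IPv4 header length in bytes
    (udp_length,) = struct.unpack_from("!H", packet, ihl + 4)
    return udp_length - 8                     # UDP header is 8 bytes


def should_drop(packet: bytes) -> bool:
    """Drop UDP packets whose payload exceeds MAX_PAYLOAD; other protocols
    are left to other rules in this sketch."""
    if packet[9] != UDP_PROTOCOL:
        return False
    return udp_payload_length(packet) > MAX_PAYLOAD


def build_udp_packet(payload: bytes) -> bytes:
    """Minimal IPv4/UDP packet for illustration (checksums left at zero)."""
    udp = struct.pack("!HHHH", 40000, 27015, 8 + len(payload), 0) + payload
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(udp), 0, 0, 64,
                     UDP_PROTOCOL, 0, bytes(4), bytes(4))
    return ip + udp


print(should_drop(build_udp_packet(b"x" * 64)))   # within the profile
print(should_drop(build_udp_packet(b"x" * 512)))  # oversized
```

A rule this specific to one application is exactly the kind of long-tail customization that is easy on your own equipment and hard to get from a shared service.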
Customization options with cloud-based mitigation are limited. Providers build a solution that appeals to the majority of the customers they intend to serve and neglect long-tail requirements; this is one of the necessary trade-offs of demand aggregation. There are certain knobs and switches you can use, but the scope of change is controlled by the vendor. You are purchasing a predefined service that suits 80% of customers; if you happen to be in the 20% that needs customized work, the chance of the vendor catering to your needs is slim to none.
Earlier in this section, we covered traffic redirection technologies that require close collaboration between the customer and the cloud mitigation vendor. In an always-on scenario, the setup is even more tightly integrated. Some marketing materials may lead people to believe that because the vendor lives in the cloud and the customer pays only a usage fee, they are free to switch vendors whenever they need to. However, we would argue that, if done correctly, the setup is so integrated that DDoS vendor lock-in can be even stronger than with an on-premise solution. You might be able to switch between AWS and Azure when it comes to launching virtual machines in the cloud, but imagine needing to change your internet peering, upstream access list, or authoritative DNS records: these are not things you can change at a moment’s notice.
Vendor lock-in deserves even more consideration because the field is relatively new and full of start-up companies. What if the vendor is acquired or becomes insolvent? What would an alternative cost, and how long would it take to ramp up if you had to switch? The potential cost of switching may make a cloud-based DDoS solution less attractive than it first appears.
Organizations that require high security clearances might not be able to use cloud-based providers at all because of security boundary concerns. In the United States, there are regulations governing traffic that involves consumer privacy, healthcare records, and government entities. In certain parts of Europe, such as Germany, data sovereignty rules require that data originating in a country not leave that country.
If your company holds business certifications such as the ISO 9000 family of quality management standards, you need to make sure your cloud provider meets the necessary qualifications.
If your company operates in a vertical market or a country with laws governing how traffic is handled, pay close attention when deciding whether to use a cloud-based mitigation provider. This is not to say that a cloud mitigation provider cannot operate within your requirements, but be aware that not all of them can conform to your security boundary requirements.
Cloud-Based DDoS Mitigation Methods
So far in this chapter, we have looked at the reasons to use and not to use cloud-based DDoS mitigation vendors. We covered some of the basic operations as they support our reasoning. In this section, we will take a more in-depth look at the operations of cloud-based mitigation vendors.
The cloud-based DDoS mitigation vendor life cycle consists of detection, mitigation, and reporting—some of which can be a hybrid model, such as integrating some of the reporting into your on-site tools. As more companies migrate services toward the cloud, the DDoS mitigation strategy should increasingly adopt the hybrid model. The cycle is a continuum, consisting of the triggered event, traffic reporting, evaluation, and then feedback to create better future detection and mitigation.
DDoS Detection Mechanism in the Cloud
The DDoS detection mechanism in the cloud is not much different from the on-premise one. The two most important data sources for DDoS detection are NetFlow and packet traces. The main difference is that NetFlow data may be exported to a collector at an external public IP address. Exporting NetFlow data might be outside the comfort level of some customers, in which case the vendor might provide a vendor-controlled virtual machine as the NetFlow collector and manage it jointly with the customer. The general process is shown in Figure 4-1. The same setup also applies to log collection for the cloud-based vendor.
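Wherever the collector runs, flow-based detection largely comes down to aggregating exported records and comparing rates against a baseline. The Python sketch below assumes flow records that have already been decoded from NetFlow packets (decoding is not shown) and keeps only the fields it needs; the record shape, threshold, and addresses are hypothetical, and a real detector would use per-destination baselines rather than one fixed threshold.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class FlowRecord(NamedTuple):
    dst_ip: str
    packets: int   # packet count as exported (may be sampled)


def flag_targets(flows: Iterable[FlowRecord], interval_s: float,
                 pps_threshold: float,
                 sampling_rate: int = 1) -> Dict[str, float]:
    """Sum packets per destination over one export interval and return the
    destinations whose estimated packets-per-second exceed the threshold.
    sampling_rate scales sampled counts back up (1-in-N sampling)."""
    totals: Dict[str, int] = defaultdict(int)
    for flow in flows:
        totals[flow.dst_ip] += flow.packets * sampling_rate
    rates = {dst: total / interval_s for dst, total in totals.items()}
    return {dst: pps for dst, pps in rates.items() if pps > pps_threshold}


# Hypothetical one-minute interval with an unsampled export.
flows = [FlowRecord("203.0.113.10", 600_000),
         FlowRecord("203.0.113.10", 600_000),
         FlowRecord("203.0.113.20", 1_000)]
print(flag_targets(flows, interval_s=60, pps_threshold=10_000))
```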
Packet traces remain an important part of DDoS detection, especially for application-level attacks. Captures are often taken by agents placed at strategic locations inside the customer’s network. One difference with a cloud-based solution is that, because of the risk of exposing customer data, often only header data is sent to the vendor, or the traffic is sanitized to remove sensitive content before it is forwarded.
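Sending only headers can be as simple as truncating each captured packet after its transport header. This Python sketch handles IPv4 with TCP or UDP only and is an illustration of the idea, not any vendor’s agent; the sample packet uses skeletal headers containing just the fields the function reads.

```python
TCP, UDP = 6, 17


def headers_only(packet: bytes) -> bytes:
    """Return just the IPv4 header plus the TCP or UDP header, dropping the
    payload so no application data leaves the network. IPv4 only."""
    if packet[0] >> 4 != 4:
        raise ValueError("this sketch handles IPv4 only")
    ihl = (packet[0] & 0x0F) * 4              # IPv4 header length (bytes)
    protocol = packet[9]
    if protocol == TCP:
        l4_len = (packet[ihl + 12] >> 4) * 4  # TCP data offset (bytes)
    elif protocol == UDP:
        l4_len = 8                            # fixed UDP header
    else:
        l4_len = 0                            # unknown: keep the IP header only
    return packet[:ihl + l4_len]


# Skeletal IPv4 (IHL=5, protocol=TCP) and TCP (data offset=5) headers.
ip_header = bytes([0x45]) + bytes(8) + bytes([TCP]) + bytes(10)
tcp_header = bytes(12) + bytes([0x50]) + bytes(7)
sample = ip_header + tcp_header + b"GET /account?token=secret"
trimmed = headers_only(sample)
print(len(sample), len(trimmed))
```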
There is a growing trend toward real-time analysis of data. Because the cloud provider typically aggregates data feeds from many customers and can apply big data analysis to them, it can arguably detect recurring or reused DDoS attacks better than an on-premise solution, which sees only a single customer’s data.
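One simple way to picture the aggregation advantage is counting how many distinct customers see the same attack source. The Python sketch below is hypothetical (customer names and addresses are invented); providers presumably rely on richer fingerprints than source IPs, which attackers can spoof, but the counting idea is the same.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def shared_sources(observations: Iterable[Tuple[str, str]],
                   min_customers: int = 2) -> List[Tuple[str, int]]:
    """observations: (customer_id, attacking_source_ip) pairs.

    Returns sources seen attacking at least min_customers distinct
    customers, most widespread first (ties broken by address string).
    """
    victims = defaultdict(set)
    for customer, source in observations:
        victims[source].add(customer)
    hits = [(src, len(custs)) for src, custs in victims.items()
            if len(custs) >= min_customers]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


obs = [("acme", "192.0.2.1"), ("globex", "192.0.2.1"), ("initech", "192.0.2.1"),
       ("acme", "192.0.2.2"), ("acme", "192.0.2.2"),
       ("globex", "192.0.2.3"), ("initech", "192.0.2.3")]
print(shared_sources(obs))
```

No single customer in this example sees more than its own slice of the data; only the aggregated view reveals that 192.0.2.1 is hitting three of them.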
DDoS Mitigation Mechanism in the Cloud
The DDoS mitigation mechanism in the cloud requires careful consideration and upfront work before a DDoS event happens. Compared with an on-premise setup where you have complete control, with an external scrubbing center the traffic shift and its management at each point need to be mapped out. It is also worth pointing out that the traffic shift typically does not happen right away, and external factors may keep some traffic from shifting at all. For example, in the BGP advertisement model, where the cloud provider advertises your public block with a higher preference, you can use typical BGP attributes to influence other networks’ decisions, but other parties can always override your “suggestions.” In other words, traffic shifting with BGP is more art than science.
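Why your “suggestions” can be overridden is easiest to see in a stripped-down model of BGP best-path selection. This Python sketch models only the first two relevant steps (highest LOCAL_PREF, then shortest AS_PATH); real routers evaluate further steps such as origin, MED, and eBGP over iBGP. All ASNs are from the documentation range and the scenario is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    via: str
    local_pref: int
    as_path: Tuple[int, ...]


def best_path(routes: List[Route]) -> Route:
    """Simplified BGP best-path selection: highest LOCAL_PREF first, then
    shortest AS_PATH. Real routers evaluate further tie-breakers."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


# Your origin AS 64501 prepends on its direct upstream (AS 64500) to push
# traffic toward the scrubbing provider (AS 64510).
direct = Route("direct upstream", 100, (64500, 64501, 64501, 64501))
scrub = Route("scrubbing provider", 100, (64510, 64501))
print(best_path([direct, scrub]).via)  # equal LOCAL_PREF: shorter path wins

# A remote network that prefers routes learned from its own customers
# assigns a higher LOCAL_PREF, which overrides your prepending.
direct_customer = Route("direct upstream", 200, direct.as_path)
print(best_path([direct_customer, scrub]).via)
```

LOCAL_PREF is set by each receiving network according to its own policy, so no amount of prepending on your side can outweigh it.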
The first cloud-based mitigation method is an always-on solution. The most common always-on deployment architecture couples the service with a Content Delivery Network (CDN). If you already use a particular CDN vendor to distribute your content, that vendor is already your gatekeeper, with firsthand information about your traffic, so it can make sense to add an extra layer of analysis that drops suspicious traffic. Example 4-1 shows www.cisco.com pointing to the edge network of Akamai, one of the major CDN providers in the world.
The trade-off in this model is that you are placing a lot of trust in the CDN network being up 100% of the time. You also give up some visibility into customer traffic, since you are one layer removed from it. And because this is an always-on model, you are likely to pay an always-on upkeep fee as well as a traffic-based fee when a DDoS attack happens. Also keep in mind that this method only protects traffic that reaches you through hostnames resolved via DNS; if the attack targets your IP addresses directly, this mitigation does not take effect.
Example 4-1. Cisco.com CNAME points to Akamai Edge
$ dig www.cisco.com
<skip>
;; QUESTION SECTION:
;www.cisco.com.                  IN  A
;; ANSWER SECTION:
www.cisco.com.                   3406   IN  CNAME  www.cisco.com.akadns.net.
www.cisco.com.akadns.net.        300    IN  CNAME  wwwds.cisco.com.edgekey.net.
wwwds.cisco.com.edgekey.net.     18911  IN  CNAME  wwwds.cisco.com.edgekey.net.globalredir.akadns.net.
wwwds.cisco.com.edgekey.net.globalredir.akadns.net.  3406  IN  CNAME  e2867.dsca.akamaiedge.net.
e2867.dsca.akamaiedge.net.       20     IN  A      18.104.22.168
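Checking whether a hostname is already fronted by a CDN amounts to walking its CNAME chain, as in Example 4-1. This Python sketch works on a chain you have already collected (for example, from `dig` output) rather than performing live lookups; the suffix list is illustrative and taken from the example, so you would extend it for your own CDN. It matches whole DNS labels so that a lookalike domain does not count.

```python
from typing import Iterable, Sequence

# Illustrative suffixes taken from Example 4-1; extend for your own CDN.
CDN_SUFFIXES = ("akamaiedge.net", "edgekey.net")


def _normalize(name: str) -> str:
    return name.rstrip(".").lower()


def fronted_by_cdn(chain: Sequence[str],
                   suffixes: Iterable[str] = CDN_SUFFIXES) -> bool:
    """True if any name in a CNAME chain sits under one of the CDN domains.
    Matches whole labels, so "notakamaiedge.net" does not count."""
    wanted = [_normalize(s) for s in suffixes]
    for name in map(_normalize, chain):
        if any(name == s or name.endswith("." + s) for s in wanted):
            return True
    return False


# CNAME chain copied from the dig output in Example 4-1.
cisco_chain = [
    "www.cisco.com.",
    "www.cisco.com.akadns.net.",
    "wwwds.cisco.com.edgekey.net.",
    "wwwds.cisco.com.edgekey.net.globalredir.akadns.net.",
    "e2867.dsca.akamaiedge.net.",
]
print(fronted_by_cdn(cisco_chain))
```

A positive result only says that the hostname goes through the CDN; any origin IP address that is publicly reachable remains exposed to direct attacks.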
If you prefer to stay away from an always-on solution, you can choose to redirect your traffic in a reactive mode after a DDoS attack occurs. In this case, you can utilize a DNS change or redirection from your own server to the cloud-based scrubbing center. The dirty traffic will be dropped and clean traffic will be sent back to your premises. This path from cloud vendor back to your premises can be a physical link, but more likely will be a virtual tunnel from the provider to your equipment. There are three key items that need extra attention:
- Where does the authoritative DNS record reside? Whoever hosts it is the party responsible for making the DNS record change.
- The path for the clean traffic needs to be carefully planned out, and we strongly recommend establishing and testing it before an actual DDoS event. In the case of a virtual tunnel, make sure the clean traffic can still make its way back to your premises even when your links are congested by the attack.
- The tunnel endpoint should not be exposed to external targeting; only the provider should be able to reach it.
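One detail worth verifying when you test the clean-traffic tunnel is encapsulation overhead, since oversized packets on the return path may be fragmented or dropped. The arithmetic below assumes a GRE tunnel over IPv4 with option-free headers; the chapter only says “virtual tunnel,” so treat GRE as an assumption and substitute your provider’s actual encapsulation.

```python
IPV4_HEADER = 20   # bytes, no options
GRE_BASE = 4       # GRE header without optional checksum/key/sequence
TCP_HEADER = 20    # bytes, no options


def gre_limits(path_mtu: int = 1500, gre_options: int = 0) -> tuple:
    """Return (tunnel MTU, TCP MSS) for IPv4-in-GRE on a given path MTU.
    gre_options: extra GRE bytes, e.g. 4 for the key field."""
    tunnel_mtu = path_mtu - IPV4_HEADER - GRE_BASE - gre_options
    tcp_mss = tunnel_mtu - IPV4_HEADER - TCP_HEADER
    return tunnel_mtu, tcp_mss


print(gre_limits())                 # plain GRE over a 1500-byte path
print(gre_limits(gre_options=4))    # GRE with the key field present
```

Over a standard 1500-byte path, plain GRE leaves a 1476-byte tunnel MTU and a 1436-byte TCP MSS, which is why the tunnel test should include full-size packets and not just pings.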
Another way to redirect traffic without a DNS change is a BGP advertisement change. In this case, when an attack is detected, the cloud provider advertises the public IP block containing your resources on your behalf. The operation itself is fairly straightforward; however, the devil is always in the details. Two key items to verify in this scenario are:
- The cloud mitigation provider is well established and well peered in the various internet exchange points. A good resource for checking is www.peeringdb.com.
- The cloud provider has set up agreements with its peers to accept advertisements of your IP block. To prevent BGP hijacking, service providers now typically check that the advertisements they receive come from the legitimate owners. Because the cloud provider is advertising on your behalf, you need to authorize it to do so. Figure 4-2 shows the looking glass provided by NTT America, a tool you can use to see how a BGP prefix is viewed inside the NTT America network.
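One widely deployed form of the origin checking described above is RPKI route origin validation (RFC 6811): the prefix holder publishes Route Origin Authorizations (ROAs) naming the ASes allowed to originate its prefixes. Providers may instead rely on IRR records or letters of authorization, so treat RPKI as one possible mechanism. This Python sketch implements the basic valid/invalid/not-found outcome with documentation ASNs and prefixes.

```python
import ipaddress
from typing import Iterable, NamedTuple


class ROA(NamedTuple):
    prefix: str
    max_length: int
    origin_asn: int


def validate_origin(announced: str, origin_asn: int,
                    roas: Iterable[ROA]) -> str:
    """Return "valid", "invalid", or "not-found" for an announcement."""
    route = ipaddress.ip_network(announced)
    covering = [r for r in roas
                if ipaddress.ip_network(r.prefix).version == route.version
                and route.subnet_of(ipaddress.ip_network(r.prefix))]
    if not covering:
        return "not-found"        # no ROA covers this prefix
    for r in covering:
        if r.origin_asn == origin_asn and route.prefixlen <= r.max_length:
            return "valid"
    return "invalid"              # covered, but no ROA authorizes it


YOUR_AS, PROVIDER_AS = 64501, 64510  # documentation ASNs
roas = [ROA("203.0.113.0/24", 24, YOUR_AS)]
print(validate_origin("203.0.113.0/24", PROVIDER_AS, roas))  # not yet authorized
roas.append(ROA("203.0.113.0/24", 24, PROVIDER_AS))
print(validate_origin("203.0.113.0/24", PROVIDER_AS, roas))  # authorized
```

Until the second ROA exists, networks that enforce origin validation would treat the provider’s announcement of your block as a hijack, which is why this authorization has to be in place before an attack.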
DDoS Event Reporting
Since you do not have visibility into the cloud provider’s devices and network, a solid reporting and feedback loop is even more important in the setup. The rule of thumb is: the more reporting the better, but there is no right or wrong answer on how much reporting is needed.
At the very least, we believe near real-time reports of the start of the event, the anomaly detected, and the end of the event, as well as network statistics such as packets per second and bandwidth utilization, are required. A more useful reporting mechanism is a uniform one from the cloud provider that can be accessed via an API, so the customer can ingest and analyze the data automatically.
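Ingesting such a report automatically can be as simple as parsing it and reducing it to the fields above. The JSON layout in this Python sketch is entirely hypothetical, since every provider defines its own schema, and the sample values are invented; adapt the field names to your provider’s API documentation.

```python
import json
from datetime import datetime

# Hypothetical report format; every provider defines its own schema.
SAMPLE_REPORT = """
{
  "event_id": "evt-0001",
  "anomaly": "udp-flood",
  "start": "2024-01-01T10:00:00+00:00",
  "end": "2024-01-01T10:45:00+00:00",
  "samples": [
    {"pps": 120000, "bps": 900000000},
    {"pps": 480000, "bps": 3600000000},
    {"pps": 95000, "bps": 700000000}
  ]
}
"""


def summarize(report_json: str) -> dict:
    """Reduce a raw event report to start/end, anomaly, and peak statistics."""
    report = json.loads(report_json)
    start = datetime.fromisoformat(report["start"])
    end = datetime.fromisoformat(report["end"])
    samples = report["samples"]
    return {
        "event_id": report["event_id"],
        "anomaly": report["anomaly"],
        "duration_min": (end - start).total_seconds() / 60,
        "peak_pps": max(s["pps"] for s in samples),
        "peak_gbps": max(s["bps"] for s in samples) / 1e9,
    }


print(summarize(SAMPLE_REPORT))
```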
Keep in mind that these reports describe traffic as the provider perceives and handles it; you will typically overlay them on the data from your own internal tooling. Figure 4-3 shows an example of alert reporting from one of the cloud-based DDoS detection providers.
You can often mix and match many of the techniques above. Just as attacks have become multi-vector, mitigation solutions have become more sophisticated as well. For example, you can use the always-on model for general scrubbing of large volumetric attacks, then use on-premise equipment for application-level attack detection. If the on-premise equipment runs out of capacity, you can, as a third option, redirect your traffic to the cloud mitigation provider.
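The layered example above can be written as a simple routing rule. This Python sketch is a hypothetical decision function, not a product feature: it assumes attacks have already been classified as volumetric or application-level, and the capacity figure is a placeholder for your own equipment’s limit.

```python
from enum import Enum


class Action(Enum):
    ALWAYS_ON_CLOUD = "always-on cloud scrubbing"
    ON_PREMISE = "on-premise equipment"
    CLOUD_REDIRECT = "redirect to cloud provider"


def route_attack(vector: str, attack_gbps: float,
                 onprem_capacity_gbps: float) -> Action:
    """Hypothetical tiering for the hybrid design described in the text."""
    if vector == "volumetric":
        return Action.ALWAYS_ON_CLOUD     # scrubbed upstream by the cloud layer
    if attack_gbps <= onprem_capacity_gbps:
        return Action.ON_PREMISE          # application-level and fits locally
    return Action.CLOUD_REDIRECT          # on-premise gear is out of capacity


for vector, gbps in (("volumetric", 300), ("application", 2), ("application", 40)):
    print(vector, gbps, route_attack(vector, gbps, onprem_capacity_gbps=10).value)
```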
A hybrid model is what we would recommend if you are able to do so. Unfortunately, in a hybrid model, you are essentially building two sets of mitigation solutions and all of the items we mentioned in the chapter are applicable to you. The bright side is that you will hopefully enjoy the benefits of both on-premise and cloud-based solutions.
In this chapter, we looked at the reasons why one would or would not use cloud-based DDoS mitigation providers, as well as the methods of utilizing cloud-based DDoS providers. Let us summarize by putting the items on a checklist:
Needless to say, the checklist is not a one-time exercise. The attack and mitigation landscape is always shifting, so revisit your DDoS mitigation strategy every few months to make sure the setup still fits your needs. For example, you might initially choose a strategy involving cloud-based mitigation, but as your company grows, you might decide to build your own on-premise mitigation.