Chapter 8. SPOF Testing
On November 12, 2014, for 90 excruciating minutes, customers of Google’s DoubleClick for Publishers (DFP) service experienced an outage. It is estimated that over 50,000 websites were affected, costing millions of dollars in lost advertising revenue. In addition to the direct loss of revenue, there was a secondary effect: some websites that depended on DFP started experiencing outage-like behavior of their own. Users were unable to access these sites because the pages effectively froze while waiting on network requests to the DFP server. This scenario is known as a single point of failure (SPOF) of frontend code, in which one weak link can take the whole site down.
Brian O’Kelley, CEO of AppNexus, operator of a large real-time online ad platform and a DoubleClick rival, estimated the disruption cost publishers $1 million per hour in aggregate.
Wednesday’s outages affected more than 55,000 websites, according to Dynatrace, which monitors website and web application performance for companies including eight out of the 10 largest retailers in North America.
A SPOF can occur because of the way browsers handle unresponsive servers. When a server suffers an outage like the one that hit Google’s ad service, websites that depend on it can no longer communicate with it. The browser’s normal recourse is to try again. As the browser repeatedly fails to reach the downed server, the original request is left hanging. When this request ...
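The hanging request described above can be reproduced locally. The following is a minimal Python sketch, not a model of any particular browser: it assumes a stand-in "blackhole" server on localhost that completes the TCP connection but never sends a response, and a client that gives up after a short timeout chosen only for illustration. The function names, the /ad.js path, and the timeout value are hypothetical.

```python
import socket
import time


def start_blackhole_server():
    """Listen on a free local port and never answer any request.

    The operating system completes the TCP handshake for queued
    connections, so clients connect successfully but receive no data.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
    server.listen()
    return server, server.getsockname()[1]


def fetch_with_timeout(port, timeout_seconds):
    """Send a request and report the outcome and how long the caller was blocked."""
    started = time.monotonic()
    try:
        with socket.create_connection(
            ("127.0.0.1", port), timeout=timeout_seconds
        ) as client:
            # Hypothetical third-party resource request.
            client.sendall(b"GET /ad.js HTTP/1.1\r\nHost: localhost\r\n\r\n")
            client.recv(1024)  # blocks here: the server never replies
        outcome = "response"
    except socket.timeout:
        outcome = "timed out"
    return outcome, time.monotonic() - started


if __name__ == "__main__":
    server, port = start_blackhole_server()
    outcome, elapsed = fetch_with_timeout(port, timeout_seconds=2.0)
    print(f"{outcome} after {elapsed:.1f}s")
    server.close()
```

In this sketch the caller is stalled for the full timeout before it learns that no response is coming. Anything that waits on that request is stalled with it, which is the behavior a SPOF test is meant to expose.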