People, systems, and the California fires
Watching the O'Reilly teams accept the challenge of the Kincade fire and subsequent power outages was nothing short of remarkable.
Natural disasters are fundamentally about how people cope in the worst of circumstances. In California, the recent horrific fires have tested how people come together in the aftermath to survive and thrive. California communities have demonstrated the kind of social cohesion and mutual support that is an essential part of responding to a disaster. But surviving and thriving in a systems world doesn’t just happen because of resilience in the aftermath. It’s a proactive function of planning and preparing for whatever may come.
Since 2018, O’Reilly has been executing a move to the cloud for our Learning and Conferences Divisions, to take advantage of everything a cloud infrastructure has to offer, including resilience in the face of disasters. Recently, we were happy to declare that O’Reilly online learning was “in the cloud,” with our Conferences business to follow in 2020. But like many other companies, we still manage so-called “legacy” systems: ones with deep links and tangled processes that somehow surprise us from time to time. One such surprise hit O’Reilly during Northern California’s Kincade Fire. Mandatory evacuations and power outages at our Sebastopol headquarters took parts of O’Reilly online learning, our corporate website, and our “on-prem” conference registration and event service offline once our backup generator ran out of fuel. With no one allowed near our headquarters to refuel the generators, and our employees facing personal loss, some of our services stayed down for a few days.
Why hadn’t we moved both businesses to the cloud simultaneously? O’Reilly is a self-funded, privately held company, and we’re subject to the same limitations as every company when it comes to investment and cash flow. We have to prioritize, and our decision was to cloud-enable O’Reilly online learning (our largest division) first. We believe both the human and material resources have to be available to execute on any strategy, and we couldn’t fund both at once.
Running a financially responsible organization is a series of constant trade-offs. At O’Reilly, we work to maintain balance for our customers, employees, and shareholders. We make investments we can afford when we can afford to make them. We make sure we have the staff to deploy new initiatives, and we prioritize based on our long-term goals. We balance our costs and headcount and make sure we allocate cash and personnel time when we create a new initiative.
Years ago we started talking about creating a “legacy” O’Reilly conference to help companies deal with the trade-offs of managing legacy systems in a constantly evolving technology landscape. That need still exists today, and we’re dedicating a track at our Infrastructure & Ops Conference in June 2020 to do exactly that. I’m sure we’ll be sending our own employees as well!
While we wish that none of our customers had been affected, we know that the bulk of them understood and supported us in the worst of circumstances.
But Twitter was not so kind. We put out notices to keep our customers aware of what was going on. Nasty responses surfaced with pictures of our site reliability titles, telling us to read our own books. Of course, these tweets were not our first priority because we were trying to account for every employee living in an evacuation zone. During the 2017 fires, a former employee who remains close to the company lost everything, so that pain is always carried in our hearts. Yes, our website was down; yes, our customers weren’t able to access parts of our platform; but in the end, our employees were safe and their homes were saved, and that mattered more.
And while people can be fundamentally good, like the bulk of our customers, some tried to use the disaster to be cruel, thinking they were funny or trying to hold us accountable in this “call-out culture” world. But neither the vitriol of those tweets nor the uncertainty of being evacuated, not knowing whether they would be able to return to their homes, stopped our employees from putting our customers first.
Watching our teams accept the challenge of the fire and subsequent outages was nothing short of remarkable. The entire engineering, product, and IT/ops teams worked 24-7 to bring back essential services for our customers. Evacuated employees, with their parents, spouses, and small animals in tow, worked the O’Reilly TensorFlow World conference in Santa Clara without complaint. Marketing communications and sales teams addressed customer questions and offered regular updates on the status of our sites. Our evacuated employees even took rerouted customer service calls and managed the Sebastopol office facilities issues from their temporary locations.
The fires are now mostly under control. Our sites are back up, and our employees are back in their homes and safe, but the destruction and loss of homes and livelihoods still seem insurmountable to those who have experienced them. Our hearts go out to those who have lost so much and to the firefighters and first responders who continue the fight.
This will not be the last fire or the last disaster. We are, as a world, unprepared for many of the challenges of the 21st century. And no matter how much we prepare, we will get some things wrong. When we do, we can either tear each other down or we can lift each other up. But only in the latter will we find true resilience.