Steven Shorrock on the myth of human error

The O’Reilly Security Podcast: Human error is not a root cause, studying success along with failure, and how humans make systems more resilient.

By Courtney Nash
January 18, 2017
Coffee spill. (source: Pixabay)

In this episode, I talk with Steven Shorrock, a human factors and safety science specialist. We discuss the dangers of blaming human error, studying success along with failure, and how humans are critical to making our systems resilient.

Here are some highlights:

Humans are part of complex sociotechnical systems

For several decades now, human error has been blamed as the primary cause of somewhere between 70% and 90% of aircraft accidents. But those statistics don’t really explain anything at all, and they don’t even make sense, because all systems are composed of a number of different components. Some of those components are human: people in various positions and roles. Other components are technical: airplanes, computer systems, and so on. Some are procedural, or are soft aspects like the organizational structure. In a complex sociotechnical system, we can never isolate one of those components as the cause of an accident, and doing so doesn’t help us prevent accidents, either.

There is no such thing as a root cause

We have a long history of using human error as an explanation, partly because the way U.S. accident investigations and statistics are presented at the federal level highlights a primary cause. That is a little naïve (primary and secondary causes don’t really exist; that’s an arbitrary line), but if investigators have to choose something, they tend to choose a cause that is closest in time and space to the accident. That is usually a person who operates some kind of control or performs some kind of action, and who is at the end of a complex web of actions and decisions that goes way back to the design of the aircraft, the design of the operating procedures, the pressure that’s imposed on the operators, the regulations, and so on. All of those are quite complicated and interrelated, so it’s very hard to single one out as a primary cause. In fact, we should reject the very notion of a primary cause, never mind placing the blame on human error.

Studying successes along with failures

If you only look at accidents or adverse events, then you’re assuming that those very rare unwanted events are somehow representative of the system as a whole. In fact, an accident is a concatenation of causes that come together to produce a big outcome. There’s no big cause; it’s just a fairly random bunch of things that happened at the same time and were always there in the system. We should not just be studying when things go wrong, but also how things go well. If we accept that causes of failure are inherent in the system, then we can find them in everyday work and will discover that very often they’re also the causes of success. So, we can’t simply eliminate them; we’ve got to look deeper into them.

Humans make our systems resilient

Richard Cook, Ohio State University SNAFU catcher, says that the most complex sociotechnical systems are constantly in a degraded mode of operation. That means that something in that system (and usually a lot of things) is not working as it was designed. It may be that staffing numbers or competency aren’t at the level they should be, or refresher training has been cut, or the equipment may not be working right. We don’t notice that our systems are constantly degraded because people stretch to connect the disparate parts of the systems that don’t work right. You know that, in your system, this program doesn’t work properly and you have to keep a special eye on it; or you know that this system falls down now and then, and you know when it’s likely to fall down, so you keep an eye out for that. You know where the traps are in the system and, as a human being, you want the resilience; you want to stop problems from happening in the first place. The source of resilience is primarily human; it’s people who make the system work.

People can see the purpose in a system, whereas procedures can only look at a prescribed activity. In the end, we have a big gap between at least two types of work: work as imagined (what we think people do) and work as done (what people actually do). In that gap are all sorts of risk. We need to look at how work is actually done and be mindful of how far that has drifted from how we think it’s done.

Post topics: O'Reilly Security Podcast