“The mind is like a parachute. It will only work when it is open.”
(Frank Zappa)
During training sessions, I notice that people are used to seeing safety as the absence of danger or threat; a situation or action in which nothing goes wrong, or at least in which as little as possible goes wrong. Safety is: not getting hurt, not being able to fall or slip, not being able to lose your job, and so on; in short, ‘safe’ is a situation in which something (negative) should not or cannot happen.
‘Safety is the freedom from accidental injury’
(U.S. Agency for Healthcare Research and Quality)
‘Safety is the state in which the possibility of harm to persons or of property damage is reduced to, and maintained at or below, an acceptable level through a continuing process of hazard identification and risk management’
(International Civil Aviation Organization)
It is understandable that people focus on things that could go wrong; something going wrong is unexpected and undesirable, and it can actually harm us. For centuries, mishaps were seen as ‘God’s hand’ or an ‘act of Nature’, as something over which we humans had no control.
That changed with the Industrial Revolution, from the mid-18th century onwards. With the rapid mechanisation of work, the number of accidents grew dramatically as appliances and machines broke down or malfunctioned. As a result, people focused on guarding machinery, preventing explosions and stopping structures from collapsing. Technology was seen as the root cause of problems, and therefore technical solutions were sought for these problems.
That held until the second half of the last century, around 1975–1980, when a series of serious accidents showed that managing technology alone was not enough to prevent misery. The accident at the Three Mile Island nuclear power plant (1979) and the collision of two Boeing 747s at Tenerife (1977) caused a shift from focusing on technical breakdowns to a view that included the human contribution: the Human Factor.
Subsequently, the loss of the Space Shuttle Challenger in 1986, reinforced by the Chernobyl nuclear disaster that same year, indicated that besides technical failures and Human Factors, organisational or ‘system’ errors and the prevailing safety culture also constituted a substantial risk to safety performance.
To explain how an accident can manifest itself, one traditionally looks at how causes lead to effects. Reasoning back from the accident, one ultimately searches for ‘The Root Cause’ of the accident, possibly multiple causes; and by eliminating these causes, similar accidents were supposedly prevented from happening again.
All negative outcomes have a cause, and since all causes can be found (e.g. through Root Cause Analysis, the Swiss Cheese model or the Domino model), all accidents can therefore be prevented.
The historical development and this rather linear cause-effect model have led to the following views of safety:
‘Accidents or undesirable outcomes happen as a result of preceding mistakes and poor functioning, while positive outcomes come about because everything, including people, functions as it is supposed to function’; furthermore
‘Mechanisms that cause undesirable outcomes are different from those that result in successful outcomes.’
Safety in this context is a state in which as few negative outcomes (accidents, incidents, ‘near misses’) as possible occur.
Managing safety is then aimed at: preventing technical failure by correcting design and production; addressing human behaviour and creating procedures to eliminate the negative human contribution as much as possible, where practicable having machines do the work in order to ‘eliminate’ human error; and making rules and regulations for organisations that, among other things, reduce the opportunities for humans to make mistakes.
The curious thing is that safety in this form is defined by its own absence: safety is measured by looking at the consequences of a lack of safety. This approach is also known as Safety I.
Safety I assumes that systems work because they are well designed and maintained, and because procedures are complete and error-free, the designers having predicted and anticipated even the smallest disturbances. People behave as expected of them (in conformity with established procedures) and, above all, as they have been taught and trained to do. Hence, within Safety I, the emphasis is on compliance with regard to how work is performed.
This thinking has brought ‘us’, aviation, to where we are today. Commercial aviation is the safest form of transport: roughly one miss in a million actions or operations (slightly cutting corners here, but still). In recent decades, however, progress appears to have stagnated. Despite everything people try (more rules, more procedures, more technical support, etc.), it is not really getting safer. The offshore and nuclear industries have experienced the same. So perhaps we need to start looking at what happens around us in a different way in order to move forward and become even ‘safer’. In addition, learning from mistakes, which is what Safety I does, becomes very difficult when there is only one miss in a million: there are relatively few learning opportunities.
One approach to further development is to look at the socio-technical systems within which work is performed. Whereas these used to be regarded as merely ‘complicated’, it is increasingly being found that today’s systems, especially in connection with, and influenced by, their environment, have become ‘complex’. In a complex system it is no longer straightforward to see what is cause and what is effect. The system is influenced by the environment and the environment by the system; linear cause-and-effect models (such as the Swiss Cheese model mentioned above) and matrices (e.g. Reason’s culpability matrix) do not work in a complex system, at least not for the complex issues within it.
It has been shown that people working in a system (complex or not) basically adapt in order to perform their work to the best of their ability. They are flexible and inventive and, regardless of shortages, inadequate tools, impractical procedures and often restrictive laws and regulations, try to perform as well as possible. As a result, the work is almost always carried out with good results, and people ‘in the office’ feel that everything works as they imagined it would, while the work as actually performed is often completely different from the way the ‘office’ has prescribed it and thinks it is being performed. We call this Work as Imagined (WAI: the work as the people prescribing it imagine it should be done) and Work as Done (WAD: the work as it is actually performed on the shop floor).
People’s ability to adapt is one of the main reasons why we as humans have come this far. To perform a task, time is often (too) short and resources, often under economic pressure, are scarce or only barely adequate; the environment is not optimal (cold, heat, pressure, noise, you name it); and laws and regulations, if they were complied with, would often make it impossible to perform the work at all (a notorious example is the regulation that, while going up or down a staircase on a certain production platform, you should always have four points of contact with that staircase. Try it). That ability to adapt to circumstances, to deviate consciously and deliberately from regulations and procedures, and to find alternatives to non-working or absent resources is called ‘Performance Variability’. People recognise the actual, real needs and adapt their actions accordingly; they interpret the context and adjust procedures to best suit the circumstances of the moment and the prevailing situation. These smooth adjustments are necessary for a safe and effective workflow. Performance Variability is the necessary condition for today’s socio-technical systems to function in their ever-changing context.
Because constant adaptation to ever-changing resources and circumstances relies partly on trial and error and on individual judgement, it will never be 100% successful. An adaptation may turn out to be insufficient or inadequate and then result in a failure or incident. This shows that the same mechanisms that produce success and safety can also cause failure or unsafe situations.
The above no longer corresponds to Safety I, where it is assumed that ‘an error is caused by dysfunction, mishandling and/or not following procedures’; in other words, that an error arises from different actions than those behind something that runs successfully. A ‘Safety I response’ to an error therefore involves looking for ‘The Cause’ and subsequently implementing tighter procedures, more regulations and additional training, all with the aim of eliminating deviations from, and initiatives outside, procedures and regulations: controlling and managing ‘Work as Done’ until it matches ‘Work as Imagined’. Reacting to deviations in this way is disastrous for Performance Variability and will not produce the desired effect in the current work environment. Worse, there is a high likelihood of a decline in results and/or an increased likelihood of incidents.
For Performance Variability to have a greater chance of successful outcomes, instead of ‘preventing something from going wrong’ and ‘restricting people in their actions’, it is necessary to ensure that work ‘has the opportunity’ to go right; in other words, to capitalise on people’s ability to adapt and to promote it.
Since both success and failure are caused by the same mechanisms, it is no longer necessary to have different approaches to the things that go well (work as done every day) and the things that go wrong (incidents and accidents).
Make sure Safety Management focuses on the things that go well and builds on them; in doing so, the chances of things going wrong are reduced at the same time. This is the basis for Safety II: Make Work Go Well. As Hollnagel warns us:
“Constraining performance variability to remove failures will also remove successful everyday work.”
Within this safety mindset, humans are not seen as a threat (as in Safety I); instead, the Human Factor is the source of the necessary adaptations and is therefore indispensable for success.
Safety II thinking ultimately offers many more opportunities to improve, to evolve. Just imagine a system where 1 in 10,000 things goes wrong (compared to commercial aviation, quite an ‘unsafe’ system).
- The Safety I approach is then: the way things go right is a special form of the way things go wrong, or: successes = failures that are absent. Therefore, the best way to improve system safety is to study how things go wrong, and then take action and learn from that view. Potential data source: 1 in 10,000 cases.
- The Safety II approach is then: the way things go wrong is a special form of the way things go right, or: failures = successes that go wrong. Therefore, the best way to improve system safety is to study how things go well, and then take action and learn from that view. Potential data source: 9,999 out of 10,000 cases.
Talk about big data! It seems to me that with a Safety II approach, many more learning opportunities can be exploited. This will require a completely different way of thinking, though.
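To make that difference in learning opportunities concrete, here is a minimal sketch in Python, using the hypothetical 1-in-10,000 failure rate from the example above; the observation window of a million operations is an assumption for illustration only:

```python
# Minimal sketch: learning opportunities under Safety I vs Safety II.
# The failure rate comes from the 1-in-10,000 example in the text;
# the observation window of 1,000,000 operations is an assumed figure.

total_operations = 1_000_000
failure_rate = 1 / 10_000            # "1 in 10,000 things goes wrong"

failures = round(total_operations * failure_rate)
successes = total_operations - failures

# Safety I learns only from what goes wrong; Safety II also studies
# the everyday work that goes well.
print(f"Safety I  data source: {failures:>9,} cases (failures only)")
print(f"Safety II data source: {successes:>9,} cases (everyday successful work)")
print(f"Learning opportunities ratio: {successes / failures:,.0f} : 1")
```

Run as written, this prints 100 failures against 999,900 successes: a ratio of 9,999 to 1 in favour of studying everyday work.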
Safety II forces us to start looking at what goes well, how and why, and to start recording it so that it can be studied and used for the future.
Safety I versus Safety II in a nutshell:

Definition of safety
- Safety-I: “As little as possible goes wrong”
- Safety-II: “As much as possible goes well”

Safety management principle
- Safety-I: “Mainly reactive; react when something happens”
- Safety-II: “Proactive; stay ahead of developments and events”

Explanation of accidents
- Safety-I: “Accidents are caused by failure and dysfunction”
- Safety-II: “Things basically happen the same way, regardless of the outcome”

View of the Human Factor
- Safety-I: “A negative risk factor”
- Safety-II: “A most important tool”
If we want to grow and build safety further, we cannot limit ourselves to ‘Safety I thinking’. We will have to move forward in a Safety II way.
But Safety I should not be discarded! It got us where we are today, and Safety II is (still) very difficult to imagine and to put into practice. Safety II should be applied together with, and as a follow-up to, Safety I.
We must learn to see why and how everyday things go the way they do, and that is a rather different approach compared to trying to make events go right by preventing them from going wrong.
We need to start perceiving and recognising things that are not easy to see and that is a serious challenge.
We have to make everyday work better instead of focusing only on the mishaps.
And to do so we have to trust workers, cherish expertise, listen to all stakeholders and, if things go wrong anyhow, be prepared to focus on restoration of damage and confidence instead of aiming for retribution.
This article is a simplified compilation of research, publications and presentations by, among others: Heinrich, Hollnagel, Dekker, Weick, Perrow and Woods.