Fail-Safe

hidet77
Feb 21, 2023

On February 17, 2023, JAXA and Mitsubishi Heavy Industries “canceled” the launch of the H3 rocket because one of its support boosters did not ignite. Controversy followed when a newspaper reporter arrogantly called this a “failure,” while JAXA explained that the “fail-safe” mechanism had activated and the launch was therefore “canceled.” They plan to try again in March. The discussion centers on whether this was a “failure” or a “cancellation.”

https://asia.nikkei.com/Business/Aerospace-Defense-Industries/Japan-s-H3-rocket-launch-aborted-after-booster-fails-to-ignite

I am in no position to judge this event. Instead, I want to share how much I learned from this incident.

“Fail-safe” mechanism. A fascinating design concept from engineering. Such a mechanism activates when a specific error occurs so that damage to the equipment or the environment is kept to a minimum.

We need fail-safe mechanisms in operations, too, and there are many ways to build them. The Toyota Production System (TPS) has a unique one: the Andon system, in which the boss comes to help when a failure occurs.

“Failure.” We should avoid failure. The Toyota Production System has a concept called “Poka-yoke,” a device that prevents workers from making mistakes. Preventive maintenance is likewise required to keep machines running correctly.

Yet failures happen. Avoiding all failures may be the ideal, but expecting no failure at all is unrealistic. In operations, we typically rely on four mechanisms to keep a failure from stopping the system:

Inventory
People
Capacity
Lead time

Inventory kept for this purpose is called safety stock: extra material held so that, when something unexpected happens, it protects the operation from the consequences.

People means extra personnel working in operations to deal with problems. The repairman is one such position, dedicated to repairing quality defects. Another example I have seen is a position called “floater,” in which a person moves around helping with one problem after another.

Capacity means keeping extra capability to make up for losses: the machine or the line runs faster, producing more to meet expectations. Working overtime is another way capacity is used to catch up.

Lead time is extra time built in to absorb failures. A typical flight schedule includes buffer time, so even when issues come up along the way, the airline can still announce at the end of the flight that it is on time, because the buffer absorbed those issues.

TPS takes another approach: the Andon. When a failure occurs, the Andon is pulled, and the team leader responds within the allotted time. If the problem is not resolved in time, the line stops and the problem is escalated to a higher-level manager.
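The escalation rule described above (respond within a time window, otherwise stop the line and escalate) can be sketched as a small timeout-and-escalate routine. This is a hypothetical illustration, not Toyota's actual system: the names `AndonPull` and `handle_andon`, the use of minutes as the time unit, and the two-level chain of "team leader" and "group leader" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AndonPull:
    """One pull of the Andon cord (hypothetical model)."""
    station: str
    pulled_at: float                     # assumed unit: minutes since shift start
    resolved_at: Optional[float] = None  # None means the problem is still open


def handle_andon(pull: AndonPull, response_window: float,
                 escalation_chain: List[str]) -> Tuple[bool, List[str]]:
    """Return (line_stopped, action_log) for a single Andon pull.

    escalation_chain[0] is the first responder (assumed: team leader);
    escalation_chain[1], if present, is the next level up.
    """
    log = [f"{pull.station}: Andon pulled at t={pull.pulled_at}",
           f"{escalation_chain[0]} responds"]
    deadline = pull.pulled_at + response_window

    # Case 1: the first responder resolves the problem before the deadline.
    if pull.resolved_at is not None and pull.resolved_at <= deadline:
        log.append("resolved within window; line keeps running")
        return False, log

    # Case 2: not resolved in time. Stop the line instead of pushing on.
    log.append(f"not resolved by t={deadline}; line stops")
    # Escalate one level up, if a higher level exists in the chain.
    if len(escalation_chain) > 1:
        log.append(f"escalated to {escalation_chain[1]}")
    return True, log


if __name__ == "__main__":
    chain = ["team leader", "group leader"]  # hypothetical roles
    for pull in (AndonPull("Station 4", 10.0, 11.5), AndonPull("Station 7", 20.0)):
        stopped, actions = handle_andon(pull, response_window=2.0,
                                        escalation_chain=chain)
        print(f"line stopped: {stopped}")
        for action in actions:
            print("  " + action)
```

The design choice worth noticing is that the routine never lets an unresolved problem continue quietly: missing the window always produces a stop and an escalation, which is the behavior the buffer-based approaches avoid.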

The four approaches above differ philosophically from the Andon in several ways.

1. Respect for people

The first and most significant difference is respect for people. The four approaches leave the workers to deal with the problem, while the Andon brings the manager to help. A failure is not a standard condition; when one happens, many judgments have to be made. How bad is the situation? How do we recover? In TPS, forcing people to handle such situations on their own is considered disrespectful. Instead, call the manager for help. Stop the line. Take time to understand and recover. Trying to make those judgments under pressure only worsens the situation.

One note: the “floater” might sound similar to the TPS team leader, yet what I have observed is the complete opposite. In many cases, I have seen floaters run away from failures.

2. Problem-solving

Every failure requires some problem-solving; otherwise, the failure will repeat. To solve the problem, we must first understand the failure.

With the four approaches, there is a distance between the problem-solver and the failure, and they try to bridge that distance with data. Yet data is typically someone’s opinion; the facts are not captured, and the facts are what matter most in solving a problem. Because the Andon pulls the problem-solver to the point of cause, it helps capture the facts. Direct observation of a failure is what lets us solve it.

3. Sense of urgency

When extra inventory, people, capacity, and lead time are kept, the organization has less sense of urgency. It knows that additional resources are somewhere to cover the failure, so we relax in responding to the failure and solving it.

The Andon is designed around this sense of urgency. The line stops if the team leader does not respond in time, and the longer the investigation takes, the longer the line stays stopped. As a result, the organization operates with a sense of urgency.

4. Culture

The four buffers can lead to a culture of hiding failures. But hidden failures persist and drive up costs. Eventually there will not be enough resources to hide every issue, and then the blame game begins. The truth is that many failures have been kept hidden for a long time. Most likely, people are busy firefighting, and they end up drained and exhausted.

This can change when leaders welcome failures. Leaders can show people that a failure is welcome as long as it is brought into the open and something can be learned from it. Managers need both the capability to solve problems properly and the capacity to work on them. While general managers solve most problems, specialists should be available when pulled in.

So what does your system’s “Fail-safe” look like? Is it functioning?
