Failure mode and effects analysis, often abbreviated FMEA, is a risk management technique that supports quality assurance processes. This article defines the concept and then gives a step-by-step guide to creating a failure mode and effects analysis of your own.
Failure mode and effects analysis, also called failure modes in some publications, is a structured, systematic technique for failure analysis. Reliability engineers developed it in the late 1950s to study problems that could arise from malfunctions of military systems. The technique is often the first step in a system reliability study, or it is combined with other risk mitigation strategies.
The analysis is mostly used by stakeholders or upper management. FMEA is a core task in fields such as safety, reliability, and quality engineering, but you can also find it in other domains. There are three main types of FMEA:
Some of the purposes of such an analysis are to:
- Understand the business better;
- Comprehend the stakeholders’ goals;
- Create high-level test scenarios depending on the business and management interest;
- Derive test cases to better cover the risk-prone areas;
- Prioritize the test cases;
- Choose what to test and what not to test at any phase of the project, etc.
Creating a Failure Mode and Effects Analysis Step by Step
1. Identify Potential Failures & Effects
The first step is to review the functional requirements and their effects so that you can identify all the possible failure modes. For example, check for electrical short circuits, fracture, oxidation, warping, etc. A failure mode in one component can lead to problems in other components as well.
Make a list of all the failure modes, grouped by function. For each one, keep the ultimate effect in mind and note the failure effects (such as overheating, an abnormal shutdown of equipment, noise, etc.). If you are applying this to a business, start by analyzing each process.
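The list described above can be kept as a simple grouped structure. The following is a minimal sketch, assuming each worksheet row is a (function, failure mode, failure effect) triple. The function names, the pairing of modes with effects, and the helper `group_by_function` are hypothetical and only illustrate the grouping.

```python
from collections import defaultdict

# Hypothetical worksheet rows: (function, failure mode, failure effect).
rows = [
    ("Power supply", "electrical short circuit", "abnormal shutdown of equipment"),
    ("Power supply", "oxidation", "overheating"),
    ("Housing", "fracture", "noise"),
    ("Housing", "warping", "noise"),
]


def group_by_function(rows):
    """Return {function: [(failure_mode, effect), ...]}, keeping input order."""
    grouped = defaultdict(list)
    for function, mode, effect in rows:
        grouped[function].append((mode, effect))
    return dict(grouped)


failure_modes = group_by_function(rows)

# Print the list grouped by function, one failure mode per line.
for function, modes in failure_modes.items():
    print(function)
    for mode, effect in modes:
        print(f"  - {mode} -> {effect}")
```

Grouping by function first makes it easier to see when several failure modes share the same ultimate effect.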
2. Assess the Severity
Next, determine how serious the consequences of each failure effect would be. Ratings usually run from 1 to 10. Here is a brief table with the ratings and what they mean:
|Rating|Meaning|
|---|---|
|1|No failure effect, no danger|
|2|Very minor consequence that can be noticed by very interested users or customers|
|3|Minor; it will affect only a minor part of the system, average users/customers may notice it|
|4 – 6|Moderate; it will affect most users/customers|
|7 – 8|High; the function or system loses its primary function and users or customers are dissatisfied|
|9 – 10|Very high/hazardous; the entire product or service becomes inoperative, customers are angry. Possible safety hazard.|
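If you keep the analysis in a script or spreadsheet export, the severity table can be encoded as a small lookup. This is only a sketch: the band boundaries come from the table above, while the short labels and the function name `severity_label` are assumptions made for illustration.

```python
# Bands taken from the severity table: (lowest rating, highest rating, short label).
SEVERITY_BANDS = [
    (1, 1, "No effect"),
    (2, 2, "Very minor"),
    (3, 3, "Minor"),
    (4, 6, "Moderate"),
    (7, 8, "High"),
    (9, 10, "Very high/hazardous"),
]


def severity_label(rating):
    """Map an integer severity rating (1 to 10) to its band label."""
    if not isinstance(rating, int) or not 1 <= rating <= 10:
        raise ValueError(f"severity rating must be an integer from 1 to 10, got {rating!r}")
    for low, high, label in SEVERITY_BANDS:
        if low <= rating <= high:
            return label


print(severity_label(5))
print(severity_label(9))
```

Rejecting out-of-range values early keeps a typing mistake from silently distorting later calculations.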
3. Likelihood of Occurrence
Take each of the failure modes you identified earlier and examine its causes. How often would that failure occur? Look at similar processes or products, as well as their failure modes. Ideally, you should identify all the possible failure causes and document them. The same 1 to 10 scale used in the table above applies here to rate the likelihood of occurrence.
4. Failure Detection
Once you know what the remedial actions are, test the product or service for efficacy and efficiency. If you are in the electronics field, have engineers inspect the current system controls that are in charge of preventing failure modes or of detecting failures before the user/customer is impacted. Next, look at your competitors and see what techniques they are using to detect failures. Check the following table for the ratings and meanings:
|Rating|Meaning|
|---|---|
|1|Certain to catch fault by testing|
|2|Almost certain to catch fault by testing|
|3|High probability for the tests to catch fault|
|4 – 6|Moderate probability|
|7 – 8|Low probability|
|9 – 10|Fault will not be detected and passed to user/customer|
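Note that the detection scale runs in the opposite direction from what you might expect: a low number means the fault is easy to catch. A minimal sketch of a lookup for this table follows, assuming integer ratings. The shortened labels and the name `detection_label` are hypothetical.

```python
from bisect import bisect_left

# Upper bound of each band in the detection table, with a shortened label.
UPPER_BOUNDS = [1, 2, 3, 6, 8, 10]
LABELS = [
    "Certain to catch fault",
    "Almost certain to catch fault",
    "High probability of catching fault",
    "Moderate probability",
    "Low probability",
    "Fault will not be detected",
]


def detection_label(rating):
    """Map an integer detection rating (1 to 10) to its band label."""
    if not isinstance(rating, int) or not 1 <= rating <= 10:
        raise ValueError(f"detection rating must be an integer from 1 to 10, got {rating!r}")
    # bisect_left returns the index of the first band whose upper bound >= rating.
    return LABELS[bisect_left(UPPER_BOUNDS, rating)]


print(detection_label(2))
print(detection_label(7))
```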
5. Risk Priority Number (RPN)
Once you have completed the four basic steps, calculate the risk priority number. This value influences the choice of action against each failure mode. The formula is the following:
RPN = S × O × D
S = failure effects severity ranking;
O = failure modes occurrence ranking;
D = failure modes detection value.
The RPN value is calculated for each failure mode across the entire design/process, and it is documented in the failure mode and effects analysis. The results show you which areas are the most problematic. A failure mode with a high RPN should get the highest priority when it comes to corrective measures.
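As a worked example of the formula, the sketch below computes the RPN for a few failure modes and sorts them so that the highest value comes first. The class `FailureMode` and all the ratings are hypothetical, chosen only to illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # S, rated 1 to 10
    occurrence: int  # O, rated 1 to 10
    detection: int   # D, rated 1 to 10

    def __post_init__(self):
        # Reject ratings outside the 1 to 10 scale used in this article.
        for field_name in ("severity", "occurrence", "detection"):
            value = getattr(self, field_name)
            if not 1 <= value <= 10:
                raise ValueError(f"{field_name} must be from 1 to 10, got {value!r}")

    @property
    def rpn(self):
        """Risk priority number: S x O x D."""
        return self.severity * self.occurrence * self.detection


# Hypothetical ratings, for illustration only.
modes = [
    FailureMode("electrical short circuit", severity=9, occurrence=3, detection=4),
    FailureMode("warping", severity=4, occurrence=5, detection=2),
    FailureMode("oxidation", severity=6, occurrence=4, detection=7),
]

# Highest RPN first: these are the candidates for corrective measures.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for mode in ranked:
    print(f"{mode.rpn:>4}  {mode.name}")
```

With ratings from 1 to 10, the RPN always falls between 1 and 1000. In this made-up example, oxidation ranks first even though the short circuit has the higher severity, because it is both more likely to occur and harder to detect.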
6. Implement Corrective Measures
You can take many actions when you find a high RPN. Here is a brief list of possible measures:
- New inspections;
- New tests and procedures;
- Changing the design of the product;
- Choosing different components;
- Adding redundancy;
- Modifying limits, etc.
Naturally, there are plenty of other solutions you can adopt, depending on the field you’re in and your plans. Whichever measures you adopt, they aim for specific goals:
- Eliminate all the failure modes (even though some of them are more preventable than others);
- Minimize their severity;
- Cut down on the occurrence of failure modes;
- Improve their detection.
After you implement the corrective measures, it’s time to calculate the RPN again and document the results in the FMEA.
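The recalculation can be sketched as a simple before-and-after comparison. The ratings below are hypothetical: they assume a new inspection improves the detection rating from 7 to 3 while severity and occurrence stay the same.

```python
def rpn(severity, occurrence, detection):
    """Risk priority number: S x O x D."""
    return severity * occurrence * detection


# Hypothetical ratings before the corrective measure.
before = {"severity": 6, "occurrence": 4, "detection": 7}
# Assumed effect of a new inspection: faults are caught more reliably.
after = {**before, "detection": 3}

rpn_before = rpn(**before)
rpn_after = rpn(**after)
reduction = (rpn_before - rpn_after) / rpn_before

print(f"RPN before: {rpn_before}")
print(f"RPN after:  {rpn_after}")
print(f"Reduction:  {reduction:.0%}")
```

Recording both values in the FMEA shows whether a corrective measure actually moved the failure mode down the priority list.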
To conclude, FMEA is a very useful risk assessment technique that shows you which areas can become problematic for your company. Even though it’s mostly used in engineering, you can apply the steps and guidelines to any field you wish. Start by identifying potential failures and assessing their severity. Once you know how likely they are to occur and how easily they can be detected, you can calculate the RPN and start implementing corrective measures. Finally, reassess everything until the situation is within the limits you have set.