“You can’t manage what you don’t measure.” – Peter Drucker
In both personal and professional life, improvement begins with measurement. For maintenance and engineering managers, this principle is even more critical. Without tracking performance, it’s impossible to validate progress, identify inefficiencies, or ensure alignment with organizational goals.
That’s where key performance indicators (KPIs) come in. Among the most powerful are:
Mean Time to Repair (MTTR)
Mean Time Between Failures (MTBF)
Availability
These metrics provide visibility into equipment reliability, workforce effectiveness, and ultimately, the financial impact on the business.
Sometimes referred to as maintainability, MTTR measures the department's ability to perform maintenance that retains or restores assets to a specified condition. It is the average time required to restore an asset to full operational condition after a failure. It is typically expressed in hours, and the equation is straightforward: the total repair time divided by the number of repair or replacement events.
For example, a facility is responsible for maintaining a standard Chiller Unit that has operated for 3,600 hours over the past two years. The Chiller Unit has failed 12 times over this period, resulting in 720 minutes of repair time. Dividing the total repair time (720 minutes) by the number of repairs (12) gives an average repair time of 60 minutes. So, the MTTR is one hour.
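The MTTR arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name mean_time_to_repair is an assumption made for this example, not part of any maintenance system or library.

```python
def mean_time_to_repair(total_repair_time: float, repair_count: int) -> float:
    """Return the average repair time per event, in the unit of total_repair_time."""
    if repair_count <= 0:
        raise ValueError("repair_count must be a positive number of repairs")
    return total_repair_time / repair_count


# Chiller Unit example: 720 minutes of repair time across 12 repairs.
mttr_minutes = mean_time_to_repair(720, 12)
mttr_hours = mttr_minutes / 60  # convert minutes to hours for use with MTBF

print(mttr_minutes, mttr_hours)  # 60.0 1.0
```

Converting to hours at this step keeps MTTR in the same unit as MTBF, which matters when the two are combined later.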
MTBF is a basic measure of an asset’s reliability. It is calculated by dividing the total operating time of the asset by the number of failures over a given period of time.
Taking the example of the Chiller Unit above, the calculation to determine MTBF is: 3,600 hours divided by 12 failures. The result is an MTBF of 300 operating hours.
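As a minimal sketch, the same MTBF calculation looks like this in Python. The function name mean_time_between_failures is hypothetical and chosen only for this example.

```python
def mean_time_between_failures(operating_hours: float, failure_count: int) -> float:
    """Return the average operating time between failures, in hours."""
    if failure_count <= 0:
        raise ValueError("failure_count must be a positive number of failures")
    return operating_hours / failure_count


# Chiller Unit example: 3,600 operating hours and 12 failures.
mtbf_hours = mean_time_between_failures(3600, 12)

print(mtbf_hours)  # 300.0
```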
Availability expresses the probability that an asset can perform its intended function satisfactorily when needed in a stated environment. The availability of an asset will diminish over time as the equipment is used, and it will not improve unless changes are made to upgrade the asset.
Technicians can extend the equipment's availability by increasing its reliability. There is a generally accepted availability standard of 95 percent for equipment, but mission-critical equipment in facilities requires a much higher level of availability.
To calculate availability, use the formula of MTBF divided by (MTBF + MTTR).
Continuing with the Chiller Unit example, both values must be in the same unit: the MTBF is 300 hours and the MTTR is 1 hour. Its availability is therefore 300 divided by 301. The result is about 99.7 percent availability.
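The availability formula can be sketched in Python as follows, with MTBF and MTTR both expressed in hours (300 and 1 for the Chiller Unit). The function name availability is an assumption for illustration only.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Return availability as a fraction: MTBF / (MTBF + MTTR).

    Both arguments must use the same time unit (hours here).
    """
    if mtbf_hours < 0 or mttr_hours < 0 or (mtbf_hours + mttr_hours) == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Chiller Unit example: MTBF of 300 hours, MTTR of 1 hour (60 minutes).
chiller_availability = availability(300, 1)

print(f"{chiller_availability:.2%}")  # 99.67%
```

Mixing units changes the answer considerably: passing the MTTR as 60 (minutes) alongside an MTBF of 300 (hours) would give 300 / 360, or 83.3 percent, which understates the availability.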
Probability of Failure
This calculation is a little more complicated mathematically. At times, managers need to calculate the probability that a piece of equipment will fail. Continuing with the Chiller example, suppose a manager needs to ensure the availability of the Chiller for the next 72 hours. What is the probability of failure?
The Reliability Function for the Exponential Distribution
R(t) = e^(−λt)
Given a failure rate, λ (lambda), we can calculate the probability of success, R(t), over a period of time, t.
In probability theory and statistics, the exponential distribution, also known as the negative exponential distribution, is the probability distribution that describes the time between events that occur independently at a constant average rate.
λ is 1 divided by MTBF. In the Chiller example, the MTBF is 300 hours, so λ is 1 divided by 300, or 0.00333 failures per hour.
So the calculation is: R(72) = e^(−(0.00333)(72)). The result is a 78.68 percent probability that the Chiller runs for the next 72 hours without failing. The probability of failure is therefore 1 − 0.7868, or about 21.32 percent.
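A short Python sketch of the reliability function is shown below. It assumes a constant failure rate, as the exponential model does, and uses the exact value 1/300 for λ rather than the rounded 0.00333, so the result can differ slightly in the last digit. The function name reliability is hypothetical.

```python
import math


def reliability(t_hours: float, mtbf_hours: float) -> float:
    """Return R(t) = e^(-lambda * t), the probability of no failure within t hours.

    Assumes a constant failure rate, lambda = 1 / MTBF (exponential model).
    """
    if mtbf_hours <= 0 or t_hours < 0:
        raise ValueError("mtbf_hours must be positive and t_hours non-negative")
    failure_rate = 1 / mtbf_hours  # lambda, in failures per hour
    return math.exp(-failure_rate * t_hours)


# Chiller example: MTBF of 300 hours, 72-hour window.
r_72 = reliability(72, 300)
p_failure_72 = 1 - r_72  # probability of at least one failure in the window

print(f"{r_72:.1%} {p_failure_72:.1%}")  # 78.7% 21.3%
```

The probability of failure is the complement of the reliability, which is why the two percentages sum to 100.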
“Are you tracking MTTR, MTBF, and Availability in your operations? If yes, what insights have you gained and if not, what’s holding you back? Let’s discuss how measurement can transform reliability.”