
CALIBRATION INTERVAL ADJUSTMENT METHODS:

QUANTITATIVE COMPARISON AND OPTIMIZATION


Mark Kuster Greg Cenker Dr. Howard Castrup
Pantex Metrology Southern California Edison Integrated Sciences Group
Hwy 60 and FM 2373 7300 Fenwick Lane 14608 Casitas Canyon Rd
Amarillo, TX 79120 Westminster, CA 92649 Bakersfield, CA 93306
806-477-4306 714-895-0674 661-872-1683
mkuster@pantex.com Greg.Cenker@sce.com hcastrup@isgmax.com
Abstract - NCSLI Recommended Practice RP-1, Establishment and Adjustment of Calibration
Intervals, provides decision trees for selecting a calibration interval adjustment method based on
inventory size, the information sources available, quality emphasis, and budgeting priorities, as well
as detailed descriptions of the pros and cons of each method. To date, the recommendations have
been qualitative, based on expert knowledge and experience. The NCSLI 173.1 Calibration Intervals
Subcommittee's working group is developing quantitative performance data on interval adjustment
methods in order to provide information to further substantiate or tune the existing recommendations.
The ultimate goal is to identify and map the optimal interval adjustment method into each region of the
operational space defined by equipment inventory size, reliability target, and other parameters, enabling
the future RP-1 reader to confidently select the most appropriate method for the situation at hand.
This paper reports the latest progress toward that goal: Improvements to the simulation framework for
Methods A1, A2, A3, S1 for variable initial and correct intervals and variable resubmission times;
comparisons to the null and divine baseline methods; and optimization of adjustment method
parameters. We report results based on an excess relative cost metric, which estimates the cost premium
an interval adjustment method, or lack thereof, incurs in a measurement and testing program.
Introduction and Background
Why the interest in calibration intervals and therefore in interval analysis? Because calibration intervals directly
affect test and measurement programs' cost and execution. Effective interval analysis reduces total operating and
consequence costs and helps produce higher quality products and processes. Also, organizations using effective
interval analysis have quantitative justification for interval changes that interests laboratory assessors, leverage
their laboratory management software investments, and gain a competitive edge.
The NCSLI 173.1 Calibration Intervals Subcommittee's working group, which maintains RP-1 [1], would like
to add quantitative methodology selection guidance to the RP. We have reported progress toward that goal in
previous papers [2, 3, 4], demonstrating by analysis and simulation the relative cost effectiveness of the algorithmic
methods and statistical method S1. The current RP (released November 2010) more strongly deprecates Methods
A1 and A2 based on those and other results. This paper updates and extends the working group's endeavor.
Whereas previous papers evaluated interval analysis (IA) methodology under a composite scenario of many
individual cases, this paper examines and compares performance under a conservative initial interval scenario that
reflects real calibration workloads. This conservative interval scenario spans continuous distributions of initial and
correct intervals in which initial intervals tend to underestimate correct intervals. Conservative initial intervals typify
risk-averse organizations that safely select initial intervals when little or no instrument performance information
is available. They may use manufacturer-suggested intervals, borrow intervals from similar organizations, or set
initial intervals arbitrarily or by convention and await future calibration results. Such organizations may or may
not have IA systems to adjust intervals afterward.
As shown in RP-1 and elsewhere [1, 5, 9], the most effective approach to establishing optimal intervals via
attributes (pass or fail) data involves reliability modeling. Statistical methods use this approach to control the
growth of uncertainty in the conformance of equipment to specifications by modeling the equipment's in-tolerance
probability (measurement reliability) as a function of time elapsed since calibration. Reliability targets functionally
express the acceptable conformance uncertainty level. Once analysis selects a model and estimates its parameters,
the model predicts when the measurement reliability falls below the target. However, statistical methods may have
difficulty with small data sets, and some organizations consider statistical methods difficult to implement.
By contrast, algorithmic methods respond to recent calibration results and thus require little or no historical
reliability data. For this reason, as well as the fact that they are relatively easy to implement, algorithmic methods
appeal to many organizations. Whereas the statistical methods assume or determine an instrument's most likely
reliability model given the available attributes data, the algorithmic methods endeavor to adjust intervals without
any knowledge of an instrument's underlying behavior. Of interest, then, is the methods' relative performance.
The Excess Cost Metric
Paraphrasing RP-1 [1], optimal intervals have the following characteristics: They meet reliability targets, respond
quickly to changes in reliability, and interval analysis derives them with minimal time and cost. Perhaps more
importantly, optimal intervals incur the least end-to-end cost for operating a measurement and testing program,
because this criterion encompasses all others.
The process of selecting an IA methodology, choosing reliability targets, and adjusting intervals is a balancing act.
Setting reliability targets pits shorter intervals with higher maintenance and support costs (cost of high quality)
against longer intervals with higher risks to tested product, called consequence cost (cost of low quality). Intervals
that deviate in either direction from optimality incur a net increase in programmatic costs, so the quicker a system
identifies and establishes the correct interval, and the closer the system maintains intervals to the correct interval,
the more cost savings equipment owners will realize. For purposes of comparing IA methods, we assume that the
chosen reliability target and thus the correct interval are optimal.
We would like to have quantitative comparison data for the various methods: standard metrics that enable direct
comparison of methods, preferably directly related to the relevant costs. To this end, the initial paper [2] developed
a practical excess cost metric M based on a series of n intervals I_i, their effective average I_a, and the correct
interval I_c, namely

M = \frac{1}{n I_a} \sum_{i=1}^{n} |I_c − I_i|. (1)
The metric indicates how much excess cost and risk we expect a production process to bear if one selects a given
IA method relative to the production process operating with optimal intervals. The production process realizes its
optimum cost (zero excess cost) only with perfect knowledge of the optimum intervals. In other words, if what
should be a 1.0 M$/yr process uses an IA method with a 50 % excess relative cost metric, the program will
actually cost 1.5 M$/yr.
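As a concrete illustration, equation (1) is straightforward to compute from an interval series. The sketch below is ours, not the paper's implementation; in particular, using the arithmetic mean for the effective average I_a is an assumption.

```python
def excess_cost_metric(intervals, i_correct):
    """Excess relative cost M = (1/(n*I_a)) * sum(|I_c - I_i|), eq. (1).

    intervals : list of assigned intervals I_i (days)
    i_correct : the correct interval I_c (days)
    Uses the arithmetic mean as the effective average I_a (an assumption).
    """
    n = len(intervals)
    i_avg = sum(intervals) / n
    return sum(abs(i_correct - i) for i in intervals) / (n * i_avg)

# A series held exactly at the correct interval incurs zero excess cost:
print(excess_cost_metric([300.0, 300.0, 300.0], 300.0))  # → 0.0
```

A series that spends one of three intervals at 100 d against a 300 d correct interval scores M = 200/700 ≈ 29 %, illustrating how both the deviation and the average interval length enter the metric.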
Pros and Cons
The above metric is simple to calculate, penalizes both long and short intervals, and applies to any IA method
without requiring knowledge of any specific programmatic details. Since it is a relative cost, it does not require
knowledge of the actual program costs either. However, please note that this metric ignores the partial cancellation
of support and consequence cost increases and greatly oversimplifies the consequence side of the equation. One
should only use the metric to compare IA methods. See [5, 6] for a comprehensive cost model.
Interval Analysis Methods Simulated
The simulation framework currently includes modules for IA Methods A1, A2, A3, S1, an experimental method
currently dubbed E1, and pseudo methods N1 and D1. We have not implemented Methods S2 or S3, nor any
variables data methods. Details of the mechanisms and operation for Methods A1-S1 appear in RP-1 [1] and the
initial paper [2] in this series. Papers [2, 3] also demonstrate example behaviors under various conditions. Each
method may have variations or selectable parameters governing its behavior, which the simulation framework
handles as follows.
Method A1
The Simple Response Method employs two parameters that are used to trade off response time and stability. The
first is an interval extension factor a and the second is a reduction factor b. To adapt Method A1 to accommodate
reliability targets, we set the b parameter as a function of a and the reliability target. See [1] or [2] for details.
Simulations to date have considered a = 0.01 and a = 0.1 to contrast slow response and stability with fast
response and volatility.
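A minimal sketch of an A1-style update follows. The specific rule tying b to a and the reliability target R_t is our assumption (it requires the expected log-interval change to vanish when the observed reliability equals the target), one plausible reading of the RP-1/[2] prescription; see those references for the actual formula.

```python
def a1_new_interval(interval, in_tolerance, a, r_target):
    """Method A1 (Simple Response) sketch: lengthen the interval on an
    in-tolerance result, shorten it on an out-of-tolerance result.

    b is chosen so the expected log-interval change is zero at the
    reliability target (our assumption, not necessarily RP-1's rule):
        r*ln(1+a) + (1-r)*ln(1-b) = 0  =>  b = 1 - (1+a)**(-r/(1-r))
    """
    b = 1.0 - (1.0 + a) ** (-r_target / (1.0 - r_target))
    return interval * (1.0 + a) if in_tolerance else interval * (1.0 - b)
```

With a = 0.01 the interval drifts slowly and stably; with a = 0.1 it responds quickly but is volatile, matching the trade-off described above.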
Method A2
RP-1 details two variations, V1 and V2, of the Incremental Response Method that differ in their aggressiveness
toward increasing intervals. The simulations test both variations.
Method A3
The Interval Test Method has one parameter, the interval change confidence or significance level C. Higher
significance levels promote stability and lower values speed interval corrections. The simulation covers the normal
range of use by testing Method A3 at 50 %, 70 % and 90 % significance levels. Method A3 has three options
for determining a new interval: interpolation via a binary search, exponential extrapolation, and confidence-
compensated extrapolation. Method A3 also uses minimum and maximum interval change factors to limit interval
extrapolation risk, currently (as typical) set to limit interval changes to no less than half or more than double the
previous interval and no more than 20 % above the longest observed interval. The simulations reported herein
used confidence-compensated extrapolation. Previous simulations [2, 3] found no significant performance difference
between the extrapolation options when restricting interval changes as stated above. Future A3 simulations will
investigate performance differences without the interval change restrictions in place.
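The interval change limits described above reduce to a simple clamp; the sketch below illustrates them (the function name and argument forms are ours).

```python
def a3_limit_change(proposed, previous, longest_observed):
    """Restrict an A3-proposed interval: no less than half nor more than
    double the previous interval, and no more than 20 % above the
    longest interval observed so far."""
    lo = 0.5 * previous
    hi = min(2.0 * previous, 1.2 * longest_observed)
    return max(lo, min(proposed, hi))
```

For example, an extrapolation that proposes 1000 d after a 300 d interval (400 d longest observed) is clamped to 480 d, while a proposal of 100 d is raised to 150 d.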
Method E1
A coauthor of this paper, Greg Cenker, created an experimental successive approximation IA algorithm, which we
are currently evaluating alongside the other methods. This method fits a curve between an estimated beginning-
of-period reliability (BOPR) based on false accept risk and the average observed end-of-period reliability (EOPR)
using the average resubmission time. It employs safeguards to bias observed EOPR below BOPR. Simulations to
date have fixed the BOPR at 99.2 %, based on a typical 0.8 % false accept risk target. The method will fit any
choice of two-parameter reliability curves. All E1 simulations run to date used a random-walk reliability curve, but
we have recently completed preliminary runs with straight-line and exponential reliability model fits that show
little performance difference. If Method E1's results hold up as expected, the RP may include it as Method A4.
Method S1
We implemented the "always-adjust" version of Classical Method S1 as described in RP-1, that is, "out of the
box" with no parameters or options. However, we believe a variant that biases the observed EOPR down in order
to speed response under high reliability conditions will perform better but have not yet simulated this option.
Method N1
The Null Method, N1, represents the lack of a formal IA method, with metrologists setting initial intervals by
convention, borrowing, or engineering analysis, and making few if any adjustments thereafter. We implement this
as a method that always sets the new interval to the existing interval and use it as a performance comparison
baseline.
Method D1
The Divine Method always knows the correct interval. We implemented this fictional method secondarily as a
check standard, which ideally incurs zero excess cost. In reality, the first interval (before running Method D1 on
the results) incurs some non-zero cost. Method D1's prime benefit, however, lies in the simulation framework's
ability to handicap it in order to measure the costs of such practices as minimum and maximum intervals, interval
rounding, etc.
Simulation Methodology
This section documents the procedure used to evaluate the various methods with the benchmark metric. We
performed the bulk of the simulation work using the commercial @RISK software package linked to spreadsheets
set up with each method's nuts and bolts. The spreadsheets perform a single run of a particular method, under
particular conditions, over an inventory's life. The @RISK software iterates the spreadsheet runs to collect the
metric's distribution and statistics. As an accuracy check, we also implemented each adjustment method and the
calibration simulation routines in Mathcad.
The simulation framework runs as follows:
1. Set method parameters, instrument group size, and reliability target for the case at hand.
2. Randomly draw a correct interval from a predefined distribution.
3. Regress the correct interval to determine the mean expected initial interval.
4. Randomly draw the initial interval from a distribution whose mean equals the regressed interval.
5. Randomly draw from a resubmission time distribution centered on the current interval.¹
6. Calculate each method's EOPR from a reliability model and its current resubmission time.
7. Compare a uniformly distributed random number to the EOPR.
8. Record the as-found calibration result as out of tolerance (OOT) if the random number exceeds the EOPR
or in tolerance (IT) otherwise.
9. Obtain a new interval from each IA method based on the data so far.
10. Repeat steps 5 through 9 for the inventory life (currently set at 30 intervals).
11. Calculate and record each IA method's performance metric.
12. Repeat the life cycle (steps 2 through 11) simulation 50,000 times (typical) for acceptable repeatability.
13. Collect each method's metric statistics.

¹ Resubmission time implementation is in process.
We repeat the entire process for all method parameters and variations, reliability targets, and group sizes of interest.
Note that the simulation presents each IA method with the same random number set; i.e., each method sees the same
scenario in a given equipment life cycle. The previous sections discussed the simulated IA method parameters and
variations; the next sections cover the remaining simulation details.
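The life-cycle steps above can be condensed into a loop like the following. This is an illustrative sketch, not the @RISK/Mathcad implementation: the exponential reliability model, the regressed initial interval from equation (3), the fixed resubmission time, and the stand-in null-method adjuster are the assumptions used for the demonstration.

```python
import random

def simulate_life(adjust, i_correct, r_target, n_intervals=30, rng=random):
    """One equipment life cycle (steps 2-11): draw pass/fail results
    against an exponential reliability model and let `adjust` propose
    each new interval. Returns the series of assigned intervals."""
    interval = 0.340 * i_correct + 137.0   # regressed mean initial interval
    history, intervals = [], []
    for _ in range(n_intervals):
        resub_time = interval                        # fixed resubmission time
        eopr = r_target ** (resub_time / i_correct)  # exponential model
        in_tol = rng.random() < eopr                 # steps 7-8
        history.append((resub_time, in_tol))
        intervals.append(interval)
        interval = adjust(interval, in_tol, history)  # step 9
    return intervals

# The Null Method N1 never changes the interval:
n1 = lambda interval, in_tol, history: interval
series = simulate_life(n1, i_correct=300.0, r_target=0.85)
```

Feeding the resulting interval series to the excess cost metric, and repeating over many random life cycles, yields the Monte Carlo cost distributions discussed later.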
Inventory Size
As mentioned before, IA method effectiveness depends on the data available, which depends on the inventory
size. Identical instruments form an instrument group; each method analyzes the data as a group or as individual
instruments. We have tested IA methods on two group sizes: single instrument groups and groups averaging 33.33
instruments, which total 30 and 1000 complete intervals, respectively, assuming a 30-interval equipment service
time. Future simulations will measure performance as a continuous function of group size.
Reliability Targets
We continue to simulate three separate reliability target values: 70 %, 80 %, and 90 %. Future simulations will
measure performance as a continuous function of reliability target.
Variable Correct Intervals
Prior simulations revolved around a single correct interval, arbitrarily set at I_c = 300 d (days). The simulation
framework now uses variable correct intervals. To a first approximation this feature adds no benefit because all
results are calculated relative to the correct interval, thereby making the actual correct interval value irrelevant.
However, in practice the costs of such realities as minimum and maximum allowed intervals, the obvious constraint
that all intervals are non-negative, and practices such as rounding intervals to the nearest day, week, year, etc. vary
with the actual correct interval simulated. Also, interval rounding with some Method A1 parameter choices leads
to bounded intervals [2] for low initial intervals. Therefore, simulating a realistic distribution of correct intervals
provides more accurate results.
To this end, we analyzed several years' actual calibration interval data from an inventory with widely varied
instrumentation to estimate a realistic distribution of correct intervals. Intervals we judged correct had less than
50 % rejection confidence (per Method A3), no limiting maximum or minimum intervals, and would not be rejected
upon the next calibration, regardless of result. After fitting several candidate analytical distributions using @Risk's
MLE (maximum likelihood estimation) fitting algorithm, we selected the inverse Gaussian (Wald) distribution
based on the chi-squared fit results. Figure 1 contrasts the resulting distribution with the single correct interval
previously simulated and shows how the cumulative distribution function fits the actual data.

(a) Prior (red) and Current (blue) Probability Distributions; (b) Cumulative Distribution Fit
Figure 1: Correct Interval Distribution
The Wald distribution parameters are μ = 395.93 d and λ = 1810.65 d with left endpoint location l =
36.610 d. The probability density function (pdf) is

f_{InG}(x, μ, λ, l) = \sqrt{\frac{λ}{2π(x − l)^3}} \exp\left[−\frac{λ(x − l − μ)^2}{2μ^2(x − l)}\right]. (2)
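Equation (2) with the fitted parameters can be checked numerically; the pure-Python sketch below (our construction, using simple trapezoidal integration) confirms that the density integrates to one and that its mean sits at l + μ, as expected for a shifted Wald distribution.

```python
import math

def wald_pdf(x, mu=395.93, lam=1810.65, loc=36.610):
    """Shifted inverse Gaussian (Wald) pdf, equation (2)."""
    t = x - loc
    if t <= 0.0:
        return 0.0
    return math.sqrt(lam / (2.0 * math.pi * t ** 3)) * \
        math.exp(-lam * (t - mu) ** 2 / (2.0 * mu ** 2 * t))

# Trapezoidal check: the density integrates to ~1 and its mean is ~loc + mu.
step = 0.05
xs = [36.610 + 1e-6 + step * i for i in range(120000)]   # out to ~6000 d
ps = [wald_pdf(x) for x in xs]
total = sum(step * (p0 + p1) / 2 for p0, p1 in zip(ps, ps[1:]))
mean = sum(step * (x0 * p0 + x1 * p1) / 2
           for (x0, p0), (x1, p1) in zip(zip(xs, ps), zip(xs[1:], ps[1:])))
```

The truncation at roughly 6000 d is harmless here because the Wald right tail decays exponentially well before that point.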
Variable Initial Intervals
The initial intervals assigned to calibrated equipment are rarely correct, so an IA method's performance at correcting
intervals weighs heavily on costs. Prior simulations included this effect by including an equally weighted mixture of
initial intervals set to I_c, I_c/3, and 3I_c. To simulate conditions more realistically, we gathered the initial calibration
intervals from the history of items judged to have reached their correct interval and analyzed the deviations. In this
particular case, we found that the correct interval predicts the mean initial interval according to the straight-line
regression

\bar{I}_i(I_c) = 0.340 I_c + 137 d, (3)

where \bar{I}_i represents the mean initial interval for the given correct interval. In this scenario, the mean initial
interval equals the correct interval at about 208 d: Without interval analysis, the laboratory significantly (safely)
underestimated the correct intervals above 208 d and somewhat overestimated the shorter intervals. Overall, since
the underestimated intervals outweighed the overestimated intervals, we observed that the conservative interval
scenario² reigned.
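The 208 d crossover quoted above follows directly from equation (3): setting the mean initial interval equal to the correct interval gives I_c = 137/(1 − 0.340). A quick arithmetic check:

```python
slope, offset = 0.340, 137.0          # equation (3): mean initial interval
crossover = offset / (1.0 - slope)    # where the mean initial interval equals I_c
print(round(crossover, 1))            # → 207.6, i.e. about 208 d
```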
The initial intervals' standard deviation also related linearly to the correct interval, such that a random variable
drawn from a unity-mean distribution used as a multiplier of the regressed interval modeled the data well. Initial
intervals are non-negative so the distribution is bounded on the left at zero but unbounded to the right. We
used @Risk's MLE fitting algorithm to evaluate a list of logical candidate distributions. A left-truncated Wald
distribution with parameters μ = 1.9649, λ = 90.1883, l = −0.96879 received the most favorable chi-squared
fit score. Simulations employing the top four initial interval candidate distributions (inverse Gaussian, Pearson
5, log-normal, and gamma) showed little sensitivity to the distribution choice, averaging 0.67 % between the
highest and lowest metric results with standard deviation 0.50 %. Worst case (4.04 %) occurred at 90 % targeted
reliability, Method S1, between the gamma and Pearson 5 distributions. No other cases had metric differences
exceeding 1.79 %.
Figure 2 illustrates the random error in the data and a sample set of simulated initial intervals, both vs. the
correct interval. Figure 3 depicts the chosen initial interval multiplier distribution and its fitted CDF. Figure 4
contrasts conservative interval distributions against the discrete initial intervals as they would have played out in
the composite scenario. The composite scenario cases had more extreme initial intervals logarithmically centered
on the correct interval, whereas the conservative initial interval scenario compresses and shifts the initial intervals.

(a) Initial Interval Error (data); (b) Simulated Initial Intervals (example)
Figure 2: Initial Intervals vs. Correct Interval

² Dr. Steven Dwyer, U.S. Naval Surface Warfare Center, suggested this name.

(a) Probability As a Multiplier of the Regressed Interval; (b) Cumulative Distribution Fit
Figure 3: Initial Interval Distribution

(a) 50 d Correct Interval; (b) 208 d Correct Interval; (c) 1200 d Correct Interval
Figure 4: Initial Interval Probability Distributions for Composite (red) vs. Conservative (blue) Scenarios over a
Range of Correct Intervals
Variable Resubmission Time
For simplicity, simulations to date have assumed that laboratories recalibrate items exactly on their due dates,
i.e., the simulated resubmission time exactly equals the assigned interval. In reality, several random errors cause
either delayed or early calibration relative to the assigned interval. Such errors include due date error, delivery
error, and laboratory turn-around time. Logistical practices such as altering due dates to avoid non-working days
and rounding intervals to the nearest convenient time unit introduce errors in the due date relative to the assigned
interval. Users make their equipment available for calibration with some spread around the due date, introducing
a delivery date error relative to the due date. Finally, laboratories have non-negative turn-around time between
delivery and recalibration.
Since the individual queuing errors add together, the resubmission time error distribution is the convolution of
the individual distributions. The simulation framework may then either simulate the total resubmission time error
distribution or simulate each component and sum the individual errors. After some investigation, we chose the
former for simplicity and the lack of analytical distributions clearly fitting the data sets, which also steered us away
from determining the total distribution via convolution. We performed a preliminary round of random resubmission
time simulations using a log-normal error distribution with parameters μ = 3.368 d, σ = 0.591 d, l = −19.147 d
that very roughly fit the total error data.
However, from prior experience with data from several locations, we deemed an exponential distribution more
appropriate for each side of the resubmission time error. In practice, though, users tend to resubmit equipment types
with functional issues in uniformly distributed time starting immediately after calibration. The distribution tails for
an inventory then have more weight than an exponential distribution. As a result, the best-fitting distribution found
to date involves a triangular and an exponential component for each tail. The fitted distribution has parameters

(a, λ_p, λ_m, m_p, m_m, b)^T = (1.64377, 5.96204 \times 10^{−2}, 1.15068 \times 10^{−1}, 2.60438 \times 10^{−5}, 1.38469 \times 10^{52}, 1.74681 \times 10^{−2})^T

for the peak location, exponential decay rates, triangle slopes, and y-intercept. This particular case essentially does not use the
left-side triangle (practically infinite slope). With the normalization constant

A_n = \frac{2 λ_p λ_m m_m m_p}{2 m_m m_p (λ_p + λ_m) + b^2 λ_p λ_m (m_p + m_m)},

the pdf is

f_e(x, a, λ_p, λ_m, m_m, m_p, b) = A_n \left[ \begin{cases} e^{−λ_p (x − a)} & \text{if } x > a \\ e^{−λ_m (a − x)} & \text{if } x \le a \end{cases} + \begin{cases} −m_p (x − a) + b & \text{if } 0 < x − a \le b/m_p \\ m_m (x − a) + b & \text{if } −b/m_m \le x − a \le 0 \\ 0 & \text{otherwise} \end{cases} \right]. (4)
Since we have only implemented these resubmission time distributions in the Mathcad simulator, results reported
herein concern fixed resubmission times only. We may use yet another distribution in the @Risk-spreadsheet
simulator: a Pearson 5 distribution under investigation. Figure 5 compares the resubmission data with the log-
normal and double exponential distributions.
Figure 5: Log-Normal and Double Exponential + Triangular Probability Distributions vs. Resubmission Time Data
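The structure of equation (4) can be sanity-checked by confirming that A_n normalizes the density. The sketch below is ours, and the parameter values reflect our reading of the typeset parameter vector above; it also shows numerically that the left-side triangle (width b/m_m) is vanishingly narrow, matching the "practically infinite slope" remark.

```python
import math

# Fit parameters as read from the text: peak location, decay rates,
# triangle slopes, and y-intercept (our reconstruction of the vector).
a, lam_p, lam_m = 1.64377, 5.96204e-2, 1.15068e-1
m_p, m_m, b = 2.60438e-5, 1.38469e52, 1.74681e-2

# Normalization constant A_n per the expression in the text
A_n = 2 * lam_p * lam_m * m_m * m_p / (
    2 * m_m * m_p * (lam_p + lam_m) + b ** 2 * lam_p * lam_m * (m_p + m_m))

def f_e(x):
    """Two-sided exponential plus triangular components, equation (4)."""
    t = x - a
    expo = math.exp(-lam_p * t) if t > 0 else math.exp(lam_m * t)
    if 0 < t <= b / m_p:
        tri = -m_p * t + b
    elif -b / m_m <= t <= 0:
        tri = m_m * t + b
    else:
        tri = 0.0
    return A_n * (expo + tri)

# Trapezoidal check that the density integrates to ~1
step = 0.05
xs = [-400.0 + step * i for i in range(int(1400 / step) + 1)]
ps = [f_e(x) for x in xs]
total = sum(step * (p0 + p1) / 2 for p0, p1 in zip(ps, ps[1:]))
```

The integration range covers the right triangle (which extends to b/m_p ≈ 671 d past the peak) and both exponential tails to negligible residual mass.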
Reliability Model
The simulation framework currently uses only the (non-intercept) exponential reliability model. That model gives
the expected EOPR for the current interval, reliability target, and correct interval as

R(I, I_c, R_t) = R_t^{I/I_c}. (5)
Simulation Results
During each simulated equipment life cycle, each IA method yields an unpredictable series of intervals in response
to the random IT/OOT results. The simulation framework calculates the excess cost metric outlined above on
each interval series. Together, the metrics for each life cycle simulation form a Monte Carlo-style excess cost
distribution as discussed in [2, 3, 4], which we boil down to the mean excess cost for each simulated case. Each
combination of discrete reliability target, inventory size, method parameter value, and method variation composes
a separate simulation case; those variables will eventually form the space over which we optimize and measure the
various methods' costs. Each simulation case in the conservative interval scenario represents the costs averaged
over the full distributions of correct and conservative initial intervals. The following sections discuss subsets of
the simulation cases.
Method and Scenario Comparison
Table 1 lists and Figure 6 illustrates the overall mean costs for each simulated method under the composite and
conservative interval scenarios. The conservative interval scenario costs run lower than those previously reported
[4] for the composite scenario, which comprises an average of all combinations of the reliability targets, initial
intervals, and inventory sizes mentioned above with a 300 d correct interval. Specifically, the cost of Methods A1,
A2, A3, E1, and S1 decreased 20 %, 7 %, 6 %, 2 %, and 5 %, respectively, somewhat flattening the performance
differences. The cost changes between scenarios indicate the various methods' sensitivity to extreme interval errors.
Method A1 in particular incurred 20 % more excess cost when faced with the composite scenario's more extreme
initial interval distribution.

Method                                  N1      A1    A2    A3    E1    S1
Composite Scenario [4]                  88.9 %  66 %  57 %  43 %  34 %  46 %
Conservative Initial Interval Scenario  54.4 %  46 %  50 %  37 %  32 %  41 %

Table 1: Conservative Interval and Composite Scenario Excess Cost Results by Method
Figure 6: Mean Excess Costs by Scenario and IA Method

In this conservative interval scenario, the lack of interval analysis increases test and measurement costs by 54 %.
Experimental Method E1 generates the least excess cost in this scenario. Of the methods simulated and currently in
RP-1, Method A3 generates the least excess cost. Methods A1 and A2 performed least satisfactorily. We expect some
methods to become less efficient when dealing with random resubmission times relative to the fixed resubmission times,
since the IA system receives reliability data that does not exactly correspond to the assigned interval, contributing
error to the assigned intervals until the noise averages out. Methods that recognize resubmission times (A3, E1, S1, S2,
S3) should handle this issue in stride.
So how do the excess cost results affect test and measurement budgets? Consider as an example that the
optimal budget while expending 10 M$/yr for test and measurement with no interval analysis would have
been (10 M$/yr)/1.545 = 6.5 M$/yr. Under Method A3, expenditures would have been 6.5 M$/yr × 1.38, or about
8.9 M$/yr, saving about 1.1 M$ annually. Table 2 lists the cost avoidance realized by switching methods under
the example 10 M$/yr existing spending level and the conservative interval scenario.
General Simulation Cases
Table 3 breaks out the overall mean costs for each simulation case averaged over the various method parameter
values and variations. As reported before [2, 3, 4], all methods experience more difficulty with extreme reliability
targets and small instrument groupings. Unfortunately, Methods A1 and A2 have no mechanism to handle more
than one instrument per group and suffer accordingly in the average. Instrument groupings allow the other methods
to receive and analyze more data per unit time, cutting excess costs relative to single instrument groups by a
factor of 3.3 on average.

Savings From \ To:  E1   A3   S1   A1   A2
N1                  1.3  1.1  0.6  0.4  0.2
A2                  1.1  0.9  0.4  0.2
A1                  0.9  0.7  0.2
S1                  0.7  0.5
A3                  0.2

Table 2: Examples: Cost Avoidance by Switching Analysis Methods for 10 M$/yr Annual T&M Expenditures,
in M$/yr
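The budget-example arithmetic generalizes to any pair of methods: back the optimal budget out of the current spend and the current method's excess cost, then mark it up by the new method's excess cost. A sketch (ours), using the overall mean excess costs by method reported in this paper:

```python
# Overall mean excess costs by method (values from this paper's results)
excess = {'N1': 0.545, 'A1': 0.48, 'A2': 0.51, 'A3': 0.38, 'E1': 0.35, 'S1': 0.45}

def cost_avoidance(frm, to, spend=10.0):
    """M$/yr avoided by switching IA methods at a given spending level.

    The optimal budget is the current spend deflated by the current
    method's excess cost; the new spend is that optimal budget marked
    up by the new method's excess cost."""
    optimal = spend / (1.0 + excess[frm])
    return spend - optimal * (1.0 + excess[to])

print(round(cost_avoidance('N1', 'A3'), 1))  # → 1.1 M$/yr, as in the text
```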
Method S1's large and small group size results (14 % vs. 75 %) indicate MLE methods' superiority with, and
dependence on, sufficient data. Furthermore, we know that Method S1 (out of the box) does not make any interval
adjustments until it sees at least one OOT. Therefore, situations involving few OOTs, such as high reliabilities or
shorter-than-correct initial intervals, cause S1 some difficulty, and S1 incurs significantly more cost in those specific
cases than it does otherwise [2]. We expect that the reliability-biased S1 variant mentioned above will respond
faster.
Experimental Method E1 performs similarly to S1, but with less penalty analyzing small instrument groups, and
Method A3 also performs well overall. Optimally then, one would cover a varying inventory with a combination of
one MLE method and one algorithmic method, choosing between the two at some (as yet undetermined) group-size
breakpoint for a given reliability target.
Consideration  Value  N1      A1        A2        A3    E1    S1    Overall
Rel. Target    50 %   54.5 %  43 %      44 %      36 %  29 %  34 %  37 %
Rel. Target    70 %   54.6 %  46 %      49 %      37 %  33 %  42 %  41 %
Rel. Target    90 %   54.5 %  55 %      61 %      42 %  44 %  59 %  52 %
Group Size     1      54.4 %  48 %      51 %      49 %  56 %  75 %  56 %
Group Size     33.33  54.7 %  as above  as above  27 %  14 %  14 %  18 %
Overall               54.5 %  48 %      51 %      38 %  35 %  45 %

Table 3: Mean Excess Cost by Simulation Case and IA Method
Analytical Verifications
As a further check on the simulation, we derive the excess cost for Methods N1 and D1. Adapting equation (1),
the relative excess cost function for a single interval in an interval series is

C_{er}(I_c, I, I_m) = \frac{|I − I_c|}{I_m}. (6)
Method N1
In the case of N1, the assigned interval is always the initial interval. Therefore, in the composite interval scenario,
the mean expected excess cost of no interval analysis with the non-negative fixed correct interval I_{cf} and three
discrete initial intervals I_{cf}, I_{cf}/3, and 3I_{cf} is

\mathrm{mean}\left(\left[\frac{|I_{cf} − I_{cf}|}{I_{cf}}, \frac{|\frac{1}{3}I_{cf} − I_{cf}|}{\frac{1}{3}I_{cf}}, \frac{|3I_{cf} − I_{cf}|}{3I_{cf}}\right]\right) = \frac{8}{9} ≈ 88.89 %. (7)
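The 8/9 figure is a quick arithmetic check: the three relative costs are 0, 2, and 2/3, independent of I_{cf}:

```python
costs = [abs(1 - 1) / 1,          # initial interval = I_cf
         abs(1/3 - 1) / (1/3),    # initial interval = I_cf/3
         abs(3 - 1) / 3]          # initial interval = 3*I_cf
mean_cost = sum(costs) / len(costs)
print(mean_cost)  # → 0.888..., i.e. 8/9
```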
In the conservative interval scenario we use the following function to limit intervals above a minimum value
(currently I_{min} = 30 d) and round the mean initial interval to the nearest day:

I_{im}(I) = \mathrm{round}(\max(I_{min}, \bar{I}_i(\max(I, I_{min})))). (8)
Integrating the cost function over all initial and correct intervals then gives

\int_l^∞ f_{InG}(I_c, θ_{I_c}) \int_l^∞ f_{InG}(k, θ_{I_i}) C_{er}(I_c, k I_{im}(I_c), I_{im}(I_c)) \, dk \, dI_c = 54.4 %, (9)

which closely agrees with the simulation, where for a given distribution, θ represents the parameters (μ, λ, l) and
k is the random initial interval multiplier. The random initial intervals are I_i = k I_{im}(I_c).
Method D1: Interval Rounding Costs
Once Method D1 runs (after the first complete interval), the assigned interval switches from the initial interval
to the correct interval, though both may be rounded or restricted. Ideally, the excess cost is zero for all but the
first interval. In practice, we have the restricted and rounded starting interval

I_s(k, I_c, r_I) = \mathrm{D1}(k I_{im}(I_c), r_I), (10)

the restricted and rounded final interval

I_f(I_c, r_I) = \mathrm{D1}(I_c, r_I), (11)

and the mean interval over the equipment lifetime

I_m(k, I_c, r_I, T_g) = \frac{I_s + T_g I_f}{T_g + 1}, (12)
where r_I is the interval rounding resolution and T_g is the number of interval computations for the instrument
group. Dropping the variable parameters, the relative excess cost of interval rounding for a given correct interval,
equipment lifetime, interval resolution, and initial interval multiplier is then

C_{D1}(I_c, T_g, r_I, k) = \frac{C_{er}(I_c, I_s, I_m) + T_g C_{er}(I_c, I_f, I_m)}{T_g + 1}.
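Equations (10)-(12) and the cost expression above can be sketched directly; the specific restrict-and-round rule used for D1 here, and the parameter choices in the usage line, are our assumptions for illustration.

```python
def d1(interval, r):
    """Hypothetical restrict-and-round rule: round to resolution r (days),
    then enforce the 30 d minimum interval."""
    return max(30.0, round(interval / r) * r)

def i_im(i_c):
    """Regressed, restricted, and rounded mean initial interval, eqs. (3), (8)."""
    return round(max(30.0, 0.340 * max(i_c, 30.0) + 137.0))

def c_d1(i_c, t_g, r, k=1.0):
    """Relative excess cost of interval rounding, per eqs. (10)-(12)."""
    i_s = d1(k * i_im(i_c), r)              # eq. (10): starting interval
    i_f = d1(i_c, r)                        # eq. (11): final interval
    i_m = (i_s + t_g * i_f) / (t_g + 1)     # eq. (12): mean interval
    c_er = lambda i: abs(i - i_c) / i_m     # eq. (6)
    return (c_er(i_s) + t_g * c_er(i_f)) / (t_g + 1)

# Rounding a 300 d correct interval to the nearest year costs far more
# than rounding it to the nearest day:
print(c_d1(300.0, 30, 365.0) > c_d1(300.0, 30, 1.0))  # → True
```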
If we so modify Method D1 to round assigned intervals to the nearest logistically convenient time period, then
a Method D1 simulation should produce the excess cost attributable to that rounding. In that case, the mean
excess cost over all initial intervals at a fixed correct interval is

C_{D1f}(I_c, T_g, r_I) = \int_l^∞ f_{InG}(k, θ_{I_i}) C_{D1}(I_c, T_g, r_I, k) \, dk,

which leads to

C_{rnd}(T_g, r_I) = \int_l^∞ f_{InG}(I_c, θ_{I_c}) C_{D1f}(I_c, T_g, r_I) \, dI_c (13)
for the interval-rounding excess cost over the entire calibration workload. Table 4 compares the equation (13) results with the simulated results averaged over both group sizes. We did not simulate rounding to the nearest week. Note that in this conservative interval scenario, perfect intervals rounded to the nearest year incur more excess cost than having no IA system at all! The costs are shown for the same 10 M$/yr example expenditure level.
Round to   Calculated   Simulated   Example Cost
Day        0.61 %       0.56 %      55,563 $/yr
Week       1.17 %       -           115,277 $/yr
Month      3.25 %       3.24 %      313,665 $/yr
Year       67.8 %       66.8 %      4,005,075 $/yr

Table 4: Excess Cost by Interval Rounding Unit, for 10 M$/yr Annual T&M Expenditures
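The pattern in Table 4 (coarser rounding units costing disproportionately more) can be reproduced qualitatively with a self-contained Monte Carlo sketch in the spirit of equation (13). The lognormal distributions and the relative-deviation cost below are illustrative stand-ins, not the paper's $f_{\mathrm{lnG}}$ or $C_{er}$; and to isolate the cost of rounding alone, the initial intervals here scatter around $I_c$ itself rather than the conservative-scenario $I_{im}$, so the percentages will not match Table 4.

```python
import numpy as np

rng = np.random.default_rng(1)

def excess_cost_of_rounding(r_i, n=50_000, t_g=9):
    """Average the D1 rounding cost over assumed correct-interval and
    initial-multiplier distributions (Monte Carlo analog of eq. 13)."""
    i_c = rng.lognormal(np.log(180.0), 0.5, n)  # correct intervals, days
    k = rng.lognormal(0.0, 0.2, n)              # initial-interval multipliers
    d1 = lambda i: np.maximum(r_i, r_i * np.round(i / r_i))
    i_s = d1(k * i_c)                    # rounded starting intervals
    i_f = d1(i_c)                        # rounded final intervals
    i_m = (i_s + t_g * i_f) / (t_g + 1)  # equation (12)
    return float(np.mean(np.abs(i_m - i_c) / i_c))  # toy relative cost

for unit, days in [("day", 1), ("month", 30), ("year", 365)]:
    print(f"{unit:>5}: {excess_cost_of_rounding(days):.2%}")
```

Even with these toy inputs, year-level rounding dominates the other resolutions, mirroring the table's conclusion that rounding perfect intervals to the nearest year is worse than doing nothing.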
Method A1

Looking at Table 3 above, Method A1 wins only in the case of single-instrument groups (and low reliability). Since A1 handles only single-instrument groupings, we omit group-size results here. With a mixed inventory of varying-sized groups, the small margin for the small group size may not justify implementing a second method for that special case. Table 5 implies that tuning A1's interval extension factor for the reliability target would improve its performance, but one would expect similar improvements by tuning Method A3's parameters, rendering any advantage of the A1 improvement moot.
Consideration   Value   a = 0.01   a = 0.01
Rel. Target     70 %    49 %       37 %
Rel. Target     80 %    49 %       44 %
Rel. Target     90 %    49 %       60 %
Overall                 49 %       47 %

Table 5: A1 Metric Results
Method A2
Per Table 3, Method A2 appears to have no application whatsoever under the conservative interval scenario. Table
6 breaks out the results for A2s two variations, where we again omit group size for lack of applicability. Variation
1 is more conservative in that it favors interval decreases over increases and thus it suers relative to variation
2 when intervals are too short (as in this conservative interval scenario) or when reliability targets are relatively
low. As reliability targets increase, variation 1 beats the more reckless variation 2.
Consideration   Value   V1     V2
Rel. Target     70 %    48 %   40 %
Rel. Target     80 %    50 %   49 %
Rel. Target     90 %    52 %   69 %
Overall                 50 %   53 %

Table 6: A2 Metric Results
Method A3

Table 3 shows that Method A3 remains more cost-effective in the conservative interval scenario than A1 and A2, overall and in nearly all specific cases. Given the narrow scope in which A1 came out ahead, A3 seems the method of choice among the algorithmic RP-1 methods. The overall results appear to indicate that A3 is slightly superior to S1 overall, even though Method S1 is optimized in the MLE sense to the exponential reliability model used. Method A3 does have the advantage with small data sets, more extreme reliability targets, and optimal or short intervals [4]. Method A3 is also reliability-model independent, and thus may outperform Method S1 under other reliability models. However, these results do not necessarily demonstrate absolute superiority, since the selected group size mix may not represent a given real environment and both methods have additional variants not yet evaluated.

Table 7 gives Method A3's performance details, which offer possibilities for tuning the confidence level parameter. Since lower values provide faster response and higher values more stability, lower confidence levels perform better on smaller instrument groupings and higher confidence levels with larger instrument groupings. The highest confidence level tested (90 %) proved more cost-effective at reliability targets below 90 %, and vice versa.
Consideration   Value   C = 0.5   C = 0.7   C = 0.9
Rel. Target     70 %    41 %      36 %      31 %
Rel. Target     80 %    41 %      38 %      33 %
Rel. Target     90 %    44 %      40 %      40 %
Group Size      1       48 %      50 %      51 %
Group Size      33      37 %      26 %      19 %
Overall                 42 %      38 %      35 %

Table 7: A3 Metric Results
Further Observations

The full data set of 52 new and 216 old simulation cases supports the additional observations listed in Table 8, with our proposed explanations.

1. Observation: Recovering from low initial intervals is costly with reliability targets > 50 %; we theorize the same for high initial intervals and targets < 50 %.
   Probable cause: High (low) reliability targets require more (less) data to lengthen than to shorten intervals.

2. Observation: Extreme reliabilities increase costs. High reliability targets challenge all methods tested; the same problem should occur at reliability targets near zero, but calibration programs have not typically operated in that region.
   Probable cause: The attributes data's information content is highest at 50 % reliability; at extreme reliability targets the IA methods starve for the meager information in the minority OOT or IT events.

3. Observation: Statistical methods outperform algorithmic methods in large inventories only.
   Probable cause: Statistical uncertainty begins relatively high with small data sets and decreases with more data.

Table 8: Further Observations and Probable Causes
Future Investigation
RP-1 contains decision trees for selecting an IA method based on inventory size, the information sources available,
quality emphasis, and budgeting priorities, as well as detailed descriptions of the pros and cons of each method.
Fully quantifying the relative costs of each method under varying conditions with optimal parameter choices would
provide information to further substantiate or tune this advice. To that end, research continues toward developing
comprehensive excess cost simulation results for the existing IA methods and improving the simulation framework.
We plan the following improvements:
- Adding the resubmission time distribution
- Adding the remaining reliability models used in practice
- Developing a more accurate excess cost metric
- Evaluating resubmission time windowing and assigned interval windowing
- Simulating reliability model error
- Simulating instrument aging or reliability degradation
With such a simulation framework in place, we may then proceed with the following work:
- Improving Method A3's interval extrapolation
- Adding interpolation to the Method A3 implementation
- Adding Method S1 variants
- Developing Method E1
- Adding Method S2
- Determining continuous metric functions of the reliability target and each method's parameters
- Optimizing the parameter settings for each method as a function of the reliability target
- Mapping the most cost-effective method into each region of the reliability target-group size space
- Updating RP-1 with the new information
- Testing new methods and variations to determine their place, if any, on the method selection map.
Some of the above results may be available by the time of this paper's presentation.
Conclusion
Given the low price of computing, readily available software, easy implementation, and the fact that long-term benefits will outweigh initial implementation costs, we assume that all algorithmic IA methods are on everyone's table. In view of the return on investment, the apparent complexity should not discourage IA implementation.

Method A3 performs robustly enough for all-around use, offering significant savings to the measurement and testing budget in nearly all cases. Moreover, one can optimally tune its parameters by applying its statistical test to higher-level groupings. However, in the case of sufficiently large instrument groupings and time frames, no algorithmic method is optimal and MLE methods are preferred for attributes (pass/fail) data. Given the methods in RP-1 now, an MLE method backed up by Method A3 for cases where data is sparse is the optimal choice.
Simulations to date have uncovered only one rare case each (single-instrument-per-model inventories with reliabilities below 80 %) in which Method A1 (conservative interval scenario only) or A2 (composite initial interval scenario only) recovers marginally more excess cost than other methods. Future simulations will determine the inventory size breakpoints above which Methods A1 and A2 offer no value. We do not recommend A1 or A2 for realistic MTE inventories in the meantime.
Though not in RP-1, Method E1 is the most cost-effective method yet simulated. We will evaluate E1 further and most likely propose it for inclusion.
All the methods discussed herein use attributes data, which means they use only a small portion of the information obtained during calibration. The S1 results indicate that even MLE statistical methods have significant excess costs without large data sets. We therefore recommend further development and application of variables-data interval adjustment methods [7, 8, 9], as well as the laboratory management software packages that collect and store full variables data (actual measurements with uncertainty). Research on data pooling and sharing for interval analysis is also recommended.
Regardless of IA methodology, we recommend that organizations consider carefully before rounding or truncating
assigned intervals to arbitrary time periods. Unless unusual logistical costs apply otherwise, rounding intervals
wastes considerable resources.
Acknowledgements
The authors wish to thank B&W Pantex Metrology, Southern California Edison, Integrated Sciences Group and
Cherine Marie-Kuster for supporting this work.
References
1. NCSLI Calibration Interval Committee, Establishment and Adjustment of Calibration Intervals,
Recommended Practice RP-1, National Conference of Standards Laboratories International, 2010.
2. Kuster, M., Cenker, G., and Castrup, H., Calibration Interval Adjustment: The Effectiveness of Algorithmic Methods, Proc. 2009 NCSLI Workshop and Symposium, San Antonio, July 27-30.
3. Kuster, M., Cenker, G., and Castrup, H., A Quantitative Comparison of Calibration Interval Adjustment
Methods, Measurement Science Conference Proceedings, Pasadena, 2010.
4. Kuster, M., Cenker, G., and Castrup, H., Calibration Interval Adjustment Methods: Quantitative Comparison and Optimization, Proc. 2010 NCSLI Workshop and Symposium, Providence, July 26-29.
5. JPL, Metrology - Calibration and Measurement Processes Guidelines, NASA Reference Publication 1342, June 1994.
6. Castrup, H., Calibration Requirements Analysis System, Proc. NCSL Workshop and Symposium, Denver,
CO, 1989.
7. Castrup, H., Calibration Intervals from Variables Data, Proc. NCSLI Workshop and Symposium,
Washington DC, 2005.
8. Jackson, D., Calibration Intervals and Measurement Uncertainty Based on Variables Data, Measurement
Science Conference Proceedings, Anaheim, January 2003.
9. ISG, Calibration Interval Analysis Concepts and Methods, ISG Training Course Handbook, Ch. 7, © 2006-2010, Integrated Sciences Group.