Making Sense of Trueness, Precision, Accuracy, and Uncertainty

STIMULI TO THE REVISION PROCESS 838
Stimuli articles do not necessarily reflect the policies of the USPC or the USP Council of Experts
Pharmacopeial Forum Vol. 34(3) [May–June 2008]

Walter W. Hauck,* William Koch, Darrell Abernethy, Roger L. Williams, USP
ABSTRACT Understanding terminology is important so that scientists can communicate with one another. This Stimuli article thus reviews the terms accuracy and precision, along with the related terms trueness and uncertainty. The goals are to be clear about where the usage in USP's compendia agrees with that of the international metrological community and where it does not, and to make recommendations regarding use of these terms in USP and its compendia.
INTRODUCTION Understanding terminology is important for scientists so they can communicate with one another, particularly in national and international harmonizing activities and in interdisciplinary activities where the likelihood of using similar terms with different meanings is high. This Stimuli article had its beginning in two such observations. The first pertained to differences between the International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) and the International Organization for Standardization (ISO) regarding their use of the term accuracy. The second arose from many conversations between the first author, a statistician, and analytical scientists during which they used the term precision with differing meanings. This Stimuli article thus reviews the terms accuracy and precision, and the related terms trueness and uncertainty, with three objectives. The first is to aid discourse by clarifying where there is and is not agreement between the usage in USP's compendia, starting with the United States Pharmacopeia (USP), and in the international metrological community. The second is to make recommendations for implementation in the short term regarding use of these terms in USP and, as appropriate, in other USP compendia such as National Formulary (NF) and the Food Chemicals Codex (FCC). The last objective is to begin a process that could lead in the long term to harmonization of USP terminology with that of metrology.
The concepts underlying these terms are important for any measurement procedure and statistical use of measurements. Section I: Background reviews the underlying concepts without using the technical terminology that is the topic of this paper. Statements in this section are consistent with common English usage or with introductory statistics texts. Section II: Terminology reviews the terms.
For each term, the Section provides various definitions with any explanatory text provided by sources (as referenced), and then adds comment and recommendations for USP. Though not shown in quotes, the definitions are copied from the sources without editing. An ellipsis (. . .) indicates if some source text has been eliminated. In addition to documents from ICH, ISO, and USP, definitions are included from the International Vocabulary of Metrology (VIM), 3rd edition. VIM was revised by an international working group under the Joint Committee for Guides in Metrology (JCGM), chaired by the Director of the International Bureau of Weights and Measures (BIPM). The Joint Committee was made up of representatives of: BIPM, the International Electrotechnical Commission (IEC), the International Federation of Clinical Chemistry and Laboratory Medicine (IFCC), ISO, the International Union of Pure and Applied Chemistry (IUPAC), the International Union of Pure and Applied Physics (IUPAP), the International Organization of Legal Metrology (OIML), and the International Laboratory Accreditation Cooperation (ILAC).
* Correspondence should be addressed to: Walter W. Hauck, PhD, Senior Scientific Fellow, US Pharmacopeia, 12601 Twinbrook Parkway, Rockville, MD 20852-01790; phone 301.816.8390; e-mail wh@usp.org.
Section I: Background
When considering any measurement method, metrologists and other scientists must consider two general questions that underlie all the terms that will be discussed here: First, do the measurements tend to center around the (unknown) true or correct result? Second, do the measurements tend to be close together or spread apart (dispersed)? Simulated data representing weighings of an item with an actual (true) weight of 1.00 g are provided for illustrative purposes (Figures 1–3).
Measurements centered on the true result are commonly said to be unbiased or to have no systematic error; they could be said to be "on target" (Figure 1). The distance between the center of a large (infinite) number of measurements and the correct value (1.00 g in the example) is the bias. (For purposes of this paper, the issue of how best to measure the center will not be considered.) The actual data, because they are only a sample and not an infinite number of measurements, will not necessarily average to the correct value. The average of the 50 values provided in the example is 1.03 g.
The concept of being on target is covered under the heading Trueness in Section II.
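To make the distinction between the bias of a process and the bias estimated from a finite sample concrete, the following minimal Python sketch simulates weighings of a 1.00 g item. The function names, the normal error model, the standard deviation of 0.10 g, and the random seed are all assumptions made for illustration; they are not taken from the article's figures.

```python
import random
import statistics


def simulate_weighings(true_value, bias, sd, n, seed=0):
    """Draw n hypothetical weighings from a normal error model (an assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(true_value + bias, sd) for _ in range(n)]


def estimate_bias(measurements, true_value):
    """Sample mean minus the true value: an estimate of the process bias."""
    return statistics.fmean(measurements) - true_value


# An unbiased process (bias = 0) still yields a sample mean that differs
# slightly from 1.00 g, because only 50 values are observed.
weighings = simulate_weighings(true_value=1.00, bias=0.0, sd=0.10, n=50)
print(f"sample mean    = {statistics.fmean(weighings):.3f} g")
print(f"estimated bias = {estimate_bias(weighings, 1.00):+.3f} g")
```

The estimated bias from any one sample will generally be nonzero even when the process itself is on target, which is the point made above about a sample of 50 values.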
© 2008 The United States Pharmacopeial Convention, Inc. All Rights Reserved.
Figure 1. Example of data that are on target but widely dispersed.
Being on target, however, does not say anything about how close together the measurements are. The data of Figure 1, for example, are fairly widely spread. In basic statistics, spread or consistency (hereafter, dispersion), such as shown in Figure 1, is often measured by the standard deviation, variance, or coefficient of variation (relative standard deviation). Dispersion is a property of how the data are obtained, i.e., the measurement process. Reducing the dispersion requires changing the measurement process. Dispersion is covered under the heading Precision in Section II. Knowing the degree to which the data are dispersed does not say whether the data are on target. Figure 2 shows data that display little spread (relative to that of Figure 1) but that are off target.
Figure 2. Example of data that show little dispersion but are off target.
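The three dispersion measures named above (standard deviation, variance, and coefficient of variation) can be computed as in the sketch below. The function name and the two small data sets are hypothetical; the sample (n - 1) forms of the variance and standard deviation are used, which is a common convention rather than something the article specifies.

```python
import statistics


def dispersion_summary(measurements):
    """Return the sample variance, standard deviation, and percent RSD."""
    mean = statistics.fmean(measurements)
    sd = statistics.stdev(measurements)  # sample SD, n - 1 denominator
    return {
        "variance": statistics.variance(measurements),
        "sd": sd,
        "rsd_percent": 100.0 * sd / mean,  # coefficient of variation
    }


# Hypothetical weighings (g): widely spread versus tightly grouped.
widely_spread = [0.85, 0.95, 1.00, 1.05, 1.15]
tightly_grouped = [1.09, 1.10, 1.10, 1.10, 1.11]

for label, data in (("wide", widely_spread), ("tight", tightly_grouped)):
    summary = dispersion_summary(data)
    print(f"{label}: SD = {summary['sd']:.4f} g, RSD = {summary['rsd_percent']:.2f}%")
```

None of these quantities involves the true value, so a small standard deviation says nothing about whether the results are on target.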
Ideal measurements are ones for which there is little dispersion (there is always some dispersion), and the results center on the true value, as in Figure 3. Because neither the bias nor dispersion, by itself, captures this combination, analysts need another measure. Common ones in the statistical literature are the average of the squared differences between the results and the true value (mean squared error = variance + bias²) and the (square) root of the mean squared error. The combination term is considered further under the heading Accuracy in Section II.
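The identity mean squared error = variance + bias² follows in a few lines. The notation below is an assumption introduced for this sketch and does not appear in the article: X is a single measurement, τ the true value, and μ = E[X] the center of the measurement distribution, so that the bias is μ − τ.

```latex
\begin{align*}
\mathrm{MSE}
  &= \mathrm{E}\!\left[(X-\tau)^2\right]
   = \mathrm{E}\!\left[\bigl((X-\mu) + (\mu-\tau)\bigr)^2\right] \\
  &= \mathrm{E}\!\left[(X-\mu)^2\right]
     + 2(\mu-\tau)\,\mathrm{E}[X-\mu]
     + (\mu-\tau)^2 \\
  &= \mathrm{Var}(X) + \mathrm{bias}^2 ,
  \qquad \text{because } \mathrm{E}[X-\mu] = 0 .
\end{align*}
```

The cross term vanishes because deviations from the center average to zero, which is why the random and systematic contributions add without interaction.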
Figure 3. Example of data that are both on target and show little dispersion.
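The three situations illustrated in Figures 1 to 3 can be compared with a single combined measure. The sketch below uses three small hypothetical data sets (not the article's simulated data) and an illustrative function name; it computes bias, variance, mean squared error, and root mean squared error against a true value of 1.00 g. The population form of the variance (divide by n) is used so that the decomposition holds exactly for the sample.

```python
import math
import statistics


def error_components(measurements, true_value):
    """Return (bias, variance, mse, rmse) relative to a known true value."""
    bias = statistics.fmean(measurements) - true_value
    variance = statistics.pvariance(measurements)  # population form, divide by n
    mse = statistics.fmean((x - true_value) ** 2 for x in measurements)
    return bias, variance, mse, math.sqrt(mse)


# Hypothetical weighings (g) mimicking the three situations described above.
scenarios = {
    "on target, dispersed": [0.80, 0.90, 1.00, 1.10, 1.20],
    "off target, tight": [1.09, 1.10, 1.10, 1.10, 1.11],
    "on target, tight": [0.99, 1.00, 1.00, 1.00, 1.01],
}

for label, data in scenarios.items():
    bias, variance, mse, rmse = error_components(data, true_value=1.00)
    print(f"{label}: bias = {bias:+.3f}, variance = {variance:.5f}, RMSE = {rmse:.4f}")
```

Only the third data set scores well on the combined measure; the first is penalized through its variance and the second through its bias.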
Once measurements are available, some form of statistical analysis can be applied. For example, we may average the set of measurements to provide a single resulting value as an estimate of the true value. Of course, estimates are never exactly correct. Estimates can be wrong because of systematic errors (bias) and random errors associated with the dispersion in the measurements. The systematic error typically is measured by bias, as discussed earlier. The random contribution often is
measured by the standard error of the estimate, which depends on the dispersion of the measurements and the sample size; i.e., it depends on the measurement process (dispersion) and the quantity of data (sample size). The random component also can be captured in a statistical confidence interval that represents a range of values for the true value that are consistent with the data obtained (with a specified degree of confidence). This is further discussed under the heading Uncertainty in Section II.
When we consider the uncertainty of, for example, an average weighing, we need to consider sources of variability that may not be captured in the scatter of the data and hence may not be included in the standard error as calculated according to procedures outlined in statistics texts. For our weighing example, any error in the calibration of the balance adds uncertainty. The intent of the calibration is to reduce bias. If the 50 measurements are all from the same balance without recalibration, any uncertainty due to the calibration is not included in the standard error. If, instead, the 50 measurements were from 50 different balances, each with its own calibration, the uncertainty from the calibration will be part of the calculated standard error. This is further considered under the heading Types A and B Evaluation of Uncertainty in Section II.
Section II: Terminology
Trueness
USP: Not defined
ICH: Not defined
ISO 2, ISO 3, and ISO 5: The closeness of agreement between the average value obtained from a large series of test results and an accepted reference value. The measure of trueness is usually expressed in terms of bias.
VIM: Closeness of agreement between the average of an infinite number of replicate measured quantity values and a reference quantity value. Measurement trueness is not a quantity and thus cannot be expressed numerically, but measures for closeness of agreement are given in ISO 5725.
Measurement trueness is inversely related to systematic measurement error, but is not related to random measurement error. Measurement accuracy should not be used for measurement trueness and vice versa.
Comments: Although the terminology of trueness differs from that of statistics texts, the concept is that of statistical bias. As explained in ISO 5725-1, "The term bias has been in use for statistical matters for a very long time, but because it caused certain philosophical objections among members of some professions (such as medical and legal practitioners), the positive aspect has been emphasized by the invention of the term trueness" (p. iv).
Recommendations for USP: Because USP and ICH share a definition of accuracy that corresponds to trueness, USP General Information Chapter Validation of Compendial Procedures <1225> should add the same clarification as does ICH (see Accuracy section).
Precision
USP: The precision of an analytical procedure is the degree of agreement among individual test results when the procedure is applied repeatedly to multiple samplings of a homogeneous
sample. The precision of an analytical procedure is usually expressed as the standard deviation or relative standard deviation (coefficient of variation) of a series of measurements.
ICH: The precision of an analytical procedure expresses the closeness of agreement (degree of scatter) between a series of measurements obtained from multiple sampling of the same homogeneous sample under the prescribed conditions. Precision may be considered at three levels: repeatability, intermediate precision, and reproducibility. . . . The precision of an analytical procedure is usually expressed as the variance, standard deviation, or coefficient of variation of a series of measurements.
ISO 2, ISO 3, and ISO 5: The closeness of agreement between independent test results obtained under stipulated conditions. Precision depends only on the distribution of random errors and does not relate to the true value or the specified value. The measure of precision is usually expressed in terms of imprecision and computed as a standard deviation of the test results.
VIM: Closeness of agreement between indications or measured quantity values obtained by replicate measurements on the same or similar objects under specified conditions. Measurement precision is usually expressed numerically by measures of imprecision, such as standard deviation, variance, or coefficient of variation under the specified conditions of measurement. The specified conditions can be repeatability conditions of measurement, intermediate precision conditions of measurement, or reproducibility conditions of measurement . . . Sometimes measurement precision is erroneously used to mean measurement accuracy.
Comments: One needs to use care in identifying the measurements to which the concept of precision is applied. Here the language varies. USP and ISO use "test results," VIM uses "measured quantity values," and ICH uses "series of measurements."
Only ISO defines what is meant (5725-1, p. 2 and 3534-1, p. 33): A test result is "the value of a characteristic obtained by carrying out a specified test method." The clarifying note continues, "The test method should specify that one or a number of individual measurements be made, and their average or another appropriate function (such as the median or the standard deviation) be reported as the test result. It may also require standard corrections to be applied, such as correction of gas volumes to standard temperature and pressure. Thus a test result can be a result calculated from several observed values. In the simple case, the test result is the observed value itself."
One observes some conflict here between metrological usage and common statistical usage. Precision in metrology is a property of a measurement procedure, which means (following ISO) the precision of whatever the test procedure defines as the test result, which could be an individual observed value. In contrast, statisticians often use precision as a property of a quantity being determined, such as the content of a reference standard. The common statistical usage thus corresponds more closely to the random component of uncertainty in metrological language.
Recommendations for USP: There is good agreement between the sources on what is meant by precision and on the conditions of measurement (repeatability, intermediate precision, and reproducibility). USP <1225> should clarify what is meant by test result.
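The contrast drawn in the comments above, between the precision of a procedure (the scatter of individual results) and the precision with which a quantity has been determined (the random component of uncertainty), corresponds to the familiar difference between a standard deviation and a standard error. The sketch below is illustrative only: the function name and the data are hypothetical, and the confidence interval uses a normal quantile as a simplifying assumption; for small samples a Student t quantile would give a somewhat wider interval.

```python
import math
import statistics


def mean_with_interval(measurements, confidence=0.95):
    """Return (mean, sd, standard_error, (lower, upper)) for a set of results."""
    n = len(measurements)
    mean = statistics.fmean(measurements)
    sd = statistics.stdev(measurements)   # scatter of individual results
    se = sd / math.sqrt(n)                # random uncertainty of the mean
    # Normal approximation (an assumption); a t quantile is usual for small n.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, sd, se, (mean - z * se, mean + z * se)


# Hypothetical replicate weighings (g).
weighings = [0.98, 1.01, 1.03, 0.99, 1.02, 1.00, 1.04, 0.97]
mean, sd, se, (lower, upper) = mean_with_interval(weighings)
print(f"mean = {mean:.3f} g, SD = {sd:.4f} g, SE = {se:.4f} g")
print(f"approximate 95% interval: {lower:.3f} to {upper:.3f} g")
```

The standard deviation is a property of the weighing procedure and does not shrink as more data are collected, whereas the standard error of the mean decreases with the square root of the number of results. Neither reflects a calibration error shared by all the weighings.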
Accuracy
USP: The accuracy of an analytical procedure is the closeness of test results obtained by that procedure to the true value . . . Accuracy is calculated as the percentage of recovery by the assay of the known added amount of analyte in the sample, or as the difference between the mean and the accepted true value, together with confidence intervals.
ICH: The accuracy of an analytical procedure expresses the closeness of agreement between the value which is accepted either as a conventional true value or an accepted reference value and the value found. This is sometimes termed trueness.
ISO 3 and ISO 5 (not defined in ISO 2): The closeness of agreement between a test result and the accepted reference value. The term accuracy, when applied to a set of test results, involves a combination of random components and a common systematic error or bias component.
VIM: Closeness of agreement between a measured quantity value and a true quantity value of a measurand. The concept measurement accuracy is not a quantity and is not given a numerical quantity value. A measurement is said to be more accurate when it offers a smaller measurement error. The term measurement accuracy should not be used for measurement trueness and the term measurement precision should not be used for measurement accuracy, which, however, is related to both these concepts. Measurement accuracy is sometimes understood as closeness of agreement between measured quantity values that are being attributed to the measurand.
Comments: ICH and USP use accuracy for lack of bias (trueness). ISO uses accuracy for trueness (systematic error) and precision (random component). ISO 5725-1 notes the change: "The term accuracy was at one time used to cover only the one component now termed trueness, but it became clear that to many persons it should imply the total displacement of a result from reference value, due to random as well as systematic effects" (p. iv).
The ISO concept of accuracy thus corresponds to the statistical concept of total error (combining random and systematic components). The target analogy is again useful. In ISO usage, trueness refers to being centered properly (but without regard to dispersion), precision refers to tightness of results (without regard to centering), and accuracy covers both (on center and tightly together).
Recommendations for USP: Because accuracy in USP does not agree with ISO usage, at minimum some clarifying note to that effect should be added to <1225> and to the glossary being developed for the new bioassay chapters. Longer term, USP needs to decide whether to align with ISO on this language. Doing so, however, would be contrary to longstanding USP practice and to ICH and thus likely would be contrary to common industry usage.
Uncertainty
USP: Not defined
ICH: Not defined
ISO 2 (not defined in ISO 5): [Measurement] parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand. The parameter may be, for example, a standard deviation (or a given multiple of it), or the half-width of an interval having a stated level of confidence.
ISO 3: An estimate attached to a test result which characterizes the range of values within which the true value is asserted to lie . . . Uncertainty should be distinguished from an
estimate attached to a test result which characterizes the range of values within which the expectation is asserted to lie. This latter estimate is a measure of precision rather than of accuracy and should be used only when the true value is not defined. When the expectation is used instead of the true value the expression random component of uncertainty should be used.
VIM: Non-negative parameter characterizing the dispersion of the quantity values being attributed to a measurand, based on the information used. Measurement uncertainty includes components arising from systematic effects, such as components associated with corrections and the assigned quantity values of measurement standards, as well as the definitional uncertainty [component of measurement uncertainty resulting from the finite amount of detail in the definition of a measurand]. Sometimes estimated systematic effects are not corrected for but, instead, associated measurement uncertainty components are incorporated. The parameter may be, for example, a standard deviation called standard measurement uncertainty (or a specified multiple of it), or the half-width of an interval, having a stated coverage probability.
Comments: ISO 2 is not clear that the standard deviation referred to is meant to be the standard error (standard measurement uncertainty). The broadness of the definition of test result (see Precision section) enters here. As defined by ISO, the test result could be something calculated from a series of measurements and is not necessarily the original observed values. ISO 3 and VIM bring systematic components into the uncertainty definition. Although the language is different, the uncertainty calculations in ISO documents mostly follow formulas for what has been traditionally called propagation of errors and are for what are traditionally called standard errors.
Recommendations for USP: The definition of uncertainty (and of Types A and B evaluation; see next section) needs to be included somewhere in USP.
USP General Information Chapter Measurement Principles and Variation <1010> is a good candidate.
Types A and B Evaluation of Uncertainty
USP: Not defined
ICH: Not defined
ISO 2, ISO 3, and ISO 5: Not defined. However, Types A and B evaluation are referred to in ISO 2 and described, though not explicitly identified, in a note accompanying the definition of uncertainty: "Uncertainty of measurement comprises, in general, many components. Some of these components may be evaluated from the statistical distribution of the results of a series of measurements and can be characterized by experimental standard deviations. Other components, which also can be characterized by standard deviations, are evaluated from the assumed probability distributions based on experience or other information."
VIM, A: Evaluation of a component of measurement uncertainty by a statistical analysis of measured quantity values obtained under defined measurement conditions.
VIM, B: Evaluation of a component of measurement uncertainty determined by means other than a Type A evaluation of measurement uncertainty. [As examples,] evaluation based on information: associated with authoritative published quantity values; associated with the quantity value of a certified reference material; obtained from a calibration certificate; about
drift; obtained from the accuracy class of a verified measuring instrument; obtained from limits deduced through personal experience.
Comments: The Type A–Type B distinction is new language reflecting, in part, old practices. For example, many assays determine a content or potency relative to that of a reference standard. The final determination of content or potency then uses the value assigned to the reference standard. Normal practice is to include (propagate) the uncertainty of the assigned value in determining the uncertainty of the final content or potency. The uncertainty of the reference standard is a Type B uncertainty.
SUMMARY This discussion highlights opportunities for clarification in USP among the terms trueness, accuracy, and uncertainty to assist scientists whose work crosses among USP, ICH, and ISO. In the short term, these clarifications could be added easily to <1010> and <1225>. Responses to this Stimuli article will advance these additions. In the longer term, USP encourages continued harmonization of terminology among ISO, VIM, ICH, the compendia, and other interested parties.
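The practice noted in the comments on Types A and B evaluation, propagating the uncertainty of a reference standard's assigned value into the final content, can be sketched numerically. The example below assumes a simple model that is not stated in the article: the reported content is the product of a measured response ratio and the assigned value, and the two sources of error are independent, so that to first order the relative standard uncertainties combine as a root sum of squares. The function name and all numbers are hypothetical.

```python
import math


def combined_relative_uncertainty(type_a_rel, type_b_rel):
    """Root-sum-of-squares combination of two independent relative uncertainties."""
    return math.sqrt(type_a_rel ** 2 + type_b_rel ** 2)


# Hypothetical inputs, expressed as relative standard uncertainties.
assay_rel_u = 0.008      # Type A: from the scatter of replicate assay results
reference_rel_u = 0.003  # Type B: from the reference standard's assigned value
measured_content = 99.2  # percent of label claim (illustrative)

combined_rel_u = combined_relative_uncertainty(assay_rel_u, reference_rel_u)
standard_uncertainty = measured_content * combined_rel_u
print(f"combined relative uncertainty = {100 * combined_rel_u:.2f}%")
print(f"content = {measured_content:.1f}% with standard uncertainty {standard_uncertainty:.2f}%")
```

Replicate assays against the same lot of reference standard cannot reveal the Type B component, because every replicate shares the same assigned value; it has to be brought in from outside the data, as the comments describe.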
ACKNOWLEDGEMENT The authors thank Charles Y. Tan of Merck & Co. and a member of the USP Statistics Expert Committee for his comments on an earlier draft.
SOURCES
ICH. Q2(R1) Validation of Analytical Procedures: Text and Methodology. Geneva, Switzerland: ICH; 2005.
ISO 2. 21748 Guidance for the Use of Repeatability, Reproducibility, and Trueness Estimates in Measurement Uncertainty Estimation. Geneva, Switzerland: ISO; 2004.
ISO 3. 3534-1 Statistics – Vocabulary and Symbols, Part 1: Probability and General Statistical Terms. Geneva, Switzerland: ISO; 1993.
ISO 5. 5725-1 Accuracy (Trueness and Precision) of Measurement Methods and Results, Part 1: General Principles and Definitions. Geneva, Switzerland: ISO; 1994.
USP. USP 30–NF 25, Validation of Compendial Procedures <1225>. Rockville, MD: USP; 2007.
VIM. International Vocabulary of Metrology – Basic and General Concepts and Associated Terms, 3rd ed. Geneva, Switzerland: ISO; 2007.