Chi-Square

Taken exclusively for teaching purposes from: LAW, Averill and David KELTON, 1991, Simulation Modeling and Analysis, third edition, McGraw-Hill.

[...] tests. First, failure to reject $H_0$ should not be interpreted as "accepting $H_0$ as being true." These tests are often not very powerful for small to moderate sample sizes $n$; that is, they are not very sensitive to subtle disagreements between the data and the fitted distribution. Instead, they should be regarded as a systematic approach for detecting fairly gross differences. On the other hand, if $n$ is very large, then these tests will almost always reject $H_0$ [see Gibbons (1985, p. 76)]. Since $H_0$ is virtually never exactly true, even a minute departure from the hypothesized distribution will be detected for large $n$. This is an unfortunate property of these tests, since it is usually sufficient to have a distribution that is "nearly" correct.

Chi-Square Tests. The oldest goodness-of-fit hypothesis test is the chi-square test, which dates back at least to the paper of K. Pearson (1900). As we shall see, a chi-square test may be thought of as a more formal comparison of a histogram or line graph with the fitted density or mass function (see the frequency comparison in Sec. 6.6.1).

To compute the chi-square test statistic in either the continuous or discrete case, we must first divide the entire range of the fitted distribution into $k$ adjacent intervals $[a_0, a_1), [a_1, a_2), \ldots, [a_{k-1}, a_k)$, where it could be that $a_0 = -\infty$, in which case the first interval is $(-\infty, a_1)$, or $a_k = +\infty$, or both. Then we tally

$$N_j = \text{number of } X_i\text{'s in the } j\text{th interval } [a_{j-1}, a_j)$$

for $j = 1, 2, \ldots, k$. (Note that $\sum_{j=1}^{k} N_j = n$.) Next, we compute the expected proportion $p_j$ of the $X_i$'s that would fall in the $j$th interval if we were sampling from the fitted distribution. In the continuous case,

$$p_j = \int_{a_{j-1}}^{a_j} \hat{f}(x)\,dx$$

where $\hat{f}$ is the density of the fitted distribution. For discrete data,

$$p_j = \sum_{a_{j-1} \le x_i < a_j} \hat{p}(x_i)$$

where $\hat{p}$ is the mass function of the fitted distribution. Finally, the test statistic is

$$\chi^2 = \sum_{j=1}^{k} \frac{(N_j - np_j)^2}{np_j}$$

Since $np_j$ is the expected number of the $n$ $X_i$'s that would fall in the $j$th interval if $H_0$ were true (see Prob. 6.17), we would expect $\chi^2$ to be small if the fit is good. Therefore, we reject $H_0$ if $\chi^2$ is too large. The precise form of the test depends on whether or not we have estimated any of the parameters of the fitted distribution from our data.

First, suppose that all parameters of the fitted distribution are known; i.e., we specified the fitted distribution without making use of the data in any way. [This all-parameters-known case might appear to be of little practical use, but there are at least two applications for it in simulation: (1) in the Poisson-process test (later in this section), we test to see whether times of arrival can be regarded as being IID U(0, T) random variables, where T is a constant independent of the data; and (2) in empirical testing of random-number generators (Sec. 7.4.1), we test for a U(0, 1) distribution.] Then if $H_0$ is true, $\chi^2$ converges in distribution (as $n \to \infty$) to a chi-square distribution with $k - 1$ df, which is the same as the gamma$[(k - 1)/2, 2]$ distribution. Thus, for large $n$, a test with approximate level $\alpha$ is obtained by rejecting $H_0$ if $\chi^2 > \chi^2_{k-1,1-\alpha}$ (see Fig. 6.37), where $\chi^2_{k-1,1-\alpha}$ is the upper $1 - \alpha$ critical point for a chi-square distribution with $k - 1$ df. (Values for $\chi^2_{k-1,1-\alpha}$ can be found in Table T.2 at the end of the book.)

[FIGURE 6.37: The chi-square test when all parameters are known. Chi-square density with $k - 1$ df; shaded area = $\alpha$.]
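The tallying step and the test statistic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the end intervals are taken to be unbounded ($a_0 = -\infty$, $a_k = +\infty$), and the critical point is supplied by the caller (for example, read from a chi-square table) rather than computed.

```python
from bisect import bisect_right


def tally_counts(data, interior_edges):
    """Count the observations falling in each of the k intervals.

    interior_edges holds a_1 < ... < a_{k-1}. The first interval is
    (-inf, a_1) and the last is [a_{k-1}, +inf); this unbounded-ends
    layout is an assumption made here for simplicity.
    """
    counts = [0] * (len(interior_edges) + 1)
    for x in data:
        # bisect_right sends x == a_j to the interval [a_j, a_{j+1}),
        # which matches the half-open intervals used in the text.
        counts[bisect_right(interior_edges, x)] += 1
    return counts


def chi_square_statistic(counts, probs):
    """Sum of (N_j - n p_j)^2 / (n p_j) over the k intervals."""
    n = sum(counts)
    return sum((N - n * p) ** 2 / (n * p) for N, p in zip(counts, probs))


def reject_h0(counts, probs, critical_value):
    """All-parameters-known test: reject H_0 when the statistic exceeds
    the caller-supplied upper 1 - alpha critical point for k - 1 df."""
    return chi_square_statistic(counts, probs) > critical_value
```

For instance, `tally_counts(data, edges)` followed by `chi_square_statistic(counts, probs)` gives the statistic for any set of intervals whose probabilities `probs` under the fitted distribution have already been worked out.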
Note that the chi-square test is only valid, i.e., is of level $\alpha$, asymptotically as $n \to \infty$.

Second, suppose that in order to specify the fitted distribution we had to estimate $m$ parameters ($m \ge 1$) from the data. When MLEs are used, Chernoff and Lehmann (1954) showed that if $H_0$ is true, then as $n \to \infty$ the distribution function of $\chi^2$ converges to a distribution function that lies between the distribution functions of chi-square distributions with $k - 1$ and $k - m - 1$ df. (See Fig. 6.38, where $F_{k-1}$ and $F_{k-m-1}$ represent the distribution functions of chi-square distributions with $k - 1$ and $k - m - 1$ df, respectively, and the dotted distribution function is the one to which the distribution function of $\chi^2$ converges as $n \to \infty$.) If we let $\chi^2_{1-\alpha}$ be the upper $1 - \alpha$ critical point of the asymptotic distribution of $\chi^2$, then

$$\chi^2_{k-m-1,1-\alpha} \le \chi^2_{1-\alpha} \le \chi^2_{k-1,1-\alpha}$$

as shown in Fig. 6.38; unfortunately, the value of $\chi^2_{1-\alpha}$ will not be known in general. It is clear that we should reject $H_0$ if $\chi^2 > \chi^2_{k-1,1-\alpha}$ and we should not reject $H_0$ if $\chi^2 < \chi^2_{k-m-1,1-\alpha}$; an ambiguous situation occurs when $\chi^2$ falls between these two critical points. It is often recommended that we reject $H_0$ only if $\chi^2 > \chi^2_{k-1,1-\alpha}$, since this is conservative; i.e., the actual probability $\alpha'$ of committing a Type I error [rejecting $H_0$ when it is true (see Sec. 4.5)] is at least as small as the stated probability $\alpha$ (see Fig. 6.38). This choice, however, will entail loss of power (probability of rejecting a false $H_0$) of the test. Usually, $m$ will be no more than 2, and if $k$ is fairly large, the difference between $\chi^2_{k-m-1,1-\alpha}$ and $\chi^2_{k-1,1-\alpha}$ will not be too great. Thus, we reject $H_0$ if (and only if) $\chi^2 > \chi^2_{k-1,1-\alpha}$, as in the all-parameters-known case. The rejection region for $\chi^2$ is indicated in Fig. 6.38.

[FIGURE 6.38: The chi-square test when $m$ parameters are estimated by their MLEs. Shows $F_{k-m-1}$, $F_{k-1}$, the asymptotic distribution function of $\chi^2$, and the "do not reject" and "reject" regions.]

The most troublesome aspect of carrying out a chi-square test is choosing the number and size of the intervals. This is a difficult problem, and no definitive prescription [...] must be inverted (see Example 6.14 below). Furthermore, for discrete distributions, we will generally be able to make the $p_j$'s only approximately equal (see Example 6.15).

We now discuss how to choose the intervals to ensure "validity" of the test. Let $a = \min_{1 \le j \le k} np_j$, and let $y(5)$ be the number of $np_j$'s less than 5. Based on extensive theoretical and empirical investigations (for the all-parameters-known case), Yarnold (1970) states that the chi-square test will be approximately valid if $k \ge 3$ and $a \ge 5y(5)/k$. For equiprobable intervals, these conditions will be satisfied if $k \ge 3$ and $np_j \ge 5$ for all $j$.

We now turn our attention to the power of the chi-square test. A test is said to be unbiased if it is more likely to reject $H_0$ when it is false than when it is true; in other words, power is greater than the probability of a Type I error. A test without this property would certainly be undesirable.
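Two of the decision aids discussed above, the bracketing of the unknown critical point when $m$ parameters are estimated and the validity condition attributed to Yarnold (1970), can be sketched as follows. The function names are hypothetical. The text reads critical points from Table T.2; the Wilson-Hilferty cube-root approximation used here is a substitute chosen only to keep the sketch free of external libraries, and it is close but not exact.

```python
from statistics import NormalDist


def chi2_critical_approx(df, alpha):
    """Approximate upper 1 - alpha critical point of a chi-square
    distribution with df degrees of freedom (Wilson-Hilferty
    approximation, not an exact table value)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3


def decision_with_estimated_params(stat, k, m, alpha):
    """Three-way decision when m parameters were estimated by MLEs."""
    lower = chi2_critical_approx(k - m - 1, alpha)
    upper = chi2_critical_approx(k - 1, alpha)
    if stat > upper:
        return "reject"
    if stat < lower:
        return "do not reject"
    # Between the two bounds the exact critical point is unknown;
    # the conservative rule in the text does not reject here.
    return "ambiguous (conservative rule: do not reject)"


def yarnold_valid(n, probs):
    """Check k >= 3 and a >= 5 * y(5) / k, where a is the smallest
    expected count n * p_j and y(5) counts expected values below 5."""
    k = len(probs)
    expected = [n * p for p in probs]
    a = min(expected)
    y5 = sum(1 for e in expected if e < 5)
    return k >= 3 and a >= 5 * y5 / k
```

With equiprobable intervals and every expected count at least 5, `yarnold_valid` reduces to the simpler check $k \ge 3$, as the text notes.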
It can be shown that the chi-square test is always unbiased for the equiprobable approach [see Kendall and Stuart (1979, pp. 455-461)]. If the $np_j$'s are not equal (and many are small), it is possible to obtain a valid test that is highly biased [see Haberman (1988)]. In general, there is no rule for choosing the intervals so that high power is obtained for all alternative distributions. For a particular null distribution, a fixed sample size $n$, and the equiprobable approach, Kallenberg, Oosterhoff, and Schriever (1985) showed empirically that power is an increasing function of the number of intervals $k$ for some alternative distributions, and a decreasing function of $k$ for other alternative distributions. Surprisingly, they also found in certain cases that the power was greater when the $np_j$'s were smaller in the tails (see Prob. 6.18).

In the absence of a definitive guideline for choosing the intervals, we recommend the equiprobable approach and $np_j \ge 5$ for all $j$ in the continuous case. This guarantees a valid and unbiased test. In the discrete case, we suggest making the $np_j$'s approximately equal and all at least 5. The lack of a clear prescription for interval selection is the major drawback of the chi-square test. In some situations entirely different conclusions can be reached from the same data set depending on how the intervals are specified. The chi-square test nevertheless remains in wide use, since it can be applied to any hypothesized distribution; as we shall see below, other goodness-of-fit tests do not enjoy such a wide range of applicability.

Example 6.14. We now use a chi-square test to compare the $n = 219$ interarrival times of Table 6.7 with the fitted exponential distribution having distribution function $\hat{F}(x) = 1 - e^{-x/0.399}$ for $x \ge 0$.
If we form, say, $k = 20$ intervals with $p_j = 1/k = 0.05$ for $j = 1, 2, \ldots, 20$, then $np_j = (219)(0.05) = 10.950$, so that this satisfies the guidelines that the intervals be chosen with equal $p_j$'s and $np_j \ge 5$. In this case, it is easy to find the $a_j$'s, since $\hat{F}$ can be inverted. That is, we set $a_0 = 0$ and $a_{20} = \infty$, and for $j = 1, 2, \ldots, 19$ we want $a_j$ to satisfy $\hat{F}(a_j) = j/20$; this is equivalent to setting $a_j = -0.399 \ln(1 - j/20)$ for $j = 1, 2, \ldots, 19$, since $a_j = \hat{F}^{-1}(j/20)$. (For continuous distributions such as the normal, gamma, and beta, the inverse of the distribution function does not have a simple closed form. In these cases, however, $\hat{F}^{-1}$ can be evaluated by numerical methods; consult the references given in Table 6.11.) The computations for the test are given in Table 6.12, and the value of the test statistic is $\chi^2 = 22.188$. Referring to Table T.2, we see that $\chi^2_{19,0.90} = 27.204$, which is not exceeded by $\chi^2$, so we would not reject $H_0$ at the $\alpha = 0.10$ level. (Note that we would also not reject $H_0$ for certain larger values of $\alpha$ such as 0.25.) Thus, this test gives us no reason to conclude that our data are poorly fitted by the expo(0.399) distribution.

TABLE 6.12
A chi-square goodness-of-fit test for the interarrival-time data

 j   Interval          N_j   np_j     (N_j - np_j)^2 / np_j
 1   [0, 0.020)          8   10.950   0.795
 2   [0.020, 0.042)     11   10.950   0.000
 3   [0.042, 0.065)     14   10.950   0.850
 4   [0.065, 0.089)     14   10.950   0.850
 5   [0.089, 0.115)     16   10.950   2.329
 6   [0.115, 0.142)     10   10.950   0.082
 7   [0.142, 0.172)      7   10.950   1.425
 8   [0.172, 0.204)      5   10.950   3.233
 9   [0.204, 0.239)     13   10.950   0.384
10   [0.239, 0.277)     12   10.950   0.101
11   [0.277, 0.319)      7   10.950   1.425
12   [0.319, 0.366)      7   10.950   1.425
13   [0.366, 0.419)     12   10.950   0.101
14   [0.419, 0.480)     10   10.950   0.082
15   [0.480, 0.553)     20   10.950   7.480
16   [0.553, 0.642)      9   10.950   0.347
17   [0.642, 0.757)     11   10.950   0.000
18   [0.757, 0.919)      9   10.950   0.347
19   [0.919, 1.195)     14   10.950   0.850
20   [1.195, ∞)         10   10.950   0.082

Note: Page 384 reads "When MLEs are used...". MLE stands for maximum likelihood estimation (in Spanish, estimación por máxima verosimilitud); this topic was covered in your previous statistics courses.
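The computation in Example 6.14 can be retraced with a short Python sketch. The constants 0.399, 219, 20, and 27.204 come from the example itself. The observed counts are read from Table 6.12, with several entries inferred from the printed $(N_j - np_j)^2/np_j$ column, so treat them as an assumption; the names are illustrative only.

```python
import math

BETA = 0.399               # mean of the fitted exponential (from the example)
K = 20                     # number of equiprobable intervals
CRITICAL_19_090 = 27.204   # chi-square critical point quoted from Table T.2

# Observed counts N_j per interval, as read from Table 6.12 (assumed values).
COUNTS = [8, 11, 14, 14, 16, 10, 7, 5, 13, 12,
          7, 7, 12, 10, 20, 9, 11, 9, 14, 10]


def exponential_endpoints(beta, k):
    """Interior endpoints a_j = -beta * ln(1 - j/k) for j = 1..k-1,
    obtained by inverting F(x) = 1 - exp(-x / beta) at j/k."""
    return [-beta * math.log(1.0 - j / k) for j in range(1, k)]


def chi_square_equiprobable(counts):
    """Chi-square statistic when every interval has probability 1/k."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


endpoints = exponential_endpoints(BETA, K)
stat = chi_square_equiprobable(COUNTS)
# Conservative rule from the text: reject only above the k - 1 df point.
reject = stat > CRITICAL_19_090
```

Computed at full precision the statistic comes out marginally below the printed 22.188, which is a sum of entries rounded to three decimals; the conclusion (do not reject at $\alpha = 0.10$) is unchanged.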
