Cohen: 2 Chapters

The t Test for Means

2.1 INTRODUCTION AND USE

The arithmetic mean is by far the most frequently used measure of location by behavioral scientists, and hypotheses about means are the most frequently tested. The tables have been designed to make the procedure for power analysis very simple in the case where two samples, each of n cases, are randomly and independently drawn from normal populations, and the investigator wishes to test the null hypothesis that their respective population means are equal, $H_0\colon m_A - m_B = 0$ (Hays, 1973, p. 408f; Edwards, 1972, p. 86), referred to below as Case 0. The test is the t test for independent means. The tables can also be used to analyze power for (a) the t test on means of two independent samples when $n_A \neq n_B$ (Case 1), (b) the approximate t test on means of two independent samples drawn from populations of unequal variance (Case 2), (c) the t test of the hypothesis that a single population mean equals a specified value (Case 3), and (d) the t test on means of dependent samples, i.e., paired observations (Case 4; Edwards, 1972, p. 247f). These applications are discussed below, following consideration of the (Case 0) t test for independent means drawn from equally varying populations and based on equal-sized samples. Finally, the tables can also be used for significance testing, as detailed in Section 2.5.

In the formal development of the t distribution for the difference between two independent means, the assumption is made that the populations sampled are normally distributed and that they are of homogeneous (i.e., equal) variance. Moderate departures from these assumptions generally have negligible effects on the validity of both Type I and Type II error calculations. This is particularly true for nondirectional tests and as sample sizes increase above 20 or 30 cases. The only noteworthy exception to the above is under the condition of substantially unequal variances together with substantially unequal sample sizes (whether small or large). Summaries of the evidence regarding the "robustness" of the t (and F) test are provided by Scheffé (1959, Chapter 10) and, in less technical terms, by Cohen (1965); see also Boneau (1960, 1962).

2.2 THE EFFECT SIZE INDEX: d

As noted above, we need a "pure" number, one free of our original measurement unit, with which to index what can alternately be called the degree of departure from the null hypothesis of the alternate hypothesis, or the ES (effect size) we wish to detect.
This is accomplished by standardizing the raw effect size, as expressed in the measurement unit of the dependent variable, by dividing it by the (common) standard deviation of the measures in their respective populations. For the two-independent-samples case, this is simply

$$d = \frac{m_B - m_A}{\sigma} \qquad (2.2.1)$$

for the directional (one-tailed) case, and

$$d = \frac{|m_B - m_A|}{\sigma} \qquad (2.2.2)$$

for the nondirectional (two-tailed) case, where

d = the ES index for t tests of means, in standard units,
$m_A$, $m_B$ = the population means, expressed in raw (original measurement) units, and
σ = the standard deviation of either population (since they are assumed equal).

The use of d is not only a necessity demanded by the practical requirements of table making, but proves salutary in those areas of the behavioral sciences where the raw units used are quite arbitrary or lack meaning outside the investigation in which they are used, or both. Consider, for example, the question of whether populations A and B differ in their favorableness toward some social object, as indexed by an attitude scale whose points depend on the particular selection of the items. If the A population has a mean of 270 and the B population a mean of 280, the question "How large is the effect?" can only be answered with "ten points," a generally unsatisfactory answer in the absence of a basis for answering the necessarily following question, "Well, how large is a point?"

d answers such questions by expressing score distances in units of variability. If, in the above example, the common within-population standard deviation is σ = 100 scale points,

$$d = \frac{m_B - m_A}{\sigma} = \frac{280 - 270}{100} = \frac{10}{100} = .1,$$

i.e., the means differ by a tenth of a standard deviation. Since both numerator and denominator are expressed in scale units, these "cancel out," and d is a pure number (here a ratio), freed of dependence upon any specific unit of measurement. On the other hand, consider the circumstance when σ = 5 rather than 100. Now,

$$d = \frac{10}{5} = 2.0,$$

i.e., the means differ by two standard deviations. This is obviously a much larger difference than is d = .1. But how large is each of these differences, and how much larger is the second than the first? There are various ways in which values of d may be understood.

2.2.1 d AS PERCENT NONOVERLAP: THE U MEASURES. If we maintain the assumption that the populations being compared are normal and of equal variability, and conceive of them further as equally numerous, it is possible to define measures of nonoverlap (U) associated with d which are intuitively compelling and meaningful. As examples:

1. When d = 0, and therefore either population distribution is perfectly superimposed on the other, there is 100% overlap, i.e., 0% nonoverlap, so $U_1$ = 0. In this circumstance the highest 50% of population B exceeds the lowest 50% of population A; we designate as $U_2$ the percentage of the B population that exceeds the same percentage of the A population, here 50.0%. Finally, as a third measure, $U_3$, we take the percentage of the A population that the upper half of the B population exceeds. When d = 0, $U_3$ = 50.0%.

2. When d = .1, as in the above example, the distribution of the population with the larger mean, B, is almost superimposed on A, but with some slight excess, i.e., some nonoverlap. $U_1$ here equals 7.7%; that is, 7.7% of the area covered by both populations combined is not overlapped. For $U_2$ the value is 52.0%, i.e., the highest 52.0% of the B population exceeds the lowest 52.0% of the A population. For $U_3$ the value is 54.0%, i.e., the upper 50% of population B exceeds 54.0% of the A population.

3. When we posited the smaller σ (= 5), we found d = 2.0. $U_1$ then equals 81.1%, the amount of combined area not shared by the two population distributions. In this case, the highest 84.1% of the B population exceeds the lowest 84.1% of the A population, so $U_2$ = 84.1%. Finally, the upper half of the B population exceeds 97.7% of the A population, so that $U_3$ = 97.7%.

The reader is free to use whichever of these U measures he finds most meaningful in the context of his application. They are simply related to d and to each other through areas under the normal curve:

$$U_1 = \frac{2\Phi(d/2) - 1}{\Phi(d/2)} = \frac{2U_2 - 1}{U_2} \qquad (2.2.3)$$

$$U_2 = \Phi(d/2) \qquad (2.2.4)$$

$$U_3 = \Phi(d) \qquad (2.2.5)$$

where Φ(x) is the proportion of the area (population of cases) falling below a given normal deviate x.

Table 2.2.1 presents $U_1$, $U_2$, and $U_3$ for values of d = .1 (.1) 2.0 (.2) 4.0. Its use will be illustrated after we have considered two other bases for the understanding of d.

[Table 2.2.1: Equivalents of d in terms of $U_1$, $U_2$, $U_3$, r, and $r^2$; table body not recoverable from this copy.]

2.2.2 d IN TERMS OF CORRELATION AND PROPORTION OF VARIANCE. Membership in the A or in the B population may be considered to be a simple dichotomy, or a two-point scale. Scoring it, for example, 0 for membership in A and 1 for membership in B (the values assigned are immaterial), one can express the relationship between population membership and any other variable as a Pearson product-moment correlation coefficient (r). Each member of the two populations may be characterized by a pair of variables, the "score" on population membership (X) and the value of the other variable (Y), and the r between X and Y can then be found by any of the usual computing formulas for r (Hays, 1973, p. 631f; Cohen & Cohen, 1975, pp. 32-35), or more readily as the point biserial r (Cohen & Cohen, 1975, p. 35).

Investigators may prefer to think of effect sizes for mean differences in terms of r rather than d, and the two are related by

$$r = \frac{d}{\sqrt{d^2 + 4}} \qquad (2.2.6)$$

Formula (2.2.6) is appropriately used when the A and B populations are such that they can be conceived as equally numerous. This will usually be the case when A and B represent the presence or absence of an experimental manipulation or of some abstract property (e.g., high versus low anxiety level, or native versus foreign speaker), as well as when the dichotomy represents real and equally numerous populations, as is the case (at least approximately) with males and females. The case of equally numerous populations is the usual one, and it is the case assumed for the values of r given in Table 2.2.1.

When, however, the populations are concrete and unequal collections of cases, the inequality should figure in the assessment of the degree of relationship (e.g., finally diagnosed schizophrenics versus others on a diagnostic psychological test). The more general formula for r should then be used:

$$r = \frac{d}{\sqrt{d^2 + 1/(pq)}} \qquad (2.2.7)$$

where p = the proportion of A's in the combined A and B populations, and q = 1 − p (i.e., the proportion of B's). [The reader will note that when p = q = .5, 1/(pq) = 4, and formula (2.2.7) reduces to formula (2.2.6).]

Once a difference between the population means of A and B can be expressed as r, it can also be (and usually is most usefully) expressed as $r^2$, the proportion of the total variance (PV) of Y in the combined A and B populations associated with or accounted for by population membership (X = 0 or 1). Table 2.2.1 presents values of both r and $r^2$ equivalent to d for the case where equally numerous populations are assumed. If the means of two equally numerous populations on a variable Y differ by d = 1.0, then population membership relates to Y with r = .447, and $r^2$ = .200 of the combined population variance in Y is associated with A versus B membership (X).

2.2.3 "SMALL," "MEDIUM," AND "LARGE" d VALUES. When working with a variable Y which has been well studied, the selection of an effect size expressed in d offers no particular difficulty.
On the one hand, estimates of the within-population σ are readily at hand, and the number of raw points of difference between the A and B population means to be detected (or to serve as an alternate hypothesis to the null) arises naturally out of the content of the inquiry. Thus, a psychologist studying the effects of treatment in phenylpyruvic mental deficiency will likely have an estimate of the σ of IQ in such a population, and a basis for positing the mean IQ difference between treated and untreated cases that he wishes to detect. Similarly, an investigator studying social class differences in height, with an estimated σ of height of, for example, 2.5 in., and positing a mean difference of 2 in. between two social class populations, could then find his difference expressed as d = 2/2.5 = .8.

But consider now the frequently arising circumstance where the variable Y is a new measure for which previously collected data or experience are sparse or even nonexistent. Take, for example, an especially constructed test of learning ability appropriate for use with cases of phenylpyruvic mental deficiency. The investigator may well be satisfied with the relevance of the test to his purpose, yet may have no idea of either what the σ is or how many points of difference on Y between the means of treated and untreated populations he can expect. Thus, he has neither the numerator ($m_B - m_A$) nor the denominator (σ) needed to compute d.

It is precisely here that the d concept proves its usefulness: the investigator need not derive d from a posited difference between means and an estimated σ; he can posit d directly. Thus, if he thinks that the effect of his treatment method on learning ability is small, he might posit a d value such as .2 or .3. If he anticipates it to be large, he might posit d as .8 or 1.0. If he expects it to be medium (or simply seeks to straddle the fence on the issue), he might select some such value as d = .5.

The terms "small," "medium," and "large" are relative, not only to each other, but to the area of behavioral science or, even more particularly, to the specific content and research method being employed in any given investigation. Conventional operational definitions of these terms are nevertheless offered below, in the belief that more is to be gained than lost by supplying a common conventional frame of reference, which is recommended for use only when no better basis for estimating the ES index is available.

SMALL EFFECT SIZE: d = .2. In new areas of research inquiry, effect sizes are likely to be small (when they are not zero!).
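This is because the phenomena under study are typically not under good experimental or measurement control, or both. When phenomena are studied which cannot be brought into the laboratory, the influence of uncontrollable extraneous variables ("noise") makes the size of the effect small relative to these (makes the "signal" difficult to detect).

The implication of d = .2 as the operational definition of a small difference between means can be seen in Table 2.2.1. When d = .2, normally distributed populations of equal size and variability have only 14.7% of their combined area not overlapped ($U_1$). If B is the population with the larger mean and A the other, the highest 54.0% of the B population exceeds the lowest 54.0% of the A population ($U_2$). Our third measure of nonoverlap ($U_3$) indicates that 57.9% of the A population is exceeded by the mean (or, equivalently, the upper half) of the B population.

From the point of view of correlation, and maintaining the idea of equally numerous populations, d = .2 means that the (point biserial) r between population membership (A vs. B) and the dependent variable Y is .100, and $r^2$ is accordingly .010. The latter can be interpreted as meaning that population membership accounts for 1% of the variance of Y in the combined A and B populations.

The above sounds indeed small (but see Section 11.2). Yet it is the order of magnitude of the difference in mean IQ between twins and nontwins, the latter being the larger (Husén, 1959). It is also approximately the size of the difference in mean height between 15- and 16-year-old girls (where the σ of height is about 2.1 in.). Other examples of small effect sizes are adult sex differences on the Information and Picture Completion Subtests of the Wechsler Adult Intelligence Scale favoring men, while a difference favoring women on the Digit Symbol Test is twice as large (Wechsler, 1958, p. 147).

MEDIUM EFFECT SIZE: d = .5. A medium effect size is conceived as one large enough to be visible to the naked eye, i.e., one of which an observer would become aware in the course of normal experience. In terms of measures of nonoverlap (Table 2.2.1), d = .5 indicates that 33.0% (= $U_1$) of the combined area covered by two normal, equal-sized, equally varying populations is not overlapped; that (where $m_B > m_A$) 59.9% (= $U_2$) of the B population exceeds 59.9% of the A population; and that the upper half of the B population exceeds 69.1% (= $U_3$) of the A population.

In terms of correlation, d = .5 means a point biserial r between population membership (A vs. B) and the dependent variable Y of .243, so that $r^2$ = .059 of the Y variance is associated with population membership. A difference of this size is found, for example, between the mean test scores of those who had been teachers and those who had been general clerks (Harrell and Harrell, 1945, pp. 231-232). Depending on his frame of reference, the reader may consider such differences either small or large. We are thus reminded of the arbitrariness of this assignment of quantitative operational definitions to qualitative adjectives. See Section 11.2.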
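The equivalences in Table 2.2.1 follow directly from formulas (2.2.3) to (2.2.7), so they can be recomputed for any d. The Python sketch below assumes normal, equally variable populations (and equally numerous ones unless a proportion p is supplied) and uses the standard library's `statistics.NormalDist` for Φ; the function names `u_measures` and `d_to_r` are illustrative only. Recomputed values can differ from the printed table in the last decimal place (for example, $U_1$ for d = .2 comes out at 14.8% rather than 14.7%).

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF, the Phi of formulas (2.2.3)-(2.2.5)


def u_measures(d):
    """Return (U1, U2, U3) as proportions for normal, equal-variance populations."""
    u2 = PHI(d / 2)          # (2.2.4): the top U2 of B exceeds the bottom U2 of A
    u1 = (2 * u2 - 1) / u2   # (2.2.3): share of the combined area not overlapped
    u3 = PHI(d)              # (2.2.5): share of A lying below the median of B
    return u1, u2, u3


def d_to_r(d, p=0.5):
    """Point biserial r equivalent to d, formula (2.2.7).

    p is the proportion of A cases; the default p = .5 (equally numerous
    populations) reduces this to formula (2.2.6), d / sqrt(d**2 + 4).
    """
    q = 1 - p
    return d / (d ** 2 + 1 / (p * q)) ** 0.5


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):
        u1, u2, u3 = u_measures(d)
        r = d_to_r(d)
        print(f"d={d:.1f}  U1={u1:.1%}  U2={u2:.1%}  U3={u3:.1%}  "
              f"r={r:.3f}  r^2={r * r:.3f}")
```

For d = .5 the sketch gives $U_1$ ≈ 33.0%, $U_2$ ≈ 59.9%, $U_3$ ≈ 69.1%, r ≈ .243, and $r^2$ ≈ .059, the medium-effect values quoted above.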
LARGE EFFECT SIZE: d = .8. When our two populations are so separated as to make d = .8, almost half ($U_1$ = 47.4%) of their areas are not overlapped. $U_2$ = 65.5%, i.e., the highest 65.5% of the B population exceeds the lowest 65.5% of the A population. As a third measure, the mean, or upper half, of the B population exceeds the lower 78.8% (= $U_3$) of the A population. The point biserial r here equals .371, and $r^2$ thus equals .138.

Behavioral scientists who work with correlation coefficients (such as, for example, educational psychologists) do not ordinarily consider an r of .371 as large. Note, however, that it is the .8 separation between means which is being designated as large, not the implied point biserial r. Such a separation is represented, for example, by the mean IQ difference between college graduates and persons with only a 50-50 chance of passing in an academic high school curriculum (Cronbach, 1960, p. 174). This seems a grossly perceptible and therefore large difference, as does the mean difference in height between 13- and 18-year-old girls, which is of the same size (d = .8).

2.3 POWER TABLES

The power tables are used when, in addition to the significance criterion and ES, the sample size is also specified; the tables then yield power values. Their major use will then be post hoc, i.e., to find the power of a test after the experiment has been performed. They can, of course, also be used in experimental planning by varying n (or ES or a, or all of these) to see the consequences to power of such alternatives.

2.3.1 CASE 0: $\sigma_A = \sigma_B$, $n_A = n_B$. The power tables are designed to yield power values for the t test for the difference between the means of two independent samples of equal size drawn from normal populations having equal variances. They are described for such use below, and in later sections for other conditions (Cases 1-4). The tables list values for a, d, and n:

1. Significance Criterion, a. There are tables for the following values of a: $a_1$ = .01, $a_1$ = .05, $a_1$ = .10; $a_2$ = .01, $a_2$ = .05, $a_2$ = .10, where the subscripts refer to one- and two-tailed tests. Power at $a_2$ is, to an adequate approximation, equal to power at $a_1 = a_2/2$ for power greater than (say) .10.

2. Effect Size, ES. It will be recalled that in formula (2.2.1) the index d was defined for one-tailed tests as

$$d = \frac{m_B - m_A}{\sigma}$$

where the alternate hypothesis specifies that $m_B > m_A$, and σ is the common within-population standard deviation (i.e., $\sigma_A = \sigma_B = \sigma$). For two-tailed tests [formula (2.2.2)],

$$d = \frac{|m_B - m_A|}{\sigma}$$

where the alternate hypothesis specifies only that $m_A \neq m_B$. Provision is made for d = .10 (.10) .80 (.20) 1.40.

3. Sample Size, n. This is the size of each of the two samples being compared. Provision is made for n = 8 (1) 40 (2) 60 (4) 100 (20) 200 (50) ...

The values in the body of each table are the power of the test times 100, i.e., the percentage of tests carried out under the given conditions that will result in rejection of the null hypothesis.

[Tables 2.3.1 to 2.3.6: Power of t test of $m_A = m_B$ at $a_1$ = .01, .05, .10 (Tables 2.3.1 to 2.3.3) and $a_2$ = .01, .05, .10 (Tables 2.3.4 to 2.3.6), by d and n; table bodies not recoverable from this copy. An asterisk in each table marks the point below which power values are greater than .995.]

Illustrative Examples

2.1 An experimental psychologist designs a study to appraise the effect of opportunity to explore a maze without reward on subsequent maze learning in rats. Random samples of 30 cases each are drawn from the available supply and assigned to an experimental (E) group, which is given an exploratory period, and a control (C) group, which is not. Following this, the 60 rats are tested, and the number of trials needed to reach a criterion of two successive errorless runs is determined. The (nondirectional) null hypothesis is $|m_E - m_C| = 0$. She anticipates that the ES would be such that the highest 60% of one population would exceed the lowest 60% of the other, i.e., $U_2$ = 60% (Section 2.2). Referring to Table 2.2.1, she finds that $U_2$ = 59.9% is equivalent to our conventional definition of a medium effect, d = .50; that is, she anticipates that the population means differ by half a within-population standard deviation. The significance criterion is $a_2$ = .05. What is the power of the test? Summarizing the specifications:

$a_2$ = .05, d = .50, $n_E = n_C = n$ = 30.

In Table 2.3.5 (for $a_2$ = .05), for column d = .50 and row n = 30, power equals .47. Thus, for the given sample sizes and using the $a_2$ = .05 significance criterion, the experimenter does not quite have a fifty-fifty chance of detecting d = .50.

The choice of d need not have proceeded by asserting the expectation that the ES was "medium" and using the conventional d = .5 value. Experimenters familiar with the subjects and the measure might instead posit the means and σ explicitly and compute d from formula (2.2.1). For a directional test at $a_1$ = .05 (Table 2.3.2), the two approaches compare as follows:

for "medium" d = .50: n = 30, power = .61;
for explicit d (from formula (2.2.1)) = .71: n = 30, power = .86.

2.2 A psychiatric investigator, pursuing certain endocrinological factors implicated in schizophrenia, performs an experiment in which urine samples of 500 schizophrenics and 500 comparable normals are analyzed for a certain relevant metabolite, which is assumed to be normally distributed with homogeneous variance. Since the endocrinological factor is only indirectly related to the metabolite, and perhaps for other reasons, he anticipates only a small ES, d = .20. He selects the conservative significance criterion $a_2$ = .01. What is the power of his test? Summarizing the specifications:

$a_2$ = .01, d = .20, $n_S = n_N = n$ = 500.

The power value is read from Table 2.3.4 (for $a_2$ = .01) at column d = .20, row n = 500. Were he to be satisfied with the less stringent $a_2$ = .05 significance criterion, he would find (from Table 2.3.5) power equal to .88. Note that rather large samples are required to detect small effects at stringent significance levels such as a = .01. He may well want to consider increasing his Type I (a) error risk to perhaps .10 in order to keep his Type II (b) error risk from becoming so large as to make the experiment uninformative in the likely event of a nonsignificant difference. Naturally, the increase in a is made before, not after, the data are collected.

2.3.2 CASE 1: $n_A \neq n_B$, $\sigma_A = \sigma_B$. The power tables will yield approximate values when samples of different sizes are drawn from the two normal, equally varying populations. In such cases, compute the harmonic mean of $n_A$ and $n_B$,

$$n' = \frac{2 n_A n_B}{n_A + n_B} \qquad (2.3.1)$$

and use the row for n' in the power tables. Because the tables treat the t test as based on df = 2n' − 2, when it is actually based on df = $n_A + n_B - 2$ (a larger number), the power values so obtained are slight underestimates. In interpreting d, we continue to conceive of the populations as equally numerous, although the samples are of unequal n.

Illustrative Example

2.3 In a psychological service center, cases are assigned by an essentially random process to one of two treatment methods. Over a period of time, 90 cases have been treated by Method A (the "standard" method) and 60 cases by Method B. The investigators wish to determine whether the new method (B) is better than A (i.e., a directional test), using final staff conference consensus ratings of improvement as the dependent variable. They posit an ES favoring the B population of d = .6 and use the $a_1$ = .05 significance criterion. Thus, the specifications are

$a_1$ = .05, d = .6 ($U_1$ = 38.2%), $n_A$ = 90, $n_B$ = 60.

With unequal n, they find from formula (2.3.1)

$$n' = \frac{2(90)(60)}{90 + 60} = \frac{10800}{150} = 72.$$

(Note that n', the harmonic mean, is smaller than the arithmetic mean, which is (90 + 60)/2 = 75.) In Table 2.3.2 (for $a_1$ = .05), column d = .6, row n = 72, they find power equal to .97 (a trivially small underestimate).
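The table lookups in Examples 2.1 to 2.3 can be cross-checked numerically. The Python sketch below is an approximation, not the method by which Tables 2.3.1 to 2.3.6 were computed: it assumes a normal approximation to the noncentral t distribution and a Cornish-Fisher series for the critical value of t, so its results can differ from tabled values by about .01. The function names `t_crit`, `power_two_sample`, and `harmonic_n` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def t_crit(alpha, df):
    """Approximate upper-tail critical value of t (Cornish-Fisher expansion)."""
    z = Z.inv_cdf(1 - alpha)
    return (z + (z ** 3 + z) / (4 * df)
            + (5 * z ** 5 + 16 * z ** 3 + 3 * z) / (96 * df ** 2))


def power_two_sample(d, n, alpha, tails=2):
    """Approximate power of the t test of m_A = m_B with n cases per sample.

    Uses df = 2n - 2, as the tables do, and a normal approximation to the
    noncentral t distribution with noncentrality delta = d * sqrt(n / 2).
    """
    df = 2 * n - 2
    delta = d * sqrt(n / 2)
    tc = t_crit(alpha / tails, df)
    scale = sqrt(1 + tc ** 2 / (2 * df))
    power = Z.cdf((delta - tc) / scale)
    if tails == 2:  # add rejections in the unexpected direction
        power += Z.cdf((-delta - tc) / scale)
    return power


def harmonic_n(n_a, n_b):
    """Formula (2.3.1): harmonic mean n' of two unequal sample sizes (Case 1)."""
    return 2 * n_a * n_b / (n_a + n_b)


if __name__ == "__main__":
    print(round(power_two_sample(0.5, 30, 0.05), 2))    # Example 2.1
    print(round(power_two_sample(0.2, 500, 0.01), 2))   # Example 2.2
    n_prime = harmonic_n(90, 60)                        # Example 2.3
    print(n_prime, round(power_two_sample(0.6, n_prime, 0.05, tails=1), 2))
```

Under these assumptions, `power_two_sample(0.5, 30, 0.05)` is about .47 (Example 2.1), `power_two_sample(0.2, 500, 0.01)` about .72 (Example 2.2), and `power_two_sample(0.6, harmonic_n(90, 60), 0.05, tails=1)` about .97 (Example 2.3).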
In Example 2.3, had the investigators performed a nondirectional test, which would have permitted the conclusion that B was worse than A, power (from Table 2.3.5, for $a_2$ = .05) would have been .94. Power is less, but at this level not much less, and they might consider the possibility of reaching the conclusion that B is worse than A worth the small loss of power.

2.3.3 CASE 2: $\sigma_A \neq \sigma_B$, $n_A = n_B$. For normal populations of unequal variance, the t test is no longer exact, but approximations adequate for most purposes are available (see Cohen, 1965). It should be kept in mind that when $\sigma_A \neq \sigma_B$, the definition of d must be slightly modified. Since there is no longer a common within-population σ, d is defined as above [formulas (2.2.1) and (2.2.2)], but instead of σ in the denominator, the formula requires the root mean square of $\sigma_A$ and $\sigma_B$, that is, the square root of the mean of the two variances:

$$\sigma' = \sqrt{\frac{\sigma_A^2 + \sigma_B^2}{2}} \qquad (2.3.2)$$

The unequal variability need not affect the conception of d developed in Section 2.2. Given that there is a difference between $\sigma_A$ and $\sigma_B$, we merely use σ' as the standard deviation with which to standardize the mean difference. The U measures can no longer be generally defined, and the U columns of Table 2.2.1 do not apply. However, interpreting d in terms of r and $r^2$ proceeds completely unaffected by $\sigma_A \neq \sigma_B$, and the conventional definitions of small, medium, and large d continue to be used.

When $\sigma_A \neq \sigma_B$ and it is also the case that $n_A \neq n_B$, the nominal significance criterion may differ greatly from the actual one, and power values read from Tables 2.3.1 to 2.3.6 may be greatly in error.

Illustrative Example

2.4 A labor economist plans a sample survey of men and women workers in a given occupation to determine whether their mean weekly wages differ. He proceeds to do a t test, using random samples of 100 cases in each group and the significance criterion $a_2$ = .01. He considers it possible that the variability of weekly wages differs between the two populations. He may arrive at the ES (d) he is interested in detecting in any of the following ways:

1. He may plan directly for a medium ES, d = .5.

2. He may posit that the difference between the population means is $2.00 a week and that the "average" variability of the two populations is $4.00. This value is not the standard deviation of either the population of men workers or that of women workers, but the root mean square of their respective population standard deviations, σ' [formula (2.3.2)]. He then finds d by formula (2.2.2) as $2.00/$4.00 = .5.

3. He may find it convenient to work in correlational terms and posit the ES he seeks to detect as a degree of (point biserial) correlation between sex and weekly wage of r = .25, or as the amount of wage variance associated with sex, $r^2$ = .06. In Table 2.2.1, he finds that r = .243 and $r^2$ = .059 are equivalent to d = .5. The fact that $\sigma_A \neq \sigma_B$ does not at all affect the validity of the r and $r^2$ equivalents, although it means that the U measures no longer apply.

Thus, by any of the above routes, he has the specifications:

$a_2$ = .01, d = .5, $n_A = n_B = n$ = 100.

In Table 2.3.4, for column d = .5, row n = 100, he finds power equal to .82. If he is prepared to work with the less stringent $a_2$ = .05, he would find from Table 2.3.5 power equal to .94. On the other hand, if he is prepared to restrict his test to detecting a wage difference favoring men workers and not the opposite, he would use the $a_1$ = .01 level and from Table 2.3.1 find power = .88.

2.3.4 CASE 3: ONE SAMPLE OF n OBSERVATIONS. Up to this point we have considered the most frequent application of the t test, to cases involving the difference between two sample means, where we test the hypothesis that the two population means are equal or, equivalently, that their difference is zero. The t test can also be used with a single sample of observations to test the hypothesis that the population mean equals some specified value, $H_0\colon m = c$. The value specified is relevant to some theory under consideration.
As an example, consider an anthropological field study of a preliterate group in which a random sample of n children is tested by means of an intelligence test yielding an IQ whose mean, as standardized, is 100. The null hypothesis then is that the population mean of these children is 100, $H_0\colon m = c = 100$. As another example, consider an attitude scale so constructed that a neutral position is represented by a value of 6 (as in Thurstone equal-appearing interval scaling). For a sample of n subjects, one can test the null hypothesis that the population from which they are drawn is, on the average, neutral, $H_0\colon m = c = 6$. Rejection with a sample mean greater than 6 yields the conclusion that the population is on the average "favorable" toward the social object, and rejection with a sample mean less than 6, that it is on the average "unfavorable."

For the one-sample test, we define

$$d_c' = \frac{m - c}{\sigma} \qquad (2.3.3)$$

as the ES index. Conceptually there has been no change: $d_c'$ is the difference between the (alternate) population mean (m) and the mean specified by the null hypothesis (c), standardized by the population standard deviation. The interpretation of $d_c'$ proceeds exactly as described in Section 2.2, with regard to Table 2.2.1 and the operational definitions of small, medium, and large effects.

However, the tables cannot be used as for the Case 0 two-sample test, for two reasons:

1. In the two-sample case there are two sample means, each based on n cases, contributing to the sampling error of the observed difference between means, whereas in the one-sample test there is only one sample mean, the value c being a hypothetical population parameter and thus without sampling error.

2. The power tables were computed on the basis that n is the size of each of two samples and that therefore the t test would be based on 2(n − 1) degrees of freedom. In the one-sample case, t is perforce based on only n − 1 degrees of freedom.

Thus, if one simply used the power tables directly with $d_c'$ and n in the one-sample case, one would be presuming (a) twice as much sampling error, with consequently less power, and (b) twice the number of degrees of freedom, with consequently more power, than the one-sample test actually involves. These are not, however, equal influences. Unless the sample size is small (say less than 25 or 30), the effect of the overestimation of the degrees of freedom is negligible. On the other hand, the doubling of the error variance has a substantial effect for all values of n. However, this is readily compensated for.
For the one-sample case, use the power tables with n and

$$d = d_c'\sqrt{2} \qquad (2.3.4)$$

Multiplying $d_c'$ by √2 (approximately 1.4) compensates for the tables' assumption of double the error variance. The other problem resulting from the use of n is that the tabled value for power presumes that the degrees of freedom are 2(n − 1), when actually there are only n − 1. However, since t approximates the limiting normal distribution fairly well even when its degrees of freedom are as few as 25 or 30, power values based on the doubled degrees of freedom will not be materially overestimated except in small samples.

Note that it is $d_c'$, not the table-entry value $d_c'\sqrt{2}$, that serves as the ES index. Its equivalences with the U measures and r of Table 2.2.1 require the further conceptualization that c, the hypothetical population mean of formula (2.3.3), is the mean of a normal population whose σ is equal to that of the population being sampled. In summary, for Case 3, one defines $d_c'$ as above and interprets it exactly as d is interpreted in Section 2.2, but enters the power tables with n and $d = d_c'\sqrt{2}$.

Illustrative Example

2.5 A sample of 60 animals is reared from birth under experimental conditions of interest to an investigator, who wants to know whether, under these conditions, the mean weight gain of a population of such animals departs from a mean of 70 in either direction, even by a slight amount. The null hypothesis he tests is $H_0\colon m = c = 70$. He posits $d_c'$ = .20 as a conventional operational definition of a slight departure, and uses the relatively lenient significance criterion of $a_2$ = .10.

To allow for the fact that only one sample mean contributes to error, rather than the two which the construction of the tables presumes, the tables must be entered not with $d_c'$ but, using formula (2.3.4), with $d = d_c'\sqrt{2}$ = .20(1.4) = .28. Thus, the specifications for estimating power are

$a_2$ = .10, d = .28, n = 60.

In Table 2.3.6 (for $a_2$ = .10), row n = 60, he finds power at columns d = .20 and d = .30 to be .29 and .50, respectively. Linear interpolation between these values yields approximate power at d = .28 of .8(.50 − .29) + .29 = .46.

2.3.5 CASE 4: ONE SAMPLE OF n DIFFERENCES BETWEEN PAIRED OBSERVATIONS.
Although the general one-sample case as described in Case 3 above does not occur with much frequency in behavioral science applications, a special form of it appears quite often. Data are frequently obtained as X, Y pairs which are matched in some relevant way, so that there are n pairs of X, Y observations. The t test of the $m_X - m_Y$ difference proceeds with the differences, X − Y = Z. The ES index is then

$$d_z' = \frac{m_Z}{\sigma_Z} = \frac{m_X - m_Y}{\sigma_Z} \qquad (2.3.5)$$

The Z subscript is used to emphasize that our raw score unit is no longer X or Y, but Z. If the investigator is content to work with $\sigma_Z$ as the standardizing unit, he can proceed as described for Case 3, positing $d_z'$ and looking in the power tables for $d = d_z'\sqrt{2}$ [formula (2.3.4)].

For independent samples (Case 0), the pairing of X and Y values is random, which implies a population r of zero. With matching, the $\sigma_Z$ of the denominator in formula (2.3.5), and hence the unit in which the ES index $d_z'$ for the difference in matched pairs is expressed, is given by

$$\sigma_Z = \sigma_{X-Y} = \sqrt{\sigma_X^2 + \sigma_Y^2 - 2r\sigma_X\sigma_Y} \qquad (2.3.6)$$

Note that as r (the population correlation between X and Y as paired) increases, $\sigma_Z$ decreases. In the case of matched pairs here being considered, on the assumption of equal variance ($\sigma_X = \sigma_Y = \sigma$),

$$\sigma_Z = \sigma_{X-Y} = \sqrt{2\sigma^2 - 2r\sigma^2} = \sigma\sqrt{2(1 - r)} \qquad (2.3.7)$$

Thus, the ratio of the standardizing unit for the $d_z'$ of Case 4 (dependent) to that for the d of Case 0 (independent) is $\sigma\sqrt{2(1-r)}/\sigma = \sqrt{2(1-r)}$. In other words, a given difference between population means for matched (dependent) samples is standardized by a value which is $\sqrt{2(1-r)}$ times as large as it would be were the samples independent. Alternatively (and equivalently), the $d_z'$ value used as an ES index for means from matched samples equals $1/\sqrt{2(1-r)}$ times the d value for the same raw score difference in independent samples, d being expressed in terms of σ, the common within-population standard deviation.

Although one can treat the matched pairs in Case 3 form, the standardizing unit then depends on the size of r. The investigator may prefer to use

$$d = \frac{m_X - m_Y}{\sigma} \qquad (2.3.8)$$

which is exactly the same index as the d of formulas (2.2.1) and (2.2.2), standardized by the common within-population σ. As was the case for $d_c'$, all the interpretive material of Section 2.2 holds.
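The one-sample and matched-pair conversions above reduce to a few lines of arithmetic. The Python sketch below reproduces the table-entry d of Example 2.5 and the $d_z'$ for a matched-pairs case with σ = 16, an 8-point mean difference, and r = .6, assuming, as formula (2.3.7) does, that X and Y have equal σ. The function names are hypothetical, not from any library, and the code uses √2 exactly rather than the rounded 1.4, so it gives .283 rather than .28 for Example 2.5.

```python
from math import sqrt


def one_sample_table_d(d_c):
    """Formula (2.3.4): table-entry d for a one-sample test with ES index d_c'."""
    return d_c * sqrt(2)


def sigma_z(sigma, r):
    """Formula (2.3.7): SD of paired differences Z = X - Y, assuming equal SDs."""
    return sigma * sqrt(2 * (1 - r))


def paired_d_z(mean_diff, sigma, r):
    """Formula (2.3.5), with sigma_Z taken from (2.3.7): d_z' for matched pairs."""
    return mean_diff / sigma_z(sigma, r)


def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation between two tabled power values, as in Example 2.5."""
    return y0 + (x - x0) / (x1 - x0) * (y1 - y0)


if __name__ == "__main__":
    # Example 2.5: d_c' = .20; the tables are entered at about .28
    d = one_sample_table_d(0.20)
    print(round(d, 3), round(interpolate(0.28, 0.20, 0.29, 0.30, 0.50), 3))
    # Matched pairs: sigma = 16, 8-point difference, matching r = .6
    d_z = paired_d_z(8, 16, 0.6)
    print(round(d_z, 3), round(one_sample_table_d(d_z), 2))
```

For the matched-pairs case the table-entry value comes out at about .79, compared with d = .5 for independent samples of the same raw difference.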
For correct power values, however, the value to enter in the power tables is not d but

$$\frac{d}{\sqrt{1 - r}} = \frac{m_X - m_Y}{\sigma\sqrt{1 - r}} \qquad (2.3.9)$$

which equals $d_z'\sqrt{2}$. As in Case 3, this procedure leads to an overestimate of power in small samples, since the tables assume 2(n − 1) degrees of freedom where only n − 1 are actually available.

The advantages of matching can now be made readily apparent. Consider an investigation concerned with the question of a sex difference in some aptitude variable. Assume that elementary school boys and girls each have population σ = 16, and one wishes to detect a difference in raw population means of 8 points, using samples of n = 40 subjects. Assume the test is to be performed at the two-tailed .05 level ($a_2$ = .05). The relevant power table is 2.3.5.

Case 0. Since the plan is to work with independent samples of 40 boys and 40 girls, we use n = 40 and

$$d = \frac{|m_A - m_B|}{\sigma} = \frac{8}{16} = .5$$

to find power = .60.

Case 4. Instead of independent samples of boys and girls, the investigator plans to draw 40 brother-sister pairs to detect the 8-point difference. There is the same ES, namely d = .5. However, he estimates the r between brothers and sisters on the aptitude variable as .6, and so enters Table 2.3.5 with n = 40 and d = .5/√(1 − .6) = .79 [formula (2.3.9)], finding power = .93. Thus, the matched design with an estimated matching r of .60 has resulted in power of .93 instead of only .60. Note that if r were .40 instead of .60, he would enter the table with d = .5/√(1 − .40) = .65, and power would be correspondingly lower, since the advantage of matching shrinks as the matching r becomes smaller. See Section 11.4 for a general treatment of the power of difference and regressed difference scores.

Illustrative Examples

2.6 An educational researcher has developed two different programed texts for teaching elementary algebra. From a high school grade, he forms 50 pairs of pupils so that the two members of each pair have closely similar IQs, assigns one member of each pair to each text, and tests all subjects on a common examination in algebra. He posits d = .40 [formula (2.3.8)], a small to medium value, and uses the $a_2$ = .05 significance criterion. It would not be correct to look up d = .40 in the power table, because that value does not take into account the advantageous effect of matching.
The appropriate ES for this power analysis is given by formula (2.3.9), where r is the population correlation between IQ-matched pairs in algebra scores, which he estimates as .55. Thus,

.40/√(1 − .55) = .40/.67 = .60.

If he were lacking a basis for estimating r, the investigator would have had to express the ES he was seeking to detect in σ_Z units, that is, in terms of the differences A − B = Z. Had he posited (from formula (2.3.5)) d_Z′ = .42, he would use the power tables for d = .42√2 = .60 [formula (2.3.4)]. Thus, in either instance, his specifications are

a₂ = .05, d = .60, n = 50.

From Table 2.3.5, column d = .60, row n = 50, he finds power = .84. Note that had the same problem been undertaken with independent random samples of 50 cases with d = .40, power would be only .50 (Table 2.3.5). Matching with an r of .55 thus produces a large increase in power (from .50 to .84).

2.7 When the paired observations X and Y represent “before” and “after” measures on the same subjects, the design is a straightforward instance of Case 4. Such designs are often used when X and Y are taken before and after some intervening experimental manipulation whose effect on a dependent variable is to be scrutinized. (Because they fail to control for other influences operating between the two points of time, such studies may be misleading.) Consider a study to appraise the efficacy of prescribing a program of diet and exercise to a group of overweight male students. The researcher gets for each subject his “before” weight X, prescribes the program, and checks weight Y 60 days later. The study employs a sample of 60 subjects. The researcher wishes to know the power of a test at a₁ = .01 to detect a mean loss (Z = X − Y) of 4 lb. From his estimate of the population σ of weight, and his estimate that under these circumstances the population r of before with after weight is .80, his effective d [from formula (2.3.9)] is .74.

Alternatively, he might have avoided the need to estimate r and reasoned that, considering the distribution of weight loss Z, he wanted to detect a mean loss of about .5 of the standard deviation of weight losses, i.e., d_Z′ = .5 [formula (2.3.5)]. To find the effective d, he computes .5√2 = .71 [formula (2.3.4)], in this instance about the same value as the .74 found from the approach via formula (2.3.9). Summarizing the specifications: a₁ = .01, d = .71 to .74,
n = 60.

In Table 2.3.1 (for a₁ = .01), row n = 60, he interpolates between columns d = .70 and d = .80 to find power. Large matching r's are not infrequent in own-control designs in behavioral science.

2.4 SAMPLE SIZE TABLES

The tables in this section list values for the significance criterion, the ES to be detected, and the desired power, and yield the required sample size. They are therefore of primary utility in the planning of experiments, where they provide a basis for the decision as to how many sampling units (n) are to be used. Although decisions about sample size in behavioral science are frequently made by appeal to tradition or precedent, ready availability of data, or other considerations (Cohen, 1965, p. 97ff), unless Type II error rate considerations contribute to the decision, they can hardly be rational.

2.4.1 Case 0: σ_A = σ_B, n_A = n_B. As was done in Section 2.3 for the power tables, the use of the sample size tables is first described for the conditions for which they were optimally designed, Case 0, where they yield the sample size, n, for each of two independent samples drawn from normal populations having equal variances. Their use in other cases is described later. The tables are entered with a, d, and the desired power:

1. Significance Criterion, a. The same values of a are provided as for the power tables. For each of the following a levels, a table is provided: a₁ = .01 (a₂ = .02), a₁ = .05 (a₂ = .10), a₁ = .10 (a₂ = .20), a₂ = .01 (a₁ = .005), and a₂ = .05 (a₁ = .025).

2. Effect Size, d. This value is defined and interpreted as above [formulas (2.2.1), (2.2.2)] and used as in the power tables. The same provision is made: .10 (.10) .80 (.20) 1.40. To find n for a value of d not provided, an adequate approximation is given by substituting in the following:

(2.4.1)  $n = \dfrac{n_{.10}}{100\,d^2} + 1$

where n_.10 is the necessary sample size for the given a and desired power at d = .10, and d is the nontabulated ES. Round the result to the nearest integer.

3. Desired Power. The sample size tables list desired power values of .25, .50, .60, 2/3, .70 (.05) .95, and .99. Some comment about the selection of these values is in order.
The .25 value is given only to help provide a frame of reference in sample size determination; it seems very unlikely that a behavioral scientist would normally desire only one chance in four of rejecting a null hypothesis. The values are about equally spaced between .50 and .99. An exception to this equality of power interval is the provision of power of 2/3, which was made so as to give the sample size at which the odds are two to one that a given d would be detected. Entries for desired power values of .99, .95, and .90 are offered. These make possible the setting of the Type II error risk (b = 1 − power) equal to the conventional Type I risks of .01, .05, and .10. Given the nature of statistical inference, however, if b is made very small (desired power close to 1), the required sample sizes become very large.

Ideally, the investigator would set desired power values, as he sets desired significance criteria, on the basis of a consideration of the seriousness of the consequences of the two kinds of errors and the cost of obtaining data. He cannot literally place a dollar value on the “cost” of each kind of error, as can the industrial quality control engineer who uses exactly the same formal statistical inferential procedures. He can, however, approximate this approach by a subjective appraisal of the relative seriousness of these errors and the cost of data. He will generally decide that Type I errors are more serious than Type II errors; the view that failing to find something that is there is less serious than finding something that is not there accords with the conventional scientific view.

It is proposed here as a convention that, when the investigator has no other basis for setting the desired power value, the value .80 be used. This means that b is set at .20. This arbitrary but reasonable value is offered for several reasons (Cohen, 1965, pp. 98–99). The chief among them takes into consideration the implicit convention for a of .05. The b of .20 is chosen with the idea that the general relative seriousness of these two kinds of errors is of the order of .20/.05, i.e., that Type I errors are of the order of four times as serious as Type II errors.
This .80 desired power convention is offered with the hope that it will be ignored whenever an investigator can find a basis in his substantive concerns about his specific research investigation to choose a value ad hoc.

In summary, the investigator specifies a, d, and the desired power. These determine n, the necessary size of each sample to detect d at the a significance criterion with the desired power.

Illustrative Examples

2.8 Reconsider example 2.1 for the Case 0 use of the power tables, in which an experimental psychologist is studying the effect of opportunity … on subsequent maze learning in rats. As described there, she posits an ES of d = .50 at a₂ = .05. Her plan to use n = 30 animals in each of her E and C groups resulted in a power estimate of .47, which she will likely consider too low. Now let us assume that she wishes power to be .80 and wants to know the sample size necessary to accomplish this. The specifications thus are

a₂ = .05, d = .50, power = .80.

In Table 2.4.1 for a₂ = .05, column d = .50, row power = .80, n (= n_E = n_C) equals 64. She will need two samples of 64 animals each to have an .80 probability of rejecting the null hypothesis under these conditions; she must more than double her planned 30 animals per group to go from power of .47 to power of .80.

If, on the other hand, she had posited d = .80 (our conventional definition of a large ES), which she wished to detect with the same power at the same a level, then

a₂ = .05, d = .80, power = .80.

In Table 2.4.1 for a₂ = .05, column d = .80, row power = .80, n = 26.

If instead she had posited d = .20 (our conventional definition of a small ES), for the same significance criterion and desired power, the specifications are

a₂ = .05, d = .20, power = .80.

Again in Table 2.4.1 for a₂ = .05, column d = .20, the same row power = .80, n is 393 for each group.

These results show the importance of committing oneself to an ES in experimental planning. Depending on the ES posited, under the same conditions (i.e., a₂ = .05, power = .80), one needs two samples of as few as 26 or as many as 393 animals for the Case 0 design. It seems clear that experimental planning can hardly proceed in the absence of a prior rendering of judgment about the size of the effect one wishes to detect.

The researcher can, of course, reduce the n demanded by making his requirements less stringent with regard to either the significance level or the power, if these are tolerable alternatives.
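The Case 0 sample sizes of example 2.8, and power values like those quoted in the earlier examples, can be approximated in a few lines. The sketch below is an assumption-laden illustration, not the procedure used to build Tables 2.3.5 or 2.4.1: it uses the normal approximation to the two-sample t test (Python 3.8+ `statistics.NormalDist`), adds a common z²/4 small-sample adjustment to n, and applies formula (2.4.1) in the form n = n_.10/(100d²) + 1. All function names are hypothetical. For matched pairs, the effective value d/√(1 − r) of formula (2.3.9) is passed in place of d.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def z_crit(a: float, tails: int = 2) -> float:
    """Critical normal deviate for significance criterion a (tails = 1 or 2)."""
    return Z.inv_cdf(1 - a / tails)


def power_case0(d: float, n: int, a: float = 0.05, tails: int = 2) -> float:
    """Approximate power of the Case 0 t test with n cases in each group.

    Normal approximation: the opposite rejection tail is ignored and z
    replaces t, so results run slightly above t-based table values.
    """
    delta = d * sqrt(n / 2)  # expected value of the test statistic
    return Z.cdf(delta - z_crit(a, tails))


def n_case0(d: float, power: float = 0.80, a: float = 0.05, tails: int = 2) -> int:
    """Approximate n per group for Case 0, rounded up.

    The z_a**2 / 4 term is a common adjustment for using z instead of t.
    """
    za, zb = z_crit(a, tails), Z.inv_cdf(power)
    return ceil(2 * ((za + zb) / d) ** 2 + za ** 2 / 4)


def n_formula_241(n_10: float, d: float) -> int:
    """Formula (2.4.1): n = n_.10 / (100 d^2) + 1, rounded to the nearest integer."""
    return round(n_10 / (100 * d ** 2) + 1)


if __name__ == "__main__":
    for d in (0.20, 0.50, 0.80):  # example 2.8
        print(f"d = {d:.2f}: n per group = {n_case0(d)}")
    print("a1 = .10, d = .50:", n_case0(0.50, a=0.10, tails=1))
    print("power, d = .50, n = 30:", round(power_case0(0.50, 30), 2))
    # Matched pairs, formula (2.3.9): enter d / sqrt(1 - r) in place of d.
    print("matched, r = .60, n = 40:", round(power_case0(0.50 / sqrt(1 - 0.60), 40), 2))
    # n_.10 here is the approximation's own value, standing in for the tabled one.
    n_10 = n_case0(0.10)
    print("formula (2.4.1), d = .50:", n_formula_241(n_10, 0.50))
```

With these choices the sketch returns 64 and 26 per group for d = .50 and .80, as in Table 2.4.1, and 394 against the tabled 393 for d = .20; its power values run about .01 to .02 above the t-based tables because the normal approximation ignores the heavier tails of t.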
To take an extreme case with regard to a, he might increase his a risk to a₁ = .10 by stating the phenomenon in directional terms, i.e., by predicting the direction of the difference. With the other specifications for the original problem unchanged, he has:

a₁ = .10, d = .50, power = .80.

In Table 2.4.1 for a₁ = .10, column d = .50, row power = .80, he finds n (= n_E = n_C) = 36, compared with n = 64 for a₂ = .05 (same d and power).

[Table 3.3.2 (continued): Power of t test of r = 0 at a₁ = .05]

[Table 3.3.3: Power of t test of r = 0 at a₁ = .10]
[Table 3.3.4: Power of t test of r = 0 at a₂ = .01]

[Table 3.3.5: Power of t test of r = 0 at a₂ = .05]

* Power values below the marked point in these tables are greater than .995.

[Table 3.3.6: Power of t test of r = 0 at a₂ = .10]
Illustrative Examples

3.1 A personality psychologist has performed an experiment in which he obtained paired measures on a sample of 50 subjects. One of these variables is a questionnaire score on extraversion, the other a neurophysiological measure. On theoretical grounds he posits r = .30 (PV = r² = .09). What is the power of the test of the significance of r he performs? His specifications are

a₂ = .05, r = .30, n = 50.

In Table 3.3.5 (for a₂ = .05), column r = .30, row n = 50, power = .57. Thus, a significance test with 50 subjects at an a₂ = .05 criterion has not much more than a 50-50 chance of rejecting the null hypothesis when the population r = .30.

If the theory which leads to so nonobvious a prediction also specified its direction, he would have formulated his null and alternate hypotheses directionally (H₀: r ≤ 0, H₁: r = +.30) and, leaving the other specifications unchanged, may have instead used a one-tailed significance criterion, thus

a₁ = .05, r = .30, n = 50.

In Table 3.3.2 for a₁ = .05 (instead of Table 3.3.5 for a₂ = .05), column r = .30, row n = 50, power = .69. The use of a directional instead of a nondirectional test under these conditions (of a, r, and n) would improve his probability of rejecting the null hypothesis from .57 to .69.

When the research is properly formulated in advance, the tables may be used in experimental planning for seeking an optimum. This could include the decision as to whether to state the hypothesis directionally or nondirectionally and would lead to such comparisons as the above.
If we take this to be the case in the above example, the psychologist would then need to decide whether, under the given conditions, the gain in power from .57 to .69 is worth forgoing the possibility of concluding that r is negative. This decision will be made, of course, on substantive, not statistical, grounds.

3.2 An educational psychologist is consulted by the dean responsible for admission at a small college with regard to the desirability of supplementing the present admission procedures with a new measure X for selecting freshmen. X is not correlated with the measures currently used, so that its correlation with the criterion Y (freshman grade point average), if any, represents incremental validity beyond present admission practices. The decision is made that if r = .10, then X is worth adding to the selection procedure. Each annual freshman class numbers about 500. The educational psychologist first seeks to determine power under these conditions if the decision to proceed is made at the a₂ = .01 and a₂ = .05 criteria. Her specifications are

a₂ = .01, r = .10, n = 500;   a₂ = .05, r = .10, n = 500.

In Table 3.3.4 for a₂ = .01, with column r = .10 and row n = 500, power = .37. Then in Table 3.3.5 (for a₂ = .05) for the same column and row, power = .61.

The educational psychologist finds herself dissatisfied with these results since, even with the a₂ = .05 risk, she has only a three in five chance of detecting r = .10. She checks the consequence of a₂ = .10 (Table 3.3.6) for these conditions and finds power = .72, the same as for a₁ = .05 (Table 3.3.2). Thus, even at an a₂ = .10 criterion, which she and the dean judge to be too large a risk, power would be only .72.

The psychologist considers an experimental plan which involves combining the data for two successive years, so that n will equal about 1000. The conditions now are

a₂ = .01, r = .10, n = 1000;   a₂ = .05, r = .10, n = 1000.

She uses Table 3.3.4 (for a₂ = .01) with column r = .10 and row n = 1000, and finds power = .72. Then she considers Table 3.3.5 (for a₂ = .05) and finds power = .89. She suggests to the dean that if two successive years’
data can be used (resulting in an additional year’s delay) and if the population correlation of X with Y is r = .10, the a₂ = .01 and a₂ = .05 criteria yield power of .72 and .89, respectively. If r is as large as .20, it hardly matters what alpha is used or whether a year is spent bringing n from 500 to 1000: at n = 1000, power exceeds .995 at each of these criteria.

3.3 An industrial psychologist is asked to perform an experiment on the relationship between weekly wages (which vary with experience, among other factors) and work output for a given job. The client wishes to use the results to decide on wage and qualification policies. The circumstances of the situation are such that the ES is posited as a regression coefficient; to convert it to r, the population standard deviations of X and Y are needed. Assume such values are available: σ_X = 8 and σ_Y = 80. Then, from formula (3.1.3), the posited regression coefficient corresponds to r = .40, and the specifications are

a₁ = .01, r = .40, n = 120.

3.4 SAMPLE SIZE TABLES

The tables in this section list values for the significance criterion, the r (= ES) to be detected, and the desired power; the number of paired observations (X, Y) required in the sample, n, is then completely determined. These tables are designed primarily for use in the planning of experiments, during which the decision on sample size is made. As already noted (Section 2.4), once a and ES are formulated, attention must turn to the question: How much power (or how little Type II risk) is desired?

The use of these tables is subject to the same assumptions of normality and homoscedasticity as those applying to the power tables in the previous section (see Section 3.1). Tables give values for a, r, and desired power:

1. Significance Criterion, a. The same values of a are provided as in Section 2.4: a₁ = .01 (a₂ = .02), a₁ = .05 (a₂ = .10), a₁ = .10 (a₂ = .20), a₂ = .01 (a₁ = .005), and a₂ = .05 (a₁ = .025).

2. Effect Size, ES. The population r serves as ES. For problems in which the effect size is expressed as a regression coefficient, it is converted to r by means of formula (3.1.3). The same provision for r is made as in the power tables: .10 (.10) .90. For r values other than the nine provided, n is approximated by formula (3.4.1), in which n_.10 is the necessary sample size for the given a and desired power at r = .10 (obtained from the table), and z is the Fisher z transformation of the nontabled r value. The constant .100 is the value of the z transformation for r = .10.

3. Desired Power. For a discussion of equalizing a and b risks, and the rationale of a proposed convention of desired power of .80, see Section 2.4.

Summarizing the use of the n tables which follow: for a nontabled r, the investigator finds in Table 4.2.2 of the next chapter the Fisher z value for his r, and enters it and n_.10 in formula (3.4.1) to compute n. See also … Hunter, and Urry (1976), Raju, Edwards, and LoVerde (1985), Alexander, Carson, Alliger, and Barrett (1985), and their references.

Illustrative Examples

3.4 Reconsider the conditions of example 3.1, in which a personality psychologist is concerned with the relationship between a neurophysiological measure and a questionnaire score on extraversion. As originally described, he wishes to detect an ES of r = .30 at a₂ = .05. His plan to use n = 50 subjects resulted in power of .57, and he will likely consider this value too low. Assume that he wishes power to be at the conventional .80 value and wants to know the sample size necessary for this. The specifications are

a₂ = .05, r = .30, power = .80.

In Table 3.4.1 for a₂ = .05, column r = .30, row power = .80, he finds n = 85. Thus, with these specifications of a and r, he will require 85 subjects to achieve power of .80.

Suppose instead that this psychologist had anticipated a strong relationship between the two variables, r = .50 (our operational definition of a large ES), using the same a and power:

a₂ = .05, r = .50, power = .80.

[Table 3.4.1: Sample size n required to detect r, by a and desired power]
In Table 3.4.1 for a₂ = .05, column r = .50, row power = .80, he finds n = 28. At the other extreme of our operational definitions, r = .10 (a small ES), keeping the other specifications unchanged:

a₂ = .05, r = .10, power = .80.

In Table 3.4.1 for a₂ = .05, column r = .10, row power = .80, n = 783. Again we see how crucial the ES specification is for sample size: over our range from large to medium to small ES, n goes from 28 to 85 to 783. Reversing the argument, any decision about sample size implies some value for r. Many experiments are undertaken with sample sizes that implicitly posit a very large ES, since presumably the investigator would not bother to do the experiment if he thought he had a low probability of success.

Another point is incidentally illustrated: at any given desired power level, n varies approximately inversely as the square of r.

Experimental planning may involve preparing tables of the n's necessary under varying conditions of a, r, and desired power. An example is shown in Table 3.4.2.

[Table 3.4.2: An Example of a Sample Size Planning Table (n for combinations of ES = r, a, and power)]

An experimenter with such a table before him is in a position to make a choice of an experimental plan which is consonant both with his knowledge and informed hunches of his substantive field and with statistical analytic issues. Thus, he might decide after reviewing the table that he is prepared to expend the money and effort involved in running 85 or 86 subjects, but would prefer the 85 subjects called for when he posits r = .30 at power = .80 for a₂ = .05 rather than the 86 called for when, with the more stringent a₂ = .01 and greater power = .90, he must posit r = .40; he may not consider the risk of assuming r so high worth the a and power advantage.
He may consider least desirable the plan which calls for n = 82, which allows for a distinctly smaller ES of r = .20, but at the cost of less power (.70) and a large one-tailed Type I risk (a₁ = .10) or, equivalently, an even larger two-tailed Type I risk (a₂ = .20).

3.5 A social psychologist is planning an experiment in which college students selected with regard to a personality questionnaire measure (Y) will be subjected to various attitude change procedures. Before proceeding, he requires that it be demonstrable that his measure (Y) is not related to a measure of social desirability (X). He thus appears to be in the position of having to prove the null hypothesis that r = 0, which is probably never exactly true of any relationship in behavioral science. However, instead of demanding of himself a demonstration that r = 0, he may reformulate the problem as an attempt to demonstrate that r is trivially small. He may consider an r no greater absolutely than .10 as meeting this criterion in this context. It now becomes possible to mount an experiment from which the conclusion that r is small may properly be drawn.

He sets up as the ES he wishes to detect r = .10. To assure himself a good chance of detecting this value if it should obtain, he demands power of .90, and he is prepared to run a large Type I risk by setting a₂ = .10. He now seeks the n which satisfies these specifications, which, summarized, are

a₂ = .10, r = .10, power = .90.

Table 3.4.1 for a₁ = .05 (a₂ = .10), for column r = .10, row power = .90, yields n = 854. (Since both X and Y are obtained by group procedures, this large sample may well be within his resources.*)

Assume that the data are collected and he finds r_s = .04, which is not significant at a₂ = .10. He can conclude that the population r is, for practical purposes, zero, i.e., smaller in absolute value than .10. This is because, if the population r were as large as .10, the probability is only .10 (b = 1 − power = 1 − .90 = .10) that he would have failed to find r_s significant. In this way, experiments can be organized which accomplish what is really sought when we attempt to “prove” a null hypothesis: the investigator sets up a small value as the ES and uses a sample which has enough power to detect it.

*On the use of the social desirability control measure (X) as a covariate, or “adjusting” variable, see Chapter 9.
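The n values in examples 3.4 and 3.5, the power values of examples 3.1 and 3.2, and a planning grid in the spirit of Table 3.4.2 can be approximated with Fisher's z transformation. The sketch below is a hedged Python illustration: it assumes the standard large-sample result that atanh(r_s) is approximately normal with standard error 1/√(n − 3), which is not necessarily how Tables 3.3.1 to 3.3.6 and 3.4.1 were computed, and its function names are hypothetical. For the values quoted in the text it agrees to within about .01 in power and a case or two in n.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def z_crit(a: float, tails: int = 2) -> float:
    """Critical normal deviate for criterion a (tails = 1 or 2)."""
    return Z.inv_cdf(1 - a / tails)


def power_r(r: float, n: int, a: float = 0.05, tails: int = 2) -> float:
    """Approximate power of the test of H0: population r = 0.

    Uses Fisher's z: atanh(r_s) ~ Normal(atanh(r), 1 / sqrt(n - 3)).
    The opposite rejection tail is ignored.
    """
    return Z.cdf(atanh(r) * sqrt(n - 3) - z_crit(a, tails))


def n_r(r: float, power: float = 0.80, a: float = 0.05, tails: int = 2) -> int:
    """Approximate number of (X, Y) pairs needed to detect r, rounded up."""
    return ceil(((z_crit(a, tails) + Z.inv_cdf(power)) / atanh(r)) ** 2 + 3)


if __name__ == "__main__":
    # Planning grid: each row is one (a, tails, power) plan; columns are r.
    plans = [(0.05, 2, 0.80), (0.01, 2, 0.90), (0.10, 1, 0.70)]
    rs = (0.20, 0.30, 0.40)
    print("a     tails  power  " + "  ".join(f"r={r:.2f}" for r in rs))
    for a, tails, pw in plans:
        cells = "  ".join(f"{n_r(r, pw, a, tails):6d}" for r in rs)
        print(f"{a:.2f}  {tails}      {pw:.2f}   {cells}")
```

The three rows correspond to the plans weighed in the discussion of Table 3.4.2; the approximation reproduces 85 and 86 for the first two and gives 83 for the third, one more than the tabled 82. Because it works directly from r, the same function also handles a nontabled r without formula (3.4.1).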
3.6 A research clinician is studying rate of decay in patient groups. An issue in the study is the relationship of decay rate to the amount of confusion as rated by trained observers (C). In the context of the study, she decides that a proportion of variance in decay rate associated with C as large as .10 would be important. She wants to perform a preliminary experiment at the a₂ = .10 level which will have power of .90 to detect PV = r² = .10, that is, ES = r = √.10 = .32, a value not provided in Table 3.4.1. She thus takes recourse to formula (3.4.1), which requires n_.10 (from Table 3.4.1 for a₂ = .10) and z, the Fisher z transformation of an r of .32. The latter is found in Table 4.2.2 of the next chapter to be z = .332. n_.10 is found in Table 3.4.1, column r = .10, row power = .90, as 854. Entering these values in formula (3.4.1) yields the n she needs for a .90 probability of detecting r = .32 (PV = r² = .10).

3.5 THE USE OF THE TABLES FOR SIGNIFICANCE TESTING OF r

Once the sample data have been collected and the sample r (r_s) determined, the power tables of this chapter (Tables 3.3.1–3.3.6) can serve as significance tables. In the r_c column, they give the sample r necessary for significance at the sample size of the row in which it appears and at the significance level of the table. The r_c is taken as absolute (of either sign) for nondirectional (two-tailed) tests, and as of the appropriate sign in directional (one-tailed) tests. These values are of the same kind as the critical values of r which appear in some statistical texts, but the tables provide many more values, both for a and for n.

Illustrative Examples

3.7 Consider the analysis of the data arising from the experiment relating extraversion to a neurophysiological measure given in example 3.1. Assume that the data have been collected as planned, and the sample r_s is found to equal −.241. The specifications for the significance test are

a₂ = .05, n = 50, r_s = −.241.

Table 3.3.5 (for a₂ = .05) is used for n = 50, and the r_c value is found to equal .279. Since .241 (the sign is ignored because the test is two-tailed) is smaller than r_c, the null hypothesis is not rejected.
3.8 In example 3.2, the educational psychologist, prior to data collection, set a₂ = .05 and n = 500 for the test of whether the new measure predicts freshman grade point average. When the data are collected, r_s is found to equal .136. Thus,

a₂ = .05, n = 500, r_s = .136.

In Table 3.3.5 (for a₂ = .05) at n = 500, r_c = .088. Since r_s exceeds r_c, the null hypothesis is rejected: she concludes that there is a relationship between the new measure and grade point average.

3.9 The industrial psychologist in example 3.3 designed an experiment using 120 paired observations to determine whether a regression coefficient of wages on work unit output was significant at a₁ = .01. In that example, it was demonstrated how the regression coefficient could be converted to an r so that the tables of this chapter could be applied. In planning, his alternate hypothesis was r = .40. When the sample data were analyzed, r_s was found to equal +.264. The following specifications, then, are the conditions for his test of the null hypothesis that the population r = 0:

a₁ = .01, n = 120, r_s = +.264.

He uses Table 3.3.1 (for a₁ = .01) at row n = 120 and finds r_c = .212. Since his sample r_s exceeds the a₁ = .01 criterion value .212, and is of the proper sign (since the test was directional), the null hypothesis is rejected.

Note that rejecting H₀: r = 0 means rejecting H₀: B = 0: if the correlation is not zero, neither is the regression coefficient (as discussed in Section 3.1). Note, too, that although the sample r_s of .264 is much smaller than the anticipated population r of .40 which figured in the experimental planning, it is nevertheless significantly different from zero. (This comes about because the power of the experiment to detect r = .40 is high, so that the criterion value r_c = .212 lies well below .40 and a sample r considerably smaller than the anticipated value still leads to rejection of the null hypothesis, subject of course to the Type I risk. See Cohen (1973) in this regard.)
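The r_c values read from the tables in examples 3.7 to 3.9 can also be approximated by inverting Fisher's z. The sketch below is a minimal Python illustration under that assumption (the tables themselves may rest on the exact t test of r); the function names are hypothetical, and the one-tailed branch assumes, as in example 3.9, that the alternate hypothesis predicts a positive r.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_r(n: int, a: float = 0.05, tails: int = 2) -> float:
    """Approximate r_c: the smallest |r_s| significant at criterion a.

    Inverts Fisher's z: r_c = tanh(z_a / sqrt(n - 3)), where z_a is the
    critical normal deviate and 1 / sqrt(n - 3) is the standard error of z.
    """
    z_a = NormalDist().inv_cdf(1 - a / tails)
    return tanh(z_a / sqrt(n - 3))


def significant(r_s: float, n: int, a: float = 0.05, tails: int = 2) -> bool:
    """Two-tailed: compare |r_s| with r_c. One-tailed: H1 predicts r > 0."""
    r_c = critical_r(n, a, tails)
    return abs(r_s) >= r_c if tails == 2 else r_s >= r_c


if __name__ == "__main__":
    cases = [("3.7", -0.241, 50, 0.05, 2),
             ("3.8", 0.136, 500, 0.05, 2),
             ("3.9", 0.264, 120, 0.01, 1)]
    for label, r_s, n, a, tails in cases:
        print(label, round(critical_r(n, a, tails), 3), significant(r_s, n, a, tails))
```

The approximation gives r_c within .001 of the tabled .279 (n = 50) and .212 (n = 120) and reaches the same decisions as the three examples: not significant for 3.7, significant for 3.8 and 3.9.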
