Nonparametric Statistics

I. Introduction
   1. Why Not Used All the Time?
   2. Relation to Parametric Tests
II. Chi Square
   1. One-Sample Case
      1. Research Question
      2. Hypotheses
      3. Assumptions
      4. Decision Rules
      5. Computation
      6. Decision
   2. Two-Sample Case
      1. Research Question
      2. Hypotheses
      3. Assumptions
      4. Decision Rules
      5. Computation
      6. Decision
Homework
I. Introduction
Nonparametric tests are sometimes called distribution-free statistics because they do not require that the data fit a normal distribution. More generally, nonparametric tests require less restrictive assumptions about the data. Another important reason for using these tests is that they allow for the analysis of categorical as well as rank data.

1. Why Not Used All the Time?

Since nonparametric tests require fewer assumptions and can be used with a broader range of data types, the question becomes, "Why not use them all of the time?" Parametric tests are often preferred because:

1. They are robust (i.e., fairly insensitive to moderate violations of their assumptions).
2. They have greater power efficiency; that is, they have greater power relative to the sample size.
3. They provide unique information (e.g., the interaction in a factorial design).
4. Parametric and nonparametric tests often address two different types of questions.
2. Relation to Parametric Tests

The Summary of Statistical Tests should help put into perspective where nonparametric tests fit into what we have learned. For example, we have already learned about the binomial test for the simplest case of nominal data and Spearman's rho for correlations involving rank data. In this unit, we will learn about the chi-square test. The other tests listed in the table that we have not yet covered are beyond the scope of the course. Note that even with metric data, nonparametric tests are likely to be used if the assumptions of the parametric test are badly violated.
II. Chi Square

This statistic is used to compare observed frequencies with expected frequencies. It is used in two situations.

1. One Variable (or Sample) Case

This is sometimes called the goodness-of-fit test. Consider an example.

1. Research Question

Do people have a preference for movie type?

2. Hypotheses
In words:
H0: The observed distribution fits the expected distribution; in other words, there is no preference.
HA: The observed distribution does not fit the expected distribution (there is a preference).
Notice that no parameters are mentioned.

3. Assumptions

1. The sample is chosen randomly.
2. The scores are independent (i.e., each subject is allowed only one preference).
3. The null hypothesis (the expected frequencies are computed as if H0 were true).

4. Decision Rules

Let c equal the number of columns. In this case, there are four preferences, or columns. Thus, df = c - 1 = 4 - 1 = 3, and with an α level of .05 the critical value of chi square is 7.82 (see table).
If χ²obs ≥ 7.82, reject H0.
If χ²obs < 7.82, do not reject H0.

5. Computation

The appropriate descriptive statistic is the percentage of people preferring each type of movie. If these percentages look worthy of additional analysis, we must first determine the expected frequencies. If we are asking folks which of four movie types they prefer and there is no preference, we would expect 25% to prefer each type. Let:
Ej = the expected frequency in the j-th column.
Oj = the observed frequency in the j-th column.
In our example, j runs from 1 to c = 4, one column per type of movie.
Then:

$$\chi^2 = \sum_{j=1}^{c} \frac{(O_j - E_j)^2}{E_j}$$
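The decision rule in step 4 takes its critical value from a chi-square table. As a hedged alternative, here is a minimal stdlib-only Python sketch, assuming integer degrees of freedom, that computes the upper-tail probability of the chi-square distribution and finds the critical value by bisection. The function names `chi2_sf` and `chi2_critical` are our own illustrative names, not a library API.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1.

    Uses the recurrence Q(k + 2) = Q(k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    starting from Q(1) = erfc(sqrt(x/2)) for odd df and Q(0) = 0 for even df.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = 0.0, 0
    while k < df:
        q += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return q


def chi2_critical(alpha: float, df: int) -> float:
    """Critical value: the x where P(X >= x) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until hi is past the critical value
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


for df in (1, 3):
    print(f"df = {df}: critical value at alpha = .05 is {chi2_critical(0.05, df):.3f}")
```

This prints 3.841 for df = 1 and 7.815 for df = 3; the decision rule above rounds the latter to 7.82. A table lookup is simpler in practice; the sketch only shows where those numbers come from.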
Now let's consider the following data:

            Comedy   Horror   Drama   Sci fi
Expected      25       25       25      25
Observed      35       30       20      15
%             35       30       20      15

(The observed frequencies are given as percentages, so n = 100.)
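Substituting the numbers in the formula gives:

$$\chi^2 = \frac{(35-25)^2}{25} + \frac{(30-25)^2}{25} + \frac{(20-25)^2}{25} + \frac{(15-25)^2}{25} = 4 + 1 + 1 + 4 = 10.00$$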
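The arithmetic for the movie example can be checked with a short Python sketch. It assumes the observed counts are typed in by hand and uses the critical value 7.82 from the decision rule; `chi_square_gof` is a hypothetical helper name, not a library function.

```python
def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: the sum over columns of (O_j - E_j)**2 / E_j."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of columns")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


movie_types = ["Comedy", "Horror", "Drama", "Sci fi"]
observed = [35, 30, 20, 15]               # observed frequencies (n = 100)
n = sum(observed)
c = len(movie_types)
expected = [n / c] * c                    # no preference: n / c = 25 in every column

chi2_obs = chi_square_gof(observed, expected)
df = c - 1                                # 4 - 1 = 3
CRITICAL_05_DF3 = 7.82                    # value used in the decision rule (alpha = .05, df = 3)

print(f"chi2 = {chi2_obs:.2f}, df = {df}, reject H0: {chi2_obs >= CRITICAL_05_DF3}")
# prints: chi2 = 10.00, df = 3, reject H0: True
```

Computing the expected frequencies as n / c, rather than hard-coding 25, keeps the sketch valid for any sample size under the "no preference" null hypothesis.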
6. Decision

Since χ²obs (10.00) > χ²crit (7.82), we reject H0 and conclude that folks do have a preference for which type of movie they like best. They like comedy the best and sci fi the least.

2. Two Variable (or Sample) Case

This test goes by several names. It is most commonly called the Pearson chi square, but it is sometimes called a test of independence between two variables, or crosstabs. Consider the following data (called a contingency table) on drug usage that I collected when I was a student in college.
Contingency Table

                                 Categories of Other Drugs Tried
Frequency of Marijuana Use          1-3        4-6       Total
< 3 times/week                       26         17          43
≥ 3 times/week                        6         25          31
Total                                32         42          74
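The pattern in this table is easier to see as column percentages. Here is a minimal Python sketch, assuming the counts are entered by hand as a list of rows (the variable names are illustrative), that computes the percentage of frequent users (≥ 3 times/week) within each category of other drugs tried:

```python
# Observed counts from the contingency table.
# Rows: frequency of marijuana use; columns: categories of other drugs tried.
cols = ["1-3", "4-6"]
observed = [
    [26, 17],  # < 3 times/week
    [6, 25],   # >= 3 times/week
]

# Column totals: how many people fell into each "other drugs tried" category.
col_totals = [sum(row[j] for row in observed) for j in range(len(cols))]

# Percentage of frequent users (>= 3 times/week, second row) within each column.
frequent_pct = [100 * observed[1][j] / col_totals[j] for j in range(len(cols))]

for label, total, pct in zip(cols, col_totals, frequent_pct):
    print(f"tried {label} other drug categories (n = {total}): {pct:.1f}% smoked >= 3 times/week")
```

The two percentages come out to about 19% and 60%, which are the descriptive statistics that motivate the chi-square test for this table.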
It looks like folks who smoked marijuana more frequently also tried more categories of other drugs.

1. Research Question

Is frequency of marijuana smoking related to the number of categories of other drugs tried?

2. Hypotheses
In words:
H0: There is no relationship (or contingency) between the two variables; that is, they are independent.
HA: The two variables are related.
Again, notice that no parameters are mentioned.

3. Assumptions

1. The individuals in each sample are chosen randomly.
2. The scores are independent (i.e., each subject fits in only one cell of the table).
3. For a 2x2 table, all expected cell frequencies should be at least 10 (for larger tables, this value is 5).
4. The null hypothesis (the expected frequencies are computed as if H0 were true).

4. Decision Rules

Again, let c equal the number of columns. Since we are also considering a second variable, let r equal the number of rows. Thus, df = (c - 1)(r - 1) = (2 - 1)(2 - 1) = 1, and with an α level of .05 the critical value of chi square is 3.84 (see table).
If χ²obs ≥ 3.84, reject H0.
If χ²obs < 3.84, do not reject H0.

5. Computation

First we must determine the expected frequencies. Let:
Ejk = the expected frequency of the cell defined by the j-th column and the k-th row.
Ojk = the observed frequency of the cell defined by the j-th column and the k-th row.
Here j runs from 1 to c (the number of columns) and k runs from 1 to r (the number of rows).
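And:

$$E_{jk} = \frac{(\text{total of column } j)(\text{total of row } k)}{N}$$

where N is the total number of observations. Note that a helpful check is that the sum of the expected cell frequencies equals N, that is:

$$\sum_{j=1}^{c} \sum_{k=1}^{r} E_{jk} = N$$

Then:

$$\chi^2 = \sum_{j=1}^{c} \sum_{k=1}^{r} \frac{(O_{jk} - E_{jk})^2}{E_{jk}}$$

So, let's compute the Ejk values for the data above.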
Contingency Table (expected frequencies in parentheses)

                                 Categories of Other Drugs Tried
Frequency of Marijuana Use          1-3            4-6          Total
< 3 times/week                   26 (18.59)     17 (24.41)        43
≥ 3 times/week                    6 (13.41)     25 (17.59)        31
Total                               32             42             74

To be clear, E11 = (32 × 43)/74 = 18.59, and checking our work, 18.59 + 13.41 + 24.41 + 17.59 = 74.00 = N.

So 6/32, or about 19%, of folks who had tried 1-3 other drug categories smoked marijuana frequently, whereas 25/42, or about 60%, of folks who had tried 4-6 other drug categories smoked frequently. These percentages are the relevant descriptive statistics that give us the reason for performing the chi-square test.

Substituting the values in the formula gives:

$$\chi^2 = \frac{(26-18.59)^2}{18.59} + \frac{(17-24.41)^2}{24.41} + \frac{(6-13.41)^2}{13.41} + \frac{(25-17.59)^2}{17.59} \approx 2.95 + 2.25 + 4.09 + 3.12 \approx 12.4$$

6. Decision

Since χ²obs (12.4) > χ²crit (3.84), we reject H0 and conclude that frequent users of marijuana are more likely to have tried more categories of other drugs.
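The whole two-variable computation (expected frequencies from the marginal totals, the check that they sum to N, the statistic, and df) can be sketched in Python as below. This is a minimal sketch assuming the table is entered as a list of rows; `expected_counts` and `pearson_chi_square` are hypothetical helper names. Note the code stores cells as [row][column], whereas the text indexes Ejk by column first.

```python
import math


def expected_counts(observed):
    """E for each cell = (row total * column total) / N, stored as [row][column]."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(observed):
    """Uncorrected Pearson chi-square for an r x c table: returns (statistic, df, expected)."""
    expected = expected_counts(observed)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df, expected


observed = [
    [26, 17],  # < 3 times/week: 1-3 and 4-6 other drug categories
    [6, 25],   # >= 3 times/week
]
stat, df, expected = pearson_chi_square(observed)

# The helpful check from the text: expected counts sum to N (74).
assert math.isclose(sum(map(sum, expected)), sum(map(sum, observed)))

# For df = 1 only, the upper-tail probability has the closed form erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(stat / 2))

print([[round(e, 2) for e in row] for row in expected])  # [[18.59, 24.41], [13.41, 17.59]]
print(f"chi2 = {stat:.2f}, df = {df}, reject H0 at .05: {stat >= 3.84}")
# prints: chi2 = 12.40, df = 1, reject H0 at .05: True
```

This is the uncorrected statistic, matching the hand calculation. Library routines may report a different value for a 2x2 table: for example, SciPy's `scipy.stats.chi2_contingency` applies Yates' continuity correction by default when df = 1 (controlled by its `correction` argument), which gives a smaller statistic.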
Copyright 1997-2009 M. Plonsky, Ph.D. Comments? mplonsky@uwsp.edu.