Chi Square

1
Chi-Square
Heibatollah Baghi, and
Mastee Badii
2
Different Scales, Different Measures of Association
Scale of Both Variables      Measure of Association
Nominal scale                Pearson chi-square (χ²)
Ordinal scale                Spearman's rho
Interval or ratio scale      Pearson r

3
Chi-Square (χ²) and Frequency Data
Up to this point, inference to the population has been
concerned with scores on one or more variables, such as
CAT scores, mathematics achievement, and hours spent on
the computer.
We used these scores to make inferences about
population means. To be sure, not all research questions
involve score data.
Today the data that we analyze consist of frequencies; that
is, the number of individuals falling into categories. In
other words, the variables are measured on a nominal
scale.
The test statistic for frequency data is the Pearson chi-square.
The magnitude of the Pearson chi-square reflects the amount
of discrepancy between observed frequencies and expected
frequencies.
4
Steps in Test of Hypothesis
1. Determine the appropriate test
2. Establish the level of significance (α)
3. Formulate the statistical hypothesis
4. Calculate the test statistic
5. Determine the degrees of freedom
6. Compare computed test statistic against a
tabled/critical value
5
1. Determine Appropriate Test
Chi Square is used when both variables are
measured on a nominal scale.
It can be applied to interval or ratio data that have
been categorized into a small number of groups.
It assumes that the observations are randomly
sampled from the population.
All observations are independent (an individual
can appear only once in a table and there are no
overlapping categories).
It does not make any assumptions about the shape
of the distribution nor about the homogeneity of
variances.
6
2. Establish Level of Significance
α is a predetermined value
The convention:
α = .05
α = .01
α = .001
7
3. Determine the Hypothesis: Whether There Is an Association or Not
Ho: The two variables are independent
Ha: The two variables are associated

8
4. Calculating Test Statistics
The chi-square statistic contrasts the observed frequencies in each
cell of a contingency table with the expected frequencies.
The expected frequencies represent the number of
cases that would be found in each cell if the null
hypothesis were true (i.e., the nominal variables
are unrelated).
The expected frequency of a cell, if the two variables are unrelated,
is the product of its row frequency and column frequency divided
by the total number of cases:
$$F_e = \frac{F_r F_c}{N}$$
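As a minimal sketch of the expected-frequency rule, the Python below computes F_e for every cell from the row and column totals. The function name `expected_frequencies` and the 2 x 2 counts are illustrative assumptions, not taken from the slides.

```python
def expected_frequencies(observed):
    """Return expected counts F_e = F_r * F_c / N for each cell."""
    row_totals = [sum(row) for row in observed]        # F_r for each row
    col_totals = [sum(col) for col in zip(*observed)]  # F_c for each column
    n = sum(row_totals)                                # N, total number of cases
    return [[fr * fc / n for fc in col_totals] for fr in row_totals]


# Hypothetical 2 x 2 table of observed counts (not from the slides).
observed = [[20, 30],
            [30, 20]]
expected = expected_frequencies(observed)
print(expected)
```

Because every row total and column total here is 50 and N is 100, each expected count is 50 * 50 / 100 = 25.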

9

$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$$
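The Pearson chi-square sum can be sketched in a few lines of Python. The helper name `chi_square` and the four cell counts are hypothetical and serve only to show the arithmetic of summing (F_o - F_e)^2 / F_e over the cells.

```python
def chi_square(observed, expected):
    """Sum (F_o - F_e)^2 / F_e over all cells, given as flat lists."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Hypothetical counts for four cells (not from the slides).
observed = [20, 30, 30, 20]
expected = [25.0, 25.0, 25.0, 25.0]
statistic = chi_square(observed, expected)
print(statistic)
```

Each cell here deviates from its expected count by 5, so each contributes 25 / 25 = 1 and the statistic is 4.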
11
5. Determine Degrees of
Freedom
df = (R-1)(C-1)
12
6. Compare computed test statistic
against a tabled/critical value
The computed value of the Pearson chi-square statistic is compared
with the critical value to determine whether the computed value is
improbable under the null hypothesis.
The critical tabled values are based on sampling distributions of
the Pearson chi-square statistic.
If the calculated χ² is greater than the tabled χ² value, reject Ho.
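The decision rule can be sketched as a table lookup. The dictionary below holds standard upper-tail chi-square critical values for a few combinations of alpha and df; the names `CRITICAL`, `degrees_of_freedom`, and `reject_null` are assumptions made for this illustration, and a full chi-square table would be needed for other cases.

```python
# Standard tabled critical values of chi-square, keyed by (alpha, df).
CRITICAL = {
    (0.05, 1): 3.841, (0.05, 2): 5.991, (0.05, 3): 7.815, (0.05, 4): 9.488,
    (0.01, 1): 6.635, (0.01, 2): 9.210, (0.01, 3): 11.345, (0.01, 4): 13.277,
}


def degrees_of_freedom(n_rows, n_cols):
    """df = (R - 1)(C - 1) for an R x C contingency table."""
    return (n_rows - 1) * (n_cols - 1)


def reject_null(statistic, n_rows, n_cols, alpha=0.05):
    """Reject Ho when the computed statistic exceeds the tabled value."""
    df = degrees_of_freedom(n_rows, n_cols)
    return statistic > CRITICAL[(alpha, df)]


# Hypothetical statistic of 4.0 from a 2 x 2 table (df = 1).
print(reject_null(4.0, 2, 2, alpha=0.05))  # 4.0 > 3.841
print(reject_null(4.0, 2, 2, alpha=0.01))  # 4.0 < 6.635
```

The same statistic can be significant at one alpha level and not at a stricter one, which is why alpha is fixed before the test is run.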
13
Example
Suppose a researcher is interested in voting
preferences on gun control issues.
A questionnaire was developed and sent to a
random sample of 90 voters.
The researcher also collects information
about the political party membership of the
sample of 90 respondents.
14
Bivariate Frequency Table or Contingency Table
             Favor   Neutral   Oppose   f row
Democrat     10      10        30       50
Republican   15      15        10       40
f column     25      25        40       n = 90
15
Contingency Table
Row frequency (f row): the total for each row, 50 Democrats and 40 Republicans.
Column frequency (f column): the total for each column, 25 Favor, 25 Neutral, and 40 Oppose; n = 90.
18
1. Determine Appropriate Test
1. Party membership: nominal, 2 levels
2. Voting preference: nominal, 3 levels
19
2. Establish Level of Significance
Alpha of .05
20
3. Determine The Hypothesis
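Ho: There is no association between responses to the
gun control survey and party membership in the
population (Democrats and Republicans do not differ
in their opinions on the gun control issue).

Ha: There is an association between
responses to the gun control survey and
party membership in the population.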
21
4. Calculating Test Statistics
             Favor       Neutral     Oppose      f row
Democrat     fo = 10     fo = 10     fo = 30     50
             fe = 13.9   fe = 13.9   fe = 22.2
Republican   fo = 15     fo = 15     fo = 10     40
             fe = 11.1   fe = 11.1   fe = 17.8
f column     25          25          40          n = 90
22
Expected frequency for a Democrat cell in a column whose total is 25 (Favor or Neutral):
fe = 50*25/90 = 13.9
Expected frequency for a Republican cell in a column whose total is 25 (Favor or Neutral):
fe = 40*25/90 = 11.1
24
$$\chi^2 = \frac{(10-13.89)^2}{13.89} + \frac{(10-13.89)^2}{13.89} + \frac{(30-22.2)^2}{22.2} + \frac{(15-11.11)^2}{11.11} + \frac{(15-11.11)^2}{11.11} + \frac{(10-17.8)^2}{17.8} = 11.03$$
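The hand calculation for the gun control table can be checked with a short Python sketch. The counts are the ones in the contingency table from the example; the variable names and the per-cell printout are illustrative choices, not part of the original slides.

```python
# Observed counts from the gun control example.
observed = {
    "Democrat":   {"Favor": 10, "Neutral": 10, "Oppose": 30},
    "Republican": {"Favor": 15, "Neutral": 15, "Oppose": 10},
}

# Row and column frequencies (the table margins).
row_totals = {party: sum(cells.values()) for party, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for opinion, count in cells.items():
        col_totals[opinion] = col_totals.get(opinion, 0) + count
n = sum(row_totals.values())

chi_square = 0.0
for party, cells in observed.items():
    for opinion, f_o in cells.items():
        f_e = row_totals[party] * col_totals[opinion] / n
        contribution = (f_o - f_e) ** 2 / f_e
        chi_square += contribution
        print(f"{party:10s} {opinion:8s} fo={f_o:2d} fe={f_e:5.2f} "
              f"contribution={contribution:.3f}")

print(f"chi-square = {chi_square:.3f}")
```

Printing each cell's contribution shows where the discrepancy comes from: the two Oppose cells account for more than half of the total of about 11.03.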
25
5. Determine Degrees of
Freedom
df = (R-1)(C-1) =
(2-1)(3-1) = 2
26
6. Compare computed test statistic
against a tabled/critical value
α = 0.05
df = 2
Critical tabled value = 5.991
The test statistic, 11.03, exceeds the critical value
The null hypothesis is rejected
Democrats and Republicans differ
significantly in their opinions on gun
control issues
27
SPSS Output for Gun Control Example

Chi-Square Tests
                               Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square             11.025(a)   2    .004
Likelihood Ratio               11.365      2    .003
Linear-by-Linear Association   8.722       1    .003
N of Valid Cases               90
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 11.11.
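The first two rows of the SPSS table can be reproduced with a small Python sketch. It assumes the usual likelihood-ratio formula, 2 * sum(F_o * ln(F_o / F_e)), which the slides do not spell out, and it uses the fact that with exactly 2 degrees of freedom the chi-square upper-tail probability equals exp(-x / 2). The function name `p_value_df2` is hypothetical.

```python
import math

# Observed counts from the gun control example (rows: Democrat, Republican).
observed = [[10, 10, 30],
            [15, 15, 10]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

pearson = 0.0
likelihood_ratio = 0.0
for i, row in enumerate(observed):
    for j, f_o in enumerate(row):
        f_e = row_totals[i] * col_totals[j] / n
        pearson += (f_o - f_e) ** 2 / f_e
        # Assumes every observed count is positive, so log() is defined.
        likelihood_ratio += 2 * f_o * math.log(f_o / f_e)


def p_value_df2(statistic):
    """Upper-tail chi-square probability; valid only for df = 2."""
    return math.exp(-statistic / 2)


print(f"Pearson chi-square = {pearson:.3f}, p = {p_value_df2(pearson):.3f}")
print(f"Likelihood ratio   = {likelihood_ratio:.3f}, "
      f"p = {p_value_df2(likelihood_ratio):.3f}")
```

The closed form applies only because this table has df = 2; other degrees of freedom need a chi-square table or a statistics library.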
28
Additional Information in SPSS
Output
Exceptions that might distort χ² assumptions:
  Associations in some but not all categories
  Low expected frequency per cell
Extent of association is not the same as
statistical significance
(demonstrated through an example)
29
Another Example: Heparin Lock Placement
Complication Incidence * Heparin Lock Placement Time Group Crosstabulation
                                                                     Heparin Lock Placement Time Group
Complication Incidence                                                1        2        Total
Had complication      Count                                          9        11       20
                      Expected Count                                 10.0     10.0     20.0
                      % within Heparin Lock Placement Time Group     18.0%    22.0%    20.0%
Had no complication   Count                                          41       39       80
                      Expected Count                                 40.0     40.0     80.0
                      % within Heparin Lock Placement Time Group     82.0%    78.0%    80.0%
Total                 Count                                          50       50       100
                      Expected Count                                 50.0     50.0     100.0
                      % within Heparin Lock Placement Time Group     100.0%   100.0%   100.0%
Time: 1 = 72 hrs, 2 = 96 hrs
From Polit text: Table 8-1
30
Hypotheses in Heparin Lock Placement
Ho: There is no association between
complication incidence and length of
heparin lock placement. (The variables are
independent.)
Ha: There is an association between
complication incidence and length of
heparin lock placement. (The variables are
related.)
31
More of SPSS Output
32
Pearson Chi-Square
Pearson chi-square = .250, p = .617
Since p > .05, we fail to reject the null hypothesis that the
complication rate is unrelated to heparin lock placement time.
Continuity correction is used in situations in which the expected
frequency for any cell in a 2 by 2 table is less than 10.
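The heparin lock result can be checked with the sketch below. It uses the counts from the crosstabulation and computes both the Pearson chi-square and a continuity-corrected version. The correction shown is the common Yates form, which subtracts 0.5 from each absolute deviation; that choice, and the helper name `p_value_df1`, are assumptions of this illustration.

```python
import math

# Observed counts from the heparin lock crosstabulation
# (rows: had complication, had no complication; columns: 72 hrs, 96 hrs).
observed = [[9, 11],
            [41, 39]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

pearson = 0.0
corrected = 0.0
for i, row in enumerate(observed):
    for j, f_o in enumerate(row):
        f_e = row_totals[i] * col_totals[j] / n
        pearson += (f_o - f_e) ** 2 / f_e
        # Yates-style continuity correction (assumed form).
        corrected += (abs(f_o - f_e) - 0.5) ** 2 / f_e


def p_value_df1(statistic):
    """Upper-tail chi-square probability; valid only for df = 1."""
    # With 1 df, chi-square is a squared standard normal variable.
    return math.erfc(math.sqrt(statistic / 2))


print(f"Pearson chi-square = {pearson:.3f}, p = {p_value_df1(pearson):.3f}")
print(f"Continuity-corrected chi-square = {corrected:.4f}")
```

The correction shrinks every deviation, so the corrected statistic is always smaller than the uncorrected one and the test becomes more conservative.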
33
More SPSS Output
Symmetric Measures
                                              Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Nominal by Nominal     Phi                    -.050                                         .617
                       Cramer's V             .050                                          .617
Interval by Interval   Pearson's R            -.050   .100                   -.496          .621(c)
Ordinal by Ordinal     Spearman Correlation   -.050   .100                   -.496          .621(c)
N of Valid Cases                              100
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on normal approximation.
34
Phi Coefficient
Pearson chi-square provides information about the existence of a
relationship between 2 nominal variables, but not about the
magnitude of the relationship.
The phi coefficient is the measure of the strength of the association.
In the SPSS Symmetric Measures output for the heparin lock example
(N = 100), Phi = -.050 and Cramer's V = .050, each with approx. sig. = .617.
$$\phi = \sqrt{\frac{\chi^2}{N}}$$
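A small Python sketch shows how phi relates to the chi-square value in the heparin lock example. The first function follows the slide's formula and returns a magnitude only. The second is an assumed alternative, the cross-product form for a 2 x 2 table, included because it yields a signed value like the one in the SPSS output. Both function names are hypothetical.

```python
import math


def phi_from_chi_square(chi_square, n):
    """Magnitude of phi: the square root of chi-square over N."""
    return math.sqrt(chi_square / n)


def signed_phi(table):
    """Signed phi for a 2 x 2 table [[a, b], [c, d]] (cross-product form)."""
    (a, b), (c, d) = table
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Heparin lock example: chi-square = .250, N = 100.
print(round(phi_from_chi_square(0.25, 100), 3))
print(round(signed_phi([[9, 11], [41, 39]]), 3))
```

The two forms agree in absolute value; the sign depends only on how the rows and columns are ordered.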
35
Cramer's V
When the table is larger than 2 by 2, a different index must be
used to measure the strength of the relationship between the
variables. One such index is Cramer's V.
If Cramer's V is large, it means that there is a tendency for
particular categories of the first variable to be associated with
particular categories of the second variable.
$$V = \sqrt{\frac{\chi^2}{N(k-1)}}$$
where N is the number of cases and k is the smaller of the number
of rows or columns.
In the SPSS Symmetric Measures output for the heparin lock example
(N = 100), Cramer's V = .050 with approx. sig. = .617.
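As a rough sketch, Cramer's V can be computed from quantities already obtained in the two examples. The function name `cramers_v` and its argument order are illustrative assumptions; the chi-square values, sample sizes, and table dimensions come from the gun control and heparin lock examples.

```python
import math


def cramers_v(chi_square, n, n_rows, n_cols):
    """V = sqrt(chi-square / (N * (k - 1))), k = smaller of rows and columns."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi_square / (n * (k - 1)))


# Gun control example: chi-square = 11.025, N = 90, 2 x 3 table.
print(round(cramers_v(11.025, 90, 2, 3), 3))
# Heparin lock example: chi-square = .250, N = 100, 2 x 2 table.
print(round(cramers_v(0.25, 100, 2, 2), 3))
```

For a 2 x 2 table k - 1 equals 1, so V reduces to the magnitude of phi, which is why SPSS reports .050 for both in the heparin lock output.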

37
Take Home Lesson
How to Test Association between
Frequency of Two Nominal Variables
