The $\chi^2$ Distribution

Applied Statistics and Probability

2014/2015

Ira M. Anjasmara

Jurusan Teknik Geomatika

Inferences About Variances


Until now we have conducted hypothesis tests on sample and population
means. What if we want to test variances (and standard deviations)?
This is a common problem in surveying, because the standard deviation
is a direct measure of the precision of an experiment. Typically, one
would:
- set a precision requirement for a measurement (the population
  standard deviation);
- check whether that precision is being met in practice (the sample
  standard deviation).
Hence, we wish to make inferences about the population variance using
the sample variance. To do this we use the $\chi^2$ (chi-squared) test.

Samples of size n are drawn repeatedly from a normally-distributed
population with variance $\sigma^2$, and a sample variance, $s^2$, is computed for
each sample. The resulting sample variances are not normally
distributed like the sampling distribution of the means; instead, the scaled
quantity below follows a $\chi^2$ distribution:

$$\chi^2 = \frac{\nu s^2}{\sigma^2} \tag{1}$$

where, as before, $\nu = n - 1$ is the number of degrees of freedom.


Like the t distribution, the $\chi^2$ distribution is a family of distributions,
one for each number of degrees of freedom ($\nu$). It also has the following properties:
- it is asymmetric;
- there are no negative values of the test statistic;
- it approaches a normal distribution as the number of degrees of freedom
  increases to infinity.
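
The first two properties can be checked with a small simulation. The sketch below is illustrative only: the sample size (n = 10), the population standard deviation (2.0), the number of trials, the seed and the function name `scaled_sample_variances` are all assumptions chosen for the demonstration, not values from the lecture.

```python
import random
import statistics

def scaled_sample_variances(n, sigma, trials, seed=0):
    """Draw `trials` normal samples of size n and return nu * s^2 / sigma^2 for each."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    nu = n - 1
    values = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        s2 = sum((x - mean) ** 2 for x in sample) / nu   # sample variance, divisor n - 1
        values.append(nu * s2 / sigma ** 2)
    return values

values = scaled_sample_variances(n=10, sigma=2.0, trials=20000)
print("smallest value:", min(values))               # a sum of squares, so never negative
print("mean:  ", statistics.mean(values))           # should be close to nu = 9
print("median:", statistics.median(values))         # below the mean for a right-skewed shape
```

A median that sits below the mean is one simple indication of the asymmetry (a long right-hand tail).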

The probability density function for the $\chi^2$ distribution is:

$$f(x, \nu) = \frac{1}{2^{\nu/2}\, \Gamma(\nu/2)}\, x^{\nu/2 - 1} e^{-x/2}, \quad x > 0 \tag{2}$$

where $\Gamma$ is the gamma function (see standard maths texts).
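
As a quick numerical check of equation (2), the density can be evaluated with the gamma function from Python's standard library and integrated to confirm that the total area is close to 1. This is a minimal sketch: the helper names `chi2_pdf` and `integrate`, the integration limit of 200 and the step count are assumptions made for illustration.

```python
import math

def chi2_pdf(x, nu):
    """Density of equation (2); returns 0 outside x > 0."""
    if x <= 0:
        return 0.0
    half_nu = nu / 2
    return x ** (half_nu - 1) * math.exp(-x / 2) / (2 ** half_nu * math.gamma(half_nu))

def integrate(f, a, b, steps=20000):
    """Composite midpoint rule, adequate for a sanity check."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

for nu in (2, 5, 14):
    area = integrate(lambda x: chi2_pdf(x, nu), 0.0, 200.0)
    print(nu, round(area, 4))   # each area should be very close to 1
```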

The $\chi^2$ Table
The table gives the value $\chi^2_{\nu,\alpha}$ corresponding to an area $\alpha$ in the upper tail,
for a specific number of degrees of freedom ($\nu$). In a typical $\chi^2$ table,
the numbers in the first column give the degrees of freedom, and the numbers
in the first row give the area of the upper tail.
The numbers in the main body of the table give the $\chi^2$-score
corresponding to those particular values of $\nu$ and $\alpha$, i.e., $\chi^2_{\nu,\alpha}$.
For example, the entry for 14 degrees of freedom and an area in the upper
tail of 0.1 (10%) is $\chi^2_{14,0.1} = 21.0642$.
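
Where no printed table is to hand, the same lookup can be approximated numerically. The following is a rough sketch using only the Python standard library, not the method used to build the lecture's table: the function names `chi2_upper_tail` and `chi2_critical` are hypothetical, the tail area is built from the exact results for 1 and 2 degrees of freedom plus a standard recurrence that adds two degrees of freedom at a time (so it assumes an integer $\nu \geq 1$), and the critical value is found by bisection.

```python
import math

def chi2_upper_tail(x, nu):
    """P(chi^2 > x) for integer nu >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    if nu % 2 == 0:
        tail, k = math.exp(-half), 2                 # exact tail for nu = 2
    else:
        tail, k = math.erfc(math.sqrt(half)), 1      # exact tail for nu = 1
    while k < nu:                                    # step from k to k + 2 degrees of freedom
        tail += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return tail

def chi2_critical(nu, alpha, tol=1e-10):
    """Value whose upper-tail area is alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_upper_tail(hi, nu) > alpha:           # widen the bracket until it contains the answer
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_upper_tail(mid, nu) > alpha:
            lo = mid                                 # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

# Should agree with the tabulated 21.0642 for 14 degrees of freedom and alpha = 0.1.
print(round(chi2_critical(14, 0.1), 4))
```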

Hypothesis Testing
The procedure for hypothesis testing of variances follows the same 8-step
procedure as for a t distribution, except we use the following test statistic:

$$\chi^2 = \frac{\nu s^2}{\sigma^2} \tag{3}$$

and we use the following critical values:
- 1-tailed test: use $\chi^2_{\nu,\alpha}$ in place of $t_{\nu,\alpha}$ as the critical value for an
  upper-tailed test, or $\chi^2_{\nu,1-\alpha}$ for a lower-tailed test;
- 2-tailed test: use $\chi^2_{\nu,\alpha/2}$ and $\chi^2_{\nu,1-\alpha/2}$ in place of $t_{\nu,\alpha/2}$, since the
  distribution is not symmetric.
Remember, we are now testing variances, so the null and alternative
hypotheses must be framed as, e.g., for a 1-tailed test:
$H_0: \sigma^2 \geq \sigma_0^2$
$H_a: \sigma^2 < \sigma_0^2$
where $\sigma_0^2$ is the numeric value of the population variance.

Example

A manufacturer of handheld GPS receivers claims they have a horizontal
precision of 50 m. In a student project, 30 observations at a standard
survey mark yielded a standard deviation of 42.4 m relative to the mark.
Test, at the 0.05 level of significance, whether the receiver differs from the
manufacturer's claim.
Take 50 m as the population standard deviation. We therefore want to test
whether the sample standard deviation from the new data (42.4 m) indicates
that this value is incorrect. We have: $\sigma = 50$, $s = 42.4$, $n = 30$, $\alpha = 0.05$.

Step 1
Formulate alternative hypothesis: $H_a: \sigma^2 \neq 2500$
i.e., test whether the new survey refutes the manufacturer's claim.
Formulate null hypothesis: $H_0: \sigma^2 = 2500$
i.e., assume the given precision is correct, and the sample data are
misleading.
Step 2
Determine number of tails.
This is a 2-tailed test, because the null hypothesis is an equality and the
alternative allows the variance to differ in either direction.

Step 3
Determine level of significance and degrees of freedom:
We are told that the significance level is $\alpha = 0.05$.
From n = 30, we get $\nu = 30 - 1 = 29$.
Step 4
Determine the critical values of $\chi^2$:
We have a 2-tailed test, so we need to find $\chi^2_{\nu,\alpha/2} = \chi^2_{29,0.025}$ and
$\chi^2_{\nu,1-\alpha/2} = \chi^2_{29,0.975}$.
From the $\chi^2$ distribution table, we have:
$\chi^2_{29,0.025} = 45.7222$
$\chi^2_{29,0.975} = 16.0471$
Step 5
Determine the rejection region:
The null hypothesis will be rejected if the data indicate that $\sigma^2 \neq 2500$.
Since we are testing $\sigma^2 \neq 2500$, we are at both ends of the $\chi^2$
curve, therefore the rejection regions are $\chi^2 < 16.0471$ and $\chi^2 > 45.7222$.
Step 6
Determine the test statistic ($\chi^2$-score) from the sample data:

$$\chi^2 = \frac{\nu s^2}{\sigma^2} = \frac{29 \times 42.4^2}{50^2} = 20.854 \tag{4}$$

Step 7
Compare the test statistic against its critical values:
$16.0471 < 20.854 < 45.7222$, therefore $\chi^2$, and hence $s^2$, the sample
variance, does not lie in the rejection region.
Hence, we do not reject $H_0$ at the 0.05 significance level.
Step 8
Our sample measurement is compatible with the supposed population
variance at the 95% confidence level. Therefore the measured
precision of the receivers is consistent with the manufacturer's claim, at this
level.
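
The arithmetic of Steps 6 and 7 can be reproduced in a few lines. This is a minimal sketch: the function name `variance_test_two_tailed` and its parameter names are hypothetical, and the two critical values are simply the tabulated figures from Step 4 rather than values computed by the code.

```python
def variance_test_two_tailed(s, sigma0, n, chi2_lower, chi2_upper):
    """Return the chi-squared statistic nu * s^2 / sigma0^2 and whether H0 is rejected."""
    nu = n - 1
    statistic = nu * s ** 2 / sigma0 ** 2
    reject = statistic < chi2_lower or statistic > chi2_upper
    return statistic, reject

# GPS receiver example: s = 42.4 m, claimed sigma = 50 m, n = 30 observations.
statistic, reject = variance_test_two_tailed(
    s=42.4, sigma0=50.0, n=30,
    chi2_lower=16.0471,   # tabulated chi^2 for nu = 29, upper-tail area 0.975
    chi2_upper=45.7222,   # tabulated chi^2 for nu = 29, upper-tail area 0.025
)
print(round(statistic, 3), reject)   # 20.854 False
```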

Confidence Intervals

If we are given the sample variance ($s^2$), the number of observations in the
sample (n), and a significance level ($\alpha$), we can determine a range of
values that the population variance may assume.
The $100(1 - \alpha)\%$ confidence interval for the population variance is given by:

$$\frac{\nu s^2}{\chi^2_{\nu,\alpha/2}} \leq \sigma^2 \leq \frac{\nu s^2}{\chi^2_{\nu,1-\alpha/2}} \tag{5}$$

i.e., there is a $1 - \alpha$ probability that the population variance lies in this
range.

Example
A BC-2 analytical stereo plotter is used to repeatedly measure the height
of a control point. A sample of 15 measurements provides a variance of
0.0625 m$^2$. Develop a 95% confidence interval for the population standard
deviation of the control point height.
We have n = 15 ($\nu = 14$), $s^2 = 0.0625$ m$^2$, $\alpha = 0.05$. So,
$\chi^2_{\nu,\alpha/2} = \chi^2_{14,0.025} = 26.1190$ and $\chi^2_{\nu,1-\alpha/2} = \chi^2_{14,0.975} = 5.62872$.
Therefore,

$$\frac{14 \times 0.0625}{26.1190} \leq \sigma^2 \leq \frac{14 \times 0.0625}{5.62872}$$
$$0.0335 \leq \sigma^2 \leq 0.1555\ \mathrm{m}^2$$
$$0.183 \leq \sigma \leq 0.394\ \mathrm{m}$$
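
The interval can be checked numerically as follows. This is an illustrative sketch: `variance_confidence_interval` and its parameter names are hypothetical, and the two $\chi^2$ values are the tabulated ones quoted in the example.

```python
import math

def variance_confidence_interval(s2, n, chi2_alpha_half, chi2_one_minus_alpha_half):
    """Bounds on the population variance: nu*s^2 divided by each critical value."""
    nu = n - 1
    lower = nu * s2 / chi2_alpha_half             # the larger critical value gives the lower bound
    upper = nu * s2 / chi2_one_minus_alpha_half   # the smaller critical value gives the upper bound
    return lower, upper

var_lo, var_hi = variance_confidence_interval(
    s2=0.0625, n=15,
    chi2_alpha_half=26.1190,            # tabulated chi^2 for nu = 14, upper-tail area 0.025
    chi2_one_minus_alpha_half=5.62872,  # tabulated chi^2 for nu = 14, upper-tail area 0.975
)
sd_lo, sd_hi = math.sqrt(var_lo), math.sqrt(var_hi)   # interval for the standard deviation
print(round(var_lo, 4), round(var_hi, 4))   # bounds on the variance in m^2
print(round(sd_lo, 3), round(sd_hi, 3))     # bounds on the standard deviation in m
```

Taking square roots is valid here because both bounds are positive and the square root is monotonic.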

P-Values

As for the t distribution, determining P-values for the $\chi^2$ distribution
requires the use of a computer program.
Microsoft Excel has the function CHIDIST to work out P-values for the $\chi^2$
distribution, where:
$p(\chi^2 > \chi_0^2) = \mathrm{CHIDIST}(\chi_0^2, \nu)$
for some numerical value $\chi_0^2$.

Example
A manufacturer of handheld GPS receivers claims they have a horizontal
precision of 50 m. In a student project, 30 observations at a standard
survey mark yielded a standard deviation of 42.4 m relative to the mark.
What is the P-value for these data?
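$\nu = 29$

$$\chi^2 = \frac{\nu s^2}{\sigma^2} = \frac{29 \times 42.4^2}{50^2} = 20.854$$

Using Excel, we find:
$P = p(s \leq 42.4) = p(\chi^2 \leq 20.854) = 1 - \mathrm{CHIDIST}(20.854, 29) = 0.1356$
(Note that we are using the left-hand tail of the $\chi^2$ curve because
$s^2 < \sigma^2$. This is a one-tail area; for a 2-tailed test it is compared with $\alpha/2$.)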
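
For readers without Excel, the same tail area can be approximated with the Python standard library. This is only a sketch under stated assumptions: `chi2_upper_tail` is a hypothetical helper that plays the role of CHIDIST for integer degrees of freedom, built from the exact tail areas for 1 and 2 degrees of freedom and a standard recurrence that adds two degrees of freedom at a time.

```python
import math

def chi2_upper_tail(x, nu):
    """P(chi^2 > x) for integer nu >= 1 (same role as CHIDIST(x, nu))."""
    if x <= 0:
        return 1.0
    half = x / 2
    if nu % 2 == 0:
        tail, k = math.exp(-half), 2                 # exact tail for nu = 2
    else:
        tail, k = math.erfc(math.sqrt(half)), 1      # exact tail for nu = 1
    while k < nu:                                    # step from k to k + 2 degrees of freedom
        tail += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return tail

nu = 29
statistic = nu * 42.4 ** 2 / 50 ** 2             # 20.854, as in the example
p_left = 1 - chi2_upper_tail(statistic, nu)      # left-hand tail, because s^2 < sigma^2
print(round(p_left, 4))                          # should be close to the 0.1356 found with Excel
```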

Goodness of Fit Test


A goodness of fit test focuses on the differences between observed and
expected frequencies; this difference is measured by the $\chi^2$ statistic.
The test is used to determine if a hypothesised probability distribution
provides a good description of a particular population of interest. It is
applicable to nominal and ordinal data, as long as:
1. the data are stated as frequencies;
2. the classes are mutually exclusive;
3. there are not many classes which have a frequency below 5.
We consider the case where each element of a population is assigned to
one, and only one, of several categories, or classes; the resulting class
counts follow a multinomial distribution.

The same 8 steps of hypothesis testing are followed, with minor changes.
We first form the two hypotheses:
- $H_0$ is that the observed and expected frequencies are equal;
- $H_a$ is that the observed data differ from what is predicted.
We then obtain a sample of n items from the population, and assign each
item to one of k classes. The degrees of freedom are then given by:
$\nu = k - 1$
We then record the observed frequency ($O_i$) for each class, and then
determine the expected frequency ($E_i$) for each class:
$E_i = (\text{proportion in class}) \times (\text{sample size})$
In the case where the observed and expected frequencies are equal, the
sample data provide a perfect fit to the hypothesised distribution.

If the sample size is large, then the statistic:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

has a $\chi^2$ distribution, where i represents a class. The requirement
of a large sample is that:
$E_i \geq 5, \ \forall i$

Example

Patients arriving at a hospital are placed in one of 3 classes (k = 3): stable,
serious, or critical. Records show that usually 50% are stable ($p_1$), 30%
are serious ($p_2$), and 20% are critical ($p_3$). However, a recent sample of
200 patients showed:
- stable: $O_1 = 98$
- serious: $O_2 = 48$
- critical: $O_3 = 54$
Do the sample data refute the hypothesised proportions at $\alpha = 0.05$?

Step 1
$H_0: p_1 = 0.5,\ p_2 = 0.3,\ p_3 = 0.2$
$H_a: p_1 \neq 0.5,\ p_2 \neq 0.3,\ p_3 \neq 0.2$
i.e., at least one proportion differs from its hypothesised value.
Step 2
Determine number of tails. This is treated as a 2-tailed test, because the null
hypothesis is an equality.
Step 3
Determine level of significance and degrees of freedom:
We are told that the significance level is $\alpha = 0.05$.
We have 3 classes, so $\nu = 3 - 1 = 2$.
Step 4
Determine the critical values of $\chi^2$:
We have a 2-tailed test, so we have:
$\chi^2_{\nu,\alpha/2} = \chi^2_{2,0.025} = 7.378$
$\chi^2_{\nu,1-\alpha/2} = \chi^2_{2,0.975} = 0.051$
Step 5
Determine the rejection region: We are at both ends of the $\chi^2$ curve,
therefore the rejection regions are $\chi^2 < 0.051$ and $\chi^2 > 7.378$.

Step 6
Determine the test statistic ($\chi^2$-score) from the sample data. First,
determine the expected frequencies:
$E_1 = 0.5 \times 200 = 100$
$E_2 = 0.3 \times 200 = 60$
$E_3 = 0.2 \times 200 = 40$
These are all $\geq 5$, fulfilling the large sample requirement.
Then, determine the test statistic:

$$\chi^2 = \sum_{i=1}^{3} \frac{(O_i - E_i)^2}{E_i} = \frac{(98 - 100)^2}{100} + \frac{(48 - 60)^2}{60} + \frac{(54 - 40)^2}{40} = 7.34$$

Step 7
Compare the test statistic against its critical values:
$0.051 < 7.34 < 7.378$, therefore $\chi^2$ does not lie in the rejection region.
Hence, we do not reject $H_0$ at the 0.05 significance level.
Step 8
Our sample data are compatible with the expected proportions at the 95%
confidence level. Therefore the expected proportions of
patient arrivals do not need to be changed, at this level.
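
The goodness of fit calculation above can be sketched in code as follows. The function name `goodness_of_fit_statistic` is hypothetical, the critical values 0.051 and 7.378 are the tabulated figures from Step 4, and the closed-form tail area `exp(-x/2)` is an assumption that holds only for $\nu = 2$, as in this example.

```python
import math

def goodness_of_fit_statistic(observed, proportions):
    """Return the chi-squared statistic and the expected frequencies E_i = p_i * n."""
    n = sum(observed)
    expected = [p * n for p in proportions]
    if min(expected) < 5:
        raise ValueError("large-sample requirement E_i >= 5 is not met")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected

# Hospital example: stable, serious, critical.
statistic, expected = goodness_of_fit_statistic([98, 48, 54], [0.5, 0.3, 0.2])
in_rejection_region = statistic < 0.051 or statistic > 7.378   # 2-tailed rule from Step 5
upper_tail_p = math.exp(-statistic / 2)   # P(chi^2 > statistic); exact only for nu = 2

print(round(statistic, 2), in_rejection_region)   # 7.34 False
print(round(upper_tail_p, 4))                     # upper-tail area, a little above 0.025
```

The upper-tail area sits just above $\alpha/2 = 0.025$, which matches how close 7.34 is to the critical value 7.378. Note that many texts treat the goodness of fit test as upper-tailed only, comparing the statistic with $\chi^2_{2,0.05} \approx 5.991$; under that convention 7.34 would fall in the rejection region, so the conclusion here depends on the 2-tailed rule adopted in Step 2.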
