
STAT6202 Exercises 3

1. The following data are the average hourly wage in pounds, x, and the quit rate, y (the number of
employees per 100 who left jobs during the year), for 15 different industries. (Source: Economic Enquiry,
1986.)
x 8.20 10.35 6.18 5.37 9.94 9.11 10.59 13.29 7.99 5.54 7.50 6.43 8.83 10.93 8.80
y 1.4 0.7 2.6 3.4 1.7 1.7 1.0 0.5 2.0 3.8 2.3 1.9 1.4 1.8 2.0

(a) Draw a scatter diagram of y against x. Label the axes informatively and write a caption.
(b) Calculate the summary statistics x̄, ȳ, s_x, s_y and the correlation coefficient r_xy. Comment on what
the value of r_xy tells you.
(c) Calculate the slope b, intercept a and residual standard deviation s_res for the least squares regression
line of y on x. Compare s_res to s_y and comment.
(d) Draw the regression line on your scatter diagram. Do you think that the line is a good description
of the relationship between average hourly wage and quit rate? Give reasons for your answer.
(e) Give a physical interpretation of the slope of the line. Also give an interpretation of the residual
standard deviation.
(f) Estimate the mean quit rate for industries with an average hourly wage of £9.50.
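
As a way of checking hand calculations for question 1, the quantities in parts (b), (c) and (f) could be computed along the following lines. This is only a sketch: the function name `least_squares_summary` and the variable names are illustrative, the standard deviations use the divisor n - 1, and the residual standard deviation is assumed to use the divisor n - 2, which may differ from the definition used in the course notes.

```python
import math


def least_squares_summary(x, y):
    """Summary statistics and least squares line of y on x (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sums of squares and products
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                # slope
    a = y_bar - b * x_bar        # intercept
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return {
        "x_bar": x_bar,
        "y_bar": y_bar,
        "s_x": math.sqrt(sxx / (n - 1)),
        "s_y": math.sqrt(syy / (n - 1)),
        "r": sxy / math.sqrt(sxx * syy),
        "b": b,
        "a": a,
        # Assumption: residual standard deviation with divisor n - 2
        "s_res": math.sqrt(rss / (n - 2)),
    }


wage = [8.20, 10.35, 6.18, 5.37, 9.94, 9.11, 10.59, 13.29,
        7.99, 5.54, 7.50, 6.43, 8.83, 10.93, 8.80]
quit_rate = [1.4, 0.7, 2.6, 3.4, 1.7, 1.7, 1.0, 0.5,
             2.0, 3.8, 2.3, 1.9, 1.4, 1.8, 2.0]

summary = least_squares_summary(wage, quit_rate)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")

# Fitted mean quit rate at an average hourly wage of 9.50 pounds
print(f"fitted value at 9.50: {summary['a'] + summary['b'] * 9.50:.4f}")
```

The printed values are intended for comparison with your own working, not as a substitute for it.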

2. The following data give the expenditure on roads, x, in millions of pounds, for the tax year April 1991 to
March 1992, and the surfaced road length, y, in thousands of kilometres, for 9 regions of England
and Wales. (Source: Regional Trends, 1994.)
Region              x       y
North               150.4   23.33
Yorks and H’side    222.6   29.19
East Midlands       165.4   28.78
East Anglia          96.8   20.55
West Midlands       216.8   30.17
South East          811.9   70.66
South West          215.7   48.25
North West          263.3   25.91
Wales               220.8   33.53

(a) Draw a scatter plot of the data.


(b) Calculate the correlation coefficient and the rank correlation coefficient between x and y.
(c) Mark one point on your scatter diagram that could be considered an outlier and state which region
this point is associated with.
(d) Mark one point on your scatter diagram that could be considered a remote point and state which
region this point is associated with.
(e) Explain in what way the outlier and remote point differ from the other points.
(f) How would the correlation coefficient and rank correlation coefficient change if the outlier point
were omitted?
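
For question 2, the two coefficients in part (b) could be checked with a short sketch like the one below. The helper names `pearson`, `ranks` and `spearman` are illustrative. The rank correlation is computed here as the ordinary correlation coefficient applied to the ranks, and the simple ranking function assumes there are no tied values, which holds for the data in the table.

```python
import math


def pearson(u, v):
    """Product-moment correlation coefficient between u and v."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    suu = sum((ui - u_bar) ** 2 for ui in u)
    svv = sum((vi - v_bar) ** 2 for vi in v)
    suv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    return suv / math.sqrt(suu * svv)


def ranks(values):
    """Ranks 1..n in increasing order. Assumption: no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(u, v):
    """Rank correlation: the correlation coefficient of the ranks."""
    return pearson(ranks(u), ranks(v))


# Regions in table order: North, Yorks and H'side, East Midlands, East Anglia,
# West Midlands, South East, South West, North West, Wales
expenditure = [150.4, 222.6, 165.4, 96.8, 216.8, 811.9, 215.7, 263.3, 220.8]
road_length = [23.33, 29.19, 28.78, 20.55, 30.17, 70.66, 48.25, 25.91, 33.53]

print(f"correlation:      {pearson(expenditure, road_length):.4f}")
print(f"rank correlation: {spearman(expenditure, road_length):.4f}")
```

For part (f), the same functions can be applied to the two lists with the chosen region's values removed from both.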

3. A large supermarket chain is considering reducing the price of some of its own brand products. The
manager carried out an experiment to investigate how changing the price of a 100g jar of their own brand
of instant coffee would affect the demand for the product. Eight supermarkets in the chain, with near
enough the same demand for coffee, were selected and different prices were assigned to each. The same
advertising was used for all eight supermarkets and the number of 100g jars of their own brand of instant
coffee sold in each of the supermarkets in the following week was recorded. The results were as follows.
The price x is in pounds. The number, y, refers to the number of jars sold.
Price (x) 0.95 1.00 1.05 1.10 1.15 1.20 1.25 1.30
Number (y) 1326 921 784 794 553 518 417 409
log_e y (z) 7.190 6.825 6.664 6.677 6.315 6.250 6.033 6.014

s_y^2 = 96321.64    s_x^2 = 0.015    s_z^2 = 0.1701817

Σxy = 6187.8    Σxz = 58.12013    x̄ = 1.125    ȳ = 715.25    z̄ = 6.496126

Scatterplots of (i) y against x and (ii) z against x are given below.

(i) [Scatterplot of y against x]

(ii) [Scatterplot of z against x]

(a) Compute the regression equation for y on x and z on x (i.e. calculate the intercept and gradient
for the equations y = a + bx and z = a + bx, respectively), and draw the lines on the respective
scatterplots.
(b) Interpret the calculated intercept and gradient for the equations y = a + bx and z = a + bx,
respectively. What assumption are you making for the interpretation of a? Do you think this
assumption is reasonable in this context?
(c) Compute the correlation between y and x, and between z and x. Why is there a difference between
the two correlation coefficients? What can be concluded about the relationship between y and x?
(d) We are interested in estimating the number of 100g jars of own brand instant coffee that would be
sold if the price was reduced to £0.90. Which of the two regression equations of part (a) is the
more appropriate to use and why?
(e) Using the selected least squares regression line, estimate the number of 100g jars of own brand
instant coffee that would be sold if the price was reduced to £0.90. What assumption are you making?
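
For question 3, the slope and intercept can be obtained from the summary statistics alone, using S_xy = Σxy - n x̄ ȳ and S_xx = (n - 1) s_x^2. The sketch below shows one way to do this. The function name `line_from_summaries` is illustrative, and it assumes that the quoted variances use the divisor n - 1, which is consistent with s_x^2 = 0.015 for the eight listed prices.

```python
import math


def line_from_summaries(n, sum_xy, x_bar, y_bar, var_x):
    """Least squares intercept and slope from summary statistics.

    Assumption: var_x is the sample variance with divisor n - 1.
    """
    s_xy = sum_xy - n * x_bar * y_bar   # corrected sum of products
    s_xx = (n - 1) * var_x              # corrected sum of squares of x
    b = s_xy / s_xx                     # gradient
    a = y_bar - b * x_bar               # intercept
    return a, b


n = 8
x_bar = 1.125
var_x = 0.015

# Regression of y on x and of z = log_e y on x
a_y, b_y = line_from_summaries(n, 6187.8, x_bar, 715.25, var_x)
a_z, b_z = line_from_summaries(n, 58.12013, x_bar, 6.496126, var_x)

price = 0.90  # pounds; outside the range of observed prices
pred_linear = a_y + b_y * price
# Back-transform the fitted log value to the original scale
pred_log = math.exp(a_z + b_z * price)

print(f"y on x: a = {a_y:.3f}, b = {b_y:.3f}, fitted y at 0.90 = {pred_linear:.1f}")
print(f"z on x: a = {a_z:.4f}, b = {b_z:.4f}, fitted y at 0.90 = {pred_log:.1f}")
```

Both fitted values are extrapolations below the lowest observed price of 0.95, which is relevant to the assumption asked about in part (e).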
