
A large Darwin-based company has employed you to determine whether there is any association between the average volume of
sales per month and the period of selling experience of its sales representatives. You have been supplied with the following
information for a random sample of 10 of the company's salespersons:

Sales (Y)                       Period of experience (X)
(average number per month)      (years)
 5                              4
 2                              1
 8                              5
 4                              2
10                              6
12                              7
 7                              4
 7                              5
 5                              3
 6                              2

ΣY = 66    ΣX = 39    ΣXY = 304
ΣY² = 512    ΣX² = 185

On the basis of the above data:


a) Draw a scatter diagram of these data.
b) Find the equation that describes the regression line for these data.
c) Is the value of b₁ large enough to conclude that β₁ is greater than zero at the α = 0.05 level?
d) Find the correlation coefficient and the coefficient of determination, and comment on the strength of the linear
relationship.
e) Find the 95% confidence interval for β₁.
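Solution
SSx = ΣX² − (ΣX)²/n = 32.9    SSy = ΣY² − (ΣY)²/n = 76.4    SSxy = ΣXY − (ΣX)(ΣY)/n = 46.6    x̄ = 3.9    ȳ = 6.6
a) Scatter diagram to be drawn on the board.
b) The linear (regression) equation is given by ŷ = b₀ + b₁X, where b₁ = SSxy / SSx = 46.6 / 32.9 ≈ 1.42 and b₀ = ȳ − b₁x̄ = 6.6 − (1.4164)(3.9) ≈ 1.08, so ŷ = 1.08 + 1.42X.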
c) H₀: β₁ = 0 (no significant relationship). H₁: β₁ > 0 (there is a significant positive relationship).
Test statistic: t = (b₁ − β₁) / Sb₁
Significance level: α = 0.05
Critical value: t(α, n−2) = t(0.05, 8) = 1.86
Decision rule: Reject H₀ if t > 1.86
Value of the test statistic: SSE = SSy − SSxy² / SSx, Se = √(SSE / (n−2)), Sb₁ = Se / √SSx, so t = (b₁ − β₁) / Sb₁ = 1.42 / 0.20
= 7.1
Conclusion: Since t = 7.1 > 1.86, we reject H₀. There is sufficient evidence to indicate that β₁ is greater than zero.
d) Correlation coefficient: r = SSxy / √(SSx · SSy) = 46.6 / √(32.9 × 76.4) = 0.93. There is a very strong positive linear relationship between period of experience and sales.
Coefficient of determination: r² = (0.93)² = 0.86. Therefore 86% of the variation in sales is explained by the variation
in period of experience.
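e) 95% confidence interval for β₁: b₁ ± t(α/2, n−2) Sb₁, where t(0.025, 8) = 2.306 (a two-sided 95% interval puts 0.025 in each tail).
LCL = b₁ − t(0.025, 8) Sb₁ and UCL = b₁ + t(0.025, 8) Sb₁, so LCL < β₁ < UCL. With the rounded values above, 1.42 ± 2.306(0.20) gives approximately 0.96 < β₁ < 1.88.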
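The regression solution can be checked numerically. The following is a minimal sketch using only the Python standard library; the variable names are illustrative, and the critical value 2.306 is hard-coded on the assumption that it is the standard t-table entry for 8 degrees of freedom and an upper-tail area of 0.025.

```python
import math

# Data from the problem: Y = average sales per month, X = years of experience.
y = [5, 2, 8, 4, 10, 12, 7, 7, 5, 6]
x = [4, 1, 5, 2, 6, 7, 4, 5, 3, 2]
n = len(x)

# Corrected sums of squares and cross-products.
ss_x = sum(v * v for v in x) - sum(x) ** 2 / n
ss_y = sum(v * v for v in y) - sum(y) ** 2 / n
ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

# b) Least-squares slope and intercept.
b1 = ss_xy / ss_x
b0 = sum(y) / n - b1 * sum(x) / n

# c) Standard error of the slope and the t statistic for H0: beta1 = 0.
sse = ss_y - ss_xy ** 2 / ss_x
s_e = math.sqrt(sse / (n - 2))
s_b1 = s_e / math.sqrt(ss_x)
t_stat = b1 / s_b1

# d) Correlation coefficient and coefficient of determination.
r = ss_xy / math.sqrt(ss_x * ss_y)
r_squared = r ** 2

# e) 95% confidence interval for beta1.
# Assumption: 2.306 is the t-table value for df = 8, upper-tail area 0.025.
T_CRIT_TWO_SIDED = 2.306
lcl = b1 - T_CRIT_TWO_SIDED * s_b1
ucl = b1 + T_CRIT_TWO_SIDED * s_b1

print(f"SSx = {ss_x:.1f}, SSy = {ss_y:.1f}, SSxy = {ss_xy:.1f}")
print(f"y-hat = {b0:.2f} + {b1:.2f} X")
print(f"t = {t_stat:.2f}, r = {r:.2f}, r^2 = {r_squared:.2f}")
print(f"95% CI for beta1: ({lcl:.2f}, {ucl:.2f})")
```

Because the script carries full precision through every step, its interval limits can differ in the second decimal place from a hand calculation that uses the rounded values 1.42 and 0.20.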

A point estimate for a parameter is a single number designed to estimate a quantitative parameter of a population, usually the
value of the corresponding sample statistic.
An interval estimate is an interval bounded by two values and used to estimate the value of a population parameter. The values
that bound the interval are statistics calculated from the sample that is being used as the basis for the estimation.
A confidence interval is an interval estimate with a specified level of confidence.

A management consultant has been asked to investigate the punctuality of domestic passenger flights of a major Australian Airline
arriving at Darwin International Airport. The consultant has taken a random sample of 20 incoming flights during a particular month
and found that on average the planes are 2.73 minutes late with a standard deviation of 4.74 minutes. What is the 95 percent
confidence interval for the mean delay for the month as a whole?
Given: n = 20, x̄ = 2.73 minutes, s = 4.74 minutes, α = 0.05, α/2 = 0.025
Required interval: x̄ ± t(α/2, n−1) s/√n, with LCL = x̄ − t(α/2, n−1) s/√n and UCL = x̄ + t(α/2, n−1) s/√n
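The interval can be evaluated as follows. This is a sketch only: the critical value 2.093 is hard-coded on the assumption that it is the standard t-table entry for 19 degrees of freedom and an upper-tail area of 0.025, and the variable names are illustrative.

```python
import math

n = 20        # flights sampled
x_bar = 2.73  # sample mean delay, minutes
s = 4.74      # sample standard deviation, minutes

# Assumption: 2.093 is the t-table value for df = 19, upper-tail area 0.025.
T_CRIT = 2.093

std_error = s / math.sqrt(n)  # estimated standard error of the mean
margin = T_CRIT * std_error   # half-width of the interval
lcl = x_bar - margin
ucl = x_bar + margin

print(f"95% CI for the mean delay: ({lcl:.2f}, {ucl:.2f}) minutes")
```

Under these assumptions the mean delay for the month is estimated to lie between roughly 0.5 and 4.9 minutes.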

The following data consists of a random sample of 75 individuals who were involved in bicycle accidents during a one-year period in
a regional city. Of the 75 individuals, 20 were wearing a safety helmet at the time of the accident, and 55 were not. 25 of the 75
individuals sustained head injuries and 50 did not. The data is presented in the table below.
Test the hypothesis using either the classical approach or the p-value approach, “head injury is independent of the wearing of a
helmet”.
Use α = 0.05.
                     Wearing helmet
Head injury      Yes      No      Total
Yes                8      17         25
No                12      38         50
Total             20      55         75
Solution
Expected frequency table. Example calculation of an expected frequency: e₁₁ = (row total)(column total)/n = (25)(20)/75 = 6.67
                     Wearing helmet
Head injury      Yes        No      Total
Yes             6.67     18.33         25
No             13.33     36.67         50
Total             20        55         75
H₀: Head injury is independent of the wearing of a helmet. Hₐ: Head injury depends on the wearing of a helmet.
Degrees of freedom: df = (r−1)(c−1) = (2−1)(2−1) = 1
Level of significance: α = 0.05
Critical value: χ²(df, α) = χ²(1, 0.05) = 3.84
Decision rule: Reject H₀ if χ²c > χ²(df, α)
Value of the test statistic: χ²c = Σ[(O − E)² / E] = 0.55


Conclusion (classical approach): Since χ 2c < χ 2(df, α ), we do not reject H0, and conclude that the data provides insufficient evidence
(at 0.05 level) that there is any dependency between helmet wearing and head injury.
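The expected frequencies and the test statistic can be reproduced with a short script. This is a sketch under stated assumptions: the critical value 3.841 is hard-coded as the standard chi-square table entry for 1 degree of freedom at α = 0.05, and the variable names are illustrative.

```python
# Observed counts: rows = head injury (yes, no), columns = helmet (yes, no).
observed = [[8, 17],
            [12, 38]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: (row total)(column total) / grand total.
expected = [[r_tot * c_tot / grand_total for c_tot in col_totals]
            for r_tot in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells.
chi_sq = sum((o - e) ** 2 / e
             for o_row, e_row in zip(observed, expected)
             for o, e in zip(o_row, e_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumption: 3.841 is the chi-square table value for df = 1, alpha = 0.05.
CHI_SQ_CRIT = 3.841
reject_h0 = chi_sq > CHI_SQ_CRIT

print(f"chi-square = {chi_sq:.3f}, df = {df}, reject H0: {reject_h0}")
```

The statistic comes out well below the critical value, which agrees with the decision not to reject H₀.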

The mean distance travelled per shopping trip for local residents has historically been 8.32 kilometres. In 2008 a random sample of
25 shopping trips gave a mean distance travelled of 7.84 kilometres with a standard deviation of 5.12 kilometres. Does the data
provide sufficient evidence to conclude that the 2008 mean distance travelled per shopping trip has changed from the historical
mean of 8.32 kilometres? Assume distance travelled is approximately normal. Use α = 0.01 for the hypothesis test.
Solution
Given: µ = 8.32 km, n = 25, x̄ = 7.84 km, s = 5.12 km, and α = 0.01. Since σ is unknown and n is small, use the t distribution.
H₀: µ = 8.32 km    H₁: µ ≠ 8.32 km
α = 0.01, α/2 = 0.005. Critical value: t(0.005, 24) = 2.797. Decision rule: Reject H₀ if |tc| > t(0.005, 24).
Value of the test statistic: tc = (x̄ − µ) / s_x̄, where s_x̄ = s/√n = 5.12/√25 = 1.024, so tc = (7.84 − 8.32) / 1.024 = −0.47.
Conclusion: Since |tc| < t(0.005, 24), we do not reject H₀, and conclude that there is insufficient evidence (at the 0.01 level) that the 2008 mean
distance has changed from the historical mean.
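The two-tailed test can be verified numerically. This is a sketch only: the critical value 2.797 is hard-coded on the assumption that it is the standard t-table entry for 24 degrees of freedom and an upper-tail area of 0.005, and the variable names are illustrative.

```python
import math

mu_0 = 8.32   # historical mean distance, km
n = 25        # shopping trips sampled in 2008
x_bar = 7.84  # sample mean, km
s = 5.12      # sample standard deviation, km

# Assumption: 2.797 is the t-table value for df = 24, upper-tail area 0.005.
T_CRIT = 2.797

std_error = s / math.sqrt(n)      # estimated standard error of the mean
t_c = (x_bar - mu_0) / std_error  # test statistic
reject_h0 = abs(t_c) > T_CRIT     # two-tailed decision rule

print(f"t = {t_c:.3f}, critical value = {T_CRIT}, reject H0: {reject_h0}")
```

The absolute value of the statistic is far smaller than the critical value, so the sample gives no grounds for rejecting H₀ at the 0.01 level.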

What sample size would be needed to estimate the population mean to within one-half standard deviation with 95% confidence?
Maximum error: E = 0.5σ, with Z(α/2) = Z(0.025) = 1.96. Solving E = Z(α/2)(σ/√n) for n gives n = [Z(α/2) σ / E]² = (1.96 / 0.5)² = 15.37, so a sample of 16 is needed.
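The sample-size formula can be evaluated directly. The sketch below uses `statistics.NormalDist` from the Python standard library to obtain the z value instead of a table; the variable names are illustrative.

```python
import math
from statistics import NormalDist

confidence = 0.95
error_in_sd_units = 0.5  # E / sigma: estimate the mean to within half a standard deviation

# Two-sided critical value z(alpha/2); about 1.96 for 95% confidence.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# n = [z * sigma / E]^2; sigma cancels because E is a multiple of sigma.
n_exact = (z / error_in_sd_units) ** 2
n_required = math.ceil(n_exact)  # round up so the error bound is still met

print(f"z = {z:.3f}, n = {n_exact:.2f}, required sample size = {n_required}")
```

Rounding up rather than to the nearest integer keeps the maximum error at or below the target.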

In estimating average waiting time in an outpatients department based on a sample of patients it is proposed that the standard
error of the estimated time should not be greater than 5% of the estimated mean waiting time. On the basis of a pilot study a
rough estimate of the average waiting time of 75 minutes with standard deviation of 27 minutes has been made. What size of
sample of patients would be required to meet the above requirements?
Solution
Average waiting time = 75 minutes. Standard error = s/√n, so (0.05)(75) = 27/√n, giving √n = 27/3.75 = 7.2 and
n = (7.2)² = 51.84. A sample of at least 52 patients is required.
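The same calculation as a short script, treating the pilot-study figures as the working estimates. This is a sketch with illustrative variable names.

```python
import math

pilot_mean = 75.0     # pilot estimate of the mean waiting time, minutes
pilot_sd = 27.0       # pilot estimate of the standard deviation, minutes
max_se_fraction = 0.05  # standard error must not exceed 5% of the mean

max_std_error = max_se_fraction * pilot_mean  # 3.75 minutes

# Standard error = s / sqrt(n)  =>  n = (s / standard error)^2
n_exact = (pilot_sd / max_std_error) ** 2
n_required = math.ceil(n_exact)  # round up to meet the requirement

print(f"n = {n_exact:.2f}, required sample size = {n_required}")
```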

The Central Limit Theorem states that whenever a random sample of size n is taken from any distribution with mean µ and variance σ²,
then the sample mean x̄ will be approximately normally distributed with mean µ and variance σ²/n. The larger the value of the
sample size n, the better the approximation to the normal.
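A small simulation can illustrate the theorem. The sketch below is not part of the original notes: it assumes a Uniform(0, 1) population (mean 1/2, variance 1/12), and the sample size, number of replications, and seed are arbitrary illustrative choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 30               # size of each sample (illustrative choice)
replications = 5000  # number of samples drawn (illustrative choice)

# Uniform(0, 1) is clearly non-normal; its mean is 1/2 and its variance is 1/12.
mu, sigma_sq = 0.5, 1 / 12

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean([random.random() for _ in range(n)])
    for _ in range(replications)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

print(f"mean of sample means:     {mean_of_means:.4f} (theory: {mu})")
print(f"variance of sample means: {var_of_means:.5f} (theory: {sigma_sq / n:.5f})")
```

The simulated mean and variance of x̄ should land close to µ and σ²/n; a histogram of `sample_means` would show the bell shape.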

The probability distribution of a discrete random variable is a list of the probabilities associated with each of its possible values.

A discrete probability distribution is defined over a set of distinct values (such as 1, 2, 3, etc.). A continuous probability
distribution is defined over an infinite number of points (such as all values between 1 and 3, inclusive).

Criteria for a Binomial Probability Experiment: An experiment is said to be a binomial experiment provided that it satisfies the
following.
a. The experiment is performed a fixed number of times. Each repetition of the experiment is called a trial. Denote the number of
trials as n.
b. Each trial has only two mutually exclusive outcomes: success, denoted by s, and failure, denoted by f. (Note that the term
"success" does not necessarily imply a good thing.)
c. The trials are independent. That is, the outcome of one trial will not affect the outcome of the other trials.
d. The probability of success remains the same from trial to trial. It is called the success probability, denoted by p. (Thus, the
probability of failure is 1 – p.)
The random variable X is called a binomial random variable and denotes the number of successes in n Bernoulli trials. The
possible values of this random variable are x = 0, 1, 2, ..., n.
Note that since the trials are independent, the probability of each possible outcome of a binomial experiment (a particular sequence of n
s's and f's) can be found by multiplying the corresponding number of p's and (1 – p)'s. A tree diagram can help keep track of these
outcomes and their probabilities.
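Multiplying the p's and (1 – p)'s for one sequence and counting the sequences with the same number of successes gives the usual binomial formula P(X = x) = C(n, x) pˣ (1 – p)ⁿ⁻ˣ. A minimal sketch follows; the function name `binomial_pmf` and the values n = 3, p = 0.5 are illustrative and not taken from the notes.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for n independent trials with success probability p."""
    # comb(n, x) counts the sequences with exactly x successes;
    # each such sequence has probability p^x * (1 - p)^(n - x).
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 3, 0.5  # illustrative values: three tosses of a fair coin
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(pmf):
    print(f"P(X = {x}) = {prob}")
```

The four probabilities correspond to the branches of the tree diagram grouped by number of successes, and they sum to 1.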

Properties of a Normal Distribution: the normal curve is symmetrical about the mean μ; the mean is at the middle and divides the
area into halves; the total area under the curve is equal to 1; it is completely determined by its mean μ and standard deviation σ (or
variance σ²).
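These properties can be checked numerically with `statistics.NormalDist` from the Python standard library. In this sketch the parameters μ = 100 and σ = 15 are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

mu, sigma = 100, 15  # hypothetical parameters, for illustration only
dist = NormalDist(mu=mu, sigma=sigma)

# The mean divides the area into halves.
area_below_mean = dist.cdf(mu)

# Symmetry: the tail below mu - sigma equals the tail above mu + sigma.
lower_tail = dist.cdf(mu - sigma)
upper_tail = 1 - dist.cdf(mu + sigma)

# Total area: the cumulative probability approaches 1 far to the right.
area_far_right = dist.cdf(mu + 10 * sigma)

print(f"area below the mean: {area_below_mean}")
print(f"lower tail: {lower_tail:.4f}, upper tail: {upper_tail:.4f}")
print(f"area up to mu + 10 sigma: {area_far_right:.6f}")
```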

The scatter diagram graphs pairs of numerical data, with one variable on each axis, to look for a relationship between them. If the
variables are correlated, the points will fall along a line or curve. The better the correlation, the tighter the points will hug the line.

Regression analysis involves finding the best straight-line relationship to explain how the variation in an outcome (or dependent)
variable, Y, depends on the variation in a predictor (or independent or explanatory) variable, X. Once the relationship has been
estimated, we can use the equation ŷ = b₀ + b₁X to predict the value of the outcome variable for different values
of the explanatory variable.

Correlation analysis is concerned with determining the extent to which the variables of interest are related. It is a procedure that
provides a measure of the relative strength of the relationship.

Student's t-statistic - When s is used as an estimate for σ, the test statistic has two sources of variation: x̄ and s.
The chi-square statistic is a nonparametric statistical technique used to determine if a distribution of observed frequencies differs
from the theoretical expected frequencies. Chi-square statistics use nominal (categorical) or ordinal level data; thus, instead of using
means and variances, this test uses frequencies. The value of the chi-square statistic is given by χ²c = Σ[(O − E)² / E], where O is an
observed frequency and E is the corresponding expected frequency. Generally the
chi-square statistic summarizes the discrepancies between the expected number of times each outcome occurs (assuming that
the model is true) and the observed number of times each outcome occurs, by summing the squares of the discrepancies,
normalized by the expected numbers, over all the categories. Data used in a chi-square analysis has to satisfy the following
conditions: (1) randomly drawn from the population, (2) reported in raw counts of frequency, (3) measured variables must be
independent, (4) observed frequencies cannot be too small, and (5) values of independent and dependent variables must be mutually
exclusive.
