
5 Random Sample and Central Limit Theorem; X-Bar and R Control Charts
Exercise 1: (Example 1)
Suppose X1, X2, …, X20 is a random sample from the normal distribution N(µ, σ²) with µ = 5 and σ² = 4. Find
(a) the expectation and variance of X̄;
(b) the distribution of X̄.

Exercise 2: (Example 2)
Given that X is normally distributed with mean 50 and standard deviation 4, compute the following for a random sample of size n = 25.

(a) Mean and variance of X̄
(b) P(X̄ ≤ 49)
(c) P(X̄ > 52)
(d) P(49 ≤ X̄ ≤ 51.5)

Probability and Statistics Work Book

Exercise 3: (Tutorial 5, No.1)


Given that X is normally distributed with mean 20 and standard deviation 2, compute the following for a random sample of size n = 40.
(a) Mean and variance of X̄
(b) P(X̄ ≤ 19)
(c) P(X̄ > 22)
(d) P(19 ≤ X̄ ≤ 21.5)

Solution:
(a) The mean of X̄ is 20 and the variance of X̄ is σ²/n = 4/40 = 0.1.
(b) $P(\bar X \le 19) = P\left(Z \le \frac{19-20}{\sqrt{0.1}}\right) = P(Z \le -3.16) = 0.000789$
(c) $P(\bar X > 22) = P\left(Z > \frac{22-20}{\sqrt{0.1}}\right) = P(Z > 6.32) = 1 - P(Z \le 6.32) \approx 1 - 1 = 0$
(d) $P(19 \le \bar X \le 21.5) = P\left(\frac{19-20}{\sqrt{0.1}} \le Z \le \frac{21.5-20}{\sqrt{0.1}}\right) = P(-3.16 \le Z \le 4.74)$
$= \Phi(4.74) - \Phi(-3.16) \approx 1 - 0.000789 = 0.999211$
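As a numerical check on Exercise 3, the sketch below builds the sampling distribution of X̄ directly, assuming Python 3.8+ and the standard-library `statistics.NormalDist`; the helper `sample_mean_dist` is a hypothetical name used only here. Because z is not rounded to two decimals, the answers can differ from the table-based values in the fourth significant digit.

```python
from math import sqrt
from statistics import NormalDist


def sample_mean_dist(mu: float, sigma: float, n: int) -> NormalDist:
    """Distribution of the sample mean of n draws from N(mu, sigma^2)."""
    return NormalDist(mu=mu, sigma=sigma / sqrt(n))


xbar = sample_mean_dist(mu=20, sigma=2, n=40)  # Exercise 3 (Tutorial 5, No.1)

p_b = xbar.cdf(19)                   # P(X-bar <= 19)
p_c = 1 - xbar.cdf(22)               # P(X-bar > 22)
p_d = xbar.cdf(21.5) - xbar.cdf(19)  # P(19 <= X-bar <= 21.5)

print(f"variance of X-bar = {xbar.variance:.4f}")
print(f"(b) {p_b:.6f}  (c) {p_c:.2e}  (d) {p_d:.6f}")
```

The same helper answers Exercises 2, 5 and 6 by changing `mu`, `sigma` and `n`.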

Exercise 4: (Tutorial 5, No.2)


Let X denote the number of flaws in a 1-inch length of copper wire. The pmf of X is given in the following table:

X=x 0 1 2 3
P(X=x) 0.48 0.39 0.12 0.01

100 wires are sampled from this population. What is the probability that the average number
of flaws per wire in this sample is less than 0.5?

Solution: Given that,


Mean of X: $\mu = 0(0.48) + 1(0.39) + 2(0.12) + 3(0.01) = 0.66$
Variance of X: $\sigma^2 = [0^2(0.48) + 1^2(0.39) + 2^2(0.12) + 3^2(0.01)] - (0.66)^2 = 0.96 - 0.4356 = 0.5244$
For n = 100, the mean of X̄ is 0.66 and the variance of X̄ is 0.5244/100 = 0.005244. By the Central Limit Theorem, X̄ is approximately normal, so
$P(\bar X < 0.5) = P\left(Z < \frac{0.5 - 0.66}{\sqrt{0.005244}}\right) = P(Z < -2.21) = 0.0136$
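The two steps of Exercise 4 (moments of a discrete pmf, then the CLT normal approximation for X̄) can be sketched as follows, assuming Python 3.8+ for `statistics.NormalDist`. Treating X̄ as continuous is the approximation used in the solution, so P(X̄ < 0.5) and P(X̄ ≤ 0.5) are not distinguished here.

```python
from math import sqrt
from statistics import NormalDist

# pmf of X = number of flaws per 1-inch length of wire (Exercise 4)
pmf = {0: 0.48, 1: 0.39, 2: 0.12, 3: 0.01}

mean = sum(x * p for x, p in pmf.items())
second_moment = sum(x * x * p for x, p in pmf.items())
variance = second_moment - mean ** 2  # Var(X) = E[X^2] - (E[X])^2

n = 100
# CLT: X-bar is approximately N(mean, variance / n)
xbar = NormalDist(mu=mean, sigma=sqrt(variance / n))
prob = xbar.cdf(0.5)

print(f"mean = {mean:.2f}, variance = {variance:.4f}, P(X-bar < 0.5) = {prob:.4f}")
```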


Exercise 5: (Tutorial 5, No.3)


At a large university, the mean age of the students is 22.3 years, and the standard deviation is
4 years. A random sample of 64 students is drawn. What is the probability that the average
age of these students is greater than 23 years?

Solution: Given that the mean of X is 22.3 and the variance of X is 16.

For n = 64, the mean of X̄ is 22.3 and the variance of X̄ is 16/64 = 0.25, so its standard deviation is 0.5.

So, $P(\bar X > 23) = P\left(Z > \frac{23 - 22.3}{\sqrt{0.25}}\right) = P(Z > 1.4) = 1 - P(Z \le 1.4)$
$= 1 - \Phi(1.4) = 1 - 0.919 = 0.081$

Exercise 6:
The flexural strength (in MPa) of certain concrete beams is X ~ N(8, 2.25). Find the probability that the sample mean strength of 16 concrete beams lies in the interval (7.55, 8.75).


Exercise 7: (Example 3)
A component part for a jet aircraft engine is manufactured by an investment casting process.
The vane opening on this casting is an important functional parameter of the part.
We will illustrate the use of X̄ and R control charts to assess the statistical stability of
this process. The table presents 20 samples of five parts each. The values given in the table
have been coded by using the last three digits of the dimension; that is, 31.6 should be
0.50316 inch.

Sample Number x1 x2 x3 x4 x5 x̄ r
1 33 29 31 32 33 31.6 4
2 33 31 35 37 31 33.4 6
3 35 37 33 34 36 35.0 4
4 30 31 33 34 33 32.2 4
5 33 34 35 33 34 33.8 2
6 38 37 39 40 38 38.4 3
7 30 31 32 34 31 31.6 4
8 29 39 38 39 39 36.8 10
9 28 33 35 36 43 35.0 15
10 38 33 32 35 32 34.0 6
11 28 30 28 32 31 29.8 4
12 31 35 35 35 34 34.0 4
13 27 32 34 35 37 33.0 10
14 33 33 35 37 36 34.8 4
15 35 37 32 35 39 35.6 7
16 33 33 27 31 30 30.8 6
17 35 34 34 30 32 33.0 5
18 32 33 30 30 33 31.6 3
19 25 27 34 27 28 28.2 9
20 35 35 36 33 30 33.8 6

(a) Construct X̄ and R control charts.


(b) After the process is in control, estimate the process mean and standard deviation.
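Part (a) of Exercise 7 can be organised as a short computation of the trial limits before plotting. The sketch below assumes the standard Shewhart formulas UCL/LCL = x̿ ± A2·R̄ for the X̄ chart and D3·R̄, D4·R̄ for the R chart, with the usual tabulated constants for n = 5 (A2 = 0.577, D3 = 0, D4 = 2.115). The function name `xbar_r_limits` is hypothetical, and plotting is left out.

```python
# Exercise 7 coded vane-opening data: 20 samples of n = 5 parts
samples = [
    [33, 29, 31, 32, 33], [33, 31, 35, 37, 31], [35, 37, 33, 34, 36],
    [30, 31, 33, 34, 33], [33, 34, 35, 33, 34], [38, 37, 39, 40, 38],
    [30, 31, 32, 34, 31], [29, 39, 38, 39, 39], [28, 33, 35, 36, 43],
    [38, 33, 32, 35, 32], [28, 30, 28, 32, 31], [31, 35, 35, 35, 34],
    [27, 32, 34, 35, 37], [33, 33, 35, 37, 36], [35, 37, 32, 35, 39],
    [33, 33, 27, 31, 30], [35, 34, 34, 30, 32], [32, 33, 30, 30, 33],
    [25, 27, 34, 27, 28], [35, 35, 36, 33, 30],
]

# Tabulated control-chart constants for subgroups of size n = 5
A2, D3, D4 = 0.577, 0.0, 2.115


def xbar_r_limits(data, a2, d3, d4):
    """Return sample means, ranges and (LCL, CL, UCL) for the X-bar and R charts."""
    means = [sum(s) / len(s) for s in data]
    ranges = [max(s) - min(s) for s in data]
    xbarbar = sum(means) / len(means)
    rbar = sum(ranges) / len(ranges)
    xbar_limits = (xbarbar - a2 * rbar, xbarbar, xbarbar + a2 * rbar)
    r_limits = (d3 * rbar, rbar, d4 * rbar)
    return means, ranges, xbar_limits, r_limits


means, ranges, (x_lcl, x_cl, x_ucl), (r_lcl, r_cl, r_ucl) = xbar_r_limits(samples, A2, D3, D4)

# Sample numbers (1-based) that plot outside the trial limits
x_out = [i + 1 for i, m in enumerate(means) if not x_lcl <= m <= x_ucl]
r_out = [i + 1 for i, r in enumerate(ranges) if not r_lcl <= r <= r_ucl]

print(f"X-bar chart: LCL={x_lcl:.2f}, CL={x_cl:.2f}, UCL={x_ucl:.2f}, out: {x_out}")
print(f"R chart:     LCL={r_lcl:.2f}, CL={r_cl:.2f}, UCL={r_ucl:.2f}, out: {r_out}")
```

For part (b), the flagged samples would be dropped, the limits recomputed until no point falls outside, and σ̂ taken as R̄/d2.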


Exercise 8: (Tutorial 5, No.4)

The overall length of a skew used in a knee replacement device is monitored using X̄ and R
charts. The following table gives the length for 20 samples of size 4. (Measurements are
coded from 2.00 mm; that is, 15 is 2.15 mm.)

                 Observation                          Observation
Sample   1    2    3    4          Sample   1    2    3    4
1        16   18   15   13         11       14   14   15   13
2        16   15   17   16         12       15   13   15   16
3        15   16   20   16         13       13   17   16   15
4        14   16   14   12         14       11   14   14   21
5        14   15   13   16         15       14   15   14   13
6        16   14   16   15         16       18   15   16   14
7        16   16   14   15         17       14   16   19   16
8        17   13   17   16         18       16   14   13   19
9        15   11   13   16         19       17   19   17   13
10       15   18   14   13         20       12   15   12   17

(i) Using all the data, find trial control limits for X̄ and R charts, construct the charts, and
plot the data.
(ii) Use the trial control limits from part (i) to identify out-of-control points. If necessary,
revise your control limits, assuming that any samples that plot outside the control limits
can be eliminated.
(iii) Assuming that the process is in control, estimate the process mean and process standard
deviation.


Solution:

(i) The trial control limits are as follows.


(ii) Based on the control charts, there is a single sample beyond the
control limits: sample 14 is above the upper control limit on the R
chart.
With sample 14 removed, the revised control limits and charts are as follows.


All points are within the control limits. The process is said to be in
statistical control.

(iii) The estimated process mean is x̿ = 15.14.

The estimated process standard deviation is σ̂ = R̄/d2 = 3.895/2.059 = 1.892, where d2 = 2.059 for samples of size 4.
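The solution to Exercise 8 can be re-derived from the table with the sketch below. It assumes the standard constants for n = 4 (A2 = 0.729, D3 = 0, D4 = 2.282, d2 = 2.059) and reads the coded lengths from the table row by row; the helpers `limits` and `out_of_control` are hypothetical names. It computes the trial limits, drops any sample outside them, recomputes the limits, and estimates σ as R̄/d2.

```python
# Exercise 8 skew lengths (coded from 2.00 mm), 20 samples of n = 4
samples = {
    1: [16, 18, 15, 13], 2: [16, 15, 17, 16], 3: [15, 16, 20, 16], 4: [14, 16, 14, 12],
    5: [14, 15, 13, 16], 6: [16, 14, 16, 15], 7: [16, 16, 14, 15], 8: [17, 13, 17, 16],
    9: [15, 11, 13, 16], 10: [15, 18, 14, 13], 11: [14, 14, 15, 13], 12: [15, 13, 15, 16],
    13: [13, 17, 16, 15], 14: [11, 14, 14, 21], 15: [14, 15, 14, 13], 16: [18, 15, 16, 14],
    17: [14, 16, 19, 16], 18: [16, 14, 13, 19], 19: [17, 19, 17, 13], 20: [12, 15, 12, 17],
}

# Tabulated constants for subgroups of size n = 4
A2, D3, D4, d2 = 0.729, 0.0, 2.282, 2.059


def limits(data):
    """Grand mean, mean range and X-bar / R control limits for a dict of samples."""
    xbarbar = sum(sum(s) / len(s) for s in data.values()) / len(data)
    rbar = sum(max(s) - min(s) for s in data.values()) / len(data)
    return {
        "xbarbar": xbarbar,
        "rbar": rbar,
        "xbar": (xbarbar - A2 * rbar, xbarbar + A2 * rbar),
        "r": (D3 * rbar, D4 * rbar),
    }


def out_of_control(data, lim):
    """Sample numbers whose mean or range falls outside the given limits."""
    bad = []
    for k, s in data.items():
        m, r = sum(s) / len(s), max(s) - min(s)
        in_x = lim["xbar"][0] <= m <= lim["xbar"][1]
        in_r = lim["r"][0] <= r <= lim["r"][1]
        if not (in_x and in_r):
            bad.append(k)
    return bad


trial = limits(samples)
flagged = out_of_control(samples, trial)
revised_data = {k: s for k, s in samples.items() if k not in flagged}
revised = limits(revised_data)
sigma_hat = revised["rbar"] / d2  # sigma-hat = R-bar / d2

print(f"trial: CL={trial['xbarbar']:.4f}, R-bar={trial['rbar']:.2f}, flagged samples {flagged}")
print(f"revised: X-bar limits ({revised['xbar'][0]:.3f}, {revised['xbar'][1]:.3f}), "
      f"R limits ({revised['r'][0]:.3f}, {revised['r'][1]:.3f})")
print(f"process mean = {revised['xbarbar']:.2f}, sigma-hat = {sigma_hat:.3f}")
```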


Exercise 9:
The thickness of a printed circuit board (PCB) is an important quality parameter. Data on
board thickness (in cm) are given below for 25 samples of three boards each.

Sample 1 2 3 Sample 1 2 3
1 0.0629 0.0636 0.0640 14 0.0645 0.0640 0.0631
2 0.0630 0.0631 0.0622 15 0.0619 0.0644 0.0632
3 0.0628 0.0631 0.0633 16 0.0631 0.0627 0.0630
4 0.0634 0.0630 0.0631 17 0.0616 0.0623 0.0631
5 0.0619 0.0628 0.0630 18 0.0630 0.0630 0.0626
6 0.0613 0.0629 0.0634 19 0.0636 0.0631 0.0629
7 0.0630 0.0639 0.0625 20 0.0640 0.0635 0.0629
8 0.0628 0.0627 0.0622 21 0.0628 0.0625 0.0616
9 0.0623 0.0626 0.0633 22 0.0615 0.0625 0.0619
10 0.0631 0.0631 0.0633 23 0.0630 0.0632 0.0630
11 0.0635 0.0630 0.0638 24 0.0635 0.0629 0.0635
12 0.0623 0.0630 0.0630 25 0.0623 0.0629 0.0630
13 0.0635 0.0631 0.0630

(i) Using all the data, find trial control limits for X̄ and R charts, construct the charts, and
plot the data.
(ii) Use the trial control limits from part (i) to identify out-of-control points. If necessary,
revise your control limits, assuming that any samples that plot outside the control limits
can be eliminated.


(iii) Assuming that the process is in control, estimate the process mean and process standard
deviation.

6 Hypothesis Testing - One Population
Exercise 1: (Example 1)
A manufacturer of sprinkler systems used for fire protection in office buildings claims that
the true average system-activation temperature is 130°F. A sample of 9 systems, when tested,
yields an average activation temperature of 131.08°F. If the distribution of activation
temperatures is normal with standard deviation 1.5°F, do the data contradict the firm's claim
at significance level α = 0.01? What is the P-value for this test?

Exercise 2: (Example 2)
A random sample of 50 battery packs is selected and subjected to a life test. The average life
of these batteries is 4.05 hours. Assume that the battery life is normally distributed with
standard deviation equals 0.2 hour. Is there evidence to support the claim that mean battery
life exceeds 4 hours? Use α = 0.05. What is the P-value for this test?


Exercise 3:
A new curing process developed for a certain type of cement results in a mean compressive
strength of 5000 kilograms per square centimeter with a standard deviation of 120 kilograms
per square centimeter; strength follows a normal distribution. To test the null hypothesis
that µ = 5000 against the alternative that µ < 5000, a random sample of 50 pieces of cement
is observed. The critical region is defined to be X̄ < 4970.

(a) Find the probability of committing a type I error when H0 is true.


(b) Evaluate β (the probability of type II error) if µ = 4960
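Both error probabilities in Exercise 3 are normal tail areas of X̄ with standard error σ/√n, evaluated under different true means. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 5000, 120, 50  # H0 mean, known sigma, sample size (Exercise 3)
critical = 4970                # reject H0 when X-bar < 4970
se = sigma / sqrt(n)           # standard error of X-bar

# (a) Type I error: P(X-bar < 4970 | mu = 5000)
alpha = NormalDist(mu0, se).cdf(critical)

# (b) Type II error: P(X-bar >= 4970 | mu = 4960)
beta = 1 - NormalDist(4960, se).cdf(critical)

print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```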

Exercise 4: (Tutorial 6, No.1)


A civil engineer is analyzing the compressive strength of concrete. Compressive strength is
approximately normally distributed with variance σ² = 1000 psi². A random sample of 12
specimens has a mean compressive strength of x̄ = 3255.42 psi.
(a) Test the hypothesis that the mean compressive strength is 3500 psi. Use a fixed-level
test with α = 0.01;
(b) What is the smallest level of significance at which you would be willing to reject the
null hypothesis?;
(c) Construct a 95% two-sided CI on mean compressive strength; and
(d) Construct a 99% two-sided CI on mean compressive strength. Compare the width of
this confidence interval with the width of the one in part (c). What is your comment?

Solution:

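(a) (i) The parameter of interest is the true mean compressive strength, μ.
(ii) The hypotheses are:
H0: µ = 3500 psi
vs
H1: µ ≠ 3500 psi
(iii) The significance level α = 0.01
(iv) The test statistic is:
$z_0 = \dfrac{\bar x - \mu_0}{\sigma/\sqrt{n}}$
Computation:
$\bar x = 3255.42,\ \sigma = \sqrt{1000} = 31.62,\ n = 12$
$\Rightarrow z_0 = \dfrac{3255.42 - 3500}{31.62/\sqrt{12}} = -26.79$
(v) Decision: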


Reject H0 if z0 <- z/2 where z0.005 = 2.58 or z0 > z/2 where z0.005 = 2.58

(vi) Result and conclusion:

Since −26.79 < −2.58, we reject the null hypothesis and conclude that the true mean
compressive strength is significantly different from 3500 psi at α = 0.01.

(b) The smallest level of significance at which we would be willing to reject the null
hypothesis is the P-value: $P = 2[1 - \Phi(|z_0|)] = 2[1 - \Phi(26.79)] \approx 2[1 - 1] = 0$.
(c) A 95% two-sided CI on mean compressive strength is
$\bar x - z_{0.025}\frac{\sigma}{\sqrt n} \le \mu \le \bar x + z_{0.025}\frac{\sigma}{\sqrt n}$
$3255.42 - 1.96\frac{31.62}{\sqrt{12}} \le \mu \le 3255.42 + 1.96\frac{31.62}{\sqrt{12}}$
$3237.53 \le \mu \le 3273.31$
With 95% confidence, we believe the true mean compressive strength is between
3237.53 psi and 3273.31 psi.

(d) A 99% two-sided CI on mean compressive strength is
$3255.42 - 2.58\frac{31.62}{\sqrt{12}} \le \mu \le 3255.42 + 2.58\frac{31.62}{\sqrt{12}}$
$3231.87 \le \mu \le 3278.97$
With 99% confidence, we believe the true mean compressive strength is between
3231.87 psi and 3278.97 psi.

The 99% confidence interval is wider than the 95% confidence interval.
We can conclude that, when x̄, σ², and n are held constant, a higher confidence level
always results in a wider confidence interval.
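The z-test and both intervals of Exercise 4 can be reproduced with the sketch below, assuming Python 3.8+ for `statistics.NormalDist`; `z_interval` is a hypothetical helper. It uses exact normal quantiles (z ≈ 2.576 rather than 2.58 for 99%), so its 99% limits, about (3231.91, 3278.93), differ slightly from a hand computation with rounded z; the comparison of widths is unaffected.

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma2, n, mu0 = 3255.42, 1000, 12, 3500
se = sqrt(sigma2) / sqrt(n)  # sigma / sqrt(n)
std_normal = NormalDist()

z0 = (xbar - mu0) / se
p_value = 2 * (1 - std_normal.cdf(abs(z0)))  # two-sided P-value


def z_interval(level):
    """Two-sided CI on mu with known sigma at the given confidence level."""
    z = std_normal.inv_cdf(1 - (1 - level) / 2)
    return xbar - z * se, xbar + z * se


ci95, ci99 = z_interval(0.95), z_interval(0.99)
print(f"z0 = {z0:.2f}, P-value = {p_value:.3g}")
print(f"95% CI: ({ci95[0]:.2f}, {ci95[1]:.2f}), width {ci95[1] - ci95[0]:.2f}")
print(f"99% CI: ({ci99[0]:.2f}, {ci99[1]:.2f}), width {ci99[1] - ci99[0]:.2f}")
```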


Exercise 5: (Example 3)
A new process for producing synthetic diamonds can be operated at a profitable level only if
the average weight of the diamonds is greater than 0.5 karat. To evaluate the profitability of
the process, six diamonds are generated with recorded weights 0.46, 0.61, 0.52, 0.48, 0.57
and 0.54 karat.
(a) At the 5% significance level, do the six measurements present sufficient evidence that
the average weight of the diamonds produced by the process is in excess of 0.5 karat?
(b) Use the P-value approach to test the null hypothesis.
(c) Construct a 95% CI on the average weight of diamonds.
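For part (a) of Exercise 5, a one-sample t statistic is the natural tool since σ is unknown. The sketch below assumes the hypotheses H0: µ = 0.5 versus H1: µ > 0.5 implied by the wording, and takes the critical value t0.05,5 = 2.015 from a t table rather than computing it.

```python
from math import sqrt
from statistics import mean, stdev

weights = [0.46, 0.61, 0.52, 0.48, 0.57, 0.54]  # karat, Exercise 5
mu0 = 0.5

n = len(weights)
xbar = mean(weights)
s = stdev(weights)  # sample standard deviation (divisor n - 1)
t0 = (xbar - mu0) / (s / sqrt(n))

# Upper-tail critical value t_{0.05, 5} = 2.015, taken from a t table
t_crit = 2.015
print(f"x-bar = {xbar:.3f}, s = {s:.4f}, t0 = {t0:.3f}, reject H0: {t0 > t_crit}")
```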

Exercise 6: (Tutorial 6, No.2)


A cigarette company claims that its cigarettes contain an average of only 10 mg of tar.
A random sample of 25 cigarettes shows the average tar content to be 12.5 mg with a
standard deviation of 4.5 mg.

(a) Construct a hypothesis test to determine whether the average tar content of the
cigarettes exceeds 10 mg, using the P-value approach;
(b) Construct a 95% two-sided CI on the average tar content of the cigarettes.

Solution:
(a) (i) The parameter of interest is the true mean tar content, μ.
(ii) The hypotheses are:
H0: µ = 10 mg
vs
H1: µ > 10 mg
(iii) The significance level α = 0.05
(iv) The test statistic is:


$t_0 = \dfrac{\bar x - \mu_0}{s/\sqrt{n}} = \dfrac{12.5 - 10}{4.5/\sqrt{25}} = 2.778$

(v) Decision:
Reject H0 if P-value is smaller than 0.05

(vi) Conclusion:
From a t-distribution table with 24 degrees of freedom, t0 = 2.778 falls between two
values: 2.492, for which α = 0.01, and 2.797, for which α = 0.005. So the P-value
satisfies 0.005 < P < 0.01. Since P < 0.05, we reject H0 and conclude that the mean tar
content of the cigarettes exceeds 10 mg.

(b) A 95% two-sided CI on mean tar content is

$\bar x = 12.5,\ s = 4.5,\ n = 25,\ t_{\alpha/2,n-1} = t_{0.025,24} = 2.064$

$\bar x - t_{\alpha/2,n-1}\frac{s}{\sqrt n} \le \mu \le \bar x + t_{\alpha/2,n-1}\frac{s}{\sqrt n}$
$12.5 - (2.064)\frac{4.5}{\sqrt{25}} \le \mu \le 12.5 + (2.064)\frac{4.5}{\sqrt{25}}$
$10.642 \le \mu \le 14.358$

Exercise 7: (Example 4)
Regardless of age, about 20% of Malaysian adults participate in fitness activities at least
twice a week. In a local survey of 100 adults over 40 years old, a total of 15 people indicated
that they participated in a fitness activity at least twice a week.
(a) Do these data indicate that the participation rate for adults over 40 years of age is
significantly less than 20%? Carry out a test at the 10% significance level and draw an
appropriate conclusion.
(b) Construct a 95% two-sided CI on the participation rate.


Exercise 8: (Tutorial 6, No.3)


A survey done one year ago showed that 45% of the population participated in recycling
programs. In a recent poll a random sample of 1250 people showed that 588 participate in
recycling programs.
(a) Test the hypothesis that the proportion of the population who participate in
recycling programs is greater than it was one year ago. Use a 5% significance level.
(b) Construct a 95% two-sided CI on the proportion.

Solution:

(a) (i) The parameter of interest is the proportion of the population who
participate in recycling programs, p.
(ii) The hypotheses are:
H0: p = 0.45
vs
H1: p > 0.45
(iii) The significance level α = 0.05
(iv) The test statistic is:
$z_0 = \dfrac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}} = \dfrac{588/1250 - 0.45}{\sqrt{(0.45)(0.55)/1250}} = 1.449$, where $\hat p = X/n$.
(v) Decision:
Reject H0 if z0 > zα, where zα = z0.05 = 1.645.
(vi) Conclusion:
Since 1.449 < 1.645, we do not reject the null hypothesis. At the 0.05 level of
significance, there is insufficient evidence to conclude that the proportion of the
population who participate in recycling programs has increased above 0.45.

(b) A 95% two-sided CI on p is

$\hat p - z_{\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}} \le p \le \hat p + z_{\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}}$
$0.47 - 1.96\sqrt{\frac{(0.47)(0.53)}{1250}} \le p \le 0.47 + 1.96\sqrt{\frac{(0.47)(0.53)}{1250}}$
$0.442 \le p \le 0.498$

Since p = 0.45 lies inside the interval, the data are consistent with p = 0.45, which agrees with the conclusion in part (a).
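The test and interval of Exercise 8 differ in which proportion goes into the standard error: the test uses p0 (its value under H0), while the interval uses p̂. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; it does not round p̂ to 0.47, so the lower limit comes out near 0.443 instead of 0.442.

```python
from math import sqrt
from statistics import NormalDist

x, n, p0 = 588, 1250, 0.45
p_hat = x / n
std_normal = NormalDist()

z0 = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # standard error under H0 uses p0
p_value = 1 - std_normal.cdf(z0)             # upper-tailed test

z = std_normal.inv_cdf(0.975)
half_width = z * sqrt(p_hat * (1 - p_hat) / n)  # CI standard error uses p-hat
ci = (p_hat - half_width, p_hat + half_width)

print(f"p-hat = {p_hat:.4f}, z0 = {z0:.3f}, P-value = {p_value:.4f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```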
Exercise 9:
An Ipoh city council member gave a speech in which she said that 18% of all private homes in
the city had been undervalued by the county tax assessor's office. In a follow-up story, the
local newspaper reported that it had taken a random sample of 91 private homes. Using a
professional evaluator to evaluate the properties and checking against county tax records, it
found that 14 of the homes had been undervalued.
(i) Do these data indicate that the proportion of private homes that are undervalued by the
county tax assessor is different from 18%? Use a 5% significance level.
(ii) Construct a 95% two-sided CI on the proportion.

Exercise 10: (Example 5)


Engineers designing the front-wheel-drive half shaft of a new model automobile claim that
the variance in the displacement of the constant velocity joints of the shaft is less than 1.5
mm. Twenty simulations were conducted and the following results were obtained: x̄ = 3.39 and
s = 1.41.
(i) At α = 0.05, do these data support the claim of the engineers?
(ii) What is the P-value for this test?
(iii) Construct a two-sided CI for σ.


Exercise 11: (Tutorial 6, No.4)


Aerospace engineers claim that the standard deviation of the percentage in an alloy used
in aerospace castings is greater than 0.3. Fifty-one parts were randomly selected, and the
sample standard deviation of the percentage was s = 0.37.
(i) At α = 0.05, do these data support the claim of the engineers?
(ii) What is the P-value for this test?
(iii) Construct a 95% two-sided CI for σ. What is your conclusion?

Solution:
(i) (a) The parameter of interest is the population variance σ².
(b) The hypotheses are:
H0: σ² = (0.3)²
vs
H1: σ² > (0.3)²
(c) The significance level α = 0.05
(d) The test statistic is:
$\chi_0^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{50(0.37)^2}{(0.3)^2} = 76.056$
(e) Decision:
Reject H0 if $\chi_0^2 > \chi^2_{0.05,50} = 67.50$
(f) Conclusion:
Since 76.056 > 67.50, we reject the null hypothesis and conclude that
the engineers' claim is supported at the 0.05 level of significance.
(ii) From the χ² table, $\chi^2_{0.025,50} = 71.42$ and $\chi^2_{0.01,50} = 76.15$. Since
71.42 < 76.056 < 76.15, the P-value satisfies 0.01 < P < 0.025. Because the P-value is
smaller than α = 0.05, we reject the null hypothesis, in agreement with part (i).
(iii) A 95% two-sided CI on σ² is


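$\dfrac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}} \le \sigma^2 \le \dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}$
$\dfrac{50(0.37)^2}{71.42} \le \sigma^2 \le \dfrac{50(0.37)^2}{32.36}$
$0.0958 \le \sigma^2 \le 0.2115$, so $0.310 \le \sigma \le 0.460$
Since the whole interval for σ lies above 0.3, it supports the engineers' claim that σ > 0.3.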
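The χ² test and the variance interval of Exercise 11 reduce to a few lines of arithmetic once the table values are fixed. In the sketch below the percentage points χ²0.05,50 = 67.50, χ²0.025,50 = 71.42 and χ²0.975,50 = 32.36 are hard-coded from a χ² table (Python's standard library has no χ² quantile function); taking square roots of the limits gives the interval for σ.

```python
from math import sqrt

n, s, sigma0 = 51, 0.37, 0.3
chi2_0 = (n - 1) * s ** 2 / sigma0 ** 2

# Chi-square table values for 50 degrees of freedom (upper-tail percentage points)
chi2_05 = 67.50   # upper 0.05 point, critical value for the test
chi2_025 = 71.42  # upper 0.025 point
chi2_975 = 32.36  # upper 0.975 point

reject = chi2_0 > chi2_05
var_ci = ((n - 1) * s ** 2 / chi2_025, (n - 1) * s ** 2 / chi2_975)
sd_ci = tuple(sqrt(v) for v in var_ci)

print(f"chi2_0 = {chi2_0:.3f}, reject H0: {reject}")
print(f"95% CI for sigma^2: ({var_ci[0]:.4f}, {var_ci[1]:.4f})")
print(f"95% CI for sigma:   ({sd_ci[0]:.3f}, {sd_ci[1]:.3f})")
```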

Exercise 12:
Scientists claim that the variance of the sugar content of the syrup in canned peaches is
18 mg². A random sample of 10 cans yields a sample standard deviation of 4.8 mg.
(i) At α = 0.05, do these data support the claim of the scientists?
(ii) What is the P-value for this test?
(iii) Construct a 95% two-sided CI for σ. What is your conclusion?

7 Hypothesis Testing - Two Populations
Exercise 1: (Example 1)
A random sample of size n1 = 25, taken from a normal population with σ1 = 5.2, has a mean
equal to 81. A second random sample of size n2 = 36, taken from a different normal population
with σ2 = 3.4, has a mean equal to 76.
(a) Do the data indicate that the true means µ1 and µ2 are different? Carry out
a test at α = 0.01.
(b) Find a 90% CI on the difference in means µ1 − µ2.
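With both population standard deviations known, Exercise 1 calls for a two-sample z statistic, and the same standard error gives the 90% interval. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

n1, sigma1, xbar1 = 25, 5.2, 81
n2, sigma2, xbar2 = 36, 3.4, 76
std_normal = NormalDist()

se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
z0 = (xbar1 - xbar2) / se                    # H0: mu1 - mu2 = 0
p_value = 2 * (1 - std_normal.cdf(abs(z0)))  # two-sided

z90 = std_normal.inv_cdf(0.95)  # z_{0.05}, used for a 90% two-sided CI
ci90 = (xbar1 - xbar2 - z90 * se, xbar1 - xbar2 + z90 * se)
print(f"z0 = {z0:.3f}, P-value = {p_value:.2e}, 90% CI = ({ci90[0]:.2f}, {ci90[1]:.2f})")
```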

Exercise 2: (Example 2)
Two machines are used for filling plastic bottles with a net volume of 16.0 oz. The fill
volume can be assumed normal, with σ1 = 0.02 and σ2 = 0.025. A member of the quality
engineering staff suspects that both machines fill to the same mean net volume, whether or
not this volume is 16.0 oz. A random sample of 10 bottles is taken from the output of each
machine with the following results:
(a) Do you think the engineer is correct? Use the P-value approach.
(b) Find a 95% CI on the difference in means.


Exercise 3: (Tutorial 7, No.1)


Two machines are used to fill plastic bottles with dishwashing detergent. The standard
deviations of fill volume are known to be σ1 = 0.01 and σ2 = 0.15 fluid ounce for the two
machines, respectively. Two random samples of n1 = 12 bottles from machine 1 and n2 = 10
bottles from machine 2 are selected, and the sample mean fill volumes are x̄1 = 30.61 and
x̄2 = 30.24 fluid ounces. Assume normality.
(i) Test the hypothesis that both machines fill to the same mean volume. Use the P-value
approach;
(ii) Construct a 90% two-sided CI on the mean difference in fill volume; and
(iii) Construct a 95% two-sided CI on the mean difference in fill volume. Compare and
comment on the width of this interval relative to the width of the interval in part (ii).

Exercise 4: (Example 3)
To find out whether a new serum will arrest leukemia, 9 mice, all with an advanced stage of
the disease, are selected. Five mice receive the treatment and 4 do not. Survival times, in
years, from the time the experiment commenced are as follows:

Treatment 2.1 5.3 1.4 4.6 0.9

No treatment 1.9 0.5 2.8 3.1

At the 0.05 level of significance, can the serum be said to be effective? Assume the two
distributions have equal variances.
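Under the equal-variance assumption, Exercise 4 uses the pooled two-sample t statistic. The sketch below interprets "effective" as the one-sided alternative µ_treatment > µ_no treatment (an assumption about the intended hypotheses) and takes t0.05,7 = 1.895 from a t table.

```python
from math import sqrt
from statistics import mean, variance

treatment = [2.1, 5.3, 1.4, 4.6, 0.9]
control = [1.9, 0.5, 2.8, 3.1]
n1, n2 = len(treatment), len(control)

# Pooled variance, valid under the equal-variance assumption
sp2 = ((n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
t0 = (mean(treatment) - mean(control)) / sqrt(sp2 * (1 / n1 + 1 / n2))

# One-sided H1: mu_treatment > mu_control; t_{0.05, 7} = 1.895 from a t table
t_crit = 1.895
print(f"sp^2 = {sp2:.4f}, t0 = {t0:.3f}, reject H0: {t0 > t_crit}")
```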


Exercise 5: (Tutorial 7, No.2)


A new policy regarding overtime pay was implemented. This policy decreased the pay factor
for overtime work. Neither the staffing pattern nor the work loads changed. To determine if
overtime loads changed under the policy, a random sample of employees was selected. Their
overtime hours for a randomly selected week before and for another randomly selected week
after the policy change were recorded as follows:

Employees: 1 2 3 4 5 6 7 8 9 10 11 12
Before: 5 4 2 8 10 4 9 3 6 0 1 5
After: 3 7 5 3 7 4 4 1 2 3 2 2

Assume that the two population variances are equal and the underlying population is
normally distributed.
(i) Is there any evidence to support the claim that the average number of hours worked as
overtime per week changed after the policy went into effect? Use the P-value approach in
arriving at this conclusion.
(ii) Construct a 95% CI for the difference in mean before and after the policy change.
Interpret this interval.

Exercise 6:
The diameter of steel rods manufactured on two different extrusion machines is being
investigated. Two random samples of sizes n1 = 15 and n2 = 17 are selected, and the sample
means and sample variances are x̄1 = 8.37, s1² = 0.35, x̄2 = 8.68 and s2² = 0.40,
respectively. Assume that the data are drawn from normal distributions with equal variances.

(a) Is there evidence to support the claim that the two machines produce rods with
different mean diameters? Use the P-value approach.
(b) Construct a 95% CI on the difference in mean rod diameter.

Exercise 7: (Example 4)


The following data represent the running times of films produced by 2 motion-picture
companies. Test the hypothesis that the average running time of films produced by company
2 exceeds the average running time of films produced by company 1 by 10 minutes, against
the one-sided alternative that the difference is less than 10 minutes. Use α = 0.01 and assume
the distributions of times to be approximately normal with unequal variances.

Company   Running time (minutes)
1         102 86 98 109 92
2         81 165 97 134 92 87 114
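Exercise 7 needs the unequal-variance (Welch) t statistic with a hypothesized difference of 10 minutes, plus the Welch-Satterthwaite degrees of freedom. The sketch below rounds ν down to 7 and takes the lower-tail critical value −t0.01,7 = −2.998 from a t table; both are assumptions of this sketch, not computed.

```python
from math import sqrt
from statistics import mean, variance

company1 = [102, 86, 98, 109, 92]
company2 = [81, 165, 97, 134, 92, 87, 114]
delta0 = 10  # H0: mu2 - mu1 = 10 versus H1: mu2 - mu1 < 10

v1 = variance(company1) / len(company1)  # s1^2 / n1
v2 = variance(company2) / len(company2)  # s2^2 / n2
t0 = (mean(company2) - mean(company1) - delta0) / sqrt(v1 + v2)

# Welch-Satterthwaite approximate degrees of freedom
df = (v1 + v2) ** 2 / (v1 ** 2 / (len(company1) - 1) + v2 ** 2 / (len(company2) - 1))

t_crit = -2.998  # -t_{0.01, 7} from a t table (df rounded down to 7)
reject = t0 < t_crit
print(f"t0 = {t0:.3f}, df = {df:.2f}, reject H0: {reject}")
```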

Exercise 8:
Two companies manufacture a rubber material intended for use in an automotive application.
Twenty-five samples of material from each company are tested, and the amount of wear after
1000 cycles is observed. For company 1, the sample mean and standard deviation of wear are
x̄1 = 20.12 mg/1000 cycles and s1 = 1.9 mg/1000 cycles,
and for company 2, we obtain x̄2 = 11.64 mg/1000 cycles and s2 = 7.9 mg/1000 cycles.

(a) Do the sample data support the claim that the two companies produce material with
different mean wear? Assume each population is normally distributed but with unequal
variances.
(b) Construct a 95% CI for the difference in mean wear of these two companies.
Interpret this interval.

Exercise 9: (Tutorial 7, No.3)


Professor A claims that a probability and statistics student can increase his or her score on
tests if the person is provided with a pre-test the week before the exam. To test her theory, she
selected 16 probability and statistics students at random and gave these students a pre-test the
week before an exam. She also selected an independent random sample of 12 students who
were given the same exam but did not have access to the pre-test. The first group had a mean
score of 79.4 with standard deviation 8.8. The second group had a sample mean score of 71.2
with standard deviation 7.9.
(i) Do the data support Professor A's claim that the mean score of students who get a pre-test
is different from the mean score of those who do not get a pre-test before an exam?
Use the P-value approach and assume that the variances are not equal.
(ii) Construct a 95% CI for the difference in mean score of students who get a pre-test and
those who do not get a pre-test before an exam. Interpret this interval.

Exercise 10: (Example 5)


A vote is to be taken among residents of a town and the surrounding county to determine
whether a proposed chemical plant should be constructed. If 120 of 200 town voters favour
the proposal and 240 of 500 county residents favour it, would you agree that the proportion
of town voters favouring the proposal is higher than the proportion of county voters? Use
α = 0.05.
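For Exercise 10, the usual test of H0: p1 = p2 against H1: p1 > p2 pools the two samples to estimate the common proportion under H0. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 120, 200  # town voters in favour
x2, n2 = 240, 500  # county voters in favour
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)  # pooled estimate under H0: p1 = p2

z0 = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
p_value = 1 - NormalDist().cdf(z0)  # upper-tailed: H1: p1 > p2
print(f"z0 = {z0:.3f}, P-value = {p_value:.4f}")
```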

Exercise 11: (Tutorial 7, No.4)


The rollover rate of sport utility vehicles is a transportation safety issue. Safety advocates
claim that manufacturer A's vehicle has a higher rollover rate than that of manufacturer
B. One hundred crashes for each of these vehicles were examined. The sample rollover rates were
p̂A = 0.35 and p̂B = 0.25.
(i) Using the P-value approach, does manufacturer A's vehicle have a higher rollover rate
than manufacturer B's?
(ii) Construct a 95% CI on the difference in the two rollover rates. Interpret
this interval.

Exercise 12:
Professor Rady gave 58 A's and B's to a class of 125 students in his section of English 101.
The next term, Professor Hady gave 45 A's and B's to a class of 115 students in his section of
English 101.
(i) Using a 5% significance level, test the claim that Professor Rady gives a higher
percentage of A's and B's in English 101 than Professor Hady does. Comment on the result.
(ii) Construct a 95% CI on the difference in the percentage of A's and B's in English 101
given by these two professors.

8 Simple Linear Regression
Exercise 1: (Example 1)
The manager of a car plant wishes to investigate how the plant's electricity usage depends
upon the plant production. The data are given below.

Production (RM million) (x):  4.51 3.58 4.31 5.06 5.64 4.99 5.29 5.83 4.7  5.61 4.9  4.2
Electricity usage (y):        2.48 2.26 2.47 2.77 2.99 3.05 3.18 3.46 3.03 3.26 2.67 2.53

(a) Estimate the linear regression equation Y = β0 + β1x.
(b) Estimate the electricity usage when x = 5.
(c) Find a 90% confidence interval for the electricity usage.
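Parts (a) and (b) of Exercise 1 follow from the least-squares formulas β̂1 = Sxy/Sxx and β̂0 = ȳ − β̂1x̄. The sketch below computes them in plain Python; `least_squares` is a hypothetical helper name, and the interval in part (c) would additionally need σ̂² = SSE/(n − 2) and a t quantile.

```python
production = [4.51, 3.58, 4.31, 5.06, 5.64, 4.99, 5.29, 5.83, 4.7, 5.61, 4.9, 4.2]
electricity = [2.48, 2.26, 2.47, 2.77, 2.99, 3.05, 3.18, 3.46, 3.03, 3.26, 2.67, 2.53]


def least_squares(x, y):
    """Return (b0, b1) for the fitted line y-hat = b0 + b1 * x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


b0, b1 = least_squares(production, electricity)
print(f"y-hat = {b0:.4f} + {b1:.4f} x;  y-hat(5) = {b0 + b1 * 5:.3f}")
```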

Exercise 2:
An experiment was set up to investigate the variation of the specific heat of a certain
chemical with temperature. The data are given below (two observations of specific heat at each temperature).

Temperature (°F) (x):  50    60    70    80    90    100
Specific heat (y):     1.60  1.63  1.67  1.70  1.71  1.71
                       1.64  1.65  1.67  1.72  1.72  1.74

(a) Estimate the linear regression equation Y = β0 + β1x.
(b) Plot the results on a scatter diagram.
(c) Estimate the specific heat when the temperature is 75°F.
(d) Find a 95% confidence interval for the specific heat.

Exercise 3: (Example 2)


An engineer at a semiconductor company wants to model the relationship between the device
HFE (y) and the parameter Emitter-RS (x1). Data for Emitter-RS were first collected, a
statistical analysis was carried out, and the output is displayed in the table below.

Regression Analysis: y = 1075.2 – 63.87x1


Predictor Coef SE Coef T P-value
Constant 1075.2 121.1 8.88 0.000
x1 -63.87 8.002 -7.98 0.000
S = 19.4 R-Sq = 0.78

Analysis of variance
Source DF SS MS F
Regression 1 23965 23965 63.70
Residual 18 6772 376
Total 19 30737

(a) Estimate HFE when the Emitter-RS is 14.5.
(b) Obtain a 95% confidence interval for the true slope β1.
(c) Test for significance of regression at α = 0.05.
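All three parts of Exercise 3 can be answered from the printed output alone. The sketch below hard-codes the coefficients, the slope's standard error and the mean squares from the table, and takes t0.025,18 = 2.101 and f0.05,1,18 = 4.41 from statistical tables. Because the printed mean squares are rounded, MSR/MSE gives about 63.74 rather than the printed 63.70; for a single regressor F0 should equal t0² up to such rounding.

```python
b0, b1 = 1075.2, -63.87      # fitted coefficients from the output
se_b1 = 8.002                # standard error of the slope
ms_reg, ms_res = 23965, 376  # mean squares from the ANOVA table
t_025_18 = 2.101             # t_{0.025, 18} from a t table (18 residual df)
f_crit = 4.41                # f_{0.05, 1, 18} from an F table

y_hat = b0 + b1 * 14.5                                     # (a) point estimate at x = 14.5
ci_slope = (b1 - t_025_18 * se_b1, b1 + t_025_18 * se_b1)  # (b) 95% CI on beta1
f0 = ms_reg / ms_res                                       # (c) F statistic, 1 and 18 df
t0 = b1 / se_b1

print(f"y-hat(14.5) = {y_hat:.2f}")
print(f"95% CI on slope: ({ci_slope[0]:.2f}, {ci_slope[1]:.2f})")
print(f"F0 = {f0:.2f}, t0^2 = {t0 ** 2:.2f}, significant: {f0 > f_crit}")
```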

Exercise 4:


A chemical engineer wants to model the relationship between the purity of oxygen (y)
produced in a chemical distillation process and the percentage of hydrocarbons (x) that are
present in the main condenser of the distillation unit. A statistical analysis is carried out and
the output is displayed in the table below.

Regression Analysis: y = 74.3 + 14.9x


Predictor Coef SE Coef T P-value
Constant 74.283 1.593 46.62 0.000
x1 14.947 1.317 11.35 0.000
S = 1.087 R-Sq = 87.7%

Analysis of variance
Source DF SS MS F
Regression 1 152.13 152.13 128.86
Residual 18 21.25 1.18
Total 19 173.38

(a) Estimate the purity of oxygen when the percentage of hydrocarbons is 1%.
(b) Obtain a 95% confidence interval for the true slope β1.
(c) Test for significance of regression at α = 0.05.


Exercise 5: (Tutorial 8, No.1)


Regression methods were used to analyze the data from a study investigating the relationship
between roadway surface temperature (x) and pavement deflection (y). The data follow.

Temperature Deflection Temperature Deflection


x y x y
70.0 0.621 72.7 0.637
77.0 0.657 67.8 0.627
72.1 0.640 76.6 0.652
72.8 0.623 73.4 0.630
78.3 0.661 70.5 0.627
74.5 0.641 72.1 0.631
74.0 0.637 71.2 0.641
72.4 0.630 73.0 0.631
75.2 0.644 72.7 0.634
76.0 0.639 71.4 0.638

(a) Estimate the intercept and slope regression coefficients. Write the
estimated regression line.
(b) Compute SSE and estimate the variance.
(c) Find the standard error of the slope and intercept coefficients.
(d) Show that
(e) Compute the coefficient of determination, R2. Comment on the value.
(f) Use a t-test to test for significance of the intercept and slope coefficients at
. Give the P-values of each and comment on your results.
(g) Construct the ANOVA table and test for significance of regression using the P-
value. Comment on your results and their relationship to your results in part (f).
(h) Construct 95% CIs on the intercept and slope. Comment on the
relationship of these CIs and your findings in parts (f) and (g).


Exercise 6: (Tutorial 8, No.2)


The designers of a database information system that allows its users to search backwards for
several days wanted to develop a formula to predict the time it would take to search.
Actual elapsed time was measured for several different values of days. The measured data
are shown in the following table:

Number of Days  1     2     4     8     16    25
Elapsed Time    0.65  0.79  1.36  2.26  3.59  5.39

(i) Estimate the intercept and slope regression coefficients. Write the estimated regression line.
(ii) Compute SSE and estimate the variance.
(iii) Find the standard error of the slope and intercept coefficients.
(iv) Show that
(v) Compute the coefficient of determination, R2. Comment on the value.
(vi) Use a t-test to test for significance of the intercept and slope
coefficients at . Give the P-values of each and comment on your
results.
(vii) Construct the ANOVA table and test for significance of regression
using the P-value. Comment on your results and their relationship to your
results in part (vi).
(viii) Construct 95% CIs on the intercept and slope. Comment on the
relationship of these CIs and your findings in parts (vi) and (vii).

9 Multiple Linear Regression

Exercise 1: (Example 1)
Given the data:

Test Number y x1 x2
1 1.6 1 1
2 2.1 1 2
3 2.4 2 1
4 2.8 2 2
5 3.6 2 3
6 3.8 3 2
7 4.3 2 4
8 4.9 4 2
9 5.7 4 3
10 5 3 4

(a) Fit a multiple linear regression model to these data.
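Fitting the model y = β0 + β1x1 + β2x2 amounts to solving the normal equations (X′X)β̂ = X′y. The sketch below builds X′X and X′y from the Exercise 1 data and solves the 3 × 3 system by Gaussian elimination with partial pivoting, written out in plain Python so that no external library is assumed; `solve` is a hypothetical helper name. In practice a linear-algebra library would be used instead.

```python
y = [1.6, 2.1, 2.4, 2.8, 3.6, 3.8, 4.3, 4.9, 5.7, 5.0]
x1 = [1, 1, 2, 2, 2, 3, 2, 4, 4, 3]
x2 = [1, 2, 1, 2, 3, 2, 4, 2, 3, 4]


def solve(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        beta[r] = (m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))) / m[r][r]
    return beta


# Design matrix with a leading column of ones for the intercept
X = [[1.0, u, v] for u, v in zip(x1, x2)]
XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(3)]

b0, b1, b2 = solve(XtX, Xty)  # normal equations: (X'X) beta = X'y
print(f"y-hat = {b0:.4f} + {b1:.4f} x1 + {b2:.4f} x2")
```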


Exercise 2:
Given the data:

Observation Number Pull Strength y Wire Length x1 Die Height x2


1 9.95 2 50
2 24.45 8 110
3 31.75 11 120
4 35.00 10 550
5 25.02 8 295
6 16.86 4 200
7 14.38 2 375
8 9.60 2 52
9 24.35 9 100
10 27.50 8 300
11 17.08 4 412
12 37.00 11 400
13 41.95 12 500
14 11.66 2 360
15 21.65 4 205
16 17.89 4 400
17 69.00 20 600
18 10.30 1 585
19 34.93 10 540
20 46.59 15 250
21 44.88 15 290
22 54.12 16 510
23 56.63 17 590
24 22.13 6 100
25 21.15 5 400

(b) Fit a multiple linear regression model to these data.


Exercise 3:
A study was performed to investigate the shear strength of soil (y) as it relates to depth in
meters (x1) and percentage moisture content (x2). Ten observations were collected and the
following summary quantities were obtained:

$n = 10,\ \sum x_{i1} = 223,\ \sum x_{i2} = 553,\ \sum y_i = 1916,$
$\sum x_{i1}^2 = 5200.9,\ \sum x_{i2}^2 = 31729,\ \sum x_{i1}x_{i2} = 12352,$
$\sum x_{i1}y_i = 43550.8,\ \sum x_{i2}y_i = 104736.8,\ \sum y_i^2 = 371595.6$

(a) Estimate the parameters to fit the multiple regression model for these data.
(b) What is the predicted strength when x1 = 18 m and x2 = 43%?
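When only summary sums are available, as in Exercise 3, the normal equations can be centred. The corrected sums S11, S22, S12, S1y and S2y give a 2 × 2 system for β̂1 and β̂2, and then β̂0 = ȳ − β̂1x̄1 − β̂2x̄2. The sketch below solves the 2 × 2 system by Cramer's rule; it is one convenient route, not the only one.

```python
n = 10
sum_x1, sum_x2, sum_y = 223, 553, 1916
sum_x1x1, sum_x2x2, sum_x1x2 = 5200.9, 31729, 12352
sum_x1y, sum_x2y = 43550.8, 104736.8

# Corrected sums of squares and cross-products
s11 = sum_x1x1 - sum_x1 ** 2 / n
s22 = sum_x2x2 - sum_x2 ** 2 / n
s12 = sum_x1x2 - sum_x1 * sum_x2 / n
s1y = sum_x1y - sum_x1 * sum_y / n
s2y = sum_x2y - sum_x2 * sum_y / n

# Solve the 2x2 centred normal equations by Cramer's rule
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = sum_y / n - b1 * sum_x1 / n - b2 * sum_x2 / n

y_hat = b0 + b1 * 18 + b2 * 43  # part (b): x1 = 18 m, x2 = 43%
print(f"y-hat = {b0:.3f} + {b1:.3f} x1 + ({b2:.3f}) x2;  y-hat(18, 43) = {y_hat:.2f}")
```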


Exercise 4: (Example 2)
A set of experimental runs was made to determine a way of predicting cooking time y at
various levels of oven width x1 and temperature x2. The data were recorded as follows:

y x1 x2
6.4 1.32 1.15
15.05 2.69 3.4
18.75 3.56 4.1
30.25 4.41 8.75
44.86 5.35 14.82
48.94 6.3 15.15
51.55 7.12 15.32
61.5 8.87 18.18
100.44 9.8 35.19
111.42 10.65 40.4

(a) Fit a multiple linear regression model to these data.


(b) Estimate σ² and the standard errors of the regression coefficients.
(c) Test for significance of β1 and β2.
(d) Predict the useful range when brightness = 80 and contrast = 75. Construct a 95%
PI.
(e) Compute the mean response of the useful range when brightness = 80 and contrast =
75. Compute a 95% CI.
(f) Interpret parts (d) and (e) and comment on the comparison between the 95% PI and
95% CI.
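Parts (a) to (c) follow from the standard matrix formulas:

- β̂ = (X′X)⁻¹X′y
- σ̂² = SSE/(n − p)
- se(β̂j) = √(σ̂² Cjj), where Cjj is the j-th diagonal element of (X′X)⁻¹
- t0 = β̂j / se(β̂j)

The NumPy sketch below is one possible implementation of those formulas for the cooking-time data. The exercise does not state a significance level. Assuming α = 0.05, each |t0| would be compared with t0.025,7 ≈ 2.365.

```python
import numpy as np

# Exercise 4 data: cooking time y, oven width x1, temperature x2
y = np.array([6.4, 15.05, 18.75, 30.25, 44.86, 48.94, 51.55, 61.5, 100.44, 111.42])
x1 = np.array([1.32, 2.69, 3.56, 4.41, 5.35, 6.3, 7.12, 8.87, 9.8, 10.65])
x2 = np.array([1.15, 3.4, 4.1, 8.75, 14.82, 15.15, 15.32, 18.18, 35.19, 40.4])

X = np.column_stack([np.ones_like(y), x1, x2])
n, p = X.shape                      # n = 10 observations, p = 3 parameters

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y            # least-squares estimates
resid = y - X @ beta
sigma2 = resid @ resid / (n - p)    # estimate of sigma^2 = SSE / (n - p)
se = np.sqrt(sigma2 * np.diag(XtX_inv))
t_stat = beta / se                  # t statistics for H0: beta_j = 0

for name, b, s, t in zip(["b0", "b1", "b2"], beta, se, t_stat):
    print(f"{name} = {b:8.4f}   se = {s:.4f}   t0 = {t:.2f}")
print(f"sigma^2 = {sigma2:.4f}")
```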


Exercise 5: (Tutorial 9, No.1)


An article in Optical Engineering (“Operating Curve Extraction of a Correlator's Filter,” Vol.
43, 2004, pp. 2775–2779) reported the use of an optical correlator to perform an experiment
by varying brightness and contrast. The resulting modulation is characterized by the useful
range of gray levels. The data are shown below.

Brightness (%):      54   61   65   100   100   100   50    57    54
Contrast (%):        56   80   70   50    65    80    25    35    26
Useful range (ng):   96   50   50   112   96    80    155   144   255

(a) Fit a multiple linear regression model to these data.


(b) Estimate σ² and the standard errors of the regression coefficients.
(c) Test for significance of β1 and β2.
(d) Predict the useful range when brightness = 80 and contrast = 75. Construct a 95%
PI.
(e) Compute the mean response of the useful range when brightness = 80 and contrast =
75. Compute a 95% CI.
(f) Interpret parts (d) and (e) and comment on the comparison between the 95% PI and
95% CI.
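For parts (d) and (e), the prediction interval and the confidence interval on the mean response share the same point estimate, ŷ0 = x0′β̂. They differ only in their variance:

- mean response (CI): σ̂² x0′(X′X)⁻¹x0
- new observation (PI): σ̂²(1 + x0′(X′X)⁻¹x0)

Below is a minimal NumPy sketch. The critical value t0.025,6 = 2.447 is taken from a t table (n − p = 9 − 3 = 6) rather than computed.

```python
import numpy as np

# Exercise 5 data
brightness = np.array([54, 61, 65, 100, 100, 100, 50, 57, 54], dtype=float)
contrast = np.array([56, 80, 70, 50, 65, 80, 25, 35, 26], dtype=float)
useful_range = np.array([96, 50, 50, 112, 96, 80, 155, 144, 255], dtype=float)

X = np.column_stack([np.ones_like(useful_range), brightness, contrast])
n, p = X.shape
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ useful_range
resid = useful_range - X @ beta
sigma2 = resid @ resid / (n - p)

# New point: brightness = 80, contrast = 75
x0 = np.array([1.0, 80.0, 75.0])
y0 = x0 @ beta
h00 = x0 @ XtX_inv @ x0                         # x0' (X'X)^{-1} x0
t_crit = 2.447                                  # t_{0.025, 6} from a t table

ci_half = t_crit * np.sqrt(sigma2 * h00)        # half-width, mean response
pi_half = t_crit * np.sqrt(sigma2 * (1 + h00))  # half-width, new observation
print(f"coefficients: {beta.round(4)}  y0 = {y0:.2f}")
print(f"95% CI: ({y0 - ci_half:.2f}, {y0 + ci_half:.2f})")
print(f"95% PI: ({y0 - pi_half:.2f}, {y0 + pi_half:.2f})")
```

The PI is always wider than the CI because it also accounts for the scatter of a single new observation around the mean. That is the comparison part (f) asks about.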


Exercise 6: (Tutorial 9, No.2)


A study was performed on wear of a bearing y and its relationship to x1 = oil viscosity and
x2 = load. The following data were obtained:

x1 (oil viscosity):   1.6    15.5   22.0   43.0   33.0   40.0
x2 (load):            851    816    1058   1201   1357   1115
y (wear):             293    230    172    91     113    125

(a) Fit a multiple regression model to these data.
(b) Estimate σ² and the standard errors of the regression coefficients.
(c) Use the model to predict wear when x1 = 25 and x2 = 1000.
(d) Fit a multiple regression model with an interaction term to these data.
(e) Estimate σ² and se(β̂j) for this new model. How did these quantities change? Does
this tell you anything about the value of adding the interaction term to the model?
(f) Use the model in (d) to predict wear when x1 = 25 and x2 = 1000. Compare this prediction
with the predicted value from part (c) above.
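Parts (a) to (f) amount to fitting two nested models, one without and one with the x1·x2 term. The comparison then covers σ̂², the standard errors, and the predictions at x1 = 25, x2 = 1000. In the NumPy sketch below, the helper name `fit` is hypothetical.

```python
import numpy as np

# Exercise 6 data
x1 = np.array([1.6, 15.5, 22.0, 43.0, 33.0, 40.0])          # oil viscosity
x2 = np.array([851, 816, 1058, 1201, 1357, 1115], float)    # load
y = np.array([293, 230, 172, 91, 113, 125], float)          # wear


def fit(X, y):
    """Least-squares fit; returns (beta, sigma^2 = SSE/(n - p), standard errors, SSE)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((y - X @ beta) ** 2))
    sigma2 = sse / (n - p)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, sigma2, se, sse


ones = np.ones_like(y)
X_main = np.column_stack([ones, x1, x2])           # parts (a)-(c)
X_int = np.column_stack([ones, x1, x2, x1 * x2])   # parts (d)-(f): adds the x1*x2 term

b_main, s2_main, se_main, sse_main = fit(X_main, y)
b_int, s2_int, se_int, sse_int = fit(X_int, y)

# Predicted wear at x1 = 25, x2 = 1000 under each model
pred_main = np.array([1.0, 25.0, 1000.0]) @ b_main
pred_int = np.array([1.0, 25.0, 1000.0, 25.0 * 1000.0]) @ b_int

print("main effects:", b_main.round(4), "sigma^2 =", round(s2_main, 2), "se =", se_main.round(4))
print("interaction :", b_int.round(6), "sigma^2 =", round(s2_int, 2), "se =", se_int.round(6))
print("predictions :", round(float(pred_main), 2), round(float(pred_int), 2))
```

Adding a term can never increase SSE. However, σ̂² = SSE/(n − p) also loses a degree of freedom (here from 3 to 2). So whether σ̂² actually shrinks is what indicates whether the interaction term is worth including.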


10 Factorial Experiments
– 2² Factorial design
Exercise 1: (Example 1)
An engineer is investigating the thickness of an epitaxial layer, which is subject to two
levels of factor A, deposition time (+ for short time, – for long time), and two levels of
factor B, arsenic flow rate (– for 55%, + for 59%). The engineer conducts a 2² factorial
design with n = 4 replicates. The data are as follows:


                         Arsenic Level
Deposition Time      B– (Low, 55%)    B+ (High, 59%)
A– (Long)            14.037           13.880
                     14.165           13.860
                     13.972           14.032
                     13.907           13.914
A+ (Short)           14.821           14.888
                     14.757           14.921
                     14.843           14.415
                     14.878           14.932

a) Construct the 2 × 2 factorial design table.
b) Estimate all main effects and the interaction.
c) Construct the ANOVA table and, for each effect, test the null hypothesis that the effect is
equal to 0.
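For a 2² design with n replicates, the standard formulas are:

- each effect is its contrast divided by 2n;
- each effect's sum of squares is contrast²/(4n), with one degree of freedom;
- SS_E is what remains of SS_T.

The plain-Python sketch below applies these formulas to the table above. It uses the usual labels (1), a, b, ab for the four treatment combinations. Each F0 can be compared with F0.05,1,12 ≈ 4.75.

```python
# Epitaxial layer thickness, n = 4 replicates per treatment combination
data = {
    "(1)": [14.037, 14.165, 13.972, 13.907],  # A low,  B low
    "a":   [14.821, 14.757, 14.843, 14.878],  # A high, B low
    "b":   [13.880, 13.860, 14.032, 13.914],  # A low,  B high
    "ab":  [14.888, 14.921, 14.415, 14.932],  # A high, B high
}
n = 4
T = {k: sum(v) for k, v in data.items()}  # treatment totals

contrasts = {
    "A":  T["a"] + T["ab"] - T["b"] - T["(1)"],
    "B":  T["b"] + T["ab"] - T["a"] - T["(1)"],
    "AB": T["ab"] + T["(1)"] - T["a"] - T["b"],
}
effects = {k: c / (2 * n) for k, c in contrasts.items()}
ss = {k: c ** 2 / (4 * n) for k, c in contrasts.items()}

obs = [x for v in data.values() for x in v]
mean = sum(obs) / len(obs)
ss_total = sum((x - mean) ** 2 for x in obs)
ss_error = ss_total - sum(ss.values())
ms_error = ss_error / (4 * (n - 1))       # 12 error degrees of freedom

for k in contrasts:
    f0 = ss[k] / ms_error                 # each effect has 1 degree of freedom
    print(f"{k:>2}: effect = {effects[k]:7.4f}  SS = {ss[k]:.4f}  F0 = {f0:.2f}")
print(f"SS_E = {ss_error:.4f}, MS_E = {ms_error:.4f}")
```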

Exercise 2: (Tutorial No1)


A two-factor experimental design was conducted to investigate the lifetime of a component
being manufactured. The two factors are A (design) and B (cost of material). Two levels, (+)
and (-), of each factor are considered. Three components are manufactured with each
combination of design and material, and the total lifetime of the three components (in hours)
is shown in the table below.

Treatment     Design   Material        Total lifetime of 3 components
combination   A        B          AB   (in hours)
(1)           -        -          +    122
a             +        -          -    60
b             -        +          -    120
ab            +        +          +    118

(a) Perform a two-way analysis of variance to estimate the effects of design and material
expense on the component lifetime.
(b) Based on your results in part (a), what conclusions can you draw from the factorial
experiment?
(c) Indicate which effects have a significant influence on the lifetime of a component.
(d) Write the least-squares fitted model using only the significant sources.
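Only the treatment totals are tabulated, but several quantities follow directly from them:

- the effects and their sums of squares;
- the coefficients of the fitted model (each coefficient is half the corresponding effect, and the intercept is the grand mean).

The error sum of squares, and hence the F tests in part (a), would need the individual lifetimes, which the table does not give. Below is a plain-Python sketch of the parts the totals do support.

```python
# Totals of n = 3 lifetimes (hours) for each treatment combination
T = {"(1)": 122, "a": 60, "b": 120, "ab": 118}
n = 3

contrasts = {
    "A":  T["a"] + T["ab"] - T["b"] - T["(1)"],
    "B":  T["b"] + T["ab"] - T["a"] - T["(1)"],
    "AB": T["ab"] + T["(1)"] - T["a"] - T["b"],
}
effects = {k: c / (2 * n) for k, c in contrasts.items()}   # effect = contrast / (2n)
ss = {k: c ** 2 / (4 * n) for k, c in contrasts.items()}   # SS = contrast^2 / (4n)

grand_mean = sum(T.values()) / (4 * n)
coef = {k: e / 2 for k, e in effects.items()}              # regression coefficient = effect / 2
print(effects, ss)
print(f"y_hat = {grand_mean:.3f} + ({coef['A']:.3f}) x1 + {coef['B']:.3f} x2 + {coef['AB']:.3f} x1 x2")
```

Here x1 and x2 are the coded (-1, +1) levels of A and B. The model in part (d) is the printed equation with any non-significant terms dropped.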

Exercise 3:
An engineer suspects that the surface finish of metal parts is influenced by the type of paint
used and the drying time. He selected three drying times (20, 25, and 30 minutes) and two
types of paint. Three parts are tested with each combination of paint type and drying time.
The data are as follows:

          Drying Time (min)
Paint     20     25     30
ICI       74     73     78
          64     61     85
          50     44     92
NIPPON    92     98     66
          86     73     45
          68     88     85


(a) Compute the estimates of the effects and their standard errors for this design.
(b) Construct two-factor interaction plots and comment on the interaction of the factors.
(c) Use the t ratio to determine the significance of each effect with α = 0.05. Comment on
your findings.
(d) Compute an approximate 95% CI for each effect. Compare your results with those in
part (c) and comment.
(e) Perform an analysis of variance of the appropriate regression model for this design.
Include in your analysis hypothesis tests for each coefficient, as well as residual analysis.
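Drying time has three levels here, so this is a 2 × 3 factorial rather than a 2² design. The natural analysis is therefore a two-factor ANOVA with interaction. The plain-Python sketch below computes the usual sums of squares from row, column, and cell totals. The F0 values can be compared with F0.05,1,12 ≈ 4.75 (paint) and F0.05,2,12 ≈ 3.89 (drying time and interaction).

```python
# cells[paint][time] = replicate surface-finish readings
cells = {
    "ICI":    {20: [74, 64, 50], 25: [73, 61, 44], 30: [78, 85, 92]},
    "NIPPON": {20: [92, 86, 68], 25: [98, 73, 88], 30: [66, 45, 85]},
}
paints = list(cells)
times = [20, 25, 30]
a, b, n = len(paints), len(times), 3
N = a * b * n

all_obs = [y for p in paints for t in times for y in cells[p][t]]
cf = sum(all_obs) ** 2 / N                                   # correction factor y..^2 / N
ss_total = sum(y * y for y in all_obs) - cf
ss_paint = sum(sum(sum(cells[p][t]) for t in times) ** 2 for p in paints) / (b * n) - cf
ss_time = sum(sum(sum(cells[p][t]) for p in paints) ** 2 for t in times) / (a * n) - cf
ss_cells = sum(sum(cells[p][t]) ** 2 for p in paints for t in times) / n - cf
ss_inter = ss_cells - ss_paint - ss_time
ss_error = ss_total - ss_cells

df = {"paint": a - 1, "time": b - 1, "inter": (a - 1) * (b - 1), "error": a * b * (n - 1)}
ms_error = ss_error / df["error"]
for name, s in [("paint", ss_paint), ("time", ss_time), ("inter", ss_inter)]:
    print(f"{name:>5}: SS = {s:8.3f}  df = {df[name]}  F0 = {s / df[name] / ms_error:.3f}")
print(f"error: SS = {ss_error:8.3f}  df = {df['error']}")
```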

Exercise 4: (Tutorial 10, No.2)


An experiment involves a storage battery used in the launching mechanism of a shoulder-
fired ground-to-air missile. Two material types can be used to make the battery plates. The
objective is to design a battery that is relatively unaffected by the ambient temperature. The
output response from the battery is effective life in hours. Two temperature levels are
selected, and a factorial experiment with four replicates is run. The data are as follows:

           Temperature (°F)
Material   Low            High
1          130   155      20    70
           74    180      82    58
2          138   110      96    104
           168   160      82    60

(a) Compute the estimates of the effects and their standard errors for this design.
(b) Construct two-factor interaction plots and comment on the interaction of the factors.
(c) Use the t ratio to determine the significance of each effect with α = 0.05. Comment
on your findings.
(d) Compute an approximate 95% CI for each effect. Compare your results with those
in part (c) and comment.
(e) Perform an analysis of variance of the appropriate regression model for this design.
Include in your analysis hypothesis tests for each coefficient, as well as residual
analysis. State your final conclusions about the adequacy of the model. Compare your
results to part (c) and comment.
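With n = 4 replicates in a 2² design:

- se(effect) = 2σ̂/√(n·2²), with σ̂² = MS_E;
- the t ratio for part (c) is effect/se(effect);
- an approximate 95% CI for part (d) is effect ± t0.025,12·se(effect).

This sketch assumes a coding: material 1 is the low level and material 2 the high level of A, and temperature is B. The value t0.025,12 ≈ 2.179 is taken from a t table.

```python
import math

# Effective life (hours); A = material (1 -> low, 2 -> high), B = temperature (low/high)
data = {
    "(1)": [130, 155, 74, 180],   # material 1, low temperature
    "a":   [138, 110, 168, 160],  # material 2, low temperature
    "b":   [20, 70, 82, 58],      # material 1, high temperature
    "ab":  [96, 104, 82, 60],     # material 2, high temperature
}
n, k = 4, 2
T = {key: sum(v) for key, v in data.items()}
contrasts = {
    "A":  T["a"] + T["ab"] - T["b"] - T["(1)"],
    "B":  T["b"] + T["ab"] - T["a"] - T["(1)"],
    "AB": T["ab"] + T["(1)"] - T["a"] - T["b"],
}
effects = {key: c / (n * 2 ** (k - 1)) for key, c in contrasts.items()}

# Pure error from the replicates within each cell
ss_error = sum(sum((y - sum(v) / n) ** 2 for y in v) for v in data.values())
ms_error = ss_error / (2 ** k * (n - 1))               # 12 degrees of freedom
se_effect = 2 * math.sqrt(ms_error / (n * 2 ** k))     # 2 * sigma_hat / sqrt(n 2^k)
t_crit = 2.179                                         # t_{0.025, 12} from a t table

for key, e in effects.items():
    lo, hi = e - t_crit * se_effect, e + t_crit * se_effect
    print(f"{key:>2}: effect = {e:8.3f}  t0 = {e / se_effect:6.2f}  95% CI = ({lo:.2f}, {hi:.2f})")
```

An effect whose CI excludes zero is exactly one whose |t0| exceeds 2.179. That link is the comparison between parts (c) and (d).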

Exercise 5:
An article in the IEEE Transactions on Semiconductor Manufacturing (Vol. 5, 1992, pp. 214-222)
describes an experiment to investigate the surface charge on a silicon wafer. The factors
thought to influence induced surface charge are the cleaning method (spin rinse dry, or SRD,
and spin dry, or SD) and the position on the wafer where the charge was measured. The surface
charge (×10¹¹ q/cm³) response data are shown below.

                     Test Position
Cleaning Method      L        R
SD                   1.66     1.84
                     1.90     1.84
                     1.92     1.62
SRD                  -4.21    -7.58
                     -1.35    -2.20
                     -2.08    -5.36

(a) Compute the estimates of the effects and their standard errors for this design.
(b) Construct two-factor interaction plots and comment on the interaction of the factors.
(c) Use the t ratio to determine the significance of each effect with α = 0.05. Comment
on your findings.
(d) Compute an approximate 95% CI for each effect. Compare your results with those
in part (c) and comment.
(e) Perform an analysis of variance of the appropriate regression model for this design.
Include in your analysis hypothesis tests for each coefficient, as well as residual
analysis. State your final conclusions about the adequacy of the model. Compare your
results to part (c) and comment.
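For part (e), the regression model y = β0 + β1x1 + β2x2 + β12x1x2 + ε with coded ±1 variables reproduces the factorial analysis. Each coefficient is half the corresponding effect, and the residuals are the deviations from the cell means. The coding below is an assumption: x1 = -1 for SD and +1 for SRD, and x2 = -1 for L and +1 for R. NumPy is used for the fit.

```python
import numpy as np

# Surface charge (x 10^11 q/cm^3); assumed coding: x1 = cleaning method (SD -1, SRD +1),
# x2 = test position (L -1, R +1)
cells = {
    (-1, -1): [1.66, 1.90, 1.92],     # SD, L
    (-1, +1): [1.84, 1.84, 1.62],     # SD, R
    (+1, -1): [-4.21, -1.35, -2.08],  # SRD, L
    (+1, +1): [-7.58, -2.20, -5.36],  # SRD, R
}
rows, obs = [], []
for (c1, c2), values in cells.items():
    for v in values:
        rows.append([1.0, c1, c2, c1 * c2])
        obs.append(v)
X, y = np.array(rows), np.array(obs)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # each slope coefficient equals effect / 2
resid = y - X @ beta
n_obs, p = X.shape
ms_error = resid @ resid / (n_obs - p)          # 8 error degrees of freedom
se_beta = np.sqrt(ms_error * np.diag(np.linalg.inv(X.T @ X)))
t0 = beta / se_beta
print("coefficients:", beta.round(4))
print("effects     :", (2 * beta[1:]).round(4))
print("t0          :", t0.round(2))
```

This design is orthogonal, so X′X = 12I. As a result, all four coefficients share the same standard error, √(MS_E/12).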
