
ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

A. CHAPTER OBJECTIVES
B. INTRODUCTION
C. UNDERSTANDING THE FUNDAMENTALS OF ANOVA
D. UNDERSTANDING THE ANOVA TABLE
E. ANALYZING THE DATA IN MINITAB
F. ANOVA APPLICATION EXAMPLE
G. ANOVA PRACTICAL APPLICATION EXAMPLE
H. ANOVA TEAM EXERCISE

CHAPTER OBJECTIVES

To understand the fundamentals of ANOVA, including its components, general definitions, statistical assumptions and basic concepts.
To understand the ANOVA table.
To be able to conduct an experiment using ANOVA.

INTRODUCTION

One-way Analysis of Variance (ANOVA) is a statistical technique that enables us to test the significance of the difference between more than 2 sample means.
For example, based upon the results of previous tests, suppose you want to try a second process change to reduce customer response cycle time. To compare the cycle times of the 3 processes, we would conduct a one-way ANOVA.
Using ANOVA, we will make inferences about whether our samples are drawn from populations having the same mean.

INTRODUCTION

As indicated by the null and alternate hypotheses, ANOVA tests whether any of the population means differ from one another.

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a:$ At least one $\mu_k$ is different

UNDERSTANDING THE
FUNDAMENTALS OF ANOVA

When conducting tests concerning the differences of means, you analyze the variation within each sample and compare it to the variation between samples.

In order to break down total variation into its components, Sum of Squares is a mathematical technique to compute the combined effect of different sources of variability.

To analyze the total sum of squares, we need to break it into 2 parts: the Within and Between sums of squares.

CALCULATING THE SUM OF SQUARES
The following is the formula used for calculating the sum of squares.

$$SS_T = SS_B + SS_W$$

$$\underbrace{\sum_{j=1}^{g}\sum_{i=1}^{n}\left(X_{ij}-\bar{\bar{X}}\right)^2}_{\text{Total}} = \underbrace{n\sum_{j=1}^{g}\left(\bar{X}_j-\bar{\bar{X}}\right)^2}_{\text{Between (Factor)}} + \underbrace{\sum_{j=1}^{g}\sum_{i=1}^{n}\left(X_{ij}-\bar{X}_j\right)^2}_{\text{Within (Error)}}$$

Where:
$X_{ij}$ = Individual values
$\sum_{j=1}^{g}$ = Summation over all subgroups ($j = 1$ to $g$)
$\sum_{i=1}^{n}$ = Summation over all individuals within the subgroups ($i = 1$ to $n$)
$\bar{\bar{X}}$ = Grand average (overall)
$\bar{X}_j$ = Average of subgroup $j$
$SS_B$ = Between Sum of Squares (Black noise, special cause effects)
$SS_W$ = Within Sum of Squares (White noise, error, random cause effect)
$SS_T$ = Total Sum of Squares (White noise plus Black noise)
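The decomposition above can be checked numerically. The following is a minimal Python sketch, assuming equal-sized subgroups as in the formula; the data values and the function name `sums_of_squares` are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical data: g = 3 subgroups with n = 4 observations each.
groups = [
    [10.0, 12.0, 11.0, 13.0],
    [14.0, 15.0, 13.0, 16.0],
    [9.0, 8.0, 10.0, 9.0],
]


def sums_of_squares(groups):
    """Return (SS_T, SS_B, SS_W) for equal-sized subgroups."""
    n = len(groups[0])
    assert all(len(g) == n for g in groups), "the formula assumes equal n"
    grand = mean(x for g in groups for x in g)   # grand average
    group_means = [mean(g) for g in groups]      # subgroup averages
    ss_total = sum((x - grand) ** 2 for g in groups for x in g)
    ss_between = n * sum((m - grand) ** 2 for m in group_means)
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    return ss_total, ss_between, ss_within


ss_t, ss_b, ss_w = sums_of_squares(groups)
print(ss_t, ss_b, ss_w)
# The total equals the sum of the two parts, up to floating-point rounding.
print(abs(ss_t - (ss_b + ss_w)) < 1e-9)
```

For subgroups of unequal size, the single factor n in the Between term would be replaced by each subgroup's own size inside the sum.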
GENERAL DEFINITIONS

The following are some general definitions that will be used in this chapter as well as subsequent chapters.
Response: must be measured on an interval or ratio scale (e.g., temperature, inches, degrees, yield).
Factor or Input Variable: a controlled or uncontrolled variable whose influence upon a response is being studied in the experiment. May be quantitative (e.g., temperature, degrees, time) or qualitative (e.g., different machines, different operators).
Level: the levels of a factor are the values of the factor being examined in the experiment. For quantitative factors, each chosen value becomes a level (e.g., if the experiment is being conducted at 4 different temperatures, then the factor temperature has 4 levels). For qualitative factors, if 3 operators run 6 machines, the factor machine has 6 levels whereas the factor operator has 3 levels.

STATISTICAL ASSUMPTIONS

In order to use ANOVA, we must assume the following.

Population variances of the output (response) are equal across all levels of the given factor. We can test this assumption in Minitab using the Stat>ANOVA>Test for Equal Variances procedure.

Response means are independently and normally distributed. If randomization and adequate sample sizes are used, this assumption is usually valid. Note: It is important to randomize in order to reduce external influences.

The Errors of the mathematical model are independently and normally distributed with a mean = 0 and a constant variance.

BASIC CONCEPTS

ANOVA is based upon the comparison of 2 different estimates of the variance ($\sigma_e^2$) of our overall population. This involves the following steps.

1. Determine one estimate of the population variance from the variance Between the sample means (Between-group variation). This estimate includes the differences among the factor means.

2. Determine a second estimate of the population variance from the variance Within the samples (Within-group variation). This represents variation within each factor level and excludes any differences among the factor means.

3. Compare these 2 estimates using the F (variance ratio) test. If they are approximately equal in value, do not reject the null hypothesis.

4. For there to be real differences among the factor means, the Between estimate (mean square) must be significantly larger than the Within estimate (mean square).

UNDERSTANDING THE ANOVA TABLE
To determine whether or not to reject the null hypothesis, we must calculate the Test Statistic (F ratio) using the ANOVA table as shown below.

Source            Degrees of Freedom  Sum of Squares  Mean Square    F                     P
Between (Factor)  K-1                 SS(Factor)      SS(Factor)/DF  MS(Factor)/MS(Error)  p-value
Within (Error)    (n-1)-(K-1)         SS(Error)       SS(Error)/DF
Total             n-1                 SS(Total)

Source: indicates the different variation sources in the ANOVA table. Factor represents the variation introduced between the factor levels (groups). Error is the variation within each of the factor levels. Total is the total variation.

Degrees of Freedom (DF): the number of degrees of freedom related to each Sum of Squares (SS). Note: K = number of levels (groups), n = total number of observations across all levels.

Sum of Squares (SS): the sum of squares measures the variability associated with each source. SS (Factor) is due to the change in the factor level; the larger the difference between the means of the factor levels, the larger the factor sum of squares will be. SS (Error) is due to the variation within each factor level. SS (Total) is the sum of the Factor and Error sums of squares.

Mean Square (MS): the estimate of the variance for the factor and error sources, computed by MS = SS/DF.

F: the ratio of the mean square for the Factor to the mean square for the Error.

P-Value: this value is compared with the alpha ($\alpha$) level (e.g., .05) and the following decision rule is applied: if p < alpha, reject the null hypothesis; if p ≥ alpha, do not reject the null hypothesis.

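The arithmetic behind the table columns can be sketched in a few lines of Python. This is only an illustration: the function names `anova_table` and `decide` and the input numbers are hypothetical, n_total is taken to be the total number of observations, and the p-value itself is assumed to come from Minitab rather than being computed here.

```python
def anova_table(ss_factor, ss_error, n_total, k):
    """Fill in DF, MS and F from the two sums of squares.

    n_total is the total number of observations, k the number of levels.
    """
    df_factor = k - 1
    df_total = n_total - 1
    df_error = df_total - df_factor      # (n-1)-(K-1)
    ms_factor = ss_factor / df_factor    # MS = SS/DF
    ms_error = ss_error / df_error
    return {
        "df_factor": df_factor,
        "df_error": df_error,
        "df_total": df_total,
        "ss_total": ss_factor + ss_error,
        "ms_factor": ms_factor,
        "ms_error": ms_error,
        "f": ms_factor / ms_error,       # MS(Factor)/MS(Error)
    }


def decide(p_value, alpha=0.05):
    """Decision rule: reject H0 only when p < alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


# Hypothetical inputs: 3 levels, 12 observations in total.
table = anova_table(ss_factor=60.0, ss_error=12.0, n_total=12, k=3)
print(table)
print(decide(0.03), "|", decide(0.20))
```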
ANALYZING THE DATA
IN MINITAB
The following statistical, graphical and diagnostic techniques will be used to analyze our data.

Statistical
Test for Equal Variances: verify the assumption of equality of variance for all levels using Stat>ANOVA>Test for Equal Variances.
Analysis of variance table using Stat>ANOVA>One-Way.

Graphical
Main Effects Plots
Interval Plots

Diagnostic
Residuals and Fits
Epsilon Squared (Practical Significance)

ANOVA APPLICATION EXAMPLE
In this example, we will analyze the Distance Traveled by 24 golf balls using 4 dimple patterns. The purpose of the experiment is to investigate the effect of the 4 dimple patterns on the distance traveled and answer the question, "Does the mean distance traveled differ for the different dimple patterns?"

The golf balls were randomly assigned to Tiger Woods, who was using the USGA approved test driver. Also, the golf balls were tested in random order to reduce or eliminate bias (e.g., weather, different day).

Note: Dimple pattern is the factor (input variable) and distance traveled is the response (output).

Open Golfball.mtw, located in GBData, and proceed as follows.

Note: Responses have been stacked, with the distances in column C5 and the golf ball pattern in column C6.

PERFORM TEST FOR EQUAL
VARIANCES

In order to validate the assumption that the population variances of the responses are equal, a Test for Equal Variances is required.

Stat>ANOVA>Test for Equal Variances
In the dialog window:
Enter Distance in the Response field
Enter Golf Ball in the Factor field
Click OK

ANOVA APPLICATION EXAMPLE - CONT.
Results: Minitab produces a Variance test plot as well as the session window results.

The Variance Plot below displays a 95% confidence interval for the response standard deviation for each level, as well as the p-values for Bartlett's and Levene's tests.

Note: The best practice is to always rely on Levene's test, since the test is good whether you are dealing with normal or non-normal data. Bartlett's test is very misleading when there are even slight departures from normality.

ANOVA APPLICATION EXAMPLE -
CONT.
As indicated by the previous plot and the session window below, the Levene's test p-value of .581 (greater than .05) indicates there is no evidence to support different variances.
Test for Equal Variances: Distance versus Golf Ball

95% Bonferroni confidence intervals for standard deviations

Golf
Ball N Lower StDev Upper
1 4 4.05035 8.2209 49.2999
2 6 7.02936 12.6596 42.0684
3 6 4.11665 7.4140 24.6368
4 8 6.99307 11.7321 30.1086

Bartlett's Test (normal distribution)


Test statistic = 1.70, p-value = 0.636

Levene's Test (any continuous distribution)


Test statistic = 0.67, p-value = 0.581
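For readers who want to see what a Levene-type statistic measures, here is a minimal Python sketch. It assumes the median-centered (Brown-Forsythe) form of the test, which may or may not match the exact variant Minitab reports; the raw data are hypothetical, since the worksheet values are not reproduced in this chapter, and the function names are illustrative only.

```python
from statistics import mean, median


def one_way_f(groups):
    """F ratio of a one-way ANOVA on the given groups."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def levene_median(groups):
    """One-way ANOVA F on absolute deviations from each group median."""
    deviations = [[abs(x - median(g)) for x in g] for g in groups]
    return one_way_f(deviations)


# Hypothetical distances for three groups (not the Golfball.mtw data).
sample = [
    [268.0, 280.0, 265.0, 276.0],
    [290.0, 305.0, 282.0, 301.0],
    [300.0, 304.0, 309.0, 302.0],
]
print(levene_median(sample))
```

A statistic near zero means the groups have similar spread; the p-value is then read from the F distribution with K-1 and n-K degrees of freedom.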
ANOVA APPLICATION EXAMPLE -
ANOVA TABLE
To analyze the data using the ANOVA table, proceed as
follows.

Note: For this analysis, we will use stacked data. However, you can also use unstacked data: Stat>ANOVA>One-Way (unstacked).
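The difference between the two layouts can be illustrated outside Minitab. The Python sketch below converts an unstacked layout (one column per level) into a stacked one (one response column plus one factor column); the values and variable names are hypothetical.

```python
# Unstacked layout: one list of distances per dimple pattern (hypothetical values).
unstacked = {
    "1": [268.0, 280.0],
    "2": [290.0, 300.0, 294.0],
}

# Stacked layout: all responses in one column, with the level alongside.
distance = []
golf_ball = []
for level, values in unstacked.items():
    for value in values:
        distance.append(value)
        golf_ball.append(level)

for d, level in zip(distance, golf_ball):
    print(d, level)
```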

Stat>ANOVA>One-Way
In the dialog window:
Enter Distance in the response field
Enter Golf Ball in the factor field
Click OK

ANOVA TABLE - CONT.
Results: As indicated by the Minitab session window below, the p-value (reported as 0.000, i.e., less than 0.0005) indicates that at least 1 group mean is different. In this case, we reject the hypothesis that all group means are equal. At least 1 dimple pattern mean is different.

Also, when the F value (F-test) is close to 1.00, the group means are similar. In this case, the F value of 13.75 is much greater than 1.00.

Lastly, as indicated by the 95% Confidence Intervals, dimple patterns A and D are different from B and C (Levels 1 and 4 are different from Levels 2 and 3).
One-way ANOVA: Distance versus Golf Ball

Source DF SS MS F P
Golf Ball 3 4626 1542 13.75 0.000
Error 20 2242 112
Total 23 6868

S = 10.59 R-Sq = 67.35% R-Sq(adj) = 62.45%

Individual 95% CIs For Mean Based on Pooled StDev
Level N Mean StDev ------+---------+---------+---------+---
1 4 272.25 8.22 (-------*------)
2 6 294.67 12.66 (-----*-----)
3 6 303.83 7.41 (-----*-----)
4 8 272.25 11.73 (-----*----)
------+---------+---------+---------+---
270 285 300 315
Pooled StDev = 10.59

Note: Pooled StDev = 10.59 is the square root of the Mean Square (Error) = 112.
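The ANOVA table above can be reproduced, to within rounding, from the per-level N, Mean and StDev printed in the session window. The Python sketch below is an illustrative check rather than Minitab's own procedure. Because the group sizes differ, each level's own size is used as the weight in the Between sum of squares.

```python
from math import sqrt

# (N, Mean, StDev) per level, copied from the session window output.
summary = [
    (4, 272.25, 8.22),
    (6, 294.67, 12.66),
    (6, 303.83, 7.41),
    (8, 272.25, 11.73),
]

n_total = sum(n for n, _, _ in summary)            # 24 observations
k = len(summary)                                   # 4 levels
grand = sum(n * m for n, m, _ in summary) / n_total

ss_factor = sum(n * (m - grand) ** 2 for n, m, _ in summary)
ss_error = sum((n - 1) * s ** 2 for n, _, s in summary)

ms_factor = ss_factor / (k - 1)
ms_error = ss_error / (n_total - k)
f_ratio = ms_factor / ms_error
pooled_stdev = sqrt(ms_error)

print(round(ss_factor), round(ss_error), round(ss_factor + ss_error))
print(round(f_ratio, 2), round(pooled_stdev, 2))
```

Small differences from the printed table (for example in the second decimal of F) are expected, because the means and standard deviations shown in the output are already rounded.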
ANOVA APPLICATION EXAMPLE -
GRAPHICAL ANALYSIS
(MAIN EFFECTS PLOT)

Main Effects Plots are used to plot the data means when you have multiple factors.
The points in the plot are the means at the various levels of each factor, with a reference line drawn at the grand mean of the response data.
When setting up your worksheet, data must be stacked.

GRAPHICAL ANALYSIS -
MAIN EFFECTS PLOT CONT.
To generate a Main Effects Plot, proceed as follows.

Stat>ANOVA>Main Effects Plot
In the dialog window:
Enter Distance in the Response field
Enter Golf Ball in the Factor field
Click OK

GRAPHICAL ANALYSIS -
MAIN EFFECTS PLOT CONT.
Results: As indicated by the Main Effects Plot illustrated below, the Distance Traveled using dimple patterns 2 and 3 is much greater than the Distance Traveled using dimple patterns 1 and 4. For further investigation, you can eliminate dimple patterns 1 and 4 and focus on dimple patterns 2 and 3 to determine if there is a significant difference between them.

ANOVA APPLICATION EXAMPLE -
GRAPHICAL ANALYSIS (INTERVAL PLOT)
The interval plot is another method to graphically analyze your data. It plots the group means with standard error bars about each mean.

To create an Interval Plot, proceed as follows.

Stat>ANOVA>Interval Plot
In the dialog window Interval Plots:
Under One Y, select With Groups
Click OK
In the dialog window Interval Plot - One Y, With Groups:
Enter Distance in the Graph variables field
Enter Golf Ball in the Categorical variables for grouping field
Click OK

Note: Minitab places the standard error bars $s/\sqrt{n}$ away from the mean. The default is 1.0 standard error. However, you can specify a multiplier for the standard error (e.g., 2.0).
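As a quick illustration of the bar length described in the note, the Python sketch below computes the standard error for each golf ball level from the N and StDev values reported in the one-way ANOVA session window. The variable names and the `multiplier` setting are illustrative assumptions, not Minitab options.

```python
from math import sqrt

# level: (N, StDev), taken from the one-way ANOVA session window output.
summary = {1: (4, 8.22), 2: (6, 12.66), 3: (6, 7.41), 4: (8, 11.73)}

multiplier = 1.0  # number of standard errors drawn on each side of the mean

half_widths = {
    level: multiplier * s / sqrt(n) for level, (n, s) in summary.items()
}

for level, half_width in half_widths.items():
    print(level, round(half_width, 2))
```

Level 1, for example, gets bars extending 8.22 / 2 = 4.11 above and below its mean.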

GRAPHICAL ANALYSIS -
INTERVAL PLOT CONT.
Result: As indicated by the Interval Plot below, the mean of each group is plotted with lines extending 1 standard error above and below the mean. The variability Between the groups of Golf Balls appears to be large (e.g., the distance between groups 3 and 4) relative to the variability Within each group of Golf Balls. In addition, as indicated by the previous Main Effects Plot, we should focus our attention on dimple patterns 2 and 3 since they provide the greatest distance traveled.

ANOVA APPLICATION EXAMPLE -
DIAGNOSTIC ANALYSIS: RESIDUALS AND FITS
ANOVA assumes the errors (Residuals) are normally distributed with a mean = 0 and a constant sigma.

We can test this by reviewing the residuals, each of which is an observation minus its sample (group) mean.

To analyze the Residuals and Fits, proceed as follows.

Stat>ANOVA>One-Way
In the dialog window:
Enter Distance in the Response field
Enter Golf Ball in the Factor field
Click on Store Residuals
Click on Store Fits
Click on Graphs
In the Graphs dialog window:
Click on Normal Plot of Residuals, Residuals Versus Fits and Residuals Versus Order
Enter Golf Ball in the Residuals Versus the Variables field
Click OK
Click OK

DIAGNOSTIC ANALYSIS -
RESIDUALS AND FITS
Results: In addition to creating 4 plots, Minitab adds 2 columns onto your worksheet: RESI1 and FITS1. As indicated on the worksheet below, FITS1 is simply the mean of each dimple pattern group (e.g., 272.250 is the mean of dimple pattern 1) and the residual (RESI1) is each distance minus FITS1 (e.g., 268 - 272.250 = -4.250).

[Worksheet screenshot. Callouts: RESI1 = 268 - 272.250 = -4.250; FITS1 is the mean of Dimple Pattern 1]

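The FITS1 and RESI1 columns described above can be sketched in Python as follows. The data are hypothetical: only the first distance (268) and the pattern 1 mean (272.25) come from the worksheet, and the remaining values were chosen so that pattern 1 averages 272.25.

```python
from collections import defaultdict
from statistics import mean

# Stacked data (hypothetical except for the first value and the level 1 mean).
distance = [268.0, 280.0, 265.0, 276.0, 290.0, 300.0, 295.0]
golf_ball = [1, 1, 1, 1, 2, 2, 2]

# Group the distances by level and compute each level's mean.
by_level = defaultdict(list)
for d, level in zip(distance, golf_ball):
    by_level[level].append(d)
level_means = {level: mean(values) for level, values in by_level.items()}

# FITS: the group mean for each row; RESI: observation minus its fit.
fits = [level_means[level] for level in golf_ball]
resi = [d - f for d, f in zip(distance, fits)]

print(fits[0], resi[0])  # 272.25 -4.25
```

Within each level the residuals sum to zero, which is why the diagnostic plots look at their spread and pattern rather than their average.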
NORMAL PROBABILITY PLOT
OF THE RESIDUALS
The normal probability plot below shows an approximately linear pattern that is
consistent with a normal distribution. In effect, the residuals should be close to
normal as evidenced by the points falling on a straight line.

RESIDUALS FROM DISTANCE
VERSUS GOLF BALL
The plot illustrated below tests the assumption of equal variances across groups.
As indicated, randomness should be observed.

RESIDUALS VERSUS FITTED VALUES
The plot below investigates whether the mathematical model fits equally for low to
high values of the fits. As indicated, randomness should be observed.

RESIDUALS VERSUS THE
ORDER OF THE DATA
Lastly, the plot below investigates how the residuals behave across the
experiment. Once again, randomness should be observed; a nonrandom pattern
should be a warning.

Note: This is the most important plot, since it would signal something outside the
experiment might be operating.

ANOVA APPLICATION EXAMPLE -
EPSILON SQUARED
(PRACTICAL SIGNIFICANCE)
Analysis may indicate a factor (Golf Balls) is statistically significant. However, this result may not have much practical significance. To determine practical significance, we can calculate Epsilon Squared.

Epsilon Squared provides a measure of the amount of variation of the output (Distance traveled) that the input of interest (Dimple pattern) explains. The calculation is performed as follows.

$$E^2 = \frac{\text{Sum of squares (Factor)}}{\text{Sum of squares (Total)}}$$

Referring to the ANOVA table, the SS (Golf Ball) = 4626 and the SS (Total) = 6868. Therefore,

$$E^2 = \frac{4626}{6868} \approx 67\%$$

This indicates that 67% of the variation in distance traveled is attributed to dimple pattern.

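The same calculation as a short Python sketch, using the sums of squares from the ANOVA table (the variable names are illustrative):

```python
ss_factor = 4626.0  # SS (Golf Ball) from the ANOVA table
ss_total = 6868.0   # SS (Total) from the ANOVA table

epsilon_squared = ss_factor / ss_total
print(f"{epsilon_squared:.1%}")  # 67.4%
```

This is the same ratio Minitab reports as R-Sq (67.35%) in the session window; the small difference comes from the rounded sums of squares used here.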
ANOVA PRACTICAL
APPLICATION EXAMPLE
As part of the Analyze phase, the team wanted to know if a warping condition on BA-43033 was cavity related. They performed a One-way ANOVA using cavity as the input and defects observed as the output to see if there was a significant difference from cavity to cavity.
The One-way ANOVA indicated that Cavities 4 and 8 had significantly more defects than the other cavities. As a result of this study, the Tool Room was able to correct these two cavities, which were out of specification, improving the overall performance of the production output.

One-way ANOVA: Quantity versus Cavity


Analysis of Variance for Quantity
Source DF SS MS F P
Cavity 7 160386.5 22912.4 351.37 0.000
Error 16 1043.3 65.2
Total 23 161429.8
Individual 95% CIs For Mean
Based on Pooled StDev
Level N Mean StDev --+---------+---------+---------+----
1 3 4.33 1.53 (-*)
2 3 21.00 3.00 (*)
3 3 6.67 2.08 (*)
4 3 152.00 15.10 (-*)
5 3 1.33 1.53 (*-)
6 3 9.67 0.58 (*-)
7 3 9.33 1.53 (*-)
8 3 231.00 16.52 (*)
--+---------+---------+---------+----
0 70 140 210
Pooled StDev = 8.08
ANOVA TEAM EXERCISE
Now it's your turn! Using the data recorded during the t-test team exercise, you will perform a One-Way ANOVA on the Brand A shot distances from the catapults of teams 1, 2 and 3. The instructor will collect this data from each team and make it available to you.

During this exercise, you will analyze the shot distances of the 60 projectiles fired from the 3 catapults (20 per catapult). The purpose of this experiment is to investigate the effect of the 3 catapults on the shot distance and answer the question, "Does the mean shot distance differ for the different catapults?"

ANOVA TEAM EXERCISE - CONT.
To analyze this data:

Response = Shot distance
Factor = Catapult
Levels = Catapult 1 (team 1), Catapult 2 (team 2) and Catapult 3 (team 3)

Procedure: to conduct this experiment, the following steps must be performed.

1. State the null (Ho) and alternate (Ha) hypotheses.
2. Stack the data.
3. Perform a Test for Equal Variances to verify the assumption of equality of variance.
4. Perform statistical analysis with the ANOVA table.
5. Create and interpret the Main Effects and Interval Plots.
6. Perform a diagnostic analysis on the Residuals and Fits.
7. What is the practical significance (Epsilon Squared)?

Working as a team (2-3 individuals per team), conduct the tests, analyze the
results and be prepared to present/discuss your results with the class. Allow 1
hours for this exercise.
