Lesson 10: Analysis of Covariance (ANCOVA)
You might find it interesting that historically, when SAS first came out, it had PROC ANOVA and PROC
REGRESSION and that was it. Then people asked, "What about the case when you have categorical
factors and you want to do an ANOVA, but you also have a continuous variable that you
can use as a covariate to account for extraneous variability in the response?" So SAS came out with
PROC GLM, the general linear model. With PROC GLM you can take the continuous regression
variable, pop it into the ANOVA model, and it runs. Or, conversely, if you are running a regression and you
have a categorical predictor like gender, you can include it in the regression model and it runs. The
general linear model handles both the regression and the categorical variables in the same model. There is
no PROC ANCOVA in SAS, but there is PROC MIXED. PROC GLM had problems when it came to random
effects, and was effectively replaced by PROC MIXED. The same sort of process can be seen in Minitab
and accounts for the multiple tabs under Stat > ANOVA and Stat > Regression. In SAS PROC MIXED or in
Minitab's General Linear Model, you have the capacity to include covariates and correctly work with random
effects. But enough about history; let's get to this lesson.
In the first part of this lesson we will address the classic case of ANCOVA, where the ANOVA is potentially
improved by adjusting for the presence of a linear covariate. In the second part we will deal with a little bit more
complexity by considering functions of the covariate that are not linear. We will generalize the treatment of
the continuous factors to include polynomials, with linear, quadratic, and cubic components that can interact
with categorical treatment levels.
We find the idea of ANCOVA interesting not only because it merges these two statistical concepts, but
also because it can be a very powerful "Aha!" moment for students studying statistics.
These sources of extraneous variability historically have been referred to as ‘nuisance’ or ‘concomitant’
variables. More recently, these variables are referred to as ‘covariates’.
When a continuous covariate is included in an ANOVA we have the analysis of covariance (ANCOVA). The
continuous covariates enter the model as regression variables, and we have to be careful to go through
several steps to employ the ANCOVA method.
Including covariates in an ANCOVA model can make the difference between concluding that there are, or are
not, significant differences among treatment means.
Males    Females
  78       80
  43       50
 103       30
  48       20
  80       60
$H_0: \mu_{\text{Males}} = \mu_{\text{Females}}$
Because the p-value > α (0.05), they cannot reject H0.
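The one-way ANOVA behind this conclusion can be reproduced by hand. The following is a minimal sketch in plain Python (not the SAS or Minitab procedures used in this lesson); the variable names are illustrative, and the critical value is the tabled upper 5% point of F(1, 8), hard-coded rather than computed.

```python
# One-way ANOVA F statistic for the salary data, computed by hand.
males = [78, 43, 103, 48, 80]
females = [80, 50, 30, 20, 60]


def mean(values):
    return sum(values) / len(values)


groups = [males, females]
all_values = males + females
grand_mean = mean(all_values)

# Between-group (treatment) and within-group (error) sums of squares.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

# Tabled upper 5% point of F(1, 8); an assumption of this sketch, not computed here.
F_CRIT_1_8 = 5.32
print(f"F = {f_stat:.2f}, reject H0: {f_stat > F_CRIT_1_8}")
```

The F statistic comes out at about 2.11, well below the critical value, which agrees with the failure to reject H0 reported above.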
However, they recognize that the length of time that someone has been out of college is likely to influence
how much money they are making. So they also included a question asking how many years they have
been out of college (ranging from 1 to 5 years for this sample):
     Females             Males
Salary   years      Salary   years
  80       5          78       3
  50       3          43       1
  30       2         103       5
  20       1          48       2
  60       4          80       4
We can see that indeed, there is a general trend for people to earn more the longer they are out of college.
The fundamental idea of including a covariate is to take this trending into account and effectively ‘control
for’ the number of years they have been out of college. In other words, we hope to include the covariate in
the ANOVA so that the comparison between Males and Females can be made without the complicating
factor of years out of college.
$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$
where $\beta_0$ is the intercept and $\beta_1$ is the slope of the line. The significance of a regression is tested by
calculating a sum of squares due to the regression variable, SS(Regr), calculating a mean square for
regression, MS(Regr), and using an F-test with F = MS(Regr) / MSE. In the case of a simple linear
regression, this test is equivalent to the t-test for $H_0: \beta_1 = 0$.
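As a small check on this equivalence, here is a sketch in plain Python that fits the simple linear regression of salary on years for the Females in the example and computes both the F statistic and the t statistic for the slope. The variable names are illustrative and not taken from the lesson's SAS or Minitab output.

```python
import math

# Females from the salary example: years out of college and salary.
years = [5, 3, 2, 1, 4]
salary = [80, 50, 30, 20, 60]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(salary) / n
sxx = sum((x - x_bar) ** 2 for x in years)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, salary))
syy = sum((y - y_bar) ** 2 for y in salary)

# Least-squares estimates of the slope and intercept.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

ss_regr = b1 * sxy       # SS(Regr), with 1 degree of freedom
sse = syy - ss_regr      # error sum of squares, with n - 2 degrees of freedom
mse = sse / (n - 2)
f_stat = ss_regr / mse   # MS(Regr) equals SS(Regr) because it has 1 df

# t statistic for H0: beta_1 = 0, using the standard error of the slope.
t_stat = b1 / math.sqrt(mse / sxx)

print(f"b0 = {b0:.1f}, b1 = {b1:.1f}, F = {f_stat:.1f}, t^2 = {t_stat ** 2:.1f}")
```

The squared t statistic matches the F statistic, as expected for a regression with a single predictor.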
However, in adding the regression variable to our one-way ANOVA model, we can envision a notational
problem. In the balanced one-way ANOVA we have the grand mean ($\mu$), but now we also have the intercept
$\beta_0$. To get around this, we can use the covariate centered at its overall mean:
$X^{*} = X_{ij} - \bar{X}_{..}$
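To see what centering does, the sketch below (plain Python, with a hypothetical helper `fit_line`) fits a single regression line to all ten observations of the salary example, once with the raw covariate and once with the centered covariate. It is only an illustration of the centering idea, not the full ANCOVA model.

```python
# All ten observations from the salary example (Females first, then Males).
years = [5, 3, 2, 1, 4, 3, 1, 5, 2, 4]
salary = [80, 50, 30, 20, 60, 78, 43, 103, 48, 80]


def fit_line(x, y):
    """Least-squares (intercept, slope) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Center the covariate at its overall mean.
x_bar_overall = sum(years) / len(years)
years_centered = [x - x_bar_overall for x in years]

raw_intercept, raw_slope = fit_line(years, salary)
ctr_intercept, ctr_slope = fit_line(years_centered, salary)
grand_mean = sum(salary) / len(salary)

print(f"raw:      intercept = {raw_intercept:.1f}, slope = {raw_slope:.1f}")
print(f"centered: intercept = {ctr_intercept:.1f}, slope = {ctr_slope:.1f}")
print(f"grand mean of salary = {grand_mean:.1f}")
```

Centering leaves the slope unchanged, and the intercept of the centered fit equals the grand mean of the response, so the intercept and the grand mean no longer compete for the same role in the model.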
Secondly, we have to be sure that the regression relationship of the response with the covariate has the
same slope for each treatment group. This is an extremely important point. In our example, we need to be
sure that the lines for Males and Females are parallel (have equal slope).
Depending on the outcome of the test for equal slopes, we have two alternative ways to finish up the
ANCOVA: 1) fit a common slope model and adjust the treatment SS for the presence of the covariate, or 2)
evaluate the differences in means at three or more levels of the covariate.
Note: The figure above is presented as a guideline, and does require some subjective judgement. Small
sample sizes, for example, may result in none of the individual regressions in step 1 being statistically
significant, yet the inclusion of the covariate in the model may still be advantageous. Exploratory data
analysis and working with the regression diagnostics is an important aspect of ANCOVA.
Fitting a separate simple linear regression of salary on years for Females and for Males shows that, in
both cases, the regression is significant, so the slopes are not equal to 0.
We will also include a ‘treatment × covariate’ interaction term and the significance of this term answers our
question. If the slopes differ significantly among treatment levels, the interaction p-value will be < 0.05.
Note: In SAS, we specify the treatment in the class statement, indicating that these are categorical
levels. By NOT including the covariate in the class statement, it will be treated as a continuous
variable for regression in the model statement.
So here we see that the slopes are equal and in a plot of the regressions we see that the lines are parallel.
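The equal-slopes test can also be worked through by hand as a comparison of two models: one with a separate slope per gender and one with a common slope. The sketch below uses plain Python with illustrative names (`corrected_sums` is a hypothetical helper), and the critical value is the tabled upper 5% point of F(1, 6), hard-coded rather than computed.

```python
# Salary example: (years, salary) lists for each gender.
data = {
    "Females": ([5, 3, 2, 1, 4], [80, 50, 30, 20, 60]),
    "Males": ([3, 1, 5, 2, 4], [78, 43, 103, 48, 80]),
}


def corrected_sums(x, y):
    """Return (Sxx, Sxy, Syy) about the group means."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxx, sxy, syy


sums = [corrected_sums(x, y) for x, y in data.values()]
n_total = sum(len(x) for x, _ in data.values())
n_groups = len(data)

# Full model: separate intercept and slope for each group.
sse_separate = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in sums)
df_separate = n_total - 2 * n_groups

# Reduced model: separate intercepts, one common (pooled) slope.
pooled_sxx = sum(s[0] for s in sums)
pooled_sxy = sum(s[1] for s in sums)
pooled_syy = sum(s[2] for s in sums)
sse_common = pooled_syy - pooled_sxy ** 2 / pooled_sxx

# The extra sum of squares for the interaction has (n_groups - 1) df.
ss_interaction = sse_common - sse_separate
f_interaction = (ss_interaction / (n_groups - 1)) / (sse_separate / df_separate)

F_CRIT_1_6 = 5.99  # tabled upper 5% point of F(1, 6); an assumption of this sketch
print(f"F = {f_interaction:.4f}, slopes differ: {f_interaction > F_CRIT_1_6}")
```

The interaction F statistic is very close to zero, so there is no evidence against equal slopes for these data.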
Please note: In SAS, the model statement automatically creates an intercept, so the ANCOVA model
is technically over-parameterized. To get the slopes and intercepts for the covariate directly, we have to
re-parameterize the model. This entails suppressing the intercept (noint) and then requesting
the solutions (solution) to the model:
Here is what the SAS code looks like for this (equal_sascode_05.txt [8]):
Here is the output:
The first section of the output above reports a separate intercept for each gender (the ‘Estimate’ value
for each gender) and a common slope for both genders, labeled ‘Years’.
Thus, the estimated regression equation for Females is y-hat = 2.7 + 15.1(Years), and for Males it is y-hat =
25.1 + 15.1(Years).
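These estimates can be checked directly from the data. The following sketch, in plain Python with illustrative names, computes the pooled within-group slope and the per-gender intercepts of the common slope model; it is a hand calculation, not the SAS procedure itself.

```python
# Salary example: (years, salary) lists for each gender.
data = {
    "Females": ([5, 3, 2, 1, 4], [80, 50, 30, 20, 60]),
    "Males": ([3, 1, 5, 2, 4], [78, 43, 103, 48, 80]),
}


def mean(values):
    return sum(values) / len(values)


# Pool the within-group sums of squares and cross-products.
pooled_sxx = 0.0
pooled_sxy = 0.0
for x, y in data.values():
    x_bar, y_bar = mean(x), mean(y)
    pooled_sxx += sum((a - x_bar) ** 2 for a in x)
    pooled_sxy += sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

# Common slope shared by both genders.
common_slope = pooled_sxy / pooled_sxx

# Each gender's intercept puts its line through that gender's (x-bar, y-bar).
intercepts = {
    gender: mean(y) - common_slope * mean(x)
    for gender, (x, y) in data.items()
}

for gender, intercept in intercepts.items():
    print(f"{gender}: y-hat = {intercept:.1f} + {common_slope:.1f}(Years)")
```

The results agree with the equations above: a common slope of 15.1, with intercepts of 2.7 for Females and 25.1 for Males.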
To this point in the analysis we can see that 'gender' is now significant. By removing the impact of the
covariate, the test for gender went from non-significant (without covariate consideration) to
significant.
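The change can be made concrete by computing the F statistic for gender both ways. The sketch below (plain Python, illustrative names) computes the unadjusted one-way ANOVA F, and then an adjusted F by comparing a model containing only the covariate with the common slope model that also contains gender. This is one standard way to form the adjusted test; the critical values are tabled upper 5% points, hard-coded as assumptions.

```python
# Salary example: (years, salary) lists for each gender.
data = {
    "Females": ([5, 3, 2, 1, 4], [80, 50, 30, 20, 60]),
    "Males": ([3, 1, 5, 2, 4], [78, 43, 103, 48, 80]),
}


def corrected_sums(x, y):
    """Return (Sxx, Sxy, Syy) about the means of x and y."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxx, sxy, syy


all_x = [a for x, _ in data.values() for a in x]
all_y = [b for _, y in data.values() for b in y]
n_total, n_groups = len(all_y), len(data)

within = [corrected_sums(x, y) for x, y in data.values()]
w_sxx = sum(s[0] for s in within)
w_sxy = sum(s[1] for s in within)
w_syy = sum(s[2] for s in within)
t_sxx, t_sxy, t_syy = corrected_sums(all_x, all_y)

# Unadjusted: ordinary one-way ANOVA for gender.
f_unadjusted = ((t_syy - w_syy) / (n_groups - 1)) / (w_syy / (n_total - n_groups))

# Adjusted: covariate-only model versus the common slope ANCOVA model.
sse_covariate_only = t_syy - t_sxy ** 2 / t_sxx
sse_ancova = w_syy - w_sxy ** 2 / w_sxx
df_ancova = n_total - n_groups - 1
f_adjusted = ((sse_covariate_only - sse_ancova) / (n_groups - 1)) / (sse_ancova / df_ancova)

F_CRIT_1_8 = 5.32  # tabled upper 5% point of F(1, 8)
F_CRIT_1_7 = 5.59  # tabled upper 5% point of F(1, 7)
print(f"unadjusted F = {f_unadjusted:.2f} (significant: {f_unadjusted > F_CRIT_1_8})")
print(f"adjusted F   = {f_adjusted:.2f} (significant: {f_adjusted > F_CRIT_1_7})")
```

The sum of squares for gender is the same in both tests here because the two genders have the same mean years out of college; the gain comes from the much smaller error mean square once the covariate is in the model.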
Males
Open the Male dataset in the Minitab project file salary_male_data.txt [9] [11].
From the menu bar, select Stat > Regression > Regression.
In the pop-up window, select salary into Response and years into Predictors as shown below.
Click OK, and here is the output that Minitab displays:
Females
From the menu bar select Stat > Regression > Regression.
In the pop-up window, select salary into Response and years into Predictors as shown below.
Click OK, and here is the output that Minitab displays:
In both cases, the simple linear regressions are significant, so the slopes are not = 0.
In Minitab we must now use GLM (general linear model) and be sure to include the covariate in the model.
We will also include a ‘treatment × covariate’ interaction term and the significance of this term is what
answers our question. If the slopes differ significantly among treatment levels, the interaction p-value will be
< 0.05.
First, open the dataset in the Minitab project file salary_data.txt [12].
Then, from the menu select Stat > ANOVA > GLM (general linear model).
In the dialog box, select salary into Responses and gender into Model, and type gender*years as well.
Then, in this dialog box, click on the button "Covariates..." under the text boxes. Select years as Covariates.
Next, click on the Model box, use the shift key to highlight the gender and years, and then 'add' to create
the gender*years interaction:
Click OK, and then OK again; here is the output that Minitab will display:
So here we see that the slopes are equal and in a plot of the regressions we see that the lines are parallel.
Females Males
Salary years Salary years
80 5 42 1
50 3 112 4
30 2 92 3
20 1 62 2
60 4 142 5
With the new data, we would see in Step 2 that we do have a significant treatment × covariate interaction,
using this SAS program (unequal_sascode_01.txt [13]).
We can do the same thing with the unequal slopes model to generate individual slopes and intercepts for
'gender' as follows in SAS (unequal_sascode_02.txt [14]):
Output:
Here the intercepts are the Estimates for effects labeled 'gender' and the slopes are the Estimates for the
effect labeled 'years*gender'. Thus, the regression equations for this unequal slopes model are:
The slopes of the regression lines differ significantly and are not parallel:
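The separate fits and the interaction test for this second data set can be worked out by hand. The sketch below uses plain Python with illustrative names (`fit_group` is a hypothetical helper); the critical value is the tabled upper 5% point of F(1, 6), hard-coded as an assumption. The estimates are computed from the data table above rather than copied from the SAS output.

```python
# Unequal slopes example: (years, salary) lists for each gender.
data = {
    "Females": ([5, 3, 2, 1, 4], [80, 50, 30, 20, 60]),
    "Males": ([1, 4, 3, 2, 5], [42, 112, 92, 62, 142]),
}


def fit_group(x, y):
    """Return (intercept, slope, Sxx, Sxy, Syy) for one group's regression."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    syy = sum((b - y_bar) ** 2 for b in y)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxx, sxy, syy


fits = {gender: fit_group(x, y) for gender, (x, y) in data.items()}
for gender, (intercept, slope, *_rest) in fits.items():
    print(f"{gender}: y-hat = {intercept:.1f} + {slope:.1f}(Years)")

n_total = sum(len(x) for x, _ in data.values())
n_groups = len(data)

# Error SS with separate slopes, and with one common slope.
sse_separate = sum(syy - sxy ** 2 / sxx for _, _, sxx, sxy, syy in fits.values())
pooled_sxx = sum(f[2] for f in fits.values())
pooled_sxy = sum(f[3] for f in fits.values())
pooled_syy = sum(f[4] for f in fits.values())
sse_common = pooled_syy - pooled_sxy ** 2 / pooled_sxx

df_separate = n_total - 2 * n_groups
f_interaction = ((sse_common - sse_separate) / (n_groups - 1)) / (sse_separate / df_separate)

F_CRIT_1_6 = 5.99  # tabled upper 5% point of F(1, 6)
print(f"interaction F = {f_interaction:.1f}, slopes differ: {f_interaction > F_CRIT_1_6}")
```

Under this hand calculation the fitted lines are y-hat = 3 + 15(Years) for Females and y-hat = 15 + 25(Years) for Males, and the interaction F statistic is far above the critical value.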
Go to Stat > ANOVA > GLM (general linear model) and follow the same sequence of steps as in Lesson
10.4a.
So here we can’t simply remove the interaction term and compare the treatment means at the mean level of
the covariate (3 years out of college). The magnitude of the difference between males and females depends
on the level of the covariate (giving rise to the interaction significance).
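To illustrate why a single comparison is not enough, the sketch below (plain Python, illustrative names) fits a separate line for each gender in the unequal slopes data and evaluates the estimated Male minus Female difference at three levels of the covariate. The levels 1, 3, and 5 years are chosen here for illustration, and only point estimates are shown, without standard errors or tests.

```python
# Unequal slopes example: (years, salary) lists for each gender.
data = {
    "Females": ([5, 3, 2, 1, 4], [80, 50, 30, 20, 60]),
    "Males": ([1, 4, 3, 2, 5], [42, 112, 92, 62, 142]),
}


def fit_line(x, y):
    """Least-squares (intercept, slope) for a simple linear regression."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


lines = {gender: fit_line(x, y) for gender, (x, y) in data.items()}


def predict(gender, years):
    intercept, slope = lines[gender]
    return intercept + slope * years


# Illustrative covariate levels: low, middle (the mean), and high.
levels = [1, 3, 5]
differences = {
    years: predict("Males", years) - predict("Females", years)
    for years in levels
}

for years, diff in differences.items():
    print(f"{years} year(s) out of college: Males - Females = {diff:.1f}")
```

The estimated difference grows from 22 at 1 year to 42 at 3 years and 62 at 5 years, which is the pattern that the significant interaction reflects.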
Links:
[1] http://www.dynamicdrive.com
[2] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/ancova_example_sascode.txt
[3] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/ancova_example_data.txt
[4] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/equal_sascode_01.txt
[5] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/equal_sascode_02.txt
[6] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/equal_sascode_03.txt
[7] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/equal_sascode_04.txt
[8] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/equal_sascode_05.txt
[9] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/salary_male_data.txt
[10] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/salary_female_data.txt
[11] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/Male-salary.MPJ
[12] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/salary_data.txt
[13] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/unequal_sascode_01.txt
[14] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/unequal_sascode_02.txt
[15] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/salary-new_data.txt