
Published on STAT 502 (https://onlinecourses.science.psu.edu/stat502)

Lesson 10: Analysis of Covariance (ANCOVA)


Key Learning Goals for this Lesson:

Introduce the General Linear Model (GLM)


How to include a continuous covariate variable in ANOVA.
Understand testing sequences for ANCOVA
Understand equal and unequal slopes model development

See Textbook: Chapter 22

A Few Comments About ANCOVA


In the next two units we are going to build on concepts that we have learned so far in this course, and these
units will also remind us of the principles and foundations of regression that you learned in STAT 501. They
expand on the idea of the general linear model and how it can handle both quantitative and qualitative
predictors. The general linear model can be thought of as the larger picture, an 'umbrella' procedure if you
will. If you have a model with no continuous factors, you simply have an ANOVA. If you have a model with
no categorical factors, you simply have a regression. If you have a model that has both continuous and
categorical factors, then this is a General Linear Model and you can use ANCOVA to include both of these
different types of factors.

You might find it interesting that historically, when SAS first came out, they had PROC ANOVA and PROC
REGRESSION and that was it. Then people asked, "What about the case when you have categorical
factors and you want to do an ANOVA, but now you have this other variable, a continuous variable, that you
can use as a covariate to account for extraneous variability in the response?" So, SAS came out with
PROC GLM, which is the general linear model. With PROC GLM you could take the continuous regression
variable, pop it into the ANOVA model, and it runs. Or, conversely, if you are running a regression and you
have a categorical predictor like gender, you could include it in the regression model and it runs. The
general linear model handles both the regression and the categorical variables in the same model. There is
no PROC ANCOVA in SAS, but there is PROC MIXED. PROC GLM had problems when it came to random
effects, and was effectively replaced by PROC MIXED. The same sort of process can be seen in Minitab
and accounts for the multiple tabs under Stat > ANOVA and Stat > Regression. In SAS PROC MIXED or in
Minitab's General Linear Model, you have the capacity to include covariates and correctly work with random
effects. But enough about history, let's get to this lesson.

In the first lesson we will address the classic case of ANCOVA where the ANOVA is potentially improved by
adjusting for the presence of a linear covariate. In the second part we will deal with a little bit more
complexity by considering functions of the covariate that are not linear. We will generalize the treatment of
the continuous factors to include polynomials, with linear, quadratic, cubic components that can interact
with categorical treatment levels.

We find this idea of ANCOVA not only interesting in the fact that it merges these two statistical concepts, but
it can also be a very powerful 'Aha!' moment for students studying statistics.

Introduction to Analysis of Covariance (ANCOVA)


A ‘classic’ ANOVA tests for differences in mean responses to categorical factor (treatment) levels. When we
have heterogeneity in experimental units sometimes restrictions on the randomization (blocking) can
improve the test for treatment effects. In some cases, we don’t have the opportunity to construct blocks, but
can recognize and measure a continuous variable as contributing to the heterogeneity in the experimental
units.

These sources of extraneous variability historically have been referred to as ‘nuisance’ or ‘concomitant’
variables. More recently, these variables are referred to as ‘covariates’.

When a continuous covariate is included in an ANOVA we have the analysis of covariance (ANCOVA). The
continuous covariates enter the model as regression variables, and we have to be careful to go through
several steps to employ the ANCOVA method.

Including a covariate in the model can make the difference between concluding that there are, or are not,
significant differences among treatment means.

10.1 - Role of the Covariate


To illustrate the role the covariate has in the ANCOVA, let’s look at a hypothetical situation wherein
investigators are comparing salaries of male vs. female college graduates. A random sample of 5
individuals for each gender is compiled, and a simple one-way ANOVA is performed:

Males    Females
78       80
43       50
103      30
48       20
80       60

H0 : μMales = μFemales

SAS coding for the One-way ANOVA (ancova_example_sascode.txt [2])


Here is the output we get:

Because the p-value > α (0.05), the investigators cannot reject H0.
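
The one-way ANOVA above can be reproduced by hand as a rough check. The following is a minimal sketch in Python rather than the SAS program the lesson links to. The variable names are illustrative, and the critical value 5.32 is the tabulated upper 5% point of the F distribution with (1, 8) degrees of freedom, used here in place of a p-value.

```python
# One-way ANOVA for the salary example, computed from first principles.
males = [78, 43, 103, 48, 80]
females = [80, 50, 30, 20, 60]


def mean(values):
    return sum(values) / len(values)


groups = [males, females]
all_values = males + females
grand_mean = mean(all_values)

# Between-group (treatment) and within-group (error) sums of squares.
ss_treatment = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)

df_treatment = len(groups) - 1
df_error = len(all_values) - len(groups)

f_stat = (ss_treatment / df_treatment) / (ss_error / df_error)

# Tabulated upper 5% point of F with (1, 8) df (assumed alpha = 0.05).
F_CRIT_1_8 = 5.32

print(f"F = {f_stat:.2f}")
print("reject H0" if f_stat > F_CRIT_1_8 else "fail to reject H0")
```

With these ten observations the F statistic is about 2.11, well below the critical value, which agrees with the conclusion stated above.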

A plot of the data shows the situation:

However, they recognize that the length of time that someone has been out of college is likely to influence
how much money they are making. So they also included a question asking how many years they have
been out of college (ranging from 1 to 5 years for this sample):

Females            Males
Salary   Years     Salary   Years
80       5         78       3
50       3         43       1
30       2         103      5
20       1         48       2
60       4         80       4

We can see that indeed, there is a general trend for people to earn more the longer they are out of college.
The fundamental idea of including a covariate is to take this trending into account and effectively ‘control
for’ the number of years they have been out of college. In other words, we hope to include the covariate in
the ANOVA so that the comparison between Males and Females can be made without the complicating
factor of years out of college.

10.2 - The Covariate as a Regression Variable


ANCOVA by definition is a general linear model that includes both ANOVA (categorical) predictors and
Regression (continuous) predictors. The simple linear regression model is:

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$$

where $\beta_0$ is the intercept and $\beta_1$ is the slope of the line. The significance of a regression is
tested by calculating a sum of squares due to the regression variable, SS(Regr), calculating a mean square for
regression, MS(Regr), and using an F-test with F = MS(Regr) / MSE. In the case of a simple linear
regression, this test is equivalent to the t-test for $H_0 : \beta_1 = 0$.
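
The equivalence of the F-test and the t-test in simple linear regression can be seen in a few lines. This is a sketch using standard regression notation that does not appear in the lesson: S_xx for the corrected sum of squares of X, a hat for least squares estimates, and n for the number of observations.

```latex
\begin{align*}
S_{xx} &= \sum_i (X_i - \bar{X})^2, \qquad
\text{SS(Regr)} = \hat{\beta}_1^{2}\, S_{xx}, \qquad
\text{MSE} = \frac{\text{SSE}}{n-2}, \\
F &= \frac{\text{MS(Regr)}}{\text{MSE}}
   = \frac{\hat{\beta}_1^{2}\, S_{xx}}{\text{MSE}}
   = \left( \frac{\hat{\beta}_1}{\sqrt{\text{MSE}/S_{xx}}} \right)^{2}
   = t^{2}.
\end{align*}
```

Because the regression has one degree of freedom, MS(Regr) equals SS(Regr), and the denominator inside the square is the standard error of the estimated slope.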

However, in adding the regression variable to our one-way ANOVA model, we can envision a notational
problem. In the balanced one-way ANOVA we have the grand mean ($\mu$), but now we also have the intercept
$\beta_0$. To get around this, we can center the covariate at its overall mean, using

$$X^{*}_{ij} = X_{ij} - \bar{X}_{..}$$

and get the following as an expression of our covariance model:

$$Y_{ij} = \mu + \tau_i + \gamma X^{*}_{ij} + \epsilon_{ij}$$

The Type III (model fit) sums of squares for the treatment levels in this model are being corrected (or
adjusted) for the regression relationship. This has the effect of evaluating the treatment levels ‘on the same
playing field’, that is, comparing the means of the treatment levels at the mean value of the covariate. This
process effectively removes variation that was originally seen in the treatment level means due to the
covariate.
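
One way to make "comparing the means at the mean value of the covariate" concrete is the usual adjusted treatment mean. The expression below is a sketch in standard notation (bars for averages, a hat for the estimated common slope) and is not quoted from the lesson.

```latex
\begin{align*}
\bar{Y}_{i,\text{adj}}
  &= \bar{Y}_{i.} - \hat{\gamma}\,\bigl(\bar{X}_{i.} - \bar{X}_{..}\bigr)
\end{align*}
```

Each treatment mean is shifted along the fitted slope from its own covariate average to the overall covariate average. In the salary example both genders average 3 years out of college, so the adjusted means equal the raw means (48 for Females and 70.4 for Males); there the covariate helps by reducing the error variance rather than by moving the means.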

10.3 - Steps in ANCOVA


To use a covariate in ANCOVA, we have to go through several steps. First, we need to establish that for at
least one of the treatment groups there is a significant regression relationship with the covariate. Otherwise,
including the covariate in the model won’t improve the estimation of treatment means.

Secondly, we have to be sure that the regression relationship of the response with the covariate has the
same slope for each treatment group. This is an extremely important point. In our example, we need to be
sure that the lines for Males and Females are parallel (have equal slope).

Depending on the outcome of the test for equal slopes, we have two alternative ways to finish up the
ANCOVA: 1) if the slopes are equal, fit a common slope model and adjust the treatment SS for the presence
of the covariate, or 2) if the slopes differ, evaluate the differences in means at three or more levels of the
covariate.

These steps are diagrammed below:

Note: The figure above is presented as a guideline, and does require some subjective judgement.  Small
sample sizes, for example, may result in none of the individual regressions in step 1 being statistically
significant, yet the inclusion of the covariate in the model may still be advantageous.  Exploratory data
analysis and working with the regression diagnostics is an important aspect of ANCOVA.

10.4 - Equal Slopes Model - SAS


Using our Salary example and the data in the table below, we can run through the steps for the ANCOVA.

Females            Males
Salary   Years     Salary   Years
80       5         78       3
50       3         43       1
30       2         103      5
20       1         48       2
60       4         80       4

Step 1: Are all regression slopes = 0?


A simple linear regression can be run for each treatment group, Males and Females. (Note: To perform
regression analysis on each gender group in Minitab, we will have to sub-divide the salary data manually
and save them separately.  See the next page for Minitab example.)

Running these procedures using statistical software we get the following:

Males

Use the following SAS code (equal_sascode_01.txt [4])

And here is the output that you get:


Females

Use the following SAS code (equal_sascode_02.txt [5])

And here is the output for this run:

In both cases, the simple linear regressions are significant, so neither slope is zero.
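
Step 1 can be sketched by hand as a check on the software output. The following Python is an illustration only, not the SAS code linked above. The helper name simple_regression is hypothetical, and 10.13 is the tabulated upper 5% point of F with (1, 3) degrees of freedom.

```python
# Step 1 sketch: regress salary on years separately within each gender.
data = {
    "Females": {"years": [5, 3, 2, 1, 4], "salary": [80, 50, 30, 20, 60]},
    "Males": {"years": [3, 1, 5, 2, 4], "salary": [78, 43, 103, 48, 80]},
}


def simple_regression(x, y):
    """Return slope, intercept and the F statistic for H0: slope = 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    ss_regr = slope * s_xy        # regression sum of squares, 1 df
    ss_error = s_yy - ss_regr     # residual sum of squares, n - 2 df
    f_stat = ss_regr / (ss_error / (n - 2))
    return slope, intercept, f_stat


# Tabulated upper 5% point of F with (1, 3) df (assumed alpha = 0.05).
F_CRIT_1_3 = 10.13

results = {}
for group, d in data.items():
    results[group] = simple_regression(d["years"], d["salary"])
    slope, intercept, f_stat = results[group]
    print(f"{group}: slope = {slope:.1f}, intercept = {intercept:.1f}, "
          f"F = {f_stat:.1f}, significant = {f_stat > F_CRIT_1_3}")
```

Both F statistics exceed the critical value, which matches the statement that each within-group regression is significant.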

Step 2: Are the slopes equal?


We can test for this using our statistical software.
In SAS we now use proc mixed and include the covariate in the model (equal_sascode_03.txt [6]).

We will also include a ‘treatment × covariate’ interaction term and the significance of this term answers our
question. If the slopes differ significantly among treatment levels, the interaction p-value will be < 0.05.

Note: In SAS, we specify the treatment in the class statement, indicating that these are categorical
levels. By NOT including the covariate in the class statement, it will be treated as a continuous
variable for regression in the model statement.

So here we see that the slopes are equal (the interaction term is not significant), and in a plot of the
regressions we see that the lines are parallel.
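
The test for equal slopes amounts to comparing the error sum of squares of a separate-slopes model with that of a common-slope model. The sketch below does this by hand in Python. It is not the proc mixed program referenced above, the function and variable names are illustrative, and 5.99 is the tabulated upper 5% point of F with (1, 6) degrees of freedom.

```python
# Step 2 sketch: F test for the treatment x covariate interaction.
females = {"years": [5, 3, 2, 1, 4], "salary": [80, 50, 30, 20, 60]}
males = {"years": [3, 1, 5, 2, 4], "salary": [78, 43, 103, 48, 80]}
groups = [females, males]


def centered_sums(group):
    """Within-group corrected sums of squares and cross-products."""
    x, y = group["years"], group["salary"]
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xx, s_xy, s_yy


sums = [centered_sums(g) for g in groups]
n_total = sum(len(g["years"]) for g in groups)

# Full model: separate intercept and separate slope for each group.
sse_full = sum(s_yy - s_xy ** 2 / s_xx for s_xx, s_xy, s_yy in sums)
df_full = n_total - 2 * len(groups)

# Reduced model: separate intercepts, one pooled slope.
pooled_sxx = sum(s[0] for s in sums)
pooled_sxy = sum(s[1] for s in sums)
pooled_syy = sum(s[2] for s in sums)
sse_reduced = pooled_syy - pooled_sxy ** 2 / pooled_sxx

# Extra-sum-of-squares F statistic for the interaction term.
df_interaction = len(groups) - 1
f_interaction = ((sse_reduced - sse_full) / df_interaction) / (sse_full / df_full)

# Tabulated upper 5% point of F with (1, 6) df (assumed alpha = 0.05).
F_CRIT_1_6 = 5.99

print(f"F(interaction) = {f_interaction:.4f}")
print("slopes differ" if f_interaction > F_CRIT_1_6 else "no evidence of unequal slopes")
```

For these data the interaction F statistic is close to zero, so there is no evidence against a common slope.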

Step 3: Fit an Equal Slopes Model


We can now proceed to fit an Equal Slopes model by removing the interaction term.  Again, we will use our
statistical software SAS (see equal_sascode_04.txt [7]).

and obtain the following final results:

Please Note: In SAS, the model statement automatically creates an intercept, and so the ANCOVA model
is technically over-parameterized. To get the slopes and intercepts for the covariate directly, we have to re-
parameterize the model. This entails suppressing the intercept (noint), and then specifying that we want
the solutions, (solution), to the model:

Here is what the SAS code looks like for this (equal_sascode_05.txt [8]):

Here is the output:

The first section of the output above reports a separate intercept for each gender (the ‘Estimate’ value
for each gender) and a common slope for both genders, labeled ‘Years’.

Thus, the estimated regression equation for Females is y-hat = 2.7 + 15.1(Years), and for Males it is y-hat =
25.1 + 15.1(Years).
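
The intercepts and common slope reported above can be recovered directly from the data. This is a minimal Python sketch of the equal slopes fit, assuming ordinary least squares with a pooled within-group slope; it is not the noint/solution program itself, and the names are illustrative.

```python
# Equal slopes sketch: pooled within-group slope, one intercept per gender.
females = {"years": [5, 3, 2, 1, 4], "salary": [80, 50, 30, 20, 60]}
males = {"years": [3, 1, 5, 2, 4], "salary": [78, 43, 103, 48, 80]}
groups = {"Females": females, "Males": males}


def mean(values):
    return sum(values) / len(values)


pooled_sxx = 0.0
pooled_sxy = 0.0
for g in groups.values():
    x_bar, y_bar = mean(g["years"]), mean(g["salary"])
    pooled_sxx += sum((x - x_bar) ** 2 for x in g["years"])
    pooled_sxy += sum((x - x_bar) * (y - y_bar)
                      for x, y in zip(g["years"], g["salary"]))

# The common slope pools the within-group cross-products over both groups.
common_slope = pooled_sxy / pooled_sxx

# Each group's line passes through that group's own (mean years, mean salary).
intercepts = {name: mean(g["salary"]) - common_slope * mean(g["years"])
              for name, g in groups.items()}

for name, intercept in intercepts.items():
    print(f"{name}: y-hat = {intercept:.1f} + {common_slope:.1f}(Years)")
```

The printed equations agree with the estimates quoted above: intercepts of 2.7 and 25.1 with a common slope of 15.1.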

To this point in this analysis we can see that 'gender' is now significant. By removing the impact of the
covariate, we went from a non-significant gender effect (without covariate consideration) to a significant
one (adjusting for the covariate).
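
The covariate-adjusted test for gender can be sketched as a comparison of two models: years only, versus gender plus years with a common slope. The Python below is a hand calculation under that assumption, not the lesson's SAS output. Names are illustrative, and 5.59 is the tabulated upper 5% point of F with (1, 7) degrees of freedom.

```python
# Adjusted test for gender: compare "years only" with "gender + years".
years = [5, 3, 2, 1, 4, 3, 1, 5, 2, 4]
salary = [80, 50, 30, 20, 60, 78, 43, 103, 48, 80]
gender = ["F"] * 5 + ["M"] * 5


def corrected_sums(x, y):
    """Corrected sums of squares and cross-products for paired lists."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xx, s_xy, s_yy


# Reduced model: a single regression line, ignoring gender.
s_xx, s_xy, s_yy = corrected_sums(years, salary)
sse_reduced = s_yy - s_xy ** 2 / s_xx

# Full model: separate intercepts per gender, common slope.
pooled = [0.0, 0.0, 0.0]
for level in sorted(set(gender)):
    x = [xi for xi, g in zip(years, gender) if g == level]
    y = [yi for yi, g in zip(salary, gender) if g == level]
    for k, value in enumerate(corrected_sums(x, y)):
        pooled[k] += value
sse_full = pooled[2] - pooled[1] ** 2 / pooled[0]
df_full = len(salary) - 3  # two intercepts and one slope

# Gender sum of squares adjusted for the covariate, 1 df.
ss_gender_adj = sse_reduced - sse_full
f_gender = ss_gender_adj / (sse_full / df_full)

# Tabulated upper 5% point of F with (1, 7) df (assumed alpha = 0.05).
F_CRIT_1_7 = 5.59

print(f"F(gender | years) = {f_gender:.2f}")
print("significant" if f_gender > F_CRIT_1_7 else "not significant")
```

The adjusted F statistic is about 47.46, compared with about 2.11 when the covariate is ignored. The treatment sum of squares is unchanged here because both genders have the same mean number of years; the gain comes from the much smaller error mean square.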

10.4a - Equal Slopes Model - using Minitab


Using our Salary example and the data in the table below, we can run through the steps for the ANCOVA. 
On this page we will go through the steps using Minitab.

Females            Males
Salary   Years     Salary   Years
80       5         78       3
50       3         43       1
30       2         103      5
20       1         48       2
60       4         80       4

Step 1: Are all regression slopes = 0?


A simple linear regression can be run for each treatment group, Males and Females. (Note: To perform
regression analysis on each gender group in Minitab, we will have to sub-divide the salary data manually,
saving the male data into salary_male_data.txt [9] and the female data into
salary_female_data.txt [10].)

Running these procedures using statistical software we get the following:

Males

Open the Male dataset in the Minitab project file salary_male_data.txt [9] [11].

From the menu bar, select Stat > Regression > Regression.

In the pop-up window, select salary into Response and years into Predictors as shown below.
Click OK, and here is the output that Minitab displays:

Females

Open Minitab dataset salary_female_data.txt [10].

From the menu bar select Stat > Regression > Regression.

In the pop-up window, select salary into Response and years into Predictors as shown below.
Click OK, and here is the output that Minitab displays:

In both cases, the simple linear regressions are significant, so neither slope is zero.

Step 2: Are the slopes equal?


We can test for this using our statistical software.

In Minitab we must now use GLM (general linear model) and be sure to include the covariate in the model.
We will also include a ‘treatment x covariate’ interaction term and the significance of this term is what
answers our question. If the slopes differ significantly among treatment levels, the interaction p-value will be
< 0.05.

First, open the dataset in the Minitab project file salary_data.txt [12].

Then, from the menu select Stat > ANOVA > GLM (general linear model).

In the dialog box, select salary into Responses and gender into Model, and type gender*years as well.
Then, in this dialog box, click on the button "Covariates..." under the text boxes. Select years as Covariates.

Next, click on the Model box, use the shift key to highlight the gender and years, and then 'add' to create
the gender*years interaction:

Click OK, and then OK again, and here is the output that Minitab will display:

So here we see that the slopes are equal and in a plot of the regressions we see that the lines are parallel.

Step 3: Fit an Equal Slopes Model


We can now proceed to fit an Equal Slopes model by removing the interaction term. This can be easily
accomplished by starting again with Stat > ANOVA > General Linear Model, but now click on the second item:

To generate the mean comparisons, go to Stat > ANOVA > General Linear Model, but now click on Comparisons.

10.5 - Unequal Slopes Model - SAS


If the data collected in the example study were instead as follows:

Females            Males
Salary   Years     Salary   Years
80       5         42       1
50       3         112      4
30       2         92       3
20       1         62       2
60       4         142      5

We would see in Step 2 that we do have a significant treatment × covariate interaction, using this SAS
program with the new data in it (unequal_sascode_01.txt [13]).

We get the following output:

Generating Covariate Regression Slopes and Intercepts

We can do the same thing with the unequal slopes model to generate individual slopes and intercepts for
'gender' as follows in SAS (unequal_sascode_02.txt [14]):
Output:

Here the intercepts are the Estimates for effects labeled 'gender' and the slopes are the Estimates for the
effect labeled 'years*gender'. Thus, the regression equations for this unequal slopes model are:

Females: y = 3.0 + 15(Years)

Males: y = 15 + 25(Years)
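
The separate regression lines and the interaction test for this second data set can be checked by hand. The following Python sketch fits one line per gender and compares it with a common-slope fit; it stands in for, and does not reproduce, the SAS programs referenced above. Names are illustrative, and 5.99 is the tabulated upper 5% point of F with (1, 6) degrees of freedom.

```python
# Unequal slopes sketch: one fitted line per gender, then the interaction test.
data = {
    "Females": {"years": [5, 3, 2, 1, 4], "salary": [80, 50, 30, 20, 60]},
    "Males": {"years": [1, 4, 3, 2, 5], "salary": [42, 112, 92, 62, 142]},
}

lines = {}
sse_full = 0.0
pooled_sxx = pooled_sxy = pooled_syy = 0.0
n_total = 0
for name, d in data.items():
    x, y = d["years"], d["salary"]
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    slope = s_xy / s_xx
    lines[name] = (y_bar - slope * x_bar, slope)  # (intercept, slope)
    sse_full += s_yy - s_xy ** 2 / s_xx           # separate-slopes error SS
    pooled_sxx += s_xx
    pooled_sxy += s_xy
    pooled_syy += s_yy
    n_total += len(x)

# Common-slope error SS, used as the reduced model.
sse_reduced = pooled_syy - pooled_sxy ** 2 / pooled_sxx
df_full = n_total - 2 * len(data)
f_interaction = (sse_reduced - sse_full) / (sse_full / df_full)

# Tabulated upper 5% point of F with (1, 6) df (assumed alpha = 0.05).
F_CRIT_1_6 = 5.99

for name, (intercept, slope) in lines.items():
    print(f"{name}: y = {intercept:.1f} + {slope:.1f}(Years)")
print(f"F(interaction) = {f_interaction:.1f}, "
      f"significant = {f_interaction > F_CRIT_1_6}")
```

The fitted lines reproduce the equations given above, and the interaction F statistic of 50 is far above the critical value, consistent with unequal slopes.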

The slopes of the regression lines differ significantly and are not parallel:

And here is the output:


In this case, we see a significant difference at each level of the covariate specified in the lsmeans
statement. The magnitude of the difference between males and females differs (giving rise to the interaction
significance). In more realistic situations, a significant treatment × covariate interaction often results in
significant treatment level differences at certain points along the covariate axis.
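
Comparing the genders at several covariate values can be sketched as follows. The choice of 1, 3 and 5 years is an assumption made for illustration, since the lsmeans statement itself is not shown here. The calculation uses the separate fitted lines with a pooled error mean square, and 2.447 is the tabulated two-sided 5% critical value of t with 6 degrees of freedom.

```python
import math

# Unequal slopes data from this section.
females = {"years": [5, 3, 2, 1, 4], "salary": [80, 50, 30, 20, 60]}
males = {"years": [1, 4, 3, 2, 5], "salary": [42, 112, 92, 62, 142]}


def fit_line(group):
    """Least squares line for one group, with the pieces needed for SEs."""
    x, y = group["years"], group["salary"]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return {"n": n, "x_bar": x_bar, "s_xx": s_xx,
            "slope": slope, "intercept": intercept, "sse": sse}


fit_f, fit_m = fit_line(females), fit_line(males)

# Pooled error mean square: 10 observations minus 4 fitted parameters.
df_error = fit_f["n"] + fit_m["n"] - 4
mse = (fit_f["sse"] + fit_m["sse"]) / df_error

# Tabulated two-sided 5% critical value of t with 6 df.
T_CRIT_6 = 2.447

comparisons = {}
for x0 in (1, 3, 5):  # illustrative covariate values (assumed)
    diff = ((fit_m["intercept"] + fit_m["slope"] * x0)
            - (fit_f["intercept"] + fit_f["slope"] * x0))
    variance = mse * sum(1 / f["n"] + (x0 - f["x_bar"]) ** 2 / f["s_xx"]
                         for f in (fit_f, fit_m))
    t_stat = diff / math.sqrt(variance)
    comparisons[x0] = (diff, t_stat)
    print(f"years = {x0}: Males - Females = {diff:.1f}, t = {t_stat:.2f}, "
          f"significant = {abs(t_stat) > T_CRIT_6}")
```

The estimated difference grows from 22 at 1 year to 62 at 5 years and is significant at each of the three points, which illustrates how the size of the treatment effect depends on the covariate when the slopes differ.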

10.5a - Unequal Slopes Model - Minitab


With a new data file, salary-new_data.txt [15], we re-run the program and find that we get a significant
interaction between gender and years.

To do this, open the Minitab dataset salary-new_data.txt [15].

Go to Stat > ANOVA > GLM (general linear model) and follow the same sequence of steps as in Lesson
10.4a.

So here we can’t simply remove the interaction term and compare the treatment means at the mean level of
the covariate (3 years out of college). The magnitude of the difference between males and females differs
(giving rise to the interaction significance).

Source URL: https://onlinecourses.science.psu.edu/stat502/node/183

Links:
[1] http://www.dynamicdrive.com
[2] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/ancova_example_sascode.txt
[3] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/ancova_example_data.txt
[4] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/equal_sascode_01.txt
[5] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/equal_sascode_02.txt
[6] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/equal_sascode_03.txt
[7] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/equal_sascode_04.txt
[8] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/equal_sascode_05.txt
[9] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/salary_male_data.txt
[10] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/salary_female_data.txt
[11] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/Male-salary.MPJ
[12] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/salary_data.txt
[13] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/unequal_sascode_01.txt
[14] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/unequal_sascode_02.txt
[15] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/salary-new_data.txt
