Regression and Correlation

Correlation Coefficient
Testing the Correlation Coefficient
Simple Linear Regression
The linear correlation coefficient, denoted by ρ, is a measure of the strength of the
linear relationship existing between two variables, say X and Y, that is independent
of their respective scales of measurement.
To visualize the possible underlying linear relationship between X and Y, we can plot
individual pairs of observations on a two-dimensional graph called a scatter
diagram.
Properties:
A linear correlation coefficient can only assume values between -1 and 1,
inclusive of endpoints.
The sign of ρ describes the direction of the linear relationship.
A positive value means that the line slopes upward to the right, and so as X
increases, the value of Y increases.
A negative value means that the line slopes downward to the right, and so as X
increases, the value of Y decreases.
If ρ = 0, then there is no linear correlation between X and Y. However, this does
not mean a lack of association.
It is possible to obtain a zero correlation even if X and Y are related, provided their
relationship is nonlinear, such as a quadratic relationship.
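To illustrate the last property, here is a minimal sketch in which Y is completely determined by X through a quadratic relationship, yet the sample correlation (the usual estimate of ρ) is zero. The data values and the helper name `pearson_r` are illustrative assumptions, not part of the original material.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative values only (assumed): Y is exactly X squared,
# with X placed symmetrically around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)
```

The positive and negative halves of the parabola cancel in the cross-product sum, so the linear correlation vanishes even though Y depends entirely on X.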
Properties:
When ρ is -1 or 1, there is a perfect linear relationship between X and Y and all
the points (x,y) fall on a straight line. A ρ that is close to -1 or 1 indicates a strong
linear relationship.
A strong linear relationship does not necessarily imply that X causes Y or Y causes
X. It is possible that a third variable may have caused the change in both X and Y,
producing the observed relationship.
A point estimator of ρ is the Pearson product moment correlation coefficient.
The Pearson product moment correlation coefficient between X and Y, denoted by
r, is defined as:
$$
r = \frac{n\sum_{i=1}^{n} X_i Y_i - \left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}
{\sqrt{\left(n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2\right)\left(n\sum_{i=1}^{n} Y_i^2 - \left(\sum_{i=1}^{n} Y_i\right)^2\right)}}
$$
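The computational formula for r can be sketched directly in code. The function name `pearson_r` and the five data pairs are assumptions chosen for illustration; they are not the hypothetical data used in the examples of this material.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from the raw sums, following the computational formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Illustrative data (assumed, not from the text).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(round(r, 4))
```

For these values the numerator is 5(66) − (15)(20) = 30 and the denominator is the square root of (50)(30) = 1500, so r is about 0.7746.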
Example: Hypothetical Data

x    y
1    3.5
2    4.5
2    4
2    3.5
3    6.5
3    8
3    6
4    7.9
4    7
5    9.4
6    9.3
6    11
6    7
7    10.5 12.4 11.5
7    10
8    15
8    11
8    13.7
Answer: 0.9511
3
7
Testing the Correlation Coefficient

Tests of Hypotheses for ρ

Ho: ρ = ρo

Test Statistic:
$$
t = \frac{(r - \rho_o)\sqrt{n-2}}{\sqrt{1-r^2}}
$$

Ha          Region of Rejection
ρ < ρo      $t < -t_{\alpha}(v = n-2)$
ρ > ρo      $t > t_{\alpha}(v = n-2)$
ρ ≠ ρo      $|t| > t_{\alpha/2}(v = n-2)$
Example: Use the hypothetical data. Suppose that the linear correlation coefficient
between X and Y in the past is 0.90. We want to determine if the correlation has
significantly increased compared to the past. Test at 5% level of significance.
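The test statistic and rejection regions given in this section can be sketched as two small functions. The function names, the `alternative` labels, and the numbers in the usage lines are assumptions for illustration. The critical value is passed in from a t table rather than computed, so only the standard library is needed.

```python
import math

def correlation_t_statistic(r, rho0, n):
    """t = (r - rho0) * sqrt(n - 2) / sqrt(1 - r^2), with v = n - 2."""
    return (r - rho0) * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def reject_h0(t, t_crit, alternative):
    """Apply the rejection region for the chosen alternative.

    t_crit is the positive tabled value: t_alpha for a one-sided
    test, t_(alpha/2) for a two-sided test, both with v = n - 2.
    """
    if alternative == "less":
        return t < -t_crit
    if alternative == "greater":
        return t > t_crit
    if alternative == "two-sided":
        return abs(t) > t_crit
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")

# Illustrative values (assumed): r = 0.5 from n = 27 pairs, testing
# Ho: rho = 0 against Ha: rho > 0 at the 5% level.
t = correlation_t_statistic(r=0.5, rho0=0.0, n=27)
decision = reject_h0(t, t_crit=1.708, alternative="greater")  # t_0.05(v = 25)
print(round(t, 4), decision)
```

For the example stated above, the same call would use r = 0.9511, rho0 = 0.90, the sample size of the hypothetical data, and the "greater" alternative.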
Simple Linear Regression

The simple linear regression model is given by the equation
$$
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i
$$
where
Yi is the value of the response variable for the ith element
Xi is the value of the explanatory variable for the ith element
β0 is a regression coefficient that gives the Y-intercept of the regression line
β1 is a regression coefficient that gives the slope of the line
εi is the random error term for the ith element, where the εi's are independent,
normally distributed with mean 0 and variance σ² for i = 1, 2, …, n
n is the number of elements
Estimation using the Method of Least Squares

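Formulas for b0 (estimate for β0) and b1 (estimate for β1):
$$
b_1 = \frac{n\sum_{i=1}^{n} X_i Y_i - \left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}
{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}
$$
$$
b_0 = \bar{Y} - b_1 \bar{X}
$$
Thus, the estimated regression equation is given by Ŷ = b0 + b1 X.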
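A minimal sketch of the least squares formulas for b1 and b0 follows. The function name `least_squares` and the data are illustrative assumptions, not the hypothetical data of the examples.

```python
def least_squares(xs, ys):
    """Return (b0, b1) from the computational least squares formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: (n*sum(XY) - sum(X)*sum(Y)) / (n*sum(X^2) - (sum(X))^2)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: mean of Y minus slope times mean of X
    b0 = sum_y / n - b1 * (sum_x / n)
    return b0, b1

# Illustrative data (assumed, not from the text).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = least_squares(xs, ys)
print(round(b0, 3), round(b1, 3))
```

Here b1 = 30 / 50 = 0.6 and b0 = 4 − 0.6(3) = 2.2, so the fitted line for these values is Ŷ = 2.2 + 0.6X.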

Example: Use the hypothetical data.
Answer: Ŷ = 2.016 + 1.383X
We can see that as X increases by one unit, the mean of the response variable, Y, is
estimated to increase by 1.383.
The intercept, 2.016, has no meaningful interpretation because X = 0 is not within the range of
values we used in estimation.
Substitute x = 4. Substitute x = 5.
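Assuming the substitutions above are meant to evaluate the estimated regression equation at those values of X, the arithmetic can be sketched as follows. The function name `predict` is an assumption; the coefficients are the ones reported in the answer.

```python
def predict(x, b0=2.016, b1=1.383):
    """Estimated mean response from the fitted line Y-hat = b0 + b1 * x."""
    return b0 + b1 * x

y_hat_4 = predict(4)
y_hat_5 = predict(5)
print(round(y_hat_4, 3), round(y_hat_5, 3))
```

The two estimates are 7.548 at X = 4 and 8.931 at X = 5. They differ by the slope, 1.383, which matches the interpretation of b1 as the estimated change in the mean of Y per unit increase in X.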

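A 100(1 − α)% CI estimate for β1 is
$$
\left(b_1 - t_{\alpha/2}(v = n-2)\,S_{b_1},\; b_1 + t_{\alpha/2}(v = n-2)\,S_{b_1}\right)
$$
A 100(1 − α)% CI estimate for β0 is
$$
\left(b_0 - t_{\alpha/2}(v = n-2)\,S_{b_0},\; b_0 + t_{\alpha/2}(v = n-2)\,S_{b_0}\right)
$$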
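The interval for β1 can be sketched in code. The material does not define the standard error of b1, so this sketch assumes the usual simple linear regression formula, S_b1 = sqrt(MSE / Sxx) with MSE = SSE / (n − 2). The function name, the data, and the tabled critical value are likewise illustrative assumptions.

```python
import math

def slope_confidence_interval(xs, ys, t_crit):
    """Return (lower, upper) for beta1.

    t_crit is the tabled value t_(alpha/2) with v = n - 2.
    Assumes S_b1 = sqrt(MSE / Sxx), the standard SLR result.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Sum of squared residuals about the fitted line
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)
    s_b1 = math.sqrt(mse / sxx)
    return b1 - t_crit * s_b1, b1 + t_crit * s_b1

# Illustrative data (assumed); t_0.025(v = 3) = 3.182 from a t table.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

lower, upper = slope_confidence_interval(xs, ys, t_crit=3.182)
print(round(lower, 3), round(upper, 3))
```

For these values the 95% interval is roughly (−0.30, 1.50). Because it contains 0, these five points alone would not show a significant linear relationship at the 5% level.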
To determine whether or not there is a significant linear relationship between Y and X,
we test Ho: β1 = 0 against Ha: β1 ≠ 0.
To assess whether β0 is different from 0, we test Ho: β0 = 0 against Ha: β0 ≠ 0.

The coefficient of determination, denoted by R², is defined as the proportion of
the variability in the observed values of the response variable that can be explained
by the explanatory variable through their linear relationship.
In simple linear regression R² = r², so R² will be between 0 and 1 because −1 ≤ r ≤ 1. If a model has perfect predictability,
then R² = 1. If a model has no predictive capability, then R² = 0.
Example: Use the hypothetical data.
R² = (0.9511)² = 0.9046
90.46% of the variability in Y can be explained by X through the simple linear regression model.
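As a check on the definition, the sketch below computes R² as the explained proportion of variability, 1 − SSE/SST, and compares it with the squared correlation coefficient. The function name and the data are illustrative assumptions, and the SSE/SST form is the standard decomposition rather than something stated in this material.

```python
def r_squared_two_ways(xs, ys):
    """Return (1 - SSE/SST, r**2) for a simple linear regression fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)  # total variability (SST)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Unexplained variability: squared residuals about the fitted line
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    explained_share = 1 - sse / syy
    r_sq = sxy ** 2 / (sxx * syy)
    return explained_share, r_sq

# Illustrative data (assumed, not from the text).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

explained_share, r_sq = r_squared_two_ways(xs, ys)
print(round(explained_share, 4), round(r_sq, 4))
```

Both routes give 0.6 for these values: SST = 6, SSE = 2.4, and r² = 36/60.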
