
Ordinary Least Squares Regression

PO7001: Quantitative Methods I


Kenneth Benoit

24 November 2010

Independent and Dependent variables


- A dependent variable represents the quantity we wish to explain variation in, or the thing we are trying to explain.
  Typical examples of a dependent variable in political science:
  - votes received by a governing party
  - support for a referendum result like Lisbon
  - party support for European integration
- An independent variable represents a quantity whose variation will be used to explain variation in the dependent variable.
  Typical examples of independent variables in political science:
  - demographic: gender, national background, age
  - economic: socioeconomic status, income, national wealth
  - political: party affiliation of one's parents
  - institutional: district magnitude, electoral system, presidential v. parliamentary
  - behavioural: campaign spending levels
- Using this language implies causality: X → Y

The importance of variation


- Variation in outcomes of the dependent variable is what we seek to explain in social and political research.
- We seek to explain these outcomes using (independent) variables.
- The very language here, the term "variable", suggests that the quantity so named has to vary.
- Conversely, a quantity that does not vary is impossible to study in this way. This also applies to samples that do not vary: these will not help us in research.
- Typically when we collect data, we wish to have as much variation in our sample as possible.
- Example: In the returns-from-schooling example, it would be best to have as much variation in schooling as possible, to maximize the leverage of our research.

Different functional relationships


Linear A linear relationship, also known as a straight-line relationship, exists if a line drawn through the central tendency of the points is a straight line
Curvilinear Exists if the relationship between the variables is not a straight-line function, but is instead curved
Example: Television viewing (Y) as a function of age (X)

(More on this in Quant 2)

Correlations

- The central idea behind correlation is that two variables have a systematic relationship between their values.
- This relationship can be positive or negative.
- This relationship varies in strength.
- Bivariate associations are usually depicted graphically by use of a scatterplot.
- We can also summarize (as per an earlier week) the correlation using Pearson's r, to show the direction and strength of the bivariate relationship.

Pearson's r revisited

$$
r = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i (X_i - \bar{X})^2 \sum_i (Y_i - \bar{Y})^2}}
  = \frac{\text{Sum of Products}}{\sqrt{\text{Sum of Squares}_x \cdot \text{Sum of Squares}_y}}
  = \frac{SP}{\sqrt{SS_x \, SS_y}}
$$
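
A minimal sketch of this formula in R, using the x and y vectors from the significance-testing example later in these slides:

x <- c(12,10,6,16,8,9,12,11)
y <- c(12,8,12,11,10,8,16,15)
SP  <- sum((x - mean(x)) * (y - mean(y)))   # sum of products
SSx <- sum((x - mean(x))^2)                 # sum of squares of x
SSy <- sum((y - mean(y))^2)                 # sum of squares of y
SP / sqrt(SSx * SSy)                        # equals cor(x, y)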

Testing the significance of Pearson's r

- Pearson's r measures correlation in the sample, but the association we are interested in exists in the population.
- Question: what is the probability that any correlation we measure in a sample really exists in the population, and is not merely due to sampling error?
- H0: ρ_xy = 0 (no correlation exists in the population)
- HA: ρ_xy ≠ 0
- Significance can be tested by the t-ratio:

$$
t = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}}
$$

Pearson's r significance testing: Example

- Let's assume we have data such that r = .24, N = 8.
- The computation of t is then:

$$
t = \frac{.24\sqrt{8-2}}{\sqrt{1-(.24)^2}}
  = \frac{.24(2.45)}{\sqrt{1-.0576}}
  = \frac{.59}{\sqrt{.9424}}
  = \frac{.59}{.97}
  = .61
$$

- From R (using qt()), we know that the critical value for t with df = 6, α = .05 is 2.447, so we do not reject H0.
- In R there is a test for this called cor.test(x,y).

Pearson's r significance testing using R


> x <- c(12,10,6,16,8,9,12,11)
> y <- c(12,8,12,11,10,8,16,15)
> cor(x,y)
[1] 0.2420615
> # compute the empirical t-value
> (t.calc <- (.24*sqrt(8-2) / sqrt(1-.24^2)))
[1] 0.6055768
> # compute the critical t-value
> (t.crit <- qt(1-.05/2, 6))
[1] 2.446912
> # compare
> t.calc > t.crit
[1] FALSE
> # using R's built-in correlation significance test
> cor.test(x,y)
Pearson's product-moment correlation
data: x and y
t = 0.6111, df = 6, p-value = 0.5636
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.5577490 0.8087778
sample estimates:
cor
0.2420615

Regression analysis

- Recall the basic linear model:

$$
y_i = a + bX_i
$$

- Here the relationship is determined by two parameters:
  a the intercept: this refers to the value of Y when X is zero
  b the slope: the rate of change in Y for a one-unit change in X. Also known as the regression coefficient
- Note that this implies a straight-line, perfect relationship, however. Because this is never the case in real research, we also have an error term or residual ε_i:

$$
Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
$$

  and we use the β and ε terminology instead of the high-school math quantities a and b.

Example

par(mar=c(4,4,1,1))
x <- c(0,3,1,0,6,5,3,4,10,8)
y <- c(12,13,15,19,26,27,29,31,40,48)
plot(x, y, xlab="Number of prior convictions (X)",
     ylab="Sentence length (Y)", pch=19)
abline(h=c(10,20,30,40), col="grey70")

[Figure: scatterplot of sentence length (Y) against number of prior convictions (X)]

Least squares formulas

For the three parameters (simple regression):

- the regression coefficient:

$$
\hat\beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
$$

- the intercept:

$$
\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}
$$

- and the residual variance σ²:

$$
\hat\sigma^2 = \frac{1}{n-2} \sum \left[ y_i - (\hat\beta_0 + \hat\beta_1 x_i) \right]^2
$$

Least squares formulas continued

Things to note:

- the prediction line is $\hat{y} = \hat\beta_0 + \hat\beta_1 x$
- the value $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$ is the predicted value for $x_i$
- the residual is $e_i = y_i - \hat{y}_i$
- the residual sum of squares (RSS) is $\sum_i e_i^2$
- the estimate for σ² is the same as $\hat\sigma^2 = RSS/(n-2)$

Example to show formulas in R

> x <- c(0,3,1,0,6,5,3,4,10,8)
> y <- c(12,13,15,19,26,27,29,31,40,48)
> (data <- data.frame(x, y, xdev=(x-mean(x)), ydev=(y-mean(y)),
+                     xdevydev=((x-mean(x))*(y-mean(y))),
+                     xdev2=(x-mean(x))^2,
+                     ydev2=(y-mean(y))^2))
    x  y xdev ydev xdevydev xdev2 ydev2
1   0 12   -4  -14       56    16   196
2   3 13   -1  -13       13     1   169
3   1 15   -3  -11       33     9   121
4   0 19   -4   -7       28    16    49
5   6 26    2    0        0     4     0
6   5 27    1    1        1     1     1
7   3 29   -1    3       -3     1     9
8   4 31    0    5        0     0    25
9  10 40    6   14       84    36   196
10  8 48    4   22       88    16   484
> (SP <- sum(data$xdevydev))
[1] 300
> (SSx <- sum(data$xdev2))
[1] 100
> (SSy <- sum(data$ydev2))
[1] 1250
> (b1 <- SP / SSx)
[1] 3
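
As a small addition (not part of the original output), the intercept then follows directly from the sample means, continuing the same R session:

> (b0 <- mean(y) - b1 * mean(x))   # 26 - 3*4, using x, y, b1 from above
[1] 14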

From observed to predicted relationship

- In the above example, $\hat\beta_0 = 14$, $\hat\beta_1 = 3$.
- This linear equation forms the regression line.
- The regression line always passes through two points:
  - the point $(x = 0,\; y = \hat\beta_0)$
  - the point $(\bar{x}, \bar{y})$ (the average X predicts the average Y)
- The residual sum of squares (RSS) is $\sum_i e_i^2$
- The regression line is the one that minimizes the RSS.

Plot of regression example

[Figure: scatterplot of sentence length (Y) against number of prior convictions (X), with the fitted regression line Yhat = 14 + 3X and the Y intercept marked]

Requirements for regression

1. Both variables should be measured at the interval level
2. The relationship (in the population) between X and Y is linear
   - A transformation may fix non-linearity in some cases
   - Outliers or influential points may need special treatment
3. Sample is randomly chosen (necessary for inference)
4. Both variables must be normally distributed, unless we have a very large sample

Regression terminology

- y is the dependent variable
  - referred to also (by Greene) as a regressand
- X are the independent variables
  - also known as explanatory variables
  - also known as regressors
- y is regressed on X
- The error term ε is sometimes called a disturbance

Notation

- Independent variables: X
- Dependent variable: Y
- Y_i is a random variable (not directly observed)
- y_i is a realised value of Y_i (observed)
- So "dependent variable" sometimes refers to a set of numbers in your dataset (y_i) and sometimes to a random variable at each i (Y_i).

Interpreting the regression results

- The Y-intercept corresponds to the expected value of Y when X = 0 (may or may not be meaningful).
- The regression coefficient β1 refers to the expected change in Y resulting from a one-unit change in X.
- Typically we are more interested in β1 than β0, but β0 forms a vital part of any regression estimation.
- We can make a prediction $\hat{y}_i$ for any i (and $x_i$, although we should be careful to choose reasonable values of $x_i$).
- Example: for $x_i = 13$,

$$
\hat{y}_i = \hat\beta_0 + \hat\beta_1(13) = 14 + 3(13) = 14 + 39 = 53
$$
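
The same prediction can be obtained with R's predict() function; a minimal sketch using the sentencing-example data from above:

x <- c(0,3,1,0,6,5,3,4,10,8)
y <- c(12,13,15,19,26,27,29,31,40,48)
m <- lm(y ~ x)                              # yields intercept 14, slope 3
predict(m, newdata = data.frame(x = 13))    # 14 + 3*13 = 53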

Sums of squares (ANOVA)

TSS Total sum of squares: $\sum (y_i - \bar{y})^2$
ESS Estimation or Regression sum of squares: $\sum (\hat{y}_i - \bar{y})^2$
RSS Residual sum of squares: $\sum e_i^2 = \sum (y_i - \hat{y}_i)^2$

The key to remember is that TSS = ESS + RSS.

R²

How much of the variance did we explain?

$$
R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_{i=1}^N (y_i - \hat{y}_i)^2}{\sum_{i=1}^N (y_i - \bar{y})^2}
$$

Can be interpreted as the proportion of total variance explained by the model.

R² and Pearson's r

For simple linear regression (i.e. one independent variable), R² is the same as the correlation coefficient, Pearson's r, squared.

R²

- Note that computing the regression line uses the same sums of squares (SP, SS_x, SS_y) used in computing the correlation r.
- The squared correlation R² is an important quantity in regression (sometimes called the coefficient of determination).
- R² is the proportion of variance in Y determined by variation in X.
- 0 ≤ R² ≤ 1.0
- But remember that R² is not a regression parameter.
- Computation:

$$
r = \frac{SP}{\sqrt{SS_x \, SS_y}}
$$

R² computation example

$$
r = \frac{SP}{\sqrt{SS_x \, SS_y}}
  = \frac{300}{\sqrt{(100)(1250)}}
  = \frac{300}{\sqrt{125000}}
  = \frac{300}{353.55}
  = .85
$$

r² is then (.85)² = .72.

This means that 72% of the variation in sentence length is explained by the number of prior convictions.

R²

- A much over-used statistic: it may not be what we are interested in at all.
- Interpretation: the proportion of the variation in y that is explained linearly by the independent variables.
- Defined in terms of sums of squares:

$$
R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}
$$

- Alternatively, R² is the squared correlation coefficient between y and ŷ.
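
A quick check of that equivalence in R, again using the sentencing-example data (a sketch):

x <- c(0,3,1,0,6,5,3,4,10,8)
y <- c(12,13,15,19,26,27,29,31,40,48)
m <- lm(y ~ x)
summary(m)$r.squared    # R^2 reported by the model
cor(y, fitted(m))^2     # squared correlation between y and yhat: the same value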

R² continued

- When a model has no intercept, it is possible for R² to lie outside the interval (0, 1).
- R² rises with the addition of more explanatory variables. For this reason we often report the adjusted R²: $1 - (1-R^2)\frac{n-1}{n-k-1}$, where k is the total number of regressors in the linear model (excluding the constant).
- Whether R² is high or not depends a lot on the overall variance in Y.
- Hence R² values from different Y samples cannot be compared.
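
A brief illustration of these points in R (a sketch; the noise regressor is invented purely for illustration):

set.seed(1)
x <- c(0,3,1,0,6,5,3,4,10,8)
y <- c(12,13,15,19,26,27,29,31,40,48)
noise <- rnorm(10)                          # an irrelevant regressor
summary(lm(y ~ x))$r.squared                # R^2 with one regressor
summary(lm(y ~ x + noise))$r.squared        # never lower after adding a regressor
summary(lm(y ~ x + noise))$adj.r.squared    # adjusted R^2 penalises the extra term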

R² continued

Solid arrow: variation in y when X is unknown (TSS, Total Sum of Squares: $\sum (y_i - \bar{y})^2$)

Dashed arrow: variation in y when X is known (ESS, Estimation Sum of Squares: $\sum (\hat{y}_i - \bar{y})^2$)

R² decomposed

$$y = \hat{y} + e$$
$$\mathrm{Var}(y) = \mathrm{Var}(\hat{y}) + \mathrm{Var}(e) + 2\,\mathrm{Cov}(\hat{y}, e) = \mathrm{Var}(\hat{y}) + \mathrm{Var}(e) + 0$$
$$\sum (y_i - \bar{y})^2 / N = \sum (\hat{y}_i - \bar{y})^2 / N + \sum (e_i - \bar{e})^2 / N$$
$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (e_i - \bar{e})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum e_i^2$$
$$\mathrm{TSS} = \mathrm{ESS} + \mathrm{RSS}$$
$$\mathrm{TSS}/\mathrm{TSS} = \mathrm{ESS}/\mathrm{TSS} + \mathrm{RSS}/\mathrm{TSS}$$
$$1 = R^2 + \text{unexplained variance}$$

Example from height-weight data

(of course) there is a direct way to compute regression statistics in R:

> # regression models in R
> regmodel <- lm(y ~ x)
> summary(regmodel)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-5.2121 -2.5682 -0.6515  1.5303  8.2424

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  49.0909    21.6202   2.271   0.0636 .
x             0.7576     0.3992   1.898   0.1065
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.587 on 6 degrees of freedom
Multiple R-squared: 0.375, Adjusted R-squared: 0.2709
F-statistic: 3.601 on 1 and 6 DF, p-value: 0.1065

Illustration using the Anscombe dataset

[show R analysis here]

## Illustration using Anscombe dataset
data(anscombe)
attach(anscombe)
round(coef(lm(y1~x1)), 2)   # same b0, b1
round(coef(lm(y2~x2)), 2)
round(coef(lm(y3~x3)), 2)
round(coef(lm(y4~x4)), 2)
round(summary(lm(y1~x1))$r.squared, 2)   # same R^2
round(summary(lm(y2~x2))$r.squared, 2)
round(summary(lm(y3~x3))$r.squared, 2)
round(summary(lm(y4~x4))$r.squared, 2)
# plot the four x-y pairs
par(mfrow=c(2,2), mar=c(4, 4, 1, 1)+0.1)   # 4 plots in one graphic window
plot(x1,y1)
abline(lm(y1~x1), col="red", lty="dashed")
plot(x2,y2)
abline(lm(y2~x2), col="red", lty="dashed")
plot(x3,y3)
abline(lm(y3~x3), col="red", lty="dashed")
plot(x4,y4)
abline(lm(y4~x4), col="red", lty="dashed")
detach(anscombe)

Anscombe plots

[Figure: four scatterplots (y1 vs x1, y2 vs x2, y3 vs x3, y4 vs x4), each with the same dashed fitted regression line]

Extensions of the regression model

- The regression model can be generalized to multiple regression, which involves regressing Y on several independent variables X1, X2, etc. (see the sketch after this list).
- Regression allows us to isolate the linear contribution of each unit of each Xk on Y, by holding everything else constant.
- This is the most common and most powerful basic technique in social science statistics, and something that is used in virtually every analysis that attempts to establish any sort of cause and effect.
- The additional X variables are typically considered to be control variables.
- Some variables may also be categorical, especially when they are dummy variables (0 or 1).
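
A minimal sketch of a multiple regression in R, using the built-in mtcars data rather than one of the lecture datasets:

data(mtcars)
m <- lm(mpg ~ wt + hp + factor(am), data = mtcars)   # factor(am) enters as a 0/1 dummy
summary(m)   # each slope: expected change in mpg for a one-unit change, holding the others constant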

Linear model

The linear model can be written as:

$$Y_i = X_i\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$

Or, alternatively, as:

$$Y_i \sim N(\mu_i, \sigma^2), \qquad \mu_i = X_i\beta$$

Linear model

Two components of the model:

$$Y_i \sim N(\mu_i, \sigma^2) \quad \text{(stochastic)}$$
$$\mu_i = X_i\beta \quad \text{(systematic)}$$

Generalised version:

$$Y_i \sim f(\theta_i, \alpha) \quad \text{(stochastic)}$$
$$\theta_i = g(X_i, \beta) \quad \text{(systematic)}$$

Model

$$Y_i \sim f(\theta_i, \alpha) \quad \text{(stochastic)}$$
$$\theta_i = g(X_i, \beta) \quad \text{(systematic)}$$

Stochastic component: varies over repeated (hypothetical) observations on the same unit.

Systematic component: varies across units, but is constant given X.

Model

$$Y_i \sim f(\theta_i, \alpha) \quad \text{(stochastic)}$$
$$\theta_i = g(X_i, \beta) \quad \text{(systematic)}$$

Two types of uncertainty:

Estimation uncertainty: lack of knowledge about α and β; can be reduced by increasing N.

Fundamental uncertainty: represented by the stochastic component and exists independent of the researcher.

Inference from regression

In linear regression, the sampling distribution of the coefficient estimates is normal, and is approximated by a t distribution because σ is approximated by s.

Thus we can calculate a confidence interval for each estimated coefficient, or perform a hypothesis test along the lines of:

H0: β1 = 0
H1: β1 ≠ 0

Inference from regression

To calculate the confidence interval, we need to calculate the standard error of the coefficient.

Rule of thumb to get the 95% confidence interval:

$$\hat\beta - 2\,SE < \beta < \hat\beta + 2\,SE$$

Thus if $\hat\beta$ is positive, we are 95% certain it is different from zero when $\hat\beta - 2\,SE > 0$ (or when the t value is greater than 2 or less than -2).

In R, we get the standard errors by using the summary() command on the model output.
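
R can also compute exact t-based intervals directly; a sketch using the sentencing-example model from earlier:

x <- c(0,3,1,0,6,5,3,4,10,8)
y <- c(12,13,15,19,26,27,29,31,40,48)
m <- lm(y ~ x)
summary(m)$coefficients    # estimates, standard errors, t values, p values
confint(m, level = 0.95)   # compare with the rough estimate +/- 2*SE rule of thumb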

Simple regression model: example

Does infant mortality decrease as GDP per capita increases (measured in 1976)?

> m <- lm(INFMORT ~ LEVEL, data=aclp,
+         subset=(YEAR == 1976 & INFMORT >= 0))
> summary(m)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 33.3028647  2.7222356  12.234 9.44e-13 ***
LEVEL       -0.0018998  0.0003038  -6.253 9.29e-07 ***

F-test

In simple linear regression, we can do an F-test of:

H0: β1 = 0
H1: β1 ≠ 0

$$
F = \frac{ESS/1}{RSS/(n-2)} = \frac{ESS}{\hat\sigma^2}
$$

with 1 and n − 2 degrees of freedom.

Example

> require(foreign)
> dail <- read.dta("dailcorrected.dta")
> summary(lm(votes1st ~ spend_total + incumb + electorate + minister, data=dail))

Call:
lm(formula = votes1st ~ spend_total + incumb + electorate + minister,
    data = dail)

Residuals:
    Min      1Q  Median      3Q     Max
-4934.1 -1038.8  -347.6  1054.0  6900.3

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      7.966e+02  4.172e+02   1.909   0.0569 .
spend_total      1.737e-01  1.095e-02  15.862   <2e-16 ***
incumbIncumbent  2.522e+03  2.207e+02  11.424   <2e-16 ***
electorate      -4.827e-04  5.404e-03  -0.089   0.9289
minister        -1.303e+02  3.965e+02  -0.329   0.7425
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1847 on 458 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared: 0.6478, Adjusted R-squared: 0.6447
F-statistic: 210.6 on 4 and 458 DF, p-value: < 2.2e-16

CLRM: Basic Assumptions

1. Specification:
   - The relationship between X and Y in the population is linear: E(Y) = Xβ
   - No extraneous variables in X
   - No omitted independent variables
   - Parameters (β) are constant
2. E(ε) = 0
3. Error terms:
   - Var(ε) = σ², or homoskedastic errors
   - E(ε_i ε_j) = 0 for i ≠ j, or no auto-correlation

CLRM: Basic Assumptions (cont.)

4. X is non-stochastic, meaning observations on independent variables are fixed in repeated samples
   - implies no measurement error in X
   - implies no serial correlation where a lagged value of Y would be used as an independent variable
   - no simultaneity or endogenous X variables
5. N > k, or the number of observations is greater than the number of independent variables (in matrix terms: rank(X) = k), and no exact linear relationships exist in X
6. Normally distributed errors: ε|X ~ N(0, σ²). Technically, however, this is a convenience rather than a strict assumption

Ordinary Least Squares (OLS)

Objective: minimize $\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2$, where

- $\hat{Y}_i = b_0 + b_1 X_i$
- the error is $e_i = (Y_i - \hat{Y}_i)$

$$
b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum x_i y_i}{\sum x_i^2}
$$

(where lower-case x and y denote deviations from the means)

The intercept is: $b_0 = \bar{Y} - b_1\bar{X}$

OLS rationale

- Formulas are very simple
- Closely related to ANOVA (sums of squares decomposition)
- Predicted Y is the sample mean when Pr(Y|X) = Pr(Y)
  - In the special case where Y has no relation to X, $b_1 = 0$, and the OLS fit is simply $\hat{Y} = b_0$
  - Why? Because $b_0 = \bar{Y} - b_1\bar{X}$, so $\hat{Y} = \bar{Y}$
  - The prediction is then the sample mean when X is unrelated to Y
- Since OLS is then an extension of the sample mean, it has the same attractive properties (efficiency and lack of bias)
- Alternatives exist, but OLS generally has the best properties when its assumptions are met

OLS in matrix notation

- Formula for the coefficient vector β (a short verification in R follows below):

$$Y = X\beta + \varepsilon$$
$$X'Y = X'X\beta + X'\varepsilon$$
$$X'Y = X'X\beta + 0$$
$$(X'X)^{-1}X'Y = \beta + 0$$
$$\hat\beta = (X'X)^{-1}X'Y$$

- Formula for the variance-covariance matrix: $\sigma^2 (X'X)^{-1}$
  - In the simple case where $y = \beta_0 + \beta_1 x$, this gives $\sigma^2 / \sum (x_i - \bar{x})^2$ for the variance of $\hat\beta_1$
  - Note how increasing the variation in X will reduce the variance of $\hat\beta_1$
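
A short verification of the matrix formula against lm(), using the sentencing-example data (a sketch):

x <- c(0,3,1,0,6,5,3,4,10,8)
y <- c(12,13,15,19,26,27,29,31,40,48)
X <- cbind(1, x)                    # design matrix: a column of ones plus x
solve(t(X) %*% X) %*% t(X) %*% y    # (X'X)^{-1} X'Y, giving 14 and 3
coef(lm(y ~ x))                     # the same estimates from lm()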

The hat matrix

The hat matrix H is defined as:

$$\hat\beta = (X'X)^{-1}X'y$$
$$X\hat\beta = X(X'X)^{-1}X'y$$
$$\hat{y} = Hy$$

- $H = X(X'X)^{-1}X'$ is called the hat matrix (see the sketch below)
- Other important quantities, such as $\hat{y}$ and $\sum e_i^2$ (RSS), can be expressed as functions of H
- Corrections for heteroskedastic errors ("robust" standard errors) involve manipulating H
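
A minimal sketch of these identities in R, using the same design matrix as in the previous sketch:

x <- c(0,3,1,0,6,5,3,4,10,8)
y <- c(12,13,15,19,26,27,29,31,40,48)
X <- cbind(1, x)
H <- X %*% solve(t(X) %*% X) %*% t(X)   # the hat matrix
yhat <- H %*% y                         # fitted values, identical to fitted(lm(y ~ x))
sum((y - yhat)^2)                       # RSS expressed through H
diag(H)                                 # leverages, also available as hatvalues(lm(y ~ x))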

Some important OLS properties to understand

Applies to $y = \alpha + \beta x + \varepsilon$

- If β = 0 and the only regressor is the intercept, then this is the same as regressing y on a column of ones, and hence $\hat\alpha = \bar{y}$, the mean of the observations
- If α = 0, so that there is no intercept and one explanatory variable x, then $\hat\beta = \sum xy / \sum x^2$
- If there is an intercept and one explanatory variable, then

$$
\hat\beta = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{\sum_i (x_i - \bar{x}) y_i}{\sum_i (x_i - \bar{x})^2}
$$

Some important OLS properties (cont.)

- If the observations are expressed as deviations from their means, $y^* = y - \bar{y}$ and $x^* = x - \bar{x}$, then $\hat\beta = \sum x^* y^* / \sum x^{*2}$
- The intercept can be estimated as $\bar{y} - \hat\beta\bar{x}$. This implies that the intercept is estimated by the value that causes the sum of the OLS residuals to equal zero.
- The mean of the $\hat{y}$ values equals the mean of the y values; together with the previous properties, this implies that the OLS regression line passes through the overall mean of the data points.

Normally distributed errors

OLS in R

> dail <- read.dta("dail2002.dta")
> mdl <- lm(votes1st ~ spend_total*incumb + minister, data=dail)
> summary(mdl)

Call:
lm(formula = votes1st ~ spend_total * incumb + minister, data = dail)

Residuals:
    Min      1Q  Median      3Q     Max
-5555.8  -979.2  -262.4   877.2  6816.5

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)         469.37438  161.54635   2.906  0.00384 **
spend_total           0.20336    0.01148  17.713  < 2e-16 ***
incumb             5150.75818  536.36856   9.603  < 2e-16 ***
minister           1260.00137  474.96610   2.653  0.00826 **
spend_total:incumb   -0.14904    0.02746  -5.428 9.28e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1796 on 457 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared: 0.6672, Adjusted R-squared: 0.6643
F-statistic: 229 on 4 and 457 DF, p-value: < 2.2e-16

OLS in Stata

. use dail2002
(Ireland 2002 Dail Election - Candidate Spending Data)
. gen spendXinc = spend_total * incumb
(2 missing values generated)
. reg votes1st spend_total incumb minister spendXinc

      Source |       SS       df       MS              Number of obs =     462
-------------+------------------------------           F(  4,   457) =  229.05
       Model |  2.9549e+09     4  738728297            Prob > F      =  0.0000
    Residual |  1.4739e+09   457 3225201.58            R-squared     =  0.6672
-------------+------------------------------           Adj R-squared =  0.6643
       Total |  4.4288e+09   461 9607007.17            Root MSE      =  1795.9

------------------------------------------------------------------------------
    votes1st |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 spend_total |   .2033637   .0114807    17.71   0.000     .1808021    .2259252
      incumb |   5150.758   536.3686     9.60   0.000     4096.704    6204.813
    minister |   1260.001   474.9661     2.65   0.008      326.613     2193.39
   spendXinc |  -.1490399   .0274584    -5.43   0.000    -.2030003   -.0950794
       _cons |   469.3744   161.5464     2.91   0.004     151.9086    786.8402
------------------------------------------------------------------------------

Sums of squares (ANOVA)

TSS Total sum of squares: $\sum (y_i - \bar{y})^2$
ESS Estimation or Regression sum of squares: $\sum (\hat{y}_i - \bar{y})^2$
RSS Residual sum of squares: $\sum e_i^2 = \sum (y_i - \hat{y}_i)^2$

The key to remember is that TSS = ESS + RSS.

Examining the sums of squares

> yhat <- mdl$fitted.values    # uses the lm object mdl from the previous slide
> ybar <- mean(mdl$model[,1])
> y <- mdl$model[,1]           # can't use dail$votes1st since different N
> TSS <- sum((y-ybar)^2)
> ESS <- sum((yhat-ybar)^2)
> RSS <- sum((yhat-y)^2)
> RSS
[1] 1473917120
> sum(mdl$residuals^2)
[1] 1473917120
> (r2 <- ESS/TSS)
[1] 0.6671995
> (adjr2 <- (1 - (1-r2)*(462-1)/(462-4-1)))
[1] 0.6642865
> summary(mdl)$r.squared       # note the call to summary()
[1] 0.6671995
> RSS/457
[1] 3225202
> sqrt(RSS/457)
[1] 1795.885
> summary(mdl)$sigma
[1] 1795.885

Regression model return values


Here we will talk about the quantities returned with the lm()
command and lm class objects.
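
As a starting point, a quick sketch of what an lm object and its summary contain (component names may vary slightly across R versions):

x <- c(0,3,1,0,6,5,3,4,10,8)
y <- c(12,13,15,19,26,27,29,31,40,48)
m <- lm(y ~ x)       # any fitted model; here the sentencing example
names(m)             # includes "coefficients", "residuals", "fitted.values", "df.residual", ...
names(summary(m))    # includes "coefficients", "sigma", "r.squared", "adj.r.squared", "fstatistic", ...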

Next (last) week

- ANOVA (analysis of variance)
- Regression with categorical independent variables
- Regression with interaction terms (Brambor, Clark, and Golder)
- Preview of Quant 2 with non-linear/non-normal models
