Estimating Relations
ENGSTAT
Bivariate Distributions
One variable is dependent on one or more related
variables.
Relationships between different variables
Detecting patterns
Linear relationships
Regression analysis
Correlation
Bivariate Distributions
Two-way Table for Categorical Data.
Projections on New Workers by Gender and Race

Race        Women    Men    Total
White        23%     24%     47%
Black         9%      6%     15%
Asian         7%      6%     13%
Hispanic     13%     12%     25%
Total        52%     48%    100%
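
The row and column totals in the table are marginal distributions, obtained by summing the joint percentages. A minimal Python sketch of that arithmetic follows; the dictionary layout and variable names are illustrative choices, not part of the original slides.

```python
# Joint percentages from the two-way table (race by gender).
joint = {
    "White":    {"Women": 23, "Men": 24},
    "Black":    {"Women": 9,  "Men": 6},
    "Asian":    {"Women": 7,  "Men": 6},
    "Hispanic": {"Women": 13, "Men": 12},
}

# Marginal distribution of race: sum across each row.
race_totals = {race: sum(cells.values()) for race, cells in joint.items()}

# Marginal distribution of gender: sum down each column.
gender_totals = {
    g: sum(cells[g] for cells in joint.values()) for g in ("Women", "Men")
}

# Conditional share of women among new White workers: 23 / 47.
women_given_white = joint["White"]["Women"] / race_totals["White"]

print(race_totals)
print(gender_totals)
print(round(women_given_white, 3))
```

The computed totals should agree with the Total row and Total column of the table.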
Bivariate Distributions
Time Series
Annual profits
Daily temperatures
Bivariate Distributions
Scatter Plots
Correlation: Estimating the Strength of
a Linear Relation
Correlation Coefficient
$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
$-1 \le r \le 1$ always
A value of r near or equal to zero implies little or no linear relationship
between x and y
In contrast, the closer r is to 1 or -1, the stronger the linear
relationship between y and x. If r = 1 or r = -1, all the points fall
exactly on a line
A positive value of r implies that y increases as x increases
A negative value of r implies that y decreases as x increases
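
As an illustration of the correlation coefficient, the sketch below computes r as the average product of standardized deviations, using the sample standard deviations (divisor n - 1). It uses the temperature and peak-load data from the exercise in this document; the function name `correlation` is an illustrative choice.

```python
import math

# Data from the power-plant exercise in this document.
x = [95, 82, 90, 81, 99, 100, 93, 95, 93, 87]            # max. temperature
y = [214, 152, 156, 129, 254, 266, 210, 204, 213, 150]   # peak power load

def correlation(x, y):
    """Sample correlation coefficient r of paired observations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample standard deviations (divisor n - 1).
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    # Sum of products of standardized deviations, divided by n - 1.
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y))
    return total / (n - 1)

r = correlation(x, y)
print(round(r, 4))
```

For these data r comes out close to 1, which points to a strong positive linear relationship between temperature and peak load.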
Regression: Modeling Linear Relationships
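Linear Model: $y = \beta_0 + \beta_1 x$
Fitting the Model: The Least Squares Approach
slope: $\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
y-intercept: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
Sum of squared errors: $SSE = \sum (y_i - \hat{y}_i)^2$
$\hat{y}_i$ = predicted value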
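
The least-squares formulas for the slope, the intercept, and the SSE translate directly into a few lines of Python. The sketch below applies them to the temperature and peak-load data from the exercise later in this document; the function names are illustrative choices.

```python
# Data from the power-plant exercise in this document.
x = [95, 82, 90, 81, 99, 100, 93, 95, 93, 87]            # max. temperature
y = [214, 152, 156, 129, 254, 266, 210, 204, 213, 150]   # peak power load

def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def sum_squared_errors(x, y, intercept, slope):
    """Sum of squared differences between observed and predicted y."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

b0, b1 = least_squares(x, y)
sse = sum_squared_errors(x, y, b0, b1)
print(f"slope = {b1:.4f}, intercept = {b0:.4f}, SSE = {sse:.2f}")
```

The fitted slope is about 6.72, so each additional degree of maximum temperature is associated with roughly 6.7 more units of peak load within the observed temperature range.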
Regression: Modeling Linear Relationships
Properties of the Least-Squares Regression Line
The sum (and the mean) of residuals is zero
The variation in residuals is as small as possible for the given dataset.
The line of best fit always passes through the point $(\bar{x}, \bar{y})$
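
These properties can be checked numerically. The sketch below fits a line to a small hypothetical dataset (the numbers are made up for illustration) and confirms that the residuals sum to zero, that the line passes through the point of means, and that nudging the slope or intercept away from the fitted values increases the SSE.

```python
# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

def sse(b0, b1):
    """Sum of squared errors for the line y = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Property 1: residuals sum to zero (up to floating-point rounding).
residual_sum = sum(residuals)

# Property 2: no nearby line has a smaller SSE.
best = sse(intercept, slope)
perturbed = [sse(intercept + d0, slope + d1)
             for d0 in (-0.1, 0.0, 0.1) for d1 in (-0.1, 0.0, 0.1)
             if (d0, d1) != (0.0, 0.0)]

# Property 3: the fitted line passes through (x_bar, y_bar).
gap_at_means = (intercept + slope * x_bar) - y_bar

print(abs(residual_sum) < 1e-9, best <= min(perturbed), abs(gap_at_means) < 1e-9)
```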
Residual Analysis: Assessing the Adequacy of
the Model
Residual: the difference between the observed and predicted
value of y for a given value of x
residual $= y_i - \hat{y}_i$
Conditions that must hold among residuals in order for
linear regression to work well
The residuals appear to be nearly random quantities, unrelated to the
value of x
The variation in residuals does not depend on x, that is, the
variation in the residuals appears to be about the same no matter
which value of x is being considered
Residual Analysis: Assessing the Adequacy of
the Model
Residual Plot
Lack of any trends or patterns can be interpreted as an indication
of random nature of residuals
Nearly constant spread of residuals across all values of x can be
interpreted as an indication of variation in residuals not
dependent on x.
If a model fits the data well, there will be no discernible pattern
in the residual plot; the points will appear to be randomly
scattered about the plane
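
Without a plotting library, residuals can still be inspected by listing them in order of increasing x and looking for runs of the same sign or a changing spread. The sketch below does this for the temperature and peak-load data from the exercise in this document; it is a rough text substitute for a residual plot, not a formal test.

```python
# Data from the power-plant exercise in this document.
x = [95, 82, 90, 81, 99, 100, 93, 95, 93, 87]            # max. temperature
y = [214, 152, 156, 129, 254, 266, 210, 204, 213, 150]   # peak power load

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

# Residual = observed y minus predicted y.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# List residuals in order of increasing x, with a sign marker per row.
for xi, res in sorted(zip(x, residuals)):
    marker = "+" if res >= 0 else "-"
    print(f"x = {xi:3d}   residual = {res:8.3f}   {marker}")
```

If the signs cluster (for example, positive at both ends of the x range and negative in the middle), that suggests a curved relationship that a straight line does not capture.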
Residual Analysis: Assessing the Adequacy of
the Model
Residual Plots
Transformations
When data do not fit a simple regression model, transform
the data and fit a linear model to the transformed data.
Exponential transformation
$y = a e^{bx}$
$\ln y = \ln a + bx$
Power transformation
$y = a x^b$
$\ln y = \ln a + b \ln x$
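
Both transformations reduce the problem to an ordinary straight-line fit on the transformed variables. The sketch below demonstrates this on hypothetical, noise-free data generated from known values of a and b (chosen arbitrarily for illustration), so the fit should recover them; `fit_line` is an illustrative helper name.

```python
import math

def fit_line(u, v):
    """Least-squares (intercept, slope) for v = c0 + c1 * u."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    slope = (sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
             / sum((ui - u_bar) ** 2 for ui in u))
    return v_bar - slope * u_bar, slope

x = [1, 2, 3, 4, 5]

# Exponential model: hypothetical data from y = 2 * exp(0.5 * x).
y_exp = [2.0 * math.exp(0.5 * xi) for xi in x]
ln_a, b_exp = fit_line(x, [math.log(yi) for yi in y_exp])   # ln y against x
a_exp = math.exp(ln_a)

# Power model: hypothetical data from y = 3 * x ** 1.5.
y_pow = [3.0 * xi ** 1.5 for xi in x]
ln_a, b_pow = fit_line([math.log(xi) for xi in x],
                       [math.log(yi) for yi in y_pow])      # ln y against ln x
a_pow = math.exp(ln_a)

print(round(a_exp, 6), round(b_exp, 6))
print(round(a_pow, 6), round(b_pow, 6))
```

With real, noisy data the recovered a and b minimize squared error on the log scale, which is not the same as minimizing it on the original scale.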
Exercise: Daily peak load (y) for a power plant and the
maximum outdoor temperature (x)
Day    Max. T (x)    Peak Power Load (y)
 1         95              214
 2         82              152
 3         90              156
 4         81              129
 5         99              254
 6        100              266
 7         93              210
 8         95              204
 9         93              213
10         87              150
Scatter Plot
[Figure: scatter plot of peak power load (y) against maximum temperature (x)]
SUMMARY OUTPUT
Regression Statistics
Multiple R            0.944093034
R Square              0.891311656
Adjusted R Square     0.877725613
Standard Error        16.17764189
Observations          10
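
The regression statistics above can be reproduced from the exercise data. The sketch below assumes the usual simple-regression definitions: Multiple R is |r|, R Square is r squared, Adjusted R Square is 1 - (1 - R²)(n - 1)/(n - 2), and Standard Error is the square root of SSE/(n - 2). The variable names are illustrative choices.

```python
import math

# Data from the power-plant exercise in this document.
x = [95, 82, 90, 81, 99, 100, 93, 95, 93, 87]            # max. temperature
y = [214, 152, 156, 129, 254, 266, 210, 204, 213, 150]   # peak power load

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# For simple linear regression, SSE = S_yy - S_xy^2 / S_xx.
sse = s_yy - s_xy ** 2 / s_xx

multiple_r = abs(s_xy) / math.sqrt(s_xx * s_yy)
r_square = 1 - sse / s_yy
# One predictor, so the residual degrees of freedom are n - 2.
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - 2)
standard_error = math.sqrt(sse / (n - 2))

print(f"Multiple R        {multiple_r:.9f}")
print(f"R Square          {r_square:.9f}")
print(f"Adjusted R Square {adjusted_r_square:.9f}")
print(f"Standard Error    {standard_error:.8f}")
print(f"Observations      {n}")
```

An R Square of about 0.89 means the fitted line accounts for roughly 89% of the variation in peak load in this sample.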
RESIDUAL OUTPUT
[Figure: residual plot, residuals against X Variable 1]