QBM 101 Lecture 10

QBM 101 Business Statistics
Dr. Lai Kee Huong

Department of Business Studies
Faculty of Business, Economics & Accounting
keehuong.lai@help.edu.my
SUBJECT OUTLINE:
Module 1: Introduction; organizing and graphing data; numerical descriptive measures
Module 2: Probability; discrete random variables; continuous random variables and the normal distribution
Module 3: Sampling distributions; estimation; hypothesis testing
Module 4: Simple linear regression
CHAPTER 10:
SIMPLE LINEAR REGRESSION
10.1 Simple linear regression
10.2 Standard deviation of errors and coefficient of determination
10.3 Inferences about B
10.4 Linear correlation
10.5 Regression analysis: A complete example
10.6 Interpretation of Excel output
A regression model is a mathematical equation that describes the relationship between two or more variables. A simple regression model includes only two variables: one independent and one dependent. The dependent variable is the one being explained, and the independent variable is the one used to explain the variation in the dependent variable.
A (simple) regression model that gives a straight-line relationship between two variables is called a linear regression model.
Regression: describing the nature of the relationship between variables (positive, negative, linear, or nonlinear).
Correlation: determining whether a relationship between variables exists.
Questions: Are the two variables related? If so, how strong is the relationship? What kind of relationship is it? What predictions can be made?
Examples: height and weight of a person; number of cigarettes smoked vs. weights of infants; time spent studying and exam marks.
Dependent variable (DV) (y, the one being explained) vs. independent variable (IV) (x, used to explain the variation).
Simple (only 1 IV) vs. multiple (> 1 IV) regression
Linear (straight-line relationship) vs. nonlinear regression
SIMPLE LINEAR REGRESSION ANALYSIS

In the regression model \( y = A + Bx + \epsilon \), A is called the y-intercept or constant term, B is the slope, and \( \epsilon \) is the random error term. The dependent and independent variables are y and x, respectively.
In the model \( \hat{y} = a + bx \), a and b, which are calculated using sample data, are called the estimates of A and B, respectively.
SCATTER PLOT/DIAGRAM
ERROR SUM OF SQUARE (SSE)

The error sum of squares, denoted SSE, is
\[ SSE = \sum e^2 = \sum (y - \hat{y})^2 \]
The values of a and b that give the minimum SSE are called the least squares estimates of A and B, and the regression line obtained with these estimates is called the least squares line.
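Least squares/best-fit line:
\[ \hat{y} = a + bx, \quad b = \frac{SS_{xy}}{SS_{xx}}, \quad a = \bar{y} - b\bar{x} \]
\[ SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}, \quad SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}, \quad SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} \]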
FORMULAS
Source: http://mathworld.wolfram.com/LeastSquaresFitting.html
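To make the least squares idea concrete, here is a minimal Python sketch. It fits a line to a small made-up data set (the x and y values are hypothetical, not from the lecture) and checks that shifting a or b away from the least squares values only increases SSE. The function names `sse` and `least_squares` are illustrative.

```python
def sse(a, b, xs, ys):
    """Error sum of squares for the line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares(xs, ys):
    """Return (a, b) computed from SS_xy / SS_xx and the sample means."""
    n = len(xs)
    ss_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    b = ss_xy / ss_xx
    a = sum(ys) / n - b * sum(xs) / n
    return a, b


# Hypothetical data, chosen only for illustration (not from the lecture).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
best = sse(a, b, xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}, SSE = {best:.2f}")

# Shift the intercept or slope slightly and recompute SSE each time.
nudged = [sse(a + da, b + db, xs, ys)
          for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]]
print("SSE after nudging a or b:", [round(v, 2) for v in nudged])
```

Every nudged line should report a larger SSE than the fitted one, which is what "least squares" means.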
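Least squares/best-fit line:
\[ \bar{x} = \frac{\sum x}{n} = \frac{386}{7} = 55.1429, \quad \bar{y} = \frac{\sum y}{n} = \frac{108}{7} = 15.4286 \]
\[ SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 6403 - \frac{(386)(108)}{7} = 447.5714 \]
\[ SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 23058 - \frac{(386)^2}{7} = 1772.8571 \]
\[ b = \frac{SS_{xy}}{SS_{xx}} = \frac{447.5714}{1772.8571} = 0.2525 \]
\[ a = \bar{y} - b\bar{x} = 15.4286 - (0.2525)(55.1429) = 1.5050 \]
\[ \hat{y} = a + bx = 1.5050 + 0.2525x \]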
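The arithmetic above can be checked from the summary totals alone. The sketch below assumes only the totals quoted in the worked example (n = 7, Σx = 386, Σy = 108, Σxy = 6403, Σx² = 23058); the function name `fit_from_sums` is illustrative. Note that the intercept depends on rounding: the value 1.5050 comes from using b and the means rounded to four decimals, while full precision gives roughly 1.5073.

```python
def fit_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (SS_xy, SS_xx, b, a) from the summary totals."""
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    b = ss_xy / ss_xx
    a = sum_y / n - b * sum_x / n
    return ss_xy, ss_xx, b, a


# Totals quoted in the worked example above.
n, sum_x, sum_y, sum_xy, sum_x2 = 7, 386, 108, 6403, 23058

ss_xy, ss_xx, b, a = fit_from_sums(n, sum_x, sum_y, sum_xy, sum_x2)
print(f"SS_xy = {ss_xy:.4f}, SS_xx = {ss_xx:.4f}")
print(f"b = {b:.4f}, a (full precision) = {a:.4f}")

# Hand-calculation style: round b and the means to 4 decimals first.
a_hand = round(sum_y / n, 4) - round(b, 4) * round(sum_x / n, 4)
print(f"a (rounded inputs) = {a_hand:.4f}")
```

The small gap between the two intercepts is a rounding effect, not an error in either calculation.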
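Least squares/best-fit line (estimation and its reliability):
\[ b = \frac{SS_{xy}}{SS_{xx}} = \frac{447.5714}{1772.8571} = 0.2525, \quad a = \bar{y} - b\bar{x} = 15.4286 - (0.2525)(55.1429) = 1.5050 \]
\[ \hat{y} = a + bx = 1.5050 + 0.2525x \]
Estimate the amount of food expenditure when the income is $6100 (x = 61, in hundreds of dollars).
\[ \hat{y} = a + bx = 1.5050 + 0.2525(61) = 16.9075 \text{ hundred} = \$1690.75 \]
Error: \( e = y - \hat{y} = 16 - 16.9075 = -0.9075 \) hundred = -$90.75
For x = 60: \( \hat{y} = 1.5050 + 0.2525(60) = 16.655 \) hundred = $1665.50
The estimate is reliable because \( 60 \in (33, 83) \), the range of the observed x values.
For x = 20: \( \hat{y} = 1.5050 + 0.2525(20) = 6.555 \) hundred = $655.50
The estimate is not reliable because \( 20 \notin (33, 83) \) (extrapolation).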
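The estimation and reliability check can be sketched in a few lines of Python. The coefficients and the range 33 to 83 are taken from the slides. The function name `predict` and the choice to treat the endpoints as inside the range are assumptions made for this illustration.

```python
A_HAT = 1.5050          # intercept from the lecture (hundreds of dollars)
B_HAT = 0.2525          # slope from the lecture
X_MIN, X_MAX = 33, 83   # range of incomes quoted in the lecture (hundreds)


def predict(x):
    """Return (estimate in hundreds, True if x lies in the observed range)."""
    y_hat = A_HAT + B_HAT * x
    return y_hat, X_MIN <= x <= X_MAX


for x in (61, 60, 20):
    y_hat, in_range = predict(x)
    label = "within observed range" if in_range else "extrapolation, not reliable"
    print(f"x = {x}: y_hat = {y_hat:.4f} hundred ({label})")

# Error of prediction for the household with x = 61 and observed y = 16.
residual = 16 - predict(61)[0]
print(f"e = {residual:.4f} hundred")
```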
ERROR OF PREDICTION
Least squares/best-fit line (interpretation of regression coefficients):
\[ \hat{y} = a + bx = 1.5050 + 0.2525x \]
y-intercept, a = 1.5050
A family with RM0 income is estimated to spend RM1.5050 hundred = RM150.50 on food.
Slope coefficient, b = 0.2525
For every one-unit (RM100) increase in income, the expenditure on food increases by RM0.2525 hundred = RM25.25 on average.
Least squares/best-fit line (assumptions of regression models):
1. The error term has a mean of zero.
2. The errors are independent.
3. The distribution of the errors is normal.
4. The distribution of population errors has the same (constant) standard deviation.
\[ \epsilon \sim N(0, \sigma_\epsilon^2) \]
Degrees of Freedom for a Simple Linear Regression Model
The degrees of freedom for a simple linear regression model are
\[ df = n - 2 \]
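Standard deviation of errors:
\( \sigma_\epsilon \) is estimated by \( s_e \):
\[ s_e = \sqrt{\frac{SSE}{n - 2}}, \quad \text{where } SSE = \sum (y - \hat{y})^2 \text{ and } df = n - 2 \]
\[ s_e = \sqrt{\frac{SS_{yy} - b\,SS_{xy}}{n - 2}} \]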
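Standard deviation of errors:
\[ b = \frac{SS_{xy}}{SS_{xx}} = \frac{447.5714}{1772.8571} = 0.2525 \]
\[ SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 6403 - \frac{(386)(108)}{7} = 447.5714 \]
\[ SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 1792 - \frac{(108)^2}{7} = 125.7143 \]
\[ s_e = \sqrt{\frac{SS_{yy} - b\,SS_{xy}}{n - 2}} = \sqrt{\frac{125.7143 - (0.2525)(447.5714)}{7 - 2}} = 1.5939 \]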
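As a quick check of the standard deviation of errors, the sketch below recomputes SS_yy from the totals Σy² = 1792 and Σy = 108 and then applies the shortcut formula. The function name `std_dev_of_errors` is illustrative. Recomputing SS_yy also confirms the value 125.7143 used in the coefficient of determination calculation.

```python
import math


def std_dev_of_errors(ss_yy, ss_xy, b, n):
    """s_e = sqrt((SS_yy - b*SS_xy) / (n - 2))."""
    sse = ss_yy - b * ss_xy   # shortcut form of the error sum of squares
    return math.sqrt(sse / (n - 2))


n = 7
ss_yy = 1792 - 108 ** 2 / n
s_e = std_dev_of_errors(ss_yy, ss_xy=447.5714, b=0.2525, n=n)

print(f"SS_yy = {ss_yy:.4f}")
print(f"s_e = {s_e:.4f}")
```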
Coefficient of determination (COD)
\[ r^2 = \frac{b\,SS_{xy}}{SS_{yy}}, \quad 0 \le r^2 \le 1 \]
With b = 0.2525, \( SS_{xy} = 447.5714 \), and \( SS_{yy} = 125.7143 \):
\[ r^2 = \frac{b\,SS_{xy}}{SS_{yy}} = \frac{0.2525(447.5714)}{125.7143} = 0.899 = 89.9\% \]
Interpretation: 89.9% of the total variation in household food expenditures can be explained by the variation in incomes, and the remaining 10.1% is due to randomness and other variables.
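The 89.9% / 10.1% split can be seen by dividing SS_yy into the part accounted for by the line, b·SS_xy, and the remainder, which is the SSE. A minimal sketch using the lecture's rounded values follows; the variable names `explained` and `unexplained` are illustrative and are not terms used in the slides.

```python
b, ss_xy, ss_yy = 0.2525, 447.5714, 125.7143

explained = b * ss_xy             # variation in y accounted for by the line
unexplained = ss_yy - explained   # leftover variation (the SSE)
r_squared = explained / ss_yy

print(f"r^2 = {r_squared:.3f}")
print(f"explained share   = {100 * r_squared:.1f}%")
print(f"unexplained share = {100 * unexplained / ss_yy:.1f}%")
```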
Coefficient of correlation (COC)
\[ r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}, \quad -1 \le r \le 1 \]
With \( SS_{xx} = 1772.8571 \), \( SS_{xy} = 447.5714 \), and \( SS_{yy} = 125.7143 \):
\[ r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \frac{447.5714}{\sqrt{(1772.8571)(125.7143)}} = 0.9481 \]
Interpretation: state the sign (positively or negatively correlated) and the strength (very weak, average/moderate, strong, very strong).
r = 0.9481: very strong and positively correlated
Other example:
r = -0.1111: very weak and negatively correlated
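A short sketch of the correlation calculation, using the lecture's values. The function name `correlation` is illustrative. The sign of r always matches the sign of SS_xy (and therefore of b), and squaring r should recover the coefficient of determination up to rounding.

```python
import math


def correlation(ss_xy, ss_xx, ss_yy):
    """r = SS_xy / sqrt(SS_xx * SS_yy)."""
    return ss_xy / math.sqrt(ss_xx * ss_yy)


r = correlation(ss_xy=447.5714, ss_xx=1772.8571, ss_yy=125.7143)
print(f"r = {r:.4f}")
print(f"r squared = {r * r:.3f}")  # compare with the COD of about 0.899
```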
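Test statistic:
\[ t_{calc} = \frac{b - B}{s_b}, \quad df = n - 2, \quad \text{where } s_b = \frac{s_e}{\sqrt{SS_{xx}}} \]
\[ H_0: B = 0 \]
\[ H_1: B \ne 0 \text{ (two-tailed test)}; \quad B > 0 \text{ (positive) or } B < 0 \text{ (negative) (one-tailed test)} \]
\( \sigma_\epsilon \) is unknown, so use the t distribution.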
HT about the slope coefficient, B

Test at the 1% significance level whether the slope of the regression line is positive.
\[ H_0: B = 0, \quad H_1: B > 0 \text{ (one-tailed test)} \]
\[ \alpha = 0.01, \quad df = n - 2 = 7 - 2 = 5 \]
\[ t_{calc} = \frac{b - B}{s_b} = \frac{0.2525 - 0}{0.0379} = 6.662 \]
\[ t_{critical} = t_{\alpha, n-2} = t_{0.01, 5} = 3.365 \]
\[ t_{critical} = 3.365 < t_{calc} = 6.662 \]
Reject \( H_0 \). There is sufficient evidence to conclude that the slope is positive; that is, food expenditure increases as income increases.
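The test can be reproduced with a short sketch. It assumes the usual formula s_b = s_e / √SS_xx for the standard deviation of b, and it takes the critical value 3.365 from the slide rather than computing it. The function name `slope_t_statistic` is illustrative. Full-precision arithmetic gives t ≈ 6.67; the slide's 6.662 comes from rounding s_b to 0.0379 first.

```python
import math


def slope_t_statistic(b, s_e, ss_xx, b_null=0.0):
    """Return (t_calc, s_b) for testing H0: B = b_null."""
    s_b = s_e / math.sqrt(ss_xx)   # standard deviation of b
    return (b - b_null) / s_b, s_b


t_calc, s_b = slope_t_statistic(b=0.2525, s_e=1.5939, ss_xx=1772.8571)

T_CRITICAL = 3.365   # t(0.01, 5), as quoted in the lecture
reject = t_calc > T_CRITICAL   # right-tailed test, H1: B > 0

print(f"s_b = {s_b:.4f}, t_calc = {t_calc:.3f}")
print("Reject H0" if reject else "Do not reject H0")
```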
A random sample of eight drivers from a small city, all insured with a company and having similar minimum required auto insurance policies, was selected. The following table lists their driving experience (in years) and monthly auto insurance premiums (in dollars).
Regression Analysis: A Complete Example
(a) Identify the IV and DV. Do you expect a positive or negative relationship?
(b) Compute \( SS_{xx} \), \( SS_{yy} \), and \( SS_{xy} \).
(c) Find the least squares regression line.
(d) Interpret the regression coefficients in (c).
(e) Calculate the COC and COD. Interpret their meanings.
(f) Predict the monthly premium for a driver with 10 years of experience. Comment on the reliability of the estimate.
(g) Compute the standard deviation of errors.
(h) Test at the 5% significance level whether B is negative.

(a) IV: driving experience (x); DV: monthly auto insurance premium (y).
A negative linear relationship is expected.

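(b) \( \sum x = 90 \), \( \sum y = 474 \), n = 8
\[ \bar{x} = \frac{\sum x}{n} = \frac{90}{8} = 11.25, \quad \bar{y} = \frac{\sum y}{n} = \frac{474}{8} = 59.25 \]
\[ SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 4739 - \frac{(90)(474)}{8} = -593.5 \]
\[ SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 1396 - \frac{(90)^2}{8} = 383.5 \]
\[ SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 29,642 - \frac{(474)^2}{8} = 1557.5 \]
(c)
\[ b = \frac{SS_{xy}}{SS_{xx}} = \frac{-593.5}{383.5} = -1.5476 \]
\[ a = \bar{y} - b\bar{x} = 59.25 - (-1.5476)(11.25) = 76.6605 \]
\[ \hat{y} = a + bx = 76.6605 - 1.5476x \]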

(d) \( \hat{y} = a + bx = 76.6605 - 1.5476x \)
y-intercept, a = 76.6605
A driver with 0 years of driving experience is expected to pay a monthly premium of $76.66.
Slope coefficient, b = -1.5476
For every extra year of driving experience, the monthly premium decreases by $1.55 on average.
(e) COC:
\[ r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \frac{-593.5}{\sqrt{(383.5)(1557.5)}} = -0.7679 \]
A moderately strong negative correlation.
COD:
\[ r^2 = \frac{b\,SS_{xy}}{SS_{yy}} = \frac{(-1.5476)(-593.5)}{1557.5} = 0.5897 \]
Alternative: \( r^2 = (-0.7679)^2 = 0.5897 \)
58.97% of the variation in monthly premiums can be explained by driving experience, whereas the remaining 41.03% is due to randomness and other unaccounted factors.
(f) \( \hat{y}(10) = 76.6605 - 1.5476(10) = \$61.18 \)
The estimate is reliable because \( 10 \in (2, 25) \).
(g)
\[ s_e = \sqrt{\frac{SS_{yy} - b\,SS_{xy}}{n - 2}} = \sqrt{\frac{1557.5 - (-1.5476)(-593.5)}{8 - 2}} = 10.3199 \]

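(h) \( H_0: B = 0, \quad H_1: B < 0 \)
\[ \alpha = 0.05, \quad df = n - 2 = 8 - 2 = 6 \]
\[ t_{calc} = \frac{b - B}{s_b} = \frac{-1.5476 - 0}{0.5270} = -2.937 \]
\[ t_{critical} = -t_{\alpha, df} = -t_{0.05, 6} = -1.943 \]
\[ t_{calc} = -2.937 < t_{critical} = -1.943 \]
Reject \( H_0 \). There is sufficient evidence to conclude that the slope is negative.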
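Parts (b) to (h) can be cross-checked in one pass from the totals used in the solution (n = 8, Σx = 90, Σy = 474, Σxy = 4739, Σx² = 1396, Σy² = 29,642). The sketch below is only a verification aid. The function name and dictionary keys are illustrative, s_b is assumed to be s_e / √SS_xx, and the critical value is the table value quoted above. Results may differ from the hand calculation in the last decimal place because nothing is rounded along the way.

```python
import math


def simple_regression_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Compute the main simple linear regression quantities from totals."""
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    ss_xy = sum_xy - sum_x * sum_y / n
    b = ss_xy / ss_xx
    a = sum_y / n - b * sum_x / n
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    s_e = math.sqrt((ss_yy - b * ss_xy) / (n - 2))
    s_b = s_e / math.sqrt(ss_xx)
    return {"ss_xx": ss_xx, "ss_yy": ss_yy, "ss_xy": ss_xy, "a": a, "b": b,
            "r": r, "r2": b * ss_xy / ss_yy, "s_e": s_e, "s_b": s_b,
            "t": b / s_b}


res = simple_regression_from_sums(8, 90, 474, 4739, 1396, 29642)
for key, value in res.items():
    print(f"{key:>5} = {value:.4f}")

# (f) predicted monthly premium for 10 years of experience
y_hat_10 = res["a"] + res["b"] * 10
print(f"y_hat(10) = {y_hat_10:.2f}")

# (h) left-tailed test at the 5% level; -1.943 is the table value -t(0.05, 6)
reject = res["t"] < -1.943
print("Reject H0" if reject else "Do not reject H0")
```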
The hypothesis test on B can be performed using the p-value approach, using the output obtained from statistical software.
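To illustrate what the p-value approach does, the sketch below approximates the one-tailed p-value for the insurance example (t = -2.937, df = 6) by integrating the t density numerically with Simpson's rule, using only the Python standard library. This is an illustration only, not how Excel computes it, and the function names `t_pdf` and `upper_tail` are hypothetical. Software typically reports a two-tailed p-value for each coefficient, so for a one-tailed test that value would usually be halved when the sign of b agrees with \( H_1 \).

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """Approximate P(T > |t|) using Simpson's rule on [0, |t|] (steps even)."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


alpha = 0.05
# By symmetry, P(T < -2.937) equals P(T > 2.937).
p_value = upper_tail(-2.937, df=6)

print(f"one-tailed p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

The decision rule is to reject \( H_0 \) when the p-value is smaller than \( \alpha \), which should agree with the critical value approach.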
EXCEL OUTPUT
Source: http://www.excel-easy.com/examples/regression.html
SUMMARY
Identify IV (x) and DV (y)
Calculate SS of xx, yy, and xy
Determine the best fit line
Calculate and interpret regression coefficients
Calculate and interpret COC and COD
Estimate and comment on its reliability
Hypothesis test on B (critical value approach using manual calculation, or p-value approach from the Excel output)
Finding missing values from the given Excel output
