
Estimating Demand

Outline
• Where do demand functions come from?
• Sources of information for demand estimation
• Cross-sectional versus time series data
• Estimating a demand specification using the
ordinary least squares (OLS) method
• Goodness-of-fit statistics
The goal of forecasting

To transform available data into
equations that provide the best
possible forecasts of economic
variables—e.g., sales revenues
and costs of production—that are
crucial for management.
Demand for air travel Houston to
Orlando

Recall that our demand function was
estimated as follows:

Q = 25 + 3Y + P0 – 2P    [4.1]

Now we will explain how we estimated
this demand equation, where Q is the
number of seats sold; Y is a regional
income index; P0 is the fare charged by
a rival airline; and P is the
airline's own fare.
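Plugging values into equation [4.1] is straightforward; here is a minimal sketch in Python (the income index, rival fare, and own fare below are hypothetical illustration values, not from the text):

```python
def demand(Y, P0, P):
    """Seats sold per flight: Q = 25 + 3Y + P0 - 2P (equation 4.1)."""
    return 25 + 3 * Y + P0 - 2 * P

# Hypothetical values: income index 105, rival fare $240, own fare $230.
print(demand(Y=105, P0=240, P=230))  # 25 + 315 + 240 - 460 = 120 seats
```

Note that raising the airline's own fare P by $1 lowers predicted sales by 2 seats, matching the coefficient –2 on P.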
Questions managers should
ask about a forecasting equation

1. What is the “best” equation that can be


obtained (estimated) from the available
data?
2. What does the equation not explain?
3. What can be said about the likelihood
and magnitude of forecast errors?
4. What are the profit consequences of
forecast errors?
How do we get the data to estimate
demand forecasting equations?

• Customer surveys and interviews.
• Controlled market studies.
• Uncontrolled market data.
Campbell’s soup
estimates demand
functions from data
obtained from a survey of
more than 100,000
consumers
Survey pitfalls
• Sample bias
• Response bias
• Response accuracy
• Cost
Types of data

Time-series data: historical data—i.e., the data sample
consists of a series of daily, monthly, quarterly, or annual
observations on variables such as prices, income, employment,
output, car sales, stock market indices, exchange rates, and
so on.

Cross-sectional data: all observations in the sample are
taken at the same point in time and represent different
individual entities (such as households, houses, etc.).
Time-series data: daily observations,
Korean won per dollar

Year   Month   Day   Won per Dollar
1997     3     10        877
1997     3     11        880.5
1997     3     12        879.5
1997     3     13        880.5
1997     3     14        881.5
1997     3     17        882
1997     3     18        885
1997     3     19        887
1997     3     20        886.5
1997     3     21        887
1997     3     24        890
1997     3     25        891
Example of cross-sectional data

Student ID   Sex   Age   Height   Weight
777672431     M    21    6'1"     178 lbs.
231098765     M    28    5'11"    205 lbs.
111000111     F    19    5'8"     121 lbs.
898069845     F    22    5'4"     98 lbs.
000341234     M    20    6'2"     183 lbs.

Estimating demand equations
using regression analysis

Regression analysis is a
statistical technique that allows
us to quantify the relationship
between a dependent variable
and one or more independent or
“explanatory” variables.
Regression theory

X and Y are not perfectly correlated.
However, there is on average a positive
relationship between Y and X.

We assume that the expected conditional
values of Y associated with alternative
values of X fall on a line:

E(Y|Xi) = β0 + β1Xi

The deviation of an observed value Y1 from
its conditional expectation is the disturbance:

ε1 = Y1 – E(Y|X1)
Specifying a single
variable model

Our model is specified as follows:

Q = f(P)

where Q is ticket sales and P is the fare.
Q is the dependent
variable—that is, we think
that variations in Q can be
explained by variations in
P, the “explanatory”
variable.
Estimating the single variable model

Qi = β0 + β1Pi    [1]

Since the data points are unlikely to fall
exactly on a line, (1) must be modified
to include a disturbance term (εi):

Qi = β0 + β1Pi + εi    [2]

⇒ β0 and β1 are called parameters or
population parameters.
⇒ We estimate these parameters using
the data we have available.
Estimated Simple Linear Regression Equation

• The estimated simple linear regression equation:

ŷ = b0 + b1x

• The graph is called the estimated regression line.
• b0 is the y intercept of the line.
• b1 is the slope of the line.
• ŷ is the estimated value of y for a given x value.
Estimation Process

Regression model: y = β0 + β1x + ε
Regression equation: E(y) = β0 + β1x
Unknown parameters: β0, β1

Sample data: (x1, y1), ..., (xn, yn)

Sample statistics b0 and b1 provide estimates
of β0 and β1, giving the estimated regression
equation:

ŷ = b0 + b1x
Least Squares Method

• Least Squares Criterion:

min Σ(yi – ŷi)²

where:
yi = observed value of the dependent variable
for the ith observation
ŷi = estimated value of the dependent variable
for the ith observation
Least Squares Method

• Slope for the Estimated Regression Equation:

b1 = Σ(xi – x̄)(yi – ȳ) / Σ(xi – x̄)²
Least Squares Method

• y-Intercept for the Estimated Regression Equation:

b0 = ȳ – b1x̄

where:
xi = value of the independent variable for the ith observation
yi = value of the dependent variable for the ith observation
x̄ = mean value of the independent variable
ȳ = mean value of the dependent variable
n = total number of observations
Line of best fit

The line of best fit is the one
that minimizes the sum of the
squared vertical distances
of the sample points from the
line.
The 4 steps of demand
estimation using regression

1. Specification
2. Estimation
3. Evaluation
4. Forecasting
Table 4-2: Ticket Prices and Ticket Sales along an Air Route

Year and    Average Number    Average
Quarter     of Coach Seats    Fare
97-1             64.8          250
97-2             33.6          265
97-3             37.8          265
97-4             83.3          240
98-1            111.7          230
98-2            137.5          225
98-3            109.6          225
98-4             96.8          220
99-1             59.5          230
99-2             83.2          235
99-3             90.5          245
99-4            105.5          240
00-1             75.7          250
00-2             91.6          240
00-3            112.7          240
00-4            102.2          235
Mean             87.3          239.7
Std. Dev.        27.9          13.1
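The OLS slope and intercept formulas given earlier can be applied directly to the Table 4-2 data. This sketch, in plain Python with no statistics library, should reproduce the estimates reported by SPSS later in these notes:

```python
# Table 4-2: average fare (P) and coach seats sold (Q), 97-1 through 00-4
fares = [250, 265, 265, 240, 230, 225, 225, 220,
         230, 235, 245, 240, 250, 240, 240, 235]
seats = [64.8, 33.6, 37.8, 83.3, 111.7, 137.5, 109.6, 96.8,
         59.5, 83.2, 90.5, 105.5, 75.7, 91.6, 112.7, 102.2]

n = len(fares)
x_bar = sum(fares) / n   # mean fare, ~239.7
y_bar = sum(seats) / n   # mean seats, ~87.3

# b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(fares, seats))
      / sum((x - x_bar) ** 2 for x in fares))
# b0 = y_bar - b1 * x_bar
b0 = y_bar - b1 * x_bar

print(round(b0, 2), round(b1, 3))  # 478.69 -1.633
```

These match the SPSS coefficients (478.690 and –1.633) shown in the output below.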
Simple linear regression
begins by plotting Q-P
values on a scatter
diagram to determine if
there exists an
approximate linear
relationship:
Scatter plot diagram: fare (vertical axis, $210–$290)
plotted against passengers (horizontal axis, 20–160).
Scatter plot diagram with possible line of best fit:
average one-way fare ($220–$270) against number of
seats sold per flight (0–150), with the fitted
demand curve Q = 330 – P.
Note that we use X to denote the explanatory
variable and Y to denote the dependent variable.
So in our example sales (Q) is the "Y" variable
and fare (P) is the "X" variable:

Q = Y
P = X
Computing the OLS
estimators

We estimated the equation using the statistical
software package SPSS. It generated the following
output:

Coefficients(a)

                 Unstandardized         Standardized
                 Coefficients           Coefficients
Model            B         Std. Error   Beta     t        Sig.
1  (Constant)    478.690   88.036                5.437    .000
   FARE          -1.633    .367         -.766    -4.453   .001
a. Dependent Variable: PASS
Reading the SPSS Output

From this table we see that
our estimate of β0 is 478.7
and our estimate of β1 is
–1.63.

Thus our forecasting equation is
given by:

Q̂i = 478.7 – 1.63Pi
Step 3: Evaluation

Now we will evaluate the forecasting equation


using standard goodness of fit statistics,
including:
1. The standard errors of the estimates.
2. The t-statistics of the estimates of the coefficients.
3. The standard error of the regression (s).
4. The coefficient of determination (R²).
Standard errors of
the estimates

• We assume that the regression coefficients are


normally distributed variables.
• The standard error (or standard deviation) of the
estimates is a measure of the dispersion of the
estimates around their mean value.
• As a general principle, the smaller the standard error,
the better the estimates (in terms of yielding
accurate forecasts of the dependent variable).
The following rule-of-thumb is useful: The
standard error of the regression
coefficient should be less than half of the
size of the corresponding regression
coefficient.
Computing the standard error of b1

Let s_b1 denote the standard error of our estimate of β1.

Thus we have:

s_b1 = √(s²_b1)

where

s²_b1 = Σei² / [(n – k) Σxi²]

Note that:
xi = Xi – X̄
ei = Qi – Q̂i
k is the number of
estimated coefficients
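The formula above can be checked numerically against the Table 4-2 data; a self-contained sketch:

```python
fares = [250, 265, 265, 240, 230, 225, 225, 220,
         230, 235, 245, 240, 250, 240, 240, 235]
seats = [64.8, 33.6, 37.8, 83.3, 111.7, 137.5, 109.6, 96.8,
         59.5, 83.2, 90.5, 105.5, 75.7, 91.6, 112.7, 102.2]

n, k = len(fares), 2      # k = number of estimated coefficients (b0 and b1)
x_bar = sum(fares) / n
y_bar = sum(seats) / n
Sxx = sum((x - x_bar) ** 2 for x in fares)        # sum of squared x deviations
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(fares, seats)) / Sxx
b0 = y_bar - b1 * x_bar

# Residuals e_i = Q_i - Q_hat_i, then s^2_b1 = sum(e_i^2) / ((n - k) * Sxx)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(fares, seats))
se_b1 = (sse / ((n - k) * Sxx)) ** 0.5
t = b1 / se_b1

print(round(se_b1, 3), round(t, 2))  # 0.367 -4.45
```

Both values agree with the SPSS output (Std. Error .367, t –4.453).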
Coefficients(a)

                 Unstandardized         Standardized
                 Coefficients           Coefficients
Model            B         Std. Error   Beta     t        Sig.
1  (Constant)    478.690   88.036                5.437    .000
   FARE          -1.633    .367         -.766    -4.453   .001
a. Dependent Variable: PASS

By reference to the SPSS output, we see
that the standard error of our estimate
of β1 is 0.367, whereas the absolute value of our
estimate of β1 is 1.63. Hence our estimate is about 4½
times the size of its standard error.
The SPSS output tells us that the
t statistic for the fare coefficient (P)
is –4.453. The t test is a way
of comparing the error
suggested by the null
hypothesis to the
standard error of the estimate.
The t test

• To test for the significance of our estimate of β1, we
set the following null hypothesis, H0, and the
alternative hypothesis, H1:
• H0: β1 ≥ 0
• H1: β1 < 0
• The t distribution is used to
test for statistical significance of
the estimate:

t = (β̂1 – β1) / s_β̂1 = (–1.63 – 0) / 0.367 ≈ –4.45
Coefficient of determination (R²)

⇒ The coefficient of determination, R², is defined as the proportion of
the total variation in the dependent variable (Y) "explained" by the
regression of Y on the independent variable (X). The total variation in
Y, or the total sum of squares (TSS), is defined as:

TSS = Σi=1..n (Yi – Ȳ)² = Σi=1..n yi²    Note: yi = Yi – Ȳ

The explained variation in the dependent variable (Y) is called
the regression sum of squares (RSS) and is given by:

RSS = Σi=1..n (Ŷi – Ȳ)² = Σi=1..n ŷi²

What remains is the unexplained variation in the dependent
variable, or the error sum of squares (ESS):

ESS = Σi=1..n (Yi – Ŷi)² = Σi=1..n ei²

We can say the following:

• TSS = RSS + ESS, or
• Total variation = Explained variation + Unexplained variation

R² is defined as:

R² = RSS/TSS = Σŷi² / Σyi² = 1 – ESS/TSS = 1 – Σei² / Σyi²
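The TSS = RSS + ESS decomposition can be verified on the airline data; this sketch should reproduce the sums of squares in the SPSS ANOVA table:

```python
fares = [250, 265, 265, 240, 230, 225, 225, 220,
         230, 235, 245, 240, 250, 240, 240, 235]
seats = [64.8, 33.6, 37.8, 83.3, 111.7, 137.5, 109.6, 96.8,
         59.5, 83.2, 90.5, 105.5, 75.7, 91.6, 112.7, 102.2]

n = len(fares)
x_bar = sum(fares) / n
y_bar = sum(seats) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(fares, seats))
      / sum((x - x_bar) ** 2 for x in fares))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in fares]

tss = sum((y - y_bar) ** 2 for y in seats)                # total variation
rss = sum((yh - y_bar) ** 2 for yh in y_hat)              # explained variation
ess = sum((y - yh) ** 2 for y, yh in zip(seats, y_hat))   # unexplained variation
r2 = rss / tss

print(round(tss, 1), round(rss, 1), round(ess, 1), round(r2, 3))
# 11710.4 6863.6 4846.8 0.586
```

These agree with the ANOVA output (Regression 6863.624, Residual 4846.816, Total 11710.440) and with R² = .586.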
ANOVA(b)

Model         Sum of Squares   df   Mean Square   F        Sig.
Regression    6863.624         1    6863.624      19.826   .001a
Residual      4846.816         14   346.201
Total         11710.440        15
a. Predictors: (Constant), FARE
b. Dependent Variable: PASS

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .766a   .586       .557                18.6065
a. Predictors: (Constant), FARE

We see from the SPSS model summary
table that R² for this model is .586.
Notes on R²

⇒ Note that: 0 ≤ R² ≤ 1
⇒ If R² = 0, all the sample points lie on a horizontal
line or in a circle
⇒ If R² = 1, the sample points all lie on the regression
line
⇒ In our case, R² ≈ 0.586, meaning that 58.6 percent
of the variation in the dependent variable
(seats sold) is explained by the regression.

This is not a particularly good fit based on
R², since 41.4 percent of the
variation in the dependent
variable is unexplained.
Standard error of the
regression

⇒ The standard error of the regression (s) is given
by:

s = √( Σi=1..n ei² / (n – k) )
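A quick numerical check of s on the airline data (self-contained sketch):

```python
fares = [250, 265, 265, 240, 230, 225, 225, 220,
         230, 235, 245, 240, 250, 240, 240, 235]
seats = [64.8, 33.6, 37.8, 83.3, 111.7, 137.5, 109.6, 96.8,
         59.5, 83.2, 90.5, 105.5, 75.7, 91.6, 112.7, 102.2]

n, k = len(fares), 2
x_bar = sum(fares) / n
y_bar = sum(seats) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(fares, seats))
      / sum((x - x_bar) ** 2 for x in fares))
b0 = y_bar - b1 * x_bar

# s = sqrt(sum(e_i^2) / (n - k))
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(fares, seats))
s = (sse / (n - k)) ** 0.5

print(round(s, 4))  # 18.6065
```

This matches the "Std. Error of the Estimate" in the SPSS model summary.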
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .766a   .586       .557                18.6065
a. Predictors: (Constant), FARE

⇒ The model summary tells us that s = 18.6
⇒ Regression is based on the assumption that the error term
is normally distributed, so about 68.3% of the actual values
of the dependent variable (seats sold) should be within one
standard error (18.6 seats in our example) of their fitted value.
⇒ Also, 95.45% of the observed values of seats sold should
be within 2 standard errors of their fitted values (37.2 seats).
Step 4: Forecasting

Recall that the equation obtained from the
regression results is:

Q̂i = 478.7 – 1.63Pi

Our first step is to
perform an "in-sample"
forecast.
At the most basic level, forecasting consists
of inserting forecasted values of the
explanatory variable P (fare) into the
forecasting equation to obtain forecasted
values of the dependent variable Q
(passenger seats sold).
In-Sample Forecast of Airline Sales

Year and   Actual      Predicted
Quarter    Sales (Q)   Sales (Q*)   Q* – Q    (Q* – Q)²
97-1         64.8        70.44        5.64       31.81
97-2         33.6        45.94       12.34      152.28
97-3         37.8        45.94        8.14       66.26
97-4         83.3        86.77        3.47       12.04
98-1        111.7       103.1        -8.6        73.96
98-2        137.5       111.26      -26.24      688.54
98-3        109.6       111.26        1.66        2.76
98-4         96.8       119.43       22.63      512.12
99-1         59.5       103.1        43.6      1900.96
99-2         83.2        94.94       11.74      137.83
99-3         90.5        78.61      -11.89      141.37
99-4        105.5        86.77      -18.73      350.81
00-1         75.7        70.44       -5.26       27.67
00-2         91.6        86.77       -4.83       23.33
00-3        112.7        86.77      -25.93      672.36
00-4        102.2        94.94       -7.26       52.71

Sum of Squared Errors: 4846.80
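The in-sample forecast can be reproduced end-to-end; a sketch that re-estimates the equation, predicts each quarter's sales from its actual fare, and accumulates the squared errors:

```python
fares = [250, 265, 265, 240, 230, 225, 225, 220,
         230, 235, 245, 240, 250, 240, 240, 235]
seats = [64.8, 33.6, 37.8, 83.3, 111.7, 137.5, 109.6, 96.8,
         59.5, 83.2, 90.5, 105.5, 75.7, 91.6, 112.7, 102.2]

n = len(fares)
x_bar = sum(fares) / n
y_bar = sum(seats) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(fares, seats))
      / sum((x - x_bar) ** 2 for x in fares))
b0 = y_bar - b1 * x_bar

# In-sample forecast: predicted Q* for each quarter's actual fare
q_star = [b0 + b1 * p for p in fares]
sse = sum((qs - q) ** 2 for qs, q in zip(q_star, seats))

print(round(q_star[0], 2))  # 97-1 prediction, ~70.41
print(round(sse, 1))        # ~4846.8, the sum of squared errors
```

Small discrepancies from the table (e.g., 70.41 vs. 70.44 for 97-1) come from the table's use of rounded coefficients (478.69 and –1.633).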


In-Sample Forecast of Airline Sales (chart): actual and
fitted passenger counts (vertical axis, 20–160) plotted
by year/quarter, 97.1 through 00.3.
Can we make a
good forecast?

Our ability to generate accurate forecasts of the dependent variable


depends on two factors:

• Do we have good forecasts of the explanatory variable?

• Does our model exhibit structural stability, i.e., will the causal
relationship between Q and P expressed in our forecasting equation
hold up over time? After all, the estimated coefficients are average
values for a specific time interval (1997–2000). While the past may be a
serviceable guide to the future in the case of purely physical
phenomena, the same principle does not necessarily hold in the realm
of social phenomena (to which the economy belongs).
Single Variable Regression Using Excel

We will estimate an equation


and use it to predict home
prices in two cities. Our data
set is on the next slide
• Income (Y) is average family income in 2003 (in $ thousands).
• Home Price (HP) is the average price of a new or existing home
in 2003 (in $ thousands).

City                  Income   Home Price
Akron, OH             74.1     114.9
Atlanta, GA           82.4     126.9
Birmingham, AL        71.2     130.9
Bismark, ND           62.8     92.8
Cleveland, OH         79.2     135.8
Columbia, SC          66.8     116.7
Denver, CO            82.6     161.9
Detroit, MI           85.3     145
Fort Lauderdale, FL   75.8     145.3
Hartford, CT          89.1     162.1
Lancaster, PA         75.2     125.9
Madison, WI           78.8     145.2
Naples, FL            100      173.6
Nashville, TN         77.3     125.9
Philadelphia, PA      87       151.5
Savannah, GA          67.8     108.1
Toledo, OH            71.2     101.1
Washington, DC        97.4     191.9
Model Specification

HP = b0 + b1Y
Scatter Diagram: Income and Home Prices —
home prices (vertical axis, 80–200) plotted
against income (horizontal axis, 50–110).
Excel Output

Regression Statistics
Multiple R           0.906983447
R Square             0.822618973
Adjusted R Square    0.811532659
Standard Error       11.22878416
Observations         18

ANOVA
             df   SS
Regression   1    9355.715502
Residual     16   2017.369498
Total        17   11373.085

            Coefficients   Standard Error   t Stat
Intercept   -48.11037724   21.58459326      -2.228922114
Income      2.332504769    0.270780116      8.614017895

Equation and prediction

HP = –48.11 + 2.33Y

City            Income    Predicted HP
Meridian, MS    59,600    $ 138,819.89
Palo Alto, CA   121,000   $ 281,881.89
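Applying the estimated home-price equation is then a one-liner; a minimal sketch (income Y measured in thousands of dollars, matching the estimation data, so predictions are also in thousands):

```python
def predict_home_price(income_thousands):
    """Estimated equation: HP = -48.11 + 2.33 * Y (both in $ thousands)."""
    return -48.11 + 2.33 * income_thousands

# An Akron-level income of $74.1 thousand:
print(round(predict_home_price(74.1), 1))  # 124.5, i.e. about $124,500
```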
