A Spreadsheet-Oriented Approach
Part I
Chapter 1: Introduction to Business Analytics
Part II
Part III
Part IV
Chapter 18: Linear Programming
Chapter 19: Applications of Linear Programming
Chapter 20: Decision Analysis
Chapter 1
Introduction to Business Analytics
Introduction
Business analytics rests on the idea that data sets and databases may contain information that can not only help solve a problem but also reveal opportunities to improve business performance. The business analytics process starts with the collection of business-related data, followed by the sequential application of descriptive, predictive and prescriptive analytics, with a view to improving business decision making and organizational performance. It is a set of techniques and processes used to analyse data in order to improve business performance through fact-based decision-making. Business analytics creates capabilities that let companies compete effectively in the market, and it is likely to become one of the main functional areas in most companies. Competing on Analytics: The New Science of Winning, a well-acclaimed book by Thomas Davenport, emphasized that a significant proportion of high-performance companies have strong analytical skills among their personnel. In a survey of nearly 3,000 executives, MIT Sloan Management Review reported a striking correlation between an organization's analytics sophistication and its competitive performance. Capital One, a credit card company, has earned profits of close to $1 billion in its credit card business in the recent past, whereas many of its competitors have lost several million dollars in the same business. The success of Capital One is attributed to its analytical strength. Thus there is significant evidence from the corporate world that the ability to make better decisions improves with analytical skills. According to recent research, effective data management and business analytics are increasingly considered strategic and are discussed at board-room level.
Types of Analytics
The Institute for Operations Research and the Management Sciences (INFORMS) has broadly categorized analytics into:
Descriptive Analytics
Descriptive analytics means describing a phenomenon in terms of graphs, pictures, symbols and simple statistical tools, such as the mean and standard deviation, that describe the nature of the data. Descriptive analytics is the first step in data handling.
Predictive Analytics
Predictive analytics means estimating in unknown situations. Forecasting and prediction are often used interchangeably; however, prediction is the more general term and connotes estimating for any time period before, during, or after the current observation. Predictive analytics searches for patterns in historical and transactional data to understand a business problem. Advanced statistical techniques such as regression analysis and time series models are the main tools of predictive analytics.
Prescriptive Analytics
It comprises applications of decision theory, operations research and management science to make informed decisions.
Exercises

Chapter 2
Introduction
Numerical facts and figures are called
data. Consider the following examples:
The Indian economy will grow by 9-10% per annum in the coming 5 years.
The money supply in the US economy is increasing by 5% every year.
Some stock market analysts believe that the BSE Sensex will be at 35,000 points by 2020 A.D.
The male-female ratio in India is 980 as per the 2011 census.
The population of India is growing by above 2% every year.
The voter turnout in India is only 50%.
The literacy rate in Bihar, even after 50 years of Independence, is only 47%.
Inflation in India in the year 2015-16 was below 4%.
Meaning of Data
As said earlier, numerical facts and figures are called data. In fact, data is simply the meaning of "statistics" in the plural sense, where the word denotes numerical and quantitative information. For example: Bill scored 72 marks in the business statistics paper; in 2015, while the Chinese economy grew at 6.5 per cent, the Indian economy grew at 7.5 per cent per annum. Such numerical and quantitative information is called data. Data are collected, tabulated, summarized, and analyzed for presentation and interpretation.
Levels of Measurement
There are generally four types of variables encountered in empirical analysis. The type of variable under consideration plays an important role in selecting the appropriate statistical tool for analysis: for instance, it is not advisable to compute the arithmetic mean of a nominal scale variable. The various types of variables and their nature are discussed as follows:
Nominal Scale Variables
Nominal scale variables are very common in marketing and social science research. A nominal scale divides data into categories that are mutually exclusive and collectively exhaustive: a data point is grouped under one and only one category, and every data point falls somewhere in the scale. The word nominal means "name-like", meaning that the numbers or codes given to objects or events only name or classify them. These numbers have no quantitative meaning and thus cannot be added, multiplied or divided; they are simply labels or identification numbers.
Types of Data
There are three common types of data:
Time Series Data
Time series data are a sequence of observations ordered in time. For example, data on real gross domestic product, money supply, inflation, etc. are collected at specific points in time, say yearly. These data are ordered by time and are called time series. The observations made on GDP or money supply at time t and t + 1 are separated by some unit of time such as days, weeks, months or years. The following are some examples of time series data:
Annual GDP at current prices from 1950 to 2016
Monthly figures on broad money (M3) from April 2005 to March 2016
Profits of Reliance Industries over 20 years
Daily closing prices of the BSE Sensex over 10 years
Macroeconomists, who study the economy as a whole, often work with time series data on important macroeconomic variables like real GDP and inflation.
Cross-Sectional Data
Cross-sectional data refer to parallel data on many units, such as individuals, firms or countries, at the same point in time. The following are a few examples of cross-sectional data:
Profits of 50 firms in 2014-15
Per capita income of 100 nations in 2016
Heights and weights of 500 people in a company in 2016
Income, education and experience of 2,500 workers in a locality in 2015
In economics, microeconomists often work with cross-sectional data. For example, suppose a labor economist wishes to know how much the workers of the textile industry earn; to this end, he asks 100 workers how much they earn. The incomes of these 100 textile workers are cross-sectional data.
Panel Data
Panel data contain features of both time series and cross-sectional data: in panel data analysis the same cross-section units are surveyed over time. The following are examples of panel data:
Growth rate of M3 of 20 countries from 1995 to 2015
Profits of 100 firms over 10 years
Rates of return of 120 mutual funds over 15 years
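The relationship between the three types of data can be sketched in a few lines of Python. This is a minimal illustration with hypothetical profit figures (the firm names and numbers are invented, not from the text): the panel is indexed by both unit (firm) and time (year), and slicing it one way gives a time series while slicing it the other way gives a cross-section.

```python
# Hypothetical panel: profits of three firms over three years.
panel = {
    "Firm A": {2014: 10.5, 2015: 11.2, 2016: 12.0},
    "Firm B": {2014: 8.1, 2015: 8.4, 2016: 9.0},
    "Firm C": {2014: 15.3, 2015: 14.9, 2016: 16.1},
}

# Time series: one unit observed over several periods.
firm_a_over_time = panel["Firm A"]

# Cross-section: many units observed at one point in time.
profits_2015 = {firm: years[2015] for firm, years in panel.items()}

# Panel: every unit observed in every period (3 firms x 3 years).
n_observations = sum(len(years) for years in panel.values())
```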
Exercises
1. What is data?
2. What are elements and variables?
3. What are the four levels of
measurement of variable?
4. What is nominal data? Can you find
arithmetic mean of such data? Why?
5. What is ordinal data? Is it qualitative
or quantitative data?
6. What is interval data? Give one
example.
7. What is ratio scaled variable? Is it
qualitative or quantitative data?
8. What are cross-sectional, time series
and panel data?
Chapter 3
Data Visualization and Representation
Introduction
Descriptive analytics means describing a phenomenon in terms of graphs, pictures, symbols and mathematical expressions. Describing data related to the economy, business and other areas is the first step in data handling.
Diagrammatic Representation
Diagrams and graphs are useful because they provide a bird's-eye view of the entire data, and the information presented is easily understood. However, there is a distinction between a diagram and a graph.
Line Chart
A line graph is used to present a variable measured at various points over time; it is drawn by connecting lines between successive data points. A line graph is very useful in highlighting trends in a variable over time. The monthly closing stock prices of Facebook, Inc. from 18th May 2012 through 1st February 2017, shown in Figure 3.1, are an example of a line chart. This data set contains 58 observations, and it is difficult for a reader to comprehend anything meaningful from the raw data; however, a reader can easily grasp its main characteristics and trend over the years by looking at the line graph.
Figure 3.1
As evident from Fig. 3.1, the stock price of Facebook shows a rising trend after 18th May 2013; before this, the stock price had remained stagnant for almost a year. The stock price of Facebook was around $30 in May 2012 and crossed $50 somewhere in mid-2013. The trend in the stock is up, and as of 1st February 2017 the stock was quoting a price of $135.54.
Computer Application
Microsoft Excel, with its Chart Wizard, can generate a line graph. To construct a line graph in Excel, enter the labels and data into a column. Choose Insert from the menu bar, then Chart from the pull-down menu, select Line, and follow the instructions; the line graph is completed in four steps. Excel also offers options such as including a legend, setting data labels and deciding the location of the chart.
Using Excel
Many of the graphs such as line graphs, bar graphs and scatter plots discussed in this chapter can be generated with the help of the Chart Wizard. Excel can also generate histograms using Data Analysis.

Line Graph
2013    7.87
2014    12.47
2015    17.93
2016    27.64
Illustration
The following table shows the sales of Apple Inc. ($ billions) and cost of goods sold ($ billions) from 2012 to 2016.
Computer Application
Illustration
The table below shows sector-wise capital formation of the household sector, private corporate sector and public sector in India from 2010-11 to 2014-15. The appropriate bar diagram to represent the contribution of each sector to total capital formation in India is the subdivided bar diagram.

Years     Household Sector   Private Corporate Sector
2010-11   437544             33262
2011-12   415207             47916
2012-13   461627             57591
2013-14   481026             70848
2014-15   526209             59064
Illustration
The table below shows major
components of Central Government
Receipts from 2011-12 to 2014-15.
Construct a percentage bar graph to illustrate this data.
Solution
The percentage bar graph of Central
Government Receipts from 2011-12 to
2014-15 is shown below.
Fig. 3.5: Percentage Bar Graph of
Major Components of Central
Government Receipts (Rs. Cr)
Fig. 3.5 shows that in 2011-12 the share of tax revenue was more than 60% of total receipts, the share of non-tax revenue was around 15%, and the share of capital receipts was roughly 25 per cent. However, in 2014-15 we find a slight change in the composition of total receipts: the percentage share of tax revenue was less than 50%, non-tax revenue was more or less 12%, and capital receipts were around 38% of total receipts.
Computer Application
Table 3.2
Company    Expenditure (Rs. lakhs)
Tata       600
Reliance   1200
Godrej     200
M&M        200
Bajaj      300
Birla      100
Others     400
Total      3000
Computer Application
Microsoft Excel, with its Chart Wizard, can also generate a pie chart. To construct a pie chart in Excel, enter the labels into one column and the data into another column. Choose Insert from the menu bar, then Chart from the pull-down menu, select Pie, and follow the instructions; the pie chart is completed in four steps. Excel also offers options such as including a legend, setting data labels and deciding the location of the chart.
Using Excel
The following table 3.3 shows sectoral
allocations of financial resources during
10th plan.
Table 3.3: Sectoral Allocation during
10th Plan
Sectors Rs.
Crores
Education 62461
Rural Development Land 87041
Resources & Panchayati
Raj
Health Family Welfare & 45771
Ayush
Agriculture & Irrigation 50639
Social Justice 36381
Physical Infrastructure 89021
Scientific Department 29823
Energy 47266
Total Priority Sector 448403
Others 365375
Total 813778
Example
In an MBA course there are 55 male and 30 female students. Construct a doughnut chart to display this information.
Computer Application
Microsoft Excel, with its Chart Wizard, can also generate a doughnut chart. To construct a doughnut chart in Excel, enter the labels into one column and the data into another column. Choose Insert from the menu bar, then Chart from the pull-down menu, select Doughnut, and follow the instructions; the chart is completed in four steps. Excel also offers options such as including a legend, setting data labels and deciding the location of the chart.
Using Excel
The following table shows data relating
to gender of students:
Male Female
MBA 55 30
We will now create the doughnut chart using Excel.
End-Use
Sector 2012 2013 2014 20
Residential 17835 18687 19278 186
Commercial 15840 16227 16622 166
Industrial 28258 28660 28931 286
Transportation 24098 24516 24683 250
Using Excel
Step 1: To find the relationship between the price movements of Microsoft and Facebook, we can construct a scatter plot. To construct a scatter plot in Excel, first enter the data as shown below:
Step 2: Click Insert on the toolbar; the screen below will appear.
Step 3: Select the data and click Scatter as shown below.
Step 4: If you click on the first option, the screen below will appear.
Step 5: The procedure for labeling the scatter plot is the same as that for the line graph discussed under steps 5 to 11. Following those steps, the scatter plot produced in Excel is shown below:
Bubble Chart
A bubble chart is like a scatter chart with an additional third column of values that determines the size of each bubble, so every bubble represents a data point together with a third dimension.
Solution
The first step is to find the mid-point of each class interval in the frequency table, as shown below:

Marks    Mid-Point   Frequency
0-10     5           4
10-20    15          7
20-30    25          3
30-40    35          5
40-50    45          18
50-60    55          30
60-70    65          20
70-80    75          8
80-90    85          6
90-100   95          2
Music Type
Age group      Hindi Film Music   Hin Bha Music
10-25          190                20
26-50          80                 60
51 and more    46                 98
Total          316                178
Exercises
1. The following table shows the U.S. field production of crude oil (thousand barrels) from 2000 to 2015. Construct a line chart using Excel.
Year    U.S. Field Production of Crude Oil (Thousand Barrels)
2000 2130
2001 2117
2002 2096
2003 2061
2004 1991
2005 1892
2006 1856
2007 1852
2008 1829
2009 1953
2010 1998
2011 2060
2012 2374
2013 2725
2014 3198
2015 3436
Mean
The arithmetic mean is what most laymen call an average. The mean is computed by adding all the observations in a data set and dividing the resulting sum by the total number of observations. The mathematical formula for computing the mean is:

X̄ = ΣX / N   (3.1)

where X̄ is the mean, ΣX is the sum of the observations and N is the number of observations.
Solved Example
Compute the arithmetic mean of the following marks in Statistics obtained by 10 students in a test:

Roll No: 1  2  3  4  5  6  7  8  9  10
Marks:   56 78 65 44 90 88 75 52 51 68

Solution:
Roll No.   Marks
1          56
2          78
3          65
4          44
5          90
6          88
7          75
8          52
9          51
10         68
N = 10     ΣX = 667

Mean = ΣX / N = 667 / 10 = 66.7
Using Excel
In Microsoft Excel, the AVERAGE function can be used to calculate the arithmetic mean. In particular, we calculate the mean by entering the formula:
= AVERAGE (A2:A11)
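Outside Excel, the same calculation can be sketched with Python's standard statistics module; here it reproduces the AVERAGE result for the ten marks in the solved example above.

```python
import statistics

# The ten marks from the solved example above.
marks = [56, 78, 65, 44, 90, 88, 75, 52, 51, 68]

# Equivalent of Excel's =AVERAGE(A2:A11): sum the values, divide by N.
mean = statistics.mean(marks)
print(mean)  # 66.7
```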
Median
The median is another frequently used measure of central tendency: it is the middle value of a data set when the data are arranged in ascending order. When the number of observations is odd, the middle value is the median. When the number of observations is even, there is no single middle value, so the median is computed as the average of the two middle observations.
Solved Example 1
Compute the median of the following sales data of ABC Company:

304 414 520 315 480 600 665

Solution
First arrange the data in ascending order:

304 315 414 480 520 600 665

The number of observations is odd (7), so the median is the middle (4th) value: 480.
Solved Example 2
Compute the median for the following data:

68 76 84 52 40 94 66 54

Solution
First arrange the data in ascending order:

40 52 54 66 68 76 84 94

The number of observations is even (8), so the median is the average of the two middle values: Median = (66 + 68) / 2 = 67.
Using Excel
In Microsoft Excel, the MEDIAN function can be used to calculate the median. In particular, we calculate the median by entering the formula:
= MEDIAN (A2:A9)
Mode
The mode is defined as the value that occurs most often, i.e. with the highest frequency, in a data set. For example, consider the sample of marks of 5 students in a class given below:

45 62 58 62 76

Here the mode is 62, since it occurs twice while every other value occurs once.
Using Excel
In Microsoft Excel, the MODE function can be used to calculate the mode. In particular, we calculate the mode by entering the formula:
= MODE (A2:A6)
Quartile
Quartiles divide a series or a set of observations into four equal parts. The median being the second quartile, there are only two other quartiles to compute. The lower quartile (Q1) divides a series such that one-fourth of the total frequency lies below Q1 and three-fourths lie above it. The upper quartile (Q3) divides a series such that three-fourths of the total frequency lie below Q3 and one-fourth lies above it.
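As a sketch of the definition, Python's statistics.quantiles can compute the quartiles directly; passing method="inclusive" uses the same interpolation as Excel's QUARTILE function. The data here are illustrative (not from the text).

```python
import statistics

# Illustrative data, already sorted in ascending order.
data = [40, 52, 54, 66, 68, 76, 84, 90]

# n=4 cuts the data into quartiles; method="inclusive" interpolates
# the same way as Excel's QUARTILE function.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
print(q1, q2, q3)  # 53.5 67.0 78.0
```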
Using Excel
I collected stock price data of Apple Inc. from 1st February 2017 to 3rd March 2017 to illustrate how to compute the 1st and 3rd quartiles in Excel. In Microsoft Excel, the QUARTILE function can be used to calculate the 1st, 2nd and 3rd quartiles. In particular, we calculate the lower quartile (Q1) by entering the formula:
= QUARTILE (B2:B23, 1)
Thus, the 1st quartile is 132.06.
Percentile
Every year in India, more than 200,000 MBA aspirants appear for the Common Admission Test (CAT). You may often have heard students say "I got the 95th percentile in CAT" or "my CAT score is the 99th percentile". What does this mean? A score at the 95th percentile means that only 5 per cent of candidates scored more marks.
Using Excel
Range
The range is the simplest measure of dispersion, defined as the difference between the highest value and the lowest value in a data set. For example, consider the following figures (Rs. lakhs):

14 21 12 36 25 8

Solution
The range is given by the following:
Range = Highest Value - Lowest Value = 36 - 8 = 28.
Thus, the range is Rs. 28 lakhs.
Inter-quartile Range
The range as a measure of dispersion is based only on the maximum and minimum values in the data set and is therefore sensitive to extreme values. To avoid this problem, one can resort to the inter-quartile range. The inter-quartile range is computed on the middle 50% of the observations, after eliminating the highest and lowest 25% of observations in a data set arranged in ascending order. Unlike the range, the inter-quartile range is not sensitive to extreme values.
Solved Example
The following data show the quarterly operating profit (Rs. Cr) of Reliance Industries from September 2008 to June 2011. Calculate the inter-quartile range.

6474 5363 5437 5921 7217 7844

Solution
First arrange the data in ascending order.
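The calculation can also be sketched in Python, using the six values quoted above (the original series may contain further quarters not reproduced here); statistics.quantiles with method="inclusive" mirrors Excel's QUARTILE interpolation.

```python
import statistics

# Quarterly operating profits (Rs. Cr) from the example, sorted ascending.
profits = sorted([6474, 5363, 5437, 5921, 7217, 7844])

# Q1 and Q3 via Excel-style (inclusive) interpolation, then the IQR.
q1, _, q3 = statistics.quantiles(profits, n=4, method="inclusive")
iqr = q3 - q1
print(q1, q3, iqr)  # 5558.0 7031.25 1473.25
```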
Using Excel
I collected stock price data of Apple Inc. from 1st February 2017 to 3rd March 2017 to illustrate how to compute the 1st and 3rd quartiles in Excel and, from them, the inter-quartile range. First calculate the lower quartile (Q1) by entering the formula:

= QUARTILE (B2:B23, 1)

So the 1st quartile is 132.06.
Solved Example
The following data show the annual gross profit margin (%) of Indian Oil Corporation Ltd (IOCL) from March 2007 to March 2011. Calculate the Mean Absolute Deviation (MAD).
Year:
2007 2008 2009 2010 2011
GPM(%): 5.15 5.13 2.33 6.36
Solution
Variance
The variance is the most widely used measure of variability. It is basically the average of the squared deviations from the arithmetic mean. The formulas for the population and sample variance are as follows:

Population variance: σ² = Σ(X - μ)² / N
Sample variance: s² = Σ(X - X̄)² / (n - 1)

It should be kept in mind that we generally work with a sample. The variances of the population and the sample are practically the same when the number of observations is large.
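The distinction between the two divisors can be sketched with Python's statistics module, whose pvariance and variance functions mirror Excel's VARP (population) and VAR (sample). The marks below are illustrative.

```python
import statistics

# Illustrative marks of eleven students.
marks = [52, 65, 88, 72, 81, 112, 66, 105, 90, 102, 58]

pop_var = statistics.pvariance(marks)   # divides by N,   like Excel's VARP
samp_var = statistics.variance(marks)   # divides by N-1, like Excel's VAR
print(pop_var, samp_var)
```

As the text notes, with a large number of observations the two values are practically the same; here, with only eleven values, the gap is still visible.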
Suppose there are 20 students in a class. Their marks in the business statistics paper are as follows:
Students Marks
Mike
Tony
Ryan
Bob
Joe
Smith
Robin
Kate
Silsa
Tisca
Tom
Jim
David
Adam
Singer
Rocky
Mark
John
Hillan
= VARP (B2:B20)
Suppose we take a representative
sample from the class which is given as
follows:
Students Marks
Joe
Kate
Tom
David
Singer
Rocky
Mark
Hillan
Solved Example
Compute the standard deviation for the following data:

52 65 88 72 81 112 66 105 90 102 58

Solution:

X      X - X̄    (X - X̄)²
52     -29      841
65     -16      256
88     7        49
72     -9       81
81     0        0
112    31       961
66     -15      225
105    24       576
90     9        81
102    21       441
58     -23      529
ΣX = 891        Σ(X - X̄)² = 4040

X̄ = 891 / 11 = 81 and s = √(Σ(X - X̄)² / (n - 1)) = √(4040 / 10) = √404.
Thus, the standard deviation is approximately 20.1.
Using Excel
In Microsoft Excel, to obtain standard
deviation of sample, enter the formula:
= STDEV (A1:A11)
Coefficient of Variation
The coefficient of variation measures dispersion in relation to the mean. It is a relative measure of dispersion, used to compare the relative variation in one data set with that in another. For example, suppose you want to compare the relative variation of marks for two classes of students, Class 1 and Class 2; this relative measure of dispersion, the coefficient of variation, serves the purpose. The coefficient of variation is given by the following expression:

C.V. = (S / X̄) × 100

where
S = standard deviation
X̄ = arithmetic mean
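The ratio itself is a one-liner; as a sketch, the function below applies it to the TCS mean and standard deviation reported in the solved example that follows.

```python
def coefficient_of_variation(mean, sd):
    # CV expresses the standard deviation as a percentage of the mean,
    # making dispersion comparable across data sets in different units.
    return sd / mean * 100

# Mean and standard deviation of the TCS prices from the solved example.
cv_tcs = coefficient_of_variation(1134.473, 50.59333)
print(round(cv_tcs, 2))  # 4.46
```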
Solved Example
The following table gives the closing prices of Infosys Technologies Ltd and Tata Consultancy Services (TCS) from 29.06.2011 to 9.08.2011, in descending date order (9th August to 29th June).

TCS       Infosys
964.8     2374.55
1005.35   2470.5
1057.95   2591.2
1095.85   2709.15
1110.35   2732.6
1130.65   2750.95
1135.25   2815.1
1137      2775.9
1129.55   2751.1
1147.15   2796.65
1144.65   2801.8
1139.9    2807.75
1133.4    2828.25
1122.55   2768.1
1132.45   2752.45
1140.05   2750.5
1125.05   2713.4
1146.05   2731.35
1123.7    2740.35
1149.05   2777.3
1145.35   2791.55
1156.65   2921.15
1171.65   2976.55
1195.9    2995.7
1182.8    2953.7
1179.45   2956.45
1185.7    2938.95
1191.9    2934.15
1184.2    2910.45
1169.85   2881.75

For TCS: Mean = 1134.473, S = 50.59333, C.V. = 4.459351.
Exercises
1. The following table shows closing
stock prices of Google and
Microsoft from 15th September,
2015 to 16th October, 2015.
Date Google Stock Price
15-09-2015 635.14
16-09-2015 635.98
17-09-2015 642.90
18-09-2015 629.25
21-09-2015 635.44
22-09-2015 622.69
23-09-2015 622.36
24-09-2015 625.80
25-09-2015 611.97
28-09-2015 594.89
29-09-2015 594.97
30-09-2015 608.42
01-10-2015 611.29
02-10-2015 626.91
05-10-2015 641.47
06-10-2015 645.44
07-10-2015 642.36
08-10-2015 639.16
09-10-2015 643.61
12-10-2015 646.67
13-10-2015 652.30
14-10-2015 651.16
15-10-2015 661.74
16-10-2015 662.20
Symmetrical Distribution
The shape of a distribution plays a very important role in statistical analysis. In fact, most statistical analysis is based on the assumption of a normal, or symmetrical, distribution; rarely is a binomial, Poisson or other type of distribution used. Since normality is a requirement in most statistical analysis, before beginning an analysis we have to check whether the data are normally distributed or not.
Asymmetrical Distribution
An asymmetrical distribution is one that is not normal. Such a non-symmetrical, or non-normal, distribution is called a skewed distribution. In Figure 1, panel (b) shows the shape of a normal distribution, while panels (a) and (c) show skewed distributions.
Figure 1
Measure of Skewness
Skewness is defined as the lack of symmetry in a frequency distribution. There are two types of measures of skewness:
Absolute measure of skewness
Relative measure of skewness
Absolute Measure of Skewness
It is measured by taking the difference between the mean and the mode:

Absolute Skewness = Mean - Mode

If the mean is greater than the mode, the skewness will be positive; if the mode is greater than the mean, the skewness will be negative. Why is skewness defined as the difference between the mean and the mode? In a symmetrical or normal distribution, the mean, median and mode are all equal. In a skewed distribution, however, the mean moves away from the mode, and that movement is precisely the skewness. Thus, the distance between the mean and the mode can be used to measure skewness: the greater the distance, whether positive or negative, the higher the skewness.
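A small sketch of this measure, estimating the mode by the empirical relation Mode = 3 × Median - 2 × Mean (the same relation used in the illustration that follows); the SBI figures are the mean and median from that illustration.

```python
def absolute_skewness(mean, median):
    # Estimate the mode from the empirical relation, then take
    # Absolute Skewness = Mean - Mode.
    mode = 3 * median - 2 * mean
    return mean - mode

# SBI figures from the illustration below: mean 212.21, median 213.9.
skew_sbi = absolute_skewness(212.21, 213.9)
print(round(skew_sbi, 2))  # -5.07, i.e. negatively skewed
```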
Illustration
The following table shows stock prices
of State bank of India (SBI) and ICICI
Bank from 01.06.2016 to 08.07.2016.
Compute absolute skewness for the
above data.
[Table: daily closing prices of SBI and ICICI Bank from 01-06-2016 to 08-07-2016]
Solution
We computed the mean, median and mode for the SBI and ICICI Bank stock prices, which are given below:

Company      Mean     Median   Mode*
SBI          212.21   213.9    217.28
ICICI Bank   242.65   241.25   238.45

*Note: The mode is computed by the formula Mode = 3 × Median - 2 × Mean.
Using Excel
In Microsoft Excel, to obtain the skewness, enter the formula:

= SKEW (B2:B28)

It is important to note that this Excel formula gives the relative measure of skewness.
Concept of Kurtosis
Kurtosis refers to the degree of flatness or peakedness of a frequency distribution. It is always measured in relation to the peakedness of the normal curve, and it tells us the extent to which a distribution is more peaked or flatter than the normal curve. There are three possibilities:

1. The frequency distribution exactly coincides with the normal curve; such a curve is called mesokurtic.
2. The frequency distribution is more peaked than the normal curve; such a curve is called leptokurtic.
3. The frequency distribution is flatter than the normal curve; such a curve is called platykurtic.

Figure 2 shows all three possibilities.
Figure 2
Concept of Moments
The deviation of any item in a distribution from its mean is given by the expression X - X̄. Let us denote this deviation by x. The arithmetic means of the various powers of these deviations are called the moments of the distribution. For example, if we take the mean of the first power of the deviations of items from the mean, we get the first moment about the mean, denoted by μ1. Symbolically,

μ1 = Σx / N = Σ(X - X̄) / N
Importance of Moments
The concept of moments is very important in statistical work. Moments can help to measure the central tendency of a set of items, their dispersion, their asymmetry and their peakedness. The computation of the first four moments about the mean helps to identify the various characteristics of a frequency distribution and is, in fact, the first step in the analysis of a frequency distribution. The following table summarizes how moments help in analysing a distribution.

Moment                                        What it measures
First moment about the origin                 Mean
Second moment about the arithmetic mean       Variance
Third moment about the arithmetic mean        Skewness
Fourth moment about the arithmetic mean       Kurtosis
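The first four moments about the mean can be sketched in a few lines of Python; the data here are the seven values used in the kurtosis example below.

```python
import statistics

def central_moments(data, k_max=4):
    # k-th moment about the mean: sum((x - mean)**k) / n, for k = 1..k_max.
    m = statistics.mean(data)
    n = len(data)
    return [sum((x - m) ** k for x in data) / n for k in range(1, k_max + 1)]

mu1, mu2, mu3, mu4 = central_moments([57, 60, 62, 65, 68, 72, 78])
# mu1 is always zero; mu2 is the population variance; mu3 and mu4
# feed the usual skewness and kurtosis measures.
print(mu1, mu2)
```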
Solved Example
Find the kurtosis for the following data:
57 60 62 65 68 72 78
Using Excel
In Microsoft Excel, to obtain the kurtosis, enter the formula:

= KURT (B2:B28)
Chebyshev Theorem
For a symmetrical or normal distribution, about 68% of the items fall within +1 and -1 standard deviation of the arithmetic mean, about 95% of the observations fall within +2 and -2 standard deviations, and about 99.7% fall within +3 and -3 standard deviations. Chebyshev's Theorem extends this idea to any distribution, irrespective of its shape: given a group of N numbers, at least the proportion 1 - (1/K)² of the N observations will lie within K standard deviations of the mean.
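The theorem's bound is a one-line formula; the sketch below evaluates it for two and three standard deviations.

```python
def chebyshev_bound(k):
    # Minimum proportion of observations within k standard deviations
    # of the mean, for ANY distribution (k > 1).
    return 1 - 1 / k ** 2

print(chebyshev_bound(2))  # 0.75 -> at least 75% within 2 std devs
print(chebyshev_bound(3))  # about 0.889 -> at least ~89% within 3 std devs
```

Note that for a normal distribution the actual proportions (95% and 99.7%) are much higher than these worst-case bounds.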
Chebyshev Proportion
Solved Example
I collected the daily stock prices of Apple Inc. from 1st February 2016 to 3rd March 2017 to illustrate the empirical rule. Sixty-eight per cent of the observed stock prices of Apple Inc. fall between 97.37 and 119.65; the other ranges are obtained in a likewise manner, as shown below:
z- Score
The standard normal distribution is a special normal probability distribution with a mean of zero and a standard deviation of one. A normal variable can be transformed into a standard normal variable by the following formula:

Z = (X - μ) / σ

where X is an observation from the original normal distribution, μ is its mean and σ is its standard deviation. The standard normal distribution is also called the Z distribution, and the transformed value is called a Z score. A Z score tells us the number of standard deviations a particular observation lies above or below the mean. Because a Z score is a unit-free number, it makes probability calculations possible: probabilities cannot be computed directly from the original values, since those are expressed in different units, so it is necessary to convert them into Z scores first.
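The transformation itself is a one-liner; the sketch below uses the mean (78) and standard deviation (15) from the solved example that follows.

```python
def z_score(x, mean, sd):
    # Number of standard deviations x lies above (+) or below (-) the mean.
    return (x - mean) / sd

# With mean 78 and standard deviation 15 (the marks example below),
# a mark of 90 lies 0.8 standard deviations above the mean.
z = z_score(90, 78, 15)
print(z)  # 0.8
```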
Solved Example
The mean mark of students in business
statistics paper is 78 with a standard
deviation of 15. The random variable
marks of students follow a normal
distribution.
a) What is probability of obtaining
marks less than 50?
b) What is the probability of
obtaining marks more than 90?
c) What is the probability that the
marks lie between 80 and 90?
Solution
a) Using the standard normal distribution, we first convert the mark to a z score:

z = (50 - 78) / 15 = -1.86 (approximately)

For a z value between 0 and -1.86, the area under the curve is 0.4686. Since the total area below the mean is 0.5000, the area below -1.86 must be 0.5000 - 0.4686 = 0.0314. Thus, the probability that the marks will be less than 50 is 3.14 per cent.
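All three probabilities of the solved example can also be obtained without normal tables; a sketch using Python's statistics.NormalDist (small differences from the table-based answers arise from rounding z to two decimals):

```python
from statistics import NormalDist

# Marks follow a normal distribution with mean 78 and sd 15.
marks = NormalDist(mu=78, sigma=15)

p_below_50 = marks.cdf(50)                  # a) P(X < 50)
p_above_90 = 1 - marks.cdf(90)              # b) P(X > 90)
p_80_to_90 = marks.cdf(90) - marks.cdf(80)  # c) P(80 < X < 90)
print(round(p_below_50, 3))  # 0.031
```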
Computer Application
To demonstrate using Microsoft excel
for calculating probability of a normally
distributed variable, the following
example will be used.
Solution
Step 1: Open a Microsoft Excel sheet.
Step 2: Click Insert Function; the following dialog box appears.
Step 3: Select Statistical from the category list. When you select Statistical, the following will appear:
Step 4: Next select NORMDIST from the function list. The following dialog box will come up.
Step 5: When you click OK, the following dialog box will appear.
Step 6: Enter the value of X, the mean, the standard deviation, and 1 in the Cumulative cell. It is important to note that Microsoft Excel always provides the cumulative probability. When the value of Z is negative, we get the answer directly; when Z is positive, we get the answer by subtracting the probability returned by Excel from 1, i.e. 1 - probability value.
Thus, the probability that a CFL bulb lasts for less than 6 months is 0.0227 or 2.27 per cent.
Illustration
An MBA student obtained the following marks in the first semester, where the weightages of the mid-term, end-term and internal assessment are 30%, 50% and 20% respectively. Find the weighted arithmetic mean.
Exams      Marks (X)   Weight (w)   wX
Mid-term   76          30           2280
End-term   60          50           3000
Internal   70          20           1400
                       Σw = 100     ΣwX = 6680

Weighted mean = ΣwX / Σw = 6680 / 100 = 66.8
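As a check, the weighted mean of the table above can be sketched in Python:

```python
# Weighted arithmetic mean of the semester marks in the table above.
marks = [76, 60, 70]       # mid-term, end-term, internal assessment
weights = [30, 50, 20]     # their percentage weights

# sum(w * x) / sum(w)
weighted_mean = sum(w * x for w, x in zip(weights, marks)) / sum(weights)
print(weighted_mean)  # 66.8
```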
For grouped data, the mean is computed as:

Mean = ΣfM / Σf

where
M = mid-point of the class interval
f = frequency of each class
Illustration
The following table shows the dividends declared by different companies during 2015. Compute the average dividend.

Dividend (%)   Mid-Point (M)   No. of Companies (f)   fM
0-10           5               12                     60
10-20          15              15                     225
20-30          25              20                     500
30-40          35              25                     875
40-50          45              30                     1350
50-60          55              45                     2475
60-70          65              60                     3900
70-80          75              36                     2700
80-90          85              42                     3570
90-100         95              50                     4750
               Σf = 335                               ΣfM = 20405

Average dividend = ΣfM / Σf = 20405 / 335 ≈ 60.91%
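The grouped-data mean for the dividend table above can be sketched in Python, with M the class mid-points and f the number of companies:

```python
# Grouped-data mean: sum(f * M) / sum(f).
midpoints = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
frequencies = [12, 15, 20, 25, 30, 45, 60, 36, 42, 50]

total_fm = sum(f * m for f, m in zip(frequencies, midpoints))
mean_dividend = total_fm / sum(frequencies)
print(total_fm, sum(frequencies), round(mean_dividend, 2))  # 20405 335 60.91
```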
Illustration
The following data show the monthly stock prices of State Bank of India from January 2015 to April 2016. Compute the standard deviation of the prices.
Date Close
Jan-15 310
Feb-15 301.6
Mar-15 267
Apr-15 270.05
May-15 278.15
Jun-15 262.8
Jul-15 270.4
Aug-15 247.1
Sep-15 237.25
Oct-15 237.2
Nov-15 250.45
Dec-15 224.4
Jan-16 179.95
Feb-16 158.4
Mar-16 194.3
Apr-16 188.95
Solution
The data on SBI show that the highest and lowest prices during this period were 310 and 158.4. We grouped the data using a class width of 20. The following is the grouped price data for SBI:
Class Interval   Frequency (f)   Mid-point (m)   fm    d = m - X̄   fd
150-170          1               160             160   -82.5       -82.5
170-190          2               180             360   -62.5       -125.0
190-210          1               200             200   -42.5       -42.5
210-230          1               220             220   -22.5       -22.5
230-250          3               240             720   -2.5        -7.5
250-270          3               260             780   17.5        52.5
270-290          3               280             840   37.5        112.5
290-310          2               300             600   57.5        115.0
                 Σf = 16                         Σfm = 3880

Here X̄ = Σfm / Σf = 3880 / 16 = 242.5.
Computer Application
To compute descriptive statistics in Excel, click Tools on the Excel menu bar and select Data Analysis from the drop-down menu. When you click Data Analysis, a dialog box will appear; choose Descriptive Statistics from it. Select the data into the Input Range; if you want labels, tick Labels. Select Summary Statistics from the dialog box, specify the location where you want the results, and click OK.
Marks
Mean
Standard Error
Median
Mode
Standard
Deviation
Sample Variance
Kurtosis
Skewness
Range
Minimum
Maximum
Sum
Count
Exercises
Date TCS
07-09-2016 2440.55
08-09-2016 2322.1
09-09-2016 2352.45
12-09-2016 2359.05
13-09-2016 2359.05
14-09-2016 2328.45
15-09-2016 2326.95
16-09-2016 2361.7
19-09-2016 2411.45
20-09-2016 2406.5
21-09-2016 2413.4
22-09-2016 2378
23-09-2016 2398.1
26-09-2016 2401.1
27-09-2016 2436.1
28-09-2016 2423.05
29-09-2016 2437.8
30-09-2016 2430.8
03-10-2016 2411.7
04-10-2016 2405.15
05-10-2016 2386.35
06-10-2016 2388.75
07-10-2016 2367.8
What is Prediction?
Prediction is a method of estimating future values. Predictive analysts believe in the law of repetition; they extrapolate from the past. Prediction methods are classified into qualitative, quantitative, and mixed methods.
Problem Identification
It is the first step in the predictive
analytics process. This involves
identifying the exact variable of interest
that is to be forecasted. The problem
identification stage requires a deep
understanding of the system or the
company. This first stage also raises
some critical questions such as:
1. How will the predictions or forecasts be used in the company?
2. Who requires the forecasts?
3. For what purpose are the forecasts required?
4. How does the whole predictive analytics exercise fit within the company?
Collection of information
After the problem identification, both
quantitative and qualitative information
are collected in order to carry out further
predictive analytics. In particular, it
gives rise to two types of information: a)
Statistical data, and b) personal
judgement and opinion of experts. In the
process of forecasting both kinds of data
must be obtained for arriving at the
reliable prediction.
Preliminary Analysis
It is the third stage. In this stage we try to
understand the data in hand. We can start
by constructing a line chart or bar chart using Excel to understand the major
trends in data. The line chart can suggest
whether sales or any other variable is
linear or non-linear over time. One can
also compute descriptive statistics such
as mean, standard deviation, skewness
and kurtosis to know the distributional
aspects of data. The idea behind doing
this preliminary analysis is to get a feel
for the data. This stage helps in
suggesting some insightful models that
might be useful in the prediction
process.
Choosing and Fitting Models
The next step in the prediction process is
to choose and fit the correct model for
forecasting. As pointed out earlier, the
preliminary analysis is very useful and
can suggest appropriate forecasting
model for the underlying data generating
process. One can pick up one or two
leading models for subsequent analysis.
Depending upon the forecasting
horizons, the predictive analytic
techniques can be selected.
Exercises
1. What is prediction? Discuss its
significance in business decision
making process.
2. Explain the process of predictive
analytics.
Chapter 8
Time Series and its
Components
Introduction
A time series is a sequence of observations ordered in time. For example, data on sales turnover, net
profit, total expenses, stock prices,
exchange rates, etc., are collected at
specific points in time, say weekly,
monthly, quarterly or yearly basis. These
data are ordered by time and are called
time series. Time series analysis helps
to understand performance of a business
entity; its evolution over time and its
likely performance in the future.
According to Prof. Werner Z. Hirsch, "The main objective in analyzing time series is to understand, interpret and evaluate changes in economic phenomena in the hope of more correctly anticipating the course of events." In other words, analysis of time series data not only helps in studying the past behavior of an economic or business phenomenon but also aids in
forecasting of economic variables such
as sales, cost scenario, etc. Based on
these predictions, businessmen,
administrators and planners can
formulate their policy and future plan.
This importance of time series analysis was highlighted long ago by Edward E. Lewis: "For the economist in his effort to learn more and more about how the economic system works, the study of time series is perhaps the most important source of information."
Multiplicative Model
This model has the form Y = T × S × C × I. If you want to find the short-term variations, divide the original series by the calculated trend values:
Y/T = (T × S × C × I)/T = S × C × I
Similarly, if the aim is to find the irregular component from the original time series, divide out the trend, seasonal and cyclical components:
Y/(T × S × C) = I
Exercises
1. What is time series data? Discuss the
various components of a time series.
2. Distinguish between additive and
multiplicative models.
Chapter 9
Trend Analytics
Introduction
Trend analytics means explaining any
variable in terms of time. Each
observation of a phenomenon in a time
series is the compound effect of the four
components namely, trend, seasonal
variation, cyclical variation and random
component. Trend is one of the dominant
components of time series data. The
procedure of isolating the trend values
from the time series involves the
measurement of trend.
Least Square Method of Estimating Trend
It is a mathematical method where a
trend line is fitted to the data in such a
way that:
a) the sum of deviations of the actual values of Y and the estimated values Ŷ is zero, i.e., Σ(Y − Ŷ) = 0
b) the sum of squares of the deviations of actual and estimated values is least or minimum from the estimated line, hence the name least square method, i.e., Σ(Y − Ŷ)² is minimum
Solution
Year      Production (Y)   Time (X)   X²    XY
2009-10   14.66            1           1     14.66
2010-11   18.24            2           4     36.48
2011-12   17.09            3           9     51.27
2012-13   18.34            4          16     73.36
2013-14   19.25            5          25     96.25
2014-15   17.20            6          36    103.20
          ΣY = 104.78      ΣX = 21    ΣX² = 91   ΣXY = 375.22
104.78 = 6a + 21b   (i)
375.22 = 21a + 91b   (ii)
Multiply equation (i) by 7 and equation (ii) by 2:
42a + 147b = 733.46
42a + 182b = 750.44
Subtracting, −35b = −16.98, so b = 0.4851.
To find a, substitute the value of b in equation (i):
104.78 = 6a + 21(0.4851)
104.78 = 6a + 10.1871
104.78 − 10.1871 = 6a
94.59 = 6a
or a = 15.76
Thus, the estimated linear trend line is
Y = 15.76 + 0.4851X
What is the use of the above estimated
trend line? The estimated trend line can
be used to forecast the production of
pulses in 2015-16. In this case the value
of the time variable for 2015-16 will be 7. Substituting X = 7 in the estimated linear trend, we get
Y = 15.76 + 0.4851(7) = 19.16
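The normal-equation arithmetic can be cross-checked with a short least-squares sketch over the six production figures from the table (variable names are mine):

```python
# Annual pulse production (the six yearly values from the table)
y = [14.66, 18.24, 17.09, 18.34, 19.25, 17.20]
x = list(range(1, len(y) + 1))   # time codes 1..6

n = len(y)
sx, sy = sum(x), sum(y)
sxx = sum(v * v for v in x)
sxy = sum(a * b for a, b in zip(x, y))

# Least squares line Y = a + bX from the normal equations
b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
a = (sy - b * sx) / n
forecast = a + b * 7             # forecast for 2015-16 (X = 7)
print(round(a, 2), round(b, 4), round(forecast, 2))
```

The slope matches the hand-computed 0.4851; the intercept is 15.77 (the text's 15.76 comes from rounding 94.59/6 downward).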
Using Excel
Step 1: Enter the data as shown below:
Y = 74.45 + 0.9066 T
Non-Linear Trend
If a variable tends to increase or
decrease by constant amount over time
then the linear trend measurement is
appropriate. However, if the increase or
decrease in a variable over time
expands by uneven increment then
parabolic curve of second or third
power may be more appropriate. The
parabolic curves of various degrees are:
Exponential Trend
If the time series is increasing or
decreasing by a constant percentage
rather than a constant amount, the
exponential trend model is considered
appropriate. Many economic and business data show such a tendency. The equation for the exponential function is:
Y = ab^X
Taking logarithms makes the model linear, log Y = log a + X log b, which can be fitted by least squares:
log a = 65.82/15 = 4.39
Similarly,
log b = 5.98/280 = 0.02
Thus, our estimated exponential trend model is:
log Y = 4.39 + 0.02t
The above estimated model can be used
to forecast next year natural gas
production as given below:
2005-06= Antilog [4.39+0.02(8)]=
Antilog [4.55]= 35481.34 million cubic
meters
Using Excel
Quadratic Trends
The linear function to model trend is
very common. However, sometimes time
series shows nonlinear trend like
quadratic trend or cubic trend. In such a relationship the value of the dependent variable Y is obtained from the time variable T, the independent variable, in the form
Y = a·b^T·c^(T²)
The above model cannot be estimated in its original form. However, after logarithmic transformation it becomes linear and can be estimated by the OLS method.
Exercises
1. The following table shows index of
electricity generation in India
(Million Kwh) from 2004-05 to
2015-16. Fit straight line trend by
least square method.
Year Index of Electricity
Generation (Millions Kwh)
2004-05 100
2005-06 105.2
2006-07 112.8
2007-08 120.3
2008-09 123.3
2009-2010 130.8
2010-11 138
2011-12 149.5
2012-13 155.2
2013-14 164.7
2014-15 178.6
2015-16 188.7
Yt = Yt-1
where Yt is the forecasted value for the
next quarter or at time t, and Yt-1 is the
actual value of sales in the previous
quarter or at time t-1. There are other versions of the naïve model as well:
Year   Sales ($ billion)
2013   7.87
2014 12.47
2015 17.93
2016 27.64
Solution
1. The naïve model says the forecasted
sales of the Facebook for the next
year will be same as the previous
year.
Yt = Yt-1
Thus the sales for the year 2017 will
be 27.64 billion dollars.
Yt = Yt-1 + ΔY
= 27.64 + 9.71 = 37.35
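Both naïve variants can be sketched in a couple of lines, using the Facebook sales series from the table:

```python
# Facebook annual sales (billion dollars), 2013-2016
sales = [7.87, 12.47, 17.93, 27.64]

# Simple naive model: next value = last observed value
naive = sales[-1]

# Naive model with drift: next value = last value + last observed change
drift = sales[-1] + (sales[-1] - sales[-2])
print(naive, round(drift, 2))  # 27.64 and 37.35
```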
Simple Average
If a time series is stationary and we just
want to predict a single future value of
this series, then using an average value
of the series is almost as good as any
other method. The most elementary
forecasting method is simple average
model. With this model, the forecast for time period t is the average of the values for a given number of previous time periods.
The advantage of this simple method is
that it can be extended further in the
future. If we need to forecast for the next
five observations, we just extend the
mean line. By definition, if a series is
stationary it fluctuates around its mean.
Therefore, the mean is its best predictor.
This method does not produce very
accurate forecasts, but the results will be
precise enough. To add more
sophistication to our forecasting and to
try to emulate the movements of the
original series, we need to extend the
principle of a simple average to a
moving average principle.
Example
The following table shows call money
rate which is a proxy for interest rate in
India. In general, the interest rate is a
non-trending variable and for such
variable the mean is its best predictor.
Year      Call Money Rate (%)
2002-03   5.89
2003-04   4.62
2004-05   4.65
2005-06   5.60
2006-07   7.22
2007-08   6.07
2008-09   7.26
2009-10   3.29
2010-11   5.89
2011-12   8.22
2012-13   8.09
2013-14   8.28
2014-15   7.97
2015-16   6.98
2016-17   6.40
Thus the predicted call money rate for the next year is about 6.43 per cent.
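For a stationary, non-trending series the mean is the forecast, so the prediction is just the average of the fifteen observed rates:

```python
# Call money rate (%), 2002-03 to 2016-17
rates = [5.89, 4.62, 4.65, 5.60, 7.22, 6.07, 7.26, 3.29,
         5.89, 8.22, 8.09, 8.28, 7.97, 6.98, 6.40]

# Simple average model: the forecast is the series mean
forecast = sum(rates) / len(rates)
print(round(forecast, 2))  # about 6.43
```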
Moving Average Method
Moving average is another method of
determining trend. It consists of a series
of arithmetic means computed from
overlapping groups of successive items
of a time series. Each moving average is
computed using values covering a fixed
time interval called period of moving
average. Successive moving averages are computed by dropping the first observation of the previously averaged group and adding the next observation. The
objective of finding these averages is to
remove the periodic type of variations.
The averaging process smoothens out
fluctuations and ups and downs in the
data. To remove variations appropriate
period of moving average is used.
Usually 3, 4, 5, or 7 period moving averages are used.
Illustration
Find the trend of the yearly stock prices
of TCS using 3 period moving average
method.
Year      Stock Price
2004-05   333.88
2005-06   425.62
2006-07   609.3
2007-08   541.68
2008-09   239.05
2009-10   749.75
2010-11   1165.05
2011-12   1161.25
2012-13   1258.55
2013-14   2170.95
2014-15   2554.7
2015-16   2439.2
Solution
Year      Stock Price   3 MA
2004-05   333.88        #N/A
2005-06   425.62        #N/A
2006-07   609.3         456.2667
2007-08   541.68        525.5333
2008-09   239.05        463.3433
2009-10   749.75        510.16
2010-11   1165.05       717.95
2011-12   1161.25       1025.35
2012-13   1258.55       1194.95
2013-14   2170.95       1530.25
2014-15   2554.7        1994.733
2015-16   2439.2        2388.283
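The 3 MA column can be reproduced with a small helper mirroring a trailing moving average, with the first two entries undefined just as Excel reports #N/A:

```python
# TCS yearly stock prices from the table above
prices = [333.88, 425.62, 609.3, 541.68, 239.05, 749.75,
          1165.05, 1161.25, 1258.55, 2170.95, 2554.7, 2439.2]

def moving_average(series, period):
    # Trailing moving average; undefined for the first period-1 points
    out = [None] * (period - 1)
    for i in range(period - 1, len(series)):
        window = series[i - period + 1 : i + 1]
        out.append(sum(window) / period)
    return out

ma3 = moving_average(prices, 3)
print(round(ma3[2], 4), round(ma3[-1], 3))  # 456.2667 and 2388.283
```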
Using Excel
Step 1: Enter the data as shown below:
Step 2: Click Data and select Data
Analysis from the menu bar. When you
click data analysis the following dialog
box will appear.
Step 3: Select Moving Average from
the dialog box as shown below:
Step 4: When you click OK, another dialog box will appear as shown below:
Step 5: Enter data from B1:B13 in Input Range. Click Labels and put 3 in Interval for a 3-period moving average. Specify the output range as shown below:
Step 6: After this, when you click OK the results will appear as shown below:
Illustration
The table below shows the shipments (in millions of dollars) over a
12-month period. Use these data to
compute 4-month moving average for all
available months.
Month Shipments
January 1,056
February 1,345
March 1,381
April 1,191
May 1,259
June 1,361
July 1,110
August 1,334
September 1,416
October 1,282
November 1,341
December 1,382
Solution
The first 4-month moving average is (1,056 + 1,345 + 1,381 + 1,191)/4 = 1,243.25.
Illustration
Compute a 4-month weighted moving average for the data given in table 4.1, using a weight of 4 for the last month's value, 2 for the previous month's value, and 1 for each of the values from the two months prior to that.
Solution
The first weighted average is (4 × 1,191 + 2 × 1,381 + 1 × 1,345 + 1 × 1,056)/8 = 1,240.88.
Ft+1 = αXt + (1 − α)Ft
where
Ft+1 = the forecast for the next time period (t+1)
Ft = the forecast for the present time period (t)
Xt = the actual value for the present time period
α = a value between 0 and 1 referred to as the exponential smoothing constant.
Illustration
The table gives monthly price data on jet
kerosene. Use exponential smoothing to forecast the values for the ensuing time period. Work the problem using α = 0.2, 0.5 and 0.8.
Monthly Price Data on Jet kerosene
Month Price of Jet
kerosene
January 66.1
February 66.1
March 66.4
April 64.3
May 63.2
June 61.6
July 59.3
August 58.1
September 58.9
October 60.9
November 60.7
December 59.4
Solution
The following table provides the
forecasts with each of the three values of
alpha. Note that because no forecast is
given for the first time period, we cannot
compute a forecast based on exponential
smoothing for the second period.
Instead, we can use the actual value for
the first period as the forecast for the
second period to get started.
Using Excel
Month   Mobile Sales   α = 0.2
January 150 #N/A
February 180 150
March 200 156
April 175 164.8
May 160 166.84
June 148 165.472
July 165 161.9776
August 190 162.5821
September 230 168.0657
October 210 180.4525
November 200 186.362
December 245 189.0896
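The α = 0.2 column above can be regenerated with a few lines of Python, a sketch of the smoothing recursion that seeds the first forecast with the first actual value, as the text describes:

```python
def exp_smooth(series, alpha):
    # F(t+1) = alpha * X(t) + (1 - alpha) * F(t)
    # No forecast exists for the first period; seed the second with X(1)
    forecasts = [None, series[0]]
    for x in series[1:-1]:
        forecasts.append(alpha * x + (1 - alpha) * forecasts[-1])
    return forecasts

sales = [150, 180, 200, 175, 160, 148, 165, 190, 230, 210, 200, 245]
f = exp_smooth(sales, alpha=0.2)
print([None if v is None else round(v, 4) for v in f])
```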
Computer Application
Step 1: Enter the data as shown below:
Step 2: Click Data and select Data
Analysis from the menu bar. When you
click data analysis the following dialog
box will appear.
Step 3: Select Exponential Smoothing
from the dialog box as shown below:
Step 4: When you click OK, another dialog box will appear as shown below:
Step 5: Enter data from B1:B13 in Input Range. Click Labels and enter 0.8 in Damping factor if α = 0.2 (the damping factor equals 1 − α). Specify the output range as shown below:
Step 6: After this, when you click OK the results will appear as shown below:
Exercises
INTRODUCTION
Regression analysis is a statistical tool
for analyzing the nature of relationship
between two variables. It is an important
tool frequently used by economists to
understand the relationship among two
or more variables. When our interest
lies in explaining one variable in terms
of another variable i.e. y in terms of x,
we use regression. Regression model
studies the relationship between two
variables when one is the dependent
variable and the other is an independent
variable. For example, an agriculture
economist might be interested in
explaining crop yield (y) with the help of
amount of fertilizers (x) used; change in
inflation (y) in terms of change in money
supply (x).
E(Y|X) = β1 + β2X
(11.1)
E(Y|Xi) = β1 + β2Xi
(11.2)
STOCHASTIC SPECIFICATION OF PRF AND SRF
Y = β1 + β2X
(11.4)
Yi = E(Y|Xi) + ui
(11.5)
Or
Yi = β1 + β2Xi + ui
WPI WTI
180.9 36.75
182.1 40.28
185.2 38.03
186.6 40.78
188.4 44.9
189.5 45.94
188.9 53.28
190.2 48.47
188.8 43.15
188.6 46.84
188.8 48.15
189.5 54.19
191.6 52.98
192.2 49.83
193.2 56.35
194.6 59
195.3 64.99
197.2 65.59
197.8 62.26
198.2 58.32
197.2 59.41
196.3 65.49
196.4 61.63
196.8 62.69
Y = α + βX + e   (11.9)
where
Y = dependent variable
X = independent variable
α = Y intercept
β = slope coefficient
The estimated sample regression line is Ŷ = α̂ + β̂X, where
α̂ = estimated Y intercept
β̂ = estimated slope coefficient
β̂ = Σxy/Σx²   (11.11)
β̂ = 0.50
α̂ = 191.42 − 0.50(52.47)
= 165.07
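The slope and intercept quoted in the text can be re-derived from the WPI/WTI table with a short OLS sketch (variable names are mine):

```python
# WPI (dependent) and WTI crude price (independent) from the table above
wpi = [180.9, 182.1, 185.2, 186.6, 188.4, 189.5, 188.9, 190.2,
       188.8, 188.6, 188.8, 189.5, 191.6, 192.2, 193.2, 194.6,
       195.3, 197.2, 197.8, 198.2, 197.2, 196.3, 196.4, 196.8]
wti = [36.75, 40.28, 38.03, 40.78, 44.9, 45.94, 53.28, 48.47,
       43.15, 46.84, 48.15, 54.19, 52.98, 49.83, 56.35, 59.0,
       64.99, 65.59, 62.26, 58.32, 59.41, 65.49, 61.63, 62.69]

n = len(wpi)
mean_y = sum(wpi) / n
mean_x = sum(wti) / n

# OLS slope = sum(xy)/sum(x^2) in deviation form; intercept from the means
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(wti, wpi))
sxx = sum((x - mean_x) ** 2 for x in wti)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
print(round(mean_y, 2), round(mean_x, 2), round(slope, 2), round(intercept, 2))
```

The means come out to 191.43 and 52.47, and the fitted line is approximately Ŷ = 165.07 + 0.50X, matching the estimates in the text up to rounding.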
R2 =
(11.22)
(11.23)
(11.24)
(11.25)
Or in short we can write equation
(11.25) as
(11.26)
(11.27)
or
(11.28)
We must keep one thing in mind: the width of the confidence interval is directly proportional to the standard error, meaning that if the standard error of the estimate is large then the confidence interval will also be wide, and vice versa.
165.074 ± 2.074 (2.3002)
(11.29)
or
160.3035 ≤ α ≤ 169.844
(11.30)
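The interval is simply estimate ± t × se; a quick check with the figures above:

```python
# 95% confidence interval for the intercept: estimate +/- t * se
estimate = 165.074
t_crit = 2.074       # t value for 22 degrees of freedom at the 5% level
se = 2.3002

half_width = t_crit * se
lower, upper = estimate - half_width, estimate + half_width
print(round(lower, 2), round(upper, 2))  # about 160.30 and 169.84
```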
HYPOTHESIS TESTING
a) Set up the null and alternative hypotheses, for example:
H0: β = 0.20
H1: β ≠ 0.20
b) Choose an appropriate significance level, i.e. α. In general, decisions in social sciences are made at the 10 percent, 5 percent and 1 percent levels of significance. Let's take α = 0.05, i.e. 5%, to test this hypothesis.
c) Choose an appropriate test such as the Z test or t test. As we know, the estimators α̂ and β̂ follow the t-distribution because the variance of the population is not known. So in our case the t-test is the appropriate test.
d) Compute the value of the t-test as
follows:
FORECASTING WITH
REGRESSION MODEL
As discussed earlier, we estimated the
sample regression function based on a
sample data as:
Mean Prediction
Let us consider the value of X1, that is,
crude oil is $75. From this given value
of X1, we can find the expected level of
inflation with the help of estimated
regression line as follows:
Ŷ1 = 165.07 + 0.50(75) = 202.57
(11.33)
Variance (Ŷ1) = 1.0947
and
se (Ŷ1) = 1.04
Individual Prediction
The individual prediction has its own variance. As this variance is rarely known, the standardized prediction follows the t distribution with n−2 degrees of freedom. Once we know the variance of the individual prediction, we can construct the confidence interval around this point estimate as follows:
Or
Pr [202.57 − 2.074(2.16) ≤ Y1|X1 ≤ 202.57 + 2.074(2.16)] = 95%   (11.38)
Or
Pr (198.0902 ≤ Y1|X1=75 ≤ 207.0498) = 95%   (11.39)
REPORTING REGRESSION
RESULTS
se = (2.3002) (0.0432)
t = (71.76) (11.62)
r2 = 0.85 df = 22
Solved Example
Avg. Exp (Y)   Avg. income (X)   y = Y − Ȳ   x = X − X̄
2.0114 4.59457 0.3614 1.584
2.05028 4.08237 0.40028 1.072
1.74951 2.87199 0.09951 -0.138
1.55286 3.37059 -0.09714 0.360
1.64923 3.00678 -0.00077 -0.003
1.49251 2.73353 -0.15749 -0.276
1.34479 3.17856 -0.30521 0.168
1.26918 2.10458 -0.38082 -0.905
1.90591 4.31201 0.25591 1.302
1.18567 1.59761 -0.46433 -1.412
1.6754 3.00374 0.0254 -0.006
1.52948 2.80393 -0.12052 -0.206
1.82871 3.08625 0.17871 0.076
1.28836 1.6521 -0.36164 -1.35
1.5205 2.19846 -0.1295 -0.811
1.64457 2.52558 -0.00543 -0.484
1.6454 2.67056 -0.0046 -0.339
1.34187 2.73211 -0.30813 -0.277
2.12805 4.84775 0.47805 1.837
2.29335 2.96651 0.64335 -0.043
Ȳ = 1.65   X̄ = 3.01
y²         ŷ²
0.13061 0.173563
0.160224 0.080137
0.009902 0.001053
0.009436 0.009513
5.93E-07 7.25E-06
0.024803 0.004698
0.093153 0.002254
0.145024 0.054056
0.06549 0.117614
0.215602 0.132977
0.000645 3.61E-06
0.014525 0.002519
0.031937 0.000548
0.130783 0.122819
0.01677 0.043275
2.95E-05 0.015068
2.12E-05 0.007217
0.094944 0.004749
0.228532 0.232912
0.413899 6.09E-05
SST =1.786332 SSR =1.005042
The coefficient of determination
denoted by R2 is given as follows:
Se = (0.1699) (0.0541)
t = (5.11) (4.81)
R2 = 0.56 df =18
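The reported R² follows directly from the sums of squares in the table:

```python
# Coefficient of determination from the sums of squares above
sst = 1.786332   # total sum of squares
ssr = 1.005042   # regression (explained) sum of squares
r_squared = ssr / sst
print(round(r_squared, 2))  # 0.56
```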
EVALUATING REGRESSION
RESULTS
In the last section, we discussed how to
report the results of regression analysis.
After this, one would like to know
whether the fitted model is appropriate
or not. How to evaluate an estimated
regression model? There are certain
criteria of judging the estimated
regression model which are as follows:
(11.40)
where n = sample size, S =
skewness, and K = kurtosis. A
variable is considered to be normally
distributed when S = 0 and K=3.
Thus, in JB test of normality the null
hypothesis is that S and K are
jointly 0 and 3 respectively. JB
statistic follows chi square
distribution with 2 degrees of
freedom. If the calculated chi-square
value is greater than critical chi-
square value, reject the null
hypothesis for a given level of
significance. But we must remember
that JB statistic is a large sample test.
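A sketch of the JB statistic as defined in (11.40); the second call uses hypothetical skewness and kurtosis values purely for illustration:

```python
def jarque_bera(n, skew, kurt):
    # JB = n/6 * (S^2 + (K - 3)^2 / 4); follows chi-square with 2 df
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

# A perfectly normal sample (S = 0, K = 3) gives JB = 0
print(jarque_bera(100, 0, 3))            # 0.0
# A mildly skewed, fat-tailed sample (hypothetical S = 0.5, K = 4)
print(round(jarque_bera(100, 0.5, 4), 2))  # 8.33
```

Since 8.33 exceeds the 5% chi-square critical value of 5.99 with 2 degrees of freedom, normality would be rejected in that second case.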
EXCEL APPLICATION
Regression analysis can be done by
Excel with the help of Data Analysis.
Select Tools on the menu bar and choose
Data Analysis from the drop down
menu. Next, select Regression from the
Data Analysis dialog box. Enter the
dependent variable into Input Y Range.
Enter the independent variables in Input
X Range. Select Labels, if you have
labels for series. If you want regression
through origin, click Constant is Zero
otherwise leave it blank. Excel also
provides residuals, standardized
residuals, residual plots and fine fits
plots. If you are interested in these items
then select them from regression dialog
box. Click OK.
Y = β0 + β1X1 + β2X2 + ε   (12.1)
where Y is the dependent variable, X1 and X2 are independent variables, and ε is the error term. β0, β1 and β2 are the parameters of the model. In general, a multiple regression model with k independent variables can be written as:
(12.2)
(12.3)
(12.4)
(12.5)
(12.6)
(12.7)
(12.8)
(12.9)
which provides the OLS estimators of the population parameters β0, β1 and β2. After the estimation of β0, β1 and β2, the problem arises as to how to interpret these regression coefficients.
ASSUMPTIONS OF MULTIPLE
REGRESSION MODEL
(12.10)
t = (−8.05) (12.5) (2.26)
r2 = 0.89   df = 16
(12.11)
Standard error ( =
(12.12)
(12.13)
Standard error
(12.14)
(12.15)
Standard error
(12.16)
or
(12.18)
(12.19)
= 10.38
Thus, computed t value for estimated
regression coefficient is 10.38.
Similarly, we can find the t-statistics of the other regression coefficients. After
computing t-statistic, how can we decide
whether corporate profit variable is
statistically significant in explaining the
variation in Y? For this, a simple rule is
if the computed value of t-statistic is 2
or greater than 2 then that particular
variable is statistically significant at 5
percent level of significance. Since the computed t-statistic of 10.38 exceeds 2, the corporate profit variable is statistically significant at the 5% significance level.
We can also do hypothesis testing regarding individual regression coefficients by the t-test. Let's test such a hypothesis. The procedure of testing is as follows:
t = 2.5
e) Compare computed t value with
critical t value and take the decision:
since the calculated t value is less than
critical t0.005,16 value of 2.92, do not
reject the null hypothesis.
Similarly, we can conduct hypothesis
test of other regression coefficients.
(12.20)
(12.21)
= 71.07
(12.22)
t = (1.85) (2.17)
r2 = 0.21 df =17
(12.23)
The above F-statistic can also be written
in terms of R2
SSRnew = 26,11,134
SSRold = 6,30,772
SSEnew =2,93,898.1
= 107.81
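The incremental F statistic compares the gain in the regression sum of squares against the residual variance of the new model. The number of added regressors m = 2 and residual degrees of freedom of 32 are assumptions chosen here to reproduce the quoted value; the sums of squares are taken from the text:

```python
# Incremental (partial) F test for adding regressors
ssr_new = 2_611_134
ssr_old = 630_772
sse_new = 293_898.1
m, df_resid = 2, 32   # assumed: 2 added regressors, 32 residual df

f_stat = ((ssr_new - ssr_old) / m) / (sse_new / df_resid)
print(round(f_stat, 2))  # 107.81
```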
Solved Example
Tata Motors is a major automobile
manufacturer in India. The company is
interested to know how the octane level present in gasoline and the weight of the car impact the mileage. A random sample of 12 cars was taken to study this relationship. The following table gives
data on mileage (Y), weight of the car
(X1) and Octane level (X2). Answer the
following questions.
Y X1(tons) X2
16.5 3.4 88
14.6 4.1 90
21.8 2.5 94
15.4 2.8 86
18.4 5.6 86
20.2 8.2 95
25.4 4 98
16.6 6.5 92
13.5 4.5 84
18.7 4.8 96
14.8 3.2 81
21.6 5.5 92
where Y = mileage
X1 =weight of the car
X2 = Octane level
β0, β1 and β2 are the parameters of the model
ε is the random disturbance term.
Thus, the regression line is given as
follows:
y²       ŷ²
2.6244 0.844242
12.3904 0.000933
13.5424 7.042729
7.3984 3.580972
0.0784 6.597604
4.3264 3.372392
52.9984 20.50639
2.3104 0.324146
21.3444 11.70559
0.3364 10.34756
11.0224 22.89639
12.1104 0.657479
SST =140.48 SSR =87.87
Se = (12.74) (0.43)
(0.1385)
t = (-2.44) (-0.51)
(3.80)
POLYNOMIAL REGRESSION
MODELS
We discussed in the last chapter how to
deal with non-linearity. Here we will
discuss non-linearity in the context of
multiple regression models. When the
dependent variable (Y) is non-linearly
related with independent variable (X),
one can fit polynomial regression model.
In polynomial regression model powers
of the independent variables are used. A
polynomial regression model of Kth
order is given as follows:
Interaction Effects
Multiple regression is a very flexible
tool used for various purposes. One such
use is incorporating interaction effects
in quantitative analysis. One can easily
incorporate interaction between two
variables using multiple regression
framework. For example, let's consider the placement of students. A researcher thinks that the placement of a student largely depends on knowledge, as reflected by CGPA, and on the communication skills of students. Accordingly, he
specified a multiple linear regression
model assuming that the independent
variables are linearly related with
dependent variable given as follows:
Stock Dividend
Return Payout
Company (%) (%)
Hexaware
Technology 17.36 65.23
HCL Tech 83.46 49.49
Mastek 25.38 28.64
MindTree 97.8 17.05
Persistent 68.65 23.06
Polaris 9.92 34.84
Rolta -11.84 17.56
TCS 62.21 30.3
Tech
Mahindra 63.68 11.49
Wipro 46.32 35.64
Zensar Tech -2.01 33.34
TYPES OF VARIABLES
There are generally four types of
variables encountered in empirical
analysis. The type of variable under
consideration plays an important role in
selecting appropriate statistical tool for
analysis. For instance, it is not
advisable to compute arithmetic mean of
nominal scale variable. These various
types of variables and their nature are discussed as follows:
Nominal Scale Variables
Nominal scale variable is very common
in marketing or social science research.
A nominal scale divides data into
categories which are mutually
exclusives and collectively exhaustive.
In other words, a data point is grouped under one and only one category, and all other data will be grouped somewhere else in the scale. The word nominal means "name-like", meaning that the numbers or codes given to objects or events serve for naming or classifying only. These numbers have no true meaning and thus cannot be added, multiplied or divided.
They are simply labels or identification
number. The following are examples of
nominal scales:
Gender (1) Men
(2) Women
Nationality (1) Indian (2)
American (3) Others
Enzo Ferrari ( )
Koenigsegg CCXR ( )
McLaren F1 ( )
Bugatti Veyron ( )
Lamborghini Reventon ( )
ANOVA MODELS
Analysis of variance (ANOVA) model is
that model where the dependent variable
is quantitative in nature and all the
independent variables are categorical in
nature. We will illustrate ANOVA
models with an example. Table 7.1 gives monthly returns of the BSE Sensex for the period May 1990 to December 2007.
(13.1)
0 otherwise
The above equation is similar to
multiple regression model but the only
difference is that instead of quantitative
independent variables, the variables are
categorical in nature which takes value
of 1 if the observation belongs to a
particular group and 0 otherwise. The
results of estimated model are as
follows:
Y=-1.38+2.10D1+4.13D2+4.88D3+5.68D4+1.48D5+0.98
4.88D8+5.69D9+3.48D10+2.09D11
t = (-0.75) (0.80) (1.58) (1.86) (2.17) (0.56)
(0.37) (1.35) (1.84) (2.14) (1.31) (0.78)
R2 =0.05 df = 200
ANCOVA MODELS
ANOVA models are very common in
market research, sociology and
psychology research. However, in economics, regression models very often contain independent variables which are quantitative as well as qualitative in nature. Such models are called analysis of covariance (ANCOVA) models.
ANCOVA models are just an extension
of ANOVA models used for controlling
the effects of quantitative independent
variables in a model where there are
both types of variables quantitative
and qualitative. We will consider the
earlier example and include one
quantitative regressor, Index of
Industrial Productivity (IIP), a proxy for
GDP in India.
(13.2)
The results of the above model are given
below as:
DESEASONALIZATION USING
DUMMY VARIABLES
The demand for woollen clothes usually increases during winter and declines in all other seasons. You must have noticed that the demand for soft drinks such as Coke and Pepsi rises during summer. Also, the sales of department stores at the time of Diwali and Holi increase compared to other days. This regular and repetitive
occurrence of variation in economic
variables such as sales of refrigerator is
called seasonal variation. A time series
is composed of four components trend,
seasonal oscillations, cyclical
oscillations and random fluctuations.
Many economic variables show patterns
of seasonality. More often than not, the seasonal component of a time series is removed, and the process of removing seasonality is called deseasonalization.
There are various methods of removing
seasonality from a time series, but here
we will discuss how seasonality is
removed by dummy variable technique.
Let us consider the quarterly data of gold
prices for the period 1990-91 to 1999-
2000. To find seasonal effect in each
quarter, we fitted the following model:
(13.3)
Y = 4450.45 + 16.64D1 + 1.06D2 + 129.96D3
t = (31.71) (0.08) (0.005) (0.65)
R2 = 0.01   df = 31
Y = A + BX + ε
(13.5)
D1=
D2=
A= A0+A1D1+A2D2 (13.6)
B = B0+B1D1+B2D2 (13.7)
Now put (13.6) and (13.7) into the
equation (13.5) and we have:
Y = (A0 + A1D1 + A2D2) + (B0 + B1D1 + B2D2)X + ε   (13.8)
or
Y = A0 + A1D1 + A2D2 + B0X + B1(D1X) + B2(D2X) + ε
Using Excel
Regression analysis can be done by
Excel with the help of Data Analysis.
Select Tools on the menu bar and choose
Data Analysis from the drop down
menu. Next, select Regression from the
Data Analysis dialog box. Enter the
dependent variable into Input Y Range.
Enter the independent variables in Input
X Range. Here independent variable X
range may include a number of columns.
Excel decides the number of explanatory
variables from the number of columns
entered in Input X Range. Select
Labels, if you have labels for series. If
you want regression through origin, click
Constant is Zero otherwise leave it
blank. Excel also provides residuals, standardized residuals, residual plots and line fit plots. If you are interested
in these items then select them from
regression dialog box. Click OK.
3   2630   2950   2940
4   2580   2360   2620
Autoregressive Process
An AR process expresses a time series as a linear function of its own lagged values. The simplest AR model is the first order autoregressive or AR(1) model as shown below:
Yt = φ0 + φ1Yt−1 + εt   (14.1)
where Yt is the series in time t, Yt−1 is the series in the previous period, φ0 and φ1 are the parameters of the model and εt is the random error term. The order of the AR shows how many lagged past values are included in the model. An AR process of order 2 can be written as follows:
Yt = φ0 + φ1Yt−1 + φ2Yt−2 + εt   (14.2)
In general, an AR process of order p is:
Yt = φ0 + φ1Yt−1 + ... + φpYt−p + εt   (14.3)
In autoregressive models, the current value is determined by previous values of the time series. There are no other independent variables for explaining the variations in the dependent variable. In such models the data speak for themselves.
Yt = εt − θ1εt−1   (14.4)
As with AR models, higher order MA
models include higher lagged terms.
The letter q is used to denote the order
of the moving average model.
Yt = εt − θ1εt−1 − θ2εt−2 − ... − θqεt−q   (14.5)
ARIMA modeling requires time series to
be stationary. If the series is stationary in
its level, then we can fit ARMA model
to the time series under question.
Box-Jenkins
Methodology
ARMA modeling is done by a series of
well-defined steps. The first step is the
identification of the model. Identification
involves specifying the appropriate
process (AR, MA or ARMA) and its
order. To identify the appropriate
process and its order autocorrelation
function(ACF) and partial
autocorrelation function (PACF) are
used. Sometimes identification is done
by an automated iterative procedure --
fitting many different possible model
structures and orders and using a goodness-of-fit statistic to select the best model.
The second step is to estimate the
coefficients of the model. Coefficients of
AR models can be estimated by least-
squares regression. Estimation of
parameters of MA and ARMA models
usually requires a more complicated
iteration procedure (Chatfield 1975). In
practice, estimation is fairly transparent to the user, as it is accomplished automatically by a computer program with little or no user interaction.
(14.7)
As with the FPE, the best-fit model has
minimum value of AIC. Neither the FPE
nor the AIC directly addresses the
question whether the model residuals are
white noise. A strategy for model
identification by the FPE is to iteratively
fit several different models and find the
model that gives approximately
minimum FPE and does a good job of
producing random residuals. The
checking of residuals is described in the
next section.
If the fitted model is adequate, the residual autocorrelations should behave like white noise, with mean
E(rk) = 0
and variance
var(rk) = 1/N
where rk is the autocorrelation
coefficient of the ARMA residuals at lag
k and N is the number of observations.
The appropriate confidence interval
for rk can be found by referring to a
normal distribution cdf. We know that
the 0.975 probability point of the
standard normal distribution is 1.96. The
95% confidence interval for rk is
therefore ±1.96/√N. For the 99%
confidence interval, the probability point
of the normal distribution is 2.57. The
99% CI is therefore ±2.57/√N. An rk
outside this CI is evidence that the
model residuals are not random.
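The residual check can be sketched numerically: compute the residual autocorrelations and compare them with the ±1.96/√N band. Simulated white noise stands in for the residuals of an adequate model here.

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation coefficients r_1 ... r_nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x * x)
    return np.array([np.sum(x[k:] * x[:-k]) / denom for k in range(1, nlags + 1)])

# Residuals from an adequate model should look like white noise,
# so simulated white noise is used as a stand-in.
rng = np.random.default_rng(1)
resid = rng.standard_normal(500)

N = len(resid)
bound95 = 1.96 / np.sqrt(N)      # 95% confidence band under the whiteness hypothesis
r = acf(resid, nlags=20)
outside = int(np.sum(np.abs(r) > bound95))
```

Under whiteness, roughly 5% of the rk are expected to fall outside the band by chance, so one stray value among 20 lags is not by itself evidence of model inadequacy.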
Q = N Σ rk² (summed over k = 1, ..., K) (14.8)
a portmanteau statistic that tests the residual autocorrelations as a group rather than lag by lag.
Solved Example
Distributed-Lag Models
In economics, we often talk about the
long-run period and the short-run
period. Time in economics is defined not
in terms of days or weeks but in terms of
how quickly supply adjusts to demand.
Supply seldom adjusts to demand
instantaneously, and many economic
variables take time to influence other
economic variables. For example, one of
the functions of the central bank of any
country is to maintain price stability in
the economy, in simple words, to keep
inflation in control. One way to control
inflation is to tighten the money supply.
However, tightening of the money supply
by the monetary authority of a country
will not curb inflation immediately; it
may take some time for the tightening to
show its influence on inflation. Thus,
inflation responds to a tightening of the
money supply with a lapse of time, which
is called a lag in economics.
Yt = α + βXt + γYt-1 + ut (14.9)
where lagged values of the dependent
variable are included as explanatory
variables. This is called an
autoregressive or dynamic model, which
has already been discussed. Another
example of a regression model containing
lagged variables is the distributed-lag model:
Yt = α + β0Xt + β1Xt-1 + β2Xt-2 + ... + ut
in which only current and lagged values of the explanatory variable appear.
Solved Example
To demonstrate the mechanics of the
distributed lag model, consider the
Wilmore quarterly data on sales and
advertisement from Winter 1994 to Fall
2004 given in table 8.2. We will
estimate a distributed lag model by the
ad hoc approach on the Wilmore
quarterly data.
Source:
www.ciadvertising.org/SA/spring_05/adv
First, we regressed sales on
advertisement, and the results are as
follows:
Salest = 410701.7 + 1.39 advt
t = (2.45) (26.35)
r2 = 0.94 df = 42
R2 = 0.94 df = 41
It is evident from the results that the
coefficient of one-period lagged
advertisement is statistically
insignificant, so we should stop the
sequential process here. Nevertheless,
we went further and estimated the
following regression model:
Granger Causality
In the beginning, we categorically said
that regression does not mean causation.
Regression analysis helps in explaining
one variable in terms of other
variables, but it does not imply
causation. However, in the case of time
series data it is possible to interpret
regression in terms of cause and effect.
Time does not run backward: if event A
occurs before event B, then A may cause
B, whereas B cannot cause A. This is the
basic idea behind Granger causality.
Very often we try to find whether GDP
causes money supply or money supply
causes GDP. A similar case is to
determine whether growth causes
inflation or inflation causes growth.
Such issues in macroeconomics can be
settled by the Granger causality test.
The test involves estimating the
following pair of regressions:
Solved Example
To illustrate Granger causality, we
collected data on the gross domestic
product of India at current prices and on
money supply (narrow money) from
1950-51 to 2002-03. The objective is to
find whether GDP causes money supply
or money supply causes GDP. In this
illustration we regress the change in
GDP (ΔGDP) on the change in money
supply (ΔMoney) instead of running the
regression in levels.
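The machinery behind the test, comparing a restricted regression (own lags only) with an unrestricted one (own lags plus lags of the other variable) via an F-test, can be sketched as follows. The GDP/money-supply data are not reproduced here, so simulated series are used; `granger_f_test` is our own helper, not a library routine.

```python
import numpy as np
from scipy import stats

def granger_f_test(y, x, lags=2):
    """F-test of H0: lags of x do not help predict y, given y's own lags."""
    n = len(y)
    rows = n - lags
    ylags = np.column_stack([y[lags - k - 1:n - k - 1] for k in range(lags)])
    xlags = np.column_stack([x[lags - k - 1:n - k - 1] for k in range(lags)])
    ones = np.ones((rows, 1))
    target = y[lags:]

    def rss(X):
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        e = target - X @ coef
        return e @ e

    rss_r = rss(np.hstack([ones, ylags]))           # restricted: own lags only
    rss_u = rss(np.hstack([ones, ylags, xlags]))    # unrestricted: plus lags of x
    k = 1 + 2 * lags                                # parameters in unrestricted model
    F = ((rss_r - rss_u) / lags) / (rss_u / (rows - k))
    p_value = stats.f.sf(F, lags, rows - k)
    return F, p_value

# Simulated example in which x Granger-causes y but not vice versa.
rng = np.random.default_rng(2)
x = rng.standard_normal(300)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.4 * y[t - 1] + 0.8 * x[t - 1] + 0.5 * rng.standard_normal()

F_xy, p_xy = granger_f_test(y, x)   # does x cause y? expect a tiny p-value
F_yx, p_yx = granger_f_test(x, y)   # does y cause x? expect a large p-value
```

A small p-value in one direction and a large one in the other is the pattern one would look for when testing, say, ΔMoney against ΔGDP.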
In table 7.3 above, we constructed
two lagged values of ΔGDP and two
lagged values of ΔMoney for testing
Granger causality. The step-by-step
procedure for the Granger causality
test is explained below:
Using Excel
Listed below are 64 daily
closing values of the NSE Nifty. Fit an
AR(5) model to these data.
4150.85 4117.35
4111.15 4077 4079.3
4076.65 4134.3 4120.3 417
4260.9 4278.1 4246.2 4204
4293.25 4249.65 4295.8 4297
4198.25 4179.5 4145 414
4170 4171.45 4147.1 421
4252.05 4259.4 4285.7 4263
4313.75 4357.55 4359.3 435
4406.05 4387.15 4446.15 450
4499.55 4562.1 4566.05 461
4619.8 4445.2 4440.05 4528
April 5652.5
May 5864.2
June 5990.9
July 6831.6
August 6566.7
September 6434.1
October 6637.5
November 7551.1
December 5154.2
January 5674.7
February 4359.9
March 4929.8
2009-10
April 5599.2
May 6907.7
June 5772.3
July 6722.7
August 5677
September 5577
October 5037
November 4506
December 5061
January 4921
February 4648
March 6341
2011-12
April 6090
May 7583
June 6329
July 6785
August 7891
September 7580
October 7454
November 5681
December 6666
January 7622
February 7525
March 8161
2012-13
April 6890
May 7887
June 6760
July 7111
August 6724
September 7501
October 8139
November 8165
December 8743
January 9189
February 7740
March 9671
2013-14
April 10254
May 9729
June 12395
July 10454
August 11695
September 11871
October 12197
November 9521
December 9014
January 11477
February 11054
March 14433
2014-15
April 13511
May 14178
June 13495
July 15215
August 17275
September 17778
October 15423
November 15382
December 16967
January 18502
February 18219
March 18695
Q. 2 The table below shows data on
Research & Development (R&D) and
Sales of various companies. It is not
clear whether sales is determined by
R&D or R&D is determined by sales.
Using these data, determine the direction
of Granger causality.
R&D Sales
54.95 380
72.66 450
87.58 515
64.69 400
74.81 458
66.44 460
51.46 305
72.77 485
80.03 518
76.39 506
69.84 540
52.08 354
61.98 375
73.3 330
56.99 410
78.38 490
96.44 528
60.74 445
89.5 630
95.24 600
68.33 465
56.71 388
88.18 578
64.8 416
P =
0.8 0.1 0.1
0.1 0.7 0.2
0.2 0.2 0.6
Row 1
0.8 = P11 = probability of being in state 1
after being in state 1 the preceding
period
0.1 = P12 = probability of being in state 2
after being in state 1 the preceding
period
0.1 = P13 = probability of being in state 3
after being in state 1 the preceding
period
Row 2
0.1 = P21 =probability of being in state 1
after being in state 2 the preceding
period
0.7 = P22= probability of being in state 2
after being in state 2 the preceding
period
0.2 = P23 = probability of being in state 3
after being in state 2 the preceding
period
Row 3
0.2 = P31 = probability of being in state 1
after being in state 3 the preceding
period
0.2 = P32 = probability of being in state
2 after being in state 3 the preceding
period
0.6 = P33 = probability of being in state 3
after being in state 3 the preceding
period
π(1) = π(0)P (15.2)
π(n + 1) = π(n)P (15.3)
π(l) = π(0)P^l
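The state-probability recursion above can be verified numerically with the transition probabilities given earlier. The initial distribution π(0) = (1, 0, 0), starting in state 1, is an assumption for illustration.

```python
import numpy as np

# Transition matrix from the three-state example above (each row sums to 1).
P = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.7, 0.2],
              [0.2, 0.2, 0.6]])

pi0 = np.array([1.0, 0.0, 0.0])   # assumed start: the system is in state 1

# State probabilities after one period: pi(1) = pi(0) P
pi1 = pi0 @ P

# State probabilities after n periods: pi(n) = pi(0) P^n
pi100 = pi0 @ np.linalg.matrix_power(P, 100)
```

After many periods the distribution settles to the steady state π = πP, which for this matrix is (8/19, 6/19, 5/19), regardless of the starting state.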
Exercises
1. The buying patterns for two brands of
health drink can be expressed as Markov
process with the following transition
probabilities:
From \ To    Bournvita   Horlicks
Bournvita    0.40        0.60
Horlicks     0.55        0.45
Expected demand = (0.05)(0) + (0.10)(1) + (0.20)(2) +
(0.30)(3) + (0.20)(4) + (0.15)(5) =
2.95 tires
If this simulation were repeated
hundreds or thousands of times, it is
much more likely that the average
simulated demand would be nearly the
same as the expected demand.
Naturally, it would be risky to draw any
hard and fast conclusions regarding the
operation of a firm from only a short
simulation. However, this simulation by
hand demonstrates the important
principles involved. It helps us to
understand the process of Monte Carlo
simulation
The simulation for Harry's Tire involved
only one variable. The true power of
simulation is seen when several random
variables are involved and the situation
is more complex. As you might expect,
the computer can be a very helpful tool
in carrying out the tedious work in larger
simulation undertakings.
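The point made above, that a long simulation run converges to the expected demand, can be sketched with the tire-demand distribution (probabilities 0.05, 0.10, 0.20, 0.30, 0.20, 0.15 for demands 0 through 5):

```python
import numpy as np

# Daily-demand distribution matching the expected-value calculation above.
demand_values = np.array([0, 1, 2, 3, 4, 5])
probabilities = np.array([0.05, 0.10, 0.20, 0.30, 0.20, 0.15])

rng = np.random.default_rng(42)
n_days = 100_000

# Monte Carlo simulation: sample daily demand from the discrete distribution.
simulated = rng.choice(demand_values, size=n_days, p=probabilities)

average_demand = simulated.mean()
expected_demand = float(np.dot(demand_values, probabilities))   # 2.95 tires
```

With 100,000 simulated days the average demand lands very close to 2.95; a hand simulation of a dozen days can easily stray far from it, which is exactly the caution raised in the text.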
Two Other Types of
Simulation Models
Simulation models are often broken into
three categories. The first, the Monte
Carlo method just discussed, uses the
concepts of probability distribution and
random numbers to evaluate system
responses to various policies. The other
two categories are operational gaming
and systems simulation. Although in
theory the three methods are distinctly
different, the growth of computerized
simulation has tended to create a
common basis in procedures and blur
these differences.
Operational Gaming
Operational gaming refers to simulation
involving two or more competing
players. The best examples are military
games and business games. Both allow
participants to match their management
and decision-making skills in
hypothetical situations of conflict.
Military games are used worldwide to
train a nation's top military officers, to
test offensive and defensive strategies,
and to examine the effectiveness of
equipment and armies. Business games,
first developed by the firm Booz, Allen
and Hamilton in the 1950s, are popular
with both executives and business
students. They provide an opportunity to
test business skills and decision-making
ability in a competitive environment.
The person or team that performs best in
the simulated environment is rewarded
by knowing that his or her company has
been most successful in earning the
largest profit, grabbing a high market
share, or perhaps increasing the firm's
trading value on the stock exchange.
During each period of competition, be it
a week, month, or quarter, teams respond
to market conditions by coding their
latest management decisions with
respect to inventory, production,
financing, investment, marketing, and
research. The competitive business
environment is simulated by computer,
and a new printout summarizing current
market conditions is presented to
players. This allows teams to simulate
years of operating conditions in a matter
of days, weeks, or a semester.
Systems Simulation
Systems simulation is similar to business
gaming in that it allows users to test
various managerial policies and
decisions to evaluate their effect on the
operating environment. This variation of
simulation models the dynamics of large
systems. Such systems include corporate
operations, the national economy, a
hospital, or a city government system.
In a corporate operating system, sales,
production levels, marketing policies,
investments, union contracts, utility
rates, financing, and other factors are all
related in a series of mathematical
equations that are examined by
simulation. In a simulation of an urban
government, systems simulation can be
employed to evaluate the impact of tax
increases, capital expenditures for roads
and buildings, housing availability, new
garbage routes, immigration and out-
migration, locations of new schools or
senior citizen centers, birth and death
rates, and many more vital issues.
Simulations of economic systems, often
called econometric models, are used by
government agencies, bankers, and large
organizations to predict inflation rates,
domestic and foreign money supplies,
and unemployment levels.
The value of systems simulation lies in
its allowance of what-if questions to test
the effects of various policies. A
corporate planning group, for example,
can change the value of any input, such
as an advertising budget, and examine
the impact on sales, market share, or
short-term costs. Simulation can also be
used to evaluate different research and
development projects or to determine
long-range planning horizons.
Verification and Validation
In the development of a simulation
model, it is important that the model be
checked to see that it is working
properly and providing a good
representation of the real world
situation. The verification process
involves determining that the computer
model is internally consistent and
following the logic of the conceptual
model.
Delphi Method
Exercises
1. Explain the Delphi method.
2. Distinguish between jury of
executive opinion and expert
judgment method.
Chapter 18
Linear Programming
Introduction
Linear programming is a mathematical
technique for finding the best, or
optimal, solution to a problem under
given constraints. Here linear refers to
linear equations, which have degree 1;
if you plot a linear equation, you get a
straight line. The word programming is
closely associated with computer
programs, the act of creating computer
programs, that is, the process of
developing and implementing sets of
instructions that enable a computer to
perform a certain task. The association
is apt: today you cannot conduct an
optimization analysis involving more
than two decision variables without the
help of computer software.
The prime objective of any business
entity is to maximize profit. Profit
equals total revenue minus total cost,
expressed as:
Profit = Total Revenue - Total Cost
Thus the various objectives of firms can
be:
1. Maximization of profit
2. Maximization of total revenue
3. Minimization of total cost or cost of
production
Thus linear programming deals with
either maximizing or minimizing some
objective function, subject to a set of
linear constraints. Thus, a linear
programming model consists of the
following:
1. An objective function
2. A set of decision variables
3. A set of constraints
Linear Programming Model
Let X1, X2, X3, ..., Xn be the decision
variables.
Z is the objective function, a function
of the decision variables X1, X2, X3, ..., Xn.
The objective is to maximize the
objective function Z, which is assumed to
be linear:
Z = c1X1 + c2X2 + ... + cnXn
Step 2: Go to www.zweigmedia.com/RealWorld/simplex.html
Step 3: Scroll down. When you click on
Example, the following dialogue box
pops up:
Step 4: Type your linear program, using
the space bar, as shown below:
Step 5: After typing your linear
programming problem, click Solve
Thus the profit function P is
maximized when 2 units of X and 3 units
of Y are produced. The total profit is
54.
Solution
Minimize P = 96Y1 + 80Y2 + 36Y3
subject to
24Y1 + 10Y2 + 6Y3 ≥ 12
16Y1 + 20Y2 + 6Y3 ≥ 10
Y1, Y2, Y3 ≥ 0
Linear Programming with Excel
I will demonstrate how to solve a linear
program with Excel using an example.
Consider the following problem.
Maximize: P = 12X +10Y
Subject to
24X + 16Y ≤ 96
10X + 20Y ≤ 80
6X + 6Y ≤ 36
X, Y ≥ 0
Step 1: Enter the linear program data as
shown below:
In the above worksheet
1. Cells C3 to D5 show the per unit
requirements of X and Y
2. Cells E3 to E5 show the maximum
amounts available of the inputs.
3. Cells C7 to D7 show the per unit
profit for X and Y
Step 2: Specify the cell locations for the
decision variables as shown below:
While cell C11 will show the number of
X produced the cell D11 will contain the
number of Y produced.
Step 3: Choose a cell and enter a
formula for computing the value of
objective function.
Cell C13 = C7*C11 + D7*D11
Step 4: Select a cell and enter a formula
for computing the left hand side of the
each constraint:
We enter the following:
Cell C16 = C3*C11+D3*D11
Cell C17 = C4*C11+D4*D11
Cell C18 =C5*C11 +D5*D11
Step 5: Select a cell and enter a formula
for calculating the right-hand side of
each constraint:
Thus we have
Cell E16 = E3
Cell E17 = E4
Cell E18 = E5
After writing the linear program, I will
show how to use Excel to solve it:
Step 1: Click data as shown below:
Step 2: Select the Solver and when you
click it the following dialogue box will
pop up:
Step 3: Enter cell C13 into the Set
Target Cell and select the Equal to Max
option
Step 4: Enter cells C11 to D11 in By
Changing Cells box as shown below:
Step 5: Select Add and when the Add
constraint box appears, enter cells C16
:C18 in the cell reference box. Next
select <= . Further, enter Cells E16:E18
in the Constraint box shown below:
Step 6: Click Ok
Step 7: Click Solve
Step 8: Click Ok
Thus, the profit function will be
maximized when 2 units of X and 3 units
of Y are produced. The maximum profit
is 54. We got the same result using the
online tool also.
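The same problem can also be cross-checked with SciPy's `linprog`. Because `linprog` minimizes, the profit coefficients are negated:

```python
from scipy.optimize import linprog

# Maximize P = 12X + 10Y  <=>  minimize -12X - 10Y
c = [-12, -10]
A_ub = [[24, 16],
        [10, 20],
        [6, 6]]
b_ub = [96, 80, 36]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])

x_opt, y_opt = res.x       # optimal production of X and Y
max_profit = -res.fun      # flip the sign back to a maximum
```

The solver returns X = 2, Y = 3 with a maximum profit of 54, agreeing with both the online simplex tool and the Excel Solver.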
Exercises
Q1. Solve the following linear
programming problem using either MS
Excel or online simplex tool.
Maximize Z = 8x +2y
s.t.
20x + 4y ≤ 60
6x + 8y ≤ 12
4x + 4y ≤ 20
x, y ≥ 0
Introduction
Linear programming has applications
in many areas of business and
management. To gain deeper insight, we
will discuss various applications of the
linear programming method.
Product-Mix Application
GTC manufactures two types of smart
phones. Type A smart phone belongs to
premium category and its price is
Rs.50000 per handset. Type B is an
economy brand costing Rs. 18000 per
handset. Type A smart phone contributes
a profit of Rs. 6000 per unit while Type
B phone contributes a profit of Rs. 3600.
Both smart phones require three
inputs: labour hours, machine hours, and
materials. The requirements per unit and
total availability of inputs are
summarized in the following table:
Smart Phones      Labour Hours   Machine Hours   Materials
Type A            8              4               34
Type B            8              5               20
Total Available   600            1400            750
Solution
Let the product mix comprise X units
of Type A phones and Y units of Type B
phones. The objective is to find the
product mix that maximizes total profit,
specified as follows:
Maximize Z = 6000X + 3600Y
subject to
8X + 8Y ≤ 600
4X + 5Y ≤ 1400
34X + 20Y ≤ 750
X, Y ≥ 0
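Assuming the three resource constraints are of the ≤ type, the product-mix model can be solved programmatically as a cross-check:

```python
from scipy.optimize import linprog

# Maximize 6000X + 3600Y subject to the three input constraints above.
c = [-6000, -3600]
A_ub = [[8, 8],      # labour hours
        [4, 5],      # machine hours
        [34, 20]]    # materials
b_ub = [600, 1400, 750]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
X, Y = res.x
max_profit = -res.fun
slack = res.slack    # unused amount of each input at the optimum
```

Note that an LP solution can set a decision variable to zero or to a fractional value; here the materials constraint is the binding one, and the slack vector shows how much labour and machine time go unused.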
Solved Example
JE Motors, an automobile
manufacturing company, transports cars
from three plants, 1, 2, and 3, to three
major cities, A, B, and C. The
manufacturing plants are able to supply
the following numbers of cars per month:
Plants   Supply (capacity)
1        1500
2        1500
3        1000
The requirements of the three major
cities, in number of cars per month, are:
Cities   Demand (Requirement)
A        1000
B        1000
C        2000
Plants To
From
1 4000 (D)
2 7000 (G)
3 4000 (J)
Demand 1000
Plant \ Warehouse   A      B      C      Plant Capacity
1                   200    165    244    3000
2                   105    108    88     5000
3                   125    180    100    1000
Warehouse Demand    2000   4000   3000
1. Formulate a linear programming
model for minimizing transportation
cost.
2. Solve the linear programming
problem either using Solver-add-in
or online simplex tool.
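A sketch of how Q1 could be set up as a linear program, treating plant capacities as ≤ constraints and warehouse demands as equalities (the problem is balanced: total supply equals total demand of 9000):

```python
import numpy as np
from scipy.optimize import linprog

# Unit shipping costs (plants 1-3 to warehouses A-C), capacities, demands.
cost = np.array([[200, 165, 244],
                 [105, 108,  88],
                 [125, 180, 100]])
supply = [3000, 5000, 1000]
demand = [2000, 4000, 3000]

c = cost.flatten()               # decision variables x_ij in row-major order

# Supply constraints: each plant ships at most its capacity.
A_ub = np.zeros((3, 9))
for i in range(3):
    A_ub[i, 3 * i:3 * i + 3] = 1

# Demand constraints: each warehouse receives exactly its demand.
A_eq = np.zeros((3, 9))
for j in range(3):
    A_eq[j, j::3] = 1

res = linprog(c, A_ub=A_ub, b_ub=supply, A_eq=A_eq, b_eq=demand,
              bounds=[(0, None)] * 9)
min_cost = res.fun
shipments = res.x.reshape(3, 3)   # optimal shipping plan, plants x warehouses
```

With these data the minimum total cost works out to 1,089,000, with plant 1 serving warehouse B and plant 3 serving warehouse C.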
Q2. A supervisor has four workers, A, B,
C, and D, who are to be assigned to three
jobs, 1, 2, and 3. The following data
show the number of hours required for
each worker to complete each job:
Jobs
Workers   1     2     3
A         160   200   275
B         176   236   225
C         186   235   226
D         161   245   235
Elements of Decision
Theory
The important elements of decision
theory are:
1. Alternatives
2. States of Nature
3. Payoff
4. Payoff Table
Alternatives
Alternatives are the various options
available to the decision maker.
Decision making involves choosing
among two or more possible alternatives
with a view to selecting the best one, so
that the benefit or payoff is maximized.
For example, suppose a company has Rs. 60
cr. of cash surplus. This cash surplus can
be put to various alternative uses:
distribute the cash as dividend
reinvest the cash in the company
keep the cash for meeting future
obligations
States of Nature
States of nature, or outcomes of an
event, are uncontrolled occurrences. The
states of nature correspond to possible
future events which are not in the control
of the decision maker. States of nature
are mutually exclusive, meaning that when
one state of nature occurs it rules out the
occurrence of all other states of nature,
and collectively exhaustive, meaning that
all the states of nature must be identified
by the decision maker.
Payoff
Payoff is the resulting net benefit to the
decision maker that arises from selecting
a particular alternative in a particular
state of nature.
Payoff Table
The payoff table lists the various
alternatives and states of nature; the
entries in the table are the payoffs. In
general, the rows of the payoff table
correspond to the various alternatives,
the columns correspond to the various
states of nature, and the entries are the
payoffs.
Payoff Table
                 States of Nature
Alternatives   S1    S2    S3    ...   Sm
A1             P11   P12   P13   ...   P1m
A2             P21   P22   P23   ...   P2m
A3             P31   P32   P33   ...   P3m
...            ...   ...   ...   ...   ...
An             Pn1   Pn2   Pn3   ...   Pnm
Decision-Making Criteria
The decision-making criteria are
broadly divided into three:
1. Decision-making under Certainty
2. Decision-making under Uncertainty
3. Decision-making under Risk
Hurwicz Criterion
So far we have seen that the most
optimistic and most pessimistic criteria
are the maximax and maximin
respectively, which are two extreme
ways of making a decision. It is
therefore more realistic to select an
appropriate combination of the two
extremes. This approach was suggested
by Hurwicz. In this case, the degree of
optimism between extreme pessimism
(0) and extreme optimism (1) is given by
α, which lies between 0 and 1. In this
approach, a decision index is defined
for each alternative as:
Hi = αMi + (1 - α)mi
where
Mi = maximum payoff corresponding to
the ith decision alternative
mi = minimum payoff corresponding to
the ith decision alternative
Decisions   Decision Index   Hurwicz Criterion
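The Hurwicz index can be computed mechanically from any payoff table; the payoff numbers below are hypothetical, for illustration only:

```python
import numpy as np

def hurwicz_index(payoffs, alpha):
    """H_i = alpha * (row max) + (1 - alpha) * (row min) per alternative."""
    payoffs = np.asarray(payoffs, dtype=float)
    return alpha * payoffs.max(axis=1) + (1 - alpha) * payoffs.min(axis=1)

# Hypothetical payoff table: rows = alternatives, columns = states of nature.
payoffs = [[3, 4, 5],
           [2, 5, 9],
           [-4, 6, 20]]

alpha = 0.6                          # assumed degree of optimism
H = hurwicz_index(payoffs, alpha)
best = int(np.argmax(H))             # alternative with the highest index
```

With α = 1 the rule reduces to maximax and with α = 0 to maximin, which is exactly the sense in which Hurwicz blends the two extremes.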
Regret Criterion
In this approach, a decision-maker tries
to minimize his/her opportunity loss.
When a decision maker chooses the best
payoff pertaining to a decision
alternative, there is no opportunity
loss or regret. That is, if the decision
maker selects the saving bank a/c
according to the maximin criterion and
gets a 3 percent return under the
no-growth scenario, then the regret or
opportunity loss is 3 - 3 = 0. However,
if the decision maker later realizes that
a return of 5% was available under the
high-growth scenario, then for the saving
bank a/c the opportunity loss or regret
is 5 - 3 = 2%. With regard to the
low-growth scenario, where the best
return is 4%, the regret for the saving
bank a/c is 4 - 3 = 1%. The regret table
for each decision alternative can be
constructed likewise:
Regret Table
                  States of Nature
Decisions         No Growth   Low Growth   High Growth
Saving Bank A/C   0           1            2
Fixed Deposit     0           0.5          1
Gold              0           2            8
Company Bond      0           2            3.5
Mutual Fund       0           15           24
Equities          0           27           37
Hence the maximum regrets for the
decision alternatives are {2, 1, 8, 3.5,
24, 37} respectively. The minimum
among these is 1, corresponding to the
fixed deposit alternative. Hence,
according to the regret or opportunity
loss criterion, the decision alternative
selected for investment is the fixed
deposit instrument.
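The minimax-regret choice can be computed directly from the regret figures read off the table above:

```python
import numpy as np

# Regret table from the investment example: rows = alternatives,
# columns = no-growth, low-growth, high-growth states.
regret = np.array([[0, 1,   2],     # saving bank a/c
                   [0, 0.5, 1],     # fixed deposit
                   [0, 2,   8],     # gold
                   [0, 2,   3.5],   # company bond
                   [0, 15,  24],    # mutual fund
                   [0, 27,  37]])   # equities

max_regret = regret.max(axis=1)       # worst-case regret per alternative
choice = int(np.argmin(max_regret))   # minimax-regret decision
```

The worst-case regrets come out as {2, 1, 8, 3.5, 24, 37}, and the index of the minimum, 1, points to the fixed deposit alternative, matching the conclusion in the text.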
Decision-making under
Risk
Decision making under situations of less
than complete certainty can be classified
into two:
Risk
Uncertainty
Risk refers to a situation where the
decision maker is aware of all possible
states of nature or outcomes and can
attach a probability to each state of
nature. Uncertainty refers to a situation
where the decision maker has no
knowledge about the probabilities of the
states of nature occurring.
The expected monetary value of an alternative is
EMV = Σ PiXi
where
Xi = payoff
Pi = probability of the corresponding state of nature
Using the expected monetary value
(EMV) criterion, the decision maker
selects the alternative with the best
expected payoff.
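The EMV criterion reduces to a dot product of payoffs and probabilities; the alternatives, payoffs, and probabilities below are hypothetical, for illustration:

```python
import numpy as np

def emv(payoffs, probabilities):
    """Expected monetary value: sum of payoff * probability over the states."""
    return float(np.dot(payoffs, probabilities))

# Hypothetical example: two alternatives under three states of nature
# with assumed probabilities 0.5, 0.3, 0.2.
probs = [0.5, 0.3, 0.2]
alternatives = {"A": [100, 40, -20], "B": [60, 50, 30]}

emvs = {name: emv(p, probs) for name, p in alternatives.items()}
best = max(emvs, key=emvs.get)
```

Here alternative A has EMV 58 against B's 51, so the EMV criterion picks A even though A carries the worse outcome (-20) in the low-probability state, which is precisely the risk-neutrality built into EMV.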
Example
Today education is considered an
investment. A student's choice of a
professional or non-professional course
depends on the kind of job and pay
package it leads to. To make the
decision, a student views the prospects
of a highly paid job as low, medium, or
high. The following payoff table shows
the projected package in Rs. lakhs:
          Prospects of Jobs
Course    Low    Medium   High
MBA/MCA   2.3    4        8
MCOM      3.5    5        6
Others    2.5    3.6      7
Example
Omega Group has to decide whether to
set up a consulting firm in Delhi or
Patna. Setting up the consulting firm in
Delhi will cost the company Rs. 50
lakhs, whereas setting it up in Patna
will cost Rs. 25 lakhs. Omega Group
conducted a cost-profit analysis, which
yields the following estimates over the
next 5 years:
High demand for services: probability = 0.5
Moderate demand for services: probability = 0.3
Low demand for services: probability = 0.2
Summary
Decision theory provides
managers a framework for making
decisions. When a future event is
characterized by uncertainty,
decision analysis allows us to make
optimal decisions from a set of
possible decision alternatives.
Effective decision-making requires
the following:
Identify and define the problem
Identify the possible alternatives
available
Evaluate each possible
alternative
Select the best alternative
Decision theory: Provides managers a framework for making decisions. When a future event is characterized by uncertainty, decision analysis allows us to make optimal decisions from a set of possible decision alternatives.
Alternatives: The various alternatives available to the decision maker.
States of Nature: The outcomes of an event that are uncontrolled occurrences; they correspond to possible future events which are not in the control of the decision maker.
Payoff: The resulting net benefit to the decision maker that arises from selecting a particular alternative in a particular state of nature.
Payoff Table: Lists the various alternatives and states of nature; the entries in the table are the payoffs.
Maximax Criterion: Decision making using this criterion is based on the best possible outcome. This is the approach taken by an optimistic and aggressive decision-maker.
Maximin Criterion: Using this criterion, the decision is based on the worst possible outcome. This is the approach taken by a pessimistic and conservative decision-maker.
Laplace Criterion: The decision-maker assumes that each state of nature is equally likely to occur and computes the expected payoff for each decision alternative.
Hurwicz Criterion: This approach, suggested by Hurwicz, combines the two extremes: the degree of optimism between extreme pessimism (0) and extreme optimism (1) is given by α, which lies between 0 and 1.
Regret Criterion: In this approach, a decision-maker tries to minimize his/her opportunity loss. When a decision maker chooses the best payoff pertaining to a decision alternative, there is no opportunity loss or regret.
Risk: A situation where the decision maker is aware of all possible states of nature or outcomes and can attach a probability to each state of nature.
Uncertainty: A situation where the decision maker has no knowledge about the probabilities of the states of nature occurring.
Decision tree: A sequential representation of a multi-stage decision problem. In the decision-making process, the decision maker has to take into account a number of alternatives and the uncertainties associated with those decision alternatives.
Exercises
1. Assume that a decision maker is faced
with three decision alternatives and
three states of nature, with the
following payoff table:
States of
Decision S1
Alternatives
D1 160 1
D2 112 1
175
D3
States of Nature
Decision Alternatives   S1 (6-7%)   S2 (8-9%)
Luxury       2000   150
Non-luxury   1050   950
States of Nature
Decision Alternatives   S1 (0.65)   S2 (0.35)
D1   160   18
D2   70    110
D3   185   50
a) Recommend an optimal decision
based on expected monetary value
(EMV) approach
b) Calculate the expected value of
perfect information (EVPI)
States of Nature
Decision Alternatives   S1    S2    S3
D1   160   235   190
D2   190   150   210
D3   230   195   205
D4   200   220   240
States of Nature
Decision Alternatives   S1 (0.20)   S2 (0.4)   S3 (0.40)
D1   15   35
D2   25   5
D3   20   15