
ISE 225

ENGINEERING STATISTICS
CLASS 13: MULTIPLE REGRESSION ANALYSIS

Detlof von Winterfeldt


Professor
University of Southern California
November 20, 2014

Housekeeping
Presentations 2 and 3
Graded presentation 2
Deducted points for: questions or hypotheses that were too general,
a missing or incomplete data table, or a poor data analysis plan
Homework
HW 6 (simple regression) was due today
Last HW assigned (at end of PP)
Due in two weeks
Two more classes
Next week is Thanksgiving: no class
December 4: Review and final presentations
December 11: Final Exam from 2 to 4 PM
Look for class evaluation instructions

Where Are We?


23-Oct-14: Review and Midterm Exam (Ch. 1-13)
30-Oct-14: Analysis of variance; single-factor ANOVA; two-factor ANOVA; ANOVA designs and repeated measurement (Ch. 14)
6-Nov-14: Pivot tables and chi-squared tests of independence with discrete data; introduction to regression (Ch. 15; HW 4 due)
13-Nov-14: Simple linear regression; interpreting regression statistics; third presentation, PP only (preliminary data collection) (Ch. 16; HW 5 due)
20-Nov-14: Multiple regression; selecting independent variables; issues with regression; interpreting results (Ch. 17; HW 6 due)
27-Nov-14: Thanksgiving
4-Dec-14: Review and final project presentations (Ch. 1-17; HW 7 due)
11-Dec-14: Final exam; the final report is also due on this day

Television and Household Debt

Regression Output

t Stat for the intercept = Intercept/Standard Error

t Stat for the slope = Slope/Standard Error

Diagnostic: Residual Plot

Using the Regression Equation for Forecasting
HH Debt = 2582*HoursofTV + 48,040
For example, for a household with
10 hours, the prediction is $73,860 of HH debt
60 hours, the prediction is $202,960 of HH debt

Meaning: For each hour of TV, HH debt increases by $2,582
Large confidence intervals around the dependent variable
Unfortunately not calculated in Data Analysis
Using the standard error and the z-statistic (1.96)
10 h: $73,860 +/- $75,795 (1.96*38,671)
60 h: $202,960 +/- $75,795 (1.96*38,671)
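
The interval arithmetic above can be reproduced with a short sketch. It takes the two forecasts and the standard error of 38,671 directly from the slide; the function and variable names are illustrative only. Note that this shortcut ignores the uncertainty in the estimated slope and intercept, so a full prediction interval would be somewhat wider.

```python
# Approximate 95% interval around a regression forecast, using the slide's
# shortcut: forecast +/- z * standard error of the regression.
Z_95 = 1.96             # z-value for a two-sided 95% interval
STANDARD_ERROR = 38671  # standard error of the regression, from the slide


def forecast_interval(forecast, standard_error=STANDARD_ERROR, z=Z_95):
    """Return (lower, upper) limits for forecast +/- z * standard_error."""
    margin = z * standard_error
    return forecast - margin, forecast + margin


# Forecasts quoted on the slide for 10 and 60 hours of TV.
for hours, forecast in [(10, 73860), (60, 202960)]:
    lower, upper = forecast_interval(forecast)
    print(f"{hours} h: ${forecast:,} -> {lower:,.0f} to {upper:,.0f}")
```

For the 10-hour household the lower limit is negative (about -1,935), which illustrates how wide these intervals are relative to the forecast.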

Recap: Basic Terms and Ideas


X-axis: independent variable
Y-axis: dependent variable
Regression line:
y = α + βx + ε
β = slope
α = intercept
ε = error
To predict y from x
y = a + bx
Where a is an estimate of α and b is an estimate of β
Main questions:
How to estimate a and b?
Are a and b significantly different from 0?
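
As one possible answer to the first question, here is a minimal sketch of the ordinary least squares estimates for a and b in y = a + bx. The data values are made up for illustration and do not come from the course files; the function name is hypothetical.

```python
def fit_simple_regression(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_mean - b * x_mean  # intercept: the line passes through the means
    return a, b


# Hypothetical data, roughly following y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
a, b = fit_simple_regression(xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}")
```

Whether a and b differ significantly from 0 is a separate question, answered by the t statistics and p-values in the regression output.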

The Important Outputs of a Regression Analysis
R² and adjusted R² tell you how good the prediction is in
terms of the share of the variance of Y (in %) explained by knowing X
F and its associated p-value tell you how likely it is that the
regression model could have been generated by chance, even though
there is no association between X and Y
Intercept and coefficients (slopes) give you the
regression equation to calculate the predicted y-value
for each x-value
p-values associated with the intercept and coefficients
tell you the probability that a coefficient this far from 0 could
have been generated by chance, even though it is really 0
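
For reference, the usual textbook definitions behind these outputs can be written as follows. The symbols are assumptions introduced here, not taken from the slides: n is the number of observations, k the number of x-variables, SST the total sum of squares of y around its mean, SSE the sum of squared residuals, and s_{b_i} the standard error of coefficient b_i.

```latex
\begin{aligned}
R^2 &= 1 - \frac{SSE}{SST}, &
R^2_{adj} &= 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}, \\
F &= \frac{(SST - SSE)/k}{SSE/(n - k - 1)}, &
t_i &= \frac{b_i}{s_{b_i}}
\end{aligned}
```

The p-values reported next to F and each t_i are the tail probabilities of these statistics under the assumption of no association.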

From Simple to Multiple Regression


Simple regression
One x-variable
One y variable
Simple scatterplot
Multiple regression
k different x-variables, xi, i = 1, 2, 3, …, k
One y variable
Multiple scatterplots for each xi - y combination

Comparing simple and multiple regression functions
Simple regression
y = α + βx + ε

Multiple regression
y = β0 + β1x1 + β2x2 + … + βixi + … + βkxk + ε
Estimation
βi estimated by bi
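
To make the estimation step concrete, here is a minimal sketch that computes b0, b1, …, bk by solving the least squares normal equations with plain Python. This is one standard way to obtain the estimates, not necessarily how Excel computes them internally. The data are hypothetical and were generated exactly from y = 1 + 2*x1 + 3*x2, so the fit should recover those values.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Build an augmented copy so the inputs are left untouched.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    # Back substitution.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][j] * solution[j] for j in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def fit_multiple_regression(x_rows, ys):
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals."""
    # A leading 1 in every row produces the intercept b0.
    design = [[1.0] + list(row) for row in x_rows]
    p = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: y = 1 + 2*x1 + 3*x2 with no error term.
x_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in x_rows]
b0, b1, b2 = fit_multiple_regression(x_rows, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

With real data the error term is not zero, and strongly correlated x-variables make the normal equations poorly conditioned, which is one reason to inspect the correlation matrix first.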

Visual Regression with 2 X-Variables

Example: Calculus Workbook

Correlation Matrix

Regression in Excel
Choose Data>Data Analysis
Select Y variable
Select range of X variables (make sure that there are no
non-numerical data and no gaps between columns)


Check Labels if appropriate
Set Confidence level for hypothesis tests
Select special features (e.g., residuals, line plots, etc.)

Regression Window in Data Analysis

Regression Output

Correlation Output

Standard Error = Standard deviation of residuals around
the predicted values (line)

Regression Model Test

Standard error = SQRT(88.932) ≈ 9.43

Tests of Slopes and Intercepts

Residual Plots (one for each x-variable)

Completed Regression Equation
Substitute the calculated intercept b0 and coefficients bi:

Calc = 27.9 + 7.2*CalcHS + 0.35*ACTMath + 0.83*AlgPlace
+ 3.68*Alg2Grade + 0.111*HSRank + 2.63*GenderCode
Because only the intercept, CalcHS, and AlgPlace were
significantly different from 0, this can be simplified to
Calc = 27.9 + 7.2*CalcHS + 0.83*AlgPlace
Redo the analysis with only these two x-variables
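
The sketch below evaluates both equations for one student to show why the reduced model needs to be re-estimated rather than just truncated. The coefficients are the ones on the slide; the student's values and the assumed scales of the variables (for example, CalcHS coded 0/1) are hypothetical.

```python
# Coefficients as printed on the slide.
INTERCEPT = 27.9
FULL_COEFFICIENTS = {
    "CalcHS": 7.2,
    "ACTMath": 0.35,
    "AlgPlace": 0.83,
    "Alg2Grade": 3.68,
    "HSRank": 0.111,
    "GenderCode": 2.63,
}
# Only the terms the slide reports as significantly different from 0.
REDUCED_COEFFICIENTS = {"CalcHS": 7.2, "AlgPlace": 0.83}


def predict(student, coefficients, intercept=INTERCEPT):
    """Return intercept + sum of coefficient * value for the listed variables."""
    return intercept + sum(b * student[name] for name, b in coefficients.items())


# Hypothetical student; the variable scales are assumptions for illustration.
student = {
    "CalcHS": 1,
    "ACTMath": 25,
    "AlgPlace": 20,
    "Alg2Grade": 3,
    "HSRank": 80,
    "GenderCode": 0,
}
full = predict(student, FULL_COEFFICIENTS)
reduced = predict(student, REDUCED_COEFFICIENTS)
print(f"full equation: {full:.2f}, truncated equation: {reduced:.2f}")
```

For this student the two forecasts differ by almost 30 points, because the dropped terms contributed to the full forecast. Rerunning the regression with only CalcHS and AlgPlace lets the intercept and remaining slopes absorb that contribution.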

Reduced Regression

Comparing predicted vs. actual scores

Plotting Observed vs. Predicted Data

Severance Pay
A company laid off 50 workers and offered
severance pay (in weeks of pay) to the laid-off
employees
Severance pay was determined by three factors:
Age
Years of employment
Salary

Bill, a 36-year-old who was employed for 10
years and made $32,000, complained that his
severance pay of 5 weeks was less than what he
should have received
Does he have a case?

Intercorrelations

Scatterplots

Regression for Severance Pay

Forecasting Bill's Severance Pay
Severance Pay = 0.63*Years - 0.008*Age - 0.07*Salary
Is it meaningful to subtract Age or Salary?
Why correct downward for Age? Do older people deserve less
severance pay?
Why correct downward for Salary? Maybe because the
severance pay is in terms of weeks of salary, so the total should be
corrected for salary
Maybe try a simple forecast, using Years only

Using Years as Only Independent Variable

Side by side comparison

Putting the Results to Work


Severance Pay = 0.574*Years + 3.61
For Bill:
Severance Pay = 0.574*10 + 3.61 = 9.35 weeks
Got only 5 weeks
To make sure, we would still need a confidence interval
Confidence interval
Mean estimate +/- 1.96*Standard error
9.35 weeks +/- 1.96*1.91
LCL = 5.6
UCL = 13.1
Any other thoughts on how to use this data?
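
The calculation for Bill can be checked with a few lines. The slope, intercept, and standard error are the values from the slide; the names are illustrative, and the interval uses the same z = 1.96 shortcut rather than a full prediction interval.

```python
# Years-only model from the slide: Severance Pay = 0.574*Years + 3.61
SLOPE = 0.574
INTERCEPT = 3.61
STANDARD_ERROR = 1.91  # standard error of the regression, in weeks
Z_95 = 1.96            # z-value for a two-sided 95% interval


def predict_severance(years):
    """Forecast weeks of severance pay from years of employment."""
    return SLOPE * years + INTERCEPT


forecast = predict_severance(10)   # Bill was employed for 10 years
margin = Z_95 * STANDARD_ERROR
lcl, ucl = forecast - margin, forecast + margin
offered = 5                        # weeks Bill actually received

print(f"forecast = {forecast:.2f} weeks, interval = {lcl:.1f} to {ucl:.1f}")
print("offer below lower limit:", offered < lcl)
```

Bill's 5 weeks falls just below the lower limit of 5.6 weeks, so under this approximation his offer is lower than the model would lead one to expect for someone with 10 years of employment.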

Demonstrations and Practice


Using Calculus
Create a correlation matrix of all variables
Conduct a regression analysis with Calc as the dependent (y)
variable, and all others as the independent (x) variables
Make sure you know how to interpret the results (R², F, p,
coefficients, and p of coefficients)
Using Severance Pay
Descriptive statistics, correlation matrix, selected scatterplots
Regression analysis of age, years employed and salary to predict
the severance pay
Forecast using Bill's data (36 yrs. old, 10 years employed, $32K)
using the full regression equation
Play with data in La Quinta Location Problem
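
For practice outside Excel, a correlation matrix like the one asked for above can be sketched in a few lines of plain Python. The column names and values are hypothetical stand-ins, not the course data files.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical data for five laid-off employees.
data = {
    "Years": [2, 5, 8, 10, 15],
    "Age": [25, 31, 40, 36, 50],
    "Weeks": [4, 6, 9, 9, 12],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(f"{name:6s}", " ".join(f"{row[other]:6.2f}" for other in data))
```

High correlations between two x-variables (here Years and Age) are a warning sign that their separate coefficients in a multiple regression may be hard to interpret.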

Expected Results
Calculus

Severance Pay

Introducing the IIASA Case


IIASA = International Institute for Applied Systems Analysis
About 300 people, 200 researchers studying global
problems like energy and climate change, food and water
shortages, etc.
Funded by the National Academies of 20 member
countries (50% of funding)
Remainder of funds comes from external contracts and
grants
www.iiasa.ac.at


The International Institute for Applied Systems Analysis


International, independent, interdisciplinary
Focus on major global problems
Solution oriented systems analysis

Member countries: Austria, Netherlands, Egypt, Germany, Norway, India,
Japan, Sweden, Republic of Korea, USA, Ukraine, Malaysia,
Russian Federation, Brazil, Pakistan, Finland, South Africa, Indonesia,
China, Australia

IIASA Impressions

Membership Dues Dilemma


No clear reasoning for current membership dues structure
USA pays Euro 2,000,000
China pays Euro 324,000
Finland pays Euro 648,000
Key question:
Is there any rationale for current structure?
How should structure be changed?

Data on Dues and 12 Socioeconomic Indicators

IIASA Correlations

More Correlations

Regression Results

Final Report

Discussion of Projects
Final Presentation
Cover slide with names and photos of team members
Problem definition slide (hypotheses, questions)
Data collection and sampling slide
Data Table slide
Result slides (2-3, no more!)
Conclusion slide

Demonstrations and Practice


Use IIASA to explore multiple regression with many variables


Homework 7: Problem 17.15 in Keller

Excel File Xr17.15
