Original title: RegrCorr.pdf

Introduction

Simple Linear Regression & GLM

Least Squares Trend Line Fitting

Model Testing

Inference and Regression

Regression Diagnostics

Correlation

Chpts. 16 & 17 W&S


Introduction

Recall that, to date, we have focused on statistics examining one variable from either one or two samples.

[...] between two variables from one sample. These statistics are often referred to as bivariate statistics (as opposed to univariate).

[...] bivariate analysis.

[...] more variables.

Regression is a method used to predict the value of one numerical variable from another.

The most commonly encountered type of regression is simple linear regression, which draws a straight line through a cloud of points to predict the response variable (Y) from the explanatory variable (X).

Simple Linear Regression

A typical question often asked by biologists is: how does variable Y change as a function of variable X?

X is the independent or predictor variable. Independent because it is assigned by the investigator and independent of measurement error.

Y is the dependent or response variable.

[Figure: scatter plot of Y (0 to 35) against X (1 to 5)]

[...] straight line (most parsimonious solution).

[...] variable.

[...] data should always be some form of exploratory graphical analysis.

[...] variables is referred to as a scatter plot.

[...] linear and a different model may be more appropriate.

Simple Linear Regression

- Example -

[...] of age on blood pressure:

X (age, years): 28 23 52 42 27 29 43 34 40 28

Y (blood pressure, mm Hg): 70 68 90 75 68 80 78 70 80 72

A simple initial scatterplot [...] to respond to X.

[Figure: scatterplot of Y (0 to 80) against X (0 to 60)]

- Example -

[...] previous scatterplot is misleading.

[...] previously apparent.

[Figure: scatterplot of Blood Pressure (mm Hg), 65 to 95, against Age (years), 20 to 60]

- Example -

[...] cloud of data.

[...] equation for a straight line is something like this:

y = bx + a

The General Linear Model

[...] more formal and utilitarian model:

$$y = \alpha + \beta x + \varepsilon$$

Where:
$\alpha$ = the y-intercept
$\beta$ = the slope
$\varepsilon$ = error (deviation from the mean)

But, we still need to fit a line through the cloud of points.

We use a procedure known as Least Squares [...]

We attempt to minimize the $\varepsilon$s.

[Figure: Blood Pressure (mm Hg), 65 to 95, against Age (yrs), 20 to 60]

[...] done in univariate analysis whereby we calculate the least squares for variance determinations.

$$\hat{y} = a + b x \qquad (1)$$

[...] of eq. 1 (y-hat is a predicted value of y for a value of x).

$$f(a, b) = \sum (y - \hat{y})^2 \qquad (2)$$

Goal is to minimize the function of eq. 2.

$$a = \bar{y} - b\bar{x} \qquad (3)$$

Solving simultaneously, you will find eq. 3.
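The step from eq. 2 to eq. 3 can be made explicit. The following is a sketch of the standard derivation, assuming the sums run over all $n$ observations and using the shorthand $S_{xx}$ and $S_{xy}$ that these notes define for the corrected sums of squares and products. Substituting eq. 1 into eq. 2 and setting both partial derivatives to zero gives:

```latex
\begin{align*}
\frac{\partial f}{\partial a} &= -2\sum (y - a - bx) = 0
  &&\Rightarrow\quad a = \bar{y} - b\bar{x} \\
\frac{\partial f}{\partial b} &= -2\sum x\,(y - a - bx) = 0
  &&\Rightarrow\quad b = \frac{\sum xy - (\sum x)(\sum y)/n}{\sum x^2 - (\sum x)^2/n}
     = \frac{S_{xy}}{S_{xx}}
\end{align*}
```

The first line is eq. 3; the second follows after substituting that expression for $a$ into the second condition.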

Least Squares Trend Line Fitting

Equation 3 is very notable, because it means that the least squares trend line MUST run through the mean of X and the mean of Y.

Defining the $(\bar{X}, \bar{Y})$ [...] Y-intercept provides a [...] be drawn.

[Figure: Blood Pressure (mm Hg), 65 to 95, against Age (yrs), 20 to 60, with two locations marked "?"]

- Procedure -

1. Determine $\sum x$, $\sum x^2$, $\sum y$, $\sum xy$

- Lexicon -

[...] clarify a shorthand notation:

$S_{yy}$ = sum of the squared-y deviations
$S_{xy}$ = sum of the products of deviations
$s^2_{y \cdot x}$ = variance of y about x

Least Squares Trend Line Fitting

- Example -

  x       y     x^2       xy
 31     7.8     961    241.8
 32     8.3    1024    265.6
 33     7.6    1089    250.8
 34     9.1    1156    309.4
 35     9.6    1225    336.0
 35     9.8    1225    343.0
 40    11.8    1600    472.0
 41    12.1    1681    496.1
 42    14.7    1764    617.4
 46    13.0    2116    598.0
Sums:
369   103.8   13841   3930.1

Mean x = 36.90
Mean y = 10.38

Lastly, we need to calculate:
Sxx: Sum of squared-x deviations
Sxy: Sum of xy-product deviations

- Example -

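$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{N} \qquad S_{xx} = 224.9$$

$$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{N} \qquad S_{xy} = 99.88$$

$$b = \frac{S_{xy}}{S_{xx}} \qquad b = 0.444$$

$$a = \bar{y} - b\bar{x} \qquad a = -6.0076$$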
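The hand calculation in this example can be checked in a few lines of R. This is a minimal sketch that retypes the ten (x, y) pairs from the table; the lowercase variable names are chosen here for illustration.

```r
# Data from the worked example (10 observations).
x <- c(31, 32, 33, 34, 35, 35, 40, 41, 42, 46)
y <- c(7.8, 8.3, 7.6, 9.1, 9.6, 9.8, 11.8, 12.1, 14.7, 13.0)
n <- length(x)

# Corrected sums of squares and cross-products.
Sxx <- sum(x^2) - sum(x)^2 / n
Sxy <- sum(x * y) - sum(x) * sum(y) / n

# Least squares slope and intercept.
b <- Sxy / Sxx
a <- mean(y) - b * mean(x)

round(c(Sxx = Sxx, Sxy = Sxy, b = b, a = a), 4)

# Cross-check against R's built-in fit; the two should agree.
coef(lm(y ~ x))
```

Working from the column totals rather than from individual deviations mirrors the procedure on the slides and gives the same slope and intercept as `lm()`.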

- Example -

Y = 0.444 X - 6.007

[Figure: the fitted line drawn for X from 0 to 50, with Y from -10 to 20]

Least Squares Trend Line Fitting

- Caveat -

You have just fitted your first regression line!

You have also created a linear model against which you could predict any value of y from x.

[...] model is only valid in the [...] examined.

[Figure: Hypothetical Data. Predicted response via the fitted line compared with the actual response, with the valid prediction range marked]

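OLS using R

> X
 [1] 28 23 52 42 27 29 43 34 40 28
> Y
 [1] 70 68 90 75 68 80 78 70 80 72
> model1 <- lm(Y ~ X)   # response (Y) on the left, predictor (X) on the right
> model1

Call:
lm(formula = Y ~ X)

Coefficients:
(Intercept)            X  
    53.6934       0.6187  

> plot(X, Y)            # scatterplot; abline() adds to an existing plot
> abline(lm(Y ~ X))     # overlay the least squares line

[Figure: scatterplot of Y (70 to 90) against X (25 to 50) with the fitted line]

> fitted(model1)
       1        2        3        4        5 
71.01666 67.92322 85.86517 79.67829 70.39797 
       6        7        8        9       10 
71.63535 80.29698 74.72879 78.44092 71.01666 
> resid(model1)
          1           2           3           4           5 
-1.01665799  0.07678293  4.13482561 -4.67829256 -2.39796981 
          6           7           8           9          10 
 8.36465383 -2.29698074 -4.72878709  1.55908381  0.98334201 
> segments(X, fitted(model1), X, Y)   # residuals drawn as vertical segments

[Figure: the same scatterplot with a vertical segment from each point to the fitted line]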
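The session can also be reproduced as a short script. The sketch below assumes the age and blood pressure vectors shown earlier; the object name `fit` is introduced here for illustration and is not part of the original session. In an R formula the response goes on the left, so `Y ~ X` regresses blood pressure on age, which is the model the fitted values and residuals listed above correspond to.

```r
X <- c(28, 23, 52, 42, 27, 29, 43, 34, 40, 28)  # age (years)
Y <- c(70, 68, 90, 75, 68, 80, 78, 70, 80, 72)  # blood pressure (mm Hg)

fit <- lm(Y ~ X)   # regress Y on X
coef(fit)          # intercept and slope

# Residuals are observed minus fitted values.
y_hat <- fitted(fit)
e <- resid(fit)
all.equal(unname(e), Y - unname(y_hat))

# Two properties of a least squares line with an intercept:
sum(e)                                  # residuals sum to (numerically) zero
predict(fit, data.frame(X = mean(X)))   # the line passes through (mean X, mean Y)
mean(Y)
```

The last two checks follow directly from eq. 3: because a = y-bar - b x-bar, the prediction at the mean of X is the mean of Y.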

Model Testing

[...] purposes, two conditions are necessary:

[...] (i.e., $\beta \neq 0$). In other words, the regression line needs to be a better predictor of y than the mean of y.

Model Testing

To meet the conditions for the regression of y on x:

[...] negligible error.

For each value of x there is a normal distribution of y values.

The distribution of y around each x must have similar variances (i.e., have $\sigma^2 = \sigma^2_{y \cdot x}$ [read as variance of y independent of x]).

The expected values of y for each x lie on a straight line.

Model Testing

[...] model:

$$y = \alpha + \beta x + \varepsilon$$

The errors ($\varepsilon$):
- normally distributed
- have a mean of zero and variance $\sigma^2_{y \cdot x}$
- independent of the xs
- independent of each other

Model Testing

Once again, recall that the $\varepsilon$s, or residuals, are represented by the difference between predicted y-hats and [...] are represented in the diagram by the [...]

[Figure: Blood Pressure (mm Hg), 65 to 95, against Age (yrs), 20 to 60]

Model Testing

Thus, the assumptions of linear regression are largely tied to the behavior of the residuals!

[...] examining the $\varepsilon$s:

[...]

(2) Check linearity by plotting the $\varepsilon$s against the predicted values of y (should have random scatter around $\varepsilon = 0$).

(3) Check equality of variance by plotting the $\varepsilon$s against the xs.

Normality

You may subject the residuals to the same measures of skewness, kurtosis, and tests of normality that we have previously used in univariate analysis.

Linearity

[...] expected values of y (y-hats) should produce a band centered around $\varepsilon = 0$.

[...] is indicative of a nonlinear trend.

Equality of Variance

Variances that are independent of x (i.e., homogeneous) will result in a horizontal band of points around $\varepsilon = 0$.

[...] dependent upon x will have a fan-shape.

Independence

[...] natural sequence in time or space, they may suffer from autocorrelation.

[...] order in time or space to check.
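The four residual checks (normality, linearity, equality of variance, independence) might be sketched in R as follows for the age and blood pressure data. Several things here are assumptions rather than part of the notes: the object name `fit`, the use of a normal Q-Q plot and the Shapiro-Wilk test for normality (the notes do not name a specific test), and the treatment of the listed order of the data as collection order.

```r
X <- c(28, 23, 52, 42, 27, 29, 43, 34, 40, 28)  # age (years)
Y <- c(70, 68, 90, 75, 68, 80, 78, 70, 80, 72)  # blood pressure (mm Hg)
fit <- lm(Y ~ X)
e <- resid(fit)

op <- par(mfrow = c(2, 2))   # 2 x 2 grid of plots

# Normality: points should fall close to the reference line.
qqnorm(e, main = "Normality")
qqline(e)

# Linearity: random scatter around zero, no curvature.
plot(fitted(fit), e, main = "Linearity",
     xlab = "Predicted y", ylab = "Residual")
abline(h = 0, lty = 2)

# Equality of variance: horizontal band, no fan shape.
plot(X, e, main = "Equality of variance", xlab = "X", ylab = "Residual")
abline(h = 0, lty = 2)

# Independence: no runs or trends when plotted in order.
plot(seq_along(e), e, type = "b", main = "Independence",
     xlab = "Order (assumed)", ylab = "Residual")
abline(h = 0, lty = 2)

par(op)   # restore the previous plotting layout

# Optional formal test of normality (choice of test is an assumption).
sw <- shapiro.test(e)
sw$p.value
```

With only ten observations these plots are indicative at best, so they are better read as a screen for gross violations than as proof that the assumptions hold.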

Model Testing

[...] from the assumptions, we have not proved the model correct, but there is no evidence that it is wrong.

[...] of outliers, transformations, or switching to another model.

Model Testing

[...] and we have not violated any obvious assumptions, we need to more closely examine the parameters of the model.

[...] slope because this provides information about the nature of the relationship between x and y.

If $\beta > 0$ then there is a positive relationship.
If $\beta < 0$ then there is a negative relationship.

Model Testing

- Beta -

[...] important question:

[...] significantly non-zero?

[...] and variance. Statistical tests have been developed to test the individual model parameters.

Model Testing

- Beta -

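[...] is horizontal or not, we must perform an explicit test of $H_0: \beta = 0$ using a form of t-test.

$H_a: \beta \neq 0$, $\beta < 0$, or $\beta > 0$

Test statistic:

$$t = \frac{b - \beta_0}{s_{y \cdot x} / \sqrt{S_{xx}}}$$

where:

$$b = S_{xy} / S_{xx}$$

$$s^2_{y \cdot x} = (S_{yy} - b\,S_{xy}) / (n - 2)$$

and:

$$S_{xx} = \sum x^2 - (\sum x)^2 / n$$

$$S_{yy} = \sum y^2 - (\sum y)^2 / n$$

$$S_{xy} = \sum xy - (\sum x \sum y) / n$$

For two-tailed test: Reject if $|t| \geq t_{\alpha/2,\, n-2}$

NB: df = N - 2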
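As a worked instance of this test, the sketch below applies the formulas to the age and blood pressure data, with age as X and blood pressure as Y. The variable names and the choice of a 0.05 significance level are assumptions made for illustration.

```r
X <- c(28, 23, 52, 42, 27, 29, 43, 34, 40, 28)  # age (years)
Y <- c(70, 68, 90, 75, 68, 80, 78, 70, 80, 72)  # blood pressure (mm Hg)
n <- length(X)

# Corrected sums of squares and cross-products.
Sxx <- sum(X^2) - sum(X)^2 / n
Syy <- sum(Y^2) - sum(Y)^2 / n
Sxy <- sum(X * Y) - sum(X) * sum(Y) / n

b <- Sxy / Sxx                                  # sample slope
s2_yx <- (Syy - b * Sxy) / (n - 2)              # variance of y about x
t_stat <- (b - 0) / (sqrt(s2_yx) / sqrt(Sxx))   # H0: beta = 0

alpha <- 0.05                                   # assumed significance level
t_crit <- qt(1 - alpha / 2, df = n - 2)         # two-tailed critical value
p_val <- 2 * pt(-abs(t_stat), df = n - 2)       # two-tailed P value

c(t = t_stat, t_crit = t_crit, p = p_val)
abs(t_stat) >= t_crit                           # TRUE means reject H0
```

The resulting t and P value are the same ones `summary()` reports for the slope of a fitted `lm` object, so the two can be compared directly.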

Model Testing

- Alpha, Beta, y-hat -

> summary(model1)

Call:
lm(formula = Y ~ X)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.7288 -2.3727 -0.4699  1.4151  8.3647 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  53.6934     5.5153   9.735 1.04e-05 ***
X             0.6187     0.1545   4.004  0.00393 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared:  0.6671,	Adjusted R-squared:  0.6255 
F-statistic: 16.03 on 1 and 8 DF,  p-value: 0.003928

> par(mfrow=c(2,2))

> plot(model1)

[Figure: Age vs. BP Example. The four diagnostic plots produced by plot(), with panels showing residuals, standardized residuals, and Cook's distance]

Model Testing

- Confidence Intervals -

[...] all points do not lie exactly along the fitted line.

Often, we wish to place 95% CIs around our best fit trend line.

[...] x* values and determine the $CI_{0.95}$ for each point and then replotting data. Note that if you do this, you get a pair of flared lines.

Model Testing

- Confidence Intervals -

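[...] lines referred to as the $PI_{0.95}$.

[...] predict a single y from a single x*.

$$CI_{0.95}: \hat{y} \pm t_{\alpha/2,\, n-2}\; s_{y \cdot x} \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$$

$$PI_{0.95}: \hat{y} \pm t_{\alpha/2,\, n-2}\; s_{y \cdot x} \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$$

The 1 lessens the influence of the means, hence less flare on plot.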
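Both interval formulas can be evaluated by hand and compared with what `predict()` returns. The sketch below uses the age and blood pressure data; the value `x_star = 40` and all object names are hypothetical choices made for illustration.

```r
X <- c(28, 23, 52, 42, 27, 29, 43, 34, 40, 28)  # age (years)
Y <- c(70, 68, 90, 75, 68, 80, 78, 70, 80, 72)  # blood pressure (mm Hg)
fit <- lm(Y ~ X)
n <- length(X)

Sxx <- sum((X - mean(X))^2)                  # sum of squared-x deviations
s_yx <- sqrt(sum(resid(fit)^2) / (n - 2))    # standard deviation of y about x
t_crit <- qt(0.975, df = n - 2)              # t for a 95% interval

x_star <- 40                                 # hypothetical x* value
y_hat <- unname(predict(fit, data.frame(X = x_star)))

# Half-widths: the extra 1 under the root is what widens the PI.
half_ci <- t_crit * s_yx * sqrt(1 / n + (x_star - mean(X))^2 / Sxx)
half_pi <- t_crit * s_yx * sqrt(1 + 1 / n + (x_star - mean(X))^2 / Sxx)

conf_int <- c(y_hat - half_ci, y_hat + half_ci)
pred_int <- c(y_hat - half_pi, y_hat + half_pi)
rbind(conf_int, pred_int)

# The same intervals from R's built-in method.
predict(fit, data.frame(X = x_star), interval = "confidence")
predict(fit, data.frame(X = x_star), interval = "prediction")
```

Evaluating the half-widths over a grid of x* values instead of a single point gives the pair of flared lines described in the notes, since the (x* - x-bar) term grows away from the mean of X.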

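> plot(X, Y)                 # scatterplot to draw on
> abline(model1)             # fitted regression line
> XV <- seq(20, 65, 5)       # x* values at which to evaluate the band
> YV <- predict(model1, list(X = XV), interval = "confidence")
> matlines(XV, YV, lty = c(1, 2, 2))   # fit (solid) and 95% CI limits (dashed)

[Figure: scatterplot of Y (70 to 90) against X (25 to 50) with the fitted line and its 95% confidence limits]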

Nonparametric Regression

[...] unable to correct either a variance or normality assumption (via transformation), it may be appropriate to conduct a nonparametric regression.

[...] Kendall's robust line-fit method.

Kendall's Robust Line-fit Method

- Procedure -

[...] $i = 1$ to $n - 1$ and $j > i$:

$$S_{ji} = \frac{Y_j - Y_i}{X_j - X_i}$$

There will be $n(n-1)/2$ slope estimates per sample.

- Procedure -

[...] estimate b of the slope is the MEDIAN of the $S_{ji}$ values.

[...] ordering test) can be used to test b for significance.

[...] $Y_i - bX_i$ and again choose the MEDIAN.

- Example -

Data Set:

   X      Y
  0.0   8.98
 12.0   8.14
 29.5   6.67
 43.0   6.08
 53.0   5.90
 62.5   5.83
 75.5   4.68
 85.0   4.20
 93.0   3.72

Calculate the $S_{ji}$s:

S21 = (8.14 - 8.98)/(12 - 0) = -0.07000
S32 = (6.67 - 8.14)/(29.5 - 12) = -0.08400
...
S31 = (6.67 - 8.98)/(29.5 - 0) = -0.07831
...
S91 = (3.72 - 8.98)/(93.0 - 0) = -0.05656

Median of the 36 slopes: b = -0.05436

Kendall's Robust Line-fit Method

- Example -

[...] using $Y_i - bX_i$.
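The whole procedure fits in a few lines of R. This is a minimal sketch using the nine (X, Y) pairs from the example; the use of `combn()` to enumerate the index pairs and the object names are choices made here, not part of the notes.

```r
X <- c(0.0, 12.0, 29.5, 43.0, 53.0, 62.5, 75.5, 85.0, 93.0)
Y <- c(8.98, 8.14, 6.67, 6.08, 5.90, 5.83, 4.68, 4.20, 3.72)

# All index pairs (i, j) with j > i, one pair per column.
idx <- combn(length(X), 2)
i <- idx[1, ]
j <- idx[2, ]

# Pairwise slopes S_ji = (Y_j - Y_i) / (X_j - X_i).
S <- (Y[j] - Y[i]) / (X[j] - X[i])
length(S)                  # n(n-1)/2 slopes

b <- median(S)             # robust slope estimate
a <- median(Y - b * X)     # robust intercept: median of Y_i - b X_i

c(slope = b, intercept = a)
```

Because both estimates are medians, a single extreme point shifts them far less than it would shift the least squares slope and intercept. Note that the sketch assumes no two X values are equal; tied X values would produce a division by zero.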

Correlation

There are many purposes to regression, but the main one is for prediction. Thus, the goal is to determine the NATURE of the relationship between two variables.

[...] to determine the STRENGTH of the relationship, one would do a correlation analysis.

[...] mutually exclusive, techniques.

Correlation

[...] same sample statistics as used in regression:

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$

In all cases, $-1 \leq r \leq +1$

$r = 0$ is no relationship
$|r| = 1$ is a perfect relationship (pos. or neg.)

Coefficient of Determination

[...] regression, let's return to regression momentarily to tie up a loose end.

[...] the linear association between x and y, we frequently refer to the coefficient of determination, $R^2$:

$$R^2 = \frac{S_{xy}^2}{S_{xx} S_{yy}}$$
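Both r and R-squared come from the same three sums, which can be confirmed numerically. The sketch below uses the age and blood pressure data from the regression example; the variable names are assumptions made for illustration.

```r
X <- c(28, 23, 52, 42, 27, 29, 43, 34, 40, 28)  # age (years)
Y <- c(70, 68, 90, 75, 68, 80, 78, 70, 80, 72)  # blood pressure (mm Hg)
n <- length(X)

Sxx <- sum(X^2) - sum(X)^2 / n
Syy <- sum(Y^2) - sum(Y)^2 / n
Sxy <- sum(X * Y) - sum(X) * sum(Y) / n

r  <- Sxy / sqrt(Sxx * Syy)    # correlation coefficient
R2 <- Sxy^2 / (Sxx * Syy)      # coefficient of determination

c(r = r, R2 = R2)

# Cross-checks against R's built-in functions.
cor(X, Y)
summary(lm(Y ~ X))$r.squared
```

For simple linear regression the two are tied together exactly: R-squared is the square of r.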

Coefficient of Determination

[...] explained by the relationship. Knowledge of x permits knowledge of y.

[...] of x permits no insight into y.

[...] the correlation coefficient) but, to clearly differentiate the two, use a capital R for regression and a lowercase r for correlation.

Correlation

[...] regression:

Both x and y contain sampling variability.
For each value of x there is a normal distribution of ys.
For each value of y there is a normal distribution of xs.
The x distributions have the same variance.
The y distributions have the same variance.
The joint distribution of x and y is bivariate normal.

Regression Model vs. Correlation Model

Nonparametric Correlation

[...] assumed that we are referring to the parametric correlation coefficient, which is most correctly referred to as the Pearson Product Moment Correlation Coefficient.

[...] correlation coefficients to deal with different situations. One, most often used when the parametric assumptions fail, is the nonparametric Spearman's Rank Correlation Coefficient.

- Procedure -

[...] independent of the ranking of the y variable)

$H_a: E(r_s) \neq 0$, $E(r_s) > 0$, or $E(r_s) < 0$

$$r_s = 1 - \frac{6 \sum d^2}{N(N^2 - 1)} \quad \text{with } d = r_x - r_y \text{ (diff. in x, y ranks)}$$

Test statistic: $z = r_s \sqrt{n - 1}$

[...] ranks, or ordinal data.

Spearman's Rank Correlation

- Example -

[...] at the end of an endurance experiment.

 i   r_x   r_y    d    d^2
 1    4     4     0     0
 2    1     2    -1     1
 3    6     5     1     1
 4    5     6    -1     1
 5    3     1     2     4
 6    2     3    -1     1
 7    7     7     0     0
                Sum d^2 = 8

$$r_s = 1 - \frac{6 \sum d^2}{N(N^2 - 1)}$$

$r_s = 1 - (6)(8)/[(7)(48)]$
$r_s = 0.857$

$z = 2.099$, $P = 0.018$

Reject $H_0$
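The arithmetic in this example can be reproduced directly from the two columns of ranks. In the sketch below the vector names are assumptions, and the P value is computed as a one-tailed normal probability, which is the reading that matches the P = 0.018 reported in the example.

```r
rx <- c(4, 1, 6, 5, 3, 2, 7)   # ranks of the first variable
ry <- c(4, 2, 5, 6, 1, 3, 7)   # ranks of the second variable
N <- length(rx)

d <- rx - ry                                  # difference in ranks
rs <- 1 - 6 * sum(d^2) / (N * (N^2 - 1))      # Spearman's rank correlation
z <- rs * sqrt(N - 1)                         # large-sample test statistic
p_one_tailed <- pnorm(z, lower.tail = FALSE)  # upper-tail P value

c(sum_d2 = sum(d^2), rs = rs, z = z, p = p_one_tailed)

# Cross-check: without ties the shortcut formula equals cor() on the ranks.
cor(rx, ry, method = "spearman")
```

With only seven pairs the normal approximation is rough, so exact tables or `cor.test(rx, ry, method = "spearman")` may be preferable in practice.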
