
Contents:

A. HYPOTHESES TESTING

6.1 Introduction

6.1.2 Important Concepts in Hypothesis Testing

6.1.3 Tests of Hypotheses using Critical Region

6.2.2 Two Samples - Test Concerning the Difference between Two Means

6.6 Estimating Model Parameters

6.7 The Coefficient of Determination & Correlation

TMA1111 Mathematical Techniques Faculty of Information Science & Technology

A. HYPOTHESES TESTING

6.1 Introduction

While doing a particular research, one may propose a hypothesis (assumption), and

then design an experiment and collect data in order to carry out hypothesis testing.

Data may support the research hypothesis or

Data may not support the research hypothesis.

Since a conclusion is reached based on data from a sample of the population, there

always exists a chance that our conclusion about the hypothesis may turn out to be

wrong!

A statistical hypothesis is a claim or assertion concerning one or more populations.

- Examples of Claims:

• At least 50% of the students will skip the morning class on Friday

• The mean lifetime of a certain light bulb is 8000 hours

• The mean of batch 1 is different from the mean of batch 2.

• The variance of batch 1 is not different from the variance of batch 2.

• The defective percentage is less than 2%.

Evidence from the sample that is inconsistent with the stated hypothesis leads to a

rejection of the hypothesis, whereas evidence supporting the hypothesis leads to its

acceptance.

The null hypothesis, denoted by H0, is the claim that is initially favoured as true.

The Alternative hypothesis, denoted by H1 (or Ha) is the assertion that is contrary to

Ho.

A test of hypothesis is a method of using sample data to decide whether the null

hypothesis H0 should be rejected (in favour of the alternative hypothesis).

TCK (2016) Page 2 of 41

A test of hypothesis leads to one of two conclusions:

reject Ho or

fail to reject Ho.

Ho is rejected in favour of H1 only if the sample evidence suggests that Ho is false. Otherwise, we continue to believe in the truth of H0.

H0: θ = θ0

H1: θ ≠ θ0 OR θ > θ0 OR θ < θ0

Here θ denotes the population parameter being tested and θ0 its hypothesized value.

The parameters used to form the hypotheses are population parameters, for example: mean (μ), proportion (p), and variance (σ²) or standard deviation (σ).

Given a scenario, you must read the scenario carefully and determine the claim that

you want to test (refer Table 1).

The Ho always carries the equal (=) sign (refer to the column of Ho).

If the claim suggests a simple direction such as more than, less than, superior to,

inferior to, and so on, then H1 will be stated using the inequality symbol (< or >)

corresponding to the suggested direction (refer to row 2 and row 4).

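If the claim suggests a compound direction (equality as well as direction) such as at least, equal to or greater, at most, no more than, and so on, then this entire compound direction (≤ or ≥) is expressed as Ho, but using only the equality (=) sign, and H1 is given by the opposite direction (refer to row 1 and row 3).

If the claim suggests no direction whatsoever, then H1 is stated using the not equal symbol (≠) (refer to row 5).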

Row   Claim              Ho   H1
1     ≥   At least       =    <
2     >   More than      =    >
3     ≤   At most        =    >
4     <   Less than      =    <
5     ≠   Not equal      =    ≠

Table 1. The Claims and the Hypotheses

Example 6.1:

State the null and alternative hypothesis to be used in testing the claim:

(a) A manufacturer of a certain brand of rice cereal claims that the average

saturated fat content does not exceed 1.5 grams.

Claim: μ ≤ 1.5, Ho : μ = 1.5 vs. H1 : μ > 1.5

(b) A manufacturer of a certain brand of rice cereal claims that the average

saturated fat content is more than 1.5 grams.

Claim: μ > 1.5, Ho : μ = 1.5 vs. H1 : μ > 1.5

(c) A manufacturer of a certain brand of rice cereal claims that the average

saturated fat content is at least 1.5 grams.

Claim: μ ≥ 1.5, Ho : μ = 1.5 vs. H1 : μ < 1.5

(d) A manufacturer of a certain brand of rice cereal claims that the average

saturated fat content is less than 1.5 grams.

Claim: μ < 1.5, Ho : μ = 1.5 vs. H1 : μ < 1.5

(e) A real estate agent claims that 60% of all private residences being built today

are 3-bedroom homes. To test this claim, a large sample of new residences is

inspected; the proportion of these homes with 3 bedrooms is recorded and used

as our statistic.

Claim: p = 0.6, Ho : p = 0.6 vs. H1 : p ≠ 0.6

Example 6.2:

For the following pairs of assertions, indicate which do not comply with our rules for

setting up hypotheses and why.

These hypotheses comply with our rules

(b) Ho : σ = 20, H1 : σ ≤ 20

H1 includes the equality claim (σ ≤ 20), which contradicts Ho. Does not comply.

Ho should contain the equality claim, so these are not legitimate. Does not comply.

(d) Ho : μ = 120, H1 : μ = 150

We are not allowing both Ho and H1 to be equality claims. Does not comply.

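A test procedure is specified by the following:

1. A test statistic, a function of the sample data on which the decision (reject Ho or do not reject Ho) is to be based.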

2. A critical region (or rejection region), the set of all test statistic values for

which H0 will be rejected (null hypothesis is rejected if and only if the test

statistic value falls in this region.)

A test statistic is the sample statistic that is used in the hypothesis testing process.

The calculated value of the test statistic is used for either rejecting or accepting the

null hypothesis. Examples of test statistic:

Mean, μ:

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, and $T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$

Proportion, p:

$Z = \frac{\hat{p} - p}{\sqrt{pq/n}}$

Variance, σ²:

$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$

The decision procedure could lead to either of two wrong conclusions. So, there are

two types of errors.

P(Type I error) = P(Reject Ho when it is true) = α

P(Type II error) = P(Accept Ho when Ho is false) = β

               Accept Ho        Reject Ho
Ho is true     Correct          Type I error
Ho is false    Type II error    Correct

The probability of a Type I error is called the level of significance, denoted by α (alpha).

A test of a statistical hypothesis, where the region of rejection is on only one side of

the sampling distribution of the test statistic, is called a one-tailed test. {Note:

For one-tailed test, it can be upper-tailed test/ right-tailed test, or lower-tailed

test/ left-tailed test}

A test of a statistical hypothesis, where the region of rejection is on both sides of the sampling distribution of the test statistic, is called a two-tailed test.


tailed test.

The critical region is chosen according to three possible cases (upper-tailed test/

right-tailed test, or lower-tailed test/ left-tailed test, or two tailed test), illustrated

with a test statistic that is a standard normal random variable under H0.

A test of any statistical hypothesis where H1 is one-sided is a one-tailed test, for example:

H0 : θ = θ0 vs H1 : θ > θ0 (one-sided test; right-tailed test as “>” is used in H1).

The critical region: Reject Ho if Z ≥ Zα (upper-tailed test/ right-tailed test)

H0 : θ = θ0 vs H1 : θ < θ0 (one-sided test; left-tailed test as “<” is used in H1)

The critical region: Reject Ho if Z ≤ −Zα (lower-tailed test/ left-tailed test)

A test where H1 is two-sided is a two-tailed test:

H0 : θ = θ0 vs H1 : θ ≠ θ0 (two-tailed test as “≠” is used in H1)

The critical region: Reject Ho if Z ≥ Zα/2 or Z ≤ −Zα/2

Example 6.3:

State the null and alternative hypothesis to be used in testing the claim and determine

where the critical region is located:

(a) A manufacturer of a certain brand of rice cereal claims that the average

saturated fat content does not exceed 1.5 grams. State the null and

alternative hypothesis to be used in testing this claim.

H0 : μ = 1.5 vs. H1 : μ > 1.5 (One-tailed test/ right-tailed test)

Critical region: Reject H0 if z ≥ zα

(b) A manufacturer of a certain brand of rice cereal claims that the average

saturated fat content is more than 1.5 grams. State the null and alternative

hypothesis to be used in testing this claim.

H0 : μ = 1.5 vs. H1 : μ > 1.5 (One-tailed test/ right-tailed test)

Critical region: Reject H0 if z ≥ zα

(c) A manufacturer of a certain brand of rice cereal claims that the average

saturated fat content is at least 1.5 grams. State the null and alternative

hypothesis to be used in testing this claim.

H0 : μ = 1.5 vs. H1 : μ < 1.5 (One-tailed test/left-tailed test)

Critical region : Reject H0 if z ≤ −zα

(d) A real estate agent claims that 60% of all private residences being built

today are 3-bedroom homes. To test this claim, a large sample of new

residences is inspected; the proportion of these homes with 3 bedrooms is

recorded and used as our statistic.

H0 : p = 0.6 vs. H1 : p ≠ 0.6 (Two-tailed test)

Critical region : Reject H0 if z ≤ −zα/2 or z ≥ zα/2

1. State the null hypothesis and the alternative hypothesis:

H0 : θ = θ0 vs H1 : θ ≠ θ0 (for 2-tailed test)

Or H1 : θ > θ0 (for right-tailed test)

Or H1 : θ < θ0 (for left-tailed test)

2. Determine whether it is a one or two-tailed test by referring to H1.

3. Decide on the sampling distribution of the test statistic under H0.

4. State the critical region for the selected significance level, α (or draw the curve).

5. Give the formula for the test statistic. Compute the value of the test statistic from

the sample data.

6. Decide whether H0 should be rejected and state this conclusion in the problem

context.

Suppose we have a sample of size n taken from a population whose mean is μ and variance σ². We want to test whether this sample is taken from a population whose mean is μ0. We know that the sample mean $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ if n is large.

Steps:

(1) H0 : μ = μ0 vs H1 : μ ≠ μ0 (for 2-tailed test)

or H1 : μ > μ0 (for right-tailed test)

or H1 : μ < μ0 (for left-tailed test)

(2) Determine whether it is a one or two-tailed test by referring to H1.

(3) Use z distribution.

(4) Critical region :

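Alternative hypothesis    Critical/Rejection Region for Level α Test
H1 : μ > μ0               z ≥ zα (upper-tailed test/right-tailed test)
H1 : μ < μ0               z ≤ −zα (lower-tailed test/left-tailed test)
H1 : μ ≠ μ0               z ≥ zα/2 or z ≤ −zα/2 (two-tailed test)

(5) Test statistic, $Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$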

(6) Decision & Conclusion.

The steps are the same as Case 1a, but replace σ with s in the test statistic if σ is unknown, i.e. $Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$.

Suppose we have a sample of size n, where n < 30, taken from a normal population whose mean is μ and variance (σ²) unknown. We want to test whether this sample is taken from a population whose mean is μ0.

Steps:

(1) H0 : μ = μ0 vs H1 : μ ≠ μ0 (for 2-tailed test)

or H1 : μ > μ0 (for right-tailed test)

or H1 : μ < μ0 (for left-tailed test)

(2) Determine whether it is a one or two-tailed test by referring to H1.

(3) Use t-distribution because n < 30 and σ² is unknown.

(4) Critical region :

Alternative hypothesis    Critical/Rejection Region for Level α Test
H1 : μ > μ0               T ≥ tα, n−1 (upper-tailed test)
H1 : μ < μ0               T ≤ −tα, n−1 (lower-tailed test)
H1 : μ ≠ μ0               T ≥ tα/2, n−1 or T ≤ −tα/2, n−1 (two-tailed test)

(5) Use t-distribution. Test statistic, $T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$.

(6) Decision & Conclusion.

Example 6.4:

Suppose that it is known from experience that the standard deviation of the 8-cm

diameter CDs made by a certain company is 0.16 cm. To check whether its

production is under control on a given day, namely, to check whether the true average

diameter of the CD is 8 cm, the employee selected a random sample of 25 pieces of

CDs and finds that their mean diameter is x̄ = 8.091 cm. Since the company stands to lose money when μ > 8 and the customer loses out when μ < 8, test the null hypothesis μ = 8 against the alternative hypothesis μ ≠ 8 at α = 0.01.

Solution:

(i) Hypothesis : H0 : μ = 8

H1 : μ ≠ 8 (2-tailed test)

(ii) σ² is known → z-distribution

(iii) Critical region :

α = 0.01 → α/2 = 0.01/2 = 0.005 (α is divided by 2 because this is a 2-tailed test)

z0.005 = 2.57

Reject H0 if z ≤ −2.57 or z ≥ 2.57

(iv) Test statistic : $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{8.091 - 8}{0.16/\sqrt{25}} = 2.8438$

(v) Since z > 2.57, Reject H0.

{Note: We can also draw a normal curve and shade the critical region:

[Figure: standard normal curve with the critical regions shaded below −2.57 and above 2.57; z = 2.8438 lies in the upper critical region.]

Z = 2.8438 falls into the critical region. Reject H0.}

{Note: Rejecting H0 means we accept H1, so use H1 to state the conclusion.}

(vi) Conclusion: μ ≠ 8, or μ > 8 and the company stands to lose money.
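The calculation in Example 6.4 can be checked with a short Python sketch. The function name `one_sample_z_test` and its `tail` argument are illustrative assumptions, not part of the notes; the critical value is computed with the standard library's `NormalDist` instead of being read from a table, so it comes out as about 2.576 rather than the tabulated 2.57. The decision is unchanged.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(x_bar, mu0, sigma, n, alpha, tail="two"):
    """Return (z, critical value, reject H0?) for a z-test on one mean.

    tail is "two", "right" or "left" (hypothetical argument values).
    """
    z = (x_bar - mu0) / (sigma / sqrt(n))
    if tail == "two":
        z_crit = NormalDist().inv_cdf(1 - alpha / 2)
        reject = abs(z) >= z_crit
    elif tail == "right":
        z_crit = NormalDist().inv_cdf(1 - alpha)
        reject = z >= z_crit
    else:  # left-tailed test
        z_crit = NormalDist().inv_cdf(1 - alpha)
        reject = z <= -z_crit
    return z, z_crit, reject


# Example 6.4: sigma = 0.16, n = 25, sample mean 8.091, H0: mu = 8, alpha = 0.01
z, z_crit, reject = one_sample_z_test(8.091, 8, 0.16, 25, 0.01, tail="two")
print(f"z = {z:.4f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

The same function covers Case 1b by passing the sample standard deviation s in place of sigma when n is large.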

Example 6.5:

The daily yield for a local chemical plant has averaged 880 tons for the last several

years. The quality control manager would like to know whether this average has

changed in recent months. She randomly selects 50 days from the computer database

and computes the average and standard deviation as 871 and 21 tons respectively.

Test the appropriate hypothesis using α = 0.05.

Solution:

Let X be the daily yield for the local chemical plant. Given : n = 50, x̄ = 871, s = 21

(i) Ho : μ = 880 vs. H1 : μ ≠ 880 (2-tailed test)

Note: use ≠ in H1 because the claim is “whether this average has changed...”, which indicates that it may be either greater or lower.

(ii) n > 30 → z-distribution

(iii) α = 0.05 → α/2 = 0.05/2 = 0.025 → z0.025 = 1.96

Critical region : Reject H0 if z ≤ −1.96 or z ≥ 1.96

(iv) Test statistic : $z = \frac{871 - 880}{21/\sqrt{50}} = -3.0305$

(v) Since z = −3.0305 falls into the critical region {or z < −1.96}, Reject H0.

(vi) Conclusion: μ ≠ 880, or μ < 880 and the average has changed to a value lower than 880 tons.

Example 6.6:

The specification for a certain kind of ribbon calls for a mean breaking strength of 185 pounds. If five pieces randomly selected from different rolls have a mean breaking strength of 183.14 pounds and a standard deviation of 8.219 pounds, test the null hypothesis μ = 185 pounds against μ < 185 pounds at α = 0.05.

Solution:

Given: n = 5, x̄ = 183.14, s = 8.219.

(i) H0 : μ = 185 vs. H1 : μ < 185 (left-tailed test)

(ii) σ² is unknown, n < 30 → t-distribution

(iii) α = 0.05 → t0.05, 4 = 2.132

Critical region : Reject H0 if t ≤ −2.132

(iv) Test statistic : $t = \frac{183.14 - 185}{8.219/\sqrt{5}} = -0.506$

(v) Since t > −2.132 (t = −0.506 does not fall into the critical region), do not reject H0 (or we can write accept H0).

(vi) Conclusion: the mean breaking strength is not significantly less than 185 pounds.
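As a rough check of Example 6.6, the t statistic can be computed in Python. The helper name `one_sample_t_statistic` is an assumption made for illustration. The Python standard library has no t-distribution quantile function, so the critical value 2.132 is simply copied from the t table used in the example.

```python
from math import sqrt


def one_sample_t_statistic(x_bar, mu0, s, n):
    """t statistic for a single mean when sigma is unknown and n is small."""
    return (x_bar - mu0) / (s / sqrt(n))


# Example 6.6: n = 5, sample mean 183.14, s = 8.219, H0: mu = 185
t = one_sample_t_statistic(183.14, 185, 8.219, 5)

t_crit = 2.132          # t_{0.05, 4}, taken from the t table in the example
reject = t <= -t_crit   # left-tailed test

print(f"t = {t:.3f}, reject H0: {reject}")
```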


6.2.2 Two Samples: Test Concerning the Difference between Two Means

Two independent samples of sizes n1 and n2 are taken from populations with means μ1, μ2 and variances σ1² and σ2². To test whether these samples are taken from populations whose means are equal:

Steps:

(1) For 2-tailed test:

H0 : μ1 = μ2 vs H1 : μ1 ≠ μ2 or

H0 : μ1 − μ2 = 0 vs H1 : μ1 − μ2 ≠ 0

For right-tailed test:

H0 : μ1 = μ2 vs H1 : μ1 > μ2 or

H0 : μ1 − μ2 = 0 vs H1 : μ1 − μ2 > 0

For left-tailed test:

H0 : μ1 = μ2 vs H1 : μ1 < μ2 or

H0 : μ1 − μ2 = 0 vs H1 : μ1 − μ2 < 0

(2) Determine whether it is a one or two-tailed test by referring to H1.

(3) Use Z-distribution.

(4) For a particular value of α, determine the critical region.

Critical region :

Alternative hypothesis    Critical/Rejection Region for Level α Test
H1 : μ1 − μ2 > 0          z ≥ zα (upper-tailed test/right-tailed test)
H1 : μ1 − μ2 < 0          z ≤ −zα (lower-tailed test/left-tailed test)
H1 : μ1 − μ2 ≠ 0          z ≥ zα/2 or z ≤ −zα/2 (two-tailed test)

(5) Test statistic $Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

(6) Conclusion.

If σ1² and σ2² are unknown and both samples are large, replace σ1 and σ2 with s1 and s2 in the test statistic.

Example 6.7:

A company claims that its light bulbs are superior to those of its main competitor. If

a study showed that a sample of 40 of its bulbs has a mean lifetime of 647 hours of

continuous use with a standard deviation of 27 hours, while a sample of 40 bulbs

made by main competitor had a mean lifetime of 638 hours of continuous use with a

standard deviation of 31 hours. Does this substantiate the claim at the 0.05 level of

significance?

Solution:

Let X1 and X2 be the lifetimes of the light bulbs made by that company and its main competitor respectively.

Given: n1 = 40, x̄1 = 647, s1 = 27

n2 = 40, x̄2 = 638, s2 = 31

(i) H0 : μ1 = μ2 vs. H1 : μ1 > μ2 (right-tailed test)

(or we can write H0 : μ1 − μ2 = 0 vs. H1 : μ1 − μ2 > 0)

(ii) σ² unknown, n1, n2 > 30 → z-distribution

(iii) α = 0.05 → z0.05 = 1.645

Critical region : Reject H0 if z ≥ 1.645

(iv) Test statistic : $z = \frac{(647 - 638) - 0}{\sqrt{\frac{27^2}{40} + \frac{31^2}{40}}} = 1.3846$

(v) Since z < 1.645 (z = 1.3846 does not fall into the critical region), do not reject H0.

(vi) Conclusion: At the 0.05 level of significance, the data do not substantiate the claim that the company's light bulbs are superior to those of its main competitor.
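The two-sample statistic of Example 6.7 can be reproduced with the following sketch. The function name `two_sample_z` and the parameter `d0` (the hypothesized difference μ1 − μ2 under H0) are illustrative assumptions; the sample standard deviations stand in for σ1 and σ2 because both samples are large.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(x1_bar, x2_bar, s1, s2, n1, n2, d0=0.0):
    """z statistic for the difference between two means (large samples)."""
    standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
    return ((x1_bar - x2_bar) - d0) / standard_error


# Example 6.7: company bulbs (647, 27, n = 40) vs competitor (638, 31, n = 40)
z = two_sample_z(647, 638, 27, 31, 40, 40)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha)  # right-tailed critical value
reject = z >= z_crit

print(f"z = {z:.4f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```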

When each observation in a random sample of size n (n large) results in one of two possible outcomes, and p̂ is the sample proportion of “successes”, we can test whether the sample could have been drawn from a population whose proportion of “successes” is po. For this we use the hypothesis test about a proportion.

Steps:

(1) H0 : p = p0 vs H1 : p ≠ p0 (for 2-tailed test)

or H1 : p > p0 (for right-tailed test)

or H1 : p < p0 (for left-tailed test)

(2) Determine whether it is a one or two-tailed test by referring to H1.

(3) Use z-distribution.

(4) For the required α, the critical region is determined.

Critical region :

Alternative hypothesis    Critical/Rejection Region for Level α Test
H1 : p > p0               z ≥ zα (upper-tailed test/right-tailed test)
H1 : p < p0               z ≤ −zα (lower-tailed test/left-tailed test)
H1 : p ≠ p0               z ≥ zα/2 or z ≤ −zα/2 (two-tailed test)

(5) Test statistic $Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}}$, where $q_0 = 1 - p_0$

(6) Conclusion regarding the acceptance/rejection of null hypothesis based on

rejection criteria.

Example 6.8:

If 4 out of 20 patients suffered serious side effects from a new medication, test the null hypothesis p = 0.5 against the alternative hypothesis p ≠ 0.5 at α = 0.05. Here, p is the true proportion of patients suffering side effects from the new medication.

Solution:

(i) H0 : p = 0.5 vs. Ha : p ≠ 0.5 (2-tailed test)

(ii) α = 0.05 → α/2 = 0.05/2 = 0.025 → z0.025 = 1.96

Critical region : Reject H0 if z ≤ −1.96 or z ≥ 1.96

(iii) Test statistic : n = 20, po = 0.5, qo = 1 − po = 0.5, p̂ = 4/20 = 0.2, and thus

$z = \frac{0.2 - 0.5}{\sqrt{\frac{(0.5)(0.5)}{20}}} = -2.68$

(iv) Since z < −1.96, Reject H0.

(Note: We can also draw a normal curve and shade the critical region, as in the previous examples, to decide whether to reject or accept H0.)

(v) Conclusion: p ≠ 0.5
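A minimal Python sketch of the proportion test in Example 6.8 follows. The function name `proportion_z` is a hypothetical helper, not something defined in the notes; the critical value is computed with `NormalDist` rather than read from the table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(successes, n, p0):
    """z statistic for testing H0: p = p0 from a sample proportion."""
    p_hat = successes / n
    q0 = 1 - p0
    return (p_hat - p0) / sqrt(p0 * q0 / n)


# Example 6.8: 4 of 20 patients with side effects, H0: p = 0.5, alpha = 0.05
z = proportion_z(4, 20, 0.5)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
reject = abs(z) >= z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```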

To test whether the population variance σ² equals a specific value σ0², given the sample variance s²:

Steps:

(1) H0 : σ² = σ0² vs H1 : σ² ≠ σ0² (Two-tailed test)

Or H1 : σ² > σ0² (Right-tailed test)

Or H1 : σ² < σ0² (Left-tailed test)

(2) Determine whether it is a one or two-tailed test by referring to H1.

(3) Use χ² distribution

(4) Critical region for the required value of α and given n, based on the χ² distribution table.

Critical Region:

Hypotheses                          Critical/Rejection Region for Level α Test
H0: σ² = σ0² vs H1: σ² > σ0²        χ² > χ²α, n−1
H0: σ² = σ0² vs H1: σ² < σ0²        χ² < χ²1−α, n−1
H0: σ² = σ0² vs H1: σ² ≠ σ0²        χ² > χ²α/2, n−1 or χ² < χ²1−α/2, n−1

The rules in the table above are based on the following diagrams:

[Diagrams: χ² density curves with the corresponding rejection regions shaded.]

(5) Test statistic, $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$.

(6) Conclusion regarding the acceptance/rejection of null hypothesis based on

rejection criteria.

Example 6.9:

Suppose that the thickness of a part used in a semiconductor is its critical dimension

and that measurements of the thickness of a random sample of 18 such parts have the

variance s2 = 0.68. The process is said to be under control if the variation of the

thickness is given by a variance not greater than 0.36. Assuming that the

measurements constitute a random sample from a normal population, test the

null hypothesis σ² = 0.36 against the alternative σ² > 0.36 at α = 0.05.

Solution:

n = 18, s² = 0.68

(i) H0 : σ² = 0.36 vs. H1 : σ² > 0.36 (right-tailed test)

(ii) α = 0.05 → χ²0.05, 17 = 27.587

Critical region : Reject H0 if χ² > 27.587

Or draw diagram:

[Figure: χ² density curve with the area α = 0.05 shaded to the right of χ² = 27.587.]

(iii) Test statistic : $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{(18-1)(0.68)}{0.36} = 32.111$

(iv) Since χ² > 27.587, Reject H0.

(v) Conclusion: σ² > 0.36.

Example 6.10:

You have a random sample of size 20, with a standard deviation of 125. You have

good reason to believe that the underlying population is normal. Is the population

variance different from 10,000, at the 0.05 significance level?

Solution:

(i) H0 : σ² = 10,000 vs. H1 : σ² ≠ 10,000 (two-tailed test)

(ii) α/2 = 0.05/2 = 0.025

Rejection Criteria:

Reject Ho if χ² > χ²α/2, n−1 or χ² < χ²1−α/2, n−1, that is, if

χ² > 32.852 or χ² < 8.906

Or draw diagram:

[Figure: χ² density curve with an area of 0.025 shaded in each tail, below 8.906 and above 32.852.]

(iii) Test statistic, $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{(20-1)(125^2)}{10000} = 29.688$

Since 8.906 < χ² < 32.852, Accept H0.

(iv) Conclusion: σ² = 10,000 (the population variance may not be different from 10,000).
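The χ² statistics of Examples 6.9 and 6.10 can be verified with the sketch below. The helper name `variance_chi_square` is an assumption for illustration. The standard library has no χ² quantile function, so the critical values are copied from the χ² table values quoted in the examples.

```python
def variance_chi_square(n, s_squared, sigma0_squared):
    """Chi-square statistic for testing H0: sigma^2 = sigma0^2."""
    return (n - 1) * s_squared / sigma0_squared


# Example 6.9: n = 18, s^2 = 0.68, H0: sigma^2 = 0.36 (right-tailed)
chi2_a = variance_chi_square(18, 0.68, 0.36)
reject_a = chi2_a > 27.587            # chi-square_{0.05, 17} from the table

# Example 6.10: n = 20, s = 125, H0: sigma^2 = 10,000 (two-tailed)
chi2_b = variance_chi_square(20, 125**2, 10_000)
reject_b = chi2_b > 32.852 or chi2_b < 8.906   # table values for 19 d.f.

print(f"Example 6.9:  chi2 = {chi2_a:.3f}, reject H0: {reject_a}")
print(f"Example 6.10: chi2 = {chi2_b:.3f}, reject H0: {reject_b}")
```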

The simplest deterministic mathematical relationship between two variables x and y is a linear relationship y = β0 + β1x.

However, the relationship between two variables x and y may not be deterministic.


Regression analysis is the part of statistics that deals with investigation of the

relationship between two or more variables using probabilistic models.

For our discussion, we shall assume that values of the variable x are fixed by the

experimenter. The variable x is the independent (predictor, explanatory) variable.

For a fixed x, the second variable will be a random variable Y with observed value y,

referred to as the dependent (response) variable.

Usually, observations are made for a number of settings of the independent variable. Let x1, x2, . . . , xn denote values of the independent variable for which

observations are made, and let Yi and yi respectively denote the random variable and

observed value associated with xi. The available bivariate data then consists of the n

pairs (x1, y1), (x2, y2), . . . , (xn, yn). A first step in regression analysis involving two

variables is to construct a scatter plot of the observed data. In such a plot, each (xi, yi)

is represented as a point plotted on a two-dimensional coordinate system.

Scatter Plot

A scatter plot is a useful summary of a set of bivariate data (two variables), usually

drawn before working out a linear correlation coefficient or fitting a regression line. It

gives a good visual picture of the relationship between the two variables, and aids the

interpretation of the correlation coefficient or regression model.

Each unit contributes one point to the scatter plot, on which points are plotted but not

joined. The resulting pattern indicates the type and strength of the relationship between

the two variables.


A simple linear regression model describes the linear relationship between dependent

variable Y and a single independent variable x as

𝑌 = 𝛽0 + 𝛽1 𝑥 + 𝜀

where

Y is the response variable/dependent variable

x is the explanatory variable/predictor/ independent variable

𝛽0 and 𝛽1 are the regression coefficients

𝜀 is the random error, with E[𝜀] = 0 and Var[𝜀] = 𝜎 2

𝛽0 , 𝛽1 and 𝜎 2 are parameters.

Note:

(i) β0 is the y-intercept; it has this interpretation only if the scope of the model includes the value x = 0.

(ii) β1 indicates the change in the mean response associated with a one-unit increase in x. (β1 is also the slope of the regression line.)

(iii) The true (or population) regression line y = β0 + β1x is the line of mean values; for a particular x value, its height is the expected value of Y for that value of x.

model

Linear models: The simplest relationship between two variables is a straight line,

thus termed as Simple Linear Regressions. By having such relationship

Y = 𝛽0 + 𝛽1 𝑥, one may be able to predict Y at unknown values of X from the

knowledge of the trend between X and Y.


Example 6.11

Suppose that in a certain chemical process the reaction time Y (hr) is related to the temperature X (°F) in the chamber in which the reaction takes place according to the simple linear regression model with equation Y = 5.00 − 0.01X and σ = 0.075.

a. What is the expected change in reaction time for a 1 °F increase in temperature? For a 10 °F increase in temperature?

b. What is the expected reaction time when temperature is 200 °F? When temperature is 250 °F?

Solution:

a. For a 1 °F increase in temperature, the expected change in reaction time is β1(1) = −0.01(1) = −0.01 hr#

For a 10 °F increase, the expected change is β1(10) = −0.01(10) = −0.1 hr#

b. When X = 200 °F, Y = 5.00 − 0.01(200) = 3.00 hr#

When X = 250 °F, Y = 5.00 − 0.01(250) = 2.50 hr#

Consider given sample data {(x1, y1), (x2, y2), …, (xi, yi), …, (xn, yn)}. Let yi be the observed value of a rv Yi, where Yi = β0 + β1xi + εi. The errors εi are independent rv’s.

If the line y = β0 + β1x is used to fit the model, the fitted values ŷi are obtained via ŷi = β0 + β1xi. The residual ei = yi − ŷi = yi − (β0 + β1xi) is the vertical deviation of the point (xi, yi) from the fitted line y = β0 + β1x.

The sum of squared errors is

$SSE = \sum_i \varepsilon_i^2 = \sum_i (Y_i - \hat{Y}_i)^2 = \sum_i (Y_i - \beta_0 - \beta_1 X_i)^2$

It is used as the measure of goodness of fit.

Using the principle of least squares, we minimize this sum of squares to obtain the

estimated regression line or least squares line.


A line provides a good fit to the data if the vertical distances (deviations) from the

observed points to the line are “small” (see Figure 2).

Minimizing SSE with respect to the linear regression parameters (β0, β1) gives the least squares line ŷ = β̂0 + β̂1x, with

$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}$

$\hat{\beta}_0 = \frac{1}{n}\left(\sum_i y_i - \hat{\beta}_1 \sum_i x_i\right) = \bar{y} - \hat{\beta}_1 \bar{x}$

where

$S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y}) = \sum_i x_i y_i - \frac{1}{n}\left(\sum_i x_i\right)\left(\sum_i y_i\right)$

$S_{xx} = \sum_i (x_i - \bar{x})^2 = \sum_i x_i^2 - \frac{\left(\sum_i x_i\right)^2}{n}$

Least squares estimators of 𝛽0 and 𝛽1 given above are unbiased and have minimum

variance among all other unbiased estimators.

In computing 𝛽̂0 , use extra digits (at least up to 4 decimal) in 𝛽̂1 because if 𝑥̅ is

large in magnitude, rounding will affect the final answer.

The Fitted Regression Line

After estimating the model parameters, the fitted regression line can then be written

as:

y = 𝛽̂0 + 𝛽̂1 𝑥.

Note: It must be emphasized that before 𝛽̂0 and 𝛽̂1 are computed, a scatter plot

should be examined to see whether a linear probabilistic model is plausible. If the

points do not tend to cluster about a straight line with roughly the same degree of

spread for all x, other models should be investigated. In practice, plots and regression

calculations are usually done by using a statistical computer package.

Estimating σ² and σ

The parameter σ² determines the amount of variability inherent in the regression model. After a regression model has been fitted, the fitted values ŷi are obtained via ŷi = β̂0 + β̂1xi, with residuals ei = yi − ŷi.

An unbiased estimate of σ² is given by

$\hat{\sigma}^2 = s^2 = \frac{SSE}{n-2}$

$SSE = S_{YY} - \frac{S_{XY}}{S_{XX}} S_{XY} = S_{YY} - \hat{\beta}_1 S_{XY}$

where

$S_{YY} = \sum_i y_i^2 - \frac{1}{n}\left(\sum_i y_i\right)^2$ and $S_{XY} = \sum_i x_i y_i - \frac{1}{n}\left(\sum_i x_i\right)\left(\sum_i y_i\right)$

Steps for fitting the regression line y = β0 + β1x:

Step 1: Draw the scatter plot of the (X,Y) data for visual inspection of the relationship that may exist between X and Y. {Note: This step can be skipped if the scatter diagram is not required in the question.}

Step 2: Construct the following table:

k      X      Y      X²      Y²      XY
1      x1     y1     x1²     y1²     x1y1
2      x2     y2     x2²     y2²     x2y2
...
n      xn     yn     xn²     yn²     xnyn
Sum    Σxi    Σyi    Σxi²    Σyi²    Σxiyi

Step 3: Calculate the linear regression parameters (β0, β1) using the formulas below:

$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

where

$S_{XY} = \sum_i x_i y_i - \frac{1}{n}\left(\sum_i x_i\right)\left(\sum_i y_i\right)$ and $S_{XX} = \sum_i x_i^2 - \frac{1}{n}\left(\sum_i x_i\right)^2$

The fitted regression line is $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$.

Additionally, we can compute $SSE = S_{yy} - \hat{\beta}_1 S_{xy}$ and hence an unbiased estimate of σ², $\hat{\sigma}^2 = s^2 = \frac{SSE}{n-2}$.

Example 6.12

Consider the relationship between the thickness of a synthetic fiber and its tensile strength. Researchers took measurements at various pre-selected, known levels of fiber thickness, and the following data were collected.

Fiber thickness, X     40  31  34  44  49  36  41  50  39  45
Tensile strength, Y    83  74  72  70  75  73  70  76  79  72

If the fiber thickness was 45, what would be the predicted strength?

In addition, give an estimate of the standard deviation of the model error.

Solution:

Y = β0 + β1X

Step 1: Draw the scatter plot of the (X,Y) data for visual inspection of the relationship that may exist between X and Y. {Note: can be skipped}

[Scatter plot: tensile strength Y (axis from 70 to 85) against fiber thickness X (axis from 30 to 50).]

k      X      Y      X²       Y²       XY
1      40     83     1600     6889     3320
2      31     74     961      5476     2294
3      34     72     1156     5184     2448
4      44     70     1936     4900     3080
5      49     75     2401     5625     3675
6      36     73     1296     5329     2628
7      41     70     1681     4900     2870
8      50     76     2500     5776     3800
9      39     79     1521     6241     3081
10     45     72     2025     5184     3240
Sum    409    744    17077    55504    30436

Using the table above, n = 10, we determine

$\bar{x} = \frac{1}{n}\sum_i x_i = 40.9$, $\bar{y} = \frac{1}{n}\sum_i y_i = 74.4$

and

$\hat{\beta}_1 = \frac{S_{XY}}{S_{XX}} = \frac{6.4}{348.9} = 0.01834$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 74.4 - (0.01834)(40.9) = 73.6499$

The fitted regression line is Ŷ = 73.6499 + 0.0183X

When X = 45, Ŷ = 73.6499 + 0.0183(45) = 74.4734#

$S_{yy} = \sum_i y_i^2 - \frac{\left(\sum_i y_i\right)^2}{n} = 55504 - \frac{744^2}{10} = 150.4$

$SSE = S_{yy} - \hat{\beta}_1 S_{xy} = 150.4 - (0.01834)(6.4) = 150.28$

$\hat{\sigma}^2 = s^2 = \frac{150.28}{8} = 18.785$

An estimate of the standard deviation (σ) of the model error is √18.785 = 4.33

1) β0 has no meaningful interpretation since the scope of the sample data does not include x = 0.

2) Within the scope of the model, we have a linear relationship between x and y.

3) We should not make inferences about the relationship between x and y for values outside the range of the sample data.
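The hand calculation in Example 6.12 can be cross-checked with a short Python sketch of the least squares formulas. The function name `least_squares_fit` and the variable names are illustrative assumptions. Because the code keeps full precision in β̂1 instead of rounding to 0.0183, the prediction at X = 45 comes out near 74.475 rather than 74.4734; the difference is purely rounding.

```python
from math import sqrt


def least_squares_fit(x, y):
    """Return (b0, b1, sse, s) for the simple linear regression of y on x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    s_xx = sum(xi * xi for xi in x) - sum_x**2 / n
    s_yy = sum(yi * yi for yi in y) - sum_y**2 / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    b1 = s_xy / s_xx                   # slope estimate
    b0 = sum_y / n - b1 * sum_x / n    # intercept estimate
    sse = s_yy - b1 * s_xy             # error sum of squares
    s = sqrt(sse / (n - 2))            # estimate of sigma
    return b0, b1, sse, s


# Data of Example 6.12
thickness = [40, 31, 34, 44, 49, 36, 41, 50, 39, 45]
strength = [83, 74, 72, 70, 75, 73, 70, 76, 79, 72]

b0, b1, sse, s = least_squares_fit(thickness, strength)
predicted = b0 + b1 * 45  # predicted strength at thickness 45

print(f"b0 = {b0:.4f}, b1 = {b1:.5f}")
print(f"SSE = {sse:.2f}, s = {s:.2f}, prediction at 45 = {predicted:.3f}")
```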

Example 6.13

A chemical engineer is investigating the effect of process operating temperature on product yield. The study results in the following data:

Temperature, °C (x)    100  110  120  130  140  150  160  170  180  190
Yield, % (y)           45   51   54   61   66   70   74   78   85   89

These pairs of points are plotted in Fig. 14-1. Such a display is called a scatter diagram. Examination of this scatter diagram indicates that there is a strong relationship between yield and temperature, and the tentative assumption of the straight-line model y = β0 + β1x + ε appears to be reasonable. Find the regression line equation that represents this set of data.


Solution:
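The worked solution is not reproduced in this copy of the notes. As a sketch only, the least squares formulas for β̂1 = Sxy/Sxx and β̂0 = ȳ − β̂1x̄ can be applied to the data of Example 6.13 in Python. The function name `fit_line` is a hypothetical helper, and the printed coefficients are what these formulas give for the tabulated data, not figures quoted from the notes.

```python
def fit_line(x, y):
    """Least squares intercept and slope for the line y = b0 + b1 * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    s_xx = sum(xi * xi for xi in x) - sum_x**2 / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    b1 = s_xy / s_xx
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


# Data of Example 6.13
temperature = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
yield_pct = [45, 51, 54, 61, 66, 70, 74, 78, 85, 89]

b0, b1 = fit_line(temperature, yield_pct)
print(f"fitted line: y = {b0:.4f} + {b1:.4f} x")
```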


The sample coefficient of determination, r2, represents the proportion of the total

variation of the variable Y that can be explained by a linear relationship with the

values of X. It is widely used to determine how well a regression fits. In other words,

how close the points are to the regression line.

It is computed as the variation explained by the regression, SST − SSE, divided by the total sum of squares, where

SSE = the sum of squared deviations about the least squares line Ŷ = β̂0 + β̂1X,

SST = the sum of squared deviations about the horizontal line at height ȳ.

SSE/SST = the proportion of total variation that cannot be explained by the simple

linear regression model,

1 – SSE/SST = the proportion of observed y variation explained by the model.

Thus, r2 = 1 – SSE/SST

The higher the value of r2, the more successful is the simple linear regression model

in explaining y variation.
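To illustrate r² = 1 − SSE/SST, the sketch below reuses the sums of squares from Example 6.12 (Sxx = 348.9, Syy = 150.4, Sxy = 6.4), taking SST = Syy and SSE = Syy − β̂1Sxy. The function name is an assumption for illustration. The resulting r² is very small, which suggests that the fitted line explains almost none of the variation in tensile strength for that data set.

```python
def coefficient_of_determination(sse, sst):
    """r^2: proportion of the variation in y explained by the fitted line."""
    return 1 - sse / sst


# Sums of squares from Example 6.12
s_xx, s_yy, s_xy = 348.9, 150.4, 6.4

sst = s_yy                            # total sum of squares
sse = s_yy - (s_xy / s_xx) * s_xy     # error sum of squares
r_squared = coefficient_of_determination(sse, sst)

print(f"r^2 = {r_squared:.5f}")
```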


6.7.2 Correlation

Correlation analysis is used to measure the strength of linear relation between X and

Y by means of a single number called a correlation coefficient.

The population correlation coefficient is

$\rho = \frac{\sigma_{XY}}{\sqrt{\sigma_{XX}\,\sigma_{YY}}}$, with −1 ≤ ρ ≤ +1.

ρ = ±1 only occurs when we have a perfect linear relationship between the two variables:

ρ = +1 implies a perfect linear relationship with a positive slope (β1 > 0),

ρ = −1 implies a perfect linear relationship with a negative slope (β1 < 0).

A value of ρ close to ±1 implies a good correlation or linear association between X and Y, whereas values near zero indicate little or no correlation.

The sample correlation coefficient is

$r = \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}}$ or $r = \pm\sqrt{r^2}$ (taking the sign of the slope β̂1)

The value of r (−1 ≤ r ≤ 1) measures how good the linear relationship between X and Y is.

[Figure: four scatter plots of Y against X.]

A value of r near 0 is not evidence of the lack of a strong relationship, but only the

absence of a linear relation or correlation.

A value of r that fall within the range from 0 to 0.5 is considered weak, strong if it is

between 0.8 to 1, and moderate otherwise. Refer to the diagram below for the

summary of r value:

negative negative negative positive positive positive

relationship relationship relationship relationship relationship relationship


Example 6.14

Compute the correlation coefficient between X (test grade) and Y (number of years) given $S_{XX} = 10.5$, $S_{YY} = 1504.1$, $S_{XY} = 114.5$.

Solution:

$r = \dfrac{S_{XY}}{\sqrt{S_{XX}\,S_{YY}}} = \dfrac{114.5}{\sqrt{10.5 \times 1504.1}} = 0.9111$

In conclusion, r = 0.9111 indicates a strong positive correlation between test grade and number of years.
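The arithmetic in Example 6.14 can be checked with a few lines of Python. The three sums are taken from the example; the variable names are our own.

```python
import math

# Values given in Example 6.14.
s_xx = 10.5
s_yy = 1504.1
s_xy = 114.5

r = s_xy / math.sqrt(s_xx * s_yy)
print(f"r   = {r:.4f}")
print(f"r^2 = {r ** 2:.4f}")
```

This reproduces r = 0.9111. Squaring it gives r^2 of about 0.8301, so roughly 83% of the variation in Y is explained by the linear relationship with X.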


Statistical Tables

