This presentation explains the concept of the central limit theorem for students who do not have a strong mathematical or statistical background

Ghada Ahmed

Lecturer of Biomedical Informatics and Medical Statistics

ILOs

1. Differentiate randomization from random sampling
2. Draw a probability distribution
3. Compute the middle 95% and 99% under the normal distribution
4. Explain the central limit theorem
5. Apply the central limit theorem
6. Determine the limitations of the central limit theorem
7. Explain what confidence intervals, confidence levels, and confidence limits are

Why learn the central limit theorem?

chocolate chips in 20 cookies for a class activity. They found that the cookies had, on average, 15 chocolate chips with a standard deviation of 5 chocolate chips. Can you determine the true mean number of chocolate chips?

Statistical Inference

Infer

to guess that something is true because of the information that you have

e.g., I inferred from the number of cups that he was expecting visitors.

Statistical inference

Settings where one wants to infer facts about a population using noisy statistical data, where uncertainty must be accounted for.

The sample mean will estimate the population mean

The sample median will estimate the population median

The sample standard deviation will estimate the population standard deviation

Motivating Example

In every major election, pollsters would like to know, ahead of the actual election, who's going to win.

What is the target of estimation?

What can we not do?

What can we do?

The goal of statistical inference is to:
a. Infer facts about a population from a sample
b. Infer facts about a sample from a population
c. Calculate sample quantities to understand your data
d. Teach researchers about statistical

randomization, random allocation, random sampling and random variable?

Random sampling

It is concerned with obtaining data that are representative of the population of interest

The Literary Digest polled about 10 million Americans and got responses from about 2.4 million. The poll showed that Landon would likely be the overwhelming winner and that FDR would get only 43% of the votes.

Election result: FDR won, with 62% of the votes.

The magazine was completely discredited because of the poll and was soon discontinued.

Poll prediction: Roosevelt 43%, Landon 57%
Actual results: Roosevelt 62%, Landon 38%

The Literary Digest took a huge sample (2.4 million people) and still got the result wrong.

George Gallup predicted a Roosevelt victory on the basis of a much smaller sample (50,000).

The Literary Digest poll drew its mailing list from:
telephone directories
magazine subscriber lists
club and association rosters, etc.

ballots

Selection bias

Sample lists tended to represent middle- and upper-class voters. In those days, many poor people:
did not have telephones
didn't subscribe to magazines
didn't belong to clubs

10 million ballots were sent out
2.4 million were returned
Response rate: 2.4/10 = 24%

Fact: those who respond to surveys tend to be:
better educated
in higher economic brackets

In 1936, the groups above tended to vote Republican!

If the soup is not well stirred, it doesn't matter how large a spoon you have; it will still not taste right.

If the soup is well stirred, a small spoonful will suffice to test the soup.
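The contrast between a huge biased sample and a small random one can be sketched with a short simulation. This is only an illustration under made-up assumptions: the share of well-off voters (30% of the population, 80% of the mailing lists) and the support rates within each group (35% and 74%) are hypothetical numbers chosen so that the totals land near the 62% and 43% quoted above. They are not historical estimates.

```python
import random

# Hypothetical numbers, chosen only to mimic the 1936 story.
P_ROOSEVELT_WEALTHY = 0.35       # assumed support among well-off voters
P_ROOSEVELT_OTHER = 0.74         # assumed support among everyone else
SHARE_WEALTHY_POPULATION = 0.30  # assumed share of well-off voters overall
SHARE_WEALTHY_ON_LISTS = 0.80    # assumed share on phone/club/magazine lists


def poll(rng, n, share_wealthy):
    """Poll n voters drawn from a list in which `share_wealthy` are well-off."""
    votes_for_roosevelt = 0
    for _ in range(n):
        wealthy = rng.random() < share_wealthy
        p = P_ROOSEVELT_WEALTHY if wealthy else P_ROOSEVELT_OTHER
        votes_for_roosevelt += rng.random() < p
    return votes_for_roosevelt / n


rng = random.Random(1936)  # fixed seed so the run is reproducible

true_support = (SHARE_WEALTHY_POPULATION * P_ROOSEVELT_WEALTHY
                + (1 - SHARE_WEALTHY_POPULATION) * P_ROOSEVELT_OTHER)

# Huge sample from biased lists versus a small simple random sample.
biased_poll = poll(rng, 200_000, SHARE_WEALTHY_ON_LISTS)
random_poll = poll(rng, 5_000, SHARE_WEALTHY_POPULATION)

print(f"True support in the population : {true_support:.3f}")
print(f"Biased poll, n = 200,000       : {biased_poll:.3f}")
print(f"Random poll, n = 5,000         : {random_poll:.3f}")
```

Under these assumptions the large poll stays far from the truth no matter how many people answer, while the small random poll lands close to it. Increasing the sample size shrinks random error but does nothing about selection bias.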

Randomization

It is concerned with balancing unobserved variables that may confound inferences of interest

A school district is considering a policy that will no longer allow high school students to park at school after two recent accidents where students were severely injured. As a first step, they survey parents by mail, asking them whether or not the parents would object to this policy change. Of 6,000 surveys that go out, 1,200 are returned. Of these 1,200 surveys that were completed, 960 agreed with the policy change and 240 disagreed.

Which of the following statements are true?

I. Some of the mailings may have never reached the parents.
II. The school district has strong support from parents to move forward with the policy approval.
III. It is possible that the majority of the parents of high school students disagree with the policy change.
IV. The survey results are unlikely to be biased because all parents were mailed a survey.
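The arithmetic behind the survey question can be worked through in a few lines. This is a minimal sketch using only the counts given above. The two extreme scenarios for the parents who did not reply are an illustrative device added here, not something the survey measured.

```python
# Counts taken from the survey description.
surveys_sent = 6000
surveys_returned = 1200
agreed = 960
disagreed = 240

response_rate = surveys_returned / surveys_sent
agree_among_respondents = agreed / surveys_returned

# Nothing is known about the parents who did not reply, so consider
# the two most extreme possibilities for them.
non_respondents = surveys_sent - surveys_returned
lowest_possible_agreement = agreed / surveys_sent                       # all silent parents disagree
highest_possible_agreement = (agreed + non_respondents) / surveys_sent  # all silent parents agree

print(f"Response rate                 : {response_rate:.0%}")
print(f"Agreement among respondents   : {agree_among_respondents:.0%}")
print(f"Agreement among all parents is somewhere between "
      f"{lowest_possible_agreement:.0%} and {highest_possible_agreement:.0%}")
```

Because four out of five parents did not answer, the returned surveys alone cannot rule out that most parents disagree with the policy.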

treatment in a randomized trial is to:
2. To obtain a representative sample of subjects from the population of interest
3. Balance unobserved covariates that may contaminate the comparison between the treated and control groups
4. To add variation to our conclusion

Distribution

Frequency distribution

Frequency

It is how often something occurs

Frequency distribution

By counting frequencies we can make a frequency distribution table.

How to display a

frequency distribution?

Bar chart or histogram

Bar Graph

A Bar Graph (also called Bar Chart) is a graphical display of data using bars of different heights.

a class.
How many more of them favored orange?
A. 2
B. 3
C. 4
D. 5

a class.
How many more of them favored green?
A. 2
B. 1
C. 4
D. 5

a class.
How many more of them favored orange than those who favored green?
A. 2
B. 3
C. 4
D. 5

Histogram

A Histogram is a graphical display of data using bars of different heights. Unlike a bar graph, each bar of a histogram covers a range of numeric values.

of cuckoo eggs. The length of each egg was measured to the nearest mm. The results are shown in the following histogram:

were measured altogether in the experiment?
A. 25
B. 40
C. 90
D. 100

of cuckoo eggs. The length of each egg was measured to the nearest mm. The results are shown in the following histogram:

were less than 23 mm in length?
A. 26
B. 40
C. 66
D. 92

distribution?

The IQ scores are: 129, 150, 124, 154, 127, 141, 118, 130, 149, 133, 142, 138, 128, 136, 130, 123, 125.
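One way to turn the IQ scores into a frequency distribution is to group them into classes and count how many scores fall in each. The sketch below assumes classes that are 10 points wide (110-119, 120-129, and so on). The class width is a choice made here for illustration; other widths are equally valid.

```python
from collections import Counter

iq_scores = [129, 150, 124, 154, 127, 141, 118, 130, 149,
             133, 142, 138, 128, 136, 130, 123, 125]

BIN_WIDTH = 10  # assumed class width; not specified in the slides

# Map each score to the lower edge of its class, e.g. 129 -> 120.
bin_starts = [(score // BIN_WIDTH) * BIN_WIDTH for score in iq_scores]
frequency = dict(sorted(Counter(bin_starts).items()))

# Print a frequency table with a simple text histogram.
for start, count in frequency.items():
    print(f"{start}-{start + BIN_WIDTH - 1}: {'#' * count} ({count})")
```

Each row of the printout is one bar of the histogram, and the counts add up to the 17 scores in the list.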

Probability distribution

A probability distribution assigns a probability to each measurable subset of the possible outcomes of a random experiment, survey, or procedure of statistical inference.

Normal distribution

Skewed distribution

Bimodal distribution

Characteristics of the normal distribution

Mean

Standard deviation

It is a function that describes the relative likelihood for this random variable to take on a given value.

normal curve

is

the standard normal curve

is ..

95% of the normal density lies within 1.96 standard deviations of the mean, that is, between -1.96 and 1.96 standard deviations from the mean.

99% of the normal density lies within 2.58 standard deviations of the mean, that is, between -2.58 and 2.58 standard deviations from the mean.
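The 1.96 and 2.58 cutoffs can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module. The helper names `middle_area` and `cutoff` are made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def middle_area(z):
    """Area under the standard normal curve between -z and +z."""
    return standard_normal.cdf(z) - standard_normal.cdf(-z)


def cutoff(middle):
    """Number of standard deviations that encloses the given middle area."""
    return standard_normal.inv_cdf(0.5 + middle / 2)


area_196 = middle_area(1.96)
area_258 = middle_area(2.58)
z_95 = cutoff(0.95)
z_99 = cutoff(0.99)

print(f"Area within 1.96 SD: {area_196:.4f}")
print(f"Area within 2.58 SD: {area_258:.4f}")
print(f"Cutoff for the middle 95%: {z_95:.3f}")
print(f"Cutoff for the middle 99%: {z_99:.3f}")
```

The 99% cutoff is closer to 2.576. The slides round it to 2.58.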

Population distribution

Population mean

Population SD

Group work

Sample distribution

Sample mean

Sample standard deviation

Sample size

What is this

distribution?

distribution for a population. The mean is 60 and the standard deviation is 18.

(1) a random sample of 500 values from this population,
(2) a distribution of 500 sample means from random samples, each of size 18,
(3) a distribution of 500 sample means from random samples, each of size 81.
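The three distributions in this exercise can be generated by simulation. The shape of the population is not recoverable from the text, so the sketch below assumes a right-skewed gamma distribution tuned to have mean 60 and standard deviation 18. That choice is an assumption made for illustration only.

```python
import random
import statistics

POP_MEAN, POP_SD = 60.0, 18.0
# Gamma parameters giving the required mean and SD (assumed population shape).
SHAPE = (POP_MEAN / POP_SD) ** 2
SCALE = POP_SD ** 2 / POP_MEAN

rng = random.Random(42)  # fixed seed so the run is reproducible


def draw(n):
    """Draw n values from the assumed population."""
    return [rng.gammavariate(SHAPE, SCALE) for _ in range(n)]


def sample_means(sample_size, n_means=500):
    """Means of n_means independent random samples of the given size."""
    return [statistics.fmean(draw(sample_size)) for _ in range(n_means)]


single_sample = draw(500)   # (1) one sample of 500 values
means_18 = sample_means(18)  # (2) 500 means, samples of size 18
means_81 = sample_means(81)  # (3) 500 means, samples of size 81

sd_single = statistics.stdev(single_sample)
sd_18 = statistics.stdev(means_18)
sd_81 = statistics.stdev(means_81)

print(f"(1) sample of 500 values : mean={statistics.fmean(single_sample):.1f}, sd={sd_single:.2f}")
print(f"(2) means, size 18       : mean={statistics.fmean(means_18):.1f}, sd={sd_18:.2f}")
print(f"(3) means, size 81       : mean={statistics.fmean(means_81):.1f}, sd={sd_81:.2f}")
```

All three are centered near 60. The spread is what tells them apart: about 18 for the raw sample, about 18/√18 ≈ 4.2 for means of size 18, and about 18/√81 = 2 for means of size 81.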

Population mean

Population standard deviation

Sample mean

Sample standard deviation SD

Sample size N

Given certain conditions, the central limit theorem states that, for a population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the mean approaches a normal distribution with a mean of $\mu$ and a standard deviation of $\sigma/\sqrt{N}$ as the sample size, $N$, increases, regardless of the underlying distribution.

Sampling distribution

Sampling distribution mean = $\mu$

Sampling distribution standard deviation = $\sigma/\sqrt{N}$

Standard deviations

SD

SE

Means

You are imprisoned with a monster in a room.

The room is shaped like half a ball.

The monster can see, but you are blind.

The monster is fixed in the center of the room, so he cannot run after you.

But you can go wherever you like.

The monster carries a wooden stick that is 1.96 m long.

Although the radius of the room is much more than 1.96 m, the probability that you are out of reach of his stick is less than 5%, because the height of the room gets much smaller toward the periphery.

Now, if you have a 1.96 m wooden stick, how confident are you that your stick will hit the monster?

chocolate chips in 20 cookies for a class activity. They found that the cookies had, on average, 15 chocolate chips with a standard deviation of 5 chocolate chips. Can you determine the true mean number of chocolate chips?
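The true mean cannot be pinned down exactly from 20 cookies, but the central limit theorem gives a range for it. The sketch below applies the 1.96 rule from earlier in the presentation to the cookie numbers. It treats the sample standard deviation as if it were the population value, which is an approximation.

```python
import math

sample_mean = 15.0  # chocolate chips per cookie
sample_sd = 5.0
n = 20
Z_95 = 1.96         # cutoff for the middle 95% of a normal distribution

# Standard error: the standard deviation of the sampling distribution of the mean.
standard_error = sample_sd / math.sqrt(n)
margin = Z_95 * standard_error

lower = sample_mean - margin
upper = sample_mean + margin

print(f"Standard error : {standard_error:.2f}")
print(f"95% interval   : {lower:.2f} to {upper:.2f} chips per cookie")
```

With only 20 cookies, many analysts would replace 1.96 with the slightly larger cutoff from the t distribution (about 2.09 for 19 degrees of freedom), which widens the interval a little.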

Confidence interval

A confidence interval is a range of values that describes the uncertainty surrounding an estimate.

Confidence level

The confidence level tells you how sure you can be. It is expressed as a percentage and represents how often the true percentage of the population who would pick an answer lies within the confidence interval.

Confidence level

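The 95% confidence level means you can be 95% certain; the 99% confidence level means you can be 99% certain. Here "certain" refers to the procedure: about 95% (or 99%) of intervals built this way from repeated samples would contain the true value.

Most researchers use the 95% confidence level.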

Confidence limits: the upper and lower values of a confidence interval, that is, the values defining the range of a confidence interval

Suppose we've taken a random sample of 10 ice-cream cones, and determined that a 95% confidence interval for the mean caloric content of a single scoop of ice-cream is (260, 310).

Which of the following statements are correct?

1. If we repeatedly took samples of size 10 and then formed confidence intervals, we would expect 95% of them to contain the true (but unknown) mean, confidence interval (260, 310).
2. We are 95% confident that the true mean caloric content lies between 260 and 310.
3. There is a 95% probability that the true mean caloric content lies between 260 and 310.
4. We are 95% confident that the sample mean caloric content lies between 260 and 310.
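The "repeated samples" reading of a confidence interval can be illustrated by simulation. The sketch below invents a population (normal, true mean 285 calories, standard deviation 40), draws many samples of 10, and counts how often the interval contains the true mean. Both population values are hypothetical. Because the samples are small, the cutoff 2.262 from the t distribution with 9 degrees of freedom is used instead of 1.96.

```python
import math
import random
import statistics

TRUE_MEAN, TRUE_SD = 285.0, 40.0  # hypothetical population, for illustration only
SAMPLE_SIZE = 10
T_CRIT = 2.262                    # 97.5th percentile of t with 9 degrees of freedom
REPETITIONS = 2000

rng = random.Random(7)  # fixed seed so the run is reproducible


def confidence_interval(sample):
    """95% confidence interval for the mean, based on the sample mean and SD."""
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - T_CRIT * standard_error, mean + T_CRIT * standard_error


hits = 0
for _ in range(REPETITIONS):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    low, high = confidence_interval(sample)
    hits += low <= TRUE_MEAN <= high

coverage = hits / REPETITIONS
print(f"Intervals containing the true mean: {coverage:.1%}")
```

Each individual interval either contains the true mean or it does not. The 95% describes how often the procedure succeeds over many repetitions.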

informal

Important conditions to help ensure the sampling distribution is nearly normal and the estimate of SE is sufficiently accurate:
The sample observations are independent.
The sample size is large: n > 30 is a good rule of thumb.
The distribution of sample observations is not strongly skewed.

Independence

In probability theory, to say that two events are independent means that the occurrence of one does not affect the probability of the other.

gave misleading evidence at the cot deaths trial

of Sally Clark

Casino

A bag contains five marbles: three are orange and two are green.

It costs 3.5 pounds to play.

If you pick two consecutive green marbles, without replacement, you win ten pounds.

Would you play?
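The casino question comes down to the probability of two green marbles in a row and the expected winnings. The sketch below works this out with exact fractions, assuming the game consists of exactly two draws. It also shows what the probability would be if the first marble were put back, to highlight why the draws are not independent here.

```python
from fractions import Fraction

GREEN, TOTAL = 2, 5
COST = Fraction(7, 2)  # 3.5 pounds to play
PRIZE = Fraction(10)   # ten pounds for two greens in a row

# Without replacement: the second draw depends on the first.
p_first_green = Fraction(GREEN, TOTAL)
p_second_green_given_first = Fraction(GREEN - 1, TOTAL - 1)
p_win = p_first_green * p_second_green_given_first

# For comparison only: with replacement the two draws would be independent.
p_win_with_replacement = Fraction(GREEN, TOTAL) ** 2

expected_net = p_win * PRIZE - COST

print(f"P(two greens, without replacement): {p_win}")
print(f"P(two greens, with replacement)   : {p_win_with_replacement}")
print(f"Expected net result per game      : {float(expected_net):.2f} pounds")
```

On average a player loses money on every game, so the expected value argues against playing.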

Applications

The estimated prevalence of sudden infant death syndrome (SIDS) is 1 out of 8543.

A mother had two babies who died of SIDS.

You are called to give expert testimony at her trial.

What is your opinion?

Based on an estimated prevalence of sudden infant death syndrome of 1 out of 8543, Dr. Meadow testified that the probability of a mother having two children with SIDS was $(1/8543)^2$, about 1 in 73 million.

The mother on trial was convicted of murder.
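The figure given in testimony comes from multiplying the single-death probability by itself, which is valid only if the two deaths are independent. The sketch below reproduces that calculation and then shows how the answer changes if a second death is more likely in a family that has already had one. The value 1 in 100 for that conditional probability is purely hypothetical and is used only to show the size of the effect.

```python
from fractions import Fraction

p_sids = Fraction(1, 8543)

# Calculation that treats the two deaths as independent events.
p_two_if_independent = p_sids * p_sids

# Hypothetical: suppose shared genetic or environmental factors raise the
# chance of a second SIDS death to 1 in 100 (an invented figure).
p_second_given_first = Fraction(1, 100)
p_two_if_dependent = p_sids * p_second_given_first

print(f"Assuming independence: 1 in {p_two_if_independent.denominator:,}")
print(f"Assuming dependence  : 1 in {p_two_if_dependent.denominator:,}")
```

Squaring the probability is the same multiplication rule used for independent draws. When the events are related, as deaths within one family may be, the rule does not apply and the result can be wrong by orders of magnitude.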

British paediatrician, who claimed that in a single family "one sudden infant death is a tragedy, two is suspicious and three is murder, until proved otherwise."

Gambler's fallacy

Let's recap

Differentiate randomization from random sampling

Draw a probability distribution

Compute the middle 95% and 99% under normal distribution

Explain central limit theorem

Apply central limit theorem

Determine limitations for central limit theorem

Explain what confidence intervals, confidence levels, and confidence limits are

