
# Central Limit Theorem

Lecturer of Biomedical Informatics and
Medical Statistics

ILOs

1. Define statistical inference
2. Differentiate randomization from random sampling
3. Draw a probability distribution
4. Compute the middle 95% and 99% under the normal distribution
5. Explain the central limit theorem
6. Apply the central limit theorem
7. Determine the limitations of the central limit theorem
8. Explain what confidence intervals, confidence levels, and confidence limits are

## What can I do after I learn the central limit theorem?

Students are asked to count the number of chocolate chips in 20 cookies for a class activity. They found that the cookies on average had 15 chocolate chips with a standard deviation of 5 chocolate chips. Can you determine the true mean of the number of chocolate chips?

Statistical Inference

Infer

to guess that something is true because of the information that you have
e.g., I inferred from the number of cups that he was expecting visitors.

Statistical inference
Settings where one wants to infer facts
about a population using noisy statistical
data where uncertainty must be accounted
for.

## Estimator and Estimand

The sample mean will estimate the population mean.
The sample median will estimate the population median.
The sample standard deviation will estimate the population standard deviation.

Motivating Example
In every major election, pollsters would like to
know, ahead of the actual election, who's
going to win.
What is the target of estimation?
What can we not do?
What can we do?

The goal of statistical inference is to:
a. Infer facts about a population from a
sample
b. Infer facts about a sample from a
population
c. Calculate sample quantities to
d. To teach researcher about statistical

## What is the difference between randomization, random allocation, random sampling and random variable?

Random sampling
It is concerned with obtaining data that is
representative of the population of interest

## The Literary Digest Poll

The Literary Digest polled about 10 million Americans, and
got responses from about 2.4 million. The poll showed that
Landon would likely be the overwhelming winner and FDR
would get only 43% of the votes.
Election result: FDR won, with 62% of the votes.
The magazine was completely discredited because of the
poll, and was soon discontinued.

## Literary Digest predicts:

Roosevelt 43% Landon 57%
actual results:
Roosevelt 62% Landon 38%
They took a huge sample (2.4 million
people)

At the same time, Gallup predicted a Roosevelt victory on the basis of a much smaller sample (50,000).

## Sampling method for the Literary Digest poll

Names were drawn from:
- telephone directories
- magazine subscriber lists
- club and association rosters, etc.

They mailed out 10 million ballots.

Selection bias
Sample lists tended to represent middle- and upper-class voters. In those days, many poor people:
- did not have telephones
- didn't subscribe to magazines
- didn't belong to clubs

## Non-response bias

10 million ballots were sent out.
2.4 million were returned.
Response rate: 2.4/10 = 24%.
Fact: those who respond to surveys tend to be:
- better educated
- in higher economic brackets

In 1936, both of the above meant Republican!

Back to the soup analogy: if the soup is not well stirred, it doesn't matter how large a spoon you have; it will still not taste right.

If the soup is well stirred, a small spoon will suffice to test the soup.

Randomization
It is concerned with balancing unobserved
variables that may confound inferences of
interest

A school district is considering whether it will no longer allow high school students to park at school after two recent accidents where students were severely injured. As a first step, they survey parents by mail, asking them whether or not the parents would object to this policy change. Of 6,000 surveys that go out, 1,200 are returned. Of these 1,200 surveys that were completed, 960 agreed with the policy change and 240 disagreed.

## Which of the following statements are true?

I. Some of the mailings may have never reached the parents.
II. The school district has strong support from parents to move forward with the policy approval.
III. It is possible that the majority of the parents of high school students disagree with the policy change.
IV. The survey results are unlikely to be biased because all parents were mailed a survey.
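The arithmetic behind the survey numbers can be sketched as follows. This is only an illustration using the counts stated above; the variable names are made up here, and the "worst case" line assumes, purely hypothetically, that every parent who did not respond disagrees.

```python
# Counts taken from the survey description.
sent, returned, agreed = 6000, 1200, 960

response_rate = returned / sent               # share of surveys that came back
agree_among_respondents = agreed / returned   # what the returned surveys show
agree_among_all_parents = agreed / sent       # parents known to agree

# Hypothetical worst case: every non-respondent disagrees.
non_respondents = sent - returned
max_disagree_share = (returned - agreed + non_respondents) / sent

print(f"Response rate:               {response_rate:.0%}")
print(f"Agree among respondents:     {agree_among_respondents:.0%}")
print(f"Known to agree, all parents: {agree_among_all_parents:.0%}")
print(f"Largest possible disagree:   {max_disagree_share:.0%}")
```

Only 16% of all surveyed parents are known to agree, so the returned surveys alone cannot rule out that most parents disagree.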

## The goal of randomization of a treatment in a randomized trial is to:

1. It does not really do anything
2. Obtain a representative sample of subjects from the population of interest
3. Balance unobserved covariates that may contaminate the comparison between the treated and control groups
4. Add variation to our conclusion

Distribution

Frequency distribution

Frequency
It is how often something occurs.

Frequency distribution
By counting frequencies we can make a
Frequency Distribution table.

How to display a
frequency distribution?
Bar chart or histogram

Bar Graph
A Bar Graph (also called Bar Chart) is a
graphical display of data using bars of
different heights.

The bar graph shows the favorite colors of 20 students in a class.
How many more of them favored orange?

A. 2
B. 3
C. 4
D. 5

The bar graph shows the favorite colors of 20 students in a class.
How many more of them favored green?

A. 2
B. 1
C. 4
D. 5

The bar graph shows the favorite colors of 20 students in a class.
How many more of them favored orange than those who favored green?

A. 2
B. 3
C. 4
D. 5

Histogram
A Histogram is a graphical
display of data using bars of
different heights.

A class carried out an experiment to measure the lengths of cuckoo eggs. The length of each egg was measured to the nearest mm. The results are shown in the following histogram:

How many eggs were measured altogether in the experiment?

A. 25
B. 40
C. 90
D. 100

A class carried out an experiment to measure the lengths of cuckoo eggs. The length of each egg was measured to the nearest mm. The results are shown in the following histogram:

How many eggs were less than 23 mm in length?

A. 26
B. 40
C. 66
D. 92

## How to draw a frequency distribution?

The list of IQ scores is: 129, 150, 124, 154, 127, 141, 118, 130, 149, 133, 142, 138, 128, 136, 130, 123, 125.
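One way to turn the IQ list into a frequency distribution is sketched below. The class width of 10 is an assumption made for illustration (the slide does not specify one), and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

iq_scores = [129, 150, 124, 154, 127, 141, 118, 130, 149,
             133, 142, 138, 128, 136, 130, 123, 125]

BIN_WIDTH = 10  # assumed class width; not given on the slide


def frequency_table(values, width):
    """Count how many values fall in each class [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return {start: counts[start] for start in sorted(counts)}


table = frequency_table(iq_scores, BIN_WIDTH)

# Print the table with a crude text histogram next to each class.
for start, freq in table.items():
    print(f"{start}-{start + BIN_WIDTH - 1}: {'#' * freq} ({freq})")
```

A different class width would give a different table, so the width is worth stating whenever a frequency distribution is reported.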

Probability distribution
A probability distribution assigns a
probability to each measurable subset of the
possible outcomes of a random experiment,
survey, or procedure of statistical inference.

Normal distribution

Skewed
distribution

Bimodal
Distribution

Characteristics of normal
distribution

## Standard normal curve

Mean

Standard deviation

## PDF of a random variable

The probability density function (PDF) is a function that describes the relative likelihood for this random variable to take on a given value.
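As a small sketch of what a PDF looks like in practice, the normal density can be evaluated directly from its formula and compared with Python's built-in `statistics.NormalDist`. The helper name `normal_pdf` is made up for this example.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: exp(-z^2 / 2) / (sigma * sqrt(2 * pi)), z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


std_normal = NormalDist(mu=0, sigma=1)

# The density is highest at the mean and falls off symmetrically.
for x in (-2, -1, 0, 1, 2):
    print(x, round(normal_pdf(x), 4), round(std_normal.pdf(x), 4))
```

The values are densities, not probabilities: a probability comes from the area under the curve over an interval.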

The mean of the standard normal curve is ..

The standard deviation of the standard normal curve is ..

.. % of normal density lies between -1.96 and 1.96 standard deviations from the mean

.. % of normal density lies within 1.96 standard deviations from the mean

.. % of normal density lies within 2.58 standard deviations from the mean

.. % of normal density lies between -2.58 and 2.58 standard deviations from the mean
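The areas asked about here can be checked numerically. The following sketch uses the standard normal distribution from Python's `statistics` module; `central_area` and `central_cutoff` are hypothetical helper names.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1


def central_area(k):
    """Area under the standard normal curve between -k and +k."""
    return z.cdf(k) - z.cdf(-k)


def central_cutoff(level):
    """Value k such that the area between -k and +k equals `level`."""
    return z.inv_cdf(0.5 + level / 2)


for k in (1.96, 2.58):
    print(f"within {k} SD of the mean: {central_area(k):.4f}")

for level in (0.95, 0.99):
    print(f"middle {level:.0%}: +/- {central_cutoff(level):.3f} SD")
```

The cutoffs 1.96 and 2.58 are rounded values, which is why the computed areas are close to, but not exactly, 0.95 and 0.99.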

Population distribution
Population mean
Population SD

Group work

Sample distribution
Sample mean
Sample standard deviation
Sample size

What is this
distribution?

Four plots are presented below. The plot at the top is a distribution for a population. The mean is 60 and the standard deviation is 18.

The other three plots are:
(1) a single random sample of 500 values from this population,
(2) a distribution of 500 sample means from random samples, each of size 18,
(3) a distribution of 500 sample means from random samples, each of size 81.
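The behaviour of plots (2) and (3) can be explored with a small simulation. The shape of the population in the figure is not recoverable, so this sketch assumes a right-skewed gamma population chosen to have mean 60 and standard deviation 18; the seed and function names are also illustrative choices.

```python
import random
import statistics

POP_MEAN, POP_SD = 60, 18
N_REPEATS = 500  # number of sample means per plot, as on the slide

# Assumed population: gamma with mean = shape * scale and
# variance = shape * scale**2, matched to mean 60 and SD 18.
SCALE = POP_SD ** 2 / POP_MEAN
SHAPE = POP_MEAN / SCALE


def sample_mean(n, rng):
    """Mean of one random sample of size n from the assumed population."""
    return statistics.fmean(rng.gammavariate(SHAPE, SCALE) for _ in range(n))


def simulate(n, rng):
    """N_REPEATS sample means, each from a fresh sample of size n."""
    return [sample_mean(n, rng) for _ in range(N_REPEATS)]


rng = random.Random(0)  # fixed seed so the run is repeatable
results = {}
for n in (18, 81):
    means = simulate(n, rng)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n={n}: mean of means={results[n][0]:.2f}, "
          f"SD of means={results[n][1]:.2f}, "
          f"sigma/sqrt(n)={POP_SD / n ** 0.5:.2f}")
```

In both cases the sample means centre near 60, and their spread shrinks toward 18/√18 ≈ 4.24 and 18/√81 = 2 as the sample size grows.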

## Central limit theorem

Population mean $\mu$
Population standard deviation $\sigma$
Sample mean $\bar{x}$
Sample standard deviation SD
Sample size N

## Central limit theorem

Given certain conditions, the central limit theorem states that given a distribution with a mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the mean approaches a normal distribution with a mean ($\mu$) and a standard deviation of $\sigma/\sqrt{N}$, as the sample size, N, increases, regardless of the underlying distribution.


Sampling distribution
Sampling distribution mean = population mean ($\mu$)
Sampling distribution standard deviation = $\sigma/\sqrt{N}$

## Let's know their names

Standard deviations

SD
SE

Means

## Let's have fun

You are imprisoned with a monster in a room.
The room is shaped like a half-ball.
The monster can see, but you are blind.
The monster is fixed in the center of the room, so he cannot run after you.
But you can go wherever you like.
The monster carries a wooden stick that is 1.96 m long.
Although the radius of the room is much more than 1.96 m, the probability that you are out of reach of his wooden stick is less than 5%, because the height of the room gets much smaller toward the periphery.
Now, if you have a 1.96 m wooden stick, how confident will you be that your stick will hit the monster?

Students are asked to count the number of chocolate chips in 20 cookies for a class activity. They found that the cookies on average had 15 chocolate chips with a standard deviation of 5 chocolate chips. Can you determine the true mean of the number of chocolate chips?

Confidence interval
A confidence interval is a range of values
that describes the uncertainty surrounding
an estimate.

Confidence level
The confidence level tells you how sure
you can be. It is expressed as a percentage
and represents how often the true
percentage of the population who would
pick an answer lies within the confidence
interval.

Confidence level
The 95% confidence level means you can be
95% certain; the 99% confidence level
means you can be 99% certain.
Most researchers use the 95% confidence
level.

## The confidence limits

The upper and lower values of a confidence
interval, that is, the values defining the
range of a confidence interval
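Applied to the chocolate chip example (20 cookies, mean 15, standard deviation 5), the confidence limits can be sketched as the sample mean plus or minus a multiple of the standard error. This sketch uses the normal cutoffs (about 1.96 and 2.58) from earlier in the lecture; that is an assumption, since with only 20 cookies a t-based interval would be somewhat wider. `normal_ci` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def normal_ci(mean, sd, n, level=0.95):
    """Confidence limits: mean +/- z * SE, where SE = sd / sqrt(n)."""
    se = sd / math.sqrt(n)                      # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    return mean - z * se, mean + z * se


lower95, upper95 = normal_ci(15, 5, 20, level=0.95)
lower99, upper99 = normal_ci(15, 5, 20, level=0.99)

print(f"95% CI: ({lower95:.2f}, {upper95:.2f})")
print(f"99% CI: ({lower99:.2f}, {upper99:.2f})")
```

The 99% interval is wider than the 95% interval: a higher confidence level costs precision.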

## Interpretation of confidence interval

Suppose we've taken a random sample of 10
ice-cream cones, and determined that a
95% confidence interval for the mean caloric
contents of a single scoop of ice-cream is
(260,310)

## Which of the following statements are correct?

1. If we repeatedly took samples of size 10 and then formed
confidence intervals, we would expect 95% of them to
contain the true (but unknown) mean, confidence interval
(260,310).
2. We are 95% confident that the true mean caloric content
lies between 260 and 310.
3. There is a 95% probability that the true mean caloric
content lies between 260 and 310.
4. We are 95% confident that the sample mean caloric
content lies between 260 and 310.

## Conditions for central limit theorem, informal

Important conditions to help ensure the sampling distribution is nearly normal and the estimate of SE sufficiently accurate:
- The sample observations are independent.
- The sample size is large: n > 30 is a good rule of thumb.
- The distribution of sample observations is not strongly skewed.

Independence
In probability theory, to say that two events
are independent means that the
occurrence of one does not affect the
probability of the other.

Professor Sir Roy Meadow, the medic who gave misleading evidence at the cot deaths trial of Sally Clark

Casino
A bag contains five marbles: three are orange and two are green.
It costs 3.5 pounds to play.
If you pick two consecutive green marbles, without replacement, you win ten pounds.
Would you play?
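The game can be worked through as a dependent-events calculation. The sketch below assumes the game consists of exactly two draws, that winning means both are green, and that the 3.5 pound stake is not returned; these readings are assumptions, as are the variable names.

```python
from fractions import Fraction
from itertools import permutations

bag = ["orange"] * 3 + ["green"] * 2

# Without replacement the draws are dependent:
# P(first green) = 2/5, then P(second green | first green) = 1/4.
p_win = Fraction(2, 5) * Fraction(1, 4)

# Cross-check by listing every ordered pair of distinct marbles.
draws = list(permutations(range(len(bag)), 2))
wins = sum(1 for a, b in draws if bag[a] == "green" and bag[b] == "green")
p_win_enumerated = Fraction(wins, len(draws))

cost, prize = Fraction(7, 2), 10
expected_net = prize * p_win - cost

print(f"P(two greens) = {p_win}")
print(f"Expected net result per game = {float(expected_net)} pounds")
```

Multiplying 2/5 by 2/5 would treat the two draws as independent, which they are not once the first marble is kept out of the bag.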

Applications
The estimated prevalence of sudden infant death syndrome (SIDS) is 1 out of 8543.
A mother had two babies who died of SIDS.
You are called to give expert testimony at the trial.

Based on an estimated prevalence of sudden infant death syndrome of 1 out of 8543, Dr. Meadow testified that the probability of a mother having two children with SIDS was $(1/8543)^2$, about 1 in 73 million.
The mother on trial was convicted of murder.
Sir Samuel Roy Meadow (born 1933) is a British paediatrician who claimed that, in a single family, "one sudden infant death is a tragedy, two is suspicious and three is murder, until proved otherwise."

Gambler's fallacy

Let's recap

1. Define statistical inference
2. Differentiate randomization from random sampling
3. Draw a probability distribution
4. Compute the middle 95% and 99% under the normal distribution
5. Explain the central limit theorem
6. Apply the central limit theorem
7. Determine the limitations of the central limit theorem
8. Explain what confidence intervals, confidence levels, and confidence limits are

Q and A

Thank You