Central Limit Theorem

Ghada Ahmed
Lecturer of Biomedical Informatics and
Medical Statistics

ILOs
1. Define statistical inference
2. Differentiate randomization from random sampling
3. Draw a probability distribution
4. Compute the middle 95% and 99% under the normal distribution
5. Explain the central limit theorem
6. Apply the central limit theorem
7. Determine the limitations of the central limit theorem
8. Explain what confidence intervals, confidence levels, and
confidence limits are

What can I do after I learn the central limit theorem?

Students are asked to count the number of
chocolate chips in 20 cookies for a class
activity. They found that the cookies on
average had 15 chocolate chips with a
standard deviation of 5 chocolate chips. Can
you determine the true mean of the number
of chocolate chips?

Statistical Inference

Infer

to guess that something is true because of the information that you have
e.g., I inferred from the number of cups that he was expecting visitors.

Statistical inference
Settings where one wants to infer facts
about a population using noisy statistical
data where uncertainty must be accounted
for.

Estimator and Estimand


The sample mean will estimate the
population mean
The sample median will estimate the
population median
The sample standard deviation will
estimate the population standard deviation

Motivating Example
In every major election, pollsters would like to
know, ahead of the actual election, who's
going to win.
What is the target of estimation?
What can we not do?
What can we do?

Choose the correct answer:


The goal of statistical inference is to:
a. Infer facts about a population from a
sample
b. Infer facts about a sample from a
population
c. Calculate sample quantities to
understand your data
d. To teach researcher about statistical

What is the difference between
randomization, random allocation,
random sampling and random
variable?

Random sampling
It is concerned with obtaining data that is
representative of the population of interest

The Literary Digest Poll


The Literary Digest polled about 10 million Americans, and
got responses from about 2.4 million. The poll showed that
Landon would likely be the overwhelming winner and FDR
would get only 43% of the votes.
Election result: FDR won, with 62% of the votes.
The magazine was completely discredited because of the
poll, and was soon discontinued.

OpenIntro Statistics, 2nd Edition, Chp 1: Intro. to data

1936 Presidential election

Literary Digest predicts:
Roosevelt 43% Landon 57%
Actual results:
Roosevelt 62% Landon 38%
They took a huge sample (2.4 million
people)

At the same time, Gallup predicted a
Roosevelt victory on the basis of a much
smaller sample (50,000)

Sampling method for the Literary Digest Poll
The sample was drawn using:
telephone directories
magazine subscriber lists
club and association rosters, etc.

They mailed out 10 million ballots

Selection bias
Sample lists tended to represent middle- and upper-class voters. In those days, many
poor people:
did not have telephones
didn't subscribe to magazines
didn't belong to clubs

Non response bias


10 million ballots were sent out
2.4 million were returned
response rate of 2.4/10 = 24%
Fact: those who respond to surveys tend to
be:
better educated
in higher economic brackets
in 1936, above = Republican!

Back to the soup analogy: If the soup is not
well stirred, it doesn't matter how large a
spoon you have, it will still not taste right.

If the soup is well stirred, a small spoon will
suffice to test the soup.

Randomization
It is concerned with balancing unobserved
variables that may confound inferences of
interest

A school district is considering whether it
will no longer allow high school students
to park at school after two recent
accidents where students were severely
injured. As a first step, they survey
parents by mail, asking them whether or
not the parents would object to this policy
change. Of 6,000 surveys that go out,
1,200 are returned. Of these 1,200 surveys
that were completed, 960 agreed with the
policy change and 240 disagreed.

Which of the following statements are true?
I. Some of the mailings may have never reached
the parents.
II. The school district has strong support from
parents to move forward with the policy approval.
III. It is possible that a majority of the parents of high
school students disagree with the policy change.
IV. The survey results are unlikely to be biased
because all parents were mailed a survey.

The goal of randomization of a
treatment in a randomized trial is
to:

1. It does not really do anything
2. To obtain a representative sample of
subjects from the population of interest
3. Balance unobserved covariates that may
contaminate the comparison between
the treated and control groups
4. To add variation to our conclusion

Distribution

Frequency distribution

Frequency
It is how often something
occurs

Frequency distribution
By counting frequencies we can make a
Frequency Distribution table.

How to display a
frequency distribution?
Bar chart or histogram

Bar Graph
A Bar Graph (also called Bar Chart) is a
graphical display of data using bars of
different heights.

The bar graph shows the favorite colors of 20 students in
a class.
How many more of them favored orange?

A. 2
B. 3
C. 4
D. 5

The bar graph shows the favorite colors of 20 students in
a class.
How many more of them favored green?

A. 2
B. 1
C. 4
D. 5

The bar graph shows the favorite colors of 20 students in
a class.
How many more of them favored orange than those who
favored green?

A. 2
B. 3
C. 4
D. 5

Histogram
A Histogram is a graphical
display of data using bars of
different heights.

A class carried out an experiment to measure the lengths
of cuckoo eggs. The length of each egg was measured to
the nearest mm. The results are shown in the following
histogram:

How many eggs were measured altogether in the
experiment?

A. 25
B. 40
C. 90
D. 100

A class carried out an experiment to measure the lengths
of cuckoo eggs. The length of each egg was measured to
the nearest mm. The results are shown in the following
histogram:

How many eggs were less than 23 mm in length?

A. 26
B. 40
C. 66
D. 92

Histograms vs Bar Charts

How to draw a frequency


distribution?

The list of IQ scores are: 129, 150, 124, 154, 127, 141,
118, 130, 149, 133, 142, 138, 128, 136, 130, 123, 125.
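
One way to turn the IQ list into a frequency distribution is to group the scores into class intervals and count how many fall in each. The sketch below assumes a class width of 10 (the slide does not specify one); `BIN_WIDTH` and `frequency_table` are names invented for this illustration.

```python
from collections import Counter

# IQ scores exactly as listed on the slide
iq_scores = [129, 150, 124, 154, 127, 141, 118, 130, 149,
             133, 142, 138, 128, 136, 130, 123, 125]

# Assumption: class intervals of width 10 (110-119, 120-129, ...).
BIN_WIDTH = 10


def frequency_table(values, width):
    """Count how many values fall in each interval [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return {start: counts[start] for start in sorted(counts)}


table = frequency_table(iq_scores, BIN_WIDTH)

# Print a text histogram: one '#' per observation in the interval
for start, freq in table.items():
    print(f"{start}-{start + BIN_WIDTH - 1}: {'#' * freq} ({freq})")
```

A different class width would give a different-looking picture of the same data, which is worth trying before settling on one.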

Probability distribution
A probability distribution assigns a
probability to each measurable subset of the
possible outcomes of a random experiment,
survey, or procedure of statistical inference.

Normal distribution

Skewed
distribution

Bimodal
Distribution

Characteristics of normal
distribution

Standard normal curve

Mean

Standard deviation

PDF

PDF

PDF of a random variable


It is a function that describes the relative likelihood
for this random variable to take on a given value.

The mean of the standard normal curve is ....

The standard deviation of the standard normal curve is ....

....% of normal density lies between -1.96 and 1.96 standard
deviations from the mean

....% of normal density lies within 1.96 standard
deviations from the mean

....% of normal density lies within 2.58 standard
deviations from the mean

....% of normal density lies between -2.58 and 2.58 standard
deviations from the mean
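
The areas asked about above can be checked numerically. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `central_area` is invented for this illustration.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1
std_normal = NormalDist(mu=0.0, sigma=1.0)


def central_area(z):
    """Probability that a standard normal value lies between -z and +z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)


area_196 = central_area(1.96)
area_258 = central_area(2.58)
print(f"area within 1.96 SD of the mean: {area_196:.4f}")
print(f"area within 2.58 SD of the mean: {area_258:.4f}")

# Going the other way: which cut-offs enclose the middle 95% and 99%?
z_95 = std_normal.inv_cdf(0.975)  # leaves 2.5% in each tail
z_99 = std_normal.inv_cdf(0.995)  # leaves 0.5% in each tail
print(f"middle 95% cut-off: +/-{z_95:.3f}")
print(f"middle 99% cut-off: +/-{z_99:.3f}")
```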

Population distribution
Population mean
Population SD

Group work

Sample distribution
Sample mean
Sample standard deviation
Sample size

What is this
distribution?

Four plots are presented below. The plot at the top is a
distribution for a population. The mean is 60 and the
standard deviation is 18.

(1) a single random sample of 500 values from
this population,
(2) a distribution of 500 sample means from
random samples, each of size 18,
(3) a distribution of 500 sample means from
random samples, each of size 81.

Central limit theorem
Population mean $\mu$
Population standard deviation $\sigma$
Sample mean $\bar{x}$
Sample standard deviation SD
Sample size N

Central limit theorem
Given certain conditions, the central limit
theorem states that, given a distribution with
mean $\mu$ and standard deviation $\sigma$, the
sampling distribution of the mean approaches
a normal distribution with mean $\mu$ and
standard deviation $\sigma/\sqrt{N}$ as the sample size, $N$,
increases, regardless of the underlying
distribution.
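
A small simulation can make the theorem concrete. The sketch below assumes an exponential population with mean 1 and standard deviation 1, chosen only because it is strongly skewed; the population, the sample sizes, and the helper `sample_means` are all choices made for this illustration, not part of the lecture.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumption: a right-skewed exponential population (mean 1, SD 1).
POP_MEAN = 1.0
POP_SD = 1.0


def sample_means(n, repeats):
    """Draw `repeats` random samples of size n and return their means."""
    return [
        statistics.fmean(random.expovariate(1.0 / POP_MEAN) for _ in range(n))
        for _ in range(repeats)
    ]


results = {}
for n in (2, 30, 200):
    means = sample_means(n, repeats=5000)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    predicted_sd = POP_SD / n ** 0.5  # what the theorem predicts
    print(f"n={n:>3}: mean of means={results[n][0]:.3f}, "
          f"SD of means={results[n][1]:.3f}, predicted SD={predicted_sd:.3f}")
```

Plotting a histogram of `means` for each sample size would show the skew fading as the sample size grows, while the spread shrinks in line with the prediction.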

Sampling distribution
Sampling distribution mean = population mean $\mu$
Sampling distribution standard deviation = $\sigma/\sqrt{N}$
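
The standard deviation of the sampling distribution (the standard error) is the population standard deviation divided by the square root of the sample size. As a worked sketch, the code below applies that to the group-work population (standard deviation 18) for samples of size 18 and 81; the function name `standard_error` is invented for this illustration.

```python
import math

POP_SD = 18.0  # population standard deviation from the group-work slide


def standard_error(sd, n):
    """Standard deviation of the sampling distribution of the mean."""
    return sd / math.sqrt(n)


se_18 = standard_error(POP_SD, 18)
se_81 = standard_error(POP_SD, 81)
print(f"samples of size 18: standard error = {se_18:.2f}")
print(f"samples of size 81: standard error = {se_81:.2f}")
```

The larger sample size gives the narrower distribution of sample means, which is one way to tell the two plots of sample means apart.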

Let's know their names


Standard deviations

SD
SE

Means

Let's have fun
You are imprisoned with a monster in a room.
The room is shaped like half a ball.
The monster can see, but you are blind.
The monster is fixed in the center of the room, so he cannot run after you.
But you can go wherever you like.
The monster carries a wooden stick that is 1.96 m long.
Although the radius of the room is much more than 1.96 m, the
probability that you are out of reach of his wooden stick is less than 5%,
because the height of the room gets much smaller toward the periphery.
Now, if you have a 1.96 m wooden stick, how confident can you be
that your stick will hit the monster?

Students are asked to count the number of
chocolate chips in 20 cookies for a class
activity. They found that the cookies on
average had 15 chocolate chips with a
standard deviation of 5 chocolate chips. Can
you determine the true mean of the number
of chocolate chips?

Confidence interval
A confidence interval is a range of values
that describes the uncertainty surrounding
an estimate.
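
For the cookie question, a rough 95% confidence interval can be sketched from the sample figures (20 cookies, mean 15, standard deviation 5). The code assumes the normal critical value 1.96 used elsewhere in this lecture; with only 20 cookies, a t critical value (about 2.09 for 19 degrees of freedom) would arguably be more appropriate and would give a slightly wider interval.

```python
import math

# Sample figures from the cookie slide
n = 20
sample_mean = 15.0
sample_sd = 5.0

# Assumption: normal critical value for the middle 95%
Z_95 = 1.96

standard_error = sample_sd / math.sqrt(n)
margin = Z_95 * standard_error

# Lower and upper confidence limits
lower = sample_mean - margin
upper = sample_mean + margin
print(f"standard error: {standard_error:.2f}")
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```

So the true mean cannot be pinned to a single value, but the sample does give a range of plausible values for it.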

Confidence level
The confidence level tells you how sure
you can be. It is expressed as a percentage
and represents how often the true
percentage of the population who would
pick an answer lies within the confidence
interval.

Confidence level
The 95% confidence level means you can be
95% certain; the 99% confidence level
means you can be 99% certain.
Most researchers use the 95% confidence
level.

The confidence limits
The upper and lower values of a confidence
interval, that is, the values defining the
range of a confidence interval

Interpretation of confidence interval
Suppose we've taken a random sample of 10
ice-cream cones, and determined that a
95% confidence interval for the mean caloric
content of a single scoop of ice-cream is
(260, 310)

Which of the following statements are correct?
1. If we repeatedly took samples of size 10 and then formed
confidence intervals, we would expect 95% of them to
contain the true (but unknown) mean, confidence interval
(260, 310).
2. We are 95% confident that the true mean caloric content
lies between 260 and 310.
3. There is a 95% probability that the true mean caloric
content lies between 260 and 310.
4. We are 95% confident that the sample mean caloric
content lies between 260 and 310.
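
The repeated-sampling reading of a confidence level can be illustrated with a simulation. Everything about the population below is invented for the illustration (a normal population with true mean 285 and standard deviation 40), and the interval is built with the known population standard deviation so that the 1.96 multiplier applies exactly.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population, invented for this sketch
TRUE_MEAN = 285.0
TRUE_SD = 40.0
N = 10        # sample size, as in the ice-cream example
Z_95 = 1.96   # normal critical value for the middle 95%


def interval_covers_true_mean():
    """Draw one sample, build a 95% interval, report whether it contains TRUE_MEAN."""
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.fmean(sample)
    se = TRUE_SD / N ** 0.5  # assumption: population SD is known
    return mean - Z_95 * se <= TRUE_MEAN <= mean + Z_95 * se


REPEATS = 10_000
coverage = sum(interval_covers_true_mean() for _ in range(REPEATS)) / REPEATS
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

Each individual interval either contains the true mean or it does not; the 95% describes how often the procedure succeeds over many repetitions.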

Conditions for central limit theorem, informal
Important conditions to help ensure the sampling
distribution is nearly normal and the estimate of
SE sufficiently accurate:
The sample observations are independent.
The sample size is large: n > 30 is a good rule
of thumb.
The distribution of sample observations is not
strongly skewed.

Independence
In probability theory, to say that two events
are independent means that the
occurrence of one does not affect the
probability of the other.

Professor Sir Roy Meadow, the medic who
gave misleading evidence at the cot deaths trial
of Sally Clark

Casino
A bag contains five marbles: three
are orange and two are green.
It costs 3.5 pounds to play.
If you pick two consecutive green marbles,
without replacement, you win ten
pounds.
Would you play?
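
The casino question can be worked through with exact fractions. The sketch assumes the game consists of exactly two draws and pays out only if both are green; because the draws are without replacement, the second draw depends on the first.

```python
from fractions import Fraction

# Bag contents and stakes from the slide
GREEN, TOTAL = 2, 5
COST = Fraction(7, 2)  # 3.5 pounds to play
PRIZE = 10             # pounds won for two consecutive greens

# The draws are NOT independent: removing a green marble changes the odds.
p_first_green = Fraction(GREEN, TOTAL)
p_second_green_given_first = Fraction(GREEN - 1, TOTAL - 1)
p_win = p_first_green * p_second_green_given_first

expected_net = p_win * PRIZE - COST
print(f"probability of winning: {p_win}")
print(f"expected net gain per game: {float(expected_net):.2f} pounds")
```

A negative expected net gain means that, on average, the player loses money each time they play.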

Applications
The estimated prevalence of sudden infant
death syndrome (SIDS) is 1 out of 8543.
A mother had two babies die
of SIDS.
You are called to give expert testimony at her
trial.
What is your opinion?

Based on an estimated prevalence of
sudden infant death syndrome of 1 out of 8543,
Dr. Meadow testified that the probability
of a mother having two children with SIDS
was $(1/8543)^2$.
The mother on trial was convicted of
murder.
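
Squaring the prevalence is only valid if the two deaths are independent events, which is the questionable step in this testimony. The sketch below reproduces the squared figure from the stated prevalence of 1 out of 8543, then shows how much the answer changes under a purely hypothetical conditional probability for the second death (the 1 in 100 value is invented for illustration, not an estimate from any study).

```python
from fractions import Fraction

p_sids = Fraction(1, 8543)  # estimated prevalence from the slide

# Treating the two deaths as independent: multiply the probabilities
p_two_if_independent = p_sids ** 2
print(f"assuming independence: 1 in {p_two_if_independent.denominator:,}")

# HYPOTHETICAL: if shared genetic or environmental factors raised the chance
# of a second death to 1 in 100, the joint probability would be far larger.
p_second_given_first = Fraction(1, 100)  # invented value, for illustration only
p_two_if_dependent = p_sids * p_second_given_first
print(f"with the hypothetical dependence: 1 in {p_two_if_dependent.denominator:,}")
```

Even a correct probability of two SIDS deaths would not by itself be the probability that the mother is innocent; that is a separate question.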

Sir Samuel Roy Meadow (born 1933) is a
British paediatrician who claimed that, in a
single family, "one sudden infant death is a
tragedy, two is suspicious and three is
murder, until proved otherwise."

Gambler's fallacy

Making right decisions

Let's recap
1. Define statistical inference
2. Differentiate randomization from random sampling
3. Draw a probability distribution
4. Compute the middle 95% and 99% under the normal distribution
5. Explain the central limit theorem
6. Apply the central limit theorem
7. Determine the limitations of the central limit theorem
8. Explain what confidence intervals, confidence levels, and
confidence limits are

Q and A

Thank You