
Probability Distribution

A variable is a symbol (A, B, x, y, etc.) that can take on any of a
specified set of values.
When the value of a variable is the outcome of a statistical
experiment, that variable is a random variable.

Generally, statisticians use a capital letter to represent a random
variable and a lower-case letter to represent one of its values. For
example,

X represents the random variable X.
P(X) represents the probability of X.
P(X = x) refers to the probability that the random variable X is
equal to a particular value, denoted by x. As an example, P(X = 1)
refers to the probability that the random variable X is equal to 1.
An example will make clear the relationship between random variables
and probability distributions. Suppose you flip a coin two times. This
simple statistical experiment can have four possible outcomes: HH, HT,
TH, and TT. Now, let the variable X represent the number of Heads that
result from this experiment. The variable X can take on the values 0, 1,
or 2. In this example, X is a random variable because its value is
determined by the outcome of a statistical experiment.
A probability distribution is a table or an equation that links each
outcome of a statistical experiment with its probability of occurrence.
Consider the coin flip experiment described above. The table below,
which associates each outcome with its probability, is an example of a
probability distribution. It represents the probability distribution of
the random variable X.

Number of heads (x)    Probability P(X = x)
0                      0.25
1                      0.50
2                      0.25
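The coin-flip distribution described above can be checked by listing the four equally likely outcomes and counting heads. The following is a minimal Python sketch; the variable names are illustrative and not part of the original slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the four equally likely outcomes of two coin flips: HH, HT, TH, TT.
outcomes = ["".join(flips) for flips in product("HT", repeat=2)]

# X = number of heads observed in each outcome.
head_counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = x) = (number of outcomes with x heads) / (total number of outcomes).
distribution = {
    x: Fraction(count, len(outcomes)) for x, count in sorted(head_counts.items())
}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p} = {float(p):.2f}")
```

Using exact fractions keeps the probabilities free of rounding error, so it is easy to confirm that they sum to 1.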

Discrete & Continuous Probability Distribution


If a variable can take on any value between two specified values, it is
called a continuous variable; otherwise, it is called a discrete variable.
Some examples will clarify the difference between discrete and
continuous variables.
Suppose the fire department mandates that all fire fighters must weigh
between 150 and 250 pounds. The weight of a fire fighter would be an
example of a continuous variable, since a fire fighter's weight could
take on any value between 150 and 250 pounds.
Suppose we flip a coin and count the number of heads. The number of
heads could be any integer value between 0 and plus infinity. However,
it could not be any number between 0 and plus infinity. We could not,
for example, get 2.5 heads. Therefore, the number of heads must be a
discrete variable.
Just like variables, probability distributions can be classified as discrete or
continuous.
Discrete Probability Distribution
If a random variable is a discrete variable, its probability distribution is
called a discrete probability distribution.

An example will make this clear. Suppose you flip a coin two times. This
simple statistical experiment can have four possible outcomes: HH, HT,
TH, and TT. Now, let the random variable X represent the number of
Heads that result from this experiment. The random variable X can only
take on the values 0, 1, or 2, so it is a discrete random variable. The
probability distribution for this statistical experiment appears below.

Number of heads (x)    Probability P(X = x)
0                      0.25
1                      0.50
2                      0.25

The table represents a discrete probability distribution because it relates each
value of a discrete random variable with its probability of occurrence.
The best-known discrete probability distributions used for
statistical modeling are:

Binomial Probability Distribution
Hypergeometric Probability Distribution
Poisson Probability Distribution

Note: With a discrete probability distribution, each possible value of the
discrete random variable can be associated with a non-zero probability.
Thus, a discrete probability distribution can always be presented in tabular
form.
Binomial Probability Distribution
Binomial Experiment
A binomial experiment (a sequence of Bernoulli trials) is a statistical
experiment that has the following properties:
The experiment consists of n repeated trials.
Each trial can result in just two possible outcomes. We call one of these outcomes
a success and the other, a failure.
The probability of success, denoted by P, is the same on every trial.
The trials are independent; that is, the outcome on one trial does not affect the
outcome on other trials.
Consider the following statistical experiment. You flip a coin 2 times and
count the number of times the coin lands on heads. This is a binomial
experiment because:
The experiment consists of repeated trials. We flip a coin 2 times.
Each trial can result in just two possible outcomes - heads or tails.
The probability of success is constant - 0.5 on every trial.
The trials are independent; that is, getting heads on one trial does not affect
whether we get heads on other trials.
Binomial Distribution
Possible Binomial Distribution Settings:
A manufacturing plant labels items as either defective or acceptable.
A firm bidding for contracts will either get a contract or not.
A marketing research firm receives survey responses of "yes, I will buy"
or "no, I will not".
New job applicants either accept the offer or reject it.
Notation
The following notation is helpful when we talk about binomial
probability.
x: The number of successes that result from the binomial
experiment.
n: The number of trials in the binomial experiment.
P: The probability of success on an individual trial.
Q: The probability of failure on an individual trial. (This is equal
to 1 - P)
b(x; n, P): Binomial probability - the probability that an n-trial
binomial experiment results in exactly x successes, when the
probability of success on an individual trial is P.
nCr : The number of combinations of n things, taken r at a time.
Binomial Distribution
A binomial random variable is the number of successes x in n
repeated trials of a binomial experiment. The probability distribution of
a binomial random variable is called a binomial distribution (the
special case n = 1 is known as a Bernoulli distribution).
Suppose we flip a coin two times and count the number of heads
(successes). The binomial random variable is the number of heads,
which can take on values of 0, 1, or 2. The binomial distribution is
presented below.

Number of heads (x)    Probability
0                      0.25
1                      0.50
2                      0.25

The binomial distribution has the following properties:

The mean of the distribution (μx) is equal to n * P.
The variance (σ²x) is n * P * ( 1 - P ).
The standard deviation (σx) is sqrt[ n * P * ( 1 - P ) ].
Binomial Probability
The binomial probability refers to the probability that a binomial experiment
results in exactly x successes. For example, in the above table, we see that
the binomial probability of getting exactly one head in two coin flips is 0.50.
Given x, n, and P, we can compute the binomial probability based on the
following formula:

b(x; n, P) = nCx * P^x * (1 - P)^(n - x)
Example 1:
Suppose a die is tossed 5 times. What is the probability of getting
exactly 2 fours?

Solution:
This is a binomial experiment in which the number of trials is equal to
5, the number of successes is equal to 2, and the probability of success
on a single trial is 1/6 or about 0.167. Therefore, the binomial
probability is:
b(2; 5, 0.167) = 5C2 * (0.167)^2 * (0.833)^3
b(2; 5, 0.167) = 0.161
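The calculation in Example 1 can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `binomial_pmf` is an illustrative choice, not something defined in the slides.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials with success probability p."""
    # nCx * P^x * (1 - P)^(n - x)
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Example 1: exactly 2 fours in 5 tosses of a fair die (P = 1/6).
prob_two_fours = binomial_pmf(2, 5, 1 / 6)
print(f"b(2; 5, 1/6) = {prob_two_fours:.3f}")
```

Passing the exact value 1/6 rather than the rounded 0.167 avoids carrying rounding error into the result.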
Example 2:
Evans is concerned about a low retention rate for employees. In recent
years, management has seen a turnover of 10% of the hourly employees
annually. Thus, for any hourly employee chosen at random, management
estimates a probability of 0.1 that the person will not be with the company
next year. Choosing 3 hourly employees at random, what is the
probability that exactly 1 of them will leave the company this year?
Solution:
This is a binomial experiment in which the number of trials is equal to
3, the number of successes is equal to 1, and the probability of success
on a single trial is 10% or 0.10. Therefore, the binomial probability is:
b(1; 3, 0.10) = 3C1 * (0.10)^1 * (0.9)^2
b(1; 3, 0.10) = 0.243
Example 3:
You are performing a cohort study. If the probability of developing
disease in the exposed group is .05 for the study duration, then if you
sample (randomly) 500 exposed people, how many do you expect to
develop the disease? Give a margin of error (+/- 1 standard deviation)
for your estimate.
Solution:
b(x; 500, 0.05)
E(x) or μx is equal to n * P = 500 * 0.05 = 25
Var(x) or σ²x is n * P * ( 1 - P ) = 500 * 0.05 * 0.95 = 23.75
StdDev(x) or σx is sqrt[ n * P * ( 1 - P ) ] = sqrt[23.75] = 4.87

Therefore: 25 ± 4.87
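The mean, variance, and standard deviation used in the cohort example follow directly from n and P. The sketch below is a minimal illustration; `binomial_moments` is a hypothetical helper name introduced here.

```python
from math import sqrt


def binomial_moments(n: int, p: float) -> tuple[float, float, float]:
    """Return (mean, variance, standard deviation) of a binomial distribution."""
    mean = n * p
    variance = n * p * (1 - p)
    return mean, variance, sqrt(variance)


# Cohort example: 500 exposed people, each with a 0.05 probability of disease.
mean, variance, std_dev = binomial_moments(500, 0.05)
print(f"expected cases: {mean:.0f} +/- {std_dev:.2f}")
```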
Cumulative Binomial Probability
A cumulative binomial probability refers to the probability that the
binomial random variable falls within a specified range (e.g., is
greater than or equal to a stated lower limit and less than or equal to a
stated upper limit).
For example, we might be interested in the cumulative binomial
probability of obtaining 45 or fewer heads in 100 tosses of a coin.
This would be the sum of all these individual
binomial probabilities.

b(x ≤ 45; 100, 0.5) = b(x = 0; 100, 0.5) + b(x = 1; 100, 0.5)
+ ... + b(x = 44; 100, 0.5) + b(x = 45; 100, 0.5)
Example 4:
The probability that a student is accepted to a prestigious college is
0.3. If 5 students from the same school apply, what is the probability
that at most 2 are accepted?

Solution:
To solve this problem, we compute 3 individual probabilities, using
the binomial formula. The sum of all these probabilities is the answer
we seek. Thus,
b(x ≤ 2; 5, 0.3) = b(x = 0; 5, 0.3) + b(x = 1; 5, 0.3) + b(x = 2; 5, 0.3)
b(x ≤ 2; 5, 0.3) = 0.1681 + 0.3601 + 0.3087
b(x ≤ 2; 5, 0.3) = 0.8369
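A cumulative binomial probability is just a sum of individual binomial probabilities, which is easy to express in code. The following is a minimal sketch; the name `binomial_cdf` is an assumption made for illustration.

```python
from math import comb


def binomial_cdf(x_max: int, n: int, p: float) -> float:
    """Probability of at most x_max successes in n independent trials."""
    # Sum b(x; n, p) for x = 0, 1, ..., x_max.
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(x_max + 1))


# Example 4: at most 2 of 5 applicants accepted when P = 0.3.
at_most_two = binomial_cdf(2, 5, 0.3)
print(f"P(X <= 2) = {at_most_two:.4f}")
```

The same function covers the coin example mentioned earlier: `binomial_cdf(45, 100, 0.5)` adds up the 46 terms from 0 through 45 heads.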
Example 5:
If the probability of being a smoker among a group of cases with lung
cancer is 0.6, what's the probability that in a group of 8 cases you have
fewer than 2 smokers? More than 5? What are the expected value and
variance of the number of smokers?
Solution:
The binomial coefficients 8Cx can be read from the last row of Pascal's triangle:
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
P(X > 5) = P(6) + P(7) + P(8) = 0.2090 + 0.0896 + 0.0168 = 0.3154
P(X < 2) = P(0) + P(1) = 0.00066 + 0.00786 = 0.0085

E(X) = 8 (0.6) = 4.8
Var(X) = 8 (0.6) (0.4) = 1.92
StdDev(X) = sqrt(1.92) = 1.39
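The smoker example can be worked end to end in Python, including the row of Pascal's triangle that supplies the coefficients 8Cx. This is a minimal sketch with illustrative variable names; evaluating the formula without intermediate rounding gives P(X < 2) of about 0.0085 and P(X > 5) of about 0.3154.

```python
from math import comb

n, p = 8, 0.6

# Row 8 of Pascal's triangle: the binomial coefficients 8C0 ... 8C8.
coefficients = [comb(n, x) for x in range(n + 1)]

# Binomial probability of exactly x smokers for x = 0 ... 8.
pmf = [c * p**x * (1 - p) ** (n - x) for x, c in enumerate(coefficients)]

fewer_than_two = sum(pmf[:2])   # P(0) + P(1)
more_than_five = sum(pmf[6:])   # P(6) + P(7) + P(8)

print(coefficients)
print(f"P(X < 2) = {fewer_than_two:.4f}")
print(f"P(X > 5) = {more_than_five:.4f}")
```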
Using Tables of Binomial Probabilities
Hypergeometric Probability Distribution
Hypergeometric Experiment
A hypergeometric experiment is a statistical experiment that has the
following properties:

n trials in a sample taken from a finite population of size N
Sample taken without replacement
Outcomes of trials are dependent
Concerned with finding the probability of x successes in the sample
where there are k successes in the population

Consider the following statistical experiment. You have an urn of 10
marbles - 5 red and 5 green. You randomly select 2 marbles without
replacement and count the number of red marbles you have selected.
This would be a hypergeometric experiment.
If you select a red marble on the first trial, the probability of selecting a
red marble on the second trial is 4/9. And if you select a green marble on
the first trial, the probability of selecting a red marble on the second trial
is 5/9.
Note further that if you selected the marbles with replacement, the
probability of success would not change. It would be 5/10 on every trial.
Then, this would be a binomial experiment.
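The difference between sampling with and without replacement in the marble example can be made concrete with exact fractions. The sketch below is a minimal illustration; `prob_red_on_second_draw` is a hypothetical helper, not part of the original material.

```python
from fractions import Fraction


def prob_red_on_second_draw(red: int, green: int, first_is_red: bool, replace: bool) -> Fraction:
    """Probability that the second marble is red, given the colour of the first draw."""
    if not replace:
        # Without replacement, the first marble leaves the urn.
        if first_is_red:
            red -= 1
        else:
            green -= 1
    return Fraction(red, red + green)


print(prob_red_on_second_draw(5, 5, first_is_red=True, replace=False))   # dependent trials
print(prob_red_on_second_draw(5, 5, first_is_red=False, replace=False))  # dependent trials
print(prob_red_on_second_draw(5, 5, first_is_red=True, replace=True))    # independent trials
```

Without replacement the answer depends on the first draw (hypergeometric); with replacement it does not (binomial).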
Notation
The following notation is helpful when we talk about hypergeometric
probability.
N: The number of items in the population
k: The number of items in the population that are classified as
successes.
n: The number of items in the sample
x: The number of items in the sample that are classified as successes.
kCx: The number of combinations of k things, taken x at a time.
h(x; N, n, k): Hypergeometric probability - the probability that an n-trial
hypergeometric experiment results in exactly x successes, when
the population consists of N items, k of which are classified as
successes.
Hypergeometric Distribution
A hypergeometric random variable is the number of successes that result
from a hypergeometric experiment. The probability distribution of a
hypergeometric random variable is called a hypergeometric distribution.
Given x, N, n, and k, we can compute the hypergeometric probability based
on the following formula:

h(x; N, n, k) = [ kCx ] [ (N-k)C(n-x) ] / [ NCn ]

The hypergeometric distribution has the following properties:

The mean of the distribution is equal to n * k / N.
The variance is n * k * ( N - k ) * ( N - n ) / [ N^2 * ( N - 1 ) ].
Example 1:
Suppose we randomly select 5 cards without replacement from an ordinary
deck of playing cards. What is the probability of getting exactly 2 red cards
(i.e., hearts or diamonds)?

Solution:
This is a hypergeometric experiment in which we know the following:
N = 52; since there are 52 cards in a deck.
k = 26; since there are 26 red cards in a deck.
n = 5; since we randomly select 5 cards from the deck.
x = 2; since 2 of the cards we select are red.
We plug these values into the hypergeometric formula as follows:
h(x; N, n, k) = [ kCx ] [ (N-k)C(n-x) ] / [ NCn ]
h(2; 52, 5, 26) = [ 26C2 ] [ 26C3 ] / [ 52C5 ]
h(2; 52, 5, 26) = [ 325 ] [ 2600 ] / [ 2,598,960 ] = 0.32513

Thus, the probability of randomly selecting exactly 2 red cards is 0.32513.
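The card calculation above can be reproduced with Python's standard library. This is a minimal sketch; the function name `hypergeom_pmf` and its argument order (mirroring h(x; N, n, k)) are choices made here for illustration.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, n: int, k: int) -> float:
    """Probability of exactly x successes in a sample of n drawn without
    replacement from a population of N items containing k successes."""
    # [kCx] * [(N-k)C(n-x)] / [NCn]
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


# Example 1: exactly 2 red cards among 5 cards drawn from a 52-card deck.
prob_two_red = hypergeom_pmf(2, 52, 5, 26)
print(f"h(2; 52, 5, 26) = {prob_two_red:.5f}")
```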


Example 2:
3 different computers are checked from 10 in the department. 4 of
the 10 computers have illegal software loaded. What is the
probability that 2 of the 3 selected computers have illegal software
loaded?
Solution:
This is a hypergeometric experiment in which we know the following:
N = 10; since there are 10 computers.
k = 4; since there are 4 computers with illegal software.
n = 3; since we randomly select 3 computers.
x = 2; since 2 of the selected computers have illegal software.
We plug these values into the hypergeometric formula as follows:
h(x; N, n, k) = [ kCx ] [ (N-k)C(n-x) ] / [ NCn ]
h(2; 10, 3, 4) = [ 4C2 ] [ 6C1 ] / [ 10C3 ]
h(2; 10, 3, 4) = [ 6 ] [ 6 ] / [ 120 ] = 0.3

The probability that 2 of the 3 selected computers have illegal
software loaded is 0.30, or 30%.
Cumulative Hypergeometric Probability
A cumulative hypergeometric probability refers to the probability that the
hypergeometric random variable is greater than or equal to some specified
lower limit and less than or equal to some specified upper limit.

For example, suppose we randomly select five cards from an ordinary deck
of playing cards. We might be interested in the cumulative hypergeometric
probability of obtaining 2 or fewer hearts. This would be the probability of
obtaining 0 hearts plus the probability of obtaining 1 heart plus the
probability of obtaining 2 hearts, as shown in the example below.
Example 3
Suppose we select 5 cards from an ordinary deck of playing cards.
What is the probability of obtaining 2 or fewer hearts?

Solution:
This is a hypergeometric experiment in which we know the following:
N = 52; since there are 52 cards in a deck.
k = 13; since there are 13 hearts in a deck.
n = 5; since we randomly select 5 cards from the deck.
x = 0 to 2; since our selection includes 0, 1, or 2 hearts.
We plug these values into the hypergeometric formula as follows:
h(x ≤ 2; 52, 5, 13) = h(x=0; 52, 5, 13) + h(x=1; 52, 5, 13) + h(x=2; 52, 5, 13)
h(x ≤ 2; 52, 5, 13) = [ (13C0) (39C5) / (52C5) ] + [ (13C1) (39C4) / (52C5) ] +
[ (13C2) (39C3) / (52C5) ]
h(x ≤ 2; 52, 5, 13) = [ (1)(575,757)/(2,598,960) ] + [ (13)(82,251)/(2,598,960) ]
+ [ (78)(9,139)/(2,598,960) ]
h(x ≤ 2; 52, 5, 13) = [ 0.2215 ] + [ 0.4114 ] + [ 0.2743 ]
h(x ≤ 2; 52, 5, 13) = 0.9072
Thus, the probability of randomly selecting at most 2 hearts is 0.9072.
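The cumulative probability for the hearts example is a sum of three hypergeometric terms that share the denominator 52C5. A minimal Python sketch follows; `hypergeom_cdf` is an illustrative name introduced here.

```python
from math import comb


def hypergeom_cdf(x_max: int, N: int, n: int, k: int) -> float:
    """Probability of at most x_max successes in a sample of n drawn without
    replacement from a population of N items containing k successes."""
    total = comb(N, n)  # every term shares the same denominator NCn
    return sum(comb(k, x) * comb(N - k, n - x) for x in range(x_max + 1)) / total


# Example 3: at most 2 hearts among 5 cards drawn from a 52-card deck.
at_most_two_hearts = hypergeom_cdf(2, 52, 5, 13)
print(f"h(x <= 2; 52, 5, 13) = {at_most_two_hearts:.4f}")
```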
Poisson Probability Distribution
Poisson Experiment
A Poisson experiment is a statistical experiment that has the following
properties:

The experiment results in outcomes that can be classified as successes or
failures.
The average number of successes (μ) that occurs in a specified region is
known.
The probability that a success will occur is proportional to the size of the
region.
The probability that a success will occur in an extremely small region is
virtually zero.

Note that the specified region could take many forms. For instance, it could
be a length, an area, a volume, a period of time, etc.
Notation
The following notation is helpful when we talk about Poisson probability.

e: A constant equal to approximately 2.71828. (Actually, e is the base of
the natural logarithm system.)
μ: The mean number of successes that occur in a specified region.
x: The actual number of successes that occur in a specified region.
P(x; μ): The Poisson probability that exactly x successes occur in a
Poisson experiment, when the mean number of successes is μ.
Poisson Distribution
A Poisson random variable is the number of successes that result from a
Poisson experiment. The Probability Distribution of a Poisson random
variable is called a Poisson distribution.
Given the mean number of successes (μ) that occur in a specified region,
we can compute the Poisson probability based on the following formula:

P(x; μ) = (e^-μ) (μ^x) / x!

The Poisson distribution has the following properties:

The mean of the distribution is equal to μ (equal to n * P when the Poisson
distribution approximates a binomial with many trials n and a small success
probability P).
The variance is also equal to μ.
Example 1
The average number of homes sold by the Acme Realty company is 2 homes
per day. What is the probability that exactly 3 homes will be sold tomorrow?

Solution:
This is a Poisson experiment in which we know the following:
μ = 2; since 2 homes are sold per day, on average.
x = 3; since we want to find the likelihood that 3 homes will be sold
tomorrow.
e = 2.71828; since e is a constant equal to approximately 2.71828.
We plug these values into the Poisson formula as follows:
P(x; μ) = (e^-μ) (μ^x) / x!
P(3; 2) = (2.71828^-2) (2^3) / 3!
P(3; 2) = (0.13534) (8) / 6
P(3; 2) = 0.180

Thus, the probability of selling exactly 3 homes tomorrow is 0.180.
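The home-sales calculation can be checked in Python with the standard `math` module. This is a minimal sketch; the function name `poisson_pmf` is an assumption for illustration.

```python
from math import exp, factorial


def poisson_pmf(x: int, mu: float) -> float:
    """Probability of exactly x successes when the mean number of successes is mu."""
    # e^(-mu) * mu^x / x!
    return exp(-mu) * mu**x / factorial(x)


# Example 1: exactly 3 homes sold tomorrow when the average is 2 per day.
prob_three_homes = poisson_pmf(3, 2)
print(f"P(3; 2) = {prob_three_homes:.3f}")
```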


Example 2
The number of typographical errors in new editions of textbooks
varies considerably from book to book. After some analysis, an instructor
concludes that the number of errors is Poisson distributed with a mean
of 1.5 typos per 100 pages. The instructor randomly selects 100 pages
of a new book. What is the probability that there are no typos?
Solution:
We plug these values into the Poisson formula as follows:
P(x; μ) = (e^-μ) (μ^x) / x!
P(0; 1.5) = (2.71828^-1.5) (1.5^0) / 0!
P(0; 1.5) = 0.2231

There is about a 22% chance of finding zero errors.


Example 3
Suppose that a rare disease has an incidence of 1 in 1000 person-years.
Assuming that members of the population are affected
independently, find the probability of k cases in a population of
10,000 (followed over 1 year) for k = 0, 1, 2.
Solution:
The expected value (mean) = μ = N*P = 10,000 * 0.001 = 10
10 new cases expected in this population per year
P(x; μ) = (e^-μ) (μ^x) / x!
P(0; 10) = (2.71828^-10) (10^0) / 0! = 0.0000454
P(1; 10) = (2.71828^-10) (10^1) / 1! = 0.000454
P(2; 10) = (2.71828^-10) (10^2) / 2! = 0.00227
Poisson Process (rates)

Note that the Poisson parameter can be given as the mean
number of events that occur in a defined time period OR,
equivalently, can be given as a rate, such as λ = 2/month (2
events per 1 month), that must be multiplied by t = time (called a
Poisson Process).
X ~ Poisson (λ)

P(X = k, t) = (e^-λt) (λt)^k / k!

E(X) = λt
Var(X) = λt
Example 4
If new cases of West Nile in New England are occurring at a rate
of about 2 per month, then what's the probability that exactly 4
cases will occur in the next 3 months? Exactly 6 cases?
Solution:
X ~ Poisson (λ = 2/month)
X = 4 cases, at 2 per month over 3 months
P(X = k, t) = (e^-λt) (λt)^k / k!
P(X = 4, 2*3) = (e^-(2*3)) (2*3)^4 / 4! = 0.1339
X = 6 cases, at 2 per month over 3 months
P(X = 6, 2*3) = (e^-(2*3)) (2*3)^6 / 6! = 0.1606
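When the Poisson parameter is given as a rate, the mean is the rate multiplied by the length of the observation window. The sketch below illustrates this for the West Nile example; `poisson_process_pmf` and its parameter names are hypothetical.

```python
from math import exp, factorial


def poisson_process_pmf(k: int, rate: float, t: float) -> float:
    """Probability of exactly k events in a window of length t for a Poisson
    process with the given rate (events per unit time)."""
    mean = rate * t  # expected number of events in the window
    return exp(-mean) * mean**k / factorial(k)


# Example 4: 2 cases per month, observed over 3 months.
prob_four_cases = poisson_process_pmf(4, rate=2, t=3)
prob_six_cases = poisson_process_pmf(6, rate=2, t=3)
print(f"P(X = 4) = {prob_four_cases:.4f}")
print(f"P(X = 6) = {prob_six_cases:.4f}")
```

The rate and the window must use the same time unit (here, months).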
Example 5
If calls to your cell phone are a Poisson process with a constant rate λ = 2 calls
per hour, what's the probability that, if you forget to turn your phone off in a
1.5 hour movie, your phone rings during that time? How many phone calls
do you expect to get during the movie?

Solution:
Probability that your phone rings in a 1.5hr movie
X ~ Poisson (λ = 2 calls/hour)
P(X ≥ 1) = 1 - P(X = 0)
X = 0 calls, at 2 calls/hr over 1.5 hrs
P(X = k, t) = (e^-λt) (λt)^k / k!
P(X = 0, 2*1.5) = (e^-(2*1.5)) (2*1.5)^0 / 0! = 0.0498, or about 5%
Therefore: P(X ≥ 1) = 1 - 0.05 = 0.95, a 95% chance
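The "at least one call" probability uses the complement rule: one minus the probability of zero events. A minimal Python sketch, with illustrative variable names, is shown below.

```python
from math import exp

rate_per_hour = 2.0   # calls per hour
movie_hours = 1.5     # length of the movie

# Expected number of calls during the movie.
expected_calls = rate_per_hour * movie_hours

# P(X = 0) for a Poisson variable is e^(-mean), since mean^0 / 0! = 1.
prob_no_calls = exp(-expected_calls)

# Complement rule: P(X >= 1) = 1 - P(X = 0).
prob_phone_rings = 1 - prob_no_calls

print(f"P(X = 0)  = {prob_no_calls:.4f}")
print(f"P(X >= 1) = {prob_phone_rings:.4f}")
```

Note that P(X = 0) is about 0.05 as a probability, which is 5 percent.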
Number of phone calls expected during a 1.5hr movie:
X ~ Poisson (λ = 2 calls/hour)
E(X) = λt = 2(1.5) = 3 calls


Cumulative Poisson Probability
A cumulative Poisson probability refers to the probability that the Poisson
random variable is greater than or equal to some specified lower limit and
less than or equal to some specified upper limit.

Example 6
Suppose the average number of lions seen on a 1-day safari is 5. What is the
probability that tourists will see fewer than four lions on the next 1-day
safari?
Solution:
This is a Poisson experiment in which we know the following:
μ = 5; since 5 lions are seen per safari, on average.
x = 0, 1, 2, or 3; since we want to find the likelihood that tourists will see
fewer than 4 lions; that is, we want the probability that they will see 0, 1, 2,
or 3 lions.
e = 2.71828; since e is a constant equal to approximately 2.71828.

To solve this problem, we need to find the probability that tourists
will see 0, 1, 2, or 3 lions. Thus, we need to calculate the sum of four
probabilities: P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5).
To compute this sum, we use the Poisson formula:

P(x ≤ 3; 5) = P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5)
P(x ≤ 3; 5) = [ (e^-5)(5^0) / 0! ] + [ (e^-5)(5^1) / 1! ] + [ (e^-5)(5^2) / 2! ]
+ [ (e^-5)(5^3) / 3! ]
P(x ≤ 3; 5) = [ (0.006738)(1) / 1 ] + [ (0.006738)(5) / 1 ]
+ [ (0.006738)(25) / 2 ] + [ (0.006738)(125) / 6 ]
P(x ≤ 3; 5) = [ 0.006738 ] + [ 0.03369 ] + [ 0.084224 ] + [ 0.140375 ]
P(x ≤ 3; 5) = 0.2650

Thus, the probability of seeing no more than 3 lions is 0.2650.
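The four-term sum in the safari example can be computed with a short loop. This is a minimal sketch; the function name `poisson_cdf` is an illustrative assumption.

```python
from math import exp, factorial


def poisson_cdf(x_max: int, mu: float) -> float:
    """Probability of at most x_max successes when the mean number of successes is mu."""
    # Sum P(x; mu) for x = 0, 1, ..., x_max.
    return sum(exp(-mu) * mu**x / factorial(x) for x in range(x_max + 1))


# Example 6: fewer than 4 lions (0, 1, 2, or 3) when the average is 5 per safari.
fewer_than_four = poisson_cdf(3, 5)
print(f"P(x <= 3; 5) = {fewer_than_four:.4f}")
```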
