
Probability and Statistics:

A Sample Analogues Approach


Charlie Gibbons
Economics 140
University of California, Berkeley
Summer 2011
Outline
1 Populations and samples
2 Probability
   Simple probability
   Joint probabilities
   Conditional probability
   Independence
3 Expectations
4 Dispersion
   Variance
   Covariance
5 Appendix: Additional examples
Populations and samples
The population is the universe of units that you care about.
Example: American adults.
A sample is a subset of the population.
Example: The observations in the Current Population Survey.
Econometrics uses a sample to make inferences about the
population. Sample statistics have population analogues.
Sample frequencies
We begin with some random quantity Y that takes on K possible values y_1, . . . , y_K. The value of this random variable for observation i is y_i; y_i is a realization of the random variable Y.
Example: The roll of a die can take on values 1, . . . , 6.
We ask, what is the sample frequency of some y from the set y_1, . . . , y_K? That is, what fraction of our observations have an observed value of y?
All we do is count the number of observations that have the value y and divide by the number of observations:

f̂(y) = #{y_i = y} / N.
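A minimal sketch in Python of this counting (the die-roll data are simulated, purely illustrative):

```python
import numpy as np

# Hypothetical sample: 1,000 rolls of a fair die
rng = np.random.default_rng(0)
y = rng.integers(1, 7, size=1_000)

# Sample frequency of y = 3: count matching observations, divide by N
f_hat = np.sum(y == 3) / len(y)
print(f_hat)  # close to 1/6 ≈ 0.167
```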
Probability mass function
We typically define the probability of y as the fraction of times that it arises if we had infinite observations: the sample frequency of y in an infinite sample.
We write this as Pr(Y = y). This is the probability that a
random variable Y takes on the value y.
Example: The probability of getting some value y ∈ {1, . . . , 6} when you roll a die is Pr(Y = y) = 1/6 for all y.
Terminology: y is a realization of the random variable Y and
Pr(Y = y) is a probability mass function.
Cumulative distribution function
We might care about the probability that Y takes on a value of y or less: Pr(Y ≤ y). This is called the cumulative distribution function (CDF) of Y.
To get this, we add up the probability of getting any value less
than y:
F(y) ≡ Pr(Y ≤ y) = ∑_{y_j ≤ y} Pr(Y = y_j).
Example: When you roll a die, the probability of getting a 3 or less is

F(3) = Pr(Y ≤ 3) = Pr(Y = 1) + Pr(Y = 2) + Pr(Y = 3) = 1/2.
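The same computation as a sketch, summing the PMF of a fair die up to y = 3:

```python
# PMF of a fair die: Pr(Y = y) = 1/6 for each y in 1..6
pmf = {y: 1 / 6 for y in range(1, 7)}

# CDF at 3: add up Pr(Y = y_j) over all values y_j <= 3
F_3 = sum(p for y_j, p in pmf.items() if y_j <= 3)
print(F_3)  # 0.5
```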
Continuous random variables
Life is pretty simple when we have a finite number of y values, but what if we have an infinite number?
The definition of the sample frequency doesn't change, but often the frequency of any particular value of y will be 1/N; i.e., only one observation will have that value.
Probability density function
Instead of a probability mass function, we have a probability density function that is defined as the derivative of the CDF:

f(y) = d/dy F(y).
Intuition: The derivative of the CDF answers, how much does the total probability change if we consider a slightly bigger value of y? How much more probable is getting a value less than y if we make y a bit bigger? This additional contribution to probability from a small change in y is the probability density at y.
Note: For continuous random variables, the CDF is the integral
of the PDF (cf., for discrete random variables, the CDF is the
sum of the PMF).
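One way to see this relationship numerically: a finite-difference slope of the normal CDF should match the normal PDF. A sketch using scipy's standard normal:

```python
from scipy.stats import norm

# Approximate the derivative of the CDF at y with a finite difference
y, h = 0.5, 1e-6
slope_of_cdf = (norm.cdf(y + h) - norm.cdf(y - h)) / (2 * h)

print(slope_of_cdf, norm.pdf(y))  # both approximately 0.3521
```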
CDF-PDF example
Figure: Normal CDF and PDF; the slope of the CDF line is the height of the PDF line.
Joint probabilities
Suppose that we have 2 random variables, X and Y, and want to consider their joint frequency in the sample. Extending our previous definition, we have
f̂(x, y) = #{y_i = y and x_i = x} / N.
These are often called cross tabs (tabulations).
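A cross tab is easy to produce; a sketch with pandas (simulated, hypothetical dice data):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two independent dice rolled 1,000 times
rng = np.random.default_rng(0)
x = rng.integers(1, 7, size=1_000)
y = rng.integers(1, 7, size=1_000)

# Joint sample frequencies: counts in each (x, y) cell divided by N
print(pd.crosstab(x, y, normalize=True))  # each cell near 1/36
```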
We have obvious extensions to a joint PMF, Pr(X = x, Y = y),
joint PDF, f (x, y), and joint CDF, F(x, y).
Examples (see appendix): joint PMF, joint CDF (discrete), joint normal PDF, joint normal CDF.
Conditional frequencies
Suppose that we have two random variables, but we want to
consider the distribution of one for some xed value of the
other. That is, what is the distribution of Y when X = x?
Note that we are limiting our sample: we only care about the observations such that x_i = x. Of this subgroup, what is the frequency of y?
Example: What is the distribution of student heights given that
they are male?
f̂(y | X = x) = #{y_i = y and x_i = x} / #{x_i = x}.
This is the sample frequency of y given, or conditional upon, X being x: the conditional sample frequency.
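In code, conditioning is just restricting the sample before counting; a minimal sketch (hypothetical dice data again):

```python
import numpy as np

# Hypothetical sample: two dice rolled 10,000 times
rng = np.random.default_rng(1)
x = rng.integers(1, 7, size=10_000)
y = rng.integers(1, 7, size=10_000)

# Conditional sample frequency of y = 3 given x = 1:
# keep only observations with x_i = 1, then count y_i = 3
subsample = y[x == 1]
f_hat = np.sum(subsample == 3) / len(subsample)
print(f_hat)  # near 1/6, since the dice are independent
```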
Conditional probability
The population analogue of conditional frequency, the
conditional probability of Y, forms the core of econometrics.
The probability that Y takes the value y given that X takes
the value x is
Pr(Y = y | X = x) = Pr(Y = y and X = x) / Pr(X = x).
We divide by the probability that X = x to account for the fact
that we are only considering a subpopulation.
Example (see appendix): conditional probabilities and dice.
Dictatorships and growth
Example from Bill Easterly's "Benevolent Autocrats" (2011).
Growth Commission Report, World Bank
Growth at such a quick pace, over such a long period, requires
strong political leadership.
Thomas Friedman, NY Times
One-party autocracy certainly has its drawbacks. But when it is led by a reasonably enlightened group of people, as China is today, it can also have great advantages. That one party can just impose the politically difficult but critically important policies needed to move a society forward in the 21st century.
Wrong question, wrong interpretation
                 Autocracy   Democracy
Growth Success       9           1

f̂(Autocracy | Success) = 9 / (9 + 1) = 90%
f̂(Democracy | Success) = 1 / (9 + 1) = 10%
Typical question
Econometricians generally ask for the
Pr(outcome | treatment and other predictors).
The right question
                 Autocracy   Democracy
Growth Success       9           1
Growth Failure      10           0
Neither             70          12

f̂(Success | Autocracy) = 9 / (9 + 10 + 70) = 10%
f̂(Success | Democracy) = 1 / (1 + 0 + 12) = 8%
f̂(Failure | Autocracy) = 10 / (9 + 10 + 70) = 11%
f̂(Failure | Democracy) = 0 / (1 + 0 + 12) = 0%
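The arithmetic, spelled out; the counts come straight from the table above:

```python
# Counts from the table: (Autocracy, Democracy)
success = (9, 1)
failure = (10, 0)
neither = (70, 12)

# Condition on the regime (column totals), not on the outcome (rows)
autocracies = success[0] + failure[0] + neither[0]   # 89
democracies = success[1] + failure[1] + neither[1]   # 13

print(success[0] / autocracies)  # ≈ 0.10: f(Success | Autocracy)
print(success[1] / democracies)  # ≈ 0.08: f(Success | Democracy)
```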
Independence
X and Y are independent if and only if

F_{X,Y}(x, y) = F_X(x) F_Y(y)

(note: these are the population CDFs) and

f_{X,Y}(x, y) = f_X(x) f_Y(y).

We also see that X and Y are independent if and only if

f_{Y|X}(y | X = x) = f_Y(y) for all x ∈ 𝒳.
Example: What's the probability of getting heads on a second coin toss if you got heads on the first?
This implies that knowing X gives you no additional ability to
predict Y, an intuitive notion underlying independence.
Example (see appendix): independence and dependence.
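A simulation sketch of the factorization property for two independent dice (hypothetical data; with a finite sample the two sides agree only approximately):

```python
import numpy as np

# Two independent dice, rolled many times
rng = np.random.default_rng(0)
x = rng.integers(1, 7, size=100_000)
y = rng.integers(1, 7, size=100_000)

# Joint frequency of (x = 2, y = 3) versus the product of the marginals
joint = np.mean((x == 2) & (y == 3))
product = np.mean(x == 2) * np.mean(y == 3)
print(joint, product)  # both near 1/36 ≈ 0.0278
```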
Sample average
We are all familiar with the sample average of Y: add up all the
observed values and divide by N:
ȳ = (1/N) ∑_{i=1}^{N} y_i.
Alternatively, we can consider every possible value of Y, y_1, . . . , y_K, and multiply each by its sample frequency:

ȳ = ∑_{j=1}^{K} y_j · #{y_i = y_j} / N.
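Both formulas give the same number; a quick sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(1, 7, size=1_000)  # hypothetical die rolls

# Direct average: sum all observations and divide by N
direct = y.sum() / len(y)

# Frequency-weighted average: each distinct value times its sample frequency
values, counts = np.unique(y, return_counts=True)
weighted = np.sum(values * counts / len(y))

print(direct, weighted)  # the same number
```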
Expectations
The population version is the expectation: take each value that Y can take on and multiply by its probability (as opposed to its sample frequency):

E(Y) = ∑_{j=1}^{K} y_j Pr(Y = y_j).
For a continuous random variable, we turn sums into integrals:
E(Y) = ∫ y f(y) dy.
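A numerical-integration sketch of this formula for a standard normal, where E(Y) = 0 (the choice of density is an assumption for illustration):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# E(Y) = integral of y * f(y) dy over the whole real line
value, _ = quad(lambda y: y * norm.pdf(y), -np.inf, np.inf)
print(value)  # approximately 0 for a standard normal
```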
Expectations of functions
We can calculate expectations of functions of Y, g(Y).
We have the equations
E[g(Y)] = ∑_{y ∈ 𝒴} f(y) g(y)
E[g(Y)] = ∫ g(y) f(y) dy
for discrete and continuous random variables respectively.
Expectations of functions example
Note that, in general, E[g(Y)] ≠ g[E(Y)].
Using a die rolling example,

E(Y²) = 1² · (1/6) + 2² · (1/6) + 3² · (1/6) + 4² · (1/6) + 5² · (1/6) + 6² · (1/6)
      = 91/6 ≈ 15.17
      ≠ 3.5² = 12.25 = [E(Y)]²,

so E(Y²) ≠ [E(Y)]².
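Verifying the arithmetic:

```python
# Fair die: E(Y) and E(Y^2), computed from the PMF
E_Y = sum(y * (1 / 6) for y in range(1, 7))       # 3.5
E_Y2 = sum(y**2 * (1 / 6) for y in range(1, 7))   # 91/6 ≈ 15.17

print(E_Y2, E_Y**2)  # 15.17 vs 12.25: not equal
```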
Expectations are linear
Expectations are linear operators, i.e.,
E(a g(Y) + b h(Y) + c) = a E[g(Y)] + b E[h(Y)] + c.
Expectations and independence
Recall that, for independent random variables X and Y,
f_{Y|X}(y | X = x) = f_Y(y) and f_{X|Y}(x | Y = y) = f_X(x).
Hence,
E(Y | X) = E(Y) and E(X | Y) = E(X).
Conditional expectations
The conditional expectation E[Y | X = x] asks, what is the
average value of Y given that X takes on the value x?
Conditional expectations hold X fixed at some x, and the value E[Y | X = x] varies depending upon which x we pick.
Since X is fixed, it isn't random and can come out of the expectation:

E[g(X)Y + h(X) | X = x] = g(x) E[Y | X = x] + h(x).
Law of iterated expectations
The law of iterated expectations says that
E_Y[Y] = E_X[E(Y | X = x)];

the expectation of Y is the conditional expectation of Y at X = x, averaged over all possible values of X.
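A simulation sketch of this law: let X be the first die and Y the sum of two dice, so E(Y) = 7 (simulated, hypothetical data; the sample versions agree only approximately):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(1, 7, size=100_000)      # first die
y = x + rng.integers(1, 7, size=100_000)  # sum of two dice

# E[Y | X = v] for each value v, then average over the distribution of X
# (X is uniform on 1..6, so a simple average of the conditional means works)
cond_means = [y[x == v].mean() for v in range(1, 7)]
print(np.mean(cond_means), y.mean())  # both near 7
```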
Variance
The variance of a random variable is a measure of its dispersion around its mean. It is defined as the second central moment of Y:

σ_Y² ≡ Var(Y) = E[(Y − μ)²].

Multiplying this out yields:

Var(Y) = E[Y² − 2μY + μ²]
       = E[Y²] − 2μ E(Y) + μ²
       = E[Y²] − [E(Y)]².
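Applying the identity to the die example from above:

```python
# Fair die: Var(Y) = E(Y^2) - [E(Y)]^2
E_Y = sum(y * (1 / 6) for y in range(1, 7))       # 3.5
E_Y2 = sum(y**2 * (1 / 6) for y in range(1, 7))   # 91/6

print(E_Y2 - E_Y**2)  # 35/12 ≈ 2.92
```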
Same mean, dierent variance
Figure: Densities with the same mean but different variances.
Variance facts
The standard deviation, σ, of a random variable is the square root of its variance; i.e., σ = √Var(Y).
While the variance is in squared units, the standard deviation is in the same units as y.
See that Var(aY + b) = a² Var(Y).
Sample analogue of variance
A candidate for the sample analogue of the variance of Y is

σ̂² = (1/N) ∑_{i=1}^{N} (y_i − ȳ)².

It turns out that this is a biased estimator of σ², so we use

s² = (1/(N − 1)) ∑_{i=1}^{N} (y_i − ȳ)²

instead to get an unbiased estimator.
It also turns out that the first (biased) estimator is consistent: its bias goes to 0 as N goes to ∞.
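numpy exposes both estimators through its ddof argument (the denominator is N − ddof); a sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=10)  # hypothetical small sample

# ddof=0: divide by N (biased); ddof=1: divide by N - 1 (unbiased)
print(y.var(ddof=0), y.var(ddof=1))
```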
Covariance
The covariance of random variables X and Y is defined as

Cov(X, Y) ≡ σ_XY = E[(X − E_X(X)) (Y − E_Y(Y))]
                 = E(XY) − μ_X μ_Y.

We have

Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y).
Note that covariance only measures the linear relationship between two random variables; we'll see just what this means later on.
The covariance between two independent random variables is 0.
Correlation
The correlation of random variables X and Y is defined as

ρ_XY = σ_XY / (σ_X σ_Y).

Correlation is a normalized version of covariance: how big is the covariance relative to the variation in X and Y? Both will have the same sign.
Sample analogues for covariance and correlation
Of course, we can get an unbiased estimator for covariance:

σ̂_XY = (1/(N − 1)) ∑_{i=1}^{N} (x_i − x̄)(y_i − ȳ).
The sample analogue of correlation can be calculated using the preceding definitions.
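A sketch with numpy, whose np.cov uses the N − 1 denominator by default (simulated, hypothetical data):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000)
y = 2 * x + rng.normal(size=1_000)  # linearly related to x by construction

# Sample covariance (N - 1 denominator) and sample correlation
print(np.cov(x, y)[0, 1])       # near 2 (= Cov(X, 2X + e) = 2 Var(X))
print(np.corrcoef(x, y)[0, 1])  # near 2 / sqrt(5) ≈ 0.89
```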
Standardization
Suppose that we take Y, subtract off its mean μ, and divide by its standard deviation σ. We have

E[(Y − μ)/σ] = (E[Y] − μ)/σ = 0

and

Var[(Y − μ)/σ] = (1/σ²) Var(Y − μ) = (1/σ²) Var(Y) = 1.
This is called standardizing a random variable.
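A sketch of standardizing a simulated sample:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=5, scale=2, size=100_000)  # hypothetical data

# Subtract the mean, divide by the standard deviation
z = (y - y.mean()) / y.std()
print(z.mean(), z.var())  # approximately 0 and 1
```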
Appendix: Additional examples
Example setup
Consider the roll of two dice and let X and Y be the outcomes
on each die. Then the 36 (equally-likely) possibilities are:
x, y 1 2 3 4 5 6
1 1,1 1,2 1,3 1,4 1,5 1,6
2 2,1 2,2 2,3 2,4 2,5 2,6
3 3,1 3,2 3,3 3,4 3,5 3,6
4 4,1 4,2 4,3 4,4 4,5 4,6
5 5,1 5,2 5,3 5,4 5,5 5,6
6 6,1 6,2 6,3 6,4 6,5 6,6
Joint PMF example
The joint probability mass function (joint PMF), f_{X,Y}, is

f_{X,Y}(x, y) = Pr(X = x and Y = y).

What is f_{X,Y}(6, 5)?
x, y 1 2 3 4 5 6
1 1,1 1,2 1,3 1,4 1,5 1,6
2 2,1 2,2 2,3 2,4 2,5 2,6
3 3,1 3,2 3,3 3,4 3,5 3,6
4 4,1 4,2 4,3 4,4 4,5 4,6
5 5,1 5,2 5,3 5,4 5,5 5,6
6 6,1 6,2 6,3 6,4 6,5 6,6
f_{X,Y}(6, 5) = 1/36
Joint CDF denition
The joint cumulative distribution function (joint CDF), F_{X,Y}(x, y), of the random variables X and Y is defined by

F_{X,Y}(x, y) = Pr(X ≤ x and Y ≤ y)
             = ∑_{s ≤ x} ∑_{t ≤ y} f_{X,Y}(s, t).
Joint CDF example
What is F_{X,Y}(2, 3)?
x, y 1 2 3 4 5 6
1 1,1 1,2 1,3 1,4 1,5 1,6
2 2,1 2,2 2,3 2,4 2,5 2,6
3 3,1 3,2 3,3 3,4 3,5 3,6
4 4,1 4,2 4,3 4,4 4,5 4,6
5 5,1 5,2 5,3 5,4 5,5 5,6
6 6,1 6,2 6,3 6,4 6,5 6,6
F_{X,Y}(2, 3) = 6/36 = 1/6
Joint normal PDF
Figure: Joint PDF of independent normals (axes X and Y; vertical axis: density).
Joint normal CDF
Figure: Joint CDF of independent normals (axes X and Y).
Conditional probability example
What is f(Y = 3 | X ≤ 2)?
1 2 3 4 5 6
1 1,1 1,2 1,3 1,4 1,5 1,6
2 2,1 2,2 2,3 2,4 2,5 2,6
f_{Y|X}(y = 3 | X ≤ 2) = 2/12 = 1/6
Note how our table changed dimensions because conditioning is all about changing the range of values that we care about; here, we only care about what happens if X ≤ 2.
Independence example
We showed in the two dice example that F_{X,Y}(2, 3) = 1/6, which is equal to

F_X(2) · F_Y(3) = (2/6) · (3/6) = 1/6.
This is because the rolls of the two dice are intuitively independent: the result on one die has no bearing on that of the other.
A new random variable
Imagine instead that X is the outcome on the first die and Z is the sum of the outcomes on the two dice. Then we have
x, z 1 2 3 4 5 6
1 1,2 1,3 1,4 1,5 1,6 1,7
2 2,3 2,4 2,5 2,6 2,7 2,8
3 3,4 3,5 3,6 3,7 3,8 3,9
4 4,5 4,6 4,7 4,8 4,9 4,10
5 5,6 5,7 5,8 5,9 5,10 5,11
6 6,7 6,8 6,9 6,10 6,11 6,12
As we would imagine, the result of X influences the value of Z, so they shouldn't be independent.
Dependence example
Let's prove it: What is F_{X,Z}(2, 5)?
x, z 1 2 3 4 5 6
1 1,2 1,3 1,4 1,5 1,6 1,7
2 2,3 2,4 2,5 2,6 2,7 2,8
3 3,4 3,5 3,6 3,7 3,8 3,9
4 4,5 4,6 4,7 4,8 4,9 4,10
5 5,6 5,7 5,8 5,9 5,10 5,11
6 6,7 6,8 6,9 6,10 6,11 6,12
F_{X,Z}(2, 5) = 7/36 ≠ 5/54 = (2/6) · (10/36) = F_X(2) · F_Z(5).