• Conditional density:
$$ p(y|x) = \frac{p(x,y)}{p_X(x)}, \qquad \int p(y|x)\,dy = 1 \quad \Big(\sum_y p(y|x) = 1 \text{ in the discrete case}\Big) $$
$$ p(x|y) = \frac{p^y (1-p)^{n-y} \cdot 1}{\binom{n}{y}\, p^y (1-p)^{n-y}} = \frac{1}{\binom{n}{y}}.\ ∎ $$
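The cancellation above can be checked by brute-force enumeration. The interpretation here (a Bernoulli($p$) sequence $x$ with $Y = \sum_i X_i = y$, so the conditional distribution of the sequence is uniform over the $\binom{n}{y}$ arrangements) is my reading of the example, not stated explicitly on this page:

```python
# Enumeration check: conditional probability of each sequence with y
# successes, given Y = y, should be 1 / C(n, y) regardless of p.
from itertools import product
from math import comb

n, p, y = 5, 0.3, 2                                # illustrative values (mine)
seqs = [x for x in product([0, 1], repeat=n) if sum(x) == y]
joint = [p**y * (1 - p)**(n - y) for _ in seqs]    # p(x, y) for each sequence
total = sum(joint)                                 # = C(n, y) p^y (1-p)^(n-y)
cond = [j / total for j in joint]                  # p(x | y)
assert all(abs(c - 1 / comb(n, y)) < 1e-12 for c in cond)
```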
• Bayes' rule:
$$ p(x|y) = \frac{p(y|x)\,p_X(x)}{\int_x p(y|x)\,p_X(x)\,dx} $$
$$ p(Y=y|X=x) = \frac{\binom{N}{y}\theta^y(1-\theta)^{N-y}\binom{y}{x}\binom{N-y}{n-x}}{\sum_y \binom{N}{y}\theta^y(1-\theta)^{N-y}\binom{y}{x}\binom{N-y}{n-x}} = \binom{N-n}{y-x}\theta^{y-x}(1-\theta)^{N-n-(y-x)}.\ ∎ $$
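A quick Monte Carlo sanity check of the result above (my code; assumes numpy, and the specific values of $N$, $n$, $\theta$, $x$ are arbitrary choices): draw $Y \sim \mathrm{Bin}(N,\theta)$ and then $X\,|\,Y$ hypergeometric; the claim is that $Y - x \mid X = x \sim \mathrm{Bin}(N-n, \theta)$.

```python
# Simulate the two-stage draw and check the conditional mean of Y - x,
# which should be (N - n) * theta if the posterior is Bin(N - n, theta).
import numpy as np

rng = np.random.default_rng(0)
N, n, theta, x0, sims = 20, 8, 0.4, 3, 200_000
y = rng.binomial(N, theta, size=sims)
x = rng.hypergeometric(y, N - y, n)        # n draws without replacement
rest = (y - x0)[x == x0]                   # Y - x among runs with X = x0
# Bin(N - n, theta) has mean (N - n) * theta = 4.8
assert abs(rest.mean() - (N - n) * theta) < 0.05
```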
• Conditional expectation:
$$ E(X|Y=y) = \int_x x\,p(x|y)\,dx $$
$$ E\{E(X|Y)\} = \int\!\!\int x\,p(x|y)\,dx\,p(y)\,dy = \int\!\!\int x\,p(x,y)\,dx\,dy = E(X) $$
Ex. (iid Bernoulli($p$) trials with $Y = \sum_i X_i$, so $E(X_1|Y) = Y/n$):
$$ E\{E(X_1|Y)\} = E\Big(\frac{Y}{n}\Big) = \frac{np}{n} = p = E(X_1).\ ∎ $$
Ex. $X_1, X_2 \overset{iid}{\sim} U(0,1)$, $Y = \min(X_1, X_2)$, $Z = \max(X_1, X_2)$,
$$ F(y,z) = p(Y \le y, Z \le z) $$
$$ p(y,z) = \frac{\partial^2}{\partial y\,\partial z}\,p(Y \le y, Z \le z) = \begin{cases} 2, & 0 < y \le z < 1 \\ 0, & \text{o.w.} \end{cases} $$
$$ p_Y(y) = \int_y^1 p(y,z)\,dz = 2(1-y), \quad 0 < y < 1 $$
$$ p(z|y) = \frac{2}{2(1-y)} = \frac{1}{1-y}, \quad 0 < y \le z < 1 $$
$$ E(Z|Y=y) = \int_y^1 z\,\frac{1}{1-y}\,dz = \frac{1+y}{2}, \quad 0 < y < 1.\ ∎ $$
• Jacobian: $h = (h_1, \dots, h_k)' : \mathbb{R}^k \to \mathbb{R}^k$, $t = (t_1, \dots, t_k)'$,
$$ J_h(t) = \begin{vmatrix} \frac{\partial}{\partial t_1}h_1(t) & \cdots & \frac{\partial}{\partial t_1}h_k(t) \\ \vdots & \ddots & \vdots \\ \frac{\partial}{\partial t_k}h_1(t) & \cdots & \frac{\partial}{\partial t_k}h_k(t) \end{vmatrix} $$
• $Y = g(X)$,
$$ F_Y(y) = \int \cdots \int_{A_k} p_X(x_1, \cdots, x_k)\,dx_1 \cdots dx_k = \int \cdots \int_{A_k} p_X(g^{-1}(t))\,|J_{g^{-1}}(t)|\,dt_1 \cdots dt_k.\ ∎ $$
Ex. $Y_1 = X_1 + X_2$, $Y_2 = \dfrac{X_1}{X_1 + X_2}$
• $Y_1 \perp\!\!\!\perp Y_2$
Recall:
$$ \Gamma(p, \lambda) = \frac{\lambda^p x^{p-1} e^{-\lambda x}}{\Gamma(p)} $$
$$ \Gamma(p) = \int_0^\infty t^{p-1} e^{-t}\,dt, \qquad \Gamma(p+1) = p\,\Gamma(p) $$
$$ \beta(r, s) = \frac{x^{r-1}(1-x)^{s-1}}{B(r, s)}, \qquad B(r, s) = \frac{\Gamma(r)\Gamma(s)}{\Gamma(r+s)} $$
[Figure: gamma densities — Gamma(0.5, 0.5), Gamma(1, 1), Gamma(10, 10).]
[Figure: beta densities — Beta(0.5, 9.5), Beta(0.25, 0.25), Beta(3, 3), Beta(1, 1).]
p.f.:
$$ J = \begin{vmatrix} \frac{dx_1}{dy_1} & \frac{dx_2}{dy_1} \\ \frac{dx_1}{dy_2} & \frac{dx_2}{dy_2} \end{vmatrix} = \begin{vmatrix} y_2 & 1 - y_2 \\ y_1 & -y_1 \end{vmatrix} = -y_1 $$
Corollary. $X_1, \dots, X_n \overset{indep.}{\sim} \Gamma(p_i, \lambda)$
$$ \Rightarrow \sum_{i=1}^n X_i \sim \Gamma\Big(\sum_{i=1}^n p_i,\ \lambda\Big) $$
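The corollary is easy to spot-check by simulation (my sketch; assumes numpy, and the shapes $p_i$ and rate $\lambda$ are arbitrary choices; note numpy's gamma uses a scale, i.e. $1/\lambda$, parameter):

```python
# Sum independent Gamma(p_i, lam) draws and compare the first two moments
# with those of Gamma(sum(p_i), lam): mean sum(p)/lam, variance sum(p)/lam^2.
import numpy as np

rng = np.random.default_rng(2)
lam, p = 2.0, [0.5, 1.0, 2.5]
total = sum(rng.gamma(shape=pi, scale=1 / lam, size=100_000) for pi in p)
assert abs(total.mean() - sum(p) / lam) < 0.02
assert abs(total.var() - sum(p) / lam**2) < 0.05
```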
1.4 χ², F, t Distributions
$$ p(Z_1^2 \le t) = p(-\sqrt{t} < Z_1 < \sqrt{t}) $$
$$ \therefore\ F_T(t) = \Phi(\sqrt{t}) - \Phi(-\sqrt{t}) $$
$$ f_T(t) = t^{-1/2}\,\phi(\sqrt{t}) = \frac{1}{\sqrt{2\pi}}\,t^{-1/2} e^{-t/2} = \Gamma(1/2,\ 1/2). $$
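The identification of $Z_1^2$'s density with $\Gamma(1/2, 1/2)$ (shape/rate form) can be confirmed numerically, assuming scipy is available (scipy parameterizes the gamma by scale, so rate $1/2$ becomes scale $2$):

```python
# The Gamma(1/2, rate 1/2) density should coincide with the chi-square(1)
# density on a grid of points.
import numpy as np
from scipy import stats

t = np.linspace(0.1, 10, 200)
gamma_pdf = stats.gamma.pdf(t, a=0.5, scale=2.0)   # rate 1/2 -> scale 2
chi2_pdf = stats.chi2.pdf(t, df=1)
assert np.allclose(gamma_pdf, chi2_pdf)
```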
– $V \perp\!\!\!\perp W$, $V \sim \chi^2_k$, $W \sim \chi^2_m$
– $S = \dfrac{V/k}{W/m} \sim F_{k,m}$
– $Q = \dfrac{Z}{\sqrt{V/k}} \sim t_k$
Corollary. $X = (X_1, \dots, X_n) \overset{iid}{\sim} N(0, \sigma^2)$.
$$ \frac{\sum_{i=1}^k X_i^2 / k}{\sum_{i=k+1}^{k+m} X_i^2 / m} \sim F_{k,m} $$
$$ \frac{X_1}{\sqrt{\sum_{i=2}^{k+1} X_i^2 / k}} \sim t_k $$
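A simulation check of the first display (mine; assumes numpy, with $k$, $m$, $\sigma$ chosen arbitrarily — note $\sigma^2$ cancels in the ratio):

```python
# Build the F ratio from iid N(0, sigma^2) draws and compare its sample
# mean with E(F_{k,m}) = m / (m - 2).
import numpy as np

rng = np.random.default_rng(3)
k, m, sigma = 5, 10, 3.0
x = rng.normal(scale=sigma, size=(400_000, k + m))
f = ((x[:, :k]**2).sum(axis=1) / k) / ((x[:, k:]**2).sum(axis=1) / m)
assert abs(f.mean() - m / (m - 2)) < 0.05    # m/(m-2) = 1.25
```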
[Figure: chi-square densities — χ²(2), χ²(5), χ²(10).]
[Figure: F densities — F(10, 1000), F(10, 50), F(10, 5), F(1, 5).]
[Figure: t densities — normal, t(2), t(8).]
1.5 Orthogonal Transformation
• Orthogonal matrix:
– $A' = A^{-1}$; $A'A = AA' = I$
– $u = Av \Rightarrow u'u = v'A'Av = v'v$
• $Y = AX + c$
• If $p_X(x) = \frac{1}{(\sqrt{2\pi\sigma^2})^n}\,e^{-x'x/2\sigma^2}$, then $p_Y(y) = \frac{1}{(\sqrt{2\pi\sigma^2})^n}\,e^{-(y-c)'(y-c)/2\sigma^2}$.
Theorem. $Z_i \overset{indep.}{\sim} N(\mu_i, \sigma^2)$, $i = 1, \dots, n$, $Y = AZ + c$, $A$ is orthogonal
$$ \Rightarrow Y_i \overset{indep.}{\sim} N(\eta_i, \sigma^2), \quad i = 1, \dots, n, \text{ where } \eta = A\mu + c. $$
Theorem. $Z_i \overset{iid}{\sim} N(\mu, \sigma^2)$, $i = 1, \dots, n$
$$ \Rightarrow\ \bar{Z} \perp\!\!\!\perp \frac{\sum_{i=1}^n (Z_i - \bar{Z})^2}{\sigma^2} \quad \text{and} \quad \frac{\sum_{i=1}^n (Z_i - \bar{Z})^2}{\sigma^2} \sim \chi^2_{n-1} $$
Proof. Let $A$ be an orthogonal matrix whose first row is $(1/\sqrt{n}, \dots, 1/\sqrt{n})$, and write $A_*$ for its remaining $n-1$ rows. Let $Y = AZ$. Then $Y'Y = Z'Z$, and $Y_1 = \sqrt{n}\,\bar{Z}$. So
$$ \sum_{i=1}^n (Z_i - \bar{Z})^2 = Z'Z - n\bar{Z}^2 = Y'Y - Y_1^2 = \sum_{i=2}^n Y_i^2. $$
Also $A_* \mathbf{1} = 0$, implying
$$ (Y_2, \dots, Y_n)' = A_* Z, $$
implying
$$ \bar{Z} \perp\!\!\!\perp \frac{\sum_{i=1}^n (Z_i - \bar{Z})^2}{\sigma^2}. $$
Also,
$$ \frac{\sum_{i=1}^n (Z_i - \bar{Z})^2}{\sigma^2} = \frac{\sum_{i=2}^n Y_i^2}{\sigma^2} \sim \chi^2_{n-1}.\ ∎ $$
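Both conclusions of the theorem show up clearly in simulation (my sketch; assumes numpy, with $n$, $\mu$, $\sigma$ arbitrary; zero correlation is of course weaker than independence, but it is the easy part to test):

```python
# For iid normals, the sample mean and the scaled sum of squared deviations
# should be uncorrelated, and the latter should have chi-square(n-1) mean.
import numpy as np

rng = np.random.default_rng(4)
n, sigma = 8, 2.0
z = rng.normal(loc=1.0, scale=sigma, size=(200_000, n))
zbar = z.mean(axis=1)
ss = ((z - zbar[:, None])**2).sum(axis=1) / sigma**2
assert abs(np.corrcoef(zbar, ss)[0, 1]) < 0.01
assert abs(ss.mean() - (n - 1)) < 0.05      # E(chi2_{n-1}) = n - 1 = 7
```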
1.6 Bivariate Normal
• $Y = AZ + \mu$, $\ Z = \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}$, $\ Z_i \overset{iid}{\sim} N(0, 1)$
$$ \mathrm{cov}(\mathbf{Y}) = \begin{pmatrix} \mathrm{var}(Y_1) & \mathrm{cov}(Y_1, Y_2) \\ \mathrm{cov}(Y_1, Y_2) & \mathrm{var}(Y_2) \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} = \begin{pmatrix} a_{11}^2 + a_{12}^2 & a_{11}a_{21} + a_{12}a_{22} \\ a_{11}a_{21} + a_{12}a_{22} & a_{21}^2 + a_{22}^2 \end{pmatrix} = A \cdot A' $$
$$ p_Z = \frac{1}{(\sqrt{2\pi})^2}\,e^{-Z'Z/2} $$
$$ p_Y = \frac{1}{(\sqrt{2\pi})^2\,|\det(A)|}\,e^{-(y-\mu)'(A^{-1})'A^{-1}(y-\mu)/2} = \frac{1}{(\sqrt{2\pi})^2\sqrt{|\det(AA')|}}\,e^{-(y-\mu)'(AA')^{-1}(y-\mu)/2} = \frac{1}{2\pi\sqrt{\det(\Sigma)}}\,e^{-(y-\mu)'\Sigma^{-1}(y-\mu)/2}, $$
where $\Sigma = \mathrm{cov}(\mathbf{Y})$.
$$ E(\mathbf{Y}) = \mu, \qquad \mathrm{cov}(A\mathbf{Y} + c) = A\,\Sigma\,A' $$
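The closed-form density with $\Sigma = AA'$ can be checked against a reference implementation, assuming scipy is available (the matrix $A$, mean $\mu$, and evaluation point below are arbitrary choices of mine):

```python
# Evaluate the bivariate normal density from the derived formula and
# compare with scipy's multivariate_normal pdf.
import numpy as np
from scipy.stats import multivariate_normal

A = np.array([[2.0, 0.5], [0.3, 1.5]])    # any invertible A (my choice)
mu = np.array([1.0, -2.0])
Sigma = A @ A.T                            # Sigma = A A'
y = np.array([0.5, -1.0])
d = y - mu
pdf = np.exp(-d @ np.linalg.inv(Sigma) @ d / 2) / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))
sp_pdf = multivariate_normal(mean=mu, cov=Sigma).pdf(y)
assert abs(pdf - sp_pdf) < 1e-12
```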
[Figure: surface plot of a two-dimensional normal density with $\mu_1 = \mu_2 = 0$, $\sigma_{11} = \sigma_{22} = 10$, $\rho = 0.5$.]
• $(X, Y) \sim \mathrm{BVN}(\mu, \Sigma)$: the conditional mean satisfies
$$ \frac{E(Y|X) - \mu_2}{\sigma_2} = \rho\,\frac{X - \mu_1}{\sigma_1} $$
$$ U_1 = aZ_1 + bZ_2 + \mu_1 $$
$$ U_2 = cZ_1 + dZ_2 + \mu_2 $$
$$ \Rightarrow\quad a^2 + b^2 = \sigma_1^2, \qquad c^2 + d^2 = \sigma_2^2, \qquad ac + bd = \rho\sigma_1\sigma_2 $$
One solution:
$$ U_1 = \sigma_1\sqrt{1 - \rho^2}\,Z_1 + \rho\sigma_1 Z_2 + \mu_1 $$
$$ U_2 = \sigma_2 Z_2 + \mu_2 $$
$$ \Rightarrow\ Y|X \sim N\Big(\mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(X - \mu_1),\ \sigma_2^2(1 - \rho^2)\Big) $$
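A quick numeric check (mine; assumes numpy, with $\sigma_1, \sigma_2, \rho, \mu_1, \mu_2$ chosen arbitrarily): building $U_1 = \sigma_1\sqrt{1-\rho^2}\,Z_1 + \rho\sigma_1 Z_2 + \mu_1$ and $U_2 = \sigma_2 Z_2 + \mu_2$ from iid standard normals should reproduce the target covariance matrix.

```python
# Simulate the construction and compare the sample covariance matrix with
# the intended (sigma1^2, rho*sigma1*sigma2; rho*sigma1*sigma2, sigma2^2).
import numpy as np

rng = np.random.default_rng(5)
s1, s2, rho, mu1, mu2 = 2.0, 3.0, 0.6, 1.0, -1.0
z1, z2 = rng.normal(size=(2, 500_000))
u1 = s1 * np.sqrt(1 - rho**2) * z1 + rho * s1 * z2 + mu1
u2 = s2 * z2 + mu2
cov = np.cov(u1, u2)
assert abs(cov[0, 0] - s1**2) < 0.05
assert abs(cov[1, 1] - s2**2) < 0.1
assert abs(cov[0, 1] - rho * s1 * s2) < 0.05
```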
1.7 Asymptotic Theory
Theorem. $(X_1, \dots, X_n)$ is a sample from a population with mean $\mu$ and positive, finite variance $\sigma^2$. Then
$$ \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \overset{d}{\to} Z \sim N(0, 1), $$
i.e.,
$$ p(\bar{X} \le x) \to \Phi\Big(\frac{\sqrt{n}(x - \mu)}{\sigma}\Big). $$
δ-method:
$$ \sqrt{n}\,[h(\bar{X}) - h(\mu)] \approx N(0,\ [h'(\mu)]^2\sigma^2) $$
Ex. $h(\bar{X}) = \bar{X}(1 - \bar{X})$: when $\mu \ne \frac{1}{2}$,
$$ \mathrm{var}(h(\bar{X})) \approx \frac{(1 - 2\mu)^2\sigma^2}{n}, $$
where $\sigma^2 = \mathrm{var}(X)$.
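The delta-method variance above can be compared with simulation (my sketch; assumes numpy, using Bernoulli($\mu$) data with $\mu = 0.3$ and $n = 400$, both my choices, so $h'(\mu) = 1 - 2\mu$ and $\sigma^2 = \mu(1-\mu)$):

```python
# Simulate many sample means, apply h(t) = t(1-t), and compare the empirical
# variance of h(Xbar) with the delta-method approximation.
import numpy as np

rng = np.random.default_rng(6)
mu, n, sims = 0.3, 400, 200_000
xbar = rng.binomial(n, mu, size=sims) / n
h = xbar * (1 - xbar)
sigma2 = mu * (1 - mu)                      # var of one Bernoulli draw
approx = (1 - 2 * mu)**2 * sigma2 / n       # delta-method variance
assert abs(h.var() / approx - 1) < 0.05
```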
Ex.
$$ F_{k,m} = \frac{\frac{1}{k}\sum_{i=1}^k Z_i^2}{\frac{1}{m}\sum_{i=k+1}^{k+m} Z_i^2} \xrightarrow{\ m \to \infty\ } \frac{1}{k}\,\chi^2_k $$
by the LLN (the denominator $\to 1$) and Slutsky's theorem. ∎
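The limit can be watched happening numerically (my sketch; assumes numpy, with $k = 4$ my choice): as $m$ grows, the $F_{k,m}$ mean $m/(m-2)$ drifts toward $1 = E(\chi^2_k/k)$.

```python
# Build F ratios for increasing m and check the sample means against m/(m-2).
import numpy as np

rng = np.random.default_rng(8)
k = 4
means = {}
for m in (10, 100, 1000):
    num = rng.chisquare(k, size=100_000) / k
    den = rng.chisquare(m, size=100_000) / m
    means[m] = (num / den).mean()       # E(F_{k,m}) = m/(m-2) -> 1
```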
• variance stabilization: choose $h$ so that
$$ \sigma^2 [h'(\mu)]^2 = C, $$
where $C$ is a constant.
Ex. $X \sim \mathrm{Poisson}(\lambda)$, $\mathrm{var}(X) = \lambda$, $\mathrm{var}(\bar{X}) = \frac{\lambda}{n}$:
$$ [h'(\lambda)]^2 \lambda = C \ \Rightarrow\ h'(\lambda) = \sqrt{\frac{C}{\lambda}} \ \Rightarrow\ h(\lambda) = 2\sqrt{C\lambda} + d $$
So we choose $h(t) = \sqrt{t}$; then
$$ \sqrt{n}\,(\sqrt{\bar{X}} - \sqrt{\lambda}) \approx N\Big(0, \frac{1}{4}\Big).\ ∎ $$
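The point of the transformation is that the variance no longer depends on $\lambda$; a simulation makes this visible (my sketch, assuming numpy; the values of $n$ and $\lambda$ are arbitrary):

```python
# var(sqrt(Xbar)) should be close to 1/(4n) for several different lambdas.
import numpy as np

rng = np.random.default_rng(7)
n, sims = 200, 50_000
ratios = []
for lam in (1.0, 4.0, 9.0):
    xbar = rng.poisson(lam, size=(sims, n)).mean(axis=1)
    ratios.append(np.sqrt(xbar).var() * 4 * n)   # should be ~1 regardless of lam
```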
Lemma. If $Q(c) = E(Y - c)^2$, then either $Q(c) = \infty$ for all $c$, or $Q$ is minimized uniquely by $c = E(Y)$.
Lemma. If $X$ is a random vector and $Y$ is a random variable, then either $E(Y - g(X))^2 = \infty$ for every function $g$, or $E(Y - E(Y|X))^2 \le E(Y - g(X))^2$ for every $g$, with strict inequality holding unless $g(X) = E(Y|X)$. This implies that $E(Y|X)$ is the best mean-squared-error predictor of $Y$ given $X$.
Recall that $\mathrm{var}(Y|X) = E((Y - E(Y|X))^2 \mid X)$ and note that
$$ \mathrm{var}(Y) = E(\mathrm{var}(Y|X)) + \mathrm{var}(E(Y|X)). $$
Ex. $Z_1 \perp\!\!\!\perp Z_2 \sim \mathrm{Bernoulli}(0.5)$, $X = Z_1$, $Y = Z_1 Z_2$,
$$ E(Y|X) = \frac{1}{2}X, $$
$$ \mathrm{var}(E(Y|X)) = \mathrm{var}\Big(\frac{1}{2}X\Big) = \frac{1}{16}, $$
$$ \mathrm{var}(Y|X = x) = E\Big[\Big(Z_1 Z_2 - \frac{1}{2}X\Big)^2 \Big| Z_1 = x\Big] = \frac{1}{4}x^2, $$
$$ E(\mathrm{var}(Y|X)) = \frac{1}{8}, $$
$$ \mathrm{var}(Y) = E(\mathrm{var}(Y|X)) + \mathrm{var}(E(Y|X)) = \frac{3}{16}.\ ∎ $$
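Because the example is a four-point distribution, the decomposition can be verified exactly by enumeration (my code; exact arithmetic via `fractions`):

```python
# Enumerate the four equally likely (Z1, Z2) outcomes and compute var(Y)
# exactly; it should equal 1/8 + 1/16 = 3/16.
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)
outcomes = [(z1, z1 * z2) for z1, z2 in product([0, 1], repeat=2)]  # (X, Y)
ey = sum(half * half * y for _, y in outcomes)                      # E(Y) = 1/4
var_y = sum(half * half * (y - ey)**2 for _, y in outcomes)
assert var_y == Fraction(3, 16)
```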
• Bivariate normal:
$$ Y|X \sim N\Big(\mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(X - \mu_1),\ \sigma_2^2(1 - \rho^2)\Big) $$
Hence the best predictor is $\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(X - \mu_1)$, and the MSE of the predictor is $\sigma_2^2(1 - \rho^2)$.
When $\sigma_1 = \sigma_2$ and $\mu_1 = \mu_2 = \mu$,