An Introduction
PETER M. LEE
Formerly Provost of Wentworth College,
University of York, England
Fourth Edition
Appendix D
Answers to Exercises
D.1 Exercises on Chapter 1
1.
Probability of a 2 : 1 split is 39/50, and the conditional probability that the king is the
odd one is 1/3, so probability one player has the king and the other the remaining two
is 13/50.
2. (a) If P(A) = 0 or P(A) = 1.
(b) Take A as odd red die, B as odd blue die and C as odd sum.
(c) Take C as {HHH, THH, THT, TTH}.
3. (a) P(homozygous) = 1/3; P(heterozygous) = 2/3.
(b) By Bayes' Theorem
    P(BB | 7 black) = (1/3)(1) / {(1/3)(1) + (2/3)(1/2^7)} = 64/65.
(c) Similarly we see by induction (with case n = 0 given by part (a)) that P(BB | first n black) = 2^{n−1}/(2^{n−1} + 1), since
    P(BB | first n + 1 black) = {2^{n−1}/(2^{n−1} + 1)}(1) / [{2^{n−1}/(2^{n−1} + 1)}(1) + {1/(2^{n−1} + 1)}(1/2)] = 2^n/(2^n + 1).
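This induction is easy to check numerically. A short Python sketch (assuming the standard reading of the exercise: a BB cat always bears black kittens, a Bb cat does so with probability 1/2, and P(BB) = 1/3 a priori):

```python
from fractions import Fraction

def p_bb_given_black(n):
    """Posterior P(BB | first n kittens black), starting from P(BB) = 1/3."""
    post = Fraction(1, 3)
    for _ in range(n):
        # A BB mother bears a black kitten with probability 1, a Bb mother with probability 1/2.
        num = post * 1
        post = num / (num + (1 - post) * Fraction(1, 2))
    return post

# Closed form from the induction: 2^(n-1)/(2^(n-1) + 1), i.e. 2^n/(2^n + 2).
for n in range(9):
    assert p_bb_given_black(n) == Fraction(2**n, 2**n + 2)

print(p_bb_given_black(7))  # 64/65
```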
4.
P(GG) p2
p(1 p)
5. The distribution function is
F(x) =
    0              for x < 3;
    1/36           for 3 ≤ x < 4;
    2/36           for 4 ≤ x < 5;
    4/36           for 5 ≤ x < 6;
    6/36           for 6 ≤ x < 7;
    (3[x] − 12)/36 for n ≤ x < n + 1 where 7 ≤ n < 15;
    32/36          for 15 ≤ x < 16;
    34/36          for 16 ≤ x < 17;
    35/36          for 17 ≤ x < 18;
    1              for x ≥ 18,
where [x] denotes the integer part of x.
6.
    (n choose k)(λ/n)^k (1 − λ/n)^{n−k} = (n/n)((n − 1)/n) · · · ((n − k + 1)/n) (λ^k/k!)(1 − λ/n)^{n−k} → (λ^k/k!) exp(−λ).
7. (a) P(k̃ = 0) = P(m̃ = 0, ñ = 0) = P(m̃ = 0) P(ñ = 0) = e^{−λ} e^{−μ} = e^{−(λ+μ)},
P(k̃ = 1) = P({m̃ = 1, ñ = 0} or {m̃ = 0, ñ = 1}) = λe^{−(λ+μ)} + μe^{−(λ+μ)} = (λ + μ)e^{−(λ+μ)}.
(b) More generally
    P(k̃ = k) = Σ_{m=0}^{k} P(m̃ = m) P(ñ = k − m) = Σ_{m=0}^{k} (λ^m/m!) e^{−λ} (μ^{k−m}/(k − m)!) e^{−μ}
    = (1/k!) Σ_{m=0}^{k} (k choose m) λ^m μ^{k−m} e^{−(λ+μ)} = {(λ + μ)^k/k!} e^{−(λ+μ)}.
It follows that
    P(m̃ = m | k̃ = k) = (k choose m) {λ/(λ + μ)}^m {1 − λ/(λ + μ)}^{k−m}.
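The convolution in (b) and the binomial conditional distribution can be checked directly in Python (λ and μ below are arbitrary illustrative values):

```python
from math import exp, factorial, comb, isclose

lam, mu = 1.7, 2.4

def pois(k, m):
    """Poisson probability P(X = k) with mean m."""
    return m**k * exp(-m) / factorial(k)

# The sum of independent Poissons is Poisson with the summed mean.
for k in range(20):
    conv = sum(pois(m, lam) * pois(k - m, mu) for m in range(k + 1))
    assert isclose(conv, pois(k, lam + mu), rel_tol=1e-12)

# Conditionally on the total k, the first count is B(k, lam/(lam + mu)).
k = 6
for m in range(k + 1):
    cond = pois(m, lam) * pois(k - m, mu) / pois(k, lam + mu)
    assert isclose(cond, comb(k, m) * (lam/(lam+mu))**m * (mu/(lam+mu))**(k-m), rel_tol=1e-12)
print("ok")
```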
8.
    P(ỹ ≤ y) = P(x̃² ≤ y) = P(−√y ≤ x̃ ≤ √y) = P(x̃ ≤ √y) − P(x̃ < −√y).
10. Let m be the minimum of u and v and c be the length of the centre section. Then p(m, c) = p_{ũ,ṽ}(m, c) + p_{ũ,ṽ}(c, m) = 2 if m + c ≤ 1 and 0 otherwise.
11. If F(x, y) = F(x)F(y) then by differentiation with respect to x and with respect to y we have p(x, y) = p(x)p(y). The converse follows by integration.
In discrete cases, if F(x, y) = F(x)F(y) then p(x, y) = F(x, y) − F(x − 1, y) − F(x, y − 1) + F(x − 1, y − 1) = {F_x̃(x) − F_x̃(x − 1)}{F_ỹ(y) − F_ỹ(y − 1)} = p(x)p(y). Conversely, if p(x, y) = p(x)p(y) then F(x, y) = Σ_{ξ≤x} Σ_{η≤y} p_{x̃,ỹ}(ξ, η), and so F(x, y) = Σ_{ξ≤x} p_x̃(ξ) Σ_{η≤y} p_ỹ(η) = F(x)F(y).
12.
13. EX = Σ EZᵢ² = Σ 1 = ν, while EX² = Σ EZᵢ⁴ + Σ_{i≠j} EZᵢ²EZⱼ² = 3ν + ν(ν − 1), so that VX = EX² − (EX)² = 2ν. Similarly by integration.
14. EX = nπ, EX(X − 1) = n(n − 1)π² and EX(X − 1)(X − 2) = n(n − 1)(n − 2)π³, so EX = nπ, EX² = EX(X − 1) + EX = n(n − 1)π² + nπ and
    E(X − EX)³ = EX³ − 3(EX²)(EX) + 2(EX)³
    = EX(X − 1)(X − 2) − 3(EX²)(EX) + 3EX² + 2(EX)³ − 2EX
    = nπ(1 − π)(1 − 2π),
and thus γ₁ = (1 − 2π)/√{nπ(1 − π)}.
15.
    Vx̃ = ∫ (x − Ex̃)² p(x) dx ≥ ∫_{{x: |x−Ex̃|≥c}} (x − Ex̃)² p(x) dx ≥ c² P(|x̃ − Ex̃| ≥ c).
17. p(y|x) = {2π(1 − ρ²)}^{−1/2} exp{−½(y − ρx)²/(1 − ρ²)}, so conditional on x we have y ∼ N(ρx, 1 − ρ²), so E(y|x) = ρx. E(y²|x) = V(y|x) + {E(y|x)}² = 1 − ρ² + ρ²x², hence E(xy|x) = ρx², E(x²y²|x) = x² − ρ²x² + ρ²x⁴. Therefore Exy = ρ and (as Ex⁴ = 3) Ex²y² = 1 + 2ρ², so Exy − (Ex)(Ey) = ρ and Ex²y² − (Ex²)(Ey²) = 2ρ². As Vx = 1 and Vx² = Ex⁴ − (Ex²)² = 3 − 1 = 2 = Vy², the result follows.
18. (a) p(x, y) = (λˣe^{−λ}/x!) (x choose y) πʸ(1 − π)^{x−y}, so adding over x = y, y + 1, . . . and using Σ λ^{x−y}(1 − π)^{x−y}/(x − y)! = e^{λ(1−π)} we get p(y) = (λπ)ʸ e^{−λπ}/y!, so that ỹ ∼ P(λπ). Now note that E_{ỹ|x̃}(ỹ | x̃) = πx̃, and this has expectation πλ.
(b) V_{ỹ|x̃}(ỹ | x̃) = π(1 − π)x̃, which has expectation π(1 − π)λ, while E_{ỹ|x̃}(ỹ | x̃) = πx̃ has variance π²λ, so that the right hand side adds to πλ, the variance of ỹ.
19. We note that
    I = ∫₀^∞ exp(−½z²) dz = ∫₀^∞ exp(−½(xy)²) y dx
for any y (on setting z = xy). Putting z in place of y, it follows that for any z
    I = ∫₀^∞ exp(−½(zx)²) z dx
so that
    I² = ∫₀^∞ exp(−½z²) {∫₀^∞ exp(−½(zx)²) z dx} dz = ∫₀^∞ ∫₀^∞ z exp{−½z²(1 + x²)} dz dx
    = [−exp(−t)]₀^∞ [tan⁻¹ x]₀^∞ = 1 × ½π = ½π
(substituting t = ½z²(1 + x²) in the inner integral), and hence I = √(π/2), so that the integral of φ from −∞ to ∞ is 1, and hence φ is a probability density function. This method is apparently due to Laplace (1812, Section 24, pages 94–95 in the first edition).
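Laplace's value I = √(π/2) ≈ 1.2533 is easily confirmed numerically, for example by Simpson's rule on [0, 10] (the truncated tail is negligible); a Python sketch:

```python
from math import exp, pi, sqrt, isclose

def simpson(f, a, b, n=10_000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2*i - 1)*h) for i in range(1, n//2 + 1))
    s += 2 * sum(f(a + 2*i*h) for i in range(1, n//2))
    return s * h / 3

I = simpson(lambda z: exp(-0.5*z*z), 0.0, 10.0)
assert isclose(I, sqrt(pi/2), rel_tol=1e-9)
print(round(I, 6))  # 1.253314
```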
D.2 Exercises on Chapter 2
1. p(π | x) = π^k (1 − π)^{n−k} / B(k + 1, n − k + 1).
2. 90% HDR is 16.35525 ± 1.6449/√12, or 16.35525 ± 0.47484, that is, the interval (15.88041, 16.83009).
3.
5. Posterior is k(2π/25)^{−1/2} exp{−½(θ − 0.33)² × 25} for θ > 0 and 0 for θ < 0. Integrating we find 1 = kP(X > 0) where X ∼ N(0.33, 1/25), so k = {1 − Φ(−1.65)}⁻¹.
Since p(θ|x) = Ac{1 + O(√(φ/n))} on the interval I and ∫_{I∪J} p(θ|x) dθ = 1, we deduce that
    1/{(1 + ε)(1 − δ) + Mδ} ≤ Ac ≤ 1/{(1 − ε)(1 − δ)}.
8. The likelihood is
    (∏ g(xᵢ)) h(θ)ⁿ exp{Σ t(xᵢ) ψ(θ)},
so by the factorization criterion Σ t(xᵢ) is sufficient for θ given x.
Sufficient statistic is Σ_{i=1}^{n} xᵢ, or equivalently x̄.
(Σ_{i=1}^{n} xᵢ, ∏_{i=1}^{n} xᵢ), or equivalently (x̄, x̌), where x̌ is the geometric mean.
p(θ) ∝ exp(−θ/φ).
14. If θ ∼ U(−π/2, π/2) and x = tan θ, then by the usual change-of-variable rule p(x) = π⁻¹ |dθ/dx| = π⁻¹(1 + x²)⁻¹, and similarly if θ ∼ U(π/2, 3π/2). The result follows.
15. Straightforwardly
    p(x | π) = {n!/(x! y! z!)} πˣ ρʸ σᶻ
    = {n!/(x! y! (n − x − y)!)} πˣ ρʸ exp{(n − x − y) log(1 − π − ρ)}.
16. S = (n − 1)s² = 2970; S₁ = S₀ + S + (n₀⁻¹ + n⁻¹)⁻¹(θ₀ − x̄)² = 3336; s₁/√n₁ = √{S₁/(ν₁n₁)} = 0.56.
For the prior, find the 75% point of t₄ from, e.g., Neave's Table 3.1 as 0.741. For the posterior, as the degrees of freedom are large, we can approximate t by a normal, noting that the 75% point of the standard normal distribution is 0.6745. Hence a 50% prior HDR for θ is 85 ± 0.741√(350/(4 × 1)), that is (75.6, 94.4), while a 50% posterior HDR for θ is 88.96 ± 0.6745 × 0.56, that is, (88.58, 89.34).
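The numerical values here can be reproduced in a few lines of Python, assuming the usual normal/chi-squared updates with prior n₀ = 1, ν₀ = 4, S₀ = 350, θ₀ = 85 (the same prior as in Exercise 5 on Chapter 9) and data n = 100, x̄ = 89, S = 2970:

```python
from math import sqrt, isclose

n0, nu0, S0, theta0 = 1, 4, 350, 85
n, xbar, S = 100, 89, 2970

n1 = n0 + n                # posterior "sample size"
nu1 = nu0 + n              # posterior degrees of freedom
S1 = S0 + S + (1/n0 + 1/n)**-1 * (theta0 - xbar)**2
post_mean = (n0*theta0 + n*xbar) / n1
scale = sqrt(S1 / (nu1 * n1))   # s1/sqrt(n1)

assert round(S1) == 3336
assert isclose(post_mean, 88.96, abs_tol=0.005)
assert round(scale, 2) == 0.56
print(round(S1), round(post_mean, 2), round(scale, 2))
```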
17. With p₁(θ) being N(0, 1) we see that p₁(θ|x) is N(2, 1), and with p₂(θ) being N(1, 1) we see that p₂(θ|x) is N(3, 1). As
    ∫ p(x|θ)p₁(θ) dθ = ∫ (1/2π) exp{−½(x − θ)² − ½θ²} dθ
    = √(½) (1/√(2π)) exp{−¼x²} ∫ {2π(½)}^{−1/2} exp{−½(θ − ½x)²/(½)} dθ
    = {1/√(4π)} exp{−¼x²} = {1/√(4π)} e⁻²
and similarly
    ∫ p(x|θ)p₂(θ) dθ = {1/√(4π)} exp{−¼(x − 1)²} = {1/√(4π)} e^{−1/4},
so that just as in Section 2.13 we see that the posterior is an α′ to β′ mixture of N(2, 1) and N(3, 1), where α′ ∝ (2/3)e⁻² = 0.09 and β′ ∝ (1/3)e^{−1/4} = 0.26, so that α′ = 0.26 and β′ = 0.74. It follows that
    P(θ < 1) = 0.26 × 0.1587 + 0.74 × 0.0228 = 0.058.
18.
    nX̄² + n₀θ₀² − (nX̄ + n₀θ₀)²/(n + n₀)
    = {1/(n + n₀)} [{n(n + n₀) − n²}X̄² + {n₀(n + n₀) − n₀²}θ₀² − 2(nn₀)X̄θ₀]
    = {nn₀/(n + n₀)} [X̄² + θ₀² − 2X̄θ₀] = (n₀⁻¹ + n⁻¹)⁻¹ (X̄ − θ₀)².
D.3 Exercises on Chapter 3
1. Using the fact that the variance of Be(α, β) is αβ/{(α + β)²(α + β + 1)} and taking α/(α + β) = 1/3, so that the variance is 2/{3²(3α + 1)}, we find α = 55/27 ≈ 2 and β = 4. Posterior is then Be(2 + 8, 4 + 12), that is Be(10, 16).
The 95% values for F₃₂,₂₀ are 0.45 and 2.30 by interpolation, so those for F₂₀,₃₂ are 0.43 and 2.22. An appropriate interval for π is from 10 × 0.43/(16 + 10 × 0.43) to 10 × 2.22/(16 + 10 × 2.22), that is (0.21, 0.58).
4. Take α/(α + β) = 0.4 and α + β = 12, so approximately α = 5, β = 7. Posterior is then Be(5 + 12, 7 + 13), that is, Be(17, 20).
5. Take a beta prior with α/(α + β) = 1/3 and α + β = 11/4 = 2.75. It is convenient to take integer α and β, so take α = 1 and β = 2, giving α/(α + β) = 1/3 and (α + β)/11 = 0.273. Variance of the prior is 2/36 = 0.0555, so the standard deviation is 0.236. Data is such that n = 11 and X = 3, so the posterior is Be(4, 10). From tables, the values of F corresponding to a 50% HDR for log F₂₀,₈ are 0.67 and 1.52. It follows that the appropriate interval of values of F₈,₂₀ is (1/1.52, 1/0.67), that is (0.66, 1.49). Consequently an appropriate interval for the proportion required is
    4 × 0.66/(10 + 4 × 0.66) ≤ π ≤ 4 × 1.49/(10 + 4 × 1.49), that is (0.21, 0.37).
The posterior mean is 4/(4 + 10) = 0.29, and the posterior mode is (4 − 1)/(4 + 10 − 2) = 0.25.
12. Using Appendix A.16, the coefficient of variation is √{2/((ν + 1)(ν − 2))}. This is less than 0.01 if ½(ν + 1)(ν − 2) > 1/0.01², that is if ν² − ν − 20,002 > 0, so if ν > ½(1 + √80,009) = 141.9. Taking the reference prior p(μ, φ) ∝ 1/φ (cf. Section 3.6), we need ν₀ = n − 1 > 141.9, that is, n at least 143.
13. Take prior p(θ) = (d − 1)θ₀^{d−1}θ^{−d} for θ₀ < θ < ∞. Then the posterior is p(θ|x) = (d′ − 1)θ₁^{d′−1}θ^{−d′} for θ₁ < θ < ∞, where d′ = d + n(c + 1) and θ₁ = max{θ₀, M} with M = max{xᵢ}.
14. Prior p(θ) ∝ 1/θ as before, but the likelihood is now p(71, 100 | θ) = 1/θ² for θ > 100, so the posterior is approximately
    p(θ | 71, 100) ∝ θ⁻³ / Σ_{θ>100} θ⁻³.
It follows that P(θ > θ₀) ≈ 100²/θ₀², and in particular the posterior median is 100√2 or about 140.
15. We have p(φ) ∝ 1/φ, and setting ψ = log φ we get p(ψ) = p(φ)|dφ/dψ| = 1. Thus we expect all pages to be more or less equally dirty.
16. The group operation is (x, y) ↦ x + y mod 2π and the appropriate Haar measure is Lebesgue measure on the circle, that is, arc length around the circle.
17.
    1   0.00  0.06  0.04  0.06  0.00
    2   0.01  0.05  0.06  0.05  0.01
    3   0.01  0.04  0.05  0.04  0.01
    4   0.01  0.03  0.04  0.03  0.01
    5   0.01  0.02  0.03  0.02  0.01
Integrating over c using Simpson's Rule (and ignoring the constant) for the above values we get 0.11, 0.48, 0.57, 0.48 and 0.11 respectively. Integrating again we get, for intervals as shown:
    (−1, 1)   (1, 3)   (3, 5)   (5, 7)   (7, 9)   Total
      0.92     2.60     3.24     2.60     0.92    10.28
Hence
    L′(θ|x) = 1 − 2 exp(θ − x)/{1 + exp(θ − x)}
            = {1 − exp(θ − x)}/{1 + exp(θ − x)}
            = 1 − 2/{1 + exp(x − θ)},
and so
    I = −EL″(θ|x) = (1/8) ∫ sech⁴(½(θ − x)) dx = (1/4) ∫ sech⁴ y dy = 1/3.
D.4 Exercises on Chapter 4
1. 1 − (1 − p₀) = p₀ = {1 + (1 − π₀)π₀⁻¹B⁻¹}⁻¹ = 1 − (1 − π₀)π₀⁻¹B⁻¹ + (1 − π₀)²π₀⁻²B⁻² − · · ·. The result then follows on noting that π₀⁻¹ = {1 − (1 − π₀)}⁻¹ = 1 + (1 − π₀) + · · ·.
2. With z such that (φ/n)^{1/2}z = x̄ − θ₀,
    B = (1 + 1/λ)^{1/2} exp{−½z²(1 + λ)⁻¹}
    = λ^{−1/2}(1 + λ)^{1/2} exp(−½z²) exp{½z²λ(1 + λ)⁻¹}
    = λ^{−1/2}(1 + ½λ + · · ·) exp(−½z²){1 + ½z²λ(1 + λ)⁻¹ + · · ·}
    = λ^{−1/2} exp(−½z²){1 + ½λ(z² + 1) + · · ·}.
8.
Likelihood ratio is
P 2
n
h X
i
xi /( + )]
{2( + )}n/2 exp[ 12
2
2
P
1
+
exp
x
/
=
i
x2i n /2
= exp
P 2
xi /n
n
1 .
= exp
9. ∂p₁(x)/∂ψ = {−½(ψ + φ/n)⁻¹ + ½(x̄ − θ)²/(ψ + φ/n)²} p₁(x), which vanishes if ψ + φ/n = (x̄ − θ)² = z²φ/n, so if ψ = (z² − 1)φ/n. Hence p₁(x) ≤ (2πφ/n)^{−1/2} z⁻¹ exp(−½). Consequently B = (2πφ/n)^{−1/2} exp(−½z²)/p₁(x) ≥ √e z exp(−½z²).
For the last part use p₀ = {1 + ((1 − π₀)/π₀)B⁻¹}⁻¹.
10. Posterior probability is a minimum when B is a minimum, hence when 2 log B is a minimum, and
    d(2 log B)/dn = {1 + n}⁻¹ − z²{1 + 1/n}⁻²(1/n²),
which vanishes when n = z² − 1. Since z² − 1 is not necessarily an integer, this answer is only approximate.
11. Test 5474 against B(7324, 0.75), which has mean 5493 and variance 1373, so z = (5474 − 5493)/√1373 = −0.51. Not significant, so the theory is confirmed.
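The normal approximation used here is easily reproduced; a Python check:

```python
from math import sqrt

n, p, observed = 7324, 0.75, 5474
mean = n * p            # 5493.0
var = n * p * (1 - p)   # 1373.25
z = (observed - mean) / sqrt(var)

assert mean == 5493.0
assert round(var) == 1373
assert round(z, 2) == -0.51
print(round(z, 2))  # -0.51
```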
12.
Likelihood ratio is
1
(22)
21
exp[ 21 u2 /(2)](2/2)
exp[ 12 (z )/(/2)]
P
(Xi X)2 = 0, then B = 0, that is, p1 = 1.
From (i) he deduces that p(ε) must be an even function with integral 1, while he shows that (ii) is equivalent to the condition
    ∫₀^∞ p(ε) ε^{n−2} dε = ∞.
He then shows that the simplest function satisfying these conditions is p(ε) = π⁻¹(1 + ε²)⁻¹, from which it follows that p(σ) = p(ε)|dε/dσ| = π⁻¹ s(s² + σ²)⁻¹. Generalizing to the case where H₀ is that θ = θ₀, we get the distribution in the question.
There is some arbitrariness in Jeffreys' "simplest function"; if instead he had taken p(ε) ∝ (2 + ε²)⁻¹ he would have ended up with p(θ) ∝ {s² + (θ − θ₀)²}⁻¹. However, even after this generalization, the argument is not overwhelmingly convincing.
15. (a)
    B ≥ (θ₀/θ̂)ˣ {(1 − θ₀)/(1 − θ̂)}^{n−x};
    p₀ ≥ [1 + {(1 − π₀)/π₀} (θ̂/θ₀)ˣ {(1 − θ̂)/(1 − θ₀)}^{n−x}]⁻¹.
(b) From tables (e.g. D. V. Lindley and W. F. Scott, New Cambridge Elementary Statistical Tables, Cambridge: University Press 1995 [1st edn (1984)], Table 1, or H. R. Neave, Statistics Tables for mathematicians, engineers, economists and the behavioural and management sciences, London: George Allen & Unwin (1978), Table 1.1) the probability of a binomial observation ≤ 14 from B(20, 0.5) is 0.9793, so the appropriate (two-tailed) P-value is 2(1 − 0.9793) = 0.0414 ≈ 1/24. The maximum likelihood estimate is θ̂ = 15/20 = 0.75, so the lower bound on B is (0.5/0.75)¹⁵(0.5/0.25)⁵ = 2²⁰/3¹⁵ = 0.0731, implying a lower bound on p₀ of 0.0681, or just over 1/15.
16. (a) n = 12, ν = 11, t = x̄/(s/√n) = 1.2/√(1.1/12) = 3.96, and if we take k = 1 the Bayes factor B is
    (1 + t²/ν)^{−(ν+1)/2} / [(1 + nk)^{−1/2} {1 + t²(1 + nk)⁻¹/ν}^{−(ν+1)/2}].
(b) Numerically,
    B = 0.004910/{(0.2774)(0.5356)} = 0.033.
17. Two-tailed P-value is 0.0432 (cf. D. V. Lindley and W. F. Scott, New Cambridge Elementary Statistical Tables, Cambridge: University Press 1995 [1st edn (1984)], Table 9), while the Bayes factor is
    B = (1 + t²/ν)^{−(ν+1)/2} / [(1 + nk)^{−1/2} {1 + t²(1 + nk)⁻¹/ν}^{−(ν+1)/2}] = 0.08712/{(0.3162)(0.7313)} = 0.377.
D.5 Exercises on Chapter 5
1. B = (1 + n)^{1/2} exp{−½z²(1 + 1/n)⁻¹}. Here
    B = {1 + 2.028²/8}^{−9/2} / [10^{−1/2} {1 + 2.028²/(10 × 8)}^{−9/2}] = 0.1546/{(0.3162)(0.7980)} = 0.613.
3. We know that Σ(xᵢ + yᵢ) ∼ N(2nθ, 2Σφᵢ) and Σ(xᵢ − yᵢ) is independently N(0, 2Σφᵢ), so Σ(xᵢ − yᵢ)²/2φᵢ ∼ χ²ₙ, and hence if θ = 0 then
    Σ(xᵢ + yᵢ) / √{Σ(xᵢ − yᵢ)²} ∼ tₙ.
Hence test whether θ = 0. If θ ≠ 0, it can be estimated as ½Σ(xᵢ + yᵢ)/n.
4. In that case
(a) Posterior of δ is N(12.42 − 12.27, 0.1²(1/9 + 1/12)), so a 90% interval is 0.15 ± 1.6449√0.00194, that is, (0.077, 0.223).
(b) S = 8 × 0.1054² + 11 × 0.0989² = 0.1965, s = √(0.1965/19) = 0.102, so s√(m⁻¹ + n⁻¹) = 0.045, so from tables of t₁₉ a 90% interval is 0.15 ± 1.729 × 0.045, that is (0.072, 0.228).
6. With independent priors uniform in θ, λ and log φ, that is, p(θ, λ, φ) ∝ 1/φ, the posterior is such that
    p(φ | S) is an Sχ⁻²_ν density,
    p(θ | φ, x̄) is an N(x̄, φ/m) density,
    p(λ | φ, ȳ) is an N(ȳ, 2φ/n) density.
It follows that, for given φ, the parameters θ and λ have independent normal distributions, and so the joint density of δ = θ − λ and φ is
    p(δ, φ | x, y) = p(φ | S) p(δ | x̄ − ȳ, φ),
where p(δ | x̄ − ȳ, φ) is an N(x̄ − ȳ, φ(m⁻¹ + 2n⁻¹)) density. This variance can now be integrated out just as in Sections 2.12 and 5.2, giving a very similar conclusion, that is, that if
    t = {δ − (x̄ − ȳ)} / {s(m⁻¹ + 2n⁻¹)^{1/2}},
where s² = S/ν, then t ∼ t_ν.
7.
We find that
    S₁ = S₀ + Sx + Sy + m₀θ₀² + mx̄² + n₀λ₀² + nȳ² − m₁θ₁² − n₁λ₁²
    {(m − 1)²/((m − 3)²(m − 5))} (sin⁴ψ + cos⁴ψ).
Also 1 = (sin²ψ + cos²ψ)² = sin⁴ψ + cos⁴ψ + 2 sin²ψ cos²ψ, so that sin⁴ψ + cos⁴ψ = 1 − ½ sin² 2ψ. The result follows.
10.
15. Cross-ratio is (45 × 29)/(28 × 30) = 1.554, so its logarithm is 0.441. More accurately, the adjusted value is (44.5 × 28.5)/(27.5 × 29.5) = 1.563, so its logarithm is 0.446. The value of a⁻¹ + b⁻¹ + c⁻¹ + d⁻¹ is 0.126, and so the posterior distribution of the log-odds ratio is Λ′ ∼ N(0.441, 0.126), or more accurately N(0.446, 0.126). The posterior probability that the log-odds ratio is positive is Φ(0.441/√0.126) = Φ(1.242) = 0.8929, or more accurately Φ(0.446/√0.126) = Φ(1.256) = 0.8955.
With the same data, sin⁻¹√(45/73) = 0.903 radians and sin⁻¹√(30/59) = 0.794 radians, and 1/4m + 1/4n = 0.00766, so the posterior probability that π > ρ is Φ((0.903 − 0.794)/√0.00766) = Φ(1.245) = 0.8934.
16.
17. With conventional priors, the posteriors are near Be(56, 252) and Be(34, 212), so approximating by normals of the same means and variances, N(0.1818, 0.0004814) and N(0.1382, 0.0004822), so π − ρ ∼ N(0.0436, 0.000963), so P(π − ρ > 0.01) = Φ((0.0436 − 0.01)/√0.000963) = Φ(1.083) = 0.8606, and so the posterior odds are 0.8606/(1 − 0.8606) = 6.174.
18. Using a normal approximation, x ∼ N(8.5, 8.5) and y ∼ N(11.0, 11.0), so that x − y ∼ N(−2.5, 19.5).
D.6 Exercises on Chapter 6
1.
          1       2       3       4       5     Total
    n    12      45      23      19      30      129
    r    0.631   0.712   0.445   0.696   0.535
    z    0.743   0.891   0.478   0.860   0.597
    nz   8.916  40.095  10.994  16.340  17.910  94.255
Posterior for ζ is N(94.255/129, 1/129), and a 95% HDR is 0.7307 ± 1.96 × 0.0880, that is, (0.5582, 0.9032). A corresponding interval for ρ is (0.507, 0.718).
2.
     n     1/n      r     tanh⁻¹ r
    45   0.0222   0.489    0.535
    34   0.0294   0.545    0.611
    49   0.0204   0.601    0.695
Posterior for ζ is N(Σ nᵢ tanh⁻¹ rᵢ / Σ nᵢ, 1/Σ nᵢ), so the required interval is
    Σ nᵢ tanh⁻¹ rᵢ / Σ nᵢ ± 1.96/√(Σ nᵢ).
4.
(1 2 )(n1)/2
(1 r)n(3/2)
(1 2 )n/2
(1 r)n
We have
p(|x, y) p()(1 2 )(n1)/2
(cosh t r)(n1) dt
and set
Z
Ik =
(cosh t + cos )k dt
(k > 0).
Taking k = n 2, or k + 1 = n 1, we get
Z
Ik =
(cosh t + cos )k dt (/ sin )k (/ sin ).
0
6. If r = 0 then
    p(ρ | x, y) ∝ p(ρ)(1 − ρ²)^{(n−1)/2},
so with the prior as in the question we get
    p(ρ | x, y) ∝ (1 − ρ²)^{(k+n−1)/2}.
If we define
    t = ρ√(k + n + 1)/√(1 − ρ²),  so that  ρ = t/√{(k + n + 1) + t²},
then
    1 − ρ² = (k + n + 1)/{(k + n + 1) + t²},  dρ/dt = (k + n + 1)/{(k + n + 1) + t²}^{3/2},
and hence
    p(t | X, Y) ∝ p(ρ | X, Y) dρ/dt
    ∝ [(k + n + 1)/{(k + n + 1) + t²}]^{(k+n−1)/2} (k + n + 1)/{(k + n + 1) + t²}^{3/2}
    ∝ {1 + t²/(k + n + 1)}^{−(k+n+2)/2}.
This can be recognized as Student's t distribution on k + n + 1 degrees of freedom (see Section 2.12).
7. See G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis, Reading, MA: Addison-Wesley (1973, Section 8.5.4; the equation given in the question is implicit in their equation (8.5.49)).
8. See G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis, Reading, MA: Addison-Wesley (1973, Section 8.5.4, and in particular equation (8.5.43)).
9.
1
2 (x
1
2
log
2
) (x )(y ) 21 (y )2
1 2 2
21 1 + 12 2
2
2
1 2 2
2
I = 12 1 + 21 2
2
2
2
1
+ 2 2 2
2
2
2
1
2
2
2
= 2
2
2 2 2( + 2 )
so that its determinant is
1
det I = 6
8
2
2
2
2
2
2
2
2
2( + 2 )
Adding 21 / times the last column to the first and 12 / times the last column to the
second we get
0
0
2
1
1
2
det I = 6
8
1
1
2
2( + )
1
1
= 6 (2) 2 + 2 = 3
8
4
We thus conclude that det I = 1/4φ³, implying a reference prior proportional to φ^{−3/2}.
2/2
2/2
1/ 22 /2
2
2
2
    See = Syy − S²ₓy/Sxx = Syy − 2S²ₓy/Sxx + S²ₓy/Sxx = Syy − 2bSxy + b²Sxx
(yi ui vi )2 =
{(yi y) + (y ) ui vi }
ANOVA Table
    Source       Sum of squares   Degrees of freedom   Mean square   Ratio
    Treatments         93                 3                 31        (< 1)
    Error            1446                24                 60
    TOTAL            1539                27
ANOVA Table
    Source       Sum of squares   Degrees of freedom   Mean square   Ratio
    Treatments       49,884              2                24,942      13.3
    Blocks          149,700              5                29,940      16.0
    Error            18,725             10                 1,872
    TOTAL           218,309             17
We conclude that the treatments differ significantly (an F₂,₁₀ variable exceeds 9.43 with probability 0.5%).
16.
    x = (x₁₁₁, x₁₁₂, x₁₂₁, x₁₂₂, x₂₁₁, x₂₁₂, x₂₂₁, x₂₂₂)ᵀ,
        ⎛ 1 1 0 1 0 ⎞
        ⎜ 1 1 0 1 0 ⎟
        ⎜ 1 1 0 0 1 ⎟
    A = ⎜ 1 1 0 0 1 ⎟
        ⎜ 1 0 1 1 0 ⎟
        ⎜ 1 0 1 1 0 ⎟
        ⎜ 1 0 1 0 1 ⎟
        ⎝ 1 0 1 0 1 ⎠
When r = t = 2 we take
1
1
2
1
=
1 .
1
2
1
0
12
0
17. Evidently A⁺A = I, from which (a), (b) and (d) are immediate. For (c), note that AA⁺ = A(AᵀA)⁻¹Aᵀ, which is clearly symmetric.
18. We take
    A = ⎛ 1 x₁ ⎞         AᵀA = ⎛ n     Σxᵢ  ⎞
        ⎜ 1 x₂ ⎟               ⎝ Σxᵢ   Σxᵢ² ⎠
        ⎜  ⋮   ⎟
        ⎝ 1 xₙ ⎠
so that, writing S = Σ(xᵢ − x̄)², we see that det(AᵀA) = nS, and hence
    (AᵀA)⁻¹ = (1/nS) ⎛ Σxᵢ²   −Σxᵢ ⎞      Aᵀy = ⎛ Σyᵢ   ⎞
                     ⎝ −Σxᵢ    n   ⎠            ⎝ Σxᵢyᵢ ⎠
from which b = (AᵀA)⁻¹Aᵀy is easily found. In particular
    b₁ = (1/nS)(−Σxᵢ Σyᵢ + n Σxᵢyᵢ) = Σ(yᵢ − ȳ)(xᵢ − x̄) / Σ(xᵢ − x̄)².
D.7 Exercises on Chapter 7
1. Recall that t is sufficient for θ given x if p(x|θ) = f(t, θ)g(x) (see Section 2.9). Now
    p(t|θ) = p(x|θ) + p(y|θ) = p(z|θ)   if t = t(z),
    p(t|θ) = p(x|θ)                     if t = t(x) for x ≠ y,
so that
    p(t|θ) = 0 if x = y, and p(x|θ) if x = t.
Setting f(t, θ) = p(t|θ) and g(x) = 1 (x ≠ y), g(y) = 0, it follows that t is sufficient for x given θ. It then follows from a naïve application of the weak sufficiency principle that Ev(E, y, θ) = Ev(E, z, θ). However, if x̃ is a continuous random variable, then p_x̃(y|θ) = 0 for all y, so we may take y and z as any two possible values of x̃.
2.
3.
    l(θ|x) = 1/3 for
    θ = x/2, 2x, 2x + 1         when x is even,
    θ = (x − 1)/2, 2x, 2x + 1   when x ≠ 1 is odd,
    θ = 1, 2, 3                 when x = 1.
The estimators d₁, d₂ and d₃ corresponding to the smallest, middle and largest possible θ are
    d₁(x) = x/2 when x is even, (x − 1)/2 when x ≠ 1 is odd, 1 when x = 1,
and d₂(x) = 2x, d₃(x) = 2x + 1. The values of p(d₁ = θ) given in the question now follow. However, a Bayesian analysis leads to a posterior
    p(θ|x) = l(θ|x)π(θ)/p(x) = π(θ)I_A(θ) / {π(d₁(x)) + π(d₂(x)) + π(d₃(x))},
where π(θ) is the prior, A = {d₁(x), d₂(x), d₃(x)} and I_A(θ) is the indicator function of A. Thus, indeed, the data conveys nothing to the Bayesian except that θ is d₁(x), d₂(x), or d₃(x). However, Bayesians are indifferent between d₁(x), d₂(x), or d₃(x) only if they have equal prior probabilities, which cannot hold for all x if the prior is proper. For a discussion, see J. O. Berger and R. L. Wolpert, The Likelihood Principle, Hayward, CA: Institute of Mathematical Statistics (1984 and 1988, Example 34).
4. Computation of the posterior is straightforward. For a discussion, see J. O. Berger
and R. L. Wolpert The Likelihood Principle, Hayward, CA: Institute of Mathematical
Statistics (1984 and 1988, Example 35).
5.
Rules (a) and (b) are stopping times and rule (c) is not.
6.
(a) Reject at the 5% level (and indeed possibly do so on the basis of the first observation alone).
As we do not stop the first four times but do the fifth time,
    p(x|θ) = (θ³/3!)(θ/1!)(θ²/2!)(θ⁵/5!)(θ⁷/7!) exp(−5θ),
so up to a constant
    l(θ|x) ∝ θ¹⁸ exp(−5θ).
L(|x, y) = log l(|x, y) = (x + n) log + (n x + y) log(1 )
p(|x, y) (1 + ) 2 1 (1 ) 2 .
10. Eu(x) = Σ u(x)p(x) = Σ u(x)2^{−x}, suggesting you would enter provided e < Eu(x). If u(x) ∝ x then Eu(x) ∝ Σ 2ˣ 2^{−x} = ∞, resulting in the implausible proposition that you would pay an arbitrarily large entry fee e.
11. By differentiation of the log-likelihood L(θ|x) = x log θ + (n − x) log(1 − θ) with respect to θ, we see that x/n is the maximum likelihood estimator.
Because the prior for θ is uniform, that is, Be(1, 1), the posterior is Be(x + 1, n − x + 1). The question deals with a particular case of weighted quadratic loss, so we find d(x) as
    d(x) = E(θw(θ)|x)/E(w(θ)|x) = E((1 − θ)⁻¹|x) / E(θ⁻¹(1 − θ)⁻¹|x)
    = {B(x + 1, n − x)/B(x + 1, n − x + 1)} / {B(x, n − x)/B(x + 1, n − x + 1)}
    = B(x + 1, n − x)/B(x, n − x) = x/n.
If x = 0 or x = n then the posterior loss ρ(a, x) is infinite for all a, because the integral diverges at θ = 0 or at θ = 1, so the minimum is not well-defined.
12. The minimum of E(δ − a)² = E(δ²) − 2aE(δ) + a² clearly occurs when a = E(δ). But since the prior for π is uniform, that is, Be(1, 1), its posterior is Be(x + 1, n − x + 1), and so its posterior mean is (x + 1)/(n + 2); similarly for ρ. We conclude that the Bayes rule is
    d(x, y) = (x − y)/(n + 2).
13. Posterior mean (resulting from quadratic loss) is a weighted mean of the component means, so with the data in Section 2.10 it is
    {10/(10 + 20)} × (115/129) + {20/(10 + 20)} × (14/129) = 0.370.
Posterior median (resulting from absolute error loss) can be found by computing the weighted mean of the distribution functions of the two beta distributions for various values and honing in. The result is 0.343. Posterior mode (resulting from zero-one loss) is not very meaningful in a bimodal case, and even in a case where the posterior is not actually bimodal it is not very useful.
14. L(θ, a) = u(θ − a) if a ≤ θ, and v(a − θ) if a > θ. Supposing that m(x) ≤ d(x),
    L(θ, m(x)) − L(θ, d(x)) =
        v[m(x) − d(x)]                if θ ≤ m(x)
        (u + v)θ − [um(x) + vd(x)]    if m(x) < θ < d(x)
        u[d(x) − m(x)]                if θ ≥ d(x),
while for m(x) < θ < d(x)
    (u + v)θ − [um(x) + vd(x)] < u[θ − m(x)] < u[d(x) − m(x)],
so that
    L(θ, m(x)) − L(θ, d(x)) ≤ v[m(x) − d(x)] if θ ≤ m(x), and ≤ u[d(x) − m(x)] if θ > m(x).
(b) The second suggested loss function is of the 0–Kᵢ form, and the decision depends on the relative sizes of p₁ and 2p₀. Again this results in a decision in favour of H₁.
16.
(a1 , x) = 10p0 ,
110
z(z) d + 20.38
(z) d
( 90)(|x) d = 69.23
2.450
2.450
90
p
= (69.23/2){exp( 12 0.0462 ) exp( 21 2.4502 )}
+ 20.39{(0.046) (2.450)}
= 3.15 + 9.66 = 6.51.
By similar manipulations we find that ρ(a₁, x) = 34.3, ρ(a₂, x) = 3.6 and ρ(a₃, x) = 3.3, and thus conclude that a₃ is the Bayes decision.
18.
( + 1)fn ( + 1)
.
(n + )(fn () + 1)
D.8 Exercises on Chapter 8
1.
We know that
    E‖θ̂ᴶˢ⁺ − θ‖² = E‖θ̂ᴶˢ − θ‖² + Σᵢ E[(θ̂ᵢᴶˢ⁺)² − (θ̂ᵢᴶˢ)² − 2θᵢ(θ̂ᵢᴶˢ⁺ − θ̂ᵢᴶˢ)].
To show that the expression in brackets never has positive expectation, calculate the expectation by first conditioning on S₁. For any value S₁ ≥ r − 2 we have θ̂ᵢᴶˢ⁺ = θ̂ᵢᴶˢ, so that it is enough to consider values S₁ = s₁ < r − 2. Since in that case θ̂ᵢᴶˢ⁺ = 0, and the shrinkage factor 1 − (r − 2)/s₁ is then negative, it is enough to show that for any such s₁
    θᵢ E(Xᵢ | S₁ = s₁) ≥ 0.
Now S₁ = s₁ is equivalent to X₁² = s₁ − X₂² − · · · − Xₙ². Conditioning further on X₂, . . . , Xₙ and noting that
    P(X₁ = y | S₁ = s₁, X₂ = x₂, . . . , Xₙ = xₙ) ∝ exp{−½(y − θ₁)²},
    P(X₁ = −y | S₁ = s₁, X₂ = x₂, . . . , Xₙ = xₙ) ∝ exp{−½(y + θ₁)²},
we find that
    E[θ₁X₁ | S₁ = s₁, X₂ = x₂, . . . , Xₙ = xₙ] = θ₁y(e^{θ₁y} − e^{−θ₁y}) / (e^{θ₁y} + e^{−θ₁y}),
where y = √(s₁ − x₂² − · · · − xₙ²). The right hand side is an increasing function of |θ₁y|, which is zero when θ₁y = 0, and this completes the proof.
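The dominance result can also be illustrated by simulation. A Python sketch (not part of the original answer) comparing estimated risks of X, the James–Stein estimator and its positive part at θ = 0 with r = 10, where the theoretical risks of X and the James–Stein estimator are 10 and 2 respectively:

```python
import random

random.seed(1)
r, trials = 10, 20_000
loss_x = loss_js = loss_jsp = 0.0
theta = [0.0] * r  # true means; shrinkage helps most here
for _ in range(trials):
    x = [random.gauss(t, 1.0) for t in theta]
    s = sum(xi*xi for xi in x)
    shrink = 1 - (r - 2)/s
    js = [shrink*xi for xi in x]              # James-Stein
    jsp = [max(0.0, shrink)*xi for xi in x]   # positive part
    loss_x += sum((a - t)**2 for a, t in zip(x, theta))
    loss_js += sum((a - t)**2 for a, t in zip(js, theta))
    loss_jsp += sum((a - t)**2 for a, t in zip(jsp, theta))
risk_x, risk_js, risk_jsp = (l/trials for l in (loss_x, loss_js, loss_jsp))
assert risk_jsp <= risk_js < risk_x
print(round(risk_x, 1), round(risk_js, 2), round(risk_jsp, 2))
```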
7. The bias is
    b(k) = {(AᵀA + kI)⁻¹AᵀA − I}β,
We defined
    H⁻¹ = Φ⁻¹ − Φ⁻¹B(BᵀΦ⁻¹B)⁻¹BᵀΦ⁻¹
so that
    BᵀH⁻¹B = BᵀΦ⁻¹B − BᵀΦ⁻¹B(BᵀΦ⁻¹B)⁻¹BᵀΦ⁻¹B = BᵀΦ⁻¹B − BᵀΦ⁻¹B = 0.
9.
1
B T 1 = 1 1 = 0.
B=
1
0
1
0
21
0
0
21
0
1
0
1
,
so that
H 1 = 1 1 B B T 1 B B T 1
1 1
12 1
0
0
2
1
1 1
2 1
0
0
2
=
1 1
1 1
0
0
2
2
1 1
0
0
12 1
2
while
4
0
T
A A=
2
2
0
4
2
2
2
2
4
0
2
2
.
0
4
Now
K 1 = AT 1 A + H 1
4 + 12 1
21 1
1
1
2
4 + 12 1
=
2
2
2
2
2
2
4 + 21 1
21 1
12 1
4 + 12 1
10. See D. V. Lindley and A. F. M. Smith, "Bayes estimates for the linear model" (with discussion), Journal of the Royal Statistical Society Series B, 34 (1972), 1–41 [reprinted in N. Polson and G. C. Tiao, Bayesian Inference (2 vols) (The International Library of Critical Writings in Econometrics, No. 7), Aldershot: Edward Elgar (1995)].
11.
=
a
a
a mb
a(a mb)
1 + uT u a
(m + 1)b
ij =
a(a mb)
D.9 Exercises on Chapter 9
1.
http://www-users.york.ac.uk/~pml1/bayes/tprogs/integral.txt
In this case
    Q = E[(y₁ + y₂ − 1) log η + (y₃ + y₄ − 1) log(1 − η)]
giving
    η^{(t+1)} = {E(y₁ | η^{(t)}, x) + x₂ − 1} / {E(y₁ | η^{(t)}, x) + x₂ + x₃ + E(y₄ | η^{(t)}, x) − 2}
    = {x₁η^{(t)}/(η^{(t)} + 1) + x₂ − 1} / {x₁η^{(t)}/(η^{(t)} + 1) + x₂ + x₃ + x₄(1 − η^{(t)})/(2 − η^{(t)}) − 2}
    = {461η^{(t)}/(η^{(t)} + 1) + 129} / {461η^{(t)}/(η^{(t)} + 1) + 289 + 515(1 − η^{(t)})/(2 − η^{(t)})}.
Starting with η⁽⁰⁾ = 0.5, successive iterations give 0.4601, 0.4461, 0.4411, 0.4394, 0.4386, 0.4385, and thereafter 0.4384. This compares with a value of 0.439 found by
maximum likelihood in C. A. B. Smith, op. cit.
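The iteration is easy to reproduce; a Python sketch of the update displayed above (data 461, 130, 161, 515):

```python
x1, x2, x3, x4 = 461, 130, 161, 515

def em_step(eta):
    """One EM update for the linkage parameter eta."""
    ey1 = x1*eta/(eta + 1)           # E(y1 | eta, x)
    ey4 = x4*(1 - eta)/(2 - eta)     # E(y4 | eta, x)
    return (ey1 + x2 - 1) / (ey1 + x2 + x3 + ey4 - 2)

eta = 0.5
for _ in range(30):
    eta = em_step(eta)
print(round(eta, 4))  # 0.4384
```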
4. The proof that p( (t+1) | x) > p( (t+1) | x) in the subsection Why the EM algorithm works only uses the properties of a GEM algorithm. As for the last part, if
convergence has occurred there is no further room for increasing Q, so we must have
reached a maximum.
5. Take the prior for the variance as S₀χ⁻²_ν₀, that is, 350χ⁻²₄, and the prior for the mean as N(θ₀, φ₀), where θ₀ = 85 and φ₀ = S₀/{n₀(ν₀ − 2)} = 350/(1 × 2) = 175. We then have data with n = 100, x̄ = 89 and S₁ = (n − 1)s² = 2970. Successive iterations using the EM algorithm give the posterior mean as 88.9793 and at all subsequent stages 88.9862. This compares with 88.96 as found in Exercise 16 on Chapter 2.
6. Means for the four looms are 97.3497, 91.7870, 95.7272 and 96.8861, overall
mean is 97.4375. Variance of observations from the same loom is 1.6250 and variance
of means from different looms in the population is 5.1680.
7. (a) For the example on genetic linkage in Section 9.3, see
http://www-users.york.ac.uk/~pml1/bayes/rprogs/dataaug.txt
or
http://www-users.york.ac.uk/~pml1/bayes/cprogs/dataaug.cpp
(b) For the example on chained data sampling due to Casella and George, see a similar
file with chained in place of dataaug.
(c) For the example on the semi-conjugate prior with a normal likelihood (using both
the EM algorithm and chained data augmentation, see a similar file with semiconj
in place of dataaug.
(d) For the example on a change-point analysis of data on coal mining disasters, see
similar files with coalnr.cpp or coal.cpp in place of dataaug.cpp (the difference is that the former uses only routines in W. H. Press et al., Numerical Recipes
in C (2nd edn), Cambridge: University Press 1992, whereas the latter program, which
is in many ways preferable, uses gamma variates with non-integer parameters).
8.
9.
See the program referred to in part (b) of the answer to the question before last.
10.
http://www-users.york.ac.uk/~pml1/bayes/rprogs/semicon2.txt
or
http://www-users.york.ac.uk/pml1/bayes/cprogs/semicon2.cpp
11. B. P. Carlin and T. A. Louis, Bayes and Empirical Bayes Methods for Data Analysis (2nd edn), Chapman and Hall 2000, p. 149, remark that
In our dataset, each rat was weighed once a week for five consecutive weeks. In fact the rats were all the same age at each weighing: xᵢ₁ = 8, xᵢ₂ = 15, xᵢ₃ = 22, xᵢ₄ = 29, and xᵢ₅ = 36 for all i. As a result we may simplify our computations by rewriting the likelihood as
    Yᵢⱼ ∼ N(αᵢ + βᵢ(xᵢⱼ − x̄), σ²),  i = 1, ..., k, j = 1, ..., nᵢ,
so that it is now reasonable to think of αᵢ and βᵢ as independent a priori. Thus we may set Σ = Diag(σ²_α, σ²_β), and replace the Wishart prior with a product of independent inverse gamma priors, say IG(a_α, b_α) and IG(a_β, b_β).
This being so, it probably suffices to proceed as in the Rats example supplied with WinBUGS.
A more detailed description of the general set-up is to be found in "Illustration of Bayesian Inference in Normal Data Models Using Gibbs Sampling" by Alan E. Gelfand, Susan E. Hills, Amy Racine-Poon and Adrian F. M. Smith, Journal of the American Statistical Association 85 (412) (1990), 972–985, Section 6, pp. 978–979, which can be found at
http://www-users.york.ac.uk/~pml1/bayes/rprogs/gelfand.pdf
(LaTeX source at gelfand.htm). A program for generating random matrices with a Wishart distribution, based on the algorithm of Odell and Feiveson described in W. J. Kennedy and J. E. Gentle, Statistical Computing, Marcel Dekker 1980, pp. 231–232, can be found at
http://www-users.york.ac.uk/~pml1/bayes/rprogs/wishart.txt
while the relevant extract from Kennedy and Gentle is at
http://www-users.york.ac.uk/~pml1/bayes/rprogs/rwishart.pdf
12.
http://www-users.york.ac.uk/~pml1/bayes/rprogs/linkagemh.txt
13.
http://www-users.york.ac.uk/~pml1/bayes/winbugs/wheat.txt
or
http://www-users.york.ac.uk/~pml1/bayes/rprogs/wheat.txt
14.
D.10 Exercises on Chapter 10
1.
Z
Ew(x) =
f (x)q(x)
p(x)
2
Z
p(x) dx >
2
2
Z
=
|f (x)|q(x)
p(x) dx
p(x)
|f (x)|q(x) dx
If we take
p(x) = R
|f (x)|q(x)
|f ()|q() d
then
2
2
Z Z
Ew(x) =
|f ()|q() d
2
Z
p(x) dx. =
|f (x)|q(x) dx
so that this choice of p(x) minimizes Ew(x)2 and, since we always have Ew(x) = ,
consequently minimizes Vw(x).
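The effect of the optimal choice p(x) ∝ |f(x)|q(x) can be seen in a toy example (the functions below are made up for illustration): estimating θ = ∫ f(x)q(x) dx with f(x) = x² and q the Exp(1) density. Then |f|q normalised is the Gamma(3, 1) density, and the importance weights are exactly constant, so that estimator has zero variance:

```python
import random
from math import exp
from statistics import pvariance

random.seed(42)
N = 50_000

# Naive Monte Carlo: draw from q = Exp(1) and average f(x) = x^2; the true value is E x^2 = 2.
naive = [random.expovariate(1.0)**2 for _ in range(N)]
est_naive = sum(naive) / N

# Optimal importance sampling: p(x) = |f(x)|q(x)/2 = x^2 e^{-x}/2 is the Gamma(3, 1) density,
# and the weight w(x) = f(x)q(x)/p(x) = 2 is constant.
xs = [random.gammavariate(3.0, 1.0) for _ in range(N)]
weights = [x*x * exp(-x) / (x*x * exp(-x) / 2.0) for x in xs]
est_opt = sum(weights) / N

assert abs(est_naive - 2.0) < 0.1
assert est_opt == 2.0 and pvariance(weights) == 0.0
print(round(est_naive, 3), est_opt)
```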
2. (a) The variance is
    π(1 − π)/n = 0.125 802 7/n.
(b) Use the usual change of variable rule to deduce the density of y. Then note that
q(yi )
1 yi2
1/((1 + yi2 ))
=
=
p(yi )
2/yi2
2 1 + yi2
and that we can take f (y) = 1 as all values of y satisfy y > 2.
(c) Substitute xi = 2/yi in the above to get
4
1
.
2 4 + x2i
(d) On writing x = 2 tan and 0 = tan1 ( 12 ) we find
Z
Eb
=
0
1
4
dx =
2 4 + x2
1 1
2 sec2 d = 0 / =
2 sec2
1
1
2 sec2 d =
4 2 sec4
1
2 cos2 d
4 2
1
{1 + cos(2)} d
2
0 4
0
= + 21 sin(2) 0 /(4)2
= tan1 ( 21 ) + 25 /(4 2 )
=
= 0.021 876 4
so that
V b = Eb
2 (Eb
)2 = 0.000 095 5.
3.
A suitable program is
n <- 10000
alpha <- 2
beta <- 3
x <- runif(n)
p <- function(x) dunif(x)
q <- function(x) dbeta(x,alpha,beta)
w <- q(x)/p(x)
pr <- w/sum(w)
samp <- sample(x,size=n,prob=pr,replace=T)
print(mean(samp))
print(var(samp))
A run of this program resulted in a value of the mean of 0.399 as compared with the true value of 0.4, and a value of the variance of 0.0399 compared with a true value of 0.04.
4.
The lower limit resulting from a run of the program was 0.061 and the upper limit was 0.705. Referring to the tables in the Appendix, we find that the values of F corresponding to a 90% HDR of log F₆,₄ are 0.21 and 5.75. It follows that an appropriate interval of values of F₄,₆ is (1/5.75, 1/0.21), that is (0.17, 4.76), so that an appropriate interval for π is
    2 × 0.17/(3 + 2 × 0.17) ≤ π ≤ 2 × 4.76/(3 + 2 × 4.76),
that is (0.10, 0.76) (but note that in Section 3.1 we were looking for intervals in which the distribution of the log-odds is higher than anywhere outside, rather than for HDRs for the beta distribution itself).
5.
A suitable program is
r <- 10
phi <- rep(NA,r)
mu <- rep(NA,r)
S <- rep(NA,r)
n <- 100; xbar <- 89; SS <- (n-1)*30
mu0 <- 85; S0 <- 350; nu0 <- 4; n0 <- 1
phi0 <- S0/(n0*(nu0-2))
S[1] <- S0
nustar <- nu0 + n
for (i in 2:r) {
  phi[i] <- (1/phi0 + n*nustar/S[i-1])^(-1)
  mu[i] <- phi[i]*(mu0/phi0 + n*xbar*nustar/S[i-1])
  S[i] <- S0 + (SS + n*xbar^2) - 2*n*xbar*mu[i] + n*(mu[i]^2 + phi[i])
  cat("i",i,"phi",phi[i],"mu",mu[i],"S",S[i],"\n")
}
mustar <- mu[r]; phistar <- phi[r]; Sstar <- S[r]
cat("mu has mean",mustar,"and s.d.",sqrt(phistar),"\n")
cat("phi has mean",Sstar/(nustar-2),"and s.d.",
(Sstar/(nustar-2))*sqrt(2/(nustar-4)),"\n")
We get convergence to μ = 88.993 and φ = 0.322.
6. We find that
    I(q : p) + I(p : q) = (λ − μ) log(λ/μ),
from which it is clear that this can happen if and only if λ = μ (when I(q : p) = I(p : q) = 0).
7.
1
2
8.
We find
Z
I(q : p) = q(x) log{q(x)/p(x)} dx
Z
n
o
.
=
1 exp(x/) log 1 exp(x/) 2(2)1/2 exp( 21 x2 )
0
1
2
1
2
log(8) log 1 + 2
and hence
1
I(q : p)
= + 2
2 I(q : p)
1
= 2 +2
2
10. The model is
    (1/4 + η/4, η/4, (1 − η)/4, (1 − η)/4 + 1/4)
and the values quoted are x₁ = 461, x₂ = 130, x₃ = 161 and x₄ = 515. An R program for the ABC-REJ algorithm is
N <- 100000
etastar <- c(461,130,161,515)
n <- sum(etastar)
eps <- 22
d <- rep(NA,N)
trial <- rep(NA,N)
for (j in 1:N) {
  etatrial <- runif(1)
  inds <- sample(4,n,replace=T,prob=c(0.25+etatrial/4,etatrial/4,
            (1-etatrial)/4,(1-etatrial)/4+0.25))
  samp <- c(sum(inds==1),sum(inds==2),sum(inds==3),sum(inds==4))
  d[j] <- sqrt(sum((etastar-samp)^2))
  if (d[j] <= eps) trial[j] <- etatrial
}
eta <- trial[!is.na(trial)]
k <- length(eta)
m <- mean(eta)
s <- sd(eta)
cat("k",k,"m",m,"s",s,"\n")
Similar modifications can be made for the other algorithms.
11. Let Mᵢ denote model i (i = 1, 2, 3, 4) with model parameters θᵢ, where θ₁ = (α, τ), θ₂ = (α, β, τ), θ₃ = (α, γ, τ) and θ₄ = (α, β, γ, τ).
The reversible jump MCMC algorithm is as follows:
Initiate the algorithm by choosing an initial model, m to start in and initial parameter values m .
Each iteration t:
Update the model parameters mt1 , where mt1 is the model which the
algorithm is residing in at the end of iteration t 1.
Propose to move to a new model Mt0 with parameter values 0 . Calculate
the acceptance probability for the move and decide whether to accept or
reject the new model (and parameter values).
The models are nested, with model 4 being the full model, with sub-models model 2 and model 3, and the simplest model 1. Each model either includes or excludes β and γ (exclusion of a parameter is equivalent to setting it equal to 0).
Given that the current model is j, employ the following reversible jump: choose randomly either β or γ, and propose to move to the model i which has the chosen parameter included (excluded) if the parameter is excluded (included) in model j.
A bijection is required for transforming between the models. Therefore for inclusion
of a parameter propose u N(0, 2 ) for suitably chosen 2 and propose to set the
parameter equal to u. (This works best if xi and ti have been standardised to have
mean 0 and variance 1.) Leave the other parameters unchanged. Thus the Jacobian of
the transformation is simply 1.
Let Lⱼ(θⱼ) denote the likelihood of θⱼ given model Mⱼ. Then the acceptance probability for the inclusion of a parameter (either β or γ, set equal to u) is
    min{1, [Lᵢ(θᵢ) π(θᵢ, Mᵢ)] / [Lⱼ(θⱼ) π(θⱼ, Mⱼ) × (2πσ²)^{−1/2} exp(−½u²/σ²)]}.