The Annals of Mathematical Statistics, Vol. 24, No. 4 (Dec., 1953), pp. 586-602. Received 10/28/52.

LOCALLY OPTIMAL DESIGNS FOR ESTIMATING PARAMETERS

BY HERMAN CHERNOFF

Stanford University

1. Summary. It is desired to estimate $s$ parameters $\theta_1, \theta_2, \ldots, \theta_s$. There is available a set of experiments which may be performed. The probability distribution of the data obtained from any of these experiments may depend on $\theta_1, \theta_2, \ldots, \theta_k$, $k \ge s$. One is permitted to select a design consisting of $n$ of these experiments to be performed independently. The repetition of experiments is permitted in the design. We shall show that, under mild conditions, locally optimal designs for large $n$ may be approximated by selecting a certain set of $r \le k + (k-1) + \cdots + (k-s+1)$ of the experiments available and by repeating each of these $r$ experiments in certain specified proportions. Examples are given illustrating how this result simplifies considerably the problem of obtaining optimal designs.

The criterion of optimality that is employed is one that involves the use of Fisher's information matrix. For the case where it is desired to estimate one of the $k$ parameters, this criterion corresponds to minimizing the variance of the asymptotic distribution of the maximum likelihood estimate of that parameter. The result of this paper constitutes a generalization of a result of Elfving [1]. As in Elfving's paper, the results extend to the case where the cost depends on the experiment and the amount of money to be allocated on experimentation is determined instead of the sample size.

2. Introduction. Before formulating the problem precisely we shall consider a simple special example which will illustrate many of the points involved. Consider the regression problem

(1)   $y = \gamma + \beta x + u, \qquad -1 \le x \le 1,$

where $u$ is an unobserved disturbance which is normally distributed with mean 0 and variance 1. The disturbances of successive observations are distributed independently of each other. Suppose that we are permitted to select a set of $n$ values of $x$ between $-1$ and $+1$ and to observe the corresponding values of $y$. If our objective were to estimate $\beta$, it is well known that the best procedure consists of using $x = +1$ for half of the observations and $x = -1$ for the other half.

In this problem we may regard the observation of a $y$ corresponding to a given value of $x$ as an experiment $E_x$. The class of available experiments is the set $\{E_x : -1 \le x \le 1\}$. The parameter in which we are interested is $\beta$, but the distribution of the data depends on $\gamma$ also. In this case $\gamma$ is a nuisance parameter. The optimal design consists of using each of the two experiments $E_1$ and $E_{-1}$ half the time (if $n$ is even).
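As a numerical illustration of this claim (a modern Python sketch, not part of the original paper; it assumes the model of equation (1) with unit disturbance variance and uses the standard least-squares identity $\mathrm{Var}(\hat\beta) = 1/\sum_i (x_i - \bar x)^2$):

```python
import numpy as np

def slope_variance(xs):
    """Exact Var(beta_hat) for OLS in y = gamma + beta*x + u with Var(u) = 1.

    Var(beta_hat) = 1 / sum((x_i - xbar)^2).
    """
    xs = np.asarray(xs, dtype=float)
    return 1.0 / np.sum((xs - xs.mean()) ** 2)

n = 10
designs = {
    "half at -1, half at +1": [-1.0] * (n // 2) + [1.0] * (n // 2),
    "evenly spaced on [-1, 1]": list(np.linspace(-1.0, 1.0, n)),
    "endpoints pulled in to +/-0.9": [-0.9] * (n // 2) + [0.9] * (n // 2),
}
for name, xs in designs.items():
    print(f"{name:32s} Var(beta_hat) = {slope_variance(xs):.4f}")
# The +/-1 split gives 1/n = 0.10, the smallest of the three;
# pulling the points inward strictly increases the variance.
```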
It should be noted that if the set of experiments available were decreased so that $E_x$ is available only for $-1 < x < 1$, no optimal design could be found. This is essentially due to the fact that given any design, a better one can be obtained by spreading out the values of $x$ even more (i.e., by taking values of $x$ closer to the end points $-1$ and $+1$). A peculiarity of this particular problem is that no matter how many times a particular experiment $E_x$ is repeated, no reasonable estimate of $\beta$ can be determined. At least two distinct experiments are required. Another peculiarity of this problem is that the variance of $\hat\beta$, the maximum likelihood estimate of $\beta$, does not depend on the values of $\gamma$ and $\beta$. In general, this latter property will not hold and we shall be restricted to obtaining locally optimal designs, that is, designs which are optimal if the parameters are known to be close to certain specified values.

We may consider a variation of the above problem. Suppose that it is desired to estimate $\gamma$ and that $\beta$ is the nuisance parameter. Then it is well known that an optimal design consists in repeating the experiment $E_0$ $n$ times. An equally optimal design may also be obtained by using any set of $x$'s so that $\bar x = 0$.

3. Information matrices and mixed experiments. The formulation of our problem will involve the concepts of information matrices [2] and of randomized or mixed experiments. For the sake of notational convenience and in order to clear up some technicalities that arise, we shall discuss these concepts before proceeding to the formulation.

R. A. Fisher defined the information matrix $X(\theta)$ for an experiment involving the parameter $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ by

(2)   $X(\theta) = \|x_{ij}(\theta)\| = -E\left\|\frac{\partial^2 L}{\partial \theta_i \, \partial \theta_j}\right\|, \qquad i, j = 1, 2, \ldots, k,$
where $L$ is the logarithm of the likelihood function. It should be noted that $X(\theta)$ ordinarily depends on $\theta$. It is easily seen and well known that

(3)   $X(\theta) = E\left\|\frac{\partial L}{\partial \theta_i}\,\frac{\partial L}{\partial \theta_j}\right\|$

and hence that $X(\theta)$ is a nonnegative definite symmetric matrix.

Another well known property of information matrices is that of additivity. That is, if $E_1, E_2, \ldots, E_n$ are experiments yielding information matrices $X_1(\theta), X_2(\theta), \ldots, X_n(\theta)$, the combined experiment or design which consists in carrying out each of these experiments independently yields the information matrix $X_1(\theta) + X_2(\theta) + \cdots + X_n(\theta)$.

The experiment which consists in carrying out one of the available experiments, this one to be determined by a random device, is called a randomized or mixed experiment. Hence if $p_1, p_2, \ldots, p_n$ are positive numbers adding up to one, the experiment which consists in carrying out $E_i$ with probability $p_i$ is mixed. It is easily seen that this experiment has information matrix $p_1X_1(\theta) + p_2X_2(\theta) + \cdots + p_nX_n(\theta)$.
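The following sketch (modern Python, not part of the original text) illustrates both the additivity and the mixing rule, using the regression experiment of Section 2, whose information matrix is derived in Section 4:

```python
import numpy as np

def info_regression(x):
    # Information matrix for one observation of y = gamma + beta*x + u,
    # u ~ N(0, 1), parameters (beta, gamma); Section 4 derives [[x^2, x], [x, 1]].
    return np.array([[x * x, x], [x, 1.0]])

# Design: run E_{+1} and E_{-1} independently -> information matrices add.
combined = info_regression(1.0) + info_regression(-1.0)

# Mixed experiment: perform E_{+1} or E_{-1} with probability 1/2 each.
mixed = 0.5 * info_regression(1.0) + 0.5 * info_regression(-1.0)

print(combined)  # [[2, 0], [0, 2]] -- the two per-trial matrices summed
print(mixed)     # [[1, 0], [0, 1]] -- the convex combination, an element of R
```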
Let an experiment $E$ with positive definite information matrix $X(\theta)$ be carried out $m$ times and let $\hat\theta = (\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_k)$ be the resulting maximum likelihood estimate of $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$. Under mild conditions [3], the covariance matrix of the asymptotic (as $m \to \infty$) distribution of $\sqrt{m}(\hat\theta - \theta)$ is given by

(4)   $X^{-1}(\theta) = \|x^{ij}(\theta)\|, \qquad i, j = 1, 2, \ldots, k,$

at all points of continuity of $X(\theta)$. This property suggests the usefulness of information matrices in comparing designs.

Unfortunately, it is possible for an information matrix to be singular and hence to fail to have an inverse. To allow for this situation, we extend the notion of inverse to the class of nonnegative definite symmetric matrices. Let $X$ be nonnegative definite symmetric and let $Y$ be any other symmetric matrix so that $X + \lambda Y$ is positive definite for positive $\lambda$ small enough. Then, let

(5)   $X_Y^{-1} = \|x_Y^{ij}\| = \lim_{\lambda \to 0+} (X + \lambda Y)^{-1}.$

In Appendix A it will be shown that this new definition is consistent with the usual definition and is statistically meaningful.¹ Also, if $x^{ii}$ and $x^{jj}$ are finite, then $x^{ij}$ is finite and $x^{ii}$, $x^{jj}$ and $x^{ij}$ are independent of the particular $Y$ selected. It should be noted that $X^{-1}$ is a continuous function of $X$ on the set of positive definite symmetric matrices but that elements of $X_Y^{-1}$ may fail to be continuous for $X$ singular.

4. Formulation. In this section we shall formulate our problem and then indicate the reasons behind this formulation. Using the special example previously mentioned, we shall examine conditions which we shall impose to obtain the desired results.

There is a set $\{E\}$ of experiments available. The distribution of the data from one of these experiments depends on $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$. The information matrix $X(\theta)$ may be characterized by the elements on and above the main diagonal. These elements arranged in some order may be considered as components of a vector in $k(k+1)/2$ dimensional space. This vector may be identified with the matrix. Since we are interested in locally optimal designs, that is, designs that are optimal when $\theta$ is known to be close to some given value, say $\theta^{(0)} = (\theta_1^{(0)}, \theta_2^{(0)}, \ldots, \theta_k^{(0)})$, we confine our attention to $X(\theta^{(0)})$. Let $R_1$ be the set of vectors corresponding to the $X(\theta^{(0)})$ for the experiments of $\{E\}$. Let $R$ be the convex hull of $R_1$, that is, a typical element of $R$ is the convex linear combination $p_1X_1 + p_2X_2 + \cdots + p_nX_n$ where $X_1, X_2, \ldots, X_n$ are elements of $R_1$ and $p_1, p_2, \ldots, p_n$ are positive numbers adding up to 1. From the previous section, it follows that $R$ represents the set of information matrices of the class of mixed experiments.
¹ The author is indebted to Max A. Woodbury and the referee who independently pointed out a close relationship existing between this definition of inverse and the concept of the pseudo-inverse of a matrix.
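As a small numerical check of definition (5) (a sketch assuming the convenient choice $Y = I$; any admissible $Y$ gives the same diagonal limits, as Appendix A shows):

```python
import numpy as np

# Singular nonnegative definite X: the second parameter carries no information.
X = np.array([[1.0, 0.0], [0.0, 0.0]])
Y = np.eye(2)  # X + lam*Y is positive definite for every lam > 0

for lam in (1e-2, 1e-4, 1e-6):
    inv = np.linalg.inv(X + lam * Y)
    print(f"lam={lam:g}: x^11 -> {inv[0, 0]:.6f}, x^22 -> {inv[1, 1]:.3g}")
# x^11 converges to 1 (finite), while x^22 grows like 1/lam, reflecting
# that the second parameter cannot be estimated at all from this experiment.
```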


We shall be interested in showing that under certain conditions, an element $\bar X$ of $R$ which minimizes

(6)   $v_s(X) = x^{11} + x^{22} + \cdots + x^{ss}, \qquad s \le k,$

can be represented as a convex linear combination of

(7)   $r \le k + (k-1) + \cdots + (k-s+1)$

elements $X_1, X_2, \ldots, X_r$ of $R_1$.

It is evident that $\bar X$ corresponds to a mixed experiment which is "optimal" in the sense that if $\hat\theta$ were based on $n$ repetitions of this experiment, the sum of the variances in the asymptotic (as $n \to \infty$) distribution of $\sqrt{n}(\hat\theta_1 - \theta_1), \sqrt{n}(\hat\theta_2 - \theta_2), \ldots, \sqrt{n}(\hat\theta_s - \theta_s)$ would be a minimum. Certain questions naturally arise concerning the usefulness of this criterion. First, it may be asked whether this criterion is relevant if one desires to confine oneself to pure experiments, that is, elements of $\{E\}$. Here we note that as $n \to \infty$, $\bar X$ may be approximated by $(n_1X_1 + n_2X_2 + \cdots + n_rX_r)/n$ where $n_1, n_2, \ldots, n_r$ are positive integers adding up to $n$. The latter expression represents $1/n$ times the information matrix corresponding to the design where $E_i$ is carried out $n_i$ times. The answer to this question would be yes if it were shown that $v_s(X)$ is continuous at $X = \bar X$ on the convex set generated by $X_1, X_2, \ldots, X_r$.
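The following sketch (Python, not part of the original paper) carries out this minimization by brute force for the regression example of Section 2, where $k = 2$ and $s = 1$; it recovers the design asserted there and respects the bound $r \le k$ of (7):

```python
import numpy as np

def info(x):
    return np.array([[x * x, x], [x, 1.0]])

def v1(X, lam=1e-9):
    # v_1(X) = x^{11}, computed through the limit definition (5) with Y = I.
    return np.linalg.inv(X + lam * np.eye(2))[0, 0]

# Search mixtures p*X_{x1} + (1-p)*X_{x2} over a grid of two-point designs.
best = min(
    ((p, x1, x2) for p in np.linspace(0.05, 0.95, 19)
                 for x1 in np.linspace(-1, 1, 41)
                 for x2 in np.linspace(-1, 1, 41)),
    key=lambda t: v1(t[0] * info(t[1]) + (1 - t[0]) * info(t[2])),
)
print(best)  # -> (0.5, -1.0, 1.0) up to grid symmetry: the two extreme
             # experiments, equal weights, so r = 2 <= k as Theorem 1 asserts
```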

One may also ask why our criterion should involve information matrices. Such a criterion has a certain aesthetic appeal. Furthermore, we shall discuss in Appendix B how the main result yields a justification of this criterion. Finally, one may seriously inquire whether a "good" design must minimize the sum of the asymptotic variances. In fact, we shall see in Appendix C that very often when one is interested in $s$ parameters, a sound criterion for a "good" design involves minimizing $\mathrm{tr}(AV)$ where $A$ is a nonnegative definite symmetric matrix of rank less than or equal to $s$ and $V$ is the covariance matrix of the asymptotic distribution of $\sqrt{n}(\hat\theta - \theta)$. By a linear transformation of $\theta$ this criterion may be transformed to that of minimizing the sum of no more than $s$ asymptotic variances.

Since certain conditions must be imposed to obtain our desired result, we shall explain these conditions by referring to the example considered in Section 2. In that example the experiment $E_x$ yields a likelihood function with logarithm given by

$L = -\tfrac{1}{2}\log 2\pi - \tfrac{1}{2}(y - \gamma - \beta x)^2.$

Let $\theta_1 = \beta$ and $\theta_2 = \gamma$. The corresponding information matrix is given by

$X_x = \begin{pmatrix} x^2 & x \\ x & 1 \end{pmatrix}.$
For this example, $R_1$ is the set of all points in three dimensional space whose coordinates are $(x^2, x, 1)$, $-1 \le x \le 1$. This set represents a segment of a parabola lying in a plane of three dimensional space. The convex set $R$ generated by $R_1$ is the set bounded by $R_1$ and the line segment connecting the end points of $R_1$. The optimal design consisting in using $X_1$ and $X_{-1}$, each half the time, corresponds to the mid-point of the above-mentioned line segment, that is, the point $(1, 0, 1)$.

We mentioned previously that if $x$ is restricted to $-1 < x < 1$, no optimal $n$th order design exists. Note that in this case $R$ has been changed by deleting the boundary line segment on which the optimizing point $(1, 0, 1)$ lies. Although we can get arbitrarily close to this point when $-1 < x < 1$, we cannot reach it. In general, to prevent this minor difficulty we shall impose the condition that $R$ be closed. Then $R$ will contain all of its boundary points.

A second condition that we shall impose is that $R$ be bounded. That is, no element of $X_x$ can be made arbitrarily large by selecting $E_x$ properly. This condition is satisfied in our example, for there no element can exceed 1 in absolute value. If, however, the example were modified to permit all real values of $x$, the element of the first row and first column of $X_x$ would be unbounded. Note in this modified example, that if the parameter $\gamma$ were known, $\beta$ could be estimated with arbitrarily small variance from one experiment by taking $x$ large enough. This interpretation of the effect of unbounded $R$ applies to the general case, too. If some element of $X$ is unbounded, the fact that $X$ is nonnegative definite implies that some element of the main diagonal of $X$ is unbounded. If the $i$th element of the main diagonal of $X$ is unbounded, $\theta_i$ can be estimated with arbitrarily small asymptotic variance if all the other parameters are known.

5. Main results. In this section we state our main results. The proofs will first be given for $s = 1$ and then extended to $s > 1$.

THEOREM 1. If $R$ is closed and bounded there is an element $\bar X$ of $R$ which minimizes $v_s(X) = x^{11} + x^{22} + \cdots + x^{ss}$ and which is a convex linear combination of $r \le k + (k-1) + \cdots + (k-s+1)$ elements $X_1, X_2, \ldots, X_r$ of $R_1$. Furthermore $X_1, X_2, \ldots, X_r$ may be chosen so that $v_s(X)$ is a continuous function at $X = \bar X$ with respect to the topology of the convex set generated by $X_1, X_2, \ldots, X_r$.

We treat the case $s = 1$ where we let

(8)   $z(X) = v_1(X) = x^{11}.$

In outline, our proof for $s = 1$ consists in obtaining an expression for

$\delta(X, \Delta) = z(X + \Delta) - z(X)$

which will be used to show the existence of an $X^{(0)} \in R$ which minimizes $z(X)$ and such that $X^{(0)}$ lies on a supporting hyperplane of $R$. It will also be evident that $z(X)$ is constant on a sub-hyperplane. The dimension of this sub-hyperplane leads to the existence of $\bar X$ with the desired properties. The complexity of the details of the proof arises mainly from difficulties in the case that $X^{(0)}$ is singular, since $z(X)$ is not continuous at singular $X$.

LEMMA 1. If $X$ and $X + \Delta$ are nonnegative definite symmetric matrices and $z(X) \ne \infty$, then

(9)   $\delta(X, \Delta) = z(X + \Delta) - z(X) = -\epsilon(X, \Delta) + \eta(X, \Delta)$

where

(10)   $\epsilon(X, \Delta) = \lim_{\lambda \to 0+} \epsilon_\lambda(X, \Delta),$

(11)   $\epsilon_\lambda(X, \Delta) = [(X + \lambda I)^{-1}\Delta(X + \lambda I)^{-1}]_{11},$

(12)   $\eta(X, \Delta) = \lim_{\lambda \to 0+} \eta_\lambda(X, \Delta),$

(13)   $\eta_\lambda(X, \Delta) = [(X + \lambda I)^{-1}\Delta(X + \lambda I + \Delta)^{-1}\Delta(X + \lambda I)^{-1}]_{11},$

and $\epsilon(X, \Delta)$ is a linear function in $\Delta$ and $\eta(X, \Delta) \ge 0$.

PROOF. Since the matrices $(X + \lambda I)$ and $(X + \Delta + \lambda I)$ are positive definite for $\lambda > 0$,

(14)   $\delta(X, \Delta) = \lim_{\lambda \to 0+} [z(X + \Delta + \lambda I) - z(X + \lambda I)],$

(15)   $(X + \lambda I + \Delta)^{-1} - (X + \lambda I)^{-1} = -(X + \lambda I)^{-1}\Delta(X + \lambda I)^{-1} + (X + \lambda I)^{-1}\Delta(X + \lambda I + \Delta)^{-1}\Delta(X + \lambda I)^{-1}.$

Since $z(X) \ne \infty$ it follows (see Appendix A, Property 1) that $\lim_{\lambda \to 0+}(X + \lambda I)^{1i}$ exists and is finite for each $i$. Let us denote this limit by $x^{1i} = x^{i1}$. Hence

(16)   $\epsilon(X, \Delta) = \lim_{\lambda \to 0+} \epsilon_\lambda(X, \Delta) = \sum_{i,j} x^{1i}\Delta_{ij}x^{j1}.$

It follows that as $\lambda \to 0+$, $\eta_\lambda(X, \Delta)$ converges (possibly to $+\infty$). Since the matrix, of which $\eta(X, \Delta)$ is the element of the first row and first column, is nonnegative definite it follows that $\eta(X, \Delta) \ge 0$.
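A numerical check of the decomposition (9)-(13) (a sketch using random nonnegative definite matrices; the inner $\lambda$-limits are approximated by a single small $\lambda$):

```python
import numpy as np

rng = np.random.default_rng(0)

def z(X, lam=1e-10):
    # z(X) = x^{11} via definition (5) with Y = I.
    return np.linalg.inv(X + lam * np.eye(X.shape[0]))[0, 0]

A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
X = A @ A.T               # nonnegative definite (here positive definite)
Delta = 0.1 * (B @ B.T)   # chosen so X + Delta is also nonnegative definite

lam = 1e-10
M = np.linalg.inv(X + lam * np.eye(3))
eps = (M @ Delta @ M)[0, 0]                                                 # (11)
eta = (M @ Delta @ np.linalg.inv(X + lam * np.eye(3) + Delta) @ Delta @ M)[0, 0]  # (13)

print(z(X + Delta) - z(X), -eps + eta)  # the two sides of (9) agree
print(eta >= 0)                         # True, as the lemma asserts
```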
LEMMA 2. If $X$ and $X + \Delta$ are nonnegative definite symmetric matrices and $z(X) = \infty$, then $\lim_{\Delta \to 0} z(X + \Delta) = \infty$. (We write $\Delta \to 0$ if each element of $\Delta$ approaches zero. Note that $\Delta$ converges to zero subject to the condition that $X + \Delta$ is nonnegative definite and symmetric.)

PROOF. If $A$ and $B$ are symmetric matrices we use the notation $A \le B$ or $B \ge A$ if $p'Ap \le p'Bp$ for every vector $p$. If $A$ and $B$ are positive definite and $A \le B$, it is easily seen that $B^{-1} \le A^{-1}$ by diagonalizing $B$ and $A$. Also,

$z(X + \Delta) = \lim_{\lambda \to 0+} [(X + \Delta + \lambda I)^{-1}]_{11}.$

Let $d$ be the largest characteristic value of $\Delta$. Then

$(X + \Delta + \lambda I) \le (X + (\lambda + d)I),$

$(X + \Delta + \lambda I)^{-1} \ge (X + (\lambda + d)I)^{-1},$

$z(X + \Delta) \ge [(X + dI)^{-1}]_{11}.$


As $\Delta \to 0$, $d \to 0$. Furthermore $z(X) = \infty$. Hence $\lim_{d \to 0+} [(X + dI)^{-1}]_{11} = \infty$, and our lemma follows.

LEMMA 3. If $R$ is closed and bounded, $z(X)$ attains its minimum on $R$.

PROOF. Let $w = \inf_{X \in R} z(X)$. Because $R$ is bounded, $w > 0$. The case $w = \infty$ is trivial and hence we assume $0 < w < \infty$. Since $R$ is closed and bounded there is a sequence $\{X^{(i)}\}$ such that $X^{(i)} \in R$, $z(X^{(i)}) \to w$, and $\{X^{(i)}\}$ has a limit point $X^{(0)} \in R$. Let $\Delta^{(i)} = X^{(i)} - X^{(0)}$. It suffices to show that $z(X^{(0)}) \le w$. By Lemma 2, $z(X^{(0)}) \ne \infty$. Hence

$z(X^{(0)} + \Delta^{(i)}) - z(X^{(0)}) = -\epsilon(X^{(0)}, \Delta^{(i)}) + \eta(X^{(0)}, \Delta^{(i)}).$

Since $\epsilon$ is linear in $\Delta^{(i)}$,

$\lim_{i \to \infty} \epsilon(X^{(0)}, \Delta^{(i)}) = 0.$

But $\eta(X^{(0)}, \Delta^{(i)}) \ge 0$. Letting $i \to \infty$, we obtain $w - z(X^{(0)}) \ge 0$.

Hereafter we shall assume that $R$ is closed and bounded. Then let $p$ be the lowest rank associated with those elements of $R$ which minimize $z(X)$. Now we assume that $X^{(0)}$ minimizes $z(X)$ on $R$, $z(X^{(0)}) \ne \infty$ and $X^{(0)}$ has rank $p$. We shall now reduce the set under consideration from $R$ to $R \cap H_1$ where $H_1$ is a hyperplane containing $X^{(0)}$ and $H_1$ has dimension $p(p+1)/2$. In the event that $X^{(0)}$ is nonsingular, no reduction from $R$ has been effected. We shall not consider the trivial case $X^{(0)} = 0$ for then $w = \infty$.

We construct $H_1$ as follows. Corresponding to $X^{(0)}$, there is an orthogonal matrix $P = \|p_{ij}\|$ such that

(17)   $X^{(0)} = P'EP$

where

(18)   $E = \begin{pmatrix} E_1 & 0 \\ 0 & 0 \end{pmatrix}$

and $E_1$ is a diagonal $p \times p$ matrix where all the elements on the main diagonal are positive. We define $H_1$ as the set of $X$ for which

(19)   $P(X - X^{(0)})P' = \begin{pmatrix} D_1 & 0 \\ 0 & 0 \end{pmatrix}$

where $D_1$ is a symmetric $p \times p$ matrix. It is evident that $H_1$ is a $p(p+1)/2$ dimensional hyperplane containing $X^{(0)}$. We note that the nonnull set $R \cap H_1$ is the convex hull of $R_1 \cap H_1$ and is also closed and bounded.

LEMMA 4. If $X^{(1)} \in R \cap H_1$, $X^{(1)}$ has rank $p$, $z(X^{(1)}) \ne \infty$, and $X^{(1)} + \Delta \in H_1$, then $\eta(X^{(1)}, \nu\Delta)$ approaches zero at least quadratically as $\nu \to 0$ and $z(X)$ is continuous at $X = X^{(1)}$ (in the topology of $H_1$).

PROOF. Since $X^{(1)} \in R \cap H_1$ and $X^{(1)}$ has rank $p$,

$PX^{(1)}P' = \begin{pmatrix} F_1 & 0 \\ 0 & 0 \end{pmatrix}$


where $F_1$ is a positive definite symmetric matrix. If $X^{(1)} + \Delta \in H_1$,

$P\Delta P' = \begin{pmatrix} D_1 & 0 \\ 0 & 0 \end{pmatrix}$

where $D_1$ is symmetric. Let $p_1$ be the vector consisting of the first $p$ elements of the first column of $P$. If $F_1$ and $F_1 + \nu D_1$ are nonnegative definite we have

(20)   $\eta_\lambda(X^{(1)}, \nu\Delta) = p_1'(F_1 + \lambda I)^{-1}(\nu D_1)(F_1 + \lambda I + \nu D_1)^{-1}(\nu D_1)(F_1 + \lambda I)^{-1}p_1.$

But $F_1$ is positive definite and for $\nu$ small enough $F_1 + \nu D_1$ is also positive definite. Therefore,

(21)   $\eta(X^{(1)}, \nu\Delta) = p_1'F_1^{-1}(\nu D_1)(F_1 + \nu D_1)^{-1}(\nu D_1)F_1^{-1}p_1$

and

$\lim_{\nu \to 0} \eta(X^{(1)}, \nu\Delta)/\nu^2 = p_1'F_1^{-1}D_1F_1^{-1}D_1F_1^{-1}p_1 < \infty.$

Similarly one may obtain

(22)   $\epsilon(X^{(1)}, \nu\Delta) = p_1'F_1^{-1}(\nu D_1)F_1^{-1}p_1.$

The continuity of $z(X)$ at $X = X^{(1)}$ follows immediately from equations (21) and (22).

LEMMA 5. There is a sub-hyperplane $H_2$ of $H_1$ which is a supporting hyperplane of $R \cap H_1$ at $X^{(0)}$. $H_2$ has dimension $\tfrac{1}{2}p(p+1) - 1$.

PROOF. Suppose $X = X^{(0)} + \Delta \in R \cap H_1$. By convexity $X^{(0)} + \nu\Delta \in R \cap H_1$ for $0 \le \nu \le 1$. If $\epsilon(X^{(0)}, \Delta) > 0$, it follows from Lemma 4 and the linearity of $\epsilon$ that $z(X^{(0)} + \nu\Delta) - z(X^{(0)}) < 0$ for small enough positive $\nu$. This contradicts the fact that $X^{(0)}$ minimizes $z(X)$ on $R$. Hence $\epsilon(X^{(0)}, X - X^{(0)}) \le 0$ for $X \in R \cap H_1$. The sub-hyperplane $H_2$ of $H_1$ defined by the restriction

(23)   $\epsilon(X^{(0)}, X - X^{(0)}) = p_1'E_1^{-1}D_1E_1^{-1}p_1 = 0$

is a supporting hyperplane of $R \cap H_1$ at $X^{(0)}$. The fact that equation (23) actually constitutes a restriction on $X$ depends on the fact that $p_1 \ne 0$, and this in turn is easily established from $z(X^{(0)}) \ne \infty$, which implies that the last $k - p$ elements of the first column of $P$ are all zero.

LEMMA 6. There is a sub-hyperplane $H_3$ of $H_2$ so that $z(X) = z(X^{(0)})$ for $X \in R \cap H_3$. The dimension of $H_2$ minus that of $H_3$ is no more than $p - 1$.

PROOF. For $X \in H_2$, $\epsilon(X^{(0)}, X - X^{(0)}) = 0$. From equation (21) it follows that if $E_1 + D_1$ is nonsingular, the restriction

(24)   $p_1'E_1^{-1}D_1 = 0$


implies $\eta(X^{(0)}, \Delta) = 0$ and hence $z(X^{(0)} + \Delta) = z(X^{(0)})$. This implication holds even if $E_1 + D_1$ is singular and nonnegative definite. For then we may apply equation (20) with $F_1$ replaced by $E_1$ and $\nu = 1$. We note that subject to restriction (24), $p_1'(E_1 + \lambda I)^{-1}D_1 = O(\lambda)$. Furthermore $(E_1 + \lambda I + D_1)^{-1} \le (1/\lambda)I$, whence $\eta_\lambda(X^{(0)}, \Delta) = O(\lambda)$ and $z(X^{(0)} + \Delta) = z(X^{(0)})$.

Equation (24) constitutes a set of at most $p$ linearly independent restrictions on $X = X^{(0)} + \Delta$. However, since the restriction $\epsilon(X^{(0)}, \Delta) = 0$ may be written $p_1'E_1^{-1}D_1E_1^{-1}p_1 = 0$, it follows that on $H_2$ the restriction (24) constitutes a set of at most $p - 1$ linearly independent restrictions. Let $H_3$ be the sub-hyperplane of $H_2$ on which $p_1'E_1^{-1}D_1 = 0$. Lemma 6 follows.

LEMMA 7. There is an element $\bar X$ of $R$ which minimizes $z(X)$ and which is a convex combination of a set of $r \le p$ elements of $R_1 \cap H_2$.

PROOF. The set $R \cap H_3$ is closed, convex and bounded. There exists at least one element $\bar X$ of $R \cap H_3$ which is not a convex combination of any two distinct elements of $R \cap H_3$. By Lemma 6, $z(\bar X) = z(X^{(0)})$, that is, $\bar X$ minimizes $z(X)$ on $R$. The matrix $\bar X$ is an element of $H_2$ which supports $R \cap H_1$. Hence $\bar X$ is a convex combination of elements of $R_1 \cap H_2$. Let $r$ be the least number of elements of $R_1 \cap H_2$ which are required to yield $\bar X$ as a convex combination. Then $\bar X$ is an interior point of $R \cap H_4$ where $H_4$ is an $r - 1$ dimensional sub-hyperplane of $H_2$. Since $\bar X$ was selected so that it is not an interior point of any line segment of $R \cap H_3$, $H_4 \cap H_3$ must have dimension 0 and hence $r - 1 \le p - 1$.

LEMMA 8. Theorem 1 is valid for $s = 1$.

PROOF. Lemma 7 shows the existence of $\bar X$ and the continuity property is given by Lemma 4.

Now that Theorem 1 has been established for $s = 1$, we shall extend the proof for $s > 1$. In that case we change our notation slightly. We let

(25)   $z(X) = v_s(X) = x^{11} + x^{22} + \cdots + x^{ss}$

(26)   $\delta(X, \Delta) = z(X + \Delta) - z(X)$

(27)   $\epsilon(X, \Delta) = \epsilon^{(1)}(X, \Delta) + \epsilon^{(2)}(X, \Delta) + \cdots + \epsilon^{(s)}(X, \Delta)$

(28)   $\epsilon^{(i)}(X, \Delta) = \lim_{\lambda \to 0+} \epsilon_\lambda^{(i)}(X, \Delta)$

(29)   $\eta(X, \Delta) = \eta^{(1)}(X, \Delta) + \eta^{(2)}(X, \Delta) + \cdots + \eta^{(s)}(X, \Delta)$

(30)   $\eta^{(i)}(X, \Delta) = \lim_{\lambda \to 0+} \eta_\lambda^{(i)}(X, \Delta)$
where $\epsilon_\lambda^{(i)}(X, \Delta)$ and $\eta_\lambda^{(i)}(X, \Delta)$ are obtained from the $i$th diagonal terms of the matrices appearing in equations (11) and (13), respectively. Then Lemmas 1, 2, 3, and 4 may be established as in the case $s = 1$. Equations (20), (21), and (22) are slightly modified. To illustrate, equation (20) becomes

(31)   $\eta_\lambda^{(i)}(X^{(1)}, \nu\Delta) = p_1^{(i)\prime}(F_1 + \lambda I)^{-1}(\nu D_1)(F_1 + \lambda I + \nu D_1)^{-1}(\nu D_1)(F_1 + \lambda I)^{-1}p_1^{(i)}$


where $p_1^{(i)}$ is the vector whose components are the first $p$ elements of the $i$th column of $P$. It will be useful to note later that the condition $z(X^{(0)}) \ne \infty$ implies that the last $k - p$ elements of the first $s$ columns of $P$ are all zeros. In fact, $p_1^{(1)}, p_1^{(2)}, \ldots, p_1^{(s)}$ are then unit orthogonal vectors.

Lemma 5 follows as before with the restriction defining $H_2$ replaced by

(32)   $\epsilon(X^{(0)}, X - X^{(0)}) = \sum_{i=1}^{s} p_1^{(i)\prime}E_1^{-1}D_1E_1^{-1}p_1^{(i)} = 0.$

In Lemma 6, the wording must be modified so that the dimension of $H_2$ minus that of $H_3$ is no more than $p + (p-1) + \cdots + (p-s+1) - 1$. The change is due to the fact that restriction (24) is now replaced by

(33)   $P_1'E_1^{-1}D_1 = 0$

where $P_1$ is the $(p \times s)$ matrix of rank $s$ consisting of the first $p$ rows and $s$ columns of $P$. It is possible to rearrange the rows and columns of $D_1$ (maintaining symmetry) so that equation (33) may be expressed by

(34)   $(Q_{11} \quad Q_{12})\begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix} = 0$

where $Q_{11}$ is nonsingular, $Q_{11}$, $Q_{12}$, $D_{11}$, $D_{12} = D_{21}'$ and $D_{22}$ are of order $s \times s$, $s \times (p-s)$, $s \times s$, $s \times (p-s)$ and $(p-s) \times (p-s)$, respectively, and

$\begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix}$

is the result of rearranging the rows and columns of $D_1$. But then

$D_{12} = -Q_{11}^{-1}Q_{12}D_{22}, \qquad D_{11} = -Q_{11}^{-1}Q_{12}D_{21} = -Q_{11}^{-1}Q_{12}D_{12}'.$

Hence, after the restriction (33), $D_1$ is determined by $D_{22}$ and has only $(p-s)(p-s+1)/2$ linearly independent elements. Hence, equation (33) imposes

$\frac{p(p+1)}{2} - \frac{(p-s)(p-s+1)}{2} = \frac{s(2p-s+1)}{2} = p + (p-1) + \cdots + (p-s+1)$

independent linear restrictions on the symmetric matrix $D_1$. But as before one of these restrictions is lost on $H_2$, for (33) implies $\epsilon(X^{(0)}, \Delta) = 0$. Lemma 7, with $p$ replaced by $p + (p-1) + \cdots + (p-s+1)$, follows as before. Theorem 1 is once more an immediate consequence of Lemmas 4 and 7.

6. Remarks. In many cases, the cost of experimentation depends on the experiment. Then the usual design problem is to maximize information, given a certain amount of money to spend on experimentation.


Our results of Section 5 are easily seen to apply in this case, too. Here we identify with each experiment a matrix

(35)   $Y(\theta) = X(\theta)/c$

where $c$ is the cost of performing the experiment. The matrix $Y(\theta)$ represents information per unit cost. The matrix which we associate with the mixed experiment where $E_i$ is carried out with probability $p_i$, $i = 1, 2, \ldots, n$, is

(36)   $\frac{\sum_{i=1}^{n} p_i X_i(\theta)}{\sum_{i=1}^{n} p_i c_i} = \frac{\sum_{i=1}^{n} c_i p_i Y_i(\theta)}{\sum_{i=1}^{n} c_i p_i} = \bar Y(\theta).$

It is evident that a reasonable criterion for a good mixed experiment is that $v_s[\bar Y(\theta)]$ be minimized.
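A sketch of the computation in (36) (Python, with hypothetical costs attached to the regression experiments of Section 2 purely for illustration; the paper does not specify any cost figures):

```python
import numpy as np

def info(x):
    return np.array([[x * x, x], [x, 1.0]])

# Hypothetical costs: say the extreme settings x = +/-1 cost 3 units, x = 0 costs 1.
experiments = [(-1.0, 3.0), (0.0, 1.0), (1.0, 3.0)]
p = np.array([0.25, 0.50, 0.25])   # mixing probabilities, sum to 1

num = sum(pi * info(x) for pi, (x, c) in zip(p, experiments))   # sum p_i X_i
den = sum(pi * c for pi, (x, c) in zip(p, experiments))         # sum p_i c_i
Y_bar = num / den                                               # equation (36)
print(Y_bar)  # information per unit of money for this mixed experiment
```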
In [1], Elfving obtained our result (Theorem 1) for $s = 1$ and $s = k$ in the case of linear regression. Elfving also indicated an elegant geometrical method of obtaining the optimal design. The methods used by Elfving depend only on the assumption that for any nonrandomized experiment, $X(\theta)$ may be represented in the form

(37)   $X(\theta) = \|x_{ij}(\theta)\|, \qquad x_{ij}(\theta) = x_i(\theta)x_j(\theta).$

Hence, these methods may be applied in many examples which are not regression problems.

7. Examples. In this section we shall discuss some examples in order to show how the results of Section 5 may be used to reduce considerably the amount of work required to obtain optimal designs of experiments. The results of Elfving [1] make it unnecessary for us to consider the important and interesting examples from linear regression theory.

EXAMPLE 1. Suppose that

(38)   $p_d = e^{-\theta d}, \qquad \theta > 0, \quad d \ge 0,$

represents the probability that an insect will survive a dose of $d$ units of a certain insecticide. It is desired to select $n$ dose levels to try on $n$ insects to estimate $\theta$ in an optimal fashion. Here the information matrix corresponding to a particular $d$ is given by

(39)   $x_d = \frac{d^2 e^{-\theta d}}{1 - e^{-\theta d}}, \qquad d > 0,$

and

(40)   $x_d = 0, \qquad d = 0.$

The conditions of Theorem 1 are satisfied and hence it follows that a locally optimal design consists of repeating one dose level $n$ times. Maximizing $x_d$ we find that this dose level satisfies

$\theta d + 2e^{-\theta d} = 2, \qquad \theta d \approx 1.6.$

For this locally optimal dose level, the probability of survival $p_d$ is very close to .2. An interesting by-product is that for the general design the maximum likelihood estimates are not too simple to obtain or study. For the optimal design the estimation problem is that of estimating the probability associated with a binomial distribution.²

² The author wishes to express his thanks to Fred Andrews for his assistance on this example.
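The stationarity condition above can be checked numerically (a sketch taking $\theta = 1$, so the optimal dose is $d \approx 1.6$ itself; brentq is scipy's bracketing root finder):

```python
import numpy as np
from scipy.optimize import brentq

theta = 1.0  # the locally assumed value theta^(0); the optimal dose scales as 1.6/theta

def info(d):
    # Per-insect information about theta at dose d, equation (39).
    t = theta * d
    return d * d * np.exp(-t) / (1.0 - np.exp(-t)) if d > 0 else 0.0

# Stationarity condition from maximizing (39): theta*d + 2*exp(-theta*d) = 2.
t_star = brentq(lambda t: t + 2.0 * np.exp(-t) - 2.0, 0.5, 5.0)
print(t_star)            # ~1.594
print(np.exp(-t_star))   # survival probability ~0.203, i.e. close to .2
print(info(t_star / theta) >= max(info(d) for d in np.linspace(0.01, 10, 2000)))
```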

EXAMPLE 2. Let $A$ and $B$ represent two characteristics that members of a population may or may not have, for example, the habit of smoking and heart disease. Let $\bar A$ and $\bar B$ represent the complementary characteristics. It is desired to estimate the degree of dependence of the two characteristics $A$ and $B$. Five experiments may be performed. These correspond to examining individuals either: (i) at random; (ii) with characteristic $A$; (iii) with characteristic $\bar A$; (iv) with characteristic $B$; or (v) with characteristic $\bar B$. The parameters involved are $p_A$, $p_{\bar A} = 1 - p_A$, $p_B$, $p_{\bar B} = 1 - p_B$, and $\theta = p_{AB} - p_A p_B$, where $p$ with a subscript indicates the proportion of the population with the characteristics of the subscript. There are three independent parameters. In the case where $p_A$ and $p_B$ are known, it has been shown by Blackwell [4] that to test for independence an optimal design involves repeating one experiment $n$ times. This experiment is the one which corresponds to the smallest of the four probabilities $p_A$, $p_{\bar A}$, $p_B$, and $p_{\bar B}$. Here, Theorem 1 may be applied to yield the same result if it is desired to estimate $\theta$ when $\theta$ is assumed to be close to zero.

Suppose, now, that our problem is modified so that $p_B$ is only approximately known. Here, Theorem 1 applies with $k = 2$, $s = 1$, and tells us that we should use at most two of the experiments. Furthermore, since selecting an individual at random is equivalent to a mixture of two of the other experiments, we may confine our attention only to pairs of the other four experiments.

Let us now evaluate the information matrix $X_A$ corresponding to examining an individual with characteristic $A$, this information matrix to be evaluated at $\theta = 0$. In this experiment the probability of observing an individual with characteristic $B$ is $p_B + \theta/p_A$. If the individual observed has characteristic $B$,

$L = \log\left(p_B + \frac{\theta}{p_A}\right)$

and

$\frac{\partial L}{\partial \theta} = \frac{1}{(p_B + \theta/p_A)\,p_A}, \qquad \frac{\partial L}{\partial p_B} = \frac{1}{p_B + \theta/p_A}.$


If the individual observed has characteristic $\bar B$,

$L = \log\left(1 - p_B - \frac{\theta}{p_A}\right)$

and

$\frac{\partial L}{\partial \theta} = \frac{-1}{(1 - p_B - \theta/p_A)\,p_A}, \qquad \frac{\partial L}{\partial p_B} = \frac{-1}{1 - p_B - \theta/p_A}.$

Hence, at $\theta = 0$,

(41)   $X_A = \frac{1}{p_B p_{\bar B}}\begin{pmatrix} 1/p_A^2 & 1/p_A \\ 1/p_A & 1 \end{pmatrix}.$

Similarly

(42)   $X_{\bar A} = \frac{1}{p_B p_{\bar B}}\begin{pmatrix} 1/p_{\bar A}^2 & -1/p_{\bar A} \\ -1/p_{\bar A} & 1 \end{pmatrix},$

(43)   $X_B = \begin{pmatrix} 1/(p_A p_{\bar A} p_B^2) & 0 \\ 0 & 0 \end{pmatrix},$

and

(44)   $X_{\bar B} = \begin{pmatrix} 1/(p_A p_{\bar A} p_{\bar B}^2) & 0 \\ 0 & 0 \end{pmatrix}.$

From the remarks of the previous section, it follows that Elfving's results [1] may be applied. The geometrical figure that is developed shows immediately that the optimal design consists of using $B$ alone, using $\bar B$ alone, or using $A$ and $\bar A$ each half the time, according as $\sqrt{p_{\bar B}/p_B}$, $\sqrt{p_B/p_{\bar B}}$, or $1/(2\sqrt{p_A p_{\bar A}})$ is the greatest of the three numbers. This last result can also be obtained directly without computational difficulty.
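The comparison can be verified directly from the information matrices (41)-(44) (a sketch; the generalized inverse of (5) is approximated with a small $\lambda$, which handles the singular matrices $X_B$ and $X_{\bar B}$):

```python
import numpy as np

def gen_inv_11(X, lam=1e-9):
    # x^{11} via definition (5) with Y = I: asymptotic variance of theta-hat
    # per observation.
    return np.linalg.inv(X + lam * np.eye(2))[0, 0]

def candidate_variances(pA, pB):
    qA, qB = 1.0 - pA, 1.0 - pB   # p_{A-bar}, p_{B-bar}
    XA    = (1.0 / (pB * qB)) * np.array([[1 / pA**2,  1 / pA], [ 1 / pA, 1.0]])
    XAbar = (1.0 / (pB * qB)) * np.array([[1 / qA**2, -1 / qA], [-1 / qA, 1.0]])
    XB    = np.array([[1.0 / (pA * qA * pB**2), 0.0], [0.0, 0.0]])
    XBbar = np.array([[1.0 / (pA * qA * qB**2), 0.0], [0.0, 0.0]])
    return {
        "B alone": gen_inv_11(XB),
        "B-bar alone": gen_inv_11(XBbar),
        "A and A-bar, half each": gen_inv_11(0.5 * XA + 0.5 * XAbar),
    }

for pA, pB in [(0.5, 0.3), (0.1, 0.5)]:
    vs = candidate_variances(pA, pB)
    print(pA, pB, min(vs, key=vs.get), vs)
# (0.5, 0.3): sampling B wins, since p_B is small;
# (0.1, 0.5): mixing A and A-bar wins, since p_A p_{A-bar} is small.
```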

8. Appendices.

APPENDIX A. Extension of the inverse to nonnegative definite symmetric matrices.

Here we extend the notion of the inverse of a matrix to nonnegative definite symmetric matrices and show how this extension has statistical significance. Suppose that $X$ is nonnegative definite and symmetric. Let $Y$ be a symmetric matrix so that $X + \lambda Y$ is positive definite for positive $\lambda$ sufficiently small. Then we define the inverse of $X$ relative to $Y$ by

(45)   $X_Y^{-1} = \lim_{\lambda \to 0+} (X + \lambda Y)^{-1}.$

The usefulness of this definition arises mainly from the following property.

PROPERTY 1. The diagonal elements of $X_Y^{-1}$ are independent of $Y$. Furthermore, if the $i$th and $j$th diagonal elements of $X_Y^{-1}$ are finite, the $(i, j)$ element of $X_Y^{-1}$ is finite and independent of $Y$. If the $i$th diagonal element of $X_Y^{-1}$ is finite, all the elements of the $i$th row of $X_Y^{-1}$ are finite.

PROOF. Corresponding to $X$ there is an orthogonal matrix $P$ such that

(46)   $P'XP = E = \begin{pmatrix} E_{11} & 0 \\ 0 & 0 \end{pmatrix}$

where $E_{11}$ is a diagonal $p \times p$ matrix whose diagonal elements $e_1, e_2, \ldots, e_p$ are positive. We define $F$ by

(47)   $P'YP = F = \begin{pmatrix} F_{11} & F_{12} \\ F_{21} & F_{22} \end{pmatrix}.$

Then $F_{22}$ is positive definite and

$(X + \lambda Y)^{-1} = P(E + \lambda F)^{-1}P',$

$(E + \lambda F)^{-1} = \begin{pmatrix} E_{11}^{-1} + O(\lambda) & -[E_{11}^{-1} + O(\lambda)]F_{12}F_{22}^{-1} \\ -F_{22}^{-1}F_{21}[E_{11}^{-1} + O(\lambda)] & \dfrac{1}{\lambda}\,[F_{22} - \lambda F_{21}(E_{11} + \lambda F_{11})^{-1}F_{12}]^{-1} \end{pmatrix}.$

Let $p_{i1}$ and $p_{i2}$ represent the first $p$ and the remaining $k - p$ elements of the $i$th column of $P$. Then

(48)   $(X + \lambda Y)^{ij} = p_{i1}'E_{11}^{-1}p_{j1} - p_{i1}'E_{11}^{-1}F_{12}F_{22}^{-1}p_{j2} - p_{i2}'F_{22}^{-1}F_{21}E_{11}^{-1}p_{j1} + \frac{1}{\lambda}\,p_{i2}'F_{22}^{-1}p_{j2} + p_{i2}'F_{22}^{-1}F_{21}E_{11}^{-1}F_{12}F_{22}^{-1}p_{j2} + O(\lambda).$

Suppose that $\lim_{\lambda \to 0+} (X + \lambda Y)^{ii}$ is finite. Then $p_{i2} = 0$ and

(49)   $\lim_{\lambda \to 0+} (X + \lambda Y)^{ii} = p_{i1}'E_{11}^{-1}p_{i1} = \sum_{h=1}^{p} \frac{p_{hi}^2}{e_h},$

which is independent of $Y$. Also

(50)   $\lim_{\lambda \to 0+} (X + \lambda Y)^{ij} = p_{i1}'E_{11}^{-1}p_{j1} - p_{i1}'E_{11}^{-1}F_{12}F_{22}^{-1}p_{j2},$

which is finite (though it depends on $Y$). Suppose that $(X + \lambda Y)^{ii}$ and $(X + \lambda Y)^{jj}$ are finite. Then $p_{i2} = 0$, $p_{j2} = 0$, and

(51)   $\lim_{\lambda \to 0+} (X + \lambda Y)^{ij} = p_{i1}'E_{11}^{-1}p_{j1} = \sum_{h=1}^{p} \frac{p_{hi}p_{hj}}{e_h},$
which is independent of $Y$.

Let us now assume that the probability distribution of the data of an experiment depends on $\varphi = (\varphi_1, \varphi_2, \ldots, \varphi_a)$ and that the information matrix with respect to $\varphi$ is positive definite. Let us assume that the above distribution is independent of $\psi = (\psi_1, \psi_2, \ldots, \psi_b)$. Suppose now, that the parameters in which we are interested are $\theta = (\theta_1, \theta_2, \ldots, \theta_c)$ and $\eta = (\eta_1, \eta_2, \ldots, \eta_d)$ where $a + b = c + d$ and there is a one to one relationship between $(\varphi, \psi)$ and $(\theta, \eta)$. In fact, suppose that

(53)   $\theta = g_1(\varphi),$

(54)   $\eta = g_2(\varphi, \psi)$

where the Jacobian of the transformation is not zero and where for each component of $\eta$ the partial derivative with respect to some component of $\psi$ does not vanish. We also suppose that the likelihood may be expressed in terms of $(\theta, \eta)$, that is,

(55)   $L = u(\varphi) = w(\theta, \eta).$

We are interested in the following information matrices:

(56)   $U = E\{u_\varphi' u_\varphi\},$

(57)   $W = \begin{pmatrix} W_{\theta\theta} & W_{\theta\eta} \\ W_{\eta\theta} & W_{\eta\eta} \end{pmatrix} = E\left\{\begin{pmatrix} w_\theta' \\ w_\eta' \end{pmatrix}\begin{pmatrix} w_\theta & w_\eta \end{pmatrix}\right\},$

where $u_\varphi$ is a row vector whose $i$th component is $\partial u/\partial\varphi_i$, and $w_\theta$, $w_\eta$ are the corresponding row vectors of derivatives of $w$. We shall also use the notation $\varphi_\theta$ to denote an $a \times c$ matrix whose $(i, j)$ element is $\partial\varphi_i/\partial\theta_j$. We assume that $U$ is positive definite and $U^{-1}$ represents a covariance matrix $\Sigma_{\varphi\varphi}$. For our extension of the notion of the inverse of a matrix to be suitable, it should yield for us the following property.

PROPERTY 2. The matrix $W_Y^{-1}$ may be decomposed as follows:

(58)   $W_Y^{-1} = \begin{pmatrix} W^{\theta\theta} & W^{\theta\eta} \\ W^{\eta\theta} & W^{\eta\eta} \end{pmatrix}$

where $W^{\theta\theta}$ is uniquely defined and is given by

(59)   $W^{\theta\theta} = \theta_\varphi \Sigma_{\varphi\varphi} \theta_\varphi' = \Sigma_{\theta\theta},$

and where the diagonal elements of $W^{\eta\eta}$ are infinite.
PROOF. We may write

$W = A'\begin{pmatrix} U & 0 \\ 0 & 0 \end{pmatrix}A$

where

$A = \begin{pmatrix} \varphi_\theta & \varphi_\eta \\ \psi_\theta & \psi_\eta \end{pmatrix}$

is nonsingular. Furthermore,

$\left[W + \lambda A'\begin{pmatrix} 0 & 0 \\ 0 & I \end{pmatrix}A\right]^{-1} = \left[A'\begin{pmatrix} U & 0 \\ 0 & \lambda I \end{pmatrix}A\right]^{-1} = A^{-1}\begin{pmatrix} U^{-1} & 0 \\ 0 & \lambda^{-1}I \end{pmatrix}(A')^{-1}.$

Property 1, together with the fact that not all components of $\eta_\psi$ vanish, yields our desired results.
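A numerical check of Property 2 (a sketch that builds $W$ from a randomly chosen nonsingular Jacobian with $\theta_\psi = 0$, as implied by (53); the dimensions are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)

a, b = 2, 1                       # dim(phi), dim(psi); theta is scalar, eta is 2-dim
M = rng.standard_normal((a, a))
U = M @ M.T                       # positive definite information matrix for phi

# A_inv = d(theta, eta)/d(phi, psi), with theta a function of phi alone,
# so the psi-entries of the theta-row vanish.
A_inv = rng.standard_normal((a + b, a + b))
A_inv[0, a:] = 0.0                # theta_psi = 0
A = np.linalg.inv(A_inv)

zero = np.zeros
W = A.T @ np.block([[U, zero((a, b))], [zero((b, a)), zero((b, b))]]) @ A
Y = A.T @ np.block([[zero((a, a)), zero((a, b))], [zero((b, a)), np.eye(b)]]) @ A

lam = 1e-6
W_inv = np.linalg.inv(W + lam * Y)   # approximates W_Y^{-1} of definition (45)
theta_phi = A_inv[0, :a]
print(W_inv[0, 0], theta_phi @ np.linalg.inv(U) @ theta_phi)  # agree: eq. (59)
```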
APPENDIX B. Justification for the use of information matrices.

We sketch here a brief justification for the use of information matrices in our formulation. This justification presupposes that we are interested in the variances of the asymptotic distribution of the estimate based on our design. Rubin has shown [5] that under mild conditions these variances are greater than or equal to the diagonal elements of the inverse of the information matrix. On the other hand, if the design involves repeating a fixed number of these experiments in certain proportions, one (again under mild conditions) obtains equality. Since the "optimal" design using the information criterion involves repeating a fixed number of experiments in certain proportions, the sum of the variances of the asymptotic distributions of the estimates with this "optimal" design is actually equal to the minimum $v_s$, which is a lower bound for the sum of the variances of the asymptotic distributions of the estimates for all designs.

APPENDIX C. The relevance of sums of variances.

If one is interested in the parameters $\theta_1, \theta_2, \ldots, \theta_s$, it may be assumed that for a given estimate $t_1, t_2, \ldots, t_s$, there is a loss represented by a function

(60)   $g(t, \theta) = g(t_1, t_2, \ldots, t_s, \theta_1, \theta_2, \ldots, \theta_s)$

which as a function of the $t_i$ is a minimum at $t_i = \theta_i$. If we assume that $g$ is sufficiently well-behaved and that the sample is large enough so that the $t_i$ are close to $\theta_i$ with large probability,

$g(t, \theta) = g(\theta, \theta) + \frac{1}{2}\sum_{i,j}\frac{\partial^2 g(\theta, \theta)}{\partial t_i \, \partial t_j}\,(t_i - \theta_i)(t_j - \theta_j) + o(|t - \theta|^2).$

The "value" of our statistic is measured by how small $E\{g(t, \theta)\}$ is. For large samples (size $n$) we have, under mild conditions,

$E\{g(t, \theta)\} = g(\theta, \theta) + \frac{1}{n}\sum_{i,j=1}^{s} a_{ij}\sigma_{ij} + o\!\left(\frac{1}{n}\right)$

where $\|\sigma_{ij}\|$ is the covariance matrix of the asymptotic distribution of $\sqrt{n}(t - \theta)$ and

$a_{ij} = \frac{1}{2}\,\frac{\partial^2 g(\theta, \theta)}{\partial t_i \, \partial t_j}.$

A reasonable criterion of a good statistic $t$ should then be that it minimizes

(61)   $\sum_{i,j} a_{ij}\sigma_{ij}.$

We now note that since $g$ is minimized at $t = \theta$, the matrix $A = \|a_{ij}\|$ should be nonnegative definite. If $A$ has rank $p$, it is possible to reduce the above expression to $\sum_{i=1}^{p} \sigma_{ii}$ by a linear transformation on $\theta$. Ordinarily one would expect $p = s$ if one is interested in $s$ parameters.
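The reduction asserted here rests on the identity $\mathrm{tr}(AV) = \mathrm{tr}(CVC')$ for any factorization $A = C'C$, so that $\mathrm{tr}(AV)$ is the sum of the asymptotic variances of the $p$ linear combinations $C\theta$. A small numerical sketch (random matrices chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

s, p = 4, 2
C = rng.standard_normal((p, s))
A = C.T @ C                       # nonnegative definite, rank p <= s
M = rng.standard_normal((s, s))
V = M @ M.T                       # a covariance matrix of sqrt(n)(theta_hat - theta)

# tr(A V) = tr(C V C'): the sum of the asymptotic variances of the p
# transformed parameters C theta -- a sum of variances after reparametrization.
print(np.trace(A @ V), np.trace(C @ V @ C.T))  # equal
```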
REFERENCES

[1] G. ELFVING, "Optimum allocation in linear regression theory," Ann. Math. Stat., Vol. 23 (1952), pp. 255-262.
[2] R. A. FISHER, Contributions to Mathematical Statistics, Papers 10, 11, and 38, John Wiley and Sons, 1950.
[3] H. CRAMÉR, Mathematical Methods of Statistics, Princeton University Press, 1946.
[4] D. BLACKWELL, "Comparison of experiments," Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 1951.
[5] H. RUBIN, "The asymptotic analogue of the theorem of Cramér and Rao," (Abstract), Ann. Math. Stat., Vol. 19 (1948), p. 121.
