Summary-In this paper, a general class of stochastic estimation and control problems is formulated from the Bayesian decision-theoretic viewpoint. A discussion as to how these problems can be solved step by step, in principle and in practice, from this approach is presented. As a specific example, the closed-form Wiener-Kalman solution for linear estimation in Gaussian noise is derived. The purpose of the paper is to show that the Bayesian approach provides: 1) a general unifying framework within which to pursue further research in stochastic estimation and control problems, and 2) the necessary computations and difficulties that must be overcome for these problems. An example of a nonlinear, non-Gaussian estimation problem is also solved.

SINGLE-STAGE ESTIMATION PROBLEM

For the purpose of illustrating the concepts involved, the single-stage estimation problem will be discussed first. Once this is accomplished, the multistage problem can be treated straightforwardly.

Problem Statement

The following information is assumed given:

1) A set of measurements $z_1, z_2, \ldots, z_k$, which are denoted by the vector $z$.

2) The physical relationship between the state of nature which is to be estimated and the measurements. This is given by

$$z = g(x, v), \tag{1}$$

where $z$ is the measurement vector ($k \times 1$), $x$ is the state (signal) vector ($n \times 1$), and $v$ is the noise vector ($q \times 1$).

3) The joint density function $p(x, v)$. From this the respective marginal density functions, $p(x)$ and $p(v)$, are readily obtained.

It is assumed that information for 3) is available in analytical form or can be approximated by analytical distributions. Item 2) can be either in closed form or merely computable. The problem is to obtain an estimate $\hat{x}$ of $x$, based on the measurements, that is best in a sense which will be defined later.
Manuscript received October 31, 1963; revised July 2, 1964. Y. C. Ho is with Harvard University, Cambridge, Mass., and is also a Consultant at the Minneapolis-Honeywell Regulator Co., Boston, Mass. R. C. K. Lee was formerly with the Minneapolis-Honeywell Regulator Co., Minneapolis, Minn.

The Bayesian Solution

The Bayesian solution to the above problem now proceeds via the following steps:

1) Evaluate $p(z)$. This can be done analytically, at least in principle, or experimentally by Monte Carlo methods, since $z = g(x, v)$ and $p(x, v)$ are given. In the latter case, it is assumed possible to fit the experimental distribution again by a member of a family of distributions.

2) At this point, two alternatives are possible; one may be superior to the other depending on the nature of the problem.

a) Evaluate $p(x, z)$. This is possible analytically if $v$ is of the same dimension as $z$ and one can obtain the functional relationship $v = g^*(x, z)$ from (1). Then, using $p(x, v)$ and the theory of derived distributions, one obtains

$$p(x, z) = p(x, v = g^*(x, z))\, J, \tag{2}$$

where

$$J = \left| \det\left( \frac{\partial g^*(x, z)}{\partial z} \right) \right|.$$

b) Evaluate $p(z/x)$. This conditional density function can always be obtained either analytically, whenever possible, or experimentally from $z = g(x, v)$ and $p(x, v)$.

Note that 2a) may be difficult to obtain in general, since $g^*$ may not exist either because of the nonlinear nature of $g$ or because $z$ and $v$ are of different dimensions. Nevertheless, 2b) can always be carried out. This fact will be demonstrated in the nonlinear example in the sequel.

3) Evaluate $p(x/z)$ using the following relationships:

a) Following 2a),

$$p(x/z) = \frac{p(x, z)}{p(z)}. \tag{3}$$

b) Following 2b), use the Bayes' rule

$$p(x/z) = \frac{p(z/x)\, p(x)}{p(z)}. \tag{4}$$

Depending on the class of distributions one has assumed or obtained for $p(x, z)$, $p(z)$, $p(z/x)$, this key step may be easy or difficult to carry out. Several classes of distributions which have nice properties for this purpose can be found in Raiffa and Schlaifer [1].
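Steps 1) through 3) can be sketched numerically for a scalar problem. The fragment below is a minimal illustration only: the model $z = x^3 + v$, the Gaussian prior and noise, the grid, and all function names are assumptions chosen for the example and do not come from the paper. It follows route 2b): $p(z/x)$ is written down directly, $p(z)$ is obtained by numerical integration, and Bayes' rule then gives $p(x/z)$ on the grid.

```python
import math


def normal_pdf(u, mean, std):
    """Density of a scalar Gaussian (used here for both prior and noise)."""
    return math.exp(-0.5 * ((u - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def likelihood(z, x, noise_std=0.5):
    """p(z/x) for the assumed model z = x**3 + v, v ~ N(0, noise_std**2)."""
    return normal_pdf(z - x ** 3, 0.0, noise_std)


def posterior_on_grid(z, grid, prior_mean=0.0, prior_std=1.0):
    """Return (p_z, posterior), where posterior[i] approximates p(x_i/z)."""
    dx = grid[1] - grid[0]
    # Product p(z/x) p(x) evaluated at every grid point.
    joint = [likelihood(z, x) * normal_pdf(x, prior_mean, prior_std) for x in grid]
    # Step 1: p(z) as the integral of p(z/x) p(x) over x (Riemann sum).
    p_z = sum(joint) * dx
    # Step 3b: Bayes' rule.
    posterior = [j / p_z for j in joint]
    return p_z, posterior


# Hypothetical measurement z = 1.0 on a uniform grid covering [-4, 4].
grid = [-4.0 + 0.01 * i for i in range(801)]
p_z, post = posterior_on_grid(z=1.0, grid=grid)
dx = grid[1] - grid[0]
total = sum(post) * dx                                  # should be close to 1
mean = sum(x * p for x, p in zip(grid, post)) * dx      # conditional mean of x
print(total, mean)
```

A grid is used here only because the example is scalar; for vector states the same three steps apply, but the integration in step 1 becomes the expensive part.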
The density function $p(x/z)$ is known as the a posteriori density function of $x$. It is the knowledge about the state of nature after the measurements $z$. By definition, it contains all the information necessary for estimation.

4) Depending on the criterion function for estimation, one can compute the estimate $\hat{x}$ from $p(x/z)$. Some typical examples are:

Criterion: Maximize the Probability $(\hat{x} = x)$. Solution:

$$\hat{x} = \text{Mode of } p(x/z). \tag{5}$$

This is defined as the most probable estimate. When the a priori density function $p(x)$ is uniform, this estimate is identical to the classical maximum likelihood estimate.

Criterion: Minimize $\int \|x - \hat{x}\|^2\, p(x/z)\, dx$.¹ Solution:

$$\hat{x} = E(x/z). \tag{6}$$

This is the conditional mean estimate.

Criterion: Minimize Maximum $|x - \hat{x}|$. Solution:

$$\hat{x} = \text{Medium of } p(x/z). \tag{7}$$

This can be defined as the minimax estimate.

Pictorially, the three estimates are shown in Fig. 1 for a general $p(x/z)$ for a scalar case.

Fig. 1-Estimates based on a posteriori density: $x_a$, most probable estimate; $x_b$, conditional mean estimate; $x_c$, minimax estimate.

Clearly, other estimates, as well as confidence intervals, can be derived from $p(x/z)$ directly.

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, October 1964. Ho and Lee: Bayesian Approach in Stochastic Estimation.

Special Case of the Wiener-Kalman Filter (single stage)

Now a special case of the above estimation problem will be considered. Let there be given:

1) A set of measurements $z = (z_1, z_2, \ldots, z_k)$.

2) The physical relationship

$$z = Hx + v. \tag{8}$$

3) The independent noise and state density functions

$$p(x, v) = p(x)\, p(v), \tag{9}$$

$$p(x) \text{ Gaussian with } E(x) = \bar{x}, \quad \operatorname{Cov}(x) = P_0, \tag{10}$$

$$p(v) \text{ Gaussian with } E(v) = 0, \quad \operatorname{Cov}(v) = R. \tag{11}$$

Now, following the steps for the Bayesian solution:

1) Evaluate $p(z)$. Since $z = Hx + v$ and $x$, $v$ are Gaussian and independent, one immediately gets that $p(z)$ is Gaussian with

$$E(z) = H\bar{x}, \quad \operatorname{Cov}(z) = HP_0H^T + R. \tag{12}$$

2a) Evaluate $p(x, z)$. Since $(\partial g^*/\partial z)$ = identity matrix, it follows that

$$p(x, z) = p(x, v = z - Hx) = p(x)\, p_v(z - Hx).^2 \tag{13}$$

2b) Evaluate $p(z/x)$.³

3) Evaluate $p(x/z)$. One gets from Bayes' rule,

$$p(x/z) = \frac{p(x)\, p_v(z - Hx)}{p(z)}. \tag{14}$$

By direct substitution of (10), (11), and (12), one obtains

$$p(x/z) = \frac{|HP_0H^T + R|^{1/2}}{(2\pi)^{n/2}\, |P_0|^{1/2}\, |R|^{1/2}} \exp\Big\{ -\tfrac{1}{2}\big[ (x - \bar{x})^T P_0^{-1} (x - \bar{x}) + (z - Hx)^T R^{-1} (z - Hx) - (z - H\bar{x})^T (HP_0H^T + R)^{-1} (z - H\bar{x}) \big] \Big\}. \tag{15}$$

Now completing squares in the $\{\ \}$, (15) simplifies to

$$p(x/z) = \frac{|HP_0H^T + R|^{1/2}}{(2\pi)^{n/2}\, |P_0|^{1/2}\, |R|^{1/2}} \exp\Big\{ -\tfrac{1}{2} (x - \hat{x})^T P^{-1} (x - \hat{x}) \Big\}, \tag{16}$$

where

$$P^{-1} = P_0^{-1} + H^T R^{-1} H,$$

or equivalently,

$$P = P_0 - P_0 H^T (HP_0H^T + R)^{-1} H P_0,$$

and

$$\hat{x} = \bar{x} + P H^T R^{-1} (z - H\bar{x}). \tag{20}$$

4) Now, since $p(x/z)$ is Gaussian, the most probable, conditional mean, and minimax estimates all coincide and are given by $\hat{x}$.

This is the derivation of the single-stage Wiener-Kalman filter [2], [3]. The pair $(P, \hat{x})$ is called a sufficient statistic for the problem in the sense that $p(x/z) = p(x/P, \hat{x})$.

¹ It is assumed that $p(x/z)$ has finite second moment.
² $p_v(z - Hx)$ means substituting $(z - Hx)$ for $v$ in $p(v)$.
³ Note: step 2b) is redundant.

MULTISTAGE ESTIMATION PROBLEM

The problem formulation and the solution in this case are basically similar to the single-stage problem. The only additional complication is that now the state is changing from stage to stage according to some dynamic relationship, and that the a posteriori density function is to be computed recursively. The following is given:

1) The system equations governing the evolution of the state,

$$x_{k+1} = f(x_k, w_k), \tag{21}$$

$$z_{k+1} = h(x_{k+1}, v_{k+1}), \tag{22}$$

where
$x_{k+1}$ is the state vector at $k+1$,
$v_{k+1}$ is the measurement noise at $k+1$,
$z_{k+1}$ is the additional measurement available at $k+1$,
$w_k$ is the disturbance vector at $k$.

2) The complete set of measurements $Z_{k+1} \triangleq (z_1, \ldots, z_{k+1})$.

3) The density functions

$p(x_k/z_1, \ldots, z_k) = p(x_k/Z_k)$,
$p(w_k, v_{k+1}/x_k)$, the statistics of a vector random sequence with components $w_k$ and $v_{k+1}$ which depends on $x_k$.

Now it is required to estimate $x_{k+1}$ based on measurements $z_1, \ldots, z_{k+1}$.

The Bayesian Solution

The procedure is analogous to the single-stage case.

1) Evaluate $p(x_{k+1}/x_k)$. This can be accomplished either experimentally or analytically from knowledge of $p(w_k, v_{k+1}/x_k)$, $p(x_k/Z_k)$ and (21).

2) Evaluate $p(z_{k+1}/x_k, x_{k+1})$ from (22).

3) Evaluate

$$p(x_{k+1}, z_{k+1}/Z_k) = \int p(z_{k+1}/x_k, x_{k+1})\, p(x_{k+1}/x_k)\, p(x_k/Z_k)\, dx_k. \tag{23}$$

From this the marginal density functions $p(x_{k+1}/Z_k)$ and $p(z_{k+1}/Z_k)$ can be directly evaluated.

4) Evaluate

$$p(x_{k+1}/Z_{k+1}) = \frac{p(x_{k+1}, z_{k+1}/Z_k)}{p(z_{k+1}/Z_k)}. \tag{24}$$

5) Estimates for $x_{k+1}$ can be computed from $p(x_{k+1}/Z_{k+1})$ exactly as in the single-stage case.

Special Case of the Wiener-Kalman Filter⁵

The given data at $k+1$ is specified as follows: the physical model is given by linear state and measurement equations, where $w$ and $v$ are independent, white, Gaussian random sequences, and $p(x_k/Z_k)$ is Gaussian.

⁵ This development of the multistage Wiener-Kalman filtering method is very similar to a paper by H. Rauch, F. Tung, and C. Striebel entitled "On the Maximum Likelihood Estimate for Linear Dynamic Systems," presented at the SIAM Conference on System Optimization, 1964, Monterey, Calif. The only difference between the two developments is that the Rauch-Tung-Striebel paper does not explicitly compute $p(x/z)$ but simply computes its maximum and uses it as the estimate. In the authors' approach, the computation of the maximum plays a secondary role. The explicit calculation of the a posteriori probability is emphasized as the Bayesian viewpoint. The authors are indebted to Prof. A. E. Bryson for bringing this reference to their attention. A similar development of the Wiener-Kalman filter is also presented in NASA TR-R-135, 1962, by G. L. Smith, S. Schmidt, and L. A. McGee. This was brought to the authors' attention after the publication of the JACC preprint.
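For reference, the single-stage Wiener-Kalman estimate (20) can be checked numerically in the scalar case. The sketch below is illustrative only: the function names and the numerical values are assumptions, and the two covariance formulas in the comments are the standard information and gain forms, taken here as the two equivalent expressions for $P$. The closed-form estimate is compared with a brute-force conditional mean computed from $p(x)\,p_v(z - Hx)$ on a grid.

```python
import math


def covariance_information_form(p0, h, r):
    """Scalar P from P^{-1} = P0^{-1} + H R^{-1} H."""
    return 1.0 / (1.0 / p0 + h * h / r)


def covariance_gain_form(p0, h, r):
    """Scalar P from P = P0 - P0 H (H P0 H + R)^{-1} H P0."""
    return p0 - p0 * h * h * p0 / (h * p0 * h + r)


def single_stage_estimate(x_bar, p0, h, r, z):
    """Scalar form of x_hat = x_bar + P H R^{-1} (z - H x_bar); returns (x_hat, P)."""
    p = covariance_information_form(p0, h, r)
    return x_bar + p * h / r * (z - h * x_bar), p


def grid_posterior_mean(x_bar, p0, h, r, z, lo=-10.0, hi=10.0, steps=20001):
    """Conditional mean from the unnormalized product p(x) p_v(z - h x) on a grid."""
    dx = (hi - lo) / (steps - 1)
    num = 0.0
    den = 0.0
    for i in range(steps):
        x = lo + i * dx
        weight = math.exp(-0.5 * ((x - x_bar) ** 2 / p0 + (z - h * x) ** 2 / r))
        num += x * weight
        den += weight
    return num / den


# Hypothetical numbers: prior N(1, 4), measurement z = 2 x + v with Var(v) = 1, z = 3.
x_hat, p = single_stage_estimate(x_bar=1.0, p0=4.0, h=2.0, r=1.0, z=3.0)
p_alt = covariance_gain_form(4.0, 2.0, 1.0)
x_grid = grid_posterior_mean(1.0, 4.0, 2.0, 1.0, 3.0)
print(x_hat, p, p_alt, x_grid)
```

For these values the closed form gives $\hat{x} = 25/17$ and $P = 4/17$, and the grid mean agrees with $\hat{x}$ to within the integration error.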
Since in this case the noise $w_k$, $v_{k+1}$ is not dependent on the state, (24) simplifies to

$$p(x_{k+1}/Z_{k+1}) = \frac{p(z_{k+1}/x_{k+1})\, p(x_{k+1}/Z_k)}{p(z_{k+1}/Z_k)}.$$

Combining (28)-(30) using (24), one gets $p(x_{k+1}/Z_{k+1})$, which is again Gaussian.

A SIMPLE NONLINEAR NON-GAUSSIAN ESTIMATION PROBLEM

The discussions in the above sections have been carried out in terms of continuous density functions. However, it is obvious that the same process can be applied to problems involving discrete density functions and discontinuous functional relationships. It is worthwhile, at this point, to carry out one such solution for a simple contrived example which nevertheless illustrates the application of the basic approach.

The problem can be visualized as an abstraction of the following physical estimation problem. An infrared detector followed by a threshold device is used in a satellite to detect hot targets on the ground. However, extraneous signals, particularly reflection from clouds, obscure the measurements. The problem is to design a multistage estimation process to estimate the presence of hot targets through measurement of the output of the threshold detector.

Let $s_k$ (target) be a scalar independent Bernoulli process with

$$p(s_k) = (1 - q)\,\delta(s_k) + q\,\delta(1 - s_k),^6 \tag{37}$$

$n_k$ (cloud noise) be a scalar Markov process with

$$p(n_1) = (1 - a)\,\delta(n_1) + a\,\delta(1 - n_1), \tag{38}$$

$$p(n_{k+1}/n_k) = \left(1 - a - \frac{n_k}{2}\right)\delta(n_{k+1}) + \left(a + \frac{n_k}{2}\right)\delta(1 - n_{k+1}), \tag{39}$$

and the scalar measurement

$$z_k = s_k \oplus n_k, \tag{40}$$

where $\oplus$ indicates the logical "OR" operation.

Essentially, (37)-(40) indicate the fact that as the detector sweeps across the field of view, cloud reflection tends to appear in groups while targets appear in isolated dots.

Now to proceed to the Bayesian solution. First, there is

$$\begin{array}{l|cccc} n_1 & 0 & 0 & 1 & 1 \\ s_1 & 0 & 1 & 0 & 1 \\ z_1 & 0 & 1 & 1 & 1 \\ \hline \text{Probability} & (1-a)(1-q) & q(1-a) & a(1-q) & aq \end{array}$$

$$p(z_1) = (1 - a)(1 - q)\,\delta(z_1) + (a + q - aq)\,\delta(z_1 - 1). \tag{41}$$

⁶ The notation $\delta(x) = 1$ if $x = 0$, and $\delta(x) = 0$ if $x \neq 0$, is used here. Also, $p(x)$ is to be interpreted as mass functions.
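The first-stage mass function (41) can be checked by enumerating the four combinations of cloud and target. The sketch below is illustrative only; the function and variable names are hypothetical, and exact rational arithmetic is used so that the result can be compared term by term with $(1-a)(1-q)$ and $a + q - aq$. The same enumeration also yields, by Bayes' rule, the conditional probabilities of target and cloud given $z_1$.

```python
from fractions import Fraction
from itertools import product


def first_stage(a, q):
    """Enumerate the four (n1, s1) pairs for z1 = s1 OR n1.

    Returns p(z1), P(s1 = 1 / z1) and P(n1 = 1 / z1) as dictionaries keyed by z1.
    """
    p_z = {0: Fraction(0), 1: Fraction(0)}
    joint_s = {0: Fraction(0), 1: Fraction(0)}  # P(s1 = 1, z1)
    joint_n = {0: Fraction(0), 1: Fraction(0)}  # P(n1 = 1, z1)
    for n1, s1 in product((0, 1), repeat=2):
        # Target and cloud are independent at the first stage.
        prob = (a if n1 else 1 - a) * (q if s1 else 1 - q)
        z1 = s1 | n1  # logical OR of the two bits
        p_z[z1] += prob
        joint_s[z1] += prob * s1
        joint_n[z1] += prob * n1
    post_s = {z: joint_s[z] / p_z[z] for z in p_z if p_z[z] > 0}
    post_n = {z: joint_n[z] / p_z[z] for z in p_z if p_z[z] > 0}
    return p_z, post_s, post_n


a = q = Fraction(1, 4)
p_z, post_s, post_n = first_stage(a, q)
print(p_z[0], p_z[1])        # compare with (1-a)(1-q) and a + q - aq
print(post_s[1], post_n[1])  # P(s1 = 1 / z1 = 1) and P(n1 = 1 / z1 = 1)
```

With $a = q = 1/4$ the enumeration gives $p(z_1 = 1) = 7/16$ and $P(s_1 = 1/z_1 = 1) = 4/7 \approx 0.571$.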
Also,

$$p(z_1/n_1) = \delta(z_1 - 1)\, n_1 + [(1 - q)\,\delta(z_1) + q\,\delta(z_1 - 1)](1 - n_1). \tag{42}$$

Then by direct calculation,

$$p(n_1/z_1) = \frac{p(z_1/n_1)\, p(n_1)}{p(z_1)} = (1 - a(z_1))\,\delta(n_1) + a(z_1)\,\delta(n_1 - 1), \tag{43}$$

where

$$a(z_1) = \frac{a\,\delta(z_1 - 1)}{(1 - a)(1 - q)\,\delta(z_1) + (a + q - aq)\,\delta(z_1 - 1)}; \tag{44}$$

similarly,

$$p(s_1/z_1) = (1 - q(z_1))\,\delta(s_1) + q(z_1)\,\delta(s_1 - 1), \tag{45}$$

where

$$q(z_1) = \frac{q\,\delta(z_1 - 1)}{(1 - a)(1 - q)\,\delta(z_1) + (a + q - aq)\,\delta(z_1 - 1)}, \tag{46}$$

and a reasonable estimate is

$$\hat{s}_1 = \begin{cases} 1 & \text{if } q(z_1) > e \quad (\text{given constant}) \\ 0 & \text{if } q(z_1) < e \end{cases} \tag{47}$$

where $\hat{s}_1 = 1$ may be interpreted as an alarm.

Now consider that a second measurement $z_2$ has been made. One has

$$p(n_2/z_1) = \int p(n_2/n_1)\, p(n_1/z_1)\, dn_1, \tag{48}$$

which, after straightforward but somewhat laborious manipulations, becomes

$$p(n_2/z_1) \triangleq (1 - a(z_1))\,\delta(n_2) + a(z_1)\,\delta(n_2 - 1), \tag{49}$$

where $a(z_1)$ now denotes the probability that $n_2 = 1$ given $z_1$. Furthermore,

$$p(s_2/z_1) = p(s_2). \tag{50}$$

Eqs. (49) and (50) now take the place of (37) and (38), and by the same process one can get, in general, the corresponding expressions for stage $k+1$. Eqs. (51)-(57) now represent the general recursion solution for the multistage estimation process.

As a check, two possible observed sequences for $z$, namely (0, 1) and (1, 1), are considered. With $a = 1/4$ and $q = 1/4$ it is found that $P(s_2/z_2, z_1) = 0.571$ and $0.337$, respectively. This agrees with intuition, since the sequence (1, 1) has a higher probability of being cloud reflections. On the other hand, the numbers also show that under the circumstances it is very difficult to detect targets with accuracy using the system contrived here.

Oftentimes, one is actually interested in $p(s_k/Z_{k+T})$ with $T > 0$ in order to obtain the so-called smoothed estimate for $s_k$. The desired density function can be computed from $p(s_k/Z_k)$ by further manipulations. However, the calculation becomes involved and will not be done here.

RELATIONSHIP TO GENERAL BAYESIAN STATISTICAL DECISION THEORY

It is worthwhile to point out the relationship of the above formulation and solution of the estimation problem to, and its difference from, the general statistical decision problem. For simplicity, the single-stage case is considered again.
In the general statistical decision problem, the input data is somewhat different. One typical form is,⁷

$p(x)$: a priori density of $x$
$\{e\}$: a set of choices of experiments from which we can derive measurements $z$ with
$p(z/x, e)$: conditional density of $z$ for given $x$ and $e$
$\{u\}$: a set of choices of decisions
$J(e, z, u, x)$: a criterion function which is a possible function of $e$, $z$, $u$ and $x$.

The problem is then stated as the determination of $e$ and $u$ so that $E(J)$ is optimized. The optimal $J$ is given by

$$J_{\text{opt}} = \operatorname*{Max\,(Min)}_{e}\; E_{z/e}\Big\{ \operatorname*{Max\,(Min)}_{u}\; E_{x/z,e}\big[ J(e, z, u, x) \big] \Big\}. \tag{58}$$

Thus, the main differences between the estimation problem and the general decision problem are as follows:

1) In the estimation problem there is no choice of experiment. One always makes the same type of measurement $z$ given by $g(x, v)$. To generalize the estimation problem, one can specify

$$z_e = g_e(x, v); \quad \{e\} = 1, 2, \ldots = \text{possible sets of measurements} \tag{59}$$

and then require that

$$\hat{x} = \operatorname*{Opt}_{e}\, [\hat{x}_e;\ e = 1, 2, \ldots].$$

2) In the general decision problem, the function $z = g(x, v)$ is implicit in $p(z/x, e)$. Hence steps 2a) and 2b) of the estimation solution are not required. This is often a tremendous simplification.

3) In the estimation problem the criterion function $J$ is always a simple function of $x$ only. There is, furthermore, no choice of action (one has to make an estimate by definition). On the other hand, the general decision problem is more analogous to a combined estimation and control problem, where one has a further choice of action after determining $p(x/z)$, and, as in a control problem, the criterion function is generally more complex.

4) It is, however, to be noted that the key step is the computation of $p(x/z)$ for both problems. The choice of action is determined only after the computation of $p(x/z)$. Thus, a general decision problem can be decomposed into two problems,
namely, determination of $p(x/z)$ (estimation problem) and choice of action (control problem). In control-theoretic terminology, this fact is called the Generalized Decomposition Axiom.

⁷ For other equivalent forms, see Raiffa and Schlaifer [1].

As an example, consider the single-stage Wiener-Kalman problem and the added requirement that

$$J(e, z, u, x) = J(u, x) = E\|Bx + u\|^2 \tag{60}$$

be a minimum. Expanding (60) by conditioning on $z$, one gets

$$J = E_z\big\{ E_{x/z} \|Bx + u\|^2 \big\} = E_z\big\{ \|B\hat{x} + u\|^2 + \operatorname{tr}(BPB^T) \big\}; \tag{61}$$

clearly,

$$u_{\text{opt}} = u(\hat{x}) = u(\hat{x}(z)) \triangleq u(z) = -B\hat{x}(z), \tag{62}$$

which is one of the fundamental results of linear stochastic control. Thus, the control action $u$ is only a function of the criterion $J$ and the a posteriori density function $p(x/z)$. In fact, in this case only the $\hat{x}$ of $p(x/z)$ is needed. We call $\hat{x}$ the minimal sufficient statistic for the control problem.

In the more general multistage case, the decomposition property clearly still holds, the only difference being that $p(x_{k+1}/Z_{k+1})$ is now dependent on $u_k$. However, this dependence is entirely deterministic since, in a given situation, one always knows what the $u_k$'s are. In fact, in the Wiener-Kalman control problem, it is known that $u_k$ is a linear function of $\hat{x}_k$ only.

CONCLUSION

In the above sections, the problem of estimation from the Bayesian viewpoint is discussed. It is the authors' thesis that this approach offers a unifying methodology, at least conceptually, to the general problems of estimation and control.

The a posteriori conditional density function $p(x/z)$ is seen to be the key to the solution of the general problem. Difficulties associated with the solution of the general problem now appear more specifically as difficulties in steps leading to the computation of $p(x/z)$. From the above discussions, it is relatively obvious that these difficulties are:

1) Computation of $p(z/x)$. In both the single-stage and the multistage case, this problem is complicated by the nonlinear functional relationships between $z$ and $x$.
Except in the case when $z$ and $x$ are linearly related or when $z$ and $x$ are scalars, very little can be done in general, analytically or experimentally. As was mentioned earlier, this difficulty does not appear in the usual decision problem, since there it is assumed that $p(z/x)$ is given as part of the problem.

2) Requirement that $p(x/z)$ be in analytical form. This is an obvious requirement if we intend to use the solution in real-time applications.

3) Requirement that $p(x)$, $p(z)$, $p(x/z)$ be conjugate distributions [1]. This is simply the requirement that $p(x)$ and $p(x/z)$ be density functions from the same family. Note that all the examples discussed in this paper possess this desirable property. This is precisely the reason that multistage computation can be done efficiently. This imposes a further restriction on the functions $g$, $f$ and $h$.

The difficulties 1)-3) listed above are formidable ones. It is not likely that they can be easily circumvented except for special classes of problems such as those discussed. However, it is worthwhile first to pinpoint these difficulties. Research toward their solution can then be effectively initiated. Finally, it is felt that the Bayesian approach offers a unified and intuitive viewpoint particularly adaptable to handling modern-day control problems where the State and the Markov assumptions play a fundamental role.

REFERENCES

[1] H. Raiffa and R. Schlaifer, Applied Statistical Decision Theory, Harvard University Press, Cambridge, Mass., chs. 1-3; 1961.
[2] A. E. Bryson and M. Frazier, Smoothing in linear and nonlinear systems, Proc. of Optimal System Synthesis Symp., Wright Field, Ohio; 1962.
[3] Y. C. Ho, On stochastic approximation and optimal filtering method, J. Math. Analysis and Applications, vol. 6, pp. 152-154; February, 1963.
[4] R. E. Kalman, New Methods and Results in Linear Prediction and Filtering Theory, Research Institute for Advanced Studies, Baltimore, Md., No. 61-1.

The Minimization of Measurement Error in a General Perturbation-Correlation Process Identification System

Summary-A general identification system is studied for an important class of realistic, time-varying processes. This class consists of those in which the process is nominally known, and the statistical characteristics of its varying parameters and of the environment are also known. The expression for identification error in terms of the spectral properties of the parameter variations and of the output transducer noise is developed. Optimization procedures are given to minimize the perturbation-correlation system's mean-square identification error.

INTRODUCTION

With the increase in automation and the demand for more self-contained, sophisticated systems, a more general approach to system measurement must be taken. This is especially true in the field of self-adaptive control [1]-[13]. Here, before the system can perform self-adjustment, it first must identify itself. Then, this measurement must be evaluated in accordance with a predetermined criterion and mathematical model of the dynamical system. Thus, one sees that the basic functions for self-adaption are: identification, evaluation, and adjustment.

Manuscript received October 15, 1963; revised July 13, 1964. The author is with the Dept. of Electrical Engineering, Rutgers, The State University, New Brunswick, N. J.

Although only one of the basic functions, the self-adaptive subsystem is dominated by the identification technique employed. It has been shown that regardless of its particular form, the identification process performs its function by measuring one of the following:

1) the coefficients of the differential equation,
2) the impulse or step response,
3) the frequency response.
Similarly, in order to identify a system, it is necessary to have available, either directly or indirectly, both the input and the output signals. Either normal input or special test signals are employed. Usually, the latter is preferred, since it guarantees system measurement regardless of the level of normal plant activity.

The basic function of identification has been treated extensively in the literature. However, the combined consideration of measurement noise, a priori knowledge, and time-varying system parameters has not been given sufficient treatment. The major block of identification techniques deals with the learning process associated with abstract systems. Cooper and Lindenlaub [14] pointed out that no investigation had been made which incorporated the effects of noise and a priori knowledge.