
5.2 The Bellman Equation - Continuous Time


In the continuous-time case, we write the value function v(s) as the optimal sum of the
immediate reward and the discounted continuation value:
v(s) = max_x { u(x, s) dt + e^{-ρ dt} v(s + ds) } .

This is the analogue of (23), i.e. if, in state s, an action x is taken for a length of
time dt, then the agent gets an immediate reward of u(x, s) dt and the state transits to
s + ds, which is valued at v(s + ds) and discounted by e^{-ρ dt}. Of course, we also have
ds = ṡ dt = φ(x, s) dt from the state transition rule.
We use a Taylor series to approximate each term in the discounted continuation value,
namely e^{-ρ dt} ≈ 1 - ρ dt and v(s + ds) ≈ v(s) + v′(s) ds = v(s) + v′(s) ṡ dt. Thus the
discounted continuation value is

(1 - ρ dt) [v(s) + v′(s) ṡ dt] ≈ v(s) + v′(s) ṡ dt - ρ v(s) dt ,

where we have ignored higher-order terms in dt. Therefore,

v(s) ≈ max_x { u(x, s) dt + v(s) + v′(s) ṡ dt - ρ v(s) dt } ;


we subtract v(s) from both sides then divide through by dt to obtain the continuous-time
Bellman equation
ρ v(s) = max_x { u(x, s) + v′(s) ṡ } ,   (25)
with ṡ = φ(x, s). The FOC for this problem is

u_x(x, s) + φ_x(x, s) v′(s) = 0

and if x*(s) satisfies the FOC, then the Bellman equation becomes

ρ v(s) = u(x*, s) + φ(x*, s) v′(s) ,

which is a first-order differential equation. The solution to this ODE, subject to the
appropriate boundary condition (often ρ v(0) = u(0, 0)), is the value function for the
problem, and x* is the optimal policy.


Note: The solution to a first-order differential equation involves a single constant of
integration, and we use a single boundary condition to determine this constant. Provided
that the derivative of this function is finite when evaluated at the boundary, this is
the solution to the problem. So, either we simply solve the ODE and impose the BC
to determine the constant of integration, or we guess a candidate solution, check that it
satisfies both the ODE and the BC, then look at its derivative at the boundary: if it is
finite then we are done; if not, then . . . .
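As a sanity check on the expansion used above, the following Python sketch evaluates the small-dt object [u dt + e^{-ρ dt} v(s + φ dt) - v(s)]/dt and compares it with the limit u + φ v′(s) - ρ v(s). The test function v and the values of ρ, the state, the flow payoff and the drift are illustrative assumptions, not part of the notes.

import numpy as np

# Check that [u*dt + exp(-rho*dt)*v(s + phi*dt) - v(s)]/dt -> u + phi*v'(s) - rho*v(s)
# as dt -> 0, for an arbitrary smooth test function v.  All values are illustrative.
rho = 0.05                      # discount rate (assumed)
v = lambda s: np.log(1.0 + s)   # arbitrary smooth test function, not a value function
dv = lambda s: 1.0 / (1.0 + s)  # its derivative
s, u, phi = 2.0, 1.3, -0.7      # state, flow payoff and drift phi(x, s) (assumed)

limit = u + phi * dv(s) - rho * v(s)
for dt in [1e-1, 1e-2, 1e-3, 1e-4]:
    approx = (u * dt + np.exp(-rho * dt) * v(s + phi * dt) - v(s)) / dt
    print(f"dt = {dt:7.0e}   finite-dt value = {approx:+.6f}   limit = {limit:+.6f}")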
Example: Cake-eating with logarithmic utility
Consider an agent with utility function u(x) = ln x, and a capital stock s. Assume
that the state transition rule is ṡ = -x, i.e. we have a cake-eating problem. The value
function for this problem satisfies

ρ v(s) = max_{0 ≤ x ≤ s} { ln x - x v′(s) } .
The FOC is

1/x = v′(s) ,   x* = 1/v′(s) ,

and so the Bellman equation becomes

ρ v(s) = -ln v′(s) - 1 .

Use the exponential function, and rearrange, to arrive at

ρ v′(s) exp(ρ v(s) + 1) = ρ .

Since the LHS of this is (d/ds) exp(ρ v(s) + 1), we see that the solution is

exp(ρ v(s) + 1) = ρ s + k ,

i.e.

v(s) = (1/ρ) [ln(ρ s + k) - 1]

for some constant k. The boundary condition is ρ v(0) = u(0) = -∞, i.e. ln k = -∞ or
k = 0. So the solution is

v(s) = (1/ρ) [ln(ρ s) - 1] ,

with v′(s) = 1/(ρ s), and so

x*(s) = ρ s .

Therefore, in this example, it is optimal to consume a constant fraction of the remaining
stock at each point in time. (More precisely, the agent should consume at a rate that
is proportional to her remaining stock.) The rate of consumption is increasing in the
discount rate ρ, as one would expect.
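A quick numerical check of this solution (a sketch only; the value of ρ and the grid of states are assumptions for illustration) confirms that ρ v(s) equals the maximized right-hand side of the Bellman equation and that the maximum is attained at x = ρ s.

import numpy as np

# The claimed solution of the logarithmic cake-eating problem:
#   v(s) = (1/rho)[ln(rho*s) - 1],  v'(s) = 1/(rho*s),  x*(s) = rho*s.
# Check that rho*v(s) = max_x { ln(x) - x*v'(s) } on a grid of consumption rates.
rho = 0.05
v  = lambda s: (np.log(rho * s) - 1.0) / rho
dv = lambda s: 1.0 / (rho * s)

for s in [0.5, 1.0, 5.0]:
    xs = np.linspace(1e-4, s, 200_000)      # candidate consumption rates in (0, s]
    rhs = np.log(xs) - xs * dv(s)           # objective inside the max
    print(f"s = {s:3.1f}   rho*v(s) = {rho*v(s):+.4f}   max RHS = {rhs.max():+.4f}   "
          f"argmax x = {xs[np.argmax(rhs)]:.4f}   rho*s = {rho*s:.4f}")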
Exercise: Cake-eating with CRRA utility
Generalize the above problem to one where the agent has utility function
u(x) = x^{1-R}/(1 - R), where 0 < R < 1. Show that the value function is

v(s) = (1/ρ) R^R (ρ s)^{1-R} / (1 - R) ,

with optimal consumption given by

x*(s) = ρ s / R .

What is the solution if the agent has utility function u(x) = (x^{1-R} - 1)/(1 - R)? Check if this
solution converges to that of the previous problem when R → 1. What about the case
R > 1?
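The stated value function can be spot-checked numerically before attempting the derivation. The sketch below uses assumed values ρ = 0.05, R = 0.5 and s = 2 and verifies that ρ v(s) matches the maximized right-hand side of the Bellman equation, with the maximizer at ρ s/R.

import numpy as np

# Spot-check of the CRRA cake-eating claim (0 < R < 1):
#   v(s) = (1/rho)*R**R*(rho*s)**(1-R)/(1-R),  v'(s) = R**R*(rho*s)**(-R),
# and the Bellman maximand x**(1-R)/(1-R) - x*v'(s) peaks at x = rho*s/R.
rho, R = 0.05, 0.5
v  = lambda s: (R**R) * (rho * s)**(1 - R) / (rho * (1 - R))
dv = lambda s: (R**R) * (rho * s)**(-R)

s = 2.0
xs = np.linspace(1e-5, s, 200_000)
rhs = xs**(1 - R) / (1 - R) - xs * dv(s)
print("rho*v(s)   =", rho * v(s))
print("max of RHS =", rhs.max())
print("argmax x   =", xs[np.argmax(rhs)], "  vs  rho*s/R =", rho * s / R)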
Example: Consumption-savings with CRRA utility and a known interest rate
Consider an agent with utility function u(x) = x^{1-R}/(1 - R), and wealth s. Wealth evolves
according to ṡ = rs - x, where r is known and fixed over time. The value function for
this problem satisfies

ρ v(s) = max_{0 ≤ x ≤ rs} { u(x) + [rs - x] v′(s) } ,   (26)
the FOC is the same as in the above exercise, and the Bellman equation becomes

ρ v(s) = rs v′(s) + [R/(1 - R)] v′(s)^{-(1-R)/R} .
This is not an especially easy ODE to solve, but given the solution when r = 0 it is natural
to guess a solution of the form v(s) = B s^{1-R}/(1 - R). This satisfies the BC ρ v(0) = u(0) = 0,
and also satisfies the ODE for a value of B, leading to

v(s) = [ R / (ρ - (1 - R)r) ]^R s^{1-R} / (1 - R) .

(Notice that this solution collapses to the cake-eating problem when r = 0.) Unfortunately,
since v′(0) = ∞, we need to advance further arguments to make this approach
fully rigorous.
This solution makes sense only if ρ > (1 - R)r, and so the interest rate cannot be
too large compared with the discount rate. If ρ < (1 - R)r then no solution exists, since
the agent can get arbitrarily large utility by delaying consumption arbitrarily far into
the future.³
The optimal level of consumption is v′(s)^{-1/R}, or

x*(s) = [ (ρ - (1 - R)r) / R ] s ,
and again the agent consumes at a rate proportional to her wealth at each point in time.
Notice that, since x*(s) = [(ρ - r)/R + r] s, if r > ρ then current consumption is less than
current income from savings rs, and so wealth s(t) grows over time, which implies that
consumption x*(s(t)) also grows forever. (This is a feature of infinite-horizon models,
and could not happen with a finite horizon.) If the interest rate is exactly equal to the
discount rate, r = ρ, then it is optimal to just consume current interest payments rs and
wealth is unchanged over time. (Again, this could not be optimal with a finite-horizon
model.)
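These comparative statements are easy to see in a simulation. The rough Euler-step sketch below (R, ρ, the horizon and the interest rates are assumptions for illustration) applies the policy x*(s) = [(ρ - r)/R + r] s to the wealth dynamics ṡ = rs - x and shows wealth shrinking when r < ρ, staying flat when r = ρ, and growing when r > ρ.

import numpy as np

# Simulate s' = r*s - x*(s) under the policy x*(s) = [(rho - r)/R + r]*s.
# Wealth should decay, stay constant or grow according to the sign of r - rho.
R, rho, s0, T, dt = 0.5, 0.05, 1.0, 200.0, 0.01

def terminal_wealth(r):
    s = s0
    for _ in range(int(T / dt)):
        x = ((rho - r) / R + r) * s     # optimal consumption rate
        s += (r * s - x) * dt           # Euler step of the wealth dynamics
    return s

for r in [0.03, 0.05, 0.07]:            # all satisfy rho > (1 - R)*r
    print(f"r = {r:.2f} (rho = {rho:.2f})   wealth after T = {T:.0f}: {terminal_wealth(r):10.4f}")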
This feature is generally true in consumption-savings models, and not just with this
specific functional form. To see this, differentiate the Bellman equation (26) using the
envelope theorem to obtain

ρ v′(s) = r v′(s) + (rs - x*(s)) v″(s) ,

or

(r - ρ) v′(s) = (rs - x*(s)) (-v″(s)) .

Since we can show directly that v is both increasing and concave, i.e. both v′ and -v″
are positive, we see that rs - x*(s) has the same sign as r - ρ, which implies that savings
s and consumption x grow over time if and only if r > ρ.
³ Suppose she consumes nothing for a time T, and then consumes her entire wealth, which is s e^{rT} by
then, in a short period of time δ at that point. (In a continuous-time framework, it does not make sense
to say that she consumes her entire wealth instantaneously at time T.) This gives her discounted utility
of (approximately) δ e^{-ρT} u(s e^{rT}/δ). When u(x) = x^{1-R}/(1 - R), this tends to infinity as T → ∞ whenever
ρ < (1 - R)r.
Example: Learning-by-doing
In the learning-by-doing example, let v(s) denote the firm's maximum discounted profits
if it has already manufactured a total quantity s. Then the Bellman equation is just

ρ v(s) = max_x { R(x) - c(s)x + v′(s)x }
       = max_x { R(x) - [c(s) - v′(s)] x } .

A myopic firm would choose x simply to maximize R(x) - c(s)x. Since it is clear that
v′(s) > 0 (because c′(s) < 0), it is easy to see that the firm produces more at each point
in time than its short-run incentive would dictate; in effect its relevant marginal costs
are lower than c(s), the intuition being that producing more now has a positive effect
on future profits.
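The comparison with the myopic firm can be illustrated with a toy specification. In the sketch below the revenue function, the current unit cost and the candidate shadow values m = v′(s) are all illustrative assumptions; the point is simply that a larger shadow value of cumulative output pushes the optimal x above the myopic level.

import numpy as np

# With shadow value m = v'(s) > 0 the firm maximizes R(x) - [c(s) - m]*x, which
# yields a larger output than the myopic maximizer of R(x) - c(s)*x.
revenue = lambda x: 10.0 * x - 0.5 * x**2    # concave revenue (assumed)
c = 4.0                                      # current unit cost c(s) (assumed)
xs = np.linspace(0.0, 12.0, 100_001)

for m in [0.0, 1.0, 2.0]:                    # m = 0 is the myopic benchmark
    profit = revenue(xs) - (c - m) * xs
    print(f"shadow value m = {m:.1f}   optimal x = {xs[np.argmax(profit)]:.3f}")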
Example: The Ramsey growth model
The Bellman equation for the Ramsey growth model is given by

ρ v(s) = max_x { u(x) + v′(s) [f(s) - x - (n + δ)s] } .

This is a first-order ODE in v with (in most natural cases) BC v(0) = 0. Unfortunately
it is difficult to solve explicitly. However, one thing that is easy to do is to find the
steady-state level of capital and consumption at the optimum. Suppose the steady-state
level of these quantities is (s*, x*). Differentiating the above expression and using the
envelope theorem implies that

ρ v′(s*) = v″(s*) [f(s*) - x* - (n + δ)s*] + v′(s*) [f′(s*) - (n + δ)] .

However, from the condition that ṡ = 0, this pair must also satisfy f(s*) - x* - (n + δ)s* = 0.
Therefore

ρ v′(s*) = v′(s*) [f′(s*) - (n + δ)] ,

and by cancelling v′(s*) we find that the optimal steady-state capital-to-labour ratio
satisfies

f′(s*) = n + δ + ρ .
(Compare equation (17).) Stability properties and asymptotic behaviour are covered in
the section on Optimal Control.
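Even though the ODE itself is hard, the steady-state condition is trivial to solve numerically. The sketch below assumes a Cobb-Douglas technology f(s) = A s^α and illustrative parameter values; with this f the condition f′(s*) = n + δ + ρ also has the closed form s* = (αA/(n + δ + ρ))^{1/(1-α)}, which the script uses as a cross-check.

from scipy.optimize import brentq

# Modified golden rule f'(s*) = n + delta + rho with Cobb-Douglas f(s) = A*s**alpha.
A, alpha, n, delta, rho = 1.0, 0.33, 0.01, 0.05, 0.03

f       = lambda s: A * s**alpha
f_prime = lambda s: alpha * A * s**(alpha - 1.0)

s_star = brentq(lambda s: f_prime(s) - (n + delta + rho), 1e-6, 1e6)
x_star = f(s_star) - (n + delta) * s_star        # steady-state consumption from s' = 0

print("steady-state capital     s* =", round(s_star, 4))
print("steady-state consumption x* =", round(x_star, 4))
print("closed-form check           ", round((alpha * A / (n + delta + rho))**(1 / (1 - alpha)), 4))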
One problem that does allow an explicit solution is the following (see Dixit, Exercise
11.2), which takes a slightly different form to our previous problems.
Example: A firm's research project involves ongoing R&D. The R&D intensity at time
t is x(t) and the stock of research is s(t), where this evolves according to ṡ = f(x), with
f being a concave function. As soon as the stock reaches the level s = s̄ the project is
completed and the firm receives a payoff R. If the time taken to reach this target is T,
the firm's discounted payoff is

e^{-ρT} R - ∫₀^T e^{-ρt} x(t) dt .
Let v(s) denote the maximum discounted profit starting with an initial research stock s.
Then the Bellman equation is

ρ v(s) = max_x { -x + f(x) v′(s) } .   (27)

In the case where f(x) = 2√x this becomes

ρ v(s) = (v′(s))² .

Solving this subject to the boundary condition v(s̄) = R gives the solution

√v(s) = √R - √ρ (s̄ - s)/2 .

Therefore, starting with an initial stock s = 0, it is worth pursuing the project only if
R > ρ s̄²/4. From (27) the optimal choice of x as a function of the current stock s is
given by

x*(s) = (v′(s))² = ρ v(s) .

Therefore, the optimal time-path of R&D intensity is increasing over time.
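A short numerical sketch confirms the closed form and the claim that x* rises along the optimal path. The values of ρ, R and s̄ below are illustrative assumptions (chosen so that R > ρ s̄²/4); the script checks that ρ v(s) = (v′(s))² and then steps ṡ = 2√(x*(s)) forward from s = 0.

import numpy as np

# Check sqrt(v(s)) = sqrt(R) - sqrt(rho)*(s_bar - s)/2 against rho*v = (v')**2,
# then simulate the optimal path with x*(s) = rho*v(s) and ds/dt = 2*sqrt(x).
rho, R_payoff, s_bar = 0.1, 10.0, 4.0            # assumed; note R_payoff > rho*s_bar**2/4
w  = lambda s: np.sqrt(R_payoff) - np.sqrt(rho) * (s_bar - s) / 2.0
v  = lambda s: w(s)**2
dv = lambda s: np.sqrt(rho) * w(s)

print("ODE residual rho*v - (v')**2 at s = 1:", rho * v(1.0) - dv(1.0)**2)

s, dt = 0.0, 0.001
for step in range(20_000):
    if s >= s_bar:
        print(f"target reached at t ~ {step * dt:.2f}")
        break
    x = rho * v(s)                               # optimal R&D intensity x*(s)
    if step % 1000 == 0:
        print(f"t = {step * dt:5.2f}   s = {s:6.3f}   x* = {x:6.3f}")
    s += 2.0 * np.sqrt(x) * dt                   # ds/dt = f(x) = 2*sqrt(x)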
5.2.1 Asset equations
Assume that you can be in one of two states, X and Y. In state X you receive flow
utility of w_X and with probability q_X dt an event occurs, resulting in instantaneous utility
of u_X and a transit to state Y; with probability 1 - q_X dt the event does not occur and
you simply remain in state X. Similarly for being in state Y. Then, with v(·) denoting
the value function,
v(X) ≈ w_X dt + (1 - ρ dt) [q_X dt (u_X + v(Y)) + (1 - q_X dt) v(X)]
     = w_X dt + (1 - ρ dt) [q_X (u_X + v(Y) - v(X)) dt + v(X)]
     ≈ w_X dt + q_X [u_X + v(Y) - v(X)] dt + v(X) - ρ v(X) dt .

So

ρ v(X) = w_X + q_X [u_X + v(Y) - v(X)]

and similarly

ρ v(Y) = w_Y + q_Y [u_Y + v(X) - v(Y)] .

These are known as asset equations, and are of the form

discount rate × asset value = dividend + expected capital gain,

and subtracting one from the other gives us a single equation determining v(X) - v(Y).
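Since the two asset equations are linear in v(X) and v(Y), they can also be solved directly as a 2 × 2 system. The sketch below does exactly that; the discount rate, flow utilities, event utilities and arrival rates are all assumed values for illustration.

import numpy as np

# Solve  rho*v(X) = w_X + q_X*[u_X + v(Y) - v(X)]
#        rho*v(Y) = w_Y + q_Y*[u_Y + v(X) - v(Y)]
# as a linear system in (v(X), v(Y)).
rho = 0.05
wX, wY = 1.0, 0.2          # flow utilities (assumed)
uX, uY = 3.0, 0.0          # instantaneous utilities when the event occurs (assumed)
qX, qY = 0.4, 0.6          # event arrival rates (assumed)

A = np.array([[rho + qX, -qX],
              [-qY, rho + qY]])
b = np.array([wX + qX * uX, wY + qY * uY])
vX, vY = np.linalg.solve(A, b)
print(f"v(X) = {vX:.3f}   v(Y) = {vY:.3f}   v(X) - v(Y) = {vX - vY:.3f}")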
Example: Diamond's model of search (Journal of Political Economy, 1982)
Any agent can be in one of two states: employed (E), in possession of a good to barter;
unemployed (U), searching for a good. In state E an agent meets another in state E
with probability b dt; they exchange, consume the good receiving utility y, and become
unemployed. In state U an agent finds a production opportunity with probability a dt;
this production opportunity has a cost c which is distributed according to some CDF
G, and she uses a cut-off rule: accept the opportunity if c ≤ c̄ and reject otherwise.
The agent's objective is to maximize discounted life-time utility ∫₀^∞ e^{-ρt} u(y_t) dt, where
u(y) = y.
If v(·) denotes the value function for this problem, show that

ρ v(E) = b [y + v(U) - v(E)]

and

ρ v(U) = a ∫₀^c̄ [-c + v(E) - v(U)] dG(c) .

Writing V(c̄) for v(E) - v(U) when using cut-off c̄, show that

V(c̄) = [ b y + a ∫₀^c̄ c dG(c) ] / [ ρ + b + a G(c̄) ]

and find the optimal cut-off c*.
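One way to see where the optimal cut-off comes from: an opportunity of cost c is worth accepting exactly when -c + v(E) - v(U) ≥ 0, so a natural candidate for c* is the fixed point c* = V(c*). The sketch below solves this fixed point numerically under the assumption that G is uniform on [0, 1]; the distribution and the parameter values are illustrative, not part of the exercise.

from scipy.optimize import brentq

# Candidate optimal cut-off: c* = V(c*), where (for G uniform on [0, 1])
#   V(c_bar) = [b*y + a*(c_bar**2)/2] / [rho + b + a*c_bar].
rho, a, b, y = 0.05, 1.0, 0.5, 1.0      # assumed parameter values

def V(c_bar):
    return (b * y + a * c_bar**2 / 2.0) / (rho + b + a * c_bar)

c_star = brentq(lambda c: c - V(c), 0.0, 1.0)
print("cut-off c* =", round(c_star, 4), "   V(c*) =", round(V(c_star), 4))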
The rest of the model:
The probability of two agents in state E meeting depends on the fraction of agents
employed, e, i.e. b = b(e), with b′ > 0. Therefore, from the above equation, we can find
c* as a function of e. Also, the fraction of agents employed evolves according to

ė = (1 - e) a G(c*) - e b(e)

and we can determine steady-state values of e as a function of c*. The two equations
connecting e and c* lead to equilibrium steady-states.
The coconut story . . .
5.2.2 Finite horizon
The general Bellman equation with a finite time horizon is derived as follows. Let v(s, T)
be the maximum discounted utility in (5) when the initial stock is s and the time horizon
is T. Then

v(s, T) = max_x { u(x, s) dt + e^{-ρ dt} v(s + ṡ dt, T - dt) }
        ≈ max_x { u(x, s) dt + (1 - ρ dt)(v(s, T) + v_s(s, T) ṡ dt - v_T(s, T) dt) }
        ≈ v(s, T) + max_x { u(x, s) + v_s(s, T) ṡ - v_T(s, T) - ρ v(s, T) } dt ,

where v_s and v_T denote the partial derivatives of v. Since (6) implies that ṡ = φ(x, s),
we obtain the Bellman equation:

ρ v(s, T) + v_T(s, T) = max_x { u(x, s) + φ(x, s) v_s(s, T) } .   (28)

This is a partial differential equation, involving the derivatives of v with respect to both
s and T, and is usually difficult to solve.
Exercise: Consider the continuous-time version of the cake-eating problem with a
finite horizon T. (Suppose the cake suddenly becomes inedible after time T.) In the case
where u(x) = 2√x, find by direct methods the value function v(s, T). Show that this
function satisfies the Bellman equation (28).
Exercise: In the cake-eating problem with a finite horizon T and no discounting (i.e.
ρ = 0), it is intuitive (when utility u(x) is concave) that the agent will consume the cake
steadily at the rate s/T for the whole time, which yields total utility v(s, T) = T u(s/T).
Verify that, for any concave utility function u(x), this value function does indeed satisfy
the Bellman equation (28).
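The last exercise can be spot-checked numerically for a particular concave utility before doing the general verification. The sketch below uses u(x) = √x (an assumed example) and finite differences for the partial derivatives, and checks that v_T(s, T) equals max_x {u(x) - x v_s(s, T)}, which is (28) with ρ = 0 and φ(x, s) = -x.

import numpy as np

# Check v(s, T) = T*u(s/T) against (28) with rho = 0 and phi(x, s) = -x,
# i.e. v_T(s, T) = max_x { u(x) - x*v_s(s, T) }, for u(x) = sqrt(x).
u = lambda x: np.sqrt(x)
v = lambda s, T: T * u(s / T)

s, T, h = 2.0, 5.0, 1e-6
v_s = (v(s + h, T) - v(s - h, T)) / (2 * h)    # numerical partial derivative in s
v_T = (v(s, T + h) - v(s, T - h)) / (2 * h)    # numerical partial derivative in T

xs = np.linspace(1e-6, s, 200_000)
rhs = (u(xs) - xs * v_s).max()
print("LHS  v_T                  =", v_T)
print("RHS  max_x {u(x) - x*v_s} =", rhs)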
6 Optimal Control and Dynamic Programming - the connections
In the optimal control approach, the first equation in (13) shows that x*(t) maximizes

H(x, s(t)) = u(x, s(t)) + λ(t) φ(x, s(t)) ,

whereas from (25) we see that in the dynamic programming approach x*(t) maximizes

u(x, s(t)) + v′(s(t)) φ(x, s(t)) .

These two approaches are consistent only if

λ(t) ≡ v′(s(t)) ,

so that the multiplier λ(t) represents the marginal benefit of having fractionally more of
the state variable s at time t. (This is a general property of Lagrange multipliers.)
The second equation in (13) can be rewritten as

ρ λ = (∂/∂s) u(x*, s) + λ (∂/∂s) φ(x*, s) + dλ/dt ,

and when we use the equivalence of λ with v′(s) this becomes

ρ v′(s) = (∂/∂s) u(x*, s) + v′(s) (∂/∂s) φ(x*, s) + (d/dt) v′(s)
        = (∂/∂s) u(x*, s) + v′(s) (∂/∂s) φ(x*, s) + v″(s) ṡ
        = (∂/∂s) u(x*, s) + v′(s) (∂/∂s) φ(x*, s) + v″(s) φ(x*, s)
        = (∂/∂s) u(x*, s) + (∂/∂s) [ v′(s) φ(x*, s) ] .
If we consider the differential equation that comes from the Bellman equation (25),
namely

ρ v(s) = u(x*, s) + v′(s) φ(x*, s) ,

and differentiate it with respect to s, using the envelope theorem, then we obtain exactly
the same equality.
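The equivalence λ(t) ≡ v′(s(t)) can be seen concretely in the logarithmic cake-eating example of Section 5.2. Along the optimal path s(t) = s₀ e^{-ρt}, the maximum principle (with H independent of s) gives λ(t) = λ(0) e^{ρt} with λ(0) = 1/x*(0) = 1/(ρ s₀), while the dynamic programming solution gives v′(s) = 1/(ρ s). The sketch below, with assumed values of ρ and s₀, prints both along the path.

import numpy as np

# Compare the optimal-control costate lambda(t) with v'(s(t)) from the DP solution
# of the logarithmic cake-eating problem.
rho, s0 = 0.05, 3.0
dv = lambda s: 1.0 / (rho * s)                  # v'(s) from dynamic programming

for t in [0.0, 5.0, 20.0]:
    s_t = s0 * np.exp(-rho * t)                 # optimal state path s(t) = s0*exp(-rho*t)
    lam_t = np.exp(rho * t) / (rho * s0)        # costate from the maximum principle
    print(f"t = {t:5.1f}   lambda(t) = {lam_t:9.4f}   v'(s(t)) = {dv(s_t):9.4f}")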