Lin 2015

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 60, NO.
6, JUNE 2015
1659
An Optimal Control Approach to the Multi-Agent Persistent

Monitoring Problem in Two-Dimensional Spaces
Xuchao Lin and Christos G. Cassandras
AbstractWe address the persistent monitoring problem in
two-dimensional mission spaces where the objective is to control
the trajectories of multiple cooperating agents to minimize an
uncertainty metric. In a one-dimensional mission space, we have
shown that the optimal solution is for each agent to move at
maximal speed and switch direction at specific points, possibly
waiting some time at each such point before switching. In a twodimensional mission space, such simple solutions can no longer
be derived. An alternative is to optimally assign each agent a
linear trajectory, motivated by the one-dimensional analysis. We
prove, however, that elliptical trajectories outperform linear ones.
With this motivation, we formulate a parametric optimization
problem in which we seek to determine such trajectories. We show
that the problem can be solved using Infinitesimal Perturbation
Analysis (IPA) to obtain performance gradients on line and obtain
a complete and scalable solution. Since the solutions obtained are
generally locally optimal, we incorporate a stochastic comparison
algorithm for deriving globally optimal elliptical trajectories. Numerical examples are included to illustrate the main result, allow
for uncertainties modeled as stochastic processes, and compare
our proposed scalable approach to trajectories obtained through
off-line computationally intensive solutions.
Index TermsHybrid systems, Infinitesimal Perturbation Analysis (IPA), multi-agent systems, optimal control.
I. I NTRODUCTION
Autonomous cooperating agents may be used to perform tasks
such as coverage control [1], [2], surveillance [3] and environmental sampling [4][6]. Persistent monitoring (also called persistent
surveillance or persistent search) arises in a large dynamically
changing environment which cannot be fully covered by a stationary
team of available agents. Thus, persistent monitoring differs from
traditional coverage tasks due to the perpetual need to cover a changing
environment, i.e., all areas of the mission space must be sensed
infinitely often. The main challenge in designing control strategies
in this case is in balancing the presence of agents in the changing
environment so that it is covered over time optimally (in some welldefined sense) while still satisfying sensing and motion constraints.
Manuscript received October 10, 2013; revised April 25, 2014, April 28,
2014, July 14, 2014, and August 13, 2014; accepted September 18, 2014. Date
of publication September 24, 2014; date of current version May 21, 2015.
The authors work is supported in part by NSF under Grant CNS-1239021,
by AFOSR under grant FA9550-12-1-0113, by ONR under grant N0001409-1-1051, and by ARO under Grant W911NF-11-1-0227. Recommended by
Associate Editor A. Nedich.
The authors are with the Division of Systems Engineering and Center for
Information and Systems Engineering, Boston University, Boston, MA 02215
USA (e-mail: mmxclin@bu.edu; cgc@bu.edu).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TAC.2014.2359712
Control and motion planning for agents performing persistent monitoring tasks have been studied in the literature, e.g., see [7][13]. In
[14], we addressed the persistent monitoring problem by proposing
an optimal control framework to drive multiple cooperating agents
so as to minimize a metric of uncertainty over the environment. This
metric is a function of both space and time such that uncertainty at a
point grows if it is not covered by any agent sensors. It was shown in
[14] that the optimal control problem can be reduced to a parametric
optimization problem. In particular, each agents optimal trajectory
is fully described by a set of switching points {1 , . . . , K } and
associated waiting times at these points, {w1 , . . . , wK }. This allows
us to make use of Infinitesimal Perturbation Analysis (IPA) [15] to
determine gradients of the objective function with respect to these
parameters and subsequently obtain optimal switching locations and
waiting times that fully characterize an optimal solution. It also allows
us to exploit robustness properties of IPA to readily extend this solution
approach to a stochastic uncertainty model.
In this paper, we address the same persistent monitoring problem
in a two-dimensional (2D) mission space. Using an analysis similar to
the one-dimensional (1D) case, we find that we can no longer identify
a parametric representation of optimal agent trajectories. Motivated
by the simple structure of the 1D problem, it has been suggested to
assign each agent a linear trajectory for which the explicit 1D solution
can be used. However, in a 2D space it is not obvious that a linear
trajectory is a desirable choice. Indeed, a key contribution of this paper
is to formally prove that an elliptical agent trajectory outperforms a
linear one in terms of the uncertainty metric we are using. Motivated
by this result, we formulate a 2D persistent monitoring problem as
one of determining optimal elliptical trajectories for a given number of
agents, noting that this includes the possibility that one or more agents
share the same trajectory. We show that this problem can be explicitly
solved using similar IPA techniques as in our 1D analysis. In particular,
we use IPA to determine on line the gradient of the objective function
with respect to the parameters that fully define each elliptical trajectory
(center, orientation and length of the minor and major axes). This
approach is scalable in the number of observed events, not states, of
the underlying hybrid system characterizing the persistent monitoring
process, so that it is suitable for on-line implementation. However,
the standard gradient-based optimization process we use is generally
limited to local, rather than global optimal solutions. Thus, we adopt a
stochastic comparison algorithm from the literature [16] to overcome
this problem.
Section II formulates the optimal control problem for 2D mission
spaces and Section III presents the solution approach. In Section IV
we establish our key result that elliptical agent trajectories outperform
linear ones in terms of minimizing an uncertainty metric per unit
area. In Section V we formulate and solve the problem of determining
optimal elliptical agent trajectories using an algorithm driven by gradients evaluated through IPA. In Section VI we incorporate a stochastic
comparison algorithm for obtaining globally optimal solutions and in
Section VII we provide numerical results to illustrate our approach
and compare it to computationally intensive solutions based on a
Two-Point-Boundary-Value-Problem (TPBVP) solver. Section VIII
concludes the paper.
0018-9286 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
1660
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 60, NO. 6, JUNE 2015
II. P ERSISTENT M ONITORING P ROBLEM F ORMULATION

We consider N mobile agents in a 2D rectangular mission space
[0, L1 ] [0, L2 ] R2 . Let the position of the agents at time
t be sn (t) = [sxn (t), syn (t)] with sxn (t) [0, L1 ] and syn (t) [0, L2 ],
n = 1, . . . , N , following the dynamics:
s xn (t) = un (t) cos n (t),
s yn (t) = un (t) sin n (t)
(1)
where 0 un (t) 1 is the normalized scalar speed of the nth agent

and n (t) is the angle relative to the positive direction that satisfies 0
n (t) < 2. Each agent is represented as a particle in the 2D space,
thus we ignore the case of two or more agents colliding with each other.
We associate with every point [x, y] a function pn (x, y, sn ) that
measures the probability that an event at location [x, y] is detected by
agent n. We also assume that pn (x, y, sn ) = 1 if [x, y] = sn , and that
pn (x, y, sn ) is monotonically nonincreasing in the Euclidean distance
D(x, y, sn ) [x, y] sn between [x, y] and sn , thus capturing the
reduced effectiveness of a sensor over its range which we consider
to be finite and denoted by rn . Therefore, we set pn (x, y, sn ) = 0
when D(x, y, sn ) > rn . Our analysis is not affected by the precise
sensing model pn (x, y, sn ) (in [14], for example, a linear decay model
was used). Next, consider a set of points {[i , i ], i = 1, . . . , M },
[i , i ] , and associate a time-varying measure of uncertainty with
each point [i , i ], which we denote by Ri (t). The set of points
{[1 , 1 ], . . . , [M , M ]} may be selected to contain specific points
of interest in the environment, or simply to sample points in the
mission space. Alternatively, we may consider a partition of into M
rectangles denoted by i whose center points are [i , i ]. Therefore,
the joint probability of detecting an event at location [i , i ] by all the
N agents (assuming detection independence) is
Pi (s(t)) = 1
N
[1 (t), . . . , N (t)] we aim to solve the following optimal control

problem P1:
P1 :
We first characterize the optimal control solution of problem P1.

We define the state vector x(t)= [sx1 (t), sy1 (t), . . . , sxN (t), syN (t),
R1 (t), . . . , RM (t)]T and the associated costate vector (t)=
[x1 (t), y1 (t), . . . , xN (t), yN (t), 1 (t), . . . , M (t)]T . In view of the
discontinuity in the dynamics of Ri (t) in (3), the optimal state trajectory may contain a boundary arc when Ri (t) = 0 for any i; otherwise,
the state evolves in an interior arc [17]. This follows from the fact,
proved in [14] and [18], that it is never optimal for agents to reach
the mission space boundary. We analyze the system operating in such
an interior arc and omit the state constraint sn (t) , n = 1, . . . , N ,
t [0, T ]. Using (1) and (3), the Hamiltonian is
H=

i
0
Ai BPi (s(t))
if Ri (t) = 0, Ai BPi (s(t))

otherwise
Ri (t) +
i R i (t)
xn (t)un (t) cos n (t) +
yn (t)un (t) sin n (t).
(5)
Combining the trigonometric function terms, after some algebra, we

obtain
Ri (t) +

i
i R i (t)
sgn(yn (t))un (t)
(xn (t))2+(yn (t))2 sin(n (t)+n (t))
(6)
where sgn() is the sign function and n (t) is defined so that
tan n (t) = xn (t)/yn (t) for yn (t) = 0 and n (t) = (/2)sgn(yn )
for yn (t) = 0. In what follows, we exclude the case where xn (t) = 0
and yn (t) = 0 at the same time for any given n over any finite
singular interval. Applying the Pontryagin minimum principle to (6)
with un (t), n (t), t [0, T ), denoting optimal controls, we have
H (x , , u , ) =
min
u[0,1]N ,[0,2]N
H(x, , u, )
and it is immediately obvious that it is necessary for an optimal control

to satisfy
un (t) = 1
(3)
and
where we assume that initial conditions Ri (0), i = 1, . . . , M , are

given and that B > Ai > 0 for all i = 1, . . . , M ; thus, the uncertainty
strictly decreases when there is perfect sensing Pi (s(t)) = 1.
The goal of the optimal persistent monitoring problem we consider is to control through un (t), n (t) in (1) the movement of
the N agents so that the cumulative uncertainty over all sensing
points {[1 , 1 ], . . . , [M , M ]} is minimized over a fixed time
horizon T . Thus, setting u(t) = [u1 (t), . . . , uN (t)] and (t) =
(4)
III. O PTIMAL C ONTROL S OLUTION
n=1
R i (t) =
Ri (t)dt
i=1
subject to the agent dynamics (1), uncertainty dynamics (3), control

constraint 0 un (t) 1, 0 n (t) 2, t [0, T ], and state constraints sn (t) for all t [0, T ], n = 1, . . . , N .
(2)
In order to avoid the uninteresting case where there is a large number

of agents who can adequately cover the mission space, we assume that
for any s(t) = [s1 (t), . . . , sN (t)]T , there exists some point [x, y]
(the discretized mission space) with P (x, y, s(t)) = 0. This means
that for any assignment of N agents at time t, there is always at least
one point in the mission space that cannot be sensed by any agent. Similar to the 1D analysis in [14], we define uncertainty functions Ri (t)
associated with the rectangles i , i = 1, . . . , M , so that they have the
following properties: (i) Ri (t) increases with a rate Ai if Pi (s(t)) = 0,
(ii) Ri (t) decreases with a fixed rate B Ai if Pi (s(t)) = 1 and
(iii) Ri (t) 0 for all t. It is then natural to model uncertainty so that
its decrease is proportional to the probability of detection. In particular,
we model the dynamics of Ri (t), i = 1, . . . , M , as follows:
J=
u(t),(t)
H=
[1 pn (i , i , sn (t))]
min
T
M
sin (n (t) + n (t)) = 1,

if yn (t) < 0
sin (n (t) + n (t)) = 1, if yn (t) > 0.
(7)
(8)
We are only left with the task of determining n (t), n = 1, . . . , N .

This can be accomplished by solving a standard TPBVP involving
forward and backward integrations of the state and costate equations to
evaluate H/n after each such iteration and using a gradient descent
approach until the objective function converges to a (local) minimum.
1661
Clearly, this is a computationally intensive process which scales poorly

with the number of agents and the size of the mission space. In
addition, it requires discretizing the mission time T and calculating
every control at each time step which adds to the computational
complexity.
where = d is the area of the effective coverage region. We

drop the subscript here to indicate we are now considering the single
agent case. Note that we view this normalized metric as a function of
b 0, so that when b = 0 we obtain the uncertainty corresponding to a
linear trajectory. For simplicity, the trajectory is selected so that [X, Y ]
coincides with the origin and = 0, the major axis a is assumed fixed.
Regarding the range of b, we will only be interested in values which
are limited to a neighborhood of zero that we will denote by B. Given
a, this set dictates the values that s(t) is allowed to take. Finally,
we make the following assumptions.
Assumption 1: p(, s) p(D(, s)) is a continuous function of
D(, s) s.
Assumption 2: Let , be symmetric points in with respect to
the center point of the ellipse. Then, A() = A( ).
The following result establishes the fact that an elliptical trajectory
with some b > 0 can achieve a lower cost than a linear trajectory (i.e.,
b = 0) in terms of a long-term average uncertainty per unit area. The
proof may be found in [19].
Proposition IV.1: Under Assumptions 12 and b B,
IV. L INEAR V ERSUS E LLIPTICAL AGENT T RAJECTORIES

Given the complexity of the TPBVP required to obtain an optimal
solution of problem P 1, we seek alternative approaches which may
be suboptimal but are tractable and scalable. The first such effort is
motivated by the results obtained in our 1D analysis, where we found
that on a mission space defined by a line segment [0, L] the optimal
trajectory for each agent is to move at full speed until it reaches some
switching point, dwell on the switching point for some time (possibly
zero), and then switch directions. Thus, each agents optimal trajectory
is fully described by a set of switching points {1 , . . . , K } and associated waiting times at these points, {w1 , . . . , wK }. The values of these
parameters can then be efficiently determined using a gradient-based
algorithm; in particular, we used Infinitesimal Perturbation Analysis
(IPA) to evaluate the objective function gradient as shown in [14].
Thus, a reasonable approach that has been suggested is to assign
each agent a linear trajectory. However, there is no reason to believe
that a linear trajectory is a good choice in a 2D setting. A broader
choice is provided by the set of elliptical trajectories which in fact
encompass linear ones when the minor axis of the ellipse becomes
zero. The main result of this section is to formally show that an elliptical trajectory outperforms a linear one using the average uncertainty
metric in (4) as the basis for such comparison.
To simplify notation, let = [x, y] R2 and, for a single agent
s(t), define
= R2 |t [0, T ] such that Bp(, s(t)) > A()
sxn (t) = Xn + an cos n (t) cos n bn sin n (t) sin n

syn (t) = Yn + an cos n (t) sin n + bn sin n (t) cos n
(10)
where [Xn , Yn ] is the center of the ellipse, an , bn are its major and
minor axes respectively, n [0, ) is the ellipse orientation (the
angle between the x axis and the major ellipse axis) and n (t)
[0, 2) is the eccentric anomaly of the ellipse. Assuming the agent
moves with constant maximal speed 1 on this trajectory (based on (7)),
2
2
we have (sxn ) + (syn ) = 1, which gives
n (t) = (an sin n (t) cos n + bn cos n (t) sin n )
+ (an sin n (t) sin n bn cos n (t) cos n )2
1/2
. (11)
T
R(, t)ddt
0
J(b)
<0
b
i.e., switching from a linear to an elliptical trajectory reduces the cost

in (12).
In other words, Prop. IV.1 shows that elliptical trajectories are more
suitable for a 2D mission space in terms of achieving near-optimal results in solving problem P1. We should point out that if Assumption 2
is violated, i.e., a mission space is inhomogeneous in terms of how
uncertainties arise, then this result does not hold in general (there are
counterexamples).
V. O PTIMAL E LLIPTICAL T RAJECTORIES

Based on our analysis thus far, the approach is to associate with
each agent an elliptical trajectory, parameterize each such trajectory
by its center, orientation and major and minor axes, and then solve
P1 as a parametric optimization problem. Note that this includes the
possibility that two agents share the same trajectory if the solution to
this problem results in identical parameters for the associated ellipses.
Choosing elliptical trajectories, offers several practical advantages in
addition to reduced computational complexity. Elliptical trajectories
induce a periodic structure to the agent movements which provides
predictability. As a result, it is also easier to handle issues related to
collision avoidance. Therefore, we replace problem P1 by the determination of optimal parameter vectors n [Xn , Yn , an , bn , n ]T ,
n = 1, . . . , N , and formulate the following problem P2:
P2 :
In order to make a fair comparison between a linear and an elliptical

trajectory, we normalize the objective function in (4) with respect
to the coverage area in (9) and consider all points in (rather than
discretizing it or limiting ourselves to a finite set of sampling points).
Thus, we define
1
J(b) =
lim
T ,b0
(9)
In view of the uncertainty dynamics in (3), it should be clear that

defines the effective coverage region for the agent, i.e., the region
where R(, t) is strictly reduced given B and a specific sensing model
p(, s). Clearly, depends on the values of s(t) which are dependent
on the single agent trajectory. Let us define an elliptical trajectory
so that the agent position sn (t) = [sxn (t), syn (t)] follows the general
parametric form of an ellipse:
(12)
min
n ,n=1,...,N
J=
T
M
0
Ri (1 , . . . , N , t)dt.
(13)
i=1
Observe that the behavior of each agent under the optimal ellipse control policy is that of a hybrid system whose dynamics undergo switches
when Ri (t) reaches or leaves the boundary value Ri = 0 (the events
causing the switches). As a result, we are faced with a parametric
optimization problem for a system with hybrid dynamics. We solve
this hybrid system problem using a gradient-based approach in which
we apply IPA to determine the gradients Ri (1 , . . . , N , t) on line
(hence, J), i.e., directly using information from the agent trajectories
and iterate upon them.
1662
A. Infinitesimal Perturbation Analysis (IPA)
When Ri (t) > 0, we have
We begin with a brief review of the IPA framework for general

stochastic hybrid systems as presented in [15]. The purpose of IPA is to
study the behavior of a hybrid system state as a function of a parameter
vector for a given compact, convex set Rl . Let {k ()},
k = 1, . . . , K, denote the occurrence times of all events in the state
trajectory. For convenience, we set 0 = 0 and K+1 = T . Over an
interval [k (), k+1 ()), the system is at some mode during which the
time-driven state satisfies x = fk (x, , t). An event at k is classified
as (i) Exogenous if it causes a discrete state transition independent
of and satisfies dk /d = 0; (ii) Endogenous, if there exists a
continuously differentiable function gk : Rn R such that k =
min{t > k1 : gk (x(, t), ) = 0}; and (iii) Induced if it is triggered
by the occurrence of another event at time m k . IPA specifies how
changes in influence the state x(, t) and the event times k () and,
ultimately, how they influence interesting performance metrics which
are generally expressed in terms of these variables.
We define
x(, t)
k ()
, k
,
x (t)
k = 1, . . . , K
for all state and event time derivatives. It is shown in [15] that x (t)
satisfies
d
fk (t)
fk (t)
x (t) =
x (t) +
dt
x
for t [k , k+1 ) with boundary condition
x k+ = x k + fk1 k fk k+
(14)
k
(15)
for k = 0, . . . , K, where k is the left limit of k . In addition, in (15),

the gradient vector for each k is k = 0 if the event at k is exogenous
and
k =
1
gk
fk k
x
gk
gk
+
x k
(16)
(17)
(18)
d=n
Noting that pn (i , sn (t)) pn (D(i , sn (t))), we have

pn (i , sn (t))
pn (D (i , sn (t))) D (i , sn (t))
=
n
D (i , sn (t)))
n
where D(i , sn (t)) = [(sxn (t) i )2 + (syn (t) i )2 ]
plicity, we write D = D(i , sn (t)) and we get
1
D
=
n
2D
D sxn
D syn
+ y
x
sn n
sn n
1/2
(19)
. For sim-

(20)
where D/sxn = 2(sxn i ) and D/syn = 2(syn i ). Note

that sxn /n = [sxn /Xn , sxn /Yn , sxn /an , sxn /bn , sxn /
n ]T and syn /n = [syn /Xn , syn /Yn , syn /an , syn /bn ,
syn /n ]T . From (10), for sxn /n , we obtain sxn /Xn = 1,
sxn /Yn = 0, sxn /an = cos n (t) cos n , sxn /bn = sin n
(t) sin n , sxn /n = an cos n (t) sin n b sin n (t) cos n .
Similarly, for syn /n , we get syn /Xn = 0, syn /Yn = 1,
syn /an = cos n (t) sin n , syn /bn = sin n (t) cos n and syn /
n = an cos n (t) cos n b sin n (t) sin n . Using sxn /n
and syn /n in (20) and then (19) and back into (18), we can finally
obtain Ri (t)/n for t [k , k+1 ) as
Ri k+
Ri (t)
=
n
n
if the event at k is endogenous (i.e., gk (x(, k ), ) = 0) and defined

as long as (gk /x)fk (k ) = 0.
In our case, the parameter vectors are n [Xn , Yn , an , bn , n ]T
as defined earlier, and we seek to determine optimal vectors n ,
n = 1, . . . , N . We will use IPA to evaluate J(1 , . . . , N ) =
[J/1 , . . . , J/N ]T . From (13), this gradient clearly depends
on Ri (t) = [Ri (t)/1 , . . . , Ri (t)/N ]T . In turn, this gradient
depends on whether the dynamics of Ri (t) in (3) are given by
R i (t) = 0 or R i (t) = Ai BPi (s(t)). The dynamics switch at event
times k , k = 1, . . . , K, when Ri (t) reaches or escapes from 0 which
are observed on a trajectory over [0, T ] based on a given n , n =
1, . . . , N .
IPA Equations: We begin by recalling the dynamics of Ri (t) in (3)
which depend on the relative positions of all agents with respect to
[i , i ] and change at time instants k such that either Ri (k ) = 0 with
Ri (k ) > 0 or Ai > BPi (s(k )) with Ri (k ) = 0. Moreover, the
agent positions sn (t) = [sxn (t), syn (t)], n = 1, . . . , N , on an elliptical
trajectory are expressed using (10). Viewed as a hybrid system, we can
now concentrate on all events causing transitions in the dynamics of
Ri (t), i = 1, . . . , M , since any other event has no effect on the values
of Ri (1 , . . . , N , t) at t = k .
For notational simplicity, we define i = [i , i ] . First, if
Ri (t) = 0 and A(i ) BP (i , s(t)) 0, applying (14) to Ri (t)
and using (3) gives
d Ri (t)
= 0.
dt n
N
pn (i , sn (t))
d Ri (t)
= B
[1 pd (i , sd (t))] .
dt n
n
d Ri (t)
dt
k dt n
if Ri (t) = 0, Ai BPi (s(t))

otherwise
(21)
where the integral above is obtained from (17)(19). Thus, it remains to determine the components Ri (k+ ) in (21) using (15).
This involves the event time gradient vectors k = k /n for
k = 1, . . . , K, which will be determined through (16). There are two
possible cases regarding the events that cause switches in the dynamics
of Ri (t).
Case 1) At k , R i (t) switches from R i (t) = 0 to R i (t) = Ai
BPi (s(t)). In this case, it is easy to see that the dynamics
of Ri (t) are continuous, so that fk1 (k ) = fk (k+ ) in
(15) applied to Ri (t) and we get
Ri k+ = Ri k , i = 1, . . . , M
(22)
Case 2) At k , R i (t) switches from R i (t) = Ai BPi (s(t))

to R i (t) = 0, i.e., Ri (k ) becomes zero. In this case,
we need to first evaluate k from (16) in order to
determine Ri (k+ ) through (15). Observing that this
event is endogenous, (16) applies with gk = Ri = 0 and
we get k = Ri (k )/(A(i ) BP (i , s(k ))). It
follows from (15) that
Ri k+ = Ri k
A(i ) BP i , s k

Ri
k = 0. (23)
A(i ) BP i , s k
Thus, Ri (k+ ) is always reset to 0 regardless of

Ri (k ).
1663
Objective Function Gradient Evaluation: Based on our analysis,

we first rewrite J in (13) as
J(1 , . . . , N ) =
K
M

i=1 k=0
k+1 (1 ,...,N )
Ri (1 , . . . , N , t)dt
k (1 ,...,N )
and (omitting some function arguments), we get
k+1

M
K

Ri (t)dt+Ri (k+1 )k+1 Ri (k)k .

J =
i=1 k=0
Fig. 1. Optimal trajectories using TPBVP solver for two agents. (a) Red and
green trajectories obtained from TPBVP solution. (b) Cost as a function of
algorithm iterations. JTPBVP = 7.15 104 .
Observing the cancellation of all terms of the form Ri (k )k for all

k (with 0 = 0, K+1 = T fixed), we finally get
J(1 , . . . , N ) =
VI. S TOCHASTIC C OMPARISON A LGORITHM
k+1

M
K

FOR GLOBAL OPTIMALITY
Ri (t)dt.
i=1 k=0
(24)
This depends entirely on Ri (t), which is obtained from (21) and

the event times k , k = 1, . . . , K, given initial conditions sn (0) for
n = 1, . . . , N , and Ri (0) for i = 1, . . . , M . In (21), Ri (k+ )/n
is obtained through (22), (23), whereas (d/dt)(Ri (t)/n ) is obtained through (17)(20).
Remark 1: Observe that the evaluation of Ri (t), hence J, is
independent of Ai , i = 1, . . . , M , i.e., the values in our uncertainty
model. Thus, the IPA approach possesses an inherent robustness
property: there is no need to explicitly model how uncertainty affects
Ri (t) in (3). Consequently, we may treat Ai as unknown without
affecting the solution approach (the values of Ri (t) are obviously
affected). We may also allow this uncertainty to be modeled through
random processes {Ai (t)}, i = 1, . . . , M ; in this case, however, the
result of Proposition IV.1 no longer applies without some conditions
on the statistical characteristics of {Ai (t)} and the resulting J is an
estimate of a stochastic gradient.
Remark 2: Note that the complexity of J(1 , . . . , N ) in (24)
grows linearly in the number of agents N as well as in T . In other
words, solving the problem using IPA is scalable with respect to the
number of agents and the operation time.
Gradient-based optimization algorithms are generally efficient and

effective in finding the global optimum when one is uniquely specified
by the point where the gradient is zero. When this is not the case, to
seek a global optimum one must resort to several alternatives which
include a variety of random search algorithms. In this section, we
use the Stochastic Comparison algorithm in [16] to find the global
optimum. As shown in [16], for a stochastic system, if (i), the cost
)
function J( ) is continuous in and (ii), for each estimate J(
) J( ) has a symmetric pdf, then
of J( ) the error W ( ) = J(
the Markov process {k } generated by the Stochastic Comparison
algorithm will converge to an -optimal interval of the global optimum
for arbitrarily small > 0. In short, limk P [ k ] = 1, for any
> 0, where is defined as = { |J( ) J( ) + }. Using
the Continuous Stochastic Comparison (CSC) Algorithm developed in
[16] for a general continuous optimization problem, consider
to be a controllable vector, where is the bounded feasible controllable parameter space. The detailed CSC algorithm is included in
[19]. Note that in the deterministic case, the CSC algorithm reduces
to a comparison algorithm with multi-starts over the 6-dimensional
controllable vector n [Xn , Yn , an , bn , n , n ]T , for each ellipse
associated with agent n = 1, . . . , N .
VII. N UMERICAL R ESULTS
B. Objective Function Optimization
]
[1 , . . . , N
minimizing J(1 , . . . , N )
We now seek to obtain
through a standard gradient-based optimization algorithm of the form
l+1
l
1l+1 , . . . , N
= 1l , . . . , N
l
l
1l , . . . , N
J
1l , . . . , N
(25)
where {nl }, l = 1, 2, . . . are appropriate step size sequences and

l
l
J(
1 , . . . , N ) is the projection of the gradient J(1 , . . . , N )
onto the feasible set, i.e., sn (t) for all t [0, T ], n = 1, . . . , N .
l
l
The optimization algorithm terminates when |J(

1 , . . . , N )| <
(for a fixed threshold ) for some [1 , . . . , N ]. When > 0 is

l
] is believed to be in the neighborhood of the local
small, [1l , . . . , N
l
] = [1l , . . . , N
]. However, in our
optimum, then we set [1 , . . . , N
problem the function J(1 , . . . , N ) is non-convex and there are actually many local optima depending on the initial controllable parameter
0
]. In the next section, we propose a stochastic
vector [10 , . . . , N
comparison algorithm which addresses this issue by randomizing over
0
]. This algorithm defines a process which
the initial points [10 , . . . , N
converges to a global optimum under certain well-defined conditions.
We begin with a two-agent example in which we solve P2 by

assigning elliptical trajectories using the gradient-based approach. The
environment setting parameters used are: r = 4 for the sensing range
of agents; L1 = 20, L2 = 10, for the mission space dimensions; and
T = 200. All sampling points [i , i ] are uniformly spaced within
L1 L2 , i = 1, . . . , M where M = (L1 + 1)(L2 + 1) = 231. Initial values for the uncertainty functions are Ri (0) = 2 and B = 6,
Ai = 0.2 for all i = 1, . . . , M in (3). We applied the TPBVP algorithm for P1. The results are shown in Fig. 1. It is computationally
expensive and time consuming (about 800,000 steps to converge).
Interestingly, the solution corresponds to a cost JTPBVP = 7.15
104 , which is higher than that of Fig. 2, which is an elliptical trajectory
case discussed next. This is an indication of the presence of locally
optimal trajectories.
We then solve the same two-agent example with the same environment setting using the CSC algorithm. For simplicity, we select the
ellipse center location [Xn , Yn ] as the only two (out of six) multi-start
components: for a given number of comparisons Q, we sample the
ellipse center [Xn , Yn ] L1 L2 , n = 1, . . . , N, using a uniform
distribution while an = 5, bn = 2, n = /4, n = 0, for n = 1, 2
are randomly assigned but initially fixed parameters during the number
1664
R EFERENCES
Fig. 2. Two agent example for the deterministic environment setting using
the CSC algorithm. (a) Red ellipses: initial trajectories. Blue ellipses: optimal
Det =
elliptical trajectories. (b) Cost as a function of algorithm iterations. JCSC
4
6.57 10 .
of comparisons Q (thus, it is still possible that there are local minima

with respect to the remaining four components [an , bn , n , n ], but,
clearly, all six components in n can be used at the expense of some
additional computational cost.) In Fig. 2, the red elliptical trajectories
on the left show the initial ellipses and the blue trajectories represent
the corresponding resulting ellipses the CSC algorithm converges to.
Fig. 2(b) shows the cost versus number of iterations of the CSC
Det
= 6.57 104 ,
algorithm. The resulting cost for Q = 300 is JCSC
where Det stands for a deterministic environment. It is clear from
Fig. 2(b) that the cost of the worst local minimum is much higher
than that of the best local minimum. Note also that the CSC algorithm does improve the original pure gradient-based algorithm performance Je = 6.93 104 in [19]. Additional numerical results can be
found in [19], including cases where the uncertainty term Ai in (3)
is stochastic.
VIII. C ONCLUSION
We have shown that an optimal control solution to the 1D persistent
monitoring problem does not easily extend to the 2D case. In particular, we have proved that elliptical trajectories outperform linear ones
in a 2D mission space. Therefore, we have sought to solve a parametric optimization problem to determine optimal elliptical trajectories.
Numerical examples indicate that this scalable approach (which can
be used on line) provides solutions that approximate those obtained
through a computationally intensive TPBVP solver. Moreover, since
the solutions obtained are generally locally optimal, we have incorporated a stochastic comparison algorithm for deriving globally optimal
elliptical trajectories. Ongoing work aims at alternative approaches for
near-optimal solutions and at distributed implementations.
[1] M. Zhong and C. G. Cassandras, Distributed coverage control and data

collection with mobile sensor networks, IEEE Trans. Autom. Control,
vol. 56, no. 10, pp. 24452455, Oct. 2011.
[2] J. Cortes, S. Martinez, T. Karatas, and F. Bullo, Coverage control for
mobile sensing networks, IEEE Trans. Robot. Autom., vol. 20, no. 2,
pp. 243255, 2004.
[3] B. Grocholsky, J. Keller, V. Kumar, and G. Pappas, Cooperative air and
ground surveillance, IEEE Robot. Autom. Mag., vol. 13, no. 3, pp. 1625,
2006.
[4] R. Smith, S. Mac Schwager, D. Rus, and G. Sukhatme, Persistent ocean
monitoring with underwater gliders: Towards accurate reconstruction of
dynamic ocean processes, in Proc. IEEE Conf. Robot. Autom., 2011,
pp. 15171524.
[5] D. Paley, F. Zhang, and N. Leonard, Cooperative control for ocean
sampling: The glider coordinated control system, IEEE Trans. Control
Syst. Technol., vol. 16, no. 4, pp. 735744, 2008.
[6] P. Dames, M. Schwager, V. Kumar, and D. Rus, A decentralized control
policy for adaptive information gathering in hazardous environments, in
Proc. IEEE Conf. Decision Control, 2012, pp. 28072813.
[7] S. L. Smith, M. Schwager, and D. Rus, Persistent monitoring of changing environments using robots with limited range sensing, IEEE Trans.
Robotics, 2011.
[8] D. E. Soltero, S. Smith, and D. Rus, Collision avoidance for persistent monitoring in multi-robot systems with intersecting trajectories, in
Proc. IEEE/RSJ Int. Conf. Intelligent Robots Systems (IROS), 2011,
pp. 36453652.
[9] N. Nigam and I. Kroo, Persistent surveillance using multiple unmanned
air vehicles, in Proc. IEEE Aerospace Conf., 2008, pp. 114.
[10] P. Hokayem, D. Stipanovic, and M. Spong, On persistent coverage control, in Proc. 46th IEEE Conf. Decision Control, 2008, pp. 61306135.
[11] B. Julian, M. Angermann, and D. Rus, Non-parametric inference and
coordination for distributed robotics, in Proc. 51st IEEE Conf. Decision
Control, 2012, pp. 27872794.
[12] Y. Chen, K. Deng, and C. Belta, Multi-agent persistent monitoring in
stochastic environments with temporal logic constraints, in Proc. 51st
IEEE Conf. Decision Control, 2012, pp. 28012806.
[13] X. Lan and M. Schwager, Planning periodic persistent monitoring trajectories for sensing robots in gaussian random fields, in Proc. 2013 IEEE
Int. Conf. Robot. Autom., 2013, pp. 24152420.
[14] C. G. Cassandras, X. Lin, and X. C. Ding, An optimal control approach
to the multi-agent persistent monitoring problem, IEEE Trans. Autom.
Control, vol. 58, no. 4, pp. 947961, 2013.
[15] C. G. Cassandras, Y. Wardi, C. G. Panayiotou, and C. Yao, Perturbation
analysis and optimization of stochastic hybrid systems, Eur. J. Control,
vol. 16, no. 6, pp. 642664, 2010.
[16] G. Bao and C. G. Cassandras, Stochastic comparison algorithm for continuous optimization with estimation, J. Optimiz. Theory Applic., vol. 91,
no. 3, pp. 585615, Dec. 1996.
[17] A. Bryson and Y. Ho, Applied Optimal Control. New York: Wiley, 1975.
[18] X. Lin and C. G. Cassandras, An optimal control approach to the multiagent persistent monitoring problem in two-dimensional spaces, in Proc.
52nd IEEE Conf. Decision Control, 2013, pp. 68866891.
[19] X. Lin and C. G. Cassandras, An Optimal Control Approach to the MultiAgent Persistent Monitoring Problem in Two-Dimensional Spaces, Tech.
Rep. [Online]. Available: http://arxiv.org/abs/1308.0345
Copyright of IEEE Transactions on Automatic Control is the property of IEEE and its content
may not be copied or emailed to multiple sites or posted to a listserv without the copyright
holder's express written permission. However, users may print, download, or email articles for
individual use.

Lin 2015

Cargado por

Información del documento

Derechos de autor

Formatos disponibles

Compartir este documento

Compartir o incrustar documentos

Opciones para compartir

¿Le pareció útil este documento?

¿Este contenido es inapropiado?

Copyright:

Formatos disponibles

Lin 2015

Cargado por

Copyright:

Formatos disponibles

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 60, NO.

An Optimal Control Approach to the Multi-Agent Persistent

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 60, NO. 6, JUNE 2015

II. P ERSISTENT M ONITORING P ROBLEM F ORMULATION

s yn (t) = un (t) sin n (t)

where 0 un (t) 1 is the normalized scalar speed of the nth agent

[1 (t), . . . , N (t)] we aim to solve the following optimal control

We first characterize the optimal control solution of problem P1.

if Ri (t) = 0, Ai BPi (s(t))

xn (t)un (t) cos n (t) +

yn (t)un (t) sin n (t).

Combining the trigonometric function terms, after some algebra, we

sgn(yn (t))un (t)

(xn (t))2+(yn (t))2 sin(n (t)+n (t))

and it is immediately obvious that it is necessary for an optimal control

where we assume that initial conditions Ri (0), i = 1, . . . , M , are

III. O PTIMAL C ONTROL S OLUTION

subject to the agent dynamics (1), uncertainty dynamics (3), control

In order to avoid the uninteresting case where there is a large number

sin (n (t) + n (t)) = 1,

sin (n (t) + n (t)) = 1, if yn (t) > 0.

We are only left with the task of determining n (t), n = 1, . . . , N .

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 60, NO. 6, JUNE 2015

Clearly, this is a computationally intensive process which scales poorly

where = d is the area of the effective coverage region. We

IV. L INEAR V ERSUS E LLIPTICAL AGENT T RAJECTORIES

= R2 |t [0, T ] such that Bp(, s(t)) > A()

sxn (t) = Xn + an cos n (t) cos n bn sin n (t) sin n

n (t) = (an sin n (t) cos n + bn cos n (t) sin n )

+ (an sin n (t) sin n bn cos n (t) cos n )2

i.e., switching from a linear to an elliptical trajectory reduces the cost

V. O PTIMAL E LLIPTICAL T RAJECTORIES

In order to make a fair comparison between a linear and an elliptical

In view of the uncertainty dynamics in (3), it should be clear that

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 60, NO. 6, JUNE 2015

A. Infinitesimal Perturbation Analysis (IPA)

When Ri (t) > 0, we have

We begin with a brief review of the IPA framework for general

for t [k , k+1 ) with boundary condition

for k = 0, . . . , K, where k is the left limit of k . In addition, in (15),

Noting that pn (i , sn (t)) pn (D(i , sn (t))), we have

where D/sxn = 2(sxn i ) and D/syn = 2(syn i ). Note

if the event at k is endogenous (i.e., gk (x(, k ), ) = 0) and defined

if Ri (t) = 0, Ai BPi (s(t))

Case 2) At k , R i (t) switches from R i (t) = Ai BPi (s(t))

Thus, Ri (k+ ) is always reset to 0 regardless of

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 60, NO. 6, JUNE 2015

Objective Function Gradient Evaluation: Based on our analysis,

and (omitting some function arguments), we get

Ri (t)dt+Ri (k+1 )k+1 Ri (k)k .

Observing the cancellation of all terms of the form Ri (k )k for all

VI. S TOCHASTIC C OMPARISON A LGORITHM

FOR GLOBAL OPTIMALITY

This depends entirely on Ri (t), which is obtained from (21) and

Gradient-based optimization algorithms are generally efficient and

B. Objective Function Optimization

where {nl }, l = 1, 2, . . . are appropriate step size sequences and

The optimization algorithm terminates when |J(

(for a fixed threshold ) for some [1 , . . . , N ]. When > 0 is

We begin with a two-agent example in which we solve P2 by

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 60, NO. 6, JUNE 2015

of comparisons Q (thus, it is still possible that there are local minima

[1] M. Zhong and C. G. Cassandras, Distributed coverage control and data

También podría gustarte