Documentos de Académico
Documentos de Profesional
Documentos de Cultura
Daniel Adkins
Shelley Zhuang
Sonesh Surana
General Terms
Design
Keywords
Indirection, Abstraction, Scalable, Internet, Architecture
INTRODUCTION
The original Internet architecture was designed to provide unicast point-to-point communication between xed locations. In this
basic service, the sending host knows the IP address of the receiver
and the job of IP routing and forwarding is simply to deliver packets
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for prot or commercial advantage and that copies
bear this notice and the full citation on the rst page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specic
permission and/or a fee.
SIGCOMM 02 Pittsburgh, Pennsylvania USA
Copyright 2002 ACM X-XXXXX-XX-X/XX/XX ...$5.00.
ABSTRACT
1.
Scott Shenker
Ion Stoica
CDAAACD
CD CD
CD CD CD
AAACD
E E E
AAAE
@B @B @B
AAA@
receiver (R)
F F F
AAAF
sender (S)
(id, R)
(R, data)
(id, data)
receiver (R)
(id, R)
(b)
(c)
inserts trigger
I
Figure 1: (a) s API. Example illustrating communication between two nodes. (b) The receiver
.
sender sends packet
T I R Q
US3 P
#321 ( )974%
% 0 0 % ' 8 6 5
#321 ( )(&
% 0 0 % ' %
#
$" !
sender (S)
GH GH GH
AAAG
. (c) The
T V X V Q R Q
a`YW!3 P
Q
d
and
Q
r 3
Q
3
(a)
bits, and
h
r Q
Yd
Q
d
Q
d
T g Q Q V R Q
WWYSd P
Q
d
Q
d
Q
d
Q
3
Q
3
y x v
1wuf
t
and
v
h
t
The service model is instantiated as a rendezvous-based communication abstraction. In their simplest form, packets are pairs
where
is an -bit identier and
consists of
a payload (typically a normal IP packet payload). Receivers use
triggers to indicate their interest in packets. In the simplest form,
triggers are pairs
, where represents the trigger identier, and
represents a nodes address which consists of an
IP address and a port number. A trigger
indicates that
all packets with an identier
should be forwarded (at the IP
level) by the infrastructure to the node identied by
. More
specically, the rendezvous-based communication abstraction exports three basic primitives as shown in Figure 1(a).
Figure 1(b) illustrates the communication between two nodes,
where receiver wants to receive packets sent to . The receiver
f
qph
i
c
&b
r Q
ad
Q
3
Q
3
OVERVIEW
T I R Q
Ud P
2.
ditional aspects of the design such as scalability and efcient routing. Section 5 describes some simulation results on performance
along with a discussion on an initial implementation. Related work
is discussed in Section 6, followed by a discussion on future work
Section 7. We conclude with a summary in Section 8.
V X V
!WWQ
Q
3
Q
3
T V X V Q R Q
e`WWSd P
T g Q Q V R Q
WWYSd P
g Q Q
WWYV
T g Q Q V R Q
9WWYSd P
Q
d
g Q Q
WWYV
Q
d
$ $ $
###
#$
(id, R)
(id, data)
sender (S)
(R, data)
(id, R)
(R, data)
receiver (R)
(id, data)
sender (S)
receiver (R)
(a) Mobility
receiver (R1)
""!
"!
! !
sender (S)
receiver (R2)
receiver (R3)
'
(
I
0
6
Q
d
2.4.3 Anycast
Anycast ensures that a packet is delivered to exactly one receiver
in a group, if any. Anycast enables server selection, a basic building
block for many of todays applications. To achieve this with , all
hosts in an anycast group maintain triggers which are identical in
the most signicant bits. These bits play the role of the anycast
group identier. To send a packet to an anycast group, a sender uses
an identier whose -bit prex matches the anycast group identier. The packet is then delivered to the member of the group whose
trigger identier best matches the packet identier according to the
longest prex matching rule (see Figure 2(c)). Section 3.3 gives
two examples of how end-hosts can use the last
bits of the
identier to encode their preferences.
7
@
f
H
G DDD C A
B EFE 5 B
'
5
Q
d
R 4
G EPQ B
R
G SQP 4
T V X V Q R
a!WWS
T
r Q
! d P
r Q R Q
` d d P
t
t
)X
Trigger
R
9 d d R s 3 P
Q R
Q
Q
Packet
T
1 d R
Q
Q
d
Q
d
v
UI
Q
3
c
&b
%
9
I
%
)
I R Q
d P
T I R Q
Ud P
2.4.2 Multicast
T V X V Q R Q
a`YW!3 P
7
8
4 20
5 31
The form of mobility addressed here is when a host (e.g., a laptop) is assigned a new address when it moves from one location to
another. A mobile host that changes its address from to
as a
result of moving from one subnet to another can preserve the endto-end connectivity by simply updating each of its existing triggers
from
to
, as shown in Figure 2(a). The sending host
needs not be aware of the mobile hosts current location or address.
Furthermore, since each packet is routed based on its identier to
the server that stores its trigger, no additional operation needs to be
invoked when the sender moves. Thus, can maintain end-to-end
connectivity even when both end-points move simultaneously.
With any scheme that supports mobility, efciency is a major
concern [25]. With , applications can use two techniques to
achieve efciency. First, the address of the server storing the trigger
is cached at the sender, and thus subsequent packets are forwarded
directly to that server via IP. This way, most packets are forwarded
through only one server in the overlay network. Second, to alleviate the triangle routing problem due to the trigger being stored
at a server far away, end-hosts can use off-line heuristics to choose
triggers that are stored at servers close to themselves (see Section 4.5 for details).
2.4.1 Mobility
I R Q
d P
'
)
%
&
receiver (R3)
(c) Anycast
(b) Multicast
receiver (R2)
(id, R3)
(R3, data)
sender (S)
receiver (R1)
(id, data)
Q
3
D D
C7C7CD
777C
sender (S)
(R, data)
(id, R)
(id HTMLWML , T)
receiver (R)
Q
3
V
!X
t T Q P
)$d ` V
P Y4
Q V
$
DH
$
#
"!
D H H H
Q
d
T h
a& !X
V
X
YV
T
t
h
B B
A7A7AB
777A
(R1, data)
receiver (R1)
(id, data)
sender (S)
(R2, data)
receiver (R2)
(id, R2)
X
&
T
X
X
4
Q
d
c b
T Q
$d R
Q P TSQ PHG
WVUR2I
Q
d P
Q
d
T V X V Q R Q
a!WWSd P
T
eh
3. USING
Q
d
R & !X
h
V
X
&
!X &
g
X
P g
$4
X
(P Q
g
e
Q
3
V
!X
2H 0
D H
X
(P
T Q P
$d W
Q
d
P Y4
Q V
R Q
d
X
&
X
4
Q
d
V
!X
t
)X
t
)X
D H S 2H 1 H
D 3
H H ' H
0
# )( '
$ %
&
T h
a& !X
V
T
8 8
676768
7776
Q
d
X
WV
T Q P
$d
@ @
97979@
7779
D H
T
eh
((T,id), data)
((id HTMLWML , id), data)
Q
d
Q
s d
Q
3
Q
d
H
C A
Q
d R
Q
s d
Q
d R
Q
s d
Q
d P
C
5 A 54
0
T
4 d 9 d R
Q R
Q
C
C
5 A
T
4 d d P
Q R
Q
Q
3
0 54
T
a R
Q
d
C
5 A
T
d d P
Q R
Q
C
5 A
T
d 9 3 P
Q R
Q
!!
!
!
(idg, data)
T T
S4
C A 2S P
G
I R
1
Q P R Q
d 4d P
Q
3
id2
R5
R6
#
""#
""
"# "#
R3
R4
R5
R1
'& &
'&
F C A 2S
G
%
$$%
$
T
4
T V X V Q R Q
a!WWSd P
R
I R Q
d P
P
Q
d P
v
UI
R6
The performance of depends greatly on the nature of the underlying overlay network. In particular, we need an overlay network that exhibits the following desirable properties:
Scalability: The overlay network can handle the trafc generated by millions of end-hosts and applications.
Q
d
DD
EED A 4 54
T
&
R s
T s
R
W d P
Q
'
'
'
I R
W d P
Q
' ' 54
I
SR
T V X V Q R
a!WWW d P
Q
T V X V Q R Q
a!WWSd P
T V X V Q R Q
a!WWSd P
id2
R4
R2
id2
R3
R2
id1
id1
Stability: The mapping between triggers and servers is relatively stable over time, that is, it is unlikely to change during
!R
T
Q
d P
P
5
R
!$
Q
d P
Q
d
P
5
R
!a
!R
Q
d P
!R
Q
d P
P
5
Q
d
Q
d P
Q
d
the duration of a ow. This property allows end-hosts to optimize their performance by choosing triggers that are stored
on nearby servers.
3
4
Q
d
i finger
1
7
2
7
3
2
P
6
i finger
1
5
2
5
3
7
0
7
Node
i finger
1
2
2
2
3
5
0 R
E G EP
T I
UR
Q
d P
0 R
F G EP
T I R Q
Ud P
Q R Q
3 d P
4.4 Self-Organizing
is an overlay infrastructure that may grow to large sizes. Thus,
it is important that it not require extensive manual conguration or
human intervention. The Chord overlay network is self-conguring,
in that nodes joining the infrastructure use a simple bootstrapping mechanism (see [26]) to nd out about at least one existing
node, and then contacts that node to join the infrastructure. Sim
"
0 R
E G EP #
t
U
Q
d
Q
3
Q
3
" "
T Q P g
$d $9
t
uf
53 "
P g
$
Q
3
5S'
(S'
Q
d
Q
d
Q
3
Q
d
T P
A
Before discussing s properties, we introduce an important technique that allows applications to use
more securely and efciently. With this technique applications make a distinction between two types of triggers: public and private. This distinction
is made only at the application level: itself doesnt differentiate
between private and public triggers.
4.3 Robustness
tmndec
i3
Proxy
mpeg
receiver (R1)
(R1, data)
mpeg_play
i3
Proxy
(id, data)
sender (S)
(R2, data)
i3
Proxy
receiver (R2)
Figure 7: Heterogeneous multicast application. Refer to Figure 4(b) for data forwarding in .
a trigger exceeds a certain threshold, the server storing the trigger pushes a copy of to another server. This process can continue
recursively until the load is spread out. The decision of where to
push the trigger is subject to two constraints. First, should push
the trigger to the server most likely to route the packets matching
that trigger. Second, should try to minimize the state it needs
to maintain; at least needs to know the servers to which it has
already pushed triggers in order to forward refresh messages for
these triggers (otherwise the triggers will expire). With Chord, one
simple way to address these problems is to always push the triggers
to the predecessor server.
If there are more triggers that share the same -bit prex with a
popular trigger , all these triggers need to be cached together with
. Otherwise, if the identier of a packet matches the identier
of a cached trigger , we cannot be sure that is indeed the best
matching trigger for the packet.
X
4.7 Scalability
Since typically each ow is required to maintain two triggers
(one for each end-point), the number of triggers stored in is of the
order of the number of ows plus the number of end-hosts. At rst
sight, this would be equivalent to a network in which each router
maintains per-ow state. Fortunately, this is not the case. While the
state of a ow is maintained by each router along its path, a trigger
is stored at only one node at a time. Thus, if there are triggers and
servers, each server will store
triggers on the average. This
also suggests that can be easily upgraded by simply adding more
servers to the network. One interesting point to note is that these
nodes do not need to be placed at specic locations in the network.
%
F
T
e
f
!3 P
Q R Q
R Q
3 P
Consider the problem of a large number of clients that try to contact a popular trigger such as the CNNs trigger. This may cause the
server storing this trigger to overload. The classical solution to this
problem is to use caching. When the rate of the packets matching
Q
3
Q
3
E 0
R
Q
d P
E 54
R
4 1
0
R
a d P
Q
Q
d
E 54
T
) R
Q
d
4 0
R
$ d P
Q
Unlike IP, where an end-host can only send and receive packets, in end-hosts are also responsible for maintaining the routing
information through triggers. While this allows exibility for applications, it also (and unfortunately) creates new opportunities for
malicious users. We now discuss several security issues and how
addresses them.
We emphasize that our main goal here is not to design a bullet
proof system. Instead, our goal is to design simple and efcient
not worse and in many cases better than
solutions that make
todays Internet. The solutions outlined in this section should be
viewed as a starting point towards more sophisticated and better
security solutions that we will develop in the future.
4.10 Security
Note that an attacker can still count the number of connection requests to . However, this information is of very limited use, if
any, to the attacker. If, in the future, it turns out that this is unacceptable for some applications, then other security mechanisms
such as public trigger authentication will need to be used.
Recall that the key to enabling multicast functionality is to allow multiple triggers with the same identifer. Unfortunately, a malicious user that knows a hosts trigger can use this exibility to
eavesdrop the trafc towards that host by simply inserting a trigger with the same identier and its own address. In addressing this
problem, we consider two cases: (a) private and (b) public triggers
(see Section 4.2).
Private triggers are secretly chosen by the application end-points
and are not supposed to be revealed to the outside world. The length
of the triggers identier makes it difcult for a third party to use
a brute force attack. While other application constraints such as
storing a trigger at a server nearby can limit the identier choice,
the identier is long enough (i.e.,
bits), such that the application can always reserve a reasonably large number of bits that
are randomly chosen. Assuming that an application chooses
random bits in the triggers identier, it will take an attacker
probes on the average to guess the identier. Even in the face of
a distributed attack of say one millions of hosts, it will take about
probes per host to guess a private trigger. We
note that the technique of using random identiers as probabilistic
secure capabilities was previously used in [28, 37].
Furthermore, end-points can periodically change the private triggers associated with a ow. Another alternative would be for the
receiver to associate multiple private triggers to the same ow, and
the sender to send packets randomly to one of these private triggers.
The alternative left to a malicious user is to intercept all private triggers. However this is equivalent to eavesdropping at the IP level or
taking control of the server storing the trigger, which makes
no worse than IP.
With , a public trigger is known by all users in the system, and
thus anyone can eavesdrop the trafc to such a trigger. To alleviate
this problem, end-hosts can use the public triggers to choose a pair
of private triggers, and then use these private triggers to exchange
the actual data. To keep the private triggers secret, one can use
public key cryptography to exchange the private triggers. To initiate
a connection, a host
encrypts its private trigger
under the
public key of a receiver , and then sends it to via s public
trigger.
decrypts s private trigger
, then chooses its own
4.10.1 Eavesdropping
y x
1v
s
v
9
A S A
Q
3
Q
d
4.5
2.5
R
6
Q
d
T I
USR
R y
1 $&
Q
d P
v
7
5
y
4
c b
SIMULATION RESULTS
P
A
7
5
U g
We have also experimented with identiers that have location semantics. In particular, we have used space lling curves, such as
the Hilbert curve, to map a -dimensional geometric spacewhich
In this section, we evaluate the routing efciency of by simulation. One of the main challenges in providing efcient routing
is that end-hosts have little control over the location of their triggers. However, we show that simple heuristics can signicantly
enhance s performance. The metric we use to evaluate these
heuristics is the ratio of the inter-node latency on the network to
the inter-node latency on the underlying IP network. This is called
the latency stretch.
The simulator is based on the Chord protocol and uses iterative
style routing [26]. We assume that node identiers are randomly
distributed. This assumption is consistent with the way the identiers are chosen in other lookup systems such as CAN [22] and
Pastry [23]. As discussed in [26], using random node identiers
increases system robustness and load-balancing. 6 We consider the
5.
Point-to-point communication networks such as the Internet provide limited support for anonymity. Packets usually carry the destination and the source addresses, which makes it relatively easy
for an eavesdropper to learn the sender and the receiver identities. In contrast, with , eavesdropping the trafc of a sender will
not reveal the identity of the receiver, and eavesdropping the trafc of a receiver will not reveal the senders identity. The level of
anonymity can be further enhanced by using chain of triggers or
stack of identiers to route packets.
4.11 Anonymity
Number of Samples
35
30
25
20
T I R Q
Ud P
15
10
T I R Q
Ud P
1.5
3.5
T R
14Yx
16
16
14
14
12
10
12
10
default chord
closest finger set
closest finger replica
2
0
10
12
14
Number of i3 Servers
default chord
closest finger set
closest finger replica
2
0
10
12
Number of i3 Servers
x 10
(a)
14
4
x 10
(b)
Figure 9: The 90th percentile latency stretch in the case of (a) a power-law random network topology with 5000 nodes, and (b) a
transit-stub topology with 5000 nodes. The servers are randomly assigned to all nodes in case (a), and only to the stub nodes in
case (b).
A
' # "
P g
a
7
5
A
A
5.4 Performance
7
5
y x
1v
v
h
t
80
50 percentile
25, 75 percentiles
10, 90 percentiles
30
25
20
15
10
60
50
40
30
20
10
200
400
600
800
1000
1200
1400
10
15
20
25
30
35
Number of I3 Servers
Payload Size
(bytes)
0
200
400
600
800
1,000
1,200
1,400
Avg. Throughput
(payload Mbps)
0
53.00
91.23
135.84
178.18
216.48
251.16
261.39
6. RELATED WORK
The rendezvous-based communication is similar in spirit to the
tuple space work in distributed systems [2, 14, 36]. A tuple space
is a shared memory that can be accessed by any node in the system.
Nodes communicate by inserting tuples and retrieving them from a
tuple space, rather than by point-to-point communication. Tuples
are more general than data packets. A tuple consists of arbitrary
typed elds and values, while a packet consists of just an identier and a data payload. In addition, tuples are guaranteed to be
stored until they are explicitly removed. Unfortunately, the added
expressiveness and stronger guarantees of tuple spaces make them
very hard to efciently implement on a large scale. Finally, with
tuple spaces, a node has to explicitly ask for each data packet. This
interface is not effective for high speed communications.
s communication paradigm is similar to the publish-subscribenotify (PSN) model. The PSN model itself exists in many proprietary forms already in commercial systems [29, 31]. While the
matching operations employed by these systems are typically much
more powerful than the longest prex matching used by , it is not
clear how scalable these systems are. In addition, these systems
dont provide support for service composition.
Active Networks aim to support rapid development and deployment of new network applications by downloading and executing
customized programs in the network [33].
provides an alternative design that, while not as general and exible as Active Networks, is able to realize a variety of basic communication services
without the need for mobile code or any heavyweight protocols.
is similar to many naming systems. This should come as no
surprise, as identiers can be viewed as semantic-less names. One
future research direction is to use
as a unifying framework to
be able to maintain up to
triggers.
Data packet forwarding: Figure 10 plots the overhead of forwarding a data packet to its nal destination. This involves looking
up the matching trigger and forwarding the packet to its destination
addresses. Since we didnt enable multicast, in our experiments
there was never more than one address. Like trigger insertion,
packet forwarding consists of a hash table lookup. In addition, this
measurement includes the time to send the data packet. Packet forwarding time, in our experiments, increases roughly linearly with
the packet size. This indicates that as packet size increases, memory copy operations and pushing the bits through the network dominate processing time.
routing: Figure 11 plots the overhead of routing a packet to
another
node. This differs from data packet forwarding in that
we route the packet using a nodes nger table rather than its trigger
table. This occurs when a data packets trigger is stored on some
other node. The most costly operation here is a linear nger table
lookup, as evidenced by the graph. There are two reasons for this
seemingly poor behavior. First, we augment the nger table with a
cache containing the most recent servers that have sent control or
data packets. Since in our experiments this cache is large enough
to store all servers in the system, the number of nodes used to route
a packet (i.e., the ngers plus the cached nodes) increases roughly
linearly with the number of nodes in the system. Second, the nger table data structure in our implementation is a list. In a more
polished implementation, a more efcient data structure is clearly
needed to signicantly improve the performance.
Throughput: Finally, we ran some experiments to see the maximum rate at which a node can process data packets. Ideally, this
should be the inverse of overhead. To test throughput, a single
node is bombarded with more packets than it can reasonably handle. We measure the time it takes for
packets to emerge
from the node to determine throughput. Not surprisingly, as packet
payload increases, throughput in packets decreases. In addition, we
calculate the data throughput from the user perspective. Only the
payload data is considered; headers are overhead to users. The user
throughput in Mbps increases as the packet payload increases because the overhead for headers and processing is roughly the same
for both small and large payloads.
50 percentile
25, 75 percentiles
10, 90 percentiles
70
35
40
D
6
v
wt
$
R
R
$
8.
SUMMARY
9.
ACKNOWLEDGMENTS
10. REFERENCES
[1] C ALLON , R., D OOLAN , P., F ELDMAN , N., F REDETTE , A.,
S WALLOW, G., AND V ISWANATHAN , A. A framework for
multiprotocol label switching, Nov. 1997. Internet Draft,
draft-ietf-mpls-framework-02.txt.
[2] C ARRIERO , N. The Implementation of Tuple Space
Machines. PhD thesis, Yale University, 1987.
[3] C HERITON , D. R., AND G RITTER , M. TRIAD: A new next
generation Internet architecture, Mar. 2000.
http://www-dsg.stanford.edu/triad/ triad.ps.gz.
[4] C HU , Y., R AO , S. G., AND Z HANG , H. A case for end
system multicast. In Proc. of ACM SIGMETRICS00 (Santa
Clara, CA, June 2000), pp. 112.
[5] DABEK , F., K AASHOEK , F., K ARGER , D., M ORRIS , R.,
AND S TOICA , I. Wide-area cooperative storage with cfs. In
Proc. ACM SOSP01 (Banff, Canada, 2001), pp. 202215.