





Contents

1. Introduction
2. Napster
3. Peer-to-peer middleware
4. Routing overlays
5. Overlay case studies: Pastry, Tapestry
6. Application case studies: Squirrel, OceanStore, Ivy

1. Introduction

What is a peer-to-peer (P2P) system?

- The term refers to distributed systems that have no central controlling computer.
- All participating computers (nodes, peers) have the same functions.
- Each node acts as both a client and a server for the other nodes.


Applications

Classification

Unstructured P2P
- Where a file is stored is unrelated to the overlay topology (the geometric structure of the network).
- Search techniques:

- Simple (the most common):

- flooding, using breadth-first or depth-first traversal.

- More complex:
- random walk,
- routing indices.

Well suited to systems whose nodes join and leave frequently and arbitrarily.
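The flooding search mentioned above can be sketched as a breadth-first traversal with a hop limit. This is a minimal illustration, not the protocol of any particular network: the `neighbours` and `files` dictionaries, the peer names and the `ttl` parameter are all assumptions made for the example.

```python
from collections import deque

def flood_search(neighbours, files, start, wanted, ttl):
    """Breadth-first flooding: return the peers holding `wanted`
    that can be reached from `start` within `ttl` hops."""
    hits = set()
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        peer, hops = queue.popleft()
        if wanted in files.get(peer, ()):
            hits.add(peer)
        if hops == ttl:
            continue  # time-to-live exhausted: do not forward further
        for nxt in neighbours.get(peer, ()):
            if nxt not in seen:  # each peer handles a query only once
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return hits

# Hypothetical five-peer overlay; links are unrelated to file placement.
neighbours = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"],
              "D": ["B", "E"], "E": ["D"]}
files = {"C": {"song.mp3"}, "E": {"song.mp3"}}

print(flood_search(neighbours, files, "A", "song.mp3", ttl=1))
```

With `ttl=1` only the copy at peer C is found; the copy at E is three hops away, which shows why flooding cannot guarantee that an existing object is located.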

Unstructured P2P

First generation

Unstructured P2P

Second generation. Example networks: Gnutella 0.6, Kazaa, Skype


Structured P2P

Provides a mapping between content (a file's ID) and node location (a node's address). Overcomes the search weakness of unstructured networks by using a distributed hash table (DHT).


Structured P2P
- Links between nodes in the overlay network follow a specific algorithm.
- Each node is responsible for a portion of the data shared in the network.
- Example networks: Pastry, Tapestry, CAN, Chord, Kademlia.


DHT mechanism
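The core idea of a DHT, hashing a key into the same identifier space as the nodes and letting one node own each region of that space, can be sketched as follows. This is an illustrative sketch under stated assumptions: the 32-bit identifier space, the helper names `to_id` and `responsible_node`, and the Chord-style "first node at or after the key" rule are choices made here; Pastry instead picks the numerically closest node.

```python
import hashlib
from bisect import bisect_left

ID_BITS = 32  # small identifier space for readability (assumption)

def to_id(text):
    """Hash a file name (or node address) into the identifier space."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)

def responsible_node(node_ids, key_id):
    """Return the first node ID at or after key_id, wrapping around the ring."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]

# Hypothetical node addresses, hashed into the same space as file names.
nodes = [to_id(addr) for addr in ("10.0.0.1", "10.0.0.2", "10.0.0.3")]
print(responsible_node(nodes, to_id("song.mp3")))
```

Because every peer applies the same hash function, any peer can work out which node is responsible for a file without flooding the network.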


Characteristics of P2P systems

- The design ensures that every user can contribute resources to the system.
- All nodes in the system have the same functions and responsibilities.
- Operation does not depend on any centrally administered system.
- A limited degree of anonymity can be offered to the providers and users of resources.
- A key choice is the algorithm that places data across many hosts and then provides access to it in a way that balances load and ensures availability without adding undue overhead.


Characteristics of P2P systems

Advantages:
- No dedicated server is needed; clients share resources, and capacity grows as the network expands.
- Inexpensive.
- Easy to install and maintain.
- Convenient for sharing files, printers and CD-ROM drives.

Disadvantages:
- Slow.
- Poorly suited to database applications.
- Less reliable.


- Napster was the world's first large-scale P2P network.
- It was founded in 1999.
- It was used to share music over the Internet.
- The music files were created and shared by individuals, usually copied from CDs.


Napster: peer-to-peer file sharing with a central, replicated index
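The division of labour in a Napster-style system (a central index that only records who holds what, with the files themselves transferred directly between peers) can be sketched as below. The class and method names are hypothetical and chosen for illustration; they are not Napster's actual protocol.

```python
class CentralIndex:
    """Toy central index: maps file names to the peers that hold them."""

    def __init__(self):
        self._index = {}  # file name -> set of peer addresses

    def register(self, peer, file_names):
        """A peer announces the files it is willing to share."""
        for name in file_names:
            self._index.setdefault(name, set()).add(peer)

    def unregister(self, peer):
        """A peer leaves; drop it from every entry."""
        for holders in self._index.values():
            holders.discard(peer)

    def lookup(self, name):
        """Return the peers holding `name`; the download itself is peer to peer."""
        return sorted(self._index.get(name, ()))

index = CentralIndex()
index.register("peer-1", ["a.mp3", "b.mp3"])
index.register("peer-2", ["b.mp3"])
print(index.lookup("b.mp3"))
```

The index is a single point of control, which is the property that later routing overlays were designed to avoid.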


P2P middleware

Provides a mechanism that lets clients access resources quickly, regardless of where the resources are located. Nodes must be able to locate and retrieve any available resource, even though resources are widely distributed and are continually being added and removed.



Functional requirements:

- Simplify the construction of services that are implemented across many widely distributed hosts.
- Support adding and removing resources, as well as adding hosts to the service and removing them.
- Like other middleware, offer programmers an interface that is independent of the type of distributed resource the program manipulates.


P2P middleware

Non-functional requirements:
- Global scalability
- Load balancing
- Local optimization
- Accommodating highly dynamic host availability
- Security of data
- Anonymity, deniability and resistance to censorship

Routing overlay
A routing overlay is the distributed algorithm in P2P middleware that is responsible for locating nodes and objects. A node can access a resource by routing a request through a sequence of nodes. A request is routed to the nearest node that holds a replica of the requested resource.


Routing overlay
Globally unique identifiers (GUIDs) identify nodes and objects. A GUID is typically stored as a 128-bit number and displayed as 32 hexadecimal digits.

Example: 21EC2020-3AEA-1069-A2DD-08002B30309D

GUIDs are computed with a secure hash function such as SHA-1.
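A GUID of this shape can be derived with Python's standard `hashlib` module, as sketched below. SHA-1 produces 160 bits, so this sketch keeps only the first 128; that truncation, and the function name `guid128`, are assumptions made for illustration rather than something the slides specify.

```python
import hashlib

def guid128(data: bytes) -> str:
    """Derive a 128-bit identifier, shown as 32 hex digits, from arbitrary bytes."""
    digest = hashlib.sha1(data).digest()  # 20 bytes (160 bits)
    return digest[:16].hex().upper()      # keep the first 128 bits (assumption)

print(guid128(b"song.mp3"))
```

The same input always yields the same GUID, while different inputs are spread evenly over the identifier space, which is what keeps the load balanced across nodes.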


Figure 10.1: Distinctions between IP and overlay routing for peer-to-peer applications

Scale
- IP: IPv4 is limited to 2^32 addressable nodes. The IPv6 name space is much more generous (2^128), but addresses in both versions are hierarchically structured and much of the space is pre-allocated according to administrative requirements.
- Application-level routing overlay: Peer-to-peer systems can address more objects. The GUID name space is very large and flat (>2^128), allowing it to be much more fully occupied.

Load balancing
- IP: Loads on routers are determined by network topology and associated traffic patterns.
- Application-level routing overlay: Object locations can be randomized and hence traffic patterns are divorced from the network topology.

Network dynamics (addition/deletion of objects/nodes)
- IP: IP routing tables are updated asynchronously on a best-efforts basis with time constants on the order of 1 hour.
- Application-level routing overlay: Routing tables can be updated synchronously or asynchronously with fractions of a second delays.

Fault tolerance
- IP: Redundancy is designed into the IP network by its managers, ensuring tolerance of a single router or network connectivity failure. n-fold replication is costly.
- Application-level routing overlay: Routes and object references can be replicated n-fold, ensuring tolerance of n failures of nodes or connections.

Target identification
- IP: Each IP address maps to exactly one target node.
- Application-level routing overlay: Messages can be routed to the nearest replica of a target object.

Security and anonymity
- IP: Addressing is only secure when all nodes are trusted. Anonymity for the owners of addresses is not achievable.
- Application-level routing overlay: Security can be achieved even in environments with limited trust. A limited degree of anonymity can be provided.

Instructor's Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design Edn. 5 Pearson Education 2012


Figure 10.3: Distribution of information in a routing overlay

[Figure: nodes A, B, C and D each hold their own routing knowledge; the legend distinguishes objects from nodes.]



Basic programming interface for a distributed hash table (DHT) as implemented by the PAST API over Pastry:

put(GUID, data)
Publishes an object with the given GUID. The data is stored at all the nodes responsible for a replica.

remove(GUID)
Deletes all references to GUID and the associated data.

value = get(GUID)
Retrieves the data associated with GUID from one of the nodes responsible for it.

The DHT layer takes responsibility for choosing a location for a data item, storing it (with replicas to ensure availability) and providing access to it via the get() operation.


How it works: when a client needs to publish a resource, it:
1. computes the GUID;
2. asks the routing overlay to publish it.

When the routing overlay is asked to publish a resource, it:

1. stores the resource at the node whose GUID is closest to that of the resource;
2. stores r replicas of the resource at the r nodes whose GUIDs are closest to that of the resource, where r is the replication factor.
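The put/get/remove interface and the publish steps above can be sketched with an in-memory toy. This is not the PAST implementation: the class `ToyDHT`, the helper `guid_of`, the use of plain numeric distance (ignoring wrap-around), and the reading of "r replicas" as "the r closest nodes in total" are all assumptions made for the example.

```python
import hashlib

def guid_of(data: bytes) -> int:
    """Step 1 of publishing: compute a 128-bit GUID from the content."""
    return int.from_bytes(hashlib.sha1(data).digest()[:16], "big")

class ToyDHT:
    def __init__(self, node_guids, r=2):
        self.stores = {g: {} for g in node_guids}  # node GUID -> local store
        self.r = r                                 # replication factor

    def _closest(self, guid):
        """The r nodes whose GUIDs are numerically closest to `guid`."""
        return sorted(self.stores, key=lambda n: abs(n - guid))[: self.r]

    def put(self, guid, data):
        for node in self._closest(guid):
            self.stores[node][guid] = data

    def get(self, guid):
        for node in self._closest(guid):
            if guid in self.stores[node]:
                return self.stores[node][guid]
        return None

    def remove(self, guid):
        for store in self.stores.values():
            store.pop(guid, None)

dht = ToyDHT([100, 200, 300, 400], r=2)
dht.put(210, b"hello")
print(sorted(n for n, s in dht.stores.items() if 210 in s))
```

In a real overlay no component sees all nodes at once; each call would instead be routed hop by hop towards the closest GUID.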

Basic programming interface for distributed object location and routing (DOLR) as implemented by Tapestry:

publish(GUID)
GUID can be computed from the object. This function makes the node performing a publish operation the host for the object corresponding to GUID.

unpublish(GUID)
Makes the object corresponding to GUID inaccessible.

sendToObj(msg, GUID, [n])
Sends the message msg to the object whose GUID is GUID. The optional parameter [n], if present, requests delivery to n replicas of the object.

Objects can be stored anywhere, and the DOLR layer is responsible for maintaining a mapping between GUIDs and the addresses of the nodes at which replicas of the objects are located.
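The difference from a DHT is that a DOLR layer stores only locations, not the objects themselves. A minimal sketch of that idea follows; it is not Tapestry's implementation. The class `ToyDOLR`, the explicit `host` argument (in the real interface the calling node is implicit), the `delivered` list standing in for the network, and the alphabetical choice of replicas instead of "nearest" are all assumptions.

```python
class ToyDOLR:
    """Toy location service: GUID -> addresses of the hosts holding replicas."""

    def __init__(self):
        self.locations = {}   # GUID -> set of host addresses
        self.delivered = []   # (host, msg) pairs; stands in for the network

    def publish(self, guid, host):
        """Record `host` as a holder of the object identified by `guid`."""
        self.locations.setdefault(guid, set()).add(host)

    def unpublish(self, guid, host):
        """Make `host`'s replica unreachable through the overlay."""
        hosts = self.locations.get(guid, set())
        hosts.discard(host)
        if not hosts:
            self.locations.pop(guid, None)

    def send_to_obj(self, msg, guid, n=1):
        """Deliver `msg` to up to n replicas and return the hosts reached."""
        targets = sorted(self.locations.get(guid, ()))[:n]
        for host in targets:
            self.delivered.append((host, msg))
        return targets

dolr = ToyDOLR()
dolr.publish("4378", "host-b")
dolr.publish("4378", "host-a")
print(dolr.send_to_obj("open", "4378", n=2))
```

Because the object stays wherever its owner placed it, replicas can be positioned close to their users, and the overlay only has to keep the GUID-to-address mapping current.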

Overlay case studies: Pastry and Tapestry

Both are routing layers for structured peer-to-peer networks, and both use the prefix routing approach.


Pastry is a routing layer introduced by [Rowstron and Druschel 2001, Castro et al. 2002a]. All nodes and objects in Pastry are assigned 128-bit GUIDs. A GUID is computed by applying a secure hash function to:

- the public key, for nodes;
- the object's name or the object's stored state, for objects (such as files).


In a network of N nodes, the Pastry routing algorithm correctly delivers a message addressed to any GUID in O(log N) steps. If the GUID identifies a currently active node, the message is delivered to that node.

Otherwise, the message is delivered to the active node whose GUID is numerically closest to it.

Active nodes are responsible for handling requests addressed to objects in their numerical neighbourhood.


The full routing algorithm uses a routing table at each node to forward messages to their destination efficiently. (To explain the algorithm, we describe it in two stages.)

Stage 1 describes a simplified form of the algorithm (for explanation only). Stage 2 describes the complete algorithm (the one used in practice).


Stage 1:

Each node stores a leaf set: a vector L of size 2l containing the GUIDs and IP addresses of the nodes whose GUIDs are numerically closest on either side of its own.

Pastry maintains the leaf sets whenever nodes join or leave the network.

Even after a node failure, the leaf sets can be repaired very quickly. (Repair is discussed later.)
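Stage 1 can be simulated in a few lines: each node knows only its l nearest neighbours on either side of the ring and forwards a message to whichever known node is closest to the destination. The sketch below makes several assumptions for readability: a 16-bit circular GUID space instead of 128 bits, a global `nodes` list from which each leaf set is derived, and the helper names themselves.

```python
SPACE = 2 ** 16  # small circular GUID space (assumption; Pastry uses 2**128)

def ring_distance(a, b):
    """Shortest distance between two GUIDs on the circular space."""
    d = abs(a - b) % SPACE
    return min(d, SPACE - d)

def leaf_set(nodes, me, l):
    """The l nearest nodes on each side of `me` in ring order."""
    ring = sorted(nodes)
    i = ring.index(me)
    return {ring[(i + k) % len(ring)] for k in range(-l, l + 1) if k != 0}

def route(nodes, start, dest, l):
    """Forward greedily via leaf sets; return the list of nodes visited."""
    path = [start]
    current = start
    while True:
        candidates = leaf_set(nodes, current, l) | {current}
        best = min(candidates, key=lambda g: (ring_distance(g, dest), g))
        if best == current:  # no known node is closer: deliver here
            return path
        current = best
        path.append(current)

nodes = [i * 1000 for i in range(1, 21)]
print(len(route(nodes, 1000, 10400, l=1)) - 1)  # hops with l = 1
print(len(route(nodes, 1000, 10400, l=3)) - 1)  # hops with l = 3
```

The message always arrives at the node closest to the destination GUID, but the hop count grows linearly with the number of nodes, which is why the routing table of Stage 2 is needed.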


Figure 10.6: Circular routing alone is correct but inefficient

Based on Rowstron and Druschel [2001]



Stage 2:

Using a routing table:

Each node maintains a tree-structured routing table holding GUIDs and IP addresses for a set of nodes spread across the range of 2^128 possible GUID values.


Figure 10.7: First four rows of a Pastry routing table


Figure 10.8: Pastry routing example


Figure 10.9: Pastry's routing algorithm
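The decision a single node makes when forwarding a message can be sketched as below: use the leaf set when the destination falls inside its range, otherwise use the routing-table entry that extends the shared prefix by one digit, otherwise fall back to any known node that is closer. This is a simplified reading of prefix routing, not the published algorithm verbatim: GUIDs are short hex strings, wrap-around at the ends of the GUID space is ignored, and the routing table is modelled as a dictionary keyed by `(prefix_length, next_digit)`, all of which are assumptions.

```python
def value(guid):
    """Numeric value of a hex-string GUID."""
    return int(guid, 16)

def shared_prefix_len(a, b):
    """Number of leading hex digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(me, dest, leaf_set, routing_table):
    """Choose where node `me` forwards a message addressed to `dest`."""
    def gap(guid):
        return abs(value(guid) - value(dest))

    # Case 1: dest lies within the leaf set's range.
    if leaf_set:
        low, high = min(leaf_set, key=value), max(leaf_set, key=value)
        if value(low) <= value(dest) <= value(high):
            return min(leaf_set | {me}, key=gap)
    # Case 2: use the entry matching one more digit of dest.
    p = shared_prefix_len(me, dest)
    if p == len(dest):
        return me
    entry = routing_table.get((p, dest[p]))
    if entry is not None:
        return entry
    # Case 3: any known node with an equally long prefix that is closer.
    known = leaf_set | set(routing_table.values())
    closer = [g for g in known
              if shared_prefix_len(g, dest) >= p and gap(g) < gap(me)]
    return min(closer, key=gap, default=me)

print(next_hop("65A1", "D46A", {"6598", "65B0"}, {(0, "D"): "D13D"}))
```

Each routing-table hop fixes at least one more leading digit of the destination, which is where the O(log N) bound on the number of steps comes from.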



Joining Pastry


Tapestry is similar to Pastry, but differs in:

- the method used to map keys to nodes;
- the way replication across the network is managed.




From structured to unstructured peer-to-peer

Advantages

- Structured peer-to-peer: guaranteed to locate objects, and can offer bounds on the time and complexity of the routing operation.
- Unstructured peer-to-peer: self-organizing and naturally resilient to node failure.

Disadvantages

- Structured peer-to-peer: the overlay structures, which are often complex, must be maintained continuously; this is difficult and fairly costly, especially in environments where nodes join and leave dynamically.
- Unstructured peer-to-peer: probabilistic, so it cannot guarantee that objects will be located. Messaging overhead is excessive, which affects scalability.


Node classification:

- Leaf node: maintains only a single connection to one ultrapeer.
- Ultrapeer: can maintain many connections to leaf nodes (10-100) and a small number of connections to other ultrapeers (<10).



Routing with ultrapeers can be done in two ways:

Reflector indexing:
The ultrapeer periodically sends indexing queries down to its leaf nodes, keeps an up-to-date record of their shared files, and answers query traffic on their behalf. This reduces the number of queries that reach the leaf nodes.

QRP (Query Routing Protocol) routing: based on the QRT (Query Routing Table).
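The idea behind a query routing table can be sketched as a table of hashed keywords: a leaf summarizes the words in its file names, and the ultrapeer forwards a query to that leaf only if every query word hashes to a marked slot. The sketch below is illustrative only. The table size, the use of SHA-1 (the actual protocol defines its own hash function), and the helper names are assumptions.

```python
import hashlib

TABLE_SIZE = 1024  # number of slots in the table (assumption)

def slot(word):
    """Map a keyword to a table slot."""
    digest = hashlib.sha1(word.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % TABLE_SIZE

def build_qrt(file_names):
    """Built by a leaf: mark the slot of every word in its file names."""
    table = [False] * TABLE_SIZE
    for name in file_names:
        for word in name.replace(".", " ").replace("_", " ").split():
            table[slot(word)] = True
    return table

def may_match(qrt, query):
    """Used by an ultrapeer: forward only if all query words are marked."""
    return all(qrt[slot(word)] for word in query.split())

leaf_table = build_qrt(["blue_moon.mp3"])
print(may_match(leaf_table, "blue moon"))
```

The table can report a match for a word the leaf does not actually hold (a hash collision), but never misses a word it does hold, so filtering on it loses no results while sparing leaves most irrelevant queries.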






Application case studies: Squirrel, OceanStore, Ivy

The routing overlays described above have been tried out in a number of applications, and the results have been widely evaluated. Three representative applications are covered below:

- Squirrel web caching service, based on Pastry.
- OceanStore file store.
- Ivy file store.


Squirrel web cache

The authors of Pastry developed the Squirrel web caching service for use in local networks of personal computers. In medium and large local networks, web caching is usually provided by a dedicated server or a cluster of dedicated servers.

Squirrel performs the same task, but does so by exploiting the storage and computing resources already available on the personal computers.


Web cache


In Squirrel, each node in the network lets other nodes access its web cache. Each node therefore plays two roles: web browser and web cache.
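One way to picture the double role of each node is the lookup path of a single request: check the local cache, then ask the node responsible for the URL, and go to the origin server only if that node has no copy. The sketch below is a simplification under stated assumptions: the responsible ("home") node is taken to be the node whose GUID is numerically closest to the SHA-1 hash of the URL, the overlay routing is replaced by a direct lookup, and the names `SquirrelNode`, `home_node` and `fetch` are hypothetical.

```python
import hashlib

def url_guid(url):
    """Hash a URL into the 128-bit GUID space."""
    return int.from_bytes(hashlib.sha1(url.encode("utf-8")).digest()[:16], "big")

class SquirrelNode:
    def __init__(self, guid):
        self.guid = guid
        self.cache = {}  # URL -> page body

def home_node(nodes, url):
    """The node whose GUID is numerically closest to the URL's GUID."""
    g = url_guid(url)
    return min(nodes, key=lambda n: abs(n.guid - g))

def fetch(nodes, client, url, origin):
    """Return (body, source), where source is 'local', 'home' or 'origin'."""
    if url in client.cache:              # the client's own cache
        return client.cache[url], "local"
    home = home_node(nodes, url)
    if url in home.cache:                # cached by the responsible peer
        body, source = home.cache[url], "home"
    else:                                # fetch from outside the network
        body, source = origin(url), "origin"
        home.cache[url] = body
    client.cache[url] = body
    return body, source

nodes = [SquirrelNode(g) for g in (2 ** 120, 2 ** 125, 2 ** 127)]
first = fetch(nodes, nodes[0], "http://example.org/", lambda u: "<html/>")
print(first[1])
```

After the first request, any other client asking for the same URL is served from inside the local network, which is how external bandwidth is saved.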







Results from simulating the system with workloads modelled on two real Microsoft environments (105 active clients in Cambridge and 36,000 in Redmond) were evaluated against three criteria:

Reduction in external bandwidth:

- Centralized web cache server: hit ratio of 29% (Redmond), 38% (Cambridge).
- Squirrel: 28% (Redmond), 37% (Cambridge), with each client contributing 100 MB of storage to the web cache.

Access latency:

- Web cache service: a single message is needed to access the cache.
- Squirrel: on average 4.11 message hops (Redmond) and 1.8 hops (Cambridge). On Ethernet hardware, however, access latency is measured in milliseconds (10-100), and the Squirrel authors argue that this cost is small next to the latency of accessing objects that are not found in the cache.

Computation and storage load imposed on client nodes:

- On average only 0.31 requests per minute reached each node (Redmond), so the share of resources consumed is very low.


OceanStore file store

The developers of Tapestry built a prototype peer-to-peer file store that supports mutable files. The OceanStore design [Kubiatowicz et al. 2000; Kubiatowicz 2003; Rhea et al. 2001, 2003] aims to provide a very large-scale, incrementally scalable, persistent storage facility that keeps data durably over the long term in an environment where network connectivity and computing resources are constantly changing.




Ivy file system

Like OceanStore, Ivy is a file system that supports multiple users, built on a routing overlay and a distributed hash-addressed data store. The difference is that the Ivy file system emulates a Sun NFS server.

It stores the state of files as logs of the update requests issued by Ivy clients.




Ivy file system

An Ivy file system consists solely of a set of logs, with exactly one log per participant. The logs are stored in DHash. Each participant looks up data across all the logs but makes modifications only in its own log. The goal of this arrangement is to let Ivy maintain file system metadata without locking.
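The one-log-per-participant arrangement can be illustrated with a small in-memory model: a write appends only to the writer's own log, and a read scans every log for the newest record. This is a sketch under loud assumptions: the shared counter `clock` is a stand-in for ordering updates (it is not how Ivy orders them), the logs live in ordinary Python lists rather than in DHash, and the class names are hypothetical.

```python
class IvyLog:
    """Append-only log owned by exactly one participant."""

    def __init__(self, owner):
        self.owner = owner
        self.records = []  # (sequence number, path, content)

class ToyIvy:
    def __init__(self, participants):
        self.logs = {p: IvyLog(p) for p in participants}
        self.clock = 0  # stand-in for update ordering (assumption)

    def write(self, participant, path, content):
        """Append to the writer's own log only; other logs are never touched."""
        self.clock += 1
        self.logs[participant].records.append((self.clock, path, content))

    def read(self, path):
        """Scan every participant's log and return the newest content."""
        latest = None
        for log in self.logs.values():
            for seq, p, content in log.records:
                if p == path and (latest is None or seq > latest[0]):
                    latest = (seq, content)
        return None if latest is None else latest[1]

fs = ToyIvy(["alice", "bob"])
fs.write("alice", "/notes.txt", "v1")
fs.write("bob", "/notes.txt", "v2")
print(fs.read("/notes.txt"))
```

Since no participant ever writes to another's log, two writers never contend for the same structure, which is the intuition behind maintaining metadata without locks.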