
INTRODUCTION

A simple example:
Job: put on socks and shoes.
Processor: a pair of hands.
Sequential algorithm: put on the right sock, then the right shoe, then the left sock, then the left shoe. This needs 4 time units.
Parallel algorithm: two processors, one for the left foot and another for the right foot. This needs 2 time units.
Question: can we use four processors to speed this up further to, say, 1 time unit?

ESE536/CSE636 Switching and Routing in Parallel and Distributed Systems

Parallel computer models


Physical architecture models
Multiprocessors
    Uniform memory access (UMA): a single shared memory space.
    Nonuniform memory access (NUMA): distributed shared-memory multiprocessors (DSM).
Multicomputers (distributed memory)
    Hypercube architecture
    Mesh-connected architecture
    Networks of workstations (NOW): an inexpensive way to build parallel computers.
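The two multicomputer topologies above differ mainly in how nodes are linked. The sketch below makes that concrete under the usual labeling assumptions. Hypercube nodes are numbered 0 to 2^d - 1 and are connected when their binary labels differ in exactly one bit. Mesh PEs are indexed by (row, column) and link to their four nearest neighbours, with no wraparound. The function names are hypothetical.

```python
def hypercube_neighbors(node: int, dim: int) -> list[int]:
    """Neighbors of `node` in a dim-dimensional hypercube (2**dim nodes).
    Assumes nodes are linked when their binary labels differ in one bit."""
    if not 0 <= node < 2 ** dim:
        raise ValueError("node out of range")
    return [node ^ (1 << b) for b in range(dim)]  # flip each bit in turn


def mesh_neighbors(row: int, col: int, rows: int, cols: int) -> list[tuple[int, int]]:
    """Neighbors of PE (row, col) in a rows x cols 2-D mesh without wraparound."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


if __name__ == "__main__":
    print(hypercube_neighbors(5, 3))   # node 101 -> [4, 7, 1] (100, 111, 001)
    print(mesh_neighbors(0, 0, 4, 4))  # corner PE -> [(1, 0), (0, 1)]
```

Each hypercube node has d links, so a d-dimensional hypercube grows its degree with the machine size. A mesh PE has at most four links regardless of mesh size.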


Theoretical models: used to estimate performance bounds on algorithms.

Review of time and space complexity
Time complexity: a function of the problem size $s$.

Big O notation (worst-case complexity): a time complexity $g(s)$ is said to be $O(f(s))$ if there exist positive constants $c$ and $s_0$ such that $g(s) \le c\,f(s)$ for all nonnegative values $s > s_0$.
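The definition can be illustrated numerically. The sketch below checks $g(s) \le c\,f(s)$ over a finite range of $s > s_0$. The running time $g(s) = 3s^2 + 5s$ and the constants are hypothetical choices for illustration. A finite check only illustrates the bound and does not prove it.

```python
def bounded_by(g, f, c, s0, s_max=10_000):
    """Return True if g(s) <= c * f(s) for every integer s with s0 < s <= s_max.

    A finite check can only illustrate the Big O definition; the bound itself
    must be proved for all s > s0.
    """
    return all(g(s) <= c * f(s) for s in range(s0 + 1, s_max + 1))


def g(s):
    return 3 * s**2 + 5 * s  # hypothetical running time


def f(s):
    return s**2


print(bounded_by(g, f, c=4, s0=5))  # True: 3s^2 + 5s <= 4s^2 whenever s >= 5
print(bounded_by(g, f, c=3, s0=5))  # False: c = 3 is too small
```

So $g(s) = O(s^2)$ with witnesses $c = 4$ and $s_0 = 5$. The constants are not unique, and any larger $c$ or $s_0$ also works.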

Sequential complexity: the complexity of a sequential algorithm.

Parallel complexity: the complexity of a parallel algorithm.


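NP-problems: an algorithm has time complexity $O(f(s))$, where $s$ is the problem size.

P-class (polynomial): $f(s)$ is a polynomial.
NP-class (nondeterministic polynomial): a guessed solution can be verified in polynomial time, but for the hard problems in this class the best known algorithms have an exponential $f(s)$.

Examples:
P-class: search for the max in a list: $O(s)$.
NP-class: traveling salesman problem (visit all cities with minimum total cost): exponential time with known exact algorithms.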
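The contrast between the two examples can be sketched in code. Finding the max is one linear pass. For the traveling salesman problem, checking the cost of a guessed tour is cheap, but finding the best tour by exhaustive search examines $(s-1)!$ tours. Brute force is chosen here only for illustration and is not necessarily the algorithm the slides have in mind. The 4-city distance matrix is made up.

```python
from itertools import permutations


def find_max(xs):
    """P-class example: one pass over the list, O(s) comparisons."""
    best = xs[0]
    for x in xs[1:]:
        if x > best:
            best = x
    return best


def tour_cost(dist, tour):
    """Verifying a guessed tour: polynomial (here linear) time."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def brute_force_tsp(dist):
    """Exact TSP by trying every tour that starts at city 0.
    Examines (s-1)! tours, which grows faster than any polynomial in s."""
    cities = range(1, len(dist))
    best = min(((0,) + p for p in permutations(cities)),
               key=lambda t: tour_cost(dist, t))
    return list(best), tour_cost(dist, best)


dist = [[0, 1, 4, 3],
        [1, 0, 2, 5],
        [4, 2, 0, 1],
        [3, 5, 1, 0]]
print(find_max([3, 9, 2, 7]))   # 9
print(brute_force_tsp(dist))    # ([0, 1, 2, 3], 7)
```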


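Parallel complexity
Sequential complexity: $O(f(s))$.
Parallel complexity on a $p$-processor machine: if it is $O(f(s)/p)$, the algorithm is scalable.

Not every problem can achieve this because of data dependence.

An example: putting on socks and shoes. Each shoe can go on only after its sock, so even with four processors the job still takes 2 time units.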
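The data-dependence limit can be computed as the length of the longest dependency chain, also called the critical path. The sketch below assumes every task takes one time unit and that unlimited processors are available. It uses Python's standard `graphlib` module, available in Python 3.9 and later.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish first.
deps = {
    "right sock": set(),
    "left sock": set(),
    "right shoe": {"right sock"},
    "left shoe": {"left sock"},
}


def min_parallel_time(deps):
    """Earliest completion time with unlimited processors and unit-time tasks:
    the length of the longest dependency chain (critical path)."""
    finish = {}
    for task in TopologicalSorter(deps).static_order():  # predecessors first
        finish[task] = 1 + max((finish[d] for d in deps[task]), default=0)
    return max(finish.values())


print(len(deps))                # 4: sequential time with one pair of hands
print(min_parallel_time(deps))  # 2: no number of processors does better
```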


Parallel random access machine (PRAM)
Consists of $p$ processors connected to a large shared random access memory.
Processors have a private (local) memory for their own computation, but all communication among them takes place via the shared memory.
Each time step has three phases: a read phase, a computation phase, and a write phase.
Processors are synchronized (for example, they all write at the same time).
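The three-phase time step can be simulated sequentially. In this sketch, all reads happen before any write, so each processor sees memory as it was at the start of the step. The function names and the example step, in which processor $i$ adds its left neighbour's cell to its own, are hypothetical. Write conflicts are not handled here.

```python
def pram_step(memory, num_procs, read, compute, write):
    """Simulate one synchronous PRAM time step.

    read(pid, memory) -> local data          (read phase)
    compute(pid, local) -> result            (computation phase)
    write(pid, result) -> (address, value)   (write phase)
    All reads finish before any write; conflicting writes are not resolved.
    """
    local = [read(pid, memory) for pid in range(num_procs)]
    results = [compute(pid, local[pid]) for pid in range(num_procs)]
    new_memory = list(memory)
    for pid in range(num_procs):
        address, value = write(pid, results[pid])
        new_memory[address] = value
    return new_memory


def read_pair(pid, memory):
    # Processor pid reads its left neighbour (0 for pid 0) and its own cell.
    return (memory[pid - 1] if pid > 0 else 0, memory[pid])


def add_pair(pid, pair):
    return pair[0] + pair[1]


def write_own(pid, value):
    return pid, value


print(pram_step([1, 2, 3, 4], 4, read_pair, add_pair, write_own))  # [1, 3, 5, 7]
```

A naive in-place loop would instead produce the running sums `[1, 3, 6, 10]`, because later iterations would read values already updated in the same step.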


Four subclasses, depending on how concurrent reads and writes are handled:
EREW-PRAM: exclusive read, exclusive write. Only one processor at a time may read or write a given memory location.
CREW-PRAM: concurrent read, exclusive write. Multiple processors may read the same memory location, but concurrent writes are not allowed.
ERCW-PRAM: exclusive read, concurrent write.
CRCW-PRAM: concurrent read, concurrent write.


How to resolve write conflicts:
Common: all simultaneous writes store the same value to that memory location.
Arbitrary: choose one value and ignore the others.
Minimum: store the value of the processor with the minimum index.
Priority: store some combination of all values, such as their sum or maximum.

In the PRAM model, synchronization and memory access overhead are ignored.
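The four conflict-resolution policies can be expressed as one function over the simultaneous writes to a single cell. This is a minimal sketch. The function name and its `(processor_index, value)` input format are hypothetical. The policy names follow the slides, and for Priority the combining function defaults to summation.

```python
import operator
from functools import reduce


def resolve_writes(writes, policy, combine=operator.add):
    """Value stored in one cell when several processors write to it at once.

    writes: list of (processor_index, value) pairs aimed at the same cell.
    """
    values = [v for _, v in writes]
    if policy == "common":
        if len(set(values)) != 1:
            raise ValueError("Common policy requires all writers to store the same value")
        return values[0]
    if policy == "arbitrary":
        return values[0]                 # any single value is acceptable
    if policy == "minimum":
        return min(writes)[1]            # processor with the smallest index wins
    if policy == "priority":
        return reduce(combine, values)   # e.g. summation or maximum
    raise ValueError(f"unknown policy: {policy}")


w = [(3, 5), (1, 7), (2, 2)]
print(resolve_writes(w, "minimum"))               # 7
print(resolve_writes(w, "priority"))              # 14
print(resolve_writes(w, "priority", combine=max)) # 7
```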


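Example: an algorithm on a PRAM. Multiplication of two $n \times n$ matrices in $O(\log n)$ time on a CREW PRAM with $n^3/\log n$ processors.

First assume $n^3$ processors $P(i,j,k)$, $0 \le i,j,k \le n-1$, with $n$ a power of 2.

Standard algorithm: $C(i,j) = \sum_{k=0}^{n-1} A(i,k) \cdot B(k,j)$.

We put the final results in $C(i,j,0)$ for $0 \le i,j \le n-1$.

Step 1: each $P(i,j,k)$ computes $C(i,j,k) = A(i,k) \cdot B(k,j)$. This takes one time step; each $A(i,k)$ and each $B(k,j)$ is read concurrently by $n$ processors.

Step 2: for $l = n/2, n/4, \dots, 1$, each $P(i,j,k)$ with $k < l$ computes $C(i,j,k) = C(i,j,k) + C(i,j,k+l)$. This tree summation takes $\log n$ time steps.

Now look at $n^3/\log n$ processors. Each processor takes over the work of $\log n$ of the original processors.

Step 1: each processor computes its $\log n$ products sequentially, in $O(\log n)$ time.

Step 2: each processor first adds up its own $\log n$ products in $O(\log n)$ time; the remaining partial sums are then combined by tree summation in $O(\log n)$ time.

Modify the code so that each single-product operation becomes a loop over a block of $\log n$ consecutive values of $k$.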
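The $n^3$-processor version can be checked by simulating it sequentially. This sketch assumes 0-based indices and $n$ a power of 2. The inner loops stand in for processors that would act simultaneously, and the function counts parallel steps. It does not model the reduced $n^3/\log n$-processor version.

```python
def pram_matmul(A, B):
    """Sequential simulation of the n^3-processor CREW PRAM algorithm.
    Processor P(i, j, k) owns cell C[i][j][k]; returns (product, parallel_steps)."""
    n = len(A)
    assert n > 0 and n & (n - 1) == 0, "n must be a power of 2"
    # Step 1 (one parallel step): P(i,j,k) computes A[i][k] * B[k][j].
    # A[i][k] is read by n processors at once (concurrent read).
    C = [[[A[i][k] * B[k][j] for k in range(n)] for j in range(n)] for i in range(n)]
    steps = 1
    # Step 2 (log2 n parallel steps): tree summation along k.
    half = n // 2
    while half >= 1:
        for i in range(n):
            for j in range(n):
                for k in range(half):          # only processors with k < half are active
                    C[i][j][k] += C[i][j][k + half]  # k + half is never written this round
        steps += 1
        half //= 2
    # Final results sit in C[i][j][0].
    return [[C[i][j][0] for j in range(n)] for i in range(n)], steps


product, steps = pram_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product, steps)  # [[19, 22], [43, 50]] 2
```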


VLSI complexity model ($AT^2$ model): sets limits on memory, I/O, and communication for implementing parallel algorithms with VLSI chips.

A: chip area (chip complexity)
T: time for completing a given computation
s: problem size
There exists a lower bound $f(s)$ such that $A \cdot T^2 \ge O(f(s))$.
Memory requirement sets a lower bound on the chip area A.

Information flows through the chip for a period of time T.


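AT: the amount of information flowing through the chip during time T. The number of input bits cannot exceed the volume AT.

Bisection area $\sqrt{A}\,T$ (usually assuming a square chip, whose side is $\sqrt{A}$): the maximum information exchange between the two halves of the chip during time T. Squaring this bisection bound gives the $AT^2$ term of the model.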


Example: matrix multiplication.
$n \times n$ matrices, 2-D mesh architecture with $n^2$ PEs, broadcast buses for inter-PE communication.
Chip area complexity: $O(n^2)$. Time complexity: $O(n)$.
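A worked version of the mesh example's $AT^2$ product follows. It is a sketch that assumes the bounds $A = O(n^2)$ and $T = O(n)$ for the $n \times n$ mesh with broadcast buses.

```latex
\begin{aligned}
A &= O(n^2), \qquad T = O(n),\\
A\,T^2 &= O(n^2)\cdot O(n)^2 = O(n^4).
\end{aligned}
```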


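How to solve a typical computation task, sorting, using different types of computation models.

Problem description: a sequence $S = (x_1, x_2, \dots, x_n)$ is given.
A linear order $<$ is defined on $S$.
Find a new sequence $S' = (y_1, y_2, \dots, y_n)$, a permutation of $S$, such that $y_i \le y_{i+1}$ for $i = 1, 2, \dots, n-1$.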


Sequential algorithm.
Lower bound: $\Omega(n \log n)$ comparisons.
Mergesort (optimal). Time: $O(n \log n)$.
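Mergesort can be sketched as a standard top-down version. It is included here as a reference point for the parallel algorithms that follow. The function name is ours.

```python
def merge_sort(xs):
    """Top-down mergesort: O(n log n) comparisons, matching the lower bound."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal keys in input order (stable)
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged


print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```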


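Parallel algorithm on the CRCW model.

Write conflicts: store the sum of all values being written.

Sorting by enumeration:
$n^2$ processors $P(i,j)$, $1 \le i,j \le n$.
Two lists in shared memory: $S$ stores $(x_1, \dots, x_n)$ and $C$ stores $(c_1, \dots, c_n)$.
$c_i$ is the number of elements in $S$ smaller than $x_i$ (ties broken by index).
Then $x_i$ belongs in position $c_i + 1$ of the sorted list.
Each $P(i,j)$ compares $x_i$ and $x_j$ and writes 1 or 0 into $c_i$; the concurrent writes to $c_i$ are summed.

Time: $O(1)$. Processors: $n^2$. Cost: $O(n^2)$. This algorithm is not optimal.
If the cost were $O(n \log n)$, the algorithm would be optimal.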


Procedure CRCW SORT(S)
Step 1: for i = 1 to n doall
            for j = 1 to n doall
                if (x_i > x_j) or (x_i = x_j and i > j)
                    then P(i,j) writes 1 in c_i
                    else P(i,j) writes 0 in c_i
                end if
            end for
        end for
Step 2: for i = 1 to n doall
            P(i,1) stores x_i in position 1 + c_i of S
        end for
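The procedure can be simulated sequentially. In this sketch the $n$ simultaneous writes to each $c_i$ are combined by summation, as the CRCW write rule requires. Indices are 0-based, so $x_i$ goes to position $c_i$ rather than $1 + c_i$. The function name is ours.

```python
def crcw_sort(S):
    """Sequential simulation of CRCW SORT with n^2 processors P(i, j)."""
    n = len(S)
    # Step 1: P(i, j) writes 1 into c[i] if x_j must precede x_i, else 0.
    # Ties are broken by index, so equal keys get distinct positions.
    c = [0] * n
    for i in range(n):
        writes = [1 if (S[i] > S[j] or (S[i] == S[j] and i > j)) else 0
                  for j in range(n)]
        c[i] = sum(writes)  # summation rule for the n concurrent writes to c[i]
    # Step 2: P(i, 1) stores x_i at position c[i] (0-based).
    out = [None] * n
    for i in range(n):
        out[c[i]] = S[i]
    return out


print(crcw_sort([3, 1, 2, 1]))  # [1, 1, 2, 3]
```

Step 1 performs $n^2$ comparisons, which the model does in one time step. The simulation makes the $O(n^2)$ cost visible.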


Parallel algorithm on the CREW model.
Divide $S$ into subsets, and each processor sorts one subset; the sorted subsets are then merged.
Optimal algorithm.
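The divide-and-sort structure can be sketched as follows. The number of processors `p` and the function name are hypothetical. Each sorted subset stands for one processor's work. The final merge uses Python's sequential `heapq.merge`, which stands in for the parallel merge a CREW algorithm would use and does not model it.

```python
import heapq


def chunked_sort(S, p):
    """Split S into p nearly equal subsets, sort each one as a separate
    processor would, then merge the sorted runs (sequentially here)."""
    size = max(1, -(-len(S) // p))  # ceiling division; at least 1
    runs = [sorted(S[k:k + size]) for k in range(0, len(S), size)]
    return list(heapq.merge(*runs))


print(chunked_sort([9, 4, 7, 1, 8, 2, 6, 3], 4))  # [1, 2, 3, 4, 6, 7, 8, 9]
```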


A special-purpose parallel architecture designed for sorting (hardware sorter):
Specialized processors and custom-designed interconnection networks.

Odd-even sorting network
Very simple processor: a comparator.
Basic idea: merge sort.


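(n, n) merging network: merges two sorted lists of length n into one sorted list of length 2n.

(1, 1) merging network: a single comparator.
(2, 2) merging network: comparator P1 compares a1 and b1, and comparator P2 compares a2 and b2.
[Figure: (2, 2) merging network with inputs a1, a2, b1, b2, comparators P1, P2, P3, and outputs c1, c2, c3, c4.]
One more comparator, P3, compares the larger output of P1 with the smaller output of P2.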


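(n, n) merging network ($n$ a power of 2): recursive construction using two (n/2, n/2) merging networks.
The odd-indexed elements $a_1, a_3, \dots, a_{n-1}$ and $b_1, b_3, \dots, b_{n-1}$ are connected to the first merger, which outputs $d_1, \dots, d_n$.
The even-indexed elements $a_2, a_4, \dots, a_n$ and $b_2, b_4, \dots, b_n$ are connected to the second merger, which outputs $e_1, \dots, e_n$.
Additional comparators: $n - 1$ comparators $P_1, \dots, P_{n-1}$, where $P_i$ compares $d_{i+1}$ with $e_i$.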


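[Figure: the (2, 2) merging network (comparators P1, P2, P3) and the recursive (n, n) merging network: two (n/2, n/2) merging nets produce $d_1, \dots, d_n$ and $e_1, \dots, e_n$; comparator $P_i$ takes $d_{i+1}$ and $e_i$ and outputs $c_{2i}$ and $c_{2i+1}$; $c_1 = d_1$ and $c_{2n} = e_n$.]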
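The recursive construction can be written directly as code. This sketch follows Batcher's odd-even merge as described above. Python lists are 0-based, so `a[0::2]` holds the 1-based odd-indexed elements $a_1, a_3, \dots$. Comparator $P_i$ then sees `d[i]` and `e[i - 1]`. The function names are ours.

```python
def comparator(x, y):
    """A single comparator: outputs (min, max)."""
    return (x, y) if x <= y else (y, x)


def odd_even_merge(a, b):
    """(n, n) odd-even merging network, n a power of 2.
    a and b are sorted lists of length n; returns the sorted list c of length 2n."""
    n = len(a)
    assert len(b) == n and n > 0 and n & (n - 1) == 0
    if n == 1:
        return list(comparator(a[0], b[0]))      # (1, 1) merger = one comparator
    d = odd_even_merge(a[0::2], b[0::2])         # first merger: a1, a3, ... and b1, b3, ...
    e = odd_even_merge(a[1::2], b[1::2])         # second merger: a2, a4, ... and b2, b4, ...
    c = [d[0]]                                   # c1 = d1
    for i in range(1, n):                        # comparator P_i takes d_{i+1} and e_i
        c.extend(comparator(d[i], e[i - 1]))     # -> c_{2i}, c_{2i+1}
    c.append(e[-1])                              # c_{2n} = e_n
    return c


print(odd_even_merge([1, 4, 6, 7], [2, 3, 5, 8]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Every comparison goes through `comparator`, so the data flow is fixed in advance and does not depend on the input values. This is what allows the construction to be built in hardware.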


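Proof of correctness.

Note that the subsequences $d_1, \dots, d_n$ and $e_1, \dots, e_n$ are sorted, and we have $c_1 = d_1$ and $c_{2n} = e_n$: $d_1 = \min(a_1, b_1)$ is the min of all elements, and $e_n = \max(a_n, b_n)$ is the max of all elements.

Now we need to prove: $c_{2i} = \min(d_{i+1}, e_i)$ and $c_{2i+1} = \max(d_{i+1}, e_i)$ for $1 \le i \le n-1$.

Consider the sequence $d_1, \dots, d_{i+1}$. Suppose $k$ of its elements are in $\{a_1, a_3, \dots, a_{n-1}\}$. They must be the first $k$ of these elements, $a_1, a_3, \dots, a_{2k-1}$. Then the other $i+1-k$ elements must be the first $i+1-k$ elements in $\{b_1, b_3, \dots, b_{n-1}\}$, namely $b_1, b_3, \dots, b_{2(i+1-k)-1}$.

Look at the largest element, $d_{i+1}$. It is at least $a_{2k-1}$, which is greater than or equal to the $k-1$ elements $a_2, a_4, \dots, a_{2k-2}$. Similarly, $d_{i+1}$ is at least $b_{2(i+1-k)-1}$, which is greater than or equal to the $i-k$ elements $b_2, b_4, \dots, b_{2(i-k)}$. (If $k = 0$ or $k = i+1$, one of these counts is simply 0 and the total below only grows.) So at least $(k-1) + (i-k) = i-1$ elements of $e_1, \dots, e_n$ are at most $d_{i+1}$. Then we have
$e_{i-1} \le d_{i+1}$ for $2 \le i \le n-1$.

Similarly, consider $e_1, \dots, e_i$. Suppose $k$ of them are in $\{a_2, a_4, \dots, a_n\}$, so they are $a_2, \dots, a_{2k}$, and $i-k$ of them are in $\{b_2, b_4, \dots, b_n\}$, so they are $b_2, \dots, b_{2(i-k)}$. Then $e_i$ is at least $a_{2k}$, which is greater than or equal to the $k$ elements $a_1, a_3, \dots, a_{2k-1}$, and $e_i$ is at least $b_{2(i-k)}$, which is greater than or equal to the $i-k$ elements $b_1, b_3, \dots, b_{2(i-k)-1}$. So at least $i$ elements of $d_1, \dots, d_n$ are at most $e_i$. We have
$d_i \le e_i$ for $1 \le i \le n$.

Now let $X_i = \{d_1, \dots, d_i, e_1, \dots, e_{i-1}\}$, a set of $2i-1$ elements. For $1 \le i \le n-1$, the two inequalities and the sortedness of the $d$'s and $e$'s give $\max X_i = \max(d_i, e_{i-1}) \le \min(d_{i+1}, e_i)$, and every element outside $X_i$ is at least $\min(d_{i+1}, e_i)$. Since the elements of $X_i$ are therefore the $2i-1$ smallest elements, $X_i = \{c_1, \dots, c_{2i-1}\}$. This also holds for $i = n$, because the only element outside $X_n$ is $e_n$, the max of all elements.

Then, for $1 \le i \le n-1$, comparing $X_i$ with $X_{i+1} = X_i \cup \{d_{i+1}, e_i\}$ shows that $\{c_{2i}, c_{2i+1}\} = \{d_{i+1}, e_i\}$. Therefore $c_{2i} = \min(d_{i+1}, e_i)$ and $c_{2i+1} = \max(d_{i+1}, e_i)$, which is exactly what comparator $P_i$ outputs.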


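Analysis for the (n, n) merger:
Time: $O(\log n)$ (the network has $\log n + 1$ levels of comparators).
Processors: $O(n \log n)$ comparators ($n \log n + 1$ in total).
Cost: $O(n \log^2 n)$.
Not optimal ($O(n)$ is optimal).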
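The counts follow from two recurrences implied by the recursive construction. The comparator count satisfies $C(n) = 2C(n/2) + (n-1)$, since there are two sub-mergers plus $n-1$ extra comparators. The depth satisfies $D(n) = D(n/2) + 1$, since the sub-mergers run side by side and are followed by one extra layer. Both start from $C(1) = D(1) = 1$. The sketch evaluates them for powers of 2. The function name is ours.

```python
def merger_size(n):
    """(comparators, depth) of the (n, n) odd-even merger, n a power of 2."""
    if n == 1:
        return 1, 1                      # a single comparator
    c, d = merger_size(n // 2)
    return 2 * c + (n - 1), d + 1


for k in range(5):
    n = 2 ** k
    comps, depth = merger_size(n)
    # Closed forms: n*log2(n) + 1 comparators, log2(n) + 1 levels.
    print(n, comps, depth, comps * depth)  # last column: cost = processors x time
```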


Back to the odd-even sorting network:
Time: $O(\log^2 n)$.
Processors: $O(n \log^2 n)$.
Cost: $O(n \log^4 n)$.
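The sorting network applies the merge-sort idea with a merging network at each level. It sorts both halves recursively, in parallel in hardware, and then merges them with an $(n/2, n/2)$ merger. This is a sketch for input lengths that are powers of 2, with hypothetical function names. `sort_depth` adds the merger depths level by level, giving $\log n(\log n + 1)/2$, which is the $O(\log^2 n)$ time above.

```python
def comparator(x, y):
    return (x, y) if x <= y else (y, x)


def odd_even_merge(a, b):
    """(n, n) odd-even merger for sorted a, b of equal power-of-2 length."""
    n = len(a)
    if n == 1:
        return list(comparator(a[0], b[0]))
    d = odd_even_merge(a[0::2], b[0::2])
    e = odd_even_merge(a[1::2], b[1::2])
    c = [d[0]]
    for i in range(1, n):
        c.extend(comparator(d[i], e[i - 1]))
    return c + [e[-1]]


def odd_even_sort(xs):
    """Odd-even sorting network: sort both halves, then merge them."""
    n = len(xs)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of 2"
    if n == 1:
        return list(xs)
    half = n // 2
    return odd_even_merge(odd_even_sort(xs[:half]), odd_even_sort(xs[half:]))


def sort_depth(n):
    """Comparator levels of the sorting network; an (m, m) merger has log2(m) + 1."""
    if n == 1:
        return 0
    half = n // 2
    return sort_depth(half) + half.bit_length()  # bit_length() = log2(half) + 1


print(odd_even_sort([5, 7, 1, 3, 8, 2, 6, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8]
print(sort_depth(8))                            # 6 = 3 * 4 / 2
```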


Summary for sorting
Odd-even sorting network: $O(\log^2 n)$ time. Not optimal, but a practical network.
Sequential algorithm: $O(n \log n)$ time. Optimal.
The best parallel algorithm: AKS sorting network (CREW model), $O(\log n)$ time. Optimal, but with a very large hidden constant, and complex.
