
INTRODUCTION TO PARALLEL COMPUTING

B S RAMANJANEYULU
System Software Development Group,
CDAC, Bangalore.
Presentation Outline

• Need for Parallel Computing
• Requirements of Parallel Computing
• Parallel Computing Terminology
• Parallel computer architectures
• Designing parallel algorithms
• Architectural taxonomy (SISD, SIMD, MISD and MIMD)
• Symmetric multiprocessing (SMP)
• Clusters
• Parallel programming models
How to Run Applications Faster?

 There are three ways to improve performance:
• Work harder
• Work smarter
• Get help (multiple workers)

 Computer analogy:
• Use faster hardware: e.g. reduce the time per instruction
• Use optimized algorithms and techniques
• Use multiple computers to solve the problem
Sequential vs. Parallel

[Figure: sequential execution vs. parallel execution]
Sequential vs. Parallel (Contd…)

 Traditional sequential programs execute one instruction at a time using one processor
 Parallelism means executing tasks simultaneously (on multiple processors) to complete the job faster
 Parallelism is achieved by:
− Breaking up the task into smaller tasks
− Assigning the smaller tasks to multiple workers (processors) to work on simultaneously
− Coordinating the workers (processors)
 Parallel problem solving is natural. Examples: building construction; automobile manufacturing
The Need For Faster Machines

 Grand Challenge Problems:
 Climate Modeling
 Computational Fluid Dynamics
 Combustion Systems
 Human Genome
 Structural Mechanics
 Molecular Modeling
 Astrophysical Calculations
 Seismic Data Processing
Data Parallelism

 All the CPUs run the same code block, each on a different portion of the data.

Example:

if CPU = "1" then
    start = 1
    end = 50
else if CPU = "2" then
    start = 51
    end = 100
end if

do i = start, end
    Task on d(i)
end do
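The pseudocode above can be sketched in runnable Python (an illustrative sketch, not MPI code; names such as `work_on_slice` are hypothetical). Each worker executes the same code on its own slice of the data:

```python
from concurrent.futures import ThreadPoolExecutor

d = list(range(1, 101))  # the data array d(1..100)

def work_on_slice(bounds):
    # Every worker runs the SAME code on a DIFFERENT slice of d.
    start, end = bounds
    return sum(x * x for x in d[start:end])  # "Task on d(i)": square and sum

# Worker 1 handles elements 1..50, worker 2 handles 51..100 (0-based slices).
slices = [(0, 50), (50, 100)]
with ThreadPoolExecutor(max_workers=2) as pool:
    partials = list(pool.map(work_on_slice, slices))

total = sum(partials)  # combine the partial results
```

Threads merely stand in for CPUs here; for CPU-bound work in Python one would use processes instead, because of the global interpreter lock.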
Task Parallelism

 Multiple tasks executing concurrently is called task parallelism.
 All the CPUs execute separate code blocks simultaneously.

Example:

if CPU = "1" then
    do "Task 1"
else if CPU = "2" then
    do "Task 2"
end if
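The same idea in runnable form, as a minimal Python sketch (the names `task1`/`task2` are hypothetical): two workers execute different code blocks concurrently.

```python
import threading

results = {}

def task1():
    results["task1"] = sum(range(10))        # one worker runs "Task 1"

def task2():
    results["task2"] = max([3, 1, 4, 1, 5])  # another worker runs "Task 2"

# Start both tasks concurrently, then wait for both to finish.
t1 = threading.Thread(target=task1)
t2 = threading.Thread(target=task2)
t1.start(); t2.start()
t1.join(); t2.join()
```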
Definition

 From the computer architecture point of view, a parallel computer is a "collection of processing elements that communicate and co-operate to solve large problems fast".
 When this architecture is combined with a parallel algorithm, we get a 'parallel computing system'.
Sequential vs. Parallel Computing

SEQUENTIAL COMPUTING
 Fetch/Store
 Compute

PARALLEL COMPUTING
 Fetch/Store
 Compute
 Communicate
Execution Time

• Sequential system
– Execution time is a function of the size of the input
• Parallel system
– Execution time is a function of the input size and the number of processors used
Terminology of Parallel Computing

Speedup: Speedup 'Sp' is defined as the ratio of the runtime of the best sequential algorithm for solving a problem to the time taken by the parallel algorithm to solve the same problem on 'p' processors:

Sp = T(seq) / T(parallel)

The 'p' processors used by the parallel algorithm are assumed to be identical to the one used by the sequential algorithm.

Efficiency: Ratio of speedup to the number of processors:

E = Sp / p
Terminology of Parallel Computing (Contd…)

Throughput (in FLOPS): Obtained by taking the clock rate of the given system and dividing it by the number of clock cycles a floating-point instruction requires.

Cost: The cost of solving a problem on a parallel system is the product of the parallel runtime and the number of processors used, i.e., C = p × Tp
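The metrics above can be put together in a short worked example (the timings used are hypothetical, purely for illustration):

```python
def speedup(t_seq, t_par):
    # Sp = T(seq) / T(parallel)
    return t_seq / t_par

def efficiency(s_p, p):
    # E = Sp / p
    return s_p / p

def cost(t_par, p):
    # C = p * Tp : parallel runtime times number of processors
    return p * t_par

# Hypothetical timings: 100 s sequentially, 30 s on 4 processors.
s = speedup(100.0, 30.0)   # ~3.33x faster
e = efficiency(s, 4)       # ~0.83: each processor is ~83% utilized
c = cost(30.0, 4)          # 120 processor-seconds vs. 100 sequentially
```

Note that the cost (120) exceeds the sequential runtime (100) whenever efficiency is below 1 — the parallel system does extra total work to finish sooner.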
Requirements for Parallel Computing

 Multiple processors (the workers)
 Network (the link between workers)
 OS support
Requirements for Parallel Computing (Contd…)

Parallel Programming Paradigms:
 Message Passing (MPI, PVM)
 Data Parallel (Fortran 90 / High Performance Fortran)
 Multi-Threading
 Hybrid
 Others (OpenMP, shmem)

 Decomposition of the problem into pieces that multiple workers can perform.
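The message-passing paradigm can be sketched with a thread-safe queue standing in for the network link (an illustration only — MPI and PVM pass messages between separate processes or nodes, not threads):

```python
import threading
import queue

inbox = queue.Queue()   # stands in for the communication channel

def sender():
    for i in range(3):
        inbox.put(("data", i))    # explicit "send" of a tagged message
    inbox.put(("stop", None))     # tell the receiver to finish

def receiver(out):
    while True:
        tag, payload = inbox.get()   # explicit, blocking "receive"
        if tag == "stop":
            break
        out.append(payload)

received = []
t_send = threading.Thread(target=sender)
t_recv = threading.Thread(target=receiver, args=(received,))
t_send.start(); t_recv.start()
t_send.join(); t_recv.join()
```

The key property shown is that the workers share no variables directly; all cooperation happens through explicit send/receive operations.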
Issues in Parallel Computing

• Parallel computer architectures
• Efficient parallel algorithms
• Parallel programming models
• Parallel computer languages
• Methods for evaluating parallel algorithms
• Parallel programming tools
Designing Parallel Algorithms

 Detect and exploit any inherent parallelism in an existing sequential algorithm

 Invent a new parallel algorithm

 Adapt another parallel algorithm that solves a similar problem
Decomposition Techniques

 The process of splitting the computations in a problem into a set of concurrent tasks is referred to as decomposition.

 Decomposing a problem effectively is of paramount importance in parallel computing.

 Without a good decomposition, we may not be able to achieve a high degree of concurrency.

 The decomposition must also ensure good load balance.
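A minimal decomposition sketch in Python, assuming a simple summation problem (the helper name `decompose` is hypothetical): the index range is split into near-equal chunks for load balance, and the chunks are processed concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

def decompose(n_items, n_tasks):
    # Split indices 0..n_items-1 into n_tasks near-equal chunks,
    # so no task gets much more work than another (load balance).
    base, extra = divmod(n_items, n_tasks)
    chunks, start = [], 0
    for t in range(n_tasks):
        size = base + (1 if t < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks

chunks = decompose(10, 3)   # chunk sizes 4, 3, 3 — balanced
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(lambda c: sum(c), chunks))
total = sum(partials)       # combining step: equals sum(range(10))
```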
Decomposition Techniques (Contd…)

What is meant by a good decomposition?

 It should lead to a high degree of concurrency (fine granularity).

 The interaction among tasks should be as little as possible (coarse granularity).

 A good decomposition balances these two competing goals.

• The ratio between computation and communication is known as granularity.
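Granularity can be illustrated with a toy cost model (the per-item and per-message costs below are hypothetical numbers, not measurements):

```python
def granularity(chunk_size, compute_per_item=1.0, comm_per_message=10.0):
    # Each task computes chunk_size items, then exchanges ONE message.
    # Granularity = computation / communication.
    computation = chunk_size * compute_per_item
    communication = comm_per_message
    return computation / communication

fine = granularity(5)      # 0.5  -> communication dominates (too fine)
coarse = granularity(500)  # 50.0 -> computation dominates (coarse)
```

With very small chunks the fixed communication cost swamps the useful work, which is why a good decomposition keeps tasks as independent (coarse) as the desired degree of concurrency allows.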
Success depends on the combination of

 Architecture, compiler, choice of the right algorithm

 Portability, maintainability, and efficient implementation
Architectural Taxonomy

Flynn's taxonomy classifies computers by the relationship of program instructions to program data. The four categories are:
 SISD – Single Instruction, Single Data Stream
 SIMD – Single Instruction, Multiple Data Stream
 MISD – Multiple Instruction, Single Data Stream (no practical examples)
 MIMD – Multiple Instruction, Multiple Data Stream
SISD Model features
 Not a parallel computer
 Conventional serial, scalar von Neumann computer
 A single instruction is issued in each clock cycle
 Each instruction operates on a single (scalar) data element
 Performance measured in MIPS
 Examples: most PCs and single-CPU workstations
SIMD Model features

 Also von Neumann architectures, but with more powerful instructions
 Each instruction may operate on more than one data element
 Usually an intermediate host executes the program logic and broadcasts instructions to the other processors
 Examples: array processors and vector processors (used in the supercomputers of the 1970s and 80s)
MIMD Model features

 Parallelism achieved by connecting multiple processors together
 Each processor executes its own instruction stream, independent of the other processors, on a unique data stream
 Advantages
 Processors can execute multiple job streams simultaneously
 Each processor can perform any operation regardless of what other processors are doing
 Disadvantages
 Load balancing overhead – synchronization needed to coordinate processors at the end of a parallel structure in a single application
 Can be difficult to program
MIMD Block Diagram

[Figure: block diagram of a MIMD system]
MIMD Classification

[Figure: classification of MIMD systems]
Parallel Computer Architecture Memory Models

[Figure: shared memory, distributed memory, and hybrid memory models]
Symmetric Multiprocessors (SMP)

[Figure: SMP architecture]
Symmetric Multiprocessors (SMP) (Contd…)

• Uses commodity microprocessors with on-chip and off-chip cache
• Processors are connected to a shared memory through a high-speed bus
• Single address space
• Easy application development
• Difficult to scale
• Difficult to repair/replace a faulty node (when compared to clusters)
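The single-address-space property can be illustrated with Python threads, which share one address space the way SMP processors share memory; a lock provides the synchronization needed to avoid lost updates (a minimal sketch, with hypothetical names):

```python
import threading

counter = 0                 # lives in the single shared address space
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        with lock:          # synchronize access to the shared variable
            counter += 1

# Four "processors" updating the same shared memory location.
threads = [threading.Thread(target=add_many, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, concurrent read-modify-write sequences could interleave and drop updates — the classic hazard of shared-memory programming.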
SMP, MPP and clusters

[Figure: comparison of SMP, MPP and cluster architectures]
Competing Architectures

• Massively Parallel Processors (MPP) – proprietary systems built for specific purposes
– high cost and a low performance/price ratio
• Symmetric Multiprocessors (SMP)
– suffers from poor scalability
• Distributed Systems
– difficult to extract high performance
• Clusters
– High Performance Computing – with commodity processors
– High Availability Computing – for critical applications
What is a Cluster?

 A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone/complete computers cooperatively working together as a single, integrated computing resource.

 A typical cluster features:
• A faster, closer connection network than a typical LAN
• Low-latency communication protocols
• A looser coupling than SMP
Motivation for using Clusters

 The communications bandwidth between workstations is increasing as new networking technologies and protocols are implemented in LANs and WANs.

 Workstation clusters are easier to integrate into existing networks than special-purpose parallel computers.
Cluster Computer Architecture

[Figure: cluster computer architecture]
Components of Cluster Computers
• Multiple High Performance Computers
– PCs
– Workstations
– SMPs
• State-of-the-art Operating Systems
– Layered
– Micro-kernel based
• High Performance Networks/Switches
– Gigabit Ethernet
– PARAMNet
– Myrinet
• Network Interface Cards (NICs)
• Fast Communication Protocols and Services
– Active Messages (AM)
– Virtual Interface Architecture (VIA)
Components of Cluster Computers (Contd…)

• Parallel Programming Environments and Tools
– Compilers
– PVM [Parallel Virtual Machine]
– MPI [Message Passing Interface]
• Applications
– Sequential
– Parallel or Distributed
Parallel programming models – MPI, PVM and OpenMP

• MPI – Message Passing Interface
• PVM – Parallel Virtual Machine
• Both MPI and PVM are based on the message-passing mechanism.
• Both MPI and PVM can be used with shared-memory and distributed-memory architectures.
• MPI
– MPI is mainly for data-parallel problems.
– Collective and asynchronous operations are more powerful in MPI.
• OpenMP – Open Multiprocessing
– OpenMP is thread-based multiprocessing.
– OpenMP is more suitable for SMP systems.
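OpenMP's fork-join, parallel-for style can be sketched with Python threads (a hypothetical illustration only — real OpenMP works through compiler directives such as `#pragma omp parallel for` in C/Fortran, and `parallel_for` below is an invented helper):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_for(func, n, num_threads=4):
    # Fork: distribute the loop iterations 0..n-1 across a thread team,
    # analogous to an OpenMP parallel-for region.
    # Join: the 'with' block waits for all threads before continuing,
    # like the implicit barrier at the end of an OpenMP region.
    with ThreadPoolExecutor(max_workers=num_threads) as team:
        return list(team.map(func, range(n)))

squares = parallel_for(lambda i: i * i, 8)
```

This captures the shared-memory thread model the slide describes: all iterations see the same address space, and the master resumes only after the team finishes.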


Features of CDAC's PARAM Supercomputers

 Distributed memory at the system level and shared memory at the node level.

 Nodes connected by low-latency, high-throughput System Area Networks: PARAMNet and Fast/Gigabit Ethernet.

 Standard Message Passing Interface (MPI) implementations: SUN MPI, IBM MPI, public-domain MPI and C-DAC's own MPI (CMPI).

 C-DAC's High Performance Computing and Communication (HPCC) software for parallel program development and runtime support.
References

• http://www.llnl.gov/computing/tutorials/parallel_comp/
• Tutorials located in the Maui High Performance Computing Center's "SP Parallel Programming Workshop".
• Linux Parallel Processing HOWTO, from http://www.tldp.org/HOWTO/Parallel-Processing-HOWTO.html
Thank you.