Arch

Computer Architecture & Related Topics
Ben Schrooten, Shawn Borchardt, Eddie Willett, Vandana Chopra
Presentation Topics

Computer Architecture History
Single CPU Design
GPU Design (Brief)
Memory Architecture
Communications Architecture
Dual Processor Design
Parallel & Supercomputing Design
Part 1: History and Single CPU
Ben Schrooten
HISTORY!!!
One of the first computing devices to come about was . .
The ABACUS!
The ENIAC : 1946
Completed: 1946
Programmed: plug board and switches
Speed: 5,000 operations per second
Input/output: cards, lights, switches, plugs
Floor space: 1,000 square feet
The EDSAC(1949) and The UNIVAC I(1951)

EDSAC
Technology: vacuum tubes
Memory: 1K words
Speed: 714 operations per second
First practical stored-program computer

UNIVAC I
Speed: 1,905 operations per second
Input/output: magnetic tape, unityper, printer
Memory size: 1,000 12-digit words in delay lines
Memory type: delay lines, magnetic tape
Technology: serial vacuum tubes, delay lines, magnetic tape
Floor space: 943 cubic feet
Cost: F.O.B. factory $750,000 plus $185,000 for a high speed printer
Intel 4004 1971
Progression of The Architecture

Vacuum tubes -- 1940-1950
Transistors -- 1950-1964
Integrated circuits -- 1964-1971
Microprocessor chips -- 1971-present
Current CPU Architecture
Basic CPU Overview
Single Bus Slow Performance
Example of Triple Bus Architecture
Motherboards / Chipsets / Sockets

OH MY!
Chipset
In charge of:
Memory Controller
EIDE Controller
PCI Bridge
Real Time Clock
DMA Controller
IrDA Controller
Keyboard
Mouse
Secondary Cache
Low-Power CMOS SRAM
Sockets
Socket 4 & 5
Socket 7
Socket 8
Slot 1
Slot A
GPUs
Allow real-time rendering of graphics on a small PC
GPUs are true processing units
Pentium 4 contains 42 million transistors on a 0.18 micron process
GeForce3 contains 57 million transistors on a 0.15 micron manufacturing process
More GPU
Sources
Source for DX4100 Picture: Oneironaut, http://oneironaut.tripod.com/dx4100.jpg
Source for Computer Architecture Overview Picture: http://www.eecs.tulane.edu/courses/cpen201/slides/201Intro.pdf
Pictures of CPU Overview, Single Bus Architecture, Triple Bus Architecture: Roy M. Wnek, Virginia Tech, CS5515 Lecture 5, http://www.nvc.cs.vt.edu/~wnek/cs5515/slide/Grad_Arch_5.PDF
Historical Data and Pictures: The Computer Museum History Center, http://www.computerhistory.org/
Intel Motherboard Diagram/Pentium 4 Picture: Intel Corporation, http://www.intel.com
The Abacus: Abacus-Online-Museum, http://www.hh.schule.de/metalltechnikdidaktik/users/luetjens/abakus/china/china.htm
Information also from: Clint Fleri, http://www.geocities.com/cfleri/
Memory Functionality: Dana Angluin, http://zoo.cs.yale.edu/classes/cs201/Fall_2001/handouts/lecture-13/node4.html
Benchmark Graphics: Digital Life, http://www.digit-life.com/articles/pentium4/index3.html
Chipset and Socket Information: Motherboards.org, http://www.motherboards.org/articlesd/techplanations/17_2.html
AMD Processor Pictures: Tom's Hardware, http://www6.tomshardware.com/search/search.html?category=all&words=Athlon
GPU Info: 4th Wave Inc., http://www.wave-report.com/tutorials/gpu.htm
NV20 Design Pictures: Digital Life, http://www.digit-life.com/articles/nv20/
Main Memory
Memory Hierarchy
DRAM vs. SRAM

DRAM is short for Dynamic Random Access Memory.
SRAM is short for Static Random Access Memory.
DRAM is dynamic in that, unlike SRAM, its storage cells must be refreshed (given a new electronic charge) every few milliseconds.
SRAM does not need refreshing because it operates on the principle of moving current that is switched in one of two directions, rather than a storage cell that holds a charge in place.
Parity vs. Non-Parity

Parity is an error-detection scheme developed to notify the user of data errors.
A single bit is added to each byte of data; this bit checks the integrity of the other 8 bits while the byte is moved or stored.
Since memory errors are so rare, much of today's memory is non-parity.
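The parity check described above can be sketched in a few lines of Python. This is only an illustration: the slide does not say whether even or odd parity is meant, so even parity is assumed here, and the names parity_bit and check are made up for the example.

```python
def parity_bit(byte: int) -> int:
    """Return the even-parity bit for an 8-bit value (assumption: even parity)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    # The parity bit is 1 when the data holds an odd number of 1 bits,
    # so that data plus parity always carries an even number of 1s.
    return bin(byte).count("1") % 2


def check(byte: int, stored_parity: int) -> bool:
    """True if the byte still matches the parity bit stored alongside it."""
    return parity_bit(byte) == stored_parity


data = 0b10110010                 # four 1 bits
p = parity_bit(data)              # parity bit stored with the byte
one_flip = data ^ 0b00000100      # a single bit flipped in storage
two_flips = data ^ 0b00000011     # two bits flipped in storage

print(check(data, p))       # intact byte passes the check
print(check(one_flip, p))   # a single-bit error is detected
print(check(two_flips, p))  # a double-bit error goes unnoticed
```

A single parity bit only detects an odd number of flipped bits and cannot say which bit is wrong, which is why it is described as error detection rather than correction.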
SIMM vs. DIMM vs. RIMM?

SIMM - Single In-line Memory Module
DIMM - Dual In-line Memory Module
RIMM - Rambus In-line Memory Module
SIMMs offer a 32-bit data path, while DIMMs offer a 64-bit data path.
SIMMs have to be used in pairs on Pentiums and more recent processors.
RIMM is one of the latest designs. Because of the fast data transfer rate of these modules, a heat spreader (an aluminum plate covering) is used on each module.
Evolution of Memory
Year        Type          Speed
1970        RAM / DRAM    4.77 MHz
1987        FPM           20 MHz
1995        EDO           20 MHz
1997        PC66 SDRAM    66 MHz
1998        PC100 SDRAM   100 MHz
1999        RDRAM         800 MHz
1999/2000   PC133 SDRAM   133 MHz
2000        DDR SDRAM     266 MHz
2001        EDRAM         450 MHz
FPM - Fast Page Mode DRAM: traditional DRAM
EDO - Extended Data Output: shortens the read cycle between memory and the CPU
SDRAM - Synchronous DRAM: synchronizes itself with the CPU bus and runs at higher clock speeds
RDRAM - Rambus DRAM: DRAM with a very high bandwidth (1.6 GBps)
EDRAM - Enhanced DRAM: dynamic (power-refreshed) RAM that includes a small amount of static RAM (SRAM) inside a larger amount of DRAM, so that many memory accesses go to the faster SRAM. EDRAM is sometimes used as L1 and L2 memory and, together with Enhanced Synchronous DRAM, is known as cached DRAM.
Read Operation
On a read, the CPU first tries to find the data in the cache. If it is not there, the cache is updated from main memory and then returns the data to the CPU.
Write Operation
On a write, the CPU writes the information into both the cache and main memory.
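The read and write behaviour described on these two slides (fill the cache on a miss, write to cache and memory together) can be sketched as a small Python model. The class name WriteThroughCache and its hit/miss counters are illustrative, and the sketch assumes a cache with unlimited capacity, so no lines are ever evicted.

```python
class WriteThroughCache:
    """Toy cache in front of a dict that stands in for main memory."""

    def __init__(self, main_memory: dict):
        self.memory = main_memory   # address -> value
        self.lines = {}             # cached copies (assumption: never evicted)
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            self.hits += 1
        else:
            # Miss: update the cache from main memory first.
            self.misses += 1
            self.lines[address] = self.memory[address]
        # Either way the CPU is served from the cache.
        return self.lines[address]

    def write(self, address, value):
        # The write goes to both the cache and main memory.
        self.lines[address] = value
        self.memory[address] = value


memory = {0x10: 7}
cache = WriteThroughCache(memory)
first = cache.read(0x10)    # miss, filled from main memory
second = cache.read(0x10)   # hit
cache.write(0x10, 9)        # cache and main memory both updated
print(first, second, memory[0x10], cache.hits, cache.misses)
```

Writing to both places keeps main memory current at the price of a memory access on every write.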
References

http://www-ece.ucsd.edu/~weathers/ece30/downloads/Ch7_memory(4x).pdf
http://home.cfl.rr.com/bjp/eric/ComputerMemory.html
http://aggregate.org/EE380/JEL/ch1.pdf
Defining a Bus

A parallel circuit that connects the major components of a computer, allowing the transfer of electric impulses from one connected component to any other
VESA - Video Electronics Standards Association

32-bit bus
Found mostly on 486 machines
Relied on the 486 processor to function
People started to switch to the PCI bus because of this
Otherwise known as VLB (VESA Local Bus)
ISA - Industry Standard Architecture

Very old technology
Bus speed of 8 MHz
Maximum speed of 42.4 Mb/s
Very few ISA ports are found in modern machines
MCA - Micro Channel Bus

IBM's attempt to compete with the ISA bus
32-bit bus
Automatically configured cards (like Plug and Play)
Not compatible with ISA
EISA - Extended Industry Standard Architecture

Attempt to compete with IBM's MCA bus
Ran on an 8.33 MHz cycle rate
32-bit slots
Backward compatible with ISA
Went the way of MCA
PCI - Peripheral Component Interconnect

Speeds up to 960 Mb/s
Bus speed of 33 MHz
32-bit architecture
Developed by Intel in 1993
Synchronous or asynchronous
PCI popularized Plug and Play
Runs at half of the system bus speed
PCI-X

Up to 133 MHz bus speed
64-bit bandwidth
1 GB/sec throughput
Backwards compatible with all PCI
Primarily developed for the increased I/O demands of technologies such as Fibre Channel, Gigabit Ethernet and Ultra3 SCSI
AGP - Accelerated Graphics Port

Essentially a high-speed PCI port
Capable of running at 4 times PCI bus speed (133 MHz)
Used for high-speed 3D graphics cards
Considered a port, not a bus
  Only two devices involved
  Not expandable
Bus          Width (bits)   Bus Speed (MHz)   Bus Bandwidth (MBytes/sec)
8-bit ISA    8              8.3               7.9
16-bit ISA   16             8.3               15.9
EISA         32             8.3               31.8
VLB          32             33                127.2
PCI          32             33                127.2
AGP          32             66                254.3
AGP(X2)      32             66 x 2            508.6
AGP(X4)      32             66 x 4            1017.3
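The bandwidth column of the bus comparison above follows from width times clock. The Python sketch below reproduces those figures under two assumptions inferred from the numbers rather than stated on the slide: the nominal clocks are 8.33, 33.33 and 66.67 MHz, and one MByte is 2^20 bytes. The function name bus_bandwidth is illustrative.

```python
MBYTE = 1024 * 1024  # assumption: the table counts a MByte as 2**20 bytes


def bus_bandwidth(width_bits, clock_mhz, transfers_per_clock=1):
    """Peak bandwidth in MBytes/sec for a parallel bus."""
    bytes_per_transfer = width_bits / 8
    transfers_per_second = clock_mhz * 1_000_000 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / MBYTE


# (name, width in bits, clock in MHz, transfers per clock)
# Clocks are written as fractions: 25/3 = 8.33, 100/3 = 33.33, 200/3 = 66.67.
buses = [
    ("8-bit ISA", 8, 25 / 3, 1),
    ("16-bit ISA", 16, 25 / 3, 1),
    ("EISA", 32, 25 / 3, 1),
    ("VLB", 32, 100 / 3, 1),
    ("PCI", 32, 100 / 3, 1),
    ("AGP", 32, 200 / 3, 1),
    ("AGP(X2)", 32, 200 / 3, 2),
    ("AGP(X4)", 32, 200 / 3, 4),
]

for name, width, clock, multiplier in buses:
    print(f"{name:10s} {bus_bandwidth(width, clock, multiplier):7.1f} MBytes/sec")
```

These are peak figures; they leave out arbitration and protocol overhead, so sustained throughput on a real bus is lower.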
IDE - Integrated Drive Electronics

Tons of other names: ATA, ATA/ATAPI, EIDE, ATA-2, Fast ATA, ATA-3, Ultra ATA, Ultra DMA
Good performance at a cheap cost
Most widely used interface for hard disks
SCSI - Small Computer System Interface (pronounced "skuzzy")

Capable of handling internal/external peripherals
Speed anywhere from 80 to 640 Mb/s
Many types of SCSI
Type               Max. Bus Speed (MBytes/sec)   Bus Width (bits)   Max. Device Support
SCSI-1             5                             8                  8
Fast SCSI          10                            8                  8
Fast Wide SCSI     20                            16                 16
Ultra SCSI         20                            8                  8
Ultra Wide SCSI    40                            16                 16
Ultra2 SCSI        40                            8                  8
Wide Ultra2 SCSI   80                            16                 16
Ultra3 SCSI        160                           16                 16
Ultra320 SCSI      320                           16                 16
Serial Port

Uses a DB9 or DB25 connector
Adheres to the RS-232C spec
Capable of speeds up to 115 kb/sec
USB

1.0

Hot plug-and-play
Full-speed USB devices signal at 12 Mb/s
Low-speed devices use a 1.5 Mb/s subchannel
Up to 127 devices chained together

2.0

Data rate of 480 megabits per second
USB On-The-Go

For portable devices
Limited host capability to communicate with selected other USB peripherals
A small USB connector to fit the mobile form factor
FireWire, i.e. IEEE 1394 and i.LINK

High-speed serial port
400 Mbps transfer rate
30 times faster than USB 1.0
Hot plug-and-play
PS/2 Port

Mini-DIN plug with 6 pins
Mouse port and keyboard port
Developed by IBM

Parallel port, i.e. printer port

Old type
Two new types: ECP (extended capabilities port) and EPP (enhanced parallel port)
  Ten times faster than the old parallel port
  Capable of bi-directional communication
Game Port

Uses a DB15 port
Used for joystick connection to the computer
Parallel Computer Architecture

By Vandana Chopra
Need for High Performance Computing

There is a need for tremendous computational capabilities in science, engineering and business.
There are applications that require gigabytes of memory and gigaflops of performance.
What is a High Performance Computer

Definition of a high performance computer: an HPC computer can solve large problems in a reasonable amount of time.
Characteristics:
Fast computation
Large memory
High-speed interconnect
High-speed input/output
How is an HPC computer made to go fast

Make the sequential computation faster
Do more things in parallel
Applications
1> Weather Prediction
2> Aircraft and Automobile Design
3> Artificial Intelligence
4> Entertainment Industry
5> Military Applications
6> Financial Analysis
7> Seismic Exploration
8> Automobile Crash Testing
Who Makes High Performance Computers

* SGI/Cray
  Power Challenge Array
  Origin-2000
  T3D/T3E
* HP/Convex
  SPP-1200
  SPP-2000
* IBM
  SP2
* Tandem
Trends in Computer Design

The performance of the fastest computer has grown exponentially from 1945 to the present, averaging a factor of 10 every five years.
The growth flattened somewhat in the 1980s but is accelerating again as massively parallel computers become available.
Increase in the Number of Processors
Real World Sequential Processes

Sequential processes we find in the world. The passage of time is a classic example of a sequential process.
Day breaks as the sun rises in the morning. Daytime has its sunlight and bright sky. Dusk sees the sun setting in the horizon. Nighttime descends with its moonlight, dark sky and stars.
Parallel Processes
Music: an orchestra performance, where every instrument plays its own part, and playing together they make beautiful music.
Parallel Features of Computers

Various methods available on computers for doing work in parallel are:
Computing environment
Operating system
Memory
Disk
Arithmetic
Computing Environment - Parallel Features

Using a timesharing environment
The computer's resources are shared among many users who are logged in simultaneously. Your process uses the CPU for a time slice, and then is rolled out while another user's process is allowed to compute. The opposite of this is dedicated mode, where yours is the only job running.
The computer overlaps computation and I/O

While one process is writing to disk, the computer lets another process do some computation
Operating System - Parallel Features

Using the UNIX background processing facility
a.out > results &
man etime
Using the UNIX Cron jobs feature

You submit a job that will run at a later time. Then you can play tennis while the computer continues to work. This overlaps your computer work with your personal time.
Memory - Parallel Features

Memory Interleaving
Memory is divided into multiple banks, and consecutive data elements are interleaved among them. There are multiple ports to memory. When the data elements that are spread across the banks are needed, they can be accessed and fetched in parallel. The memory interleaving increases the memory bandwidth.
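The bank layout described above can be sketched as a simple address mapping. The Python below assumes low-order interleaving (consecutive word addresses rotate through the banks) and four banks; the slide fixes neither choice, and the function name bank_of is illustrative.

```python
def bank_of(address, num_banks):
    """Map a word address to (bank, offset inside that bank).

    Assumes low-order interleaving: consecutive addresses go to
    consecutive banks, wrapping around after the last bank.
    """
    return address % num_banks, address // num_banks


NUM_BANKS = 4  # assumption for the example

for address in range(8):
    bank, offset = bank_of(address, NUM_BANKS)
    print(f"address {address} -> bank {bank}, offset {offset}")
```

With this mapping, any four consecutive elements sit in four different banks, so they can be fetched at the same time instead of queuing on a single bank.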
Memory - Parallel Features(Cont)

Multiple levels of the memory hierarchy

Global memory which any processor can access
Memory local to a partition of the processors
Memory local to a single processor:
  cache memory
  memory elements held in registers
Disk - Parallel Features

RAID disk
Redundant Array of Inexpensive Disks
Striped disk
When a dataset is written to disk, it is broken into pieces which are written simultaneously to different disks in a RAID disk system. When the same dataset is read back in, the pieces of the dataset are read in parallel, and the original dataset is reassembled in memory.
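The striping behaviour described above can be sketched in Python as a round-robin split and reassembly. The chunk size, the number of disks and the names stripe and reassemble are assumptions for illustration. The sketch models the data layout only: it loops sequentially where a real array would work on the disks in parallel, and it stores no redundancy.

```python
def stripe(data: bytes, num_disks: int, chunk_size: int) -> list:
    """Split data into chunks and deal them round-robin onto the disks."""
    disks = [[] for _ in range(num_disks)]
    for start in range(0, len(data), chunk_size):
        chunk_index = start // chunk_size
        disks[chunk_index % num_disks].append(data[start:start + chunk_size])
    return disks


def reassemble(disks: list) -> bytes:
    """Rebuild the original data by reading one chunk per disk, row by row."""
    pieces = []
    rows = max(len(disk) for disk in disks)
    for row in range(rows):
        for disk in disks:
            if row < len(disk):   # later disks may hold one chunk fewer
                pieces.append(disk[row])
    return b"".join(pieces)


dataset = b"ABCDEFGHIJ"
disks = stripe(dataset, num_disks=3, chunk_size=2)
print(disks)
print(reassemble(disks))
```

Because each disk holds only every third chunk here, a read of the whole dataset can in principle keep all three disks busy at once.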
Arithmetic - Parallel Features

We will examine the following features that lend themselves to parallel arithmetic:
Multiple Functional Units
Super Scalar arithmetic
Instruction Pipelining
Parallel Machine Model (Architectures)

von Neumann Computer
MultiComputer

A multicomputer comprises a number of von Neumann computers, or nodes, linked by an interconnection network.
In an idealized network, the cost of sending a message between two nodes is independent of both node location and other network traffic, but does depend on message length.
Locality
Scalability
Concurrency
Distributed Memory (MIMD)
MIMD means that each processor can execute a separate stream of instructions on its own local data. Distributed memory means that memory is distributed among the processors rather than placed in a central location.
Difference between a multicomputer and a distributed-memory MIMD machine: in the MIMD machine, the cost of sending a message between two nodes is not independent of node location and other network traffic.
Examples of MIMD machine
MultiProcessor or Shared Memory MIMD

All processors share access to a common memory via bus or hierarchy of buses
Example for Shared Memory MIMD

Silicon Graphics Challenge
SIMD Machines

All processors execute the same instruction stream on a different piece of data
Example of SIMD machine:

MasPar MP
Use of Cache
Why is cache used on parallel computers?
Advances in memory technology aren't keeping up with processor innovations. Memory isn't speeding up as fast as the processors. One way to alleviate the performance gap between main memory and the processors is to have a local cache. Cache memory can be accessed faster than main memory. Cache keeps up with the fast processors and keeps them busy with data.
Shared Memory
[Figure: a network connecting three nodes; each node is labeled with a cache, a memory (Memory 1 to 3) and a processor (processor 1 to 3)]
Cache Coherence
What is cache coherence?
Cache coherence keeps the copies of a data element found in several caches current with each other and with the value in main memory.
Various cache coherence protocols are used:
snoopy protocol
directory-based protocol
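One way to picture a snoopy protocol is the Python sketch below. It is a deliberately simplified illustration, not the protocol of any particular machine: it assumes a write-invalidate policy with write-through to memory, and the class names Bus and SnoopyCache are made up for the example.

```python
class Bus:
    """Shared bus that every cache watches ("snoops")."""

    def __init__(self, memory: dict):
        self.memory = memory
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, address, writer):
        # Every other cache sees the write on the bus and drops its copy.
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(address, None)


class SnoopyCache:
    def __init__(self, bus: Bus):
        self.bus = bus
        self.lines = {}
        bus.attach(self)

    def read(self, address):
        if address not in self.lines:
            self.lines[address] = self.bus.memory[address]
        return self.lines[address]

    def write(self, address, value):
        self.bus.broadcast_invalidate(address, self)
        self.lines[address] = value
        self.bus.memory[address] = value   # assumption: write-through


memory = {0: 1}
bus = Bus(memory)
cache1, cache2 = SnoopyCache(bus), SnoopyCache(bus)
cache1.read(0)
cache2.read(0)               # both caches now hold the value 1
cache1.write(0, 5)           # cache2's stale copy is invalidated
print(0 in cache2.lines, cache2.read(0), memory[0])
```

A directory-based protocol reaches the same goal without a broadcast, by recording which caches hold each data element and notifying only those.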
Various Other Issues

Data Locality Issue
Distributed Memory Issue
Shared Memory Issue
Thanks
