
Computer Architecture & Related Topics

Ben Schrooten, Shawn Borchardt, Eddie Willett, Vandana Chopra

Presentation Topics
   

Computer Architecture History
Single CPU Design
GPU Design (Brief)
Memory Architecture
Communications Architecture
Dual Processor Design
Parallel & Supercomputing Design

Part 1: History and the Single CPU

Ben Schrooten

HISTORY!!!
One of the first computing devices to come about was...

The ABACUS!

The ENIAC: 1946

Completed: 1946
Programmed: plug board and switches
Speed: 5,000 operations per second
Input/output: cards, lights, switches, plugs
Floor space: 1,000 square feet

The EDSAC (1949) and the UNIVAC I (1951)


EDSAC
First practical stored-program computer
Speed: 714 operations per second
Memory: 1K words
Technology: vacuum tubes

UNIVAC I
Speed: 1,905 operations per second
Input/output: magnetic tape, unityper, printer
Memory size: 1,000 12-digit words in delay lines
Memory type: delay lines, magnetic tape
Technology: serial vacuum tubes, delay lines, magnetic tape
Floor space: 943 cubic feet
Cost: F.O.B. factory $750,000 plus $185,000 for a high-speed printer

Intel 4004 1971

Progression of The Architecture


Vacuum tubes -- 1940 to 1950
Transistors -- 1950 to 1964
Integrated circuits -- 1964 to 1971
Microprocessor chips -- 1971 to present

Current CPU Architecture

Basic CPU Overview

Single Bus: Slow Performance

Example of Triple Bus Architecture

Motherboards / Chipsets / Sockets


OH MY!

Chipset

In charge of:
Memory Controller
EIDE Controller
PCI Bridge
Real Time Clock
DMA Controller
IrDA Controller
Keyboard and Mouse
Secondary Cache
Low-Power CMOS SRAM

Sockets
Socket 4 & 5
Socket 7
Socket 8
Slot 1
Slot A

GPUs
Allow real-time rendering of graphics on a small PC
GPUs are true processing units
The Pentium 4 contains 42 million transistors on a 0.18 micron process
The GeForce3 contains 57 million transistors on a 0.15 micron manufacturing process

More GPU

Sources
DX4100 picture: Oneironaut, http://oneironaut.tripod.com/dx4100.jpg
Computer architecture overview picture: http://www.eecs.tulane.edu/courses/cpen201/slides/201Intro.pdf
Pictures of CPU overview, single bus architecture, triple bus architecture: Roy M. Wnek, Virginia Tech CS5515 Lecture 5, http://www.nvc.cs.vt.edu/~wnek/cs5515/slide/Grad_Arch_5.PDF
Historical data and pictures: The Computer Museum History Center, http://www.computerhistory.org/
Intel motherboard diagram / Pentium 4 picture: Intel Corporation, http://www.intel.com
The Abacus: Abacus-Online-Museum, http://www.hh.schule.de/metalltechnikdidaktik/users/luetjens/abakus/china/china.htm
Information also from Clint Fleri: http://www.geocities.com/cfleri/
Memory functionality: Dana Angluin, http://zoo.cs.yale.edu/classes/cs201/Fall_2001/handouts/lecture-13/node4.html
Benchmark graphics: Digit-Life, http://www.digit-life.com/articles/pentium4/index3.html
Chipset and socket information: Motherboards.org, http://www.motherboards.org/articlesd/techplanations/17_2.html
AMD processor pictures: Tom's Hardware, http://www6.tomshardware.com/search/search.html?category=all&words=Athlon
GPU info: 4th Wave Inc., http://www.wave-report.com/tutorials/gpu.htm
NV20 design pictures: Digit-Life, http://www.digit-life.com/articles/nv20/

Main Memory

Memory Hierarchy

DRAM vs. SRAM


DRAM is short for Dynamic Random Access Memory; SRAM is short for Static Random Access Memory.
DRAM is dynamic in that, unlike SRAM, it needs to have its storage cells refreshed (given a new electronic charge) every few milliseconds.
SRAM does not need refreshing because it operates on the principle of a current switched in one of two directions, rather than a storage cell that holds a charge in place.

Parity vs. Non-Parity




Parity is an error-detection scheme developed to notify the user of data errors. A single bit is added to each byte of data; this bit checks the integrity of the other 8 bits while the byte is moved or stored. Since memory errors are rare, much of today's memory is non-parity.
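As a rough illustration of the idea (not from the original slides), the C sketch below computes an even-parity bit for one byte and uses it to detect a single flipped bit; the helper name parity_bit and the test values are invented for the example.

#include <stdio.h>
#include <stdint.h>

/* Even parity: the extra bit is chosen so the total number of 1 bits
   (data plus parity) is even. */
static int parity_bit(uint8_t byte) {
    int ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (byte >> i) & 1;
    return ones & 1;            /* 1 when the data alone has an odd count */
}

int main(void) {
    uint8_t data = 0xA7;                /* 1010 0111 -> five 1 bits */
    int p = parity_bit(data);           /* stored alongside the byte */
    uint8_t received = data ^ 0x04;     /* simulate a single-bit memory error */
    if (parity_bit(received) != p)
        printf("parity mismatch: memory error detected\n");
    return 0;
}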

SIMM vs. DIMM vs. RIMM?


  

SIMM: Single In-line Memory Module
DIMM: Dual In-line Memory Module
RIMM: Rambus In-line Memory Module
SIMMs offer a 32-bit data path, while DIMMs offer a 64-bit data path.
SIMMs have to be used in pairs on Pentium and more recent processors.
RIMM is one of the latest designs; because of the fast data transfer rate of these modules, a heat spreader (an aluminum plate covering) is used on each module.

Evolution of Memory
Year       Memory type    Speed
1970       RAM / DRAM     4.77 MHz
1987       FPM            20 MHz
1995       EDO            20 MHz
1997       PC66 SDRAM     66 MHz
1998       PC100 SDRAM    100 MHz
1999       RDRAM          800 MHz
1999/2000  PC133 SDRAM    133 MHz
2000       DDR SDRAM      266 MHz
2001       EDRAM          450 MHz

FPM (Fast Page Mode DRAM): traditional DRAM
EDO (Extended Data Output): increases the read cycle between memory and the CPU
SDRAM (Synchronous DRAM): synchronizes itself with the CPU bus and runs at higher clock speeds

RDRAM (Rambus DRAM): DRAM with a very high bandwidth (1.6 GBps)

EDRAM (Enhanced DRAM): dynamic (power-refreshed) RAM that includes a small amount of static RAM (SRAM) inside a larger amount of DRAM, so that many memory accesses are to the faster SRAM. EDRAM is sometimes used as L1 and L2 memory and, together with Enhanced Synchronous DRAM, is known as cached DRAM.

Read Operation
On a read, the CPU first looks for the data in the cache; if it is not there, the cache is filled from main memory and the data is then returned to the CPU. (A small sketch covering both the read and the write path follows the Write Operation slide.)

Write Operation
On a write, the CPU writes the data into both the cache and main memory (a write-through policy).
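A minimal C sketch of the read path and the write-through path described on these two slides, assuming a toy direct-mapped cache with one word per line; the names, sizes, and address split are invented for illustration only.

#include <stdio.h>
#include <stdbool.h>

#define LINES     16
#define MEM_WORDS 1024

/* One direct-mapped cache line holding a single word (toy model). */
struct line { bool valid; unsigned tag; int data; };

static int         memory[MEM_WORDS];
static struct line cache[LINES];

/* Read: hit in the cache if possible, otherwise fill the line from
   main memory and then return the word to the "CPU". */
int cpu_read(unsigned addr) {
    struct line *l = &cache[addr % LINES];
    if (!l->valid || l->tag != addr / LINES) {   /* miss */
        l->valid = true;
        l->tag   = addr / LINES;
        l->data  = memory[addr];                 /* update cache from memory */
    }
    return l->data;
}

/* Write-through: update both the cache line and main memory. */
void cpu_write(unsigned addr, int value) {
    struct line *l = &cache[addr % LINES];
    l->valid = true;
    l->tag   = addr / LINES;
    l->data  = value;
    memory[addr] = value;
}

int main(void) {
    cpu_write(42, 7);
    printf("%d\n", cpu_read(42));   /* hit in the cache */
    printf("%d\n", cpu_read(58));   /* miss, filled from main memory */
    return 0;
}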

References
  

http://www-ece.ucsd.edu/~weathers/ece30/downloads/Ch7_memory(4x).pdf
http://home.cfl.rr.com/bjp/eric/ComputerMemory.html
http://aggregate.org/EE380/JEL/ch1.pdf

Defining a Bus


A parallel circuit that connects the major components of a computer, allowing the transfer of electric impulses from one connected component to any other

VESA - Video Electronics Standards Association


   

32-bit bus
Found mostly on 486 machines
Relied on the 486 processor to function
People started to switch to the PCI bus because of this
Otherwise known as VLB

ISA - Industry Standard Architecture


   

Very old technology
Bus speed of 8 MHz
Maximum throughput of 42.4 Mb/s
Very few ISA slots are found in modern machines

MCA - Micro Channel Bus


  

IBM's attempt to compete with the ISA bus
32-bit bus
Automatically configured cards (like Plug and Play)
Not compatible with ISA

EISA - Extended Industry Standard Architecture


    

An attempt to compete with IBM's MCA bus
Ran at an 8.33 MHz cycle rate
32-bit slots
Backward compatible with ISA
Went the way of MCA

PCI - Peripheral Component Interconnect


      

Speeds up to 960 Mb/s
Bus speed of 33 MHz
32-bit architecture
Developed by Intel in 1993
Synchronous or asynchronous
PCI popularized Plug and Play
Runs at half of the system bus speed

PCI-X

Up to 133 MHz bus speed
64 bits wide
1 GB/sec throughput
Backwards compatible with all PCI
Primarily developed for the increased I/O demands of technologies such as Fibre Channel, Gigabit Ethernet, and Ultra3 SCSI

AGP - Accelerated Graphics Port


 

Essentially a high-speed PCI port
Capable of running at 4 times the PCI bus speed (133 MHz)
Used for high-speed 3D graphics cards
Considered a port, not a bus
Only two devices involved
Not expandable

BUS

Bus         Width (bits)   Bus Speed (MHz)   Bandwidth (MB/s)
8-bit ISA   8              8.3               7.9
16-bit ISA  16             8.3               15.9
EISA        32             8.3               31.8
VLB         32             33                127.2
PCI         32             33                127.2
AGP         32             66                254.3
AGP (X2)    32             66 x 2            508.6
AGP (X4)    32             66 x 4            1017.3

IDE - Integrated Drive Electronics




Known by many other names: ATA, ATA/ATAPI, EIDE, ATA-2, Fast ATA, ATA-3, Ultra ATA, Ultra DMA
Good performance at a low cost
Most widely used interface for hard disks

SCSI - Small Computer System Interface (pronounced "scuzzy")


 

Capable of handling internal and external peripherals
Speeds anywhere from 80 to 640 Mb/s
Many types of SCSI

Type              Bus Speed (MB/s, max)   Bus Width (bits)   Max. Devices
SCSI-1            5                       8                  8
Fast SCSI         10                      8                  8
Fast Wide SCSI    20                      16                 16
Ultra SCSI        20                      8                  8
Ultra Wide SCSI   40                      16                 16
Ultra2 SCSI       40                      8                  8
Wide Ultra2 SCSI  80                      16                 16
Ultra3 SCSI       160                     16                 16
Ultra320 SCSI     320                     16                 16

Serial Port
 

Uses a DB9 or DB25 connector
Adheres to the RS-232C spec
Capable of speeds up to 115 kb/sec

USB


1.0
Hot plug-and-play
Full-speed USB devices signal at 12 Mb/s
Low-speed devices use a 1.5 Mb/s subchannel
Up to 127 devices can be chained together

2.0
Data rate of 480 megabits per second


USB On-The-Go
 

For portable devices
Limited host capability to communicate with selected other USB peripherals
A small USB connector to fit the mobile form factor

FireWire (i.e. IEEE 1394 and i.LINK)

High-speed serial port
400 Mbps transfer rate
About 30 times faster than USB 1.0
Hot plug-and-play

PS/2 Port


Mini-DIN plug with 6 pins
Mouse port and keyboard port
Developed by IBM

 

Parallel Port (i.e. printer port)

Old type
Two new types: ECP (Extended Capabilities Port) and EPP (Enhanced Parallel Port)
Both are ten times faster than the old parallel port
Both are capable of bi-directional communication

Game Port
 

Uses a DB15 connector
Used for joystick connection to the computer

Parallel Computer Architecture


By Vandana Chopra

Need for High Performance Computing




There is a need for tremendous computational capabilities in science, engineering, and business.
There are applications that require gigabytes of memory and gigaflops of performance.

What is a High Performance Computer




Definition of a high-performance computer: an HPC computer can solve large problems in a reasonable amount of time.
Characteristics:
Fast computation
Large memory
High-speed interconnect
High-speed input/output

How is an HPC computer made to go fast




Make the sequential computation faster
Do more things in parallel

Applications
1. Weather Prediction
2. Aircraft and Automobile Design
3. Artificial Intelligence
4. Entertainment Industry
5. Military Applications
6. Financial Analysis
7. Seismic Exploration
8. Automobile Crash Testing

Who Makes High Performance Computers


* SGI/Cray: Power Challenge Array, Origin-2000, T3D/T3E
* HP/Convex: SPP-1200, SPP-2000
* IBM: SP2
* Tandem

Trends in Computer Design




Performance of the fastest computers has grown exponentially from 1945 to the present, averaging a factor of 10 every five years.
The growth flattened somewhat in the 1980s but is accelerating again as massively parallel computers become available.

Increase in the Number of Processors

Real World Sequential Processes


Sequential processes we find in the world. The passage of time is a classic example of a sequential process.
Day breaks as the sun rises in the morning. Daytime has its sunlight and bright sky. Dusk sees the sun setting in the horizon. Nighttime descends with its moonlight, dark sky and stars.

Parallel Processes
Music: an orchestra performance, where every instrument plays its own part, and playing together they make beautiful music.

Parallel Features of Computers


Various methods available on computers for doing work in parallel:
Computing environment
Operating system
Memory
Disk
Arithmetic

Computing Environment - Parallel Features


Using a timesharing environment
The computer's resources are shared among many users who are logged in simultaneously. Your process uses the CPU for a time slice and is then rolled out while another user's process is allowed to compute. The opposite of this is dedicated mode, where yours is the only job running.

The computer overlaps computation and I/O


While one process is writing to disk, the computer lets another process do some computation

Operating System - Parallel Features


Using the UNIX background processing facility
a.out > results &
man etime

Using the UNIX Cron jobs feature


You submit a job that will run at a later time. Then you can play tennis while the computer continues to work. This overlaps your computer work with your personal time.

Memory - Parallel Features


Memory Interleaving
Memory is divided into multiple banks, and consecutive data elements are interleaved among them. There are multiple ports to memory. When the data elements that are spread across the banks are needed, they can be accessed and fetched in parallel. The memory interleaving increases the memory bandwidth.
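A small C sketch of how low-order interleaving maps consecutive word addresses to banks, so a stride-1 access stream touches every bank in turn and the fetches can overlap; the four-bank figure is an assumption chosen only for the example.

#include <stdio.h>

#define BANKS 4   /* assumed number of interleaved banks */

int main(void) {
    for (unsigned addr = 0; addr < 8; addr++) {
        unsigned bank   = addr % BANKS;   /* which bank holds this word */
        unsigned offset = addr / BANKS;   /* row within that bank */
        printf("address %u -> bank %u, offset %u\n", addr, bank, offset);
    }
    return 0;
}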

Memory - Parallel Features(Cont)




Multiple levels of the memory hierarchy


Global memory, which any processor can access
Memory local to a partition of the processors
Memory local to a single processor: cache memory, memory elements held in registers

Disk - Parallel Features


RAID disk
Redundant Array of Inexpensive Disks

Striped disk
When a dataset is written to disk, it is broken into pieces which are written simultaneously to different disks in a RAID disk system. When the same dataset is read back in, the pieces of the dataset are read in parallel, and the original dataset is reassembled in memory.
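A toy C sketch of striping, assuming three "disks" modeled as in-memory arrays and a made-up 4-byte stripe unit; a real RAID controller would issue the per-disk transfers concurrently rather than in a loop.

#include <stdio.h>
#include <string.h>

#define DISKS     3
#define STRIPE    4            /* bytes per stripe unit (toy size) */
#define DISK_SIZE 64

static char disk[DISKS][DISK_SIZE];   /* stand-ins for physical disks */

/* Break the dataset into stripe units and deal them out round-robin. */
void striped_write(const char *data, int len) {
    for (int i = 0; i * STRIPE < len; i++) {
        int n = len - i * STRIPE < STRIPE ? len - i * STRIPE : STRIPE;
        memcpy(&disk[i % DISKS][(i / DISKS) * STRIPE], data + i * STRIPE, n);
    }
}

/* Read the units back in order and reassemble the original dataset. */
void striped_read(char *out, int len) {
    for (int i = 0; i * STRIPE < len; i++) {
        int n = len - i * STRIPE < STRIPE ? len - i * STRIPE : STRIPE;
        memcpy(out + i * STRIPE, &disk[i % DISKS][(i / DISKS) * STRIPE], n);
    }
}

int main(void) {
    const char msg[] = "striping spreads data across disks";
    char back[sizeof msg];
    striped_write(msg, sizeof msg);
    striped_read(back, sizeof msg);
    printf("%s\n", back);
    return 0;
}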

Arithmetic - Parallel Features


We will examine the following features that lend themselves to parallel arithmetic:
Multiple functional units
Superscalar arithmetic
Instruction pipelining

Parallel Machine Model (Architectures)




von Neumann Computer

MultiComputer


A multicomputer comprises a number of von Neumann computers, or nodes, linked by an interconnection network.
In an idealized network, the cost of sending a message between two nodes is independent of both node location and other network traffic, but does depend on message length.

Locality, Scalability, Concurrency

Distributed Memory (MIMD)

MIMD means that each processor can execute a separate stream of instructions on its own local data; distributed memory means that memory is distributed among the processors rather than placed in a central location.

Difference from the idealized multicomputer: in a real distributed-memory machine, the cost of sending a message is not independent of node location and other network traffic.
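The slides do not name a programming interface, but message passing between distributed-memory nodes is commonly expressed with MPI; the sketch below is an illustrative assumption (not the presenter's example) in which node 0 sends one integer to node 1, each rank owning its own local memory.

#include <mpi.h>
#include <stdio.h>

/* Minimal distributed-memory message passing.
   Run with, for example: mpirun -np 2 ./a.out */
int main(int argc, char **argv) {
    int rank, size, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size >= 2) {
        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* explicit message */
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("node 1 received %d from node 0\n", value);
        }
    }
    MPI_Finalize();
    return 0;
}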

Examples of MIMD machines

MultiProcessor or Shared Memory MIMD




All processors share access to a common memory via a bus or a hierarchy of buses.
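One common way to program a shared-memory MIMD machine is a threaded loop in which all processors read and write the same address space; the OpenMP sketch below is an illustrative assumption, not something named in the slides.

#include <omp.h>
#include <stdio.h>

/* Shared-memory parallelism: every thread works on the same array in a
   common address space. Compile with an OpenMP-capable compiler, e.g.
   gcc -fopenmp. */
int main(void) {
    enum { N = 1000000 };
    static double a[N];
    double sum = 0.0;

    #pragma omp parallel for              /* each processor handles a chunk of i */
    for (int i = 0; i < N; i++)
        a[i] = i * 0.5;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f (up to %d threads)\n", sum, omp_get_max_threads());
    return 0;
}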

Example for Shared Memory MIMD




Silicon Graphics Challenge

SIMD Machines


All processors execute the same instruction stream on a different piece of data
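The same idea appears in SIMD instruction-set extensions, where a single instruction operates on several data elements at once; the sketch below uses x86 SSE intrinsics purely as an illustration (the slides describe SIMD machines such as the MasPar, not SSE).

#include <stdio.h>
#include <xmmintrin.h>

/* One instruction (_mm_add_ps) adds four float lanes at the same time. */
void add_arrays(const float *a, const float *b, float *c, int n) {
    int i;
    for (i = 0; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);           /* load 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));  /* 4 additions in one go */
    }
    for (; i < n; i++)                             /* scalar cleanup */
        c[i] = a[i] + b[i];
}

int main(void) {
    float a[6] = {1, 2, 3, 4, 5, 6}, b[6] = {10, 20, 30, 40, 50, 60}, c[6];
    add_arrays(a, b, c, 6);
    for (int i = 0; i < 6; i++)
        printf("%g ", c[i]);
    printf("\n");
    return 0;
}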

Example of SIMD machine:




MasPar MP

Use of Cache
Why is cache used on parallel computers?
The advances in memory technology aren't keeping up with processor innovations; memory isn't speeding up as fast as the processors. One way to alleviate the performance gap between main memory and the processors is to have a local cache. Cache memory can be accessed faster than main memory, so it keeps up with the fast processors and keeps them busy with data.

Shared Memory

[Diagram: processors 1, 2, and 3, each with its own cache memory, connected through a network to the shared memory]

Cache Coherence
What is cache coherence?
Keeps a data element found in several caches current with the other copies and with the value in main memory.
Various cache coherence protocols are used:
snoopy protocol
directory-based protocol
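A toy C model of the snoopy idea for a single memory word: a write by one CPU is broadcast so every other cached copy is invalidated. This is a simplified illustration only, not the full MESI-style protocol real machines use, and all names are invented.

#include <stdio.h>

#define CPUS 3

enum state { INVALID, VALID };
struct cache { enum state st; int data; };

static struct cache caches[CPUS];
static int memory_word = 0;

int read_word(int cpu) {
    if (caches[cpu].st == INVALID) {        /* miss: fetch the current value */
        caches[cpu].data = memory_word;
        caches[cpu].st   = VALID;
    }
    return caches[cpu].data;
}

void write_word(int cpu, int value) {
    for (int i = 0; i < CPUS; i++)          /* snoop: invalidate other copies */
        if (i != cpu)
            caches[i].st = INVALID;
    caches[cpu].data = value;
    caches[cpu].st   = VALID;
    memory_word      = value;               /* write-through for simplicity */
}

int main(void) {
    read_word(0);
    read_word(1);
    write_word(0, 99);                      /* CPU 1's copy is now invalid */
    printf("CPU 1 re-reads: %d\n", read_word(1));
    return 0;
}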

Various Other Issues


  

Data Locality Issue
Distributed Memory Issue
Shared Memory Issue

Thanks
