Presentation Topics
Computer Architecture History
Single CPU Design
GPU Design (Brief)
Memory Architecture
Ben Schrooten
HISTORY!!!
One of the first computing devices to come about was...
The ABACUS!
Completed: 1946
Programmed: plug board and switches
Speed: 5,000 operations per second
Input/output: cards, lights, switches, plugs
Floor space: 1,000 square feet
First practical stored-program computer
Technology: vacuum tubes
Memory: 1K words
Speed: 714 operations per second

Technology: serial vacuum tubes, delay lines, magnetic tape
Input/output: magnetic tape, Unityper, printer
Memory size: 1,000 12-digit words in delay lines
Memory type: delay lines, magnetic tape
Floor space: 943 cubic feet
Cost: F.O.B. factory $750,000 plus $185,000 for a high speed printer
Current CPU Architecture
Chipset
In charge of: Memory Controller, EIDE Controller, PCI Bridge, Real Time Clock, DMA Controller, IrDA Controller, Keyboard, Mouse, Secondary Cache, Low-Power CMOS SRAM
Sockets
Socket 4 & 5, Socket 7, Socket 8, Slot 1, Slot A
GPUs
GPUs allow real-time rendering of graphics on a small PC. GPUs are true processing units: the Pentium 4 contains 42 million transistors on a 0.18-micron process, while the GeForce3 contains 57 million transistors on a 0.15-micron manufacturing process.
More GPU
Sources
DX4100 picture: Oneironaut, http://oneironaut.tripod.com/dx4100.jpg
Computer architecture overview picture: http://www.eecs.tulane.edu/courses/cpen201/slides/201Intro.pdf
CPU overview, single bus architecture and triple bus architecture pictures: Roy M. Wnek, Virginia Tech, CS5515 Lecture 5, http://www.nvc.cs.vt.edu/~wnek/cs5515/slide/Grad_Arch_5.PDF
Historical data and pictures: The Computer Museum History Center, http://www.computerhistory.org/
Intel motherboard diagram and Pentium 4 picture: Intel Corporation, http://www.intel.com
The abacus: Abacus-Online-Museum, http://www.hh.schule.de/metalltechnikdidaktik/users/luetjens/abakus/china/china.htm
Additional information: Clint Fleri, http://www.geocities.com/cfleri/
Memory functionality: Dana Angluin, http://zoo.cs.yale.edu/classes/cs201/Fall_2001/handouts/lecture-13/node4.html
Benchmark graphics: Digital Life, http://www.digit-life.com/articles/pentium4/index3.html
Chipset and socket information: Motherboards.org, http://www.motherboards.org/articlesd/techplanations/17_2.html
AMD processor pictures: Tom's Hardware, http://www6.tomshardware.com/search/search.html?category=all&words=Athlon
GPU information: 4th Wave Inc., http://www.wave-report.com/tutorials/gpu.htm
NV20 design pictures: Digital Life, http://www.digit-life.com/articles/nv20/
Main Memory
Memory Hierarchy
Parity is an error-detection scheme developed to notify the user of data errors. A single extra bit is added to each byte of data, and this bit is used to check the integrity of the other 8 bits while the byte is moved or stored. Because memory errors are so rare, much of today's memory is non-parity.
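The following is a minimal sketch of the parity check. The slide does not say whether even or odd parity is used, so the code assumes even parity, where the stored bit makes the total count of 1 bits even. The function names `parity_bit` and `check` are hypothetical.

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit: 1 if the 8-bit value has an odd number of 1 bits, else 0."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return bin(byte).count("1") % 2

def check(byte: int, stored_parity: int) -> bool:
    """True if the byte still agrees with the parity bit stored alongside it."""
    return parity_bit(byte) == stored_parity

data = 0b1011_0010
p = parity_bit(data)            # stored as the 9th bit next to the byte
print(check(data, p))           # True: no error
corrupted = data ^ 0b0000_1000  # one bit flipped while moved or stored
print(check(corrupted, p))      # False: error detected
double = data ^ 0b0000_0011     # two bits flipped
print(check(double, p))         # True: an even number of errors goes unnoticed
```

The last line shows the limitation of a single parity bit. It detects any odd number of flipped bits, but it cannot detect an even number of them, and it cannot tell which bit is wrong.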
SIMM: Single In-line Memory Module. DIMM: Dual In-line Memory Module. RIMM: Rambus In-line Memory Module. SIMMs offer a 32-bit data path, while DIMMs offer a 64-bit data path, so SIMMs have to be installed in pairs on Pentium and more recent processors. The RIMM is one of the latest designs. Because these modules transfer data so quickly, each module is covered by a heat spreader (an aluminum plate).
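The pairing rule follows from simple division: one memory bank must fill the CPU's full data-bus width. The sketch below assumes the Pentium's 64-bit external data bus, which the slide does not state. The helper name `modules_per_bank` is hypothetical.

```python
def modules_per_bank(cpu_bus_bits: int, module_bits: int) -> int:
    """Number of identical modules needed to fill one CPU data-bus width."""
    if cpu_bus_bits % module_bits:
        raise ValueError("bus width must be a multiple of the module width")
    return cpu_bus_bits // module_bits

PENTIUM_BUS_BITS = 64  # assumed: Pentium external data bus width

print(modules_per_bank(PENTIUM_BUS_BITS, 32))  # SIMM (32-bit path): 2, hence pairs
print(modules_per_bank(PENTIUM_BUS_BITS, 64))  # DIMM (64-bit path): 1
```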
Evolution of Memory
Year        Memory type     Speed
1970        RAM / DRAM      4.77 MHz
1987        FPM             20 MHz
1995        EDO             20 MHz
1997        PC66 SDRAM      66 MHz
1998        PC100 SDRAM     100 MHz
1999        RDRAM           800 MHz
1999/2000   PC133 SDRAM     133 MHz
2000        DDR SDRAM       266 MHz
2001        EDRAM           450 MHz
FPM (Fast Page Mode DRAM): traditional DRAM. EDO (Extended Data Output): speeds up the read cycle between memory and the CPU. SDRAM (Synchronous DRAM): synchronizes itself with the CPU bus and runs at higher clock speeds.
EDRAM (Enhanced DRAM): dynamic (power-refreshed) RAM that includes a small amount of static RAM (SRAM) inside a larger amount of DRAM, so that many memory accesses go to the faster SRAM. EDRAM is sometimes used as L1 and L2 memory and, together with Enhanced Synchronous DRAM (ESDRAM), is known as cached DRAM.
Read Operation
On a read, the CPU first tries to find the data in the cache. If the data is not there (a cache miss), the cache is updated from main memory and the data is then returned to the CPU.
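The read path above is sketched below as a toy cache sitting in front of a list that stands in for main memory. The slide does not say how the cache is organized. The sketch assumes a direct-mapped cache with four one-word lines, and the class name `DirectMappedCache` is hypothetical.

```python
class DirectMappedCache:
    """Toy direct-mapped cache (one word per line) in front of main memory."""

    def __init__(self, memory, num_lines=4):
        self.memory = memory              # main memory, modelled as a list
        self.num_lines = num_lines
        self.tags = [None] * num_lines    # which address each line currently holds
        self.data = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def read(self, address):
        index = address % self.num_lines  # the only line this address may use
        tag = address // self.num_lines
        if self.tags[index] == tag:       # hit: the data is already in the cache
            self.hits += 1
        else:                             # miss: update the cache from main memory
            self.misses += 1
            self.tags[index] = tag
            self.data[index] = self.memory[address]
        return self.data[index]           # the data reaches the CPU via the cache

memory = [10 * a for a in range(16)]
cache = DirectMappedCache(memory)
for addr in [3, 3, 5, 5, 3, 7, 3]:
    cache.read(addr)
print(cache.hits, cache.misses)  # 3 4
```

Addresses 3 and 7 map to the same line in this layout, so they keep evicting each other. That is why the final read of address 3 misses again.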
Write Operation
On a write, the CPU writes the information into both the cache and main memory (a write-through policy).
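A minimal sketch of this write-through behaviour follows. It assumes a toy cache with no capacity limit and no eviction, and a dict standing in for main memory. The class name `WriteThroughCache` is hypothetical.

```python
class WriteThroughCache:
    """Toy write-through cache: every write updates both the cache and main memory."""

    def __init__(self, memory):
        self.memory = memory   # main memory, modelled as a dict address -> value
        self.lines = {}        # cached copies (unbounded, for simplicity)

    def read(self, address):
        if address not in self.lines:            # miss: fill from main memory
            self.lines[address] = self.memory[address]
        return self.lines[address]

    def write(self, address, value):
        self.lines[address] = value    # write into the cache...
        self.memory[address] = value   # ...and into main memory at the same time

memory = {0x10: 1, 0x14: 2}
cache = WriteThroughCache(memory)
cache.write(0x10, 99)
print(cache.read(0x10), memory[0x10])  # 99 99
```

Main memory is never stale under this policy, but every write pays the cost of a memory access. A write-back cache avoids that cost by updating memory only when a modified line is evicted.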
Defining a Bus
A parallel circuit that connects the major components of a computer, allowing the transfer of electric impulses from one connected component to any other
32-bit bus. Found mostly on 486 machines. Relied on the 486 processor to function, which is why people started to switch to the PCI bus. Otherwise known as VLB (VESA Local Bus).
Very old technology. Bus speed of 8 MHz. Maximum speed of 42.4 Mb/s. Very few ISA slots are found in modern machines.
MCA: IBM's attempt to compete with the ISA bus. 32-bit bus. Automatically configured cards (like Plug and Play). Not compatible with ISA.
EISA: an attempt to compete with IBM's MCA bus. Ran at an 8.33 MHz cycle rate. 32-bit slots. Backward compatible with ISA. Went the way of MCA.
Speeds up to 960 Mb/s. Bus speed of 33 MHz. 32-bit architecture. Developed by Intel in 1993. Synchronous or asynchronous. PCI popularized Plug and Play. Runs at half of the system bus speed.
PCI-X
Up to 133 MHz bus speed. 64-bit bus width. 1 GB/s throughput. Backward compatible with all PCI. Primarily developed for the increased I/O demands of technologies such as Fibre Channel, Gigabit Ethernet and Ultra3 SCSI.
AGP: essentially a high-speed PCI port. Capable of running at 4 times the PCI bus speed (133 MHz). Used for high-speed 3D graphics cards. Considered a port, not a bus.
BUS
Bus          Width (bits)   Bandwidth (MB/s)
8-bit ISA    8              7.9
16-bit ISA   16             15.9
EISA         32             31.8
VLB          32             127.2
PCI          32             127.2
AGP          32             254.3
AGP (2X)     32             508.6
AGP (4X)     32             1017.3
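Peak bus bandwidth is the bytes moved per transfer multiplied by the transfers per second. The clock values below are assumptions chosen because they reproduce the table: 8.33 MHz for ISA and EISA, 33.33 MHz for VLB and PCI, and 66.67 MHz for AGP, with 2 or 4 transfers per clock for AGP 2X and 4X. The table's MB/s figures match megabytes of 2^20 bytes. The names `bandwidth_mb_s` and `BUSES` are hypothetical.

```python
MB = 2 ** 20  # the table's figures match megabytes of 2**20 bytes

def bandwidth_mb_s(width_bits, clock_mhz, transfers_per_clock=1):
    """Peak bandwidth = bytes per transfer x transfers per second, in MB/s."""
    bytes_per_second = width_bits / 8 * clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_second / MB

# name: (width in bits, assumed clock in MHz, transfers per clock)
BUSES = {
    "8-bit ISA": (8, 25 / 3, 1),    # 8.33 MHz
    "16-bit ISA": (16, 25 / 3, 1),
    "EISA": (32, 25 / 3, 1),
    "VLB": (32, 100 / 3, 1),        # 33.33 MHz
    "PCI": (32, 100 / 3, 1),
    "AGP": (32, 200 / 3, 1),        # 66.67 MHz
    "AGP (2X)": (32, 200 / 3, 2),
    "AGP (4X)": (32, 200 / 3, 4),
}

for name, (width, clock, tpc) in BUSES.items():
    print(f"{name:<11}{bandwidth_mb_s(width, clock, tpc):8.1f}")
```

The same formula gives the PCI-X figure quoted earlier: 64 bits at 133 MHz is 8 bytes x 133 million transfers per second, or roughly 1 GB/s.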
Goes by many other names: ATA, ATA/ATAPI, EIDE, ATA-2, Fast ATA, ATA-3, Ultra ATA, Ultra DMA. Good performance at a low cost. The most widely used interface for hard disks.
Capable of handling internal and external peripherals. Speeds anywhere from 80 to 640 Mb/s. Many types of SCSI.
TYPE
SCSI-1, Fast SCSI, Fast Wide SCSI, Ultra SCSI, Ultra Wide SCSI, Ultra2 SCSI, Wide Ultra2 SCSI, Ultra3 SCSI, Ultra320 SCSI
Serial Port
Uses a DB9 or DB25 connector. Adheres to the RS-232C spec. Capable of speeds up to 115 kb/s.
USB
USB 1.0: hot plug-and-play. Full-speed USB devices signal at 12 Mb/s; low-speed devices use a 1.5 Mb/s subchannel. Up to 127 devices can be chained together.
USB 2.0: data rate of 480 megabits per second.
USB On-The-Go
For portable devices. Limited host capability to communicate with selected other USB peripherals. A small USB connector to fit the mobile form factor.
FireWire (IEEE 1394): high-speed serial port. 400 Mb/s transfer rate, about 30 times faster than USB 1.0. Hot plug-and-play.
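The "about 30 times faster" comparison can be checked with simple arithmetic on the nominal rates quoted in these slides. The 400 Mb/s interface is IEEE 1394 (FireWire). The sketch ignores protocol overhead, so real transfers are slower. The 10 MB file size and the names `RATES_MBIT_S` and `transfer_seconds` are hypothetical.

```python
# Nominal signalling rates from the slides, in megabits per second
RATES_MBIT_S = {
    "Serial port": 0.115,          # 115 kb/s
    "USB 1.0 (full speed)": 12,
    "IEEE 1394": 400,
    "USB 2.0": 480,
}

def transfer_seconds(size_bytes, rate_mbit_s):
    """Idealized transfer time at the nominal rate; ignores protocol overhead."""
    return size_bytes * 8 / (rate_mbit_s * 1_000_000)

size = 10 * 1_000_000  # hypothetical 10 MB file
for name, rate in RATES_MBIT_S.items():
    print(f"{name:<22}{transfer_seconds(size, rate):10.2f} s")

print(f"IEEE 1394 vs USB 1.0: {400 / 12:.1f}x")  # 33.3x, i.e. roughly "30 times"
```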
PS/2 Port
Mini-DIN plug with 6 pins. Mouse port and keyboard port. Developed by IBM.
Ten times faster than the old parallel port. Capable of bi-directional communication.
Game Port
There is a need for tremendous computational capabilities in science, engineering and business. Some applications require gigabytes of memory and gigaflops of performance.
Definition of a high-performance computer: an HPC computer can solve large problems in a reasonable amount of time.
Characteristics: fast computation, large memory, high-speed interconnect, high-speed input/output.
Applications
1> Weather Prediction
2> Aircraft and Automobile Design
3> Artificial Intelligence
4> Entertainment Industry
5> Military Applications
6> Financial Analysis
7> Seismic Exploration
8> Automobile Crash Testing
The performance of the fastest computers has grown exponentially from 1945 to the present, averaging a factor of 10 every five years. The growth flattened somewhat in the 1980s but is accelerating again as massively parallel computers become available.
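Compounding the average growth rate stated above gives the implied yearly growth factor and doubling time. This is a worked calculation from that one figure, not additional data. Here g is the yearly factor and t is the doubling time in years.

```latex
\[
g = 10^{1/5} \approx 1.585 \text{ per year},
\qquad
10^{t/5} = 2 \;\Rightarrow\; t = 5\log_{10} 2 \approx 1.51 \text{ years}
\]
```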
Parallel Processes
Music An orchestra performance, where every instrument plays its own part, and playing together they make beautiful music.
Striped disk
When a dataset is written to disk, it is broken into pieces which are written simultaneously to different disks in a RAID disk system. When the same dataset is read back in, the pieces of the dataset are read in parallel, and the original dataset is reassembled in memory.
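The striping described above can be sketched as a simulation. The dataset is cut into fixed-size stripe units that are dealt round-robin across the disks, then read back in the same order. This matches RAID 0 style striping without parity. Real hardware reads the disks in parallel, while this sketch reads them one after another. The 4-byte unit size and the function names `stripe` and `reassemble` are hypothetical.

```python
def stripe(data: bytes, num_disks: int, unit: int = 4) -> list:
    """Split data into stripe units and write them round-robin across the disks."""
    disks = [bytearray() for _ in range(num_disks)]
    for i in range(0, len(data), unit):
        disks[(i // unit) % num_disks] += data[i:i + unit]
    return disks

def reassemble(disks: list, total_len: int, unit: int = 4) -> bytes:
    """Read the units back in the same round-robin order and rebuild the dataset."""
    out = bytearray()
    offsets = [0] * len(disks)
    k = 0
    while len(out) < total_len:
        d = k % len(disks)
        out += disks[d][offsets[d]:offsets[d] + unit]
        offsets[d] += unit
        k += 1
    return bytes(out)

data = b"parallel striped disk!"
disks = stripe(data, 3)
print([bytes(d) for d in disks])  # [b'paraiped', b'llel dis', b' strk!']
print(reassemble(disks, len(data)) == data)  # True
```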
MultiComputer
A multicomputer comprises a number of von Neumann computers, or nodes, linked by an interconnection network. In an idealized network, the cost of sending a message between two nodes is independent of both node location and other network traffic, but does depend on message length.
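One common way to write this idealized cost is a fixed startup time plus a per-word transfer time, as in the sketch below. The function deliberately takes no source or destination node, which reflects the assumption that location and traffic do not matter. The linear form, the parameter values and the name `message_time` are hypothetical.

```python
def message_time(length_words, t_startup=50e-6, t_per_word=0.1e-6):
    """Idealized multicomputer message cost in seconds.

    A fixed startup cost plus a cost per word: it depends only on message
    length, never on which two nodes communicate or on other traffic.
    The default parameter values are hypothetical.
    """
    return t_startup + t_per_word * length_words

for n in (1, 1_000, 100_000):
    print(f"{n:>7} words: {message_time(n) * 1e6:8.1f} us")
```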
MIMD means that each processor can execute a separate stream of instructions on its own local data; distributed memory means that memory is distributed among the processors rather than placed in a central location.
Difference between a multicomputer and a distributed-memory MIMD computer: on the distributed-memory machine, the cost of sending a message between two nodes is not independent of node location and other network traffic.
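To illustrate the difference, the sketch below uses a hypothetical cost model for a distributed-memory machine. It assumes a 2-D mesh network with dimension-ordered routing, so the number of links crossed is the Manhattan distance between nodes. A contention factor of 1 or more crudely stands in for other traffic. The model, its parameters, and the names `mesh_hops` and `mesh_message_time` are all assumptions.

```python
def mesh_hops(src, dst):
    """Links crossed between two (x, y) nodes of a 2-D mesh with dimension-ordered routing."""
    (x1, y1), (x2, y2) = src, dst
    return abs(x1 - x2) + abs(y1 - y2)

def mesh_message_time(src, dst, length_words,
                      t_startup=50e-6, t_hop=1e-6, t_per_word=0.1e-6,
                      contention=1.0):
    """Hypothetical message cost in seconds where location and traffic both matter."""
    return (t_startup
            + t_hop * mesh_hops(src, dst)                  # grows with distance
            + contention * t_per_word * length_words)      # slowed by other traffic

near = mesh_message_time((0, 0), (0, 1), 1_000)
far = mesh_message_time((0, 0), (7, 7), 1_000)
busy = mesh_message_time((0, 0), (0, 1), 1_000, contention=2.0)
print(near < far, near < busy)  # True True
```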
All processors share access to a common memory via a bus or a hierarchy of buses.
SIMD Machines
All processors execute the same instruction stream, each on a different piece of data.
MasPar MP
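The SIMD model above can be simulated in software. A single instruction stream is broadcast, and each instruction is applied to every data lane before the next one starts. This is only a sequential simulation of lockstep execution, not real SIMD hardware. The name `simd_run` and the (operation, operand) instruction format are hypothetical.

```python
import operator

def simd_run(program, lanes):
    """Run one instruction stream over every data lane in lockstep.

    Each (op, operand) pair is broadcast to all lanes before the next one
    starts, mimicking a SIMD control unit; the loop itself runs sequentially.
    """
    for op, operand in program:
        lanes = [op(value, operand) for value in lanes]
    return lanes

program = [(operator.mul, 3), (operator.add, 1)]  # x * 3 + 1 on every lane
print(simd_run(program, [1, 2, 3, 4]))            # [4, 7, 10, 13]
```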
Use of Cache
Why is cache used on parallel computers?
Advances in memory technology aren't keeping up with processor innovations: memory isn't speeding up as fast as processors are. One way to reduce the performance gap between main memory and the processors is to give each processor a local cache, which can be accessed faster than main memory. The cache keeps up with the fast processors and keeps them supplied with data.
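The standard average-memory-access-time formula (hit time + miss rate x miss penalty) puts a number on how much a cache helps. The cycle counts and miss rate below are hypothetical. They show that a small cache hit time combined with a low miss rate brings the average far below the cost of going to main memory every time.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical numbers: 1-cycle cache hit, 5% misses, 100-cycle trip to main memory
print(amat(1, 0.05, 100))  # 6.0 cycles on average, versus 100 with no cache
```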
Shared Memory
(Figure: shared-memory organization showing a network, processors 1 to 3, their caches, and memory modules 1 to 3.)
Cache Coherence
What is cache coherence?
Cache coherence keeps the copies of a data element held in several caches consistent with each other and with the value in main memory. Various cache coherence protocols are used, including snoopy protocols and directory-based protocols.
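Below is a heavily simplified sketch of one snoopy approach. It assumes write-through caches and a write-invalidate protocol: on every write, the writing cache broadcasts on the shared bus, and every other cache drops its copy, so a later read fetches the current value from memory. The classes `Bus` and `SnoopyCache` are hypothetical, and real protocols such as MSI track per-line states instead.

```python
class Bus:
    """Shared bus that every cache snoops."""

    def __init__(self, memory):
        self.memory = memory   # main memory, modelled as a dict
        self.caches = []

    def broadcast_invalidate(self, address, source):
        for cache in self.caches:
            if cache is not source:
                cache.lines.pop(address, None)  # snooping cache drops its stale copy


class SnoopyCache:
    """Write-through cache following a simplified write-invalidate snoopy protocol."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.caches.append(self)

    def read(self, address):
        if address not in self.lines:           # miss: fetch the current value
            self.lines[address] = self.bus.memory[address]
        return self.lines[address]

    def write(self, address, value):
        self.bus.broadcast_invalidate(address, self)  # other copies become invalid
        self.lines[address] = value
        self.bus.memory[address] = value              # write-through keeps memory current


memory = {0: 5}
bus = Bus(memory)
p1, p2 = SnoopyCache(bus), SnoopyCache(bus)
p1.read(0)
p2.read(0)            # both caches now hold address 0
p1.write(0, 42)       # p2's copy is invalidated
print(p2.read(0))     # 42, re-fetched from memory instead of the stale 5
```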
Thanks