
ELEC-H-473 Microprocessor architectures

Lecture 01 Dragomir Milojevic


dmilojev@ulb.ac.be

General information
1. Agenda
Lectures
2 ECTS = 12 sessions, 2 h/session
Monday -> from 10.00 to 12.00 (C3.122); CONFLICT to be solved!
Friday -> from 08.00 to 10.00 (H.2213)

TPs
3 ECTS
Monday -> from 14.00 to 18.00 (Solbosch, building U, UA5.217)
Friday -> cancelled (moved to Monday)

2. ELEC-H-473 Internet resources

http://beams.ulb.ac.be/beams/
login: etudiants, password: SquareG! (it is case sensitive)
Attention: if you do not log in, you will not even see the notes.
Université libre de Bruxelles/Faculté des Sciences Appliquées/BEAMS/MILOJEVIC Dragomir

General information
3. Conflict dates
3, 7 and 21 March: I am travelling (we will organise this)

4. Practical work
Presence is mandatory!
Mini-projects to be implemented; each project is to be presented (you have to show the working demo); Q&A are part of the evaluation
Practical work accounts for 45% of the final mark

5. Exam
Is oral
Most of the questions are theoretical, but some could be closely related to the practical work
You are expected not only to reproduce the lecture content (copying slides), but to be able to reason about the matter

Today

1. Tale on computing machines
2. IC manufacturing technology perspective
3. Computing systems performance
4. Example of poor usage: data centers
5. What could happen in the future?

The lecture starts with a tale on computing machines ...


how they are made and how to push their limits ...

Do you know where this comes from?

So once upon a time ...



... there was a mathematician who prepared a BIG question for the XXth century:

Could maths be automatized?

David Hilbert, 1900

The BIG QUESTION got an answer: BIG NO!

Kurt Gödel, 1931

What can machines compute then?


Alan Turing, 1936

Anything that can be computed with a Turing machine!

[Figure: a Turing machine. An infinite paper roll holding symbols (b & * @ & a), read and written by a HEAD, together with an Alphabet and a set of Rules]
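The four ingredients named on this slide (infinite paper roll, head, alphabet, rules) can be sketched in a few lines of Python. This is only an illustrative sketch: the blank symbol `_`, the state names `start` and `halt`, and the bit-inverting rule table are assumptions made for the example, not part of the lecture.

```python
from collections import defaultdict

BLANK = "_"  # assumed blank symbol for unwritten cells of the paper roll


def run_turing_machine(rules, tape, state="start", halt="halt", max_steps=10_000):
    """Run a one-tape Turing machine and return the final tape content.

    rules maps (state, symbol) -> (symbol_to_write, head_move, next_state),
    where head_move is -1 (left) or +1 (right).
    """
    # The "infinite paper roll": any cell never written reads as BLANK.
    cells = defaultdict(lambda: BLANK, enumerate(tape))
    head = 0
    steps = 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step limit reached before halting")
        symbol_to_write, head_move, state = rules[(state, cells[head])]
        cells[head] = symbol_to_write
        head += head_move
        steps += 1
    if not cells:
        return ""
    used = range(min(cells), max(cells) + 1)
    return "".join(cells[i] for i in used).strip(BLANK)


# Hypothetical rule table: walk right, flipping every bit, halt on the first blank.
INVERT_RULES = {
    ("start", "0"): ("1", +1, "start"),
    ("start", "1"): ("0", +1, "start"),
    ("start", BLANK): (BLANK, -1, "halt"),
}

result = run_turing_machine(INVERT_RULES, "1011")
print(result)
```

The machine itself is fixed; only the rule table changes from one computation to another, which is the point of the slide.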

Turing machine: conceptual but also real!

Enigma, 1936
The Bombe, 1940

Mechanics are not the best medium !


Claude E. Shannon, 1937

but electric switches are, for sure !


How to make a usable switch?


ENIAC, 1945


... but size does matter!

Transistor, 1947
Integrated Circuit, 1958

AND SCALING WAS BORN!!!

So, in the '80s ...

ZX81
1kB RAM machine

and today:
Mobile Encyclopedia or 2.5 PetaFLOPS in a big room

What is scaling?

ENIAC, 1945
IBM XT, 1983
MacBook Pro, 2011

A tendency, transformed into a law ...

Scaling: Moore's law and state of the art

Intel Dunnington, 2008

6 processors on the same die
2 billion transistors
1 cm2
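To get a feeling for what such a "law" implies, here is a small sketch of exponential transistor-count growth. The baseline (Intel 4004, 1971, about 2,300 transistors) and the two-year doubling period are assumptions introduced for this example; only the 2008 figure of 2 billion transistors comes from the slide.

```python
import math


def projected_transistors(start_count, start_year, year, doubling_years=2.0):
    """Transistor count if the count doubles every `doubling_years` years."""
    return start_count * 2 ** ((year - start_year) / doubling_years)


def implied_doubling_period(start_count, start_year, end_count, end_year):
    """Doubling period (in years) that links two observed transistor counts."""
    return (end_year - start_year) / math.log2(end_count / start_count)


# Assumed baseline, not from the lecture: Intel 4004, 1971, about 2,300 transistors.
BASE_YEAR, BASE_COUNT = 1971, 2_300

projection_2008 = projected_transistors(BASE_COUNT, BASE_YEAR, 2008)
period = implied_doubling_period(BASE_COUNT, BASE_YEAR, 2e9, 2008)

print(f"projected for 2008 with 2-year doubling: {projection_2008:.2e} transistors")
print(f"doubling period implied by 2 billion in 2008: {period:.2f} years")
```

Under these assumptions the projection lands within a factor of about two of the 2 billion transistors quoted for Dunnington, and the implied doubling period is a little under two years.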

... if only the car industry did the same

Speed: 180,000,000 km/h
Fuel: 0.04 l/100 km
Price: $0.0003

Technology scaling: the sky is the limit?


[Figure: photolithography. Light passes through a mask and a lens, which projects the pattern onto the wafer]

No, but the size (wavelength) of the light IS!

Potential end in 2020?

What to do next?
Go for a non-exploited dimension: 3D Circuits, 2010

But even if this solution sounds fantastic, it JUST pushes the limits A BIT FURTHER AWAY, for the next couple of years (you are concerned)

But what about the far future?

Optical computing, 2???

Quantum computing, 2???


Let's (really) think of the future ...

a) A new computational paradigm is LESS an engineering problem and MORE a fundamental one, as of today
b) Fundamental sciences are not that predictive and require some degree of fuzziness ... Just think of what I said at the beginning of this presentation ... and how mathematics led to all this
c) We definitely can't PLAN, PROJECT MANAGE and/or PREDICT the arrival of, let's say, a Quantum Computer on May 15th, 2035

and the business view of it ...

Fortune, 2007

       Company                     $ millions
HW  1  Hewlett-Packard                 91,658
    2  Intl. Business Machines         91,424
    3  Dell                            57,095
    4  Apple                           19,315
    5  Xerox                           15,895
    6  Sun Microsystems                13,068
    7  NCR                              6,142
    8  Pitney Bowes                     5,811
    9  Gateway                          3,981
   10  Palm                             1,579
       Subtotal HW                    305,968
SW  1  Microsoft                       44,282
    2  Oracle                          14,380
    3  Symantec                         4,143
    4  CA                               3,805
    5  Electronic Arts                  2,951
    6  Adobe Systems                    2,575
    7  Intuit                           2,362
       Subtotal SW                     74,498
       Total SW + HW                  380,466

Computer industry: $390,000,000,000

But profits are less

even Apple moves to low-cost to get the volume

Conflicting visions
We need to make an important step forward, and the current state of the art says THAT WE ARE ABOUT TO HIT THE WALL IN BOTH WORLDS!!!

Will everything stop because of the lack of gain, or because people would like to go back to their roots (i.e. life without computers)?

Questions for the XXIst century ...

That we/you need to answer

Do we want/need better technology for the future?
  -> Personally, I would like this to happen ...

How to motivate/enable fundamental research in this field?
How to encourage capitalism to become more human friendly and really invest in fundamental research?
After all, didn't it all start as a very romantic and COMPLETELY unprofitable story?

Conclusion ...

Scaling (tech/business) model comes to an end ...

Long term solution
  Solid paradigm change (going beyond short term solutions)

Short term solutions
  Better technology (still possible)
  Better system understanding (today more than ever)
  Co-design of SW/HW/IC technology

These lectures are about understanding HW better and how we can get the best out of it

2. IC manufacturing technology perspective


From CMOS transistor ...

n-type and p-type transistors:

There are two flavors of MOSFETs: nMOS and pMOS (pronounced n-moss and p-moss). The n-type transistors, called nMOS, have regions of n-type dopants adjacent to the gate, called the source and the drain, and are built on a p-type semiconductor substrate. The pMOS transistors are just the opposite, consisting of p-type source and drain regions in an n-type substrate.

[Figure 1.29: nMOS and pMOS transistors. Cross-sections of (a) nMOS and (b) pMOS, made by sawing through a wafer and looking at it from the side; labels: source, gate, drain, polysilicon, SiO2, n and p regions, substrate]

The source and drain terminals are physically symmetric. However, we say that charge flows from the source to the drain. In an nMOS transistor, the charge is carried by electrons, which flow from negative voltage to positive voltage. In a pMOS transistor, the charge is carried by holes, which flow from positive voltage to negative voltage. If we draw schematics with the most positive voltage at the top and the most negative at the bottom, the source of (negative) charges in an nMOS transistor is the bottom terminal and the source of (positive) charges in a pMOS transistor is the top terminal.

Current flow is controlled by the gate:

A MOSFET behaves as a voltage-controlled switch in which the gate voltage creates an electric field that turns ON or OFF a connection between the source and drain. The term field effect transistor comes from this principle of operation.

[Figure 1.30: nMOS transistor operation, panels (a) and (b); labels: source, gate, drain, channel, p substrate, GND, VDD]

... to gate (switch)

If the control is binary, the transistor acts like a switch:

pMOS transistors work in just the opposite fashion, as might be guessed from the bubble on their symbol. The substrate is tied to VDD. When the gate is also at VDD, the pMOS transistor is OFF. When the gate is at GND, the channel inverts to p-type and the pMOS transistor is ON.

Unfortunately, MOSFETs are not perfect switches. In particular, nMOS transistors pass 0s well but pass 1s poorly. Specifically, when the gate of an nMOS transistor is at VDD, the drain will only swing between 0 and VDD - Vt. Similarly, pMOS transistors pass 1s well but 0s poorly. However, we will see that it is possible to build logic gates that use transistors only in their good mode.

nMOS transistors need a p-type substrate, and pMOS transistors need an n-type substrate. To build both flavors of transistors on the same chip, manufacturing processes typically start with a p-type wafer, then implant n-type regions called wells where the pMOS transistors should go. These processes that provide both flavors of transistors are called Complementary MOS or CMOS. CMOS processes are used to build the vast majority of all transistors fabricated today.

In summary, CMOS processes give us two types of electrically controlled switches, as shown in Figure 1.31. The voltage at the gate (g) regulates the flow of current between the source (s) and drain (d). nMOS transistors are OFF when the gate is 0 and ON when the gate is 1. pMOS transistors are just the opposite: ON when the gate is 0 and OFF when the gate is 1.

[Figure 1.31: switch models. nMOS: g = 0 OFF, g = 1 ON. pMOS: g = 0 ON, g = 1 OFF]

2 switches used to make an inverter:

[Figure 1.32: NOT gate schematic. pMOS P1 between VDD and Y, nMOS N1 between Y and GND, input A on both gates]

The NOT gate is built with CMOS transistors. The nMOS transistor, N1, sits between GND and the Y output. The pMOS transistor, P1, sits between VDD and the Y output. Both transistor gates are driven by the input A. When P1 is ON and N1 is OFF, Y is connected to VDD and pulled up to a logic 1; P1 passes a good 1. When N1 is ON and P1 is OFF, Y is pulled down to a logic 0.
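The switch view of the two transistors, and the inverter built from them, can be sketched as follows. This is an idealised sketch: the function names are made up for the example, and the imperfections mentioned in the text (the VDD - Vt swing) are deliberately ignored.

```python
def nmos_conducts(gate: int) -> bool:
    """Ideal nMOS switch: ON when the gate is 1, OFF when the gate is 0."""
    return gate == 1


def pmos_conducts(gate: int) -> bool:
    """Ideal pMOS switch: ON when the gate is 0, OFF when the gate is 1."""
    return gate == 0


def cmos_inverter(a: int) -> int:
    """NOT gate: P1 pulls Y up to VDD, N1 pulls Y down to GND."""
    pull_up = pmos_conducts(a)    # P1 connects Y to VDD
    pull_down = nmos_conducts(a)  # N1 connects Y to GND
    # Exactly one path conducts, so Y is never floating and never shorted.
    assert pull_up != pull_down
    return 1 if pull_up else 0


truth_table = {a: cmos_inverter(a) for a in (0, 1)}
print(truth_table)
```

Each transistor is used only in its good mode here: the pMOS passes the 1 and the nMOS passes the 0.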

Evolution of CMOS

We print features on silicon
If we can print smaller features:
  We can reduce transistor size
  We can reduce the width/length of the interconnect
  More functionality at higher performance for the same area (cost) -> This is SCALING

Currently:
  28, 22nm but with lots of issues
  14nm Intel -> DELAYED
  11nm should arrive sometime in the near future

Scaling enables better performance


[Figure 1.1: Growth in processor performance since the mid-1980s. The chart plots performance relative to the VAX 11/780 as measured by the SPECint benchmarks, on a log scale, for the years 1978 to 2006]

Evolution of CMOS

This was the model that ran smoothly for the past 50 years
This is not the case any more
After 100nm (sub-micron, ultra deep sub-micron) technology nothing is going to be the same as before -> "More than Moore" paradigm
  Inversion of scaling properties
  Gains are not the same
  We even start losing

Scaling side effects!

Scaling side effects: a) $$$$$ (1/2)

Fabrication cost
[Figure: fabrication cost in M$ (0 to 100) versus technology node (250, 180, 130, 90, 65, 45, 32 nm), split into mask design, software, and circuit design, test and verification]

Less productivity
[Figure: transistors per IC and transistors placed per month, log scale, 1981 to 2009]

Consequence -> Technology evolution and design capability do not follow the same path!

Cost examples (2/2)

          Capacity         Speed
Logic     2x in 3 years    2x in 3 years
DRAM      4x in 3 years    1.4x in 10 years
Disk      4x in 3 years    1.4x in 10 years

[Figure: performance versus year, log scale; microprocessor curve]

Cost of a processor

IC cost: very complex equation that is in general carefully balanced (in a very simplified form):

IC cost = (Die cost + Testing cost + Packaging cost) / Final test yield

Packaging cost: depends on pins, heat dissipation

In practice: cost constantly increasing!

Chip          Die cost   Pins   Package type   Package cost   Test & Assembly   Total
386DX         $4         132    QFP            $1             $4                $9
486DX2        $12        168    PGA            $11            $12               $35
PowerPC 601   $53        304    QFP            $3             $21               $77
HP PA 7100    $73        504    PGA            $35            $16               $124
DEC Alpha     $149       431    PGA            $30            $23               $202
SuperSPARC    $272       293    PGA            $20            $34               $326
Pentium       $417       273    PGA            $19            $37               $473

(Source: pykc, ISE1/EE2 Computing, Lecture 1, 7-Oct-01)
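The simplified cost equation can be checked against the chip table on this slide. In the sketch below the function name and the mapping of the "Test & Assembly" column to the testing cost are assumptions; the table totals turn out to be plain sums of the three components, so the yield is taken as 1.0 for them.

```python
def ic_cost(die_cost, testing_cost, packaging_cost, final_test_yield=1.0):
    """Simplified IC cost: component costs divided by the final test yield."""
    if not 0 < final_test_yield <= 1:
        raise ValueError("final_test_yield must be in (0, 1]")
    return (die_cost + testing_cost + packaging_cost) / final_test_yield


# (die cost, package cost, test & assembly cost, total) in US$, from the slide table
CHIPS = {
    "386DX": (4, 1, 4, 9),
    "486DX2": (12, 11, 12, 35),
    "PowerPC 601": (53, 3, 21, 77),
    "HP PA 7100": (73, 35, 16, 124),
    "DEC Alpha": (149, 30, 23, 202),
    "SuperSPARC": (272, 20, 34, 326),
    "Pentium": (417, 19, 37, 473),
}

for name, (die, package, test, total) in CHIPS.items():
    computed = ic_cost(die, test, package)
    print(f"{name:12s} computed ${computed:.0f}, table ${total}")

# Hypothetical illustration: the same 386DX parts at a 90% final test yield.
cost_at_90_percent = ic_cost(4, 4, 1, final_test_yield=0.9)
```

A lower final test yield raises the cost of every good part, because the failed parts still had to be manufactured, packaged and tested.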

Scaling side effects: b) performance gains

We have gate delays that decrease, but not those of the wires:
  -> We can compute fast, but we communicate slowly!

WIRING DOMINATES NANOMETER DESIGN

Wires must be the centerpiece of any nanometer methodology. Without such a methodology, design teams will not be able to create massively complex nanometer ICs in a timeframe of relevance.

In nanometer design, wiring delay accounts for the vast majority of overall delay. It is well known that delay has been shifting from gates to wires for quite some time. As shown in Figure 1, wiring delay exceeds gate delay at 0.18 micron and below in aluminum processes, and at 0.13 micron and below in copper. By 90 nm, wiring accounts for some 75% of the overall delay. As a result, design teams need to shift their focus from logic optimization to wire optimization.

[Figure 1: Wire and gate delay in Al and Cu. Delay in ps (0 to 35) versus feature size generation in micron (0.65, 0.5, 0.35, 0.25, 0.18, 0.13, 0.1); curves: gate delay; interconnect Al, SiO2; interconnect Cu, low k; total delay Al, SiO2; total delay Cu, low k]

In addition to dominating overall delay, nanometer design exacerbates physical effects that add to delay, notably signal integrity (SI) and IR (voltage) drop. These effects can be considerable even at 0.18 ...

Consequence -> Optimization should be done at communication level too!!! (NoCs)

Scaling side effects : c) power


Tendency is changing
(curves are normalised to dynamic power dissipation)

[Figure: normalized power (log scale, 0.0000001 to 100) versus year (1990 to 2020) across technology generations; curves: dynamic power and static power]

Consequence -> Getting 10% savings in dynamic power dissipation is not significant any more!

End result:

The CPU frequency (F) does not increase any more; to get more functionality (performance) we increase the parallelism

System level impact of scaling


Memories and CPUs
Computer components do not scale equally!

[Figure 2-7: The memory gap. Core frequency (MHz) and bus bandwidth (MT/s), 0 to 3000, versus year, 1991 to 2003. Core frequency increases 40% per year; bus rate increases 20% per year; core-to-bus ratios are increasing at 20% per year. (Source: Sandpile.org.)]
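The growth rates quoted on the figure can be compounded to see how fast the gap opens. The sketch below is plain arithmetic on the two quoted rates (40% per year for the core, 20% per year for the bus); the starting ratio of 1.0 and the 12-year span (1991 to 2003) are assumptions chosen for illustration.

```python
CORE_GROWTH = 0.40  # core frequency growth per year, as quoted on the figure
BUS_GROWTH = 0.20   # bus rate growth per year, as quoted on the figure

# Yearly growth of the core-to-bus ratio implied by the two rates above.
ratio_growth = (1 + CORE_GROWTH) / (1 + BUS_GROWTH) - 1


def core_to_bus_ratio(initial_ratio: float, years: int) -> float:
    """Core-to-bus frequency ratio after compounding both growth rates."""
    return initial_ratio * ((1 + CORE_GROWTH) / (1 + BUS_GROWTH)) ** years


ratio_after_12_years = core_to_bus_ratio(1.0, 12)

print(f"ratio grows by {ratio_growth:.1%} per year")
print(f"after 12 years the ratio is {ratio_after_12_years:.1f}x the initial one")
```

Compounding the two rates gives a ratio growth of roughly 17% per year, in the same range as the 20% annotated on the figure, so a core ends up waiting several times longer (in cycles) for the bus after a decade.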

3. Computing systems performance


Performance

Clock cycle (Clk)

Clk is there because a CPU is a synchronous logic circuit (a circuit with feedback)
System state is stored in flip-flops
Clk is used to drive all flip-flops in the design (data flows from flops to flops, so through the combinational circuits too)
Typically one master clock supplies the different clock domains

Performance

We can measure the number of cycles required to execute all instructions within a computer program

We can count the number of executed instructions

Cycles per instruction (CPI) -> on average for a given program:

CPI = Total number of cycles to execute / Total number of instructions in the program

Performance

CPI of each instruction (CPU data sheet)
  -> addition, logic operation (simple) -> 1 cycle
  -> multiplication (complex operation) -> from 1 to a few cycles, depending on hardware

Instruction(s) Per Cycle (IPC) for an application
  IPC = 1/CPI -> but computed a posteriori (profiling)
  Measures the parallelism if it is > 1
  Most computers should achieve IPC > 1!!!
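A small worked example may help connect the per-instruction CPI from a data sheet to the average CPI and IPC of a program. The instruction mix below is entirely hypothetical (750 simple operations at 1 cycle, 250 multiplications at 3 cycles), as are the function and variable names.

```python
def average_cpi(total_cycles: int, total_instructions: int) -> float:
    """CPI = total number of cycles / total number of instructions."""
    return total_cycles / total_instructions


def ipc_from_cpi(cpi: float) -> float:
    """IPC = 1 / CPI."""
    return 1.0 / cpi


# Hypothetical profile of one program run: class -> (instruction count, cycles each)
instruction_mix = {
    "add/logic (simple)": (750, 1),
    "multiply (complex)": (250, 3),
}

total_instructions = sum(count for count, _ in instruction_mix.values())
total_cycles = sum(count * cycles for count, cycles in instruction_mix.values())

program_cpi = average_cpi(total_cycles, total_instructions)
program_ipc = ipc_from_cpi(program_cpi)

print(f"CPI = {program_cpi:.2f}, IPC = {program_ipc:.2f}")
```

In this made-up profile the IPC is below 1, so no parallelism is being exploited; an IPC above 1 would only be possible on hardware that completes more than one instruction per cycle.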

Performance

Execution of a computer program (IC = application instruction count):

CPU_time = Clk x CPI x IC

where Clk is the clock cycle time (Clk = 1/F)

How to minimize CPU_time?

Shorten Clk -> increase F (will not hold that long) -> look at the IC (integrated circuit) scaling predictions for the future, from node to node (scaling factors per node transition):

Node [nm]    Feature size   Area   C       F      Vdd     Power   Power density
45 -> 32     0.755          0.57   0.665   1.1    0.925   0.626   1.096
32 -> 22     0.755          0.57   0.665   1.08   0.95    0.648   1.135
22 -> 14     0.755          0.57   0.665   1.05   0.975   0.664   1.162
14 -> 10     0.755          0.57   0.665   1.04   0.985   0.671   1.175
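The per-node scaling factors on this slide can be cross-checked with the usual dynamic power relation, power proportional to C x F x Vdd^2. That relation is a standard assumption brought in for this sketch, not something stated on the slide, and the factor values below are as read from the slide's table (45 to 10 nm node transitions).

```python
AREA_SCALE = 0.57  # area scaling factor per node transition, as read from the table

# node transition -> scaling factors as read from the slide's table
NODE_STEPS = {
    "45 -> 32": {"c": 0.665, "f": 1.10, "vdd": 0.925, "power": 0.626, "density": 1.096},
    "32 -> 22": {"c": 0.665, "f": 1.08, "vdd": 0.950, "power": 0.648, "density": 1.135},
    "22 -> 14": {"c": 0.665, "f": 1.05, "vdd": 0.975, "power": 0.664, "density": 1.162},
    "14 -> 10": {"c": 0.665, "f": 1.04, "vdd": 0.985, "power": 0.671, "density": 1.175},
}


def dynamic_power_scale(c: float, f: float, vdd: float) -> float:
    """Assumed relation: dynamic power scales as C x F x Vdd^2."""
    return c * f * vdd ** 2


def power_density_scale(power: float, area: float = AREA_SCALE) -> float:
    """Power density scales as power divided by area."""
    return power / area


for step, s in NODE_STEPS.items():
    power = dynamic_power_scale(s["c"], s["f"], s["vdd"])
    density = power_density_scale(power)
    print(f"{step}: power x{power:.3f}, power density x{density:.3f}")
```

The frequency factor shrinks towards 1 with every transition while the power density factor stays above 1, which is why raising F "will not hold that long".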

Performance

How to minimize CPU_time?

Shorten Clk -> increase F (will not hold that long)
Reduce CPI -> parallelism: inter- and intra-CPU (multi, scalar, super-pipeline, etc.)
Reduce IC -> algorithm, SIMD, implementation (SW)

Certain mechanisms are automatic, others are not!
  -> Optimizations as a function of the architecture

You need to know the HW and the way it operates to be able to exploit, as well as possible, all the possibilities that are there!
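The three levers can be compared with a direct evaluation of CPU_time = Clk x CPI x IC, taking Clk as the clock cycle time (1/F). All numbers below (2 GHz baseline, CPI of 1.5, one billion instructions, and the size of each improvement) are hypothetical values picked to make the arithmetic easy to follow.

```python
def cpu_time(clock_period_s: float, cpi: float, instruction_count: float) -> float:
    """CPU_time = Clk x CPI x IC, with Clk the clock cycle time in seconds."""
    return clock_period_s * cpi * instruction_count


# Hypothetical baseline: 2 GHz clock, CPI of 1.5, one billion instructions.
baseline = cpu_time(1 / 2e9, 1.5, 1e9)

# Lever 1: higher frequency (3 GHz instead of 2 GHz).
faster_clock = cpu_time(1 / 3e9, 1.5, 1e9)

# Lever 2: lower CPI, e.g. through more parallelism inside the CPU.
lower_cpi = cpu_time(1 / 2e9, 0.75, 1e9)

# Lever 3: fewer instructions, e.g. a better algorithm or 4-wide SIMD.
fewer_instructions = cpu_time(1 / 2e9, 1.5, 0.25e9)

for label, t in [
    ("baseline", baseline),
    ("higher F", faster_clock),
    ("lower CPI", lower_cpi),
    ("lower IC", fewer_instructions),
]:
    print(f"{label:10s} {t:.4f} s")
```

The three factors multiply, so the gains combine; in practice they also interact, since a change that lowers IC can raise CPI and vice versa.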

Solutions?

Improve tech
Increase parallelism: multi, many core -> multi-processor
Better usage at application level
After all, all these systems are used badly
Let's see this on a concrete example -> DATA CENTERS!!! (cloud computing)

4. Example of poor usage: Data Centers


Data centers are power hungry!

Board, Rack, Building

In all, thousands of CPUs using considerable power. Did the BIG ones (MS, Yahoo, etc.) become GREEN?

$$$ Electricity bill $$$
In 2007: 7.2 billion US$

Data centers use traditional cores


Heavily pipelined
Bunch of FPUs
SIMD support
Big, shared caches

Complex circuits, built to suit any application (as long as it is not embedded)

How good is multi-core really?

B/W unused!
Cores too fat!
Too few cores!
10 MB (80%) waste of silicon (no reuse)!

But how good is parallelism really?

Learnings

"One fits all" solution was the only one economically viable
  Same CPU for gaming, scientific computing, grandma's word processing and data centers
  Worked very well in the past (Intel), but ...
  Doesn't work any more!

Computing usage habits changed: we eventually went back to the terminal/mainframe concept from the past (tablet/cloud)
  Small/or not embedded computing power with IO capacity

Demand for high-perf CPUs is slowing down, much more than even almighty Intel predicted: the 14nm fab is delayed!

5. What could happen in the future ?


(this is not a tale)


Computer classes and important issues

Desktop Computing
  Price-performance ratio and graphics capabilities (gaming!, look at NVIDIA)

Servers
  Throughput, availability, scalability

Embedded Computers
  Price, power consumption, application-specific performance

Computer classes: Winds of change

What could happen in the near future?

Desktop Computing -> disappears, Intel opens its fabs and stops working on CPUs
Servers -> made using low-power cores like ARM
Embedded Computers -> made using the same low-power cores used for servers (just look at the Apple products: iPhone/iPad)

What about CPUs?
  CPU architectures are stable
  Instruction sets do not change much (although they can be adapted to a particular app)
  We need to start really using them -> plus system integration

So what's in it for us?

Whatever underlying tech is used (even in the far future), some processing devices will always be there
  An atomic adder is still an adder

Processing device = CPU
Architectural concepts of the CPU may vary depending on the technology offering, but lots of fundamental concepts will probably remain the same
  Even if low-power CPUs are killing desktop CPUs, they still
    Have a pipelined structure
    Use register files and ALUs to compute things
    Parallelize whatever can be done in parallel
    & many others