General information
1. Agenda
Lectures
2 ECTS = 12 sessions, 2 h/session
Monday: from 10.00 to 12.00 (C3.122); CONFLICT 2B solved
Friday: from 08.00 to 10.00 (H.2213)
Practical work (TPs)
3 ECTS
Monday: from 14.00 to 18.00 (Solbosch, building U, UA5.217)
Friday: cancelled (moved to Monday)
http://beams.ulb.ac.be/beams/
login: etudiants, password: SquareG! (it is case sensitive)
Attention: if you do not log in you will not even see the notes.
Université libre de Bruxelles/Faculté des Sciences Appliquées/BEAMS/MILOJEVIC Dragomir
General information
3. Conflict dates
3, 7 and 21 March: I am travelling (we will organise this)
4. Practical work
Presence is mandatory!
Mini-projects to be implemented; each project to be presented (you have to show the working demo); Q&A are part of the evaluation
Practical work accounts for 45% of the final mark
5. Exam
Is oral
Most of the questions are theoretical, but some could be closely related to the practical work
You are expected not only to reproduce the lecture content (copy slides), but to be able to reason on the matter
Today
1. Tale on computing machines
2. IC manufacturing technology perspective
3. Computing systems performance
4. Example of poor usage: data centers
5. What could happen in the future?
There was a mathematician who prepared a BIG question for the XXth century:
Could maths be automatized?
David Hilbert, 1900
The BIG QUESTION got an answer: BIG NO!
Kurt Gödel, 1931
[Figure: a machine reading a tape of symbols (b & * @ & a), with the labels HEAD, Alphabet and Rules]
and today:
Mobile Encyclopedia, or 2.5 PetaFLOPS in a big room
What is scaling?
IBM XT, 1983
ENIAC, 1947
Wafer
What to do next ?
Go for a non-exploited dimension: 3D circuits, 2010
But even if this solution sounds fantastic, it JUST pushes the limits A BIT FURTHER AWAY, for the next couple of years (you are concerned)
Computer industry
390.000.000.000$
Conflicting visions
We need to make an important step forward, and the current state of the art says: WE ARE ABOUT TO HIT THE WALL IN BOTH WORLDS!!!
Will everything stop because of the lack of gain, or because people would like to go back to their roots (i.e. life without computers)?
How to motivate/enable fundamental research in this field? How to encourage capitalism to become more human friendly and really invest in fundamental research?
After all, didn't it all start as a very romantic and COMPLETELY unprofitable story?
Conclusion ...
These lectures are about understanding HW better and how we can get the best out of it
n-type transistors, called nMOS, have regions of n-type dopants adjacent to the gate called the source and the drain and are built on a p-type semiconductor substrate. The pMOS transistors are just the opposite, consisting of p-type source and drain regions in an n-type substrate.
A MOSFET behaves as a voltage-controlled switch in which the gate voltage creates an electric field that turns ON or OFF a connection between the source and drain. The term field effect transistor comes from this principle of operation. Let us start by exploring the operation of an nMOS transistor. The substrate of an nMOS transistor is normally tied to GND, the ...
[Figure 1.30: nMOS transistor operation; panels (a) and (b) show gate, source, drain, channel and p-type substrate tied to GND]
If the control is binary, the transistor acts like a switch.
pMOS transistors work in just the opposite fashion, as might be guessed from the bubble on their symbol. The substrate is tied to VDD. When the gate is also at VDD, the pMOS transistor is OFF. When the gate is at GND, the channel inverts to p-type and the pMOS transistor is ON.
Unfortunately, MOSFETs are not perfect switches. In particular, nMOS transistors pass 0s well but pass 1s poorly. Specifically, when the gate of an nMOS transistor is at VDD, the drain will only swing between 0 and VDD - Vt. Similarly, pMOS transistors pass 1s well but 0s poorly. However, we will see that it is possible to build logic gates that use transistors only in their good mode.
nMOS transistors need a p-type substrate, and pMOS transistors need an n-type substrate. To build both flavors of transistors on the same chip, manufacturing processes typically start with a p-type wafer, then implant n-type regions called wells where the pMOS transistors should go. These processes that provide both flavors of transistors are called Complementary MOS or CMOS. CMOS processes are used to build the vast majority of all transistors fabricated today.
In summary, CMOS processes give us two types of electrically controlled switches, as shown in Figure 1.31. The voltage at the gate (g) regulates the flow of current between the source (s) and drain (d). nMOS transistors are OFF when the gate is 0 and ON when the gate is 1.
A NOT gate can be built with CMOS transistors. In the schematic, the flat bar indicates VDD. The nMOS transistor, N1, is connected between GND and the Y output. The pMOS transistor, P1, is connected between VDD and the Y output. Both transistor gates are controlled by the input, A. If A = 0, N1 is OFF and P1 is ON. Hence, Y is connected to VDD and is pulled up to a logic 1. P1 passes a good 1. If A = 1, N1 is ON and P1 is OFF, and Y is pulled down to a logic 0. Compare with the truth table in Figure 1.12.
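The switch view of the two transistor types, and the NOT gate built from them, can be sketched in a few lines of Python. This is only a logical model under the assumption of ideal binary inputs (no threshold voltage, no timing); the function names are made up for illustration.

```python
def nmos_conducts(gate: int) -> bool:
    """nMOS switch: OFF when the gate is 0, ON when the gate is 1."""
    return gate == 1


def pmos_conducts(gate: int) -> bool:
    """pMOS switch: the opposite, ON when the gate is 0."""
    return gate == 0


def cmos_not(a: int) -> int:
    """CMOS NOT gate: P1 pulls Y up to VDD, N1 pulls Y down to GND."""
    pulled_up = pmos_conducts(a)    # P1 sits between VDD and Y
    pulled_down = nmos_conducts(a)  # N1 sits between Y and GND
    # For a valid binary input exactly one of the two transistors conducts.
    assert pulled_up != pulled_down
    return 1 if pulled_up else 0


for a in (0, 1):
    print(f"A={a} -> Y={cmos_not(a)}")
```

Each transistor is used only in its good mode here: P1 passes the 1, N1 passes the 0.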
Evolution of CMOS
We print features on silicon
If we can print smaller features:
  We can reduce transistor size
  We can reduce the width/length of the interconnect
  More functionality at higher performance for the same area (cost)
  -> This is SCALING
Currently:
  28, 22 nm, but with lots of issues
  14 nm Intel -> DELAYED
  11 nm should arrive sometime in the near future
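To put a number on "more functionality for the same area", here is a small sketch. It assumes ideal scaling, where every linear dimension shrinks by the same factor, so density grows with the square of the ratio. Real nodes do not follow this exactly, and the node name is not a physical dimension; the function name is hypothetical.

```python
def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """Gain in transistors per unit area when moving from old_nm to new_nm,
    ASSUMING all linear dimensions shrink by the same factor."""
    if old_nm <= 0 or new_nm <= 0:
        raise ValueError("feature sizes must be positive")
    return (old_nm / new_nm) ** 2


# Nodes quoted on the slide.
nodes = [28, 22, 14, 11]
for old, new in zip(nodes, nodes[1:]):
    gain = ideal_density_gain(old, new)
    print(f"{old} nm -> {new} nm: x{gain:.2f} transistors in the same area")
```

Halving the feature size (28 nm to 14 nm) gives four times the transistors in the same area under this assumption.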
[Figure 1.1: Growth in processor performance since the mid-1980s. This chart plots performance relative to the VAX 11/780 as measured by the SPECint benchmarks (see Section 1.8). Log-scale vertical axis, years 1978 to 2006.]
Evolution of CMOS
This was the model that ran smoothly for the past 50 years
This is not the case any more
After the 100 nm node (sub-micron, ultra deep sub-micron technology) nothing is going to be the same as before -> "More than Moore" paradigm
  Inversion of scaling properties
  Gains are not the same
  We even start losing
[Figure: log-scale plot (1E+00 to 1E+09) over the years 1981 to 2009, annotated "Less productivity"; inset in M$ versus technology node (250, 180, 130, 90, 65, 45, 32 nm)]
Consequence -> Technology evolution and design capability do not follow the same path!
Cost examples
[Slide image, source: pykc, ISE1/EE2 Computing, Lecture 1 - 17, 7-Oct-01. Capacity trends: logic 2x in 3 years; DRAM 4x in 3 years; disk 4x in 3 years. A second column reads 1.4x in 10 years.]
Cost of a processor
IC cost: a very complex equation that is in general carefully balanced (here in a very simplified form)
$$\text{IC cost} = \frac{\text{Die cost} + \text{Testing cost} + \text{Packaging cost}}{\text{Final test yield}}$$
In practice: cost constantly increasing!
Packaging cost: depends on pins, heat dissipation

Chip          Die cost  Pins  Package type  Package cost  Test & Assembly  Total
386DX         $4        132   QFP           $1            $4               $9
486DX2        $12       168   PGA           $11           $12              $35
PowerPC 601   $53       304   QFP           $3            $21              $77
HP PA 7100    $73       504   PGA           $35           $16              $124
DEC Alpha     $149      431   PGA           $30           $23              $202
SuperSPARC    $272      293   PGA           $20           $34              $326
Pentium       $417      273   PGA           $19           $37              $473

(Source: pykc, ISE1/EE2 Computing, Lecture 1 - 19, 7-Oct-01)
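As a quick check of the cost equation against the example chips, the sketch below recomputes each total. The figures are the ones listed on the slide. Treating the listed totals as a plain sum (final test yield of 1.0) is an observation about those numbers; the 0.9 yield in the last line is a hypothetical value for illustration only.

```python
from typing import NamedTuple


class Chip(NamedTuple):
    name: str
    die_cost: float
    package_cost: float
    test_assembly_cost: float
    listed_total: float


def ic_cost(die: float, testing: float, packaging: float,
            final_test_yield: float = 1.0) -> float:
    """IC cost = (die + testing + packaging) / final test yield."""
    if not 0.0 < final_test_yield <= 1.0:
        raise ValueError("final test yield must be in (0, 1]")
    return (die + testing + packaging) / final_test_yield


# Values in dollars, copied from the slide's table.
CHIPS = [
    Chip("386DX", 4, 1, 4, 9),
    Chip("486DX2", 12, 11, 12, 35),
    Chip("PowerPC 601", 53, 3, 21, 77),
    Chip("HP PA 7100", 73, 35, 16, 124),
    Chip("DEC Alpha", 149, 30, 23, 202),
    Chip("SuperSPARC", 272, 20, 34, 326),
    Chip("Pentium", 417, 19, 37, 473),
]

for chip in CHIPS:
    total = ic_cost(chip.die_cost, chip.test_assembly_cost, chip.package_cost)
    print(f"{chip.name:12s} computed ${total:.0f}, listed ${chip.listed_total:.0f}")

# Hypothetical: the same Pentium parts with a 90% final test yield.
print(f"Pentium at 90% yield: ${ic_cost(417, 37, 19, 0.9):.2f}")
```

The die dominates the total for the larger chips, which is why die area and yield matter so much.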
Wires must be the centerpiece of any nanometer methodology. Without such a methodology, design teams will not be able to create massively complex nanometer ICs in a timeframe of relevance.
In nanometer design, wiring delay accounts for the vast majority of overall delay. It is well known that delay has been shifting from gates to wires for quite some time. As shown in Figure 1, wiring delay exceeds gate delay at 0.18 micron and below in aluminum processes, and at 0.13 micron and below in copper. By 90 nm, wiring will account for some 75% of the overall delay. As a result, design teams need to shift their focus from logic optimization to wire optimization.
[Figure 1: delay versus technology node, 0.65 to 0.1 micron. Curves: gate delay; total delay (Al, SiO2); total delay (Cu, low k); interconnect (Cu, low k)]
2.1 THE CHANGING NATURE OF DELAY
In addition to dominating overall delay, nanometer design exacerbates physical effects that introduce su... delay, notably signal integrity (SI) and IR (voltage) drop. These effects can be considerable even at 0.18 ...
Consequence -> Optimization should be done at the communication level too!!! (NoCs)
[Figure: normalized power (log scale, 0.0001 to 0.01 marked) versus technology node; curves: dynamic and static power]
Consequence -> A 10% saving in dynamic power dissipation is not significant any more!
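A short piece of arithmetic shows why. If static power takes a growing share of the total, a saving applied only to dynamic power shrinks accordingly. The static shares below are hypothetical values chosen for illustration, not numbers read from the figure.

```python
def total_power_saving(dynamic_saving: float, static_share: float) -> float:
    """Fraction of TOTAL power saved when only dynamic power is reduced.

    dynamic_saving: fraction of dynamic power removed (0.10 for 10%).
    static_share:   fraction of total power that is static (leakage).
    """
    if not 0.0 <= static_share <= 1.0:
        raise ValueError("static_share must be between 0 and 1")
    dynamic_share = 1.0 - static_share
    return dynamic_saving * dynamic_share


# Hypothetical static shares, for illustration only.
for static_share in (0.1, 0.3, 0.5):
    saving = total_power_saving(0.10, static_share)
    print(f"static share {static_share:.0%}: total saving {saving:.1%}")
```

With half of the power being static, a 10% dynamic saving is only 5% of the total.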
The end result:
The CPU frequency F does not increase any more; to get more functionality (performance) we increase the parallelism
[Figure 2-7: The memory gap, 1991 to 2003. Annotations: "Core to bus ratios are increasing at 20% per year"; "Bus rate inc 20% per year". (Source: Sandpile.org.)]
Performance
Clk is there because the CPU is a synchronous logic circuit (a circuit with feedback)
System state is stored in flip-flops
Clk is used to drive all flip-flops in the design (data flows from flops to flops, and so through the combinational circuits too)
Typically one master clock supplies the different clock domains
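The role of Clk can be sketched with a toy synchronous circuit: the state lives in flip-flops, combinational logic computes the next state from the current one, and the state changes only on a clock edge. The 2-bit counter below is a made-up example, not a circuit from the lecture.

```python
class Counter2Bit:
    """Toy synchronous circuit: two flip-flops plus next-state logic."""

    def __init__(self) -> None:
        self.q = 0  # value currently stored in the flip-flops

    def next_state(self) -> int:
        """Combinational logic: depends only on the flip-flop outputs."""
        return (self.q + 1) % 4

    def clock_edge(self) -> None:
        """On a Clk edge, all flip-flops capture their inputs at once."""
        self.q = self.next_state()


counter = Counter2Bit()
trace = []
for _ in range(5):
    trace.append(counter.q)  # state visible during this clock cycle
    counter.clock_edge()     # state changes only here

print(trace)  # [0, 1, 2, 3, 0]
```

Between two edges the combinational logic has one clock period to settle, which is what ties the clock frequency to the circuit delay.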
Performance
We can count the number of executed instructions
Cycles per instruction (CPI) -> on average, for a given program:
$$\text{CPI} = \frac{\text{Total number of cycles to execute the program}}{\text{Total number of instructions in the program}}$$
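As a worked example of the CPI definition, the sketch below computes the average CPI from an instruction mix. The mix (instruction classes, counts and cycles per class) is entirely hypothetical; a real one would come from profiling.

```python
def cpi(total_cycles: int, total_instructions: int) -> float:
    """Average cycles per instruction for one program run."""
    if total_instructions <= 0:
        raise ValueError("the program must execute at least one instruction")
    return total_cycles / total_instructions


# Hypothetical profile: class -> (instructions executed, cycles per instruction).
INSTRUCTION_MIX = {
    "alu": (500, 1),
    "load_store": (300, 3),
    "branch": (200, 2),
}

total_instructions = sum(count for count, _ in INSTRUCTION_MIX.values())
total_cycles = sum(count * cycles for count, cycles in INSTRUCTION_MIX.values())
average_cpi = cpi(total_cycles, total_instructions)

print(f"{total_cycles} cycles / {total_instructions} instructions "
      f"= CPI {average_cpi}")
```

Here 1800 cycles for 1000 instructions give a CPI of 1.8, even though half of the instructions take a single cycle.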
Performance
IPC = 1/CPI -> but computed a posteriori (profiling)
Measures the parallelism if it is > 1
For most computers this should be TRUE!!!
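A minimal sketch of the IPC check follows. The two profiles are hypothetical cycle and instruction counts, standing in for what a profiler would report after the run.

```python
def ipc(total_instructions: int, total_cycles: int) -> float:
    """Instructions per cycle, the inverse of CPI, measured after the run."""
    if total_cycles <= 0:
        raise ValueError("the run must take at least one cycle")
    return total_instructions / total_cycles


def exploits_parallelism(ipc_value: float) -> bool:
    """IPC above 1 means more than one instruction completes per cycle."""
    return ipc_value > 1.0


# Hypothetical profiles: (instructions, cycles).
profiles = {"sequential run": (1000, 1800), "superscalar run": (1000, 400)}
for name, (instructions, cycles) in profiles.items():
    value = ipc(instructions, cycles)
    print(f"{name}: IPC = {value:.2f}, parallel: {exploits_parallelism(value)}")
```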
Performance
Increase Clk -> Increase F (will not hold that long) -> look at IC scaling predictions for the future, from node to node:
[Table: scaling from node to node. Columns: Feature size, Area, C, F, VDD, Power, Power density. The numeric entries are not recoverable.]
Performance
Increase Clk -> Increase F (will not hold that long)
Reduce CPI -> Parallelism: inter and intra CPU (multi, scalar, super-pipeline etc.)
Reduce IC -> Algorithm, SIMD, implementation (SW)
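The three levers can be compared numerically. The sketch assumes the usual relation, execution time = IC x CPI / F, which the slides imply but do not spell out, and all the numbers (instruction count, CPI, frequency) are hypothetical.

```python
def execution_time(instruction_count: float, cpi: float, clock_hz: float) -> float:
    """Execution time in seconds, ASSUMING time = IC * CPI / F."""
    if clock_hz <= 0:
        raise ValueError("clock frequency must be positive")
    return instruction_count * cpi / clock_hz


# Hypothetical baseline: 1e9 instructions, CPI of 2, 2 GHz clock.
baseline = execution_time(1e9, 2.0, 2e9)
faster_clock = execution_time(1e9, 2.0, 2.5e9)       # increase F by 25%
lower_cpi = execution_time(1e9, 1.0, 2e9)            # halve CPI (parallelism)
fewer_instructions = execution_time(0.5e9, 2.0, 2e9) # halve IC (algorithm, SIMD)

for label, t in [("baseline", baseline), ("faster clock", faster_clock),
                 ("lower CPI", lower_cpi), ("fewer instructions", fewer_instructions)]:
    print(f"{label:20s} {t:.2f} s")
```

With F no longer increasing, the gains have to come from the CPI and IC terms.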
Solutions?
Improve tech
Increase parallelism: multi, many core -> multi-processor
Better usage at application level
After all, all these systems are used badly
Let's see this on a concrete example: DATA CENTERS!!! (cloud computing)
In all, thousands of CPUs using considerable power. Did the BIG ones (MS, Yahoo, etc.) become GREEN?
Complex circuits,
built to suit any application
(as long as it is not embedded)
Learnings
Same CPU for gaming, scientific computing, grandma's word processing and the data center
Worked very well in the past (Intel), but ...
Doesn't work any more!
Desktop Computing
Servers
Embedded Computers
CPU architectures are stable
Instruction sets do not change much (although they can be adapted to a particular app)
We need to start really using them -> plus system integration
Whatever underlying tech is used (even in the far future), some processing devices will always be there
Processing device = CPU
Architectural concepts of the CPU may vary depending on what the technology offers, but lots of fundamental concepts will probably remain the same:
  Have a pipelined structure
  Use register files and ALUs to compute things
  Parallelize whatever can be done in parallel
  & many others
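To give a feel for what a pipelined structure buys, here is a cycle-count sketch. It assumes an ideal pipeline (one cycle per stage, no stalls or hazards), which real CPUs only approximate; the stage and instruction counts are hypothetical.

```python
def cycles_unpipelined(n_instructions: int, n_stages: int) -> int:
    """Each instruction goes through all stages before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions: int, n_stages: int) -> int:
    """Ideal pipeline: fill it once, then one instruction completes per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


# Hypothetical: 100 instructions on a 5-stage pipeline.
n, k = 100, 5
slow = cycles_unpipelined(n, k)
fast = cycles_pipelined(n, k)
print(f"unpipelined: {slow} cycles, pipelined: {fast} cycles, "
      f"speedup x{slow / fast:.2f}")
```

For long instruction streams the speedup approaches the number of stages.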