Page 1
20 hours of video uploaded to YouTube every minute
Approximately 9 billion video files owned are high-definition
50 million+ digital media files added to personal content libraries every day
1,000 images are uploaded to Facebook every second
Page 2
System balance
Memory Technologies and System Design Interconnect Design
| The Future Is Heterogeneous Computing | Oct 27, 2010
Page 3
Thermal Design Points (TDPs) in all market segments continue to drop
Lightly loaded and idle power characteristics are key parameters in the Operational Expense (OpEx) equation
Percent of total world energy consumed by computing devices continues to grow year-on-year
Page 4
When?
Early 1990s
Implication
Memory wait time dominates computing
Industry Solutions
Non-blocking caches
Out-of-order (OoO) machines
Larger caches
Cache hierarchies
Elaborate prefetch
Huge caches
Multiple memory controllers
Extreme PHYs
Accelerated parallel processing
Chip stacking
TBD
Mid 1990s
New & emerging abstractions
Browser-based runtimes
Image/video as basic data types
Throughput-based designs
2009 and beyond
Even larger working sets
Larger data types
Page 6
Interconnect Challenges
Coherence domain: knowing when to stop
Interesting implications for on-chip interconnect networks
Data centers of tomorrow are going to take great interest in this area
Page 7
[Figure: trend charts, each marked "we are here". Recoverable titles and axis labels: Single-thread Performance, Moore's Law, Integration (log scale), Locality, Single-thread Perf, Performance, Issue Width, Cache Size, Time.]
Page 8
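Amdahl's law: $\text{Speedup} = \dfrac{1}{SW + (1 - SW)/N}$, with $SW$ the serial fraction of the work and $N$ the number of cores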
Assume 100W TDP socket
10W for global clocking
20W for on-chip network/caches
15W for I/O (memory, PCIe, etc.)
This leaves 55W for all the cores
850mW per core!
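The power budget on this slide is plain subtraction and division, and a short Python sketch makes the arithmetic explicit. The slide does not state a core count, so `NUM_CORES = 64` is an assumption inferred from 55 W / 0.85 W being roughly 64.7; the function and variable names are illustrative only.

```python
# Per-core power budget for a 100 W socket, using the figures on the slide.
# NUM_CORES is an assumption: the slide gives only the ~850 mW result.

TDP_WATTS = 100.0
OVERHEAD_WATTS = {
    "global clocking": 10.0,
    "on-chip network/caches": 20.0,
    "I/O (memory, PCIe, etc.)": 15.0,
}
NUM_CORES = 64  # assumed; 55 W / 0.85 W is roughly 64.7


def core_budget_watts(tdp, overheads):
    """Power left for all cores once the fixed overheads are subtracted."""
    return tdp - sum(overheads.values())


def per_core_milliwatts(tdp, overheads, num_cores):
    """Average power available to each core, in milliwatts."""
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    return core_budget_watts(tdp, overheads) / num_cores * 1000.0


if __name__ == "__main__":
    print(core_budget_watts(TDP_WATTS, OVERHEAD_WATTS))               # 55.0
    print(per_core_milliwatts(TDP_WATTS, OVERHEAD_WATTS, NUM_CORES))  # 859.375
```

With the assumed 64 cores the result is about 859 mW per core, in line with the slide's rounded 850 mW figure.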
[Figure: curves labeled 0% Serial, 10% Serial, 35% Serial, and 100% Serial]
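The formula on this slide reads as Amdahl's law, and the serial percentages in the figure labels can be fed through it directly. The sketch below assumes SW is the serial fraction of the work and N is the number of cores; the function name `amdahl_speedup` and the core counts (1, 16, 64) are illustrative choices, not taken from the slide.

```python
def amdahl_speedup(serial_fraction, num_cores):
    """Amdahl's law: speedup = 1 / (SW + (1 - SW) / N).

    serial_fraction (SW): share of the work that cannot be parallelized, 0..1.
    num_cores (N): number of cores sharing the parallel part.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / num_cores)


if __name__ == "__main__":
    # Serial fractions taken from the figure labels; core counts are assumed.
    for sw in (0.0, 0.10, 0.35, 1.0):
        speedups = [round(amdahl_speedup(sw, n), 2) for n in (1, 16, 64)]
        print(f"{sw:.0%} serial: {speedups}")
```

For any nonzero serial fraction the speedup is bounded by 1 / SW no matter how many cores are added, which is why even 10% serial work caps the gain at 10x.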
Page 9
[Figure: microprocessor trend data. Series: Single-thread Performance (SpecINT), Frequency (MHz), Typical Power (Watts), Number of Cores.]
Original data collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond and C. Batten. Dotted-line extrapolations by C. Moore.
Page 10
The use of multiple cores forces each core to actually slow down
At some point, the power limits will not even allow you to activate all of the cores at the same time
Small, low-power cores tend to be very weak on single-threaded general purpose workloads
Customer value proposition will continue to demand excellent performance on general purpose workloads
The transition to compelling general purpose parallel workloads will not be a fast one
Page 11
Multi-Core Era
Enabled by: Moore's Law; desire for throughput; 20 years of SMP architecture
Constrained by: power; parallel SW availability; scalability
[Figure: charts, each marked "we are here". Recoverable axis labels: Throughput Performance, Time, Time (# of Processors).]
Page 13
2005: Dual-Core AMD Opteron
2007: Quad-Core AMD Opteron
2008: 45nm Quad-Core AMD Opteron
2009: Six-Core AMD Opteron
2010: AMD Opteron 6100 Series
[Figure: die diagrams. Surviving labels: 90nm SOI, K8, Core 4, Core 5, Core 6, L3 cache.]
12 AMD64 x86 cores
18 MB on-chip cache
4 memory channels @ 1333 MHz
4 HT links @ 6.4 GT/sec
Page 15
Page 16
[Figure: peak single-precision performance* of successive GPU generations over time, with the introduction of unified shaders marked. Recoverable x-axis dates: Mar-06, Oct-06, Apr-07, Nov-07, Jun-08, Dec-08, Jul-09. Product lines shown: ATI Radeon, ATI FireGL, ATI FirePro, ATI FireStream, AMD FireStream.]
R520: X1800, V7200, V7300, V7350
R580(+): X19xx
R600: HD 2900, V7600, V8600, V8650
RV670: HD 3800, V7700, 9170
RV770: HD 4800, V8700, 9250, 9270
Cypress: HD 5870
* Peak single-precision performance; for RV670, RV770 and Cypress, divide by 5 for peak double-precision performance
17
GPU Efficiency
[Figure: GFLOPS/W and GFLOPS/mm² for four GPUs: ATI Radeon X1800 XT (Nov-05), ATI Radeon X1900 XTX (Jan-06), ATI Radeon HD 4870 (Jun-08), ATI Radeon HD 5870 (Oct-09). Surviving data labels: 14.47, 7.90, 4.56, 4.50, 2.01, 1.07, 0.42.]
18
Gaming
Productivity
Sciences
Government
Engineering
19
OpenCL
OpenGL
AMD GPUs
OpenCL -
AMD CPUs
Other CPUs/GPUs
Cross-platform development
Interoperability with OpenGL and DX
CPU/GPU backends enable balanced platform approach
20
Heterogeneous Computing:
Next-Generation Software Ecosystem
Increase ease of application development
End-user Applications
High Level Frameworks
Advanced Optimizations & Load Balancing
Tools: HLL compilers, Debuggers, Profilers
Load balance across CPUs and GPUs; leverage AMD Fusion performance advantages
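The slide names load balancing across CPUs and GPUs without describing a method. As one possible illustration only, the sketch below splits a batch of work items statically in proportion to each device's relative throughput. The function `split_work`, the device names, and the throughput ratios are all hypothetical and do not describe any AMD tool or API.

```python
def split_work(num_items, throughputs):
    """Split num_items across devices in proportion to relative throughput.

    throughputs maps a device name to a positive relative throughput.
    Returns a dict mapping each device name to its number of work items.
    """
    if num_items < 0:
        raise ValueError("num_items must be non-negative")
    if not throughputs or any(t <= 0 for t in throughputs.values()):
        raise ValueError("throughputs must be non-empty and positive")

    total = sum(throughputs.values())
    shares = {name: int(num_items * t / total) for name, t in throughputs.items()}

    # Rounding down can leave a few items unassigned; give them to the
    # fastest device so every item is covered exactly once.
    leftover = num_items - sum(shares.values())
    fastest = max(throughputs, key=throughputs.get)
    shares[fastest] += leftover
    return shares


if __name__ == "__main__":
    # Hypothetical ratio: the GPU is assumed to be 4x faster on this workload.
    print(split_work(1000, {"cpu": 1.0, "gpu": 4.0}))
```

A static split like this only works when the throughput ratio is known in advance; a runtime that measures progress and rebalances dynamically would be a different design.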
21
Delivers
advanced performance
22
Lots of conditional data parallelism. Benefits from closer coupling between CPU & GPU
Maps very well to throughput-oriented data parallel engines
i,j=0
i++
j++
load x(i,j)
fmul
store
cmp j (100000)
bc
cmp i (100000)
bc
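The instruction sequence above describes a doubly nested loop that loads each element x(i,j), multiplies it, and stores the result. Because no iteration depends on another, the same work can be expressed as one small kernel applied to every element. The Python sketch below shows both forms. The scale factor, the tiny array, the worker count, and all function names are assumptions for illustration; the thread pool stands in for a data-parallel engine and is not expected to speed up this work in CPython.

```python
from concurrent.futures import ThreadPoolExecutor

SCALE = 2.0  # assumed multiplier; the slide shows only an "fmul"


def scale_serial(x):
    """Nested-loop form: one load, multiply, and store per (i, j)."""
    rows, cols = len(x), len(x[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            out[i][j] = x[i][j] * SCALE
    return out


def kernel(element):
    """Per-element work; independent of every other element."""
    return element * SCALE


def scale_data_parallel(x, workers=4):
    """Kernel form: map the same function over the whole index space."""
    cols = len(x[0])
    flat = [value for row in x for value in row]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        result = list(pool.map(kernel, flat))  # map preserves input order
    return [result[r * cols:(r + 1) * cols] for r in range(len(x))]


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # non-empty, rectangular
    print(scale_serial(data))
    print(scale_data_parallel(data))
```

The serial version carries loop counters and branches for every element, while the kernel version leaves iteration order to the runtime, which is the property that lets throughput-oriented hardware run many elements at once.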
Microprocessor Advancement
CPU
Single-Core Era
Multi-Core Era
Programmability
System-level programmable
Throughput Performance
GPU
24
GPU Advancement
25
DISCLAIMER
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
This presentation contains forward-looking statements concerning AMD and technology partner product offerings which are made pursuant to the safe harbor provisions of the Private Securities Litigation Reform Act of 1995. Forward-looking statements are commonly identified by words such as "would," "may," "expects," "believes," "plans," "intends," "strategy," "roadmaps," "projects" and other terms with similar meaning. Investors are cautioned that the forward-looking statements in this presentation are based on current beliefs, assumptions and expectations, speak only as of the date of this presentation and involve risks and uncertainties that could cause actual results to differ materially from current expectations.
ATTRIBUTION
© 2010 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD Opteron, ATI, the ATI logo, Radeon and combinations thereof are trademarks of Advanced Micro Devices, Inc. Microsoft, Windows, and Windows Vista are registered trademarks of Microsoft Corporation in the United States and/or other jurisdictions. OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc. Other names are for informational purposes only and may be trademarks of their respective owners.
26