
Python for GPUs

Bryan Catanzaro, NVIDIA Research



Some slides from Mark Harris (NVIDIA) and Andreas Klöckner (NYU)
Rapid Development
Powerful Libraries
Commercial Support
Large Community
Is Python Fast Enough?
Python apps often implement performance-critical functions in C/C++.
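For illustration, a minimal sketch of that pattern, with the hot loop delegated to NumPy's compiled C routines instead of a Python-level loop (hypothetical code):

import numpy as np

def saxpy_python(a, x, y):
    # Pure Python: the loop runs in the interpreter, element by element.
    return [a * xi + yi for xi, yi in zip(x, y)]

def saxpy_numpy(a, x, y):
    # Same computation; the loop runs inside NumPy's compiled C code.
    return a * x + y

x = np.random.rand(1000000)
y = np.random.rand(1000000)
z = saxpy_numpy(2.0, x, y)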
Three Python projects
PyCUDA/PyOpenCL (Andreas Klöckner)
Bindings for GPU runtimes
Intended to be used with Runtime Code Generation

NumbaPro (Continuum Analytics)
Write CUDA code in Python
GPU bindings
Copperhead (Bryan Catanzaro)
A data parallel Python dialect
Runtime compiled to GPUs and CPUs
PyCUDA: Programming Approaches
Decisions that determine your approach to throughput computing:
AOT vs JIT
Meta vs not
In-language vs Hybrid
If hybrid, why not use a scripting language?
PyCUDA: Why do scripting?
GPUs are everything that scripting languages are not:
Highly parallel
Very architecture-sensitive
Built for maximum FP/memory throughput
GPUs and scripting languages complement each other:
CPU: largely restricted to control tasks (1000/sec)
Scripting fast enough
Python + OpenCL = PyOpenCL
Python + CUDA = PyCUDA
Dive into PyCUDA
!"#$%& #()*+,-,*&$!.!&
!"#$%& #()*+,-+%!/0% ,1 +%/
!"#$%& .*"#(

2%$" #()*+,-)$"#!30% !"#$%& 4$*%)05$+*30
"$+ 6 4$*%)05$+*307888
99:3$;,399 /$!+ "*3&!#3(9&<0"723$,& =+01&> 23$,& =,> 23$,& =;?
@
)$.1& !.& ! 6 &<%0,+A+B-BC
+01&D!E 6 ,D!E = ;D!EC
F
888?

CUDA Code
Dive into PyCUDA, cont.
"*3&!#3(9&<0" 6 "$+-:0&92*.)&!$.78"*3&!#3(9&<0"8?

, 6 .*"#(-%,.+$"-%,.+.7GHH?-,1&(#07.*"#(-23$,&IJ?
; 6 .*"#(-%,.+$"-%,.+.7GHH?-,1&(#07.*"#(-23$,&IJ?

+01& 6 .*"#(-K0%$193!L07,?
"*3&!#3(9&<0"7
+%/-M*&7+01&?> +%/-A.7,?> +%/-A.7;?>
;3$)L67GHH>N>N?> :%!+67N>N??

#%!.& +01&O,=;
numpy interop
kernel launch
PyCUDA/PyOpenCL Philosophy
Provide complete access
Automatically manage resources
Provide abstractions
Allow interactive use
Check for and report errors automatically
Integrate tightly with numpy
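A minimal sketch of what the numpy integration looks like in practice, along the lines of the PyCUDA examples (array shape and values are arbitrary):

import numpy as np
import pycuda.autoinit              # initializes CUDA and creates a context
import pycuda.gpuarray as gpuarray

a = np.random.randn(4, 4).astype(np.float32)
a_gpu = gpuarray.to_gpu(a)          # numpy array -> GPU array
a_doubled = (2 * a_gpu).get()       # arithmetic runs on the GPU; .get() copies back
print(a_doubled)
print(2 * a)                        # same result computed by numpy on the CPU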
PyCUDA/PyOpenCL: Completeness
PyCUDA exposes all of the CUDA driver API
For example:
Streams/events
Surfaces/textures
Peer to peer access, pinned memory
Profiling, ... (a short events example follows below)
PyOpenCL exposes all of OpenCL
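For example, driver-level events give exact device timing of GPU work (a minimal sketch; the array and the operation being timed are arbitrary):

import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray

x = gpuarray.to_gpu(np.random.randn(1 << 20).astype(np.float32))

start, end = drv.Event(), drv.Event()
start.record()                      # enqueue 'start' on the default stream
y = 2 * x + 1                       # GPU work to be timed
end.record()                        # enqueue 'end' after the work
end.synchronize()                   # wait until the GPU reaches 'end'
print("GPU time: %.3f ms" % start.time_till(end))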
Workflow
[Workflow diagram: Edit → PyOpenCL/PyCUDA → Run → Program("...") → in cache? → if not, Compiler → Binary → Upload to GPU → Run on GPU]
Metaprogramming
Idea: in GPU scripting, GPU code does not need to be a compile-time constant.
(Key: code is data; it wants to be reasoned about at run time.)
Good for code generation.
[Diagram: Python Code → GPU Code → GPU Compiler → GPU Binary → GPU → Result; the human writes the Python code, the machine handles the rest.]
How to metaprogram in PyCUDA/PyOpenCL
Three (main) ways of generating code:
Simple %-operator substitution (a sketch follows after this list)
Combine with C preprocessor: simple, often sufficient
Use a templating engine (Mako works very well)
codepy:
Build C syntax trees from Python
Generates readable, indented C
Many ways of evaluating code; the most important one:
Exact device timing via events
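A minimal sketch of the first approach, %-operator substitution; the kernel itself is hypothetical:

import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule

# Kernel source is an ordinary Python string; tuning parameters are
# substituted at run time, before the GPU compiler ever sees the code.
source = """
__global__ void scale(float *x)
{
    const int i = blockIdx.x * %(block_size)d + threadIdx.x;
    x[i] = %(factor)f * x[i];
}
"""

block_size = 128
mod = SourceModule(source % {"block_size": block_size, "factor": 2.0})
scale = mod.get_function("scale")

x = np.random.randn(1024).astype(np.float32)
scale(drv.InOut(x), block=(block_size, 1, 1), grid=(1024 // block_size, 1))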
Other nice things
Elementwise functions very similar to numpy ufuncs
reductions, scans
gpuarray with overloaded arithmetic operators
random number generators
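A sketch combining several of these, following the patterns in the PyCUDA documentation (names and sizes are illustrative):

import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.elementwise import ElementwiseKernel
from pycuda.curandom import rand as curand

# An elementwise kernel: write the body once, apply it to whole gpuarrays.
lin_comb = ElementwiseKernel(
    "float a, float *x, float b, float *y, float *z",
    "z[i] = a * x[i] + b * y[i]",
    "lin_comb")

x = curand((1000,))                 # uniform random floats generated on the GPU
y = curand((1000,))
z = gpuarray.empty_like(x)
lin_comb(2.0, x, 3.0, y, z)         # runs as a single GPU kernel

print(gpuarray.sum(z).get())        # reduction, also on the GPU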
PyCUDA/PyOpenCL information
http://mathema.tician.de/software/pyopencl (or /pycuda)
Downloads:
Direct: PyOpenCL 60k, PyCUDA 30k
Binaries: Win, Debian, Arch, Fedora, Gentoo, ...
MIT License
Compiler cache, RAII, error checking
Requires: numpy, Python 2.4+ (Win/OS X/Linux)
Community: mailing list, wiki, add-on packages (PyFFT, scikits.cuda, Sailfish, PyWENO, Copperhead, ...)
NumbaPro from Continuum
Anaconda Accelerate from Continuum Analytics
NumbaPro: array-oriented compiler for Python & NumPy
Compile Python for GPUs or CPUs
Automatically compile Python functions on NumPy arrays
Or write CUDA Python kernels for maximum performance
Fast Development + Fast Execution: Ideal Combination
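A sketch of the "automatically compile Python functions on NumPy arrays" path; the import and the target name follow the NumbaPro documentation of the time and should be treated as assumptions here:

import numpy as np
from numbapro import vectorize      # NumbaPro-era import (assumption)

@vectorize(['float32(float32, float32)'], target='gpu')   # 'gpu' target name assumed
def add(a, b):
    # Scalar function; the decorator turns it into a GPU ufunc.
    return a + b

x = np.arange(1024, dtype=np.float32)
y = np.arange(1024, dtype=np.float32)
print(add(x, y))                    # transfers and kernel launch handled automatically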
http://continuum.io
Free Academic
License
1024² Mandelbrot      Time      Speedup vs. Pure Python
Pure Python           4.85 s    --
NumbaPro (CPU)        0.11 s    44x
CUDA Python (K20)     0.004 s   1221x
import numpy as np
from numbapro import cuda, uint8, uint32, f8   # NumbaPro-era imports (assumed)

@cuda.jit(restype=uint32, argtypes=[f8, f8, uint32], device=True)
def mandel(x, y, max_iters):
    # Device function: iteration count until |z| >= 2 for c = x + y*i.
    c = complex(x, y)
    z = 0.0j
    for i in range(max_iters):
        z = z*z + c
        if (z.real*z.real + z.imag*z.imag) >= 4:
            return i
    return max_iters

@cuda.jit(argtypes=[uint8[:,:], f8, f8, f8, f8, uint32])
def mandel_kernel(img, min_x, max_x, min_y, max_y, iters):
    # One thread per pixel; cuda.grid(2) yields this thread's (x, y) index.
    x, y = cuda.grid(2)
    if x < img.shape[0] and y < img.shape[1]:
        img[y, x] = mandel(min_x + x * ((max_x - min_x) / img.shape[0]),
                           min_y + y * ((max_y - min_y) / img.shape[1]), iters)

gimage = np.zeros((1024, 1024), dtype=np.uint8)
d_image = cuda.to_device(gimage)
# Launch a 32x32 grid of 32x32-thread blocks: one thread per pixel of the 1024x1024 image.
mandel_kernel[(32, 32), (32, 32)](d_image, -2.0, 1.0, -1.0, 1.0, 20)
d_image.to_host()
CUDA Python: CUDA programming, Python syntax
Copperhead
Goal: efficiency and productivity
Note: Copperhead is a research project, not a product.
[Diagram: Copperhead sits at the intersection of Python, data parallelism, and the need for productivity.]
Copperhead code is just Python code.
No C-isms, no annotations.
http://copperhead.github.io
Hello world of data parallelism
Consider this intrinsically parallel procedure:

def axpy(a, x, y):
    return map(lambda xi, yi: a * xi + yi, x, y)

or, for the lambda averse:

def axpy(a, x, y):
    return [a * xi + yi for xi, yi in zip(x, y)]

This procedure is both
completely valid Python code
compilable to data parallel substrates (CUDA, OpenCL, OpenMP+AVX intrinsics, etc.)
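In practice such a procedure is marked for compilation with Copperhead's @cu decorator and called on numpy arrays; the exact import below is an assumption:

import numpy as np
from copperhead import cu           # assumption: @cu marks functions for compilation

@cu
def axpy(a, x, y):
    return [a * xi + yi for xi, yi in zip(x, y)]

x = np.arange(100, dtype=np.float32)
y = np.arange(100, dtype=np.float32)
z = axpy(2.0, x, y)                 # compiled at first call, run on the selected place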
Support for Heterogeneity
Programmer specifies the execution place:

with places.gpu0:
    gpu_result = axpy(...)

with places.openmp:
    cpu_result = axpy(...)

Currently supported:
CUDA
OpenMP
TBB
Sequential C++
Runtime Data Management
The Copperhead runtime manages all data
Data lazily transferred to and from memory spaces
Memory is garbage collected via Python's garbage collector
Data interoperates with numpy, matplotlib, etc.
a = ...              # some input data
b = foo(a)
c = foo(b)
d = foo(c)
print(d)

[Diagram: the same arrays (a, b, c, d) live in CPU and GPU memory spaces; values move between the two lazily.]
Runtime code generation
Copperhead compiler produces C++ code
C++ code is compiled to a dynamic library using codepy
Compilation artifacts persistently stored in __pycache__
Runtime overhead: ~10-100 µsec (from Python, per fn call)
[Charts: Minimal Black Scholes example, showing compile time (seconds) and per-call execution overhead (seconds).]
Some results (GTX480)
Solving Laplace's equation (from Travis Oliphant's blog)





[Charts: runtime in seconds (log scale) for Pure Python, Numpy, and Copperhead.]
Sorting an array of 1M float32 elements




Conclusion
Increasing options for Python on GPUs:
PyCUDA/PyOpenCL (Andreas Klöckner)
Bindings for GPU runtimes

NumbaPro (Continuum Analytics)
Write CUDA code in Python

Copperhead (Bryan Catanzaro)
A data parallel Python dialect

Questions?
Bryan Catanzaro
bcatanzaro@nvidia.com

http://research.nvidia.com
