Introduction
- Ray tracing
- Ray tracing algorithms
- Ray traversal hardware pipeline
- Streaming processors
- GPGPU
- Performance degradation of 1.5X-2.5X
Two-stage traversal process:
1. Hardware implementation
2. User-defined algorithm
A performance simulator was created:
- Streaming processor architecture
- Kd-tree as the software traversal algorithm
Previous Work
Accelerated Data Structures
- Hierarchical space subdivision schemes
- Bounding volume hierarchies
- GPU implementations:
  - Vector operations and vectorized processors
  - Large programmable multi-core architectures
  - Graphics computations in parallel
  - Multiple threads on each processor
  - Software kernels
Graphics Hardware
Grid Concepts
Hierarchical Bounding Volume
Spatial Subdivisions
Ray projection from original GrUG grouping in A to next GrUG grouping in B. To compute the next point along the ray for the hash function, the ray is projected by the tmin value.
KD-Tree
[Figure: kd-tree spatial subdivision into leaves A, B, C, D with split planes X, Y, Z; tmin and tmax marked along a ray]
KD-Tree Traversal
[Figure: a ray traversing the kd-tree leaves A, B, C, D across split planes X, Y, Z]
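The software stage of GrUG performs this kind of descent. Below is a minimal sketch of point-location in a kd-tree; the node layout is an assumption for illustration, not the paper's exact structure.

```cuda
#include <cuda_runtime.h>

// Hypothetical kd-tree node layout (illustrative only).
struct KdNode {
    int   axis;        // 0 = X, 1 = Y, 2 = Z; -1 marks a leaf
    float split;       // position of the splitting plane
    int   left, right; // child indices in the node array
};

// Descend from a root node to the leaf containing point p.
__device__ int findLeaf(const KdNode* nodes, int root, float3 p) {
    int n = root;
    while (nodes[n].axis >= 0) {
        float v = (nodes[n].axis == 0) ? p.x
                : (nodes[n].axis == 1) ? p.y
                : p.z;
        n = (v < nodes[n].split) ? nodes[n].left : nodes[n].right;
    }
    return n;  // index of the leaf node containing p
}
```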
Observation
[Figure: the same ray and kd-tree, highlighting the shared boundaries between consecutive leaves]
Current leaf's tmax = next leaf's tmin
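In code, this observation means the next hash query point can be obtained by projecting the ray just past the current leaf's tmax; a minimal sketch (the epsilon nudge is an assumption to step off the shared boundary):

```cuda
#include <cuda_runtime.h>

// Current leaf's tmax = next leaf's tmin: project the ray just past the
// boundary to get the query point for the next hash lookup.
__device__ float3 nextQueryPoint(float3 o, float3 d, float tmaxCurrentLeaf) {
    const float eps = 1e-4f;          // assumed nudge past the shared boundary
    float t = tmaxCurrentLeaf + eps;  // next leaf's tmin
    return make_float3(o.x + d.x * t, o.y + d.y * t, o.z + d.z * t);
}
```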
Overview of GrUG
Two spatial separation methods:
- Uniform grid
- GrUG groups
The hash function takes X, Y, Z coordinates as input and outputs the memory address of a GrUG grouping, which can then be passed to a software traversal algorithm.
- Traversal starts at GrUG groupings
- A kd-tree is used
- Uniform grid structure
- Only leaf nodes need to be present in memory
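As a rough sketch of this two-level traversal (all helper functions here are hypothetical stand-ins for the hash pipeline, the kd-tree descent, primitive intersection, and the bounding-box exit computation):

```cuda
#include <cuda_runtime.h>

struct Hit { float t; int prim; };  // illustrative hit record

// Hypothetical stand-ins for GrUG's stages:
__device__ unsigned hashLookup(float3 p);                 // grid hash -> kd-tree root address
__device__ int      kdFindLeaf(unsigned root, float3 p);  // software kd-tree descent
__device__ bool     intersectLeaf(int leaf, float3 o, float3 d, float t, Hit* hit);
__device__ float    leafExitTmax(int leaf, float3 o, float3 d); // bounding-box exit distance

__device__ bool traceRay(float3 o, float3 d, float tEnd, Hit* hit) {
    float t = 0.0f;                       // entry distance into the current region
    while (t < tEnd) {
        float3 p = make_float3(o.x + d.x * t,
                               o.y + d.y * t,
                               o.z + d.z * t);
        unsigned root = hashLookup(p);    // fixed-hardware stage in GrUG
        int leaf = kdFindLeaf(root, p);   // user-programmable stage
        if (intersectLeaf(leaf, o, d, t, hit))
            return true;                  // hit found inside this leaf
        t = leafExitTmax(leaf, o, d) + 1e-4f; // current tmax = next tmin
    }
    return false;                         // ray left the scene without a hit
}
```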
Pipeline Architecture
- Standalone processing block inside the processor
- Fixed hardware

Stages:
- Memory address registers
- Ray projection
- Ray undergoes GrUG traversal
- Read the bounding box of the GrUG grouping
- The tmax value is computed
- Accepts rays at a rate of one per clock cycle
- Pipeline stages can be vectorized
- Ideal for streaming processors
Integration of the GrUG pipeline into a multi-core graphics processor and the fixed hardware stages for the GrUG pipeline.
Hash Function
- Determines the grid cell of a ray
- Maps the grid cell ID to a memory address
- Locates the root node for software traversal
- Input: ray location (x, y, z)
- Output: 9-bit value from each hash-function pipeline
- Maximum supported grid size: 512 × 512 × 512
- Floating-point values from -1.0 to 1.0
Architecture of GrUG hash function for one axis using a 512 grid
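In software terms, one axis of this stage quantizes a coordinate in [-1.0, 1.0) to a 9-bit index for a 512 grid; the sketch below models only the arithmetic, not the figure's bit-level pipeline.

```cuda
#include <cuda_runtime.h>

// Quantize one coordinate in [-1.0, 1.0) to a 9-bit cell index (512 cells).
__device__ unsigned hashAxis(float x) {
    unsigned i = (unsigned)((x + 1.0f) * 0.5f * 512.0f); // map [-1,1) -> [0,512)
    return (i > 511u) ? 511u : i;                        // clamp the x == 1.0 edge
}

// Combine the three 9-bit indices into a 27-bit hash-table offset.
__device__ unsigned hashCell(float3 p) {
    return (hashAxis(p.x) << 18) | (hashAxis(p.y) << 9) | hashAxis(p.z);
}
```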
Implementation
Simulator
- GPGPU-Sim simulator
- PTX assembly files generated by the NVIDIA NVCC compiler
- PTX assembly code modified
Kernel Code
- Ray generation
- Post-GrUG traversal operations:
  - Read the selected GrUG grouping's bounding box
  - Compute the ray's tmax value (sketched below)
- Kd-tree algorithm (Radius-CUDA)
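A sketch of that post-GrUG step using the standard slab method, assuming the ray's current point lies inside the grouping's axis-aligned bounding box (invDir holds the per-axis reciprocals of the direction):

```cuda
#include <cuda_runtime.h>

// Exit distance (tmax) of a ray from an axis-aligned bounding box via the
// slab method; assumes the ray origin o is inside [bmin, bmax].
__device__ float boxExitTmax(float3 o, float3 invDir, float3 bmin, float3 bmax) {
    float tx = ((invDir.x >= 0.0f ? bmax.x : bmin.x) - o.x) * invDir.x;
    float ty = ((invDir.y >= 0.0f ? bmax.y : bmin.y) - o.y) * invDir.y;
    float tz = ((invDir.z >= 0.0f ? bmax.z : bmin.z) - o.z) * invDir.z;
    return fminf(tx, fminf(ty, tz));  // nearest exiting slab along the ray
}
```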
Benchmark Scenes
8 scenes, rendered at a resolution of 512 × 512
Results a) Performance
[Figure: relative speedup over brute-force intersection for the Box, Bunny, Robots, and Kitchen scenes (values up to 12.9X)]
Performance Results
- Reduced the number of tree traversal steps by 32.5X for visible rays
- Overall speedup: 1.6X on average for visible rays
- Performance for a grid size of 128 improves over the software implementation by 1.9X, compared to 2.15X for a grid size of 512
- Conference benchmark scene at resolution 128
Results
b) Memory
Memory Requirements
- Overhead of storing the hash table in memory: 4 bytes per grid cell -> up to 4,294,967,296 addressable GrUG groups
- 512 grid: 512 MB hash table
- Smaller grid sizes: hash table of up to 4 MB
- 128 grid: 1.5 times the memory of the kd-tree
- 512 grid: 27.6 times the memory of the kd-tree
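As a quick check of these figures: a 4-byte entry can index 2^32 = 4,294,967,296 distinct GrUG groups, and a 512 grid has 512^3 = 134,217,728 cells, so its hash table occupies 134,217,728 × 4 bytes = 536,870,912 bytes = 512 MB.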
- Smaller grid sizes are more memory-efficient
- Balance between performance and memory
- Stores the kd-tree structure and the bounding dimensions of threshold nodes, a memory requirement similar to storing a full kd-tree
Results
c) Bandwidth
Bandwidth Requirements
- Average memory bandwidth per frame is smaller
- Fewer down-tree traversals -> fewer device memory transactions
- Bandwidth is used for post-GrUG software traversal
- GrUG memory bandwidth + down-tree traversals < down-tree traversals of a full software implementation
Advantages
- Maintains user programmability
- Increases ray tracing performance
- Diverse implementation scope
Conclusion
- New graphics hardware architecture
- Small fixed-hardware pipeline
- Offloads part of the acceleration-structure traversal computations
- Diverse implementation scope across processor architectures
- Maintains user programmability
- Improves overall run-time performance
Future Work
References
[1] Algorithm for 3D Digital Differential Analyzer (3DDDA), CG351-551 ray tracing course notes.
[2] Introduction to Grids, flipcode: Raytracing Topics & Techniques.
[3] Tim Foley and Jeremy Sugerman, "KD-Tree Acceleration Structures for a GPU Raytracer," Stanford University.
[4] Michael Steffen and Joseph Zambreno, "Design and Evaluation of a Hardware Accelerated Ray Tracing Data Structure," Department of Electrical and Computer Engineering, Iowa State University, USA.
[5] Ali Bakhoda, George L. Yuan, Wilson W. L. Fung, Henry Wong, and Tor M. Aamodt, "Analyzing CUDA Workloads Using a Detailed GPU Simulator," University of British Columbia, Vancouver, BC, Canada.
[6] Martin Zlatuška, "Ray Tracing on a GPU with CUDA: Comparative Study of Three Algorithms," Czech Technical University in Prague, Faculty of Electrical Engineering, Czech Republic.
[7] Wikipedia: Ray tracing.
Thank you