37th SPEEDUP Workshop on HPC
Albert Cheng, Elena Pourmal
The HDF Group
September 9, 2008
Outline
8:00-9:00 Introduction to HDF5 data, programming models and tools
9:00-9:30 Advanced features
10:00-12:00 Introduction to Parallel HDF5
13:15-14:15 Caching and buffering in HDF5
14:45-16:45 New features in HDF5 1.8.0
What is HDF?
HDF is
A file format for managing any kind of data
Software system to manage data in the format
Designed for high volume or complex data
Designed for every size and type of system
Open format and software library, tools
There are two HDFs: HDF4 and HDF5
Today we focus on HDF5
Datasets
HDF5 model
Groups: provide structure among objects
Datasets: where the primary data goes
  Data arrays
  Rich set of datatype options
  Flexible, efficient storage and I/O
HDF5 Software
HDF5 Philosophy
Who uses HDF5?
Applications that deal with big or complex data
Over 200 different types of apps
2+ million product users worldwide
Academia, government agencies, industry
NASA EOS remote sensing data
HDF is the standard file format for storing data from NASA's Earth Observing System (EOS) mission.
Petabytes of data are stored in HDF and HDF5 to support the Global Climate Change Research Program.
Large simulations
A simulation can have billions of elements
Each element can have dozens of associated values
Large images
Electron tomography
Data complexity
Thanks to Mark Miller, LLNL
[Figure labels: Contig Qualities, Coverage Depth, Reads, Aligned bases, Percent match]
HDF5 Dataset
Metadata
  Dataspace: rank 3; Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
  Datatype: IEEE 32-bit float
  Attributes: Time = 32.4, Pressure = 987, Temp = 56
  Storage info: chunked, compressed
Data
HDF5 Dataspace
Two roles
Dataspace contains spatial info about a dataset stored in a file
  Rank and dimensions
  Permanent part of dataset definition
Dataspace describes the application's data buffer and the data elements participating in I/O
HDF5 Datatype
Datatype: how to interpret a data element
  Permanent part of the dataset definition
  Two classes: atomic and compound
  Can be stored in a file as an HDF5 object (HDF5 committed datatype)
  Can be shared among different datasets
HDF5 Datatype
HDF5 atomic types include
  normal integer & float
  user-definable (e.g., 13-bit integer)
  variable length types (e.g., strings)
  references to objects/dataset regions
  enumeration - names mapped to integers
  array
HDF5 compound types
  Comparable to C structs ("records")
  Members can be atomic or compound types
[Figure labels: int8, int4, int16]
[Figure labels: extendable, external, Dataset "Fred", File A, Metadata for Fred]
HDF5 Attribute
Attribute: data of the form "name = value", attached to an object by application
Operations similar to dataset operations, but
  Not extendible
  No compression or partial I/O
Can be overwritten, deleted, added during the life of a dataset
HDF5 Group
A mechanism for organizing collections of related objects
Every file starts with a root group
Similar to UNIX directories
Can have attributes
/
/
bar
temp
Solid modeling
HDF5mesh
HDF5 Software
Partial I/O
Move just part of a dataset
(b) Regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array
[Figure: four nodes and the I/O stack]
I/O library (HDF5)
Parallel I/O library (MPI-I/O)
Parallel file system (GPFS)
Switch network/I/O servers
Storage
Memory
Network
UDM
LANL
SAF
LLNL, SNL
H5Part
Grids
IDL
COTS
HDF-EOS
NASA
Storage
HDF5 format
File
http://www.hdfgroup.org/HDF5/release/index.html
QA
Daily regression tests on key platforms
Meets NASA's highest technology readiness level
Other Software
The HDF Group
  HDFView
  Java tools
  Command-line utilities
  Web browser plug-in
  Regression and performance testing software
  Parallel h5diff
3rd Party (IDL, MATLAB, Mathematica, PyTables, HDF Explorer, LabView)
Communities (EOS, ASC, CGNS)
Integration with other software (iRODS, OPeNDAP)
Storm
4x6 array of integers
Demo
Demonstrate the use of HDFView to create the HDF5 file
Use h5dump to see the contents of the HDF5 file
Use h5import to add data to the HDF5 file
Use h5repack to change properties of the stored objects
Use h5diff to compare two files
Library functions are categorized by object type
H5Lite API supports basic capabilities
Example APIs:
H5D : Dataset interface, e.g., H5Dread
H5F : File interface, e.g., H5Fopen
H5S : dataSpace interface, e.g., H5Sclose
Object is opened or created
Object is accessed, possibly many times
Object is closed
Order of Operations
An order is imposed on operations by argument dependencies.
For example: a file must be opened before a dataset, because the dataset open call requires a file handle as an argument.
Objects can be closed in any order.
2. Create property lists, if necessary
3. Create the file
4. Close the file and the property lists, as needed
H5F_ACC_TRUNC flag: removes existing file
H5P_DEFAULT flags: create regular UNIX file and access it with HDF5 SEC2 I/O file driver
Dataset Components
Metadata
  Dataspace: rank 3; Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
  Datatype: IEEE 32-bit float
  Attributes: Time = 32.4, Pressure = 987, Temp = 56
  Storage info: chunked, compressed
Data
2. Decide where to put it: root group
3. Decide link or path: "A"
4. Create link and dataset in file
5. (Eventually) close everything
/ (root)
A
Create a dataspace
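hsize_t dims[2];
dims[0] = 4;                                      /* current dims */
dims[1] = 6;
dataspace_id = H5Screate_simple(2, dims, NULL);   /* rank = 2 */

/* Pathname "A", datatype H5T_STD_I32BE.
 * This five-argument call is the HDF5 1.6 form (H5Dcreate1 in 1.8 and later). */
dataset_id = H5Dcreate(file_id, "A", H5T_STD_I32BE, dataspace_id, H5P_DEFAULT);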
file.h5
HDF5 Information
HDF Information Center
http://www.hdfgroup.org
Questions?