
HDF5 Tutorial

37th SPEEDUP Workshop on HPC
Albert Cheng, Elena Pourmal
The HDF Group

September 9, 2008

SPEEDUP Workshop - HDF5 Tutorial

Outline
8:00-9:00 Introduction to HDF5 data, programming models and tools
9:00-9:30 Advanced features
10:00-12:00 Introduction to Parallel HDF5
13:15-14:15 Caching and buffering in HDF5
14:45-16:45 New features in HDF5 1.8.0


Introduction to HDF5 Data, Programming Models and Tools


What is HDF?


HDF is
A file format for managing any kind of data
Software system to manage data in the format
Designed for high volume or complex data
Designed for every size and type of system
Open format and software library, tools
There are two HDFs: HDF4 and HDF5
Today we focus on HDF5


HDF5: The Format


An HDF5 file is a container

lat | lon | temp
12  | 23  | 3.1
15  | 24  | 4.2
17  | 21  | 3.6


Structures to organize objects


Groups

Datasets

HDF5 model
Groups: provide structure among objects
Datasets: where the primary data goes
  Data arrays
  Rich set of datatype options
  Flexible, efficient storage and I/O
Attributes, for metadata


HDF5: The Software


HDF5 Software


Users of HDF5 Software


Most data consumers are here: scientific/engineering applications; domain-specific libraries/APIs, tools.
HDF5 Application Programming Interface: applications and tools use this API to create, read, write, query, etc.
Virtual file layer (VFL): power users (consumers); modules to adapt I/O to specific features of the system, or to do I/O in some special way.
File: could be on a parallel system, in memory, on the network, a collection of files, etc.



HDF5 Philosophy


Who uses HDF5?


Who uses HDF5?
Applications that deal with big or complex data
Over 200 different types of apps
2+ million product users world-wide
Academia, government agencies, industry


Applications with large amounts of data


NASA EOS remote sensing data
HDF format is the standard file format for storing data from NASA's Earth Observing System (EOS) mission.
Petabytes of data stored in HDF and HDF5 to support the Global Climate Change Research Program.


Large simulations
A simulation can have billions of elements
Each element can have dozens of associated values


Large images
Electron tomography


It is not just about size


Data complexity

Thanks to Mark Miller, LLNL

Complex relationships within data


Contig Summaries
Discrepancies

Contig Qualities

Coverage Depth

Reads

Aligned bases

Percent match


Different views of data


Flight test


HDF5 Data Model


HDF5 model (recap)


Groups: provide structure among objects
Datasets: where the primary data goes
  Data arrays
  Rich set of datatype options
  Flexible, efficient storage and I/O
Attributes, for metadata
Other objects
  Links (point to data in a file or in another HDF5 file)
  Datatypes (can be stored for complex structures and reused by multiple datasets)


HDF5 Dataset
Metadata
  Dataspace: rank 3; Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
  Datatype: IEEE 32-bit float
  Storage info: chunked, compressed
  Attributes: Time = 32.4, Pressure = 987, Temp = 56
Data


HDF5 Dataspace
Two roles
Dataspace contains spatial info about a dataset stored in a file
  Rank and dimensions
  Permanent part of dataset definition
Dataspace describes the application's data buffer and the data elements participating in I/O


HDF5 Datatype
Datatype: how to interpret a data element
  Permanent part of the dataset definition
  Two classes: atomic and compound
  Can be stored in a file as an HDF5 object (HDF5 committed datatype)
  Can be shared among different datasets


HDF5 Datatype
HDF5 atomic types include
  normal integer & float
  user-definable (e.g., 13-bit integer)
  variable length types (e.g., strings)
  references to objects/dataset regions
  enumeration: names mapped to integers
  array
HDF5 compound types
  Comparable to C structs (records)
  Members can be atomic or compound types

HDF5 dataset: array of records


3

int8

int4

int16


Special storage options for dataset


chunked: better subsetting access time; extendable
compressed: improves storage efficiency, transmission speed
extendable: arrays can be extended in any direction
external: metadata in HDF5 file, raw data in a binary file
  Dataset "Fred": File A holds the metadata for Fred, File B holds the data for Fred



HDF5 Attribute
Attribute: data of the form "name = value", attached to an object by application
Operations similar to dataset operations, but
  Not extendible
  No compression or partial I/O
Can be overwritten, deleted, added during the life of a dataset


HDF5 Group
A mechanism for organizing collections of related objects
Every file starts with a root group
Similar to UNIX directories
Can have attributes
/


Path to HDF5 object in a file


/ (root)
/X
/Y
/Y/temp
/Y/bar/temp


Shared HDF5 objects


A R
/

B P


HDF5 Data Model Example


ENSIGHT Automotive crash simulation


Automotive crash simulation


Automotive crash simulation


Automotive crash simulation


Solid modeling


Solid modeling


HDF5 mesh


Mesh Example, in HDFView


HDF5 Software


HDF5 software stack


Structure of HDF5 Library


Write from memory to disk


Partial I/O
Move just part of a dataset

(a) Hyperslab from a 2D array to the corner of a smaller 2D array

(b) Regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array

Partial I/O
Move just part of a dataset

(c) A sequence of points from a 2D array to a sequence of points in a 3D array.

(d) Union of hyperslabs in file to union of hyperslabs in memory.



Layers: parallel example


Application
Parallel computing system (Linux cluster): compute nodes
I/O library (HDF5)
Parallel I/O library (MPI-I/O)
Parallel file system (GPFS)
Switch network/I/O servers

I/O flows through many layers from application to disk.


Virtual I/O layer


Virtual file I/O layer


A public API for writing I/O drivers
Allows HDF5 to interface to disk, the network, memory, or a user-defined device

Virtual file I/O drivers: Stdio, File Family, MPI I/O, Memory, Network
Storage, Memory, Network


Applications & Domains


Simulation, visualization, remote sensing
Examples: thermonuclear simulations, product modeling, data mining tools, visualization tools, climate models

Common domain-specific data models

Domain-specific APIs
  UDM (LANL)
  SAF (LLNL, SNL)
  H5Part (Grids)
  IDL (COTS)
  HDF-EOS (NASA)

Storage (HDF5 format)
  File
  Split metadata and raw data files
  File on parallel file system
  Across the network, or to/from another application or library
  User-defined device

Portability & Robustness


Runs almost anywhere
  Linux and UNIX workstations
  Windows, Mac OS X
  Big ASC machines, Crays, VMS systems
  TeraGrid and other clusters
  Source and binaries available from http://www.hdfgroup.org/HDF5/release/index.html

QA
  Daily regression tests on key platforms
  Meets NASA's highest technology readiness level


Other Software
The HDF Group
  HDFView
  Java tools
  Command-line utilities
  Web browser plug-in
  Regression and performance testing software
  Parallel h5diff
3rd Party (IDL, MATLAB, Mathematica, PyTables, HDF Explorer, LabView)
Communities (EOS, ASC, CGNS)
Integration with other software (iRODS, OPeNDAP)

Creating an HDF5 File with HDFView


Example: Create this HDF5 File


/ (root)
A B

Storm
4x6 array of integers


Demo
Demonstrate the use of HDFView to create the HDF5 file
Use h5dump to see the contents of the HDF5 file
Use h5import to add data to the HDF5 file
Use h5repack to change properties of the stored objects
Use h5diff to compare two files


Introduction to HDF5 Programming Model and APIs


Structure of HDF5 Library (recap)


Goals of HDF5 Library


Provide flexible API to support a wide range of operations on data.
Support high performance access in serial and parallel computing environments.
Be compatible with common data models and programming languages.
Because of these goals, the HDF5 API is rich and large.

Operations Supported by the API


Create groups, datasets, attributes, linkages
Create complex data types
Assign storage and I/O properties to objects
Perform complex subsetting during read/write
Use variety of I/O devices (parallel, remote, etc.)
Transform data during I/O
Query about file structure and properties
Query about object structure, content, properties


Characteristics of the HDF5 API


For flexibility, the API is extensive
  300+ functions
  (Image: Victorinox Swiss Army CyberTool 34)

This can be daunting, but there is hope
  A few functions can do a lot
  Start simple
  Build up knowledge as more features are needed

Library functions are categorized by object type
H5Lite API supports basic capabilities


The General HDF5 API


Currently C, Fortran 90, Java, and C++ bindings.
C routines begin with prefix H5?
  ? is a character corresponding to the type of object the function acts on

Example APIs:
  H5D: Dataset interface, e.g., H5Dread
  H5F: File interface, e.g., H5Fopen
  H5S: dataSpace interface, e.g., H5Sclose


Compiling HDF5 Applications


h5cc: HDF5 C compiler command (similar to mpicc)
h5fc: HDF5 F90 compiler command (similar to mpif90)
h5c++: HDF5 C++ compiler command
To compile:
  % h5cc h5prog.c
  % h5fc h5prog.f90


Compile option: -show


-show: displays the compiler commands and options without executing them
% h5cc -show Sample_c.c
gcc -I/home/packages/hdf5_1.6.6/Linux_2.6/include -UH5_DEBUG_API -DNDEBUG -I/home/packages/szip/static/encoder/Linux2.6-gcc/include -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_POSIX_SOURCE -D_BSD_SOURCE -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions -c Sample_c.c
gcc -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions -L/home/packages/szip/static/encoder/Linux2.6-gcc/lib Sample_c.o -L/home/packages/hdf5_1.6.6/Linux_2.6/lib /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5_hl.a /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5.a -lsz -lz -lm -Wl,-rpath -Wl,/home/packages/hdf5_1.6.6/Linux_2.6/lib


General Programming Paradigm


Properties of object are optionally defined
  Creation properties
  Access property lists
  Default values used if none are defined
Object is opened or created
Object is accessed, possibly many times
Object is closed


Order of Operations
An order is imposed on operations by argument dependencies.
For example: a file must be opened before a dataset, because the dataset open call requires a file handle as an argument.
Objects can be closed in any order.


HDF5 Defined Types


For portability, the HDF5 library has its own defined types:
  hid_t: object identifiers (native integer)
  hsize_t: size used for dimensions (unsigned long or unsigned long long)
  hssize_t: for specifying coordinates and sometimes for dimensions (signed long or signed long long)
  herr_t: function return value
  hvl_t: variable length datatype

For C, include hdf5.h in your HDF5 application.



Example: Create this HDF5 File


/ (root)
A B

4x6 array of integers


Example: Step by Step


/ (root)
B

4x6 array of integers


Example: Create a File


/ (root)


Steps to Create a File


1. Decide any special properties the file should have
   Creation properties, like size of user block
   Access properties, such as metadata cache size
2. Create property lists, if necessary
3. Create the file
4. Close the file and the property lists, as needed


Code: Create a File


hid_t file_id;
file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

H5F_ACC_TRUNC flag: removes existing file
H5P_DEFAULT flags: create regular UNIX file and access it with HDF5 SEC2 I/O file driver

Example: Add a Dataset


/ (root)
A

4x6 array of integers


Dataset Components
Metadata
  Dataspace: rank 3; Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
  Datatype: IEEE 32-bit float
  Storage info: chunked, compressed
  Attributes: Time = 32.4, Pressure = 987, Temp = 56
Data


Dataset Creation Property List


Dataset creation property list: information on how to store data in a file


Steps to Create a Dataset


1. Define dataset characteristics
   Dataspace: 4x6
   Datatype: integer
   Properties (if needed)
2. Decide where to put it: root group
   Obtain location identifier
3. Decide link or path: "A"
4. Create link and dataset in file
5. (Eventually) Close everything

/ (root)
A


Code: Create a Dataset


hid_t   file_id, dataset_id, dataspace_id;
hsize_t dims[2];
herr_t  status;

file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

/* Create a dataspace: rank 2, current dims 4x6 */
dims[0] = 4;
dims[1] = 6;
dataspace_id = H5Screate_simple(2, dims, NULL);

/* Create a dataset: pathname "A", datatype H5T_STD_I32BE, the dataspace,
   and the default property list (five-argument H5Dcreate of the HDF5 1.6 API) */
dataset_id = H5Dcreate(file_id, "A", H5T_STD_I32BE, dataspace_id, H5P_DEFAULT);

/* Terminate access to dataset, dataspace, file */
status = H5Dclose(dataset_id);
status = H5Sclose(dataspace_id);
status = H5Fclose(file_id);


Example: Create a Group


/ (root)
A B

4x6 array of integers

file.h5


Steps to Create a Group


1. Decide where to put it: root group
   Obtain location identifier
2. Decide link or path: "B"
3. Create link and group in file
   Specify number of bytes to store names of objects to be added to group (as a hint) or use default.
4. (Eventually) Close the group.


Code: Create a Group


hid_t file_id, group_id;
...
/* Open "file.h5" */
file_id = H5Fopen("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);
/* Create group "/B" in file. */
group_id = H5Gcreate(file_id, "/B", 0);
/* Close group and file. */
status = H5Gclose(group_id);
status = H5Fclose(file_id);


HDF5 Information
HDF Information Center
  http://www.hdfgroup.org
HDF Help email address
  help@hdfgroup.org
HDF users mailing lists
  news@hdfgroup.org
  hdf-forum@hdfgroup.org


Questions?
