Bob Netherton
Technical Specialist, Solaris Adoption
Sun Microsystems, Inc.
http://blogs.sun.com/bobn
What is ZFS?
Why a new file system?
What's different about it?
What can I do with it?
How much does it cost?
Where does ZFS go from here?
What is ZFS?
End-to-End Data Integrity
Fat locks, fixed block size, naive pre-fetch, dirty region logging
Transactional operation
> Maintain consistent on-disk format
> Reorder transactions for performance gains: a big performance win
[Figure: volume layouts built from pairs of 1GB disks: Concatenated 2GB (Lower 1GB + Upper 1GB), Striped 2GB (Even 1GB + Odd 1GB), Mirrored 1GB (Left 1GB + Right 1GB)]
No partitions / volumes
Grow / shrink automatically
All bandwidth always available
All storage in pool is shared
[Figure: diagram with the labels "ZFS", "ZFS", and "Volume Manager"]
Volume to Disk
> Block device interface
> Write each block to each disk immediately to sync mirrors
> Loss of power = resync
> Synchronous & slow
SP to Disk
> Schedule, aggregate, and issue I/O
DATA INTEGRITY
Everything is transactional
> Related changes succeed or fail as a whole
> No need for journaling
Everything is checksummed
> No silent corruptions
> No panics from bad metadata
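The "succeed or fail as a whole" point can be illustrated with a small sketch. It is not ZFS code: the `Store` class and the `link` / `unlink` helpers are hypothetical names, and a Python dict stands in for the on-disk tree. Changes are applied to a private copy, and the live root is switched only after every change has succeeded.

```python
class Store:
    """Toy model: `root` plays the role of the currently committed state."""

    def __init__(self):
        self.root = {}

    def transaction(self, changes):
        # Work on a copy; the committed state is never modified in place.
        new_root = dict(self.root)
        for change in changes:
            change(new_root)      # any exception aborts the whole group
        self.root = new_root      # one switch makes all changes visible


def link(name, data):
    def change(state):
        state[name] = data
    return change


def unlink(name):
    def change(state):
        del state[name]           # raises KeyError if the name is missing
    return change


store = Store()
store.transaction([link("a.txt", b"hello")])

# A rename expressed as two related changes: both happen or neither does.
store.transaction([link("b.txt", store.root["a.txt"]), unlink("a.txt")])

# The second change fails, so the first one must not become visible.
try:
    store.transaction([link("c.txt", b"new"), unlink("missing.txt")])
    failed = False
except KeyError:
    failed = True
```

Because the old root stays untouched until the switch, there is nothing to replay or undo after a failure, which is the intuition behind not needing a journal.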
New Pointers
End-to-End Checksums
Checksums are separated from the data
Prevents:
> Silent data corruption
> Panics from corrupted metadata
> Phantom writes
> Misdirected reads and writes
> DMA parity errors
> Errors from driver bugs
> Accidental overwrites
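A minimal sketch of why keeping the checksum apart from the data matters. The `BlockPointer` and `Disk` names are hypothetical, and SHA-256 is used only because it is in the Python standard library. The checksum travels with the pointer to a block, so a block that was overwritten or misdirected cannot vouch for itself.

```python
import hashlib
from dataclasses import dataclass


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


@dataclass
class BlockPointer:
    """Hypothetical pointer: where the block lives plus what it should hash to."""
    address: int
    checksum: bytes


class Disk:
    def __init__(self):
        self.blocks = {}

    def write(self, address: int, data: bytes) -> BlockPointer:
        self.blocks[address] = data
        # The checksum is returned to the caller (the parent), not stored with the data.
        return BlockPointer(address, checksum(data))

    def read(self, ptr: BlockPointer) -> bytes:
        data = self.blocks[ptr.address]
        if checksum(data) != ptr.checksum:
            raise IOError(f"checksum mismatch at block {ptr.address}")
        return data


disk = Disk()
ptr = disk.write(7, b"payroll records")
clean_read = disk.read(ptr)

# Simulate a misdirected write landing on block 7 behind the file system's back.
disk.blocks[7] = b"someone else's data"
try:
    disk.read(ptr)
    detected = False
except IOError:
    detected = True
```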
Self-Healing Data
ZFS can detect bad data using checksums and heal the data using its mirrored copy.
[Figure: three panels, each showing an Application on top of a ZFS Mirror]
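The detect-and-heal behaviour described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the ZFS implementation: `Mirror` is a hypothetical class, each side of the mirror is a plain dict, and the expected checksum is handed to `read` by the caller.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class Mirror:
    """Toy mirror: every write goes to all sides, reads verify before returning."""

    def __init__(self, sides: int = 2):
        self.sides = [dict() for _ in range(sides)]

    def write(self, address: int, data: bytes) -> bytes:
        for side in self.sides:
            side[address] = data
        return checksum(data)

    def read(self, address: int, expected: bytes) -> bytes:
        failed = []
        for side in self.sides:
            data = side.get(address)
            if data is not None and checksum(data) == expected:
                # Found a good copy: repair every side that failed before it.
                for bad in failed:
                    bad[address] = data
                return data
            failed.append(side)
        raise IOError(f"no valid copy of block {address}")


mirror = Mirror()
expected = mirror.write(1, b"good data")

# Corrupt the first side; the application should never see this.
mirror.sides[0][1] = b"bit rot!!"
result = mirror.read(1, expected)
```

After the read, both sides hold the good data again, and the application only ever received the verified copy.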
Disk Scrubbing
Uses checksums to verify the integrity of all the data
Traverses metadata to read every copy of every block
Finds latent errors while they're still correctable
It's like ECC memory scrubbing, but for disks
Provides fast and reliable re-silvering of mirrors
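A scrub along these lines can be sketched as a walk over a tree of pointers that checks every copy of every block, not just the first good one. The `Node` type and the `scrub` function are hypothetical, and the tree is held in memory for simplicity rather than read from the blocks themselves.

```python
import hashlib
from dataclasses import dataclass, field


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


@dataclass
class Node:
    """Hypothetical metadata entry: block address, expected checksum, children."""
    address: int
    checksum: bytes
    children: list = field(default_factory=list)


def scrub(root: Node, sides: list) -> int:
    """Visit every block, verify every copy, repair bad copies. Returns repairs made."""
    repaired = 0
    stack = [root]
    while stack:
        node = stack.pop()
        copies = [side.get(node.address) for side in sides]
        good = next(
            (c for c in copies if c is not None and checksum(c) == node.checksum),
            None,
        )
        if good is None:
            raise IOError(f"block {node.address} is unrecoverable")
        for side, copy in zip(sides, copies):
            if copy is None or checksum(copy) != node.checksum:
                side[node.address] = good
                repaired += 1
        stack.extend(node.children)
    return repaired


blocks = {0: b"root", 1: b"leaf a", 2: b"leaf b"}
sides = [dict(blocks), dict(blocks)]          # two-way mirror
tree = Node(0, checksum(blocks[0]), [
    Node(1, checksum(blocks[1])),
    Node(2, checksum(blocks[2])),
])

sides[1][2] = b"leaf ?"                       # latent error nobody has read yet
first_pass = scrub(tree, sides)
second_pass = scrub(tree, sides)
```

The first pass finds and repairs the one damaged copy while a good copy still exists; the second pass finds nothing to do.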
RAID-Z Protection
RAID-5 and More
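For readers unfamiliar with the RAID-5 part of the comparison, here is a sketch of single-parity reconstruction using XOR. It shows only the RAID-5-style idea; whatever RAID-Z adds beyond it ("and More") is not modelled, and `xor_blocks` is a hypothetical helper.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data disks and one parity disk holding the XOR of the data.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Lose the second disk: XOR of the survivors and the parity gives it back.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```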
128-bit File System
No Practical Limitations on File Size, Directory Entries, etc.
All metadata is dynamic
Concurrent Everything
EASIER ADMINISTRATION
Easier Administration
Pooled Storage Design makes for Easier Administration
No need for a Volume Manager!
Straightforward Commands and a GUI
> Snapshots & Clones
> Quotas & Reservations
> Compression
> Pool Migration
> ACLs for Security
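How quotas and reservations interact with pooled storage can be sketched with some simple arithmetic. This is a simplified model with hypothetical numbers and a hypothetical `available` function, not the accounting ZFS actually performs: a reservation sets space aside for one file system, and a quota caps what one file system may use.

```python
def available(name, pool_size, used, quotas, reservations):
    """Space (in GB) that file system `name` could still consume in this toy model."""
    # Each file system claims its usage or its reservation, whichever is larger.
    claimed = {n: max(u, reservations.get(n, 0)) for n, u in used.items()}
    pool_free = pool_size - sum(claimed.values())
    # A file system can also grow into the unused part of its own reservation.
    avail = pool_free + (claimed[name] - used[name])
    if name in quotas:
        avail = min(avail, quotas[name] - used[name])
    return avail


# Hypothetical setup: a 66 GB pool shared by three home directories.
pool_size = 66
used = {"ahrens": 0, "billm": 0, "bonwick": 0}
quotas = {"ahrens": 10}          # ahrens may never exceed 10 GB
reservations = {"bonwick": 20}   # 20 GB is set aside for bonwick

report = {n: available(n, pool_size, used, quotas, reservations) for n in used}
```

In this model the file system with a quota sees 10 GB, the one with a reservation sees the whole pool, and the third sees the pool minus the space reserved for someone else.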
[Figure: two ZFS file systems on top of one Storage Pool]
# zpool add tank mirror c9t43d0 c13t12d0
# df -h -F zfs
Filesystem   size   used  avail capacity  Mounted on
tank          66G    24K    66G     1%    /tank
tank/home     66G    27K    66G     1%    /export/home
ZFS Snapshots
Provide a read-only point-in-time copy of a file system
Copy-on-write makes them essentially free
Very space efficient: only changes are tracked
And instantaneous: ZFS just doesn't delete the old copy
[Figure: block tree with the labels "New Uber-block", "Snapshot Uber-block", and "Current Data"]
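Why copy-on-write makes a snapshot nearly free can be shown with a toy model. The `FileSystem` class is hypothetical, and a dict of names stands in for the block tree. Every write builds a new root instead of changing the old one, so a snapshot is just a remembered reference to a root that will never be modified.

```python
class FileSystem:
    """Toy copy-on-write file system: `root` is the live view, never edited in place."""

    def __init__(self):
        self.root = {}
        self.snapshots = {}

    def write(self, name, data):
        # Copy the pointers (not the data) and change only the new copy.
        # A real tree would copy just the path to the changed block.
        new_root = dict(self.root)
        new_root[name] = data
        self.root = new_root

    def snapshot(self, label):
        # Nothing is copied: the snapshot simply keeps the current root alive.
        self.snapshots[label] = self.root

    def rollback(self, label):
        self.root = self.snapshots[label]


fs = FileSystem()
fs.write("report.txt", b"draft 1")
fs.snapshot("s1")
fs.write("report.txt", b"draft 2")

live_after_write = fs.root["report.txt"]
snap_after_write = fs.snapshots["s1"]["report.txt"]

fs.rollback("s1")
live_after_rollback = fs.root["report.txt"]
```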
ZFS Snapshots
Snapshots are simple to create and to roll back to
# zfs list -r tank
NAME                USED   AVAIL  REFER  MOUNTPOINT
tank                20.0G  46.4G  24.5K  /tank
tank/home           20.0G  46.4G  28.5K  /export/home
tank/home/ahrens    24.5K  10.0G  24.5K  /export/home/ahrens
tank/home/billm     24.5K  46.4G  24.5K  /export/home/billm
tank/home/bonwick   24.5K  66.4G  24.5K  /export/home/bonwick
# zfs snapshot tank/home/billm@s1
# zfs list -r tank/home/billm
NAME                 USED   AVAIL  REFER  MOUNTPOINT
tank/home/billm      24.5K  46.4G  24.5K  /export/home/billm
tank/home/billm@s1   0      -      24.5K  -
ZFS Clones
A clone is a writable copy of a snapshot
> Created instantly, unlimited number
Perfect for read-mostly file systems: source directories, application binaries and configuration, etc.
# zfs list -r tank/home/billm
NAME                 USED   AVAIL  REFER  MOUNTPOINT
tank/home/billm      24.5K  46.4G  24.5K  /export/home/billm
tank/home/billm@s1   0      -      24.5K  -
# zfs clone tank/home/billm@s1 tank/newbillm
# zfs list -r tank/home/billm tank/newbillm
NAME                 USED   AVAIL  REFER  MOUNTPOINT
tank/home/billm      24.5K  46.4G  24.5K  /export/home/billm
tank/home/billm@s1   0      -      24.5K  -
tank/newbillm        0      46.4G  24.5K  /tank/newbillm
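The idea of a clone as a writable view over a read-only snapshot can be sketched with a layered mapping. This is an analogy rather than how ZFS stores clones: the `clone` function is hypothetical, and Python's `ChainMap` sends writes to a private top layer while reads fall through to the shared snapshot.

```python
from collections import ChainMap
from types import MappingProxyType

# A read-only snapshot of a source directory (hypothetical contents).
snapshot = MappingProxyType({"main.c": b"v1", "Makefile": b"all:"})


def clone(snap):
    """Create a writable view: an empty private layer on top of the snapshot."""
    return ChainMap({}, snap)


# Clones are created instantly, and any number can share one snapshot.
alice = clone(snapshot)
bob = clone(snapshot)

alice["main.c"] = b"v2"        # lands in alice's private layer only

alice_private_blocks = len(alice.maps[0])
bob_private_blocks = len(bob.maps[0])
```

Each clone stores only what it has changed, which is consistent with the listing above showing the new clone with 0 used while referring to the same 24.5K as the snapshot.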
Adaptive Endian-ness - Hosts always write in their native endian-ness
Opposite Endian Systems - Write and copy operations will eventually byte swap all data!
Config Data is Stored within the Data - When the data moves, so does its config info
if needed
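The adaptive endian-ness idea can be sketched like this. The one-byte order marker and the `write_block` / `read_block` helpers are assumptions made for illustration and do not reflect the real on-disk format: the writer uses its native byte order and records which one it used, and the reader swaps only when the recorded order differs from its own.

```python
import struct
import sys

NATIVE = "<" if sys.byteorder == "little" else ">"


def write_block(values, order=NATIVE):
    """Pack 64-bit integers in the writer's byte order, prefixed by a marker byte."""
    marker = b"L" if order == "<" else b"B"
    return marker + struct.pack(f"{order}{len(values)}Q", *values)


def read_block(raw):
    """Unpack using the order recorded in the block; struct swaps bytes if needed."""
    order = "<" if raw[:1] == b"L" else ">"
    count = (len(raw) - 1) // 8
    return list(struct.unpack(f"{order}{count}Q", raw[1:]))


values = [1, 2, 3]
from_little = write_block(values, "<")   # as written by a little-endian host
from_big = write_block(values, ">")      # as written by a big-endian host

read_little = read_block(from_little)
read_big = read_block(from_big)
```

Either host can read either block. Rewriting a block on the new host stores it in that host's native order, which is how data would gradually be swapped after a pool moves.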
Data Compression
Reduces the amount of disk space used
Reduces the amount of data transferred to disk, increasing data throughput
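A sketch of per-block compression under stated assumptions: `zlib` is only a stand-in because it ships with Python (the slides do not name the algorithm ZFS uses), and `store_block` / `load_block` are hypothetical helpers. A block is stored compressed only when that actually saves space.

```python
import os
import zlib


def store_block(data: bytes):
    """Return (kind, payload): compressed when smaller, otherwise the raw bytes."""
    packed = zlib.compress(data)
    if len(packed) < len(data):
        return ("zlib", packed)
    return ("raw", data)


def load_block(kind: str, payload: bytes) -> bytes:
    return zlib.decompress(payload) if kind == "zlib" else payload


# Repetitive data (logs, source code) shrinks a lot, so fewer bytes reach the disk.
text = b"2006-01-01 INFO request served\n" * 1000
text_kind, text_payload = store_block(text)

# Random data does not compress; whichever form is chosen is never larger.
noise = os.urandom(4096)
noise_kind, noise_payload = store_block(noise)
```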
Data Security
The uber-block checksum can serve as a digital signature for the entire filesystem
> 256 bit, military grade checksum (SHA-256) available
Encrypted filesystem support coming soon
Secure deletion (scrubbing) coming soon
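The claim that one top-level checksum can vouch for the entire file system follows from hashing a tree of hashes. The sketch below assumes a simple binary tree and a hypothetical `root_checksum` function; the real tree layout is not modelled. Any change to any block changes the value at the root.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def root_checksum(blocks):
    """Hash each block, then hash pairs of hashes until one value remains."""
    if not blocks:
        return sha256(b"")
    level = [sha256(block) for block in blocks]
    while len(level) > 1:
        level = [
            sha256(b"".join(level[i:i + 2]))
            for i in range(0, len(level), 2)
        ]
    return level[0]


blocks = [b"block 0", b"block 1", b"block 2", b"block 3", b"block 4"]
signature = root_checksum(blocks)

# Flip a single character in one block: the top-level value no longer matches.
tampered = list(blocks)
tampered[3] = b"block X"
tampered_signature = root_checksum(tampered)
```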
[Figure: diagram with the labels "tank" and "Global Zone"]
Must set mountpoint to legacy so that the zone manages the mount
Mounted on /p2/z1b
# zfs list
NAME              USED   AVAIL  REFER  MOUNTPOINT
p1                3.37G  8.14G  36K    /zones
p1/z1             127M   8.14G  127M   /zones/z1
p1/z2             3.24G  8.14G  3.24G  /zones/z2
# cp z2.conf z3.conf
<make changes necessary for z3 identity>
# zonecfg -z z3 -f z3.conf
# zoneadm -z z3 clone z2
Cloning snapshot p1/z2@SUNWzone1
Instead of copying, a ZFS clone has been created for this zone.
# zfs list
NAME              USED   AVAIL  REFER  MOUNTPOINT
p1                3.37G  8.14G  37K    /zones
p1/z1             127M   8.14G  127M   /zones/z1
p1/z2             3.24G  8.14G  3.24G  /zones/z2
p1/z2@SUNWzone1   94.5K  -      3.24G  -
p1/z3             116K   8.14G  3.24G  /zones/z3
iSCSI
Swap
Raw
# zfs create -V 4g tank/v1
# newfs /dev/zvol/rdsk/tank/v1
<newfs output>
# mount /dev/zvol/dsk/tank/v1 /mnt
# df -h /mnt
Filesystem              size   used  Mounted on
/dev/zvol/dsk/tank/v1   3.9G   4.0M  /mnt
BREATHTAKING PERFORMANCE
Copy-on-Write Design
Multiple Block Sizes
Pipelined I/O
Dynamic Striping
Intelligent Pre-Fetch
ZFS is FREE*
*Free
USD0 EUR0 GBP0 SEK0 YEN0 YUAN0
ZFS source code is included in OpenSolaris
> 47 ZFS patents added to CDDL patent commons
Pool resize and device removal
Booting / root file system
Integration with Solaris Containers
More Secure
More Reliable