
OPENVMS I/O AND STORAGE

Tips and Best Practices for good performance

Rafiq Ahamed K OpenVMS Engineering

© 2011 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice

Agenda

OpenVMS I/O Facts
- I/O evolution on Integrity servers
- What to expect from hardware
- What do you know about multipathing
- Notes on IOPERFORM and I/O
- NUMA

Storage Tips and Tricks
- EVA Best Practices
- OpenVMS Connectivity
- Have a sneak peek into EVA PA

Q&A

9/19/2011

Operating System performance is largely dependent on the underlying hardware

...so know your hardware capabilities


Integrity server I/O

Interconnects, core I/O, SAN I/O, and disk sizes by generation:
- PCI (66MHz, 0.5GB/sec): Ultra SCSI 160 core I/O, 1G FC SAN I/O, 300GB disks
- PCI-X (133MHz, 1GB/sec): Ultra SCSI 320, 2G FC, 600GB disks
- PCI-X (266MHz, 2GB/sec) / PCIe Gen 1 (2.5Gb/sec/lane, 250MB/sec): 3G SAS (LSI Logic), 4G FC, 1.2TB disks
- PCIe Gen 2 (5Gb/sec/lane, 500MB/sec): 6G SAS (p410i), 8G FC, 7.2TB disks

The number of I/O devices per server has grown from 3 to 12/16.

Architecture has evolved drastically for I/O devices within Integrity; performance and scalability have roughly doubled with each new hardware release.


Examples of latest speeds and feeds of I/O on Integrity platforms

Leadership in I/O and Storage on i2 architecture


- High performance, reliable, and scalable: SAS provides a point-to-point connection to each HDD at 6G speeds
- Four p410 RAID controllers (one per blade) on BL890c i2; one p410 RAID controller on rx2800
- Configured as RAID 0/1, or HBA mode [future]
- Stripe data within and across multiple p410 RAID controllers (with OpenVMS Shadowing): striping within a controller provides high performance; striping across controllers provides no-SPOF storage
- By contrast, parallel SCSI on rx7640 is a shared-bus Ultra160 SCSI
- Each BL890c i2/rx2800 supports eight SFF SAS HDDs, up to 7.2TB capacity


Core I/O on i2 servers


Data shows the impact of p410i Caching and Striping
[Charts: "rx2800 i2 - Core SAS Caching" plots IOPS vs. load (1-256) with and without the controller cache; "rx2800 i2 - SAS Logical Disk (Striping)" plots IOPS vs. load (16-256) for logical disks striped across 1, 2, and 4 disks, all with cache]

Use the p410i Cache Battery Kit for faster response. Stripe across multiple disks to maximize utilization and throughput.


Customer Concerns..

- How is I/O performance on Integrity servers?
- How do they compare against my existing high-end Alpha servers?
- After migrating to the new platform, what should I expect?
- What makes i2 server I/O a market differentiator?


Software capabilities: multipathing

Multipathing 1(4)

Multipathing (MP) is a technique to manage multiple paths to a storage device through failover and failback mechanisms. It helps the user load balance across multiple paths to storage.

- Multipathing is enabled on OpenVMS by default
- OpenVMS MP supports ALUA (Asymmetric Logical Unit Access) [V8.3 and later]
- OpenVMS MP supports FAILOVER and FAILBACK
- OpenVMS MP load balances the storage devices across all available paths: it spreads the number of devices evenly across all paths at boot time
- At any point in time, only a single path can be active to a device
- Users are recommended to use static load balancing techniques
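As a minimal sketch of inspecting a multipath device from DCL (the device name DGA100 is illustrative, not from the slides):

```
$ ! List all I/O paths of a multipath fibre device; the display
$ ! flags which path is current (active)
$ SHOW DEVICE/FULL $1$DGA100:
```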


MP Connections Good and Bad


[Diagrams: single-controller configurations — an IA64 host reaching an HSV controller via a single path, two paths, or four paths through switches; and mixed Alpha/IA64 configurations, each host connected through dual switches to HSV controllers]


Multipathing 2(4)

Device discovery initiates path discovery and forms an MP set for each device
- MC SYSMAN IO AUTOCONFIGURE; SDA> SHOW DEVICE DGAxx shows the MP set
- The first path discovered is considered the primary path
- The active path is called the current path
- Automatic path selection algorithms are optimized to support Active/Active arrays
- Active optimized (AO) paths are always picked for I/O; if there is no alternative, an active non-optimized (ANO) path is picked [how to fix this is discussed under EVA best practices]
- With the latest firmware on storage, it is very rare that you will get connected to ANO
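A sketch of the commands above in sequence (the device name DGA100 is illustrative):

```
$ ! Re-run device/path discovery, then examine the MP set in SDA
$ MC SYSMAN IO AUTOCONFIGURE
$ ANALYZE/SYSTEM
SDA> SHOW DEVICE DGA100    ! shows the multipath set and current path
SDA> EXIT
```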


Multipathing 3(4)
OpenVMS switches its path to a LUN when:

- An I/O error triggers mount verification (MV): the device is not reachable on the current path and another path works
- A device is MOUNTed while its current path is offline
- A manual path switch is requested via SET DEVICE/SWITCH/PATH=
- Some local path becomes available while the current path is MSCP: the switch from MSCP to local is triggered by the poller [not if the switch was manual]

Note: any MV might trigger a path switch
- MV due to loss of cluster quorum
- MV due to SCSI error flush
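A sketch of the manual path switch mentioned above (the device name and path identifier are illustrative; actual path names come from SHOW DEVICE/FULL):

```
$ ! Switch the current path of a multipath device by hand
$ SET DEVICE $1$DGA100: /SWITCH /PATH=PGA0.5000-1FE1-0015-8E1C
$ SHOW DEVICE/FULL $1$DGA100:   ! confirm the new current path
```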


Multipathing 4(4)

MPDEV_POLLER is lightweight; it polls all paths for availability
- SET DEVICE device/POLL (or /NOPOLL)

MV is not bad. It only indicates that OpenVMS validated your device
- Shadow devices can initiate and complete an MV; each shadow member operates independently on a path switch

But an MV followed by a path switch is an indication of failover/failback
- The operator logs will indicate the details
- SHOW DEVICE device/FULL will show details of the path switch [time, etc.]
- SDA> SHOW DEVICE device logs a lot of diagnostic information in the MPDEV structure


SCSI Error Poller

We have seen customers report heavy Unit Attention (UA) traffic in the SAN, resulting in cluster hangs, slow disk operations, high mount verification counts, etc. These UAs are initiated by changes in the SAN such as firmware upgrades, bus resets, etc. SCSI_ERROR_POLL is the poller responsible for clearing latched errors (like SCSI UA) on all fibre and SCSI devices, which can otherwise cause confusion in the SAN. By default the poller is enabled: SYSGEN> SHOW SCSI_ERROR_POLL
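The check suggested above, spelled out (per the slide, 1 should be the enabled default):

```
$ ! Verify the SCSI error poller is enabled
$ MC SYSGEN
SYSGEN> SHOW SCSI_ERROR_POLL
SYSGEN> EXIT
```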


Customer/Field Concerns..
OpenVMS Multipathing

- After upgrading my SAN components, I see a large number of mount verifications; does that indicate a problem?
- Does multipathing do load balancing? Are there policies?
- I see too many mount verification messages in the operator log; will this impact volume performance (especially latency)?
- How do I know whether my paths are well balanced?
- How do I know whether my current path is active optimized?
- Does multipathing support Active/Active arrays, ALUA, third-party storage, SAS devices, SSD devices?


Did you know? QIO is one of the most heavily used interfaces in OpenVMS. We want to put it on a diet. What should we do? 1. Optimize QIO 2. Replace QIO 3. Provide an alternative


IOPERFORM/FastIO

Fast I/O is a performance-enhanced alternative to performing QIOs
- It substantially reduces the setup time for an I/O request
- Fast I/O uses buffer objects (locked memory, doubly mapped) to eliminate the I/O overhead of manipulating I/O buffers
- It is performed using buffer objects and the following system services: sys$io_setup, sys$io_perform, sys$io_cleanup, and sys$create_bufobj (jacket) / sys$create_bufobj_64
- Example code: $ dir sys$examples:io_perform.c

System management considerations:
- The SYSGEN parameter MAXBOBMEM limits memory usage (defaults to 100)
- The VMS$BUFFER_OBJECT_USER identifier is required for the process
- Creating buffer objects once and reusing them for the lifetime of the application is faster
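A sketch of the system-management prerequisites above (the user name SMITH is illustrative):

```
$ ! Grant the identifier a process needs to create buffer objects
$ MC AUTHORIZE
UAF> GRANT/IDENTIFIER VMS$BUFFER_OBJECT_USER SMITH
UAF> EXIT
$ ! Check the buffer-object memory limit
$ MC SYSGEN
SYSGEN> SHOW MAXBOBMEM
SYSGEN> EXIT
```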


Impact of IOPERFORM/FASTIO
Resource usage is reduced by 20-70%, depending on load and system
- Small-size random workloads double their throughput as load increases
- Larger-size sequential workloads perform about the same

[Charts: "I/O Data Rate (MB/sec)" vs. threads (1-7) for 128K_READ and 128K_WRITE with Fast I/O vs. QIO; "Throughput (IOPS)" vs. threads (8-64) for 8K_READ vs. 8K_READ_QIO]


NUMA/RAD Impact
What you should know

[Diagram: a process P1 accessing device DGA100 on a BL890c i2 (architecturally 4 blades conjoined)]

NUMA/RAD Impact

In a RAD-based system, each RAD is made up of CPUs, memory, and I/O devices. Accessing an I/O device from a remote domain incurs remote memory access and remote interrupt latency.

[Chart: "Impact of RAD on I/O Device" — I/O rate (Opt/sec) vs. RAD # (0-4), showing optimized performance on the local RAD and a 10-15% overhead on remote RADs]

RAD Guidelines for I/O


- Keep I/O devices close to the process that accesses them heavily
- Make use of FASTPATH efficiently
  - Make sure to FASTPATH the fibre devices close to the process initiating the I/O
  - The overhead involved in handling remote I/O can impact throughput [chart]
  - FASTPATH algorithms assign the CPU on a round-robin basis
- Statically load balance the devices across multiple RADs
- Use SET PROCESS/AFFINITY to bind processes with high I/O
- Use SET DEVICE device/PREFERRED_CPUS
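A sketch of the two bindings above (CPU IDs, process name, and adapter name are illustrative — pick CPUs in the RAD that owns the HBA):

```
$ ! Pin a high-I/O process to CPUs 0 and 1
$ SET PROCESS MY_IO_PROC /AFFINITY/SET=(0,1)
$ ! Fast-path a fibre adapter's interrupt handling to a CPU in the same RAD
$ SET DEVICE FGA0: /PREFERRED_CPUS=2
```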


STORAGE BEST PRACTICES


EVA Differences
Speeds and Feeds

                            EVA4400        EVA6400       EVA8400       P6300         P6500
Controller model            HSV300         HSV400        HSV450        HSV340        HSV360
Memory/controller pair      4GBytes        8GBytes       14/22GBytes   4GBytes       8GBytes
Host ports/controller pair  4 FC           8 FC          8 FC          8 FC, 0 GbE;  or 4 FC, 8 1GbE;
                            (20 w/switches)                            or 4 FC, 4 10GbE (both P6x00)
Host port speed             4Gb/s FC       4Gb/s FC      4Gb/s FC      8Gb/s FC; 1Gb/s iSCSI;
                                                                       10Gb/s iSCSI/FCoE (both P6x00)
Device ports (#)            4              8             12            8             16
Device port speed           4Gb/s FC       4Gb/s FC      4Gb/s FC      6Gb/s SAS     6Gb/s SAS
# 3-1/2" drives             96             216           324           120           240
# 2-1/2" drives             0              0             0             250           500
Max. Vdisks                 1024           2048          2048          1024          2048
I/O read bandwidth          780 MB/s       1,250 MB/s    1,545 MB/s    1,700 MB/s    1,700 MB/s
I/O write bandwidth         590 MB/s       510 MB/s      515 MB/s      600 MB/s      780 MB/s
Random read I/O             26,000 IOPs    54,000 IOPs   54,000 IOPs   45,000 IOPs   55,000 IOPs


General I/O issues reported

- After upgrading the OS or applying a patch, I/O response has become slower: we see a 5-6 millisecond delay in I/O completion compared to yesterday
- After moving to a new blade in the same SAN environment, our CRTL FSYNC is running slowly
- After upgrading, we see additional CPU ticks for copy, delete, and rename
- Our database is suddenly responding slowly
- Some nodes in the cluster see high I/O latency after midnight
- The customer wants to know whether this storage is enough for the next 5 years
- The customer is migrating from an older EVA to a newer version; can you advise?

Most storage performance issues reported are due to misconfiguration of SAN components


Best Practices..1(6)

Number of disks influences performance - Yes
- Fill the EVA with as many disk drives as possible
- Tests have shown linear growth in throughput (small random)

Number of disk groups influences performance - No
- In mixed-load environments, it is OK to have separate disk groups for random vs. sequential applications

Vraid level influences performance - Yes
- Vraid1 offers the best performance over the widest range of workloads; however, Vraid5 is better for some sequential-write workloads
- Vraid0 provides the best random-write performance but no protection; use it for non-critical storage needs


Best Practices..2(6)

Fibre channel disk speed - Yes
- 10K vs. 15K: 15K rpm disks provide the highest performance
- Large-block sequential I/O: speed doesn't matter, but capacity does
- Small-block random I/O: 30-40% gains in request rates are seen
- Best price-performance: for the equivalent cost of 15K rpm disks, consider using more 10K rpm disks

Combine disks with different performance characteristics in the same disk group
- Do not create separate disk groups to enhance performance


Best Practices..4(6)

Mixing disk capacities - Yes
- The EVA stripes LUN capacity across all the disks in a disk group; larger disks hold more LUN capacity, leading to imbalanced density
- There is no control over the demand sent to the larger disks
- Use disks of equal capacity in a disk group

Read cache management influences performance - Yes; always ENABLE it

LUN count - Yes and No
- It is good to have a few LUNs per controller
- Depends on host requests and queue depths; monitor the OpenVMS queue depth

Transfer size - Yes
- Impacts SEQUENTIAL workloads
- Tune the write transfer size to be a multiple of 8K and no greater than 128K
- The OpenVMS max transfer size is 128K for disks and 64K for tapes! See DEVICE_MAX_IO_SIZE
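A sketch of checking the limit above, assuming DEVICE_MAX_IO_SIZE is exposed as a SYSGEN parameter as the slide implies:

```
$ ! Inspect the OpenVMS maximum I/O transfer size setting
$ MC SYSGEN
SYSGEN> SHOW DEVICE_MAX_IO_SIZE
SYSGEN> EXIT
```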

Best Practices..3(6)

SSD performance - Yes, Yes, Yes
- SSDs are about 12 times better; OpenVMS performed 10x better than FC [next slide details]
- Workloads like transaction processing, data mining, and databases such as Oracle are ideal
- Spread SSDs evenly across all available back-end loops; SSDs and HDDs may be mixed in the same drive enclosure
- Monitor your application and the EVA; accordingly, you can assign SSDs or HDDs to individual controllers, or enabling write-through mode for SSDs can help [experiment!!]
- Customers use SSD drives to keep critical-path data where response time is uncompromised


OpenVMS 8.4 Performance Results


SSD Drive Through EVA, OpenVMS V8.4

[Charts: "4K Mixed QIO - FC vs SSD" — IOPS vs. threads (1-256) and response time (msec) vs. threads, with the SSD curves roughly 10x faster than FC]

- Mixed load, 8-disk SSD/FC disk group on an EVA4400
- Smaller I/Os (4K/8K) showed a sustained 9-10x increase in IOPS and MB/sec with increasing load for SSD-carved LUNs compared to FC
- With a 10x faster response time, the SSD-carved LUN delivered 10x more performance and bandwidth for smaller I/Os

Best Practices..5(6)

Controller balancing - Yes, Yes, Yes
- Maximize utilization: Active/Active arrays present LUNs simultaneously through both controllers, but ownership belongs to only one
- Manually load balance LUN ownership across both controllers (use EVAPerf to assess), either through Command View EVA or with the OpenVMS SET DEVICE/SWITCH/PATH='PATH_NAME' 'DEV_NAME' command
- Preferred path: during the initial boot of the EVA, the preferred-path parameter is read and determines the managing controller
- Verify that LUN ownership is reassigned after a failed controller has been repaired
- Balance the workload as evenly as possible across all host ports

[Diagram: DGA99 answers Inquiry on the ports of both HSV controllers, but does I/O only on the ports of its owning controller]
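A sketch of host-side balancing: alternating the current paths of two LUNs across the two controllers (device names and path identifiers are illustrative; actual values come from SHOW DEVICE/FULL):

```
$ ! Send one LUN through a port on controller A, the other through controller B
$ SET DEVICE $1$DGA99:  /SWITCH /PATH=PGA0.5000-1FE1-0015-8E18
$ SET DEVICE $1$DGA100: /SWITCH /PATH=PGB0.5000-1FE1-0015-8E1C
```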

Command View EVA Preferred Path Settings

Customer Scenario
Controller load imbalance and unequal port load distribution


Best Practices..6(6)

- Ensure there are no hardware issues, especially battery failure (a cache battery failure forces a change to write-through mode, so write performance suffers), device loop failures, and drives reporting timeouts
- Deploy the array only in supported configurations
- Stay current on EVA firmware!!
- BC and CA have different best practices, which are beyond the scope of this discussion


Some Points to Remember

- Large latencies may be quite natural in some contexts, such as an array processing large I/O requests
- Array processor utilization tends to be high under intense, small-block, transaction-oriented workloads but low under intense large-block workloads


OpenVMS I/O Data
P6500, 36G RAID 5 volume, 4G FC infrastructure

[Chart: Sequential Read (MB/sec) — throughput climbing from 138 to 412 MB/sec, with response time; 128K I/Os pushing 4G FC line speeds!]
- Higher bandwidth can be obtained with larger blocks; larger blocks drain the interconnects faster due to larger data transfers

[Chart: Random Read (IO/sec) — throughput climbing from 6,562 to 46,202 IOPS, with response time; 8K workloads pushing close to EVA max throughput!]
- Higher throughput can be obtained with smaller blocks; smaller blocks usually need a lot of processing power


STORAGE PERFORMANCE ANALYSIS TOOLS & REFERENCES


Storage Performance Tools

OpenVMS host utilities:
- T4, TLViz, DISK, FCP, VEVAMON (older EVAs)
- SDA> FC [for fibre devices], PKR/PKC [for SAS devices]
- SYS$ETC: SCSI_INFO.EXE, SCSI_MODE.EXE, FIBRE_SCAN.EXE
- Many more...

EVAPerf - command-line EVA performance data collector

EVA Performance Advisor [year-end release]

XP Performance Advisor, Storage Essentials Performance Pack


Salient aspects of HP P6000 Performance Advisor


- Slated to release soon; you can participate in the early adopter program
- Integrated with Command View 10.0 in a single pane of glass
- User-centric design compliance
- Features: dashboard, threshold monitoring & notification, key metric charts, reports, quick setup, events database


Sizing EVA
HP StorageWorks Sizing Tool


References EVA Performance

HP StorageWorks Enterprise Virtual Array - a tactical approach to performance problem diagnosis (HP Document Library)


Questions/Comments

Business Manager (Rohini Madhavan)


rohini.madhavan@hp.com

Office of Customer Programs


OpenVMS.Programs@hp.com


THANK YOU

EVA Models - Reference

EVA Model           Controller Model             Firmware
EVA3000 / EVA5000   HSV100 / HSV110              1.XXX, 2.XXX, or 3.XXX (latest 3110)
EVA3000 / EVA5000   HSV101 / HSV111              4.XXX (latest 4100)
EVA4000 / EVA6000   HSV200 or HSV200-A           5.XXX or 6.XXX (latest 6220)
EVA4100 / EVA6100   HSV200-B                     5.XXX or 6.XXX (latest 6220)
EVA8000 / EVA8100   HSV210, HSV210-A, HSV210-B   5.XXX or 6.XXX (latest 6220)
EVA4400             HSV300                       09XXXXXX or 10000000
EVA6400 / EVA8400   HSV400 / HSV450              095XXXXX or 10000000
P6300 / P6500       HSV340 / HSV360              10000090