PowerPath Best Practice for AIX

Revision history (version, date):
    7.1    2008-01-06
    7.2    2008-02-14
    7.3    2008-04-26
    7.4    2008-06-02
    7.5    2008-08-12

Sections added or changed across these revisions:
    2.3, 3.3, 3.4, 3.5, 3.6, 3.7, 4.2, 3.8, 3.9, 4.3
    4.4: emc108127, AIX V5 patch level, 4GB
    4.5: emc111635
    1.2: PP5.0 AIX requirement
    3.10: VCMDB FA ports
    2.2: EMC69100
    3.12: AIX4.3.3
    3.11: TF/Clone experience sharing
    4.6: use PowerPath instead of AIX MPIO
    4.7: emc116729 FSCSI_ERR10
    4.8: emc155602 iostat incorrect on AIX
    4.9: EMC163961: AIX host lost access to storage during powermt restore (same as powermt config / syminq / symrdf / symsnap)
    AIX 4, PowerPath 5.1 SP1
    AIX 3.10: Symmetrix, HACMP5.4
    2.2: EMC69100
    2.3: Starting from HACMP5.4, Primus 69100 emcpowerreset not needed
    2.3: EMC156011 HACMP5.4 emcpowerreset
    3.11: emc81931, emc81930, Solution Enabler SCSI Reservation Lock
Contents

1 AIX HOST CONNECTIVITY
  1.1 DOWNLOADING LATEST AIX ODM FROM EMC WEBSITE
  1.2 AIX FIX LEVEL INFORMATION
  1.3 FC ADAPTER USEFUL COMMANDS
  1.4 USEFUL SETTING AND PARAMETERS
    1.4.1 FSCSI device Fast_Fail & Dyntrk
    1.4.2 Set HACMP break reserves in parallel = true (Recommended)
  1.5 USEFUL SMALL SCRIPTS
    1.5.1 hdiskpower
    1.5.2 hdiskpower0 hdiskpower100
    1.5.3 hdisk
    1.5.4 hdisk
2 HACMP
  2.1 HACMP EMC 2-3
  2.2 EMC 69100: HACMP TAKEOVER EMC HACMP SCSI RESERVATION
  2.3 EMC HACMP5.4 EMCPOWERRESET
  2.4 TOPAS IOSTAT
  2.5 HDISKPOWER, BUSY
  2.6 EMC143075, SET_SCSI_ID HAS BEEN REPLACED BY CFGSCSI_ID WITH CLARIION LUNS IN HACMP/PP ENVIRONMENT
  2.7 CLARIION LUN AIX --- AUTHOR DANNY LI
  2.8 AIX CHDEV COMMAND CAUSE DISK OPERATION ERROR ON CLARIION PASSIVE (STANDBY) PATH --- AUTHOR DANNY LI
  2.9 QUEUE DEPTH --- AUTHOR DANNY LI
  2.10 AIX JFS2 MOUNT OPTION "CIO" CAUSE VERY LOW PERFORMANCE FOR CP AND DD COMMAND --- AUTHOR DANNY LI
  2.11 EMC PROCEDURES WHEN REPLACING HBA CARD ON AIX --- BY DANNY LI
  2.12 STEPS FOR TF/SNAP IMPLEMENTATION --- BY DANNY LI
  2.13 VCMDB FA PORTS
  2.14 COOKBOOK FOR TIMEFINDER INCREMENTAL CLONE AND CLONE RESTORE
  2.15 AIX433
3 POWERPATH
  3.1 FAST I/O FAILURE AND DYNAMIC TRACKING
  3.2 EMC145919: POWERPATH 4.5.X: AIX LVM I/O ERRORS DUE TO MISSING DEVICE RESERVATIONS
  3.4 EMC108127: METHOD ERROR /ETC/METHODS/CFGPOWER -L POWERPATH0 0514-040 ERROR INITIALIZING A DEVICE INTO THE KERNEL IN
  3.5 EMC111635: HOW TO REMOVE LUNZ DEVICES ON AN AIX HOST
  3.6 USE POWERPATH INSTEAD OF AIX MPIO
  3.7 EMC116729: FSCSI_ERR10 CONFIGURATION MISMATCH ERRORS IN ERRPT AFTER TURNING ON DYNAMIC TRACKING
  3.8 EMC155602: AIX HOST WITH POWERPATH DOESN'T UPDATE DKSTAT CORRECTLY THUS IOSTAT SUMMARY APPEARS INCORRECT
  3.9 EMC163961: AIX HOST LOST ACCESS TO STORAGE DURING POWERMT RESTORE (SAME AS POWERMT CONFIG / SYMINQ / SYMRDF / SYMSNAP)
ftp://ftp.emc.com/pub/elab/aix/ODM_DEFINITIONS/
1.2 AIX FIX LEVEL INFORMATION

AIX 5.3 Technology Levels and Service Packs

Name          Type                      Prereqs    Year
              Service Pack                         2008
              Service Pack                         2008
5300-06       Technology Level                     2007
5300-05-CSP   Concluding Service Pack   5300-05    2007
5300-05-06    Service Pack              5300-05    2007
5300-05-05    Service Pack              5300-05    2007
5300-05-04    Service Pack              5300-05    2006
5300-05-03    Service Pack              5300-05    2006
5300-05-02    Service Pack              5300-05    2006
5300-05-01    Service Pack              5300-05    2006
5300-05       Technology Level                     2006
5300-04-CSP   Concluding Service Pack   5300-04    2006
5300-04-03    Service Pack              5300-04    2006
5300-04-02    Service Pack              5300-04    2006
5300-04-01    Service Pack              5300-04    2006
5300-04       Technology Level                     2006
5300-03       Maintenance Level                    2005
5300-02       Maintenance Level                    2005
5300-01       Maintenance Level                    2005
AIX 5.2 service pack: Technology Levels and Service Packs

Name              Type                      Prereqs      Date
5200-10-04-0750   Service Pack              TL 5200-10   December 2007
5200-10-03-0744   Service Pack              TL 5200-10   October 2007
5200-10-02-0730   Service Pack              TL 5200-10   July 2007
5200-10-01-0722   Service Pack              TL 5200-10   June 2007
5200-10           Technology Level                       June 2007
5200-09-CSP       Concluding Service Pack   TL 5200-09   June 2007
5200-09-06        Service Pack              TL 5200-09   March 2007
5200-09-05        Service Pack              TL 5200-09   February 2007
5200-09-04        Service Pack              TL 5200-09   January 2007
5200-09-03        Service Pack              TL 5200-09   November 2006
5200-09-02        Service Pack              TL 5200-09   October 2006
5200-09-01        Service Pack              TL 5200-09   September 2006
5200-09           Technology Level                       August 2006
5200-08-CSP       Concluding Service Pack   TL 5200-08   August 2006
5200-08-02        Service Pack              TL 5200-08   April 2006
5200-08-01        Service Pack              TL 5200-08   February 2006
5200-08           Technology Level                       February 2006
5200-07-CSP       Concluding Service Pack   5200-07      March 2006

Maintenance Level packages (legacy)

Type                Prereqs   Date
Maintenance Level             September 2005
Maintenance Level             May 2005
Maintenance Level             January 2005
Maintenance Level             December 2004
Maintenance Level             May 2004
Maintenance Level             October 2003
Maintenance Level             May 2003
1.3 FC ADAPTER USEFUL COMMANDS

fcs0, fcs1:

    Device Specific.(Z1)........00000000
    Device Specific.(Z2)........00000000
    Device Specific.(Z3)........03000909
    Device Specific.(Z4)........FFC01158
    Device Specific.(Z5)........02C82135
    Device Specific.(Z6)........06C32135
    Device Specific.(Z7)........07C32135
    Device Specific.(Z8)........20000000C9488F4C
    Device Specific.(Z9)........BS2.10X8    <=== 2.10X8
    Device Specific.(ZA)........B1D2.10X8
    Device Specific.(ZB)........B2D2.10X8
    Device Specific.(YL)........P1-I1/Q1

    EC Level....................A
    Serial Number...............1A31600450
    Manufacturer................001A
    Feature Code................280B 197E 1957
    FRU Number.................. 80P4544
    Device Specific.(ZM)........3
    Network Address.............10000000C9332A79    <=== FC card WWN
    ROS Level and ID............02881914
    Device Specific.(Z0)........1001206D
    Device Specific.(Z1)........00000000
    Device Specific.(Z2)........00000000
    Device Specific.(Z3)........03000909
    Device Specific.(Z4)........FF801315
    Device Specific.(Z5)........02881914
    Device Specific.(Z6)........06831914
    Device Specific.(Z7)........07831914
    Device Specific.(Z8)........20000000C940F419
    Device Specific.(Z9)........TS1.90A4    <=== firmware level
    Device Specific.(ZA)........T1D1.90A4
    Device Specific.(ZB)........T2D1.90A4
    Device Specific.(YL)........U7879.001.DQD0JHD-P1-C1-T1

FC 6228, 2 Gb:

    fcs0 00-04 IBM Gigabit Fiber Channel PCI Adapter for 64 bit bus
    Part Number.................00P4494
    EC Level....................A
    Serial Number...............1A25001206
    Manufacturer................001A
    FRU Number.................. 00P4495
    Network Address.............10000000C930D02B
    ROS Level and ID............02C03951
    Device Specific.(Z0)........2002606D
    Device Specific.(Z1)........00000000
    Device Specific.(Z2)........00000000
    Device Specific.(Z3)........03000909
    Device Specific.(Z4)........FF401210
    Device Specific.(Z5)........02C03951
    Device Specific.(Z6)........06433951
    Device Specific.(Z7)........07433951
    Device Specific.(Z8)........20000000C930D02B
    Device Specific.(Z9)........CS3.91X4
    Device Specific.(ZA)........C1D3.91X4
    Device Specific.(ZB)........C2D3.91X4

FC 6227, 1 Gb:

    fcs0 00-04 IBM Gigabit Fiber Channel PCI Adapter
    ROS Level and ID...........02903291
    Device Specific (Z9).......SS3.22A1
    Part Number....................09P4038, 09P1162, 03N4167, 24L0023
    Serial Number..................00000000
    Device Specific.(YL).......P2-I5
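The listings above mark the Z9 field as the firmware level. As a small sketch, the function below pulls that value out of adapter detail text supplied on standard input. The function name is hypothetical, and dropping the two-character prefix is an assumption based only on the annotated sample (BS2.10X8 shown as 2.10X8).

```shell
# Print the firmware level found in adapter detail text read from stdin.
# Assumption: the level is the Z9 value without its two-character prefix,
# as in the annotated sample above (BS2.10X8 -> 2.10X8).
fc_firmware_level() {
    sed -n 's/.*Device Specific[. ](Z9)\.*\([^ ]*\).*/\1/p' | cut -c 3-
}
```

The function only reads text, so it can be tried on a saved listing before being pointed at a live adapter.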
Check the fscsi fastfail and dyntrk settings:

    # lsattr -El fscsi0
    attach        switch        How this adapter is CONNECTED   False
    fc_err_recov  delayed_fail
    scsi_id       0x10300       Adapter SCSI ID                 False
    sw_fc_class   3             FC Class for Fabric             True
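To make the check repeatable, a minimal sketch follows that reads lsattr-style output on standard input and reports whether both attributes already hold the values recommended in 1.4.1. The function name is hypothetical; the attribute names and values are the ones used in this document.

```shell
# Read "lsattr -El fscsiN" output on stdin and report whether
# fc_err_recov=fast_fail and dyntrk=yes are both set.
check_fscsi_attrs() {
    awk '
        $1 == "fc_err_recov" { recov = $2 }
        $1 == "dyntrk"       { dyn = $2 }
        END {
            if (recov == "fast_fail" && dyn == "yes") { print "ok"; exit 0 }
            print "needs change: fc_err_recov=" recov " dyntrk=" dyn
            exit 1
        }'
}
```

On a host this would typically be fed as `lsattr -El fscsi0 | check_fscsi_attrs`; the exit status is non-zero when a change is still needed.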
Remove and rediscover devices:

    # rmdev -dl fscsi0          (fscsi0 under fcs0)
    # rmdev -dl fscsi0 -R       (-R also removes the child devices: hdiskpower, hdisk, FC tape)
    # cfgmgr -vl fcs0
    # cfgmgr -v
1.4 USEFUL SETTING AND PARAMETERS

1.4.1 FSCSI device Fast_Fail & Dyntrk

See 2.1 & 4.3.

    chdev -l fscsi0 -a fc_err_recov=fast_fail -P
    chdev -l fscsi0 -a dyntrk=yes -P

Note: Enable these settings only in an environment with multipath software (such as PowerPath), and only in a switch environment.
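When a host has several FC adapters, the same two commands are needed for each fscsi device. The sketch below only prints the commands so they can be reviewed first; the function name is hypothetical. The -P flag records the change in the ODM so that it takes effect the next time the device is configured, typically at reboot.

```shell
# Print (not run) the two chdev commands for each fscsi device named
# as an argument. -P defers the change, as in the commands above.
fscsi_chdev_cmds() {
    for dev in "$@"; do
        echo "chdev -l $dev -a fc_err_recov=fast_fail -P"
        echo "chdev -l $dev -a dyntrk=yes -P"
    done
}
```

For example, `fscsi_chdev_cmds fscsi0 fscsi1` prints four lines, which can be piped to `sh` once they have been checked.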
1.4.2 Set HACMP break reserves in parallel = true (Recommended)

See 4.1.

smitty hacmp -> extended configuration -> extended resource configuration -> HACMP Extended resources configuration -> configure custom disk methods -> change/show Custom disk methods -> select "disk/pseudo/power", then scroll down to "break reserves in parallel" and change the value to true.
1.5 USEFUL SMALL SCRIPTS

1.5.1 hdiskpower
>do
1.5.2 Remove hdiskpower0 through hdiskpower100

    # i=0
    # while [ $i -le 100 ]
    >do
    >rmdev -dl hdiskpower$i
    >let i=i+1
    >done
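Because rmdev -dl deletes device definitions, it can help to see the command list before running it. The sketch below is a dry-run variant of the loop above for an arbitrary range; the function name and its two arguments (first and last device number) are hypothetical.

```shell
# Print (not run) rmdev commands for hdiskpower<first> .. hdiskpower<last>.
rmdev_range_cmds() {
    i=$1
    while [ "$i" -le "$2" ]; do
        echo "rmdev -dl hdiskpower$i"
        i=$((i + 1))
    done
}
```

`rmdev_range_cmds 0 100` lists the same 101 commands that the loop above would execute.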
1.5.3 hdisk
>do
1.5.4 Remove the hdisk devices under fcs0

    # lsdev -Cc adapter|grep fcs0
    fcs0 Available 20-58 FC Adapter
    # for i in `lsdev -Cc disk|grep 20-58|cut -c 1-8`
    >do
    >rmdev -dl $i
    >done
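The loop above selects disks with grep on the adapter location code, which matches that string anywhere on the line. A slightly stricter sketch is shown below: it compares only the third column of the lsdev output against the location code. The function name is hypothetical, and the assumed column layout (name, state, location, description) is taken from the adapter line shown above.

```shell
# Read "lsdev -Cc disk" output on stdin; print the names of disks whose
# location code (third column) starts with the given adapter location.
disks_on_adapter() {
    awk -v loc="$1" 'index($3, loc) == 1 { print $1 }'
}
```

On a host the usage would be along the lines of `lsdev -Cc disk | disks_on_adapter 20-58`, with the resulting names passed to rmdev -dl as in the loop above.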
2 HACMP
2.1 HACMP EMC 2-3

Fact: emcpowerreset utility used
Fact: Product: CLARiiON Storage Array

Symptom:
- HACMP cluster failover is slow.
- emcpowerreset gets called on cluster initial start with no devices reserved.
- emcpowerreset gets called during node failure event for devices with no disk
- emcpowerreset takes a longer period of time to execute.

Fix: smitty hacmp -> extended configuration -> extended resource configuration -> HACMP Extended resources configuration -> configure custom disk methods -> change/show Custom disk methods -> select "disk/pseudo/power", then scroll down to "break reserves in parallel" and change the value to true. Repeat this on all nodes in the cluster, then resync the cluster.

Note: This will break reserves for all disks in each volume group in parallel. A cluster with a large number of disks in each volume group will therefore see a substantially larger improvement than a cluster where each volume group contains a small number of disks.

Note: If disk/pseudo/power is not defined as a custom disk method on your cluster and you have PowerPath installed, please see EMC69100.
The scdiskutil binary uses the PowerPath pseudo device's scsi_id ODM attribute but sends ioctl(s) directly to the adapter/HBA interface instead of using the PowerPath interface. In some cases this scsi_id ODM attribute corresponds to the CLARiiON "passive not ready" path rather than the "Active" path to the device. When HACMP tries to access a LUN down the "passive not ready" path, the following problems can be encountered:

1) In a Cascading HACMP env, the HACMP failover time will increase for each device accessed down the passive path.
2) In a Concurrent Access env, the "emcpowerreset" utility might get run on some devices down the passive path, resulting in a Permanent SCSI Reservation on LUNs which should never have a SCSI Reservation. See Primus case emc104555 for more details on this issue.

Fix: Fixed in PP 4.6 (ETA Q3/2006). In the meantime, customers can get the fix from EMC's PowerLink web site.

All of the above problems have been fixed with a new version of the "set_scsi_id" script. To address a number of backward and forward compatibility issues, this new script combines the original "set_scsi_id.sh" script with part of the existing "rc.emcpower".

Also note:
1. This new "set_scsi_id" script is now required for ALL CLARiiON devices attached to an AIX node with HACMP and PP.
2. This new "set_scsi_id" script replaces the existing versions of the "set_scsi_id.sh" and "rc.emcpower" scripts.
3. With this new "set_scsi_id" script, the old "rc.emcpower" utility is no longer required.
4. This new "set_scsi_id" script is backward compatible with PP 4.3 and higher.
5. The name of the file has changed from "set_scsi_id.sh" to "set_scsi_id".
6. The default location in which the file is installed when added via a PP install CD has changed. It was placed in /usr/lpp/EMC/CLARiiON/bin; it will now be put in /etc.
Customers adding this new "set_scsi_id" script to an AIX node for the first time should follow the steps below to acquire the file from EMC's PowerLink website and then use the "claddcustom" command, as shown in the procedure, to add the "set_scsi_id" script to their HACMP environment.

Customers who already have the previous version of the "set_scsi_id.sh" script configured in their HACMP env will need to delete the old "set_scsi_id.sh" script and modify their HACMP env to point to the new file in the new location. This can be done in a number of ways. EMC recommends following the exact same steps in the procedure below except that, in Step 4, "claddcustom" is replaced with "clchcustom" to change the current HACMP configuration to point to the new file name in the new location.

1. Copy the "set_scsi_id.zip" file from EMC's PowerLink web site to a Windows based system with WinZip. Download set_scsi_id.zip directly, or go to http://powerlink.emc.com and select Support -> Downloads and Patches -> Downloads D-R -> PowerPath for UNIX. Then select the file "set_scsi_id.zip" and follow the pop-up to save the file to disk.

2. Uncompress the file "set_scsi_id.zip" using unzip.

3. Copy the script "set_scsi_id" to the /etc directory of every AIX node that needs it and make it executable.

    chmod +x /etc/set_scsi_id

4. Add the custom cluster event to your configuration. This event is the name given to the script that will be added later to selected pre-defined HACMP events. Note that this is one long command.

    /usr/es/sbin/cluster/utilities/claddcustom -t event -n'set_scsi_id' -I'Set correct scsi id on EMC CLARiiON pseudo devices.' -v'/etc/set_scsi_id'

5. Verify your custom cluster event was added.

    # odmget HACMPcustom
    HACMPcustom:
            name = "set_scsi_id"
            type = "event"
            description = "Set correct scsi id on EMC CLARiiON pseudo devices."
            value = "/etc/set_scsi_id"
            relation = ""
            status = 0

6. Modify the pre-defined HACMP events by giving the event command your custom cluster event as a pre-event command.

    /usr/es/sbin/cluster/utilities/clchevent -O'node_up' -s'/usr/es/sbin/cluster/events/node_up' -b 'set_scsi_id' -c '0'
    /usr/es/sbin/cluster/utilities/clchevent -O'node_down' -s'/usr/es/sbin/cluster/events/node_down' -b 'set_scsi_id' -c '0'

7. Verify the events were properly modified.

    # odmget -q name=node_up HACMPevent
    HACMPevent:
            name = "node_up"
            desc = "Script run when a node is attempting to join the cluster."
            setno = 101
            msgno = 7
            catalog = "events.cat"
            cmd = "/usr/es/sbin/cluster/events/node_up"
            notify = ""
            pre = "set_scsi_id"
            post = ""
            recv = ""
            count = 0
            event_duration = 0

    # odmget -q name=node_down HACMPevent
    HACMPevent:
            name = "node_down"
            desc = "Script run when a node is attempting to leave the cluster."
            setno = 101
            msgno = 8
            catalog = "events.cat"
            cmd = "/usr/es/sbin/cluster/events/node_down"
            notify = ""
            pre = "set_scsi_id"
            post = ""
            recv = ""
            count = 0
            event_duration = 0

8. Synchronize your cluster and ensure there are no errors resulting from the addition of the pre-event scripts.

Notes:
- The modified script must reside in the same location on all nodes in the cluster.
- The script must be made executable on all nodes. Synchronization of the cluster will fail if it is not.
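The verification in step 7 comes down to reading the pre field of each HACMPevent stanza. As a sketch, the function below extracts that field from odmget output supplied on standard input so the check can be scripted on every node. The function name is hypothetical, and the stanza layout is assumed to match the example output above.

```shell
# Read "odmget -q name=<event> HACMPevent" output on stdin and print
# the value of the pre field (an empty line if no pre-event is set).
hacmp_pre_event() {
    sed -n 's/^[[:space:]]*pre = "\(.*\)"[[:space:]]*$/\1/p'
}
```

A typical use would be `odmget -q name=node_up HACMPevent | hacmp_pre_event`, which should print set_scsi_id after step 6 has been applied.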
2.2 EMC69100

Question: ETA emc69100: How to prevent an AIX HACMP or RVSD cluster from failing during a node failover event.
Question: How to set up PSSP/VSD on AIX 5.x to enable LUN reset on PowerPath devices.
Question: How to set up HACMP on AIX 5.x to enable LUN reset on PowerPath devices (emcpowerreset).
Question: How to confirm emcpowerreset is installed and configured correctly.

Environment: EMC Technical Advisory
Environment: Product: Symmetrix
Environment: Product: CLARiiON
Environment: OS: IBM AIX 5.x
Environment: IBM HACMP 4.5
Environment: IBM HACMP 5.1
Environment: IBM HACMP 5.2
Environment: IBM HACMP 5.3
Environment: EMC SW: PowerPath 3.0.3 and above
Environment: Application SW: Current ESM Supported versions of HACMP or RVSD for AIX
Environment: The problem does not affect HACMP or RVSD on AIX 4.3.3 with PowerPath 3.x.
Environment: The problem does not affect HACMP or RVSD when using PowerPath versions below 3.0.3.
Environment: Starting with HACMP 5.4, the IBM HACMP native cl_fscsilunreset utility correctly clears disk reservations. For more information see the entry for HACMP 5.4 in the fix statement below. EMC and IBM continue to recommend that emcpowerreset be used to clear reserves for EMC arrays, as this has been shown to provide 70% faster fail-over times in the event of a node outage.

Problem: In a non-cooperative node failover, the takeover node cannot always clear the SCSI device reservation held by the primary node.
Problem: The problem occurs when the takeover node cannot always clear the SCSI device reservation held by the primary node. If the words "invalid argument" appear in the hacmp.out or vsd.log file in response to an SCIOLSTART, you may have encountered this issue. For example, the hacmp.out log file will contain the following:

    cl_disk_available[187] cl_fscsilunreset fscsi0 hdiskpower1 false
    cl_fscsilunreset[124]: openx(/dev/hdiskpower1, O_RDWR, 0, SC_NO_RESERVE): Device busy
    cl_fscsilunreset[400]: ioctl SCIOLSTART id=0X11000 lun=0X1000000000000 : Invalid argument

Problem: Failure to reestablish SCSI-2 reservations following a Fibre link event.
Problem: Installed the PowerPath utility emcpowerreset.

Fix: Install and configure the EMC PowerPath utility "emcpowerreset" into the HACMP environment. This binary can be obtained with any recent version of the EMC ODM package (Version 5.1.0.2 and higher) from the EMC ftp server location:

    ftp://ftp.emc.com/pub/elab/aix/ODM_DEFINITIONS

Once the EMC ODM package is downloaded and installed, the emcpowerreset utility can be found in the following directories:

    /usr/lpp/EMC/Symmetrix/bin
    /usr/lpp/EMC/CLARiiON/bin

The binary is also available for download from Powerlink using the following process: go to http://powerlink.emc.com and click on Support -> Software Downloads -> PowerPath for UNIX -> EMCPowerReset Patch Version 2.0.

For the Powerlink download there is 1 binary in the tar file, called "emcpowerreset", as well as a README file with instructions.

EMC Powerreset Version 2.0 supports PowerPath version 3.0.3 and higher.
EMC Powerreset Version 1.2 supports PowerPath versions 3.0.3, 3.0.4.

EMC has developed a binary called "emcpowerreset" for removing disk reservations held by PowerPath devices in the event that a node crashes. This binary is required for any HACMP installation on AIX 5.1 and higher when running PowerPath version 3.0.3 and higher.

To determine the emcpowerreset version, run the following command:

    cksum emcpowerreset

    Version 1 = 1108394902 7867 emcpowerreset
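The checksum comparison can be scripted. The sketch below reads cksum output on standard input and compares it with the Version 1 checksum and size quoted above; those are the only values given here, so any other binary is simply reported as not Version 1. The function name is hypothetical.

```shell
# Read "cksum emcpowerreset" output on stdin and compare it with the
# Version 1 checksum and size quoted in the text.
emcpowerreset_version() {
    read -r sum size _name
    if [ "$sum" = "1108394902" ] && [ "$size" = "7867" ]; then
        echo "Version 1"
    else
        echo "not Version 1"
    fi
}
```

On a node this would be run as `cksum /usr/lpp/EMC/Symmetrix/bin/emcpowerreset | emcpowerreset_version`, adjusting the path for CLARiiON attach.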
The emcpowerreset binary takes two parameters, <adapter name> and <device name>. These two parameters are automatically passed to the binary whenever it is invoked within the HACMP script logic.

AIX OPERATING SYSTEM REQUIREMENTS:
-------------------------------------------------------------
AIX 5.1 and higher
* Not required for AIX version 4.3.3.

POWERPATH REQUIREMENTS:
---------------------------------------------
PowerPath version 3.0.3 and higher.

SYMMETRIX REQUIREMENTS:
--------------------------------------------
Symmetrix 3000, 5000, 8000, DMX800, DMX1000, DMX2000, DMX3000

CLARiiON REQUIREMENTS:
----------------------------------------
CLARiiON FC4700, FC4700-2, CX-Series

DEVICE RESET BEHAVIOR:
---------------------------------------
The "emcpowerreset" binary will perform a device LUN reset depending on the Symmetrix microcode version. If the Symmetrix array is running at least 5x67.18.13S, then LUN reset is supported. Device LUN reset will be performed on CLARiiON arrays.

INSTALLATION:
-----------------------
a) Create the directory below to contain the binary obtained from EMC. Customers running the EMC ODM package should already have this directory.

    mkdir -p /usr/lpp/EMC/Symmetrix/bin
    or
    mkdir -p /usr/lpp/EMC/CLARiiON/bin

b) Copy the emcpowerreset binary into the /usr/lpp/EMC/Symmetrix/bin or /usr/lpp/EMC/CLARiiON/bin subdirectory.

c) Make sure the "emcpowerreset" binary is root executable.

d) The steps used to configure the new "emcpowerreset" utility into HACMP vary depending on the version of HACMP used. Follow the steps in the applicable section below to complete the installation process.

I) For HACMP 5.1 and higher (SMIT):

1. Enter into the SMIT fastpath for HACMP "smitty hacmp".
2. Select Extended Configuration.
3. Select Extended Resource Configuration.
4. Select HACMP Extended Resources Configuration.
5. Select Configure Custom Disk Methods.
6. Select Add Custom Disk Methods.
7. The following fields are available, enter as follows:

   For IBM Fibre Channel configurations with PowerPath 3.0.3 and higher:

   NOTE: Symmetrix

       Disk Type (PdDvLn field from CuDv)       = disk/pseudo/power
       Method to identify ghost disks           = SCSI3
       Method to determine if a reserve is held = SCSI_TUR
       Method to break a reserve                = /usr/lpp/EMC/Symmetrix/bin/emcpowerreset
                                                  or /usr/lpp/EMC/CLARiiON/bin/emcpowerreset
       Break reserves in parallel               = true
       Method to make the disk available        = MKDEV

   For IBM SCSI configurations with PowerPath 3.0.3, 3.0.4:

       Disk Type (PdDvLn field from CuDv)       = disk/pseudo/power
       Method to identify ghost disks           = SCSI2
       Method to determine if a reserve is held = SCSI_TUR
       Method to break a reserve                = /usr/lpp/EMC/CLARiiON/bin/emcpowerreset
       Break reserves in parallel               = true
       Method to make the disk available        = MKDEV

8. Configure the same custom disk processing method on each node in the cluster and synchronize the cluster resources. The cluster verification process ensures that the method that you configured exists and is executable on all nodes. The synchronization process ensures that the ODM entries are the same on all nodes, but will not synchronize the methods named in the ODM entries.

II) For HACMP 4.5 Classic and ES (SMIT):

1. Enter into the SMIT fastpath for HACMP "smitty hacmp".
2. Select Cluster Configuration.
3. Select Cluster Custom Modification.
4. Select Work with Custom Disk Methods.
5. Select Add Custom Disk Methods.
6. The following fields are available, enter as follows:

   For IBM Fibre Channel configurations with PowerPath 3.0.3 and higher:

       Disk Type (PdDvLn field from CuDv)       = disk/pseudo/power
       Method to identify ghost disks           = SCSI3
       Method to determine if a reserve is held = SCSI_TUR
       Method to break a reserve                = /usr/lpp/EMC/Symmetrix/bin/emcpowerreset
                                                  or /usr/lpp/EMC/CLARiiON/bin/emcpowerreset
       Break reserves in parallel               = true
       Method to make the disk available        = MKDEV

   For IBM SCSI configurations with PowerPath 3.0.3, 3.0.4:

       Disk Type (PdDvLn field from CuDv)       = disk/pseudo/power
       Method to identify ghost disks           = SCSI2
       Method to determine if a reserve is held = SCSI_TUR
       Method to break a reserve                = /usr/lpp/EMC/CLARiiON/bin/emcpowerreset
       Break reserves in parallel               = true
       Method to make the disk available        = MKDEV
7. Configure the same custom disk processing method on each node in the cluster and synchronize the cluster resources. The cluster verification process ensures that the method that you configured exists and is executable on all nodes. The synchronization process ensures that the ODM entries are the same on all nodes, but will not synchronize the methods named in the ODM entries.

III) For HACMP 4.4.1 Classic and ES (SMIT):

1. Enter into the SMIT fastpath for HACMP "smitty hacmp".
2. Select Cluster Configuration.
3. Select Cluster Custom Modification.
4. Select Define Custom Disk Methods.
5. Select Add Custom Disk Methods.
6. The following parameter options are available, enter as follows:

   For IBM Fibre Channel configurations with PowerPath 3.0.3 and higher:

       Disk Type (PdDvLn field from CuDv)       = disk/pseudo/power
       Method to identify ghost disks           = SCSI3
       Method to determine if a reserve is held = SCSI_TUR
       Method to break a reserve                = /usr/lpp/EMC/Symmetrix/bin/emcpowerreset
                                                  or /usr/lpp/EMC/CLARiiON/bin/emcpowerreset
       Break reserves in parallel               = true
       Method to make the disk available        = MKDEV

   For IBM SCSI configurations with PowerPath 3.0.3, 3.0.4:

       Disk Type (PdDvLn field from CuDv)       = disk/pseudo/power
       Method to identify ghost disks           = SCSI2
       Method to determine if a reserve is held = SCSI_TUR
       Method to break a reserve                = /usr/lpp/EMC/CLARiiON/bin/emcpowerreset
       Break reserves in parallel               = true
       Method to make the disk available        = MKDEV

7. Configure the same custom disk processing method on each node in the cluster and synchronize the cluster resources. The cluster verification process ensures that the method that you configured exists and is executable on all nodes. The synchronization process ensures that the ODM entries are the same on all nodes, but will not synchronize the methods named in the ODM entries.

IV) For all HACMP included versions (Command Line):

For IBM Fibre Channel configurations:

1. Enter the following at the AIX command prompt on each node in the configuration:

    /usr/sbin/cluster/utilities/clcustomdisk -c -tdisk/pseudo/power -Ndisk/pseudo/power -gSCSI3 -hSCSI_TUR -b/usr/lpp/EMC/Symmetrix/bin/emcpowerreset -ptrue -mMKDEV
    or
    /usr/sbin/cluster/utilities/clcustomdisk -c -tdisk/pseudo/power -Ndisk/pseudo/power -gSCSI3 -hSCSI_TUR -b/usr/lpp/EMC/CLARiiON/bin/emcpowerreset -ptrue -mMKDEV

For IBM SCSI configurations:

2. Enter the following at the AIX command prompt on each node in the configuration:

    /usr/sbin/cluster/utilities/clcustomdisk -c -tdisk/pseudo/power -Ndisk/pseudo/power -gSCSI2 -hSCSI_TUR -b/usr/lpp/EMC/Symmetrix/bin/emcpowerreset -ptrue -mMKDEV

3. Synchronize the cluster resources. The cluster verification process ensures that the method that you configured exists and is executable on all nodes. The synchronization process ensures that the ODM entries are the same on all nodes, but will not synchronize the methods named in the ODM entries.

The HACMPdisktype ODM object class will get updated with the following attributes:

For IBM Fibre Channel configurations:

    HACMPdisktype:
            PdDvLn = "disk/pseudo/power"
            ghostdisks = "SCSI3"
            checkres = "SCSI_TUR"
            breakres = "/usr/lpp/EMC/Symmetrix/bin/emcpowerreset"
            parallel = "true"
            makedev = "MKDEV"
            reserved1 = ""
            reserved2 = ""
            reserved3 = ""

    or

            PdDvLn = "disk/pseudo/power"
            ghostdisks = "SCSI3"
            checkres = "SCSI_TUR"
            breakres = "/usr/lpp/EMC/CLARiiON/bin/emcpowerreset"
            parallel = "true"
            makedev = "MKDEV"
            reserved1 = ""
            reserved2 = ""
            reserved3 = ""

For IBM SCSI configurations:

    HACMPdisktype:
            PdDvLn = "disk/pseudo/power"
            ghostdisks = "SCSI2"
            checkres = "SCSI_TUR"
            breakres = "/usr/lpp/EMC/Symmetrix/bin/emcpowerreset"
            parallel = "true"
            makedev = "MKDEV"
            reserved1 = ""
            reserved2 = ""
            reserved3 = ""
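Since synchronization does not copy the method binaries themselves, it is worth confirming the ODM entry on each node. The following sketch reads odmget HACMPdisktype output on standard input and succeeds only when a disk/pseudo/power stanza names emcpowerreset as its break-reserve method with parallel set to true. The function name is hypothetical, and the stanza layout is assumed to match the examples above.

```shell
# Read "odmget HACMPdisktype" output on stdin. Succeed only if a
# disk/pseudo/power stanza names emcpowerreset as its break-reserve
# method and has parallel = "true".
check_power_disk_method() {
    awk '
        /^HACMPdisktype:/                           { inpower = 0 }
        /PdDvLn = "disk\/pseudo\/power"/            { inpower = 1 }
        inpower && /breakres = ".*\/emcpowerreset"/ { br = 1 }
        inpower && /parallel = "true"/              { par = 1 }
        END { exit !(br && par) }'
}
```

Typical use would be `odmget HACMPdisktype | check_power_disk_method && echo configured`, repeated on every node in the cluster.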
V) For HACMP 5.4:

Starting with HACMP 5.4, the IBM HACMP native cl_fscsilunreset utility correctly clears disk reservations for EMC devices, and the installation of the emcpowerreset, set_scsi_id, and cfgscsi_id utilities is no longer required, but is still highly recommended by both EMC and IBM.

If you choose to use the recommended emcpowerreset utility on HACMP 5.4, follow the installation steps provided for HACMP 5.3 to implement emcpowerreset (for both Symmetrix and CLARiiON arrays) and set_scsi_id (for CLARiiON arrays). We highly recommend using emcpowerreset, set_scsi_id, and cfgscsi_id with HACMP 5.4 to avoid longer cluster startup and failover times (see Primus ID emc156011) and to avoid running into a rare race condition that could affect cluster failovers that require disk reservations to be broken.

If you upgrade to HACMP 5.4 and choose to use the native LUN reset utility for simplicity of management, and faster fail-over times are not an issue, follow the steps below to remove emcpowerreset:

TO REMOVE EMCPOWERRESET via smitty
====================================
1. Enter into the SMIT fastpath for HACMP "smitty hacmp".
2. Select Extended Configuration.
3. Select Extended Resource Configuration.
4. Select HACMP Extended Resources Configuration.
5. Select Configure Custom Disk Methods.
6. Select Remove Custom Disk Methods.
7. Select disk/pseudo/power

TO REMOVE EMCPOWERRESET via command line
========================================
1. Check to see if emcpowerreset is installed. Go to "/etc/objrepos" and enter the command "odmget HACMPdisktype".

    HACMPdisktype:
            PdDvLn = "disk/pseudo/power"
            ghostdisks = "SCSI3"
            checkres = "SCSI_TUR"
            breakres = "/usr/lpp/EMC/Symmetrix/bin/emcpowerreset"
            parallel = "true"
            makedev = "MKDEV"
            reserved1 = ""
            reserved2 = ""
            reserved3 = ""

2. Use the following command line to remove the emcpowerreset utility:

    /usr/sbin/cluster/utilities/clcustomdisk -r -t'disk/pseudo/power'

3. Verify the emcpowerreset utility was removed using the command odmget HACMPdisktype.

TO REMOVE SET_SCSI_ID / CFGSCSI_ID custom hacmp event:
====================================================
1. Verify your custom set_scsi_id / cfgscsi_id cluster event was added using the command odmget HACMPcustom. See example output below:

    HACMPcustom:
            name = "set_scsi_id"
            type = "event"
            description = "Set correct scsi id on EMC CLARiiON pseudo devices."
            value = "/etc/set_scsi_id"
            relation = ""
            status = 0

   If installed correctly you should see "emcpowerreset" called on the

2. Remove the set_scsi_id / cfgscsi_id custom cluster event from your configuration using the command below:

   If using set_scsi_id use command
       /usr/es/sbin/cluster/utilities/clrmcustom -t event -n'set_scsi_id'
   If using set_scsi_id.sh use command
       /usr/es/sbin/cluster/utilities/clrmcustom -t event -n'set_scsi_id.sh'
   If using cfgscsi_id use command
       /usr/es/sbin/cluster/utilities/clrmcustom -t event -n'cfgscsi_id'

3. Verify your custom cluster event was removed using the command odmget HACMPcustom.

TO REMOVE pre-defined node_up / node_down pre-event command:
===========================================================
1. Verify the node_up pre-defined HACMP event was properly modified to run the pre-event set_scsi_id / cfgscsi_id command, using the command odmget -q name=node_up HACMPevent. See example output below.

    HACMPevent:
            name = "node_up"
            desc = "Script run when a node is attempting to join the cluster."
            setno = 101
            msgno = 7
            catalog = "events.cat"
            cmd = "/usr/es/sbin/cluster/events/node_up"
            notify = ""
            pre = "set_scsi_id"
            post = ""
            recv = ""
            count = 0
            event_duration = 0

2. Verify the node_down pre-defined HACMP event was properly modified to run the pre-event set_scsi_id / cfgscsi_id command, using the command odmget -q name=node_down HACMPevent. See example output below.

    HACMPevent:
            name = "node_down"
            desc = "Script run when a node is attempting to leave the cluster."
            setno = 101
            msgno = 8
            catalog = "events.cat"
            cmd = "/usr/es/sbin/cluster/events/node_down"
            notify = ""
            pre = "set_scsi_id"
            post = ""
            recv = ""
            count = 0
            event_duration = 0

3. Modify the node_up pre-defined HACMP event to remove the set_scsi_id / cfgscsi_id command using the following command:

    /usr/es/sbin/cluster/utilities/clchevent -O'node_up' -s'/usr/es/sbin/cluster/events/node_up' -c '0'

4. Modify the node_down pre-defined HACMP event to remove the set_scsi_id / cfgscsi_id command using the following command:

    /usr/es/sbin/cluster/utilities/clchevent -O'node_down' -s'/usr/es/sbin/cluster/events/node_down' -c '0'

5. Verify the node_up pre-defined HACMP event was properly modified to remove the pre-event set_scsi_id / cfgscsi_id command, using the command odmget -q name=node_up HACMPevent. See example output below.

    HACMPevent:
            name = "node_up"
            desc = "Script run when a node is attempting to join the cluster."
            setno = 101
            msgno = 7
            catalog = "events.cat"
            cmd = "/usr/es/sbin/cluster/events/node_up"
            notify = ""
            pre = ""
            post = ""
            recv = ""
            count = 0
            event_duration = 0

6. Verify the node_down pre-defined HACMP event was properly modified to remove the pre-event set_scsi_id / cfgscsi_id command, using the command odmget -q name=node_down HACMPevent. See example output below.

    HACMPevent:
            name = "node_down"
            desc = "Script run when a node is attempting to leave the cluster."
            setno = 101
            msgno = 8
            catalog = "events.cat"
            cmd = "/usr/es/sbin/cluster/events/node_down"
            notify = ""
            pre = ""
            post = ""
            recv = ""
            count = 0
            event_duration = 0
Notes: PRIOR TO IMPLEMENTING THE FIX, THE FOLLOWING NOTES APPLY:
Note 1: The fix below applies only to AIX HACMP with AIX 5.1 and higher and PowerPath 3.0.3 and higher.
Note 2: If this is a PSSP/VSD environment (version 3.4 and 3.5) with AIX 5.1 and higher and PowerPath 3.0.3 and higher, do not use this emcpowerreset utility. Instead, this problem is fixed by upgrading to EMC AIX ODM file set 5.1 and higher and upgrading the IBM VSD software with APARs IY45767 for PSSP/VSD 3.4 or IY45770 for PSSP/VSD 3.5 or higher.
Note 3: If the customer has upgraded an existing HACMP environment from AIX 4.3.x, PP 3.0.2 and below, or a non-PowerPath environment, they will need to modify their disktype.lst file to remove any lines added to enable LUN reset. See solution emc27897 for more information.
Note 4: If PP 4.2 and higher is installed, a new (Rev 2) version of the emcpowerreset binary is available. The new version is backward compatible with PP 3.x.
Note 5: SCSI attach is no longer supported with PP 4.2 and higher.
Note 6: All Symmetrix arrays are assumed to be at Enginuity levels 5267.26.18s / 5567.33.18s or 5268.05.06S / 5568.05.05S and higher. These levels are required to support LUN reset.
Note 7: In the examples below, it is assumed that the emcpowerreset binary has been copied to the directory /usr/lpp/EMC/Symmetrix/bin for Symmetrix attach and /usr/lpp/EMC/CLARiiON/bin for CLARiiON attach. If this is not the case, simply change the path to reflect the correct location.
Note 8: If implementing HACMP on CLARiiON LUNs with PowerPath, either the "set_scsi_id" script or the "cfgscsi_id" executable is required. Please refer to solution EMC97234 for details on the "set_scsi_id" script and Primus Solution emc143075 for details on the "cfgscsi_id" executable. This does not apply to HACMP 5.4.
Note 9: Because emcpowerreset can take approximately 20 seconds per device to break reservations, EMC recommends setting the break reserves in parallel to true to improve failover times. See emc122557 for additional details.

Notes: To help ensure emcpowerreset is installed and configured correctly, go to "/etc/objrepos" and enter the command "odmget HACMPdisktype". If installed correctly you should see "emcpowerreset" called on the "breakres" line. See example below:
HACMPdisktype:
    PdDvLn = "disk/pseudo/power"
    ghostdisks = "SCSI3"
    checkres = "SCSI_TUR"
    breakres = "/usr/lpp/EMC/Symmetrix/bin/emcpowerreset"
    parallel = "true"
    makedev = "MKDEV"
    reserved1 = ""
    reserved2 = ""
    reserved3 = ""
In addition, after a failover, you can look at the "hacmp.out" file and confirm emcpowerreset appears for the FC path and/or hdisk entry. See example below:
Clar_Cascade_1:cl_disk_available[13] ckres_rc=2
Clar_Cascade_1:cl_disk_available[13] [[ 2 != 0 ]]
Clar_Cascade_1:cl_disk_available[29] /usr/lpp/EMC/CLARiiON/bin/emcpowerreset fscsi1 hdiskpower38
Clar_Cascade_1:cl_disk_available[1228] echo Clar_Cascade_1,Clar_Cascade_1
Clar_Cascade_1:cl_disk_available[1228] read vg
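The hacmp.out check described above can be scripted. The sketch below is only an illustration: the function name emcpowerreset_targets is hypothetical, and it assumes the log has one trace entry per line in the same layout as the excerpt (the emcpowerreset path followed by the adapter and the device).

```shell
# Hypothetical helper (not part of HACMP or PowerPath).
# Reads hacmp.out-style text on stdin and prints "<adapter> <device>"
# for every line on which emcpowerreset was invoked.
emcpowerreset_targets() {
    awk '{
        # Find the field that ends in /emcpowerreset; the next two
        # fields are assumed to be the FC adapter and the hdiskpower device.
        for (i = 1; i <= NF - 2; i++)
            if ($i ~ /\/emcpowerreset$/) { print $(i + 1), $(i + 2); break }
    }'
}

# Example use (the log location is an assumption; adjust to your cluster):
#   emcpowerreset_targets < /tmp/hacmp.out
```

If the function prints nothing after a failover that required reservations to be broken, the custom disk method was probably not used for those devices.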
2.3
ID: emc156011
Content: cl_fscsilunreset routine reports error 'Invalid argument' during the start of HACMP
Environment: OS: IBM AIX 5.x
Environment: Application SW: HACMP 5.4
Environment: EMC SW: PowerPath 4.5.2
Problem: The cl_fscsilunreset routine reports "Invalid argument" errors like the following during the start of HACMP:
cl_fscsilunreset[228]: get_sid_lun(hdiskpower
cl_fscsilunreset[939]: ioctl SCIOLSTART id=0X10BCD lun=0X18000000000000: Invalid argument
cl_fscsilunreset[939]: ioctl SCIOLSTART id=0X10BCD lun=0X1A000000000000: Invalid argument
cl_fscsilunreset[939]: ioctl SCIOLSTART id=0X10BCD lun=0X1C000000000000: Invalid argument
cl_fscsilunreset[939]: ioctl SCIOLSTART id=0X10BCD lun=0X1E000000000000: Invalid argument
cl_fscsilunreset[939]: ioctl SCIOLSTART id=0X10BCD lun=0X14000000000000: Invalid argument
cl_fscsilunreset[939]: ioctl SCIOLSTART id=0X10BCD lun=0X17000000000000: Invalid argument
cl_fscsilunreset[939]: ioctl SCIOLSTART id=0X10BCD lun=0XF000000000000: Invalid argument
Root Cause: In previous versions of HACMP this indicated a problem in which the takeover node did not always clear the SCSI device reservation held by the primary node. If the words "invalid argument" appear in the hacmp.out file in response to an SCIOLSTART, you may have encountered this issue. The EMC PowerPath "emcpowerreset" utility was developed to fix the issue.
Starting with HACMP 5.4 the IBM HACMP native cl_fscsilunreset utility correctly clears disk reservations for EMC devices, and the installation of the emcpowerreset, set_scsi_id, and cfgscsi_id utilities is no longer required, but is still highly recommended by both EMC and IBM.
Fix: EMC highly recommends using emcpowerreset, set_scsi_id, and cfgscsi_id with HACMP 5.4 to improve cluster startup and failover times and to avoid running into a rare race condition that could affect cluster failovers that require disk reservations to be broken. See solution EMC69100 for details on implementing this fix.
Notes: IBM reported that the cl_fscsilunreset utility does successfully clear the reserves but reports these log notifications.
Explanation: When HACMP uses the standard method to break disk reservations, it opens the device, tries to logically connect to it by issuing the SCIOLSTART command, then tests to see if there is in fact a reserve. If a reserve exists, HACMP tries to break it by first issuing a LUN reset. If this fails, HACMP will then issue a target reset. Finally, if neither method works, HACMP forcefully opens the device using openx. Any failures that happen in these steps are logged using perror. When custom disk methods are used instead, we specifically tell HACMP which method to follow for a particular disk, since different methods may need to be followed for scsi2 and scsi3 devices.
The source of these specific warnings is the failure of the ioctl (fd, SCIOLSTART, buff). It fails reporting invalid arguments although none of the arguments appear to be invalid, and ultimately the reservation does get broken. In the tests that have been conducted, analysis shows that when LVM is in the picture, i.e. the VG is active, it has already issued an SCIOLSTART. This is irrespective of the multipathing software. In addition, any other process issuing the SCIOLSTART call will also see that call fail; this is expected. On most storage, if reserves are broken this way (i.e. the usual method) using cl_disk_available while LVM is in use, this failure will occur and the Invalid Argument message will be posted to the hacmp.out log.
Messages in the hacmp.out file are really intended for use by IBM, and not for general customer information (e.g., they are not NLS translated). HACMP identifies OEM volume groups and filesystems and performs all major functions with them; however, some of the HACMP functions have limitations for OEM volume groups and filesystems:
Provided the requirements set out in Primus EMC69100 for HACMP 5.4 are met, these messages can be ignored.
To eliminate the messages, the recommendation is to install the "emcpowerreset" tool.
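To judge how often the message appears on a given node, the matching hacmp.out entries can be counted. This is a minimal sketch under the assumption that each cl_fscsilunreset message sits on its own line, as in the problem description above; the function name count_sciolstart_errors is hypothetical.

```shell
# Hypothetical helper: count "ioctl SCIOLSTART ... Invalid argument"
# entries in hacmp.out-style text read from stdin.
count_sciolstart_errors() {
    # awk is used instead of grep -c so the exit status is 0 even
    # when nothing matches; "n + 0" prints 0 rather than an empty line.
    awk '/ioctl SCIOLSTART/ && /Invalid argument/ { n++ } END { print n + 0 }'
}

# Example use (the log location is an assumption):
#   count_sciolstart_errors < /tmp/hacmp.out
```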
2.4
Answer:
1. If a dd test shows that the filesystem/logical volume I/O speed is normal, but the problem occurs when running the database application, the problem is most likely in the database and is not EMC related.
2. When using two CLARiiON/Symmetrix arrays for AIX LV mirroring (e.g. one disk from CX-1 and one disk from CX-2 in one VG), if one disk goes missing (e.g. one CX is down or trespassing), the OS will hold I/O for 20 s (PowerPath not installed) or 70 s (VG built on hdiskpower devices). In this situation, the disks will show 100% busy and there is no I/O.
2.5
hdiskpower , BUSY
Answer: The sequence for removing devices should be: hdiskpower# -> hdisk# -> fscsi# -> fcs#. (In AIX 5.2 + PP 4.5.2 environments I met many cases where hdisk# was deleted before hdiskpower#. This left the related hdiskpower# and hdisk# devices in a busy state; the rmdev command could not remove those devices and a reboot did not solve the problem.)
Method 1: Reboot the host without the fibre connection, then use the command rmdev -Rdl fscsi0 to delete all hdiskpower and hdisk devices.
Method 2: Delete the PowerPath-related definitions from the ODM repository online. Vary off volume groups on SAN (if any), then run:
for i in $(lsdev -Cc disk | grep hdiskpower | awk '{print $1}')
do
    odmdelete -q name=$i -o CuAt
    odmdelete -q name=$i -o CuDv
    odmdelete -q value3=$i -o CuDvDr
done
odmdelete -q name=powerpath0 -o CuDv
odmdelete -q name=powerpath0 -o CuAt
rm /dev/powerpath0
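The removal order above (hdiskpower#, then hdisk#, then fscsi#, then fcs#) is easy to get wrong when device names come from an unsorted list. The sketch below only sorts names into that order and removes nothing; the function name removal_order is hypothetical, and it assumes one device name per line on stdin.

```shell
# Hypothetical helper: print device names in the removal order
# hdiskpower# -> hdisk# -> fscsi# -> fcs#.
# Names that match none of the four patterns are dropped.
removal_order() {
    awk '
        # hdiskpower must be tested before hdisk, which is its prefix.
        /^hdiskpower[0-9]+$/ { print 1, NR, $1; next }
        /^hdisk[0-9]+$/      { print 2, NR, $1; next }
        /^fscsi[0-9]+$/      { print 3, NR, $1; next }
        /^fcs[0-9]+$/        { print 4, NR, $1; next }
    ' | sort -k1,1n -k2,2n | awk '{ print $3 }'
}

# Example use: review the list before feeding it to rmdev by hand.
#   lsdev -C | awk '{ print $1 }' | removal_order
```

The input line number is kept as a secondary sort key so that devices of the same class stay in their original relative order.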
2.6
Emc143075, set_scsi_id has been replaced by cfgscsi_id with CLARiiON Luns in HACMP/PP environment
ID: emc143075
Goal
New " cfgscsi_id " executable is now required with CLARiiON LUNs in an HACMP / PP environment.
Goal
How to set up HACMP on AIX with CLARiiON LUNs under PowerPath control.
Goal
New " cfgscsi_id " executable significantly improves node up and HACMP failover speed with CLARiiON attached.
Symptom
HACMP cluster failover with many CLARiiON hdisks takes a very long time.
Symptom
HACMP Node_up and Node_down utilities with many CLARiiON hdisks take a very long time to run.
Cause
Prior to the "cfgscsi_id" executable the script "set_scsi_id" was used to set the correct 'Active' FA path (scsi ID) for each CLARiiON LUN. When "set_scsi_id" ran on an hdisk that was configured to the 'Passive' path it had to timeout and fail before going on to configure the next hdisk. When many hdisks were configured this process was noticeably long (~14 minutes for 125 CLARiiON devices configured down 4 FA paths (two 'Active' and two 'Passive')).
Note
Due to the significantly improved HACMP failover times, EMC recommends all Customers replace the "set_scsi_id" script with the new "cfgscsi_id" executable. The "set_scsi_id" script will continue to be supported for Customers who are currently using this script and are content with the failover time. See Primus Solution emc97234 for detail on implementing the "set_scsi_id" script.
Fix
A new executable called "cfgscsi_id" has been created which uses a different technique to set the correct FA path (scsi ID) for each CLARiiON LUN. This new executable is noticeably faster (~4 seconds for the same 125 CLARiiON devices configured down 4 FA paths (two 'Active' and two 'Passive')).
The "cfgscsi_id" executable (cksum 1575008215 13976) will ship with PP 5.0 and higher. Until PP 5.0 is available Customers can download the utility via EMC's PowerLink web site under Resources/Tools > CS Support > Downloads and Patches > Downloads D-R > PowerPath for UNIX. Note that when PP 5.0 is installed "cfgscsi_id" will be placed in the directory /usr/sbin.
After going to Powerlink to download "cfgscsi_id", put it into /usr/sbin on all nodes in the cluster, and make it executable. Use the procedure below, on every node in the cluster, to add the "cfgscsi_id" utility into the HACMP environment.
1. Check for and remove any previous "set_scsi_id" or "set_scsi_id.sh" cluster events that may have been used. Do this by running "odmget HACMPcustom" and note any events named "set_scsi_id" or "set_scsi_id.sh". If one is found run the command below to remove it and then re-run odmget command to confirm all are gone.
2. Add the custom cluster event to your HACMP configuration. This event is the name given to the script that will be added later to select pre-defined HACMP events. Note that this is one long command with no carriage returns.
/usr/es/sbin/cluster/utilities/claddcustom -t event -n'cfgscsi_id' -I'Set correct scsi id on EMC CLARiiON pseudo devices.' -v'/usr/sbin/cfgscsi_id'
3. Verify your custom cluster event was added correctly by running the command below and comparing the output.
odmget HACMPcustom
HACMPcustom:
    name = "cfgscsi_id"
    type = "event"
    description = "Set correct scsi id on EMC CLARiiON pseudo devices."
    value = "/usr/sbin/cfgscsi_id"
    relation = ""
    status = 0
4. Run the two commands below to modify the pre-defined HACMP event by giving the event command your custom cluster event as a pre-event command.
5. Verify the events were properly modified by running the two commands below and comparing the output.
HACMPevent:
    name = "node_up"
    desc = "Script run when a node is attempting to join the cluster."
    setno = 101
    msgno = 7
    catalog = "events.cat"
    cmd = "/usr/es/sbin/cluster/events/node_up"
    notify = ""
HACMPevent:
    name = "node_down"
    desc = "Script run when a node is attempting to leave the cluster."
    setno = 101
    msgno = 8
    catalog = "events.cat"
    cmd = "/usr/es/sbin/cluster/events/node_down"
    notify = ""
    pre = "cfgscsi_id"
    post = ""
    recv = ""
    count = 0
    event_duration = 0
6. Once the "cfgscsi_id" utility has been installed and configured on every node in the cluster, synchronize the cluster and ensure there are no errors resulting from the addition of the pre-event scripts.
Notes: The modified script must reside in the same location on all nodes in the cluster. The script must be made executable on all nodes. Synchronization for the cluster will fail if it is not.
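The verification in step 5 comes down to reading the "pre" attribute of the node_up and node_down stanzas. The sketch below extracts that value so it can be compared across nodes; the function name hacmp_pre_event is hypothetical, and the parsing assumes the attribute = "value" stanza layout shown in the examples.

```shell
# Hypothetical helper: print the value of the "pre" attribute from an
# odmget HACMPevent stanza read on stdin (an empty line means no pre-event).
hacmp_pre_event() {
    # Split on double quotes: field 1 is the attribute name plus " = ",
    # field 2 is the quoted value.
    awk -F'"' '$1 ~ /^[ \t]*pre[ \t]*=[ \t]*$/ { print $2 }'
}

# Example use on each cluster node:
#   odmget -q name=node_up HACMPevent   | hacmp_pre_event
#   odmget -q name=node_down HACMPevent | hacmp_pre_event
```

After the procedure above both commands would be expected to print cfgscsi_id on every node.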
2.7
AIXClariionLUNhdiskpower 1AIXcfgmgrHBA fcs0fcs1 2LUNClariionRAID GroupLUN sheet1 3PowerPathRAID GroupLUN ID storage processorLUN SPRAID GroupSPLUN RAID GroupLUN ID
LUN
LUN
2.8 AIX chdev command cause disk operation error on Clariion Passive(standby) Path --- Author Danny Li
From: Schneider, GregW (ENG Hopkinton)
Sent: Friday, August 11, 2006 11:00 PM
To: Richard, Jeffrey; Li, Danny; TS Worldwide
Subject: RE: AIX chdev command cause disk operation error on Clariion Passive(standby) Path
I want to expand on Jeff's response and all support personnel should pay particular attention to this: If you find yourself telling a customer to ignore hdisk errors and you are not also explaining to them how to decode the SCSI ASC and ASCQ, scsi opcode, and SCSI status codes, then please stop and consult Tech support individuals who are trained in this area. To do so could lead customers to ignore critical errors that could result in loss of data.
_____________________________________________
From: Richard, Jeffrey
Sent: Friday, August 11, 2006 10:50 AM
To: Li, Danny; TS Worldwide
Subject: RE: AIX chdev command cause disk operation error on Clariion Passive(standby) Path
Danny, Just a follow-up to your observations on this error. A SC_DISK_ERR2 in AIX with the 0403 error code can in most cases be ignored. In this case, since the opcode on the top line of the sense bytes was 00, it can be ignored. If it is 12 it can also be ignored, but if it is 0a, 08, 2800, or 2a00 then it can't be ignored. The point I'm trying to make is that just because it is a 0403 error, you can't assume that the error is benign and ignore it unless you look at the complete sense bytes. If you want to respond back, please just respond back to me. Thank you, Jeff Richard
_____________________________________________
From: Li, Danny
Sent: Thursday, August 10, 2006 8:16 PM
To: TS Worldwide
Subject: AIX chdev command cause disk operation error on Clariion Passive(standby) Path
Dear all, FYI. Env: AIX 5.3, PowerPath 4.5.2, Clariion CX700.
You may get many disk operation errors on AIX when using Clariion and PowerPath. In the error log below, sense data "0403" means device not ready, and the powermt output shows hdisk57 is a passive path for LUN 221, so obviously it should not be ready for access. When operating on hdiskpower44, hdisk57 or hdisk177 shouldn't be accessed directly. Currently I find "chdev" and "mkvg" can cause this kind of error:
1. when issuing the mkvg command and there is no PVID for the hdiskpower device, e.g.: #mkvg -B -s 64 -y testvg hdiskpowerxx ...
2. when issuing chdev -l hdiskpowerxx -a pv=yes
3. when issuing chdev -l hdiskpowerxx -a pv=clear
If it is the first time you use those hdiskpower devices to create a volume group, no PVID has been assigned yet and you will get this kind of error. It is strange and not correct: as we operate on the hdiskpower device, the passive path should be screened by PowerPath. If you meet the same errors, let your customer know they do no harm to the application. This kind of error is also found in older AIX and PowerPath environments. I will open a case on this problem.
root@szxcm102-in#powermt display dev=hdiskpower44
Pseudo name=hdiskpower44
CLARiiON ID=CK200062300348 [SG_HA_cm101_cm102_in]
Logical device ID=60060160E6111900DF8DD893D916DB11 [LUN 221]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP B, current=SP B
==============================================================================
---------------- Host ---------------   - Stor -   -- I/O Path -  -- Stats ---
###  HW Path                I/O Paths    Interf.   Mode    State  Q-IOs Errors
==============================================================================
   3 fscsi3                   hdisk137   SP B1     active  alive      0      0
   3 fscsi3                   hdisk177   SP A3     active  alive      0      0
   2 fscsi2                   hdisk57    SP A1     active  alive      0      0
   2 fscsi2                   hdisk97    SP B3     active  alive      0      0
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
B6267342   0809104306 P H hdisk53        DISK OPERATION ERROR
B6267342   0809104306 P H hdisk173       DISK OPERATION ERROR
B6267342   0809104106 P H hdisk88        DISK OPERATION ERROR
B6267342   0809104106 P H hdisk128       DISK OPERATION ERROR
B6267342   0809104106 P H hdisk54        DISK OPERATION ERROR
B6267342   0809104106 P H hdisk174       DISK OPERATION ERROR
B6267342   0809104106 P H hdisk57        DISK OPERATION ERROR
B6267342   0809104006 P H hdisk177       DISK OPERATION ERROR
B6267342   0809104006 P H hdisk62        DISK OPERATION ERROR
B6267342   0809104006 P H hdisk182       DISK OPERATION ERROR
Date/Time:
Sequence Number: 446
Machine Id:      00CB15BC4C00
Node Id:         szxcm101-in
Class:           H
Type:            PERM
Resource Name:   hdisk57
Resource Class:  disk
Resource Type:   CLAR_FC_raid5
Location:        U5791.001.99B09XX-P2-C04-T1-W5006016130603E42-L12000000000000
VPD:
        Manufacturer................DGC
        Machine Type and Model......RAID 5
        ROS Level and ID............0219
        Serial Number...............CK200062300348
        Device Specific.(SI)........CX700
        Device Specific.(PQ)........00
        Device Specific.(VS)........BF0000B8DFCL
        Device Specific.(UI)........60060160E61119001F47588BF70FDB11
        Device Specific.(FL)........00BF
        Device Specific.(Z0)........10
        Device Specific.(Z1)........10
Detail Data PATH ID 0 SENSE DATA 0600 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0102 0000 7000 0200 0000 000A 0000 0000 0403 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0012 001A ---------------------------------------------------------------------------
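The rule of thumb from the thread above (opcode 00 or 12 with a 0403 error can be ignored; 0a, 08, 2800, or 2a00 cannot) can be written down as a small lookup. This is a sketch only: the function name classify_0403_opcode is hypothetical, the opcode has to be read from the sense bytes by hand and passed in, and anything outside the listed values is reported as unknown rather than guessed.

```shell
# Hypothetical helper: classify the SCSI opcode seen with an
# SC_DISK_ERR2 / 0403 entry, following the rule quoted in the thread.
classify_0403_opcode() {
    # Lowercase the input and keep the first two hex digits, so that
    # 2800 and 2A00 are treated as opcodes 28 and 2a.
    op=$(printf '%s' "$1" | tr 'A-F' 'a-f' | cut -c1-2)
    case "$op" in
        00|12)       echo "ignorable" ;;    # TEST UNIT READY, INQUIRY
        08|0a|28|2a) echo "investigate" ;;  # READ(6), WRITE(6), READ(10), WRITE(10)
        *)           echo "unknown" ;;      # not covered by the rule; decode the full sense data
    esac
}
```

For the hdisk57 entry above, the thread states the opcode was 00, so classify_0403_opcode 00 prints ignorable.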
2.9
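Queue depth: the OS SCSI driver queue limits the number of outstanding I/Os and therefore the achievable IOPS.
CX700 storage processor: 2048 outstanding I/Os. At 40 ms per I/O, one SP gives 1000/40 x 2048 = 51200 IOPS.
CX700 LUN: with a LUN queue depth of 16, LUN IOPS = 1000/40 x 16 = 400; striping across 4 LUNs gives 1600 IOPS.
40 ms I/O; 20 ms I/O; DMX3000 IOPS 26000; LUN queue depth 16, 32; SAN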
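The queue-depth arithmetic above (1000/40 x 2048 = 51200 and 1000/40 x 16 = 400) follows one formula: IOPS ceiling = 1000 / service time in ms x queue depth. The sketch below reproduces it with integer shell arithmetic; the function name iops_ceiling is hypothetical, and the result is an upper bound under the stated response time, not a measured figure.

```shell
# Hypothetical helper: upper bound on IOPS for a given per-I/O service
# time and queue depth.
#   $1 = service time per I/O in milliseconds
#   $2 = queue depth (outstanding I/Os)
iops_ceiling() {
    # Multiply before dividing to limit integer truncation.
    echo $(( 1000 * $2 / $1 ))
}

# CX700 SP, 2048 outstanding I/Os at 40 ms:   iops_ceiling 40 2048
# One LUN, queue depth 16 at 40 ms:           iops_ceiling 40 16
# Four striped LUNs at queue depth 16:        echo $(( 4 * $(iops_ceiling 40 16) ))
```

Halving the service time to 20 ms doubles the ceiling for the same queue depth.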
2.10 AIX JFS2 mount option "cio" cause very low performance for cp and dd command--- Author Danny Li
When customers use the "cio" option (supported from AIX 5.3 onward) to mount their JFS2 filesystem, they will experience low performance for the cp and dd commands. The results below were measured on an internal Ultra-320 SCSI disk:
root# time dd if=/dev/zero of=/testfs/test1.out bs=32K count=65536
65536+0 records in.
65536+0 records out.
real    4m26.37s
user    0m0.28s
sys     0m4.58s
32K x 65536 / 266.37 = 7873.1 KB/s (striping on 4 DMX3 LUNs gets a 20MB/s result)
root# time cp test1.out test2.out
real    35m52.45s
user    0m1.34s
sys     0m28.17s
32K x 65536 / 2152.45 = 974.3 KB/s (striping on 4 DMX3 LUNs gets an 8MB/s result)
Without the "cio" option on the same disk, the results are:
root# time dd if=/dev/zero of=/testfs/test3.out bs=32K count=65536
65536+0 records in.
65536+0 records out.
real    0m17.17s
user    0m0.21s
sys     0m8.50s
32K x 65536 / 17.17 = 122140.5 KB/s (striping on 4 DMX3 LUNs gets a 285MB/s result)
root# time cp test1.out test2.out
real    0m49.83s
user    0m0.31s
sys     0m11.59s
(striping on 4 DMX3 LUNs gets a 166MB/s result)
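The KB/s figures above are block size x block count / elapsed seconds. The sketch below repeats that calculation so other dd runs can be compared the same way; the function name dd_throughput_kbs is hypothetical, and the elapsed time must first be converted to seconds by hand (4m26.37s is 266.37 s).

```shell
# Hypothetical helper: throughput of a dd run in KB/s.
#   $1 = block size in KB, $2 = block count, $3 = elapsed ("real") seconds
dd_throughput_kbs() {
    # awk is used because shell arithmetic has no floating point.
    awk -v bs="$1" -v count="$2" -v secs="$3" \
        'BEGIN { printf "%.1f\n", bs * count / secs }'
}

# With "cio":     dd_throughput_kbs 32 65536 266.37
# Without "cio":  dd_throughput_kbs 32 65536 17.17
```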
"cio" option equals to "dio"+"no inode lock", that is, when use this option to mount a filesystem, cache for this FS will not be used and OS doesn't provide inode locking function. Although cp and dd command get low performance, it doesn't mean same problem will happen on database application that support "cio".
TimeFinder-Snap v1.1.rar
2.15 AIX433
Notice: although we have a solution for AIX 4.3.3 connectivity, AIX 4.3.3 is no longer in our support matrix, so the delivery PROCESS still needs to be followed. An RPQ is a must before an implementation like this; otherwise we won't get corporate engineers' support in case problems occur.
3 PowerPath
3.1
The above command would have to be repeated for all fscsi adapters present in the system. The Fast Fail feature is now enabled for the fscsi0 adapter. When the adapter driver receives an event from the switch indicating a link event (RSCN), the driver invokes the Fast Fail logic. Fast Fail functionality is desirable when multipathing software (such as EMC PowerPath) is installed. Setting the fc_err_recov attribute to fast_fail should decrease the failure times due to link failures. In a single direct-connect path environment, it is recommended that the Fast Fail feature be disabled.
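Repeating the setting for every fscsi adapter can be prepared as a dry run. The sketch below only prints one chdev line per adapter name read from stdin and changes nothing. The function name fast_fail_cmds is hypothetical, and the exact chdev form is an assumption built from the fc_err_recov attribute named in the text; compare it with the command actually used for fscsi0 before running any of the output.

```shell
# Hypothetical helper: print (not run) a chdev command for every fscsi
# adapter name read from stdin, one name per line. Other names are skipped.
fast_fail_cmds() {
    while read -r adapter; do
        case "$adapter" in
            # Assumed command form; adjust flags to match your procedure.
            fscsi[0-9]*) echo "chdev -l $adapter -a fc_err_recov=fast_fail" ;;
        esac
    done
}

# Example use: list the adapters, then review the generated commands.
#   lsdev -C | awk '/^fscsi/ { print $1 }' | fast_fail_cmds
```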
Requirements
The requirements for Fast I/O Failure support are:
Fast
6227 adapter firmware level 3.22A1 or greater.
6228 adapter firmware level 3.82a1 or greater.
6239 adapter firmware level 1.00X5 or greater.
5716 adapter firmware level 1.90A4 or greater.
1977 adapter firmware level 1.90A4 or greater.
Requirements
6227 adapter firmware level 3.22A1 or greater.
6228 adapter firmware level 3.82a1 or greater.
6239 adapter firmware level 1.00X5 or greater.
5716 adapter firmware level 1.90A4 or greater.
1977 adapter firmware level 1.90A4 or greater.
The Fibre Channel device World Wide Name (Port Name) and Node Name must remain constant, and the World Wide Name must be unique. Changing the World Wide Name or Node Name of an available or online device could result in I/O failures.
Each
included in EMC AIX ODM package 5.1.0.0 and later. Updated filesets that contain the sn_location attribute (discussed in the list item) should also be updated to contain the world_wide_name and node_name.
The
unique serial number for each LUN. The AIX Fibre Channel device drivers will not autodetect serial number location, so the method for serial number extraction must be explicitly provided by any storage vendor in order to support dynamic tracking for their devices. This information is conveyed to the drivers using the sn_location ODM attribute for each storage device. If the disk or tape driver detects that the 'sn_location' ODM attribute is missing, an error log of type INFO will be generated and dynamic tracking will not be enabled.
Note: The sn_location attribute may be non-displayable, so running the 'lsattr' command on an hdisk, for example, may not show the attribute, but it may, indeed, be present in ODM.
The
a SAN fabric (a fabric as seen from a single host bus adapter) if the N_Port IDs on the fabric stabilize within about 15 seconds. If cables are not reseated or N_Port IDs continue to change after the initial 15 seconds, I/O failures could result.
Devices
only track if they remain visible from the same HBA that they were originally connected to. For example, if device A were moved from one location to another on fabric A attached to host bus adapter A (i.e., its N_Port on fabric A changes), the device would seamlessly be tracked without any user intervention and I/O to this device can continue. However, if a device A is visible from HBA A, but not from HBA B, and device A is moved from the fabric attached to HBA A to the fabric attached to HBA B, device A will not be accessible on fabric A nor on fabric B. User intervention would be required to make it available on fabric B by invoking cfgmgr. The AIX device instance on fabric A would no longer be usable, and a new device instance on fabric B would be created. This device would have to manually be added to volume groups, multipath device instances, etc. In essence, this is the same as removing a device from fabric A and adding a new device to fabric B.
No
devices while an AIX system dump is in progress. In addition, dynamic tracking is not supported during boot or during cfgmgr invocations. SAN changes should not be made while any of these operations are in progress.
Once
information as SCSI IDs in ODM will no longer reflect actual SCSI IDs on the SAN. ODM will remain in this state until cfgmgr is run manually or the system is rebooted, provided all drivers, including any third-party Fibre Channel SCSI target drivers, are dynamic-tracking capable. If cfgmgr is run manually, cfgmgr must be invoked on all affected fscsi devices, which can easily be accomplished by running cfgmgr without any options, or by invoking cfgmgr on each fscsi device individually.
Note: Running cfgmgr at run time to recalibrate the SCSI IDs may not update the SCSI ID in ODM for a storage device if the storage device is currently opened, such as when volume groups are varied on. cfgmgr would need to be run on devices that are not opened, or the system should be rebooted, to recalibrate the SCSI IDs. Note that stale SCSI IDs in ODM have no adverse effect on the FC drivers, and recalibration of SCSI IDs in ODM is not necessary for the FC drivers to function properly. Any applications that communicate with the adapter driver directly via ioctl calls and use the SCSI ID values from ODM, however, need to be updated as indicated in the next bullet to avoid using potentially stale SCSI IDs.
Even
encouraged to make SAN changes, such as cable moves/swaps and establishing ISL links, during maintenance windows. Making SAN changes during full production runs is discouraged. This is due to the fact that there is a short interval of time to perform any SAN changes. Cables that are not reseated correctly, for example, could result in I/O failures. Performing these operations during a time of little/no traffic minimizes impact of I/O failures due to
3.2
emc145919: PowerPath 4.5.X: AIX LVM I/O errors due to missing device reservations.
Goal ETA emc145919: PowerPath 4.5.X: AIX LVM I/O errors due to missing device reservations.
Note
If you don't have a shared disk environment (2 AIX systems sharing the same disk) then this Primus
Fact
Fact
Fact
Product: Symmetrix
Fact
Product: CLARiiON
Fact
Symmetrix device lost reservation due to Target (Bus) Reset being received on FA port.
Fact
CLARiiON device lost reservation due to Target (Bus) Reset being received on SP port.
Symptom
Symptom
EMC CLI command "symdev list -resv" displays no reserves on devices that should have them.
Symptom
In a shared disk environment with Volume Group <test_vg> varied ON on node A, the customer is able to access devices in <test_vg> (for example, run lqueryvg <test_vg>) from node B.
Symptom
After a Target (Bus) Reset the SCSI-2 reservations are lost on all devices visible to the host initiator / target path where the Target (Bus) Reset arrives, and are not restored on the next I/O performed to the device.
[NOT] Symptom
This issue does not affect environments that use SCSI-3 Persistent Group Reserves such as Symantec VCS.
[NOT] Symptom
This issue does not affect environments that do not use SCSI Reserves such as HACMP Concurrent Access or GPFS.
Cause
By design, a Target (Bus) Reset will clear the SCSI-2 reservation on all devices seen on the path. PowerPath includes code to re-establish these reservations on the next I/O to each device. This code is not working correctly in PowerPath versions 4.5.1, 4.5.2, and 4.5.3.
Applications should work correctly without device reservations. However, it is not recommended because, in a shared disk environment, devices without reservations are exposed to utilities that may try to access the device by placing a reservation.
Fix
PowerPath Engineering created the Hot Fixes below which correct this issue. Now, after a Target (Bus) Reset, the device reservation is once again re-established on the next I/O to each device.
The fix is available for PP 4.5.3 in Hot Fix 1 #1022
The fix is available for PP 4.5.1 in Hot Fix 2 #1023
To obtain these fixes, contact EMC customer service and refer to this knowledgebase article.
Note: The fix for this issue is also included in PowerPath 5.0.
Note
Workarounds: This issue should only be of concern in environments in which multiple nodes share access to the same devices (for example, HACMP). In addition, there are three events that must occur for customers to be affected by this issue:
1) PowerPath 4.5.x must be installed.
2) A Target (Bus) Reset must occur. Two common ways to produce a Target (Bus) Reset are:
   - Run "varyonvg -bu <test_vg>" on an AIX node that does not currently have <test_vg> varied ON.
   - In some cases, broken hardware will cause a Target (Bus) Reset.
3) A utility must be run from another host that places a SCSI Reservation on a device long enough to cause the application that is supposed to own that device to fail, resulting in AIX LVM I/O errors.
If AIX LVM I/O errors are seen due to loss of device reservation, and if the PowerPath fix cannot be applied, the steps below can be used as a temporary workaround.
1. Stop running any utilities that may be trying to access devices that are reserved by another node. One known utility is EMC's inq utility built on SIL version 6.2.x. Refer to knowledgebase solution emc148231 for additional information.
2. Varyoff and varyon all VGs. This process will re-establish all the device reservations.
3. Remove the condition that is causing the Target (Bus) Resets. If it is broken hardware, fix it. If it is due to the "varyonvg -bu" command, only run the "varyonvg -bu <test_vg>" command from the AIX node that has '<test_vg>' active. Refer to knowledgebase solution emc127211 for more information about the "varyonvg -bu" command.
3.3
3.4
EMC108127: Method Error /etc/methods/cfgpower -l powerpath0 0514-040 Error initializing a device into the kernel in AIX with minor number mismatch
ID: emc108127
Domain: EMC1
Solution Class: 3.X Compatibility
Fact OS: IBM AIX 5.1
Fact OS: IBM AIX 5.2
Fact OS: IBM AIX 5.3
Fact EMC SW: PowerPath 4.2
Fact EMC SW: PowerPath 4.3
Fact EMC SW: PowerPath 4.4
Fact EMC SW: PowerPath 4.4.2
Fact EMC SW: PowerPath 4.4.2.2
Fact EMC SW: PowerPath 4.5.1
Fact EMC SW: PowerPath 4.5.2
Fact EMC SW: PowerPath
Fact Product: CLARiiON
Fact Product: Symmetrix 8000 Family
Fact Product: Symmetrix DMX Family
Symptom After a clean PowerPath 4.2 installation, a powermt config command is run and this error appears: Method Error /etc/methods/cfgpower -l powerpath0 0514-040 Error initializing a device into the kernel.
Symptom Method error (/etc/methods/cfgpowerdisk -l hdiskpower228): 0514-043 Error getting or assigning a minor number on several devices.
Symptom Method error (/etc/methods/cfgpower -l powerpath0): 0514-012 Cannot open a file or device.
Symptom Unable to create a new volume group with hdiskpower device. Error msg: "0516-1182 mkvg Open Failure on hdiskpower0"
Symptom Upgrading to PowerPath 4.3 and PowerPath 4.4 shows the same errors.
Symptom All of the hdiskpowers show up as defined, and powermt display dev=all may show no hdiskpower devices configured.
Symptom lsdev -Cc disk shows hdiskpower devices as "available".
Symptom Hdiskpower devices do not show in an INQ or SYMINQ.
Symptom powermt display dev=all may show devices other than hdiskpowerX as pseudo device name.
Symptom The ls -Ralsi /dev command shows that the minor does not match with the hdiskpower number:
4258 0 brw------- 1 root system hdiskpower0
4392 0 brw------- 1 root system hdiskpower1
4394 0 brw------- 1 root system hdiskpower2
Change Fix
The following procedure corrects the out-of-order device minor numbers. This is a workaround. One possible cause is the order in which the devices were presented to the host during the original PowerPath installation.
1. Vary off volume groups on SAN (if any), then remove the hdiskpower entries from the ODM:
     for i in $(lsdev -Cc disk | grep hdiskpower | awk '{print $1}')
     do
       odmdelete -q name=$i -o CuAt
       odmdelete -q name=$i -o CuDv
       odmdelete -q value3=$i -o CuDvDr
     done
2. Remove the powerpath0 entries from the ODM:
     odmdelete -q name=powerpath0 -o CuDv
     odmdelete -q name=powerpath0 -o CuAt
3. rm /dev/powerpath0
4. Remove the PowerPath software:
     installp -u EMCpower
5. cd /dev; rm hdiskpower*; rm rhdiskpower*
6. savebase -v
7. Reboot.
8. Install PowerPath.
Note: The problem did not appear when we reverted to PowerPath 3.0.4 or 3.0.5.
Note: Knowledgebase solution emc93300 should be tried first, if it is a cleanup of the ODM. If this is a specific issue, such as the one documented here, then this KB solution should be applied.
Note: If the problem has already appeared, upgrading to PowerPath 4.5.x will not resolve the issue. You must first follow the ODM cleanup steps that are documented in this solution. After this is done, you should not have the problem again on the host on which this was run.
The understanding is that, at this time, Engineering has not been able to fix this problem.
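The ODM cleanup loop in step 1 can be previewed before anything is deleted. The sketch below is a dry run only, assuming that lsdev -Cc disk prints the device name in the first column; print_odm_cleanup is a hypothetical helper name, not an EMC or AIX tool. It reads lsdev output on standard input and prints the odmdelete commands from the procedure without executing them.

```shell
#!/bin/sh
# Dry run: print (do not execute) the odmdelete commands from the
# procedure for every hdiskpower device named on standard input.
# print_odm_cleanup is a hypothetical helper name used for illustration.
print_odm_cleanup() {
    awk '$1 ~ /^hdiskpower[0-9]+$/ {
        printf "odmdelete -q name=%s -o CuAt\n", $1
        printf "odmdelete -q name=%s -o CuDv\n", $1
        printf "odmdelete -q value3=%s -o CuDvDr\n", $1
    }'
}

# Typical use on the AIX host (review the output before running anything):
#   lsdev -Cc disk | print_odm_cleanup
```

Printing the commands first makes it possible to confirm that only hdiskpower entries are selected before the ODM is touched.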
3.5
ID: emc111635
Domain: EMC1
Solution Class: 3.X Compatibility
Goal: How to remove LUNZ devices
Fact: Product: CLARiiON
Fact: EMC SW: Navisphere CLI
Fact: EMC SW: Navisphere
Fact: OS: IBM AIX
Goal: Why do LUNZ devices appear on an AIX host even though storage is attached?
Symptom: Unable to see one LUN from an AIX server
Symptom: SC_DISK_ERR2 or SC_DISK_ERR4
Symptom: LUNZ devices appear after running cfgmgr, but were not there previously.
Change: The Array LUN Unit (ALU) associated with the Host LUN Unit (HLU) 0 was removed from the Storage Group.
Change: CLARiiON LUNs attached to host.
Cause: LUNZ devices are visible to a host, regardless of the operating system, when the Arraycomm Path setting is enabled for an HBA initiator record and that initiator does not see a physical LUN with an address of 0. Prior to seeing physical storage (i.e. adding a host to a storage group), every zoned I/O path to a CLARiiON should see a LUNZ. Its purpose is initial Navisphere Agent communication and registration with the CLARiiON array. When a LUN is added to a storage group (by default the first one added is given HLU 0), the LUNZ is no longer necessary because the agent will then communicate through the physical storage. If the LUN with address 0 is removed, the LUNZ devices will once again become visible, provided that Arraycomm Path is enabled.
Fix
First verify you are experiencing the problem. From the Enterprise Storage window in Navisphere Manager:
1. Click the Hosts tab.
2. Right-click the host in question and select Properties.
3. Click the Storage tab.
Note: You should not see a HLU 0 in the Host LUN ID column.

Fix 1: Using Navisphere Manager
Run the following commands on the AIX host:
1. lsdev -Cc disk | grep LUNZ to get a list of the hdisk devices that are LUNZ devices.
2. rmdev -dl hdiskn for every LUNZ device, where n is the hdisk number of the LUNZ.
Change Failover settings in Navisphere Manager:
3. Click Tools => Failover Setup Wizard.
4. Select the appropriate options in the Wizard:
     Initiator Type = Clariion Open
     FailoverMode = 1
     Arraycomm Path = Disabled
     Unit Serial Number = Array
5. cfgmgr
6. lsdev -Cc disk | grep LUNZ and verify LUNZ devices are no longer visible.

Fix 2: Using the navicli command set
On the AIX host run the following:
1. lsdev -Cc disk | grep LUNZ to get a list of the hdisk devices that are LUNZ devices.
2. rmdev -dl hdiskn for every LUNZ device, where n is the hdisk number of the LUNZ.
3. lsdev -Cc disk | grep LUNZ to verify LUNZ devices are now gone.

Fix 3: Creating a new LUN
On the AIX host:
1. lsdev -Cc disk | grep LUNZ to get a list of the hdisk devices that are LUNZ devices.
2. rmdev -dl hdiskn for every LUNZ device, where n is the hdisk number of the LUNZ.
From Navisphere Manager:
3. Bind a small LUN.
4. Add the LUN to the Storage Group in question.
5. Before applying the configuration, scroll the "Selected LUNs" field to the right.
6. Change the "Host ID" field to 0 by clicking the blank space and choosing from the drop-down menu.
7. Apply and OK.
On the AIX host:
8. Run cfgmgr on the host.
9. Run lsdev -Cc disk | grep LUNZ to verify LUNZ devices are now gone.
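The two AIX-side steps shared by all three fixes (list the LUNZ hdisks, then rmdev each one) can be sketched as a small dry-run helper. This is only an illustration: print_lunz_rmdev is a hypothetical name, and it assumes that lsdev -Cc disk prints the hdisk name in the first column and the string LUNZ somewhere in the description. It prints the rmdev commands rather than running them.

```shell
#!/bin/sh
# Dry run: print one "rmdev -dl" command per LUNZ hdisk listed on
# standard input. print_lunz_rmdev is a hypothetical helper name.
print_lunz_rmdev() {
    awk '/LUNZ/ && $1 ~ /^hdisk[0-9]+$/ { print "rmdev -dl " $1 }'
}

# Typical use on the AIX host (review the output before running it):
#   lsdev -Cc disk | print_lunz_rmdev
```

An empty result after cfgmgr corresponds to the final verification step, where no LUNZ devices should remain.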
Note
If the above procedures fail to rid the AIX host of LUNZ devices, verify you are not zoned to
3.6
MPIO cannot be uninstalled from the OS. To use PowerPath without seeing MPIO devices, go through the following steps:
1. Shut down all applications and make sure nothing is holding on to the external storage devices. (It is safer to unmount all filesystems after stopping applications.)
2. Make sure you have the correct ODM fileset installed.
3. If the correct version of ODM is already installed, make sure to remove entries relating to MPIO if present in the package. Use the command lslpp -L all | grep EMC to check which ODM filesets are installed.
4. If ODM is not yet installed, go ahead and install it, but remember NOT to install any MPIO-related fileset. NOTE: Make sure to include EMC.CLARiiON.ha.rte (CLARiiON support for HACMP clusters) or EMC.Symmetrix.ha.rte (Symmetrix support for HACMP clusters) as applicable.
5. Use lsdev -Cc disk to list the (disk) devices and use rmdev to remove any device owned by MPIO from a previous installation.
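The fileset check in step 3 can be narrowed to the MPIO-related entries. The sketch below is an assumption-laden illustration: list_emc_mpio_filesets is a hypothetical helper, and it assumes that lslpp -L prints the fileset name in the first column and that MPIO-related EMC filesets carry "MPIO" in their name. It only lists candidates; it removes nothing.

```shell
#!/bin/sh
# Print the names of installed filesets whose name contains both
# "EMC" and "MPIO". Input is lslpp -L style text on standard input.
# list_emc_mpio_filesets is a hypothetical helper name.
list_emc_mpio_filesets() {
    awk '$1 ~ /EMC/ && $1 ~ /MPIO/ { print $1 }'
}

# Typical use on the AIX host:
#   lslpp -L all | list_emc_mpio_filesets
```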
3.7
EMC116729: FSCSI_ERR10 Configuration Mismatch Errors in errpt after turning on Dynamic Tracking
ID: emc116729
Domain: EMC1
Solution Class: 3.X Compatibility
Fact: Host Level Protection: Host Clustering
Fact: System: IBM RS/6000
Fact: OS: IBM AIX 5.3
Fact: Product: Symmetrix DMX1000
Fact: Enginuity: 5670
Fact: Dynamic tracking is enabled
Symptom: FSCSI_ERR10 CONFIGURATION MISMATCH
Symptom: PowerPath reports dead paths that will not recover with powermt restore. (reboot may
Symptom: lspath shows failed paths that will not recover with chpath -s enable command. (reboot
Symptom: Restarting ECC symmagent generates FSCSI_ERR10 error in errpt.
Symptom: powerprotect inq -no_dots generates FSCSI_ERR10 error in errpt.
Symptom: inq generates FSCSI_ERR10 CONFIGURATION MISMATCH.
Symptom: PowerPath marks a path as dead and sometimes you cannot bring it back alive.
Symptom: powermt display unmanaged generates a FSCSI_ERR10 error in errpt when a wrap plug is in an HBA or a fiber cable is detached (this appears to have started with AIX 5.3 ML4).
Symptom: cfgmgr generates a FSCSI_ERR10 error in errpt when a wrap plug is in an HBA (this appears to have started with AIX 5.3 ML4).
Symptom: Sense data for FSCSI_ERR10 contains 0000 0000 0000 00D3 on first line.
Symptom: status: 208 (An error occurred trying to open the pdev for raw passthrough I/O. Try
Cause: ODM is slightly different when MPIO is configured. The appropriate FCP flag in the
SIL isn't issuing version 1 ioctl. For AIX 5.2 and above, this check will be avoided and version 1 ioctl will be used.
Fix: For the powermt display unmanaged symptom, upgrade PowerPath to 4.5.2 and the
Fix: Upgrade to Solutions Enabler 6.0.3 or higher AND upgrade the EMC inquiry utility to SIL version 6.0.3 (available on ftp.emc.com/pub/symm3000/inquiry). Note that inq is often embedded in scripts, and copies may exist in locations other than the default install location. If the mismatch errors persist after replacing the utility, search the system in question for older copies of inq.
The INQ version is V7.3-653 (Rev 3.0), SIL Version V6.0.3.0 (Edit Level 653). This is in Grab 3.7.1. If you are at these levels, that will correct the issues seen in this Primus solution.
Note: If sense data for FSCSI_ERR10 contains 0000 0000 0000 00D9 on first line, see solution EMC122051.
Note: Workaround: turn off dynamic tracking.
Note: Other software that could cause these errors includes our INQ (which is in the grab), HACMP, TSM, BEST PATROL, or even H.P. applications. Even the microcode has been the cause, by not accepting Dynamic Tracking.
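Because the first line of the FSCSI_ERR10 sense data decides which solution applies (00D3 for this solution, 00D9 for EMC122051), a small classifier can help when triaging many errpt entries. The sketch below assumes the sense data lines have already been extracted from one errpt entry and are passed on standard input as groups of four hex digits separated by single spaces; classify_fscsi_err10 is a hypothetical helper name.

```shell
#!/bin/sh
# Classify one FSCSI_ERR10 entry by the first line of its sense data.
# Input: the sense data lines of a single errpt entry on standard input.
# classify_fscsi_err10 is a hypothetical helper name.
classify_fscsi_err10() {
    first=$(sed -n '1p')          # only the first sense data line matters
    case "$first" in
        *"0000 0000 0000 00D3"*) echo "00D3: see emc116729" ;;
        *"0000 0000 0000 00D9"*) echo "00D9: see emc122051" ;;
        *)                       echo "unrecognized" ;;
    esac
}
```

The helper deliberately ignores later sense data lines, since both solutions key on the first line only.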
3.8
EMC155602: AIX host with PowerPath doesn't update dkstat correctly thus iostat summary appears incorrect
ID: emc155602
Domain: EMC1
Solution Class: 3.X Compatibility
Fact: AIX 5.3
Fact: EMC SW: PowerPath 5.0
Symptom: AIX host with PowerPath reports inaccurate summary data on iostat.
Cause: IBM states dkstat isn't getting the proper flags set.
3.9
EMC163961: AIX host lost access to storage during powermt restore (same as powermt config / syminq / symrdf / symsnap)
ID: emc163961
Domain: EMC1
Solution Class: 3.X Compatibility
Fact: IBM AIX 5.2 TL9 SP6
Fact: IBM AIX 5.3 TL5 SP6
Fact: IBM AIX 5.3 TL5 CSP
Fact: EMC SW: PowerPath
Fact: CLARiiON
Fact: Symmetrix DMX
Fact: Invista
Symptom: cfgmgr and "powermt config" may hang also. REBOOT IS REQUIRED TO CLEAR THIS!
Symptom: Some Solutions Enabler commands (TimeFinder, SRDF, etc.) that passthru instructions to
Symptom: Hanging commands may stack up behind each other and never complete. REBOOT IS REQUIRED TO CLEAR THIS!
Symptom: powermt restore command hangs. Reboot is required to clear this condition.
Symptom: CLARiiON trespasses but I/O fails.
Symptom: Host may hang.
Symptom: Error: LVM I/O failure.
Change: The powermt restore command can also be used on the DMX if paths are lost.
Change: Some event caused a CLARiiON trespass and caused PowerPath paths to fail. The fault was corrected and powermt restore was issued to bring LUNs back to the default SP or to restore lost paths.
Cause: IBM made a change to the devices.fcp.disk.rte fileset. For details on these changes please see the IBM support web site or contact IBM. IBM will be releasing a final set of APARs to address this issue for AIX 5.2 and 5.3 in Q1/08.
Fix: Customers running PowerPath should not upgrade to 5.2 TL9 SP6 or 5.3 TL5 SP6. If circumstances require upgrading to these levels, customers should upgrade to AIX 5.3 TL6 SP1 WITH APAR IY98352 or AIX 5.2 TL10 WITH APAR IY98572. Additionally, customers should contact IBM for a fix for the following: IZ04838 and IZ08170 for AIX 5.3, IZ08652 for AIX 5.2 and IZ08365 for AIX 6.1, which correct a similar hang condition during error recovery.
Root cause: Impacted fileset levels are: devices.fcp.disk.rte 5.2.0.98 & 5.3.0.56
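Before an upgrade is planned, the current level can be compared against the levels this solution lists as affected. The sketch below is a minimal check under one assumption: that AIX 5.2 TL9 SP6, 5.3 TL5 SP6 and 5.3 TL5 CSP are reported by oslevel -s as strings beginning 5200-09-06, 5300-05-06 and 5300-05-CSP. is_affected_level is a hypothetical helper name, and it does not check the APARs or the devices.fcp.disk.rte fileset level.

```shell
#!/bin/sh
# Succeed (exit status 0) if the given level string starts with one of
# the AIX levels this solution lists as affected.
# is_affected_level is a hypothetical helper name; the mapping from
# TL/SP names to level strings is an assumption.
is_affected_level() {
    case "$1" in
        5200-09-06*|5300-05-06*|5300-05-CSP*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use on the AIX host:
#   is_affected_level "$(oslevel -s)" && echo "affected level, see emc163961"
```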
3.10 Symmetrix
ODM 5.3.0.1
PowerPath 5.1.1
Solution Enabler 6.4.3.0
VCMDB is open
IBM P550, HBA 5758*2
AIX 5300-05-CSP
1. /powermt display syminq
2 hdiskpower VG ,
3 dd if=/dev/hdiskpower2 of=/dev/null, /dev/hdiskpower2
4 dd if=/dev/hdisk5 of=/dev/null, /dev/hdisk5 hdisk5 hdiskpower2
5. /dev/hdiskpower2 /dev/hdisk5 root: system
1. symcfg list lock symcfg list lockn ALL exclusive lock Symmetrix external lock symdev list lock -RANGE 32:3D symcfg release force lockn # symdev -force release lock # -RANGE 32:3D (#) symcfg discover
2. 3. account CE PSE symcfg discover cfgmgr, dd VG HV Solution Enabler PSE Product Support Engineer
**** Problem Description
**PSE Log Notes 14:55:37 04/23/2008 hdisk hdiskpower
Please page the CE to contact the customer for the issue on the device ranges 32 - 3D, 44 - 5F, 65 - 80 and 87 - 96. All these device ranges are assigned to SAF5A PORT B and 12A PORT B; however, these devices are locked on channels that the devices are not assigned to. For example, devices 32 - 35 are locked on SAF3A and the device range 44 - 5F is locked on SAF3B.
**PSE Log Notes 16:56:27 04/23/2008
I referenced the linked Primus solution 1.0.149496436.2795452 for the issue and I was able to clear the device locks for the meta devices in the ranges below:
32 - 3D
44 - 5F
65 - 80
87 - 96
All of the above devices are assigned to SAF5A PORT B and SAF12A PORT B, and the devices were locked by channels they were not assigned to. I referenced the linked Primus solution 1.0.149496436.2795452 and I was able to clear the device locks for the first group, 32 - 3D. The customer then verified that they were able to access this device group, and I then cleared the device locks for the remaining meta groups.
PSE TOM EXT 5289 DIRECT DIAL 1-800-782-4362 OPTION 1 EXT 5289 SYMM SUPPORT EMEA ROTATING SHIFT PATTERN (02:00 - 12:00 EST)
**PSE Log Notes 17:06:01 04/23/2008
There was also a drive replacement script that had not completed and was at step 42, step name "RemoveSpare_Verify_SpareSplit". I confirmed that there was no failed drive, and I ran the "Remove Dynamic spare" script to confirm this. I checked that there were no entries in the hot spare request buffer, skipped the step "RemoveSpare_Verify_SpareSplit", and completed the drive replacement script.
PSE TOM EXT 5289 DIRECT DIAL 1-800-782-4362 OPTION 1 EXT 5289 SYMM SUPPORT EMEA ROTATING SHIFT PATTERN (02:00 - 12:00 EST)
dd if=/dev/hdisk5 of=/dev/null, dd if=/dev/hdiskpower2 of=/dev/null
powermt remove dev=all
rmdev -dl fscsix -R hdiskpower hdisk
cfgmgr -v powerpath odm powerpath
Group and exclusive SCSI reservations can be cleared with:
# symld -g dg_name break LdevName
For example:
# symdev -resv list -sid 32
Symmetrix ID : 000187400732

          Device              Reservation          Initiator
---------------------------- ------------- ----------------------------
               Sym  Config                  SA :P  Type
---------------------------- ------------- ----------------------------
/dev/rhdisk6   03C2 2-Way Mir      00       04A:0  Exclusive
N/A            0434 2-Way Mir      00       05C:0  Exclusive

# symdg create testdg
# symld -g testdg add dev 3C2 -sid 32
# symdg show testdg
Group Name: testdg
  Group Type                                    : REGULAR
  Device Group in GNS                           : Yes
  Valid                                         : Yes
  Symmetrix ID                                  : 000187400732
  Group Creation Time                           : Mon Mar 8 12:05:44 2004
  Vendor ID                                     : EMC Corp
  Application ID                                : SYMCLI
  Number of STD Devices in Group                : 1
  Number of Associated GK's                     : 0
  Number of Locally-associated BCV's            : 0
  Number of Locally-associated VDEV's           : 0
  Number of Remotely-associated BCV's (STD RDF) : 0
  Number of Remotely-associated BCV's (BCV RDF) : 0
  Number of Remotely-assoc'd RBCV's (RBCV RDF)  : 0
  Standard (STD) Devices (1):
  {
  --------------------------------------------------------------------
                                        Sym               Cap
  LdevName       PdevName               Dev  Att. Sts     (MB)
  --------------------------------------------------------------------
  DEV001         /dev/rhdisk6           03C2      RW      2033
  }
# symld -g testdg break DEV001 -nop
# symdev -resv list -sid 32
Symmetrix ID : 000187400732

          Device              Reservation          Initiator
---------------------------- ------------- ----------------------------
               Sym  Config                  SA :P  Type
---------------------------- ------------- ----------------------------
N/A            0434 2-Way Mir      00       05C:0  Exclusive

Notes: See solution emc81930 for an explanation of group and exclusive reservations.
Releasing SCSI reservations should be done with care. If there is I/O to a physical volume when the reservation is released, the device will immediately be re-reserved. However, if there is no I/O and a reservation is released, that device can be accessed by any other host and initiator that the device is mapped to.
SCSI-3 persistent reservations (used by applications like Sun Cluster 3.0 and above) cannot be displayed or broken using SYMCLI commands. (For SCSI-3 persistent reservations on AIX, see emcpowerreset: /usr/lpp/emc/Sy*/bin/emcpowerreset fscsiX hdiskpowerY, where fscsiX is a PowerPath fscsi adapter; for CX arrays, emcpowerreset is under usr/lpp/emc/Cl*/bin/.)
The symld command will not always be successful in breaking SCSI reservations. Even if symdev -resv list shows a reservation, an attempt to clear the reservation may fail with a "no locks on device" error.
UPDATE: According to emc138185, there was a display issue which is resolved in Solutions Enabler 6.4. However, if the host has not been or cannot be upgraded to Solutions Enabler 6.4, call the support center to clear the reservation via the Symm.
Support Center: You will need to obtain written permission and exact device numbers before you can proceed with releasing SCSI reservations using INLINES. Refer to "How to release a scsi reservation using INLINES", or F0,RCVR,HELP for more information. If you have any questions about the F0,RCVR command, please speak with a PSE or an SSC Tech Lead before running it.
SCSI Reservation Lock (EMC81930)
SCSI reservations are placed on devices in a volume group by an AIX host whenever that volume group is varied on. The reservation is meant to prevent physical volumes from being accessed by any host other than the one that places the reservation. The term "Exclusive" means that the physical volume can be accessed only by the initiator (HBA) that is accessing the physical volume when the reservation is placed. Exclusive reservations are placed on devices when the volume group uses native AIX hdisk devices. The other kind of reservation that is seen in AIX environments is a "Group" reservation. When volume groups are accessed using hdiskpower pseudo devices, a group reservation is placed on the device. This is necessary when PowerPath is installed because it means that all initiators in an initiator group will be able to access a Symm device. This allows PowerPath to provide load balancing and failover to a physical volume from multiple initiators on a single host.
Code5567 5670 SCSI Reservation Lock PSE support SCSI Reservation Lock CE Health Check
Recently in MOR SD we met some issues with AIX 6.1: all DMX-related hdisks become "Defined" and some new ones appear when the system is rebooted. The root cause is the "Reserve Lock", and it is solved by applying APARs IZ63813, IZ64056 and IZ64133. So if you meet AIX 6.1 later, please check these patches first. Details are described in emc227301, as below:
Knowledgebase Solution
Question: ETA emc227301: AIX host hangs upon running cfgmgr on non-MPIO devices. Reserve Lock changes to yes after upgrading to AIX 6.1 TL4 SP1 and VIOS 2.1 Fix Pack 22.
EMC Technical Advisory
Environment: OS: IBM AIX 6.1 TL2
Environment: OS: IBM AIX 5.3 TL11
Environment: OS: VIOS 2.1.10.22 (FP22)
Environment: EMC SW: PowerPath 5.1
Environment: EMC SW: PowerPath 5.3.1
Environment: Product: Symmetrix
Environment: Product: CLARiiON
Problem: System hangs on cfgmgr when attempting to configure SAN devices.
Problem: Reserve Lock, which is needed in VIO, Oracle and PowerPath environments, changes back to yes; it should be no.
Problem: Devices go defined.
Problem: CLARiiON LUNs may be trespassed.
Problem: The system may eventually boot up after a number of hours.
Problem: AIX system shows an LED 999 upon bootup and will not boot.
Problem: After the system comes up, some or all hdisks and PowerPath devices may be defined.
Problem: CLARiiO
Failovermode for CLARiiON arrays must be set to 3 or 4. It can no longer be set to 1. Refer to KB articles emc227765 and emc99467 for complete information about what settings should be used.
Problem: For all Symmetrix devices, new hdisks are created upon reboot.
Change: Upgraded to AIX 6.1 TL4 SP1 or VIOS FP22 and rebooted.
Root Cause: AIX issue introduced in 6.1 TL4 SP1 and VIOS FP22, only in PowerPath environments. The problem appeared in AIX 6.1 TL2 in testing.
Root Cause: The Reserve Lock issue appears after upgrading to PowerPath 5.3 SP1 and upgrading to VIO 2.1 22. The cause is non-MPIO (hdiskpower) devices.
Root Cause: FC disks will define a new hdisk instance upon reboot if a PVID stamp exists on the disk but no PVID attribute exists in ODM. Reserve Lock changed back to yes, and a reserve is placed on all the new hdisk instances.
Fix: The fix was created non-MPIO FC Disks that have a PVID stamp against the connection information of the device.
NOTE: If you experience any problems obtaining the APARs or IFIXes listed below, please contact IBM directly.
NOTE: IBM uses different terms to describe their APARs. IFIX is one of them.
IZ63813, IZ64056 and IZ64133 are planned to be part of AIX 6.1 TL4 SP2, which is targeted to be availab 2010
All three APARs/IFIXes are currently available for this problem, and must be loaded as a group for AIX 6.1 T.L
Obtain IZ64056 through normal download channels of IBM. For IZ63813 & IZ64133, please obtain them from the public IBM FTP site as described:
IZ63977 & IZ63808 - Scheduled to be part of AIX 5.3 base TL12, currently targeted to be available in Ap
These two APARs are currently available for this problem, and must be loaded as a group for this leve
Obtain IZ63977 through normal download channels of IBM. For IZ63808, please obtain it from the public IBM FTP site as described:
For IZ63813_vios, please obtain it from the public IBM FTP site as described:
This problem will occur on any multipath host (non-MPIO). If the customer's VIO client is set up such that the
a single path from more than one VIOS, and afterwards was still set up with just a single path to a single VIOS
this problem will not appear. Also, the problem does not seem to be present with VIO Fix Pack 21. These sol were tested with the Reserve Lock.
NOTE: If running PowerPath 5.3, SP1 MUST be installed as part of the fix.
Notes: Fix Pack 22 (or above) is not on the ESM as of the time this KB article was written. If customers use it, they must open an RPQ. It is targeted for a future ESM release.
Notes: Warning! ETAs constitute formal notification from EMC to customers, partners, and EMC field pe Changes to this solution require approval of the Customer Service TS2 ETA Approver, and this must be recorded in the Comments of the solution. To identify the CS ETA Approver, go to the Co tab or the List of ETA Approvers, located on the ETA web page of the GS web site.
Also note that the problem "Paths go dead frequently in AIX PowerPath 5.X without any (hardware) issues" can appear in environments running some newer Solutions Enabler (SE) versions. The solution is described in emc217815, as below:
Problem: Paths go dead frequently in AIX PowerPath 5.X without any (hardware) issues.
Problem: PowerPath keeps marking paths dead/alive frequently without any real connection failures (see Note 1 b
Root Cause: It appears that when syminq is issued against a pseudo and a native device, the EACCES occurs. Note it can open both pseudo and native at the same time.
Fix: Workaround:
1. Upgrade PowerPath to 5.3 HF02 or later.
2. Upgrade to Solutions Enabler 7.0 or later and set two options as follows:
   CAUTION: This setting storapid:parallel_inquiry_size = 0, in the daemon_options file is an This setting should not be set without first consulting Solutions Enabler Engineering.
   /var/symapi/config/daemon_options
       storapid:parallel_inquiry_size = 0
   /var/symapi/config/options
       SYMAPI_WAIT_ON_LOCKED_GK= ENABLE
SYMAPI operations may fail because the storapid or storsrvd daemons run o should run: /usr/ccs/bin/ldedit -bmaxdata:0x80000000 /usr/symcli/daemons/storapid
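Since the daemon_options setting carries a caution, a read-only check of what is already configured is a safer first step than editing either file. The sketch below is an illustration under stated assumptions: has_setting is a hypothetical helper, it assumes one KEY = VALUE pair per line with optional spaces around the equals sign, and it assumes lines starting with # are comments. It changes nothing.

```shell
#!/bin/sh
# Succeed (exit status 0) if FILE contains a non-comment line that sets
# KEY to VALUE, ignoring blanks around "=". Read-only; edits nothing.
# has_setting is a hypothetical helper name.
has_setting() {
    awk -v k="$2" -v v="$3" '
        /^[ \t]*#/ { next }                      # skip comment lines
        {
            n = index($0, "=")
            if (n == 0) next
            lhs = substr($0, 1, n - 1)
            rhs = substr($0, n + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", lhs)     # trim both sides
            gsub(/^[ \t]+|[ \t]+$/, "", rhs)
            if (lhs == k && rhs == v) found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Typical use on the host:
#   has_setting /var/symapi/config/options SYMAPI_WAIT_ON_LOCKED_GK ENABLE
#   has_setting /var/symapi/config/daemon_options storapid:parallel_inquiry_size 0
```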
Notes: For additional information, refer to the Solutions Enabler 7.0/7.01 Release N
Note 1: This issue occurs quite often when you run syminq and/or symcfg of Solution may see the issue when EMC ControlCenter Symmetrix agent starts issuing happen for normal I/Os if a pseudo device and its native device is opened/cl
Note 2: When you take a trace, you will see EACCES for the device open as follows.
    SCDISKDD entry_open: errno: 00 devno: 8000001300000015 rwflag: 000000 0000000000000000 ext: 0000000000000008
    SCDISKDD exit_open: errno: 0D devno: 8000001300000015
    SCDISKDD entry_open: errno: 00 devno: 8000001300000015 rwflag: 000000 0000000000000000 ext: 0000000000000001
    SCDISKDD exit_open: errno: 0D devno: 8000001300000015
Note 3: A PowerPath 5.5 major release targeted for 1H 2010 will handle a more graceful device open.
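The trace pattern in Note 2 can be counted mechanically. In the trace lines, errno is shown in hexadecimal, and 0D is 13, which is EACCES on AIX. The sketch below assumes the formatted trace text is available on standard input in the same layout as the lines in Note 2; count_eacces_opens is a hypothetical helper name.

```shell
#!/bin/sh
# Count SCDISKDD exit_open records that returned errno 0D (EACCES).
# Input: formatted trace text on standard input.
# count_eacces_opens is a hypothetical helper name.
count_eacces_opens() {
    awk '/SCDISKDD exit_open:/ && /errno: 0D/ { n++ } END { print n + 0 }'
}
```

A count that keeps growing while syminq or symcfg runs would be consistent with the behaviour described in Note 1.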