This guide contains detailed instructions for configuring and troubleshooting HP StorageWorks XP Cluster Extension Software in AIX, Windows, Solaris, and Linux environments. The intended audience has independent knowledge of related software and of the HP StorageWorks XP disk array and its software.
Legal and notice information
Copyright 2003-2010 Hewlett-Packard Development Company, L.P.
Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license.
The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.
Microsoft and Windows are U.S. registered trademarks of Microsoft Corporation. UNIX is a registered trademark of The Open Group.
Contents
1 XP Cluster Extension features
  Integration into cluster software
  Enhanced disaster tolerance
  Automated monitoring and redirecting of XP Continuous Access Software pairs
  Rolling disaster protection
  Command-line interface (CLI)
  Fast Failback using XP Continuous Access Software
  XP Cluster Extension configurations
    One-to-one configurations
    Consolidated-site configuration
    Supported XP Continuous Access Software configurations and fence levels
    XP Cluster Extension server configurations
  Planning for XP Cluster Extension
    Before configuring XP Cluster Extension resources
    Cluster setup considerations
      MNS quorum clusters (MSCS)
      SLE HA cluster setup considerations
      RHCS cluster setup considerations
    Setting up XP RAID Manager
      Creating XP RAID Manager command devices
      Creating XP RAID Manager instances
      Creating XP RAID Manager device groups
      Network considerations
      Starting and stopping the XP RAID Manager instances
      Test takeover function
Configuring XP Cluster Extension
  Starting the XP Cluster Extension configuration tool
  Defining XP Cluster Extension configuration information using the GUI
  Defining XP Cluster Extension configuration information using the CLI
  Importing and exporting configuration information
    Exporting configuration settings using the GUI
    Exporting configuration settings using the CLI
    Importing configuration settings using the GUI
    Importing configuration settings using the CLI
Adding an XP Cluster Extension resource
  Adding an XP Cluster Extension resource using the Cluster Administrator GUI (Windows Server 2003)
  Adding an XP Cluster Extension resource using the Failover Cluster Management GUI (Windows Server 2008/2008 R2)
  Adding an XP Cluster Extension resource using the Microsoft CLI cluster commands
Changing an XP Cluster Extension resource name
  Changing an XP Cluster Extension resource name (Windows Server 2003)
  Changing an XP Cluster Extension resource name (Windows Server 2008/2008 R2)
Configuring XP Cluster Extension resources
  Setting Microsoft cluster-specific resource and service or application properties
  Setting XP Cluster Extension-specific resource properties
    Setting XP Cluster Extension resource properties using the Cluster Administrator GUI (Windows Server 2003)
    Setting XP Cluster Extension resource properties using the GUI (Windows Server 2008/2008 R2, Server Core, and Hyper-V Server)
    Setting XP Cluster Extension resource properties using the MMC
    Setting XP Cluster Extension resource properties using the CLI
    Setting XP Cluster Extension properties using a UCF
Adding dependencies on an XP Cluster Extension resource
  Adding dependencies using Cluster Administrator (Windows Server 2003)
  Adding dependencies using Failover Cluster Management (Windows Server 2008/2008 R2)
  Adding dependencies using the CLI
Disaster-tolerant configuration example using a file share
Managing XP Cluster Extension resources
  Bringing a resource online
  Taking a resource offline
  Deleting a resource
Using Hyper-V Live Migration with XP Cluster Extension
Timing considerations for MSCS
Bouncing service or application
Administration
  Remote management of XP Cluster Extension resources in a cluster (Windows Server 2008/2008 R2)
  Remote management of XP Cluster Extension resources in a cluster (Windows Server 2003)
  System resources
  Logs
    Hyper-V Live Migration log entries
Including the XP Cluster Extension resource type
Configuring the XP Cluster Extension resource
  XP Cluster Extension resource types
  Resource type definition
Adding an XP Cluster Extension resource
  Adding an XP Cluster Extension resource using the VCS CLI
  Adding an XP Cluster Extension resource using the VCS Cluster Manager GUI
Changing XP Cluster Extension attributes
  Changing an attribute value using the VCS CLI
  Changing an attribute value using the VCS Cluster Manager GUI
Linking an XP Cluster Extension resource
  Linking other resources to the XP Cluster Extension resource
  Linking other resources using the VCS Cluster Manager GUI
Bringing an XP Cluster Extension resource online
  Enabling and bringing an XP Cluster Extension resource online using the CLI
  Enabling and bringing an XP Cluster Extension resource online using the VCS Cluster Manager GUI
Taking an XP Cluster Extension resource offline
  Taking an XP Cluster Extension resource offline using the VCS Cluster Manager GUI
Deleting an XP Cluster Extension resource
  Deleting a resource using the VCS CLI
  Deleting a resource using the VCS Cluster Manager GUI
Disabling the XP Cluster Extension agent
Pair/resync monitor integration
  Log-level reporting
Timing considerations for VCS
Enabling/disabling service groups
Restrictions for VCS with XP Cluster Extension
Unexpected offline conditions
  Bringing the XP Cluster Extension resources online
Rescanning multipath devices
  Configuring the rescan script
  Finding the user-friendly name of a multipath device
Configuring the pair/resync monitor
  Updating the remote access hosts file
  Configuring the pair/resync monitor port
Activating the pair/resync monitor
Timing considerations
Failover error handling
HACMP-specific error handling
  Start errors
  Failover errors
MSCS-specific error handling
  Resource start errors
  Failover errors
  Using the Domain user account (Windows Server 2008/2008 R2 only)
VCS-specific error handling
  Start errors
  Failover errors
Linux-specific error handling
  Failover errors
  The FC link is down (RHCS)
  A storage replication link is down (RHCS)
  A data center is down (SLE HA and RHCS)
Pair/resync monitor messages in syslog/errorlog/messages/event log
Figures
1 One-to-one (1:1) configuration
2 Consolidated-site configuration
3 HACMP configuration example
4 Service or application example (quorum service control disks not shown)
5 CLX_FILESHARE resource sample
6 XP Cluster Extension resource tree for CLX_SHARE
7 VERITAS Cluster Service configuration example
8 Sample resource graph of the CLX_WEB_SERVER service group
9 Sample configuration
10 Disaster-tolerant configuration with rolling disaster protection
11 Incompatible XP disk pair state
12 Incompatible XP disk pair state (VCS Cluster Manager Log Desk window)
13 Detailed information of the XP disk pair state (VCS Log Desk)
Tables
1 Setting resource properties and values in the GUI
2 Service or application properties and values
3 XP disk pair states
4 Cluster software supported objects
5 Document conventions
One-to-one configurations
In one-to-one (1:1) configurations, cluster host nodes are split between two geographically separate data centers and use redundant, diversely routed network connections for intra-cluster communications. (See Figure 1 on page 13.) These links must be as reliable as possible to prevent false failover operations or split-brain situations.
Each cluster host node needs redundant FC or SCSI I/O paths to the XP disk array. Individual hosts cannot be connected to both the primary (P-VOL) and the secondary (S-VOL) copy of the application disk set.
HP recommends a minimum of two cluster host nodes per site. This allows for a preferred local failover in case of a system failure. Local failover operations are faster than a remote failover between XP disk arrays because the mirroring direction of the XP disks does not need to be changed.
XP Cluster Extension can be deployed in environments where several clusters use the same pair of XP disk arrays. Although XP disk arrays can be mirrored in various configurations, XP Cluster Extension does not support multiple disk arrays as both primary and secondary disk arrays. XP Cluster Extension supports configurations where two or more disk arrays use one remote disk array in a logical one-to-one configuration.
CAUTION: XP Cluster Extension can operate with only one system at each site, with a single I/O path between the server system and the disk array and a single link in each direction between disk arrays. However, those configurations are not considered highly available, nor are they disaster tolerant. XP Cluster Extension configurations with single points of failure are not supported by HP.
Consolidated-site configuration
In consolidated-site configurations, a single XP disk array in the secondary (remote) data center is connected to up to four other primary XP disk arrays (see Figure 2 on page 14). The restrictions outlined in One-to-one configurations on page 12 also apply to consolidated configurations. XP Cluster Extension does not support configurations in which the application service's data disk set is spread over two or more XP disk arrays.
or remote site, the other site together with the added node would have a majority. In an MNS with File Share Witness configuration, the file share should be located at the third site.
TIP: To upgrade XP firmware while the application service is running, use host load balancing and multipathing software, such as Auto Path, HP MPIO Full Featured Device Specific Module (DSM) for XP family of Disk Arrays (HP MPIO XP DSM), or VERITAS for Sun Solaris.
XP Cluster Extension allows you to configure the failover behavior so that the application service startup is stopped if no remote cluster members can be reached. The default configuration of XP Cluster Extension expects the cluster software to deal with the split-brain syndrome.
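The majority argument for placing an extra node or witness at a third site comes down to vote counting. The following is a minimal sketch of that arithmetic only, not part of XP Cluster Extension or MSCS: the function name has_majority and the vote counts per site are hypothetical, and one vote per node or witness is assumed.

```python
def has_majority(surviving_votes: int, total_votes: int) -> bool:
    """Return True if the surviving votes form a strict majority."""
    return 2 * surviving_votes > total_votes


# Hypothetical layout: two nodes per data center plus one vote at a third site.
votes = {"site_a": 2, "site_b": 2, "third_site": 1}
total = sum(votes.values())

# Site A fails: site B together with the third-site vote keeps a majority.
site_a_down = has_majority(votes["site_b"] + votes["third_site"], total)

# Without the third-site vote, an even split leaves neither side with a majority.
even_split = has_majority(2, 4)
```

In this model, losing either data center still leaves three of five votes, which is why the extra vote belongs at a site that does not fail together with one of the data centers.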
quorum depends on the defined no-quorum policy. This behavior is in effect until the cluster is fenced. When the cluster is fenced, the resources owned by the fenced nodes fail over to active cluster nodes.
STONITH
STONITH is an SLE HA cluster fencing method. SLE HA cluster provides STONITH plug-ins for devices such as UPS, PDU, Blade power control devices, and lights out devices. Some plug-ins can STONITH more than one node (for example, Split Brain Detector STONITH) and some can STONITH only one node (for example, HP iLO STONITH). HP iLO STONITH uses the power control functions of an HP iLO device to STONITH a node that has lost quorum and needs to be fenced.
IMPORTANT: If all of the iLO devices in a cluster are connected using a single network, a single switch failure might disable iLO, preventing nodes from being fenced. This failure might be difficult to detect, especially before a node failure where iLO features would be required.
The STONITH action can be set to power off or reset, depending on the environment requirements.
  Power off: The STONITH agent powers off the nodes in the errant subcluster.
  Reset: The STONITH agent resets the nodes in the errant subcluster, and the nodes try to automatically rejoin the cluster.
NOTE: IPMI fencing can be used for Integrity servers that do not support RIBCL scripting.
Networking in an SLE HA cluster
Configuring redundant and independent cluster communication paths is a good way to avoid Split Brain conditions. With redundant communication paths, the loss of a single interface or switch does not break communication between nodes, which prevents Split Brain conditions. Administrators can configure multiple independent communication paths. HP recommends using bonded Ethernet channels.
Resource constraints
Resource constraints allow administrators to specify which cluster nodes resources can run on, the order in which resources are loaded, and the other resources a specific resource depends on. There are three types of resource constraints:
  Resource location: Defines the nodes on which a resource can run, cannot run, or is preferred to be run.
  Resource colocation: Defines which resources can or cannot run together on a node.
  Resource order: Defines the sequence of actions for resources running on a node.
Resource operation attribute
SLE HA does not monitor resource health by default. To enable this feature, add the monitor operation to the resource definition. You can specify the interval attribute and the timeout attribute for a monitor operation. The interval attribute defines how often the monitor operation is executed. The timeout attribute determines how long to wait before considering the resource as failed. Define start, stop, and monitor operations for the XP Cluster Extension resource.
XP Cluster Extension resource dependency
A Group resource in an SLE HA cluster ensures that the member resource agents are started and stopped in the required order. An XP Cluster Extension resource must be added as the first member of the group. This way, all primitive resources added after the XP Cluster Extension resource are dependent on XP Cluster Extension. Because the primitive resources within a resource group can be failed over independently, set a colocation constraint for each resource group ID with the last resource in the group so that the entire group fails over when any primitive resource fails.
Failover order
Use location constraints to define the failover order for a resource group. For each node, define a location constraint with the appropriate score to prioritize the resource group on that particular node. During failover, the cluster calculates the score of the resource group on the available nodes, and the node with the highest score is considered the next preferred owner. For more information, see the SLE HA documentation.
Failback option
HP does not recommend auto failback in configurations with XP Cluster Extension because resource failovers caused by a storage failure can leave resources in an unstable state (failover/failback might toggle the resource between the nodes). SLE HA provides the meta-attribute resource-stickiness to determine how much a resource agent prefers to stay where it is. To disable auto failback, set resource-stickiness to the lowest value compared to the other resource location constraints.
Migration-threshold
A resource is automatically restarted if it fails. If a restart cannot be achieved on the current node, or if the resource fails to start a certain number of times on the current node, it tries to fail over to another node. You can define the number of failures for resources (a migration-threshold) after which they migrate to a new node. If you have more than two nodes in your cluster, the high availability software chooses the node a particular resource fails over to. When an XP Cluster Extension resource fails, HP recommends configuring your cluster to fail over the resource without restarting on the local node. To set this preference, set the migration-threshold to 1.
Disk monitoring
When disk access is lost or read/write protection is in effect due to storage fencing, application monitoring agents, file system agents, or LVM resource agents detect the I/O failure. XP Cluster Extension does not monitor the disk access status.
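The failover order and migration-threshold rules described above can be modelled in a few lines. This is a simplified sketch and not the cluster software's actual scoring algorithm, which also takes other factors such as stickiness into account. The function names, node names, and scores are hypothetical.

```python
from typing import Dict, Optional, Set


def next_preferred_owner(scores: Dict[str, int], available: Set[str]) -> Optional[str]:
    """Pick the available node with the highest location score."""
    candidates = {node: score for node, score in scores.items() if node in available}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def should_fail_over(failure_count: int, migration_threshold: int) -> bool:
    """A resource leaves its node once its failures reach the threshold."""
    return failure_count >= migration_threshold


# Hypothetical location scores for one resource group.
location_scores = {"node1": 200, "node2": 150, "node3": 100}

# node1 has failed, so node2 has the highest score among the remaining nodes.
target = next_preferred_owner(location_scores, {"node2", "node3"})

# With migration-threshold set to 1, the first failure triggers a failover.
first_failure_moves = should_fail_over(failure_count=1, migration_threshold=1)
```

With a threshold of 1 the resource never gets a local restart attempt in this model, which matches the recommendation to fail over an XP Cluster Extension resource without restarting it on the local node.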
Qdisk configuration
Red Hat recommends the use of a Qdisk configuration to bolster quorum to handle failures such as half (or more) of the members failing, a tie-breaker in equal split partition, and a SAN failure. In an XP Cluster Extension configuration with multiple storage arrays, a Qdisk configuration is not supported.
Failover domains
A cluster service is associated with a failover domain, which is a subset of cluster nodes that are eligible to run a particular cluster service. To maintain data integrity, each cluster service can run on only one cluster node at a time. By assigning a cluster service to a restricted failover domain, you can limit the nodes that are eligible to run a cluster service in the event of a failover, and you can order the nodes by preference to ensure that a particular node runs the cluster service (as long as that node is active).
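The effect of limiting the eligible nodes and ordering them by preference can be sketched as a simple selection over a list. This is an illustrative model only, not RHCS code: the function name pick_member and the node names are hypothetical.

```python
from typing import List, Optional, Set


def pick_member(ordered_domain: List[str], active: Set[str]) -> Optional[str]:
    """Return the most preferred active member of a restricted, ordered domain.

    Nodes outside the domain are never considered (restricted), and earlier
    entries in the list win over later ones (ordered).
    """
    for node in ordered_domain:
        if node in active:
            return node
    return None


# Hypothetical domain: nodes listed from most to least preferred.
domain = ["dc1-node1", "dc1-node2", "dc2-node1"]

# The most preferred node is down, so the next one in the list is chosen.
owner = pick_member(domain, {"dc1-node2", "dc2-node1", "dc2-node2"})

# Only a node outside the domain is active: the service has nowhere to run.
nowhere = pick_member(domain, {"dc2-node2"})
```

Listing the local data center's nodes first in such a domain favors a local failover before a failover to the remote site.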
A failover domain can have the following characteristics:
  Unrestricted: Specifies that the subset of members is preferred, but the cluster service assigned to this domain can run on any available member.
  Restricted: The cluster service is allowed to run only on a subset of failover domain members.
  Unordered: The member on which the cluster service runs is chosen from the available list of failover domain members with no preference order.
  Ordered: The failover domain member on which the cluster service runs is selected based on preference order. The member at the top of the list (as specified in /etc/cluster/cluster.conf) is the most preferred, followed by the second member, and so on.
For an orderly failover, HP recommends using the Ordered and Restricted options for your failover domains.
Failback policy
HP does not recommend auto failback in configurations with XP Cluster Extension because resource failovers caused by a storage failure can leave resources in an unstable state (failover/failback might toggle the resource between the nodes). In this situation, HP recommends correcting the failure and then manually failing back to the intended data center or server. To disable the auto failback, set the nofailback flag for the failover domain. Enabling this option for an ordered failover domain prevents automated failback after a more-preferred node rejoins the cluster.
Recovery policy
When a resource inside the service fails, the default action is to restart the service on the local node before failing over. In an XP Cluster Extension environment, the service is always expected to relocate instead of restarting locally. To enable this behavior, set the service recovery policy to relocate.
Service hierarchical structure and resource dependency
In RHCS, a service is a collection of cluster resources configured into a single entity that is managed (started, stopped, or relocated) for high availability. A service is represented as a resource tree that specifies each resource, its attributes, and its relationship to the other resources in the resource tree. The relationships can be parent, child, or sibling. Even though a service is seen as a single entity, the hierarchy of the resources determines the order in which each resource within the service is started and stopped. In the case of a child-parent relationship, the startup or shutdown is simple. All parents are started before children, and children must all stop cleanly before a parent can be stopped. For a resource to be considered in good health, all of its children must be in good health. A service is considered failed if any of its resources fail. In this case, the expected course of action is to restart the entire service, including the failed resource and the other resources that did not fail.
In an XP Cluster Extension environment, configure the XP Cluster Extension resource as the parent resource in the service so that XP Cluster Extension can control the service behavior based on the user configuration and storage device status. This means that the XP Cluster Extension resource must be configured at the highest level in the dependency hierarchy.
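The parent-before-child start rule and the child-before-parent stop rule can be illustrated with a small tree walk. This is a sketch under stated assumptions, not RHCS code: the resource names in the tree are placeholders rather than real agent names, and sibling ordering is simply taken from list order.

```python
from typing import Dict, List

# Hypothetical service tree: each resource maps to its child resources.
# The XP Cluster Extension resource ("clx") sits at the top of the hierarchy.
tree: Dict[str, List[str]] = {
    "clx": ["lvm"],
    "lvm": ["fs"],
    "fs": ["ip", "app"],
    "ip": [],
    "app": [],
}


def start_order(root: str, children: Dict[str, List[str]]) -> List[str]:
    """Parents start before their children (pre-order walk)."""
    order = [root]
    for child in children[root]:
        order.extend(start_order(child, children))
    return order


def stop_order(root: str, children: Dict[str, List[str]]) -> List[str]:
    """Children stop before their parents (reverse of the start order)."""
    return list(reversed(start_order(root, children)))


starts = start_order("clx", tree)
stops = stop_order("clx", tree)
```

Because the XP Cluster Extension resource is the root in this model, it starts first and stops last, so every other resource in the service depends on it.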
Disk monitoring
When disk access is lost or read/write protection is in effect due to storage fencing, application monitoring agents, file system agents, or LVM resource agents detect the I/O failure. XP Cluster Extension does not monitor the disk access status.
Network considerations
Because XP RAID Manager is essential to XP Cluster Extension, HP recommends that you use the heartbeat network (a private network) for XP RAID Manager communications. Alternative network paths are highly recommended. Configure the networks XP RAID Manager uses for each device group in the HORCM_INST part of the XP RAID Manager configuration file.
The output of the pairdisplay command indicates whether the local disk is the secondary (S-VOL) disk. If it is, the horctakeover command shows a SWAP-takeover as a result. If pairdisplay shows the local disk as the primary (P-VOL) disk, log in to a system connected to the secondary (S-VOL) disk and invoke the horctakeover command there. If the horctakeover command does not result in a SWAP-takeover, see Recovery sequence on page 120 to resolve the issue. The -t option of the horctakeover command is used only for fence level ASYNC (both Async and Journal).
Configuring resources
The XP Cluster Extension resource gathers all necessary information about the disk arrays and configured device groups whenever a resource group is brought online. If configured, a pair/resync monitor is started to monitor the XP Cluster Extension resource. To use this monitor, set the standard HACMP event release_vg_fs to call a pre-event. Call the XP Cluster Extension binary clxhacmp as a pre-event of the standard HACMP event get_disk_vg_fs to check the status of the XP RAID Manager device group and, if necessary, to allow access to the device group before HACMP tries to access the disks of the resource group. Configure XP Cluster Extension parameters with the user configuration file: /etc/opt/hpclx/conf/UCF.cfg. See Chapter 8 on page 123 for more information about the user configuration file.
4. Configure the newly created Custom Cluster Event as a pre-event of get_disk_vg_fs:
#smitty hacmp
5. Select the following options: Extended Configuration > Extended Event Configuration > Change/Show Pre-Defined HACMP Events
6. Select the event get_disk_vg_fs.
7. Define the previously defined custom event get_disk_vg_fs_pre as a pre-event of get_disk_vg_fs.
NOTE: With HACMP 5.2 and later, to have the get_disk_vg_fs event called, you must specify Serial Acquisition Order. If Serial Acquisition Order is not specified, AIX will use the default Parallel Acquisition Order.
8. Use SMIT and select the following options: Extended Configuration > Extended Resource Configuration > Configure Resource Group Run-Time Policies > Configure Resource Group Processing Ordering
10. Once the resource groups have been specified, press Enter to complete the configuration process.
XP Cluster Extension controls the disk pairs based on XP RAID Manager device groups. The volume group definition of the HACMP resource group is used to determine the corresponding XP RAID Manager device group. The mapping of the HACMP volume group configuration and the corresponding XP RAID Manager device group is done by the XP Cluster Extension user configuration file /etc/opt/hpclx/conf/UCF.cfg. Because of this mapping mechanism, you must specify the volume groups owned by the HACMP resource groups in the user configuration file.
Figure 3 on page 29 and the examples that follow show two possible mappings.
Example 1
The application OracleRG corresponds to an HACMP resource group OracleRG, which consists of the volume groups ora1vg and ora2vg. The corresponding XP RAID Manager device group oracle controls all disks which form the volume groups of the HACMP resource group. The resource group is configured to wait for a pair resynchronization in case you have not done any disk pair recovery after the resource group has been moved to an alternate system. The resource group will be brought online on the local system again (ApplicationStartup object is set to RESYNCWAIT). The AutoRecover object is set to NO, which means that you will not utilize XP Cluster Extension capabilities to automatically recover suspended disk pair states. The DataLoseMirror object and DataLoseDataCenter object are set to NO, which means XP Cluster Extension will not allow you to bring the resource group online if the disk pair is suspended or a takeover operation leads to a suspended disk pair.
Example 2
The application SapRG uses the device group sap to control all the disks of the corresponding HACMP resource group SapRG, which uses the volume groups sap1vg and sap2vg. The resource group is configured to fail back to the remote system rather than waiting for a pair resynchronization, in case you have not done any disk pair recovery after the resource group has been moved to an alternate system. The resource group will be brought online on the local system again (ApplicationStartup object is set to FASTFAILBACK by default). This setup will lead to an error loop, because HACMP does not provide a feature to automatically fail back after an error has been reported. The AutoRecover object is set to NO by default, which means that you will not utilize XP Cluster Extension capabilities to automatically recover suspended disk pair states. The following example shows the configuration file for examples 1 and 2:
COMMON
LogDir /var/opt/hpclx/log/ #default (optional)
LogLevel error # error|info default: error (optional)

APPLICATION OracleRG # package/service group test_application
Vgs ora1vg ora2vg # HACMP specific, to map vg to OracleRG
ApplicationDir /etc/opt/hpclx
XPSerialNumbers 30368 30380
RaidManagerInstances 11
DeviceGroup oracle # raid manager device group
FenceLevel data # values: data | never | async
ApplicationStartup resyncwait # values: fastfailback | resyncwait
AutoRecover no # possible values: yes | no
DataLoseMirror no # possible values: yes | no
DataLoseDataCenter no # possible values: yes | no
PreExecScript /etc/opt/hpclx/ora_pre.sh
PostExecScript /etc/opt/hpclx/ora_post.sh

APPLICATION SapRG # package/service group test_application
Vgs sap1vg sap2vg # HACMP specific, to map vg to SapRG
XPSerialNumbers 30368 30380
RaidManagerInstances 11
DeviceGroup sap # raid manager device group
FenceLevel never # possible values: data | never | async
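To illustrate how the Vgs entries tie an HACMP volume group to an XP RAID Manager device group, the following Python sketch reads a file laid out like the example. It is not the parser XP Cluster Extension uses. It assumes one setting per line, whitespace-separated values, and # starting a comment; the function names parse_ucf and device_group_for_vg are hypothetical.

```python
SAMPLE = """\
COMMON
LogDir /var/opt/hpclx/log/ #default (optional)
APPLICATION OracleRG # package/service group
Vgs ora1vg ora2vg
DeviceGroup oracle # raid manager device group
FenceLevel data
APPLICATION SapRG
Vgs sap1vg sap2vg
DeviceGroup sap
FenceLevel never
"""


def parse_ucf(text):
    """Split the text into COMMON settings and per-APPLICATION settings."""
    common, apps = {}, {}
    current = None  # dictionary currently being filled
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        key, *values = line.split()
        if key == "COMMON":
            current = common
        elif key == "APPLICATION":
            current = apps.setdefault(values[0], {})
        elif current is not None:
            current[key] = values
    return common, apps


def device_group_for_vg(apps, vg):
    """Return (application, device group) for a volume group, or None."""
    for name, settings in apps.items():
        if vg in settings.get("Vgs", []):
            return name, settings["DeviceGroup"][0]
    return None


common, apps = parse_ucf(SAMPLE)
print(device_group_for_vg(apps, "ora2vg"))
```

A volume group that is not listed under any APPLICATION returns None, which mirrors the requirement that every volume group owned by an HACMP resource group be specified in the user configuration file.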
3. Enter the following values:
Cluster Event Name: release_vg_fs_pre
Cluster Event Description: XP Cluster Extension Pre-Event
Cluster Event Script Filename: /opt/hpclx/bin/clxstopmonhacmp
Redefining the Custom Cluster Event as a pre-event of the standard HACMP event
1. Run SMIT (HACMP section):
#smitty hacmp
2. Select the following: Extended Configuration > Extended Event Configuration > Change/Show Pre-Defined HACMP Events
3. Select event release_vg_fs.
4. Define the previously defined Custom Cluster Event release_vg_fs_pre as a pre-event of release_vg_fs.
Timing considerations
XP Cluster Extension is designed to prefer XP disk array operations over cluster software operations. If XP Cluster Extension invokes disk pair resynchronization operations or gathers information about the remote XP disk array, XP Cluster Extension will wait until the requested status information is reported. This assumption has been made to clearly prioritize data integrity over the cluster software's failover behavior. In some cases, however, this behavior could lead to an HACMP error event (config_too_long). The default timeout value is 6 minutes. Use the chssys command to increase the timeout. For example:
#chssys -s clstrmgr -a "-u 60000"
NOTE: You must stop the cluster to run the chssys command.
The described timeouts can occur in the following situations:
• XP Cluster Extension uses XP RAID Manager instances to communicate with the remote XP disk array. Depending on the settings of the XP RAID Manager instance timeout parameter and the number of remote instances, the online operation could time out. This can happen if the local XP RAID Manager instance cannot reach the remote XP RAID Manager instance. See Setting up XP RAID Manager on page 20 for more details.
• XP Cluster Extension tries to resynchronize disk pairs and waits until the XP RAID Manager device group is in PAIR state if the ApplicationStartup attribute is set to RESYNCWAIT. Depending on the XP RAID Manager version and the XP firmware version, this could be a full resynchronization, which can take much longer than the online timeout interval. Even if the XP RAID Manager version and the XP firmware version allow a delta resynchronization, the delta between the primary and the secondary could be big enough for the copy process to exceed the online timeout value.
• If running in fence level ASYNC, the default value of the AsyncTakeoverTimeout can cause the resource group online process to fail because its value is set very high. This is because the takeover process for fence level ASYNC can take longer when slow communication links are in place. To prevent takeover commands from being terminated prematurely by the takeover timeout, measure the time to copy the installed XP disk array cache. To measure the copy time, use only the slowest XP Continuous Access Software link. This ensures that the XP disk array cache can be transferred from the remote XP disk array, even in the event of a single surviving replication link between the XP disk arrays.
Because the failover environment is dispersed into two (or more) data centers, the failover time cannot be expected to be the same as it would be in a single data center with a single shared disk device.
Failure behavior
XP Cluster Extension will run in an endless loop if either of the following is discovered:
• A configuration error
• An XP disk pair state that does not allow automated actions
This event is logged in the log files /var/opt/hpclx/log/clxhacmp.log and /tmp/hacmp.out. To return control to the HACMP cluster software, you must remove the lock file application_dir/application_name.LOCK. For example:
/etc/opt/hpclx/OracleRG.LOCK
This process has been adopted from HACMP's behavior. In the case of a failure, HACMP will also run in an endless loop until you recover all errors and manually start the application. After all errors have been recovered, invoke the command clruncmd to return control to the cluster software.
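As a small illustration of the lock file naming rule, the following Python sketch builds the path from an application directory and an application name. The function name lock_file_path is hypothetical; the directory and application name are taken from the example above.

```python
from pathlib import PurePosixPath


def lock_file_path(application_dir: str, application_name: str) -> str:
    """Build application_dir/application_name.LOCK."""
    return str(PurePosixPath(application_dir) / (application_name + ".LOCK"))


print(lock_file_path("/etc/opt/hpclx", "OracleRG"))
```

The sketch only computes the path. Removing the file and running clruncmd remain manual recovery steps to perform after all errors have been resolved.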
NOTE: The service name clxmonitor is appended with the text (not configured) unless the port number is configured in the configuration tool.
3. Specify the XP RAID Manager instances that define the device groups you want to manage with XP Cluster Extension. For more information about XP Cluster Extension and XP RAID Manager, see Setting up XP RAID Manager on page 20.
a. Click Add in the XP RAID Manager Instance Configuration section to open the Add XP RAID Manager instances window.
b. Select the XP RAID Manager instances to use, and then click OK.
4. Specify the servers that are possible owners for the XP Cluster Extension-managed disks. A server is a possible owner of a disk if it is capable of managing the disk when failover occurs.
a. Click Add in the Server Configuration section to display the Add Servers window.
b. Select the servers that are possible owners of the XP Cluster Extension-managed disks, and then click OK.
NOTE: See the Microsoft Cluster Administrator (Windows Server 2003) or Failover Cluster Management (Windows Server 2008/2008 R2) documentation for more information about possible owners.
5. Click OK to save the information and close the configuration tool. The configuration information is saved to the ClxXPCfg file.
NOTE: XP Cluster Extension provides an XP RAID Manager service, which automatically starts XP RAID Manager instances at system boot time. This feature reduces resource group and service and application failover times because the XP Cluster Extension resource does not need to start the XP RAID Manager instances. When you click Apply or OK in the configuration tool, the XP RAID Manager service is started.
6. Use the procedures in Importing and exporting configuration information on page 41 to copy the ClxXpCfg file to the other cluster nodes.
3. Specify the XP RAID Manager instances that define the device groups you want to manage with XP Cluster Extension. For more information about XP Cluster Extension and XP RAID Manager, see Setting up XP RAID Manager on page 20.
• To view the available XP RAID Manager instances, enter CLXXPCONFIG RM.
• To add an XP RAID Manager instance, enter CLXXPCONFIG RM /ADDVAL=value, where value is the XP RAID Manager instance you want to add. For example: Enter CLXXPCONFIG RM /ADDVAL=101 to add XP RAID Manager instance number 101.
• To remove an XP RAID Manager instance, enter CLXXPCONFIG RM /REMOVEVAL=value, where value is the XP RAID Manager instance you want to remove. For example: Enter CLXXPCONFIG RM /REMOVEVAL=101 to remove XP RAID Manager instance number 101.
NOTE: XP Cluster Extension provides an XP RAID Manager service, which automatically starts XP RAID Manager instances at system boot time. This feature reduces resource group and service or application failover times because the XP Cluster Extension resource does not need to start the XP RAID Manager instances. Adding or removing XP RAID Manager instances will start or restart the XP RAID Manager service.
4. Specify the servers that are possible owners for the XP Cluster Extension-managed disks. A server is a possible owner of a disk if it is capable of managing the disk when failover occurs.
• To determine whether cluster nodes have been configured for XP Cluster Extension, enter CLXXPCONFIG SERVER.
• To add a server, enter CLXXPCONFIG SERVER /ADD /NAME=servername, where servername is the server to add.
• To remove a server, enter CLXXPCONFIG SERVER /REMOVE /NAME=servername, where servername is the server to remove.
5. Use the procedures in Importing and exporting configuration information on page 41 to copy the configuration information to the other cluster nodes.
CAUTION: Do not use the following characters in XP Cluster Extension resource names: \ / : * ? " < > |. Using these characters might affect the creation of the resourcename.online file, which is used for the XP Cluster Extension resource health check mechanism. If the file creation fails and the pair/resync monitor is not configured, the cluster will report a failed state for the XP Cluster Extension resource.
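A proposed resource name can be checked against the character list in the CAUTION before the resource is created. The following Python sketch is a convenience check only; the function name invalid_chars is hypothetical and the check is not part of XP Cluster Extension.

```python
# Characters listed in the CAUTION above: \ / : * ? " < > |
FORBIDDEN = set('\\/:*?"<>|')


def invalid_chars(resource_name: str) -> list:
    """Return the forbidden characters found in the name, sorted."""
    return sorted({c for c in resource_name if c in FORBIDDEN})


print(invalid_chars("clx_fileshare"))
print(invalid_chars("clx:share?"))
```

An empty list means the name contains none of the listed characters.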
Adding an XP Cluster Extension resource using the Cluster Administrator GUI (Windows Server 2003)
Use the procedure in this section to add a resource using the Cluster Administrator GUI. For instructions on using the CLI, see Adding an XP Cluster Extension resource using the Microsoft CLI cluster commands on page 44.
1. Add a resource group in the Cluster Administrator GUI, as described in your Microsoft documentation.
2. From the File menu, select File > New > Resource.
3. Enter the following values, and then click Next:
Name: The name of the resource.
Description: As appropriate for the resource.
Resource type: Select Cluster Extension XP from the list.
Group: Select a group to associate with the resource.
4. Add or remove possible resource owners, and then click Next. The Dependencies window appears.
5. Do not add any dependencies. Click Next to open the Parameters window. The Parameters window contains values entered during the XP Cluster Extension configuration steps.
6. Modify the resource property values of the new XP Cluster Extension resource as needed, and then click Finish.
Adding an XP Cluster Extension resource using the Failover Cluster Management GUI (Windows Server 2008/2008 R2)
Use the procedure in this section to add a resource using the Failover Cluster Management GUI. For instructions on using the CLI, see Adding an XP Cluster Extension resource using the Microsoft CLI cluster commands on page 44.
1. Add a service or application in the Failover Cluster Management GUI, as described in your Microsoft documentation.
2. Right-click the service or application and select Add a resource > More resources > Add Cluster Extension XP.
Adding an XP Cluster Extension resource using the Microsoft CLI cluster commands
You can use the cluster commands in this section with Windows Server 2003, Windows Server 2008/2008 R2, Server Core, and Hyper-V Server. Use the following command to add an XP Cluster Extension resource:
cluster resource resource_name /create /group:service_or_application_name /type:"Cluster Extension XP"
Example
This command adds an XP Cluster Extension resource called clx_fileshare to the CLX_SHARE service or application.
cluster resource clx_fileshare /create /group:CLX_SHARE /type:"Cluster Extension XP"
1. Open Cluster Administrator.
2. Open the resource Properties window and click the General tab.
3. Enter a new name in the Name field.
4. Click OK to save your changes and close the window.
is configured for that resource. If the device group is the last monitored disk pair, and you take the resource offline, the pair/resync monitor will be stopped.
• Windows Server 2008 only: If an XP Cluster Extension resource is not configured, the resource icon in the Failover Cluster Management GUI shows the message not configured next to the resource status.
• The XP Cluster Extension resource must be the first resource for all disk resources in the dependency list of a resource cluster group.
• If you have an application in a cluster that uses more than one physical disk from the same device group, configure a single XP Cluster Extension resource, and ensure that all of the application disks depend on that resource. If the disks are split into different device groups, you must configure multiple XP Cluster Extension resources since an XP Cluster Extension resource operates at the device-group level.
• The PendingTimeout value must be greater than the ResyncWaitTimeout value.
• The PendingTimeout must be greater than twice the wait time of all remote XP RAID Manager instances multiplied by the number of remote systems. Otherwise, the XP Cluster Extension resource will fail to go online if there is a complete remote data center failure.
$t_{\text{online}} > n_{\text{remote systems}} \times 2 \times t_{\text{WT}}$
where:
$t_{\text{online}}$ = resource online timeout
$n_{\text{remote systems}}$ = number of remote systems configured to run XP RAID Manager instances
$t_{\text{WT}}$ = wait time until remote error will be reported by local XP RAID Manager instance
If a post-executable is specified, the PendingTimeout must be greater than the number of remote systems multiplied by three times $t_{\text{WT}}$.
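The PendingTimeout rules above can be worked through with a short Python sketch. The function names and the sample figures (2 remote systems, a 30-second wait time, a 90-second ResyncWaitTimeout) are hypothetical, and all values are assumed to be in seconds.

```python
def min_pending_timeout(remote_systems: int, wait_time_s: float,
                        post_exec: bool = False) -> float:
    """Lower bound that PendingTimeout must exceed.

    Factor 2 per remote system, or factor 3 if a post-executable is specified.
    """
    factor = 3 if post_exec else 2
    return remote_systems * factor * wait_time_s


def pending_timeout_ok(pending_s: float, remote_systems: int, wait_time_s: float,
                       resync_wait_timeout_s: float, post_exec: bool = False) -> bool:
    """Check both rules: above ResyncWaitTimeout and above the remote wait bound."""
    bound = min_pending_timeout(remote_systems, wait_time_s, post_exec)
    return pending_s > resync_wait_timeout_s and pending_s > bound


# Hypothetical site: 2 remote systems, 30 s wait time, ResyncWaitTimeout of 90 s.
print(min_pending_timeout(2, 30))                          # bound without post-executable
print(pending_timeout_ok(180, 2, 30, 90))                  # 180 s against a 120 s bound
print(pending_timeout_ok(180, 2, 30, 90, post_exec=True))  # 180 s against a 180 s bound
```

In this hypothetical case a PendingTimeout of 180 seconds is sufficient without a post-executable, but it is not strictly greater than the 180-second bound that applies once a post-executable is specified.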
XP Cluster Extension requirements for Cluster Administrator and Failover Cluster Management resource properties are described in Table 1 on page 47. If there is no required value for a property, the valid and/or default values are specified. Set these properties in the resource properties window or the CLI. If you use the CLI, use the following command: cluster.exe resource ResourceName /prop PropertyName="PropertyValue".
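When several resources or properties have to be set, the command lines can be generated rather than typed. The following Python sketch only composes the two command forms shown in this guide (resource creation and /prop); it does not run them. The function names are hypothetical, and names containing spaces would need additional quoting that is not handled here.

```python
def create_resource_cmd(resource: str, group: str,
                        resource_type: str = "Cluster Extension XP") -> str:
    """Compose the resource creation command in the form shown in this guide."""
    return f'cluster resource {resource} /create /group:{group} /type:"{resource_type}"'


def set_property_cmd(resource: str, prop: str, value) -> str:
    """Compose the /prop command in the form shown in this guide."""
    return f'cluster.exe resource {resource} /prop {prop}="{value}"'


print(create_resource_cmd("clx_fileshare", "CLX_SHARE"))
print(set_property_cmd("clx_fileshare", "PendingTimeout", 180000))
```

Printing the commands first makes it possible to review them before running them on a cluster node.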
For more information about setting resource properties, see your Microsoft documentation.

Table 1 Setting resource properties and values in the GUI

Property: Thorough Resource Health Check Interval (Windows Server 2008/2008 R2), Is Alive poll interval (Windows Server 2003), IsAlivePollInterval (CLI)
Format: Integer
Description: Used to poll Alive state for the resource. Decreasing this value allows faster resource failure detection but also consumes more system resources. Set this value in the Advanced Policies tab of the resource properties window in Failover Cluster Management, or in the Advanced tab of the resource properties window in Cluster Administrator.
Value: Windows Server 2008/2008 R2 GUI: 01:00 mm:ss (Default). Windows Server 2003 GUI: 60000 milliseconds (Default). CLI: 60000 milliseconds (Default).

Property: Basic Resource Health Check Interval (Windows Server 2008/2008 R2), Looks Alive poll interval (Windows Server 2003), LooksAlivePollInterval (CLI)
Format: Integer
Description: Used to poll Alive state for the resource. Decreasing this value allows for faster resource failure detection but also consumes more system resources. Set this value in the Advanced Policies tab of the resource properties window in Failover Cluster Management, or in the Advanced tab of the resource properties window in Cluster Administrator.
Value: Windows Server 2008/2008 R2 GUI: 00:05 mm:ss (Default). Windows Server 2008/2008 R2 CLI: 5000 milliseconds (Default). Windows Server 2003 GUI: 60000 milliseconds (Default). Windows Server 2003 CLI: 60000 milliseconds (Default).

Property: If a resource fails, attempt restart on current node: Maximum restarts in the specified period (Windows Server 2008/2008 R2), Restart Threshold (Windows Server 2003), RestartThreshold (CLI)
Format: Integer
Description: Defines whether a resource can be automatically restarted after it has failed. Set this value in the Policies tab of the resource properties window in Failover Cluster Management, or in the Advanced tab of the resource properties window in Cluster Administrator.
Value: 0 (Required)

Property: If restart is unsuccessful, fail over all resources in this service or application (Windows Server 2008/2008 R2), RestartAction (Windows Server 2003), RestartAction (CLI)
Format: Integer
Description: Defines whether resources will be failed over if a restart is unsuccessful. Set this value in the Policies tab of the resource properties window in Failover Cluster Management, or in the Advanced tab of the resource properties window in Cluster Administrator. Windows Server 2003 only: This value must affect the group. This ensures that the resource group fails over to another system if a resource is reported FAILED.
Value: Windows Server 2008/2008 R2: Check (Required). Windows Server 2003: Restart and affect the group (Required, Default). CLI: 2 restart and affect the group (Required).

Property: If a resource fails, attempt restart on current node: Period for restarts (Windows Server 2008/2008 R2), RestartPeriod (Windows Server 2003), RestartPeriod (CLI)
Format: Integer
Description: Determines the amount of time for restart. Set this value in the Policies tab of the resource properties window in Failover Cluster Management, or in the Advanced tab of the resource properties window in Cluster Administrator.
Value: Windows Server 2008/2008 R2: 15:00 mm:ss (Default). Windows Server 2003: 900 seconds (Default). CLI: 900000 milliseconds (Default).

Property: Pending timeout (GUI), PendingTimeout (CLI)
Format: Integer
Description: Used to specify the timeout for status resolution. For more information, see Timing considerations for Microsoft Cluster Service on page 72. Set this value in the Policies tab of the resource properties window in Failover Cluster Management, or in the Advanced tab of the resource properties window in Cluster Administrator.
Value: Windows Server 2008/2008 R2: 03:00 mm:ss. Windows Server 2003: 180 seconds (Default). CLI: 180000 milliseconds (Default).
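The same interval appears in Table 1 in different units depending on the interface: mm:ss in the Windows Server 2008/2008 R2 GUI, seconds or milliseconds in the Windows Server 2003 GUI, and milliseconds in the CLI. The following Python sketch converts between these forms; the helper names are hypothetical.

```python
def mmss_to_ms(value: str) -> int:
    """Convert an 'mm:ss' GUI value to the milliseconds used by the CLI."""
    minutes, seconds = value.split(":")
    return (int(minutes) * 60 + int(seconds)) * 1000


def seconds_to_ms(seconds: int) -> int:
    """Convert a value in seconds to milliseconds."""
    return seconds * 1000


print(mmss_to_ms("03:00"))   # Pending timeout GUI value
print(seconds_to_ms(180))    # Pending timeout in seconds
```

Both calls give 180000, which matches the CLI default listed for PendingTimeout.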
XP Cluster Extension requirements for service or application properties are described in Table 2 on page 48. If no specific value is required, the default value is listed. Set these values in the Failover tab of the service or application properties window (Windows Server 2008/2008 R2), the resource group properties window (Windows Server 2003), or in the CLI. For more information about setting service or application properties, see your Microsoft documentation. TIP: To change the properties in Table 2 on page 48 with the CLI, use the following command: cluster group groupname /prop propertyname="propertyvalue".
Format: Integer
Description: Prevents automatic fail back of a service or application to its primary system. Transfer the service or application back manually after the failure has been recovered. This allows for recovery of all possible failure sources and pair resynchronization (if necessary) while the application service is still running.
Value: GUI: Prevent failback. CLI: 0 (required).

Format: String
Description: Determines the time (in hours) over which the cluster service attempts to fail over a service or application. See Timing considerations for Microsoft Cluster Service on page 72 for more information.
Value: 6 (Default)

Property: GUI: Maximum failures in the specified period (Windows Server 2008/2008 R2), Threshold (Windows Server 2003). CLI: FailoverThreshold
Format: Integer
Description: Determines the number of failover attempts. The default value allows the cluster service to transfer the service or application to each system once in case of subsequent system failure. Due to the nature of this parameter, it is possible that the service or application automatically restarts on a system several times if all cluster systems are not members of the cluster at that time. If this value is set to a number higher than the current number of clustered systems for the cluster group, the service or application will continue to restart until either the FailoverThreshold value or the FailoverPeriod timeout value is reached.
Value: Windows Server 2008/2008 R2: Number of nodes in the cluster minus 1. Windows Server 2003, CLI: 10 (Default).
Setting XP Cluster Extension resource properties using the Cluster Administrator GUI (Windows Server 2003)
You can set XP Cluster Extension properties by using the Parameters tab in the Cluster Administrator GUI.
Configuring XP RAID Manager instance numbers for XP RAID Manager service
Use the Cluster Administrator Properties window to change XP RAID Manager instance numbers.
1. Open Cluster Administrator and double-click the resource you want to edit.
2. Click the Parameters tab.
3. To remove an instance, select it and click Remove.
4. To add an instance:
a. Click Add to open the Add RAID Manager instances window.
b. Select one or more instances, and then click OK.
Configuring the XP RAID Manager device group details
1. Open Cluster Administrator and double-click the resource you want to edit.
2. Click the Parameters tab.
3. To change the device group details, select a new value in the RM XP device group menu.
4. Click OK to save your changes and close the window.
Configuring XP RAID Manager device group advanced properties
The Parameters tab of the XP Cluster Extension resource offers basic settings and is used to enter environment data, such as XP RAID Manager instances. The more advanced settings can be accessed through additional buttons in the Parameters tab.
1. Open Cluster Administrator and double-click the resource you want to edit.
2. Click the Parameters tab.
3. Click the Advanced button to open the Advanced Fence Level Failover Behavior window. The available settings in this window depend on the fence level used with your device groups.
• For the DATA fence level, you can update the Data lose mirror and DATA lose data center values. See DataLoseDataCenter on page 133 and DataLoseMirror on page 134 for more information about these values.
• For the ASYNC fence level, you can update the ASYNC takeover timeout value. See AsyncTakeoverTimeout on page 130 for more information about this value.
• For the journal fence level, you can update the Journal data currency on S-VOL and ASYNC takeover timeout values. See JournalDataCurrency on page 136 and AsyncTakeoverTimeout on page 130 for more information about these values.
4. Update the settings as needed, and then click OK to close the window.
5. Click OK to save your changes and close the window.
Notes
• After a device group is configured in the resource configuration utility, do not change the device group name or swap the name with another device group name in the HORCM file. If you do this, restart the HORCM manager instance and reconfigure the XP Cluster Extension resource.
• Do not use HORCM commands to change the device group property for a device group that is configured for an XP Cluster Extension resource. If you do this, the changed property is not reflected immediately in the Parameters tab. To work around this situation, re-select the device group from the XP RM device group menu in the Parameters tab.
Configuring server data center assignments
1. Open Cluster Administrator and double-click the resource you want to edit.
2. Click the Parameters tab.
3. To remove a data center assignment, select the assignment, and then click Remove.
4. To modify a data center assignment, select the assignment, and then click Modify. Enter the new Data center name in the Modify Node in Data Center List window, and then click OK.
5. To add a data center assignment, click Add. Select a host and a data center, and then click OK.
6. Click OK to save your changes and close the window.
Changing failover and failback behavior
1. Open Cluster Administrator and double-click the resource you want to edit.
2. Click the Parameters tab.
4. Update the ApplicationStartup and AutoRecover values as needed, and then click OK.
5. Click OK to save your changes and close the window.
Activating the pair/resync monitor
The pair/resync monitor detects and responds to suspended XP Continuous Access links if the ResyncMonitor object is set to YES. If the ResyncMonitorAutoRecover object is set to YES, automatic disk pair resynchronization is also activated. When the resource is taken offline, the monitor is stopped for the XP RAID Manager device group used for this resource.
CAUTION: If a resource cannot be taken offline manually, and goes into a failed state, the cluster administrator must disable monitoring of the device group for this resource. To avoid data corruption, this task must be part of the recovery procedure when XP Cluster Extension is deployed in an MSCS/Failover Cluster Service environment. See Stopping the pair/resync monitor on page 115. You must ensure that the pair/resync monitor does not monitor and resynchronize the disk pair (device group) from both disk array sites.
To use the pair/resync monitor with an XP Cluster Extension resource:
1. Open Cluster Administrator and double-click the resource you want to edit.
2. Click the Parameters tab.
3. Click Pair ResyncMon to open the Pair/Resync Monitor Properties window.
4. Select the Use pair/resync monitor check box to set the ResyncMonitor object to YES.
5. Select the Pair/resync monitor autoRecovery check box to set the ResyncMonitorAutoRecover object to YES.
6. If you want to change the monitoring interval (ResyncMonitorInterval), enter a value in the Monitor interval box.
7. Click OK to save your changes and close the Pair/Resync Monitor Properties window.
8. Click OK to save your changes and close the Properties window.
TIP: You can activate ResyncMonitor from cluster commands in the CLI. For example, if your XP Cluster Extension resource is clx_fileshare, enter the following command: cluster resource clx_fileshare /privprop ResyncMonitor=yes.
Configuring takeover actions
Pre-executables and post-executables can be defined to be executed before or after XP Cluster Extension invokes its takeover functions.
1. Open Cluster Administrator and double-click the resource you want to edit.
2. Click the Parameters tab.
3. Click Pre/Post Exec to display the Pre/Post Executable Properties window.
4. Update the PreExecScript, PostExecScript, and PostExecCheck values as needed, and then click OK. When configuring pre/post takeover executable paths, enter the full path to the script. If a script fails, the XP Cluster Extension resource will fail.
5. Click OK to save your changes and close the Properties window or Resource Configuration tool.
Configuring rolling disaster protection
To configure rolling disaster protection for an XP Cluster Extension resource:
1. Open Cluster Administrator and double-click the resource you want to edit.
2. Click the Parameters tab.
4. Add mirror units to each data center:
a. Click Add MU # to DC A.
b. Select mirror units from the list, and click OK.
c. Repeat the previous steps for Data Center B.
5. Update the BCResyncEnabledA, BCResyncEnabledB, BCResyncMuListA, and BCResyncMuListB values as needed, and then click OK.
6. Click OK to save your changes and close the Properties window.
NOTE: For more information, see Setting objects to enable rolling disaster protection on page 141.
Setting XP Cluster Extension resource properties using the GUI (Windows Server 2008/2008 R2, Server Core, and Hyper-V Server)
This section describes the procedures for setting XP Cluster Extension resource properties with a GUI. You can perform these procedures through the resource configuration utility using the Failover Cluster Management GUI or the standalone resource configuration tool. For instructions on using the two GUI options, see the following sections:
• Using Failover Cluster Management to set resource properties (Windows Server 2008/2008 R2), page 55
• Using the resource configuration tool to set resource properties (Server Core and Hyper-V Server), page 55
TIP: For information on managing XP Cluster Extension resources from a remote management station through the MMC, see Setting XP Cluster Extension resource properties using the MMC on page 62.
Using Failover Cluster Management to set resource properties (Windows Server 2008/2008 R2)
For Windows Server 2008/2008 R2, use the Failover Cluster Management GUI to set resource properties.
1. Open Failover Cluster Management.
2. Double-click the XP Cluster Extension resource in the summary pane to open the Properties window.
3. Click the Parameters tab.
4. Make the necessary parameter changes, and then click OK.
Using the resource configuration tool to set resource properties (Server Core and Hyper-V Server)
For Server Core or Hyper-V Server, use the XP Cluster Extension resource configuration tool to set resource properties. When using the resource configuration tool:
• You must run the tool on a Server Core or Hyper-V cluster node. You cannot run the tool on a remote management station.
• You cannot use the resource configuration tool to add or delete a resource.
• You can use the tool to configure multiple resources at one time. This saves time because you can switch resources from the tool menu.
The resource configuration tool is recommended for Hyper-V and Server Core environments because the properties you enter are validated. When you configure XP Cluster Extension resource properties from a remote management station or through the CLI, the properties you enter are not validated.
To use the resource configuration tool:
1. Open a command window and enter ClxXpResConfig.exe.
2. Select the resource you want to change in the XP CLX resource menu.
3. Make the necessary parameter changes, and then click OK.
Configuring XP RAID Manager instance numbers for XP RAID Manager service
To configure XP RAID Manager instance numbers from the Failover Cluster Management Parameters tab or the resource configuration tool:
1. To add an instance:
   a. Click Add to open the Add RAID Manager instances window.
   b. Select one or more instances and click OK.
2. To remove an instance, select it and click Remove.
3. Click OK to save your changes and close the window.
Configuring the XP RAID Manager device group details
To configure XP RAID Manager device group details from the Failover Cluster Management Parameters tab or the resource configuration tool:
1. Select a value in the RM XP device group menu.
2. Click OK to save your changes and close the window.
Configuring XP RAID Manager device group advanced properties
The Parameters tab of the XP Cluster Extension resource offers basic settings and is used to enter environment data, such as XP RAID Manager instances. The more advanced settings can be accessed through additional buttons in the Parameters tab.
To configure XP RAID Manager advanced properties from the Failover Cluster Management Parameters tab or the resource configuration tool:
1. Click the Advanced button to open the Advanced Fence Level Failover Behavior window. The available settings in this window depend on the fence level used with your device groups.
   • For the DATA fence level, you can update the Data lose mirror and DATA lose data center values. See DataLoseDataCenter on page 133 and DataLoseMirror on page 134 for more information about these values.
   • For the ASYNC fence level, you can update the ASYNC takeover timeout value. See AsyncTakeoverTimeout on page 130 for more information about this value.
   • For the journal fence level, you can update the Journal data currency on S-VOL and ASYNC takeover timeout values. See JournalDataCurrency on page 136 and AsyncTakeoverTimeout on page 130 for more information about these values.
2. Update the settings as needed, and then click OK to close the window.
3. Click OK to save your changes and close the window.
Notes
• After a device group is configured in the resource configuration utility, do not change the device group name or swap the name with another device group name in the HORCM file. If you do this, restart the HORCM manager instance and reconfigure the XP Cluster Extension resource.
• Do not use HORCM commands to change the device group property for a device group that is configured for an XP Cluster Extension resource. If you do this, the changed property is not reflected immediately in the Parameters tab. To work around this situation, re-select the device group from the XP RM device group menu in the Parameters tab.
Configuring server data center assignments
To configure server data center assignments from the Failover Cluster Management Parameters tab or the resource configuration tool:
1. To remove a data center assignment, select the assignment, and then click Remove.
2. To modify a data center assignment, select the assignment, and then click Modify. Enter the new Data center name in the Modify Node in Data Center List window, and then click OK.
3. To add a data center assignment, click Add. Select a host and a data center, and then click OK.
4.
Changing failover and failback behavior
To configure failover and failback behavior from the Failover Cluster Management Parameters tab or the resource configuration tool:
1. Click Failover/Failback to display the Failover/Failback window.
2. Update the ApplicationStartup and AutoRecover values as needed, and then click OK.
3. Click OK to save your changes and close the Properties window or Resource Configuration tool.
Activating the pair/resync monitor
The pair/resync monitor detects and responds to suspended XP Continuous Access links if the ResyncMonitor object is set to YES. If the ResyncMonitorAutoRecover object is set to YES, automatic disk pair resynchronization is also activated. When the resource is taken offline, the monitor is stopped for the XP RAID Manager device group used for this resource.
CAUTION: If a resource cannot be taken offline manually, and goes into a failed state, the cluster administrator must disable monitoring of the device group for this resource. To avoid data corruption, this task must be part of the recovery procedure when XP Cluster Extension is deployed in an MSCS environment. See Stopping the pair/resync monitor on page 115. You must ensure that the pair/resync monitor does not monitor and resynchronize the disk pair (device group) from both disk array sites.
To activate the pair/resync monitor from the Failover Cluster Management Parameters tab or the resource configuration tool:
1. Open the Pair/Resync Monitor Properties window.
2. Select the Use pair/resync monitor check box to set the ResyncMonitor object to YES.
3. Select the Pair/resync monitor autoRecovery check box to set the ResyncMonitorAutoRecover object to YES.
4. If you want to change the monitoring interval (ResyncMonitorInterval), enter a value in the Monitor interval box.
5. Click OK to save your changes and close the Pair/Resync Monitor Properties window.
6. Click OK to save your changes and close the Properties window or Resource Configuration tool.
TIP: You can activate ResyncMonitor from the Microsoft CLI. For example, if your XP Cluster Extension resource is clx_fileshare, enter the following command:
C:\>cluster resource clx_fileshare /privprop ResyncMonitor=yes
Configuring takeover actions
Pre-executables and post-executables can be defined to be executed before or after XP Cluster Extension invokes its takeover functions.
To configure takeover actions from the Failover Cluster Management Parameters tab or the resource configuration tool:
1. Click Pre/Post Exec to display the Pre/Post Executable Properties window.
2. Update the PreExecScript, PostExecScript, and PostExecCheck values as needed, and then click OK. When configuring pre/post takeover executable paths, enter the full path to the script. If a script fails, the XP Cluster Extension resource will fail.
3. Click OK to save your changes and close the Properties window or Resource Configuration tool.
Configuring rolling disaster protection
To configure rolling disaster protection from the Failover Cluster Management Parameters tab or the resource configuration tool:
1. Click Rolling Disaster to display the Rolling Disaster Protection window.
2. Add mirror units to each data center:
   a. Click Add MU # to DC A.
   b. Select mirror units from the list, and click OK.
   c. Repeat the previous steps for Data Center B.
3. Update the BCResyncEnabledA, BCResyncEnabledB, BCResyncMuListA, and BCResyncMuListB values as needed, and then click OK.
4. Click OK to save your changes and close the Properties window or Resource Configuration tool.
NOTE: For more information, see Setting objects to enable rolling disaster protection on page 141.
NOTE: When you configure XP Cluster Extension resource properties from a remote management station through the MMC, which uses the standard Microsoft Properties tab, the properties you enter are not validated, so you must enter the property values accurately, and verify them against the XP Cluster Extension documentation. When you use this option, you will see the default Microsoft properties page instead of the XP Cluster Extension Parameters tab. For more information about using the MMC, see Remote management of XP Cluster Extension resources in a cluster (Windows Server 2008/2008 R2) on page 73 and your Microsoft documentation.
The following example changes the FenceLevel property of the XP Cluster Extension resource clx_fileshare:
C:\>cluster resource clx_fileshare /privprop FenceLevel=data
The following example changes the XP RAID Manager instance used for the XP Cluster Extension resource clx_fileshare from 10 to 99, and then adds instance 22 to provide redundancy:
C:\>cluster resource clx_fileshare /privprop RaidManagerInstances="99 22"
The following example changes the name of XP Cluster Extension resource XP Cluster Extension resource1 to XP Cluster Extension resource2:
C:\>cluster resource "XP Cluster Extension resource1" /ren:"XP Cluster Extension resource2"
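Because property values entered through the CLI are not validated, it can help to check a value before passing it to the cluster command. The following Python sketch is an illustration only, not part of XP Cluster Extension. The helper name build_privprop_command and the table of allowed values are assumptions; the values are taken from examples in this guide and are not a complete property reference (the journal fence level, for instance, is omitted because its literal property value is not shown here).

```python
# Hypothetical helper: validate a value, then build the cluster.exe command line.
# ALLOWED_VALUES is an assumption based on examples in this guide; extend it from
# the property reference before relying on it.
ALLOWED_VALUES = {
    "FenceLevel": {"data", "never", "async"},
    "ApplicationStartup": {"fastfailback", "resyncwait"},
    "AutoRecover": {"yes", "no"},
    "ResyncMonitor": {"yes", "no"},
    "ResyncMonitorAutoRecover": {"yes", "no"},
}


def build_privprop_command(resource, prop, value):
    """Return a 'cluster resource ... /privprop' command string or raise ValueError."""
    allowed = ALLOWED_VALUES.get(prop)
    if allowed is not None and value.lower() not in allowed:
        raise ValueError(f"{prop}={value!r} is not one of {sorted(allowed)}")
    # Quote names and values that contain spaces, as in the examples above.
    res = f'"{resource}"' if " " in resource else resource
    val = f'"{value}"' if " " in value else value
    return f"cluster resource {res} /privprop {prop}={val}"


print(build_privprop_command("clx_fileshare", "FenceLevel", "data"))
print(build_privprop_command("clx_fileshare", "RaidManagerInstances", "99 22"))
```

Properties that are not listed in the table, such as RaidManagerInstances, pass through unchecked, so the sketch only catches mistakes in the properties it knows about.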
IMPORTANT: If you plan to use the default values for these properties, no UCF is required.
To configure properties using a UCF:
1. Take the XP Cluster Extension resource offline.
2. Open the sample UCF.cfg file located in %HPCLX_PATH%\sample.
3. Update the file with the property values you want to use. For more information on the available properties, see Chapter 8 on page 123.
4. Save the file and copy it to the following directory on all cluster nodes: %HPCLX_PATH%\conf.
5. Bring the XP Cluster Extension resource online.
For Windows Server 2008/2008 R2, use the Failover Cluster Management GUI, cluster commands in the CLI, or the MMC for remote management. For Server Core or Hyper-V Server, use cluster commands in the CLI or the MMC.
Adding dependencies using Failover Cluster Management (Windows Server 2008/2008 R2)
You can add dependencies with the GUI on a local node or by using the MMC to run the Failover Cluster Management application.
1. Open Failover Cluster Management.
2. Select a service or application that has an XP Cluster Extension resource.
3. Double-click a disk in the summary pane.
4. Click the Dependencies tab, and then click Insert.
5. Select the XP Cluster Extension resource in the Resource menu.
6.
Figure 4 Service or application example (quorum service control disks not shown)
XP Cluster Extension is configured as a single resource to enable read/write access to the physical disk resource used for the CLX_SHARE cluster group. The physical disk resource depends on the XP Cluster Extension resource and can be brought online only when the XP Cluster Extension resource is already online. Independent of this resource tree, the network card will be configured with the CLX_SHARE service or application's (resource group's) IP address and network name. If all those resources have been brought online, the file share can be started.
To configure the XP Cluster Extension resource according to the configuration in Figure 6 on page 68:
1. Log in to the host3_DCB system with the Administrator account.
2. Create the file share service or application with all previously mentioned resources and its dependencies, except the XP Cluster Extension resource on host3_DCB.
3. Create a new resource of type XP Cluster Extension and add systems host2_DCA, host3_DCB, and host4_DCB to its possible owners.
4. Change the restart behavior of the XP Cluster Extension resource so that the resource can be restarted and so that the restart affects the group. Set the number of restarts to 0.
5. Edit the resource properties, including the following information:
   • XP RAID Manager instances
   • XP RAID Manager device group details
   • Server data center assignments
6. Click the Pre/Post Exec button and add clxpre.exe with its full path. (The clxpre.exe program is an example. It is not included in the XP Cluster Extension product.)
7. Add a dependency on the XP Cluster Extension resource CLX_FILESHARE to the physical disk resource Disk_32b_00b.
8. Check the cluster service, group, and resource settings with the following commands:
   C:\>cluster group CLX_SHARE /prop
   C:\>cluster resource CLX_FILESHARE /prop
9. For Windows Server 2003 only: Set the XP Cluster Extension resource property RestartAction to zero (0), or check the Do not restart check box in the resource's Advanced tab window, and then use the following commands to check if the value has changed. For example:
   C:\>cluster resource CLX_FILESHARE /prop RestartAction=0
   C:\>cluster resource CLX_FILESHARE /prop
   If you are using the CLI to set resource properties, the equivalent command is cluster res CLX_FILESHARE /prop RestartAction=0.
10. For Windows Server 2008/2008 R2 only: Enable the XP Cluster Extension resource property If restart is unsuccessful, fail over all resources in this service or application. This value is set in the Policies tab in the Failover Cluster Management Properties window. If you are using the CLI to set resource properties, the equivalent command is cluster res CLX_FILESHARE /prop RestartAction=0.
11. Bring the service or application online on host3_DCB by using the Failover Cluster Management GUI, Cluster Administrator GUI, or the following cluster command in the CLI:
   C:\>cluster group CLX_SHARE /online:host3_DCB
12. Verify that the XP Cluster Extension resource and all other CLX_SHARE application resources are brought online:
   C:\>cluster group CLX_SHARE
13. Take the service or application offline, and verify that all resources are stopped:
   C:\>cluster group CLX_SHARE /offline
   C:\>cluster group CLX_SHARE
14. Bring the service or application online again and verify that all resources are available:
   C:\>cluster group CLX_SHARE /online:host3_DCB
   C:\>cluster group CLX_SHARE
15. Check the cluster service settings of system host4_DCB, and the group and resource settings.
16. Move the service or application to system host4_DCB and verify that all resources are available:
   C:\>cluster group CLX_SHARE /moveto:host4_DCB
   C:\>cluster group CLX_SHARE
17. Check the cluster service settings of system host2_DCA, and the group and resource settings.
18. Move the service or application to system host2_DCA and verify that all resources are available:
   C:\>cluster group CLX_SHARE /moveto:host2_DCA
   C:\>cluster group CLX_SHARE
19. Check the cluster service settings of system host1_DCA, and the group and resource settings.
20. Take the service or application offline, and verify that all resources are stopped:
   C:\>cluster group CLX_SHARE /offline
   C:\>cluster group CLX_SHARE
21. Change the XP Cluster Extension resource to be able to restart on another system:
   C:\>cluster resource CLX_FILESHARE /prop RestartAction=2
   C:\>cluster resource CLX_FILESHARE /prop
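The online, move, and offline steps in this procedure repeat one pattern: issue a cluster group command, then display the group to verify resource states. As an illustration, the following Python sketch generates that command sequence for a list of nodes so it can be reviewed or pasted into a test plan. The function name failover_test_commands is an assumption; the sketch only builds strings and does not run any command.

```python
def failover_test_commands(group, nodes):
    """Build the online/move/offline command sequence, each followed by a status check."""
    first, *rest = nodes
    status = f"cluster group {group}"  # displays the group so resource states can be verified
    commands = [f"cluster group {group} /online:{first}", status]
    for node in rest:
        commands.append(f"cluster group {group} /moveto:{node}")
        commands.append(status)
    commands.append(f"cluster group {group} /offline")
    commands.append(status)
    return commands


for line in failover_test_commands("CLX_SHARE", ["host3_DCB", "host4_DCB", "host2_DCA"]):
    print("C:\\>" + line)
```

The checks of the cluster service settings on each target node (steps 15, 17, and 19) remain manual and are not covered by the sketch.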
Deleting a resource
Deleting a running resource causes the resource and its dependents to go offline.
CAUTION: Deleting a running XP Cluster Extension resource does not remove the resource_name.online file and does not remove the device group from the list of monitored device groups if the pair/resync monitor is used to monitor the XP Continuous Access Software link. Therefore, the device group must be deleted from the list of monitored device groups manually using the clxchkmon command after deleting the XP Cluster Extension resource. See Stopping the pair/resync monitor on page 115.
CAUTION: Failure to delete the monitored device group from the list of monitored device groups can cause data corruption if the ResyncMonitorAutoRecover attribute is set to YES.
When deleting resources:
• For Windows Server 2008/2008 R2, use the GUI or CLI.
• For Server Core or Hyper-V Server, use the CLI or the MMC.
• For Windows Server 2003, use the GUI or CLI.
For more information on deleting resources, see your Microsoft documentation.
Administration
XP Cluster Extension administration includes remote management of resources and monitoring of system resources and logs.
Remote management of XP Cluster Extension resources in a cluster (Windows Server 2008/2008 R2)
You can use the MMC with Failover Cluster Management to manage clusters and configure XP Cluster Extension resources. Note the following when configuring XP Cluster Extension resources by using the MMC from a remote management station:
• When you use the MMC to remotely configure XP Cluster Extension resource properties in a Server Core or Hyper-V Server cluster node, the Failover Cluster Management GUI on the remote management station displays the standard Microsoft Properties tab instead of the customized XP Cluster Extension Parameters tab. For more information about the Properties tab, see Setting XP Cluster Extension resource properties using the MMC on page 62.
• When you install XP Cluster Extension into a Windows Server 2008/2008 R2 environment, the resource extension DLL is registered by default, which prevents you from configuring an XP Cluster Extension resource from a remote management station. If you need to remotely configure an XP Cluster Extension resource in a Windows Server 2008/2008 R2-based cluster, unregister clxmscsex.dll from the cluster node, which allows you to configure the XP Cluster Extension resource using the standard Microsoft Properties tab. Use the command cluster /UNREGADMINEXT:clxmscsex.dll to unregister the DLL.
CAUTION: Configuring XP Cluster Extension resources using the MMC from a remote management station is supported using only the standard Microsoft Properties tab. Do not try to use the customized XP Cluster Extension Parameters tab for this purpose. If you see the customized XP Cluster Extension Parameters tab when you try to configure an XP Cluster Extension resource from a remote management station using the MMC, you must unregister clxmscsex.dll from the cluster node. Use the command cluster /UNREGADMINEXT:clxmscsex.dll to unregister the DLL. Unregistering the DLL allows you to configure the resource using the standard Microsoft Properties tab. This situation might occur if you have a cluster with both Server Core or Hyper-V Server and Windows Server 2008/2008 R2 cluster nodes.
When you configure XP Cluster Extension resource properties from a remote management station through the MMC, which uses the standard Microsoft Properties tab, the properties you enter are not validated, so you must enter the property values accurately, and verify them against the XP Cluster Extension documentation.
you will see the customized XP Cluster Extension Parameters tab. The customized tab is displayed because the resource extension DLL is registered by default on Windows Server 2003 cluster nodes, which prevents you from configuring the XP Cluster Extension resource from a remote management station. If you need to configure an XP Cluster Extension resource remotely for a Windows Server 2003-based cluster, unregister clxmscsex.dll from the cluster node, which allows you to remotely configure an XP Cluster Extension resource using the standard Microsoft Properties tab. Use the command cluster /UNREGADMINEXT:clxmscsex.dll to unregister the DLL.
NOTE: Configuring XP Cluster Extension resources by using Cluster Administrator from a remote management station is supported using only the standard Microsoft Properties tab. Do not try to use the customized XP Cluster Extension Parameters tab for this purpose.
When you configure XP Cluster Extension resource properties from a remote management station through the Cluster Administrator, which uses the standard Microsoft Properties tab, the properties you enter are not validated, so you must enter the property values accurately, and verify them against the XP Cluster Extension documentation.
System resources
Monitor the system resources on a regular basis as part of Windows administration. If any system resource usage by the cluster service is reaching maximum levels, stop and then restart the cluster service. This action automatically fails over the resources and resets system resources. See the MSCS documentation for information about how to stop a cluster service. An alternate method is to manually move all resources to another node in the cluster before stopping the cluster service. After all resources are successfully moved to another node, stop and then restart the cluster service; then, manually move back all resources.
Logs
If the XP Cluster Extension log files need to be cleared and reset (for example, to reduce disk space usage), you can delete the files. XP Cluster Extension automatically creates new log files. TIP: Archive the log files before deleting them.
Figure 8 on page 77 shows an example resource graph of the CLX_WEB_SERVER service group. XP Cluster Extension is configured as a single resource to enable read/write access to the disk groups used for the web server service group. The DiskGroup resources depend on the XP Cluster Extension resource, and the Mount resources can be brought online only when the DiskGroup resources and the XP Cluster Extension resource are already online. Independent of this resource tree, the network card will be configured with the web server service group IP address. When all these resources have been brought online, the web server can be started.
5.
10. Start the VCS engine on dawn:
   # hastart
11. Start the VCS engine on sunset and dusk, and switch the web server service group to dawn and later to sunset and dusk. Before you switch the service group to the remote data center, make sure that the XP Continuous Access Software links are configured for bidirectional mirroring and that XP RAID Manager instances include the device group configured for the web server service group. To switch the web server service group, enter:
   # hagrp -switch CLX_WEB_SERVER -to system_name
12. Verify that all XP Cluster Extension and web server service group resources are brought online:
   # hagrp -display
BCEnabledA, BCEnabledB BCResyncEnabledA, BCResyncEnabledB }
str ApplicationDir = "/etc/opt/hpclx/"
str XPSerialNumbers[]
str RaidManagerInstances[]
str DeviceGroup
str DC_A_Hosts[]
str DC_B_Hosts[]
str FenceLevel = never
str DataLoseMirror = no
str DataLoseDataCenter = yes
str JournalDataCurrency = yes
int AsyncTakeoverTimeout = 1800
str ApplicationStartup = fastfailback
int ResyncWaitTimeout = 300
str FastFailbackEnabled = yes
str AutoRecover = no
str ResyncMonitor = no
str ResyncMonitorAutoRecover = no
str ResyncMonitorInterval = 60
str PreExecScript
str PostExecScript
str PostExecCheck = no
str BCMuListA[]
str BCMuListB[]
str BCResyncMuListA[]
str BCResyncMuListB[]
str BCEnabledA = no
str BCEnabledB = no
str BCResyncEnabledA = no
str BCResyncEnabledB = no
)
Adding an XP Cluster Extension resource using the VCS Cluster Manager GUI
1. Open the Cluster Explorer.
2. Select the service group, right-click, and choose Add Resource; or, click Add Resource in the Cluster Explorer toolbar.
3. Enter the resource name in the Resource name box.
4. Select ClusterExtensionXP from the Resource Type list.
5. To change attribute values of the new XP Cluster Extension resource, click the button in the Edit column of the value you want to change, and modify the values as desired in the Edit Attribute window.
6. Select the Critical and Enabled boxes in the Add Resource window.
7. Click OK.
The following commands change the XP RAID Manager instance used for the XP Cluster Extension resource clx_web, and then add an additional instance to provide redundancy:
# hares -display clx_web -attribute RaidManagerInstances
# hares -modify clx_web RaidManagerInstances -update 90
# hares -modify clx_web RaidManagerInstances -add 22
The following example displays all attributes of the XP Cluster Extension resource clx_web:
# hares -display clx_web
3. Click Edit for the attribute you want to change.
4. Enter changes to the attribute value. For nonscalar attributes, use the + and x buttons to add or remove elements. Do not change the attribute's scope to local; all XP Cluster Extension attributes are global in scope.
5. Click OK to accept the change.
6.
Enabling and bringing an XP Cluster Extension resource online using the CLI
Syntax
hares -modify ClusterExtensionXP_resource Enabled 1
hares -online ClusterExtensionXP_resource -sys system_name
The following example enables and brings the XP Cluster Extension resource clx_web online:
# hares -modify clx_web Enabled 1
# hares -online clx_web -sys sunrise
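When the same enable and online sequence is needed for several resources, the two commands can be generated from the resource and system names. The following Python sketch is a minimal illustration; the function name enable_and_online_commands is an assumption, and the sketch only formats the hares commands shown above without executing them.

```python
def enable_and_online_commands(resource, system):
    """Return the two hares commands that enable a resource and bring it online."""
    for name in (resource, system):
        # Reject empty names and embedded whitespace, which would split the command line.
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid name: {name!r}")
    return [
        f"hares -modify {resource} Enabled 1",
        f"hares -online {resource} -sys {system}",
    ]


for command in enable_and_online_commands("clx_web", "sunrise"):
    print("# " + command)
```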
Enabling and bringing an XP Cluster Extension resource online using the VCS Cluster Manager GUI
1. Open the Cluster Explorer.
2. Right-click the resource name.
3. Select Online, and then select the system where you want to bring the resource online.
4.
Syntax
hares -offline ClusterExtensionXP_resource -sys system_name
hares -offprop ClusterExtensionXP_resource -sys system_name
The following examples take the XP Cluster Extension resource clx_web offline, either directly or by first propagating the offline request to all its parent resources:
# hares -offline clx_web -sys sunrise
# hares -offprop clx_web -sys sunrise
Taking an XP Cluster Extension resource offline using the VCS Cluster Manager GUI
1. Open the Cluster Explorer.
2. Right-click the resource name.
3. Select Offline or Offline Prop, and then select the system where you want to take the resource offline.
4. Click YES in the dialog box to confirm.
CAUTION: Failure to delete the monitored device group from the list of monitored device groups can cause data corruption if the ResyncMonitorAutoRecover attribute is set to YES.
If the service group is offline, you can remove the XP Cluster Extension resource from the service group. To remove the resource, see Deleting an XP Cluster Extension resource on page 87.
Log-level reporting
The default setting for the pair/resync monitor's log facility is log level WARNING in the syslog. Solaris does not log warning messages to syslog by default. To receive messages from the pair/resync monitor in case of XP Continuous Access Software link failures, add the following line to the /etc/syslog.conf file:
user.warning /var/adm/messages
This line ensures that you will be notified of XP Continuous Access Software link failures if you use the pair/resync monitor.
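Before relying on the pair/resync monitor for notifications, you might want to confirm that the syslog configuration contains the rule. The following Python sketch is a simplified check, not a full syslog.conf parser: the function name logs_user_warnings is an assumption, and it looks only for the literal user.warning selector, so broader rules (for example, a wildcard facility or a lower priority that also covers warnings) are not recognized.

```python
def logs_user_warnings(syslog_conf_text):
    """Return True if an uncommented rule lists the literal selector user.warning."""
    for raw in syslog_conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2:
            continue  # a rule needs a selector field and an action field
        if "user.warning" in fields[0].split(";"):
            return True
    return False


sample = "*.err;kern.notice /dev/sysmsg\nuser.warning\t/var/adm/messages\n"
print(logs_user_warnings(sample))
```

To check a real system, pass the contents of /etc/syslog.conf to the function instead of the sample string.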
In some cases, this behavior could lead to failed XP Cluster Extension resources:
• XP Cluster Extension uses XP RAID Manager instances to communicate with the remote XP disk array. Depending on the settings of the XP RAID Manager instance timeout parameter and the number of remote instances, the online operation could time out. This can happen if the local XP RAID Manager instance cannot reach the remote XP RAID Manager instance.
• XP Cluster Extension tries to resynchronize disk pairs and waits until the XP RAID Manager device group is in PAIR state if the ApplicationStartup attribute is set to RESYNCWAIT. Depending on the XP RAID Manager version and the XP firmware version, this could be a full resynchronization and may take longer than the online timeout interval. Even if the XP RAID Manager version and the XP firmware version allow a delta resynchronization, the delta between the primary and the secondary could be large enough for the copy process to exceed the online timeout value.
• The ResyncWaitTimeout attribute can automatically lead to failed XP Cluster Extension resources when set higher than the online timeout interval.
• If running in fence level ASYNC, the default value of the AsyncTakeoverTimeout attribute can cause the resource to fail because its value is set beyond the resource online timeout interval. This is done because the takeover process for fence level ASYNC can take much longer when slow communications links are in place. To prevent takeover commands from being terminated by the takeover timeout before finishing, measure the time to copy the installed XP disk array cache and adjust the resource online timeout interval according to the measured copy time. When measuring the copy time, measure only the slowest link used for XP Continuous Access Software. This ensures that the XP disk array cache can be transferred from the remote XP disk array, even in the event of a single surviving replication link between the XP disk arrays.
Because the failover environment is dispersed into two (or more) data centers, the failover time cannot be expected to be the same as it would be in a single data center with a single shared disk device. Therefore, adjust the online timeout values, the monitor interval of the XP Cluster Extension resource, and the service group using the XP Cluster Extension resource based on failover tests performed to verify the proper configuration setup.
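The timeout relationships described above can be checked with simple comparisons before you run failover tests. The following Python sketch is illustrative only: the function name timeout_warnings is an assumption, the online timeout of 600 seconds and the measured copy time of 900 seconds are hypothetical inputs, and the values 300 and 1800 are the ResyncWaitTimeout and AsyncTakeoverTimeout defaults shown in this guide.

```python
def timeout_warnings(online_timeout, resync_wait_timeout,
                     async_takeover_timeout, measured_copy_seconds):
    """List every value (in seconds) that exceeds the resource online timeout."""
    checks = [
        ("ResyncWaitTimeout", resync_wait_timeout),
        ("AsyncTakeoverTimeout", async_takeover_timeout),
        ("measured cache copy time", measured_copy_seconds),
    ]
    return [
        f"{name} ({value} s) exceeds the online timeout ({online_timeout} s)"
        for name, value in checks
        if value > online_timeout
    ]


for warning in timeout_warnings(online_timeout=600, resync_wait_timeout=300,
                                async_takeover_timeout=1800, measured_copy_seconds=900):
    print(warning)
```

An empty result means none of the three values exceeds the online timeout. It does not replace the failover tests, because the actual takeover time depends on the links and disk arrays in use.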
system list (which should be a local system). This fails because the state of the local XP disk array has not changed. The service group fails until the service group is brought online on a system connected to the remote XP disk array. The service group online process takes longer and it does not access the VCS configuration file.
XP Cluster Extension resources go offline because the primary volume state changes from P-VOL_PAIR to P-VOL_PSUE and the secondary volume state changes from S-VOL_PAIR to EX_NORMT. The state combination P-VOL_PSUE and EX_NORMT is not designed to be handled automatically because the remote side (remote XP RAID Manager/disk array), which has no status information available, could have more current data than the primary (P-VOL_PSUE) site. In this particular case, you are required to investigate data currency and determine the appropriate action to be taken.
or
1. Create the forceflag resource_name.forceflag in the ApplicationDir path. Default: /etc/opt/hpclx/
2. Bring the XP Cluster Extension resources online.
3. Depending on the attributes set for the resources, you might need to manually resynchronize the XP Continuous Access disk pairs.
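Creating the forceflag can be scripted once you have investigated data currency and decided to proceed. The following Python sketch assumes that an empty file named resource_name.forceflag is sufficient, which this guide does not state explicitly; the function name create_forceflag is also an assumption. The demonstration writes to a temporary directory rather than to the default ApplicationDir.

```python
import tempfile
from pathlib import Path


def create_forceflag(resource_name, application_dir="/etc/opt/hpclx/"):
    """Create an empty <resource_name>.forceflag file in ApplicationDir and return its path."""
    flag = Path(application_dir) / f"{resource_name}.forceflag"
    flag.touch()  # assumption: the file only needs to exist; its contents are not used
    return flag


# Demonstration in a temporary directory so that no real configuration is touched.
with tempfile.TemporaryDirectory() as tmp:
    flag = create_forceflag("clx_web", tmp)
    print(flag.name, flag.exists())
```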
The configuration example in Figure 9 on page 93 assumes the following information about the cluster:
• There are four nodes in the cluster: Host1, Host2, Host3, and Host4.
• There are two XP disk arrays with serial numbers 30047 and 30053.
• The device group clxwebvgs is configured in the XP RAID Manager /etc/horcm101.conf file.
• XP Cluster Extension invokes the pre-executable script clxweb_pre_takeover.sh and the post-executable script clxweb_post_takeover.sh. These files can be an executable script or a program of your choice.
• For RHCS, the configuration file /etc/opt/hpclx/conf/CLXXP.config is associated with the RHCS service CLXWEB that is configured to use the XP Cluster Extension resource agent script. RHCS invokes the resource agent script to start the CLXWEB service, which checks the disk pair states before the volume groups vgweb and vghtdocs are activated and the web server is started. The XP RAID Manager device group clxwebvgs includes all disks for the LVM volume groups vgweb and vghtdocs. The sample CLXXP.config file shows the contents of the configuration file with the described failover behavior.
• For SLE HA, the XP Cluster Extension resource configuration file /etc/opt/hpclx/conf/CLXXP.config is associated with the SLE HA resource CLXWEB. SLE HA invokes the resource agent script, /usr/lib/ocf/resource.d/heartbeat/CLXXP, which checks the disk pair states before the volume groups vgweb and vghtdocs are activated and the web server is started. The XP RAID Manager device group clxwebvgs includes all disks for the LVM volume groups vgweb and vghtdocs. The sample CLXXP.config file shows the contents of the configuration file with the described failover behavior.
Sample configuration file:
COMMON
LogLevel                info

APPLICATION             CLXWEB
XPSerialNumbers         30047 30053
RaidManagerInstances    101
DeviceGroup             clxwebvgs      # raid manager device group
DC_A_Hosts              Host1 Host2    # systems in data center A
DC_B_Hosts              Host3 Host4    # systems in data center B

#optional parameter (only necessary if other than default)
FenceLevel              data           # values: data | never | async
ApplicationStartup      resyncwait     # values: fastfailback | resyncwait
AutoRecover             yes            # possible values: yes | no
DataLoseMirror          no             # possible values: yes | no
DataLoseDataCenter      no             # possible values: yes | no
PreExecScript           /etc/opt/hpclx/clxweb_pre_takeover.sh
PostExecScript          /etc/opt/hpclx/clxweb_post_takeover.sh
The ApplicationStartup object is set to RESYNCWAIT to configure the service (RHCS) or resource group (SLE HA) to wait for a pair resynchronization in the event that the service (RHCS) or resource group (SLE HA) fails over to an adoptive node. The AutoRecover object is set to YES, which means that you use XP Cluster Extension capabilities to automatically recover suspended disk pair states. The DataLoseMirror object and DataLoseDataCenter object are set to NO, which means XP Cluster Extension does not allow you to start the service (RHCS) or resource group (SLE HA) automatically if the disk pair is suspended or a takeover operation leads to a suspended disk pair.
XP Cluster Extension enables read/write access to the disk groups used for the web server service or resource group. Activation of the volume groups depends on a successful return code from XP Cluster Extension. The logical volumes can be mounted only when their volume groups are active and XP Cluster Extension allows read/write access to the disk group. After the file system for the web server's executables and content data is mounted and checked, the NIC is configured with the web server's IP address.
Configuration overview
1. Create an RHCS shared resource. For instructions, see Creating an RHCS XP Cluster Extension shared resource on page 95.
2. Create an RHCS service using the XP Cluster Extension shared resource. For instructions, see Creating an RHCS service using the XP Cluster Extension shared resource on page 97.
3. Configure the pair/resync monitor if you plan to use the pair/resync feature (optional). For instructions, see Configuring the pair/resync monitor on page 108.
4. Activate the pair/resync monitor (optional). For instructions, see Activating the pair/resync monitor on page 109.
8.
9. Click Submit.
9. Click OK.
10. Select File > Save to save the configuration changes. The service configuration in /etc/cluster/cluster.conf is updated.
11. Click Send to Cluster to propagate the cluster configuration to the other cluster nodes.
Configuration overview
1. Create a service at the root of the dependency tree using the XP Cluster Extension shared resource created in Creating an RHCS XP Cluster Extension shared resource on page 95. This ensures that the XP Cluster Extension resource is the first resource to start in a service. All other resources in this service should be configured as child resources to XP Cluster Extension. Use one of the following procedures:
Using Conga to create a service, page 97
Using system-config-cluster to create a service, page 98
2. Create a configuration file. For instructions, see Creating the XP Cluster Extension resource configuration file, page 99.
3. Test the service configuration. For instructions, see Testing the service configuration, page 100.
10. Select an XP Cluster Extension shared resource from the Use an existing global resource menu.
11. Click Submit. Conga saves the configuration information and updates all of the other cluster nodes. NOTE: To add additional resources to the service, use the Add a child feature.
6. Enter the service name in the Name box, and then click OK. The Service Management dialog box appears.
IMPORTANT: The service name must match the name that is defined for the APPLICATION property in the configuration file CLXXP.config.
7. Click Add a Shared Resource to this service. The Resource Configuration dialog box appears.
8. Select CLXXP in the Select a Resource Type menu, and then click OK.
9. To add additional resources to the service, select the XP Cluster Extension resource and click Attach a new Private Resource to the Selection. Select the resource to be configured and provide the required resource agent parameters.
10. Click Close to close the Service Management window.
11. Select File > Save to save the configuration changes. The service configuration in /etc/cluster/cluster.conf is updated.
12. Click Send to Cluster to propagate the cluster configuration to the other cluster nodes.
3. In the configuration file (CLXXP.config), enter the appropriate values for:
XPSerialNumbers
RaidManagerInstances
DeviceGroup
DC_A_Hosts
DC_B_Hosts
ResyncMonitor
FenceLevel
DataLoseMirror
DataLoseDataCenter
NOTE: For more information about these values, see Chapter 8 on page 123.
For example:
APPLICATION CLXWEB
XPSerialNumbers 30060 30080
RaidManagerInstances 101
DeviceGroup vgnetscape
DC_A_Hosts sys1A sys2A
DC_B_Hosts sys1B sys2B
ResyncMonitor yes
FenceLevel never
DataLoseMirror yes
DataLoseDataCenter yes
IMPORTANT: If you are using Device Mapper Multipath, configure the multipath_rescan.sh script as a PostExecScript. For more information, see Rescanning multipath devices on page 106.
4. Copy the updated CLXXP.config file to the other cluster nodes.
4.
5. Relocate the service to a remote data center node.
a. Verify that the disks CLXWEB uses are in the PAIR state:
# export HORCMINST=101
# pairdisplay -fcx -g clxwebvgs
b. Move the service CLXWEB to Host3. Verify that the service has successfully moved and started on Host3:
# clusvcadm -r CLXWEB -m Host3
# clustat -s CLXWEB
c. Verify that the disk pairs are now in read/write mode on the remote storage system:
# pairdisplay -fcx -g clxwebvgs
d. After verifying that the service CLXWEB, including XP Cluster Extension, can be run on each system in the cluster, move the service back to its primary system:
# clusvcadm -r CLXWEB -m Host1
# clustat -s CLXWEB
# pairdisplay -fcx -g clxwebvgs
NOTE: For instructions on stopping or disabling an XP Cluster Extension service using Conga or the Cluster Configuration Tool, see the RHCS documentation.
Configuration overview
1. Create and configure an XP Cluster Extension resource. For instructions, see Creating and configuring an XP Cluster Extension resource on page 102.
2. Configure the pair/resync monitor if you plan to use the pair/resync feature (optional). For instructions, see Configuring the pair/resync monitor on page 108.
3. Activate the pair/resync monitor (optional). For instructions, see Activating the pair/resync monitor on page 109.
3. In the configuration file (CLXXP.config), enter the appropriate values for:
XPSerialNumbers
RaidManagerInstances
DeviceGroup
DC_A_Hosts
DC_B_Hosts
ResyncMonitor
FenceLevel
DataLoseMirror
DataLoseDataCenter
NOTE: For more information about these values, see Chapter 8 on page 123.
For example:
APPLICATION CLXWEB
XPSerialNumbers 30060 30080
RaidManagerInstances 101
DeviceGroup vgnetscape
DC_A_Hosts sys1A sys2A
DC_B_Hosts sys1B sys2B
ResyncMonitor yes
FenceLevel never
DataLoseMirror yes
DataLoseDataCenter yes
IMPORTANT: If you are using Device Mapper Multipath, configure the multipath_rescan.sh script as a PostExecScript. For more information, see Rescanning multipath devices on page 106.
4. Copy the updated file to the other cluster nodes.
4. Select the following options for the XP Cluster Extension resource:
Name: Value
Class: ocf
Provider: heartbeat
Type: CLXXP
5. Configure the instance attributes for the resource by selecting the app parameter. In the Value box, enter the APPLICATION tag name configured in the XP Cluster Extension configuration file (/etc/opt/hpclx/conf/CLXXP.config).
6. Configure the start, stop, and monitor operations for the XP Cluster Extension resource.
7. Add additional primitive resources to the group. For example: If LVM and File System are used as the second and third resources of the group, the Summary dialog box is similar to the following:
8. Add a resource colocation constraint between the resource group ID assigned in Step 2 and the last resource in the group hierarchy.
9. Set location constraints for the group ID to achieve the required failover order for the group.
10. Set the operation defaults to control failover behavior. To specify that a failed resource attempts to restart on the same node or on another node in the cluster, use the following settings:
Name: Value
requires: nothing
on-fail: restart
timeout: 30
11. Set the migration-threshold value. This value defines the number of failures that can occur on a node before the node becomes ineligible to host the resource and the resource fails over to another node. Set this value to 1 for XP Cluster Extension.
12. Disable automatic failback by using resource constraints and setting resource-stickiness to the lowest value compared with the other resource location constraints.
Value
Enter a resource group ID. true true
The Linux HA Management Client prompts you to enter the resource type details.
3. Set the value of the app parameter to the APPLICATION tag name configured in the XP Cluster Extension resource configuration file (/etc/opt/hpclx/conf/CLXXP.config).
NOTE: The resource hierarchy depends on the order in which resources are added. Always add XP Cluster Extension resources as the first resource in a group.
4. Add an LVM resource to the group created in Step 2. Set the value of the volgrpname parameter to the name of the volume group managed by the XP Cluster Extension resource.
5. Add a Filesystem resource. Set the following values as appropriate for your environment:
device
directory
fstype
6. Configure the start, stop, and monitor operations for the XP Cluster Extension resource and all other resources.
7. Add a location constraint to the resource group ID assigned in Step 2.
8. Add an Expression to the location constraint. For information on the settings to enter, see the SLE HA documentation.
9. Select the Expression you added in Step 8 and enter a value in the Score box. A high score indicates a high priority for the selected location constraint. For example, if there are three nodes N1, N2, and N3 and if N1 has highest priority followed by N2, and then N3, create three location constraints for the same resource, and assign the scores as 1000, 500, and 200, respectively.
10. Add a resource colocation constraint between the resource group ID assigned in Step 2 and the last resource in the group hierarchy.
11. Right-click the resource in the Linux HA Management Client GUI, and then select Start.
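The scoring example in the procedure above can be illustrated with a short Python sketch. It only models the idea that a higher location score means a more preferred node; the function name failover_order is hypothetical, and the real cluster manager also weighs other settings (such as resource-stickiness) that this sketch ignores.

```python
def failover_order(scores):
    """Return node names sorted from the highest to the lowest location score.

    `scores` maps a node name to the score entered for its location
    constraint. This is a simplified illustration, not the cluster
    manager's actual placement algorithm.
    """
    return sorted(scores, key=lambda node: scores[node], reverse=True)


# Scores from the example in the procedure: N1 first, then N2, then N3.
example_scores = {"N1": 1000, "N2": 500, "N3": 200}
preferred = failover_order(example_scores)
print("Preferred failover order:", ", ".join(preferred))
```

Running the sketch with different score values before entering them in the GUI is a quick way to confirm that the intended order is the one the scores express.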
4. 5.
1. Copy the multipath_rescan.sh script to the /etc/opt/hpclx/conf folder, and rename the file as follows:
RHCS: multipath_rescan_ServiceName.sh
SLE HA: multipath_rescan_ResourceGroupName.sh
2. Open the script file and enter the user-friendly names of all multipath devices that are in the volume groups configured for the RHCS service or SLE HA resource group. For instructions on finding the user-friendly name of a multipath device, see Finding the user-friendly name of a multipath device on page 107. In the following example, you specify the user-friendly names (mpathab, mpathac, and mpathad) for the variable MULTIPATH_DEVICES:
MULTIPATH_DEVICES=( mpathab mpathac mpathad )
3. Enter the multipath_rescan.sh script for the PostExecScript object in the Cluster Extension resource configuration file. You must specify the full path name of the multipath_rescan.sh script. For example, for an RHCS service:
PostExecScript /etc/opt/hpclx/conf/multipath_rescan_ServiceName.sh
2. Obtain the SCSI ID for a multipath device. Use the scsi_id command for SUSE Linux Enterprise Server, and the hp_scsi_id command for Red Hat Enterprise Linux.
SUSE Linux Enterprise Server:
[root@node1 ]# scsi_id -guns /block/dm-21
360060e8014424600000142460000039d
3. Use the multipath command to obtain the user-friendly name for the multipath device's generated SCSI ID. In the following example, mpathq is the user-friendly name of a multipath device:
[root@node1]# multipath -ll | grep 360060e8014424600000142460000039d | awk '{print $1}'
mpathq
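The grep and awk pipeline in step 3 can also be expressed as a small Python function, which may be convenient when the lookup is scripted for many devices. This is a minimal sketch: the function name friendly_name is hypothetical, and the sample line only imitates the general shape of multipath -ll output (name first, SCSI ID somewhere on the same line); the remaining fields are illustrative.

```python
def friendly_name(multipath_output, scsi_id):
    """Return the first field of the first line that contains scsi_id.

    Mirrors `grep <scsi_id> | awk '{print $1}'`; returns None when no
    line mentions the SCSI ID.
    """
    for line in multipath_output.splitlines():
        if scsi_id in line:
            fields = line.split()
            if fields:
                return fields[0]
    return None


# Illustrative text only; real `multipath -ll` output has more lines and fields.
sample_output = (
    "mpathq (360060e8014424600000142460000039d) dm-21 vendor,product\n"
    "mpathr (360060e8014424600000142460000039e) dm-22 vendor,product\n"
)
name = friendly_name(sample_output, "360060e8014424600000142460000039d")
print("User-friendly name:", name)
```

In practice the text would come from running multipath -ll on the node, and the returned name is the value to list in MULTIPATH_DEVICES.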
2. Choose the port that the pair/resync monitor will use, and then add the following line to the services file:
clxmonitor nnnnn/tcp
where nnnnn is the port number. For example:
clxmonitor 22222/udp # CLX Pair/Resync Monitor
clxmonitor 22222/tcp # CLX Pair/Resync Monitor
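To confirm that the services file really contains the clxmonitor entry, the file can be scanned with a few lines of Python. The sketch below is an illustration under assumptions: service_port is a hypothetical helper, and it parses text in the usual "name port/protocol # comment" layout rather than querying the operating system.

```python
def service_port(services_text, name, protocol="tcp"):
    """Return the port registered for `name`/`protocol`, or None if absent."""
    for raw in services_text.splitlines():
        # Drop the trailing comment, then split the entry into fields.
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == name:
            port, _, proto = fields[1].partition("/")
            if proto == protocol and port.isdigit():
                return int(port)
    return None


# The two example entries shown above.
services_text = (
    "clxmonitor 22222/udp # CLX Pair/Resync Monitor\n"
    "clxmonitor 22222/tcp # CLX Pair/Resync Monitor\n"
)
port = service_port(services_text, "clxmonitor", "tcp")
print("clxmonitor TCP port:", port)
```

A result of None corresponds to the situation in which no port number is specified in the services file for clxmonitor.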
Timing considerations
XP Cluster Extension gives priority to XP disk array operations over cluster software operations. If XP Cluster Extension invokes disk pair resynchronization or gathers information about the remote XP disk array, XP Cluster Extension waits until the requested status information is reported. This ensures the priority of data integrity over cluster software failover processes. This behavior can lead to failed resources, as follows:
XP Cluster Extension uses XP RAID Manager instances to communicate with the remote XP disk array. Depending on the setting of the XP RAID Manager instance timeout parameter and the number of remote instances, the service or resource group start operation can time out. This can occur if the local XP RAID Manager instance cannot reach the remote XP RAID Manager instance. In an SLE HA environment, the timeout value defined for the start operation can be adjusted to the appropriate value to avoid this situation. In an RHCS environment, the timeout value depends on the timeout value specified in the script resource agent (/usr/share/cluster/script.sh).
XP Cluster Extension tries to resynchronize disk pairs and waits until the XP RAID Manager device group is in the PAIR state if the ApplicationStartup object is set to RESYNCWAIT. XP RAID Manager and the XP firmware fully support delta resynchronization; however, the delta between the primary and secondary disks can be large enough for the copy process to exceed the service or resource group startup timeout value. The ResyncWaitTimeout object can cause the resource to fail if its value is higher than the resource startup timeout value.
If running in fence-level ASYNC, the default value of AsyncTakeoverTimeout can cause the resource to fail if its value is set beyond the recommended startup timeout value. This is done because the takeover process for fence-level ASYNC can take longer when communication links are slow.
To prevent the takeover timeout from terminating the takeover commands, measure the time required to copy the installed XP disk array cache and adjust the resource startup timeout interval. When measuring the copy time, measure only the slowest link used for XP Continuous Access Software. This ensures that the XP disk array cache can be transferred from the remote XP disk array, even in the event of a single surviving replication link between the XP disk arrays.
NOTE: Because the failover environment is dispersed over two or more data centers, the failover time cannot be expected to be the same as that of a single data center with a single shared disk device. Therefore, you must adjust the service or resource group startup timeout value and the monitor interval of the XP RAID Manager device group based on failover tests you perform to verify the proper configuration setup.
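The relationship between the startup timeout and the XP Cluster Extension timeout objects described above can be checked with a small Python sketch before failover tests are run. The function timeout_warnings is hypothetical, and the sketch assumes that all three values are expressed in the same unit (seconds here); the numbers in the usage lines are made up for illustration.

```python
def timeout_warnings(start_timeout, resync_wait_timeout=None,
                     async_takeover_timeout=None):
    """List the timeout objects that exceed the resource startup timeout.

    All values are assumed to be in seconds. A value of None means the
    object is not being checked.
    """
    warnings = []
    if resync_wait_timeout is not None and resync_wait_timeout > start_timeout:
        warnings.append("ResyncWaitTimeout exceeds the startup timeout")
    if async_takeover_timeout is not None and async_takeover_timeout > start_timeout:
        warnings.append("AsyncTakeoverTimeout exceeds the startup timeout")
    return warnings


# Illustrative values only; measure your own environment as described above.
for message in timeout_warnings(start_timeout=300,
                                resync_wait_timeout=600,
                                async_takeover_timeout=120):
    print("WARNING:", message)
```

An empty result does not prove that the configuration is safe; it only shows that neither object is larger than the startup timeout that was entered.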
Timing considerations
XP Cluster Extension is designed to prioritize XP disk array operations over application service startup operations. If XP Cluster Extension invokes disk pair resynchronization operations or gathers information about the remote XP disk array, XP Cluster Extension waits until the requested status information is reported. This prioritizes data integrity over application service startup and failover behavior. Because the takeover timing depends on the configuration of your XP RAID Manager environment and the settings in UCF.cfg, these considerations must be evaluated:
XP Cluster Extension uses XP RAID Manager instances to communicate with the remote XP disk array. Depending on the settings of the XP RAID Manager instance timeout parameter and the number of remote instances, the online operation could time out. This can also happen if clxrun is used in a script or called by another program and the local XP RAID Manager instance cannot reach the remote XP RAID Manager instance. See Setting up XP RAID Manager on page 20 for more information.
If the ApplicationStartup attribute is set to RESYNCWAIT, XP Cluster Extension tries to resynchronize disk pairs and waits until the XP RAID Manager device group is in PAIR state. In some versions of XP RAID Manager and XP firmware, a full resynchronization is done. Depending on the amount of data to be transferred, it could take hours to resynchronize. If this is the case, clxrun may take some time to return. Do not stop clxrun; use it to check the status of the associated XP RAID Manager device groups. Even if the XP RAID Manager version and the XP firmware version allow a delta resynchronization, the amount of delta data to be transferred between the primary and the secondary could be large enough for the copy process to take a while.
If running in fence level ASYNC, the default value of the AsyncTakeoverTimeout is set to a very high number. This is done because the takeover process for fence level ASYNC can take much longer when slow communications links are in place; adjust this value after measuring the XP Continuous Access Software environment. See AsyncTakeoverTimeout on page 130 for more details.
To prevent premature termination of the takeover commands by the takeover timeout, measure the time to copy the installed XP family disk array cache and adjust the resource online timeout interval according to the measured copy time. Use only the slowest XP Continuous Access Software link to measure the copy time. This ensures that the XP disk array cache can be transferred from the remote XP disk array, even in the event of a single surviving replication link between the XP family disk arrays.
In general, because the failover environment is dispersed into two (or more) data centers, the failover time cannot be expected to be the same as it would be in a single data center with a single shared disk device.
APPLICATION netscape #the application service
DeviceGroup netscapedg #RM dev group for the app service
RaidManagerInstances 22 90 #RM instance number for dev group
XPSerialNumbers 34001 34005 #local and remote XP Serial Numbers
DC_A_Hosts eserv1 eserv2 #data center A hostnames
DC_B_Hosts eserv3 eserv4 #data center B hostnames
CLI commands
This section describes the following CLI commands: clxrun, page 113 clxchkmon, page 114
clxrun
Check disk set
Description
clxrun can be used to manually prepare the application service's disk set before an existing application service start procedure is invoked. When using clxrun, the status of the associated XP RAID Manager device group is checked to ensure that access to the disk set will occur under data consistency and concurrency situations only. clxrun must be invoked before the application service disk set can be activated; it is considered an online-only program. However, the CLI features provide the same disaster tolerance features as the integrated versions of XP Cluster Extension. NOTE: Execution of clxrun does not start the pair/resync monitor.
Syntax
clxrun [-version] [-forceflag] app_name
Arguments
-version: Displays the XP Cluster Extension version
-forceflag: Forces startup
app_name: The application name configured in the user configuration file (UCF.cfg)
The clxrun program expects only one parameter as the default setting. This parameter is used to uniquely identify the application service in the APPLICATION section of the user configuration file. clxrun first checks for the forceflag option. When using clxrun, it is not necessary to create an application_name.forceflag file. This option, however, must be specified first if used.
CAUTION: The forceflag option is implemented as an emergency switch to manually activate your XP disk set. If the forceflag option has been specified, XP Cluster Extension will not check any consistency or concurrency rules before activating the XP disk set.
Return codes
clxrun exits with one of the following return codes:
0 OK: Application service can be started.
ERROR_GLOBAL: Application service should not start on any system in either site on either disk array.
ERROR_DC: Application service should not start on any system in the local site on the local disk array.
ERROR_LOCAL: Application service should not start on this system.
Example 1
# clxrun sap
Example 1 is based on the assumption that you have defined an APPLICATION tag named sap in the UCF.cfg file and you have specified all necessary objects, including the DeviceGroup object, to map the XP disk set to the application service sap. XP Cluster Extension will check the disk set mapped to the application service sap, run the necessary takeover procedure and return one of the return codes mentioned in the return code table.
Example 2
# clxrun -forceflag sap
Example 2 is based on the assumption that you have defined an APPLICATION tag named sap in the UCF.cfg file and you have specified all necessary objects, including the DeviceGroup object, to map the XP disk set to the application service sap. XP Cluster Extension will check the XP disk set mapped to the application service sap, and run the necessary takeover procedure to enable read/write access to the XP disk set.
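When clxrun is called from a start script rather than typed by hand, the two rules above (the forceflag option comes first, and only return code 0 means the application service can be started) can be wrapped as in the following Python sketch. The helper names build_clxrun_command and may_start are hypothetical; the sketch treats every non-zero return code as "do not start" because this section does not give numeric values for the ERROR_ codes.

```python
import subprocess


def build_clxrun_command(app_name, force=False):
    """Build the clxrun argument list; -forceflag must precede app_name."""
    command = ["clxrun"]
    if force:
        command.append("-forceflag")
    command.append(app_name)
    return command


def may_start(app_name, force=False, runner=subprocess.call):
    """Run clxrun and report whether the application service may be started.

    `runner` takes the argument list and returns the exit code; it is a
    parameter so the logic can be exercised without clxrun installed.
    """
    return runner(build_clxrun_command(app_name, force)) == 0


# Dry run with a stand-in runner that pretends clxrun returned 0 (OK).
print(build_clxrun_command("sap", force=True))
print(may_start("sap", runner=lambda command: 0))
```

Keeping the force option as an explicit argument makes it harder to use the emergency switch by accident in an automated script.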
clxchkmon
Pair/resync monitor access program
Description
The clxchkmon utility program allows starting and stopping of the resynchronization features and queries to gather state information of the monitored device groups.
To update or remove a specific resource, use clxchkmon -n resource_name -g device_group. If -clx is not specified, the command is applied only to non-XP Cluster Extension resources. To update all non-XP Cluster Extension resources, use clxchkmon -t. To update XP Cluster Extension resources, use clxchkmon -clx -t.
Displaying resources
The following command displays all resources:
clxchkmon -show
The following command displays XP Cluster Extension resources only:
clxchkmon -clx -show
Removing resources
The following command removes only non-XP Cluster Extension resources:
clxchkmon -remove
The following command removes all XP Cluster Extension resources:
clxchkmon -clx -remove
CAUTION: If you respond Y (yes) to remove the combination, the resource will be removed from the list of resources to be monitored in the pair/resync monitor. If this is not an emergency removal attempt and the XP Cluster Extension resource is online, the previous procedure will lead to a failed resource, which will take all dependent resources offline and eventually force your application offline. Do not use this command to take your XP Cluster Extension resources offline.
Syntax
clxchkmon [-clx] [-s hostname] [-n resource_name -g device_group] [[-t monitor_interval | -autorecover mode | -remove [-force] | -show | -pid | -stopsrv | -log [error | warning | info | trace]]] [-p port_number]
where:
-s hostname: Specifies the name of a host.
-n resource_name: Specifies the resource (application) name as used in XP Cluster Extension.
-g device_group: Specifies an XP RAID Manager group name.
-t monitor_interval: Specifies interval in seconds to update registered monitor resources.
-autorecover mode: Specify YES to enable autorecovery, or NO to disable autorecovery for registered monitor resource.
-clx: Executes the command only for XP Cluster Extension resources.
-remove: Removes the resource from the monitor list.
-force: Disables user confirmation to remove resource.
-show: Displays monitored resources.
-pid: Returns the process ID of the pair/resync monitor.
-stopsrv: Stops the pair/resync monitor socket server.
-log: Sets the log level for the pair/resync monitor.
-p port_number: Specifies the port number to be used.
Return codes
clxchkmon exits with one of the following return codes:
0: Successful, or device group is in PAIR state.
1: Device group is not in PAIR state.
2: Resource/device group is not registered with the pair/resync monitor.
3: Pair/resync monitor (clxchkd) is not running.
4: Device group's pair status is pending.
10: Pair/resync monitor internal error.
11: Invalid argument to pair/resync monitor.
12: Pair/resync monitor received signal (control-c) interrupt.
13: Unknown status for device group.
14: No port number is specified in services file for clxmonitor.
16: Invalid use of the -clx option on a non-XP Cluster Extension resource or XP Cluster Extension resource specified without the -clx option. XP RAID Manager error.
For more information, see Monitoring and resynchronizing device groups on page 143.
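Scripts that call clxchkmon usually need to turn the numeric return code into a readable message. The following Python sketch shows one way to do that. It is an illustration only: the names CLXCHKMON_CODES, describe_return_code, and in_pair_state are hypothetical, and the table covers return codes 0 through 14 as listed in this section; higher codes are deliberately left out and reported as unlisted.

```python
# Return codes 0-14 as listed in this section; codes above 14 are left out.
CLXCHKMON_CODES = {
    0: "Successful, or device group is in PAIR state.",
    1: "Device group is not in PAIR state.",
    2: "Resource/device group is not registered with the pair/resync monitor.",
    3: "Pair/resync monitor (clxchkd) is not running.",
    4: "Device group's pair status is pending.",
    10: "Pair/resync monitor internal error.",
    11: "Invalid argument to pair/resync monitor.",
    12: "Pair/resync monitor received signal (control-c) interrupt.",
    13: "Unknown status for device group.",
    14: "No port number is specified in services file for clxmonitor.",
}


def describe_return_code(code):
    """Return the documented text for a code, or a generic fallback."""
    return CLXCHKMON_CODES.get(code, "Unlisted return code %d" % code)


def in_pair_state(code):
    """Only return code 0 reports success or a device group in PAIR state."""
    return code == 0


for code in (0, 3, 99):
    print(code, describe_return_code(code))
```

A monitoring script could call in_pair_state first and fall back to describe_return_code for logging when the check fails.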
State: Description
P-VOL: The primary (master) disk of a disk pair.
S-VOL: The secondary (slave) disk of a disk pair.
SMPL: A disk with no pair affinity to any other disk. (This could be shown in pairdisplay outputs for your XP Continuous Access Software disk if you accidentally exported the XP Business Copy Software environment variable HORCC_MRCF. In such a case, the MU number field will not be empty.)
PAIR: The disk is either a primary disk or a secondary disk. If both (P-VOL and S-VOL) disks are in PAIR state, XP Continuous Access Software updates the secondary disk based on the primary disk. If you see only one disk in PAIR state (while the second disk is in another state), one of the following has occurred:
The pair affinity on only one site of the disk pair was deleted.
A takeover command has been invoked on the secondary site, while no data has been written to the primary site and the XP Continuous Access Software link was down.
A takeover command has been invoked on the primary site with the fence level configured to DATA to release the fenced disk, while the XP Continuous Access Software link was down. (The secondary disk would stay in PAIR state.)
PSUS: The pair affinity has been manually suspended or a takeover operation has been invoked on the secondary site with the fence level configured to NEVER. (In this case, the secondary disk would have the state SSUS-SSWS.)
SSUS: The pair affinity has been manually suspended or a takeover operation has been invoked on the secondary site. In this case, the secondary disk would have the state SSWS if you invoke pairdisplay with the -fc option. In fence level ASYNC, the disk could also show PFUL or PFUS when using the -fc option. Only the secondary disk could show SSUS.
SSUS - SSWS: With the -fc option of pairdisplay, you can check whether somebody manually suspended the pair or a takeover command had been invoked. A prior takeover command is indicated by the SSWS state. In this case, the secondary disk is mandatory and a resynchronization can be done only from the S-VOL site.
PSUE: The disk is in a failure mode. Either the XP Continuous Access Software link is down, or the disk must be replaced.
PDUB: The disk is in a failure mode. Either the XP Continuous Access Software link is down, or the disk must be replaced. This is a special state of PSUE. If you have configured several disks into a LUSE configuration, where several LDEVs are combined to create an extended size disk and one or more disks are in an error condition, this state will be shown.
PFUL: This state is used to indicate that a threshold of the side file area in the XP disk array cache has been reached. This state can be seen with fence level ASYNC only. See the HP XP Continuous Access Software documentation for more information.
PFUS: This state is used to indicate that the side file is full and the XP disk array was not able to transfer the cache content to the remote XP disk array for a certain time. The XP disk pair has been suspended to continue processing host I/O. This state can be seen with fence level ASYNC only. See the HP XP Continuous Access Software documentation for more information.
Recovery sequence
To recover from a certain server or XP Continuous Access Software link failure:
1. Start the XP RAID Manager instances on both local and remote servers:
Linux/UNIX
export HORCMINST=instance_number
horcmstart.sh instance_number
Windows
set HORCMINST=instance_number
HORCMSTART instance_number
2. Gather general pair status information:
pairdisplay -g device_group
3. Display the pair status information after a failed swap-takeover (the S-VOL state is SSWS):
pairdisplay -g device_group -fc
4. To recover from these states, invoke the following command from the S-VOL side:
pairresync -swaps -c 15 -g device_group
If the pair needs to be used on the old primary side, the following commands must be invoked from the primary side:
pairresync -swapp -c 15 -g device_group
horctakeover -g device_group
5. Display the pair status information after a P-VOL takeover (local P-VOL PSUS; remote S-VOL PAIR):
pairdisplay -g device_group -fc
To recover from these states, invoke the following command from the P-VOL side:
pairresync -c 15 -g device_group
CAUTION: The application must be shut down and the file systems unmounted before a fenced disk in fence level DATA can be set in read/write mode again. After the P-VOL takeover, the file system must be checked before it can be mounted. Any other recovery procedure could lead to unrecoverable file systems.
If a horctakeover command results in the S-VOL or P-VOL becoming SMPL and none of the disks in the device group has been written to, you can recover from the situation by splitting the remaining P-VOL or S-VOL to SMPL:
pairsplit [-S | -R] -g device_group
After splitting the pair, the pair can be re-created without copying its content using:
paircreate -nocopy -c 15 -f fence_level -g device_group -v [r | l]
If a horctakeover command results in the S-VOL or P-VOL becoming SMPL and data was written to one of the disks in the device group, you can recover from the situation by splitting the remaining P-VOL or S-VOL to SMPL:
pairsplit [-S | -R] -g device_group
After being split, the pair can be re-created with a full copy using:
paircreate -c 15 -f fence_level -g device_group -v [r | l]
To ensure that a certain pair state has been established, invoke the event wait command:
pairevtwait -g device_group -t time_to_wait -s pair_state
RHCS and SLE HA
XP Cluster Extension integration with RHCS and SLE HA uses an XP Cluster Extension resource configuration file. The objects and format in the configuration file are the same as the UCF.cfg file. For more information, see Chapter 5 on page 93.
VERITAS Cluster Server
Integrating XP Cluster Extension with VERITAS Cluster Server does not require a user configuration file when the standard environment for XP Cluster Extension is used. The XP Cluster Extension objects that are integrated with VERITAS Cluster Server are configurable as resource attributes in the cluster software. For more information, see Chapter 4 on page 75.
File structure
The configuration file consists of a COMMON section and an APPLICATION section. These sections are distinguished by control tags. XP Cluster Extension uses the following objects as control tags:
COMMON
APPLICATION
Objects have one of the following formats:
tag: A definition of an object; for example, COMMON or APPLICATION
integer: A number; for example, a timeout value
string: A name, which can include alphabetic and numeric characters and underscores; for example, an application startup value
list: A list of space-separated strings, for example, a list of host names (lists of numbers are stored as lists of strings)
Text that is a comment starts with the pound (#) symbol and continues until the end of the line. Comments can start on a new line or be part of a line specifying an object.
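The file structure described above (control tags, space-separated values, and # comments) is simple enough to read with a short script, for example to compare the copies of the file on different cluster nodes. The following Python sketch is an illustration under assumptions, not the parser used by XP Cluster Extension: parse_config is a hypothetical helper, every value is kept as a list of strings, and values that contain spaces (such as a Windows LogDir path) would need extra handling.

```python
def parse_config(text):
    """Parse configuration text into (common, applications).

    `common` maps object names to lists of strings. `applications` maps
    each APPLICATION name to its own dictionary of objects.
    """
    common = {}
    applications = {}
    current = None  # dictionary of the section currently being read
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # a comment runs to end of line
        if not line:
            continue
        name, *values = line.split()
        if name == "COMMON":
            current = common
        elif name == "APPLICATION":
            if not values:
                raise ValueError("APPLICATION requires a service name")
            current = applications.setdefault(values[0], {})
        elif current is None:
            raise ValueError("object %r appears before any control tag" % name)
        else:
            current[name] = values
    return common, applications


sample = """
COMMON
LogLevel info
APPLICATION netscape #the application service
DeviceGroup netscapedg
RaidManagerInstances 22 90#RM instance number for dev group
DC_A_Hosts eserv1 eserv2 #data center A hostnames
"""
common, applications = parse_config(sample)
print(common)
print(applications["netscape"]["DC_A_Hosts"])
```

Storing every value as a list of strings follows the note above that lists of numbers are stored as lists of strings; callers convert single values such as timeouts to integers when they need them.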
Objects are supported according to the requirements or capabilities of the cluster software, as shown in Table 4 on page 125.
Table 4 Cluster software supported objects
Columns: System object, CLI, HACMP, MSCS, VCS, RHCS and SLE HA
COMMON objects: LogDir, LogLevel, SearchObject, VcsBinPath
APPLICATION objects: ApplicationDir, ApplicationStartup, AsyncTakeoverTimeout, AutoRecover, BCEnabledA, BCEnabledB, BCMuListA, BCMuListB, BCResyncEnabledA, BCResyncEnabledB, BCResyncMuListA, BCResyncMuListB, ClusterNotifyCheckTime, ClusterNotifyWaitTime, DataLoseDataCenter, DataLoseMirror, DC_A_Hosts, DC_B_Hosts, DeviceGroup, FastFailbackEnabled, FenceLevel, Filesystems, JournalDataCurrency, LocalDCLMForNonPAIRDG, PostExecCheck, PostExecScript, PreExecScript, RaidManagerInstances, ResyncMonitor, ResyncMonitorAutoRecover, ResyncMonitorInterval, ResyncWaitTimeout, StatusRefreshInterval, Vgs, XPSerialNumbers
COMMON objects
The COMMON section is used to set the environment of XP Cluster Extension. The COMMON tag can appear in the configuration file only once. The COMMON object does not require any value. Objects of the type COMMON can appear only one time. Those objects must be placed after the COMMON tag in the configuration file. If the default values fit your environment, there is no need to specify them in the file.
COMMON
Format: Tag
Description: Distinguishes between general (common) and application-specific objects.
LogDir
Format: String
Description: (Optional) Defines the path to the XP Cluster Extension log file.
Default value:
Linux/UNIX: /var/opt/hpclx/log
Windows: %ProgramFiles%\Hewlett-Packard\Cluster Extension XP\log
LogLevel
Format: String
Description: (Optional) Defines the logging level used by XP Cluster Extension.
Valid values:
error (default): Logs only error messages for events that are unrecoverable.
warning: Logs error messages and warning messages for events that are recoverable.
info: Logs error messages, warning messages, and additional information, such as disk status.
debug: Logs error messages, warning messages, info messages, and messages that report on execution status; useful for troubleshooting.
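The LogLevel values are cumulative: each level logs everything that the previous level logs, plus more. The following sketch only illustrates that ordering. The function name is_logged is hypothetical and is not part of XP Cluster Extension.

```python
# Logging levels from least to most verbose, as listed for LogLevel.
LEVELS = ["error", "warning", "info", "debug"]


def is_logged(configured_level, message_level):
    """Return True if a message of message_level is written at configured_level."""
    return LEVELS.index(message_level) <= LEVELS.index(configured_level)


print(is_logged("error", "warning"))   # warning messages at the default level
print(is_logged("info", "warning"))    # warning messages at the info level
```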
APPLICATION objects
The APPLICATION section defines the failover and failback behavior of XP Cluster Extension for each application service. APPLICATION is a multitag that can appear in the configuration file once for each application service using XP Cluster Extension. The APPLICATION object requires the name of the application service as its value. The objects specified after an APPLICATION tag must appear only once per application. As with the COMMON objects, the APPLICATION objects have predefined default values.
XP Cluster Extension uses the following rules to define objects:
• If you use the default value, you do not have to specify the object.
• XP Cluster Extension uses objects depending on the setting of other objects. For example, if you set the FenceLevel object to DATA, XP Cluster Extension uses the values specified for the DataLoseMirror or DataLoseDataCenter object. However, these objects are ignored if the FenceLevel object is set to NEVER.
• The pre-execution and post-execution functions in XP Cluster Extension are not processed if the associated object values are empty. (This is the default setting.)
When setting APPLICATION object values:
• Use the VCS GUI for VCS.
• Use a user configuration file for the CLI and HACMP.
• Use the Microsoft Cluster Administrator GUI (Windows Server 2003) or the Failover Cluster Management GUI (Windows Server 2008/2008 R2) for MSCS.
• Use an XP Cluster Extension configuration file for RHCS and SLE HA.
APPLICATION objects
This section describes the available APPLICATION objects for XP Cluster Extension.
APPLICATION
Format: Tag
Description: Distinguishes between general and application-specific objects. Specify the name of the application service. The format of its value is equivalent to a string value.
ApplicationDir
Format: String
Description: Specifies the directory where XP Cluster Extension searches for application-specific files, such as the force flag or online file. If ApplicationDir is set to a nonexistent drive and PairResyncMonitor is not enabled, XP Cluster Extension is unable to create the online file and cannot put the resource online.
Windows: If ApplicationDir is not set, XP Cluster Extension uses the local %HPCLX_PATH% values as defined in the registry.
Default values:
Linux/UNIX: online file: /etc/opt/hpclx; force flag file: /etc/opt/hpclx/conf
Windows: %HPCLX_PATH%
Files:
resource_name.createsplitbrain
resource_name.forceflag
resource_name.online
If specified in a user configuration file, resource_name is the value of the APPLICATION tag; otherwise, resource_name is the value of the XP Cluster Extension resource name.
ApplicationStartup
Format: String
Description: (Optional) Specifies where a cluster group should be brought online. The ApplicationStartup object can be customized to determine whether an application service starts locally or is transferred back to the remote data center (if possible) to start immediately without waiting for resynchronization.
This object is used only if an application service has already been transferred to the secondary site and no recovery procedure has been applied to the disk set (the disk pair has not been recovered and is not in PAIR state). This process is considered a failback attempt without prior disk pair recovery.
XP Cluster Extension can detect the most current copy of your data based on the disk state information. If XP Cluster Extension detects that the remote XP disk array has the most current data, it orders a resynchronization of the local disk from the remote disk, or it stops the startup process to enable the cluster software to fail back to the remote XP disk array.
If a resynchronization is ordered, XP Cluster Extension monitors the progress of the copy process. If the application service was running on a secondary XP disk array without a replication link, a large number of records may need to be copied. If the copy process takes longer than the configured application startup timeout value, the application startup will fail.
MSCS: If the ApplicationStartup resource property is set to FASTFAILBACK and the FailoverThreshold value is set to a number higher than the current number of clustered systems for the service or application, the service or application will restart on configured nodes until one of the following conditions is met:
• The resource is brought online in the remote data center.
• The resource failed because the FailoverThreshold value has been reached.
• The resource failed because the FailoverPeriod timeout value has been reached.
CAUTION: Disable subsequent automated failover procedures for recovery failback operations.
Valid values:
FASTFAILBACK (default)
The cluster group is brought online in the remote data center (if possible) without waiting for resynchronization. The application startup process is stopped locally and XP Cluster Extension reports a data center error. Depending on the cluster software, the application service cannot start on any system in the local data center, and the cluster software transfers the application service back to the remote data center. Use this value to provide the highest level of application service availability.
Depending on the value configured for the AutoRecover object, XP Cluster Extension attempts to update the former primary disk based on the secondary disk and swaps the personalities of the disk pair so that the local disk will become the primary disk.
In a two-node cluster, this process does not work because the target failback system is not available. In this case, the application service must be started manually, or the ApplicationStartup object must be set to RESYNCWAIT.
In an XP Cluster Extension for MSCS integration, XP Cluster Extension can detect when there is no target failback system available in the remote data center. In this case, XP Cluster Extension behaves as if the ApplicationStartup resource property is set to RESYNCWAIT.
RESYNCWAIT
The local cluster group must wait until the disk status is PAIR before it is brought online. XP Cluster Extension initiates a resynchronization of the local disk based on the remote disk. The copy process is monitored; if no copy progress is made after a monitoring interval expires, the copy process is considered failed and XP Cluster Extension returns a global error.
If RESYNCWAIT has been specified for the ApplicationStartup object, also specify the ResyncWaitTimeout object if XP Cluster Extension should wait for resynchronization changes for more or less than the default of 90 seconds.
AsyncTakeoverTimeout
Format: Integer
Description: (Optional) Specifies the horctakeover command timeout in seconds. Must be adjusted based on disk mirroring link speed. This object is used only if the FenceLevel object value is ASYNC.
The takeover operation for fence level ASYNC (XP Continuous Access Software) offers the option to stop the data transfer process after a specified time value. This is used to allow access to the remote copy if the data transfer process is stopped due to an XP Continuous Access Software link failure. All data that has been copied up to the moment the timeout value is reached is consistent and available to access at the secondary site.
CAUTION: Measure or calculate the full XP disk array cache copy time to use the gathered information for the AsyncTakeoverTimeout object.
After a takeover command has been invoked, XP Continuous Access Software copies the side file area residing in the XP disk array cache to the site where the takeover command has been issued (the secondary disks). The side file area cannot exceed the installed cache size. The maximum time for the AsyncTakeoverTimeout object is the time to fully copy the amount of cache size data. The takeover timeout value is used to terminate the copy process to provide access to the secondary disks; for example, if all links or the primary XP disk array are unavailable to copy the side file area. The copy time depends on the performance of the XP Continuous Access Software link between your sites.
The takeover or resynchronization operation could take longer than the timeout value for application service startup in the cluster software. The application service startup might fail in this case. However, the takeover or resynchronization command will continue in the background.
Default value: 3600
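The upper bound described above (the time to copy a full cache worth of side file data) can be estimated with simple arithmetic. The following is a rough sketch under stated assumptions: the link sustains a constant throughput, and the whole installed cache must be copied. The function name and the example figures (64 GiB of cache, 40 MiB/s of link throughput) are hypothetical; measure your own environment before choosing a value.

```python
import math


def full_cache_copy_seconds(cache_size_gib, link_throughput_mib_per_s):
    """Estimate the seconds needed to copy cache_size_gib over the link.

    Assumes a constant, sustained link throughput (hypothetical model).
    """
    if link_throughput_mib_per_s <= 0:
        raise ValueError("link throughput must be positive")
    return math.ceil(cache_size_gib * 1024 / link_throughput_mib_per_s)


# Hypothetical example: 64 GiB installed cache, 40 MiB/s sustained throughput.
estimate = full_cache_copy_seconds(64, 40)
print(estimate)
```

An estimate above the default of 3600 seconds would suggest raising AsyncTakeoverTimeout; an estimate well below it would suggest that the default leaves a margin.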
AutoRecover
Format: String
Description: (Optional) Recovers a suspended or deleted disk pair when the resource is brought online at application service startup time.
If the AutoRecover object is set to YES, XP Cluster Extension will try to resynchronize the remote disk at application startup time. XP Cluster Extension will ignore the return code of the resynchronization command and allow access to the disk, ensuring highest application availability. If the resynchronization attempt fails, XP Cluster Extension will not fail. The internal logic will first apply the concurrency and consistency rules to allow access to the disk set.
If you configure fence level DATA for the device group and set the FenceLevel object to DATA, the AutoRecover object will change XP Cluster Extension's behavior. XP Cluster Extension will attempt to re-establish the PAIR state and wait for the PAIR state before it allows access to the disk. If the resynchronization or takeover process fails, XP Cluster Extension returns a global error.
Valid values: YES (default), NO
BCEnabledA
Format: String
Description: (Optional) Enables rolling disaster protection for data center A.
Valid values: YES, NO (default)
BCEnabledB
Format: String
Description: (Optional) Enables rolling disaster protection for data center B.
Valid values: YES, NO (default)
BCMuListA
Format: List
Description: (Optional) Space-separated list that defines the MU numbers of the XP Business Copy Software disk pairs in data center A.
BCMuListB
Format: List
Description: (Optional) Space-separated list that defines the MU numbers of the XP Business Copy Software disk pairs in data center B.
BCResyncEnabledA
Format: String
Description: (Optional) Enables automatic resynchronization of XP Business Copy Software disk pairs in data center A. The automatic resynchronization function is supported only when the split XP Business Copy Software pair is located in the same data center where XP Cluster Extension is started.
Valid values: YES, NO (default)
BCResyncEnabledB
Format: String
Description: (Optional) Enables automatic resynchronization of XP Business Copy Software disk pairs in data center B. The automatic resynchronization function is supported only when the split XP Business Copy Software pair is located in the same data center where XP Cluster Extension is started.
Valid values: YES, NO (default)
BCResyncMuListA
Format: List
Description: (Optional) Space-separated list that defines the MU numbers of the XP Business Copy Software disk pairs in data center A.
BCResyncMuListB
Format: List
Description: (Optional) Space-separated list that defines the MU numbers of the XP Business Copy Software disk pairs in data center B.
ClusterNotifyCheckTime
Format: Integer
Description: Specifies how often XP Cluster Extension will check for VM live migration state changes.
Default value: 10 seconds
ClusterNotifyWaitTime
Format: Integer
Description: Specifies the amount of time that XP Cluster Extension will monitor for VM live migration state changes.
Default value: 5 seconds
DataLoseDataCenter
Format: String
Description: (Optional) Specifies whether a resource should be brought online while the disk pair is (or will be) suspended or deleted and there is no connection (XP Continuous Access and IP network) to the remote data center. Used only if the FenceLevel object value is DATA.
XP RAID Manager is able to access its remote peer to invoke takeover actions for XP Continuous Access Software device groups. It is also able to invoke a swap-takeover operation of the device group from the secondary site.
If no configured remote XP RAID Manager instance replies to a request of the local XP RAID Manager instance (remote status EX_ENORMT), all network connections between the local and the remote data center are considered DOWN. If the swap-takeover operation leads to a suspended state for the device group, the XP Continuous Access Software links are considered DOWN.
Because redundant networks and XP Continuous Access Software links are necessary to build a disaster-tolerant environment, this situation can be considered a data center failure. The DataLoseDataCenter object is used to allow or prohibit automatic application service startup in this particular case.
Setting the DataLoseMirror object to YES and the DataLoseDataCenter object to NO is contradictory.
Valid values: YES (default), NO
DataLoseMirror
Format: String
Description: (Optional) Specifies whether a resource should be brought online while the disk pair is suspended or deleted. Used only if the FenceLevel object value is DATA and local and remote XP disk status information can be gathered. If the remote XP disk state information is not available (remote state EX_ENORMT), the setting of the DataLoseDataCenter object will be used.
Depending on the value configured for the AutoRecover object, XP Cluster Extension will attempt to recover the PAIR state for the device group. XP Cluster Extension waits until the PAIR state has been established. If this operation fails, XP Cluster Extension returns a global error.
Because the DATA fence level ensures no loss of concurrency, manual intervention is required to recover the PAIR state. The PAIR state must be re-established for all disks in the device group before you can start the application service.
Setting the DataLoseMirror object to YES and the DataLoseDataCenter object to NO is contradictory.
Valid values: YES, NO (default)
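The way the two DataLose objects interact under fence level DATA can be sketched as a small decision function. This is an illustration of the rules stated above, not the product's internal logic: it ignores AutoRecover and any recovery attempt, and raising an error for the contradictory combination is a choice made for this sketch only. The function name may_bring_online is hypothetical.

```python
def may_bring_online(data_lose_mirror="NO", data_lose_data_center="YES",
                     remote_status_available=True):
    """Decide whether a resource may go online with a suspended or deleted pair.

    Applies only to fence level DATA. Defaults match the documented defaults.
    """
    if data_lose_mirror == "YES" and data_lose_data_center == "NO":
        # Hypothetical handling: the guide only calls this combination contradictory.
        raise ValueError("DataLoseMirror=YES with DataLoseDataCenter=NO is contradictory")
    # DataLoseMirror decides when remote status can be gathered; otherwise
    # (remote state EX_ENORMT) DataLoseDataCenter decides.
    deciding_value = data_lose_mirror if remote_status_available else data_lose_data_center
    return deciding_value == "YES"


print(may_bring_online(remote_status_available=True))
print(may_bring_online(remote_status_available=False))
```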
DC_A_Hosts (Required)
Format: List
Description: This space-separated list defines the cluster nodes in data center A.
VCS: This object is a string-vector element. Add a new element to the list for each system name.
DC_B_Hosts (Required)
Format: List
Description: This space-separated list defines the cluster nodes in data center B.
VCS: This object is a string-vector element. Add a new element to the list for each system name.
DeviceGroup (Required)
Format: String
Description: XP RAID Manager device group, containing the application service disk set.
Files:
Linux/UNIX: /etc/horcmX.conf
Windows: \winnt\horcmX.conf
%system_root%\horcmX.conf
where X is the XP RAID Manager instance number.
FenceLevel
Format: String
Description: (Optional) The FenceLevel object specifies the fence level configured for the device group. XP Cluster Extension checks whether the current fence level reported by the XP disk array is the same as the configured (expected) fence level. This object is also used to make sure your configurations are supported based on consistency considerations. Different failover and recovery procedures are used for different fence levels.
If you change the FenceLevel object value, also review the values of these objects: DataLoseMirror, DataLoseDataCenter, and AsyncTakeoverTimeout.
Valid values: DATA, NEVER (default), ASYNC (includes JOURNAL)
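Several APPLICATION objects take effect only for a particular fence level. The following sketch collects the dependencies stated in this chapter (DataLoseMirror and DataLoseDataCenter for DATA; AsyncTakeoverTimeout and JournalDataCurrency for ASYNC) and reports objects that are set but would be ignored. It is an illustrative check only; the function and table names are hypothetical.

```python
# Objects that this chapter describes as used only for a given fence level.
FENCE_LEVEL_OBJECTS = {
    "DATA": ("DataLoseMirror", "DataLoseDataCenter"),
    "NEVER": (),
    "ASYNC": ("AsyncTakeoverTimeout", "JournalDataCurrency"),
}


def ignored_objects(app_objects):
    """Return the fence-level-dependent objects that are set but not used."""
    fence_level = app_objects.get("FenceLevel", "NEVER")  # NEVER is the default
    used = set(FENCE_LEVEL_OBJECTS[fence_level])
    dependent = {name for names in FENCE_LEVEL_OBJECTS.values() for name in names}
    return sorted(name for name in app_objects
                  if name in dependent and name not in used)


# Hypothetical settings for one application service.
print(ignored_objects({"FenceLevel": "DATA", "DataLoseMirror": "NO",
                       "AsyncTakeoverTimeout": "1800"}))
```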
JournalDataCurrency
Format: String
Description: (Optional) Specifies whether a resource should be brought online while there could still potentially be a large amount of data on the P-VOL journal that cannot be transmitted to the secondary site due to the XP Continuous Access Software link being down. Used only if the FenceLevel object value is ASYNC and the local device is an S-VOL.
XP Cluster Extension checks whether the current XP Continuous Access Software link status is greater than 0 using the minimum active paths (MINAP) value returned by the XP RAID Manager pairvolchk command. If the minimum active paths value equals 0, this indicates that the XP Continuous Access Software link is unavailable and that any data still located in the primary journal will not be replicated to the secondary volume. If JournalDataCurrency is set to YES, XP Cluster Extension will not perform the takeover operation and will not allow the application to access the data.
Valid values: YES (default), NO
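The JournalDataCurrency rule reduces to a single condition on the MINAP value. The sketch below restates it for illustration. It assumes the MINAP value has already been obtained from pairvolchk output by some other means, and the function name is hypothetical.

```python
def takeover_allowed(minap, journal_data_currency="YES"):
    """Return False when the link is down (MINAP of 0) and currency is enforced."""
    link_unavailable = minap == 0
    return not (link_unavailable and journal_data_currency == "YES")


print(takeover_allowed(0))          # link down, default setting
print(takeover_allowed(0, "NO"))    # link down, currency check disabled
```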
LocalDCLMForNonPAIRDG
Format: String
Description: Specifies whether a live migration operation within the local data center is allowed when the device group is not in PAIR state.
Set this property to YES to allow live migration operations in the local data center when the device group is not in PAIR state, the latest data is in the local data center, and the XP Cluster Extension resource can come online. For example, if the device group state is PVOL_COPY in the local data center and SVOL_COPY in the remote data center, setting this property to YES allows you to perform live migration to nodes within the local data center.
Set this property to NO if you want to cancel live migration operations within the local data center when the device group is not in PAIR state.
NOTE: Configure this parameter for each XP Cluster Extension resource associated with the VM cluster resource and the corresponding application cluster resource in the UCF file. If the VM group contains more than one XP Cluster Extension resource, and you want to use this parameter, you must set this parameter to the same value for each XP Cluster Extension resource. If you do not set the parameter to the same value, this parameter will default to a value of NO.
Valid values: YES, NO (default)
PostExecCheck
Format: String
Description: (Optional) The PostExecCheck object is used to configure XP Cluster Extension to gather XP disk pair status information after the takeover procedure. That information will be passed to the post-executable. In case of a remote data center failure, it could be time consuming to gather that information, especially if your post-executable does not need any XP status information. The arguments passed to the post-executable will include only the local disk status if the PostExecCheck object is set to NO. See Setting up XP RAID Manager on page 20.
Valid values: YES, NO (default)
PostExecScript
Format: String
Description: (Optional) Specifies an executable with its full path name to be invoked after the takeover action or failover procedure.
PreExecScript
Format: String
Description: (Optional) Specifies an executable with its full path name to be invoked before the takeover action or failover procedure.
RaidManagerInstances (Required)
Format: List
Description: A space-separated list of XP RAID Manager instances that XP Cluster Extension can use to communicate with the disk array. The instance numbers must be the same among all cluster systems. XP Cluster Extension can alternate between the specified instances.
VCS: This object is a string-vector element. Add a new element to the list for each system name.
Files:
Linux/UNIX: /etc/horcmX.conf
Windows: %systemroot%\horcmX.conf
where X is the XP RAID Manager instance number.
ResyncMonitor
Format: String
Description: (Optional) Starts the pair/resync monitor to monitor the disk pair status and resynchronize disk pairs if the ResyncMonitorAutoRecover attribute is set to YES.
Valid values: YES, NO (default)
ResyncMonitorAutoRecover
Format: String
Description: (Optional) Automatically recovers disk pair states if the disk pairs are monitored by the pair/resync monitor.
Valid values: YES, NO (default)
ResyncMonitorInterval
Format: Integer
Description: (Optional) Specifies the interval (in seconds) at which the pair/resync monitor checks the disk pair status.
Default value: 60
ResyncWaitTimeout
Format: Integer
Description: (Optional) Specifies the timeout value (in seconds) for a disk pair resynchronization. It may take some time to resynchronize disks. The timer times out if there is no change in the percentage value of the copy status for the device group in the specified time interval. The timeout value is used if the ApplicationStartup object is set to RESYNCWAIT.
Default value: 90
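ResyncWaitTimeout is a stall timeout rather than a limit on total copy time: the timer restarts whenever the copy percentage changes. The following sketch illustrates that behavior under stated assumptions. The callable read_copy_percent is a hypothetical stand-in for however the copy status is obtained, and the polling interval is an assumption of this sketch, not a documented value.

```python
import time


def wait_for_resync(read_copy_percent, stall_timeout=90, poll_interval=5,
                    clock=time.monotonic, sleep=time.sleep):
    """Return True once the copy reaches 100 percent, False if progress stalls.

    read_copy_percent is a hypothetical callable returning the current copy
    percentage of the device group. The timer restarts on every change.
    """
    last_percent = read_copy_percent()
    last_change = clock()
    while last_percent < 100:
        sleep(poll_interval)
        percent = read_copy_percent()
        now = clock()
        if percent != last_percent:
            # Progress was made: remember the new value and restart the timer.
            last_percent, last_change = percent, now
        elif now - last_change >= stall_timeout:
            return False  # no change within the timeout interval
    return True
```

A copy that takes an hour but keeps advancing would therefore not time out, whereas a copy stuck at the same percentage for the whole timeout interval would.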
StatusRefreshInterval
Format: Integer
Description: Specifies how often XP Cluster Extension will gather XP storage array information.
Default value: 300 seconds
XPSerialNumbers (Required)
Format: List
Description: A space-separated list of at least two serial numbers must be specified: the serial numbers of the primary and secondary XP disk arrays. XP Cluster Extension checks whether the local disk array is contained in this list. Serial numbers of the disk arrays of the connected cluster nodes (at least two).
VCS: This object is a string-vector element. Add a new element to the list for each system name.
Extension suspends specified XP Business Copy Software disk pairs that are in PAIR state. For information on setting XP Cluster Extension objects, see Chapter 8 on page 123.
When using rolling disaster protection, note the following:
• If the BCEnabledA and BCEnabledB objects are set to YES, you must configure specific XP Business Copy Software disk pairs using MU numbers. The MU number defines one of the many disk pair relationships you can create with XP Business Copy Software disk pairs. You can specify as many MU numbers as the XP Business Copy Software supports. Disk pair MU numbers are specified by the BCMuListA and BCMuListB objects for data centers A and B.
• To enable resynchronization of XP Business Copy Software disk pairs that have been split by XP Cluster Extension, use the BCResyncEnabledA and BCResyncEnabledB objects for data centers A and B. XP Cluster Extension maintains a list of all associated XP Business Copy Software disk pairs that were in PAIR state before a resynchronization attempt. If pairs were suspended, XP Cluster Extension automatically resynchronizes those disk pairs after the XP Continuous Access Software remote mirrored disk pairs have been paired. This feature supports automatic resynchronization of locally split XP Business Copy Software disk pairs only. You must specify MU numbers for resynchronization by using the BCResyncMuListA and BCResyncMuListB objects for data centers A and B.
CAUTION: If the application service stops, the cluster software or your customized solution must be able to stop the monitoring or resynchronization utility. Without this ability, the use of the pair/resync monitor is not supported.
HP recommends that you disable application service failover during a disk pair recovery (resynchronization). When the pair/resync monitor is enabled, XP Cluster Extension takes immediate action to recover any reported suspended disk pair. If, at any time, the resynchronization process is running on both disk array sites, data corruption might occur.
Turn the pair/resync monitor (clxchkd) on or off using the ResyncMonitor object. For information on setting XP Cluster Extension objects, see Chapter 8 on page 123.
If the ResyncMonitorAutoRecover object is set to YES, the monitor tries to resynchronize the remote disk based on the local disk. Resynchronization occurs only if the disks are in a P-VOL/S-VOL or S-VOL/P-VOL relationship. If one or both disk pairs are in the SMPL state or the device group state is mixed, automatic resynchronization is not attempted. The ResyncMonitorAutoRecover object set to YES is supported only if the minimum disk array firmware version is 01-11-xx (XP512/XP48) or 21.01.xx (XP128/XP1024), and the minimum XP RAID Manager version is 01.04.00.
The monitor interval is specified with the ResyncMonitorInterval object. Do not set the monitor interval below the XP RAID Manager timeout parameter (HORCM_MON in the horcmX.conf file).
If the link for the device group is broken, the pair/resync monitor notifies you by using the syslog facility (Linux/UNIX) and the Event Log (Windows). The monitor recognizes a broken link only when data is to be written to disk; otherwise, the data is the same on the primary and secondary disk, and the device group state is reported as PAIR.
You cannot use the force flag if the local disk state is S-VOL_COPY, which indicates that a copy operation is in progress. When a copy operation is in progress, a disk cannot be activated, and XP Cluster Extension returns a global error. Using the force flag does not enable the automatic recovery features of XP Cluster Extension. After using the force flag, you must recover the suspended or broken disk pairs using XP RAID Manager commands as described in Recovery sequence on page 120.
Arguments
The following arguments are transferred to the scripts in this order:
1. Name
2. Vgs (HACMP only)
3. RaidManagerInstances
4. DeviceGroup
5. local device group state (check)
Pre-executable: status before failover; post-executable: status after failover
6. local device group state (display)
Pre-executable: status before failover; post-executable: status after failover
IMPORTANT: An empty string is returned if parameter #5 is not SSWS, PSUE, or PDUB.
7. remote device group state (check)
Pre-executable: status before failover; post-executable: status after failover
8. remote device group state (display)
Pre-executable: status before failover; post-executable: status after failover
IMPORTANT: An empty string is returned if parameter #7 is not SSWS, PSUE, or PDUB.
9.
10. disk array serial numbers (local)
11. reserved
12. reserved
13. disk array firmware version (local)
14. XP RAID Manager version (local)
15. application directory path (ApplicationDir object)
16. log file location (LogDir object)
17. DC_A_Hosts node names
18. DC_B_Hosts node names
CAUTION: If the pre-execution program returns 1, 2, 3, or 5, a post-executable will not be executed. If a takeover function fails, the post-executable will not be executed.
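A pre-executable or post-executable receives the arguments listed above by position, so giving them names makes a script easier to read. The following is a minimal sketch of such a mapping. The dictionary keys are hypothetical names chosen for this example, argument 9 is left generic because it is not described in the list above, and the sketch assumes that every position is always passed, possibly as an empty string.

```python
# Hypothetical names for the 18 positional arguments, in the documented order.
ARGUMENT_NAMES = [
    "name",                        # 1
    "vgs",                         # 2 (HACMP only)
    "raid_manager_instances",      # 3
    "device_group",                # 4
    "local_state_check",           # 5
    "local_state_display",         # 6
    "remote_state_check",          # 7
    "remote_state_display",        # 8
    "argument_9",                  # 9 (not described in the list above)
    "local_serial_numbers",        # 10
    "reserved_11",                 # 11
    "reserved_12",                 # 12
    "local_firmware_version",      # 13
    "local_raid_manager_version",  # 14
    "application_dir",             # 15
    "log_dir",                     # 16
    "dc_a_hosts",                  # 17
    "dc_b_hosts",                  # 18
]


def name_arguments(argv):
    """Map script arguments (argv[1:]) to names; missing ones become ''."""
    values = argv[1:]
    return {name: (values[i] if i < len(values) else "")
            for i, name in enumerate(ARGUMENT_NAMES)}


# Hypothetical invocation with only the first five arguments supplied.
args = name_arguments(["pre_exec", "oracle", "ora1vg ora2vg", "0", "oracle", "PVOL_PSUS"])
print(args["device_group"], args["local_state_check"])
```

In a real script the list would come from sys.argv, and the script's exit code matters: as the caution above states, a pre-executable that returns 1, 2, 3, or 5 prevents the post-executable from running.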
10 Troubleshooting
To troubleshoot problems with XP Cluster Extension, you must understand XP Continuous Access Software environments. Many issues can be attributed to incompatible disk pair states. See the XP Continuous Access Software and XP RAID Manager documentation before assuming that a problem has been caused by XP Cluster Extension. For more information on XP Continuous Access Software, see the HP StorageWorks XP Continuous Access Software user guide.
XP Cluster Extension logs messages to the cluster-specific log location. However, it always keeps its own log file in its default log location.
CAUTION: XP Cluster Extension is not able to handle XP device group states automatically and correctly when they result from manual manipulations.
XP Cluster Extension will try to automatically recover suspended XP RAID Manager device group states if the AutoRecover object is set to YES. However, if the recovery procedure experiences a problem, XP Cluster Extension will not stop unless fence level DATA is used or the ApplicationStartup object is set to RESYNCWAIT. Therefore, ensure that the device group PAIR state has been recovered before the next failure occurs.
Always disable automatic application service failover when resynchronizing disk pairs. A failure of the resynchronization source while resynchronizing can lead to unrecoverable data on the resynchronization target. The resynchronization process does not copy data in transactional order. For more information, see Implementing rolling disaster protection on page 141.
Start errors
Start errors can occur when the path to the XP RAID Manager binaries has not been set in the PATH environment variable. If a user configuration file is not found in the correct directory location, XP Cluster Extension returns a local error. A start error occurs if the APPLICATION name tag value in the XP Cluster Extension resource configuration file does not match the service name (RHCS) or the App value of the XP Cluster Extension resource (SLE HA). XP Cluster Extension returns a local error if it does not find the XP Cluster Extension resource configuration file in the correct directory location (RHCS and SLE HA).
global error
When XP Cluster Extension is integrated, an error message string and integer value are displayed. For the CLI, a return code is displayed. For more information, see CLI commands on page 113.
Start errors
HACMP will go into a loop and wait until the problem is solved and until the file /etc/opt/hpclx/application_name.LOCK has been removed. This process has been adopted from HACMP, which will also run in an endless loop if there is a failure and until you recover all errors and start the application manually. After all errors have been recovered, you can invoke the command clruncmd to return control to the cluster software. If the program is in a very early state of processing and experiences a problem before resolution of the application name, it may return an error return code. The /etc/opt/hpclx/UNKNOWN.LOCK file is created and must be removed after the problem has been resolved.
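Because processing does not continue while a .LOCK file exists, it can be useful to see which lock files are currently present. The following sketch lists them. The default directory comes from the paths quoted above; the function name pending_lock_files is hypothetical, and the sketch only reads the directory. It does not remove anything, because a lock file should be removed only after the underlying problem has been resolved.

```python
from pathlib import Path


def pending_lock_files(lock_dir="/etc/opt/hpclx"):
    """Return the sorted names of *.LOCK files found in lock_dir."""
    directory = Path(lock_dir)
    if not directory.is_dir():
        return []
    return sorted(path.name for path in directory.glob("*.LOCK"))


for lock_name in pending_lock_files():
    print("lock file present:", lock_name)
```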
Failover errors
As mentioned previously, the HACMP error handling of the XP Cluster Extension will create a .LOCK file for the resource group (for example, /etc/opt/hpclx/OracleRG.LOCK). Messages are logged to the log files /var/opt/hpclx/log/clxhacmp.log and /tmp/hacmp.out. The file can be removed after the problem has been solved. HACMP can then continue to start the resource group. This file will be created for any error XP Cluster Extension returns. However, XP Cluster Extension will specify whether the error is a local, data center, or cluster-wide error. The following example demonstrates the behavior of XP Cluster Extension for HACMP if a pair state is discovered (which does not allow for an automatic takeover operation by XP Cluster Extension). In this case, the pairs have been manually suspended. It is impossible for XP Cluster Extension to determine which copy of the mirrored data is the most current. The output in /tmp/hacmp.out will be similar to the following example:
clxHACMP: > Fri Dec 15 16:35:19 NFT 2000
clxHACMP: > Arguments: oracle ora1vg ora2vg 0 oracle PVOL_PSUS PSUS SVOL_SSUS SSUS DATA 30368 30380 01-11-22/00 01.04.01
clxHACMP: > number of arguments: 14
clxHACMP: > 1: oracle
clxHACMP: > 2: ora1vg ora2vg
clxHACMP: > 3: 0
clxHACMP: > 4: oracle
clxHACMP: > 5: PVOL_PSUS
clxHACMP: > 6: PSUS
clxHACMP: > 7: SVOL_SSUS
clxHACMP: > 8: SSUS
clxHACMP: > 9: DATA
clxHACMP: > 10: 30368
clxHACMP: > 11: 30380
clxHACMP: > 12:
clxHACMP: > 13: 01-11-22/00
clxHACMP: > 14: 01.04.01
clxHACMP > ===PRE===============================================
clxHACMP: pre-exec script successful (rc=0).
clxHACMP: ERROR - no takeover action found.
clxHACMP: ERROR - global cluster failure occurred - waiting!
clxHACMP: ERROR
clxHACMP: ERROR - ================================================================
clxHACMP: ERROR - XP Cluster Extension takeover procedure FAILED.
clxHACMP: ERROR -
clxHACMP: ERROR - Pair state of device group "oracle" might be
clxHACMP: ERROR - incorrect. Manual checking and correction within
clxHACMP: ERROR - Continuous Access XP is required.
clxHACMP: ERROR - Remove file "/etc/opt/hpclx/OracleRG.LOCK" in order
clxHACMP: ERROR - to continue with HACMP specific recovery actions.
=================================================================
The last message is repeated every 5 minutes. XP Cluster Extension stops any further processing until you remove the application_name.LOCK file to transfer control back to HACMP. This enables you to check the status of the data on each copy and decide whether it is safe to continue. Depending on the amount of time needed to check the configuration and the XP disk pair status, the HACMP timeout could be reached. HACMP then automatically calls the event config_too_long. The following message will appear in the log file /tmp/hacmp.out:
WARNING: Cluster MYCLUSTER has been running recovery program '/usr/es/sbin/cluster/events/node_up.rp' for 1110 seconds. Please check cluster status.
If you think the XP Cluster Extension configuration is correct, and the XP disk pair status allows you to manually continue the process for starting the application, remove the application lock file /etc/opt/hpclx/oracle.LOCK mentioned in the previous error message. When this file has been removed, XP Cluster Extension transfers control back to HACMP. The event get_disk_vg_fs and all the subsequent events within the main event node_up_local will be processed. Because XP Cluster Extension as a pre-event of get_disk_vg_fs has produced an error, the main event node_up_local will fail as well. The following HACMP event event_error will be called:
node_up_local[30] [ 0 -ne 0 ]
node_up_local[8] exit 1
Dec 15 17:07:17 EVENT FAILED:1: node_up_local
node_up[326] [ 1 -ne 0 ]
node_up[328] cl_log 650 node_up: Failure occurred while processing Resource Group OracleRG. Manual intervention required. node_up OracleRG
***************************
Dec 15 2000 17:07:17 !!!!!!!!!! ERROR !!!!!!!!!!
***************************
Dec 15 2000 17:07:17 node_up: Failure occurred while processing Resource Group OracleRG. Manual intervention required.
node_up[329] STATUS=1
node_up[337] [ AIX1 != AIX1 ]
node_up[356] exit 1
Dec 15 17:07:18 EVENT FAILED:1: node_up AIX1
To continue any further processing of HACMP, you must invoke the HACMP command clruncmd to recover from the status event_error.

Example
# clruncmd aix1
This will bring the cluster into normal status again. All subsequent events (for example, node_up_complete) will be processed.
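The lock-file handling described above can be checked with a small script before control is handed back to HACMP. The following Python sketch is illustrative only and is not part of XP Cluster Extension: the function names are hypothetical, the default directory is taken from the example path /etc/opt/hpclx, and the log pattern assumes the "Remove file" wording shown in the sample output.

```python
import re
from pathlib import Path

# Assumption: lock files live in the directory used in the example above.
LOCK_DIR = Path("/etc/opt/hpclx")


def find_lock_files(lock_dir=LOCK_DIR):
    """Return the sorted .LOCK files in lock_dir (empty list if it is missing)."""
    lock_dir = Path(lock_dir)
    if not lock_dir.is_dir():
        return []
    return sorted(
        p for p in lock_dir.iterdir() if p.is_file() and p.suffix == ".LOCK"
    )


def lock_files_named_in_log(log_text):
    """Extract lock-file paths quoted in 'Remove file "..."' log messages."""
    # Assumption: the message wording matches the sample hacmp.out output.
    return sorted(set(re.findall(r'Remove file "([^"]+\.LOCK)"', log_text)))
```

The script only reports lock files. Whether a file may be removed remains a manual decision, made after checking the XP disk pair status on each copy.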
Troubleshooting
Failover errors
XP Cluster Extension's integration with MSCS returns a local error and fails the resource if a configuration error occurs. This could be a problem with the XP RAID Manager instance configuration or an error that will probably require starting the resource group on another system.

XP Cluster Extension resources return a data center error and fail the resource if the XP disk array status indicates that the problem experienced locally would not be solved on another system connected to the same XP disk array. This means all systems specified in the DC_A_Hosts resource property or the DC_B_Hosts resource property would fail to bring the resource group online. Depending on the resource group and resource property values, the resource tries to start on different nodes several times. If the remote data center is down, the resource group appears to alternate between the surviving systems. This continues until the limits set by those resource and resource group property values are reached or you disable the restarting of the resource. This could also be the case if the ApplicationStartup resource property has been set to FASTFAILBACK.

If an XP disk array state has been discovered that does not allow bringing the resource group online on any system in the cluster, a cluster error is reported and the resource fails on all systems. This could lead to the same behavior as described for an XP Cluster Extension data center error. Examples of such a state are an SMPL state on both primary and secondary disks, a suspended (PSUS/SSUS) state on either site, or a state mismatch in the device group for this resource group.

None of the previously mentioned scenarios allows automatic recovery because the XP Cluster Extension resource cannot decide which copy of the data is the most current. In those cases, a storage or cluster administrator must investigate what happened to the environment.
In any case, restarting a failed resource group without investigating the problem is not recommended. A failed XP Cluster Extension resource indicates the need to check the status of the XP disk pair on each copy and decide whether it is safe to continue or not. Figure 11 on page 154 shows examples of an incompatible XP disk pair state shown in the clxmscs.log file. The same messages can be found in the MSCS cluster log file if the XP Cluster Extension LogLevel object is set to INFO; this, however, requires creating a UCF.cfg file.
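The pair states that block automatic recovery can be expressed as a simple check. The following Python sketch is a simplified illustration under stated assumptions, not the decision logic of XP Cluster Extension: the function name is hypothetical, each argument is a list of per-volume states for one side of the device group, and "state mismatch" is interpreted here as volumes on the same side reporting different states.

```python
# Suspended states named in the text as blocking automatic recovery.
SUSPENDED_STATES = {"PSUS", "SSUS"}


def blocks_automatic_recovery(primary_states, secondary_states):
    """Return a reason string if the states match one of the scenarios
    described above, or None if none of them applies."""
    primary = set(primary_states)
    secondary = set(secondary_states)
    # Assumption: a mismatch means volumes on one side disagree.
    if len(primary) > 1 or len(secondary) > 1:
        return "state mismatch in the device group"
    if primary == {"SMPL"} and secondary == {"SMPL"}:
        return "SMPL on both primary and secondary disks"
    if (primary | secondary) & SUSPENDED_STATES:
        return "suspended (PSUS/SSUS) state"
    return None
```

A non-None result corresponds to the cases in which an administrator must inspect the XP disk pair before restarting the resource group.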
This is a general VCS engine log file, which gives an overview of all cluster-related activities and whether they were successful or unsuccessful.
VCS 1.3.0 or later: /var/VRTSvcs/log/ClusterExtensionXP_A.log
VCS 1.1.2: /var/VRTSvcs/log/ClusterExtensionXP.log_A
This XP Cluster Extension agent log file of VCS shows agent-related error information.
The XP Cluster Extension log file is named clxvcs.log.
Start errors
VCS will fail the resource and disable the service group on the local system if the clxpcf file is not present. If the program is in a very early state of processing, this operation might fail, and XP Cluster Extension will not show the service group in the error message. However, VCS will fail the resource.
Failover errors
XP Cluster Extension's integration with VCS disables service groups on the local system if a configuration error occurs. In this case, XP Cluster Extension returns a local error.

The service group is disabled in the data center if the XP disk array status indicates that the problem experienced locally cannot be solved on another system connected to the same XP disk array. All systems specified in the DC_A_Hosts object or DC_B_Hosts object are prevented from bringing the service group online. This could also be the case if the ApplicationStartup object has been set to FASTFAILBACK.

If an XP disk array state is discovered that does not allow bringing the service group online on any system in the cluster, a cluster error is reported and all systems are prevented from bringing the service group online. Such a state could be an SMPL state on both primary and secondary disks, a suspended (PSUS/SSUS) state on either site, or a state mismatch in the device group for this service group.

None of these scenarios allows automatic recovery because XP Cluster Extension cannot determine which copy of the data is the most current. In these cases, a storage or cluster administrator must investigate what happened to the environment.

CAUTION: HP does not recommend that you enable the service group again and try to bring the previously failed service group online without investigating the problem. When an XP Cluster Extension resource fails, check the status of the XP disk pair on each copy, and decide whether it is safe to continue.

Examples
Figure 12 on page 156 and Figure 13 on page 156 show examples of an incompatible XP disk pair state shown in the VCS Cluster Manager Log Desk window.
Figure 12 Incompatible XP disk pair state (VCS Cluster Manager Log Desk window)
Figure 13 on page 156 shows detailed information for the current XP disk pair state, which will be displayed in the VCS Log Desk only if the XP Cluster Extension LogLevel object is set to INFO.
Figure 13 Detailed information of the XP disk pair state (VCS Log Desk)
Failover errors
XP Cluster Extension will fail to bring an RHCS service or SLE HA resource group online on the local system if a configuration error occurs. In this case, XP Cluster Extension returns a local error.

The RHCS service or SLE HA resource group will go into a failed state after a startup attempt on any system in the same data center if the disk array status indicates that a problem experienced locally would not be solved on another system connected to the same disk array. In this case, XP Cluster Extension returns a data center error. This error could also occur if the ApplicationStartup object is set to FASTFAILBACK.

If a disk array state that does not allow starting the RHCS service or SLE HA resource group on any system in the cluster is discovered, a cluster error is reported and none of the systems will be allowed to run the service or resource group. Such a state could be an SMPL state on both primary and secondary disks, a suspended (PSUS/SSUS) state on either site, or a state mismatch in the device group for this RHCS service or SLE HA resource group.

None of these scenarios allows automatic recovery because XP Cluster Extension cannot determine which copy of the data is the most current. In these cases, a storage or cluster administrator must investigate the problem.

CAUTION: Do not start the RHCS service or SLE HA resource group again or try to start the failed RHCS service or SLE HA resource group without investigating the problem. When an RHCS service or SLE HA resource group using XP Cluster Extension fails, check the status of the XP disk pair on each copy and decide whether it is safe to continue.
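The three error scopes described above (local, data center, cluster) differ in which systems are ruled out for starting the service. The following Python sketch illustrates that mapping under stated assumptions: the ErrorScope type and the excluded_hosts function are hypothetical names introduced for illustration, and the host lists stand in for the values of the DC_A_Hosts and DC_B_Hosts objects. It does not reproduce the product's implementation.

```python
from enum import Enum


class ErrorScope(Enum):
    LOCAL = "local"
    DATA_CENTER = "data center"
    CLUSTER = "cluster"


def excluded_hosts(scope, local_host, dc_a_hosts, dc_b_hosts):
    """Return the set of hosts on which the service should not be started."""
    if scope is ErrorScope.LOCAL:
        # Only the system that reported the error is affected.
        return {local_host}
    if scope is ErrorScope.DATA_CENTER:
        # Every system in the same data center as the local host is affected.
        for hosts in (dc_a_hosts, dc_b_hosts):
            if local_host in hosts:
                return set(hosts)
        raise ValueError(f"{local_host} is not listed in either data center")
    if scope is ErrorScope.CLUSTER:
        # No system in the cluster may run the service.
        return set(dc_a_hosts) | set(dc_b_hosts)
    raise ValueError(f"unknown error scope: {scope!r}")
```

For example, with hosts a1 and a2 in data center A and b1 and b2 in data center B, a data center error reported on a1 excludes a1 and a2 but leaves b1 and b2 as candidates.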
3. Restart the node that was shut down.

NOTE: The time to detect a storage outage due to failure of all paths to storage depends on the setting for no_path_retry in the multipath software configuration. A value of fail does not queue I/O in the event of a failure in all paths and returns an immediate failure. For information about the recommended value for your environment, see the DM-Multipath documentation. Some resource agents, such as LVM, offer a mechanism called self_fence to take themselves out of a cluster through node reboot when an underlying logical volume can no longer be accessed. For supported options, see the RHCS documentation.
Problem
Resource XYZ: XP Cluster Extension: device group XYZ is not in PAIR state.
This message appears even though the device group is in PAIR state.

Solution
If you are using the pair/resync monitor, the ResyncMonitorInterval must be less than or equal to the resource monitor interval for the XP Cluster Extension resource to prevent erroneous logging.

The ResyncMonitorInterval in XP Cluster Extension defines when the pair/resync monitor checks the actual device group state. This state remains valid and is shown until the next update (ResyncMonitorInterval) occurs. If the actual XP disk pair state changes between two ResyncMonitorInterval updates, the PAIR state shown by the pair/resync monitor will not be correct.

The resource monitor checks the status of the XP Cluster Extension resource at the resource monitor interval of the cluster software. The XP Cluster Extension resource reports the status of the device group at that interval based on the current state in the pair/resync monitor. If the ResyncMonitorInterval is set to a higher value than the resource monitor interval for the XP Cluster Extension resource, the pair/resync monitor updates the device group state less often than the resource monitor reads it. However, the XP Cluster Extension resource logs messages only if the device group is not in PAIR state or if an XP RAID Manager error occurred (for example, if XP RAID Manager is not running).
Example
Set the XP Cluster Extension agent's MonitorInterval attribute to 60 seconds (the default value); then set the XP Cluster Extension resource ResyncMonitorInterval attribute to less than 60 seconds.
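The interval rule from the Solution text can be written as a one-line check. The following Python sketch is illustrative only: the function name is hypothetical, both values are assumed to be in seconds, and it applies the "less than or equal to" condition stated in the Solution text.

```python
def resync_interval_is_valid(resync_monitor_interval, resource_monitor_interval):
    """Return True if ResyncMonitorInterval does not exceed the resource
    monitor interval (both values in seconds, both assumed positive)."""
    return 0 < resync_monitor_interval <= resource_monitor_interval


# With the default 60-second MonitorInterval from the example:
print(resync_interval_is_valid(30, 60))  # a 30-second ResyncMonitorInterval passes
print(resync_interval_is_valid(90, 60))  # a 90-second ResyncMonitorInterval fails
```

A failing check corresponds to the configuration in which the resource can report a stale device group state.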
Subscription service
HP recommends that you register your product at the Subscriber's Choice for Business website:
http://www.hp.com/go/e-updates
After registering, you will receive e-mail notification of product enhancements, new driver versions, firmware updates, and other product resources.
Related information
The following documents and websites provide related information:
HP StorageWorks XP Cluster Extension Software Installation Guide
HP StorageWorks XP RAID Manager User's Guide
HP StorageWorks XP Continuous Access Software User Guide
HP StorageWorks XP Continuous Access Software Journal User Guide
HP StorageWorks XP Business Copy Software User Guide
HP StorageWorks SAN Design Reference Guide
You can find these documents on the Manuals page of the HP Business Support Center website: http://www.hp.com/support/manuals
In the Storage section, click Storage Software, and then select your product.
White papers
The following white papers are available at www.hp.com/storage/whitepapers:
Live Migration across data centers and disaster tolerant virtualization architecture with HP StorageWorks Cluster Extension and Microsoft Hyper-V
Considerations in HP StorageWorks XP Cluster Extension configurations to stop automatic XP CA disk pair resynchronization when CA link is suspended
Implementing HP StorageWorks Cluster Extension for Windows in a VMware Virtual Machine
Migrating HP StorageWorks XP Cluster Extension Quorum Filter Service Implementations to Microsoft Majority Node Set Quorum Configurations
Migrating an HP Serviceguard for Linux Cluster to a Novell SUSE Linux Enterprise High Availability Extension Cluster
Migrating an HP Serviceguard for Linux Cluster to Red Hat Cluster Suite in Red Hat Enterprise Linux 5 Advanced Platform
HP websites
For additional information, see the following HP websites:
http://www.hp.com
http://www.hp.com/go/storage
http://www.hp.com/service_locator
http://www.hp.com/support/manuals
http://www.hp.com/storage/spock
www.hp.com/storage/whitepapers
http://docs.hp.com/en/ha.html
Typographic conventions
Table 5 Document conventions

Convention                                  Element
Blue text: Table 5                          Cross-reference links and e-mail addresses
Blue, underlined text: http://www.hp.com    Website addresses
Bold text                                   Keys that are pressed; text typed into a GUI element, such as a box; GUI elements that are clicked or selected, such as menu and list items, buttons, tabs, and check boxes
Italic text                                 Text emphasis
Monospace text                              File and directory names
Monospace, italic text                      Code variables; command variables
Monospace, bold text                        Emphasized monospace text
CAUTION: Indicates that failure to follow directions could result in damage to equipment or data.
Glossary
CHA: Channel adapter. A device that provides the interface between the array and the external host system. Occasionally, this term is used synonymously with the term channel host interface processor (CHIP).
CLI: Command-line interface. An interface comprised of various commands which are used to control operating system responses.
command device: A volume on the disk array that accepts HP StorageWorks Continuous Access or HP StorageWorks Business Copy control operations which are then executed by the array.
CU: Control unit.
CVS: Custom volume size. CVS devices (OPEN-x CVS) are custom volumes configured using array management software to be smaller than normal fixed-size OPEN system volumes. Synonymous with volume size customization (VSC).
DLL: Dynamic-link library.
DSM: Device Specific Module.
failover: Disconnecting a failed unit or path and replacing it with an alternative unit or path to continue functioning.
FC: Fibre Channel. A network technology primarily used for storage networks.
fence level: A method of setting rejection of XP Continuous Access Software write I/O requests from the host according to the condition of mirroring consistency.
GUI: Graphical User Interface.
GUID: Globally unique identifier.
HACMP: High Availability Cluster Multi-Processing. An IBM application for AIX software.
HBA: Host bus adapter.
heartbeat: A periodic synchronization signal issued by cluster software or hardware to indicate that a node is an active member of the cluster.
LD, LDEV: Logical device. An LDEV is created when a RAID group is carved into pieces according to the selected host emulation mode (that is, OPEN-3, OPEN-8, OPEN-9). The number of resulting LDEVs depends on the selected emulation mode. An LDEV is also known as a volume.
LU: Logical unit.
LUN: Logical unit number.
LUSE: The LUSE feature is available when the HP StorageWorks LUN Manager product is installed, and allows a LUN, normally associated with only a single LDEV, to be associated with 1 to 36 LDEVs. Essentially, LUSE makes it possible for applications to access a single large pool of storage. See also LD, LDEV.
LVM: Logical Volume Manager.
Minimum active paths.
MMC: Microsoft Management Console.
MNS: Majority node set.
MSCS: Microsoft Cluster Service.
MU: Mirror unit.
NIC: Network interface card. A device that handles communication between a device and other devices on a network.
path: A path is created by associating a port, a target, and a LUN ID with one or more LDEVs. Also known as a LUN.
PCF: Product Configuration File.
port: A physical connection that allows data to pass between a host and the disk array. The number of ports on an XP disk array depends on the number of supported I/O slots and the number of ports available per I/O adapter. The XP family of disk arrays supports FC ports as well as other port types. Ports are named by port group and port letter, such as CL1-A, in which CL1 is the group, and A is the port letter.
primary site: The data center location that owns the cluster group (quorum resource).
PSUS: Pair suspended-split.
PVOL: Primary volume.
quorum: In MSCS, a cluster resource that has been configured to control the cluster, maintaining essential cluster data and recovery information. In the event of a node failure, the quorum acts as a tie-breaker and is transferred to a surviving node to ensure that data remains consistent within the cluster.
RAID: Redundant array of independent disks.
SCSI: Small Computer Systems Interface. A standard, intelligent parallel interface for attaching peripheral devices to computers, based on a device-independent protocol.
secondary site: The data center location with the mirror copy of the quorum disk pair.
SMIT: System Manager Information Tool.
SMPL: Simplex.
split-brain syndrome: A state of data corruption that can occur if a cluster is re-formed as subclusters of nodes at each site, and each subcluster assumes authority, starting the same set of applications and modifying the same data.
SPOCK: Single Point of Connectivity Knowledge website. SPOCK is the primary portal used to obtain detailed information about supported HP StorageWorks product configurations.
SPOF: Single point of failure.
SVOL: Secondary or remote volume. The copy volume that receives the data from the primary volume.
SVP: Service processor. A notebook computer built into the disk array. The SVP provides a direct interface to the disk array and is used only by the HP service representative.
Target mode SCSI.
UAC: User account control.
UCF: User configuration file.
VCS: VERITAS Cluster Server.
volume: On the XP array, a volume is a uniquely identified virtual storage device composed of a control unit (CU) component and a logical device (LDEV) component separated by a colon. For example, 00:00 and 01:00 are two uniquely identified volumes; one is identified as CU = 00 and LDEV = 00, and the other as CU = 01 and LDEV = 00; they are two unique separate virtual storage devices within the XP array.
VSC: Volume size customization. Also known as CVS.
VM: Virtual Machine.
HP StorageWorks XP Business Copy Software. XP Business Copy Software lets you maintain up to nine local copies of logical volumes on the disk array.
HP StorageWorks XP Continuous Access Software. XP Continuous Access Software lets you create and maintain duplicate copies of local logical volumes on a remote disk array.
Index
A
agent configuring, 75 disabling, 87 APPLICATION section description, 128 application service failover, 159 ApplicationDir object description, 128 ApplicationStartup object description, 129 AsyncTakeoverTimeout object description, 130 AutoFailbackType description, 48 automatic recovery, 149 AutoRecover object description, 131 rolling disaster protection, 142 cluster software integration with XP Cluster Extension, 11 ClusterNotifyCheckTime description, 133 UCF requirement, 64 ClusterNotifyWaitTime description, 133 UCF requirement, 64 clxhosts updating, 108 COMMON section description, 127 configuration configuration tool Windows, 37 consolidated site, 13 Linux, 97, 102 Microsoft Cluster Service, 37 one to one, 12 configuration information exporting, 41 importing, 41 configuration tool Windows, 37 contacting HP, 161 conventions document, 162
B
Basic Resource Health Check Interval description, 47 BCEnabledA object description, 131 BCEnabledB object description, 132 BCMuListA object description, 132 BCMuListB object description, 132 BCResyncEnabledA object description, 132 BCResyncEnabledB object description, 132 BCResyncMuListA object description, 133 BCResyncMuListB object description, 133
D
DataLoseDataCenter object description, 133 DataLoseMirror object description, 134 DC_A_Hosts object description, 134 DC_B_Hosts object description, 134 deleting a device group, 87 dependencies adding (CLI), 66 adding (Windows Server 2008/2008 R2), 65 Device Mapper Multipath Software Rescanning devices, 106 DeviceGroup object description, 135
C
CLI XP Cluster Extension, 12
disaster tolerance, 11 disk pairs XP Continuous Access, 11 document conventions, 162 related documentation, 161 documentation HP website, 161 providing feedback, 163
H
HACMP bringing resource groups online, 30 configuring resources, 25 custom cluster events, 32, 33 failover error handling, 150 failure, 35 integrating XP Cluster Extension, 26 integration with XP Cluster Extension, 25 pair/resync monitor, 32 restrictions, 35 taking resource groups offline, 31 timing considerations, 34 help obtaining, 161 HP technical support, 161 Hyper-V Live Migration, 71, 74
E
enabling a service group, 155 error return codes failover, 150 exporting configuration information, 41
F
FailoverPeriod description, 48 FailoverThreshold description, 48 fast failback XP Continuous Access Asynchronous Software, 12 FASTFAILBACK value description, 130 FastFailbackEnabled object description, 135 VCS setting, 89 features XP Cluster Extension, 11 fence levels XP Continuous Access, 14 FenceLevel object description, 135 files clxhosts, 108 event log, 159 force flag, 144 services, 26, 78, 108 Filesystems object description, 135 force flag file, 144 forceflag option, 114
I
importing configuration information, 41 instances starting and stopping, 22 IsAlivePollInterval description, 47
J
JournalDataCurrency object description, 136
L
Linux timing considerations, 109 Linux Device Mapper Multipath Software, 106 pair/resync monitor, 108 pair/resync monitor integration, 109 live migration, 64, 71, 74 LocalDCLMForNonPAIRDG description, 136 log file location, 149 log files Microsoft Cluster Service, 73 MSCS, 74 LogDir object description, 127 LogLevel UCF requirement, 64 LogLevel object description, 127
G
group names Microsoft Cluster Service, 42, 44
LooksAlivePollInterval description, 47
M
majority node set Microsoft Cluster Service, 15 MergeCheckInterval UCF requirement, 64 Microsoft Cluster Service adding dependencies, 64 administration, 73, 74 changing resource names, 44, 45 configuration example, 66 configuration file, 123 configuring XP RAID Manager advanced properties, 58 configuring XP RAID Manager device group details, 58 configuring XP RAID Manager instances, 57 data center assignments, 59 group names, 42, 44 integration with XP Cluster Extension, 37 majority node set, 15 resource names, 42, 44 Microsoft Management Console, 46 mounting a file system, 121 multipath_rescan script, 106
PendingTimeout description, 48 post-execution programs invoking, 145 return codes, 147 PostExecCheck object description, 137 PostExecScript object description, 137 pre-execution programs invoking, 145 return codes, 146 PreExecScript object description, 137 programs post-execution, 145 pre-execution, 145
R
RaidManagerInstances object description, 137 recommendations log files, 74 recovering PAIR state, 159 recovery disk pair states, 119 procedures, 119 sequence, 120 recovery procedure, 60 Red Hat Cluster Service, 97, 102 environment file, 97, 102 related documentation, 161 remote management, 46, 62 Windows Server 2003, 73 Windows Server 2008/2008 R2, 73 removing a combination, 115 resource groups HACMP, 30, 31 resource names Microsoft Cluster Service, 42, 44
N
names changing (Microsoft Cluster Service), 45 changing (MSCS), 44 network considerations XP RAID Manager, 21
O
objects APPLICATION section, 128 COMMON section, 126 XP Cluster Extension, 123
P
pair/resync monitor configuring for Linux, 108 configuring for Microsoft Cluster Service, 38 HACMP integration, 32 integration with Microsoft Cluster Service, 52, 60 invoking, 143 port, 26, 38, 40, 78, 108 troubleshooting, 159
resources adding for Microsoft Cluster Service, 42 adding for VCS, 80 adding for Windows Server 2008/2008 R2, 44 adding with the CLI, 44 bringing online, 70 changing attributes for VCS, 83 changing properties for Microsoft Cluster Service, 49 configuring for Microsoft Cluster Service, 45 configuring for VCS, 79 deleting for MSCS, 70 linking for VCS, 84 Microsoft Cluster Service, 45 properties (CLI), 63 properties (UCF), 64 taking offline, 70 Response to resource failure description, 47 RestartAction description, 47 RestartPeriod description, 47 RestartThreshold description, 47 resynchronizing a disk pair, 88, 109 ResyncMonitor object description, 138 rolling disaster protection, 142 ResyncMonitorAutoRecover attribute, 71, 87 ResyncMonitorAutoRecover object description, 138 ResyncMonitorInterval object description, 138 RESYNCWAIT value description, 130 ResyncWaitTimeout object description, 138 return codes post-execution, 147 pre-executable, 146 RHCS configuration file, 124 rolling disaster protection, 12 automatic recovery, 142 configuration with XP Business Copy Software, 141 pair/resync monitor, 142 restoring server operation, 142 setting in user configuration file, 141
S
SearchObject object description, 127 server restoring operation, 142 service or application bouncing, 72 ServiceGroupHB resources, 84 SG recovery, 109 starting errors, 150 StatusRefreshInterval description, 138 UCF requirement, 64 Subscriber's Choice, HP, 161
T
takeover function failure, 147 technical support HP, 161 Thorough Resource Health Check Interval description, 47 timing HACMP considerations, 34 Microsoft Cluster Service considerations, 72 timing considerations Linux, 109 troubleshooting offline condition (VCS), 90 XP Cluster Extension problems, 149 typographic conventions, 162
U
user configuration file LocalDCLMForNonPAIRDG, 136
user configuration file APPLICATION section, 128 ApplicationDir object, 128 ApplicationStartup object, 129 AsyncTakeoverTimeout object, 130 AutoRecover object, 131 BCEnabledA object, 131 BCMuListA object, 132 BCMuListB object, 132 BCResyncEnabledA object, 132 BCResyncEnabledB object, 132 BCResyncMuListA object, 133 BCResyncMuListB object, 133 COMMON section, 126, 127 DataLoseDataCenter object, 133 DataLoseMirror object, 134 DC_A_Hosts object, 134 DC_B_Hosts object, 134 DeviceGroup object, 135 FASTFAILBACK value, 130 FastFailbackEnabled object, 135 FenceLevel object, 135 Filesystems object, 135 HACMP, 123 HACMP customization, 28 JournalDataCurrency object, 136 LogDir object, 127 LogLevel object, 127 object formats, 124 objects, 123 PostExecCheck object, 137 PostExecScript object, 137 PreExecScript object, 137 RaidManagerInstances object, 137 ResyncMonitor object, 138 ResyncMonitorAutoRecover object, 138 ResyncMonitorInterval object, 138 RESYNCWAIT value, 130 ResyncWaitTimeout object, 138 sample, 139 SearchObject object, 127 specifying object values, 124 structure, 124 VcsBinPath object, 127 Vgs object, 139 XPSerialNumbers object, 139
W
websites HP , HP Subscriber's Choice for Business, 161 product manuals, 161
X
XP Business Copy Software rolling disaster protection, 141 XP Cluster Extension CLI, 12 cluster software, 11 configurations consolidated-site, 13 one-to-one, 12 configuring with Microsoft Cluster Service, 37 dependency on XP RAID Manager, 20 environments, 14 features, 11 XP Continuous Access configurations, 14 fence levels, 14 pairs, 11 XP Continuous Access Asynchronous Software fast failback, 12 XP RAID Manager, 20 creating instances, 20 device groups, 21 network considerations, 21 rolling disaster protection, 141 setting up, 20 starting and stopping instances, 22 testing failover and failback, 22 XPSerialNumbers object description, 139
V
VCS recovery, 88 VcsBinPath object description, 127 VERITAS Cluster Server (VCS) configuration file, 124 integration with XP Cluster Extension, 75