
High Availability and Fault Tolerance

Module 11

© 2011 VMware Inc. All rights reserved.

You Are Here

- Course Introduction
- Introduction to Virtualization
- Virtual Machines
- VMware vCenter Server
- Configure and Manage Virtual Networks
- Configure and Manage Virtual Storage
- Managing Virtual Machines
- Data Protection
- Access & Authentication Control
- Resource Management and Monitoring
- High Availability
- Scalability
- Patch Management
- Installing vSphere Components

VMware vSphere: Install, Configure, Manage Revision A


Importance

Most organizations rely on computer-based services such as email, databases, and Web-based applications. The failure of any of these services can mean lost productivity and revenue, so configuring highly available services is essential for an organization to remain competitive in contemporary business environments. vSphere 5 introduces a new vSphere High Availability architecture.


Module Lessons

Lesson 1: Introduction to vSphere High Availability
Lesson 2: Configuring vSphere High Availability
Lesson 3: vSphere High Availability Architecture
Lesson 4: Introduction to Fault Tolerance


Lesson 1: Introduction to vSphere High Availability


Learner Objectives

After this lesson, you should be able to do the following:
- Describe the various options that you can configure to ensure high availability in a vSphere 5 environment.
- Discuss the response of vSphere High Availability when a VMware ESXi host, a virtual machine, or an application fails.


VMware Offers Protection at Every Level


vSphere features provide protection against hardware failures, planned maintenance with zero downtime, and protection against unplanned downtime and disasters at each level:
- Component: NIC teaming, storage multipathing
- Server: vSphere High Availability and Fault Tolerance, VMware vSphere vMotion, DRS
- Storage: vSphere Storage vMotion
- Data: third-party backup solutions, VMware Data Recovery
- Site: Site Recovery Manager


vCenter Server Availability - Recommendations

Make VMware vCenter Server and the components it relies on highly available. vCenter Server relies on:
- vCenter Server database: Cluster the database. Refer to the specific database documentation.
- Active Directory structure: Set up with multiple redundant servers.

Methods for making vCenter Server available:
- Use vSphere High Availability to protect the vCenter Server virtual machine.
- Use VMware vCenter Server Heartbeat.


High Availability

A highly available system is one that is continuously operational for a desirably long period of time.

Level of availability    Downtime per year
99%                      87.6 hours (about 3.65 days)
99.9%                    8.76 hours
99.99%                   52 minutes
99.999%                  5 minutes

What level of virtual machine availability is important to you?
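The downtime figures follow directly from the availability percentage: the unavailable fraction (100% minus the availability level) multiplied by the hours in a year. The sketch below reproduces the table, assuming a 365-day year (8,760 hours) and ignoring leap years.

```python
# Hours in a non-leap year (365 days x 24 hours).
HOURS_PER_YEAR = 365 * 24


def downtime_per_year_hours(availability_pct: float) -> float:
    """Maximum downtime per year, in hours, allowed by an availability percentage."""
    unavailable_fraction = 1 - availability_pct / 100
    return unavailable_fraction * HOURS_PER_YEAR


if __name__ == "__main__":
    for level in (99.0, 99.9, 99.99, 99.999):
        hours = downtime_per_year_hours(level)
        print(f"{level}%: {hours:.2f} hours = {hours * 60:.1f} minutes per year")
```

Each additional "nine" cuts the allowed downtime by a factor of ten, which is why 99.999% leaves only about 5 minutes per year.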


vSphere High Availability


vSphere HA
- Level of availability: High availability
- Amount of downtime: Minimal
- Guest operating systems supported: Works with all supported guest operating systems
- VMware ESXi hardware supported: Works with all supported ESXi hardware
- Uses: Use to provide high availability for the virtual machines that require that level of protection.


vSphere HA Failure Scenarios

- ESXi host failure
- Guest OS failure
- Application failure


High Availability Failure Scenario - Host


[Figure: a vSphere HA cluster of three ESXi hosts, managed by vCenter Server, running virtual machines A-F on shared LUNs 1-3]

When a host fails, vSphere HA restarts the affected virtual machines on other hosts in the cluster.

High Availability Failure Scenario - Guest Operating System


[Figure: a vSphere HA cluster of three ESXi hosts, managed by vCenter Server, running virtual machines A-F with VMware Tools installed, on shared LUNs 1-3]

When a virtual machine stops sending VMware Tools heartbeats or its virtual machine process (vmx) crashes, vSphere HA resets the virtual machine.

High Availability Failure Scenario - Application


[Figure: a vSphere HA cluster of three ESXi hosts, managed by vCenter Server, running virtual machines A-F, each with an application, on shared LUNs 1-3]

When an application fails, vSphere HA restarts the affected virtual machine on the same host. Application monitoring requires VMware Tools to be installed.

Review of Learner Objectives

You should be able to do the following:
- Describe the various options that you can configure to ensure high availability in a vSphere 5 environment.
- Discuss the response of vSphere High Availability when an ESXi host, a virtual machine, or an application fails.


Lesson 2: Configuring vSphere High Availability


Learner Objectives

After this lesson, you should be able to do the following:
- Configure a vSphere HA cluster.


Enabling vSphere HA
Enable vSphere HA by creating a cluster or modifying a vSphere Distributed Resource Scheduler (DRS) cluster.


Configuring vSphere HA Settings

Host Monitoring is enabled by default. Disable Host Monitoring when you perform maintenance on any host in the cluster.
Admission control determines whether enough resources remain in the cluster to power on virtual machines while preserving failover capacity. By default, power-on and other operations that would violate the configured admission control policy are disallowed.

Admission control helps ensure that sufficient resources are available to provide high availability. The default policy is Host failures the cluster tolerates (VMware recommended setting).
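To make the admission check concrete, here is a minimal sketch of how a power-on request could be evaluated under the Percentage of cluster resources reserved as failover spare capacity policy. It is a simplification under stated assumptions: the `Resources` type and `allow_power_on` function are hypothetical, reservations are passed in directly, and memory overhead and default reservations are ignored.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpu_mhz: float
    mem_mb: float


def failover_capacity_pct(total: Resources, reserved: Resources) -> Resources:
    """Share of cluster capacity, in percent, not claimed by VM reservations."""
    return Resources(
        cpu_mhz=100 * (total.cpu_mhz - reserved.cpu_mhz) / total.cpu_mhz,
        mem_mb=100 * (total.mem_mb - reserved.mem_mb) / total.mem_mb,
    )


def allow_power_on(total: Resources, reserved: Resources, new_vm: Resources,
                   reserved_pct: float) -> bool:
    """Admit the VM only if, after power-on, at least reserved_pct percent of both
    CPU and memory remains unreserved as failover spare capacity."""
    after = Resources(reserved.cpu_mhz + new_vm.cpu_mhz,
                      reserved.mem_mb + new_vm.mem_mb)
    remaining = failover_capacity_pct(total, after)
    return remaining.cpu_mhz >= reserved_pct and remaining.mem_mb >= reserved_pct


if __name__ == "__main__":
    cluster = Resources(cpu_mhz=30000, mem_mb=98304)
    in_use = Resources(cpu_mhz=20000, mem_mb=60000)
    print(allow_power_on(cluster, in_use, Resources(2000, 4096), reserved_pct=25))
    print(allow_power_on(cluster, in_use, Resources(3000, 4096), reserved_pct=25))
```

With 25% reserved, the first request leaves about 26.7% of CPU spare and is admitted; the second would leave about 23.3% and is refused, even though memory is still plentiful.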

Admission Control Policy Choices

Host failures cluster tolerates
- Description: Reserves enough resources to tolerate the specified number of host failures.
- Recommended use: When virtual machines have similar CPU and memory reservations and similar memory overheads.

Percentage of cluster resources reserved as failover spare capacity
- Description: Reserves the specified percentage of total capacity.
- Recommended use: When virtual machines have highly variable CPU and memory reservations.

Specify a failover host
- Description: Dedicates a host exclusively for failover service.
- Recommended use: To accommodate organizational policies that dictate the use of a passive failover host.
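The Host failures cluster tolerates policy is commonly explained in terms of slots, which also explains the slot sizes modified in Lab 18. The sketch below assumes that model: a slot is the largest CPU reservation and the largest memory reservation among powered-on virtual machines, each host holds as many slots as fit in both its CPU and its memory, and the worst case removes the hosts with the most slots first. The 32 MHz minimum CPU slot and the assumption that memory reservations already include overhead are labeled assumptions; all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed minimum CPU slot size; vSphere 5 is documented to use 32 MHz
# when VMs have no CPU reservation. Check the value for your version.
DEFAULT_CPU_SLOT_MHZ = 32


@dataclass
class VM:
    cpu_reservation_mhz: int
    mem_reservation_mb: int  # assumed to already include memory overhead


@dataclass
class Host:
    cpu_mhz: int
    mem_mb: int


def slot_size(vms: List[VM]) -> Tuple[int, int]:
    """Slot = largest CPU and largest memory reservation among powered-on VMs."""
    cpu = max([vm.cpu_reservation_mhz for vm in vms] + [DEFAULT_CPU_SLOT_MHZ])
    mem = max([vm.mem_reservation_mb for vm in vms] + [1])  # avoid dividing by zero
    return cpu, mem


def slots_per_host(host: Host, slot: Tuple[int, int]) -> int:
    cpu_slot, mem_slot = slot
    return min(host.cpu_mhz // cpu_slot, host.mem_mb // mem_slot)


def host_failures_tolerated(hosts: List[Host], vms: List[VM]) -> int:
    """How many host failures the cluster can absorb while still having one slot
    per powered-on VM, assuming the hosts with the most slots fail first."""
    slot = slot_size(vms)
    slots = sorted((slots_per_host(h, slot) for h in hosts), reverse=True)
    failures = 0
    while failures < len(slots) and sum(slots[failures + 1:]) >= len(vms):
        failures += 1
    return failures


if __name__ == "__main__":
    hosts = [Host(cpu_mhz=10000, mem_mb=32768) for _ in range(3)]
    vms = [VM(2000, 4096)] + [VM(500, 1024) for _ in range(5)]
    print(slot_size(vms), host_failures_tolerated(hosts, vms))
```

A single virtual machine with a large reservation (2,000 MHz here) sets the slot size for the whole cluster, leaving only five slots per host. This is why the policy is recommended when virtual machines have similar reservations.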


Configuring Virtual Machine Options


Configure options at the cluster level or per virtual machine.
VM restart priority determines the relative order in which virtual machines are restarted after a host failure.

Host Isolation response determines what happens to virtual machines when a host loses the management network but continues running.
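Restart priority acts as an ordering key. The sketch below orders virtual machines for restart after a host failure, assuming the vSphere 5 priority values Disabled, Low, Medium, and High, where Disabled means vSphere HA does not restart the virtual machine. The `RestartPriority` enum and `restart_order` function are hypothetical illustrations, not a vSphere API.

```python
from enum import IntEnum
from typing import List, Tuple


class RestartPriority(IntEnum):
    DISABLED = 0  # vSphere HA does not restart this VM
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def restart_order(vms: List[Tuple[str, RestartPriority]]) -> List[str]:
    """Return VM names in restart order, highest priority first,
    skipping VMs whose restart priority is Disabled."""
    eligible = [(name, p) for name, p in vms if p is not RestartPriority.DISABLED]
    # sorted() is stable even with reverse=True, so VMs with equal
    # priority keep their original relative order.
    return [name for name, _ in sorted(eligible, key=lambda vm: vm[1], reverse=True)]


if __name__ == "__main__":
    print(restart_order([
        ("web", RestartPriority.LOW),
        ("db", RestartPriority.HIGH),
        ("test", RestartPriority.DISABLED),
        ("app", RestartPriority.MEDIUM),
        ("dns", RestartPriority.HIGH),
    ]))
```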


Configuring Virtual Machine Monitoring

Reset a virtual machine if its VMware Tools heartbeat or VMware Tools application heartbeats are not received.

Determine how quickly failures are detected.

Set monitoring sensitivity for individual virtual machines.
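Monitoring sensitivity controls how quickly a missing heartbeat leads to a reset. The sketch below models the reset decision with three assumed sensitivity presets (failure interval, minimum uptime after power-on, and a cap on resets within a time window), patterned on the documented vSphere 5.x defaults; verify the values for your version. The real service also considers other signals, such as recent disk and network I/O activity, which this sketch omits. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MonitoringSensitivity:
    failure_interval_s: int  # no heartbeat for this long -> VM considered failed
    min_uptime_s: int        # grace period after power-on before monitoring acts
    max_resets: int          # resets allowed within the reset window
    reset_window_s: int


# Assumed presets; check the values documented for your vSphere version.
PRESETS = {
    "low": MonitoringSensitivity(120, 480, 3, 7 * 24 * 3600),
    "medium": MonitoringSensitivity(60, 240, 3, 24 * 3600),
    "high": MonitoringSensitivity(30, 120, 3, 3600),
}


def should_reset(now_s: int, powered_on_at_s: int, last_heartbeat_s: int,
                 recent_reset_times_s: List[int], s: MonitoringSensitivity) -> bool:
    """Decide whether to reset a VM whose heartbeats may have stopped."""
    if now_s - powered_on_at_s < s.min_uptime_s:
        return False  # VM has not been up long enough to be judged
    if now_s - last_heartbeat_s < s.failure_interval_s:
        return False  # heartbeats are still arriving in time
    resets_in_window = [t for t in recent_reset_times_s if now_s - t < s.reset_window_s]
    return len(resets_in_window) < s.max_resets  # stop after too many resets
```

A higher sensitivity detects failures sooner but is more likely to reset a virtual machine that is merely slow to send heartbeats.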


Importance of Redundant Heartbeat Networks

In a vSphere HA cluster, heartbeats are:
- Sent between the master and the slave hosts
- Used to determine if a master or slave host has failed
- Sent over a heartbeat network

The heartbeat network is implemented using a VMkernel port marked for management.

Redundant heartbeat networks allow for the reliable detection of failures.


Redundancy Using NIC Teaming

You can use NIC teaming to create a redundant heartbeat network on ESXi hosts. Both port groups must be VMkernel ports.

NIC teaming on an ESXi host



Redundancy Using Additional Networks


You can also create redundancy by configuring more heartbeat networks:
- On ESXi hosts, add one or more VMkernel networks marked for management traffic.
- Configure the port group with these settings:
  - Set Load Balancing to originating port ID.
  - Do not enable Failback.
  - Configure the port group with active/standby failover.
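The recommended port group settings can be checked mechanically. The sketch below validates a port group policy against the list above, assuming a hypothetical dictionary representation (the keys `vmkernel_management`, `load_balancing`, `failback`, `active_uplinks`, and `standby_uplinks` are illustrative, not a vSphere API). A missing `failback` key is treated as enabled, because Failback is enabled by default on vSphere port groups.

```python
from typing import Any, Dict, List


def check_heartbeat_portgroup(policy: Dict[str, Any]) -> List[str]:
    """Return a list of problems with a hypothetical port group policy dict,
    compared with the recommended heartbeat network settings."""
    problems = []
    if not policy.get("vmkernel_management", False):
        problems.append("use a VMkernel port marked for management traffic")
    if policy.get("load_balancing") != "originating_port_id":
        problems.append("set Load Balancing to originating port ID")
    if policy.get("failback", True):  # Failback is on unless explicitly disabled
        problems.append("do not enable Failback")
    if not policy.get("active_uplinks") or not policy.get("standby_uplinks"):
        problems.append("configure active/standby failover "
                        "(at least one active and one standby uplink)")
    return problems


if __name__ == "__main__":
    good = {
        "vmkernel_management": True,
        "load_balancing": "originating_port_id",
        "failback": False,
        "active_uplinks": ["vmnic0"],
        "standby_uplinks": ["vmnic1"],
    }
    print(check_heartbeat_portgroup(good))
    print(check_heartbeat_portgroup({}))
```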


Network Configuration and Maintenance

Before changing the networking configuration on the ESXi hosts (for example, adding port groups or removing vSwitches):
- Deselect Enable Host Monitoring.
- Place the host in maintenance mode.

These steps prevent unwanted attempts to fail over virtual machines.


Cluster Resource Allocation Tab

How much CPU and memory is the cluster using now? How much reserved capacity remains?


Monitoring Cluster Status


Cluster's Summary tab

The vSphere HA Cluster Status window displays details about host operational status, virtual machine protection, and heartbeat datastores. The Configuration Issues window displays the current vSphere HA operational status, including the specific status and errors for each master and slave host in the cluster.


Lab 18

In this lab, you will modify slot sizes and admission control.
1. Create a cluster enabled for vSphere HA.
2. Add your ESXi host to a cluster.
3. Test vSphere HA functionality.
4. Prepare for the next lab.


Review of Learner Objectives

You should be able to do the following:
- Configure a vSphere HA cluster.


Lesson 3: vSphere High Availability Architecture


Learner Objectives

After this lesson, you should be able to do the following:
- Describe heartbeat mechanisms used by vSphere HA.
- Identify and discuss additional failure scenarios.


vSphere HA Architecture: Agent Communication


[Figure: agent communication in a vSphere HA cluster. Each ESXi host (one master, two slaves) runs the FDM, hostd, and vpxa agents and accesses shared datastores; vCenter Server runs vpxd; all components communicate over the management network.]

vSphere HA Architecture: Network Heartbeats


[Figure: network heartbeats between a master and two slave ESXi hosts running virtual machines A-F on NAS/NFS, VMFS, and local datastores, with vCenter Server, over Management network 1 and Management network 2]

vSphere HA Architecture: Datastore Heartbeats


[Figure: datastore heartbeats between a master and two slave ESXi hosts running virtual machines A-F on NAS/NFS, VMFS, and local datastores, with vCenter Server on Management network 1 and Management network 2; inset: the cluster Edit Settings window]

Additional HA Failure Scenarios

- Slave host failure
- Master host failure
- Host isolation
- Management network failures:
  - Network partition
  - Network isolation


Failed Slave Host


[Figure: a failed slave host. A master and two slave ESXi hosts run virtual machines A-F; datastore heartbeating uses the lock file on NAS/NFS datastores and the heartbeat region on VMFS datastores, together with file locks; hosts and vCenter Server connect over a primary and an alternate heartbeat network.]
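When the master stops receiving network heartbeats from a slave, it must decide whether the slave has failed or is merely unreachable. The sketch below is a simplified, assumed version of that decision: if the slave is still updating its datastore heartbeat (the NAS/NFS lock file or VMFS heartbeat region) or still answers pings, it is alive but cut off, and it is treated as isolated or partitioned depending on whether it has declared itself isolated; only with no sign of life is it treated as failed, and its virtual machines restarted elsewhere. The enum and function names are hypothetical.

```python
from enum import Enum


class SlaveState(Enum):
    ALIVE = "alive"
    FAILED = "failed"            # restart its VMs on other hosts
    ISOLATED = "isolated"        # host is up; its isolation response applies
    PARTITIONED = "partitioned"  # host is up in another network partition


def classify_slave(network_heartbeat: bool, responds_to_ping: bool,
                   datastore_heartbeat: bool, declared_isolated: bool) -> SlaveState:
    """Simplified master-side view of a slave host (assumed decision flow)."""
    if network_heartbeat:
        return SlaveState.ALIVE
    if not datastore_heartbeat and not responds_to_ping:
        return SlaveState.FAILED  # no sign of life on any channel
    # Still alive, but not reachable over the heartbeat network.
    return SlaveState.ISOLATED if declared_isolated else SlaveState.PARTITIONED
```

Datastore heartbeats are what let the master avoid restarting virtual machines that are still running on a host it merely cannot reach.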

Failed Master Host


[Figure: a failed master host. Three ESXi hosts run virtual machines A-F: a slave with managed object ID (MOID) 98, the master with MOID 99, and a slave with MOID 100. Datastore heartbeating uses the NAS/NFS lock file and the VMFS heartbeat region; hosts and vCenter Server connect over a primary and an alternate heartbeat network, with the default gateway as isolation address.]
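When the master fails, the remaining hosts elect a new one, which is why the figure labels each host with its MOID. The sketch below assumes the election criteria documented for vSphere 5: the host with access to the greatest number of datastores wins, and ties are broken by the highest MOID compared as a string (lexically), not as a number. The `host-NN` MOID format and the `elect_master` function are assumptions for illustration.

```python
from typing import Dict


def elect_master(datastores_per_host: Dict[str, int]) -> str:
    """Pick the host with the most datastores; break ties by the lexically
    highest MOID (plain string comparison, so "host-98" > "host-100")."""
    return max(datastores_per_host, key=lambda moid: (datastores_per_host[moid], moid))


if __name__ == "__main__":
    # The former master (MOID 99) has failed; MOIDs 98 and 100 remain.
    print(elect_master({"host-98": 3, "host-100": 3}))
```

Because the comparison is lexical, the host with MOID 98 would win a tie against the host with MOID 100.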

Isolated Host

[Figure: an isolated host among three ESXi hosts running virtual machines A-F]

If the host does not observe any election traffic on the management network and cannot ping its isolation address or addresses (the default gateway in this example), the host is isolated.
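The isolation rule above is a conjunction of two checks, which is why redundant isolation addresses help: one reachable address is enough to keep the host from declaring itself isolated. A minimal sketch of that rule follows; the `host_is_isolated` function and its boolean inputs are hypothetical, and the actual pinging is left out.

```python
from typing import Sequence


def host_is_isolated(saw_election_traffic: bool,
                     isolation_ping_results: Sequence[bool]) -> bool:
    """A host considers itself isolated only if it sees no election traffic on
    the management network AND cannot ping any of its isolation addresses."""
    if saw_election_traffic:
        return False
    return not any(isolation_ping_results)


if __name__ == "__main__":
    # Two isolation addresses configured; the second one still answers.
    print(host_is_isolated(False, [False, True]))
```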


Design Considerations

Host isolation events can be minimized through good design:
- Implement redundant heartbeat networks.
- Implement redundant isolation addresses.

If host isolation events do occur, good design enables vSphere HA to determine whether the isolated host is still alive. Implement datastores so that they are separated from the management network using one or both of the following approaches:
- Fibre Channel over fibre optic
- Physically separating your IP storage network from the management network


Network Partition
[Figure: a network partition. Four ESXi hosts (one master, three slaves) run virtual machines A-H; the hosts cut off from the original master elect a second master. vCenter Server and the default gateway (isolation address) are also shown.]
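A network partition splits the management network into groups of hosts that can still reach each other, and each group elects its own master. The sketch below finds the partitions as connected components of a reachability graph using breadth-first search. The host names, the link list, and the placeholder election rule (lexically highest host name) are hypothetical; the real election uses its own criteria.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def partitions(hosts: List[str], links: List[Tuple[str, str]]) -> List[List[str]]:
    """Group hosts into partitions: connected components of the
    management-network reachability graph (links are bidirectional)."""
    neighbors: Dict[str, Set[str]] = {h: set() for h in hosts}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen: Set[str] = set()
    groups: List[List[str]] = []
    for start in hosts:
        if start in seen:
            continue
        group, queue = [], deque([start])
        seen.add(start)
        while queue:  # breadth-first search from this host
            host = queue.popleft()
            group.append(host)
            for n in neighbors[host]:
                if n not in seen:
                    seen.add(n)
                    queue.append(n)
        groups.append(sorted(group))
    return groups


def elect_masters(groups: List[List[str]]) -> List[str]:
    """Placeholder rule: each partition picks its lexically highest host name."""
    return [max(g) for g in groups]


if __name__ == "__main__":
    hosts = ["esx1", "esx2", "esx3", "esx4"]
    groups = partitions(hosts, [("esx1", "esx2"), ("esx3", "esx4")])
    print(groups, elect_masters(groups))
```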

Review of Learner Objectives

You should be able to do the following:
- Describe heartbeat mechanisms used by vSphere HA.
- Identify and discuss additional failure scenarios.


Lesson 4: Introduction to Fault Tolerance


Learner Objectives

After this lesson, you should be able to do the following:
- List Fault Tolerance requirements and limitations.
- Describe Fault Tolerance operation.


What Is Fault Tolerance (FT)?


FT: A fault-tolerant system is designed so that, in the event of an unplanned outage, a backup virtual machine can immediately take over with no loss of service. (The backup virtual machine is called a secondary virtual machine.)

- Provides a higher level of business continuity than vSphere HA
- Provides zero downtime and zero data loss for applications

FT can be used for any application that needs to be available at all times. FT can be used with DRS: fault-tolerant virtual machines benefit from better initial placement and are included in the cluster's load-balancing calculations.


VMware Fault Tolerance


Fault Tolerance
- Level of availability: Fault tolerance
- Amount of downtime: Zero
- Guest operating systems supported: Works with all supported guest operating systems
- ESXi hardware supported: Widely compatible
- Uses: Use to provide fault tolerance to your critical virtual machines.


Fault Tolerance in Action

[Figure: a primary VM and a secondary VM kept in sync by vLockstep technology; after a failure, the secondary VM becomes the new primary VM and a new secondary VM is created]

FT provides zero-downtime, zero-data-loss protection to virtual machines in a vSphere HA cluster.


Fault Tolerance Guidelines

- Check the requirements and limitations of FT.
- Ensure enough ESXi hosts for fault-tolerant virtual machines: no more than four fault-tolerant virtual machines (primaries or secondaries) on any single host.
- Store ISOs on shared storage for continuous access, especially if they are used for important operations.
- Disable BIOS-based power management. This prevents the secondary virtual machine from having insufficient CPU resources.
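The host-count guideline can be checked against a planned placement. The sketch below counts primaries and secondaries per host against the limit of four stated above, and also flags any virtual machine whose primary and secondary share a host, since the secondary must run on a different host than the primary. The placement dictionary format and the `check_ft_placement` function are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

MAX_FT_VMS_PER_HOST = 4  # guideline: primaries plus secondaries on any single host


def check_ft_placement(placement: Dict[str, Tuple[str, str]]) -> List[str]:
    """placement maps each FT-protected VM to (primary_host, secondary_host).
    Returns a list of guideline violations (empty if the placement is fine)."""
    problems = []
    per_host: Counter = Counter()
    for vm, (primary, secondary) in placement.items():
        if primary == secondary:
            problems.append(f"{vm}: primary and secondary must run on different hosts")
        per_host[primary] += 1
        per_host[secondary] += 1
    for host, count in sorted(per_host.items()):
        if count > MAX_FT_VMS_PER_HOST:
            problems.append(f"{host}: {count} fault-tolerant VMs exceeds {MAX_FT_VMS_PER_HOST}")
    return problems


if __name__ == "__main__":
    plan = {f"vm{i}": ("esx1", "esx2") for i in range(5)}
    for problem in check_ft_placement(plan):
        print(problem)
```

Spreading primaries and secondaries across more hosts keeps each host under the limit, which is the point of ensuring enough ESXi hosts for fault-tolerant virtual machines.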


Enabling Fault Tolerance on a Virtual Machine


Review of Learner Objectives

You should be able to do the following:
- List Fault Tolerance requirements and limitations.
- Describe Fault Tolerance operation.


Key Points

- vSphere HA restarts virtual machines on the remaining hosts in the cluster.
- Hosts in vSphere HA clusters have a master/slave relationship.
- Implement redundant heartbeat networks either with NIC teaming or by creating additional heartbeat networks.
- FT provides zero downtime for applications that need to be available at all times.
