
NetApp Hands-on-Lab

Version 2.0
Welcome to the NetApp
Hands-On-Lab
Introduction
Thank you for choosing the NetApp Hands-On-Lab.
In this session, you will have an opportunity to test-drive NetApp storage and management technology. The lab will demonstrate
how NetApp tools help you improve efficiency, optimize capacity, and increase operational agility.
The lab is designed in two sections:
Section One focuses on the NetApp Virtual Storage Console (VSC) plugin for VMware vCenter. This section looks at how
virtualization administrators can use the NetApp VSC to improve efficiency by simplifying the provisioning and management of
storage in a vSphere environment.
Section Two focuses on NetApp Insight Balance performance analytics software. This section looks at how your organization
can use NetApp Insight Balance to identify emerging performance issues, optimize resource allocation, and realize
substantial cost savings in the process.
Two Things To Keep In Mind
1. Each lab exercise will showcase how NetApp's integrations with VMware vSphere help virtualization admins simplify the
management, provisioning and optimization of many elements of a vSphere environment; not just storage. NetApp products
are designed and built to provide greater efficiency and improve performance. So, while this is a NetApp Hands-On-Lab,
some of the exercises will focus on non-storage elements of the environment.
2. Insight Balance software is designed for heterogeneous environments. It offers support for products from many server and
storage vendors. As a result, you will see references to both NetApp and non-NetApp products; just as you probably would in
your own data center.
To thank you for choosing the NetApp Hands-on-Lab, we have a gift for you. You'll find more information about it at the end of
this lab manual.
We hope you enjoy the lab!
Welcome to Nephosoft!
Scenario
Welcome to Nephosoft!
Nephosoft is a multi-national software company that has expanded from a niche developer of manufacturing and logistics
integration software into an enterprise application hosting firm.
Meet John Locke
John Locke is the Senior Virtualization Administrator at Nephosoft. John introduced vSphere to Nephosoft, and it's been a
game-changer for them.
This lab will follow John as he works through some of the service requests he receives in support of Project Vanquish. Nephosoft
is counting on vSphere to provide the flexibility and efficiency it needs to bring Project Vanquish in on time.
Some of the exercises in this lab will be triggered by emails John receives. Others will follow him as he performs regular
maintenance and looks for ways to optimize his infrastructure.
John arrives in the office and begins working his way through the emails that are already piling up in his Inbox.
Now, let's get started!
Getting To Know the Lab
Environment
Looking Around: Meet The NetApp Lab
First, let's find our way around the lab environment.
Close Server Manager Interface
Please close the Server Manager window.
Click on the "X" on the top right corner of the interface.
Open the vSphere Client
The lab exercises start here, on the Control Center Desktop. Your first exercises will use the VMware vSphere Client, marked 1.
Additional exercises will use NetApp Insight Balance operational analytics software, marked 2. To begin the lab, double-click on
the VMware vSphere Client icon.
Log in to vCenter
When the vSphere Client login dialog appears, click Login.
The credentials fields are already populated for your convenience.
Finding the Home Screen
When the vSphere Client opens, you will be presented with the Hosts and Clusters view.
Click on the Home icon to move to the Home screen.
Home Screen
The home screen of VMware vCenter will appear. At the bottom of the screen, in the Solutions and Applications section of the
interface, you will find the NetApp icon.
Click the NetApp icon.
Security Alert
Dismiss the Security Alert that appears.
Click Yes.
NetApp Virtual Storage Console
The NetApp VSC Overview screen will appear. Let's take a look at the information this dialog contains.
On the left side are tabs controlling the information being presented and the actions you can choose from (highlighted in Blue).
On the right side, at the top, is information about the storage controllers that are connected to the vSphere environment
(highlighted in Green).
Information about the ESX hosts is presented at the bottom of the interface, on the right side (highlighted in Red).
Update Storage Discovery
On the top right corner of the dialog, in the section presenting the list of storage controllers, you will find the Update link. This link
causes the VSC to scan the environment for storage controllers. Click Update now to begin the storage discovery process. This
process will populate the vSphere client with the list of controllers you will be using in subsequent exercises.
Select the "Provisioning and Cloning" Tab
You will now use the VSC to add a storage controller to your VMware environment.
On the left side of the VSC interface, select the Provisioning and Cloning tab.
Adding Your Storage Controllers
A list of actions will appear on the left side of the dialog, in the Provisioning and Cloning menu.
Click on Storage Controllers.
The Update process begun above has discovered the available NetApp storage controllers in the environment and populated this
dialog with a list of those controllers, including the controller we will be using in this lab. Once controllers have been discovered,
the NetApp VSC lets you manage and provision NetApp storage for the VMware vSphere environment simply and easily.
Please proceed to the next lesson.
Optimizing Storage Using
NetApp Best Practices
Configuring Your ESX Hosts for NetApp Best Practices
The first email John opens is from one of his IT managers, Bill Buzzworth. Bill is being pressured to ensure the readiness of
his data center infrastructure for the Project Vanquish launch. As a step in those preparations, Bill is asking John to take a look at
each ESX server and verify that its storage settings are optimized for NetApp Best Practices. After all, Nephosoft is primarily a
NetApp shop.
John knows he can use the NetApp VSC to verify ESX storage settings, and if needed, correct them; all using just a couple of
mouse clicks.
Let's walk through the process.
Return to Host Configuration
The Monitoring and Host Configuration tab on the left side of the dialog opens the Overview pane containing information about the
vSphere environment.
Click the Monitoring and Host Configuration tab.
Monitoring for Best Practices
NetApp recognizes that the configuration of the storage settings on an ESXi host is critical to ensure optimal storage performance
for vSphere. The VSC monitors the storage configuration of the ESXi hosts in the environment to ensure that they comply with
NetApp Best Practices for vSphere (Best Practices can be found in NetApp Technical Report TR-3749 at www.netapp.com).
This information is presented on the lower right side of the Overview pane. The presence of an alert indicates that there is a
configuration error on one of the servers. Let's use the VSC to correct the error and clear the alert.
Right-click menu integrations...
Right-click on the server esx-03a.corp.local and select "Show Details..."
Examining ESXi Host Details
The ESX Host Details dialog will appear. Notice how all of the NFS storage settings are GREEN (optimized) with the exception of
NFS.HeartbeatTimeout, which is in RED (not optimized).
Clicking the "X" in the top right corner to close the dialog and return to the Overview screen.
Applying All Advanced Settings with a few mouse clicks...
Right click on the host again, and select "Set Recommended Values..."
Configuring NetApp Recommended Settings
All NetApp controllers are true unified storage platforms, supporting both block and file protocols on the same controllers. This
dialog reflects that flexibility, presenting administrators with choices to optimize storage configuration settings for both block and
NFS storage.
In this case, the alert was generated by an error in the NFS portion of the ESXi host configuration.
UNCHECK the boxes for both HBA/CNA Adapter Settings and MPIO (Multi-path I/O), as both settings apply to block storage,
something Nephosoft is not using.
CHECK the box next to NFS Settings.
Click OK.
NFS Settings Repaired
The configuration process will complete in a moment. Notice that the alert has cleared, indicating that the host's storage
configuration is now optimized.
Verifying the NFS Settings
Once again, right-click on host esx-03a.corp.local and select "Show Details..."
Notice that the ESXi configuration setting NFS.HeartbeatTimeout is now set to "5" and that the value is in GREEN (optimized).
You have just optimized the storage configuration of an ESX host for use with NetApp storage.
Close the Host Details Pane
Dismiss the Host Details pane.
Click on the "X" on the top right corner.
The Infrastructure Is Ready
Thanks to the NetApp VSC, John was able to find and correct the storage configuration of his ESX servers so that they are in line
with NetApp Best Practices. Now, the infrastructure is one step closer to being ready for the Project Vanquish launch.
Reclaiming Your Storage
Using NetApp Deduplication
Understanding and Configuring Datastore Deduplication
John's next task for the day is a follow-up to an email he received yesterday.
Before John left for the day, Rajesh Kumar, the Director of Engineering, emailed him to ask for some help. Rajesh was concerned
that the Test/Dev team in Bangalore was about to run out of storage space to validate the output of Project Vanquish. The lack of
resources threatened the project's timely delivery. Something had to be done.
To solve that problem John decided to enable block-level deduplication of primary storage. He knows that NetApp deduplication
can reclaim over 90% of storage space in some VMware environments; particularly those that use a large number of similar VMs.
Test/Dev usually has a large number of duplicate VMs. It is a perfect candidate to reap the benefits of storage deduplication.
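To see why a large number of near-identical VMs deduplicates so well, consider the minimal sketch below. It models block-level deduplication in general terms (hash fixed-size blocks and store only one copy of each unique block); the 4 KB block size, the SHA-256 fingerprint, and the toy VM images are assumptions chosen for illustration, not a description of NetApp's implementation.

    import hashlib

    BLOCK_SIZE = 4096  # assumed block size, for illustration only

    def dedup_savings(volume_bytes: bytes) -> float:
        """Fraction of space reclaimed by keeping one copy of each unique block."""
        blocks = [volume_bytes[i:i + BLOCK_SIZE]
                  for i in range(0, len(volume_bytes), BLOCK_SIZE)]
        unique = {hashlib.sha256(b).digest() for b in blocks}  # fingerprint each block
        return 1 - len(unique) / len(blocks)

    # Twenty nearly identical guest images share almost all of their blocks,
    # so only one copy of the common data needs to be stored.
    golden_image = bytes(200 * BLOCK_SIZE)                        # stand-in for a VM image
    vms = [golden_image + bytes([i]) * BLOCK_SIZE for i in range(20)]
    print(f"space reclaimed: {dedup_savings(b''.join(vms)):.1%}")  # roughly 99.5%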
Because the initial deduplication process has a temporary impact on performance, John enabled the feature and scheduled it to
run overnight. It's the next morning and John wants to see how much space NetApp deduplication has reclaimed.
Let's take a look.
Check the Datastores
Let's first take a look at the current storage layout.
Click on "Storage Details - NAS".
Deduplication Management in VSC
In the top-right portion of the Storage Details - NAS view, there are two NFS datastores listed: nfs1 and nfs2.
If it is not already selected, click on nfs1.
NFS Datastore Details
The datastore Details pane presents virtualization administrators with a detailed view of the information about that datastore.
Note that the VSC offers information about the datastore utilization (highlighted in GREEN), as well as the vSphere hosts that are
connected to that datastore (highlighted in RED).
Storage Efficiency with Deduplication
At the bottom left of the Details dialog is the Deduplication pane. Look at the interface to confirm that:
1. deduplication is enabled on this datastore,
2. it is scheduled to run every night at midnight (sat-sun@0).
As expected, NetApp deduplication has improved storage efficiency on this datastore by 97%.
John sends an email to Rajesh letting him know that the storage space is now available to the Test/Dev team.
Enabling Deduplication on a Volume
Now let's enable deduplication on the other NFS volume.
Click Home at the top of the dialog to return to the Home Screen.
Click on Hosts and Clusters.
Select the Storage Volume to Deduplicate
Click on host "esx-01a.corp.local" on the left side of the dialog.
The datastores accessed by this host will appear on the right side of the dialog.
Select a Volume to Deduplicate
Right click on the nfs2 datastore.
Select NetApp->Provisioning and Cloning->Deduplication Management, from the menu.
Enable Deduplication on the Volume
Check the box on the right side of the dialog, next to "Enable deduplication".
As it is best to run the deduplication process during non-production hours, do not check Start Deduplication or Scan.
Click OK.
Confirm The Change
Click "Yes" to confirm that you wish to enable deduplication on this volume.
You have now enabled deduplication on volume nfs2. The process will run at midnight, the default schedule.
Back On Track
Thanks to the VSC and NetApp's deduplication of primary storage, Rajesh's QA team in Bangalore was able to recoup over 90% of
its allocated storage. The validation process for Project Vanquish is back on track.
Simplifying Datastore
Provisioning
Provisioning Datastores Quickly and Easily
John has received an email from Mira Oshenko, Nephosoft's Director of Operations.
Mira informs John that a new Logistics department is being created to handle the massive demand Nephosoft expects Project
Vanquish to drive. The email asks John to make the necessary resources available. Mira asks John to provision a 100GB
datastore and make it available immediately.
Nephosoft has a "virtualize first" policy. John knows that the new department will require storage resources to house its VMs. He
will need to provision a new storage volume and make it available to the vSphere cluster serving the new department.
Fortunately, the VSC can help.
Home Screen
Click Home to return to the main vSphere Infrastructure Client screen.
Hosts and Clusters
Click on Hosts and Clusters.
Pick a Cluster
Select "A-Cluster" as the provisioning target.
Notice that the provisioning target is the cluster; not a specific host. Choosing to act at the cluster-level allows VSC to connect the
new datastore to all of the ESX hosts in that cluster automatically. Adding a datastore to a specific host is just as easy. Simply
select the host, rather than the cluster, and follow the steps below.
Right-Click To Provision
Right-click on "A-Cluster" and select NetApp -> Provisioning and Cloning-> Provision datastore.
Accept Certificate Warning
If a certificate warning appears, dismiss it by Clicking Yes.
Note: This warning may not appear if the vSphere Infrastructure Client was not closed after the first lesson.
Select the Target Storage Controller
The NetApp Datastore Provisioning Wizard will appear. In this wizard, you can select the target controller that will house the new
volume.
Accept the default, "vsim-l-01a".
Click the Next button on the bottom right of the dialog.
Select Storage Protocol (NFS)
In this exercise, you will be creating an NFS volume. NFS is the default value and is already selected.
Please note: The messages at the bottom of the dialog are normal and may be safely ignored.
Click Next.
Specify the Details of the New Datastore
In the datastore configuration dialog, enter the following value:
Size (GB): 1000
Notice that an alert appears next to the size field. Mouse over that alert and an error message appears, stating, "The maximum
value for this field is 8.109375."
This alert appears because the value you have entered (1000 GB) exceeds the actual space available (8.10GB) in the chosen
aggregate.
The VSC is attempting to validate that the aggregate can accommodate the volume size being requested.
Ignore the error and continue entering the values listed below.
Specify the Details of the New Datastore (Continued)
Enter the following values in their respective fields, ignoring all others:
Datastore name: Logistics
Aggregate: aggr0
ThinProv: Checked
Notice that the error message disappears when "ThinProv" is enabled. "Thin provisioning" creates a volume that consumes storage
capacity only as needed. Because little or no space is required at the time the volume is created, the 1000GB of virtual space can
"fit" in the 8.10 GB of space available. As a result, there is no longer an error and the alert clears.
Click Next
Verify the Settings
The Summary dialog will appear, providing you with an opportunity to verify the settings you entered.
Confirm your entries and click the "Apply" button at the bottom right of the dialog.
View the Results of the Datastore Creation
New tasks will appear in the Recent Tasks interface of the vSphere Client.
1) A new NFS datastore will be created on the NetApp controller attached to the cluster. As the action was taken using the VSC, the
datastore will be configured with settings that are optimized for vSphere.
2) Selecting the "thin provisioning" option ensures that the datastore creation completes extremely quickly.
3) The datastore will not require its full complement of space at the time of its creation. Rather, it will grow automatically, on
demand, as additional space is required to house the growing number of VMs the Logistics department needs.
4) The datastore will be mounted (connected) to all the ESX hosts in the cluster, immediately and automatically.
The VSC allows you to simplify and automate the management of NetApp storage for your vSphere environment, saving you time
and making you more effective.
Please Note: This process will take some time to complete. Please watch the tasks as they scroll through the "Task Pane" at
the bottom of the interface to follow the progress.
Checking the Results
Click on any ESX host in the cluster.
Select the Configuration Tab
Select the Configuration tab on right side of the dialog.
Access the Storage Information
Select Storage in the Hardware pane.
Verifying the Result
The new storage is connected to this ESX host.
Ready to Go!
Nephosoft now has its infrastructure ready for the creation of its Logistics department. It is one step closer to shipping Project
Vanquish.
But there's an alert! John remembers that Mira asked for a 100GB Datastore. John appears to have added an extra '0' to the
value. We'll fix that in the next lesson.
Growing or Shrinking Live
Datastores With Ease
Resizing Live, Production Datastores Non-disruptively
In the previous lesson, John was provisioning storage for the new Logistics department. Using the NetApp VSC, he was able to
provision a new datastore to all hosts in a vSphere cluster with just a few mouse clicks. Unfortunately, he accidentally added an
extra zero to the value he entered in the Datastore Provisioning Wizard. As a result, John created a datastore with a size of 1TB
(1000GB) instead of the 100GB datastore he intended.
Fortunately, that's not a problem. NetApp storage allows the dynamic resizing (growing or shrinking) of live datastores. In this
lesson, we'll shrink John's 1000GB datastore to 100GB using the resizing capabilities available in the VSC.
Starting the Resizing Process
Click "Home" at the top of the vSphere Client
Click Datastores and Datastore Clusters
In the Inventory pane, select "Datastores and Datastore Clusters".
Expanding the List of Datastores
The list of datastores is not yet visible.
1. Click on the "+" sign next to vc-w8-01.corp.local to expand the tree.
2. Click on the "+" sign next to "Corp" to reveal the complete list of datastores.
Select the "Logistics" Datastore
Select Logistics from the list of available datastores.
Right-Click To Resize
Right-click on the Logistics datastore and select NetApp -> Provisioning and Cloning -> Resize
Correct the Size of the Datastore
The Resize Datastore wizard will appear.
This interface presents the Volume Settings information for the Logistics datastore, and offers the ability to resize that datastore by
entering a new value in the New datastore size field.
Clear this field and enter the correct value of "100".
Click OK.
Confirm the Change
Click "Yes" to confirm the change.
View Summary
The information in the Capacity pane (highlighted in RED) on the Summary tab indicates that we have shrunk the datastore from
1000GB to the proper size of 100GB.
By using the VSC to shrink the datastore, the change propagated to all the hosts in the vSphere cluster automatically, as the
Recent Tasks pane indicates (highlighted in GREEN).
Please Note: This process will take a few minutes to complete. Please watch the tasks as they scroll through the "Task Pane"
at the bottom of the interface to follow the progress.
Conclusion
Now, John has corrected his error and the volume is the correct size.
As this lesson has shown, NetApp storage allows you to shrink a live, production datastore non-disruptively. However, not only
can you shrink a datastore, you can also grow it to accommodate increases in demand; and the VSC puts that functionality a few
mouse clicks away.
Cloning Virtual Machines
Using FlexClone
Rapidly Cloning Virtual Machines
John receives an email from Beth Williams, the Director of Finance. In anticipation of the massive growth in revenue from Project
Vanquish, she will be expanding the Accounting department by 19 employees. Nephosoft's onboarding policy was updated
recently to require that all new employees be provided with virtual machines as their primary desktops.
John has used vSphere's cloning capability in the past to deploy desktop VMs. However, he knows that NetApp's native Rapid
Cloning capability can deploy those clones more quickly and with greater space efficiency.
Let's walk through the process.
Preparing to Deploy VM Clones Using the VSC
Select Home->VMs and Templates in the vSphere Infrastructure Client.
Choose the Template to Clone
Select "Accounting Template" from the list of available VMs.
Right-Click to Clone
Right-click on the Accounting Template.
Select NetApp -> Provisioning and Cloning -> Create rapid clones
Create Rapid Clones Wizard
The Rapid Clones Wizard offers the ability to select which storage controller to use to house the newly cloned VMs.
Accept the default value and click Next.
Cloning to the Cluster
The Rapid Cloning Wizard requires you to select a target ESX server to host the clones. For the purpose of this lab, we will be
creating clones at the cluster-level.
Why deploy clones onto the cluster?
The VSC offers vSphere-aware, intelligent cloning. The rapid cloning utility understands that spreading workloads across all the
members of the vSphere cluster will help prevent any single ESX host from becoming overburdened. To balance the load created
by these new clones, the VSC's cloning algorithm places VMs in a round-robin fashion across all the hosts in the chosen cluster. If
a specific ESX host was selected, the clones would be created on that host.
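As a rough illustration of the distribution idea, round-robin placement can be sketched as follows. The helper below is not the VSC's actual algorithm, and the second host name and the clone naming are assumptions that simply follow the lab's pattern.

    from itertools import cycle

    def round_robin_placement(clone_names, hosts):
        """Assign each clone to the next host in the cluster, cycling through the list."""
        host_cycle = cycle(hosts)
        return {clone: next(host_cycle) for clone in clone_names}

    hosts = ["esx-01a.corp.local", "esx-02a.corp.local", "esx-03a.corp.local"]
    clones = [f"Accting{i}" for i in range(1, 20)]            # 19 clones, as in the exercise
    placement = round_robin_placement(clones, hosts)
    print(placement["Accting1"], placement["Accting2"], placement["Accting4"])
    # Each host ends up with 6 or 7 of the 19 clones, so no single ESX host is overburdened.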
Select "A-Cluster" and click Next.
Choose the Clones' Disk Format
The VSC allows you to specify what disk format to use.
In this case, the default value, "Same format as source", will do.
Click Next.
Virtual Machine Details
The Rapid Cloning Wizard offers you the ability to customize settings for your clones.
Please Note: The VMware lab environment is a shared infrastructure. In order to ensure that lab resources are available for all
lab students, please customize only the following settings:
Memory Size (MB): 128
Number of clones: 19 (Verify that the number you have entered in this field is "19".)
Clone name: Accting
WARNING: Attempting to change any other settings in this dialog will cause errors that may prevent you from completing
your lab at this time.
Ensure that your settings match those pictured above and click Next.
Select the Target Datastore
We will be placing these clones in the nfs2 datastore.
Select nfs2 and click Next.
Review Summary
Verify that your settings match those in the sample diagram and click Apply.
Sit Back and Watch the Magic Happen...
Cloning VMs using the VSC allows vSphere to take advantage of NetApp's native cloning feature, FlexClone. Using the intelligence
of NetApp's WAFL (Write Anywhere File Layout), Rapid Cloning on NetApp storage delivers VM clones extremely quickly and with a
near-zero storage footprint.
Please note: This task may take several minutes to complete in the lab environment. On physical NetApp storage, the process
completes much more quickly.
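The reason a clone appears almost instantly and with a near-zero footprint is that it initially consists of metadata pointing at the parent's existing blocks; only blocks that are later overwritten consume new space. The sketch below is a generic copy-on-write model offered for intuition only, not a description of how WAFL or FlexClone is implemented.

    class CowClone:
        """Generic copy-on-write clone: reads fall through to the parent until a block is overwritten."""

        def __init__(self, parent_blocks: dict):
            self.parent = parent_blocks   # shared with the parent, never modified by the clone
            self.delta = {}               # only blocks written after cloning live here

        def read(self, block_no: int) -> bytes:
            return self.delta.get(block_no, self.parent[block_no])

        def write(self, block_no: int, data: bytes) -> None:
            self.delta[block_no] = data   # new space is consumed only now

    template = {n: b"\x00" * 4096 for n in range(1024)}   # 4 MB parent image
    clone = CowClone(template)                            # "instant", consumes almost no data space
    clone.write(7, b"\x01" * 4096)
    print(len(clone.delta), "of", len(template), "blocks consume new space")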
Using NetApp's VSC, FlexClone and VMware vSphere, John has managed to ready Nephosoft's infrastructure for 19 new
accountants.
Note: The VSC cloning utility offers other options such as customizing the VMs with Microsoft SysPrep, or registering the clones with
the VMware View Connection Server. However, those features are beyond the scope of this lab.
Make No Mistake! -
Destroying Datastores
Without Fear
Destroying Datastores Intelligently Using The VSC
John receives another email from Rajesh.
Rajesh informs John that management has decided that they will be moving the Engineering desktop infrastructure into the cloud.
They believe that providing a cloud-based development environment will expedite the delivery of Project Vanquish. As a result of
this change, Engineering will no longer need the VMs that John has provisioned for the engineers, or the storage that he had
allocated for them.
The NetApp VSC makes it easy to disconnect and destroy a datastore serving a vSphere cluster. Let's walk through the process of
destroying both the Engineering storage volume and the VMs that volume contains.
Select the Storage Volume
Engineering uses the nfs1 datastore.
In the Home->Inventory->Datastores and Datastore Clusters view, select the "nfs1" datastore.
Right-Click To Destroy
Right-click on the nfs1 datastore and select NetApp -> Provisioning and Cloning -> Destroy
Ensure the Integrity of vSphere Before Destroying the Datastore
The NetApp VSC has a built-in understanding of vSphere. It recognizes that a datastore serving a vSphere cluster will likely contain
virtual machine data. So, before destroying any datastore, the VSC...
1) ...examines the datastore and the vSphere environment for any VM data living on that storage,
2) ...presents a list of the VMs that will be affected by destroying the datastore,
3) ...warns you that the VMs will be unregistered from vCenter and destroyed,
4) ...asks you to confirm that you intend to destroy the VMs and the datastore housing them,
...and only then does the VSC destroy the VM data and the datastore on which they resided.
In this way, the VSC's awareness of vSphere helps protect you from taking actions with unintended consequences.
Verify the List of VMs
As expected, the nfs1 datastore contains the Engineering VMs. This dialog presents the list of VMs that will be destroyed along
with the datastore.
Acknowledge the intended destruction by clicking "OK".
A Final Layer of Protection
The VSC will ask you again. After all, you don't want to make a mistake.
Click Yes.
Watch the Carnage
The VSC will complete the datastore destruction process automatically.
It will...
1) ...unregister all the VMs contained in the targeted datastore from the vSphere cluster.
2) ...unmount the datastore from all the ESX hosts that are connected to it.
3) ...disable the export on the NetApp controller.
4) ...and delete the volume on the backend storage.
The native vSphere awareness built into the VSC allows you to destroy the datastore in one easy workflow, while still protecting you
from unintended consequences.
Please Note: This process will take a few minutes to complete. Please watch the tasks as they scroll through the "Task Pane"
at the bottom of the interface to follow the progress.
Engineering In the Cloud
The Engineering desktops have been migrated into the cloud and Rajesh can now use the reclaimed storage for other purposes.
Fortunately, the NetApp VSC allowed John to make the change quickly without worrying about making a serious mistake that could
endanger Project Vanquish.
Close the vSphere Infrastructure Client
This exercise completes the first section of the NetApp lab.
Please close the vSphere Infrastructure Client by clicking the "X" on the top right corner of the interface.
Optimizing Performance and
Capacity
Troubleshooting Business Critical Apps
This portion of the NetApp Hands-on-Lab will introduce you to OnCommand Insight Balance.
Insight Balance is a powerful operational analytics software package that gathers performance and capacity information about your
environment and analyzes it to help you identify both existing and emergent performance issues. The Insight Balance portion of the
lab will call on your ability to use this performance information to:
investigate a problem
gather information about it
identify the root causes
recommend a solution strategy.
The exercises are intended to demonstrate how you can use the information Insight Balance contains to improve the performance
and efficiency of your infrastructure.
Let's begin.
The DBA
John's next email comes from Dave Hofmeyer, Nephosoft's Database Administrator (DBA). Dave reports that users are
complaining about the poor performance of a database application that is being used to support Project Vanquish. The
performance issue is having an impact on Project Vanquish calculations.
John launches OnCommand Insight Balance, NetApp's comprehensive operational analytics package, to identify the cause of the
problem and find a fix for it.
Launch OnCommand Insight Balance
Double-click on the desktop icon for NetApp Insight Balance to launch the web interface.
Accept Certificate Warning
A message warning that the website's certificate is not trusted will appear.
Click on Continue to this website to dismiss the message.
Login to the Web Console
A login prompt will appear in the browser window.
Enter the following information in the respective fields:
User: admin
Password: password
Click Login.
Please note: The login is saved and may be auto-populated by selecting "admin" from the drop-down list of saved logins.
Understanding the Insight Balance Dashboard
When the Insight Balance interface appears, maximize the size of the window.
Login will place you in the Insight Balance Dashboard. Let's familiarize ourselves with the dashboard interface.
1. The top section (highlighted in RED) presents a set of tabs that allows you to select a segment of infrastructure on which to
focus.
2. The next section, Application and Status by Priority (highlighted in YELLOW), presents the status of specific applications,
calling out the elements of the infrastructure that they use and presenting the status of those components.
3. The third section (highlighted in GREEN) presents an easy-to-consume visual summary of the status of the various parts of
the infrastructure.
4. The bottom section, Recent Alarms (highlighted in BLUE), presents the most recent alarms that were generated.
Each of these sections offers links allowing you to drill in further to gather additional information. Take a moment to look around
before continuing.
Looking Into the Database Performance Problem
Upon looking at the dashboard, John notices that the database server, Windows2003_1_MsSQL, hosting the Microsoft SQL
database has an overall status of critical (RED), suggesting there is a problem.
Looking at the right side of the interface, John notices that both the memory and CPU status are normal (GREEN). However, the
storage status is critical (RED). This combination of statuses suggests that the server is experiencing an issue related to storage.
John decides to investigate further.
Click on Windows2003_1_MsSQL, on the left side of the interface.
Server Summary Page
The Server page presents detailed information about the database server, Windows2003_1_MsSQL.
Tabs at the top of the screen (CPU, Memory, Storage or Host Contention) provide access to basic statistics for this virtual machine;
statistics similar to those found in VMware vCenter. However, Insight Balance adds a powerful analytics engine that translates
those statistics into information that you can use to identify and remediate performance issues in the infrastructure; even at a
VM-granular level.
Note: We will address the misaligned LUN warning at the top of the page in a later lesson.
Trouble in Storageville!
John notices the red exclamation (!) on the Storage tab. The presence of an error on this tab reinforces John's suspicion that the
server is experiencing a storage-related performance issue.
Perhaps the information in the graphs below on the Summary page can help John get started diagnosing the problem.
Drilling In To Diagnose the Issue
The top chart, Infrastructure Response Time (IRT), presents the average total response time for the VM's data processing events,
broken down by CPU Response Time and Storage Response Time. At the top of the chart are several statistical sample periods
(from one day to one year) to choose from.
Click the one day button (1d)
Identifying Relative Contributions to IRT
The information in the IRT chart is very telling.
The IRT values for transactions processed by the MSSQL VM are hovering in the range of 300 to 350 milliseconds (ms), in the PINK
band. Of that time, the CPU is contributing only 4 ms; a contribution so small it is barely visible on the chart (GREEN). On the other
hand, storage latency is creating the remainder of the IRT; approximately 300-345 ms (RED).
If we managed to cut the CPU response time in half, the IRT would only decrease by 2 ms, improving performance by less than
one percent.
However, if we halved the storage response time, the IRT would be cut in half, reducing the IRT by over 150 ms.
As storage is contributing over 99% of transactional latency, it appears we should focus our efforts there.
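The arithmetic behind that comparison is worth making explicit; the sketch below uses the approximate 4 ms and 300 ms contributions read from the chart above.

    cpu_ms, storage_ms = 4, 300          # approximate contributions read from the IRT chart
    irt_ms = cpu_ms + storage_ms         # ~304 ms total

    halved_cpu = cpu_ms / 2 + storage_ms        # 302 ms: saves only 2 ms
    halved_storage = cpu_ms + storage_ms / 2    # 154 ms: saves roughly 150 ms

    print(f"storage share of IRT: {storage_ms / irt_ms:.0%}")              # ~99%
    print(f"gain from halving CPU: {irt_ms - halved_cpu:.0f} ms")          # 2 ms
    print(f"gain from halving storage: {irt_ms - halved_storage:.0f} ms")  # 150 ms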
Drilling Into the Data Topology
Now that we know the storage layer is introducing the vast majority of the transactional latency, it's time to drill into the environment
and see what infrastructure components are involved in the flow of data. The Data Topology tab provides just such a view.
Click on the Data Topology tab.
Observations
The Data Topology view provides a clear layout of how the pieces of the infrastructure supporting the MSSQL service are
connected; from the database, to the VM, to the virtualization host, to the storage.
In this case, two VMs are hosting database instances: Windows2003_1_MsSQL and MasterMsSQL. The MasterMsSQL
instance has objects on two volumes: the C: and E: drives. The status of the C: drive is GREEN, while the status of the E: drive is
RED.
It appears that the storage volume supporting the E: drive, disk group 1 (a RAID-5 set composed of 4 drives), is experiencing
performance issues. We will need to take a closer look at the E: drive.
Drilling In Further
Click on the E: drive to take a closer look.
Making the Path Clearer
Selecting the E: drive highlights its entire data path. As the flow is now highlighted, it becomes clear that all objects on the path are
experiencing performance issues. Notice that the icons representing the database, the storage volume, the vSphere connections,
and the RAID group on the storage array are all either red or yellow. In general, Insight Balance best practices recommend
troubleshooting an issue from right to left in the interface, as "upstream" objects, like servers, inherit problems that are
experienced at the physical disk-layer.
Examining the RAID Group
Since it's best to begin troubleshooting this issue at the level of the RAID disk group, this view of the topology is not ideal. It would
be helpful to re-orient the display so that the topology of the connections to the disk group was more clearly visible. Fortunately,
Insight Balance offers the "Re-orient" feature, allowing you to shift the focus of the display to highlight a specific piece of the
infrastructure.
Right-click on the red Array disk group labeled 1(4) and select Re-orient Topology.
A Clearer View
After the re-orientation, the focus of the topology is now centered on disk group 1 (denoted by the yellow halo). In this view, all
infrastructure dependencies on disk group 1 are now clearly evident.
Notice that there are multiple server volumes with a red status connected to the disk group. While this view is better, we can
improve the clarity of this display even further.
Left click and drag the disk group so that disk group 1 is centered.
Clearer Still
After shifting the placement of Disk Group 1, it is easy to see that multiple hosts (highlighted by RED ARROWS) are being affected
by the performance issue of this disk group.
As the issue is affecting multiple hosts, the need to isolate the cause of the issue becomes more urgent.
Now we really need to determine what storage dependency these hosts have in common.
Summary Page
To display information about these hosts, right-click the disk group and select "Open Summary Page".
The Performance Summary Page
The link will take you to the Performance Summary display. A little orientation will be helpful.
1. The top bar (highlighted in RED) indicates that we are on the Performance page.
2. The first graph (highlighted in YELLOW) presents a view of the Infrastructure Response Time.
3. The next graph (highlighted in GREEN) presents Throughput information. Measured in IOPS (Input/Output Operations Per
Second), this chart indicates storage performance.
4. The final section (highlighted in BLUE) presents I/O, Response Time, and Disk Utilization information for the dependent hosts.
The list of hosts is sorted by the resources each host is consuming, from highest to lowest.
Now, let's take a closer look at the information in two sections.
Infrastructure Response Time vs. Disk Utilization
Please Note: The sample image provided is a composite of two samples for your convenience.
The dark grey section of the top graph indicates that disk utilization is approaching 100% (RED). This value represents the total of
the disk utilization for all workloads using that disk group. This value is based on throughput, response time, and queue depth. As
the dark grey section extends upward, disk utilization increases, peaking at a maximum value of 100%.
The light grey section of the graph represents response time for the host. As the light grey section extends downward, response
time deteriorates, adding milliseconds to each transaction.
1. Mouse over the dark grey Disk Utilization section for the time period November 3, 2010 10:00 am to 4:00 pm. Notice that
the disk utilization has increased and remained quite high.
2. Now mouse over the light grey Response Time section for the time period November 3, 2010 10:00 am to 4:00 pm. Notice
that, as disk utilization rises, response time nearly doubles, from 15ms to 34ms.
This information reinforces the conclusion that the performance issue is related to disk contention.
A Closer Look at Disk IOPS
Please Note: The sample image is a composite of two samples for your convenience.
The next graph displays the IOPS the disks performed during that same time interval.
1. Mouse over the dark grey section (Read IOPS) for the time period November 3, 2010 10:00 am to 4:00 pm. Notice that disk
read activity increases from 625 IOPS to 908 IOPS during this period.
2. Now mouse over the light grey section (Write IOPS) for the time period November 3, 2010 10:00 am to 4:00 pm. Notice
that disk write activity increases from 470 IOPS to 580 IOPS, as well.
Nephosoft uses RAID-5 as its default configuration, and this disk group is using RAID-5 as well. RAID-5 is optimized for reading
data, because each write also requires calculating and writing parity data to allow recovery in the event of a disk failure. That write
penalty makes RAID-5 relatively inefficient for write-heavy workloads. As a result, the number of IOPS is exceeding the I/O capacity
of this disk group.
The high percentage of writes also suggests that the I/O characteristics of the applications using it are incompatible with RAID-5. In
that case, using this disk group was a poor choice.
Using this information, we can conclude that the performance problem with disk group 1 is caused by:
an excessive amount of raw IO
application IO characteristics that are too write-intensive for RAID-5 (the write-penalty arithmetic sketched below makes this concrete).
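The write-penalty arithmetic referenced above can be sketched with the peak rates observed in the chart. The RAID-5 write penalty of 4 (each host write becomes two reads and two writes on the back end) is standard for parity RAID; the per-spindle IOPS figure is an assumption chosen only to illustrate the gap.

    read_iops, write_iops = 908, 580      # peak values observed in the chart above
    raid5_write_penalty = 4               # each host write = 2 reads + 2 writes on the back end

    backend_iops = read_iops + write_iops * raid5_write_penalty   # 3228 back-end IOPS
    disks, iops_per_disk = 4, 180                                  # assumed ~180 IOPS per spindle
    capacity = disks * iops_per_disk                               # ~720 IOPS available

    print(f"back-end demand {backend_iops} vs. capacity ~{capacity} IOPS")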
Now, the critical question is: Which VM is consuming excessive resources?
Looking For a Bully
The set of graphs at the bottom of the page presents the list of servers/VMs using disk group 1, with those consuming the largest
volume of resources at the top of the list. As virtualization relies on shared resources, any VM that consumes excessive resources
may affect the operation of other VMs relying on those same resources; in essence "bullying" them by taking the resources they rely
upon.
At the top of the list of VMs are Win2K3 Converter and the MasterMsSQL server. These two VMs are the most likely bullies.
However, we will need more information to be certain.
That information can be found on the Contention tab (at the top of the page) or simply by clicking on the win2k3_Converter server
link in the current window.
Click on the win2k3_Converter server link (highlighted in RED).
VM IO Details
The win2k3_Converter server link brings you to the page presenting the total IOPS for Disk Group 1, the win2k3_Converter server's
only disk group. If the VM used multiple disk groups, all would be displayed on this page.
The performance issue seems to be related to a specific disk on a VM, so it makes sense to look at the performance of the volume
behind that disk.
Click on the Volumes(3) link to see the IO breakdown by volume instead of by disk group.
Please Note: The interface relies on Java. If the Java loading message appears and does not clear after a minute or two,
please refresh the page by hitting the "Refresh" button on the browser menu bar.
The Volume Summary Overview
The first thing that grabs your attention in the Volume Summary Overview pane is the red exclamation mark (!) next to the volume
mapping to drive E:. This error state suggests there is a critical issue with that volume.
Combining the following:
1. the presence of that exclamation mark indicating a critical error
2. the IO for that VM placed it at the top of the list of resource consumers
...makes this VM an excellent candidate for some additional scrutiny. Let's drill in further.
Click the details link to the right of the E: volume.
The Volume Details Overview Page
The Volume Details Overview page contains information about the performance of the selected volume. Let's examine it, as well.
1. The top graph, Perceived Disk Utilization (highlighted in RED), presents information about disk utilization from the server's
(VM's) perspective.
2. The second graph, Outstanding IO (highlighted in YELLOW) presents the average length of the volume's IO queue waiting to
be processed.
3. The third graph, Response Time (highlighted in ORANGE), presents response time for the volume (explained earlier).
4. The fourth graph, Throughput (highlighted in GREEN), conveys Read and Write IO statistics for the volume (explained earlier).
5. The final graph, Capacity (highlighted in BLUE), conveys the amount of available and consumed space on the volume.
For purposes of this exercise, we will be concerned with two data sets: Perceived Disk Utilization and Throughput.
The Impact of Perceived Disk Utilization
So, how can we use Perceived Disk Utilization to diagnose the performance issue?
In general, the amount of disk utilization reported by an operating system (OS) is not reliable, as the OS believes it is talking to a
single disk rather than the multiple disks of a RAID group on shared storage. Since Insight Balance has end-to-end visibility into
the storage infrastructure, it can correct that OS-derived value so that it represents the state of disk utilization accurately, taking into
account all parts of the storage fabric.
Looking at the graph:
1. The yellow line represents the threshold at which a disk performance warning level has been reached.
2. The red line represents the threshold at which a critical state has been reached.
3. The green background represents the Perceived Disk Utilization the server is actually experiencing.
Given that the amount of Perceived Disk Utilization exceeds both the warning and critical thresholds by a significant amount, the server
(VM) using this volume is clearly experiencing constrained IO.
We now know that the server is having difficulty getting enough throughput to this volume to service the I/O requests that
applications are demanding. This information confirms that this volume on this server is the likely culprit.
But why is this volume having issues?
Examining Throughput
Note: The sample image is a composite of two samples for your convenience.
The fourth graph, Throughput, presents the read and write IOPS for the volume under examination.
The data indicate that the volume of IO is exceeding a total of 350 IOPS (read and write IO combined). This volume of IO cannot be
sustained by a four-disk RAID set. As a result, Win2K3_converter is generating more traffic than the disk group can manage.
Compounding the problems caused by excessive IO, the ratio of read IOPS to write IOPS is nearly 1 to 1. As was mentioned earlier,
RAID-5 is optimized for read operations. A workload with this ratio of reads to writes is not compatible with a RAID-5
configuration; particularly a RAID group with only four disk spindles to share the load.
This graph confirms the cause of the performance problem: an overutilized RAID group with excessive IO activity caused by a
bullying VM.
Returning to the Data Topology View
Click on the Data Topology subtab.
Center the Data Topology Display
Click and hold the Win2k3_Converter icon and drag it to center the display.
We Have Our Bully!
Through careful investigation, John concludes that the performance problems on the Windows2003_1_MsSQL server are a result
of a poor storage configuration.
1. Win2K3_converter is generating more IO than the volume can handle.
2. The small RAID set is overutilized by the traffic generated by Win2K3_converter alone, let alone the combined traffic of
Win2K3_converter and MsSQL.
To solve the performance problem, Win2K3_converter's VM image files should be migrated to a datastore whose configuration is
more appropriate for such a workload, perhaps a 16-disk NetApp RAID-DP aggregate. (Come to the NetApp booth or visit
www.netapp.com for more information about RAID-DP.)
Clearing the Path for Project Vanquish
John explains the cause of the issue to Dave and lets him know what will need to be done to fix it.
Optimizing VM Workload Distribution
John's next task focuses on maintaining the performance of his virtualization infrastructure.
Project Vanquish development has really been driving the infrastructure hard. John knows that there is probably some additional
optimization that he can do to help distribute the load evenly across all the ESX hosts. However, how can he be sure how many
workloads a given host can support? Since he won't be able to expand his vSphere deployment until next year, he needs to get
everything he can out of the infrastructure he has now.
Keeping that goal in mind, John notices that one of his ESX hosts, "esx21" seems to be experiencing periodic performance issues.
He believes the distribution of VMs across his cluster may not be ideal.
John wants to figure out how many VMs he can run on esx21 while maintaining optimal performance and efficiency.
Insight Balance can help...
Choosing a Virtualization Host to Optimize
The Resource Lookup field is at the top of any page in the Insight Balance user interface.
Type esx in that field.
A list of hostnames containing the string "esx" will appear.
Click esx21.corp.local in that list.
Exploring the Host Summary Page
The link will place you on the Host Summary page for host esx21.corp.local.
As Insight Balance is intended for heterogeneous environments, if this host were running Hyper-V, Insight Balance would provide
information relevant to that hypervisor. However, esx21 is a vSphere host, so the Insight Balance interface contains the following
information relevant to vSphere.
1. The first section (highlighted in RED) provides a summary of basic information about the selected host, including tabs offering
opportunities to drill in for more detail.
2. The second section, VMs (highlighted in YELLOW), lists the operating system, CPU usage, CPU shares, memory usage,
memory shares, and health status of each of the VMs running on that host.
3. The third section, Volumes (highlighted in GREEN), includes information about the storage volumes connected to the host.
4. The fourth section, Server LUNs (highlighted in BLUE), includes information about the LUNs connected to the host, including
their World Wide Name (WWN), alignment status, RAID type, name, and health.
5. The fifth section, FC Ports (highlighted in BLACK), lists the host bus adapters (HBAs) through which the host is connecting to
the storage area network (SAN).
For the remainder of this exercise, we will be focusing on the first section.
What Is the PI Tab?
This section of the Insight Balance interface provides a summary of basic information for host esx21, as well as links to vSphere
cluster statistics, the vCenter server statistics, and the host's console.
The tabs below (highlighted in GREEN) offer a set of valuable links to the host's CPU, Memory, Network, Storage, and Host
Contention information, all in one convenient location.
But what is the "PI" tab?
The Insight Balance Performance Index
"PI" stands for "Performance Index".
Just as the Disk Utilization data in the previous chapter provided information about available performance capacity for a Disk
Group, the Performance Index provides information about how close a system is to its ideal operating level.
Said another way, the Performance Index value tells you whether a host is underutilized, overburdened, or optimized.
This information could be used, for example, to determine the optimal number of VMs an ESX or Hyper-V host could support.
Click the PI tab.
A Look At the Performance Index Page
The link places you on the Performance Index page.
This page has three sections...
1. The upper graph (highlighted in RED) presents an hourly average of the PI for a host; in this case, esx21.
2. The lower graph (highlighted in YELLOW) offers a detailed view of the data at any point on the trend line in the upper graph.
3. The section to the lower right (highlighted in GREEN) presents the numerical data that was used to calculate the PI reflected in
the lower point-in-time graph.
The Performance Index Explained
Insight Balance calculates a host's PI value by analyzing the metrics of the elements that affect a host's performance: CPU,
Memory, and Storage.
All three of these resources are critical components in the data processing path. After all, data has to pass from the CPU, to
Memory, to Storage and back before an application can do anything with it. Using NetApp's proprietary queuing network model,
Insight Balance crunches performance information for each of these resources to derive the host's Performance Index.
When you place another workload on a host, that host's performance degrades by some amount. Insight Balance calculates the
point at which the performance cost of adding an additional workload exceeds the benefit of the work it performs.
Insight Balance assigns a value of 100 to that point; the point of near-perfect efficiency. That value is the "Optimal Point"; the basis
for the Performance Index.
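The exact queuing network model is NetApp's own, but the behavior it captures can be illustrated with a textbook single-queue approximation: response time stays nearly flat at low utilization and climbs steeply as a resource approaches saturation. Everything in the sketch below (the service time, the utilization points, the idea of a knee) is an assumed illustration, not the product's formula.

    def response_time_ms(service_ms: float, utilization: float) -> float:
        """M/M/1-style approximation: R = S / (1 - U). Illustrative only."""
        return service_ms / (1.0 - utilization)

    service_ms = 5.0
    for u in (0.3, 0.5, 0.7, 0.9, 0.95):
        print(f"utilization {u:.0%}: response time {response_time_ms(service_ms, u):5.1f} ms")

    # The output climbs slowly at first (7.1, 10.0, 16.7 ms) and then steeply (50, 100 ms).
    # The point where each added unit of work starts costing much more than it returns is
    # the kind of inflection the Performance Index pins at a value of 100.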
The Performance Index and Decision Making
Once you know the point of optimization, resource optimization decisions are easier to make.
Can I add VMs to this ESX host?
Are there too many VMs running on that host?
Using the Performance Index, now you know.
If a host's PI value is below 100, the server is underutilized and it can support additional workloads (or VMs).
Conversely, if the PI value is above 100, the system is overworked and should have some workloads removed.
Think of it this way...
If the PI is above 100, 'lighten the load.'
If the PI is below 100, 'load more VMs.'
If the PI is at 100, 'leave it alone.'
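Expressed as code, the rule of thumb might look like the hypothetical helper below; the ten-point tolerance band around 100 is an assumption for the example, not an Insight Balance setting.

    def pi_recommendation(pi: float, tolerance: float = 10.0) -> str:
        """Map a Performance Index value to the rule of thumb above (tolerance band is assumed)."""
        if pi > 100 + tolerance:
            return "lighten the load (move workloads off this host)"
        if pi < 100 - tolerance:
            return "load more VMs (headroom remains)"
        return "leave it alone (operating near the optimal point)"

    print(pi_recommendation(50))     # underutilized
    print(pi_recommendation(100))    # near the optimal point
    print(pi_recommendation(310))    # overburdened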
Using the PI Trend Graph
Now that we know what the PI means, let's use it to help John optimize esx21.
Take a look at the upper graph on the PI page. The time-averaged PI data on this graph can be used to identify any trends or
patterns in server esx21's performance levels.
If customers report that an application is experiencing episodes of poor performance, John can use the PI graph to identify when
those episodes are occurring and focus his investigation there.
Understanding the Point-In-Time PI Graph
When you select a point on the upper graph (YELLOW), the data in the lower graph (BLUE) updates to reflect the specific PI
information of the server at the point in time you have selected.
The two points in the lower graph, GREEN and RED, represent the Current and Optimal operating levels of the server at that time,
respectively.
If the GREEN point (Current) is to the left of the RED point (Optimal), the server is underutilized.
If the RED point (Optimal) is to the left of the GREEN point (Current), the server is overutilized.
At an optimal operating level, the two points should be coincident (on top of each other).
The Information Details Pane
The information pane to the right of the Point-In-Time graph (YELLOW) presents a detailed snapshot of the actual data on which
that graph is based.
The pane includes numerical values for CPU Utilization, Throughput, and Response Time as well as the PI value for that sample
period.
The data can be used to develop a clearer understanding of the cause of any performance issue, as well as provide a point of
departure for further investigation.
Selecting a Section to Examine
Let's select a smaller portion of the Performance Index trend graph to examine.
On the line in the upper graph, find a point where the Performance Index value is approximately 50.
Left-click on that point (YELLOW) and drag the mouse to the right, to a point after and well-below the spike.
The Result of the Selection
The upper graph now contains only the data from the selected section.
Click on a point on the line on the upper graph.
The lower graph now contains the PI value of the server at that specific point in time. Let's take a closer look at what the lower
graph is telling us. The X-axis represents the amount of Throughput (number of transactions per second) the host is generating.
The Y-axis represents the Response Time for those transactions.
The shape of the curve is exactly what you would expect. As the number of transactions increases, the amount of time required to
process each transaction increases, as well. To put it in terms of virtualization, adding more VMs to this host pushes the number
of transactions per second (X-axis) to the right, increasing the Response Time (Y-axis).
As the Throughput increases, Response Time (I/O latency) increases. And, as Response Time increases, server efficiency
decreases.
Notice how the curve rises from left to right. As you move farther to the right, the curve becomes steeper. The red dot (Optimal PI
value) is placed at the point on the curve where the rate of rise accelerates. At that point, the performance cost of adding workloads
becomes less linear and more exponential. That is the point where the cost of adding workloads exceeds the benefit.
Case Study #1: A PI of 50
Select a point on the upper graph where the PI is approximately 50.
In the sample provided, the point selected yields a Current Operating Point (GREEN) on the lower graph with a Performance Index
of 50.1. By contrast, the Optimal Point (RED) represents a Performance Index of 100.
Recall that additional processing up to the Optimal Point has a more linear effect on performance. After that point, this host's
performance will degrade exponentially. If the Optimal Point represents ideal utilization (consumption of 100% with optimal
efficiency), and the current Operating Point is 50.1%, then only 50% of the available capacity of this host is being consumed.
As growth is more linear up to the PI of 100, John could safely double the work this server is doing without compromising
performance. Putting the statement in terms of VMs, if the host were running five VMs, John could add five similar VMs without
losing efficiency. If John had 10 VDI desktops on this host, he could safely add 10 more without issue.
The Performance Index gives John a clear sense of how much more esx21 can handle.
Case Study #2: A PI of 100
In the upper graph, select a point on the line where the Performance Index is approximately 100.
In the sample provided, the Current Operating Point has a PI of 108.1; close to the Optimal Point on the graph, but slightly above it.
As the host's current PI is near 100, this host is operating at a level where adding workloads would cause a dramatic degradation
in performance.
If Nephosoft needs another VM to run MSSQL, where should John place it? In this case, based upon the information provided by
Insight Balance, it is clear that this ESX host would be the wrong host. The server's PI is already above 100. It has absolutely no
headroom left. Another ESX host would probably be a better choice.
John would not be able to make such a solid, efficient decision without Insight Balance and the Performance Index.
Case Study #3: A PI Over 100
In the upper graph, select a point on the line where the Performance Index is substantially greater than 100 (approximately 300).
In the sample provided, the Current Operating Point has a Performance Index value of 310.26.
Let's assume, for the sake of this exercise, that John decided to provision the DBA's new MSSQL server VM onto host esx21
anyway, and that decision caused the PI to rise to this level. What effect would that decision have?
The PI for this sample period is 310. That is over three times the targeted Optimal Value of 100. Clearly, the server is operating well
beyond its envelope of maximum efficiency.
Troubleshooting Using the Detail Pane
Let's use the information in the Detail pane to the right to understand the consequences of John's decision.
Take a close look at the information in the Detail pane:
CPU utilization on this host is only 25%.
Average I/O Throughput is 159.60, while the Optimal Throughput is 119.25. Clearly, the server is exerting itself as it attempts to
push I/O as fast as it can.
Optimal Overall Response Time for this server is 18.24 milliseconds. However, the server's actual Response Time for this
sample period is 56.59 milliseconds.
The server is operating at approximately 94% utilization.
What can we conclude?
1. The server's Performance Index value is 310. We know immediately that the server is not operating efficiently.
2. As the CPU workload is only 25%, whatever is causing this issue is not CPU-related. Adding CPU and memory would not
resolve the performance issue.
3. With an Effective Utilization value of 94%, the server is being asked to support more workloads than it can handle. The MSSQL
VM needs to be moved as soon as possible.
4. The Average Throughput value exceeds the Optimal value. The VMs on this server are generating more IO than the host can
handle.
5. As we have ruled out the CPU and memory as a factor, the high Response Time value tells us that the greatest contributor to poor
performance is storage latency.
Therefore, John should move some of the VMs on esx21 to another host and troubleshoot the poor storage performance.
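For readers who like to see the reasoning spelled out, here is a short Python sketch of the triage logic above, using the sample values from the Detail pane. The field names and thresholds are illustrative assumptions, not Insight Balance internals.

# A hedged sketch of the triage walked through above. Values come from the
# sample Detail pane; the dictionary keys and thresholds are assumptions.

sample = {
    "performance_index": 310.0,
    "cpu_utilization_pct": 25.0,
    "avg_throughput": 159.60,
    "optimal_throughput": 119.25,
    "response_time_ms": 56.59,
    "optimal_response_time_ms": 18.24,
    "effective_utilization_pct": 94.0,
}

findings = []
if sample["performance_index"] > 100:
    findings.append("Host is operating beyond its optimal efficiency point.")
if sample["cpu_utilization_pct"] < 50:
    findings.append("CPU is not the bottleneck; adding CPU/memory won't help.")
if sample["avg_throughput"] > sample["optimal_throughput"]:
    findings.append("VMs are generating more I/O than the host can handle.")
if sample["response_time_ms"] > sample["optimal_response_time_ms"]:
    findings.append("High latency points to storage as the main contributor.")
if sample["effective_utilization_pct"] > 90:
    findings.append("Move workloads (e.g. the MSSQL VM) to another host.")

for finding in findings:
    print("-", finding)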
Conclusion
While it seems quite simple, the Performance Index is an extremely powerful tool.
Now, John can distribute his VMs across his vSphere environment without having to guess which host to place them on.
He has visibility into the state of the CPU, memory, and I/O efficiency on each of his ESX hosts in real time.
He can look across the entire environment and know exactly where he has resources available and where resources are
over-provisioned.
He can use that information to ensure that his VMs are distributed optimally across his vSphere environment.
John now knows how much headroom he has left in his vSphere infrastructure. With that knowledge, he has the confidence to
manage his organization's purchase of additional capacity.
At last, John clearly sees what's really going on in his vSphere environment. The Performance Index allows him to get everything he
can out of his infrastructure and support Nephosoft as it completes Project Vanquish.
Identifying Misaligned LUNs and VMDK Partitions
As John was investigating the database performance issues for Dave, he noticed an alert message about "misaligned LUN
partitions".
Now that he has isolated the performance problem, he has time to revisit that message, find out what it's all about, and take a
break from all the Project Vanquish activity.
Opening the Servers Page
Find the Servers tab in the upper row of tabs on the Insight Balance page.
Click on the Servers tab.
Alert!
Find the alert message, marked with a yellow triangle and an exclamation point, toward the top of the page.
Click on the View the Report link at the end of the alert message.
What is Misalignment?
"Misalignment" is short-hand for "File System Misalignment"
Misalignment has a negative impact on storage performance, charging you a storage tax that you probably don't even know you're
paying.
The Cost of Misalignment
Misalignment is the result of a configuration mismatch between an operating system (OS) and a storage controller.
When the file systems of an OS and a storage controller are misaligned, any storage I/O generated by the OS (a write to a block of
data on disk, for example) results in multiple pieces of I/O activity for the storage controller and its disk (writes to multiple disk
blocks). In this manner, file system misalignment raises the volume of storage I/O artificially, as a single I/O event is "multiplied",
creating numerous I/O events on the storage layer. The artificially high volume of storage activity occupies storage resources with
excessive, inefficient I/O, making them unavailable for other purposes. By keeping storage unnaturally busy, misalignment
compromises overall storage performance, and wastes resources unnecessarily.
The problem is more critical in a virtualized environment. As each VMDK contains its own file system, each VM can suffer from
misalignment. When a volume contains tens or hundreds of VMs, the problem multiplies, becoming much more concentrated and,
therefore, much more acute.
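A small worked example helps show how a single guest write turns into multiple storage operations. The Python sketch below assumes a 4 KB storage block and the classic 63-sector MBR partition start; these values are common illustrations, not figures taken from this lab environment.

# Simplified illustration of I/O multiplication caused by misalignment.
# Block sizes and the 63-sector offset are assumed example values.

STORAGE_BLOCK = 4096                    # storage controller block size (bytes)
GUEST_IO_SIZE = 4096                    # one 4 KB guest write
MISALIGNED_OFFSET = 63 * 512            # classic MBR partition start (32,256 B)

def blocks_touched(start_offset, io_size, block_size=STORAGE_BLOCK):
    """Count how many storage blocks a single guest I/O actually touches."""
    first = start_offset // block_size
    last = (start_offset + io_size - 1) // block_size
    return last - first + 1

print(blocks_touched(0, GUEST_IO_SIZE))                  # aligned: 1 block
print(blocks_touched(MISALIGNED_OFFSET, GUEST_IO_SIZE))  # misaligned: 2 blocks

In this example a misaligned partition turns every 4 KB guest write into work against two storage blocks instead of one, which is the "multiplied" I/O described above.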
Fortunately, NetApp has developed tools to help you find and remediate misalignment in your environment.
And that's where Insight Balance comes in...
The Insight Balance Misalignment Report
Insight Balance understands server and storage infrastructure, including best practices configurations for storage from multiple
vendors. It also monitors the activity of the hosts (both physical and virtual). As a result, it can identify hotspots where host activity
and misalignment collide to create critical storage bottlenecks.
The Insight Balance Misalignment Report gathers the list of misaligned partitions onto a single page, and ranks them by the
amount of I/O activity (Average Throughput) each is experiencing. By organizing the information in this way, Insight Balance helps
prioritize where to take action first to get the greatest immediate benefit. Because misaligned traffic has a substantial impact on
storage performance, correcting misalignment problems also yields a substantial benefit. In this case, the report shows that the
Windows 2003_1_MsSQL VM is generating the most misaligned storage traffic, and is the most likely to cause performance
problems.
The Alignment Offset column shows the current alignment settings of the partition as well as the Vendor Recommended Alignment
value; information that tells John what the correct block alignment should be. Finally, the report also identifies two other partition
misalignment problems that John should consider correcting.
Now that John knows what partitions are misaligned and what settings to use to fix them, he can use tools like the NetApp VSC
and mbralign, among others, to make the corrections.
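As a rough illustration of the check behind the Alignment Offset column, the sketch below tests whether a partition's starting offset falls on a recommended boundary. The 4 KB recommendation and the sample offsets are assumptions for demonstration; use the vendor-recommended value shown in the report for your own storage.

# Minimal alignment check. The recommended alignment and sample offsets are
# illustrative assumptions, not values read from this lab's report.

RECOMMENDED_ALIGNMENT = 4096   # bytes

def is_aligned(partition_start_offset, alignment=RECOMMENDED_ALIGNMENT):
    """True when the partition starts on a recommended block boundary."""
    return partition_start_offset % alignment == 0

# Classic misaligned MBR start (sector 63) vs. an aligned start (sector 2048):
print(is_aligned(63 * 512))     # False -> candidate for correction
print(is_aligned(2048 * 512))   # True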
Unfortunately, time does not allow us to discuss NetApp's alignment correction tools in this lab. However, if you would like to learn
more about the arsenal of tools and techniques NetApp has developed to address misalignment, please visit the NetApp booth or
go to www.netapp.com for more information.
Reclaiming Memory Using the Virtual Machine Scorecard
Nephosoft is now completing the final preparations for the launch of Project Vanquish. The stream of emails containing service
requests is unrelenting at the moment. John does not have the time to explore his infrastructure to find issues. Even if he had the
time, the effort would not yield consistently valuable results. A better approach would be to have monitoring software watch infrastructure
performance and look for emerging issues. Ideally, the software would generate reports that identify problems and/or targets of
optimization in his vSphere environment and notify him automatically.
The "Scorecard" reporting feature of Insight Balance can do just that.
Scorecards provide concise, summarized presentations of many attributes of your infrastructure in a form that can serve as a
foundation for taking immediate action. Not only is the information valuable, but Scorecard Reporting can also be configured to
generate and send reports to you regularly. Let's consider one example.
Suppose John needs to renew his vSphere licensing. VMware's new licensing model is based on the amount of vRAM that is
provisioned. Now, it's even more important that John know which virtual machines in his environment are not making full use of
their shares of vRAM. The Virtual Machine Scorecard is the report John needs.
Let's generate that report in Insight Balance and take a look at it.
Selecting the Reports Tab
Locate the row of tabs at the top of the Insight Balance interface.
Click on the Reports tab.
Selecting Scorecard Reports
Locate the list of subtabs in the Reports pane.
Click on the Scorecard Reports subtab.
Selecting the Virtual Machine Scorecard Report
Insight Balance has many reports preconfigured to provide information about data center infrastructure.
Click on the Reports dropdown box and select "Virtual Machine Scorecard" from the list.
Choose the Desired Frequency and Format.
1. Select a report format of "HTML".
2. Accept the "Current period" default value of "14 days".
3. Check the "Enable advanced options" box (this option will expand additional choices below).
4. From the "Group by" drop down menu, select "Cluster".
5. From the "Sort by" drop down menu, select "Maximum Virtual Server Memory Percent Used".
6. Click on Generate Report.
Navigating the Report Page
The Virtual Machine Scorecard report is full of useful information.
The report includes information about:
each virtualization host and the VMs it is running (RED)
VM alarm history (YELLOW)
CPU and performance metrics (GREEN)
memory consumption and allocation for each physical host and its VMs (BLUE)
storage performance metrics (BLACK)
John can use this report to track high Guest CPU Utilization, CPU Run/Wait percentages, excessive Memory utilization, and even
Storage Response Time for each VM.
For the moment, let's focus on VM Memory utilization (BLUE).
Finding What Was Lost
As John looks at the report, he notices that Dave's VM, Windows2003_1_MsSQL, is on the spot, once again!
According to this Virtual Machine Scorecard, this VM has been allocated 2GB of vRAM (highlighted in GREEN). However,
Windows2003_1_MsSQL has only ever needed a maximum of 30% of that amount (highlighted in YELLOW)! John could easily cut
the VM's vRAM in half and still have plenty of memory headroom to spare. That memory could be freed up for other, busier VMs.
Also, in terms of the new vSphere licensing model, Nephosoft could probably realize some savings by decreasing the amount
of virtual memory Windows2003_1_MsSQL has been allocated. So, the Virtual Machine Scorecard report has not only increased
resource efficiency by freeing up vRAM for other VMs, it has started paying for itself by allowing Nephosoft to optimize its purchase
of vSphere licenses.
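The rightsizing math is simple enough to write out. The sketch below uses the 2 GB allocation and 30% peak usage from the Scorecard sample; everything else is basic arithmetic for illustration.

# Worked version of the vRAM rightsizing above. The allocation and peak-usage
# figures come from the Scorecard sample; the halving is John's proposal.

allocated_vram_gb = 2.0
max_used_pct = 30.0

peak_usage_gb = allocated_vram_gb * (max_used_pct / 100.0)    # ~0.6 GB
proposed_vram_gb = allocated_vram_gb / 2                      # 1.0 GB

print(f"Peak vRAM ever used:        {peak_usage_gb:.2f} GB")
print(f"Headroom after halving vRAM: {proposed_vram_gb - peak_usage_gb:.2f} GB")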
The Virtual Machine Scorecard is but one of many reports that provide insight into what is happening in your vSphere environment.
Please feel free to generate some of the other available reports such as the Storage Scorecard or Server Scorecard before you
complete this lab.
Conclusion
John has been using the Insight Balance Scorecard reports to stay ahead of the changes in his vSphere environment. The
information has allowed him to address performance issues before Nephosoft even knew they existed.
Information is power. Insight Balance gives you a powerful tool to help you manage and optimize your vSphere environment.
Conclusion
Launching Project Vanquish
Thanks in part to John's effort, Project Vanquish is completed on time and is a rousing success. John receives a well-earned
promotion to IT Manager and a share of the Project Vanquish bonus.
Thank You
Find Out More About NetApp
Congratulations! You have now completed the NetApp Hands-on-Lab.
NetApp offers a complete storage solution for both virtualized and physical environments. NetApp's intelligent, unified storage and
suite of management tools allow you to get more out of your infrastructure. If you have questions about the lab content or would like
to learn more about NetApp products and features such as:
storage efficiency
zero-footprint cloning
space-efficient snapshots
data protection technology
...and a bevy of other things that we couldn't cover in this lab, please stop by the NetApp booth or visit www.netapp.com.
Don't forget! Now that you have completed this lab, we have a graduation present for you. Please click the "Ask Question"
button on the top right corner of the lab manual screen, and a NetApp lab proctor will bring you your gift.
We hope you found the NetApp lab content interesting and valuable. Please let us know what you thought of the experience.
Best Regards,
The NetApp Hands-on-Lab Team
