Agenda
Introduction
What is? Why use? Where to get?
Cluster Health Monitor (CHM)
Installation
Of the Tool
Of the GUI
CHM in Action
Administration
FAQ & More Information
OTN Migration
Possible outcomes:
Oracle Support finds the answer in one of the logs
Oracle Support needs more node specific information to answer the question
For the latter: this is why you need a tool such as Cluster Health Monitor (CHM)
For the latter: CHM provides a historical view of the collected data for analysis
>crfgui -d "00:05:00" -m 192.168.2.8
Cluster Health Analyzer V1.10 Look for Loggerd via node 192.168.2.8
...reading 300 sec from the past
Connected to Loggerd on rac1
Note: Node rac1 is now up
Cluster 'MyCluster', 2 nodes. Ext time=2010-08-18 23:22:30
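The -d value "00:05:00" in the call above corresponds to the "reading 300 sec from the past" line in the output. A minimal sketch of that conversion, assuming the value is always in HH:MM:SS form (the function name duration_to_seconds is hypothetical and not part of CHM):

```python
def duration_to_seconds(duration: str) -> int:
    """Convert an "HH:MM:SS" string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# The value passed to crfgui -d in the example above.
print(duration_to_seconds("00:05:00"))  # prints 300
```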
Installation
28127
|grep oracrf
0 21:26 ? 00:00:00 /bin/sh /usr/lib/oracrf/bin/crfcheck
0 21:26 ? 00:00:00 /usr/lib/oracrf/bin/osysmond
0 21:26 ? 00:00:00 /usr/lib/oracrf/bin/oproxyd
0 21:26 ? 00:00:00 /usr/lib/oracrf/bin/ologgerd -m rac1 -r -d /u01/orachmbdb/
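A process listing like the one above can also be checked by a script. The following is a rough sketch, assuming the daemons live under an oracrf/bin directory as shown in the listing. The helper name running_daemons is hypothetical, and the sample text only repeats the command paths from the listing:

```python
# Daemon names as they appear in the process listing above.
CHM_DAEMONS = ("osysmond", "ologgerd", "oproxyd")


def running_daemons(ps_output: str) -> dict:
    """Report which CHM daemons appear in the given process listing text."""
    found = {name: False for name in CHM_DAEMONS}
    for line in ps_output.splitlines():
        for name in CHM_DAEMONS:
            if f"/oracrf/bin/{name}" in line:
                found[name] = True
    return found


sample = """\
00:00:00 /usr/lib/oracrf/bin/osysmond
00:00:00 /usr/lib/oracrf/bin/oproxyd
00:00:00 /usr/lib/oracrf/bin/ologgerd -m rac1 -r -d /u01/orachmbdb/
"""
print(running_daemons(sample))
```

On a real node the text would come from the process listing itself rather than from a hard-coded sample.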
[Diagram: CHM daemons osysmond, ologgerd, oproxyd]
If your client is a Windows client, download the Windows version of the tool
Unzip and install the GUI using:
Usage: crfinst.pl -a
-c
-d
-f
-g
-h
-i
-N
[<nodelist>]
[<nodelist>]
[-b <bdb loc>]
<ui install dir>
<nodelist> -b <bdb loc> [-m <master>]
ClusterName.
Administration
Administration part 1
The main administration tool for CHM: oclumon
[oracle@rac1 ~]$ oclumon -h
For help from command line : oclumon <verb> -h
For help in interactive mode : <verb> -h
Currently supported verbs are :
showtrail, showobjects, dumpnodeview, manage, version, debug, quit and help
Administration part 2
How long can I go back in time?
Reviewing historical data is limited by the size of the Berkeley DB
By default the database retains the node views from all the nodes
for the last 24 hours in a circular manner.
This limit can be increased to 72 hours by using the following oclumon command:
'oclumon manage -bdb resize 259200'.
The resize value is given in seconds (259200 seconds = 72 hours)
In the current release (as of 11.2.0.1) you cannot query the current retention time
You can, however, set it to the time that you think is appropriate / reasonable
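Because the resize value is given in seconds, a retention time in hours has to be converted first. A small sketch of that arithmetic, which only builds the command string shown above and does not run it (the helper name resize_command is hypothetical):

```python
def resize_command(hours: int) -> str:
    """Build the oclumon resize command for a retention time given in hours."""
    if hours <= 0:
        raise ValueError("retention time must be a positive number of hours")
    seconds = hours * 3600  # oclumon expects the retention time in seconds
    return f"oclumon manage -bdb resize {seconds}"


print(resize_command(72))  # prints: oclumon manage -bdb resize 259200
```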
Administration part 3
Get me information on the command line (here: the last 3 seconds)
> oclumon dumpnodeview -v -n rac1 -last "00:00:03"
---------------------------------------
Node: rac1 Clock: '08-19-10 03.53.53 UTC' SerialNo:63193
---------------------------------------
SYSTEM:
#cpus: 2 cpu: 4.5 cpuq: 1 physmemfree: 13896 mcache: 959952 swapfree: 1900208 ior: 0 iow: 297 ios: 17 netr: 57.9 netw: 43.56 procs: 187 rtprocs: 11 #fds: 2658 #sysfdlimit: 6815744 #disks: 7 #nics: 4 nicErrors: 0
TOP CONSUMERS:
topcpu: 'osysmond(13446) 0.66' topprivmem: 'ologgerd(13532) 102260' topshm: 'ologgerd(13532) 46680' topfd: 'crsd.bin(10754) 102' topthread: 'crsd.bin(10754) 58'
PROCESSES:
name: 'osysmond' pid: 13446 #procfdlimit: 1024 cpuusage: 0.66 memusage: 78912 shm: 41196 #fd: 22 #threads: 9 priority: 139
name: 'orarootagent.bi' pid: 10890 #procfdlimit: 65536 cpuusage: 0.66 memusage: 6420 shm: 10032 #fd: 7 #threads: 34 priority: 19
name: 'ologgerd' pid: 13532 #procfdlimit: 1024 cpuusage: 0.0 memusage: 102260 shm: 46680 #fd: 19 #threads: 9 priority: 139
DEVICES:
sdf ior: 0.0 iow: 0.0 ios: 0 qlen: 0
sdf1 ior: 0.0 iow: 0.0 ios: 0 qlen:
sde ior: 0.0 iow: 0.0 ios: 0 qlen: 0
sde1 ior: 0.0 iow: 0.0 ios: 0 qlen:
sdd ior: 0.0 iow: 0.0 ios: 0 qlen: 0
NICS:
lo netrr: 21.3 netwr: 21.3 neteff: 42.7 nicerrors: 0 pktsin: 7 pktsout: 7 errsin: 0 errsout: 0 indiscarded: 0 outdiscarded: 0 inunicast: 7 innonunicast: 0 type: PUBLIC
eth0 netrr: 25.65 netwr: 15.94 neteff: 41.60 nicerrors: 0 pktsin: 13 pktsout: 13 errsin: 0 errsout: 0 indiscarded: 0 outdiscarded: 0 inunicast: 13 innonunicast: 0 type: PRIVATE latency: <1
eth1 netrr: 10.27 netwr: 6.58 neteff: 16.85 nicerrors: 0 pktsin: 30 pktsout: 22 errsin: 0 errsout: 0 indiscarded: 0 outdiscarded: 0 inunicast: 30 innonunicast: 0 type: PRIVATE latency: <1
eth2 netrr: 0.12 netwr: 0.0 neteff: 0.12 nicerrors: 0 pktsin: 0 pktsout: 0 errsin: 0 errsout: 0 indiscarded: 0 outdiscarded: 0 inunicast: 0 innonunicast: 0 type: PUBLIC latency: <1
PROTOCOL ERRORS:
IPHdrErr: 0 IPAddrErr: 0 IPUnkProto: 0 IPReasFail: 0 IPFragFail: 0 TCPFailedConn: 50 TCPEstRst: 13 TCPRetraSeg: 69 UDPUnkPort: 41 UDPRcvErr: 0
End of data
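The SYSTEM line in the output above is a flat sequence of "name: value" pairs, so it can be split into a dictionary for further processing. This is a minimal sketch that assumes every value on the line is numeric, as in the sample. The function name parse_metrics is hypothetical and not part of oclumon:

```python
import re

# A metric name (optionally starting with '#'), a colon, whitespace, a number.
PAIR = re.compile(r"(#?\w+):\s+(-?\d+(?:\.\d+)?)")


def parse_metrics(line: str) -> dict:
    """Turn 'name: value name: value ...' text into a dict of floats."""
    return {name: float(value) for name, value in PAIR.findall(line)}


system_line = (
    "#cpus: 2 cpu: 4.5 cpuq: 1 physmemfree: 13896 mcache: 959952 "
    "swapfree: 1900208 ior: 0 iow: 297 ios: 17 netr: 57.9 netw: 43.56"
)
metrics = parse_metrics(system_line)
print(metrics["cpu"], metrics["physmemfree"])
```

Since the pattern accepts any whitespace between the colon and the value, it also copes with a value that was wrapped onto the next line.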
Administration part 4
Time is crucial: the clock
> oclumon dumpnodeview -n rac1 -s "2010-08-19 02.00.01" -e "2010-08-19 02.00.03"
---------------------------------------
Node: rac1 Clock: '08-19-10 02.00.01 UTC' SerialNo:58695
---------------------------------------
SYSTEM:
#cpus: 2 cpu: 4.20 cpuq: 4 physmemfree: 17728 mcache: 953248 swapfree: 1900208 ior: 0 iow: 103 ios: 7 netr: 46.36 netw: 39.29 procs: 187 rtprocs: 11 #fds: 2658 #sysfdlimit: 6815744 #disks: 7 #nics: 4 nicErrors: 0
TOP CONSUMERS:
topcpu: 'osysmond(13446) 1.31' topprivmem: 'ologgerd(13532) 102260' topshm: 'ologgerd(13532) 46680' topfd: 'crsd.bin(10754) 102' topthread: 'crsd.bin(10754) 58'
End of data
Alternative:
oclumon dumpnodeview -allnodes -s "2010-08-19 02.00.01" -e "2010-08-19 02.00.03"
The "Clock:" value in the oclumon output is printed in the
time zone in which the master daemon is running.
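The examples above use two different time stamp layouts: the -s and -e arguments are written as "2010-08-19 02.00.01", while the Clock field reads '08-19-10 02.00.01'. The sketch below converts between the two. Both format strings are inferred from these samples only, and the helper names are hypothetical:

```python
from datetime import datetime

QUERY_FORMAT = "%Y-%m-%d %H.%M.%S"  # layout used for -s and -e above
CLOCK_FORMAT = "%m-%d-%y %H.%M.%S"  # layout of the Clock field, inferred


def query_timestamp(moment: datetime) -> str:
    """Format a datetime the way the -s and -e arguments are written."""
    return moment.strftime(QUERY_FORMAT)


def parse_clock(text: str) -> datetime:
    """Parse the date and time part of a Clock field (without the zone)."""
    return datetime.strptime(text, CLOCK_FORMAT)


start = parse_clock("08-19-10 02.00.01")
print(query_timestamp(start))  # prints: 2010-08-19 02.00.01
```

The time zone suffix (UTC in the sample) is left out of the parsing on purpose, because it depends on where the master daemon runs.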
Administration part 5
Sampling data and refresh rate
The sampling rate of the tool depends on the currently active processes
and the devices on the system. On an otherwise ideal system with up to a
total of 1000 active processes and disks, the sampling interval is
approximately 1 second.
The refresh interval of the GUI is 1 second by default, but a different
interval can be specified using the -r parameter followed by the time in seconds.
Frequently Asked Questions (FAQ)
FAQ #1
Is CHM a CVU (Cluster Verification Utility) replacement?
NO
CVU is a separate tool with a completely different purpose.
CVU neither gathers nor provides the same data that CHM provides.
For more information on CVU go to:
http://www.oracle.com/goto/rac
On this page, follow this link:
Cluster Verification Utility - Download
FAQ #2
Can CHM be used as an OS Watcher replacement?
YES
OS Watcher (OSW) is a collection of UNIX shell scripts(*) intended to
collect and archive operating system and network metrics
to aid support in diagnosing performance issues.
FAQ #3
Is CHM the standard tool to be used?
YES
Oracle RAC Development recommends using CHM whenever possible:
When using Oracle Clusterware, Oracle Grid Infrastructure, or Oracle RAC
The current release is available on Linux and Windows, both 32-bit and 64-bit.
More Information
The data gathering part of the tool will be part of the standard installation
CHM will therefore be installed into the Oracle Grid Infrastructure home
The Berkeley DB will be installed in the Oracle Grid Infrastructure home (default)
The GUI remains as a separately downloadable item
Changes in some parts of the architecture are possible, but the principles remain
The tool will provide more configuration options, for example on the command line
The tool will be enabled by default with a default retention time (adjustable)
More Information
http://www.oracle.com/goto/rac
Download link: Cluster Health Monitor - Download
http://www.oracle.com/goto/clusterware
Technical White Paper
Oracle Clusterware 11g Release 2 Technical Overview
For OS Watcher
My Oracle Support doc ID 301137.1 - OS Watcher User Guide
OTN Migration
A migration with some impact
Note that Oracle Technology Network (also known as OTN) was migrated
URLs containing http://otn.oracle.com/ have moved
Individual items (e.g. papers) are migrated to a new Content Management System
Direct links using the old URL to those items may therefore not work anymore
Some links to main pages should be redirected to some new pages e.g.:
http://otn.oracle.com/rac (might go away over time)
http://www.oracle.com/technetwork/database/clustering/overview/index.html