Understanding Oracle RAC Internals Part 2 for the Oracle RAC SIG
Markus Michalewicz (Markus.Michalewicz@oracle.com), Senior Principal Product Manager, Oracle RAC and Oracle RAC One Node
Copyright 2011, Oracle and/or its affiliates. All rights reserved.
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
Agenda
- Client Connectivity
- Node Membership
- The Interconnect
Client Connectivity
Direct or indirect connect
- Connect Time Load Balancing (CTLB)
- Connect Time Connection Failover (CTCF)
- Runtime Connection Load Balancing (RTLB)
- Runtime Connection Failover (RTCF)
(Diagram: BATCH, Production, and Email services connect through the SCAN to a connection pool in front of the database.)
Client Connectivity
Connect Time Connection Failover
jdbc:oracle:thin:@MySCAN:1521/Email

PMRAC =
  (DESCRIPTION =
    (FAILOVER = ON)
    (ADDRESS = (PROTOCOL = TCP)(HOST = MySCAN)(PORT = 1521))
    ...
Client Connectivity
Runtime Connection Failover
PMRAC =
  (DESCRIPTION =
    (FAILOVER = ON)
    (ADDRESS = (PROTOCOL = TCP)(HOST = MySCAN)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = Email)
      ...))
Client Connectivity
Runtime Connection Failover
PMRAC =
  (DESCRIPTION =
    (FAILOVER = ON)
    (ADDRESS = (PROTOCOL = TCP)(HOST = MySCAN)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = Email)
      (FAILOVER_MODE =
        (TYPE = select)(METHOD = basic)
        (RETRIES = 180)(DELAY = 5))))
Client Connectivity
More information
If problems occur, see:
- Note 975457.1 - How to Troubleshoot Connectivity Issues with 11gR2 SCAN Name

For more advanced configurations, see:
- Note 1306927.1 - Using the TNS_ADMIN variable and changing the default port number of all Listeners in an 11.2 RAC for an 11.2, 11.1, and 10.2 Database
Client Connectivity
Two ways to protect the client
1. Transparent Application Failover (TAF)
- Tries to make the client unaware of a failure
- Provides means of CTCF and RTCF
- Allows pure SELECTs (reads) to continue
- Write transactions need to be re-issued
- The application needs to be TAF aware
2. Use a FAN-aware connection pool
TAF failover retries: 0
TAF failover delay: 0
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: BASIC
Client Connectivity
Use a FAN aware connection pool
If a connection pool is used
- The clients (users) get a physical connection to the connection pool
- The connection pool creates a physical connection to the database
- The connection pool is the direct client of the database
Client Connectivity
Use a FAN aware connection pool
The connection pool:
- Invalidates connections to the failed instance
- Re-establishes new logical connections
- May create new physical connections
- Prevents new clients from being misrouted
The application needs to handle the transaction failure that might have occurred.
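The pool behavior described above can be sketched in a few lines of plain Python. This is an illustration only; the `ConnectionPool` class and its methods are hypothetical stand-ins for what a FAN-aware pool (such as UCP) does internally, not Oracle's actual API:

```python
# Illustrative sketch of a FAN-aware pool reacting to an instance DOWN event.
# ConnectionPool / handle_down_event are hypothetical names, not the UCP API.

class ConnectionPool:
    def __init__(self, instances, size_per_instance=2):
        # Each physical connection is represented here by its target instance.
        self.connections = [inst for inst in instances
                            for _ in range(size_per_instance)]
        self.alive = set(instances)

    def _load(self, instance):
        return self.connections.count(instance)

    def handle_down_event(self, instance):
        """React to a FAN 'instance DOWN' notification."""
        self.alive.discard(instance)
        # 1. Invalidate (drop) connections to the failed instance.
        self.connections = [c for c in self.connections if c != instance]
        # 2. May create new physical connections to surviving instances.
        while self.alive and len(self.connections) < 2 * len(self.alive):
            self.connections.append(min(self.alive, key=self._load))
        # 3. New borrowers can no longer be routed to the failed instance.

    def borrow(self):
        # Hand out a connection to a surviving instance only.
        return self.connections[0]

pool = ConnectionPool(["inst1", "inst2", "inst3"])
pool.handle_down_event("inst2")
```

Note that, exactly as the slide says, the pool can repair its connections but it cannot replay an in-flight transaction; that remains the application's job.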
Client Connectivity
The Load Balancing (LB) cases
- Connect Time Load Balancing (CTLB)
- Runtime Connection Load Balancing (RTLB)
each on the client side and on the server side.
Client Connectivity
Connect Time Load Balancing (CTLB) on the client side
PMRAC =
  (DESCRIPTION =
    (FAILOVER = ON)
    (LOAD_BALANCE = ON)
    ...
Client Connectivity
Connect Time Load Balancing (CTLB) on the server side
Traditionally, PMON dynamically registers the services with the specified listeners, providing:
- Service names for each running instance of the database, and the instance names for the DB
- Load information for every instance and node, with which the listener is updated:
  - 1-minute OS node load average (refreshed every 30 seconds)
  - Number of connections to each instance
Client Connectivity
Use FAN for the Load Balancing cases
- Connect Time Load Balancing (CTLB)
- Connect Time Connection Failover (CTCF)
- Runtime Connection Load Balancing (RTLB)
- Runtime Connection Failover (RTCF)

(Diagram: in the RAC database, an instance signalling "I'm busy" receives 30% of the connections, while another instance, Instance3, receives 60%.)
Client Connectivity
Use FAN for the Load Balancing cases
- Connect Time Load Balancing (CTLB)
- Runtime Connection Load Balancing (RTLB)
- Also via AQ (Advanced Queuing) based notifications
- The background is always the Load Balancing Advisory
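The effect of the Load Balancing Advisory can be sketched as weighted routing: each instance periodically publishes a "share of new work" percentage, and the pool routes borrow requests accordingly. The weights below are the illustrative 30%/60% figures from the diagram plus an assumed 10% remainder; the routing code is a sketch, not Oracle's implementation:

```python
# Sketch: runtime connection load balancing driven by advisory percentages.
import random

def route(advice, rng):
    """advice: dict instance -> advisory weight (percentage of new work)."""
    instances = list(advice)
    weights = [advice[i] for i in instances]
    return rng.choices(instances, weights=weights, k=1)[0]

# Assumed advisory snapshot: 60% / 30% / 10% (the 10% is illustrative).
advice = {"inst1": 60, "inst2": 10, "inst3": 30}
counts = {i: 0 for i in advice}
rng = random.Random(42)            # fixed seed for reproducibility
for _ in range(10_000):
    counts[route(advice, rng)] += 1
# Over many requests the observed distribution approaches the advisory ratios.
```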
For more information, see the Oracle Real Application Clusters Administration and Deployment Guide 11g Release 2, Chapter 5, "Introduction to Automatic Workload Management".
Node Membership
Node Membership - My Oracle Support (MOS) notes:
- Note 1053147.1 - 11gR2 Clusterware and Grid Home - What You Need to Know
- Note 1050908.1 - How to Troubleshoot Grid Infrastructure Startup Issues
CSSD (ora.cssd)
CSSDMONITOR (was: oprocd; now: ora.cssdmonitor)
(Diagram: CSSD runs on every node of the Oracle Clusterware cluster and accesses the Voting Disk over the SAN network.)
- Private interconnect: network heartbeat
- Voting Disk based communication: disk heartbeat
CSSD-log:
[date / time] [CSSD][1111902528] clssnmPollingThread: node mynodename (5) at 75% heartbeat fatal, removal in 6.770 seconds
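The countdown in that log line can be sketched against the CSS misscount. The sketch below assumes the default network-heartbeat misscount of 30 seconds and the commonly logged 50%/75%/90% warning levels; it is an illustration of the arithmetic, not CSSD's code:

```python
# Sketch: relating clssnmPollingThread warnings to CSS misscount.
MISSCOUNT = 30.0  # seconds; assumed default for the network heartbeat

def removal_in(seconds_missed):
    """Seconds left until the silent node is removed from the cluster."""
    return max(0.0, MISSCOUNT - seconds_missed)

def warning_level(seconds_missed):
    """Highest warning threshold (50/75/90 %) already crossed, if any."""
    pct = seconds_missed / MISSCOUNT * 100
    for level in (90, 75, 50):
        if pct >= level:
            return level
    return None

# A node silent for 23.2 s is past the 75% mark, with removal due in
# about 6.8 s - matching the shape of the log message above.
print(warning_level(23.2), round(removal_in(23.2), 1))
```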
CSSD-log:
[CSSD] [1115699552] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(1) wrtcnt(1) LATS(63436584) Disk lastSeqNo(1)
E.g. Voting Disk serial number:

[GRID]> crsctl query css votedisk
 1.  2  1212f9d6e85c4ff7bf80cc9e3f533cc1 (/dev/sdd5) [DATA]
http://www.oracle.com/goto/rac
Using standard NFS to support geographically dispersed clusters
Fencing Basics
Why are nodes evicted?
Evicting (fencing) nodes is a preventive measure (it's a good thing)!
- Shared data must not be written by independently operating nodes
- The easiest way to prevent this is to forcibly remove a node from the cluster
Fencing Basics
How are nodes evicted? STONITH
- A kill request is sent to the respective node(s)
- Using all (remaining) communication channels
- STONITH foresees that a remote node kills the node to be evicted
Fencing Basics
EXAMPLE: Network heartbeat failure
- It is determined which nodes can still talk to each other
- A kill request is sent to the node(s) to be evicted
  - Using all (remaining) communication channels and the Voting Disk(s)
- A node is requested to kill itself; executor: typically CSSD
Fencing Basics
What happens if CSSD is stuck?
- CSSD failed for some reason
- CSSD is not scheduled within a certain margin
See also: MOS note 1050693.1 Troubleshooting 11.2 Clusterware Node Evictions (Reboots)
Fencing Basics
How can nodes be evicted?
- Oracle Clusterware 11.2.0.1 and later supports IPMI (optional)
- Intelligent Platform Management Interface (IPMI) drivers are required
- IPMI allows remote shutdown of nodes using additional hardware
- A Baseboard Management Controller (BMC) per cluster node is required
Fencing Basics
EXAMPLE: IPMI based eviction on heartbeat failure
- The network heartbeat between the nodes has failed
- It is determined which nodes can still talk to each other
Fencing Basics
Which node gets evicted?
Voting Disks and heartbeat communication are used to determine the node:
- In a 2-node cluster, the node with the lowest node number should survive
- In an n-node cluster, the biggest sub-cluster should survive (based on votes)
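The two rules above can be sketched as a single selection function over the sub-clusters that can still communicate. A minimal illustration (not CSSD's actual vote-counting code):

```python
# Sketch: deciding which sub-cluster survives after a split.
# The biggest sub-cluster wins; on a tie (e.g. 1-1 in a two-node cluster)
# the sub-cluster containing the lowest node number survives.

def surviving_subcluster(subclusters):
    """subclusters: list of sets of node numbers that can still talk
    to each other. Returns the set that should survive."""
    return max(subclusters, key=lambda sc: (len(sc), -min(sc)))

# 5-node cluster splits 3/2: the bigger sub-cluster survives.
print(surviving_subcluster([{1, 2, 4}, {3, 5}]))   # {1, 2, 4}
# 2-node cluster splits 1/1: the lowest node number survives.
print(surviving_subcluster([{2}, {1}]))            # {1}
```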
Fencing Basics
Cluster members can escalate a kill request
Cluster members (e.g. Oracle RAC instances) can request a member kill from Oracle Clusterware
Fencing Basics
Cluster members can escalate a kill request
- Oracle Clusterware will then attempt to kill the requested member
- If the requested member kill is unsuccessful, a node eviction escalation can be issued, which leads to the eviction of the node on which the particular member currently resides
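The escalation path can be sketched as a short decision function. The `kill_member` / `evict_node` callables below are hypothetical stand-ins for the Clusterware actions, used only to illustrate the control flow:

```python
# Sketch: member kill escalation. A cluster member (e.g. an RAC instance)
# asks Clusterware to kill another member; if that fails, the request is
# escalated to evicting the whole node hosting that member.
from types import SimpleNamespace

def handle_kill_request(member, kill_member, evict_node):
    """kill_member/evict_node return True on success; both are
    hypothetical stand-ins for Clusterware actions."""
    if kill_member(member):
        return "member killed"
    # Member kill unsuccessful: escalate to node eviction.
    if evict_node(member.node):
        return "node evicted"
    return "escalation failed"

inst = SimpleNamespace(name="inst2", node=2)
result = handle_kill_request(inst,
                             kill_member=lambda m: False,  # kill fails
                             evict_node=lambda n: True)    # eviction works
print(result)   # node evicted
```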
Re-boots affect applications that might run on a node, but are not protected by the cluster. Customer requirement: prevent a reboot, just stop the cluster stack - this has been implemented...
Instead of fast re-booting the node, a graceful shutdown of the stack is attempted:
- It starts with a failure, e.g. a network heartbeat or interconnect failure
- Then I/O-issuing processes are killed; it is made sure that no I/O process remains
- For a RAC DB, mainly the log writer and the database writer are of concern
- Once all I/O-issuing processes are killed, the remaining processes are stopped
- If the check for a successful kill of the I/O processes fails: reboot
- Once all remaining processes are stopped, the stack stops itself with a "restart flag"
- OHASD will finally attempt to restart the stack after the graceful shutdown
- IF the check for a successful kill of the I/O processes fails: reboot
- IF CSSD gets killed during the operation: reboot
- IF cssdmonitor (the oprocd replacement) is not scheduled: reboot
- IF the stack cannot be shut down in short_disk_timeout seconds: reboot
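The four exceptions above can be sketched as a single decision function: if any check fails, the graceful path is abandoned and the node reboots; otherwise the stack stops and OHASD restarts it. The default value for `short_disk_timeout` is an assumption here, and the function is an illustration of the logic, not Clusterware code:

```python
# Sketch: rebootless-restart decision logic.

def shutdown_outcome(io_procs_killed, cssd_alive, cssdmonitor_scheduled,
                     stop_seconds, short_disk_timeout=27):  # assumed default
    if not io_procs_killed:
        return "reboot"          # I/O processes could not all be killed
    if not cssd_alive:
        return "reboot"          # CSSD died during the operation
    if not cssdmonitor_scheduled:
        return "reboot"          # cssdmonitor starved of CPU
    if stop_seconds > short_disk_timeout:
        return "reboot"          # stack took too long to stop
    return "graceful restart by OHASD"

print(shutdown_outcome(True, True, True, stop_seconds=10))
```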
The Interconnect
The Interconnect
Heartbeat and memory channel between instances
(Diagram: clients reach nodes 1..N over the public LAN; the nodes communicate over the interconnect with its own switch; storage is attached through SAN switches.)
The Interconnect
Redundant Interconnect Usage
1. Redundant Interconnect Usage can be used as a bonding alternative
- It works for private networks only; the nodeVIPs use a different approach
- It enables HA and load balancing for up to 4 NICs per server (on Linux/Unix)
- It can be used by Oracle Database 11.2.0.2 and Oracle Clusterware 11.2.0.2
- It uses so-called HAIPs, which are assigned to the private networks on the server
- The HAIPs will be used by the database and ASM instances and processes
The Interconnect
Redundant Interconnect Usage
2. A multiple listening endpoints approach is used
- The HAIPs are taken from the link-local (Linux/Unix) IP range (169.254.0.0)
- To find the communication partners, multicasting on the interconnect is required
- With 11.2.0.3, broadcast is a fallback alternative (BUG 10411721)
- Multicasting is still required on the public LAN, for mDNS for example
- Details in My Oracle Support (MOS) Note 1212703.1: 11.2.0.2 Grid Infrastructure Install or Upgrade may fail due to Multicasting
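Because the HAIPs come from the link-local range, they are easy to tell apart from administrator-configured private addresses. A small sketch using Python's standard `ipaddress` module:

```python
# Sketch: HAIPs are taken from the IPv4 link-local range 169.254.0.0/16.
import ipaddress

def is_haip_candidate(ip):
    """True if ip lies in the link-local range used for HAIPs."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network("169.254.0.0/16")

print(is_haip_candidate("169.254.10.1"))    # True  - a typical HAIP
print(is_haip_candidate("192.168.100.1"))   # False - a normal private IP
```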
The Interconnect
Redundant Interconnect Usage and the HAIPs
- If a network interface fails, the assigned HAIP is failed over to a remaining one
- Redundant Interconnect Usage allows having networks in different subnets
  - You can either have one subnet for all networks or a different one for each
- You can also use VLANs with the interconnect. For more information, see:
  - Note 1210883.1 - 11gR2 Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip
  - Note 220970.1 - RAC: Frequently Asked Questions - "How to use VLANs in Oracle RAC?" and "Are there any issues for the interconnect when sharing the same switch as the public network by using VLAN to separate the network?"
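The HAIP failover behavior described above can be sketched as moving a failed NIC's HAIP(s) to the surviving interface(s) with the fewest HAIPs. A minimal illustration under assumed names (`fail_over`, simple dict bookkeeping), not Clusterware's implementation:

```python
# Sketch: failing a HAIP over to a remaining interface. Each private NIC
# normally carries one HAIP; when a NIC fails, its HAIP moves to a survivor.

def fail_over(assignment, failed_nic):
    """assignment: dict nic -> list of HAIPs. Moves the failed NIC's
    HAIPs to the surviving NIC(s) carrying the fewest HAIPs."""
    moved = assignment.pop(failed_nic, [])
    if not assignment:        # no surviving NIC: nothing to fail over to
        return assignment
    for haip in moved:
        target = min(assignment, key=lambda nic: len(assignment[nic]))
        assignment[target].append(haip)
    return assignment

nics = {"eth1": ["169.254.1.1"], "eth2": ["169.254.2.1"]}
print(fail_over(nics, "eth2"))
# {'eth1': ['169.254.1.1', '169.254.2.1']}
```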