We recently bought HP enclosures and blade servers, so we used HP Virtual Connect (VC) to manage the blades' network. We wanted to implement Oracle RAC for one of our critical databases (an SAP one). Our questions started as soon as we thought about the configuration we planned to set up. Should we put our 2 RAC nodes in the same enclosure? Even for a setup as simple as this, the certification of VC for the RAC interconnect was not obvious… Our HP contacts initially told us that it was not certified (and that getting Oracle approval for this configuration could take 3 to 6 more months), which would have forced us to buy extra switches (which ones?). They then retracted, saying they were wrong and that it is fully certified…
Nevertheless, the configuration we finally wanted to set up (for maximum redundancy) was an inter-enclosure one (using 2 c7000 enclosures) with one RAC node in each enclosure. I'm not talking here about extended RAC (long-distance RAC), as our enclosures are in the same data center, less than a cable length apart…
While working with my system teammate on the Network Interface Controller (NIC) bonding/teaming configuration, I came across a new Grid Infrastructure feature introduced in 11.2.0.2 called Cluster High Availability IP (HAIP). A new feature that raised more questions than it answered about the open points I had…
Before getting into the Oracle RAC configuration, I will list some Linux commands to display the system configuration, mainly as a reminder for myself.
My testing was done on Red Hat Enterprise Linux Server release 5.5 (Tikanga) and Oracle Enterprise Edition 11gR2. The RAC cluster is made of 2 nodes.
NIC bonding (or teaming) is the capability to bond together 2 or more NICs to create a special NIC called a channel bonding interface. The bonded NICs act as one NIC to provide redundancy and increased bandwidth.
In my configuration, the two NICs to be bonded are eth0 and eth1:
    [root@server1 ~]# ethtool eth0
    Settings for eth0:
            Supported ports: [ TP ]
            Supported link modes:   1000baseT/Full
                                    10000baseT/Full
            Supports auto-negotiation: Yes
            Advertised link modes:  1000baseT/Full
                                    10000baseT/Full
            Advertised auto-negotiation: No
            Speed: 2500Mb/s
            Duplex: Full
            Port: Twisted Pair
            PHYAD: 0
            Transceiver: internal
            Auto-negotiation: on
            Supports Wake-on: g
            Wake-on: d
            Link detected: yes
    [root@server1 ~]# ethtool eth1
    Settings for eth1:
            Supported ports: [ TP ]
            Supported link modes:   1000baseT/Full
                                    10000baseT/Full
            Supports auto-negotiation: Yes
            Advertised link modes:  1000baseT/Full
                                    10000baseT/Full
            Advertised auto-negotiation: No
            Speed: 2500Mb/s
            Duplex: Full
            Port: Twisted Pair
            PHYAD: 0
            Transceiver: internal
            Auto-negotiation: on
            Supports Wake-on: g
            Wake-on: d
            Link detected: yes
The displayed speed may look strange, but with VC you can allocate any share of a 10Gb NIC (here 2500Mb/s, i.e. 25%).
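To check the speed VC has allocated to each NIC without reading the full ethtool output, the Speed field can be extracted and converted into a share of the physical port. This is a minimal POSIX sh sketch: nic_speed and vc_share_pct are hypothetical helper names, and it assumes a 10Gb/s (10000Mb/s) physical port, as on our blades.

```shell
#!/bin/sh
# Print the negotiated speed (e.g. "2500Mb/s") from `ethtool <nic>` output read on stdin.
nic_speed() {
    awk -F': ' '/^[[:space:]]*Speed:/ { print $2 }'
}

# Convert a speed such as "2500Mb/s" into a percentage of a 10Gb/s port.
# Assumption: the physical VC port is 10000Mb/s and ethtool reports a numeric
# value in Mb/s (an "Unknown!" speed is not handled).
vc_share_pct() {
    mbps=${1%Mb/s}
    echo $(( mbps * 100 / 10000 ))
}

# Usage on a live system (commented out so the sketch runs anywhere):
# for nic in eth0 eth1; do
#     speed=$(ethtool "$nic" | nic_speed)
#     echo "$nic: $speed ($(vc_share_pct "$speed")% of 10Gb)"
# done
```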
Then, in /etc/sysconfig/network-scripts, configure the 2 NICs with something like the following (note the MASTER and SLAVE keywords):
    [root@server1 network-scripts]# cat ifcfg-eth0
    # ServerEngines Corp. Emulex OneConnect 10Gb NIC (be3)
    DEVICE=eth0
    BOOTPROTO=static
    ONBOOT=yes
    HWADDR=00:17:a4:77:fc:08
    MASTER=bond0
    SLAVE=yes
    USERCTL=no
    [root@server1 network-scripts]# cat ifcfg-eth1
    # ServerEngines Corp. Emulex OneConnect 10Gb NIC (be3)
    DEVICE=eth1
    BOOTPROTO=static
    ONBOOT=yes
    HWADDR=00:17:a4:77:fc:78
    MASTER=bond0
    SLAVE=yes
    USERCTL=no
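Since both slave files differ only by DEVICE and HWADDR, they can be generated rather than typed by hand. A minimal sketch, assuming the same keys and values as the files above; slave_ifcfg is a hypothetical helper that only prints to stdout, so you redirect it to the file yourself.

```shell
#!/bin/sh
# Hypothetical helper: print an ifcfg file for a bonding slave NIC.
# Arguments: device name, MAC address, bonding master.
# Keys and values mirror the ifcfg-eth0/ifcfg-eth1 files shown above.
slave_ifcfg() {
    dev=$1 hwaddr=$2 master=$3
    cat <<EOF
DEVICE=$dev
BOOTPROTO=static
ONBOOT=yes
HWADDR=$hwaddr
MASTER=$master
SLAVE=yes
USERCTL=no
EOF
}

# Usage (as root, in /etc/sysconfig/network-scripts):
# slave_ifcfg eth0 00:17:a4:77:fc:08 bond0 > ifcfg-eth0
# slave_ifcfg eth1 00:17:a4:77:fc:78 bond0 > ifcfg-eth1
```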
Ensure the required kernel modules are available:
    [root@server1 network-scripts]# modprobe --list | grep bonding
    /lib/modules/2.6.18-274.3.1.el5/kernel/drivers/net/bonding/bonding.ko
    [root@server1 network-scripts]# modprobe --list | grep mii
    /lib/modules/2.6.18-274.3.1.el5/kernel/drivers/net/mii.ko
    [root@server1 network-scripts]# modinfo bonding
    filename:       /lib/modules/2.6.18-274.3.1.el5/kernel/drivers/net/bonding/bonding.ko
    author:         Thomas Davis, email@example.com and many others
    description:    Ethernet Channel Bonding Driver, v3.4.0-1
    version:        3.4.0-1
    license:        GPL
    srcversion:     91E9D75EA26B18C60985A99
    depends:        ipv6
    vermagic:       2.6.18-274.3.1.el5 SMP mod_unload gcc-4.1
    parm:           max_bonds:Max number of bonded devices (int)
    parm:           num_grat_arp:Number of gratuitous ARP packets to send on failover event (int)
    parm:           num_unsol_na:Number of unsolicited IPv6 Neighbor Advertisements packets to send on failover event (int)
    parm:           miimon:Link check interval in milliseconds (int)
    parm:           updelay:Delay before considering link up, in milliseconds (int)
    parm:           downdelay:Delay before considering link down, in milliseconds (int)
    parm:           use_carrier:Use netif_carrier_ok (vs MII ioctls) in miimon; 0 for off, 1 for on (default) (int)
    parm:           mode:Mode of operation : 0 for balance-rr, 1 for active-backup, 2 for balance-xor, 3 for broadcast, 4 for 802.3ad, 5 for balance-tlb, 6 for balance-alb (charp)
    parm:           primary:Primary network device to use (charp)
    parm:           primary_reselect:Reselect primary slave once it comes up; 0 for always (default), 1 for only if speed of primary is better, 2 for only on active slave failure (charp)
    parm:           lacp_rate:LACPDU tx rate to request from 802.3ad partner (slow/fast) (charp)
    parm:           xmit_hash_policy:XOR hashing method: 0 for layer 2 (default), 1 for layer 3+4 (charp)
    parm:           arp_interval:arp interval in milliseconds (int)
    parm:           arp_ip_target:arp targets in n.n.n.n form (array of charp)
    parm:           arp_validate:validate src/dst of ARP probes: none (default), active, backup or all (charp)
    parm:           fail_over_mac:For active-backup, do not set all slaves to the same MAC. none (default), active or follow (charp)
    parm:           resend_igmp:Number of IGMP membership reports to send on link failure (int)
    parm:           debug:Print debug messages; 0 for off (default), 1 for on (int)
    module_sig:     883f3504e58269a94abe3920d1168f1123b820a08079cf7c9add3058728a552c7ea955a11dfffe609f439e1665f33a82ae8fc2ed4e982cc22849c8ccd
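Note that modprobe --list only tells you that a module is available on disk, not that it is loaded. Loaded modules are listed in /proc/modules, which is what lsmod formats. A minimal sketch; module_loaded is a hypothetical helper, and the optional second argument (an alternative file to read) exists only to make it testable.

```shell
#!/bin/sh
# Return success if kernel module $1 is currently loaded.
# Reads /proc/modules by default; each line starts with "<name> <size> ...".
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Usage on a live system:
# for m in bonding mii; do
#     if module_loaded "$m"; then echo "$m loaded"; else echo "$m NOT loaded"; fi
# done
```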
Best practice is to define the bonding parameters with BONDING_OPTS in the bonding interface's own configuration file (ifcfg-bond0). /etc/modprobe.conf should then normally contain only the alias line (run modprobe bonding to reload the module):
    [root@server1 network-scripts]# cat /etc/modprobe.conf | grep bond
    alias bond0 bonding
Then define the bonding NIC with something like:
    [root@server1 network-scripts]# cat ifcfg-bond0
    DEVICE=bond0
    BOOTPROTO=none
    ONBOOT=yes
    USERCTL=no
    IPADDR=10.75.1.222
    NETWORK=10.75.1.0
    NETMASK=255.255.255.0
    BROADCAST=10.75.1.255
    GATEWAY=10.75.1.254
    BONDING_OPTS="mode=4 miimon=100"
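NETWORK and BROADCAST are not independent settings: NETWORK is IPADDR AND NETMASK, and BROADCAST is NETWORK OR (NOT NETMASK). The sketch below does this arithmetic in POSIX sh so the values can be double-checked before writing ifcfg-bond0. The function names are hypothetical, and it assumes dotted-quad input without leading zeros (shell arithmetic would read "08" as octal).

```shell
#!/bin/sh
# Convert a dotted-quad address (no leading zeros) to a 32-bit integer.
ip2int() {
    oldifs=$IFS; IFS=.
    set -- $1
    IFS=$oldifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Convert a 32-bit integer back to dotted-quad notation.
int2ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# NETWORK = IPADDR AND NETMASK
network_of() {
    int2ip $(( $(ip2int "$1") & $(ip2int "$2") ))
}

# BROADCAST = NETWORK OR (NOT NETMASK), masked back to 32 bits
broadcast_of() {
    int2ip $(( ( $(ip2int "$1") & $(ip2int "$2") ) | ( ~$(ip2int "$2") & 0xFFFFFFFF ) ))
}

# Usage with the ifcfg-bond0 values above:
# network_of 10.75.1.222 255.255.255.0     # NETWORK
# broadcast_of 10.75.1.222 255.255.255.0   # BROADCAST
```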
The miimon parameter sets the MII link monitoring frequency in milliseconds (MII stands for Media Independent Interface).
The mode parameter can take the following values (taken from the official Red Hat documentation):
| Mode | Description |
|---|---|
| balance-rr or 0 | Sets a round-robin policy for fault tolerance and load balancing. Transmissions are received and sent out sequentially on each bonded slave interface beginning with the first one available. |
| active-backup or 1 | Sets an active-backup policy for fault tolerance. Transmissions are received and sent out via the first available bonded slave interface. Another bonded slave interface is only used if the active bonded slave interface fails. |
| balance-xor or 2 | Sets an XOR (exclusive-or) policy for fault tolerance and load balancing. Using this method, the interface matches up the incoming request's MAC address with the MAC address for one of the slave NICs. Once this link is established, transmissions are sent out sequentially beginning with the first available interface. |
| broadcast or 3 | Sets a broadcast policy for fault tolerance. All transmissions are sent on all slave interfaces. |
| 802.3ad or 4 | Sets an IEEE 802.3ad dynamic link aggregation policy. Creates aggregation groups that share the same speed and duplex settings. Transmits and receives on all slaves in the active aggregator. Requires a switch that is 802.3ad compliant. |
| balance-tlb or 5 | Sets a Transmit Load Balancing (TLB) policy for fault tolerance and load balancing. The outgoing traffic is distributed according to the current load on each slave interface. Incoming traffic is received by the current slave. If the receiving slave fails, another slave takes over the MAC address of the failed slave. |
| balance-alb or 6 | Sets an Active Load Balancing (ALB) policy for fault tolerance and load balancing. Includes transmit and receive load balancing for IPV4 traffic. Receive load balancing is achieved through ARP negotiation. |
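As the table shows, each mode has both a name and a number, and the bonding driver accepts either in BONDING_OPTS. The sketch below pulls an option out of a BONDING_OPTS string and normalizes the mode to its number, which makes it easier to compare configurations written in either style. Both function names are hypothetical.

```shell
#!/bin/sh
# Print the value of option $1 found in a BONDING_OPTS string $2
# (e.g. bonding_opt mode "mode=4 miimon=100" prints 4); fail if absent.
bonding_opt() {
    for kv in $2; do
        case $kv in
            "$1"=*) echo "${kv#*=}"; return 0 ;;
        esac
    done
    return 1
}

# Normalize a bonding mode given by name or number to its number (0-6).
bond_mode_number() {
    case $1 in
        0|balance-rr)    echo 0 ;;
        1|active-backup) echo 1 ;;
        2|balance-xor)   echo 2 ;;
        3|broadcast)     echo 3 ;;
        4|802.3ad)       echo 4 ;;
        5|balance-tlb)   echo 5 ;;
        6|balance-alb)   echo 6 ;;
        *) echo "unknown bonding mode: $1" >&2; return 1 ;;
    esac
}

# Usage, e.g. after sourcing ifcfg-bond0 (which sets BONDING_OPTS):
# bond_mode_number "$(bonding_opt mode "$BONDING_OPTS")"
```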
The default mode is 0. I would personally have chosen mode 6, which looks the most advanced as it provides failover plus load balancing on both transmit and receive… But there are multiple references on My Oracle Support (MOS) mentioning bugs with this mode, which apparently is the one most people have chosen…
Once everything is configured, apply the parameters with ifup bond0 (the bonding interface must be down before you configure it) or with service network restart:
    [root@server1 ~]# ifconfig bond0
    bond0     Link encap:Ethernet  HWaddr 00:17:A4:77:FC:08
              inet addr:10.75.1.222  Bcast:10.75.1.255  Mask:255.255.255.0
              inet6 addr: fe80::217:a4ff:fe77:fc08/64 Scope:Link
              UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
              RX packets:885011418 errors:1445 dropped:0 overruns:0 frame:1441
              TX packets:2700910551 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:209023586594 (194.6 GiB)  TX bytes:2782566207400 (2.5 TiB)
    [root@server1 ~]# cat /proc/net/bonding/bond0
    Ethernet Channel Bonding Driver: v3.4.0-1 (October 7, 2008)
    Bonding Mode: IEEE 802.3ad Dynamic link aggregation
    Transmit Hash Policy: layer2 (0)
    MII Status: up
    MII Polling Interval (ms): 100
    Up Delay (ms): 0
    Down Delay (ms): 0
    802.3ad info
    LACP rate: slow
    Active Aggregator Info:
            Aggregator ID: 1
            Number of ports: 1
            Actor Key: 9
            Partner Key: 1
            Partner Mac Address: 00:00:00:00:00:00
    Slave Interface: eth0
    MII Status: up
    Speed: 100 Mbps
    Duplex: full
    Link Failure Count: 1
    Permanent HW addr: 00:17:a4:77:fc:08
    Aggregator ID: 1
    Slave Interface: eth1
    MII Status: up
    Speed: 100 Mbps
    Duplex: full
    Link Failure Count: 1
    Permanent HW addr: 00:17:a4:77:fc:0a
    Aggregator ID: 2
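For a quick health check, the per-slave part of /proc/net/bonding/bond0 can be reduced to one line per slave. A minimal sketch: bond_slaves and all_slaves_up are hypothetical helpers, and they assume the layout shown above, where the first MII Status line belongs to the bond itself and each later one follows a Slave Interface line.

```shell
#!/bin/sh
# Print "<slave> <MII status>" for every slave of a bond.
# $1: status file (default /proc/net/bonding/bond0). The bond-level
# "MII Status" line is skipped because no slave has been seen yet.
bond_slaves() {
    awk -F': ' '
        /^Slave Interface:/ { slave = $2 }
        /^MII Status:/ && slave != "" { print slave, $2; slave = "" }
    ' "${1:-/proc/net/bonding/bond0}"
}

# Succeed only if every slave reports MII Status "up".
all_slaves_up() {
    bond_slaves "$1" | awk '$2 != "up" { bad = 1 } END { exit bad }'
}

# Usage on a live system:
# bond_slaves /proc/net/bonding/bond0
# all_slaves_up /proc/net/bonding/bond0 || echo "bond0 has a slave down" >&2
```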
So why not use NIC bonding for the RAC interconnect and avoid HAIP altogether? Mainly because it is sometimes not possible to implement (switch limitations?), or because the only authorized bonding failover mode does not give the expected result (according to my system teammate, packets are reported in error only after a timeout in case of suspicious errors). When looking at my 11gR2 RAC cluster, I also saw that the private network was not using what I initially expected…
To check whether HAIP is running (the undocumented -init option lists the core Clusterware resources, i.e. CRS, CSS, …):
    [root@server1 ~]# export GRID_HOME=/oracleMWQ/GRID/112_64
    [root@server1 ~]# $GRID_HOME/bin/crsctl stat res -init -w "TYPE = ora.haip.type"
    NAME=ora.cluster_interconnect.haip
    TYPE=ora.haip.type
    TARGET=ONLINE
    STATE=ONLINE on server1
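The same check can be scripted, for example in a monitoring job, by reading the STATE line of that output. A minimal sketch; res_state and haip_is_online are hypothetical helpers that only parse the NAME=/TYPE=/TARGET=/STATE= text shown above, and anything not starting with ONLINE is treated as a failure.

```shell
#!/bin/sh
# Print the STATE value (e.g. "ONLINE on server1") from `crsctl stat res`
# output read on stdin.
res_state() {
    sed -n 's/^STATE=//p'
}

# Succeed only if the resource described on stdin is ONLINE.
haip_is_online() {
    case $(res_state) in
        ONLINE*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a cluster node (GRID_HOME exported as above):
# if "$GRID_HOME"/bin/crsctl stat res -init -w "TYPE = ora.haip.type" | haip_is_online; then
#     echo "HAIP is online"
# else
#     echo "HAIP is NOT online" >&2
# fi
```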
To display its configuration (two NICs have been specified for the cluster interconnect network):
    [root@server1 ~]# $GRID_HOME/bin/oifcfg getif
    eth0  10.10.10.0  global  cluster_interconnect
    eth1  10.10.10.0  global  cluster_interconnect
    bond0  10.75.1.0  global  public
    [root@server1 ~]# $GRID_HOME/bin/oifcfg iflist -p -n
    eth0  10.10.10.0  PRIVATE  255.255.255.0
    eth0  169.254.0.0  UNKNOWN  255.255.128.0
    eth1  10.10.10.0  PRIVATE  255.255.255.0
    eth1  169.254.128.0  UNKNOWN  255.255.128.0
    bond0  10.75.1.0  PRIVATE  255.255.255.0
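Both listings are column-based, so the interesting parts can be filtered out with awk: the interfaces registered as cluster_interconnect in oifcfg getif, and the 169.254.x.x subnets that show up in oifcfg iflist -p -n. A minimal sketch with hypothetical function names; it assumes the whitespace-separated column order shown above.

```shell
#!/bin/sh
# From `oifcfg getif` output on stdin, print the interfaces whose
# 4th column is cluster_interconnect.
interconnect_ifs() {
    awk '$4 == "cluster_interconnect" { print $1 }'
}

# From `oifcfg iflist -p -n` output on stdin, print
# "<interface> <subnet> <netmask>" for the 169.254.x.x subnets.
haip_subnets() {
    awk '$2 ~ /^169\.254\./ { print $1, $2, $4 }'
}

# Usage on a cluster node:
# "$GRID_HOME"/bin/oifcfg getif | interconnect_ifs
# "$GRID_HOME"/bin/oifcfg iflist -p -n | haip_subnets
```

With the output above, the second filter shows that the 169.254.0.0/16 range has been split into two /17 halves (netmask 255.255.128.0), one per interconnect NIC.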
This is where HAIP's novelties start: note the 169.254.x.x IP addresses, which can also be seen at OS level:
    [root@server1 ~]# ifconfig
    .
    .
    eth0      Link encap:Ethernet  HWaddr B4:99:BA:A7:07:2A
              inet addr:10.10.10.11  Bcast:10.10.10.255  Mask:255.255.255.0
              inet6 addr: fe80::b699:baff:fea7:72a/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:317590783 errors:0 dropped:0 overruns:0 frame:0
              TX packets:102992935 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:87430872027 (81.4 GiB)  TX bytes:51030279760 (47.5 GiB)
    eth0:1    Link encap:Ethernet  HWaddr B4:99:BA:A7:07:2A
              inet addr:169.254.5.247  Bcast:169.254.127.255  Mask:255.255.128.0
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
    .
    .
    [root@server1 ~]# ip addr list eth0
    10: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
        link/ether b4:99:ba:a7:07:2a brd ff:ff:ff:ff:ff:ff
        inet 10.10.10.11/24 brd 10.10.10.255 scope global eth0
        inet 169.254.5.247/17 brd 169.254.127.255 scope global eth0:1
        inet6 fe80::b699:baff:fea7:72a/64 scope link
           valid_lft forever preferred_lft forever
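169.254.0.0/16 is the IPv4 link-local range, so the HAIP addresses are easy to pick out of ip addr output. A minimal sketch: haip_addrs and is_link_local are hypothetical helpers, and haip_addrs assumes, as in the output above, that the interface label (eth0:1) is the last field of each inet line.

```shell
#!/bin/sh
# Print "<label> <address/prefix>" for every IPv4 address in
# 169.254.0.0/16 found in `ip addr list` output read on stdin.
haip_addrs() {
    awk '$1 == "inet" && $2 ~ /^169\.254\./ { print $NF, $2 }'
}

# Succeed if address $1 (without prefix) lies in 169.254.0.0/16.
is_link_local() {
    case $1 in
        169.254.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a cluster node:
# ip addr list | haip_addrs
```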
These addresses can also be seen at ASM and RAC database level:
    SQL> SET lines 200
    SQL> col host_name FOR a15
    SQL> col INSTANCE_NAME FOR a15
    SQL> SELECT a.host_name, a.instance_name, b.name, b.ip_address FROM gv$cluster_interconnects b, gv$instance a WHERE a.inst_id=b.inst_id ORDER BY 1,2,3;
    HOST_NAME       INSTANCE_NAME   NAME                                          IP_ADDRESS
    --------------- --------------- --------------------------------------------- ------------------------------------------------
    server1         +ASM1           eth0:1                                        169.254.5.247
    server1         +ASM1           eth1:1                                        169.254.192.128
    server2         +ASM2           eth0:1                                        169.254.90.252
    server2         +ASM2           eth1:1                                        169.254.158.153
So, as stated in the documentation by the way, the IP addresses actually used for the cluster interconnect are not the private ones specified when installing the product (10.10.10.x here, globally non-routable) but the ones automatically allocated by HAIP in the 169.254.x.x link-local range.
References:
- Bonding or Teaming for RedHat Server
- The Channel Bonding Module
- Configure Ethernet Bonding Interface on EL5 or RHEL5 [ID 877012.1]
- Linux Ethernet Bonding Driver [ID 434375.1]
- 11gR2 Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip [ID 1210883.1]
- Private network