IT World – https://blog.yannickjaquier.com – RDBMS, Unix and many more...

Fast Connection Failover (FCF) – JDBC HA – part 4 – Tue, 07 Aug 2018



Preamble

Fast Connection Failover (FCF) is almost the same as the previous testing, except that here the FAN feature is provided directly by the connection pool, so you no longer need the simplefan.jar library. Instead we are going to use the Universal Connection Pool (UCP) classes implemented in the ucp.jar file.

Fast Connection Failover (FCF) testing

Universal Connection Pool (UCP) for JDBC

If you dig a bit into the Oracle documentation you can find the following statement:

Starting from Oracle Database 11g Release 2 (11.2), implicit connection pool has been deprecated, and replaced with Universal Connection Pool (UCP) for JDBC. Oracle recommends that you take advantage of the new architecture, which is more powerful and offers better performance.

So obviously I will use UCP in my Fast Connection Failover (FCF) testing!

The complete reference of Java class is available at Oracle® Universal Connection Pool for JDBC Java API Reference.

There is also an Oracle UCP FAQ but, at the time of writing this blog post, the page is full of bugs and missing links…

As we have seen with FAN, you need a service and ONS up and running on your RAC cluster!
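Before diving into the code, it may help to make the pooling vocabulary used throughout this post (borrowed connections, available connections, remaining capacity) concrete. Below is a tiny, database-free pool sketch of my own: it only illustrates the borrow/return life cycle, and UCP naturally does all of this (and much more) for you.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class TinyPool<T> {
  private final Deque<T> available = new ArrayDeque<>();
  private int borrowed = 0;

  public TinyPool(Iterable<T> initial) {
    for (T t : initial) {
      available.push(t);
    }
  }

  // Like pds.getConnection(): take a connection out of the pool
  public T borrow() {
    T t = available.poll();
    if (t != null) {
      borrowed++;
    }
    return t;
  }

  // Like connection.close() with UCP: the connection goes back to the pool
  public void giveBack(T t) {
    borrowed--;
    available.push(t);
  }

  public int availableCount() { return available.size(); }
  public int borrowedCount()  { return borrowed; }

  public static void main(String[] args) {
    TinyPool<String> pool = new TinyPool<>(List.of("conn1", "conn2"));
    String c = pool.borrow();
    System.out.println("borrowed=" + pool.borrowedCount() + " available=" + pool.availableCount());
    pool.giveBack(c);
    System.out.println("borrowed=" + pool.borrowedCount() + " available=" + pool.availableCount());
  }
}
```

The key point giveBack() mimics is that, with a pooled data source, calling close() on a borrowed connection returns it to the pool instead of physically closing it.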

FCF activation and UCP logging

In the example code below, FCF is activated using this call:

pds.setFastConnectionFailoverEnabled(true);

I am also logging all the UCP events (initialization, connections closed, connections borrowed, and so on) to a separate log file. This is done in the following part of the script:

// Logging UCP events in a log file
Handler fh = new FileHandler("fcf01.log");
// The UCPFormatter does not provide a very nice display
//fh.setFormatter(new UCPFormatter());
fh.setFormatter(new SimpleFormatter());
UniversalConnectionPoolManager mgr = UniversalConnectionPoolManagerImpl.getUniversalConnectionPoolManager();
mgr.setLogLevel(Level.FINE);
Logger log = Logger.getLogger("oracle.ucp");
log.setLevel(Level.FINE);
log.addHandler(fh);
// To avoid display on screen
log.setUseParentHandlers(false);

The Oracle Notification Service (ONS) part is a bit cumbersome in the Oracle documentation. The easiest way to implement it is to rely on what they call automatic ONS and register with the service managed by the Grid Infrastructure. Investigating further, I have seen that in my case (newest releases of the product) it is not even mandatory to register with the setONSConfiguration method:

For standalone Java applications, you must configure ONS using the setONSConfiguration method. However, if your application meets the following requirements, then you no longer need to call the setONSConfiguration method for enabling FCF:

  • Your application is using Oracle Database 12c Release 1 (12.1) UCP and Oracle RAC Database 12c Release 1 (12.1)
  • Your application does not require ONS wallet or keystore

Java testing code

Finally, the complete code, with as many comments as I can to explain each part. You need to add ojdbc8.jar, ons.jar and ucp.jar to your project (classpath).

package fcf01;

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.logging.FileHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;
import oracle.ucp.admin.UniversalConnectionPoolManager;
import oracle.ucp.admin.UniversalConnectionPoolManagerImpl;
import oracle.ucp.jdbc.JDBCConnectionPoolStatistics;
import oracle.ucp.jdbc.PoolDataSource;
import oracle.ucp.jdbc.PoolDataSourceFactory;
import oracle.ucp.jdbc.ValidConnection;

public class fcf01 {

  // To restrict display versus System.out.println(stats);
  // String fcfInfo = (oracleJDBCConnectionPool)stats.getFCFProcessingInfo(); NOT WORKING at all !!
  private static void display_statistics(JDBCConnectionPoolStatistics stats) {
    System.out.println("AbandonedConnectionsCount: " + stats.getAbandonedConnectionsCount());
    System.out.println("AvailableConnectionsCount: " + stats.getAvailableConnectionsCount());
    System.out.println("AverageBorrowedConnectionsCount: " + stats.getAverageBorrowedConnectionsCount());
    System.out.println("AverageConnectionWaitTime: " + stats.getAverageConnectionWaitTime());
    System.out.println("BorrowedConnectionsCount: " + stats.getBorrowedConnectionsCount());
    System.out.println("ConnectionsClosedCount: " + stats.getConnectionsClosedCount());
    System.out.println("ConnectionsCreatedCount: " + stats.getConnectionsCreatedCount());
    System.out.println("PeakConnectionsCount: " + stats.getPeakConnectionsCount());
    System.out.println("PendingRequestsCount: " + stats.getPendingRequestsCount());
    System.out.println("RemainingPoolCapacityCount: " + stats.getRemainingPoolCapacityCount());
    System.out.println("TotalConnectionsCount: " + stats.getTotalConnectionsCount());
  }

  public static void main(String[] args) throws Exception {
    PoolDataSource pds = PoolDataSourceFactory.getPoolDataSource();
    Connection connection1 = null;
    Statement statement1 = null;
    ResultSet resultset1 = null;

    // To have date format in English, my Windows desktop being in French 🙂
    Locale.setDefault(new Locale("en"));
    pds.setConnectionFactoryClassName("oracle.jdbc.pool.OracleDataSource");
    pds.setUser("yjaquier");
    pds.setPassword("secure_password");
    // The RAC connection using SCAN name and HA service
    pds.setURL("jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=rac-cluster-scan.domain.com)(PORT=1531))(CONNECT_DATA=(SERVICE_NAME=pdb1srv)))");
    pds.setConnectionPoolName("FCFPool");

    // Automatic ONS
    // pds.setONSConfiguration("nodes=server2.domain.com:6200,server3.domain.com:6200");
    // Pool sizing
    pds.setMinPoolSize(10);
    pds.setMaxPoolSize(20);
    pds.setInitialPoolSize(10);
    // Enable Fast Connection Failover
    pds.setFastConnectionFailoverEnabled(true);

    // Simple check to demonstrate one cool method
    System.out.println("FCF activated ?: " + pds.getFastConnectionFailoverEnabled());

    // Logging UCP events in a log file
    Handler fh = new FileHandler("fcf01.log");
    // The UCPFormatter does not provide a very nice display
    //fh.setFormatter(new UCPFormatter());
    fh.setFormatter(new SimpleFormatter());
    UniversalConnectionPoolManager mgr = UniversalConnectionPoolManagerImpl.getUniversalConnectionPoolManager();
    mgr.setLogLevel(Level.FINE);
    Logger log = Logger.getLogger("oracle.ucp");
    log.setLevel(Level.FINE);
    log.addHandler(fh);
    // To avoid display on screen
    log.setUseParentHandlers(false);

    while (true)
    {
      try
      {
        System.out.println("Trying to obtain a new connection from pool ...");
        connection1 = pds.getConnection();
        System.out.println("Number of borrowed connections from the pool: " + pds.getBorrowedConnectionsCount());
        statement1 = connection1.createStatement();
        // The infinite loop awaiting RAC node down events
        while (true)
        {
          System.out.println("\nPool status at "+LocalDateTime.now().format(DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm:ss"))+": ");
          JDBCConnectionPoolStatistics stats = pds.getStatistics();
          display_statistics(stats);
          resultset1 = statement1.executeQuery("select sys_context('USERENV','SERVER_HOST') from dual");
          while (resultset1.next()) {
            System.out.println("\nWorking on " + resultset1.getString(1));
          }
          resultset1 = statement1.executeQuery("select gvi.host_name,count(*) from gv$session gvs,  gv$instance gvi where gvs.inst_id=gvi.inst_id "
              + "and gvs.username='YJAQUIER' and program='JDBC Thin Client' group by gvi.host_name order by gvi.host_name");
          while (resultset1.next()) {
            System.out.println(resultset1.getString(1) + ": " + resultset1.getString(2));
          }
          Thread.sleep(2000);
          resultset1.close();
        }
      }
      catch (SQLException sqlexc)
      {
        System.out.println("SQLException detected ...");
        // Recommended method to check if a borrowed connection is still usable after an SQL exception
        if (connection1 == null || !((ValidConnection) connection1).isValid())
        {
          System.out.println("Connection retry necessary ...");
          if (connection1 != null)
          {
            try
            {
              connection1.close();
            }
            catch (Exception closeExc)
            {
              System.out.println("Exception detected when closing connection:");
              closeExc.printStackTrace();
            }
          }
        }
      }
    }
  }
}
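The SQLException handler above boils down to one decision: after a failure, check whether the borrowed connection is still valid (the ValidConnection cast) and, if not, close it and borrow a replacement from the pool. Here is a minimal, database-free sketch of that decision of my own, where the hypothetical Conn interface and its isValid() method stand in for the real JDBC/UCP types:

```java
import java.util.function.Supplier;

public class ValidateAndReplace {
  // Minimal stand-in for a JDBC connection with the UCP ValidConnection check
  interface Conn {
    boolean isValid();
    void close();
  }

  // Returns the connection to keep using after an SQLException:
  // the original if still valid, otherwise a freshly borrowed replacement.
  static Conn afterFailure(Conn current, Supplier<Conn> pool) {
    if (current != null && current.isValid()) {
      return current;      // transient error: the connection itself is fine
    }
    if (current != null) {
      current.close();     // broken connection: discard it
    }
    return pool.get();     // borrow a replacement from the pool
  }

  public static void main(String[] args) {
    Conn dead = new Conn() {
      public boolean isValid() { return false; }
      public void close() { System.out.println("closed dead connection"); }
    };
    Conn fresh = new Conn() {
      public boolean isValid() { return true; }
      public void close() { }
    };
    System.out.println(afterFailure(dead, () -> fresh) == fresh); // prints: true
  }
}
```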

While the Java application was running, I directly killed (with the kill command under Linux) the instance from which the connection had been borrowed.

When the application starts, I have an almost equally distributed pool:

fcf01

After I killed the instance from which the connection was borrowed, we need to borrow a new one:

fcf02

Once a new connection has been borrowed from the pool we can continue working, with no application failure by the way:

fcf03

In the meantime, the pool grows to reach the requested minimum on the surviving instance:

fcf04

Once the killed instance has restarted (automatically managed by the Grid Infrastructure), the pool is rebalanced across all the instances:

fcf05

The Java application log file I have generated confirms what we have seen graphically:

Feb 28, 2018 12:28:46 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.admin.UniversalConnectionPoolManagerMBeanImpl:getUniversalConnectionPoolManagerMBean::Universal Connection Pool Manager MBean created
Feb 28, 2018 12:28:47 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.jdbc.PoolDataSourceImpl:createPoolWithDefaultProperties:oracle.ucp.jdbc.PoolDataSourceImpl@591f989e:Connection pool instance is created with default properties
Feb 28, 2018 12:28:47 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.jdbc.PoolDataSourceImpl:createPool:oracle.ucp.jdbc.PoolDataSourceImpl@591f989e:Connection pool instance is created
Feb 28, 2018 12:28:48 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.common.Topology:enableFANHeuristically:oracle.ucp.common.UniversalConnectionPoolBase$4@5c671d7f:Heuristically determine whether to enable FAN
Feb 28, 2018 12:28:48 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.common.Topology:enableFANHeuristically:oracle.ucp.common.UniversalConnectionPoolBase$4@5c671d7f:RAC/GDS 12.x, FAN is heuristically enabled
Feb 28, 2018 12:28:51 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.common.UniversalConnectionPoolBase:start:oracle.ucp.jdbc.oracle.OracleConnectionConnectionPool@5f2050f6:pool started
Feb 28, 2018 12:28:51 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.jdbc.PoolDataSourceImpl:startPool:oracle.ucp.jdbc.PoolDataSourceImpl@591f989e:connection pool is started
Feb 28, 2018 12:29:25 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.common.Core:adjustMinLimit:oracle.ucp.common.Core@34f7cfd9:growing...
Feb 28, 2018 12:29:29 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.common.FailoverDriver$1:handleNotifications:oracle.ucp.common.FailoverDriver$1@5cada3dd:event processed, snapshot:[orcl_1,db=orcl,service=pdb1srv,host=server2:(activeCount:4,borrowedCount:1,active:true,aff=true,violating=false,id=1), orcl_2,db=orcl,service=pdb1srv,host=server3:(activeCount:0,borrowedCount:0,active:false,aff=true,violating=false,id=0)]
Feb 28, 2018 12:29:29 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.common.FailoverDriver$1:handleNotifications:oracle.ucp.common.FailoverDriver$1@5cada3dd:event processed, snapshot:[orcl_1,db=orcl,service=pdb1srv,host=server2:(activeCount:4,borrowedCount:1,active:true,aff=true,violating=false,id=1), orcl_2,db=orcl,service=pdb1srv,host=server3:(activeCount:0,borrowedCount:0,active:false,aff=true,violating=false,id=0)]
Feb 28, 2018 12:30:07 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.common.Core:adjustMinLimit:oracle.ucp.common.Core@34f7cfd9:grew up 1 connection to reach the minimum
Feb 28, 2018 12:30:07 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.common.Core:adjustMinLimit:oracle.ucp.common.Core@34f7cfd9:growing...
Feb 28, 2018 12:30:07 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.common.Core:adjustMinLimit:oracle.ucp.common.Core@34f7cfd9:grew up 1 connection to reach the minimum
Feb 28, 2018 12:30:07 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.common.Core:adjustMinLimit:oracle.ucp.common.Core@34f7cfd9:growing...
Feb 28, 2018 12:30:07 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.common.Core:adjustMinLimit:oracle.ucp.common.Core@34f7cfd9:grew up 1 connection to reach the minimum
Feb 28, 2018 12:30:07 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.common.Core:adjustMinLimit:oracle.ucp.common.Core@34f7cfd9:growing...
Feb 28, 2018 12:30:07 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.common.Core:adjustMinLimit:oracle.ucp.common.Core@34f7cfd9:grew up 1 connection to reach the minimum
Feb 28, 2018 12:30:07 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.common.Core:adjustMinLimit:oracle.ucp.common.Core@34f7cfd9:growing...
Feb 28, 2018 12:30:09 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.common.Core:adjustMinLimit:oracle.ucp.common.Core@34f7cfd9:grew up 1 connection to reach the minimum
Feb 28, 2018 12:30:09 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.common.Core:adjustMinLimit:oracle.ucp.common.Core@34f7cfd9:growing...
Feb 28, 2018 12:30:09 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.common.Core:adjustMinLimit:oracle.ucp.common.Core@34f7cfd9:grew up 1 connection to reach the minimum
Feb 28, 2018 12:31:12 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.common.Core:adjustMinLimit:oracle.ucp.common.Core@34f7cfd9:growing...
Feb 28, 2018 12:31:15 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.common.Core:adjustMinLimit:oracle.ucp.common.Core@34f7cfd9:grew up 1 connection to reach the minimum
Feb 28, 2018 12:31:23 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.common.FailoverDriver$1:handleNotifications:oracle.ucp.common.FailoverDriver$1@5cada3dd:event processed, snapshot:[orcl_1,db=orcl,service=pdb1srv,host=server2:(activeCount:5,borrowedCount:1,active:true,aff=true,violating=false,id=1), orcl_2,db=orcl,service=pdb1srv,host=server3:(activeCount:6,borrowedCount:0,active:true,aff=true,violating=false,id=0)]
Feb 28, 2018 12:31:23 PM oracle.ucp.logging.ClioSupport _log
FINE: oracle.ucp.common.FailoverDriver$1:handleNotifications:oracle.ucp.common.FailoverDriver$1@5cada3dd:event processed, snapshot:[orcl_1,db=orcl,service=pdb1srv,host=server2:(activeCount:5,borrowedCount:1,active:true,aff=true,violating=false,id=1), orcl_2,db=orcl,service=pdb1srv,host=server3:(activeCount:6,borrowedCount:0,active:true,aff=true,violating=false,id=0)]


Fast Application Notification (FAN) – JDBC HA – part 3 – Fri, 27 Jul 2018



Preamble

Fast Application Notification (FAN) requires you to build a Real Application Clusters (RAC) environment. You cannot use a classical active/passive Operating System cluster. So a bit more work…

Fast Application Notification (FAN) testing

Even though I have found this in the official Oracle documentation:

Although the Oracle JDBC drivers now support the FAN events, Oracle UCP provides more comprehensive support for all FAN events.

I have always wondered about the usage of this simplefan.jar file available for download on the Java download page. So testing JDBC with this JAR file should be the most basic FAN testing.

The complete reference of the API is available at Oracle® Database RAC FAN Events Java API Reference 12c Release 2 (12.2).

Service setup

I have struggled a lot trying to use the internal service that is created with every pluggable database (even if you do not have the paid multitenant option and so only a single pluggable database). This is simply not working, and you must create a new service with the Grid Infrastructure part of your RAC installation.

This triggered a new problem when I stupidly created this service with the same name as the internal one (pdb1); it is unbelievable that there is no check on this. Please see the issues encountered section to overcome the situation if you face it.

I have chosen to create my RAC cluster with the server pool policy, so I create a FAN-aware service with the options below (-pdb specifies the pluggable database to be used, or you end up in the container one, and obviously the name MUST NOT be pdb1, so I use pdb1srv):

[oracle@server3 ~]$ srvctl add service -db orcl -pdb pdb1 -service pdb1srv -notification TRUE -serverpool server_pool01 -failovertype SELECT
[oracle@server2 ~]$ crsctl stat res ora.orcl.pdb1srv.svc -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.orcl.pdb1srv.svc
      1        OFFLINE OFFLINE                               STABLE
      2        OFFLINE OFFLINE                               STABLE
--------------------------------------------------------------------------------
[oracle@server2 ~]$ srvctl start service -db orcl -service pdb1srv
[oracle@server2 ~]$ crsctl stat res ora.orcl.pdb1srv.svc -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.orcl.pdb1srv.svc
      1        ONLINE  ONLINE       server2                  STABLE
      2        ONLINE  ONLINE       server3                  STABLE
--------------------------------------------------------------------------------

You can also check against any SCAN listener that it has been taken into account:

[oracle@server2 ~]$ lsnrctl status listener_scan1

LSNRCTL for Linux: Version 12.2.0.1.0 - Production on 22-FEB-2018 11:29:44

Copyright (c) 1991, 2016, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 12.2.0.1.0 - Production
Start Date                20-FEB-2018 12:57:51
Uptime                    1 days 22 hr. 31 min. 54 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/12.2.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/server2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.56.121)(PORT=1531)))
Services Summary...
Service "-MGMTDBXDB" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
Service "65188d13fadb4a7be0536638a8c0aa34" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
Service "651a2f85336128cde0536638a8c0ff54" has 2 instance(s).
  Instance "orcl_1", status READY, has 1 handler(s) for this service...
  Instance "orcl_2", status READY, has 1 handler(s) for this service...
Service "_mgmtdb" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
Service "gimr_dscrep_10" has 1 instance(s).
  Instance "-MGMTDB", status READY, has 1 handler(s) for this service...
Service "orcl" has 2 instance(s).
  Instance "orcl_1", status READY, has 1 handler(s) for this service...
  Instance "orcl_2", status READY, has 1 handler(s) for this service...
Service "orclXDB" has 2 instance(s).
  Instance "orcl_1", status READY, has 1 handler(s) for this service...
  Instance "orcl_2", status READY, has 1 handler(s) for this service...
Service "pdb1" has 2 instance(s).
  Instance "orcl_1", status READY, has 1 handler(s) for this service...
  Instance "orcl_2", status READY, has 1 handler(s) for this service...
Service "pdb1srv" has 2 instance(s).
  Instance "orcl_1", status READY, has 1 handler(s) for this service...
  Instance "orcl_2", status READY, has 1 handler(s) for this service...
The command completed successfully

If you would like to drop and recreate the service with different options, use:

[oracle@server2 ~]$ srvctl stop service -db orcl -service pdb1srv
[oracle@server3 ~]$ srvctl remove service -db orcl -service pdb1srv

Oracle Notification Service (ONS) setup

One of the prerequisites of FAN is the Oracle Notification Service (ONS). The documentation says you must have it on the node where JDBC is running, but you can also rely on the ones running on each node of your RAC cluster using a feature called auto-ONS. Luckily, with my fresh 12cR2 installation, ONS is already configured and running (I just had to correct the nodes value on the second node, where the first server was missing). Do not get confused by all the config files located in the $GRID_HOME/opmn/conf directory: the one to use is the one displayed in the ONS log file ($GRID_HOME/opmn/logs/ons.log.server2 for me). This means the file to modify for me IS NOT ons.config but ons.config.server2:

[2018-02-23T17:49:42+01:00] [ons] [TRACE:32] [] [ons-local] Reloading by request
[2018-02-23T17:49:45+01:00] [ons] [TRACE:32] [] [ons-local] Config file: /u01/app/12.2.0/grid/opmn/conf/ons.config.server2

After correction I ended up with a file like:

[oracle@server2 ~]$ cat $ORACLE_HOME/opmn/conf/ons.config.server2
usesharedinstall=true
allowgroup=true
localport=6100          # line added by Agent
remoteport=6200         # line added by Agent
nodes=server2.domain.com:6200,server3.domain.com:6200
walletfile=/u01/app/grid/crsdata/server2/onswallet/             # line added by Agent
allowunsecuresubscriber=true            # line added by Agent

Check that all is running fine with:

[oracle@server2 ~]$ onsctl ping
ons is running ...
[oracle@server2 ~]$ onsctl debug
[2018-02-28T12:31:12+01:00] [ons] [ERROR:1] [] [ons-local] /u01/app/12.2.0/grid/opmn/conf/ons.config.server2: 5: (warning) unkown key: ocrnodename
[2018-02-28T12:31:12+01:00] [ons] [ERROR:1] [] [ons-local] /u01/app/12.2.0/grid/opmn/conf/ons.config.server2: 5: (warning) unkown key: ocrnodename
HTTP/1.1 200 OK
Connection: close
Content-Type: text/html
Response:

== server2.domain.com:6200 9006 18/02/28 12:31:11 ==
Build: ONS_12.2.0.1.0_LINUX.X64_161121.1010 2016/11/21 19:53:59 UTC
Home: /u01/app/12.2.0/grid

======== ONS ========

           IP ADDRESS                   PORT    TIME   SEQUENCE  FLAGS
--------------------------------------- ----- -------- -------- --------
                         192.168.56.102  6200 5a9530ac 0000000d 00000008

Listener:

  TYPE                BIND ADDRESS               PORT  SOCKET
-------- --------------------------------------- ----- ------
Local                                        ::1  6100      6
Local                                  127.0.0.1  6100      7
Remote                                       any  6200      8
Remote                                       any  6200      -

Servers: (1)

            INSTANCE NAME                  TIME   SEQUENCE  FLAGS     DEFER
---------------------------------------- -------- -------- -------- ----------
dbInstance_server3.domain.com_6200       5a953161 00000003 00000002          0
                          192.168.56.103 6200

Connection Topology: (2)

                IP                      PORT   VERS  TIME
--------------------------------------- ----- ----- --------
                         192.168.56.103  6200     4 5a953161
                           **                          192.168.56.102 6200
                         192.168.56.102  6200     4 5a9530ac=
                           **                          192.168.56.103 6200

Server connections: (1)

   ID            CONNECTION ADDRESS              PORT  FLAGS  SNDQ REF PHA ACK
-------- --------------------------------------- ----- ------ ---- --- --- ---
       d                          192.168.56.103  6200 2004a6    0   1  IO   0

Client connections: (10)

   ID            CONNECTION ADDRESS              PORT  FLAGS  SNDQ REF PHA SUB
-------- --------------------------------------- ----- ------ ---- --- --- ---
       0                                internal     0 00044a    0   1  IO   1
       2                                     ::1 56365 20041a    0   1  IO   1
       3                                     ::1 56621 20041a    0   1  IO   1
       4                                     ::1 56877 20041a    0   1  IO   1
       5                                     ::1 57133 20041a    0   1  IO   1
       9                                     ::1 59949 20041a    0   1  IO   0
       a                                     ::1 60717 20041a    0   1  IO   1
     5f2                     ::ffff:192.168.56.1  1237 26042a    0   1  IO   2
     5f1                     ::ffff:192.168.56.1   725 26042a    0   1  IO   2
 request                                     ::1 29026 200e1a    0   1  IO   0

Events:

  Flags 00000000 Processed 17842
  Threads:
    Total 3 Idle 3
    Last started: 18/02/27 11:19:24

AIO:

  Sockets 9 Events 0 Waiters 3 Timers 2 Flags 00000000
  Threads:
    Total 3 Idle 3
    Last started: 18/02/27 11:19:24

Resources:

  Notifications:
    Received: Total 17 (local 13 internal 1)
              Queued 0 (threads 0, flags 00000000)

  Blocks:
    mLink   : 5000/5000 blocks 1
    subMatch: 10000/10000 blocks 1
    event   : 5000/5000 blocks 1

Java testing code

The official Oracle documentation is not super clear on this part, and it is not possible to find a clear working example. The idea behind the simplefan.jar library and classes is to register a listener that listens for particular events. Currently only three are supported:

  • LoadAdvisoryEvent
  • NodeDownEvent
  • ServiceDownEvent

As I am not using a pool, the LoadAdvisoryEvent was never raised for me. Then, to be honest, I did not know in advance which of the NodeDownEvent or ServiceDownEvent events would be raised when killing the instance used by my Java program. In the end it was the ServiceDownEvent event that was raised…

You need to add to your project (classpath) ojdbc8.jar, ons.jar and simplefan.jar.

package fan01;

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Properties;
import oracle.jdbc.pool.OracleDataSource;
import oracle.simplefan.FanEventListener;
import oracle.simplefan.FanManager;
import oracle.simplefan.FanSubscription;
import oracle.simplefan.LoadAdvisoryEvent;
import oracle.simplefan.NodeDownEvent;
import oracle.simplefan.ServiceDownEvent;

public class fan01 {
  // Updated from the FAN listener thread and read from the main loop, hence volatile
  static volatile int connection_status = 0;
  // Make Oracle connection and return a connection object
  private static Connection oracle_connection() throws Exception {
    Connection connection1 = null;
    OracleDataSource ods1 = new OracleDataSource();

    try {
      ods1.setUser("yjaquier");
      ods1.setPassword("secure_password");
      ods1.setURL("jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=rac-cluster-scan.domain.com)(PORT=1531))(CONNECT_DATA=(SERVICE_NAME=pdb1srv)))");
      connection1 = ods1.getConnection();
    }
    catch (SQLException e) {
      System.out.println("Connection Failed! Check output console");
      e.printStackTrace();
      return null;
    }
    System.out.println("Connected to Oracle database...");
    return connection1;
  }

  public static void main(String[] args) throws Exception {
    Connection connection1 = null;
    String query1 = null;
    ResultSet resultset1 = null;
    Properties props = new Properties();
    Properties onsProps = new Properties();
    FanManager fanMngr = FanManager.getInstance();


    try {
      Class.forName("oracle.jdbc.driver.OracleDriver");
    }
    catch (ClassNotFoundException e) {
      System.out.println("Where is your Oracle JDBC driver ?");
      e.printStackTrace();
      System.exit(1);
    }
    System.out.println("Oracle JDBC Driver Registered!");

    connection1=oracle_connection();
    if (connection1==null) {
      System.exit(1);
    }

    props.put("serviceName", "pdb1srv");
    onsProps.setProperty("onsNodes", "server2.domain.com:6200,server3.domain.com:6200");
    fanMngr.configure(onsProps);
    FanSubscription sub = fanMngr.subscribe(props);
    sub.addListener(new FanEventListener() {
      public void handleEvent(ServiceDownEvent event) {
        System.out.println("Service down event registered !");
        connection_status=1;
      }
      public void handleEvent(NodeDownEvent event) {
        System.out.println("Node down event registered !");
        connection_status=2;
      }
      public void handleEvent(LoadAdvisoryEvent event) {
        System.out.println("Load advisory event registered !");
        connection_status=3;
      }
    });

    query1="select HOST_NAME from v$instance";
    while (true) {
      System.out.println("\nStatus at "+LocalDateTime.now().format(DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm:ss"))+": ");
      if (connection1!=null) {
        try {
          resultset1 = connection1.createStatement().executeQuery(query1);
          while (resultset1.next()) {
            System.out.println("Server used: "+resultset1.getString(1));
          }
        }
        catch (SQLException e1) {
          System.out.println("Query has failed...");
          if (connection_status != 0) {
            connection1.close();
            connection1 = null;
            while (connection1 == null) {
              try {
                connection1=oracle_connection();
              } catch (Exception e2) {
                e2.printStackTrace();
              }
              if (connection1 == null) {
                // Avoid busy-looping while the service is still down
                Thread.sleep(2000);
              }
            }
            connection_status = 0;
          }
        }
      }
      Thread.sleep(2000);
    }
    //resultset1.close();
    //connection1.close();
  }
}

Once the program was connected I killed the pmon process of the instance the program was connected to. We clearly see the ServiceDownEvent raised and the re-connection handled by my Java program with no application failure:

fan01
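The close-and-retry logic shown above (drop the broken connection, then loop until a new one can be obtained) is generic enough to isolate. Below is a minimal, database-free sketch of that pattern; the method name retryUntilConnected and the attempt/delay parameters are mine, not part of any Oracle API:

```java
import java.util.concurrent.Callable;

public class RetryDemo {
    // Keep calling the factory until it returns a value or attempts run out.
    static <T> T retryUntilConnected(Callable<T> factory, int maxAttempts, long delayMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return factory.call();   // e.g. oracle_connection() in the real program
            } catch (Exception e) {
                last = e;                // remember the failure
                Thread.sleep(delayMs);   // wait before retrying
            }
        }
        throw last;                      // give up after maxAttempts
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulate a connection factory that fails twice, then succeeds.
        String result = retryUntilConnected(() -> {
            if (++calls[0] < 3) throw new RuntimeException("listener down");
            return "connected";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints "connected after 3 attempts"
    }
}
```

In the real program the factory would be oracle_connection(); the fake factory here only demonstrates the loop.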

Issues encountered

Default pluggable database service destroyed

If, by mistake or by lack of knowledge (my case), you play with the default pluggable database service through the DBMS_SERVICE package, you end up in a very particular situation where you cannot open your pluggable database again once it is closed:

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           READ WRITE NO
SQL> alter pluggable database pdb1 close immediate;

Pluggable database altered.

SQL> alter pluggable database pdb1 open;
 alter pluggable database pdb1 open
*
ERROR at line 1:
ORA-44304: service  does not exist
ORA-44777: Pluggable database service cannot be started.

SQL> select con_id, name from v$services;

    CON_ID NAME
---------- ----------------------------------------------------------------
         1 orclXDB
         1 orcl
         1 SYS$BACKGROUND
         1 SYS$USERS

SQL> select name from dba_services;

NAME
----------------------------------------------------------------
SYS$BACKGROUND
SYS$USERS
orclXDB
orcl

And if the pluggable database is up and running you cannot connect to it:

SQL> show pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           READ WRITE NO
SQL> alter session set container=pdb1;
ERROR:
ORA-44787: Service cannot be switched into.

SQL> alter session set container=pdb1;
alter session set container=pdb1
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 11524
Session ID: 54 Serial number: 53333

The situation and the way to recover from it are well explained in Mike Dietrich's and William Sescu's blog posts:

References

The post Fast Application Notification (FAN) – JDBC HA – part 3 appeared first on IT World.

Transparent Application Failover (TAF) – JDBC HA – part 2 https://blog.yannickjaquier.com/oracle/jdbc-failover-highly-available-part-2.html https://blog.yannickjaquier.com/oracle/jdbc-failover-highly-available-part-2.html#comments Mon, 16 Jul 2018 09:18:29 +0000 https://blog.yannickjaquier.com/?p=4273

The post Transparent Application Failover (TAF) – JDBC HA – part 2 appeared first on IT World.


Table of contents

Preamble

Transparent Application Failover (TAF) can be tested with a simple active/passive OS cluster (Pacemaker in my case) and you must use the JDBC OCI driver, not the Thin one!

The principle is a switchover from the primary node to the secondary node. In the initial state my database is running on server2.domain.com:

[root@server2 ~]# pcs status
Cluster name: cluster01
Stack: corosync
Current DC: server2.domain.com (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Wed Dec 20 15:56:27 2017
Last change: Wed Dec 20 15:55:54 2017 by root via cibadmin on server2.domain.com

2 nodes configured
5 resources configured

Online: [ server2.domain.com server3.domain.com ]

Full list of resources:

 Resource Group: oracle
     virtualip  (ocf::heartbeat:IPaddr2):       Started server2.domain.com
     vg01       (ocf::heartbeat:LVM):   Started server2.domain.com
     u01        (ocf::heartbeat:Filesystem):    Started server2.domain.com
     orcl       (ocf::heartbeat:oracle):        Started server2.domain.com
     listener_orcl      (ocf::heartbeat:oralsnr):       Started server2.domain.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

And I simply set the primary node to standby with the following command (which can be executed from any node of the cluster):

[root@server2 ~]# pcs node standby server2.domain.com

To finally reach the situation below, where all resources have been restarted on server3.domain.com:

[root@server2 ~]# pcs status
Cluster name: cluster01
Stack: corosync
Current DC: server2.domain.com (version 1.1.16-12.el7-94ff4df) - partition with quorum
Last updated: Wed Dec 20 17:42:49 2017
Last change: Wed Dec 20 17:35:52 2017 by root via cibadmin on server2.domain.com

2 nodes configured
5 resources configured

Node server2.domain.com: standby
Online: [ server3.domain.com ]

Full list of resources:

 Resource Group: oracle
     virtualip  (ocf::heartbeat:IPaddr2):       Started server3.domain.com
     vg01       (ocf::heartbeat:LVM):   Started server3.domain.com
     u01        (ocf::heartbeat:Filesystem):    Started server3.domain.com
     orcl       (ocf::heartbeat:oracle):        Started server3.domain.com
     listener_orcl      (ocf::heartbeat:oralsnr):       Started server3.domain.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

You can unstandby server2.domain.com with:

[root@server2 ~]# pcs node unstandby server2.domain.com

I have defined below tnsnames entry:

PDB1 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.99)(PORT = 1531))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = pdb1)
      (FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC))
    )
  )
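For reference, the same TAF-enabled descriptor can also be assembled in code and passed as an inline JDBC OCI URL to OracleDataSource.setURL instead of relying on tnsnames.ora. A small sketch follows; the helper name tafDescriptor is mine, while the descriptor keywords are standard Oracle Net syntax:

```java
public class TafUrlDemo {
    // Build a TAF-enabled connect descriptor equivalent to the tnsnames entry above.
    static String tafDescriptor(String host, int port, String service) {
        return "(DESCRIPTION="
             + "(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=" + host + ")(PORT=" + port + ")))"
             + "(CONNECT_DATA=(SERVICE_NAME=" + service + ")"
             + "(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC))))";
    }

    public static void main(String[] args) {
        // An OCI JDBC URL can embed the descriptor directly instead of a TNS alias.
        System.out.println("jdbc:oracle:oci:@" + tafDescriptor("192.168.56.99", 1531, "pdb1"));
    }
}
```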

Transparent Application Failover (TAF) testing

The test code I am using is shown below (running under Eclipse). You need to add the ojdbc8.jar file to your project classpath:

package taf01;

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

import oracle.jdbc.pool.OracleDataSource;
import oracle.jdbc.OracleConnection;

public class taf01 {
  // Make Oracle connection and return a connection object
  private static Connection oracle_connection() throws Exception {
    Connection connection1 = null;
    OracleDataSource ods1 = new OracleDataSource();

    try {
      Class.forName("oracle.jdbc.driver.OracleDriver");
    }
    catch (ClassNotFoundException e) {
      System.out.println("Where is your Oracle JDBC driver ?");
      e.printStackTrace();
      return null;
    }

    System.out.println("Oracle JDBC Driver Registered!");

    try {
      ods1.setUser("yjaquier");
      ods1.setPassword("secure_password");
      ods1.setURL("jdbc:oracle:oci:@//192.168.56.99:1531/pdb1");
      connection1 = ods1.getConnection();
    }
    catch (SQLException e) {
      System.out.println("Connection Failed! Check output console");
      e.printStackTrace();
      return null;
    }
    System.out.println("Connected to Oracle database...");
    return connection1;
  }

  public static void main(String[] args) throws Exception {
    Connection connection1=null;
    String query1 = null;
    ResultSet resultset1 = null;

    // To set TNS_ADMIN variable in Java
    System.setProperty("oracle.net.tns_admin","C:/oracle/product/12.2.0/client_1/network/admin");
    connection1=oracle_connection();
    if (connection1==null) {
      System.exit(1);
    }
    query1="select HOST_NAME from v$instance";
    for(int i=1; i <= 10000; i++) {
      System.out.println("\nQuery "+i+" at "+LocalDateTime.now().format(DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm:ss"))+": ");
      if (connection1!=null) {
        try {
          resultset1 = connection1.createStatement().executeQuery(query1);
          while (resultset1.next()) {
            System.out.println("Server used: "+resultset1.getString(1));
          }
        }
        catch (SQLException e) {
          System.out.println("Query has failed...");
        }
      }
      Thread.sleep(2000);
    }
    resultset1.close();
    connection1.close();
  }
}

With the JDBC Thin driver my sample code does not fail but it never reconnects to the second node where the database has been restarted (using jdbc:oracle:thin:@//192.168.56.99:1531/pdb1 as connect string):

taf01

Remark:
As the driver feature comparison table in part 1 of this series shows, we already knew that TAF is NOT available with the JDBC Thin driver, so this behavior was expected.

With the JDBC OCI driver it simply fails (using jdbc:oracle:oci:@//192.168.56.99:1531/pdb1 as connect string):

taf02

If I try to use the below TNS entry, either directly in the Java program with:

ods1.setURL("jdbc:oracle:oci:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.56.99)(PORT=1531))(CONNECT_DATA=(SERVICE_NAME=pdb1)(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC))))");

Or simply with a direct TNS entry:

PDB1 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.99)(PORT = 1531))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = pdb1)
      (FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC))
    )
  )

And the TNS_ADMIN property set with:

System.setProperty("oracle.net.tns_admin","C:/oracle/product/12.2.0/client_1/network/admin");

I get the exact same behavior, even if from a database standpoint it looks better:

SQL> set lines 200 pages 1000
SQL> select module, failover_type, failover_method, failed_over, service_name
     from v$session
     where username='YJAQUIER';

MODULE                                                         FAILOVER_TYPE FAILOVER_M FAI SERVICE_NAME
-------------------------------------------------------------- ------------- ---------- --- ----------------------------------------------------------------
javaw.exe                                                      SELECT        BASIC      NO  pdb1
SQL Developer                                                  NONE          NONE       NO  pdb1
SQL> set lines 200 pages 1000
SQL> col name for a10
SQL> select name,failover_delay, failover_method,failover_restore,failover_retries,failover_type
     from dba_services;

NAME       FAILOVER_DELAY FAILOVER_METHOD                              FAILOV FAILOVER_RETRIES FAILOVER_TYPE
---------- -------------- -------------------------------------------- ------ ---------------- --------------------------------------------
pdb1

Clearly, implementing TAF in a professional-looking application requires a bit more effort; let's see how we can achieve this... If you dig into the JDBC Developer's Guide there is a complete chapter on the JDBC OCI driver TAF implementation.

Roughly, this is achieved by creating a callback function that handles the re-connection when the connection has failed. There are multiple My Oracle Support (MOS) notes on the subject. It is also mandatory to use a TNS entry that is TAF-enabled; refer to the Net Services Administrator's Guide to see how to configure it. The code I am using is:

package taf01;

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

import oracle.jdbc.OracleOCIFailover;
import oracle.jdbc.pool.OracleDataSource;
import oracle.jdbc.OracleConnection;

public class taf01 {
  // Make Oracle connection and return a connection object
  private static Connection oracle_connection() throws Exception {
    Connection connection1 = null;
    CallBack function_callback = new CallBack();
    String callback_message = null;
    OracleDataSource ods1 = new OracleDataSource();

    try {
      Class.forName("oracle.jdbc.driver.OracleDriver");
    }
    catch (ClassNotFoundException e) {
      System.out.println("Where is your Oracle JDBC driver?");
      e.printStackTrace();
      return null;
    }

    System.out.println("Oracle JDBC Driver Registered!");

    try {
      ods1.setUser("yjaquier");
      ods1.setPassword("secure_password");
      ods1.setURL("jdbc:oracle:oci:@pdb1");
      connection1 = ods1.getConnection();
      ((OracleConnection) connection1).registerTAFCallback(function_callback, callback_message);
    }
    catch (SQLException e) {
      System.out.println("Connection Failed! Check output console");
      e.printStackTrace();
      return null;
    }
    System.out.println("Connected to Oracle database...");
    return connection1;
  }

  public static void main(String[] args) throws Exception {
    Connection connection1=null;
    String query1 = null;
    ResultSet resultset1 = null;

    // To set TNS_ADMIN variable in Java
    System.setProperty("oracle.net.tns_admin","C:/oracle/product/12.2.0/client_1/network/admin");
    connection1=oracle_connection();
    if (connection1==null) {
      System.exit(1);
    }
    query1="select HOST_NAME from v$instance";
    for(int i=1; i <= 10000; i++) {
      System.out.println("\nQuery "+i+" at "+LocalDateTime.now().format(DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm:ss"))+": ");
      if (connection1!=null) {
        try {
          resultset1 = connection1.createStatement().executeQuery(query1);
          while (resultset1.next()) {
            System.out.println("Server used: "+resultset1.getString(1));
          }
        }
        catch (SQLException e) {
          System.out.println("Query has failed...");
        }
      }
      Thread.sleep(2000);
    }
    resultset1.close();
    connection1.close();
  }
}

// Define class CallBack
class CallBack implements OracleOCIFailover {
  // TAF callback function 
  public int callbackFn (Connection conn, Object ctxt, int type, int event) {
    String failover_type = null;

    switch (type) {
    case FO_SESSION: 
      failover_type = "SESSION";
      break;
    case FO_SELECT:
      failover_type = "SELECT";
      break;
    default:
      failover_type = "NONE";
    }

    switch (event) {
    case FO_BEGIN:
      System.out.println(ctxt + ": "+ failover_type + " failing over...");
      break;
    case FO_END:
      System.out.println(ctxt + ": failover ended");
      break;
    case FO_ABORT:
      System.out.println(ctxt + ": failover aborted.");
      break;
    case FO_REAUTH:
      System.out.println(ctxt + ": failover.");
      break;
    case FO_ERROR:
      System.out.println(ctxt + ": failover error gotten. Sleeping...");
      // Sleep for a while 
      try {
        Thread.sleep(1000);
      }
      catch (InterruptedException e) {
        System.out.println("Thread.sleep has problem: " + e.toString());
      }
      return FO_RETRY;
    default:
      System.out.println(ctxt + ": bad failover event.");
      break;
    }  
    return 0;
  }
}

I start the Java program when the database is running on server2.domain.com:

taf03

When the database switches to server3.domain.com the Java program does not fail; it waits until the connection can be re-initiated and then continues on server3.domain.com:

taf04

Remark:
Everything we have seen above applies even more to a RAC cluster. I have chosen a basic primary/secondary active/passive implementation because it is much more common in real life and also because it requires much less effort to build and maintain.

References

The post Transparent Application Failover (TAF) – JDBC HA – part 2 appeared first on IT World.

JDBC client high availability features – JDBC HA – part 1 https://blog.yannickjaquier.com/oracle/jdbc-client-high-availability-features-part-1.html https://blog.yannickjaquier.com/oracle/jdbc-client-high-availability-features-part-1.html#respond Thu, 05 Jul 2018 14:07:48 +0000 https://blog.yannickjaquier.com/?p=4259

The post JDBC client high availability features – JDBC HA – part 1 appeared first on IT World.


Table of contents

JDBC client high availability features

You might have built a high-level architecture to make your Oracle databases highly available. The two main options are:

  • Using an Operating System cluster (Veritas, Pacemaker, …) to make your database highly available in an active/passive configuration, sometimes switching from one data center to another, physically independent, one.
  • Using Real Application Clusters (RAC), sometimes combined with a Data Guard (DG) configuration, to follow the Oracle Maximum Availability Architecture (MAA).

But once this is done, are you sure that your developers have implemented everything they could in their Java code to benefit from it? Have you never heard that they need to shut down and restart their application when you switch your Veritas cluster to the passive node? The Oracle high availability features for Java (some are new, some have been around for quite a long time) are:

  • Transparent Application Failover (TAF):

    Transparent Application Failover (TAF) is a feature of the Java Database Connectivity (JDBC) Oracle Call Interface (OCI) driver. It enables the application to automatically reconnect to a database, if the database instance to which the connection is made fails. In this case, the active transactions roll back.

  • Fast Application Notification (FAN):

    The Oracle RAC Fast Application Notification (FAN) feature provides a simplified API for accessing FAN events through a callback mechanism. This mechanism enables third-party drivers, connection pools, and containers to subscribe, receive and process FAN events. These APIs are referred to as Oracle RAC FAN APIs in this appendix.

    The Oracle RAC FAN APIs provide FAN event notification for developing more responsive applications that can take full advantage of Oracle Database HA features. If you do not want to use Universal
    Connection Pool, but want to work with FAN events implementing your own connection pool, then you should use Oracle RAC Fast Application Notification.

  • Fast Connection Failover (FCF):

    The Fast Connection Failover (FCF) feature is a Fast Application Notification (FAN) client implemented through the connection pool. The feature requires the use of an Oracle JDBC driver and an Oracle RAC database or an Oracle Restart on a single instance database.

  • Transaction Guard (TG):

    Transaction Guard for Java provides transaction idempotence, that is, every transaction has at-most-once execution, which prevents applications from submitting duplicate transactions.

  • Application Continuity (AC):

    Oracle Database 12c Release 1 (12.1) introduced the Application Continuity feature that masks database outages to the application and end users are not exposed to such outages.

For definition of Application Continuity / Transaction Guard please check Oracle Application Continuity page. In short:

  • Application Continuity (AC) hides database outages from end users and applications by auto-recovering the running session; at worst the application sees a delay.
  • Transaction Guard (TG) ensures a transaction is executed at most once, even when the session has ended unexpectedly.
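To picture the at-most-once guarantee of Transaction Guard, imagine every transaction carrying a logical ID and replays of an already-committed ID being refused. The toy sketch below illustrates only the idea; the real mechanism relies on server-side logical transaction IDs (LTXID), not on an application-side map, and all names here are mine:

```java
import java.util.HashSet;
import java.util.Set;

public class AtMostOnceDemo {
    private final Set<String> committed = new HashSet<>();

    // Returns true if the work ran, false if the ID was already committed (a replay).
    boolean submit(String txnId, Runnable work) {
        if (!committed.add(txnId)) {
            return false;        // duplicate: at-most-once execution preserved
        }
        work.run();
        return true;
    }

    public static void main(String[] args) {
        AtMostOnceDemo guard = new AtMostOnceDemo();
        int[] balance = {100};
        Runnable debit = () -> balance[0] -= 10;

        System.out.println("first:  " + guard.submit("txn-42", debit));  // executes
        System.out.println("replay: " + guard.submit("txn-42", debit));  // rejected
        System.out.println("balance: " + balance[0]);                    // 90, not 80
    }
}
```

Without such deduplication, a client retrying after a lost commit acknowledgment would debit the account twice; that is exactly the class of problem TG solves server-side.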

Note that a few restrictions apply:

  • Oracle recommends not to use TAF and Fast Connection Failover in the same application.
  • For Application Continuity feature you must use Transaction Guard 12.2.
  • Application Continuity is a feature of the Oracle JDBC Thin driver and is not supported by JDBC OCI driver.

In a previous post I have built a small Oracle 12cR2 database cluster between two virtual machines using a free Linux tool called Pacemaker. As you might guess, the idea now is to see how a Java application can benefit from this cluster. But to really benefit from all the Java high availability options this entry-level solution might not be enough, and you might need to build an expensive and way more complex RAC cluster.

Most of our production databases use a premier-class cluster called Veritas Cluster Server (VCS), which requires (I suppose) a higher level of expertise than my simple Pacemaker setup. But whatever the tool, the idea remains the same: switch the database to one of the remaining nodes of the cluster, either to test the failover or because of a real outage on the primary node.

Testing has been done using two virtual machines running Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production under Oracle Linux Server release 7.4. The two-node cluster is made of:

  • server2.domain.com
  • server3.domain.com

The same two nodes have also been used to build a RAC cluster.

The client is a Java program running under Eclipse Oxygen.2 Release (4.7.2). For the JDBC OCI driver the installed Oracle client is also 12.2.0.1.0 64-bit.

Which JDBC driver to choose?

There is a nice page in the official documentation called Feature Differences Between JDBC OCI and Thin Drivers that displays the key feature differences between the two client-side drivers (here is a hard copy for easy reference):

JDBC OCI Driver                              JDBC Thin Driver
OCI connection pooling                       N/A
N/A                                          Default support for Native XA
Transparent Application Failover (TAF)       N/A
OCI Client Result Cache                      N/A
N/A                                          Application Continuity
N/A                                          Transaction Guard
N/A                                          Support for row count per iteration for array DML
N/A                                          SHA-2 Support in Oracle Advanced Security
oraaccess.xml configuration file settings    N/A
N/A                                          Oracle Advanced Queuing
N/A                                          Continuous Query Notification
N/A                                          Support for the O7L_MR client ability
N/A                                          Support for promoting a local transaction to a global transaction

As we can see it is not super easy to choose one, as you have to pick between very interesting mutually exclusive features: OCI connection pooling / Transparent Application Failover (TAF) / OCI Client Result Cache on one side versus Application Continuity / Transaction Guard on the other…

The JDBC Thin driver is a simple jar file that you add to your CLASSPATH when executing your Java program; the number in the ojdbcN.jar file name maps to the JDBC and database versions as follows:

Oracle Database version    JDBC version
12.2 or 12cR2              JDBC 4.2 in ojdbc8.jar
12.1 or 12cR1              JDBC 4.1 in ojdbc7.jar / JDBC 4.0 in ojdbc6.jar
11.2 or 11gR2              JDBC 4.0 in ojdbc6.jar / JDBC 3.0 in ojdbc5.jar
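If your build tooling needs to pick the right thin driver jar for a given database release, the table above is easy to encode; this tiny helper map is my own illustration, not an Oracle API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OjdbcJarDemo {
    // Oracle Database release -> latest matching JDBC Thin jar (from the table above).
    static final Map<String, String> JAR_BY_RELEASE = new LinkedHashMap<>();
    static {
        JAR_BY_RELEASE.put("12.2", "ojdbc8.jar");
        JAR_BY_RELEASE.put("12.1", "ojdbc7.jar");
        JAR_BY_RELEASE.put("11.2", "ojdbc6.jar");
    }

    public static void main(String[] args) {
        System.out.println("12.2 -> " + JAR_BY_RELEASE.get("12.2"));
        // prints "12.2 -> ojdbc8.jar"
    }
}
```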

The JDBC OCI driver comes with a complete client installation or, starting with 10.1.0, with the OCI Instant Client feature, an option of the Instant Client installation. The JDBC OCI driver library is located in the $ORACLE_HOME/bin directory on Windows and the $ORACLE_HOME/lib directory on Unix-like systems, in a file called ocijdbc.dll on Windows and libocijdbc.so on Unix-like systems: so ocijdbc12.dll on my Windows desktop and libocijdbc12.so in my database Oracle home.

To have the JDBC OCI driver installed you can choose a custom installation and select Oracle Call Interface (OCI) among the available options:

jdbc01

Real Application Cluster (RAC) with VirtualBox

VirtualBox configuration

There are a lot of articles on the Internet about creating a RAC cluster within VirtualBox. Overall it is not that complex once you have understood VirtualBox's shared-disk capability and how to configure the different network cards of each virtual machine.

You need three network cards for each VM:

  • One for public and administrative (system team) connection (Host-only Adapter)
  • One for private cluster interconnect (Internal Network)
  • One for Internet access, as you will surely need to install a few packages (Bridged Adapter)
jdbc02
jdbc03
jdbc04

From VM it gives:

1: lo:  mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:4e:19:d5 brd ff:ff:ff:ff:ff:ff
    inet 192.168.56.102/24 brd 192.168.56.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.56.112/24 brd 192.168.56.255 scope global secondary eth0:1
       valid_lft forever preferred_lft forever
    inet 192.168.56.121/24 brd 192.168.56.255 scope global secondary eth0:2
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe4e:19d5/64 scope link
       valid_lft forever preferred_lft forever
3: eth1:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:9b:fd:a4 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.102/24 brd 192.168.1.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet 169.254.209.44/16 brd 169.254.255.255 scope global eth1:1
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe9b:fda4/64 scope link
       valid_lft forever preferred_lft forever
4: eth2:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:15:09:4b brd ff:ff:ff:ff:ff:ff
    inet 10.70.101.71/24 brd 10.70.101.255 scope global dynamic eth2
       valid_lft 3590sec preferred_lft 3590sec
    inet6 fe80::a00:27ff:fe15:94b/64 scope link
       valid_lft forever preferred_lft forever

You need shared storage between your Virtual Machines (VM):

jdbc05
jdbc06

Local DNS with dnsmasq

One point that has always been an issue for me is the SCAN name, which should be resolved by a DNS server and point to three different IP addresses. Fortunately many blog posts mention a lightweight DNS server called dnsmasq.

The IP addresses I would like to configure are:

[root@server2 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.56.102  server2 server2.domain.com
192.168.56.103  server3 server3.domain.com
192.168.1.102  server2-priv server2-priv.domain.com
192.168.1.103  server3-priv server3-priv.domain.com
192.168.56.112  server2-vip server2-vip.domain.com
192.168.56.113  server3-vip server3-vip.domain.com
192.168.56.121  rac-cluster-scan rac-cluster-scan.domain.com
192.168.56.122  rac-cluster-scan rac-cluster-scan.domain.com
192.168.56.123  rac-cluster-scan rac-cluster-scan.domain.com

In the dnsmasq configuration file (/etc/dnsmasq.conf) I have uncommented and set only one parameter to match my local domain:

local=/domain.com/

In the /etc/resolv.conf file, to make it persistent across reboots, DHCP should be de-activated on the third network card:

[root@server2 ~]# cat /etc/resolv.conf
; generated by /usr/sbin/dhclient-script
search domain.com
nameserver 192.168.56.102
nameserver 164.129.154.205
nameserver 10.129.252.253

If you want to keep Internet access on your virtual machine you can use the nice trick below to prevent /etc/resolv.conf from being modified by the DHCP client:

[root@server4 oracle]# chattr +i /etc/resolv.conf

Enable and start dnsmasq:

[root@server3 ~]# systemctl enable dnsmasq
Created symlink from /etc/systemd/system/multi-user.target.wants/dnsmasq.service to /usr/lib/systemd/system/dnsmasq.service.
[root@server3 ~]# systemctl start dnsmasq
[root@server3 ~]# systemctl status dnsmasq
● dnsmasq.service - DNS caching server.
   Loaded: loaded (/usr/lib/systemd/system/dnsmasq.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2018-02-09 16:11:02 CET; 25min ago
 Main PID: 6338 (dnsmasq)
   CGroup: /system.slice/dnsmasq.service
           └─6338 /usr/sbin/dnsmasq -k

Feb 09 16:11:02 server3.domain.com systemd[1]: Starting DNS caching server....
Feb 09 16:11:02 server3.domain.com dnsmasq[6338]: started, version 2.76 cachesize 150
Feb 09 16:11:02 server3.domain.com dnsmasq[6338]: compile time options: IPv6 GNU-getopt DBus no-i18n IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
Feb 09 16:11:02 server3.domain.com dnsmasq[6338]: using local addresses only for domain domain.com
Feb 09 16:11:02 server3.domain.com dnsmasq[6338]: reading /etc/resolv.conf
Feb 09 16:11:02 server3.domain.com dnsmasq[6338]: using local addresses only for domain domain.com
Feb 09 16:11:02 server3.domain.com dnsmasq[6338]: ignoring nameserver 192.168.56.103 - local interface
Feb 09 16:11:02 server3.domain.com dnsmasq[6338]: using nameserver 164.129.154.205#53
Feb 09 16:11:02 server3.domain.com dnsmasq[6338]: using nameserver 10.129.252.253#53
Feb 09 16:11:02 server3.domain.com dnsmasq[6338]: read /etc/hosts - 9 addresses

It says it is ignoring the local nameserver, but if you do not add it then it does not work…

And now, magically, you can nslookup your SCAN cluster name, which round-robins over your three IP addresses:

[root@server3 ~]# nslookup rac-cluster-scan
Server:         192.168.56.103
Address:        192.168.56.103#53

Name:   rac-cluster-scan.domain.com
Address: 192.168.56.121
Name:   rac-cluster-scan.domain.com
Address: 192.168.56.122
Name:   rac-cluster-scan.domain.com
Address: 192.168.56.123
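Since the SCAN name resolves to three addresses, a client can spread its connection attempts over them in round-robin fashion. The sketch below hardcodes the three SCAN IPs from this setup for demonstration; a real client would obtain the list from DNS (for example via InetAddress.getAllByName) and the class name is mine:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ScanRoundRobinDemo {
    private final List<String> addresses;
    private final AtomicInteger next = new AtomicInteger();

    ScanRoundRobinDemo(List<String> addresses) {
        this.addresses = addresses;
    }

    // Hand out addresses in rotation, as a client might after resolving the SCAN name.
    String nextAddress() {
        return addresses.get(Math.floorMod(next.getAndIncrement(), addresses.size()));
    }

    public static void main(String[] args) {
        ScanRoundRobinDemo scan = new ScanRoundRobinDemo(
            List.of("192.168.56.121", "192.168.56.122", "192.168.56.123"));
        for (int i = 0; i < 4; i++) {
            System.out.println(scan.nextAddress());  // wraps back to .121 on the 4th call
        }
    }
}
```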

The installation of Grid Infrastructure should then work fine. The nasty thing I discovered with Grid 12cR2 is that the zip file you download directly contains the binaries and should be unzipped into your final target directory, no longer into a temporary folder. This will apparently also be the case for Oracle software starting with 18c…

References

The post JDBC client high availability features – JDBC HA – part 1 appeared first on IT World.

Pacemaker configuration for an Oracle database and its listener https://blog.yannickjaquier.com/linux/pacemaker-configuration-oracle-database.html https://blog.yannickjaquier.com/linux/pacemaker-configuration-oracle-database.html#comments Fri, 08 Jun 2018 17:14:34 +0000 http://blog.yannickjaquier.com/?p=4126

The post Pacemaker configuration for an Oracle database and its listener appeared first on IT World.


Table of contents

Preamble

In order to test a real-life high availability scenario you might want to create an operating system cluster to simulate what you could have in production. Where I work the standard tool to manage OS clusters is Veritas Cluster Server (VCS). It's a nice tool but its installation requires a license key that is not easy to get just to test the product.

A free alternative called Pacemaker is available anyway. In this blog post I will set up a complete cluster with a virtual IP address (192.168.56.99), an LVM volume group (vg01), a file system (/u01) and finally an Oracle database and its associated listener. The listener will obviously listen on the virtual IP address of the cluster.

For testing I have used two virtual machines running Oracle Linux Server release 7.2 64-bit and Oracle Enterprise Edition 12cR2 (12.2.0.1.0), but any Oracle release can be used. The virtual servers are:

  • server2.domain.com using non routable IP address 192.168.56.102
  • server3.domain.com using non routable IP address 192.168.56.103

The command to control and manage Pacemaker is pcs.

Pacemaker installation

Install pcs, which controls and configures Pacemaker and Corosync, with:

[root@server2 ~]# yum -y install pcs

Pacemaker and corosync will be installed as well:

Dependencies Resolved

===========================================================================================================================================================================================================
 Package                                                           Arch                                 Version                                             Repository                                Size
===========================================================================================================================================================================================================
Installing:
 pcs                                                               x86_64                               0.9.152-10.0.1.el7                                  ol7_latest                               5.0 M
Installing for dependencies:
 corosync                                                          x86_64                               2.4.0-4.el7                                         ol7_latest                               212 k
 corosynclib                                                       x86_64                               2.4.0-4.el7                                         ol7_latest                               125 k
 libqb                                                             x86_64                               1.0-1.el7                                           ol7_latest                                91 k
 libtool-ltdl                                                      x86_64                               2.4.2-22.el7_3                                      ol7_latest                                48 k
 libxslt                                                           x86_64                               1.1.28-5.0.1.el7                                    ol7_latest                               241 k
 libyaml                                                           x86_64                               0.1.4-11.el7_0                                      ol7_latest                                54 k
 nano                                                              x86_64                               2.3.1-10.el7                                        ol7_latest                               438 k
 net-snmp-libs                                                     x86_64                               1:5.7.2-24.el7_3.2                                  ol7_latest                               747 k
 pacemaker                                                         x86_64                               1.1.15-11.el7                                       ol7_latest                               441 k
 pacemaker-cli                                                     x86_64                               1.1.15-11.el7                                       ol7_latest                               319 k
 pacemaker-cluster-libs                                            x86_64                               1.1.15-11.el7                                       ol7_latest                                95 k
 pacemaker-libs                                                    x86_64                               1.1.15-11.el7                                       ol7_latest                               521 k
 perl-TimeDate                                                     noarch                               1:2.30-2.el7                                        ol7_latest                                51 k
 psmisc                                                            x86_64                               22.20-11.el7                                        ol7_latest                               140 k
 python-backports                                                  x86_64                               1.0-8.el7                                           ol7_latest                               5.2 k
 python-backports-ssl_match_hostname                               noarch                               3.4.0.2-4.el7                                       ol7_latest                                11 k
 python-clufter                                                    x86_64                               0.59.5-2.0.1.el7                                    ol7_latest                               349 k
 python-lxml                                                       x86_64                               3.2.1-4.el7                                         ol7_latest                               758 k
 python-setuptools                                                 noarch                               0.9.8-4.el7                                         ol7_latest                               396 k
 resource-agents                                                   x86_64                               3.9.5-82.el7                                        ol7_latest                               359 k
 ruby                                                              x86_64                               2.0.0.648-29.el7                                    ol7_latest                                68 k
 ruby-irb                                                          noarch                               2.0.0.648-29.el7                                    ol7_latest                                89 k
 ruby-libs                                                         x86_64                               2.0.0.648-29.el7                                    ol7_latest                               2.8 M
 rubygem-bigdecimal                                                x86_64                               1.2.0-29.el7                                        ol7_latest                                80 k
 rubygem-io-console                                                x86_64                               0.4.2-29.el7                                        ol7_latest                                51 k
 rubygem-json                                                      x86_64                               1.7.7-29.el7                                        ol7_latest                                76 k
 rubygem-psych                                                     x86_64                               2.0.0-29.el7                                        ol7_latest                                78 k
 rubygem-rdoc                                                      noarch                               4.0.0-29.el7                                        ol7_latest                               319 k
 rubygems                                                          noarch                               2.0.14.1-29.el7                                     ol7_latest                               215 k

Transaction Summary
===========================================================================================================================================================================================================

On all nodes:

[root@server2 ~]# systemctl start pcsd.service
[root@server2 ~]# systemctl enable pcsd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.

Change the hacluster user's password on all nodes:

[root@server3 ~]# echo secure_password | passwd --stdin hacluster
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.

Set authentication for pcs:

[root@server3 ~]# pcs cluster auth server2.domain.com server3.domain.com
Username: hacluster
Password:
server3.domain.com: Authorized
server2.domain.com: Authorized

Create your cluster (cluster01) on your two nodes with:

[root@server2 ~]# pcs cluster setup --start --name cluster01 server2.domain.com server3.domain.com
Destroying cluster on nodes: server2.domain.com, server3.domain.com...
server2.domain.com: Stopping Cluster (pacemaker)...
server3.domain.com: Stopping Cluster (pacemaker)...
server2.domain.com: Successfully destroyed cluster
server3.domain.com: Successfully destroyed cluster

Sending cluster config files to the nodes...
server2.domain.com: Succeeded
server3.domain.com: Succeeded

Starting cluster on nodes: server2.domain.com, server3.domain.com...
server2.domain.com: Starting Cluster...
server3.domain.com: Starting Cluster...

Synchronizing pcsd certificates on nodes server2.domain.com, server3.domain.com...
server3.domain.com: Success
server2.domain.com: Success

Restarting pcsd on the nodes in order to reload the certificates...
server3.domain.com: Success
server2.domain.com: Success

Check it with:

[root@server2 ~]# pcs status
Cluster name: cluster01
WARNING: no stonith devices and stonith-enabled is not false
Stack: unknown
Current DC: NONE
Last updated: Wed Apr 19 10:01:02 2017          Last change: Wed Apr 19 10:00:47 2017 by hacluster via crmd on server2.domain.com

2 nodes and 0 resources configured

Node server2.domain.com: UNCLEAN (offline)
Node server3.domain.com: UNCLEAN (offline)

No resources


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Notice the WARNING above about the missing stonith device…

Enable the cluster at boot time with:

[root@server2 ~]# pcs cluster enable --all
server2.domain.com: Cluster Enabled
server3.domain.com: Cluster Enabled
[root@server2 ~]# pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: server2.domain.com (version 1.1.15-11.el7-e174ec8) - partition with quorum
 Last updated: Wed Apr 19 10:02:02 2017         Last change: Wed Apr 19 10:01:08 2017 by hacluster via crmd on server2.domain.com
 2 nodes and 0 resources configured

PCSD Status:
  server2.domain.com: Online
  server3.domain.com: Online

As the documentation says:

STONITH is an acronym for “Shoot The Other Node In The Head” and it protects your data from being corrupted by rogue nodes or concurrent access.

This protects against split brain, a situation where multiple nodes access the same resource at the same time (like writing to the same filesystem), which leads to corruption… As the aim is to build something simple I will disable fencing with:

[root@server2 ~]# pcs property set stonith-enabled=false
[root@server2 ~]# pcs status
Cluster name: cluster01
Stack: corosync
Current DC: server2.domain.com (version 1.1.15-11.el7-e174ec8) - partition with quorum
Last updated: Wed Apr 19 10:02:53 2017          Last change: Wed Apr 19 10:02:49 2017 by root via cibadmin on server2.domain.com

2 nodes and 0 resources configured

Online: [ server2.domain.com server3.domain.com ]

No resources


Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Pacemaker resources creation

Before creating a new resource you might want to know which resource agents are available in Pacemaker:

[root@server2 ~]# pcs resource list ocf:heartbeat
ocf:heartbeat:CTDB - CTDB Resource Agent
ocf:heartbeat:Delay - Waits for a defined timespan
ocf:heartbeat:Dummy - Example stateless resource agent
ocf:heartbeat:Filesystem - Manages filesystem mounts
ocf:heartbeat:IPaddr - Manages virtual IPv4 and IPv6 addresses (Linux specific version)
ocf:heartbeat:IPaddr2 - Manages virtual IPv4 and IPv6 addresses (Linux specific version)
ocf:heartbeat:IPsrcaddr - Manages the preferred source address for outgoing IP packets
ocf:heartbeat:LVM - Controls the availability of an LVM Volume Group
ocf:heartbeat:MailTo - Notifies recipients by email in the event of resource takeover
ocf:heartbeat:Route - Manages network routes
ocf:heartbeat:SendArp - Broadcasts unsolicited ARP announcements
ocf:heartbeat:Squid - Manages a Squid proxy server instance
ocf:heartbeat:VirtualDomain - Manages virtual domains through the libvirt virtualization framework
ocf:heartbeat:Xinetd - Manages a service of Xinetd
ocf:heartbeat:apache - Manages an Apache Web server instance
ocf:heartbeat:clvm - clvmd
ocf:heartbeat:conntrackd - This resource agent manages conntrackd
ocf:heartbeat:db2 - Resource Agent that manages an IBM DB2 LUW databases in Standard role as primitive or in HADR roles as master/slave configuration. Multiple partitions are supported.
ocf:heartbeat:dhcpd - Chrooted ISC DHCP server resource agent.
ocf:heartbeat:docker - Docker container resource agent.
ocf:heartbeat:ethmonitor - Monitors network interfaces
ocf:heartbeat:exportfs - Manages NFS exports
ocf:heartbeat:galera - Manages a galara instance
ocf:heartbeat:garbd - Manages a galera arbitrator instance
ocf:heartbeat:iSCSILogicalUnit - Manages iSCSI Logical Units (LUs)
ocf:heartbeat:iSCSITarget - iSCSI target export agent
ocf:heartbeat:iface-vlan - Manages VLAN network interfaces.
ocf:heartbeat:mysql - Manages a MySQL database instance
ocf:heartbeat:nagios - Nagios resource agent
ocf:heartbeat:named - Manages a named server
ocf:heartbeat:nfsnotify - sm-notify reboot notifications
ocf:heartbeat:nfsserver - Manages an NFS server
ocf:heartbeat:nginx - Manages an Nginx web/proxy server instance
ocf:heartbeat:oracle - Manages an Oracle Database instance
ocf:heartbeat:oralsnr - Manages an Oracle TNS listener
ocf:heartbeat:pgsql - Manages a PostgreSQL database instance
ocf:heartbeat:portblock - Block and unblocks access to TCP and UDP ports
ocf:heartbeat:postfix - Manages a highly available Postfix mail server instance
ocf:heartbeat:rabbitmq-cluster - rabbitmq clustered
ocf:heartbeat:redis - Redis server
ocf:heartbeat:rsyncd - Manages an rsync daemon
ocf:heartbeat:slapd - Manages a Stand-alone LDAP Daemon (slapd) instance
ocf:heartbeat:symlink - Manages a symbolic link
ocf:heartbeat:tomcat - Manages a Tomcat servlet environment instance

Virtual IP address

Add a first resource, a virtual IP, to test your cluster. I have chosen to use the VirtualBox host-only adapter, the one used for cluster node communication, which is eth0 on all my nodes. We have seen how to configure this with Oracle Enterprise Linux or Red Hat:

[root@server2 ~]# pcs resource create virtualip IPaddr2 ip=192.168.56.99 cidr_netmask=24 nic=eth0 op monitor interval=10s
[root@server2 ~]# pcs status
Cluster name: cluster01
Stack: corosync
Current DC: server2.domain.com (version 1.1.15-11.el7-e174ec8) - partition with quorum
Last updated: Wed Apr 19 10:03:50 2017          Last change: Wed Apr 19 10:03:36 2017 by root via cibadmin on server2.domain.com

2 nodes and 1 resource configured

Online: [ server2.domain.com server3.domain.com ]

Full list of resources:

 virtualip      (ocf::heartbeat:IPaddr2):       Started server2.domain.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

You can check at the OS level that it has been done with:

[root@server2 ~]# ping -c 1 192.168.56.99
PING 192.168.56.99 (192.168.56.99) 56(84) bytes of data.
64 bytes from 192.168.56.99: icmp_seq=1 ttl=64 time=0.025 ms

--- 192.168.56.99 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.025/0.025/0.025/0.000 ms
[root@server2 ~]# ip addr show dev eth0
2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:47:54:07 brd ff:ff:ff:ff:ff:ff
    inet 192.168.56.102/24 brd 192.168.56.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.56.99/24 brd 192.168.56.255 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe47:5407/64 scope link
       valid_lft forever preferred_lft forever
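If you want to script this check, here is a minimal sketch (the helper name `has_ip` is my own, not a standard command) that tests whether an address is plumbed on an interface by parsing `ip addr` output:

```shell
# Hypothetical helper: succeeds if the given IPv4 address appears
# in "ip addr" output read from stdin.
has_ip() {
  grep -q "inet $1/"
}

# Usage on a live node:
# ip addr show dev eth0 | has_ip 192.168.56.99 && echo "virtual IP is active here"
```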

Move virtual IP on server3.domain.com:

[root@server3 ~]# pcs resource move virtualip server3.domain.com
[root@server3 ~]# pcs status
Cluster name: cluster01
Stack: corosync
Current DC: server2.domain.com (version 1.1.15-11.el7-e174ec8) - partition with quorum
Last updated: Wed Apr 19 10:24:59 2017          Last change: Wed Apr 19 10:06:21 2017 by root via crm_resource on server3.domain.com

2 nodes and 1 resource configured

Online: [ server2.domain.com server3.domain.com ]

Full list of resources:

 virtualip      (ocf::heartbeat:IPaddr2):       Started server3.domain.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

We see the IP address has been transferred to server3.domain.com:

[root@server3 ~]# ip addr show dev eth0
2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:b4:9d:bf brd ff:ff:ff:ff:ff:ff
    inet 192.168.56.103/24 brd 192.168.56.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.56.99/24 brd 192.168.56.255 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:feb4:9dbf/64 scope link
       valid_lft forever preferred_lft forever
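To script failover checks, a small sketch (the function name `where_is` is hypothetical, not a pcs subcommand) that extracts from `pcs status` output the node a resource is currently started on:

```shell
# Hypothetical helper: print the node on which a resource is running,
# reading "pcs status" text on stdin ($NF is the last field, the node name).
where_is() {
  awk -v r="$1" '$1 == r && /Started/ {print $NF}'
}

# Usage on a live cluster:
# pcs status | where_is virtualip
```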

Volume group

I create a volume group (vg01) on a shared disk; I also create, format and mount a logical volume, but this last part is not strictly required yet:

[root@server2 ~]# vgcreate vg01 /dev/sdb
  Physical volume "/dev/sdb" successfully created.
  Volume group "vg01" successfully created
[root@server2 ~]# lvcreate -n lvol01 -L 5G vg01
  Logical volume "lvol01" created.
[root@server2 ~]# mkfs -t xfs /dev/vg01/lvol01
meta-data=/dev/vg01/lvol01       isize=256    agcount=4, agsize=327680 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0, sparse=0
data     =                       bsize=4096   blocks=1310720, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@server2 ~]# mkdir /u01
[root@server2 ~]# systemctl daemon-reload
[root@server2 /]# mount -t xfs /dev/vg01/lvol01 /u01
[root@server2 /]# df /u01
Filesystem              1K-blocks  Used Available Use% Mounted on
/dev/mapper/vg01-lvol01   5232640 32928   5199712   1% /u01

I add the LVM resource to Pacemaker; I deliberately create it on server2.domain.com:

[root@server2 /]# pcs resource create vg01 LVM volgrpname=vg01
[root@server2 /]# pcs status
Cluster name: cluster01
Stack: corosync
Current DC: server3.domain.com (version 1.1.15-11.el7-e174ec8) - partition with quorum
Last updated: Wed Apr 19 15:27:28 2017          Last change: Wed Apr 19 15:27:24 2017 by root via cibadmin on server2.domain.com

2 nodes and 2 resources configured

Online: [ server2.domain.com server3.domain.com ]

Full list of resources:

 virtualip      (ocf::heartbeat:IPaddr2):       Started server3.domain.com
 vg01   (ocf::heartbeat:LVM):   Started server2.domain.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

I try to move vg01 volume group to server3.domain.com:

[root@server2 ~]# pcs resource move vg01 server3.domain.com
[root@server2 ~]# pcs status
Cluster name: cluster01
Stack: corosync
Current DC: server3.domain.com (version 1.1.15-11.el7-e174ec8) - partition with quorum
Last updated: Wed Apr 19 15:31:17 2017          Last change: Wed Apr 19 15:31:04 2017 by root via crm_resource on server2.domain.com

2 nodes and 2 resources configured

Online: [ server2.domain.com server3.domain.com ]

Full list of resources:

 virtualip      (ocf::heartbeat:IPaddr2):       Started server3.domain.com
 vg01   (ocf::heartbeat:LVM):   FAILED server2.domain.com (blocked)

Failed Actions:
* vg01_stop_0 on server2.domain.com 'unknown error' (1): call=12, status=complete, exitreason='LVM: vg01 did not stop correctly',
    last-rc-change='Wed Apr 19 15:31:04 2017', queued=1ms, exec=10526ms


Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

All this shows that it is not so easy and requires a bit more configuration. I start by removing the volume group resource:

[root@server2 ~]# pcs resource delete vg01
Deleting Resource - vg01

On all nodes:

[root@server2 ~]# lvmconf --enable-halvm --services --startstopservices
Warning: Stopping lvm2-lvmetad.service, but it can still be activated by:
  lvm2-lvmetad.socket
Removed symlink /etc/systemd/system/sysinit.target.wants/lvm2-lvmetad.socket.
[root@server2 ~]# ps -ef | grep lvm
root     31974  9198  0 15:58 pts/1    00:00:00 grep --color=auto lvm

In the /etc/lvm/lvm.conf file on all nodes I add the list of local volume groups (here vg00) that the OS is allowed to activate outside of cluster control:

volume_list = [ "vg00" ]

Execute the command below on each node so the new lvm.conf is taken into account at boot time. Note that this does not survive a kernel upgrade: each time you get a new kernel you have to issue the command for the new kernel BEFORE rebooting, or you will need to reboot twice:

[root@server3 ~]# dracut -H -f /boot/initramfs-$(uname -r).img $(uname -r)
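To avoid forgetting a kernel, here is a sketch (the function name `initramfs_cmds` is hypothetical) that prints the dracut command for each kernel version you pass it; on a live node you would feed it the installed kernel list and run the output as root:

```shell
# Hypothetical helper: print one dracut command per kernel version,
# mirroring the command used above for the running kernel.
initramfs_cmds() {
  for v in "$@"; do
    printf 'dracut -H -f /boot/initramfs-%s.img %s\n' "$v" "$v"
  done
}

# Example, feeding the installed kernels (RPM-based systems):
# initramfs_cmds $(rpm -q kernel --qf '%{VERSION}-%{RELEASE}.%{ARCH}\n')
```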

Recreate the volume group resource with the exclusive option (a parameter ensuring that only the cluster is able to activate the LVM volume group):

[root@server2 ~]# pcs resource create vg01 LVM volgrpname=vg01 exclusive=true
[root@server2 ~]# pcs resource show
 virtualip      (ocf::heartbeat:IPaddr2):       Started server3.domain.com
 vg01   (ocf::heartbeat:LVM):   Started server2.domain.com
[root@server2 ~]# pcs status
Cluster name: cluster01
Stack: corosync
Current DC: server3.domain.com (version 1.1.15-11.el7-e174ec8) - partition with quorum
Last updated: Wed Apr 19 17:10:53 2017          Last change: Wed Apr 19 17:10:43 2017 by root via cibadmin on server2.domain.com

2 nodes and 2 resources configured

Online: [ server2.domain.com server3.domain.com ]

Full list of resources:

 virtualip      (ocf::heartbeat:IPaddr2):       Started server3.domain.com
 vg01   (ocf::heartbeat:LVM):   Started server2.domain.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

The volume group move is now working fine:

[root@server2 ~]# pcs status
Cluster name: cluster01
Stack: corosync
Current DC: server3.domain.com (version 1.1.15-11.el7-e174ec8) - partition with quorum
Last updated: Wed Apr 19 17:10:53 2017          Last change: Wed Apr 19 17:10:43 2017 by root via cibadmin on server2.domain.com

2 nodes and 2 resources configured

Online: [ server2.domain.com server3.domain.com ]

Full list of resources:

 virtualip      (ocf::heartbeat:IPaddr2):       Started server3.domain.com
 vg01   (ocf::heartbeat:LVM):   Started server2.domain.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@server2 ~]# pcs resource move vg01 server3.domain.com
[root@server2 ~]# pcs status
Cluster name: cluster01
Stack: corosync
Current DC: server3.domain.com (version 1.1.15-11.el7-e174ec8) - partition with quorum
Last updated: Wed Apr 19 17:12:20 2017          Last change: Wed Apr 19 17:11:06 2017 by root via crm_resource on server2.domain.com

2 nodes and 2 resources configured

Online: [ server2.domain.com server3.domain.com ]

Full list of resources:

 virtualip      (ocf::heartbeat:IPaddr2):       Started server3.domain.com
 vg01   (ocf::heartbeat:LVM):   Started server3.domain.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Filesystem

Create the file system based on a logical volume:

[root@server2 ~]# pcs resource create u01 Filesystem device="/dev/vg01/lvol01" directory="/u01" fstype="xfs"
[root@server2 ~]# pcs status
Cluster name: cluster01
Stack: corosync
Current DC: server3.domain.com (version 1.1.15-11.el7-e174ec8) - partition with quorum
Last updated: Wed Apr 19 17:13:51 2017          Last change: Wed Apr 19 17:13:47 2017 by root via cibadmin on server2.domain.com

2 nodes and 3 resources configured

Online: [ server2.domain.com server3.domain.com ]

Full list of resources:

 virtualip      (ocf::heartbeat:IPaddr2):       Started server3.domain.com
 vg01   (ocf::heartbeat:LVM):   Started server3.domain.com
 u01    (ocf::heartbeat:Filesystem):    Started server3.domain.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

To colocate the resources I create a group. This can also be done with constraints, but a group is more logical in our case (the order you choose is the starting order):

[root@server3 u01]# pcs resource group add oracle virtualip vg01 u01
[root@server3 u01]# pcs status
Cluster name: cluster01
Stack: corosync
Current DC: server3.domain.com (version 1.1.15-11.el7-e174ec8) - partition with quorum
Last updated: Wed Apr 19 17:36:10 2017          Last change: Wed Apr 19 17:36:07 2017 by root via cibadmin on server3.domain.com

2 nodes and 3 resources configured

Online: [ server2.domain.com server3.domain.com ]

Full list of resources:

 Resource Group: oracle
     virtualip  (ocf::heartbeat:IPaddr2):       Started server3.domain.com
     vg01       (ocf::heartbeat:LVM):   Started server3.domain.com
     u01        (ocf::heartbeat:Filesystem):    Started server3.domain.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

At that stage you can test that the already created resources move from one cluster node to the other with something like:

[root@server3 u01]# pcs cluster standby server3.domain.com
[root@server2 ~]# pcs status
Cluster name: cluster01
Stack: corosync
Current DC: server3.domain.com (version 1.1.15-11.el7-e174ec8) - partition with quorum
Last updated: Wed Apr 19 17:45:45 2017          Last change: Wed Apr 19 17:45:37 2017 by root via crm_resource on server2.domain.com

2 nodes and 3 resources configured

Node server3.domain.com: standby
Online: [ server2.domain.com ]

Full list of resources:

 Resource Group: oracle
     virtualip  (ocf::heartbeat:IPaddr2):       Started server2.domain.com
     vg01       (ocf::heartbeat:LVM):   Started server2.domain.com
     u01        (ocf::heartbeat:Filesystem):    Started server2.domain.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

And get back to the initial situation with (resources return to server3.domain.com because I have also configured a preferred node, but this is not mandatory):

[root@server3 ~]# pcs node unstandby server3.domain.com
[root@server3 ~]# pcs status
Cluster name: cluster01
Stack: corosync
Current DC: server3.domain.com (version 1.1.15-11.el7-e174ec8) - partition with quorum
Last updated: Wed Apr 19 17:48:05 2017          Last change: Wed Apr 19 17:48:03 2017 by root via crm_attribute on server3.domain.com

2 nodes and 3 resources configured

Online: [ server2.domain.com server3.domain.com ]

Full list of resources:

 Resource Group: oracle
     virtualip  (ocf::heartbeat:IPaddr2):       Started server3.domain.com
     vg01       (ocf::heartbeat:LVM):   Started server3.domain.com
     u01        (ocf::heartbeat:Filesystem):    Started server3.domain.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Oracle database

Display the available options of the oracle resource agent with:

[root@server2 ~]# pcs resource describe oracle
ocf:heartbeat:oracle - Manages an Oracle Database instance

Resource script for oracle. Manages an Oracle Database instance
as an HA resource.

Resource options:
  sid (required): The Oracle SID (aka ORACLE_SID).
  home: The Oracle home directory (aka ORACLE_HOME). If not specified, then the SID along with its home should be listed in /etc/oratab.
  user: The Oracle owner (aka ORACLE_OWNER). If not specified, then it is set to the owner of file $ORACLE_HOME/dbs/*${ORACLE_SID}.ora. If this does not work for you, just set it explicitely.
  monuser: Monitoring user name. Every connection as sysdba is logged in an audit log. This can result in a large number of new files created. A new user is created (if it doesn't exist) in the start action and subsequently used in monitor. It should have very limited rights. Make sure that the password for this user does not expire.
  monpassword: Password for the monitoring user. Make sure that the password for this user does not expire.
  monprofile: Profile used by the monitoring user. If the profile does not exist, it will be created with a non-expiring password.
  ipcrm: Sometimes IPC objects (shared memory segments and semaphores) belonging to an Oracle instance might be left behind which prevents the instance from starting. It is not easy to figure out which shared segments belong to which instance, in particular when more instances are running as same user. What we use here is the "oradebug" feature and its "ipc" trace utility. It is not optimal to parse the debugging information, but I am not aware of any other way to find out about the IPC information. In case the format or wording of the trace report changes, parsing might fail. There are some precautions, however, to prevent stepping on other peoples toes. There is also a dumpinstipc option which will make us print the IPC objects which belong to the instance. Use it to see if we parse the trace file correctly. Three settings are possible: - none: don't mess with IPC and hope for the best (beware: you'll probably be out of luck, sooner or later) - instance: try to figure out the IPC stuff which belongs to the instance and remove only those (default; should be safe) - orauser: remove all IPC belonging to the user which runs the instance (don't use this if you run more than one instance as same user or if other apps running as this user use IPC) The default setting "instance" should be safe to use, but in that case we cannot guarantee that the instance will start. In case IPC objects were already left around, because, for instance, someone mercilessly killing Oracle processes, there is no way any more to find out which IPC objects should be removed. In that case, human intervention is necessary, and probably _all_ instances running as same user will have to be stopped. The third setting, "orauser", guarantees IPC objects removal, but it does that based only on IPC objects ownership, so you should use that only if every instance runs as separate user. Please report any problems. Suggestions/fixes welcome.
  clear_backupmode: The clear of the backup mode of ORACLE.
  shutdown_method: How to stop Oracle is a matter of taste it seems. The default method ("checkpoint/abort") is: alter system checkpoint; shutdown abort; This should be the fastest safe way bring the instance down. If you find "shutdown abort" distasteful, set this attribute to "immediate" in which case we will shutdown immediate; If you still think that there's even better way to shutdown an Oracle instance we are willing to listen.

I have then installed Oracle on server3.domain.com, where the /u01 filesystem is mounted. I have also copied /etc/oratab, /usr/local/bin/coraenv, /usr/local/bin/dbhome and /usr/local/bin/oraenv to server2.domain.com. This step is not mandatory but it eases Oracle usage on both nodes.

I have also, obviously, changed the listener to make it listen on my virtual IP, i.e. 192.168.56.99.
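As a sketch, the corresponding listener.ora entry could look like the fragment below. The file location is an assumption based on a standard /u01 installation; only the virtual IP and the listener name come from this article:

```
# $ORACLE_HOME/network/admin/listener.ora (hypothetical layout)
LISTENER_ORCL =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.99)(PORT = 1521))
    )
  )
```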

Create the oracle resource:

[root@server3 ~]# pcs resource create orcl oracle sid="orcl" --group=oracle
[root@server3 ~]# pcs status
Cluster name: cluster01
Stack: corosync
Current DC: server3.domain.com (version 1.1.15-11.el7-e174ec8) - partition with quorum
Last updated: Thu Apr 20 18:18:41 2017          Last change: Thu Apr 20 18:18:38 2017 by root via cibadmin on server3.domain.com

2 nodes and 4 resources configured

Online: [ server2.domain.com server3.domain.com ]

Full list of resources:

 Resource Group: oracle
     virtualip  (ocf::heartbeat:IPaddr2):       Started server3.domain.com
     vg01       (ocf::heartbeat:LVM):   Started server3.domain.com
     u01        (ocf::heartbeat:Filesystem):    Started server3.domain.com
     orcl       (ocf::heartbeat:oracle):        Started server3.domain.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

You must create a monitoring user (default is OCFMON) and profile (default is OCFMONPROFILE) or you will get the error message below:

* orcl_start_0 on server2.domain.com 'unknown error' (1): call=268, status=complete, exitreason='monprofile must start with C## for container databases',
    last-rc-change='Fri Apr 21 15:17:06 2017', queued=0ms, exec=17138ms

Please note that container databases are also taken into account: the account must be created in the root container as a common user with the C## prefix. I have chosen not to create the required profile, but I must take this into account when configuring the resource:

SQL> create user c##ocfmon identified by "secure_password";

User created.

SQL> grant connect to c##ocfmon;

Grant succeeded.

I then update the Oracle Pacemaker resource with the monitoring credentials:

[root@server2 ~]# pcs resource update orcl monpassword="secure_password" monuser="c##ocfmon" monprofile="default"
[root@server2 ~]# pcs resource show orcl
 Resource: orcl (class=ocf provider=heartbeat type=oracle)
  Attributes: sid=orcl monpassword=secure_password monuser=c##ocfmon monprofile=default
  Operations: start interval=0s timeout=120 (orcl-start-interval-0s)
              stop interval=0s timeout=120 (orcl-stop-interval-0s)
              monitor interval=120 timeout=30 (orcl-monitor-interval-120)

To have my pluggable database automatically opened at instance startup I have used a nice 12cR2 feature called pluggable database default state:

SQL> SELECT * FROM dba_pdb_saved_states;

no rows selected

SQL> set lines 150
SQL> col name for a20
SQL> SELECT name, open_mode FROM v$pdbs;

NAME                 OPEN_MODE
-------------------- ----------
PDB$SEED             READ ONLY
PDB1                 MOUNTED

SQL> alter pluggable database pdb1 open;

Pluggable database altered.

SQL> alter pluggable database pdb1 save state;

Pluggable database altered.

SQL> col con_name for a20
SQL> SELECT con_name, state FROM dba_pdb_saved_states;

CON_NAME             STATE
-------------------- --------------
PDB1                 OPEN
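
Should you ever want to revert this, a hypothetical cleanup (not shown in the original post) discards the saved state so the PDB stays mounted at the next restart:

```shell
# Hypothetical: drop the saved open state for PDB1
sqlplus -s / as sysdba <<'EOF'
alter pluggable database pdb1 discard state;
select con_name, state from dba_pdb_saved_states;
EOF
```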

Oracle listener

Create the Oracle listener resource with:

[root@server3 ~]# pcs resource create listener_orcl oralsnr sid="orcl" listener="listener_orcl" --group=oracle
[root@server3 ~]# pcs status
Cluster name: cluster01
Stack: corosync
Current DC: server3.domain.com (version 1.1.15-11.el7-e174ec8) - partition with quorum
Last updated: Thu Apr 20 18:21:05 2017          Last change: Thu Apr 20 18:21:02 2017 by root via cibadmin on server3.domain.com

2 nodes and 5 resources configured

Online: [ server2.domain.com server3.domain.com ]

Full list of resources:

 Resource Group: oracle
     virtualip  (ocf::heartbeat:IPaddr2):       Started server3.domain.com
     vg01       (ocf::heartbeat:LVM):   Started server3.domain.com
     u01        (ocf::heartbeat:Filesystem):    Started server3.domain.com
     orcl       (ocf::heartbeat:oracle):        Started server3.domain.com
     listener_orcl      (ocf::heartbeat:oralsnr):       Started server3.domain.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Pacemaker graphical interface

You can connect to any node of your cluster over HTTPS on port 2224 and get a very nice graphical interface where you can apparently perform all the required modifications of your cluster, including stopping and starting resources. Overall this graphical interface is of great help when you want to know which options are available for resources:


Issues encountered

LVM volume group creation

If for any reason you must create or re-create the LVM volume group after you have configured the system to activate no volume group other than the root one (vg00 in my case), you must use the trick below to escape from the LVM error messages.
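
For reference, that restriction is typically the volume_list setting in /etc/lvm/lvm.conf — a sketch to be adapted to your own root volume group (remember to rebuild the initramfs afterwards so the setting is also honored at boot):

```
# /etc/lvm/lvm.conf excerpt: only vg00 is activated automatically;
# cluster-managed volume groups are left to Pacemaker
activation {
    volume_list = [ "vg00" ]
}
```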

The error messages you will get are:

[root@server2 ~]# vgcreate vg01 /dev/sdb
  Physical volume "/dev/sdb" successfully created.
  Volume group "vg01" successfully created
[root@server2 ~]# lvcreate -L 500m -n lvol01 vg01
  Volume "vg01/lvol01" is not active locally.
  Aborting. Failed to wipe start of new LV.

Trying to activate the volume group does not change anything:

[root@server2 ~]# vgchange -a y vg01
  0 logical volume(s) in volume group "vg01" now active

To overcome the problem, use the sequence below:

[root@server2 ~]# lvscan
  ACTIVE            '/dev/vg00/lvol00' [10.00 GiB] inherit
  ACTIVE            '/dev/vg00/lvol03' [500.00 MiB] inherit
  ACTIVE            '/dev/vg00/lvol01' [4.00 GiB] inherit
  ACTIVE            '/dev/vg00/lvol02' [4.00 GiB] inherit
  ACTIVE            '/dev/vg00/lvol20' [5.00 GiB] inherit
[root@server2 ~]# vgcreate vg01 /dev/sdb --addtag pacemaker --config 'activation { volume_list = [ "@pacemaker" ] }'
  Volume group "vg01" successfully created
[root@server2 ~]# lvcreate --addtag pacemaker -L 15g -n lvol01 vg01 --config 'activation { volume_list = [ "@pacemaker" ] }'
  Logical volume "lvol01" created.
[root@server2 ~]# lvs
  LV     VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lvol00 vg00 -wi-ao----  10.00g
  lvol01 vg00 -wi-ao----   4.00g
  lvol02 vg00 -wi-ao----   4.00g
  lvol03 vg00 -wi-ao---- 500.00m
  lvol20 vg00 -wi-ao----   5.00g
  lvol01 vg01 -wi-a-----  15.00g
[root@server2 ~]# lvscan
  ACTIVE            '/dev/vg01/lvol01' [15.00 GiB] inherit
  ACTIVE            '/dev/vg00/lvol00' [10.00 GiB] inherit
  ACTIVE            '/dev/vg00/lvol03' [500.00 MiB] inherit
  ACTIVE            '/dev/vg00/lvol01' [4.00 GiB] inherit
  ACTIVE            '/dev/vg00/lvol02' [4.00 GiB] inherit
  ACTIVE            '/dev/vg00/lvol20' [5.00 GiB] inherit
[root@server2 ~]# lvchange -an vg01/lvol01 --deltag pacemaker
  Logical volume vg01/lvol01 changed.
[root@server2 ~]# vgchange -an vg01 --deltag pacemaker
  Volume group "vg01" successfully changed
  0 logical volume(s) in volume group "vg01" now active
[root@server2 ~]# pcs resource create vg01 LVM volgrpname=vg01 exclusive=true --group oracle
[root@server2 ~]# pcs status
Cluster name: cluster01
Stack: corosync
Current DC: server3.domain.com (version 1.1.15-11.el7-e174ec8) - partition with quorum
Last updated: Thu Apr 20 17:16:37 2017          Last change: Thu Apr 20 17:16:34 2017 by root via cibadmin on server2.domain.com

2 nodes and 2 resources configured

Online: [ server2.domain.com server3.domain.com ]

Full list of resources:

 Resource Group: oracle
     virtualip  (ocf::heartbeat:IPaddr2):       Started server3.domain.com
     vg01       (ocf::heartbeat:LVM):   Started server3.domain.com

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Resource constraints

Display resource constraints with:

[root@server2 ~]# pcs constraint show --full
Location Constraints:
  Resource: oracle
    Enabled on: server3.domain.com (score:INFINITY) (role: Started) (id:cli-prefer-oracle)
  Resource: virtualip
    Enabled on: server3.domain.com (score:INFINITY) (role: Started) (id:cli-prefer-virtualip)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

If you want to remove a location constraint (currently set to server3.domain.com):

[root@server2 ~]# pcs constraint location remove cli-prefer-oracle
[root@server2 ~]# pcs constraint show --full
Location Constraints:
  Resource: virtualip
    Enabled on: server3.domain.com (score:INFINITY) (role: Started) (id:cli-prefer-virtualip)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:
[root@server2 ~]# pcs constraint location remove cli-prefer-virtualip
[root@server2 ~]# pcs constraint show --full
Location Constraints:
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:
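
Conversely, to pin the whole group to a preferred node with a finite score — so Pacemaker can still move it elsewhere on failure — something like this would do (node name and score are illustrative):

```shell
# Prefer server2 for the oracle group without forbidding failover
pcs constraint location oracle prefers server2.domain.com=100
```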

If, for example, you want to colocate two resources without creating a group, use something like:

[root@server2 ~]# pcs constraint colocation set virtualip vg01
[root@server2 ~]# pcs constraint show
Location Constraints:
Ordering Constraints:
Colocation Constraints:
  Resource Sets:
    set virtualip vg01 setoptions score=INFINITY
Ticket Constraints:
[root@server2 ~]# pcs constraint colocation show --full
Colocation Constraints:
  Resource Sets:
    set virtualip vg01 (id:pcs_rsc_set_virtualip_vg01) setoptions score=INFINITY (id:pcs_rsc_colocation_set_virtualip_vg01)
[root@server2 ~]# pcs constraint remove pcs_rsc_colocation_set_virtualip_vg01
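
In the same spirit, ungrouped resources can also be chained with an ordering constraint (a sketch; resource groups already imply start order, so this only makes sense outside a group):

```shell
# Start virtualip before vg01; stops happen in the reverse order
pcs constraint order virtualip then vg01
pcs constraint order show
```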

References

The post Pacemaker configuration for an Oracle database and its listener appeared first on IT World.

Grub configuration to disable consistent network device naming in OEL 7 (published Sat, 12 May 2018)

Preamble

Starting with Red Hat Enterprise Linux 7, and thus Oracle Linux 7 (and many other Linux distributions, CentOS 7 for sure), the network interface names have changed to something a little different from the traditional eth[0,1,2,...]:

[root@server3 ~]# ip addr
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s3:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:47:54:07 brd ff:ff:ff:ff:ff:ff
    inet 192.168.56.102/24 brd 192.168.56.255 scope global enp0s3
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe47:5407/64 scope link
       valid_lft forever preferred_lft forever
3: enp0s8:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:fc:21:55 brd ff:ff:ff:ff:ff:ff
    inet 10.70.101.94/24 brd 10.70.101.255 scope global dynamic enp0s8
       valid_lft 3572sec preferred_lft 3572sec
    inet6 fe80::a00:27ff:fefc:2155/64 scope link
       valid_lft forever preferred_lft forever

The reason for this is clear from Red Hat official documentation:

In Red Hat Enterprise Linux 7, udev supports a number of different naming schemes. The default is to assign fixed names based on firmware, topology, and location information. This has the advantage that the names are fully automatic, fully predictable, that they stay fixed even if hardware is added or removed (no re-enumeration takes place), and that broken hardware can be replaced seamlessly. The disadvantage is that they are sometimes harder to read than the eth0 or wlan0 names traditionally used. For example: enp5s0.

How do you go back to the legacy naming? You might want to do this not only because bad habits die hard, but simply because you are configuring a cluster of servers (RAC, NoSQL, …) and want to be sure the interconnect interface is called eth0 on all your nodes…

Grub configuration

This blog post has been written with a virtual machine running Oracle Linux Server release 7.3 and having two network interfaces: one for interconnect and one for internet access.


Edit the /etc/default/grub file and add at the end of the GRUB_CMDLINE_LINUX variable value:

net.ifnames=0 biosdevname=0

Examples:

  • GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=vg00/lvol00 rd.lvm.lv=vg00/lvol01 rhgb quiet numa=off transparent_hugepage=never net.ifnames=0 biosdevname=0"
  • GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=vg00/lvol00 rd.lvm.lv=vg00/lvol01 rhgb quiet net.ifnames=0 biosdevname=0"
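
If you prefer scripting the edit, a sed one-liner can append the two parameters just inside the closing quote of GRUB_CMDLINE_LINUX. The sketch below works on a scratch copy with illustrative contents; on a real system you would point it at /etc/default/grub (after taking a backup):

```shell
# Scratch copy standing in for /etc/default/grub (contents illustrative)
cat > /tmp/grub.test <<'EOF'
GRUB_TIMEOUT=5
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet"
EOF
# Append net.ifnames=0 biosdevname=0 before the closing double quote
sed -i 's/^\(GRUB_CMDLINE_LINUX=".*\)"$/\1 net.ifnames=0 biosdevname=0"/' /tmp/grub.test
grep GRUB_CMDLINE_LINUX /tmp/grub.test
```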

Rebuild Grub configuration:

[root@server3 ~]# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.1.12-61.1.25.el7uek.x86_64
Found initrd image: /boot/initramfs-4.1.12-61.1.25.el7uek.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-514.6.1.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-514.6.1.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-7e6fb04dc02343d0a54dccc3940ad366
Found initrd image: /boot/initramfs-0-rescue-7e6fb04dc02343d0a54dccc3940ad366.img
done

Copy the network interface configuration files to their new names:

[root@server3 grub2]# cd /etc/sysconfig/network-scripts/
[root@server3 network-scripts]# cp ifcfg-enp0s3 ifcfg-eth0
[root@server3 network-scripts]# cp ifcfg-enp0s8 ifcfg-eth1

Change the values of NAME and DEVICE in both files:

[root@server3 network-scripts]# cat ifcfg-eth0
HWADDR=08:00:27:DC:FB:92
TYPE=Ethernet
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV4_DNS_PRIORITY=100
IPV6INIT=yes
IPV6_AUTOCONF=no
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
IPV6_DNS_PRIORITY=100
NAME=eth0
UUID=eefd48d5-7810-4848-a1ce-9040938fb455
DEVICE=eth0
ONBOOT=yes
IPADDR=192.168.56.103
PREFIX=24
[root@server3 network-scripts]# cat ifcfg-eth1
TYPE=Ethernet
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=eth1
UUID=6b145311-798e-4927-8876-18d02570f386
DEVICE=eth1
ONBOOT=yes
PEERDNS=yes
PEERROUTES=yes

Disable network manager:

[root@server3 ~]# systemctl disable NetworkManager
Removed symlink /etc/systemd/system/multi-user.target.wants/NetworkManager.service.
Removed symlink /etc/systemd/system/dbus-org.freedesktop.NetworkManager.service.
Removed symlink /etc/systemd/system/dbus-org.freedesktop.nm-dispatcher.service.
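
With NetworkManager out of the picture, it is worth making sure the legacy network service brings the interfaces up at boot (a step not shown above):

```shell
# Enable the classic initscripts-based network service at boot
systemctl enable network
systemctl status network
```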

Reboot server:

[root@server3 ~]# reboot

You should see something like:

[root@server3 ~]# ip addr
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:47:54:07 brd ff:ff:ff:ff:ff:ff
    inet 192.168.56.102/24 brd 192.168.56.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe47:5407/64 scope link
       valid_lft forever preferred_lft forever
3: eth1:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:fc:21:55 brd ff:ff:ff:ff:ff:ff
    inet 10.70.101.94/24 brd 10.70.101.255 scope global dynamic eth1
       valid_lft 604794sec preferred_lft 604794sec
    inet6 fe80::a00:27ff:fefc:2155/64 scope link
       valid_lft forever preferred_lft forever

As we have modified the default GRUB configuration, the change survives a kernel upgrade! Welcome back to legacy network naming!

The drawback is that the newer network tools no longer work:

[root@server3 ~]# nmcli
Error: NetworkManager is not running.
[root@server3 ~]# nmtui
NetworkManager is not running.

References

The post Grub configuration to disable consistent network device naming in OEL 7 appeared first on IT World.
