Transparent Application Continuity (TAC) hands-on

Preamble

Transparent Application Continuity (TAC) is an extension of Application Continuity (AC) that ensure a transparent recover of database sessions (meeting outages criteria, see official documentation for the evolving list) following a planned or unplanned outage. Versus AC, TAC additionally supports applications that do not use a connection pool.

From the official documentation:

Transparent Application Continuity is a mode of Application Continuity beginning with Oracle Database 18c that transparently tracks and records session and transactional state so that a database session can be recovered following recoverable outages. This is done safely and with no need for a DBA to have any knowledge of the application or to make application code changes. Transparency is achieved by using a state-tracking infrastructure that categorizes session state usage as an application issues calls to the database. The use of state tracking future proofs applications using Transparent Application Continuity as applications or environments change.

This Oracle infographic is also summing-up things nicely. TAF stands for Transparent Application Failover. Draining is the action to smoothly remove sessions from an instance in case of planned downtime:

tac01
tac01

This feature, like when I have tested AC, is mainly used with Real Application Clusters (RAC) but it can also be used in case of a Data Guard environment. And following my article on Data Guard per pluggable database (http://) this is exactly in this high available environment that I’m planning to test the feature.

Testing has been done using Oracle Database 21c Enterprise Edition Release 21.15 running on a Red Hat Enterprise Linux release 8.7 (Ootpa) server with 12 cores and 64GB of memory. I have two container instances (orcl1 and orcl2) running on this server and I’m moving a pluggable database (pdb1) from one container instance to the other one using Data Guard per pluggable database feature. Those two container instances can be on different server and this is even more than suggested in a real production implementation…

Transparent Application Continuity (TAC) setup

Start by creating a TAC compatible service. The important option is FAILOVER_RESTORE and FAILOVER_TYPE that must be set to AUTO:

SQL> ALTER SESSION SET container=pdb1;
 
SESSION altered.
 
SQL>
DECLARE
  params dbms_service.svc_parameter_array;
BEGIN
  params('FAILOVER_RESTORE')    := 'AUTO';
  params('FAILOVER_TYPE')       := 'AUTO';
  params('FAILOVER_RETRIES')    := 3;
  params('DRAIN_TIMEOUT')       := 60;
  params('STOP_OPTION')         := DBMS_SERVICE.STOP_OPTION_IMMEDIATE;
  params('COMMIT_OUTCOME')      := 'TRUE';
  params('AQ_HA_NOTIFICATIONS') := 'TRUE';
  dbms_service.create_service(service_name => 'pdb1_tac', network_name => 'pdb1_tac', parameter_array => params);
END;
/
 
PL/SQL PROCEDURE successfully completed.
 
SQL> EXEC dbms_service.start_service(service_name => 'pdb1_tac');
 
PL/SQL PROCEDURE successfully completed.

I also create another service that is clearly not TAC compatible with:

SQL> ALTER SESSION SET container=pdb1;
 
SESSION altered.
 
SQL> EXEC dbms_service.create_service(service_name => 'pdb1_notac', network_name => 'pdb1_notac');
 
PL/SQL PROCEDURE successfully completed.
 
SQL> EXEC dbms_service.start_service(service_name => 'pdb1_notac');
 
PL/SQL PROCEDURE successfully completed.

To end up with, the PDB1 service in uppercase is automatically created by the pluggable database itself:

SQL> SET lines 200
SQL> col name FOR a15
SQL> SELECT name, failover_type, failover_retries, failover_delay, failover_restore FROM dba_services;
 
NAME            FAILOVER_TYPE                                                    FAILOVER_RETRIES FAILOVER_DELAY FAILOV
--------------- ---------------------------------------------------------------- ---------------- -------------- ------
PDB1
pdb1_tac        AUTO                                                                            3             10 AUTO
pdb1_notac                                                                                                       NONE

To test it modify tnsnames.ora of one of your database server Oracle Home and add the following entries (file located in /u01/app/oracle/homes/OraDB21Home1/network/admin or C:\Program Files\instantclient_23_5\network\admin for the case of Instant Client for me) for pdb1_tac and pdb1_no_tac services:

pdb1_tac =
  (DESCRIPTION =
    (CONNECT_TIMEOUT = 90)(RETRY_COUNT = 50)(RETRY_DELAY = 3)(TRANSPORT_CONNECT_TIMEOUT = 3 sec)
    (ADDRESS_LIST =
      (LOAD_BALANCE = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = server01.domain.com)(PORT = 1531)))
    (ADDRESS_LIST =
      (LOAD_BALANCE = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = server01.domain.com)(PORT = 1541)))
    (CONNECT_DATA=(SERVICE_NAME = pdb1_tac)))
 
pdb1_notac =
  (DESCRIPTION =
    (CONNECT_TIMEOUT = 90)(RETRY_COUNT = 50)(RETRY_DELAY = 3)(TRANSPORT_CONNECT_TIMEOUT = 3 sec)
    (ADDRESS_LIST =
      (LOAD_BALANCE = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = server01.domain.com)(PORT = 1531)))
    (ADDRESS_LIST =
      (LOAD_BALANCE = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = server01.domain.com)(PORT = 1541)))
    (CONNECT_DATA=(SERVICE_NAME = pdb1_notac)))

Ensure it works with:

[oracle@server01admin]$ tnsping pdb1_tac
 
TNS Ping Utility for Linux: Version 21.0.0.0.0 - Production on 02-SEP-2024 11:35:26
 
Copyright (c) 1997, 2021, Oracle.  All rights reserved.
 
Used parameter files:
/u01/app/oracle/homes/OraDB21Home1/network/admin/sqlnet.ora
 
 
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (CONNECT_TIMEOUT= 90)(RETRY_COUNT=50)(RETRY_DELAY=3)(TRANSPORT_CONNECT_TIMEOUT=3) (ADDRESS_LIST = (LOAD_BALANCE=on)
(ADDRESS = (PROTOCOL = TCP)(HOST=server01.domain.com)(PORT=1531))) (ADDRESS_LIST = (LOAD_BALANCE=on) (ADDRESS = (PROTOCOL = TCP)(HOST=server01.domain.com)(PORT=1541)))
(CONNECT_DATA=(SERVICE_NAME = pdb1_tac)))
OK (0 msec)

For testing purpose I also create a test table in my account:

SQL> CREATE TABLE yjaquier.test01(id NUMBER, descr VARCHAR2(20));
 
TABLE created.
 
SQL> INSERT INTO yjaquier.test01 VALUES(1, 'One');
 
1 ROW created.
 
SQL> COMMIT;
 
COMMIT complete.

Transparent Application Continuity (TAC) testing with SQL*Plus

TAC compatible service

I have initially tried to use the more modern SQL client SQLcl but I have not been able to initiate a failover compatible session even using something like (you have to shrink your TNS entry to fit it in a single line and run sqlcl has below):

PS C:\Program Files\sqlcl\bin> ./sql.exe yjaquier/secure_password@"(DESCRIPTION=(CONNECT_TIMEOUT = 90)(RETRY_COUNT = 50)(RETRY_DELAY = 3)(TRANSPORT_CONNECT_TIMEOUT = 3)(ADDRESS_LIST =  (LOAD_BALANCE = ON)
>> (ADDRESS = (PROTOCOL = TCP)(HOST = server01.domain.com)(PORT = 1531)))(ADDRESS_LIST =  (LOAD_BALANCE = ON)  (ADDRESS = (PROTOCOL = TCP)(HOST = server01.domain.com)(PORT = 1541)))(CONNECT_DATA=(SERVICE_NAME = pdb1_tac)))"
Picked up JAVA_TOOL_OPTIONS: -Duser.language=en
 
 
SQLcl: Release 24.2 Production on Tue Sep 03 11:59:53 2024
 
Copyright (c) 1982, 2024, Oracle.  All rights reserved.
 
Last Successful login time: Tue Sep 03 2024 11:59:54 +02:00
 
Connected to:
Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
Version 21.15.0.0.0
 
SQL> select
       module,
       service_name,
       to_char(logon_time, 'dd-mon-yyyy hh24:mi:ss') as logon_time,
       failover_type,
       failover_method,
       failed_over
     from v$session
     where username = 'YJAQUIER'
     order by osuser;
 
MODULE           SERVICE_NAME    LOGON_TIME              FAILOVER_TYPE    FAILOVER_METHOD    FAILED_OVER
________________ _______________ _______________________ ________________ __________________ ______________
SQLcl            pdb1_tac        03-sep-2024 11:59:54    NONE             NONE               NO

And yes checking in TAC supported client SQLcl is not listed… Same is also true apparently for SQL*Developer and there is a bug that has been filled for this (Transparent Application Continuity(TAC) Is Not Successful For SQL Developer Connections On Node Failover (Doc ID 2859802.1)).

So I have installed Oracle Instant Client 23.5 with the SQL*Plus package on my Windows desktop. Created a tnsnames.ora file in network/admin directory and added the above TNS entry and now the session is failover compatible:

The initial state is this one, notice the usage of pdb1_tac service that we created above:

PS C:\Program Files\instantclient_23_5> ./sqlplus.exe yjaquier/secure_password@pdb1_tac
 
SQL*Plus: RELEASE 23.0.0.0.0 - Production ON Tue Sep 3 12:57:14 2024
Version 23.5.0.24.07
 
Copyright (c) 1982, 2024, Oracle.  ALL rights reserved.
 
LAST SUCCESSFUL login TIME: Tue Sep 03 2024 12:01:37 +02:00
 
Connected TO:
Oracle DATABASE 21c Enterprise Edition RELEASE 21.0.0.0.0 - Production
Version 21.15.0.0.0
 
SQL> SET lines 200
SQL> SELECT
       module,
       service_name,
       TO_CHAR(logon_time, 'dd-mon-yyyy hh24:mi:ss') AS logon_time,
       failover_type,
       failover_method,
       failed_over
     FROM v$session
     WHERE username = 'YJAQUIER'
     ORDER BY osuser;
 
MODULE                                                           SERVICE_NAME                                                     LOGON_TIME                    FAILOVER_TYPE FAILOVER_M FAI
---------------------------------------------------------------- ---------------------------------------------------------------- ----------------------------- ------------- ---------- ---
SQL*Plus                                                         pdb1_tac                                                         03-sep-2024 14:38:14          AUTO          BASIC      NO
 
SQL> SELECT SYS_CONTEXT('USERENV','INSTANCE_NAME') AS instance FROM dual;
 
INSTANCE
--------------------------------------------------------------------------------
orcl1

Then I insert a row in my test table (I put on the time to show that commit is almost taking 30 seconds):

SQL> SET TIME ON
12:27:01 SQL> SELECT * FROM yjaquier.test01 ORDER BY id;
 
        ID DESCR
---------- --------------------
         1 One
 
12:27:05 SQL> INSERT INTO yjaquier.test01 VALUES(2, 'Two');
 
1 ROW created.

Before committing I initiate a switchover to other instance (orcl2) with “switchover to pluggable database pdb1 at orcl2” command and in meanwhile I commit in my session:

12:27:22 SQL> COMMIT;
 
COMMIT complete.
 
12:27:52 SQL> SELECT SYS_CONTEXT('USERENV','INSTANCE_NAME') AS instance FROM dual;
 
INSTANCE
--------------------------------------------------------------------------------
orcl2
 
12:29:45 SQL> SELECT
                module,
                service_name,
                TO_CHAR(logon_time, 'dd-mon-yyyy hh24:mi:ss') AS logon_time,
                failover_type,
                failover_method,
                failed_over
              FROM v$session
              WHERE username = 'YJAQUIER'
              ORDER BY osuser;
 
MODULE                                                           SERVICE_NAME                                                     LOGON_TIME                    FAILOVER_TYPE FAILOVER_M FAI
---------------------------------------------------------------- ---------------------------------------------------------------- ----------------------------- ------------- ---------- ---
SQL*Plus                                                         pdb1_tac                                                         24-sep-2024 12:27:51          AUTO          BASIC      YES

At the end what do we see ? Nothing ! And this is where the magic apply ! We notice that the commit has taken a fair amount of time while the pluggable database was changing role to the target instance but I did not encounter any error and the pluggable database switchover is transparent for the application ! We also see that the failed_over column of v$session view has changed its state to YES while initially it was NO.

If I redo the same exercise with pdb1_notac service and trying to switchover back my pluggable database to orcl1 instance the exercise is not going very far:

PS C:\Program Files\instantclient_23_5> ./sqlplus.exe yjaquier/secure_password@pdb1_notac
 
SQL*Plus: RELEASE 23.0.0.0.0 - Production ON Tue Sep 24 12:37:18 2024
Version 23.5.0.24.07
 
Copyright (c) 1982, 2024, Oracle.  ALL rights reserved.
 
LAST SUCCESSFUL login TIME: Tue Sep 24 2024 12:27:52 +02:00
 
Connected TO:
Oracle DATABASE 21c Enterprise Edition RELEASE 21.0.0.0.0 - Production
Version 21.15.0.0.0
 
SQL> SELECT SYS_CONTEXT('USERENV','INSTANCE_NAME') AS instance FROM dual;
 
INSTANCE
--------------------------------------------------------------------------------
orcl2
 
SQL> SELECT * FROM yjaquier.test01 ORDER BY id;
 
        ID DESCR
---------- --------------------
         1 One
         2 Two
 
SQL> SET lines 200
SQL> SELECT
  2                  module,
  3                  service_name,
  4                  TO_CHAR(logon_time, 'dd-mon-yyyy hh24:mi:ss') AS logon_time,
  5                  failover_type,
  6                  failover_method,
  7                  failed_over
  8                FROM v$session
  9                WHERE username = 'YJAQUIER'
 10                ORDER BY osuser;
 
MODULE                                                           SERVICE_NAME                                                     LOGON_TIME                    FAILOVER_TYPE FAILOVER_M FAI
---------------------------------------------------------------- ---------------------------------------------------------------- ----------------------------- ------------- ---------- ---
SQL*Plus                                                         pdb1_notac                                                       24-sep-2024 12:37:18          NONE          NONE       NO
 
SQL> INSERT INTO yjaquier.test01 VALUES(3, 'Three');
 
1 ROW created.
 
SQL> COMMIT;
COMMIT
     *
ERROR AT line 1:
ORA-03113: end-of-FILE ON communication channel
Process ID: 621685
SESSION ID: 399 Serial NUMBER: 20289
Help: https://docs.oracle.com/error-help/db/ora-03113/

Transparent Application Continuity (TAC) testing with Java

To develop this small application I have used IntelliJ IDEA 2024.2.2 (Community Edition). In your project remember to add the Oracle JDBC driver. I have decided to use the latest 23ai one so I added ojdbc11.jar library to my project:

tac02
tac02

The source code is the following. I have deliberately used a classic connection and not one coming from a pool to really see the difference between TAC and AC::

package tac01;
 
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Properties;
import java.sql.DatabaseMetaData;
import oracle.jdbc.pool.OracleDataSource;
import oracle.jdbc.OracleConnection;
import oracle.jdbc.replay.ReplayStatistics;
import oracle.jdbc.replay.ReplayableConnection;
 
public class tac01 {
  final static String DB_URL= "jdbc:oracle:thin:@(DESCRIPTION = (CONNECT_TIMEOUT = 90)(RETRY_COUNT = 50)(RETRY_DELAY = 3)(TRANSPORT_CONNECT_TIMEOUT = 3 sec)" +
          "(ADDRESS_LIST = (LOAD_BALANCE = ON) (ADDRESS = (PROTOCOL = TCP)(HOST = server01.domain.com)(PORT = 1531)))(ADDRESS_LIST = (LOAD_BALANCE = ON)" +
          "(ADDRESS = (PROTOCOL = TCP)(HOST = server01.domain.com)(PORT = 1541)))(CONNECT_DATA=(SERVICE_NAME = pdb1_tac)))";
  final static String DB_USER = "yjaquier";
  final static String DB_PASSWORD = "secure_password";
 
  private static void display_replay_statistics(ReplayStatistics rs) {
    System.out.println("\nFailedReplayCount="+rs.getFailedReplayCount());
    System.out.println("ReplayDisablingCount="+rs.getReplayDisablingCount());
    System.out.println("SuccessfulReplayCount="+rs.getSuccessfulReplayCount());
    System.out.println("TotalCalls="+rs.getTotalCalls());
    System.out.println("TotalCallsAffectedByOutages="+rs.getTotalCallsAffectedByOutages());
    System.out.println("TotalCallsAffectedByOutagesDuringReplay="+ rs.getTotalCallsAffectedByOutagesDuringReplay());
    System.out.println("TotalCallsTriggeringReplay="+rs.getTotalCallsTriggeringReplay());
    System.out.println("TotalCompletedRequests="+rs.getTotalCompletedRequests());
    System.out.println("TotalProtectedCalls="+rs.getTotalProtectedCalls());
    System.out.println("TotalReplayAttempts="+rs.getTotalReplayAttempts());
    System.out.println("TotalRequests="+rs.getTotalRequests() + "\n");
    }
  public static void main(String[] args) throws SQLException {
    OracleDataSource ods = new OracleDataSource();
    Properties info = new Properties();
 
    Connection connection1 = null;
    Statement statement1 = null;
    ResultSet resultset1 = null;
 
    info.put(OracleConnection.CONNECTION_PROPERTY_USER_NAME, DB_USER);
    info.put(OracleConnection.CONNECTION_PROPERTY_PASSWORD, DB_PASSWORD);
    info.put(OracleConnection.CONNECTION_PROPERTY_DEFAULT_ROW_PREFETCH, "20");
 
    ods.setURL(DB_URL);
    ods.setConnectionProperties(info);
 
    connection1 = ods.getConnection();
 
    DatabaseMetaData dbmd = connection1.getMetaData();
    System.out.println("Driver Name: " + dbmd.getDriverName());
    System.out.println("Driver Version: " + dbmd.getDriverVersion());
 
    statement1 = connection1.createStatement();
    resultset1 = statement1.executeQuery("select sys_context('USERENV','INSTANCE_NAME') as instance from dual");
    while (resultset1.next()) {
      System.out.println("Working on instance " + resultset1.getString(1));
    }
 
    // To start fresh and to avoid ORA-41412: results changed during replay; failover cannot continue
    ((oracle.jdbc.replay.ReplayableConnection) connection1).endRequest(); // Explicit request end
    try {
      ((oracle.jdbc.replay.ReplayableConnection) connection1).beginRequest(); // Explicit request begin
      connection1.setAutoCommit(false);
      statement1 = connection1.createStatement();
 
      // Inserting one row in my test table with no commit
      System.out.println("\nInserting one row in my test table with no commit...");
      resultset1 = statement1.executeQuery("insert into yjaquier.test01 values(2, 'Two')");
 
      System.out.println("\nYou have now 30 seconds to initiate the switchover of your pluggable test database...");
      Thread.sleep(30000);
 
      System.out.println("\nNow trying to commit at " + LocalDateTime.now().format(DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm:ss")));
      connection1.commit();
      System.out.println("\nCommit executed it is now " + LocalDateTime.now().format(DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm:ss")));
      ((oracle.jdbc.replay.ReplayableConnection) connection1).endRequest(); // Explicit request end
    }
    catch (Exception e)
    {
      //Transaction is not recoverable ?
      System.out.println("Exception detected:");
      e.printStackTrace();
      display_replay_statistics(((oracle.jdbc.replay.ReplayableConnection) connection1).getReplayStatistics(ReplayableConnection.StatisticsReportType.FOR_CURRENT_CONNECTION));
      e = ((SQLException) e).getNextException();
      e.printStackTrace();
      resultset1.close();
      statement1.close();
      connection1.close();
      System.exit(1);
    }
    // Transaction has been recovered
    statement1 = connection1.createStatement();
    resultset1 = statement1.executeQuery("select sys_context('USERENV','INSTANCE_NAME') as instance from dual");
    while (resultset1.next()) {
      System.out.println("\nWorking on instance " + resultset1.getString(1));
    }
    display_replay_statistics(((oracle.jdbc.replay.ReplayableConnection) connection1).getReplayStatistics(ReplayableConnection.StatisticsReportType.FOR_CURRENT_CONNECTION));
    connection1.close();
    resultset1.close();
    statement1.close();
    System.out.println("\nNormal exit");
  }
}

And the execution workflow is the same as the one using SQL*Plus:

tac03
tac03

As you can see I start with my pluggable database running on orcl1 instance. Then I insert a row and pausing my application for 30 seconds to allow time to perform a pluggable database switchover (switchover to pluggable database pdb1 at orcl2). Then after I commit and absolutely no exception. At the end I display connection statistics and we can see the number of replayable connection equal to 1.

Errors encountered

Service not automatically started

I noticed when switching over the pluggable database that TAC service was not automatically started and when I started it manually (not production ready obviously) with:

SQL> EXEC dbms_service.start_service('pdb1_tac');
 
PL/SQL PROCEDURE successfully completed.

I got below error because I did it too slowly:

SQL> COMMIT;
 
ERROR:
ORA-25415: Application Continuity replay initiation timeout exceeded.

Ensure service is automatically started, in due time, with:

SQL> ALTER pluggable DATABASE pdb1 save state;
 
Pluggable DATABASE altered.
 
SQL> SET lines 200
SQL> col con_name FOR a15
SQL> col instance_name FOR a15
SQL> SELECT * FROM dba_pdb_saved_states;
 
    CON_ID CON_NAME        INSTANCE_NAME      CON_UID GUID                             STATE          RES
---------- --------------- --------------- ---------- -------------------------------- -------------- ---
         3 PDB1            orcl1           1596617114 22CB7A846E56C8C3E063402B4B0A72C2 OPEN           NO

The very bad news is that you have to issue this command on both container instance (orcl1 and orcl2 for me) to have the service automatically started when switching over or failing over. Quite annoying. Maybe a more elegant solution is to create an “after startup on database” trigger like describe in My Oracle Support note 1429223.1 (see references section).

Remark:
Strangely the row in dba_pdb_saved_states disappears once the database has been added to Data Guard Broker configuration…

References

About Post Author

Share the knowledge!

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>