Table of contents
Preamble
We have a SAP WM instance with an infrastructure that is partially following the Oracle Maximum Availability Architecture (MAA). In our primary data center the configuration is made of two servers and a RAC deployment. We have also setup a Data Guard configuration (physical standby) in our disaster recovery site (around 10 Km far from main site). An enterprise architect teammate asked me the lag between the master RAC database and the physical standby one: simple and fair question isn’t it ?
The first annoying thing is that Data Guard Broker has not been configured so I will use only SQL commands but reaching conclusion with DG Broker is much simpler…
If you are here you know what is Data Guard, the official Oracle high level picture is below one. Dta Guard is not more than transferring redo from primary site to secondary site:
Data Guard apply lag
To measure the lag between primary and standby site I have used two methods with obvious idea of having same result. First method is using V$DATAGUARD_STATS view that is accessible even in mount state so on standby database:
SQL> SELECT * FROM v$dataguard_stats; NAME VALUE UNIT TIME_COMPUTED DATUM_TIME -------------------------------- ---------------------------------------------------------------- ------------------------------ ------------------------------ ------------------------------ transport LAG +00 00:00:00 DAY(2) TO SECOND(0) INTERVAL 05/02/2014 15:23:29 05/02/2014 15:23:28 apply LAG +00 00:22:55 DAY(2) TO SECOND(0) INTERVAL 05/02/2014 15:23:29 05/02/2014 15:23:28 apply finish TIME +00 00:00:02.764 DAY(2) TO SECOND(3) INTERVAL 05/02/2014 15:23:29 estimated startup TIME 17 SECOND 05/02/2014 15:23:29 |
Second method is using Oracle System Change Number (SCN), on standby database:
SQL> SELECT current_scn FROM v$database; CURRENT_SCN ----------- 6587402685 |
On primary database:
SQL> SELECT current_scn FROM v$database; CURRENT_SCN ----------- 6587738190 |
Then convert these two SCNs in date on primary database (query with SCN_TO_TIMESTAMP not possible on physical standby):
SQL> SELECT SCN_TO_TIMESTAMP(6587831530), SCN_TO_TIMESTAMP(6587402685) FROM dual; SCN_TO_TIMESTAMP(6587831530) SCN_TO_TIMESTAMP(6587402685) --------------------------------------------------------------------------- --------------------------------------------------------------------------- 02-MAY-14 03.23.33.000000000 PM 02-MAY-14 03.00.31.000000000 PM |
The two methods are showing same (bad) result (the small difference is explained by the time to do the different queries). So a bit disappointed as the lag is huge (23 minutes) and I have even seen it going up to more than 1 hour…
Remark:
The V$STANDBY_EVENT_HISTOGRAM view is empty if you are not using Active Data Guard paid option that allows you to perform read only query on your physical standby database. This is expected behavior…
My configuration is following best practices:
log_archive_config string DG_CONFIG=(WAL,WAL_standby) log_archive_dest_1 string LOCATION=+WAL_ARCH log_archive_dest_2 string SERVICE=WAL_standby LGWR ASYNC AFFIRM VALID_FOR=(ONLINE_LOGF ILES,PRIMARY_ROLE) DB_UNIQUE_N AME=WAL_standby delay=0 OPTION AL max_failure=15 reopen=300 r egister |
Remark:
LGWR is now deprecated and should be removed
I have standby redo log groups on primary database:
SQL> col MEMBER FOR a50 SQL> SELECT * FROM v$logfile ORDER BY TYPE,GROUP#; GROUP# STATUS TYPE MEMBER IS_ ---------- ------- ------- -------------------------------------------------- --- 11 ONLINE +WAL_OLOG/wal/onlinelog/group_11.257.810042773 NO 11 ONLINE +WAL_MLOG/wal/onlinelog/group_11.257.810042775 NO 12 ONLINE +WAL_OLOG/wal/onlinelog/group_12.258.810042775 NO 12 ONLINE +WAL_MLOG/wal/onlinelog/group_12.258.810042777 NO 13 ONLINE +WAL_OLOG/wal/onlinelog/group_13.259.810042779 NO 13 ONLINE +WAL_MLOG/wal/onlinelog/group_13.259.810042779 NO 14 ONLINE +WAL_OLOG/wal/onlinelog/group_14.260.810042781 NO 14 ONLINE +WAL_MLOG/wal/onlinelog/group_14.260.810042783 NO 21 ONLINE +WAL_OLOG/wal/onlinelog/group_21.261.810042803 NO 21 ONLINE +WAL_MLOG/wal/onlinelog/group_21.261.810042805 NO 22 ONLINE +WAL_OLOG/wal/onlinelog/group_22.262.810042807 NO 22 ONLINE +WAL_MLOG/wal/onlinelog/group_22.262.810042807 NO 23 ONLINE +WAL_OLOG/wal/onlinelog/group_23.263.810042809 NO 23 ONLINE +WAL_MLOG/wal/onlinelog/group_23.263.810042811 NO 24 ONLINE +WAL_OLOG/wal/onlinelog/group_24.264.810042813 NO 24 ONLINE +WAL_MLOG/wal/onlinelog/group_24.264.810042813 NO 71 STANDBY +WAL_OLOG/wal/onlinelog/group_71.265.844256955 NO 71 STANDBY +WAL_MLOG/wal/onlinelog/group_71.265.844256957 NO 72 STANDBY +WAL_OLOG/wal/onlinelog/group_72.266.844256957 NO 72 STANDBY +WAL_MLOG/wal/onlinelog/group_72.266.844256959 NO 73 STANDBY +WAL_OLOG/wal/onlinelog/group_73.267.844256961 NO 73 STANDBY +WAL_MLOG/wal/onlinelog/group_73.267.844256963 NO 74 STANDBY +WAL_OLOG/wal/onlinelog/group_74.268.844256965 NO 74 STANDBY +WAL_MLOG/wal/onlinelog/group_74.268.844256965 NO 81 STANDBY +WAL_OLOG/wal/onlinelog/group_81.269.844256969 NO 81 STANDBY +WAL_MLOG/wal/onlinelog/group_81.269.844256971 NO 82 STANDBY +WAL_OLOG/wal/onlinelog/group_82.270.844256973 NO 82 STANDBY +WAL_MLOG/wal/onlinelog/group_82.270.844256975 NO 83 STANDBY +WAL_OLOG/wal/onlinelog/group_83.271.844256975 NO 83 STANDBY +WAL_MLOG/wal/onlinelog/group_83.271.844256977 NO 84 STANDBY +WAL_OLOG/wal/onlinelog/group_84.272.844256979 NO 84 STANDBY +WAL_MLOG/wal/onlinelog/group_84.272.844256981 NO 32 ROWS selected. |
Started to investigate on redo generation:
SQL> ALTER SESSION SET nls_date_format='dd-mon-yyyy hh24:mi:ss'; SESSION altered. SQL> SET lines 200 SQL> SET pages 100 SQL> col metric_unit FOR a20 SQL> col metric_name FOR a25 SQL> SELECT * FROM (SELECT * FROM DBA_HIST_SYSMETRIC_HISTORY WHERE metric_name='Redo Generated Per Sec' ORDER BY snap_id DESC) WHERE rownum<=10; SNAP_ID DBID INSTANCE_NUMBER BEGIN_TIME END_TIME INTSIZE GROUP_ID METRIC_ID METRIC_NAME VALUE METRIC_UNIT ---------- ---------- --------------- -------------------------- -------------------------- ---------- ---------- ---------- ------------------------- ---------- -------------------- 9947 2087071167 2 02-may-2014 14:00:35 02-may-2014 14:01:35 6017 2 2016 Redo Generated Per Sec 344912.814 Bytes Per SECOND 9947 2087071167 2 02-may-2014 14:09:35 02-may-2014 14:10:35 6017 2 2016 Redo Generated Per Sec 185112.049 Bytes Per SECOND 9947 2087071167 2 02-may-2014 14:08:35 02-may-2014 14:09:35 6018 2 2016 Redo Generated Per Sec 131244.799 Bytes Per SECOND 9947 2087071167 2 02-may-2014 14:07:35 02-may-2014 14:08:35 5917 2 2016 Redo Generated Per Sec 281785.567 Bytes Per SECOND 9947 2087071167 2 02-may-2014 14:06:35 02-may-2014 14:07:35 6018 2 2016 Redo Generated Per Sec 153441.941 Bytes Per SECOND 9947 2087071167 2 02-may-2014 14:05:35 02-may-2014 14:06:35 6018 2 2016 Redo Generated Per Sec 154841.542 Bytes Per SECOND 9947 2087071167 2 02-may-2014 14:04:35 02-may-2014 14:05:35 6018 2 2016 Redo Generated Per Sec 105532.802 Bytes Per SECOND 9947 2087071167 2 02-may-2014 14:03:35 02-may-2014 14:04:35 6017 2 2016 Redo Generated Per Sec 217653.582 Bytes Per SECOND 9947 2087071167 2 02-may-2014 14:02:35 02-may-2014 14:03:35 6017 2 2016 Redo Generated Per Sec 251052.285 Bytes Per SECOND 9947 2087071167 2 02-may-2014 14:01:35 02-may-2014 14:02:35 5917 2 2016 Redo Generated Per Sec 209524.624 Bytes Per SECOND 10 ROWS selected. |
I’m not generating huge amount of redo log information (no mass update currently running) as I’m around 200-300 KB per second and the transport lag is not showing any gap…
Finally had a look to V$ARCHIVE_DEST_STATUS table and have seen, on primary:
SQL> SELECT dest_id,status,database_mode,recovery_mode FROM v$archive_dest_status WHERE status <> 'INACTIVE'; DEST_ID STATUS DATABASE_MODE RECOVERY_MODE ---------- --------- --------------- ----------------------- 1 VALID OPEN IDLE 2 VALID MOUNTED-STANDBY MANAGED |
On standby:
SQL> SELECT dest_id,status,database_mode,recovery_mode FROM v$archive_dest_status WHERE status <> 'INACTIVE'; DEST_ID STATUS DATABASE_MODE RECOVERY_MODE ---------- --------- --------------- ----------------------- 1 VALID MOUNTED-STANDBY MANAGED 2 VALID UNKNOWN IDLE 32 VALID UNKNOWN IDLE |
So in short all is ready for real time redo log apply but the feature has not been activated. So a final command is expected:
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE; |
References
- Data Guard Physical Standby – Configuration Health Check (Doc ID 1581388.1)
- Data Guard transport lag in OEM 12c
- Data Guard Real-Time Apply FAQ (Doc ID 828274.1)
R RAVEENDRA says:
How we can apply the lag lag is huge (59 minutes)
Colin Dawson says:
Interesting article.
Following a SR query Oracle came back with the number of standby logs should be redo logs+1
Yannick Jaquier says:
Thanks for sharing !