Data Recovery Advisor (DRA) and Health Monitor by examples

Preamble

Have you ever sweat when having a restore/recover to perform ? If you are a production DBA surely yes, this is always a tricky situation and having something to support you during this phase is more than welcome. Oracle 11g brings a new tool for DBA called Data Recovery Advisor (DRA).

The tool is available in Database Control/Grid Control as well as command line. I will mainly use command line as who can do more can do less…

Data Recovery Advisor that can be seconded by health monitor to have proactive tests. It could avoid users calling you to notify the error which is comfortable for ones aiming to do things pro actively…

Test has be done on Red Hat Enterprise Linux Server release 5.5 (Tikanga) and Oracle database 11.2.0.2.0.

Missing datafile

To test deletion of a datafile let’s first create a test table and tablespace with something like:

SQL> CREATE smallfile TABLESPACE testtbs datafile '/oracle/data01/test/testtbs01.dbf' SIZE 100m reuse logging
     extent management local SEGMENT SPACE management auto;
 
TABLESPACE created.
 
SQL> CREATE TABLE yjaquier.test1 TABLESPACE testtbs AS SELECT object_id FROM dba_objects;
 
TABLE created.
 
SQL> SELECT * FROM yjaquier.test1 WHERE rownum<=5;
 
 OBJECT_ID
----------
        20
        46
        28
        15
        29

Then perform a full database backup with RMAN following script:

run
{
backup database include current controlfile tag 'FULL_BACKUP';
backup archivelog all delete all input tag 'FULL_BACKUP';
}

Delete the datafile behind the TESTTBS tablespace we just created:

[oracle@server1 ~]# rm /oracle/data01/test/testtbs01.dbf

And select your test table:

SQL> SELECT COUNT(*) FROM yjaquier.test1;
 
  COUNT(*)
----------
     60820
 
SQL> ALTER SYSTEM flush buffer_cache;
 
SYSTEM altered.
 
SQL> SELECT COUNT(*) FROM yjaquier.test1;
SELECT COUNT(*) FROM yjaquier.test1
*
ERROR AT line 1:
ORA-01116: error IN opening DATABASE FILE 6
ORA-01110: data FILE 6: '/oracle/data01/test/testtbs01.dbf'
ORA-27041: unable TO OPEN FILE
Linux-x86_64 Error: 2: No such FILE OR DIRECTORY
Additional information: 3

Remark:
You may still not see it at Oracle level but if you check alert log file you will see that Oracle with reactive health monitor has detected the error:

2011-07-22 10:20:03.754000 +02:00
Checker run found 1 new persistent data failures
2011-07-22 10:21:59.685000 +02:00
Errors in file /oracle/dump/test/diag/rdbms/test/test/trace/test_m000_9996.trc:
ORA-01116: error in opening database file 6
ORA-01110: data file 6: '/oracle/data01/test/testtbs01.dbf'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

Remark:
The flush of buffer cache may help you to see the error in your SQL session but nothing guaranteed. On Unix/Linux when you use a file the OS allocates inodes (stored in inode tables) and does not directly link to filename. Means that when you delete a file, inodes are still there and only the link to that file is broken. You can then recover data if file is still open using pseudo filesystem /proc (see below).

If you go to RMAN you see the failure listed:

RMAN> list failure;
 
List of Database Failures
=========================
 
Failure ID Priority Status    Time Detected        Summary
---------- -------- --------- -------------------- -------
67327      HIGH     OPEN      22-jul-2011 10:20:03 One or more non-system datafiles are missing

Ask RMAN to suggest you what to do with:

RMAN> advise failure;
 
List of Database Failures
=========================
 
Failure ID Priority Status    Time Detected        Summary
---------- -------- --------- -------------------- -------
67327      HIGH     OPEN      22-jul-2011 10:20:03 One or more non-system datafiles are missing
 
analyzing automatic repair options; this may take some time
using channel ORA_DISK_1
analyzing automatic repair options complete
 
Mandatory Manual Actions
========================
no manual actions available
 
Optional Manual Actions
=======================
1. If file /oracle/data01/test/testtbs01.dbf was unintentionally renamed or moved, restore it
 
Automated Repair Options
========================
Option Repair Description
------ ------------------
1      Restore and recover datafile 6
  Strategy: The repair includes complete media recovery with no data loss
  Repair script: /oracle/dump/test/diag/rdbms/test/test/hm/reco_2213659725.hm

In our case the file was not renamed or moved so let’s have a look to the repair script:

RMAN> host 'cat /oracle/dump/test/diag/rdbms/test/test/hm/reco_2213659725.hm';
 
   # restore and recover datafile
   sql 'alter database datafile 6 offline';
   restore datafile 6;
   recover datafile 6;
   sql 'alter database datafile 6 online';
host command complete

You can know use this script, as it is or not, to repair the error and manually close the failure or directly use REPAIR FAILURE that will perform exactly what’s suggested and automatically close the failure (in same RMAN session as LIST FAILURE command).

RMAN> repair failure;
 
Strategy: The repair includes complete media recovery with no data loss
Repair script: /oracle/dump/test/diag/rdbms/test/test/hm/reco_2213659725.hm
 
contents of repair script:
   # restore and recover datafile
   sql 'alter database datafile 6 offline';
   restore datafile 6;
   recover datafile 6;
   sql 'alter database datafile 6 online';
 
Do you really want to execute the above repair (enter YES or NO)? YES
executing repair script
 
sql statement: alter database datafile 6 offline
 
Starting restore at 22-jul-2011 12:46:44
using channel ORA_DISK_1
 
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00006 to /oracle/data01/test/testtbs01.dbf
channel ORA_DISK_1: reading from backup piece /oracle/fast_recovery_area/TEST/backupset/2011_07_21/o1_mf_nnndf_20110721_175312_72jlw3vm_.bkp
channel ORA_DISK_1: piece handle=/oracle/fast_recovery_area/TEST/backupset/2011_07_21/o1_mf_nnndf_20110721_175312_72jlw3vm_.bkp tag=20110721_175312
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07
Finished restore at 22-jul-2011 12:46:51
 
Starting recover at 22-jul-2011 12:46:51
using channel ORA_DISK_1
 
starting media recovery
 
archived log for thread 1 with sequence 13 is already on disk as file /oracle/fast_recovery_area/TEST/archivelog/2011_07_21/o1_mf_1_13_72k1byoz_.arc
archived log for thread 1 with sequence 14 is already on disk as file /oracle/fast_recovery_area/TEST/archivelog/2011_07_21/o1_mf_1_14_72k1f72m_.arc
archived log for thread 1 with sequence 15 is already on disk as file /oracle/fast_recovery_area/TEST/archivelog/2011_07_22/o1_mf_1_15_72kbc96g_.arc
archived log for thread 1 with sequence 16 is already on disk as file /oracle/fast_recovery_area/TEST/archivelog/2011_07_22/o1_mf_1_16_72kszc40_.arc
archived log for thread 1 with sequence 17 is already on disk as file /oracle/fast_recovery_area/TEST/archivelog/2011_07_22/o1_mf_1_17_72lcjmm1_.arc
channel ORA_DISK_1: starting archived log restore to default destination
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=12
channel ORA_DISK_1: reading from backup piece /oracle/fast_recovery_area/TEST/backupset/2011_07_21/o1_mf_annnn_20110721_175312_72jmky9m_.bkp
channel ORA_DISK_1: piece handle=/oracle/fast_recovery_area/TEST/backupset/2011_07_21/o1_mf_annnn_20110721_175312_72jmky9m_.bkp tag=20110721_175312
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:03
archived log file name=/oracle/fast_recovery_area/TEST/archivelog/2011_07_22/o1_mf_1_12_72lo8xcv_.arc thread=1 sequence=12
channel default: deleting archived log(s)
archived log file name=/oracle/fast_recovery_area/TEST/archivelog/2011_07_22/o1_mf_1_12_72lo8xcv_.arc RECID=966 STAMP=757169214
archived log file name=/oracle/fast_recovery_area/TEST/archivelog/2011_07_21/o1_mf_1_13_72k1byoz_.arc thread=1 sequence=13
archived log file name=/oracle/fast_recovery_area/TEST/archivelog/2011_07_21/o1_mf_1_14_72k1f72m_.arc thread=1 sequence=14
archived log file name=/oracle/fast_recovery_area/TEST/archivelog/2011_07_22/o1_mf_1_15_72kbc96g_.arc thread=1 sequence=15
media recovery complete, elapsed time: 00:00:04
Finished recover at 22-jul-2011 12:47:00
 
sql statement: alter database datafile 6 online
repair failure complete

And your test table is back to life:

SQL> SELECT * FROM yjaquier.test1 WHERE rownum<=5;
 
 OBJECT_ID
----------
        20
        46
        28
        15
        29

Missing control files

Should works the same as missing datafiles but here it’s a bit different. Delete all your controlfiles (unless a copy of a remaining one when database is down solve the issue) and issue a switch logfile to generate a write in controlfiles. Then you expect to see an error in RMAN but nothing even if VALIDATE DATABASE see that controlfiles are missing:

RMAN> list failure;
 
no failures found that match specification
 
RMAN> validate database;
 
Starting validate at 25-JUL-11
using channel ORA_DISK_1
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
input datafile file number=00004 name=/oracle/data01/test/users01.dbf
input datafile file number=00003 name=/oracle/rbs/test/undotbs01.dbf
input datafile file number=00001 name=/oracle/sys/test/system01.dbf
input datafile file number=00002 name=/oracle/sys/test/sysaux01.dbf
input datafile file number=00005 name=/oracle/data01/test/users02.dbf
input datafile file number=00006 name=/oracle/data01/test/testtbs01.dbf
channel ORA_DISK_1: validation complete, elapsed time: 00:05:05
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
1    OK     0              12841        72960           25053958
  File Name: /oracle/sys/test/system01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              47093
  Index      0              9943
  Other      0              3083
 
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
2    OK     0              14661        69120           25053963
  File Name: /oracle/sys/test/sysaux01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              19865
  Index      0              21742
  Other      0              12852
 
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
3    OK     0              1            167680          25053963
  File Name: /oracle/rbs/test/undotbs01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              0
  Index      0              0
  Other      0              167679
 
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
4    OK     0              1            1024000         22976000
  File Name: /oracle/data01/test/users01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              1013550
  Index      0              493
  Other      0              9956
 
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
5    OK     0              95769        1024000         22975999
  File Name: /oracle/data01/test/users02.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              917472
  Index      0              0
  Other      0              10759
 
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
6    OK     0              4            12801           24040671
  File Name: /oracle/data01/test/testtbs01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              186
  Index      0              0
  Other      0              12610
 
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of validate command on ORA_DISK_1 channel at 07/25/2011 16:20:20
ORA-19625: error identifying file /oracle/ctrl/test/control01.ctl
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
 
RMAN> list failure;
 
no failures found that match specification

If you exit from RMAN session you cant then not initiate it again:

[oracle@server1 ~]$ rman target / catalog rman/rman@rmanrepository
 
Recovery Manager: Release 11.2.0.2.0 - Production on Mon Jul 25 16:44:18 2011
 
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
 
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00554: initialization of internal recovery manager package failed
RMAN-06003: ORACLE error from target database:
ORA-00210: cannot open the specified control file
ORA-00202: control file: '/oracle/ctrl/test/control01.ctl'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

And you are obliged to abort your instance:

SQL> shutdown IMMEDIATE
ORA-00210: cannot OPEN the specified control FILE
ORA-00202: control FILE: '/oracle/ctrl/test/control01.ctl'
ORA-27041: unable TO OPEN FILE
Linux-x86_64 Error: 2: No such FILE OR DIRECTORY
Additional information: 3
SQL> shutdown abort
ORACLE instance shut down.

Then you can re-issue your RMAN connection and start recovery:

RMAN> startup nomount
 
Oracle instance started
 
Total System Global Area    1068937216 bytes
 
Fixed Size                     2233336 bytes
Variable Size                801115144 bytes
Database Buffers             255852544 bytes
Redo Buffers                   9736192 bytes
 
RMAN> list failure;
 
no failures found that match specification
 
RMAN> alter database mount;
 
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of alter db command at 07/25/2011 16:47:36
ORA-00205: error in identifying control file, check alert log for more info
 
RMAN> list failure;
 
List of Database Failures
=========================
 
Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
68646      CRITICAL OPEN      25-JUL-11     Control file /oracle/ctrl/test/control02.ctl is missing
68643      CRITICAL OPEN      25-JUL-11     Control file /oracle/ctrl/test/control01.ctl is missing

Then complete recover is execution of:

RMAN> repair failure;
RMAN> recover database;
RMAN> sql 'alter database open resetlogs';

Remark:
As a new incarnation of your database has been created a new full backup is strongly suggested.

Health Monitor

Introduced with 11g health monitor checks various component of the database:

SQL> SET lines 200
SQL> SET pages 100
SQL> col description FOR a50
SQL> SELECT name, cls_name, description FROM v$hm_check;
 
NAME                                                             CLS_NAME        DESCRIPTION
---------------------------------------------------------------- --------------- --------------------------------------------------
HM Test CHECK                                                    GENERIC         CHECK FOR health monitor functionality
DB Structure Integrity CHECK                                     PERSISTENT_DATA Checks integrity OF ALL DATABASE files
CF Block Integrity CHECK                                         PERSISTENT_DATA Checks integrity OF a control FILE block
Data Block Integrity CHECK                                       PERSISTENT_DATA Checks integrity OF a data FILE block
Redo Integrity CHECK                                             PERSISTENT_DATA Checks integrity OF redo LOG content
Logical Block CHECK                                              PERSISTENT_DATA Checks logical content OF a block
TRANSACTION Integrity CHECK                                      PERSISTENT_DATA Checks a TRANSACTION FOR corruptions
Undo SEGMENT Integrity CHECK                                     PERSISTENT_DATA Checks integrity OF an undo SEGMENT
No Mount CF CHECK                                                PERSISTENT_DATA Checks control FILE IN NOMOUNT MODE
Mount CF CHECK                                                   PERSISTENT_DATA Checks control FILE IN mount MODE
CF Member CHECK                                                  PERSISTENT_DATA Checks a multiplexed copy OF the control FILE
ALL Datafiles CHECK                                              PERSISTENT_DATA Checks ALL datafiles IN the DATABASE
Single Datafile CHECK                                            PERSISTENT_DATA Checks a data FILE
TABLESPACE CHECK CHECK                                           PERSISTENT_DATA Checks a TABLESPACE
LOG GROUP CHECK                                                  PERSISTENT_DATA Checks ALL members OF a LOG GROUP
LOG GROUP Member CHECK                                           PERSISTENT_DATA Checks a particular member OF a LOG GROUP
Archived LOG CHECK                                               PERSISTENT_DATA Checks an archived LOG
Redo Revalidation CHECK                                          PERSISTENT_DATA Checks redo LOG content
IO Revalidation CHECK                                            PERSISTENT_DATA Checks FILE accessibility
Block IO Revalidation CHECK                                      PERSISTENT_DATA Checks FILE accessibility
Txn Revalidation CHECK                                           PERSISTENT_DATA Revalidate corrupted TRANSACTION
Failure Simulation CHECK                                         PERSISTENT_DATA Creates dummy failures
Dictionary Integrity CHECK                                       PERSISTENT_DATA Checks dictionary integrity
ASM Mount CHECK                                                  ASM             Diagnose mount failure
ASM Allocation CHECK                                             ASM             Diagnose allocation failure
ASM Disk Visibility CHECK                                        ASM             Diagnose ADD disk failure
ASM FILE Busy CHECK                                              ASM             Diagnose FILE DROP failure
 
27 ROWS selected.

If we take the example of the missing database chapter:

SQL> SET lines 200
SQL> SET pages 100
SQL> SELECT * FROM (SELECT name, run_mode, start_time, status FROM v$hm_run WHERE check_name='DB Structure Integrity Check' ORDER BY run_id DESC)
     WHERE rownum<=15;
 
    RUN_ID NAME                             RUN_MODE START_TIME                                                                  STATUS
---------- -------------------------------- -------- --------------------------------------------------------------------------- -----------
     67506 HM_RUN_67506                     REACTIVE 22-JUL-11 11.20.08.051228 AM                                                COMPLETED
     67486 HM_RUN_67486                     REACTIVE 22-JUL-11 11.10.19.835735 AM                                                COMPLETED
     67466 HM_RUN_67466                     REACTIVE 22-JUL-11 11.00.19.011142 AM                                                COMPLETED
     67446 HM_RUN_67446                     REACTIVE 22-JUL-11 10.50.18.307013 AM                                                COMPLETED
     67426 HM_RUN_67426                     REACTIVE 22-JUL-11 10.46.40.629067 AM                                                COMPLETED
     67406 HM_RUN_67406                     REACTIVE 22-JUL-11 10.45.21.330512 AM                                                COMPLETED
     67386 HM_RUN_67386                     REACTIVE 22-JUL-11 10.44.14.952355 AM                                                COMPLETED
     67366 HM_RUN_67366                     REACTIVE 22-JUL-11 10.40.17.691822 AM                                                COMPLETED
     67346 HM_RUN_67346                     REACTIVE 22-JUL-11 10.30.16.862629 AM                                                COMPLETED
     67326 HM_RUN_67326                     REACTIVE 22-JUL-11 10.20.03.262136 AM                                                COMPLETED
     66966 HM_RUN_66966                     REACTIVE 20-JUL-11 05.18.15.527833 PM                                                COMPLETED
     66946 HM_RUN_66946                     REACTIVE 20-JUL-11 05.17.04.714570 PM                                                COMPLETED
     66926 HM_RUN_66926                     REACTIVE 20-JUL-11 04.31.06.729551 PM                                                COMPLETED
     66906 HM_RUN_66906                     REACTIVE 20-JUL-11 12.07.09.880817 PM                                                COMPLETED
     66866 HM_RUN_66866                     REACTIVE 20-JUL-11 12.05.46.513194 PM                                                COMPLETED
 
15 ROWS selected.

You can have the details of one particular run with:

SQL> SELECT finding_id, name, time_detected, priority, status, TYPE, description, damage_description FROM v$hm_finding WHERE run_id=67326;
 
FINDING_ID NAME                             TIME_DETECTED                                                               PRIORITY STATUS       TYPE
---------- -------------------------------- --------------------------------------------------------------------------- -------- ------------ -------------
DESCRIPTION
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DAMAGE_DESCRIPTION
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     67330 Missing datafile                 22-JUL-11 10.20.03.700874 AM                                                HIGH     CLOSED       FAILURE
Datafile 6: '/oracle/data01/test/testtbs01.dbf' IS missing
Some objects IN TABLESPACE TESTTBS might be unavailable

As I generated recommendations with RMAN ADVISE FAILURE commands there is no link in below table but you can display results of ADVISE FAILURE RMAN commands with (run_id is equal to 0):

SQL> SELECT recommendation_id, name, TYPE, status, description, repair_script FROM v$hm_recommendation WHERE fdg_id=67746 ORDER BY recommendation_id DESC;
 
RECOMMENDATION_ID NAME                             TYPE    STATUS
----------------- -------------------------------- ------- -------
DESCRIPTION
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
REPAIR_SCRIPT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
            67749 RESTORE FILE                     MANUAL  NOT RUN
IF FILE /oracle/data01/test/testtbs01.dbf was unintentionally renamed OR moved, restore it
 
 
            67747 NO DATA LOSS                     REPAIR  NOT RUN
The repair includes complete media recovery WITH no data loss
/oracle/DUMP/test/diag/rdbms/test/test/hm/reco_2213659725.hm

You can generate manual check with Grid Control/Database Control and DBMS_HM PL/SQL package. Manual run are limited to below list of checks:

SQL> SET lines 200
SQL> col description FOR a50
SQL> SELECT name, cls_name, description FROM v$hm_check WHERE internal_check='N';
 
NAME                                                             CLS_NAME        DESCRIPTION
---------------------------------------------------------------- --------------- --------------------------------------------------
DB Structure Integrity CHECK                                     PERSISTENT_DATA Checks integrity OF ALL DATABASE files
CF Block Integrity CHECK                                         PERSISTENT_DATA Checks integrity OF a control FILE block
Data Block Integrity CHECK                                       PERSISTENT_DATA Checks integrity OF a data FILE block
Redo Integrity CHECK                                             PERSISTENT_DATA Checks integrity OF redo LOG content
TRANSACTION Integrity CHECK                                      PERSISTENT_DATA Checks a TRANSACTION FOR corruptions
Undo SEGMENT Integrity CHECK                                     PERSISTENT_DATA Checks integrity OF an undo SEGMENT
Dictionary Integrity CHECK                                       PERSISTENT_DATA Checks dictionary integrity
ASM Allocation CHECK                                             ASM             Diagnose allocation failure
 
8 ROWS selected.

Let’s execute the classical database structure check:

SQL> EXEC dbms_hm.run_check('DB Structure Integrity Check','Test_check_1');
 
PL/SQL PROCEDURE successfully completed.
 
SQL> SET LONG 100000
SQL> SET pages 100
SQL> SELECT dbms_hm.GET_RUN_REPORT('Test_check_1','TEXT','DETAIL') FROM dual;
 
DBMS_HM.GET_RUN_REPORT('TEST_CHECK_1','TEXT','DETAIL')
--------------------------------------------------------------------------------
Basic Run Information
 Run Name                     : Test_check_1
 Run Id                       : 68302
 CHECK Name                   : DB Structure Integrity CHECK
 MODE                         : MANUAL
 Status                       : COMPLETED
 START TIME                   : 2011-07-25 15:29:15.959688 +02:00
 END TIME                     : 2011-07-25 15:29:16.147916 +02:00
 Error Encountered            : 0
 Source Incident Id           : 0
 NUMBER OF Incidents Created  : 0
 
Input Paramters FOR the Run
Run Findings AND Recommendations

You can also see them with ADR Command Interpreter (ADRCI) tool:

adrci> show report HM_RUN Test_check_1
<?xml version="1.0" encoding="US-ASCII"?>
<HM-REPORT REPORT_ID="Test_check_1">
    <TITLE>HM Report: Test_check_1</TITLE>
    <RUN_INFO>
        <CHECK_NAME>DB Structure Integrity Check</CHECK_NAME>
        <RUN_ID>68302</RUN_ID>
        <RUN_NAME>Test_check_1</RUN_NAME>
        <RUN_MODE>MANUAL</RUN_MODE>
        <RUN_STATUS>COMPLETED</RUN_STATUS>
        <RUN_ERROR_NUM>0</RUN_ERROR_NUM>
        <SOURCE_INCIDENT_ID>0</SOURCE_INCIDENT_ID>
        <NUM_INCIDENTS_CREATED>0</NUM_INCIDENTS_CREATED>
        <RUN_START_TIME>2011-07-25 15:29:15.959688 +02:00</RUN_START_TIME>
        <RUN_END_TIME>2011-07-25 15:29:16.147916 +02:00</RUN_END_TIME>
    </RUN_INFO>
    <RUN_PARAMETERS/>
    <RUN-FINDINGS/>
</HM-REPORT>

Recover deleted files on Linux

This may work on other Unix/Linux than RedHat…

If your instance is not aborting for an internal error with something like:

2011-07-22 15:26:42.907000 +02:00
Errors in file /oracle/dump/test/diag/rdbms/test/test/trace/test_dbw1_3844.trc:
ORA-01116: error in opening database file 6
ORA-01110: data file 6: '/oracle/data01/test/testtbs01.dbf'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
System state dump requested by (instance=1, osid=3844 (DBW1)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/dump/test/diag/rdbms/test/test/trace/test_diag_3834.trc
DBW1 (ospid: 3844): terminating the instance due to error 1116
2011-07-22 15:26:44.983000 +02:00
Dumping diagnostic data in directory=[cdmp_20110722152642], requested by (instance=1, osid=3844 (DBW1)), summary=[abnormal instance termination].
Instance terminated by DBW1, pid = 3844

You may recover the missing datafile with following procedure (if database aborted or restarted you must recover the datafile !):

Locate the file descriptor of the database writer process (use lsof for Unix):

[root@server1 ~]# ps -ef | grep dbw | grep test
oracle   25097     1  0 15:37 ?        00:00:00 ora_dbw0_test
oracle   25099     1  0 15:37 ?        00:00:00 ora_dbw1_test
oracle   25101     1  0 15:37 ?        00:00:00 ora_dbw2_test
oracle   25103     1  0 15:37 ?        00:00:00 ora_dbw3_test
[root@server1 ~]# ls -l /proc/25097/fd | grep testtbs01.dbf
lrwx------ 1 oracle dba 64 Jul 22 16:01 258 -> /oracle/data01/test/testtbs01.dbf (deleted)
[root@server1 ~]# lsof -p 25097 | grep testtbs01.dbf
oracle  25097 oracle  258uW  REG   253,12  104865792     147460 /oracle/data01/test/testtbs01.dbf (deleted)
[root@server1 ~]# ls -l /proc/25097/fd/258
lrwx------ 1 oracle dba 64 Jul 22 16:01 /proc/25097/fd/258 -> /oracle/data01/test/testtbs01.dbf (deleted)

Create a symbolic link to be able to put the tablespace in read only mode:

[oracle@server1 ~]$ ln -s /proc/25097/fd/258 /oracle/data01/test/testtbs01.dbf
[oracle@server1 ~]$ ls -l /oracle/data01/test/testtbs01.dbf
lrwxrwxrwx 1 oracle dba 18 Jul 25 11:42 /oracle/data01/test/testtbs01.dbf -> /proc/25097/fd/258

Put the tablespace read only:

SQL> ALTER TABLESPACE testtbs read only;
 
TABLESPACE altered.

Remove the symbolic lind and copy the file descriptor to original missing file:

[oracle@server1 ~]$ rm /oracle/data01/test/testtbs01.dbf
[oracle@server1 ~]$ cp /proc/25097/fd/258 /oracle/data01/test/testtbs01.dbf
[oracle@server1 ~]$ ll /oracle/data01/test/testtbs01.dbf
-rw-r----- 1 oracle dba 104865792 Jul 22 16:13 /oracle/data01/test/testtbs01.dbf

Put back again tablespace in read write mode (alert will even disappear from RMAN):

SQL> ALTER TABLESPACE testtbs read WRITE;
 
TABLESPACE altered.

At that stage all is working abck again, but the file descriptor still see the datafile deleted, to recover this:

SQL>  ALTER TABLESPACE testtbs OFFLINE;
 
TABLESPACE altered.
 
SQL> recover datafile '/oracle/data01/test/testtbs01.dbf';
Media recovery complete.
 
SQL> ALTER TABLESPACE testtbs ONLINE;
 
TABLESPACE altered.

References

About Post Author

Share the knowledge!

2 thoughts on “Data Recovery Advisor (DRA) and Health Monitor by examples

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>