Resolve ORA-01578 error with and without a backup

Preamble

Data corruption
Data corruption

ORA-01578: ORACLE data block corrupted is not an error message we receive often. Personally I have seen it only two times in my DBA life, thanks to the quality SAN we have with good RAID level. That’s why it is also complicated to practice on this storage error that has corrupted your most important production database…

In case this error happens you have two situations: either you have a backup and the only solution not to loose any data is to restore it or you have no backup (or the restore is not working for any reasons) and at this point data loss is clear but you might want to extract as much data as possible from impacted objects.

Of course if the impacted object is an index then you are lucky as a simple drop/recreate index will solve the issue.

No need to say that before rushing to recover your database you must resolve first, with no delay, the source of the hardware problem and replace the faulty part !

Testing has been done on Oracle Linux Server release 7.2 64bit with Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 – 64bit. The database is in ARCHIVELOG mode as any other production database.

ORA-01578 with a backup

I start by creating a dedicated tablespace for our test:

SQL> CREATE TABLESPACE bmr
     datafile '/u01/app/oracle/oradata/orcl/bmr01.dbf' SIZE 5M autoextend ON next 5M maxsize 500M
     logging ONLINE permanent
     blocksize 8192
     extent management local autoallocate
     SEGMENT SPACE management auto;
 
TABLESPACE created.

Then I create a test table and fill it with 500 rows:

SQL> CREATE TABLE test1(val NUMBER, descr VARCHAR2(200)) TABLESPACE bmr;
 
TABLE created.
 
SQL> DECLARE
i NUMBER;
BEGIN
i:=1;
WHILE (i <= 500)
LOOP
INSERT INTO test1 VALUES (i,TO_CHAR(TO_DATE(i, 'j'), 'jsp'));
i:=i+1;
END LOOP;
END;
/
 
PL/SQL PROCEDURE successfully completed.
 
SQL> COMMIT;
 
COMMIT complete.

I configure my Recovery Manager (RMAN) with below standard option:

configure controlfile autobackup on;
configure backup optimization on;
configure device type disk parallelism 2 backup type to compressed backupset;
configure controlfile autobackup format for device type disk to '/backup/%F';
configure channel device type disk format '/backup/%U';

I backup the database and archivelog files with:

RMAN> backup database plus archivelog delete input;

Then I add 500 more rows to the table, just to ensure I’m not loosing any figures after the end of recover process. I also gather statistics:

SQL> DECLARE
i NUMBER;
BEGIN
i:=501;
WHILE (i <= 1000)
LOOP
INSERT INTO test1 VALUES (i,TO_CHAR(TO_DATE(i, 'j'), 'jsp'));
i:=i+1;
END LOOP;
END;
/
 
PL/SQL PROCEDURE successfully completed.
 
SQL> COMMIT;
 
COMMIT complete.
 
SQL> EXEC dbms_stats.gather_table_stats(USER,'test1');
 
PL/SQL PROCEDURE successfully completed.
 
SQL> SELECT MAX(val) FROM test1;
 
  MAX(VAL)
----------
      1000

My table has below storage (5 data blocks starting at 131):

SQL> SELECT extent_id,file_id,block_id,bytes,blocks FROM dba_extents WHERE segment_name='TEST1';
 
 EXTENT_ID    FILE_ID   BLOCK_ID      BYTES     BLOCKS
---------- ---------- ---------- ---------- ----------
         0          5        128      65536          8
SQL> SELECT num_rows, blocks, empty_blocks FROM user_tables WHERE table_name='TEST1';
 
  NUM_ROWS     BLOCKS EMPTY_BLOCKS
---------- ---------- ------------
      1000          5            0
 
SQL> SELECT DISTINCT dbms_rowid.rowid_relative_fno(ROWID) AS file_no, dbms_rowid.rowid_block_number(ROWID) AS block_no FROM test1 ORDER BY 1,2;
 
   FILE_NO   BLOCK_NO
---------- ----------
         5        131
         5        132
         5        133
         5        134
         5        135

If you wonder why a difference between starting blocks (128 and 131), this is well explained in Overview of Extents and we clearly see the reserved blocks by Oracle in initial allocated extent.

Before any corrupted block test do not forget to purge the buffer cache or Oracle will not read again the block and you might see nothing as the blocks are already in buffer cache:

SQL> ALTER SYSTEM flush buffer_cache;
 
SYSTEM altered.

Let’s corrupt first data block with dd command. This trick is coming from my Recovery Manager training even if Oracle is doing things a bit differently. It might be cumbersome to be able to corrupt one block with all caching on modern computer (database, filesystem,..). I have finally be obliged to issue two dd commands and one sync:

[root@server1 ~]# dd if=/dev/zero of=/u01/app/oracle/oradata/orcl/bmr01.dbf bs=8192 conv=notrunc seek=131 count=1
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000209918 s, 39.0 MB/s
[root@server1 ~]# sync
[root@server1 ~]# dd if=/dev/zero of=/u01/app/oracle/oradata/orcl/bmr01.dbf bs=8192 conv=notrunc seek=131 count=1
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000209918 s, 39.0 MB/s

Now you should start to encounter error at Oracle level (if not flush the buffer cache):

SQL> ALTER SYSTEM flush buffer_cache;
 
SYSTEM altered.
 
SQL> SELECT * FROM test1 ORDER BY val;
SELECT * FROM test1 ORDER BY val
       *
ERROR AT line 1:
ORA-01578: ORACLE data block corrupted (FILE # 5, block # 131)
ORA-01110: data FILE 5: '/u01/app/oracle/oradata/orcl/bmr01.dbf'

To understand which blocks are impacted you may use:

SQL> SELECT * FROM v$database_block_corruption;
 
     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO     CON_ID
---------- ---------- ---------- ------------------ --------- ----------
         5        131          1                  0 ALL ZERO           0

Or DB verify utility:

[oracle@server1 ~]$ dbv file=/u01/app/oracle/oradata/orcl/bmr01.dbf
 
DBVERIFY: Release 12.1.0.2.0 - Production on Tue Jul 19 12:49:19 2016
 
Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.
 
DBVERIFY - Verification starting : FILE = /u01/app/oracle/oradata/orcl/bmr01.dbf
Page 131 is marked corrupt
Corrupt block relative dba: 0x01400083 (file 5, block 131)
Completely zero block found during dbv:
 
 
 
DBVERIFY - Verification complete
 
Total Pages Examined         : 640
Total Pages Processed (Data) : 4
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 130
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 505
Total Pages Marked Corrupt   : 1
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 2631270 (0.2631270)

You can also report corruption with RMAN:

RMAN> run {
 allocate channel channel01 type disk;
 allocate channel channel02 type disk;
 allocate channel channel03 type disk;
 allocate channel channel04 type disk;
 backup check logical validate database;
}
.
.
.
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
5    FAILED 0              505          640             2631270
  File Name: /u01/app/oracle/oradata/orcl/bmr01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              4
  Index      0              0
  Other      1              131
 
validate found one or more corrupt blocks
See trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_15528.trc for details
.
.
.

At this stage you can restore the entire datafile or use a cool feature called Block Media Recovery (BMR) and as its name stand for it will recover only the corrupted block. Pre 11gR1 the command is:

blockrecover datafile '...' block x;
blockrecover corruption list;

Starting with 11gR1 the new command is:

recover datafile '...' block x;
recover corruption list;

If you choose to restore the datafile then you have to put it offline so it has an impact for other objects in same datafile. BMR instead does not impact anything:

RMAN> alter database datafile '/u01/app/oracle/oradata/orcl/bmr01.dbf' offline;
 
Statement processed
 
RMAN> restore datafile 5;
 
Starting restore at 19-JUL-16
using channel ORA_DISK_1
using channel ORA_DISK_2
 
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00005 to /u01/app/oracle/oradata/orcl/bmr01.dbf
channel ORA_DISK_1: reading from backup piece /backup/18rb3b6l_1_1
channel ORA_DISK_1: piece handle=/backup/18rb3b6l_1_1 tag=TAG20160719T124301
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
Finished restore at 19-JUL-16
 
RMAN> recover datafile 5;
 
Starting recover at 19-JUL-16
using channel ORA_DISK_1
using channel ORA_DISK_2
 
starting media recovery
media recovery complete, elapsed time: 00:00:00
 
Finished recover at 19-JUL-16
 
RMAN> alter database datafile '/u01/app/oracle/oradata/orcl/bmr01.dbf' online;
 
using target database control file instead of recovery catalog
Statement processed

BMR is the method that is chosen by Data Recovery Advisor:

RMAN> list failure;
 
using target database control file instead of recovery catalog
Database Role: PRIMARY
 
List of Database Failures
=========================
 
Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
167        HIGH     OPEN      19-JUL-16     Datafile 5: '/u01/app/oracle/oradata/orcl/bmr01.dbf' contains one or more corrupt blocks
 
RMAN> advise failure;
 
Database Role: PRIMARY
 
List of Database Failures
=========================
 
Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
167        HIGH     OPEN      19-JUL-16     Datafile 5: '/u01/app/oracle/oradata/orcl/bmr01.dbf' contains one or more corrupt blocks
 
analyzing automatic repair options; this may take some time
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=263 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=42 device type=DISK
analyzing automatic repair options complete
 
Mandatory Manual Actions
========================
no manual actions available
 
Optional Manual Actions
=======================
no manual actions available
 
Automated Repair Options
========================
Option Repair Description
------ ------------------
1      Perform block media recovery of block 131 in file 5
  Strategy: The repair includes complete media recovery with no data loss
  Repair script: /u01/app/oracle/diag/rdbms/orcl/orcl/hm/reco_2815027252.hm
 
RMAN> host 'cat /u01/app/oracle/diag/rdbms/orcl/orcl/hm/reco_2815027252.hm';
 
   # block media recovery
   recover datafile 5 block 131;
host command complete
 
RMAN> repair failure;
 
Strategy: The repair includes complete media recovery with no data loss
Repair script: /u01/app/oracle/diag/rdbms/orcl/orcl/hm/reco_2815027252.hm
 
contents of repair script:
   # block media recovery
   recover datafile 5 block 131;
 
Do you really want to execute the above repair (enter YES or NO)? y
executing repair script
 
Starting recover at 19-JUL-16
using channel ORA_DISK_1
using channel ORA_DISK_2
 
channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00005
channel ORA_DISK_1: reading from backup piece /backup/18rb3b6l_1_1
channel ORA_DISK_1: piece handle=/backup/18rb3b6l_1_1 tag=TAG20160719T124301
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:01
 
starting media recovery
media recovery complete, elapsed time: 00:00:01
 
Finished recover at 19-JUL-16
repair failure complete

And the table is again fully accessible:

SQL> SELECT COUNT(*) FROM test1;
 
  COUNT(*)
----------
      1000

ORA-01578 with no backup

Of course this must not happen but you might encounter it after trying to restore a backup and realizing that your backup strategy was not so perfect… So the importance to validate your backup procedure as often as you can. In extreme situation the data center failure might be so huge that even your backups are not accessible because you simply do not offload your backup to a third party supplier. The situation we had was a SAN failure where was located the TSM database, in this situation no restore was even possible…

Worth to mention that you will encounter data loss, the below procedure should be followed only in last option to try to recover figures you can. Below as been executed with SYS as I have not succeeded to grant execute on DBMS_REPAIR to my DBA account (!!).

Start by creating the package required backend tables:

SQL> EXECUTE dbms_repair.admin_tables(table_name => 'ORPHAN_KEY_TABLE', table_type => dbms_repair.orphan_table, action => dbms_repair.create_action, TABLESPACE => 'users');
 
PL/SQL PROCEDURE successfully completed.
 
SQL> EXECUTE dbms_repair.admin_tables(table_name => 'REPAIR_TABLE', table_type => dbms_repair.repair_table, action => dbms_repair.create_action, TABLESPACE => 'users');
 
PL/SQL PROCEDURE successfully completed.

Check your object with:

SQL> SET serveroutput ON
SQL> DECLARE
       corrupt_count BINARY_INTEGER:=0;
     BEGIN
       dbms_repair.check_object(schema_name => 'YJAQUIER', object_name => 'TEST1', repair_table_name => 'REPAIR_TABLE', corrupt_count => corrupt_count);
       dbms_output.put_line('Corrupted block(s): ' || TO_CHAR (corrupt_count));
     END;
     /
 
Corrupted block(s): 1
 
PL/SQL PROCEDURE successfully completed.

This fill REPAIR_TABLE:

SQL> col repair_description FOR a30
SQL> SET lines 150
SQL> SELECT relative_file_id,block_id,schema_name,object_name,fix_timestamp,repair_description FROM sys.repair_table;
 
RELATIVE_FILE_ID   BLOCK_ID SCHEMA_NAME                    OBJECT_NAME                    FIX_TIMES REPAIR_DESCRIPTION
---------------- ---------- ------------------------------ ------------------------------ --------- ------------------------------
               5        131 YJAQUIER                       TEST1                                    mark block software corrupt

Then I ask Oracle not to fetch this block anymore with:

SQL> SET serveroutput ON
SQL> SQL> EXEC DBMS_REPAIR.SKIP_CORRUPT_BLOCKS(schema_name => 'YJAQUIER', object_name => 'TEST1');
 
PL/SQL PROCEDURE successfully completed.

I can now read in my test table but I have lost 229 rows (!!):

SQL> SELECT COUNT(*) FROM test1;
 
  COUNT(*)
----------
       771

I can recover those rows in another table with something like:

SQL> CREATE TABLE test2 TABLESPACE users AS SELECT * FROM test1;
 
TABLE created.
 
SQL> DROP TABLE test1;
 
TABLE dropped.
 
SQL> RENAME test2 TO test1;
 
TABLE renamed.

If you check with DB Verify, RMAN or V$DATABASE_BLOCK_CORRUPTION you will still see a corrupted block. Either you wait Oracle to write a new object on it or you can accelerate the process following MOS note 336133.1:

First remove autoextend from your datafile:

SQL> alter database datafile '/u01/app/oracle/oradata/orcl/bmr01.dbf' autoextend off;
 
Database altered.

Create a test table in table with corrupted block avoiding deferred segment creation:

SQL> show parameter deferred_segment_creation
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
deferred_segment_creation            BOOLEAN     TRUE
 
SQL> CREATE TABLE clean1(n NUMBER, c VARCHAR2(4000)) SEGMENT creation IMMEDIATE nologging PCTFREE 99 TABLESPACE bmr;
 
TABLE created.

Control extend size with:

SQL> SET lines 150
SQL> SELECT * FROM dba_free_space WHERE file_id= 5 AND 131 BETWEEN block_id AND block_id + blocks -1;
 
TABLESPACE_NAME                   FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
------------------------------ ---------- ---------- ---------- ---------- ------------
BMR                                     5        128      65536          8            5

Allocate as many extent as you can with:

SQL> BEGIN
       FOR i IN 1..1000000 LOOP
         EXECUTE IMMEDIATE 'alter table clean1 allocate extent (datafile '||'''/u01/app/oracle/oradata/orcl/bmr01.dbf''' ||'size 64k)';
       END LOOP;
     END ;
     /
BEGIN
*
ERROR AT line 1:
ORA-01653: unable TO extend TABLE YJAQUIER.CLEAN1 BY 128 IN TABLESPACE BMR
ORA-06512: AT line 3

Control the block has been allocated to your table with:

SQL> SELECT segment_name, segment_type, owner FROM dba_extents WHERE file_id = 5 AND 131 BETWEEN block_id AND block_id + blocks -1;
 
no ROWS selected

If not restart the process with a clean2 table and so on…

In my case it succeeded with CLEAN2 table:

SQL> col segment_name FOR a15
SQL> col owner FOR a15
SQL> SELECT segment_name, segment_type, owner FROM dba_extents WHERE file_id = 5 AND 131 BETWEEN block_id AND block_id + blocks -1;
 
SEGMENT_NAME    SEGMENT_TYPE       OWNER
--------------- ------------------ ---------------
CLEAN2          TABLE              YJAQUIER

Fill the table that has your corrupted block in its extents (CLEAN2 in my case):

SQL> BEGIN 
       FOR i IN 1..1000000000 LOOP 
         INSERT /*+ append */ INTO clean2 SELECT i, LPAD('reformat',3092, 'r') FROM dual; 
         COMMIT ; 
       END LOOP; 
     END;
     /
BEGIN
*
ERROR AT line 1:
ORA-01653: unable TO extend TABLE YJAQUIER.CLEAN2 BY 128 IN TABLESPACE BMR
ORA-06512: AT line 3

Perform few checkpoint and logfile switch with:

SQL> ALTER SYSTEM checkpoint;
 
SYSTEM altered.
 
SQL> ALTER SYSTEM switch logfile;
 
SYSTEM altered.

DB Verify should not report error anymore:

[oracle@server1 ~]$ dbv file=/u01/app/oracle/oradata/orcl/bmr01.dbf
 
DBVERIFY: Release 12.1.0.2.0 - Production on Tue Jul 19 15:36:48 2016
 
Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.
 
DBVERIFY - Verification starting : FILE = /u01/app/oracle/oradata/orcl/bmr01.dbf
 
 
DBVERIFY - Verification complete
 
Total Pages Examined         : 640
Total Pages Processed (Data) : 118
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 471
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 51
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 2641854 (0.2641854)

To “purge” V$DATABASE_BLOCK_CORRUPTION issue:

RMAN> backup validate datafile 5;
 
Starting backup at 19-JUL-16
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=262 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=25 device type=DISK
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00005 name=/u01/app/oracle/oradata/orcl/bmr01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
5    OK     0              51           647             2641854
  File Name: /u01/app/oracle/oradata/orcl/bmr01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              118
  Index      0              0
  Other      0              471
 
Finished backup at 19-JUL-16

References

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>