Table of contents
Preamble
ORA-01578: ORACLE data block corrupted is not an error message we receive often. Personally I have seen it only two times in my DBA life, thanks to the quality SAN we have with good RAID level. That’s why it is also complicated to practice on this storage error that has corrupted your most important production database…
In case this error happens you have two situations: either you have a backup and the only solution not to loose any data is to restore it or you have no backup (or the restore is not working for any reasons) and at this point data loss is clear but you might want to extract as much data as possible from impacted objects.
Of course if the impacted object is an index then you are lucky as a simple drop/recreate index will solve the issue.
No need to say that before rushing to recover your database you must resolve first, with no delay, the source of the hardware problem and replace the faulty part !
Testing has been done on Oracle Linux Server release 7.2 64bit with Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 – 64bit. The database is in ARCHIVELOG mode as any other production database.
ORA-01578 with a backup
I start by creating a dedicated tablespace for our test:
SQL> CREATE TABLESPACE bmr datafile '/u01/app/oracle/oradata/orcl/bmr01.dbf' SIZE 5M autoextend ON next 5M maxsize 500M logging ONLINE permanent blocksize 8192 extent management local autoallocate SEGMENT SPACE management auto; TABLESPACE created. |
Then I create a test table and fill it with 500 rows:
SQL> CREATE TABLE test1(val NUMBER, descr VARCHAR2(200)) TABLESPACE bmr; TABLE created. SQL> DECLARE i NUMBER; BEGIN i:=1; WHILE (i <= 500) LOOP INSERT INTO test1 VALUES (i,TO_CHAR(TO_DATE(i, 'j'), 'jsp')); i:=i+1; END LOOP; END; / PL/SQL PROCEDURE successfully completed. SQL> COMMIT; COMMIT complete. |
I configure my Recovery Manager (RMAN) with below standard option:
configure controlfile autobackup on; configure backup optimization on; configure device type disk parallelism 2 backup type to compressed backupset; configure controlfile autobackup format for device type disk to '/backup/%F'; configure channel device type disk format '/backup/%U'; |
I backup the database and archivelog files with:
RMAN> backup database plus archivelog delete input; |
Then I add 500 more rows to the table, just to ensure I’m not loosing any figures after the end of recover process. I also gather statistics:
SQL> DECLARE i NUMBER; BEGIN i:=501; WHILE (i <= 1000) LOOP INSERT INTO test1 VALUES (i,TO_CHAR(TO_DATE(i, 'j'), 'jsp')); i:=i+1; END LOOP; END; / PL/SQL PROCEDURE successfully completed. SQL> COMMIT; COMMIT complete. SQL> EXEC dbms_stats.gather_table_stats(USER,'test1'); PL/SQL PROCEDURE successfully completed. SQL> SELECT MAX(val) FROM test1; MAX(VAL) ---------- 1000 |
My table has below storage (5 data blocks starting at 131):
SQL> SELECT extent_id,file_id,block_id,bytes,blocks FROM dba_extents WHERE segment_name='TEST1'; EXTENT_ID FILE_ID BLOCK_ID BYTES BLOCKS ---------- ---------- ---------- ---------- ---------- 0 5 128 65536 8 SQL> SELECT num_rows, blocks, empty_blocks FROM user_tables WHERE table_name='TEST1'; NUM_ROWS BLOCKS EMPTY_BLOCKS ---------- ---------- ------------ 1000 5 0 SQL> SELECT DISTINCT dbms_rowid.rowid_relative_fno(ROWID) AS file_no, dbms_rowid.rowid_block_number(ROWID) AS block_no FROM test1 ORDER BY 1,2; FILE_NO BLOCK_NO ---------- ---------- 5 131 5 132 5 133 5 134 5 135 |
If you wonder why a difference between starting blocks (128 and 131), this is well explained in Overview of Extents and we clearly see the reserved blocks by Oracle in initial allocated extent.
Before any corrupted block test do not forget to purge the buffer cache or Oracle will not read again the block and you might see nothing as the blocks are already in buffer cache:
SQL> ALTER SYSTEM flush buffer_cache; SYSTEM altered. |
Let’s corrupt first data block with dd command. This trick is coming from my Recovery Manager training even if Oracle is doing things a bit differently. It might be cumbersome to be able to corrupt one block with all caching on modern computer (database, filesystem,..). I have finally be obliged to issue two dd commands and one sync:
[root@server1 ~]# dd if=/dev/zero of=/u01/app/oracle/oradata/orcl/bmr01.dbf bs=8192 conv=notrunc seek=131 count=1 1+0 records in 1+0 records out 8192 bytes (8.2 kB) copied, 0.000209918 s, 39.0 MB/s [root@server1 ~]# sync [root@server1 ~]# dd if=/dev/zero of=/u01/app/oracle/oradata/orcl/bmr01.dbf bs=8192 conv=notrunc seek=131 count=1 1+0 records in 1+0 records out 8192 bytes (8.2 kB) copied, 0.000209918 s, 39.0 MB/s |
Now you should start to encounter error at Oracle level (if not flush the buffer cache):
SQL> ALTER SYSTEM flush buffer_cache; SYSTEM altered. SQL> SELECT * FROM test1 ORDER BY val; SELECT * FROM test1 ORDER BY val * ERROR AT line 1: ORA-01578: ORACLE data block corrupted (FILE # 5, block # 131) ORA-01110: data FILE 5: '/u01/app/oracle/oradata/orcl/bmr01.dbf' |
To understand which blocks are impacted you may use:
SQL> SELECT * FROM v$database_block_corruption; FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO CON_ID ---------- ---------- ---------- ------------------ --------- ---------- 5 131 1 0 ALL ZERO 0 |
Or DB verify utility:
[oracle@server1 ~]$ dbv file=/u01/app/oracle/oradata/orcl/bmr01.dbf DBVERIFY: Release 12.1.0.2.0 - Production on Tue Jul 19 12:49:19 2016 Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved. DBVERIFY - Verification starting : FILE = /u01/app/oracle/oradata/orcl/bmr01.dbf Page 131 is marked corrupt Corrupt block relative dba: 0x01400083 (file 5, block 131) Completely zero block found during dbv: DBVERIFY - Verification complete Total Pages Examined : 640 Total Pages Processed (Data) : 4 Total Pages Failing (Data) : 0 Total Pages Processed (Index): 0 Total Pages Failing (Index): 0 Total Pages Processed (Other): 130 Total Pages Processed (Seg) : 0 Total Pages Failing (Seg) : 0 Total Pages Empty : 505 Total Pages Marked Corrupt : 1 Total Pages Influx : 0 Total Pages Encrypted : 0 Highest block SCN : 2631270 (0.2631270) |
You can also report corruption with RMAN:
RMAN> run { allocate channel channel01 type disk; allocate channel channel02 type disk; allocate channel channel03 type disk; allocate channel channel04 type disk; backup check logical validate database; } . . . File Status Marked Corrupt Empty Blocks Blocks Examined High SCN ---- ------ -------------- ------------ --------------- ---------- 5 FAILED 0 505 640 2631270 File Name: /u01/app/oracle/oradata/orcl/bmr01.dbf Block Type Blocks Failing Blocks Processed ---------- -------------- ---------------- Data 0 4 Index 0 0 Other 1 131 validate found one or more corrupt blocks See trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_15528.trc for details . . . |
At this stage you can restore the entire datafile or use a cool feature called Block Media Recovery (BMR) and as its name stand for it will recover only the corrupted block. Pre 11gR1 the command is:
blockrecover datafile '...' block x;
blockrecover corruption list; |
Starting with 11gR1 the new command is:
recover datafile '...' block x;
recover corruption list; |
If you choose to restore the datafile then you have to put it offline so it has an impact for other objects in same datafile. BMR instead does not impact anything:
RMAN> alter database datafile '/u01/app/oracle/oradata/orcl/bmr01.dbf' offline; Statement processed RMAN> restore datafile 5; Starting restore at 19-JUL-16 using channel ORA_DISK_1 using channel ORA_DISK_2 channel ORA_DISK_1: starting datafile backup set restore channel ORA_DISK_1: specifying datafile(s) to restore from backup set channel ORA_DISK_1: restoring datafile 00005 to /u01/app/oracle/oradata/orcl/bmr01.dbf channel ORA_DISK_1: reading from backup piece /backup/18rb3b6l_1_1 channel ORA_DISK_1: piece handle=/backup/18rb3b6l_1_1 tag=TAG20160719T124301 channel ORA_DISK_1: restored backup piece 1 channel ORA_DISK_1: restore complete, elapsed time: 00:00:01 Finished restore at 19-JUL-16 RMAN> recover datafile 5; Starting recover at 19-JUL-16 using channel ORA_DISK_1 using channel ORA_DISK_2 starting media recovery media recovery complete, elapsed time: 00:00:00 Finished recover at 19-JUL-16 RMAN> alter database datafile '/u01/app/oracle/oradata/orcl/bmr01.dbf' online; using target database control file instead of recovery catalog Statement processed |
BMR is the method that is chosen by Data Recovery Advisor:
RMAN> list failure; using target database control file instead of recovery catalog Database Role: PRIMARY List of Database Failures ========================= Failure ID Priority Status Time Detected Summary ---------- -------- --------- ------------- ------- 167 HIGH OPEN 19-JUL-16 Datafile 5: '/u01/app/oracle/oradata/orcl/bmr01.dbf' contains one or more corrupt blocks RMAN> advise failure; Database Role: PRIMARY List of Database Failures ========================= Failure ID Priority Status Time Detected Summary ---------- -------- --------- ------------- ------- 167 HIGH OPEN 19-JUL-16 Datafile 5: '/u01/app/oracle/oradata/orcl/bmr01.dbf' contains one or more corrupt blocks analyzing automatic repair options; this may take some time allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=263 device type=DISK allocated channel: ORA_DISK_2 channel ORA_DISK_2: SID=42 device type=DISK analyzing automatic repair options complete Mandatory Manual Actions ======================== no manual actions available Optional Manual Actions ======================= no manual actions available Automated Repair Options ======================== Option Repair Description ------ ------------------ 1 Perform block media recovery of block 131 in file 5 Strategy: The repair includes complete media recovery with no data loss Repair script: /u01/app/oracle/diag/rdbms/orcl/orcl/hm/reco_2815027252.hm RMAN> host 'cat /u01/app/oracle/diag/rdbms/orcl/orcl/hm/reco_2815027252.hm'; # block media recovery recover datafile 5 block 131; host command complete RMAN> repair failure; Strategy: The repair includes complete media recovery with no data loss Repair script: /u01/app/oracle/diag/rdbms/orcl/orcl/hm/reco_2815027252.hm contents of repair script: # block media recovery recover datafile 5 block 131; Do you really want to execute the above repair (enter YES or NO)? y executing repair script Starting recover at 19-JUL-16 using channel ORA_DISK_1 using channel ORA_DISK_2 channel ORA_DISK_1: restoring block(s) channel ORA_DISK_1: specifying block(s) to restore from backup set restoring blocks of datafile 00005 channel ORA_DISK_1: reading from backup piece /backup/18rb3b6l_1_1 channel ORA_DISK_1: piece handle=/backup/18rb3b6l_1_1 tag=TAG20160719T124301 channel ORA_DISK_1: restored block(s) from backup piece 1 channel ORA_DISK_1: block restore complete, elapsed time: 00:00:01 starting media recovery media recovery complete, elapsed time: 00:00:01 Finished recover at 19-JUL-16 repair failure complete |
And the table is again fully accessible:
SQL> SELECT COUNT(*) FROM test1; COUNT(*) ---------- 1000 |
ORA-01578 with no backup
Of course this must not happen but you might encounter it after trying to restore a backup and realizing that your backup strategy was not so perfect… So the importance to validate your backup procedure as often as you can. In extreme situation the data center failure might be so huge that even your backups are not accessible because you simply do not offload your backup to a third party supplier. The situation we had was a SAN failure where was located the TSM database, in this situation no restore was even possible…
Worth to mention that you will encounter data loss, the below procedure should be followed only in last option to try to recover figures you can. Below as been executed with SYS as I have not succeeded to grant execute on DBMS_REPAIR to my DBA account (!!).
Start by creating the package required backend tables:
SQL> EXECUTE dbms_repair.admin_tables(table_name => 'ORPHAN_KEY_TABLE', table_type => dbms_repair.orphan_table, action => dbms_repair.create_action, TABLESPACE => 'users'); PL/SQL PROCEDURE successfully completed. SQL> EXECUTE dbms_repair.admin_tables(table_name => 'REPAIR_TABLE', table_type => dbms_repair.repair_table, action => dbms_repair.create_action, TABLESPACE => 'users'); PL/SQL PROCEDURE successfully completed. |
Check your object with:
SQL> SET serveroutput ON SQL> DECLARE corrupt_count BINARY_INTEGER:=0; BEGIN dbms_repair.check_object(schema_name => 'YJAQUIER', object_name => 'TEST1', repair_table_name => 'REPAIR_TABLE', corrupt_count => corrupt_count); dbms_output.put_line('Corrupted block(s): ' || TO_CHAR (corrupt_count)); END; / Corrupted block(s): 1 PL/SQL PROCEDURE successfully completed. |
This fill REPAIR_TABLE:
SQL> col repair_description FOR a30 SQL> SET lines 150 SQL> SELECT relative_file_id,block_id,schema_name,object_name,fix_timestamp,repair_description FROM sys.repair_table; RELATIVE_FILE_ID BLOCK_ID SCHEMA_NAME OBJECT_NAME FIX_TIMES REPAIR_DESCRIPTION ---------------- ---------- ------------------------------ ------------------------------ --------- ------------------------------ 5 131 YJAQUIER TEST1 mark block software corrupt |
Then I ask Oracle not to fetch this block anymore with:
SQL> SET serveroutput ON SQL> SQL> EXEC DBMS_REPAIR.SKIP_CORRUPT_BLOCKS(schema_name => 'YJAQUIER', object_name => 'TEST1'); PL/SQL PROCEDURE successfully completed. |
I can now read in my test table but I have lost 229 rows (!!):
SQL> SELECT COUNT(*) FROM test1; COUNT(*) ---------- 771 |
I can recover those rows in another table with something like:
SQL> CREATE TABLE test2 TABLESPACE users AS SELECT * FROM test1; TABLE created. SQL> DROP TABLE test1; TABLE dropped. SQL> RENAME test2 TO test1; TABLE renamed. |
If you check with DB Verify, RMAN or V$DATABASE_BLOCK_CORRUPTION you will still see a corrupted block. Either you wait Oracle to write a new object on it or you can accelerate the process following MOS note 336133.1:
First remove autoextend from your datafile:
SQL> alter database datafile '/u01/app/oracle/oradata/orcl/bmr01.dbf' autoextend off; Database altered. |
Create a test table in table with corrupted block avoiding deferred segment creation:
SQL> show parameter deferred_segment_creation NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ deferred_segment_creation BOOLEAN TRUE SQL> CREATE TABLE clean1(n NUMBER, c VARCHAR2(4000)) SEGMENT creation IMMEDIATE nologging PCTFREE 99 TABLESPACE bmr; TABLE created. |
Control extend size with:
SQL> SET lines 150 SQL> SELECT * FROM dba_free_space WHERE file_id= 5 AND 131 BETWEEN block_id AND block_id + blocks -1; TABLESPACE_NAME FILE_ID BLOCK_ID BYTES BLOCKS RELATIVE_FNO ------------------------------ ---------- ---------- ---------- ---------- ------------ BMR 5 128 65536 8 5 |
Allocate as many extent as you can with:
SQL> BEGIN FOR i IN 1..1000000 LOOP EXECUTE IMMEDIATE 'alter table clean1 allocate extent (datafile '||'''/u01/app/oracle/oradata/orcl/bmr01.dbf''' ||'size 64k)'; END LOOP; END ; / BEGIN * ERROR AT line 1: ORA-01653: unable TO extend TABLE YJAQUIER.CLEAN1 BY 128 IN TABLESPACE BMR ORA-06512: AT line 3 |
Control the block has been allocated to your table with:
SQL> SELECT segment_name, segment_type, owner FROM dba_extents WHERE file_id = 5 AND 131 BETWEEN block_id AND block_id + blocks -1; no ROWS selected |
If not restart the process with a clean2 table and so on…
In my case it succeeded with CLEAN2 table:
SQL> col segment_name FOR a15 SQL> col owner FOR a15 SQL> SELECT segment_name, segment_type, owner FROM dba_extents WHERE file_id = 5 AND 131 BETWEEN block_id AND block_id + blocks -1; SEGMENT_NAME SEGMENT_TYPE OWNER --------------- ------------------ --------------- CLEAN2 TABLE YJAQUIER |
Fill the table that has your corrupted block in its extents (CLEAN2 in my case):
SQL> BEGIN FOR i IN 1..1000000000 LOOP INSERT /*+ append */ INTO clean2 SELECT i, LPAD('reformat',3092, 'r') FROM dual; COMMIT ; END LOOP; END; / BEGIN * ERROR AT line 1: ORA-01653: unable TO extend TABLE YJAQUIER.CLEAN2 BY 128 IN TABLESPACE BMR ORA-06512: AT line 3 |
Perform few checkpoint and logfile switch with:
SQL> ALTER SYSTEM checkpoint; SYSTEM altered. SQL> ALTER SYSTEM switch logfile; SYSTEM altered. |
DB Verify should not report error anymore:
[oracle@server1 ~]$ dbv file=/u01/app/oracle/oradata/orcl/bmr01.dbf DBVERIFY: Release 12.1.0.2.0 - Production on Tue Jul 19 15:36:48 2016 Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved. DBVERIFY - Verification starting : FILE = /u01/app/oracle/oradata/orcl/bmr01.dbf DBVERIFY - Verification complete Total Pages Examined : 640 Total Pages Processed (Data) : 118 Total Pages Failing (Data) : 0 Total Pages Processed (Index): 0 Total Pages Failing (Index): 0 Total Pages Processed (Other): 471 Total Pages Processed (Seg) : 0 Total Pages Failing (Seg) : 0 Total Pages Empty : 51 Total Pages Marked Corrupt : 0 Total Pages Influx : 0 Total Pages Encrypted : 0 Highest block SCN : 2641854 (0.2641854) |
To “purge” V$DATABASE_BLOCK_CORRUPTION issue:
RMAN> backup validate datafile 5; Starting backup at 19-JUL-16 using target database control file instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=262 device type=DISK allocated channel: ORA_DISK_2 channel ORA_DISK_2: SID=25 device type=DISK channel ORA_DISK_1: starting compressed full datafile backup set channel ORA_DISK_1: specifying datafile(s) in backup set input datafile file number=00005 name=/u01/app/oracle/oradata/orcl/bmr01.dbf channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01 List of Datafiles ================= File Status Marked Corrupt Empty Blocks Blocks Examined High SCN ---- ------ -------------- ------------ --------------- ---------- 5 OK 0 51 647 2641854 File Name: /u01/app/oracle/oradata/orcl/bmr01.dbf Block Type Blocks Failing Blocks Processed ---------- -------------- ---------------- Data 0 118 Index 0 0 Other 0 471 Finished backup at 19-JUL-16 |
References
- Master Note for Handling Oracle Database Corruption Issues (Doc ID 1088018.1)
- How to Format Corrupted Block Not Part of Any Segment (Doc ID 336133.1)
- Detect And Correct Corruption in Oracle
Laverde Williams says:
Hi, I know is and old thread. But I wondered if you create the trigger mencioned on MOS note
Yannick Jaquier says:
Hi,
As far as I remember nope I have not used it…