Saturday, July 31, 2010

Oracle Database Block corruption

Oracle Database Block corruption : ******************

“Block corruption is rare but it does happen. As databases get larger and larger – the probability of it happening at some point nears 100%.”--Mr. Tom Kyte Oracle Expert

Block corruption is while the data is being written to the data blocks, if the write to the block fails abruptly, I mean that there is a partial write in the block, may be because of power disruption or I/O problem, leaving no time for header to be updated, or row data to be populated, oracle leaves the block corrupt.In case of block corruption you can normally use the database unless you try to read that particular block, against which it shoots up the block corruption error.Generally block corruption occurs if write fails on the block, when the transaction is being committed


ORA-01578:
ORACLE data block corrupted (file # string, block # string)
Whenever we encounter above error message mean we have BLOCK CORRUPTION.

NOTE: We can find detail information about block corruption in alert.log file

Two types of block corruption can happens

- Physical corruption (media corrupt)
- Logical corruption (soft corrupt)

Physical corruption can be caused by defected memory boards, controllers or broken sectors on a hard disk;

Logical corruption can among other reasons be caused by an attempt to recover through a NOLOGGING action.



Difference between logical and physical corruption

Logical corruption is header - footer - that is one of the checks, yes (it is looking for fractured blocks and when it hits one, it'll re-read it, that is why there is no need for "alter tablespace begin backup" with rman)


Physical corruption is "we cannot read the block from disk, something is physically preventing us from doing so”.

How to detect block corruption?


1. DBVERIFY utility

DBVERIFY is an external command-line utility that performs a physical data structure integrity check. It can be used on offline or online databases, as well on backup files. You use DBVERIFY primarily when you need to ensure that a backup database (or datafile) is valid before it is restored.

http://dbataj.blogspot.com/2007/04/offline-database-verification-utility.html



2. Block checking parameters

There are two initialization parameters for dealing with block corruption:- DB_BOCK_CHECKSUM (calculates a checksum for each block before it is written to disk, every time)causes 1-2% performance overhead- DB_BLOCK_CHECKING (serverprocess checks block for internal consistency after every DML)causes 1-10% performance overhead

Note: In10g db_block_checksum value TYPICAL is implying TRUE and db_block_checking value FULL implying TRUE.

DB_BLOCK_CHECKING Initialization Parameter
http://download.oracle.com/docs/cd/B19306_01/server.102/b14231/repair.htm#sthref3176




3. ANALYZE TABLE tablename VALIDATE STRUCTURE CASCADE SQL statement

Validate the structure of an index or index partition, table or table partition, index-organized table, cluster, or object reference (REF).

ANALYZE: Reporting Corruption
http://download.oracle.com/docs/cd/B19306_01/server.102/b14231/repair.htm#sthref3173




4. RMAN BACKUP command with THE VALIDATE option

You can use the VALIDATE option of the BACKUP command to verify that database files exist and are in the correct locations, and have no physical or logical corruptions that would prevent RMAN from creating backups of them. When performing a BACKUP... VALIDATE, RMAN reads the files to be backed up in their entirety, as it would during a real backup. It does not, however, actually produce any backup sets or image copies.



Detection of Logical Block Corruption : *******************

Besides testing for media corruption, the database can also test data and index blocks for logical corruption, such as corruption of a row piece or index entry. If RMAN finds logical corruption, then it logs the block in the alert.log. If CHECK LOGICAL was used, the block is also logged in the server session trace file. By default, error checking for logical corruption is disabled.
For BACKUP commands the MAXCORRUPT parameter sets the total number of physical and logical corruptions permitted in a file. If the sum of physical and logical corruptions for a file is less than its MAXCORRUPT setting, the RMAN command completes successfully. If MAXCORRUPT is exceeded, the command terminates and RMAN does not read the rest of the file. V$DATABASE_BLOCK_CORRUPTION is populated with corrupt block ranges if the command succeeds. Otherwise, you must set MAXCORRUPT higher and re-run the backup to find out the corrupt block ranges.
RMAN found any block corruption in database then following Data Dictionary view populated.

V$COPY_CORRUPTION
V$BACKUP_CORRUPTION
V$DATABASE_BLOCK_CORRUPTION

Using RMAN to Validate Database Files
http://download.oracle.com/docs/cd/B19306_01/backup.102/b14192/bkup005.htm#i1006673

5. EXPORT/IMPORT command line utility

Full database EXPORT/IMPORT show=y is another method.

. about to export SCOTT's tables via Conventional Path ...
. . exporting table BONUS
EXP-00056: ORACLE error 1578 encountered
ORA-01578: ORACLE data block corrupted (file # 4, block # 43)
ORA-01110: data file 4: 'C:\ORA10GHOME\ORADATA\ORCL10G\USERS01.DBF'

6. DBMS_REPAIR package

dbms_repair is a utility that can detect and repair block corruption within Oracle. It is provided by Oracle as part of the standard database installation.

http://www.oracleutilities.com/Packages/dbms_repair.html

How to Repair & Fix block corruption?

We can recover everything but we have valid database backup.
Whenever we found block corruption then first need to find out which type of block corruption occurred because block corruption recovery depends on block corruption type.

Like Corrupted block related to TABLE segment, INDEX segment, TABLE
PARTITION segment, INDEX PARTITION segment, ROLLBACK segment, LOB segment.

Through below query we can find out corrupted block type

select segment_type,owner'.'segment_name
from dba_extents
where file_id = [&file_id] and [&block] between block_id and block_id+blocks -1;

Below is example with RMAN BLOCK MEDIA RECOVERY.



SQL> conn scott/tiger
Connected.



SQL> select * from test;
select * from test
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 5, block # 11)
ORA-01110: data file 5: 'C:\INDEXDATA01.DBF'

First check which type of block corruption happened through above mentioned query.




RMAN> blockrecover datafile 5 block 11;
Starting recover at 29-APR-08using channel
ORA_DISK_1 starting media recoverymedia recovery complete,
elapsed time: 00:00:00
Finished recover at 29-APR-08

If you are not using rman then applying below procedure
- if it is index then drop and recreate index
- if it is table and you have backup of that table then restore backup on another database and exp/imp the table.

Same Reference:
http://sysdba.wordpress.com/2006/04/05/how-to-check-for-and-repair-block-corruption-with-rman-in-oracle-9i-and-oracle-10g/

NOTE: Find more information “Handling Oracle Block Corruptions in Oracle7/8/8i/9i/10g” Metalink Note: 28814.1



How to corrupt database block for practice purpose?
On Unix:
Use dd command
$man dd



On Windows:
Use Editor and open datafile write some junk character at middle of file and save it.




****************************************

Another live example :




Block Corruption and Recovery


Step 1: Identify the corrupt blocks
-----------------------------------
1. Run below command to populate v$database_block_corruption view with information of all the corrupted blocks.



RMAN> backup validate check logical database;

“CHECK LOGICAL" option is used to identify both Physical and Logical Block Corruptions.


Select the view to identify the corrupted blocks detected by RMAN.


SQL> select * from v$database_block_corruption;

Please note, After a corrupt block is repaired, the row identifying the block is deleted from the view.


2. Check alert. log file for corrupted blocks, data file list.
For ex:- ORA-01578: ORACLE data block corrupted (file # 5, block # 15)
ORA-01110: data file 5: '/oracle/oradata/trgt/users01.dbf'


3. You can also use dbverify utility to identify Physical and Logical Intra Block Corruptions.

dbv file=datafile_name blocksize=datafile_block_size


Step 2: Recovering Data blocks
-------------------------------

1. Recovering Data blocks By Using All Available Backups

Run the BLOCKRECOVER command at the RMAN prompt, specifying the file and block numbers for the corrupted blocks

RMAN>BLOCKRECOVER DATAFILE 5 BLOCK 15;

Recover multiple blocks in single command

RMAN>BLOCKRECOVER DATAFILE 5 BLOCK 15 DATAFILE 2 BLOCK 10;

2. Recovering Data blocks Using Selected Backups

Run the BLOCKRECOVER command at the RMAN prompt, Specifying the data file and block numbers for the corrupted blocks and limiting the backup candidates by means of the available options. For example, specify what type of backup should be used to restore the blocks.


# restore from backupset
RMAN> BLOCKRECOVER DATAFILE 5 BLOCK 15 FROM BACKUPSET;


# restore from datafile image copy
RMAN> BLOCKRECOVER DATAFILE 5 BLOCK 15 FROM DATAFILECOPY;


# restore from backup set with tag "Sunday"
RMAN> BLOCKRECOVER DATAFILE 5 BLOCK 15 FROM TAG = Sunday;


# restore using backups created before log sequence 100
RMAN> BLOCKRECOVER DATAFILE 5 BLOCK 15 RESTORE UNTIL SEQUENCE 100;


# restore using one week ago backups
RMAN> BLOCKRECOVER DATAFILE 5 BLOCK 15 RESTORE UNTIL 'SYSDATE-7';


# restore using backups until SCN 100
RMAN> BLOCKRECOVER DATAFILE 5 BLOCK 15 RESTORE UNTIL SCN 100;


3 . Recovering blocks listed in V$DATABASE_BLOCK_CORRUPTION view

Run the below command to recover all blocks marked corrupt in V$DATABASE_BLOCK_CORRUPTION.

RMAN> BLOCKRECOVER CORRUPTION LIST;

Restores blocks from backup sets created more than 7 days ago
RMAN> BLOCKRECOVER CORRUPTION LIST FROM BACKUPSET RESTORE UNTIL TIME 'SYSDATE-7';


Note:-Block corruptions in RMAN backups and copies is kept in V$BACKUP_CORRUPTION and V$COPY_CORRUPTION


Step 3: Allow Recovery to corrupt blocks
-----------------------------------------
During recovery database finds corrupt blocks then recovery stops. Run the below command in order to skip the corrupt blocks and proceed with recovery.

SQL>RECOVER DATABASE ALLOW n CORRUPTION;

Where n is the number of allowable corrupt blocks




**************************************



Another Live example :********




Problem: the application encounters an ORA-01578 runtime error because there are one or more corrupt blocks in a table it is reading.

How can corrupt blocks be caused?


First of all we have two diffent kinds of block corruption:

- physical corruption (media corrupt)

- logical corruption (soft corrupt)

Physical corruption can be caused by defected memory boards, controllers or broken sectors on a hard disk;


Logical corrution can amoung other reasons be caused by an attempt to recover through a NOLOGGING action.


There are two initialization parameters for dealing with block corruption:

- DB_BOCK_CHECKSUM (calculates a checksum for each block before it is written to disk, every time)

causes 1-2% performance overhead

- DB_BLOCK_CHECKING (serverprocess checks block for internal consistency after every DML)

causes 1-10% performance overhead

If performance is not a big issue then you should use these!


Normally RMAN checks only for physically corrupt blocks
with every backup it takes and every image copy it makes.
This is a common misunderstanding amoung a lot of DBAs.
RMAN doesn not automatically detect logical corruption by default!
We have to tell it to do so by using CHECK LOGICAL!
The info about corruptions can be found in the following views:


SYS @ orcl AS SYSDBA SQL > select * from v$backup_corruption;


RECID STAMP SET_STAMP SET_COUNT PIECE# FILE# BLOCK#
———- ———- ———- ———- ———- ———- ———-
BLOCKS CORRUPTION_CHANGE# MAR CORRUPTIO
———- —————— — ———
1 586945441 586945402 3 1 5 81
4 0 YES CORRUPT



– SYS @ orcl AS SYSDBA SQL > select * from v$copy_corruption;

Here is a case study:


HR @ orcl SQL > select last_name, salary
2 from employees;


ERROR at line 2:
ORA-01578: ORACLE data block corrupted (file # 5, block # 83)
# this could be an ORA-26040 in Oracle 8i! and before
ORA-01110: data file 5: ‘/u01/app/oracle/oradata/orcl/
example01.dbf’



This is what you find in the alert_.log:
Wed Apr 5 08:17:40 2006
Hex dump of (file 5, block 83) in trace file
/u01/app/oracle/admin/orcl/udump/orcl_ora_14669.trc
Corrupt block relative dba: 0×01400053 (file 5, block 83)
Bad header found during buffer read
Data in bad block:
type: 67 format: 7 rdba: 0x0a545055
last change scn: 0×0000.0006d162 seq: 0×1 flg: 0×04
spare1: 0×52 spare2: 0×52 spare3: 0×0
consistency value in tail: 0xd1622301
check value in block header: 0x63be
computed block checksum: 0xe420
Reread of rdba: 0×01400053 (file 5, block 83)
found same corrupted data
Wed Apr 5 08:17:41 2006
Corrupt Block Found
TSN = 6, TSNAME = EXAMPLE
RFN = 5, BLK = 83, RDBA = 20971603
OBJN = 51857, OBJD = 51255, OBJECT = , SUBOBJECT =
SEGMENT OWNER = , SEGMENT TYPE =

Starting with Oracle 9i we can use RMAN

to check a database for both physically and logically corrupt blocks.
Here is the syntax:


RMAN> backup validate check logical database;

Starting backup at 05-04-2006:08:23:20
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=136 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00001 name=/u01/app/oracle/oradata/orcl/
system01.dbf
input datafile fno=00003 name=/u01/app/oracle/oradata/orcl/
sysaux01.dbf
input datafile fno=00005 name=/u01/app/oracle/oradata/orcl/
example01.dbf
input datafile fno=00002 name=/u01/app/oracle/oradata/orcl/
undotbs01.dbf
input datafile fno=00004 name=/u01/app/oracle/oradata/orcl/
users01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:45
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
including current control file in backupset
including current SPFILE in backupset
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
including current control file in backupset
including current SPFILE in backupset
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
Finished backup at 05-04-2006:08:24:10



RMAN does not physically backup the database with this command

but it reads all blocks and checks for corruptions.

If it finds corrupted blocks it will place the information about the corruption into a view:

SYS @ orcl AS SYSDBA SQL > select * from v$database_block_corruption;

FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
———- ———- ———- —————— ———
5 81 4 0 CORRUPT

this is what we find in the alert_.log:

Corrupt block relative dba: 0x014000b1 (file 5, block 177)
Bad header found during backing up datafile
Data in bad block:
type: 67 format: 7 rdba: 0x0a545055
last change scn: 0×0000.0007bc77 seq: 0×3 flg: 0×04
spare1: 0×52 spare2: 0×52 spare3: 0×0
consistency value in tail: 0xbc772003
check value in block header: 0xb32
computed block checksum: 0xe4c1
Reread of blocknum=177, file=/u01/app/oracle/oradata/orcl/
example01.dbf.
found same corrupt data


Now we can tell RMAN to recover all the blocks
which it has found as being corrupt:

RMAN> blockrecover corruption list;

# (all blocks from v$database_block_corruption)

Starting blockrecover at 05-04-2006:10:09:15
using channel ORA_DISK_1
channel ORA_DISK_1: restoring block(s) from datafile copy /u01/app/
oracle/flash_recovery_area/ORCL/datafile/o1_mf_example_236tmb1c_.dbf
starting media recovery
archive log thread 1 sequence 2 is already on disk as file /u01/app/oracle/
flash_recovery_area/ORCL/archivelog/2006_04_05/o1_mf_1_2_236wxbsp_.arc
archive log thread 1 sequence 1 is already on disk as file
/u01/app/oracle/oradata/
orcl/redo01.log
media recovery complete, elapsed time: 00:00:01
Finished blockrecover at 05-04-2006:10:09:24



this is in the alert_.log:

Starting block media recovery

Wed Apr 5 10:09:22 2006
Media Recovery Log /u01/app/oracle/flash_recovery_area/ORCL/
archivelog/2006_04_05/o1_mf_1_2_%u_.arc
Wed Apr 5 10:09:23 2006
Media Recovery Log /u01/app/oracle/flash_recovery_area/ORCL/
archivelog/2006_04_05/o1_mf_1_2_236wxbsp_.arc ( restored)
Wed Apr 5 10:09:23 2006
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1 Reading mem 0
Mem# 0 errs 0: /u01/app/oracle/oradata/orcl/redo01.log
Wed Apr 5 10:09:23 2006
Completed block media recovery


I recommend you to check your database for corrupt blocks
with RMAN on a regular basis, proactively.
If you do so you RMAN finds out about block corruptions
before your application runs into an ORA-01578 and
before you find out that you have backed up the corrupt blocks again and again.


There have been incidents when DBAs found out
that they did not have a backup with the un-corruted block any more,
because you have deleted the last one with a not corrupted version.
They could not recover the block any more!


For more detailed info about recovering corrupt blocks
(without and with RMAN, releases 7-10g)

pls see metalink also:
Subject: Handling Oracle Block Corruptions in Oracle7/8/8i/9i/10g
Doc ID: Note:28814.1 Type: BULLETIN

No comments:

Post a Comment