OraZone4All: Dealing with Database Block corruption

DB_BLOCK_CHECKSUM (default true) - This helps to detect damage introduced by the disk or I/O system. The system uses a check sum in the header of the block to detect corruption, this has minimal impact on performance. The system tablespace is always set to true regardless of this parameter, others are only set when you enable them. Oracle checks the disk not the memory.

DB_BLOCK_CHECKING (default off - off, low, medium, full, true and false) - Help to detect damage introduced by faulty memory. Setting this option will impact performance, the processes will check for corruption every time the buffer containing the block is accessed, this could take up to 10% cpu performance.

Both db_block_checksum (set to true) and db_block_checking (set to low) for the system tablespace.

db_block_checksum

alter system set db_block_checksum = typical;
alter system set db_block_checksum = full;
Notes: additional overhead will be occurred
typical - 1-2% increase
full - 4-5% increase

db_block_checking

alter system set db_block_checking = off;
alter system set db_block_checking = low;
alter system set db_block_checking = medium;
alter system set db_block_checking = full;
alter system set db_block_checking = false; /* same as off */
alter system set db_block_checking = true; /* same as full */

Note: additional overhead will be occurred
low - 1-3% increase
medium - 3-6% increase
full - 6-10% increase

Normally you are informed by a corrupted block from a user trying to
 use the block, an alert will be sent to the alert log. An ORA-01578 
indicated a corrupted block and trace files will be created, use the 
DBA_EXTENTS table to match up the corrupted block to an object:

Example

select owner, segment_name, segment_type from dba_extents where file_id=7 and 5 between block_id and block_id + blocks-1;

select segment_name, file_id, block_id from dba_extents where owner = 'TEST01';

If a table has corrupted blocks but the primary key index does not, 
index searches may succeed depending on what columns/rows are selected, 
as long as the index that has the information to complete the query and 
not does not read the corrupted blocks.

DBverify:

dbverify is a oracle binary that can detect corrupted blocks in a datafile, it is used when running user managed backups, rman performs its own verification. dbverify checks all blocks including the
 blocks above the high watermark. dbverify can check offline/online 
datafiles and data file image copies but nothing else. dbverify is used 
at the operating system level it checks the structural integrity of the 
datafiles.

help page	$dbv help=y
test datafile	$dbv file=test01.dbf Note: you are looking for failing pages (blocks), an influx block is one that DBWn was writing while dbverify was trying to verify it, this is not an error and dbverify will try again until it gets consistent read.
test datafile with specific blocksize	$dbv blocksize=8192 file=test01.dbf logfile=/tmp/dbv_13022013.log

you can also use the analyze command to check for table or index corruption, however the analyze
 command will only check for corruption below the high watermark, so 
will not check unused space, it does not identify the corrupt block but 
gives a ORA-01498 error.

table only	analyze table <table_name> validate structure;
index only	analyze index <table_name> validate structure;
table and associated indexes	analyze table <table_name> validate structure cascade;

dbms_repair:

Is a set of procedures that will check objects for problems and make 
the object useable, note that it cannot repair the data (you are 
required to recover it). You must create a table that dbms_repair uses to store its output.

There are 4 procedures that will be used

dbms_repair.admin_tables - used to store the block address
dbms_repair.check_object - checks all the blocks
dbms_repair.fix_corrupt_object - marks the corrupted blocks (does not fix it)
dbms_repair.skip_corrupted_blocks - instructs oracle to ignore marked corrupted blocks

create the admin table	exec dbms_repair.admin_tables ( table_name => 'REPAIR_TABLE', table_type => dbms_repair.repair_table, action => dbms_repair.create_action, tablespace => 'TEST'); select owner, object_name, object_type from dba_objects where object_name like ‘%REPAIR_TABLE’; Note: The table REPAIR_CORRUPT_TAB will store details of any problems encountered when checking a table (must have prefix REPAIR_).
check the object	declare num_corrupt int; begin num_corrupt := 0; exec dbms_repair.check_object(schema_name =>'TEST03',object_name =>'TEST03',repair_table_name =>'REPAIR_TABLE',corrupt_count => num_corrupt); end; / select object_name, block_id, corrupt_type, marked_corrupt, corrupt_description, repair_description from repair_table; Note: If no rows exist in the REPAIR_TABLE then the table is ok. Referential integrity constraints on related tables can be broken, indexes can become out of sync with the table data and triggers on a table can cause logical corruption when using the DBMS_REPAIR package.
mark the block as corrupt	declare fix_block_count int; begin fix_block_count := 0; dbms_repair.fix_corrupt_blocks(schema_name =>'TEST03',object_name =>'TEST03', object_type => dbms_repair.table_ojbects, repair_table_name => ‘REPAIR_TABLE’, fix_count => fix_block_count); select object_name, block_id, marked_corrupt from repair_table;
skip the corrupted blocks	exec dbms_repair.skip_corrupt_blocks(schema_name =>'TEST03',object_name =>'TEST03', object_type => dbms_repair.table_ojbects, flags => dbm_repair.noskip_flag);

Recover corrupted block with RMAN 

If you are using user-managed backups then restore the file in the normal way

take damaged file offline
restore file from the backup made before the corruption
recover the file completely
bring recovered file online

RMAN can be set that it can either abort a backup when corruption is 
detected or specify a tolerance recording any corruption with the 
repository. By default RMAN only checks for physical corruptions (disk problem) not logical 
(software corruption disk is ok but oracle does not understand the 
block). 

BMR (block media recovery) can only be performed by RMAN and recovery either completes or fails (no incomplete recovery). It can only recover datafiles as it uses the redo log to recover. If the MAXCORRUPT setting has been set then RMAN backup will continue until this threshold, otherwise it will fail. If 
recovering a table while the database is up, indexes searches may 
succeed depending on what columns/rows are selected, if a user hits a 
block that is being recovered it will report an ORA-01578 error.

set corruption threshold	rman> run { set maxcorrupt for datafile 7 to 100; backup datafile 7; }
turn off block checking	rman> backup nochecksum datafile 7;
check for logical corruption	rman> backup check logical datafile 7;
display corruption	select * from v$database_block_corruption;
display corruption in backup set	select * from v$backup_corruption;
display corruption in image copy	select * from v$copy_corruption;

To recover a corrupted block using RMAN:

recover one block	rman> blockrecover datafile 7 block 5;
recover two datafiles and a number of blocks	rman> blockrecover datafile 7 block 5,6,7 datafile 9 block 21,25;
use a specified backup	rman> blockrecover datafile 7 block 5 from backupset 1093; rman> blockrecover datafile 7 block 5 from tag monthly_whole_backup;
recover blocks using backups made a week ago	rman> blockrecover corruption list until time sysdate – 7;

You can follow these steps as well:

1) Restore a backup from before this corruption occurs and recover.

STEPS TO BE Followed:

Step a. shutdown the database
Step b. restore the datafile
Step c. startup mount
Step d. recover database
        auto
Step e. alter database open
Step f. analyze table sys.source$ validate structure cascade;

2) If 1 is not an option (but 1 is the best solution) then we can try this option.

a. select obj#, rowid from sys.source$;

— When this stops/fails the last line should give you the obj# of the object affected by the corruption.

b. select OWNER#, NAME, TYPE#, STATUS from sys.obj$ where OBJ# = 

— This should give you the name of the package/procedure hit be the corruption.

c. exec DBMS_REPAIR.SKIP_CORRUPT_BLOCKS('SYS','source$');

— This will skip the corrupt block, and might allow you to re-create the object hit by the corruption.

IF you are not allowed to re-create the object – then after DBMS_REPAIR.
SKIP_CORRUPT_BLOCKS you should be able to perform an full Export of the database excluding the procedure/package hit by the corruption.

d. Use the Export to re-create the database and afterwards manually re-create the object that was skipped by the Export.

BUT, this is only a last option if 1 (restore/recover) really isn't an option, and it is with no guarantee for success.
The only way to guarantee that you will not run into further problems with this block is to recreate your database or go to a backup where the problem does not exist. 

Dump a block of data:

find the block data	select header_file, header_block from dba_segments where segment_name = 'TEST03';
dump the block data	alter system dump datafile 6 block 75; Note: the information will be in a trace file in the user dump directory (udump)
Using dumped info comfirm its the right object	select name from sys.obj$ where obj#='52716'; Note: object number objected from dumped information in trace file

Dump the tree of an index:

object object ID	select object_id from user_objects where object_type = 'INDEX' & object_name = 'TEMP_INDEX';
dump the tree	alter session set events 'immediate trace name treedump level <object_id>'; Note: object ID would have been obtained above, the information again is dumped into the user dump area in a trace file

OraZone4All

Wednesday, February 13, 2013

Dealing with Database Block corruption

No comments:

Post a Comment

Oracle ASM Concepts