Detect And Repair Corruption in an Oracle Database

原文

Oracle provides a number of methods to detect and repair corruption within datafiles.

RMAN (BACKUP VALIDATE, RESTORE VALIDATE, VALIDATE)

Oracle Recovery Manager (RMAN) can validate the database using the BACKUP VALIDATE command.

RMAN> BACKUP VALIDATE DATABASE ARCHIVELOG ALL;

The process outputs the same information you would see during a backup, but no backup is created. Any block corruptions are visible in the V$DATABASE_BLOCK_CORRUPTION view, as well as in the RMAN output.

By default the command only checks for physical corruption. Add the CHECK LOGICAL clause to include checks for logical corruption.

RMAN> BACKUP VALIDATE CHECK LOGICAL DATABASE ARCHIVELOG ALL;

RMAN can validate the contents of backup files using the RESTORE VALIDATE command.

RMAN> RESTORE DATABASE VALIDATE;
RMAN> RESTORE ARCHIVELOG ALL VALIDATE;

In a similar way to the BACKUP VALIDATE command, the RESTORE VALIDATE command mimics the process of a restore, without actually performing the restore.

Prior to 11g, the straight VALIDATE command could only be used to validate backup related files. In Oracle 11g onward, the VALIDATE command can also validate datafiles, tablespaces or the whole database, so you can use it in place of the BACKUP VALIDATE command.

RMAN> VALIDATE DATAFILE 1;
RMAN> VALIDATE DATAFILE '/u01/app/oracle/oradata/ORCL/system01.dbf';

RMAN> VALIDATE CHECK LOGICAL DATAFILE 1;
RMAN> VALIDATE CHECK LOGICAL DATAFILE '/u01/app/oracle/oradata/ORCL/system01.dbf';

RMAN> VALIDATE TABLESPACE users;
RMAN> VALIDATE CHECK LOGICAL TABLESPACE users;

RMAN> VALIDATE DATABASE;
RMAN> VALIDATE CHECK LOGICAL DATABASE;

Any block corruptions are visible in the V$DATABASE_BLOCK_CORRUPTION view. You can identify the objects containing a corrupt block using a query like this.

COLUMN owner FORMAT A20
COLUMN segment_name FORMAT A30

SELECT DISTINCT owner, segment_name
FROM   v$database_block_corruption dbc
       JOIN dba_extents e ON dbc.file# = e.file_id AND dbc.block# BETWEEN e.block_id and e.block_id+e.blocks-1
ORDER BY 1,2;

More syntax examples can be found here.

Multitenant : RMAN VALIDATE

Many of the RMAN commands for the main instance can be targeted at a pluggable database (PDB) or a root container. For example, when connect to the root container you can do the following.

rman target=/

# Everything.
VALIDATE DATABASE;
VALIDATE CHECK LOGICAL DATABASE;

# Just the root container.
VALIDATE DATABASE ROOT;
VALIDATE CHECK LOGICAL DATABASE ROOT;

# Just the specified PDB, or comma-separated list.
VALIDATE PLUGGABLE DATABASE pdb1;
VALIDATE CHECK LOGICAL PLUGGABLE DATABASE pdb1;

If you are connected directly to the PDB you can issue the commands directly.

rman target=sys/SysPassword1@pdb1

# Just the specified PDB.
VALIDATE DATABASE;
VALIDATE CHECK LOGICAL DATABASE;

DBVerify

DBVerify is an external utility that allows validation of offline and online datafiles. In addition to offline datafiles it can be used to check the validity of backup datafiles.

C:\>dbv file=C:\Oracle\oradata\TSH1\system01.dbf feedback=10000 blocksize=8192

This utility is not usually used for controlfiles or redo logs, but in MOS Doc ID 1949795.1 there is an example of using it with controlfiles.

ANALYZE .. VALIDATE STRUCTURE

The ANALYZE command can be used to verify each data block in the analyzed object. If any corruption is detected rows are added to the INVALID_ROWS table.

-- Create the INVALID_ROWS table
SQL> @C:\Oracle\901\rdbms\admin\UTLVALID.SQL

-- Validate the table structure.
SQL> ANALYZE TABLE scott.emp VALIDATE STRUCTURE;

-- Validate the table structure along with all it's indexes.
SQL> ANALYZE TABLE scott.emp VALIDATE STRUCTURE CASCADE;

-- Validate the index structure.
SQL> ANALYZE INDEX scott.pk_emp VALIDATE STRUCTURE;

DB_BLOCK_CHECKING

When the DB_BLOCK_CHECKING parameter is set to [TRUE|HIGH] Oracle performs a walk through of the data in the block to check it is self-consistent. Unfortunately block checking can add between 1 and 10% overhead to the server. Oracle recommend setting this parameter to [TRUE|HIGH] if the overhead is acceptable.

Allowable values include [OFF|FALSE], LOW, MEDIUM, [HIGH|TRUE]. Read the definitions here.

Block Media Recovery (BMR)

Block Media Recovery (BMR) allows specified blocks to be recovered without affecting the entire datafile. It is only intended for use where a known and limited number of blocks is affected. This results in a reduced mean time to recovery (MTTR) and higher availability as only the affected blocks are offline during the operation. BMR can only be performed via RMAN using the BLOCKRECOVER command.

 Block Media Recovery is an Enterprise Edition feature.

Corrupt blocks can be identified using:

  • Error messages
  • The alert log
  • Trace files
  • ANALYZE [TABLE | INDEX] commands
  • The dbverify utility
  • The V$BACKUP_CORRUPTION & V$COPY_CORRUPTION views list corrupt blocks in the backups, not the database itself.
  • The V$DATABASE_BLOCK_CORRUPTION lists corrupt blocks in the database detected during a variety of RMAN operations. Recovered blocks will still be listed until the next backup is performed.

Once detected, corrupt blocks can be recovered individually. Alternatively, the CORRUPTION LIST option can be used to recover all blocks listed in the V$DATABASE_BLOCK_CORRUPTION view. This list can be limited using the UNTIL option.

BLOCKRECOVER DATAFILE 3 BLOCK 121;
BLOCKRECOVER CORRUPTION LIST RESTORE UNTIL TIME 'SYSDATE - 7';

DBMS_REPAIR

Unlike the previous methods dicussed, the DBMS_REPAIR package allows you to detect and repair corruption. The process requires two administration tables to hold a list of corrupt blocks and index keys pointing to those blocks. These are created as follows.

BEGIN
  DBMS_REPAIR.admin_tables (
    table_name => 'REPAIR_TABLE',
    table_type => DBMS_REPAIR.repair_table,
    action     => DBMS_REPAIR.create_action,
    tablespace => 'USERS');

  DBMS_REPAIR.admin_tables (
    table_name => 'ORPHAN_KEY_TABLE',
    table_type => DBMS_REPAIR.orphan_table,
    action     => DBMS_REPAIR.create_action,
    tablespace => 'USERS');
END;
/

With the administration tables built we are able to check the table of interest using the CHECK_OBJECT procedure.

SET SERVEROUTPUT ON
DECLARE
  v_num_corrupt INT;
BEGIN
  v_num_corrupt := 0;
  DBMS_REPAIR.check_object (
    schema_name       => 'SCOTT',
    object_name       => 'DEPT',
    repair_table_name => 'REPAIR_TABLE',
    corrupt_count     =>  v_num_corrupt);
  DBMS_OUTPUT.put_line('number corrupt: ' || TO_CHAR (v_num_corrupt));
END;
/

Assuming the number of corrupt blocks is greater than 0 the CORRUPTION_DESCRIPTION and the REPAIR_DESCRIPTION columns of the REPAIR_TABLE can be used to get more information about the corruption.

At this point the currupt blocks have been detected, but are not marked as corrupt. The FIX_CORRUPT_BLOCKS procedure can be used to mark the blocks as corrupt, allowing them to be skipped by DML once the table is in the correct mode.

SET SERVEROUTPUT ON
DECLARE
  v_num_fix INT;
BEGIN
  v_num_fix := 0;
  DBMS_REPAIR.fix_corrupt_blocks (
    schema_name       => 'SCOTT',
    object_name       => 'DEPT',
    object_type       => Dbms_Repair.table_object,
    repair_table_name => 'REPAIR_TABLE',
    fix_count         => v_num_fix);
  DBMS_OUTPUT.put_line('num fix: ' || TO_CHAR(v_num_fix));
END;
/

Once the corrupt table blocks have been located and marked all indexes must be checked to see if any of their key entries point to a corrupt block. This is done using the DUMP_ORPHAN_KEYS procedure.

SET SERVEROUTPUT ON
DECLARE
  v_num_orphans INT;
BEGIN
  v_num_orphans := 0;
  DBMS_REPAIR.dump_orphan_keys (
    schema_name       => 'SCOTT',
    object_name       => 'PK_DEPT',
    object_type       => DBMS_REPAIR.index_object,
    repair_table_name => 'REPAIR_TABLE',
    orphan_table_name => 'ORPHAN_KEY_TABLE',
    key_count         => v_num_orphans);
  DBMS_OUTPUT.put_line('orphan key count: ' || TO_CHAR(v_num_orphans));
END;
/

If the orphan key count is greater than 0 the index should be rebuilt.

The process of marking the table block as corrupt automatically removes it from the freelists. This can prevent freelist access to all blocks following the corrupt block. To correct this the freelists must be rebuilt using the REBUILD_FREELISTS procedure.

BEGIN
  DBMS_REPAIR.rebuild_freelists (
    schema_name => 'SCOTT',
    object_name => 'DEPT',
    object_type => DBMS_REPAIR.table_object);
END;
/

The final step in the process is to make sure all DML statements ignore the data blocks marked as corrupt. This is done using the SKIP_CORRUPT_BLOCKS procedure.

BEGIN
  DBMS_REPAIR.skip_corrupt_blocks (
    schema_name => 'SCOTT',
    object_name => 'DEPT',
    object_type => DBMS_REPAIR.table_object,
    flags       => DBMS_REPAIR.skip_flag);
END;
/

The SKIP_CORRUPT column in the DBA_TABLES view indicates if this action has been successful.

At this point the table can be used again but you will have to take steps to correct any data loss associated with the missing blocks.

Other Repair Methods

Other methods to repair corruption include:

  • Full database recovery.
  • Individual datafile recovery.
  • Recreate the table using the CREATE TABLE .. AS SELECT command, taking care to avoid the corrupt blocks by retricting the where clause of the query.
  • Drop the table and restore it from a previous export. This may require some manual effort to replace missing data.

For more information see:

Leave a Reply