Thursday, October 14, 2010

Why excessive redo logs and archive logs are getting generated when we start a hot backup of database ?

This question was asked always in all interview datafile get updated with online transaction during hot backup or not , if not than why ?


Most of people always failed to convenience interviewer with valid reason. The below comments will help out to understand the complete flow of data during hot backup.



Many of you must have heard or experienced that while taking hot backup of database LGWR process writes aggressively. Meaning that more redo data has been written to redo log file and consecutively more archive logs gets generated.



Here is the common misconception we have in our mind. If some one ask, why excessive redo logs and archive logs are getting generated when we start a hot backup of database ??????????

Quickly we answer .. Its simple, when we put tablespace in hot backup mode,
Oracle will take a check point of tablespace and data files belonging to this tablespace will be freezed.

Any user activity happening on objects belonging to this tablespace wont write data to these datafiles, instead it will write data to redo log files. So obviously there will be more redo log file generation.


Well, to some extent this is COMPLETELY WRONG !!!!!!!!!!!!!!!!!!


I will straight way come to the point and explain you what happens when we put the tablespace in hot backup mode.


When a tablespace is put into backup mode, the following three things happen:

1. Oracle checkpoints the tablespace, flushing all changes from shared memory to disk.


2. The SCN markers for each datafile in that tablespace are "frozen" at their current values. Even though further updates will be sent to the datafiles, the SCN markers will not be updated until the tablespace is taken out of backup mode.


3. Oracle switches to logging full images of changed database blocks to the redologs. Instead of recording how it changed a particular block (the change vector), it will log the entire image of the block after the change. This is why the redologs grow at a much faster rate while hot backups are going on.



Your first assumption that datafiles belonging to the tablespace in hot backup mode is freezed is wrong.

Datafiles are not freezed, only the datafile headers will be freezed !!!!!!!

So simply imagine that when you put the tablespace in backup mode, Oracle will take a checkpoint and update the datafile headers with checkpoint SCN and there after it is freezed until we take tablespace out of backup mode.

Other datafile (other then header part) remains as normal and data changes happens continuously to this datafile.

Now you may want to ask me “do I mean to say that datafiles gets updated continuously even when we are coping the same to backup location ?”. The answer is YES. Never think that the datafile you are coping is “Consistent”. No, datafiles gets changed continuously !!!!!!!!!!!!!!



You might want to ask couple of more questions then.

1) If we say that backup file is not consistent and changes continuously, then how come Oracle is able to recover the database when we restore that datafile?

2) If the data changes are anyway happening continuously on data files, then why there is excess redo log generation ?


Thats it !! don’t ask me more then this. Let me explain answers to these questions.



Consider a typical case, where an Oracle database is installed on Linux platform. The standard Oracle block size if 8K and lets say that OS level data block size is 512K.

Now when we put the tablespace in “Begin Backup” mode checkpoint has happened and datafile header is freezed.

You found which are the files related to this tablespace and started copying using OS command. Now when you copy a datafile using OS command it is going to copy as per OS block size.

Lets say when you start copying it gave 8 blocks to you to copy – that means you are copying 4K (512K X 4) to backup location.

That means you are copying half of Oracle block to backup location. Now this process of copy can be preempted by Server CPU depending on load. Lets say when you started copying after copy of those 8 block (4K, half of Oracle block), your process get preempted by CPU and it has allocated CPU time to some other important process.

Mean while DBWR process changes that block that you have copied halfway (since datafile is not freezed and only header is freezed, continuous updates can happen to datafile).


After a while CPU returns back and gives you next 8 blocks to copy (rest of the halk Oracle block). Now here is the problem !!!!!!!!!!!

we copied half of the oracle block taken at time T0 and another half taken at time T1 and in-between the data block got changed.

Does this sounds consistent ? Not to me !!!!!!!!!!!!!!!!!!!!!!!!!

Such type of block is called “Fractured Block”.

Well, since Oracle copies files like this it should do some thing, so that during recovery it wont face any problem.



Usually in case of a normal tablespace (which is not in begin backup mode), when a transaction happens oracle generates redo information and puts in redo log file.

This is the bare minimum information that oracle generates in order to redo the information. It does not copy the complete block.

Where as in case of begin backup mode, if a transaction happens and changes any block FOR THE FIST TIME, oracle copies the complete block to redo log file. This happens only during first time.

If subsequent transaction updates the same block again, oracle will not copy the complete block to redo, instead it will generate minimum information to redo the changes. Now because oracle has to copy the complete block when it changes for the first time in begin backup mode, we say that excess redo gets generated when we put tablespace in begin backup mode.


Question arises, why Oracle has to copy the complete block to redo log files. As you have seen above that during copy of datafile, there can be many fractured blocks, and during restore and recovery its going to put those block back and try to recover.

Now assume that block is fractured and oracle has minimum information that it generates in the redo. Under such condition it wont be possible for Oracle to recover such blocks. So instead Oracle just copies the entire block back from redo log files to datafiles during recovery process. This will make the datafile consistent.

So recovery process is very important which takes care of all fractured blocks and makes it possible to recover a database.

I hope this explains above 2 questions.


Now you can easily explain why hot backup is not possible if database is in NOARCHIVELOG mode.


When you take a backup using RMAN, it does not generate excessive redo logs. The reason is simple. RMAN is intelligent. It does not use OS block for copying, instead it uses oracle blocks for copying datafiles so the files are consistent.


Hope this helps !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

No comments:

Post a Comment