Friday, January 21, 2011

Logical Standby Databases - Support for DSS-xxxx Errors

What to do when you receive a DSS-00001: SQL Apply is not running:

Confirm that the logical is not applying changes and that there is actually an issue.
Check the SqlApply Events to determine the latest error and determine if the SqlApply was stopped manually, or has failed due to an error.

If it was stopped manually, caused by a database restart, or some other non-error reason, restart the SQL Apply and confirm that Apply is catching up.

If it failed due to an error, combine the information from the SqlApply Events and the database alert log and take appropriate action:

All schema related errors need to be sent to Application support such as
ORA- 01403 "no data found" error
ORA-00955 name is already used by an existing object
Any errors such as database parameters, memory issues, ORA-0600 and datafile errors should be resolved by DBA.

Do not skip any transactions without being 100% sure that there will be no impact. E.g. if for some reason there is a DDL to delete an objects that does not exist, although this should not occur, no impact could be caused from skipping drop statement for a non-existent object.

What to do when you receive a DSS-00002: DSS-00002: Logical standby guard status set to None
Nothing needs to be done immediately. This is really a warning that the correct protection level is not set correctly. This can always be resolved during business hours.
What to do when you receive a DSS-00003: Logical standby apply latency greater than .....
Check that the SqlApply is still running
Double check if the SqlApply is up to date. It can catch up quickly sometimes. Run this check a number of times to determine if the SqlApply is progressing, and if it is catching up or falling behind further.

Check what SqlApply processes are doing. This will show if they are waiting for an archive log, They are processing a transaction, or are paging LCR's out to disk.

If SqlApply is waiting for an archivelog, check the archive log destinations for space issues and that the destinations on the primary and standby are not in error.

If the SqlApply has been restarted and a log that is still required had been deleted, there are cases when the log will not ship as a FAL request and needs to be copied manually.

If you find a specific apply process looks to be hung on a single transaction, obtain the SID for that SqlApply process and perform normal dba investigation as you would for any troublesome session.



What to do when you receive a DSS-00004: Archive log shipping contains .... gap(s) between sequences ....

Check that there archive log destinations are correct and not in error. Check the FAL parameters "SHOW PARAMETER FAL" are correct and gaps should be resolved automaticly.

All the queries required to perform these checks are in the section below.
Answers and queries to help monitor the Logical standby databases.

The formatting for these queries assume that SqlPlus is used and that 132 character wide screen is available.

Is this database a Logical standby?

Run the following SQL against the database:

SELECT DATABASE_ROLE FROM V$DATABASE;

Logical standby databases will show "LOGICAL STANDBY"


Is the Logical standby applying changes?

Run the following SQL against the Logical standby database:

SET LINESIZE 132
COLUMN REALTIME_APPLY FORMAT A15
COLUMN STATE FORMAT A60
SELECT REALTIME_APPLY, STATE FROM V$LOGSTDBY_STATE;

If the value of STATE is "NULL" or "SQL APPLY NOT ON" then the Sql Apply is not running.

The value of REALTIME_APPLY should be Y to allow for real time apply from the standby redo logs.





Is the SQL Apply up to date?

Run the following SQL against the Logical standby database:

SELECT
TO_CHAR(LATEST_TIME,'yyyy/mm/dd hh24:mi:ss') "LATEST_TIME",
TO_CHAR(APPLIED_TIME,'yyyy/mm/dd hh24:mi:ss') "APPLIED_TIME",
APPLIED_SCN, LATEST_SCN
FROM V$LOGSTDBY_PROGRESS;

If the LATEST_TIME and the APPLIED_TIME are close, then the SQL Apply is working fine.

A NULL value in the APPLIED_TIME may indicate that the SQL Apply is not running.
If the LATEST_TIME is not close to the current real time, there may be an issue with receiving the archive logs from the primary.


Which archive log sequences are at what stage for logical standby?

Run the following SQL against the Logical standby database:

SELECT 'RESTART' "TYPE", P.RESTART_SCN "SCN", TO_CHAR(P.RESTART_TIME,'yyyy/mm/dd hh24:mi:ss') "TIME", L.SEQUENCE# "SEQ#"
FROM V$LOGSTDBY_PROGRESS P,DBA_LOGSTDBY_LOG L WHERE P.RESTART_SCN >= L.FIRST_CHANGE# and P.RESTART_SCN < L.NEXT_CHANGE#
UNION
SELECT 'RESTART', P.RESTART_SCN, TO_CHAR(P.RESTART_TIME,'yyyy/mm/dd hh24:mi:ss'), L.SEQUENCE#
FROM V$LOGSTDBY_PROGRESS P, V$STANDBY_LOG L WHERE P.RESTART_SCN >= L.FIRST_CHANGE# and P.LATEST_SCN <=L.LAST_CHANGE#
UNION
SELECT 'APPLIED', P.APPLIED_SCN, TO_CHAR(P.APPLIED_TIME,'yyyy/mm/dd hh24:mi:ss'), L.SEQUENCE#
FROM V$LOGSTDBY_PROGRESS P,DBA_LOGSTDBY_LOG L WHERE P.APPLIED_SCN >= L.FIRST_CHANGE# and P.APPLIED_SCN < L.NEXT_CHANGE#
UNION
SELECT 'APPLIED', P.APPLIED_SCN, TO_CHAR(P.APPLIED_TIME,'yyyy/mm/dd hh24:mi:ss'), L.SEQUENCE#
FROM V$LOGSTDBY_PROGRESS P, V$STANDBY_LOG L WHERE P.APPLIED_SCN >= L.FIRST_CHANGE# and P.LATEST_SCN <=L.LAST_CHANGE#
UNION
SELECT 'MINING', P.MINING_SCN, TO_CHAR(P.MINING_TIME,'yyyy/mm/dd hh24:mi:ss'), L.SEQUENCE#
FROM V$LOGSTDBY_PROGRESS P, DBA_LOGSTDBY_LOG L WHERE P.MINING_SCN >= L.FIRST_CHANGE# and P.MINING_SCN < L.NEXT_CHANGE#
UNION
SELECT 'MINING', P.MINING_SCN, TO_CHAR(P.MINING_TIME,'yyyy/mm/dd hh24:mi:ss'), L.SEQUENCE#
FROM V$LOGSTDBY_PROGRESS P, V$STANDBY_LOG L WHERE P.MINING_SCN >= L.FIRST_CHANGE# and P.LATEST_SCN <=L.LAST_CHANGE#
UNION
SELECT 'SHIPPED', P.LATEST_SCN, TO_CHAR(P.LATEST_TIME,'yyyy/mm/dd hh24:mi:ss'), L.SEQUENCE#
FROM V$LOGSTDBY_PROGRESS P, DBA_LOGSTDBY_LOG L WHERE P.LATEST_SCN >= L.FIRST_CHANGE# and P.LATEST_SCN < L.NEXT_CHANGE#
UNION
SELECT 'SHIPPED', P.LATEST_SCN, TO_CHAR(P.LATEST_TIME,'yyyy/mm/dd hh24:mi:ss'), L.SEQUENCE#
FROM V$LOGSTDBY_PROGRESS P, V$STANDBY_LOG L WHERE P.LATEST_SCN >= L.FIRST_CHANGE# and P.LATEST_SCN <=L.LAST_CHANGE#;


An important piece of information is the SEQ# associated with RESTART. Although archive logs may have been mined and applied, the archive logs back to the RESTART SEQ# are required in the event that the SQL Apply is restarted.



********

What major Sql Apply events have occurred?

Run the following SQL against the Logical standby database:

SET LINESIZE 200
SET LONG 400
SET PAGESIZE 999
column EVENT_TIME FORMAT A20
column STATUS FORMAT A50
column EVENT FORMAT A100
SELECT TO_CHAR(EVENT_TIME,'YYYY/MM/DD HH24:MI:SS') "EVENT_TIME", STATUS, EVENT FROM DBA_LOGSTDBY_EVENTS ORDER BY EVENT_TIME;


By default, only the last 100 events are listed, you may be lucky enough to find older events in the alert.log file, however this will not always show all the information.


What major Dataguard events have occurred?
Run the following SQL against the Logical standby database:
SET PAGESIZE 999
SET LINESIZE 132
column TIME FORMAT A19
column ERROR FORMAT 999999
column DEST FORMAT 9999
column MESSAGE FORMAT A90
SELECT TO_CHAR(TIMESTAMP,'yyyy/mm/dd hh24:mi:ss') "TIME", ERROR_CODE "ERROR", DEST_ID "DEST", MESSAGE
FROM V$DATAGUARD_STATUS
WHERE timestamp > TRUNC(sysdate+6/24)
ORDER by timestamp DESC;


Which objects or statements are being skipped by Sql Apply?

Run the following SQL against the Logical standby database:

set linesize 132
set pagesize 999
column STATEMENT_OPT FORMAT A30
column OWNER FORMAT A20
column NAME FORMAT A30
column PROC FORMAT A30
SELECT STATEMENT_OPT, OWNER, NAME, PROC FROM DBA_LOGSTDBY_SKIP;

To check if there are specific transactions scheduled to be skipped, run the following SQL against the Logical standby database:

SELECT * FROM DBA_LOGSTDBY_SKIP_TRANSACTION;
What are the running Sql Apply processes doing?
Run the following SQL against the Logical standby database:
set linesize 132
COLUMN SID FORMAT 99999
COLUMN SERIAL# FORMAT 9999999
COLUMN LOGSTDBY_ID FORMAT 99999
COLUMN SPID FORMAT A10
COLUMN TYPE FORMAT A12
COLUMN STATUS_CODE FORMAT 99999
COLUMN STATUS FORMAT A50
COLUMN HIGH_SCN FORMAT 9999999999
SELECT SID, SERIAL#, LOGSTDBY_ID, SPID, TYPE, STATUS_CODE, HIGH_SCN, STATUS
FROM v$logstdby_process;


The STATUS column is very descriptive of what each Sql Apply process is doing.

Where are the archive logs going and are there any achieving issues?

Run the following SQL against either the logical standby or primary database:
set linesize 150
column DID FORMAT 999
column STATUS FORMAT A10
column DESTINATION FORMAT A30
column ARCHIVER FORMAT A4
column VALID_TYPE FORMAT A15
column VALID_ROLE FORMAT A12
column VALID_NOW FORMAT A16
column RECOVERY_MODE FORMAT A30
column ERROR FORMAT A40
SELECT DEST_ID "DID", STATUS, DESTINATION, ARCHIVER, VALID_NOW, VALID_TYPE, VALID_ROLE, ERROR FROM V$ARCHIVE_DEST
WHERE STATUS <> 'INACTIVE';


What are the current Sql Apply settings and statistics?

Run the following SQL against either the logical standby or primary database:
SELECT NAME, VALUE FROM V$LOGSTDBY_STATS;

Lots of useful information contained in this table.

How do I tell if there are not enough Apply processes?

Run the following SQL against either the logical standby or primary database:

COLUMN TRAN_APPLIED FORMAT A10
COLUMN TRAN_READY FORMAT A10
COLUMN NUM_APPLIERS FORMAT A10
COLUMN RATIO FORMAT 9999999999
SELECT TA.TRAN_APPLIED, TR.TRAN_READY, AP.NUM_APPLIERS, (TA.TRAN_APPLIED - TR.TRAN_READY )/ AP.NUM_APPLIERS "RATIO"
FROM
(SELECT VALUE "TRAN_READY" FROM V$LOGSTDBY_STATS WHERE NAME ='transactions ready') TR,
(SELECT VALUE "TRAN_APPLIED" FROM V$LOGSTDBY_STATS WHERE NAME ='transactions applied') TA,
(SELECT VALUE "NUM_APPLIERS" FROM V$LOGSTDBY_STATS WHERE NAME ='number of appliers') AP;

The RATIO column should return a value of 0 or greater.

A value of between 0 and 1 means there are idle apply processes and there is no gain by adding additional apply processes.

A value of between 1 and 2 means there are queued transactions waiting to be applied and there may be minimal gain by adding additional apply processes.

A value of greater that 2 shows contention on the apply processes and indicates additional apply processes need to be added.

This processes needs to be ran over a period of time as it only measures instantaneous values.


How do I skip a specific transaction?

1. Obtain the XIDUSN, XIDSLT, AND XIDSQN values for the failing transaction from the DBA_LOGSTDBY_EVENTS table. The following SQL may be of help:

SET LINESIZE 200
SET LONG 400
SET PAGESIZE 999
column EVENT_TIME FORMAT A20
column STATUS FORMAT A50
column CURRENT_SCN 999999999999999
column COMMIT_SCN 999999999999999
column XIDUSN FORMAT 999999
column XIDSLT FORMAT 999999
column XIDSQN FORMAT 999999
SELECT TO_CHAR(EVENT_TIME,'YYYY/MM/DD HH24:MI:SS') "EVENT_TIME", STATUS, CURRENT_SCN, COMMIT_SCN, XIDUSN, XIDSLT, XIDSQN
FROM DBA_LOGSTDBY_EVENTS
WHERE EVENT_TIME > SYSDATE-1
and status like 'ORA-00955%'
ORDER BY EVENT_TIME;


2. Execute the skipped transaction manually with any changes as needed

3. Skip the failed DDL statement by using the DBMS_LOGSTDBY.SKIP_TRANSACTION
procedure. With values from step 1.

exec dbms_logstdby.skip_transaction(, , );

The following statement may be helpful:

SELECT 'exec dbms_logstdby.skip_transaction( '||XIDUSN||','|| XIDSLT||','|| XIDSQN||');' FROM DBA_LOGSTDBY_EVENTS where ...


4. Restart Sql Apply.
ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;

How do stop and start the SqlApply processes?
Run the following SQL against the logical standby to start realtime SqlApply:
ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;


Run the following SQL against the logical standby to start realtime SqlApply if the SqlApply failed with an error, and you are 100% certain that the transaction is safe to skip :

ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE SKIP FAILED TRANSACTION;

Run the following SQL against the logical standby to stop SqlApply:
ALTER DATABASE STOP LOGICAL STANDBY APPLY;

No comments:

Post a Comment