Sunday, September 26, 2010

Redo Concepts

Redo:

All changes to the database are recorded by redo. Redo includes all changes to datafiles, but does not include changes to control files or the parameter file.

Redo is initially written to online redo logs. The contents of a redo log file depend on a combination of Oracle version, operating system and server architecture. In general redo logs written on one architecture cannot be read on another. There are a few exceptions to this rule. For example, in Oracle 10.2 a redo log written in Linux can be read by a Windows database.
Redo Threads

Each online redo log has a thread number and a sequence number. The thread number is mainly relevant in RAC databases where there can be multiple threads; one for each instance. The thread number is not necessarily the same as the instance number. For single instance databases there is only one redo log thread at any time.
Redo Log Groups

A redo thread consists of two or more redo log groups.

Each redo log group contains one or more physical redo log files known as members. Multiple members are configured to provide protection against media failure (mirroring). All members within a redo log group should be identical at any time.

Each redo log group has a status. Possible status values include UNUSED, CURRENT, ACTIVE and INACTIVE. Initially redo log groups are UNUSED. Only one redo log group can be CURRENT at any time. Following a log switch, the redo log group remains ACTIVE until a checkpoint has completed; thereafter it becomes INACTIVE until it is reused by the LGWR background process.
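The lifecycle just described can be sketched as a tiny transition table (an illustration of the status flow above, not an Oracle API):

```python
# Status lifecycle of a redo log group, as described above:
# UNUSED -> CURRENT -> ACTIVE -> INACTIVE -> CURRENT (on reuse)
TRANSITIONS = {
    ("UNUSED", "first use"): "CURRENT",
    ("CURRENT", "log switch"): "ACTIVE",            # still needed until checkpoint
    ("ACTIVE", "checkpoint complete"): "INACTIVE",
    ("INACTIVE", "reused by LGWR"): "CURRENT",
}

def next_status(status, event):
    """Return the status a redo log group moves to after the given event."""
    return TRANSITIONS[(status, event)]
```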
Log Switches

Log switches occur when the online redo log becomes full. Alternatively log switches can be triggered externally by commands such as:

ALTER SYSTEM SWITCH LOGFILE;

When a log switch occurs, the sequence number is incremented and redo continues to be written to the next file in the sequence. If archive logging is enabled, then following a log switch the completed online redo log is copied to the archive log destination(s), either by the ARCn background process or by the LNSn background process, depending on the configuration.
Redo Log Files

A redo log file consists of a number of fixed size blocks. The overall size of the redo log file is specified when the log group is created. For most platforms, including Linux and Solaris, the redo log block size is 512 bytes. On other platforms, including HP-UX on Itanium, the redo log block size can be 1024 bytes.

Each redo log file has a fixed header. In recent versions of Oracle (8.0 and above) this header is two blocks. Therefore on Linux/Solaris the header is 1024 bytes. The second block of the header contains a standard Oracle file header which includes the following information:

* Database name
* Thread
* Compatibility Version
* Start Time
* End Time
* Start SCN
* End SCN

Other data is stored in the header. Note that the End SCN is actually the Start SCN of the next redo log file.
Redo Blocks

The body of the redo log file is used to store redo blocks. Each redo block has a 16 byte header (Oracle 9.2 and 10.2). The remainder of each redo block is used to store redo records.
Redo Records

Redo records are a logical structure. The upper size limit is probably 65536 bytes. Redo records can therefore span multiple physical redo blocks. A physical redo block can also contain multiple redo records.
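As a rough illustration of how records map onto blocks, assuming the 512-byte block size and 16-byte block header mentioned in this article, a lower bound on the number of blocks a record can span works out as follows (the function and constants are mine, for illustration only):

```python
import math

BLOCK_SIZE = 512           # typical redo block size on Linux/Solaris (see above)
BLOCK_HEADER = 16          # redo block header size in 9.2/10.2 (see above)
PAYLOAD = BLOCK_SIZE - BLOCK_HEADER   # usable bytes per redo block

def blocks_spanned(record_bytes):
    """Minimum number of redo blocks needed to hold a record of this size.

    This ignores where the record starts within its first block, so it is
    a lower bound: a record starting mid-block can span one block more.
    """
    return math.ceil(record_bytes / PAYLOAD)
```

For example, a record at the suggested 65536-byte upper limit must span well over a hundred 512-byte blocks, while a small record fits in one.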

Each redo record has a header. The VLD field in the redo record header specifies the type of the redo record. The size of the redo record header varies depending on the type.

In Oracle 9.2 the redo record header is normally 12 bytes, though it can occasionally grow to 28 bytes. In Oracle 10.2 the redo record header is normally 24 bytes, though under some circumstances it can grow to 68 bytes.

The following is an example of a redo record header from Oracle 10.2:

REDO RECORD - Thread:1 RBA: 0x000092.00000193.0088 LEN: 0x0050 VLD: 0x01
SCN: 0x0000.00181068 SUBSCN: 1 05/07/2009 21:53:48

The header includes the following fields:

* Thread - redo log thread number
* RBA - redo byte address - address of the redo record within the redo log. The format is sequence#.block#.offset, as in the example above
* LEN - length of redo record in bytes including header
* VLD - see below
* SCN - system change number of redo record
* SUBSCN: Unknown
* Timestamp e.g. 05/07/2009 21:53:48
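The RBA in the example above can be unpacked with a short sketch (the function name and the 512-byte default block size are my assumptions, per the block sizes discussed earlier):

```python
def parse_rba(rba, block_size=512):
    """Split an RBA such as '0x000092.00000193.0088' into its three parts
    (log sequence#, block# within the log, byte offset within that block)
    and compute the byte position within the redo log file."""
    seq_s, block_s, offset_s = rba.split(".")
    sequence = int(seq_s, 16)    # log sequence number
    block = int(block_s, 16)     # block number within the redo log
    offset = int(offset_s, 16)   # byte offset within that block
    byte_pos = block * block_size + offset
    return sequence, block, offset, byte_pos
```

Applied to the example RBA 0x000092.00000193.0088 this gives sequence 146, block 403, offset 136.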

The VLD field determines the size of the redo record header. Known values are shown in the following table. These values may vary from one release to another.

Mnemonic  Value  Description
KCRVOID       0  The contents are not valid
KCRVALID      1  Includes change vectors
KCRCOMIT      2  Includes commit SCN
KCRDEPND      4  Includes dependent SCN
KCRNMARK      8  New SCN mark record. SCN allocated exactly at this point in the redo log by this instance
KCROMARK     16  Old SCN mark record. SCN allocated at or before this point in the redo. May be allocated by another instance
KCRORDER     32  New SCN was allocated to ensure redo for some block would be ordered by inc/seq# when redo sorted by SCN
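Since the values are powers of two, VLD looks like a bit mask; here is a sketch of decoding it under that assumption (the mnemonics are transcribed from the table, the decoding logic is my interpretation):

```python
# Bit flags for the VLD field of a redo record header (from the table above).
VLD_FLAGS = {
    0x01: "KCRVALID",   # includes change vectors
    0x02: "KCRCOMIT",   # includes commit SCN
    0x04: "KCRDEPND",   # includes dependent SCN
    0x08: "KCRNMARK",   # new SCN mark record
    0x10: "KCROMARK",   # old SCN mark record
    0x20: "KCRORDER",   # SCN allocated to order redo for a block
}

def decode_vld(vld):
    """Return the list of mnemonics set in a VLD value; 0 means KCRVOID."""
    if vld == 0:
        return ["KCRVOID"]
    return [name for bit, name in sorted(VLD_FLAGS.items()) if vld & bit]
```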
Change Vectors


A redo record consists of one or more change records known as change vectors. Each change vector consists of:

* change header
* list of element lengths
* list of elements

The size of the change header is 28 bytes in both Oracle 9.2 and 10.2.

The list of element lengths has a two byte header specifying the overall length of the element length list in bytes. The length of each element is stored in a two byte field. Finally if the structure does not align on a four byte boundary, a further two byte field is appended.

The list of elements consists of one or more elements aligned on a four byte boundary. Element sizes can range from four bytes to at least 32K.
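The element-length-list layout just described can be expressed as a small helper (a sketch of the rules above, not Oracle code):

```python
def length_list_size(num_elements):
    """Size in bytes of a change vector's element-length list:
    a 2-byte header, a 2-byte length field per element, plus a
    2-byte pad when the total would not land on a 4-byte boundary."""
    size = 2 + 2 * num_elements
    if size % 4 != 0:
        size += 2   # alignment pad
    return size
```

So a one-element list occupies 4 bytes, while a two-element list needs the pad and occupies 8.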

If supplemental logging is enabled then for update operations (operation 11.5), additional elements are appended to the change vector containing the primary key, unique key or column values of the row.
Operation Codes

Each change vector has an operation code. In Oracle 9.2 there were over 150 redo log operations; this number has grown significantly in Oracle 10.2 though the exact figure is not known. The operation code consists of a major number and a minor number.

The major number describes the level in the kernel where the redo is generated. The following table shows common levels:

Level Description
4 Block Cleanout
5 Transaction Layer (Undo)
10 Index Operation
11 Table Operation (DML)
13 Block Allocation
14 Extent Allocation
17 Backup Management
18 Online Backup
19 Direct Load
20 Transaction Metadata (LogMiner)
22 Space Management (ASSM)
23 Block Write (DBWR)
24 DDL Statement
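To illustrate the major/minor split, here is a sketch that parses an operation code such as 11.5 and looks up the layer using the table above (the dictionary simply transcribes that table; the function name is mine):

```python
# Major operation-code numbers and the kernel layers they correspond to,
# transcribed from the table above.
LAYERS = {
    4: "Block Cleanout",
    5: "Transaction Layer (Undo)",
    10: "Index Operation",
    11: "Table Operation (DML)",
    13: "Block Allocation",
    14: "Extent Allocation",
    17: "Backup Management",
    18: "Online Backup",
    19: "Direct Load",
    20: "Transaction Metadata (LogMiner)",
    22: "Space Management (ASSM)",
    23: "Block Write (DBWR)",
    24: "DDL Statement",
}

def describe_opcode(opcode):
    """Split an operation code such as '11.5' into major (layer) and
    minor (subcode) parts and look up the layer description."""
    major_s, minor_s = opcode.split(".")
    return LAYERS.get(int(major_s), "Unknown layer"), int(minor_s)
```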

For each level there are one or more subcodes.

Log File Dumps

Symbolic dumps can be created for both online redo logs and archived redo logs using the following syntax:

ALTER SYSTEM DUMP LOGFILE '<filename>';

For online redo logs the filename of the current redo log can be obtained using the following SQL:

SELECT member FROM v$logfile
WHERE group# =
(
SELECT group# FROM v$log
WHERE status = 'CURRENT'
);


*******************************************

Oracle Streams Apply Process changes in 11GR2

I’ve setup a simple replication for table t1 from schema src to schema dst.
Changed Apply Server parallelism to 1 and did a simple test with inserting 100 rows while performing a sql trace:



SQL>desc src.t1;

Name Type Nullable Default Comments
---- ------------- -------- ------- --------
N NUMBER
V VARCHAR2(100) Y

SQL> select count(*) from src.t1;

COUNT(*)
----------
0

SQL> select count(*) from dst.t1;

COUNT(*)
----------
0

SQL> select sid from v$streams_apply_server;

SID
----------
22

SQL> exec dbms_monitor.session_trace_enable(22, waits => true);

PL/SQL procedure successfully completed

SQL> insert into src.t1
2 select level, to_char(level)
3 from dual
4 connect by level <= 100;

100 rows inserted

SQL> commit;

Commit complete

SQL> select count(*) from dst.t1;

COUNT(*)
----------
100

SQL> exec dbms_monitor.session_trace_disable(22);

PL/SQL procedure successfully completed


******************

To my surprise, when I went to take a look at the trace file, I couldn’t find anything related to the Apply process inserting rows into dst.t1, only a handful of internal housekeeping statements.

This made me curious as to where all the work really went, so I’ve decided to take a look at v$sql and see if there were any clues:

******************

SQL> select sql_text
from v$sql
where lower(sql_text) like '%insert%dst%t1%';

SQL_TEXT
--------------------------------------------------------------------------------
select sql_text from v$sql where lower(sql_text) like '%insert%dst%t1%'




************


The only thing I was able to see there was . . . my own statement. Things were starting to look serious.

In order to finally figure out who did what I’ve launched a logminer:



SQL> begin
dbms_logmnr.start_logmnr(
startTime => sysdate-30/1440,
endTime => sysdate,
Options => dbms_logmnr.DICT_FROM_ONLINE_CATALOG+dbms_logmnr.CONTINUOUS_MINE
);
end;
/

PL/SQL procedure successfully completed


*********************


Let’s see who actually inserted all these rows:



SQL> select * from (
select session#, sql_redo
from v$logmnr_contents
where operation='INSERT'
and seg_owner='DST'
and table_name='T1'
order by timestamp desc
) where rownum <= 5;

SESSION# SQL_REDO
---------- --------------------------------------------------------------------------------
22 insert into "DST"."T1"("N","V") values ('1','1');
22 insert into "DST"."T1"("N","V") values ('5','5');
22 insert into "DST"."T1"("N","V") values ('4','4');
22 insert into "DST"."T1"("N","V") values ('3','3');
22 insert into "DST"."T1"("N","V") values ('2','2');



********

Session with SID 22 is nothing else but our Apply Server . . .

The next step was to try and figure out whether we’re really dealing with some new codepath responsible for such spectacular performance (apparently, due to the complete lack of instrumentation :) or this is just another weird Oracle bug.

I’ve blocked the Apply Server on updating a row and looked at the Apply Server’s stack:


************


[oracle@ora11gr2 trace]$ pstack 17036
#0  0x0000003b83cd450a in semtimedop () from /lib64/libc.so.6
#1  0x00000000085ef3f3 in sskgpwwait ()
#2  0x00000000085ee5c6 in skgpwwait ()
#3  0x000000000829ee31 in ksliwat ()
#4  0x000000000829e422 in kslwaitctx ()
#5  0x0000000000af92f5 in ksqcmi ()
#6  0x00000000082ac019 in ksqgtlctx ()
#7  0x00000000082aa77a in ksqgelctx ()
#8  0x0000000000c4d566 in ktcwit1 ()
#9  0x00000000082d5d99 in kdddgb ()
#10 0x00000000082c7530 in kdusru ()
#11 0x00000000082c0902 in kauupd ()
#12 0x0000000001f57c14 in kddiruUpdate ()
#13 0x000000000179eeab in knasdaExecDML ()
#14 0x000000000179d928 in knasdaProcDML ()
#15 0x000000000178c6fd in knaspd ()
#16 0x0000000001787d2f in knasplcr ()
#17 0x00000000017866d7 in knaspx ()
#18 0x0000000001770fd5 in knalsProc1Txn ()
#19 0x000000000177022d in knalsptxn ()
#20 0x00000000017424a6 in knasm2 ()
#21 0x0000000001776d8d in knalsma ()
#22 0x0000000000c25a7d in knlkcbkma ()
#23 0x0000000000b93ba7 in ksvrdp ()
#24 0x00000000020d2dd7 in opirip ()
#25 0x00000000016fe729 in opidrv ()
#26 0x0000000001b7183f in sou2o ()
#27 0x00000000009d3f8a in opimai_real ()
#28 0x0000000001b76ace in ssthrdmain ()
#29 0x00000000009d3e71 in main ()

Before we move on, here is a stack dump from a blocked Apply Server in 10.2.0.4:
[oracle@ora10gr2 trace]$ pstack 23787
#0  0x0000003b83cd450a in semtimedop () from /lib64/libc.so.6
#1  0x00000000085ef3f3 in sskgpwwait ()
#2  0x00000000085ee5c6 in skgpwwait ()
#3  0x000000000829ee31 in ksliwat ()
#4  0x000000000829e422 in kslwaitctx ()
#5  0x0000000000af92f5 in ksqcmi ()
#6  0x00000000082ac019 in ksqgtlctx ()
#7  0x00000000082aa77a in ksqgelctx ()
#8  0x0000000000c4d566 in ktcwit1 ()
#9  0x00000000082d5d99 in kdddgb ()
#10 0x00000000082c7530 in kdusru ()
#11 0x00000000082c0902 in kauupd ()
#12 0x00000000084588c9 in updrow ()
#13 0x00000000084f2580 in qerupFetch ()
#14 0x0000000008453cdd in updaul ()
#15 0x0000000008451bca in updThreePhaseExe ()
#16 0x00000000084509f5 in updexe ()
#17 0x00000000083fe18f in opiexe ()
#18 0x00000000083f5c0d in opiall0 ()
#19 0x0000000008403d25 in opikpr ()
#20 0x00000000083f78b9 in opiodr ()
#21 0x00000000084892af in __PGOSF141_rpidrus ()
#22 0x00000000085ee820 in skgmstack ()
#23 0x000000000848a759 in rpiswu2 ()
#24 0x000000000848fdf4 in kprball ()
#25 0x0000000001c7c4d7 in knipxup ()
#26 0x0000000001c72651 in knipdis ()
#27 0x000000000178cacc in knaspd ()
#28 0x0000000001787d2f in knasplcr ()
#29 0x00000000017866d7 in knaspx ()
#30 0x0000000001770fd5 in knalsProc1Txn ()
#31 0x000000000177022d in knalsptxn ()
#32 0x00000000017424a6 in knasm2 ()
#33 0x0000000001776d8d in knalsma ()
#34 0x0000000000c25a7d in knlkcbkma ()
#35 0x0000000000b93ba7 in ksvrdp ()
#36 0x00000000020d2dd7 in opirip ()
#37 0x00000000016fe729 in opidrv ()
#38 0x0000000001b7183f in sou2o ()
#39 0x00000000009d3f8a in opimai_real ()
#40 0x0000000001b76ace in ssthrdmain ()
#41 0x00000000009d3e71 in main ()


The stack is only 30 functions deep in 11.2.0.1 compared to 42 in 10.2.0.4! Given that everything above the ktcwit1 () function in both stacks is just the enqueue wait itself, the relative codepath change is even bigger.



All the difference comes down to one key thing: a recursive call. Take a look at frame #23 in the 10.2.0.4 stack and you’ll notice the rpiswu2 () function (for those of you unfamiliar with Oracle Kernel Layers, RPI stands for Recursive Program Interface).


Whatever happens further up the stack is essentially the same codepath any user session would use while executing an UPDATE statement. The Apply Servers in 10.2.0.4 generally behave like any other user session would and whatever diagnostic techniques you have learned while troubleshooting user issues could be, to a large extent, applied to the Apply Servers as well.

Every LCR execution leads to at least one recursive call (so if you got, say, a transaction with 1000 LCRs that would be at least 1000 recursive calls by the Apply Server). In 11.2.0.1 the recursive call is missing and the codepath is different up to kauupd () (KA, Access Layer) function.

Indeed, by looking at the Apply Server statistics in 11.2.0.1 you will notice that executing an LCR no longer results in a recursive call, so the entire change seems to be a shortcut which allows the Apply Server to proceed directly into the KD (Data) layer, bypassing the “regular” codepath and avoiding a recursive call.

On a side note, it appears that this new codepath was first introduced in 11.1.0.7.




**************************************


What’s up with the instrumentation?

While the performance improvement is certainly most welcome, there is a big downside: all these new functions seem to be poorly, or not at all, instrumented. This makes it hard to evaluate the gains, as some of the work is simply not visible.



How are you supposed to figure out what’s going on then? The good news is that all the regular dynamic performance views (like v$session_wait, v$session_event, etc.) seem to be populated correctly, but SQL trace took a big hit (plus you can no longer see sql_id in v$session). Whatever falls outside the “old” instrumentation looks pretty much like a black box.


Puzzled by this problem, I’ve tried to see whether there is any easy way to enable the old codepath so you can get all the instrumentation facilities back in place. After some trial and error, it turned out that a simple row level trigger . . .
SQL> create or replace trigger dst.buid_t1 before delete or insert or update on dst.t1
  2  for each row
  3  begin
  4    null;
  5  end;
  6  /

Trigger created

SQL> begin
  2    dbms_ddl.set_trigger_firing_property('DST', 'BUID_T1', false);
  3  end;
  4  /

PL/SQL procedure successfully completed

. . . is enough to get the old codepath back. Here is the stack of Apply Server process in 11.2.0.1 with such a trigger in place:
[oracle@ora11gr2 trace]$ pstack 30640
#0  0x0000003b83cd450a in semtimedop () from /lib64/libc.so.6
#1  0x00000000085ef3f3 in sskgpwwait ()
#2  0x00000000085ee5c6 in skgpwwait ()
#3  0x000000000829ee31 in ksliwat ()
#4  0x000000000829e422 in kslwaitctx ()
#5  0x0000000000af92f5 in ksqcmi ()
#6  0x00000000082ac019 in ksqgtlctx ()
#7  0x00000000082aa77a in ksqgelctx ()
#8  0x0000000000c4d566 in ktcwit1 ()
#9  0x00000000082d5d99 in kdddgb ()
#10 0x00000000082c7530 in kdusru ()
#11 0x00000000082c0902 in kauupd ()
#12 0x00000000084588c9 in updrow ()
#13 0x00000000084f2580 in qerupFetch ()
#14 0x00000000046a363f in qerstFetch ()
#15 0x0000000008453cdd in updaul ()
#16 0x0000000008451bca in updThreePhaseExe ()
#17 0x00000000084509f5 in updexe ()
#18 0x00000000083fe18f in opiexe ()
#19 0x00000000083f5c0d in opiall0 ()
#20 0x0000000008403d25 in opikpr ()
#21 0x00000000083f78b9 in opiodr ()
#22 0x00000000084892af in __PGOSF141_rpidrus ()
#23 0x00000000085ee820 in skgmstack ()
#24 0x000000000848a759 in rpiswu2 ()
#25 0x000000000848fdf4 in kprball ()
#26 0x0000000001c7c4d7 in knipxup ()
#27 0x0000000001c72651 in knipdis ()
#28 0x000000000178cacc in knaspd ()
#29 0x0000000001787d2f in knasplcr ()
#30 0x00000000017866d7 in knaspx ()
#31 0x0000000001770fd5 in knalsProc1Txn ()
#32 0x000000000177022d in knalsptxn ()
#33 0x00000000017424a6 in knasm2 ()
#34 0x0000000001776d8d in knalsma ()
#35 0x0000000000c25a7d in knlkcbkma ()
#36 0x0000000000b93ba7 in ksvrdp ()
#37 0x00000000020d2dd7 in opirip ()
#38 0x00000000016fe729 in opidrv ()
#39 0x0000000001b7183f in sou2o ()
#40 0x00000000009d3f8a in opimai_real ()
#41 0x0000000001b76ace in ssthrdmain ()
#42 0x00000000009d3e71 in main ()



Now that looks much more familiar! All the instrumentation appeared to be back in place as well.

I’ve also discovered that:

1. DELETE seems to be always handled through the old codepath.
2. In case you have a unique constraint or a primary key supported by a non-unique index, INSERT will fall back to the old codepath.
3. UPDATE needs a primary key or key column(s) supported by an index in order to use the new codepath.


It remains to be seen whether this new codepath is just a shortcut for the most frequently used scenarios, or whether its current restrictions will be lifted in future releases . . . or maybe not, given that GoldenGate is taking over as the strategic direction.

Sunday, September 19, 2010

Removing a Node from a 10gR1 RAC Cluster

Removing a Node from a 10gR1 RAC Cluster [ID 269320.1]

Modified 27-MAY-2009 Type BULLETIN Status PUBLISHED

Note: This article is only relevant for 10gR1 RAC environments.

For 10gR2 RAC environments please follow the documented procedures in the manual:
Oracle® Database Oracle Clusterware and Oracle Real Application Clusters Administration
and Deployment Guide
10g Release 2 (10.2)
Part Number B14197-03
PURPOSE
-------
The purpose of this note is to provide the user with a document that
can be used as a guide to remove a cluster node from an Oracle 10g Real
Applications environment.
SCOPE & APPLICATION
-------------------
This document can be used by DBAs and support analysts who need to
either remove a cluster node or assist another in removing a cluster
node in a 10g Unix Real Applications environment.
REMOVING A NODE FROM A 10g RAC CLUSTER
--------------------------------------

If you have to remove a node from a RAC 10g database, even if the node
will no longer be available to the environment, there is a certain
amount of cleanup that needs to be done. The remaining nodes need to
be informed of the change of status of the departing node. If there are
any steps that must be run on the node being removed and the node is no
longer available those commands can be skipped.
The three most important steps that need to be followed are:
A. Remove the instance using DBCA.
B. Remove the node from the cluster.
C. Reconfigure the OS and remaining hardware.
Here is a breakdown of the above steps.
A. Remove the instance using DBCA.

--------------------------------------

1. Verify that you have a good backup of the OCR (Oracle Configuration
Repository) using ocrconfig -showbackup.
2. Run DBCA from one of the nodes you are going to keep. Leave the
database up and also leave the departing instance up and running.
3. Choose "Instance Management"
4. Choose "Delete an instance"
5. On the next screen, select the cluster database from which you
will delete an instance. Supply the system privilege username
and password.
6. On the next screen, a list of cluster database instances will
appear. Highlight the instance you would like to delete then
click next.
7. If you have services configured, reassign the services. Modify the
services so that each service can run on one of the remaining
instances. Set "not used" for each service regarding the instance
that is to be deleted. Click Finish.
8. If your database is in archive log mode you may encounter the
following errors:
ORA-350
ORA-312
This may occur because the DBCA cannot drop the current log, as
it needs archiving. This issue is fixed in the 10.1.0.3
patchset. But previous to this patchset you should click the
ignore button and when the DBCA completes, manually archive
the logs for the deleted instance and drop the log group.

SQL> alter system archive log all;
SQL> alter database drop logfile group 2;
9. Verify that the dropped instance's redo thread has been removed by
querying v$log. If for any reason the redo thread is not disabled
then disable the thread.


SQL> alter database disable thread 2;
10. Verify that the instance was removed from the OCR (Oracle
Configuration Repository) with the following commands:
srvctl config database -d <database_name>
cd <CRS_HOME>/bin
./crs_stat
11. If this node had an ASM instance and the node will no longer be a
part of the cluster you will now need to remove the ASM instance with:
srvctl stop asm -n <node_name>
srvctl remove asm -n <node_name>
Verify that asm is removed with:
srvctl config asm -n <node_name>


B. Remove the Node from the Cluster
----------------------------------------
Once the instance has been deleted, the process of removing the node
from the cluster is a manual one. This is accomplished by running
scripts on the deleted node to remove the CRS install, as well as
scripts on the remaining nodes to update the node list. The following
steps assume that the node to be removed is still functioning.
1. To delete node number 2, first stop and remove the nodeapps on the
node you are removing, assuming that you have already removed the ASM
instance. As the root user on a remaining node:
# srvctl stop nodeapps -n <node_name>
2. Run netca. Choose "Cluster Configuration".
3. Only select the node you are removing and click next.
4. Choose "Listener Configuration" and click next.
5. Choose "Delete" and delete any listeners configured on the node
you are removing.

6. Run <CRS_HOME>/bin/crs_stat. Make sure that all database
resources are running on nodes that are going to be kept. For
example:
NAME=ora.<dbname>.db
TYPE=application
TARGET=ONLINE
STATE=ONLINE on <node_name>
Ensure that this resource is not running on a node that will be
removed. Use <CRS_HOME>/bin/crs_relocate to perform this.
Example:
crs_relocate ora.<dbname>.db

7. As the root user, remove the nodeapps on the node you are removing.

# srvctl remove nodeapps -n <node_name>

8. Next, as the Oracle user, run the installer with the
updateNodeList option on any remaining node in the cluster.
a. DISPLAY=ipaddress:0.0; export DISPLAY
This should be set even though the GUI does not run.
b. $ORACLE_HOME/oui/bin/runInstaller -updateNodeList
ORACLE_HOME=<ORACLE_HOME> CLUSTER_NODES=<node1>,<node2>,...
With this command we are defining the RDBMS $ORACLE_HOMEs that
are now part of the cluster in the Oracle inventory. If there is
no $ORACLE_HOME this step can be skipped.

9. Change to the root user to finish the removal on the node that
is being removed. This command will stop the CRS stack
and delete the ocr.loc file on the node to be removed. The
nosharedvar option assumes the ocr.loc file is not on a shared
file system. If it does exist on a shared file system then
specify sharedvar instead. The nosharedhome option specifies
that the CRS_HOME is on a local filesystem. If the CRS_HOME is
on a shared file system, specify sharedhome instead.
Run the rootdelete.sh script from <CRS_HOME>/install. Example:
# cd <CRS_HOME>/install
# ./rootdelete.sh local nosharedvar nosharedhome

10. On a node that will be kept, the root user should run the
rootdeletenode.sh script from the <CRS_HOME>/install directory.
When running this script from the CRS home specify both the node
name and the node number. The node name and the node number are
visible in olsnodes -n. Also do NOT put a space after the
comma between the two.
# olsnodes -n
<node1> 1
<node2> 2
# cd <CRS_HOME>/install
# ./rootdeletenode.sh <node2>,2
11. Confirm success by running OLSNODES.
<CRS_HOME>/bin>: ./olsnodes -n
<node1> 1
12. Now switch back to the oracle user account and run the same
runInstaller command as before. Run it this time from the
CRS_HOME instead of the ORACLE_HOME. Specify all of the
remaining nodes.
a. DISPLAY=ipaddress:0.0; export DISPLAY
b. <CRS_HOME>/oui/bin/runInstaller -updateNodeList
ORACLE_HOME=<CRS_HOME> CLUSTER_NODES=<node1>,<node2>,... CRS=TRUE
With this command we are defining the CRS HOMEs that are now
part of the cluster in the Oracle inventory.
13. Once the node updates are done you will need to manually delete
the $ORACLE_HOME and $CRS_HOME from the node to be expunged,
unless, of course, either of these is on a shared file system
that is still being used.
a. $ORACLE_HOME>: rm -rf *
b. $CRS_HOME> : rm -rf * (as root)

14. Next, as root, from the deleted node, verify that all init scripts
and soft links are removed:
Sun:
rm /etc/init.d/init.cssd
rm /etc/init.d/init.crs
rm /etc/init.d/init.crsd
rm /etc/init.d/init.evmd
rm /etc/rc3.d/K96init.crs
rm /etc/rc3.d/S96init.crs
rm -Rf /var/opt/oracle/scls_scr
rm -Rf /var/opt/oracle/oprocd
Linux:
rm -f /etc/init.d/init.cssd
rm -f /etc/init.d/init.crs
rm -f /etc/init.d/init.crsd
rm -f /etc/init.d/init.evmd
rm -f /etc/rc2.d/K96init.crs
rm -f /etc/rc2.d/S96init.crs
rm -f /etc/rc3.d/K96init.crs
rm -f /etc/rc3.d/S96init.crs
rm -f /etc/rc5.d/K96init.crs
rm -f /etc/rc5.d/S96init.crs
rm -Rf /etc/oracle/scls_scr
HP-UX:
rm /sbin/init.d/init.cssd
rm /sbin/init.d/init.crs
rm /sbin/init.d/init.crsd
rm /sbin/init.d/init.evmd
rm /sbin/rc3.d/K960init.crs
rm /sbin/rc3.d/S960init.crs
rm /sbin/rc2.d/K960init.crs
rm /sbin/rc2.d/K001init.crs
rm -Rf /var/opt/oracle/scls_scr
rm -Rf /var/opt/oracle/oprocd
HP Tru64:
rm /sbin/init.d/init.cssd
rm /sbin/init.d/init.crs
rm /sbin/init.d/init.crsd
rm /sbin/init.d/init.evmd
rm /sbin/rc3.d/K96init.crs
rm /sbin/rc3.d/S96init.crs
rm -Rf /var/opt/oracle/scls_scr
rm -Rf /var/opt/oracle/oprocd
IBM AIX:
rm /etc/init.cssd
rm /etc/init.crs
rm /etc/init.crsd
rm /etc/init.evmd
rm /etc/rc.d/rc2.d/K96init.crs
rm /etc/rc.d/rc2.d/S96init.crs
rm -Rf /etc/oracle/scls_scr
rm -Rf /etc/oracle/oprocd
16. You can also remove the /etc/oracle directory, the
/etc/oratab file, and the Oracle inventory (if desired)
17. To remove an ADDITIONAL ORACLE_HOME, ASM_HOME, or EM_HOME from the
inventory on all remaining nodes, run the installer to update the
node list. Example (if removing node 2):
runInstaller -updateNodeList -local \
ORACLE_HOME=$ORACLE_HOME CLUSTER_NODES=node1,node3,node4
(If you are using private home installations, please ignore the "-local" flag.)
RELATED DOCUMENTS
-----------------
Oracle® Real Application Clusters Administrator's Guide 10g Release 1 (10.1)
Part Number B10765-02
Chapter 5
Oracle Series/Oracle Database 10g High Availability Chapter 5 28-34.
Note 239998.1


Oracle Clusterware and RAC Admin and Deployment Guide - Ch. 10 and 11
Executed steps:


Node 3 (lxdb0109c03) is being deleted.

1>> srvctl stop nodeapps -n lxdb0109c03 (run from node 1)

1>> srvctl remove nodeapps -n lxdb0109c03 (run from node 1)

3>> ./rootdelete.sh local nosharedvar nosharedhome (run from the node being deleted, node 3)

1>> ./rootdeletenode.sh lxdb0109c03,3 (run from node 1)
1>> /u01/app/oracle/product/10.2.0/crs/oui/bin/runInstaller -updateNodeList ORACLE_HOME=/u01/app/oracle/product/10.2.0/crs CLUSTER_NODES=lxdb0109c01,lxdb0109c02 CRS=TRUE (run from node 1)

./racgons add_config lxdb0109c03:6201
orapwd file=orapwDBFAP3 password=Alr3_ady entries=10
ln -s /u03/oradata/DBFAP/spfile/spfileDBFAP.ora spfileDBFAP3.ora

Adding Datafiles to the Primary Database

Contents

1. Introduction
2. Adding Datafiles to the Primary Database


Introduction

This article provides detailed instructions and notes for adding a new datafile (or new tablespace) to the primary database in a standby database configuration.

The Standby Database feature of Oracle (Oracle7 and higher) allows the DBA to maintain a duplicate, or standby, copy of a database at a remote site to provide continued primary database availability in the event of failure. The standby database is created with a special copy of the control file from the primary database. The standby database is kept in close synchronization with the primary database by applying the primary database's archived log files to the standby database. It is therefore necessary to operate the primary database in ARCHIVELOG mode to take advantage of a standby database.

A standby database is generally used where high availability of data is required. A standby database is one way of quickly providing access to data if the primary database fails and recovery will take longer than the desired time.


Adding Datafiles to the Primary Database

The control file of the standby database may need to be refreshed (or recreated) when the control file on the primary database is changed. This is generally caused when altering the physical structure of the primary database like when adding datafiles and/or tablespaces. The steps below detail how to refresh the control file and add a datafile to a standby database.

1. Before you add a datafile to the primary database, you should CANCEL your recovery on standby database.

SQL> recover managed standby database cancel;

Media recovery complete.

NOTE: If you don't cancel, you will get an error on the standby when it hits any redo for that datafile. This won't be a big deal because at that point you can cancel and create the new datafile.

2. Perform a SHUTDOWN IMMEDIATE on the standby database (if needed). It is possible, however, that you have the standby database configured to shutdown the standby database instance when managed recovery is cancelled. In this case, you will not need to perform a SHUTDOWN IMMEDIATE.

SQL> shutdown immediate

3. Create the new tablespace -or- add the new datafile on the primary database as usual. For example:

SQL> alter tablespace users add datafile '/u06/app/oradata/ORA817/users02.dbf' size 10m;

Tablespace altered.

4. Create a new standby controlfile on the primary database.

SQL> alter database create standby controlfile as '/u01/app/oracle/control01.ctl';

Database altered.

5. Archive the current online logs of the primary database. This ensures consistency between the backup datafiles, the standby controlfile and the logfiles. From the primary database:

SQL> alter system archive log current;

System altered.

6. Transfer the new standby controlfile and all new archived redo logs to the standby machine.

% rcp control01.ctl linux4:/u03/app/oradata/ORA817/control01.ctl
% rcp control01.ctl linux4:/u04/app/oradata/ORA817/control02.ctl
% rcp control01.ctl linux4:/u05/app/oradata/ORA817/control03.ctl

% rcp /u06/app/oradata/ORA817/archive/* linux4:/u06/app/oradata/ORA817/archive/

7. Mount the standby database. Keep in mind that after mounting with the new control file, the new datafile(s) will already be listed in it. On the standby database, perform the following:

SQL> connect / as sysdba
Connected to an idle instance.

SQL> startup nomount
ORACLE instance started.

Total System Global Area 252777660 bytes
Fixed Size 451772 bytes
Variable Size 218103808 bytes
Database Buffers 33554432 bytes
Redo Buffers 667648 bytes

SQL> alter database mount standby database;

Database altered.

8. Create the datafile on the standby database. The following command will create the physical OS file on disk. For example, issue:

SQL> alter database create datafile '/u06/app/oradata/ORA817/users02.dbf'
2 as '/u06/app/oradata/ORA817/users02.dbf';

Database altered.

9. On the standby site, continue applying archived redo logs by issuing the following statement. Keep in mind that when prompted for the first archived redo log, you can type "auto" to have Oracle automatically apply all required archived redo logs:

SQL> recover standby database until cancel;

10. You can now resume the standby database by putting it back into managed recovery mode:

SQL> recover managed standby database;
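Taken together, steps 1 through 10 condense into the following sketch. The paths and the ORA817 SID are the examples used above; run each part against the database indicated in the comments:

```sql
-- On the STANDBY: halt managed recovery before the primary changes structure
recover managed standby database cancel;
shutdown immediate

-- On the PRIMARY: add the datafile, cut a fresh standby controlfile,
-- then archive the current log so the pieces stay consistent
alter tablespace users add datafile '/u06/app/oradata/ORA817/users02.dbf' size 10m;
alter database create standby controlfile as '/u01/app/oracle/control01.ctl';
alter system archive log current;

-- Copy the new controlfile and archived logs to the standby host, then
-- on the STANDBY:
startup nomount
alter database mount standby database;
alter database create datafile '/u06/app/oradata/ORA817/users02.dbf'
  as '/u06/app/oradata/ORA817/users02.dbf';
recover standby database until cancel;   -- answer "auto" at the prompt
recover managed standby database;        -- resume managed recovery
```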

Monday, September 6, 2010

runInstaller failed because of java version mismatch

Enable a debug-level trace on the Installer and upload the resulting trace to find the root cause of the behaviour.

./runInstaller -logLevel trace -debug -J-DTRACING.ENABLED=true -J-DTRACING.LEVEL=2

The trace will be written to oraInventory/logs; kindly upload it once you reproduce the issue.

3.) Enable a system-level trace on the DV installer and upload the trace.

truss -aefo /tmp/runInstaller.truss ./runInstaller
- OR -
strace -fo /tmp/runInstaller.strace ./runInstaller

Upload the output file.

I think the java version mismatch occurs when the code is compiled with a different java version than the JVM used to run it.


Please do below:

a. check which java you are using on the server currently:
% which java

then check its version
% java -version


b. Check the database JDK version
% cd $ORACLE_HOME/jdk/bin
% ./java -version


If a & b are different version, then please follow:

(1) Set the PATH with $ORACLE_HOME/jdk/bin
% export PATH=$ORACLE_HOME/jdk/bin:$PATH


(2) Re-check the issue.

Hopefully this will get past the java version mismatch.
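Checks (a) and (b) above can be scripted. The sketch below compares only the major versions of two `java -version` strings; the version values shown are hypothetical examples, and in practice they would come from the PATH java and from `$ORACLE_HOME/jdk/bin/java`:

```shell
#!/bin/sh
# Sketch: flag a mismatch between the PATH java and the Oracle home JDK
# by comparing major versions only.

# Extract "1.5" from a version string such as "1.5.0_22"
major_version() {
    echo "$1" | awk -F. '{print $1 "." $2}'
}

# Hypothetical example values, for illustration:
path_ver="1.4.2_19"     # e.g. from: java -version
home_ver="1.5.0_22"     # e.g. from: $ORACLE_HOME/jdk/bin/java -version

if [ "$(major_version "$path_ver")" != "$(major_version "$home_ver")" ]; then
    echo "mismatch: prepend \$ORACLE_HOME/jdk/bin to PATH"
fi
```

If the two major versions differ, fixing the PATH as in step (1) above makes the Installer pick up the Oracle home JDK first.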

Thursday, September 2, 2010

PRKP-1029 : Failed to register the service S999

Problem Description: Three-node RAC; a third node was added, and trying to modify a service throws the error below:

srvctl modify service -d DBFAP -s S999 -n -i DBFAP1,DBFAP2,DBFAP3
PRKP-1029 : Failed to register the service S999


=== ODM Action Plan ==========

1.- Provide the output of the following
srvctl config service -d <db_unique_name> [-s <service_name>] [-a]

2.- Before modifying you need to stop the resources. Please confirm you did this

3.- Try to recreate/modify the service using dbca


************

Here is the output

srvctl config service -d DBFAP -s S999
S999 PREF: DBFAP1 DBFAP2 AVAIL:

We didn't stop the service before modifying it. What would be the impact for users of this service if we stop it?

I don't have GUI access, hence cannot use DBCA at this moment. Please provide alternate ways.

*******************

There are a lot of resources on this cluster. Which resources need to be down to modify the service? Also, we are able to modify other services while everything is up; only this S999 service is throwing the error.
oracle@lxdb0109c03:/u01/app/oracle/product/10.2.0/crs/bin> crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ebr_vip application ONLINE ONLINE lxdb0109c02
ora....P1.inst application ONLINE ONLINE lxdb0109c01
ora....P2.inst application ONLINE ONLINE lxdb0109c02
ora....P3.inst application ONLINE ONLINE lxdb0109c03
ora....AP1.srv application ONLINE ONLINE lxdb0109c02
ora....AP2.srv application ONLINE ONLINE lxdb0109c01
ora....AP3.srv application ONLINE ONLINE lxdb0109c03
ora....S001.cs application ONLINE ONLINE lxdb0109c02
ora....AP1.srv application ONLINE ONLINE lxdb0109c01
ora....AP2.srv application ONLINE ONLINE lxdb0109c02
ora....AP3.srv application ONLINE ONLINE lxdb0109c03
ora....S002.cs application ONLINE ONLINE lxdb0109c02
ora....AP1.srv application ONLINE ONLINE lxdb0109c02
ora....AP2.srv application ONLINE ONLINE lxdb0109c01
ora....AP3.srv application ONLINE ONLINE lxdb0109c03
ora....S003.cs application ONLINE ONLINE lxdb0109c02
ora....AP1.srv application ONLINE ONLINE lxdb0109c01
ora....AP2.srv application ONLINE ONLINE lxdb0109c02
ora....AP3.srv application ONLINE ONLINE lxdb0109c03
ora....S004.cs application ONLINE ONLINE lxdb0109c02
ora....AP1.srv application ONLINE ONLINE lxdb0109c02
ora....AP2.srv application ONLINE ONLINE lxdb0109c01
ora....AP3.srv application ONLINE ONLINE lxdb0109c03
ora....S005.cs application ONLINE ONLINE lxdb0109c02
ora....AP1.srv application ONLINE ONLINE lxdb0109c01
ora....AP2.srv application ONLINE ONLINE lxdb0109c02
ora....AP3.srv application ONLINE ONLINE lxdb0109c03
ora....S006.cs application ONLINE ONLINE lxdb0109c02
ora.DBFAP.db application ONLINE ONLINE lxdb0109c02
ora....AP1.srv application ONLINE ONLINE lxdb0109c01
ora....AP2.srv application ONLINE ONLINE lxdb0109c02
ora....s999.cs application ONLINE ONLINE lxdb0109c02
ora....01.lsnr application ONLINE ONLINE lxdb0109c01
ora....c01.gsd application ONLINE ONLINE lxdb0109c01
ora....c01.ons application ONLINE ONLINE lxdb0109c01
ora....c01.vip application ONLINE ONLINE lxdb0109c01
ora....02.lsnr application ONLINE ONLINE lxdb0109c02
ora....c02.gsd application ONLINE ONLINE lxdb0109c02
ora....c02.ons application ONLINE ONLINE lxdb0109c02
ora....c02.vip application ONLINE ONLINE lxdb0109c02
ora....03.lsnr application ONLINE ONLINE lxdb0109c03
ora....c03.gsd application ONLINE ONLINE lxdb0109c03
ora....c03.ons application ONLINE ONLINE lxdb0109c03
ora....c03.vip application ONLINE ONLINE lxdb0109c03
oracle@lxdb0109c03:/u01/app/oracle/product/10.2.0/crs/bin>

*****************************************************************

=== ODM Action Plan ===

S999 PREF: DBFAP1 DBFAP2 AVAIL:

1. When did You first add the service S999?

2. Were there any errors reported when creating the service?

3. check for available ocr backups:

ocrconfig -showbackup

4. on both nodes backup the files in $CRS_HOME/cdata/
You may copy them to another directory

5. try the following commands

script /tmp/srvctl.log

export SRVM_TRACE=true
srvctl config service -d DBFAP
srvctl status service -d DBFAP -s S999 -f -v
srvctl modify service -d DBFAP -s S999 -n -i DBFAP1,DBFAP2 -f

exit

upload output and resulting trace files

******************************************

This is a three-node RAC; the third node was recently added to the existing two-node setup. The instances are DBFAP1, DBFAP2 and DBFAP3; the database is DBFAP. So you want me to take backups on all three nodes and run the commands to modify the service from the third node?

Also please confirm whether the modify command is (including all three instances):

srvctl modify service -d DBFAP -s S999 -n -i DBFAP1,DBFAP2,DBFAP3



Hi,
As this is Three node RAC,

Please advise whether the modify command should be

srvctl modify service -d DBFAP -s S999 -n -i DBFAP1,DBFAP2 -f -------------- OR
srvctl modify service -d DBFAP -s S999 -n -i DBFAP1,DBFAP2,DBFAP3 -f

*********************

Hi!

You have to use:

srvctl modify service -d DBFAP -s S999 -n -i DBFAP1,DBFAP2,DBFAP3 -f

**********************


oracle@lxdb0109c03:/u01/app/oracle/product/10.2.0/crs/bin> ocrconfig -showbackup

lxdb0109c02 2010/08/27 21:14:46 /u04/oracm/DBFPCluster1

lxdb0109c02 2010/08/27 17:14:40 /u04/oracm/DBFPCluster1

lxdb0109c02 2010/08/27 13:14:34 /u04/oracm/DBFPCluster1

lxdb0109c02 2010/08/26 21:14:07 /u04/oracm/DBFPCluster1

lxdb0109c02 2010/08/20 21:10:22 /u04/oracm/DBFPCluster1
oracle@lxdb0109c03:/u01/app/oracle/product/10.2.0/crs/bin>


**********************************

FileComment (trace file output)
----------------------

[main] [1:46:48:626] [HASContext.getCRSHome:457] /u01/app/oracle/product/10.2.0/crs
[main] [1:46:48:626] [Util.getCRSHome:458] getCRSHome: ret=/u01/app/oracle/product/10.2.0/crs
[main] [1:46:48:626] [HAOperation.setCRSHome:148] CRS_HOME: /u01/app/oracle/product/10.2.0/crs
[main] [1:46:48:627] [HAOperation.checkCommandVerb:211] Checking an existence of CRS command
[main] [1:46:48:627] [Service.genSCProfile:915] Creating profile for Service Composite
[main] [1:46:48:627] [LocalCommand.execute:54] LocalCommand.execute: Calling run method
[main] [1:46:48:627] [HAProfileOperation.run:175] Executing the HA operation /u01/app/oracle/product/10.2.0/crs/bin/crs_profile ora.DBFAP.S999.cs
[main] [1:46:48:627] [HAOperationImpl.writeProfileOnDisk:1167] Writing the profile on disk for ora.DBFAP.S999.cs
[main] [1:46:48:628] [HAOperationImpl.writeProfileOnDisk:1168] Running /u01/app/oracle/product/10.2.0/crs/bin/crs_stat for ora.DBFAP.S999.cs
[main] [1:46:48:628] [HAOperationImpl.runCommand:1246] CRS cmd is: /u01/app/oracle/product/10.2.0/crs/bin/crs_stat -p ora.DBFAP.S999.cs
[main] [1:46:48:628] [RuntimeExec.runCommand:74] Calling Runtime.exec() with the command
[main] [1:46:48:628] [RuntimeExec.runCommand:76] /u01/app/oracle/product/10.2.0/crs/bin/crs_stat
[main] [1:46:48:628] [RuntimeExec.runCommand:76] -p
[main] [1:46:48:628] [RuntimeExec.runCommand:76] ora.DBFAP.S999.cs
[Thread-3] [1:46:48:632] [StreamReader.run:61] In StreamReader.run
[main] [1:46:48:632] [RuntimeExec.runCommand:131] runCommand: Waiting for the process
[Thread-2] [1:46:48:632] [StreamReader.run:61] In StreamReader.run
[Thread-2] [1:46:48:678] [StreamReader.run:65] OUTPUT>CRS-0210: Could not find resource 'ora.DBFAP.S999.cs'.
[Thread-2] [1:46:48:678] [StreamReader.run:65] OUTPUT>
[main] [1:46:48:680] [RuntimeExec.runCommand:133] runCommand: process returns -46
[main] [1:46:48:680] [RuntimeExec.runCommand:147] RunTimeExec: output>
[main] [1:46:48:680] [RuntimeExec.runCommand:150] CRS-0210: Could not find resource 'ora.DBFAP.S999.cs'.
[main] [1:46:48:680] [RuntimeExec.runCommand:150]
[main] [1:46:48:680] [RuntimeExec.runCommand:155] RunTimeExec: error>
[main] [1:46:48:680] [RuntimeExec.runCommand:175] Returning from RunTimeExec.runCommand
[main] [1:46:48:681] [HAOperationImpl.runCommand:1264] Unsigned return code = 210
[main] [1:46:48:681] [HAOperationImpl.runCommand:1278] set status HA_RES_NOT_EXIST_ERR
[main] [1:46:48:681] [HASContext.getInstance:197] Module init : 6
[main] [1:46:48:681] [HASContext.getInstance:221] Local Module init : 0
[main] [1:46:48:681] [HASContext.getInstance:256] HAS Context Allocated: 9 to oracle.ops.mgmt.has.Util@998b08
[main] [1:46:48:681] [Util.:87] Util Instance created.
[main] [1:46:48:682] [has.UtilNative.Native] prsr_trace: Native: hasHAPrivilege

[main] [1:46:48:682] [HAOperationImpl.writeProfileOnDisk:1175] Writing profile on disk at /u01/app/oracle/product/10.2.0/crs/crs/public/ora.DBFAP.S999.cs.cap
[main] [1:46:48:683] [HAOperationImpl.runCommand:1246] CRS cmd is: /u01/app/oracle/product/10.2.0/crs/bin/crs_profile -update ora.DBFAP.S999.cs -h lxdb0109c01 lxdb0109c02 lxdb0109c03
[main] [1:46:48:683] [RuntimeExec.runCommand:74] Calling Runtime.exec() with the command
[main] [1:46:48:683] [RuntimeExec.runCommand:76] /u01/app/oracle/product/10.2.0/crs/bin/crs_profile
[main] [1:46:48:683] [RuntimeExec.runCommand:76] -update
[main] [1:46:48:683] [RuntimeExec.runCommand:76] ora.DBFAP.S999.cs
[main] [1:46:48:683] [RuntimeExec.runCommand:76] -h
[main] [1:46:48:684] [RuntimeExec.runCommand:76] lxdb0109c01 lxdb0109c02 lxdb0109c03
[main] [1:46:48:687] [RuntimeExec.runCommand:131] runCommand: Waiting for the process
[Thread-4] [1:46:48:687] [StreamReader.run:61] In StreamReader.run
[Thread-5] [1:46:48:687] [StreamReader.run:61] In StreamReader.run
[Thread-4] [1:46:48:722] [StreamReader.run:65] OUTPUT>Syntax error in /u01/app/oracle/product/10.2.0/crs/crs/public/ora.DBFAP.S999.cs.cap, line number 1: "CRS-0210: Could not find resource 'ora.DBFAP.S999.cs'."
[Thread-4] [1:46:48:722] [StreamReader.run:65] OUTPUT>(File: caa_profile.cpp, line: 155
[Thread-4] [1:46:48:722] [StreamReader.run:65] OUTPUT>
[main] [1:46:48:723] [RuntimeExec.runCommand:133] runCommand: process returns 1
[main] [1:46:48:724] [RuntimeExec.runCommand:147] RunTimeExec: output>
[main] [1:46:48:724] [RuntimeExec.runCommand:150] Syntax error in /u01/app/oracle/product/10.2.0/crs/crs/public/ora.DBFAP.S999.cs.cap, line number 1: "CRS-0210: Could not find resource 'ora.DBFAP.S999.cs'."
[main] [1:46:48:724] [RuntimeExec.runCommand:150] (File: caa_profile.cpp, line: 155
[main] [1:46:48:724] [RuntimeExec.runCommand:150]
[main] [1:46:48:724] [RuntimeExec.runCommand:155] RunTimeExec: error>
[main] [1:46:48:724] [RuntimeExec.runCommand:175] Returning from RunTimeExec.runCommand
[main] [1:46:48:724] [HAOperationImpl.runCommand:1290] set status FAILED
[main] [1:46:48:724] [HAProfileOperation.run:178] Returned from executing the HA Operation
[main] [1:46:48:725] [HAProfileOperation.run:183] OUTPUT> Syntax error in /u01/app/oracle/product/10.2.0/crs/crs/public/ora.DBFAP.S999.cs.cap, line number 1: "CRS-0210: Could not find resource 'ora.DBFAP.S999.cs'."
[main] [1:46:48:725] [HAProfileOperation.run:183] OUTPUT> (File: caa_profile.cpp, line: 155
[main] [1:46:48:725] [HAProfileOperation.run:183] OUTPUT>
[main] [1:46:48:725] [HAProfileOperation.run:189] ERROR>
[main] [1:46:48:725] [LocalCommand.execute:56] LocalCommand.execute: Returned from run method
[main] [1:46:48:725] [Service.genSCProfile:920] HA Profile operation failed
[main] [1:46:48:725] [HAOperationResult.getOutputAll:114] outLine is []
[main] [1:46:48:726] [HAOperationResult.getOutputAll:115] errLine is []
[main] [1:46:48:726] [Action.releaseLock:345] Release CSS lock SRVM.DBFAP
[main] [1:46:48:726] [has.ClusterLockNative.Native] prsr_trace: Native: unlock

PRKP-1029 : Failed to register the service S999.

oracle@lxdb0109c03:/u01/app/oracle/product/10.2.0/crs/bin>
oracle@lxdb0109c03:/u01/app/oracle/product/10.2.0/crs/bin> exit
exit

************************

=== ODM Action Plan ===

1.- The command is failing because it cannot find the file:

/u01/app/oracle/product/10.2.0/crs/crs/public/ora.DBFAP.S999.cs.cap

This file is created on the node where the "srvctl add service" command is run.

Please check whether this file exists on nodes 1 and 2, and execute the command from the node where the file exists.

************************************

On nodes 1 and 3 the file exists and has the following contents; node 2 doesn't have the file.

oracle@lxdb0109c03:/u01/app/oracle/product/10.2.0/crs/bin> cat /u01/app/oracle
/product/10.2.0/crs/crs/public/ora.DBFAP.S999.cs.cap

CRS-0210: Could not find resource 'ora.DBFAP.S999.cs'.


*******************************************

=== ODM Action Plan ===

1.- I talked to the customer and explained that the command is failing because it cannot locate the file /u01/app/oracle/product/10.2.0/crs/crs/public/ora.DBFAP.S999.cs.cap

This is the profile for the service, and it is created on the node where the "srvctl add service" command is run. We checked /u01/app/oracle/product/10.2.0/crs/crs/profile on each node, but the file does not exist there either.

The common fix for this issue is:

- Stop the services
- Drop the service
- Create the service

The customer cannot have downtime in this maintenance window; he is going to request the next window. He asked us to investigate whether there is another way to recreate the file.
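The stop/drop/recreate cycle would look roughly like this with srvctl (a sketch only: the database, service and instance names are the ones from this thread, 10g srvctl syntax with -r for the preferred-instance list is assumed, and the commands should be run from node 1, where the .cap profile exists):

```
srvctl stop service -d DBFAP -s S999
srvctl remove service -d DBFAP -s S999 -f
srvctl add service -d DBFAP -s S999 -r DBFAP1,DBFAP2,DBFAP3
srvctl start service -d DBFAP -s S999
```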

*********************************
Generic Note
------------------------
NOTE: Please keep in mind that stopping the services will not cause an outage: no new connections with this service name will be allowed, but existing connections will continue to work fine.


****************************

Generic Note
------------------------
Hi!

I recommend backing up the OCR and then stopping/removing the services as explained in this note.

Note 259301.1: CRS and 10g/11.1 Real Application Clusters


This should be done from node 1; keep in mind that you might need to use the "-f" option to remove the services.

I don't know of any pre-check for creating the services.


**************************************

Generic Note
------------------------
Hi!

I have reviewed all the bugs related to this issue and could not find any workaround besides the one we have already proposed.

I believe the steps given above will work in your environment.


Thanks for your time and patience.

ORA-27102: out of memory on Solaris 10

This problem is due to insufficient shared memory when Oracle tries to create a shared memory segment (whose size depends on the SGA and PGA). Unlike earlier releases of Solaris (Solaris 8 and 9), most of the system parameters needed to run Oracle are already set properly on Solaris 10, so the only parameter to set is the maximum shared memory. In earlier versions this was called shmsys:shminfo_shmmax and was set by editing the /etc/system file and rebooting. With Solaris 10 the parameter is set by modifying a "Resource Control Value". You can do this temporarily with prctl, but that setting is lost at reboot, so you would need to rerun the command for the oracle user after each reboot. The other option is to create a default project for the oracle user:


$ projadd -U oracle -K "project.max-shm-memory=(priv,13G,deny)" user.oracle



How large the shared memory limit should be for Oracle depends on the physical memory size. On Solaris 9 and prior versions the default is 8 MB, whereas on Solaris 10 and later the default is one quarter of physical memory. The current OS setting can be verified as follows:


$ prtconf | grep Mem
Memory size: 32760 Megabytes



$ id -p
uid=59008(oracle) gid=10001(dba) projid=3(default)



$ prctl -n project.max-shm-memory -i project 3
project: 3: default
NAME PRIVILEGE VALUE FLAG ACTION RECIPIENT
project.max-shm-memory
privileged 7.84GB - deny -
system 16.0EB max deny -





By default, Oracle 10 will allocate 40% of total system physical memory for the SGA and PGA. So for 32G of system memory, the shared memory limit for Oracle 10 should be at least 0.4 * 32G = 12.8G.
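That 40% rule of thumb is a line of shell arithmetic; 32760 MB is the prtconf figure shown above:

```shell
#!/bin/sh
# 40% of physical memory, in megabytes (integer arithmetic)
mem_mb=32760                     # from: prtconf | grep Mem
shm_mb=$(( mem_mb * 40 / 100 ))
echo "suggested max-shm-memory: ${shm_mb} MB"   # 13104 MB, i.e. ~12.8 GB
```

Rounding up a little (the 13G used in the projadd example above) leaves headroom over the computed minimum.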

(Temporary method)


$ prctl -n project.max-shm-memory -r -v 10G -i project 3

(No need to reboot)

$ projadd -p 100 -c 'test shmmax' -U oracle -G dba -K 'project.max-shm-memory=(privileged,13G,deny)' user.oracle


$ projects -l
...
user.oracle
projid : 100
comment: "test shmmax"
users : oracle
groups : dba
attribs: project.max-shm-memory=(privileged,13958643712,deny)

$ cat /etc/project
user.oracle:100:test shmmax:oracle:dba:project.max-shm-memory=(privileged,13958643712,deny)