Wednesday, September 12, 2012

How to find the export START TIME in an Oracle database?

Aim:
To determine the export start time, which is not recorded in the export log file.

Situation:
Below is a sample export log.

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in UTF8 character set and AL16UTF16 NCHAR character set

About to export specified tables via Conventional Path ...
Current user changed to APPLSYS
. . exporting table                       FND_USER       5641 rows exported
Export terminated successfully without warnings.

You can find the export finish time from the log file's timestamp:

[oracle@myhost tmp]$ ls -l 1.log
-rw-r--r--   1 oracle   dba          547 Apr  4 16:39 1.log

But how do you find the start time of an old export (perhaps two months old) that you need now and don't remember at all?


Solution:
Oracle 10g has a nice feature called Active Session History (ASH), which samples active sessions and stores data about your old sessions (recent samples in memory, and a subset persisted by AWR in DBA_HIST_ACTIVE_SESS_HISTORY).

From the logfile timestamp you at least know which day the export finished, so you can guess the start date. In this example my export ended on 4 Apr 2011, from which I can determine the start date should also be the 4th (for a bigger export, it might be 3 Apr).

Use your anticipated day in the query below:

select sid, sample_time, session_type, event, program, action, module
from v$active_session_history
where extract(day from sample_time)=&dt and module like '%exp%';

Enter only the day (a number) when prompted; in our example, enter 4.

It will list all the sessions that ran export. See the example xls attached.
** Just make sure all the SIDs are the same: ASH samples sessions periodically (the sample_time column), so you will find multiple rows in the output.
Sort by sample_time and pick the oldest time; that is your export start time :)
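The sort-and-pick step can also be collapsed into a single answer. A minimal sketch, using the same &dt substitution variable as above (note that V$ACTIVE_SESSION_HISTORY only holds recent in-memory samples, so for an export that is weeks old you would query the AWR copy, DBA_HIST_ACTIVE_SESS_HISTORY, as shown here):

```sql
-- Earliest retained ASH sample for the export session(s)
-- = approximate export start time
select min(sample_time) as export_start
from   dba_hist_active_sess_history
where  extract(day from sample_time) = &dt
and    module like '%exp%';
```

AWR keeps this data only for its retention period (7 days by default), so for very old exports you would need an extended retention setting.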

Tricky but simple!!

How to schedule an AWR report (awrrpt.sql) via crontab?

Hi All,


I came across a requirement to schedule an AWR snapshot report between 9 AM and 6 PM on certain days.


The aim was to automatically identify the snap IDs for 9 AM and 6 PM, run the report in the background, and email the report.


Following is a sample script. In my test instance the begin-snap query returns only one snap, so I am good :)
If the given select statement returns multiple snap IDs, use min(snap_id) and max(snap_id) for the begin and end snaps respectively. I have commented the script to make it easier to understand.


Save the following as a shell script and schedule it in cron.


myAWR.sh


## START


# The following mechanism records the begin and end snap_id for the 9:00 and 18:00 snapshots of the current day
# It also generates the define statements required to run awrrpti.sql non-interactively


sqlplus -s '/as sysdba' << EOF
set echo off
set head off
set feed off
spool /tmp/tmpScr.sql
select 'Define begin_snap= '|| snap_id from dba_hist_snapshot where
EXTRACT(HOUR FROM BEGIN_INTERVAL_TIME)=9 and
EXTRACT(year FROM BEGIN_INTERVAL_TIME) = EXTRACT(year FROM sysdate) and
EXTRACT(month FROM BEGIN_INTERVAL_TIME) = EXTRACT(month FROM sysdate) and
EXTRACT(day FROM BEGIN_INTERVAL_TIME) = EXTRACT(day FROM sysdate);


SELECT 'Define end_snap= '|| snap_id from dba_hist_snapshot where
EXTRACT(HOUR FROM BEGIN_INTERVAL_TIME)=18 and
EXTRACT(year FROM BEGIN_INTERVAL_TIME) = EXTRACT(year FROM sysdate) and
EXTRACT(month FROM BEGIN_INTERVAL_TIME) = EXTRACT(month FROM sysdate) and
EXTRACT(day FROM BEGIN_INTERVAL_TIME) = EXTRACT(day FROM sysdate);


select 'define  report_type  = ' || '''text''' from dual;
select 'define  report_name  = /tmp/awrrep.txt' from dual;
select 'define  inst_name    = ' || INSTANCE_NAME from v\$instance;
select 'define  db_name    = ' || name from v\$database;
select 'define  dbid = ' || dbid from v\$database;
select 'define inst_num = ' || INSTANCE_NUMBER from v\$instance;
select 'define  num_days     = 3' from dual;
select '@$ORACLE_HOME/rdbms/admin/awrrpti.sql' from dual;
exit
EOF


# The following part runs the script composed above (snap IDs plus define statements)
sqlplus -s '/as sysdba' << EOF
@/tmp/tmpScr.sql
exit
EOF


## END


Crontab Example (Linux)

## Run AWR every day at 7:00 PM
00 19 * * * /njadav/myAWR.sh 2>&1 | mailx -s "AWR Today" myemail@mydomain.com

Note:
To customize further, check the parameters of $ORACLE_HOME/rdbms/admin/awrrpti.sql by opening it. Also remember that cron starts with a minimal environment, so set ORACLE_HOME, ORACLE_SID and PATH at the top of the script. The report itself is written to /tmp/awrrep.txt; the crontab example above mails the script's console output.

Following is a sample:

define  inst_num     = 1;
define  num_days     = 3;
define  inst_name    = 'Instance';
define  db_name      = 'Database';
define  dbid         = 4;
define  begin_snap   = 10;
define  end_snap     = 11;
define  report_type  = 'text';
define  report_name  = /tmp/swrf_report_10_11.txt

Tuesday, June 26, 2012

Performance Views

V$UNDOSTAT
Use this view to monitor and configure undo space usage. V$UNDOSTAT shows various undo and transaction statistics, such as the amount of undo space consumed in the instance. In earlier releases, undo space management was performed using rollback segments; that method is now called manual undo management mode.

Displays a histogram of statistical data to show how well the system is working. The available statistics include undo space consumption, transaction concurrency, and length of queries executed in the instance. Use this view to estimate the amount of undo space required for the current workload. Oracle uses this view to tune undo usage in the system. The view returns null values if the system is in manual undo management mode.

Each row in the view keeps statistics collected in the instance for a 10-minute interval. Each column in a row contains statistics gathered during this 10-minute interval. The rows are in descending order by the BEGIN_TIME column value. Each row belongs to the time interval marked by (BEGIN_TIME, END_TIME). Each column represents the data collected for the particular statistic in that time interval. The first row of the view contains statistics for the (partial) current time period. The view contains a total of 1008 rows, spanning a 7 day cycle.





Column / Datatype / Description
BEGIN_TIME DATE Identifies the beginning of the time interval.
END_TIME DATE Identifies the end of the time interval.
UNDOTSN NUMBER Represents the last active undo tablespace in the duration of time. The tablespace ID of the active undo tablespace is returned in this column. If more than one undo tablespace was active in that period, the active undo tablespace that was active at the end of the period is reported.
UNDOBLKS NUMBER Represents the total number of undo blocks consumed. You can use this column to obtain the consumption rate of undo blocks, and thereby estimate the size of the undo tablespace needed to handle the workload on your system.
TXNCOUNT NUMBER Identifies the total number of transactions executed within the period.
MAXQUERYLEN NUMBER Identifies the length of the longest query (in number of seconds) executed in the instance during the period. Use this statistic to estimate the proper setting of undo_retention.
MAXCONCURRENCY NUMBER Identifies the highest number of transactions executed concurrently within the period.
UNXPSTEALCNT NUMBER Number of attempts to obtain undo space by stealing unexpired extents from other transactions.
UNXPBLKRELCNT NUMBER Number of unexpired blocks removed from certain undo segments so they can be used by other transactions.
UNXPBLKREUCNT NUMBER Number of unexpired undo blocks reused by transactions.
EXPSTEALCNT NUMBER Number of attempts to steal expired undo blocks from other undo segments.
EXPBLKRELCNT NUMBER Number of expired undo blocks stolen from other undo segments.
EXPBLKREUCNT NUMBER Number of expired undo blocks reused within the same undo segments.
SSOLDERRCNT NUMBER Identifies the number of times the error ORA-01555 occurred. Use this statistic to decide whether undo_retention is set properly given the size of the undo tablespace. Increasing the value of undo_retention can reduce the occurrence of this error.
NOSPACEERRCNT NUMBER Identifies the number of times space was requested in the undo tablespace and there was no free space available. That is, all of the space in the undo tablespace was in use by active transactions. The corrective action is to add more space to the undo tablespace.


See how much undo information has been written in the past hour.
 
 
 select begin_time, end_time, undoblks, txncount, maxconcurrency as maxcon,nospaceerrcnt from V$UNDOSTAT where begin_time > sysdate-(1/24);

BEGIN_TIME     END_TIME         UNDOBLKS   TXNCOUNT     MAXCON NOSPACEERRCNT
-------------- -------------- ---------- ---------- ---------- -------------
01-04-30 14:07 01-04-30 14:14       1883          7          3             0
01-04-30 13:57 01-04-30 14:07        488         34          3             0
01-04-30 13:47 01-04-30 13:57          0          2          1             0
01-04-30 13:37 01-04-30 13:47          0          2          1             0
01-04-30 13:27 01-04-30 13:37          0          2          1             0
01-04-30 13:17 01-04-30 13:27          0          2          1             0
 
 
 
 
This query shows how much undo has been used over the last hour, with the last 20 minutes being the only period that used undo space. Here we can see that there were 34 transactions from 13:57 to 14:07 using 488 undo blocks, none of which received any space-management errors, and from 14:07 until now there have been 7 transactions using 1883 undo blocks, so far with no space problems.

If space problems occur and a user receives an out-of-space error, the NOSPACEERRCNT column will increase. If this happens, the undo tablespace is too small and needs to be increased, either by enlarging a datafile or by adding another datafile to the tablespace.

To find the time of the longest query in the instance:
 
 select begin_time, MAXQUERYLEN, undoblks from V$UNDOSTAT where begin_time > sysdate-(1/12);

BEGIN_TIME     MAXQUERYLEN   UNDOBLKS
-------------- ----------- ----------
01-04-30 15:27          15        482
01-04-30 15:17          14       1895
01-04-30 15:07           0          1
01-04-30 14:57           0          0
01-04-30 14:47           0          0
01-04-30 14:37           0          1
01-04-30 14:27         738       3426  <--- 738 seconds
01-04-30 14:17         377       2564
01-04-30 14:07         236       2277
01-04-30 13:57          21        488
01-04-30 13:47           0          0
01-04-30 13:37           0          0
This shows that the longest query run was 738 seconds. To ensure this query never gets ORA-01555, set undo_retention to at least 750 seconds. If undo_retention is set too high, the UNDO tablespace will need to be very large; if too low, ORA-01555 may occur too often.
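The V$UNDOSTAT numbers also let you size the undo tablespace with the usual rule of thumb: undo space needed = undo_retention (seconds) * peak undo blocks generated per second * block size. A minimal sketch, assuming a 750-second retention target (taken from the longest query seen above; substitute your own):

```sql
-- Sketch: undo space needed for a given undo_retention,
-- using the peak undo generation rate observed in V$UNDOSTAT
select round(750                                                  -- retention target (s), assumption
             * max(u.undoblks / ((u.end_time - u.begin_time) * 86400))  -- peak undo blocks/sec
             * to_number(p.value) / 1024 / 1024) as undo_mb_needed
from   v$undostat u, v$parameter p
where  p.name = 'db_block_size'
group  by p.value;
```

Since V$UNDOSTAT only covers about 7 days, run this during a representative workload period.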

Related Parameters:
undo_management Specifies which undo space management mode the system should use.
undo_retention Specifies (in seconds) how long committed undo information is retained in the database.
undo_tablespace Specifies the undo tablespace to use during startup.
undo_suppress_errors Specifies whether to suppress errors from manual undo management operations issued while in automatic undo management mode (deprecated).

Related Commands:
ALTER SYSTEM

Related Views:
DBA_ROLLBACK_SEGS Describes rollback segments.
DBA_UNDO_EXTENTS Describes the extents comprising the segments in all undo tablespaces in the database.
V$ROLLNAME Lists the names of all online rollback segments.
V$ROLLSTAT For automatic undo management mode, information reflects behavior of the undo segments in the UNDO tablespace.
V$TRANSACTION Contains undo segment information.

Related Packages:
DBMS_FLASHBACK Flash back to a version of the database at a specific time or a specific system change number (SCN).

Tuesday, May 15, 2012



    Hi All,
    DBAs are often asked to check the health of the database. Database health can be checked in various ways, including:


    SL No / Monitoring Scope / Current Status
    OS level:
    1 Physical memory / load: Load normal, load averages: 0.35, 0.37, 0.36
    2 OS space threshold (archive, ora_dump etc.): Sufficient space available
    3 Top 10 processes consuming memory: No process using exceptionally high memory
    4 Free volumes available: Sufficient disk space available on the mount points
    5 Filesystem space: Under normal threshold
    Database level:
    6 Check extents / proactive space addition: Space is being added regularly
    7 Check alert log for ORA- and warning messages
    8 Major wait events (latch/enqueue/lib cache pin): No major wait events
    9 Max sessions
    10 Long-running jobs: 6 inactive sessions running for more than 8 hrs
    11 Invalid objects: 185
    12 Analyze jobs (once a week): Done on 20-JAN-2008 at 06:00:06
    13 Temp usage / rollback segment usage: Normal
    14 Nologging indexes
    15 Hot backup/cold backup: Went fine
    16 Redo generation: Normal
    17 PQ processes: Normal
    18 I/O generation: Under normal threshold
    19 2PC pending transactions: 0
    DR / backup:
    1 Sync arch: Normal
    2 Purge arch: Normal
    3 Recovery status: Normal
    20) DATABASE HEALTH CHECK SCRIPT: showing locks and archive generation details

    In-detail DATABASE health check:
    OPERATING SYSTEM:

    1) Physical memory / load:
    1) free: the free command displays the amount of total, free and used physical memory (RAM) in the system, as well as information on shared memory, buffers, cached memory and swap space used by the Linux kernel.
    Usage:

    $ free -m

    2) vmstat: vmstat reports virtual memory statistics, with information about processes, swap, free, buffer and cache memory, paging space, disk I/O activity, traps, interrupts, context switches and CPU activity.
    Usage:
    $vmstat 5


    3) top: the top command displays a dynamic real-time view of the running tasks managed by the kernel. Its memory usage stats include live total, used and free physical memory and swap, together with their buffer and cache sizes.
    Usage:
    $top

    4) ps: the ps command reports a snapshot of the currently active processes, including the percentage of memory used by each process or task. With this command, top memory-hogging processes can be identified.
    Usage:
    $ps aux

    2) OS space threshold (archive, ora_dump etc.):
    Check that OS space is available in all filesystems, especially the locations holding archive logs and Oracle database files. We can use the below OS commands:
    $ df -h
    $ du -csh *


    3) Top 10 processes consuming memory:
    We can display the top 10 memory-consuming processes as follows:

    ps aux | head -1; ps aux | sort -rn -k4 | head -10

    We can also use the top command and press M, which orders the process list by memory usage.

    4) Free volumes available:

    We have to make sure sufficient disk space is available on the mount points of each OS server where the database is up and running.

    $ df -h

    5) Filesystem space:

    Check on the OS side that sufficient space is available at all mount points.

    DATABASE :

    6) Check extents / proactive space addition:
    Check each of the data, index and temporary tablespaces for extent and block allocation details.

    SET LINES 1000
    SELECT SEGMENT_NAME, TABLESPACE_NAME, EXTENTS, BLOCKS
    FROM DBA_SEGMENTS;

    SELECT SEGMENT_NAME, TABLESPACE_NAME, EXTENTS, BLOCKS
    FROM DBA_SEGMENTS WHERE TABLESPACE_NAME='STAR01D';
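To decide where space must be added proactively, a quick look at free space per tablespace also helps. A minimal sketch using DBA_FREE_SPACE (temporary tablespaces are not covered by this view):

```sql
-- Free space per tablespace, smallest first
SELECT TABLESPACE_NAME, ROUND(SUM(BYTES)/1024/1024) FREE_MB
FROM DBA_FREE_SPACE
GROUP BY TABLESPACE_NAME
ORDER BY FREE_MB;
```

Tablespaces at the top of this list are the candidates for datafile resizing or addition.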

    7) Check alert log for ORA- and warning messages:

    Checking the alert log file regularly is a vital task. In the alert log file we have to look for the following:

    1) Any Oracle-related errors.
    Open the alert log file with less or more and search for ORA-.
    This gives you the error details and the time of occurrence.

    2) Database-level or tablespace-level changes.
    Monitor the alert log for each day's activity in the database: database bounces, increases in tablespace sizes, changes to database parameters. In an 11g database we can also look for TNS errors in the alert log file.

    8) Major wait events (latch/enqueue/Lib cache pin):

    We can check the wait events details with the help of below queries:

    SELECT s.saddr, s.SID, s.serial#, s.audsid, s.paddr, s.user#, s.username,
    s.command, s.ownerid, s.taddr, s.lockwait, s.status, s.server,
    s.schema#, s.schemaname, s.osuser, s.process, s.machine, s.terminal,
    UPPER (s.program) program, s.TYPE, s.sql_address, s.sql_hash_value,
    s.sql_id, s.sql_child_number, s.sql_exec_start, s.sql_exec_id,
    s.prev_sql_addr, s.prev_hash_value, s.prev_sql_id,
    s.prev_child_number, s.prev_exec_start, s.prev_exec_id,
    s.plsql_entry_object_id, s.plsql_entry_subprogram_id,
    s.plsql_object_id, s.plsql_subprogram_id, s.module, s.module_hash,
    s.action, s.action_hash, s.client_info, s.fixed_table_sequence,
    s.row_wait_obj#, s.row_wait_file#, s.row_wait_block#,
    s.row_wait_row#, s.logon_time, s.last_call_et, s.pdml_enabled,
    s.failover_type, s.failover_method, s.failed_over,
    s.resource_consumer_group, s.pdml_status, s.pddl_status, s.pq_status,
    s.current_queue_duration, s.client_identifier,
    s.blocking_session_status, s.blocking_instance, s.blocking_session,
    s.seq#, s.event#, s.event, s.p1text, s.p1, s.p1raw, s.p2text, s.p2,
    s.p2raw, s.p3text, s.p3, s.p3raw, s.wait_class_id, s.wait_class#,
    s.wait_class, s.wait_time, s.seconds_in_wait, s.state,
    s.wait_time_micro, s.time_remaining_micro,
    s.time_since_last_wait_micro, s.service_name, s.sql_trace,
    s.sql_trace_waits, s.sql_trace_binds, s.sql_trace_plan_stats,
    s.session_edition_id, s.creator_addr, s.creator_serial#
    FROM v$session s
    WHERE ( (s.username IS NOT NULL)
    AND (NVL (s.osuser, 'x') <> 'SYSTEM')
    AND (s.TYPE <> 'BACKGROUND') AND STATUS='ACTIVE'
    )
    ORDER BY "PROGRAM";

    The following query provides clues about whether Oracle has been waiting for library cache activities:

    Select sid, event, p1raw, seconds_in_wait, wait_time
    From v$session_wait
    Where event = 'library cache pin'
    And state = 'WAITING';

    The below query gives details of user sessions' wait time and state:

    SELECT NVL (s.username, '(oracle)') AS username, s.SID, s.serial#, sw.event,
    sw.wait_time, sw.seconds_in_wait, sw.state
    FROM v$session_wait sw, v$session s
    WHERE s.SID = sw.SID
    ORDER BY sw.seconds_in_wait DESC;
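When the per-session detail above is too much, a quick aggregation shows which wait events dominate right now. A minimal sketch (in v$session_wait, wait_time = 0 means the session is currently in the wait):

```sql
-- Sessions currently waiting, grouped by event
select event, count(*) waiting_sessions
from v$session_wait
where wait_time = 0          -- 0 = currently waiting
group by event
order by waiting_sessions desc;
```

A large count on a latch, enqueue or library cache event is the cue to drill into the detailed queries above.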

    9) Max Sessions:
    There should not be more than 6 inactive sessions running for more than 8 hours in a Database in order to minimize the consumption of CPU and I/O resources.

    a)Users and Sessions CPU consumption can be obtained by below query:

    Set lines 1000
    select ss.username, se.SID,VALUE/100 cpu_usage_seconds
    from v$session ss, v$sesstat se, v$statname sn
    where se.STATISTIC# = sn.STATISTIC#
    and NAME like '%CPU used by this session%'
    and se.SID = ss.SID and ss.status='ACTIVE'
    and ss.username is not null
    order by VALUE desc;


    b) Users and Sessions CPU and I/O consumption can be obtained by below query:

    -- shows Day wise,User wise,Process id of server wise- CPU and I/O consumption
    set linesize 140
    col spid for a6
    col program for a35 trunc
    select p.spid SPID,to_char(s.LOGON_TIME,'DDMonYY HH24:MI') date_login,s.username,decode(nvl(p.background,0),1,bg.description, s.program ) program,
    ss.value/100 CPU,physical_reads disk_io,(trunc(sysdate,'J')-trunc(logon_time,'J')) days,
    round((ss.value/100)/(decode((trunc(sysdate,'J')-trunc(logon_time,'J')),0,1,(trunc(sysdate,'J')-trunc(logon_time,'J')))),2) cpu_per_day
    from V$PROCESS p,V$SESSION s,V$SESSTAT ss,V$SESS_IO si,V$BGPROCESS bg
    where s.paddr=p.addr and ss.sid=s.sid
    and ss.statistic#=12 and si.sid=s.sid -- 12 = 'CPU used by this session' here; statistic# numbers are version-dependent, joining v$statname by name is safer
    and bg.paddr(+)=p.addr
    and round((ss.value/100),0) > 10
    order by 8;

    10) Long-running jobs:

    We can find long-running operations with the help of the below query (time_remaining > 0 selects operations still in progress):

    col username for a20
    col message for a50
    col remaining for 9999
    select username,to_char(start_time, 'hh24:mi:ss dd/mm/yy') started,
    time_remaining remaining, message
    from v$session_longops
    where time_remaining > 0
    order by time_remaining desc;

    11) Invalid objects:

    We can check the invalid objects with the help of the below query:

    select owner||' '||object_name||' '||created||' '||status from dba_objects where status='INVALID';

    12) Analyze jobs (once a week):

    As a golden rule, we need to analyze the jobs that run once a week.
    The below steps can be considered when analyzing jobs.

    Analyzing a Running Job
    The status of a job or a task changes several times during its life cycle. A job can have the following as its status:
    Scheduled: The job is created and will run at the specified time.
    Running: The job is being executed and is in progress.
    Initialization Error: The job or step could not be run successfully. If a step in a job fails initialization, the job status is Initialization Error.
    Failed: The job was executed but failed.
    Succeeded: The job was executed completely.
    Stopped: The user canceled the job.
    Stop Pending: The user has stopped the job. The already running steps are completing execution.
    Suspended: This indicates that the execution of the job is deferred.
    Inactive: This status indicates that the target has been deleted.
    Reassigned: The owner of the job has changed.
    Skipped: The job was not executed at the specified time and has been omitted.
    The running jobs can be found out by the help of below query:

    select sid, job,instance from dba_jobs_running;

    We can find out the failed jobs and Broken jobs details with the help of the Below query:

    select job||' '||schema_user||' '||Broken||' '||failures||' '||what||' '||last_date||' '||last_sec from dba_jobs;

    13) Temp usage / Rollback segment/PGA usage:

    We can get information of temporary tablespace usage details with the help of below query:
    Set lines 1000
    SELECT b.tablespace,
    ROUND(((b.blocks*p.value)/1024/1024),2)||'M' "SIZE",
    a.sid||','||a.serial# SID_SERIAL,
    a.username,
    a.program
    FROM sys.v_$session a,
    sys.v_$sort_usage b,
    sys.v_$parameter p
    WHERE p.name = 'db_block_size'
    AND a.saddr = b.session_addr
    ORDER BY b.tablespace, b.blocks;

    We can get information of Undo tablespace usage details with the help of the below query:
    set lines 1000
    SELECT TO_CHAR(s.sid)||','||TO_CHAR(s.serial#) sid_serial,
    NVL(s.username, 'None') orauser,
    s.program,
    r.name undoseg,
    t.used_ublk * TO_NUMBER(x.value)/1024||'K' "Undo"
    FROM sys.v_$rollname r,
    sys.v_$session s,
    sys.v_$transaction t,
    sys.v_$parameter x
    WHERE s.taddr = t.addr
    AND r.usn = t.xidusn(+)
    AND x.name = 'db_block_size';

    We can get the PGA usage details with the help of the below query:
    select st.sid "SID", sn.name "TYPE",
    ceil(st.value / 1024 / 1024/1024) "GB"
    from v$sesstat st, v$statname sn
    where st.statistic# = sn.statistic#
    and sid in
    (select sid from v$session where username like UPPER('&user'))
    and upper(sn.name) like '%PGA%'
    order by st.sid, st.value desc;
    Enter value for user: STARTXNAPP
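For an instance-wide PGA picture (rather than the per-session view above), V$PGASTAT can be checked. A minimal sketch:

```sql
-- Instance-wide PGA usage summary (automatic PGA management)
select name, round(value/1024/1024) mb
from   v$pgastat
where  name in ('total PGA allocated', 'total PGA inuse',
                'maximum PGA allocated');
```

Comparing 'total PGA allocated' against pga_aggregate_target shows whether sessions are overshooting the target.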
    14) Validating the backup (hot backup/cold backup):

    We have to verify that the hot backup/cold backup (or any physical or logical backup) of all production and non-production databases completed successfully. Make sure you have valid backups of all databases, and check the backup locations to confirm each backup completed on time with the data required for restore and recovery if needed.

    15) Redo generation / archive log generation details:
    We should make sure there are not frequent log switches in a database. Frequent log switches generate more archive logs, which can decrease database performance; in a production database the log switch rate varies with the server configuration, roughly between 5 and 20.

    We can get the log switch details with the help of the below query:

    Redolog switch Datewise and hourwise:
    -------------------------------
    set lines 120;
    set pages 999;
    select to_char(first_time,'DD-MON-RR') "Date",
    to_char(sum(decode(to_char(first_time,'HH24'),'00',1,0)),'99') " 00",
    to_char(sum(decode(to_char(first_time,'HH24'),'01',1,0)),'99') " 01",
    to_char(sum(decode(to_char(first_time,'HH24'),'02',1,0)),'99') " 02",
    to_char(sum(decode(to_char(first_time,'HH24'),'03',1,0)),'99') " 03",
    to_char(sum(decode(to_char(first_time,'HH24'),'04',1,0)),'99') " 04",
    to_char(sum(decode(to_char(first_time,'HH24'),'05',1,0)),'99') " 05",
    to_char(sum(decode(to_char(first_time,'HH24'),'06',1,0)),'99') " 06",
    to_char(sum(decode(to_char(first_time,'HH24'),'07',1,0)),'99') " 07",
    to_char(sum(decode(to_char(first_time,'HH24'),'08',1,0)),'99') " 08",
    to_char(sum(decode(to_char(first_time,'HH24'),'09',1,0)),'99') " 09",
    to_char(sum(decode(to_char(first_time,'HH24'),'10',1,0)),'99') " 10",
    to_char(sum(decode(to_char(first_time,'HH24'),'11',1,0)),'99') " 11",
    to_char(sum(decode(to_char(first_time,'HH24'),'12',1,0)),'99') " 12",
    to_char(sum(decode(to_char(first_time,'HH24'),'13',1,0)),'99') " 13",
    to_char(sum(decode(to_char(first_time,'HH24'),'14',1,0)),'99') " 14",
    to_char(sum(decode(to_char(first_time,'HH24'),'15',1,0)),'99') " 15",
    to_char(sum(decode(to_char(first_time,'HH24'),'16',1,0)),'99') " 16",
    to_char(sum(decode(to_char(first_time,'HH24'),'17',1,0)),'99') " 17",
    to_char(sum(decode(to_char(first_time,'HH24'),'18',1,0)),'99') " 18",
    to_char(sum(decode(to_char(first_time,'HH24'),'19',1,0)),'99') " 19",
    to_char(sum(decode(to_char(first_time,'HH24'),'20',1,0)),'99') " 20",
    to_char(sum(decode(to_char(first_time,'HH24'),'21',1,0)),'99') " 21",
    to_char(sum(decode(to_char(first_time,'HH24'),'22',1,0)),'99') " 22",
    to_char(sum(decode(to_char(first_time,'HH24'),'23',1,0)),'99') " 23"
    from v$log_history
    group by to_char(first_time,'DD-MON-RR')
    order by 1
    /
    Archive log generation is directly proportional to the number of log switches happening in a database. If there are frequent log switches, more archive logs are generated, which can affect the performance of the database.


    We can use the below queries for archive logs generation details:

    a)Archive logs by dates:
    set lines 1000
    select to_char(first_time,'DD-MON-RR') "Date",
    to_char(sum(decode(to_char(first_time,'HH24'),'00',1,0)),'99') " 00",
    to_char(sum(decode(to_char(first_time,'HH24'),'01',1,0)),'99') " 01",
    to_char(sum(decode(to_char(first_time,'HH24'),'02',1,0)),'99') " 02",
    to_char(sum(decode(to_char(first_time,'HH24'),'03',1,0)),'99') " 03",
    to_char(sum(decode(to_char(first_time,'HH24'),'04',1,0)),'99') " 04",
    to_char(sum(decode(to_char(first_time,'HH24'),'05',1,0)),'99') " 05",
    to_char(sum(decode(to_char(first_time,'HH24'),'06',1,0)),'99') " 06",
    to_char(sum(decode(to_char(first_time,'HH24'),'07',1,0)),'99') " 07",
    to_char(sum(decode(to_char(first_time,'HH24'),'08',1,0)),'99') " 08",
    to_char(sum(decode(to_char(first_time,'HH24'),'09',1,0)),'99') " 09",
    to_char(sum(decode(to_char(first_time,'HH24'),'10',1,0)),'99') " 10",
    to_char(sum(decode(to_char(first_time,'HH24'),'11',1,0)),'99') " 11",
    to_char(sum(decode(to_char(first_time,'HH24'),'12',1,0)),'99') " 12",
    to_char(sum(decode(to_char(first_time,'HH24'),'13',1,0)),'99') " 13",
    to_char(sum(decode(to_char(first_time,'HH24'),'14',1,0)),'99') " 14",
    to_char(sum(decode(to_char(first_time,'HH24'),'15',1,0)),'99') " 15",
    to_char(sum(decode(to_char(first_time,'HH24'),'16',1,0)),'99') " 16",
    to_char(sum(decode(to_char(first_time,'HH24'),'17',1,0)),'99') " 17",
    to_char(sum(decode(to_char(first_time,'HH24'),'18',1,0)),'99') " 18",
    to_char(sum(decode(to_char(first_time,'HH24'),'19',1,0)),'99') " 19",
    to_char(sum(decode(to_char(first_time,'HH24'),'20',1,0)),'99') " 20",
    to_char(sum(decode(to_char(first_time,'HH24'),'21',1,0)),'99') " 21",
    to_char(sum(decode(to_char(first_time,'HH24'),'22',1,0)),'99') " 22",
    to_char(sum(decode(to_char(first_time,'HH24'),'23',1,0)),'99') " 23"
    from v$log_history
    group by to_char(first_time,'DD-MON-RR')
    order by 1
    /
    b)Archive log generation details Day-wise :

    select to_char(COMPLETION_TIME,'DD-MON-YYYY'),count(*)
    from v$archived_log group by to_char(COMPLETION_TIME,'DD-MON-YYYY')
    order by to_char(COMPLETION_TIME,'DD-MON-YYYY');

    c) Archive log count of the day:

    select count(*)
    from v$archived_log
    where trunc(completion_time)=trunc(sysdate);

    count of archived logs generated today on hourly basis:
    -------------------------------------------------------
    select to_char(first_time,'DD-MON-RR') "Date",
    to_char(sum(decode(to_char(first_time,'HH24'),'00',1,0)),'99') " 00",
    to_char(sum(decode(to_char(first_time,'HH24'),'01',1,0)),'99') " 01",
    to_char(sum(decode(to_char(first_time,'HH24'),'02',1,0)),'99') " 02",
    to_char(sum(decode(to_char(first_time,'HH24'),'03',1,0)),'99') " 03",
    to_char(sum(decode(to_char(first_time,'HH24'),'04',1,0)),'99') " 04",
    to_char(sum(decode(to_char(first_time,'HH24'),'05',1,0)),'99') " 05",
    to_char(sum(decode(to_char(first_time,'HH24'),'06',1,0)),'99') " 06",
    to_char(sum(decode(to_char(first_time,'HH24'),'07',1,0)),'99') " 07",
    to_char(sum(decode(to_char(first_time,'HH24'),'08',1,0)),'99') " 08",
    to_char(sum(decode(to_char(first_time,'HH24'),'09',1,0)),'99') " 09",
    to_char(sum(decode(to_char(first_time,'HH24'),'10',1,0)),'99') " 10",
    to_char(sum(decode(to_char(first_time,'HH24'),'11',1,0)),'99') " 11",
    to_char(sum(decode(to_char(first_time,'HH24'),'12',1,0)),'99') " 12",
    to_char(sum(decode(to_char(first_time,'HH24'),'13',1,0)),'99') " 13",
    to_char(sum(decode(to_char(first_time,'HH24'),'14',1,0)),'99') " 14",
    to_char(sum(decode(to_char(first_time,'HH24'),'15',1,0)),'99') " 15",
    to_char(sum(decode(to_char(first_time,'HH24'),'16',1,0)),'99') " 16",
    to_char(sum(decode(to_char(first_time,'HH24'),'17',1,0)),'99') " 17",
    to_char(sum(decode(to_char(first_time,'HH24'),'18',1,0)),'99') " 18",
    to_char(sum(decode(to_char(first_time,'HH24'),'19',1,0)),'99') " 19",
    to_char(sum(decode(to_char(first_time,'HH24'),'20',1,0)),'99') " 20",
    to_char(sum(decode(to_char(first_time,'HH24'),'21',1,0)),'99') " 21",
    to_char(sum(decode(to_char(first_time,'HH24'),'22',1,0)),'99') " 22",
    to_char(sum(decode(to_char(first_time,'HH24'),'23',1,0)),'99') " 23"
    from v$log_history
    where to_char(first_time,'DD-MON-RR')=to_char(sysdate,'DD-MON-RR')
    group by to_char(first_time,'DD-MON-RR')
    order by 1
    /

    16)I/O Generation:
    We can find out CPU and I/O generation details for all the users in the Database with the help of the below query:
    -- Show IO per session,CPU in seconds, sessionIOS.
    set linesize 140
    col spid for a6
    col program for a35 trunc
    select p.spid SPID,to_char(s.LOGON_TIME,'DDMonYY HH24:MI') date_login,s.username,decode(nvl(p.background,0),1,bg.description, s.program ) program,
    ss.value/100 CPU,physical_reads disk_io,(trunc(sysdate,'J')-trunc(logon_time,'J')) days,
    round((ss.value/100)/(decode((trunc(sysdate,'J')-trunc(logon_time,'J')),0,1,(trunc(sysdate,'J')-trunc(logon_time,'J')))),2) cpu_per_day
    from V$PROCESS p,V$SESSION s,V$SESSTAT ss,V$SESS_IO si,V$BGPROCESS bg
    where s.paddr=p.addr and ss.sid=s.sid
    and ss.statistic#=12 and si.sid=s.sid -- 12 = 'CPU used by this session' here; statistic# numbers are version-dependent, joining v$statname by name is safer
    and bg.paddr(+)=p.addr
    and round((ss.value/100),0) > 10
    order by 8;
    To know what the session is doing and what kind of sql it is using:

    -- what kind of sql a session is using
    set lines 9999
    set pages 9999

    select s.sid, q.sql_text from v$sqltext q, v$session s
    where q.address = s.sql_address
    and s.sid = &sid order by piece;

    eg: sid=1853

    17)Sync arch:
    In a Data Guard environment we have to check that the primary is in sync with the standby database. This can be checked as follows:
    The V$MANAGED_STANDBY view on the standby database site shows the activities performed by
    both the redo transport and Redo Apply processes in a Data Guard environment:
    SELECT PROCESS, CLIENT_PROCESS, SEQUENCE#, STATUS FROM V$MANAGED_STANDBY;
    In some situations, automatic gap recovery may not take place and you will need to perform gap recovery manually. For example, you will need to perform gap recovery manually if you are using logical standby databases and the primary database is not available.
    The following sections describe how to query the appropriate views to determine which log files are missing and perform manual recovery.
    On a physical standby database
    To determine if there is an archive gap on your physical standby database, query the V$ARCHIVE_GAP view as shown in the following example:
    SQL> SELECT * FROM V$ARCHIVE_GAP;

    If it displays no rows, then the primary database is in sync with the standby database. If it displays any rows, then we have to apply the missing archive logs manually.

    After you identify the gap, issue the following SQL statement on the primary database to locate the archived redo log files on your primary database (assuming the local archive destination on the primary database is LOG_ARCHIVE_DEST_1):
    Eg:
    SELECT NAME FROM V$ARCHIVED_LOG WHERE THREAD#=1 AND DEST_ID=1 AND SEQUENCE# BETWEEN 7 AND 10;
    Copy these log files to your physical standby database and register them using the ALTER DATABASE REGISTER LOGFILE statement on your physical standby database. For example:
    SQL> ALTER DATABASE REGISTER LOGFILE
    '/physical_standby1/thread1_dest/arcr_1_7.arc';
    SQL> ALTER DATABASE REGISTER LOGFILE
    '/physical_standby1/thread1_dest/arcr_1_8.arc';

    After you register these log files on the physical standby database, you can restart Redo Apply. The V$ARCHIVE_GAP fixed view on a physical standby database only returns the next gap that is currently blocking Redo Apply from continuing. After resolving the gap and starting Redo Apply, query the V$ARCHIVE_GAP fixed view again on the physical standby database to determine the next gap sequence, if there is one. Repeat this process until there are no more gaps.

    On a logical standby database:
    To determine if there is an archive gap, query the DBA_LOGSTDBY_LOG view on the logical standby database. For example, the following query indicates there is a gap in the sequence of archived redo log files because it displays two files for THREAD 1 on the logical standby database. (If there are no gaps, the query will show only one file for each thread.) The output shows that the highest registered file is sequence number 10, but there is a gap at the file shown as sequence number 6:
    SQL> COLUMN FILE_NAME FORMAT a55
    SQL> SELECT THREAD#, SEQUENCE#, FILE_NAME FROM DBA_LOGSTDBY_LOG L
    2> WHERE NEXT_CHANGE# NOT IN
    3> (SELECT FIRST_CHANGE# FROM DBA_LOGSTDBY_LOG WHERE L.THREAD# = THREAD#)
    4> ORDER BY THREAD#,SEQUENCE#;

    THREAD# SEQUENCE# FILE_NAME
    ---------- ---------- -----------------------------------------------
    1 6 /disk1/oracle/dbs/log-1292880008_6.arc
    1 10 /disk1/oracle/dbs/log-1292880008_10.arc

    Copy the missing log files, with sequence numbers 7, 8, and 9, to the logical standby system and register them using the ALTER DATABASE REGISTER LOGICAL LOGFILE statement on your logical standby database. For example:
    SQL> ALTER DATABASE REGISTER LOGICAL LOGFILE '/disk1/oracle/dbs/log-1292880008_7.arc';

    After you register these log files on the logical standby database, you can restart SQL Apply.

    The DBA_LOGSTDBY_LOG view on a logical standby database only returns the next gap that is currently blocking SQL Apply from continuing. After resolving the identified gap and starting SQL Apply, query the DBA_LOGSTDBY_LOG view again on the logical standby database to determine the next gap sequence, if there is one. Repeat this process until there are no more gaps.
    Monitoring Log File Archival Information:
    Step 1 Determine the current archived redo log file sequence numbers.
    Enter the following query on the primary database to determine the current archived redo log file sequence numbers:
    SQL> SELECT THREAD#, SEQUENCE#, ARCHIVED, STATUS FROM V$LOG
    WHERE STATUS='CURRENT';
    Step 2 Determine the most recent archived redo log file.
    Enter the following query at the primary database to determine which archived redo log file contains the most recently transmitted redo data:
    SQL> SELECT MAX(SEQUENCE#), THREAD# FROM V$ARCHIVED_LOG GROUP BY THREAD#;
    Step 3 Determine the most recent archived redo log file at each destination.
    Enter the following query at the primary database to determine which archived redo log file was most recently transmitted to each of the archiving destinations:
    SQL> SELECT DESTINATION, STATUS, ARCHIVED_THREAD#, ARCHIVED_SEQ#
    2> FROM V$ARCHIVE_DEST_STATUS
    3> WHERE STATUS <> 'DEFERRED' AND STATUS <> 'INACTIVE';

    DESTINATION STATUS ARCHIVED_THREAD# ARCHIVED_SEQ#
    ------------------ ------ ---------------- -------------
    /private1/prmy/lad VALID 1 947
    standby1 VALID 1 947
    The most recently written archived redo log file should be the same for each archive destination listed. If it is not, a status other than VALID might identify an error encountered during the archival operation to that destination.
    Step 4 Find out if archived redo log files have been received.
    You can issue a query at the primary database to find out if an archived redo log file was not received at a particular site. Each destination has an ID number associated with it. You can query the DEST_ID column of the V$ARCHIVE_DEST fixed view on the primary database to identify each destination's ID number.
    Assume the current local destination is 1, and one of the remote standby destination IDs is 2. To identify which log files are missing at the standby destination, issue the following query:
    SQL> SELECT LOCAL.THREAD#, LOCAL.SEQUENCE# FROM
    2> (SELECT THREAD#, SEQUENCE# FROM V$ARCHIVED_LOG WHERE DEST_ID=1)
    3> LOCAL WHERE
    4> LOCAL.SEQUENCE# NOT IN
    5> (SELECT SEQUENCE# FROM V$ARCHIVED_LOG WHERE DEST_ID=2 AND
    6> THREAD# = LOCAL.THREAD#);

    THREAD# SEQUENCE#
    --------- ---------
    1 12
    1 13
    1 14

    18)Purge arch:
    We have to make sure the archive log files are purged safely, or moved to a tape drive or another location, in order to make space for new archive log files in the archive log destination.
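As a sketch of how this is commonly done (my addition, not from the original post), RMAN can back up and purge archive logs so the controlfile stays consistent; the 7-day window below is an illustrative assumption, not a recommendation:

```sql
-- RMAN sketch: reconcile the catalog, back up archive logs not yet backed up,
-- then delete only logs older than an assumed 7-day window.
RMAN> CROSSCHECK ARCHIVELOG ALL;
RMAN> BACKUP ARCHIVELOG ALL NOT BACKED UP 1 TIMES;
RMAN> DELETE NOPROMPT ARCHIVELOG ALL COMPLETED BEFORE 'SYSDATE-7';
```

Adjust the retention window to match your backup policy before running the DELETE.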

    19)Recovery status:
    To be able to recover, make sure you have the latest archive logs available so that you can restore and perform recovery if required.
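A minimal sketch (my addition, not part of the original checklist) to see whether any datafiles currently need media recovery; V$RECOVER_FILE returns no rows when nothing is pending:

```sql
-- Lists datafiles needing media recovery; no rows means no recovery is pending.
SELECT file#, online_status, error, change#, time
FROM   v$recover_file;
```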


    20) MY DATABASE HEALTH CHECK SCRIPT:
    /* SCRIPT FOR MONITORING AND CHECKING HEALTH OF DATABASE-USEFUL FOR PRODUCTION DATABASES */

    -- SHOWS RUNNING JOBS
    select 'RUNNING JOBS', sid, job,instance from dba_jobs_running;
    set lines 1000
    -- SHOWS ARCHIVE LOGS GENERATION DETAILS HOURLY AND DATE WISE BASIS
    select 'ARCHIVE LOG REPORT',to_char(first_time,'DD-MON-RR') "Date",
    to_char(sum(decode(to_char(first_time,'HH24'),'00',1,0)),'99') " 00",
    to_char(sum(decode(to_char(first_time,'HH24'),'01',1,0)),'99') " 01",
    to_char(sum(decode(to_char(first_time,'HH24'),'02',1,0)),'99') " 02",
    to_char(sum(decode(to_char(first_time,'HH24'),'03',1,0)),'99') " 03",
    to_char(sum(decode(to_char(first_time,'HH24'),'04',1,0)),'99') " 04",
    to_char(sum(decode(to_char(first_time,'HH24'),'05',1,0)),'99') " 05",
    to_char(sum(decode(to_char(first_time,'HH24'),'06',1,0)),'99') " 06",
    to_char(sum(decode(to_char(first_time,'HH24'),'07',1,0)),'99') " 07",
    to_char(sum(decode(to_char(first_time,'HH24'),'08',1,0)),'99') " 08",
    to_char(sum(decode(to_char(first_time,'HH24'),'09',1,0)),'99') " 09",
    to_char(sum(decode(to_char(first_time,'HH24'),'10',1,0)),'99') " 10",
    to_char(sum(decode(to_char(first_time,'HH24'),'11',1,0)),'99') " 11",
    to_char(sum(decode(to_char(first_time,'HH24'),'12',1,0)),'99') " 12",
    to_char(sum(decode(to_char(first_time,'HH24'),'13',1,0)),'99') " 13",
    to_char(sum(decode(to_char(first_time,'HH24'),'14',1,0)),'99') " 14",
    to_char(sum(decode(to_char(first_time,'HH24'),'15',1,0)),'99') " 15",
    to_char(sum(decode(to_char(first_time,'HH24'),'16',1,0)),'99') " 16",
    to_char(sum(decode(to_char(first_time,'HH24'),'17',1,0)),'99') " 17",
    to_char(sum(decode(to_char(first_time,'HH24'),'18',1,0)),'99') " 18",
    to_char(sum(decode(to_char(first_time,'HH24'),'19',1,0)),'99') " 19",
    to_char(sum(decode(to_char(first_time,'HH24'),'20',1,0)),'99') " 20",
    to_char(sum(decode(to_char(first_time,'HH24'),'21',1,0)),'99') " 21",
    to_char(sum(decode(to_char(first_time,'HH24'),'22',1,0)),'99') " 22",
    to_char(sum(decode(to_char(first_time,'HH24'),'23',1,0)),'99') " 23"
    from v$log_history
    group by to_char(first_time,'DD-MON-RR')
    order by 1
    /
    -- WHAT ALL THE SESSIONS ARE GETTING BLOCKED
    select 'SESSIONS BLOCKED',process,sid, blocking_session from v$session where blocking_session is not null;
    -- WHICH SESSION IS BLOCKING WHICH SESSION
    set lines 9999
    set pages 9999
    select s1.username || '@' || s1.machine
    || ' ( SID=' || s1.sid || ' ) is blocking '
    || s2.username || '@' || s2.machine || ' ( SID=' || s2.sid || ' ) ' AS blocking_status
    from v$lock l1, v$session s1, v$lock l2, v$session s2
    where s1.sid=l1.sid and s2.sid=l2.sid
    and l1.BLOCK=1 and l2.request > 0
    and l1.id1 = l2.id1
    and l1.id2 = l2.id2 ;
    -- SHOWS BLOCK CHANGES DETAILS AND PHYSICAL READS DETAIL
    select a.sid,b.username,block_gets,consistent_gets,physical_reads,block_changes
    from V$SESS_IO a,V$SESSION b
    where a.sid=b.sid and block_changes > 10000 order by block_changes desc;
    -- show IO per session / CPU in seconds. sessionIOS.sql
    set linesize 140
    col spid for a6
    col program for a35 trunc
    select p.spid SPID,to_char(s.LOGON_TIME,'DDMonYY HH24:MI') date_login,s.username,decode(nvl(p.background,0),1,bg.description, s.program ) program,ss.value/100 CPU,physical_reads disk_io,(trunc(sysdate,'J')-trunc(logon_time,'J')) days,round((ss.value/100)/(decode((trunc(sysdate,'J')-trunc(logon_time,'J')),0,1,(trunc(sysdate,'J')-trunc(logon_time,'J')))),2) cpu_per_day
    from V$PROCESS p,V$SESSION s,V$SESSTAT ss,V$SESS_IO si,V$BGPROCESS bg
    where s.paddr=p.addr and ss.sid=s.sid
    and ss.statistic#=12 and si.sid=s.sid
    and bg.paddr(+)=p.addr
    and round((ss.value/100),0) > 10
    order by 8;
    -- SCRIPT TO IDENTIFY LONG RUNNING STATEMENTS
    rem LONGOPS.SQL
    rem Long Running Statements
    rem Helmut Pfau, Oracle Deutschland GmbH
    set linesize 120
    col opname format a20
    col target format a15
    col units format a10
    col time_remaining format 99990 heading Remaining[s]
    col bps format 9990.99 heading [Units/s]
    col fertig format 90.99 heading "complete[%]"
    select sid,
    opname,
    target,
    sofar,
    totalwork,
    units,
    (totalwork-sofar)/time_remaining bps,
    time_remaining,
    sofar/totalwork*100 fertig
    from v$session_longops
    where time_remaining > 0
    /
    -- ACTIVE SESSIONS IN DATABASE
    select 'ACTIVE SESSION', sid, serial#, machine, osuser, username, status from v$session where username is not null and status='ACTIVE';
    -- WHAT SQL A SESSION IS USING
    set lines 9999
    set pages 9999
    select s.sid, q.sql_text from v$sqltext q, v$session s
    where q.address = s.sql_address
    and s.sid = &sid order by piece;

    eg:SID=1844
    I would like to add one more script which reports the size of the database (total, used, and free space) and tablespace usage
    details, along with the hit ratios of various SGA components, which can be very helpful
    for monitoring the performance of the databases.

    Database_monitor.sql:

    ttitle "1. :============== Tablespace Usage Information ==================:" skip 2
    set linesize 140
    col Total format 99999.99 heading "Total space(MB)"
    col Used format 99999.99 heading "Used space(MB)"
    col Free format 99999.99 heading "Free space(MB)"
    break on report
    compute sum of Total on report
    compute sum of Used on report
    compute sum of Free on report
    select a.tablespace_name, round(a.bytes/1024/1024,2) Total,
    round( nvl( b.bytes,0)/1024/1024,2) Used,
    round(nvl(c.bytes, 0)/1024/1024,2) Free ,
    round(nvl(b.bytes,0)*100/nvl(a.bytes,0),2) "% Used"
    from sys.sm$ts_avail a, sys.sm$ts_used b, sys.sm$ts_free c
    where a.tablespace_name=b.tablespace_name(+)
    and b.tablespace_name=c.tablespace_name(+);

    ttitle "2. :============== Hit Ratio Information ==================:" skip 2
    set linesize 80
    clear columns
    clear breaks
    set pagesize 60 heading off termout off echo off verify off
    REM
    col val1 new_val lib noprint
    select 100*(1-(SUM(Reloads)/SUM(Pins))) val1
    from V$LIBRARYCACHE;

    ttitle off
    col val2 new_val dict noprint
    select 100*(1-(SUM(Getmisses)/SUM(Gets))) val2
    from V$ROWCACHE;

    ttitle off
    col val3 new_val phys_reads noprint
    select Value val3
    from V$SYSSTAT
    where Name = 'physical reads';

    ttitle off
    col val4 new_val log1_reads noprint
    select Value val4
    from V$SYSSTAT
    where Name = 'db block gets';

    ttitle off
    col val5 new_val log2_reads noprint
    select Value val5
    from V$SYSSTAT
    where Name = 'consistent gets';

    ttitle off
    col val6 new_val chr noprint
    select 100*(1-(&phys_reads / (&log1_reads + &log2_reads))) val6
    from DUAL;

    ttitle off
    col val7 new_val avg_users_cursor noprint
    col val8 new_val avg_stmts_exe noprint
    select SUM(Users_Opening)/COUNT(*) val7,
    SUM(Executions)/COUNT(*) val8
    from V$SQLAREA;
    ttitle off
    set termout on
    set heading off
    ttitle -
    center 'SGA Cache Hit Ratios' skip 2

    select 'Data Block Buffer Hit Ratio : '||&chr db_hit_ratio,
    ' Shared SQL Pool ',
    ' Dictionary Hit Ratio : '||&dict dict_hit,
    ' Shared SQL Buffers (Library Cache) ',
    ' Cache Hit Ratio : '||&lib lib_hit,
    ' Avg. Users/Stmt : '||
    &avg_users_cursor||' ',
    ' Avg. Executes/Stmt : '||
    &avg_stmts_exe||' '
    from DUAL;

    ttitle "3. :============== Sort Information ==================:" skip 2

    select A.Value Disk_Sorts,
    B.Value Memory_Sorts,
    ROUND(100*A.Value/
    DECODE((A.Value+B.Value),0,1,(A.Value+B.Value)),2)
    Pct_Disk_Sorts
    from V$SYSSTAT A, V$SYSSTAT B
    where A.Name = 'sorts (disk)'
    and B.Name = 'sorts (memory)';

    ttitle "4. :============== Database Size Information ==================:" skip 2


    select sum(bytes/1024/1024/1024) Avail from sm$ts_avail
    union all
    select sum(bytes/1024/1024/1024) Used from sm$ts_used
    union all
    select sum(bytes/1024/1024/1024) Free from sm$ts_free;


    Hope this helps you in monitoring your Databases.

Thursday, May 10, 2012

Oracle 11gR2 – Active Data Guard 

Active Data Guard allows a standby database to be opened for read-only access whilst redo is still being applied. For some applications Active Data Guard can represent a more efficient use of Oracle licenses on the standby database. However, this benefit is offset to a certain extent by the fact that Active Data Guard is available on Enterprise Edition only and is a cost option which must be licensed on both the primary and standby databases.
Several of my customers are currently using Active Data Guard; in general they are very happy with it. A few others have discovered that it is very easy to inadvertently enable Active Data Guard. This is not desirable or advisable as Oracle have instigated licence audits with a large number of UK customers over the past couple of years.
To determine whether a standby database is using Active Data Guard use the following query:

SELECT database_role, open_mode FROM v$database;
 
For example:
SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY READ ONLY WITH APPLY
 
 
If you start a database in SQL*Plus using the STARTUP command and then invoke managed recovery, Active Data Guard will be enabled. For example:

[oracle@server14]$ sqlplus / as sysdba
 SQL> STARTUP
ORACLE instance started.
Total System Global Area 6497189888 bytes
Fixed Size 2238672 bytes
Variable Size 3372222256 bytes
Database Buffers 3103784960 bytes
Redo Buffers 18944000 bytes
Database mounted
Database opened

SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY READ ONLY

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
USING CURRENT LOGFILE
WITH SESSION SHUTDOWN;

SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY READ ONLY WITH APPLY
 
However, if the database is started in SQL*Plus using the STARTUP MOUNT command and then managed recovery is invoked, Active Data Guard will not be enabled.

[oracle@server14]$ sqlplus / as sysdba
SQL> STARTUP MOUNT
ORACLE instance started.
Total System Global Area 6497189888 bytes
Fixed Size 2238672 bytes
Variable Size 3372222256 bytes
Database Buffers 3103784960 bytes
Redo Buffers 18944000 bytes
Database mounted
Database opened

SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY MOUNTED

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
USING CURRENT LOGFILE
WITH SESSION SHUTDOWN;

SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY MOUNTED
 
 
If the database has been started in SQL*Plus using STARTUP MOUNT and is subsequently opened read-only, then invoking managed recovery will enable Active Data Guard. For example:

[oracle@server14]$ sqlplus / as sysdba
SQL> STARTUP MOUNT
ORACLE instance started.
Total System Global Area 6497189888 bytes
Fixed Size 2238672 bytes
Variable Size 3372222256 bytes
Database Buffers 3103784960 bytes
Redo Buffers 18944000 bytes
Database mounted
Database opened

SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY MOUNTED

SQL> ALTER DATABASE OPEN READ ONLY;

SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY READ ONLY

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
USING CURRENT LOGFILE
WITH SESSION SHUTDOWN;

SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY READ ONLY WITH APPLY
 
 
Of course not all databases are started using SQL*Plus.
If you start the database using SRVCTL then the default open mode can be specified in the OCR.
You can check the default open mode for a database using SRVCTL CONFIG DATABASE. For example if the database is called PROD:

[oracle@server14]$ srvctl config database -d PROD
Database unique name: PROD
Database name: PROD
Oracle home: /u01/app/oracle/product/11.2.0/dbhome_1
Oracle user: oracle
Spfile: +DATA1/PROD/spfilePROD.ora
Domain:
Start options: open
Stop options: immediate
Database role: PHYSICAL_STANDBY
Management policy: AUTOMATIC
Server pools: PROD
Disk Groups: DATA1, FRA1
Mount point paths:
Services:
Type: SINGLE
Database is administrator managed
 
In the above example, if the PROD database is started using SRVCTL then the database will be opened in read-only mode. For example:

[oracle@server14]$ srvctl start database -d PROD
[oracle@server14]$ sqlplus / as sysdba

SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY READ ONLY

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
USING CURRENT LOGFILE
WITH SESSION SHUTDOWN;

SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY READ ONLY WITH APPLY
The default start mode can be modified in the OCR using the SRVCTL MODIFY DATABASE command.
For example:

[oracle@server14]$ srvctl modify database -d PROD -s mount
The database configuration is updated as follows:
[oracle@server14]$ srvctl config database -d PROD
Database unique name: PROD
Database name: PROD
Oracle home: /u01/app/oracle/product/11.2.0/dbhome_1
Oracle user: oracle
Spfile: +DATA1/PROD/spfilePROD.ora
Domain:
Start options: mount
Stop options: immediate
Database role: PHYSICAL_STANDBY
Management policy: AUTOMATIC
Server pools: PROD
Disk Groups: DATA1, FRA1
Mount point paths:
Services:
Type: SINGLE
Database is administrator managed
 
When the default start mode is set to mount, Active Data Guard will not be enabled when managed recovery is invoked. For example:

[oracle@server14]$ srvctl start database -d PROD
 
 [oracle@server14]$ sqlplus / as sysdba

SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY MOUNTED

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
USING CURRENT LOGFILE
WITH SESSION SHUTDOWN;

SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY MOUNTED
You can also specify the start mode as a parameter to the SRVCTL START DATABASE command.
For example:
[oracle@server14] srvctl start database -d PROD -o open
[oracle@server14] srvctl start database -d PROD -o mount
Take care when performing a switchover or switchback that the OCR is updated as part of the procedure.
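Since inadvertent enablement has licensing implications, it can also be worth checking what the database itself has recorded. As a hedged sketch (my addition, not from the original post), DBA_FEATURE_USAGE_STATISTICS tracks Active Data Guard usage; the exact feature name can vary slightly between versions, hence the LIKE:

```sql
-- Shows whether Active Data Guard usage has ever been detected on this database.
SELECT name, version, detected_usages, currently_used, last_usage_date
FROM   dba_feature_usage_statistics
WHERE  name LIKE '%Active Data Guard%';
```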


Note: The below query helps to get the archive gap (archive lag) from the standby database.


SQL> select name,open_mode from v$database;

NAME      OPEN_MODE
--------- --------------------
PR01PIMI  READ ONLY WITH APPLY

SQL> SELECT ARCH.THREAD# "Thread", ARCH.SEQUENCE# "Last Sequence Received",
APPL.SEQUENCE# "Last Sequence Applied", (ARCH.SEQUENCE# - APPL.SEQUENCE#) "Difference"
FROM
(SELECT THREAD#, SEQUENCE# FROM V$ARCHIVED_LOG WHERE (THREAD#, FIRST_TIME)
IN (SELECT THREAD#, MAX(FIRST_TIME) FROM V$ARCHIVED_LOG GROUP BY THREAD#)) ARCH,
(SELECT THREAD#, SEQUENCE# FROM V$LOG_HISTORY WHERE (THREAD#, FIRST_TIME)
IN (SELECT THREAD#, MAX(FIRST_TIME) FROM V$LOG_HISTORY GROUP BY THREAD#)) APPL
WHERE ARCH.THREAD# = APPL.THREAD#
ORDER BY 1
/

    Thread Last Sequence Received Last Sequence Applied Difference
---------- ---------------------- --------------------- ----------
         1                  72253                 72253          0
         2                  63653                 63653          0
         3                  57721                 57720          1
         4                  63051                 63050          1
         5                  61088                 61086          2
         6                  98916                 98915          1
         7                  68499                 68499          0
         8                  77052                 77052          0

8 rows selected.

Wednesday, May 9, 2012

Oracle's Sun Database Machine X2-2 Setup / Configuration Best Practices

 


Applies to:

Oracle Exadata Storage Server Software - Version: 11.2.1.2.0 to 11.2.2.1.1 - Release: 11.2 to 11.2
The information in this document applies to any platform.

Purpose

The goal of this document is to present the best practices for the deployment of Sun Oracle Database Machine X2-2 in the area of Setup and Configuration.

Scope and Application

General audience working on the Sun Oracle Database Machine X2-2.


The primary and standby databases should NOT reside on the same IB fabric

Priority: Critical | Added: N/A | Machine Type: X2-2 (4170), X2-2, X2-8 | OS: Linux | Exadata Version: 11.2.x+ | Oracle Version: 11.2.x+
Benefit / Impact: To properly protect the primary databases residing on the "primary" Exadata Database Machine, the physical standby database requires fault isolation from IB switch maintenance issues, IB switch failures and software issues, RDS bugs and timeouts or any issue resulting from a complete IB fabric failure. To protect the standby from these failures that impact the primary's availability, we highly recommend that at least one viable standby database resides on a separate IB fabric.

Risk:
If the primary and standby reside on the same IB fabric, both the primary and standby systems can become unavailable due to a bug causing an IB fabric failure.
Action / Repair: The primary and at least one viable standby database must not reside on the same inter-racked Exadata Database Machine. The communication between the primary and standby Exadata Database Machines must use GigE or 10GigE. The trade-off is lower network bandwidth. The higher network bandwidth is desirable for standby database instantiation (only needed the first time), but that requirement is eliminated for post-failover operations when flashback database is enabled.
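Since the bandwidth trade-off above only holds when flashback database is enabled, it is worth verifying that setting; a minimal sketch (my addition, not from the original document):

```sql
-- FLASHBACK_ON should be YES on the primary to allow cheap reinstatement after failover.
SELECT flashback_on FROM v$database;
```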

Use the hostname and domain name in lower case

Priority: Critical | Added: N/A | Machine Type: X2-2 (4170), X2-2, X2-8 | OS: Linux | Exadata Version: 11.2.x+ | Oracle Version: 11.2.x+
Benefit / Impact:
Using lowercase will avoid known deployment time issues.
Risk:
OneCommand deployment will fail in step 16 if this is not done, aborting the installation with:
"ERROR: unable to locate the file to check for the string 'Configure Oracle Grid Infrastructure for the Cluster, a ... succeeded' # Step 16 #"
Action / Repair:
As a best practice, use lower case for hostnames and domain names.

Verify Hardware and Firmware on Database and Storage Servers (CheckHWnFWProfile)

Priority: Critical | Added: N/A | Machine Type: X2-2 (4170), X2-2, X2-8 | OS: Linux | Exadata Version: 11.2.x+ | Oracle Version: 11.2.x+
Benefit / Impact:
The Oracle Exadata Database Machine is tightly integrated, and verifying the hardware and firmware before the Oracle Exadata Database Machine is placed into or returned to production status can avoid problems related to the hardware or firmware modifications.
The impact for these verification steps is minimal.
Risk:
If the hardware and firmware are not validated, inconsistencies between database and storage servers can lead to problems and outages.
Action / Repair:
To verify the hardware and firmware configuration, execute the following command as the root userid:
 
  /opt/oracle.SupportTools/CheckHWnFWProfile
 
 
The output will contain a line similar to the following:
  [SUCCESS] The hardware and firmware profile matches one of the supported profile 
If any result other than "SUCCESS" is returned, investigate and correct the condition.
NOTE: CheckHWnFWProfile is also executed at each boot of the storage and database servers.

Verify Software on Storage Servers (CheckSWProfile.sh)

Priority: Critical | Added: N/A | Machine Type: X2-2 (4170), X2-2, X2-8 | OS: Linux | Exadata Version: 11.2.x+ | Oracle Version: 11.2.x+
Benefit / Impact:
Verifying the software configuration after initial deployment, upgrades, or patching and before the Oracle Exadata Database Machine is placed into or returned to production status can avoid problems related to the software modifications.
The overhead for these verification steps is minimal.
Risk:
If the software is not validated, inconsistencies between database and storage servers can lead to problems and outages.
Action / Repair:
To verify the storage server software configuration, execute the following command as the root userid:
 
  /opt/oracle.SupportTools/CheckSWProfile.sh -c
 
The output will be similar to:
  [INFO] SUCCESS: Meets requirements of operating platform and installed software for
  [INFO] below listed releases and patches of Exadata and of corresponding Database.
  [INFO] Check does NOT verify correctness of configuration for the installed software.

  [ExadataAndDatabaseReleases]
  Exadata: 11.2.2.1.0 OracleDatabase: 11.2.0.2 + Patches
If any result other than "SUCCESS" is returned, investigate and correct the condition.

Verify Software on InfiniBand Switches (CheckSWProfile.sh)

Priority: Critical | Added: N/A | Machine Type: X2-2 (4170), X2-2, X2-8 | OS: Linux | Exadata Version: 11.2.x+ | Oracle Version: 11.2.x+
Benefit / Impact:
Verifying the software configuration after initial deployment, upgrades, or patching and before the Oracle Exadata Database Machine is placed into or returned to production status can avoid problems related to the software modifications.
The overhead for these verification steps is minimal.
Risk:
If the software is not validated, problems may occur when the machine is utilized.
Action / Repair:
The commands required to verify the InfiniBand switch software configuration vary slightly with the physical configuration of the Oracle Exadata Database Machine. The key difference is whether or not the physical configuration includes a designated spine switch.
To verify the InfiniBand switch software configuration for an X2-8, a full rack Oracle Exadata Database Machine X2-2, or a late production model half rack Oracle Exadata Database Machine X2-2 with a designated spine switch properly configured per the "Oracle Exadata Database Machine Owner's Guide 11g Release 2 (11.2) E13874-15" with "sm_priority = 8" and the name "RanDomsw-ib1", execute the following command as the "root" userid on one of the database servers:

  /opt/oracle.SupportTools/CheckSWProfile.sh -I IS_SPINERanDomsw-ib1,RanDomsw-ib3,RanDomsw-ib2

Where "RanDomsw-ib1, RanDomsw-ib3, and RanDomsw-ib2" are the switch names returned by the "ibswitches" command.
NOTE: There is no space between the "IS_SPINE" qualifier and the name of the designated spine switch.
The output will be similar to:
  Checking if switch RanDomsw-ib1 is pingable ...
  Checking if switch RanDomsw-ib3 is pingable ...
  Checking if switch RanDomsw-ib2 is pingable ...
  Use the default password for all switches? (y/n) [n]: y
  [INFO] SUCCESS Switch RanDomsw-ib1 has correct software and firmware version: SWVer: 1.3.3-2
  [INFO] SUCCESS Switch RanDomsw-ib1 has correct opensm configuration: controlled_handover=TRUE polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=8
  [INFO] SUCCESS Switch RanDomsw-ib3 has correct software and firmware version: SWVer: 1.3.3-2
  [INFO] SUCCESS Switch RanDomsw-ib3 has correct opensm configuration: controlled_handover=TRUE polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=5
  [INFO] SUCCESS Switch RanDomsw-ib2 has correct software and firmware version: SWVer: 1.3.3-2
  [INFO] SUCCESS Switch RanDomsw-ib2 has correct opensm configuration: controlled_handover=TRUE polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=5
  [INFO] SUCCESS All switches have correct software and firmware version: SWVer: 1.3.3-2
  [INFO] SUCCESS All switches have correct opensm configuration: controlled_handover=TRUE polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=5 for non-spine and 8 for spine switch
To verify the InfiniBand switches software configuration for an early production model half rack Oracle Exadata Database Machine X2-2 (which may not have shipped with a designated spine switch), or a quarter rack Oracle Exadata Database Machine X2-2 properly configured per the "Oracle Exadata Database Machine Owner's Guide 11g Release 2 (11.2) E13874-15", execute the following command as the "root" userid on one of the database servers:

  /opt/oracle.SupportTools/CheckSWProfile.sh -I RanDomsw-ib3,RanDomsw-ib2 
 
 
Where "RanDomsw-ib3 and RanDomsw-ib2" are the switch names returned by the "ibswitches" command.
The output will be similar to the output for the first command, but there will be no references to a spine switch and all switches will have "sm_priority" of 5.
In either case, the expected output is "SUCCESS". If anything else is returned, investigate and correct the condition.
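A quick way to scan the CheckSWProfile.sh output for anything other than SUCCESS might look like the hypothetical sketch below; the sample text stands in for the real command output, so switch names and messages are assumptions.

```shell
# Hypothetical sketch: flag any [INFO] lines that do not report SUCCESS.
# The sample text below stands in for real CheckSWProfile.sh output.
output='[INFO] SUCCESS Switch sw-ib1 has correct software and firmware version
[INFO] SUCCESS Switch sw-ib1 has correct opensm configuration'

# Count [INFO] lines missing the SUCCESS keyword.
nonsuccess=$(printf '%s\n' "$output" | grep '^\[INFO\]' | grep -vc 'SUCCESS')
if [ "$nonsuccess" -eq 0 ]; then
  verdict="SUCCESS"
else
  verdict="INVESTIGATE"
fi
echo "$verdict"
```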

Verify InfiniBand Cable Connection Quality

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
Benefit / Impact:
InfiniBand cables require proper connections for optimal efficiency. Verifying the InfiniBand cable connection quality helps to ensure that the InfiniBand network operates at optimal efficiency.
There is minimal impact to verify InfiniBand cable connection quality.
Risk:
InfiniBand cables that are not properly connected may negotiate to a lower speed, work intermittently, or fail.
Action / Repair:
Execute the following command on all database and storage servers:
  for ib_cable in `ls /sys/class/net | grep ^ib`; do printf "$ib_cable: "; cat /sys/class/net/$ib_cable/carrier; done 
The output should look similar to:
  ib0: 1
 ib1: 1 
If anything other than "1" is reported, investigate that cable connection.
NOTE: Storage servers should report 2 connections. X2-2 (4170) and X2-2 database servers should report 2 connections. X2-8 database servers should report 8 connections.
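The carrier loop above can be extended to compare the number of links reporting "1" against the expected count for the machine type. This is a sketch only: a temporary directory stands in for /sys/class/net so it can be exercised anywhere, and on a real server you would point NET at /sys/class/net instead.

```shell
# Sketch of the carrier check with the expected-connection count made
# explicit. A temp directory mimics /sys/class/net for illustration.
NET=$(mktemp -d)
mkdir -p "$NET/ib0" "$NET/ib1" "$NET/eth0"
echo 1 > "$NET/ib0/carrier"
echo 1 > "$NET/ib1/carrier"

expected=2   # storage, X2-2(4170) and X2-2 database servers: 2; X2-8: 8
up=0
for ib_cable in $(ls "$NET" | grep '^ib'); do
  printf '%s: ' "$ib_cable"
  cat "$NET/$ib_cable/carrier"
  [ "$(cat "$NET/$ib_cable/carrier")" = "1" ] && up=$((up + 1))
done
echo "links up: $up (expected $expected)"
rm -rf "$NET"
```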

Verify Ethernet Cable Connection Quality

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
Benefit / Impact:
Ethernet cables require proper connections for optimal efficiency. Verifying the Ethernet cable connection quality helps to ensure that the Ethernet network operates at optimal efficiency.
There is minimal impact to verify the Ethernet cable connection quality.
Risk:
Ethernet cables that are not properly connected may negotiate to a lower speed, work intermittently, or fail.
Action / Repair:
Execute the following command as the root userid on all database and storage servers:
  for cable in `ls /sys/class/net | grep ^eth`; do printf "$cable: "; cat /sys/class/net/$cable/carrier; done 
The output should look similar to:
  eth0: 1
 eth1: cat: /sys/class/net/eth1/carrier: Invalid argument
 eth2: cat: /sys/class/net/eth2/carrier: Invalid argument
 eth3: cat: /sys/class/net/eth3/carrier: Invalid argument
 eth4: 1
 eth5: 1 
"Invalid argument" usually indicates the device has not been configured and is not in use. If a device reports "0", investigate that cable connection.
NOTE: Within machine types, the output of this command will vary by customer depending on how the customer chooses to configure the available ethernet cards.

Verify InfiniBand Fabric Topology (verify-topology)

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
Benefit / Impact:
Verifying that the InfiniBand network is configured with the correct topology for an Oracle Exadata Database Machine helps to ensure that the InfiniBand network operates at maximum efficiency.
Risk:
An incorrect InfiniBand topology will cause the InfiniBand network to operate at degraded efficiency, intermittently, or fail to operate.
Action / Repair:
Execute the verify-topology command as shown below:
  /opt/oracle.SupportTools/ibdiagtools/verify-topology -t fattree 
The output will be similar to:
  [DB Machine InfiniBand Cabling Topology Verification Tool]
 Is every external switch connected to every internal switch ......... [SUCCESS]
 Are any external switches connected to each other ................... [SUCCESS]
 Are any hosts connected to spine switch ............................. [SUCCESS]
 Check if all hosts have 2 CAs to different switches ................. [SUCCESS]
 Leaf switch check: cardinality and even distribution ................ [SUCCESS] 
If anything other than "SUCCESS" is reported, investigate and correct the condition.

Verify No InfiniBand Network Errors (ibqueryerrors)

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
Benefit / Impact:
Verifying that there are no high, persistent InfiniBand network error counters helps to maintain the InfiniBand network at peak efficiency.
The impact of verifying there are no InfiniBand network errors is minimal.
Risk:
Without verifying the InfiniBand network error counters, there is a risk that a component will degrade the InfiniBand network performance, yet may not be sending an alert or error condition.
Action / Repair:
Use the command shown below on one of the database or storage servers:
  ibqueryerrors.pl -rR -s RcvSwRelayErrors,XmtDiscards,XmtWait 
There should be no errors reported.
The InfiniBand counters are cumulative and the errors may have occurred at any time in the past. If there are errors, it is recommended to clear the InfiniBand counters with ibclearcounters, let the system run for a few minutes under load, and then re-execute the ibqueryerrors command. Any links reporting persistent errors (especially RcvErrors or SymbolErrors) may indicate a bad or loose cable or port.
Some counters (e.g. RcvErrors, SymbolErrors) can increment when nodes are rebooted. Small values for these counters which are less than the "LinkDowned" counter are generally not a problem. The "LinkDowned" counter indicates the number of times the port has gone down (usually for valid reasons, e.g. reboot) and is not usually an error indicator by itself.
If there are persistent, high InfiniBand network error counters, investigate and correct the condition.
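The "smaller than LinkDowned" rule of thumb above can be expressed as a tiny decision sketch. The counter values here are fabricated samples, not real ibqueryerrors output.

```shell
# Hypothetical sketch of the rule above: RcvErrors/SymbolErrors counts
# smaller than the LinkDowned counter are usually reboot noise.
linkdowned=4
symbol_errors=2
rcv_errors=0

if [ "$symbol_errors" -lt "$linkdowned" ] && [ "$rcv_errors" -lt "$linkdowned" ]; then
  ib_verdict="likely benign"
else
  ib_verdict="investigate"
fi
echo "$ib_verdict"
```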

Verify There Are No Storage Server Memory (ECC) Errors

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
Benefit / Impact:
Memory modules that have corrected Memory Errors (ECC) can show degraded performance, IPMI driver timeouts, and BMC error messages in the /var/log/messages file.
Correcting the condition restores optimal performance.
The impact of checking for memory ECC errors is slight. Correction will likely require a firmware upgrade and reboot, or hardware repair downtime.
Risk:
If not corrected, the faulty memory will lead to performance degradation and other errors.
Action / Repair:
To check for memory ECC errors, run the following command as the root userid on the storage server:
  ipmitool sel list | grep ECC | cut -f1 -d: | sort -u 
If any errors are reported, take the following actions in order:
  1. Upgrade to the latest BIOS, as it addresses a potential cause.
  2. Reseat the DIMMs.
  3. Open an SR for hardware replacement.

Verify Database Server Disk Controller Configuration

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
Benefit / Impact:
For X2-2, there are 4 disk drives in a database server controlled by an LSI MegaRAID SAS 9261-8i disk controller. The disks are configured RAID-5 with 3 disks in the RAID set and 1 disk as a hot spare. There is 1 virtual drive created across the RAID set. Verifying the status of the database server RAID devices helps to avoid a possible performance impact, or an outage.
For X2-8, there are 8 disk drives in a database server controlled by an LSI MegaRAID SAS 9261-8i disk controller. The disks are configured RAID-5 with 7 disks in the RAID set and 1 disk as a hot spare. There is 1 virtual drive created across the RAID set. Verifying the status of the database server RAID devices helps to avoid a possible performance impact, or an outage.
The impact of validating the RAID devices is minimal. The impact of corrective actions will vary depending on the specific issue uncovered, and may range from simple reconfiguration to an outage.
Risk:
Not verifying the RAID devices increases the chance of a performance degradation or an outage.
Action / Repair:
To verify the database server disk controller configuration, use the following command:
  /opt/MegaRAID/MegaCli/MegaCli64 AdpAllInfo -aALL | grep "Device Present" -A 8 
For X2-2, the output will be similar to:
  Device Present
 ================
 Virtual Drives    : 1
   Degraded        : 0
   Offline         : 0
 Physical Devices  : 5
   Disks           : 4
   Critical Disks  : 0
   Failed Disks    : 0 
The expected output is 1 virtual drive, none degraded or offline, 5 physical devices (controller + 4 disks), 4 disks, and no critical or failed disks.
For X2-8, the output will be similar to:
  Device Present
 ================
 Virtual Drives    : 1
   Degraded        : 0
   Offline         : 0
 Physical Devices  : 11
   Disks           : 8
   Critical Disks  : 0
   Failed Disks    : 0 
The expected output is 1 virtual drive, none degraded or offline, 11 physical devices (1 controller + 8 disks + 2 SAS2 expansion ports), 8 disks, and no critical or failed disks.
On X2-8, there is a SAS2 expander on each NEM, which takes in the 8 ports from the Niwot REM and expands them out to both the 8 physical drive slots through the midplane and the 2 external SAS2 expansion ports on each NEM.
If the reported output differs, investigate and correct the condition.
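Checking this output by hand is error-prone, so a small parsing sketch can help. This is a hypothetical example: the sample text stands in for real MegaCli64 output, and the expected values shown are the X2-2 ones from the text above.

```shell
# Hypothetical sketch: pull counts out of the MegaCli64 "Device Present"
# summary and compare them with the expected X2-2 values. The sample
# text stands in for real controller output.
summary='Virtual Drives    : 1
  Degraded        : 0
  Offline         : 0
Physical Devices  : 5
  Disks           : 4
  Critical Disks  : 0
  Failed Disks    : 0'

# Extract the numeric value for a given label.
get() { printf '%s\n' "$summary" | grep "$1" | awk -F: '{gsub(/ /,"",$2); print $2}'; }

virtual=$(get 'Virtual Drives')
disks=$(get '^  Disks')
failed=$(get 'Failed Disks')
echo "virtual=$virtual disks=$disks failed=$failed"
```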

Verify Database Server Virtual Drive Configuration

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
Benefit / Impact:
For X2-2, there are 4 disk drives in a database server controlled by an LSI MegaRAID SAS 9261-8i disk controller. The disks are configured RAID-5 with 3 disks in the RAID set and 1 disk as a hot spare. There is 1 virtual drive created across the RAID set. Verifying the status of the database server RAID devices helps to avoid a possible performance impact, or an outage.
For X2-8, there are 8 disk drives in a database server controlled by an LSI MegaRAID SAS 9261-8i disk controller. The disks are configured RAID-5 with 7 disks in the RAID set and 1 disk as a hot spare. There is 1 virtual drive created across the RAID set. Verifying the status of the database server RAID devices helps to avoid a possible performance impact, or an outage.
The impact of validating the virtual drives is minimal. The impact of corrective actions will vary depending on the specific issue uncovered, and may range from simple reconfiguration to an outage.
Risk:
Not verifying the virtual drives increases the chance of a performance degradation or an outage.
Action / Repair:
To verify the database server virtual drive configuration, use the following command:
  /opt/MegaRAID/MegaCli/MegaCli64 CfgDsply -aALL | grep "Virtual Drive:"; /opt/MegaRAID/MegaCli/MegaCli64 CfgDsply -aALL | grep "Number Of Drives"; /opt/MegaRAID/MegaCli/MegaCli64 CfgDsply -aALL | grep "^State" 
For X2-2, the output should be similar to:
  Virtual Drive: 0 (Target Id: 0)
 Number Of Drives: 3
 State: Optimal 
The expected result is that the virtual device has 3 drives and a state of optimal.
For X2-8, the output should be similar to:
  Virtual Drive: 0 (Target Id: 0)
 Number Of Drives: 7
 State: Optimal 
The expected result is that the virtual device has 7 drives and a state of optimal.
If the reported output differs, investigate and correct the condition.
NOTE: The virtual device number reported may vary depending upon configuration and version levels.
NOTE: If a bare metal restore procedure is performed on a database server without using the "dualboot=no" configuration, that database server may be left with three virtual devices for X2-2 and seven for X2-8. Please see My Oracle Support note 1323309.1 for additional information and correction instructions.
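The drive-count and state check can be scripted along these lines. This is a sketch only: the sample text stands in for the real CfgDsply output, and the expected values are the X2-2 ones (3 drives, Optimal).

```shell
# Hypothetical sketch: extract the drive count and state from CfgDsply
# output and compare with the expected X2-2 values. The sample text
# stands in for real controller output.
vd='Virtual Drive: 0 (Target Id: 0)
Number Of Drives: 3
State: Optimal'

drives=$(printf '%s\n' "$vd" | awk -F': ' '/^Number Of Drives/ {print $2}')
state=$(printf '%s\n' "$vd" | awk -F': ' '/^State/ {print $2}')
echo "drives=$drives state=$state"
```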

Verify Database Server Physical Drive Configuration

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
Benefit / Impact:
For X2-2, there are 4 disk drives in a database server controlled by an LSI MegaRAID SAS 9261-8i disk controller. The disks are configured RAID-5 with 3 disks in the RAID set and 1 disk as a hot spare. There is 1 virtual drive created across the RAID set. Verifying the status of the database server RAID devices helps to avoid a possible performance impact, or an outage.
For X2-8, there are 8 disk drives in a database server controlled by an LSI MegaRAID SAS 9261-8i disk controller. The disks are configured RAID-5 with 7 disks in the RAID set and 1 disk as a hot spare. There is 1 virtual drive created across the RAID set. Verifying the status of the database server RAID devices helps to avoid a possible performance impact, or an outage.
The impact of validating the physical drives is minimal. The impact of corrective actions will vary depending on the specific issue uncovered, and may range from simple reconfiguration to an outage.
Risk:
Not verifying the physical drives increases the chance of a performance degradation or an outage.
Action / Repair:
To verify the database server physical drive configuration, use the following command:
  /opt/MegaRAID/MegaCli/MegaCli64 PDList -aALL | grep "Firmware state" 
For X2-2, the output will be similar to:
  Firmware state: Online, Spun Up
 Firmware state: Online, Spun Up
 Firmware state: Online, Spun Up
 Firmware state: Hotspare, Spun down 
There should be three lines of output showing a state of "Online, Spun Up", and one line showing a state of "Hotspare, Spun down". The ordering of the output lines is not significant and may vary based upon a given database server's physical drive replacement history.
For X2-8, the output will be similar to:
  Firmware state: Online, Spun Up
 Firmware state: Online, Spun Up
 Firmware state: Online, Spun Up
 Firmware state: Online, Spun Up
 Firmware state: Online, Spun Up
 Firmware state: Online, Spun Up
 Firmware state: Online, Spun Up
 Firmware state: Hotspare, Spun down 
There should be seven lines of output showing a state of "Online, Spun Up", and one line showing a state of "Hotspare, Spun down". The ordering of the output lines is not significant and may vary based upon a given database server's physical drive replacement history.
If the reported output differs, investigate and correct the condition.
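Counting the state lines can be scripted rather than eyeballed. This is a hypothetical sketch; the sample text stands in for the real PDList output, and the expected counts are the X2-2 ones (three online, one hot spare).

```shell
# Hypothetical sketch: count physical-drive firmware states and compare
# with the expected X2-2 pattern. The sample text stands in for
# 'MegaCli64 PDList -aALL | grep "Firmware state"' output.
states='Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Hotspare, Spun down'

online=$(printf '%s\n' "$states" | grep -c 'Online, Spun Up')
spare=$(printf '%s\n' "$states" | grep -c 'Hotspare')
echo "online=$online spare=$spare"
```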

Verify InfiniBand is the Private Network for Oracle Clusterware Communication

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
Benefit / Impact:
The InfiniBand network in an Oracle Exadata Database Machine provides superior performance and throughput characteristics that allow Oracle Clusterware to operate at optimal efficiency.
The overhead for these verification steps is minimal.
Risk:
If the InfiniBand network is not used for Oracle Clusterware communication, performance will be sub-optimal.
Action / Repair:
The InfiniBand network is preconfigured on the storage servers. Perform the following on the database servers:
Verify the InfiniBand network is the private network used for Oracle Clusterware communication with the following command:
  $GI_HOME/bin/oifcfg getif -type cluster_interconnect 
For X2-2, the output should be similar to:
  bondib0 192.168.8.0 global cluster_interconnect 
For X2-8 the output should be similar to:
  bondib0 192.168.8.0 global cluster_interconnect
 bondib1 192.168.8.0 global cluster_interconnect
 bondib2 192.168.8.0 global cluster_interconnect
 bondib3 192.168.8.0 global cluster_interconnect 
If the InfiniBand network is not the private network used for Oracle Clusterware communication, configure it following the instructions in MOS note 1073502.1, "How to Modify Private Network Interface in 11.2 Grid Infrastructure".
NOTE: It is important to ensure that your public interface is properly marked as public and not private. This can be checked with the oifcfg getif command. If it is inadvertently marked private, you can get errors such as "OS system dependent operation: bind failed with status" and "OS failure message: Cannot assign requested address". It can be corrected with a command like oifcfg setif -global eth0/&lt;subnet&gt;:public

Verify Oracle RAC Databases Use the RDS Protocol over the InfiniBand Network

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
Benefit / Impact:
The RDS protocol over InfiniBand provides superior performance because it avoids additional memory buffering operations when moving data from process memory to the network interface for IO operations. This includes both IO operations between the Oracle instance and the storage servers, as well as instance to instance block transfers via Cache Fusion.
There is minimal impact to verify that the RDS protocol is in use. Implementing the RDS protocol over InfiniBand requires an outage to relink the Oracle software.
Risk:
If the Oracle RAC databases do not use RDS protocol over the InfiniBand network, IO operations will be sub-optimal.
Action / Repair:
To verify the RDS protocol is in use by a given Oracle instance, set the ORACLE_HOME and LD_LIBRARY_PATH variables properly for the instance and execute the following command as the oracle userid on each database server where the instance is running:
  $ORACLE_HOME/bin/skgxpinfo 
The output should be:
  rds 
Note: For Oracle software versions below 11.2.0.2, the skgxpinfo command is not present. For 11.2.0.1, you can copy skgxpinfo to the proper path in your 11.2.0.1 environment from an available 11.2.0.2 environment and execute it against the 11.2.0.1 database home(s) using the provided command.
Note: An alternative check (regardless of Oracle software version) is to scan each instance's alert log (it must contain a startup sequence!) for the following lines:
  Cluster communication is configured to use the following interface(s) for this instance
  192.168.20.21
  cluster interconnect IPC version: Oracle RDS/IP (generic) 
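The alert log scan above can be automated with a simple grep. This is a sketch only: the log content here is a fabricated sample written to a temporary file, and on a real server you would point the grep at the instance's actual alert log.

```shell
# Hypothetical sketch: scan an alert log for the interconnect IPC line
# to see whether RDS is in use. The log content is a fabricated sample.
alertlog=$(mktemp)
cat > "$alertlog" <<'EOF'
Cluster communication is configured to use the following interface(s) for this instance
  192.168.20.21
cluster interconnect IPC version: Oracle RDS/IP (generic)
EOF

if grep -q 'IPC version: Oracle RDS' "$alertlog"; then
  protocol="rds"
else
  protocol="other"
fi
rm -f "$alertlog"
echo "$protocol"
```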
If the instance is not using the RDS protocol over InfiniBand, relink the Oracle binary using the following commands (with variables properly defined for each home being linked):
  • (as oracle) Shutdown any processes using the Oracle binary
  • If and only if relinking the Grid Infrastructure home, then (as root) GRID_HOME/crs/install/rootcrs.pl -unlock
  • (as oracle) cd $ORACLE_HOME/rdbms/lib
  • (as oracle) make -f ins_rdbms.mk ipc_rds ioracle
  • If and only if relinking the Grid Infrastructure home, then (as root) GRID_HOME/crs/install/rootcrs.pl -patch
Note: Avoid using the relink all command due to various issues. Use the make commands provided.

Configure Storage Server alerts to be sent via e-mail

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
Benefit / Impact:
Oracle Exadata Storage Servers can send various levels of alerts and clear messages via email or SNMP, or both. Sending these messages via email at a minimum helps to ensure that a problem is detected and corrected.
There is little impact to storage server operation to send these messages via email.
Risk:
If the storage servers are not configured to send alerts and clear messages via email at a minimum, there is an increased risk of a problem not being detected in a timely manner.
Action / Repair:
Use the following cellcli command to validate the email configuration by sending a test email:
  alter cell validate mail; 
The output will be similar to:
  Cell slcc09cel01 successfully altered 
If the output is not successful, configure a storage server to send email alerts using the following cellcli command (tailored to your environment):
  ALTER CELL smtpServer='mailserver.maildomain.com', -
 smtpFromAddr='firstname.lastname@maildomain.com', -
 smtpToAddr='firstname.lastname@maildomain.com', -
 smtpFrom='Exadata cell', -
 smtpPort='', -
 smtpUseSSL='TRUE', -
 notificationPolicy='critical,warning,clear', -
 notificationMethod='mail'; 
NOTE: The recommended best practice to monitor an Oracle Exadata Database Machine is with Oracle Enterprise Manager (OEM) and the suite of OEM plugins developed for the Oracle Exadata Database Machine. Please reference My Oracle Support (MOS) note 1110675.1 for details.

Configure NTP and Timezone on the InfiniBand Switches

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
Benefit / Impact:
Synchronized timestamps are important to switch operation and message logging, both within an InfiniBand switch and between the InfiniBand switches. There is little impact to correctly configure the switches.
Risk:
If the InfiniBand switches are not correctly configured, there is a risk of improper operation and disjoint message timestamping.
Action / Repair:
The InfiniBand switches should be properly configured during the initial deployment process. If for some reason they were not, please consult the “Configuring Sun Datacenter InfiniBand Switch 36 Switch” section of the “Oracle® Exadata Database Machine Owner's Guide, 11g Release 2 (11.2)”.

Verify NUMA Configuration

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
Benefit / Impact:
X2-2 Database servers in Oracle Exadata Database Machine by default are booted with operating system NUMA support enabled. Commands that manipulate large files without using direct I/O on ext3 file systems will cause low memory conditions on the NUMA node (Xeon 5500 processor) currently running the process.
By turning NUMA off, a potential local node low memory condition and subsequent performance drop is avoided.
X2-8 database servers should have NUMA on.
The impact of turning NUMA off is minimal.
Risk:
Once local node memory is depleted, system performance as a whole will be severely impacted.
Action / Repair:
Follow the instructions in MOS Note 1053332.1 to turn NUMA off in the kernel for database servers.
NOTE: NUMA is configured to be off in the storage servers and should not be changed.

Set “mpt_cmd_retry_count=10″ in /etc/modprobe.conf on Storage Servers

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
Benefit / Impact:
If a flash card DOM fails, and a storage server is rebooted, the startup sequence can hang for a very long time starting the udev service.
By setting “mpt_cmd_retry_count=10″, the potential delay at boot time is avoided.
The impact of setting “mpt_cmd_retry_count=10″ is minimal, and it will take effect with the next reboot.
Risk:
If rebooted with a failed flash card DOM, a storage server may hang for an extended period.
Action / Repair:
Add the following line to the /etc/modprobe.conf file and reboot the storage server:
options mptsas mpt_cmd_retry_count=10
NOTE: This configuration will be implemented in an upcoming Exadata image patch.

Configure Storage Server Flash Memory as Exadata Smart Flash Cache

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
Benefit / Impact:
For the vast majority of situations, maximum performance is achieved by configuring the storage server flash memory as cache, allowing the Exadata software to determine the content of the cache.
The impact of configuring storage server flash memory as cache at initial deployment is minimal. If there are already grid disks configured in the flash memory, consideration must be given as to the relocation of the data when converting the flash memory back to cache.
Risk:
Not configuring the storage server flash memory as cache may result in a degradation of overall performance.
Action / Repair:
To confirm all storage server flash memory is configured as smart flash cache, execute the command shown below:
 cellcli -e "list flashcache detail" | grep size 
The output will be similar to:
  size: 365.25G 
The expected result is 365.25G. If the size is less than that, it is an indication that some or all of the storage server flash memory has been configured as grid disks. Investigate and correct the condition.
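The size comparison can be sketched as below. This is a hypothetical example: the sample line stands in for real cellcli output, and 365.25G is the expected full-cache size from the text above.

```shell
# Hypothetical sketch: check the reported flash cache size against the
# 365.25G expected when no flash grid disks are carved out. The sample
# line stands in for 'cellcli -e "list flashcache detail" | grep size'.
size_line='size:                   365.25G'

size=$(printf '%s\n' "$size_line" | awk '{print $2}' | tr -d 'G')
if awk -v s="$size" 'BEGIN { exit !(s + 0 == 365.25) }'; then
  flash="full"
else
  flash="reduced"
fi
echo "$flash"
```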

Verify database server disk controllers use writeback cache

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
Benefit / Impact:
Database servers use an internal RAID controller with a battery-backed cache to host local filesystems. For maximum performance when writing I/O to local disks, the battery-backed cache should be in “WriteBack” mode.
The impact of configuring the battery-backed cache in “WriteBack” mode is minimal.
Risk:
Not configuring the battery-backed cache in “WriteBack” mode will result in degraded performance when writing I/O to the local database server disks.
Action / Repair:
To verify that the disk controller battery-backed cache is in “WriteBack” mode, run the following command on all database servers:
 /opt/MegaRAID/MegaCli/MegaCli64 -CfgDsply -a0 | grep -i writethrough 
There should be no output returned.
If the battery-backed cache is not in “WriteBack” mode, run these commands on the affected server to place the battery-backed cache into “WriteBack” mode:
 /opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp WB -Lall -a0
/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp NoCachedBadBBU -Lall -a0
/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp NORA -Lall -a0
/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp Direct -Lall -a0 
NOTE: No settings should be modified on Exadata storage cells. The mode described above applies only to database servers in an Exadata database machine.

Verify that “Disk Cache Policy” is set to “Disabled”

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical 06/13/11 X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +


Benefit / Impact:
“Disk Cache Policy” is set to “Disabled” by default at imaging time and should not be changed, because the cache created by setting “Disk Cache Policy” to “Enabled” is not battery backed. It is possible that a replacement drive has the disk cache policy enabled, so it's a good idea to check this setting after replacing a drive.
The impact of verifying that “Disk Cache Policy” is set to “Disabled” is minimal. The impact of suddenly losing power with “Disk Cache Policy” set to anything other than “Disabled” will vary according to each specific case, and cannot be estimated here.
Risk:
If the “Disk Cache Policy” is not “Disabled”, there is a risk of data loss in the event of a sudden power loss because the cache created by “Disk Cache Policy” is not backed up by a battery.
Action / Repair:
To verify that “Disk Cache Policy” is set to “Disabled” on all servers, use the following command as the “root” userid on the first database server in the cluster:
 dcli -g /opt/oracle.SupportTools/onecommand/all_group -l root /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -aALL | grep -i 'Disk Cache Policy' 
The output will be similar to:
 randomdb01: Disk Cache Policy : Disabled
randomdb01: Disk Cache Policy : Disabled
randomdb01: Disk Cache Policy : Disabled

randomcel03: Disk Cache Policy : Disabled
randomcel03: Disk Cache Policy : Disabled
randomcel03: Disk Cache Policy : Disabled
randomcel03: Disk Cache Policy : Disabled
randomcel03: Disk Cache Policy : Disabled 
If any of the results are other than “Disabled”, identify the LUN in question and reset the “Disk Cache Policy” to “Disabled” using the following command (where -Lx is the LUN in question, for example -L2):
 MegaCli64 -LDSetProp -DisDskCache -Lx -a0 
Note: The “Disk Cache Policy” is completely separate from the disk controller caching mode of “WriteBack”. Do not confuse the two. The cache created by “WriteBack” cache mode is battery-backed; the cache created by “Disk Cache Policy” is not!
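Scanning the dcli output for any non-Disabled LUN can be automated. This is a hypothetical sketch: the hostnames and policy values are fabricated samples in the shape of the dcli output shown above.

```shell
# Hypothetical sketch: count LUNs in dcli output whose Disk Cache Policy
# is not "Disabled". Hostnames and values are fabricated samples.
policies='randomdb01: Disk Cache Policy : Disabled
randomdb01: Disk Cache Policy : Disabled
randomcel03: Disk Cache Policy : Enabled
randomcel03: Disk Cache Policy : Disabled'

bad=$(printf '%s\n' "$policies" | grep 'Disk Cache Policy' | grep -vc 'Disabled')
echo "LUNs needing correction: $bad"
```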

Verify Master (Rack) Serial Number is Set

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical 03/02/11 X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +


Benefit/Impact:
Setting the Master Serial Number (MSN) (aka Rack Serial Number) assists Oracle Support Services to resolve entitlement issues which may arise. The MSN is listed on a label on the front and the rear of the chassis but is not electronically readable unless this value is set.
The impact to set the MSN is minimal.
Risk:
Not having the MSN set for the system may hinder entitlement when opening Service Requests.
Action/Repair:
Use the following command to verify that all the MSN's are set correctly and all match:
 ipmitool sunoem cli "show /SP system_identifier" 
The output should resemble one of the following:
EV2: Sun Oracle Database Machine xxxxAKyyyy
X2-2: Exadata Database Machine X2-2 xxxxAKyyyy
X2-8: Exadata Database Machine X2-8 xxxAKyyyy
(MSNs almost always have 4 digits, the letters 'AK', followed by 4 more digits)
If none of the values are set, contact X64 Support.
If one is not set correctly, set it to match the others with the command:
ipmitool sunoem cli 'set /SP system_identifier="Exadata Database Machine X2-2 xxxxAKyyyy"' 
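The MSN shape described above (4 digits, 'AK', 4 more digits) can be checked with a simple pattern match. This is a sketch only; the identifier below is a fabricated sample, not a real serial number.

```shell
# Hypothetical sketch: check that a system_identifier ends with the
# usual MSN shape (4 digits, 'AK', 4 more digits). Sample value only.
msn='Exadata Database Machine X2-2 1034AK2045'

if printf '%s\n' "$msn" | grep -Eq '[0-9]{4}AK[0-9]{4}$'; then
  msn_ok="yes"
else
  msn_ok="no"
fi
echo "$msn_ok"
```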

Verify Management Network Interface (eth0) is on a Separate Subnet

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical 03/02/11 X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
Benefit/Impact:
It is a requirement that the management network be on a different non-overlapping sub-net than the InfiniBand network and the client access network. This is necessary for better network security, better client access bandwidths, and for Auto Service Request (ASR) to work correctly.
The management network comprises the eth0 network interface in the database and storage servers, the ILOM network interfaces of the database and storage servers, and the Ethernet management interfaces of the InfiniBand switches and PDUs.
Risk:
Having the management network on the same subnet as the client access network will reduce network security, potentially restrict the client access bandwidth to/from the Database Machine to a single 1GbE link, and will prevent ASR from working correctly.
Action/Repair:
To verify that the management network interface (eth0) is on a separate network from other network interfaces, execute the following command as the “root” userid on both storage and database servers:
 grep -i network /etc/sysconfig/network-scripts/ifcfg* | cut -f5 -d"/" | grep -v "#" 
The output will be similar to:
 ifcfg-bondeth0:NETWORK=10.204.77.0
ifcfg-bondib0:NETWORK=192.168.76.0
ifcfg-eth0:NETWORK=10.204.78.0
ifcfg-lo:NETWORK=127.0.0.0 
The expected result is that the network values are different. If they are not, investigate and correct the condition.
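The "all NETWORK values differ" check can be scripted by looking for duplicates. This is a hypothetical sketch: the sample lines stand in for the real ifcfg grep output shown above.

```shell
# Hypothetical sketch: detect overlapping NETWORK values across the
# interface configs. Sample lines stand in for the ifcfg grep output.
networks='ifcfg-bondeth0:NETWORK=10.204.77.0
ifcfg-bondib0:NETWORK=192.168.76.0
ifcfg-eth0:NETWORK=10.204.78.0
ifcfg-lo:NETWORK=127.0.0.0'

# uniq -d prints only duplicated values; zero lines means no overlap.
dups=$(printf '%s\n' "$networks" | cut -d= -f2 | sort | uniq -d | grep -c .)
echo "overlapping subnets: $dups"
```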

Verify RAID Controller Battery Condition

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical 03/02/11 X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +


Benefit/Impact:
The RAID controller battery loses its ability to support cache over time. Verifying the battery charge and condition allows proactive battery replacement.
The impact of verifying the RAID controller battery condition is minimal.
Risk:
A failed RAID controller battery will put the RAID controller into WriteThrough mode which significantly impacts write I/O performance.
Action/Repair:
Execute the following command on all servers:
 /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0 | grep "Full Charge" -A5 | sort | grep Full -A1 
The output will be similar to:
 Full Charge Capacity: 1357 mAh
Max Error: 2 % 
Proactive battery replacement should be performed within 60 days for any battery that meets both of the following criteria:
1) “Full Charge Capacity” less than or equal to 800 mAh, and
2) “Max Error” less than 10%.
Immediately replace any battery that meets either of the following criteria:
1) “Max Error” of 10% or greater (the battery gauge is deemed unreliable regardless of the “Full Charge Capacity” reading)
2) “Full Charge Capacity” less than 674 mAh, regardless of the “Max Error” reading
[NOTE: The complete reference guide for LSI disk controller batteries used in Exadata can be found in MOS 1329989.1 (INTERNAL ONLY)]
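Purely as an illustrative sketch (not part of the official procedure), the thresholds above can be encoded in a small shell helper; the function name and sample values are ours:

```shell
# Hypothetical helper encoding the replacement thresholds above.
# Usage: battery_action <full_charge_capacity_mAh> <max_error_percent>
battery_action() {
  cap=$1
  err=$2
  if [ "$err" -ge 10 ] || [ "$cap" -lt 674 ]; then
    echo "replace immediately"
  elif [ "$cap" -le 800 ]; then
    echo "replace within 60 days"
  else
    echo "ok"
  fi
}

battery_action 1357 2   # the healthy battery from the sample output above
battery_action 700 5    # low capacity, gauge still reliable
battery_action 900 12   # unreliable gauge
```

The capacity and error values would in practice be parsed out of the MegaCli64 output shown above.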

Verify RAID Controller Battery Temperature

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical 03/02/11 X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +


Benefit/Impact:
Maintaining proper temperature ranges maximizes RAID controller battery life.
The impact of verifying RAID controller battery temperature is minimal.
Risk:
A reported temperature of 60C or higher causes the battery to suspend charging until the temperature drops, and shortens the service life of the battery, causing it to fail prematurely and put the RAID controller into WriteThrough mode, which significantly impacts write I/O performance.
Action/Repair:
To verify the RAID controller battery temperature, execute the following command on all servers:
 /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0 | grep BatteryType; /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0 | grep -i temper 
The output will be similar to:
 BatteryType: iBBU08
Temperature: 38 C
 Temperature : OK
 Over Temperature : No 
If the battery temperature is equal to or greater than 55C, investigate and correct the environmental conditions.
NOTE: Replace Battery Module after 3 Year service life assuming the battery temperature has not exceeded 55C. If the temperature has exceeded 55C (battery temp shall not exceed 60C), replace the battery every 2 years.
[NOTE: The complete reference guide for LSI disk controller batteries used in Exadata can be found in MOS 1329989.1 (INTERNAL ONLY)]

Verify Electronic Storage Module (ESM) Lifetime is within Specification

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical 03/02/11 X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +


Benefit/Impact:
The Flash 20 card supports ESM lifetime to enable proactive replacement before failure.
The impact of verifying that the ESM lifetime is within specification is minimal. Replacing an ESM requires a storage server outage. The database and application may remain available if the appropriate grid disks are properly inactivated before and activated after the storage server outage. Refer to MOS Note 1188080.1 and “Shutting Down Exadata Storage Server” in Chapter 7 of “Oracle® Exadata Database Machine Owner's Guide 11g Release 2 (11.2) E13874-14” for additional details.
Risk:
Failure of the ESM will put the Flash 20 card in WriteThrough mode which has a high impact on performance.
Action/Repair:
To verify the ESM lifetime value, use the following command on the storage servers:
 for RISER in RISER1/PCIE1 RISER1/PCIE4 RISER2/PCIE2 RISER2/PCIE5; do ipmitool sunoem cli "show /SYS/MB/$RISER/F20CARD/UPTIME"; done | grep value -A4 
The output will be similar to:
 value = 3382.350 Hours
 upper_nonrecov_threshold = 17500.000 Hours
 upper_critical_threshold = 17200.000 Hours
 upper_noncritical_threshold = 16800.000 Hours
 lower_noncritical_threshold = N/A
If the “value” reported exceeds the “upper_noncritical_threshold” reported, schedule a replacement of the relevant ESM.
NOTE: There is a bug in ILOM firmware version 3.0.9.19.a which may report “Invalid target…” for “RISER1/PCIE4”. If that happens, consult your site maintenance records to verify the age of the ESM module.

Verify Proper ASM Disk Group Attributes

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
Benefit / Impact:
The components in the I/O stack are tightly integrated in Exadata. You must use the proper versions of software both on the storage servers and the database servers. Setting compatible attributes defines available functionality. Setting CELL.SMART_SCAN_CAPABLE enables the offloading of certain query work to the storage servers. Setting AU_SIZE maximizes available disk technology and throughput by reading 4MB of data before performing a disk seek to a new sector location.
There is minimal impact to verify and configure these settings.
Risk:
If these attributes are not set as directed, performance will be sub-optimal.
Action/Repair:
For the ASM disk group containing Oracle Exadata Storage Server grid disks, verify the attribute settings as follows:
  • COMPATIBLE.ASM attribute is set to the Oracle ASM software version in use.
  • COMPATIBLE.RDBMS attribute is set to the Oracle database software version in use.
  • CELL.SMART_SCAN_CAPABLE attribute is TRUE.
  • AU_SIZE attribute is 4M.
If these attributes are not set properly, correct the condition.
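One way to inspect these attributes is to query the ASM attribute views from the ASM instance. The query below is a sketch; on the systems we have seen, the attribute names are stored in lowercase:

```sql
SELECT g.name AS disk_group, a.name AS attribute, a.value
FROM   v$asm_diskgroup g
JOIN   v$asm_attribute a
       ON a.group_number = g.group_number
WHERE  a.name IN ('compatible.asm', 'compatible.rdbms',
                  'cell.smart_scan_capable', 'au_size')
ORDER  BY g.name, a.name;
```

The AU size is also visible as ALLOCATION_UNIT_SIZE in V$ASM_DISKGROUP.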

Verify Initialization Parameters

Verify Platform Configuration and Initialization Parameters for Consolidation

Platform Consolidation Considerations

Consolidation Parameters Reference Table
Critical, 08/02/11
Benefit / Impact: Experience and testing has shown that certain database initialization parameter settings should use the following formulas for platform consolidation. By using these formulas as recommended, known problems may be avoided and performance maximized.
The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact.
Risk: If the operating system and database parameters are not set as recommended, a variety of issues may be encountered that can lead to system and database instability.
Action / Repair: To verify the database initialization parameters, use the following guidance:
The following are important platform level considerations in a consolidated environment.
    • Operating System Configuration Recommendations
      • Hugepages – when set, should equal the sum of shared memory from all databases – see MOS note 401749.1 for precise computations and see MOS 361323.1 for a description of Hugepages. Hugepages is generally required if 'PageTables' in /proc/meminfo is > 2% of physical memory
        • Benefits: Memory savings. Prevent cases of paging and swapping when not configured.
        • Tradeoffs: Hugepages must be set correctly and adjusted whenever an instance is added or dropped or when SGA sizes change.
        • As of 11.2.0.2 to disable hugepages on an instance set parameter “use_large_pages=false”
        • Note that as of the onecommand version that supports 11.2.0.2 BP9, hugepages is automatically configured upon deployment. The vm.nr_hugepages value may need to be adjusted if an instance's memory parameters are changed after initial deployment.
      • Amount of locked memory – 75% of physical memory
      • Number of Shared Memory Identifiers – set greater than the number of databases
      • Size of Shared Memory Segments – OS setting for max size = 85% of physical memory
      • Number of semaphores – the sum of the processes parameters cannot exceed the maximum number of semaphores. On Linux, the maximum can be obtained with cat /proc/sys/kernel/sem | awk '{print $2}'. The number of semaphores on the system should not be set so high that running the maximum number of Oracle processes causes performance problems.
      • Number of semaphores in a semaphore set – the number of semaphores in a semaphore set (SEMMSL) must be at least as high as the largest value of the processes parameter across all databases. On Linux, this per-set limit is the first field of /proc/sys/kernel/sem: cat /proc/sys/kernel/sem | awk '{print $1}'
    • Applications with similar SLA requirements are best suited to co-exist in a consolidated environment together. Do not mix mission critical applications with non mission critical applications in the same consolidated environment. Do not mix production and test/dev databases in the same environment.
    • It is possible to “over-subscribe” an application's resource requirements in a consolidated environment as long as the other applications “under-subscribe” at that time. The exception to this is mission critical applications. Do not “over-subscribe” in a consolidated environment that contains mission critical applications. Oracle Resource Manager can be used to manage varying degrees of IO and CPU requirements within one database and across databases. Within one database, Oracle Resource Manager can also manage parallel query processing.
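The /proc/sys/kernel/sem checks above can be sketched as follows. A sample line is used so the snippet is self-contained; on a real server, read the actual file as shown in the comment:

```shell
# /proc/sys/kernel/sem holds four fields: SEMMSL SEMMNS SEMOPM SEMMNI.
# Sample values shown here; on a server use: SEM_LINE=$(cat /proc/sys/kernel/sem)
SEM_LINE="250 32000 100 142"
SEMMSL=$(echo "$SEM_LINE" | awk '{print $1}')   # semaphores per set
SEMMNS=$(echo "$SEM_LINE" | awk '{print $2}')   # system-wide maximum semaphores
echo "per-set limit: $SEMMSL"
echo "system-wide limit: $SEMMNS"
```

Compare SEMMSL against the largest processes parameter and SEMMNS against the sum of processes across all databases.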

Consolidation Parameters Reference Table

Update 8/2/11
The performance related recommendations provide guidance to maintain highest stability without sacrificing performance. Changing these performance settings can be done after careful performance evaluation and clear understanding of the performance impact.
This parameter consolidation health check table is a general reference for consolidated environments. It is not a hard prerequisite for a consolidated environment, but rather a guideline used to establish the formulas, maximum values, and notes below. The formulas should suffice for most customers; if they do not fit your environment, use the table below as a reference for the important parameters that must be considered. These values are per node.
Parameter Formula Max Notes
sga_target / pga_aggregate_target
Formula: OLTP: sum of sga_target and pga_aggregate_target for all databases < 75% of physical memory. DW/BI: sum of sga_target + (pga_aggregate_target x 3) < 75% of physical memory.
Max: 75% of total memory.
Notes: Check the formula above. Exceeding recommended memory usage can cause performance problems. Also ensure that the value computed from the formula is sufficient for the application using the associated database. The pga_aggregate_target setting does not enforce a maximum PGA usage; for some data warehouse and BI applications, 3x the specified target has been observed, while for OLTP applications the spill-over is much less. The 25% headroom provides insurance against additional spill-over and non-SGA/PGA memory allocation. Process memory and non-memory allocations can add up to 1-5 MB per process in some cases. Monitor application and system memory utilization to ensure there is sufficient memory throughout your workload/business cycles. Oracle recommends at least 5% memory free at all times.
DBM Machine Type | Memory Available | Oracle Memory Target
DBM V2 | 72 GB | 54 GB
X2-2 | 96 GB (expandable to 144 GB) | 60.8 GB
X2-8 | 1 TB | 768 GB
cpu_count
Formula: for mission critical applications, sum of cpu_count across all databases <= 75% x total CPUs. Alternatively, for light-weight CPU usage applications, sum(cpu_count) <= 3 x total CPUs; for CPU-intensive applications, sum(cpu_count) <= total CPUs.
Max: refer to the formulas above.
Notes (rules of thumb):
1. Leverage cpu_count and instance caging for platform consolidation (e.g. managing multiple databases within an Exadata DBM). They are particularly helpful in preventing processes and jobs from over-consuming target CPU resources.
2. Most light-weight applications are idle and consume < 3 CPUs.
3. Large reporting/DW/BI and some OLTP applications (“CPU-intensive applications”) can easily consume all the CPU, so they need to be bounded with instance caging and resource management.
4. For consolidating mission critical applications, do not over-subscribe CPU resources, in order to maximize stability and performance consistency.
Exadata DBM | # Cores | # CPUs
DBM V2 | 8 cores | 16 CPUs
X2-2 | 12 cores | 24 CPUs
X2-8 | 64 cores | 128 CPUs
resource_manager_plan
Formula: N/A. Max: N/A. Notes: Ensure this is enabled; a good starting value is 'default_plan'.
processes
Formula: sum of the processes parameter across all databases < maximum number of semaphores on the system. Max: sum of all processes for all instances < 21K. Notes: Check the formula; alert if the sum exceeds the maximum, or if the number of active processes > 4 x CPUs.
Parallel parameters
Formula: the sum of parallel parameters for all databases should not exceed the recommendation for a single database. Max: parallel_max_servers as defined for a single database. Notes: Check the formula.
db_recovery_file_dest_size
Formula: sum of db_recovery_file_dest_size <= Fast Recovery Area size. Max: usable Fast Recovery Area size. Notes: Check the formula; usable FRA space excludes the space consumed by other files, such as online log files when RECO is the only high redundancy disk group.
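A minimal arithmetic sketch of the OLTP memory formula above; the values in GB are illustrative, not measurements:

```shell
PHYS_GB=96        # physical memory on the node (X2-2 example)
SGA_SUM_GB=40     # sum of sga_target across all databases on the node
PGA_SUM_GB=20     # sum of pga_aggregate_target across all databases
LIMIT_GB=$(( PHYS_GB * 75 / 100 ))
TOTAL_GB=$(( SGA_SUM_GB + PGA_SUM_GB ))
if [ "$TOTAL_GB" -lt "$LIMIT_GB" ]; then
  echo "within limit: ${TOTAL_GB}GB < ${LIMIT_GB}GB"
else
  echo "over limit: ${TOTAL_GB}GB >= ${LIMIT_GB}GB"
fi
```

For DW/BI, the pga_aggregate_target term would be multiplied by 3, per the formula above.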

Configure the Number of Mounts before a File System check on the Database Servers

Priority Added Machine Type OS Type Exadata Version Oracle Version

N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
A filesystem will be checked (fsck) after it has been mounted a specified number of times, typically at reboot time. This maximum number of mounts before a check can be determined via the tune2fs -l command (look for Maximum mount count) and is -1 by default on the database servers of the database machine, meaning no fsck will be run. If you would like to change this maximum, it can be done with the tune2fs -c command.
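The check and the change can be sketched as below. To keep the example runnable without root, it operates on a scratch ext2 image file; on a database server the target would be the actual block device, whose name is site-specific:

```shell
IMG=$(mktemp)
dd if=/dev/zero of="$IMG" bs=1M count=4 status=none
mke2fs -q -F "$IMG"                               # scratch filesystem for the demo
tune2fs -l "$IMG" | grep -i 'maximum mount count'  # -1 means no fsck at mount time
tune2fs -c 60 "$IMG" > /dev/null                   # request an fsck every 60 mounts
tune2fs -l "$IMG" | grep -i 'maximum mount count'
rm -f "$IMG"
```

On a real server, run tune2fs against the device backing the filesystem (for example a /dev/sdX partition) as root.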

Ensure Temporary Tablespace is correctly defined

Priority Added Machine Type OS Type Exadata Version Oracle Version

N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
The temporary tablespace should be
  1. A BigFile Tablespace
  2. Located in DATA or RECO, whichever one is not HIGH redundancy
  3. Sized 32GB Initially
  4. Configured with AutoExtend on at 4GB
  5. Configured with a Max Size defined to limit out of control growth.
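A sketch of a statement matching the five points above; the tablespace name, disk group, and maximum size are illustrative, and DATA is assumed to be the disk group that is not HIGH redundancy:

```sql
CREATE BIGFILE TEMPORARY TABLESPACE temp
  TEMPFILE '+DATA' SIZE 32G
  AUTOEXTEND ON NEXT 4G MAXSIZE 256G;
```

The MAXSIZE clause is the cap on out-of-control growth referred to in point 5.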

Enable portmap service if app requires it

By default, the portmap service is not enabled on the database nodes and it is required for things such as NFS. If needed, enable and start it using the following with dcli across required nodes:
chkconfig --level 345 portmap on
service portmap start

Enable proper services on database nodes to use NFS

In addition to the portmap service previously explained, the nfslock service must also be enabled and running to use NFS on database nodes. Below is a working example showing the errors that will be encountered with various utilities if this is not set up correctly. MOS Note 359515.1 can also be referenced.
SQL> create tablespace nfs_test_on_nfs datafile '/shared/dscbbg02/users/user/nfs_test/nfs_test_on_nfs.dbf' size 16M;
create tablespace nfs_test_on_nfs datafile '/shared/dscbbg02/users/user/nfs_test/nfs_test_on_nfs.dbf' size 16M
*
ERROR at line 1:
ORA-01119: error in creating database file
'/shared/dscbbg02/users/user/nfs_test/nfs_test_on_nfs.dbf'
ORA-27086: unable to lock file - already in use
Linux-x86_64 Error: 37: No locks available
Additional information: 10
Elapsed: 00:00:30.08
SQL> create tablespace nfs_test datafile '+D/user/datafile/nfs_test.dbf' size 16M;
Tablespace created.
SQL> create table nfs_test(n not null) tablespace nfs_test as select rownum from dual connect by rownum < 1e5 + 1;
Table created.
SQL> alter tablespace nfs_test read only;
Tablespace altered.
SQL> create directory nfs_test as '/shared/dscbbg02/users/user/nfs_test';
Directory created.
SQL> create table nfs_test_x organization external(type oracle_datapump default directory nfs_test location('nfs_test.dp')) as select * from nfs_test;
create table nfs_test_x organization external(type oracle_datapump default directory nfs_test location('nfs_test.dp')) as select * from nfs_test
*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEPOPULATE callout
ORA-31641: unable to create dump file
"/shared/dscbbg02/users/user/nfs_test/nfs_test.dp"
ORA-27086: unable to lock file - already in use
Linux-x86_64 Error: 37: No locks available
Additional information: 10
Elapsed: 00:00:31.17
$ expdp userid=scott/tiger parfile=nfs_test.par
Export: Release 11.2.0.1.0 - Production on Wed Jun 2 10:44:51 2010
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORA-39001: invalid argument value
ORA-39000: bad dump file specification
ORA-31641: unable to create dump file "/shared/dscbbg02/users/user/nfs_test/nfs_test.dmp"
ORA-27086: unable to lock file - already in use
Linux-x86_64 Error: 37: No locks available
Additional information: 10
RMAN works:
$ rman target=/
Recovery Manager: Release 11.2.0.1.0 - Production on Wed Jun 2 10:46:40 2010
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: USER (DBID=3710096878)
RMAN> backup as copy datafile '+D/user/datafile/nfs_test.dbf' format '/shared/dscbbg02/users/user/nfs_test/nfs_test.dbf';
Starting backup at 20100602104700
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=204 device type=DISK
channel ORA_DISK_1: starting datafile copy
input datafile file number=00007 name=+D/user/datafile/nfs_test.dbf
output file name=/shared/dscbbg02/users/user/nfs_test/nfs_test.dbf tag=TAG
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:01
Finished backup at 20100602104702
The solution is to ensure that the nfslock service (aka rpc.statd) is running, and to enable it via chkconfig so it persists across reboots:
# service nfslock status
rpc.statd (pid 10795) is running...

Be Careful when Combining the InfiniBand Network across Clusters and Database Machines

Priority Added Machine Type OS Type Exadata Version Oracle Version

N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
If you want multiple database machines to run as separate environments yet be connected through the InfiniBand network, please be aware of the following items especially when the database machines were deployed as separate environments.
The cell name, cell disk name, grid disk name, ASM diskgroup name, and ASM failgroup name should be unique to help avoid accidental damage during maintenance operations. For example, do not use a disk group named DATA on both database machines; call them DATA_DM01 and DATA_DM02 instead.

IP Addresses

Priority Added Machine Type OS Type Exadata Version Oracle Version

N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
All nodes on the InfiniBand network must have a unique IP address. When an Oracle Database Machine is deployed, the default InfiniBand network is 192.168.10.x and we start with 192.168.10.1. If you used the default IP address on each Database Machine, you will have duplicate IP addresses. You must modify the IP addresses on one of the machines before re-configuring the InfiniBand Network.
Ensure any additional equipment ordered from Oracle is marked for an Oracle Exadata Database Machine and the hardware engineer is using the correct Multi-rack Cabling when the physical InfiniBand network is modified.
After the hardware engineer has modified the network, ensure the network is working correctly by running verify-topology and infinicheck. Infinicheck creates load on the system and should not be run when there is an active workload on the system. Note: infinicheck needs an input file listing all IP addresses on the network.
For example, create a temporary file in /tmp that contains the cells of both database machines, then pass this file to the infinicheck command using the -c option, along with the -b option:
#cd /opt/oracle.SupportTools/ibdiagtools
#./verify-topology -t fattree
#./infinicheck -c /tmp/combined_cellip.ora -b

CELLIP.ORA

Priority Added Machine Type OS Type Exadata Version Oracle Version

N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
The cellip.ora file in each database node of each cluster should only reference cells in use by that respective cluster.

Set fast_start_mttr_target=300 to optimize run time performance of writes

Priority Added Machine Type OS Type Exadata Version Oracle Version

N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
The deployment default for fast_start_mttr_target as of 12/22/2010 is 60. To optimize run time performance for write/redo generation intensive workloads, increase fast_start_mttr_target to 300. This will reduce checkpoint writes from DBWR processes, making more room for LGWR IO. The trade-off is that instance recovery will run longer, so if instance recovery time is more important than performance, keep fast_start_mttr_target low. Also keep in mind that an application with inadequately sized redo logs will likely not see an effect from this change due to frequent log switches.
Considerations for direct writes in a data warehouse type of application: even though direct operations do not use the buffer cache, fast_start_mttr_target is very effective at controlling crash recovery time because it ensures adequate checkpointing for the few buffers that are resident (e.g. undo segment headers). fast_start_mttr_target should be set to the desired RTO (Recovery Time Objective) while still maintaining performance SLAs.
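A fragment for making the change described above (SCOPE and SID shown for a RAC spfile; adjust to your change-control practice):

```sql
ALTER SYSTEM SET fast_start_mttr_target = 300 SCOPE=BOTH SID='*';

-- The resulting recovery-time estimate can then be tracked in:
SELECT target_mttr, estimated_mttr FROM v$instance_recovery;
```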

Enable auditd on database servers

Priority Added Machine Type OS Type Exadata Version Oracle Version

N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
On database servers, when auditing is configured (as is done automatically by applying convenience pack 11.2.2.2.0 or higher), the audit records are logged in /var/log/messages if the auditd service is not running. Logging these records to /var/log/messages may cause more frequent rotation of the messages file, which may result in losing historical data more quickly than necessary or desired. With auditd enabled, audit records are sent to /var/log/audit/audit.log, which is rotated and managed separately using settings in /etc/audit/audit.conf.
The best practice is to run the auditd service whenever auditing is configured at kernel boot, which is done by setting audit=1 on the kernel line in /boot/grub/grub.conf, as shown here:
 title Trying_LABEL_DBSYS
 root (hd0,0)
 kernel /vmlinuz-2.6.18-194.3.1.0.2.el5 root=LABEL=DBSYS ro bootarea=dbsys loglevel=7 panic=60 debug rhgb audit=1 numa=off console=ttyS0,115200n8 console=tty1 crashkernel=128M@16M
 initrd /initrd-2.6.18-194.3.1.0.2.el5.img 
To configure auditd to be enabled, run the following commands as root on each database server:
 chkconfig auditd on
chkconfig --list auditd
auditd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
service auditd start
service auditd status
auditd (pid 32582) is running... 

Manage ASM Audit File Directory Growth with cron

Priority Added Machine Type OS Type Exadata Version Oracle Version
Critical N/A X2-2(4170), X2-2, X2-8 Linux 11.2.x + 11.2.x +
Benefit / Impact:
The audit file destination directories for an ASM instance can grow to contain a very large number of files if they are not regularly maintained. Use the Linux cron(8) utility and the find(1) command to manage the number of files in the audit file destination directories.
The impact of using cron(8) and find(1) to manage the number of files in the audit file destination directories is minimal.
Risk:
Having a very large number of files can cause the file system to run out of free disk space or inodes, or can cause Oracle to run very slowly due to file system directory scaling limits, which can have the appearance that the ASM instance is hanging on startup.
Action/Repair:
Refer to MOS Note 1298957.1.
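MOS Note 1298957.1 has the supported procedure. Purely as an illustration of the cron(8)/find(1) pattern, the shape is as follows; the audit directory path and the 30-day retention are assumptions, and the demonstration runs against a scratch directory:

```shell
# A crontab entry of this shape (path and retention are illustrative):
#   30 0 * * * /usr/bin/find /u01/app/oracle/admin/+ASM/adump -name '*.aud' -mtime +30 -delete
# Demonstration of the find expression on a scratch directory:
ADUMP=$(mktemp -d)
touch "$ADUMP/recent.aud"
touch -d '45 days ago' "$ADUMP/stale.aud"   # GNU touch: back-date the file
find "$ADUMP" -name '*.aud' -mtime +30 -delete
ls "$ADUMP"                                  # only recent.aud remains
```

Keeping the filter on *.aud and an explicit path avoids deleting anything outside the audit destination.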

Updating database node OEL packages to match the cell

MOS Note 1284070.1 provides a working example of updating the db host OEL packages to match those on the cell.

Verify ASM Instance Database Initialization Parameters

Critical, 06/23/11
Benefit/Impact: Experience and testing have shown that certain ASM initialization parameters should be set to specific values. These are the best practice values set at deployment time. By setting these ASM initialization parameters as recommended, known problems may be avoided and performance maximized. The parameters are specific to the ASM instances. Unless otherwise specified, the value applies to both X2-2 and X2-8 Database Machines. The impact of setting these parameters is minimal.
Risk: If the ASM initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended and the actual set value.
Action/Repair: To verify the ASM initialization parameters, compare the values in your environment against the table below (* = default value):
Parameter | Recommended Value | Priority | Notes
cluster_interconnects | bondib0 IP address for X2-2; colon-delimited bondib* IP addresses for X2-8 | 1 | This is used to avoid the Clusterware HAIP address; for an X2-8, the IP addresses are colon delimited.
asm_power_limit | 4 | 1 | This is the Exadata default, to mitigate application performance impact during ASM rebalance. Evaluate application performance impact before using a higher asm_power_limit.
memory_target | 1025M | 1 | This avoids issues with the 11.2.0.1 to 11.2.0.2 upgrade. This is the default setting for Exadata.
processes | For < 10 database instances per node: 50 x (instances per node + 1). For >= 10 instances per node: {50 x MIN(instances per node + 1, 11)} + {10 x MAX(instances per node - 10, 0)} | 1 | This formula accommodates the consolidation case where there are many instances per node, and avoids issues observed when ASM hits its maximum number of processes. NOTE: “instances” means non-ASM instances.
Correct any Priority 1 parameter that is not set as recommended. Evaluate and correct any Priority 2 parameter that is not set as recommended.
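The processes formula above can be sketched as a small shell function; the helper name is ours:

```shell
# Hypothetical helper: recommended ASM "processes" value for n non-ASM
# database instances per node, per the formula above.
asm_processes() {
  n=$1
  if [ "$n" -lt 10 ]; then
    echo $(( 50 * (n + 1) ))
  else
    capped=$(( n + 1 < 11 ? n + 1 : 11 ))
    echo $(( 50 * capped + 10 * (n - 10) ))
  fi
}

asm_processes 4    # 50 x 5
asm_processes 12   # 50 x 11 + 10 x 2
```

For 10 or more instances per node, the MIN term is always 11, so the value grows by 10 per additional instance beyond the tenth.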

Verify Common Instance Database Initialization Parameters

Critical, 08/02/11
Benefit/Impact: Experience and testing have shown that certain database initialization parameters should be set to specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized. The parameters are common to all database instances. The impact of setting these parameters is minimal. The performance related settings provide guidance to maintain the highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and a clear understanding of the performance impact.
Risk: If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended and the actual set value.
Action/Repair: To verify the database initialization parameters, compare the values in your environment against the table below (* = default value):
Parameter Recommended Value Priority Notes
cluster_interconnects Bondib0 IP address for X2-2 Colon delimited Bondib* IP addresses for X2-8 A This is used to avoid the Clusterware HAIP address; For an X2-8, the 4 IP addresses are colon delimited
compatible 11.2.0.2 A Need this for new RDBMS and ASM features
processes Set per doc definition [Deployment database uses 1024] 2 Customers should set this per doc definition
log_buffer 134217728 A Check this is not less than 128M. Ensures adequate buffer space for new LGWR transport
db_block_checking False * 2 For higher data corruption detection and prevention, enable this setting but performance impacts vary per workload. Evaluate performance impact. Refer to MOS 1302539.1.
db_block_checksum Typical * A Aids in block corruption detection. Enable for primary and standby databases. Refer to MOS 1302539.1.
db_lost_write_protect Typical A This is important for data block lost write detection and repair. Enable for primary and standby databases. Refer to MOS 1265884.1 and 1302539.1. Refer to section on how to address ORA-752 on the standby database.
control_files Check to ensure control file is in high redundancy disk group and there are two members (copies) of the controlfile. A A high redundancy diskgroup optimizes availability. 1. Control file should be in a high redundancy disk group. 2. Two controlfile members are recommended. If there's one high redundancy disk group, create both controlfile members in the high redundancy disk group.
Otherwise, multiplex the controlfile members across multiple ASM disk groups.
[Modified 8/16/11]
audit_trail Db 2 Security optimization
audit_sys_operations True 2 Security optimization
diagnostics_dest ORACLE_BASE 2 Customers should set this per doc definition
db_recovery_file_dest RECO diskgroup A Check to ensure diskgroup is different from db_file_create_dest
db_recovery_file_dest_size RECO diskgroup size A Check to ensure the size is <= 90% of the RECO diskgroup size
Db_block_size 8192 2 Check that db_block_size=8192. 8192 blocksize is generally recommended for Oracle applications unless a different block size is proven more efficient.
_lm_rcvr_hang_allow_time 140 A This parameter protects from corner case timeouts lower in the stack and prevents instance evictions
_kill_diagnostics_timeout 140 A This parameter protects from corner case timeouts lower in the stack and prevents instance evictions
Global_names True A Security optimization
_file_size_increase_increment 2143289344 A This ensures adequately sized RMAN backup allocations
os_authent_prefix "" A Security optimization [NOTE: this is set to a null value, not literally two double quotes.]
sql92_security True A Security optimization
fast_start_mttr_target 300 2 Check that its set and not less than 300. Relaxing aggressive checkpointing prevents outliers and improves performance
parallel_adaptive_multi_user False A Performance impact: PQ degree will be reduced for some queries especially with concurrent workloads.
parallel_execution_message_size 16384 * A Improves PQ performance
Parallel_threads_per_cpu 1 A Check that this value is 1. This setting accounts for hyper-threading.
Log_archive_dest_n LOCATION=USE_DB_RECOVERY_FILE_DEST A Do NOT set to a specific diskgroup, since fast recovery area auto space management is ignored unless “USE_DB_RECOVERY_FILE_DEST” is explicitly used. This is not the same as setting it to the equivalent diskgroup name from the db_recovery_file_dest parameter.
filesystemio_options Setall 2 Important to get both async and direct IO for performance
Db_create_online_log_dest_n Check for high redundancy diskgroup A A high redundancy diskgroup optimizes availability. If a high redundancy disk group is available, use the first high ASM redundancy disk group for all your Online Redo Logs or Standby Redo Logs. Use only one log member to minimize performance impact. If a high redundancy disk group isn't available, multiplex redo log members across DATA and RECO ASM disk group for additional protection.
Open_cursors Set per doc definition [Deployment database uses 1000] A Check to ensure this is at least 300
use_large_pages Only A This ensures the entire SGA is stored in hugepages. Benefits: memory savings and reduced paging and swapping. Prerequisites: the operating system hugepages setting must be correctly configured and must be adjusted whenever a database instance is added or dropped or whenever SGA sizes change. Refer to MOS 401749.1 and 361323.1 to configure HugePages.
_enable_NUMA_support FALSE * for X2-2 TRUE * for X2-8 A Enable NUMA support on X2-8 only
Correct any Priority 1 parameter that is not set as recommended. Evaluate and correct any Priority 2 parameter that is not set as recommended.

Verify OLTP Instance Database Initialization Parameters

Note that, except for OLTP applications on an X2-8, all parameters referenced here are for a single database. The OLTP-on-X2-8 case is different because it is unlikely that a customer will put a single OLTP database on an X2-8. For more detail on platform and parameter configuration for consolidating any application type, refer to the consolidation parameter reference page.
Critical, 06/23/11
Benefit / Impact: Experience and testing have shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized. The parameters are specific to OLTP database instances. Unless otherwise specified, the value is for both X2-2 and X2-8 Database Machines. The impact of setting these parameters is minimal. The performance-related settings provide guidance to maintain the highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and a clear understanding of the performance impact.
Risk: If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.
Action / Repair: To verify the database initialization parameters, compare the values in your environment against the table below (* = default value):
Parameter | Recommended Value | Priority | Notes
parallel_max_servers | 240 for X2-2, 1280 for X2-8 | 2 | Check to ensure not more than the recommended value. Setting this higher than the recommended value can deplete memory and impact performance.
parallel_min_servers | 0 | 1 | Check that it is 0. For OLTP, we don't want to waste resources that won't be used.
sga_target | 24G for X2-2, 128G for X2-8 | 1 | Check to ensure not higher than the recommended value.* For X2-2, the recommended value is for a single database. For X2-8, this number is based on a small set of databases (< 5).
pga_aggregate_target | 16G for X2-2, 64G for X2-8 | 1 | Check to ensure not higher than the recommended value.* For X2-2, the recommended value is for a single database. For X2-8, this number is based on a small set of databases (< 5).
_kgl_cluster_lock_read_mostly | TRUE | 1 | Improves cursor cache performance for OLTP applications; will be the default in 11.2.0.3.
Correct any Priority 1 parameter that is not set as recommended. Evaluate and correct any Priority 2 parameter that is not set as recommended.
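A quick way to eyeball the OLTP values above from the instance itself — a sketch, with byte-valued memory parameters converted to GB for comparison; note the hidden parameter _kgl_cluster_lock_read_mostly is not exposed in V$PARAMETER and requires querying the X$ views as SYS:

```sql
-- Show the OLTP-relevant settings, converting the memory targets to GB
-- for comparison with the 24G/16G (X2-2) recommendations above.
SELECT name,
       CASE WHEN name IN ('sga_target', 'pga_aggregate_target')
            THEN TO_CHAR(ROUND(TO_NUMBER(value)/1024/1024/1024, 1)) || 'G'
            ELSE value
       END AS current_value
FROM   v$parameter
WHERE  name IN ('parallel_max_servers', 'parallel_min_servers',
                'sga_target', 'pga_aggregate_target');
```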

Verify DW/BI Instance Database Initialization Parameters

Critical, 06/23/11
Benefit / Impact: Experience and testing have shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized. The parameters are specific to DW/BI database instances. Unless otherwise specified, the value is for both X2-2 and X2-8 Database Machines. The impact of setting these parameters is minimal. The performance-related settings provide guidance to maintain the highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and a clear understanding of the performance impact.
Risk: If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.
Action / Repair: To verify the database initialization parameters, compare the values in your environment against the table below (* = default value):
Parameter | Recommended Value | Priority | Notes
parallel_max_servers | 240 for X2-2, 1280 for X2-8 | 1 | Check to ensure not more than the recommended value. Setting this higher than the recommended value can deplete memory and impact performance.
parallel_min_servers | 96 for X2-2, 512 for X2-8 | 1 | Reduces the overhead of unnecessarily allocating and deallocating parallel servers.
parallel_degree_policy | MANUAL | 2 | Evaluate workload management before deploying; otherwise set to MANUAL by default.
parallel_degree_limit | 16 for X2-2, 24 for X2-8 | 2 | Check that this is less than parallel_servers_target.
parallel_servers_target | 128 for X2-2, 512 for X2-8 | 2 | Check to ensure not higher than parallel_max_servers. Setting this higher than the recommended value can deplete memory and impact performance.
sga_target | 16G for X2-2, 128G for X2-8 | 1 | Check to ensure not higher than the recommended value.* Note these values are for a single database and are Exadata's default settings.
pga_aggregate_target | 16G for X2-2, 256G for X2-8 | 1 | Check to ensure not higher than the recommended value. Note these values are for a single database and are Exadata's default settings.
Correct any Priority 1 parameter that is not set as recommended. Evaluate and correct any Priority 2 parameter that is not set as recommended.
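The DW/BI table implies an ordering among the parallel settings: parallel_degree_limit below parallel_servers_target, which in turn must not exceed parallel_max_servers. A sketch to list them side by side for that check:

```sql
-- List the parallel query settings so the ordering
-- parallel_degree_limit < parallel_servers_target <= parallel_max_servers
-- can be verified (parallel_degree_limit may also hold 'CPU' or 'IO'
-- rather than a number).
SELECT name, value
FROM   v$parameter
WHERE  name IN ('parallel_degree_policy', 'parallel_degree_limit',
                'parallel_servers_target', 'parallel_max_servers',
                'parallel_min_servers')
ORDER  BY name;
```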

Verify DBFS Instance Database Initialization Parameters

Critical, 05/23/11
Benefit / Impact: Experience and testing have shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized. The parameters are specific to the DBFS database instances. Unless otherwise specified, the value is for both X2-2 and X2-8 Database Machines. The impact of setting these parameters is minimal.
Risk: If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.
Action / Repair: To verify the database initialization parameters, compare the values in your environment against the table below (* = default value):
Parameter | Recommended Value | Priority | Notes
parallel_max_servers | 2 | 2 | Check to ensure not more than 2. Setting this higher than the recommended value can deplete memory and impact performance.
parallel_min_servers | 0 | 1 | Check that it is 0. For DBFS, we don't want to waste resources that won't be used.
sga_target | 1.5G | 1 | Check to ensure not higher than the recommended value.* Note these values are for a single database.
pga_aggregate_target | 6.5G | 1 | Check to ensure not higher than the recommended value. Note these values are for a single database.
db_recovery_file_dest | DBFS_DG diskgroup | 1 | Check to ensure DBFS_DG.* Note this overrides the common database instance parameter reference because of a different requirement for DBFS.
db_recovery_file_dest_size | 10% of DBFS_DG size | 1 | Check to ensure the size is 10% of the DBFS_DG diskgroup size; no archiving for the data staging use case.* Note this overrides the common database instance parameter reference because of a different requirement for DBFS.
Correct any Priority 1 parameter that is not set as recommended. Evaluate and correct any Priority 2 parameter that is not set as recommended.
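The 10% sizing rule for db_recovery_file_dest_size can be checked against the actual diskgroup size. A sketch, assuming the ASM diskgroup views are visible from the database instance (V$ASM_DISKGROUP reports TOTAL_MB in megabytes):

```sql
-- Compare db_recovery_file_dest_size with 10% of the DBFS_DG diskgroup size.
SELECT TO_NUMBER(p.value)                AS recovery_dest_size_bytes,
       g.total_mb * 1024 * 1024 * 0.10  AS ten_percent_of_dbfs_dg_bytes
FROM   v$parameter p
JOIN   v$asm_diskgroup g ON g.name = 'DBFS_DG'
WHERE  p.name = 'db_recovery_file_dest_size';
```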