Sunday, July 25, 2010

Systemwide Tuning using StatsPack Reports

Subject: Systemwide Tuning using STATSPACK Reports
Doc ID: Note:228913.1 Type: BULLETIN
Last Revision Date: 13-FEB-2003 Status: PUBLISHED



PURPOSE
This article is a reference for understanding the output generated by the STATSPACK utility. Since performance tuning is a very broad area, this document only provides tuning advice in very specific areas. Several documents are available in Metalink to resolve contention for specific resources. The following resources are available to find specific documentation related to a performance topic:

Metalink Database Performance Technical Library
Oracle9i Database Performance Tuning Guide and Reference
OTN Performance Technical Library

CONTENT
Introduction
Summary Information
Instance cache information
Load profile Information
Instance Efficiency Ratios
Top 5 Events section
Cluster Statistics
Foreground Wait Events
Background Wait Events
Notes Regarding Waitevents
SQL Information
Statistics
IO Activity
Buffer cache Activity Information
Instance Recovery Statistics
PGA Memory Statistics
Enqueue Activity
Undo (Rollback) Information
Latch Information
Dictionary Cache Statistics
Library Cache Statistics
SGA Memory Summary
SGA Memory Detail
Init.ora Parameters Summary



Introduction

StatsPack was created in response to a need for more relevant and more extensive statistical reporting beyond what was available via UTLBSTAT/UTLESTAT reports. Further, this information can be stored permanently in the database instance itself so that historical data is always available for comparison and diagnosis.

Statspack has been available since version 8.1.6, but can be installed on 8.0.6 and above. Snapshots created using older versions of Statspack can usually be read using newer versions of Statspack, although the newer features will not be available.

See the following notes for information on installing, configuring snapshots, and generating reports:

- Installing and Configuring StatsPack Package
- Gathering a StatsPack snapshot
- Creating a StatsPack performance report
- FAQ- StatsPack Complete Reference

Timed_statistics must be set to true prior to the creation of a snapshot. If it is not, the data within statspack will not be relevant. You can tell if timed_statistics was not set by looking at the total times columns in the report. If these are zero then timed_statistics was not set.

Snapshots during which the instance was recycled will not contain accurate information and should not be included in a statspack report.

In general, we suggest that snapshot intervals be 15 minutes in length. This allows fine-grained reporting when hangs are suspected or detected. The snapshots can also be combined into hourly reports for general performance tuning.

When a value is too large for the statspack field it will be represented by a series of pound signs such as #######. Should this occur and you need to see the value in the field you will need to decrease the number of snapshots in the report until the field can be read. Should there only be one snapshot in the report, then you will need to decrease the snapshot interval.

Profiles created using statspack information are quite helpful in determining long-term trends such as load increases, usage trends, resource consumption, latch activity, etc. It is especially important that a DBA know these things and be able to demonstrate changes in them that necessitate hardware improvements and load balancing policies. This document describes the main sections of a statspack report, which will help you understand what information is available to diagnose and resolve performance tuning problems. Some sections of the statspack report may contain different information depending on the Statspack release that was used to generate the report. This document also indicates these changes for the different sections.



Summary Information

The summary information begins with the identification of the database on which the statspack report was run along with the time interval of the statspack report. Here is the 8i instance information:

STATSPACK report for

DB Name DB Id Instance Inst Num Release OPS Host
------------ ----------- ------------ -------- ----------- --- ------------
PHS2 975244035 phs2 2 8.1.7.2.0 YES leo2

Snap Id Snap Time Sessions
------- ------------------ --------
Begin Snap: 100 03-Jan-02 08:00:01 #######
End Snap: 104 03-Jan-02 09:00:01 #######
Elapsed: 60.00 (mins)


The database name, id, instance name, instance number if OPS is being utilized, Oracle binary release information, host name and snapshot information are provided.

Note that here the number of sessions during the snapshot was too large for the sessions field and so the overflow symbol is displayed.

Here is an example of an 8.0.6 instance using statspack:

STATSPACK report for

DB Name DB Id Instance Inst Num Release OPS Host
---------- ----------- ---------- -------- ---------- ---- ----------
GLOVP 1409723819 glovp 1 8.0.6.1.0 NO shiver

Snap Length
Start Id End Id Start Time End Time (Minutes)
-------- -------- -------------------- -------------------- -----------
454 455 07-Jan-03 05:28:20 07-Jan-03 06:07:53 39.55

Here is the 9i instance information. Note that the OPS column is now entitled 'Cluster' to accommodate the newer Real Application Clusters (RAC) terminology and that the Curs/Sess and Comment columns have been added.

STATSPACK report for

DB Name DB Id Instance Inst Num Release Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
ETSPRD7 1415901831 etsprd7a 1 9.2.0.2.0 YES tsonode1

Snap Id Snap Time Sessions Curs/Sess Comment
------- ------------------ -------- --------- -------------------
Begin Snap: 20 03-Jan-03 00:00:05 ####### .0
End Snap: 21 03-Jan-03 01:00:05 ####### .0
Elapsed: 60.00 (mins)



Instance Workload Information

Every statspack report starts with a section that describes the instance's workload profile and instance metrics that may help to determine the instance efficiency.

- Instance cache information:

In the 8i report the buffer cache size can be determined by multiplying the db_block_buffers by the db_block_size.

Cache Sizes
~~~~~~~~~~~
db_block_buffers: 6400 log_buffer: 104857600
db_block_size: 32768 shared_pool_size: 150000000
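
The multiplication described above can be sketched as follows. This is only an illustration using the two values from the 8i sample; the function name buffer_cache_bytes is a hypothetical helper, not part of Statspack.

```python
# Values copied from the 8i "Cache Sizes" sample above.
db_block_buffers = 6400
db_block_size = 32768  # bytes per block


def buffer_cache_bytes(block_buffers: int, block_size: int) -> int:
    """Return the buffer cache size in bytes (buffers * block size)."""
    return block_buffers * block_size


cache_bytes = buffer_cache_bytes(db_block_buffers, db_block_size)
cache_mb = cache_bytes / (1024 * 1024)
print(f"{cache_bytes:,} bytes = {cache_mb:.0f} MB")
```

For this sample the buffer cache works out to 200 MB.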



In 9i this has been done for you. Std Block size indicates the primary block size of the instance.
Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 704M Std Block Size: 8K
Shared Pool Size: 256M Log Buffer: 1,024K

Note that the buffer cache size is that of the standard buffer cache. If you have multiple buffer caches, you will need to calculate the others separately.

- Load profile Information:

The load profile information is next. It is identical in both 8i and 9i.

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 351,530.67 7,007.37
Logical reads: 5,449.81 108.64
Block changes: 1,042.08 20.77
Physical reads: 37.71 0.75
Physical writes: 134.68 2.68
User calls: 1,254.72 25.01
Parses: 4.92 0.10
Hard parses: 0.02 0.00
Sorts: 15.73 0.31
Logons: -0.01 0.00
Executes: 473.73 9.44
Transactions: 50.17

% Blocks changed per Read: 19.12 Recursive Call %: 4.71
Rollback per transaction %: 2.24 Rows per Sort: 20.91

Where:


. Redo size: This is the amount of redo generated during this report.

. Logical Reads: This is calculated as Consistent Gets + DB Block Gets = Logical Reads

. Block changes: The number of blocks modified during the sample interval

. Physical Reads: The number of requests for a block that caused a physical I/O.

. Physical Writes: The number of physical writes issued.

. User Calls: The number of user calls (such as login, parse, fetch, or execute requests) made during the interval

. Parses: Total of all parses: both hard and soft

. Hard Parses: Those parses requiring a completely new parse of the SQL statement. These consume both latches and shared pool area.

. Soft Parses: Not listed but derived by subtracting the hard parses from parses. A soft parse reuses a previous hard parse and hence consumes far fewer resources.

. Sorts, Logons, Executes and Transactions are all self explanatory
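
As a rough illustration of how the derived figures relate to the per-second column of the sample Load Profile, the sketch below computes the soft parse rate and the "% Blocks changed per Read" value. It takes Block changes as 1,042.08 per second; the variable names are chosen here for illustration only.

```python
# Per-second values copied from the sample Load Profile above.
parses_per_sec = 4.92
hard_parses_per_sec = 0.02
block_changes_per_sec = 1042.08
logical_reads_per_sec = 5449.81

# Soft parses are not listed in the report; they are total parses minus hard parses.
soft_parses_per_sec = parses_per_sec - hard_parses_per_sec

# "% Blocks changed per Read" is block changes as a percentage of logical reads.
pct_blocks_changed_per_read = 100 * block_changes_per_sec / logical_reads_per_sec

print(f"Soft parses per second:    {soft_parses_per_sec:.2f}")
print(f"% Blocks changed per Read: {pct_blocks_changed_per_read:.2f}")
```

The second figure agrees with the 19.12 printed at the bottom of the sample Load Profile.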

- Instance Efficiency Ratios:

Hit ratios are calculations that may provide information regarding different structures and operations in the Oracle instance. Database tuning must never be driven by hit ratios. They only provide additional information to understand how the instance is operating. For example, in a DSS system a low cache hit ratio may be acceptable due to the amount of recycling needed for the large volume of data accessed. If you increase the size of the buffer cache based on this number, the corrective action may not take effect and you may be wasting memory resources.

See THE COE PERFORMANCE METHOD for further reference on how to approach a performance tuning problem.

This section is identical in 8i and 9i.

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.99 Redo NoWait %: 100.00
Buffer Hit %: -45.57 In-memory Sort %: 97.55
Library Hit %: 99.89 Soft Parse %: 99.72
Execute to Parse %: -1.75 Latch Hit %: 99.11
Parse CPU to Parse Elapsd %: 52.66 % Non-Parse CPU: 99.99

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 42.07 43.53
% SQL with executions>1: 73.79 75.08
% Memory for SQL w/exec>1: 76.93 77.64


It is possible for both the 'buffer hit ratio' and the 'execute to parse' ratios to be negative. In the case of the buffer hit ratio, the buffer cache is too small and the data is being aged out before it can be used, so it must be retrieved again. This is a form of thrashing which degrades performance immensely.

The execute to parse ratio can be negative when the number of parses is larger than the number of executions. The Execute to Parse ratio is determined by the following formula:


100 * (1 - Parses/Executions) = Execute to Parse
Here this becomes:
100 * (1 - 42,757 / 42,023 ) = 100 * (1 - 1.0175) = 100* -0.0175 = -1.75

This can be caused by the snapshot boundary occurring during a period of high parsing so that the executions have not occurred before the end of the snapshot. Check the next snapshot to see if there are enough executes to account for the parses in this report.
Another cause for a negative execute to parse ratio is if the shared pool is too small and queries are aging out of the shared pool and need to be reparsed. This is another form of thrashing which also degrades performance tremendously.
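
The Execute to Parse calculation above can be reproduced with a few lines of code. This is a minimal sketch using the parse and execution counts quoted in the worked formula; the function name execute_to_parse_pct is hypothetical.

```python
def execute_to_parse_pct(parses: int, executions: int) -> float:
    """Return 100 * (1 - parses / executions), the Execute to Parse percentage."""
    return 100 * (1 - parses / executions)


# Counts taken from the worked example above: more parses than executions.
ratio = execute_to_parse_pct(parses=42_757, executions=42_023)
print(f"Execute to Parse %: {ratio:.2f}")
```

Because the parse count exceeds the execution count, the result is negative and rounds to -1.75, as in the sample report.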

- Top 5 Events section:

This section shows the Top 5 timed events that must be considered to focus the tuning efforts. Before Oracle 9.2 this section was called "Top 5 Wait Events". It was renamed in Oracle 9.2 to "Top 5 Timed Events" to include the "CPU Time" based on the 'CPU used by this session'. This information will allow you to determine SQL tuning problems.

For further information see the Statspack readme file, $ORACLE_HOME/rdbms/admin/spdoc.txt. These events are particularly useful in determining which sections to view next. For instance, if there are fairly high waits on latch free or one of the other latches, you might want to examine the latch sections first. On the other hand, if the db file read wait events seem abnormally high, you might want to look at the file IO section first.

Top 5 Wait Events
~~~~~~~~~~~~~~~~~ Wait % Total
Event Waits Time (cs) Wt Time
-------------------------------------------- ------------ ------------ -------
db file sequential read 12,131,221 173,910 58.04
db file scattered read 93,310 86,884 29.00
log file sync 18,629 9,033 3.01
log file parallel write 18,559 8,449 2.82
buffer busy waits 304,461 7,958 2.66



Notice that in Oracle 9.2 references are made to "Elapsed Time" rather than to "Wait Time". Also, "CPU Time" is included as part of the Top events section.
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
log file sync 3,223,927 32,481 64.05
CPU time 7,121 14.04
global cache open x 517,153 3,130 6.17
log file parallel write 985,732 2,613 5.15
KJC: Wait for msg sends to complete 568,061 1,794 3.54
-------------------------------------------------------------

Note that db file scattered read and db file sequential read are generally the top wait events when the instance is well tuned and is not running OPS/RAC.

Cluster Statistics

In Oracle 9i, with the introduction of Real Application Clusters, several sections were added to the statspack report to show information related to the cluster database environment. The following sections are now available in statspack to monitor RAC environments and are only displayed when a cluster is detected.

Oracle 9.0 and 9.1 Cluster Statistics :

Global Lock Statistics
----------------------
Ave global lock get time (ms): 0.3
Ave global lock convert time (ms): 0.0
Ratio of global lock gets vs global lock releases: 1.0

Global cache statistics
-----------------------
Global cache hit %: 0.3
Ave global cache get time (ms): 1.7
Ave global cache convert time (ms): 3.1

Cache fusion statistics
-----------------------
Ave time to process CR block request (ms): 0.2
Ave receive time for CR block (ms): 1.6
Ave build time for CR block (ms): 0.1
Ave flush time for CR block (ms): 0.0
Ave send time for CR block (ms): 0.1

Ave time to process current block request (ms): 0.2
Ave receive time for current block (ms): 2.5
Ave pin time for current block (ms): 0.0
Ave flush time for current block (ms): 0.0
Ave send time for current block (ms): 0.1

GCS and GES statistics
----------------------
Ave GCS message process time (ms): 0.1
Ave GES message process time (ms): 0.1
% of direct sent messages: 59.5
% of indirect sent messages: 40.3
% of flow controlled messages: 0.1
% of GCS messages received by LMD: 96.4
% of GES messages received by LMD: 3.6
% of blocked converts: 10.3
Ave number of logical side channel messages: 33.8
Ave number of logical recovery claim messages:

Oracle 9.2 Cluster Statistics :

Global Cache Service - Workload Characteristics
-----------------------------------------------
Ave global cache get time (ms): 4.6
Ave global cache convert time (ms): 20.2

Ave build time for CR block (ms): 0.0
Ave flush time for CR block (ms): 0.6
Ave send time for CR block (ms): 0.1
Ave time to process CR block request (ms): 0.7
Ave receive time for CR block (ms): 0.9

Ave pin time for current block (ms): 2.9
Ave flush time for current block (ms): 0.1
Ave send time for current block (ms): 0.1
Ave time to process current block request (ms): 3.1
Ave receive time for current block (ms): 7.2

Global cache hit ratio: 0.8
Ratio of current block defers: 0.0
% of messages sent for buffer gets: 0.5
% of remote buffer gets: 0.4
Ratio of I/O for coherence: 12.3
Ratio of local vs remote work: 1.2
Ratio of fusion vs physical writes: 0.0

Global Enqueue Service Statistics
---------------------------------
Ave global lock get time (ms): 0.2
Ave global lock convert time (ms): 2.3
Ratio of global lock gets vs global lock releases: 1.0

GCS and GES Messaging statistics
--------------------------------
Ave message sent queue time (ms): 0.1
Ave message sent queue time on ksxp (ms): 12.3
Ave message received queue time (ms): 0.0
Ave GCS message process time (ms): 0.1
Ave GES message process time (ms): 0.0
% of direct sent messages: 81.2
% of indirect sent messages: 13.1
% of flow controlled messages: 5.7

In all Oracle9i releases, a separate section shows the actual value for all the cluster statistics:
GES Statistics for DB: FUSION Instance: ecfsc2 Snaps: 161 -162

Statistic Total per Second per Trans
--------------------------------- ---------------- ------------ ------------
dynamically allocated gcs resourc 0 0.0 0.0
dynamically allocated gcs shadows 0 0.0 0.0
flow control messages received 0 0.0 0.0
flow control messages sent 10 0.0 0.0
gcs ast xid 30 0.0 0.0
gcs blocked converts 531,572 147.7 0.2
gcs blocked cr converts 55,739 15.5 0.0
gcs compatible basts 45 0.0 0.0
gcs compatible cr basts (global) 6,183 1.7 0.0
....

For further reference on tuning RAC clustered instances, please refer to the documentation manual Oracle9i Real Application Clusters Deployment and Performance.

Wait Events Information

The following section will describe in detail most of the sections provided in a statspack report.

- Foreground Wait Events:

Foreground wait events are those associated with a session or client process waiting for a resource. The 8i version looks like this:

Wait Events for DB: PHS2 Instance: phs2 Snaps: 100 -104
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> ordered by wait time desc, waits desc (idle events last)
Avg
Total Wait wait Waits
Event Waits Timeouts Time (cs) (ms) /txn
---------------------------- ------------ ---------- ----------- ------ ------
PX Deq: Execution Msg 15,287 6,927 1,457,570 953 694.9
enqueue 30,367 28,591 737,906 243 ######
direct path read 45,484 0 352,127 77 ######
PX Deq: Table Q Normal 7,185 811 241,532 336 326.6
PX Deq: Execute Reply 13,925 712 194,202 139 633.0
....



The 9.2 version is much the same but has different time intervals in the header.
Wait Events for DB: FUSION Instance: ecfsc2 Snaps: 161 -162
-> s - second
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc (idle events last)
Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
log file sync 3,223,927 1 32,481 10 1.0
global cache open x 517,153 777 3,130 6 0.2
log file parallel write 985,732 0 2,613 3 0.3
KJC: Wait for msg sends to c 568,061 34,529 1,794 3 0.2
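
The Avg wait (ms) column can be cross-checked from the Waits and Total Wait Time columns. The sketch below does this for three rows of the 9.2 sample above, where the total wait time is reported in seconds; the function name avg_wait_ms and the dictionary layout are illustrative assumptions.

```python
def avg_wait_ms(total_wait_seconds: float, waits: int) -> float:
    """Average wait per event in milliseconds, from total seconds and wait count."""
    return total_wait_seconds * 1000 / waits


# (waits, total wait time in seconds) copied from the 9.2 sample above.
wait_events = {
    "log file sync": (3_223_927, 32_481),
    "global cache open x": (517_153, 3_130),
    "log file parallel write": (985_732, 2_613),
}

for event, (waits, seconds) in wait_events.items():
    print(f"{event:<26} {avg_wait_ms(seconds, waits):.2f} ms")
```

Rounded to whole milliseconds these come to 10, 6 and 3, matching the report. For an 8i report, where the total is in centiseconds, the same idea applies with a factor of 10 instead of 1000.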

- Background Wait Events:

Background wait events are those not associated with a client process. They indicate waits encountered by system and non-system processes. The output is the same for all the Oracle releases.

Background Wait Events for DB: PHS2 Instance: phs2 Snaps: 100 -104
-> ordered by wait time desc, waits desc (idle events last)
Avg
Total Wait wait Waits
Event Waits Timeouts Time (cs) (ms) /txn
---------------------------- ------------ ---------- ----------- ------ ------
latch free 88,578 32,522 18,341 2 ######
enqueue 319 230 5,932 186 14.5
row cache lock 4,941 0 2,307 5 224.6
control file parallel write 1,172 0 332 3 53.3
db file parallel write 176 0 67 4 8.0
log file parallel write 315 0 65 2 14.3
db file scattered read 137 0 62 5 6.2
LGWR wait for redo copy 66 10 47 7 3.0


Examples of background system processes are LGWR and DBWR. An example of a non-system background process would be a parallel query slave.
Note that it is possible for a wait event to appear in both the foreground and background wait events statistics. Examples of this are the enqueue and latch free events.

The idle wait events appear at the bottom of both sections and can generally be safely ignored. Typically these events record the time during which the client is connected to the database but no requests are being made to the server.

- Notes Regarding Waitevents:

- The idle wait events associated with pipes are often a major source of concern for some DBAs. Pipe gets and waits are entirely application dependent. To tune these events you must tune the application generating them. High pipe gets and waits can affect the library cache latch performance. Rule out all other possible causes of library cache contention prior to focusing on pipe waits, as it is very expensive for the client to tune their application.
- A list of most wait events used by the RDBMS kernel can be found in Appendix A of the Oracle Reference manual for the version being used.

Some wait events to watch:
- global cache cr request: (OPS) This wait event shows the amount of time that an instance has waited for a requested data block for a consistent read and the transferred block has not yet arrived at the requesting instance. See Note 157766.1 'Sessions Wait Forever for 'global cache cr request' Wait Event in OPS or RAC'. In some cases the 'global cache cr request' wait event may be perfectly normal if large buffer caches are used and the same data is being accessed concurrently on multiple instances. In a perfectly tuned, non-OPS/RAC database, I/O wait events would be the top wait events but since we are avoiding I/O's with RAC and OPS the 'global cache cr request' wait event often takes the place of I/O wait events.
- Buffer busy waits, write complete waits, db file parallel writes and enqueue waits: If all of these are in the top wait events the client may be experiencing disk saturation. See Note 155971.1 Resolving Intense and "Random" Buffer Busy Wait Performance Problems for troubleshooting tips.
- log file switch, log file sync or log switch/archive: If the waits on these events appear excessive, check for checkpoint tuning issues. See Note 147468.1 Checkpoint Tuning and Troubleshooting Guide.
- write complete waits, free buffer waits or buffer busy waits: If any of these wait events is high, the buffer cache may need tuning. See Note 62172.1 'Understanding and Tuning Buffer Cache and DBWR in Oracle7, Oracle8, and Oracle8i'
- latch free: If high, the latch free wait event indicates that there was contention on one or more of the primary latches used by the instance. Look at the latch sections to diagnose and resolve this problem.

SQL Information

The SQL that is stored in the shared pool SQL area (Library cache) is reported to the user in three different formats in 8i, each useful in its own way.



. SQL ordered by Buffer Gets
. SQL ordered by Physical Reads
. SQL ordered by Executions
9i has an additional section:

. SQL ordered by Parse Calls

- SQL ordered by Gets:

SQL ordered by Gets for DB: PHS2 Instance: phs2 Snaps: 100 -104
-> End Buffer Gets Threshold: 10000
-> Note that resources reported for PL/SQL includes the resources used by
all SQL statements called within the PL/SQL code. As individual SQL
statements are also reported, it is possible and valid for the summed
total % to exceed 100

Buffer Gets Executions Gets per Exec % Total Hash Value
--------------- ------------ -------------- ------- ------------
198,924 37,944 5.2 41.7 2913840444
select length from fet$ where file#=:1 and block#=:2 and ts#=:3

111,384 7 15,912.0 23.4 1714733582
select f.file#, f.block#, f.ts#, f.length from fet$ f, ts$ t whe
re t.ts#=f.ts# and t.dflextpct!=0 and t.bitmapped=0

105,365 16 6,585.3 22.1 4111567099
CREATE TABLE "PHASE".:Q3236003("PID","CAMPAIGN","SCPOS1","SCPOS2
","SCPOS3","SCPOS4","SCPOS5","SCPOS6","SCPOS7","SCPOS8","SCPOS9"
,"SCPOS10","SCPOS11","SCPOS12","SCPOS13","SCPOS14","SCPOS15","SC
POS16","SCPOS17","MCELL","MAILID","RSPPROD","STATTAG","RSPREF","
RSPCRED","MAILDATE","RSPTDATE","BDATE","STATE","ZIP","INCOME","R
....



This section reports the contents of the SQL area ordered by the number of buffer gets and can be used to identify CPU Heavy SQL.
- Many DBAs feel that if the data is already contained within the buffer cache the query should be efficient. This could not be further from the truth. Retrieving more data than needed, even from the buffer cache, requires CPU cycles and interprocess IO. Generally speaking, a physical IO is not 10,000 times more expensive than a buffer get. It is in the neighborhood of 67 times more expensive, and almost zero if the data is stored in the UNIX buffer cache.

- The statements of interest are those with a large number of gets per execution especially if the number of executions is high.

- High buffer gets generally correlate with heavy CPU usage.
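
To illustrate the point about gets per execution, the sketch below recomputes that column for the three statements in the sample listing and ranks them by it. The tuple layout and the helper name gets_per_exec are assumptions made for this example.

```python
def gets_per_exec(buffer_gets: int, executions: int) -> float:
    """Buffer gets divided by executions, as shown in the Gets per Exec column."""
    return buffer_gets / executions


# (hash value, buffer gets, executions) copied from the sample listing above.
sql_by_gets = [
    (2913840444, 198_924, 37_944),
    (1714733582, 111_384, 7),
    (4111567099, 105_365, 16),
]

# Rank by gets per execution rather than by total buffer gets.
ranked = sorted(sql_by_gets, key=lambda s: gets_per_exec(s[1], s[2]), reverse=True)

for hash_value, gets, execs in ranked:
    print(f"{hash_value:>12} {gets_per_exec(gets, execs):>10,.1f} gets/exec")
```

Ranked this way, the statement with the most total buffer gets drops to last place, because each of its 37,944 executions is cheap.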

- SQL ordered by Physical Reads:

SQL ordered by Reads for DB: PHS2 Instance: phs2 Snaps: 100 -104
-> End Disk Reads Threshold: 1000

Physical Reads Executions Reads per Exec % Total Hash Value
--------------- ------------ -------------- ------- ------------
98,401 16 6,150.1 14.2 3004232054
SELECT C0 C0 FROM (SELECT C0 C0 FROM (SELECT /*+ NO_EXPAND ROWID
(A1) */ A1."PID" C0 FROM "PHASE"."P0201F00_PLAT_MCOP_TB" PX_GRAN
ULE(0, BLOCK_RANGE, DYNAMIC) A1) UNION ALL SELECT C0 C0 FROM (S
ELECT /*+ NO_EXPAND ROWID(A2) */ A2."PID" C0 FROM "PHASE"."P0201
F00_UCS_MCOP_TB" PX_GRANULE(1, BLOCK_RANGE, DYNAMIC) A2) UNION

50,836 32 1,588.6 7.3 943504307
SELECT /*+ Q3263000 NO_EXPAND ROWID(A1) */ A1."PID" C0 FROM "PHA
SE"."P9999F00_NEW_RESP_HIST_TB" PX_GRANULE(0, BLOCK_RANGE, DYNAM
IC) A1 WHERE A1."CAMPAIGN"='200109M' AND A1."RSPPROD"='B'

50,836 32 1,588.6 7.3 3571039650
SELECT /*+ Q3261000 NO_EXPAND ROWID(A1) */ A1."PID" C0 FROM "PHA
SE"."P9999F00_NEW_RESP_HIST_TB" PX_GRANULE(0, BLOCK_RANGE, DYNAM
IC) A1 WHERE A1."CAMPAIGN"='200109M' AND A1."RSPPROD"='P'
....


This section reports the contents of the SQL area ordered by the number of reads from the data files. It can be used to identify SQL causing IO bottlenecks, which consume the following resources:
- CPU time needed to fetch unnecessary data.
- File IO resources to fetch unnecessary data.

- Buffer resources to hold unnecessary data.

- Additional CPU time to process the query once the data is retrieved into the buffer.

- SQL ordered by Executions:

SQL ordered by Executions for DB: PHS2 Instance: phs2 Snaps: 100 -104
-> End Executions Threshold: 100

Executions Rows Processed Rows per Exec Hash Value
------------ ---------------- ---------------- ------------
37,944 16,700 0.4 2913840444
select length from fet$ where file#=:1 and block#=:2 and ts#=:3

304 1,219 4.0 904892542
select file#,block#,length from fet$ where length>=:1 and
ts#=:2 and file#=:3

295 0 0.0 313510536
select job from sys.job$ where next_date < sysdate and (field1
= :1 or (field1 = 0 and 'Y' = :2)) order by next_date, job

273 273 1.0 3313905788
insert into col$(obj#,name,intcol#,segcol#,type#,length,precisio
n#,scale,null$,offset,fixedstorage,segcollength,deflength,defaul
t$,col#,property,charsetid,charsetform,spare1,spare2)values(:1,:
2,:3,:4,:5,:6,decode(:7,0,null,:7),decode(:5,2,decode(:8,-127/*M
AXSB1MINAL*/,null,:8),178,:8,179,:8,180,:8,181,:8,182,:8,183,:8,
....

This section reports the contents of the SQL area ordered by the number of query executions. It is primarily useful in identifying the most frequently used SQL within the database so that it can be monitored for efficiency. Generally speaking, a small performance increase on a frequently used query provides greater gains than a moderate performance increase on an infrequently used query.

- SQL ordered by Parse Calls (9i Only):

SQL ordered by Parse Calls for DB: S901 Instance: S901 Snaps: 2 -3
-> End Parse Calls Threshold: 1000
% Total
Parse Calls Executions Parses Hash Value
------------ ------------ -------- ----------
295 295 0.48 1705880752
select file# from file$ where ts#=:1

60 60 0.10 3759542639
BEGIN DBMS_APPLICATION_INFO.SET_MODULE(:1,NULL); END;

33 2,222 0.05 3615375148
COMMIT

1 200,000 0.00 119792462
INSERT into free.freelist_test values (:b2||'J'||:b1,'AAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAA')

....

This section shows the number of times a statement was parsed as compared to the number of times it was executed. One to one parse/executions may indicate that:

- Bind variables are not being used.

- On RDBMS version 8.1.7.2 and higher, the init.ora parameter session_cached_cursors was not set (100 is usually the suggested starting value). See enhancement bug 1589185 for an explanation of the change that shifts some of the load from the library cache to the user session cache.

- The shared pool may be too small and the parse is not being retained long enough for multiple executions.

- cursor_sharing is set to exact (this should NOT be changed without considerable testing on the part of the client).
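
One way to spot the one-to-one pattern described above is to compare parse calls with executions for each statement. The sketch below uses the four statements from the 9i sample listing; the dictionary layout and the helper name parses_per_execution are illustrative assumptions.

```python
def parses_per_execution(parse_calls: int, executions: int) -> float:
    """Parse calls divided by executions for a single statement."""
    return parse_calls / executions


# hash value -> (parse calls, executions), copied from the sample listing above.
parse_stats = {
    1705880752: (295, 295),
    3759542639: (60, 60),
    3615375148: (33, 2_222),
    119792462: (1, 200_000),
}

# Statements parsed at least once per execution are candidates for review.
one_to_one = [
    hash_value
    for hash_value, (parse_calls, executions) in parse_stats.items()
    if parses_per_execution(parse_calls, executions) >= 1.0
]
print(one_to_one)
```

In this sample the first two statements are parsed once per execution, while the INSERT is parsed once for 200,000 executions.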

Statistics

The statistics section shows the overall database statistics. These are the statistics that the summary information is derived from. A list of the statistics maintained by the RDBMS kernel can be found in Appendix C of the Oracle Reference manual for the version being utilized. The format is identical from 8i to 9i.

Instance Activity Stats for DB: PHS2 Instance: phs2 Snaps: 100 -104

Statistic Total per Second per Trans
--------------------------------- ---------------- ------------ ------------
CPU used by this session 84,161 23.4 3,825.5
CPU used when call started 196,346 54.5 8,924.8
CR blocks created 709 0.2 32.2
DBWR buffers scanned 0 0.0 0.0
DBWR checkpoint buffers written 245 0.1 11.1
DBWR checkpoints 33 0.0 1.5
DBWR cross instance writes 93 0.0 4.2
DBWR free buffers found 0 0.0 0.0
....
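
The per Second and per Trans columns are simply the Total divided by the elapsed seconds and by the number of transactions in the interval. The sketch below checks this against the sample rows, assuming the 60.00 minute elapsed time shown for snaps 100 to 104. The transaction count of 22 is not printed in this listing; it is inferred here from the sample figures and should be treated as an assumption.

```python
elapsed_seconds = 60.00 * 60  # Elapsed: 60.00 (mins) for snaps 100 to 104
transactions = 22             # assumption: inferred from the sample, not printed


def per_second_and_per_trans(total: int) -> tuple[float, float]:
    """Return (per second, per transaction) rounded to one decimal place."""
    return round(total / elapsed_seconds, 1), round(total / transactions, 1)


# Totals copied from the sample Instance Activity Stats above.
totals = {
    "CPU used by this session": 84_161,
    "CPU used when call started": 196_346,
    "CR blocks created": 709,
}

for statistic, total in totals.items():
    per_sec, per_trans = per_second_and_per_trans(total)
    print(f"{statistic:<30} {total:>10,} {per_sec:>8} {per_trans:>10}")
```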

Of particular interest are the following statistics.


- CPU USED BY THIS SESSION, PARSE TIME CPU or RECURSIVE CPU USAGE: These numbers are useful to diagnose CPU saturation on the system (usually a query tuning issue). The formula to calculate the CPU usage breakdown is:
Service (CPU) Time = other CPU + parse time CPU
Other CPU = "CPU used by this session" - parse time CPU
Some releases do not correctly store this data and can show huge numbers. The rule to decide if you can use these metrics is:

Trustworthy if :
(db version>= 8.1.7.2 and 9.0.1)
OR ((db version >= 9.0.1.1) = 8.0.6.0 AND not using job_queue_processes AND CPU_PER_CALL = default)

- DBWR BUFFERS SCANNED: the number of buffers looked at when scanning the lru portion of the buffer cache for dirty buffers to make clean. Divide by "dbwr lru scans" to find the average number of buffers scanned. This count includes both dirty and clean buffers. The average buffers scanned may be different from the average scan depth due to write batches filling up before a scan is complete. Note that this includes scans for reasons other than make free buffer requests.
- DBWR CHECKPOINTS: the number of checkpoints messages that were sent to DBWR and not necessarily the total number of actual checkpoints that took place. During a checkpoint there is a slight decrease in performance since data blocks are being written to disk and that causes I/O. If the number of checkpoints is reduced, the performance of normal database operations improve but recovery after instance failure is slower.
- DBWR TIMEOUTS: the number of timeouts when DBWR had been idle since the last timeout. These are the times that DBWR looked for buffers to idle write.
- DIRTY BUFFERS INSPECTED: the number of times a foreground encountered a dirty buffer which had aged out through the lru queue, when foreground is looking for a buffer to reuse. This should be zero if DBWR is keeping up with foregrounds.
- FREE BUFFER INSPECTED: the number of buffers skipped over from the end of the LRU queue in order to find a free buffer. The difference between this and "dirty buffers inspected" is the number of buffers that could not be used because they were busy or needed to be written after rapid aging out. They may have a user or a waiter, or may be in the process of being read or written.
- RECURSIVE CALLS: Recursive calls occur because of cache misses and segment extension. In general if recursive calls is greater than 30 per process, the data dictionary cache should be optimized and segments should be rebuilt with storage clauses that have few large extents. Segments include tables, indexes, rollback segment, and temporary segments.
NOTE: PL/SQL can generate extra recursive calls which may be unavoidable.
- REDO BUFFER ALLOCATION RETRIES: total number of retries necessary to allocate space in the redo buffer. Retries are needed because either the redo writer has gotten behind, or because an event (such as log switch) is occurring
- REDO LOG SPACE REQUESTS: indicates how many times a user process waited for space in the redo log buffer. Try increasing the init.ora parameter LOG_BUFFER so that zero Redo Log Space Requests are made.
- REDO WASTAGE: Number of bytes "wasted" because redo blocks needed to be written before they are completely full. Early writing may be needed to commit transactions, to be able to write a database buffer, or to switch logs
- SUMMED DIRTY QUEUE LENGTH: the sum of the lruw queue length after every write request completes. (divide by write requests to get average queue length after write completion)
- TABLE FETCH BY ROWID: the number of rows that were accessed by a rowid. This includes rows that were accessed using an index and rows that were accessed using the statement where rowid = 'xxxxxxxx.xxxx.xxxx'.
- TABLE FETCH BY CONTINUED ROW: indicates the number of rows that are chained to another block. In some cases (e.g. tables with long columns) this is unavoidable, but the ANALYZE TABLE command should be used to investigate the chaining further and, where possible, the chaining should be eliminated by rebuilding the table.
- TABLE SCANS (LONG TABLES): the total number of full table scans performed on tables with more than 5 database blocks. If the number of full table scans is high, the application should be tuned to use Oracle indexes effectively. Indexes, if they exist, should be used on long tables if less than 10-20% (depending on parameter settings and CPU count) of the rows from the table are returned. If this is not the case, check the db_file_multiblock_read_count parameter setting. It may be too high. You may also need to tweak optimizer_index_caching and optimizer_index_cost_adj.
- TABLE SCANS (SHORT TABLES): the number of full table scans performed on tables with fewer than 5 database blocks. It is optimal to perform full table scans on short tables rather than using indexes.
IO Activity

Input/Output (IO) statistics for the instance are listed in the following sections/formats:
- Tablespace IO Stats for DB: Ordered by total IO per tablespace.
- File IO Stats for DB: Ordered alphabetically by tablespace, filename.

In Oracle 8.1.7 several additional columns were included, as follows:
- Avg. Reads / Second
- Avg. Blocks / Read
- Avg. Writes / Second
- Buffer Waits
- Avg. Buffer Wait Time (Milliseconds)

- Tablespace IO Stats

Tablespace IO Stats for DB: PHS2 Instance: phs2 Snaps: 100 -104
->ordered by IOs (Reads + Writes) desc

Tablespace
------------------------------
                    Av     Av      Av                    Av     Buffer Av Buf
         Reads Reads/s Rd(ms) Blks/Rd       Writes Writes/s      Waits Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
PHASE_WORK_TS
       138,361      38    0.0     3.9        6,859        2          0    0.0
OFFER_HISTORY_TS
        24,714       7    0.0     4.0            0        0          0    0.0
ATTR1_TS
         7,823       2    0.0     4.0            0        0          0    0.0
TEMP
           886       0    0.0    20.1        1,147        0          0    0.0
SYSTEM
           184       0    3.9     2.8           56        0         18    3.3


- File IO Stats
File IO Stats for DB: PHS2 Instance: phs2 Snaps: 100 -104
->ordered by Tablespace, File

Tablespace               Filename
------------------------ ----------------------------------------------------
                    Av     Av      Av                    Av     Buffer Av Buf
         Reads Reads/s Rd(ms) Blks/Rd       Writes Writes/s      Waits Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
ATTR1_TS                 /oradata/phs2/hsz16/attr1_01.dbf
           398       0    0.0     3.9            0        0          0
                         /oradata/phs2/hsz17/attr1_02.dbf
           400       0    0.0     4.0            0        0          0
                         /oradata/phs2/hsz18/attr1_03.dbf
           398       0    0.0     4.0            0        0          0
                         /oradata/phs2/hsz19/attr1_04.dbf
           480       0    0.0     4.0            0        0          0
....


Note that Oracle considers average read times of greater than 20 ms unacceptable. If a datafile consistently has average read times of 20 ms or greater then:
- The queries against the contents of the owning tablespace should be examined and tuned so that less data is retrieved.
- If the tablespace contains indexes, another option is to compress the indexes so that they require less space and hence, less IO.
- The contents of that datafile should be redistributed across several disks/logical volumes to more easily accommodate the load.
- If the disk layout seems optimal, check the disk controller layout. It may be that the datafiles need to be distributed across more disk sets.
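
The 20 ms guideline above can be applied mechanically to the Av Rd(ms) column of the File IO Stats section. The following is a minimal Python sketch and is not part of Statspack; the helper name slow_files and the second sample row are hypothetical and only illustrate the check.

```python
# Guideline from the text: average read times of 20 ms or more need attention.
READ_MS_THRESHOLD = 20.0


def slow_files(file_stats, threshold_ms=READ_MS_THRESHOLD):
    """Return (filename, av_rd_ms) pairs at or above the threshold."""
    return [(name, ms) for name, ms in file_stats if ms >= threshold_ms]


sample = [
    # (filename, Av Rd(ms)) as read from a File IO Stats listing
    ("/oradata/phs2/hsz16/attr1_01.dbf", 0.0),
    ("/data/example/slow_01.dbf", 27.5),  # made-up row for illustration
]

for name, ms in slow_files(sample):
    print(f"{name}: average read time {ms} ms")
```

A file should only be treated as a problem if it shows up in this list consistently across several snapshot intervals.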
Buffer cache Activity Information

The buffer statistics consist of two sections:

- Buffer Pool Statistics:

This section can have multiple entries if multiple buffer pools are allocated. This section is in both 8i and 9i and is identical in both.

Buffer Pool Statistics for DB: PHS2 Instance: phs2 Snaps: 100 -104
-> Pools D: default pool, K: keep pool, R: recycle pool

                                                      Free    Write     Buffer
       Buffer    Consistent    Physical   Physical  Buffer Complete       Busy
P        Gets          Gets       Reads     Writes   Waits    Waits      Waits
- ----------- ------------- ----------- ---------- ------- -------- ----------
D       4,167       362,492       3,091        413       0        0         60




A baseline of the database's buffer pool statistics should be available to compare with the current statspack buffer pool statistics. A change in that pattern unaccounted for by a change in workload should be a cause for concern.
- Buffer Wait Statistics:

This section shows a breakdown of each type of object waited for. This section follows the Instance Recovery Stats for DB in 9i and is identical to that in 8i.

Buffer wait Statistics for DB: PHS2 Instance: phs2 Snaps: 100 -104
-> ordered by wait time desc, waits desc

Tot Wait Avg
Class Waits Time (cs) Time (cs)
------------------ ----------- ---------- ---------
undo header 42 21 1
data block 18 6 0


The above shows no real contention. Typically, when there is buffer contention, it is due to data block contention with large average wait times, like the example below:
Buffer wait Statistics for DB: GLOVP Instance: glovp Snaps: 454 - 455

Tot Wait Avg
Class Waits Time (cs) Time (cs)
------------------ ----------- ---------- ---------
data block 9,698 17,097 2
undo block 210 1,225 6
segment header 259 367 1
undo header 259 366 1

Instance Recovery Statistics

This section was added in 9i and is useful for monitoring the recovery and redo information.

Instance Recovery Stats for DB: S901 Instance: S901 Snaps: 2 -3
-> B: Begin snapshot, E: End snapshot

  Targt  Estd                                    Log File   Log Ckpt   Log Ckpt
   MTTR  MTTR   Recovery     Actual     Target       Size    Timeout   Interval
    (s)   (s)   Estd IOs  Redo Blks  Redo Blks  Redo Blks  Redo Blks  Redo Blks
- ----- ----- ---------- ---------- ---------- ---------- ---------- ----------
B    15     8       8024      21033      20691      92160      20691 ##########
E    15    11       8024      77248      92160      92160     285818 ##########

PGA Memory Statistics


This section was added in 9i. It helps when PGA memory is allocated using the new Oracle9i model based on PGA_AGGREGATE_TARGET.
PGA Memory Stats for DB: S901 Instance: S901 Snaps: 2 -3
-> WorkArea (W/A) memory is used for: sort, bitmap merge, and hash join ops

Statistic Begin (M) End (M) % Diff
----------------------------------- ---------------- ---------------- ----------
maximum PGA allocated 10.405 10.405 .00
total PGA allocated 7.201 7.285 1.17
total PGA inuse 6.681 6.684 .04


This section is particularly useful when monitoring session memory usage on Windows servers.
Enqueue Activity

An enqueue is simply a locking mechanism. This section is very useful and should be consulted whenever the wait event "enqueue" is listed in the "Top 5 timed events".

In 8i the section looks like this.

Enqueue activity for DB: PHS2 Instance: phs2 Snaps: 100 -104
-> ordered by waits desc, gets desc

Enqueue Gets Waits
---------- ------------ ----------
PS 2,632 716
ST 192 185
TM 973 184
TC 66 57
US 80 53
TS 68 46
TT 349 36
PI 56 32
HW 10 5
CF 275 3
DV 4 3
TX 499 1

In 9i the section looks like this.

Enqueue activity for DB: S901 Instance: S901 Snaps: 2 -3
-> Enqueue stats gathered prior to 9i should not be compared with 9i data
-> ordered by waits desc, requests desc

Avg Wt Wait
Eq Requests Succ Gets Failed Gets Waits Time (ms) Time (s)
-- ------------ ------------ ----------- ----------- ----------- ------------
HW 656 656 0 139 2.04 0

The action to take depends on the lock type that is causing the most problems. The most common lock waits are generally for:
- TX - Transaction Lock: Generally due to application concurrency mechanisms, or table setup issues.

- TM - DML enqueue: Generally due to application issues, particularly if foreign key constraints have not been indexed.

- ST - Space management enqueue: Usually caused by too much space management occurring. For example: create table as select on large tables on busy instances, small extent sizes, lots of sorting, etc.

Undo (Rollback) Information

Undo (Rollback) information is provided in two sections. They are identical in both 8i and 9i and are self-explanatory.

- Rollback Segment Stats
- Rollback Segment Storage

In 9i the following two sections are added to provide similar information on the System Managed Undo (SMU) tablespace. Both are self-explanatory.

- Undo Segment Summary for DB
- Undo Segment Stats for DB

The examples below show typical performance problems related to Undo (rollback) segments:

- Rollback Segment Stats for DB

Rollback Segment Stats for DB: PHS2 Instance: phs2 Snaps: 100 -104
->A high value for "Pct Waits" suggests more rollback segments may be required

Trans Table Pct Undo Bytes
RBS No Gets Waits Written Wraps Shrinks Extends
------ ------------ ------- --------------- -------- -------- --------
0 9.0 0.00 0 0 0 0
4 6,838.0 0.18 554,206 0 0 0
5 2,174.0 0.55 292,474 0 0 0
6 4,309.0 0.23 471,992 0 0 0
....

In this case, the Pct Waits on three of the rollback segments indicates that there is some minor contention on the rollback segments and that either another rollback segment or more space should be added.
- Rollback Segment Storage for DB

Rollback Segment Storage for DB: PHS2 Instance: phs2 Snaps: 100 -104
->Optimal Size should be larger than Avg Active

RBS No Segment Size Avg Active Optimal Size Maximum Size
------ --------------- --------------- --------------- ---------------
0 753,664 0 753,664
4 2,520,743,936 0 2,520,743,936
5 2,109,702,144 0 2,109,702,144
6 528,449,536 0 528,449,536

In this case, the client does not have OPTIMAL set (the Optimal Size column is empty).
Rollback Segment Storage for DB: RW1PRD Instance: rw1prd Snaps: 10489 - 1
->The value of Optimal should be larger than Avg Active

RBS No Segment Size Avg Active Optimal Size Maximum Size
------ --------------- ----------- --------------- ---------------
0 5,087,232 0 5,087,232
1 52,420,608 ########### 52,428,800 335,536,128
2 52,420,608 10,551,688 52,428,800 283,107,328
3 52,420,608 10,621,742 52,428,800 283,107,328
4 52,420,608 10,736,056 52,428,800 283,107,328
5 52,420,608 17,861,266 52,428,800 325,050,368
6 52,420,608 19,579,373 52,428,800 335,536,128
7 52,420,608 11,571,513 52,428,800 283,107,328
8 52,420,608 44,140,215 52,428,800 335,536,128
9 52,420,608 65,045,643 52,428,800 325,050,368

In this instance OPTIMAL is set. The Avg Active value for RBS 1 overflowed the column width (shown as ###########), and Avg Active for RBS 9 was also larger than Optimal Size. If this is a consistent problem, the OPTIMAL value should probably be raised.
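
The rule printed in the report header ("Optimal Size should be larger than Avg Active") can be checked row by row. The sketch below is illustrative Python, not Statspack code: the function name is hypothetical, only four of the rows above are copied in, and an overflowed Avg Active value is represented as None and assumed to exceed Optimal Size.

```python
# (RBS No, Avg Active, Optimal Size) taken from the RW1PRD listing above.
# Avg Active is None where the report printed ########### (column overflow).
rbs_rows = [
    (1, None, 52_428_800),
    (2, 10_551_688, 52_428_800),
    (8, 44_140_215, 52_428_800),
    (9, 65_045_643, 52_428_800),
]


def rbs_exceeding_optimal(rows):
    """Return the RBS numbers whose Avg Active is not below Optimal Size."""
    flagged = []
    for rbs_no, avg_active, optimal in rows:
        # Assumption: an overflowed value is larger than Optimal Size.
        if avg_active is None or avg_active >= optimal:
            flagged.append(rbs_no)
    return flagged


print(rbs_exceeding_optimal(rbs_rows))
```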

- Undo Segment Summary for DB

Undo Segment Summary for DB: S901 Instance: S901 Snaps: 2 -3
-> Undo segment block stats:
-> uS - unexpired Stolen, uR - unexpired Released, uU - unexpired reUsed
-> eS - expired Stolen, eR - expired Released, eU - expired reUsed

Undo Undo Num Max Qry Max Tx Snapshot Out of uS/uR/uU/
TS# Blocks Trans Len (s) Concurcy Too Old Space eS/eR/eU
---- -------------- ---------- -------- ---------- -------- ------ -------------
1 20,284 1,964 8 12 0 0 0/0/0/0/0/0

The description of the view V$UNDOSTAT in the Oracle9i Database Reference guide provides some insight into the column definitions. Should the client encounter SMU problems, monitoring this view every few minutes would provide more useful information.
- Undo Segment Stats for DB

Undo Segment Stats for DB: S901 Instance: S901 Snaps: 2 -3
-> ordered by Time desc

Undo Num Max Qry Max Tx Snap Out of uS/uR/uU/
End Time Blocks Trans Len (s) Concy Too Old Space eS/eR/eU
------------ ------------ -------- ------- -------- ------- ------ -------------
12-Mar 16:11 18,723 1,756 8 12 0 0 0/0/0/0/0/0
12-Mar 16:01 1,561 208 3 12 0 0 0/0/0/0/0/0

This section provides a more detailed look at the statistics in the previous section by listing the information as it appears in each snapshot.
It should be noted that 9i introduces an optional init.ora parameter called UNDO_RETENTION which allows the DBA to specify how long the system will attempt to retain undo information for a committed transaction without being overwritten or recaptured. This parameter, based in units of wall-clock seconds, is defined universally for all undo segments.

Use of UNDO_RETENTION can potentially increase the size of the undo segment for a given period of time, so the retention period should not be arbitrarily set too high. The UNDO tablespace still must be sized appropriately. The following calculation can be used to determine how much space a given undo segment will consume given a set value of UNDO_RETENTION.

Undo Segment Space Required = (undo_retention_time * undo_blocks_per_second * db_block_size)

As an example, an UNDO_RETENTION of 5 minutes (default) with 50 undo blocks/second (8k blocksize) will generate:

Undo Segment Space Required = (300 seconds * 50 blocks/second * 8K/block) = 120,000K, i.e. roughly 120 M (about 117 M when 1 M = 1,024K)
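
The same arithmetic can be written as a small function. This is only a sketch of the calculation in the text; the function and parameter names are hypothetical, and the block size is assumed to be given in bytes.

```python
def undo_space_bytes(undo_retention_s, undo_blocks_per_s, block_size_bytes):
    """Undo generated during one retention period, in bytes."""
    return undo_retention_s * undo_blocks_per_s * block_size_bytes


# 5 minutes of retention, 50 undo blocks per second, 8K blocks.
required = undo_space_bytes(300, 50, 8 * 1024)
print(f"{required:,} bytes")
print(f"{required / 1024:,.0f} K")
print(f"{required / 1024 ** 2:.1f} M (1 M = 1,024 K)")
```

The undo generation rate is the input that varies most between systems, so it is worth taking it from a busy period rather than from an average day.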

The retention information (transaction commit time) is stored in every transaction table block and each extent map block. When the retention period has expired, SMON will be signaled to perform undo reclaims, done by scanning each transaction table for undo timestamps and deleting the information from the undo segment extent map. Only during extreme space constraint issues will retention period not be obeyed.

Latch Information

Latch information is provided in the following three sections.

. Latch Activity
. Latch Sleep breakdown
. Latch Miss Sources

This information should be checked whenever the "latch free" wait event or other latch wait events experience long waits.

- Latch Activity

Latch Activity for DB: PHS2 Instance: phs2 Snaps: 100 -104
->"Get Requests", "Pct Get Miss" and "Avg Slps/Miss" are statistics for
willing-to-wait latch get requests
->"NoWait Requests", "Pct NoWait Miss" are for no-wait latch get requests
->"Pct Misses" for both should be very close to 0.0

Pct Avg Pct
Get Get Slps NoWait NoWait
Latch Name Requests Miss /Miss Requests Miss
----------------------------- -------------- ------ ------ ------------ ------
KCL freelist latch 9,382 0.0 0
KCL lock element parent latch 15,500 0.0 0.0 0
KCL name table latch 3,340 0.0 0
Token Manager 12,474 0.0 0.0 0
active checkpoint queue latch 2,504 0.0 0
batching SCNs 114,141 0.0 0.0 0
begin backup scn array 6,697 0.0 0
cache buffer handles 1 0.0 0
cache buffers chains 1,056,119 0.1 0.2 6,303 0.0
cache buffers lru chain 104,996 0.0 4,078 0.0



This section is identical in both 8i and 9i.
This section is particularly useful for determining latch contention on an instance. Latch contention generally indicates resource contention and supports indications of it in other sections.
Latch contention is indicated by a Pct Miss of greater than 1.0% or a relatively high value in Avg Sleeps/Miss.
While each latch can indicate contention on some resource, the more common latches to watch are:

- cache buffers chains: Contention on this latch confirms a hot block issue. See Note 62172.1 'Understanding and Tuning Buffer Cache and DBWR in Oracle7, Oracle8, and Oracle8i' for a discussion of this phenomenon.

- shared pool: Contention on this latch in conjunction with reloads in the SQL Area of the library cache section indicates that the shared pool is too small. Contention on this latch indicates that one of the following is happening:

. The library cache, and hence, the shared pool is too small.

. Literal SQL is being used. See Note 62143.1 'Understanding and Tuning the Shared Pool in Oracle7, Oracle8, and Oracle8i' for an excellent discussion of this topic.

. On versions 8.1.7.2 and higher, session_cached_cursors might need to be set. See enhancement bug 1589185 for details.

See Note 62143.1 Understanding and Tuning the Shared Pool in Oracle7, Oracle8, and Oracle8i for a good discussion on literal SQL and its impact on the shared pool and library cache.

- Latch Sleep breakdown

Latch Sleep breakdown for DB: PHS2 Instance: phs2 Snaps: 100 -104
-> ordered by misses desc

Get Spin &
Latch Name Requests Misses Sleeps Sleeps 1->4
-------------------------- -------------- ----------- ----------- ------------
row cache objects 1,908,536 70,584 16,976 54656/14893/
1022/13/0
dlm resource hash list 624,455 15,931 71,868 118/959/1483
5/19/0
parallel query alloc buffe 37,000 4,850 362 4502/335/12/
1/0
shared pool 176,560 3,238 773 2649/431/134
/24/0
library cache 871,408 1,572 935 925/433/151/
63/0
cache buffers chains 1,056,119 872 209 670/195/7/0/
0
....


This section provides additional supporting information to the previous section. It is identical in 8i and 9i.
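
The 1.0% Pct Miss guideline given for the Latch Activity section can be recomputed from the Get Requests, Misses and Sleeps columns of the listing above. The following Python sketch is illustrative only: the function name is hypothetical, only four rows are copied in, and no threshold is applied to sleeps per miss because the text gives none.

```python
# (latch name, get requests, misses, sleeps) from the listing above.
latch_rows = [
    ("row cache objects", 1_908_536, 70_584, 16_976),
    ("shared pool", 176_560, 3_238, 773),
    ("library cache", 871_408, 1_572, 935),
    ("cache buffers chains", 1_056_119, 872, 209),
]

PCT_MISS_LIMIT = 1.0  # guideline from the text


def latch_ratios(requests, misses, sleeps):
    """Return (pct_miss, sleeps_per_miss) for one latch."""
    pct_miss = 100.0 * misses / requests if requests else 0.0
    sleeps_per_miss = sleeps / misses if misses else 0.0
    return pct_miss, sleeps_per_miss


contended = []
for name, requests, misses, sleeps in latch_rows:
    pct_miss, sleeps_per_miss = latch_ratios(requests, misses, sleeps)
    if pct_miss > PCT_MISS_LIMIT:
        contended.append(name)
    print(f"{name:<22} pct miss {pct_miss:5.2f}  sleeps/miss {sleeps_per_miss:4.2f}")

print("Pct Miss above guideline:", contended)
```

For "cache buffers chains" the computed values round to the 0.1 and 0.2 shown in the Latch Activity section, which is a quick way to confirm that the two sections describe the same interval.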

- Latch Miss Sources

Latch Miss Sources for DB: PHS2 Instance: phs2 Snaps: 100 -104
-> only latches with sleeps are shown
-> ordered by name, sleeps desc

NoWait Waiter
Latch Name Where Misses Sleeps Sleeps
------------------------ -------------------------- ------- ---------- -------
batching SCNs kcsl01 0 1 1
cache buffers chains kcbgtcr: kslbegin 0 114 39
cache buffers chains kcbgcur: kslbegin 0 62 62
cache buffers chains kcbrls: kslbegin 0 29 104
cache buffers chains kcbchg: kslbegin: bufs not 0 1 1
dlm group lock table lat kjgalk: move a lock from p 0 1 0
dlm lock table freelist kjlalc: lock allocation 0 10 6
dlm lock table freelist kjgdlk: move lock to paren 0 1 2
dlm lock table freelist kjlfr: remove lock from pa 0 1 3
dlm resource hash list kjucvl: open cr lock reque 0 36,732 562
dlm resource hash list kjxcvr: handle convert req 0 29,189 39,519
dlm resource hash list kjskchcv: convert on shado 0 3,907 25
dlm resource hash list kjrrmas1: lookup master no 0 1,603 18
dlm resource hash list kjcvscn: remove from scan 0 383 0
dlm resource hash list kjrlck: lock resource 0 26 1,965


This section provides a detailed breakdown of which latches are missing and sleeping. It is particularly useful in identifying library cache bugs as it provides latch child information not available in the previous two sections.
Search on the latch child name experiencing high misses or sleeps and you can often find the bug responsible.
It is identical in 8i and 9i.

Dictionary Cache Statistics

This is an interesting section to monitor, but there is very little you can do about it: the dictionary cache is a percentage of the shared pool, so the only way to change its size is to change the shared pool size. It is identical in 8i and 9i.

Dictionary Cache Stats for DB: PHS2 Instance: phs2 Snaps: 100 -104
->"Pct Misses" should be very low (< 2% in most cases)
->"Cache Usage" is the number of cache entries being used
->"Pct SGA" is the ratio of usage to allocated size for that cache

Get Pct Scan Pct Mod Final Pct
Cache Requests Miss Requests Miss Req Usage SGA
---------------------- ------------ ------ -------- ----- -------- ------ ----
dc_constraints 0 0 0 0 0
dc_database_links 0 0 0 0 0
dc_files 0 0 0 161 98
dc_free_extents 226,432 16.8 304 0.0 288 ###### 99
...

Library Cache Statistics

This section of the report shows information about activity in the different sub-areas of the library cache.

The 8i version looks like this.

Library Cache Activity for DB: PHS2 Instance: phs2 Snaps: 100 -104
->"Pct Misses" should be very low

Get Pct Pin Pct Invali-
Namespace Requests Miss Requests Miss Reloads dations
--------------- ------------ ------ -------------- ------ ---------- --------
BODY 48 0.0 48 0.0 0 0
CLUSTER 7 0.0 8 0.0 0 0
INDEX 0 0 0 0
OBJECT 0 0 0 0
PIPE 0 0 0 0
SQL AREA 42,640 0.2 193,249 0.1 23 17
TABLE/PROCEDURE 287 3.8 1,701 2.6 6 0
TRIGGER 0 0 0 0

The 9i version looks like this.
Library Cache Activity for DB: S901 Instance: S901 Snaps: 2 -3
->"Pct Misses" should be very low

Get Pct Pin Pct Invali-
Namespace Requests Miss Requests Miss Reloads dations
--------------- ------------ ------ -------------- ------ ---------- --------
BODY 29 0.0 29 0.0 0 0
SQL AREA 579 5.7 2,203,964 0.0 0 0
TABLE/PROCEDURE 292 0.0 496 0.0 0 0
TRIGGER 12 0.0 12 0.0 0 0

Values in Pct Miss or Reloads in the SQL Area, Table/Procedure or Trigger rows indicate that the shared pool may be too small. Consistent (not sporadic) values in Pct Miss or Reloads in the Index row indicate that the buffer cache is too small. (The Index row is no longer available in 9i.)

Values in Invalidations in the SQL Area indicate that a table definition changed while a query was being run against it or a PL/SQL package being used was recompiled.

SGA Memory Summary

This section provides a breakdown of how the SGA memory is used at the time of the report. It is useful to be able to track this over time. This section is identical in 8i and 9i.

SGA regions Size in Bytes
------------------------------ ----------------
Database Buffers 209,715,200
Fixed Size 103,396
Redo Buffers 104,873,984
Variable Size 423,956,480
----------------
sum 738,649,060

SGA Memory Detail

This section shows a detailed breakdown of memory usage by the SGA at the beginning and ending of the reporting period. It allows the DBA to track memory usage throughout the business cycle. It is identical in 8i and 9i.

SGA breakdown difference for DB: PHS2 Instance: phs2 Snaps: 100 -104

Pool Name Begin value End value Difference
----------- ------------------------ -------------- -------------- -----------
java pool free memory 20,000,768 20,000,768 0
large pool PX msg pool 230,386,744 230,386,744 0
large pool free memory 299,976 299,976 0
shared pool Checkpoint queue 189,280 189,280 0
shared pool KGFF heap 252,128 252,128 0
shared pool KGK heap 31,000 31,000 0
shared pool KQLS heap 2,221,552 2,246,640 25,088
shared pool PL/SQL DIANA 436,240 436,240 0
shared pool PL/SQL MPCODE 138,688 138,688 0

Init.ora Parameters Summary

The final section shows the current init.ora parameter settings. It displays the more commonly used parameters, including some hidden ones. It is identical in 8i and 9i.

init.ora Parameters for DB: PHS2 Instance: phs2 Snaps: 100 -104

End value
Parameter Name Begin value (if different)
----------------------------- --------------------------------- --------------
_PX_use_large_pool TRUE
always_anti_join HASH
audit_trail TRUE
background_dump_dest /u01/app/oracle/admin/phs2/bdump
bitmap_merge_area_size 10485760
compatible 8.1.7
control_files /oradata/phs2/hsz16/control_01.db
core_dump_dest /u01/app/oracle/admin/phs2/cdump
cursor_space_for_time TRUE
Posted by ayyudba at 6:03 AM

Tuning I/O-related waits
Subject: Tuning I/O-related waits
Doc ID: Note:223117.1 Type: TROUBLESHOOTING
Last Revision Date: 05-APR-2007 Status: PUBLISHED


-------
PURPOSE
-------

This article provides guidelines for tuning an Oracle database
when the main source of contention is I/O-related.



-------------------
SCOPE & APPLICATION
-------------------

The techniques described here can be followed when:

o Statspack or AWR reports show I/O wait events in the "Top 5 Wait/Timed Events" section.

o SQL Tracing with wait events of a database session shows it is limited
mainly by I/O wait events.

o Operating System tools show very high utilization or saturation of disks
used for storage of database files.

The article should be of use to Database Administrators, Support Engineers,
Consultants and Database Performance Analysts.



-------------------------
TUNING WITH RESPONSE TIME
-------------------------

A critical activity in Database Performance Tuning is
Response Time Analysis: this consists of finding out where time is being
spent in a database.

TIME is the most important property in Performance Tuning.
Users perceive the performance of a system through the response time
they experience for their transactions or batch jobs.

Response Time Analysis for an Oracle Database is done
using the following equation:

Response Time = Service Time + Wait Time

'Service Time' is measured using the statistic 'CPU used by this session'

'Wait Time' is measured by summing up time spent on Wait Events

Note: although similar in appearance, this equation is not the fundamental
equation of Queueing Theory.

Performance Tuning methods using tools such as Statspack work by evaluating
the relative impact of the various components of overall Response Time and
direct the tuning effort to those components having the most impact in terms
of time consumed.

For a detailed discussion of this subject please refer to
Note 190124.1 THE COE PERFORMANCE METHOD

Starting with Oracle10g the above process is carried out automatically
by the Automatic Database Diagnostic Monitor (ADDM). Please read
Note 260655.1 How to use the Automatic Database Diagnostic Monitor

----------------------------------------------------
DETERMINING THE REAL SIGNIFICANCE OF I/O WAIT EVENTS
----------------------------------------------------

Many tools including Statspack produce listings of the most significant Wait
Events. Statspack reports in versions previous to Oracle9i Release 2 contain
this information in a section called "Top 5 Wait Events".

When presented with such a list of top Wait Events it is tempting to start
dealing with the listed Wait Events right away and to forget to evaluate
their impact on overall Response Time first.

In situations where 'Service Time' i.e. CPU usage is much more significant
than 'Wait Time', it is very likely that investigating Wait Events will not
produce significant savings in 'Response Time'.

Therefore, one should always compare the time taken by the top wait events
to the 'CPU used by this session' and direct the tuning effort to the biggest
consumers.

Note:
To address this possible source of confusion, starting with Oracle9i Release 2
the "Top 5 Wait Events" section has been renamed to "Top 5 Timed Events".
Here, 'Service Time' as measured by the statistic 'CPU used by this session'
is listed as 'CPU time'. This means that it is now easier to accurately measure
the impact of Wait Events in overall 'Response Time' and to correctly target
the subsequent tuning effort.



-----------------------------------------------------
MISINTERPRETING THE IMPACT OF WAIT EVENTS: AN EXAMPLE
-----------------------------------------------------

Here is a real life example of why it is important to look at both 'Wait Time'
and 'Service Time' when investigating database performance.

The following is the "Top 5 Wait Events" section of a Statspack report
generated from two snapshots 46 minutes apart:

Top 5 Wait Events
~~~~~~~~~~~~~~~~~ Wait % Total
Event Waits Time (cs) Wt Time
-------------------------------------------- ------------ ------------ -------
direct path read 4,232 10,827 52.01
db file scattered read 6,105 6,264 30.09
direct path write 1,992 3,268 15.70
control file parallel write 893 198 .95
db file parallel write 40 131 .63
-------------------------------------------------------------

Based on this listing we may be tempted to immediately start looking at the
causes behind the 'direct path read' and 'db file scattered read' waits and
to try to tune them. This approach would not take into account 'Service Time'.

Here is the statistic that measures 'Service Time' from the same report:

Statistic Total per Second per Trans
--------------------------------- ---------------- ------------ ------------
CPU used by this session 358,806 130.5 12,372.6

Let's do some simple math from these figures:
'Wait Time' = 10,827 x 100% / 52.01% = 20,817 cs
'Service Time' = 358,806 cs
'Response Time' = 358,806 + 20,817 = 379,623 cs

If we now calculate percentages for all the 'Response Time' components:

CPU time = 94.52%
direct path read = 2.85%
db file scattered read = 1.65%
direct path write = 0.86%
control file parallel write = 0.05%
db file parallel write = 0.03%

It is now obvious that the I/O-related Wait Events are not really a significant
component of the overall Response Time and that subsequent tuning should be
directed to the Service Time component i.e. CPU consumption.

Incidentally, the improved "Top 5 Timed Events" section in Statspack starting
with Oracle9i Release 2 would show output similar to our calculated listing.
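
The calculation above can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic, not a Statspack feature; the variable and function names are hypothetical, and the figures are the ones quoted in the report excerpts (in centiseconds).

```python
# Wait times of the top 5 events, in centiseconds.
top_waits_cs = {
    "direct path read": 10_827,
    "db file scattered read": 6_264,
    "direct path write": 3_268,
    "control file parallel write": 198,
    "db file parallel write": 131,
}
cpu_used_cs = 358_806   # 'CPU used by this session'
top_event_pct = 52.01   # '% Total Wt Time' of the first event

# Total Wait Time is derived from the first event and its share of all waits.
total_wait_cs = round(top_waits_cs["direct path read"] * 100 / top_event_pct)
response_cs = cpu_used_cs + total_wait_cs


def share(cs):
    """Percentage of overall Response Time, rounded to two decimals."""
    return round(100 * cs / response_cs, 2)


breakdown = {"CPU time": share(cpu_used_cs)}
breakdown.update({event: share(cs) for event, cs in top_waits_cs.items()})

for name, pct in breakdown.items():
    print(f"{name:<30} {pct:6.2f}%")
```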



-----------------------
I/O-RELATED WAIT EVENTS
-----------------------

In this section we list the I/O-related Wait Events that occur most often
in Oracle databases together with reference notes describing each wait.

In subsequent sections we explore some of the most important of these in detail.

Datafile I/O-Related Wait Events:
  'db file sequential read'                  Note 34559.1
  'db file scattered read'                   Note 34558.1
  'db file parallel read'
  'direct path read'                         Note 50415.1
  'direct path write'                        Note 50416.1
  'direct path read (lob)'
  'direct path write (lob)'

Controlfile I/O-Related Wait Events:
  'control file parallel write'
  'control file sequential read'
  'control file single write'

Redo Logging I/O-Related Wait Events:
  'log file parallel write'                  Note 34583.1
  'log file sync'                            Note 34592.1
  'log file sequential read'
  'log file single write'
  'switch logfile command'
  'log file switch completion'
  'log file switch (clearing log file)'
  'log file switch (checkpoint incomplete)'
  'log switch/archive'
  'log file switch (archiving needed)'

Buffer Cache I/O-Related Wait Events:
  'db file parallel write'                   Note 34416.1
  'db file single write'
  'write complete waits'
  'free buffer waits'



--------------------------------------------
GENERAL APPROACHES FOR HANDLING I/O PROBLEMS
--------------------------------------------

After an analysis of the database's Response Time using e.g. Statspack
has shown that performance is limited by I/O-related Wait Events, a number
of possible approaches can be followed.

Refer to the next section for the approaches to follow for each Wait Event.

Some of the approaches can be used regardless of the particular Wait Event.
In this section we present and explain the concepts and rationale behind
each approach.

o Reduce the I/O requirements of the database by tuning SQL:

A database with no user SQL being run generates little or no I/O.
Ultimately all I/O generated by a database is directly or indirectly
due to the nature and amount of user SQL being submitted for execution.

This means that it is possible to limit the I/O requirements of a database
by controlling the amount of I/O generated by individual SQL statements.
This is accomplished by tuning SQL statements so that their execution plans
result in a minimum number of I/O operations.
Typically in a problematic situation there will only be a few SQL statements
with suboptimal execution plans generating a lot more physical I/O than
necessary and degrading the overall performance for the database.

Starting with Oracle10g, ADDM aids the SQL tuning process by automatically
identifying the SQL statements with most impact. The SQL Tuning Advisor can
then be used to automatically tune these statements and reduce their I/O
resource consumption. For more information please see
Note 262687.1 How to use the Sql Tuning Advisor

o Reduce the I/O requirements of the database by tuning instance parameters:

This works in two ways:

a) Using memory caching to limit I/O:

The amount of I/O required by the database is limited by the use of a number
of memory caches e.g. the Buffer Cache, the Log Buffer, various Sort Areas etc.

Increasing the Buffer Cache, up to a point, results in more buffer accesses
by database processes (logical I/Os) being satisfied from memory instead of
having to go to disk (physical I/Os).

With larger Sort Areas in memory, the likelihood of them being exhausted
during a sorting operation and having to use a temporary tablespace on disk
is reduced.

The other caches also work according to similar concepts.

b) Tuning the size of multiblock I/O:

The size of individual multiblock I/O operations can be controlled by instance
parameters.

Up to a limit, multiblock I/Os are executed faster when there are fewer larger
I/Os than when there are more smaller I/Os.
For example, transferring 100MB of data will complete faster if it is done in
100 requests of size 1MB each than if it is done in 1,000 requests of size
100KB each or 10,000 requests of 10KB each.
After this limit is reached, the difference is no longer important: transferring
1GB of data in 100 requests of size 10MB each (if allowed by limits on maximum
I/O transfer size of Operating Systems) would be almost as efficient as a
single transfer of size 1GB.

This is because the time taken to service an I/O involves two main components:
I/O Setup Time and I/O Transfer Time.

I/O Setup Time tends to be fairly constant across different I/O sizes
and for small I/O sizes tends to dominate the total service time.

I/O Transfer Time tends to increase in proportion to the size of the I/O
and for small I/O sizes is usually less than the I/O Setup Time.

The consequence of the above is that it is usually better to configure instance
parameters so that the database issues larger and fewer multiblock I/Os.
The typical parameter used for this is called DB_FILE_MULTIBLOCK_READ_COUNT.
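
As a rough illustration of the reasoning above, the following Python sketch
models the total service time as one fixed setup cost per request plus a
transfer cost proportional to the data volume. The setup time (5 ms) and the
transfer rate (100 MB/s) are hypothetical figures chosen only for illustration,
not measurements of any particular storage system, and the function name is
likewise illustrative.

```python
def total_transfer_seconds(total_bytes, request_bytes, setup_seconds, bytes_per_second):
    """Total time = number of requests * fixed setup time + size-proportional transfer time."""
    requests = -(-total_bytes // request_bytes)  # ceiling division
    transfer_seconds = total_bytes / bytes_per_second
    return requests * setup_seconds + transfer_seconds

KB = 1024
MB = 1024 * KB

# Hypothetical device characteristics, chosen only for illustration.
SETUP_SECONDS = 0.005
BYTES_PER_SECOND = 100 * MB

# Transfer the same 100 MB using progressively larger requests.
for request_bytes in (10 * KB, 100 * KB, 1 * MB, 10 * MB):
    seconds = total_transfer_seconds(100 * MB, request_bytes, SETUP_SECONDS, BYTES_PER_SECOND)
    print(f"{request_bytes // KB:>6} KB requests: {seconds:7.2f} s")
```

With these assumed figures the setup cost dominates the total for the small
request sizes and shrinks to a small fraction of it for the largest ones, which
is why further increases in I/O size bring little additional benefit.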

o Optimizing I/O at the Operating System level

This involves making use of I/O capabilities such as Asynchronous I/O or
using Filesystems with advanced capabilities such as Direct I/O (bypassing the
Operating System's File Caches). Another possible action is to raise the limit
of maximum I/O size per transfer (referred to as max_io_size in this article).

o Balancing the database I/O by usage of Oracle ASM (Automatic Storage Management)

ASM was introduced in Oracle10g. It is a file system and volume manager built
into the database kernel. It automatically does load balancing in parallel
across all available disk drives to prevent hot spots and maximize performance,
even with rapidly changing data usage patterns. It prevents fragmentation so
that there is never a need to relocate data to reclaim space. Data is well
balanced and striped over all disks.

For details please see
Note 249992.1 New Feature on ASM (Automatic Storage Manager)

o Balancing the database I/O by usage of Striping, RAID, SAN or NAS

This approach relies on storage technologies such as Striping, RAID, Storage
Area Networks (SAN) and Network Attached Storage (NAS) to automatically load
balance database I/O across multiple available physical disks in order to avoid disk
contention and I/O bottlenecks when there is still available unused disk
throughput in the storage hardware.

For more detailed discussions on these technologies please refer to
"Optimal Storage Configuration Made Easy" by J. Loaiza
Note 30286.1 I/O Tuning with Different RAID Configurations

o Redistribute database I/O by manual placement of database files across
different filesystems, controllers and physical devices

This is an approach used in the absence of advanced modern storage technologies.
Again the aim is to distribute the database I/O so that no single set of disks
or controller becomes saturated from I/O requests when there is still unused
disk throughput. It is harder to get right than the previous approach and most
often less successful.

Finally, it is important to remember that some I/O will always exist in most
databases. After all the guidelines above have been considered, if performance
is still not satisfactory on the existing system, you can consider:

o Reducing the data volumes of the current database by moving older data out.

o Investing in more and faster hardware.



--------------------------------
DATAFILE I/O-RELATED WAIT EVENTS
--------------------------------

These Wait Events occur on I/O operations to datafiles.


'db file sequential read' Note 34559.1
------------------------------------------------------------

This is one of the most common I/O-related waits.
It is in most cases a single block read e.g. for index data blocks or for
table data blocks accessed through an index but can also be seen for reads
on datafile header blocks.
In earlier versions it could be a multiblock read from Sort segments on disk
to contiguous ('sequential') buffers in the Buffer Cache.

If this Wait Event is a significant portion of Wait Time then a number of
approaches are possible:

o Find the Top SQL statements in Physical Reads (from a Statspack or AWR report
in the section titled "SQL ordered by Reads" or from the view V$SQL)
and tune them in order to reduce their I/O requirements:

- If Index Range scans are involved, more blocks than necessary may be
visited if the index is unselective: by forcing or enabling the
use of a more selective index, we can access the same table data by
visiting fewer index blocks (and doing fewer physical I/Os).

- If indexes are fragmented, again we have to visit more blocks because
there is less index data per block. In this case, rebuilding the index
will compact its contents into fewer blocks.

- If the index being used has a large Clustering Factor, then more table
data blocks have to be visited in order to get the rows in each Index
block: by rebuilding the table with its rows sorted by the particular
index columns we can reduce the Clustering Factor and hence the number
of table data blocks that we have to visit for each index block.
For example, if the table has columns A, B, C & D and the index is on B, D
then we can rebuild the table as
CREATE TABLE new AS SELECT * FROM old ORDER BY b,d;

Note 39836.1 Clustering Factor

- Use Partitioning to reduce the number of index and table data blocks to be
visited for each SQL statement by usage of Partition Pruning.
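
The effect of rebuilding a table in index order can be illustrated with a small
Python sketch. It assumes the usual definition of the Clustering Factor as the
number of times the table block changes while the index entries are walked in
key order; the function name and the sample row layouts are hypothetical.

```python
def clustering_factor(index_entries):
    """Count table block changes while walking index entries in key order.

    index_entries is an iterable of (key, table_block) pairs.
    """
    changes = 0
    previous_block = None
    for _key, block in sorted(index_entries):
        if block != previous_block:
            changes += 1
            previous_block = block
    return changes

# Hypothetical table: 12 rows with keys 1..12, stored in 3 blocks.
# Scattered layout: consecutive keys live in different blocks.
scattered = [(key, (key - 1) % 3) for key in range(1, 13)]
# Layout after rebuilding the table ordered by the index columns:
# consecutive keys share a block (4 rows per block).
rebuilt = [(key, (key - 1) // 4) for key in range(1, 13)]

print(clustering_factor(scattered))  # 12: a block change for every row
print(clustering_factor(rebuilt))    # 3: one visit per table block
```

In the scattered layout a range scan over all keys switches table block for
every row, whereas after the rebuild each table block is visited only once.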

o If there are no particular SQL statements with bad execution plans doing more
Physical I/Os than necessary, then one of the following may be happening:

- I/Os on particular datafiles may be serviced more slowly due to excessive
activity on their disks. In this case, looking at the Statspack "File I/O
Statistics" section (or V$FILESTAT) will help us find such hot disks and
spread out the I/O by manually moving datafiles to other storage or by
making use of Striping, RAID and other technologies to automatically
perform I/O load balancing for us.

- Starting with Oracle 9.2, we can also find which segments (tables or
indexes) have the most Physical Reads being performed against them by
using the new Segment Statistics data from view V$SEGMENT_STATISTICS.
We can then look in detail at such segments and see if e.g. indexes
should be rebuilt or Partitioning could be used to reduce I/O on them.
Statspack also generates a "Segment Statistics" report starting at level 7.
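
One way to spot such hot datafiles is to compare the average read time per
file. The Python sketch below assumes that rows have already been fetched from
V$FILESTAT with the columns FILE#, PHYRDS and READTIM, and that READTIM is
reported in hundredths of a second (which requires TIMED_STATISTICS to be
enabled). The 20 ms threshold, the function name and the sample figures are
hypothetical.

```python
def slow_files(filestat_rows, threshold_ms=20.0):
    """Return (file#, average read ms) for files slower than threshold_ms.

    filestat_rows: iterable of (file_no, phyrds, readtim) tuples, where
    readtim is the cumulative read time in hundredths of a second.
    """
    result = []
    for file_no, phyrds, readtim in filestat_rows:
        if phyrds == 0:
            continue  # no reads, so there is no average to compute
        avg_ms = readtim * 10.0 / phyrds  # hundredths of a second -> milliseconds
        if avg_ms > threshold_ms:
            result.append((file_no, avg_ms))
    # Slowest files first
    return sorted(result, key=lambda item: item[1], reverse=True)

# Hypothetical sample rows: (FILE#, PHYRDS, READTIM)
rows = [(1, 50000, 30000), (2, 40000, 140000), (3, 0, 0), (4, 1000, 2500)]
print(slow_files(rows))
```

Note that V$FILESTAT values are cumulative since instance startup, so for a
specific problem period it is better to work with the difference between two
samples, which is what the Statspack report does between snapshots.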

o If there is no SQL with suboptimal execution plans and I/O is evenly spread
out with similar response times from all disks then a larger Buffer Cache
may help:

- In Oracle8i experiment with gradual increments of DB_BLOCK_BUFFERS followed
by measurements of the Buffer Cache Hit Ratio from Statspack until there is
no further improvement to it.

- In Oracle9i and above use the Buffer Cache Advisory facility (also available
in the Statspack report) to tune the size of the Buffer Cache.
For details please refer to the manual
Oracle9i Database Performance Guide and Reference,
Ch. 14 Memory Configuration and Use, Configuring and Using the Buffer Cache

- In Oracle10g and above Automatic Shared Memory Management (ASMM) can be
used to enable the database to automatically determine the optimal size
for the Buffer Cache according to recent workload. For more information see
Note 257643.1 Oracle Database 10g Automated SGA Memory Tuning

- For hot segments, usage of Multiple Buffer Pools can be explored: place
such hot indexes and tables in the KEEP Buffer Pool. For details refer to
Note 76374.1 Multiple Buffer Pools
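
The Buffer Cache Hit Ratio mentioned above for Oracle8i is commonly derived
from three instance statistics. The Python sketch below assumes the classic
formula 1 - physical reads / (db block gets + consistent gets), with the values
taken as differences between two snapshots. The function name and the sample
numbers are hypothetical, and Statspack's own calculation may differ in detail
between versions.

```python
def buffer_cache_hit_ratio(physical_reads, db_block_gets, consistent_gets):
    """Fraction of logical reads that did not require a physical read."""
    logical_reads = db_block_gets + consistent_gets
    if logical_reads == 0:
        return None  # no activity in the interval, so the ratio is undefined
    return 1.0 - physical_reads / logical_reads

# Hypothetical measurements taken after each increase of DB_BLOCK_BUFFERS:
# buffers -> (physical reads, db block gets, consistent gets)
samples = {
    10000: (120000, 150000, 850000),
    20000: (60000, 150000, 850000),
    40000: (50000, 150000, 850000),
}
for buffers, stats in samples.items():
    print(buffers, round(buffer_cache_hit_ratio(*stats), 3))
```

In this made-up series the second doubling of the cache improves the ratio far
less than the first, which is the point at which further increments stop being
worthwhile.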

o Finally, you can consider reducing the data held in the most frequently
accessed segments (by moving older unneeded data out of the database) or
moving these segments to new faster disks to reduce the response time on
their I/Os.


'db file scattered read' Note 34558.1
------------------------------------------------------------

This is another very common Wait Event.
It occurs when Oracle performs multiblock reads from disk into non-contiguous
('scattered') buffers in the Buffer Cache. Such reads are issued for up to
DB_FILE_MULTIBLOCK_READ_COUNT blocks at a time.
These typically happen for Full Table Scans and for Fast Full Index scans.

If this Wait Event is a significant portion of Wait Time then a number of
approaches are possible:

o Find which SQL statements perform Full Table or Fast Full Index scans and
tune them to make sure these scans are necessary and not the result of a
suboptimal plan.

- Starting with Oracle9i the new V$SQL_PLAN view can help:
(ignore data dictionary SQL in the output of these queries)
For Full Table scans:
select sql_text from v$sqltext t, v$sql_plan p
where t.hash_value=p.hash_value and p.operation='TABLE ACCESS'
and p.options='FULL'
order by p.hash_value, t.piece;
For Fast Full Index scans:
select sql_text from v$sqltext t, v$sql_plan p
where t.hash_value=p.hash_value and p.operation='INDEX'
and p.options='FAST FULL SCAN'
order by p.hash_value, t.piece;

- In Oracle8i a possible approach is to find sessions performing multiblock
reads by querying V$SESSION_EVENT for this Wait Event and then SQL Tracing
them. Alternatively, the Top SQL statements for Physical Reads can be
investigated to see if their execution plans contain Full Table or Fast
Full Index scans.

o In cases where such multiblock scans occur from optimal execution plans
it is possible to tune the size of multiblock I/Os issued by Oracle by
setting the instance parameter DB_FILE_MULTIBLOCK_READ_COUNT so that

DB_BLOCK_SIZE x DB_FILE_MULTIBLOCK_READ_COUNT = max_io_size of system

For more information refer to
Note 30712.1 Init.ora Parameter "DB_FILE_MULTIBLOCK_READ_COUNT" Reference
Note 1037322.6 WHAT IS THE DB_FILE_MULTIBLOCK_READ_COUNT PARAMETER?

Starting with Oracle10g Release 2 the DB_FILE_MULTIBLOCK_READ_COUNT
initialization parameter is now automatically tuned to use a default value
when this parameter is not set explicitly. This default value corresponds
to the maximum I/O size that can be performed efficiently.
This value is platform-dependent and is 1MB for most platforms.
Because the parameter is expressed in blocks, it will be set to a value that
is equal to the maximum I/O size that can be performed efficiently divided by
the standard block size.
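
The sizing rule DB_BLOCK_SIZE x DB_FILE_MULTIBLOCK_READ_COUNT = max_io_size can
be worked through with a short Python sketch. It assumes a maximum I/O size of
1MB, as stated above for most platforms; the actual limit is platform-dependent
and the function name is illustrative.

```python
def multiblock_read_count(max_io_bytes, block_bytes):
    """Largest number of blocks whose combined size fits in one I/O."""
    if block_bytes <= 0 or max_io_bytes < block_bytes:
        raise ValueError("max_io_bytes must be at least one block")
    return max_io_bytes // block_bytes

ONE_MB = 1024 * 1024  # assumed max_io_size of the system

for block_kb in (2, 4, 8, 16, 32):
    count = multiblock_read_count(ONE_MB, block_kb * 1024)
    print(f"{block_kb:>2} KB blocks -> DB_FILE_MULTIBLOCK_READ_COUNT = {count}")
```

For example, with an 8KB block size and a 1MB maximum I/O size the result is
128 blocks per multiblock read.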

o As blocks read using Full Table and Fast Full Index scans are placed on the
least recently used end of the Buffer Cache replacement lists, sometimes
it may help to use Multiple Buffer Pools and place such segments in the KEEP
pool. For more information please refer to
Note 76374.1 Multiple Buffer Pools

o Partitioning can also be used to reduce the amount of data to be scanned
as Partition Pruning can restrict the scan to a subset of the segment's
partitions.

o Finally, you can consider reducing the data held in the most frequently
accessed segments (by moving older unneeded data out of the database) or
moving these segments to new faster disks to reduce the response time on
their I/Os.


'db file parallel read'
------------------------------------------------------------

This Wait Event is used when Oracle performs reads in parallel from multiple
datafiles to non-contiguous buffers in memory (PGA or Buffer Cache).
This is done during recovery operations or when buffer prefetching is being
used as an optimization i.e. instead of performing multiple single-block reads.

If this wait is an important component of Wait Time, follow the same guidelines
as 'db file sequential read'.


'direct path read' Note 50415.1
'direct path write' Note 50416.1
'direct path read (lob)'
'direct path write (lob)'
------------------------------------------------------------

These occur when database processes perform special types of multiblock I/Os
between the disk and process PGA memory, thus bypassing the Buffer Cache.
Such I/Os may be performed both synchronously and asynchronously.

Examples where they may be used are:
o Sort I/Os when memory Sort areas are exhausted and temporary tablespaces
are used to perform the sort
o Parallel Execution (Query and DML)
o Readahead operations (buffer prefetching)
o Direct Load operations
o I/O to LOB segments (which are not cached in the Buffer Cache)

Due to the way in which time for these waits is recorded (it does not measure
the time taken to perform the I/O), their relative position in listings such
as Statspack's "Top 5 Wait/Timed Events" cannot be used to evaluate their
true impact.

Guidelines for tuning:
o Usage of Asynchronous I/O is recommended where available.

o In Oracle8i, minimize the number of I/O requests by setting the
DB_FILE_DIRECT_IO_COUNT instance parameter so that

DB_BLOCK_SIZE x DB_FILE_DIRECT_IO_COUNT = max_io_size of system

In Oracle8i the default for this is 64 blocks.

(In Oracle9i, it is replaced by _DB_FILE_DIRECT_IO_COUNT which governs
the size of direct I/Os in BYTES (not blocks). The default is 1MB but
will be sized down if the max_io_size of the system is smaller.)

Note 47324.1 Init.ora Parameter "DB_FILE_DIRECT_IO_COUNT" Reference Note

o Tune memory Sort areas so that disk I/O for Sorting is minimized:
In 9i and above use Automated SQL Execution Memory Management.
In 8i tune the various Sort areas manually.

Note 147806.1 Oracle9i New Feature: Automated SQL Execution Memory Management
Note 109907.1 How to Determine an Optimal SORT_AREA_SIZE

o For LOB segments, store them on filesystems where an Operating System File
Buffer Cache can provide some memory caching.

o Identify sessions performing direct I/Os by querying V$SESSION_EVENT
for these Wait Events or V$SESSTAT for statistics
'physical reads direct', 'physical reads direct (lob)',
'physical writes direct' & 'physical writes direct (lob)'
and tune their SQL statements.

o Using V$FILESTAT or Statspack's "File IO Statistics" section, identify
datafiles on bottlenecked disk storage and move them elsewhere.



-----------------------------------
CONTROLFILE I/O-RELATED WAIT EVENTS
-----------------------------------

These Wait Events occur during I/O to one or all copies of the controlfile.

Frequency of Controlfile access is governed by activities such as Redo Logfile
switching and Checkpointing. Therefore it can only be influenced indirectly
by tuning these activities.


'control file parallel write'
------------------------------------------------------------

This occurs when a server process is updating all copies of the controlfile.
If it is significant, check for bottlenecks on the I/O paths (controllers,
physical disks) of all of the copies of the controlfile.

Possible solutions:

o Reduce the number of controlfile copies to the minimum that ensures
that not all copies can be lost at the same time.

o Use Asynchronous I/O if available on your platform.

o Move the controlfile copies to less saturated storage locations.


'control file sequential read'
'control file single write'
------------------------------------------------------------

These occur on I/O to a single copy of the controlfile.
If they are significant find out whether the waits are on a particular copy
of the controlfile and if so whether its I/O path is saturated.

The following query can be used to find which controlfile is being accessed.
It has to be run when the problem is occurring:

select P1 from V$SESSION_WAIT
where EVENT like 'control file%' and STATE='WAITING';

Possible solutions:

o Move the problematic controlfile copy to a less saturated storage location.

o Use Asynchronous I/O if available on your platform.



------------------------------------
REDO LOGGING I/O-RELATED WAIT EVENTS
------------------------------------

There are a number of Wait Events that happen during Redo Logging activities
and most of them are I/O-related.

The two most important ones are 'log file parallel write' and 'log file sync'.
Oracle foreground processes wait for 'log file sync' whereas the LGWR process
waits for 'log file parallel write'.

Although we usually find 'log file sync' in the "Top 5 Wait/Timed Events"
section of the Statspack report, in order to understand it we will first look
at 'log file parallel write':


'log file parallel write' Note 34583.1
------------------------------------------------------------

The LGWR background process waits for this event while it is copying redo
records from the memory Log Buffer cache to the current redo group's member
logfiles on disk.

Asynchronous I/O will be used if available to make the write parallel, otherwise
these writes will be done sequentially one member after the other.
However, LGWR has to wait until the I/Os to all member logfiles are complete
before the wait is completed.
Hence, the factor that determines the length of this wait is the speed with
which the I/O subsystem can perform the writes to the logfile members.

To reduce the time waited for this event, one approach is to reduce the amount
of redo that LGWR has to write:

o Make use of UNRECOVERABLE/NOLOGGING options.

o Reduce the number of redo group members to the minimum necessary to ensure
not all members can be lost at the same time.

o Do not leave tablespaces in BACKUP mode for longer than necessary.

o Only use the minimal level of Supplemental Logging required to achieve
the required functionality e.g. in LogMiner, Logical Standby or Streams.

Another approach is to tune the I/O itself:

o Place redo group members on storage locations so that parallel
writes do not contend with each other.

o Do not use RAID-5 for redo logfiles.

o Use Raw Devices for redo logfiles.

o Use faster disks for redo logfiles.

o If archiving is being used, set up redo storage so that writes for the current
redo group members do not contend with reads for the group(s) currently being
archived.


'log file sync' Note 34592.1
------------------------------------------------------------

This Wait Event occurs in Oracle foreground processes when they have issued
a COMMIT or ROLLBACK operation and are waiting for it to complete.
Part (but not all) of this wait includes waiting for LGWR to copy the redo
records for the session's transaction from Log Buffer memory to disk.

So, in the time that a foreground process is waiting for 'log file sync',
LGWR will also wait for a portion of this time on 'log file parallel write'.

The key to understanding what is delaying 'log file sync' is to compare
average times waited for 'log file sync' and 'log file parallel write':

o If they are similar, then redo logfile I/O is causing the delay
and the guidelines for tuning it should be followed.

o If 'log file parallel write' is significantly smaller,
then the delay is caused by the other parts of the Redo Logging mechanism
that occur during a COMMIT/ROLLBACK (and are not I/O-related).
Sometimes there will be latch contention on redo latches, evidenced by
'latch free' or 'LGWR wait for redo copy' wait events.
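
The comparison described above can be sketched in Python as follows. The
average wait times would come from the Statspack wait event listings; the
cut-off used to decide whether the two averages are "similar" (0.8 here), the
function name and the sample figures are assumptions made for illustration.

```python
def diagnose_log_file_sync(avg_sync_ms, avg_parallel_write_ms, similar_ratio=0.8):
    """Classify the likely cause of 'log file sync' delays.

    similar_ratio is a hypothetical cut-off: the two averages are treated
    as similar when the write time is at least this fraction of the sync time.
    """
    if avg_sync_ms <= 0:
        raise ValueError("avg_sync_ms must be positive")
    if avg_parallel_write_ms / avg_sync_ms >= similar_ratio:
        return "redo logfile I/O"
    return "other parts of the redo logging mechanism"

# Hypothetical average waits in milliseconds.
print(diagnose_log_file_sync(12.0, 11.0))  # averages are close
print(diagnose_log_file_sync(12.0, 2.0))   # LGWR write is much faster than the sync
```

In the first case the guidelines for tuning redo logfile I/O apply; in the
second the non-I/O parts of COMMIT/ROLLBACK processing should be investigated.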


'log file sequential read'
'log file single write'
------------------------------------------------------------

Both these Wait Events are I/O-related so they are likely to appear together
with 'log file parallel write' if there is I/O contention on the redo logs.
Follow the same guidelines for tuning them.


'switch logfile command'
'log file switch completion'
'log file switch (clearing log file)'
------------------------------------------------------------

These are further LGWR I/O-related Wait Events. Tune them by following the
same guidelines as for 'log file parallel write'.


'log file switch (checkpoint incomplete)'
------------------------------------------------------------

This Wait Event occurs when Checkpointing activities are not occurring
quickly enough.

For guidelines on tuning Checkpoint operations please refer to:

Note 147468.1 Checkpoint Tuning and Troubleshooting Guide
Note 76713.1 8i Parameters that Influence Checkpoints


'log switch/archive'
'log file switch (archiving needed)'
------------------------------------------------------------

These Wait Events occur when archiving is enabled and indicate that archiving
is not performing fast enough.

For guidelines on tuning archiving operations please refer to:

Note 45042.1 Archiver Best Practices



------------------------------------
BUFFER CACHE I/O-RELATED WAIT EVENTS
------------------------------------

These Wait Events occur because of Buffer Cache operations involving the
DBWR process(es) and I/O Slaves.


'db file parallel write' Note 34416.1
'db file single write'
'write complete waits'
'free buffer waits'
------------------------------------------------------------

For guidelines on tuning these waits please refer to the following articles:

Note 62172.1 Understanding and Tuning Buffer Cache and DBWR
Note 147468.1 Checkpoint Tuning and Troubleshooting Guide
Note 76713.1 8i Parameters that Influence Checkpoints



---------------------------------
FINAL NOTE: CORRECT I/O OPERATION
---------------------------------

As a final note in this article, whenever I/O performance is poor and response
times are high it is worth checking for related errors in Operating System logs.

There is little point in investigating I/O performance at the Oracle database
level if the I/O subsystem is malfunctioning. If this is the case your Hardware,
Operating System or Filesystem vendor should be contacted for assistance.

Please ensure that all steps described in Oracle Installation manuals and
Administrator's Reference guides involving Operating System patches, Kernel
parameters & related configuration tasks have been performed on systems
hosting Oracle databases.



----------------------------
REFERENCES & FURTHER READING
----------------------------

Note 190124.1 THE COE PERFORMANCE METHOD
Note 30286.1 I/O Tuning with Different RAID Configurations
Note 30712.1 Init.ora Parameter "DB_FILE_MULTIBLOCK_READ_COUNT" Reference Note
Note 1037322.6 WHAT IS THE DB_FILE_MULTIBLOCK_READ_COUNT PARAMETER?
Note 39836.1 Clustering Factor
Note 47324.1 Init.ora Parameter "DB_FILE_DIRECT_IO_COUNT" Reference Note
Note 45042.1 Archiver Best Practices
Note 62172.1 Understanding and Tuning Buffer Cache and DBWR
Note 147468.1 Checkpoint Tuning and Troubleshooting Guide
Note 76713.1 8i Parameters that Influence Checkpoints
Note 76374.1 Multiple Buffer Pools
Note 147806.1 Oracle9i New Feature: Automated SQL Execution Memory Management
Note 109907.1 How to Determine an Optimal SORT_AREA_SIZE

"Optimal Storage Configuration Made Easy" by J. Loaiza
http://otn.oracle.com/deploy/performance/pdf/opt_storage_conf.pdf

"Diagnosing Performance Using Statspack" by C. Dialeris & G. Wood
http://otn.oracle.com/deploy/performance/pdf/statspack.pdf

"Performance Tuning with Statspack, Part I" by C. Dialeris & G. Wood
http://otn.oracle.com/deploy/performance/pdf/20TUNING_dialeris.pdf

"Performance Tuning with Statspack, Part II" by C. Dialeris & G. Wood
http://otn.oracle.com/deploy/performance/pdf/statspack_tuning_otn_new.pdf

Oracle® Database Performance Tuning Guide 10g Release 2 (10.2)
Part Number B14211-01
http://www.oracle.com/pls/db102/to_toc?pathname=server.102%2Fb14211%2Ftoc.htm&remark=portal+%28Getting+Started%29

Oracle® Database Performance Tuning Guide 10g Release 1 (10.1)
Part Number B10752-01
http://www.oracle.com/pls/db10g/db10g.to_toc?pathname=server.101%2Fb10752%2Ftoc.htm&remark=portal+%28Getting+Started%29

Oracle9i Database Performance Planning Release 2 (9.2)
Part Number A96532-01
http://www.oracle.com/pls/db92/db92.to_toc?pathname=server.920%2Fa96532%2Ftoc.htm&remark=docindex

Oracle9i Database Performance Tuning Guide and Reference Release 2 (9.2)
Part Number A96533-01
http://www.oracle.com/pls/db92/db92.to_toc?pathname=server.920%2Fa96533%2Ftoc.htm&remark=docindex

Oracle9i Database Performance Methods Release 1 (9.0.1)
Part Number A87504-02
http://www.oracle.com/pls/db901/db901.to_toc?pathname=server.901/a87504/toc.htm&remark=docindex

Oracle9i Database Performance Guide and Reference Release 1 (9.0.1)
Part Number A87503-02
http://www.oracle.com/pls/db901/db901.to_toc?pathname=server.901/a87503/toc.htm&remark=docindex

Oracle8i Designing and Tuning for Performance Release 2 (8.1.6)
Part Number A76992-01
http://download-uk.oracle.com/docs/cd/A87860_01/
