Note: Proofread any scripts before using. Always try scripts on a test instance first. This Blog is not responsible for any damage.
Buffer Cache Busy Waits:
Description - A buffer busy wait happens when a session tries to access a block in the buffer cache but cannot because the buffer is busy, i.e. another session is modifying the block and its contents are in flux. To guarantee that the reader sees a coherent image of the block with either all of the changes or none of them, the session modifying the block marks the block header with a flag that tells other sessions a change is in progress and that they must wait until the complete change is applied.
The two main cases where this wait occurs are:
- Another session is reading the block into the buffer - this specific case has been split out into a "read by other session" wait event in 10g and higher
- Another session holds the buffer in an incompatible mode to our request
While the block is being changed, it is marked as unreadable by others. The changes should take no more than a few hundredths of a second, e.g. a disk read should take under 20 milliseconds and a block modification under one millisecond. It therefore takes a lot of buffer busy waits to cause a problem. Some examples are:
1. Hot block issue, such as the first block on the free list of a table, with high concurrent inserts. All users will insert into that block at the same time, until it fills up, then users start inserting into the next free block on the list, and so on
2. Multiple users running an inefficient SQL statement performing a full table scan on the same large table at the same time. One user will read the block off disk, and the other users will wait on buffer busy waits (or read by other session in 10g and higher) for the physical I/O to complete
Gather information:
-- Identify the wait event:
SELECT s.sql_hash_value, sw.p1 file#, sw.p2 block#, sw.p3 reason
FROM gv$session_wait sw, gv$session s
WHERE sw.event = 'buffer busy waits'
AND sw.inst_id = s.inst_id -- gv$ views cover all RAC instances; sid alone is not unique
AND sw.sid = s.sid;
-- Identify the object of a wait event:
SELECT owner , segment_name , segment_type
FROM dba_extents
WHERE file_id = &FileNo
AND &BlockNo BETWEEN block_id AND (block_id + blocks-1);
-- Top 10 segments by buffer busy waits:
column owner format a10
column object_name format a20
column tsname format a10
column value format 99999
SELECT *
FROM (
SELECT owner, object_name, subobject_name, object_type, tablespace_name TSNAME, value
FROM gv$segment_statistics
WHERE statistic_name='buffer busy waits'
ORDER BY value DESC)
WHERE ROWNUM < 11;
Issue Resolution Considerations:
1. Increase extent size (are extents added too frequently?)
2. Reduce rows per block (is there hot block contention?)
3. Increase undo retention (by altering size or retention time)
4. Tune queries
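As a hedged illustration of consideration 2, one way to reduce rows per block is to rebuild the segment with a higher PCTFREE. The names app_owner.orders and orders_pk and the value 40 are hypothetical placeholders, not taken from any real system. Try it on a test instance first.

```sql
-- Hypothetical table and index names; choose a PCTFREE value that suits your data.
ALTER TABLE app_owner.orders MOVE PCTFREE 40;
-- Moving a table changes its rowids and leaves its indexes UNUSABLE, so rebuild them.
ALTER INDEX app_owner.orders_pk REBUILD;
```

A higher PCTFREE trades disk space and slower full scans for fewer rows per block, and so fewer sessions competing for the same block.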
TKPROF Output Parameters:
Parameter - Description
P1 - File number of the data file containing the block
P2 - Block number within the datafile
P3 - Reason code (block class number in 10g and higher)
Control File Waits:
Description - The three different wait events of 'control file sequential read', 'control file single write', and 'control file parallel write' all contribute to the amount of time Oracle takes to keep the control file current.
Oracle maintains a record of the consistency of the database's physical structures and operational state through a set of control files. The control file is essential to database operation and to the ability to recover from an outage; if you lose the control file(s) associated with an instance, you may not be able to recover completely. The database state changes through activities such as adding data files, altering the size or location of data files, generating redo, creating archive logs, taking backups, advancing SCNs, or taking checkpoints.
Through normal operation the control file is continuously hammered with reads and writes as it is being updated.
Why Control File Waits Occur:
The performance around reads and writes against control files is often an indication of misplaced control files that share the same I/O access path or are on devices that are heavily used. It is interesting to note that Oracle has always defaulted the creation of control files in a single directory.
You can check where your control files reside on disk with this simple query:
select VALUE from v$parameter where name='control_files';
-- View wait events:
col event format a30
col wait_class format a20
SELECT inst_id, event, total_waits, total_timeouts, time_waited, average_wait, wait_class
FROM gv$system_event
WHERE event LIKE '%control%';
-- View sessions impacted by control file wait events:
SELECT event, wait_time, p1, p2, p3
FROM v$session_wait WHERE event LIKE '%control%';
Issue Resolution Considerations:
1. Relocate files for less contention
2. Reduce the frequency of commits and log switches
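To judge whether log switches are frequent enough to matter, a rough sketch is to count them per hour from V$LOG_HISTORY. The seven-day window is an arbitrary assumption; adjust it to the period you are investigating.

```sql
-- Log switches per hour over the last 7 days (the window is an arbitrary choice).
SELECT TRUNC(first_time, 'HH24') AS switch_hour, COUNT(*) AS log_switches
FROM v$log_history
WHERE first_time > SYSDATE - 7
GROUP BY TRUNC(first_time, 'HH24')
ORDER BY switch_hour;
```

Hours with many switches are candidates for larger online redo logs or less redo-intensive batch work.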
TKPROF Output Parameters (Control File Parallel Write):
Parameter - Description
P1 - Number of control files being written to
P2 - Number of blocks written
P3 - Number of I/O requests
TKPROF Output Parameters (Control File Sequential Read):
Parameter - Description
P1 - Control file containing the block
P2 - Block number within the control file
P3 - Number of blocks read
Wait Event Summary (each event lists Possible Causes, Actions, and Remarks):
db file sequential read:
Possible Causes:
- Use of an unselective index
- Fragmented indexes
- High I/O on a particular disk or mount point
- Bad application design
- Index read performance can be affected by a slow I/O subsystem and/or poor database file layout, which results in a higher average wait time
Actions:
- Check the indexes on the table to ensure that the right index is being used
- Check the column order of the index against the WHERE clause of the top SQL statements
- For indexes with a high clustering factor, rebuild the table in index key order and then rebuild the index (rebuilding the index alone does not change its clustering factor)
- Use partitioning to reduce the number of blocks being visited
- Make sure optimizer statistics are up to date
- Relocate 'hot' datafiles
- Consider using multiple buffer pools and caching frequently used indexes/tables in the KEEP pool
- Inspect the execution plans of the SQL statements that access data through indexes:
  - Is it appropriate for the SQL statements to access data through index lookups?
  - Is the application an online transaction processing (OLTP) or decision support system (DSS)?
  - Would full table scans be more efficient?
  - Do the statements use the right driving table?
- The optimization goal is to minimize both the number of logical and physical I/Os.
Remarks:
- The Oracle process wants a block that is currently not in the SGA, and it is waiting for the database block to be read into the SGA from disk.
- Significant db file sequential read wait time is most likely an application issue.
- If the DBA_INDEXES.CLUSTERING_FACTOR of the index approaches the number of blocks in the table, then most of the rows in the table are ordered by the index key. This is desirable. However, if the clustering factor approaches the number of rows in the table, the rows are randomly ordered with respect to the index and more I/Os are needed to complete the operation. You can improve the index's clustering factor by rebuilding the table so that rows are ordered according to the index key and rebuilding the index thereafter.
- The OPTIMIZER_INDEX_COST_ADJ and OPTIMIZER_INDEX_CACHING initialization parameters can influence the optimizer to favour the nested loops operation and choose an index access path over a full table scan.
- Reference: db file sequential read, Note# 34559.1
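The clustering factor comparison described in the remarks can be sketched as a dictionary query. This is only an illustration: &TableOwner and &TableName are substitution variables introduced here, and the figures are only as current as the optimizer statistics.

```sql
-- Compare each index's clustering factor with the table's block and row counts.
-- Close to BLOCKS is good; close to NUM_ROWS means rows are scattered relative to the index.
SELECT i.owner, i.index_name, i.clustering_factor, t.blocks, t.num_rows
FROM dba_indexes i, dba_tables t
WHERE i.table_owner = t.owner
AND i.table_name = t.table_name
AND i.table_owner = UPPER('&TableOwner')
AND i.table_name = UPPER('&TableName');
```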
db file scattered read:
Possible Causes:
- The Oracle session has requested and is waiting for multiple contiguous database blocks (up to DB_FILE_MULTIBLOCK_READ_COUNT) to be read into the SGA from disk.
- Full table scans
- Fast full index scans
Actions:
- Optimize multi-block I/O by setting the parameter DB_FILE_MULTIBLOCK_READ_COUNT
- Use partition pruning to reduce the number of blocks visited
- Consider using multiple buffer pools and caching frequently used indexes/tables in the KEEP pool
- Optimize the SQL statement that initiated most of the waits. The goal is to minimize the number of physical and logical reads.
  - Should the statement access the data by a full table scan or an index fast full scan (FFS)? Would an index range or unique scan be more efficient?
  - Does the query use the right driving table?
  - Are the SQL predicates appropriate for a hash or merge join?
  - If full scans are appropriate, can parallel query improve the response time?
- The objective is to reduce the demands for both logical and physical I/Os, and this is best achieved through SQL and application tuning.
- Make sure all statistics are representative of the actual data. Check the LAST_ANALYZED date
Remarks:
- If an application that has been running fine for a while suddenly clocks a lot of time on the db file scattered read event and there hasn't been a code change, you might want to check whether one or more indexes have been dropped or become unusable.
- Reference: db file scattered read, Note# 34558.1
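The two checks mentioned above (unusable indexes and the LAST_ANALYZED date) might look like the sketch below. &TableOwner and &TableName are substitution variables introduced here for illustration. A dropped index will not appear in either query, so compare against a known-good list if you keep one.

```sql
-- Indexes that are no longer usable (partitioned indexes report status per partition).
SELECT owner, index_name, status
FROM dba_indexes
WHERE status = 'UNUSABLE'
UNION ALL
SELECT index_owner, index_name || ':' || partition_name, status
FROM dba_ind_partitions
WHERE status = 'UNUSABLE';
-- When statistics were last gathered for one table:
SELECT owner, table_name, num_rows, last_analyzed
FROM dba_tables
WHERE owner = UPPER('&TableOwner')
AND table_name = UPPER('&TableName');
```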
log file parallel write:
Possible Causes:
- LGWR waits while writing the contents of the redo log buffer to the online log files on disk
- I/O wait on the subsystem holding the online redo log files
Actions:
- Reduce the amount of redo being generated
- Do not leave tablespaces in hot backup mode for longer than necessary
- Do not use RAID 5 for redo log files
- Use faster disks for redo log files
- Ensure that the disks holding the archived redo log files and the online redo log files are separate so as to avoid contention
- Consider using NOLOGGING or UNRECOVERABLE options in SQL statements
log file sync:
Possible Causes:
- Oracle foreground processes are waiting for a COMMIT or ROLLBACK to complete
Actions:
- Tune LGWR to get good throughput to disk, e.g. do not put redo logs on RAID 5
- Reduce the overall number of commits by batching transactions so that there are fewer distinct COMMIT operations
Remarks:
- High Waits on log file sync, Note# 125269.1
- Tuning the Redolog Buffer Cache and Resolving Redo Latch Contention
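A simple way to see how the two redo-related events compare is to read them side by side from V$SYSTEM_EVENT. This is a sketch only; the figures are cumulative since instance startup, and TIME_WAITED and AVERAGE_WAIT are reported in hundredths of a second.

```sql
-- Cumulative waits for the two redo write events since instance startup.
SELECT event, total_waits, time_waited, average_wait
FROM v$system_event
WHERE event IN ('log file sync', 'log file parallel write');
```

If the average log file sync wait is close to the average log file parallel write wait, redo I/O is the likely bottleneck; a much larger log file sync figure suggests looking at commit frequency as well.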
buffer busy waits:
Possible Causes:
- Buffer busy waits are common in an I/O-bound Oracle system.
- The two main cases where this can occur are:
  - Another session is reading the block into the buffer
  - Another session holds the buffer in an incompatible mode to our request
- These waits indicate read/read, read/write, or write/write contention.
- The Oracle session is waiting to pin a buffer. A buffer must be pinned before it can be read or modified. Only one process can pin a buffer at any one time.
- This wait can be intensified by a large block size, as more rows can be contained within the block
- This wait happens when a session wants to access a database block in the buffer cache but cannot because the buffer is "busy"
- It is also often due to several processes repeatedly reading the same blocks (e.g. if lots of people scan the same index or data block)
Actions:
- The main way to reduce buffer busy waits is to reduce the total I/O on the system
- Depending on the block type, the actions will differ
- Data Blocks:
  - Eliminate HOT blocks from the application.
  - Check for repeatedly scanned / unselective indexes.
  - Try rebuilding the object with a higher PCTFREE so that you reduce the number of rows per block.
  - Check for 'right-hand indexes' (indexes that get inserted into at the same point by many processes).
  - Increase INITRANS and MAXTRANS and reduce PCTUSED. This will make the table less dense.
  - Reduce the number of rows per block
- Segment Header:
  - Increase the number of FREELISTs and FREELIST GROUPs
- Undo Header:
  - Increase the number of rollback segments
Remarks:
- A process that waits on the buffer busy waits event publishes the reason code in the P3 parameter of the wait event.
- Oracle Metalink Note# 34405.1 provides a reference table of reason codes; codes 130 and 220 are the most common.
- Resolving intense and random buffer busy wait performance problems, Note# 155971.1
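Because the action depends on the block type, it helps to see which block classes are accumulating waits. A minimal sketch using V$WAITSTAT (figures are cumulative since instance startup):

```sql
-- Buffer waits broken down by block class (data block, segment header, undo header, ...).
SELECT class, count, time
FROM v$waitstat
WHERE count > 0
ORDER BY time DESC;
```

The class with the most time points to the matching group of actions listed for data blocks, segment headers, or undo headers.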
free buffer waits:
Possible Causes:
- This means we are waiting for a free buffer but none is available in the cache because there are too many dirty buffers in the cache
- Either the buffer cache is too small or DBWR is slow in writing modified buffers to disk
- DBWR is unable to keep up with the write requests
- Checkpoints are happening too fast, perhaps due to high database activity and under-sized online redo log files
- Large sorts and full table scans are filling the cache with modified blocks faster than DBWR is able to write to disk
- If the number of dirty buffers that need to be written to disk is larger than the number that DBWR can write per batch, then these waits can be observed
Actions:
- Reduce checkpoint frequency by increasing the size of the online redo log files
- Examine the size of the buffer cache and consider increasing it in the SGA
- Set DISK_ASYNCH_IO = TRUE
- If not using asynchronous I/O, increase the number of DB writer processes or DBWR slaves
- Ensure hot spots do not exist by spreading datafiles over disks and disk controllers
- Pre-sorting or reorganizing data can help
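Before changing anything, it can be useful to look at the current settings that the actions above refer to. The sketch below assumes an instance where the buffer cache is sized with DB_CACHE_SIZE; older releases use DB_BLOCK_BUFFERS instead.

```sql
-- Current DBWR, asynchronous I/O, and buffer cache settings.
SELECT name, value
FROM v$parameter
WHERE name IN ('disk_asynch_io', 'db_writer_processes', 'dbwr_io_slaves', 'db_cache_size');
-- Size and status of each online redo log group.
SELECT group#, thread#, bytes / 1024 / 1024 AS size_mb, status
FROM v$log;
```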
enqueue waits:
Possible Causes:
- This wait event indicates a wait for a lock that is held by another session (or sessions) in a mode incompatible with the requested mode.
- TX (transaction lock): generally due to table or application setup issues. This indicates contention for a row-level lock. The wait occurs when a transaction tries to update or delete rows that are currently locked by another transaction. This is usually an application issue.
- TM (DML enqueue lock): generally due to application issues, particularly if foreign key constraints have not been indexed.
- ST lock: database actions that modify the UET$ (used extent) and FET$ (free extent) tables require the ST lock, which includes actions such as drop, truncate, and coalesce. Contention for the ST lock indicates there are multiple sessions actively performing dynamic disk space allocation or deallocation in dictionary-managed tablespaces
Actions:
- Reduce waits and wait times
- The action to take depends on the lock type that is causing the most problems
- Whenever you see an enqueue wait event for the TX enqueue, the first step is to find out who the blocker is and whether there are multiple waiters for the same resource
- Waits for the TM enqueue in mode 3 are primarily due to unindexed foreign key columns. Create indexes on foreign keys (< 10g)
- To minimize ST lock contention in your database:
  - Use locally managed tablespaces
  - Recreate all temporary tablespaces using the CREATE TEMPORARY TABLESPACE TEMPFILE… command.
Remarks:
- The maximum number of enqueue resources that can be concurrently locked is controlled by the ENQUEUE_RESOURCES parameter.
- Tracing sessions waiting on an enqueue, Note# 102925.1
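Finding the blocker and its waiters, as suggested for TX enqueues, can be sketched with a self-join on V$LOCK. This assumes a single instance; on RAC you would use GV$LOCK and also match on INST_ID, and the BLOCK column can take values other than 1.

```sql
-- Pair each blocking session with the sessions waiting on the same resource.
SELECT holder.sid AS blocking_sid, waiter.sid AS waiting_sid,
       holder.type, holder.id1, holder.id2,
       holder.lmode AS held_mode, waiter.request AS requested_mode
FROM v$lock holder, v$lock waiter
WHERE holder.block = 1
AND waiter.request > 0
AND holder.type = waiter.type
AND holder.id1 = waiter.id1
AND holder.id2 = waiter.id2;
```

Several rows with the same blocking_sid mean multiple waiters are queued behind one session.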
cache buffers chains latch:
Possible Causes:
- This latch is acquired when searching for data blocks. The buffer cache is organized as chains of blocks, and each chain is protected by a child latch when it needs to be scanned
- Hot blocks are another common cause of cache buffers chains latch contention. This happens when multiple sessions repeatedly access one or more blocks that are protected by the same child cache buffers chains latch.
- SQL statements with high BUFFER_GETS (logical reads) per EXECUTIONS are the main culprits
- Multiple concurrent sessions are executing the same inefficient SQL that is going after the same data set
Actions:
- Reducing contention for the cache buffers chains latch will usually require reducing logical I/O rates by tuning and minimizing the I/O requirements of the SQL involved. High I/O rates could be a sign of a hot block (meaning a block that is accessed very frequently).
- Export the table, increase PCTFREE significantly, and import the data. This minimizes the number of rows per block, spreading them over many blocks. Of course, this is at the expense of storage, and full table scan operations will be slower
- Minimize the number of records per block in the table
- For indexes, you can rebuild them with higher PCTFREE values, bearing in mind that this may increase the height of the index.
- Consider reducing the block size. Starting in Oracle9i Database, Oracle supports multiple block sizes. If the current block size is 16K, you may move the table or recreate the index in a tablespace with an 8K block size. This too will negatively impact full table scan operations. Also, multiple block sizes increase management complexity.
Remarks:
- The default number of hash latches is usually 1024
- The number of hash latches can be adjusted by the parameter _DB_BLOCK_HASH_LATCHES
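Statements with high BUFFER_GETS per execution can be listed from V$SQL. A minimal sketch, following the top-N pattern used earlier in this post:

```sql
-- Top 10 cached statements by logical reads per execution.
SELECT *
FROM (
SELECT hash_value, executions, buffer_gets,
       ROUND(buffer_gets / executions) AS gets_per_exec
FROM v$sql
WHERE executions > 0
ORDER BY buffer_gets / executions DESC)
WHERE ROWNUM < 11;
```

The hash_value can be matched against sql_hash_value in V$SESSION to see which sessions are running a given statement.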
cache buffers lru chain latch:
Possible Causes:
- Processes need to get this latch when they need to move buffers based on the LRU block replacement policy in the buffer cache
- The cache buffers lru chain latch is acquired in order to introduce a new block into the buffer cache and when writing a buffer back to disk, specifically when trying to scan the LRU (least recently used) chain containing all the dirty blocks in the buffer cache.
- Competition for the cache buffers lru chain latch is symptomatic of intense buffer cache activity caused by inefficient SQL statements. Statements that repeatedly scan large unselective indexes or perform full table scans are the prime culprits.
- Heavy contention for this latch is generally due to heavy buffer cache activity, which can be caused, for example, by repeatedly scanning large unselective indexes
Actions:
- Contention for this latch can be avoided by implementing multiple buffer pools or by increasing the number of LRU latches with the parameter DB_BLOCK_LRU_LATCHES (the default value is generally sufficient for most systems).
- It is possible to reduce contention for the cache buffers lru chain latch by increasing the size of the buffer cache, thereby reducing the rate at which new blocks are introduced into the buffer cache
direct path read:
Possible Causes:
- These waits are associated with direct read operations, which read data directly into the session's PGA, bypassing the SGA
- The "direct path read" and "direct path write" wait events are related to operations that are performed in the PGA, such as sorting, GROUP BY operations, and hash joins
- In DSS-type systems, or during heavy batch periods, waits on "direct path read" are quite normal. However, for an OLTP system these waits are significant
- These wait events can occur during sorting operations, which is not surprising as direct path reads and writes usually occur in connection with temporary segments
- SQL statements with functions that require sorts, such as ORDER BY, GROUP BY, UNION, DISTINCT, and ROLLUP, write sort runs to the temporary tablespace when the input size is larger than the work area in the PGA
Actions:
- Ensure the OS asynchronous I/O is configured correctly.
- Check for I/O-heavy sessions / SQL and see if the amount of I/O can be reduced.
- Ensure no disks are I/O bound.
- Set PGA_AGGREGATE_TARGET to an appropriate value (if the parameter WORKAREA_SIZE_POLICY = AUTO)
- Or set the *_AREA_SIZE parameters manually (such as SORT_AREA_SIZE), in which case you have to set WORKAREA_SIZE_POLICY = MANUAL
- Whenever possible use UNION ALL instead of UNION, and where applicable use HASH JOIN instead of SORT MERGE and NESTED LOOPS instead of HASH JOIN.
- Make sure the optimizer selects the right driving table. Check to see if the composite index's columns can be rearranged to match the ORDER BY clause to avoid the sort entirely.
- Also, consider automating the SQL work areas using PGA_AGGREGATE_TARGET in Oracle9i Database.
Remarks:
- The default size of HASH_AREA_SIZE is twice that of SORT_AREA_SIZE
- A larger HASH_AREA_SIZE will influence the optimizer to go for hash joins instead of nested loops
- The hidden parameter DB_FILE_DIRECT_IO_COUNT can impact direct path read performance. It sets the maximum I/O buffer size of direct read and write operations. The default is 1M in 9i
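To get a rough idea of how often sorts and other work areas spill to the temporary tablespace, the system statistics below can be compared. This is a sketch; the workarea statistics exist only in Oracle9i and higher, and all values are cumulative since instance startup.

```sql
-- In-memory versus on-disk sorts, and how often work areas needed extra passes.
SELECT name, value
FROM v$sysstat
WHERE name IN ('sorts (memory)', 'sorts (disk)',
               'workarea executions - optimal',
               'workarea executions - onepass',
               'workarea executions - multipass');
```

A growing share of disk sorts or onepass/multipass executions suggests the PGA work areas are too small for the workload.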
direct path write:
Possible Causes:
- These are waits that are associated with direct write operations that write data from users' PGAs to data files or temporary tablespaces
- Direct load operations (e.g. Create Table as Select (CTAS) may use this)
- Parallel DML operations
- Sort I/O (when a sort does not fit in memory)
Actions:
- If the file indicates a temporary tablespace, check for unexpected disk sort operations.
- Ensure DISK_ASYNCH_IO is TRUE. This is unlikely to reduce wait times from the wait event timings but may reduce sessions' elapsed times (as synchronous direct I/O is not accounted for in wait event timings).
- Ensure the OS asynchronous I/O is configured correctly.
- Ensure no disks are I/O bound
latch free waits:
Possible Causes:
- This wait indicates that the process is waiting for a latch that is currently busy (held by another process).
- When you see a latch free wait event in the V$SESSION_WAIT view, it means the process failed to obtain the latch in the willing-to-wait mode after spinning _SPIN_COUNT times and went to sleep. When processes compete heavily for latches, they will also consume more CPU resources because of spinning. The result is a higher response time
Actions:
- If the TIME spent waiting for latches is significant then it is best to determine which latches are suffering from contention.
Remarks:
- A latch is a kind of low-level lock.
- Latches apply only to memory structures in the SGA. They do not apply to database objects. An Oracle SGA has many latches, and they exist to protect various memory structures from potential corruption by concurrent access.
- The time spent on latch waits is an effect, not a cause; the cause is that you are doing too many block gets, and block gets require cache buffers chains latching
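Determining which latches suffer from contention can be sketched with a top-N query on V$LATCH, ordered by sleeps (values are cumulative since instance startup):

```sql
-- Top 10 latches by number of sleeps.
SELECT *
FROM (
SELECT name, gets, misses, sleeps
FROM v$latch
WHERE sleeps > 0
ORDER BY sleeps DESC)
WHERE ROWNUM < 11;
```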
library cache latch:
Possible Causes:
- The library cache latches protect the cached SQL statements and object definitions held in the library cache within the shared pool. The library cache latch must be acquired in order to add a new statement to the library cache
- The application is making heavy use of literal SQL; use of bind variables will reduce contention for this latch considerably
Actions:
- Ensure that the application reuses SQL statement representations as much as possible. Use bind variables whenever possible in the application
- You can reduce the library cache latch hold time by properly setting the SESSION_CACHED_CURSORS parameter
- Consider increasing the shared pool
Remarks:
- Larger shared pools tend to have long free lists, and processes that need to allocate space in them must spend extra time scanning the long free lists while holding the shared pool latch
- If your database is not yet on Oracle9i Database, an oversized shared pool can increase the contention for the shared pool latch.
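One rough way to spot literal SQL is to group cached statements by their leading text and look for many near-identical copies. The 60-character prefix and the threshold of 10 are arbitrary assumptions; tune both to your application.

```sql
-- Statements sharing the same first 60 characters, a common sign of literals instead of binds.
SELECT SUBSTR(sql_text, 1, 60) AS sql_prefix, COUNT(*) AS copies
FROM v$sqlarea
GROUP BY SUBSTR(sql_text, 1, 60)
HAVING COUNT(*) > 10
ORDER BY copies DESC;
```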
shared pool latch:
Possible Causes:
- The shared pool latch is used to protect critical operations when allocating and freeing memory in the shared pool
- Contention for the shared pool and library cache latches is mainly due to intense hard parsing. A hard parse applies to new cursors and to cursors that have aged out and must be re-executed
- Parsing a new SQL statement is expensive both in terms of CPU requirements and the number of times the library cache and shared pool latches may need to be acquired and released.
Actions:
- To reduce shared pool latch contention, avoid hard parses when possible: parse once, execute many.
- Eliminating literal SQL is also useful to avoid shared pool latch contention. The size of the shared pool and the use of MTS (shared server option) also greatly influence the shared pool latch.
- A workaround is to set the initialization parameter CURSOR_SHARING to FORCE. This allows statements that differ in literal values but are otherwise identical to share a cursor and therefore reduces latch contention, memory usage, and hard parsing.
Remarks:
- Note 62143.1 explains how to identify and correct problems with the shared pool and the shared pool latch.
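The amount of hard parsing relative to total parsing and executions can be read from V$SYSSTAT. A minimal sketch (values are cumulative since instance startup):

```sql
-- Hard parses compared with total parses and executions.
SELECT name, value
FROM v$sysstat
WHERE name IN ('parse count (total)', 'parse count (hard)', 'execute count');
```

A hard parse count that is a large fraction of the total parse count points to literal SQL or cursors aging out of the shared pool.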
row cache objects latch:
Possible Causes:
- This latch comes into play when user processes are attempting to access the cached data dictionary values.
Actions:
- It is not common to have contention for this latch; the main way to reduce it is to increase the size of the shared pool (SHARED_POOL_SIZE).
- Use locally managed tablespaces for your application objects, especially indexes
- Review and amend your database logical design; a good example is to merge or decrease the number of indexes on tables with heavy inserts
Remarks:
- Configuring the library cache to an acceptable size usually ensures that the data dictionary cache is also properly sized, so tuning the library cache will tune the row cache indirectly
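Dictionary cache efficiency can be checked per cache with V$ROWCACHE before deciding to grow the shared pool. A sketch only; what counts as an acceptable miss percentage depends on the workload and on how long the instance has been up.

```sql
-- Data dictionary cache misses by cache type.
SELECT parameter, gets, getmisses,
       ROUND(getmisses / gets * 100, 2) AS miss_pct
FROM v$rowcache
WHERE gets > 0
ORDER BY getmisses DESC;
```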