While user commit/rollback, commit will trigger log file sync, that is lgwr writing log buffer(memory) to log file(disk), before the commit/rollback completing, user will see the
log file sync wait event.
Quote from MOS ID:1376916.1:
At the time of commit, the user session will post LGWR to write the log buffer (containing the current unwritten redo, including this session's redo records) to the redo log file. Once LGWR knows that its write requests have completed, it will post the user session to notify it that this has completed. The user session waits on 'log file sync' while waiting for LGWR to post it back to confirm all redo it generated have made it safely onto disk.
The time between the user session posting the LGWR and the LG's posting the user after the write has completed is the wait time for 'log file sync' that the user session will show.
For diagnosing log file sync issue, we can:
- Check the AWR report and dynamic views
- Check the alert log, alert log shows how frequently redo log file are switching, recommended time for redo log switching is 15 mins to 30 mins
- As of 10.2.0.4, Oracle will write the warning message("log write elapsed time xxms, xxKB") to LG's trace file when writing takes more then 500ms, if the size is very small, we can consider the I/O is poor
- Check the hidden parameter
_use_adaptive_log_file_sync, see its value is true or false
log file sync and
log file parallel write are approximate equal, we can consider the Disk I/O is poor.
The average wait time on
log file sync on HDD should be less then 20ms.
Checking LGWR wait time:
select event, state, seq#, seconds_in_wait, p1,p2
where program like '%LGWR%';
In AWR report, check the statistics of
Log file parallel write per second.
- Identify any bottle neck of I/O subsystems, including bandwidth
- Avoid placing redo log in RAID 5/RAID 6 disk array
- Avoid placing redo log in SSD
- Identify any other processes are occupied too much IO operations
Below example shows slow disk IO, we can see that, the avg wait ms is over 20ms.
TOP 10 wait event:
Log file parallel write:
log file sync are greater then
log parallel write and
log parallel write is within the normal time, it doesn't mean I/O subsystem works fine, peak IO may occur during the past snapshot time, if nmon or oswatcher is implemented, there should be some clues with peak IO statistics.
In the AWR or Statspack report, if the average user calls per commit/rollback calculated as "user calls/(user commits+user rollbacks)" is less than 30, then commits are happening too frequently:
Excessive commit(5.2 user calls/user commit):
r column of vmstat output is higher than CPU numbers, we can consider excessive consuming. Sometimes the IO response time is slower because of high CPU usages.
In AWR report, some other wait events maybe company with
log file sync. As shown previous top 10 event,
log buffer space and
log file switch also on the top 10 event.
From 18.104.22.168, underscore parameter
_use_adaptive_log_file_sync will affect log write performance because it changes the default behavior of LGWR. See Hidden parameter '_use_adaptive_log_file_sync'
v$log view shows how much size of the block size, the default size is 512 kb, on a busy oltp system, modify the size to 2M maybe is a better option.
First, if log file size is too small, alert.log will show lots of
log and AWR report may show
log file switch (checkpoint incomplete) wait event it the top 10 wait events.
Below AWR report shows excessive log switches, Oracle recommend that a log switch should occur at most once every 15 to 20 minutes.
Under this circumstance, resize redo log file size is a better choice(current size is the default size:50M).
High Waits for 'Log File Sync': Known Issue Checklist for 11.2 (Doc ID 1548261.1)
Troubleshooting: 'Log file sync' Waits (Doc ID 1376916.1)
Script to Collect Log File Sync Diagnostic Information (lfsdiag.sql)