Tony’s Oracle Tips

July 24, 2011

‘log file sync’ and ‘log file parallel write’ – part 2


I mentioned at the very end of part 1 that ‘log file parallel write’ wait times are in fact unlikely to be close to ‘log file sync’ wait times.  There have been many questions and blog entries on this topic over the years.  One common starting point seems to be Metalink article 34592.1.  Another good article is this one:

http://kevinclosson.wordpress.com/2007/07/21/manly-men-only-use-solid-state-disk-for-redo-logging-lgwr-io-is-simple-but-not-lgwr-processing/.

However, it seems that everybody has missed a fundamental complication of the changes in commit processing introduced in 10g.  The point has probably been missed because it is normally very hard to demonstrate.

In Oracle 9i and below, Oracle had the concept of a ‘log buffer’.  You would be in good company if you thought that later versions had a log buffer as well.  After all, there is an initialisation parameter LOG_BUFFER that specifies how big it is.

In reality, Oracle now uses a combination of ‘private redo strands’ and ‘public redo strands’.

Private redo strands do not exist in instances within a RAC cluster.  In a non-RAC environment, each transaction gets its own redo strand (subject to availability).  These strands are used in combination with in-memory undo (IMU) to reduce the amount of data that is logged but, more importantly, to reduce latching conflicts; data is normally copied from a private redo strand only at commit time.  Data will also be copied out if the private redo strand fills up before a commit is issued.  These private redo strands are allocated from the shared pool.
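
As an aside, you can get a rough feel for how much use an instance is making of IMU and private strands from the IMU statistics in v$sysstat.  A minimal sketch (statistic names vary a little between versions, so check v$statname on yours):

  select name, value
    from v$sysstat
   where name like 'IMU%'
   order by name;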

RAC or no RAC, there are also a minimum of two public redo strands.  In larger systems there may be more (the number of CPUs divided by 16).  These public redo strands also contribute to latch reduction.  The number of these strands can be changed by use of the hidden parameter “_log_parallelism_max”.
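
If you want to see the current value, something like the following works on the versions I have tried; it is a sketch only, it must be run as SYS, and x$ tables are of course not a supported interface:

  select p.ksppinm  as name,
         v.ksppstvl as value,
         p.ksppdesc as description
    from x$ksppi p, x$ksppcv v
   where p.indx = v.indx
     and p.ksppinm = '_log_parallelism_max';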

In relatively quiet systems only one public redo strand may be in use, but even with as few as two concurrent sessions busy generating redo you can see the second redo strand being used.  When that happens, LGWR writes out the data in the public redo strands sequentially!

How do I know this?

Well, I was performing a rather artificial insert-only performance test on one instance of an 11.2 RAC database on a 64-CPU system with a suspected faulty I/O subsystem (don’t ask).  LGWR was spending 90% of its time in ‘log file parallel write’ (LFPW) and about 10% of the time on the CPU.  I had taken the advice in Kevin’s article above and “reniced” LGWR so I was fairly sure that LGWR rescheduling was not a big issue: throughput was limited by the I/O subsystem.

Nevertheless, ‘log file sync’ (LFS) wait events averaged 4 or 5 times the LFPW times despite the average CPU utilisation only being about 25%.  Theoretically, one would expect the average LFS time to be somewhat more than 50% higher than LFPW in a system that is limited by LGWR I/O.  In the worst case LGWR has just started an I/O at the time that you commit, so you have to wait for that I/O to finish as well as your own that follows; in the best case LGWR finishes an I/O just as you commit and you only have to wait for your own.  On average you arrive half-way through LGWR’s current write, which gives an average LFS/LFPW ratio of 1.5.  Given that you are still in LFS while LGWR is on the CPU doing its post-processing, I thought that LFS should be a little higher than 50% above LFPW but certainly not 300% or 400% higher.
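
For the record, the quick comparison I was making was along the lines of the following sketch against v$system_event; on a live system you would really want deltas over an interval (or an AWR report) rather than totals since instance startup:

  select event,
         total_waits,
         round(time_waited_micro / total_waits / 1000, 2) as avg_ms
    from v$system_event
   where event in ('log file sync', 'log file parallel write');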

Explanations revolving around ‘CPU storms’ didn’t seem plausible, so I had to dig deeper.  Well, by ‘digging deeper’ I mean (a) posting a question on OTN and asking for help from ‘knowledgeable friends’ and, when that failed, (b) guessing.  I eventually guessed correctly when I set “_log_parallelism_max” to 1 (it was previously 4 on my 64-CPU system).
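
For completeness, the change itself looked like the sketch below.  Underscore parameters are unsupported, so treat this as a record of the test rather than a recommendation; as far as I can tell the parameter is static, hence the spfile scope and restart.

  alter system set "_log_parallelism_max" = 1 scope=spfile;
  -- restart the instance for the change to take effect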

The faulty I/O subsystem had (has) the characteristic that throughput is several times higher when the I/O size is larger.  The reason for this is not yet known.  However, when the initialisation parameter change was made:

  • The average size of the I/O increased several times
  • The (faulty) I/O subsystem throughput increased several fold
  • The LFS time came back down to just over 50% above the LFPW time.
  • LGWR started waiting on the idle event ‘rdbms ipc message’; the system was no longer limited by I/O.
  • Latching increased, thus proving that the long-term solution was to fix the I/O subsystem, not to make unsupported changes to initialisation parameters!

I then set about trying to devise some kind of experiment that others can repeat to verify my conclusions.  The problem is that the only way to get more than one public redo strand in play is to have sufficient redo being generated that a redo allocation latch collision occurs.  You can easily generate this scenario.  You can also show that the number of transactions being committed with each write (redo size/commit size) is much less than the number of processes in LFS, but I can’t think of a way to prove that this discrepancy is caused by I/O serialisation without a faulty I/O subsystem.  Sure, you can run a trace and see that there are multiple LFPW events between idle ‘rdbms ipc message’ events, but that is hardly conclusive.
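
For anyone who wants to experiment, the workload I have in mind is nothing more sophisticated than a commit-heavy insert loop run from two or more sessions at once; the table name and row counts in this sketch are purely illustrative.

  create table redo_gen (n number, pad varchar2(200));

  -- run this block from two or more sessions at the same time
  begin
    for i in 1 .. 100000 loop
      insert into redo_gen values (i, rpad('x', 200, 'x'));
      commit;
    end loop;
  end;
  /

  -- then compare commits to redo writes over the test interval
  select name, value
    from v$sysstat
   where name in ('user commits', 'redo writes', 'redo size');

Even with only two sessions running you should see the second public redo strand come into play; the interesting figure is then how many commits are serviced by each redo write.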

Any ideas on how to devise an experiment would be warmly welcomed.  In the meantime I guess you will just have to take my word for it!

‘Log file sync’ and ‘log file parallel write’ – part 1


Preamble
This is the first part of a three-part blog about LGWR. If you want a thorough understanding of LGWR, ‘log file sync’, and ‘log file parallel write’, read on. If you are an Oak Table member you might want to skip to the more advanced topics in parts 2 and 3 (unless you want to point out errors and omissions, of course :-)).

You are a programmer or a DBA and you are trying for the first time in your life to analyze an Oracle performance problem. What is the most important concept you need to understand to get started?

Well, there are a number of right answers to this question but, rightly or wrongly, ‘the Oracle wait interface’ would be many people’s answer. A very basic approach would be to look at the highest wait event on the system and focus on that. So, for example, if you see a lot of ‘db file sequential read’ wait events you might conclude that faster disks might be the answer. You may be a bit smarter and realise that these events are generally associated with indexed access, and try to work out what indexes are being accessed and figure out whether they (or the queries that use them) can be redesigned.

There are, of course, numerous books and web articles that explain the deficiencies of this type of approach, but in many cases it seems to work. The success of this type of basic approach will be quite high in simple systems where performance has been neglected for a long time and there are one or two ‘quick wins’ that can have a significant benefit. In complex systems, or ones that have been properly managed over the years, this sort of basic analysis is unlikely to be enough.

Cary Millsap wrote a book with Jeff Holt many years ago entitled ‘Optimizing Oracle Performance’ in which he explained the limitations of any system-wide tuning approach. Even though the book was written many years ago, I would say that it is still mandatory reading for any Oracle performance specialist. The next most important book, in my opinion, is a more modern work by Christian Antognini (http://antognini.ch/top/), but I digress.

Is ‘LOG FILE SYNC’ (LFS) relevant?

Arguably the single most misleading wait event in the Oracle database is ‘log file sync’ (LFS). Let me explain why. Suppose you have a single client process and it spends 1% of its time in LFS. Would you gain a lot by engaging in a tuning exercise that aimed exclusively at reducing this wait event? Well, obviously the upper limit on the gain you might make is 1% even if you eliminated these events in their entirety; the answer is, therefore, NO. You are unlikely to waste your time though because your Statspack or AWR report will probably not highlight the event in the first place.

On the other hand, suppose you have 10,000 sessions connecting to your database and 300 of these are active at any one time. Suppose that 1% of the active time of each session is spent in LFS; would you gain much by tuning the event? The answer is the same as before: NO. However, you might be misled by Statspack or AWR because now the total wait time for the database for LFS will be 3 hours in any one hour! This is because at any one time we have 300 active sessions and on average three of them will be in LFS. That comes out to be 3 hours per hour waiting! So before you conclude that LFS is a major performance issue, first check what the average wait time per active session is. Of course, what you should really do is find out the percentage of wait time for your business-critical processes, but I digress again. Let us get back to the technicalities of LFS.
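
One crude way to put LFS in context is to compare its total wait time with DB time over the same period. The following is a sketch using cumulative figures; a Statspack or AWR interval would be better, but the idea is the same:

  select round(e.time_waited_micro / t.value * 100, 2) as lfs_pct_of_db_time
    from v$system_event e, v$sys_time_model t
   where e.event     = 'log file sync'
     and t.stat_name = 'DB time';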

‘Log File Sync’ (LFS) versus ‘log file parallel write’ (LFPW)

When a transaction commits it will usually wait for its redo data to be recorded on disk before returning control to the application. That wait is LFS. There are three main exceptions to this:

  • When the commit is issued within a PL/SQL block. In this case the wait will be deferred until the block returns. If the PL/SQL block contains multiple commit statements then only one wait will occur when the block returns.
  • When the commit statement is actually COMMIT WRITE NOWAIT. In this case, there is no LFS wait. See part 3 in this series for an explanation of why this is dangerous.
  • When the session parameter COMMIT_WAIT (COMMIT_WRITE in earlier releases) is set to NOWAIT. This parameter can be set at the instance level, but that is dangerous practice; even at the session level it should be set only with care. See part 3 in this series for an explanation of why this is dangerous. A minimal illustration of both options follows this list.
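
By way of illustration only (the parameter syntax below is 11g; as noted above, part 3 explains why trading durability for speed like this is dangerous):

  commit write nowait;                        -- asynchronous commit for a single transaction

  alter session set commit_wait = 'NOWAIT';   -- session-level equivalent (COMMIT_WRITE in 10gR2)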

The waiting client process does not do the I/O itself. LGWR does. When LGWR does the I/O it waits on ‘log file parallel write’ (LFPW). So does this mean that the number of LFPW events should equal the number of LFS events? NO! This is because LGWR employs a tactic known as ‘group write’ and may write out the redo data for multiple transactions and multiple sessions in one go. Suppose you have 5 client processes, each of which is waiting for its transaction to complete. These sessions will be waiting on LFS. LGWR will be waiting on LFPW. If the I/O takes 10ms then:

  • The system will accumulate 50ms of time in LFS (10ms for each client process). 5 wait events will be recorded.
  • LGWR will accumulate 10ms. One wait event will be recorded.

That is 6 wait events totalling 60ms clocked up for one 10ms I/O! You see the danger in oversimplifying?
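
If you want a rough feel for group write on your own system, the ratio of ‘user commits’ to ‘redo writes’ shows how many commits LGWR is servicing per write on average. A sketch (again, deltas over an interval are more meaningful than cumulative values):

  select round(max(decode(name, 'user commits', value)) /
               nullif(max(decode(name, 'redo writes', value)), 0), 2)
           as avg_commits_per_redo_write
    from v$sysstat
   where name in ('user commits', 'redo writes');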

In the simplified example above I have suggested that the average wait time for LFS should be about the same as the average wait time for LFPW, as they are both associated with waiting for the redo data to be written to disk. In fact, this is itself a misleading simplification. On to part 2.
