Tony’s Oracle Tips

August 24, 2011

ACID and TPC-C

Filed under: ACID — tonyhasler @ 7:53 pm

I saw Tom Kyte yesterday and he stated that in his opinion the ACID issue had no bearing on TPC-C tests.  Well, I beg to differ.  The current specification can be viewed in full here, but these are the three key parts.

First from section 3.4 on isolation requirements:

P1 (“Dirty Read”): Database transaction T1 modifies a data element. Database transaction T2 then reads that data element before T1 performs a COMMIT. If T1 were to perform a ROLLBACK, T2 will have read a value that was never committed and that may thus be considered to have never existed.

A little later on it says that dirty reads are not allowed, as you might expect:

Isolation Level   P0             P1             P2             P3
0                 Not Possible   Possible       Possible       Possible
1                 Not Possible   Not Possible   Possible       Possible
2                 Not Possible   Not Possible   Not Possible   Possible
3                 Not Possible   Not Possible   Not Possible   Not Possible

skipping ahead a bit…

2. For {Ti, Tn} with 1 ≤ i ≤ 4, phenomena P0, P1 and P2 must not be possible: that is, Ti requires Level 2 isolation for New-Order, Payment, Delivery, and Order-Status transactions relative to any arbitrary transaction.

Now, in section 3.5.2 we have the committed property definition:

A transaction is considered committed when the transaction manager component of the system has either written the log or written the data for the committed updates associated with the transaction to a durable medium.

Therefore, Oracle violates the isolation property: by the specification’s own definition a transaction is not committed until its log or data has been written to a durable medium, yet Oracle makes such not-yet-committed data visible to other transactions, so “Dirty Reads” are not prevented.  Specifically, the read-only Order-Status transaction may see changes made by the other read-write transactions listed above before those changes are durable.

August 22, 2011

Why is ACID important?

Filed under: ACID — tonyhasler @ 9:34 am

As the updated version of my last post stated, what Jonathan Lewis described as a “truly stunning observation” was uncovered: depending on your point of view, Oracle violates either the “I” in ACID or the “D”.  I won’t repeat the technical description of the issue; it is already covered here and here.

The purpose of this blog entry is to explain why this is such a critical issue.  There are three main areas I want to touch on (in increasing order of importance):

– Giving humans invalid data
– Giving computers invalid data
– Legal and reputational issues

Let us begin with the human case.  You call your broker late at night with instructions: “I hear that Oracle is not ACID.  Please sell my 10 million pounds’ worth of Oracle stock!”  Your broker says, “OK, I have submitted your instructions for processing.  It might take a few minutes.”  You say, “I’ll hold.  I am going on vacation tomorrow and want to make sure that this is done before I leave.”  Your broker replies: “Let me query the system to check… oh yes, your instructions are definitely in our system.  Your transaction to sell Oracle at best will be actioned tomorrow morning.”  You feel relieved.  However, the system crashed after the broker was sent confirmation that your transaction was processed but before the transaction was on disk.  Your request to sell Oracle stock is lost, and you return from vacation to find that no record exists of your request.

This sort of scenario has probably happened a few times as a result of Oracle’s lack of ACID, although nobody would have known why.  On the other hand, it will be a very rare event.  Even an entire call centre will make at most a few queries per second, and most of those will be for data that was committed long ago, so only rarely will queried data be lost in a crash.

If this sort of situation has ever happened to you, it is highly unlikely that Oracle’s lack of ACID was the cause.  Bugs in application code, media recovery and human error, amongst other causes, would account for the vast majority of such mishaps.

However, let us turn our attention to computer-to-computer communication.  Consider the following situation.  You process some kind of incoming data and place the data in a staging table.  Let us assume that the table is partitioned by hash into 256 segments.  You have 256 producer threads that (see the sketch after the list):

– read data from their own partition
– enrich it in a few ways (including adding a primary key generated from a sequence)
– send the enriched data to an AQ queue
– delete the data from the staging table
– commit.
– loop
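
A minimal PL/SQL sketch of one producer loop follows.  Everything named here is an assumption for illustration only: the queue STAGING_Q, the payload object type staging_msg_t, the sequence msg_seq, the staging column raw_data and the partition name p001 are not part of the original design.

DECLARE
   enq_opts    DBMS_AQ.enqueue_options_t;
   msg_props   DBMS_AQ.message_properties_t;
   msg_id      RAW (16);
BEGIN
   LOOP
      FOR r IN (SELECT ROWID rid, raw_data
                  FROM staging PARTITION (p001))   -- this thread's own partition
      LOOP
         -- Enrich; the only enrichment shown is the sequence-generated key
         DBMS_AQ.enqueue (
            queue_name           => 'STAGING_Q',
            enqueue_options      => enq_opts,
            message_properties   => msg_props,
            payload              => staging_msg_t (msg_seq.NEXTVAL, r.raw_data),
            msgid                => msg_id);

         DELETE FROM staging
               WHERE ROWID = r.rid;

         COMMIT;   -- the enqueue and the delete become "durable" together
      END LOOP;
   END LOOP;
END;
/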

Meanwhile, you have another group of sessions that read messages from the AQ queue and send the data to a remote system that you have no control over.  Your consumer threads (again sketched below):

– Destructively read a message from the AQ queue
– Insert the data into the remote system (using the remote system’s transaction system)
– Commit the remote transaction
– Commit the local transaction.
– loop
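
Again purely as a sketch: remote_api below is a stand-in for whatever interface the remote system actually exposes; it is not a real package.

DECLARE
   deq_opts    DBMS_AQ.dequeue_options_t;
   msg_props   DBMS_AQ.message_properties_t;
   msg         staging_msg_t;
   msg_id      RAW (16);
BEGIN
   LOOP
      DBMS_AQ.dequeue (
         queue_name           => 'STAGING_Q',
         dequeue_options      => deq_opts,   -- default mode REMOVE: a destructive read
         message_properties   => msg_props,
         payload              => msg,
         msgid                => msg_id);

      remote_api.insert_row (msg);           -- hypothetical call to the remote system
      remote_api.commit_remote;              -- the remote transaction commits first

      COMMIT;                                -- then the local dequeue is committed
   END LOOP;
END;
/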

You are a smart guy or gal and realise that there is an issue with your code: the system may crash after the remote transaction has committed but before the local one does.  This means that after a recovery the message will be back in the AQ queue and will be sent again.  However, there is a unique primary key in the AQ message, and the remote system is built in such a way as to detect these “possible duplicates” and discard the message when it is resent, as sketched below.
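
On the remote side the duplicate detection might be as simple as a unique constraint on the message key; the table and bind names here are, once more, hypothetical.

BEGIN
   INSERT INTO received_messages (msg_id, enriched_data)
        VALUES (:msg_id, :enriched_data);
EXCEPTION
   WHEN DUP_VAL_ON_INDEX
   THEN
      NULL;   -- a "possible duplicate": already processed, so discard it
END;
/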

Wonderful, but because of the failure of ACID it is possible that not only the uncommitted consumer’s transaction is lost on a crash but also the “committed” producer’s transaction!  On recovery, the message would not be in the AQ queue but back in the staging table.  When the message is enriched and sent again it will have a brand new primary key, and the remote system will not detect the duplicate.

There are lots of systems where the log writer is the bottleneck, perhaps because of replication to a DR site or for other reasons.  In these sorts of heavily loaded systems it is virtually guaranteed that at the time of a crash there will be some transactions sitting in the log buffer waiting to be written out to disk.  When automated tasks process transactions immediately after they have been committed, recovery will create a problem nearly all the time.

What is so bad about this situation is that there is no easy way, short of a complete redesign, to fix the application to work around this problem, as Oracle provides no way for a session (other than the committing one) to wait for the data it is seeing to be on disk.  Sure, the consumer threads could wait a few seconds before sending the data to the remote host, but this would kill performance.

It is the potentially high rate at which consumer threads can read “undurable” data that makes the probability of an error high in these situations.

The final, and most important, area is the potential legal and reputational issues that this topic raises.  It will be very difficult for Oracle to claim that this is a documentation error and amend the documentation to say that Oracle is NOT ACID.  On the other hand, it is very difficult to see how they can treat this as a simple “bug”, as it is a fundamental part of the design of the product and any “bug fix” would seem to have massive performance implications.

August 4, 2011

‘log file sync’ and ‘log file parallel write’ – part 3

Filed under: ACID — tonyhasler @ 5:25 pm

In the third and final part of this series on ‘log file sync’ and ‘log file parallel write’ (see here for part 1 and part 2) I want to talk about the difference between a commit statement issued from PL/SQL and a commit statement issued from SQL.

When I first wrote this blog the intention was to discuss the few circumstances in which a COMMIT from PL/SQL might impact ACID properties and to explain why 99% of the time it didn’t.  At the time, the assumption was that ACID properties could confidently be assigned to standard COMMIT statements executed from SQL (e.g. SQL*PLUS, JDBC, etc.).  As you will see in the comments to this blog, this is far from the case.  I have annotated the false assumption in bold.

The first thing to get your head round is that PL/SQL language statements are different from SQL statements. When you issue a COMMIT statement from SQL*PLUS, from Java, C, or almost all other languages you need to refer to the “SQL Language Reference” manual for syntax and semantics. When you issue a COMMIT statement (or one of the other seven SQL statements with direct PL/SQL equivalents) from PL/SQL, however, you need to look in “PL/SQL Language Reference” manual for syntax and semantics. Sure, they do very similar things but there are subtle differences.

In the 11.2 SQL Language Reference manual, you will see this statement:

If you omit [the write clause], then the behavior of the commit operation is controlled by the COMMIT_LOGGING and COMMIT_WAIT initialization parameters, if they have been set.

The COMMIT_LOGGING and COMMIT_WAIT parameters replace the COMMIT_WRITE parameter used in previous releases.  Unfortunately, the documentation for the new parameters is at best incomplete and at worst wrong.  To see what the default semantics are we need to look at the documentation for the deprecated COMMIT_WRITE parameter.  Here we can see:

If this parameter is not explicitly specified, then database commit behaviour defaults to writing commit records to disk before control is returned to the client.

We also see this line:

Be aware that the NOWAIT option can cause a failure that occurs after the database receives the commit message, but before the redo log records are written. This can falsely indicate to a transaction that its changes are persistent. Also, it can violate the durability of ACID (Atomicity, Consistency, Isolation, Durability) transactions if the database shuts down unexpectedly.

This is poor English but obviously sounds scary. Most programmers would and should avoid using NOWAIT in their programs or overriding the default session or system initialisation settings. However, let us look at the documentation in the PL/SQL reference manual:
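
For reference, this is what the explicit forms of the write clause look like from SQL, together with the corresponding session-level parameter settings; this is purely an illustration of the syntax under discussion.

COMMIT WRITE WAIT;     -- return only once the redo is on disk (the documented default)
COMMIT WRITE NOWAIT;   -- return immediately; durability is no longer guaranteed

-- The session-level equivalents of the dangerous behaviour:
ALTER SESSION SET commit_wait = 'NOWAIT';
ALTER SESSION SET commit_logging = 'BATCH';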

The default PL/SQL commit behaviour for nondistributed transactions is BATCH NOWAIT if the COMMIT_LOGGING and COMMIT_WAIT database initialization parameters have not been set.

Whoa! Suddenly the behaviour that you were warned might cause a violation of the durability of durability (that is what it says, isn’t it?) is now the default behaviour! Is something that is so dangerous from SQL suddenly safe from PL/SQL, and if so, why?

Well, as we shall see it isn’t always safe. But to understand what is safe and what is not safe let me try to expand on the scary documentation from the SQL language reference manual. Consider this piece of pseudo-code from some non-PL/SQL program:

   do some DML;
   commit;
   put_on_screen('Transaction complete') ;

Imagine that the commit returned immediately without waiting for the data to be recorded on disk. You might tell the user that the transaction committed and then the database might crash causing all the DML to be lost. You have misled the user. What if, however, the ‘user’ was another computer system or database?

    get a message from a remote system;
    do some DML;
    commit;
    send confirmation of receipt of message to remote system;

Well, a human might (or might not) realise that there was a crash around the time his message was displayed saying that his transaction completed but a remote computer system will not be so paranoid. It is highly unlikely that such a remote application would retransmit a message if it receives an explicit positive acknowledgement to the first transmission.

So this is why the default behaviour for SQL is to wait until the transaction details are recorded in the redo log on disk: it ensures that if the application relays the fact that the transaction is committed to a third party (human or computer), the transaction isn’t later discovered to be ‘uncommitted’, so to speak.  This is the “durability” part of ACID.

Now that we have that sorted out let us turn our attention to PL/SQL. First of all, the documentation here is “incomplete”. Let me try and expand:

  • For all but the last commit statement executed before PL/SQL returns control to the caller, the default PL/SQL commit behaviour for nondistributed transactions is BATCH NOWAIT if the COMMIT_LOGGING and COMMIT_WAIT database initialization parameters have not been set.
  • The default behaviour for the last commit statement executed before PL/SQL returns control to the caller is to trigger a wait at the point control is returned to the caller.
  • Sometimes you will experience a ‘log file sync’ wait event even if you explicitly or implicitly specify NOWAIT.

Ok, so the semantics are not only apparently dangerous but also complicated. Why?

Well, let us take an example PL/SQL block and add some measurements of ‘log file sync’ wait events.  Let us begin by creating a test table:

CREATE TABLE t1 (dummy VARCHAR2(100));

Now our test:

 SELECT total_waits
  FROM v$session_event
 WHERE sid = USERENV ('SID') AND event = 'log file sync';

BEGIN
   INSERT INTO t1 (dummy)
        VALUES ('Transaction 1');

   COMMIT;

   INSERT INTO t1 (dummy)
        VALUES ('Transaction 2');

   COMMIT;
END;
/

SELECT total_waits
  FROM v$session_event
 WHERE sid = USERENV ('SID') AND event = 'log file sync';

We see only one ‘log file sync’. This is the one triggered at the end of the block by the second commit. The first commit triggered no wait. However, no risk of a consistency problem arises. This is because:

  • Locking will prevent either inserted row from being visible to outside sessions until the redo is actually on disk, and

The statement above is how databases in general are supposed to work.  Oracle does not work this way, however: commits from PL/SQL or SQL are visible to other sessions before they are actually on disk.  See the comments below for more discussion.

  • The second transaction will have a higher SCN than the first and in the event of a crash Oracle will ensure that if the first transaction needs to be rolled back the second will be also.

This can be a bit tricky to get your head round but it is a legitimate optimisation that I believe (historians are welcome to correct me) was first introduced in IMS Fastpath many decades ago and has been copied by Oracle and other database systems subsequently.

When I have explained this to other DBAs and programmers over the years, I have known that the point has been understood when I get the question:

“Yes, but what if I communicate the commit outside of the PL/SQL block by some means other than updating the database”?

This question indicates a grasp of the key concept. Unfortunately, I have always answered this question incorrectly! Let us get one easy case out of the way first:

BEGIN
   INSERT INTO t1 (dummy)
        VALUES ('Transaction 1');

   COMMIT;
   DBMS_OUTPUT.put_line ('Transaction committed');
END;
/

This is actually safe because the results of DBMS_OUTPUT.PUT_LINE are invisible until after the PL/SQL block returns. So if the supposedly committed transaction is subsequently rolled back nobody will ever have known that the commit statement returned to the PL/SQL block in the first place.

But what about this:

SELECT total_waits
  FROM v$session_event
 WHERE sid = USERENV ('SID') AND event = 'log file sync';

DECLARE
   filehandler   UTL_FILE.file_type;
BEGIN
   INSERT INTO t1 (dummy)
        VALUES ('Transaction 1');

   COMMIT;

   INSERT INTO t1@dblink (dummy)
        VALUES ('Transaction 2');

   COMMIT;
   filehandler :=
      UTL_FILE.fopen (
                      'DATA_PUMP_DIR',
                      'test_file.txt',
                      'W'
                     );
   UTL_FILE.putf (
                  filehandler,
                  'A transaction committed\n'
                 );
   UTL_FILE.fclose (filehandler);

   INSERT INTO t1 (dummy)
        VALUES ('Transaction 3');

   COMMIT;
END;
/

SELECT total_waits
  FROM v$session_event
 WHERE sid = USERENV ('SID') AND event = 'log file sync';

You might have to run this a couple of times, as there is sometimes an ‘extra’ ‘log file sync’ wait event, but most of the time there will be only one wait for the entire PL/SQL block.  The use of a database link doesn’t mean there is a distributed transaction, because there are no changes to the local database in that transaction.  If the local database crashes, neither the write to the text file nor the update to the remote database will be “rolled back”, so an inconsistency can arise after all.

Until I started to prepare for writing this blog I had always assumed that whenever a remote database was updated or a package such as UTL_FILE or UTL_HTTP was called, a ‘log file sync’ would occur at that point.  This is such a fundamental requirement of database consistency that it never occurred to me to check.  Not only that, but the PL/SQL engine knows enough to do this easily.  However, even in 11.2 this is not the case.  You can protect against these inconsistencies by changing your PL/SQL COMMIT statements (the last being optional) to ‘COMMIT WORK WRITE WAIT’ statements, as in the fragment below.
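
Applied to the earlier example, the fix is a one-clause change.  This is just the first part of that block with the commit made explicit:

DECLARE
   filehandler   UTL_FILE.file_type;
BEGIN
   INSERT INTO t1 (dummy)
        VALUES ('Transaction 1');

   -- Wait for the redo to reach disk BEFORE communicating with the
   -- outside world; a plain COMMIT here would default to BATCH NOWAIT
   COMMIT WORK WRITE WAIT;

   filehandler :=
      UTL_FILE.fopen ('DATA_PUMP_DIR', 'test_file.txt', 'W');
   UTL_FILE.putf (filehandler, 'A transaction committed\n');
   UTL_FILE.fclose (filehandler);
END;
/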

So when is it safe to use COMMIT WORK WRITE NOWAIT (either explicitly or not)?

  • If the only thing you do between ‘commit number one’ and ‘commit number two’ is issue queries and/or updates to the local database it is safe for ‘commit number one’ to use NOWAIT. This is true both for PL/SQL and SQL.
  • If you commit a transaction and then your program terminates completely then it is safe to use NOWAIT. This is true for both SQL and PL/SQL.
  • If you commit a transaction and then ‘communicate with the outside world’ then you should wait (explicitly in PL/SQL) if you care about consistency.
