As the updated version of my last post stated, what Jonathan Lewis described as a “truly stunning observation” was uncovered: depending on your point of view, Oracle violates either the “I” in ACID or the “D”. I won’t repeat the technical description of the issue. This is already covered
here and here.
The purpose of this blog entry is to explain why this is such a critical issue. There are three main areas I want to touch on (in increasing order of importance):
- Giving humans invalid data
- Giving computers invalid data
- Legal and reputational issues
Let us begin with the human case. You call your broker late at night with instructions: “I hear that Oracle is not ACID. Please sell my 10 million pounds worth of Oracle stock!” Your broker says, “OK, I have submitted your instructions for processing. It might take a few minutes.” You reply, “I’ll hold. I am going on vacation tomorrow and want to make sure that this is done before I leave.” Your broker checks: “Let me query the system… oh yes, your instructions are definitely in our system. Your transaction to sell Oracle at best will be actioned tomorrow morning.” You feel relieved. However, the system crashed after the broker’s query returned confirmation that your transaction had been recorded, but before the transaction was on disk. Your request to sell Oracle stock is lost, and you return from vacation to find that no record of your request exists.
This sort of scenario has probably happened a few times as a result of Oracle’s lack of ACID. However, nobody would have known why. On the other hand, it will be a very rare event. Even an entire call centre will only make at most a few queries per second and most of these will be for data that was committed long ago. It will be a rare event that sees queried data lost on a crash.
If this sort of situation has ever happened to you, it is highly unlikely that Oracle’s lack of ACID was the cause. Bugs in application code, failed media recovery, and human error, amongst other causes, would account for the vast majority of such mishaps.
However, let us turn our attention to computer to computer communication. Consider the following situation. You process some kind of incoming data and place the data in a staging table. Let us assume that the table is partitioned by hash into 256 segments. You have 256 producer threads that:
- read data from their own partition
- enrich it in a few ways (including adding a primary key generated from a sequence)
- send the enriched data to an AQ queue
- delete the data from the staging table
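The producer loop above can be sketched as follows. This is a minimal in-memory simulation of the design, not Oracle code: `sequence`, `aq_queue` and `staging_partition` are illustrative stand-ins for the Oracle sequence, the AQ queue and one hash partition of the staging table.

```python
from itertools import count

sequence = count(1)                             # stands in for an Oracle sequence
aq_queue = []                                   # stands in for the AQ queue
staging_partition = [{"payload": "order-42"}]   # rows in this thread's partition

def produce_one(partition, queue):
    """One iteration of a producer thread: read, enrich, enqueue, delete."""
    row = partition.pop(0)          # read (and delete) a row from own partition
    row["pk"] = next(sequence)      # enrich: primary key generated from the sequence
    queue.append(row)               # send the enriched data to the queue
    # the local COMMIT would happen here, covering all three steps atomically

produce_one(staging_partition, aq_queue)
```

The key detail for what follows is that the primary key is generated during enrichment, so reprocessing the same staging row yields a different key.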
Meanwhile, you have another group of sessions that read messages from the AQ queue and send data to a remote system that you have no control over. Your consumer threads:
- destructively read a message from the AQ queue
- insert the data into the remote system (using the remote system's transaction manager)
- commit the remote transaction
- commit the local transaction
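One consumer iteration, in the order listed above, can be sketched like this. Again `queue` and `remote_table` are hypothetical in-memory stand-ins, not Oracle or remote-system APIs; the commits are shown as comments because the ordering, not the mechanism, is the point.

```python
def consume_one(queue, remote_table):
    """One iteration of a consumer thread, in the order the steps describe."""
    msg = queue.pop(0)                    # 1. destructive read from the AQ queue
    if msg["pk"] not in remote_table:     # remote side discards resent duplicates
        remote_table[msg["pk"]] = msg     # 2. insert into the remote system
    # 3. commit the remote transaction
    # 4. commit the local transaction last, so a crash between the two
    #    commits causes at worst a redelivery, never a lost message

queue = [{"pk": 7, "payload": "order-42"}]
remote = {}
consume_one(queue, remote)
```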
You are a smart guy or gal and realise that there is an issue with your code: the system may crash after the remote transaction has committed but before the local one does. This means that after a recovery the message will be back in AQ and will be sent again. However, there is a unique primary key in the AQ message, and the remote system is built in such a way as to detect these “possible duplicates” and discard the message when it is resent.
Wonderful, but because of the failure of durability it is possible that not only the uncommitted consumer’s transaction is lost on a crash but also the “committed” producer’s transaction! On recovery, the message would not be in the AQ queue but back in the staging table. When the message is resent it will have a brand new primary key, and the remote system will not detect the duplicate.
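The two crash scenarios can be contrasted in a few lines. This is a simulation under the assumptions above (message dicts with a `pk` field, a remote table that discards any primary key it has already seen); it shows why the duplicate check survives the benign crash but not the lost producer commit.

```python
def deliver(messages, remote_table):
    """Replay a message stream against a remote system that discards
    any primary key it has already seen (the duplicate check above)."""
    for msg in messages:
        if msg["pk"] not in remote_table:
            remote_table[msg["pk"]] = msg

# Benign crash: only the local commit is lost, so the message is
# redelivered from AQ with the SAME primary key.
remote_a = {}
deliver([{"pk": 1, "payload": "sell ORCL"},
         {"pk": 1, "payload": "sell ORCL"}], remote_a)
assert len(remote_a) == 1   # duplicate detected and discarded

# Durability failure: the "committed" producer transaction is lost, the
# row lands back in the staging table and is re-enriched with a NEW key.
remote_b = {}
deliver([{"pk": 1, "payload": "sell ORCL"},
         {"pk": 2, "payload": "sell ORCL"}], remote_b)
assert len(remote_b) == 2   # the same order is applied twice, undetected
```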
There are lots of systems where the log writer is the bottleneck, perhaps because of replication to a DR site or for other reasons. In these sorts of heavily loaded systems it is virtually guaranteed that at the time of a crash there will be some transactions sitting in the log buffer waiting to be written out to disk. When automated tasks are processing transactions immediately after they have been committed, recovery will create a problem nearly every time.
What is so bad about this situation is that there is no easy way, short of a complete redesign, to fix the application to work around this problem, as Oracle provides no way for a session (other than the committing one) to wait for the data it is seeing to reach disk. Sure, the consumer threads could wait a few seconds before sending the data to the remote host, but this would kill performance.
It is the potentially high rate that consumer threads can read “undurable data” that makes the probability of an error high in these situations.
The final, and most important, area is the potential legal and reputational issues that this topic raises. It will be very difficult for Oracle to claim that this is a documentation error and amend the documentation to say that Oracle is NOT ACID. On the other hand, it is very difficult to see how they can treat this as a simple “bug”, as it is a fundamental part of the design of the product and any “bug fix” would seem to have massive performance implications.