Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 14.02.2012 04:57, Dan Ports wrote:
>> Looking over the SSI 2PC code recently, I noticed that I
>> overlooked a case that could lead to non-serializable behavior
>> after a crash.
>>
>> When we PREPARE a serializable transaction, we store part of the
>> SERIALIZABLEXACT in the statefile (in addition to the list of
>> SIREAD locks). One of the pieces of information we record is
>> whether the transaction had any conflicts in or out. The problem
>> is that that can change if a new conflict occurs after the
>> transaction has prepared.
>>
>> I discussed this a bit with Kevin and we agreed that this is
>> important to fix, since it's a false negative that violates
>> serializability. The question is how to fix it. There are a
>> couple of options...
>>
>> The easiest answer would be to just treat every prepared
>> transaction found during recovery as though it had a conflict in
>> and out. This is roughly a one-line change, and it's certainly
>> safe.

Dan, could you post such a patch, please?

>> But the downside is that this is pretty restrictive: after
>> recovery, we'd have to abort any serializable transaction that
>> tries to read anything that a prepared transaction wrote, or
>> modify anything that it read, until that transaction is either
>> committed or rolled back.
>
> +1 for this solution.

+1 for 9.2 and backpatching this; with the notion that we might be
able to do better in some later release. (A TODO entry?)

Should we add anything to the docs to warn people that if they crash
with serializable prepared transactions pending, they will see this
behavior until the prepared transactions are either committed or
rolled back, either by the transaction manager or through manual
intervention?

> Perhaps it would be simpler to add the extra information to the
> commit records of the transactions that commit after the first
> transaction is prepared. In the commit record, you would include a
> list of prepared transactions that this transaction conflicted
> with. During recovery, you would collect those lists in memory,
> and use them at the end of recovery to flag the conflicts in
> prepared transactions that are still in prepared state.

That indeed seems simpler. I'm not even sure that you would need to
build a list and process it at the end; couldn't this be done as the
commit records are replayed? Keep in mind that if the prepared
transaction is not still pending, the information can be safely
ignored, and if it *is* still pending you don't need to know *which*
transaction it had the conflict with, because it will certainly have
committed before the start of any post-recovery transaction.
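As a rough illustration of applying the conflict information while
commit records are replayed, here is a toy model in Python. It is not
PostgreSQL code: the record layout, the replay() function, and the
idea that a commit record carries (prepared_xid, direction) pairs are
all assumptions made for the sketch. Only one flag per direction is
kept for each pending prepared transaction, and entries naming a
prepared transaction that is no longer pending are ignored.

```python
def replay(wal):
    """Replay a simplified WAL and return the state of the prepared
    transactions that are still pending at the end of recovery.

    Hypothetical record shapes (not PostgreSQL's real formats):
      ("PREPARE", xid, conflict_in, conflict_out)
      ("COMMIT_PREPARED", xid) / ("ROLLBACK_PREPARED", xid)
      ("COMMIT", xid, [(prepared_xid, "in" | "out"), ...])
    """
    pending = {}  # prepared xid -> {"conflict_in": bool, "conflict_out": bool}
    for rec in wal:
        kind = rec[0]
        if kind == "PREPARE":
            _, xid, conflict_in, conflict_out = rec
            pending[xid] = {"conflict_in": conflict_in,
                            "conflict_out": conflict_out}
        elif kind in ("COMMIT_PREPARED", "ROLLBACK_PREPARED"):
            # The prepared transaction is resolved; forget about it.
            pending.pop(rec[1], None)
        elif kind == "COMMIT":
            _, _xid, conflicts = rec
            for prepared_xid, direction in conflicts:
                state = pending.get(prepared_xid)
                if state is None:
                    # No longer pending: the information can be ignored.
                    continue
                # We only record *that* a conflict exists, not with whom.
                state["conflict_" + direction] = True
    return pending


wal = [
    ("PREPARE", 100, False, False),
    ("PREPARE", 101, False, False),
    ("COMMIT", 200, [(100, "in")]),
    ("COMMIT_PREPARED", 101),
    ("COMMIT", 201, [(101, "out")]),  # 101 already resolved: ignored
]
print(replay(wal))
```

In this model nothing needs to be accumulated for the end of
recovery; the flags are simply up to date once the last record has
been replayed.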
>> A third option is to observe that the only conflicts *in* that
>> matter from a recovered prepared transaction are from other
>> prepared transactions. So we could have prepared transactions
>> include in their statefile the xids of any prepared transactions
>> they conflicted with at prepare time, and match them up during
>> recovery to reconstruct the graph. This is a middle ground
>> between the other two options. It doesn't require modifying the
>> statefile after prepare. However, conflicts *out* to non-prepared
>> transactions do matter, and this doesn't record those, so we'd
>> have to do the conservative thing -- which means that after
>> recovery, no one can read anything a prepared transaction wrote.
>
> This would be fairly simple to do, but I'm not sure it's worth
> it, either. The nasty thing about this whole thing is precisely
> that no-one can read anything the prepared transaction wrote, so
> making the conflict-in bookkeeping more accurate doesn't seem very
> helpful.

Yeah, the benefit of this would be marginal without solving the
other side of the problem; but if we're adding TODO entries for this
area, perhaps they should be two separate entries, because either
side of this could be done without touching the other.

To summarize the above discussion, there is a bug that can be hit
when using both SSI and 2PC if a crash or shutdown occurs while any
serializable prepared transactions are pending and certain other
conditions are met. The proposed quick fix would be to cause a
serialization failure after recovery on any attempt by a
serializable transaction to read data written by a serializable
prepared transaction that was pending when a crash or shutdown
occurred, and on any attempt by a serializable transaction to do a
write which conflicts with a predicate lock acquired by such a
prepared transaction. This would tend to be more than a little
inconvenient until the prepared transactions pending at crash or
shutdown were all committed or rolled back. A more sophisticated
solution is available that could be implemented in 9.3 or later.
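To make the user-visible effect of the quick fix concrete, here is a
toy model in Python. It is not PostgreSQL code: RecoveredPrepared,
check_access(), and the representation of predicate locks as a plain
set of keys are assumptions made for the sketch. It models only the
consequence described above, not the conflict flags themselves.

```python
from dataclasses import dataclass


class SerializationFailure(Exception):
    """Raised when an access has to be cancelled to stay serializable."""


@dataclass(frozen=True)
class RecoveredPrepared:
    xid: int
    wrote: frozenset  # keys written before PREPARE
    read: frozenset   # keys covered by its predicate (SIREAD) locks,
                      # simplified here to exact keys


def check_access(pending, op, key):
    """Check one access by a post-recovery serializable transaction.

    pending maps xid -> RecoveredPrepared for prepared transactions
    that were pending at crash or shutdown and are still unresolved.
    """
    for p in pending.values():
        if op == "read" and key in p.wrote:
            raise SerializationFailure(
                f"read of {key!r} conflicts with prepared xid {p.xid}")
        if op == "write" and key in p.read:
            raise SerializationFailure(
                f"write of {key!r} conflicts with prepared xid {p.xid}")


pending = {100: RecoveredPrepared(100, frozenset({"a"}), frozenset({"b"}))}

check_access(pending, "read", "b")      # fine: nothing conflicts
try:
    check_access(pending, "write", "b")  # hits a predicate lock of xid 100
except SerializationFailure as e:
    print("failed:", e)

del pending[100]                         # COMMIT PREPARED / ROLLBACK PREPARED
check_access(pending, "write", "b")      # fine once xid 100 is resolved
```

The failures stop as soon as the prepared transaction is committed or
rolled back, which is why resolving such transactions promptly after
recovery matters.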
-Kevin