Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>> "Markus Wanner" <markus(at)bluegap(dot)ch> wrote:
>>
>>> Another line of thought: isn't this like READ COMMITTED for just
>>> the first operation in a SERIALIZABLE transaction?
>>
>> I've mulled it over and I have two different logical proofs that
>> this is safe; if anyone is dubious I'd be happy to share.
>>
>> This seems likely to be of significant benefit in some workloads,
>> and I can't see anywhere that it is likely to cost much. Any
>> objections to adding this to the TODO list as a performance item?
>
> I thought you concluded two emails ago that it wouldn't work for
> PG? It's certainly not clear to me what exactly the TODO would
> be.

Tom's emails had me pretty convinced that this technique wouldn't
work in PostgreSQL, but Markus put a fresh perspective on it which
makes it seem relatively painless.  (Although, as is often the
case, my perspective may be naive.)

Basically, in a SERIALIZABLE transaction, if the first statement
requiring a snapshot would currently fail with "ERROR: could not
serialize access due to concurrent update", we would instead get a
fresh snapshot and retry the statement -- which is what we already
do in a READ COMMITTED transaction.
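
To make that concrete, here is a sketch of the behavior in
question using two sessions; the "accounts" table and the values
are invented for illustration:

  -- Session A:
  BEGIN;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;

  -- Session B:
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  UPDATE accounts SET balance = balance + 100 WHERE id = 1;
  -- ... blocks waiting on session A's row lock ...

  -- Session A:
  COMMIT;

  -- Session B currently fails with:
  --   ERROR: could not serialize access due to concurrent update
  -- Under the proposal, since this UPDATE is the first statement
  -- in session B to need a snapshot, session B would instead take
  -- a fresh snapshot and retry the UPDATE, as READ COMMITTED does.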
One way of looking at this is that any transaction which fails with
a serialization error can be retried with a reasonable chance of
success. There is no evidence of anything wrong with the
transaction itself, just that its actions conflicted with those of a
concurrent transaction. For the case we're discussing, that other
transaction has now committed. (We blocked waiting to see whether
it would commit or roll back.) If this is the first statement which
needed a snapshot, retrying it with a new snapshot can't create any
conflicting views of the data.  We *could* view this as a sort of
automatic transaction retry in the limited situations where the
database engine can determine what to do.  (If there had been prior
statements, you can't really know that the client would have issued
the current statement had the prior statements been run against a
different snapshot.)  Where this view of things is a little off is
that explicit locks obtained earlier in the transaction would still
be held; we're not really starting the *whole* transaction over.
While this doesn't seem a fatal flaw, it does mean the other way of
looking at it is the more technically correct one.
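
Put another way, today the client has to notice SQLSTATE 40001 and
redo the transaction itself.  Continuing the hypothetical sessions
above, the manual retry looks like this, and in this limited case
the engine could do the same thing on the client's behalf:

  -- Session B, after the serialization failure:
  ROLLBACK;
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  UPDATE accounts SET balance = balance + 100 WHERE id = 1;
  COMMIT;
  -- Because the failed UPDATE was the first snapshot-using
  -- statement, the transaction had not yet exposed any data from
  -- the old snapshot, so an automatic retry is indistinguishable
  -- from this manual one.
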
The other way of looking at it is that until a statement succeeds
with a given snapshot, you have not fixed your snapshot for the
serializable transaction. A retry similar to what we do for READ
COMMITTED would just be part of obtaining the one snapshot used for
the SERIALIZABLE transaction -- it isn't fixed until that first
statement succeeds.
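
Note that even today the snapshot is not taken at BEGIN, but at the
first statement which needs one; that is the gap this proposal
exploits.  A small illustration, again with the hypothetical table:

  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  -- No snapshot has been taken yet; LOCK TABLE doesn't need one.
  LOCK TABLE accounts IN SHARE MODE;
  -- The first snapshot-using statement fixes the snapshot:
  SELECT sum(balance) FROM accounts;
  -- From here on, concurrent commits are invisible to this
  -- transaction.
  COMMIT;
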
I'm assuming that this could be a fairly small change because we
already have code to do exactly the right thing for READ COMMITTED
transactions.  The logic which chooses how to handle the commit of
a transaction holding a competing lock would need to be modified to
follow the READ COMMITTED path for the first statement which
obtains a snapshot in a SERIALIZABLE transaction, and the snapshot
for a SERIALIZABLE transaction would not be "fixed" until the
completion of the first statement needing a snapshot.

-Kevin