Bruce Momjian wrote:
> I must be missing something but I thought the only problem with our
> existing snapshot system was that you could see a row updated after
> your snapshot was created, and that the solution to that was to
> abort the transaction that would see the new row. Can you tell me
> what I am missing?
Well, that's roughly on target as a 30,000 foot overview, although I
think you're also mixing in an issue from the read committed
implementation.
The issue with read committed is that you start with one snapshot but
may wind up combining views of data from that initial snapshot with
one or more updated versions of rows from subsequent commits -- if
you blocked on a conflicting write which was subsequently committed.
(There's a rough sketch of that mixing below.)
I recently started a thread which drifted into a discussion of that
issue, but it's almost completely orthogonal to the issue of
implementing truly serializable transactions -- the overlap is that a
fix for the read committed anomalies might provide groundwork for an
optimization to the serializable implementation. After re-reading
the thread on the read committed issue and pondering it a bit more,
I'm inclined to think that we should worry about that if and when
serializable is working and we actually see problems that the
aforementioned optimization might fix. (Unless someone cares enough
about the read committed anomalies to want to champion that as a
separate issue.)
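For what it's worth, here's a toy Python model of the read committed
mixing I mean above. It's purely illustrative -- the names and
structures are made up and don't correspond to anything in the
backend:

# Toy model of the read committed mixing described above.  It just shows
# a statement combining row versions from two different snapshots.

# committed versions of each row: key -> list of (committing_xid, value)
versions = {
    "A": [(100, 1)],
    "B": [(100, 1), (200, 2)],  # B was updated and committed after our snapshot
}

snapshot_xmax = 150  # transactions with xid >= 150 are invisible to our snapshot

def visible(key):
    """Newest version of the row visible to our original snapshot."""
    return [v for xid, v in versions[key] if xid < snapshot_xmax][-1]

def read_committed_scan(blocked_on):
    """Read every row; for the one we blocked on, use the newest committed
    version (roughly the effect of the re-check after the other transaction
    commits), while the rest still come from the original snapshot."""
    return {key: (versions[key][-1][1] if key == blocked_on else visible(key))
            for key in versions}

print(read_committed_scan(blocked_on="B"))  # {'A': 1, 'B': 2}

The result reports A as of the original snapshot and B as of the
later commit -- a combination no single snapshot would ever show.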
What I'm trying to stay focused on is the serializable
implementation. I'd settle for the traditional Strict 2 Phase
Locking (S2PL) approach, but since there is a new technique which
appears to perform much better than that for most workloads, and has
most of the benefits of snapshot isolation (reads don't block writes
and vice versa), I figure why not go for the gold.
You are right that the new technique will basically work like
snapshot isolation, but will roll back a transaction when some
transaction both reads data which is modified by a concurrent
transaction and writes data which is read by a concurrent transaction
(assuming all involved are serializable). Years of research by Dr.
Alan Fekete et al. have established that snapshot isolation can show
behavior inconsistent with fully serializable behavior only when
there is a cycle of such read-write dependencies among transactions,
the graph contains a "pivot" transaction which has dependencies in
both directions (to the same or different transactions), and a
transaction on the write end of one of those edges commits first. The SSI
technique doesn't attempt to build and analyze a complete graph of
such dependencies because the overhead would be prohibitive; rather,
it looks for the "pivot" which must be present in such a graph.
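To show the shape of that test, here's an abstract sketch in Python.
It only illustrates the simplified rule -- it ignores the
commit-ordering refinement, and it is not a proposal for the actual
data structures. The point is that each serializable transaction only
needs to know whether it has a read-write dependency in each
direction with some concurrent transaction:

class Txn:
    def __init__(self, name):
        self.name = name
        self.in_conflict = False   # some concurrent txn read data this txn wrote
        self.out_conflict = False  # this txn read data a concurrent txn wrote

def record_rw_conflict(reader, writer):
    """Note that reader read a version which the concurrent writer replaced.
    Return a transaction to consider rolling back, if this completes a pivot."""
    reader.out_conflict = True
    writer.in_conflict = True
    for t in (reader, writer):
        if t.in_conflict and t.out_conflict:
            return t  # the pivot: conflicts in both directions
    return None

t1, t2, t3 = Txn("T1"), Txn("T2"), Txn("T3")
record_rw_conflict(t1, t2)           # T1 read something T2 then updated
pivot = record_rw_conflict(t2, t3)   # T2 read something T3 then updated
print(pivot.name)                    # T2 -- candidate for rollback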
The SSI technique requires that serializable transactions (but not
transactions in other isolation levels) use a non-blocking SIREAD
lock to track reads, so that the read-write dependencies among
concurrent serializable transactions can be identified. The hard
part is that this does require some implementation of predicate
locking to prevent problems with phantom rows. Honestly, that is
probably going to be over half the battle. I had been nibbling
around the edges of the serializable issue trying to address
optimizations which should help performance of serializable
transactions by reducing the rollback rate, but I came to the
conclusion that that was all premature optimization and a
distraction. I think the thing to do is start with the hard part, so
I've been reading up on practical techniques for implementing
predicate locking, and reading through the PostgreSQL locking code,
trying to get my head around this. Once this hardest part is solved,
I really think that a workable serializable mode is not too far past
it, and *then* would be the time to look at optimizations.
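To make the phantom problem concrete, here's a tiny Python sketch of
one common technique (range-based predicate locks on a hypothetical
table); again, this illustrates the idea, not the data structures I
expect to end up with. A read lock on existing rows can't catch an
insert of a row that would have matched the scan, but a lock on the
range the scan covered can:

from collections import defaultdict

# relation name -> list of (low_key, high_key, reading_xid)
predicate_locks = defaultdict(list)

def register_scan(relation, low, high, xid):
    """Record that xid read the predicate low <= key <= high (non-blocking)."""
    predicate_locks[relation].append((low, high, xid))

def check_write(relation, key, writer_xid):
    """Return xids of concurrent readers whose predicates cover the written key."""
    return [xid for low, high, xid in predicate_locks[relation]
            if low <= key <= high and xid != writer_xid]

register_scan("some_table", low=1, high=10, xid=5)      # xid 5 scanned keys 1..10
print(check_write("some_table", key=7, writer_xid=6))   # [5] -- a phantom caught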
It's a bit of a stretch to call SIREAD locks "locks", because they
don't actually block anything. They are used at various points
to see where a transaction is reading data which has been modified by
another transaction or vice versa. And they do need to be kept until
all concurrent transactions have completed. Other than those quirks,
they behave pretty much like read locks, though, so it seems to make
sense to use the locking system for them. The differences are such
that I thought a new lock method might be appropriate. This thread
is to try to solicit opinions on whether that makes sense to anyone
but me. :-)
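One more abstract sketch (made-up names again, not backend code),
this time of the retention quirk: a committed serializable
transaction's SIREAD locks can only be dropped once every transaction
that overlapped it has finished.

active = set()    # xids of transactions still running
retained = {}     # committed xid -> (its SIREAD locks, overlapping xids still running)

def start(xid):
    active.add(xid)

def finish(xid, siread_locks=None):
    """xid commits (passing its SIREAD locks) or rolls back (passing None)."""
    active.discard(xid)
    if siread_locks:
        # The locks must outlive every transaction that was running concurrently.
        retained[xid] = (siread_locks, set(active))
    # Any retained lock set whose last overlapping transaction just finished
    # can finally be released.
    for owner, (locks, overlap) in list(retained.items()):
        overlap.discard(xid)
        if not overlap:
            del retained[owner]

start(1); start(2)
finish(1, siread_locks={("some_table", 7)})
print(retained)   # T1's SIREAD locks are still held, because T2 overlapped it
finish(2)
print(retained)   # {} -- T2 is done, so T1's locks can finally be released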
Once I sort out the subject issue, I'm about ready to start on a
very rough prototype of predicate locking. I don't want
to start a discussion of those details on this thread, because it
seems to me that a decision on the subject issue affects significant
details about how I go about that.
Consider this the 5,000 foot overview. Previous threads contain much
more, and references to more authoritative documents.
This is not a small effort. I'll be surprised if it comes in at less
than 2,000 lines of code, plus documentation and tests. I'm not
really expecting it to make the next release after 8.5. With some
luck, maybe the next release after that. And that's all conditional
on our CIO approving programming time for the effort. I'm currently
trying to dig in far enough to provide more accurate estimates and
timelines to facilitate approval.
Hopefully I haven't just scared you off of your offer to help. ;-)
-Kevin