Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Looking at the prior/next version chaining, aside from the
> looping issue, isn't it broken by lock promotion too? There's a
> check in RemoveTargetIfNoLongerUsed() so that we don't release a
> lock target if its priorVersionOfRow is set, but what if the tuple
> lock is promoted to a page level lock first, and
> PredicateLockTupleRowVersionLink() is called only after that? Or
> can that not happen because of something else that I'm missing?

I had to ponder that a while. Here's my thinking.

Predicate locks only matter when there is a write. Predicate locks
on heap tuples only matter when there is an UPDATE or DELETE of a
locked tuple. The problem these links are addressing is that an
intervening transaction might UPDATE the tuple between the
read of the tuple and a later UPDATE or DELETE. We want the second
UPDATE to see that it conflicts with a read from before the first
UPDATE. The first UPDATE creates the link from the "before" tuple
ID to the "after" tuple ID at the target level. What predicate locks
exist on the second target are irrelevant when it comes to seeing
the conflict between the second UPDATE (or DELETE) and the initial
read. So I don't see where granularity promotion for locks on the
second target is a problem as long as the target itself doesn't get
deleted because of the link to the prior version of the tuple.
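
As a toy illustration of why the link matters (a sketch only, not the
actual predicate locking code): a write to the newest version walks back
through the prior-version links and conflicts if any earlier version was
read. `ToyTarget`, `read_locked` and `toy_write_conflicts` are made-up
names for this example; only `priorVersionOfRow` comes from the
discussion above. The walk assumes the chain has no cycles.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a predicate lock target. */
typedef struct ToyTarget {
    unsigned page;                        /* heap page of this tuple version */
    unsigned offset;                      /* line pointer within the page */
    bool read_locked;                     /* some transaction read this version */
    struct ToyTarget *priorVersionOfRow;  /* set by the UPDATE that created this version */
} ToyTarget;

/*
 * Returns true if a write to 'target' conflicts with a read of this
 * version or of any earlier version reachable through the links.
 * Assumes the chain is acyclic.
 */
static bool toy_write_conflicts(const ToyTarget *target)
{
    for (const ToyTarget *t = target; t != NULL; t = t->priorVersionOfRow) {
        if (t->read_locked)
            return true;
    }
    return false;
}
```

Note that the locks held on the newer target play no part in finding the
old read; what matters is that the newer target and its link survive.
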
Promotion of the lock granularity on the prior tuple is where we
have problems. If the two tuple versions are in separate pages then
the second UPDATE could miss the conflict. My first thought was to
fix that by requiring promotion of a predicate lock on a tuple to
jump straight to the relation level if nextVersionOfRow is set for
the lock target and it points to a tuple in a different page. But
that doesn't cover a situation where we have a heap tuple predicate
lock which gets promoted to page granularity before the tuple is
updated. To handle that we would need to say that an UPDATE to a
tuple on a page which is predicate locked by the transaction would
need to be promoted to relation granularity if the new version of
the tuple wasn't on the same page as the old version.
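
A minimal sketch of those two rules, assuming a three-level granularity
and a tuple ID made of a page number and an offset. All of the names here
(`ToyGranularity`, `ToyTid`, `toy_promote_tuple_lock`,
`toy_granularity_after_update`) are hypothetical and only meant to pin
down the proposal, not to show how it would actually be coded.

```c
#include <stddef.h>

typedef enum { TOY_TUPLE, TOY_PAGE, TOY_RELATION } ToyGranularity;

typedef struct {
    unsigned page;
    unsigned offset;
} ToyTid;

/*
 * Rule 1: when a tuple lock is promoted, go straight to relation level
 * if the next version of the row lives on a different page; otherwise
 * promote to page level as usual. next_version is NULL when the tuple
 * has not been updated.
 */
static ToyGranularity toy_promote_tuple_lock(ToyTid locked,
                                             const ToyTid *next_version)
{
    if (next_version != NULL && next_version->page != locked.page)
        return TOY_RELATION;
    return TOY_PAGE;
}

/*
 * Rule 2: when a tuple on a page-locked page is updated and the new
 * version lands on another page, the page lock must become a relation
 * lock. Any other case keeps the granularity already held.
 */
static ToyGranularity toy_granularity_after_update(ToyGranularity held,
                                                   ToyTid old_tid,
                                                   ToyTid new_tid)
{
    if (held == TOY_PAGE && new_tid.page != old_tid.page)
        return TOY_RELATION;
    return held;
}
```
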
That's all doable without too much trouble, but more than I'm
likely to get done today. It would be good if someone can confirm
my thinking on this first, too.

That said, the above is about eliminating false negatives from some
corner cases which escaped notice until now. I don't think the
changes described above will do anything to prevent the problems
reported by YAMAMOTO Takashi. Unless I'm missing something, it
sounds like tuple IDs are being changed or reused while predicate
locks are held on the tuples. That's probably not going to be
overwhelmingly hard to fix if we can identify how that can happen.
I tried to cover HOT issues, but it seems likely I missed something.
:-( I will be looking at it.

-Kevin