> Heikki Linnakangas wrote:
> On 03.06.2011 23:44, Kevin Grittner wrote:
>> Heikki Linnakangas wrote:
>>
>>> I think you'll need to just memorize the lock deletion command in
>>> a backend-local list, and perform the deletion in a post-commit
>>> function.
>>
>> Hmm. As mentioned earlier in the thread, cleaning these up doesn't
>> actually have any benefit beyond freeing space in the predicate
>> locking collections. I'm not sure that benefit is enough to
>> justify this much new mechanism. Maybe I should just leave them
>> alone and let them get cleaned up in due course with the rest of
>> the locks. Any opinions on that?
>
> Is there a chance of false positives if oid wraparound happens and
> a new table gets the same oid as the old one? It's also possible
> for a heap to get the OID of an old index or vice versa, will that
> confuse things?

Some additional thoughts after working on the new code...

The risk we're trying to cover is that a table can be dropped while
committed transactions still hold predicate locks against it; those
locks won't be cleared until the commit of any still-active
overlapping transactions which could form a dangerous structure with
the committed transactions. If we just leave those predicate locks
to be cleaned up in the normal course of transaction completion, a
new object could be created with the old OID and, if a complicated
set of conditions is met, cause a serialization failure in a
transaction which would otherwise succeed. We can't clean the
predicate locks up immediately when the DROP TABLE appears to
succeed, because the enclosing transaction (or subtransaction) could
still be rolled back.
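
To make that concrete, the mechanism Heikki suggested would
presumably take a shape something like the sketch below. This is
illustration only -- DeferPredicateLockDrop and
DropAllPredicateLocksForRelation are made-up names, and a real
version would also need RegisterSubXactCallback handling so that a
rolled-back subtransaction discards only its own entries.

#include "postgres.h"
#include "access/xact.h"
#include "nodes/pg_list.h"
#include "utils/memutils.h"

/* Relation OIDs whose predicate locks should be dropped at commit. */
static List *pendingLockDrops = NIL;

/* Called from DROP TABLE processing; just remember the OID for now. */
static void
DeferPredicateLockDrop(Oid relid)
{
    /* The list must survive until end of transaction. */
    MemoryContext oldcxt = MemoryContextSwitchTo(TopMemoryContext);

    pendingLockDrops = lappend_oid(pendingLockDrops, relid);
    MemoryContextSwitchTo(oldcxt);
}

/* Registered via RegisterXactCallback(); runs at top-level xact end. */
static void
PredicateLockDropXactCallback(XactEvent event, void *arg)
{
    if (event == XACT_EVENT_COMMIT)
    {
        ListCell   *lc;

        /* The drop is durable now, so the stale locks can go. */
        foreach(lc, pendingLockDrops)
            DropAllPredicateLocksForRelation(lfirst_oid(lc));
    }

    if (event == XACT_EVENT_COMMIT || event == XACT_EVENT_ABORT)
    {
        /* Commit or abort, forget the pending list. */
        list_free(pendingLockDrops);
        pendingLockDrops = NIL;
    }
}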

That last bit seems pretty hard to dodge, because premature predicate
lock cleanup could lead to false negatives -- which would allow data
inconsistencies. So cleanup before commit is definitely out. But
some additional time meditating on the dangerous structure diagram
today has me wondering whether we might be OK skipping any cleanup
beyond what normally happens as transactions complete.

Tin - - -> Tpivot - - -> Tout

As mentioned above, this whole issue comes down to concern about
false positives when a transaction has read from a table and
committed before the table is dropped. We don't need to worry about
that for Tout -- its position in the dangerous structure is based
entirely on what it *writes*; it can't be responsible for the
read-and-commit before the table is dropped. Tpivot can't be the
transaction which commits before the DROP TABLE either -- since it
reads what Tout writes, and Tout has to be the first of the three to
commit, Tpivot cannot read and commit before the DROP TABLE and
still have Tout write afterward. That means the transaction which
did the read and commit would have to be Tin, and Tpivot would need
to be the transaction which later wrote to an object with the same
OID as the dropped table.
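
The reason OID reuse matters at all is that a relation-level
predicate lock target identifies its relation by OIDs and nothing
else; there is nothing tying the lock to a particular incarnation of
the relation. A simplified stand-in (the real PREDICATELOCKTARGETTAG
in predicate_internals.h packs these into generic fields):

#include "postgres.h"

/* Simplified stand-in for a relation-level predicate lock target. */
typedef struct ExampleRelationTargetTag
{
    Oid         dbOid;      /* database containing the relation */
    Oid         relOid;     /* pg_class OID of the relation */
} ExampleRelationTargetTag;

/*
 * If relation X is dropped and its OID is later reassigned to a new
 * relation Y, a stale lock on X and a fresh lock on Y carry identical
 * tags -- nothing here can tell them apart, which is exactly the
 * false-positive hazard in question.
 */
static bool
ExampleTargetTagsEqual(const ExampleRelationTargetTag *a,
                       const ExampleRelationTargetTag *b)
{
    return a->dbOid == b->dbOid && a->relOid == b->relOid;
}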

For a problem to manifest itself, this sequence must occur:

- The timing of transaction starts doesn't matter (beyond the
  read-only optimizations) as long as Tpivot overlaps both Tin and
  Tout.
- Tout commits first.
- Any time before Tin commits, it reads from relation X.
- Tin commits second.
- Tpivot develops a rw-conflict out to Tout on a relation *other
  than* X, any time after both have started and before Tpivot
  commits.
- Tpivot writes to a relation which has the same OID as relation X,
  but which isn't relation X. This requires that a new relation be
  created with the same OID which had been used by X, and that this
  new relation be visible to Tpivot -- which acquired its snapshot
  *before X was dropped*.

Is this possible? If a transaction gets its snapshot while OID N is
assigned to relation X, can that transaction wind up seeing OID N as
a reference to relation Y? If not, there aren't any false positives
possible.

Given the quantity and complexity of code I've needed to generate to
cover this, I'm wondering again whether it is justified *even if*
such a false positive is possible. It certainly seems like it would
be a *very* infrequent occurrence. There's also the question of what
shows in pg_locks, but I'm wondering whether even that is enough to
justify this much code.

Thoughts? Maybe I should submit a patch without the added complexity
of the scheduled cleanup, and we can discuss that as a separate
patch?

-Kevin