Robert Haas wrote:
> Kevin Grittner wrote:
>> Anyway, I could clean up all but that last issue in the old code.
>> I'm not sure whether that makes sense if you're refactoring it
>> anyway. Would you like me to look at the refactored code to
>> suggest fixes? Would you rather do it yourself based on my
>> answers here? Do we need to sort out that last issue before
>> proceeding on the others?
> I haven't a clue how to fix this. What I was doing was of course
> targeted toward 9.2, but I have half a thought that making
> index_getnext() call heap_hot_search_buffer() might be a sensible
> thing to do in 9.1, because code duplication = more bugs. On the
> third hand, at the moment the code that Heikki wrote to do that is
> tangled up in a bunch of other things that we almost certainly
> don't want to do in 9.1, and it's not obvious that it can be
> cleanly untangled, so maybe that's not the right idea after all.
>
> I think a good starting point might be to design a test case that
> fails with the current code, and come up with a plan for what to
> do about it. I have a very ugly feeling about this problem. I
> agree with your feeling that chasing down the update pointers
> isn't sensible, but a whole-relation lock seems like it could lead
> to a lot more rollbacks.

OK, will work on a test case for this last issue, but it might make
sense to address some of the other points separately first. For one
thing it might allow you to continue on with your 9.2 work with
clean tests. I can't do much on any of it today, as I have to deal
with some other things before being away for a week.

This is such a remote corner case that it would be really good if
we can limit the relation locks to cases where we're somewhere near
that corner. I've been trying to work out how to do that -- not
there yet, but I see some glimmers of how it might be done. The
nice thing about putting together a test case for something this
hard to hit is that it helps clarify the dynamics of the problem,
and solutions sometimes just pop out of it.

FWIW, so far what I know is that it will take an example something
like the one shown here:
http://archives.postgresql.org/pgsql-hackers/2011-02/msg00325.php
with the further requirements that the update in T3 must not be a
HOT update, T1 would still need to acquire a snapshot before T2
committed while moving its current select down past the commit of
T3, and that select would need to be modified so that it would scan
the visible tuple and then stop (e.g., because of a LIMIT) before
reaching the tuple which represents the next version of the row.
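To make the last two requirements concrete, here is a rough toy model
in Python. It is not PostgreSQL code and not the test case itself:
TupleVersion, visible(), limited_scan() and run_scenario() are names
made up for illustration, visibility is reduced to comparing commit
sequence numbers, and T2's own write is left out. It only shows that
after a non-HOT update a LIMITed scan under the old snapshot can
return the old row version and stop without ever touching the slot
that holds the new version.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TupleVersion:
    value: str
    xmin: int                   # commit number of the creating transaction
    xmax: Optional[int] = None  # commit number of the replacing transaction


def visible(t: TupleVersion, snapshot: int) -> bool:
    """Toy visibility rule: created at or before the snapshot and not
    replaced at or before it. Real visibility checks are far richer."""
    created = t.xmin <= snapshot
    replaced = t.xmax is not None and t.xmax <= snapshot
    return created and not replaced


def limited_scan(heap: List[TupleVersion], snapshot: int,
                 limit: int) -> Tuple[List[str], List[int]]:
    """Scan heap slots in order, stopping once `limit` rows are found.
    Returns the visible values and the slot positions actually read."""
    rows: List[str] = []
    touched: List[int] = []
    for pos, t in enumerate(heap):
        if len(rows) >= limit:
            break  # the LIMIT is satisfied; later slots are never read
        touched.append(pos)
        if visible(t, snapshot):
            rows.append(t.value)
    return rows, touched


def run_scenario(limit: int) -> Tuple[List[str], List[int]]:
    # The original row version was committed at commit number 1.
    heap = [TupleVersion("v1", xmin=1)]
    # T1 takes its snapshot here, before T2 commits (commit number 2).
    t1_snapshot = 1
    # T3 commits at 3 with a non-HOT update: the old version is marked
    # as replaced and the new version lands in a different slot.
    heap[0].xmax = 3
    heap.append(TupleVersion("v2", xmin=3))
    # T1's select runs only now, after T3's commit, under its old snapshot.
    return limited_scan(heap, t1_snapshot, limit)


if __name__ == "__main__":
    print(run_scenario(limit=1))  # scan stops after the visible tuple
    print(run_scenario(limit=2))  # scan goes on to the new version's slot
```

With limit=1 the scan reads only slot 0; with limit=2 it also reads
slot 1, where the new version lives, even though that version is not
visible to T1. The first case is the one that seems to matter here.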
-Kevin