From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum |
Date: | 2021-12-11 22:03:36 |
Message-ID: | CAH2-Wzm9_2s=_6eKm5bNfz6-X9QYmWPd2h=iF_1wBhZufQ4WQw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Fri, Dec 10, 2021 at 8:57 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> Given the lack of further discussion, let's go with the "minimal" fix and your
> stuff later. Attached is further polished patch - no significant changes,
> plenty copy-editing stuff.
Makes sense.
> 0001 is the bugfix
LGTM.
> 0002 adds the additional assertions - I'm wondering about only committing that
> in HEAD?
I don't think that it makes much difference, since these are only
assertions. I probably wouldn't bother with that myself, but I also
have zero concern about it breaking anything.
> I think your patch should easily be able to use prstate->htsv?
Let's get this committed, to 14 and HEAD. And then leave it to me to
rebase, as additional HEAD-only hardening.
> 0003 removes the redundant RelationGetNumberOfBlocks() calls. As 0001 does not
> change the total number of RelationGetNumberOfBlocks() calls I'm inclined to
> just apply this to HEAD?
Yeah, that can be part of my follow-up hardening patch, which (to
repeat) will be HEAD-only.
> 0004 is a patch that tries to address your point about the GlobalVisTestFor()
> placement, sort of
>
> My earlier point that moving it to earlier would be a bad idea because it'd
> make the horizon "older" was bogus - GlobalVisTestFor() doesn't itself do
> any horizon determination. However moving the GlobalVisTestFor() earlier
> still seems wrong - imo the vacuum_set_xid_limits() call should be moved to
> later. The attached patch moves both, wrapped in a new function, to just
> before the scan in lazy_scan_heap().
I think that this is the right direction, and I like the comments a
lot. But I don't like the idea of putting this initialization into
lazy_scan_heap(), for purely structural reasons. I plan to get rid of
the VacuumParams argument to lazy_scan_heap() completely, which would
make "VacuumParams setup" the sole responsibility of its caller,
heap_vacuum_rel(). The draft version of my "remove special cases to
advance relfrozenxid" patch (the one you've reviewed) already does
this. In general I really think it's important to make the state in
vacuumlazy.c as simple as possible.
I have no objection to delaying the lazy_scan_heap_limits() stuff
until right before lazy_scan_heap() is called. However, I do think
that we should always know certain basic "immutable" facts about a
VACUUM at the point that we call lazy_scan_heap(), which is not the
case with this patch.
Honestly, I'm surprised that you see much value in delaying the
lazy_scan_heap_limits() stuff until the very last microsecond. How
many microseconds could we possibly delay it by?
> I tried pretty hard to get the test to be reliable enough to commit it. But in
> the end I think using the index locking as a "control" mechanism just isn't
> great - as evidenced by 0003 breaking the test entirely. I think unfortunately
> there's not much hope for testing this unless we get wait points - which seems
> not too close :(.
Hopefully somebody will seriously commit to that at some point. It's
important, but not very glamorous.
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2021-12-11 22:51:35 | Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum |
Previous Message | Strahinja Kustudić | 2021-12-11 14:40:43 | Re: BUG #17330: EXPLAIN hangs and very long query plans |