Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date: 2021-11-11 03:16:41
Message-ID: 20211111031641.kqj4w64klqt7ta2h@alap3.anarazel.de
Lists: pgsql-bugs

Hi,

On 2021-11-10 18:40:13 -0800, Peter Geoghegan wrote:
> On Wed, Nov 10, 2021 at 6:16 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Hm. To me all of this is more general than vacuum[lazy].c. Or even than
> > anything heap related.
>
> Here is a sensible compromise: put most of what you want to say
> wherever (I guess procarray.c), and then move the vacuumlazy.c call to
> GlobalVisTestFor() back, so that it comes immediately after the
> vacuum_set_xid_limits() call. Then place a few breadcrumb comments
> that reference the place in procarray.c that has the real discussion.

I don't really think it's a good idea to move it earlier. If anything I
would want to move it later (and eventually re-compute it occasionally, which
would be a *huge* win for long-running vacuums). We can't really do the same
with the hard horizons. It's currently super unlikely that the horizon moves
forward a lot between vacuum_set_xid_limits() and GlobalVisTestFor(), but
moving the two calls closer together is still going in the wrong direction.

So I'm inclined to add a comment saying that we need to do the
GlobalVisTestFor() call, pointing to the comment in ComputeXidHorizons(), and
to add a TODO that we should find a heuristic for when to recompute horizons?

> We could probably *also* freeze tuples opportunistically (e.g., freeze
> a few tuples on a page early to be able to mark it all-frozen sooner),

We should definitely do that when a page is already being dirtied for other
reasons.

Greetings,

Andres Freund
