Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date: 2021-11-11 01:37:38
Message-ID: CAH2-Wz=QrJsLN3UhQ2KOXsdubhqfuNUQiJDjESubscPdaDT5eg@mail.gmail.com
Lists: pgsql-bugs

On Wed, Nov 10, 2021 at 4:47 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2021-11-10 13:04:43 -0800, Peter Geoghegan wrote:
> > On Wed, Nov 10, 2021 at 11:20 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > The way this definitely breaks - I have been able to reproduce this in
> > > isolation - is when one tuple is processed twice by heap_prune_chain(), and
> > > the result of HeapTupleSatisfiesVacuum() changes from
> > > HEAPTUPLE_DELETE_IN_PROGRESS to DEAD.
> >
> > I had no idea that that was now possible. I really think that this
> > ought to be documented centrally.
>
> Where would you suggest?

Offhand I'd say that it would be a good idea to add comments over the
call to vacuum_set_xid_limits() made from vacuumlazy.c.

You might also move the call to GlobalVisTestFor() out of
lazy_scan_heap(), so that it gets called right after
vacuum_set_xid_limits(). That would make the new explanation easier to
follow, since you are after all explaining the relationship between
OldestXmin (or the vacuum_set_xid_limits() call itself) and vistest
(or the GlobalVisTestFor() call itself).

Why do they have to be called in that order? Or do they? I noticed
that "make check-world" won't break if you switch the order.

I assume that you're going to want to say something about what needs
to happen in lazy_scan_prune() in these new comments -- since that is
where the relationship between these two things is most crucial.
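
To make the relationship between the two concrete, here is a toy model in
C. It is only a sketch under stated assumptions, not PostgreSQL code: the
ToyXid and ToyVisTest types and the toy_checks_disagree() function are
hypothetical names invented for illustration, XIDs are plain integers with
no wraparound handling, and the "refresh" step merely stands in for the
idea that vistest's horizon may advance during the VACUUM while OldestXmin
stays fixed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names exist in PostgreSQL. */
typedef uint32_t ToyXid;

typedef struct ToyVisTest
{
    ToyXid maybe_needed;    /* XIDs older than this are treated as removable */
} ToyVisTest;

/*
 * Returns true when a cutoff fixed at the start of the operation and a
 * visibility test that was refreshed later disagree about whether the
 * deleter of a tuple is old enough for the tuple to count as dead.
 *
 * Assumption: no XID wraparound, so "<" is a valid ordering.
 */
bool
toy_checks_disagree(ToyXid deleter_xid,
                    ToyXid horizon_at_start,
                    ToyXid horizon_later)
{
    /* Computed once and never changed (plays the role of OldestXmin). */
    ToyXid      oldest_xmin = horizon_at_start;

    /* Starts at the same point (plays the role of vistest). */
    ToyVisTest  vistest = { horizon_at_start };

    /* The visibility test may only move forward when it is refreshed. */
    if (horizon_later > vistest.maybe_needed)
        vistest.maybe_needed = horizon_later;

    bool dead_by_cutoff = deleter_xid < oldest_xmin;
    bool dead_by_vistest = deleter_xid < vistest.maybe_needed;

    return dead_by_cutoff != dead_by_vistest;
}
```

In this model, toy_checks_disagree(105, 100, 110) is true: the fixed cutoff
still treats deleter XID 105 as too recent, while the refreshed test treats
it as removable. Any comments added over these calls would need to spell
out how the real code copes with that kind of window.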

Thanks
--
Peter Geoghegan
