Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date: 2021-11-11 20:25:18
Message-ID: CAH2-Wz=czwtPTE_RNjBWQ5mTA1WQ6zY41RK1q5dn2kwV8YjZGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thu, Nov 11, 2021 at 12:05 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > ...why is that super likely? I have to admit that I have no idea what
> > you mean by that.
>
> Err, I missed a *not* in there, sorry for the confusion.

Can you at least move the GlobalVisTestFor() call back? Move it from
lazy_scan_heap() to the point right before we call lazy_scan_heap(),
from heap_vacuum_rel()? That still seems like a big improvement, while
obviously having no real side-effects.

> > Okay. Even still, I suggest that you say something about this new
> > DELETE_IN_PROGRESS pruning behavior in vacuumlazy.c. Maybe in
> > lazy_scan_prune()?
>
> That doesn't quite make sense to me. There's quite a few ways that a HTSV can
> change for the same tuple, and listing them all in vacuumlazy.c doesn't really
> help people notice that when they add a new call or such. There's plenty
> other callsites - like the one in heap_prune_chain()...

Well, the retry in lazy_scan_prune() says that it only expects to have
to retry in the event of an aborted transaction. If nothing else, it
seems like you might want to amend that. If in fact the same
"DELETE_IN_PROGRESS becomes DEAD" issue applies (can't see why it
wouldn't).

> How about changing the comment above lazy_scan_prune() to note that the
> concurrent abort case is one example of this behaviour? And then add a comment
> to HeapTupleSatisfiesVacuum() along the lines of:
>
> NB: Repeated calls to HeapTupleSatisfiesVacuum for the same tuple may change
> even while the buffer is locked. Cases of this include:
> - the transaction that inserted a tuple (version) aborted, changing the result
> from HEAPTUPLE_INSERT_IN_PROGRESS to HEAPTUPLE_DEAD
> - when used with GlobalVisTestIsRemovableXid(), HEAPTUPLE_DELETE_IN_PROGRESS
> can change to HEAPTUPLE_DEAD if the horizon is updated
>
> However, HEAPTUPLE_DEAD will not change back to HEAPTUPLE_RECENTLY_DEAD or
> such.

Okay - sounds good.

> > Agreed. But at the very least we should be thinking about the
> > whole-page picture. If we made freezing a whole page cheap enough
> > (smaller WAL records), then maybe the separate all-visible state
> > becomes totally unnecessary (except for pg_upgrade).
>
> I doubt it. But I think getting rid of unnecessarily dirtying the page a
> second (or third) time would be a big enough win already...

Why do you doubt it? Do you suspect that the additional cost of
freezing the whole page (relative to just setting it all-visible)
cannot be reduced to the point where it can be ignored? That might be
true, but I'd like to know if that's what you meant. Because I can't
think of any other reason, and it's not like I haven't thought about
it a fair bit already.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Peter Geoghegan 2021-11-11 20:40:03 Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Previous Message Andres Freund 2021-11-11 20:05:17 Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum