Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date: 2021-11-13 17:06:48
Message-ID: CAH2-WzmeBU-WgDY_X=EAGkLf4bw8gPRB62HRvt++JHU3TsERgA@mail.gmail.com
Lists: pgsql-bugs

On Sat, Nov 13, 2021 at 7:05 AM Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
> I've got curious if modifying the Alexander's test case could reveal
> something interesting, and sprinkled it with savepoints and rollbacks.
> Almost immediately a new problem has manifested itself, although the
> crash has nothing to do with the disconnected tuples as far as I can
> tell -- still probably worth mentioning. In this case vacuum invoked
> lazy_scan_prune, and during the first scan one of the chains had a
> HEAPTUPLE_DEAD at the third position. The processing flow fell through
> to heap_prune_record_prunable and crashed on an assert with an
> InvalidTransactionId:

Is this just with the bugfix applied? I think that it is. Looks like a
minor bug to me.

I think that I need to consistently "break" in the DEAD case, to avoid
ending up here. In other words, it should not literally be
"reinterpreted" as RECENTLY_DEAD by falling through in the switch
statement (though the concept of reinterpreting certain DEAD tuples as
RECENTLY_DEAD still seems perfectly sound).

Here's why the assertion (invalid xmax/update xid cannot be used in
heap_prune_record_prunable() call) fails:

DEAD means that you might not have a valid update XID -- aborted
update is what we expect. But RECENTLY_DEAD means that there must have
been a deleter xact, and that the xact must have committed (can't have
been that the inserter aborted). This is a consequence of the fact
that the tuple is at least still visible to somebody (or could be),
unlike in the DEAD case. And so xmax must be a valid XID, and so the
existing RECENTLY_DEAD case handling can legitimately always expect
that. But I cannot (and should not) allow a call to
heap_prune_record_prunable() with a tuple that HTSV reports as DEAD,
even when I "reinterpret" it as RECENTLY_DEAD in order to cleanly
determine the tuple up to which the HOT chain should be deleted.
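
To make the switch statement point concrete, here is a rough sketch of
the shape I have in mind. It is a simplified standalone model, not the
actual pruneheap.c code: PruneSketch, TupleStatus, record_prunable()
and process_chain_member() are all made-up stand-ins, and only the
InvalidTransactionId/TransactionIdIsValid names are borrowed from the
real tree. The DEAD case sets the same flag that RECENTLY_DEAD sets,
but breaks before anything can pass an invalid xmax along.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins; this is a model, not PostgreSQL source. */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define TransactionIdIsValid(xid) ((xid) != InvalidTransactionId)

/* Hypothetical subset of the states that HTSV can report. */
typedef enum
{
    TUPLE_DEAD,
    TUPLE_RECENTLY_DEAD,
    TUPLE_LIVE
} TupleStatus;

/* Hypothetical per-chain bookkeeping. */
typedef struct
{
    bool          recent_dead;        /* chain member treated as RECENTLY_DEAD */
    bool          prunable_recorded;  /* a prunable XID was recorded */
    TransactionId prunable_xid;
} PruneSketch;

/* Models the assertion: only a valid XID may be recorded as prunable. */
void
record_prunable(PruneSketch *st, TransactionId xid)
{
    assert(TransactionIdIsValid(xid));
    st->prunable_recorded = true;
    st->prunable_xid = xid;
}

void
process_chain_member(PruneSketch *st, TupleStatus status, TransactionId xmax)
{
    switch (status)
    {
        case TUPLE_DEAD:
            /*
             * Conceptually reinterpreted as RECENTLY_DEAD, but xmax may
             * be invalid (aborted update), so break instead of falling
             * through to the record_prunable() call below.
             */
            st->recent_dead = true;
            break;

        case TUPLE_RECENTLY_DEAD:
            /* A committed deleter exists, so xmax must be a valid XID. */
            st->recent_dead = true;
            record_prunable(st, xmax);
            break;

        case TUPLE_LIVE:
            break;
    }
}
```

With this shape, feeding in a DEAD tuple that has an invalid xmax sets
recent_dead without tripping the assertion, while a RECENTLY_DEAD tuple
still records its xmax as before.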

--
Peter Geoghegan
