Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date: 2021-11-12 21:11:54
Message-ID: CAH2-WzmNk6V6tqzuuabxoxM8HJRaWU6h12toaS-bqYcLiht16A@mail.gmail.com
Lists: pgsql-bugs

On Thu, Nov 11, 2021 at 9:46 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I wonder if we're approaching this business with "RECENTLY_DEAD can be
> upgraded to DEAD" in entirely the wrong way. Why not just not do that
> at all anymore, on the off chance that it's unsafe? Why even take a
> small chance? Our decision has to work at the level of the whole
> entire HOT chain, and it seems to me that we should make that as
> simple as possible.

The attached revision does it that way.

It also addresses the separate issue of DEAD vs RECENTLY_DEAD
disconnected tuples -- that was the other unresolved question. This
revision takes a harder line on the state of disconnected heap-only
tuples. Andres said that he doesn't know for sure that disconnected
heap-only tuples cannot be DELETE/INSERT_IN_PROGRESS -- "I'm not
actually sure the Assert is unreachable. I can imagine cases where
we'd see e.g. DELETE/INSERT_IN_PROGRESS due to a concurrent
subtransaction abort or such". But I don't see how that's possible. In
fact, I don't even see how it's possible for these items to be
RECENTLY_DEAD -- I think that they must always be DEAD (or we're in
big trouble anyway).

These are not just any heap-only tuples. They're heap-only tuples that
cannot possibly be accessed from a HOT chain. And so it's just
physically impossible for them to be returned by index scans -- this
is a certainty. How could they not be DEAD, in every possible sense?
How could they not come from an aborted transaction, specifically?
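
To make the reachability argument concrete, here is a toy model in C. It
is not code from the attached patch or from PostgreSQL itself: every name
in it (ToyItem, ToyHTSV, toy_mark_reachable, toy_count_violations,
toy_example_violations) is hypothetical, and a "page" is reduced to a
small array in which each item records whether it is heap-only, which
item its successor link points to, and the status that pruning would see
for it. The sketch walks every chain starting from a root item, treats
any heap-only item that was never visited as disconnected, and counts the
disconnected items whose status is anything other than DEAD. The example
page assumes one way such an item might arise: a HOT update that aborted,
followed by a later update of the same root that redirected the root's
successor link.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ITEMS 16
#define TOY_NO_NEXT   (-1)

/* Hypothetical stand-in for the visibility status pruning sees. */
typedef enum ToyHTSV
{
    TOY_DEAD,
    TOY_RECENTLY_DEAD,
    TOY_LIVE,
    TOY_INSERT_IN_PROGRESS,
    TOY_DELETE_IN_PROGRESS
} ToyHTSV;

/* Hypothetical, heavily simplified page item. */
typedef struct ToyItem
{
    bool    heap_only;  /* true: reachable only through a chain */
    int     next;       /* array index of successor, or TOY_NO_NEXT */
    ToyHTSV status;
} ToyItem;

/* Mark every item that can be reached by following a chain from a root. */
static void
toy_mark_reachable(const ToyItem *items, size_t n, bool *reachable)
{
    for (size_t i = 0; i < n; i++)
        reachable[i] = false;

    for (size_t i = 0; i < n; i++)
    {
        int cur = (int) i;

        if (items[i].heap_only)
            continue;           /* chains only start at root items */

        while (cur >= 0 && (size_t) cur < n && !reachable[cur])
        {
            reachable[cur] = true;
            cur = items[cur].next;
        }
    }
}

/* Count disconnected heap-only items whose status is not DEAD. */
static int
toy_count_violations(const ToyItem *items, size_t n)
{
    bool reachable[TOY_MAX_ITEMS];
    int  violations = 0;

    assert(n <= TOY_MAX_ITEMS);
    toy_mark_reachable(items, n, reachable);

    for (size_t i = 0; i < n; i++)
    {
        if (items[i].heap_only && !reachable[i] &&
            items[i].status != TOY_DEAD)
            violations++;
    }
    return violations;
}

/*
 * Example page: item 0 is the root. Its link once pointed at item 1 (the
 * version left by the aborted update) and now points at item 2, so item 1
 * is disconnected. The caller picks the status assigned to item 1.
 */
static int
toy_example_violations(ToyHTSV orphan_status)
{
    ToyItem page[3] = {
        {false, 2, TOY_RECENTLY_DEAD},
        {true, TOY_NO_NEXT, orphan_status},
        {true, TOY_NO_NEXT, TOY_LIVE},
    };

    return toy_count_violations(page, 3);
}
```

In this model, toy_example_violations(TOY_DEAD) reports no violations,
while any other status for the disconnected item reports one. That is the
shape of the check the assertion discussed here performs; the real
chain-following rules are more involved than a single successor link.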

Naturally, I also went through the exercise of trying to find a
counterexample, where pruning doesn't see a disconnected tuple as DEAD
in its HTSV. I could not get the assertion to fail with Alexander's
test case, nor with make check-world. If the assertion did fail, then
I imagine that that would have to be due to a problem elsewhere -- it
couldn't be a problem with the "disconnected heap-only tuples must
already be DEAD to HTSV" assumption itself. That is one of the few
things about this patch that *isn't* complicated.

--
Peter Geoghegan

Attachment: v4-0001-Fix-aborted-HOT-update-bug-in-heap-pruning.patch (application/octet-stream, 21.0 KB)
