Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date: 2021-11-12 22:29:19
Message-ID: 20211112222919.e7fkfpbpcoje6hsj@alap3.anarazel.de
Lists: pgsql-bugs

Hi,

On 2021-11-12 13:11:54 -0800, Peter Geoghegan wrote:
> It also addresses the separate issue of DEAD vs RECENTLY_DEAD
> disconnected tuples -- that was the other unresolved question. This
> revision takes a harder line on the state of disconnected heap-only
> tuples. Andres said that he doesn't know for sure that disconnected
> heap-only tuples cannot be DELETE/INSERT_IN_PROGRESS -- "I'm not
> actually sure the Assert is unreachable. I can imagine cases where
> we'd see e.g. DELETE/INSERT_IN_PROGRESS due to a concurrent
> subtransaction abort or such". But I don't see how that's possible. In
> fact, I don't even see how it's possible for these items to be
> RECENTLY_DEAD -- I think that they must always be DEAD (or we're in
> big trouble anyway).
>
> These are not just any heap-only tuples. They're heap-only tuples that
> cannot possibly be accessed from a HOT chain. And so it's just
> physically impossible for them to be returned by index scans -- this
> is a certainty. How could they not be DEAD, in every possible sense?
> How could they not come from an aborted transaction, specifically?

With subtransactions, abort is a bit more complicated than with plain
transactions. I'm not at all sure a problematic scenario exists, but I
wouldn't want to rely on there not being one.

Especially if suboverflowed comes into play, there can be scenarios where one
backend uses TransactionIdDidAbort() + SubTransGetTopmostTransaction() to
determine whether a transaction is in progress, while another just relies on
the procarray. Those aren't updated atomically with respect to each other.

Also, heap_update()'s wait = true path uses somewhat different logic again
to wait for other backends than what HeapTupleSatisfiesVacuum() ends up with.

> Naturally, I also went through the exercise of trying to find a
> counterexample, where pruning doesn't see a disconnected tuple as DEAD
> in its HTSV. I could not get the assertion to fail with Alexander's
> test case, nor with make check-world.

I don't think that provides meaningful coverage. Alexander's test has a
quite limited set of operations (which, e.g., doesn't include any subxacts),
and our own tests around subtransactions, and particularly concurrent
subtransaction-heavy work, are quite, uh, minimal.

Greetings,

Andres Freund