Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date: 2021-11-12 22:57:25
Message-ID: 20211112225725.2a32slgl5ou3dvre@alap3.anarazel.de
Lists: pgsql-bugs

Hi,

On 2021-11-12 14:46:22 -0800, Peter Geoghegan wrote:
> On Fri, Nov 12, 2021 at 2:29 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > With subtransactions abort is a bit more complicated than with plain
> > transactions. I'm not at all sure a problematic scenario exists, but I
> > wouldn't want to rely on it.
>
> What would it actually mean to rely on it, or to not rely on it?

That we shouldn't throw an error / assert out if we find such a tuple.

> As I've pointed out many times already, a disconnected heap tuple
> cannot be accessed from an index scan -- this is something that you
> *can* rely on, because we've performed exactly the same steps as
> heap_hot_search_buffer() would in making that determination.

Yes, it'd also not be considered visible by SatisfiesMVCC().

> When you talk about what HTSV thinks of the tuple, you're merely talking
> about how to behave in the event of a specific form of HOT chain corruption
> (a theoretical background risk for HOT chains that's nothing new).

My point is that I don't think it necessarily signals corruption. It could
instead be a very short-lived transient state under heavy concurrency.

> We need to be pragmatic here. There is some uncertainty about what
> HTSV might say about a disconnected tuple in the absence of
> corruption, or there is a risk of a new problem like that coming up in
> the future -- let's work within those confines, then. What do you want
> to do about that? There aren't that many choices, since, to repeat,
> the tuple is "morally" DEAD no matter what. Even with corruption, even
> without corruption in the presence of some unanticipated corner case
> with HTSV -- this is fundamental.

I think we can assert/error out if it's visible, since that's clearly
corruption. I'd personally not add assert/error checks for other states, given
that they could plausibly occur without indicating a problem. Debugging
transient errors that happen rarely, under high load, with nontrivial
workloads isn't fun.
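
To make that policy concrete, here is a minimal sketch rather than actual
PostgreSQL code. The enum is a hypothetical stand-in modelled on the states
HTSV can report, the function name is invented for illustration, and treating
"visible" as the LIVE state is an assumption on top of what is said above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the result of an HTSV-style check.
 * Declared locally so the sketch compiles without PostgreSQL headers.
 */
typedef enum SketchHtsvResult
{
    SKETCH_DEAD,
    SKETCH_LIVE,
    SKETCH_RECENTLY_DEAD,
    SKETCH_INSERT_IN_PROGRESS,
    SKETCH_DELETE_IN_PROGRESS
} SketchHtsvResult;

/*
 * Decide whether a disconnected heap tuple should be reported as corruption.
 *
 * Assumption: only a tuple that still looks live (visible) is treated as
 * corrupt. Every other state is tolerated, because it might be a short-lived
 * transient state under heavy concurrency rather than a real problem.
 */
static bool
disconnected_tuple_is_corrupt(SketchHtsvResult status)
{
    return status == SKETCH_LIVE;
}
```

A caller would assert or raise an error only when this returns true, and
would otherwise carry on treating the tuple as dead.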

Greetings,

Andres Freund
