Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date: 2021-11-12 22:46:22
Message-ID: CAH2-WzmxQMHs9e61Qg0b7admeQc0y+ne_xxAxTLtvmHiJ=FQiA@mail.gmail.com
Lists: pgsql-bugs

On Fri, Nov 12, 2021 at 2:29 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> With subtransactions abort is a bit more complicated than with plain
> transactions. I'm not at all sure a problematic scenario exists, but I
> wouldn't want to rely on it.

What would it actually mean to rely on it, or to not rely on it?

As I've pointed out many times already, a disconnected heap tuple
cannot be accessed from an index scan -- this is something that you
*can* rely on, because we've performed exactly the same steps as
heap_hot_search_buffer() would in making that determination. When you
talk about what HTSV thinks of the tuple, you're merely talking about
how to behave in the event of a specific form of HOT chain corruption
(a theoretical background risk for HOT chains that's nothing new).
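To make the reachability argument concrete, here is a toy model (a sketch only, not PostgreSQL code) of the chain walk. The `HeapTuple` and `Page` classes, their field names, and both functions are hypothetical stand-ins; the walk only mimics the general shape of heap_hot_search_buffer(): start at a root item, follow a redirect if there is one, and keep following the chain only while the next tuple's xmin matches the previous tuple's xmax. Visibility checks are left out entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

INVALID_XID = 0  # toy stand-in for "no transaction ID"


@dataclass
class HeapTuple:
    """Hypothetical, heavily simplified heap tuple header."""
    xmin: int
    xmax: int = INVALID_XID
    heap_only: bool = False          # no index entry points at this tuple
    hot_updated: bool = False        # a HOT successor exists on the same page
    next_off: Optional[int] = None   # toy stand-in for the t_ctid offset


@dataclass
class Page:
    """Hypothetical heap page: offset -> tuple, plus redirect line pointers."""
    tuples: Dict[int, HeapTuple] = field(default_factory=dict)
    redirects: Dict[int, int] = field(default_factory=dict)


def chain_members(page: Page, root_off: int) -> Set[int]:
    """Offsets an index scan could reach when starting from root_off."""
    reached: Set[int] = set()
    if root_off in page.redirects:
        off = page.redirects[root_off]
    else:
        off = root_off
        root = page.tuples.get(off)
        # Index entries never point directly at a heap-only tuple.
        if root is None or root.heap_only:
            return reached
    prev_xmax = INVALID_XID
    # The "not in reached" test guards against loops in a corrupt chain.
    while off in page.tuples and off not in reached:
        tup = page.tuples[off]
        # The chain is broken if xmin does not match the previous xmax.
        if prev_xmax != INVALID_XID and tup.xmin != prev_xmax:
            break
        reached.add(off)
        if not tup.hot_updated or tup.next_off is None:
            break
        prev_xmax = tup.xmax
        off = tup.next_off
    return reached


def disconnected_tuples(page: Page) -> Set[int]:
    """Heap-only tuples that no root item leads to."""
    roots = set(page.redirects) | {
        off for off, tup in page.tuples.items() if not tup.heap_only
    }
    reachable: Set[int] = set()
    for root in roots:
        reachable |= chain_members(page, root)
    return {
        off for off, tup in page.tuples.items()
        if tup.heap_only and off not in reachable
    }


# Tuple 3 was created by an update (xid 105) whose link from tuple 2 was
# later overwritten by another update (xid 106) that created tuple 4.
page = Page(tuples={
    1: HeapTuple(xmin=100, xmax=101, hot_updated=True, next_off=2),
    2: HeapTuple(xmin=101, xmax=106, heap_only=True, hot_updated=True, next_off=4),
    3: HeapTuple(xmin=105, heap_only=True),
    4: HeapTuple(xmin=106, heap_only=True),
})
print(disconnected_tuples(page))  # {3}
```

In this model nothing about tuple 3's own header is consulted when deciding that it is disconnected. The determination depends only on what the root items lead to, which is why HTSV's opinion of the tuple is a separate question.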

This is a question of trade-offs around adding defensive checks and so
on. It is not a question of making the corruption itself any less
likely (unless early detection allows the user to prevent further
corruption). I'm a bit confused here, because it sounds like you might
not agree with that.

> > Naturally, I also went through the exercise of trying to find a
> > counterexample, where pruning doesn't see a disconnected tuple as DEAD
> > in its HTSV. I could not get the assertion to fail with Alexander's
> > test case, nor with make check-world.
>
> I don't think that provides meaningful coverage. Alexander's test has a
> quite limited set of operations (which e.g. doesn't include any subxacts), and our
> own tests around subtransactions, and particularly concurrent subtransaction
> heavy work, are quite, uh, minimal.

It's a start.

We need to be pragmatic here. There is some uncertainty about what
HTSV might say about a disconnected tuple in the absence of
corruption, or there is a risk of a new problem like that coming up in
the future -- let's work within those confines, then. What do you want
to do about that? There aren't that many choices, since, to repeat,
the tuple is "morally" DEAD no matter what. That holds with corruption,
and it holds without corruption when HTSV hits some unanticipated
corner case -- this is fundamental.
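One way to picture the available choices is the following sketch. It is purely illustrative and is not what any patch in this thread does: `prune_verdict` and `Verdict` are hypothetical names, the enum members borrow the HTSV_Result names from heapam.h, and the handling of connected tuples is deliberately oversimplified. The point is only that a defensive check changes what gets reported, not whether a disconnected tuple is treated as dead.

```python
from enum import Enum
from typing import NamedTuple


class HTSV(Enum):
    """Toy copy of the HTSV_Result names; the values are arbitrary here."""
    HEAPTUPLE_DEAD = 0
    HEAPTUPLE_LIVE = 1
    HEAPTUPLE_RECENTLY_DEAD = 2
    HEAPTUPLE_INSERT_IN_PROGRESS = 3
    HEAPTUPLE_DELETE_IN_PROGRESS = 4


class Verdict(NamedTuple):
    removable: bool  # may pruning treat the tuple as dead?
    anomaly: bool    # should a defensive check report something odd?


def prune_verdict(connected: bool, htsv: HTSV) -> Verdict:
    """Hypothetical policy for one heap-only tuple during pruning."""
    if connected:
        # Oversimplified: real pruning reasons about the chain as a whole.
        return Verdict(removable=(htsv is HTSV.HEAPTUPLE_DEAD), anomaly=False)
    # A disconnected tuple is unreachable from any index scan, so it is
    # removable whatever HTSV says; disagreement is only worth reporting.
    return Verdict(removable=True, anomaly=(htsv is not HTSV.HEAPTUPLE_DEAD))
```

Whether the anomaly should become an assertion, an error, a log message, or nothing at all is exactly the trade-off under discussion; the `removable` result for a disconnected tuple does not depend on it.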

--
Peter Geoghegan
