Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date: 2021-11-11 00:47:27
Message-ID: 20211111004727.imn7xrimdiyo7vfv@alap3.anarazel.de
Lists: pgsql-bugs

Hi,

On 2021-11-10 13:04:43 -0800, Peter Geoghegan wrote:
> On Wed, Nov 10, 2021 at 11:20 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > The way this definitely breaks - I have been able to reproduce this in
> > isolation - is when one tuple is processed twice by heap_prune_chain(), and
> > the result of HeapTupleSatisfiesVacuum() changes from
> > HEAPTUPLE_DELETE_IN_PROGRESS to DEAD.
>
> I had no idea that that was now possible. I really think that this
> ought to be documented centrally.

Where would you suggest?

> The relevant code in pruneheap.c was always incredibly fragile -- no
> question. Even still, there is really no good reason to believe that
> that was actually a problem before commit dc7420c2. Even if we assume
> that there's a problem before 14, the surface area is vastly smaller
> than on 14 -- the relevant pruneheap.c code hasn't really ever changed
> since HOT went in. And so I think that the most sensible course of
> action here is this: commit a fix to Postgres 14 + HEAD only -- no
> backpatch to earlier versions.

Yea. The fact that I also saw *one* error in 13 worries me a bit, but perhaps
that was something else. Even if we eventually need to backpatch something
further, having it in 14/master first is good.

The fact that 13 didn't trigger the problem reliably doesn't necessarily mean
much - it's a pretty limited workload. E.g. there are no aborts.

I think we might be able to do something a bit more limited than what you
propose. But I'm not sure it's worth going for that.

Greetings,

Andres Freund
