Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date: 2021-11-13 02:32:48
Message-ID: CAH2-Wz=2wAftxnZdUjKPpnjyXESqjq90-=DOjmDZg_2HiiT4NQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Nov 12, 2021 at 5:57 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> You said it yourself: who knows exactly what the justification for
> RECENTLY_DEAD->DEAD was? I have to imagine it had something to do with the
> "INSERT_IN_PROGRESS becomes DEAD due to concurrent xact abort" thing,
> but that's unclear. And even if it was clear, and even if we knew that
> it was 100% safe at one point, it still wouldn't be clear that it's
> safe today, in Postgres 14.

Another relevant factor is how we deal with already-corrupt HOT chains
affected by the bug. I would be comfortable with a full "can't happen"
error in the new code path for disconnected and aborted heap-only
tuples, provided the error only gets raised when the tuple is fully
LIVE according to HTSV (and also assert that it's DEAD). Something
like my v4 plus this LIVE-should-be-DEAD defensive error seems very
likely to avoid making the corruption any worse. There is a huge
amount of redundancy in the tuple headers that we can cross check
inexpensively.

--
Peter Geoghegan

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Padmakumar Kadayaprth 2021-11-13 08:28:51 Re: Logical Replication not working for few Tables
Previous Message Tom Lane 2021-11-13 02:12:59 Re: BUG #17283: localhost should also include IPv6