From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum |
Date: | 2021-11-16 02:39:18 |
Message-ID: | 20211116023918.5g6d6izevronx76m@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Hi,
On 2021-11-10 19:08:21 -0800, Andres Freund wrote:
> On 2021-11-09 19:07:02 -0800, Peter Geoghegan wrote:
> > @@ -700,11 +720,18 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
> > }
> >
> > /*
> > - * Remember the last DEAD tuple seen. We will advance past
> > - * RECENTLY_DEAD tuples just in case there's a DEAD one after them;
> > - * but we can't advance past anything else. We have to make sure that
> > - * we don't miss any DEAD tuples, since DEAD tuples that still have
> > - * tuple storage after pruning will confuse VACUUM.
> > + * Remember the last DEAD tuple seen from this HOT chain.
> > + *
> > + * We expect to sometimes find a RECENTLY_DEAD tuple after a DEAD
> > + * tuple. When this happens, the RECENTLY_DEAD tuple will be treated
> > + * as if it was DEAD all along. Manage that now.
>
> I now actually wonder why this is correct. There's another comment about it:
>
> > We also prune any RECENTLY_DEAD tuples preceding a DEAD tuple.
> > * This is OK because a RECENTLY_DEAD tuple preceding a DEAD tuple is really
>
> But that doesn't justify very much.
>
> What prevents the scenario that some other backend e.g. has a snapshot with
> xmin=xmax=RECENTLY_DEAD-row. If the RECENTLY_DEAD row has an xid that is later
> than the DEAD row, this afaict would make it perfectly legal to prune the DEAD
> row, but *not* the RECENTLY_DEAD one.
I think the code is actually correct if awfully commented. I.e. the above
scenario cannot happen in a harmful situation. The relevant piece is that the
set of "valid" snapshots is limited (for a specific cluster history).
A row can only be RECENTLY_DEAD if the deleting / updating transaction
committed.
An intermediary RECENTLY_DEAD row followed by a DEAD row would only be visible
to a snapshot that either had
1) row.xmax >= Snapshot.xmax
2) row.xmax in Snapshot.xip.
There cannot be a snapshot that sees the RECENTLY_DEAD row while also seeing
the subsequent row as DEAD, because the existence of one of the two conditions
would have held back the xid horizon:
1) is not possible, because Snapshot.xmin is <= Snapshot.xmax, and
Snapshot.xmin >= MyProc->xmin. The subsequent row could not be DEAD.
2) is not possible, because .xip only contains xids >= Snapshot.xmin.
IOW, there *can* be a snapshot with xmin=xmax=RECENTLY_DEAD-row, but it'd not
see the RECENTLY_DEAD row anyway.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2021-11-16 03:42:35 | Re: Logical Replication not working for few Tables |
Previous Message | Tom Lane | 2021-11-15 22:07:45 | Re: pg_restore depending on user functions |