From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum |
Date: | 2021-11-10 19:20:10 |
Message-ID: | 20211110192010.ckvfzz352hsba5xf@alap3.anarazel.de |
Lists: | pgsql-bugs |
Hi,
On 2021-11-09 15:31:37 -0800, Peter Geoghegan wrote:
> I'm not sure why this seems to have become more of a problem following
> the snapshot scalability work from Andres -- Alexander mentioned that
> commit dc7420c2 looked like it was the source of the problem here, but
> I can't see any reason why that might be true (even though I accept
> that it might well *appear* to be true). I believe Andres has some
> theory on that, but I don't know the details myself. AFAICT, this is a
> live bug on all supported versions. We simply weren't being careful
> enough about breaking the invariant that an LP_REDIRECT can only point
> to a valid heap-only tuple. The really surprising thing here is that
> it took this long for it to visibly break.
The way this definitely breaks - I have been able to reproduce this in
isolation - is when one tuple is processed twice by heap_prune_chain(), and
the result of HeapTupleSatisfiesVacuum() changes from
HEAPTUPLE_DELETE_IN_PROGRESS to DEAD.
Consider a page like this:
lp 1: redirect to lp2
lp 2: deleted by xid x, not yet committed
and a sequence of events like this:
1) heap_prune_chain(rootlp = 1)
2) commit x
3) heap_prune_chain(rootlp = 2)
1) heap_prune_chain(rootlp = 1) will go to lp2, and see a
HEAPTUPLE_DELETE_IN_PROGRESS and thus not do anything.
3) could then, with the snapshot scalability changes, get DEAD back from
HTSV. Due to the "fuzzy" nature of the post-snapshot-scalability xid horizons,
that is possible because we can end up rechecking the boundary condition and
seeing that the horizon now allows us to prune x / lp2.
Once lp2 is pruned, we have a redirect line pointer (lp1) pointing to an
unused slot, which is "illegal" because something independent can be inserted
into that slot.
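As a rough illustration of the sequence above, here is a toy Python model. It is not PostgreSQL code: `LinePointer`, `toy_htsv()`, `toy_prune_chain()` and `run_scenario()` are hypothetical names, the xid value 100 is arbitrary, and the horizon is reduced to a single integer that the caller passes in. The only point is to show how the page ends up with a redirect to an unused slot when the answer for lp2 flips between the two calls.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

# Toy item states, loosely modelled on LP_REDIRECT / LP_NORMAL / LP_UNUSED.
REDIRECT, NORMAL, UNUSED = "redirect", "normal", "unused"


@dataclass
class LinePointer:
    state: str
    target: Optional[int] = None  # REDIRECT only: offset this item points to
    xmax: Optional[int] = None    # NORMAL only: xid that deleted the tuple
    heap_only: bool = False


def toy_htsv(lp: LinePointer, committed: Set[int], horizon: int) -> str:
    """Toy stand-in for HeapTupleSatisfiesVacuum() (assumption, heavily simplified)."""
    if lp.xmax is None:
        return "LIVE"
    if lp.xmax not in committed:
        # No horizon test happens on this path.
        return "DELETE_IN_PROGRESS"
    return "DEAD" if lp.xmax < horizon else "RECENTLY_DEAD"


def toy_prune_chain(page: Dict[int, LinePointer], root: int,
                    committed: Set[int], horizon: int) -> None:
    """Toy stand-in for heap_prune_chain(rootlp = root)."""
    lp = page[root]
    if lp.state == REDIRECT:
        tup = page[lp.target]
        if toy_htsv(tup, committed, horizon) == "DEAD":
            # Simplification: the toy frees both items when the chain is dead.
            tup.state = UNUSED
            lp.state, lp.target = UNUSED, None
    elif lp.state == NORMAL and lp.heap_only:
        # A heap-only tuple visited as a root is freed if it is dead,
        # without checking whether a redirect still points at it.
        if toy_htsv(lp, committed, horizon) == "DEAD":
            lp.state = UNUSED


def run_scenario() -> Dict[int, LinePointer]:
    x = 100  # arbitrary xid of the deleting transaction
    page = {
        1: LinePointer(REDIRECT, target=2),
        2: LinePointer(NORMAL, xmax=x, heap_only=True),
    }
    committed: Set[int] = set()
    toy_prune_chain(page, 1, committed, horizon=x)      # 1) DELETE_IN_PROGRESS, no-op
    committed.add(x)                                    # 2) commit x
    # Assumption: the recheck in step 3 yields a horizon that is past x.
    toy_prune_chain(page, 2, committed, horizon=x + 1)  # 3) DEAD, lp2 freed
    return page


if __name__ == "__main__":
    for off, item in run_scenario().items():
        print(off, item)
```

After `run_scenario()`, lp1 is still a redirect to offset 2 while lp2 is unused, which is the broken state described above.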
What made this hard to understand (and likely hard to hit) is that we don't
recompute the xid horizons more than once per hot pruning ([1]). At first I
concluded that a change from RECENTLY_DEAD to DEAD could thus not happen - and
it doesn't: We go from HEAPTUPLE_DELETE_IN_PROGRESS to DEAD, which is possible
because there was no horizon test for HEAPTUPLE_DELETE_IN_PROGRESS.
Note that there are several paths in versions before 14 that cause HTSV()'s
answer to change for the same xid. E.g. when the transaction inserting a tuple
version aborts,
we go from HEAPTUPLE_INSERT_IN_PROGRESS to DEAD. But I haven't quite found a
path to trigger problems with that, because there won't be redirects to a
tuple version that is HEAPTUPLE_INSERT_IN_PROGRESS (but there can be redirects
to a HEAPTUPLE_DELETE_IN_PROGRESS or RECENTLY_DEAD).
I hit a crash once in 13 with a slightly evolved version of the test (many
connections creating / dropping the partitions as in the original scenario,
using :client_id to target different tables). It's possible that my
instrumentation was the cause of that. Unfortunately it took quite a few hours
to hit the problem in 13...
Greetings,
Andres Freund
[1] It's a bit more complicated than that: we only recompute the horizon when
a) we've not done it before in the current xact, or b) RecentXmin changed
during a snapshot computation. Recomputing the horizon is expensive-ish, so we
don't want to do it constantly.
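The two conditions in [1] can be sketched as a small predicate. This is a toy Python version, not the actual PostgreSQL check: `should_recompute_horizon()` and its parameters are hypothetical names, and the real code may apply further conditions that are not described here.

```python
from typing import Optional


def should_recompute_horizon(last_computed_xmin: Optional[int],
                             recent_xmin: int) -> bool:
    """Toy version of the two conditions from the footnote (hypothetical names)."""
    # a) no horizon has been computed yet in the current transaction
    if last_computed_xmin is None:
        return True
    # b) RecentXmin differs from the value seen at the last horizon computation
    return recent_xmin != last_computed_xmin
```

With an unchanged RecentXmin the cached horizon keeps being used, which is why a tuple that was already tested against it does not get a different answer later in the same pruning pass.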