Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date: 2022-03-10 00:25:05
Message-ID: 20220310002505.ir474sfzmbz6enjg@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

On 2022-03-03 16:22:11 -0800, Peter Geoghegan wrote:
> Attached is (roughly speaking) a rebased version of this patch (i.e.
> your v4-0004- patch from the patch series). Like your patch, this
> patch documents our expectations for vistest and OldestXmin. Unlike
> your patch, it makes all vacrel initialization take place before
> lazy_scan_heap is called, from inside heap_vacuum_rel.

I think it'd be nicer if we did the horizon determination after allocating
space for dead tuples... But it's still better than today, so...

> An important detail here is how OldestXmin (and vistest) are related
> to rel_pages -- I also added something about that. It definitely
> wouldn't be okay to (say) move the call to RelationGetNumberOfBlocks
> to any place before the call to vacuum_set_xid_limits. And so it very
> much seems in scope here.

You're right, it needs to be that way round.

> I'm a bit confused about at least one detail here: you've said here
> that "GlobalVisTestFor() doesn't itself do any horizon determination",
> which seems to contradict comments from the v4-0004- patch that was
> attached to the same email.

Which comment in the patch is the quoted statement contradicting? The on-point
comment seems to be

+ * [...] vacrel->vistest is always at least as aggressive as the
+ * limits vacuum_set_xid_limits() computes because ComputeXidHorizons()
+ * (via vacuum_set_xid_limits() ->GetOldestNonRemovableTransactionId())
+ * ensures the approximate horizons are always at least as aggressive as
+ * the precise horizons.

> + * Also, we delay establishing vistest + * as a minor optimization. A
> later cutoff can enable more eager pruning. + */

I don't think the minor optimization does anything (which I had stated wrongly
at some point in this thread).

> /*
> * Call lazy_scan_heap to perform all required heap pruning, index
> - * vacuuming, and heap vacuuming (plus related processing)
> + * vacuuming, and heap vacuuming (plus related processing).
> + *
> + * Note: Resource management for parallel VACUUM is quite subtle, and so
> + * lazy_scan_heap must allocate (and deallocate) vacrel->dead_items space
> + * for itself. Every other vacrel field is initialized by now, though.
> */
> lazy_scan_heap(vacrel, params->nworkers);

What are you referring to with "quite subtle"?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Andres Freund 2022-03-10 00:46:37 Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Previous Message Masahiko Sawada 2022-03-09 23:36:35 Re: BUG #17385: "RESET transaction_isolation" inside serializable transaction causes Assert at the transaction end