From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: Combine Prune and Freeze records emitted by vacuum
Date: 2024-03-26 21:45:59
Message-ID: 20240326214559.k7hgbyjhk4k4gxft@liskov
Lists: pgsql-hackers
On Mon, Mar 25, 2024 at 09:33:38PM +0200, Heikki Linnakangas wrote:
> On 24/03/2024 18:32, Melanie Plageman wrote:
> > On Thu, Mar 21, 2024 at 9:28 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> > >
> > > In heap_page_prune_and_freeze(), we now do some extra work on each live
> > > tuple, to set the all_visible_except_removable correctly. And also to
> > > update live_tuples, recently_dead_tuples and hastup. When we're not
> > > freezing, that's a waste of cycles, the caller doesn't care. I hope it's
> > > enough that it doesn't matter, but is it?
> >
> > Last year on an early version of the patch set I did some pgbench
> > tpcb-like benchmarks -- since there is a lot of on-access pruning in
> > that workload -- and I don't remember it being a showstopper. The code
> > has changed a fair bit since then. However, I think it might be safer
> > to pass a flag "by_vacuum" to heap_page_prune_and_freeze() and skip
> > the rest of the loop after heap_prune_satisfies_vacuum() when
> > on-access pruning invokes it. I had avoided that because it felt ugly
> > and error-prone, however it addresses a few other of your points as
> > well.
>
> Ok. I'm not a fan of the name 'by_vacuum' though. It'd be nice if the
> argument described what it does, rather than who it's for. For example,
> 'need_all_visible'. If set to true, the function determines 'all_visible',
> otherwise it does not.
A very rough v7 is attached. The whole thing is rebased over master and
then 0016 contains an attempt at the refactor we discussed in this
email.
Instead of just using the PruneReason to avoid doing the extra steps
when on-access pruning calls heap_page_prune_and_freeze(), I've made an
"actions" variable and defined different flags for it. One of them is
a replacement for the existing mark_unused_now flag. I defined another
one, PRUNE_DO_TRY_FREEZE, which could be used in place of checking if
pagefrz is NULL.
There is a whole group of activities that only the vacuum caller does
outside of freezing -- setting hastup, counting live and recently dead
tuples, determining whole-page visibility and a snapshot conflict
horizon for updating the VM. But I didn't want to introduce a separate
flag for each of them, because then I would have to check each one
before taking the corresponding action. That would add a lot of extra
branching, and on-access pruning does none of those actions while
vacuum does all of them.
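To make the shape of this concrete, here is a minimal standalone sketch of
the "actions" bitmask idea. PRUNE_DO_TRY_FREEZE and the mark_unused_now
behavior come from the patch description above; the other names, the
combined "vacuum extras" flag, and the exact conditions under which vacuum
sets each bit are hypothetical simplifications, not the real PostgreSQL
definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical action flags for heap_page_prune_and_freeze() */
#define PRUNE_DO_MARK_UNUSED_NOW (1 << 0)   /* replaces the mark_unused_now bool */
#define PRUNE_DO_TRY_FREEZE      (1 << 1)   /* could replace the pagefrz == NULL check */

/*
 * One combined flag for everything only the vacuum caller needs: hastup,
 * live/recently-dead counts, all-visible, snapshot conflict horizon.
 * A single flag avoids testing one bit per sub-action in the hot loop.
 */
#define PRUNE_DO_VACUUM_EXTRAS   (1 << 2)

static uint8_t
prune_actions_for_caller(bool by_vacuum, bool try_freeze)
{
    uint8_t actions = 0;

    if (by_vacuum)
    {
        /* simplified: real vacuum sets unused-marking conditionally */
        actions |= PRUNE_DO_MARK_UNUSED_NOW | PRUNE_DO_VACUUM_EXTRAS;
        if (try_freeze)
            actions |= PRUNE_DO_TRY_FREEZE;
    }

    /* on-access pruning takes none of the extra actions */
    return actions;
}
```

The design point is that the callee branches once on PRUNE_DO_VACUUM_EXTRAS
rather than once per counter or flag it maintains.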
> I started to look closer at the loops in heap_prune_chain() and how they
> update all the various flags and counters. There's a lot going on there. We
> have:
>
> - live_tuples counter
> - recently_dead_tuples counter
> - all_visible[_except_removable]
> - all_frozen
> - visibility_cutoff_xid
> - hastup
> - prstate.frozen array
> - nnewlpdead
> - deadoffsets array
>
> And that doesn't even include all the local variables and the final
> dead/redirected arrays.
>
> Some of those are set in the first loop that initializes 'htsv' for each
> tuple on the page. Others are updated in heap_prune_chain(). Some are
> updated in both. It's hard to follow which are set where.
>
> I think recently_dead_tuples is updated incorrectly, for tuples that are
> part of a completely dead HOT chain. For example, imagine a hot chain with
> two tuples: RECENTLY_DEAD -> DEAD. heap_prune_chain() would follow the
> chain, see the DEAD tuple at the end of the chain, and mark both tuples for
> pruning. However, we already updated 'recently_dead_tuples' in the first
> loop, which is wrong if we remove the tuple.
Ah, yes, you are so right about this bug.
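The miscount can be illustrated with a toy model (a sketch, not the real
htsv/heap_prune_chain() code): the first loop tallies RECENTLY_DEAD tuples
before chain-following has decided which tuples will be removed, so a
RECENTLY_DEAD tuple in a chain that ends in DEAD is counted even though it
is later pruned:

```c
#include <assert.h>

enum tupstatus { TS_DEAD, TS_RECENTLY_DEAD, TS_LIVE };

/*
 * Counts recently-dead tuples the way the patch's first loop does: before
 * knowing which tuples chain-following will mark for removal. 'removed' is
 * ignored here, which is exactly the bug.
 */
static int
naive_recently_dead_count(const enum tupstatus *htsv, const int *removed, int n)
{
    int count = 0;

    (void) removed;             /* bug: removal decisions not consulted */
    for (int i = 0; i < n; i++)
        if (htsv[i] == TS_RECENTLY_DEAD)
            count++;
    return count;
}

/* What the count should be: skip tuples pruned away with their chain. */
static int
correct_recently_dead_count(const enum tupstatus *htsv, const int *removed, int n)
{
    int count = 0;

    for (int i = 0; i < n; i++)
        if (htsv[i] == TS_RECENTLY_DEAD && !removed[i])
            count++;
    return count;
}
```

For the RECENTLY_DEAD -> DEAD chain in the example, both tuples are removed,
so the naive count is one too high.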
> Maybe that's the only bug like this, but I'm a little scared. Is there
> something we could do to make this simpler? Maybe move all the new work that
> we added to the first loop, into heap_prune_chain() ? Maybe introduce a few
> more helper heap_prune_record_*() functions, to update the flags and
> counters also for live and insert/delete-in-progress tuples and for dead
> line pointers? Something like heap_prune_record_live() and
> heap_prune_record_lp_dead().
I like the idea of a heap_prune_record_live_or_recently_dead() function.
That's what I've attempted to implement in the attached 0016. I haven't
updated and cleaned up everything (especially comments) in the refactor,
but there are two major issues:
1) In heap_prune_chain(), a heap-only tuple which is not HOT updated may
end up being a live tuple that is not part of any chain, or it may end
up being the redirect target in a HOT chain. At the top of
heap_prune_chain(), we return if (HeapTupleHeaderIsHeapOnly(htup)). We
may come back to this tuple later if it is part of a chain; if we
don't, we need to have called heap_prune_record_live_or_recently_dead()
for it. However, there are other tuples that get redirected to which do
not meet this criterion, so we must also call
heap_prune_record_live_or_recently_dead() when setting an item as a
redirect target. If we call it in both places, we will double-count. To
fix this, I introduced an array, "counted". But that takes up extra
space in the PruneState and extra cycles to memset it.
I can't think of a way to make sure we count the right tuples without
another array. The tuples we need to count are those not pointed to by
prstate->marked, plus those tuples that will become redirect targets
(which are marked).
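The "counted" guard amounts to a counted-at-most-once pattern. Here is a
minimal standalone sketch of that idea (the struct, array size, and function
names are illustrative stand-ins, not the actual PruneState members):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_OFFSETS 8           /* stand-in for MaxHeapTuplesPerPage */

/*
 * Hypothetical guard mirroring the "counted" array: record a tuple as
 * live-or-recently-dead at most once, whether we reach it as a standalone
 * heap-only tuple or as a redirect target inside a HOT chain.
 */
typedef struct
{
    bool    counted[MAX_OFFSETS];   /* must be memset at the start of each page */
    int     live_tuples;
} PruneCounts;

static void
record_live_once(PruneCounts *pc, int offnum)
{
    if (pc->counted[offnum])
        return;                 /* already counted via the other code path */
    pc->counted[offnum] = true;
    pc->live_tuples++;
}
```

The per-page memset of the guard array is the overhead complained about
above; the alternative would be deriving the set from prstate->marked,
which the mail argues is not possible without extra state.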
2) A large number of the members of PruneFreezeResult are only
initialized for the vacuum caller now. Even with a comment, this is a
bit confusing. And, it seems like there should be some symmetry between
the actions the caller tells heap_page_prune_and_freeze() to take and
the result parameters that are filled in.
I am concerned about adding all of the actions (setting hastup,
determining whole page visibility, etc as mentioned above) because then
I also have to check all the actions and that will add extra branching.
And out of the two callers of heap_page_prune_and_freeze(), one will do
all of the actions and one will do none of them except "main" pruning.
> > Note that I still don't think we have a resolution on what to
> > correctly update new_relfrozenxid and new_relminmxid to at the end
> > when presult->nfrozen == 0 and presult->all_frozen is true.
> >
> > if (presult->nfrozen > 0)
> > {
> >     presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
> >     presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
> > }
> > else
> > {
> >     presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
> >     presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
> > }
>
> One approach is to take them out of the PageFreezeResult struct again, and
> pass them as pointers:
>
> void
> heap_page_prune_and_freeze(Relation relation, Buffer buffer,
> ...
> TransactionId *new_relfrozenxid,
> MultiXactId *new_relminmxid,
> ...
> )
>
> That would be natural for the caller too, as it wouldn't need to set up the
> old values to HeapPageFreeze before each call, nor copy the new values to
> 'vacrel' after the call. I'm thinking that we'd move the responsibility of
> setting up HeapPageFreeze to heap_page_prune_and_freeze(), instead of having
> the caller set it up. So the caller would look something like this:
>
> heap_page_prune_and_freeze(rel, buf, vacrel->vistest,
> &vacrel->cutoffs, &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid,
> &presult,
> PRUNE_VACUUM_SCAN, flags,
> &vacrel->offnum);
>
> In this setup, heap_page_prune_and_freeze() would update *new_relfrozenxid
> and *new_relminmxid when it has a new value for them, and leave them
> unchanged otherwise.
I've passed new_relfrozen_xid and new_relmin_mxid as arguments.
But as for only updating them when there is a new value, that doesn't
sound cheaper than just setting them when they are passed in with the
values from [No]FreezePageRelfrozenXid, [No]FreezePageRelminMxid. Unless
you are imagining a way to simplify the current
[No]FreezePageRelfrozenXid, [No]FreezePageRelminMxid.
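For illustration, here is a standalone sketch of the out-parameter style
being discussed, with TransactionId and the struct reduced to stand-in
types. It deliberately encodes the nfrozen > 0 branch from the quoted code,
whose correctness for the nfrozen == 0, all-frozen case is exactly the open
question above:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;     /* stand-in for the PostgreSQL typedef */

/* Reduced stand-in for the relevant HeapPageFreeze members */
typedef struct
{
    TransactionId FreezePageRelfrozenXid;
    TransactionId NoFreezePageRelfrozenXid;
} HeapPageFreezeSketch;

/*
 * Out-parameter style: the pruning function writes *new_relfrozenxid
 * directly, so the caller can pass &vacrel->NewRelfrozenXid and skip
 * copying values into and out of a result struct.
 */
static void
update_relfrozenxid(const HeapPageFreezeSketch *pagefrz, int nfrozen,
                    TransactionId *new_relfrozenxid)
{
    if (nfrozen > 0)
        *new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
    else
        *new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
}
```

Under Heikki's "update only when there is a new value" variant, the else
branch would instead leave *new_relfrozenxid untouched; the mail's point is
that an unconditional store is no more expensive than a conditional one.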
- Melanie