Re: Emit fewer vacuum records by reaping removable tuples during pruning

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Peter Geoghegan <pg(at)bowt(dot)ie>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Emit fewer vacuum records by reaping removable tuples during pruning
Date: 2024-01-10 22:27:57
Message-ID: CAAKRu_aX64c36JOEYrwiML3dBnq8MCsOjYs-7bOrPBOXWEsGPA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 10, 2024 at 3:54 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Tue, Jan 9, 2024 at 5:42 PM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
> > Done in attached v6.
>
> Don't kill me, but:
>
> + /*
> + * For now, pass no_indexes == false
> regardless of whether or not
> + * the relation has indexes. In the future we
> may enable immediate
> + * reaping for on access pruning.
> + */
> + heap_page_prune(relation, buffer, vistest, false,
> + &presult, NULL);
>
> My previous comments about lying seem equally applicable here.

Yes, the options I can think of are:

1) rename the parameter to "immed_reap" or similar and make very clear
in heap_page_prune_opt() that you are not to pass true.
2) make immediate reaping work for on-access pruning. I would need a
low cost way to find out if there are any indexes on the table. Do you
think this is possible? Should we just rename the parameter for now
and think about that later?

>
> - if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
> + if (!ItemIdIsUsed(itemid))
> continue;
>
> There is a bit of overhead here for the !no_indexes case. I assume it
> doesn't matter.

Right. Should be okay. Alternatively, and I'm not saying this is a
good idea, but we could throw this into the loop in heap_page_prune()
which calls heap_prune_chain():

+ if (ItemIdIsDead(itemid) && prstate.no_indexes)
+ {
+ heap_prune_record_unused(&prstate, offnum);
+ continue;
+ }

I think that is correct?
But, from a consistency perspective, we don't call the
heap_prune_record* functions directly from heap_page_prune() in any
other cases. And it seems like it would be nice to have all of those
in one place?

> static void
> heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum)
> {
> + /*
> + * If the relation has no indexes, we can remove dead tuples during
> + * pruning instead of marking their line pointers dead. Set this tuple's
> + * line pointer LP_UNUSED. We hint that tables with indexes are more
> + * likely.
> + */
> + if (unlikely(prstate->no_indexes))
> + {
> + heap_prune_record_unused(prstate, offnum);
> + return;
> + }
>
> I think this should be pushed to the callers. Else you make the
> existing function name into another lie.

Yes, so I preferred it in the body of heap_prune_chain() (the caller).
Andres didn't like the extra level of indentation. I could wrap
heap_record_dead() and heap_record_unused(), but I couldn't really
think of a good wrapper name. heap_record_dead_or_unused() seems long
and literal. But, it might be the best I can do. I can't think of a
general word which encompasses both the transition to death and to
disposal.

> + bool recordfreespace;
>
> Not sure if it's necessary to move this to an outer scope like this?
> The whole handling of this looks kind of confusing. If we're going to
> do it this way, then I think lazy_scan_prune() definitely needs to
> document how it handles this function (i.e. might set true to false,
> won't set false to true, also meaning). But are we sure we want to let
> a local variable with a weird name "leak out" like this?

Which function do you mean when you say "document how
lazy_scan_prune() handles this function". And no we definitely don't
want a variable like this to be hanging out in lazy_scan_heap(), IMHO.
The other patches in the stack move the FSM updates into
lazy_scan_[no]prune() and eliminate this parameter. I moved up the
scope because lazy_scan_noprune() already had recordfreespace as an
output parameter and initialized it unconditionally inside. I
initialize it unconditionally in lazy_scan_prune() as well. I mention
in the commit message that this is temporary because we plan to
eliminate recordfreespace as an output parameter by updating the FSM
in lazy_scan_[no]prune(). I could have stuck recordfreespace into the
LVPagePruneState with the other output parameters. But, leaving it as
a separate output parameter made the diffs lovely for the rest of the
patches in the stack.

> + Assert(vacrel->do_index_vacuuming);
>
> Is this related?

Yes, previously the assert was:
Assert(vacrel->nindexes == 0 || vacrel->do_index_vacuuming);
And we eliminated the caller of lazy_vacuum_heap_page() with
vacrel->nindexes == 0. Now it should only be called after doing index
vacuuming (thus index vacuuming should definitely be enabled).

- Melanie

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jelte Fennema-Nio 2024-01-10 22:53:50 Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
Previous Message Majid Garoosi 2024-01-10 21:45:48 GUCifying MAX_WAL_SEND