Re: Emit fewer vacuum records by reaping removable tuples during pruning

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Emit fewer vacuum records by reaping removable tuples during pruning
Date: 2024-01-12 17:33:28
Message-ID: CAAKRu_YwBdS6Od=6ZzWRBY=OWO7ryPjwz7CZjH-uQKf2C3PGrA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 12, 2024 at 11:50 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Fri, Jan 12, 2024 at 9:44 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > Looking through the git history, I see that this behavior seems to
> > date back to 44fa84881fff4529d68e2437a58ad2c906af5805 which introduced
> > lazy_scan_noprune(). The comments don't explain the reasoning, but my
> > guess is that it was just an accident. It's not entirely evident to me
> > whether there might ever be good reasons to update the freespace map
> > for a page where we haven't freed up any space -- after all, the free
> > space map isn't crash-safe, so you could always postulate that
> > updating it will correct an existing inaccuracy. But I really doubt
> > that there's any good reason for lazy_scan_prune() and
> > lazy_scan_noprune() to have different ideas about whether to update
> > the FSM or not, especially in an obscure corner case like this.
>
> Why do you think that lazy_scan_prune() and lazy_scan_noprune() have
> different ideas about whether to update the FSM or not?
>
> Barring certain failsafe edge cases, we always call
> PageGetHeapFreeSpace() exactly once for each scanned page. While it's
> true that we won't always do that in the first heap pass (in cases
> where VACUUM has indexes), that's only because we expect to do it in
> the second heap pass instead -- since we only want to do it once.
>
> It's true that lazy_scan_noprune unconditionally calls
> PageGetHeapFreeSpace() when vacrel->nindexes == 0. But that's the same
> behavior as lazy_scan_prune when vacrel->nindexes == 0. In both cases
> we know that there won't be any second heap pass, and so in both cases
> we always call PageGetHeapFreeSpace() in the first heap pass. It's
> just that it's a bit harder to see that in the lazy_scan_prune case.
> No?

So, I think this is the logic in master:

Prune case, first pass:

- indexes == 0 && space_freed -> update FSM
- indexes == 0 && (!space_freed || !index_vacuuming) -> update FSM
- indexes > 0 && (!space_freed || !index_vacuuming) -> update FSM

which reduces to:

- indexes == 0 || !space_freed || !index_vacuuming

No Prune (except aggressive vacuum), first pass:

- indexes == 0 -> update FSM
- indexes > 0 && !space_freed -> update FSM

which reduces to:

- indexes == 0 || !space_freed

In other words: during the first pass, if we will not do index
vacuuming (because we have no indexes, because there is nothing to
vacuum, or because do_index_vacuuming is false), make sure we update
the freespace map now.

But what about the no-prune case when do_index_vacuuming is false?
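
For anyone who wants to double-check the two reductions above, here is a
brute-force truth table. It is only a sketch in Python: the function names
are made up for illustration, and it models nothing more than the conditions
as I listed them in this mail, not the actual vacuumlazy.c code.

```python
from itertools import product

# Each state is (has_indexes, space_freed, index_vacuuming).
# These predicates transcribe the bullet lists in this mail; they are
# an illustrative model, not code taken from PostgreSQL.

def prune_cases(has_indexes, space_freed, index_vacuuming):
    """The three prune-case bullets, OR'ed together."""
    no_indexes = not has_indexes
    return (
        (no_indexes and space_freed)
        or (no_indexes and (not space_freed or not index_vacuuming))
        or (has_indexes and (not space_freed or not index_vacuuming))
    )

def prune_reduced(has_indexes, space_freed, index_vacuuming):
    """indexes == 0 || !space_freed || !index_vacuuming"""
    return (not has_indexes) or (not space_freed) or (not index_vacuuming)

def noprune_cases(has_indexes, space_freed, index_vacuuming):
    """The two no-prune bullets, OR'ed together."""
    return (not has_indexes) or (has_indexes and not space_freed)

def noprune_reduced(has_indexes, space_freed, index_vacuuming):
    """indexes == 0 || !space_freed"""
    return (not has_indexes) or (not space_freed)

ALL_STATES = list(product([False, True], repeat=3))

# Both reductions hold for every combination of inputs.
assert all(prune_cases(*s) == prune_reduced(*s) for s in ALL_STATES)
assert all(noprune_cases(*s) == noprune_reduced(*s) for s in ALL_STATES)

# States where the prune and no-prune paths disagree about updating the FSM.
differ = [s for s in ALL_STATES if prune_reduced(*s) != noprune_reduced(*s)]
print(differ)
```

Under this model the two paths disagree in exactly one state: indexes > 0,
space was freed, and index vacuuming is disabled. That is the case the
question above is about.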

I still don't understand why vacuum is responsible for updating the
FSM per page when no line pointers have been set unused. That is how
PageGetFreeSpace() figures out if there is free space, right?

- Melanie
