Re: Emit fewer vacuum records by reaping removable tuples during pruning

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Emit fewer vacuum records by reaping removable tuples during pruning
Date: 2024-01-17 23:08:51
Message-ID: CAH2-WzmkH=v2+DsY+pgNQhz252gqd-ie=weueCp4v=K3gUaDOQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 17, 2024 at 5:47 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
>
> On Wed, Jan 17, 2024 at 4:31 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> >
> > On Wed, Jan 17, 2024 at 4:25 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > > I tend to suspect that VACUUM_FSM_EVERY_PAGES is fundamentally the
> > > wrong idea. If it's such a good idea then why not apply it all the
> > > time? That is, why not apply it independently of whether nindexes==0
> > > in the current VACUUM operation? (You know, just like with
> > > FAILSAFE_EVERY_PAGES.)
> >
> > Actually, I suppose that we couldn't apply it independently of
> > nindexes==0. Then we'd call FreeSpaceMapVacuumRange() before our
> > second pass over the heap takes place for those LP_DEAD-containing
> > heap pages scanned since the last round of index/heap vacuuming took
> > place (or since VACUUM began). We need to make sure that the FSM has
> > the most recent possible information known to VACUUM, which would
> > break if we applied VACUUM_FSM_EVERY_PAGES rules when nindexes > 0.
> >
> > Even still, the design of VACUUM_FSM_EVERY_PAGES seems questionable to me.
>
> I now see I misunderstood and my earlier email was wrong. I didn't
> notice that we only use VACUUM_FSM_EVERY_PAGES if nindexes == 0.
> So, in master, we call FreeSpaceMapVacuumRange() always after a round
> of index vacuuming and periodically if there are no indexes.

The "nindexes == 0" if() that comes just after our call to
lazy_scan_prune() is "the one-pass equivalent of a call to
lazy_vacuum()". That equivalence also covers the call to
FreeSpaceMapVacuumRange() that immediately follows lazy_vacuum() in
the two-pass case.
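
To make the two cadences concrete, here is a rough standalone model (a
sketch only, not the vacuumlazy.c code). The function names and the
dead_per_page/max_dead_items parameters are made up for illustration,
and the counters stand in for real calls to FreeSpaceMapVacuumRange().
It also assumes that every scanned block is eligible to trigger the
periodic check in the one-pass case, which is a simplification.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/*
 * One-pass model (nindexes == 0): FSM vacuuming is triggered purely by the
 * distance scanned since the last FSM vacuum.  Simplification: every block
 * is treated as eligible to trigger the check.
 */
unsigned
model_one_pass_fsm_calls(BlockNumber rel_pages, BlockNumber every_pages)
{
    BlockNumber next_fsm_block_to_vacuum = 0;
    unsigned    calls = 0;

    assert(every_pages > 0);
    for (BlockNumber blkno = 0; blkno < rel_pages; blkno++)
    {
        if (blkno - next_fsm_block_to_vacuum >= every_pages)
        {
            calls++;            /* stands in for FreeSpaceMapVacuumRange() */
            next_fsm_block_to_vacuum = blkno;
        }
    }
    return calls;
}

/*
 * Two-pass model (nindexes > 0): FSM vacuuming piggybacks on each round of
 * index/heap vacuuming, which happens when dead-item storage would overflow.
 * dead_per_page and max_dead_items are hypothetical knobs for this sketch.
 */
unsigned
model_two_pass_fsm_calls(BlockNumber rel_pages, unsigned dead_per_page,
                         unsigned max_dead_items)
{
    unsigned    dead_items = 0;
    unsigned    calls = 0;

    assert(dead_per_page <= max_dead_items);
    for (BlockNumber blkno = 0; blkno < rel_pages; blkno++)
    {
        if (dead_items + dead_per_page > max_dead_items)
        {
            /* stands in for lazy_vacuum(), then FreeSpaceMapVacuumRange() */
            dead_items = 0;
            calls++;
        }
        dead_items += dead_per_page;
    }
    return calls;
}
```

Only mid-scan calls are counted; any final FSM vacuuming at the end of
the heap scan is left out of both models. The point is just that the
one-pass trigger depends on blocks scanned, while the two-pass trigger
depends on how quickly dead items accumulate.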

> It seems like you are asking whether or not we should vacuum the FSM at a
> different cadence for the no indexes case (and potentially count
> blocks actually vacuumed instead of blocks considered).
>
> And it seems like Robert is asking whether or not we should
> FreeSpaceMapVacuumRange() more frequently than after index vacuuming
> in the nindexes > 0 case.

There is no particular reason for the nindexes == 0 case to care about
how often we'd call FreeSpaceMapVacuumRange() in the counterfactual
world where the same VACUUM ran on the same table, except with
nindexes > 0 instead. At least I don't see any.

> Other than the overhead of the actual vacuuming of the FSM, what are
> the potential downsides of knowing about freespace sooner? It could
> change what pages are inserted to. What are the possible undesirable
> side effects?

The whole VACUUM_FSM_EVERY_PAGES thing comes from commit 851a26e266.
The commit message of that work seems to suppose that calling
FreeSpaceMapVacuumRange() more frequently is pretty much strictly
better than calling it less frequently, at least up to the point where
certain more-or-less fixed costs paid once per
FreeSpaceMapVacuumRange() start to become a problem. I think that
that's probably about right.

The commit message also says that we "arbitrarily update upper FSM
pages after each 8GB of heap" (in the nindexes == 0 case). So
VACUUM_FSM_EVERY_PAGES is only very approximately analogous to what we
do in the nindexes > 0 case. That seems reasonable because these two
cases really aren't so comparable in terms of the FSM vacuuming
requirements -- the nindexes == 0 case legitimately doesn't have the
same dependency on heap vacuuming (and index vacuuming) that we have
to consider when nindexes > 0.
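
For a sense of scale, the "8GB of heap" figure can be turned into a
block count with a little arithmetic. This is a standalone sketch with
a made-up helper name (fsm_every_pages), not the macro definition
itself; it only assumes that the threshold is 8GB divided by the block
size.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/*
 * Number of heap blocks that make up heap_bytes of heap at a given block
 * size.  fsm_every_pages() is a hypothetical helper for this sketch only.
 */
BlockNumber
fsm_every_pages(uint64_t heap_bytes, uint32_t blcksz)
{
    assert(blcksz > 0);
    return (BlockNumber) (heap_bytes / blcksz);
}
```

With the default 8kB block size, fsm_every_pages(8GB, 8192) comes to
1048576 blocks between periodic FSM vacuums in the nindexes == 0 case.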

--
Peter Geoghegan
