Re: Maybe we should reduce SKIP_PAGES_THRESHOLD a bit?

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Maybe we should reduce SKIP_PAGES_THRESHOLD a bit?
Date: 2024-12-17 17:06:28
Message-ID: CAAKRu_aXdsPrhmS=VdQ7vSypFNFEZtSZnuM8GXZq5ZYZxS3Jcg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 17, 2024 at 9:11 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
> On 12/16/24 19:49, Melanie Plageman wrote:
>
> > No, I'm talking about the behavior of causing small pockets of
> > all-frozen pages which end up being smaller than SKIP_PAGES_THRESHOLD
> > and are then scanned (even though they are already frozen). What I
> > described in the email I cited is that because we freeze
> > opportunistically when we have or will emit an FPI, and bgwriter will
> > write out blocks in clocksweep order, we end up with random pockets of
> > pages getting frozen during/after a checkpoint. Then in the next
> > vacuum, we end up scanning those all-frozen pages again because the
> > ranges of frozen pages are smaller than SKIP_PAGES_THRESHOLD. This is
> > mostly going to happen for an insert-only workload. I'm not saying
> > freezing the pages is bad, I'm saying that causing these pockets of
> > frozen pages leads to scanning all-frozen pages on future vacuums.
> >
>
> Yeah, this interaction between the components is not great :-( But can
> we think of a way to reduce the fragmentation? What would need to change?

Well, reducing SKIP_PAGES_THRESHOLD would help. And unfortunately we do
not know whether the skippable pages are all-frozen without extra
visibilitymap_get_status() calls -- so we can't decide to skip ranges
of skippable pages on the grounds that they are already frozen.
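
To make that concrete, the check would look something like this (an
untested sketch; range_is_all_frozen is a made-up helper standing in
for new logic in the skipping machinery, which today only compares the
length of the skippable range against SKIP_PAGES_THRESHOLD):

/* Sketch only; would live in vacuumlazy.c (needs access/visibilitymap.h). */
static bool
range_is_all_frozen(Relation rel, BlockNumber start, BlockNumber end,
                    Buffer *vmbuffer)
{
    BlockNumber blkno;

    /* One extra VM lookup per block in the skippable range. */
    for (blkno = start; blkno < end; blkno++)
    {
        uint8       vmstatus = visibilitymap_get_status(rel, blkno, vmbuffer);

        if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
            return false;
    }

    return true;
}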

> I don't think bgwriter can help much - it's mostly oblivious to the
> contents of the buffer, I don't think it could consider stuff like this
> when deciding what to evict.

Agreed.

> Maybe the freezing code could check how many of the nearby pages are
> frozen, and consider that together with the FPI write?

That's an interesting idea. We wouldn't have any guaranteed info
because we only have a lock on the page we are considering freezing.
But we could keep track of the length of the run of pages we are
freezing and opportunistically freeze pages that don't require
freezing if they follow one or more pages that do. I don't know how
much more this buys us than removing SKIP_PAGES_THRESHOLD, though.
Since it would "fix" the fragmentation, it might make larger future
vacuum reads possible, but I wonder whether the benefit would be worth
the complexity.
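
Roughly, the heuristic I have in mind looks like this (entirely
hypothetical: frozen_run_len would be new state carried from block to
block during the heap scan, and page_requires_freezing /
page_is_freezable stand in for the existing freeze checks):

/* Hypothetical per-block freezing decision during the heap scan. */
if (page_requires_freezing)
{
    freeze_page = true;
    frozen_run_len++;           /* extend the current run of frozen pages */
}
else if (frozen_run_len > 0 && page_is_freezable)
{
    /*
     * Opportunistically freeze a page that doesn't strictly need it,
     * purely to avoid breaking up a run of all-frozen pages.
     */
    freeze_page = true;
    frozen_run_len++;
}
else
{
    freeze_page = false;
    frozen_run_len = 0;         /* the run ends here */
}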

> >>> However, we are not close to coming up with a
> >>> replacement heuristic, so removing SKIP_PAGES_THRESHOLD would help.
> >>> This wouldn't have affected your results, but it is worth considering
> >>> more generally.
> >>
> >> One of the reasons why we have SKIP_PAGES_THRESHOLD is that it makes
> >> it more likely that non-aggressive VACUUMs will advance relfrozenxid.
> >> Granted, it's probably not doing a particularly good job at that right
> >> now. But any effort to replace it should account for that.
> >>
>
> I don't follow. How could non-aggressive VACUUM advance relfrozenxid,
> ever? I mean, if it doesn't guarantee freezing all pages, how could it?

It may, coincidentally, not skip any all-visible pages -- in which
case it has scanned every page that could contain an old XID and can
safely advance relfrozenxid. Peter points out that this happens all
the time for small tables, but wouldn't the overhead of an aggressive
vacuum be barely noticeable for small tables? It seems like there is
little cost to waiting.
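
For reference, that is already how the bookkeeping works: vacuum
records whether it skipped any all-visible (but not all-frozen) page,
and only that prevents relfrozenxid advancement. A rough paraphrase of
the logic in vacuumlazy.c (not the literal code):

/*
 * If this vacuum never skipped an all-visible-but-not-all-frozen
 * page, every page that could hold an old XID was scanned, so even a
 * non-aggressive vacuum can safely advance relfrozenxid.
 */
if (!vacrel->skippedallvis)
    can_advance_relfrozenxid = true;    /* hypothetical local */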

> >> This is possible by making VACUUM consider the cost of scanning extra
> >> heap pages up-front. If the number of "extra heap pages to be scanned"
> >> to advance relfrozenxid happens to not be very high (or not so high
> >> *relative to the current age(relfrozenxid)*), then pay that cost now,
> >> in the current VACUUM operation. Even if age(relfrozenxid) is pretty
> >> far from the threshold for aggressive mode, if the added cost of
> >> advancing relfrozenxid is still not too high, why wouldn't we just do
> >> it?
> >
> > That's an interesting idea. And it seems like a much more effective
> > way of getting some relfrozenxid advancement than hoping that the
> > pages you scan due to SKIP_PAGES_THRESHOLD end up being enough to have
> > scanned all unfrozen tuples.
> >
>
> I agree it might be useful to formulate this as a "costing" problem, not
> just in the context of a single vacuum, but for the overall maintenance
> overhead - essentially accepting the vacuum gets slower, in exchange for
> lower cost of maintenance later.

Yes, that costing sounds like a big research and benchmarking project
on its own.
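
If someone takes it on, I'd expect the first cut to be a simple gate
along these lines (all hypothetical: extra_pages_to_scan, rel_pages,
and relfrozenxid_age are stand-in variables, and the 0.05 cap is a
placeholder for whatever the benchmarking suggests):

/*
 * Hypothetical up-front decision: should this non-aggressive vacuum
 * scan the extra all-visible pages needed to advance relfrozenxid?
 */
double      extra_frac = (double) extra_pages_to_scan / rel_pages;
double      age_frac = (double) relfrozenxid_age / autovacuum_freeze_max_age;

/* Pay the cost now if it's small, or if wraparound is getting closer. */
bool        advance_now = (extra_frac < 0.05 || extra_frac < age_frac);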

> But I think that (a) is going to be fairly complex, because how do you
> cost the future vacuum?, and (b) somewhat misses my point that on
> modern NVMe SSD storage, SKIP_PAGES_THRESHOLD > 1 doesn't seem to be a
> win *ever*.
>
> So why shouldn't we reduce the SKIP_PAGES_THRESHOLD value (or perhaps
> make it configurable)? We can still do the other stuff (decide how
> aggressively to freeze stuff, etc.) independently of that.

I think your tests show SKIP_PAGES_THRESHOLD has dubious benefit, if
any, related to readahead. But the question is whether we care about
it for advancing relfrozenxid on small tables.

- Melanie
