Re: Maybe we should reduce SKIP_PAGES_THRESHOLD a bit?

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Maybe we should reduce SKIP_PAGES_THRESHOLD a bit?
Date: 2024-12-17 14:11:25
Message-ID: d8994ef4-894c-4002-a862-0101e71f5b44@vondra.me
Lists: pgsql-hackers

On 12/16/24 19:49, Melanie Plageman wrote:
> On Mon, Dec 16, 2024 at 12:32 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>>
>> On Mon, Dec 16, 2024 at 10:37 AM Melanie Plageman
>> <melanieplageman(at)gmail(dot)com> wrote:
>>> On a related note, the other day I noticed another negative effect
>>> caused in part by SKIP_PAGES_THRESHOLD. SKIP_PAGES_THRESHOLD interacts
>>> with the opportunistic freeze heuristic [1] causing lots of all-frozen
>>> pages to be scanned when checksums are enabled. You can easily end up
>>> with a table that has very fragmented ranges of frozen, all-visible,
>>> and modified pages. In this case, the opportunistic freeze heuristic
>>> bears most of the blame.
>>
>> Bears most of the blame for what? Significantly reducing the total
>> amount of WAL written?
>
> No, I'm talking about the behavior of causing small pockets of
> all-frozen pages which end up being smaller than SKIP_PAGES_THRESHOLD
> and are then scanned (even though they are already frozen). What I
> describe in that email I cited is that because we freeze
> opportunistically when we have or will emit an FPI, and bgwriter will
> write out blocks in clocksweep order, we end up with random pockets of
> pages getting frozen during/after a checkpoint. Then in the next
> vacuum, we end up scanning those all-frozen pages again because the
> ranges of frozen pages are smaller than SKIP_PAGES_THRESHOLD. This is
> mostly going to happen for an insert-only workload. I'm not saying
> freezing the pages is bad, I'm saying that causing these pockets of
> frozen pages leads to scanning all-frozen pages on future vacuums.
>

Yeah, this interaction between the components is not great :-( But can
we think of a way to reduce the fragmentation? What would need to change?

I don't think bgwriter can help much - it's mostly oblivious to the
contents of the buffers it writes out, so I don't think it could consider
stuff like this when deciding what to evict.

Maybe the freezing code could check how many of the nearby pages are
frozen, and consider that together with the FPI write?
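
Something like this, perhaps? Just a rough sketch to illustrate the idea,
not a patch - the helper, the window size and the cutoff are all made up,
only the VM_ALL_FROZEN() lookup is real:

#include "postgres.h"
#include "access/visibilitymap.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"

/* made-up window size, purely for illustration */
#define FREEZE_NEIGHBORHOOD 16

/*
 * Hypothetical helper: before freezing a page just because we're already
 * emitting an FPI, check how many of its neighbors are already all-frozen
 * in the visibility map, so that we extend existing frozen runs instead
 * of creating isolated pockets that later fall below SKIP_PAGES_THRESHOLD.
 */
static bool
neighborhood_mostly_frozen(Relation rel, BlockNumber blkno,
                           BlockNumber nblocks, Buffer *vmbuffer)
{
    BlockNumber start = (blkno > FREEZE_NEIGHBORHOOD) ?
        blkno - FREEZE_NEIGHBORHOOD : 0;
    BlockNumber end = Min(blkno + FREEZE_NEIGHBORHOOD, nblocks - 1);
    int         frozen = 0;
    int         total = 0;

    for (BlockNumber b = start; b <= end; b++)
    {
        if (b == blkno)
            continue;
        total++;
        if (VM_ALL_FROZEN(rel, b, vmbuffer))
            frozen++;
    }

    /* arbitrary cutoff: only freeze if at least half the window is frozen */
    return (total > 0 && frozen * 2 >= total);
}

No idea if walking the nearby VM bits like that is cheap enough to do for
every FPI-triggered freeze, but the map is compact, so maybe.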

>>> However, we are not close to coming up with a
>>> replacement heuristic, so removing SKIP_PAGES_THRESHOLD would help.
>>> This wouldn't have affected your results, but it is worth considering
>>> more generally.
>>
>> One of the reasons why we have SKIP_PAGES_THRESHOLD is that it makes
>> it more likely that non-aggressive VACUUMs will advance relfrozenxid.
>> Granted, it's probably not doing a particularly good job at that right
>> now. But any effort to replace it should account for that.
>>

I don't follow. How could non-aggressive VACUUM advance relfrozenxid,
ever? I mean, if it doesn't guarantee freezing all pages, how could it?

>> This is possible by making VACUUM consider the cost of scanning extra
>> heap pages up-front. If the number of "extra heap pages to be scanned"
>> to advance relfrozenxid happens to not be very high (or not so high
>> *relative to the current age(relfrozenxid)*), then pay that cost now,
>> in the current VACUUM operation. Even if age(relfrozenxid) is pretty
>> far from the threshold for aggressive mode, if the added cost of
>> advancing relfrozenxid is still not too high, why wouldn't we just do
>> it?
>
> That's an interesting idea. And it seems like a much more effective
> way of getting some relfrozenxid advancement than hoping that the
> pages you scan due to SKIP_PAGES_THRESHOLD end up being enough to have
> scanned all unfrozen tuples.
>

I agree it might be useful to formulate this as a "costing" problem, not
just in the context of a single vacuum, but for the overall maintenance
overhead - essentially accepting that the current vacuum gets slower in
exchange for a lower maintenance cost later.

But I think that (a) is going to be fairly complex, because how do you
cost the future vacuums? And (b) it somewhat misses my point that on
modern NVMe SSD storage, SKIP_PAGES_THRESHOLD > 1 doesn't seem to be a
win *ever*.

So why shouldn't we reduce the SKIP_PAGES_THRESHOLD value (or perhaps
make it configurable)? We can still do the other stuff (deciding how
aggressively to freeze pages etc.) independently of that.
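
For reference, this is roughly what the skipping logic in vacuumlazy.c
boils down to today (a simplified paraphrase, not the literal source -
the helper name is mine):

#include "storage/block.h"

/* the current hard-coded value in vacuumlazy.c */
#define SKIP_PAGES_THRESHOLD    ((BlockNumber) 32)

/*
 * A run of consecutive all-visible pages only gets skipped when it's at
 * least SKIP_PAGES_THRESHOLD blocks long; shorter runs are read anyway,
 * on the theory that skipping them would defeat OS readahead.  Lowering
 * the threshold (or turning it into a GUC) means the short frozen runs
 * described above would get skipped too.
 */
static bool
skip_allvisible_run(BlockNumber run_start, BlockNumber next_unskippable_block)
{
    return (next_unskippable_block - run_start) >= SKIP_PAGES_THRESHOLD;
}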

regards

--
Tomas Vondra
