Re: Eager page freeze criteria clarification

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Melanie Plageman <melanieplageman(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: Eager page freeze criteria clarification
Date:	2023-09-06 20:21:31
Message-ID:	CA+TgmobVz=gC60bR=U2EAdFq8DUXrTqzOnfzBtNEDcfh+QNA6g@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Sep 6, 2023 at 12:20 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> This was a case where index vacuuming was never required. It's just a
> simple and easy to recreate example of what I think of as a more
> general problem.

OK.

> Why wouldn't we expect a table to have some pages that ought to be
> frozen right away, and others where freezing should in theory be put
> off indefinitely? I think that that's very common.

Oh, I see. I agree that that's pretty common. I think what that means
in practice is that we need to avoid relying too much on
relation-level statistics to guide behavior with respect to individual
relation pages. On the other hand, I don't think it means that we
can't look at relation-wide or even system-wide statistics at all.
Sure, those statistics may not be perfect, but some things are not
practical to track on a page granularity, and having some
coarse-grained information can, I think, be better than having nothing
at all, if you're careful about how much and in what way you rely on
it.

> As you know, I am particularly concerned about the tendency of
> unfrozen all-visible pages to accumulate without bound (at least
> without bound expressed in physical units such as pages). The very
> fact that pages are being set all-visible by VACUUM can be seen as a
> part of a high-level systemic problem -- a problem that plays out over
> time, across multiple VACUUM operations. So even if the cost of
> setting pages all-visible happened to be much lower than the cost of
> freezing (which it isn't), setting pages all-visible without freezing
> has unique downsides.

I generally agree with all of that.

> If VACUUM freezes too aggressively, then (pretty much by definition)
> we can be sure that the next VACUUM will scan the same pages -- there
> may be some scope for VACUUM to "learn from its mistake" when we err
> in the direction of over-freezing. But when VACUUM makes the opposite
> mistake (doesn't freeze when it should have), it won't scan those same
> pages again for a long time, by design. It therefore has no plausible
> way of "learning from its mistakes" before it becomes an extremely
> expensive and painful lesson (which happens whenever the next
> aggressive VACUUM takes place). This is in large part a consequence of
> the way that VACUUM dutifully sets pages all-visible whenever
> possible. That behavior interacts badly with many workloads, over
> time.

I think this is an insightful commentary with which I partially agree.
As I see it, the difference is that when you make the mistake of
marking something all-visible or freezing it too aggressively, you
incur a price that you pay almost immediately. When you make the
mistake of not marking something all-visible when it would have been
best to do so, you incur a price that you pay later, when the next
VACUUM happens. When you make the mistake of not marking something
all-frozen when it would have been best to do so, you incur a price
that you pay even later, not at the next VACUUM but at some VACUUM
further off. So there are different trade-offs. When you pay the price
for a mistake immediately or nearly immediately, it can potentially
harm the performance of the foreground workload, if you're making a
lot of mistakes. That sucks. On the other hand, when you defer paying
the price until some later bulk operation, the costs of all of your
mistakes get added up and then you pay the whole price all at once,
which means you can be suddenly slapped with an enormous bill that you
weren't expecting. That sucks, too, just in a different way.
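
As a rough illustration of the trade-off described above, here is a toy model of the two payment schedules. It is only a sketch: the function names, the unit cost per mistake, and the fixed interval between aggressive VACUUMs are all assumptions made up for the example, not anything measured or taken from PostgreSQL.

```python
# Toy model only: every number and name here is illustrative.

def immediate_costs(mistakes_per_vacuum, cost_per_mistake):
    """Bill at each VACUUM when every mistake is paid for right away."""
    return [m * cost_per_mistake for m in mistakes_per_vacuum]


def deferred_costs(mistakes_per_vacuum, cost_per_mistake, aggressive_every):
    """Bill at each VACUUM when mistakes pile up until an aggressive VACUUM.

    aggressive_every is a hypothetical fixed interval: every Nth VACUUM
    is treated as the aggressive one that pays off the whole backlog.
    """
    bills = []
    backlog = 0
    for i, mistakes in enumerate(mistakes_per_vacuum, start=1):
        backlog += mistakes
        if i % aggressive_every == 0:
            bills.append(backlog * cost_per_mistake)  # the whole bill at once
            backlog = 0
        else:
            bills.append(0)  # nothing paid yet, debt keeps growing
    return bills


mistakes = [10, 10, 10, 10, 10, 10]
print(immediate_costs(mistakes, 1))    # [10, 10, 10, 10, 10, 10]
print(deferred_costs(mistakes, 1, 6))  # [0, 0, 0, 0, 0, 60]
```

In this toy model the total is the same either way; what differs is whether it shows up as a steady drag on every VACUUM or as a single large spike.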

> VACUUM simply ignores such second-order effects. Perhaps it would be
> practical to address some of the issues in this area by avoiding
> setting pages all-visible without freezing them, in some general
> sense. That at least creates a kind of symmetry between mistakes in
> the direction of under-freezing and mistakes in the direction of
> over-freezing. That might enable VACUUM to course-correct in either
> direction.
>
> Melanie is already planning on combining the WAL records (PRUNE,
> FREEZE_PAGE, and VISIBLE). Perhaps that'll weaken the argument for
> setting unfrozen pages all-visible even further.

Yeah, so I think the question here is whether it's ever a good idea to
mark a page all-visible without also freezing it. If it's not, then we
should either mark fewer pages all-visible, or freeze more of them.
Maybe I'm all wet here, but I think it depends on the situation. If a
page is already dirty and has had an FPI since the last checkpoint,
then it's pretty appealing to freeze whenever we mark all-visible. We
still have to consider whether the incremental CPU cost and WAL volume
are worth it, but assuming those costs are small enough not to be a
big problem, it seems like a pretty good bet. Making a page
un-all-visible has some cost, but making a page un-all-frozen really
doesn't, so cool. On the other hand, if we have a page that isn't
dirty, hasn't had a recent FPI, and doesn't need pruning, but which
can be marked all-visible, freezing it is a potentially more
significant cost, because marking the buffer all-visible doesn't force
a new FPI, and freezing does.
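
The two cases above can be written down as a small decision sketch. This is not PostgreSQL code and not a proposed patch: `PageState`, its field names, `freeze_advice`, and the string labels are hypothetical, and the "unclear" branch covers page states that the paragraph above does not discuss.

```python
from dataclasses import dataclass


@dataclass
class PageState:
    # Hypothetical fields for illustration; not PostgreSQL identifiers.
    is_dirty: bool               # page is already dirty
    fpi_since_checkpoint: bool   # page had an FPI since the last checkpoint
    needs_pruning: bool          # VACUUM would prune this page anyway
    can_set_all_visible: bool    # page qualifies to be marked all-visible


def freeze_advice(page: PageState, extra_cost_is_small: bool = True) -> str:
    """Classify a page according to the two cases described above."""
    if not page.can_set_all_visible:
        return "n/a"  # the question only arises when marking all-visible
    if page.is_dirty and page.fpi_since_checkpoint:
        # No new FPI is needed, so only incremental CPU and WAL remain.
        return "freeze" if extra_cost_is_small else "weigh-cost"
    if (not page.is_dirty and not page.fpi_since_checkpoint
            and not page.needs_pruning):
        # Freezing would force a new FPI that marking all-visible would not.
        return "costly"
    return "unclear"  # assumption: mixed states are not covered above


print(freeze_advice(PageState(True, True, False, True)))    # freeze
print(freeze_advice(PageState(False, False, False, True)))  # costly
```

The sketch only labels the cases; it deliberately makes no decision for the "costly" case.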

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Re: Eager page freeze criteria clarification at 2023-09-06 16:20:23 from Peter Geoghegan

Responses

Re: Eager page freeze criteria clarification at 2023-09-08 05:26:05 from Andres Freund
