Re: Eager page freeze criteria clarification

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: Eager page freeze criteria clarification
Date: 2023-09-27 17:46:33
Message-ID: 20230927174633.hrnoia3vz5s7a5uv@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-09-27 10:25:00 -0700, Peter Geoghegan wrote:
> On Wed, Sep 27, 2023 at 10:01 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2023-09-26 09:07:13 -0700, Peter Geoghegan wrote:
> > I don't think doing this on a system wide basis with a metric like #unfrozen
> > pages is a good idea. It's quite common to have short lived data in some
> > tables while also having long-lived data in other tables. Making opportunistic
> > freezing more aggressive in that situation will just hurt, without a benefit
> > (potentially even slowing down the freezing of older data!). And even within a
> > single table, making freezing more aggressive because there's a decent sized
> > part of the table that is updated regularly and thus not frozen, doesn't make
> > sense.
>
> I never said that #unfrozen pages should be the sole criterion, for
> anything. Just that it would influence the overall strategy, making
> the system veer towards more aggressive freezing. It would complement
> a more sophisticated algorithm that decides whether or not to freeze a
> page based on its individual characteristics.
>
> For example, maybe the page-level algorithm would have a random
> component. That could potentially be where the global (or at least
> table level) view gets to influence things -- the random aspect is
> weighed using the global view of debt. That kind of thing seems like
> an interesting avenue of investigation.

I don't disagree that we should do something in that direction - I just don't
see the raw number of unfrozen pages being useful in that regard. If you have
a database where no pages live long, we don't need to freeze
opportunistically, yet the fraction of unfrozen pages will be huge.

> > If we want to take global freeze debt into account, which I think is a good
> > idea, we'll need a smarter way to represent the debt than just the number of
> > unfrozen pages. I think we would need to track the age of unfrozen pages in
> > some way. If there are a lot of unfrozen pages with a recent xid, then it's
> > fine, but if they are older and getting older, it's a problem and we need to
> > be more aggressive.
>
> Tables like pgbench_history will have lots of unfrozen pages with a
> recent XID that get scanned during every VACUUM. We should be freezing
> such pages at the earliest opportunity.

I think we ought to be able to freeze tables with as simple a workload as
pgbench_history has aggressively without taking a global freeze debt into
account.

> > The problem I see is how to track the age of unfrozen data -
> > it'd be easy enough to track the mean(oldest-64bit-xid-on-page), but then we
> > again have the issue of rare outliers moving the mean too much...
>
> I think that XID age is mostly not very important compared to the
> absolute amount of unfrozen pages, and the cost profile of freezing
> now versus later. (XID age *is* important in emergencies, but that's
> mostly not what we're discussing right now.)

We definitely *also* should take the number of unfrozen pages into account. I
just don't think determining freeze debt primarily using the number of unfrozen
pages will be useful. The presence of unfrozen pages that are likely to be
updated again soon is not a problem and makes the simple metric pretty much
useless.

> To be clear, that doesn't mean that XID age shouldn't play an
> important role in helping VACUUM to differentiate between pages that
> should not be frozen and pages that should be frozen.

I think we need to take it into account to determine a useful freeze debt on a
table level (and potentially system wide too).

Assuming we could compute it cheaply enough, if we had an approximate median
oldest-64bit-xid-on-page and the number of unfrozen pages, we could
differentiate between tables that have lots of recent unfrozen pages (the
median will be low) and tables with lots of unfrozen pages that are unlikely to
be updated again (the median will be high and growing). Something like the
median 64bit xid would be interesting because it'd not get "invalidated" if
relfrozenxid is increased.
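
To make the distinction concrete, here is a rough sketch in Python. It is
purely illustrative: nothing like this exists in the tree, and the function
name, the per-page input list and the sample numbers are all made up. It also
assumes that "low" and "high" above refer to the age of the median
oldest-64bit-xid-on-page relative to the current xid. The two tables have the
same number of unfrozen pages, so the raw count cannot tell them apart, while
the median age can. The hot table also carries one very old outlier page,
which moves the mean a lot but not the median.

```python
from statistics import mean, median


def freeze_debt_summary(oldest_xid_per_unfrozen_page, current_xid):
    """Summarize one table's unfrozen pages (hypothetical helper).

    oldest_xid_per_unfrozen_page: one 64-bit xid per unfrozen page, the
    oldest xid found on that page (assumed to be available somehow).
    """
    ages = [current_xid - xid for xid in oldest_xid_per_unfrozen_page]
    if not ages:
        return {"unfrozen_pages": 0, "median_age": 0, "mean_age": 0.0}
    return {
        "unfrozen_pages": len(ages),
        "median_age": median(ages),
        "mean_age": mean(ages),
    }


current_xid = 10_000_000

# Hot table: 1000 unfrozen pages touched within the last 1000 xids,
# plus a single page that has not been modified for a very long time.
hot_pages = [current_xid - i for i in range(1000)] + [100]

# Cold table: 1001 unfrozen pages last modified about 8 million xids ago.
cold_pages = [2_000_000 + i for i in range(1001)]

hot_summary = freeze_debt_summary(hot_pages, current_xid)
cold_summary = freeze_debt_summary(cold_pages, current_xid)

print("hot: ", hot_summary)
print("cold:", cold_summary)
```

Because the input is 64-bit xids rather than ages relative to relfrozenxid,
the stored median would stay meaningful when relfrozenxid advances; only the
subtraction from the current xid changes over time.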

Greetings,

Andres Freund
