Re: Eager page freeze criteria clarification

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: Eager page freeze criteria clarification
Date: 2023-09-27 17:01:21
Message-ID: 20230927170121.j3klc3xi4yonle5y@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-09-26 09:07:13 -0700, Peter Geoghegan wrote:
> On Tue, Sep 26, 2023 at 8:19 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > However, I'm not at all convinced doing this on a system wide level is a good
> > idea. Databases do often contain multiple types of workloads at the same
> > time. E.g., we want to freeze aggressively in a database that has the bulk of
> > its size in archival partitions but has lots of unfrozen data in an active
> > partition. And databases have often loads of data that's going to change
> > frequently / isn't long lived, and we don't want to super aggressively freeze
> > that, just because it's a large portion of the data.
>
> I didn't say that we should always have most of the data in the
> database frozen, though. Just that we can reasonably be more lazy
> about freezing the remainder of pages if we observe that most pages
> are already frozen. How they got that way is another discussion.
>
> I also think that the absolute amount of debt (measured in physical
> units such as unfrozen pages) should be kept under control. But that
> isn't something that can ever be expected to work on the basis of a
> simple threshold -- if only because autovacuum scheduling just doesn't
> work that way, and can't really be adapted to work that way.

I don't think doing this on a system wide basis with a metric like #unfrozen
pages is a good idea. It's quite common to have short lived data in some
tables while also having long-lived data in other tables. Making opportunistic
freezing more aggressive in that situation will just hurt, without a benefit
(potentially even slowing down the freezing of older data!). And even within a
single table, it doesn't make sense to freeze more aggressively just because a
decent sized part of the table is updated regularly and thus never gets frozen.

If we want to take global freeze debt into account, which I think is a good
idea, we'll need a smarter way to represent the debt than just the number of
unfrozen pages. I think we would need to track the age of unfrozen pages in
some way. If there are a lot of unfrozen pages with a recent xid, then it's
fine, but if they are older and getting older, it's a problem and we need to
be more aggressive. The problem I see is how to track the age of unfrozen data -
it'd be easy enough to track the mean(oldest-64bit-xid-on-page), but then we
again have the issue of rare outliers moving the mean too much...
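
To make the outlier concern concrete, here is a rough sketch with made-up
numbers. Nothing here corresponds to existing PostgreSQL code or statistics:
summarize_page_ages(), the per-page "oldest xid" list, and the xid values are
all hypothetical, and xids are treated as plain 64-bit counters so that age is
simply next_xid minus the page's oldest xid.

```python
import statistics


def summarize_page_ages(oldest_xid_per_page, next_xid):
    """Summarize the age, in xids, of the oldest xid on each unfrozen page.

    Hypothetical helper: assumes 64-bit xids, so there is no wraparound
    and age is a simple subtraction.
    """
    ages = [next_xid - xid for xid in oldest_xid_per_page]
    return {
        "mean": statistics.fmean(ages),
        "median": statistics.median(ages),
        "max": max(ages),
    }


# Made-up workload: 999 unfrozen pages whose oldest xid is only 1,000 xids
# old, plus a single page whose oldest xid is 1.5 billion xids old.
next_xid = 2_000_000_000
pages = [next_xid - 1_000] * 999 + [next_xid - 1_500_000_000]

summary = summarize_page_ages(pages, next_xid)
print(summary)
```

With these numbers the median age stays at 1,000 xids, while the single old
page drags the mean up to roughly 1.5 million. A mean-based debt metric would
report a table that is almost entirely recent data as if it were much older.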

Greetings,

Andres Freund
