Re: Eager page freeze criteria clarification

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: Eager page freeze criteria clarification
Date: 2023-09-30 00:49:51
Message-ID: CAH2-WzmwKS-h4TzXmGz2-8FKu8YSyRMPvqA5pdNZoL4MRNoKEg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 29, 2023 at 11:27 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I think that's true. For me, the issue is what a user is practically
> likely to notice and care about. I submit that on a
> not-particularly-busy system, it would probably be fine to freeze
> aggressively in almost every situation, because you're only incurring
> costs you can afford to pay. On a busy system, it's more important to
> be right, or at least not too badly wrong. But even on a busy system,
> I think that when the time between data being written and being frozen
> is more than a few tens of minutes, it's very doubtful that anyone is
> going to notice the contribution that freezing makes to the overall
> workload. They're much more likely to notice an annoying autovacuum
> than they are to notice a bit of excess freezing that ends up getting
> reversed. But when you start cranking the time between writing data
> and freezing it down into the single-digit numbers of minutes, and
> even more if you push down to tens of seconds or less, now I think
> people are going to care more about useless freezing work than about
> long-term autovacuum risks. Because now their database is really busy
> so they care a lot about performance, and seemingly most of the data
> involved is ephemeral anyway.

I think that that's all true.

As I believe I pointed out at least once in the past, my thinking
about the practical requirements from users was influenced by this
paper/talk:

https://www.microsoft.com/en-us/research/video/cost-performance-in-modern-data-stores-how-data-cashing-systems-succeed/

For the types of applications that Postgres is a natural fit for, most
of the data is cold -- very cold ("freezing" cold, even). If you have
a 50GB pgbench_accounts table, Postgres will always perform
suboptimally. But so will SQL Server, Oracle, and MySQL.

While pgbench makes a fine stress test, its workload is for the most
part highly unrealistic. And yet we seem to think that it's just about
the most important benchmark of all. If we're not willing to get over
even small regressions in pgbench, I fear we'll never make significant
progress in this area. It's at least partly a cultural problem IMV.

The optimal freezing strategy for pgbench_accounts is to never freeze,
even once. It doesn't matter if you vary the distributions of the
accounts table updates -- it's still inevitable that each and every
row will get updated before too long. In fact, it's not just freezing
that should always be avoided here -- the same applies to pruning by
VACUUM.
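
To make that concrete, here's a hypothetical sketch (an
illustration, not a tuning recommendation) of per-table reloptions
that push autovacuum toward that "never freeze, rarely vacuum"
extreme. The reloptions are real, but the values are assumptions
chosen only to make the point:

ALTER TABLE pgbench_accounts SET (
    -- defer freezing tuples for as long as possible
    autovacuum_freeze_min_age = 1000000000,
    -- push back aggressive (relfrozenxid-advancing) vacuums
    autovacuum_freeze_table_age = 2000000000,
    -- tolerate far more dead tuples before autovacuum triggers
    autovacuum_vacuum_scale_factor = 0.4
);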

As I suggested in my last email, the lesson offered by the
pgbench_accounts table seems to be "never VACUUM at all, except
perhaps to advance relfrozenxid" (which shouldn't actually require
freezing even one page). If you haven't tuned heap fill factor, then
you might want to VACUUM a bit, at first. But, overall, vacuuming is
bad. That is the logical though absurd conclusion. It completely flies
in the face of practical experience.
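
For what it's worth, the heap fill factor tuning mentioned above is a
one-liner; 90 is just an assumed value here, not a recommendation
(pgbench -i -F 90 applies the same setting from the start, when the
tables are first populated):

-- Leave ~10% of each heap page empty. Only affects pages written
-- after this point; an initial-load setting covers existing data.
ALTER TABLE pgbench_accounts SET (fillfactor = 90);

With free space reserved on every heap page, most accounts updates
can go through HOT, which avoids creating dead index entries and so
reduces what VACUUM has to clean up in the first place.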

--
Peter Geoghegan
