From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: Eager page freeze criteria clarification
Date: 2023-08-28 21:05:57
Message-ID: CAH2-WzkVXc1MJSmgCobbc3X+Pps2cwGa6ngreCFyYqO9US=bXA@mail.gmail.com
Lists: pgsql-hackers
On Mon, Aug 28, 2023 at 1:17 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I'm sure this could be implemented, but it's unclear to me why you
> would expect it to perform well. Freezing a page that has no frozen
> tuples yet isn't cheaper than freezing one that does, so for this idea
> to be a win, the presence of frozen tuples on the page would have to
> be a signal that the page is likely to be modified again in the near
> future. In general, I don't see any reason why we should expect that
> to be the case.
What I've described is a scheme for deciding *not* to freeze where
freezing would usually happen -- a scheme for *vetoing* page freezing --
rather than a scheme for deciding to freeze. On its own, what I
suggested cannot help at all. It assumes a world in which we're already
deciding to freeze much more frequently, based on whatever other
criteria. It's intended to complement something like the LSN scheme.
> What really matters here is finding a criterion that is likely to
> perform well in general, on a test case not known to us beforehand.
> This isn't an entirely feasible goal, because just as you can
> construct a test case where any given criterion performs well, so you
> can also construct one where any given criterion performs poorly. But
> I think a rule that has a clear theory of operation must be preferable
> to one that doesn't. The theory that Melanie and Andres are advancing
> is that a page that has been modified recently (in insert-LSN-time) is
> more likely to be modified again soon than one that has not i.e. the
> near future will be like the recent past.
I don't think that it's all that useful on its own. You just cannot
ignore the fact that the choice not to freeze now doesn't necessarily
mean that you get to revisit that choice in the near future.
Particularly with large tables, the opportunities to freeze at all are
few and far between -- if for no other reason than the general design
of autovacuum scheduling. Worse still, any unfrozen all-visible pages
can simply accumulate as all-visible pages until the next aggressive
VACUUM, whenever that happens to be. How can that not be extremely
important?
That isn't an argument against a scheme that uses LSNs (many kinds of
information might be weighed) -- it's an argument in favor of paying
attention to the high level cadence of VACUUM. That much seems
essential. I think that there might well be room for having several
complementary schemes like the LSN scheme. Or one big scheme that
weighs multiple factors together, if you prefer. That all seems
basically reasonable to me.
Adaptive behavior is important with something as complicated as this.
Adaptive schemes all seem to involve trial and error. The cost of
freezing too much is relatively well understood, and can be managed
sensibly. So we should err in that direction -- a direction that is
relatively easy to understand, to notice, and to pull back from having
gone too far. Putting off freezing for a very long time is a source of
much of the seemingly intractable complexity in this area.
Another way of addressing that is getting rid of aggressive VACUUM as
a concept. But I'm not going to revisit that topic now, or likely
ever.
--
Peter Geoghegan