From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Melanie Plageman <melanieplageman(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com> |
Subject: | Re: Eager page freeze criteria clarification |
Date: | 2023-09-08 04:07:13 |
Message-ID: | 20230908040713.qlc4sojzjd4amqt4@awork3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2023-09-06 10:35:17 -0400, Robert Haas wrote:
> On Wed, Sep 6, 2023 at 1:09 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Yea, it'd be useful to have a reasonably approximate wall clock time for the
> > last modification of a page. We just don't have infrastructure for determining
> > that. We'd need an LSN->time mapping (xid->time wouldn't be particularly
> > useful, I think).
> >
> > A very rough approximate modification time can be computed by assuming an even
> > rate of WAL generation, and using the LSN at the time of the last vacuum and
> > the time of the last vacuum, to compute the approximate age.
> >
> > For a while I thought that'd not give us anything that just using LSNs gives
> > us, but I think it might allow coming up with a better cutoff logic: Instead
> > of using a cutoff like "page LSN is older than 10% of the LSNs since the last
> > vacuum of the table", it would allow us to approximate "page has not been
> > modified in the last 15 seconds" or such. I think that might help avoid
> > unnecessary freezing on tables with very frequent vacuuming.
>
> Yes. I'm uncomfortable with the last-vacuum-LSN approach mostly
> because of the impact on very frequently vacuumed tables, and
> secondarily because of the impact on very infrequently vacuumed
> tables.
>
> Downthread, I proposed using the RedoRecPtr of the latest checkpoint
> rather than the LSN of the previou vacuum. I still like that idea.
Assuming that "downthread" references
https://postgr.es/m/CA%2BTgmoYb670VcDFbekjn2YQOKF9a7e-kBFoj2WJF1HtH7YPaWQ%40mail.gmail.com
could you sketch out the logic you're imagining a bit more?
> It's a value that we already have, with no additional bookkeeping. It
> varies over a much narrower range than the interval between vacuums on
> a table. The vacuum interval could be as short as tens of seconds as
> long as years, while the checkpoint interval is almost always going to
> be between a few minutes at the low end and some tens of minutes at
> the high end, hours at the very most. That's quite appealing.
The reason I was thinking of using the "lsn at the end of the last vacuum", is
that it seems to be more adapative to the frequency of vacuuming.
One the one hand, if a table is rarely autovacuumed because it is huge,
(InsertLSN-RedoRecPtr) might or might not be representative of the workload
over a longer time. On the other hand, if a page in a frequently vacuumed
table has an LSN from around the last vacuum (or even before), it should be
frozen, but will appear to be recent in RedoRecPtr based heuristics?
Perhaps we can mix both approaches. We can use the LSN and time of the last
vacuum to establish an LSN->time mapping that's reasonably accurate for a
relation. For infrequently vacuumed tables we can use the time between
checkpoints to establish a *more aggressive* cutoff for freezing then what a
percent-of-time-since-last-vacuum appach would provide. If e.g. a table gets
vacuumed every 100 hours and checkpoint timeout is 1 hour, no realistic
percent-of-time-since-last-vacuum setting will allow freezing, as all dirty
pages will be too new. To allow freezing a decent proportion of those, we
could allow freezing pages that lived longer than ~20%
time-between-recent-checkpoints.
Hm, possibly stupid idea: What about using shared_buffers residency as a
factor? If vacuum had to read in a page to vacuum it, a) we would need read IO
to freeze it later, as we'll soon evict the page via the ringbuffer b)
non-residency indicates the page isn't constantly being modified?
> Also, I think the time between checkpoints actually matters here, because in
> some sense we're looking to get dirty, already-FPI'd pages frozen before
> they get written out, or before a new FPI becomes necessary, and checkpoints
> are one way for the first of those things to happen and the only way for the
> second one to happen.
Intuitively it seems easier to take care of that by checking if an FPI is
needed or not?
I guess we, eventually, might want to also freeze if an FPI would be
generated, if we are reasonably certain that the page isn't going to be
modified again soon (e.g. when table stats indicate effectively aren't any
updates / deletes). Although perhaps it's better to take care of that via
freeze-on-writeout style logic.
I suspect than even for freeze-on-writeout we might end up needing some
heuristics
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Nathan Bossart | 2023-09-08 04:21:07 | Re: Document that server will start even if it's unable to open some TCP/IP ports |
Previous Message | Tatsuo Ishii | 2023-09-08 03:54:47 | Re: Row pattern recognition |