From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Jeff Davis <pgsql(at)j-davis(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com> |
Subject: | Re: Eager page freeze criteria clarification |
Date: | 2023-12-13 17:24:25 |
Message-ID: | CA+TgmoYhQzdQryMo6QMGpqifbGu_Pz3GqvKSNJgyXYoKdfxcrA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Great results.
On Sat, Dec 9, 2023 at 5:12 AM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
> Values can be "removed" from the accumulator by simply decrementing its
> cardinality and decreasing the sum and sum squared by a value that will
> not change the mean and standard deviation of the overall distribution.
> To adapt to a table's changing access patterns, we'll need to remove
> values from this accumulator over time, but this patch doesn't yet
> decide when to do this. A simple solution may be to cap the cardinality
> of the accumulator to the greater of 1% of the table size, or some fixed
> number of values (perhaps 200?). Even without such removal of values,
> the distribution recorded in the accumulator will eventually skew toward
> more recent data, albeit at a slower rate.
I think we're going to need something here. Otherwise, after 6 months
of use, changing a table's perceived access pattern will be quite
difficult.
I think one challenge here is to find something that doesn't decay too
often and end up with cases where it basically removes all the data.
As long as you avoid that, I suspect that the algorithm might not be
terribly sensitive to other kinds of changes. If you decay after 200
values or 2000 or 20,000, it will only affect how fast we can change
our notion of the access pattern, and my guess would be that any of
those values would produce broadly acceptable results, with some
differences in the details. If you decay after 200,000,000 values or
not at all, then I think there will be problems.
> This algorithm is able to predict when pages will be modified before
> target_freeze_threshold as long as their modification pattern fits a
> normal distribution -- this accommodates access patterns where, after
> some period of time, pages become less likely to be modified the older
> they are. As an example, however, access patterns where modification
> times are bimodal aren't handled well by this model (imagine that half
> your modifications are to very new data and the other half are to data
> that is much older but still younger than target_freeze_duration). If
> this is a common access pattern, the normal distribution could easily be
> swapped out for another distribution. The current accumulator is capable
> of tracking a distribution's first two moments of central tendency (the
> mean and standard deviation). We could track more if we wanted to use a
> fancier distribution.
I think it might be fine to ignore this problem. What we're doing
right now is way worse.
> We also must consider what to do when we have few unsets, e.g. with an
> insert-only workload. When there are very few unsets (I chose 30 which
> the internet says is the approximate minimum number required for the
> central limit theorem to hold), we can choose to always freeze freezable
> pages; above this limit, the calculated page threshold is used. However,
> we may not want to make predictions based on 31 values either. To avoid
> this "cliff", we could modify the algorithm a bit to skew the mean and
> standard deviation of the distribution using a confidence interval based
> on the number of values we've collected.
I think some kind of smooth transition would might be a good idea, but
I also don't know how much it matters. I think the important thing is
that we freeze aggressively unless there's a clear signal telling us
not to do that. Absence of evidence should cause us to tend toward
aggressive freezing. Otherwise, we get insert-only tables wrong.
> The goal is to keep pages frozen for at least target_freeze_duration.
> target_freeze_duration is in seconds and pages only have a last
> modification LSN, so target_freeze_duration must be converted to LSNs.
> To accomplish this, I've added an LSNTimeline data structure, containing
> XLogRecPtr, TimestampTz pairs stored with decreasing precision as they
> age. When we need to translate the guc value to LSNs, we linearly
> interpolate it on this timeline. For the time being, the global
> LSNTimeline is in PgStat_WalStats and is only updated by vacuum. There
> is no reason it can't be updated with some other cadence and/or by some
> other process (nothing about it is inherently tied to vacuum). The
> cached translated value of target_freeze_duration is stored in each
> table's stats. This is arbitrary as it is not a table-level stat.
> However, it needs to be located somewhere that is accessible on
> update/delete. We may want to recalculate it more often than once per
> table vacuum, especially in case of long-running vacuums.
This part sounds like it isn't quite baked yet. The idea of the data
structure seems fine, but updating it once per vacuum sounds fairly
unprincipled to me? Don't we want the updates to happen on a somewhat
regular wall clock cadence?
> To benchmark this new heuristic (I'm calling it algo 6 since it is the
> 6th I've proposed on this thread), I've used a modified subset of my
> original workloads:
>
> [ everything worked perfectly ]
Hard to beat that.
--
Robert Haas
EDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2023-12-13 17:27:26 | Re: Clean up find_typedefs and add support for Mac |
Previous Message | Tristan Partin | 2023-12-13 17:23:50 | Re: Clean up find_typedefs and add support for Mac |