From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?
Date: 2015-12-14 23:08:43
Message-ID: 566F4BFB.7060802@2ndquadrant.com
Lists: pgsql-hackers
Hi,
I was planning to do some review/testing on this patch, but then I
noticed it was rejected with feedback in 2015-07 and never resubmitted
into another CF. So I won't spend time testing it unless someone
shouts that I should do that anyway. Instead I'll just post some ideas
about how we might improve the patch, because I'd forget about them
otherwise.
On 07/05/2015 09:48 AM, Heikki Linnakangas wrote:
>
> The ideal correction formula f(x), would be such that f(g(X)) = X, where:
>
> X is time, 0 = beginning of checkpoint, 1.0 = targeted end of
> checkpoint (checkpoint_segments), and
>
> g(X) is the amount of WAL generated. 0 = beginning of checkpoint, 1.0
> = targeted end of checkpoint (derived from max_wal_size).
>
> Unfortunately, we don't know the shape of g(X), as that depends on the
> workload. It might be linear, if there is no effect at all from
> full_page_writes. Or it could be a step-function, where every write
> causes a full page write, until all pages have been touched, and after
> that none do (something like an UPDATE without a where-clause might
> cause that). In pgbench-like workloads, it's something like sqrt(x). I
> picked X^1.5 as a reasonable guess. It's close enough to linear that it
> shouldn't hurt too much if g(x) is linear. But it cuts the worst spike
> at the very beginning, if g(x) is more like sqrt(x).
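(To make that concrete: in IsCheckpointOnSchedule() the WAL-based part of
the check is essentially

    /* simplified -- progress = fraction of the checkpoint's writes done */
    elapsed_xlogs = wal_since_checkpoint / CheckPointSegments;

    if (progress < elapsed_xlogs)
        return false;           /* behind schedule, write faster */

and the correction described above amounts to comparing against something
like pow(elapsed_xlogs, 1.5) instead, so that the FPW-inflated WAL rate
right after the checkpoint start translates into a smaller "elapsed"
fraction. That's my paraphrase of the idea, not the patch verbatim.)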
Exactly. I think the main "problem" here is that we do mix two types of
WAL records, with quite different characteristics:
(a) full_page_writes - very high volume right after checkpoint, then
usually drops to much lower volume
(b) regular records - about the same volume over time (well, lower
volume right after the checkpoint, as that's where FPWs happen)
We completely ignore this when computing elapsed_xlogs, because we
compute it (about) like this:
elapsed_xlogs = wal_since_checkpoint / CheckPointSegments;
which of course gets confused when we write a lot of WAL right after a
checkpoint, because of FPW. But what if we actually tracked the amount
of WAL produced by FPW in a checkpoint (which we currently don't, AFAIK)?
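Just to illustrate what I have in mind (the names are entirely made up,
and I'm ignoring where exactly in the WAL-insert path the FPW sizes are
available):

    /* hypothetical counters, reset at the start of each checkpoint */
    static uint64 ckpt_wal_bytes = 0;       /* all WAL since checkpoint start */
    static uint64 ckpt_wal_fpw_bytes = 0;   /* portion of that due to FPWs */

    /* called from the WAL-insert path for every record written */
    static void
    CheckpointCountWal(uint64 record_size, uint64 fpw_size)
    {
        ckpt_wal_bytes += record_size;
        ckpt_wal_fpw_bytes += fpw_size;
    }

Nothing fancy - two counters, with their totals remembered at the end of
each checkpoint as prev_wal_bytes / prev_wal_fpw_bytes.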
Then we could compute the expected *remaining* amount of WAL to be
produced within the checkpoint interval, and use that to compute a
better progress like this:
wal_bytes - WAL (total)
wal_fpw_bytes - WAL (due to FPW)
prev_wal_bytes - WAL (total) in previous checkpoint
prev_wal_fpw_bytes - WAL (due to FPW) in previous checkpoint
So we know that we should expect about

    ((prev_wal_bytes - prev_wal_fpw_bytes) - (wal_bytes - wal_fpw_bytes))   <- regular WAL
  + (prev_wal_fpw_bytes - wal_fpw_bytes)                                    <- FPW WAL
to be produced until the end of the current checkpoint. I don't have a
clear idea how to transform this into the 'progress' yet, but I'm pretty
sure tracking the two types of WAL is a key to a better solution. The
x^1.5 is probably a step in the right direction, but I don't feel
particularly confident about the 1.5 (which is rather arbitrary).
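Just to make the direction I'm thinking of a bit more concrete, one naive
way to turn those counters into a WAL-based progress value might be
something like this (again, all names made up - a sketch of the idea, not
a patch):

    /*
     * Rough sketch: base the WAL progress on the "regular" (non-FPW) WAL
     * only, using the previous checkpoint as the estimate of how much
     * regular WAL to expect, so the FPW burst right after the checkpoint
     * start doesn't make us look hopelessly behind schedule.
     *
     * ckpt_* are the counters from the earlier sketch, prev_* are their
     * values at the end of the previous checkpoint.
     */
    static double
    CheckpointWalProgress(void)
    {
        uint64      regular_sofar = ckpt_wal_bytes - ckpt_wal_fpw_bytes;
        uint64      regular_expected = prev_wal_bytes - prev_wal_fpw_bytes;

        /* no previous checkpoint to compare against yet */
        if (regular_expected == 0)
            return 0.0;         /* let the time-based check drive pacing */

        return (double) regular_sofar / regular_expected;
    }

which IsCheckpointOnSchedule() would then compare against 'progress'
instead of elapsed_xlogs. How well that behaves when two consecutive
checkpoints produce very different amounts of WAL is a separate question,
of course.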
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services