Re: Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, digoal zhou <digoal(dot)zhou(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?
Date: 2015-07-05 13:19:59
Message-ID: alpine.DEB.2.10.1507051449210.26569@sto
Lists: pgsql-hackers


Hello Heikki,

>> I think that the load is distributed as the derivative of this function,
>> that is (1.5 * x ** 0.5): it starts at 0 but very quickly reaches 0.5, it
>> passes 1.0 (the average load) at around 40% progress, and ends up at 1.5, that
>> is, the finishing load is 1.5 times the average load, just before fsyncing files.
>> This looks like a recipe for a bad time: I would say this is too large an
>> overload. I would suggest a much lower value, say around 1.1...
>
> Hmm. Load is distributed as a derivative of that, but probably not the way you
> think. Note that X means the amount of WAL consumed, not time.
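The load-profile arithmetic in the first quoted paragraph can be checked numerically. This is a minimal sketch, assuming the target written fraction is x**1.5 and that x advances at a constant rate, which holds for time-based pacing. When x is WAL consumed, the rate over wall-clock time also depends on how WAL consumption evolves. The helper names are hypothetical, not PostgreSQL functions.

```python
def relative_write_rate(x: float, exponent: float = 1.5) -> float:
    """Write rate relative to the average when the target written fraction
    at progress x is x**exponent (i.e. the derivative of x**exponent)."""
    return exponent * x ** (exponent - 1.0)


def progress_at_rate(level: float, exponent: float = 1.5) -> float:
    """Progress x at which the relative write rate equals `level`
    (solves exponent * x**(exponent - 1) == level, for exponent > 1)."""
    return (level / exponent) ** (1.0 / (exponent - 1.0))


if __name__ == "__main__":
    for x in (0.0, 0.1, 0.25, 0.5, 0.75, 1.0):
        print(f"progress {x:4.2f}: rate {relative_write_rate(x):.3f} x average")
    print(f"rate reaches 0.5 at progress {progress_at_rate(0.5):.3f}")
    print(f"rate reaches 1.0 at progress {progress_at_rate(1.0):.3f}")
```

Run as a script, it reports the rate reaching 0.5 at about 11% progress, 1.0 at about 44%, and 1.5 at the end.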

Interesting point. After a look at IsCheckpointOnSchedule, and if I
understand the code correctly, it is actually *both*, so it really depends on
whether the checkpoint was xlog- or time-triggered, and especially on which
one (time or xlog) is dominant at the beginning of the checkpoint.

If it is time-triggered and paced, my reasoning is probably right and
things will go bad (or worse) at the end, but if it is xlog-triggered and paced,
your line of argument is probably closer to what happens.

This suggests that the corrective function should be applied with more
care, maybe only to the xlog-based on-schedule test, but not to the
time-based check.
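To make the distinction concrete, here is a simplified model of the two checks. The checkpoint counts as on schedule only if its write progress, scaled by the completion target, keeps up with both the elapsed WAL fraction and the elapsed time fraction. This is a sketch, not the actual IsCheckpointOnSchedule code. The function names, the 0.5 completion target, and applying the exponent to the WAL check only (as suggested above) are assumptions for illustration.

```python
def is_on_schedule(progress: float, elapsed_wal: float, elapsed_time: float,
                   completion_target: float = 0.5,
                   wal_exponent: float = 1.5) -> bool:
    """Hypothetical model of the on-schedule test.

    progress     -- fraction of dirty buffers written so far (0..1)
    elapsed_wal  -- WAL consumed since checkpoint start, as a fraction of the limit
    elapsed_time -- time since checkpoint start, as a fraction of checkpoint_timeout
    The correction exponent is applied to the WAL fraction only.
    """
    scaled = progress * completion_target
    if scaled < elapsed_wal ** wal_exponent:
        return False  # behind on the (corrected) WAL schedule
    if scaled < elapsed_time:
        return False  # behind on the time schedule (uncorrected)
    return True


def pacing_source(elapsed_wal: float, elapsed_time: float,
                  wal_exponent: float = 1.5) -> str:
    """Which of the two checks currently sets the pace (the stricter one)."""
    return "wal" if elapsed_wal ** wal_exponent > elapsed_time else "time"
```

Consider a checkpoint that has written half its buffers after 35% of the WAL budget but only 10% of the timeout. It is on schedule with the correction and behind without it. Whenever the elapsed time fraction is at least the raw WAL fraction, the exponent has no effect at all, since x**1.5 <= x on [0, 1].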

> The goal is that I/O is constant over time, but the consumption of WAL
> over time is non-linear, with a lot more WAL consumed in the beginning
> of a checkpoint cycle. The function compensates for that.

*If* the checkpointer pacing comes from WAL size, which may or may not be
the case.

> [...]
>
> Unfortunately, we don't know the shape of g(X), as that depends on the
> workload. It might be linear, if there is no effect at all from
> full_page_writes. Or it could be a step-function, where every write causes a
> full page write, until all pages have been touched, and after that none do
> (something like an UPDATE without a where-clause might cause that).
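The interaction between the unknown shape and the X^1.5 correction can be tabulated. This is a minimal sketch, assuming WAL-paced writes whose target written fraction is g(t)**1.5. Here t is the time fraction and g(t) is the fraction of the cycle's WAL consumed by then; constant I/O would mean a target equal to t. The three shapes are illustrative assumptions, not measurements: linear, square root, and a hypothetical two-phase profile where 80% of the WAL is written in the first 20% of the time.

```python
import math


def linear_wal(t: float) -> float:
    return t  # no full_page_writes effect


def sqrt_wal(t: float) -> float:
    return math.sqrt(t)  # roughly what pgbench-like workloads show, per the thread


def two_phase_wal(t: float, t0: float = 0.2, f: float = 0.8) -> float:
    """Hypothetical step in WAL rate: fraction f of the WAL is written
    before time t0 (full-page writes), the rest at a lower rate afterwards."""
    if t <= t0:
        return f * t / t0
    return f + (1.0 - f) * (t - t0) / (1.0 - t0)


def corrected_target(g, t: float, exponent: float = 1.5) -> float:
    """Written fraction demanded at time t when WAL progress is corrected."""
    return g(t) ** exponent


if __name__ == "__main__":
    shapes = {"linear": linear_wal, "sqrt": sqrt_wal, "two-phase": two_phase_wal}
    print("t     " + "  ".join(f"{name:>9}" for name in shapes))
    for t in (0.1, 0.25, 0.5, 0.75, 1.0):
        row = "  ".join(f"{corrected_target(g, t):9.3f}" for g in shapes.values())
        print(f"{t:4.2f}  {row}")
```

Compared with the ideal target t, the linear shape lags (0.354 at t = 0.5) and must catch up later. The square-root shape runs ahead (0.595 at t = 0.5), but less than the uncorrected sqrt(0.5) of about 0.707.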

If PostgreSQL is running in its cache (i.e. within shared buffers), the
usual assumption would be that the probability of a full-page write decreases
exponentially with time, at an unknown rate, as the same pages are hit over and over.

If PostgreSQL is running from memory or disk (effective database size
greater than shared buffers), pages are statistically not reused by another
update before being sent out. Full-page writes would then occur throughout
the whole checkpoint: there is no WAL storm (or it is always a storm,
depending on the point of view), and the corrective factor would
only create issues...
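Both cases fall out of a simple model in which a full-page write happens on the first modification of a page after the checkpoint starts. This is a sketch with several assumptions: uniformly random single-page updates at a constant rate over `pages` distinct pages, 8192-byte page images (the default block size, ignoring holes or compression), and a hypothetical 100-byte record otherwise. The expected share of untouched pages after k updates, (1 - 1/pages)**k, is close to exp(-k/pages), i.e. the exponential decay mentioned above. A working set that is small relative to the updates per checkpoint stands in for the in-cache case, and a large one stands in for the out-of-cache case.

```python
PAGE_IMAGE_BYTES = 8192  # full-page image at the default 8 kB block size (approximation)
RECORD_BYTES = 100       # hypothetical WAL record size without a page image


def expected_fpws(updates: float, pages: int) -> float:
    """Expected number of distinct pages touched by `updates` uniformly random
    single-page updates; each first touch since checkpoint start costs one FPW."""
    return pages * (1.0 - (1.0 - 1.0 / pages) ** updates)


def expected_wal_bytes(updates: float, pages: int) -> float:
    return updates * RECORD_BYTES + expected_fpws(updates, pages) * PAGE_IMAGE_BYTES


def wal_fraction_at(time_fraction: float, total_updates: int, pages: int) -> float:
    """Share of the cycle's WAL already written at time_fraction of the cycle."""
    done = expected_wal_bytes(time_fraction * total_updates, pages)
    return done / expected_wal_bytes(total_updates, pages)


if __name__ == "__main__":
    for pages in (1_000, 10_000_000):  # small vs large working set, hypothetical
        share = wal_fraction_at(0.5, 100_000, pages)
        print(f"{pages:>10} pages: {share:.3f} of the WAL written at half time")
```

With 100,000 updates per cycle, the 1,000-page working set has written about 73% of its WAL at half time, a front-loaded storm. The 10,000,000-page one is at about 50%, i.e. WAL grows linearly and a WAL-to-time correction would only distort the pacing.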

So basically I would say that what to do heavily depends on the database
size and the checkpoint trigger (time vs xlog), which really suggests that a
GUC is indispensable, and maybe that the place where the correction is applied
is currently not the right one.

> In pgbench-like workloads, it's something like sqrt(x).

Probably for a small database size?

> I picked X^1.5 as a reasonable guess. It's close enough to linear that
> it shouldn't hurt too much if g(x) is linear.

My understanding is still that there is a 50% overload at the end of the checkpoint,
just before issuing fsync... I think that could hurt in some cases.
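How close X^{1.5} is to linear can be worked out, assuming x advances linearly (the time-paced case). The largest gap between the linear target and the corrected one is:

```latex
\max_{0 \le x \le 1}\left(x - x^{3/2}\right):\quad
  \frac{d}{dx}\left(x - x^{3/2}\right) = 1 - \tfrac{3}{2}x^{1/2} = 0
  \;\Rightarrow\; x = \tfrac{4}{9},\quad
  \tfrac{4}{9} - \left(\tfrac{2}{3}\right)^{3} = \tfrac{12}{27} - \tfrac{8}{27} = \tfrac{4}{27} \approx 0.148
```

In other words, the corrected target trails linear progress by at most about 15 percentage points. It is this deficit that gets made up at an increasing rate toward the end.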

> But it cuts the worst spike at the very beginning, if g(x) is more like
> sqrt(x).

Hmmm. It's a trade-off: saving the first 10 seconds of the checkpoint
at the price of risking a panic at the end of it.

Now the right approach might be for pg to know what is happening by
collecting statistics while running, and to apply a correction when it is
needed, for the amount needed.
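One way such a statistics-driven correction could look is to sample (time fraction, WAL fraction) pairs during a checkpoint cycle. The next step is to fit wal ≈ t**a by least squares on the logarithms and use 1/a as the exponent for the next cycle. This is a hypothetical sketch: the sampling itself, the function names and the power-law form are all assumptions.

```python
import math


def fit_wal_exponent(samples):
    """Fit wal_fraction ~= time_fraction ** a through the origin in log-log
    space (least squares), from (time_fraction, wal_fraction) pairs."""
    num = den = 0.0
    for t, w in samples:
        if 0.0 < t < 1.0 and 0.0 < w < 1.0:  # log(0) undefined, log(1) uninformative
            lt, lw = math.log(t), math.log(w)
            num += lt * lw
            den += lt * lt
    return num / den if den > 0.0 else 1.0  # no usable samples: assume linear


def correction_exponent(samples) -> float:
    """If wal ~= t ** a, then t ~= wal ** (1 / a): the exponent to apply to the
    WAL fraction so that WAL-based pacing tracks elapsed time."""
    return 1.0 / fit_wal_exponent(samples)


if __name__ == "__main__":
    ts = [i / 10 for i in range(1, 10)]
    print(correction_exponent([(t, math.sqrt(t)) for t in ts]))  # sqrt-like WAL
    print(correction_exponent([(t, t) for t in ts]))              # linear WAL
```

For square-root-shaped samples this yields an exponent of about 2. For linear samples it yields about 1, i.e. no correction. The X^1.5 discussed in this thread sits between the two.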

--
Fabien.
