| From: | Andres Freund <andres(at)anarazel(dot)de> |
|---|---|
| To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
| Cc: | Amul Sul <sulamul(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: [Patch] ALTER SYSTEM READ ONLY |
| Date: | 2020-12-10 00:34:28 |
| Message-ID: | 20201210003428.sy5tx55v5x242hrf@alap3.anarazel.de |
| Lists: | pgsql-hackers |
Hi,
On 2020-12-09 16:13:06 -0500, Robert Haas wrote:
> That's not good. On a typical busy system, a system is going to be in
> the middle of a checkpoint most of the time, and the checkpoint will
> take a long time to finish - maybe minutes.
Or hours, even. Because full-page writes (FPWs) are expensive, it can make
a lot of sense to checkpoint less often and so pay that cost less frequently...
> We want this feature to respond within milliseconds or a few seconds,
> not minutes. So we need something better here.
Indeed.
> I'm inclined to think
> that we should try to CompleteWALProhibitChange() at the same places
> we AbsorbSyncRequests(). We know from experience that bad things
> happen if we fail to absorb sync requests in a timely fashion, so we
> probably have enough calls to AbsorbSyncRequests() to make sure that
> we always do that work in a timely fashion. So, if we do this work in
> the same place, then it will also be done in a timely fashion.
Sounds sane, without having looked in detail.
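To make the proposal concrete, here is a minimal, self-contained simulation of the idea. It is a sketch under stated assumptions, not PostgreSQL source: `SimState`, every `sim_*` function, and the buffer counts are hypothetical stand-ins. It only models the timing claim, namely that completing the state change at the same points where sync requests are absorbed makes the latency independent of checkpoint length.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simulation state; not PostgreSQL's shared-memory layout. */
typedef struct SimState
{
    bool wal_prohibit_requested; /* someone asked to go read-only */
    bool wal_prohibited;         /* the state change has completed */
    int  pending_sync_requests;  /* queued fsync requests */
    int  completed_at;           /* buffers written at completion, -1 if never */
} SimState;

/* Stand-in for AbsorbSyncRequests(): drain the request queue. */
void
sim_absorb_sync_requests(SimState *s)
{
    s->pending_sync_requests = 0;
}

/* Stand-in for CompleteWALProhibitChange(): finish a pending transition. */
void
sim_complete_wal_prohibit_change(SimState *s, int buffers_written)
{
    assert(buffers_written >= 0);
    if (s->wal_prohibit_requested && !s->wal_prohibited)
    {
        s->wal_prohibited = true;
        s->completed_at = buffers_written;
    }
}

/*
 * Simulate a checkpoint that writes nbuffers buffers. The read-only request
 * arrives just before buffer number request_at is written (pass -1 for
 * "never"). Returns how many buffers had been written when the state change
 * completed, or -1 if no change was requested.
 */
int
sim_checkpoint_write_loop(int nbuffers, int request_at, bool service_in_loop)
{
    SimState s = {false, false, 0, -1};

    for (int i = 0; i < nbuffers; i++)
    {
        if (i == request_at)
            s.wal_prohibit_requested = true;

        if (service_in_loop)
        {
            /* Do both kinds of pending work at the same call site. */
            sim_absorb_sync_requests(&s);
            sim_complete_wal_prohibit_change(&s, i);
        }

        s.pending_sync_requests++; /* "writing" buffer i queues a sync request */
    }

    /* At the end of the checkpoint, pending work is serviced either way. */
    sim_absorb_sync_requests(&s);
    sim_complete_wal_prohibit_change(&s, nbuffers);
    return s.completed_at;
}
```

Calling `sim_checkpoint_write_loop(1000, 250, true)` completes the change after 250 buffers, while passing `false` delays it until all 1000 are written. The sketch deliberately ignores what the checkpoint does after the system has gone read-only.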
> I'm not 100% sure whether that introduces any other problems.
> Certainly, we're not going to be able to finish the checkpoint once
> we've gone read-only, so we'll fail when we try to write the WAL
> record for that, or maybe earlier if there's anything else that tries
> to write WAL. Either the checkpoint needs to error out, like any other
> attempt to write WAL, and we can attempt a new checkpoint if and when
> we go read/write, or else we need to finish writing stuff out to disk
> but not actually write the checkpoint completion record (or any other
> WAL) unless and until the system goes back into read/write mode - and
> then at that point the previously-started checkpoint will finish
> normally. The latter seems better if we can make it work, but the
> former is probably also acceptable. What you've got right now is not.
I mostly wonder how each of those two options affects the number of
FPWs we need to redo. Presumably stalling, rather than cancelling, the
current checkpoint is better?
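One way to reason about the question is a toy model, sketched below. It assumes the simplified rule that a page needs a full-page image on its first modification if its LSN is at or before the current redo pointer. The names `SimLsn`, `sim_needs_fpw`, and `sim_count_fpws`, and all LSN values, are hypothetical. The sketch does not settle which approach is better; it only shows why a later redo pointer (a restarted checkpoint) could make more pages need FPWs than resuming the stalled one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t SimLsn; /* hypothetical stand-in for a WAL position */

/* Simplified rule: a page last touched at or before the redo pointer
 * needs a full-page image the next time it is modified. */
bool
sim_needs_fpw(SimLsn page_lsn, SimLsn redo_ptr)
{
    return page_lsn <= redo_ptr;
}

/* Count how many of the given pages would need an FPW on next modification. */
size_t
sim_count_fpws(const SimLsn *page_lsns, size_t npages, SimLsn redo_ptr)
{
    size_t count = 0;

    assert(page_lsns != NULL || npages == 0);
    for (size_t i = 0; i < npages; i++)
    {
        if (sim_needs_fpw(page_lsns[i], redo_ptr))
            count++;
    }
    return count;
}
```

For example, with page LSNs of 50, 120, 150, 180 and 90, a stalled checkpoint that keeps its redo pointer at 100 leaves two pages needing an FPW, whereas a fresh checkpoint starting at 200 leaves all five needing one. Real behavior also depends on which pages are actually modified afterwards, and on when the next checkpoint starts.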
Greetings,
Andres Freund