From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Fabrízio Mello <fabriziomello(at)gmail(dot)com> |
Cc: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Time-Delayed Standbys |
Date: | 2013-12-03 19:33:16 |
Message-ID: | CA+U5nML64Y7Nf0mpQirk-tOWrV+ZQ3y8DdRgk2mP+e3kGT16Mw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 18 October 2013 19:03, Fabrízio de Royes Mello
<fabriziomello(at)gmail(dot)com> wrote:
> The attached patch is a continuation of Robert's work [1].
Reviewing v2...
> I made some changes:
> - use of Latches instead of pg_usleep, so we don't have to wakeup regularly.
OK
> - call HandleStartupProcInterrupts() before CheckForStandbyTrigger() because
> might change the trigger file's location
OK
> - compute recoveryUntilDelayTime in XLOG_XACT_COMMIT and
> XLOG_XACT_COMMIT_COMPACT checks
Why just those? Why not aborts and restore points also?
> - don't care about clockdrift because it's an admin problem.
Few minor points on things
* The code with comment "Clear any previous recovery delay time" is in
wrong place, move down or remove completely. Setting the delay to zero
doesn't prevent calling recoveryDelay(), so that logic looks wrong
anyway.
* The loop exit in recoveryDelay() is inelegant, should break if <= 0
* There's a spelling mistake in sample
* The patch has whitespace in one place
and one important point...
* The delay loop happens AFTER we check for a pause. Which means if
the user notices a problem on a commit, then hits pause button on the
standby, the pause will have no effect and the next commit will be
applied anyway. Maybe just one commit, but its an off by one error
that removes the benefit of the patch. So I think we need to test this
again after we finish delaying
if (xlogctl->recoveryPause)
recoveryPausesHere();
We need to explain in the docs that this is intended only for use in a
live streaming deployment. It will have little or no meaning in a
PITR.
I think recovery_time_delay should be called
<something>_apply_delay
to highlight the point that it is the apply of records that is
delayed, not the receipt. And hence the need to document that sync rep
is NOT slowed down by setting this value.
And to make the name consistent with other parameters, I suggest
min_standby_apply_delay
We also need to document caveats about the patch, in that it only
delays on timestamped WAL records and other records may be applied
sooner than the delay in some circumstances, so it is not a way to
avoid all cancellations.
We also need to document the behaviour of the patch is to apply all
data received as quickly as possible once triggered, so the specified
delay does not slow down promoting the server to a master. That might
also be seen as a negative behaviour, since promoting the master
effectively sets recovery_time_delay to zero.
I will handle the additional documentation, if you can update the
patch with the main review comments. Thanks.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2013-12-03 19:34:10 | Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL |
Previous Message | Tom Lane | 2013-12-03 19:31:17 | Re: Extension Templates S03E11 |