From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
Cc: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, masao(dot)fujii(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Skip checkpoint on promoting from streaming replication |
Date: | 2013-01-24 17:44:26 |
Message-ID: | CA+U5nMLfun0KcusiEn7iB1o-HYcGQ8B0VG0HE7U9byJtf9r6Sg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 24 January 2013 16:52, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
> On 24.01.2013 18:24, Simon Riggs wrote:
>>
>> On 6 January 2013 21:58, Simon Riggs<simon(at)2ndquadrant(dot)com> wrote:
>>>
>>> I've been torn between the need to remove the checkpoint for speed and
>>>
>>> being worried about the implications of doing so.
>>>
>>> We promote in multiple use cases. When we end a PITR, or are
>>> performing a switchover, it doesn't really matter how long the
>>> shutdown checkpoint takes, so I'm inclined to leave it there in those
>>> cases. For failover, we need fast promotion.
>>>
>>> So my thinking is to make pg_ctl promote -m fast
>>> be the way to initiate a fast failover that skips the shutdown
>>> checkpoint.
>>>
>>> That way all existing applications work the same as before, while new
>>> users that explicitly choose to do so will gain from the new option.
>>
>>
>> Here's a patch to skip checkpoint when we do
>>
>> pg_ctl promote -m fast
>>
>> We keep the end of recovery checkpoint in all other cases.
>
>
> Hmm, there seems to be no way to do a "fast" promotion with a trigger file.
True. I thought we were moving away from trigger files to use of "promote"
> I'm a bit confused why there needs to be special mode for this. Can't we
> just always do the "fast" promotion? I agree that there's no urgency when
> you're doing PITR, but shouldn't do any harm either. Or perhaps always do
> "fast" promotion when starting up from standby mode, and "slow" otherwise.
>
> Are we comfortable enough with this to skip the checkpoint after crash
> recovery?
I'm not. Maybe if we get no bugs we can make it do this always, in next release.
It;s fast when it needs to be and safe otherwise.
> I may be missing something, but it looks like after a "fast" promotion, you
> don't request a new checkpoint. So it can take quite a while for the next
> checkpoint to be triggered by checkpoint_timeout/segments. That shouldn't be
> a problem, but I feel that it'd be prudent to request a new checkpoint
> immediately (not necessarily an "immediate" checkpoint, though).
I thought of that and there is a long comment to explain why I didn't.
Two problems:
1) an immediate checkpoint can cause a disk/resource usage spike,
which is definitely not what you need just when a spike of connections
and new SQL hits the system.
2) If we did that, we would have an EndOfRecovery record, some other
records and then a Shutdown checkpoint.
As I right this, (2) is wrong, because we shouldn't do a a Shutdown
checkpoint anyway.
But I still think (1) is a valid concern.
>> The only thing left from Kyotaro's patch is a single line of code -
>> the call to ReadCheckpointRecord() that checks to see if the WAL
>> records for the last two restartpoints is on disk, which was an
>> important line of code.
>
>
> Why's that important, just for paranoia? If the last two restartpoints have
> disappeared, something's seriously wrong, and you will be in trouble e.g if
> you crash at that point. Do we need to be extra paranoid when doing a "fast"
> promotion?
The check is cheap, so what do we gain by skipping the check?
>> Patch implements a new record type XLOG_END_OF_RECOVERY that behaves
>> on replay like a shutdown checkpoint record. I put this back in from
>> my patch because I believe its important that we have a clear place
>> where the WAL history changes timelineId. WAL format change bump
>> required.
>
>
> Agreed, such a WAL record is essential.
>
> At replay, an end-of-recovery record should be a signal to the hot standby
> mechanism that there are no transactions running in the master at that
> point, same as a shutdown checkpoint.
I had a reason why I didn't do that, but it seems to have slipped my mind.
If I can't remember, I'll add it.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | David Fetter | 2013-01-24 17:51:46 | LATERAL, UNNEST and spec compliance |
Previous Message | Andres Freund | 2013-01-24 17:39:33 | Re: Proposal for Allow postgresql.conf values to be changed via SQL [review] |