From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
---|---|
To: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)2ndquadrant(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Дмитрий Дегтярёв <degtyaryov(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PERFORM] Cpu usage 100% on slave. s_lock problem. |
Date: | 2013-11-21 16:37:34 |
Message-ID: | 528E36CE.3020402@vmware.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 21.11.2013 17:08, Merlin Moncure wrote:
> On Thu, Nov 21, 2013 at 9:02 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> On 2013-11-21 16:25:02 +0200, Heikki Linnakangas wrote:
>>> Hmm. All callers of RecoveryInProgress() must be prepared to handle the case
>>> that RecoveryInProgress() returns true, but the system is no longer in
>>> recovery. No matter what locking we do in RecoveryInProgress(), the startup
>>> process might finish recovery just after RecoveryInProgress() has returned.
>>
>> True.
>>
>>> What about the attached? It reads the shared variable without a lock or
>>> barrier. If it returns 'true', but the system in fact just exited recovery,
>>> that's OK. As explained above, all the callers must tolerate that anyway.
>>> But if it returns 'false', then it performs a full memory barrier, which
>>> should ensure that it sees any other shared variables as it is after the
>>> startup process cleared SharedRecoveryInProgress (notably,
>>> XLogCtl->ThisTimeLineID).
>>
>> I'd argue that we should also remove the spinlock in StartupXLOG and
>> replace it with a write barrier. Obviously not for performance reasons,
>> but because somebody might add more code to run under that spinlock.
>>
>> Looks good otherwise, although a read memory barrier ought to suffice.
>
> This code is in a very hot code path. Are we *sure* that the read
> barrier is fast enough that we don't want to provide an alternate
> function that only returns the local flag? I don't know enough about
> them to say either way.
In my patch, I put the barrier inside the if (!LocalRecoveryInProgress)
block. That codepath can only execute once in a backend, so performance
is not an issue there. Does that look sane to you?
- Heikki
From | Date | Subject | |
---|---|---|---|
Next Message | J Smith | 2013-11-21 16:54:58 | Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1 |
Previous Message | Tatsuo Ishii | 2013-11-21 16:32:30 | Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1 |