From: | Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com> |
---|---|
To: | Юрий Соколов <funny(dot)falcon(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA |
Date: | 2017-11-27 19:10:57 |
Message-ID: | 99f4b913-2953-5d48-2d1c-22fd1160ade3@redhat.com |
Lists: | pgsql-hackers |
Hi Yura,
On 11/27/2017 07:41 AM, Юрий Соколов wrote:
>>> I looked at the assembly and remembered that the last commit simplifies
>>> `init_local_spin_delay` to just two or three writes of zeroes (it looks
>>> like the compiler combines two 4-byte writes into one 8-byte write).
>>> Compared to the surrounding code (especially in LWLockAcquire itself),
>>> this overhead is negligible.
>>>
>>> However, I found that there is a benefit in calling LWLockAttemptLockOnce
>>> before entering the loop that calls LWLockAttemptLockOrQueue in
>>> LWLockAcquire (if there is not much contention). This way, the `inline`
>>> decorator for LWLockAttemptLockOrQueue could be omitted. Given that clang
>>> doesn't want to inline this function, this may be the best way.
>>
>> Attached is a version with LWLockAcquireOnce called before entering the
>> loop in LWLockAcquire.
>>
>
> Oh... there was a stupid error in the previous file.
> Attached is a fixed version.
>
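The fast-path idea discussed above (one lock attempt before entering the retry loop, so that the uncontended case never pays for spin-delay setup) can be sketched in C11 roughly as follows. This is not PostgreSQL's code: `demo_lock`, `demo_attempt_lock_once`, `demo_acquire`, `DEMO_SPINS_PER_DELAY` and the other `demo_` names are hypothetical stand-ins. Where the real LWLockAttemptLockOrQueue queues the backend and sleeps, this sketch simply retries and calls `sched_yield()`.

```c
#define _POSIX_C_SOURCE 200809L /* expose sched_yield() under strict C11 */
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock word: bit 0 set means "held exclusively". */
#define DEMO_LOCK_EXCLUSIVE 1u
/* Hypothetical tuning knob: failed CAS attempts between yields. */
#define DEMO_SPINS_PER_DELAY 100u

typedef struct {
    atomic_uint state;
} demo_lock;

/* Hypothetical spin-delay bookkeeping: two 4-byte counters, which a
 * compiler may zero with a single 8-byte store. */
typedef struct {
    uint32_t spins;
    uint32_t delays;
} demo_spin_delay;

static void demo_lock_init(demo_lock *lock) {
    atomic_init(&lock->state, 0u);
}

static void demo_init_spin_delay(demo_spin_delay *d) {
    d->spins = 0;
    d->delays = 0;
}

/* A single compare-and-swap attempt; succeeds only if the lock is free. */
static bool demo_attempt_lock_once(demo_lock *lock) {
    unsigned int expected = 0u;
    return atomic_compare_exchange_strong(&lock->state, &expected,
                                          DEMO_LOCK_EXCLUSIVE);
}

static void demo_acquire(demo_lock *lock) {
    /* Fast path: try once before any slow-path state is set up. */
    if (demo_attempt_lock_once(lock))
        return;

    /* Slow path: retry with a simple backoff. A real LWLock would queue
     * itself and sleep here instead of yielding. */
    demo_spin_delay delay;
    demo_init_spin_delay(&delay);
    while (!demo_attempt_lock_once(lock)) {
        if (++delay.spins >= DEMO_SPINS_PER_DELAY) {
            delay.spins = 0;
            delay.delays++;
            sched_yield();
        }
    }
}

static void demo_release(demo_lock *lock) {
    unsigned int prev = atomic_fetch_and(&lock->state, ~DEMO_LOCK_EXCLUSIVE);
    assert(prev & DEMO_LOCK_EXCLUSIVE); /* the lock must have been held */
    (void)prev;
}
```

Because the single CAS is tried before `demo_spin_delay` is even initialised, an uncontended acquire costs one atomic operation, and whether the slow-path function gets inlined matters less.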
I can reconfirm my performance findings with this patch; the system is the
same as up-thread.
Thanks!
Best regards,
Jesper