Quick Links

Re: Fix performance degradation of contended LWLock on NUMA

From:	Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
To:	Sokolov Yura <funny(dot)falcon(at)postgrespro(dot)ru>
Cc:	pgsql-hackers(at)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>
Subject:	Re: Fix performance degradation of contended LWLock on NUMA
Date:	2017-09-11 15:01:14
Message-ID:	8e1dce77-6d4c-cfdb-00e0-8e9b3321aaba@redhat.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

On 09/08/2017 03:35 PM, Sokolov Yura wrote:
>> I'm seeing
>>
>> -M prepared: Up to 11% improvement
>> -M prepared -S: No improvement, no regression ("noise")
>> -M prepared -N: Up to 12% improvement
>>
>> for all runs the improvement shows up the closer you get to the number
>> of CPU threads, or above. Although I'm not seeing the same
>> improvements as you on very large client counts there are definitely
>> improvements :)
>
> It is expected:
> - patch "fixes NUMA": for example, it doesn't give improvement on 1 socket
> at all (I've tested it using numactl to bind to 1 socket)
> - and certainly it gives less improvement on 2 sockets than on 4 sockets
> (and 28 cores vs 72 cores also gives difference),
> - one of hot points were CLogControlLock, and it were fixed with
> "Group mode for CLOG updates" [1]
>

I'm planning to re-test that patch.

>>
>> +static inline bool
>> +LWLockAttemptLockOrQueueSelf(LWLock *lock, LWLockMode mode,
>> LWLockMode waitmode)
>>
>> I'll leave it to the Committer to decide if this method is too big to
>> be "inline".
>
> GCC 4.9 doesn't want to inline it without directive, and function call
> is then remarkable in profile.
>
> Attach contains version with all suggestions applied except remove of
> "inline".
>

Yes, ideally the method will be kept at "inline".

>> Open questions:
>> ---------------
>> * spins_per_delay as extern
>> * Calculation of skip_wait_list
>
> Currently calculation of skip_wait_list is mostly empirical (ie best
> i measured).
>

Ok, good to know.

> I strongly think that instead of spins_per_delay something dependent
> on concrete lock should be used. I tried to store it in a LWLock
> itself, but it were worse.

Yes, LWLock should be kept as small as possible, and cache line aligned
due to the cache storms, as shown by perf c2c.

> Recently I understand it should be stored in array indexed by tranche,
> but I didn't implement it yet, and therefore didn't measure.
>

Different constants for the LWLock could have an impact, but the
constants would also be dependent on machine setup, and work load.

Thanks for working on this !

Best regards,
Jesper

In response to

Re: Fix performance degradation of contended LWLock on NUMA at 2017-09-08 19:35:39 from Sokolov Yura

Responses

Re: Fix performance degradation of contended LWLock on NUMA at 2017-09-14 12:34:08 from Jesper Pedersen

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tomas Vondra	2017-09-11 15:17:09	Re: The case for removing replacement selection sort
Previous Message	Tom Lane	2017-09-11 14:55:12	Re: Still another race condition in recovery TAP tests