From: | Jan Wieck <jan(at)wi3ck(dot)info> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: s_lock() seems too aggressive for machines with many sockets |
Date: | 2015-06-10 13:54:00 |
Message-ID: | 55784178.4020005@wi3ck.info |
Lists: | pgsql-hackers |
On 06/10/2015 09:28 AM, Andres Freund wrote:
> On 2015-06-10 09:18:56 -0400, Jan Wieck wrote:
>> On a machine with 8 sockets, 64 cores, Hyperthreaded 128 threads total, a
>> pgbench -S peaks with 50-60 clients around 85,000 TPS. The throughput then
>> takes a very sharp dive and reaches around 20,000 TPS at 120 clients. It
>> never recovers from there.
>
> 85k? Phew, that's pretty bad. What exact type of CPU is this? Which
> pgbench scale? Did you use -M prepared?

model name : Intel(R) Xeon(R) CPU E7- 8830 @ 2.13GHz

numactl --hardware shows the distance to the attached memory as 10, the
distance to every other node as 21. I interpret that as the machine
having one NUMA bus with all CPU packages attached to it, rather than
individual connections from CPU to CPU or something different.

pgbench scale=300, -Msimple.

>
> Could you share a call graph perf profile?

I do not have one handy at the moment, and the machine is in use for
something else until tomorrow. I will forward perf- and systemtap-based
graphs ASAP.

What led me into that spinlock area was the fact that a wall-clock-based
systemtap FlameGraph showed a large portion of the time spent in
PinBuffer() and UnpinBuffer().

>
>> The attached patch demonstrates that less aggressive spinning and
>> (much) more often delaying improves the performance "on this type of
>> machine". The 8 socket machine in question scales to over 350,000 TPS.
>
> Even that seems quite low. I've gotten over 500k TPS on a four socket
> x86 machine, and about 700k on a 8 socket x86 machine.
There is more wrong with the machine in question than just that. But for
the moment I am satisfied with having a machine where I can reproduce
this phenomenon in what appears to be a worst case.
>
> Maybe we need to adjust the amount of spinning, but to me such drastic
> differences are a hint that we should tackle the actual contention
> point. Often a spinlock for something regularly heavily contended can be
> worse than a queued lock.
I have the impression that the code assumes that there is little penalty
for accessing the shared byte in a tight loop from any number of cores
in parallel. That apparently is true for some architectures and core
counts, but no longer holds for these machines with many sockets.
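
To make the point about hammering the shared byte concrete, here is a
minimal sketch in C11 of a spinlock that reads the lock byte before
attempting the atomic exchange and sleeps after only a few failed spins.
It is not PostgreSQL's s_lock() and not the attached patch; every name
(demo_spinlock, demo_spin_lock, ...) and every constant below is
hypothetical and chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical illustration only; not PostgreSQL's slock_t or s_lock(). */
typedef struct {
    atomic_bool locked;
} demo_spinlock;

#define DEMO_SPINS_PER_DELAY 10        /* assumed: back off after few spins */
#define DEMO_MIN_DELAY_NS    1000L     /* assumed: first sleep is 1 us */
#define DEMO_MAX_DELAY_NS    1000000L  /* assumed: sleeps are capped near 1 ms */

static void demo_spin_init(demo_spinlock *l)
{
    atomic_init(&l->locked, false);
}

/* One acquisition attempt; returns true if the lock was taken. */
static bool demo_spin_try(demo_spinlock *l)
{
    /* Plain read first: a waiter that sees the lock held does not issue
     * the atomic exchange, which needs exclusive cache-line ownership. */
    if (atomic_load_explicit(&l->locked, memory_order_relaxed))
        return false;
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

/* Acquire the lock; returns how many times the caller had to sleep. */
static unsigned demo_spin_lock(demo_spinlock *l)
{
    unsigned spins = 0, delays = 0;
    long delay_ns = DEMO_MIN_DELAY_NS;

    while (!demo_spin_try(l)) {
        if (++spins >= DEMO_SPINS_PER_DELAY) {
            /* Stop touching the shared byte for a while. */
            struct timespec ts = {0, delay_ns};
            nanosleep(&ts, NULL);
            delays++;
            spins = 0;
            if (delay_ns < DEMO_MAX_DELAY_NS)
                delay_ns *= 2;   /* exponential backoff */
        }
    }
    return delays;
}

static void demo_spin_unlock(demo_spinlock *l)
{
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed));
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

The design choice being illustrated is only the trade-off: fewer spins
per delay means less traffic on the contended cache line, at the cost of
higher latency when the lock is released shortly after a waiter goes to
sleep. Suitable values would have to be measured per machine.
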
Regards, Jan
--
Jan Wieck
Senior Software Engineer
http://slony.info