Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

From: "Graeme B(dot) Bell" <graeme(dot)bell(at)nibio(dot)no>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Craig James <cjames(at)emolecules(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Graeme B(dot) Bell" <graeme(dot)bell(at)nibio(dot)no>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
Date: 2015-07-09 08:59:26
Message-ID: E1016428-2C5E-4348-A9D8-ED50DC07FA9E@skogoglandskap.no
Lists: pgsql-performance

On 08 Jul 2015, at 22:27, Andres Freund <andres(at)anarazel(dot)de> wrote:

> On 2015-07-08 13:46:53 -0500, Merlin Moncure wrote:
>> On Wed, Jul 8, 2015 at 12:48 PM, Craig James <cjames(at)emolecules(dot)com> wrote:
>>>
>>> Well, right, which is why I mentioned "even with dozens of clients."
>>> Shouldn't that scale to at least all of the CPUs in use if the function is
>>> CPU intensive (which it is)?
>>
>> only in the absence of inter-process locking and cache line bouncing.
>
> And additionally memory bandwidth (shared between everything, even in
> the numa case), cross socket/bus bandwidth (absolutely performance
> critical in multi-socket configurations), cache capacity (shared between
> cores, and sometimes even sockets!).

1. Note for future readers: depending on the operation and on your hardware, you may have fewer "CPU cores" to parallelise across than you think.

1a. For example, AMD CPUs based on the Bulldozer design list the number of integer cores (e.g. 16), but there are only half as many floating-point units (8), each shared between two integer cores. So if your functions rely on floating point, they will scale badly once you go beyond that number.

https://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)
"In terms of hardware complexity and functionality, this "module" is equal to a dual-core processor in its integer power, and to a single-core processor in its floating-point power: for each two integer cores, there is one floating-point core."

1b. Or, if you have hyper-threading enabled on an Intel CPU, you may think you have e.g. 8 cores, but if all the threads are running the same type of operation repeatedly, hyper-threading can't work well and you'll only get the throughput of 4 cores in practice, maybe less due to overheads. Likewise, if your work is continually going to main memory for data (i.e. it is limited by the memory bus), it will run at 4-core speed, because the cores have to share the same memory bus.

Hyper-threading pays off when the 2 logical cores on a physical core are asked to perform two different types of task at once, each with relatively low demands on memory.

"When execution resources would not be used by the current task in a processor without hyper-threading, and especially when the processor is stalled, a hyper-threading equipped processor can use those execution resources to execute another scheduled task."
https://en.wikipedia.org/wiki/Hyper-threading
https://en.wikipedia.org/wiki/Superscalar
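
If you want to check how many physical cores sit behind the logical CPUs your O/S reports, something like the following rough Python sketch may help. It assumes a Linux x86 box whose /proc/cpuinfo has "processor", "physical id" and "core id" fields (other platforms may not expose them). The function names and the SAMPLE text are made up for illustration; SAMPLE imitates a 2-core CPU with hyper-threading enabled.

```python
# Hypothetical helpers: count logical CPUs vs physical cores from
# /proc/cpuinfo-style text (assumes the Linux x86 field names).

def count_logical_cpus(cpuinfo_text):
    """Count 'processor' stanzas, i.e. logical CPUs seen by the O/S."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":")[0].strip() == "processor"
    )


def count_physical_cores(cpuinfo_text):
    """Count distinct (physical id, core id) pairs, i.e. real cores."""
    cores = set()
    physical_id = core_id = None
    # The extra "" flushes the last stanza if the text has no trailing blank line.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends the stanza for one logical CPU.
            if physical_id is not None and core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            physical_id = value.strip()
        elif key == "core id":
            core_id = value.strip()
    return len(cores)


# Illustrative input only: one socket, two cores, two threads per core.
SAMPLE = """\
processor\t: 0
physical id\t: 0
core id\t\t: 0

processor\t: 1
physical id\t: 0
core id\t\t: 1

processor\t: 2
physical id\t: 0
core id\t\t: 0

processor\t: 3
physical id\t: 0
core id\t\t: 1
"""

logical = count_logical_cpus(SAMPLE)
physical = count_physical_cores(SAMPLE)
print(f"{logical} logical CPUs on {physical} physical cores")
```

On a real machine you would pass in the contents of /proc/cpuinfo instead of SAMPLE. If the logical count is twice the physical count, the physical count is probably the safer ceiling to expect for CPU-bound scaling.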

2. Keep in mind also when benchmarking that it's normal to see a small drop-off when you hit the maximum number of cores for your system.
After all, the O/S and the benchmark program and anything else you have running will need a core or two.
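
One way to see where the drop-off starts is to turn your benchmark results into speedup and per-client efficiency figures. This is only a sketch: the function name is made up, and the throughput numbers are invented for illustration, not measurements from this thread.

```python
# Hypothetical helper: speedup and efficiency relative to the 1-client run.

def scaling_report(throughput_by_clients):
    """Return {clients: (speedup, efficiency)} given {clients: throughput}.

    speedup    = throughput / throughput with 1 client
    efficiency = speedup / clients (1.0 would be perfect linear scaling)
    """
    base = throughput_by_clients[1]
    report = {}
    for clients, throughput in sorted(throughput_by_clients.items()):
        speedup = throughput / base
        report[clients] = (speedup, speedup / clients)
    return report


# Invented example figures (e.g. function calls per second).
example = {1: 100.0, 2: 200.0, 4: 380.0, 8: 640.0}

for clients, (speedup, efficiency) in scaling_report(example).items():
    print(f"{clients:2d} clients: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

If efficiency only dips a little at the last step, that fits the O/S and the benchmark program needing a core or two. If it collapses well before you reach the number of physical cores, the causes discussed earlier in the thread are the more likely suspects.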
