Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: andres(at)anarazel(dot)de (Andres Freund)
Cc: Craig James <cjames(at)emolecules(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Graeme B(dot) Bell" <graeme(dot)bell(at)nibio(dot)no>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
Date: 2015-07-09 03:38:38
Message-ID: 15702.1436413118@sss.pgh.pa.us
Lists: pgsql-performance

andres(at)anarazel(dot)de (Andres Freund) writes:
> On 2015-07-08 15:38:24 -0700, Craig James wrote:
>> From my admittedly naive point of view, it's hard to see why any of this
>> matters. I have functions that do purely CPU-intensive mathematical
>> calculations ... you could imagine something like is_prime(N) that
>> determines if N is a prime number. I have eight clients that connect to
>> eight backends. Each client issues an SQL command like, "select
>> is_prime(N)" where N is a simple number.

> I mostly replied to Merlin's general point (additionally in the context of
> plpgsql).

> But I have a hard time seeing that postgres would be the bottleneck for an
> is_prime() function (or something with similar characteristics) that's
> written in C where the average runtime is more than, say, a couple
> thousand cycles. I'd like to see a profile of that.

But that was not the case that Graeme was complaining about. He's talking
about simple-arithmetic-and-looping written in plpgsql, in a volatile
function that is going to take a new snapshot for every statement, even if
that's only "n := n+1". So it's going to spend a substantial fraction of
its runtime banging on the ProcArray, and that doesn't scale. If you
write your is_prime function purely in plpgsql, and don't bother to mark
it nonvolatile, *it will not scale*. It'll be slow even in single-thread
terms, but it'll be particularly bad if you're saturating a multicore
machine with it.
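
To make that concrete, here is a minimal sketch of what such a function might look like when it is marked nonvolatile. The function body, the trial-division approach, and the test value are assumptions made up for illustration, not Craig's or Graeme's actual code. The only point is the IMMUTABLE marker: without it the function defaults to VOLATILE, and each statement in the loop takes a fresh snapshot.

```sql
-- Hypothetical is_prime(), written purely in plpgsql for illustration.
CREATE OR REPLACE FUNCTION is_prime(n bigint)
RETURNS boolean
LANGUAGE plpgsql
IMMUTABLE   -- result depends only on n; omitting this leaves the default, VOLATILE
AS $$
DECLARE
    d bigint := 3;
BEGIN
    IF n < 2 THEN
        RETURN false;
    ELSIF n < 4 THEN
        RETURN true;            -- 2 and 3 are prime
    ELSIF n % 2 = 0 THEN
        RETURN false;           -- even numbers above 2
    END IF;

    -- Trial division by odd candidates up to sqrt(n).
    -- "d <= n / d" is used instead of "d * d <= n" to avoid bigint overflow.
    WHILE d <= n / d LOOP
        IF n % d = 0 THEN
            RETURN false;
        END IF;
        d := d + 2;             -- simple assignment: no table access at all
    END LOOP;

    RETURN true;
END;
$$;

-- One such call per client, as in Craig's scenario.
SELECT is_prime(1000003);
```

An existing function can be relabelled without redefining it, e.g. ALTER FUNCTION is_prime(bigint) IMMUTABLE. That is only safe when the function really has no side effects and reads no table data, as is the case for pure arithmetic like this.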

One of my Salesforce colleagues has been looking into ways that we could
decide to skip the per-statement snapshot acquisition even in volatile
functions, if we could be sure that a particular statement isn't going to
do anything that would need a snapshot. Now, IMO that doesn't really do
much for properly written plpgsql; but there's an awful lot of bad plpgsql
code out there, and it can make a huge difference for that.

regards, tom lane
