Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

From: Craig James <cjames(at)emolecules(dot)com>
To: "Graeme B(dot) Bell" <graeme(dot)bell(at)nibio(dot)no>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
Date: 2015-07-08 03:05:57
Message-ID: CAFwQ8rcEOiGJ79WtoO=uTX33ZhKVUWG9v8iO14TR+AoT_wd1og@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Jul 2, 2015 at 9:15 AM, Graeme B. Bell <graeme(dot)bell(at)nibio(dot)no> wrote:

> Hi everyone,
>
> I've written a new open source tool for easily parallelising SQL scripts
> in postgres. [obligatory plug: https://github.com/gbb/par_psql ]
>
> Using it, I'm seeing a problem I've seen in other postgres projects
> involving parallelisation in the last 12 months.
>
> Basically:
>
> - I have machines here with up to 16 CPUs and 128GB memory, very fast SSDs
> and controller etc, carefully configured kernel/postgresql.conf for high
> performance.
>
> - Ordinary queries parallelise nearly perfectly (e.g. SELECT some_stuff
> ...), e.g. almost up to 16x performance improvement.
>
> - Calls to CPU-intensive user-defined pl/pgsql functions (e.g. SELECT
> myfunction(some_stuff)) do not parallelise well, even when they are
> independent or accessing tables in a read-only way. They hit a limit at
> 2.5x performance improvement relative to single-CPU performance (pg9.4) and
> 2x performance (pg9.3). This is about 6 times slower than I'm expecting.
>
> - Can't see what would be locking. It seems like it's the pl/pgsql
> environment itself that is somehow locking or incurring some huge
> frictional costs. Whether I use independently defined functions,
> independent source tables, independent output tables, makes no difference
> whatsoever, so it doesn't feel 'locky'. It also doesn't seem to be
> WAL/synchronisation related, as the machines I'm using can hit absurdly
> high pgbench rates, and I'm using unlogged tables.
>
> Curious? Take a quick peek here:
> https://github.com/gbb/par_psql/blob/master/BENCHMARKS.md
>
> Wondering what I'm missing here. Any ideas?
>

No ideas, but I ran into the same thing. I have a set of C/C++ functions
that put some chemistry calculations into Postgres as extensions (things
like, "calculate the molecular weight of this molecule"). As SQL functions,
the whole thing bogged down, and we never got the scalability we needed. On
our 8-CPU setup, we couldn't get more than 2 CPUs busy at the same time,
even with dozens of clients.

When I moved these same functions into an Apache fast-CGI HTTP service
(exact same code, same network overhead), I could easily scale up and use
the full 100% of all eight CPUs.

I have no idea why, and never investigated further. The convenience of
having the functions in SQL wasn't that important.

Craig

>
> Graeme.
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

--
---------------------------------
Craig A. James
Chief Technology Officer
eMolecules, Inc.
---------------------------------

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Joshua D. Drake 2015-07-08 05:31:43 Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
Previous Message Merlin Moncure 2015-07-07 20:52:28 Re: Hmmm... why does pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?