From: | "Graeme B(dot) Bell" <graeme(dot)bell(at)nibio(dot)no> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | postgres performance list <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: [BUGS] BUG #13493: pl/pgsql doesn't scale with cpus (PG9.3, 9.4) |
Date: | 2015-07-09 15:59:55 |
Message-ID: | A4C3F8F3-6636-4AFF-A15B-D08A941FD487@skogoglandskap.no |
Lists: | pgsql-performance |
This is a reply to Andres's post on the #13495 documentation thread in -bugs.
I am responding to it here because it relates to #13493 only.
Andres wrote, re: #13493
>> This issue is absolutely critical for performance and scalability of code,
> Pft. In most cases it doesn't actually matter that much because the
> contained query are the expensive stuff. It's just when you do lots of
> very short and cheap things that it has such a big effect. Usually the
> effect on the planner is bigger.
Hi Andres,
'Pft' is kinda rude - I wouldn't comment on it normally, but seeing as you just lectured me on -performance on something you perceived as impolite (just like you lectured me on not spreading things onto multiple threads), can you please try to set a good example? You don't encourage new contributors into open source communities this way.
Getting to the point. I think the gap between our viewpoints comes from the fact that I (and others at my institute) have a bunch of pl/pgsql code with for loops and calculations, which we see as 'code'. Thinking of the users I know myself, there are plenty of GIS people out there using for loops and pgsql to simulate models on data in the DB, and I expect the same is true among e.g. older scientists with DB datasets.
Whereas it sounds like you and Tom see pl/pgsql as 'glue' and don't see any problem. As I have never seen statistics on pl/pgsql use-cases among users at large, I don't know what happens everywhere else outside of GIS-world and pgdev-world. Have you any references/data you can share on that? I would be interested to know because I don't want to overclaim on the importance of these bugs or any other bugs in future. In this case, #13493 wrecked the code for estimates on a 20 million euro national roadbuilding project here and it cost me a few weeks of my life, but for all I know you're totally right about the general importance to the world at large.
Though keep in mind: this isn't only about scaling up one program. It's a db-level problem. If you have a large GIS DB server with many users, long-running queries etc. on large amounts of data, then you only need e.g. 2-3 people to be running some code with for-loops or a long series of calculations in pl/pgsql, and everything will fall apart in pgsql-land.
Last point. When I wrote 'absolutely critical' I was under the impression this bug could have some serious impact on postgis/pgrouting. Since I wanted to double check what you said about 'expensive stuff' vs 'short/cheap stuff', I ran some benchmarks to check on a few functions.
You are right that only short, looped things are affected, e.g. for loops with calculations and so on. I didn't see any trouble with the calls I made to postgis inside or outside of pgsql. This confirms/replicates your findings. I will post updated numbers/tests to github shortly.
Regards
Graeme Bell