From: | "Graeme B(dot) Bell" <graeme(dot)bell(at)nibio(dot)no> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | postgres performance list <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this? |
Date: | 2015-07-09 09:44:24 |
Message-ID: | 14796976-FF86-4A4D-97EE-259FCC9D01CF@skogoglandskap.no |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On 09 Jul 2015, at 05:38, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> If you
> write your is_prime function purely in plpgsql, and don't bother to mark
> it nonvolatile, *it will not scale*.
> much for properly written plpgsql; but there's an awful lot of bad plpgsql
> code out there, and it can make a huge difference for that.
Hi Tom,
I object to phrases like 'don't bother to mark it' and 'bad plpgsql' here. That is putting the blame on programmers. Clearly, if there is no end of code out there that isn't right in this regard, there's something very wrong in the project documentation.
1. I have been writing pl/pgsql on and off for a couple of years now and I've read quite a bit of the postgres doumentation, but I don't recall seeing a clear statement telling me I should mark pl/pgsql functions nonvolatile wherever possible or throw all performance and scalability out the window. I'm sure there may be a line hidden somewhere in the docs, but judging from the impact it has in practice, this seems like a very fundamental concept that should be repeatedly and clearly marked in the docs.
2. Furthermore, I have never come across anything in the documentation that made it clear to me that any pl/pgsql function I write will, by default, be taking out locks for every single statement in the code. I've written code in I dunno, maybe 15-20 different languages in my life, and I can't think of another language offhand that does that by default. From the reactions on this thread to this benchmark and the par_psql benchmarks, it doesn't seem that it was even immediately obvious to many postgres enthusiasts and developers.
3. I don't disagree that the benchmark code is objectively 'bad' in the sense that it is missing an important optimisation.
But I really don't think it helps to frame this as laziness or "bad" in any other sense of the word e.g. 'clumsy'.
Let's look at the postgresql documentation for some examples of 'bad' and lazy code:
http://www.postgresql.org/docs/9.3/static/plpgsql-structure.html
http://www.postgresql.org/docs/9.3/static/plpgsql-declarations.html
There are about 13 functions on that page.
How many functions on that page make use non-volatile or immutable wherever it would be appropriate?
zero.
or this one: http://www.postgresql.org/docs/9.3/static/plpgsql-control-structures.html
zero
or this one: http://www.postgresql.org/docs/9.3/static/plpgsql-cursors.html#PLPGSQL-CURSOR-USING
zero
The reason 90% of people out there are 'not bothering' and 'writing bad code' is because **99% of the postgresql documentation teaches them to do it that way**.
So when you talk about other people 'not bothering' to do things - who is really at fault here what for what you see as endemic 'bad' or 'lazy' code? Is it the new postgres programmers, or the people that taught them with "bad" examples consistently throughout the *entire project documentation*, starting from the very first example?
I think I'm going to raise this as a documentation bug.
Graeme.
From | Date | Subject | |
---|---|---|---|
Next Message | Graeme B. Bell | 2015-07-09 10:30:35 | Re: Hmmm... why does pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this? |
Previous Message | Andres Freund | 2015-07-09 09:00:08 | Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this? |