From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Ivan Sergio Borgonovo <mail(at)webthatworks(dot)it> |
Cc: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: short-cutting if sum()>constant |
Date: | 2009-12-23 13:35:27 |
Message-ID: | 162867790912230535m6c84a1cfodea1c269d3d84301@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
2009/12/23 Ivan Sergio Borgonovo <mail(at)webthatworks(dot)it>:
> On Wed, 23 Dec 2009 12:52:38 +0100
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>
>> The problem is massive cycle. Plpgsql really isn't language for
>> this. This interpret missing own arithmetic unit. So every
>> expression is translated to SELECT statement
>>
>> IF a > c ~ SELECT a > c
>> a = a + 1 ~ SELECT a + 1
>
>> these queries are executed in some special mode, but still it is
>> more expensive than C a = a + 1
>
> I didn't get it.
> I'd expect that since plpgsql should shere SQL data types it could
> map easily something like
>
> if (a>b) then
> or
> a := a +b
>
> and something like a for in query loop was going to be highly
> optimised as well.
> plpgsql should be the most tightly bounded language to the internals
> of pg SQL.
no there are not any internal optimisation. PostgreSQL can optimize
only execution plan - not individual expressions.
>
>> > select sum(a) from data; takes 1999.492 ms.
>> > select count(*) from data; takes 1612.039 ms
>
>> it is slower, because there is one NULL test more.
>
> That didn't came as a surprise. It was there for comparison.
>
>> PLpgSQL quietly uses cursors for FOR SELECT, your plperl uses
>
> I didn't notice. That was my first plperl function in my life.
> Anyway that means that that test didn't say anything about
> interpreter speed and duck typing.
you don't compare equal things.
>
>> > Anyway I'm not able to justify the difference in speed between
>> > plpgsql and your solution in such case unless plpgsql is
>> > switching back and forward between binary data and their text
>> > representation. It would be nice if someone that know plpgsql
>> > internals explain where the cost comes from.
>
>> I thing so most expensive part in your plpgsql code is two
>> expression.
>
one expression: if >> loop_var > parameter <<
second expression: loop_var + 1
> I didn't understand.
>
> What's the reason the interpreter can't translate the if and the b :=
> row._a + b; into something that very resemble compiled C?
a) simplicity. PLpgSQL interpret is very simple.
b) mostly types are little bit different behave than natural C types -
domains are different, C types doesn't know NULL value, ...
so if you like maximum speed, then you can use C language. It is
relative simple, much simpler than C procedures in T-SQL or Oracle.
> plpgsql is not doing anything different than:
> select count(*) from (select case when a>3 then 1 else 2 end, a+a
> from data limit 9000000) as f;
no - it doesn't any intelligence - it doesn't join expression
together. It does exactly it, what you write.
see http://okbob.blogspot.com/2009/12/how-dont-write-plpgsql-functions.html
> One of the advantages of using plpgsql (and one annoyance too) is
> that variables are declared and plpgsql should know how to operate
> on them with native C code.
It little bit nonsense. On 99% plpgsql use SPI api and work with
variables via query interface. PLpgSQL can do loop statement, if
statement, but for all others use internal query based API.
>
> BTW the best performer considering all constraints and data
> distribution seems to be the simplest one:
>
> select sum(s.a) from (select a from data where a>0 limit 9000000) s;
> Time: 2620.677 ms
>
this query is little but different, than you original request, but it
could work for you.
Regard
Pavel Stehule
> We've no negative int... and they are int, so they have to be no
> less than 1 if they are not 0. If I know there are no 0, the
> simplest version become even faster.
>
> When I randomly pick up values in [1,3] plpgsql and generate_series
> start to perform similarly but still the simplest version is leading.
>
> When the interval is in [1,6] the plpgsql becomes faster than the
> generate_series but the simplest version is still leading.
>
> Just when the interval is in [1,10] the plpgsql version and the
> simplest one becomes comparable.
>
> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it
>
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>
From | Date | Subject | |
---|---|---|---|
Next Message | Ivan Sergio Borgonovo | 2009-12-23 14:27:28 | Re: short-cutting if sum()>constant |
Previous Message | Ivan Sergio Borgonovo | 2009-12-23 13:01:16 | Re: short-cutting if sum()>constant |