Re: short-cutting if sum()>constant

From: Ivan Sergio Borgonovo <mail(at)webthatworks(dot)it>
To:
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: short-cutting if sum()>constant
Date: 2009-12-23 13:01:16
Message-ID: 20091223140116.1c79d821@dawn.webthatworks.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Wed, 23 Dec 2009 12:52:38 +0100
Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

> The problem is massive cycle. Plpgsql really isn't language for
> this. This interpret missing own arithmetic unit. So every
> expression is translated to SELECT statement
>
> IF a > c ~ SELECT a > c
> a = a + 1 ~ SELECT a + 1

> these queries are executed in some special mode, but still it is
> more expensive than C a = a + 1

I didn't get it.
I'd expect that since plpgsql should shere SQL data types it could
map easily something like

if (a>b) then
or
a := a +b

and something like a for in query loop was going to be highly
optimised as well.
plpgsql should be the most tightly bounded language to the internals
of pg SQL.

> > select sum(a) from data; takes 1999.492 ms.
> > select count(*) from data; takes 1612.039 ms

> it is slower, because there is one NULL test more.

That didn't came as a surprise. It was there for comparison.

> PLpgSQL quietly uses cursors for FOR SELECT, your plperl uses

I didn't notice. That was my first plperl function in my life.
Anyway that means that that test didn't say anything about
interpreter speed and duck typing.

> > Anyway I'm not able to justify the difference in speed between
> > plpgsql and your solution in such case unless plpgsql is
> > switching back and forward between binary data and their text
> > representation. It would be nice if someone that know plpgsql
> > internals explain where the cost comes from.

> I thing so most expensive part in your plpgsql code is two
> expression.

I didn't understand.

What's the reason the interpreter can't translate the if and the b :=
row._a + b; into something that very resemble compiled C?
plpgsql is not doing anything different than:
select count(*) from (select case when a>3 then 1 else 2 end, a+a
from data limit 9000000) as f;
One of the advantages of using plpgsql (and one annoyance too) is
that variables are declared and plpgsql should know how to operate
on them with native C code.

BTW the best performer considering all constraints and data
distribution seems to be the simplest one:

select sum(s.a) from (select a from data where a>0 limit 9000000) s;
Time: 2620.677 ms

We've no negative int... and they are int, so they have to be no
less than 1 if they are not 0. If I know there are no 0, the
simplest version become even faster.

When I randomly pick up values in [1,3] plpgsql and generate_series
start to perform similarly but still the simplest version is leading.

When the interval is in [1,6] the plpgsql becomes faster than the
generate_series but the simplest version is still leading.

Just when the interval is in [1,10] the plpgsql version and the
simplest one becomes comparable.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Pavel Stehule 2009-12-23 13:35:27 Re: short-cutting if sum()>constant
Previous Message Pavel Stehule 2009-12-23 11:52:38 Re: short-cutting if sum()>constant