Quick Links

Re: short-cutting if sum()>constant

From:	Ivan Sergio Borgonovo <mail(at)webthatworks(dot)it>
To:
Cc:	pgsql-sql(at)postgresql(dot)org
Subject:	Re: short-cutting if sum()>constant
Date:	2009-12-23 13:01:16
Message-ID:	20091223140116.1c79d821@dawn.webthatworks.it
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-sql

On Wed, 23 Dec 2009 12:52:38 +0100
Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

> The problem is massive cycle. Plpgsql really isn't language for
> this. This interpret missing own arithmetic unit. So every
> expression is translated to SELECT statement
>
> IF a > c ~ SELECT a > c
> a = a + 1 ~ SELECT a + 1

> these queries are executed in some special mode, but still it is
> more expensive than C a = a + 1

I didn't get it.
I'd expect that since plpgsql should shere SQL data types it could
map easily something like

if (a>b) then
or
a := a +b

and something like a for in query loop was going to be highly
optimised as well.
plpgsql should be the most tightly bounded language to the internals
of pg SQL.

> > select sum(a) from data; takes 1999.492 ms.
> > select count(*) from data; takes 1612.039 ms

> it is slower, because there is one NULL test more.

That didn't came as a surprise. It was there for comparison.

> PLpgSQL quietly uses cursors for FOR SELECT, your plperl uses

I didn't notice. That was my first plperl function in my life.
Anyway that means that that test didn't say anything about
interpreter speed and duck typing.

> > Anyway I'm not able to justify the difference in speed between
> > plpgsql and your solution in such case unless plpgsql is
> > switching back and forward between binary data and their text
> > representation. It would be nice if someone that know plpgsql
> > internals explain where the cost comes from.

> I thing so most expensive part in your plpgsql code is two
> expression.

I didn't understand.

What's the reason the interpreter can't translate the if and the b :=
row._a + b; into something that very resemble compiled C?
plpgsql is not doing anything different than:
select count(*) from (select case when a>3 then 1 else 2 end, a+a
from data limit 9000000) as f;
One of the advantages of using plpgsql (and one annoyance too) is
that variables are declared and plpgsql should know how to operate
on them with native C code.

BTW the best performer considering all constraints and data
distribution seems to be the simplest one:

select sum(s.a) from (select a from data where a>0 limit 9000000) s;
Time: 2620.677 ms

We've no negative int... and they are int, so they have to be no
less than 1 if they are not 0. If I know there are no 0, the
simplest version become even faster.

When I randomly pick up values in [1,3] plpgsql and generate_series
start to perform similarly but still the simplest version is leading.

When the interval is in [1,6] the plpgsql becomes faster than the
generate_series but the simplest version is still leading.

Just when the interval is in [1,10] the plpgsql version and the
simplest one becomes comparable.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

In response to

Re: short-cutting if sum()>constant at 2009-12-23 11:52:38 from Pavel Stehule

Responses

Re: short-cutting if sum()>constant at 2009-12-23 13:35:27 from Pavel Stehule

Browse pgsql-sql by date

	From	Date	Subject
Next Message	Pavel Stehule	2009-12-23 13:35:27	Re: short-cutting if sum()>constant
Previous Message	Pavel Stehule	2009-12-23 11:52:38	Re: short-cutting if sum()>constant