Re: Weird function behavior from Sept 11 snapshot

From: Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mike Mascari <mascarm(at)mascari(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Weird function behavior from Sept 11 snapshot
Date: 2000-09-12 14:58:12
Message-ID: 39BE4484.C453B792@alumni.caltech.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> ... Does anyone want to check the time for sum() or avg() on an
> int4 column over a large table, under both 7.0.* and current?

For 262144 rows on the current tree, I get the following:

sum(int4): 12.0 seconds
sum(float8): 5.2 seconds
sum(cast(int4 as float8): 5.7 seconds

This includes startup costs, etc, and are the minimum times from several
runs (there is pretty wide variability, presumably due to disk caching,
swapping, etc on my laptop). It is a safe bet that the original int4
implementation was as fast or faster than the float8 result above (int4
does not require palloc() calls).

> Actually, using a float8 accumulator would work pretty well; assuming
> IEEE float8, you'd only start to get roundoff error when the running
> sum exceeds 2^52 or so. However the SQL92 spec is insistent that sum()
> deliver an exact-numeric result when applied to exact-numeric data,
> and with a float accumulator we'd be at the mercy of the quality of the
> local implementation of floating point.

A problem with float8 is that it is possible to reach a point in the
accumulation where subsequent input values are ignored in the sum. This
is different than just roundoff error, since it degrades ungracefully
from that point on.

> I could see offering variant aggregates, say "sumf" and "avgf", that
> use float8 accumulation. Right now the user can get the same result
> by writing "sum(foo::float8)" but it might be wise to formalize the
> idea ...

How about using int8 for the accumulator (on machines which support it
of course)? Falling back to float8 or numeric on other machines? Or
perhaps we could have an option (runtime??) to switch accumulator modes.

I like the idea of something like "sumf" to get alternative algorithms,
but it would be nice if basic sum() could be a bit more optimized than
currently.

- Thomas

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2000-09-12 15:14:21 Re: Weird function behavior from Sept 11 snapshot
Previous Message Tom Lane 2000-09-12 14:24:32 Re: Weird function behavior from Sept 11 snapshot