From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Daniel Farina <drfarina(at)acm(dot)org> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Refactoring the Type System |
Date: | 2010-11-14 21:25:24 |
Message-ID: | AANLkTimWo+UM-c2dE7ma-b+g3AU__d2PnGg89P=axn-5@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, Nov 14, 2010 at 2:27 PM, Daniel Farina <drfarina(at)acm(dot)org> wrote:
> On Sun, Nov 14, 2010 at 7:47 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Daniel Farina <drfarina(at)acm(dot)org> writes:
>>> Here are some weaknesses in the SUM aggregate that run up against the
>>> type system. Maybe they'll help crystallize some discussion:
>>
>>> SUM(int2) => int4
>>> SUM(int4) => int8
>>> SUM(int8) => numeric
>>
>>> Some weaknesses:
>>
>>> SUM, of any precision, assumes that the precision being accumulated
>>> into (which is also the return-precision) is enough to avoid overflow.
>>
>> This is not a flaw of the type system, it's just an implementation
>> choice in the SUM() aggregates. We could easily have chosen wider
>> accumulation and/or result types.
>
> That's true, but there are downsides to escalating the precision so
> aggressively.
>
> The case I was thinking about in particular involves composition of
> SUM. If one can assume that a relation has int4s and that will never
> overflow an int8 (as is done now), I don't see a great way to optimize
> the following case without special exceptions in the optimizer for
> particular aggregates known a-priori. Here's what would happen now:
>
> SELECT SUM(x::int8)::numeric
> FROM (SELECT SUM(x::int4)::int8 AS x
> FROM rel
> GROUP BY y) some_name;
>
> Could be rendered, by this assumption, as:
>
> SELECT SUM(x::int8)::int8
> ....(same FROM clause)
>
> (Why would anyone write a query like this? Views. Possibly inlined SQL
> UDFs, too.)
>
> This can be measurably faster. It also more properly constrains the
> result type, as numeric can also handle non-integer quantities.
>
> I should have underscored that a positive aspect of having a
> type-class like facility that allows declaration things like this
> hypothetical Integer when backed by concrete types that might support
> a superset of functionality.
Like Tom, I'm not sure this is really a type-system problem. This
sounds like a complaint that operations on "numeric" are much slower
than operations on "int4" and "int8", even for values that could be
represented by either type. I think that's a valid complaint, but I
don't see how changing the type system would help. I think what you'd
need to is optimize the existing numeric type, or provide a new
numeric-ish type with optimizations for dealing with
small-to-medium-sized integers.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2010-11-14 21:31:02 | Re: wCTE behaviour |
Previous Message | Greg Smith | 2010-11-14 21:23:48 | Re: pg_stat_bgwriter broken? |