Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From: Florian Pflug <fgp(at)phlo(dot)org>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Date: 2014-01-17 21:17:02
Message-ID: BDA6B20A-E94A-4416-B38C-3F3794EDF739@phlo.org
Lists: pgsql-hackers

On Jan 17, 2014, at 20:34, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> On Fri, Jan 17, 2014 at 3:05 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>
>> I've now shuffled things around so that we can use inverse transition functions
>> even if only some aggregates provide them, and to allow inverse transition
>> functions to force a restart by returning NULL. The rules regarding NULL handling
>> are now the following
>
> Maybe this is me thinking out loud, but I'm just thinking about the numeric case again.
> Since the patch can now handle inverse transition functions returning NULL when they
> fail to perform inverse transitions, I'm wondering if we could add an "expectedscale"
> to NumericAggState. We'd set it to -1 initially; when we get the first value, we'd set
> expectedscale to that numeric's dscale, and if any later value has a different dscale
> we'd set expectedscale back to -1 again. Then, if we're asked to perform an inverse
> transition while expectedscale is -1, we return NULL; otherwise we can perform the
> inverse transition...
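
If I understand correctly, that amounts to something like the sketch below. (All names
here are invented for illustration; the patch's actual NumericAggState and transition
functions look different, and the sum bookkeeping is elided.)

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct NumericAggStateSketch
    {
        int64_t  N;              /* number of values currently aggregated */
        int32_t  expectedScale;  /* common dscale of all inputs, or -1 */
    } NumericAggStateSketch;

    /* Forward transition: remember the first dscale, invalidate on mismatch. */
    static void
    sketch_accum(NumericAggStateSketch *state, int32_t dscale)
    {
        if (state->N == 0)
            state->expectedScale = dscale;
        else if (state->expectedScale != dscale)
            state->expectedScale = -1;   /* inputs with mixed scales seen */
        state->N++;
        /* ... add the value to the running sum ... */
    }

    /*
     * Inverse transition: returning false stands in for returning NULL,
     * which under the new rules forces a restart of the aggregation.
     */
    static bool
    sketch_remove(NumericAggStateSketch *state, int32_t dscale)
    {
        (void) dscale;           /* equals expectedScale whenever we get here */
        if (state->expectedScale == -1)
            return false;        /* can't prove the result's dscale; restart */
        state->N--;
        /* ... subtract the value from the running sum ... */
        return true;
    }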

You could do better than that - the numeric problem amounts to tracking the maximum
scale AFAICS, so it'd be sufficient to restart if we remove a value whose scale equals
the current maximum. But if we do SUM(numeric) at all, I think we should do so
without requiring restarts - I still think the overhead of tracking the maximum scale
within the aggregated values isn't that bad. If we zero the dscale counters lazily,
the number of entries we have to zero is bounded by the maximum dscale we encounter.
Since actually summing N digits is probably more expensive than zeroing them, and since
we pay the zeroing overhead only once per aggregation but save potentially many
summations, I'm pretty sure we come out ahead by quite some margin.
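
To make that concrete, here's a minimal sketch of the lazily zeroed counters (the names,
the flat array layout, and the 14-bit dscale bound are my assumptions for illustration,
not what the patch does):

    #include <string.h>

    #define DSCALE_LIMIT  (0x3FFF + 1)  /* assumes dscale fits in 14 bits */

    typedef struct DScaleTrackerSketch
    {
        int  counts[DSCALE_LIMIT];  /* counts[s] = #values with dscale s */
        int  zeroed_up_to;          /* counts[0 .. zeroed_up_to-1] are valid */
        int  max_dscale;            /* current maximum dscale, -1 if empty */
    } DScaleTrackerSketch;

    static void
    dscale_init(DScaleTrackerSketch *t)
    {
        t->zeroed_up_to = 0;        /* zero nothing up front; done lazily */
        t->max_dscale = -1;
    }

    /* Forward transition: count the value's dscale, zeroing counters lazily. */
    static void
    dscale_add(DScaleTrackerSketch *t, int dscale)
    {
        if (dscale >= t->zeroed_up_to)
        {
            /* total zeroing work is bounded by the largest dscale we see */
            memset(t->counts + t->zeroed_up_to, 0,
                   (dscale + 1 - t->zeroed_up_to) * sizeof(int));
            t->zeroed_up_to = dscale + 1;
        }
        t->counts[dscale]++;
        if (dscale > t->max_dscale)
            t->max_dscale = dscale;
    }

    /* Inverse transition: uncount the dscale and recompute the maximum. */
    static void
    dscale_remove(DScaleTrackerSketch *t, int dscale)
    {
        t->counts[dscale]--;
        while (t->max_dscale >= 0 && t->counts[t->max_dscale] == 0)
            t->max_dscale--;    /* scan down only when the old maximum dies */
    }

The point is that dscale_remove() never needs to force a restart - the current maximum
is always known, so SUM(numeric) can report its result with the right dscale after
every inverse transition.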

It'd be interesting to handle float similarly to the way you describe, though. We might
not be able to guarantee that we yield exactly the same result as without inverse
aggregation, but we might be able to bound the error. That might make the approach
acceptable - as Kevin pointed out, float is always an approximation anyway. I haven't
really thought that through, though...

Anyway, with time running out fast if we want to get this into 9.4, I think we should
focus on getting this into a committable state right now.

I've started to look over what you've pushed to github, and it looks mostly fine.
I have a few comments - mostly cosmetic stuff - that I'll post once I've finished reading
through it. I also plan to do some basic performance testing to verify that my
reshuffling of eval_windowaggregates() doesn't hurt aggregates without an inverse
transition function.

best regards,
Florian Pflug
