From: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PATCH] Negative Transition Aggregate Functions (WIP) |
Date: | 2014-01-10 09:43:51 |
Message-ID: | CAEZATCW5F6z5pTvJL9CTm64wYHPAspeo3_Hb=vrES4q7-sDrMA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 10 January 2014 08:12, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> On Fri, Jan 10, 2014 at 4:09 AM, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
> wrote:
>>
>> Hi,
>>
>> Reading over this, I realised that there is a problem with NaN
>> handling --- once the state becomes NaN, it can never recover. So the
>> results using the inverse transition function don't match HEAD in
>> cases like this:
>>
>> create table t(a int, b numeric);
>> insert into t values(1,1),(2,2),(3,'NaN'),(4,3),(5,4);
>> select a, b,
>> sum(b) over(order by a rows between 1 preceding and current row)
>> from t;
>>
>> which in HEAD produces:
>>
>> a | b | sum
>> ---+-----+-----
>> 1 | 1 | 1
>> 2 | 2 | 3
>> 3 | NaN | NaN
>> 4 | 3 | NaN
>> 5 | 4 | 7
>> (5 rows)
>>
>> but with this patch produces:
>>
>> a | b | sum
>> ---+-----+-----
>> 1 | 1 | 1
>> 2 | 2 | 3
>> 3 | NaN | NaN
>> 4 | 3 | NaN
>> 5 | 4 | NaN
>> (5 rows)
>>
>
> Nice catch! Thanks for having a look at the patch.
>
> Ok, so I thought about this and I don't think it's too big a problem at to
> fix it all. I think it can be handled very similar to how I'm taking care of
> NULL values in window frame. For these, I simply keep a count of them in an
> int64 and when the last one leaves the aggregate context things can continue
> as normal.
>
> Lucky for us that all numeric aggregation (and now inverse aggregation) goes
> through 2 functions. do_numeric_accum() and the new inverse version of it
> do_numeric_discard(), both these functions operate on a NumericAggState
> which in the attached I've changed the isNaN bool field to a NaNCount int64
> field. I'm just doing NaNCount++ when we get a NaN value in do_numeric_accum
> and NaNCount-- in do_numeric_discard(), in the final functions I'm just
> checking if NaNCount > 0.
>
Cool, that sounds like a neat fix.
> Though this implementation does fix the reported problem unfortunately it
> may have an undesired performance impact for numeric aggregate functions
> when not uses in the context of a window.. Let me explain what I mean:
>
> Previously there was some code in do_numeric_accum() which did:
>
> if (state->isNaN || NUMERIC_IS_NAN(newval))
> {
> state->isNaN = true;
> return;
> }
>
> Which meant that it didn't bother adding new perfectly valid numerics to the
> aggregate totals when there was an NaN encountered previously. I had to
> change this to continue on regardless as we still need to keep the totals
> just in case all the NaN values are removed and the totals are required once
> again. This means that the non-window version of SUM(numeric) and
> AVG(numeric) and and the stddev aggregates for numeric pay a price and have
> to keep on totaling after encountering NaN values. :(
>
I suspect that NaNs almost never occur in practice, so the fact that
it might now take longer to tell you that the sum is NaN doesn't worry
me. More important is that it always gives the right answer.
Note, if anyone ever did this for floats, +/- Infinity would also need
to be handled, so you'd have to maintain 3 counts and deal with logic
like Infinity - Infinity = NaN.
Regards,
Dean
From | Date | Subject | |
---|---|---|---|
Next Message | Marko Tiikkaja | 2014-01-10 09:52:21 | Re: array_length(anyarray) |
Previous Message | Merlin Moncure | 2014-01-10 09:41:34 | Re: array_length(anyarray) |