Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From: Florian Pflug <fgp(at)phlo(dot)org>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Date: 2014-01-14 01:00:15
Message-ID: 8B0338B7-95DD-4851-930A-B36A88342273@phlo.org
Lists: pgsql-hackers

On Jan10, 2014, at 22:27 , David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> As the patch stands at the moment, I currently have a regression test
> which currently fails due to these extra zeros after the decimal point:
>
> -- This test currently fails due to extra trailing 0 digits.
> SELECT SUM(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
> FROM (VALUES(1,1.01),(2,2),(3,3)) v(i,n);
>
> Patched produces:
> 6.01
> 5.00
> 3.00
> Unpatched produces:
> 6.01
> 5
> 3

Hm, that's annoying. I checked the standard to see if it offers any guidance
here, but it doesn't look like it does - the standard just says that SUM() ought
to return a type with the same scale as the input type. That's fine if
every NUMERIC type has a fixed scale, but that's not the case in postgres -
we allow values of unconstrained NUMERIC types (i.e., numeric types without an
explicitly specified precision or scale) to each define their own scale.

To fix this, we'd have to track the maximum scale within the current frame.
That's easier than the general problem of providing an inverse transition
function for MAX, because AFAIK we limit the scale to at most 1000. So it'd
be sufficient to track the number of times we saw each scale, and also the
current maximum. Once we reduce the current maximum's count back to zero
in the inverse transition function, we'd scan downwards from that scale to
find the next one with a non-zero count.
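
Just to make that concrete, here's a rough standalone sketch of such a tracker
(plain C, for illustration only and not actual backend code - the names, like
ScaleTracker, are made up, and I'm assuming the scale is capped at 1000):

/*
 * Minimal sketch of the scale-tracking idea: keep a count per possible
 * scale (bounded, since numeric scales are capped), plus the current
 * maximum. When a removal drops the maximum's count to zero, scan
 * downwards for the next scale still present in the frame.
 */
#include <stdio.h>

#define MAX_SCALE 1000			/* assumed upper bound on numeric scale */

typedef struct
{
	int			counts[MAX_SCALE + 1];	/* values in the frame per scale */
	int			max_scale;		/* current maximum scale, -1 if frame empty */
} ScaleTracker;

static void
tracker_init(ScaleTracker *t)
{
	for (int i = 0; i <= MAX_SCALE; i++)
		t->counts[i] = 0;
	t->max_scale = -1;
}

/* forward transition: a value with the given scale enters the frame */
static void
tracker_add(ScaleTracker *t, int scale)
{
	t->counts[scale]++;
	if (scale > t->max_scale)
		t->max_scale = scale;
}

/* inverse transition: a value with the given scale leaves the frame */
static void
tracker_remove(ScaleTracker *t, int scale)
{
	t->counts[scale]--;
	if (scale == t->max_scale && t->counts[scale] == 0)
	{
		/* scan downwards for the next scale still present */
		while (t->max_scale >= 0 && t->counts[t->max_scale] == 0)
			t->max_scale--;
	}
}

int
main(void)
{
	ScaleTracker t;

	tracker_init(&t);
	tracker_add(&t, 2);			/* 1.01 */
	tracker_add(&t, 0);			/* 2 */
	tracker_add(&t, 0);			/* 3 */
	printf("max scale = %d\n", t.max_scale);	/* prints 2 */

	tracker_remove(&t, 2);		/* 1.01 leaves the frame */
	printf("max scale = %d\n", t.max_scale);	/* prints 0 */
	return 0;
}

The downward scan only happens when the maximum's count actually drops to
zero, and it's bounded by the scale cap, so even the worst case stays cheap.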

We could also choose to ignore this, although I'm not sure I really like that.
It seems entirely OK at first sight - after all, the values all stay the same,
we just emit a different number of trailing zeros. But it still causes a row's
result to be affected by values which lie outside that row's frame, even if
only in the number of trailing zeros. That seems like more non-determinism than
a database should show.

> With inverse transitions this query still produces correct results, it just does
> not produce the numeric in the same format as it does without performing inverse
> transitions. Personally I'd rather focus on trying to get SUM(numeric) in there
> for 9.4

I think it'd be worthwhile to get this into 9.4, if that's still an option,
even if we only support COUNT.

best regards,
Florian Pflug
