From: | Florian Pflug <fgp(at)phlo(dot)org> |
---|---|
To: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
Cc: | David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PATCH] Negative Transition Aggregate Functions (WIP) |
Date: | 2014-01-29 12:45:39 |
Message-ID: | 8FF9A080-A3E3-4D72-B689-15EC8BAE7A73@phlo.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Jan29, 2014, at 09:59 , Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
> On 28 January 2014 20:16, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>> On Jan27, 2014, at 23:28 , Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>>> This case is explicitly forbidden, both in CREATE AGGREGATE and in the
>>> executor. To me, that seems overly restrictive --- if transfn is
>>> strict, then you know for sure that no NULL values are added to the
>>> aggregate state, so surely it doesn't matter whether or not
>>> inv_transfn is strict. It will never be asked to remove NULL values.
>>>
>>> I think there are definite use-cases where a user might want to use a
>>> pre-existing function as the inverse transition function, so it seems
>>> harsh to force them to wrap it in a strict function in this case.
>>
>> I'm not sure that the likelihood of someone wanting to combine a strict
>> forward with a non-strict inverse function is very hight, but neither
>>
>
> Me neither, but the checks to forbid it aren't adding anything, and I
> think it's best to make it as flexible as possible.
Ok.
>> Another idea would be to have an explicit nulls=ignore|process option
>> for CREATE AGGREGATE. If nulls=process and either of the transition
>> functions are strict, we could either error out, or simply do what
>> normal functions calls do and pretend they return NULL for NULL inputs.
>> Not sure how the rule that forward transition functions may not return
>> NULL if there's an inverse transition function would fit in if we do
>> the latter, though.
>>
>> The question is - is it worth it the effort to add that flag?
>
> One use-case I had in mind upthread was suppose you wanted to write a
> custom version of array_agg that only collected non-NULL values. With
> such a flag, that would be trivial, but with the current patch you'd
> have to (count-intuitively) wrap the inverse transition function in a
> strict function.
I'd be more convinced by that if doing so was actually possible for
non-superusers. But it isn't, because the aggregate uses "internal" as
it's state type and it's thus entirely up to the user to not crash the
database by mixing transfer and final functions with incompatible
state data. Plus, instead of creating a custom aggregate, one can just
use a FILTER clause to get rid of the NULL values.
> Another use-case I can imagine is suppose you wanted a custom version
> of sum() that returned NULL if any of the input values were NULL. If
> you knew that most of your data was non-NULL, you might still wish to
> benefit from an inverse transition function. Right now the patch won't
> allow that because of the error in advance_windowaggregate(), but
> possibly that could be relaxed by forcing a restart in that case.
That's not really true - that patch only forbids that if you insist on
representing the state "i have seen a NULL input" with a NULL state value.
But if you instead just count the number of NULLS in your transition
functions, all you need to do is to have your final function return NULL
if that count is not zero.
> If I've understood correctly that should give similar to current
> performance if NULLs are present, and enhanced performance as the
> window moved over non-NULL data.
Exactly - and this makes defining a NULL-sensitive SUM() this way
rather silly - a simple counter has very nearly zero overhead, and avoids
all rescans.
> In that second case, it would also be nice if you could simply re-use
> the existing sum forward and inverse transition functions, with a
> different null-handling flag.
Even if we had a nulls=process|ignore flag, SUM's transition functions
would still need to take that use-case into account explicitly to make
this work - at the very least, the forward transition function would
need to return NULL if the input is NULL instead of just skipping it as
it does now. But that would leads to completely unnecessary rescans, so
what we actually ought to do then is to make it track whether there have
been NULL inputs and make the finalfunc return NULL in this case. Which
would border on ironic, since the whole raison d'etre for this is to
*avoid* spreading NULL-counting logic around...
Plus, to make this actually useable, we'd have to document this, and tell
people how to define such a SUM aggregate. But I don't want to go there -
we really mustn't encourage people to mix-and-match built-in aggregates
with state type "internal", since whether they work or crash depends
on factors outside the control of the user.
And finally, you can get that behaviour quite easily by doing
CASE WHEN bool_and(input IS NOT NULL) whatever_agg(input) ELSE NULL END
> So I think an ignore-nulls flag would have real benefits, as well as
> being a cleaner design than relying on a strict inverse transition
> function.
The more I think about this, the less convinced I am. In fact, I'm
currently leaning towards just forbidding non-strict forward transition
function with strict inverses, and adding non-NULL counters to the
aggregates that then require them. It's really only the SUM() aggregates
that are affected by this, I think.
For an argument in favour of that, look at how we handle *non*-NULL
initial states. For these, even aggregates with nulls=ignore would
require some tracking, because otherwise they'd simply return the
initial state if the input contained no non-NULL value. Which quite
probably isn't what you'd want - you'd usually want to return NULL
in this case (as all the built-in aggregates do).
Currently, all built-in aggregates with non-NULL initial states compute
an expected value of some kind (average, std. deviation, regressions),
and thus need to count the number of inputs anyway. But that's just
coincidence - if it wasn't for state type "internal", having SUM start
from 0 instead of NULL would be entirely reasonable.
So if we have a nulls=ignore flag, we ought to have an nullif=only_nulls
flag too to cover all the ground. But at that point, just making this
the transition function's responsibility seems more sensible.
best regards,
Florian Pflug
From | Date | Subject | |
---|---|---|---|
Next Message | Dean Rasheed | 2014-01-29 12:48:14 | Re: WIP patch (v2) for updatable security barrier views |
Previous Message | Ashutosh Bapat | 2014-01-29 12:38:08 | Re: LIKE INCLUDING CONSTRAINTS is broken |