From: | Florian Pflug <fgp(at)phlo(dot)org> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PATCH] Negative Transition Aggregate Functions (WIP) |
Date: | 2014-01-20 14:20:51 |
Message-ID: | 1D0E5AB5-ED42-4A33-8691-DD3C7DA1578A@phlo.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Jan20, 2014, at 08:42 , David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>> On Mon, Jan 20, 2014 at 2:45 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>> * An assert that the frame end doesn't move backwards - I realized that
>> it is after all easy to do that, if it's done after the loop which adds
>> the new values, not before.
>
> I've applied this too, but I'm wondering why an elog for if the head moves
> back, but an assert if the tail moves back?
When I put the frame head check in, I was concerned that the code might crash
or loop endlessly if aggregatedbase was ever larger than frameheadpos, so
I made it elog(), just for safety.
The frame end check, OTOH, is done at the very end, so it doesn't protect
against much, it just documents that it's not supposed to happen.
But yeah, it's bit weird. Feel free to turn the frame end check into an
elog() too if you want to.
>> * I've also renamed INVFUNC to INVSFUNC. That's a pretty invasive change, and
>> it's the last commit, so if you object to that, then you can merge up to
>> eafa72330f23f7c970019156fcc26b18dd55be27 instead of
>> de3d9148be9732c4870b76af96c309eaf1d613d7.
>
>
> Seems like sfunc really should be tfunc then we could have invtfunc. I'd probably
> understand this better if I knew what the 's' was for in sfunc. I've not applied
> this just yet. Do you have a reason why you think it's better?
My issue with just "invfunc" is mainly that it's too generic - it doesn't tell
you what it's supposed to be the inverse of.
I've always assumed that 's' in 'sfunc' and 'stype' stands for 'state', and that
the naming is inspired by control theory, where the function which acts on the
state space is often called S.
> Thanks, yeah those really do need a review. I've lost a bit of direction with
> them and I'm not quite sure just how much detail to go in to with it. I'd like
> to explain a bit that users who need to use their numeric columns in window
> aggregates might want to think about having a defined scale to the numeric rather
> than an undefined scale and explain that this is because the inverse transition
> function for numeric bails out if it loses the maximum seen dscale. Though it
> does seem generally a good idea to have a defined scale, but then I guess you've
> got to have a bit of knowledge about the numbers you're storing in that case.
> I'm not quite sure how to put that into words friendly enough for the documents
> just yet and or exactly where to put the words. So any ideas or patches you have
> around that would be great.
Here's what I image the docs should look like, very roughly
* In CREATE AGGREGATE, we should state the precise axioms we assume about forward
and inverse transition functions. The last time I read the text there, it was
a bit ambiguous about whether inverse transition functions assume commutativity,
i.e. whether we assume that we can remove inputs other than the first one from
the aggregation. Currently, all the inverse transition functions we have are,
in fact, commutative, because all the forward transition function are also. But
we could e.g. add an inverse to array_agg and string_agg, and those would
obviously need this ordering guarantee. I'd also like us to explain that the
"inverse" in "inverse transition function" shouldn't be taken too literally. For
forward transition function F, inverse transition function G, and inputs a,b,...
we *don't require that G(F(s,a),a) == s. We we do require is that if I is the
initial state, then
G(F(...F(F(I,a),b)...,z),a) == F(...F(I,b)...,z).
(Well, actually we don't strictly require even that, the two states don't
need to be identical, they just need to produce identical outputs when passed
to the final function)
* In CREATE AGGREGATE, we should also explain the NULL semantics you get
with various combinations of strict and non-strict forward and inverse
transition functions. I think a table listing all the combinations is in order
there, with a column explaining the semantics you get. We should also clearly
state that once you provide an inverse transition function, NULL isn't a valid
state value anymore, except as an initial state, i.e. that the forward transition
function must never return NULL in this case.
* The window function page should explain the performance hazards when
a moving frame head is involved. Ideally, we'd list which aggregates never
have to restart, and which sometimes might, and what you can do to avoid that.
We can also tell people to use EXPLAIN VERBOSE ANALYZE to check whether
restarts are occurring. This is were we'd tell people to cast their numeric
operands to a type with defined scale to avoid restarts, and were we'd state
the SUM(float) *does* restart always due to precision loss issues.
BTW, something that we haven't addressed at all is how inverse transition
functions interact with what is called "ordered-set aggregates". I haven't
wrapped my head around these fully, I think, so I'm still not sure if there's
anything to do there or not. Can those even be used as window functions?
Should we forbid ordered-set aggregates from specifying an inverse transition
function?
best regards,
Florian Pflug
From | Date | Subject | |
---|---|---|---|
Next Message | Florian Pflug | 2014-01-20 14:25:22 | Re: plpgsql.warn_shadow |
Previous Message | Simon Riggs | 2014-01-20 14:14:02 | Re: ALTER TABLESPACE ... MOVE ALL TO ... |