Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Date: 2014-03-05 17:27:51
Message-ID: CAEZATCVrQY8DjBaKFvi_7JywEVta_1Sg+8OTPryN6crCe3vVfA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 5 March 2014 14:35, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> On Mar4, 2014, at 21:09 , Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>> On 3 March 2014 23:00, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>>>> * In show_windowagg_info(), this calculation looks suspicious to me:
>>>>
>>>> double tperrow = winaggstate->aggfwdtrans /
>>>> (inst->nloops * inst->ntuples);
>>>>
>>>> If the node is executed multiple times, aggfwdtrans will be reset in
>>>> each loop, so the transitions per row figure will be under-estimated.
>>>> ISTM that if you want to report on this, you'd need aggfwdtrans to be
>>>> reset once per query, but I'm not sure exactly how to do that.
>>>>
>>>> ...
>>>>
>>>> Actually, I think it's misleading to only count forward transition
>>>> function calls, because a call to the inverse transition function
>>>> still represents a state transition, and is likely to be around the
>>>> same cost. For a window of size 2, there would not be much advantage
>>>> to using inverse transition functions, because it would be around 2
>>>> transitions per row either way.
>>>
>>> True. In fact, I pondered whether to avoid using the inverse transition
>>> function for windows of 2 rows. In the end, I didn't because I felt that
>>> it makes custom aggregates harder to test.
>>>
>>> On the question of whether to count inverse transition function calls -
>>> the idea of the EXPLAIN VERBOSE ANALYZE output isn't really to show the
>>> number of state transitions, but rather to show whether the aggregation
>>> has O(n) or O(n^2) behaviour. The idea being that a value close to "1"
>>> means "inverse transition function works as expected", and larger values
>>> mean "not working so well".
>>>
>>> Regarding multiple evaluations - I think I based the behaviour on how
>>> ntuples works, which also only reports the value of the last evaluation
>>> I think. But maybe I'm confused about this.
>>>
>>
>> No, it doesn't look like that's correct for multiple loops. Consider
>> this example:
>>
>> ...
>>
>> It turns out that show_windowagg_info() is only called once at the
>> end, with ntuples=100, nloops=4 and aggfwdtrans=100, so it's computing
>> tperrow=100/(4*100)=0.25, which then gets truncated to 0.2. So to get
>> 1, you'd have to use this formula:
>>
>> double tperrow = winaggstate->aggfwdtrans / inst->ntuples;
>
> Hm, so I *was* confused - seems I mixed up ntuples (which counts the
> total number of tuples over all loops) with what we report as "rows"
> (i.e. the average number of rows per loop). Thanks for clearing that up!
>
>> I'm still not convinced that's the most useful thing to report though.
>> Personally, I'd prefer to just see the separate counts, e.g.:
>>
>> -> WindowAgg (cost=0.00..22.50 rows=1000 width=4) (actual
>> time=0.019..0.094 rows=25 loops=4)
>> Output: sum(i.i) OVER (?)
>> Forward transitions: 25 Inverse transitions: 25
>>
>> IMO that gives a clearer picture of what's going on.
>
> My problem with this is that if there are multiple aggregates, the
> numbers don't really count the number of forward and reverse transfer
> function invocations. What nodeWindowAgg.c really counts is the number
> of *rows* it has to fetch from the tuple store to perform forward
> aggregations - the relevant code snippet is
>
> if (numaggs_restart > 0)
> winstate->aggfwdtrans += (winstate->aggregatedupto
> - winstate->frameheadpos);
> else
> winstate->aggfwdtrans += (winstate->aggregatedupto
> - aggregatedupto_nonrestarted);
>
> Now we could of course report these figures per aggregate instead of
> only once per aggregation node. But as I said earlier, my guess is that
> people usually won't be interested in the precise counts - most likely,
> what you want to know is "how much do the restarts cost me"
>

The problem I have with the single "Transitions Per Row" figure, and
the idea that a value close to 1.0 is supposed to be good, is that
it's not really true. For example, with a window size of 2 and a
"perfect" invertible aggregate, you'd get a value of 1.0, but with a
non-invertible aggregate you'd get a value of 2.0, when actually it
wouldn't do any better if it had an inverse. I think trying to
represent this as a single number is too simplistic.

> When I added the EXPLAIN stuff, I initially simply reported the number
> of times nodeWindowAgg has to restart the aggregation. The problem with
> that approach is that not all restarts carry the same cost. If the frames
> either don't overlap at all or are identical, restarts don't cause any
> additional work. And for fixed-length frames (BETWEEN n PRECEEDING AND
> m FOLLOWING), the performance effects of restarts depends on m-n.
>
> Which is why I made it count the number of aggregated input rows instead.
>
> Having said that, I' not really 100% happy with the name
> "Transitions Per Row" for this - it was simply the best I could come up with
> that was reasonably short. And I'm certainly open to reporting the absolute
> count instead of a factor relative to ntuples.
>
> If we made it an absolute count, would calling this "Aggregated Rows" work
> for you?
>

I'm not sure about naming, but I think my preference would be to
output the correct absolute counts for both the forward and inverse
transitions (i.e. multiply by the right combinations of numaggs and
numaggs_restart). The EXPLAIN output already tells you how many
aggregates there are, and how many rows there are, so you'd be able to
work out from that how much extra work it's doing.

I think we really need a larger consensus on this though, so I'd be
interested to hear what others think.

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2014-03-05 17:29:49 Re: jsonb and nested hstore
Previous Message Robert Haas 2014-03-05 17:26:13 Re: jsonb and nested hstore