Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Date: 2014-03-05 14:35:53
Message-ID: 3152D8B7-1E8C-4422-AAA6-37A20EB11FC7@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mar4, 2014, at 21:09 , Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
> On 3 March 2014 23:00, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>>> * In show_windowagg_info(), this calculation looks suspicious to me:
>>>
>>> double tperrow = winaggstate->aggfwdtrans /
>>> (inst->nloops * inst->ntuples);
>>>
>>> If the node is executed multiple times, aggfwdtrans will be reset in
>>> each loop, so the transitions per row figure will be under-estimated.
>>> ISTM that if you want to report on this, you'd need aggfwdtrans to be
>>> reset once per query, but I'm not sure exactly how to do that.
>>>
>>> ...
>>>
>>> Actually, I think it's misleading to only count forward transition
>>> function calls, because a call to the inverse transition function
>>> still represents a state transition, and is likely to be around the
>>> same cost. For a window of size 2, there would not be much advantage
>>> to using inverse transition functions, because it would be around 2
>>> transitions per row either way.
>>
>> True. In fact, I pondered whether to avoid using the inverse transition
>> function for windows of 2 rows. In the end, I didn't because I felt that
>> it makes custom aggregates harder to test.
>>
>> On the question of whether to count inverse transition function calls -
>> the idea of the EXPLAIN VERBOSE ANALYZE output isn't really to show the
>> number of state transitions, but rather to show whether the aggregation
>> has O(n) or O(n^2) behaviour. The idea being that a value close to "1"
>> means "inverse transition function works as expected", and larger values
>> mean "not working so well".
>>
>> Regarding multiple evaluations - I think I based the behaviour on how
>> ntuples works, which also only reports the value of the last evaluation
>> I think. But maybe I'm confused about this.
>>
>
> No, it doesn't look like that's correct for multiple loops. Consider
> this example:
>
> ...
>
> It turns out that show_windowagg_info() is only called once at the
> end, with ntuples=100, nloops=4 and aggfwdtrans=100, so it's computing
> tperrow=100/(4*100)=0.25, which then gets truncated to 0.2. So to get
> 1, you'd have to use this formula:
>
> double tperrow = winaggstate->aggfwdtrans / inst->ntuples;

Hm, so I *was* confused - seems I mixed up ntuples (which counts the
total number of tuples over all loops) with what we report as "rows"
(i.e. the average number of rows per loop). Thanks for clearing that up!

> I'm still not convinced that's the most useful thing to report though.
> Personally, I'd prefer to just see the separate counts, e.g.:
>
> -> WindowAgg (cost=0.00..22.50 rows=1000 width=4) (actual
> time=0.019..0.094 rows=25 loops=4)
> Output: sum(i.i) OVER (?)
> Forward transitions: 25 Inverse transitions: 25
>
> IMO that gives a clearer picture of what's going on.

My problem with this is that if there are multiple aggregates, the
numbers don't really count the number of forward and reverse transfer
function invocations. What nodeWindowAgg.c really counts is the number
of *rows* it has to fetch from the tuple store to perform forward
aggregations - the relevant code snippet is

if (numaggs_restart > 0)
winstate->aggfwdtrans += (winstate->aggregatedupto
- winstate->frameheadpos);
else
winstate->aggfwdtrans += (winstate->aggregatedupto
- aggregatedupto_nonrestarted);

Now we could of course report these figures per aggregate instead of
only once per aggregation node. But as I said earlier, my guess is that
people usually won't be interested in the precise counts - most likely,
what you want to know is "how much do the restarts cost me"

When I added the EXPLAIN stuff, I initially simply reported the number
of times nodeWindowAgg has to restart the aggregation. The problem with
that approach is that not all restarts carry the same cost. If the frames
either don't overlap at all or are identical, restarts don't cause any
additional work. And for fixed-length frames (BETWEEN n PRECEEDING AND
m FOLLOWING), the performance effects of restarts depends on m-n.

Which is why I made it count the number of aggregated input rows instead.

Having said that, I' not really 100% happy with the name
"Transitions Per Row" for this - it was simply the best I could come up with
that was reasonably short. And I'm certainly open to reporting the absolute
count instead of a factor relative to ntuples.

If we made it an absolute count, would calling this "Aggregated Rows" work
for you?

best regards,
Florian Pflug

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ian Lawrence Barwick 2014-03-05 14:37:18 Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode
Previous Message Andrew Dunstan 2014-03-05 14:27:56 Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode