Re: [PATCH] Negative Transition Aggregate Functions (WIP)

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Date: 2014-04-11 00:27:37
Message-ID: F472ABEE-7DCA-4604-9EDC-DCA673EB9FF0@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Apr11, 2014, at 01:30 , Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Florian Pflug <fgp(at)phlo(dot)org> writes:
>> As for evidence - have you looked at the patch I posted? I'd be very
>> interested to know if it removes the performance differences you saw.
>
> (1) You can't really prove the absence of a performance issue by showing
> that one specific aggregate doesn't have an issue.

I'm claiming that SUM(int4) is about as simple as it gets, so if the
effect can be mitigated there, it can be mitigated everywhere. The more
complex a forward-only transition function is, the less will and added if
or two hurt.

> (2) These results
> (mine as well as yours) seem mighty platform-dependent, so the fact you
> don't see an issue doesn't mean someone else won't.

Yeah, though *any* change - mine, the one your propose, and any other -
has the potential to hurt some platform due to weird interactions (say,
cache trashing).

> (3) A new
> FunctionCallInfoData field just for this? Surely not. There's got to be
> a distributed cost to that (although I notice you didn't bother
> initializing the field most places, which is cheating).

I think the field doesn't actually increase the size of the structure at
all - at least not if the bool before it has size 1 and the short following
it is 2-byte aligned. Or at least that was why I made it a char, and added
it at the place I did. But I might be missing something...

Also, InitFunctionCallInfoData *does* initialize the flags to zero. Though
maybe not everybody uses that - I didn't check, this was just a quick hack.

> Now point 3 could be addressed by doing the signaling in some other way
> with the existing context field. But it's still the case that you're
> trying to band-aid a bad design. There's no good reason to make the sfunc
> do extra work to be invertible in contexts where we know, with certainty,
> that that work is useless.

This is the principal point where we disagree, I think. From my POV, the
problem isn't invertibility here at all. Heck, SUM(int4) wouldn't need
*any* extra state at all to be invertible, if it weren't for those pesky
issues surrounding NULL handling. In fact, an earlier version of the
invtrans patch *did* use int4_sum as SUM(int4)'s transfer functions! The
only reason this doesn't work nowadays is that Dean didn't like the
forward-nonstrict-but-inverse-strict special case that made this work.

The way I see it, the main problem is the drop in performance that comes
from using a pass-by-ref state type. Which IMHO happens because this
*already* is a heavily band-aided design - all the state validation and
"if (AggCheckCallContext(,NULL))" stuff really works around the fact that
transition functions *aren't* supposed to be user-called, yet they are,
and must deal.

Which is why I feel that having two separate sets of transition functions
and state types solves the wrong problem. If we find a way to prevent
transition functions from being called directly, we'll shave a few cycles
of quite a few existing aggregates, invertible or not. If we find a way
around the initfunc-for-internal-statetype problem you discovered, the
implementation would get much simpler, since we could then make nearly
all of them strict. And this would again shave off a few cycles - for lots
of NULL inputs, the effect could even be large.

Compared to that, the benefit of having a completely separate set of
transition functions and state types for the invertible case is much
less tangible, IMHO.

> Especially not when we know that even a few instructions of extra work
> can be performance-significant.

But where do we draw that line? nodeWindowAgg.c quite certainly wastes
about as many cycles as int4_avg_accum does on various checks that are
unnecessary unless in the non-sliding window case. Do we duplicate those
functions too?

best regards,
Florian Pflug

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sachin D. Kotwal 2014-04-11 01:38:27 Re: WAL replay bugs
Previous Message Bruce Momjian 2014-04-11 00:05:11 Re: psql \d+ and oid display