Re: Partial aggregates pushdown

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Partial aggregates pushdown
Date: 2024-08-21 15:41:02
Message-ID: 982a903e-cb59-4b60-911a-bdbb88da594e@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 8/8/24 13:48, Jelte Fennema-Nio wrote:
> SUMMARY OF THREAD
>
> The design of patch 0001 is agreed upon by everyone on the thread (so
> far). This adds the PARTIAL_AGGREGATE label for aggregates, which will
> cause the finalfunc not to run. It also starts using PARTIAL_AGGREGATE
> for pushdown of aggregates in postgres_fdw. In 0001 PARTIAL_AGGREGATE
> is only supported for aggregates with a non-internal/pseudo type as
> the stype.
>

I don't have a strong opinion on this, but I wonder if someone might
object this essentially extends the syntax with something that is not
(and never will be) in the SQL standard. I wonder if there's some
precedent for encoding such explicit execution instructions into the
query itself?

That reminds me - the PARTIAL_AGGREGATE label is per aggregate, but I
don't think it makes much sense to do this only for some aggregates,
right? Do we have a way to make sure the query is "consistent"? I'm not
sure if doing this on the source (before pushdown) is enough. Could
there be a difference in what the remote instance supports?

The only alternative that I can think of (and that I believe was already
mentioned in this thread) is to set some GUC that forces the top-most
query level to do this (all aggregates at that level). That's have the
benefit of always affecting all aggregates.

> The design for patch 0002 is still under debate. This would expand on
> the functionality added by adding support for PARTIAL_AGGREGATE for
> aggregates with an internal stype. This is done by returning a byte
> array containing the bytes that the serialfunc of the aggregate
> returns.
>
> A competing proposal for 0002 is to instead change aggregates to not
> use an internal stype anymore, and create dedicated types. The main
> downside here is that infunc and outfunc would need to be added for
> text serialization, in addition to the binary serialization. An open
> question is: Can we change the requirements for CREATE TYPE, so that
> types can be created without infunc and outfunc.
>

I think it's +0.5 for the new dedicated data types from me.

I admit I'm too lazy to read the whole thread from scratch, but I
believe we did discuss the possibility to reuse the serial/deserial
functions we already have, but the reason against that was the missing
cross-version stability. Parallel queries always run within a single
instance, hence there are no concerns about other versions. But this is
meant to support the remote node having a wildly different version.

I guess we might introduce another pair of serial/deserial functions,
with this guarantee. I know we (me and Jelte) discussed that in person
at some point, and there were arguments for doing the data types. But I
managed to forget the details :-(

> WHAT IS NEEDED?
>
> The things needed for this patch are that docs need to be added, and
> detailed codereview needs to be done.

Yeah, I think the docs are must-have for a proper review.

> Feedback from more people on the two competing proposals for 0002
> would be very helpful in making a decision.
>

regards

--
Tomas Vondra

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2024-08-21 16:31:49 Re: optimize hashjoin
Previous Message Nathan Bossart 2024-08-21 15:37:25 Re: Remove dependence on integer wrapping