RE: Partial aggregates pushdown

From: "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp>
To: Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, vignesh C <vignesh21(at)gmail(dot)com>, Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp>
Subject: RE: Partial aggregates pushdown
Date: 2024-06-23 08:23:32
Message-ID: TY2PR01MB383585CACD74F2106A0563C195CB2@TY2PR01MB3835.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi. Jelte, hackers.

Sorry for the late response.
Thank you for detailed and useful comments.

On Wed, Jun 12, 2024 at 5:02 PM Jelte Fennema-Nio <postgres(at)jeltef(dot)nl> wrote:
>
> On Wed, 12 Jun 2024 at 07:27, Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp
> <Fujii(dot)Yuki(at)df(dot)mitsubishielectric(dot)co(dot)jp> wrote:
> > Could you please clarify what you mean?
> > Are you referring to:
> > Option 1: Modifying existing aggregate functions to minimize the use of internal state values.
> > Option 2: Not supporting the push down of partial aggregates for functions with internal state values.
>
> Basically I mean both Option 1 and Option 2 together. i.e. once we do
> option 1, supporting partial aggregate pushdown for all important
> aggregates with internal state values, then supporting pushdown of
> internal state values becomes unnecessary.
Understood.
However, there are points that I agree with and others that I don't agree with.

I will show my opinion using one aggregate function avg(int8), whose transtype is internal.

I agree that, in general, any remote server should transmit the state value to the local server using a format whose data type is a native data type and is not serialized, whenever possible. I call this format standard format along with [1].
For avg(int8), I think that it is rational that any remote server transmit the state value to the local server using the format whose data type is _numeric of count and sum.
Before your advice, I plan to use the standard format whose data type is text, like "count=5 sum=13.3". But now, I think that it is rational to use _numeric.
I attached the POC patch(the second one) which supports avg(int8) whose standard format is _numeric type.

However, I do not agree that I modify the internal transtype to the native data type. The reasons are the following three.
1. Generality
I believe we should develop a method that can theoretically apply to any aggregate function, even if we cannot implement it immediately. However, I find it exceptionally challenging to demonstrate that any internal transtype can be universally converted to a native data type for aggregate functions that are parallel-safe under the current parallel query feature. Specifically, proving this for user-defined aggregate functions presents a significant difficulty in my view.
On the other hand, I think that the usage of export and import functions can theoretically apply to any aggregate functions.

2. Amount of codes.
It could need more codes.

3. Concern about performance
I'm concerned that altering the current internal data types could impact performance.

> So I agree it's probably more code than your current approach. At the
> very least because you would need to implement in/out text
> serialization functions for these internal types that currently don't
> have them. But I do think it would be quite a feasible amount. And to
> clarify, I see a few benefits of using the approach that I'm
> proposing:
>
> 1. So far aggserialfn and aggdeserialfn haven't been required to be
> network safe at all. In theory extensions might reference shared
> memory pointers with in them, that are valid for serialization within
> the same postgres process tree but not outside of it. Or they might
> serialize to bytes in a way that does not work across different
> bigendian/littleendian systems, thus causing wrong aggregation
> results. Never sending results of serialfn over the network solves
> that issue.
I know. In my proposal, the standard format is not seriarized data by serialfn, instead, is text or other native data type.
Just to clarify, I'm writing this to avoid any potential misunderstanding.

> 2. Partial aggregate pushdown across different postgres version could
> be made to work by using the in/out functions instead of receive/send
> functions, to use the text based serialization format (which should be
> stable across versions)
Thank you for your advice. I agree that, as mentioned earlier, standard formats should generally use native data types whenever possible.

> 3. It seems nice to be able to get the text representation of all
> PARTIAL_AGGREGATE output for debugging purposes. With your approach I
> think what currently happens is that it will show a bytea for when
> using PARTIAL_AGGREGATE for avg(bigint) directly from psql.
I may have caused some misunderstanding.
To clarify and prevent any potential misunderstanding: In my proposal, the standard format does not involve serialized data by serialfn; rather, it utilizes text or other native data types.

> 4. In my experience it's easier to get patches merged if they don't
> change a lot at once and are useful by themselves. This way you could
> split your current patch up into multiple smaller patches, each of
> which could be merged separately (appart from b relying on a).
> a. Introduce PARTIAL_AGGREGATE syntax for non-internal & non-pseudo types
> b. Start using PARTIAL_AGGREGATE for FDW pushdown
> c. Convert NumericAggState to non-internal
> d. Convert PolyNumAggState to non-internal
> e. Use non-internal for string_agg_serialize aggregates
> f. Use non-internal for array_agg_serialize
> g. Use non-internal for array_agg_array_serialize
I understand. Thank you for advice.
Basically responding to your advice,
for now, I prepare two POC patches.
The first supports case a, currently covering only avg(int4) and other aggregate functions that do not require import or export functions, such as min, max, and count.
The second supports case b and commonly used functions like sum and avg. Currently, it only includes avg(int8).

Best regards, Yuki Fujii
--
Yuki Fujii
Information Technology R&D Center, Mitsubishi Electric Corporation

[1] https://www.postgresql.org/message-id/attachment/160659/PGConfDev2024_Presentation_Aggregation_Scaleout_FDW_Sharding_20240531.pdf

Attachment Content-Type Size
0001-POC-Partial-aggregate-pushdown-notinternal-transtype.patch application/octet-stream 120.5 KB
0002-POC-Partial-aggregate-pushdown-internal-transtype.patch application/octet-stream 40.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii.Yuki@df.MitsubishiElectric.co.jp 2024-06-23 08:25:42 RE: Partial aggregates pushdown
Previous Message Joel Jacobson 2024-06-23 07:00:29 Re: Optimize numeric.c mul_var() using the Karatsuba algorithm