RE: Partial aggregates pushdown

From: "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp>
To: Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, vignesh C <vignesh21(at)gmail(dot)com>, Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp>
Subject: RE: Partial aggregates pushdown
Date: 2024-07-07 21:46:31
Message-ID: TY2PR01MB3835C0DC967E6958C4C8040995D92@TY2PR01MB3835.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Hi Jelte and hackers,

I've reconsidered which of the following two approaches is better.
Approach1: Adding export/import functions to transmit state values.
Approach2: Adding native types that represent state values.

In my view, Approach1 is superior. Therefore, if there are no objections this week, I plan to resume implementing Approach1 next week. I would appreciate it if anyone could discuss the topic with me or ask questions.

I believe that Approach1 can be extended to support situations where the local and remote major versions differ, whereas Approach2 cannot. Additionally, Approach1 seems to require fewer additional lines of code than Approach2. I'm also concerned that Approach2 may bloat the pg_type catalog.

Although Approach2 offers the benefit of avoiding the addition of columns to pg_aggregate, I think this benefit is smaller than the advantages of Approach1 mentioned above.

Next, I will present my complete comparison. The comparison points are as follows:
1. Extendability
2. Amount of code
3. Catalog size
4. Developer burden
5. Additional columns to catalogs

1. Extendability
I believe it is crucial to be able to support, in the future, scenarios where the local and remote major versions differ (see below).

https://www.postgresql.org/message-id/4012625.1701120204%40sss.pgh.pa.us

Regarding this aspect, I consider Approach1 superior to Approach2. The reasons are:
・The data type of an aggregate function's state value may change from one major version to the next.
・In Approach1, such situations can be handled by extending the export/import functions to include the major version in which the state value was created (refer to p.16 and p.17 of [1]).
・On the other hand, Approach2 appears to fundamentally lack the capability to support these scenarios.
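
As a rough illustration of the version-tagging idea (not the format proposed in the patch or in [1]), an export function could prefix the serialized state with the major version that produced it, and the import function could dispatch on that prefix. The sketch below uses Python only for brevity. The names export_avg_state/import_avg_state, the (count, sum) state layout, the version numbers 17 and 18, and the assumption that the older version stored the sum as a 64-bit integer are all hypothetical.

```python
import struct

CURRENT_MAJOR = 18  # hypothetical local major version


def export_avg_state(count, total, major=CURRENT_MAJOR):
    """Serialize a hypothetical avg() state, prefixed with the producing major version."""
    header = struct.pack("!H", major)  # 2-byte version tag, network byte order
    if major >= 18:
        # Assumed newer layout: count as int64, sum as float64.
        return header + struct.pack("!qd", count, float(total))
    # Assumed older layout: count and sum both as int64.
    return header + struct.pack("!qq", count, int(total))


def import_avg_state(payload):
    """Read the version tag first, then decode the layout that version used."""
    (major,) = struct.unpack_from("!H", payload, 0)
    if major >= 18:
        count, total = struct.unpack_from("!qd", payload, 2)
    else:
        count, total = struct.unpack_from("!qq", payload, 2)
        total = float(total)  # convert the old layout to the local one
    return count, total
```

The point of the sketch is only that the receiving side can tell which layout it is looking at before decoding it, which is what lets the two sides run different major versions.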

2. Amount of code
Regarding this aspect, I find Approach1 to be better than Approach2.
In Approach1, developers only need to add export/import functions and can use a standardized format for transmitting state values.
In Approach2, developers have two options:
Option1: Adding typinput/typoutput and typsend/typreceive.
Option2: Adding typinput/typoutput only.
Option1 requires more lines of code, which may be seen as cumbersome by some developers.
Option2 restricts developers to using only text representation for transmitting state values, which I consider limiting.
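
To make the difference between the two options concrete, here is a minimal sketch of the four I/O routines a state type would need under Option1; Option2 would keep only the first two. This is Python used as pseudocode for the idea, not PostgreSQL's type I/O API. The (count, sum) state layout and the function names state_out/state_in/state_send/state_recv are hypothetical.

```python
import struct


def state_out(state):
    """Text output (the role of typoutput): state -> string."""
    count, total = state
    return f"({count},{total!r})"


def state_in(text):
    """Text input (the role of typinput): string -> state."""
    count, total = text.strip("()").split(",")
    return int(count), float(total)


def state_send(state):
    """Binary output (the role of typsend): state -> bytes."""
    return struct.pack("!qd", *state)


def state_recv(buf):
    """Binary input (the role of typreceive): bytes -> state."""
    return struct.unpack("!qd", buf)
```

With only state_out/state_in available, every state value has to be formatted and parsed as text on each transfer; adding state_send/state_recv avoids that at the cost of two more functions per type.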

3. Catalog size
Regarding this point, I believe Approach1 is better than Approach2.
In Approach1, theoretically, it is necessary to add export/import functions to pg_proc for each aggregate.
In Approach2, theoretically, it is necessary to add typoutput/typinput functions (and typsend/typreceive if necessary) to pg_proc and add a native type to pg_type for each aggregate.
I would like to emphasize that we should consider user-defined functions in addition to built-in aggregate functions.
I think most developers prefer to avoid bloating catalogs, even if they may not be able to specify exact reasons.
In fact, in Robert's previous review, he expressed a similar concern (see below).

https://www.postgresql.org/message-id/CA%2BTgmobvja%2Bjytj5zcEcYgqzOaeJiqrrJxgqDf1q%3D3k8FepuWQ%40mail.gmail.com
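
The theoretical counts above can be written out as a small calculation. This is only a sketch of the arithmetic: it assumes the worst case in which every aggregate needs its own functions and its own state type, and it counts only the rows named in this mail. The function name added_catalog_rows is made up for the example.

```python
def added_catalog_rows(n_aggregates, approach, binary_io=False):
    """Rows added per catalog for n_aggregates aggregates (worst case, no sharing)."""
    if approach == 1:
        # One export and one import function per aggregate; no new types.
        return {"pg_proc": 2 * n_aggregates, "pg_type": 0}
    # Approach2: typinput/typoutput, plus typsend/typreceive if binary I/O is wanted,
    # and one native type per aggregate.
    io_funcs = 4 if binary_io else 2
    return {"pg_proc": io_funcs * n_aggregates, "pg_type": n_aggregates}
```

For example, for 100 aggregates Approach1 would add 200 pg_proc rows, while Approach2 would add 200 or 400 pg_proc rows plus 100 pg_type rows under these assumptions.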

4. Developer burden
Regarding this aspect, I believe Approach1 is better than Approach2.
In Approach1, developers have the following additional tasks:
Task1-1: Create and define export/import functions.

In Approach2, developers have the following additional tasks:
Task2-1: Create and define typoutput/typinput functions (and typsend/typreceive functions if necessary).
Task2-2: Define a native type.

Approach1 requires fewer additional tasks, although the difference may not be substantial.

5. Additional columns to catalogs
Regarding this aspect, Approach2 is better than Approach1.
Approach1 requires three additional columns in pg_aggregate: the aggpartialpushdownsafe flag, a reference to the export function, and a reference to the import function.
Approach2 does not require any additional columns in catalogs.
However, over the past four years of discussions, no one has expressed concerns about additional columns in catalogs.

[1] https://www.postgresql.org/message-id/attachment/160659/PGConfDev2024_Presentation_Aggregation_Scaleout_FDW_Sharding_20240531.pdf

Best regards, Yuki Fujii
--
Yuki Fujii
Information Technology R&D Center, Mitsubishi Electric Corporation
