Re: Partial aggregates pushdown

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)mitsubishielectric(dot)co(dot)jp>
Cc: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Ilya Gladyshev <i(dot)gladyshev(at)postgrespro(dot)ru>
Subject: Re: Partial aggregates pushdown
Date: 2023-06-20 00:42:45
Message-ID: ZJD2BWVqRlJgz3n6@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 13, 2023 at 02:18:15AM +0000, Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp wrote:
> Hi Mr.Momjian.
>
> Thank you for advises.
>
> > From: Bruce Momjian <bruce(at)momjian(dot)us>
> > Sent: Monday, June 12, 2023 10:38 PM
> > > I understand what the problem is. I will put a mechanism maintaining compatibility into the patch.
> > > I believe there are three approaches.
> > > Approach 1-1 is preferable because it does not require additional options for postgres_fdw.
> > > I will revise the patch according to Approach 1-1, unless otherwise commented.
> > >
> > > Approach1:
> > > I ensure that postgres_fdw retrieves the version of each remote server
> > > and does not partial aggregate pushd down if the server version is less than 17.
> > > There are two approaches to obtaining remote server versions.
> > > Approach1-1: postgres_fdw connects a remote server and use PQserverVersion().
> > > Approach1-2: Adding a postgres_fdw option about a remote server version (like "server_version").
> > >
> > > Approach2:
> > > Adding a postgres_fdw option for partial aggregate pushdown is enable
> > > or not (like enable_partial_aggregate_pushdown).
> >
> > These are good questions. Adding a postgres_fdw option called enable_partial_aggregate_pushdown helps make the
> > purpose of the option clear, but remote_version can be used for future breakage as well.
> >
> > I think remote_version is the best idea, and in the documentation for the option, let's explicitly say it is useful to disable
> > partial aggregates pushdown on pre-PG 17 servers. If we need to use the option for other cases, we can just update the
> > documentation. When the option is blank, the default, everything is pushed down.
> >
> > I see remote_version a logical addition to match our "extensions" option that controls what extension functions can be
> > pushed down.
>
> Thank you for your perspective.
> So, of the approaches I have presented, you think that approach 1-2 is
> preferable and that the option name remote_server is preferable?
> Indeed, the option of a remote version may have other uses.
> However, this information can be obtained by connecting to a remote server,
> I'm concerned that some people may find this option redundant.
>
> Is the problem with approach 1-1 because the user cannot decide whether to include the compatibility check in the decision to do partial aggregate pushdown or not?
> # If Approach 1-1 is taken, the problem is that this feature cannot be used for all bait-in aggregate functions
> # when the remote server is older than PG17.
> If so, Approache1-3 below seem more desirable.
> Would it be possible for us to hear your thoughts?
>
> Approache1-3:We add a postgres_fdw option about a compatibility check for partial aggregate pushdown
> (like "enable_aggpartialfunc_compatibility_check"). This option is false, the default.
> When this option is true, postgres_fdw obtains the remote server version by connecting the remote server and using PQserverVersion().
> And if the remote server version is older than PG17, then the partial aggregate pushdown feature is enable for all buit-in aggregate functions.
> Otherwise the partial aggregate pushdown feature is disable for all buit-in aggregate functions.

Apologies for the delay in my reply to this email. I looked into the
existing code and I found three things:

1) PQserverVersion() just pulls the conn->sversion value from the
existing connection because pqSaveParameterStatus() pulls the
server_version sent by the backend; no need to issue SELECT version().

2) postgres_fdw already has nine calls to GetConnection(), and only
opens a connection if it already doesn't have one. Here is an example:

/* Get the remote estimate */
conn = GetConnection(fpinfo->user, false, NULL);
get_remote_estimate(sql.data, conn, &rows, &width,
&startup_cost, &total_cost);
ReleaseConnection(conn);

Therefore, it seems like it would be near-zero cost to just call conn =
GetConnection() and then PQserverVersion(conn), and ReleaseConnection().
You can then use the return value of PQserverVersion() to determine if
you can push down partial aggregates.

3) Looking at postgresAcquireSampleRowsFunc(), I see this exact method
used:

conn = GetConnection(user, false, NULL);

/* We'll need server version, so fetch it now. */
server_version_num = PQserverVersion(conn);

...

if ((server_version_num < 95000) &&
(method == ANALYZE_SAMPLE_SYSTEM ||
method == ANALYZE_SAMPLE_BERNOULLI))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("remote server does not support TABLESAMPLE feature")));

I am sorry if you already knew all this, but I didn't.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2023-06-20 00:48:00 Re: Document that server will start even if it's unable to open some TCP/IP ports
Previous Message Tomas Vondra 2023-06-20 00:04:27 Re: Do we want a hashset type?