RE: Partial aggregates pushdown

From: "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp>
To: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, vignesh C <vignesh21(at)gmail(dot)com>, "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp>
Subject: RE: Partial aggregates pushdown
Date: 2024-05-27 21:30:59
Message-ID: TYAPR01MB30884F8E4D94038CD661FFEF95F02@TYAPR01MB3088.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Mr. Pyhalov.

Sorry for the late reply.
Thank you for your modification and detailed review.
I attach a fixed patch, have been not yet rebased.

Monday, 25 March 2024 16:01 Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>:.
> Comment in nodeAgg.c seems to be strange:
>
> 1079 /*
> 1080 * If the agg's finalfn is provided and PARTIAL_AGGREGATE
> keyword is
> 1081 * not specified, apply the agg's finalfn.
> 1082 * If PARTIAL_AGGREGATE keyword is specified and the
> transValue type
> 1083 * is internal, apply the agg's serialfn. In this case, if
> the agg's
> 1084 * serialfn must not be invalid. Otherwise return
> transValue.
> 1085 */
>
> Likely, you mean:
>
> ... In this case the agg'ss serialfn must not be invalid...
Fixed.

> Lower, in the same file, please, correct error message:
>
> 1136 if(!OidIsValid(peragg->serialfn_oid))
> 1137 elog(ERROR, "serialfunc is note provided
> for partial aggregate");
>
> it should be "serialfunc is not provided for partial aggregate"
Fixed.

> Also something is wrong with the following test :
>
> SELECT /* aggregate <> partial aggregate */
> array_agg(c_int4array), array_agg(b),
> avg(b::int2), avg(b::int4), avg(b::int8), avg(c_interval),
> avg(b::float4), avg(b::float8),
> corr(b::float8, (b * b)::float8),
> covar_pop(b::float8, (b * b)::float8),
> covar_samp(b::float8, (b * b)::float8),
> regr_avgx((2 * b)::float8, b::float8),
> .....
>
> Its results have changed since last patch. Do they depend on daylight
> saving time?
You are right. In my environment, TimeZone is set to 'PST8PDT'
with which timetz values depends on daylight saving time.
Changed TimeZone to 'UTC' in this test.

> You can see that filter is applied before append. The result is correct
> only by chance, as sum in every partition is actually < 700. If you
> lower this bound, let's say, to 200, you'll start getting wrong results
> as data is filtered prior to aggregation.
>
> It seems, however, that in partial case you should just avoid pulling
> conditions from having qual at all, all filters will be applied on upper
> level. Something like
Thank you for your modification.

> Found one more problem. You can fire partial aggregate over partitioned
> table, but convert_combining_aggrefs() will make non-partial copy, which
> leads to
> 'variable not found in subplan target list' error.
Thanks for the correction as well.
As you pointed out,
the original patch certainly had the potential to cause problems.
However, I could not actually reproduce the problem in cases such as the following.

Settings:
t(c1, c2) is a patitioned table whose partition key is c1.
t1, t2 are patitions of t and are partitioned table.
t11, t12: partitions of t1 and foreign table of postgres_fdw.
t21, t22: partitions of t2 and foreign table of postgres_fdw.
Query:
select c2 / 2, sum(c1) from t group by c2 / 2 order by 1

If you have a reproducible example, I would like to add it to
the regression test.
Do you have a reproducible example?

> Also denied partial agregates pushdown on server version mismatch.
> Should check_partial_aggregate_support be 'true' by default?
Could we discuss this point after we determine how to transfer state values?
If we determine this point, we can easly determine whether check_partial_aggregate_support shold be 'true' by default.

> I'm not sure what to do with current grammar - it precludes partial
> distinct aggregates. I understand that it's currently impossible to have
> partial aggregation for distinct agregates -but does it worth to have
> such restriction at grammar level?
If partial aggregation for distinct agregates becomes possible in the future,
I see no problem with the policy of accepting new SQL keywords,
such as "PARTIL_AGGREGATE DISTINCT".

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation

Attachment Content-Type Size
0001-Partial-aggregates-push-down-SQL-keyword-v5.patch application/octet-stream 147.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tristan Partin 2024-05-27 21:32:46 Re: Comments on Custom RMGRs
Previous Message Ashutosh Bapat 2024-05-27 21:17:25 Re: apply_scanjoin_target_to_paths and partitionwise join