Re: Partial aggregates pushdown

From: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
To: "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)mitsubishielectric(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: Partial aggregates pushdown
Date: 2024-05-28 05:57:39
Message-ID: fa5f3f5548330e512ae151a5ac79091f@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp писал(а) 2024-05-28 00:30:
> Hi Mr. Pyhalov.
>
> Sorry for the late reply.
> Thank you for your modification and detailed review.
> I attach a fixed patch, have been not yet rebased.
>
> Monday, 25 March 2024 16:01 Alexander Pyhalov
> <a(dot)pyhalov(at)postgrespro(dot)ru>:.
>> Comment in nodeAgg.c seems to be strange:
>>
>> 1079 /*
>> 1080 * If the agg's finalfn is provided and PARTIAL_AGGREGATE
>> keyword is
>> 1081 * not specified, apply the agg's finalfn.
>> 1082 * If PARTIAL_AGGREGATE keyword is specified and the
>> transValue type
>> 1083 * is internal, apply the agg's serialfn. In this case, if
>> the agg's
>> 1084 * serialfn must not be invalid. Otherwise return
>> transValue.
>> 1085 */
>>
>> Likely, you mean:
>>
>> ... In this case the agg'ss serialfn must not be invalid...
> Fixed.
>
>> Lower, in the same file, please, correct error message:
>>
>> 1136 if(!OidIsValid(peragg->serialfn_oid))
>> 1137 elog(ERROR, "serialfunc is note provided
>> for partial aggregate");
>>
>> it should be "serialfunc is not provided for partial aggregate"
> Fixed.
>
>> Also something is wrong with the following test :
>>
>> SELECT /* aggregate <> partial aggregate */
>> array_agg(c_int4array), array_agg(b),
>> avg(b::int2), avg(b::int4), avg(b::int8), avg(c_interval),
>> avg(b::float4), avg(b::float8),
>> corr(b::float8, (b * b)::float8),
>> covar_pop(b::float8, (b * b)::float8),
>> covar_samp(b::float8, (b * b)::float8),
>> regr_avgx((2 * b)::float8, b::float8),
>> .....
>>
>> Its results have changed since last patch. Do they depend on daylight
>> saving time?
> You are right. In my environment, TimeZone is set to 'PST8PDT'
> with which timetz values depends on daylight saving time.
> Changed TimeZone to 'UTC' in this test.
>
>> You can see that filter is applied before append. The result is
>> correct
>> only by chance, as sum in every partition is actually < 700. If you
>> lower this bound, let's say, to 200, you'll start getting wrong
>> results
>> as data is filtered prior to aggregation.
>>
>> It seems, however, that in partial case you should just avoid pulling
>> conditions from having qual at all, all filters will be applied on
>> upper
>> level. Something like
> Thank you for your modification.
>
>> Found one more problem. You can fire partial aggregate over
>> partitioned
>> table, but convert_combining_aggrefs() will make non-partial copy,
>> which
>> leads to
>> 'variable not found in subplan target list' error.
> Thanks for the correction as well.
> As you pointed out,
> the original patch certainly had the potential to cause problems.
> However, I could not actually reproduce the problem in cases such as
> the following.
>
> Settings:
> t(c1, c2) is a patitioned table whose partition key is c1.
> t1, t2 are patitions of t and are partitioned table.
> t11, t12: partitions of t1 and foreign table of postgres_fdw.
> t21, t22: partitions of t2 and foreign table of postgres_fdw.
> Query:
> select c2 / 2, sum(c1) from t group by c2 / 2 order by 1
>
> If you have a reproducible example, I would like to add it to
> the regression test.
> Do you have a reproducible example?
>
>> Also denied partial agregates pushdown on server version mismatch.
>> Should check_partial_aggregate_support be 'true' by default?
> Could we discuss this point after we determine how to transfer state
> values?
> If we determine this point, we can easly determine whether
> check_partial_aggregate_support shold be 'true' by default.
>
>> I'm not sure what to do with current grammar - it precludes partial
>> distinct aggregates. I understand that it's currently impossible to
>> have
>> partial aggregation for distinct agregates -but does it worth to have
>> such restriction at grammar level?
> If partial aggregation for distinct agregates becomes possible in the
> future,
> I see no problem with the policy of accepting new SQL keywords,
> such as "PARTIL_AGGREGATE DISTINCT".

BTW, there's I have an issue with test results in the last version of
the patch. Attaching regression diffs.
I have partial sum over c_interval instead of sum(c_interval).

--
Best regards,
Alexander Pyhalov,
Postgres Professional

Attachment Content-Type Size
regression.diffs text/x-diff 43.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Pyhalov 2024-05-28 06:05:02 Re: CREATE INDEX CONCURRENTLY on partitioned index
Previous Message Alexander Pyhalov 2024-05-28 05:45:04 Re: Partial aggregates pushdown