Quick Links

Re: Partial aggregates pushdown

From:	Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
To:	"Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)mitsubishielectric(dot)co(dot)jp>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, vignesh C <vignesh21(at)gmail(dot)com>, Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
Subject:	Re: Partial aggregates pushdown
Date:	2024-06-12 08:02:17
Message-ID:	CAGECzQSXXKVCQ13jHXtmk=PMsJHhFGb78PoT3JiT4eAcoLXxoA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, 12 Jun 2024 at 07:27, Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp
<Fujii(dot)Yuki(at)df(dot)mitsubishielectric(dot)co(dot)jp> wrote:
> Could you please clarify what you mean?
> Are you referring to:
> Option 1: Modifying existing aggregate functions to minimize the use of internal state values.
> Option 2: Not supporting the push down of partial aggregates for functions with internal state values.

Basically I mean both Option 1 and Option 2 together. i.e. once we do
option 1, supporting partial aggregate pushdown for all important
aggregates with internal state values, then supporting pushdown of
internal state values becomes unnecessary.

> There are many aggregate functions with internal state values, so if we go with Option 1,
> we might need to change a lot of existing code, like transition functions and finalize functions.
> Also, I'm not sure how many existing aggregate functions can be modified this way.

There are indeed 57 aggregate functions with internal state values:

> SELECT count(*) from pg_aggregate where aggtranstype = 'internal'::regtype;
count
───────
57
(1 row)

But there are only 26 different aggtransfns. And the most used 8 of
those cover 39 of those 57 aggregates.

> SELECT
sum (count) OVER(ORDER BY count desc, aggtransfn ROWS BETWEEN
unbounded preceding and current row) AS cumulative_count
, *
FROM (
SELECT
count(*),
aggtransfn
from pg_aggregate
where aggtranstype = 'internal'::regtype
group by aggtransfn
order by count(*) desc, aggtransfn
);
cumulative_count │ count │ aggtransfn
──────────────────┼───────┼────────────────────────────────────────
7 │ 7 │ ordered_set_transition
13 │ 6 │ numeric_accum
19 │ 6 │ int2_accum
25 │ 6 │ int4_accum
31 │ 6 │ int8_accum
35 │ 4 │ ordered_set_transition_multi
37 │ 2 │ int8_avg_accum
39 │ 2 │ numeric_avg_accum
40 │ 1 │ array_agg_transfn
41 │ 1 │ json_agg_transfn
42 │ 1 │ json_object_agg_transfn
43 │ 1 │ jsonb_agg_transfn
44 │ 1 │ jsonb_object_agg_transfn
45 │ 1 │ string_agg_transfn
46 │ 1 │ bytea_string_agg_transfn
47 │ 1 │ array_agg_array_transfn
48 │ 1 │ range_agg_transfn
49 │ 1 │ multirange_agg_transfn
50 │ 1 │ json_agg_strict_transfn
51 │ 1 │ json_object_agg_strict_transfn
52 │ 1 │ json_object_agg_unique_transfn
53 │ 1 │ json_object_agg_unique_strict_transfn
54 │ 1 │ jsonb_agg_strict_transfn
55 │ 1 │ jsonb_object_agg_strict_transfn
56 │ 1 │ jsonb_object_agg_unique_transfn
57 │ 1 │ jsonb_object_agg_unique_strict_transfn
(26 rows)

And actually most of those don't have a serialfn, so they wouldn't be
supported by your suggested approach either. Looking at the
distribution of aggserialfns instead we see the following:

> SELECT
sum (count) OVER(ORDER BY count desc, aggserialfn ROWS BETWEEN
unbounded preceding and current row) AS cumulative_count
, *
FROM (
SELECT
count(*),
aggserialfn
from pg_aggregate
where
aggtranstype = 'internal'::regtype
AND aggserialfn != 0
group by aggserialfn
order by count(*) desc, aggserialfn
);
cumulative_count │ count │ aggserialfn
──────────────────┼───────┼───────────────────────────
12 │ 12 │ numeric_serialize
24 │ 12 │ numeric_poly_serialize
26 │ 2 │ numeric_avg_serialize
28 │ 2 │ int8_avg_serialize
30 │ 2 │ string_agg_serialize
31 │ 1 │ array_agg_serialize
32 │ 1 │ array_agg_array_serialize
(7 rows)

So there are only 7 aggserialfns, and thus at most 7 new postgres
types that you would need to create to support the same aggregates as
in your current proposal. But looking at the implementations of these
serialize functions even that is an over-estimation: numeric_serialize
and numeric_avg_serialize both serialize a NumericAggState, and
numeric_poly_serialize and int8_avg_serialize both serialize a
PolyNumAggState. So probably a we could even do with only 5 types. And
to be clear: only converting PolyNumAggState and NumericAggState to
actual postgres types would already cover 28 out of the 32 aggregates.
That seems quite feasible to do.

So I agree it's probably more code than your current approach. At the
very least because you would need to implement in/out text
serialization functions for these internal types that currently don't
have them. But I do think it would be quite a feasible amount. And to
clarify, I see a few benefits of using the approach that I'm
proposing:

1. So far aggserialfn and aggdeserialfn haven't been required to be
network safe at all. In theory extensions might reference shared
memory pointers with in them, that are valid for serialization within
the same postgres process tree but not outside of it. Or they might
serialize to bytes in a way that does not work across different
bigendian/littleendian systems, thus causing wrong aggregation
results. Never sending results of serialfn over the network solves
that issue.
2. Partial aggregate pushdown across different postgres version could
be made to work by using the in/out functions instead of receive/send
functions, to use the text based serialization format (which should be
stable across versions)
3. It seems nice to be able to get the text representation of all
PARTIAL_AGGREGATE output for debugging purposes. With your approach I
think what currently happens is that it will show a bytea for when
using PARTIAL_AGGREGATE for avg(bigint) directly from psql.
4. In my experience it's easier to get patches merged if they don't
change a lot at once and are useful by themselves. This way you could
split your current patch up into multiple smaller patches, each of
which could be merged separately (appart from b relying on a).
a. Introduce PARTIAL_AGGREGATE syntax for non-internal & non-pseudo types
b. Start using PARTIAL_AGGREGATE for FDW pushdown
c. Convert NumericAggState to non-internal
d. Convert PolyNumAggState to non-internal
e. Use non-internal for string_agg_serialize aggregates
f. Use non-internal for array_agg_serialize
g. Use non-internal for array_agg_array_serialize

P.S. The problem described in benefit 1 could also be solved in your
approach by adding a boolean opt in flag to CREATE AGGREGATE. e.g.
CREATE AGGREGATE ( ..., NETWORKSAFESERIALFUNCS = true)

In response to

RE: Partial aggregates pushdown at 2024-06-12 05:27:02 from Fujii.Yuki@df.MitsubishiElectric.co.jp

Responses

RE: Partial aggregates pushdown at 2024-06-23 08:23:32 from Fujii.Yuki@df.MitsubishiElectric.co.jp

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Kartyshov Ivan	2024-06-12 08:36:05	Re: [HACKERS] make async slave to wait for lsn to be replayed
Previous Message	Alexander Kukushkin	2024-06-12 07:11:03	Re: Addressing SECURITY DEFINER Function Vulnerabilities in PostgreSQL Extensions