Re: Partial aggregates pushdown

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
Cc: "Fujii(dot)Yuki(at)df(dot)MitsubishiElectric(dot)co(dot)jp" <Fujii(dot)Yuki(at)df(dot)mitsubishielectric(dot)co(dot)jp>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Ilya Gladyshev <i(dot)gladyshev(at)postgrespro(dot)ru>
Subject: Re: Partial aggregates pushdown
Date: 2023-04-13 16:01:59
Message-ID: CA+Tgmoam-sXZqhqESi1CyNG0RpSQdGLEspNudoXsOJYvUL4Gqg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 30, 2022 at 3:12 AM Alexander Pyhalov
<a(dot)pyhalov(at)postgrespro(dot)ru> wrote:
> 1) In previous version of the patch aggregates, which had partialaggfn,
> were ok to push down. And it was a definite sign that aggregate can be
> pushed down. Now we allow pushing down an aggregate, which prorettype is
> not internal and aggfinalfn is not defined. Is it safe for all
> user-defined (or builtin) aggregates, even if they are generally
> shippable?

I think that this is exactly the correct test. Here's how to think
about it: to perform an aggregate, you merge all the values into the
transition state, and then you apply the final function once at the
end. So the process looks like this:

TRANSITION_STATE_0 + VALUE_1 = TRANSITION_STATE_1
TRANSITION_STATE_1 + VALUE_2 = TRANSITION_STATE_2
...
TRANSITION_STATE_N => RESULT

Here, + represents applying the transition function and => represents
applying the final function.

In the case of parallel query, we want every worker to be able to
incorporate values into its own transition states and then merge all
the transition states at the end. That's a problem, because the
transition function expects a transition state and a value, not two
transition states. So we invented the idea of a "combine" function to
solve this problem. A combine function takes two transition states
and produces a new transition state. That allows each worker to create
an initially empty transition state, merge a bunch of values into it,
and then pass the result back to the leader, which can combine all the
transition states using the combine function, and then apply the final
function at the end.

The same kind of idea works here. If we want to push down an entire
aggregate, there's no problem, provided the remote side supports it:
just push down the whole operation and get the result. But if we want
to push down part of the aggregate, then what we want to get back is a
transition value that we can then combine with other values (using the
transition function) or other transition states (using the combine
function) locally. That's tricky, because there's no SQL syntax to ask
the remote side to give us the transition value rather than the final
value. I think we would need to add that to solve this problem in its
full generality. However, in the special case where there's no final
function, the problem goes away, because then a transition value and a
result are identical. If we ask for a result, we can treat it as a
transition value, and there's no problem.

Internal values are a problem. Generally, you don't see internal as
the return type for an aggregate, because then the aggregate couldn't
be called by the user. An internal value can't be returned. However,
it's pretty common to see an aggregate that has an internal value as a
transition type, and something else as the result type. In such cases,
even if we had some syntax telling the remote side to send the
transition value rather than the final value, it would not be
sufficient, because the internal value still couldn't be transmitted.
This problem also arises for parallel query, where we want to move
transition values between processes within a single database cluster.
We solved that problem using aggserialfn and aggdeserialfn.
aggserialfn converts an internal transition value (which can't be
moved between processes) into a bytea, and aggdeserialfn does the
reverse. Maybe we would adopt the same solution here: our syntax that
tells the remote side to give us the transition value rather than the
final value could also tell the remote side to serialize it to bytea
if it's an internal type. However, if we did this, we'd have to be
sure that our deserialization functions were pretty well hardened
against unexpected or even malicious input, because who knows whether
that remote server is really going to send us a bytea in the format
that we're expecting to get?

Anyway, for the present patch, I think that testing whether there's a
final function is the right thing, and testing whether the return type
is internal doesn't hurt. If we want to extend this to other cases in
the future, then I think we need syntax to ask the remote side for the
unfinalized aggregate, like SELECT UNFINALIZED MAX(a) FROM t1, or
whatever. I'm not sure what the best concrete SQL syntax is - probably
not that.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2023-04-13 16:15:16 Re: pg_init_privs corruption.
Previous Message Karl O. Pinc 2023-04-13 15:53:31 Re: doc: add missing "id" attributes to extension packaging page