From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Partial aggregates pushdown
Date: 2024-08-23 00:07:18
Message-ID: 4b389c97-991b-4b1a-8fc0-d0eabe3e7f7b@vondra.me
Lists: pgsql-hackers


On 8/22/24 22:07, Bruce Momjian wrote:
> On Thu, Aug 22, 2024 at 09:54:02PM +0200, Tomas Vondra wrote:
>> On 8/22/24 20:56, Bruce Momjian wrote:
>>> You make a very good point above. Would there ever be cases where a
>>> targetlist would have multiple aggregates, and some can be pushed down,
>>> and some have to return all matching rows so the sender can compute the
>>> aggregate? If so, how would we handle that? How does parallelism
>>> handle that now?
>>
>> I think the only sane way to handle this is to disable partial pushdown
>> for that query, fetch all rows and do the aggregate locally.
>
> Okay.
>
>>> It is more than different versions. Different CPU architectures can
>>> store data types differently in binary. It would be kind of interesting
>>> to have a serial/deserial that was in text format. I guess that is
>>> where my RECORD idea came from that I just emailed to the list.
>>>
>>
>> I believe everything in this thread assumes "binary" in the same sense
>> as the protocol, with the same rules for sending data in text/binary
>> formats, etc. With binary for integers meaning "network order" and that
>> sort of stuff. Which is mostly independent of PG version.
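The "network order" convention mentioned above can be sketched as follows. This is a minimal illustration of the idea, not actual PostgreSQL code:

```c
#include <stdint.h>

/* Serialize an int64 in network (big-endian) byte order, so the bytes mean
 * the same thing regardless of the sending CPU's endianness.  This mirrors
 * the wire-protocol convention discussed above; illustrative sketch only. */
static void
int64_to_network(int64_t v, uint8_t buf[8])
{
    for (int i = 0; i < 8; i++)
        buf[i] = (uint8_t) (((uint64_t) v >> (8 * (7 - i))) & 0xFF);
}

static int64_t
int64_from_network(const uint8_t buf[8])
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | buf[i];
    return (int64_t) v;
}
```

A big-endian and a little-endian server can then exchange such values and round-trip them losslessly.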
>
> If the binary is transferable between servers of the same PG version but
> perhaps different CPU architectures, you are saying the only obstacle to
> reusing the exact aggregate-passing method we use for parallelism is PG
> version differences.  Seems we should just assume the same version for
> this optimization and call it done.

IMHO restricting this to the exact same version (it'd really have to be
the same minor version, not just major) would be pretty annoying and I'm
afraid it would seriously restrict how often we'd be able to enable this
optimization.

Consider a big sharded cluster using postgres_fdw to run queries. AFAIK
it's not uncommon to do minor version upgrades node by node, possibly
even with failover to make it as smooth as possible. For the duration of
this upgrade the pushdown would be impossible, because at least one node
has a different minor version. That doesn't seem great.
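One softer policy that would survive such a rolling upgrade is to key the check on the major version only. A sketch of that assumption (a hypothetical predicate, not the patch's actual rule):

```c
#include <stdbool.h>

/* Hypothetical compatibility check: allow partial-aggregate pushdown when
 * both servers share a major version, even mid minor-version upgrade.
 * Since PG 10, server_version_num is major*10000 + minor (160004 = 16.4). */
static bool
pushdown_version_compatible(int local_vnum, int remote_vnum)
{
    return (local_vnum / 10000) == (remote_vnum / 10000);
}
```

Under this rule a 16.2 node and a 16.4 node could still push partial aggregates to each other, while 15.x and 16.x could not.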

> Except
> we don't always check for the same PG version and that could lead to
> crashes or security problems?
>

Yes, we'd need to check this comprehensively. I'm not sure if this might
cause crashes - presumably the input functions should be careful about
this (because otherwise it'd be a problem even without this feature).
Also not sure about security issues.
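In that spirit, a recv-style deserializer should validate its input before trusting it. A minimal sketch (a hypothetical helper, not one of Postgres's actual *recv functions):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical deserializer for a big-endian int64 transition value.
 * Like PostgreSQL's *recv input functions, it rejects malformed input
 * (wrong length) instead of reading past the buffer. */
static bool
recv_int64_checked(const uint8_t *buf, size_t len, int64_t *out)
{
    uint64_t v = 0;

    if (buf == NULL || len != 8)
        return false;           /* malformed input: refuse, don't crash */

    for (size_t i = 0; i < 8; i++)
        v = (v << 8) | buf[i];
    *out = (int64_t) v;
    return true;
}
```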

>>> What I would really like is something similar to pg_proc.proargtypes
>>> (which is data type oidvector) for pg_aggregate where we can supply the
>>> oids of the pg_aggregate.aggtranstype and construct a record on the fly
>>> to return to the FDW caller, rather than having to create specialized
>>> functions for every aggregate that needs to return several different
>>> data types, e.g., AVG(interval), #4 in my previous email. I realize
>>> this would require the creation of data types int128 and int128 array.
>>>
>>
>> Isn't that really just a definition of a composite type?
>
> Probably. I am trying to leverage what we already have.
>
>> I still don't understand which part of the code constructs the record.
>> If the aggregate has aggstate internal, I don't see how could it be
>> anything else than the aggregate itself.
>
> Yes, but I was unclear if we needed a special function for every
> construction.
>
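For illustration, the kind of transition state being discussed for AVG(interval) might look like this as a plain struct, with a combine step over two partial states. Field names and layout here are assumptions for the sketch, not PostgreSQL's actual internal representation:

```c
#include <stdint.h>

/* Hypothetical transition state for AVG(interval), if it were exposed as
 * a declared composite type instead of "internal".  Names and layout are
 * illustrative assumptions only. */
typedef struct AvgIntervalStateSketch
{
    int64_t sum_usec;    /* accumulated microseconds */
    int32_t sum_days;
    int32_t sum_months;
    int64_t count;       /* number of non-null inputs */
} AvgIntervalStateSketch;

/* Merge two partial states, as an aggregate combine function would. */
static AvgIntervalStateSketch
avg_interval_combine(AvgIntervalStateSketch a, AvgIntervalStateSketch b)
{
    AvgIntervalStateSketch r;

    r.sum_usec = a.sum_usec + b.sum_usec;
    r.sum_days = a.sum_days + b.sum_days;
    r.sum_months = a.sum_months + b.sum_months;
    r.count = a.count + b.count;
    return r;
}
```

If such a state were a declared composite type, a remote node could ship it as an ordinary record and the local node could combine the pieces without any aggregate-specific construction code.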

OK

--
Tomas Vondra
