From: | Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> |
---|---|
To: | Magnus Hagander <magnus(at)hagander(dot)net> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Parallel Aggregates for string_agg and array_agg |
Date: | 2018-03-27 07:22:39 |
Message-ID: | CAA8=A7-OrZDOcRwZjqNv8nNEQzE8JSbjTsAjv2CK2zyq4jPe+A@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Mar 27, 2018 at 5:36 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Tue, Mar 27, 2018 at 12:28 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>
>> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
>> > On 27 March 2018 at 09:27, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> >> I do not think it is accidental that these aggregates are exactly the ones
>> >> that do not have parallelism support today. Rather, that's because you
>> >> just about always have an interest in the order in which the inputs get
>> >> aggregated, which is something that parallel aggregation cannot support.
>>
>> > This very much reminds me of something that exists in the 8.4 release
>> > notes:
>> >> SELECT DISTINCT and UNION/INTERSECT/EXCEPT no longer always produce
>> >> sorted output (Tom)
>>
>> That's a completely false analogy: we got a significant performance
>> benefit for a significant fraction of users by supporting hashed
>> aggregation. My argument here is that only a very tiny fraction of
>> string_agg/array_agg users will not care about aggregation order, and thus
>> I don't believe that this patch can help very many people. Against that,
>> it's likely to hurt other people, by breaking their queries and forcing
>> them to insert expensive explicit sorts to fix it. Even discounting the
>> backwards-compatibility question, we don't normally adopt performance
>> features for which it's unclear that the net gain over all users is
>> positive.
>
>
> I think you are quite wrong in claiming that only a tiny fraction of the
> users are going to care.
>
> This may, and quite probably does, hold true for string_agg(), but not for
> array_agg(). I see a lot of cases where people use that to load it into an
> unordered array/hashmap/set/whatever on the client side, which loses
> ordering *anyway*, and they would definitely benefit from it.

Agreed, I have seen lots of uses of array_agg where the order didn't matter.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services