From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)lists(dot)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Subject: | Re: hashagg slowdown due to spill changes |
Date: | 2020-06-15 13:34:03 |
Message-ID: | 20200615133403.gt6x3musvjzvw24y@development |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, Jun 14, 2020 at 11:09:55PM -0700, Jeff Davis wrote:
>On Sun, 2020-06-14 at 11:14 -0700, Andres Freund wrote:
>> I'm somewhat inclined to think that we should revert 4cad2534da6 and
>> then look at how precisely to tackle this in 14.
>
>I'm fine with that.
>
I don't see how we could just revert 4cad2534d and leave this for v14.
The hashagg spilling is IMHO almost guaranteed to be a pain point for
some users, as it will force some queries to serialize large amounts of
data. Yes, some of this is a cost for hashagg enforcing work_mem at
runtime, I'm fine with that. We'd get reports about that too, but we can
justify that cost ...
But just reverting 4cad2534d will make this much worse, I think, as
illustrated by the benchmarks I did in [1]. And no, this is not really
fixable by tweaking the cost parameters - even with the current code
(i.e. 4cad2534d in place) I had to increase random_page_cost to 60 on
the temp tablespace (on SATA RAID) to get good plans with parallelism
enabled. I haven't tried, but I presume without 4cad2534d I'd have to
push r_p_c even further ...
[1] https://www.postgresql.org/message-id/20200519151202.u2p2gpiawoaznsv2%40development
>> It'd probably make sense to request small tlists when the number of
>> estimated groups is large, and not otherwise.
>
>That seems like a nice compromise that would be non-invasive, at least
>for create_agg_plan().
>
Maybe. It'd certainly better than nothing. It's not clear to me what
would a good threshold be, though. And it's not going to handle cases of
under-estimates.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2020-06-15 13:35:49 | Re: Transactions involving multiple postgres foreign servers, take 2 |
Previous Message | 李杰 (慎追) | 2020-06-15 13:33:05 | 回复:回复:how to create index concurrently on partitioned table |