From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | Melanie Plageman <melanieplageman(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Taylor Vesely <tvesely(at)pivotal(dot)io>, Adam Lee <ali(at)pivotal(dot)io>, Melanie Plageman <mplageman(at)pivotal(dot)io>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Memory-Bounded Hash Aggregation |
Date: | 2020-02-22 19:59:59 |
Message-ID: | ee3784f2d54a37a8c93719704f57a8aa50a98d77.camel@j-davis.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, 2020-02-19 at 20:16 +0100, Tomas Vondra wrote:
> 5) Assert(nbuckets > 0);
...
> 6) Another thing that occurred to me was what happens to grouping
> sets,
> which we can't spill to disk. So I did this:
...
> which fails with segfault at execution time:
The biggest problem was that my grouping sets test was not testing
multiple hash tables spilling, so a couple bugs crept in. I fixed them,
thank you.
To fix the tests, I also had to fix the GUCs and the way the planner
uses them with my patch. In master, grouping sets are planned by
generating a path that tries to do as many grouping sets with hashing
as possible (limited by work_mem). But with my patch, the notion of
fitting hash tables in work_mem is not necessarily important. If we
ignore work_mem during path generation entirely (and only consider it
during costing and execution), it will change quite a few plans and
undermine the concept of mixed aggregates entirely. That may be a good
thing to do eventually as a simplification, but for now it seems like
too much, so I am preserving the notion of trying to fit hash tables in
work_mem to create mixed aggregates.
But that created the testing problem: I need a reliable way to get
grouping sets with several hash tables in memory that are all spilling,
but the planner is trying to avoid exactly that. So, I am introducing a
new GUC called enable_groupingsets_hash_disk (better name suggestions
welcome), defaulting it to "off" (and turned on during the test).
Additionally, I removed the other GUCs I introduced in earlier versions
of this patch. They were basically designed around the idea to revert
back to the previous hash aggregation behavior if desired (by setting
enable_hashagg_spill=false and hashagg_mem_overflow=true). That makes
some sense, but that was already covered pretty well by existing GUCs.
If you want to use HashAgg without spilling, just set work_mem higher;
and if you want to avoid the planner from choosing HashAgg at all, you
set enable_hashagg=false. So I just got rid of enable_hashagg_spill and
hashagg_mem_overflow.
I didn't forget about your explain-related suggestions. I'll address
them in the next patch.
Regards,
Jeff Davis
Attachment | Content-Type | Size |
---|---|---|
hashagg-20200222.patch | text/x-patch | 105.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Michail Nikolaev | 2020-02-22 20:08:45 | Re: BUG #16108: Colorization to the output of command-line has unproperly behaviors at Windows platform |
Previous Message | Jeff Davis | 2020-02-22 19:02:16 | Re: Memory-Bounded Hash Aggregation |