From: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Taylor Vesely <tvesely(at)pivotal(dot)io>, Adam Lee <ali(at)pivotal(dot)io>, Melanie Plageman <mplageman(at)pivotal(dot)io>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Memory-Bounded Hash Aggregation |
Date: | 2020-03-12 21:01:46 |
Message-ID: | 20200312210146.GG29065@telsasoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Mar 11, 2020 at 11:55:35PM -0700, Jeff Davis wrote:
> * tweaked EXPLAIN output some more
> Unless I (or someone else) finds something significant, this is close
> to commit.
Thanks for working on this ; I finally made a pass over the patch.
+++ b/doc/src/sgml/config.sgml
+ <term><varname>enable_groupingsets_hash_disk</varname> (<type>boolean</type>)
+ Enables or disables the query planner's use of hashed aggregation for
+ grouping sets when the size of the hash tables is expected to exceed
+ <varname>work_mem</varname>. See <xref
+ linkend="queries-grouping-sets"/>. Note that this setting only
+ affects the chosen plan; execution time may still require using
+ disk-based hash aggregation. ...
...
+ <term><varname>enable_hashagg_disk</varname> (<type>boolean</type>)
+ ... This only affects the planner choice;
+ execution time may still require using disk-based hash
+ aggregation. The default is <literal>on</literal>.
I don't understand what's meant by "the chosen plan".
Should it say, "at execution ..." instead of "execution time" ?
+ Enables or disables the query planner's use of hashed aggregation plan
+ types when the memory usage is expected to exceed
Either remove "plan types" for consistency with enable_groupingsets_hash_disk,
Or add it there. Maybe it should say "when the memory usage would OTHERWISE BE
expected to exceed.."
+show_hashagg_info(AggState *aggstate, ExplainState *es)
+{
+ Agg *agg = (Agg *)aggstate->ss.ps.plan;
+ long memPeakKb = (aggstate->hash_mem_peak + 1023) / 1024;
I see this partially duplicates my patch [0] to show memory stats for (at
Andres' suggestion) all of execGrouping.c. Perhaps you'd consider naming the
function something more generic in case my patch progresses ? I'm using:
|show_tuplehash_info(HashTableInstrumentation *inst, ExplainState *es);
Mine also shows:
|ExplainPropertyInteger("Original Hash Buckets", NULL,
|ExplainPropertyInteger("Peak Memory Usage (hashtable)", "kB",
|ExplainPropertyInteger("Peak Memory Usage (tuples)", "kB",
[0] https://www.postgresql.org/message-id/20200306213310.GM684%40telsasoft.com
You added hash_mem_peak and hash_batches_used to struct AggState.
In my 0001 patch, I added instrumentation to struct TupleHashTable, and in my
0005 patch I move it into AggStatePerHashData and other State nodes.
+ if (from_tape)
+ partition_mem += HASHAGG_READ_BUFFER_SIZE;
+ partition_mem = npartitions * HASHAGG_WRITE_BUFFER_SIZE;
=> That looks wrong ; should say += ?
+ gettext_noop("Enables the planner's use of hashed aggregation plans that are expected to exceed work_mem."),
should say:
"when the memory usage is otherwise be expected to exceed.."
--
Justin
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2020-03-12 21:53:11 | Re: [PATCH] Incremental sort (was: PoC: Partial sort) |
Previous Message | Daniel Gustafsson | 2020-03-12 21:00:33 | Re: [PATCH] Use PKG_CHECK_MODULES to detect the libxml2 library |