From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: explain HashAggregate to report bucket and memory stats
Date: 2020-03-06 21:33:10
Message-ID: 20200306213310.GM684@telsasoft.com
Lists: pgsql-hackers
On Fri, Mar 06, 2020 at 09:58:59AM -0800, Andres Freund wrote:
> > + ExplainIndentText(es);
> > + appendStringInfo(es->str,
> > + "Buckets: %ld (originally %ld)",
> > + inst->nbuckets,
> > + inst->nbuckets_original);
>
> I'm not sure I like the alternative output formats here. All the other
> fields are separated with a comma, but the original size is in
> parens. I'd probably just format it as "Buckets: %lld " and then add
> ", Original Buckets: %lld" when differing.
It's done that way for consistency with hash joins in show_hash_info().
> > +/* Update instrumentation stats */
> > +void
> > +UpdateTupleHashTableStats(TupleHashTable hashtable, bool initial)
> > +{
> > + hashtable->instrument.nbuckets = hashtable->hashtab->size;
> > + if (initial)
> > + {
> > + hashtable->instrument.nbuckets_original = hashtable->hashtab->size;
> > + hashtable->instrument.space_peak_hash = hashtable->hashtab->size *
> > + sizeof(TupleHashEntryData);
> > + hashtable->instrument.space_peak_tuples = 0;
> > + }
> > + else
> > + {
> > +#define maxself(a,b) a=Max(a,b)
> > + /* hashtable->entrysize includes additionalsize */
> > + maxself(hashtable->instrument.space_peak_hash,
> > + hashtable->hashtab->size * sizeof(TupleHashEntryData));
> > + maxself(hashtable->instrument.space_peak_tuples,
> > + hashtable->hashtab->members * hashtable->entrysize);
> > +#undef maxself
> > + }
> > +}
>
> Not a fan of this macro.
>
> I'm also not sure I understand what you're trying to do here?
I have to call UpdateTupleHashTableStats() from the callers at deliberate
locations. If the caller fills the hashtable all at once, I can populate the
stats immediately afterwards; but if it's populated incrementally, then I need
to update the stats right before the table is destroyed or reset. Otherwise we
would show the tuple size of the hashtable since its most recent reset, rather
than that of a larger, previous incarnation.
> > diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
> > index f457b5b150..b173b32cab 100644
> > --- a/src/test/regress/expected/aggregates.out
> > +++ b/src/test/regress/expected/aggregates.out
> > @@ -517,10 +517,11 @@ order by 1, 2;
> > -> HashAggregate
> > Output: s2.s2, sum((s1.s1 + s2.s2))
> > Group Key: s2.s2
> > + Buckets: 4
> > -> Function Scan on pg_catalog.generate_series s2
> > Output: s2.s2
> > Function Call: generate_series(1, 3)
> > -(14 rows)
> > +(15 rows)
>
> These tests probably won't be portable. The number of hash buckets
> calculated will e.g. depend on the size of the contained elements. And
> that'll e.g. depend on whether pointers are 4 or 8 bytes.
I was aware and afraid of that. Previously, I added this output only to
"explain analyze" and (as a quick, interim implementation) changed various
tests to use analyze, with memory shown only in "verbose" mode. But as Tomas
pointed out, that's consistent with what's done elsewhere.
So is the solution to show stats only during explain ANALYZE?
Or ... I have a patch to create a new explain(MACHINE) option to allow more
stable output by avoiding Memory/Disk. That doesn't attempt to make all
"explain analyze" output stable - there are other issues, I think mostly
related to parallel workers (see 4ea03f3f, 13e8b2ee). But it does allow
retiring explain_sq_limit and explain_parallel_sort_stats. I'm including my
patch to show what I mean, but I didn't enable it for the hashtable "Buckets:"
line. I guess in either case, the tests shouldn't be included.
--
Justin
Attachment | Content-Type | Size
---|---|---
v7-0001-explain-to-show-tuplehash-bucket-and-memory-stats.patch | text/x-diff | 88.7 KB
v7-0002-refactor-show_grouping_set_keys.patch | text/x-diff | 3.0 KB
v7-0003-Gross-hack-to-put-hash-stats-of-subplans-in-the-r.patch | text/x-diff | 5.4 KB
v7-0004-implement-hash-stats-for-bitmapHeapScan.patch | text/x-diff | 6.3 KB
v7-0005-Refactor-for-consistency-symmetry.patch | text/x-diff | 14.8 KB
v7-0006-TupleHashTable.entrysize-was-unused-except-for-in.patch | text/x-diff | 1.6 KB
v7-0007-Update-comment-obsolete-since-69c3936a.patch | text/x-diff | 891 bytes
v7-0008-Add-explain-MACHINE.patch | text/x-diff | 12.6 KB
v7-0009-Add-explain-REGRESS.patch | text/x-diff | 5.9 KB