From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: explain HashAggregate to report bucket and memory stats
Date: 2020-03-06 21:33:10
Message-ID: 20200306213310.GM684@telsasoft.com
Lists: pgsql-hackers
On Fri, Mar 06, 2020 at 09:58:59AM -0800, Andres Freund wrote:
> > + ExplainIndentText(es);
> > + appendStringInfo(es->str,
> > + "Buckets: %ld (originally %ld)",
> > + inst->nbuckets,
> > + inst->nbuckets_original);
>
> I'm not sure I like the alternative output formats here. All the other
> fields are separated with a comma, but the original size is in
> parens. I'd probably just format it as "Buckets: %lld " and then add
> ", Original Buckets: %lld" when differing.
It's done that way for consistency with hash joins in show_hash_info().
> > +/* Update instrumentation stats */
> > +void
> > +UpdateTupleHashTableStats(TupleHashTable hashtable, bool initial)
> > +{
> > + hashtable->instrument.nbuckets = hashtable->hashtab->size;
> > + if (initial)
> > + {
> > + hashtable->instrument.nbuckets_original = hashtable->hashtab->size;
> > + hashtable->instrument.space_peak_hash = hashtable->hashtab->size *
> > + sizeof(TupleHashEntryData);
> > + hashtable->instrument.space_peak_tuples = 0;
> > + }
> > + else
> > + {
> > +#define maxself(a,b) a=Max(a,b)
> > + /* hashtable->entrysize includes additionalsize */
> > + maxself(hashtable->instrument.space_peak_hash,
> > + hashtable->hashtab->size * sizeof(TupleHashEntryData));
> > + maxself(hashtable->instrument.space_peak_tuples,
> > + hashtable->hashtab->members * hashtable->entrysize);
> > +#undef maxself
> > + }
> > +}
>
> Not a fan of this macro.
>
> I'm also not sure I understand what you're trying to do here?
I have to call UpdateTupleHashTableStats() from the callers at deliberate
locations. If the caller fills the hashtable all at once, I can populate the
stats immediately afterwards; but if it's populated incrementally, then I need
to update the stats right before the table is destroyed or reset. Otherwise we
would show the tuple size of the hashtable since its most recent reset, rather
than that of a larger, previous incarnation.
> > diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
> > index f457b5b150..b173b32cab 100644
> > --- a/src/test/regress/expected/aggregates.out
> > +++ b/src/test/regress/expected/aggregates.out
> > @@ -517,10 +517,11 @@ order by 1, 2;
> > -> HashAggregate
> > Output: s2.s2, sum((s1.s1 + s2.s2))
> > Group Key: s2.s2
> > + Buckets: 4
> > -> Function Scan on pg_catalog.generate_series s2
> > Output: s2.s2
> > Function Call: generate_series(1, 3)
> > -(14 rows)
> > +(15 rows)
>
> These tests probably won't be portable. The number of hash buckets
> calculated will e.g. depend on the size of the contained elements. And
> that'll e.g. depend on whether pointers are 4 or 8 bytes.
I was aware and afraid of that. Previously, I added this output only to
"explain analyze" and (as a quick, interim implementation) changed various
tests to use analyze, with memory shown only in "verbose" mode. But as Tomas
pointed out, that's consistent with what's done elsewhere.
So is the solution to show stats only during explain ANALYZE?
Or ... I have a patch to create a new explain(MACHINE) option to allow more
stable output by avoiding Memory/Disk. That doesn't attempt to make all
"explain analyze" output stable - there are other issues, I think mostly
related to parallel workers (see 4ea03f3f, 13e8b2ee). But it does allow
retiring explain_sq_limit and explain_parallel_sort_stats. I'm including my
patch to show what I mean, but I didn't enable it for the hashtable "Buckets:"
line. I guess in either case, the tests shouldn't be included.
--
Justin
Attachment | Content-Type | Size
---|---|---
v7-0001-explain-to-show-tuplehash-bucket-and-memory-stats.patch | text/x-diff | 88.7 KB
v7-0002-refactor-show_grouping_set_keys.patch | text/x-diff | 3.0 KB
v7-0003-Gross-hack-to-put-hash-stats-of-subplans-in-the-r.patch | text/x-diff | 5.4 KB
v7-0004-implement-hash-stats-for-bitmapHeapScan.patch | text/x-diff | 6.3 KB
v7-0005-Refactor-for-consistency-symmetry.patch | text/x-diff | 14.8 KB
v7-0006-TupleHashTable.entrysize-was-unused-except-for-in.patch | text/x-diff | 1.6 KB
v7-0007-Update-comment-obsolete-since-69c3936a.patch | text/x-diff | 891 bytes
v7-0008-Add-explain-MACHINE.patch | text/x-diff | 12.6 KB
v7-0009-Add-explain-REGRESS.patch | text/x-diff | 5.9 KB