| From: | Justin Pryzby <pryzby(at)telsasoft(dot)com> | 
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> | 
| Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> | 
| Subject: | Re: explain HashAggregate to report bucket and memory stats | 
| Date: | 2020-03-20 08:44:42 | 
| Message-ID: | 20200320084442.GX26184@telsasoft.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On Fri, Mar 13, 2020 at 10:57:43AM -0700, Andres Freund wrote:
> On 2020-03-13 10:53:17 -0700, Jeff Davis wrote:
> > On Fri, 2020-03-13 at 10:27 -0700, Andres Freund wrote:
> > > On 2020-03-13 10:15:46 -0700, Jeff Davis wrote:
> > > > Also, is there a reason you report two different memory values
> > > > (hashtable and tuples)? I don't object, but it seems like a little too
> > > > much detail.
> > > 
> > > Seems useful to me - the hashtable is pre-allocated based on estimates,
> > > whereas the tuples are allocated "on demand". So seeing the difference
> > > will allow to investigate the more crucial issue...
> > Then do we also want to report separately on the by-ref transition
> > values? That could be useful if you are using ARRAY_AGG and the states
> > grow larger than you might expect.
> 
> I can see that being valuable - I've had to debug cases with too much
> memory being used due to aggregate transitions before. Right now it'd be
> mixed in with tuples, I believe - and we'd need a separate context for
> tracking the transition values? Due to that I'm inclined to not report
> separately for now.
I think that's already in a separate context indexed by grouping set:
src/include/nodes/execnodes.h:  ExprContext **aggcontexts;      /* econtexts for long-lived data (per GS) */
But the hashtable and tuples are combined.  I put them in separate contexts and
rebased on top of 1f39bce021540fde00990af55b4432c55ef4b3c7.
But didn't do anything yet with the aggcontexts.
Now I can get output like:
|template1=# explain analyze SELECT i,COUNT(1) FROM t GROUP BY 1;
| HashAggregate  (cost=4769.99..6769.98 rows=199999 width=12) (actual time=266.465..27020.333 rows=199999 loops=1)
|   Group Key: i
|   Buckets: 524288 (originally 262144)
|   Peak Memory Usage: hashtable: 12297kB, tuples: 24576kB
|   Disk Usage: 192 kB
|   HashAgg Batches: 3874
|   ->  Seq Scan on t  (cost=0.00..3769.99 rows=199999 width=4) (actual time=13.043..64.017 rows=199999 loops=1)
It looks somewhat funny next to hash join, which puts everything on one line:
|template1=# explain  analyze SELECT i,COUNT(1) FROM t a JOIN t b USING(i) GROUP BY 1;
| HashAggregate  (cost=13789.95..15789.94 rows=199999 width=12) (actual time=657.733..27129.873 rows=199999 loops=1)
|   Group Key: a.i
|   Buckets: 524288 (originally 262144)
|   Peak Memory Usage: hashtable: 12297kB, tuples: 24576kB
|   Disk Usage: 192 kB
|   HashAgg Batches: 3874
|   ->  Hash Join  (cost=6269.98..12789.95 rows=199999 width=4) (actual time=135.932..426.071 rows=199999 loops=1)
|         Hash Cond: (a.i = b.i)
|         ->  Seq Scan on t a  (cost=0.00..3769.99 rows=199999 width=4) (actual time=3.265..47.598 rows=199999 loops=1)
|         ->  Hash  (cost=3769.99..3769.99 rows=199999 width=4) (actual time=131.881..131.882 rows=199999 loops=1)
|               Buckets: 262144  Batches: 1  Memory Usage: 9080kB
|               ->  Seq Scan on t b  (cost=0.00..3769.99 rows=199999 width=4) (actual time=3.273..40.163 rows=199999 loops=1)
-- 
Justin
| Attachment | Content-Type | Size | 
|---|---|---|
| v8-0001-nodeAgg-separate-context-for-each-hashtable.patch | text/x-diff | 9.8 KB | 
| v8-0002-explain-to-show-tuplehash-bucket-and-memory-stats.patch | text/x-diff | 30.0 KB | 
| v8-0003-refactor-show_grouping_set_keys.patch | text/x-diff | 3.0 KB | 
| v8-0004-Gross-hack-to-put-hash-stats-of-subplans-in-the-r.patch | text/x-diff | 5.5 KB | 
| v8-0005-implement-hash-stats-for-bitmapHeapScan.patch | text/x-diff | 6.3 KB | 
| v8-0006-Refactor-for-consistency-symmetry.patch | text/x-diff | 14.8 KB | 
| v8-0007-TupleHashTable.entrysize-was-unused-except-for-in.patch | text/x-diff | 1.6 KB | 
| v8-0008-Update-comment-obsolete-since-69c3936a.patch | text/x-diff | 890 bytes | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Pavel Stehule | 2020-03-20 09:46:50 | Re: plan cache overhead on plpgsql expression | 
| Previous Message | Pavel Stehule | 2020-03-20 08:28:22 | Re: proposal: schema variables |