From: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Subject: | Re: explain HashAggregate to report bucket and memory stats |
Date: | 2020-03-20 08:44:42 |
Message-ID: | 20200320084442.GX26184@telsasoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Mar 13, 2020 at 10:57:43AM -0700, Andres Freund wrote:
> On 2020-03-13 10:53:17 -0700, Jeff Davis wrote:
> > On Fri, 2020-03-13 at 10:27 -0700, Andres Freund wrote:
> > > On 2020-03-13 10:15:46 -0700, Jeff Davis wrote:
> > > > Also, is there a reason you report two different memory values
> > > > (hashtable and tuples)? I don't object, but it seems like a little too
> > > > much detail.
> > >
> > > Seems useful to me - the hashtable is pre-allocated based on estimates,
> > > whereas the tuples are allocated "on demand". So seeing the difference
> > > will allow to investigate the more crucial issue...
> > Then do we also want to report separately on the by-ref transition
> > values? That could be useful if you are using ARRAY_AGG and the states
> > grow larger than you might expect.
>
> I can see that being valuable - I've had to debug cases with too much
> memory being used due to aggregate transitions before. Right now it'd be
> mixed in with tuples, I believe - and we'd need a separate context for
> tracking the transition values? Due to that I'm inclined to not report
> separately for now.
I think that's already in a separate context indexed by grouping set:
src/include/nodes/execnodes.h: ExprContext **aggcontexts; /* econtexts for long-lived data (per GS) */
But the hashtable and tuples are combined. I put them in separate contexts and
rebased on top of 1f39bce021540fde00990af55b4432c55ef4b3c7.
But didn't do anything yet with the aggcontexts.
Now I can get output like:
|template1=# explain analyze SELECT i,COUNT(1) FROM t GROUP BY 1;
| HashAggregate (cost=4769.99..6769.98 rows=199999 width=12) (actual time=266.465..27020.333 rows=199999 loops=1)
| Group Key: i
| Buckets: 524288 (originally 262144)
| Peak Memory Usage: hashtable: 12297kB, tuples: 24576kB
| Disk Usage: 192 kB
| HashAgg Batches: 3874
| -> Seq Scan on t (cost=0.00..3769.99 rows=199999 width=4) (actual time=13.043..64.017 rows=199999 loops=1)
It looks somewhat funny next to hash join, which puts everything on one line:
|template1=# explain analyze SELECT i,COUNT(1) FROM t a JOIN t b USING(i) GROUP BY 1;
| HashAggregate (cost=13789.95..15789.94 rows=199999 width=12) (actual time=657.733..27129.873 rows=199999 loops=1)
| Group Key: a.i
| Buckets: 524288 (originally 262144)
| Peak Memory Usage: hashtable: 12297kB, tuples: 24576kB
| Disk Usage: 192 kB
| HashAgg Batches: 3874
| -> Hash Join (cost=6269.98..12789.95 rows=199999 width=4) (actual time=135.932..426.071 rows=199999 loops=1)
| Hash Cond: (a.i = b.i)
| -> Seq Scan on t a (cost=0.00..3769.99 rows=199999 width=4) (actual time=3.265..47.598 rows=199999 loops=1)
| -> Hash (cost=3769.99..3769.99 rows=199999 width=4) (actual time=131.881..131.882 rows=199999 loops=1)
| Buckets: 262144 Batches: 1 Memory Usage: 9080kB
| -> Seq Scan on t b (cost=0.00..3769.99 rows=199999 width=4) (actual time=3.273..40.163 rows=199999 loops=1)
--
Justin
Attachment | Content-Type | Size |
---|---|---|
v8-0001-nodeAgg-separate-context-for-each-hashtable.patch | text/x-diff | 9.8 KB |
v8-0002-explain-to-show-tuplehash-bucket-and-memory-stats.patch | text/x-diff | 30.0 KB |
v8-0003-refactor-show_grouping_set_keys.patch | text/x-diff | 3.0 KB |
v8-0004-Gross-hack-to-put-hash-stats-of-subplans-in-the-r.patch | text/x-diff | 5.5 KB |
v8-0005-implement-hash-stats-for-bitmapHeapScan.patch | text/x-diff | 6.3 KB |
v8-0006-Refactor-for-consistency-symmetry.patch | text/x-diff | 14.8 KB |
v8-0007-TupleHashTable.entrysize-was-unused-except-for-in.patch | text/x-diff | 1.6 KB |
v8-0008-Update-comment-obsolete-since-69c3936a.patch | text/x-diff | 890 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | Pavel Stehule | 2020-03-20 09:46:50 | Re: plan cache overhead on plpgsql expression |
Previous Message | Pavel Stehule | 2020-03-20 08:28:22 | Re: proposal: schema variables |