From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Jeff Davis <jdavis(at)postgresql(dot)org>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: significant slowdown of HashAggregate between 9.6 and 10
Date: 2020-06-04 18:41:16
Message-ID: 20200604184116.s5bluoak67hgbpl2@alap3.anarazel.de
Lists: pgsql-hackers
Hi,
On 2020-06-03 13:26:43 -0700, Andres Freund wrote:
> On 2020-06-03 21:31:01 +0200, Tomas Vondra wrote:
> > So there seems to be +40% between 9.6 and 10, and further +25% between
> > 10 and master. However, plain hashagg, measured e.g. like this:
As far as I can tell, the 10->master difference comes largely from the
difference in the number of buckets in the hashtable.
In 10 it is:
Breakpoint 1, tuplehash_create (ctx=0x5628251775c8, nelements=75, private_data=0x5628251952f0)
and in master it is:
Breakpoint 1, tuplehash_create (ctx=0x5628293a0a70, nelements=256, private_data=0x5628293a0b90)
As far as I can tell, the timing difference is simply the cost of
iterating 500k times over a hashtable with fairly few entries. Which is,
unsurprisingly, more expensive if the hashtable is larger.
The reason the hashtable got bigger in 13 is
commit 1f39bce021540fde00990af55b4432c55ef4b3c7
Author: Jeff Davis <jdavis(at)postgresql(dot)org>
Date: 2020-03-18 15:42:02 -0700
Disk-based Hash Aggregation.
which introduced
+/* minimum number of initial hash table buckets */
+#define HASHAGG_MIN_BUCKETS 256
I don't really see much explanation for that part in the commit, perhaps
Jeff can chime in?
I think optimizing for the gazillion hash table scans isn't particularly
important. Rarely is a query going to have 500k scans of unchanging
aggregated data. So I'm not too concerned about the 13 regression - but
I also see very little reason to just always use 256 buckets? It's
pretty darn common to end up with 1-2 groups, what's the point of this?
I'll look into 9.6->10 after buying groceries... But I wish there were
a relevant benchmark; I don't think it's worth optimizing for this case.
Greetings,
Andres Freund