From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Tomas Vondra <tv(at)fuzzy(dot)cz> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: 9.5: Better memory accounting, towards memory-bounded HashAgg |
Date: | 2014-12-28 20:37:05 |
Message-ID: | 1419799025.24895.59.camel@jeff-desktop |
Lists: | pgsql-hackers |
On Tue, 2014-12-23 at 01:16 -0800, Jeff Davis wrote:
> New patch attached (rebased, as well).
>
> I also see your other message about adding regression testing. I'm
> hesitant to slow down the tests for everyone to run through this code
> path though. Should I add regression tests, and then remove them later
> after we're more comfortable that it works?
Attached are some tests I ran. First, generate the data sets with
hashagg_test_data.sql. Then run the following in psql (I used the default
work_mem of 4MB):

  set enable_hashagg=false;
  \o /tmp/sort.out
  \i /tmp/hashagg_test.sql
  \o
  set enable_hashagg=true;
  \o /tmp/hash.out
  \i /tmp/hashagg_test.sql

I then diff'd the two output files to make sure the results are the same (except
the plans, of course). The script loads the results into a temp table,
then sorts it before outputting, to make the test order-independent. I
didn't just add an ORDER BY, because that would change the plan and it
would never use hashagg.
I think that has fairly good coverage of the hashagg code. I used 3
different input data sets, byval and byref types (for group key and
args), and a group aggregate query as well as DISTINCT. Let me know if I
missed something.
I also did some performance comparisons between disk-based sort+group
and disk-based hashagg. The results are quite favorable for hashagg
given the data sets I provided. Simply create the data using
hashagg_test_data.sql (if not already done), set the work_mem to the
value you want to test, and run hashagg_test_perf.sql. It uses EXPLAIN
ANALYZE for the timings.
singleton: 10M groups of 1
even: 1M groups of 10
skew: wildly different group sizes; see data script
q1: group aggregate query
q2: distinct query
The total memory required for the test to run without going to disk
ranges from about 100MB (for "even") to about 1GB (for "singleton").
Regardless of work_mem, these all fit in memory on my machine, so they
aren't *really* going to disk. Also note that hashagg might use about
double work_mem when work_mem is small, both because of how the memory
blocks are allocated and because hashagg only reacts after memory has
been exceeded (the effect is not important at higher values).
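
A rough sketch of why the overshoot can approach 2x at small work_mem and fade at larger values. It assumes memory is obtained in blocks that double in size from 8 kB up to an 8 MB cap, and that the limit is only noticed once the allocated total has already passed work_mem. Those block sizes and the function name are assumptions made for illustration, not values taken from the patch.

```python
# Sketch only: the block sizes are assumptions (8 kB initial block,
# doubling on each allocation, capped at 8 MB), not taken from the patch.
KB = 1024
MB = 1024 * KB


def allocated_when_limit_noticed(work_mem, init_block=8 * KB, max_block=8 * MB):
    """Total bytes allocated at the moment the total first exceeds work_mem."""
    total = 0
    block = init_block
    # Keep allocating while still within the limit; the check only
    # trips after the block that pushes the total over work_mem.
    while total <= work_mem:
        total += block
        block = min(block * 2, max_block)
    return total


for label, limit in [("1MB", 1 * MB), ("4MB", 4 * MB),
                     ("64MB", 64 * MB), ("512MB", 512 * MB)]:
    total = allocated_when_limit_noticed(limit)
    print(f"work_mem={label}: allocated {total / MB:.1f} MB "
          f"({total / limit:.2f}x work_mem)")
```

Under these assumptions the last block allocated is about as large as everything allocated before it, so the total lands near twice work_mem for 1MB and 4MB. Once the block size reaches the cap, the overshoot is at most one capped block, which is small relative to a large work_mem.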
work_mem='1MB':
               sort+group (s)  hashagg (s)
singleton q1               12           10
singleton q2                8            7
even q1                    14            7
even q2                    10            5
skew q1                    22            6
skew q2                    16            4

work_mem='4MB':
               sort+group (s)  hashagg (s)
singleton q1               12           11
singleton q2                8            6
even q1                    12            7
even q2                     9            5
skew q1                    19            6
skew q2                    13            3

work_mem='16MB':
               sort+group (s)  hashagg (s)
singleton q1               12           11
singleton q2                8            7
even q1                    14            7
even q2                    10            5
skew q1                    15            6
skew q2                    12            4

work_mem='64MB':
               sort+group (s)  hashagg (s)
singleton q1               13           12
singleton q2                9            8
even q1                    14            8
even q2                    10            5
skew q1                    17            6
skew q2                    13            4

work_mem='256MB':
               sort+group (s)  hashagg (s)
singleton q1               12           12
singleton q2                9            8
even q1                    14            7
even q2                    11            4
skew q1                    16            6
skew q2                    13            4

work_mem='512MB':
               sort+group (s)  hashagg (s)
singleton q1               12           12
singleton q2                9            7
even q1                    14            7
even q2                    10            4
skew q1                    16            6
skew q2                    12            4

work_mem='2GB':
               sort+group (s)  hashagg (s)
singleton q1                9           12
singleton q2                6            6
even q1                     8            7
even q2                     6            4
skew q1                     7            6
skew q2                     5            4

These numbers are great news for disk-based hashagg. It is the same or
better than sort+group in 41 of the 42 cases above; the one exception is
singleton q1 at work_mem='2GB' (again, this example doesn't actually go
to disk, so those numbers may come out differently).
Also, the numbers are remarkably stable for varying work_mem for both
plans. That means that it doesn't cost much to keep a lower work_mem as
long as your system has plenty of memory.
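
For anyone sanity-checking the comparison, a small sketch that tallies the timings reported in the tables above. The numbers are copied from this message; the variable names are made up for illustration.

```python
# Timings in seconds, copied from the tables above as (sort+group, hashagg).
# Row order per work_mem: singleton q1, singleton q2, even q1, even q2,
# skew q1, skew q2.
TIMINGS = {
    "1MB":   [(12, 10), (8, 7), (14, 7), (10, 5), (22, 6), (16, 4)],
    "4MB":   [(12, 11), (8, 6), (12, 7), (9, 5), (19, 6), (13, 3)],
    "16MB":  [(12, 11), (8, 7), (14, 7), (10, 5), (15, 6), (12, 4)],
    "64MB":  [(13, 12), (9, 8), (14, 8), (10, 5), (17, 6), (13, 4)],
    "256MB": [(12, 12), (9, 8), (14, 7), (11, 4), (16, 6), (13, 4)],
    "512MB": [(12, 12), (9, 7), (14, 7), (10, 4), (16, 6), (12, 4)],
    "2GB":   [(9, 12), (6, 6), (8, 7), (6, 4), (7, 6), (5, 4)],
}

# Count how often hashagg beats, ties, or loses to sort+group.
faster = tied = slower = 0
for rows in TIMINGS.values():
    for sort_s, hash_s in rows:
        if hash_s < sort_s:
            faster += 1
        elif hash_s == sort_s:
            tied += 1
        else:
            slower += 1

print(f"hashagg faster: {faster}, tied: {tied}, slower: {slower}")

# Sum of the six queries per work_mem, to compare how much each plan varies.
sort_totals = {wm: sum(s for s, _ in rows) for wm, rows in TIMINGS.items()}
hash_totals = {wm: sum(h for _, h in rows) for wm, rows in TIMINGS.items()}
for wm in TIMINGS:
    print(f"work_mem={wm}: sort+group {sort_totals[wm]}s, hashagg {hash_totals[wm]}s")
```

The per-work_mem totals are only a coarse summary of whole-second timings, but they give a quick way to see how much each plan moves as work_mem changes.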
Do others have similar numbers? I'm quite surprised at how little
work_mem seems to matter for these plans (HashJoin might be a different
story though). I feel like I made a mistake -- can someone please do a
sanity check on my numbers?
Regards,
Jeff Davis
Attachment | Content-Type | Size |
---|---|---|
hashagg_test_data.sql | application/sql | 1.2 KB |
hashagg_test.sql | application/sql | 1.8 KB |
hashagg_test_perf.sql | application/sql | 1.3 KB |