From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Use generation context to speed up tuplesorts |
Date: | 2021-07-30 06:42:18 |
Message-ID: | CAApHDvoH4ASzsAOyHcxkuY01Qf++8JJ0paw+03dk+W25tQEcNQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hackers,
While reviewing and benchmarking 91e9e89dc (Make nodeSort.c use Datum
sorts for single column sorts), I noticed that we use a separate
memory context to store tuple data and we just reset when we're done
if the sort fits in memory, or we dump the tuples to disk in the same
order they're added and reset the context when it does not. There is
a little pfree() work going on via writetup_heap() which I think
possibly could just be removed to get some additional gains.
Anyway, this context that stores tuples uses the standard aset.c
allocator which has the usual power of 2 wastage and additional
overheads of freelists etc. I wondered how much faster it would go if
I set it to use a generation context instead of an aset.c one.
After running make installcheck to make the tenk1 table, running the
attached tuplesortbench script, I get this:
Master:
work_mem = '4MB';
Sort Method: external merge Disk: 2496kB
work_mem = '4GB';
Sort Method: quicksort Memory: 5541kB
Patched:
work_mem = '4MB';
Sort Method: quicksort Memory: 3197kB
So it seems to save quite a bit of memory getting away from rounding
up allocations to the next power of 2.
Performance-wise, there's some pretty good gains. (Results in TPS)
work_mem = '4GB';
Test master gen sort compare
Test1 317.2 665.6 210%
Test2 228.6 388.9 170%
Test3 207.4 330.7 159%
Test4 185.5 279.4 151%
Test5 292.2 563.9 193%
If I drop the work_mem down to standard the unpatched version does to
disk, but the patched version does not. The gains get a little
bigger.
work_mem = '4MB';
Test master gen sort compare
Test1 177.5 658.2 371%
Test2 149.7 385.2 257%
Test3 137.5 330.0 240%
Test4 129.0 275.1 213%
Test5 161.7 546.4 338%
The patch is just a simple 1-liner at the moment. I likely do need to
adjust what I'm passing as the blockSize to GenerationContextCreate().
Maybe a better number would be something that's calculated from
work_mem, e.g Min(ALLOCSET_DEFAULT_MAXSIZE, ((Size) work_mem) * 64))
so that we just allocate at most a 16th of work_mem per chunk, but not
bigger than 8MB. I don't think changing this will affect the
performance of the above very much.
David
Attachment | Content-Type | Size |
---|---|---|
tuplesortbench.sh.txt | text/plain | 1.3 KB |
generation_tuplesort.patch | application/octet-stream | 710 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | houzj.fnst@fujitsu.com | 2021-07-30 06:42:43 | RE: Added schema level support for publication. |
Previous Message | Greg Nancarrow | 2021-07-30 06:32:51 | Re: [HACKERS] logical decoding of two-phase transactions |