From: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vladimir Leskov <vladimirlesk(at)yandex-team(dot)ru> |
Subject: | Re: pglz performance |
Date: | 2019-05-15 10:06:22 |
Message-ID: | 446036FB-EA13-4F18-B593-70EDE0A35366@yandex-team.ru |
Lists: | pgsql-hackers |
> On 13 May 2019, at 12:14, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
>> Currently we test mostly decompression improvements against two WALs
>> and one data file taken from pgbench-generated database. Any
>> suggestion on more relevant data payloads are very welcome.
>
> Text strings made of random data and variable length?
Like a text corpus?
> For any test of
> this kind I think that it is good to focus on the performance of the
> low-level calls, even going as far as a simple C wrapper on top of the
> pglz APIs to test only the performance and not have extra PG-related
> overhead like palloc() which can be a barrier.
Our test_pglz extension measures only the time of the actual compression call: it does a warmup run first, and all allocations are done before the measurement starts.
> Focusing on strings of
> lengths of 1kB up to 16kB may be an idea of size, and it is important
> to keep the same uncompressed strings for performance comparison.
We intentionally avoid generated data, so the test files are committed to the git repo.
We also check that the decompressed data matches the source of the compression. All tests are run 5 times.
We use a PG extension only to simplify deploying the benchmarks to our PG clusters.
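As a rough illustration of the measurement scheme described above (warmup run, buffers allocated up front, round-trip check, 5 runs), here is a minimal sketch in C. It is not the test_pglz code: decompress_fn, identity_decompress and bench_ns_per_byte are hypothetical names, the stand-in "decompressor" just copies bytes, and averaging the runs is an assumption, since the way the 5 runs are combined is not stated here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature: returns the number of bytes written to dst. */
typedef size_t (*decompress_fn)(const unsigned char *src, size_t srclen,
                                unsigned char *dst, size_t dstcap);

/* Stand-in for a real decompressor (assumption): a plain copy. */
static size_t
identity_decompress(const unsigned char *src, size_t srclen,
                    unsigned char *dst, size_t dstcap)
{
    size_t n = srclen < dstcap ? srclen : dstcap;

    memcpy(dst, src, n);
    return n;
}

static double
elapsed_ns(const struct timespec *a, const struct timespec *b)
{
    return (double) (b->tv_sec - a->tv_sec) * 1e9 +
           (double) (b->tv_nsec - a->tv_nsec);
}

/*
 * Returns the average time per decompressed byte in ns, or -1.0 if the
 * output ever differs from the expected raw data. The caller allocates
 * src, dst and expected beforehand, so no allocation is timed.
 */
static double
bench_ns_per_byte(decompress_fn fn,
                  const unsigned char *src, size_t srclen,
                  unsigned char *dst,
                  const unsigned char *expected, size_t rawlen,
                  int runs)
{
    struct timespec t0, t1;
    double total = 0.0;

    assert(runs > 0 && rawlen > 0);

    /* Warmup run: not timed, but still checked for correctness. */
    if (fn(src, srclen, dst, rawlen) != rawlen ||
        memcmp(dst, expected, rawlen) != 0)
        return -1.0;

    for (int i = 0; i < runs; i++)
    {
        clock_gettime(CLOCK_MONOTONIC, &t0);
        size_t n = fn(src, srclen, dst, rawlen);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        /* Verification happens outside the timed region. */
        if (n != rawlen || memcmp(dst, expected, rawlen) != 0)
            return -1.0;
        total += elapsed_ns(&t0, &t1);
    }
    return total / ((double) runs * (double) rawlen);
}
```

Calling bench_ns_per_byte(identity_decompress, src, len, dst, src, len, 5) times five copies of a preallocated buffer; a real harness would pass the decompressor under test instead of the stand-in.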
Here are some test results.
Currently we test on 4 payloads:
1. A WAL from cluster initialization
2. Two WALs from pgbench -i -s 10
3. A data file taken from pgbench -i -s 10
We use these decompressors:
1. pglz_decompress_vanilla - taken from PG source code
2. pglz_decompress_hacked - uses sliced memcpy to imitate byte-by-byte pglz decompression
3. pglz_decompress_hacked4, pglz_decompress_hacked8, pglz_decompress_hackedX - use memcpy if the match is at least X bytes long. We need to determine the best X if this approach is used.
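To make the difference between these variants concrete, here is a sketch in C of the three ways to copy one back-reference (a match of len bytes starting off bytes behind the output position). This is my reading of the descriptions above, not the code of the patch: the function names are hypothetical, and the assumption for the hackedX variants is that short matches fall back to the byte-by-byte loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Byte-by-byte copy. Source and destination may overlap (off < len),
 * which is how LZ-style formats encode repeated patterns.
 */
static unsigned char *
copy_match_bytewise(unsigned char *dp, size_t off, size_t len)
{
    while (len--)
    {
        *dp = *(dp - off);
        dp++;
    }
    return dp;
}

/*
 * "Sliced" memcpy: copy in non-overlapping chunks. After each chunk the
 * repeated pattern behind dp is twice as long, so off is doubled.
 */
static unsigned char *
copy_match_sliced(unsigned char *dp, size_t off, size_t len)
{
    assert(off > 0);

    while (off < len)
    {
        memcpy(dp, dp - off, off);  /* [dp-off, dp) -> [dp, dp+off) */
        len -= off;
        dp += off;
        off += off;
    }
    memcpy(dp, dp - off, len);      /* len <= off here: no overlap */
    return dp + len;
}

/* hackedX idea (assumed): use memcpy only for matches of at least x bytes. */
static unsigned char *
copy_match_threshold(unsigned char *dp, size_t off, size_t len, size_t x)
{
    if (len >= x)
        return copy_match_sliced(dp, off, len);
    return copy_match_bytewise(dp, off, len);
}
```

All three produce the same output for the same (off, len); only the copying strategy differs, which is what the benchmark compares.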
I used three platforms:
1. Server XEONE5-2660 SM/SYS1027RN3RF/10S2.5/1U/2P (2*INTEL XEON E5-2660/16*DDR3ECCREG/10*SAS-2.5), running Ubuntu 14, PG 9.6.
2. Desktop Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz, Ubuntu 18, PG 12devel
3. Laptop MacBook Pro 15 2015, 2.2 GHz Core i7 (I7-4770HQ), MacOS, PG 12devel
Owners of AMD and ARM devices are welcome to run the tests too.
Server results (lower is better):
NOTICE: 00000: Time to decompress one byte in ns:
NOTICE: 00000: Payload 000000010000000000000001
NOTICE: 00000: Decompressor pglz_decompress_hacked result 0.647235
NOTICE: 00000: Decompressor pglz_decompress_hacked4 result 0.671029
NOTICE: 00000: Decompressor pglz_decompress_hacked8 result 0.699949
NOTICE: 00000: Decompressor pglz_decompress_hacked16 result 0.739586
NOTICE: 00000: Decompressor pglz_decompress_hacked32 result 0.787926
NOTICE: 00000: Decompressor pglz_decompress_vanilla result 1.147282
NOTICE: 00000: Payload 000000010000000000000006
NOTICE: 00000: Decompressor pglz_decompress_hacked result 0.201774
NOTICE: 00000: Decompressor pglz_decompress_hacked4 result 0.211859
NOTICE: 00000: Decompressor pglz_decompress_hacked8 result 0.212610
NOTICE: 00000: Decompressor pglz_decompress_hacked16 result 0.214601
NOTICE: 00000: Decompressor pglz_decompress_hacked32 result 0.221813
NOTICE: 00000: Decompressor pglz_decompress_vanilla result 0.706005
NOTICE: 00000: Payload 000000010000000000000008
NOTICE: 00000: Decompressor pglz_decompress_hacked result 1.370132
NOTICE: 00000: Decompressor pglz_decompress_hacked4 result 1.388991
NOTICE: 00000: Decompressor pglz_decompress_hacked8 result 1.388502
NOTICE: 00000: Decompressor pglz_decompress_hacked16 result 1.529455
NOTICE: 00000: Decompressor pglz_decompress_hacked32 result 1.520813
NOTICE: 00000: Decompressor pglz_decompress_vanilla result 1.433527
NOTICE: 00000: Payload 16398
NOTICE: 00000: Decompressor pglz_decompress_hacked result 0.606943
NOTICE: 00000: Decompressor pglz_decompress_hacked4 result 0.623044
NOTICE: 00000: Decompressor pglz_decompress_hacked8 result 0.624118
NOTICE: 00000: Decompressor pglz_decompress_hacked16 result 0.620987
NOTICE: 00000: Decompressor pglz_decompress_hacked32 result 0.621183
NOTICE: 00000: Decompressor pglz_decompress_vanilla result 1.365318
Comment: pglz_decompress_hacked is unconditionally optimal. In most cases it is about 2x faster than the current implementation.
On 000000010000000000000008 it is only marginally better. pglz_decompress_hacked8 is a few percent worse than pglz_decompress_hacked.
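For reference, the speedup figures can be recomputed from the Server numbers above as the ratio of the vanilla time to the hacked time. A small C sketch follows; the struct and function names are just for illustration, and the values are copied from the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct result
{
    const char *payload;
    double      hacked_ns;   /* pglz_decompress_hacked, ns per byte */
    double      vanilla_ns;  /* pglz_decompress_vanilla, ns per byte */
};

/* Server results, copied from the NOTICE output above. */
static const struct result server_results[] = {
    {"000000010000000000000001", 0.647235, 1.147282},
    {"000000010000000000000006", 0.201774, 0.706005},
    {"000000010000000000000008", 1.370132, 1.433527},
    {"16398",                    0.606943, 1.365318},
};

/* Speedup of hacked over vanilla: values above 1 mean hacked is faster. */
static double
speedup(const struct result *r)
{
    assert(r->hacked_ns > 0.0);
    return r->vanilla_ns / r->hacked_ns;
}

static void
print_speedups(void)
{
    size_t n = sizeof(server_results) / sizeof(server_results[0]);

    for (size_t i = 0; i < n; i++)
        printf("%-26s %.2fx\n", server_results[i].payload,
               speedup(&server_results[i]));
}
```

The ratios come out at roughly 1.8x, 3.5x, 1.05x and 2.25x for the four payloads, in the order listed.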
Desktop results:
NOTICE: Time to decompress one byte in ns:
NOTICE: Payload 000000010000000000000001
NOTICE: Decompressor pglz_decompress_hacked result 0.396454
NOTICE: Decompressor pglz_decompress_hacked4 result 0.429249
NOTICE: Decompressor pglz_decompress_hacked8 result 0.436413
NOTICE: Decompressor pglz_decompress_hacked16 result 0.478077
NOTICE: Decompressor pglz_decompress_hacked32 result 0.491488
NOTICE: Decompressor pglz_decompress_vanilla result 0.695527
NOTICE: Payload 000000010000000000000006
NOTICE: Decompressor pglz_decompress_hacked result 0.110710
NOTICE: Decompressor pglz_decompress_hacked4 result 0.115669
NOTICE: Decompressor pglz_decompress_hacked8 result 0.127637
NOTICE: Decompressor pglz_decompress_hacked16 result 0.120544
NOTICE: Decompressor pglz_decompress_hacked32 result 0.117981
NOTICE: Decompressor pglz_decompress_vanilla result 0.399446
NOTICE: Payload 000000010000000000000008
NOTICE: Decompressor pglz_decompress_hacked result 0.647402
NOTICE: Decompressor pglz_decompress_hacked4 result 0.691891
NOTICE: Decompressor pglz_decompress_hacked8 result 0.693834
NOTICE: Decompressor pglz_decompress_hacked16 result 0.776815
NOTICE: Decompressor pglz_decompress_hacked32 result 0.777960
NOTICE: Decompressor pglz_decompress_vanilla result 0.721192
NOTICE: Payload 16398
NOTICE: Decompressor pglz_decompress_hacked result 0.337654
NOTICE: Decompressor pglz_decompress_hacked4 result 0.355452
NOTICE: Decompressor pglz_decompress_hacked8 result 0.351224
NOTICE: Decompressor pglz_decompress_hacked16 result 0.362548
NOTICE: Decompressor pglz_decompress_hacked32 result 0.356456
NOTICE: Decompressor pglz_decompress_vanilla result 0.837042
Comment: same conclusions as for the Server results.
Laptop results:
NOTICE: Time to decompress one byte in ns:
NOTICE: Payload 000000010000000000000001
NOTICE: Decompressor pglz_decompress_hacked result 0.661469
NOTICE: Decompressor pglz_decompress_hacked4 result 0.638366
NOTICE: Decompressor pglz_decompress_hacked8 result 0.664377
NOTICE: Decompressor pglz_decompress_hacked16 result 0.696135
NOTICE: Decompressor pglz_decompress_hacked32 result 0.634825
NOTICE: Decompressor pglz_decompress_vanilla result 0.676560
NOTICE: Payload 000000010000000000000006
NOTICE: Decompressor pglz_decompress_hacked result 0.213921
NOTICE: Decompressor pglz_decompress_hacked4 result 0.224864
NOTICE: Decompressor pglz_decompress_hacked8 result 0.229394
NOTICE: Decompressor pglz_decompress_hacked16 result 0.218141
NOTICE: Decompressor pglz_decompress_hacked32 result 0.220954
NOTICE: Decompressor pglz_decompress_vanilla result 0.242412
NOTICE: Payload 000000010000000000000008
NOTICE: Decompressor pglz_decompress_hacked result 1.053417
NOTICE: Decompressor pglz_decompress_hacked4 result 1.063704
NOTICE: Decompressor pglz_decompress_hacked8 result 1.007211
NOTICE: Decompressor pglz_decompress_hacked16 result 1.145089
NOTICE: Decompressor pglz_decompress_hacked32 result 1.079702
NOTICE: Decompressor pglz_decompress_vanilla result 1.051557
NOTICE: Payload 16398
NOTICE: Decompressor pglz_decompress_hacked result 0.251690
NOTICE: Decompressor pglz_decompress_hacked4 result 0.268125
NOTICE: Decompressor pglz_decompress_hacked8 result 0.269248
NOTICE: Decompressor pglz_decompress_hacked16 result 0.277880
NOTICE: Decompressor pglz_decompress_hacked32 result 0.270290
NOTICE: Decompressor pglz_decompress_vanilla result 0.705652
Comment: decompression time on WAL segments is statistically indistinguishable between the hacked and original versions. Hacked decompression of the data file is more than 2x faster.
We are going to try these tests on Cascade Lake processors too.
Best regards, Andrey Borodin.