From: | Petr Jelinek <petr(at)2ndquadrant(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vladimir Leskov <vladimirlesk(at)yandex-team(dot)ru> |
Subject: | Re: pglz performance |
Date: | 2019-08-04 00:41:24 |
Message-ID: | d8576096-76ba-487d-515b-44fdedba8bb5@2ndquadrant.com |
Lists: | pgsql-hackers |
Hi,
On 02/08/2019 21:48, Tomas Vondra wrote:
> On Fri, Aug 02, 2019 at 11:20:03AM -0700, Andres Freund wrote:
>
>>
>>> Another question is whether we'd actually want to include the code in
>>> core directly, or use system libraries (and if some packagers might
>>> decide to disable that, for whatever reason).
>>
>> I'd personally say we should have an included version, and a
>> --with-system-... flag that uses the system one.
>>
>
> OK. I'd say to require a system library, but that's a minor detail.
>
Same here.
Just so that we don't idly talk, what do you think about the attached?
It:
- adds new GUC compression_algorithm with possible values of pglz
(default) and lz4 (if lz4 is compiled in), requires SIGHUP
- adds --with-lz4 configure option (default yes, so the configure option
is actually --without-lz4) that enables lz4; it uses the system library
- uses the compression_algorithm for both TOAST and WAL compression (if on)
- supports slicing for lz4 as well (pglz was already supported)
- supports reading old TOAST values
- adds 1 byte header to the compressed data where we currently store the
algorithm kind, that leaves us with 254 more to add :) (that's an extra
overhead compared to the current state)
- changes the rawsize in TOAST header to 31 bits via bit packing
- uses the extra bit to differentiate between old and new format
- supports reading from a table whose rows were stored with different
algorithms (so that the GUC itself can be freely changed)
Simple docs and a TAP test included.
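To make the header changes in the list above concrete, here is a minimal sketch of the bit packing. It is not code from the patch: the macro and function names are hypothetical, and using the top bit of the 32-bit word as the old/new format flag is an assumption (the list only says one bit is taken from rawsize). It also assumes old-format values never have that bit set and are always pglz.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the patch's actual macros may differ. */
#define RAWSIZE_BITS    31
#define RAWSIZE_MASK    ((UINT32_C(1) << RAWSIZE_BITS) - 1)  /* low 31 bits */
#define NEW_FORMAT_BIT  (UINT32_C(1) << RAWSIZE_BITS)        /* assumed: top bit */

/* One-byte algorithm identifier stored in front of the compressed data. */
typedef enum
{
    ALGO_PGLZ = 0,
    ALGO_LZ4 = 1
} CompressionAlgo;

/* Pack the raw size and the format flag into one 32-bit header word. */
static uint32_t
pack_header(uint32_t rawsize, int new_format)
{
    assert(rawsize <= RAWSIZE_MASK);    /* must fit in 31 bits */
    return rawsize | (new_format ? NEW_FORMAT_BIT : 0);
}

static uint32_t
header_rawsize(uint32_t word)
{
    return word & RAWSIZE_MASK;
}

static int
header_is_new_format(uint32_t word)
{
    return (word & NEW_FORMAT_BIT) != 0;
}

/*
 * Old-format values carry no algorithm byte and are treated as pglz;
 * new-format values store the algorithm in the first payload byte.
 */
static CompressionAlgo
value_algorithm(uint32_t word, const unsigned char *payload)
{
    if (!header_is_new_format(word))
        return ALGO_PGLZ;
    return (CompressionAlgo) payload[0];
}
```

With a layout like this, a reader can decide per value which decompressor to call, which is what lets rows written under different GUC settings coexist in one table.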
I did some basic performance testing (it's not really my thing though,
so I would appreciate it if somebody did more).
I get about 7x perf improvement on data load with lz4 compared to pglz
on my dataset but, strangely, only a tiny decompression improvement. Perhaps
more importantly, I also ran before-patch and after-patch tests with pglz,
and the performance difference with my data set was <1%.
Note that this will just link against lz4, it does not add lz4 into
PostgreSQL code-base.
The issues I know of:
- the pg_decompress function really ought to throw an error in the default
branch, but that file is also used in front-end code, so I'm not sure how to
do that
- the TAP test probably does not work with all possible configurations
(but that's why it needs to be set in PG_TEST_EXTRA like for example ssl)
- we don't really have any automated test for reading old TOAST format,
no idea how to do that
- I expect my changes to configure.in are not the greatest as I have
pretty much zero experience with autoconf
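On the first issue, one possible approach (a sketch, not what the attached patch does) is to have the default branch return an error code and leave the reporting to the caller, so the shared file needs neither backend nor front-end error machinery. The function names below are hypothetical, and the decompressor is a stand-in that just copies bytes; real code would call pglz_decompress() or LZ4_decompress_safe() in the respective branches.

```c
#include <stddef.h>
#include <string.h>

typedef enum
{
    ALGO_PGLZ = 0,
    ALGO_LZ4 = 1
} CompressionAlgo;

/* Stand-in for a real decompressor: copies the input unchanged. */
static int
stub_decompress(const char *src, size_t srclen, char *dst, size_t dstcap)
{
    if (srclen > dstcap)
        return -1;
    memcpy(dst, src, srclen);
    return (int) srclen;
}

/*
 * Returns the number of bytes written, or -1 on failure. An unknown
 * algorithm byte is reported the same way as corrupt data, so the
 * caller decides how to raise the error.
 */
static int
decompress_dispatch(unsigned char algo, const char *src, size_t srclen,
                    char *dst, size_t dstcap)
{
    switch (algo)
    {
        case ALGO_PGLZ:     /* real code: pglz_decompress() */
        case ALGO_LZ4:      /* real code: LZ4_decompress_safe() */
            return stub_decompress(src, srclen, dst, dstcap);
        default:
            return -1;      /* unknown algorithm: let the caller report it */
    }
}
```

A backend caller could then turn a negative result into an error report, while a front-end caller prints a message and exits.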
--
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/
Attachment | Content-Type | Size |
---|---|---|
0001-Add-new-GUC-compression_algorithm.patch | text/x-patch | 34.7 KB |