
Re: pglz performance

From:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vladimir Leskov <vladimirlesk(at)yandex-team(dot)ru>
Subject:	Re: pglz performance
Date:	2019-08-02 17:00:39
Message-ID:	20190802170039.o4pabnzm4xy3z7uj@development
Lists:	pgsql-hackers

On Fri, Aug 02, 2019 at 09:39:48AM -0700, Andres Freund wrote:
>Hi,
>
>On 2019-08-02 20:40:51 +0500, Andrey Borodin wrote:
>> We have some kind of "roadmap" of "extensible pglz". We plan to
>> provide an implementation for the November CF.
>
>I don't understand why it's a good idea to improve the compression side
>of pglz. There are plenty of other people who have spent a lot of time
>developing better compression algorithms.
>

Isn't it beneficial for existing systems, which will be stuck with pglz
even if we end up adding other algorithms?

>
>> Currently, pglz starts with an empty cache map: there are no prior 4k
>> bytes before the start. We can add an imaginary prefix to any data with
>> common substrings: this will enhance the compression ratio. It is hard
>> to decide on a training data set for this "common prefix". So we want
>> to produce an extension with an aggregate function which produces some
>> "adapted common prefix" from users' data. Then we can "reserve" a few
>> negative bytes for "decompression commands". Such a command can
>> instruct the database on which common prefix to use. But a system
>> command can also say "invoke decompression from extension".
>>
>> Thus, user will be able to train database compression on his data and
>> substitute pglz compression with custom compression method
>> seamlessly.
>>
>> This will make hard-chosen compression unneeded, but seems overly
>> hacky. But there will be no need to have lz4, zstd, brotli, lzma and
>> others in core. Why not provide e.g. "time series compression"? Or
>> "DNA compression"? Whatever gun user wants for his foot.
>
>I think this is way too complicated, and will not provide particularly
>much benefit for the majority of users.
>

I agree with this. I do see value in the feature, but probably not as a
drop-in replacement for the default compression algorithm. I'd compare
it to the "custom compression methods" patch that was submitted some
time ago.
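A rough illustration of the "imaginary prefix" idea from the quoted proposal, using zlib's preset-dictionary support (the `zdict` parameter of Python's `zlib` module) as a stand-in, since pglz itself exposes no such option. The dictionary contents and the helper names `compress`/`decompress` are hypothetical. The sketch only shows that a compressor whose window is pre-filled with bytes resembling the input can emit back-references from the very first byte.

```python
import zlib
from typing import Optional

# Hypothetical "common prefix": in the proposal it would be derived from the
# user's own data by an aggregate function; here it is simply hard-coded.
COMMON_PREFIX = b'{"status": "active", "country": "US", "created_at": "2019-08-'


def compress(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Compress data, optionally priming the LZ77 window with zdict."""
    if zdict is None:
        c = zlib.compressobj(level=9)
    else:
        # The dictionary acts as an "imaginary prefix": back-references may
        # point into it even though it is never stored in the output.
        c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(data) + c.flush()


def decompress(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Decompress; the same dictionary used for compression must be supplied."""
    d = zlib.decompressobj() if zdict is None else zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()


if __name__ == "__main__":
    row = b'{"status": "active", "country": "US", "created_at": "2019-08-02"}'
    plain = compress(row)
    primed = compress(row, COMMON_PREFIX)
    assert decompress(primed, COMMON_PREFIX) == row
    # Sizes: original, compressed without a prefix, compressed with a prefix.
    print(len(row), len(plain), len(primed))
```

The catch is that the exact same dictionary must be available at decompression time, so it has to be stored and identified alongside the data (the role the proposal assigns to the reserved "decompression command" bytes).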

>In fact, I'll argue that we should flat out reject any such patch until
>we have at least one decent default compression algorithm in core.
>You're trying to work around a poor compression algorithm with
>complicated dictionary improvements that require user interaction,
>will only work in a relatively small subset of the cases, and will very
>often increase compression times.
>

I wouldn't be so strict, I guess. But I do agree that an algorithm that
requires additional steps (training, ...) is unlikely to be a good
candidate for the default instance compression algorithm.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Re: pglz performance at 2019-08-02 16:39:48 from Andres Freund

Responses

Re: pglz performance at 2019-08-02 17:12:58 from Andres Freund
