Re: pglz performance

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vladimir Leskov <vladimirlesk(at)yandex-team(dot)ru>
Subject: Re: pglz performance
Date: 2019-08-02 17:00:39
Message-ID: 20190802170039.o4pabnzm4xy3z7uj@development
Lists: pgsql-hackers

On Fri, Aug 02, 2019 at 09:39:48AM -0700, Andres Freund wrote:
>Hi,
>
>On 2019-08-02 20:40:51 +0500, Andrey Borodin wrote:
>> We have a kind of "roadmap" for "extensible pglz". We plan to
>> provide an implementation for the November CF.
>
>I don't understand why it's a good idea to improve the compression side
>of pglz. There are plenty of other people who have spent a lot of time
>developing better compression algorithms.
>

Isn't it beneficial for existing systems, which will be stuck with pglz
even if we end up adding other algorithms?

>
>> Currently, pglz starts with an empty cache map: there are no prior 4k
>> bytes before the start. We can add an imaginary prefix to any data with
>> common substrings; this will improve the compression ratio. It is hard
>> to decide on a training data set for this "common prefix", so we want
>> to produce an extension with an aggregate function that derives an
>> "adapted common prefix" from users' data. Then we can "reserve" a few
>> negative bytes for "decompression commands". Such a command can
>> tell the database which common prefix to use. A system command
>> could also say "invoke decompression from extension".
>>
>> Thus, users will be able to train database compression on their data
>> and seamlessly substitute pglz compression with a custom compression
>> method.
>>
>> This will make a hard-chosen compression method unnecessary, but it
>> seems overly hacky. On the other hand, there will be no need to have
>> lz4, zstd, brotli, lzma and others in core. Why not provide e.g. "time
>> series compression"? Or "DNA compression"? Whatever gun the user wants
>> for their foot.
>
>I think this is way too complicated, and will not provide particularly
>much benefit for the majority of users.
>

I agree with this. I do see value in the feature, but probably not as a
drop-in replacement for the default compression algorithm. I'd compare
it to the "custom compression methods" patch that was submitted some
time ago.

>In fact, I'll argue that we should flat out reject any such patch until
>we have at least one decent default compression algorithm in core.
>You're trying to work around a poor compression algorithm with
>complicated dictionary improvements that require user interaction,
>will only work in a relatively small subset of cases, and will very
>often increase compression times.
>

I wouldn't be so strict, I guess. But I do agree that an algorithm that
requires additional steps (training, ...) is unlikely to be a good
candidate for the default instance compression algorithm.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
