Re: [HACKERS] Custom compression methods

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2020-06-28 12:22:38
Message-ID: CAFiTN-v3soZKaYtR2ig43t4haJJx3FZXMd2hDaj3E1mtSjwJPg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 24, 2020 at 5:30 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Tue, Jun 23, 2020 at 4:00 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > https://postgr.es/m/20130621000900.GA12425%40alap2.anarazel.de is a
> > thread with more information / patches further along.
> >
> > I confused this patch with the approach in
> > https://www.postgresql.org/message-id/d8576096-76ba-487d-515b-44fdedba8bb5%402ndquadrant.com
> > sorry for that. It obviously still differs by not having lower space
> > overhead (by virtue of not having a 4 byte 'va_cmid', but no additional
> > space for two methods, and then 1 byte overhead for 256 more), but
> > that's not that fundamental a difference.
>
> Wait a minute. Are we saying there are three (3) dueling patches for
> adding an alternate TOAST algorithm? It seems like there is:
>
> This "custom compression methods" thread - vintage 2017 - Original
> code by Nikita Glukhov, later work by Ildus Kurbangaliev
> The "pluggable compression support" thread - vintage 2013 - Andres Freund
> The "pglz performance" thread - vintage 2019 - Petr Jelinek
>
> Anyone want to point to a FOURTH implementation of this feature?
>
> I guess the next thing to do is figure out which one is the best basis
> for further work.

I have gone through these 3 threads and here is a summary of what I
understand from them. Feel free to correct me if I have missed
something.

#1. Custom compression methods: Provides a mechanism to create/drop
compression methods backed by external libraries, and also a way to
set the compression method for columns/types. This approach has a
few complexities, listed below:

a. We need to maintain the dependencies between the column and the
compression method. The bigger issue is that, even after the
compression method of a column is changed, we still need to keep the
dependencies on the older compression methods, because some older
tuples might have been compressed with them.
b. Inside the compressed attribute, we need to maintain the
compression method so that we know how to decompress it. For this, we
use 2 bits from the raw_size of the compressed varlena header.
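
To make (b) concrete, here is a minimal standalone sketch of how a
method id could share a 32-bit raw-size word. It is not taken from any
of the patches: the toy_* names, the CM_* ids and the choice of the top
2 bits are assumptions for illustration only. It relies on a varlena
being limited to 1GB, so the raw size fits in 30 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: top 2 bits = method id, low 30 bits = raw size. */
#define TOY_CM_BITS       2
#define TOY_RAWSIZE_BITS  (32 - TOY_CM_BITS)
#define TOY_RAWSIZE_MASK  ((UINT32_C(1) << TOY_RAWSIZE_BITS) - 1)

/* Hypothetical method ids; 2 bits leave room for four of them. */
typedef enum
{
    TOY_CM_PGLZ = 0,
    TOY_CM_LZ4 = 1
} ToyCompressionId;

/* Combine raw (uncompressed) size and method id into one 32-bit word. */
static inline uint32_t
toy_pack_info(uint32_t rawsize, uint32_t method)
{
    assert(rawsize <= TOY_RAWSIZE_MASK);
    assert(method < (UINT32_C(1) << TOY_CM_BITS));
    return (method << TOY_RAWSIZE_BITS) | rawsize;
}

/* Extract the raw size, ignoring the method bits. */
static inline uint32_t
toy_rawsize(uint32_t info)
{
    return info & TOY_RAWSIZE_MASK;
}

/* Extract the method id, so the reader knows which decompressor to call. */
static inline uint32_t
toy_method(uint32_t info)
{
    return info >> TOY_RAWSIZE_BITS;
}
```

With such a layout the on-disk size of the header does not change, but
only four methods can be distinguished, which is why the number of
bits taken matters for the proposals below.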

#2. pglz performance: Along with pglz, this patch provides an
additional compression method using lz4. The new compression method
can be enabled/disabled at configure time or using SIGHUP. It uses
1 bit from the raw_size of the compressed varlena header to identify
the compression method (pglz or lz4).

#3. pluggable compression: This proposal is to replace the existing
pglz algorithm with either snappy or lz4, whichever is better. As per
the performance data[1], lz4 appears to be the winner in most of the
cases.
- This also provides an additional patch to plug in any compression method.
- This will also use 2 bits from the raw_size of the compressed
attribute for identifying the compression method.
- Provides an option to select the compression method using a GUC, but
the comments in the patch suggest removing the GUC. So it seems that
the GUC was used only for the POC.
- Honestly, I could not clearly understand from this patch set whether
it proposes to replace the existing compression method with the best
method (with the plugin provided just for performance testing) or
whether it actually proposes an option to have pluggable compression
methods.

IMHO, we can provide a solution based on #1 and #2, i.e. we can
provide a few of the best compression methods in core, and on top of
that, we can also provide a mechanism to create/drop external
compression methods.
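
To illustrate the shape of that idea, here is a rough standalone
sketch, not code from any of the patches. The toy_* functions, the
fixed-size table and the "zstd_ext" name used in the note below are
all hypothetical. Built-in methods occupy fixed slots and cannot be
dropped; external ones can be created and dropped, and a dropped slot
is only marked unused so that previously assigned ids stay stable.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_METHODS 8
#define TOY_NAME_LEN    32

typedef struct ToyMethod
{
    char name[TOY_NAME_LEN];
    bool builtin;   /* shipped in core, cannot be dropped */
    bool in_use;    /* false once an external method is dropped */
} ToyMethod;

/* Slots 0 and 1 are the hypothetical built-in methods. */
static ToyMethod toy_methods[TOY_MAX_METHODS] = {
    {"pglz", true, true},
    {"lz4", true, true},
};
static int toy_nmethods = 2;

/* Return the id of a live method with this name, or -1 if none. */
static int
toy_find_method(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < toy_nmethods; i++)
    {
        if (toy_methods[i].in_use && strcmp(toy_methods[i].name, name) == 0)
            return i;
    }
    return -1;
}

/* Register an external method; returns its new id, or -1 on failure. */
static int
toy_create_method(const char *name)
{
    if (toy_find_method(name) >= 0)
        return -1;              /* name already registered */
    if (toy_nmethods >= TOY_MAX_METHODS || strlen(name) >= TOY_NAME_LEN)
        return -1;              /* table full or name too long */

    ToyMethod *m = &toy_methods[toy_nmethods];
    strcpy(m->name, name);
    m->builtin = false;
    m->in_use = true;
    return toy_nmethods++;
}

/* Drop an external method; built-in methods are refused. */
static bool
toy_drop_method(const char *name)
{
    int id = toy_find_method(name);

    if (id < 0 || toy_methods[id].builtin)
        return false;
    toy_methods[id].in_use = false;   /* keep the slot so ids are not reused */
    return true;
}
```

For example, toy_create_method("zstd_ext") would hand out id 2, while
toy_drop_method("pglz") would be refused. A real implementation would
also have to deal with the dependency problem described in #1(a)
before allowing a drop; this sketch ignores that.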

[1] https://www.postgresql.org/message-id/20130621000900.GA12425%40alap2.anarazel.de

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
