From: | Nikhil Kumar Veldanda <veldanda(dot)nikhilkumar17(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: ZStandard (with dictionaries) compression support for TOAST compression |
Date: | 2025-04-22 01:53:33 |
Message-ID: | CAFAfj_HWZ0Fe+2iKqY7qwra9CiXBjBks8QUGXXEnv=WQWQwE6A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Michael,
Thanks for the feedback and the suggested patch sequence. I completely
agree—we must minimize storage overhead when dictionaries aren’t used,
while ensuring varattrib_4b remains extensible enough to handle future
compression metadata beyond dictionary ID (for other algorithms). I’ll
explore design options that satisfy both goals and share my proposal.
Best regards,
Nikhil Veldanda
On Mon, Apr 21, 2025 at 12:02 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Fri, Apr 18, 2025 at 12:22:18PM -0400, Robert Haas wrote:
> > I think we could add plain-old zstd compression without really
> > tackling this issue, but if we are going to add dictionaries then I
> > think we might need to revisit the idea of preventing things from
> > leaking out of tables. What I can't quite remember at the moment is
> > how much of the problem was that it was going to be slow to force the
> > recompression, and how much of it was that we weren't sure we could
> > even find all the places in the code that might need such handling.
>
> FWIW, this point resonates here. There is one thing that we have to
> do anyway: we just have one bit left in the varlena headers as lz4 is
> using the one before last. So we have to make it extensible, even if
> it means that any compression method other than LZ4 and pglz would
> consume one more byte in its header by default. And I think that this
> has to happen at some point if we want flexibility in this area.
>
> + struct
> + {
> + uint32 va_header;
> + uint32 va_tcinfo;
> + uint32 va_cmp_alg;
> + uint32 va_cmp_dictid;
> + char va_data[FLEXIBLE_ARRAY_MEMBER];
> + } va_compressed_ext;
>
> Speaking of which, I am confused by this abstraction choice in
> varatt.h in the first patch. Are we sure that we are always going to
> have a dictionary attached to a compressed data set or even a
> va_cmp_alg? It seems to me that this could lead to a waste of data in
> some cases because these fields may not be required depending on the
> compression method used, as some fields may not care about these
> details. This kind of data should be made optional, on a per-field
> basis.
>
> One thing that I've been wondering is how it would be possible to make
> the area around varattrib_4b more readable while dealing with more
> extensibility. It would be a good occasion to improve that, even if
> I'm hand-waving here currently and that the majority of this code is
> old enough to vote, with few modifications across the years.
>
> The second thing that I'd love to see on top of the addition of the
> extensibility is adding plain compression support for zstd, with
> nothing fancy, just the compression and decompression bits. I've done
> quite a few benchmarks with the two, and results kind of point in the
> direction that zstd is more efficient than lz4 overall. Don't take me
> wrong: lz4 can be better in some workloads as it can consume less CPU
> than zstd while compressing less. However, a comparison of ratios
> like (compression rate / cpu used) has always led me to see zstd as
> superior in a large number of cases. lz4 is still very good if you
> are CPU-bound and don't care about the extra space required. Both are
> three classes better than pglz.
>
> Once we have these three points incrementally built-in together (the
> last bit extensibility, the potential varatt.h refactoring and the
> zstd support), there may be a point in having support for more
> advanced options with the compression methods in the shape of dicts or
> more requirements linked to other compression methods, but I think the
> topic is complex enough that we should make sure that these basics are
> implemented in a way sane enough so as we'd be able to extend them
> with all the use cases in mind.
> --
> Michael
From | Date | Subject | |
---|---|---|---|
Next Message | Tatsuo Ishii | 2025-04-22 02:11:39 | Recent CFbot failure |
Previous Message | Hayato Kuroda (Fujitsu) | 2025-04-22 01:25:02 | RE: Regression test fails when 1) old PG is installed and 2) meson/ninja build is used |