Re: ZStandard (with dictionaries) compression support for TOAST compression

From: Nikhil Kumar Veldanda <veldanda(dot)nikhilkumar17(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ZStandard (with dictionaries) compression support for TOAST compression
Date: 2025-04-25 15:15:26
Message-ID: CAFAfj_H1cQQh1nDYa6+TuJozk-8Q013_6n8XboX9-xRqbmAZfw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Michael,

Thanks for the suggestions. I agree that we should first solve the
“last–free-bit” problem in varattrib_4b compression bits before
layering on any features. Below is the approach I’ve prototyped to
keep the header compact yet fully extensible, followed by a sketch of
the plain-ZSTD(no dict) patch that sits cleanly on top of it.

1. Minimal but extensible header

/* varatt_cmp_extended follows va_tcinfo when the upper two bits of
* va_tcinfo are 11. Compressed data starts immediately after
* ext_data. ext_hdr encodes both the compression algorithm and the
* byte-length of the algorithm-specific metadata.
*/
typedef struct varatt_cmp_extended
{
uint32 ext_hdr; /* [ meta_size:24 | cmpr_id:8 ] */
char ext_data[FLEXIBLE_ARRAY_MEMBER]; /* optional metadata */
} varatt_cmp_extended;

a. 24 bits for length → per-datum compression algorithm metadata is
capped at 16 MB, which is far more than any realistic compression
header.
b. 8 bits for algorithm id → up to 256 algorithms.
c. Zero-overhead when unused if an algorithm needs no per-datum
metadata (e.g., ZSTD-nodict),

2. Algorithm registry
/*
* TOAST compression methods enumeration.
*
* Each entry defines:
* - NAME : identifier for the compression algorithm
* - VALUE : numeric enum value
* - METADATA type: struct type holding extra info (void when none)
*
* The INVALID entry is a sentinel and must remain last.
*/
#define TOAST_COMPRESSION_LIST \
X(PGLZ, 0, void) /* existing */ \
X(LZ4, 1, void) /* existing */ \
X(ZSTD_NODICT, 2, void) /* new, no metadata */ \
X(ZSTD_DICT, 3, zstd_dict_meta) /* new, needs dict_id */ \
X(INVALID, 4, void) /* sentinel */

typedef enum ToastCompressionId
{
#define X(name,val,meta) TOAST_##name##_COMPRESSION_ID = val,
TOAST_COMPRESSION_LIST
#undef X
} ToastCompressionId;

/* Example of an algorithm-specific metadata block */
typedef struct
{
uint32 dict_id; /* dictionary Oid */
} zstd_dict_meta;

3. Resulting on-disk layouts for zstd

ZSTD no dict: datum ondisk layout:
+----------------------------------+
| va_header (uint32) |
+----------------------------------+
| va_tcinfo (uint32) | (11 in top two bits specify extended)
+----------------------------------+
| ext_hdr (uint32) | <-- [ meta size:24 bits |
compression id:8 bits ]
+----------------------------------+
| Compressed bytes … | <-- zstd (no dictionary)
+----------------------------------+

ZSTD dict: datum ondisk layout
+----------------------------------+
| va_header (uint32) |
+----------------------------------+
| va_tcinfo (uint32) |
+----------------------------------+
| ext_hdr (uint32) | <-- [ meta size:24 bits |
compression id:8 bits ]
+----------------------------------+
| dict_id (uint32) | <-- zstd_dict_meta
+----------------------------------+
| Compressed bytes … | <-- zstd (dictionary)
+----------------------------------+

4. How does this fit?

Flexibility: Each new algorithm that needs extra metadata simply
defines its own struct and allocates varatt_cmp_extended in
setup_compression_info.
Storage: Everything in varatt_cmp_extended is copied to the datum,
immediately followed by the compressed payload.
Optional, pay-as-you-go metadata – only algorithms that need it pay for it.
Future-proof – new compression algorithms, requires any kind of
metadata like dictid or any other slot into the same ext_data
mechanism.

I’ve split the work into two patches for review:
v19-0001-varattrib_4b-design-proposal-to-make-it-extended.patch:
varattrib_4b extensibility – adds varatt_cmp_extended, enum plumbing,
and macros; behaviour unchanged.
v19-0002-zstd-nodict-support.patch: Plain ZSTD (non dict) support.

Please share your thoughts—and I’d love to hear feedback on the design. Thanks!

On Mon, Apr 21, 2025 at 12:02 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Fri, Apr 18, 2025 at 12:22:18PM -0400, Robert Haas wrote:
> > I think we could add plain-old zstd compression without really
> > tackling this issue, but if we are going to add dictionaries then I
> > think we might need to revisit the idea of preventing things from
> > leaking out of tables. What I can't quite remember at the moment is
> > how much of the problem was that it was going to be slow to force the
> > recompression, and how much of it was that we weren't sure we could
> > even find all the places in the code that might need such handling.
>
> FWIW, this point resonates here. There is one thing that we have to
> do anyway: we just have one bit left in the varlena headers as lz4 is
> using the one before last. So we have to make it extensible, even if
> it means that any compression method other than LZ4 and pglz would
> consume one more byte in its header by default. And I think that this
> has to happen at some point if we want flexibility in this area.
>
> + struct
> + {
> + uint32 va_header;
> + uint32 va_tcinfo;
> + uint32 va_cmp_alg;
> + uint32 va_cmp_dictid;
> + char va_data[FLEXIBLE_ARRAY_MEMBER];
> + } va_compressed_ext;
>
> Speaking of which, I am confused by this abstraction choice in
> varatt.h in the first patch. Are we sure that we are always going to
> have a dictionary attached to a compressed data set or even a
> va_cmp_alg? It seems to me that this could lead to a waste of data in
> some cases because these fields may not be required depending on the
> compression method used, as some fields may not care about these
> details. This kind of data should be made optional, on a per-field
> basis.
>
> One thing that I've been wondering is how it would be possible to make
> the area around varattrib_4b more readable while dealing with more
> extensibility. It would be a good occasion to improve that, even if
> I'm hand-waving here currently and that the majority of this code is
> old enough to vote, with few modifications across the years.
>
> The second thing that I'd love to see on top of the addition of the
> extensibility is adding plain compression support for zstd, with
> nothing fancy, just the compression and decompression bits. I've done
> quite a few benchmarks with the two, and results kind of point in the
> direction that zstd is more efficient than lz4 overall. Don't take me
> wrong: lz4 can be better in some workloads as it can consume less CPU
> than zstd while compressing less. However, a comparison of ratios
> like (compression rate / cpu used) has always led me to see zstd as
> superior in a large number of cases. lz4 is still very good if you
> are CPU-bound and don't care about the extra space required. Both are
> three classes better than pglz.
>
> Once we have these three points incrementally built-in together (the
> last bit extensibility, the potential varatt.h refactoring and the
> zstd support), there may be a point in having support for more
> advanced options with the compression methods in the shape of dicts or
> more requirements linked to other compression methods, but I think the
> topic is complex enough that we should make sure that these basics are
> implemented in a way sane enough so as we'd be able to extend them
> with all the use cases in mind.
> --
> Michael

--
Nikhil Veldanda

--
Nikhil Veldanda

Attachment Content-Type Size
v19-0002-zstd-nodict-support.patch application/octet-stream 23.8 KB
v19-0001-varattrib_4b-design-proposal-to-make-it-extended.patch application/octet-stream 18.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikita Malakhov 2025-04-25 15:16:11 Re: Introduce some randomness to autovacuum
Previous Message Andrei Lepikhov 2025-04-25 15:13:26 Re: MergeAppend could consider sorting cheapest child path