From: | Nikhil Kumar Veldanda <veldanda(dot)nikhilkumar17(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: ZStandard (with dictionaries) compression support for TOAST compression |
Date: | 2025-04-25 15:15:26 |
Message-ID: | CAFAfj_H1cQQh1nDYa6+TuJozk-8Q013_6n8XboX9-xRqbmAZfw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Michael,
Thanks for the suggestions. I agree that we should first solve the
“last–free-bit” problem in varattrib_4b compression bits before
layering on any features. Below is the approach I’ve prototyped to
keep the header compact yet fully extensible, followed by a sketch of
the plain-ZSTD(no dict) patch that sits cleanly on top of it.
1. Minimal but extensible header
/* varatt_cmp_extended follows va_tcinfo when the upper two bits of
* va_tcinfo are 11. Compressed data starts immediately after
* ext_data. ext_hdr encodes both the compression algorithm and the
* byte-length of the algorithm-specific metadata.
*/
typedef struct varatt_cmp_extended
{
uint32 ext_hdr; /* [ meta_size:24 | cmpr_id:8 ] */
char ext_data[FLEXIBLE_ARRAY_MEMBER]; /* optional metadata */
} varatt_cmp_extended;
a. 24 bits for length → per-datum compression algorithm metadata is
capped at 16 MB, which is far more than any realistic compression
header.
b. 8 bits for algorithm id → up to 256 algorithms.
c. Zero-overhead when unused if an algorithm needs no per-datum
metadata (e.g., ZSTD-nodict),
2. Algorithm registry
/*
* TOAST compression methods enumeration.
*
* Each entry defines:
* - NAME : identifier for the compression algorithm
* - VALUE : numeric enum value
* - METADATA type: struct type holding extra info (void when none)
*
* The INVALID entry is a sentinel and must remain last.
*/
#define TOAST_COMPRESSION_LIST \
X(PGLZ, 0, void) /* existing */ \
X(LZ4, 1, void) /* existing */ \
X(ZSTD_NODICT, 2, void) /* new, no metadata */ \
X(ZSTD_DICT, 3, zstd_dict_meta) /* new, needs dict_id */ \
X(INVALID, 4, void) /* sentinel */
typedef enum ToastCompressionId
{
#define X(name,val,meta) TOAST_##name##_COMPRESSION_ID = val,
TOAST_COMPRESSION_LIST
#undef X
} ToastCompressionId;
/* Example of an algorithm-specific metadata block */
typedef struct
{
uint32 dict_id; /* dictionary Oid */
} zstd_dict_meta;
3. Resulting on-disk layouts for zstd
ZSTD no dict: datum ondisk layout:
+----------------------------------+
| va_header (uint32) |
+----------------------------------+
| va_tcinfo (uint32) | (11 in top two bits specify extended)
+----------------------------------+
| ext_hdr (uint32) | <-- [ meta size:24 bits |
compression id:8 bits ]
+----------------------------------+
| Compressed bytes … | <-- zstd (no dictionary)
+----------------------------------+
ZSTD dict: datum ondisk layout
+----------------------------------+
| va_header (uint32) |
+----------------------------------+
| va_tcinfo (uint32) |
+----------------------------------+
| ext_hdr (uint32) | <-- [ meta size:24 bits |
compression id:8 bits ]
+----------------------------------+
| dict_id (uint32) | <-- zstd_dict_meta
+----------------------------------+
| Compressed bytes … | <-- zstd (dictionary)
+----------------------------------+
4. How does this fit?
Flexibility: Each new algorithm that needs extra metadata simply
defines its own struct and allocates varatt_cmp_extended in
setup_compression_info.
Storage: Everything in varatt_cmp_extended is copied to the datum,
immediately followed by the compressed payload.
Optional, pay-as-you-go metadata – only algorithms that need it pay for it.
Future-proof – new compression algorithms, requires any kind of
metadata like dictid or any other slot into the same ext_data
mechanism.
I’ve split the work into two patches for review:
v19-0001-varattrib_4b-design-proposal-to-make-it-extended.patch:
varattrib_4b extensibility – adds varatt_cmp_extended, enum plumbing,
and macros; behaviour unchanged.
v19-0002-zstd-nodict-support.patch: Plain ZSTD (non dict) support.
Please share your thoughts—and I’d love to hear feedback on the design. Thanks!
On Mon, Apr 21, 2025 at 12:02 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Fri, Apr 18, 2025 at 12:22:18PM -0400, Robert Haas wrote:
> > I think we could add plain-old zstd compression without really
> > tackling this issue, but if we are going to add dictionaries then I
> > think we might need to revisit the idea of preventing things from
> > leaking out of tables. What I can't quite remember at the moment is
> > how much of the problem was that it was going to be slow to force the
> > recompression, and how much of it was that we weren't sure we could
> > even find all the places in the code that might need such handling.
>
> FWIW, this point resonates here. There is one thing that we have to
> do anyway: we just have one bit left in the varlena headers as lz4 is
> using the one before last. So we have to make it extensible, even if
> it means that any compression method other than LZ4 and pglz would
> consume one more byte in its header by default. And I think that this
> has to happen at some point if we want flexibility in this area.
>
> + struct
> + {
> + uint32 va_header;
> + uint32 va_tcinfo;
> + uint32 va_cmp_alg;
> + uint32 va_cmp_dictid;
> + char va_data[FLEXIBLE_ARRAY_MEMBER];
> + } va_compressed_ext;
>
> Speaking of which, I am confused by this abstraction choice in
> varatt.h in the first patch. Are we sure that we are always going to
> have a dictionary attached to a compressed data set or even a
> va_cmp_alg? It seems to me that this could lead to a waste of data in
> some cases because these fields may not be required depending on the
> compression method used, as some fields may not care about these
> details. This kind of data should be made optional, on a per-field
> basis.
>
> One thing that I've been wondering is how it would be possible to make
> the area around varattrib_4b more readable while dealing with more
> extensibility. It would be a good occasion to improve that, even if
> I'm hand-waving here currently and that the majority of this code is
> old enough to vote, with few modifications across the years.
>
> The second thing that I'd love to see on top of the addition of the
> extensibility is adding plain compression support for zstd, with
> nothing fancy, just the compression and decompression bits. I've done
> quite a few benchmarks with the two, and results kind of point in the
> direction that zstd is more efficient than lz4 overall. Don't take me
> wrong: lz4 can be better in some workloads as it can consume less CPU
> than zstd while compressing less. However, a comparison of ratios
> like (compression rate / cpu used) has always led me to see zstd as
> superior in a large number of cases. lz4 is still very good if you
> are CPU-bound and don't care about the extra space required. Both are
> three classes better than pglz.
>
> Once we have these three points incrementally built-in together (the
> last bit extensibility, the potential varatt.h refactoring and the
> zstd support), there may be a point in having support for more
> advanced options with the compression methods in the shape of dicts or
> more requirements linked to other compression methods, but I think the
> topic is complex enough that we should make sure that these basics are
> implemented in a way sane enough so as we'd be able to extend them
> with all the use cases in mind.
> --
> Michael
--
Nikhil Veldanda
--
Nikhil Veldanda
Attachment | Content-Type | Size |
---|---|---|
v19-0002-zstd-nodict-support.patch | application/octet-stream | 23.8 KB |
v19-0001-varattrib_4b-design-proposal-to-make-it-extended.patch | application/octet-stream | 18.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Nikita Malakhov | 2025-04-25 15:16:11 | Re: Introduce some randomness to autovacuum |
Previous Message | Andrei Lepikhov | 2025-04-25 15:13:26 | Re: MergeAppend could consider sorting cheapest child path |