Re: Compress ReorderBuffer spill files using LZ4

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Julien Tachoires <julmon(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Compress ReorderBuffer spill files using LZ4
Date: 2024-09-17 18:48:10
Message-ID: 8e550ead-1033-4105-a727-22d41fef0e59@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Julien,

Thanks for the last patch version and sorry for the delay. Here's a
quick review, I plan to do a bit of additional testing, hopefully
sometime this week.

Attached is v4, which is your v3 rebased to current master, with a
couple "review" commits, adding comments to relevant places, which I
find easier to follow. There's also a "pgindent" commit at the end,
showing some formatting issues, adding structs to typedefs, etc. This
needs to be applied to the earlier patches, I didn't want to have too
many pgindent commits.

v4-0002-review.patch
--------------------

1) I don't think we need to rename ReorderBufferDiskChange to
ReorderBufferDiskHeader. It seems we could easily just add a field to
ReorderBufferDiskChange, and keep using that, no? It's still just a
wrapper for ReorderBufferChange, even if it's compressed.

2) Likewise, I don't see the point in moving ReorderBufferDiskChange to
a different file. It seems like something that should remain private to
reorderbuffer.c. IMHO the correct thing would be to move the various
ReorderBuffer* funct from reorderbuffer_compress.c to reorderbuffer.c.
The compression is something the ReorderBuffer is responsible for, so
reorderbuffer.c is the right place for that.

3) That means the reorderbuffer_compress.c would have only the actual
compression code. But wouldn't it be better to have one file for each
compression algorithm, similar to src/fe_utils/astreamer_{pglz,lz4,...}?
Just a suggestion, though. Not sure.

4) logical_decoding_spill_compression moved a bit, next to the variables
for other logical_decoding GUCs. It's also missing the definition in a
header file, so I get a compiler warning, so add it to reorderbuffer.h.

5) Won't the code in ReorderBufferCompress() do palloc/pfree for each
piece of data we compress? That seems it might be pretty expensive, at
least for records that happen to be large (>8kB), because oversized
chunks are not cached in the context. IMHO this should use a single
buffer, similarly to what astreamer_lz4_compressor_content does.

6) I'm really confused by having "streaming" and "regular" compression.
I know lz4 supports different modes, but I'd expect only one of them
being really suitable for this, so why support both? But even more
importantly, ReorderBufferCompress() sets the strategy using

#define lz4_CanDoStreamingCompression(s) \
(s < (LZ4_RING_BUFFER_SIZE / 2))

But it does that with "s" being the "change size", and that can vary
wildly. Doesn't that means the strategy will change withing a single
file? Does that even make sense?

v4-0004-review.patch
--------------------

7) I'm not sure the "Fix spill_bytes counter" is actually a fix. Even
now (i.e. without compression) it tracks the size of transactions, not
the bytes written to disk exactly, because it doesn't include the bytes
for the size field, so maybe we should not change that ...

v4-0006-review.patch
--------------------

8) It seems a bit strange that we add lz4 first, and only later pglz.
IMHO it should be the other way around, as pglz is the default algorithm
we can rely to have, while lz4 etc. are optional.

One question I have is whether it might be better to compress stuff at a
lower layer - not in reorderbuffer.c, but for the whole file. But maybe
there are reasons why that would be difficult, I haven't tried that.

regards

--
Tomas Vondra

Attachment Content-Type Size
v4-0001-Compress-ReorderBuffer-spill-files-using-LZ4.patch text/x-patch 32.7 KB
v4-0002-review.patch text/x-patch 5.6 KB
v4-0003-Fix-spill_bytes-counter.patch text/x-patch 2.8 KB
v4-0004-review.patch text/x-patch 1.2 KB
v4-0005-Compress-ReorderBuffer-spill-files-using-PGLZ.patch text/x-patch 4.1 KB
v4-0006-review.patch text/x-patch 1.2 KB
v4-0007-Compress-ReorderBuffer-spill-files-using-ZSTD.patch text/x-patch 14.6 KB
v4-0008-review.patch text/x-patch 1.3 KB
v4-0009-Add-the-subscription-option-spill_compression.patch text/x-patch 66.0 KB
v4-0010-Add-ReorderBuffer-ondisk-compression-tests.patch text/x-patch 5.0 KB
v4-0011-pgindent.patch text/x-patch 21.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Florents Tselai 2024-09-17 19:03:17 Re: [PATCH] WIP: replace method for jsonpath
Previous Message David E. Wheeler 2024-09-17 18:40:57 Re: [PATCH] WIP: replace method for jsonpath