From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Julien Tachoires <julmon(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Compress ReorderBuffer spill files using LZ4
Date: 2024-09-23 16:13:23
Message-ID: 89871cc8-f46c-4364-b6d9-5d6e93448339@vondra.me
Lists: pgsql-hackers
Hi,
I've spent a bit more time on this, mostly running tests to get a better
idea of the practical benefits.
Firstly, I think there's a bug in ReorderBufferCompress() - it's legal
for pglz_compress() to return -1. This can happen if the data is not
compressible and the result would not fit into the output buffer. The
code can't just do elog(ERROR) in this case; it needs to handle that by
storing the raw data. The attached fixup patch makes this work for me -
I'm not claiming this is the best way to handle it, but it works.
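Roughly what I mean is something like this (just a sketch of the
fallback idea, not the attached patch verbatim - the function/variable
names are placeholders, and it assumes "dest" has room for at least
"rawlen" bytes, so the raw copy always fits):

/*
 * Sketch of the fallback, not the actual fixup patch - names other
 * than pglz_compress() and PGLZ_strategy_default are placeholders.
 */
#include "postgres.h"
#include "common/pg_lzcompress.h"

static int32
compress_or_copy(const char *raw, int32 rawlen, char *dest, bool *compressed)
{
    /* returns -1 if the data does not compress into the output buffer */
    int32       clen = pglz_compress(raw, rawlen, dest, PGLZ_strategy_default);

    if (clen < 0)
    {
        /* incompressible - store the raw data instead of elog(ERROR) */
        memcpy(dest, raw, rawlen);
        clen = rawlen;
        *compressed = false;
    }
    else
        *compressed = true;

    return clen;
}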
FWIW I find it strange that the tests included in the patch did not
trigger this. That probably means the tests are not quite sufficient.
Now, to the testing. Attached are two scripts, testing different cases:
test-columns.sh - Table with a variable number of 'float8' columns.
test-toast.sh - Table with a single text column.
The script always sets up a publication/subscription on two instances,
generates a certain amount of data (~1GB for columns, ~3.2GB for TOAST),
waits for it to be replicated to the replica, and measures how much data
was spilled to disk with the different compression methods (off, pglz
and lz4). There are a couple more metrics, but those are irrelevant here.
For the "column" test, it looks like this (this is in MB):
     rows   columns   distribution    off   pglz   lz4
   ======================================================
   100000      1000   compressible    778     20     9
                      random          778    778    16
   ------------------------------------------------------
  1000000       100   compressible    916    116    62
                      random          916    916    67
It's very clear that for the "compressible" data (which just copies the
same value into all columns), both pglz and lz4 can significantly reduce
the amount of data. For 1000 columns it's 778MB -> 20MB/9MB; for 100
columns it's a bit less efficient, but still good.
For the "random" data (where every column gets a random value, but rows
are copied), it's a very different story - pglz does not help at all,
while lz4 still massively reduces the amount of spilled data.
I think the explanation is very simple - for pglz we compress each row
on its own, with no concept of streaming/context. If a row is
compressible it works fine, but when the row is random, pglz can't
compress it at all. For lz4 this does not matter, because in the
streaming mode it still sees that the rows are just repeated, and so can
compress them efficiently.
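It's easy to demonstrate with liblz4 directly (a self-contained sketch,
not the patch itself - build with "cc demo.c -llz4"):

/*
 * Sketch demonstrating lz4 streaming compression of repeated rows.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <lz4.h>

#define ROW_SIZE    256
#define NROWS       4

int
main(void)
{
    LZ4_stream_t *stream = LZ4_createStream();

    /*
     * The rows have to stay in one contiguous buffer - the streaming
     * mode matches new input against the already-compressed history,
     * which must remain valid/unmodified.
     */
    char       *rows = malloc(ROW_SIZE * NROWS);
    char        out[LZ4_COMPRESSBOUND(ROW_SIZE)];
    int         i;

    /* one random (incompressible) row, copied into all the others */
    for (i = 0; i < ROW_SIZE; i++)
        rows[i] = rand() & 0xff;
    for (i = 1; i < NROWS; i++)
        memcpy(rows + i * ROW_SIZE, rows, ROW_SIZE);

    for (i = 0; i < NROWS; i++)
    {
        /* compress each row against the history of the earlier rows */
        int         clen = LZ4_compress_fast_continue(stream,
                                                      rows + i * ROW_SIZE,
                                                      out, ROW_SIZE,
                                                      (int) sizeof(out), 1);

        printf("row %d: %d -> %d bytes\n", i, ROW_SIZE, clen);
    }

    LZ4_freeStream(stream);
    free(rows);
    return 0;
}

The first row comes out slightly larger than the input, while the
copies shrink to a handful of bytes each, because the streaming mode
finds them in the history of already-compressed data - which is exactly
the "random" distribution above.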
For the TOAST test, the results look like this (again in MB):
  distribution   repeats   toast    off   pglz    lz4
  =====================================================
  compressible     10000   lz4       14      2      1
                           pglz      40      4      3
                    1000   lz4       32     16      9
                           pglz      54     17     10
  -----------------------------------------------------
  random           10000   lz4     3305   3305   3157
                           pglz    3305   3305   3157
                    1000   lz4     3166   3162   1580
                           pglz    3334   3326   1745
  -----------------------------------------------------
  random2          10000   lz4     3305   3305   3157
                           pglz    3305   3305   3158
                    1000   lz4     3160   3156   3010
                           pglz    3334   3326   3172
The "repeats" value means how long the string is - it's the number of
"md5" hashes added to the string. The number of rows is calculated to
keep the total amount of data the same. The "toast" column tracks what
compression was used for TOAST, I was wondering if it matters.
This time there are three data distributions - "compressible" means each
TOAST value is nicely compressible, "random" means each value is random
(not compressible) but the rows are just copies of the same value (so on
the whole there's a lot of redundancy), and "random2" means each row is
random and unique (so not compressible at all).
The table shows that with compressible TOAST values, compressing the
spill file is rather useless. The reason is that ReorderBufferCompress
is handling raw TOAST data, which is already compressed. Yes, it may
further reduce the amount of data, but it's negligible when compared to
the original amount of data.
For the random cases, the spill compression is rather pointless. Yes,
lz4 can reduce it to 1/2 for the shorter strings, but other than that
it's not very useful.
For a while I was thinking this approach is flawed, because it only sees
and compresses changes one by one, and that seeing a batch of changes
would improve this (e.g. we'd see the copied rows). But I realized lz4
already does that (in the streaming mode at least), and yet it does not
help very much. Presumably that depends on how large the context is - if
the random string is long enough, it won't help.
So maybe this approach is fine, and doing the compression at a lower
layer (for the whole file) would not really improve this. Even then we'd
only see a limited amount of data.
Maybe the right answer is that compression does not help cases where
most of the replicated data is TOAST, but it can help cases with wide
(and redundant) rows, or repeated rows - and that lz4 is clearly the
superior choice. (This also raises the question of whether we want to
support REORDER_BUFFER_STRAT_LZ4_REGULAR at all. I haven't looked into
this, but doesn't that behave more like pglz, i.e. without a context?)
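To illustrate what I'd expect the difference to be (again just a liblz4
sketch - whether REORDER_BUFFER_STRAT_LZ4_REGULAR actually maps to
LZ4_compress_default() is a guess on my part):

/*
 * Sketch comparing the stateless lz4 API (each row on its own, like
 * pglz) with the streaming one, on two identical random rows.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <lz4.h>

#define ROW_SIZE    256

int
main(void)
{
    LZ4_stream_t *stream = LZ4_createStream();
    char        rows[2][ROW_SIZE];  /* contiguous, so history stays valid */
    char        out[LZ4_COMPRESSBOUND(ROW_SIZE)];
    int         i, r0, r1, s0, s1;

    for (i = 0; i < ROW_SIZE; i++)
        rows[0][i] = rand() & 0xff;
    memcpy(rows[1], rows[0], ROW_SIZE);

    /* stateless: the second row compresses no better than the first */
    r0 = LZ4_compress_default(rows[0], out, ROW_SIZE, (int) sizeof(out));
    r1 = LZ4_compress_default(rows[1], out, ROW_SIZE, (int) sizeof(out));

    /* streaming: the second row is found in the 64kB history window */
    s0 = LZ4_compress_fast_continue(stream, rows[0], out, ROW_SIZE,
                                    (int) sizeof(out), 1);
    s1 = LZ4_compress_fast_continue(stream, rows[1], out, ROW_SIZE,
                                    (int) sizeof(out), 1);

    printf("regular:   %d / %d bytes\n", r0, r1);
    printf("streaming: %d / %d bytes\n", s0, s1);

    LZ4_freeStream(stream);
    return 0;
}

If the "regular" strategy really is the stateless API, it'd inherit
exactly the per-row limitation that makes pglz useless on the "random"
distribution.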
FWIW doing these tests made me realize how useful it would be to track
both the "raw" and "spilled" amounts, i.e. before/after compression.
That would make calculating the compression ratio much easier.
regards
--
Tomas Vondra
Attachments:

  test-toast.sh                    application/x-shellscript   3.8 KB
  test-columns.sh                  application/x-shellscript   3.5 KB
  results-columns-1727083454.csv   text/csv                    3.0 KB
  results-toast-1727088557.csv     text/csv                    3.2 KB
  0001-compression-fixup.patch     text/x-patch                2.5 KB