From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Nikhil Kumar Veldanda <veldanda(dot)nikhilkumar17(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: ZStandard (with dictionaries) compression support for TOAST compression |
Date: | 2025-04-23 15:59:26 |
Message-ID: | CA+TgmoY8uCKFvVYDQsLbfhY3JYrduXE7WHi75hmfTWxjWioaaw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Apr 21, 2025 at 8:52 PM Nikhil Kumar Veldanda
<veldanda(dot)nikhilkumar17(at)gmail(dot)com> wrote:
> After reviewing the email thread you attached on previous response, I
> identified a natural choke point for both inserts and updates: the
> call to "heap_toast_insert_or_update" inside
> heap_prepare_insert/heap_update. In the current master branch, that
> function only runs when HeapTupleHasExternal is true; my patch extends
> it to HeapTupleHasVarWidth tuples as well.
Isn't that basically all tuples, though? I think that's where this gets painful.
> On the performance side, my basic benchmarks show almost no regression
> for simple INSERT … VALUES workloads. CTAS, however, does regress
> noticeably: a CTAS completes in about 4 seconds before this patch, but
> with this patch it takes roughly 24 seconds. (For reference, a normal
> insert into the source table took about 58 seconds when using zstd
> dictionary compression), I suspect the extra cost comes from the added
> zstd decompression and PGLZ compression on the destination table.
That's nice to know, but I think the key question is not so much what
the feature costs when it is used but what it costs when it isn't
used. If we implement a system where we don't let
dictionary-compressed zstd datums leak out of tables, that's bound to
slow down a CTAS from a table where this feature is used, but that's
kind of OK: the feature has pros and cons, and if you don't like those
tradeoffs, you don't have to use it. However, it sounds like this
could also slow down inserts and updates in some cases even for users
who are not making use of the feature, and that's going to be a major
problem unless it can be shown that there is no case where the impact
is at all significant. Users hate paying for features that they aren't
using.
I wonder if there's a possible design where we only allow
dictionary-compressed datums to exist as top-level attributes in
designated tables to which those dictionaries are attached; and any
time you try to bury that Datum inside a container object (row, range,
array, whatever) detoasting is forced. If there's a clean and
inexpensive way to implement that, then you could avoid having
heap_toast_insert_or_update care about HeapTupleHasExternal(), which
seems like it might be a key point.
--
Robert Haas
EDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2025-04-23 16:04:20 | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Previous Message | George MacKerron | 2025-04-23 15:46:59 | Re: Making sslrootcert=system work on Windows psql |