Re: ZStandard (with dictionaries) compression support for TOAST compression

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Nikhil Kumar Veldanda <veldanda(dot)nikhilkumar17(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ZStandard (with dictionaries) compression support for TOAST compression
Date: 2025-04-22 16:24:08
Message-ID: wjrd7ubqm4sq6t4ddv2ae7xaf26vcf4z6i7nftag3e6zexqmyc@yjilk5fsqcfh
Lists: pgsql-hackers

Hi,

On 2025-04-18 12:22:18 -0400, Robert Haas wrote:
> On Tue, Apr 15, 2025 at 2:13 PM Nikhil Kumar Veldanda
> <veldanda(dot)nikhilkumar17(at)gmail(dot)com> wrote:
> > Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ... SELECT ...)
> >
> > As compressed datums can be copied to other unrelated tables via CTAS,
> > INSERT INTO ... SELECT, or CREATE TABLE ... EXECUTE, I’ve introduced a
> > method inheritZstdDictionaryDependencies. This method is invoked at
> > the end of such statements and ensures that any dictionary
> > dependencies from source tables are copied to the destination table.
> > We determine the set of source tables using the relationOids field in
> > PlannedStmt.
>
> With the disclaimer that I haven't opened the patch or thought
> terribly deeply about this issue, at least not yet, my fairly strong
> suspicion is that this design is not going to work out, for multiple
> reasons. In no particular order:
>
> 1. I don't think users will like it if dependencies on a zstd
> dictionary spread like kudzu across all of their tables. I don't think
> they'd like it even if it were 100% accurate, but presumably this is
> going to add dependencies any time there MIGHT be a real dependency
> rather than only when there actually is one.
>
> 2. Inserting into a table or updating it only takes RowExclusiveLock,
> which is not even self-exclusive. I doubt that it's possible to change
> system catalogs in a concurrency-safe way with such a weak lock. For
> instance, if two sessions tried to do the same thing in concurrent
> transactions, they could both try to add the same dependency at the
> same time.
>
> 3. I'm not sure that CTAS, INSERT INTO...SELECT, and CREATE
> TABLE...EXECUTE are the only ways that datums can creep from one table
> into another. For example, what if I create a plpgsql function that
> gets a value from one table and stores it in a variable, and then use
> that variable to drive an INSERT into another table? I seem to recall
> there are complex cases involving records and range types and arrays,
> too, where the compressed object gets wrapped inside of another
> object; though maybe that wouldn't matter to your implementation if
> INSERT INTO ... SELECT uses a sufficiently aggressive strategy for
> adding dependencies.

+1 to all of these.
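
For readers less familiar with the lock modes behind point 2, here is a minimal sketch of why "not even self-exclusive" matters. The conflict sets are transcribed from the table-level lock conflict table in the PostgreSQL documentation; the dictionary and helper names are purely illustrative and are not PostgreSQL APIs.

```python
# Conflict sets for a few PostgreSQL table-level lock modes, as listed in the
# PostgreSQL documentation. Only three modes are included here for brevity.
# CONFLICTS, is_self_exclusive and can_run_concurrently are hypothetical
# names used for illustration only.
CONFLICTS = {
    "RowExclusiveLock": {
        "ShareLock",
        "ShareRowExclusiveLock",
        "ExclusiveLock",
        "AccessExclusiveLock",
    },
    "ShareUpdateExclusiveLock": {
        "ShareUpdateExclusiveLock",
        "ShareLock",
        "ShareRowExclusiveLock",
        "ExclusiveLock",
        "AccessExclusiveLock",
    },
    "ShareRowExclusiveLock": {
        "RowExclusiveLock",
        "ShareUpdateExclusiveLock",
        "ShareLock",
        "ShareRowExclusiveLock",
        "ExclusiveLock",
        "AccessExclusiveLock",
    },
}


def is_self_exclusive(mode):
    """True if two sessions cannot both hold `mode` on the same table."""
    return mode in CONFLICTS[mode]


def can_run_concurrently(held, requested):
    """True if `requested` can be granted while another session holds `held`."""
    return requested not in CONFLICTS[held]


# Two sessions doing INSERT/UPDATE both take RowExclusiveLock and are not
# serialized against each other by the table lock.
print(is_self_exclusive("RowExclusiveLock"))          # False
print(is_self_exclusive("ShareUpdateExclusiveLock"))  # True
```

Since two inserting sessions can hold RowExclusiveLock on the same table at once, the table lock alone does nothing to stop both from attempting the same catalog change concurrently.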

> I think we could add plain-old zstd compression without really
> tackling this issue

+1

> I'm now also curious to know whether Andres would agree that it's bad
> if zstd dictionaries are un-droppable. After all, I thought it would
> be bad if there was no way to eliminate a dependency on a compression
> method, and he disagreed.

I still am not too worried about that aspect. However:

> So maybe he would also think undroppable dictionaries are fine.

I'm much less sanguine about this. Imagine a schema-based multi-tenancy setup,
where tenants come and go, and where a few of the tables use custom
dictionaries. Whereas not being able to get rid of lz4 at all has basically no
cost whatsoever, collecting more and more unusable dictionaries can imply a
fair amount of space usage after a while. I don't see any argument why that
would be ok, really.

> But maybe not. It seems even worse to me than undroppable compression
> methods, because you'll probably not have that many compression methods
> ever, but you could have a large number of dictionaries eventually.

Agreed on the latter.

Greetings,

Andres Freund
