Re: [PATCH] Compression dictionaries for JSONB

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [PATCH] Compression dictionaries for JSONB
Date: 2022-07-04 12:45:22
Message-ID: CAEze2Wg+HM180NudppyAHH3t6-ttg6FT06M6s4BaJsZnAkg6zg@mail.gmail.com
Lists: pgsql-hackers

Hi Aleksander,

On Fri, 17 Jun 2022 at 17:04, Aleksander Alekseev
<aleksander(at)timescale(dot)com> wrote:
>> These are just my initial thoughts I would like to share though. I may
>> change my mind after diving deeper into a "pluggable TOASTer" patch.
>
> I familiarized myself with the "pluggable TOASTer" thread and joined
> the discussion [1].
>
> I'm afraid so far I failed to understand your suggestion to base
> "compression dictionaries" patch on "pluggable TOASTer", considering
> the fair amount of push-back it got from the community, not to mention
> a somewhat raw state of the patchset. It's true that Teodor and I are
> trying to address similar problems. This however doesn't mean that
> there should be a dependency between these patches.

The reason I think this is better implemented as a pluggable toaster
is that casts are necessarily opaque and require O(sizeofdata) copies
or processing. The toaster infrastructure proposed in [0] seems to
improve on the O(sizeofdata) requirement for TOAST, but that
improvement will not work with casts.
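
As a rough illustration of the difference, here is a toy Python sketch.
It is not PostgreSQL code: the shared dictionary, the 256-byte chunk
size and all function names are made up for the example, and zlib's
preset-dictionary support merely stands in for a compression
dictionary. The cast-style path has to materialize the whole value
before it can return a few bytes, while a storage-aware path only
decodes the chunks it needs.

```python
import zlib

# Hypothetical shared dictionary: byte strings that recur across documents.
SHARED_DICT = b'"user_id":"status":"created_at":'


def compress(data):
    """Compress data using the shared preset dictionary."""
    c = zlib.compressobj(zdict=SHARED_DICT)
    return c.compress(data) + c.flush()


def decompress(blob):
    """Decompress a blob produced by compress()."""
    d = zlib.decompressobj(zdict=SHARED_DICT)
    return d.decompress(blob) + d.flush()


def read_prefix_via_cast(blob, n):
    """Cast-style access: decode the whole value, then slice it.

    Returns the prefix and the number of bytes that had to be decoded.
    """
    full = decompress(blob)
    return full[:n], len(full)


def read_prefix_chunked(chunks, n):
    """Storage-aware access: decode only as many chunks as needed.

    Returns the prefix and the number of bytes that had to be decoded.
    """
    out = b""
    for chunk in chunks:
        if len(out) >= n:
            break
        out += decompress(chunk)
    return out[:n], len(out)


doc = b'{"user_id":1,"status":"ok","created_at":"2022-07-04"}' * 100
blob = compress(doc)
chunks = [compress(doc[i:i + 256]) for i in range(0, len(doc), 256)]

prefix_cast, decoded_cast = read_prefix_via_cast(blob, 10)
prefix_chunked, decoded_chunked = read_prefix_chunked(chunks, 10)
```

Both calls return the same 10-byte prefix, but the cast-style path
decodes the entire document while the chunked path decodes only the
first chunk.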

> Also, I completely agree with Tomas [2]:
>
>> My main point is that we should not be making too many radical
>> changes at once - it makes it much harder to actually get anything done.
>
> IMO the patches don't depend on each other but rather complement each
> other. The user can switch between different TOAST methods, and the
> compression dictionaries can work on top of different TOAST methods.

I don't think that is possible (or at least, not as performant). To
treat type X' as type X and use it as a storage medium instead, you
must either have the whole binary representation of X or have access
to the internals of type X. I find it difficult to believe that casts
can be done without a full detoast (or otherwise without deep
knowledge of the internal structure of the data type, such as 'type A
is binary compatible with type X'). As such, I think the 'compression
dictionaries' feature is competing with the 'pluggable toaster'
feature, if the one is used on top of the other. That is, the
dictionary is still created like in the proposed patches (though
preferably without the 64-byte NAMEDATALEN limit), but the usage will
be through "TOASTER my_dict_enabled_toaster".

Additionally, I don't think we've ever accepted two different
implementations of the same concept, at least not without first having
good arguments for why each implementation has obvious benefits over
the other and why the two are incompatible.

> Although there is also a high-level idea (according to the
> presentations) to share common data between different TOASTed values,
> similarly to what compression dictionaries do, by looking at the
> current feedback and considering the overall complexity and the amount
> of open questions (e.g. interaction with different TableAMs, etc), I
> seriously doubt that this particular part of "pluggable TOASTer" will
> end-up in the core.

Yes, and that's why I think that this is where this dictionary
infrastructure could provide value, as an alternative or extension to
the proposed jsonb toaster in the 'pluggable toaster' thread.

Kind regards,

Matthias van de Meent
