Re: RFC: Pluggable TOAST

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Nikita Malakhov <hukutoc(at)gmail(dot)com>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: RFC: Pluggable TOAST
Date: 2023-11-07 11:51:22
Message-ID: CAEze2Wj3Jcd4rH0g-cyDRr=K-9t6eDHW5ujzf4d_kYOxj0UKiA@mail.gmail.com
Lists: pgsql-hackers

On Tue, 7 Nov 2023 at 11:06, Nikita Malakhov <hukutoc(at)gmail(dot)com> wrote:
>
> Hi,
>
> I've been thinking about Matthias' proposals for some time and have some
> questions:
>
> >So, in short, I don't think there is a need for a specific "Pluggable
> >toast API" like the one in the patchset at [0] that can be loaded
> >on-demand, but I think that updating our current TOAST system to a
> >system for which types can provide support functions would likely be
> >quite beneficial, for efficient extraction of data from composite
> >values.
>
> As I understand one of the reasons against Pluggable TOAST is that differences
> in plugged-in Toasters could result in incompatibility even in different versions
> of the same DB.

That could be part of it, but it definitely wasn't my primary concern.
The primary concern remains that the pluggable toaster patch made the
jsonb type expose an API for a pluggable toaster that for all intents
and purposes only has one implementation due to its API being
specifically tailored for the jsonb internals use case, with similar
type-specific API bindings getting built for other types, each having
strict expectations about the details of the implementation. I agree
that it makes sense to specialize TOASTing for jsonb, but what I don't
understand about it is why that would need to be achieved outside the
core jsonb code.

I understand that the 'pluggable toaster' APIs originate from one of
PostgresPRO's forks of PostgreSQL, and I think it shows. That's not to
say it's bad, but it seems to be built on different expectations:
When maintaining a fork, you have different tradeoffs when compared to
maintaining the main product. A fork's changes need to be covered
across many versions with unknown changes, thus you would want the
smallest possible changes to enable the feature - pluggable toast makes
sense here, as the changes are limited to a few jsonb internals, but
most complex code is in an extension.
However, for core PostgreSQL, I think this separation makes very
little sense: maintaining a toast api for each type (when there can
be expected to be only one implementation) is much more work than
just building a good set of helper functions that do that same job.
Helper functions also allow for more flexibility, as there is no
noticeable black box api implementation to keep track of.

> The importance of the correct TOAST update is out of question, feel like I have
> to prepare a patch for it. There are some questions though, I'd address them
> later with a patch.
>
> >Example support functions:
>
> >/* TODO: bikeshedding on names, signatures, further support functions. */
> >Datum typsup_roastsliceofbread(Datum ptr, int sizetarget, char cmethod)
> >Datum typsup_unroastsliceofbread(Datum ptr)
> >void typsup_releaseroastedsliceofbread(Datump ptr) /* in case of
> >non-unitary in-memory datums */
>
> I correctly understand that you mean extending PG_TYPE and type cache,
> by adding a new function set for toasting/detoasting a value in addition to
> in/out, etc?

Yes.
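
To make the idea a bit more concrete, here is a rough, self-contained
sketch of what "a per-type set of toast support functions" could look
like. It is not PostgreSQL code: Datum is a stand-in typedef, and the
ToyTypeEntry struct, the toy_* functions and the 'p' compression method
character are all made up for illustration. Only the three callback
shapes follow the typsup_* examples quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's Datum, so the sketch compiles on its own. */
typedef uintptr_t Datum;

/* Callback shapes modelled on the quoted typsup_* examples. */
typedef Datum (*typsup_toast_fn) (Datum value, int sizetarget, char cmethod);
typedef Datum (*typsup_detoast_fn) (Datum value);
typedef void (*typsup_release_fn) (Datum value);

/* Hypothetical per-type entry; NULL callbacks mean "not opted in". */
typedef struct ToyTypeEntry
{
    const char *typname;
    typsup_toast_fn typtoast;
    typsup_detoast_fn typdetoast;
    typsup_release_fn typrelease;   /* optional even for opted-in types */
} ToyTypeEntry;

/* A type opts in only if it provides both the toast and detoast callbacks. */
static int
toy_type_has_own_toast(const ToyTypeEntry *typ)
{
    assert(typ != NULL);
    return typ->typtoast != NULL && typ->typdetoast != NULL;
}

/*
 * Use the type's own toast function when it has one. Otherwise return the
 * value unchanged, as a placeholder for the existing generic toast path.
 */
static Datum
toy_toast_value(const ToyTypeEntry *typ, Datum value, int sizetarget,
                char cmethod, int *used_type_support)
{
    if (toy_type_has_own_toast(typ))
    {
        *used_type_support = 1;
        return typ->typtoast(value, sizetarget, cmethod);
    }
    *used_type_support = 0;
    return value;
}

/* Toy callbacks that do nothing; a real type would slice its data here. */
static Datum
toy_identity_toast(Datum value, int sizetarget, char cmethod)
{
    (void) sizetarget;
    (void) cmethod;
    return value;
}

static Datum
toy_identity_detoast(Datum value)
{
    return value;
}

/* One opted-in type and one type that keeps the generic path. */
static const ToyTypeEntry toy_jsonb_type = {
    "toy_jsonb", toy_identity_toast, toy_identity_detoast, NULL
};
static const ToyTypeEntry toy_int8_type = {
    "toy_int8", NULL, NULL, NULL
};
```

A caller would pass &toy_jsonb_type or &toy_int8_type to
toy_toast_value() and check used_type_support to see which path was
taken. Where such callbacks would actually live (pg_type columns, the
type cache, or something else) is left open here.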

> I see several issues here:
> 1) We could benefit from knowledge of internals of data being toasted (i.e.
> in case of JSON value with key-value structure) only when EXTERNAL
> storage mode is set, otherwise value will be compressed before toasted.
> So we have to keep both TOAST mechanics regarding the storage mode
> being used. It's the same issue as in Pluggable TOAST. Is it OK?

I think it is OK that the storage-related changes of this only start
once the toast mechanism is

> 2) TOAST pointer is very limited in means of data it keeps, we'd have to
> extend it anyway and keep both for backwards compatibility;

Yes. We already have to retain the current (de)toast infrastructure to
make sure current data files can still be read, given that we want to
retain backward compatibility for currently toasted data.

> 3) There is no API and such an approach would require implementing
> toast and detoast in every data type we want to be custom toasted, resulting
> in multiple files modification. Maybe we have to consider introducing such
> an API?

No. As I mentioned, we can retain the current toast mechanism for
current types that do not yet want to use these new toast APIs. If we
use one different varatt_1b_e tag for type-owned toast pointers, the
system will be opt-in for types, and types that don't (yet) have
their own toast slicing design will keep using the old all-or-nothing
single-allocation data with the good old compress-then-slice
out-of-line toast storage.
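
As a minimal sketch of that opt-in dispatch, assuming a simplified
two-byte external header: the ToyExternalHeader struct, the
TOY_VARTAG_TYPE_OWNED tag and its value 19 are invented for this
example, and TOY_VARTAG_ONDISK only mirrors the role (and, to my
knowledge, the value) of the existing on-disk toast pointer tag. Real
code would use the macros and structs from the PostgreSQL headers
instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy tag values; only the "type-owned" tag is new and hypothetical. */
typedef enum ToyVartag
{
    TOY_VARTAG_ONDISK = 18,     /* classic out-of-line toast pointer */
    TOY_VARTAG_TYPE_OWNED = 19  /* hypothetical type-owned toast pointer */
} ToyVartag;

/* Simplified stand-in for the header of an external (1b_e) varlena. */
typedef struct ToyExternalHeader
{
    uint8_t va_header;
    uint8_t va_tag;
} ToyExternalHeader;

typedef enum ToyDetoastPath
{
    TOY_PATH_GENERIC,       /* existing compress-then-slice machinery */
    TOY_PATH_TYPE_SUPPORT   /* hand over to the type's support function */
} ToyDetoastPath;

/*
 * Pick a detoast path from the pointer's tag. Anything that is not the
 * new tag keeps going through the generic path, so existing data and
 * types that have not opted in are unaffected.
 */
static ToyDetoastPath
toy_choose_detoast_path(const ToyExternalHeader *hdr)
{
    assert(hdr != NULL);

    switch (hdr->va_tag)
    {
        case TOY_VARTAG_TYPE_OWNED:
            return TOY_PATH_TYPE_SUPPORT;
        case TOY_VARTAG_ONDISK:
        default:
            return TOY_PATH_GENERIC;
    }
}
```

The point of the single extra tag is that the decision is made per
stored value, not per column or per relation, so old and new pointers
can coexist in the same table.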

> 4) 1 toast relation per regular relation. With an update mechanics this will
> be less limiting, but still a limiting factor because 1 entry in base table
> could have a lot of entries in the toast table. Are we doing something with
> this?

I don't think that is relevant to the topic of type-aware toasting
optimization. The toast storage relation growing too large is not
unique to jsonb- or bytea-typed columns, so I believe this is better
solved in a different thread. Ideas like 'toast relation per column'
also don't really solve the issue when the main table only has one
bigint and one jsonb column, so I think this needs a different
approach, too.

Kind regards,

Matthias van de Meent.
