Re: Pluggable toaster

From: Nikita Malakhov <hukutoc(at)gmail(dot)com>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jacob Champion <jchampion(at)timescale(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Pluggable toaster
Date: 2022-10-23 20:38:13
Message-ID: CAN-LCVNaQy04RvbgVtwygmvfPDFSGxLhzUK=DgUjAEiZ9n9Mfw@mail.gmail.com
Lists: pgsql-hackers

Hi!

Aleksander,
>Don't you think that this is an arguable design decision? Basically
>all we know about the underlying TableAM is that it stores tuples
>_somehow_ and that tuples have TIDs [1]. That's it. We don't know if
>it even has any sort of pages, whether they are fixed in size or not,
>whether it uses shared buffers, etc. It may not even require TOAST.
>(Not to mention the fact that when you have N TOAST implementations
>and M TableAM implementations now you have to run N x M compatibility
>tests. And this doesn't account for different versions of Ns and Ms,
>different platforms and different versions of PostgreSQL.)

>I believe the proposed approach is architecturally broken from the
>beginning.

The existing TOAST mechanism works, but for certain types of data it
works very poorly. Let's face it: it has strict limitations that
restrict the overall capabilities of the DBMS, because TOAST was
designed before today's data volumes were common. I mean tables with
hundreds of billions of rows and sizes measured in hundreds of
gigabytes or even terabytes.

But TOAST itself is a good solution to the problem of storing oversized
attributes. Although it has limitations, it would be unwise to just
throw it away. A better way is to bring it up to date by revising it,
getting rid of the most painful limitations, and allowing different
(custom) TOAST strategies for special cases.

The main idea of Pluggable TOAST is to extend TOAST capabilities by
providing a common API that allows different strategies to be used
uniformly to TOAST different data. By "TOAST" I mean that the data is
stored externally to the source table, somewhere only its Toaster knows
where and how: it may be regular Heap tables, Heap tables with a
different table structure, tables of some other AM, files outside of
the database, or even files on different storage systems. Pluggable
TOAST allows using advanced compression methods and complex operations
on externally stored data, such as searching without fully de-TOASTing
the data.

Also, the existing TOAST is a part of Heap AM and is restricted to
using Heap only. To make it extensible, we have to separate TOAST from
Heap AM. The default TOAST in Pluggable TOAST still uses Heap, but Heap
knows nothing about TOAST. This fits perfectly with OOP paradigms.
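
Purely as an illustration of this separation, here is a minimal sketch
in C, similar in spirit to how a table AM exposes a struct of
callbacks. It is not the API from the patch: every name in it
(ToasterRoutine, mem_toaster, store_oversized) is hypothetical, and the
"external storage" is just an in-memory array standing in for whatever
a real Toaster would choose.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback table; NOT the interface from the patch. */
typedef struct ToasterRoutine
{
    /* Store len bytes externally; return an opaque id, or -1 on failure. */
    int  (*toast)(const char *data, size_t len);
    /* Copy the stored value into buf; return its length, or -1 on failure. */
    long (*detoast)(int id, char *buf, size_t cap);
} ToasterRoutine;

/* Toy storage: a fixed number of in-memory slots. */
#define MEM_SLOTS     8
#define MEM_SLOT_SIZE 256

static char   mem_data[MEM_SLOTS][MEM_SLOT_SIZE];
static size_t mem_len[MEM_SLOTS];
static int    mem_used = 0;

static int
mem_toast(const char *data, size_t len)
{
    if (mem_used >= MEM_SLOTS || len > MEM_SLOT_SIZE)
        return -1;
    memcpy(mem_data[mem_used], data, len);
    mem_len[mem_used] = len;
    return mem_used++;
}

static long
mem_detoast(int id, char *buf, size_t cap)
{
    if (id < 0 || id >= mem_used || mem_len[id] > cap)
        return -1;
    memcpy(buf, mem_data[id], mem_len[id]);
    return (long) mem_len[id];
}

static const ToasterRoutine mem_toaster = { mem_toast, mem_detoast };

/* The table AM side sees only the callback table, not the storage. */
static int
store_oversized(const ToasterRoutine *t, const char *data, size_t len)
{
    assert(t != NULL && t->toast != NULL);
    return t->toast(data, len);
}
```

The point of the sketch is only that store_oversized() never touches
mem_data directly, so the storage behind the callbacks could be swapped
without changing the caller.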

>It looks like the idea should be actually turned inside out. I.e. what
>would be nice to have is some sort of _framework_ that helps TableAM
>authors to implement TOAST (alternatively, the rest of the TableAM
>except for TOAST) if the TableAM is similar to the default one. In
>other words the idea is not to implement alternative TOASTers that
>will work with all possible TableAMs but rather to simplify the task
>of implementing an alternative TableAM which is similar to the default
>one except for TOAST. These TableAMs should reuse as much common code
>as possible except for the parts where they differ.

To implement different TOAST strategies you must have an API to plug
them in; otherwise you would have to change the core for each strategy.
Once the TOAST API is merged into the core, it allows plugging in
custom TOAST strategies just by adding contrib modules. I have to point
out that different TOAST strategies do not have to store data with
other TAMs. They could just store the data in Heap, but use knowledge
of the internal data structure or workflow to store it in a more
optimal way: for example, JSON that can be quickly and partially
compressed and decompressed, or lots of large chunks of binary data
stored in the database (as you know, large objects are not of much help
with this), and so on.
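
To make "plugging in" a bit more concrete, here is a toy sketch in C of
a core-side registry that maps strategy names to callbacks, so that a
module only has to register itself and the core does not change per
strategy. This is an assumption-laden illustration, not code from the
patch: register_toaster, lookup_toaster, ToasterEntry and the two toy
strategies are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: a strategy is a name plus one callback. */
typedef size_t (*toast_size_fn)(size_t raw_len);

typedef struct ToasterEntry
{
    const char   *name;
    toast_size_fn stored_size;  /* bytes kept externally for raw_len input */
} ToasterEntry;

#define MAX_TOASTERS 16

static ToasterEntry registry[MAX_TOASTERS];
static int          n_toasters = 0;

/* Look up a strategy by name; returns NULL if it is not registered. */
static const ToasterEntry *
lookup_toaster(const char *name)
{
    for (int i = 0; i < n_toasters; i++)
        if (strcmp(registry[i].name, name) == 0)
            return &registry[i];
    return NULL;
}

/* Called by a module at load time. Returns 0 on success, -1 on failure. */
static int
register_toaster(const char *name, toast_size_fn fn)
{
    assert(name != NULL && fn != NULL);
    if (n_toasters >= MAX_TOASTERS || lookup_toaster(name) != NULL)
        return -1;              /* registry full or duplicate name */
    registry[n_toasters].name = name;
    registry[n_toasters].stored_size = fn;
    n_toasters++;
    return 0;
}

/* Two toy strategies: store as-is, or pretend to compress by half. */
static size_t plain_size(size_t raw_len) { return raw_len; }
static size_t half_size(size_t raw_len)  { return raw_len / 2; }
```

A module would call register_toaster("half", half_size) once, and the
core would later find it with lookup_toaster("half") without knowing
anything else about the strategy.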

Implementing another Table AM just to implement another TOAST strategy
seems like too much: the TAM API is very heavy and complex, and you
would have to add it as a contrib module. Lots of different TAMs would
cause many more problems than lots of Toasters, because such a solution
results in data incompatibility between installations with different
TAMs, and a minor change in a custom TAM contrib could lead to losing
all data stored with this TAM. With custom TOAST you could (in the
worst case) lose just the TOASTed data and nothing else.

We have lots of requests from clients and tickets related to TOAST
limitations and to extending Postgres this way. This growing need made
us develop Pluggable TOAST.

On Sun, Oct 23, 2022 at 12:38 PM Aleksander Alekseev <aleksander(at)timescale(dot)com> wrote:

> Hi Nikita,
>
> > Pluggable TOAST API was designed with storage flexibility in mind, and
> > Custom TOAST mechanics is free to use any storage methods
>
> Don't you think that this is an arguable design decision? Basically
> all we know about the underlying TableAM is that it stores tuples
> _somehow_ and that tuples have TIDs [1]. That's it. We don't know if
> it even has any sort of pages, whether they are fixed in size or not,
> whether it uses shared buffers, etc. It may not even require TOAST.
> (Not to mention the fact that when you have N TOAST implementations
> and M TableAM implementations now you have to run N x M compatibility
> tests. And this doesn't account for different versions of Ns and Ms,
> different platforms and different versions of PostgreSQL.)
>
> I believe the proposed approach is architecturally broken from the
> beginning.
>
> It looks like the idea should be actually turned inside out. I.e. what
> would be nice to have is some sort of _framework_ that helps TableAM
> authors to implement TOAST (alternatively, the rest of the TableAM
> except for TOAST) if the TableAM is similar to the default one. In
> other words the idea is not to implement alternative TOASTers that
> will work with all possible TableAMs but rather to simplify the task
> of implementing an alternative TableAM which is similar to the default
> one except for TOAST. These TableAMs should reuse as much common code
> as possible except for the parts where they differ.
>
> Does it make sense?
>
> Sorry, I realize this will probably imply a complete rewrite of the
> patch. This is the reason why one should start proposing changes from
> gathering the requirements, writing an RFC and run it through several
> rounds of discussion.
>
> [1]: https://www.postgresql.org/docs/current/tableam.html
>
> --
> Best regards,
> Aleksander Alekseev
>

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/
