From: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
---|---|
To: | Nikita Malakhov <hukutoc(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: RFC: Pluggable TOAST |
Date: | 2023-10-26 13:40:02 |
Message-ID: | CAEze2Wj0VYkKn6E--63M=KAvMVjpP-WaEOqV921V5PgF6L+uJQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, 24 Oct 2023 at 22:38, Nikita Malakhov <hukutoc(at)gmail(dot)com> wrote:
>
> Hi hackers!
>
> We need community feedback on previously discussed topic [1].
> There are some long-live issues in Postgres related to the TOAST mechanics, like [2].
> Some time ago we already proposed a set of patches with an API allowing to plug in
> different TOAST implementations into a live database. The patch set introduced a lot
> of code and was quite crude in some places, so after several implementations we decided
> to try to implement it in the production environment for further check-up.
>
> The main idea behind pluggable TOAST is to make it possible to easily plug in and use different
> implementations of large values storage, preserving existing mechanics to keep backward
> compatibility, and to provide an easy, Postgres-way to give users alternative mechanics for storing large
> column values in a more effective way - we already have custom and very effective (up to tens
> and even hundreds of times faster) TOAST implementations for bytea and JSONb data types.
>
> As we see it - Pluggable TOAST proposes
> 1) changes in TOAST pointer itself, extending it to store custom data - current limitations
> of TOAST pointer were discussed in [1] and [4];
> 2) API which allows calls of custom TOAST implementations for certain table columns and
> (a topic for discussion) certain datatypes.
>
> Custom TOAST could be also used in a not so trivial way - for example, limited columnar storage could be easily implemented and plugged in without heavy core modifications
> of implementation of Pluggable Storage (Table Access Methods), preserving existing data
> and database structure, be upgraded, replicated and so on.
>
> Any thoughts and proposals are welcome.
TLDR of my thoughts below:
1. I don't see much value in the "Pluggable TOAST" as proposed in [0],
where toasters are nominally decoupled from the type, yet strongly
bound to the type through tagged vtables.
2. I do think we should allow *types* to provide their own toast
slicing implementation (not just "one blob, compressed then sliced"),
so that structured types don't have to read MBs of data to access only
a few of the structure's bytes. As this would be a different way of
storing the data, that would likely use a different tag for the
varattrib_1b_e struct to differentiate the two stored formats.
3. I do think that attributes shouldn't be required to be stored
either on disk or in a single palloc-ed area of memory. It is very
expensive to copy such large chunks of memory; jsonb is one such
example. If the type is composite, allow it to be allocated in
multiple regions. This would require a new varattrib_1b_e tag to discern
that the Datum isn't necessarily located in a single memory context,
but with good memory context management that should be fine.
4. I do think that TOAST needs improvements to allow differential
updates, not just full rewrites of the value. I believe this would
likely be enabled through solutions for (2) and (3), even if it might
already be possible without implementing new vartag options.
My thoughts:
In my view, the main jobs of TOAST are:
- To make sure a row with large attributes can still fit on a page by
reducing the size of the representation of attributes in the row
- To allow us to efficiently handle variable-length attribute values
- To reduce the overhead of moving large values through query execution
This is currently implemented through tagged values that contain
exactly one canonical representation of the type (be it inline, inline
compressed, or out of line with or without compression).
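To illustrate why a single canonical representation is limiting, here is a minimal C sketch of the four representations listed above. The enum, struct and function names are hypothetical and do not mirror PostgreSQL's actual varlena headers. It assumes a stream compressor where reaching a byte range means decompressing everything before it, which is why only prefixes are cheap to extract.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model; NOT PostgreSQL's real varlena layout. */
typedef enum {
    REPR_INLINE,
    REPR_INLINE_COMPRESSED,
    REPR_EXTERNAL,
    REPR_EXTERNAL_COMPRESSED
} ReprKind;

typedef struct {
    ReprKind kind;
    size_t   raw_size;    /* size of the canonical, decompressed value */
    size_t   stored_size; /* bytes actually stored, inline or out of line */
} TaggedValue;

static int is_compressed(const TaggedValue *v)
{
    return v->kind == REPR_INLINE_COMPRESSED ||
           v->kind == REPR_EXTERNAL_COMPRESSED;
}

/*
 * How many raw bytes must be materialized to read [offset, offset + len)?
 * Assumption: a compressed value can only be decompressed from its start,
 * so everything up to the end of the requested range has to be produced.
 */
static size_t raw_bytes_to_materialize(const TaggedValue *v,
                                       size_t offset, size_t len)
{
    assert(offset <= v->raw_size);
    if (len > v->raw_size - offset)
        len = v->raw_size - offset;   /* clamp to the end of the value */
    if (is_compressed(v))
        return offset + len;          /* prefix up to the range end */
    return len;                       /* direct access to the range */
}
```

Under these assumptions, reading 100 bytes near the end of a 1 MB compressed value costs almost the full megabyte, while the same read on an uncompressed value costs 100 bytes.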
Our current implementation assumes that users of the attribute will
either always use the decompressed canonical representation, or not
care about the representation at all (except decompression of only
prefixes, which is a special case), but this is clearly not the case:
Composite values like ROW types clearly benefit from careful
partitioning and subdivision of values into self-contained compressed
chunks: We don't TOAST a table's rows, but do TOAST per attribute.
JSONB could also benefit if it could create its own on-disk format of
a value: benchmarks of the "Pluggable Toaster" patch have shown that
JSONB operation performance improved significantly with custom toaster
infrastructure.
So, if composite types (like JSONB, ROW and ARRAY) would be able to
manually slice their values and create their own representation of
that toasted value, then that would probably benefit the system by
allowing some data to be stored in a more accessible manner than
"everything inline, compressed, or out-of-line, detoast (a prefix of)
all data, or none of it, no partial detoasting".
Now, returning to the table-level TOAST task of making sure the
tuple's data fits on the page, compressing & out-of-line-ing the data
until it fits:
Things that it currently does: varlena values are compressed and
out-of-lined with generic compression algorithms and a naive
slice-and-dice algorithm, and reconstructed (fully, or just a prefix)
when needed.
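A minimal C sketch of what such a naive slice-and-dice scheme amounts to: the value is cut into fixed-size chunks without regard to its structure, and a byte range maps to a contiguous run of chunk numbers. CHUNK_SIZE is an assumed round number chosen for illustration, not PostgreSQL's actual constant, and all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed chunk size for illustration only. */
#define CHUNK_SIZE 2000

/* Number of fixed-size chunks needed to store value_size bytes. */
static size_t chunk_count(size_t value_size)
{
    return (value_size + CHUNK_SIZE - 1) / CHUNK_SIZE;
}

typedef struct {
    size_t first; /* index of the first chunk touched */
    size_t last;  /* index of the last chunk touched (inclusive) */
} ChunkRange;

/*
 * Which chunks hold the bytes [offset, offset + len)?
 * This only works on the stored byte stream: if the stream is compressed,
 * stored offsets no longer correspond to offsets in the original value.
 */
static ChunkRange chunks_for_slice(size_t offset, size_t len)
{
    ChunkRange r;

    assert(len > 0);
    r.first = offset / CHUNK_SIZE;
    r.last = (offset + len - 1) / CHUNK_SIZE;
    return r;
}
```

The chunk boundaries carry no meaning for the type, so a two-byte field that happens to straddle a boundary still costs two chunk fetches.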
Things that it could potentially do in the future: Interface with
types to allow the type to slice&dice the tuple; use type-aware
compression (or encoding) algorithms to allow partial detoasting and
partial updates of a single value.
This would presumably be implemented using a set of new varattrib_1b_e
pointer subtypes whose contents are mostly managed by the type;
allowing for partial detoasting of the original datum, and allowing
for more efficient access to not just the prefix, but intermediate
spans as well: If compression spans .
So, the question would be: how do we expose such an API?
I suspect that each type will have only one meaningful specialized
method to toast its values. I don't see much value in registering
custom TOASTers when they only work with the types that have code
to explicitly support that toaster. This was visible in the 'Pluggable
Toaster' patch that was provided earlier as well - both example
implementations of this pluggable toaster were specialized to the
needs of one type each, and the type had direct calls into those
"pluggable" toasters' internals, showing no good reason to extend this
support to elsewhere outside the type.
Because there would be only one meaningful type-aware method of
TOASTing a value, we could implement this as an optional type support
function that would allow the type to specify how it wants to TOAST
its values, with the default TOAST as backup in case of still
too-large tuples or if the type does not implement these support
functions. With this I'm thinking mostly towards "new inout functions
for on-disk representations; which return/consume TOASTed slices to
de/construct the original datum", and less "replacement of all of
toast's internals".
So, in short, I don't think there is a need for a specific "Pluggable
toast API" like the one in the patchset at [0] that can be loaded
on-demand, but I think that updating our current TOAST system to a
system for which types can provide support functions would likely be
quite beneficial, for efficient extraction of data from composite
values.
Example support functions:
/* TODO: bikeshedding on names, signatures, further support functions. */
Datum typsup_roastsliceofbread(Datum ptr, int sizetarget, char cmethod);
Datum typsup_unroastsliceofbread(Datum ptr);
/* in case of non-unitary in-memory datums */
void typsup_releaseroastedsliceofbread(Datum ptr);
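One way to picture the "optional support function with the default TOAST as backup" idea is the following C sketch. Everything in it is hypothetical: TypeToastSupport, choose_toast_path and halving_roast are names invented for illustration, sizes stand in for real datums, and nothing here corresponds to existing PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-type support table; a NULL callback means "not implemented". */
typedef struct {
    /* Returns the size of the type-managed representation, or 0 on failure. */
    size_t (*roast)(size_t raw_size, size_t size_target);
} TypeToastSupport;

typedef enum {
    TOAST_NOT_NEEDED, /* value already fits */
    TOAST_BY_TYPE,    /* the type's own representation fits */
    TOAST_BY_DEFAULT  /* fall back to the generic mechanism */
} ToastPath;

/* Toy stand-in for a type-specific method: pretends to halve the value. */
static size_t halving_roast(size_t raw_size, size_t size_target)
{
    (void) size_target;
    return raw_size / 2;
}

/*
 * Decide which path handles a value of raw_size bytes.
 * The default mechanism is used when the type offers no support function,
 * or when its result is still too large.
 */
static ToastPath choose_toast_path(const TypeToastSupport *sup,
                                   size_t raw_size, size_t size_target,
                                   size_t *out_size)
{
    *out_size = raw_size;
    if (raw_size <= size_target)
        return TOAST_NOT_NEEDED;

    if (sup != NULL && sup->roast != NULL)
    {
        size_t roasted = sup->roast(raw_size, size_target);

        if (roasted > 0 && roasted <= size_target)
        {
            *out_size = roasted;
            return TOAST_BY_TYPE;
        }
    }
    return TOAST_BY_DEFAULT;
}
```

The point of the sketch is only the dispatch order: the type gets the first attempt, and the generic mechanism stays in place as the fallback, so types that opt out see no change.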
We would probably want at least 2 more subtypes of varattrib_1b_e -
one for on-disk pointers, and one for in-memory pointers - where the
payload of those pointers is managed by the type's toast mechanism and
considered opaque to the rest of PostgreSQL (and thus not compatible
with the binary transfer protocol). Types are currently already
expected to be able to handle their own binary representation, so
allowing types to manage parts of the toast representation should IMHO
not be too dangerous, though we should make sure that BINARY COERCIBLE
types share this toast support routine, or are returned to their
canonical binary version before they are cast to the coerced type, as
using different detoasting mechanisms could result in corrupted data
and thus crashes.
Lastly, there is the compression part of TOAST. I think it should be
relatively straightforward to expose the compression-related
components of TOAST through functions that can then be used by
type-specific toast support functions.
Note that this would be opt-in for a type, thus all functions that use
that type's internals should be aware of the different on-disk format
for toasted values and should thus be able to handle it gracefully.
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)
[0] https://www.postgresql.org/message-id/flat/224711f9-83b7-a307-b17f-4457ab73aa0a%40sigaev.ru