From: | Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com> |
---|---|
To: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Pluggable toaster |
Date: | 2022-01-05 14:45:56 |
Message-ID: | CANbhV-F4Vffu7hWDEK+yBjquq4EQGH3JAJ+K4tjHEopFU0a-kQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, 30 Dec 2021 at 16:40, Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:
> We are working on custom toaster for JSONB [1], because current TOAST is
> universal for any data type and because of that it has some disadvantages:
> - "one toast fits all" may be not the best solution for particular
> type or/and use cases
> - it doesn't know the internal structure of data type, so it cannot
> choose an optimal toast strategy
> - it can't share common parts between different rows and even
> versions of rows
Agreed. Oleg has made a very clear analysis of the value of having
a higher degree of control over toasting from within the datatype.
In my understanding, we want to be able to
1. Access data from a toasted object one slice at a time, by using
knowledge of the structure
2. If toasted data is updated, then update the minimum number of
slices, without rewriting the other slices
3. If toasted data is expanded, then allow new slices to be appended to
the object without rewriting the existing slices
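As a rough sketch of the slice arithmetic behind these three points (this
is not PostgreSQL code; SLICE_SIZE, SliceRange and both function names are
made up for illustration, and a fixed slice size is assumed):

```c
#include <stdint.h>

/* Hypothetical fixed slice size. Real TOAST chunks are a little under
 * 2 kB with the default 8 kB block size; 2000 just keeps the math easy. */
#define SLICE_SIZE 2000u

typedef struct SliceRange
{
    uint32_t first;   /* index of the first slice touched */
    uint32_t count;   /* number of slices touched; 0 means none */
} SliceRange;

/* Which slices cover the byte range [offset, offset + len)?
 * This is all a reader (point 1) or an in-place update (point 2)
 * would need to fetch or rewrite. */
SliceRange
slices_for_range(uint64_t offset, uint64_t len)
{
    SliceRange r = {0, 0};
    uint64_t   last;

    if (len == 0)
        return r;

    r.first = (uint32_t) (offset / SLICE_SIZE);
    last = (offset + len - 1) / SLICE_SIZE;
    r.count = (uint32_t) (last - r.first + 1);
    return r;
}

/* Which slices must be written when the value grows from old_len to
 * new_len bytes (point 3)? Only the partially filled tail slice, if
 * any, plus the brand new slices. */
SliceRange
slices_for_append(uint64_t old_len, uint64_t new_len)
{
    SliceRange none = {0, 0};

    if (new_len <= old_len)
        return none;
    return slices_for_range(old_len, new_len - old_len);
}
```

With these assumptions, growing a 5000-byte value to 9000 bytes touches
slices 2 to 4 only; slices 0 and 1 are left alone.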
> Modification of current toaster for all tasks and cases looks too
> complex, moreover, it will not work for custom data types. Postgres
> is an extensible database, why not extend its extensibility even
> further, to have pluggable TOAST! We propose an idea to separate
> toaster from heap using toaster API similar to table AM API etc.
> Following patches are applicable over patch in [1]
ISTM that we would want the toast algorithm to be associated with the
datatype, not the column?
Can you explain your thinking?
We already have the Expanded toast format, in-memory, which was designed
specifically to allow us to access the sub-structure of the datatype
in-memory. So I was expecting to see an Expanded, on-disk, toast
format that roughly matched that concept, since Tom has already shown
us the way (varatt_expanded). This would be usable by both JSON and
PostGIS.
Some other thoughts:
I imagine the data type might want to keep some kind of dictionary
inside the main toast pointer, so we could make allowance for some
optional datatype-specific private area in the toast pointer itself,
allowing a mix of inline and out-of-line data, and/or a table of
contents to the slices.
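To make the idea concrete, here is one possible shape for such a pointer,
purely as a sketch. None of these names or field sizes exist in
PostgreSQL; SketchToastPointer, TocEntry, TOC_MAX and toc_lookup are
hypothetical, and a real design would need a variable-length layout.

```c
#include <stdint.h>
#include <string.h>

#define TOC_MAX 8            /* arbitrary cap, for illustration only */

/* One table-of-contents entry: a datatype-defined label (for JSONB this
 * might be a top-level key) and the slices that hold its value. */
typedef struct TocEntry
{
    char     key[16];
    uint32_t first_slice;
    uint32_t nslices;
} TocEntry;

/* Hypothetical toast pointer with a datatype-private area. */
typedef struct SketchToastPointer
{
    uint64_t value_id;        /* identifies the out-of-line value */
    uint16_t ntoc;            /* number of valid toc[] entries */
    TocEntry toc[TOC_MAX];    /* table of contents to the slices */
    uint16_t inline_len;      /* bytes used in inline_data */
    uint8_t  inline_data[32]; /* small inline part kept in the tuple */
} SketchToastPointer;

/* Find the slices for a key without fetching any out-of-line data.
 * Returns NULL if the key is not in the table of contents. */
const TocEntry *
toc_lookup(const SketchToastPointer *p, const char *key)
{
    for (uint16_t i = 0; i < p->ntoc && i < TOC_MAX; i++)
    {
        if (strncmp(p->toc[i].key, key, sizeof(p->toc[i].key)) == 0)
            return &p->toc[i];
    }
    return NULL;
}
```

The point of the sketch is only that the datatype can decide which slices
to fetch by looking at the pointer alone.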
I'm thinking we could also tackle these things at the same time:
* We want to expand TOAST to 64-bit pointers, so we can have more
pointers in a table
* We want to avoid putting the data length into the toast pointer, so
we can allow the toasted data to be expanded without rewriting
everything (to avoid O(N^2) cost)
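The O(N^2) cost can be seen with a simple model (an illustration only;
the function names are made up, and the append case ignores the small
cost of rewriting a partially filled tail slice). If a value grows by
`step` bytes `n` times and each growth rewrites the whole value, the
total written is step * n * (n + 1) / 2; with true append it is step * n.

```c
#include <stdint.h>

/* Total bytes written when every growth step rewrites the whole value:
 * step + 2*step + ... + n*step. */
uint64_t
bytes_written_rewrite(uint64_t step, uint64_t n)
{
    uint64_t total = 0;

    for (uint64_t i = 1; i <= n; i++)
        total += i * step;
    return total;
}

/* Total bytes written when each growth step only appends new data. */
uint64_t
bytes_written_append(uint64_t step, uint64_t n)
{
    return step * n;
}
```

For 1000 appends of 2000 bytes each, the rewrite model writes about 1 GB
in total, against 2 MB when appending.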
--
Simon Riggs http://www.EnterpriseDB.com/