From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [RFC] indirect toast tuple support |
Date: | 2013-02-19 14:00:55 |
Message-ID: | 20130219140055.GA4582@awork2.anarazel.de |
Lists: | pgsql-hackers |
On 2013-02-19 08:48:05 -0500, Robert Haas wrote:
> On Sat, Feb 16, 2013 at 11:42 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > Given that there have been wishes to support something like b) for quite
> > some time, independent from logical decoding, it seems like a good idea
> > to add support for it. It's e.g. useful for avoiding repeated detoasting
> > or decompression of tuples.
> >
> > The problem with b) is that there is no space in varlena's flag bits to
> > directly denote that a varlena points into memory instead of either
> > directly containing the data or a varattrib_1b_e containing a
> > varatt_external pointing to an on-disk toasted tuple.
>
> So the other way that we could do this is to use something that's the
> same size as a TOAST pointer but has different content - the
> seemingly-obvious choice being va_toastrelid == 0.
Unfortunately that would mean you need to copy the varatt_external (or
whatever it would be called) to aligned storage to check what it
is. That's why I went the other way.
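
To illustrate the alignment point, here is a minimal sketch, not actual PostgreSQL code. The sketch_* structs are simplified stand-ins for varattrib_1b_e and varatt_external, and sketch_is_in_memory_pointer() is a hypothetical helper. Because the payload starts right after the two header bytes, it is not guaranteed to be 4-byte aligned, so it has to be copied out before va_toastrelid can be inspected:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for varatt_external (assumed layout: four 4-byte fields). */
typedef struct
{
    int32_t  va_rawsize;
    int32_t  va_extsize;
    uint32_t va_valueid;
    uint32_t va_toastrelid;
} sketch_varatt_external;

/* Simplified stand-in for varattrib_1b_e: two header bytes, then the payload. */
typedef struct
{
    uint8_t va_header;
    uint8_t va_len_1be;
    char    va_data[sizeof(sketch_varatt_external)];
} sketch_varattrib_1b_e;

/*
 * Hypothetical check for the "va_toastrelid == 0 means in-memory" idea.
 * va_data sits at offset 2, so reading the field in place could be an
 * unaligned access; copy to aligned storage first.
 */
bool
sketch_is_in_memory_pointer(const sketch_varattrib_1b_e *attr)
{
    sketch_varatt_external aligned;

    assert(attr != NULL);
    memcpy(&aligned, attr->va_data, sizeof(aligned));
    return aligned.va_toastrelid == 0;
}
```

The memcpy() is the extra cost: every caller that only wants to know what kind of pointer it has must pay for copying the whole struct.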
It's a bit sad that varattrib_1b_e only contains a length and not a type
byte. I would like to change the storage of existing toast types but
that's not going to work for pg_upgrade reasons...
> I'd be a little
> reluctant to do it the way you propose because we might, at some
> point, want to try to reduce the size of toast pointers. If you have
> a tuple with many attributes, the size of the TOAST pointers
> themselves starts to add up. It would be nice to be able to have 8
> byte or even 4 byte toast pointers to handle those situations. If we
> steal one or both of those lengths to mean "the data is cached in
> memory somewhere" then we can't use those lengths in a smaller on-disk
> representation, which would seem a shame.
I agree. As I said above, having the type overlaid onto the length was
and is a bad idea; I just haven't found a better one that's compatible
yet.
Except for inventing typlen=-3, aka "toast2", or something. But even that
wouldn't help with getting rid of existing pg_upgraded tables, besides
being a maintenance nightmare.
The only reasonable thing I can see us doing is renaming
varattrib_1b_e.va_len_1be to va_type and redefining VARSIZE_1B_E as a
switch that maps types to lengths. But I think I would put this off,
except for placing a comment somewhere, until it becomes necessary.
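
A rough sketch of what that could look like, assuming made-up names throughout (sketch_vartag, SKETCH_TAG_*, sketch_varsize_1b_e() are all hypothetical). The on-disk tag is given the value 18 on the assumption that existing toast pointers already store 18 (2 header bytes plus a 16-byte varatt_external) in that byte, so old data would keep working:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HDRSZ_EXTERNAL 2     /* va_header + va_type */
#define SKETCH_ONDISK_PAYLOAD 16    /* assumed sizeof(varatt_external) */

/* Hypothetical type tags stored in the byte that used to be va_len_1be. */
typedef enum
{
    SKETCH_TAG_INDIRECT = 1,    /* payload is a pointer to an in-memory datum */
    SKETCH_TAG_ONDISK = 18      /* assumed to equal the old length byte */
} sketch_vartag;

/*
 * Hypothetical replacement for VARSIZE_1B_E: derive the total size from the
 * type tag instead of reading it directly. Returns 0 for unknown tags.
 */
size_t
sketch_varsize_1b_e(uint8_t va_type)
{
    switch (va_type)
    {
        case SKETCH_TAG_INDIRECT:
            return SKETCH_HDRSZ_EXTERNAL + sizeof(void *);
        case SKETCH_TAG_ONDISK:
            return SKETCH_HDRSZ_EXTERNAL + SKETCH_ONDISK_PAYLOAD;
        default:
            return 0;
    }
}
```

With a mapping like this, a new tag no longer consumes a specific length, so shorter on-disk pointer formats would remain possible later.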
> But having said that, +1 on the general idea of getting something like
> this done. We really need a better infrastructure to avoid copying
> large values around repeatedly in memory - a gigabyte is a lot of data
> to be slinging around.
>
> Of course, you will not be surprised to hear that I think this is 9.4 material.
Yes, obviously. But I need time to actually propose a working patch (I
already found 2 bugs in what I had submitted), that's why I brought it up
now. No point in wasting time if there's an obviously better idea around.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services