Re: [RFC] indirect toast tuple support

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] indirect toast tuple support
Date: 2013-02-19 14:00:55
Message-ID: 20130219140055.GA4582@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-02-19 08:48:05 -0500, Robert Haas wrote:
> On Sat, Feb 16, 2013 at 11:42 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > Given that there have been wishes to support something like b) for quite
> > some time, independent from logical decoding, it seems like a good idea
> > to add support for it. It's e.g. useful for avoiding repeated detoasting
> > or decompression of tuples.
> >
> > The problem with b) is that there is no space in varlena's flag bits to
> > directly denote that a varlena points into memory instead of either
> > directly containing the data or a varattrib_1b_e containing a
> > varatt_external pointing to an on-disk toasted tuple.
>
> So the other way that we could do this is to use something that's the
> same size as a TOAST pointer but has different content - the
> seemingly-obvious choice being va_toastrelid == 0.

Unfortunately that would mean you need to copy the varatt_external (or
whatever it would be called) to aligned storage to check what it
is. That's why I went the other way.
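
To make the alignment point concrete, here is a rough sketch, not
PostgreSQL source. The struct is a simplified stand-in for
varatt_external, and both function names and the tag value passed in are
hypothetical. It contrasts checking va_toastrelid, which needs a copy
into aligned storage, with checking the single byte after the header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for varatt_external (assumption: four 4-byte fields). */
typedef struct
{
    int32_t  va_rawsize;     /* original data size */
    int32_t  va_extsize;     /* size as stored externally */
    uint32_t va_valueid;     /* id of the value inside the toast table */
    uint32_t va_toastrelid;  /* relation id of the toast table */
} toast_pointer_sketch;

/*
 * Hypothetical test for a "va_toastrelid == 0 means in-memory" scheme.
 * The payload follows a 2-byte header, so it is generally not 4-byte
 * aligned and must be copied before its fields can be read safely.
 */
bool
sketch_is_in_memory_by_relid(const unsigned char *payload)
{
    toast_pointer_sketch aligned;

    assert(payload != NULL);
    memcpy(&aligned, payload, sizeof(aligned));
    return aligned.va_toastrelid == 0;
}

/*
 * Hypothetical test for the other approach: the byte directly after the
 * 1-byte header acts as the discriminator. Single bytes have no alignment
 * requirement, so no copy is needed.
 */
bool
sketch_is_in_memory_by_tag(const unsigned char *datum, uint8_t in_memory_tag)
{
    assert(datum != NULL);
    return datum[1] == in_memory_tag;
}
```

The second variant only ever touches one byte, which is the reason for
putting the discriminator there rather than inside the pointer body.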

It's a bit sad that varattrib_1b_e only contains a length and not a type
byte. I would like to change the storage of existing toast types but
that's not going to work for pg_upgrade reasons...

> I'd be a little
> reluctant to do it the way you propose because we might, at some
> point, want to try to reduce the size of toast pointers. If you have
> a tuple with many attributes, the size of the TOAST pointers
> themselves starts to add up. It would be nice to be able to have 8
> byte or even 4 byte toast pointers to handle those situations. If we
> steal one or both of those lengths to mean "the data is cached in
> memory somewhere" then we can't use those lengths in a smaller on-disk
> representation, which would seem a shame.

I agree. As I said above, having the type overlaid onto the length was
and is a bad idea; I just haven't found a better one that's compatible
yet. Except inventing typlen=-3 aka "toast2" or something. But even that
wouldn't help getting rid of existing pg_upgraded tables. Besides being
a maintenance nightmare.

The only reasonable thing I can see us doing is renaming
varattrib_1b_e.va_len_1be into va_type and redefining VARSIZE_1B_E into a
switch that maps types into lengths. But I think I would put this off,
except for placing a comment somewhere, until it gets necessary.
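
Roughly, the type-to-length mapping could look like the sketch below.
This is only an illustration under stated assumptions: the enum, its
values and the function name are hypothetical. The on-disk type is
given the value 18 on the assumption that existing toast pointers store
18 in va_len_1be (2 header bytes plus a 16-byte varatt_external), so
already-written data would keep its meaning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header bytes in front of the payload: va_header plus the type byte. */
#define SKETCH_HDRSZ_1B_E 2

/* Hypothetical type codes stored where va_len_1be lives today. */
typedef enum
{
    SKETCH_TYPE_INDIRECT = 1,   /* payload is a pointer into memory */
    SKETCH_TYPE_ONDISK = 18     /* assumed equal to today's stored length */
} sketch_va_type;

/*
 * Sketch of VARSIZE_1B_E as a switch: given the type byte, return the
 * total size of the datum including its 2-byte header. Unknown types
 * return 0 here; real code would more likely raise an error.
 */
size_t
sketch_varsize_1b_e(uint8_t va_type)
{
    switch (va_type)
    {
        case SKETCH_TYPE_INDIRECT:
            return SKETCH_HDRSZ_1B_E + sizeof(void *);
        case SKETCH_TYPE_ONDISK:
            return SKETCH_HDRSZ_1B_E + 16;  /* 16 = assumed sizeof(varatt_external) */
        default:
            return 0;
    }
}
```

With a mapping like this the byte no longer has to equal the size, so
the remaining values stay free for possible smaller pointer formats.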

> But having said that, +1 on the general idea of getting something like
> this done. We really need a better infrastructure to avoid copying
> large values around repeatedly in memory - a gigabyte is a lot of data
> to be slinging around.
>
> Of course, you will not be surprised to hear that I think this is 9.4 material.

Yes, obviously. But I need time to actually propose a working patch (I
already found 2 bugs in what I had submitted), that's why I brought it up
now. No point in wasting time if there's an obviously better idea around.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
