From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Greg Stark <gsstark(at)mit(dot)edu> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Variable length varlena headers redux |
Date: | 2007-02-09 22:48:16 |
Message-ID: | 200702092248.l19MmG317708@momjian.us |
Lists: | pgsql-hackers |
Greg Stark wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
> > Greg Stark <gsstark(at)mit(dot)edu> writes:
> > > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > >> I know it is kind of odd to have a data type that is only used on disk,
> > >> and not in memory, but I see this as a baby varlena type, used only to
> > >> store and get varlena values using less disk space.
> >
> > > I was leaning toward generating the short varlena headers primarily in
> > > heap_form*tuple and just having the datatype specific code generate 4-byte
> > > headers much as you describe.
> >
> > I thought we had a solution for all this, namely to make the short-form
> > headers be essentially a TOAST-compressed representation. The format
> > with 4-byte headers is still legal but just not compressed. Anyone who
> > fails to detoast an input argument is already broken, so there's no code
> > compatibility hit taken.
>
> Uh. So I don't see how to make this work on a little-endian machine. If the
> leading bits are 0 we don't know if they're toast flags or bits on the least
> significant byte of a longer length.
>
> If we store all lengths in network byte order that problem goes away but then
> user code that does "VARATT_SIZEP(datum) = len" is incorrect.
>
> If we declare in-memory format to be host byte order and on-disk format to be
> network byte order then every single varlena datum needs to be copied when
> heap_deform*tuple runs.
>
> If we only do this for a new kind of varlena then only text/varchar/
> char/numeric datums would need to be copied but that's still a lot.
I wonder if we need to reorder the TOAST structure to have the bits we
need at the start of the structure so we can be sure they are first.
For example, what if we split varattrib.va_header, which is int32 now,
into four 'char' fields, and just reassemble it in the toast code. That
would be pretty localized.
I had forgotten about hooking into the TOAST system, but since we are
going to be "expanding" the headers of these types when they get into
memory, it does make sense.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +