Re: Variable length varlena headers redux

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <gsstark(at)mit(dot)edu>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Variable length varlena headers redux
Date: 2007-02-13 04:19:14
Message-ID: 23221.1171340354@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Gregory Stark <gsstark(at)mit(dot)edu> writes:
> Do you see any way to avoid having every user function everywhere
> use a new macro api instead of VARDATA/VARATT_DATA and VARSIZE/VARATT_SIZEP?

Bruce and I had a long phone conversation about that today. We
concluded that there doesn't seem to be any strong reason to change the
macros for *reading* a varlena value's length. The problem is with the
code that *writes* a length word. The mother example is textin():

result = (text *) palloc(len + VARHDRSZ);
VARATT_SIZEP(result) = len + VARHDRSZ;
memcpy(VARDATA(result), inputText, len);

There is not a lot we can do with that second line: it's going to assign
the value of "len + VARHDRSZ" to something that has to be a native C
variable, struct field, or at best bit field.

If we replaced that line with something like

SET_VARLENA_LEN(result, len + VARHDRSZ);

then we'd have a *whole* lot more control. For example it'd be easy to
implement the previously-discussed design involving storing uncompressed
length words in network byte order: SET_VARLENA_LEN does htonl() and
VARSIZE does ntohl() and nothing else in the per-datatype functions
needs to change. Another idea that we were kicking around is to make
an explicit distinction between little-endian and big-endian hardware:
on big-endian hardware, store the two TOAST flag bits in the MSBs as
now, but on little-endian, store them in the LSBs, shifting the length
value up two bits. This would probably be marginally faster than
htonl/ntohl depending on hardware and compiler intelligence, but either
way you get to guarantee that the flag bits are in the physically first
byte, which is the critical thing needed to be able to tell the
difference between compressed and uncompressed length values.

By my count there are only 170 uses of VARATT_SIZEP in the entire
backend (including contrib) so this is not an especially daunting
change. It would break existing user-written functions that return
varlena values, but the fix wouldn't be painful for them either.
We could guarantee that every problem spot is found by removing the
current definition of the macro (and renaming the underlying struct
fields to get the attention of the truly unreconstructed...). The
important point here is that VARSIZE() still works, so only code that
creates a new varlena value need be affected, not code that examines
one.

Too bad we didn't get this right in the original TOAST rewrite; but
hindsight is always 20/20. Right now I think we gotta absorb some
pain if we want to improve the header-overhead situation, and this
seems about the minimum possible amount of pain.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2007-02-13 04:55:07 Re: GiST Comparing IndexTuples/Datums
Previous Message Tom Lane 2007-02-13 03:52:14 Re: DROP DATABASE and prepared xacts