Re: Reducing data type space usage

From: Hannu Krosing <hannu(at)skype(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing data type space usage
Date: 2006-09-18 11:45:43
Message-ID: 1158579943.3147.5.camel@localhost.localdomain
Lists: pgsql-hackers

On Fri, 2006-09-15 at 19:34, Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Oh, OK, I had high byte meaning no header, but clear is better, so
> > 00000001 is 0x01, and 00000000 is "". But I see now that bytea does
> > store nulls, so yea, we would be better using 10000001, and it is the
> > same size as 00000000.
>
> I'm liking this idea more the more I think about it, because it'd
> actually be far less painful to put into the system structure than the
> other idea of fooling with varlena headers. To review: Bruce is
> proposing a var-length type structure with the properties
>
> first byte 0xxxxxxx ---- field length 1 byte, exactly that value
> first byte 1xxxxxxx ---- xxxxxxx data bytes follow

Would adding this

first byte 0xxxxxxx ---- field length 1 byte, exactly that value
first byte 10xxxxxx ---- xxxxxx data bytes follow
first byte 110xxxxx ---- xxxxx xxxxxxxx data bytes follow
first byte 111xxxxx ---- xxxxx xxxxxxxx xxxxxxxx xxxxxxxx data bytes follow

be too expensive?

It seems that for strings up to 63 bytes it would be as expensive as it is
with your proposal, but it would scale nicely up to 536870911 (2^29 - 1) bytes.

This would be especially good for datasets where most values are below 63
(or 127) bytes, with only a small percentage above.
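
To make the cost of the extra prefix tests concrete, here is a minimal sketch of the four-case header described above. It is only an illustration: the function names are made up, and the big-endian order of the length bytes is an assumption, since the proposal does not specify a byte order.

```python
# Hypothetical sketch of the proposed variable-length header.
# Names and big-endian length layout are assumptions, not PostgreSQL code.

def encode_field(data: bytes) -> bytes:
    n = len(data)
    if n == 1 and data[0] < 0x80:
        # 0xxxxxxx: the single byte is the value itself, no header
        return data
    if n < (1 << 6):
        # 10xxxxxx: 6-bit length, then the data
        return bytes([0x80 | n]) + data
    if n < (1 << 13):
        # 110xxxxx xxxxxxxx: 13-bit length, then the data
        return bytes([0xC0 | (n >> 8), n & 0xFF]) + data
    if n < (1 << 29):
        # 111xxxxx + 3 bytes: 29-bit length, then the data
        header = [0xE0 | (n >> 24), (n >> 16) & 0xFF, (n >> 8) & 0xFF, n & 0xFF]
        return bytes(header) + data
    raise ValueError("value too long for a 29-bit length")


def decode_field(buf: bytes, pos: int = 0):
    """Return (value, position of the next field)."""
    first = buf[pos]
    if first < 0x80:
        return buf[pos:pos + 1], pos + 1
    if first < 0xC0:
        n, start = first & 0x3F, pos + 1
    elif first < 0xE0:
        n, start = ((first & 0x1F) << 8) | buf[pos + 1], pos + 2
    else:
        n = ((first & 0x1F) << 24) | (buf[pos + 1] << 16) \
            | (buf[pos + 2] << 8) | buf[pos + 3]
        start = pos + 4
    return buf[start:start + n], start + n
```

In this sketch a value of up to 63 bytes needs at most one comparison more than the two-case scheme, while a 64-byte value costs a 2-byte header and anything of 8192 bytes or more costs 4 bytes.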

> This can support *any* stored value from zero to 127 bytes long.
> We can imagine creating new datatypes "short varchar" and "short char",
> and then having the parser silently substitute these types for varchar(N)
> or char(N) whenever N <= 127 / max_encoding_length. Add some
> appropriate implicit casts to convert these to the normal varlena types
> for computation, and away you go. No breakage of any existing
> datatype-specific code, just a few additions in places like
> heap_form_tuple.
>
> regards, tom lane
>
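
For comparison, the quoted two-case scheme and the substitution rule "N <= 127 / max_encoding_length" can be sketched as follows. This is a rough illustration under stated assumptions: the function names are hypothetical, and max_encoding_length is taken to mean the maximum number of bytes per character in the database encoding.

```python
# Hypothetical sketch of the quoted 1-byte-header scheme; not PostgreSQL code.

def encode_short(data: bytes) -> bytes:
    n = len(data)
    if n == 1 and data[0] < 0x80:
        # 0xxxxxxx: the field is exactly this one byte
        return data
    if n > 127:
        raise ValueError("short types hold at most 127 bytes")
    # 1xxxxxxx: 7-bit length, then the data
    return bytes([0x80 | n]) + data


def decode_short(buf: bytes, pos: int = 0):
    """Return (value, position of the next field)."""
    first = buf[pos]
    if first < 0x80:
        return buf[pos:pos + 1], pos + 1
    n = first & 0x7F
    return buf[pos + 1:pos + 1 + n], pos + 1 + n


def fits_short_type(n_chars: int, max_encoding_length: int) -> bool:
    # varchar(N) may use the short type only if N characters can never
    # exceed 127 bytes in the worst case.
    return n_chars * max_encoding_length <= 127
```

With an assumed maximum of 4 bytes per character, varchar(31) would qualify for the short type and varchar(32) would not.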
--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia
