From: | Heikki Linnakangas <heikki(at)enterprisedb(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Gregory Stark <stark(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Reducing data type space usage |
Date: | 2006-09-16 22:18:52 |
Message-ID: | 450C784C.8040001@enterprisedb.com |
Lists: | pgsql-hackers |
Tom Lane wrote:
> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>> The user would have to decide that he'll never need a value over 127
>> bytes long ever in order to get the benefit.
>
> Weren't you the one that's been going on at great length about how
> wastefully we store CHAR(1) ? Sure, this has a somewhat restricted
> use case, but it's about as efficient as we could possibly get within
> that use case.
I like the idea of having variable length headers much more than a new
short character type. It solves a more general problem, and it
compresses VARCHAR(>255) and TEXT fields nicely when the actual data in
the field is small.
I'd like to propose one more encoding scheme, based on Tom's earlier
proposals. The use cases I care about are:
* support uncompressed data up to 1G, like we do now
* 1 byte length word for short data.
* store typical CHAR(1) values in just 1 byte.
Tom wrote:
> * 0xxxxxxx uncompressed 4-byte length word as stated above
> * 10xxxxxx 1-byte length word, up to 62 bytes of data
> * 110xxxxx 2-byte length word, uncompressed inline data
> * 1110xxxx 2-byte length word, compressed inline data
> * 1111xxxx 1-byte length word, out-of-line TOAST pointer
My proposal is:
00xxxxxx uncompressed, aligned 4-byte length word
010xxxxx 1-byte length word, uncompressed inline data (up to 32 bytes)
011xxxxx 2-byte length word, uncompressed inline data (up to 8k)
1xxxxxxx 1 byte data in range 0x20-0x7E
1000xxxx 2-byte length word, compressed inline data (up to 4k)
11111111 TOAST pointer
The decoding algorithm is similar to Tom's proposal, and relies on using
0x00 for padding.
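To make the bit patterns concrete, here is a minimal sketch of how a decoder might classify a datum from its first byte under the proposed scheme. It is an illustration only, not code from any patch: the function name `classify_header` and the returned labels are made up for this example, it is written in Python purely for brevity, and it assumes that the single-byte form carries the character in the low 7 bits (so 0x20-0x7E is stored as 0xA0-0xFE). Under that assumption the `1xxxxxxx`, `1000xxxx` and `11111111` rows do not collide, because 0x80-0x8F and 0xFF fall outside the character range. Skipping 0x00 pad bytes to reach an aligned 4-byte length word is not modeled.

```python
def classify_header(first_byte: int) -> str:
    """Classify a datum by its first byte (hypothetical helper).

    Returns a descriptive label; the labels are illustrative only.
    """
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("first_byte must fit in one byte")

    if first_byte == 0xFF:                 # 11111111
        return "toast-pointer"
    if first_byte & 0xC0 == 0x00:          # 00xxxxxx
        return "4-byte-length-aligned"
    if first_byte & 0xE0 == 0x40:          # 010xxxxx
        return "1-byte-length-uncompressed"
    if first_byte & 0xE0 == 0x60:          # 011xxxxx
        return "2-byte-length-uncompressed"
    if first_byte & 0xF0 == 0x80:          # 1000xxxx
        return "2-byte-length-compressed"

    # Remaining values have the high bit set. Assumption: the low 7 bits
    # hold the character itself, valid only for printable ASCII.
    low7 = first_byte & 0x7F
    if 0x20 <= low7 <= 0x7E:               # 1xxxxxxx, data 0x20-0x7E
        return "single-byte-data"

    # 0x90-0x9F is not assigned by the list above.
    return "unassigned"
```

For example, under the stated assumption a CHAR(1) value 'A' (0x41) would be stored as the single byte 0xC1, which `classify_header(0xC1)` reports as `single-byte-data`.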
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com