From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com> |
Subject: | Re: A varint implementation for PG? |
Date: | 2021-08-05 13:18:11 |
Message-ID: | 48857269-e536-0083-e3b4-3c4a9323be80@dunslane.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 8/4/21 7:21 PM, Tomas Vondra wrote:
> On 8/5/21 1:05 AM, Andres Freund wrote:
>
>>
>>> The first one seems quite efficient in how it encodes the length
>>> into very
>>> few bits (which matters especially for small values). It's designed for
>>> integers with 1B, 2B, 4B or 8B, but it can be extended to arbitrary
>>> lengths
>>> fairly easily, I think:
>>
>>> Look at the first byte, and
>>>
>>> 0 - 243 - encoded as is
>>> 244 - 1 byte
>>> 245 - 2 bytes
>>> 246 - 3 bytes
>>> 247 - 4 bytes
>>> 248 - 5 bytes
>>> 249 - 6 bytes
>>> 250 - 7 bytes
>>> 251 - 8 bytes
>>> 252 - next 1 byte is length
>>> 253 - next 2 bytes are length
>>> 254 - next 3 bytes are length
>>> 255 - next 4 bytes are length
>>
>>> If we want to support longer lengths, we'd have to reserve an extra
>>> value
>>> (which reduces the number of values that require a single byte).
>>
>> I think that's not a bad scheme. I think it may end up being a bit more
>> expensive to decode because you need more branches instead of using
>> find-first-set type instructions.
>>
>
> I don't think it requires many branches, because you can essentially do
>
> if (byte[0] <= 243)
> length = 0
> else if (byte[0] <= 251)
> length = byte[0] - 243
> else
> {
> length_bytes = byte[0] - 251;
> ... read length_bytes, decode length
> }
>
> but I haven't tried implementing it and maybe my intuition is wrong.
>
> Or maybe it'd be a good scheme for on-disk format, but poor for memory.
>
>
>
This seems like quite an elegant scheme. Certainly worth trying out. I
find it hard to believe that more than 4 length bytes would be needed
(although that reminds me of the famous and possibly apocryphal quote
"640K ought to be enough for anybody.")
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Simon Riggs | 2021-08-05 13:28:37 | Re: Lowering the ever-growing heap->pd_lower |
Previous Message | Tomas Vondra | 2021-08-05 13:06:54 | Re: Commitfest overflow |