From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com> |
Subject: | Re: A varint implementation for PG? |
Date: | 2021-08-04 23:21:18 |
Message-ID: | cdfe5ab1-f3f0-7d1d-51f1-6493072df9ed@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 8/5/21 1:05 AM, Andres Freund wrote:
> Hi,
>
> On 2021-08-04 23:44:10 +0200, Tomas Vondra wrote:
>> How is that better than the two varint flavors that are already out there,
>> i.e. the bitcoin [1] and protocol buffers [2]?
>
> The protobuf one is *terrible* for CPU efficiency. You need to go through each
> byte, do masking and shifting for each byte and then have a conditional
> branch. That's bad from the the amount of instructions you need to execute,
> and *really* bad for the branch predictor.
>
Yeah, probably true - particularly for longer values. No argument here.
>
>> The first one seems quite efficient in how it encodes the length into very
>> few bits (which matters especially for small values). It's designed for
>> integers with 1B, 2B, 4B or 8B, but it can be extended to arbitrary lengths
>> fairly easily, I think:
>
>> Look at the first byte, and
>>
>> 0 - 243 - encoded as is
>> 244 - 1 byte
>> 245 - 2 bytes
>> 246 - 3 bytes
>> 247 - 4 bytes
>> 248 - 5 bytes
>> 249 - 6 bytes
>> 250 - 7 bytes
>> 251 - 8 bytes
>> 252 - next 1 byte is length
>> 253 - next 2 bytes are length
>> 254 - next 3 bytes are length
>> 255 - next 4 bytes are length
>
>> If we want to support longer lengths, we'd have to reserve an extra value
>> (which reduces the number of values that require a single byte).
>
> I think that's not a bad scheme. I think it may end up being a bit more
> expensive to decode because you need more branches instead of using
> find-first-set type instructions.
>
I don't think it requires many branches, because you can essentially do
if (byte[0] <= 243)
length = 0
else if (byte[0] <= 251)
length = byte[0] - 243
else
{
length_bytes = byte[0] - 251;
... read length_bytes, decode length
}
but I haven't tried implementing it and maybe my intuition is wrong.
Or maybe it'd be a good scheme for on-disk format, but poor for memory.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Kyotaro Horiguchi | 2021-08-05 01:57:57 | Re: archive status ".ready" files may be created too early |
Previous Message | Andres Freund | 2021-08-04 23:05:25 | Re: A varint implementation for PG? |