From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
Subject: Re: A varint implementation for PG?
Date: 2021-08-03 18:26:16
Message-ID: CA+TgmoYyz3MXtNuxrOfx=HFOAaE=+WxbQPuxtkTPdLb=oeBTkA@mail.gmail.com
Lists: pgsql-hackers
[ resurrecting this 2-year-old thread ]
On Fri, Dec 13, 2019 at 12:45 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > If baking a new variant integer format now, I think limiting it to 64 bits
> > is probably a mistake given how long-lived PostgreSQL is, and how hard it
> > can be to change things in the protocol, on disk, etc.
>
> I don't think it's ever going to be sensible to transport 64bit quanta
> of data. Also, uh, it'd be larger than the data a postgres instance
> could really contain, given LSNs are 64 bit.
We already use 128-bit integers in some places in the source code, and
it seems more likely than not that use of 128-bit integer data types
will continue to expand over time. If we want to represent 64 bit
integers in a variable number of bits today, it does seem pretty
likely that some day we will want to do the same thing with 128 bit
integers, and maybe eventually larger. And I think that all of that is
true even though exhausting a 64-bit LSN space looks difficult today
and will likely remain difficult for the foreseeable future.
So to me, the right thing to do here probably depends a bit on the use
case. If we are talking about using this for strictly internal
purposes - e.g. cramming stuff into an internal data structure into
which users have no direct visibility - then I think it's probably
best to pick a representation that corresponds exactly to the C data
type that we're trying to store. If we're starting out with an int64
and we want to byte-squeeze it, the representation you've chosen here
seems like just the thing. And it's easy to imagine why we might also
want to have similar transformations for uint64, int32, and uint32,
and maybe eventually int128 and uint128. And if the details of the
format are a little different in each case that's fine; for these
sorts of use cases we're always going to unpack into the same data
type that we packed originally, so it's not important to have exact
compatibility across differences in signedness or bit count.
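To make that concrete, here is a rough sketch of one well-known way to byte-squeeze an int64 (a zig-zag mapping plus LEB128-style continuation bits, as protobuf does it); it's only meant to illustrate the general shape of the thing, not necessarily the format from your patch:

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Illustration only, not the format from the patch: map the signed value
 * onto an unsigned one with zig-zag encoding, then emit 7 data bits per
 * byte, using the high bit as a continuation flag (LEB128-style).
 * A 64-bit input needs at most 10 bytes.
 */
size_t
encode_varint64(int64_t value, uint8_t *buf)
{
    uint64_t    u = ((uint64_t) value << 1) ^ (value < 0 ? ~(uint64_t) 0 : 0);
    size_t      n = 0;

    do
    {
        uint8_t     b = u & 0x7F;

        u >>= 7;
        if (u != 0)
            b |= 0x80;          /* more bytes follow */
        buf[n++] = b;
    } while (u != 0);

    return n;
}

int64_t
decode_varint64(const uint8_t *buf, size_t *consumed)
{
    uint64_t    u = 0;
    int         shift = 0;
    size_t      n = 0;
    uint8_t     b;

    do
    {
        b = buf[n++];
        u |= (uint64_t) (b & 0x7F) << shift;
        shift += 7;
    } while (b & 0x80);

    *consumed = n;
    return (int64_t) ((u >> 1) ^ -(u & 1));     /* undo the zig-zag mapping */
}
```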
However, I suspect that the whole approach should be completely
revised for a user-visible data type. On the one hand, there's no
telling how large a value some user will want to represent, so
limiting ourselves to 64 bits does seem shortsighted. And on the other
hand, if we've got a varlena, we already know the length, so it seems
like we shouldn't also encode the length in the value. Maybe there's a
more efficient way, but the first thing that occurs to me is to just
discard high order bytes that are all zeroes or all ones until the
high order bit of the next byte doesn't match and plonk the remaining
bytes into the varlena. To decompress, just sign-extend out to the
target length. Really, this kind of representation can be extended to
represent arbitrarily large integers, even bigger than what we can
currently do with numeric, which is already crazy huge, and it seems
to have some advantage in that every payload byte contains exactly 8
data bits, so we don't need to shift or mask while encoding and
decoding.
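To illustrate, a rough sketch of that trimming scheme (ignoring the varlena header itself and assuming the payload is stored big-endian) might look like this:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustration only: trim an int64 down to the bytes that actually carry
 * information, assuming the payload is stored big-endian and leaving the
 * varlena header itself out of the picture.  Leading 0x00 (non-negative)
 * or 0xFF (negative) bytes are dropped as long as the high bit of the
 * next byte still agrees with the discarded sign.
 */
size_t
squeeze_int64(int64_t value, uint8_t out[8])
{
    uint8_t     buf[8];
    uint8_t     sign = (value < 0) ? 0xFF : 0x00;
    int         start = 0;

    for (int i = 0; i < 8; i++)
        buf[i] = (uint8_t) ((uint64_t) value >> (8 * (7 - i)));

    while (start < 7 &&
           buf[start] == sign &&
           (buf[start + 1] & 0x80) == (sign & 0x80))
        start++;

    memcpy(out, buf + start, 8 - start);
    return 8 - start;           /* payload length to store in the varlena */
}

/* decode: just sign-extend back out to the target width */
int64_t
unsqueeze_int64(const uint8_t *in, size_t len)
{
    uint64_t    value = (in[0] & 0x80) ? ~(uint64_t) 0 : 0;

    for (size_t i = 0; i < len; i++)
        value = (value << 8) | in[i];

    return (int64_t) value;
}
```

Every payload byte carries a full 8 data bits, so both directions are just byte moves.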
Now, we could think of introducing a new format for variable-length
datums, essentially making this a new typlen rather than a new kind of
varlena. That might be worth it, because if you are storing a lot of
values that are small enough that this format could represent them in
3 bytes or less, which I think would be everything up to +/- 2^20, you
could save a pretty significant amount of space even if your table was
laid out to avoid alignment padding. However, there would be some
distributed overhead to this, because all the places that have special
handling for typlen = -1 and typlen = -2 would need to grow new cases
for this. I'm not sure how much of a problem that is, really, but I
don't think we can go nuts with adding new typlen values.
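Concretely, the sort of extra branch I mean is something like the following, in every place that walks over attributes by typlen (purely illustrative; varsize_any() and varint_total_len() are hypothetical stand-ins for the real length lookups):

```c
#include <stddef.h>
#include <string.h>

/* hypothetical stand-ins for the real length lookups, for illustration only */
extern size_t varsize_any(const char *ptr);
extern size_t varint_total_len(const char *ptr);

/*
 * Purely illustrative: the kind of extra case that every piece of
 * tuple-walking code would grow if a new negative typlen (say -3) were
 * added alongside -1 (varlena) and -2 (cstring).
 */
size_t
attr_size(int attlen, const char *ptr)
{
    if (attlen > 0)
        return (size_t) attlen;         /* fixed-width attribute */
    else if (attlen == -1)
        return varsize_any(ptr);        /* varlena: length from its own header */
    else if (attlen == -2)
        return strlen(ptr) + 1;         /* null-terminated cstring */
    else                                /* attlen == -3: the new format */
        return varint_total_len(ptr);
}
```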
> It'll be a bit problematic to deal with all the casting necessary, and
> with the likely resulting overload resolution issues. I'm wondering
> whether it'd be worthwhile to have a ALTER TABLE ... STORAGE ... option
> that encodes int2/4/8 as varints when inside a tuple, but otherwise just
> let it be a normal integer.
I don't see how a STORAGE option could work. We tend to treat those as
hints, rather than critical data, which wouldn't work here. I think
there are a number of problems that crop up, but one of them is the
same thing we hit with the custom TOAST compression stuff. If you need
to make a value of some row type out of a HeapTuple, you're not going
to know which settings were used to create that heap tuple, and you're
certainly not going to know that when you go to deform that tuple. The
only thing you're going to have is the tuple descriptor, which AFAICS
means that the representation needs to be a property of the type, not
where the value is stored. Maybe you have a clever idea I'm not
seeing.
As for the casting and overloading issues, that's tripped up quite a
few people now when trying to add new numeric data types - unsigned
integers being a case that has come up a few times now, I think. I
don't think it has to be a blocker. I think the solution is probably
to accept that using unsigned or variable-width data types will
require inserting more casts than would be required for integer
datatypes and numeric. That may not thrill everybody, but it may still
be better than deciding we're never ever ever adding any more data
types for storing numbers than we have today.
--
Robert Haas
EDB: http://www.enterprisedb.com