From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: MaxOffsetNumber for Table AMs |
Date: | 2021-05-04 05:01:42 |
Message-ID: | 20210504050142.bhpoff7rsdpacnrq@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2021-04-30 11:51:07 -0700, Peter Geoghegan wrote:
> I think that it's reasonable to impose some cost on index AMs here,
> but that needs to be bounded sensibly and unambiguously. For example,
> it would probably be okay if you had either 6 byte or 8 byte TIDs, but
> no other variations. You could require index AMs (the subset of index
> AMs that are ever able to store 8 byte TIDs) to directly encode which
> width they're dealing with at the level of each IndexTuple. That would
> create some problems for nbtree deduplication, especially in boundary
> cases, but ISTM that you can manage the complexity by sensibly
> restricting how the TIDs work across the board.
> For example, the TIDs should always work like unsigned integers -- the
> table AM must be willing to work with that restriction.
Isn't that more a question of the encoding than the concrete representation?
> You'd then have posting lists tuples in nbtree whose TIDs were all
> either 6 bytes or 8 bytes wide, with a mix of each possible (though
> not particularly likely) on the same leaf page. Say when you have a
> table that exceeds the current MaxBlockNumber restrictions. It would
> be relatively straightforward for nbtree deduplication to simply
> refuse to mix 6 byte and 8 byte datums together to avoid complexity in
> boundary cases. The deduplication pass logic has the flexibility that
> this requires already.
Which nbtree cases do you think would have an easier time supporting
switching between 6 or 8 byte tids than supporting fully variable width
tids? Given that IndexTupleData already is variable-width, it's not
clear to me why supporting two distinct sizes would be harder than a
fully variable size? I assume it's things like BTDedupState->htids?
> > What's wrong with varlena headers? It would end up being a 1-byte
> > header in practically every case, and no variable-width representation
> > can do without a length word of some sort. I'm not saying varlena is
> > as efficient as some new design could hypothetically be, but it
> > doesn't seem like it'd be a big enough problem to stress about. If you
> > used a variable-width representation for integers, you might actually
> > save bytes in a lot of cases. An awful lot of the TIDs people store in
> > practice probably contain several zero bytes, and if we make them
> > wider, that's going to be even more true.
>
> Maybe all of this is true, and maybe it works out to be the best path
> forward in the long term, all things considered. But whether or not
> that's true is crucially dependent on what real practical table AMs
> (of which there will only ever be a tiny number) actually need to do.
> Why should we assume that the table AM cannot accept some
> restrictions? What good does it do to legalistically define the
> problem as a problem for index AMs to solve?
I don't think anybody is arguing that AMs cannot accept any restrictions? I do
think it's pretty clear that it's not entirely obvious what the concrete set
of proper restrictions would be, where we won't end up needing to re-evaluate
limits in a few years are.
If you add to that the fact that variable-width tids will often end up
considerably smaller than our current tids, it's not obvious why we should use
bitspace somewhere to indicate an 8 byte tid instead of a a variable-width
tid?
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Smith | 2021-05-04 05:08:11 | Re: AlterSubscription_refresh "wrconn" wrong variable? |
Previous Message | Andres Freund | 2021-05-04 04:31:49 | Re: AlterSubscription_refresh "wrconn" wrong variable? |