From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Thoughts on nbtree with logical/varwidth table identifiers, v12 on-disk representation |
Date: | 2019-10-30 19:02:55 |
Message-ID: | 20191030190255.hxtqtxcjset3l3pz@alap3.anarazel.de |
Lists: | pgsql-hackers |
Hi,
On 2019-10-30 11:33:21 -0700, Peter Geoghegan wrote:
> On Mon, Apr 22, 2019 at 9:35 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2019-04-21 17:46:09 -0700, Peter Geoghegan wrote:
> > > Andres has suggested that I work on teaching nbtree to accommodate
> > > variable-width, logical table identifiers, such as those required for
> > > indirect indexes, or clustered indexes, where secondary indexes must
> > > use a logical primary key value instead of a heap TID.
>
> I'm revisiting this thread now because it may have relevance to the
> nbtree deduplication patch. If nothing else, the patch further commits
> us to the current heap TID format by making assumptions about the
> width of posting lists with 6 byte TIDs.
I'd much rather not entrench this further, even leaving global indexes
aside. The 4 byte block number is a significant limitation for heap
tables too, and we should lift that at some point not too far away.
Then there's also other AMs that could really use a wider tid space.
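For reference, the limit falls straight out of the TID layout: a 4 byte block number plus a 2 byte offset number, 6 bytes in total. The sketch below is illustrative Python, not PostgreSQL code. The function names are made up, the byte order is chosen for convenience rather than matching the on-disk struct, and the size calculation ignores the one block number reserved as invalid.

```python
import struct

BLOCK_NUMBER_BITS = 32   # 4 byte block number in a heap TID
OFFSET_NUMBER_BITS = 16  # 2 byte offset number in a heap TID


def pack_tid(block, offset):
    """Pack (block, offset) into 6 bytes: uint32 block, then uint16 offset.

    struct.error is raised if either value does not fit its field.
    """
    return struct.pack("<IH", block, offset)


def unpack_tid(raw):
    """Inverse of pack_tid: return (block, offset) from 6 bytes."""
    block, offset = struct.unpack("<IH", raw)
    return block, offset


def max_table_bytes(block_size=8192):
    """Approximate addressable heap size for a given block size."""
    return (1 << BLOCK_NUMBER_BITS) * block_size
```

With the default 8kB block size, max_table_bytes() comes out at 2^45 bytes, i.e. the 32TB figure quoted below. Larger block sizes raise it, but only linearly.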
> Though I suppose a posting list almost has to have fixed width TIDs to
> perform acceptably.
Hm. It's not clear to me why that is?
> > I think it's two more cases:
> >
> > - table AMs that want to support tables that are bigger than 32TB. That
> > used to be unrealistic, but it's not anymore. Especially when the need
> > to VACUUM etc is largely removed / reduced.
>
> Can we steal some bits that are currently used for offset number
> instead? 16 bits is far more than we ever need to use for heap offset
> numbers in practice.
I think that's a terrible idea. For one, some AMs will have significantly
higher limits, especially taking compression and larger block sizes into
account. Also, not all AMs need identifiers tied so closely to a disk
position, e.g. zedstore does not. We shouldn't hack ever more
information into the offset, given that background.
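To make the block-size point concrete, here is a rough sketch of how many offset bits the heap itself needs. It is illustrative Python with hypothetical helper names. The constants (24 byte page header, 24 byte aligned heap tuple header, 4 byte line pointer) are assumptions taken from a default 64-bit build, and the formula only models plain heap pages.

```python
def max_heap_tuples_per_page(block_size, page_header=24,
                             tuple_header=24, line_pointer=4):
    """Upper bound on line pointers per heap page (assumed constants)."""
    return (block_size - page_header) // (tuple_header + line_pointer)


def offset_bits_needed(max_offset):
    """Bits required to represent offsets up to max_offset."""
    return max_offset.bit_length()


def spare_offset_bits(block_size, offset_bits=16):
    """Bits of a 16 bit offset number that the heap would leave unused."""
    needed = offset_bits_needed(max_heap_tuples_per_page(block_size))
    return offset_bits - needed
```

Under these assumptions an 8kB heap page tops out at 291 line pointers (9 bits, 7 spare), while a 32kB page reaches 1169 (11 bits, 5 spare). An AM that compresses its data, or that hands out logical identifiers, is not bound by this formula at all, so whatever bits look spare for heap are not reliably spare elsewhere.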
> (I wonder if this would also have benefits for the representation of
> in-memory bitmaps?)
Hm. Not sure how?
> > - global indexes (for cross-partition unique constraints and such),
> > which need a partition identifier as part of the tid (or as part of
> > the index key, but I think that actually makes interaction with
> > indexam from other layers more complicated - the inside of the index
> > may want to represent it as a column, but to the outside that
> > ought not to be visible)
>
> Can we just use an implementation level attribute for this? Would it
> be so bad if we weren't able to jump straight to the partition number
> without walking through the tuple when the tuple has varwidth
> attributes? (If that isn't acceptable, then we can probably make it
> work for global indexes without having to generalize everything.)
Having to walk through the index tuple might be acceptable - in all
likelihood we'll have to do so anyway. It does not, however, *really*
resolve the issue that we still need to pass something TID-like back from
the indexam, so we can fetch the associated tuple from the heap, or add
the tid to a bitmap. But that could be done separately from the index
internal data structures.
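A minimal sketch of that separation, in illustrative Python. Every name here is hypothetical and nothing like it exists in the tree: the index keeps the partition identifier as a trailing implementation-level attribute, while callers only ever see a (partition, tid) pair.

```python
from typing import NamedTuple


class HeapTid(NamedTuple):
    block: int
    offset: int


class GlobalRowId(NamedTuple):
    """What the indexam would hand back to the outside (hypothetical)."""
    partition_oid: int
    tid: HeapTid


def index_tuple_to_row_id(index_tuple):
    """Split an internal tuple (*keys, partition_oid, block, offset).

    Walking past the key columns stands in for walking a real index
    tuple with variable-width attributes.
    """
    *_keys, partition_oid, block, offset = index_tuple
    return GlobalRowId(partition_oid, HeapTid(block, offset))


def group_for_bitmap(row_ids):
    """Build one TID set per partition, as a stand-in for per-table bitmaps."""
    bitmaps = {}
    for rid in row_ids:
        bitmaps.setdefault(rid.partition_oid, set()).add(rid.tid)
    return bitmaps
```

The internal tuple format can then change without the callers noticing, as long as the returned identifier stays the same shape.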
> Generalizing the nbtree AM to be able to work with an arbitrary type
> of table row identifier that isn't at all like a TID raises tricky
> definitional questions. It would have to work in a way that made the
> new variety of table row identifier stable, which is a significant new
> requirement (and one that zheap is clearly not interested in).
Hm. I don't see why different types of TID would imply them being
stable?
> I am not suggesting that these issues are totally insurmountable. What
> I am saying is this: If we already had "stable logical" TIDs instead
> of "mostly physical TIDs", then generalizing nbtree index tuples to
> store arbitrary table row identifiers would more or less be all about
> the data structure managed by nbtree. But that isn't the case, and
> that strongly discourages me from working on this -- we shouldn't talk
> about the problem as if it is mostly just a matter of settling on the
> best index tuple format.
>
> Frankly I am not very enthusiastic about working on a project that has
> unclear scope and unclear benefits for users.
Why would properly supporting AMs like zedstore, global indexes,
"indirect" indexes etc not benefit users?
Greetings,
Andres Freund