Re: Thoughts on nbtree with logical/varwidth table identifiers, v12 on-disk representation

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Thoughts on nbtree with logical/varwidth table identifiers, v12 on-disk representation
Date: 2019-10-30 19:02:55
Message-ID: 20191030190255.hxtqtxcjset3l3pz@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2019-10-30 11:33:21 -0700, Peter Geoghegan wrote:
> On Mon, Apr 22, 2019 at 9:35 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2019-04-21 17:46:09 -0700, Peter Geoghegan wrote:
> > > Andres has suggested that I work on teaching nbtree to accommodate
> > > variable-width, logical table identifiers, such as those required for
> > > indirect indexes, or clustered indexes, where secondary indexes must
> > > use a logical primary key value instead of a heap TID.
>
> I'm revisiting this thread now because it may have relevance to the
> nbtree deduplication patch. If nothing else, the patch further commits
> us to the current heap TID format by making assumptions about the
> width of posting lists with 6 byte TIDs.

I'd much rather not entrench this further, even leaving global indexes
aside. The 4 byte block number is a significant limitation for heap
tables too, and we should lift that at some point not too far away.
Then there are also other AMs that could really use a wider TID space.
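
To put a rough number on that limit, here is a back-of-the-envelope sketch
in Python. It assumes a 32-bit block number and block sizes of 8kB (the
default) and 32kB (the largest configurable size); max_table_bytes is a
made-up helper name, not anything in the tree:

```python
# Back-of-the-envelope: how large can a table get with an N-bit block number?
# max_table_bytes is a hypothetical helper, not a PostgreSQL function.
# It ignores the one block number reserved as "invalid", which does not
# change the order of magnitude.

TIB = 1 << 40


def max_table_bytes(block_number_bits: int, block_size: int) -> int:
    """Addressable bytes = number of distinct block numbers * block size."""
    return (1 << block_number_bits) * block_size


for block_size in (8192, 32768):
    size = max_table_bytes(32, block_size)
    print(f"{block_size // 1024}kB blocks: {size // TIB} TiB")
```

With the default block size this comes out at the 32TB figure mentioned
further down.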

> Though I suppose a posting list almost has to have fixed width TIDs to
> perform acceptably.

Hm. It's not clear to me why that is?

> > I think it's two more cases:
> >
> > - table AMs that want to support tables that are bigger than 32TB. That
> > used to be unrealistic, but it's not anymore. Especially when the need
> > to VACUUM etc is largely removed / reduced.
>
> Can we steal some bits that are currently used for offset number
> instead? 16 bits is far more than we ever need to use for heap offset
> numbers in practice.

I think that's a terrible idea. For one, some AMs will have significantly
higher limits, especially taking compression and larger block sizes into
account. Also not all AMs need identifiers tied so closely to a disk
position, e.g. zedstore does not. We shouldn't hack ever more
information into the offset, given that background.
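
As a rough illustration of the trade-off, the sketch below keeps the
identifier at 48 bits (32-bit block number plus 16-bit offset, as today)
and moves bits from the offset to the block number. split_limits and the
chosen bit counts are purely illustrative assumptions, and the block size
is assumed to be 8kB:

```python
# Sketch of a fixed 48-bit row identifier: block number bits + offset bits.
# split_limits is a hypothetical helper used only for this illustration.

TID_BITS = 48


def split_limits(offset_bits: int, block_size: int = 8192):
    """Return (rows addressable per block, max table size in bytes)."""
    block_bits = TID_BITS - offset_bits
    return (1 << offset_bits), (1 << block_bits) * block_size


for offset_bits in (16, 12, 9):
    rows, size = split_limits(offset_bits)
    print(f"offset bits={offset_bits}: {rows} rows/block, {size >> 40} TiB")
```

Every bit moved doubles the addressable table size but halves the number
of rows an AM can address within one block, so an AM that packs many rows
into a block would hit the offset limit first.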

> (I wonder if this would also have benefits for the representation of
> in-memory bitmaps?)

Hm. Not sure how?

> > - global indexes (for cross-partition unique constraints and such),
> > which need a partition identifier as part of the tid (or as part of
> > the index key, but I think that actually makes interaction with
> > indexam from other layers more complicated - the inside of the index
> > may want to represent it as a column, but to the outside that
> > ought not to be visible)
>
> Can we just use an implementation level attribute for this? Would it
> be so bad if we weren't able to jump straight to the partition number
> without walking through the tuple when the tuple has varwidth
> attributes? (If that isn't acceptable, then we can probably make it
> work for global indexes without having to generalize everything.)

Having to walk through the index tuple might be acceptable - in all
likelihood we'll have to do so anyway. It does however not *really*
resolve the issue that we still need to pass something TID-like back from the
indexam, so we can fetch the associated tuple from the heap, or add the
tid to a bitmap. But that could be done separately from the index
internal data structures.
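
A toy illustration of what "walking through the tuple" means here. It
assumes a made-up layout in which every attribute is stored as a one-byte
length followed by its payload, with the partition identifier as the last
attribute. This is not nbtree's actual tuple format, and both function
names are invented for the example:

```python
# Toy layout (an assumption, not nbtree's format): each attribute is a
# 1-byte length followed by that many payload bytes.

def make_tuple(*attrs: bytes) -> bytes:
    """Build a toy tuple from attribute payloads (each under 256 bytes)."""
    return b"".join(bytes([len(a)]) + a for a in attrs)


def last_attribute(tuple_bytes: bytes, natts: int) -> bytes:
    """Walk the length-prefixed attributes and return the final one."""
    pos = 0
    for _ in range(natts - 1):
        pos += 1 + tuple_bytes[pos]  # skip the length byte plus the payload
    length = tuple_bytes[pos]
    return tuple_bytes[pos + 1 : pos + 1 + length]


# Two variable-width key attributes, then a 4-byte partition identifier.
toy = make_tuple(b"alice", b"x", (7).to_bytes(4, "little"))
partition_id = int.from_bytes(last_attribute(toy, 3), "little")
print(partition_id)
```

The position of the last attribute depends on the widths of everything
before it, so it cannot be reached at a fixed offset.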

> Generalizing the nbtree AM to be able to work with an arbitrary type
> of table row identifier that isn't at all like a TID raises tricky
> definitional questions. It would have to work in a way that made the
> new variety of table row identifier stable, which is a significant new
> requirement (and one that zheap is clearly not interested in).

Hm. I don't see why different types of TID would imply them being
stable?

> I am not suggesting that these issues are totally insurmountable. What
> I am saying is this: If we already had "stable logical" TIDs instead
> of "mostly physical TIDs", then generalizing nbtree index tuples to
> store arbitrary table row identifiers would more or less be all about
> the data structure managed by nbtree. But that isn't the case, and
> that strongly discourages me from working on this -- we shouldn't talk
> about the problem as if it is mostly just a matter of settling on the
> best index tuple format.

> Frankly I am not very enthusiastic about working on a project that has
> unclear scope and unclear benefits for users.

Why would properly supporting AMs like zedstore, global indexes,
"indirect" indexes etc not benefit users?

Greetings,

Andres Freund
