Re: MaxOffsetNumber for Table AMs

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MaxOffsetNumber for Table AMs
Date: 2021-04-30 17:28:37
Message-ID: CAH2-WzneVxuCZBMW+MWPATS916xGjQVmguOb2ibJg=hJ0yTZhQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 30, 2021 at 10:10 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > There are two major reasons why I want variable-width tuple IDs. One
> > is global indexes, where you need as many bits as the AMs implementing
> > the partitions need, plus some extra bits to identify which partition
> > is relevant for a particular tuple. No fixed number of bits that you
> > make available can ever be sufficient here,
>
> I agree that global indexes need more bits, but it doesn't necessarily
> follow that we must have variable-width TIDs. We could for example
> say that "real" TIDs are only 48 bits and index AMs that want to be
> usable as global indexes must be capable of handling 64-bit TIDs,
> leaving 16 bits for partition ID. A more forward-looking definition
> would require global index AMs to store 96 bits (partition OID plus
> 64-bit TID). Either way would be far simpler for every moving part
> involved than going over to full varlena TIDs.
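
For concreteness, the first fixed-width option quoted above could be
sketched roughly as follows. This is only an illustration, not code from
any patch: the function names are hypothetical, and it assumes the 48-bit
"real" TID is a 32-bit block number plus a 16-bit offset, with the 16-bit
partition ID placed in the high bits so that plain integer comparison
sorts by partition first.

```c
#include <stdint.h>

/*
 * Hypothetical 64-bit "global TID" layout (an assumption for illustration):
 *   bits 63..48  partition ID   (16 bits)
 *   bits 47..16  block number   (32 bits)
 *   bits 15..0   offset number  (16 bits)
 */
uint64_t
sketch_pack_gtid(uint16_t part, uint32_t block, uint16_t offset)
{
    return ((uint64_t) part << 48) |
           ((uint64_t) block << 16) |
           (uint64_t) offset;
}

/* Extract the partition ID from the high 16 bits. */
uint16_t
sketch_gtid_part(uint64_t gtid)
{
    return (uint16_t) (gtid >> 48);
}

/* Extract the 32-bit block number from the middle bits. */
uint32_t
sketch_gtid_block(uint64_t gtid)
{
    return (uint32_t) ((gtid >> 16) & 0xFFFFFFFFu);
}

/* Extract the 16-bit offset number from the low bits. */
uint16_t
sketch_gtid_offset(uint64_t gtid)
{
    return (uint16_t) (gtid & 0xFFFFu);
}
```

The 96-bit variant would instead carry a 32-bit partition OID alongside a
full 64-bit TID rather than packing both into a single integer.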

The question of how the on-disk format of indexes needs to change
to accommodate global indexes seems entirely separate from the question
of how we go about expanding or redefining TIDs.

Global indexes should work by adding an extra column that is somewhat
like a TID, that may even have its own pg_attribute entry. It's much
more natural to make the partition number a separate column IMV --
nbtree suffix truncation and deduplication can work in about the same
way as before. Plus you'll need to do predicate pushdown using the
partition identifier in some scenarios anyway. You can make the
partition identifier variable-width without imposing the cost and
complexity of variable-width TIDs on index AMs.
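
To make the separate-column idea concrete, here is a minimal sketch in
C. It is not nbtree code: the names (SketchTid, SketchEntry,
sketch_entry_cmp) are hypothetical, and the ordering of user key, then
partition identifier, then TID is an assumption made for illustration.
The partition identifier is a variable-width byte string compared like
any other key column, while the TID keeps its fixed width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-width TID: 32-bit block number + 16-bit offset. */
typedef struct SketchTid
{
    uint32_t block;
    uint16_t offset;
} SketchTid;

/*
 * Hypothetical global-index entry.  The partition identifier is an
 * ordinary (variable-width) column; the TID itself is unchanged.
 */
typedef struct SketchEntry
{
    int32_t        key;       /* user-visible key column */
    const uint8_t *part;      /* partition identifier bytes */
    size_t         part_len;  /* length of partition identifier */
    SketchTid      tid;       /* fixed-width tiebreaker */
} SketchEntry;

static int
cmp_u32(uint32_t a, uint32_t b)
{
    return (a > b) - (a < b);
}

/* Three-way comparison: key, then partition identifier, then TID. */
int
sketch_entry_cmp(const SketchEntry *a, const SketchEntry *b)
{
    size_t n;
    int    c;

    assert(a != NULL && b != NULL);

    if (a->key != b->key)
        return (a->key > b->key) - (a->key < b->key);

    /* Bytewise comparison; on a shared prefix the shorter value sorts first. */
    n = a->part_len < b->part_len ? a->part_len : b->part_len;
    c = (n > 0) ? memcmp(a->part, b->part, n) : 0;
    if (c != 0)
        return c < 0 ? -1 : 1;
    if (a->part_len != b->part_len)
        return a->part_len < b->part_len ? -1 : 1;

    /* Only entries equal on every other column fall through to the TID. */
    c = cmp_u32(a->tid.block, b->tid.block);
    if (c != 0)
        return c;
    return cmp_u32(a->tid.offset, b->tid.offset);
}
```

In this sketch, only the partition column's comparison and storage know
about variable width. The TID comparison at the end is the same as it
would be without global indexes.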

I believe that the main reason why there have been so few problems
with any of the nbtree work in the past few releases is that it
avoided certain kinds of special cases. Any special cases in the
on-disk format and in the space accounting used when choosing a split
point ought to be avoided at all costs. We can probably afford to add
a lot of complexity to make global indexes work, but it ought to be
confined, in an obvious way, to cases that actually use global indexes.

--
Peter Geoghegan
