Re: MaxOffsetNumber for Table AMs

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MaxOffsetNumber for Table AMs
Date: 2021-04-30 10:04:44
Message-ID: CAEze2Wit1EtHHBHJ+CYvBPthrWUzu2Vqc-BmzU3ApK3iotHriw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 30 Apr 2021, 09:46 Jeff Davis, <pgsql(at)j-davis(dot)com> wrote:

>
> The notion of TID is based on pages and line pointers, which makes
> sense for heapam, but that's not likely to make sense for a custom
> table AM.
>
> The obvious answer is to make a simple mapping between a TID and
> whatever makes sense to the AM (for the sake of discussion, let's say a
> plain row number).
>
> The most natural thing would be to say that we have 48 bits, so it can
> just be a 48-bit number. Of course, there are some restrictions on
> valid values that complicate this:
>
> * InvalidBlockNumber of 0xFFFFFFFF. Not a problem.
> * InvalidOffsetNumber of 0. Not a problem.
> * MaxOffsetNumber of 2048. Does this limit really apply to table AMs?
>

MaxOffsetNumber is not per se 2048. It is BLCKSZ / sizeof(ItemIdData),
which is only 2048 for a 8kiB BLCKSZ. As we support BLCKSZ up to 32kiB,
MaxOffsetNumber can be as large as 8196.

Other than that, I believe you've also missed the special offset numbers
specified in itemptr.h (SpecTokenOffsetNumber and MovedPartitionsOffsetNumber).
I am not well enough aware of the usage of these OffsetNumber values, but
those might also be limiting the values any tableAM can use for their TIDs.

It just seems like it's used when scanning heap or index pages for
> stack-allocated arrays. For a table AM it appears to waste 5 bits.
>

MaxOffsetNumber is used for postgres' Page layout, of which the
MaxOffsetNumber is defined as how many item pointers could exist on a page,
and AFAIK should be used for postgres' Page layout only. No thing can or
should change that. If any code asserts limitations to the ip_posid of
table tuples that could also not be tableam tuples, then I believe that is
probably a mistake in postgres, and that should be fixed.

* ginpostinglist.c:itemptr_to_uint64() seems to think 2047 is the max
> offset number. Is this a bug?

No. The documentation for that function explicitly mentions that these item
pointers are optimized for storage when using the heap tableam, and that
that code will be changed once there exist tableAMs with different TID /
ip_posid constraints (see the comment on lines 32 and 33 of that file).

Note that the limiting number that itemptr_to_uint64 should mention for bit
calculations is actually MaxHeaptuplesPerPage, which is about one seventh
of MaxOffsetNumber. The resulting number of bits reserved is not a
miscalculation though, because MaxHeaptuplesPerPage (for 32kiB BLCKSZ)
requires the mentioned 11 bits, and adapting bit swizzling for multiple
page sizes was apparently not considered worth the effort.

As a table AM author, I'd like to know what the real limits are so that
> I can use whatever bits are available to map from TID to row number and
> back, without worrying that something will break in the future. A few
> possibilities:
>
> 1. Keep MaxOffsetNumber as 2048 and fix itemptr_to_uint64().
>

I believe that this is the right way when there exist tableAMs that use
those upper 5 bits.

> 2. Change MaxOffsetNumber to 2047. This seems likely to break
> extensions that rely on it.
>

If you're going to change MaxOffsetNumber, I believe that it's better to
change it to ((BLCKSZ - sizeof(PageHeaderData)) / sizeof(ItemIdData)), as
that is the maximum amount of ItemIds you could put on a Page that has no
page opaque.

3. Define MaxOffsetNumber as 65536 and introduce a new
> MaxItemsPerPage as 2048 for the stack-allocated arrays. We'd still need
> to fix itemptr_to_uint64().

I believe that that breaks more things than otherwise required. ip_posid is
already limited to uint16, so I see no reason to add a constant that would
assert that the value of any uint16 is less its max value plus one.

With regards,

Matthias van de Meent

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Patrik Novotny 2021-04-30 13:13:43 Help needed with a reproducer for CVE-2020-25695 not based on REFRESH MATERIALIZED VIEW
Previous Message Amit Kapila 2021-04-30 09:31:04 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions