Re: Index AM API cleanup

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)enterprisedb(dot)com>, Alex Wang <alex(dot)wang(at)enterprisedb(dot)com>
Subject: Re: Index AM API cleanup
Date: 2024-08-26 12:21:26
Message-ID: dca84a11-dfbf-4f28-abc8-44c841256cd1@eisentraut.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 21.08.24 21:25, Mark Dilger wrote:
> The next twenty patches are a mix of fixes of various layering
> violations, such as not allowing non-core index AMs from use in replica
> identity full, or for speculative insertion, or for foreign key
> constraints, or as part of merge join; with updates to the "treeb" code
> as needed. The changes to "treeb" are broken out so that they can also
> easily be excluded from whatever gets committed.

I made a first pass through this patch set. I think the issues it aims
to address are mostly legitimate. In a few cases, we might need some
more discussion and perhaps will end up slicing the APIs a bit
differently. The various patches that generalize the strategy numbers
appear to overlap with things being discussed at [0], so we should see
that the solution covers all the use cases.

[0]:
https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg(at)mail(dot)gmail(dot)com

To make a dent, I picked out something that should be mostly harmless:
Stop calling directly into _bt_getrootheight() (patch 0004). I think
this patch is ok, but I might call the API function amgettreeheight
instead of amgetrootheight. The former seems more general.

Also, a note for us all in this thread, changes to the index AM API need
updates to the corresponding documentation in doc/src/sgml/indexam.sgml.

I notice that _bt_getrootheight() is called only to fill in the
IndexOptInfo tree_height field, which is only used by btcostestimate(),
so in some sense this is btree-internal data. But making it so that
btcostestimate() calls _bt_getrootheight() directly to avoid all that
intermediate business seems too complicated, and there was probably a
reason that the cost estimation functions don't open the index.

Interestingly, the cost estimation functions for gist and spgist also
look at the tree_height field but nothing ever fills it on. So with
your API restructuring, someone could provide the missing API functions
for those index types. Might be interesting.

That said, there might be value in generalizing this a bit. If you look
at the cost estimation functions in pgvector (hnswcostestimate() and
ivfflatcostestimate()), they both have this pattern that
btcostestimate() tries to avoid: They open the index, look up some
number, close the index, then make a cost estimate computation with the
number looked up. So another idea would be to generalize the
tree_height field to some "index size data" or even "internal data for
cost estimation". This wouldn't need to change the API much, since
these are all just integer values, but we'd label the functions and
fields a bit differently.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2024-08-26 12:40:17 Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
Previous Message Pavel Stehule 2024-08-26 12:15:56 Re: [PATCH] Add CANONICAL option to xmlserialize