Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)
Date: 2023-07-09 16:16:26
Message-ID: 677c95ad-513f-6702-3c59-0afa2790a149@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 7/8/23 23:57, Heikki Linnakangas wrote:
> On 02/07/2023 19:09, Tomas Vondra wrote:
>> Here's an updated version of the patch series.
>>
>> I've polished and pushed the first three patches with cleanup, tests to
>> improve test coverage and so on. I chose not to backpatch those - I
>> planned to do that to make future backpatches simpler, but the changes
>> ended up less disruptive than expected.
>>
>> The remaining patches are just about adding SK_SEARCHARRAY to BRIN.
>>
>> 0001 - adds the optional preprocess procedure, calls it from brinrescan
>>
>> 0002 to 0005 - adds the support to the existing BRIN opclasses
>
> Could you implement this completely in the consistent-function, by
> caching the sorted array in fn_extra, without adding the new preprocess
> procedure? On first call, when fn_extra == NULL, sort the array and
> stash it in fn_extra.
>
> I don't think that works, because fn_extra isn't reset when the scan
> keys change on rescan. We could reset it, and document that you can use
> fn_extra for per-scankey caching. There's some precedence for not
> resetting it though, see commit d22a09dc70f. But we could provide an
> opaque per-scankey scratch space like that somewhere else. In BrinDesc,
> perhaps.
>

Hmm, yeah. BrinDesc seems like a good place for such scratch space ...

And it's seem to alleviate most of the compatibility issues, because
it'd make the preprocessing a responsibility of the consistent function,
instead of doing it in a separate optional procedure (and having to deal
with cases when it does not exist). If would not even need a separate
procnum.

> The new preprocess support function feels a bit too inflexible to me.
> True, you can store whatever you want in the ScanKey that it returns,
> but since that's the case, why not just make it void * ?It seems that
> the constraint here was that you need to pass a ScanKey to the
> consistent function, because the consistent function's signature is what
> it is. But we can change the signature, if we introduce a new support
> amproc number for it.
>

Now sure I follow - what should be made (void *)? Oh, you mean not
passing the preprocessed array as a scan key at all, and instead passing
it as a new (void*) parameter to the (new) consistent function?

Yeah, I was trying to stick to the existing signature of the consistent
function, to minimize the necessary changes.

>> The main open question I have is what exactly does it mean that the
>> procedure is optional. In particular, should it be supported to have a
>> BRIN opclass without the "preprocess" procedure but using the other
>> built-in support procedures?
>>
>> For example, imagine you have a custom BRIN opclass in an extension (for
>> a custom data type or something). This does not need to implement any
>> procedures, it can just call the existing built-in ones. Of course, this
>> won't get the "preprocess" procedure automatically.
>>
>> Should we support such opclasses or should we force the extension to be
>> updated by adding a preprocess procedure? I'd say "optional" means we
>> should support (otherwise it'd not really optional).
>>
>> The reason why this matters is that "amsearcharray" is AM-level flag,
>> but the support procedure is defined by the opclass. So the consistent
>> function needs to handle SK_SEARCHARRAY keys both with and without
>> preprocessing.
>>
>> That's mostly what I did for all existing BRIN opclasses (it's a bit
>> confusing that opclass may refer to both the "generic" minmax or the
>> opclass defined for a particular data type). All the opclasses now
>> handle three cases:
>>
>> 1) scalar keys (just like before, with amsearcharray=fase)
>>
>> 2) array keys with preprocessing (sorted array, array of hashes, ...)
>>
>> 3) array keys without preprocessing (for compatibility with old
>>     opclasses missing the optional preprocess procedure)
>>
>> The current code is a bit ugly, because it duplicates a bunch of code,
>> because the option (3) mostly does (1) in a loop. I'm confident this can
>> be reduced by refactoring and reusing some of the "shared" code.
>>
>> The question is if my interpretation of what "optional" procedure means
>> is reasonable. Thoughts?
>>
>> The other thing is how to test this "compatibility" code. I assume we
>> want to have the procedure for all built-in opclasses, so that won't
>> exercise it. I did test it by temporarily removing the procedure from a
>> couple pg_amproc.dat entries. I guess creating a custom opclass in the
>> regression test is the right solution.
>
> It would be unpleasant to force all BRIN opclasses to immediately
> implement the searcharray-logic. If we don't want to do that, we need to
> implement generic SK_SEARCHARRAY handling in BRIN AM itself. That would
> be pretty easy, right? Just call the regular consistent function for
> every element in the array.
>

True, although the question is how many out-of-core opclasses are there.
My impression is the number is pretty close to 0, in which case we're
making ourselves to jump through all kinds of hoops, making the code
more complex, with almost no benefit in the end.

> If an opclass wants to provide a faster/better implementation, it can
> provide a new consistent support procedure that supports that. Let's
> assign a new amproc number for new-style consistent function, which must
> support SK_SEARCHARRAY, and pass it some scratch space where it can
> cache whatever per-scankey data. Because it gets a new amproc number, we
> can define the arguments as we wish. We can pass a pointer to the
> per-scankey scratch space as a new argument, for example.
>
> We did this backwards-compatibility dance with the 3/4-argument variants
> of the current consistent functions. But I think assigning a whole new
> procedure number is better than looking at the number of arguments.
>

I actually somewhat hate the 3/4-argument dance, and I'm opposed to
doing that sort of thing again. First, I'm not quite convinced it's
worth the effort to jump through this hoop (I recall this being one of
the headaches when fixing the BRIN NULL handling). Second, it can only
be done once - imagine we now need to add a new optional parameter.
Presumably, we'd need to keep the existing 3/4 variants, and add new 4/5
variants. At which point 4 is ambiguous.

Yes, my previous message was mostly about backwards compatibility, and
this may seem a bit like an argument against it. But that message was
more a question "If we do this, is it actually backwards compatible the
way we want/need?")

Anyway, I think the BrinDesc scratch space is a neat idea, I'll try
doing it that way and report back in a couple days.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joseph Koshakow 2023-07-09 17:03:14 Re: Preventing non-superusers from altering session authorization
Previous Message Tom Lane 2023-07-09 14:18:30 Re: BUG #18016: REINDEX TABLE failure