Re: POC, WIP: OR-clause support for indexes

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2024-06-24 19:02:47
Message-ID: CAH2-WzmKYo2pnQcKAhjZFofW8Ow23B4CZHCUSpCZ+z5YR4eXvQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 24, 2024 at 2:28 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Jun 24, 2024 at 1:47 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > I agree, with the proviso that "avoid gratuitous failures" should
> > include cases where a query that got the optimization suddenly fails
> > to get the optimization, due only to some very innocuous looking
> > change. Such as a change from using a constant 1_000_000_000 to a
> > constant 5_000_000_000 in the query text. That is a POLA violation.
>
> Nope, I don't agree with that at all. If you imagine that we can
> either have the optimization apply to one of those cases on the other,
> or on the other hand we can have some cases that outright fail, I
> think it's entirely clear that the former is better.

I'm just saying that not having the optimization apply to a query very
similar to one where it does apply is a POLA violation. That's another
kind of failure, for all practical purposes. Weird performance cliffs
like that are bad. It's very easy to imagine code that generates a
query text, that at some point randomly and mysteriously gets a
sequential scan. Or a much less efficient index scan.

> I was assuming this patch shouldn't be changing the way indexes work
> at all, just making use of the facilities that we have today. More
> could be done, but that might make it harder to get anything
> committed.

I was just pointing out that there is currently no good way to make
nbtree efficiently execute a qual "WHERE a = 5 OR a IS NULL", which is
almost entirely (though not quite entirely) due to a lack of any way
of expressing that idea through SQL, in a way that'll get pushed down
to the index scan node. You can write "WHERE a = any('{5,NULL')", of
course, but that doesn't treat NULL as just another array element to
match against via an IS NULL qual (due to NULL semantics).

Yeah, this is nonessential. But it's quite a nice optimization, and
seems entirely doable within the framework of the patch. It would be a
natural follow-up.

All that I'd need on the nbtree side is to get an input scan key that
directly embodies "WHERE mycol = 5 OR mycol IS NULL". That would
probably just be a scan key with sk_flags "SK_SEARCHARRAY |
SK_SEARCHNULL", that was otherwise identical to the current
SK_SEARCHARRAY scan keys.

Adopting the nbtree array index scan code to work with this would be
straightforward. SK_SEARCHNULL scan keys basically already work like
regular equality scan keys at execution time, so all that this
optimization requires on the nbtree side is teaching
_bt_advance_array_keys to treat NULL as a distinct array condition
(evaluated as IS NULL, not as = NULL).

> It's even possible, in my mind at least, that the patch is already
> doing exactly the right things here. Even if it isn't, the problem
> doesn't seem to be fundamental, because if this example can work (and
> it does) then what the patch is trying to do should be workable, too.
> We just have to make sure we're plugging all the pieces properly
> together, and that we have comments adequately explain what is
> happening and test cases that verify it. My feeling is that the patch
> doesn't meet that standard today, but I think that just means it needs
> some more work. I'm not arguing we have to throw the whole thing out,
> or invent a lot of new infrastructure, or anything like that.

I feel like my point about the potential for POLA violations is pretty
much just common sense. I'm not particular about the exact mechanism
by which we avoid it; only that we avoid it.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2024-06-24 19:04:32 Re: strange context message in spi.c?
Previous Message Robert Haas 2024-06-24 18:51:25 Re: Proposal: Document ABI Compatibility