Re: BUG #18399: Query plan optimization results in runtime error when hoisting cast from inside subquery

From: Sawyer Knoblich <scknoblich(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18399: Query plan optimization results in runtime error when hoisting cast from inside subquery
Date: 2024-03-20 02:20:14
Message-ID: CAOfdcWDY8Bz9OJVmNk1ESBhuCaqZQiFSWAJ36t7qwG=Q=BWRuQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Thanks Tom, I can definitely appreciate that there are plenty of times for
compromises like this in engineering so if the answer is "yes it's
technically a bug, but it gives such a large benefit relative to the amount
of issues it causes that it's better to have it than not" then I can live
with that. I do wonder if it's worth calling this out in the documentation
though if it's not there already as it was definitely surprising to me when
I ran into it and couldn't find anything about it online, and it may be
useful to have somewhere to link others to in the future.

Also yes, my original reply may have been somewhat vaguely worded but I was
thinking about this a combined marking + optimizer bug in the sense that
"the marking doesn't say that it isn't valid to do this and therefore the
optimizer assumes it's allowed to do it", and it seemed to me that the root
cause is the inability to actually represent this type of side effect with
the current set of markers. I did have a decently large section drafted in
my previous reply at one point about ideas for additional function marking
that I thought might be interesting but had cut it out for the sake of
brevity, but I'll write a quick summary here if you're interested.

In short, it felt to me like fallibility was orthogonal to volatility in
that its side effect is only triggered by calling it the first time instead
of by calling it multiple times. My thought was of a potential second type
of function marker to describe fallibility that the optimizer could
possibly leverage to hopefully still enable most types of optimizations
while preventing scenarios such as this one. I was imagining that an
operation like division could be marked something like "immutable
fallible", which would indicate that the function is still allowed to be
optimized in any way that an immutable function is except for any
optimizations that may cause it to operate on additional input that it
would otherwise not normally operate on in an unoptimized query, which
could then be used to disallow this bug's optimization where the values
given to the cast are first filtered by the where clause.

I'm sure there are all sorts of issues with this and I don't have nearly
enough database internals experience to know if this is actually a good
idea or not (maybe this would still limit nearly all optimizations, maybe
nearly every function would have to be marked as fallible which would
defeat the whole point, maybe just fallibility is too narrow and there are
other causes of this type of side effect that would also benefit from this
optimization constraint, it might not even be feasible to implement this,
etc) but it was fun to daydream about and figured I'd share on the off
chance it's useful in any way.

Best,
Sawyer Knoblich

On Tue, Mar 19, 2024 at 5:37 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Sawyer Knoblich <scknoblich(at)gmail(dot)com> writes:
> > To make sure I'm understanding correctly, are you suggesting that this is
> > not a bug? The mention of "the cast to integer is presumed to be
> > side-effect-free, which isn't true if you're working with data where it
> > could throw errors" to me heavily implies that this is a faulty
> assumption
> > on behalf of the optimizer about what behaviors certain parts of the
> query
> > may have, but I didn't get the impression that this is being considered
> an
> > optimizer bug and is instead something that just needs to be worked
> around
> > or avoided (although if I'm not interpreting it right then definitely
> > please correct me).
>
> It is not an optimizer bug: the optimizer is doing what the function
> marking entitles it to. You could argue that it's a marking bug and
> no function/operator that can throw an error should be marked stable
> or immutable, because throwing an error is a kind of side-effect.
> But that would be a pretty useless answer, as it would nearly destroy
> our ability to optimize queries at all, since there's not all that
> many functions that can be guaranteed not to throw errors.
>
> In short, this state of affairs is an engineering compromise.
> It's not perfect, but it's not easy to see how to do better either.
>
> regards, tom lane
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message David Rowley 2024-03-20 04:08:49 Re: BUG #18399: Query plan optimization results in runtime error when hoisting cast from inside subquery
Previous Message Thomas Munro 2024-03-20 02:12:49 Re: Regression tests fail with musl libc because libpq.so can't be loaded