Re: WIP: expression evaluation improvements

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: expression evaluation improvements
Date: 2021-11-04 16:30:00
Message-ID: CA+TgmoZX=LNY8zovUOp=WiW-KwntFacgTyRRVhJ_c00oi-8+JQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres asked me off-list for comments on 0026, so here goes.

As a general comment, I think the patches could really benefit from
more meaningful commit messages and more comments on individual
functions. It would definitely help me review, and it might help other
people review, or modify the code later. For example, I'm looking at
ExprEvalStep. If the intent here is that we don't want the union
members to point to data that might differ from one execution of the
plan to the next, it's surely important to mention that and explain to
people who are trying to add steps later what they should do instead.
But I'm also not entirely sure that's the intended rule. It kind of
surprises me that the only things that we'd be pointing to here that
would fall into that category would be a bool, a NullableDatum, a
NullableDatum array, and a FunctionCallInfo ... but I've been
surprised by a lot of things that turned out to be true.

I am not a huge fan of the various Rel[Whatever] typedefs. I am not
sure that's really adding any clarity. On the other hand I would be a
big fan of renaming the structure members in some systematic way. This
kind of thing doesn't sit well with me:

- NullableDatum *value; /* value to return */
+ RelNullableDatum value; /* value to return */

Well, if NullableDatum was the value to return, then RelNullableDatum
isn't. It's some kind of thing that lets you find the value to return.
Actually that's not really right either, because before 'value' was a
pointer to the value to return and the corresponding isnull flag, and
now it is a way of finding that stuff. I don't know exactly what to do
here to keep the comment comprehensible and not unreasonably long, but
I don't think not changing at it all is the thing. Nor do I think just
having it be called 'value' when it's clearly not the value, nor even
a pointer to the value, is as clear as I would like to be.

I wonder if ExprBuilderAllocBool ought to be using sizeof(bool) rather
than sizeof(NullableDatum).

Is it true that allocno is only used for, err, some kind of LLVM
thing, and not in the regular interpreted path? As far as I can see,
outside of the LLVM code, we only ever test whether it's 0, and don't
actually care about the specific value.

I hope that the fact that this patch reverses the order of the first
two arguments to ExecInitExprRec is only something you did to make it
so that the compiler would find places you needed to update. Because
otherwise it makes no sense to introduce a new thing called an
ExprStateBuilder in 0017, make it an argument to that function, and
then turn around and change the signature again in 0026. Anyway, a
final patch shouldn't include this kind of churn.

+ offsetof(ExprState, steps) * esb->steps_len * sizeof(ExprEvalStep) +
+ state->mutable_off = offsetof(ExprState, steps) * esb->steps_len *
sizeof(ExprEvalStep);

Well, either I'm confused here, or the first * should be a + in each
case. I wonder how this works at all.

+ /* copy in step data */
+ {
+ ListCell *lc;
+ int off = 0;
+
+ foreach(lc, esb->steps)
+ {
+ memcpy(&state->steps[off], lfirst(lc), sizeof(ExprEvalStep));
+ off++;
+ }
+ }

This seems incredibly pointless to me. Why use a List in the first
place if we're going to have to flatten it using this kind of code?

I think stuff like RelFCIOff() and RelFCIIdx() and RelArrayIdx() is
just pretty much incomprehensible. Now, the executor is full of
badly-named stuff already -- ExecInitExprRec being a fine example of a
name nobody is going to understand on first reading, or maybe ever --
but we ought to try not to make things worse. I also do understand
that anything with relative pointers is bound to involve a bunch of
crappy notation that we're just going to have to accept as the price
of doing business. But it would help to pick names that are not so
heavily abbreviated. Like, if RelFCIIdx() were called
find_function_argument_in_relative_fcinfo() or even
get_fnarg_from_relfcinfo() the casual reader might have a chance of
guessing what it does. Sure, the code might be longer, but if you can
tell what it does without cross-referencing, it's still better.

I would welcome changes that make it clearer which things happen just
once and which things happen at execution time; that said, it seems
like RELPTR_RESOLVE() happens at execution time, and it surprises me a
bit that this is OK from a performance perspective. The pointers can
change from execution to execution, but not within an individual
execution, or so I think. So it doesn't need to be resolved every
time, if somehow that can be avoided. But maybe CPUs are sufficiently
well-optimized for computing a pointer address as a+b*c that it does
not matter.

I'm not sure how helpful any of these comments are, but those are my
initial thoughts.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2021-11-04 16:37:13 Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.
Previous Message Mark Dilger 2021-11-04 16:08:31 Re: pg14 psql broke \d datname.nspname.relname