Re: Reduce "Var IS [NOT] NULL" quals during constant folding

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Reduce "Var IS [NOT] NULL" quals during constant folding
Date: 2025-04-11 06:51:30
Message-ID: CAMbWs4_soFRubFd64SX-GC+MA6RV=pZrwfn3oq89LNKoDuLjAg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 11, 2025 at 4:45 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> OK. Maybe I shouldn't be worrying about the table_open() /
> table_close() here, because I see that you are right that
> has_subclass() is nearby, which admittedly does not involve opening
> the relation, but it does involve fetching from the syscache a tuple
> that we wouldn't need to fetch if we had a Relation available at that
> point. And I see now that expand_virtual_generated_columns() is also
> in that area and works similar to your proposed function in that it
> just opens and closes all the relations. Perhaps that's just the way
> we do things at this very early stage of the planner? But I feel like
> it might make sense to do some reorganization of this code, though, so
> that it more resembles the phase 1 and phase 2 as you describe them.
> Both expand_virtual_generated_columns() and collect_relation_attrs()
> care about exactly those relations with rte->rtekind == RTE_RELATION,
> and even if we have to open and close all of those relations once to
> do this processing, perhaps we can avoid doing it twice, and maybe
> avoid the has_subclass() call along the way?

Yeah, I agree on this. This is the optimization I mentioned upthread
in the last paragraph of [1] - retrieving the small portion of catalog
information needed at an early stage in one place instead of three.
Hopefully, this way we only need one table_open/table_close at the
early stage of the planner.

> Maybe we can hoist a loop
> over parse->rtable up into subquery_planner and then have it call a
> virtual-column expansion function and a non-null collection function
> once per RTE_RELATION entry?

Hmm, I'm afraid there might be some difficulty with this approach.
The virtual-column expansion needs to be done after sublink pull-up to
ensure that the virtual-column references within the SubLinks that
should be transformed into joins can get expanded, while non-null
collection needs to be done before sublink pull-up since we might want
to leverage the non-null information to convert NOT IN sublinks to
anti joins.

What I had in mind is that we hoist a loop over parse->rtable before
pull_up_sublinks to gather information about which columns of each
relation are defined as NOT NULL and which are virtually generated.
These information will be used in sublink pull-up and virtual-column
expansion. We also call has_subclass for each relation that is maked
'inh' within that loop and clear the inh flag if needed.

This seems to require a fair amount of code changes, so I'd like to
get some feedback on this approach before proceeding with the
implementation.

> A related point that I'm noticing is that you record the not-NULL
> information in the RangeTblEntry. I wonder whether that's going to be
> a problem, because I think of the RangeTblEntry as a parse-time
> structure and the RelOptInfo as a plan-time structure, meaning that we
> shouldn't scribble on the former and that we should record any
> plan-time details we need in the latter. I understand that the problem
> is precisely that the RelOptInfo isn't yet available, but I'm not sure
> that makes it OK to use the RangeTblEntry instead.

Fair point. We should do our best to refrain from scribbling on the
parsetree in the planner. I initially went with the hashtable
approach as Tom suggested, but later found it quite handy to store the
not-null information in the RangeTblEntry, especially since we do
something similar with rte->inh. However, I've come to realize that
inh may not be a good example to follow after all, so I'll go back to
the hashtable method.

Thank you for pointing that out before I went too far down the wrong
path.

[1] https://postgr.es/m/CAMbWs4-DryEm_U-juPn3HwUiwZRXW3jhfX18b_AFgrgihgq4kg@mail.gmail.com

Thanks
Richard

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2025-04-11 07:07:04 Re: Improve a few appendStringInfo calls new to v18
Previous Message Peter Smith 2025-04-11 06:05:45 Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding