Re: Reduce "Var IS [NOT] NULL" quals during constant folding

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Reduce "Var IS [NOT] NULL" quals during constant folding
Date: 2025-04-07 02:59:12
Message-ID: CAMbWs49=P=M7rCvMgfm31FQ-pvTNCv5wasd3gr1amNzQUSkDDQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Apr 5, 2025 at 4:14 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Apr 1, 2025 at 10:14 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> > The attnotnull catalog information being discussed here is intended
> > for use during constant folding (and possibly sublink pull-up), which
> > occurs long before partition pruning. Am I missing something?

> Hmm, OK, so you think that we need to gather this information early,
> so that we can do constant folding correctly, but you don't want to
> gather everything that get_relation_info() does at this stage, because
> then we're doing extra work on partitions that might later be pruned.
> Is that correct?

That's correct. As I mentioned earlier, I believe attnotnull isn't
the only piece of information we need to gather early on. My general
idea is to separate the collection of catalog information into two
phases:

* Phase 1 occurs at an early stage and collects the small portion of
catalog information that is needed for constant folding, setting the
inh flag for a relation, or expanding virtual generated columns. All
these happen very early in the planner, before partition pruning.

* Phase 2 collects the majority of the catalog information and occurs
when building the RelOptInfos, like what get_relation_info does.

FWIW, aside from partition pruning, I suspect there may be other cases
where a relation doesn't end up having a RelOptInfo created for it.
And the comment for add_base_rels_to_query further strengthens my
suspicion.

* Note: the reason we find the baserels by searching the jointree, rather
* than scanning the rangetable, is that the rangetable may contain RTEs
* for rels not actively part of the query, for example views. We don't
* want to make RelOptInfos for them.

If my suspicion is correct, then partition pruning isn't the only
reason we might not want to move get_relation_info to an earlier
stage.

> > Additionally, I'm doubtful that the collection of relhassubclass can
> > be moved after partition pruning. How can we determine whether a
> > relation is inheritable without retrieving its relhassubclass
> > information?
>
> We can't -- but notice that we open the relation before fetching
> relhassubclass, and then pass down the Relation object to where
> get_relation_info() is ultimately called, so that we do not repeatedly
> open and close the Relation. I don't know if I can say exactly what's
> going to go wrong if we add an extra table_open()/table_close() as you
> do in the patch, but I've seen enough performance and correctness
> problems with such code over the years to make me skeptical.

I'm confused here. AFAICS, we don't open the relation before fetching
relhassubclass, according to the code that sets the inh flag in
subquery_planner. Additionally, I do not see we pass down the
Relation object to get_relation_info. In get_relation_info, we call
table_open to obtain the Relation object, use it to retrieve the
catalog information, and then call table_close to close the Relation.

Am I missing something, or do you mean that the relcache entry is
actually built earlier, and that table_open/table_close call in
get_relation_info merely increments/decrements the reference count?

IIUC, you're concerned about calling table_open/table_close to
retrieve catalog information. Do you know of a better way to retrieve
catalog information without calling table_open/table_close? I find
the table_open/table_close pattern is quite common in the current
code. In addition to get_relation_info(), I've also seen it in
get_relation_constraints(), get_relation_data_width(), and others.

Thanks
Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2025-04-07 03:02:50 Re: Modern SHA2- based password hashes for pgcrypto
Previous Message Tom Lane 2025-04-07 02:37:41 Re: FmgrInfo allocation patterns (and PL handling as staged programming)