From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Reduce "Var IS [NOT] NULL" quals during constant folding |
Date: | 2025-04-01 06:34:22 |
Message-ID: | CAMbWs4_0B0MjZuQY-az6r=K03ONfdBCKbtoQCHj0p4UwGdOb4A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Apr 1, 2025 at 1:55 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> As a general principle, I have found that it's usually a sign that
> something has been designed poorly when you find yourself wanting to
> open a relation, get exactly one piece of information, and close the
> relation again. That is why, today, all the information that the
> planner needs about a particular relation is retrieved by
> get_relation_info(). We do not just wander around doing random catalog
> lookups wherever we need some critical detail. This patch increases
> the number of places where we fetch relation data from 1 to 2, but
> it's still the case that almost everything happens in
> get_relation_info(), and there's now just exactly this 1 thing that is
> done in a different place. That doesn't seem especially nice. I
> thought the idea was going to be to move get_relation_info() to an
> earlier stage, not split one thing out of it while leaving everything
> else the same.
I initially considered moving get_relation_info() to an earlier stage,
where we would collect all the per-relation data by relation OID.
Later, when building the RelOptInfos, we could link them to the
per-relation-OID data.
However, I gave up this idea because I realized it would require
retrieving a whole bundle of catalog information that isn't needed
until after the RelOptInfos are built, such as max_attr, pages,
tuples, reltablespace, parallel_workers, extended statistics, etc.
And we may also need to create the IndexOptInfos for the relation's
indexes. It seems to me that it's not a trivial task to move
get_relation_info() before building the RelOptInfos, and more
importantly, it's unnecessary most of the time.
In other words, of the many pieces of catalog information we need to
retrieve for a relation, only a small portion is needed at an early
stage. As far as I can see, this small portion includes:
* relhassubclass, which we retrieve immediately after we have finished
adding rangetable entries.
* attgenerated, which we retrieve in expand_virtual_generated_columns.
* attnotnull, as discussed here, which should ideally be retrieved
before pull_up_sublinks.
My idea is to retrieve only this small portion at an early stage, and
still defer collecting the majority of the catalog information until
building the RelOptInfos. It might be possible to optimize by
retrieving this small portion in one place instead of three, but
moving the entire get_relation_info() to an earlier stage doesn't seem
like a good idea to me.
It could be argued that the separation of catalog information
collection isn't very great, but it seems this isn't something new in
this patch. So I respectfully disagree with your statement that "all
the information that the planner needs about a particular relation is
retrieved by get_relation_info()", and that "there's now just exactly
this 1 thing that is done in a different place". For instance,
relhassubclass and attgenerated are retrieved before expression
preprocessing, a relation's constraint expressions are retrieved when
setting the relation's size estimates, and more.
Thanks
Richard
From | Date | Subject | |
---|---|---|---|
Next Message | Richard Guo | 2025-04-01 07:18:58 | Re: Memoize ANTI and SEMI JOIN inner |
Previous Message | Rafael Thofehrn Castro | 2025-04-01 06:23:43 | Re: Proposal: Progressive explain |