Re: Removing unneeded self joins

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2024-07-20 11:38:48
Message-ID: CAPpHfdsj1WdqoCg+g4oq1XQ_=KOj0U4pXqhauBc7e+0msHu66A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 19, 2024 at 11:30 AM Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
> On Wed, 17 Jul 2024 at 01:45, Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> >
> > Jian He gave a try to ChangeVarNodes() [1]. That gives some
> > improvement, but the vast majority of complexity is still here. I
> > think the reason for complexity of SJE is that it's the first time we
> > remove relation, which is actually *used* and therefore might has
> > references in awful a lot of places. In previous cases we removed
> > relations, which were actually unused.
>
> I had a quick look at this, and I have a couple of comments on the
> rewriter changes.
>
> The new function replace_relid() looks to be the same as adjust_relid_set().

They are similar, not the same. replace_relid() has handling for
negative newId, while adjust_relid_set() hasn't. One thing I'd like
to borrow from adjust_relid_set() to replace_relid() is the usage of
IS_SPECIAL_VARNO() macro.

It would be probably nice to move this logic into bms_replace_member()
residing at bitmapset.c. What do you think?

> The changes to ChangeVarNodes() look a little messy. There's a lot of
> code duplicated between ChangeVarNodesExtended() and ChangeVarNodes(),
> which could be avoided by having one call the other. Also, it would be
> better for ChangeVarNodesExtended() to have a "flags" parameter
> instead of an extra boolean parameter, to make it more extensible in
> the future. However,...

I certainly didn't mean to have duplicate functions
ChangeVarNodesExtended() and ChangeVarNodes(). I mean
ChangeVarNodes() should just call ChangeVarNodesExtended().

> I question whether ChangeVarNodesExtended() and the changes to
> ChangeVarNodes() are really the right way to go about this.
> ChangeVarNodes() in particular gains a lot more logic to handle
> RestrictInfo nodes that doesn't really feel like it belongs there --
> e.g., building NullTest nodes is really specific to SJE, and doesn't
> seem like it's something ChangeVarNodes() should be doing.
>
> A better solution might be to add a new walker function to
> analyzejoins.c that does just what SJE needs, which is different from
> ChangeVarNodes() in a number of ways. For Var nodes, it might
> ultimately be necessary to do more than just change the varno, to
> solve the RETURNING/EPQ problems. For RestrictInfo nodes, there's a
> lot of SJE-specific logic. The SJE code wants to ignore RangeTblRef
> nodes, but it could delegate to ChangeVarNodes() for various other
> node types to avoid code duplication. At the top level, the stuff that
> ChangeVarNodes() does to fields of the Query struct would be different
> for SJE, I think.

We initially didn't use ChangeVarNodes() in SJE at all. See the last
patch version without it [1]. We're trying to address Tom Lane's
proposal to re-use more of existing tree-manipulation infrastructure
[2]. I agree with you that the case with ChangeVarNodes() looks
questionable. Do you have other ideas how we can re-use some more of
existing tree-manipulation infrastructure in SJE?

Links
1. https://www.postgresql.org/message-id/55f680bc-756d-4dd3-ab27-3c6e663b0e4c%40postgrespro.ru
2. https://www.postgresql.org/message-id/3622801.1715010885%40sss.pgh.pa.us

------
Regards,
Alexander Korotkov
Supabase

Attachment Content-Type Size
v7-0001-Remove-useless-self-joins.patch application/octet-stream 122.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey M. Borodin 2024-07-20 11:46:23 Re: UUID v7
Previous Message Noah Misch 2024-07-20 11:28:35 Re: race condition in pg_class