Re: Removing unneeded self joins

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Richard Guo <guofenglinux(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2024-12-09 06:03:10
Message-ID: CAPpHfds16a7bjQHv-jCLMNzwdukJ_i3da2e0OJFB9L5rTCWEEg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jul 20, 2024 at 2:38 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> We initially didn't use ChangeVarNodes() in SJE at all. See the last
> patch version without it [1]. We're trying to address Tom Lane's
> proposal to re-use more of existing tree-manipulation infrastructure
> [2]. I agree with you that the case with ChangeVarNodes() looks
> questionable. Do you have other ideas how we can re-use some more of
> existing tree-manipulation infrastructure in SJE?

I think there was one idea to be considered. Given that this feature
has fragility to changes in PlannerInfo and underlying structures,
Andrei proposed to generate a walker over PlannerInfo with a Perl
script. I spent some time analyzing this idea. I came to the
following conclusions.

It seems feasible to generate a function that would walk over all
PlannerInfo fields recursively. Perhaps some more meta-information is
needed, at least, to check whether we should visit a particular
pointer field. But that could be provided by additional
pg_node_attr(). With this approach, we would need to sacrifice some
efficiency (e.g., processing all the EquivalenceClasses instead of
picking only the ones used in the relation to be removed). Also, the
logic in remove_self_join_rel() is more complex than processing all
the Relids. That is, we would still need a massive number of
if-branches specifying how we handle each node type. It doesn't look
like we would end up with a more simple or less error-prone
implementation.

We may decide to generate not just a walker but most of the logic in
remove_self_join_rel(). This is probably possible by injecting way
more meta-information into definitions of structures. That isn't
going to be simpler than the current approach. But it is probably
less error-prone: one could realize that if you add a new field to the
structure, it should have a similar pg_node_attr() as surroundings.
But I am afraid that if we go this way, we may end up with an awkward
heap of pg_node_attr() to generate the functionality of
remove_self_join_rel(). Should we better add comments to PlannerInfo
and other relevant structures saying: if you're going to modify this,
consider how that affects remove_self_join_rel()?

Any thoughts?

Links
1. https://www.postgresql.org/message-id/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru

------
Regards,
Alexander Korotkov
Supabase

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hunaid Sohail 2024-12-09 06:09:37 Re: [PATCH] Add roman support for to_number function
Previous Message Michael Paquier 2024-12-09 05:39:58 Re: shared-memory based stats collector - v70