Re: Removing unneeded self joins

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
Cc: Greg Stark <stark(at)mit(dot)edu>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Hywel Carver <hywel(at)skillerwhale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2022-07-04 14:16:31
Message-ID: CALNJ-vSeE9xzeFK9gM+kNqNF1csM1sH2epeeTZUbS6WFhXraGA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 4, 2022 at 6:52 AM Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io> wrote:

> Le jeudi 30 juin 2022, 16:11:51 CEST Andrey Lepikhov a écrit :
> > On 19/5/2022 16:47, Ronan Dunklau wrote:
> > > I'll take a look at that one.
> >
> > New version of the patch, rebased on current master:
> > 1. pgindent over the patch have passed.
> > 2. number of changed files is reduced.
> > 3. Some documentation and comments is added.
>
> Hello Andrey,
>
> Thanks for the updates.
>
> The general approach seems sensible to me, so I'm going to focus on some
> details.
>
> In a very recent thread [1], Tom Lane is proposing to add infrastructure
> to make Var aware of their nullability by outer joins. I wonder if that
> would help with avoiding the need for adding is not null clauses when the
> column is known not null ?
> If we have a precedent for adding a BitmapSet to the Var itself, maybe the
> whole discussion regarding keeping track of nullability can be extended to
> the original column nullability ?
>
> Also, I saw it was mentioned earlier in the thread but how difficult would
> it be to process the transformed quals through the EquivalenceClass
> machinery and the qual simplification ?
> For example, if the target audience of this patch is ORM, or inlined
> views, it wouldn't surprise me to see queries of this kind in the wild,
> which could be avoided altogether:
>
> postgres=# explain (costs off) select * from sj s1 join sj s2 on s1.a =
> s2.a where s1.b = 2 and s2.b =3;
> QUERY PLAN
> -----------------------------------------------------
> Seq Scan on sj s2
> Filter: ((a IS NOT NULL) AND (b = 3) AND (b = 2))
> (2 lignes)
>
>
> + for (counter = 0; counter < list_length(*sources);)
> + {
> + ListCell *cell = list_nth_cell(*sources, counter);
> + RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
> + int counter1;
> +
> ....
> + ec->ec_members = list_delete_cell(ec->ec_members, cell);
>
>
> Why don't you use foreach() and foreach_delete_current macros for
> iterating and removing items in the lists, both in update_ec_members and
> update_ec_sources ?
>
>
> + if ((bms_is_member(k, info->syn_lefthand) ^
> + bms_is_member(r,
> info->syn_lefthand)) ||
> + (bms_is_member(k,
> info->syn_righthand) ^
> + bms_is_member(r,
> info->syn_righthand)))
>
> I think this is more compact and easier to follow than the previous
> version, but I'm not sure how common it is in postgres source code to use
> that kind of construction ?
>
> Some review about the comments:
>
>
> I see you keep using the terms "the keeping relation" and "the removing
> relation" in reference to the relation that is kept and the one that is
> removed.
> Aside from the grammar (the kept relation or the removed relation), maybe
> it would make it clearer to call them something else. In other parts of the
> code, you used "the remaining relation / the removed relation" which makes
> sense.
>
> /*
> * Remove the target relid from the planner's data structures, having
> - * determined that there is no need to include it in the query.
> + * determined that there is no need to include it in the query. Or replace
> + * with another relid.
> + * To reusability, this routine can work in two modes: delete relid from
> a plan
> + * or replace it. It is used in replace mode in a self-join removing
> process.
>
> This could be rephrased: ", optionally replacing it with another relid.
> The latter is used by the self-join removing process."
>
>
> [1]
> https://www.postgresql.org/message-id/flat/830269.1656693747%40sss.pgh.pa.us
>
> --
> Ronan Dunklau
>
>
> Hi,
bq. this is more compact and easier to follow than the previous version

A code comment can be added above the expression (involving XOR) to explain
the purpose of the expression.

Cheers

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Matthias van de Meent 2022-07-04 14:18:35 Re: Improving btree performance through specializing by key shape, take 2
Previous Message Alvaro Herrera 2022-07-04 14:14:02 Re: Handle infinite recursion in logical replication setup