Re: Removing unneeded self joins

From: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Michał Kłeczek <michal(at)kleczek(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2023-10-05 07:20:21
Message-ID: 5ff80147-18a0-4295-b508-638e90a4df71@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4/10/2023 14:34, Alexander Korotkov wrote:
> Hi!
>
> On Wed, Oct 4, 2023 at 9:56 AM Andrei Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru <mailto:a(dot)lepikhov(at)postgrespro(dot)ru>> wrote:
> > On 4/10/2023 07:12, Alexander Korotkov wrote:
> > > On Tue, Sep 12, 2023 at 4:58 PM Andrey Lepikhov
> > > <a(dot)lepikhov(at)postgrespro(dot)ru <mailto:a(dot)lepikhov(at)postgrespro(dot)ru>> wrote:
> > >> On 5/7/2023 21:28, Andrey Lepikhov wrote:
> > >>> During the significant code revision in v.41 I lost some replacement
> > >>> operations. Here is the fix and extra tests to check this in the
> future.
> > >>> Also, Tom added the JoinDomain structure five months ago, and I added
> > >>> code to replace relids for that place too.
> > >>> One more thing, I found out that we didn't replace SJs, defined by
> > >>> baserestrictinfos if no one self-join clause have existed for the
> join.
> > >>> Now, it is fixed, and the test has been added.
> > >>> To understand changes readily, see the delta file in the attachment.
> > >> Here is new patch in attachment. Rebased on current master and some
> > >> minor gaffes are fixed.
> > >
> > > I went through the thread and I think the patch gets better shape.  A
> > > couple of notes from my side.
> > > 1) Why replace_relid() makes a copy of lids only on insert/replace of
> > > a member, but performs deletion in-place?
> >
> > Shortly speaking, it was done according to the 'Paranoid' strategy.
> > The main reason for copying before deletion was the case with the rinfo
> > required_relids and clause_relids. They both point to the same Bitmapset
> > in some cases. And we feared such things for other fields.
> > Right now, it may be redundant because we resolved the issue mentioned
> > above in replace_varno_walker.
>
> OK, but my point is still that you should be paranoid in all the cases
> or none of the cases.  Right now (newId < 0) branch doesn't copy source
> relids, but bms_is_member(oldId, relids) does copy.  Also, I think
> whether we copy or not should be reflected in the function comment.
>
> /*
>  * Substitute newId by oldId in relids.
>  */
> static Bitmapset *
> replace_relid(Relids relids, int oldId, int newId)
> {
>     if (oldId < 0)
>         return relids;
>
>     if (newId < 0)
>         /* Delete relid without substitution. */
>         return bms_del_member(relids, oldId);
>
>     if (bms_is_member(oldId, relids))
>         return bms_add_member(bms_del_member(bms_copy(relids), oldId),
> newId);
>
>     return relids;
> }

We tried to use replace_relid() for both cases of JOIN deletion:
unneeded outer join and self-join, and the relid deletion is used only
in the first case. Such an approach was used there for a long time, and
we just didn't change it.
I am prone to removing the copy operation in the code of relid replacement.

>
> > Relid replacement machinery is the most contradictory code here. We used
> > a utilitarian approach and implemented a simplistic variant.
>
> > > 2) It would be nice to skip the insertion of IS NOT NULL checks when
> > > they are not necessary.  [1] points that infrastructure from [2] might
> > > be useful.  The patchset from [2] seems committed mow.  However, I
> > > can't see it is directly helpful in this matter.  Could we just skip
> > > adding IS NOT NULL clause for the columns, that have
> > > pg_attribute.attnotnull set?
> > Thanks for the links, I will look into that case.
Thanks for the curious issue.
The new field Var::varnullingrels introduced in [2] doesn't make sense
here, as I see: we operate with plain relations only, and I don't know
how it can be applied to an arbitrary subtree contained OUTER JOINs.
The second option, the attnotnull flag, can be used in this code. We
haven't implemented it because the process_equivalence routine doesn't
check the attnotnull before creating NullTest.
In general, it is not a difficult operation - we just need to add a
trivial get_attnotnull() routine to lssycache.c likewise
get_attgenerated() and other functions.
But, replace_varno uses the walker to change the relid. The mentioned
replacement, like
X=X --> X IS NOT NULL
can be applied on different levels of the expression, look:
A a1 JOIN A a2 ON (a1.id=a2.id) WHERE (a1.x AND (a1.y=a2.y))
Here, we can replace id=id and y=y. It may need some 'unwanted clauses'
collection procedure and a second pass through the expression tree to
remove them. It may add some unpredictable overhead.
We can replace such a clause with a trivial 'TRUE' clause, of course.
But is it the feature you have requested?

--
regards,
Andrey Lepikhov
Postgres Professional

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2023-10-05 07:33:12 Re: [PATCH] Fix memory leak in memoize for numeric key
Previous Message Peter Eisentraut 2023-10-05 06:57:51 Re: Modernize const handling with readline