Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'
Date: 2023-11-15 15:07:15
Message-ID: CAPpHfduuEve8ezjqNEmCFuH3+fGZ2z+gzFV_xGD1ENSgqKTAZA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 15, 2023 at 8:02 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2023-11-14 19:14:57 +0800, Richard Guo wrote:
> > While working on BUG #18187 [1], I noticed that we also have issues with
> > how SJE replaces join clauses involving the removed rel. As an example,
> > consider the query below, which would trigger an Assert.
> >
> > create table t (a int primary key, b int);
> >
> > explain (costs off)
> > select * from t t1
> > inner join t t2 on t1.a = t2.a
> > left join t t3 on t1.b > 1 and t1.b < 2;
> > server closed the connection unexpectedly
> >
> > The Assert failure happens in remove_self_join_rel() when we're trying
> > to remove t1. The two join clauses of t1, 't1.b > 1' and 't1.b < 2',
> > share the same pointer of 'required_relids', which is {t1, t3} at first.
> > After we've performed replace_varno for the first clause, the
> > required_relids becomes {t2, t3}, which is no problem. However, the
> > second clause's required_relids also becomes {t2, t3}, because they are
> > actually the same pointer. So when we proceed with replace_varno on the
> > second clause, we'd trigger the Assert.
>
> Good catch.
>
>
> > Off the top of my head I'm thinking that we can fix this kind of issue
> > by bms_copying the bitmapset first before we make a substitution in
> > replace_relid(), like attached.
> >
> > Alternatively, we can revise distribute_qual_to_rels() as below so that
> > different RestrictInfos don't share the same pointer of required_relids.
>
> > --- a/src/backend/optimizer/plan/initsplan.c
> > +++ b/src/backend/optimizer/plan/initsplan.c
> > @@ -2385,7 +2385,7 @@ distribute_qual_to_rels(PlannerInfo *root, Node
> > *clause,
> > * nonnullable-side rows failing the qual.
> > */
> > Assert(ojscope);
> > - relids = ojscope;
> > + relids = bms_copy(ojscope);
> > Assert(!pseudoconstant);
> > }
> > else
> >
> > With this way, I'm worrying that there are other places where we should
> > avoid sharing the same pointer to Bitmapset structure.
>
> Indeed.
>
>
> > I'm not sure how to discover all these places. Any thoughts?
>
> At the very least I think we should add a mode to bitmapset.c mode where
> every modification of a bitmapset reallocates, rather than just when the size
> actually changes. Because we only reallocte and free in relatively uncommon
> cases, particularly on 64bit systems, it's very easy to not find spots that
> continue to use the input pointer to one of the modifying bms functions.
>
> A very hacky implementation of that indeed catches this bug with the existing
> regression tests.
>
> The tests do *not* pass with just the attached applied, as the "Delete relid
> without substitution" path has the same issue. With that also copying and all
> the "reusing" bms* functions always reallocating, the tests pass - kinda.
>
>
> The kinda because there are callers to bms_(add|del)_members() that pass the
> same bms as a and b, which only works if the reallocation happens "late".

+1,
Neat idea. I'm willing to work on this. Will propose the patch soon.

------
Regards,
Alexander Korotkov

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-11-15 15:09:06 Re: Some performance degradation in REL_16 vs REL_15
Previous Message Alexander Korotkov 2023-11-15 15:06:32 Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'