Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Richard Guo <guofenglinux(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.
Date: 2024-05-08 18:24:04
Message-ID: 815049.1715192644@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I traced down the other failure I was seeing in check-world, and
found that it came from deconstruct_distribute doing this:

distribute_quals_to_rels(root, my_quals,
jtitem,
sjinfo,
root->qual_security_level,
jtitem->qualscope,
ojscope, jtitem->nonnullable_rels,
NULL, /* incompatible_relids */
true, /* allow_equivalence */
false, false, /* not clones */
postponed_oj_qual_list);

where jtitem->nonnullable_rels is the same as the jtitem's left_rels,
which ends up as the syn_lefthand of the join's SpecialJoinInfo, and
then when remove_rel_from_query tries to adjust the syn_lefthand it
breaks the outer_relids of whatever RestrictInfos got made here.

I was able to fix that by not letting jtitem->nonnullable_rels be
the same as left_rels. The attached alternative_1.patch does pass
check-world. But I find it mighty unprincipled: the JoinTreeItem
data structures are full of shared relid sets, so why is this
particular sharing not OK? I still don't have any confidence that
there aren't more problems.

Along about here I started to wonder how come we are only seeing
SpecialJoinInfo-vs-RestrictInfo sharing as a problem, when surely
there is plenty of cross-RestrictInfo sharing going on as well.
(The above call is perfectly capable of making a lot of RestrictInfos,
all with the same outer_relids.) That thought led me to look at
remove_rel_from_restrictinfo, and darn if I didn't find this:

/*
* The clause_relids probably aren't shared with anything else, but let's
* copy them just to be sure.
*/
rinfo->clause_relids = bms_copy(rinfo->clause_relids);
...
/* Likewise for required_relids */
rinfo->required_relids = bms_copy(rinfo->required_relids);

So the reason we don't see cross-RestrictInfo breakage is that
analyzejoins.c is careful not to modify the original relid sets
when modifying a RestrictInfo. (This comment is clearly wrong.)

And that leads to the thought that we can fix our current sharing
problems by similarly avoiding overwriting the original sets
in remove_rel_from_query. The attached alternative-2.patch
attacks it that way, and also passes check-world.

I like alternative-2.patch a lot better, not least because it
only adds cycles when join removal actually fires. Basically
this is putting the onus on the data structure modifier to
cope with shared bitmapsets, rather than trying to say that
sharing is disallowed.

Thoughts?

regards, tom lane

Attachment Content-Type Size
alternative-1.patch text/x-diff 1.5 KB
alternative-2.patch text/x-diff 1.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-05-08 18:38:50 Re: add --no-sync to pg_upgrade's calls to pg_dump and pg_dumpall
Previous Message Alexander Korotkov 2024-05-08 18:00:10 Re: Add SPLIT PARTITION/MERGE PARTITIONS commands