Re: BUG #18806: When enable_rartitionwise_join is set to ON, the database shuts down abnormally

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Tender Wang <tndrwang(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, Amit Langote <amitlangote09(at)gmail(dot)com>, m_lingbin(at)126(dot)com, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Subject: Re: BUG #18806: When enable_rartitionwise_join is set to ON, the database shuts down abnormally
Date: 2025-02-12 14:04:07
Message-ID: CAExHW5ty=yQtMdTCRFj0JXnfdns5jME_+-wVpmsTAH8T+4O6aQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Wed, Feb 12, 2025 at 6:03 PM Tender Wang <tndrwang(at)gmail(dot)com> wrote:
>
>
>
> Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> 于2025年2月12日周三 20:20写道:
>>
>> On Wed, Feb 12, 2025 at 5:41 PM Tender Wang <tndrwang(at)gmail(dot)com> wrote:
>> >
>> >
>> >
>> > Tender Wang <tndrwang(at)gmail(dot)com> 于2025年2月12日周三 19:48写道:
>> >>
>> >>
>> >>
>> >> Tender Wang <tndrwang(at)gmail(dot)com> 于2025年2月12日周三 17:48写道:
>> >>>
>> >>>
>> >>>
>> >>> Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> 于2025年2月12日周三 15:38写道:
>> >>>>
>> >>>> On Wed, 2025-02-12 at 07:01 +0000, PG Bug reporting form wrote:
>> >>>> > PostgreSQL version: 17.2
>> >>>> > Operating system: CentOS Linux release 7.9.2009 (Core)
>> >>>> > Description:
>> >>>> >
>> >>>> > After upgrading PG17.2, testing found that when enable_rartitionwise_join is
>> >>>> > set to ON, when executing a query, if the main query and subquery have the
>> >>>> > same table, the query will report an error.
>> >>>> >
>> >>>> > The database log shows the following error:
>> >>>> > LOG: server process (PID 24796) was terminated by signal 6: Aborted
>> >>>> > DETAIL: Failed process was running: select * from test t join orders o on
>> >>>> > t.order_id =o.order_id where t.order_id in (select order_id from orders);
>> >>>> > LOG: terminating any other active server processes
>> >>>>
>> >>>> That is a crash that may well indicate a PostgreSQL bug.
>> >>>>
>> >>>> However, without a way to reproduce the behavior, we won't be able to fix
>> >>>> the problem. Try to come up with a self-contained test case. It would
>> >>>> also be interesting to know which PostgreSQL extensions are present.
>> >>>>
>> >>>
>> >>> I can reproduce this crash on HEAD.
>> >>>
>> >>> postgres=# create table test(order_id int, name varchar);
>> >>> create table orders(order_id int, name char(10)) partition by hash (order_id);
>> >>> CREATE TABLE
>> >>> CREATE TABLE
>> >>> postgres=# set enable_partitionwise_join = on;
>> >>> SET
>> >>> postgres=# create table orders_p1 partition of orders for values with ( modulus 32, remainder 0);
>> >>> CREATE TABLE
>> >>> postgres=# explain select * from test t join orders o on
>> >>> t.order_id =o.order_id where t.order_id in (select order_id from orders);
>> >>> server closed the connection unexpectedly
>> >>> This probably means the server terminated abnormally
>> >>> before or while processing the request.
>> >>> The connection to the server was lost. Attempting reset: Succeeded.
>> >>>
>> >>
>> >> I debug this crash, I found this crash was related with below commit:
>> >> commit 5278d0a2e870c61f9374a7796b90e6f9f6a73638
>> >> Author: Amit Langote <amitlan(at)postgresql(dot)org>
>> >> Date: Mon Mar 25 12:02:40 2024 +0900
>> >>
>> >> Reduce memory used by partitionwise joins
>> >>
>> >>
>> >> the joinlist in this query is 1,2,5
>> >> 1 means table t
>> >> 2 means table orders
>> >> 5 also means table orders, which subquery is pulled up.
>> >> When we create join rel for (2,5), then wen enter try_partitionwise_join().
>> >> In this func, we call build_child_join_sjinfo() to build SpecialJoinInfo for child.
>> >> In build_child_join_sjinfo(), we use memcpy(), I think we should use copyObject().
>> >> Otherwise, the parent_sjinfo will be freed, for example, after call free_child_join_sjinfo(),
>> >> the parent_sjinfo min_lefthand looks like as below:
>> >>
>> >> (gdb) p *parent_sjinfo->min_lefthand
>> >> $7 = {type = 3951489872, nwords = 21994, words = 0x55eaeb8696b0}
>> >>
>> >> The above bitmap is invalid, so trigger the Assert(bms_is_valid_set(a));
>> >>
>> >> Attached patch is my fix. Any thoughts?
>>
>> Thanks for the report and the fix. Ideally, following line should
>> allocate separate memory for the child's bitmapset.
>> sjinfo->min_lefthand = adjust_child_relids(sjinfo->min_lefthand,
>> left_nappinfos, left_appinfos);
>>
>> Unless no relid in sjinfo->min_lefthand was translated according to
>> the given appinfos, adjust_child_relids() should create a new
>> Bitmapset. Can you please check why no translation happened. I will
>> try your reproduction tomorrow.
>
>
> I step into adjust_child_relids(), see below:
> nappinfos = 1;
> p appinfo->parent_relid
> 2
> call bmsToString(relids)
> "(b 1)"
>
> So no translation you said happened, just return relids.
>

Hmm. The code there assumes that all the Relids will at least have one
parent each of the children involved. For some reason
sjinfo->min_lefthand has only relid 1 but not 2 or 5. 2 and 5 are
actually the parent relids of the children passed in respectively. The
join is between 2 and 5, then why is 1 appearing in the min_lefthand.
It might be legitimate, but we need to find the reason. If it's
legitimate, I think we need to copy the Relids which haven't undergone
any translation so as to keep them isolated from the parent relids.

--
Best Wishes,
Ashutosh Bapat

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Richard Guo 2025-02-12 14:30:15 Re: BUG #18806: When enable_rartitionwise_join is set to ON, the database shuts down abnormally
Previous Message Richard Guo 2025-02-12 13:55:43 Re: BUG #18806: When enable_rartitionwise_join is set to ON, the database shuts down abnormally