Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943

From: Tender Wang <tndrwang(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: 1026592243(at)qq(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943
Date: 2024-06-13 03:31:22
Message-ID: CAHewXNk7fJyV1JUHdYEN75yD59qQCfLX81WEBgkA_u8ZC+f08A@mail.gmail.com
Lists: pgsql-bugs

On Thu, Jun 13, 2024 at 03:53, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> On 2024-Jun-11, Alvaro Herrera wrote:
>
> > ... and actually, the code that maps partitions when these arrays don't
> > match is all wrong, because it assumes that they are in OID order, which
> > is not true, so I'm busy rewriting it. More soon.
>
> So I came up with the algorithm in the attached patch. As far as I can
> tell, it works okay; I've been trying to produce a test that would
> stress it some more, but I noticed a pgbench shortcoming that I've been
> trying to solve, unsuccessfully.
>
> Anyway, the idea here is that we match the partdesc->oids entries to the
> pinfo->relid_map entries with a distance allowance -- that is, we search
> for some OIDs a few elements ahead of the current position. This allows
> us to skip some elements that do not match, without losing sync of the
> correct position in the array. This works because 1) the arrays must be
> in the same order, that is, bound order; and 2) the amount of elements
> that might be missing is bounded by the difference in array lengths.
>

Thanks for your nice work.
I looked through the patch, and it looks OK to me.
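
To restate the matching idea as I understand it, here is a minimal standalone C sketch. It is not the code from the patch: the function name match_with_allowance, the allowance parameter, and the Oid typedef are placeholders made up for illustration, and it only assumes what is stated above, namely that both arrays are in the same (bound) order.

```c
#include <assert.h>

typedef unsigned int Oid;   /* hypothetical stand-in for PostgreSQL's Oid */

/*
 * For each entry of current[] (the partitions visible now), store in map[]
 * the index of the same OID in planned[] (the partitions seen at plan time),
 * or -1 if no match is found within the allowance.
 *
 * Both arrays are assumed to be in the same order, so once an entry of
 * planned[] has been skipped it never needs to be looked at again.
 */
static void
match_with_allowance(const Oid *planned, int nplanned,
                     const Oid *current, int ncurrent,
                     int allowance, int *map)
{
    int pos = 0;            /* next unmatched position in planned[] */

    assert(allowance >= 0);

    for (int i = 0; i < ncurrent; i++)
    {
        map[i] = -1;

        /* Check planned[pos] and up to `allowance` entries ahead of it. */
        for (int j = pos; j < nplanned && j <= pos + allowance; j++)
        {
            if (planned[j] == current[i])
            {
                map[i] = j;
                pos = j + 1;    /* entries skipped over stay skipped */
                break;
            }
        }
    }
}
```

For example, with planned = {500, 120, 340, 90, 410}, current = {500, 340, 90, 410} (120 was detached in between) and an allowance of 1, the difference in array lengths, map comes out as {0, 2, 3, 4}. The OIDs are deliberately not ascending, since only the common order matters here, not OID order.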

>
> As I said, I've been hammering it with some modified pgbench scripts;
> mainly I did this to set up:
>
> drop table if exists p;
> do $$ declare i int; begin for i in 0..99 loop execute format('drop table if exists p%s', i); end loop; end $$;
> drop sequence if exists detaches;
> create table p (a int, b int) partition by list (a);
> set client_min_messages=warning;
> do $$
> declare i int;
> declare modulus int;
> begin
> for modulus in 0 .. 4 loop
> for i in 0..99 loop
> if i % 5 <> modulus then
> continue;
> end if;
> execute format('create table p%s partition of p for values in (%s)', i, i);
> end loop;
> end loop;
> end $$;
> reset client_min_messages;
> create sequence detaches;
>
> which ensures the partitions are not in OID order, and then used this
> pgbench script
>
> \set part random(0, 89)
>
> select pg_try_advisory_lock(:part)::integer AS gotlock \gset
> \if :gotlock
>
> select pg_advisory_lock(142857);
> alter table p detach partition p:part concurrently;
> select pg_advisory_unlock(142857);
>
> \set slp random(100, 200)
> \sleep :slp us
>
> alter table p attach partition p:part for values in (:part);
>
> select pg_advisory_unlock(:part), nextval('detaches');
> \endif
>
>
> which detaches some partitions randomly, together with the other one
>
> \set id random(0,99)
> select * from p where a = :id;
>
> script which reads from the partitioned table.
>
> This setup would fail really quickly with the original code, and with
> the patched code it can run a total of some 6700 detach/attach cycles in
> 60 seconds.

Yeah, nice catch. This will make the code more robust.

> This seems quite slow, and in fact looking at the total
> number of partitions in pg_inherits, we have either 99 or 100 partitions
> almost the whole time. That's why I'm trying to modify pgbench ...
> I think the problem is that pg_advisory_lock() holds a snapshot which
> causes a concurrent detach partition to wait for it, or something like
> that. I added a \lock command, but it doesn't seem to work the way I
> want it to.
>
> --
> Álvaro Herrera Breisgau, Deutschland —
> https://www.EnterpriseDB.com/
> "I'm always right, but sometimes I'm more right than other times."
> (Linus Torvalds)
>
> https://lore.kernel.org/git/Pine(dot)LNX(dot)4(dot)58(dot)0504150753440(dot)7211(at)ppc970(dot)osdl(dot)org/
>

--
Tender Wang
OpenPie: https://en.openpie.com/
