From: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>, Dimitrios Apostolou <jimis(at)gmx(dot)net> |
Subject: | Re: Fundamental scheduling bug in parallel restore of partitioned tables |
Date: | 2025-04-15 10:38:56 |
Message-ID: | CAExHW5vuQ0DoXLh_0sPx2_Da1sUeB=8sjadoQHCgTWhT_zc8TA@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Apr 14, 2025 at 11:14 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I wrote:
> > Here's a draft patch for this. It seems to fix the problem in
> > light testing.
>
> I realized that the "repro" I had for this isn't testing the same
> thing that Dimitrios is seeing; what it is exposing looks more like
> a bug or at least a behavioral change due to the v18 work to record
> not-null constraints in pg_constraint [1]. So my patch may fix his
> problem or it may not. It would be good to have a reproducer that
> fails (not necessarily every time) in v17 or earlier.
>
I tried to reproduce the problem using your script on v17, but couldn't
get either a deadlock or a constraint violation error.
>
> This is disastrous for assorted reasons. The ALTER ADD CONSTRAINT
> command might fail outright if we've loaded data for the referencing
> table but not the referenced table.
There's a comment in getConstraints():
/*
* Restoring an FK that points to a partitioned table requires that
* all partition indexes have been attached beforehand. Ensure that
* happens by making the constraint depend on each index partition
* attach object.
*/
FK constraint addition will wait for the child indexes in the referenced
partitioned table to be attached, which in turn wait for data to be
loaded into the child tables. So it doesn't look like we will see ADD
CONSTRAINT failing. I may be missing something, though.
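To make that chain concrete, here is a toy sketch of the reasoning, not
pg_dump's actual data structures. All object names (fk_to_parent,
child1, referencing, ...) are hypothetical, and the edges are an
assumption that models only what the comment and the paragraph above
describe: FK constraint -> index partition attach -> child table data.

```python
# Toy dependency graph; names are hypothetical and the edges are an
# assumption based only on the getConstraints() comment quoted above.
DEPS = {
    "FK CONSTRAINT fk_to_parent": [
        "INDEX ATTACH child1_pkey",
        "INDEX ATTACH child2_pkey",
    ],
    "INDEX ATTACH child1_pkey": ["TABLE DATA child1"],
    "INDEX ATTACH child2_pkey": ["TABLE DATA child2"],
    # Data of the referencing table: nothing in this model points at it.
    "TABLE DATA referencing": [],
}


def prerequisites(item, deps):
    """Return every item that must finish before `item` (transitively)."""
    seen = set()
    stack = list(deps.get(item, []))
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(deps.get(cur, []))
    return seen


if __name__ == "__main__":
    for name in sorted(prerequisites("FK CONSTRAINT fk_to_parent", DEPS)):
        print(name)
```

In this model the FK constraint transitively waits for the data of the
referenced partitions, but nothing makes it wait for the data of the
referencing table.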
> It could deadlock against other
> parallel restore jobs, as reported in [1] (and which I find not
> too terribly hard to reproduce here).
> Even if it doesn't fail, if
> it completes before we load data for the referencing table, we'll
> have to do retail FK checks, greatly slowing that data load.
FWIW, running pg_restore -j2 -v, I think I see evidence that the FK
constraint is created before data is loaded into the referencing
table:
pg_restore: processing data for table "public.c22"
... snip
pg_restore: launching item 2477 FK CONSTRAINT parent2 parent2_ref_fkey
pg_restore: creating FK CONSTRAINT "public.parent2 parent2_ref_fkey"
pg_restore: finished item 2626 TABLE DATA c22
... snip
pg_restore: launching item 2625 TABLE DATA c21
pg_restore: processing data for table "public.c21"
pg_restore: finished item 2477 FK CONSTRAINT parent2 parent2_ref_fkey
... snip
pg_restore: finished item 2625 TABLE DATA c21
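The ordering in that excerpt can also be checked mechanically. The
following is a rough sketch that assumes the verbose messages always
look like the "launching item N ..." / "finished item N ..." lines
shown above (I have only matched against this excerpt). The function
names are made up for illustration, and the script does not know which
tables are partitions of the referencing table; it simply lists every
TABLE DATA item that finished after an FK CONSTRAINT item was launched.

```python
import re

# Log excerpt copied from the pg_restore -j2 -v output above.
LOG = """\
pg_restore: processing data for table "public.c22"
... snip
pg_restore: launching item 2477 FK CONSTRAINT parent2 parent2_ref_fkey
pg_restore: creating FK CONSTRAINT "public.parent2 parent2_ref_fkey"
pg_restore: finished item 2626 TABLE DATA c22
... snip
pg_restore: launching item 2625 TABLE DATA c21
pg_restore: processing data for table "public.c21"
pg_restore: finished item 2477 FK CONSTRAINT parent2 parent2_ref_fkey
... snip
pg_restore: finished item 2625 TABLE DATA c21
"""

# Assumed message shape, inferred from the excerpt only.
EVENT = re.compile(r"^pg_restore: (launching|finished) item (\d+) (.+)$")


def parse_events(log):
    """Return (line position, action, item id, description) tuples."""
    events = []
    for pos, line in enumerate(log.splitlines()):
        m = EVENT.match(line)
        if m:
            events.append((pos, m.group(1), int(m.group(2)), m.group(3)))
    return events


def data_unfinished_at_fk_launch(log):
    """Map each launched FK item to tables whose data finished later."""
    events = parse_events(log)
    result = {}
    for pos, action, item, desc in events:
        if action == "launching" and desc.startswith("FK CONSTRAINT "):
            result[item] = [
                d[len("TABLE DATA "):]
                for p, a, _, d in events
                if a == "finished" and d.startswith("TABLE DATA ") and p > pos
            ]
    return result


if __name__ == "__main__":
    for fk_item, tables in data_unfinished_at_fk_launch(LOG).items():
        print(f"FK item {fk_item} launched before data finished for: {tables}")
```

On the excerpt above this reports item 2477 together with c22 and c21.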
I tried applying your patch on v17 to see whether it causes the FK
creation to wait, but the patch doesn't apply cleanly.
--
Best Wishes,
Ashutosh Bapat