Re: Fundamental scheduling bug in parallel restore of partitioned tables

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>, Dimitrios Apostolou <jimis(at)gmx(dot)net>
Subject:	Re: Fundamental scheduling bug in parallel restore of partitioned tables
Date:	2025-04-15 18:49:22
Message-ID:	1713127.1744742962@sss.pgh.pa.us
Lists:	pgsql-hackers

Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> writes:
> On Mon, Apr 14, 2025 at 11:14 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> This is disastrous for assorted reasons. The ALTER ADD CONSTRAINT
>> command might fail outright if we've loaded data for the referencing
>> table but not the referenced table.

> There's a comment in getConstraints():
>     /*
>      * Restoring an FK that points to a partitioned table requires that
>      * all partition indexes have been attached beforehand. Ensure that
>      * happens by making the constraint depend on each index partition
>      * attach object.
>      */

Ah, that is an excellent point which I missed. And the INDEX ATTACH
objects have dependencies on the leaf tables, which *will* get
repointed to their TABLE DATA objects by repoint_table_dependencies.
So by the time we are ready to restore the FK CONSTRAINT object,
we are certain to have loaded all the data of the referenced table.
But there's nothing delaying the constraint till after the referencing
table's data is loaded.
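
The reasoning above can be sketched as a reachability check on a toy dependency graph. This is only an illustrative model, not pg_restore's code: the entry names (parent, parent_p1, child, child_p1, and so on) are made up, the repoint() helper is a simplified stand-in for repoint_table_dependencies, and it assumes the referencing table is itself partitioned, so that it has no TABLE DATA entry of its own to be repointed to.

```python
# Toy model only: entry names are hypothetical, and repoint() is a simplified
# stand-in for what repoint_table_dependencies is described as doing.

# deps[x] = entries that must be restored before x.
deps = {
    "TABLE parent": set(),                      # partitioned, referenced
    "TABLE parent_p1": {"TABLE parent"},        # leaf partition
    "TABLE child": set(),                       # partitioned, referencing
    "TABLE child_p1": {"TABLE child"},          # leaf partition
    "TABLE DATA parent_p1": {"TABLE parent_p1"},
    "TABLE DATA child_p1": {"TABLE child_p1"},
    # INDEX ATTACH objects depend on the leaf tables.
    "INDEX ATTACH parent_p1_pkey": {"TABLE parent_p1"},
    # The FK depends on each index partition attach object.
    "FK CONSTRAINT child_fkey": {
        "TABLE child", "TABLE parent", "INDEX ATTACH parent_p1_pkey",
    },
}

# Only leaf tables have TABLE DATA entries in this model.
table_data = {
    "TABLE parent_p1": "TABLE DATA parent_p1",
    "TABLE child_p1": "TABLE DATA child_p1",
}

# Assumption: only these later-restored entries get their table deps repointed.
post_data = {"INDEX ATTACH parent_p1_pkey", "FK CONSTRAINT child_fkey"}


def repoint(deps, table_data, post_data):
    """Replace a dependency on a table by one on its TABLE DATA entry, if any."""
    out = {}
    for entry, before in deps.items():
        if entry in post_data:
            out[entry] = {table_data.get(d, d) for d in before}
        else:
            out[entry] = set(before)
    return out


def prerequisites(deps, entry):
    """Everything that must be restored before `entry`, transitively."""
    seen = set()
    stack = list(deps[entry])
    while stack:
        d = stack.pop()
        if d not in seen:
            seen.add(d)
            stack.extend(deps.get(d, ()))
    return seen


prereqs = prerequisites(repoint(deps, table_data, post_data),
                        "FK CONSTRAINT child_fkey")
print("TABLE DATA parent_p1" in prereqs)  # True: referenced data comes first
print("TABLE DATA child_p1" in prereqs)   # False: nothing delays the FK
```

In this model the FK reaches the referenced table's data only through the INDEX ATTACH entry, while no path leads to the referencing table's data.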

>> Even if it doesn't fail, if
>> it completes before we load data for the referencing table, we'll
>> have to do retail FK checks, greatly slowing that data load.

> FWIW, executing pg_restore -j2 -v, I think I see evidence that the FK
> constraint is created before data is loaded into the referencing
> table.

Yes, I reproduced that as well. That squares with the above
analysis.

So at this point we have:

#1: ADD CONSTRAINT failure because of missing referenced data:
not possible after all.

#2: Deadlock between parallel restore jobs: possible in HEAD, but
it seems likely to be a bug introduced by the not-null-constraint
work rather than being pg_restore's fault. We have no evidence
that such a deadlock can happen in released branches, and the lack
of field reports suggests that it can't.

#3: Restoring the FK constraint before referencing data is loaded:
this seems to be possible, and it's a performance problem, but
no more than that.

So now I withdraw the suggestion that this patch needs to be
back-patched. We may not even need it in v18, if another fix
for #2 is found. Fixing #3 would be a desirable thing to do
in v19, but if that's the only thing at stake then it's not
something to break feature freeze for.

For the moment I'll mark this CF entry as meant for v19.
We can resurrect consideration of it for v18 if there's not
a better way to fix the deadlock problem.

regards, tom lane

In response to

Re: Fundamental scheduling bug in parallel restore of partitioned tables at 2025-04-15 10:38:56 from Ashutosh Bapat

Responses

Re: Fundamental scheduling bug in parallel restore of partitioned tables at 2025-04-16 09:47:54 from Ashutosh Bapat
