From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Cc: | Dimitrios Apostolou <jimis(at)gmx(dot)net> |
Subject: | Fundamental scheduling bug in parallel restore of partitioned tables |
Date: | 2025-04-04 21:12:23 |
Message-ID: | 2045026.1743801143@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Looking into the complaint at [1], I find that pg_restore really gets
it quite wrong when trying to do a parallel restore of partitioned
tables with foreign key constraints. The historical way that we
got parallel restore to work correctly with FK constraints is:
1. The dump file shows dependencies of the FK constraint object on
the two tables named in the constraint. (This matches what it
says in pg_depend.)
2. If it's a parallel restore, repoint_table_dependencies() replaces
the dependencies on the tables with dependencies on their TABLE DATA
objects.
3. Now restore of the FK constraint will not be started until all
the relevant data is loaded.
However, if we're talking about partitioned tables, there is no
TABLE DATA object for a partitioned table; only for its leaf
partitions. So repoint_table_dependencies() does nothing for the
FK object's dependencies, meaning that it's a candidate to be
launched as soon as we begin the parallel restore phase.
This is disastrous for assorted reasons. The ALTER ADD CONSTRAINT
command might fail outright if we've loaded data for the referencing
table but not the referenced table. It could deadlock against other
parallel restore jobs, as reported in [1] (and which I find not
too terribly hard to reproduce here). Even if it doesn't fail, if
it completes before we load data for the referencing table, we'll
have to do retail FK checks, greatly slowing that data load.
I think that the most intellectually rigorous solution is to
generate dummy TABLE DATA objects for partitioned tables, which
don't actually contain data but merely carry dependencies on
each of the child tables' TABLE DATA objects. (In a multilevel
partitioned structure, this'd result in the top TABLE DATA having
indirect dependencies on all the leaf partition TABLE DATAs.)
Then repoint_table_dependencies() does the right thing with a
dependency on a partitioned table, and the dependency-driven
scheduler will take care of the rest.
There are two places we could make that happen. The easiest
to code, likely, is to get pg_dump to create such objects and
include them in the dump. We could possibly get pg_restore
to fake up such objects from the data it has available, but
I expect that that will be harder and slower than having
pg_dump do it. So I'm leaning towards the first way.
The disadvantage is that existing dump files would still be
hazardous to restore in parallel. But given that this has
been broken for some time and nobody's reported it till now,
I feel maybe that's okay. (I don't think there would be
a backwards compatibility problem in introducing such new
objects into dumps, because AFAICS pg_restore would not need
any explicit knowledge of them.)
Thoughts?
regards, tom lane
PS: attached is a text dump file for a trivial DB containing
two partitioned tables with an FK. If I load this,
dump it into an -Fc-format dump file, and run something like
pg_restore -j10 src.dump -d target
then I can reproduce the reported deadlock failure --- not
entirely reliably, but it does happen.
[1] https://www.postgresql.org/message-id/67469c1c-38bc-7d94-918a-67033f5dd731%40gmx.net
Attachment | Content-Type | Size |
---|---|---|
src.sql | text/plain | 8.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Noah Misch | 2025-04-04 21:18:02 | Re: AIO v2.5 |
Previous Message | Nathan Bossart | 2025-04-04 20:58:53 | Re: Statistics Import and Export |