Re: Fundamental scheduling bug in parallel restore of partitioned tables

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Andres Freund <andres(at)anarazel(dot)de>, Dimitrios Apostolou <jimis(at)gmx(dot)net>
Subject: Re: Fundamental scheduling bug in parallel restore of partitioned tables
Date: 2025-04-11 00:08:22
Message-ID: 729065.1744330102@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> I think that the most intellectually rigorous solution is to
> generate dummy TABLE DATA objects for partitioned tables, which
> don't actually contain data but merely carry dependencies on
> each of the child tables' TABLE DATA objects.

Here's a draft patch for this. It seems to fix the problem in
light testing. Some notes:

* Quite a lot of the patch is concerned with making various places
treat the new PARTITIONED DATA TOC entry type the same as TABLE DATA.
I considered removing that distinction and representing a partitioned
table's data object as TABLE DATA with no dataDumper, but it seems to
me this way is clearer. Maybe others will think differently though;
it'd make for a smaller patch.

* It's annoying that we have to touch _tocEntryRequired's "Special
Case" logic for deciding whether an entry is schema or data, because
that means that old copies of pg_restore will think these entries are
schema and thus ignore them in a data-only restore. But I think it
doesn't matter too much, because in a data-only restore we'd not be
creating indexes or foreign keys, so the scheduling bug isn't really
problematic.

* I'm not quite certain whether identify_locking_dependencies() needs
to treat PARTITIONED DATA dependencies as lockable. I assumed here
that it does, but maybe we don't take out exclusive locks on
partitioned tables during restore?

* I noticed that a --data-only dump of the regression database now
complains:

$ pg_dump --data-only regression >r.dump
pg_dump: warning: there are circular foreign-key constraints on this table:
pg_dump: detail: parted_self_fk
pg_dump: hint: You might not be able to restore the dump without using --disable-triggers or temporarily dropping the constraints.
pg_dump: hint: Consider using a full dump instead of a --data-only dump to avoid this problem.

The existing code does not produce this warning, but I think doing so
is correct. The reason we missed the issue before is that
getTableDataFKConstraints ignores tables without a dataObj, so before
this patch it ignored partitioned tables altogether.
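
For anyone who wants the dummy-object idea from the top of this message in concrete form, here is a toy model in C. It is only a sketch and is not taken from the patch: the ToyEntry struct, the toy_* functions, and the tags are all hypothetical, and it assumes a post-data item such as a foreign key depends on the partitioned table's data entry. The PARTITIONED DATA entry does no work of its own; it becomes ready only once every child's TABLE DATA entry is done, and that is what holds back the item depending on it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not pg_dump/pg_restore code. */
typedef enum
{
    ENTRY_TABLE_DATA,        /* a leaf table's data */
    ENTRY_PARTITIONED_DATA,  /* dummy entry: no data, only dependencies */
    ENTRY_POST_DATA          /* e.g. an index or foreign key */
} EntryKind;

#define MAX_DEPS 8

typedef struct ToyEntry
{
    const char *tag;
    EntryKind   kind;
    int         deps[MAX_DEPS];  /* indexes of entries that must finish first */
    int         ndeps;
    bool        done;
} ToyEntry;

static void
toy_init(ToyEntry *e, const char *tag, EntryKind kind)
{
    e->tag = tag;
    e->kind = kind;
    e->ndeps = 0;
    e->done = false;
}

static void
toy_add_dep(ToyEntry *e, int dep)
{
    assert(e->ndeps < MAX_DEPS);
    e->deps[e->ndeps++] = dep;
}

/* An entry may be scheduled once all of its dependencies are done. */
static bool
toy_entry_ready(const ToyEntry *entries, int idx)
{
    const ToyEntry *e = &entries[idx];

    if (e->done)
        return false;
    for (int i = 0; i < e->ndeps; i++)
    {
        if (!entries[e->deps[i]].done)
            return false;
    }
    return true;
}

/* Mark an entry as completed (stands in for actually restoring it). */
static void
toy_entry_finish(ToyEntry *entries, int idx)
{
    entries[idx].done = true;
}

/*
 * Entries 0 and 1: TABLE DATA for two partitions.
 * Entry 2: PARTITIONED DATA for the parent, depending on both children.
 * Entry 3: a foreign key that depends on the parent's data entry.
 */
static void
toy_build_example(ToyEntry entries[4])
{
    toy_init(&entries[0], "part_1", ENTRY_TABLE_DATA);
    toy_init(&entries[1], "part_2", ENTRY_TABLE_DATA);
    toy_init(&entries[2], "parent", ENTRY_PARTITIONED_DATA);
    toy_init(&entries[3], "parent_fk", ENTRY_POST_DATA);
    toy_add_dep(&entries[2], 0);
    toy_add_dep(&entries[2], 1);
    toy_add_dep(&entries[3], 2);
}
```

With the entries built by toy_build_example(), the foreign key at index 3 is not ready until both partitions and then the dummy parent entry have been finished, whatever order the partitions complete in.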

Comments?

regards, tom lane

Attachment: v1-handle-partitioned-tables-better.patch (text/x-diff, 20.4 KB)
