Re: Creating foreign key on partitioned table is too slow

From: Alec Lazarescu <alecl(at)alecl(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "kato-sho(at)fujitsu(dot)com" <kato-sho(at)fujitsu(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, David Steele <david(at)pgmasters(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Creating foreign key on partitioned table is too slow
Date: 2023-10-23 21:39:59
Message-ID: CAE+E=SQacy6t_3XzCWnY1eiRcNWfz4pp02FER0N7mU_F+o8G_Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I ran into a situation that echoed this original one by Kato in the
start of this thread:
https://www.postgresql.org/message-id/OSAPR01MB374809E8DE169C8BF2B82CBD9F6B0%40OSAPR01MB3748.jpnprd01.prod.outlook.com

More below.

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
> I tried running the original test case under HEAD. I do not see
> any visible memory leak, which I think indicates that 5b9312378 or
> some other fix has taken care of the leak since the original report.
> However, after waiting awhile and noting that the ADD FOREIGN KEY
> wasn't finishing, I poked into its progress with a debugger and
> observed that each iteration of RI_Initial_Check() was taking about
> 15 seconds. I presume we have to do that for each partition,
> which leads to the estimate that it'll take 34 hours to finish this
> ... and that's with no data in the partitions, god help me if
> there'd been a lot.
>
> Some quick "perf" work says that most of the time seems to be
> getting spent in the planner, in get_eclass_for_sort_expr().
> So this is likely just a variant of performance issues we've
> seen before. (I do wonder why we're not able to prune the
> join to just the matching PK partition, though.)

I found that the original example from Kato finishes in a little over
a minute now to create the FK constraint in Postgres 14-16.

However, in my case, I'm using composite partitions and that is taking
60x as long for an equivalent number of partitions. I must emphasize
this is with ZERO rows of data.

I'm using 1200 partitions in my example to finish a bit faster and
because that's my actual use case and less extreme than 8,000
partitions.

If I reach my 1,200 with 80 top-level and 15 leaf-level partitions in
a composite hierarchy (80x15 = 1,200 still) the speed is very slow.
I'm using composite partitions primarily because in my real code I
need LIST partitions which don't support multiple keys so I worked
around that using composite partitions with one LIST key in each
level. Doing some other workaround like a concatenated single key was
messy for my use case.

I modified Kato's test case to repeat the issue I'm having and saw
some very odd query plan behavior which is likely part of the issue.

The flat version with 1,200 partitions at a single level and no
composite partitions finishes in a little over a second while the 80 x
15 version with composite partitions takes over a minute (60x longer).
In my actual database with many such partitions with FK, the time
compounds and FK creation takes >30 minutes per FK leading to hours
just making FKs.

== COMPOSITE PARTITION INTERNAL SELECT PLAN ==

If I cancel the composite FK creation, I see where it stopped and that
gives a clue about the difference in speed. For the composite, it's
this statement with a plan linked on dalibo showing a massive amount
of sequential scans and Postgres making some assumptions about 1 row
existing.

ERROR: canceling statement due to user request
CONTEXT: SQL statement SELECT fk."aid" FROM ONLY
"public"."xhistory_25_12" fk LEFT OUTER JOIN "public"."xaccounts" pk
ON ( pk."aid" OPERATOR(pg_catalog.=) fk."aid") WHERE pk."aid" IS NULL
AND (fk."aid" IS NOT NULL)
SQL state: 57014

PLAN DETAILS: https://explain.dalibo.com/plan/fad72gdacb6727b4#plan

== FLAT PARTITION INTERNAL SELECT PLAN ==

This gives a direct result node and has no complexity at all

SELECT fk."aid" FROM ONLY "public"."history_23" fk LEFT OUTER JOIN
"public"."accounts" pk ON ( pk."aid" OPERATOR(pg_catalog.=) fk."aid")
WHERE pk."aid" IS NULL AND (fk."aid" IS NOT NULL)

PLAN DETAILS: https://explain.dalibo.com/plan/a83dae9b9569ebcd

Test cases to repeat easily below:

== FLAT PARTITION FAST FK DDL ==

CREATE DATABASE fastflatfk

CREATE TABLE accounts (aid INTEGER, bid INTEGER, abalance INTEGER,
filler CHAR(84)) PARTITION BY HASH(aid);
CREATE TABLE history (tid INTEGER, bid INTEGER, aid INTEGER, delta
INTEGER, mtime TIMESTAMP, filler CHAR(22)) PARTITION BY HASH(aid);

DO $$
DECLARE
p INTEGER;
BEGIN
FOR p IN 0..1023 LOOP
EXECUTE 'CREATE TABLE accounts_' || p || ' PARTITION OF accounts
FOR VALUES WITH (modulus 1024, remainder ' || p || ') PARTITION BY
HASH(aid);';
EXECUTE 'CREATE TABLE history_' || p || ' PARTITION OF history FOR
VALUES WITH (modulus 1024, remainder ' || p || ') PARTITION BY
HASH(aid);';
END LOOP;
END $$;

ALTER TABLE accounts ADD CONSTRAINT accounts_pk PRIMARY KEY (aid);

-- Query returned successfully in 1 secs 547 msec.
ALTER TABLE history ADD CONSTRAINT history_fk FOREIGN KEY (aid)
REFERENCES accounts (aid) ON DELETE CASCADE;

--run to drop FK before you recreate it
--ALTER TABLE history DROP CONSTRAINT history_fk

== COMPOSITE PARTITION SLOW FK DDL ==

Now the composite partition version with 80 x 15 partitions which
finishes in a bit over a minute (60x the time)

CREATE DATABASE slowcompfk

-- Create the parent tables for xaccounts and xhistory
CREATE TABLE xaccounts (aid INTEGER, bid INTEGER, abalance INTEGER,
filler CHAR(84)) PARTITION BY HASH(aid);
CREATE TABLE xhistory (tid INTEGER, bid INTEGER, aid INTEGER, delta
INTEGER, mtime TIMESTAMP, filler CHAR(22)) PARTITION BY HASH(aid);

-- Generate SQL for creating 80 partitions for xaccounts
DO $$
DECLARE
p INTEGER;
BEGIN
FOR p IN 0..79 LOOP
EXECUTE 'CREATE TABLE xaccounts_' || p || ' PARTITION OF xaccounts
FOR VALUES WITH (modulus 80, remainder ' || p || ') PARTITION BY
HASH(aid);';
END LOOP;
END $$;

-- Generate SQL for creating 15 sub-partitions within each partition
for xaccounts
DO $$
DECLARE
main_partition INTEGER;
sub_partition INTEGER;
BEGIN
FOR main_partition IN 0..79 LOOP
FOR sub_partition IN 0..14 LOOP
EXECUTE 'CREATE TABLE xaccounts_' || main_partition || '_' ||
sub_partition || ' PARTITION OF xaccounts_' || main_partition || ' FOR
VALUES WITH (modulus 15, remainder ' || sub_partition || ');';
END LOOP;
END LOOP;
END $$;

-- Generate SQL for creating 80 partitions for xhistory
DO $$
DECLARE
p INTEGER;
BEGIN
FOR p IN 0..79 LOOP
EXECUTE 'CREATE TABLE xhistory_' || p || ' PARTITION OF xhistory
FOR VALUES WITH (modulus 80, remainder ' || p || ') PARTITION BY
HASH(aid);';
END LOOP;
END $$;

-- Generate SQL for creating 15 sub-partitions within each partition
for xhistory
DO $$
DECLARE
main_partition INTEGER;
sub_partition INTEGER;
BEGIN
FOR main_partition IN 0..79 LOOP
FOR sub_partition IN 0..14 LOOP
EXECUTE 'CREATE TABLE xhistory_' || main_partition || '_' ||
sub_partition || ' PARTITION OF xhistory_' || main_partition || ' FOR
VALUES WITH (modulus 15, remainder ' || sub_partition || ');';
END LOOP;
END LOOP;
END $$;

ALTER TABLE xaccounts ADD CONSTRAINT xaccounts_pk PRIMARY KEY (aid);

-- Query returned successfully in 1 min 18 secs.
ALTER TABLE xhistory ADD CONSTRAINT xhistory_fk FOREIGN KEY (aid)
REFERENCES xaccounts (aid) ON DELETE CASCADE;

--run to drop FK before you recreate it
--ALTER TABLE xhistory DROP CONSTRAINT xhistory_fk

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-10-23 21:41:45 Re: PostgreSQL domains and NOT NULL constraint
Previous Message Tom Lane 2023-10-23 21:21:29 Re: PostgreSQL domains and NOT NULL constraint