From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: ksarabu(at)amazon(dot)com
Subject: pg_upgrade failures with large partition definitions on upgrades from ~13 to 14~
Date: 2023-02-09 05:17:36
Message-ID: Y+SB8OnxocMWvNXz@paquier.xyz
Lists: pgsql-hackers
Hi all,
The following SQL sequence causes pg_upgrade to fail when executed on a
cluster of ~13 before upgrading to 14~, assuming the default relation
page size of 8kB. It creates a partitioned table with a partition whose
list of values is just large enough that it can still be created in ~13:
CREATE TABLE parent_list (id int) PARTITION BY LIST (id);
CREATE OR REPLACE FUNCTION create_long_list(tabname text,
                                            tabparent text,
                                            num_vals int)
RETURNS VOID AS
$func$
DECLARE
  query text;
BEGIN
  query := 'CREATE TABLE ' || tabname ||
           ' PARTITION OF ' || tabparent || ' FOR VALUES IN (';
  FOR i IN 1..num_vals LOOP
    query := query || i;
    IF i != num_vals THEN
      query := query || ', ';
    END IF;
  END LOOP;
  query := query || ')';
  -- The query string is built directly, with no format() placeholders.
  EXECUTE query;
END
$func$ LANGUAGE plpgsql;
-- Large enough to trigger pg_class failure in 14~
-- Use 953 to make it work in 14~
SELECT create_long_list('child_list_2', 'parent_list', 956);
However, pg_upgrade fails in the middle of processing when restoring
the objects in the new cluster, with the same error as one would get
when a row is too big and the relation, here pg_class, has no toast
table:
pg_restore: error: could not execute query: ERROR: row is too big:
size 8184, maximum size 8160
Command was: ALTER TABLE ONLY "public"."parent_list" ATTACH PARTITION
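For reference, the 8160-byte ceiling in that message corresponds to the
maximum heap tuple size on an 8kB page; a back-of-the-envelope check,
assuming the standard 24-byte page header and one 4-byte line pointer,
MAXALIGNed to 8 bytes:

```
-- 8192 (BLCKSZ) minus MAXALIGN(24-byte page header + 4-byte line pointer);
-- MAXALIGN(28) = 32 on 8-byte-alignment builds, hence:
SELECT 8192 - 32 AS max_heap_tuple_size;  -- 8160
```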
pg_upgrade_internal.log then reports:
Restoring database schemas in the new cluster
*failure*
Consult the last few lines of "pg_upgrade_dump_13468.log" for
the probable cause of the failure.
No fields have been added to pg_class between 13 and 14, but the
amount of data stored in relpartbound grew between these two versions
(just compare length() of it, for example using what I posted above).
Hence, if a pg_class row in the original cluster is just small enough
to fit on a single page without toasting, restoring it in the new
cluster may fail because the extra partition bound data no longer
leaves enough space on the page.
In such cases, the usual recommendation would be to adjust the
partition layout so that the schema produces smaller pg_node_trees for
the partition bounds. Still, waiting for something to blow up in the
middle of pg_upgrade is very unfriendly, and with pg_upgrade --link it
would mean rolling back to a previous snapshot, which can be costly.
Adding a toast table to pg_class or even pg_attribute (because this
could also happen with a bunch of attribute-level ACLs) has been
proposed for some time, though there have always been concerns about
circular dependencies back to pg_class. More toasting, or splitting
relpartbound into a separate catalog (with a toast table in it), would
solve this issue at its root, but that's not something that would
happen in 14~15 anyway.
Shouldn't we at least have a safeguard of some kind in the pre-check
phase of pg_upgrade? I think that this comes down to checking,
roughly, sum(pg_column_size(pg_class.*)) with alignment and page
header accounted for, and doing the same for pg_attribute.
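A minimal sketch of such a check, run against the old cluster before
the upgrade; the 8000-byte threshold is an arbitrary safety margin for
illustration, not a value from any patch, and a real check would also
need to account for alignment and the page header as noted above:

```
-- Flag pg_class rows whose stored size is close to the single-page
-- tuple limit, since relpartbound can grow after the upgrade.
SELECT oid::regclass AS relation,
       pg_column_size(c.*) AS row_size
  FROM pg_class AS c
 WHERE pg_column_size(c.*) > 8000
 ORDER BY row_size DESC;
```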
Thoughts?
--
Michael