From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Michael Banck <mbanck(at)gmx(dot)net>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, vignesh C <vignesh21(at)gmail(dot)com>, "Kumar, Sachin" <ssetiya(at)amazon(dot)com>, Robins Tharakan <tharakan(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, Bruce Momjian <bruce(at)momjian(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_upgrade failing for 200+ million Large Objects
Date: 2024-07-26 22:37:10
Message-ID: 1870579.1722033430@sss.pgh.pa.us

Justin Pryzby <pryzby(at)telsasoft(dot)com> writes:
> On Fri, Jul 26, 2024 at 10:53:30PM +0300, Alexander Korotkov wrote:
>> It would be nice to identify such cases and check which memory contexts are
>> growing and why.

> I reproduced the problem with this schema:

> SELECT format('CREATE TABLE p(i int, %s) PARTITION BY RANGE(i)', array_to_string(a, ', ')) FROM (SELECT array_agg(format('i%s int', i))a FROM generate_series(1,999)i);
> SELECT format('CREATE TABLE t%s PARTITION OF p FOR VALUES FROM (%s)TO(%s)', i,i,i+1) FROM generate_series(1,999)i;

> This used over 4 GB of RAM.

Interesting. This doesn't bloat particularly much in a regular
pg_restore, even with --transaction-size=1000; but it does in
pg_upgrade, as you say. I found that the bloat was occurring
during these long sequences of UPDATE commands issued by pg_upgrade:

-- For binary upgrade, recreate inherited column.
UPDATE pg_catalog.pg_attribute
SET attislocal = false
WHERE attname = 'i'
  AND attrelid = '"public"."t139"'::pg_catalog.regclass;

-- For binary upgrade, recreate inherited column.
UPDATE pg_catalog.pg_attribute
SET attislocal = false
WHERE attname = 'i1'
  AND attrelid = '"public"."t139"'::pg_catalog.regclass;

-- For binary upgrade, recreate inherited column.
UPDATE pg_catalog.pg_attribute
SET attislocal = false
WHERE attname = 'i2'
  AND attrelid = '"public"."t139"'::pg_catalog.regclass;

I think the problem is basically that each one of these commands
causes a relcache inval, for which we can't reclaim space right
away, so that we end up consuming O(N^2) cache space for an
N-column inherited table.
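
Per Alexander's suggestion upthread, the growth is easy enough to
watch from SQL, assuming v14 or later (the view and function below
don't exist in older branches); this is just a sketch:

SELECT name, parent, total_bytes
FROM pg_backend_memory_contexts
ORDER BY total_bytes DESC LIMIT 10;

-- or, from another session, dump the restoring backend's memory
-- context stats into the server log:
SELECT pg_log_backend_memory_contexts(<pid of the backend>);

One would expect the bloat to show up under CacheMemoryContext and
its children.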

It's fairly easy to fix things so that this example doesn't cause
that to happen: we just need to issue these updates as one command
not N commands per table. See attached. However, I fear this should
just be considered a draft, because the other code for binary upgrade
in the immediate vicinity is just as aggressively stupid and
unoptimized as this bit, and can probably also be driven to O(N^2)
behavior with enough CHECK constraints etc. We've gone out of our way
to make ALTER TABLE capable of handling many updates to a table's DDL
in one command, but whoever wrote this code appears not to have read
that memo, or at least to have believed that performance of pg_upgrade
isn't of concern.
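
Roughly, the idea is to collapse the per-column commands into one
UPDATE per table, along these lines (a sketch of the shape of the
change, not necessarily the exact SQL the patch emits):

UPDATE pg_catalog.pg_attribute
SET attislocal = false
WHERE attrelid = '"public"."t139"'::pg_catalog.regclass
  AND attname IN ('i', 'i1', 'i2', ...);

which means one relcache inval per table rather than one per column.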

> Note that there seemed to be no issue when I created 999 tables without
> partitioning:
> SELECT format('CREATE TABLE t%s(LIKE p)', i,i,i+1) FROM generate_series(1,999)i;

Yeah, because then we don't need to play games with attislocal.
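
For anyone following along, the difference is visible in the catalog
(a sketch, using the table names from the examples above):

SELECT attname, attislocal
FROM pg_catalog.pg_attribute
WHERE attrelid = 'public.t139'::pg_catalog.regclass
  AND attname = 'i';
-- a partition member shows attislocal = f, which pg_dump
-- --binary-upgrade has to re-create with the UPDATEs shown above;
-- a standalone CREATE TABLE ... (LIKE p) copy shows attislocal = t,
-- so no catalog fixup is needed.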

regards, tom lane

Attachment                               Content-Type   Size
reduce-pg_upgrade-cache-bloat-wip.patch  text/x-diff    2.2 KB
