Re: Certain options in pg_upgrade don't seem to work.

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: DBA <ecountdba(at)ecounterp(dot)co(dot)kr>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Certain options in pg_upgrade don't seem to work.
Date: 2024-08-14 17:54:32
Message-ID: 210ebcb9-4bf0-45c5-99bc-c0a2c8b77df4@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 8/14/24 03:41, DBA wrote:
> Hello, I'm inquiring because --j option don't seem to be working
> properly in pg_upgrade.
>
>
> *1. A description of what you are trying to achieve and what results you
> expect : *
>
>
> We took to much time to upgrade postgresql with pg_upgrade.
>
>
> So we have tested with --j option in pg_upgrade by below command 
>
>
> *time PGPASSWORD='1q2w3e4r' nohup /usr/lib/postgresql/16/bin/pg_upgrade
> -b /usr/lib/postgresql/15/bin/ -B /usr/lib/postgresql/16/bin/ -d
> /data/PG15 -D /data/PG16 -k -j 16 -U ecount_own > /data/tmp4/upgrade.log &*
>
>
> It took 520 minutes to upgrade and we used -j option to make pg_upgrade
> use multiple CPU processor but it didn't use them. (Only use single
> processor) 
>
> (It took the same amount of time as when tested without --j option.)
>
>
> Is it normal not to use multiple cpu cores even with the --j option?
>
>
> The docs says that you use multi-processors for restoring.
>
>
> *note.)* We have about 80 partition tables and each partition table has
> 9000 children tables.
>
>
>        I think it could be affected by the number of partition tables.
>

Very likely. The thing is, pg_upgrade does multiple tasks, and only some
of those are parallelized - the pg_upgrade [1] docs say:

The --jobs option allows multiple CPU cores to be used for
copying/linking of files and to dump and restore database schemas
in parallel; a good place to start is the maximum of the number of
CPU cores and tablespaces. This option can dramatically reduce the
time to upgrade a multi-database server running on a multiprocessor
machine.

So it's only the "linking" of files that's done using multiple jobs:

...
Linking user relation files ok
...

But with 80 partitioned tables, and 9000 partitions for each, the most
expensive step is likely dump/restore of the schema. And those are not
parallelized - it always happens in a single process.

You can confirm this by watching the pg_upgrade output, to see which
steps take most of the time.

regards

--
Tomas Vondra

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Cameron Vogt 2024-08-14 21:59:53 Re: TLS session tickets disabled?
Previous Message PG Bug reporting form 2024-08-14 14:53:42 BUG #18583: jsonb_populate_record return values cannot be queried correctly in subselects