Re: optimizing pg_upgrade's once-in-each-database steps

From: Ilya Gladyshev <ilya(dot)v(dot)gladyshev(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: optimizing pg_upgrade's once-in-each-database steps
Date: 2024-08-31 00:18:10
Message-ID: 3A3553B1-123D-4F11-9426-277F2E920AD7@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

LGTM in general, but here are some final nitpicks:

+ if (maxFd != 0)
+ (void) select(maxFd + 1, &input_mask, &output_mask, &except_mask, NULL);

It’s a good idea to check for the return value of select, in case it returns any errors.

+ dbs_complete++;
+ (void) PQgetResult(slot->conn);
+ PQfinish(slot->conn);

Perhaps it’s rather for me to understand, what does PQgetResult call do here?

+ /* Check for connection failure. */
+ if (PQconnectPoll(slot->conn) == PGRES_POLLING_FAILED)
+ pg_fatal("connection failure: %s", PQerrorMessage(slot->conn));
+
+ /* Check whether the connection is still establishing. */
+ if (PQconnectPoll(slot->conn) != PGRES_POLLING_OK)
+ return;

Are the two consecutive calls of PQconnectPoll intended here? Seems a bit wasteful, but maybe that’s ok.

We should probably mention this change in the docs as well, I found these two places that I think can be improved:

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 9877f2f01c..ad7aa33f07 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -118,7 +118,7 @@ PostgreSQL documentation
<varlistentry>
<term><option>-j <replaceable class="parameter">njobs</replaceable></option></term>
<term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
- <listitem><para>number of simultaneous processes or threads to use
+ <listitem><para>number of simultaneous processes, threads or connections to use
</para></listitem>
</varlistentry>

@@ -587,8 +587,9 @@ NET STOP postgresql-&majorversion;

<para>
The <option>--jobs</option> option allows multiple CPU cores to be used
- for copying/linking of files and to dump and restore database schemas
- in parallel; a good place to start is the maximum of the number of
+ for copying/linking of files, to dump and restore database schemas in
+ parallel and to use multiple simultaneous connections for upgrade checks;
+ a good place to start is the maximum of the number of
CPU cores and tablespaces. This option can dramatically reduce the
time to upgrade a multi-database server running on a multiprocessor
machine.
--

> 28 авг. 2024 г., в 22:46, Nathan Bossart <nathandbossart(at)gmail(dot)com> написал(а):
>
> I spent some time cleaning up names, comments, etc. Barring additional
> feedback, I'm planning to commit this stuff in the September commitfest so
> that it has plenty of time to bake in the buildfarm.
>
> --
> nathan
> <v10-0001-Introduce-framework-for-parallelizing-various-pg.patch><v10-0002-Use-pg_upgrade-s-new-parallel-framework-for-subs.patch><v10-0003-Use-pg_upgrade-s-new-parallel-framework-to-get-r.patch><v10-0004-Use-pg_upgrade-s-new-parallel-framework-to-get-l.patch><v10-0005-Use-pg_upgrade-s-new-parallel-framework-for-exte.patch><v10-0006-Use-pg_upgrade-s-new-parallel-framework-for-data.patch><v10-0007-Use-pg_upgrade-s-new-parallel-framework-for-isn-.patch><v10-0008-Use-pg_upgrade-s-new-parallel-framework-for-post.patch><v10-0009-Use-pg_upgrade-s-new-parallel-framework-for-poly.patch><v10-0010-Use-pg_upgrade-s-new-parallel-framework-for-WITH.patch><v10-0011-Use-pg_upgrade-s-parallel-framework-for-encoding.patch>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2024-08-31 01:07:11 Re: Inval reliability, especially for inplace updates
Previous Message Thomas Munro 2024-08-31 00:12:23 Re: Interrupts vs signals