Re: optimizing pg_upgrade's once-in-each-database steps

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Ilya Gladyshev <ilya(dot)v(dot)gladyshev(at)gmail(dot)com>
Cc: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: optimizing pg_upgrade's once-in-each-database steps
Date: 2024-09-01 21:05:21
Message-ID: ZtTXEexwmlvsgsPs@nathan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Aug 31, 2024 at 01:18:10AM +0100, Ilya Gladyshev wrote:
> LGTM in general, but here are some final nitpicks:

Thanks for reviewing.

> + if (maxFd != 0)
> + (void) select(maxFd + 1, &input_mask, &output_mask, &except_mask, NULL);
>
> It´s a good idea to check for the return value of select, in case it
> returns any errors.

Done.

> + dbs_complete++;
> + (void) PQgetResult(slot->conn);
> + PQfinish(slot->conn);
>
> Perhaps it´s rather for me to understand, what does PQgetResult call do
> here?

I believe I was trying to follow the guidance that you should always call
PQgetResult() until it returns NULL, but looking again, I don't see any
need to call it since we free the connection immediately afterwards.

> + /* Check for connection failure. */
> + if (PQconnectPoll(slot->conn) == PGRES_POLLING_FAILED)
> + pg_fatal("connection failure: %s", PQerrorMessage(slot->conn));
> +
> + /* Check whether the connection is still establishing. */
> + if (PQconnectPoll(slot->conn) != PGRES_POLLING_OK)
> + return;
>
> Are the two consecutive calls of PQconnectPoll intended here? Seems a bit
> wasteful, but maybe that´s ok.

I think we can actually just use PQstatus() here. But furthermore, I think
the way I was initiating connections was completely bogus. IIUC before
calling PQconnectPoll() the first time, we should wait for a write
indicator from select(), and then we should only call PQconnectPoll() after
subsequent indicators from select(). After spending quite a bit of time
staring at the PQconnectPoll() code, I'm quite surprised I haven't seen any
problems thus far. If I had to guess, I'd say that either PQconnectPoll()
is more resilient than I think it is, or I've just gotten lucky because
pg_upgrade establishes connections quickly.

Anyway, to fix this, I've added some more fields to the slot struct to
keep track of the information we need to properly establish connections,
and we now pay careful attention to the return value of select() so that we
know which slots are ready for processing. This seemed like a nice little
optimization independent of fixing connection establishment. I was worried
this was going to require a lot more code, but I think it only added ~50
lines or so.

> We should probably mention this change in the docs as well, I found these
> two places that I think can be improved:

I've adjusted the documentation in v11.

--
nathan

Attachment Content-Type Size
v11-0001-Introduce-framework-for-parallelizing-various-pg.patch text/plain 18.6 KB
v11-0002-Use-pg_upgrade-s-new-parallel-framework-for-subs.patch text/plain 9.2 KB
v11-0003-Use-pg_upgrade-s-new-parallel-framework-to-get-r.patch text/plain 15.7 KB
v11-0004-Use-pg_upgrade-s-new-parallel-framework-to-get-l.patch text/plain 3.8 KB
v11-0005-Use-pg_upgrade-s-new-parallel-framework-for-exte.patch text/plain 4.2 KB
v11-0006-Use-pg_upgrade-s-new-parallel-framework-for-data.patch text/plain 13.5 KB
v11-0007-Use-pg_upgrade-s-new-parallel-framework-for-isn-.patch text/plain 4.7 KB
v11-0008-Use-pg_upgrade-s-new-parallel-framework-for-post.patch text/plain 6.3 KB
v11-0009-Use-pg_upgrade-s-new-parallel-framework-for-poly.patch text/plain 7.5 KB
v11-0010-Use-pg_upgrade-s-new-parallel-framework-for-WITH.patch text/plain 4.4 KB
v11-0011-Use-pg_upgrade-s-parallel-framework-for-encoding.patch text/plain 5.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michail Nikolaev 2024-09-01 21:19:00 Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Previous Message Tomas Vondra 2024-09-01 19:30:27 Re: scalability bottlenecks with (many) partitions (and more)