From: | Ilya Gladyshev <ilya(dot)v(dot)gladyshev(at)gmail(dot)com> |
---|---|
To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
Cc: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: optimizing pg_upgrade's once-in-each-database steps |
Date: | 2024-09-03 23:28:23 |
Message-ID: | 0fe740dd-cd23-45a6-b116-c777349acbf1@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 01.09.2024 22:05, Nathan Bossart wrote:
> I think we can actually just use PQstatus() here. But furthermore, I think
> the way I was initiating connections was completely bogus. IIUC before
> calling PQconnectPoll() the first time, we should wait for a write
> indicator from select(), and then we should only call PQconnectPoll() after
> subsequent indicators from select(). After spending quite a bit of time
> staring at the PQconnectPoll() code, I'm quite surprised I haven't seen any
> problems thus far. If I had to guess, I'd say that either PQconnectPoll()
> is more resilient than I think it is, or I've just gotten lucky because
> pg_upgrade establishes connections quickly.
Good catch, I didn't look so closely at this.
> Anyway, to fix this, I've added some more fields to the slot struct to
> keep track of the information we need to properly establish connections,
> and we now pay careful attention to the return value of select() so that we
> know which slots are ready for processing. This seemed like a nice little
> optimization independent of fixing connection establishment. I was worried
> this was going to require a lot more code, but I think it only added ~50
> lines or so.
The fix looks right to me, but I got confused by the skip_wait and this
`if`:
+ if (PQstatus(slot->conn) != CONNECTION_OK)
+ return;
This branch checks connection status that hasn't been refreshed after
the select. When we go back to wait_slots after this, PQconnectPoll will
refresh the connection status and run select with skip_wait=true, I
believe, we could simplify this by moving the PQconnectPoll back to the
process_slots, so that we can process connection right after polling, if
it's ready. Something like this:
diff --git a/src/bin/pg_upgrade/task.c b/src/bin/pg_upgrade/task.c
index 3618dc08ff..73e987febf 100644
--- a/src/bin/pg_upgrade/task.c
+++ b/src/bin/pg_upgrade/task.c
@@ -237,6 +237,8 @@ process_query_result(const ClusterInfo *cluster,
UpgradeTaskSlot *slot,
static void
process_slot(const ClusterInfo *cluster, UpgradeTaskSlot *slot, const
UpgradeTask *task)
{
+ PostgresPollingStatusType status;
+
if (!slot->ready)
return;
@@ -260,26 +262,26 @@ process_slot(const ClusterInfo *cluster,
UpgradeTaskSlot *slot, const UpgradeTas
start_conn(cluster, slot);
return;
-
case CONNECTING:
- /* Check for connection failure. */
- if (PQstatus(slot->conn) == CONNECTION_BAD)
- pg_fatal("connection failure: %s",
PQerrorMessage(slot->conn));
-
- /* Check whether the connection is still establishing. */
- if (PQstatus(slot->conn) != CONNECTION_OK)
- return;
-
- /*
- * Move on to running/processing the queries in the task.
- */
- slot->state = RUNNING_QUERIES;
- if (!PQsendQuery(slot->conn, task->queries->data))
+ status = PQconnectPoll(slot->conn);
+ if (status == PGRES_POLLING_READING)
+ slot->select_mode = true;
+ else if (status == PGRES_POLLING_WRITING)
+ slot->select_mode = false;
+ else if (status == PGRES_POLLING_FAILED)
pg_fatal("connection failure: %s",
PQerrorMessage(slot->conn));
+ else
+ {
+ /*
+ * Move on to running/processing the queries in the task.
+ */
+ slot->state = RUNNING_QUERIES;
+ if (!PQsendQuery(slot->conn, task->queries->data))
+ pg_fatal("connection failure: %s",
PQerrorMessage(slot->conn));
+ }
return;
-
case RUNNING_QUERIES:
/*
@@ -370,8 +372,6 @@ wait_on_slots(UpgradeTaskSlot *slots, int numslots)
for (int i = 0; i < numslots; i++)
{
- PostgresPollingStatusType status;
-
switch (slots[i].state)
{
case FREE:
@@ -386,33 +386,7 @@ wait_on_slots(UpgradeTaskSlot *slots, int numslots)
continue;
case CONNECTING:
-
- /*
- * Don't call PQconnectPoll() again for this slot until
- * select() tells us something is ready. Be sure to
use the
- * previous poll mode in this case.
- */
- if (!slots[i].ready)
- break;
-
- /*
- * If we are waiting for the connection to establish,
choose
- * whether to wait for reading or for writing on the
socket as
- * appropriate. If neither apply, mark the slot as
ready and
- * skip waiting so that it is handled ASAP (we assume this
- * means the connection is either bad or fully ready).
- */
- status = PQconnectPoll(slots[i].conn);
- if (status == PGRES_POLLING_READING)
- slots[i].select_mode = true;
- else if (status == PGRES_POLLING_WRITING)
- slots[i].select_mode = false;
- else
- {
- slots[i].ready = true;
- skip_wait = true;
- continue;
- }
+ /* All the slot metadata was already setup in
process_slots() */
break;
--
2.43.0
skip_wait can be removed in this case as well.
This is up to you, I think the v12 is good and commitable in any case.
From | Date | Subject | |
---|---|---|---|
Next Message | Tatsuo Ishii | 2024-09-04 00:06:57 | Re: Add memory/disk usage for WindowAgg nodes in EXPLAIN |
Previous Message | Peter Smith | 2024-09-03 23:17:15 | Re: GUC names in messages |