From: | "Euler Taveira" <euler(at)eulerto(dot)com> |
---|---|
To: | "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Move walreceiver state assignment (to WALRCV_STREAMING) in WalReceiverMain() |
Date: | 2023-12-12 19:54:32 |
Message-ID: | 3abb8c54-11de-4220-ad65-c2785df01124@app.fastmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Dec 12, 2023, at 12:58 PM, Drouvot, Bertrand wrote:
> Currently walrcv->walRcvState is set to WALRCV_STREAMING at the
> beginning of WalReceiverMain().
>
> But it seems that after this assignment things could be wrong before the
> walreicever actually starts streaming (like not being able to connect
> to the primary).
>
> It looks to me that WALRCV_STREAMING should be set once walrcv_startstreaming()
> returns true: this is the proposal of this patch.
Per the state name (streaming), it seems it should be set later as you
proposed, however, I'm concerned about the previous state (WALRCV_STARTING). If
I'm reading the code correctly, WALRCV_STARTING is assigned at
RequestXLogStreaming():
SetInstallXLogFileSegmentActive();
RequestXLogStreaming(tli, ptr, PrimaryConnInfo,
PrimarySlotName,
wal_receiver_create_temp_slot);
flushedUpto = 0;
}
/*
* Check if WAL receiver is active or wait to start up.
*/
if (!WalRcvStreaming())
{
lastSourceFailed = true;
break;
}
After a few lines the function WalRcvStreaming() has:
SpinLockRelease(&walrcv->mutex);
/*
* If it has taken too long for walreceiver to start up, give up. Setting
* the state to STOPPED ensures that if walreceiver later does start up
* after all, it will see that it's not supposed to be running and die
* without doing anything.
*/
if (state == WALRCV_STARTING)
{
pg_time_t now = (pg_time_t) time(NULL);
if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
{
bool stopped = false;
SpinLockAcquire(&walrcv->mutex);
if (walrcv->walRcvState == WALRCV_STARTING)
{
state = walrcv->walRcvState = WALRCV_STOPPED;
stopped = true;
}
SpinLockRelease(&walrcv->mutex);
if (stopped)
ConditionVariableBroadcast(&walrcv->walRcvStoppedCV);
}
}
Couldn't it give up before starting if you apply your patch? My main concern is
due to a slow system, the walrcv_connect() took to long in WalReceiverMain()
and the code above kills the walreceiver while in the process to start it.
Since you cannot control the hardcoded WALRCV_STARTUP_TIMEOUT value, you might
have issues during overload periods.
--
Euler Taveira
EDB https://www.enterprisedb.com/
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2023-12-12 20:11:31 | Re: Report planning memory in EXPLAIN ANALYZE |
Previous Message | Michael Paquier | 2023-12-12 16:06:51 | pgsql: Prevent tuples to be marked as dead in subtransactions on standb |