From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Sergei Kornilov <sk(at)zsrv(dot)org>, Donald Dong <xdong(at)csumb(dot)edu>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Postgres hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, "simon(at)2ndquadrant(dot)com" <simon(at)2ndquadrant(dot)com>, "ams(at)2ndquadrant(dot)com" <ams(at)2ndquadrant(dot)com>, "masao(dot)fujii(at)gmail(dot)com" <masao(dot)fujii(at)gmail(dot)com> |
Subject: | Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name |
Date: | 2019-01-11 23:10:07 |
Message-ID: | 20190111231007.GA24889@paquier.xyz |
Lists: | pgsql-hackers |
On Fri, Jan 11, 2019 at 12:52:08PM -0500, Robert Haas wrote:
> With the patch, the PrimaryConnInfo and PrimarySlotName arguments are
> removed from RequestXLogStreaming. That means that the new
> walreceiver could come to a different conclusion about the values of
> those arguments than the startup process. In particular, it could end
> up thinking that primary_conninfo is empty when, if the startup
> process had thought that, the walreceiver would never have been
> launched in the first place. But it's not obvious that you've added
> any logic in WALReceiverMain or elsewhere to compensate for this
> possibility -- what would happen in that case? Would we crash?
> Connect to the wrong server?
Looking at the patch again this morning, there is this bit:
@@ -291,32 +295,40 @@ WalReceiverMain(void)
 	/* Unblock signals (they were blocked when the postmaster forked us) */
 	PG_SETMASK(&UnBlockSig);
+	/*
+	 * Fail immediately if primary_conninfo goes missing, better safe than
+	 * sorry.
+	 */
+	if (PrimaryConnInfo == NULL || strcmp(PrimaryConnInfo, "") == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("cannot connect to the primary server as \"primary_conninfo\" is not defined")));
So the answer to your question is: the WAL receiver fails to start.
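For illustration only, the condition guarded above can be pulled out into a
standalone C sketch. The helper name conninfo_is_missing is hypothetical and
not part of the patch; in the patch the test is written inline and the
failure is reported through ereport(ERROR, ...).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: returns true when the
 * connection string is unset (NULL) or empty, which is the same condition
 * the hunk above checks before raising an error in WalReceiverMain().
 */
bool
conninfo_is_missing(const char *conninfo)
{
	return conninfo == NULL || strcmp(conninfo, "") == 0;
}
```

With this shape, both a NULL pointer and an empty string count as "not
defined", while any non-empty string such as "host=primary" passes the check
and is left for the connection attempt itself to validate.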
> I might be wrong here, but I'm inclined to think that this scenario
> hasn't really been contemplated carefully by the patch authors. There
> are no added TAP tests for the scenario where the values differ
> between the two processes, no code comments which explain why it's OK
> if that happens, really no mention of it in the patch at all. And on
> that basis I'm inclined to think that Andres is really quite correct
> to be worried about this. The problem he's talking about here is very
> low-probability because the race condition is narrow, but it's real,
> and it surely needs to be handled somehow.
primary_conninfo and primary_slot_name are PGC_POSTMASTER now (changeable
only with a server restart), so adding tests at this point doesn't really
make sense.
--
Michael