Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Ian Barwick <ian(dot)barwick(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions
Date: 2019-06-14 16:08:35
Message-ID: 20190614160835.GF2480@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

* Ian Barwick (ian(dot)barwick(at)2ndquadrant(dot)com) wrote:
> Consider the following cascading standby setup with PostgreSQL 12:
>
> - there exists a running primary "A"
> - standby "B" is cloned from primary "A" using "pg_basebackup --write-recovery-conf"
> - standby "C" is cloned from standby "B" using "pg_basebackup --write-recovery-conf"
>
> So far, so good, everything works as expected.
>
> Now, for whatever reason, the user wishes standby "C" to follow another upstream
> node (which one is not important here), so the user, in the comfort of their own psql
> command line (no more pesky recovery.conf editing!) issues the following:
>
> ALTER SYSTEM SET primary_conninfo = 'host=someothernode';
>
> and restarts the instance, and... it stays connected to the original upstream node.
>
> Which is unexpected.
>
> Examining the the restarted instance, "SHOW primary_conninfo" still displays
> the original value for "primary_conninfo". Mysteriouser and mysteriouser.
>
> What has happened here is that with the option "--write-recovery-conf", Pg12's
> pg_basebackup (correctly IMHO) appends the recovery settings to "postgresql.auto.conf".
>
> However, on standby "C", pg_basebackup has dutifully copied over standby "B"'s
> existing "postgresql.auto.conf", which already contains standby "B"'s
> replication configuration, and appended standby "C"'s replication configuration
> to that, which (before "ALTER SYSTEM" was invoked) looked something like this:
>
> # Do not edit this file manually!
> # It will be overwritten by the ALTER SYSTEM command.
> primary_conninfo = 'host=node_A'
> primary_conninfo = 'host=node_B'
>
> which is expected, and works because the last entry in the file is evaluated, so
> on startup, standby "C" follows standby "B".
>
> However, executing "ALTER SYSTEM SET primary_conninfo = 'host=someothernode'" left
> standby "C"'s "postgresql.auto.conf" file looking like this:
>
> # Do not edit this file manually!
> # It will be overwritten by the ALTER SYSTEM command.
> primary_conninfo = 'host=someothernode'
> primary_conninfo = 'host=node_B'
>
> which seems somewhat broken, to say the least.

Yes, it's completely broken, which I've complained about at least twice
on this list to no avail.

Thanks for putting together an example case pointing out why it's a
serious issue. The right thing to do here it so create an open item for
PG12 around this.

> As-is, the user will either need to repeatedly issue "ALTER SYSTEM RESET primary_conninfo"
> until the duplicates are cleared (which means "ALTER SYSTEM RESET ..." doesn't work as
> advertised, and is not an obvious solution anyway), or ignore the "Do not edit this file manually!"
> warning and remove the offending entry/entries (which, if done safely, should involve
> shutting down the instance first).
>
> Note this issue is not specific to pg_basebackup, primary_conninfo (or any other settings
> formerly in recovery.conf), it has just manifested itself as the built-in toolset as of now
> provides a handy way of getting into this situation without too much effort (and any
> utilities which clone standbys and append the replication configuration to
> "postgresql.auto.conf" in lieu of creating "recovery.conf" will be prone to running into
> the same situation).

This is absolutely the fault of the system for putting in multiple
entries into the auto.conf, which it wasn't ever written to handle.

> I had previously always assumed that ALTER SYSTEM would change the *last* occurrence for
> the parameter in "postgresql.auto.conf", in the same way you'd need to be sure to change
> the last occurrence in the normal configuration files, however this actually not the case -
> as per replace_auto_config_value() ( src/backend/utils/misc/guc.c ):
>
> /* Search the list for an existing match (we assume there's only one) */
>
> the *first* match is replaced.
>
> Attached patch attempts to rectify this situation by having replace_auto_config_value()
> deleting any duplicate entries first, before making any changes to the last entry.

While this might be a good belt-and-suspenders kind of change to
include, I don't think pg_basebackup should be causing us to have
multiple entries in the file in the first place..

> Arguably it might be sufficient (and simpler) to just scan the list for the last
> entry, but removing preceding duplicates seems cleaner, as it's pointless
> (and a potential source of confusion) keeping entries around which will never be used.

I don't think we should only change the last entry, that seems like a
really bad idea. I agree that we should clean up the file if we come
across it being invalid.

> Also attached is a set of TAP tests to check ALTER SYSTEM works as expected (or
> at least as seems correct to me).

In my view, at least, we should have a similar test for pg_basebackup to
make sure that it doesn't create an invalid .auto.conf file.

Thanks!

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Li, Zheng 2019-06-14 16:35:21 Re: Converting NOT IN to anti-joins during planning
Previous Message Tom Lane 2019-06-14 15:40:36 Re: Allow table AM's to cache stuff in relcache