Re: Race condition with restore_command on streaming replica

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Brad Nicholson <bradn(at)ca(dot)ibm(dot)com>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Race condition with restore_command on streaming replica
Date: 2020-11-15 09:47:12
Message-ID: CAFiTN-t7-1TOF-x2bc1h1CMyVqzJxKhzcMgsQxWqxAi2nN0gdw@mail.gmail.com
Lists: pgsql-general

On Thu, Nov 5, 2020 at 1:39 AM Brad Nicholson <bradn(at)ca(dot)ibm(dot)com> wrote:
>
> Hi,
>
> I've recently been seeing a race condition with the restore_command on replicas using streaming replication.
>
> On the primary, we are archiving wal files to s3 compatible storage via pgBackRest. In the recovery.conf section of the postgresql.conf file on the replicas, we define the restore command as follows:
>
> restore_command = '/usr/bin/pgbackrest --config /conf/postgresql/pgbackrest_restore.conf --stanza=formation archive-get %f "%p"'
>
> We have a three member setup - m-0, m-1, m-2. Consider the case where m-0 is the Primary and m-1 and m-2 are replicas connected to the m-0.
>
> When issuing a switchover (via Patroni) from m-0 to m-1, the connection from m-2 to m-0 is terminated. The restore_command on m-2 is run, and it looks for the .history file for the new timeline. If this happens before the history file is created and pushed to the archive, m-2 will look for the next wal file on the existing timeline in the archive. It will never be created as the source has moved on, so this m-2 hangs waiting on that file. The restore_command on the replica looking for this non-existent file is only run once. This seems like an odd state to be in. The replica is waiting on a new file, but it's not actually looking for it. Is this expected, or should the restore_command be polling for that file?

I am not sure how Patroni handles this internally; can you explain the
scenario in more detail? Suppose you run the promote on m-1. If the
promotion succeeds, m-1 switches to a new timeline and creates the
timeline history file. If we change primary_conninfo on m-2 only after
that, m-2 restarts its walreceiver and looks for the latest .history
file, which it should find either through direct streaming or through
the restore_command. If you are saying that m-2 looked for the history
file before m-1 had created it, then it seems primary_conninfo on m-2
was changed before the promotion of m-1 had completed.
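
To make the ordering concrete, here is a rough sketch of the sequencing
I have in mind: wait until the new timeline's history file is visible,
and only then repoint m-2. This is not how Patroni or pgBackRest work
internally. The names wait_until, repoint_when_ready, history_available
and repoint are hypothetical placeholders; the only PostgreSQL fact used
is that timeline history files are named with the timeline ID as eight
hex digits (for example 00000002.history).

```python
import time
from typing import Callable


def history_file_name(timeline_id: int) -> str:
    """Return the history file name for a timeline, e.g. 2 -> '00000002.history'."""
    return f"{timeline_id:08X}.history"


def wait_until(check: Callable[[], bool],
               timeout: float = 30.0,
               interval: float = 1.0) -> bool:
    """Poll check() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def repoint_when_ready(new_timeline: int,
                       history_available: Callable[[str], bool],
                       repoint: Callable[[], None],
                       timeout: float = 30.0,
                       interval: float = 1.0) -> bool:
    """Repoint the replica only after the new timeline's history file exists.

    history_available(name) is a hypothetical callback that reports whether
    the named history file can be fetched (from the archive or the new
    primary). repoint() is a hypothetical callback that changes
    primary_conninfo on the replica. Returns False if the file never
    appeared within the timeout, in which case repoint() is not called.
    """
    name = history_file_name(new_timeline)
    if wait_until(lambda: history_available(name), timeout, interval):
        repoint()
        return True
    return False
```

If the history file never shows up within the timeout, the sketch
leaves m-2 untouched instead of repointing it, which avoids the state
where m-2 waits for a WAL file on the old timeline.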

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
