Race condition with restore_command on streaming replica

From: "Brad Nicholson" <bradn(at)ca(dot)ibm(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Race condition with restore_command on streaming replica
Date: 2020-11-04 20:09:40
Message-ID: OFD21F6F73.D0A0B069-ON85258616.006BFBC1-85258616.006EBFE1@notes.na.collabserv.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

I've recently been seeing a race condition with the restore_command on
replicas using streaming replication.

On the primary, we are archiving wal files to s3 compatible storage via
pgBackRest. In the recovery.conf section of the postgresql.conf file on the
replicas, we define the restore command as follows:

restore_command = '/usr/bin/pgbackrest
--config /conf/postgresql/pgbackrest_restore.conf --stanza=formation
archive-get %f "%p"'

We have a three member setup - m-0, m-1, m-2. Consider the case where m-0
is the Primary and m-1 and m-2 are replicas connected to the m-0.

When issuing a switchover (via Patroni) from m-0 to m-1, the connection
from m-2 to m-0 is terminated. The restore_command on m-2 is run, and it
looks for the .history file for the new timeline. If this happens before
the history file is created and pushed to the archive, m-2 will look for
the next wal file on the existing timeline in the archive. It will never
be created as the source has moved on, so this m-2 hangs waiting on that
file. The restore_command on the replica looking for this non-existent
file is only run once. This seems like an odd state to be in. The replica
is waiting on a new file, but it's not actually looking for it. Is this
expected, or should the restore_command be polling for that file?

Restarting m-2 causes it to reconnect, find the .history file in the
archive and replication picks up. But that is less than ideal.

This is Postgres 12.4, managed by Patroni. I reported this to Patroni
first for feedback, and they pointed to it be being a Postgres issue - not
a Patroni one.

Here is a summarized timeline of events from one instance, cluster is on
Time Line 9

2020-11-04 14:51:11 m-0 is shutdown
2020-11-04 14:51:13.015 m-2 replication connection is terminated
2020-11-04 14:51:13.015 m-2 looks for 0000000A.history in archive, does not
find it
2020-11-04 14:51:17,137 m-2 looks for 0000000900000000000000B6 in archive
(this file won't ever exist)
2020-11-04 14:51:20,177 m-1 pushes 0000000A.history to archive

At this point, m-2 is stuck until reboot

There is a detailed bug report on the Patroni list with all the logs if
relevant:
https://github.com/zalando/patroni/issues/1747#issue-736244675

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Steve Singer 2020-11-04 23:29:18 Christopher Browne
Previous Message Adrian Klaver 2020-11-04 16:42:54 Re: PANIC: could not write to log file {} at offset {}, length {}: Invalid argument