Re: Commits don't block for synchronous replication

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Xin Zhang <xzhang(at)pivotal(dot)io>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Ashwin Agrawal <aagrawal(at)pivotal(dot)io>, Asim Praveen <apraveen(at)pivotal(dot)io>
Subject: Re: Commits don't block for synchronous replication
Date: 2017-09-19 05:26:46
Message-ID: CAB7nPqSuutZAsi7RddX4yhtS7tJLAxCrg2_iMgJPoXYhqRVgNw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 19, 2017 at 7:50 AM, Xin Zhang <xzhang(at)pivotal(dot)io> wrote:
> If primary crashed at that moment, and failover to standby, the foo table is
> lost, even though the replication is synced according to
> `pg_stat_replication` view.

GUC parameters are reloaded each time a query is run, and so
SyncRepConfig is filled with the parsed data of SyncRepStandbyNames
once the parameter is reloaded for the process. Still, here, a commit
is waiting for a signal from a WAL sender that the wanted LSN has been
correctly flushed on a standby, so this code path does not care about
the state of SyncRepConfig saved in the context of the process. What
we want to know is what the checkpointer thinks about it. Hence using
WAL sender data or sync_standbys_defined as a source of truth looks
like a correct concept to me, which makes this bug report legitimate.
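
To make the distinction concrete, here is a toy model in Python. It is
not PostgreSQL code: the class and function names are made up for
illustration, and SyncRepRequested() is reduced to a check on
max_wal_senders plus a boolean standing in for synchronous_commit. It
only shows how a backend with a stale process-local copy of the standby
names can decide not to wait, while the shared flag already says that
synchronous standbys are defined.

```python
from dataclasses import dataclass


@dataclass
class SharedMemory:
    # Stand-in for the flag the checkpointer maintains in shared memory.
    sync_standbys_defined: bool = False


@dataclass
class Backend:
    # Process-local GUC copies; they stay stale until the backend reloads.
    synchronous_standby_names: str = ""
    max_wal_senders: int = 10
    wants_remote_commit: bool = True  # simplification of synchronous_commit


def sync_rep_requested(backend):
    # Simplified analogue of SyncRepRequested().
    return backend.max_wal_senders > 0 and backend.wants_remote_commit


def must_wait_using_local_config(backend):
    # Decision based on the backend's own, possibly stale, configuration.
    return sync_rep_requested(backend) and backend.synchronous_standby_names != ""


def must_wait_using_shared_flag(backend, shared):
    # Decision based on what the checkpointer published in shared memory.
    return sync_rep_requested(backend) and shared.sync_standbys_defined


# The checkpointer has already processed the reload; this backend has not.
shared = SharedMemory(sync_standbys_defined=True)
stale_backend = Backend(synchronous_standby_names="")

print(must_wait_using_local_config(stale_backend))
print(must_wait_using_shared_flag(stale_backend, shared))
```

In this model the local check lets the commit return without waiting,
which is the behavior reported at the top of the thread, while the
shared-flag check makes it wait.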

The check with SyncRepRequested() still holds true: max_wal_senders
needs a restart to be updated. Also, the other caller of
SyncStandbysDefined() requires SyncRepConfig to be set, so this caller
is fine.

I have looked at your patch and tested it, and found no problems
associated with it. A backpatch would be required, so I have added an
entry in the next commit fest with status set to "ready for committer"
so that this bug does not fall through the cracks.

> A separate question, is the `pg_stat_replication` view the reliable way to
> find when to failover to a standby, or there are some other ways to ensure
> the standby is in-sync with the primary?

It shows at SQL level what is currently present in shared memory by
scanning all the WAL sender entries. This report uses the same data
as the backends themselves, so that's a reliable source. In Postgres
10, pg_stat_activity is also able to show users which backends are
waiting for a change to be flushed/applied on the standby, using the
wait event called SyncRep. You could make some use of that as well.
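
As a rough sketch of how both views could be used from a monitoring
script, here is some Python. The two query strings use the Postgres 10
column names. The helper names, the sample standby names and the idea
of comparing flush_lsn against a target LSN are my own assumptions for
illustration, not something the server provides. The rows are plain
dicts, so fetching them with a driver is left out.

```python
# Queries a monitoring script might run on the primary (Postgres 10 names).
REPLICATION_QUERY = (
    "SELECT application_name, state, sync_state, flush_lsn "
    "FROM pg_stat_replication"
)
WAITING_BACKENDS_QUERY = (
    "SELECT pid, query FROM pg_stat_activity WHERE wait_event = 'SyncRep'"
)


def parse_lsn(text):
    # An LSN is printed as two hexadecimal halves, "high/low".
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def caught_up_sync_standbys(rows, target_lsn):
    # Return names of streaming synchronous standbys that have flushed
    # WAL at least up to target_lsn.
    target = parse_lsn(target_lsn)
    return [
        row["application_name"]
        for row in rows
        if row["state"] == "streaming"
        and row["sync_state"] in ("sync", "quorum")
        and row["flush_lsn"] is not None
        and parse_lsn(row["flush_lsn"]) >= target
    ]


# Hypothetical rows, shaped like the result of REPLICATION_QUERY.
sample_rows = [
    {"application_name": "standby1", "state": "streaming",
     "sync_state": "sync", "flush_lsn": "0/3000060"},
    {"application_name": "standby2", "state": "streaming",
     "sync_state": "async", "flush_lsn": "0/3000060"},
    {"application_name": "standby3", "state": "streaming",
     "sync_state": "sync", "flush_lsn": "0/2FFFFFF"},
]

print(caught_up_sync_standbys(sample_rows, "0/3000060"))
```

Only the first sample standby qualifies here: the second one is
asynchronous and the third one has not flushed far enough.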
--
Michael
