Re: Commit to primary with unavailable sync standby

From: Maksim Milyutin <milyutinma(at)gmail(dot)com>
To: f(dot)venchiarutti(at)ocado(dot)com, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Commit to primary with unavailable sync standby
Date: 2019-12-23 16:36:49
Message-ID: 13d29c87-9864-8298-45b1-0684a69746a4@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On 19.12.2019 18:08, Fabio Ugo Venchiarutti wrote:
>
>
> On 19/12/2019 13:58, Maksim Milyutin wrote:
>> On 19.12.2019 14:04, Andrey Borodin wrote:
>>
>>> Hi!
>>
>>
>> Hi!
>>
>> FYI, this topic was up recently in -hackers
>> https://www.postgresql.org/message-id/CAEET0ZHG5oFF7iEcbY6TZadh1mosLmfz1HLm311P9VOt7Z+jeg@mail.gmail.com
>>
>>
>>
>>> I cannot figure out proper way to implement safe HA upsert. I will
>>> be very grateful if someone would help me.
>>>
>>> Imagine we have primary server after failover. It is
>>> network-partitioned. We are doing INSERT ON CONFLICT DO NOTHING;
>>> that eventually timed out.
>>>
>>> az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
>>>      INSERT INTO t(
>>>          pk,
>>>          v,
>>>          dt
>>>      )
>>>      VALUES
>>>      (
>>>          5,
>>>          'text',
>>>          now()
>>>      )
>>>      ON CONFLICT (pk) DO NOTHING
>>>      RETURNING pk,
>>>                v,
>>>                dt)
>>>     SELECT new_doc.pk from new_doc;
>>> ^CCancel request sent
>>> WARNING:  01000: canceling wait for synchronous replication due to
>>> user request
>>> DETAIL:  The transaction has already committed locally, but might
>>> not have been replicated to the standby.
>>> LOCATION:  SyncRepWaitForLSN, syncrep.c:264
>>> Time: 2173.770 ms (00:02.174)
>>>
>>> Here our driver decided that something goes wrong and we retry query.
>>>
>>> az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
>>>      INSERT INTO t(
>>>          pk,
>>>          v,
>>>          dt
>>>      )
>>>      VALUES
>>>      (
>>>          5,
>>>          'text',
>>>          now()
>>>      )
>>>      ON CONFLICT (pk) DO NOTHING
>>>      RETURNING pk,
>>>                v,
>>>                dt)
>>>     SELECT new_doc.pk from new_doc;
>>>   pk
>>> ----
>>> (0 rows)
>>>
>>> Time: 4.785 ms
>>>
>>> Now we have split-brain, because we acknowledged that row to client.
>>> How can I fix this?
>>>
>>> There must be some obvious trick, but I cannot see it... Or maybe
>>> cancel of sync replication should be disallowed and termination
>>> should be treated as system failure?
>>>
>>
>> I think the most appropriate way to handle such issues is to catch by
>> client driver such warnings (with message about local commit) and
>> mark the status of posted transaction as undetermined. If connection
>> with sync replica will come back then this transaction eventually
>> commits but after triggering of autofailover and *not replicating
>> this commit to replica* this commit aborts. Therefore client have to
>> wait some time (that exceeds the duration of autofailover) and check
>> (logically based on committed data) the status of commit.
>>
>> The problem here is the locally committed data becomes visible to
>> future transactions (before autofailover) that violates the property
>> of consistent reading from master. IMO the more correct behavior for
>> PostgreSQL here is to ignore any cancel / termination queries when
>> backend is in status of waiting response from sync replicas.
>>
>> However, there is another way to get locally applied commits via
>> restart of master after initial recovery. This case is described in
>> doc
>> https://www.postgresql.org/docs/current/warm-standby.html#SYNCHRONOUS-REPLICATION-HA
>> . But here HA orchestrator agent can close access from external users
>> (via pg_hba.conf manipulations) until PostgreSQL instance synchronizes
>
>
> And this is where the unsafety lies: that assumes that the isolated
> master is in enough of a sane state to apply a self-ban (and that can
> do it in near-zero time).
>
>
> Although the retry logic in Andrey's case is probably not ideal (and
> you offered a more correct approach to synchronous commit), there are
> many "grey area" failure modes that in his scenario would either
> prevent a given node from sealing up fast enuogh if at all (eg: PID
> congestion causing fork()/system() to fail while backends are already
> up and happily flushing WAL).
>
>
> This is particularly relevant to situations when only a subset of
> critical transactions set synchronous_commit to remote_*: it'd still
> be undesirable to sink "tier 2" data in a stale primary for any
> significant length of time).

Could you more concrete describe your thesis? In my proposal the
self-ban to master is applied after restarting one so that changes from
locally committed transactions was not visible for new incoming
transactions.

> In the case of postgres (or any RDBMS, really), all I can think of is
> either an inline proxy performing some validation as part of the
> forwarding (which is what we did internally but that has not been
> green lit for FOSS :( )

External validation unfortunately is not option here. AIMB the local
commits become visible to future transactions coming to master and even
if some proxy reports to client that transaction is not committed
completely, new incoming transactions reading locally applied changes
and making its changes based on these ones implicitly confirms the
status of these changes as committed.

> or some logic in the backend that rejects asynchronous commits too if
> some condition is not met (eg: <quorum - 1> synchronous standby nodes
> not present - a builtin version of the pg_stat_replication look-aside
> CTE I suggested earlier).

CTE with sub-query using pg_stat_replication is not option too. The view
pg_stat_replication is in fact shows the stale info about statuses of
replicas and is formed from statuses of wal_sender processes. That is
when replica loses contact with master then at most wal_sender_timeout
master will see this replica in pg_stat_replication without any changes
of row attributes, so local commits also are capable to overslip inside
this timeout.

--
Best regards,
Maksim Milyutin

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Alban Hertroys 2019-12-23 16:46:28 Re: SQL operator '*='
Previous Message Tom Lane 2019-12-23 16:16:58 Re: Mixing greediness in regexp_matches