Commit to primary with unavailable sync standby

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Commit to primary with unavailable sync standby
Date: 2019-12-19 11:04:37
Message-ID: B70260F9-D0EC-438D-9A59-31CB996B320A@yandex-team.ru
Lists: pgsql-general

Hi!

I cannot figure out the proper way to implement a safe HA upsert. I would be very grateful if someone could help me.

Imagine we have a primary server after a failover, and it is network-partitioned. We run an INSERT ... ON CONFLICT DO NOTHING that eventually times out.

az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
    INSERT INTO t (pk, v, dt)
    VALUES (5, 'text', now())
    ON CONFLICT (pk) DO NOTHING
    RETURNING pk, v, dt
)
SELECT new_doc.pk FROM new_doc;
^CCancel request sent
WARNING: 01000: canceling wait for synchronous replication due to user request
DETAIL: The transaction has already committed locally, but might not have been replicated to the standby.
LOCATION: SyncRepWaitForLSN, syncrep.c:264
Time: 2173.770 ms (00:02.174)

At this point our driver decided that something had gone wrong, so we retried the query.

az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
    INSERT INTO t (pk, v, dt)
    VALUES (5, 'text', now())
    ON CONFLICT (pk) DO NOTHING
    RETURNING pk, v, dt
)
SELECT new_doc.pk FROM new_doc;
pk
----
(0 rows)

Time: 4.785 ms

Now we have a split-brain, because the retry returned without error and we acknowledged that row to the client, although it may never have reached the standby.
How can I fix this?
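
To make the sequence above concrete, here is a minimal toy model in Python. It is only a sketch of the anomaly, not a fix and not PostgreSQL code: the names ToyPrimary, SyncWaitCancelled and upsert_with_retry are hypothetical, and the exception merely stands in for the driver treating the cancelled wait as a failure.

```python
# Toy model only: every name here is hypothetical and no PostgreSQL API is used.

class SyncWaitCancelled(Exception):
    """Stands in for the driver giving up while waiting for the sync standby."""


class ToyPrimary:
    def __init__(self, standby_reachable):
        self.standby_reachable = standby_reachable
        self.local_rows = {}    # rows committed locally on the primary
        self.standby_rows = {}  # rows confirmed by the synchronous standby

    def insert_do_nothing(self, pk, v):
        """Mimic INSERT ... ON CONFLICT (pk) DO NOTHING RETURNING pk."""
        if pk in self.local_rows:
            return []           # conflict: zero rows, but no error either
        self.local_rows[pk] = v  # the local commit happens first
        if not self.standby_reachable:
            # The wait is cancelled, yet the row stays visible locally.
            raise SyncWaitCancelled("committed locally, not replicated")
        self.standby_rows[pk] = v
        return [pk]


def upsert_with_retry(primary, pk, v, attempts=2):
    """Naive driver: retry on failure, treat any clean return as success."""
    for _ in range(attempts):
        try:
            return primary.insert_do_nothing(pk, v)
        except SyncWaitCancelled:
            continue
    raise RuntimeError("gave up")


primary = ToyPrimary(standby_reachable=False)
result = upsert_with_retry(primary, 5, "text")
print(result)                     # [] (the retry reports success with zero rows)
print(5 in primary.local_rows)    # True (the row exists on the partitioned primary)
print(5 in primary.standby_rows)  # False (the row was never replicated)
```

In this model the second attempt cannot tell "the row already existed and is safely replicated" apart from "my own earlier attempt committed it locally only", which is exactly the ambiguity in the session above.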

There must be some obvious trick, but I cannot see it. Or maybe cancelling the wait for synchronous replication should be disallowed, and termination should be treated as a system failure?

Best regards, Andrey Borodin.
