Re: Disallow cancellation of waiting for synchronous replication

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Disallow cancellation of waiting for synchronous replication
Date: 2019-12-30 14:39:10
Message-ID: 20191230143910.GB17407@momjian.us
Lists: pgsql-hackers

On Sat, Dec 28, 2019 at 04:55:55PM -0500, Robert Haas wrote:
> On Fri, Dec 20, 2019 at 12:04 AM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> > Currently, we can get split brain through the following combination of steps:
> > 0. Set up a cluster with synchronous replication. Isolate the primary from the standbys.
> > 1. Issue an upsert query INSERT .. ON CONFLICT DO NOTHING
> > 2. CANCEL 1 during the wait for synchronous replication
> > 3. Retry 1. The idempotent query will succeed and the client will have confirmation that the data was written, while it is not present on any standby.
>
> It seems to me that in order for synchronous replication to work
> reliably, you've got to be very careful about any situation where a
> commit might or might not have completed, and this is one such
> situation. When the client sends the query cancel, it does not and
> cannot know whether the INSERT statement has not yet completed, has
> already completed but not yet replicated, or has completed and
> replicated but not yet sent back a response. However, the server's
> response will be different in each of those cases, because in the
> second case, there will be a WARNING about synchronous replication
> having been interrupted. If the client ignores that WARNING, there are
> going to be problems.

This gets to the heart of something I was hoping to discuss. When is
something committed? You would think it is when the client receives the
commit message, but Postgres can commit something, and try to inform the
client but fail to inform, perhaps due to network problems. In Robert's
case above, we send a "success", but it is only a success on the primary
and not on the synchronous standby.
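To make the distinction concrete, here is a minimal sketch of what a client can actually conclude from what it observed. It assumes synchronous replication is configured, so a plain success means the standby confirmed the commit. The names CommitKnowledge and classify_commit are hypothetical, invented for illustration; they are not part of Postgres or any driver.

```python
from enum import Enum


class CommitKnowledge(Enum):
    """What the client can conclude about a commit (hypothetical names)."""
    UNKNOWN = "no response: the commit may or may not have happened"
    LOCAL_ONLY = "committed on the primary; replication not confirmed"
    REPLICATED = "committed and confirmed by the synchronous standby"


def classify_commit(got_response: bool, saw_sync_rep_warning: bool) -> CommitKnowledge:
    """Map the client's observations to what it may safely assume.

    Assumption: synchronous replication is enabled, so a success that
    carries no warning means the wait for the standby completed.
    """
    if not got_response:
        # The server may have committed but failed to tell us,
        # for example because of a network problem.
        return CommitKnowledge.UNKNOWN
    if saw_sync_rep_warning:
        # "Success" here is only a success on the primary.
        return CommitKnowledge.LOCAL_ONLY
    return CommitKnowledge.REPLICATED
```

Only the last outcome matches what most applications mean by "committed"; the other two need explicit handling.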

In the first case I mentioned, we commit without guaranteeing that the
client knows. In the second case, we tell the client "success" with a
warning that the synchronous standby didn't get the commit. Are clients
even checking warning messages? You see them in psql, but what about
applications that use Postgres? Should administrators be informed via
email or some command when this happens?
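As a rough illustration of what checking for the warning could look like in an application, the sketch below scans a list of server notice strings for the synchronous replication warning. It assumes notices arrive as plain strings prefixed with their severity, which is how psycopg2 exposes them in connection.notices. The matched fragment is an assumption about the server's message wording; it can differ between versions and is translated under a non-English lc_messages, so text matching is fragile. The function name is hypothetical.

```python
from typing import Iterable, List

# Assumed fragment of the server's warning text; wording may vary by
# version and locale, so treat this match as best-effort only.
SYNC_REP_WARNING_FRAGMENT = "wait for synchronous replication"


def find_sync_rep_warnings(notices: Iterable[str]) -> List[str]:
    """Return the notices that look like an interrupted sync-rep wait.

    `notices` is any iterable of message strings, each starting with
    its severity (for example "WARNING:  ...").
    """
    hits = []
    for notice in notices:
        text = notice.strip()
        if text.startswith("WARNING") and SYNC_REP_WARNING_FRAGMENT in text:
            hits.append(text)
    return hits
```

With psycopg2, an application might call find_sync_rep_warnings(conn.notices) right after a commit and alert an administrator, or refuse to report success, when the result is non-empty.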

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
