From: Andres Freund <andres(at)anarazel(dot)de>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Avoiding data loss with synchronous replication
Date: 2021-07-25 00:24:32
Message-ID: 20210725002432.tqfm4hjiayoahske@alap3.anarazel.de
Lists: pgsql-hackers

Hi,
On 2021-07-22 21:17:56 +0000, Bossart, Nathan wrote:
> AFAICT there are a variety of ways that the aforementioned problem may
> occur:
> 1. Server restarts: As noted in the docs [2], "waiting transactions
> will be marked fully committed once the primary database
> recovers." I think there are a few options for handling this,
> but the simplest would be to simply failover anytime the primary
> server shut down. My proposal may offer other ways of helping
> with this.
> 2. Backend crashes: If a backend crashes, the postmaster process
> will restart everything, leading to the same problem described in
> 1. However, this behavior can be prevented with the
> restart_after_crash parameter [3].
> 3. Client disconnections: During waits for synchronous replication,
> interrupt processing is turned off, so disconnected clients
> actually don't seem to cause a problem. The server will still
> wait for synchronous replication to complete prior to making the
> transaction visible on the primary.
> 4. Query cancellations and backend terminations: This appears to be
> the only gap where there is no way to avoid potential data loss,
> and it is the main target of my proposal.
>
> Instead of blocking query cancellations and backend terminations, I
> think we should allow them to proceed, but we should keep the
> transactions marked in-progress so they do not yet become visible to
> sessions on the primary. Once replication has caught up to the
> necessary point, the transactions can be marked completed, and
> they would finally become visible.
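
For reference, the behavior enumerated in the quoted scenarios is driven by a
few settings; a minimal postgresql.conf sketch (the standby name 'standby1' is
made up for illustration):

  synchronous_standby_names = 'FIRST 1 (standby1)'  # commits wait for standby1's ack
  synchronous_commit = on         # each commit waits for the standby to flush WAL
  restart_after_crash = off       # scenario 2: shut down rather than auto-restart
                                  # after a backend crash, so failover can kick in
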
I think there are two aspects making this proposal problematic:

First, from the user-experience side of things, the issue is that this proposes
violating read-your-own-writes within a single connection to a single node,
which imo is *far* worse than seeing writes that haven't yet been acknowledged
as replicated after a query cancel.
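
To make that concrete, a minimal sketch assuming a hypothetical table t and an
implicit single-statement transaction:

  INSERT INTO t VALUES (1);  -- cancelled while waiting for the standby's ack;
                             -- the commit record is already flushed locally
  SELECT * FROM t;           -- same session, under the proposal: no row visible,
                             -- because the transaction is still held "in progress"
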
Second, on the implementation side, I think this proposal practically amounts
to internally converting plain transaction commits into 2PC
prepare/commit. With all the associated overhead (two WAL entries/flushes per
commit, needing a separate set of procarray entries to hold the resources for
the prepared-but-not-committed transactions, potential for running out of
the extra procarray slots). What if a user rapidly commits and cancels in a loop?
You'll almost immediately run out of procarray slots to represent all those
"not really committed" transactions.
I think there's benefit in optionally turning all transactions into 2PC ones,
but I don't see it being fast enough to be the only option.
Greetings,
Andres Freund