Re: SERIALIZABLE on standby servers

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SERIALIZABLE on standby servers
Date: 2017-01-19 21:16:00
Message-ID: CAEepm=1LayssnjMFNdRKZHRsQMm9hke+jc8OsRps3rmnUGmBLg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 16, 2016 at 9:26 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Tue, Nov 8, 2016 at 5:56 PM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> [..] Another solution
> could be to have recovery on the standby detect tokens (CSNs
> incremented by PreCommit_CheckForSerializationFailure) arriving out of
> order, but I don't know what exactly it should do about that when it
> is detected: you shouldn't respect an out-of-order claim of safety,
> but then what should you wait for? Perhaps if the last replayed
> commit record before that was marked SNAPSHOT_SAFE then it's OK to
> leave it that way, and if it was marked SNAPSHOT_SAFETY_UNKNOWN then
> you have to wait for that one to be resolved by a follow-up snapshot
> safety message and then rinse-and-repeat (take a new snapshot etc). I
> think that might work, but it seems strange to allow random races on
> the primary to create extra delays on the standby. Perhaps there is
> some much simpler way to do all this that I'm missing.
>
> Another detail is that standbys that start up from a checkpoint and
> don't see any SSI transactions commit don't yet have any snapshot
> safety information, but defaulting to assuming that this point is safe
> doesn't seem right, so I suspect it needs to be in checkpoints.
>
> Attached is a tidied up version which doesn't try to address the above
> problems yet. When time permits I'll come back to this.

I haven't looked at this again yet but a nearby thread reminded me of
another problem with this which I wanted to restate explicitly here in
the context of this patch. Even without replication in the picture,
there is a race to reach ProcArrayEndTransaction() after
RecordTransactionCommit() runs, which means that the DO history
(normal primary server) and REDO history (recovery) don't always agree
on the order that transactions become visible. With this patch, this
kind of diverging DO and REDO could allow undetectable read only
serialization anomalies. I think that ProcArrayEndTransaction() and
RecordTransactionCommit() need to be made atomic in the simple case so
that DO and REDO agree. Synchronous replication makes such divergence
more likely, because it deliberately delays the visibility of
not-yet-durable transactions; some other approach is probably needed
that delays their visibility while keeping the order in which
transactions become visible the same on all nodes.

Aside from the problems I mentioned in my earlier message (race
between snapshot safety decision and logging order, and lack of
checkpointing of snapshot safety information), it seems like the two
DO vs REDO problems (race to ProcArrayEndTransaction, and deliberately
delayed visibility in syncrep) also need to be addressed before
SERIALIZABLE DEFERRABLE on standbys could make a watertight
guarantee.

--
Thomas Munro
http://www.enterprisedb.com
