How to Qualifying or quantify risk of loss in asynchronous replication

From: otheus uibk <otheus(dot)uibk(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: How to Qualifying or quantify risk of loss in asynchronous replication
Date: 2016-03-15 17:26:49
Message-ID: CALbQNd1G_np279G60HBqZkdJo=3hE+yDRYSYoZSc16gFnCujbg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I've been working with PG 9.1.8 for two years now, mainly asynchronous
replication. Recently, an IT admin of another group contested that the
PG's asynchronous replication can result in loss of data in a 1-node
failure. After re-readinG the documentation, I cannot determine to what
extent this is true.

Back in 2008, Robert Haas made this post
http://postgresql.nabble.com/Sync-Rep-First-Thoughts-on-Code-tp1998339p1998433.html
in
which he delineates between different levels of replication. 1-safe is
guaranteed with PG WALs. Other possibilities include group-safe, both group
safe and 1-safe, 2-safe.

How do we qualify PG when WALs are written (and archived) on the master,
and streaming replication to a hot standby, albeit asynchronous, is used?
Is it Group-safe?

My understanding is "Strictly speaking, No".

But what precisely is the algorithm and timing involved with streaming
WALs?

Is it:
* client issues COMMIT
* master receives commit
* master processes transaction internally
* master creates WAL record
| master appends WAL to local WAL file, flushes disk
| master sends WAL record to all streaming clients
* master sends OK to client
* master applies WAL

So is this correct? Is it correct to say: PG async guarantees that the WAL
is *sent* to the receivers, but not that they are received, before the
client receives acknowledgement?

Can we make a case stronger than that? Assuming the T0 is round trip time
between master and client, and T1 is round trip time between master and
slave, as long as T1 <= T0, and provided both Slave and Master do not fail,
the system is Group-safe?

--
Otheus
otheus(dot)uibk(at)gmail(dot)com
otheus(dot)shelling(at)uibk(dot)ac(dot)at

Responses

Browse pgsql-general by date

  From Date Subject
Next Message John Lumby 2016-03-15 17:43:46 Re: how to switch old replication Master to new Standby after promoting old Standby - pg_rewind log file missing
Previous Message Michael Paquier 2016-03-15 16:21:50 Re: how to switch old replication Master to new Standby after promoting old Standby - pg_rewind log file missing