Re: How to Qualifying or quantify risk of loss in asynchronous replication

From: otheus uibk <otheus(dot)uibk(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Forums postgresql <pgsql-general(at)postgresql(dot)org>
Subject: Re: How to Qualifying or quantify risk of loss in asynchronous replication
Date: 2016-03-16 21:40:03
Message-ID: CALbQNd0P5S-M+jScUvEXjCMp35QHN8JJwDDYE2-Z0Nipk5k6Ow@mail.gmail.com
Lists: pgsql-general

On Wednesday, March 16, 2016, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
wrote:
> In asynchronous replication, the primary writes to the WAL and flushes
> the disk. Then, for any standbys that happen to be connected, a WAL sender
> process trundles along behind feeding new WAL down the socket as soon as
> it can, but it can be running arbitrarily far behind or not running at all
> (the network could be down or saturated, the standby could be temporarily down
> or up but not reading the stream fast enough, etc etc).

Thanks for your help in finding the code. To be more precise, in the 9.1.8
code I see this:

1. [backend] WAL is flushed to disk
2. [backend] WAL-senders are sent SIGUSR1 to wake up
3. [backend] wait for responses from any synchronous-replication receivers,
effectively skipped if there are none
[wal-sender] wakes up
4. [backend] end-of-xact cycle
[wal-sender] reads WAL (XLogRead) up to MAX_SEND_SIZE (or less) bytes
5. [backend] ? is an ACK sent to the client here?
[wal-sender] sends chunk to WAL-receiver using the
pq_putmessage_noblock call
6. [wal-sender] repeats reading-sending loop
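The ordering in the steps above can be reduced to a toy model. This is a sketch, not PostgreSQL code: `AsyncPrimary`, `commit` and `walsender_pass` are hypothetical names. It assumes no synchronous standby is configured, assumes (as step 5 asks) that the client hears "committed" at the end of the transaction cycle, and takes the worst case where the WAL sender has not run yet when the commit is reported.

```python
from dataclasses import dataclass


@dataclass
class AsyncPrimary:
    """Toy model (hypothetical, not PostgreSQL code) of the steps above."""
    flushed: int = 0      # bytes of WAL flushed to local disk
    sent: int = 0         # bytes the WAL sender has handed to the socket
    acked: bool = False   # has the client been told "committed"?

    def commit(self, record_len: int) -> None:
        self.flushed += record_len  # step 1: WAL flushed to disk
        # step 2: WAL senders are woken, but nothing here waits for them
        # step 3: no synchronous standby, so the wait is skipped
        self.acked = True           # steps 4-5: commit reported to the client

    def walsender_pass(self, max_send: int) -> int:
        """One read/send iteration on the WAL sender side (steps 4-6)."""
        chunk = min(max_send, self.flushed - self.sent)
        self.sent += chunk
        return chunk


p = AsyncPrimary()
p.commit(1000)
# The client already has its "committed" answer, yet nothing has been sent:
print(p.acked, p.flushed - p.sent)  # True 1000
```

Whatever the WAL sender does afterwards, the window between `acked` becoming true and `sent` catching up to `flushed` is exactly the exposure being discussed.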

So if the WAL record is bigger than whatever MAX_SEND_SIZE is (in my
source, I see 8k * 16 = 128 kB, i.e. roughly 1 Mbit), the WAL sender may
end up sleeping between iterations of steps 5 and 6.
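The arithmetic in the previous paragraph can be checked in a few lines of Python. This is a sketch: `XLOG_BLCKSZ` is assumed to be at its default of 8 kB, MAX_SEND_SIZE is taken as `XLOG_BLCKSZ * 16` as read from the 9.1.8 source, and `send_passes` is a hypothetical helper, not a PostgreSQL function. 128 kB works out to exactly 2**20 bits, which is presumably the "1 Mbit" meant.

```python
XLOG_BLCKSZ = 8192                 # assumed default WAL block size (8 kB)
MAX_SEND_SIZE = XLOG_BLCKSZ * 16   # as read from the 9.1.8 source


def send_passes(wal_bytes: int, max_send: int = MAX_SEND_SIZE) -> int:
    """Read/send passes needed to ship wal_bytes (ceiling division)."""
    return -(-wal_bytes // max_send)


print(MAX_SEND_SIZE)          # 131072 bytes = 128 kB
print(MAX_SEND_SIZE * 8)      # 1048576 bits = 2**20, about 1 Mbit
print(send_passes(300_000))   # 3
```

A 300,000-byte burst of WAL therefore needs three passes through the read/send loop, and any pause between passes widens the unsent window.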

On Wed, Mar 16, 2016 at 10:21 AM, otheus uibk <otheus(dot)uibk(at)gmail(dot)com> wrote:

> Section 25.2.5. "The standby connects to the primary, which streams WAL
> records to the standby as they're generated, without waiting for the WAL
> file to be filled."

> Section 25.2.6. "If the primary server crashes then some transactions that
> were committed may not have been replicated to the standby server, causing
> data loss. The amount of data loss is proportional to the replication delay
> at the time of failover."

Both of these statements in the documentation, then, are incorrect, at
least to a pedant. For 25.2.5: the primary streams WAL records to the
standby after they have been flushed to disk, but without waiting for the
file to be filled. For 25.2.6, what is not made clear is that a transaction
can be *written* to the local WAL and reported as committed without yet
having been *sent* to the standby server.
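One rough way to put a number on that window, assuming a 9.1-era setup: take the primary's current WAL location (for example from `pg_current_xlog_location()`) and a standby's `sent_location` from `pg_stat_replication`, both reported as hexadecimal `X/Y` strings, and subtract. The helpers below are hypothetical sketches, not PostgreSQL functions. The default `bytes_per_xlogid` of 2**32 matches the flat 64-bit positions used from 9.3 on; before 9.3, with the default 16 MB segments, each logical log file held 0xFF000000 usable bytes, so pass that value when the two locations have different high parts.

```python
def lsn_to_bytes(lsn: str, bytes_per_xlogid: int = 2 ** 32) -> int:
    """Convert an 'X/Y' hexadecimal WAL location into a byte position."""
    hi, lo = lsn.split("/")
    return int(hi, 16) * bytes_per_xlogid + int(lo, 16)


def unsent_wal_bytes(primary_lsn: str, sent_lsn: str,
                     bytes_per_xlogid: int = 2 ** 32) -> int:
    """WAL the primary has written but not yet handed to the standby's socket.

    The two locations are sampled at slightly different moments, so a
    negative difference is clamped to zero.
    """
    diff = (lsn_to_bytes(primary_lsn, bytes_per_xlogid)
            - lsn_to_bytes(sent_lsn, bytes_per_xlogid))
    return max(0, diff)


print(unsent_wal_bytes("0/3000060", "0/3000000"))               # 96
print(unsent_wal_bytes("1/10", "0/FEFFFFF0", 0xFF000000))       # 32 (pre-9.3 layout)
```

This counts WAL bytes rather than transactions, so it is an upper bound on what that standby would be missing if the primary died at the moment of sampling, not a count of lost commits.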

Somehow, the documentation misled me into believing that the async
replication algorithm at least guarantees WAL records are *sent* before
"committed" is reported to the client. I now know this is not the case.
*grumble*.

How can I help make the documentation clearer on this point?

--
Otheus
otheus(dot)uibk(at)gmail(dot)com
otheus(dot)shelling(at)uibk(dot)ac(dot)at
