Re: Inconsistent DB data in Streaming Replication

From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Sameer Thakur <samthakur74(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, sthomas(at)optionshouse(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>
Subject: Re: Inconsistent DB data in Streaming Replication
Date: 2013-04-11 14:33:07
Message-ID: 5166C9A3.4020105@2ndQuadrant.com
Lists: pgsql-hackers

On 04/11/2013 03:52 PM, Ants Aasma wrote:
> On Thu, Apr 11, 2013 at 4:25 PM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
>> The proposed fix - halting all writes of data pages to disk and
>> to WAL files while waiting ACK from standby - will tremendously
>> slow down all parallel work on master.
> This is not what is being proposed. The proposed fix halts writes of
> only data pages that are modified within the window of WAL that is not
> yet ACKed by the slave. This means pages that were recently modified
> and where the clocksweep or checkpoint has decided to evict them. This
> only affects the checkpointer, bgwriter and backends doing allocation.
> Furthermore, for the backend clocksweep case it would be reasonable to
> just pick another buffer to evict. The slowdown for most actual cases
> will be negligible.
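
For readers following along, the rule described in the quoted paragraph can be sketched roughly as below. This is only an illustration under stated assumptions, not PostgreSQL's buffer manager: `Buffer`, `pick_victim` and `acked_lsn` are hypothetical names, LSNs are plain integers, and the clock sweep is reduced to a usage counter.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Buffer:
    """Hypothetical, simplified buffer header (not the real PostgreSQL struct)."""
    page_id: int
    page_lsn: int        # LSN of the last WAL record that modified the page
    dirty: bool
    usage_count: int = 0


def pick_victim(buffers: List[Buffer], hand: int, acked_lsn: int) -> Optional[int]:
    """Return the index of a buffer that may be evicted, or None if none qualifies.

    A dirty buffer whose page_lsn lies beyond acked_lsn is skipped: writing it
    out would put a change on disk that the standby has not yet confirmed.
    """
    if not buffers:
        return None
    n = len(buffers)
    # Enough passes to count every usage_count down to zero.
    max_passes = 1 + max(b.usage_count for b in buffers)
    for step in range(max_passes * n):
        idx = (hand + step) % n
        buf = buffers[idx]
        if buf.dirty and buf.page_lsn > acked_lsn:
            continue  # inside the un-ACKed WAL window: pick another buffer
        if buf.usage_count > 0:
            buf.usage_count -= 1  # recently used: give it another round
            continue
        return idx
    return None
```

In this sketch a backend looking for a free buffer simply moves on when it meets a page from the un-ACKed window; it returns `None` only when every buffer is such a page, which is the case where the backend would actually have to wait.
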
You also need to hold back all WAL writes, including the ones by
parallel async and locally-synced transactions, which means that
you have to make all locally synced transactions wait on the
syncrep transactions committed before them.
After getting the ACK from the slave you then have a backlog of stuff
to write locally, which then also needs to be sent to the slave. Basically
this turns a nice smooth WAL write-and-stream pipeline into a
chunky wait-and-write-and-wait-and-stream-and-wait :P
This may not be a problem under light write load, which is
probably the most common use case for postgres, but it
will harm peak performance and also force people to get much
better (and more expensive) hardware than would otherwise
be needed.
>
>> And it does just turn around "master is ahead of slave" problem
>> into "slave is ahead of master" problem :)
> The issue is not being ahead or behind. The issue is ensuring WAL
> durability in the face of failovers before modifying data pages. This
> is sufficient to guarantee no forks in the WAL stream from the point
> of view of data files and with that the capability to always recover
> by replaying WAL.
How would this handle the case Tom pointed out, namely a brief
power cycle on the master?

Instead of just continuing after booting up again, the master now
has to figure out whether it had any slaves and then try to query
them (for how long?) to find out whether they have replayed any
WAL the master does not know of.

Suddenly the mere existence of streaming replica slaves has become
a problem for the master!

This will especially complicate the case of multiple slaves, each
having received WAL up to a slightly different LSN. And you do want
to have at least 2 slaves if you want both durability
and availability with syncrep.
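
To make the multiple-slave case concrete, here is a rough sketch of the comparison a restarted master would need to make. It assumes LSNs in the usual textual form (two hexadecimal numbers separated by a slash, high 32 bits first); `parse_lsn`, `slaves_ahead` and the slave names are hypothetical, and how the master would actually collect the slaves' positions is left open.

```python
from typing import Dict


def parse_lsn(text: str) -> int:
    """Convert an LSN such as '0/3000060' into a 64-bit byte position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def slaves_ahead(master_lsn: str, slave_lsns: Dict[str, str]) -> Dict[str, int]:
    """Map each slave that is ahead of the master to its lead in bytes of WAL."""
    base = parse_lsn(master_lsn)
    ahead = {}
    for name, lsn in slave_lsns.items():
        lead = parse_lsn(lsn) - base
        if lead > 0:
            ahead[name] = lead
    return ahead


# Example: the master came back at 0/3000000; the slave positions are made up.
print(slaves_ahead("0/3000000", {"a": "0/3000060",
                                 "b": "0/3000000",
                                 "c": "0/2FFFFF0"}))
```

Even in this toy form, each slave can give a different answer, and a slave that cannot be reached gives none at all.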

What if one of the slaves disconnects? How should the master react to this?

Regards
Hannu Krosing
