From: | Ants Aasma <ants(at)cybertec(dot)at> |
---|---|
To: | Hannu Krosing <hannu(at)2ndquadrant(dot)com> |
Cc: | Sameer Thakur <samthakur74(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, sthomas(at)optionshouse(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com> |
Subject: | Re: Inconsistent DB data in Streaming Replication |
Date: | 2013-04-11 13:52:47 |
Message-ID: | CA+CSw_uZAbQeo58UYm20F8LuxQfx6ZR-UoWA2ToBZ4dOi3eWpw@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Apr 11, 2013 at 4:25 PM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> The proposed fix - halting all writes of data pages to disk and
> to WAL files while waiting ACK from standby - will tremendously
> slow down all parallel work on master.
This is not what is being proposed. The proposed fix halts writes of
only data pages that are modified within the window of WAL that is not
yet ACKed by the slave. In practice that means recently modified pages
that the clocksweep or a checkpoint has decided to write out. This
only affects the checkpointer, bgwriter and backends doing allocation.
Furthermore, for the backend clocksweep case it would be reasonable to
just pick another buffer to evict. The slowdown for most actual cases
will be negligible.
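
To make the rule concrete, here is a rough sketch (plain Python, not PostgreSQL code; `Buffer`, `can_flush`, `pick_victim` and the integer LSNs are all made up for illustration). A dirty page may be written out only once the WAL covering its last change has been ACKed by the slave, and a backend's clocksweep simply moves on to the next buffer when that is not yet the case:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Buffer:
    page_id: int
    page_lsn: int          # LSN of the last WAL record that touched the page
    dirty: bool = True
    usage_count: int = 0   # clocksweep usage counter


def can_flush(buf: Buffer, standby_acked_lsn: int) -> bool:
    """Clean pages need no write; dirty pages must wait for the ACK."""
    return (not buf.dirty) or buf.page_lsn <= standby_acked_lsn


def pick_victim(buffers: List[Buffer], hand: int,
                standby_acked_lsn: int) -> Optional[int]:
    """Simplified clocksweep that skips buffers not yet safe to write.

    Returns the index of the chosen buffer, or None if every candidate
    is still inside the un-ACKed WAL window.
    """
    n = len(buffers)
    if n == 0:
        return None
    # Enough steps to run every usage_count down to zero and then
    # look at each buffer once more.
    max_steps = n * (max(b.usage_count for b in buffers) + 1)
    for step in range(max_steps):
        idx = (hand + step) % n
        buf = buffers[idx]
        if buf.usage_count > 0:
            buf.usage_count -= 1   # recently used, give it another round
            continue
        if can_flush(buf, standby_acked_lsn):
            return idx
        # Page was modified after the ACKed position: pick another buffer.
    return None
```

With the slave's ACK at LSN 100, a dirty page at LSN 120 is passed over and a page at LSN 90 is chosen instead. Only when every candidate sits in the un-ACKed window does the caller have to wait.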
> And it does just turn around "master is ahead of slave" problem
> into "slave is ahead of master" problem :)
The issue is not being ahead or behind. The issue is ensuring WAL
durability in the face of failovers before modifying data pages. This
is sufficient to guarantee no forks in the WAL stream from the point
of view of data files and with that the capability to always recover
by replaying WAL. There can still be forks from the point of view of
async commits, with most recent commits disappearing on failover, but
this is in no way different from what we have now.
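
Put as an invariant: every change that has reached the old master's data files must be covered by WAL the slave already holds. A toy check of that, with integer LSNs and hypothetical helper names (nothing here is an existing PostgreSQL function):

```python
from typing import Iterable, List


def data_files_forked(flushed_page_lsns: Iterable[int],
                      standby_wal_end: int) -> bool:
    """True if some page on disk carries a change the slave never received."""
    return any(lsn > standby_wal_end for lsn in flushed_page_lsns)


def lost_async_commits(commit_lsns: Iterable[int],
                       standby_wal_end: int) -> List[int]:
    """Async commits past the slave's WAL end disappear on failover."""
    return [lsn for lsn in commit_lsns if lsn > standby_wal_end]
```

If pages are only written once their LSN is ACKed, `data_files_forked` stays false at failover and the old master can catch up by replaying the new master's WAL. `lost_async_commits` can still be non-empty, which is the same exposure async commit has today.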
I don't share the view that the disk image is extremely likely to be
corrupt after a crash. If that were the case then we should recommend
that people don't use crash recovery at all and always restore from a
backup. For errors like power supply failure, uncorrectable ECC
errors, etc. we can be pretty sure that the server was not writing
garbage into the storage system before failing. Having to do a
day-long rsync run plus recovery to catch up on all changes made
during the resync, just to restore high-availability safety in those
circumstances, is in many cases the larger risk.
Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de