From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Patch for fail-back without fresh backup |
Date: | 2013-06-16 11:40:58 |
Message-ID: | CA+U5nMJQyk6OYewL0pozmGOQC_VMDAm_b6=vRv5eNDg7N8rbLw@mail.gmail.com |
Lists: | pgsql-hackers |
On 14 June 2013 10:11, Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com> wrote:
> We have already started a discussion on pgsql-hackers about the problem of
> taking a fresh backup during the failback operation; here is the link for that:
>
> http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-6OzRaew5pWhk7yQtbJgWrFu513s+Q@mail.gmail.com
> So our proposal on this problem is that we must ensure that master should
> not make any file system level changes without confirming that the
> corresponding WAL record is replicated to the standby.
> 1. The main objection raised by Tom and others is that we should not add
> this feature and should go with traditional way of taking fresh backup using
> the rsync, because he was concerned about the additional complexity of the
> patch and the performance overhead during normal operations.
>
> 2. Tom and others were also worried about the inconsistencies in the crashed
> master and suggested that it's better to start with a fresh backup. Fujii
> Masao and others correctly countered that suggesting that we trust WAL
> recovery to clear all such inconsistencies and there is no reason why we
> can't do the same here.
> So the patch is showing 1-2% performance overhead.
Let's have a look at this...
The objections you summarise that Tom has made are ones that I agree
with. I also don't think that Fujii "correctly countered" those
objections.
My perspective is that if the master crashed, assuming that you know
everything about that and suddenly jumping back on seems like a recipe
for disaster. Attempting that is currently blocked by the technical
obstacles you've identified, but that doesn't mean they are the only
ones - we don't yet understand what all the problems lurking might be.
Personally, I won't be following you onto that minefield anytime soon.
So I strongly object to calling this patch anything to do with
"failback safe". You simply don't have enough data to make such a bold
claim. (Which is why we call it synchronous replication and not "zero
data loss", for example).
But that's not the whole story. I can see some utility in a patch that
makes all WAL transfer synchronous, rather than just commits. Some
name like synchronous_transfer might be appropriate, e.g.
synchronous_transfer = all | commit (default).
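To make the two values concrete, here is a minimal sketch in C of how such
a setting might be interpreted. Everything in it is an assumption for
illustration: the enum names, the flush-reason classification, and both
functions are hypothetical and are not taken from the patch or from the
PostgreSQL source. It also assumes synchronous replication is otherwise
enabled, so that commits already wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical modes for the suggested synchronous_transfer setting. */
typedef enum SyncTransferMode
{
    SYNC_TRANSFER_COMMIT,   /* default: only commits wait for the standby */
    SYNC_TRANSFER_ALL       /* every WAL flush waits for the standby */
} SyncTransferMode;

/* Hypothetical classification of why WAL is being flushed. */
typedef enum FlushReason
{
    FLUSH_FOR_COMMIT,       /* a transaction is committing */
    FLUSH_FOR_DATA_WRITE    /* e.g. before a data page is written out */
} FlushReason;

/* Parse the setting's value; returns false for unrecognised input. */
bool
parse_sync_transfer(const char *value, SyncTransferMode *mode)
{
    assert(value != NULL && mode != NULL);

    if (strcmp(value, "commit") == 0)
    {
        *mode = SYNC_TRANSFER_COMMIT;
        return true;
    }
    if (strcmp(value, "all") == 0)
    {
        *mode = SYNC_TRANSFER_ALL;
        return true;
    }
    return false;
}

/* Decide whether this flush must wait for the standby to confirm. */
bool
must_wait_for_standby(SyncTransferMode mode, FlushReason reason)
{
    if (mode == SYNC_TRANSFER_ALL)
        return true;                    /* all WAL transfer is synchronous */
    return reason == FLUSH_FOR_COMMIT;  /* existing sync rep behaviour */
}
```

With "commit" the decision matches what synchronous replication does
today; only "all" adds waits on non-commit flushes, which is where any
extra cost would come from.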
The idea of another slew of parameters that are very similar to
synchronous replication but yet somehow different seems weird. I can't
see a reason why we'd want a second lot of parameters. Why not just
use the existing ones for sync rep? (I'm surprised the Parameter
Police haven't visited you in the night...) Sure, we might want to
expand the design for how we specify multi-node sync rep, but that is
a different patch.
I'm worried to see that adding this feature and yet turning it off
causes a measurable drop in performance. I don't think we want that
at all. That clearly needs more work and thought.
I also think your performance results are somewhat bogus. Fast
transaction workloads were already mostly commit waits - measurements
of what happens to large loads, index builds etc would likely reveal
something quite different.
I'm tempted by the thought that we should put the WaitForLSN call inside
XLogFlush, rather than scatter additional calls everywhere and then
inevitably miss one.
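A toy model in C of that idea, only to show its shape. None of this is
PostgreSQL code: the toy_ functions and the three counters are invented
stand-ins, and the "wait" simply records an acknowledgement instead of
blocking on a real standby.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t ToyLSN;            /* stand-in for a WAL position */

static ToyLSN local_flushed = 0;    /* WAL flushed to local disk up to here */
static ToyLSN standby_acked = 0;    /* WAL confirmed by the standby up to here */
static int    standby_waits = 0;    /* how often we actually had to wait */

/* Stand-in for the wait; a real one would block until the standby replies. */
void
toy_wait_for_lsn(ToyLSN upto)
{
    if (standby_acked < upto)
    {
        standby_acked = upto;
        standby_waits++;
    }
}

/*
 * The single choke point: flush locally, then wait for the standby.
 * Every caller gets the wait for free, so none can forget it.
 */
void
toy_xlog_flush(ToyLSN upto)
{
    if (local_flushed < upto)
        local_flushed = upto;

    toy_wait_for_lsn(upto);
    assert(standby_acked >= upto);
}

/* A caller that must not touch the file system ahead of the standby. */
void
toy_write_data_page(ToyLSN page_lsn)
{
    toy_xlog_flush(page_lsn);       /* no separate wait call needed here */
    /* ... the page write itself would go here ... */
}
```

The alternative is an explicit wait call next to every file system
change, and each new call site is another place to get it wrong.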
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services