From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: Standalone synchronous master
Date: 2014-01-09 17:36:47
Message-ID: CAMkU=1yqF6KY9B2xCO9uLtG8dZAxn2y=oud7UsyCvQ+ia1_bQQ@mail.gmail.com
Lists: pgsql-hackers
On Wed, Jan 8, 2014 at 3:00 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> On 01/08/2014 01:49 PM, Tom Lane wrote:
> > Josh Berkus <josh(at)agliodbs(dot)com> writes:
> >> If we really want auto-degrading sync rep, then we'd (at a minimum) need
> >> a way to determine *from the replica* whether or not it was in degraded
> >> mode when the master died. What good do messages to the master log do
> >> you if the master no longer exists?
> >
> > How would it be possible for a replica to know whether the master had
> > committed more transactions while communication was lost, if the master
> > dies without ever restoring communication? It sounds like pie in the
> > sky from here ...
>
> Oh, right. Because the main reason for a sync replica degrading is that
> it's down. In which case it isn't going to record anything. This would
> still be useful for sync rep candidates, though, and I'll document why
> below. But first, lemme demolish the case for auto-degrade.
>
> So here's the case that we can't possibly solve for auto-degrade.
> Anyone who wants auto-degrade needs to come up with a solution for this
> case as a first requirement:
>
It seems like the only deterministically useful thing to do is to send a
NOTICE to the *client* that the commit has succeeded, but in degraded mode,
so keep your receipts and have your lawyer's number handy. Whether anyone
would be willing to add client-side code to process that message is doubtful,
as is whether the client would even receive it at all in the middle of a
major disruption.
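Just to make the idea concrete, here is a rough sketch (this is not an
existing feature) of how a psycopg2 client might watch for such a NOTICE
after a commit. The "degraded" message text, the table, and the connection
string are all made up for illustration:

import psycopg2

conn = psycopg2.connect("dbname=app")  # placeholder connection string
cur = conn.cursor()
cur.execute("INSERT INTO receipts (customer, amount) VALUES (%s, %s)",
            ("acme", 42))
conn.commit()

# psycopg2 collects server NOTICE messages on connection.notices (a list).
for notice in conn.notices:
    if "degraded" in notice:  # hypothetical marker in the proposed NOTICE
        # Keep your receipts: this commit may not exist on any standby.
        print("WARNING: commit acknowledged in degraded mode:", notice)
del conn.notices[:]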
But I think there is a good probabilistic justification for an
auto-degrade mode. (And really, what else is there? There are never any
real guarantees of anything. Maybe none of your replicas ever come back
up. Maybe none of your customers do, either.)
>
> 1. A data center network/power event starts.
>
> 2. The sync replica goes down.
>
> 3. A short time later, the master goes down.
>
> 4. Data center power is restored.
>
> 5. The master is fried and is a permanent loss. The replica is ok, though.
>
> Question: how does the DBA know whether data has been lost or not?
>
What if he had a way of knowing that some data *has* been lost? What can
he do about it? What is the value in knowing it was lost after the fact,
but without the ability to do anything about it?
But let's say that instead of a permanent loss, the master can be brought
back up in a few days after replacing a few components, or in a few weeks
after sending the drives out to clean-room data recovery specialists.
Writing has already failed over to the replica, because you couldn't wait
that long to bring things back up.
Once you get your old master back, you can see whether transactions have been
lost. If they have been, you can dump the tables out to a human-readable
format; use PITR to restore a copy of the replica to the point just before
the failover (although I'm not really sure exactly how to identify that
point) and dump that out as well; then use 'diff' tools to figure out what
changes to the database were lost. From there you consult with the
application specialists to figure out what the application was doing that
led to those changes (if that is not obvious), with business operations
people to figure out how to apply the analogous changes to the top of the
database, and with a customer service VP or someone to figure out how to
retroactively fix transactions that were done after the failover which would
have been done differently had the lost transactions not been lost. Or
instead of all that, you could look at the recovered data and learn that in
fact nothing had been lost, so nothing further needs to be done.
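For the 'diff' step, something like this rough sketch is what I have in mind
(psycopg2; the connection strings and table name are placeholders, and you
would repeat it per table):

import io
import difflib
import psycopg2

def dump_table(dsn, table):
    """Export a table as sorted text lines via COPY so it can be diffed."""
    buf = io.StringIO()
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.copy_expert("COPY %s TO STDOUT" % table, buf)
    return sorted(buf.getvalue().splitlines())

old_master = dump_table("host=recovered-master dbname=app", "orders")
pre_failover = dump_table("host=pitr-copy dbname=app", "orders")

# Rows on the old master that are missing from the pre-failover copy are
# candidates for transactions lost in the degraded window.
for line in difflib.unified_diff(pre_failover, old_master, lineterm=""):
    print(line)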
If you were running in async replication mode on a busy server, there is a
virtual certainty that some transactions have been lost. If you were
running in sync mode with possibility of auto-degrade, it is far from
certain. That depends on how long the power event lasted, compared to how
long you had the timeout set to.
Or rather than a data-center-wide power spike, what if your master just
"done fell over" with no drama to the rest of the neighborhood? Inspection
after the fail-over to the replica shows the RAID controller card failed.
There is no reason to think that a RAID controller, in the process of
failing, would have caused the replication to kick into degraded mode. You
know from the surviving logs that the master spent 60 seconds total in
degraded mode over the last 3 months, so there is a 99.999% chance no
confirmed transactions were lost. To be conservative, let's drop it to
99.99% because maybe some unknown mechanism did allow a failing RAID
controller to blip the network card without leaving any evidence behind.
That's a lot better than the chances of lost transactions while in async
replication mode, which could be 99.9% in the other direction.
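The arithmetic behind that 99.999%, assuming "3 months" means roughly 90
days: the chance that the RAID card happened to fail inside one of the
degraded windows is about (time spent degraded) / (total time).

degraded_seconds = 60
total_seconds = 90 * 24 * 3600           # ~7,776,000 seconds in 3 months
p_lost = degraded_seconds / total_seconds
print("%.6f%%" % (100 * p_lost))         # ~0.000772% chance the failure hit a window
print("%.5f%%" % (100 * (1 - p_lost)))   # ~99.99923% chance nothing was lost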
Cheers,
Jeff