From: | Josh Berkus <josh(at)agliodbs(dot)com> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Proposal for 9.1: WAL streaming from WAL buffers |
Date: | 2010-06-16 00:09:56 |
Message-ID: | 4C181654.4070703@agliodbs.com |
Lists: | pgsql-hackers |
> I have yet to convince myself of how likely this is to occur. I tried
> to reproduce this issue by crashing the database, but I think in 9.0
> you need an actual operating system crash to cause this problem, and I
> haven't yet set up an environment in which I can repeatedly crash the
> OS. I believe, though, that in 9.1, we're going to want to stream
> from WAL buffers as proposed in the patch that started out this
> thread, and then I think this issue can be triggered with just a
> database crash.
Yes, but it still requires:
a) the master must crash with at least one transaction transmitted to
the slave and not yet fsync'd
b) the slave must not crash as well
c) the master must come back up without the slave ever having been
promoted to master
Note that (a) is fairly improbable to begin with due to both our
batching transactions into bundles for transmission, and network latency
vs. disk latency.
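A minimal sketch of conditions (a) through (c) as a single predicate, assuming WAL positions can be modelled as plain integers. The class, field, and function names are hypothetical and are not PostgreSQL APIs; the point is only that all three conditions must hold at once for the slave to end up ahead of the restarted master.

```python
from dataclasses import dataclass


@dataclass
class CrashScenario:
    # All fields are illustrative stand-ins, not real PostgreSQL state.
    master_flushed_lsn: int    # last WAL position fsync'd on the master before it crashed
    slave_received_lsn: int    # last WAL position the slave had received
    slave_crashed: bool        # did the slave go down as well?
    slave_promoted: bool       # was the slave promoted before the master came back?


def slave_ahead_of_master(s: CrashScenario) -> bool:
    """Return True only when conditions (a), (b) and (c) all hold."""
    a = s.slave_received_lsn > s.master_flushed_lsn  # (a) WAL sent but not yet fsync'd
    b = not s.slave_crashed                          # (b) slave stays up
    c = not s.slave_promoted                         # (c) master returns, slave never promoted
    return a and b and c
```

For example, `slave_ahead_of_master(CrashScenario(100, 130, False, False))` is true, while changing any one of the last three arguments so that its condition fails makes it false.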
So, is it possible? Yes. Will it happen anywhere but the
highest-txn-rate sites one in 10,000 times? No.
This means that we should look for a solution which does not penalize
the common case in order to close a very improbable hole, if such a
solution exists.
> In 9.0, I think we can fix this problem by (1) only streaming WAL that
> has been fsync'd and
I don't think this is the best solution; it would be a noticeable
performance penalty on replication. It also would potentially result in
data loss for the user; if the user fails over to the slave in the
corner case, they can "rescue" the in-flight transaction. At the least,
this would need to become Yet Another Configuration Option.
> (2) PANIC-ing if the problem occurs anyway.
The question is, is detecting out-of-order WAL records *sufficient* to
detect a failure? I'm thinking there are possible sequences where there
would be no out-of-sequence records, but the slave would still have a
transaction the master doesn't, which the user wouldn't know until a
page update corrupts their data.
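One such sequence can be sketched with a toy model. This is an assumption-laden illustration, not a description of how PostgreSQL actually streams or validates WAL: records are keyed by small integers standing in for WAL positions, and `apply_stream` is a hypothetical helper that only checks for gaps in the sequence.

```python
def apply_stream(slave_wal, master_wal, start):
    """Copy master records from `start` onward, raising if the sequence has a gap.

    Toy check only: it compares positions, never record contents.
    """
    expected = start
    for pos in sorted(p for p in master_wal if p >= start):
        if pos != expected:
            raise RuntimeError(f"out-of-sequence record: got {pos}, expected {expected}")
        slave_wal[pos] = master_wal[pos]
        expected = pos + 1
    return expected


# Before the crash: the master has fsync'd records 1-3; records 4 and 5
# reached the slave but were never fsync'd on the master.
master = {1: "txn A", 2: "txn B", 3: "txn C"}
slave = {1: "txn A", 2: "txn B", 3: "txn C", 4: "txn D", 5: "txn E"}

# The master restarts, has no knowledge of 4 and 5, and writes new
# records at those same positions.
master.update({4: "txn X", 5: "txn Y", 6: "txn Z"})

# The slave resumes streaming right after the last record it holds.
next_pos = apply_stream(slave, master, start=max(slave) + 1)

# No gap was detected, yet positions 4 and 5 differ between the two nodes.
diverged = [pos for pos in slave if slave[pos] != master[pos]]
```

In this model the sequence check passes cleanly, and the mismatch at positions 4 and 5 is only visible if the record contents are compared.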
> But
> in 9.1, with sync rep and the performance demands that entails, I
> think that we're going to need to rethink it.
All the more reason to avoid dealing with it now, if we can.
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com