From: Shaun Thomas <sthomas(at)optionshouse(dot)com>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Synchronous Standalone Master Redoux
Date: 2012-07-12 13:21:08
Message-ID: 4FFECF44.4010409@optionshouse.com
Lists: pgsql-hackers
On 07/12/2012 12:31 AM, Daniel Farina wrote:
> But RAID-1 as nominally seen is a fundamentally different problem,
> with much tinier differences in latency, bandwidth, and connectivity.
> Perhaps useful for study, but to suggest the problem is *that* similar
> I think is wrong.
Well, yes and no. One reason I brought up DRBD is that it's basically
RAID-1 over a network interface. It's not without overhead,
but a few basic pgbench tests show it's still 10-15% faster than a
synchronous PG setup for two servers in the same rack. Greg Smith's
tests show that beyond a certain point, a synchronous PG setup
effectively becomes untenable simply due to network latency in the
protocol implementation. In reality, it probably wouldn't be usable
beyond two servers in different datacenters in the same city.
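For reference, the kind of two-node synchronous setup that comparison runs
against boils down to a handful of settings. This is only an illustrative
sketch of the standard 9.1/9.2 configuration; 'standby1' is a made-up
application_name, not anything from our environment:

# postgresql.conf on the master -- minimal synchronous replication,
# roughly what the pgbench numbers above were measured against.
wal_level = hot_standby
max_wal_senders = 3
synchronous_commit = on                  # every commit waits on the standby
synchronous_standby_names = 'standby1'   # illustrative application_name

# recovery.conf on the standby; application_name must match the above:
# primary_conninfo = 'host=master port=5432 application_name=standby1'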
RAID-1 was the model for DRBD, but I brought it up only because it's
pretty much the definition of a synchronous commit that degrades
gracefully. I'd even suggest graceful degradation matters more in a
network context than it does for local RAID-1, because sync interruptions
from network issues are far more likely than an outright disk failure.
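For anyone who hasn't worked with it, a minimal DRBD resource looks
something like this; the hostnames, devices, and addresses are all
illustrative. Protocol C is fully synchronous, yet a node that loses its
peer keeps writing locally and resyncs the peer when it comes back:

# /etc/drbd.conf sketch (8.3-style syntax, names are illustrative)
resource r0 {
  protocol C;                # don't ack a write until the peer has it too
  on node1 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.1:7789;
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7789;
    meta-disk internal;
  }
}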
> But, putting that aside, why not write a piece of middleware that
> does precisely this, or whatever you want? It can live on the same
> machine as Postgres and ack synchronous commit when nobody is home,
> and notify (e.g. page) you in the most precise way you want if nobody
> is home "for a while".
You're right that there are lots of ways to kinda get this ability, but
none of them are mature or capable enough to really matter. Tailing the
log to watch for secondary disconnect is too slow. Monit- or Nagios-style
checks are too slow and unreliable. A custom-built middle layer (a
master-slave plugin for Pacemaker, for example) is too slow. All of these
rely on some kind of check interval. Set it too high and, at roughly
10,000 transactions per second, we miss 10,000 x n transactions for every
n seconds of delay. Set it too low and we increase the likelihood of
false positives and unnecessary detachments.
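Just to show the shape of the thing, a polling watchdog would look
roughly like the sketch below. This is hypothetical, not something we
run, and every knob in it (interval, threshold, paths) is made up; the
interval problem I just described is baked right into INTERVAL:

#!/bin/sh
# Hypothetical watchdog sketch: if no synchronous standby has been
# connected for MAX_WAIT seconds, degrade the master to async commits
# by clearing synchronous_standby_names (reloadable in 9.1+), which
# also releases any backends stuck waiting on the vanished standby.
INTERVAL=2        # seconds between checks -- the problematic knob
MAX_WAIT=10
PGCONF=/var/lib/pgsql/data/postgresql.conf   # illustrative path
waited=0

while sleep "$INTERVAL"; do
    syncs=$(psql -Atc "SELECT count(*) FROM pg_stat_replication
                       WHERE sync_state = 'sync'")
    if [ "${syncs:-0}" -gt 0 ]; then
        waited=0
        continue
    fi
    waited=$((waited + INTERVAL))
    if [ "$waited" -ge "$MAX_WAIT" ]; then
        # Assumes a literal synchronous_standby_names line in the file.
        sed -i "s/^synchronous_standby_names.*/synchronous_standby_names = ''/" "$PGCONF"
        psql -c "SELECT pg_reload_conf()"
        # ...page somebody here...
        waited=0
    fi
done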
If it's possible through a PG 9.x extension, that'd probably be the way
to *safely* handle it as a bolt-on solution. If the original author of
the patch can convert it to such a beast, we'd install it approximately
five seconds after it finished compiling.
So far as transaction durability is concerned... we have a continuous
background rsync over dark fiber for archived transaction logs, DRBD for
block-level sync, filesystem snapshots for our backups, a redundant
async DR cluster, an offsite backup location, and a tape archival
service stretching back for seven years. And none of that will cause the
master to stop processing transactions unless the master itself dies and
triggers a failover.
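For the curious, the WAL-shipping leg of that stack is just the standard
archive_command idiom; the host and path here are illustrative:

# postgresql.conf on the master
archive_mode = on
archive_command = 'rsync -a %p dr-host:/pg_archive/%f'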
Using PG sync in its current incarnation would introduce an extra
failure scenario that wasn't there before. I'm pretty sure we're not the
only ones avoiding it for exactly that reason. Our queue discards
messages it can't fulfil within ten seconds and then throws an error for
each one. We need to decouple the secondary as quickly as possible if it
becomes unresponsive, and there's really no way to do that without
something in the database, one way or another.
--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas(at)optionshouse(dot)com
______________________________________________
See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email