Re: 24x7x365 high-volume ops ideas

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: 24x7x365 high-volume ops ideas
Date: 2004-11-08 04:16:36
Message-ID: m3y8hdnfff.fsf@knuth.knuth.cbbrowne.com
Lists: pgsql-general

A long time ago, in a galaxy far, far away, Karim(dot)Nassar(at)NAU(dot)EDU (Karim Nassar) wrote:
> On Wed, 2004-11-03 at 18:10, Ed L. wrote:
>> unfortunately, the requirement is 100% uptime all the time, and any
>> downtime at all is a liability. Here are some of the issues:
>
> Seems like 100% uptime is always an issue, but not even close to
> reality. I think it's unreasonable to expect a single piece of
> software that NEVER to be restarted. Never is a really long time.
>
> For this case, isn't replication sufficient? (FWIW, in 1 month I
> have to answer this same question). Would this work?
>
> * 'Main' db server up 99.78% of time
> * 'Replicant' up 99.78% of time (using slony, dbmirror)
> * When Main goes down (crisis, maintenance), Replicant answers for Main,
> in a read-only fashion.
> * When Main comes back up, any waiting writes can now happen.
> * Likewise, Replicant can be taken down for maint, then Main syncs to it
> when going back online.
>
> Is this how it's done?
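
As a rough sanity check on the 99.78% figures quoted above, here is a
small sketch of the arithmetic. It assumes the two servers fail
independently, which is my assumption, not something stated in the
thread, and real outages (shared power, network, bad deploys) are
often correlated. The function names are purely illustrative.

```python
# Availability arithmetic for a Main + Replicant pair.
# ASSUMPTION: node failures are statistically independent.

def availability_any(*nodes: float) -> float:
    """Probability that at least one node is up."""
    all_down = 1.0
    for a in nodes:
        all_down *= (1.0 - a)
    return 1.0 - all_down


def availability_all(*nodes: float) -> float:
    """Probability that every node is up at the same time."""
    all_up = 1.0
    for a in nodes:
        all_up *= a
    return all_up


if __name__ == "__main__":
    main_up = replicant_up = 0.9978
    hours_per_year = 24 * 365

    reads = availability_any(main_up, replicant_up)
    both = availability_all(main_up, replicant_up)

    print(f"reads served by either node: {reads:.8f}")
    print(f"both nodes up together:      {both:.8f}")
    print(f"single-node downtime, hours/year: "
          f"{(1 - main_up) * hours_per_year:.1f}")
```

Note that in the scheme as quoted, the Replicant only answers
read-only, so under these assumptions the pair improves read
availability but writes are still limited by Main's own 99.78%.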

The challenge lies in two places:

1. You need some mechanism to detect that the "replica" should take
over, and to actually perform that takeover.

That "takeover" requires having some way for your application to
become aware of the new IP address of the DB host.

2. Some changes need to take place in order to prepare the "replica"
to be treated as "master."

For instance, in the case of Slony-I, you can do a full-scale
"failover" where you tell it to treat the "main" database as being
dead. At that point, the replica becomes the master, and the former
"master" is essentially discarded as dead.

Alternatively, there's a "MOVE SET" which is suitable for predictable
maintenance; that shifts the "master" node from one node to another;
you can take MAIN out of service for a while, and add it back, perhaps
making it the "master" again.
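
To make the difference between the two paths concrete, here is a
minimal sketch that just builds the slonik script text for each case.
The command syntax is as I recall it from the Slony-I documentation,
so check it against the version you run; the cluster name, node
numbers, set id, and conninfo strings are all hypothetical
placeholders, and the Python helper names are mine.

```python
# Build slonik scripts for a planned switchover (MOVE SET) and an
# emergency FAILOVER. This only produces text; feeding it to the
# slonik command-line tool is left to the operator.

def slonik_preamble(cluster: str, conninfo_by_node: dict) -> str:
    """Cluster name plus one admin conninfo line per node."""
    lines = [f"cluster name = {cluster};"]
    for node_id, conninfo in sorted(conninfo_by_node.items()):
        lines.append(f"node {node_id} admin conninfo = '{conninfo}';")
    return "\n".join(lines)


def planned_switchover(set_id: int, old_origin: int, new_origin: int) -> str:
    """Predictable maintenance: lock the set, then move its origin."""
    return (
        f"lock set (id = {set_id}, origin = {old_origin});\n"
        f"move set (id = {set_id}, old origin = {old_origin}, "
        f"new origin = {new_origin});"
    )


def emergency_failover(failed_node: int, backup_node: int) -> str:
    """Crisis: treat the failed node as dead and promote the backup."""
    return f"failover (id = {failed_node}, backup node = {backup_node});"


if __name__ == "__main__":
    # Hypothetical two-node cluster: node 1 is MAIN, node 2 the replica.
    preamble = slonik_preamble(
        "example_cluster",
        {1: "dbname=app host=main.example.org",
         2: "dbname=app host=replica.example.org"},
    )
    print(preamble)
    print(planned_switchover(set_id=1, old_origin=1, new_origin=2))
    print()
    print(preamble)
    print(emergency_failover(failed_node=1, backup_node=2))
```

The MOVE SET path needs both nodes reachable, which is why it suits
planned maintenance; FAILOVER is for when the old origin cannot be
consulted at all.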

None of these systems _directly_ addresses how apps would get pointed
to the shifting servers.

A neat approach would involve making pgpool, a C-based 'connection
pool' manager, Slony-I-aware. If it were to submit either MOVE SET or
FAILOVER, it would be aware of which DB to point things to, so that
applications that pass requests through pgpool would not necessarily
need to be aware of there being a change beyond perhaps seeing some
transactions terminated. That won't be ready tomorrow...

Something needs to be "smart enough" to point apps to the right place;
that's something to think about...
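
As one illustration of what "smart enough" might mean, here is a
minimal sketch of a health-checking router that apps could ask for the
current DB host. It is not how pgpool or Slony-I work; the class,
its parameters, and the "N consecutive failed probes" rule are all
assumptions of mine. A successful TCP connect only shows that
something is listening, so a real version would also need to confirm
the node has actually been promoted (e.g. via the on_switch hook).

```python
import socket
from typing import Callable, Optional, Sequence


def tcp_probe(host: str, port: int = 5432, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailoverRouter:
    """Tracks which host apps should use; switches after repeated misses."""

    def __init__(
        self,
        hosts: Sequence[str],
        probe: Callable[[str], bool] = tcp_probe,
        max_failures: int = 3,
        on_switch: Optional[Callable[[str, str], None]] = None,
    ) -> None:
        if not hosts:
            raise ValueError("at least one host is required")
        self.hosts = list(hosts)
        self.probe = probe
        self.max_failures = max_failures
        self.on_switch = on_switch  # e.g. a hook that runs the promotion
        self.active = 0             # index of the host currently in use
        self.failures = 0           # consecutive failed probes so far

    def check(self) -> str:
        """Run one health check and return the host apps should use."""
        if self.probe(self.hosts[self.active]):
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                old = self.hosts[self.active]
                self.active = (self.active + 1) % len(self.hosts)
                self.failures = 0
                if self.on_switch is not None:
                    self.on_switch(old, self.hosts[self.active])
        return self.hosts[self.active]
```

Calling check() on a timer and handing its result to the apps (or to a
connection pool) covers the "detect and redirect" half; requiring
several consecutive failures is there to avoid failing over on a
single dropped packet. Note it never fails back on its own, which
matches the idea that the old master has to be deliberately re-added.
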
--
let name="cbbrowne" and tld="linuxfinances.info" in String.concat "@" [name;tld];;
http://www3.sympatico.ca/cbbrowne/advocacy.html
"XFS might (or might not) come out before the year 3000. As far as
kernel patches go, SGI are brilliant. As far as graphics, especially
OpenGL, go, SGI is untouchable. As far as filing systems go, a
concussed doormouse in a tarpit would move faster." -- jd on Slashdot
