Re: High Availability PostgreSQL HOWTO article

From: "Robert M(dot) Meyer" <rmeyer(at)installs(dot)com>
To: Ragnar Kjørstad <postgres(at)ragnark(dot)vestdata(dot)no>
Cc: Eric Comeau <eric(dot)comeau(at)signiant(dot)com>, pgsql-admin(at)postgresql(dot)org
Subject: Re: High Availability PostgreSQL HOWTO article
Date: 2002-08-06 19:42:19
Message-ID: 1028662940.16115.55.camel@skymaster
Lists: pgsql-admin

I'm not a big fan of the shared storage solution. The added
complication of identifying who has the RAID in a failover situation can
cause unnecessary downtime. Since disk systems are typically the
highest risk of failure items (IMHO), the single point of failure is
just too much for high availability. Anything that happens to the
filestore such as someone whacking files will cause a catastrophic
failure. We're using LifeKeeper on a Compaq 3500 server in an attempt
at high availability. We have had situations where we've lost
connectivity due to a switch reboot and had the secondary system try to
take over the database RAID. Since the actual primary is not dead, the
database is still running and bad things happen during the transfer.
Postgres isn't happy when its data directory disappears out from under
it.

I would recommend a system that syncs the entire database periodically
(pg_dump/pg_restore?) and then creates a queued transaction path that
gets applied at distinct intervals with the capability to post the
transactions at the point of discovery of the failure of the primary
system. This way, a transaction that damages the database (not
necessarily corruption but bad data) can be caught before it makes it
into the replica. I haven't found anything in Postgres that allows me
to keep an SQL transaction log that I can pass over.
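The queued-transaction idea above could be sketched roughly like this. Everything here is illustrative, not an existing tool: the queue file path, function names, and the idea that statements arrive as lines of SQL are all assumptions; in practice apply_queue() would hand each statement to psql or libpq rather than a callback.

```python
# Sketch of the queued transaction path: statements accumulate in a
# plain file on the replica, and are replayed (e.g. at failover) and
# then cleared. All names and paths are hypothetical.

QUEUE_FILE = "/var/lib/pgsql/replica-queue.sql"

def enqueue(statement, queue_path=QUEUE_FILE):
    """Append one SQL statement to the replica's queue file."""
    with open(queue_path, "a") as f:
        # Normalize the trailing semicolon so replay is uniform.
        f.write(statement.rstrip(";\n") + ";\n")

def apply_queue(queue_path=QUEUE_FILE, execute=print):
    """Replay queued statements through `execute`, then empty the queue."""
    applied = []
    with open(queue_path) as f:
        for line in f:
            stmt = line.strip()
            if stmt:
                execute(stmt)       # in reality: run against the replica DB
                applied.append(stmt)
    open(queue_path, "w").close()   # queue has been applied; truncate it
    return applied
```

The point of keeping the queue as a readable file is exactly the one above: a bad transaction can be inspected and removed before it is ever applied to the replica.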

I've been thinking about the debug flags that allow me to log all of the
queries that pass through the system as a possible path to this. It
wouldn't be hard to write a perl/sed/awk/whatever script to strip out
the 'DEBUG:' bits and remove the 'SELECT' queries, leaving only the
'UPDATE' and 'INSERT' queries. This could be trapped by a process that
writes out the stream to a process on the replica server that would just
store them in a file. In the event of a primary system failure, a
process would detect the loss of the primary server, force the current
transaction log into the database, turn off the receiver daemon, then
take whatever actions are necessary to become the primary.
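The stripping step described above might look something like this (in Python rather than perl/sed/awk). The exact "DEBUG:  query: " prefix is an assumption about the postmaster's debug output format and may differ by version; note also that DELETE statements need to be kept alongside INSERT and UPDATE, or the replica will drift.

```python
import re

# Prefix assumed to mark logged queries in the debug output; adjust to
# match the actual log format of the running version.
PREFIX = re.compile(r"^DEBUG:\s+query:\s+", re.IGNORECASE)

def extract_writes(log_lines):
    """Keep only write statements from a stream of debug-log lines."""
    writes = []
    for line in log_lines:
        m = PREFIX.match(line)
        if not m:
            continue                      # not a logged query at all
        stmt = line[m.end():].strip()
        # Drop SELECTs; keep statements the replica must replay.
        if stmt.upper().startswith(("INSERT", "UPDATE", "DELETE")):
            writes.append(stmt)
    return writes
```

A process tailing the log could feed its lines through extract_writes() and ship the result to the receiver daemon on the replica.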

Of course, I haven't worked out all of the details of this :-). Does
this sound like a worthwhile endeavor? What holes can anyone punch in
this? I know that the pg_restore window is a period during which
failover would be vulnerable.

Cheers!

Bob

On Tue, 2002-08-06 at 15:01, Ragnar Kjørstad wrote:
> On Fri, Aug 02, 2002 at 03:36:20PM -0400, Eric Comeau wrote:
> > Has anyone implemented the HA solution linked on the PostgreSQL
> > Technical Documentation Web site?
> >
> > http://www.taygeta.com/ha-postgresql.html
> >
> > I have read through the article, but one concern I have is that I don't
> > see how this is any different than performing a "File system level
> > backup".
>
> It's not different, and such a solution should not be recommended.
>
> The procedure is broken, but there are a couple of ways to fix it:
> 1. Take a snapshot before running the rsync. This will ensure that
> the backup-server gets a copy of files that are "in sync" rather
> than a set of files copied at slightly different times. There was
> a long discussion about filesystem-backup, snapshots and similar
> issues on this list a few weeks ago - you might want to check it
> out.
> 2. Use shared storage rather than rsync. (IOW: a RAID connected
> to both servers at the same time).
> 3. Use rserv (the replication-service included in postgresql-contrib)
> instead of rsync.
>
> Both 1 and 3 will fix the consistency problem, but the backup-server
> would only be up-to-date right after you do the sync. So, you risk
> losing several minutes of data.
>
> Shared storage would eliminate this totally. The downside is that since
> you only have one storage-unit it may become a single point of failure.
> This can be fixed by using a storage-system with redundant raid-controllers,
> multiple fans and multiple power-supplies. You would also want to take
> regular backups, just in case.
>
> > How is taking Rsync copies every 5 or 10 minutes any different than
> > taking a file system backup without shutting the database down?
>
> It's not.
>
>
> --
> Ragnar Kjørstad
> Big Storage
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
--
Robert M. Meyer
Sr. Network Administrator
DigiVision Satellite Services
14 Lafayette Sq, Ste 410
Buffalo, NY 14203-1904
(716)332-1451
