>>> On Mon, Jul 2, 2007 at 7:02 AM, in message
<1183377737(dot)10968(dot)7(dot)camel(at)silverbirch(dot)site>, "Simon Riggs"
<simon(at)2ndquadrant(dot)com> wrote:
> On Fri, 2007-06-29 at 17:04 -0500, Kevin Grittner wrote:
>> I'm submitting this patch in attempt to clarify some issues with the
>> warm standby documentation which caused some confusion in our
>> organization and which have been recently discussed on the admin list.
>
> Looks OK, but this isn't specific enough.
>
> This confusion was one of the reasons I wrote contrib/pg_standby, at
> least to illustrate the handling of the files.
>
> Another confusion you may encounter is that if you copy the files as
> soon as they are available the files may not yet be fully written and so
> an incomplete file may be copied into place.

Yeah, a note about copying to a modified form of the name and then moving
it to the specified name should be in there, so a partially written file
never shows up under the name the standby is waiting for. Sorry I missed
that when I wrote up the patch. While you're in there, please fix this
redundancy: "and zero for success when the copy succeeds." Thanks.
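
A minimal sketch of that copy-then-rename step, in Python for
illustration. The function name and the ".partial" suffix are made up for
this example and come from neither the patch nor pg_standby. It also
assumes the temporary name and the final name live on the same
filesystem, so that the rename is atomic.

```python
import os
import shutil


def copy_then_rename(src, dest_dir):
    """Copy src into dest_dir under a temporary name, then rename it.

    Hypothetical helper: a reader polling dest_dir for the final name
    sees either no file or the complete file, never a partial copy.
    """
    final_path = os.path.join(dest_dir, os.path.basename(src))
    tmp_path = final_path + ".partial"  # assumed suffix, not a PostgreSQL convention

    # The slow part happens under the temporary name.
    shutil.copyfile(src, tmp_path)

    # os.replace is an atomic rename on POSIX when both paths are on
    # the same filesystem.
    os.replace(tmp_path, final_path)
    return final_path
```
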

pg_standby looks interesting, but it is major overkill for us -- the
script to do it directly is less code than the pg_standby configuration
would be for us, with "fewer moving parts" to manage, so don't neglect
those in our position. Of course, we're using warm standby more to
validate our backups and provide failover on the timescale of a few
minutes after we've attempted to resolve problems, so it's no big deal to
use touch or echo to create the file that triggers the switch to
production mode. And it's nice having one ten-line bash script to handle
70 warm standbys on the machine.
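
For illustration, the wait-for-file-or-trigger logic behind such a script
can be sketched roughly as follows. This is not the bash script described
above; it is a Python sketch, and the function name, parameters, and
polling interval are hypothetical. It assumes the behaviour the warm
standby documentation describes: the restore command waits for the next
WAL file, and a non-zero exit status ends recovery.

```python
import os
import shutil
import time


def restore_wal(archive_dir, wal_name, dest_path, trigger_path,
                poll_seconds=1.0):
    """Wait for a WAL file or a trigger file; return a shell-style status.

    Hypothetical sketch: returns 0 after copying the requested WAL file,
    or 1 if the trigger file appears while the WAL file is still missing.
    """
    src = os.path.join(archive_dir, wal_name)

    while not os.path.exists(src):
        # A trigger file (created with touch or echo, say) means
        # "stop waiting and switch to production mode".
        if os.path.exists(trigger_path):
            return 1
        time.sleep(poll_seconds)

    # The requested file is present: hand it to recovery.
    shutil.copyfile(src, dest_path)
    return 0
```

A wrapper would pass the return value to sys.exit() so the server sees it
as the command's exit status.
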

This technique has already caught one corruption of a WAL file moving
across our WAN, allowing us to grab it again (uncorrupted) from its
initial copy location on the LAN of origin. Also, activating an
alternative server within a few minutes is a huge improvement over what
we had with the commercial software we're switching from, which took an
hour or two. With that software, after confirming that the remote site
had indeed suffered an unrecoverable failure, we would have to load a
backup centrally, apply all the database's transaction files, then "top
it off" with transactions from our applications' transaction repository
(to get up to the last second). The whole process should be a matter of
a few minutes with the PostgreSQL warm standby capabilities.

-Kevin