Re: Failing to recover after panic shutdown

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Per Lauvås <per(dot)lauvaas(at)mintra(dot)no>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Failing to recover after panic shutdown
Date: 2008-06-04 07:04:26
Message-ID: 20080604090426.1b7680ae@mha-laptop.hagander.net
Lists: pgsql-general

Hi!

Yes, almost certainly. Windows has major issues with more than one
process opening the same file, so it's very likely that this is your
issue. The only way you can safely get the file off the system without
affecting the running PostgreSQL instance is to use a Volume Shadow
Copy snapshot.

That said, I believe what you are trying to do is not safe even if you
do that. You can't just copy WAL segments out of there - if that were
actually safe, you wouldn't really need archive_command at all. If you
want to just "grab files out of the $PGDATA directory" safely, you can
again use a VSS snapshot, but that will require you to copy all of
PGDATA - both the data and the xlog directories.

Bottom line: you really should be using archive_command and
archive_timeout for this :-)
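
As a rough illustration of that setup, here is a minimal sketch of a
script that archive_command could call. The script name, the paths and
the 600-second timeout are assumptions chosen to match the 10-minute
window discussed in this thread, not anything taken from the actual
installation. %p and %f are the standard archive_command placeholders
for the segment's path and file name.

```python
import os
import shutil
import sys


def archive_segment(src_path, archive_dir, file_name):
    """Copy one completed WAL segment into archive_dir.

    Returns 0 on success and 1 on failure. PostgreSQL treats a nonzero
    exit status from archive_command as "not archived" and retries the
    same segment later.
    """
    dest = os.path.join(archive_dir, file_name)
    if os.path.exists(dest):
        # Never overwrite a segment that has already been archived.
        return 1
    tmp = dest + ".tmp"
    try:
        shutil.copyfile(src_path, tmp)
        # Flush the copy to disk before it appears under its final name.
        with open(tmp, "rb+") as f:
            os.fsync(f.fileno())
        os.rename(tmp, dest)
    except OSError:
        if os.path.exists(tmp):
            os.remove(tmp)
        return 1
    return 0


# Hypothetical postgresql.conf entries (script and archive paths are assumed):
#   archive_command = 'python C:/scripts/archive_wal.py "%p" "%f" "D:/wal_archive"'
#   archive_timeout = 600   # force a segment switch at least every 10 minutes
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(archive_segment(sys.argv[1], sys.argv[3], sys.argv[2]))
```

With something like this in place, the server only hands over segments
it has finished writing, so nothing else needs to open files in pg_xlog
while PostgreSQL is using them.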

//Magnus

Per Lauvås wrote:
> Yes, we are copying from pg_xlog. By doing so we let the WAL-segments
> fill up (not using timeout) and we are able to recover within a 10
> minute interval.
>
> Could it be that this copy operation is causing the problem?
>
> Per
>
> -----Original Message-----
> From: Magnus Hagander [mailto:magnus(at)hagander(dot)net]
> Sent: 3. juni 2008 15:47
> To: Per Lauvås
> Cc: pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] Failing to recover after panic shutdown
>
> Per Lauvås wrote:
> > Hi
> >
> > I am running Postgres 8.2 on Windows 2003 server SP2.
> >
> > Every now and then (2-3 times a year) our Postgres service is down
> > and we need to manually start it. This is what we find:
> >
> > In log when going down:
> > 2008-06-02 13:40:02 PANIC: could not open file
> > "pg_xlog/000000010000001C00000081" (log file 28, segment 129):
> > Invalid argument
>
> Are you by any chance running an antivirus or other "security
> software" on this server?
>
> > We are archiving WAL-segments at a remote machine, and we are
> > copying non-filled WAL-segments every 10 minutes to be able to
> > rebuild the DB with a maximum of 10 minutes of missing data. (I
> > don't know if that has anything to do with it).
>
> How are you copying these files? Are you saying you're actually
> copying the files out of the pg_xlog directory, or are you using the
> archive_command along with archive_timeout?
>
> //Magnus
>
