Re: postgres crash SOS

From: Felde Norbert <fenor77(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: postgres crash SOS
Date: 2010-06-18 08:55:09
Message-ID: AANLkTikAGQWmafXx8Jl8WZgj4dLOFzL4jnld22Ta4vbQ@mail.gmail.com
Lists: pgsql-general

Hi,

This is the information I could collect:

We use Cobian to create the backup.
There are two volumes in use. C is the volume where everything is
installed, and the postgres data dir is there too.
The postgres backup that runs every night places the backup file on
this volume too; it runs before the daily backup is started.
There is another volume where Cobian places the daily backups.

So to be precise:
C:
  postgres
  postgres\data
  postgres dump (made before the daily backup is started)
D:
  daily backups, including the postgres dump from C

The D volume became full on 06-06 and stayed so for 5 days.
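
Free space on the volumes can be checked with a short script. This is a minimal sketch using Python's standard shutil.disk_usage; the function name low_space_volumes and the 10% threshold are assumptions chosen for illustration, not values taken from this server.

```python
import shutil

def low_space_volumes(paths, min_free_fraction=0.10):
    """Return (path, free_bytes, total_bytes) for volumes below the threshold."""
    flagged = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free
        if usage.total == 0:
            continue  # skip pseudo-filesystems that report no size
        if usage.free / usage.total < min_free_fraction:
            flagged.append((path, usage.free, usage.total))
    return flagged

if __name__ == "__main__":
    # On the server described here the paths would be "C:\\" and "D:\\";
    # "." is used so the sketch runs anywhere.
    for path, free, total in low_space_volumes(["."]):
        print(f"{path}: only {free} of {total} bytes free")
```

Run from a scheduled task, a check like this would have reported the full D volume on the first day rather than after five.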

The first virtual memory log entry appeared on 06-09 at 05:41 and the
last on 06-10 at 16:18.
The log entries are all about the same:
Windows successfully diagnosed a low virtual memory condition.
The following programs consumed the most virtual memory:
cbService.exe (2348) consumed 2058158080 bytes,
explorer.exe (7136) consumed 245456896 bytes,
and McScript_InUse.exe (1908) consumed 218529792 bytes
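
To put these byte counts into more readable units, the entry can be parsed with a few lines of Python. This is only a sketch: the regular expression and the name parse_consumers are assumptions based on the three lines quoted above, not on any documented event log format.

```python
import re

LOG_TEXT = """\
cbService.exe (2348) consumed 2058158080 bytes,
explorer.exe (7136) consumed 245456896 bytes,
and McScript_InUse.exe (1908) consumed 218529792 bytes
"""

# process name, PID in parentheses, then the byte count
ENTRY = re.compile(r"(\S+) \((\d+)\) consumed (\d+) bytes")

def parse_consumers(text):
    """Return a list of (process_name, pid, bytes_consumed) tuples."""
    return [(name, int(pid), int(size)) for name, pid, size in ENTRY.findall(text)]

if __name__ == "__main__":
    for name, pid, size in parse_consumers(LOG_TEXT):
        print(f"{name} (pid {pid}): {size / 2**30:.2f} GiB")
```

For the entry above, cbService.exe alone accounts for about 1.92 GiB, and the three processes together for about 2.35 GiB.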

The postgres log at that time contains this:
2010-06-10 16:58:14 LOG: database system was interrupted at 2010-06-10 16:16:36
2010-06-10 16:58:14 LOG: checkpoint record is at 0/9FBE5158
2010-06-10 16:58:14 LOG: redo record is at 0/9FBE5158; undo record is
at 0/0; shutdown FALSE
2010-06-10 16:58:14 LOG: next transaction ID: 0/3620193; next OID: 6744703
2010-06-10 16:58:14 LOG: next MultiXactId: 2; next MultiXactOffset: 3
2010-06-10 16:58:14 LOG: database system was not properly shut down;
automatic recovery in progress
2010-06-10 16:58:14 LOG: redo starts at 0/9FBE51A8
2010-06-10 16:58:14 FATAL: the database system is starting up
2010-06-10 16:58:14 LOG: record with zero length at 0/9FEEDF60
2010-06-10 16:58:14 LOG: redo done at 0/9FEEDF30
2010-06-10 16:58:15 FATAL: the database system is starting up
2010-06-10 16:58:16 FATAL: the database system is starting up
2010-06-10 16:58:17 FATAL: the database system is starting up
2010-06-10 16:58:17 LOG: database system is ready
Before this I cannot find any interesting entries in the postgres log.

The first postgres backup that failed was on 06-11 at 00:30. The log is
filled with this message:
2010-06-11 00:31:19 ERROR: xlog flush request 0/9FF74848 is not
satisfied --- flushed only to 0/9FEEDFB0
2010-06-11 00:31:19 CONTEXT: writing block 17942 of relation
1663/4192208/4192534
2010-06-11 00:31:19 STATEMENT: FETCH 100 FROM _pg_dump_cursor.
This message appears at 1-second intervals; only the block number
after "writing block" changes.
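
The two positions in the error can be compared directly. The sketch below treats each "X/Y" value as two hexadecimal numbers and only handles the case where the first parts are equal, as they are here; the function names parse_lsn and lsn_gap are made up for this illustration.

```python
def parse_lsn(lsn):
    """Split an 'X/Y' log position into its two hexadecimal parts."""
    high, low = lsn.split("/")
    return int(high, 16), int(low, 16)

def lsn_gap(requested, flushed):
    """Bytes by which `requested` lies past `flushed`.

    Only positions with the same first part are handled in this sketch.
    """
    req_high, req_low = parse_lsn(requested)
    fl_high, fl_low = parse_lsn(flushed)
    if req_high != fl_high:
        raise ValueError("positions have different first parts; not handled here")
    return req_low - fl_low

if __name__ == "__main__":
    gap = lsn_gap("0/9FF74848", "0/9FEEDFB0")
    print(f"requested position is {gap} bytes past the flushed position")
```

For the message above the gap is 551064 bytes, about 538 KiB. One possible reading, offered as an assumption rather than a diagnosis, is that the data block carries a log position beyond the end of the WAL that was actually found on disk during the recovery on 06-10.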

About the information you asked for:
There are 2 SCSI drives, and they are mirrored using Windows mirroring.
As far as I could find out, the mirroring is done with default settings.
The fsync setting is the default.
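
The running server reports the live value with SHOW fsync; in psql. Where only the configuration file is at hand, a commented-out line in postgresql.conf means the built-in default applies. The sketch below illustrates that rule; SAMPLE_CONF is a hypothetical fragment, not the file from this server, the DEFAULTS table lists only the two settings assumed for this example, and the parser ignores quoting rules beyond stripping single quotes.

```python
# Defaults assumed for this sketch; both settings default to "on".
DEFAULTS = {"fsync": "on", "full_page_writes": "on"}

# Hypothetical fragment, for illustration only.
SAMPLE_CONF = """\
#fsync = on                     # commented out, so the default applies
full_page_writes = off          # hypothetical override
"""

def effective_settings(conf_text, defaults=DEFAULTS):
    """Return the defaults overridden by any uncommented settings in conf_text."""
    settings = dict(defaults)
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        name, _, value = line.partition("=")
        name = name.strip()
        if name in settings:
            settings[name] = value.strip().strip("'")
    return settings

if __name__ == "__main__":
    for name, value in effective_settings(SAMPLE_CONF).items():
        print(f"{name} = {value}")
```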

fenor

2010/6/17 Merlin Moncure <mmoncure(at)gmail(dot)com>:
> On Thu, Jun 17, 2010 at 4:51 PM, Felde Norbert <fenor77(at)gmail(dot)com> wrote:
>> The first error message was what I got after postgres crashed and I
>> tried to make a dump, run vacuum or tried something else.
>> The second message I got when I tried to repair the problem, so it
>> does not matter because I did something wrong, I see.
>>
>> If I could choose I would use a Linux server too, but if the partner
>> says there is a Windows server and you have to use that, then there is
>> no discussion.
>>
>> The reason I was not specific about how this state came about is that
>> I do not know.
>> I could not find anything about a power failure and disk space seemed
>> to be more than needed. There were entries in the log for full virtual
>> memory.
>
> This came before the crash?  Are you sure the server didn't reset
> following the virtual memory full?
>
> Memory full is a very dangerous condition for a database server and
> may have contributed to your problem or been a symptom of another
> problem.  The main things we need to know (any data corruption issue
> is worth trying to diagnose after the fact) are:
>
> *) what is the setting for fsync?
> *) Are you using a raid controller?  how is the cache configured?
> *) If not, is your drive configured to buffer writes?
> *) How much free space is left on your various volumes on the computer?
>
> Did you check the system event log for interesting events at or around
> the time you saw virtual memory full?  Can we see the log message
> reporting memory full condition as well as surrounding messages?
>
> merlin
>
