From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com> |
Subject: | Re: "PANIC: cannot make new WAL entries during recovery" in the wild |
Date: | 2009-08-07 17:51:53 |
Message-ID: | 24668.1249667513@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Today we got a report in the Spanish list about the message in $subject.
> The server is 8.4 running on Windows.
I accidentally managed to reproduce this in HEAD just now, by kill -9'ing
a backend that was in the midst of a COPY IN operation (I was trying to
reproduce Neil Best's unrelated issue...). The server log is:
LOG: server process (PID 23846) was terminated by signal 9
LOG: terminating any other active server processes
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted; last known up at 2009-08-07 11:27:36 EDT
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 0/1B9D7790
LOG: unexpected pageaddr 0/1532E000 in log file 0, segment 28, offset 3334144
LOG: redo done at 0/1C32D200
PANIC: cannot make new WAL entries during recovery
LOG: startup process (PID 23883) was terminated by signal 6
LOG: aborting startup due to startup process failure
and the stack trace of the panic'd startup process looks like:
#4 0x4b6e20 in errfinish (dummy=1) at elog.c:503
#5 0x4b86a0 in elog_finish (elevel=1073803952, fmt=0x7b0394b0 "") at elog.c:1142
#6 0x1f722c in XLogInsert (rmid=11 '\013', info=114 'r', rdata=0xc004d07c) at xlog.c:555
#7 0x1df290 in _bt_insertonpg (rel=0x4006cf28, buf=70, stack=0x3, itup=0x4006d150, newitemoff=38,
    split_only_page=0) at nbtinsert.c:833
#8 0x1e0898 in _bt_insert_parent (rel=0x4006cf28, buf=304, rbuf=854, stack=0x7b03b9d8, is_root=0, is_only=0)
    at nbtinsert.c:1627
#9 0x1ef098 in btree_xlog_cleanup () at nbtxlog.c:927
#10 0x201c44 in StartupXLOG () at xlog.c:5767
#11 0x206134 in StartupProcessMain () at xlog.c:8034
#12 0x228d0c in AuxiliaryProcessMain (argc=2, argv=0x7b03b6d8) at bootstrap.c:433
#13 0x39bb68 in StartChildProcess (type=StartupProcess) at postmaster.c:4243
So that confirms my speculation that btree index cleanup is the source
of the message. We have two basic approaches to dealing with it:
1. Decide that the check added to XLogInsert is wrong and take it out.
2. Arrange for some sort of explicit state transition between the
WAL-reading and cleanup phases of recovery, and make sure XLogInsert
knows about it.
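For illustration only, option 2 might be sketched roughly as follows. This is not PostgreSQL code: RecoveryPhase, enter_recovery_phase, and sketch_xlog_insert are hypothetical names invented for the example, and the sketch returns false where the real XLogInsert check raises a PANIC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical recovery phases; the names are illustrative only. */
typedef enum
{
    RECOVERY_REPLAYING_WAL,     /* reading and applying WAL records */
    RECOVERY_CLEANUP,           /* post-replay cleanup, e.g. finishing btree splits */
    RECOVERY_DONE               /* normal operation */
} RecoveryPhase;

static RecoveryPhase recovery_phase = RECOVERY_REPLAYING_WAL;

/* Explicit state transition; phases may only move forward. */
void
enter_recovery_phase(RecoveryPhase next)
{
    assert(next >= recovery_phase);
    recovery_phase = next;
}

/*
 * Stand-in for the check in XLogInsert: new WAL entries are refused only
 * while WAL is still being replayed, not during the cleanup phase.
 * Returns true if the insert would be allowed.
 */
bool
sketch_xlog_insert(const char *what)
{
    if (recovery_phase == RECOVERY_REPLAYING_WAL)
    {
        fprintf(stderr, "cannot make new WAL entries during WAL replay: %s\n", what);
        return false;
    }
    return true;
}
```

In this sketch the startup code would call enter_recovery_phase(RECOVERY_CLEANUP) after the last record is replayed and before running the cleanup routines, so that an insert issued from cleanup is accepted while one issued during replay is still rejected.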
Thoughts?
regards, tom lane