| From: | Peter Eisentraut <peter_e(at)gmx(dot)net> | 
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
| Cc: | pgsql-admin(at)postgresql(dot)org | 
| Subject: | Re: recovery is stuck when children are not processing SIGQUIT from previous crash | 
| Date: | 2009-11-12 12:02:21 | 
| Message-ID: | 1258027341.26305.18.camel@fsopti579.F-Secure.com | 
| Lists: | pgsql-admin pgsql-hackers | 
On Sat, 2009-09-26 at 12:19 -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > strace on the backend processes all showed them waiting at
> > futex(0x7f1ee5e21c90, FUTEX_WAIT_PRIVATE, 2, NULL
> > Notably, the first argument was the same for all of them.
> 
> Probably means they are blocked on semaphores.  Stack traces would
> be much more informative ...
Got one now:
#0  0x00007f65951eaf8e in ?? () from /lib/libc.so.6
#1  0x00007f65951dc218 in ?? () from /lib/libc.so.6
#2  0x00007f65951dbcdd in __vsyslog_chk () from /lib/libc.so.6
#3  0x00007f65951dc1a0 in syslog () from /lib/libc.so.6
#4  0x00000000006694bd in EmitErrorReport () at elog.c:1404
#5  0x0000000000669935 in errfinish (dummy=-1790575472) at elog.c:415
#6  0x00000000005c291e in quickdie (postgres_signal_arg=<value optimized out>) at postgres.c:2502
#7  <signal handler called>
#8  0x00007f65951e0513 in send () from /lib/libc.so.6
#9  0x00007f65951dbeed in __vsyslog_chk () from /lib/libc.so.6
#10 0x00007f65951dc1a0 in syslog () from /lib/libc.so.6
#11 0x00000000006694bd in EmitErrorReport () at elog.c:1404
#12 0x0000000000669935 in errfinish (dummy=3) at elog.c:415
#13 0x00000000005c291e in quickdie (postgres_signal_arg=<value optimized out>) at postgres.c:2502
#14 <signal handler called>
#15 0x00007f65951e0303 in recv () from /lib/libc.so.6
#16 0x00000000005486a8 in secure_read (port=0x24a76f0, ptr=0x9ac680, len=8192) at be-secure.c:319
#17 0x000000000054f3d0 in pq_recvbuf () at pqcomm.c:754
#18 0x000000000054f817 in pq_getbyte () at pqcomm.c:795
#19 0x00000000005c4d10 in PostgresMain (argc=4, argv=<value optimized out>, username=0x2478728 "xyz") at postgres.c:317
#20 0x000000000059938d in ServerLoop () at postmaster.c:3218
#21 0x000000000059a0cf in PostmasterMain (argc=5, argv=0x24731d0) at postmaster.c:1031
#22 0x0000000000551245 in main (argc=5, argv=<value optimized out>) at main.c:188
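Looks like a race condition or lockup in the syslog code: the trace shows quickdie() entered a second time from a signal handler (frame #7) while the first invocation was still inside syslog() (frames #8 to #10).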
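
For illustration only, here is a minimal sketch of one way to keep a handler like this from being re-entered. It is not PostgreSQL's code: the names sketch_quickdie, install_sketch_handler and handler_blocks are hypothetical, and it assumes a POSIX system. It relies on two things: sigaction() can block every signal for the duration of the handler, and write() and _exit() are async-signal-safe whereas syslog() is not on the POSIX list of async-signal-safe functions.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler, loosely modelled on the role quickdie() plays. */
void sketch_quickdie(int signo)
{
    static const char msg[] = "terminating after SIGQUIT\n";

    (void) signo;
    /* Only async-signal-safe calls here: write() and _exit(), no syslog(). */
    (void) write(STDERR_FILENO, msg, sizeof(msg) - 1);
    _exit(2);
}

/*
 * Install the handler for signo with every signal blocked while it runs,
 * so a second signal cannot interrupt it and enter it again.
 * Returns 0 on success, -1 on error.
 */
int install_sketch_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sketch_quickdie;
    sigfillset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/*
 * Report whether signal `blocked` is masked while the handler currently
 * installed for `signo` runs. Returns 1 if yes, 0 if no, -1 on error.
 */
int handler_blocks(int signo, int blocked)
{
    struct sigaction sa;

    if (sigaction(signo, NULL, &sa) != 0)
        return -1;
    return sigismember(&sa.sa_mask, blocked);
}
```

A caller would run install_sketch_handler(SIGQUIT) once at startup. With the full mask in place, a second SIGQUIT arriving while the handler runs stays pending instead of nesting the way frames #7 and #14 do in the trace above.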