From: | Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> |
---|---|
To: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | SIGUSR1 pingpong between master and autovacuum launcher causes crash |
Date: | 2009-08-21 13:22:34 |
Message-ID: | 1250860954.1239.114.camel@localhost |
Lists: | pgsql-hackers |
I found the following core file from PG 8.4.0 on my system (Solaris Nevada
b119):
fe8ae42d _dowrite (85bf6e8, 3a, 8035e3c, 80350e8) + 8d
fe8ae743 _ndoprnt (85bf6e8, 8035ec8, 8035e3c, 0) + 2ba
fe8b322d vsnprintf (85bfaf0, 3ff, 85bf6e8, 8035ec8, 0, 0) + 65
082194ea appendStringInfoVA (8035e9c, 85bf6e8, 8035ec8) + 4a
083ca5d3 errmsg (849c340, 0) + 103
0829272d StartAutoVacWorker (fe97f000, 32, 85b82b0, 8035ef4, 82a1496, c) + 3d
082a1901 StartAutovacuumWorker (c, 8035f08, fe8ed28f, 10, 0, 8035fbc) + 71
082a1496 sigusr1_handler (10, 0, 8035fbc) + 186
fe8ed28f __sighndlr (10, 0, 8035fbc, 82a1310) + f
fe8e031f call_user_handler (10) + 2af
fe8e054f sigacthandler (10, 0, 8035fbc) + df
--- called from signal handler with signal 16 (SIGUSR1) ---
fe8f37f6 __systemcall (3, fec32b88, 0, fe8e0b46) + 6
fe8e0c71 thr_sigsetmask (3, 85abd50, 0, fe8e0d18) + 139
fe8e0d3f sigprocmask (3, 85abd50, 0) + 31
082a14a4 sigusr1_handler (10, 0, 8036340) + 194
fe8ed28f __sighndlr (10, 0, 8036340, 82a1310) + f
fe8e031f call_user_handler (10) + 2af
fe8e054f sigacthandler (10, 0, 8036340) + df
... 80x same sighandler stack
--- called from signal handler with signal 16 (SIGUSR1) ---
fe8f37f6 __systemcall (3, fec32b88, 0, fe8e0b46) + 6
fe8e0c71 thr_sigsetmask (3, 85abd50, 0, fe8e0d18) + 139
fe8e0d3f sigprocmask (3, 85abd50, 0) + 31
082a14a4 sigusr1_handler (10, 0, 80478fc) + 194
fe8ed28f __sighndlr (10, 0, 80478fc, 82a1310) + f
fe8e031f call_user_handler (10) + 2af
fe8e054f sigacthandler (10, 0, 80478fc) + df
--- called from signal handler with signal 16 (SIGUSR1) ---
fe8f1867 __pollsys (8047b50, 2, 8047c04, 0) + 7
fe89ce61 pselect (6, 8047c44, 0, 0, 8047c04, 0) + 199
fe89d236 select (6, 8047c44, 0, 0, 8047c38, 0) + 78
0829dc20 ServerLoop (feffb804, bd26003b, 41b21fcb, 85c1de0, 1, 0) + c0
0829d5d0 PostmasterMain (3, 85b72c8) + dd0
08227abf main (3, 85b72c8, 8047df0, 8047d9c) + 22f
080b893d _start (3, 8047e80, 8047ea5, 8047ea8, 0, 8047ec2) + 7d
The problem as I see it is that StartAutovacuumWorker() fails and
sends SIGUSR1 to the postmaster, but it sends it so quickly that the
signal handler is still active. When the signal mask is unblocked in
sigusr1_handler(), the signal handler is run again...
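
To illustrate the nesting, here is a minimal standalone sketch, not PostgreSQL code. It assumes a handler that unblocks its own signal before returning, which is what the sigprocmask frames in the trace suggest. The names nesting_handler, measure_nesting and NEST_LIMIT are made up for the example, and raise() stands in for the other process sending SIGUSR1 again.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

enum { NEST_LIMIT = 5 };            /* hypothetical cap so the demo terminates */

static volatile sig_atomic_t depth = 0;
static volatile sig_atomic_t max_depth = 0;

/* Handler that unblocks its own signal while it is still running. */
void
nesting_handler(int signo)
{
    sigset_t unblock;

    depth++;
    if (depth > max_depth)
        max_depth = depth;

    if (depth < NEST_LIMIT)
    {
        /* The signal is blocked while its handler runs, so this only
         * marks it pending (stand-in for the peer signalling us again). */
        raise(signo);

        /* Unblocking delivers the pending signal right away, on top of
         * the current handler frame: one more level of nesting. */
        sigemptyset(&unblock);
        sigaddset(&unblock, signo);
        sigprocmask(SIG_UNBLOCK, &unblock, NULL);
    }

    depth--;
}

/* Installs the handler, sends one SIGUSR1, returns the deepest nesting seen. */
int
measure_nesting(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = nesting_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    depth = 0;
    max_depth = 0;
    raise(SIGUSR1);
    return (int) max_depth;
}
```

Without the cap, every new signal that arrives before the handler returns adds another set of frames to the stack, as in the repeated sighandler frames above.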
The reason why StartAutovacuumWorker() fails is interesting. The log says:
LOG: could not fork autovacuum worker process: Not enough space
It is strange and I don't understand it. Maybe too many nested signal
handler calls could cause it.
It is also strange that 100ms is not enough to protect against this
situation, but I think that the sleep could be interrupted by a signal.
My suggestion is to have sigusr1_handler() only set a flag, for example
gotUSR1 = true, and to check in the server loop whether we got a USR1
signal. That avoids any problems with the signal handler, which is
currently not POSIX compliant anyway.
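
A rough sketch of that idea as a standalone program fragment, not a patch against the postmaster. Only the name gotUSR1 comes from the suggestion above; sigusr1_flag_handler, install_usr1_handler, service_pending_usr1 and usr1_work_done are hypothetical names for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler, cleared by the main loop. */
static volatile sig_atomic_t gotUSR1 = 0;

/* Counts how often the deferred work ran (illustration only). */
static int usr1_work_done = 0;

/* The handler does nothing except record that the signal arrived. */
void
sigusr1_flag_handler(int signo)
{
    (void) signo;
    gotUSR1 = 1;
}

/* Installs the handler; returns 0 on success, -1 on failure. */
int
install_usr1_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigusr1_flag_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;    /* no SA_RESTART, so a blocking select() can return EINTR */
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Called once per iteration of the main loop; returns 1 if work was done. */
int
service_pending_usr1(void)
{
    if (!gotUSR1)
        return 0;

    gotUSR1 = 0;        /* clear first, so a signal arriving now is not lost */
    usr1_work_done++;   /* stand-in for the real work, e.g. starting a worker */
    return 1;
}
```

With this shape, several signals arriving close together collapse into one pass of work in the loop, and nothing that needs much stack (fork, ereport) runs in signal context.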
Any other ideas?
Zdenek