From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Denis Perchine <dyp(at)perchine(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Quite strange crash |
Date: | 2001-01-08 17:21:38 |
Message-ID: | 18163.978974498@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Denis Perchine <dyp(at)perchine(dot)com> writes:
>>>>>>> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
>>>>>
>>>>> Were there any errors before that?
> Actually you can have a look on the logs yourself.
Well, I found a smoking gun:
Jan 7 04:27:51 mx postgres[2501]: FATAL 1: The system is shutting down
PID 2501 had been running:
Jan 7 04:25:44 mx postgres[2501]: query: vacuum verbose lazy;
What seems to have happened is that PID 2501 curled up and died, leaving
one or more buffer spinlocks locked. Roughly one spinlock timeout
later, at 04:29:07, we have PID 1008 complaining of a stuck spinlock.
So that fits.
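To make the timing concrete, here is a small sketch that computes the gap between the two timestamps quoted above. The helper name `gap_seconds` is made up for illustration, and the result is only the elapsed time between the two log entries; it says nothing about the configured spinlock timeout itself.

```python
from datetime import datetime


def gap_seconds(start: str, end: str, fmt: str = "%H:%M:%S") -> int:
    """Return the whole seconds between two same-day clock times."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Timestamps taken from the log excerpts quoted in this message.
backend_died = "04:27:51"      # PID 2501 reports "The system is shutting down"
stuck_spinlock = "04:29:07"    # PID 1008 reports the stuck spinlock

print(gap_seconds(backend_died, stuck_spinlock))  # 76
```

So the complaint arrives 76 seconds after 2501 exits.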
The real question is what happened to 2501? None of the other backends
reported a SIGTERM signal, so the signal did not come from the
postmaster.
Another interesting datapoint: there is a second place in this logfile
where one single backend reports SIGTERM while its brethren keep running:
Jan 7 04:30:47 mx postgres[4269]: query: vacuum verbose;
...
Jan 7 04:38:16 mx postgres[4269]: FATAL 1: The system is shutting down
There is something pretty fishy about this. You aren't by any chance
running the postmaster under a ulimit setting that might cut off
individual backends after a certain amount of CPU time, are you?
What signal does a ulimit violation deliver on your machine, anyway?
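One way to answer that last question empirically is sketched below, assuming a POSIX system with Python available. It starts a child process, lowers the child's soft CPU-time limit to one second with `resource.setrlimit`, lets it spin, and reports which signal killed it. POSIX specifies SIGXCPU for a soft `RLIMIT_CPU` violation, but the point of running this is to check the actual machine rather than assume. The function name `signal_from_cpu_limit` is hypothetical.

```python
import signal
import subprocess
import sys

# Child program: set a 1-second soft CPU limit, then burn CPU forever.
# The hard limit is left as inherited so the call cannot raise it.
CHILD = """
import resource
soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
resource.setrlimit(resource.RLIMIT_CPU, (1, hard))
while True:
    pass
"""


def signal_from_cpu_limit():
    """Run the child and return the signal number that terminated it, or None."""
    proc = subprocess.run([sys.executable, "-c", CHILD], timeout=30)
    # subprocess reports death-by-signal as a negative return code.
    return -proc.returncode if proc.returncode < 0 else None


if __name__ == "__main__":
    sig = signal_from_cpu_limit()
    print(signal.Signals(sig).name if sig else "child exited without a signal")
```

If this reports something other than SIGTERM, a CPU ulimit would not by itself explain a backend logging "The system is shutting down", though that still depends on how the platform in question behaves.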
regards, tom lane