From: | "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: sblock state on FreeBSD 6.1 |
Date: | 2006-05-10 20:14:23 |
Message-ID: | 20060510201423.GR99570@pervasive.com |
Lists: | pgsql-hackers |
We tried reproducing this on a backup server. We haven't been able to
wedge the system into a state where there are tons of processes in sblock
and nothing's getting done, but we were able to get some processes into
sblock and get stack traces:
#0 0x000000080135bd2c in recvfrom () from /lib/libc.so.6
#1 0x00000000004f9898 in secure_read ()
#2 0x00000000004fed7b in TouchSocketFile ()
#3 0x00000000004fee27 in pq_getbyte ()
#4 0x000000000055febf in PostgresMain ()
#5 0x000000000053a487 in ClosePostmasterPorts ()
#6 0x000000000053bab7 in PostmasterMain ()
#7 0x0000000000500436 in main ()

#0 0x000000080137638c in sendto () from /lib/libc.so.6
#1 0x0000000000535fb5 in pgstat_report_activity ()
#2 0x000000000055fe81 in PostgresMain ()
#3 0x000000000053a487 in ClosePostmasterPorts ()
#4 0x000000000053bab7 in PostmasterMain ()
#5 0x0000000000500436 in main ()

#0 0x000000080137638c in sendto () from /lib/libc.so.6
#1 0x00000000004f954c in secure_write ()
#2 0x00000000004ff295 in pq_getmessage ()
#3 0x00000000004ff480 in pq_flush ()
#4 0x000000000055c59a in ReadyForQuery ()
#5 0x000000000055fe8c in PostgresMain ()
#6 0x000000000053a487 in ClosePostmasterPorts ()
#7 0x000000000053bab7 in PostmasterMain ()
#8 0x0000000000500436 in main ()
It may or may not be important that in the test environment we're not
seeing any 'statistics buffer is full' errors.
One thing that is interesting is that Tom thought that sblock probably
couldn't be happening on the client socket, since once that's
established there won't be any processes vying for time on it, but I'm
wondering if TouchSocketFile() could be throwing a wrench into the
works? The first trace shows that it can put the process into sblock, so
I'm wondering if under certain circumstances that could end up running
away.
BTW, one interesting tidbit out of this is that this dual Opteron
machine is handling 2000 transactions per second while we're trying to
reproduce the problem. Granted, these are almost entirely read-only
transactions, but still...
On Tue, May 02, 2006 at 07:38:56PM -0500, Jim C. Nasby wrote:
> Just experienced a server that was spending over 50% of CPU time in the
> system, apparently dealing with postmasters that were in the sblock
> state. Looking at the FreeBSD source, this indicates that the process is
> waiting for a lock on a socket. During this time the machine was doing
> nearly 200k context switches a second.
>
> At the same time, the server was also producing 'statistics buffer is
> full' errors.
>
> Has anyone seen this before? I suspect that the stats buffer errors are
> a symptom and not the cause of the problem, but unfortunately I wasn't
> able to get a stack trace to verify that theory.
>
> The machine is a dual Opteron 250 with 8G of memory, running 8.1.3.
> While this was going on there were between 10 and 250 backends running
> at once, based on vmstat.
>
> Any ideas what areas of the code could be locking a socket?
> Theoretically it shouldn't be the stats collector, and the site is using
> pgpool as a connection pool, so this shouldn't be due to trying to
> connect to backends at a furious rate.
> --
> Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
> Pervasive Software http://pervasive.com work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461