100% CPU pg processes that don't die.

From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: "pgsql general" <pgsql-general(at)postgresql(dot)org>
Subject: 100% CPU pg processes that don't die.
Date: 2008-08-09 19:16:34
Message-ID: dcc563d10808091216q2ef509fcl92cd2bb49fff5fac@mail.gmail.com
Lists: pgsql-general

I'm load testing a machine, and I'm seeing idle-in-transaction
processes that are no longer hooked to any outside client, that pull
100% CPU and can't be kill -9ed. I'm using pgbench -c 1000 -t 1000.
postgresql.conf attached. This is on an 8 CPU AMD box with hardware
RAID. I'll likely never see this many parallel connections in
production, but who knows... I want to reboot the machine to test at
a lower number of threads, but if there's more information to be
gleaned from the current state, I'll leave it up.
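
For reference, something like this via psql lists those backends and how
long they've been sitting in that state (this is a sketch against 8.x,
where the pid column in pg_stat_activity is called procpid and an idle
transaction shows up as '<IDLE> in transaction'; newer releases use
different column names):

psql -c "SELECT procpid, usename, query_start, current_query
         FROM pg_stat_activity
         WHERE current_query = '<IDLE> in transaction'
         ORDER BY query_start"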

They look like this in top:

3552 postgres 20 0 8286m 82m 78m R 100 0.3 195:04.22 postgres:
postgres postgres [local] idle in transaction
3561 postgres 20 0 8286m 83m 79m R 100 0.3 195:04.20 postgres:
postgres postgres [local] idle in transaction

This in ps aux|grep postgres:

postgres 3561 95.2 0.2 8485376 85708 ? Rs 09:45 197:17
postgres: postgres postgres [local] idle in transaction
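
Since kill -9 has no effect, the backends look like they're stuck inside
the kernel rather than spinning in postgres code itself. Something along
these lines (pid 3561 taken from the ps output above; /proc/<pid>/stack
and sysrq need kernel support and may not be available on every kernel)
at least shows where they're stuck:

grep State /proc/3561/status    # process state, same as top's R/S/D column
cat /proc/3561/wchan            # kernel function the task is blocked in, if any
cat /proc/3561/stack            # kernel-side stack trace, where supported
echo t > /proc/sysrq-trigger    # dump every task's stack to dmesg (needs sysrq enabled)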

The db is still up and accessible. I'm also getting this in my
/var/log/messages:

Aug 9 13:13:21 engelberg kernel: [71242.734934] CPU 1:
Aug 9 13:13:21 engelberg kernel: [71242.734935] Modules linked in:
iptable_filter ip_tables x_tables parport_pc lp parport loop ipv6
evdev i2c_nforce2 pcspkr shpchp button pci_hotplug i2c_core pata_amd
ata_generic ext3 jbd mbcache sg sr_mod cdrom sd_mod e1000 floppy
arcmsr pata_acpi libata ehci_hcd forcedeth ohci_hcd scsi_mod usbcore
thermal processor fan fbcon tileblit font bitblit softcursor fuse
Aug 9 13:13:21 engelberg kernel: [71242.734972] Pid: 294, comm:
kswapd0 Not tainted 2.6.24-19-server #1
Aug 9 13:13:21 engelberg kernel: [71242.734974] RIP:
0010:[floppy:_spin_lock_irqsave+0x12/0x30]
[floppy:_spin_lock_irqsave+0x12/0x30] _spin_lock_irqsave+0x12/0x30
Aug 9 13:13:21 engelberg kernel: [71242.734980] RSP:
0018:ffff810415423df8 EFLAGS: 00000286
Aug 9 13:13:21 engelberg kernel: [71242.734982] RAX: 0000000000000246
RBX: ffff81000003137d RCX: 0000000000000003
Aug 9 13:13:21 engelberg kernel: [71242.734984] RDX: 0000000000000001
RSI: ffff810415423ea0 RDI: ffff81000003137d
Aug 9 13:13:21 engelberg kernel: [71242.734987] RBP: ffff810415423d60
R08: 0000000000000000 R09: 0000000000000000
Aug 9 13:13:21 engelberg kernel: [71242.734989] R10: 0000000000000000
R11: ffffffff881a46b0 R12: ffff810415423d60
Aug 9 13:13:21 engelberg kernel: [71242.734991] R13: ffffffff8028d11e
R14: ffff81041f6b2670 R15: ffff810420168178
Aug 9 13:13:21 engelberg kernel: [71242.734994] FS:
00007f51096fd700(0000) GS:ffff8108171a2300(0000)
knlGS:0000000000000000
Aug 9 13:13:21 engelberg kernel: [71242.734997] CS: 0010 DS: 0018
ES: 0018 CR0: 000000008005003b
Aug 9 13:13:21 engelberg kernel: [71242.734999] CR2: 00007f4f27ebffd0
CR3: 0000000000201000 CR4: 00000000000006e0
Aug 9 13:13:21 engelberg kernel: [71242.735001] DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
Aug 9 13:13:21 engelberg kernel: [71242.735003] DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 9 13:13:21 engelberg kernel: [71242.735006]
Aug 9 13:13:21 engelberg kernel: [71242.735006] Call Trace:
Aug 9 13:13:21 engelberg kernel: [71242.735009]
[usbcore:prepare_to_wait+0x23/0x80] prepare_to_wait+0x23/0x80
Aug 9 13:13:21 engelberg kernel: [71242.735013] [kswapd+0xfa/0x560]
kswapd+0xfa/0x560
Aug 9 13:13:21 engelberg kernel: [71242.735020] [<ffffffff80254260>]
autoremove_wake_function+0x0/0x30
Aug 9 13:13:21 engelberg kernel: [71242.735026] [kswapd+0x0/0x560]
kswapd+0x0/0x560
Aug 9 13:13:21 engelberg kernel: [71242.735030] [kthread+0x4b/0x80]
kthread+0x4b/0x80
Aug 9 13:13:21 engelberg kernel: [71242.735034] [child_rip+0xa/0x12]
child_rip+0xa/0x12
Aug 9 13:13:21 engelberg kernel: [71242.735040] [kthread+0x0/0x80]
kthread+0x0/0x80
Aug 9 13:13:21 engelberg kernel: [71242.735043] [child_rip+0x0/0x12]
child_rip+0x0/0x12
Aug 9 13:13:21 engelberg kernel: [71242.735046]

Does this look like a kernel bug or a pgsql bug to most people?
