From: | Csaba Nagy <nagy(at)ecircle-ag(dot)com> |
---|---|
To: | Richard Huxton <dev(at)archonet(dot)com> |
Cc: | Postgres general mailing list <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: How to debug a locked backend ? |
Date: | 2005-11-18 16:00:06 |
Message-ID: | 1132329606.10890.457.camel@coppola.muc.ecircle.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
> Well, I've had time to read your previous message too.
>
> The first time you seem to imply the machine slowed down across all
> processes - ssh etc. Was that the case this time?
OK, the slowdown/crash was a different problem, which might have been
caused by a "too many files open" problem combined with a huge load.
There was no evidence that the server actually crashed, but we needed to
reboot it because it became completely non-responsive. My current
problem is this:
http://archives.postgresql.org/pgsql-general/2005-11/msg00828.php
This current situation is different, the machine was almost idle, and
there was 1 UPDATE on a table running, a lot of inserts on the same
table waiting for a lock held by the transaction of the UPDATE, and a
VACUUM (started by autovacuum) running on the same table. There was no
dead-lock detected by the system, and I guess it should have detected it
if it was one, cause this whole thing was running for more than 3 hours
according to pg_stat_activity, and I set the deadlock timeout to 2
seconds.
So this happened 2 days ago, and today it happened again with exactly
the same picture: update, inserts, (auto)vacuum, on exactly the same
table as 2 days ago. That can't be a coincidence, there is something bad
either with our code or postgres. I do suspect that our code is doing
something not completely kosher, but it still should not lock in this
way.
Now according to pg_locks, the transaction of the update was holding an
exclusive lock on the table, which I can't explain, as we don't acquire
any table lock in our whole application, and the SQL of the update is
coming from our application, I could identify the process it's part of.
The only explicit locking we do in that process is via a SELECT ... FOR
UPDATE on that table, but that should place no exclusive lock on the
table, right ? So where is that lock coming from ?
And then, why was I not able to cancel the backend via kill or
pg_cancel_backend ?
> When you say "locked" do you mean it was waiting on locks, was using all
> the CPU, unresponsive or just taking the query a long time?
>
> To prepare for next time I'd:
> 1. Leave ssh logged-in, run screen to get three sessions
> 2. Leave "top" running in the first - that'll show you process
> activity/general load
> 3. Run "vmstat 10" in the second - that'll show you overall
> memory/swap/disk/cpu usage.
> 4. The third session is then free to work in, if neither of the first
> two show anything useful.
In that particular case I had a few terminals open, one of them actually
running top and others free. They were not responding either, so the
machine was completely bogged down. I do routinely have a few terminals
open, a psql one included. But that kind of lock-down did not happen
anymore, and we changed server in the meantime.
Cheers,
Csaba.
From | Date | Subject | |
---|---|---|---|
Next Message | A.j. Langereis | 2005-11-18 16:15:20 | PREPARE in bash scripts |
Previous Message | Vivek Khera | 2005-11-18 15:53:05 | Re: Moving from MySQL to PostgreSQL with Ruby on Rails. |